TamDesign.memo   Purcell	 D*R*A*F*T	14-Aug-85 20:04:32

This document describes internal data structures used in the Tamarin-1 hardware and low-level software. These data structure are also designed to coexist with a version of Interlisp-D on the D-Machines for software development and hardware simulation.

 
POINTER FORMAT

The interim word size will be 32 bits (expandable to 40 only on Tamarin hardware).  There are three pointer formats in memory and one more on the stack:

	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|0 0|SUBTYP|0 0 0|	PTR					|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*

	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|0 1|S			INTEGER: 30-bit 2's Complement 		|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
		
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|1|STY|F F F F F|  	Stack Block Marker			|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*

	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|1|S| EXPONENT  |	MANTISSA				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*

	
TYP	This field describes the basic type of the pointer.
	The tags were chosen to ease transition from D-Machine to TamOps.
	The taging choice for basic types might be reexamined after 32 bit conversion.
	
	00:  Pointer - subtype valid
	01:  30 bit 2's complement Integers (to one billion)
	1X:  Stack markers for D-Machine (unused by Tamarin, alias to Tamarin Floatp's)
	1X:  31 bit Immediate Floating point number  10↑-19 to 10↑19 (unused by D-Machine)
	
SUBTYP	When TYP=00 this field describes the exact type of some objects. 
	00,XXX1XX:  Reserved for type or address expansion
	00,XXXX1X:  Reserved for type or address expansion
	00,XXXXX1:  Reserved for type or address expansion
	
	00,000000:  ObjectP 	backwards compatible pointer.  Type is in usual type table.
	00,001000:  UserListP 	(ListP returns true for user object)
	00,010000:  ListP 	(else CAR/CDR trap)
	00,011000:  CodeP 	(else FN traps to interpreter)
	00,100000:  AtomP 	(else apply traps)	
	00,101000:  StackP 	(else FVAR← traps)
	00,110000:  NumberP 	(commonlisp EQL must trap)
	00,111000:  UnboundP 	(traps FVAR)	
	00,111100:  IndirectP 	(var reference treated as indirect)	
			
	01,SXXXXX:  IntegerP	
	1SEEEEEEE:  FloatP	
		
	100,XXXFF:  Basic Frame
	101,XXXXX:  Free block
	110,FFFFF:  Frame Extension with flags
	111,XXXXX:  Guard Block
	
	
PTR	This field contains a (usually even) pointer to 16 bits of memory.
	This pointer can address 32MB and be expanded to 256MB.
	
S	The sign bit for either IntegerP's or FloatP's.

FF	Stack Block (PTR=stackBlock)	.
	This pattern marks the start of a DMachine stack block
	
EXPONENT is the 7-bit excess 64 exponent, like a shortened IEEE exponent.

MANTISSA is an IEEE-like 23 bit mantissa with an assumed leading 1.

INTEGER is a 30-bit 2's complement integer.	


STACK STRUCTURE

Stack Frame (dumped in memory)

HDR	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
-1:	|			      NEXT				|
	*=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*
 0:	|T|S|N|M|L|C|F|R|    (MAXVAR)	|      SP	|   USECOUNT	|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|			    NAMETABLE		   		|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|			    CODE Base				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|			       PC				|
	*=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*
	|			      ALINK				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|			      CLINK				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|			    EXTENSION				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|			      VAR0				|
	|			      ...				|
MAXVAR:	|			      VARi				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
SP:	|			      STK0				|
	|			      ...				|
~40:	|			      STKj				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*


NEXT	Frames remain chained down the stack and into the free list by NEXT pointers.
	Spagetti manipulation rearranges the linear threads but Dump/Restore does not.

FLAGS		* = used by microcode
	*T: Trap on Entry	
	*S: Slow Return ALINK # CLINK
	?N: No-Push: Don't push result when returned to
	?M: Multiple Values accepted
	 L: Large frame extension
	 C: InCall ??
	 F: Fast: binds no variables
	 R: Reserved for expansion (default 0)
	 
SP	The Stack Pointer indicates the top item on the stack in the internal frame.

MAXVAR	The MAXVAR field could be used for optionally checking stack underflow.

USECNT	The Usecnt field indicates if this frame is pointed to by any other frames or Stack pointers

CLINK	This field point sto the preveious frame in system memory.
	The CLINK field is only valid for the shallowest frame in the processor.
	Hence, there is only one CLINK register in the processor even though there are more frames.

ALINK	This field point to the frame which denotes the previous access environment.
	If this field is UNBOUND, then the CLink should be used for the ALink.
	
	
NAMETABLE
	This points to the nametable of the function.
	If it is unbound then the function binds no names.
	If it is unbound then the function should not be searched on free variable lookup.
	The field is copied from the compiled function header. 
	
Vars	This section contains the IVars, PVars, and FVars.
	Each is indexed from a single offset.
	This field can contain indirect pointers.
	
Stk	This section contains the working stack.
	The start of the stack is determined by the number of IVars and PVars.

Function Definition Cell:

	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|   "CodeP"	|	   pointer to CodeP		   	|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*

Function Header:

CodeP:	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|     FLAGS	|    (MAXVAR)	|      SP	|   USECOUNT	|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|			   NAMETABLE		   		|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|    ENTRY 0	|    ENTRY 1	|    ENTRY 2	|    ENTRY 3	|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|    ENTRY 4	|    ENTRY 5	|    ENTRY 6	|    ENTRY 7	|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|    "atom"	|	FUNCTION NAME				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|			    unused				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|			    unused				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|int|		|    NTSIZE	|    NLOCALS	|  FVAROFFSET	|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|     "atom" 	|	    Var Name				|
	|      ...	|	      ...				|
	|     "atom" 	|	    Var Name				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|int|			     Offset				|
	|int|     		      ...				|
	|int|			     Offset				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
ENTRY0:	|		         Compiled Code				|
	|								|
	|	ENTRY2							|
	|		ENTRY1						|
	|     			      ...				|
	|								|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*


Some Processor Unique Registers:

	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|				T				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|			       NIL				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|				0				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|				1				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|			     UNBIND				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|			  Val Space Base			|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|			  FnDef Space Base			|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|			  Type Table Base			|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|		 FREE: Stack Frame Free List			|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|			      CLINK				|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|								|


	Global variables (GVARs) used often by UFNS could be stored in unique registers.
	
	
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	|			Memory Address Reg			|
	*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
	

SOFTWARE DEVELOPMENT STRATEGY

We are building a lisp.sysout with data structures compatible with both reduced D-MachineOps and TamarinOps.  We believe the load up can be done on a Dorado with certain opcodes disabled.  We think all the opcode changes can be done in Lisp in the Unimplemented Function (UFN) code rather than by new microcode.  Changes are as follows: 

* Microcode work
  New traps to UFN
    Opcodes to trap on new Immediate data types
      TYPE, LISTP, EQ
      arithmetic already ok
    Odd pointer traps?
  Disable some opcodes
    CONS, RPLACD, RPLACA, BIN
  Change some bit packing in 24 vs 32 bit pointers
    FN checks for compiled flg in Function Defn cell


{	32 bit pointer fields: GETBASEPTR32, BIN
	CDR coding off: CAR, CDR, CONS, RPLCONS
	New Type subroutine due to Immediate 30-bit integers: LISTP, NTYPX,TYPEP,DTEST
	Immediate 30-bit integers coexisting with SMALLPs: EQ
}

* Call and Return across worlds UFNs (Interpreter)

First Observation: We can skip control transfers across worlds by bringing up a well ordered set of Tamarin Functions, and implementing only one (non reentrant) entry to the Tamarin Emulator.  We chose to skip fancy control transfers across worlds because it is tricky and we don't have good ways to do free variable look-up across worlds.

 * CodeP bit in Function Defn Cells

The most significant bit (compiled flg) in Function Defn cell can remain.  In the set of Tamarin 32-bit objects, Tamarin CodeP's and D-Machine CodeP's are disjoint.  PUTD will preserve Tamarin CodeP's and turn on the "FunDefCell msb" for D-Machine CodeP's.  The D-Machine will execute FunDefCells iff they contain D-Machine CodeP's, and Tamarin will executed FunDefCells iff they contain Tamarin CodeP's.

* Mixed immediate integer implementations 
    DLisp is efficient for SMALLP.  UFNS can implement TamINTEGER in terms of SMALLP
    TamOps is efficient for TamINTEGER.  UFNS can implement SMALLP in terms of TamINTEGER
    LispOps already traps arithmetic on non SMALLPs; only EQ needs work.
    EQ on both machines must be more careful to make the two representations of SMALLP equal.
    
  Limits
    FreeVar lookup across boundary.
    Context switch to Keyboard (in TamOps?)
    
Microcode work
  New traps to UFN
    Opcodes to trap on new Immediate data types
      TYPE, LISTP, EQ
      arithmetic already ok
    Odd pointer traps?
  Disable some opcodes
    CONS, RPLACD, RPLACA, BIN
  Change some bit packing in 24 vs 32 bit pointers
    FN checks for compiled flg in Function Defn cell
  Change Stack block types to make room for immediate types 

DESIGN DISCUSSION

The following subset of features is the goal for a first Tamarin machine:

YES	* 32 or 40 bit word size (vs 16)

YES	* Tagged architecture

BOTH	* 40 vs. 32 bit architecture

DEFER	* 4-bit Reference Count hardware

DEFER	* Cdr coding

IF EASY	* Indirect Pointers on the stack

YES	* Stack format changes

YES	* Bind/Unbind

YES	* Opcode changes

DEFER	* Prolog

STUDY	* CommonLisp

DEFER	* I/O changes

Some choices not emphasised in Alan's June 17 memo are:

DEFER	* 32 bit word addressability rather than the current 16 bit addressability  

Apart from the width of the machines data paths, we can choose the meaning of the least significant bit (LSB) of addresses.  Most instructions move 32 bit quantities and most but not all addresses point to 32 bit alligned quantities.  It is awkward to support odd addresses that point to 16 bit quanties.  Even if we intend to change the address conventions, a transition period could be accomodated if the LSB meaning were left unchanged until most system software had been changed to never use odd addresses.  During the transition period the use of odd addresses could be unsupported by anything more than a trap to a UFN thereby retaining odd addresses with a severe speed penalized.  During the transition period we could do more staging on the Dandelion and even install the "odd address penalty/detection" first for perfomance tuning the speed critical code, and secondly to detect and remove all odd addressing.


The hardest change to lisp in the subset is the new stack design and the many issues it involves:

* Variables in a Fixed Size Stack Frame

If a funtion's PVARs cannot all fit in the limited processor frame then a frame extension is allocated in memory at function entry and either freed at exit or garbaged collected later.  These extensions would be from a pool that is treated like stack for reference count purposes.  That is, pointers in the extensions are not reference counted and the pool is scanned like the stack at GC time.


* Evaluation Stack in a Fixed Size Stack Frame

Statistics indicate 99% of functions need only 16 cells of evaluation stack.  Tamarin can have more stack with less variables or less stack and more variables, but there is an upper limit on stack.  The compiler will be responsible for limiting stack depth.  Deep stacks (beyond about 32) seem to be rare enough that we can tempararily call them compile errors.  Later, techniques for moving stack variables into the frame extension or function splitting can be implemented.  
 
* Spagetti Stack: Merging Basic Frames and Frame Extensions

Where we currently copy only frame extensions we will copy whole frames.  We believe the slight change in sematics is acceptable to us and our customers.  We should try this change on a DMachine
Basic Frame vs Frame eXtension


*  Spagetti Stack: Slow Return

There are three cases of slow return in present Interlisp-D:
(1) ALINK # CLINK   The returning frame executes a RETURN which UFNs upon detecting "Slow" bit.  Tamarin will also UFN the RETURN opcode when the S (Slow) flag is set in the frame header.
(2) UseCnt(caller) > 1:  The returner can return but before the Caller can resume it must be copied.  Tamarin will Trap on entering the caller when the T (Trap on entry) flag is set in the frame header.
(3) No stack space ahead of caller.  Tamarin stack frames are heap allocated so this is only a case of "Stack Frame FreeList Empty"

Tamarin handles Spagetti Stack with Traps just before and just after RETURN.

All Calls are ordinary.
	Caller is suspended; new stack frame is instanciated for the Callee (in processor and/or memory).

Normal Frames point only to their caller and are pointed to only by one "callee".
	Return from such a frame to such a frame is ordinary
	Returner's frame is freed; Clink points to Returnee's frame which is resumed.
	
(ALink # CLink) in some frames which point to two different frames:
	Return to such a frame is ordinary
	Return from such a frame is special and should trap to a UFN.
		UFN must decrement usecnt(ALink) as the ALink reference is about to disappear
		
(usecnt>1) in some frames which are pointed to by more than one frame:
	Return from such a frame doesn't happen (because no running frame is ever pointed to)
	Return to such a frame is special and should trap to a UFN.
		UFN must copy the frame; decrement usecnt in original; usecnt←1 in new copy
		UFN returns to the new copy which hase usecnt=1 and noone points to it.

ISSUES


UFN arg count comes from where?


* Lexical closure

Masinter and vanMelle indicate that our current FVAR mechanism is sufficient to implement closures by forwarding variable references to closure frames.  The generalization is that FVAR binding pointers can point anywhere in system memory.  Stores to FVARS that don't point to the stack must be reference counted.

  
* Pre-assign stack frame buffers in memory for each processor frame?: NO

Assigning processor frames to memory frames in the free list at overflow time rather than pre-assigning memory frames simplifies and improves the latency of context switches where it is desirable to change free lists on context switch.  This design gives each context has its own safe pool of stack frame buffers.



Calls

Number of arguments is known at compile time except for APPLY which compiles closed today.

* Argument passing

Up to 6 arguments can be pushed on the stack before a call and the remaining arguments must be presented in a vector pushed on the stack.  The number of arguments (0-7) are copied from the callers stack to the callee's variables.  The callee has eight entry points for reformating different numbers of arguments.  In this way even evaluating default values for Common Lisp optional arguments can be efficiently compiled and executed.

PageFaults

The Memory Address register is pushed onto the stack by the PageFault microcode which then calls a ufn routine which effects a context switch from below the suspended frame.  Microcode that may fault must be sure to leave a valid TOS for a Fault Address to be push onto.


Traps

Traps are forced calls between instructions.  Generally the PC remains at the start of a trapping instruction.

ColdStart, Interrupt, PageFault, NoFreeFrames, EnterTrap


UFN

UFNs are substitute subroutines for some instruction in the instruction stream.  The PC is generally advanced as the UFN is called so that the instruction is skipped over on return.

* Free Variable Lookup

Free variable lookup is done by a UFN that scans up the stack looking at nametables until a match is found.  The inner loop of table search has an opcode assist.  If the variable is found in a processor cached frame that frame must be dumped before a bind pointer can be created that points to it.  (We may or may not want to push frames out of cache just to read the nametable pointer).  

An invariant to be maintained is that a bind pointer should point to memory and never into a cached stack frame.  Bell observed that (with suitable conventions) if Lookup pushes the specvar value out of the cache, then it will not return during the life of the frame that did the lookup.

* Pointers to cached stack frames

It should be a rule that frames with pointers to them should not be allowed in the frame cache.  Generally this condition is easy to maintain. StackP pointers to a frame are counted in UseCnt. Frames with UseCnt>1 are copied on load (EntryTrap).  FVAR Bind Pointers are accounted for in the paragraph above about free variable lookup.

We must be CAREFUL when extra frames are force loaded other than to return to.  We should be careful with any fancy multivalue return protocol that could use FVars in returning frame after forcing in caller.

MYCLINK opcode must force out the frame that it will point to.

  
* Context switching
  Almost for free we can give each context its own frame free list.
  The frame over flow trap can let groups of contests share frames among themselves.
	
SPEED CALCULATIONS
I estimate that on a well tuned Tamarin1, Interlisp-D programs will run 3 times faster than currently on a DLion.  The speed estimate assumes certain chip cycle times and certain memory cycle times and assumes a dynamic instruction mix. 

DYNAMIC INSTRUCTION STATS 
Statistics collectd June 81 by Dorado PC sampling of program "TEST".

			--- DLion ---	---------------	Tamarin ------------------------
	opcode	freq	time	prod	x100ns	cycles	cProd	mems x 500ns	Qmem x 500+250ns

Load Immediate	29	2.2	63	29	1	29	
Load/Store	28	4	117	40	1.1	25	 3	15
Read/Write	13	5.5	73     125	4.5	59	13	65
Jumps		12	3	37	45	-	10.5	-	 -	 7-	35
Arithmetic	 7	3	21	11	1.5	10.5
Call (4 PVars)	 6	19	114    101	3	18	 3	15	 9+9	68
Cadr/Type	 6	6	38	27	0.3	 2	 5	25	
Bind		 2	7?	14 	 8	4	 8	
IFU (10% miss) (10)	0.7	70	50					10-	50
Stk Flush (10%)	 0.6	-		32	3	 2			 6-	30
GC		(2)	20?	40	30		20?	 2?	10	
FreeVarFrame	 1	30	30	53		 5	 2	10	 5+5	38
------------	---	--	---	--	--	--	--	--	--	--	
TOTAL		102	5.5	577    550ns	       189	       140	       221

Chance of a memory instruction colliding with an immediately following memory cycle:
Jump Refill	 8%
Call		 3%
UFNs		 1%
Cadr		 5%
Read/Write	 0%
IFU Refill	10%
StackOver/Under	 1%

TOTAL		27%

	
SUMMARY Machine	cycles	instr	KIPS	BITBLT	BLT	Memory bandwidth cache/frame/ifu
	DLion	2.4MHz/ 5.5 =	 400	1MB/s	2MB/s	5 MBytes/sec
	Dorado	16 MHz/ 15  =   1000	5MB/s	10MB/s	32 MBytes/sec
	Dragon	10 MHz/ 5   =	2000	3MB/s	4MB/s	80 MBytes/sec (2 caches, 2 fast regs)
	Tamarin	10 MHz/ 6   =	1700	10MB/s	16MB/s	32 MBytes/sec (nibble mode)



APPENDIX A	DETAILS OF DYNAMIC INSTRUCTION STATS:

			--- DLion --	-------	Tamarin -------
*XOP	opcode	freq	time	prod	opcode	time	tProd	mem	mProd	Qmem	QmProd

Load Immediate	29	2.2	63		1	29	
	SIC,0,1	15	2	30	SCn	1	15
	SICX	5	3	15	SICX	1	05
	NIL	1	2	02	NIL	1	01
	COPY	3	2	06	COPY	1	03
	POP	5	2	10	POP	1	05
	GCONST	?	3		GCONST	1
	
Load/Store	28	4	117		1.1	25	3			
	IVARn	11	4	50	VARn	1	11
	PVARn	8	4	36	VARn	1	08
	PVAR←	1	4	04	VARn←	1	01
	PVARD←	1	4	04	VARn←↑	1	01
	PVARX	3	4	13	VARX	1	03			
	IVARX←	1	4	05	VARX←	1	01			
	GVAR	3	5	15	GVAR	0	 0	1	3			5		
	IVARX	-	4		IVARX	1					
	PVARX←	-	4		PVARX←	1
	
Read/Write	13	5.5	73		 4.5	59	1	13
	GETBASE	2	3	06	GBP, GF	 3	 6	1	 2
*	GETBYTE	3	4.5	13  RS,AB,GP,PF	 6	18		
	GETPTR	0.3	4	01	GETPTR	 1.5	 0.5	1	 		
	GETBITS	1?	5	05	GBP,GF	 3	 3	1	 1
*	PUTBASE	1	4	04   GBP,PF,PBP	 4.5	 4.5	2	 2
*	PUTBYTE	3	9	27RS,AB,GP,PF,PP 7.5	22.5	2	 6
	PUTPTR	2	5	10	PUTPTR	 1.5	 3	1	 2
*	PUTBITS	-	7	07   GBP,PF,PBP  4.5		2	
        ADDBASE	1	3	03	ADD	 1.5	 1.5	

Jumps		12	3	37	-	-	10.5	-	0	-	7	
	JUMP	-			NOPn	1
	JUMPX	1	2	02	JUMPX	1	 1				1	1
	JUMPXX	1	3	03	JUMPXX					1	1
	T/FJUMP	6	4	24	T/FJUMP	1/2	 3			1/2	3
        NTFJMPX	4	4	08	NTFJMPX	1/2	 2			1/2	2
	EQ	(3)	3	09	EQ	1.5	 4.5		
        T/FJMPX	-
	
Arithmetic	7	3	21		1.5	10.5
      IPLUS,DIF	2	3	06	PLUS	1.5	 3
*	AND	2	3	06	AND	1.5	 3	
*	VAG2	1	3	03	?	1?	 1
       LO/HILOC	1	3	03	?	1?	 1
      IGREATERP 1	3	03	GREATERP2.5	 2.5
	SHIFT	(2)	2	(06)	SHIFT	1.5	(2)
	
Call (4 PVars)	6	19	114		3	18	0.5	3	1.5	9
	FN1	1	22+8	30	FN1	4	 4	1	1	2	2
	FN2	1	22+8	30	FN2	5	 5	1	1	2	2
	FNX	1	22+8	30	FNX	6	 6	1	1	2	2
	RETURN	3	8	24	RETURN	1	 3			1	3
		
Cadr/Type	6	6	38		0.3	2	1	5	
	CAR	2	7	14	CAR	0		1	2
	CDR	2	8	16	CDR	0		1	2
*	TYPEP	1	4	04	TYPEP	1	1	1	1
	LISTP	1	4	04	LISTP	1	1	


BIND (2NIL 1PV)	2	7?	14 V←↑,N,V←,V←↑	4	8	
(D)UNBIND	(1)	(1)	   UB,V←,V←,V←↑ 4	8

	
GC		(2)	20?	40		20	40
*	RPLPTR	-	?		
*	SCAN1	-	?
*	SCAN2	-	?

FreeVarFrame	1	30	30			5		2		5
	names  .1*10*15	1.5	22						1/4	4
	frames	.1*10?	7	07		5	5	2	2	1	1
	
BLT 32 bit step	?	DLion	Bandwidth     refs/cycle Tamarin w/o BitBlt chip		
	BITBLT REPLACE	8	10 Mbps		3	5+2QM	? Mbps
	BITBLT XOR	8	10 Mbps		3	4+2QM	? Mbps
	BLT		4       20 Mbps		3	2QM	? Mbps






STATISTICS

Frame cache predictions  Bell date?
     #Frames  %Hit 1  % Hit 2  %Miss1  %Miss2
     -------  ------  -------  ------  ------
	1	50	60	50	40
	2	75	80	25	20
	3	80	90	20	10
	4	85	95	15	 5
	5	88	96	12	 4
	6	90	97	10	 3
	7	91	97 	 9	 3
	8	92	98	 8	 2


Frame Dumping Time: Tamarin VS Dragon

Parameter	Dragon	Tamarin  300ns cycle, 600 quad word
---------	------	-------
Call/Ret	 1.6	4 us 	= 1@ + 1@4 + 3 + refill  +  1 + refill  = .3+.6+.6+1.2 + .3+ 1.2
Dump/Load	20 us   8 us 	= (2@4+3(12/4)@4+3)*2 = 10@.6+1.8 + 8
Call time(32i)  16 us  20-30
miss rate      <10 %   <20 %
Dumps		2/16	2/20
Overhead       <12 %   <10 %


7480 BIND instructions analysed to show a static average of 1 PVar and 2 Nils being bound.

#PVar  Cnt   %
-----  ---  ---
   0  1825 24.10
   1  3543 47.36 
   2  1379 18.44
   3   506  6.76   
   4   129  1.72
   5    45   .60
   6    22   .30
   7    12   .16
   8     6   .08
   9     2   .02
  10     2   .02
  11     4   .06
  12     1   .00
  13     2   .02
  15     2   .02
  
#Nils  Cnt   %   Ave
-----  ---  ---  ---
   0  2440 32.62 
   1  1591 21.24 .21
   2  1119 14.96 .28
   3   953 12.74 .36
   4   468  6.26 .24
   5   223  2.98 .15
   6   154  2.06 .12
   7   115  1.54 .11
   8    81  1.08  8
   9    66   .88  8
  10    49   .66  7
  11    30   .40  4
  12    24   .32  4
  13    20   .26  3
  14    21   .28  4
  15   126  1.68 25
            ---- --
	        210

Static Frame Sizes 	Purcell Oct 84 <LISPCORE>NEXT>FULL.SYSOUT

With 16 Registers for IVars and PVars 2% (16/962) of frames will have Extenstions.
With 20 Registers for IVars and PVars 1% (6/962) of frames will have Extenstions.

#IVars+PVars	Cnt
------------	---
	0	  3	  
	1	 15	 15
	2	130	260
	3	103	309
	4	116	464
	5	119	595
	6	120	720
	7	 69	483
	8	 79	632
	9	 42	378
	10	 47	470
	11	 34	373
	12	 31	372
	13	 10	130
	14	 12	168
	15	 10	150
	16        7	112
	20	 10	200
	34	  4	136
	61	  1	 61
	81	  1	 81
		---	---
		962    6099/962 = 7
			
Static Stack depth by printcode (with compiler removing pops)	 Purcell April 85
7883 Functions analyzed  (results rounded)

    StackSize % F's   % Miss
    --------- -----   ------
     	 1	 4	96
	 2	11	84
	 3	15	68
	 4	20	48
	 5	15	34
	 6	10	24
	 7	 7	17
	 8	 5	13
	 9	 3	 9
	 10	 2	 7
	 11	 1	 5
	 12	 1	 4
	 13	 1	 3
	 14	 0.8	 2
	 15	 0.4	 2
	 16	 0.3	 1
	 20	 0.1	 0.6	
	 24	 0.08	 0.4
 	 28	 0.05	 0.2
	 32	 0.02	 0.1
	 37	(MAX)	 0.00
 

BitBlt performance limit:
 3 quads * 500ns/q + 8 cy * 200ns/cy = 3200ns/32x4bits = 25ns/bit = 5MB/sec
 
Expected number of args (Static) passed is <2
	0.50  FN0	0
	2.48  FN1	2.48
	2.85  FN2	5.70
	0.90  FN3	2.70
	0.24  FN4	0.98
	0.27  FNX	1.8
	----		----
	7.24	       13.66/7.24 = 2

Static instruction statistics collectd Purcell Oct 84 from <LISPCORE>NEXT>FULL.SYSOUT.
576760 Instructin analysed

OpCode Cnt    %   Name
------ ---   ---  ----
   0  7516  1.30  -X-
   1 14683  2.55  CAR
   2 17652  3.06  CDR
   3  4241   .74  LISTP
   4  1151   .20  NTYPX
   5  2357   .41  TYPEP
   6 10034  1.74  DTEST
  10  2869   .50  FN0
  11 14305  2.48  FN1
  12 16430  2.85  FN2
  13  5183   .90  FN3
  14  1408   .24  FN4
  15  1533   .27  FNX
  16   814   .14  APPLYFN
  17   476   .08  CHECKAPPLY*
  20 12474  2.16  RETURN
  21  7480  1.30  BIND
  22  1626   .28  UNBIND
  23  1439   .25  DUNBIND
  24  4160   .72  RPLPTR.N
  25    39   .00  GCREF
  27  1524   .26  GVAR←
  30   753   .13  RPLACA
  31   600   .10  RPLACD
  32 12405  2.15  CONS
  33   501   .09  GETP
  34   679   .12  FMEMB
  35   199   .03  GETHASH
  37   511   .09  CREATECELL
  40   592   .10  BIN
  44     3   .00  DOCOLLECT
  45     3   .00  ENDCOLLECT
  46   554   .10  RPLCONS
  50   463   .08  ELT
  51   143   .02  NTHCHC
  52   165   .03  SETA
  53    27   .00  RPLCHARCODE
  54    39   .00  EVAL
  55    38   .00  EVALV
  57     6   .00  STKSCAN
  73     1   .00  \MU.DRAWLINE
  76     2   .00  RAID
 100 24170  4.19  IVAR
 101 11723  2.03  IVAR
 102  5090   .88  IVAR
 103  2376   .41  IVAR
 104  1068   .19  IVAR
 105   599   .10  IVAR
 106   292   .05  IVAR
 107   547   .09  IVARX
 110 15235  2.64  PVAR
 111 10040  1.74  PVAR
 112  6955  1.21  PVAR
 113  5085   .88  PVAR
 114  4082   .71  PVAR
 115  3388   .59  PVAR
 116  2723   .47  PVAR
 117 17122  2.97  PVARX
 120  1395   .24  FVAR
 121  1243   .22  FVAR
 122  1498   .26  FVAR
 123  1272   .22  FVAR
 124  1153   .20  FVAR
 125   807   .14  FVAR
 126   602   .10  FVAR
 127  4994   .87  FVARX
 130  1847   .32  PVAR←
 131  2005   .35  PVAR←
 132  1837   .32  PVAR←
 133  1127   .20  PVAR←
 134   943   .16  PVAR←
 135   799   .14  PVAR←
 136   627   .11  PVAR←
 137  7441  1.29  PVARX←
 140 10180  1.77  GVAR
 141    80   .01  ARG0
 142  2669   .46  IVARX←
 143  2910   .50  FVARX←
 144 19875  3.45  COPY
 145    66   .01  MYARGCOUNT
 146    42   .00  MYALINK
 147 22169  3.84  ACONST
 150 13028  2.26  'NIL
 151  6558  1.14  'T
 152  8159  1.41  '0
 153  7523  1.30  '1
 154 12587  2.18  SIC
 155   537   .09  SNIC
 156  2435   .42  SICX
 157  6123  1.06  GCONST
 161    16   .00  READFLAGS
 162    13   .00  READRP
 163    27   .00  WRITEMAP
 164     1   .00  READPRINTERPORT
 165     1   .00  WRITEPRINTERPORT
 166    15   .00  PILOTBITBLT
 167    18   .00  RCLK
 170    15   .00  MISC1
 171    14   .00  MISC2
 172     2   .00  RECLAIMCELL
 173     1   .00  GCSCAN1
 174     1   .00  GCSCAN2
 175    28   .00  SUBRCALL
 176    27   .00  CONTEXTSWITCH
 200   779   .14  JUMP
 201   656   .11  JUMP
 202  1667   .29  JUMP
 203   397   .07  JUMP
 204   293   .05  JUMP
 205   298   .05  JUMP
 206   300   .05  JUMP
 207   239   .04  JUMP
 210   276   .05  JUMP
 211   175   .03  JUMP
 212   186   .03  JUMP
 213   231   .04  JUMP
 214   208   .04  JUMP
 215   266   .05  JUMP
 216   139   .02  JUMP
 217   173   .03  JUMP
 220    82   .01  FJUMP
 221  1318   .23  FJUMP
 222  1873   .32  FJUMP
 223  1691   .29  FJUMP
 224  1309   .23  FJUMP
 225  1311   .23  FJUMP
 226  1190   .21  FJUMP
 227  1123   .19  FJUMP
 230   829   .14  FJUMP
 231   803   .14  FJUMP
 232   650   .11  FJUMP
 233   579   .10  FJUMP
 234   448   .08  FJUMP
 235   510   .09  FJUMP
 236   453   .08  FJUMP
 237   360   .06  FJUMP
 240    70   .01  TJUMP
 241  1357   .24  TJUMP
 242  1957   .34  TJUMP
 243   621   .11  TJUMP
 244   751   .13  TJUMP
 245  1280   .22  TJUMP
 246   710   .12  TJUMP
 247   553   .10  TJUMP
 250   541   .09  TJUMP
 251   394   .07  TJUMP
 252   295   .05  TJUMP
 253   388   .07  TJUMP
 254   314   .05  TJUMP
 255   250   .04  TJUMP
 256   258   .04  TJUMP
 257   168   .03  TJUMP
 260  8277  1.44  JUMPX
 261  5829  1.01  JUMPXX
 262  8781  1.52  FJUMPX
 263  4941   .86  TJUMPX
 264  3209   .56  NFJUMPX
 265  7612  1.32  NTJUMPX
 270  1024   .18  PVAR←↑
 271  1369   .24  PVAR←↑
 272  1278   .22  PVAR←↑
 273  1127   .20  PVAR←↑
 274   976   .17  PVAR←↑
 275   775   .13  PVAR←↑
 276   726   .13  PVAR←↑
 277 32090  5.56  POP
 302   355   .06  GETBASEBYTE
 304   154   .03  BLT
 307   220   .04  PUTBASEBYTE
 310  4874   .85  GETBASE.N
 311  8764  1.52  GETBASEPTR.N
 312  1389   .24  GETBITS.N.FD
 315  2150   .37  PUTBASE.N
 316   216   .04  PUTBASEPTR.N
 317   921   .16  PUTBITS.N.FD
 320  2513   .44  ADDBASE
 321   869   .15  VAG2
 322   262   .05  HILOC
 323   571   .10  LOLOC
 324   154   .03  PLUS2
 325   167   .03  DIFFERENCE
 326    58   .01  TIMES2
 327    14   .00  QUOTIENT
 330  6215  1.08  IPLUS2
 331  4312   .75  IDIFFERENCE
 332   473   .08  ITIMES2
 333   319   .06  IQUOTIENT
 334    58   .01  IREMAINDER
 340   948   .16  LLSH1
 341   473   .08  LLSH8
 342   989   .17  LRSH1
 343   495   .09  LRSH8
 344   238   .04  LOGOR2
 345  1710   .30  LOGAND2
 346   120   .02  LOGXOR2
 350   120   .02  FPLUS2
 351    67   .01  FDIFFERENCE
 352   250   .04  FTIMES2
 353    98   .02  FQUOTIENT
 355    27   .00  UBFLOAT1
 360 19235  3.34  EQ
 361  4614   .80  IGREATERP
 362    54   .00  FGREATERP
 363   215   .04  GREATERP
 365   466   .08  MAKENUMBER
 366   139   .02  BOXIPLUS
 367    25   .00  BOXIDIFFERENCE
 375   433   .08  SWAP