TamDesign.memo Purcell D*R*A*F*T 14-Aug-85 20:04:32
This document describes internal data structures used in the Tamarin-1 hardware and low-level software. These data structure are also designed to coexist with a version of Interlisp-D on the D-Machines for software development and hardware simulation.
POINTER FORMAT
The interim word size will be 32 bits (expandable to 40 only on Tamarin hardware). There are three pointer formats in memory and one more on the stack:
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
|0 0|SUBTYP|0 0 0| PTR |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
|0 1|S INTEGER: 30-bit 2's Complement |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
|1|STY|F F F F F| Stack Block Marker |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
|1|S| EXPONENT | MANTISSA |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
TYP This field describes the basic type of the pointer.
The tags were chosen to ease transition from D-Machine to TamOps.
The taging choice for basic types might be reexamined after 32 bit conversion.
00: Pointer - subtype valid
01: 30 bit 2's complement Integers (to one billion)
1X: Stack markers for D-Machine (unused by Tamarin, alias to Tamarin Floatp's)
1X: 31 bit Immediate Floating point number 10↑-19 to 10↑19 (unused by D-Machine)
SUBTYP When TYP=00 this field describes the exact type of some objects.
00,XXX1XX: Reserved for type or address expansion
00,XXXX1X: Reserved for type or address expansion
00,XXXXX1: Reserved for type or address expansion
00,000000: ObjectP backwards compatible pointer. Type is in usual type table.
00,001000: UserListP (ListP returns true for user object)
00,010000: ListP (else CAR/CDR trap)
00,011000: CodeP (else FN traps to interpreter)
00,100000: AtomP (else apply traps)
00,101000: StackP (else FVAR← traps)
00,110000: NumberP (commonlisp EQL must trap)
00,111000: UnboundP (traps FVAR)
00,111100: IndirectP (var reference treated as indirect)
01,SXXXXX: IntegerP
1SEEEEEEE: FloatP
100,XXXFF: Basic Frame
101,XXXXX: Free block
110,FFFFF: Frame Extension with flags
111,XXXXX: Guard Block
PTR This field contains a (usually even) pointer to 16 bits of memory.
This pointer can address 32MB and be expanded to 256MB.
S The sign bit for either IntegerP's or FloatP's.
FF Stack Block (PTR=stackBlock) .
This pattern marks the start of a DMachine stack block
EXPONENT is the 7-bit excess 64 exponent, like a shortened IEEE exponent.
MANTISSA is an IEEE-like 23 bit mantissa with an assumed leading 1.
INTEGER is a 30-bit 2's complement integer.
STACK STRUCTURE
Stack Frame (dumped in memory)
HDR *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
-1: | NEXT |
*=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*
0: |T|S|N|M|L|C|F|R| (MAXVAR) | SP | USECOUNT |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| NAMETABLE |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| CODE Base |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| PC |
*=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*
| ALINK |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| CLINK |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| EXTENSION |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| VAR0 |
| ... |
MAXVAR: | VARi |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
SP: | STK0 |
| ... |
~40: | STKj |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
NEXT Frames remain chained down the stack and into the free list by NEXT pointers.
Spagetti manipulation rearranges the linear threads but Dump/Restore does not.
FLAGS * = used by microcode
*T: Trap on Entry
*S: Slow Return ALINK # CLINK
?N: No-Push: Don't push result when returned to
?M: Multiple Values accepted
L: Large frame extension
C: InCall ??
F: Fast: binds no variables
R: Reserved for expansion (default 0)
SP The Stack Pointer indicates the top item on the stack in the internal frame.
MAXVAR The MAXVAR field could be used for optionally checking stack underflow.
USECNT The Usecnt field indicates if this frame is pointed to by any other frames or Stack pointers
CLINK This field point sto the preveious frame in system memory.
The CLINK field is only valid for the shallowest frame in the processor.
Hence, there is only one CLINK register in the processor even though there are more frames.
ALINK This field point to the frame which denotes the previous access environment.
If this field is UNBOUND, then the CLink should be used for the ALink.
NAMETABLE
This points to the nametable of the function.
If it is unbound then the function binds no names.
If it is unbound then the function should not be searched on free variable lookup.
The field is copied from the compiled function header.
Vars This section contains the IVars, PVars, and FVars.
Each is indexed from a single offset.
This field can contain indirect pointers.
Stk This section contains the working stack.
The start of the stack is determined by the number of IVars and PVars.
Function Definition Cell:
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| "CodeP" | pointer to CodeP |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
Function Header:
CodeP: *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| FLAGS | (MAXVAR) | SP | USECOUNT |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| NAMETABLE |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| ENTRY 0 | ENTRY 1 | ENTRY 2 | ENTRY 3 |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| ENTRY 4 | ENTRY 5 | ENTRY 6 | ENTRY 7 |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| "atom" | FUNCTION NAME |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| unused |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| unused |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
|int| | NTSIZE | NLOCALS | FVAROFFSET |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| "atom" | Var Name |
| ... | ... |
| "atom" | Var Name |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
|int| Offset |
|int| ... |
|int| Offset |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
ENTRY0: | Compiled Code |
| |
| ENTRY2 |
| ENTRY1 |
| ... |
| |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
Some Processor Unique Registers:
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| T |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| NIL |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| 0 |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| 1 |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| UNBIND |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| Val Space Base |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| FnDef Space Base |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| Type Table Base |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| FREE: Stack Frame Free List |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| CLINK |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| |
Global variables (GVARs) used often by UFNS could be stored in unique registers.
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
| Memory Address Reg |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
SOFTWARE DEVELOPMENT STRATEGY
We are building a lisp.sysout with data structures compatible with both reduced D-MachineOps and TamarinOps. We believe the load up can be done on a Dorado with certain opcodes disabled. We think all the opcode changes can be done in Lisp in the Unimplemented Function (UFN) code rather than by new microcode. Changes are as follows:
* Microcode work
New traps to UFN
Opcodes to trap on new Immediate data types
TYPE, LISTP, EQ
arithmetic already ok
Odd pointer traps?
Disable some opcodes
CONS, RPLACD, RPLACA, BIN
Change some bit packing in 24 vs 32 bit pointers
FN checks for compiled flg in Function Defn cell
{ 32 bit pointer fields: GETBASEPTR32, BIN
CDR coding off: CAR, CDR, CONS, RPLCONS
New Type subroutine due to Immediate 30-bit integers: LISTP, NTYPX,TYPEP,DTEST
Immediate 30-bit integers coexisting with SMALLPs: EQ
}
* Call and Return across worlds UFNs (Interpreter)
First Observation: We can skip control transfers across worlds by bringing up a well ordered set of Tamarin Functions, and implementing only one (non reentrant) entry to the Tamarin Emulator. We chose to skip fancy control transfers across worlds because it is tricky and we don't have good ways to do free variable look-up across worlds.
* CodeP bit in Function Defn Cells
The most significant bit (compiled flg) in Function Defn cell can remain. In the set of Tamarin 32-bit objects, Tamarin CodeP's and D-Machine CodeP's are disjoint. PUTD will preserve Tamarin CodeP's and turn on the "FunDefCell msb" for D-Machine CodeP's. The D-Machine will execute FunDefCells iff they contain D-Machine CodeP's, and Tamarin will executed FunDefCells iff they contain Tamarin CodeP's.
* Mixed immediate integer implementations
DLisp is efficient for SMALLP. UFNS can implement TamINTEGER in terms of SMALLP
TamOps is efficient for TamINTEGER. UFNS can implement SMALLP in terms of TamINTEGER
LispOps already traps arithmetic on non SMALLPs; only EQ needs work.
EQ on both machines must be more careful to make the two representations of SMALLP equal.
Limits
FreeVar lookup across boundary.
Context switch to Keyboard (in TamOps?)
Microcode work
New traps to UFN
Opcodes to trap on new Immediate data types
TYPE, LISTP, EQ
arithmetic already ok
Odd pointer traps?
Disable some opcodes
CONS, RPLACD, RPLACA, BIN
Change some bit packing in 24 vs 32 bit pointers
FN checks for compiled flg in Function Defn cell
Change Stack block types to make room for immediate types
DESIGN DISCUSSION
The following subset of features is the goal for a first Tamarin machine:
YES * 32 or 40 bit word size (vs 16)
YES * Tagged architecture
BOTH * 40 vs. 32 bit architecture
DEFER * 4-bit Reference Count hardware
DEFER * Cdr coding
IF EASY * Indirect Pointers on the stack
YES * Stack format changes
YES * Bind/Unbind
YES * Opcode changes
DEFER * Prolog
STUDY * CommonLisp
DEFER * I/O changes
Some choices not emphasised in Alan's June 17 memo are:
DEFER * 32 bit word addressability rather than the current 16 bit addressability
Apart from the width of the machines data paths, we can choose the meaning of the least significant bit (LSB) of addresses. Most instructions move 32 bit quantities and most but not all addresses point to 32 bit alligned quantities. It is awkward to support odd addresses that point to 16 bit quanties. Even if we intend to change the address conventions, a transition period could be accomodated if the LSB meaning were left unchanged until most system software had been changed to never use odd addresses. During the transition period the use of odd addresses could be unsupported by anything more than a trap to a UFN thereby retaining odd addresses with a severe speed penalized. During the transition period we could do more staging on the Dandelion and even install the "odd address penalty/detection" first for perfomance tuning the speed critical code, and secondly to detect and remove all odd addressing.
The hardest change to lisp in the subset is the new stack design and the many issues it involves:
* Variables in a Fixed Size Stack Frame
If a funtion's PVARs cannot all fit in the limited processor frame then a frame extension is allocated in memory at function entry and either freed at exit or garbaged collected later. These extensions would be from a pool that is treated like stack for reference count purposes. That is, pointers in the extensions are not reference counted and the pool is scanned like the stack at GC time.
* Evaluation Stack in a Fixed Size Stack Frame
Statistics indicate 99% of functions need only 16 cells of evaluation stack. Tamarin can have more stack with less variables or less stack and more variables, but there is an upper limit on stack. The compiler will be responsible for limiting stack depth. Deep stacks (beyond about 32) seem to be rare enough that we can tempararily call them compile errors. Later, techniques for moving stack variables into the frame extension or function splitting can be implemented.
* Spagetti Stack: Merging Basic Frames and Frame Extensions
Where we currently copy only frame extensions we will copy whole frames. We believe the slight change in sematics is acceptable to us and our customers. We should try this change on a DMachine
Basic Frame vs Frame eXtension
* Spagetti Stack: Slow Return
There are three cases of slow return in present Interlisp-D:
(1) ALINK # CLINK The returning frame executes a RETURN which UFNs upon detecting "Slow" bit. Tamarin will also UFN the RETURN opcode when the S (Slow) flag is set in the frame header.
(2) UseCnt(caller) > 1: The returner can return but before the Caller can resume it must be copied. Tamarin will Trap on entering the caller when the T (Trap on entry) flag is set in the frame header.
(3) No stack space ahead of caller. Tamarin stack frames are heap allocated so this is only a case of "Stack Frame FreeList Empty"
Tamarin handles Spagetti Stack with Traps just before and just after RETURN.
All Calls are ordinary.
Caller is suspended; new stack frame is instanciated for the Callee (in processor and/or memory).
Normal Frames point only to their caller and are pointed to only by one "callee".
Return from such a frame to such a frame is ordinary
Returner's frame is freed; Clink points to Returnee's frame which is resumed.
(ALink # CLink) in some frames which point to two different frames:
Return to such a frame is ordinary
Return from such a frame is special and should trap to a UFN.
UFN must decrement usecnt(ALink) as the ALink reference is about to disappear
(usecnt>1) in some frames which are pointed to by more than one frame:
Return from such a frame doesn't happen (because no running frame is ever pointed to)
Return to such a frame is special and should trap to a UFN.
UFN must copy the frame; decrement usecnt in original; usecnt←1 in new copy
UFN returns to the new copy which hase usecnt=1 and noone points to it.
ISSUES
UFN arg count comes from where?
* Lexical closure
Masinter and vanMelle indicate that our current FVAR mechanism is sufficient to implement closures by forwarding variable references to closure frames. The generalization is that FVAR binding pointers can point anywhere in system memory. Stores to FVARS that don't point to the stack must be reference counted.
* Pre-assign stack frame buffers in memory for each processor frame?: NO
Assigning processor frames to memory frames in the free list at overflow time rather than pre-assigning memory frames simplifies and improves the latency of context switches where it is desirable to change free lists on context switch. This design gives each context has its own safe pool of stack frame buffers.
Calls
Number of arguments is known at compile time except for APPLY which compiles closed today.
* Argument passing
Up to 6 arguments can be pushed on the stack before a call and the remaining arguments must be presented in a vector pushed on the stack. The number of arguments (0-7) are copied from the callers stack to the callee's variables. The callee has eight entry points for reformating different numbers of arguments. In this way even evaluating default values for Common Lisp optional arguments can be efficiently compiled and executed.
PageFaults
The Memory Address register is pushed onto the stack by the PageFault microcode which then calls a ufn routine which effects a context switch from below the suspended frame. Microcode that may fault must be sure to leave a valid TOS for a Fault Address to be push onto.
Traps
Traps are forced calls between instructions. Generally the PC remains at the start of a trapping instruction.
ColdStart, Interrupt, PageFault, NoFreeFrames, EnterTrap
UFN
UFNs are substitute subroutines for some instruction in the instruction stream. The PC is generally advanced as the UFN is called so that the instruction is skipped over on return.
* Free Variable Lookup
Free variable lookup is done by a UFN that scans up the stack looking at nametables until a match is found. The inner loop of table search has an opcode assist. If the variable is found in a processor cached frame that frame must be dumped before a bind pointer can be created that points to it. (We may or may not want to push frames out of cache just to read the nametable pointer).
An invariant to be maintained is that a bind pointer should point to memory and never into a cached stack frame. Bell observed that (with suitable conventions) if Lookup pushes the specvar value out of the cache, then it will not return during the life of the frame that did the lookup.
* Pointers to cached stack frames
It should be a rule that frames with pointers to them should not be allowed in the frame cache. Generally this condition is easy to maintain. StackP pointers to a frame are counted in UseCnt. Frames with UseCnt>1 are copied on load (EntryTrap). FVAR Bind Pointers are accounted for in the paragraph above about free variable lookup.
We must be CAREFUL when extra frames are force loaded other than to return to. We should be careful with any fancy multivalue return protocol that could use FVars in returning frame after forcing in caller.
MYCLINK opcode must force out the frame that it will point to.
* Context switching
Almost for free we can give each context its own frame free list.
The frame over flow trap can let groups of contests share frames among themselves.
SPEED CALCULATIONS
I estimate that on a well tuned Tamarin1, Interlisp-D programs will run 3 times faster than currently on a DLion. The speed estimate assumes certain chip cycle times and certain memory cycle times and assumes a dynamic instruction mix.
DYNAMIC INSTRUCTION STATS
Statistics collectd June 81 by Dorado PC sampling of program "TEST".
--- DLion --- --------------- Tamarin ------------------------
opcode freq time prod x100ns cycles cProd mems x 500ns Qmem x 500+250ns
Load Immediate 29 2.2 63 29 1 29
Load/Store 28 4 117 40 1.1 25 3 15
Read/Write 13 5.5 73 125 4.5 59 13 65
Jumps 12 3 37 45 - 10.5 - - 7- 35
Arithmetic 7 3 21 11 1.5 10.5
Call (4 PVars) 6 19 114 101 3 18 3 15 9+9 68
Cadr/Type 6 6 38 27 0.3 2 5 25
Bind 2 7? 14 8 4 8
IFU (10% miss) (10) 0.7 70 50 10- 50
Stk Flush (10%) 0.6 - 32 3 2 6- 30
GC (2) 20? 40 30 20? 2? 10
FreeVarFrame 1 30 30 53 5 2 10 5+5 38
------------ --- -- --- -- -- -- -- -- -- --
TOTAL 102 5.5 577 550ns 189 140 221
Chance of a memory instruction colliding with an immediately following memory cycle:
Jump Refill 8%
Call 3%
UFNs 1%
Cadr 5%
Read/Write 0%
IFU Refill 10%
StackOver/Under 1%
TOTAL 27%
SUMMARY Machine cycles instr KIPS BITBLT BLT Memory bandwidth cache/frame/ifu
DLion 2.4MHz/ 5.5 = 400 1MB/s 2MB/s 5 MBytes/sec
Dorado 16 MHz/ 15 = 1000 5MB/s 10MB/s 32 MBytes/sec
Dragon 10 MHz/ 5 = 2000 3MB/s 4MB/s 80 MBytes/sec (2 caches, 2 fast regs)
Tamarin 10 MHz/ 6 = 1700 10MB/s 16MB/s 32 MBytes/sec (nibble mode)
APPENDIX A DETAILS OF DYNAMIC INSTRUCTION STATS:
--- DLion -- ------- Tamarin -------
*XOP opcode freq time prod opcode time tProd mem mProd Qmem QmProd
Load Immediate 29 2.2 63 1 29
SIC,0,1 15 2 30 SCn 1 15
SICX 5 3 15 SICX 1 05
NIL 1 2 02 NIL 1 01
COPY 3 2 06 COPY 1 03
POP 5 2 10 POP 1 05
GCONST ? 3 GCONST 1
Load/Store 28 4 117 1.1 25 3
IVARn 11 4 50 VARn 1 11
PVARn 8 4 36 VARn 1 08
PVAR← 1 4 04 VARn← 1 01
PVARD← 1 4 04 VARn←↑ 1 01
PVARX 3 4 13 VARX 1 03
IVARX← 1 4 05 VARX← 1 01
GVAR 3 5 15 GVAR 0 0 1 3 5
IVARX - 4 IVARX 1
PVARX← - 4 PVARX← 1
Read/Write 13 5.5 73 4.5 59 1 13
GETBASE 2 3 06 GBP, GF 3 6 1 2
* GETBYTE 3 4.5 13 RS,AB,GP,PF 6 18
GETPTR 0.3 4 01 GETPTR 1.5 0.5 1
GETBITS 1? 5 05 GBP,GF 3 3 1 1
* PUTBASE 1 4 04 GBP,PF,PBP 4.5 4.5 2 2
* PUTBYTE 3 9 27RS,AB,GP,PF,PP 7.5 22.5 2 6
PUTPTR 2 5 10 PUTPTR 1.5 3 1 2
* PUTBITS - 7 07 GBP,PF,PBP 4.5 2
ADDBASE 1 3 03 ADD 1.5 1.5
Jumps 12 3 37 - - 10.5 - 0 - 7
JUMP - NOPn 1
JUMPX 1 2 02 JUMPX 1 1 1 1
JUMPXX 1 3 03 JUMPXX 1 1
T/FJUMP 6 4 24 T/FJUMP 1/2 3 1/2 3
NTFJMPX 4 4 08 NTFJMPX 1/2 2 1/2 2
EQ (3) 3 09 EQ 1.5 4.5
T/FJMPX -
Arithmetic 7 3 21 1.5 10.5
IPLUS,DIF 2 3 06 PLUS 1.5 3
* AND 2 3 06 AND 1.5 3
* VAG2 1 3 03 ? 1? 1
LO/HILOC 1 3 03 ? 1? 1
IGREATERP 1 3 03 GREATERP2.5 2.5
SHIFT (2) 2 (06) SHIFT 1.5 (2)
Call (4 PVars) 6 19 114 3 18 0.5 3 1.5 9
FN1 1 22+8 30 FN1 4 4 1 1 2 2
FN2 1 22+8 30 FN2 5 5 1 1 2 2
FNX 1 22+8 30 FNX 6 6 1 1 2 2
RETURN 3 8 24 RETURN 1 3 1 3
Cadr/Type 6 6 38 0.3 2 1 5
CAR 2 7 14 CAR 0 1 2
CDR 2 8 16 CDR 0 1 2
* TYPEP 1 4 04 TYPEP 1 1 1 1
LISTP 1 4 04 LISTP 1 1
BIND (2NIL 1PV) 2 7? 14 V←↑,N,V←,V←↑ 4 8
(D)UNBIND (1) (1) UB,V←,V←,V←↑ 4 8
GC (2) 20? 40 20 40
* RPLPTR - ?
* SCAN1 - ?
* SCAN2 - ?
FreeVarFrame 1 30 30 5 2 5
names .1*10*15 1.5 22 1/4 4
frames .1*10? 7 07 5 5 2 2 1 1
BLT 32 bit step ? DLion Bandwidth refs/cycle Tamarin w/o BitBlt chip
BITBLT REPLACE 8 10 Mbps 3 5+2QM ? Mbps
BITBLT XOR 8 10 Mbps 3 4+2QM ? Mbps
BLT 4 20 Mbps 3 2QM ? Mbps
STATISTICS
Frame cache predictions Bell date?
#Frames %Hit 1 % Hit 2 %Miss1 %Miss2
------- ------ ------- ------ ------
1 50 60 50 40
2 75 80 25 20
3 80 90 20 10
4 85 95 15 5
5 88 96 12 4
6 90 97 10 3
7 91 97 9 3
8 92 98 8 2
Frame Dumping Time: Tamarin VS Dragon
Parameter Dragon Tamarin 300ns cycle, 600 quad word
--------- ------ -------
Call/Ret 1.6 4 us = 1@ + 1@4 + 3 + refill + 1 + refill = .3+.6+.6+1.2 + .3+ 1.2
Dump/Load 20 us 8 us = (2@4+3(12/4)@4+3)*2 = 10@.6+1.8 + 8
Call time(32i) 16 us 20-30
miss rate <10 % <20 %
Dumps 2/16 2/20
Overhead <12 % <10 %
7480 BIND instructions analysed to show a static average of 1 PVar and 2 Nils being bound.
#PVar Cnt %
----- --- ---
0 1825 24.10
1 3543 47.36
2 1379 18.44
3 506 6.76
4 129 1.72
5 45 .60
6 22 .30
7 12 .16
8 6 .08
9 2 .02
10 2 .02
11 4 .06
12 1 .00
13 2 .02
15 2 .02
#Nils Cnt % Ave
----- --- --- ---
0 2440 32.62
1 1591 21.24 .21
2 1119 14.96 .28
3 953 12.74 .36
4 468 6.26 .24
5 223 2.98 .15
6 154 2.06 .12
7 115 1.54 .11
8 81 1.08 8
9 66 .88 8
10 49 .66 7
11 30 .40 4
12 24 .32 4
13 20 .26 3
14 21 .28 4
15 126 1.68 25
---- --
210
Static Frame Sizes Purcell Oct 84 <LISPCORE>NEXT>FULL.SYSOUT
With 16 Registers for IVars and PVars 2% (16/962) of frames will have Extenstions.
With 20 Registers for IVars and PVars 1% (6/962) of frames will have Extenstions.
#IVars+PVars Cnt
------------ ---
0 3
1 15 15
2 130 260
3 103 309
4 116 464
5 119 595
6 120 720
7 69 483
8 79 632
9 42 378
10 47 470
11 34 373
12 31 372
13 10 130
14 12 168
15 10 150
16 7 112
20 10 200
34 4 136
61 1 61
81 1 81
--- ---
962 6099/962 = 7
Static Stack depth by printcode (with compiler removing pops) Purcell April 85
7883 Functions analyzed (results rounded)
StackSize % F's % Miss
--------- ----- ------
1 4 96
2 11 84
3 15 68
4 20 48
5 15 34
6 10 24
7 7 17
8 5 13
9 3 9
10 2 7
11 1 5
12 1 4
13 1 3
14 0.8 2
15 0.4 2
16 0.3 1
20 0.1 0.6
24 0.08 0.4
28 0.05 0.2
32 0.02 0.1
37 (MAX) 0.00
BitBlt performance limit:
3 quads * 500ns/q + 8 cy * 200ns/cy = 3200ns/32x4bits = 25ns/bit = 5MB/sec
Expected number of args (Static) passed is <2
0.50 FN0 0
2.48 FN1 2.48
2.85 FN2 5.70
0.90 FN3 2.70
0.24 FN4 0.98
0.27 FNX 1.8
---- ----
7.24 13.66/7.24 = 2
Static instruction statistics collectd Purcell Oct 84 from <LISPCORE>NEXT>FULL.SYSOUT.
576760 Instructin analysed
OpCode Cnt % Name
------ --- --- ----
0 7516 1.30 -X-
1 14683 2.55 CAR
2 17652 3.06 CDR
3 4241 .74 LISTP
4 1151 .20 NTYPX
5 2357 .41 TYPEP
6 10034 1.74 DTEST
10 2869 .50 FN0
11 14305 2.48 FN1
12 16430 2.85 FN2
13 5183 .90 FN3
14 1408 .24 FN4
15 1533 .27 FNX
16 814 .14 APPLYFN
17 476 .08 CHECKAPPLY*
20 12474 2.16 RETURN
21 7480 1.30 BIND
22 1626 .28 UNBIND
23 1439 .25 DUNBIND
24 4160 .72 RPLPTR.N
25 39 .00 GCREF
27 1524 .26 GVAR←
30 753 .13 RPLACA
31 600 .10 RPLACD
32 12405 2.15 CONS
33 501 .09 GETP
34 679 .12 FMEMB
35 199 .03 GETHASH
37 511 .09 CREATECELL
40 592 .10 BIN
44 3 .00 DOCOLLECT
45 3 .00 ENDCOLLECT
46 554 .10 RPLCONS
50 463 .08 ELT
51 143 .02 NTHCHC
52 165 .03 SETA
53 27 .00 RPLCHARCODE
54 39 .00 EVAL
55 38 .00 EVALV
57 6 .00 STKSCAN
73 1 .00 \MU.DRAWLINE
76 2 .00 RAID
100 24170 4.19 IVAR
101 11723 2.03 IVAR
102 5090 .88 IVAR
103 2376 .41 IVAR
104 1068 .19 IVAR
105 599 .10 IVAR
106 292 .05 IVAR
107 547 .09 IVARX
110 15235 2.64 PVAR
111 10040 1.74 PVAR
112 6955 1.21 PVAR
113 5085 .88 PVAR
114 4082 .71 PVAR
115 3388 .59 PVAR
116 2723 .47 PVAR
117 17122 2.97 PVARX
120 1395 .24 FVAR
121 1243 .22 FVAR
122 1498 .26 FVAR
123 1272 .22 FVAR
124 1153 .20 FVAR
125 807 .14 FVAR
126 602 .10 FVAR
127 4994 .87 FVARX
130 1847 .32 PVAR←
131 2005 .35 PVAR←
132 1837 .32 PVAR←
133 1127 .20 PVAR←
134 943 .16 PVAR←
135 799 .14 PVAR←
136 627 .11 PVAR←
137 7441 1.29 PVARX←
140 10180 1.77 GVAR
141 80 .01 ARG0
142 2669 .46 IVARX←
143 2910 .50 FVARX←
144 19875 3.45 COPY
145 66 .01 MYARGCOUNT
146 42 .00 MYALINK
147 22169 3.84 ACONST
150 13028 2.26 'NIL
151 6558 1.14 'T
152 8159 1.41 '0
153 7523 1.30 '1
154 12587 2.18 SIC
155 537 .09 SNIC
156 2435 .42 SICX
157 6123 1.06 GCONST
161 16 .00 READFLAGS
162 13 .00 READRP
163 27 .00 WRITEMAP
164 1 .00 READPRINTERPORT
165 1 .00 WRITEPRINTERPORT
166 15 .00 PILOTBITBLT
167 18 .00 RCLK
170 15 .00 MISC1
171 14 .00 MISC2
172 2 .00 RECLAIMCELL
173 1 .00 GCSCAN1
174 1 .00 GCSCAN2
175 28 .00 SUBRCALL
176 27 .00 CONTEXTSWITCH
200 779 .14 JUMP
201 656 .11 JUMP
202 1667 .29 JUMP
203 397 .07 JUMP
204 293 .05 JUMP
205 298 .05 JUMP
206 300 .05 JUMP
207 239 .04 JUMP
210 276 .05 JUMP
211 175 .03 JUMP
212 186 .03 JUMP
213 231 .04 JUMP
214 208 .04 JUMP
215 266 .05 JUMP
216 139 .02 JUMP
217 173 .03 JUMP
220 82 .01 FJUMP
221 1318 .23 FJUMP
222 1873 .32 FJUMP
223 1691 .29 FJUMP
224 1309 .23 FJUMP
225 1311 .23 FJUMP
226 1190 .21 FJUMP
227 1123 .19 FJUMP
230 829 .14 FJUMP
231 803 .14 FJUMP
232 650 .11 FJUMP
233 579 .10 FJUMP
234 448 .08 FJUMP
235 510 .09 FJUMP
236 453 .08 FJUMP
237 360 .06 FJUMP
240 70 .01 TJUMP
241 1357 .24 TJUMP
242 1957 .34 TJUMP
243 621 .11 TJUMP
244 751 .13 TJUMP
245 1280 .22 TJUMP
246 710 .12 TJUMP
247 553 .10 TJUMP
250 541 .09 TJUMP
251 394 .07 TJUMP
252 295 .05 TJUMP
253 388 .07 TJUMP
254 314 .05 TJUMP
255 250 .04 TJUMP
256 258 .04 TJUMP
257 168 .03 TJUMP
260 8277 1.44 JUMPX
261 5829 1.01 JUMPXX
262 8781 1.52 FJUMPX
263 4941 .86 TJUMPX
264 3209 .56 NFJUMPX
265 7612 1.32 NTJUMPX
270 1024 .18 PVAR←↑
271 1369 .24 PVAR←↑
272 1278 .22 PVAR←↑
273 1127 .20 PVAR←↑
274 976 .17 PVAR←↑
275 775 .13 PVAR←↑
276 726 .13 PVAR←↑
277 32090 5.56 POP
302 355 .06 GETBASEBYTE
304 154 .03 BLT
307 220 .04 PUTBASEBYTE
310 4874 .85 GETBASE.N
311 8764 1.52 GETBASEPTR.N
312 1389 .24 GETBITS.N.FD
315 2150 .37 PUTBASE.N
316 216 .04 PUTBASEPTR.N
317 921 .16 PUTBITS.N.FD
320 2513 .44 ADDBASE
321 869 .15 VAG2
322 262 .05 HILOC
323 571 .10 LOLOC
324 154 .03 PLUS2
325 167 .03 DIFFERENCE
326 58 .01 TIMES2
327 14 .00 QUOTIENT
330 6215 1.08 IPLUS2
331 4312 .75 IDIFFERENCE
332 473 .08 ITIMES2
333 319 .06 IQUOTIENT
334 58 .01 IREMAINDER
340 948 .16 LLSH1
341 473 .08 LLSH8
342 989 .17 LRSH1
343 495 .09 LRSH8
344 238 .04 LOGOR2
345 1710 .30 LOGAND2
346 120 .02 LOGXOR2
350 120 .02 FPLUS2
351 67 .01 FDIFFERENCE
352 250 .04 FTIMES2
353 98 .02 FQUOTIENT
355 27 .00 UBFLOAT1
360 19235 3.34 EQ
361 4614 .80 IGREATERP
362 54 .00 FGREATERP
363 215 .04 GREATERP
365 466 .08 MAKENUMBER
366 139 .02 BOXIPLUS
367 25 .00 BOXIDIFFERENCE
375 433 .08 SWAP