TamDesign.memo     Purcell     D*R*A*F*T     14-Aug-85 20:04:32

This document describes internal data structures used in the Tamarin-1 hardware and low-level software. These data structures are also designed to coexist with a version of Interlisp-D on the D-Machines for software development and hardware simulation.

POINTER FORMAT

The interim word size will be 32 bits (expandable to 40 only on Tamarin hardware). There are three pointer formats in memory and one more on the stack:

*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
|0 0|SUBTYP|0 0 0|                     PTR                      |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*

*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
|0 1|S            INTEGER: 30-bit 2's Complement                |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*

*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
|1|STY|F F F F F|              Stack Block Marker               |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*

*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
|1|S|  EXPONENT   |                  MANTISSA                   |
*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*

TYP       This field describes the basic type of the pointer. The tags were chosen to ease the transition from D-Machine to TamOps. The tagging choice for basic types might be re-examined after the 32-bit conversion.
          00: Pointer - subtype valid
          01: 30-bit 2's complement integers (-2^29 to 2^29-1, roughly +/- half a billion)
          1X: Stack markers for D-Machine (unused by Tamarin; alias to Tamarin FloatP's)
          1X: 31-bit immediate floating point number, 10^-19 to 10^19 (unused by D-Machine)

SUBTYP    When TYP=00 this field describes the exact type of some objects.
          00,XXX1XX: Reserved for type or address expansion
          00,XXXX1X: Reserved for type or address expansion
          00,XXXXX1: Reserved for type or address expansion
          00,000000: ObjectP backwards compatible pointer. Type is in the usual type table.
          00,001000: UserListP (ListP returns true for user object)
          00,010000: ListP (else CAR/CDR trap)
          00,011000: CodeP (else FN traps to interpreter)
          00,100000: AtomP (else apply traps)
          00,101000: StackP (else FVAR_ traps)
          00,110000: NumberP (CommonLisp EQL must trap)
          00,111000: UnboundP (traps FVAR)
          00,111100: IndirectP (var reference treated as indirect)
          01,SXXXXX: IntegerP
          1SEEEEEEE: FloatP
          100,XXXFF: Basic Frame
          101,XXXXX: Free block
          110,FFFFF: Frame Extension with flags
          111,XXXXX: Guard Block

PTR       This field contains a (usually even) pointer to 16 bits of memory. As a 24-bit word pointer it can address 32MB (2^24 16-bit words); using the three reserved bits would expand it to 27 bits, or 256MB.

S         The sign bit for either IntegerP's or FloatP's.

FF        Stack Block flags (PTR=stackBlock). This pattern marks the start of a D-Machine stack block.

EXPONENT  The 7-bit excess-64 exponent, like a shortened IEEE exponent.

MANTISSA  An IEEE-like 23-bit mantissa with an assumed leading 1.

INTEGER   A 30-bit 2's complement integer.

STACK STRUCTURE

Stack Frame (dumped in memory)

HDR     *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
    -1: |                             NEXT                              |
        *=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*
     0: |T|S|N|M|L|C|F|R|   (MAXVAR)    |      SP       |   USECOUNT    |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                           NAMETABLE                           |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                           CODE Base                           |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                              PC                               |
        *=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*=+=+=+=+=+=+=+=*
        |                             ALINK                             |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                             CLINK                             |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                           EXTENSION                           |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                             VAR0                              |
        |                              ...                              |
MAXVAR: |                             VARi                              |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
    SP: |                             STK0                              |
        |                              ...                              |
   ~40: |                             STKj                              |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*

NEXT      Frames remain chained down the stack and into the free list by NEXT pointers. Spaghetti manipulation rearranges the linear threads but Dump/Restore does not.

FLAGS     * = used by microcode
          *T: Trap on Entry
          *S: Slow Return (ALINK # CLINK)
          ?N: No-Push: don't push the result when returned to
          ?M: Multiple Values accepted
           L: Large frame extension
           C: InCall ??
           F: Fast: binds no variables
           R: Reserved for expansion (default 0)

SP        The Stack Pointer indicates the top item on the stack in the internal frame.

MAXVAR    The MAXVAR field could be used for optional checking of stack underflow.

USECOUNT  The USECOUNT field indicates whether this frame is pointed to by any other frames or stack pointers.

CLINK     This field points to the previous frame in system memory. The CLINK field is only valid for the shallowest frame in the processor; hence there is only one CLINK register in the processor even though the processor holds more than one frame.

ALINK     This field points to the frame that denotes the previous access environment. If this field is UNBOUND, the CLINK should be used for the ALINK.

NAMETABLE This points to the nametable of the function. If it is unbound, the function binds no names and should not be searched on free variable lookup. The field is copied from the compiled function header.

Vars      This section contains the IVars, PVars, and FVars. All are indexed from a single offset. This section can contain indirect pointers.

Stk       This section contains the working stack. The start of the stack is determined by the number of IVars and PVars.
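As a rough illustration of header word 0 in the Stack Frame diagram, the Python sketch below packs and unpacks the eight flag bits together with MAXVAR, SP, and USECOUNT. It assumes each of the four fields occupies one byte and that T is the most significant bit. That is how the diagram is drawn, but the memo does not state the bit assignment. The names FrameHeader, pack_header, unpack_header, return_needs_ufn, and entry_needs_trap are hypothetical. The two predicates encode the S (Slow Return) and T (Trap on Entry) rules from the spaghetti-stack discussion.

```python
from dataclasses import dataclass

# Header flags in the order drawn in word 0 of the Stack Frame diagram.
# Assumption (not stated in the memo): T is bit 31 and R is bit 24.
FLAG_NAMES = ("T", "S", "N", "M", "L", "C", "F", "R")


@dataclass(frozen=True)
class FrameHeader:
    flags: frozenset = frozenset()  # names from FLAG_NAMES that are set
    maxvar: int = 0                 # assumed 8-bit field
    sp: int = 0                     # assumed 8-bit field
    usecount: int = 0               # assumed 8-bit field


def pack_header(h: FrameHeader) -> int:
    """Pack a FrameHeader into one 32-bit header word (hypothetical layout)."""
    unknown = set(h.flags) - set(FLAG_NAMES)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    for name in ("maxvar", "sp", "usecount"):
        if not 0 <= getattr(h, name) <= 0xFF:
            raise ValueError(f"{name} does not fit in 8 bits")
    flag_byte = 0
    for i, name in enumerate(FLAG_NAMES):
        if name in h.flags:
            flag_byte |= 0x80 >> i  # T -> 0x80, S -> 0x40, ..., R -> 0x01
    return (flag_byte << 24) | (h.maxvar << 16) | (h.sp << 8) | h.usecount


def unpack_header(word: int) -> FrameHeader:
    """Split a 32-bit header word back into its four fields."""
    flag_byte = (word >> 24) & 0xFF
    flags = frozenset(n for i, n in enumerate(FLAG_NAMES) if flag_byte & (0x80 >> i))
    return FrameHeader(flags, (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF)


def return_needs_ufn(h: FrameHeader) -> bool:
    """RETURN from this frame goes to a UFN when the S (Slow) flag is set."""
    return "S" in h.flags


def entry_needs_trap(h: FrameHeader) -> bool:
    """Resuming this frame traps when the T (Trap on Entry) flag is set."""
    return "T" in h.flags


if __name__ == "__main__":
    h = FrameHeader(frozenset({"S", "F"}), maxvar=12, sp=14, usecount=1)
    word = pack_header(h)
    print(hex(word))                 # 0x420c0e01
    print(unpack_header(word) == h)  # True
```

Under this assumed packing, checking either trap condition reduces to a single mask test on the header word. Whether the Tamarin microcode actually tests the flags this way is not specified here.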
Function Definition Cell:

        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |    "CodeP"    |               pointer to CodeP                |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*

Function Header:

 CodeP: *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |     FLAGS     |   (MAXVAR)    |      SP       |   USECOUNT    |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                           NAMETABLE                           |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |    ENTRY 0    |    ENTRY 1    |    ENTRY 2    |    ENTRY 3    |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |    ENTRY 4    |    ENTRY 5    |    ENTRY 6    |    ENTRY 7    |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |    "atom"     |                 FUNCTION NAME                 |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                            unused                             |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                            unused                             |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |int|           |    NTSIZE     |    NLOCALS    |  FVAROFFSET   |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |    "atom"     |                   Var Name                    |
        |      ...      |                      ...                      |
        |    "atom"     |                   Var Name                    |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |int|                          Offset                           |
        |int|                            ...                            |
        |int|                          Offset                           |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
ENTRY0: |                         Compiled Code                         |
        |                                                               |
        |                            ENTRY2                             |
        |                            ENTRY1                             |
        |                              ...                              |
        |                                                               |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*

Some Processor Unique Registers:

        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                               T                               |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                              NIL                              |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                               0                               |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                               1                               |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                            UNBIND                             |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                        Val Space Base                         |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                       FnDef Space Base                        |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                        Type Table Base                        |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                  FREE: Stack Frame Free List                  |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                             CLINK                             |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                                                               |
        |  Global variables (GVARs) used often by UFNs could be stored  |
        |  in unique registers.                                         |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*
        |                      Memory Address Reg                       |
        *-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*-+-+-+-+-+-+-+-*

SOFTWARE DEVELOPMENT STRATEGY

We are building a lisp.sysout with data structures compatible with both reduced D-MachineOps and TamarinOps. We believe the loadup can be done on a Dorado with certain opcodes disabled. We think all the opcode changes can be done in Lisp in the Unimplemented Function (UFN) code rather than by new microcode. Changes are as follows:

* Microcode work
        New traps to UFN
                Opcodes to trap on new Immediate data types
                        TYPE, LISTP, EQ
                        arithmetic already ok
                Odd pointer traps?
        Disable some opcodes
                CONS, RPLACD, RPLACA, BIN
        Change some bit packing in 24 vs 32 bit pointers
                FN checks for compiled flg in Function Defn cell
        {
                32 bit pointer fields: GETBASEPTR32, BIN
                CDR coding off: CAR, CDR, CONS, RPLCONS
                New Type subroutine due to Immediate 30-bit integers: LISTP, NTYPX, TYPEP, DTEST
                Immediate 30-bit integers coexisting with SMALLPs: EQ
        }

* Call and Return across worlds
        UFNs (Interpreter)
        First observation: we can avoid control transfers across worlds by bringing up a well-ordered set of Tamarin functions and implementing only one (non-reentrant) entry to the Tamarin emulator. We chose to skip fancy control transfers across worlds because they are tricky and we have no good way to do free variable lookup across worlds.

* CodeP bit in Function Defn Cells
        The most significant bit (compiled flg) in the Function Defn cell can remain. In the set of Tamarin 32-bit objects, Tamarin CodeP's and D-Machine CodeP's are disjoint. PUTD will preserve Tamarin CodeP's and turn on the "FunDefCell msb" for D-Machine CodeP's. The D-Machine will execute FunDefCells iff they contain D-Machine CodeP's, and Tamarin will execute FunDefCells iff they contain Tamarin CodeP's.

* Mixed immediate integer implementations
        DLisp is efficient for SMALLP; UFNs can implement TamINTEGER in terms of SMALLP.
        TamOps is efficient for TamINTEGER; UFNs can implement SMALLP in terms of TamINTEGER.
        LispOps already traps arithmetic on non-SMALLPs; only EQ needs work. EQ on both machines must be more careful, so that a small integer compares EQ whether it is represented as a SMALLP or as a TamINTEGER.

Limits
        FreeVar lookup across boundary.
        Context switch to Keyboard (in TamOps?)

Microcode work
        New traps to UFN
                Opcodes to trap on new Immediate data types
                        TYPE, LISTP, EQ
                        arithmetic already ok
                Odd pointer traps?
        Disable some opcodes
                CONS, RPLACD, RPLACA, BIN
        Change some bit packing in 24 vs 32 bit pointers
                FN checks for compiled flg in Function Defn cell
        Change Stack block types to make room for immediate types

DESIGN DISCUSSION

The following subset of features is the goal for a first Tamarin machine:

YES      * 32 or 40 bit word size (vs 16)
YES      * Tagged architecture
BOTH     * 40 vs. 32 bit architecture
DEFER    * 4-bit Reference Count hardware
DEFER    * Cdr coding
IF EASY  * Indirect Pointers on the stack
YES      * Stack format changes
YES      * Bind/Unbind
YES      * Opcode changes
DEFER    * Prolog
STUDY    * CommonLisp
DEFER    * I/O changes

Some choices not emphasized in Alan's June 17 memo are:

DEFER    * 32 bit word addressability rather than the current 16 bit addressability
        Apart from the width of the machine's data paths, we can choose the meaning of the least significant bit (LSB) of addresses. Most instructions move 32 bit quantities, and most but not all addresses point to 32 bit aligned quantities. It is awkward to support odd addresses that point to 16 bit quantities.
        Even if we intend to change the address conventions, a transition period could be accommodated if the LSB meaning were left unchanged until most system software had been changed to never use odd addresses. During the transition period, odd addresses could be supported by nothing more than a trap to a UFN, retaining them at a severe speed penalty. During the transition period we could also do more staging on the Dandelion and even install the "odd address penalty/detection" first for performance tuning the speed-critical code, and second to detect and remove all odd addressing.

The hardest change to Lisp in the subset is the new stack design and the many issues it involves:

* Variables in a Fixed Size Stack Frame
        If a function's PVARs cannot all fit in the limited processor frame, then a frame extension is allocated in memory at function entry and either freed at exit or garbage collected later.
        These extensions would come from a pool that is treated like the stack for reference-count purposes. That is, pointers in the extensions are not reference counted, and the pool is scanned like the stack at GC time.

* Evaluation Stack in a Fixed Size Stack Frame
        Statistics indicate that 99% of functions need only 16 cells of evaluation stack. Tamarin can have more stack with fewer variables or less stack with more variables, but there is an upper limit on stack. The compiler will be responsible for limiting stack depth. Deep stacks (beyond about 32) seem to be rare enough that we can temporarily treat them as compile errors. Later, techniques for moving stack variables into the frame extension, or function splitting, can be implemented.

* Spaghetti Stack: Merging Basic Frames and Frame Extensions
        Where we currently copy only frame extensions, we will copy whole frames. We believe the slight change in semantics is acceptable to us and our customers. We should try this change on a D-Machine (Basic Frame vs Frame eXtension).

* Spaghetti Stack: Slow Return
        There are three cases of slow return in present Interlisp-D:
        (1) ALINK # CLINK: The returning frame executes a RETURN, which UFNs upon detecting the "Slow" bit. Tamarin will also UFN the RETURN opcode when the S (Slow) flag is set in the frame header.
        (2) UseCnt(caller) > 1: The returner can return, but the caller must be copied before it can resume. Tamarin will trap on entering the caller when the T (Trap on entry) flag is set in the frame header.
        (3) No stack space ahead of caller: Tamarin stack frames are heap allocated, so this is only a case of "Stack Frame FreeList Empty".

        Tamarin handles the Spaghetti Stack with traps just before and just after RETURN. All calls are ordinary: the caller is suspended and a new stack frame is instantiated for the callee (in processor and/or memory).

        Normal frames point only to their caller and are pointed to by only one "callee":
                Return from such a frame to such a frame is ordinary:
                        Returner's frame is freed; CLINK points to returnee's frame, which is resumed.
        (ALINK # CLINK) in some frames, which point to two different frames:
                Return to such a frame is ordinary.
                Return from such a frame is special and should trap to a UFN.
                        UFN must decrement usecnt(ALINK), as the ALINK reference is about to disappear.
        (usecnt>1) in some frames, which are pointed to by more than one frame:
                Return from such a frame doesn't happen (because no running frame is ever pointed to).
                Return to such a frame is special and should trap to a UFN.
                        UFN must copy the frame; decrement usecnt in the original; usecnt_1 in the new copy.
                        UFN returns to the new copy, which has usecnt=1 and no one points to it.

ISSUES
        UFN arg count comes from where?

* Lexical closure
        Masinter and vanMelle indicate that our current FVAR mechanism is sufficient to implement closures by forwarding variable references to closure frames. The generalization is that FVAR binding pointers can point anywhere in system memory. Stores to FVARs that don't point to the stack must be reference counted.

* Pre-assign stack frame buffers in memory for each processor frame?: NO
        Assigning processor frames to memory frames from the free list at overflow time, rather than pre-assigning memory frames, simplifies context switches and improves their latency when it is desirable to change free lists on a context switch. This design gives each context its own safe pool of stack frame buffers.

Calls
        The number of arguments is known at compile time except for APPLY, which compiles closed today.

* Argument passing
        Up to 6 arguments can be pushed on the stack before a call; the remaining arguments must be presented in a vector pushed on the stack. The 0-7 argument cells are copied from the caller's stack to the callee's variables. The callee has eight entry points for reformatting different numbers of arguments.
        In this way even the evaluation of default values for Common Lisp optional arguments can be compiled and executed efficiently.

PageFaults
        The PageFault microcode pushes the Memory Address register onto the stack and then calls a UFN routine, which effects a context switch from below the suspended frame. Microcode that may fault must be sure to leave a valid TOS for the fault address to be pushed onto.

Traps
        Traps are forced calls between instructions. Generally the PC remains at the start of the trapping instruction.
        ColdStart, Interrupt, PageFault, NoFreeFrames, EnterTrap

UFN
        UFNs are substitute subroutines for an instruction in the instruction stream. The PC is generally advanced as the UFN is called, so that the instruction is skipped over on return.

* Free Variable Lookup
        Free variable lookup is done by a UFN that scans up the stack looking at nametables until a match is found. The inner loop of the table search has an opcode assist. If the variable is found in a processor-cached frame, that frame must be dumped before a bind pointer can be created that points to it. (We may or may not want to push frames out of the cache just to read the nametable pointer.) An invariant to be maintained is that a bind pointer should point to memory and never into a cached stack frame. Bell observed that (with suitable conventions) if Lookup pushes the specvar value out of the cache, then it will not return to the cache during the life of the frame that did the lookup.

* Pointers to cached stack frames
        It should be a rule that frames with pointers to them are not allowed in the frame cache. Generally this condition is easy to maintain. StackP pointers to a frame are counted in UseCnt. Frames with UseCnt>1 are copied on load (EntryTrap). FVAR bind pointers are accounted for in the paragraph above about free variable lookup. We must be CAREFUL when extra frames are force-loaded other than to return to.
        We should be careful with any fancy multiple-value return protocol that could use FVars in the returning frame after forcing in the caller. The MYCLINK opcode must force out the frame that it will point to.

* Context switching
        Almost for free we can give each context its own frame free list. The frame overflow trap can let groups of contexts share frames among themselves.

SPEED CALCULATIONS

I estimate that on a well-tuned Tamarin-1, Interlisp-D programs will run 3 times faster than they currently do on a DLion. The estimate assumes particular chip and memory cycle times (100ns processor cycles and 500ns memory cycles in the table below) and the dynamic instruction mix below.

DYNAMIC INSTRUCTION STATS

Statistics collected June 81 by Dorado PC sampling of program "TEST".

                  ----- DLion -----   --------------------------- Tamarin ---------------------------
opcode            freq  time  prod  x100ns  cycles  cProd  mems  x500ns  Qmem  x500+250ns
Load Immediate    29    2.2   63    29      1       29
Load/Store        28    4     117   40      1.1     25     3     15
Read/Write        13    5.5   73    125     4.5     59     13    65
Jumps             12    3     37    45      -       10.5   -     -       7-    35
Arithmetic        7     3     21    11      1.5     10.5
Call (4 PVars)    6     19    114   101     3       18     3     15      9+9   68
Cadr/Type         6     6     38    27      0.3     2      5     25
Bind              2     7?    14    8       4       8
IFU (10% miss)    (10)  0.7   70    50                                   10-   50
Stk Flush (10%)   0.6   -           32      3       2                    6-    30
GC                (2)   20?   40    30              20?    2?    10
FreeVarFrame      1     30    30    53              5      2     10      5+5   38
------------      ----  ----  ----  ------  ------  -----  ----  ------  ----  ----------
TOTAL             102   5.5   577   550ns           189          140           221

Chance of a memory instruction colliding with an immediately following memory cycle:

        Jump Refill         8%
        Call                3%
        UFNs                1%
        Cadr                5%
        Read/Write          0%
        IFU Refill         10%
        StackOver/Under     1%
        TOTAL              27%

SUMMARY

Machine   cycles / instr      KIPS   BITBLT   BLT      Memory bandwidth   cache/frame/ifu
DLion     2.4MHz / 5.5    =   400    1MB/s    2MB/s     5 MBytes/sec
Dorado    16 MHz / 15     =   1000   5MB/s    10MB/s   32 MBytes/sec
Dragon    10 MHz / 5      =   2000   3MB/s    4MB/s    80 MBytes/sec      (2 caches, 2 fast regs)
Tamarin   10 MHz / 6      =   1700   10MB/s   16MB/s   32 MBytes/sec      (nibble mode)

APPENDIX A

DETAILS OF DYNAMIC INSTRUCTION STATS:

                      --- DLion --           ------- Tamarin -------
*XOP opcode           freq  time  prod       opcode  time  tProd  mem  mProd  Qmem  QmProd
  Load Immediate      29  2.2  63  1  29
  SIC,0,1             15  2  30  SCn  1  15
  SICX                5  3  15  SICX  1  05
  NIL                 1  2  02  NIL  1  01
  COPY                3  2  06  COPY  1  03
  POP                 5  2  10  POP  1  05
  GCONST              ?  3  GCONST  1
  Load/Store          28  4  117  1.1  25  3
  IVARn               11  4  50  VARn  1  11
  PVARn               8  4  36  VARn  1  08
  PVAR_               1  4  04  VARn_  1  01
  PVARD_              1  4  04  VARn_^  1  01
  PVARX               3  4  13  VARX  1  03
  IVARX_              1  4  05  VARX_  1  01
  GVAR                3  5  15  GVAR  0  0  1  3  5
  IVARX               -  4  IVARX  1
  PVARX_              -  4  PVARX_  1
  Read/Write          13  5.5  73  4.5  59  1  13
  GETBASE             2  3  06  GBP,GF  3  6  1  2
* GETBYTE             3  4.5  13  RS,AB,GP,PF  6  18
  GETPTR              0.3  4  01  GETPTR  1.5  0.5  1
  GETBITS             1?  5  05  GBP,GF  3  3  1  1
* PUTBASE             1  4  04  GBP,PF,PBP  4.5  4.5  2  2
  PUTBYTE             3  9  27  RS,AB,GP,PF,PP  7.5  22.5  2  6
  PUTPTR              2  5  10  PUTPTR  1.5  3  1  2
* PUTBITS             -  7  07  GBP,PF,PBP  4.5  2
  ADDBASE             1  3  03  ADD  1.5  1.5
  Jumps               12  3  37  -  -  10.5  -  0  -  7
  JUMP                -  NOPn  1
  JUMPX               1  2  02  JUMPX  1  1  1  1
  JUMPXX              1  3  03  JUMPXX  1  1
  T/FJUMP             6  4  24  T/FJUMP  1/2  3  1/2  3
  NTFJMPX             4  4  08  NTFJMPX  1/2  2  1/2  2
  EQ                  (3)  3  09  EQ  1.5  4.5
  T/FJMPX             -
  Arithmetic          7  3  21  1.5  10.5
  IPLUS,DIF           2  3  06  PLUS  1.5  3
* AND                 2  3  06  AND  1.5  3
* VAG2                1  3  03  ?  1?  1
  LO/HILOC            1  3  03  ?  1?  1
  IGREATERP           1  3  03  GREATERP  2.5  2.5
  SHIFT               (2)  2  (06)  SHIFT  1.5  (2)
  Call (4 PVars)      6  19  114  3  18  0.5  3  1.5  9
  FN1                 1  22+8  30  FN1  4  4  1  1  2  2
  FN2                 1  22+8  30  FN2  5  5  1  1  2  2
  FNX                 1  22+8  30  FNX  6  6  1  1  2  2
  RETURN              3  8  24  RETURN  1  3  1  3
  Cadr/Type           6  6  38  0.3  2  1  5
  CAR                 2  7  14  CAR  0  1  2
  CDR                 2  8  16  CDR  0  1  2
* TYPEP               1  4  04  TYPEP  1  1  1  1
  LISTP               1  4  04  LISTP  1  1
  BIND (2NIL 1PV)     2  7?  14  V_^,N,V_,V_^  4  8
  (D)UNBIND           (1)  (1)  UB,V_,V_,V_^  4  8
  GC                  (2)  20?  40  20  40
* RPLPTR              -  ?
* SCAN1               -  ?
* SCAN2               -  ?
  FreeVarFrame        1  30  30  5  2  5
  names               .1*10*15  1.5  22  1/4  4
  frames              .1*10?  7  07  5  5  2  2  1  1
  BLT 32 bit step     ?

                 DLion   Bandwidth   refs/cycle   Tamarin w/o BitBlt chip
BITBLT REPLACE   8       10 Mbps     3            5+2QM   ? Mbps
BITBLT XOR       8       10 Mbps     3            4+2QM   ? Mbps
BLT              4       20 Mbps     3            2QM     ? Mbps

STATISTICS

Frame cache predictions (Bell, date?)

#Frames   %Hit1   %Hit2   %Miss1   %Miss2
-------   -----   -----   ------   ------
      1      50      60       50       40
      2      75      80       25       20
      3      80      90       20       10
      4      85      95       15        5
      5      88      96       12        4
      6      90      97       10        3
      7      91      97        9        3
      8      92      98        8        2

Frame Dumping Time: Tamarin vs Dragon

Parameter        Dragon    Tamarin    (300ns cycle, 600 quad word)
---------        ------    -------
Call/Ret         1.6       4 us       = 1@ + 1@4 + 3 + refill + 1 + refill = .3+.6+.6+1.2 + .3+1.2
Dump/Load        20 us     8 us       = (2@4+3(12/4)@4+3)*2 = 10@.6+1.8 + 8
Call time(32i)   16 us     20-30
miss rate        <10 %     <20 %
Dumps            2/16      2/20
Overhead         <12 %     <10 %

7480 BIND instructions were analyzed, showing a static average of 1 PVar and 2 NILs bound.

#PVar    Cnt       %
-----   ----   -----
    0   1825   24.10
    1   3543   47.36
    2   1379   18.44
    3    506    6.76
    4    129    1.72
    5     45     .60
    6     22     .30
    7     12     .16
    8      6     .08
    9      2     .02
   10      2     .02
   11      4     .06
   12      1     .00
   13      2     .02
   15      2     .02

#Nils    Cnt       %   Ave
-----   ----   -----   ---
    0   2440   32.62
    1   1591   21.24   .21
    2   1119   14.96   .28
    3    953   12.74   .36
    4    468    6.26   .24
    5    223    2.98   .15
    6    154    2.06   .12
    7    115    1.54   .11
    8     81    1.08     8
    9     66     .88     8
   10     49     .66     7
   11     30     .40     4
   12     24     .32     4
   13     20     .26     3
   14     21     .28     4
   15    126    1.68    25
                ----    --
                       210

Static Frame Sizes    Purcell Oct 84    NEXT>FULL.SYSOUT

With 16 registers for IVars and PVars, 2% (16/962) of frames will have extensions.
With 20 registers for IVars and PVars, 1% (6/962) of frames will have extensions.

#IVars+PVars    Cnt   Size x Cnt
------------    ---   ----------
           0      3
           1     15           15
           2    130          260
           3    103          309
           4    116          464
           5    119          595
           6    120          720
           7     69          483
           8     79          632
           9     42          378
          10     47          470
          11     34          373
          12     31          372
          13     10          130
          14     12          168
          15     10          150
          16      7          112
          20     10          200
          34      4          136
          61      1           61
          81      1           81
                ---   ----------
                962         6099

6099/962 = 6.3

Static Stack depth by printcode (with compiler removing pops)    Purcell April 85
7883 functions analyzed (results rounded)

StackSize   % F's   % Miss
---------   -----   ------
        1       4       96
        2      11       84
        3      15       68
        4      20       48
        5      15       34
        6      10       24
        7       7       17
        8       5       13
        9       3        9
       10       2        7
       11       1        5
       12       1        4
       13       1        3
       14     0.8        2
       15     0.4        2
       16     0.3        1
       20     0.1      0.6
       24    0.08      0.4
       28    0.05      0.2
       32    0.02      0.1
 37 (MAX)    0.00

BitBlt performance limit: 3 quads * 500ns/q + 8 cy * 200ns/cy = 3100ns per 32x4 bits = about 24ns/bit = about 5MB/sec

Expected number of args (static) passed is <2:

     %   Opcode   Args x %
  ----   ------   --------
  0.50   FN0      0
  2.48   FN1      2.48
  2.85   FN2      5.70
  0.90   FN3      2.70
  0.24   FN4      0.98
  0.27   FNX      1.8
  ----            ----
  7.24            13.66      13.66/7.24 = 1.9

Static instruction statistics collected Purcell Oct 84 from NEXT>FULL.SYSOUT.
576760 instructions analyzed (opcodes in octal)

OpCode     Cnt      %   Name
------  ------  -----   ----
     0    7516   1.30   -X-
     1   14683   2.55   CAR
     2   17652   3.06   CDR
     3    4241    .74   LISTP
     4    1151    .20   NTYPX
     5    2357    .41   TYPEP
     6   10034   1.74   DTEST
    10    2869    .50   FN0
    11   14305   2.48   FN1
    12   16430   2.85   FN2
    13    5183    .90   FN3
    14    1408    .24   FN4
    15    1533    .27   FNX
    16     814    .14   APPLYFN
    17     476    .08   CHECKAPPLY*
    20   12474   2.16   RETURN
    21    7480   1.30   BIND
    22    1626    .28   UNBIND
    23    1439    .25   DUNBIND
    24    4160    .72   RPLPTR.N
    25      39    .00   GCREF
    27    1524    .26   GVAR_
    30     753    .13   RPLACA
    31     600    .10   RPLACD
    32   12405   2.15   CONS
    33     501    .09   GETP
    34     679    .12   FMEMB
    35     199    .03   GETHASH
    37     511    .09   CREATECELL
    40     592    .10   BIN
    44       3    .00   DOCOLLECT
    45       3    .00   ENDCOLLECT
    46     554    .10   RPLCONS
    50     463    .08   ELT
    51     143    .02   NTHCHC
    52     165    .03   SETA
    53      27    .00   RPLCHARCODE
    54      39    .00   EVAL
    55      38    .00   EVALV
    57       6    .00   STKSCAN
    73       1    .00   \MU.DRAWLINE
    76       2    .00   RAID
   100   24170   4.19   IVAR
   101   11723   2.03   IVAR
   102    5090    .88   IVAR
   103    2376    .41   IVAR
   104    1068    .19   IVAR
   105     599    .10   IVAR
   106     292    .05   IVAR
   107     547    .09   IVARX
   110   15235   2.64   PVAR
   111   10040   1.74   PVAR
   112    6955   1.21   PVAR
   113    5085    .88   PVAR
   114    4082    .71   PVAR
   115    3388    .59   PVAR
   116    2723    .47   PVAR
   117   17122   2.97   PVARX
   120    1395    .24   FVAR
   121    1243    .22   FVAR
   122    1498    .26   FVAR
   123    1272    .22   FVAR
   124    1153    .20   FVAR
   125     807    .14   FVAR
   126     602    .10   FVAR
   127    4994    .87   FVARX
   130    1847    .32   PVAR_
   131    2005    .35   PVAR_
   132    1837    .32   PVAR_
   133    1127    .20   PVAR_
   134     943    .16   PVAR_
   135     799    .14   PVAR_
   136     627    .11   PVAR_
   137    7441   1.29   PVARX_
   140   10180   1.77   GVAR
   141      80    .01   ARG0
   142    2669    .46   IVARX_
   143    2910    .50   FVARX_
   144   19875   3.45   COPY
   145      66    .01   MYARGCOUNT
   146      42    .00   MYALINK
   147   22169   3.84   ACONST
   150   13028   2.26   'NIL
   151    6558   1.14   'T
   152    8159   1.41   '0
   153    7523   1.30   '1
   154   12587   2.18   SIC
   155     537    .09   SNIC
   156    2435    .42   SICX
   157    6123   1.06   GCONST
   161      16    .00   READFLAGS
   162      13    .00   READRP
   163      27    .00   WRITEMAP
   164       1    .00   READPRINTERPORT
   165       1    .00   WRITEPRINTERPORT
   166      15    .00   PILOTBITBLT
   167      18    .00   RCLK
   170      15    .00   MISC1
   171      14    .00   MISC2
   172       2    .00   RECLAIMCELL
   173       1    .00   GCSCAN1
   174       1    .00   GCSCAN2
   175      28    .00   SUBRCALL
   176      27    .00   CONTEXTSWITCH
   200     779    .14   JUMP
   201     656    .11   JUMP
   202    1667    .29   JUMP
   203     397    .07   JUMP
   204     293    .05   JUMP
   205     298    .05   JUMP
   206     300    .05   JUMP
   207     239    .04   JUMP
   210     276    .05   JUMP
   211     175    .03   JUMP
   212     186    .03   JUMP
   213     231    .04   JUMP
   214     208    .04   JUMP
   215     266    .05   JUMP
   216     139    .02   JUMP
   217     173    .03   JUMP
   220      82    .01   FJUMP
   221    1318    .23   FJUMP
   222    1873    .32   FJUMP
   223    1691    .29   FJUMP
   224    1309    .23   FJUMP
   225    1311    .23   FJUMP
   226    1190    .21   FJUMP
   227    1123    .19   FJUMP
   230     829    .14   FJUMP
   231     803    .14   FJUMP
   232     650    .11   FJUMP
   233     579    .10   FJUMP
   234     448    .08   FJUMP
   235     510    .09   FJUMP
   236     453    .08   FJUMP
   237     360    .06   FJUMP
   240      70    .01   TJUMP
   241    1357    .24   TJUMP
   242    1957    .34   TJUMP
   243     621    .11   TJUMP
   244     751    .13   TJUMP
   245    1280    .22   TJUMP
   246     710    .12   TJUMP
   247     553    .10   TJUMP
   250     541    .09   TJUMP
   251     394    .07   TJUMP
   252     295    .05   TJUMP
   253     388    .07   TJUMP
   254     314    .05   TJUMP
   255     250    .04   TJUMP
   256     258    .04   TJUMP
   257     168    .03   TJUMP
   260    8277   1.44   JUMPX
   261    5829   1.01   JUMPXX
   262    8781   1.52   FJUMPX
   263    4941    .86   TJUMPX
   264    3209    .56   NFJUMPX
   265    7612   1.32   NTJUMPX
   270    1024    .18   PVAR_^
   271    1369    .24   PVAR_^
   272    1278    .22   PVAR_^
   273    1127    .20   PVAR_^
   274     976    .17   PVAR_^
   275     775    .13   PVAR_^
   276     726    .13   PVAR_^
   277   32090   5.56   POP
   302     355    .06   GETBASEBYTE
   304     154    .03   BLT
   307     220    .04   PUTBASEBYTE
   310    4874    .85   GETBASE.N
   311    8764   1.52   GETBASEPTR.N
   312    1389    .24   GETBITS.N.FD
   315    2150    .37   PUTBASE.N
   316     216    .04   PUTBASEPTR.N
   317     921    .16   PUTBITS.N.FD
   320    2513    .44   ADDBASE
   321     869    .15   VAG2
   322     262    .05   HILOC
   323     571    .10   LOLOC
   324     154    .03   PLUS2
   325     167    .03   DIFFERENCE
   326      58    .01   TIMES2
   327      14    .00   QUOTIENT
   330    6215   1.08   IPLUS2
   331    4312    .75   IDIFFERENCE
   332     473    .08   ITIMES2
   333     319    .06   IQUOTIENT
   334      58    .01   IREMAINDER
   340     948    .16   LLSH1
   341     473    .08   LLSH8
   342     989    .17   LRSH1
   343     495    .09   LRSH8
   344     238    .04   LOGOR2
   345    1710    .30   LOGAND2
   346     120    .02   LOGXOR2
   350     120    .02   FPLUS2
   351      67    .01   FDIFFERENCE
   352     250    .04   FTIMES2
   353      98    .02   FQUOTIENT
   355      27    .00   UBFLOAT1
   360   19235   3.34   EQ
   361    4614    .80   IGREATERP
   362      54    .00   FGREATERP
   363     215    .04   GREATERP
   365     466    .08   MAKENUMBER
   366     139    .02   BOXIPLUS
   367      25    .00   BOXIDIFFERENCE
   375     433    .08   SWAP
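The extension figures quoted under Static Frame Sizes can be rechecked directly from that histogram. The Python sketch below transcribes the (IVars+PVars, Cnt) rows and counts the functions whose variables exceed a given number of processor registers. It assumes a frame needs an extension only when IVars+PVars is strictly greater than the register count. That reading reproduces the memo's counts of 16 and 6 frames. The function name frames_needing_extension is hypothetical.

```python
# (IVars+PVars, Cnt) rows transcribed from the Static Frame Sizes table
# (Purcell, Oct 84, NEXT>FULL.SYSOUT).
FRAME_SIZES = [
    (0, 3), (1, 15), (2, 130), (3, 103), (4, 116), (5, 119), (6, 120),
    (7, 69), (8, 79), (9, 42), (10, 47), (11, 34), (12, 31), (13, 10),
    (14, 12), (15, 10), (16, 7), (20, 10), (34, 4), (61, 1), (81, 1),
]

# Denominator used in the memo; the transcribed counts above sum to 963.
MEMO_TOTAL = 962


def frames_needing_extension(registers: int) -> int:
    """Count functions whose IVars+PVars do not fit in `registers` cells.

    Assumption: a frame overflows only when IVars+PVars > registers.
    """
    return sum(cnt for size, cnt in FRAME_SIZES if size > registers)


if __name__ == "__main__":
    for regs in (16, 20):
        n = frames_needing_extension(regs)
        print(f"{regs} registers: {n} frames ({100 * n / MEMO_TOTAL:.1f}% of {MEMO_TOTAL})")
```

For 16 and 20 registers this gives 16 and 6 frames (about 1.7% and 0.6% of 962), which the memo rounds to 2% and 1%. The same function can be used to explore other register-file sizes; for example, 12 registers would leave 55 frames needing extensions.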