Tamarin: A Custom VLSI-Based Lisp Machine
Mark Ross
April 14, 1987

Outline

· Architectural overview
· System Configuration
· Instruction and Memory Timing
· Block diagram and Functional units

Architectural Overview

Some architectural features:

· 40 bit word size, in two formats (field widths in bits):
    GC (4) | CDR (2) | TAG (2) | subtype (4) | F (1) | R (1) | REST (26)
    GC (4) | CDR (2) | TAG (2) | REST (32)
· Stack oriented, load/store instruction set
· On-chip register frame caching
  - fixed size frames (number is technology dependent)
· Instruction Buffer to minimize bus traffic

System Configuration

1186 Softcard

· Plugs into the memory expansion slot of the 6085
· 20 MBytes of memory
· TI 34010 graphics controller chip for display, with its own video RAM and program memory. Reads and writes to the video memory are controlled by the processor.
· Memory Interface chip as the bridge between the outside world and Processor memory.

Instruction and Memory Timing

The Tamarin CMOS processor uses a two stage instruction pipeline. The first stage performs instruction fetch and decode, while the second stage performs instruction execution. Timings for various operations are shown below:

· Generic instruction rate = 1 instruction per clock cycle
· Single word read or write = 3 cycles
· Multiple (n) word read or write = 3 + (n-1) cycles
· Page relative Jump = 5 cycles
· Off Page Jump = 6 cycles
· Function Call = > 11 cycles

Instruction and Memory Timing (cont.)

Normal:
    Fetch & Dec | Execute
    Fetch & Dec | Execute

Memory read/write:
    F & D | Ex
    F & D | Map | RAS/CAS | Wt. Back | Dead
    F & D | Ex

Block Diagram/Functional Units

· Register File: 6 frames X 40 words per frame (plus a global frame).
  - Static RAM, 8T (extra read transistors)
· Execution Units
  - Multiplier: 4 bits per cycle (can be double clocked) => 6 cycles for 32 X 32. Early completion (if one input is 15 bits or less, it takes only 4 cycles).
  - Adder
  - Funnel Shifter (64 to 32 bits)
  - Priority Encoder (finds the first one). Useful for synthesizing divide, etc.
  - Logical Unit (general 2 variable LU); only has opcodes for AND, OR, XOR

Functional Units (cont.)

· Microcode ROM
  - Double bank implementation (2 micro-instructions fetched per cycle; a mux decides which to use).
· Instruction Buffer and PC logic
  - 32 byte instruction buffer.
  - Continuous, low priority fetch.
  - Instruction buffer bypassing on jump.
· Memory Controller
  - On chip TLB (16 entries).
  - Supports burst mode (multi-cycle) reads and writes.
  - Directly computes RAS and CAS.
  - Offset is used for adjusting fetch address (this is always page-relative).
  - Background instruction fetching (fetch is generally transparent).

Functional Units (cont.)

· Register Mux
  - Addressing for the Register file.
  - TOS register and two ARG registers.
  - Can use TOS and ARG0, or ARG0 and ARG1, for addressing.
· Condition Code
  - Checked in parallel with operation.

Critical Path Timing

Two paths compete: one for decode and one for execution.

Decode Path:
  - Read words from instruction buffer
  - Extract OpCode and IBufN
  - Present the micro-PCs to UCode ROM
  - Fetch from UCode ROM
  - Mux the MI words (requires condition code)
  - Store in register

Execution Path:
  - Read words from RAM
  - Perform operation (Add is the worst case)
  - Get condition code
  - Conditional write-back to register file (based on condition code)
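
The cycle counts listed under Instruction and Memory Timing can be turned into a small estimator. The following is a rough sketch only: the constants come from the slides, but the function names, the operation labels, and the assumption that operation costs simply add with no overlap are hypothetical and not part of the original material. Function calls are left out because the slides give only a lower bound for them.

```python
# Cycle counts taken from the "Instruction and Memory Timing" slide.
GENERIC_CYCLES = 1        # generic instruction: 1 per clock cycle
SINGLE_WORD_CYCLES = 3    # single word read or write
PAGE_RELATIVE_JUMP = 5    # page relative jump
OFF_PAGE_JUMP = 6         # off page jump


def memory_access_cycles(n_words):
    """Cycles to read or write n consecutive words: 3 + (n - 1)."""
    if n_words < 1:
        raise ValueError("n_words must be at least 1")
    return SINGLE_WORD_CYCLES + (n_words - 1)


def sequence_cycles(ops):
    """Estimate cycles for a straight-line sequence of operations.

    Assumption (not from the slides): operations never overlap,
    so their costs are simply summed.
    Each op is a tuple: ("generic",), ("mem", n_words),
    ("page_jump",) or ("off_page_jump",). These labels are hypothetical.
    """
    total = 0
    for op in ops:
        kind = op[0]
        if kind == "generic":
            total += GENERIC_CYCLES
        elif kind == "mem":
            total += memory_access_cycles(op[1])
        elif kind == "page_jump":
            total += PAGE_RELATIVE_JUMP
        elif kind == "off_page_jump":
            total += OFF_PAGE_JUMP
        else:
            raise ValueError("unknown operation: %r" % (kind,))
    return total
```

For example, `memory_access_cycles(4)` gives 6, which shows the benefit of burst mode: four separate single-word accesses would cost 12 cycles under the same figures.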
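
The two 40 bit word formats from the Architectural Overview can be illustrated with a pack/unpack helper. This is a sketch under stated assumptions: the field names and widths come from the slides, but the bit ordering (leftmost field in the most significant bits), the lowercase field keys, and the helper names `pack` and `unpack` are hypothetical choices made for illustration.

```python
WORD_BITS = 40

# (field name, width in bits), listed in slide order.
# Assumption: the first field occupies the most significant bits.
TAGGED_FORMAT = [("gc", 4), ("cdr", 2), ("tag", 2), ("subtype", 4),
                 ("f", 1), ("r", 1), ("rest", 26)]
WIDE_FORMAT = [("gc", 4), ("cdr", 2), ("tag", 2), ("rest", 32)]


def pack(fmt, fields):
    """Pack a dict of field values into one 40 bit integer.

    Fields missing from the dict default to 0.
    """
    if sum(width for _, width in fmt) != WORD_BITS:
        raise ValueError("format does not add up to 40 bits")
    word = 0
    for name, width in fmt:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError("%s does not fit in %d bits" % (name, width))
        word = (word << width) | value
    return word


def unpack(fmt, word):
    """Split a 40 bit integer back into a dict of field values."""
    fields = {}
    shift = WORD_BITS
    for name, width in fmt:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields
```

Both formats share the same leading GC, CDR and TAG fields (8 bits in total), so under the assumed ordering those fields can be read from a word without first knowing which format the remaining 32 bits use.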