Tamarin: A Custom VLSI-Based Lisp Machine
Mark Ross
April 14, 1987

Outline
    Architectural Overview
    System Configuration
    Instruction and Memory Timing
    Block Diagram and Functional Units

Architectural Overview
    Some architectural features:
    40-bit word size, with two layouts (field widths in bits shown above each field):

        4     2     2       4      1   1   26
        GC | CDR | TAG | subtype | F | R | REST

        4     2     2    32
        GC | CDR | TAG | REST

    Stack-oriented, load/store instruction set
    On-chip register frame caching: fixed-size frames (the number of frames is technology dependent)
    Instruction buffer to minimize bus traffic

System Configuration
    1186 Softcard
        Plugs into the memory expansion slot of the 6085
    20 MBytes of memory
    TI 34010 graphics controller chip for display, with its own video RAM and program memory.
        Reads and writes to the video memory are controlled by the processor.
    Memory interface chip to bridge between the outside world and processor memory.

Instruction and Memory Timing
    The Tamarin CMOS processor uses a two-stage instruction pipeline. The first stage performs instruction fetch and decode, while the second stage performs instruction execution. Timings for various operations are shown below:

        Generic instruction rate = 1 instruction per clock cycle
        Single-word read or write = 3 cycles
        Multiple (n) word read or write = 3 + (n-1) cycles
        Page-relative jump = 5 cycles
        Off-page jump = 6 cycles
        Function call = > 11 cycles

Instruction and Memory Timing (cont.)
    Normal (one row per instruction):
        Fetch & Dec | Execute
        Fetch & Dec | Execute

    Mem. read/write (one row per instruction):
        F & D | Ex
        F & D | Map | RAS/CAS | Wt. Back | Dead
        F & D | Ex

Block Diagram/Functional Units
    Register File: 6 frames x 40 words per frame (plus a global frame).
        Static RAM, 8T (extra read transistors)
    Execution Units
        Multiplier
            4 bits per cycle (can be double clocked) => 6 cycles for 32 x 32
            Early completion (if one input is 15 bits or less, takes only 4 cycles)
        Adder
        Funnel Shifter (64 to 32 bits)
        Priority Encoder (finds the first one)
            Useful for synthesizing divide, etc.
        Logical Unit (general 2-variable LU)
            Only has opcodes for AND, OR, XOR

Functional Units (cont.)
    Microcode ROM
        Double-bank implementation (2 micro-instructions fetched per cycle; a mux decides which to use)
    Instruction Buffer and PC Logic
        32-byte instruction buffer
        Continuous, low-priority fetch
        Instruction buffer bypassing on jump
    Memory Controller
        On-chip TLB (16 entries)
        Supports burst-mode (multi-cycle) reads and writes
        Directly computes RAS and CAS
        Offset is used for adjusting the fetch address (this is always page-relative)
        Background instruction fetching (fetch is generally transparent)

Functional Units (cont.)
    Register Mux
        Addressing for the register file
        TOS register and two ARG registers
        Can use TOS and ARG0, or ARG0 and ARG1, for addressing
    Condition Code
        Checked in parallel with the operation

Critical Path Timing
    Two paths compete: one for decode and one for execution.
    Decode Path:
        Read words from the instruction buffer
        Extract OpCode and IBufN
        Present the micro-PCs to the UCode ROM
        Fetch from the UCode ROM
        Mux the MI words (requires condition code)
        Store in register
    Execution Path:
        Read words from RAM
        Perform operation (add is the worst case)
        Get condition code
        Conditional write-back to the register file (based on condition code)
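
Illustrative sketch (not part of the original slides): the two 40-bit word layouts and the memory timing formula from the earlier slides can be modeled in a few lines of Python. The field widths and the 3 + (n-1) cycle count come from the slides. Everything else is an assumption made for illustration: the names pack, unpack, memory_cycles, TAGGED_LAYOUT and PLAIN_LAYOUT are hypothetical, and the slides do not say which end of the word holds the GC field, so the sketch assumes fields are packed most-significant first, in the order listed.

```python
# Hypothetical model of the Tamarin word formats and memory timing.
# Assumption: fields are packed most-significant first, in slide order.

WORD_BITS = 40

# (field name, width in bits), in the order shown on the slide.
TAGGED_LAYOUT = [("gc", 4), ("cdr", 2), ("tag", 2), ("subtype", 4),
                 ("f", 1), ("r", 1), ("rest", 26)]
PLAIN_LAYOUT = [("gc", 4), ("cdr", 2), ("tag", 2), ("rest", 32)]


def pack(layout, **fields):
    """Pack named field values into one 40-bit integer; missing fields are 0."""
    if sum(width for _, width in layout) != WORD_BITS:
        raise ValueError("layout does not add up to 40 bits")
    word = 0
    for name, width in layout:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(layout, word):
    """Split a 40-bit integer into a dict of named fields."""
    fields = {}
    shift = WORD_BITS
    for name, width in layout:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


def memory_cycles(n_words):
    """Cycles for an n-word read or write: 3 + (n-1), per the timing slide."""
    if n_words < 1:
        raise ValueError("n_words must be at least 1")
    return 3 + (n_words - 1)
```

For example, memory_cycles(1) gives 3 cycles for a single-word access and memory_cycles(4) gives 6 cycles for a four-word burst, so each extra word in a burst costs one additional cycle.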