-- Last Edited by: Alan Bell September 24, 1987 6:24:18 pm PDT

Tamarin Documentation

Overview

Tamarin is a high-performance 40-bit computer system designed for running CommonLisp in symbolic computation environments. The system is composed of a custom VLSI microprocessor, a bus interface chip, and 20 megabytes of memory. The processor is a RISC processor that executes most instructions in one cycle time. It also includes a small number of more complicated functions, primarily function call, that execute in several cycle times. It has been specifically designed for the CommonLisp language [Steele].

It has been designed from a system perspective - optimization, lower cost, better interface to software, etc.

The system is being built through a series of processors, each with increasing performance. It is designed to be able to interface to other host systems with minimal work.

Related Documents

Opcode Functional Spec
LP schematics
MI schematics
Microcode Listing
CommonLisp Manual

System Configurations

The system is built as a softcard. A softcard is a board that plugs into an existing computer as a peripheral and acts as an accelerator. In this case, it provides a high-performance Lisp system. The standard computer could be an IBM-PC, a PS/2, a Xerox 6085, etc. The softcard depends on the host computer to provide the hardware and software for interacting with the peripheral devices: the disk, Ethernet, and keyboard. The host computer basically runs the softcard.

The softcard, in its simplest form, consists of the processor, memory, and an interface to the host, as illustrated in Figure Softcard. The host processor communicates through the bus interface to examine and store into the softcard's memory. It uses this ability to send and receive command blocks, and to send and receive data. The host processor also has the ability to start, stop, and interrupt the processor.
The path between the host and the memory must have high bandwidth (20 Mbits/sec) but needn't have extremely low latency.

The processor has a very tight coupling with memory. The memory must respond to a processor request with very low latency. High-bandwidth transfers of blocks of data are also important. The system elements have been optimized for this path.

The processor has limited ability to communicate with the host processor. It can interrupt the host by signaling the bus interface chip.

The purpose of the bus interface chip is to couple the host processor's bus with the softcard memory. It also performs refresh of the host memory, and starts and stops the Lisp Processor. It reformats the word size.

If the softcard processor needs some operating system service performed, say reading a disk page, it posts a request. It puts a command block into the softcard memory, atomically puts a pointer to the command block on a request queue, and then interrupts the host processor. The host processor atomically reads softcard memory and finds the command block. It performs the requested action, puts another command block on the queue of commands for the softcard processor, and interrupts it so that it knows that the request is finished.

The softcard acts as a slave. It has no ability to write to the host's memory or IO devices. There is really no advantage to having this capability. The host will be running an operating system. That operating system needs to coordinate the system peripherals, so direct access would just confuse things. There is no need to expand the memory space by using the host memory. It is too slow and with 1MB chips, ...

There have been several product configurations considered. One is as a softcard to the 6085. ...

Lisp Principles

Since its inception, Lisp has concentrated on providing functionality for the user, sometimes at the expense of performance. Most microprocessor architectures don't have the features to run Lisp as efficiently as possible.
Lisp is a dynamic language. Most decisions are left until runtime. The type of data stored in a variable can change at runtime. Most data is represented by objects that have pointers to them.

Compact Representation of Code
Compact Representation of Data
EQL requirements
Close Coupling to Memory
Garbage Collection primitives
Function call requirements
Built in TLB
System part not component
Invisible pointers - state vars, GC, data driven programming

Lisp Processor Chip

Lisp Processor chip architecture

The basic architecture of Tamarin is based on an internal register file supplying data to the execution unit, which then stores the result back. Additionally, there are functional units which buffer words to and from external memory, maintain the PC, obtain the instruction data, etc. The machine is controlled by horizontal microcode which is 120 bits wide. This controls each of the functional units.

Conventions
Clocking
Naming
Software principles
Byte codes
Tagged pointers
Functional Block Diagram

Register File

The register file is organized as a collection of frames. Each frame is intended to store the information associated with a single function call. The word size is 34 bits. There are 4 to 8 frames with 40 words in each. Additionally, there is one frame of 10 words that stores global (non-stack related) information, such as the stack frame free list, etc.

The register file is addressed by two values: the word within a frame and the frame number. Both the read and write addresses are presented over the same wires. The read address is presented during nClock and the decoded address is latched on the rising edge of Clock. The write address is presented during Clock and is latched on the rising edge of nClock.

The register file returns two words: the word at the presented address and the word at address minus 1. This is done because the instruction set is stack based and most opcodes operate on the two top-of-stack words. Some opcodes need other words in the frame.
However, those opcodes only need one word. There are control signals which control which word or words get gated onto the D1 and/or D2 buses.

The RD1addr microcode field controls which source gates onto D1. If it is zero, then the register file is gated onto D1. The same holds for D2 using the RD2addr microcode field. The addressed register location is normally placed onto D1 and the previous location is placed onto D2. If the DSwap microcode field is set, then the previous location is placed onto D1 and the addressed location is placed onto D2.

The R bus provides the data to be written into the register file at the end of the cycle. If the Waddr microcode field is non-zero and the WriteOK control signal is true, then the R bus is written into the register file. This writing happens during Phi3 (nClock&nClock2). If the WriteOctal microcode field is turned on, then eight consecutive locations are written. In that case the low 3 bits of the word address must be zero. This is primarily used to initialize the Variable section of a frame during a function call.

Execution Units

The execution units perform various operations on the data on the D1 and D2 buses and write the result to the R bus. These units are purely combinatorial logic; they have no registers and store no state. The units are an adder, logical unit, shifter, priority encoder, and a Tag shifter. The operation to be performed is selected by the EUop field of the microcode. The top three bits select the unit and the lower four bits select the specific operation. The EU condition code is returned from the adder.

The adder performs additions and subtractions on the data on the D1 and D2 buses. It also uses the EUCc microcode field to select which condition is placed on the EU Condition Result wire. The choices are Overflow, Carry, Zero result, or T.

The Logical unit performs any of the 16 possible logical operations on 2 inputs - D1 and D2. These operations include AND, OR, XOR, NAND, NOR, etc.
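
The count of 16 logical operations follows from there being 16 possible truth tables for two one-bit inputs. The following is a minimal sketch of that idea in Python, not a description of the actual Tamarin logic. It assumes that the four-bit operation code is itself the truth table and that the data path is 32 bits wide; the function name, the constant names, and the bit encoding are all hypothetical and are not taken from the Tamarin microcode.

```python
# Sketch only: the op encoding and the 32-bit width are assumptions,
# not the documented Tamarin EUop encoding.
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1

def logical_unit(op, d1, d2):
    """Apply a 2-input logical operation bitwise to d1 and d2.

    op is a 4-bit truth table: bit (2*a + b) of op is the output
    for an input bit a taken from D1 and an input bit b taken from D2.
    """
    nd1 = ~d1 & MASK
    nd2 = ~d2 & MASK
    result = 0
    if op & 0b0001:           # output 1 where D1=0 and D2=0
        result |= nd1 & nd2
    if op & 0b0010:           # output 1 where D1=0 and D2=1
        result |= nd1 & d2
    if op & 0b0100:           # output 1 where D1=1 and D2=0
        result |= d1 & nd2
    if op & 0b1000:           # output 1 where D1=1 and D2=1
        result |= d1 & d2
    return result & MASK

# Hypothetical operation codes under the truth-table encoding above.
AND = 0b1000
OR = 0b1110
XOR = 0b0110
NAND = 0b0111
NOR = 0b0001
```

With this encoding every one of the 16 four-bit codes is a valid operation, including the constant results (code 0 gives all zeros, code 15 gives all ones) and the pass-through of either input.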
The Shifter takes a 64-bit input formed by concatenating the D1 and D2 data and left shifts that word to form a 32-bit result. The shift distance is the value of the MuxBus. Given that D1 and D2 can be set to zero or to the same value, a variety of shift types are possible.

The Priority Encoder takes the value on D1 and determines the number of zero bits before the first one bit. The result is placed on the R bus.

The Tag shift unit is used to set and get the tag field of a pointer.

Special Registers

The Special Register unit is another way to place data onto the D1 or D2 bus without using the register file. For example, the register file can be used for D1 and a constant from Special Registers used for D2. It can place particular constants, the value of Temp Reg, or the value on MuxBus onto D1 and D2.

A single temporary register is contained in this unit. When the RD2addr microcode field so indicates, the temporary register is written with the data on the R bus. This is written during Phi3 if the WriteOK clock field is on.

The value to be placed on D1 is selected by RD1addr. If the most significant bit is on, then this unit places a value on D1. The low-order bits select which value. For D1, the choice is MuxBus, Temp Reg, or 2 constants. The 2 constants are determined by Lisp code. D2 is similar, with 6 constants to choose from.

DPCc

The Condition Code unit looks at the values on the D1 and D2 buses. It checks for a particular condition such as D1=0 or D2's Tag is integer, etc. The condition to be used is selected by the DpCC microcode field. The conditions to be checked are hard coded in the chip by the Lisp configuration file. Each condition is made up of one or more conditions on D1 and D2. Each bit position can be D1=zero, one, or don't care; D2=zero, one, or don't care; or D1=D2. The conditions of all bit positions are ANDed together to form one row. One or more rows are connected to form the ultimate result, which is placed on DpConditionResult.
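
The row structure of the Condition Code unit can be illustrated with a small software model. The following is a sketch under stated assumptions, not the chip's implementation: it assumes that the rows of one condition are ORed together (the text above says only that they are "connected"), it uses the 34-bit word size given for the register file, and the tag position and integer tag value are invented for the example. The names Row, dp_condition, TAG_SHIFT, and INTEGER_TAG are all hypothetical.

```python
from dataclasses import dataclass

WORD_BITS = 34                    # register file word size from the text
MASK = (1 << WORD_BITS) - 1

@dataclass(frozen=True)
class Row:
    """One row: per-bit tests that are all ANDed together.

    A set bit in d1_mask means "test this bit of D1"; the required
    value is the same bit of d1_value. Clear mask bits are don't care.
    d2_mask and d2_value work the same way for D2.
    A set bit in eq_mask means "D1 and D2 must agree at this bit".
    """
    d1_mask: int = 0
    d1_value: int = 0
    d2_mask: int = 0
    d2_value: int = 0
    eq_mask: int = 0

    def matches(self, d1, d2):
        return ((d1 & self.d1_mask) == self.d1_value
                and (d2 & self.d2_mask) == self.d2_value
                and ((d1 ^ d2) & self.eq_mask) == 0)

def dp_condition(rows, d1, d2):
    """Assumption: the condition is true if any of its rows matches."""
    return any(row.matches(d1, d2) for row in rows)

# Example conditions. The tag layout below is hypothetical.
TAG_SHIFT = 32                    # hypothetical: tag in the top 2 bits
TAG_MASK = 0b11 << TAG_SHIFT
INTEGER_TAG = 0b01                # hypothetical tag value for integers

D1_IS_ZERO = [Row(d1_mask=MASK, d1_value=0)]
D1_EQUALS_D2 = [Row(eq_mask=MASK)]
D2_TAG_IS_INTEGER = [Row(d2_mask=TAG_MASK,
                         d2_value=INTEGER_TAG << TAG_SHIFT)]
EITHER_IS_ZERO = [Row(d1_mask=MASK), Row(d2_mask=MASK)]   # two rows
```

In this model a single row covers conditions such as D1=0 or D1=D2, while a condition such as "either input is zero" needs two rows, which is why a condition may be built from more than one row.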
Instruction Buffer and PC

The instruction buffer prefetches words of Lisp byte code and presents them to the rest of the machine for execution. It outputs three forms of data: newOpcode, newIBufN, and IBufData. NewOpcode is the next opcode to be executed. It is at the location that the Program Counter points at. The following byte is presented as

Memory Buffer, TLB
UCode Rom
General Control
Memory Control
Register Multiplexor
UCode Fields
External Signals
Timing System Timing memory System Timing LP Timing Specs
Pinout
Tools
Functional Simulator
Microcode Assembler
Memory Interface Chip
PC Board

CMOS I.5 improvements

Write back pipelining
Registers behind PC, Stack word 0
2 bit per cycle mult with load control for floating point
Input to PC for absolute jumps
Memory Buffer with Parity Check, GC, Cdr coding
Flag register
After the fact abort on memory fetch
Overflow detect on shifter
T/NIL on CCode not on RBus
UCode subroutines
Change IBuf timing
Number of Frames
Better Mem control

CMOS II improvements

Static Ram Cache
Tlb size
Jump ideas
1 cycle cache return instruction fast context switch indirect pointers

Figure Softcard