TamDoc.tioga
-- Last Edited by: Alan Bell September 24, 1987 6:24:18 pm PDT
Tamarin Documentation
Overview
Tamarin is a high-performance 40-bit computer system designed for running CommonLisp in symbolic computation environments. The system is composed of a custom VLSI microprocessor, a bus interface chip, and 20 megabytes of memory. The processor is a RISC design that executes most instructions in one cycle time. It also includes a small number of more complicated functions, primarily function call, that execute in several cycle times. It has been specifically designed for the CommonLisp language [Steele].
It has been designed from a system perspective: optimization, lower cost, better interface to software, etc.
The system is being built through a series of processors, each with increasing performance.
It is designed to be able to interface to other host systems with minimal work.
Related Documents
Opcode Functional Spec
LP schematics
MI schematics
Microcode Listing
CommonLisp Manual
System Configurations
Figure Softcard
The system is built as a softcard. A softcard is a board that plugs into an existing computer as a peripheral and acts as an accelerator. In this case, it provides a high-performance Lisp system. The standard computer could be an IBM-PC, a PS/2, a Xerox 6085, etc. The softcard depends on the host computer to provide the hardware and software for interacting with the peripheral devices: the disk, Ethernet, and keyboard. The host computer basically runs the softcard.
The softcard, in its simplest form, consists of the processor, memory, and an interface to the host, as illustrated in Figure Softcard. The host processor communicates through the bus interface to examine and store into the softcard's memory. It uses this ability to send and receive command blocks, and to send and receive data. The host processor also has the ability to start, stop, and interrupt the processor. The path between the host and the memory must have high bandwidth (20 Mbits/sec) but needn't have extremely low latency.
The processor has a very tight coupling with memory. The memory must respond to a processor request with very low latency. High bandwidth transfers of blocks of data are also important. The system elements have been optimized for this path. The processor has limited ability to communicate with the host processor. It can interrupt the host by signaling the bus interface chip.
The purpose of the bus interface chip is to couple the host processor's bus with the softcard memory. It also performs refresh of the host memory, and starts and stops the Lisp Processor. It reformats the word size.
If the softcard processor needs some operating system service performed, say reading a disk page, it posts a request. It puts a command block into the softcard memory, atomically puts a pointer to the command block on a request queue, and then interrupts the host processor. The host processor atomically reads softcard memory and finds the command block. It performs the requested action, puts another command block on the queue of commands for the softcard processor, and interrupts it so that it knows that the request is finished.
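The request and reply exchange described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the class and function names, the command block layout, and the use of two queues are hypothetical, and the interrupts and atomic memory operations are represented only by comments.

```python
from collections import deque

class SoftcardMemory:
    """Toy model of softcard memory shared with the host (all names hypothetical)."""
    def __init__(self):
        self.blocks = {}             # address -> command block
        self.to_host = deque()       # request queue read by the host
        self.to_softcard = deque()   # command queue read by the softcard
        self.next_addr = 0

    def store_block(self, block):
        addr = self.next_addr
        self.blocks[addr] = block
        self.next_addr += 1
        return addr

def softcard_post_request(mem, op, arg):
    """Softcard side: store a command block, then enqueue a pointer to it."""
    addr = mem.store_block({"op": op, "arg": arg})
    mem.to_host.append(addr)     # stands in for the atomic enqueue
    return addr                  # the softcard would now interrupt the host

def host_service_one(mem, handlers):
    """Host side: fetch one request, perform it, and post a completion block."""
    addr = mem.to_host.popleft()
    request = mem.blocks[addr]
    result = handlers[request["op"]](request["arg"])
    reply = mem.store_block({"op": "done", "request": addr, "result": result})
    mem.to_softcard.append(reply)   # the host would now interrupt the softcard
    return reply

mem = SoftcardMemory()
req = softcard_post_request(mem, "read_disk_page", 7)
reply = host_service_one(mem, {"read_disk_page": lambda page: f"page-{page}"})
```

Note that only the host touches the disk here; the softcard side only ever reads and writes its own memory, which matches its role as a slave.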
The softcard acts as a slave. It has no ability to write to the host's memory or IO devices. There is really no advantage to having this capability. The host will be running an operating system. That operating system needs to coordinate the system peripherals so direct access would just confuse things. There is no need to expand the memory space by using the host memory. It is too slow and with 1MB chips, ...
There have been several product configurations considered. One is as a softcard to the 6085. ...
Lisp Principles
Since its inception, LISP has concentrated on providing functionality for the user, sometimes at the expense of performance. Most microprocessor architectures don't have the features to run Lisp as efficiently as possible. Lisp is a dynamic language. Most decisions are left until runtime. The type of data stored in a variable can change at runtime.
Most data is represented by objects that have pointers to them.
Compact Representation of Code
Compact Representation of Data
EQL requirements
Close Coupling to Memory
Garbage Collection primitives
Function call requirements
Built in TLB
System part not component
Invisible pointers - state vars, GC, data driven programming
Lisp Processor Chip
Lisp Processor chip architecture
The basic architecture of Tamarin is based on an internal register file supplying data to the execution unit, which then stores the result. Additionally, there are functional units which buffer words to and from external memory, maintain the PC, obtain the instruction data, etc. The machine is controlled by horizontal microcode which is 120 bits wide. This controls each of the functional units.
Conventions
Clocking
Naming
Software principles
Byte codes
Tagged pointers
Functional Block Diagram
Register File
The register file is organized as a collection of frames. Each frame is intended to store the information associated with a single function call. The word size is 34 bits. There are 4 to 8 frames with 40 words in each. Additionally, there is one frame of 10 words that stores global (non-stack related) information, such as the stack frame free list, etc.
The register file is given two addresses: the word within a frame and the frame number. Both the read and write addresses are presented over the same wires. The read address is presented during nClock and the decoded address is latched on the rising edge of Clock. The write address is presented during Clock and is latched on the rising edge of nClock.
The register file returns two words: the word at the presented address and the word at address minus 1. This is done because the instruction set is stack based and most opcodes operate on the two top of stack words. Some opcodes need other words in the frame. However, those opcodes only need one word.
There are control signals which control which word or words get gated onto the D1 and/or D2 buses. The RD1addr microcode field controls which source gates onto D1. If it is zero, then the register file is gated onto D1. The same holds for D2 using the RD2addr microcode field. The addressed register location is normally placed onto D1 and the previous location is placed onto D2. If the DSwap microcode field is set, then the previous location is placed onto D1 and the addressed location is placed onto D2.
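The two-word read and the effect of DSwap can be sketched as follows. This is a minimal model, assuming that RD1addr and RD2addr are both zero (so the register file drives both buses), that there are 4 frames, and that the addressed word is at least 1; the function name and the list-of-lists storage are hypothetical.

```python
WORDS_PER_FRAME = 40   # from the text
NUM_FRAMES = 4         # the text allows 4 to 8; 4 is assumed here

def read_register_file(frames, frame, word, dswap=False):
    """Return (D1, D2) for one read of the register file."""
    if not 1 <= word < WORDS_PER_FRAME:
        # Behaviour at word 0 is not described in the text, so it is rejected here.
        raise ValueError("word address must be between 1 and 39 in this sketch")
    addressed = frames[frame][word]
    previous = frames[frame][word - 1]
    # Normally D1 gets the addressed word and D2 the previous one; DSwap exchanges them.
    return (previous, addressed) if dswap else (addressed, previous)

frames = [[0] * WORDS_PER_FRAME for _ in range(NUM_FRAMES)]
frames[2][5] = 111   # next-to-top of stack
frames[2][6] = 222   # top of stack
d1, d2 = read_register_file(frames, 2, 6)
s1, s2 = read_register_file(frames, 2, 6, dswap=True)
```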
The R bus provides the data to be written into the register file at the end of the cycle. If the Waddr microcode field is non-zero and the WriteOK control signal is true, then the R bus is written into the register file. This writing happens during Phi3 (nClock&nClock2). If the WriteOctal microcode field is turned on, then eight consecutive locations are written. The low 3 bits of the word address must be zero. This is primarily used to initialize the Variable section of a frame during a function call.
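The write rules can be sketched in the same style. The assumptions here are that `waddr` is the raw microcode field and `word` is the already decoded word address, and that WriteOctal stores the same R bus value into all eight locations; the text does not state either point explicitly, and the function name is hypothetical.

```python
WORDS_PER_FRAME = 40   # from the text
NUM_FRAMES = 4         # the text allows 4 to 8; 4 is assumed here

def write_register_file(frames, frame, word, r_bus, waddr, write_ok, write_octal=False):
    """Apply one end-of-cycle write and return the word addresses written."""
    if waddr == 0 or not write_ok:
        return []                      # no write this cycle
    if write_octal:
        if word & 0b111:
            raise ValueError("WriteOctal needs the low 3 bits of the word address to be zero")
        targets = list(range(word, word + 8))
    else:
        targets = [word]
    for w in targets:
        frames[frame][w] = r_bus       # assumed: the same value goes to every target
    return targets

frames = [[0] * WORDS_PER_FRAME for _ in range(NUM_FRAMES)]
written = write_register_file(frames, 1, 8, 0x2A, waddr=1, write_ok=True, write_octal=True)
skipped = write_register_file(frames, 1, 20, 0x2A, waddr=1, write_ok=False)
```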
Execution Units
The execution units perform various operations on the data on the D1 and D2 buses and write the result to the R bus. These units are purely combinatorial logic and have no registers nor store any state. The units are an adder, logical unit, shifter, priority encoder, and a Tag shifter. The operation to be performed is selected by the EUop field of the microcode. The top three bits select the unit and the lower four bits select the specific operation. The EU condition code is returned from the adder.
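Splitting EUop into its two parts can be sketched as below. The text implies a 7-bit field (three unit-select bits above four operation bits); which unit number corresponds to which unit is not given, so the sketch only separates the fields. The function name is hypothetical.

```python
def decode_euop(euop):
    """Split a 7-bit EUop value into (unit_select, operation)."""
    if not 0 <= euop < (1 << 7):
        raise ValueError("EUop is assumed to be a 7-bit field")
    unit_select = euop >> 4     # top three bits
    operation = euop & 0xF      # lower four bits
    return unit_select, operation

unit, op = decode_euop(0b1010011)
```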
The adder performs additions and subtractions on the data on the D1 and D2 buses. It also takes the EUCc microcode field to select which condition is placed on the EU Condition Result wire. The choices are Overflow, Carry, Zero result, or T.
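A rough model of the adder with its selectable condition output is sketched here. Several details are assumptions: a 32-bit data width, Overflow meaning two's-complement signed overflow, T meaning an always-true condition, and the string names used in place of the actual EUCc encodings.

```python
def add_with_condition(d1, d2, cond, width=32):
    """Add D1 and D2 and return (result, selected condition)."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a, b = d1 & mask, d2 & mask
    total = a + b
    result = total & mask
    conditions = {
        "carry": total > mask,
        # Signed overflow: operands share a sign and the result's sign differs.
        "overflow": bool(~(a ^ b) & (a ^ result) & sign),
        "zero": result == 0,
        "t": True,
    }
    return result, conditions[cond]
```

For example, `add_with_condition(0x7FFFFFFF, 1, "overflow")` reports overflow under these assumptions, while `add_with_condition(0xFFFFFFFF, 1, "carry")` reports a carry out with a zero result.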
The Logical unit performs any of the 16 possible logical operations on 2 inputs - D1 and D2. These operations include AND, OR, XOR, NAND, NOR, etc.
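One common way to obtain all 16 two-input functions is to treat a 4-bit operation code as a truth table. The sketch below does that; the encoding (bit `2*d1_bit + d2_bit` of the table gives the output) and the default 34-bit width are assumptions for illustration, not the chip's documented encoding.

```python
def logical_op(table, d1, d2, width=34):
    """Apply the two-input boolean function described by a 4-bit truth table, bitwise."""
    mask = (1 << width) - 1
    d1 &= mask
    d2 &= mask
    nd1 = ~d1 & mask
    nd2 = ~d2 & mask
    result = 0
    if table & 0b0001:
        result |= nd1 & nd2   # output is 1 where D1=0, D2=0
    if table & 0b0010:
        result |= nd1 & d2    # output is 1 where D1=0, D2=1
    if table & 0b0100:
        result |= d1 & nd2    # output is 1 where D1=1, D2=0
    if table & 0b1000:
        result |= d1 & d2     # output is 1 where D1=1, D2=1
    return result

# Truth tables for some named operations under this assumed encoding.
AND, OR, XOR, NAND, NOR = 0b1000, 0b1110, 0b0110, 0b0111, 0b0001
```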
The Shifter takes a 64 bit input by concatenating the D1 and D2 data and left shifts that word to form a 32 bit result. The shift distance is the value of the MuxBus. Given that D1 and D2 can be set to zero or to the same value, a variety of shift types are possible.
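The way one left shift over a concatenated pair yields several shift types can be sketched as a funnel shifter. The assumptions are that D1 forms the high half of the 64-bit word, that the result is the high 32 bits after shifting, and that the distance lies between 0 and 32; the function name is hypothetical.

```python
def funnel_shift(d1, d2, distance):
    """Left-shift the 64-bit word D1:D2 and return the high 32 bits."""
    if not 0 <= distance <= 32:
        raise ValueError("distance is assumed to be between 0 and 32")
    m32 = 0xFFFFFFFF
    combined = ((d1 & m32) << 32) | (d2 & m32)
    return ((combined << distance) >> 32) & m32

rotated = funnel_shift(0x80000001, 0x80000001, 1)   # D1 = D2: rotate left by 1
shifted_left = funnel_shift(0x00000001, 0, 4)       # D2 = 0: logical left shift by 4
shifted_right = funnel_shift(0, 0x000000F0, 28)     # D1 = 0: logical right shift by 32 - 28 = 4
```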
The Priority Encoder takes the value on D1 and determines the number of zero bits before the first one bit. The result is placed on the R Bus.
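The priority encoder's result is a leading-zero count, which can be sketched in one line. The 32-bit width and the result for an all-zero input (the full width) are assumptions; the text does not say what the hardware returns when no bit is set.

```python
def priority_encode(d1, width=32):
    """Count the zero bits above the most significant one bit of D1."""
    d1 &= (1 << width) - 1
    return width - d1.bit_length()   # an all-zero input yields `width` in this sketch
```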
The Tag shift unit is used to set and get the tag field of a pointer.
Special Registers
The Special register unit is another way to place data onto the D1 or D2 bus without using the register file. For example, the Register file can be used for D1 and a constant from Special Registers used for D2. It can place particular constants, the value of Temp Reg, or the value on MuxBus onto D1 and D2. A single temporary register is contained in this unit. When the RD2addr microcode field so indicates, the temporary register is written with the data on the R bus. This is written during Phi3 if the WriteOK clock field is on.
The value to be placed on D1 is selected by RD1addr. If the most significant bit is on, then this unit places a value on D1. The low-order bits select which value. For D1, the choice is MuxBus, Temp reg, or 2 constants. The 2 constants are determined by Lisp code. D2 is similar, with 6 constants to choose from.
DPCc
The Condition Code unit looks at the value on the D1 and D2 buses. It checks for a particular condition such as D1=0 or D2's Tag is integer, etc. The condition to be used is selected by the DpCC microcode field. The conditions to be checked are hard coded in the chip by the Lisp configuration file. Each condition is made up of one or more conditions on D1 and D2. Each bit position can be a D1=zero, one, or don't care, D2=zero, one, or don't care, or D1=D2. The conditions on each bit are ANDed together to form one row. One or more rows are connected to form the ultimate result, which is placed on DpConditionResult.
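The row structure can be sketched with bit masks. Each row records which bits of D1 and D2 are cared about, the required values of those bits, and which bit positions must be equal between D1 and D2. Combining rows with OR is an assumption (the text says only that rows are "connected"), and the `Row` type, its field names, and the 32-bit example are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Row:
    d1_care: int = 0    # bits of D1 that must match d1_value
    d1_value: int = 0   # required values for those bits (subset of d1_care)
    d2_care: int = 0    # bits of D2 that must match d2_value
    d2_value: int = 0   # required values for those bits (subset of d2_care)
    eq_mask: int = 0    # bit positions where D1 must equal D2

    def matches(self, d1, d2):
        # All per-bit conditions in a row are ANDed together.
        return ((d1 & self.d1_care) == self.d1_value
                and (d2 & self.d2_care) == self.d2_value
                and ((d1 ^ d2) & self.eq_mask) == 0)

def dp_condition(rows, d1, d2):
    """Assumed combination: the result is true if any row matches."""
    return any(row.matches(d1, d2) for row in rows)

ALL = 0xFFFFFFFF
d1_is_zero = [Row(d1_care=ALL, d1_value=0)]
d1_equals_d2 = [Row(eq_mask=ALL)]
zero_or_equal = d1_is_zero + d1_equals_d2
```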
Instruction Buffer and PC
The instruction buffer prefetches words of Lisp byte code and presents them to the rest of the machine for execution. It outputs three forms of data: newOpcode, newIBufN, and IBufData. NewOpcode is the next opcode to be executed. It is at the location that the Program Counter points at. The following byte is presented as
Memory Buffer, TLB
UCode Rom
General Control
Memory Control
Register Multiplexor
UCode Fields
External Signals
Timing
System Timing
Memory System Timing
LP Timing Specs
Pinout
Tools
Functional Simulator
Microcode Assembler
Memory Interface Chip
PC Board
CMOS I.5 improvements
Write back pipelining
Registers behind PC, Stack word 0
2 bit per cycle mult with load control for floating point
Input to PC for absolute jumps
Memory Buffer with Parity Check, GC, Cdr coding
Flag register
After the fact abort on memory fetch
Overflow detect on shifter
T/NIL on CCode not on RBus
UCode subroutines
Change IBuf timing
Number of Frames
Better Mem control
CMOS II improvements
Static Ram Cache
Tlb size
Jump ideas
1 cycle
cache return instruction
fast context switch
indirect pointers