MapCacheLayout.tioga
Written by: Sindhu, June 4, 1985 2:18:11 pm PST
Pradeep Sindhu March 13, 1986 12:36:05 pm PST
Bertrand Serlet July 6, 1985 3:12:12 pm PDT
MAP CACHE LAYOUT
DRAGON PROJECT — FOR INTERNAL XEROX USE ONLY
Map Cache Layout
Release as [Indigo]<Dragon>Documentation>MapCache.tioga, .press

© Copyright 1984 Xerox Corporation. All rights reserved.
Abstract: This document collects together in one place all of the relevant information about the Map Cache's layout. Where possible, the document also says why a particular choice was made in the hope that this will encourage uniform design practice and avoid unnecessary rethinking.
XEROX  Xerox Corporation
   Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, California 94304



For Internal Xerox Use Only
Contents
Action Items
Global Issues
State Control
MInterface
AID Ram
Bypass
ArrayCtl
Cache Array
RAM Cell
Action Items
MUST DO BEFORE SUBMISSION:
1. Alps Gnd and Vdd Extend.
8. Size of Vdd and Gnd for alps and mbusinterface slices.
13. Check that mcTop and mcArray don't interfere in their bus connections.
Get rid of long diffusion runs.
Get rid of extra diffusion close to vertical lines.
Make sure via is not on hilly topography. There are a few cases of this between the ram and ramaccess. Also, there is the case of via abutting diff or poly.
Make sure you put probe points for interesting signals. The rules are: at most 6 probes; at least a via-sized (5x5) metal area to probe. Only the top layer of metal can be probed, i.e. metal2 always; metal1 only if not covered by metal2.
DBus signals: DHold, DExecute, DShift; all of these have the same timing as other MBus signals. That is, driven on phA; sampled on phB.
It appears there is a conflict on the QBus between order and flags during the second cycle of write entry.
phA.ldMatch (rather than just phA) is needed because the match signal is used one cycle after it is latched during a WriteEntry operation.
Global Issues
Power distribution
Vdd should not move by more than 0.25 V at any point, otherwise there is the possibility of latchup. If DI is the change in current and r is the sheet resistance of the supply wire, then 0.25 > DV = r*DI*(L/W). Substituting the value of r for metal2 gives L/W < 5.95/DI, or roughly 6/DI, where DI is in amps.
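As an illustration only, the rule above can be written as a short Python sketch. The function names are hypothetical, and the constant 6 is the rounded metal2 figure quoted above; nothing here is part of the Dragon tools.

```python
# Sketch of the supply-wire rule L/W < 6/DI (DI in amps).
# METAL2_K is the document's rounded figure (5.95 rounded to 6).
# All names are hypothetical.

METAL2_K = 6.0


def max_squares(delta_i_amps):
    """Largest allowed L/W for a wire carrying delta_i_amps."""
    return METAL2_K / delta_i_amps


def min_width(length, delta_i_amps):
    """Smallest width (same units as length) that satisfies the rule."""
    return length / max_squares(delta_i_amps)


if __name__ == "__main__":
    # Example: 0.1 A drawn over a run of length 1200.
    print(max_squares(0.1))
    print(min_width(1200, 0.1))
```

With 0.1 A over a run of 1200, this reproduces the L/W < 60 and W > 20 figures used for the control signal drive transistors.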
Signal Naming
Signals that have the string "xAB" inside them are latched on phA and stable throughout phB. Signals that have the string "xBA" are latched on phB and stable throughout phA. Signals that have neither string inside them are not latched. This convention must be followed rigorously because at least for control signals the names will be parsed to determine which phase to latch them in, if at all. Once both sexes of latch are provided by Alps it will be unnecessary to carry latching information inside signal names although it will still be desirable for checking purposes.
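A minimal sketch of how such name parsing might work, in Python. It assumes the AB/BA marker is the final two characters of the name, as in the signal lists later in this document; the actual parser may use a different rule, and the function name is hypothetical.

```python
# Hypothetical helper: infer the latching phase from a signal name.
# Assumption: the "AB" / "BA" marker is a suffix of the name.


def latch_phase(name):
    """Return 'phA', 'phB', or None for an unlatched signal."""
    if name.endswith("AB"):
        return "phA"   # latched on phA, stable throughout phB
    if name.endswith("BA"):
        return "phB"   # latched on phB, stable throughout phA
    return None        # neither marker: not latched


if __name__ == "__main__":
    for signal in ("drMBusBA", "QSelOrderAB", "MCSelected"):
        print(signal, latch_phase(signal))
```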
Signal Numbering
When there is an array of signals, such as s[0], s[1], ..., s[n], the convention is that s[0] is the most significant bit and s[n] the least. Furthermore, when these signals run along the x axis, s[0] appears at the bottom (that is at the lowest y coordinate) and s[n] appears at the top. Similarly, when the signals run along the y axis, s[0] appears at the left and s[n] at the right (increasing values of the x coordinate).
True signals always appear to the left of and below their complements when both signals are running in parallel.
Centralized Control For Debug Version
For the debug version of the chip all of the control is generated by a centralized Alps block. The outputs from this block will be driven to a CtlOut bus which will lie across the width of the chip.
Furthermore, the signal MCSelected has been set to 1 internally during debugging.
Control Signal Drive Transistors
The capacitance on the address lines is about the largest, so let's do the computation of transistor sizes for these lines. We'll make all the drive transistors the same size so the other signals will be driven fast enough. The section titled "Adrs Section Transistor sizing" gives 4.1 pf as the capacitance on each address line when there are 150 lines. The debug version will have 64 lines, making this capacitance ~2 pf. To this we need to add the capacitance of the horizontal metal run of the CtlOut bus, which is ~5 mm long.
So, the capacitance per line
= 2+50*0.0176 = 3 pf.
If we make the drive transistors 16/2 then they will drive the signal in around 7ns. Of course for tristate drivers the transistors must be 32/2.
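The estimate above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the 2.3 ns per pf figure for a 16/2 transistor is taken from the RamTop driver calculations later in this document, and all names are hypothetical.

```python
# Sketch of the control-line load and drive-time estimate.
ADDRESS_LINE_PF = 2.0         # ~2 pf for the 64-line debug version
METAL2_PF_PER_100UM = 0.0176  # metal2 capacitance per 100 microns
CTLOUT_RUN_100UM = 50         # ~5 mm CtlOut run = 50 * 100 microns
NS_PER_PF_16_2 = 2.3          # assumed 16/2 drive figure (see lead-in)


def control_line_pf():
    """Address-line load plus the CtlOut metal run, in pf."""
    return ADDRESS_LINE_PF + CTLOUT_RUN_100UM * METAL2_PF_PER_100UM


def drive_time_ns(cap_pf, ns_per_pf=NS_PER_PF_16_2):
    """Time for the driver to switch cap_pf, in ns."""
    return cap_pf * ns_per_pf


if __name__ == "__main__":
    cap = control_line_pf()
    print(cap, drive_time_ns(cap))
```

The unrounded load comes out slightly under the 3 pf used above, so the 7 ns figure is on the conservative side.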
Power Supply for Control Signal Drive Transistors
A 16/2 transistor will pull down 2 mA. There are roughly 50 control signals, so the total current drain is 100 mA, or 0.1 A. Using our magic formula (L/W < 6/DI), we get L/W < 60. Since L is 24*50=1200, we have W > 20.
Control Signal for Control Signal Drive Transistors
Because of horizontal pitch constraints we're forced to run the horizontal control signals Dr and nDr in poly. This poly run could be fairly long (as much as 2000 lambda). However, since we don't need to switch these signals except to disable the output of Alps, speed is not a problem.
State Machine
Control Signals
MInterface
Control Signals
BA Type
drMBusBA
AB Type
PSelMBusHi25AB, PSelMBusLo25AB, QSelMBuslo10AB
AID Ram
Control Signals
BA Type
pnSelMasterBA
AB Type
QSelAIDAB, wtAIDAB, ldxpnAB
Bypass
Control Signals
BA Type
RSelBypassBA, ldBypassResultBA
AB Type
forceInAB, forceOutAB, selBypassPPortAB
Order Logic
This is a combinatorial block that computes the 3 bit order field to be used within the return value for each operation. It is generated by Alps.
Control Signals
AB Type
QSelOrderAB
ArrayCtl
Control Signals
BA Type
arrayAdrsSelPBusBA, resetBA, ldMatchBA, prechCBBA, drCBforMatchBA
AB Type
setRefAB, matchValidAB, ctlRPVAB, wtVictimRPVAB, wtMatchingRPVAB, wtAddressedRPVAB, ldRPVInAB, setCtlRPVAB, ldAdrsInLineAB, VSelArrayAB, accessMatchingCAMAB, accessAddressedCAMAB, killAllLinesAB, ctlVPVAB, wtMatchingVPVAB, wtAddressedVPVAB, QSelArrayAIDAB, ldAIDandVPAB, PSelArrayVPAB, drCBforWriteAB, prechMatchAB, accessMatchingRAMAB, connectAccessLinesAB, arrayMatchAB, QSelOrderAB, ldRPandFlagsAB, drRBLinesAB, PSelArrayRPAB
Ctl Line Drive Transistors
The capacitance on the arrayAdrs lines is the largest of all the control signals. Making the transistors for these lines 16/2 will charge the lines in 10ns.
Cache Array
Ram
Ram Cell
This cell is used both in the AID RAM and in the RAM section of the Cache Array. Its vital statistics are:
Pull Down Transistor: 7/2 (this is as large as possible)
Access Transistor: 3/2 (~2 times weaker than the pull down to avoid problems on read)
Pull Up Transistor: 3/5 (~6 times weaker than access so access is able to overpower it)
WxH=45x48
Ram Access Capacitance
= nCols[(Metal2 capacitance) + (gate capacitance for two 3/2 access transistors)]
= nCols[(4*45*0.68E-4) + (2*6*11.8E-4)] = nCols*0.026
= 52*0.026 = 1.35pf
Ram Bit Line Capacitance
= nRows [(Metal capacitance) + (diffusion area capacitance) + (diffusion periphery)]
= nRows[(0.48*0.0182) + (2*1.5*11E-4) + (2*3*10E-4)] = nRows*0.018
= 150*0.018 = 2.7pf
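For checking purposes, the bit line arithmetic can be redone in Python. The sketch below simply restates the three terms from the expression above; the variable and function names are hypothetical.

```python
# Sketch of the Ram bit line capacitance sum, in pf per row.
METAL_PF = 0.48 * 0.0182       # metal run over one 48-high cell
DIFF_AREA_PF = 2 * 1.5 * 11e-4  # diffusion area term
DIFF_PERIPH_PF = 2 * 3 * 10e-4  # diffusion periphery term


def bit_line_pf(n_rows):
    """Total bit line capacitance for n_rows cells, in pf."""
    per_row = METAL_PF + DIFF_AREA_PF + DIFF_PERIPH_PF
    return n_rows * per_row


if __name__ == "__main__":
    print(bit_line_pf(150))
```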
Ram BitLine Drive Transistor
Given a bit line capacitance of 2.7 pf, an 8/2 transistor will be able to drive in 4.5*2.7 = 12 ns, which is fast enough.
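The drive-time estimates in this document all have the form (ns per pf for a given transistor size) times (load in pf). As a convenience, the sketch below collects the per-size figures that appear in this document into one Python table. The table and function names are hypothetical, and sizes not quoted in the document are deliberately left out rather than interpolated.

```python
# ns needed to drive 1 pf, keyed by n-transistor width (W in W/2).
# Values are the ones used in this document's own estimates.
NS_PER_PF = {
    6: 6.0,    # 9/1.5, RamAccess drive transistor
    7: 5.1,    # nArrayMatch pull downs
    8: 4.5,    # Ram bit line drive
    12: 3.0,   # line select drive
    16: 2.3,   # adrsInLineMSB, nPrechMatch drivers
    32: 1.2,   # adrsInLineLSB, nsetref drivers
}


def drive_time_ns(width, cap_pf):
    """Estimated switching time for a width/2 n transistor."""
    return NS_PER_PF[width] * cap_pf


if __name__ == "__main__":
    print(drive_time_ns(8, 2.7))    # Ram bit line
    print(drive_time_ns(32, 8.3))   # adrsInLineLSB line
```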
Ram Bit Line Precharge Transistor
A 10/2 p transistor will be able to charge 2.7 pf in 24 ns or so, which is quite comfortable.
RamTop Vdd sizing
Each column of ram will draw 1mA (current for one 8/2 n transistor). So for 52 columns we have 52 mA. The power formula requires L/W < 6/DI = 115. The ram is 45*52=2340 microns wide, so the power bus must be at least 2340/115=20 microns wide.
RamTop Gnd sizing for 64/2 n transistors
Each column of ram will draw 2mA (current for one 16/2 transistor). So for 52 columns we have 104 mA. The power formula requires L/W < 6/DI = 57.7. The ram is 45*52=2340 microns wide, so the power bus must be at least 2340/57.7=40 microns wide.
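The two RamTop bus widths follow the same pattern, which can be sketched in Python as below. The function name and parameters are hypothetical; the rule L/W < 6/DI and the 45-wide cell pitch are taken from this document.

```python
# Sketch: minimum supply bus width for a row of identical columns.


def bus_width(n_cols, pitch, ma_per_col):
    """Minimum width, in the same units as pitch."""
    total_amps = n_cols * ma_per_col / 1000.0
    length = n_cols * pitch
    max_squares = 6.0 / total_amps   # L/W < 6/DI
    return length / max_squares


if __name__ == "__main__":
    print(bus_width(52, 45, 1.0))  # RamTop Vdd, 1 mA per column
    print(bus_width(52, 45, 2.0))  # RamTop Gnd, 2 mA per column
```

Because both the length and the current grow with the number of columns, the required width grows with the square of the column count.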
RamTop Bus Tristate Drivers
If Nd is the number of tristate drivers per bus line the capacitance these drivers have to switch is
= [capacitance of metal2 bus + Nd*(capacitance due to each driver)]
As a starting point let's assume a drive strength of (8/2)n. The capacitance due to each driver is then
= [n area + n periphery] + [p area + p periphery]
= [16*5*1.5+ 42*3]E-4 + [40*5*3 + 90*3]E-4
= 0.11pf
With Nd=3, the total capacitance
= [8mm*0.176] + [3*0.11] pf
= 1.74 pf.
An (8/2)n transistor will be able to drive this load in 4.5*1.74=7.8ns, which is adequate. So the tristate drivers will be made equivalent to (8/2)n's. Note that roughly the same computation applies to all of the bus drivers since the buses will be all this long or less.
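The bus load calculation above can be restated as a small Python sketch, which makes it easy to try other values of Nd. The names are hypothetical; the numeric terms are copied from the expressions above.

```python
# Sketch of the RamTop tristate bus load, in pf.
N_DIFF_PF = (16 * 5 * 1.5 + 42 * 3) * 1e-4  # n area + n periphery
P_DIFF_PF = (40 * 5 * 3 + 90 * 3) * 1e-4    # p area + p periphery
DRIVER_PF = N_DIFF_PF + P_DIFF_PF           # load added per driver
METAL2_PF_PER_MM = 0.176
NS_PER_PF_8_2 = 4.5                         # (8/2)n drive figure


def bus_load_pf(bus_mm, n_drivers):
    """Metal2 bus capacitance plus the attached drivers."""
    return bus_mm * METAL2_PF_PER_MM + n_drivers * DRIVER_PF


if __name__ == "__main__":
    load = bus_load_pf(8, 3)
    print(DRIVER_PF, load, NS_PER_PF_8_2 * load)
```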
RamTop Bus Tristate Driver Powersupply and Ground
The current pulled by each driver is around 1mA. There are a total of 26 drivers for the Ram, so the total is 0.026 A. L/W < 6/0.026 = 230. Now, L is 2340 lambda, so W > 10.
RamTop adrsInLineMSB drivers
The capacitance for the adrsInLineMSB line
= nLines*[metal capacitance per cell + gate capacitance of 4/2 n transistor]
= 128*[48*0.0182*E-2 + 11.8*8*E-4]
= 1.28*1.81=2.31pf
A 16/2 n transistor will drive such a load in 2.31*2.3 = 5.3 ns. However, it's convenient to use the same driver as for the adrsInLineLSB line, which has more capacitance (see next point).
RamTop adrsInLineLSB drivers
The capacitance for the adrsInLineLSB line
= nCols/2*[metal2 capacitance per cell + gate capacitance of 8 16/2 n transistors]
= 26*[90*0.0176 *E-2 + 11.8*256*E-4]
= 0.26*31.8=8.3pf
A 32/2 n transistor will drive such a load in 1.2*8.3= 10 ns.
Ram Access Control
Use of phB.prechMatch instead of just phB:
The reason we can't just use phB is that we'd like to also read and write the CAM, and this is done during phB. If phB alone were used, it's possible that the precharge transistor would be fighting against some heavy-duty pull downs that happened to mismatch (very likely!).
Pull Down transistors connected to nArrayMatch:
These are made 7/2; here is the justification. The capacitance on nArrayMatch
= nRows[(Metal) + (Diff sidewall) + (Diff Area)]
= nRows [(3*48*0.51E-4) + (24*3.0E-4) + (26*1.5E-4)]
= nRows[0.018444] = 150*0.018444 = 2.76pf;
A 7/2 n transistor drives 1 pf in 5.1 ns, so the line will be pulled low in 5.1*2.76 = 14 ns, which is plenty.
RamAccess Drive transistor:
This is made 6/2; here is the justification. The access line capacitance is 1.35 pf, so a 6/2 will take 9*1.35/1.5 = 8.1 ns.
Ram Power Distribution
There are 52 cells horizontally. In each cell the worst case is when the cell is making a transition and both sides are on. Making the pessimistic assumption that both sides are in saturation, we get a current drain of 2*0.5*(2/4.0)*(2/5.0)*(1/2.5) mA = 0.08 mA per cell, or a total of 4.16 mA. Therefore L/W < 6/.004 = 1500. Given W = 6 lambda, we have L < 9000 lambda. Since the Ram Section is less than 2500 lambda we don't have to put a vertical power grid inside the Ram section.
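The same argument can be checked without rounding the current to 4 mA. The Python sketch below is illustrative only, with hypothetical names; it uses the per-cell current expression and the L/W < 6/DI rule exactly as given above.

```python
# Sketch of the Ram row supply check.


def cell_current_ma():
    """Worst-case current per cell during a transition, in mA."""
    return 2 * 0.5 * (2 / 4.0) * (2 / 5.0) * (1 / 2.5)


def max_run_length(n_cells, width):
    """Longest supply run (in units of width) allowed by L/W < 6/DI."""
    total_amps = n_cells * cell_current_ma() / 1000.0
    return width * 6.0 / total_amps


if __name__ == "__main__":
    print(cell_current_ma())
    print(max_run_length(52, 6))
```

Using the unrounded 4.16 mA gives a somewhat shorter limit than 9000, but it is still well above the 2500 width of the Ram section, so the conclusion is unchanged.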
RamAccessCore Details
RamAccessCoreTop: Size of driver for nPrechMatch line
The capacitance on the nPrechMatch line
=nLines*[metal capacitance per cell + gate capacitance]
=150*[48*0.0182*E-2 + 10*11.8*E-4]
= 150*.021 = 3.15 pf
A 16/2 n transistor will charge this in 2.3*3.15=7 ns, which is plenty.
Cam
Cam Power Distribution
Power and ground must be 8 lambda, and there must be a vertical strap of nLines*2 on the RAM side. Here is the justification:
There are 30 cells horizontally. The worst case is during match when all of the 7/2 pull downs except one are yanking down on the precharged match lines. A 3/4 pulls down 3/16 mA max, so each power line gets a 5.6 mA hit. L/W < 6/.0056 = 1071. Therefore the maximum length for 8 lambda wires is 8568 lambda. Since the Cam section is around 1200, we are quite safe. The total drain from 150 lines will be 0.84 amp.
The size of straps can be justified as follows. Each Vdd and Gnd line in the Cam part is connected to two 6 lambda lines coming from the Ram. One of these is committed to the Ram (since at most one Ram row can be active at one time), so the other one can supply the Cam. For the current required by the Cam, a 6 lambda line can be at most
Cam Access Stitching Frequency
Capacitance per cell of the CamAccess line
= [poly capacitance + two 3/2 gates]
= [0.7*0.024 + 2*6*11.8E-4] pf
= 0.03 pf
Resistance of poly line per cell
= 0.7*175 ohms
= 122 ohms
There are 30 cells, so the total C and R values are 0.9 pf and 3.6 Kohm, and the time constant is 3.2 ns. In a surge of conservatism let's put a strap every 10 cells, which will give us a time constant of 0.3 ns: dead safe!
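The effect of the stitching interval can be explored with a short Python sketch. It uses the same lumped RC estimate as the text (total R times total C over the unstrapped run), with the per-cell values quoted above; the function name is hypothetical.

```python
# Sketch of the lumped RC estimate for a poly CamAccess run.
C_CELL_PF = 0.03    # capacitance per cell
R_CELL_OHM = 122.0  # poly resistance per cell


def rc_ns(n_cells):
    """Lumped time constant of n_cells of unstrapped poly, in ns."""
    total_r = n_cells * R_CELL_OHM
    total_c = n_cells * C_CELL_PF
    return total_r * total_c * 1e-3  # ohm * pf = 1e-3 ns


if __name__ == "__main__":
    print(rc_ns(30))  # no straps
    print(rc_ns(10))  # strap every 10 cells
```

Because both R and C scale with the run length, the time constant goes as the square of the number of cells between straps, so cutting the run to a third reduces it by a factor of nine.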
Cam Match Transistor Sizing
Capacitance of the match line per cell
= [metal2 capacitance+drain area cap+drain periphery]
= [0.39*0.0176+34*1.5E-4+32*3*E-4] pf
= 0.02 pf
There are 30 cells, so the total C is 0.6pf. A 3/2 n transistor will pull this down in 12ns, so a 3/4 transistor seems adequate. The current due to the pull down transistors in one line will be 30*3/16=5.6mA.
CamTop Vdd sizing for precharge to Vdd
Each column of cam will draw 1 mA. So for 30 columns we have 30 mA. The power formula requires L/W < 6/0.03 = 200. The cam is 39*30=1170 microns wide so the power bus must be at least 6 microns wide.
CamTop Gnd sizing for precharge to Gnd
Each column of cam will draw 2.5 mA. So for 30 columns we have 75 mA. The power formula requires L/W < 6/0.075 = 80. The cam is 39*30=1170 microns wide so the power bus must be at least 14 microns wide.
CamTop Gnd sizing for bit line drive transistors
The effective transistor size is 16/2, and only one of CB, nCB will be on so the current per column is 2mA. So for 30 columns we have 60 mA. The power formula requires L/W < 6/0.060 = 100. The cam is 39*30=1170 microns wide so the power bus must be at least 11 microns wide.
VPValid Section Transistor sizing
VPValid Top Transistor sizing for VPVMux
The capacitance of the VPVin line
= nLines*[metal capacitance per cell + diff area + diff periphery]
= 128*[48*0.0182*E-2 + 56*1.5*E-4 +44*3*E-4]
= 128*0.89*E-2=1.14pf
A 10/2 n transistor will charge 1.14 pf in 4.5*1.14*0.8 = 4.1 ns, which is adequate.
RPValid Section Transistor sizing
RPValidTopLeft Transistor sizing for RPVMux
The capacitance of the RPVin line is the same as for VPVin, so a 10/2 n transistor suffices here as well.
Ref Section Transistor sizing
Ref Core Transistors
The transistors for driving nmselected and nrselected are made 7/2. The justification is the same as for the pull down transistors connected to nArrayMatch (see Ram Access Control Section).
RefTop nsetref Inverter
The capacitance on nsetref line
= nLines*[metal capacitance per cell + gate capacitances of a 14/2 n transistor]
= 128*[48*0.0182*E-2 + 28*11.8*E-4]
= 1.28*4.17 = 5.3 pf
A 32/2 n transistor will drive this load in 1.2*5.3=6.4ns, which is adequate.
Adrs Section Transistor sizing
Line Select Drive Transistor
The capacitance on the lineSel line
= lineLength*[capacitance per unit length of metal2]
= 9mm*0.176 = 1.6pf
A 12/2 transistor will drive this line in 3*1.6 = 4.8ns, so the drive transistor is made 12/2.
Address Decoder transistors
These transistors are made 8/2. A chain of 9 of these switches in around 12 ns from the start of the rising edge of phB (see AdrsDecoder8.2.plot).
Address Line Drive Transistors
The capacitance on each address line (each address line has (nLines/2)*2 transistors on it)
= nLines*[(gate capacitance) +(metal capacitance)]
= 150*[16*11.8 + 48*1.82]*E-4
= 4.1pf
If we give the drive transistors 20ns or so to do their job, we can get by with a size of 8/2. However, since the address lines are just like other control lines, and we'd like the control lines to switch faster, let's use 16/2 transistors.
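The address line load scales with the number of lines, so it is handy to have it as a function. The Python sketch below restates the per-line terms from the expression above; the names are hypothetical.

```python
# Sketch of the address line capacitance, in pf.
GATE_PF = 16 * 11.8e-4   # gate capacitance term per line
METAL_PF = 48 * 1.82e-4  # metal run over one 48-high line
PER_LINE_PF = GATE_PF + METAL_PF


def address_line_pf(n_lines):
    """Capacitance on one address line spanning n_lines lines."""
    return n_lines * PER_LINE_PF


if __name__ == "__main__":
    print(address_line_pf(150))  # full array
    print(address_line_pf(64))   # debug version
```

For the 64-line debug version this gives a little under 2 pf, in line with the ~2 pf estimate used when sizing the control signal drive transistors.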
Address Decoder Organization
The line with the lowest address is at the bottom of the array (smallest y value). For the address lines the least significant bit is rightmost. Each line has two decoders in it; one for the left side of the array and the other for the right side. The decoder for the right side is the lower one of the two. In the first version the left array will be omitted, but when this array is included an extra address line will have to be added at the least significant end.
RAM Cell
This cell is used both in the AID RAM and in the RAM section of the Cache Array. Its vital statistics are:
Pull Down Transistor: 7/2 (this is as large as possible)
Access Transistor: 3/2 (~2 times weaker than the pull down to avoid problems on read)
Pull Up Transistor: 3/5 (~6 times weaker than access so access is able to overpower it)
WxH=45x48
RamAccess Capacitance:
= nCols[(Metal2 capacitance) + (gate capacitance for two access transistors)]
= nCols[(4*45*0.68E-4) + (2*6*11.8E-4)] = nCols*0.026
= 52*0.026 = 1.3pf
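As a cross-check, the RamAccess capacitance can be recomputed in Python without rounding the per-column figure. The names below are hypothetical; the terms are taken directly from the expression above.

```python
# Sketch of the RamAccess line capacitance, in pf.
METAL2_PF = 4 * 45 * 0.68e-4  # metal2 over one 45-wide cell
GATES_PF = 2 * 6 * 11.8e-4    # two 3/2 access transistor gates


def ram_access_pf(n_cols):
    """Capacitance of one RamAccess line across n_cols columns."""
    return n_cols * (METAL2_PF + GATES_PF)


if __name__ == "__main__":
    print(ram_access_pf(52))
```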