MapCacheNotes.tioga
Copyright © 1987 by Xerox Corporation. All rights reserved.
Jean-Marc Frailong November 30, 1987 12:09:16 pm PST
Louis Monier February 14, 1988 9:40:06 pm PST
Summary of Map processor DynaBus interface
Mapping (aid, vp) to rp
Map(a: AID, v: VirtualPage) Returns (r: RealPage, f: Flags)
H : 11100,XX,iiiiiiiiii,000000000000000,vvvvvvvvvvvvvvvvvvvvvvXXXXXXXXXX
D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,XXXXXXXXXXXXXXXXaaaaaaaaaaaaaaaa
H : 11101,00,iiiiiiiiii,000000000000000,rrrrrrrrrrrrrrrrrrrrrrXXXXXXffff
D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
NOTE: May also return an error if the entry is not present; the packet is then:
H : 11101,10,iiiiiiiiii,000000000000000,XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,IIIIIIIIIIccccccccccccccccccc010
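The Map packets above can be sketched in Python as follows. This is a minimal illustration, not the chip's logic: it assumes the header is a 64-bit word laid out exactly as drawn (5 command bits, 2 mode/error bits, 10 DeviceID bits, 15 zeros, 32 address bits), and it sets every don't-care (X) bit to 0. The function names are hypothetical.

```python
VP_MASK = (1 << 22) - 1

def pack_header(cmd, flags2, device_id, address):
    """cmd(5) | flags(2) | deviceID(10) | zeros(15) | address(32), MSB first."""
    assert 0 <= cmd < 1 << 5 and 0 <= flags2 < 1 << 2
    assert 0 <= device_id < 1 << 10 and 0 <= address < 1 << 32
    return (cmd << 59) | (flags2 << 57) | (device_id << 47) | address

def map_request(device_id, vp, aid):
    """Header carries vp in the top 22 address bits; data carries aid in the low 16."""
    header = pack_header(0b11100, 0, device_id, (vp & VP_MASK) << 10)
    data = aid & 0xFFFF
    return header, data

def parse_map_reply(header):
    """Return (error, rp, flags) from a Map reply header (command 11101)."""
    error = (header >> 58) & 1          # first of the two bits after the command
    rp = (header >> 10) & VP_MASK       # 22-bit real page
    flags = header & 0xF                # low 4 bits
    return error, rp, flags
```

When the error bit is set, rp and flags are meaningless and the fault code is found in the low 3 bits of the data word instead.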
IO Read (aid implicit in AID register)
ReadMapCacheEntry(v: VirtualPage) Returns (r: RealPage, f: Flags) - implicit aid
H : 10000,XX,iiiiiiiiii,000000000000000,0101nnnn0Xvvvvvvvvvvvvvvvvvvvvvv
D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
H : 10001,00,iiiiiiiiii,000000000000000, -- same as in request packet --
D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,rrrrrrrrrrrrrrrrrrrrrrXXXXXXffff
NOTE: May also return an error if the entry is not present; the packet is then:
H : 10001,10,iiiiiiiiii,000000000000000, -- same as in request packet --
D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,IIIIIIIIIIccccccccccccccccccc111
ReadMapCacheRegister(r: RegNum) Returns (d: CARD32)
H : 10000,XX,iiiiiiiiii,000000000000000,0101nnnn10XXXXXXXXXXXXXXXXXXXrrr
D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
H : 10001,00,iiiiiiiiii,000000000000000, -- same as in request packet --
D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,dddddddddddddddddddddddddddddddd
IO Write (aid implicit in AID register)
WriteMapCacheEntry(v: VirtualPage, r: RealPage, f: Flags, V: Bool)
H : 10010,1X,iiiiiiiiii,000000000000000,0101nnnn0Vvvvvvvvvvvvvvvvvvvvvvv
D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,rrrrrrrrrrrrrrrrrrrrrrXXXXXXffff
H : 10011,00,iiiiiiiiii,000000000000000, -- same as in request packet --
D : XXXXX XX XXXXXXXXXXX XXXXXXXXXXXXXX,XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
WriteMapCacheRegister(r: RegNum, d: CARD22)
H : 10010,1X,iiiiiiiiii,000000000000000,0101nnnn10XXXXXXXXXXXXXXXXXXXrrr
D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,XXXXXXXXXXdddddddddddddddddddddd
H : 10011,00,iiiiiiiiii,000000000000000, -- same as in request packet --
D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
NOTE: Any IOWrite may return an error if kernel=1; the packet is then:
H : 10011,10,iiiiiiiiii,000000000000000, -- same as in request packet --
D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,IIIIIIIIIIccccccccccccccccccc001
Header decoding
A header is valid iff:
                 Cmd      ZA  DevT  DevN   VP in subset   term
MapRequest       1 110 0  0   X     X      1              vp
IOReadRequest    1 000 0  0   5     DevID  if func[0]=0   t*id*(f0+vp)
IOWriteRequest   1 001 0  0   5     DevID  if func[0]=0   t*id*(f0+vp)
BIOWriteRequest  1 010 0  0   5     X      if func[0]=0   t*(f0+vp)
vp means that the virtual page number is in the subset specified by VPMask and VPPattern.
t means that the device type is mapcache (5).
id means that the four low-order bits of deviceID match the deviceNumber in the request.
f0 is the high-order bit of the function code. It is 1 for IO operations on registers.
Let's define Cmd=c0c1c2c3c4c5, iorw=~c1*~c2, biow=~c1*c2*~c3, tio=t*(iorw*id+biow), and K=headerIn*ZA*c0*~c4. I fold ~c4 into ZA (giving ZA16=ZA*~c4) since a 16-input gate is a beautiful thing, and get ~K=~headerIn +~ZA16 +~c0. Then
ValidHeader == K*(map*vp + tio*(f0+vp))
 == K*(vp*(map+tio)+tio*f0)
 == NOR[~K, ~(vp*(map+tio)+tio*f0)]
In addition, for IOWriteRequest and BIOWriteRequest, the mode bit must be 0 (kernel). If it isn't, the header is accepted but the write will not take place and an error will be issued in the reply packet. A trap: mode=0 means kernel.
AccessError == (iow+biow)*mode
 == ~c1*(c2#c3)*mode
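The two equations above can be checked with a short Python sketch. It is only an illustration under assumptions: the command is taken as the 5 bits c0..c4 shown in the packet diagrams, map is decoded as c1*c2*~c3 following the table (Control.sch uses c1 alone), and the argument names are hypothetical.

```python
def decode(cmd):
    """Split a 5-bit command into booleans c0..c4, c0 being the most significant."""
    return [bool((cmd >> (4 - i)) & 1) for i in range(5)]

def valid_header(header_in, za, cmd, t, id_match, f0, vp):
    """ValidHeader == K*(map*vp + tio*(f0+vp)).

    za: the 15 zero bits are all 0; t: device type is mapcache (5);
    id_match: DeviceID matches; vp: virtual page is in the subset.
    """
    c0, c1, c2, c3, c4 = decode(cmd)
    is_map = c1 and c2 and not c3          # assumption: full decode of 110
    iorw = (not c1) and (not c2)           # IORead or IOWrite
    biow = (not c1) and c2 and (not c3)    # broadcast IOWrite
    tio = t and ((iorw and id_match) or biow)
    k = header_in and za and c0 and not c4
    return bool(k and ((is_map and vp) or (tio and (f0 or vp))))

def access_error(cmd, mode):
    """AccessError == ~c1*(c2#c3)*mode, where # is exclusive or."""
    _, c1, c2, c3, _ = decode(cmd)
    return bool((not c1) and (c2 != c3) and mode)
```

For example, a reply packet (c4=1) is never a valid request header, and an IORead never raises AccessError whatever the mode bit is.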
Input FIFO
The fifo needs to hold 78 bits:
(1) Access error
(3) Command for map, ioRead, ioWrite, bioWrite
(10) DeviceID of requestor
(32) address
(32) data=rp+flags, or aid, or any mask and pattern
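As a quick check of the 78-bit total, the FIFO word can be modelled as a packed integer. This is a sketch only: the field widths come from the list above, but the order of the fields inside the word and the function names are assumptions.

```python
# Widths are from the notes; the ordering within the word is an assumption.
FIELDS = [("access_error", 1), ("command", 3), ("device_id", 10),
          ("address", 32), ("data", 32)]
WORD_BITS = sum(width for _, width in FIELDS)

def pack_fifo_word(**values):
    """Concatenate the fields, first field in the most significant position."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word

def unpack_fifo_word(word):
    """Inverse of pack_fifo_word: peel fields off from the low end."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out
```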
Control.sch
Cmd=c1c2c3
Map  c1
NotBroadcast c1+~c2
WTable  ~AccessError*~f0*~c1*(c2#c3)
WReg  ~AccessError*f0*~c1*(c2#c3)
ReadReg  f0*~c1*~c2*~c3
RegOp  f0*~c1
Error  AccessError + (map+read entry)*~match*lookUpMap
selRpF  c1*~Error
faultCode  Map*010+Read{Entry}*111+Write*001
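The purely combinational signals listed above (those that depend only on c1c2c3, f0, and AccessError) can be written out as a small Python sketch. The function name and the dictionary form are hypothetical; Error, selRpF, and faultCode are left out because they also depend on the table match.

```python
def control_signals(c1, c2, c3, f0, access_error):
    """Evaluate the Control.sch equations for one request."""
    c1, c2, c3, f0, err = map(bool, (c1, c2, c3, f0, access_error))
    write = (not c1) and (c2 != c3)        # ioWrite (001) or bioWrite (010)
    return {
        "Map": c1,
        "NotBroadcast": c1 or not c2,
        "WTable": (not err) and (not f0) and write,
        "WReg": (not err) and f0 and write,
        "ReadReg": f0 and not c1 and not c2 and not c3,
        "RegOp": f0 and not c1,
    }
```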
Output.sch
rpf  IF LookUpMap THEN TableOut.(rp,flags) ELSE (Registers.rpOut,1100)
reg  content of the selected register
Registers.sch
All registers are 22 bits wide, except AID, which is a 16-bit register.
(0) AID: used by ReadEntry and WriteEntry.
(1) SharedPattern, (2) SharedMask: specify the shared area.
(3) BypassPattern, (4) BypassMask, (5) BypassBase: specify the bypass area.
(6) SubSetMask, (7) SubSetPattern: specify the subset of vp; used for multiple map caches.
The register RegAddr is accessed if RegOp=1. Its content (left-filled with zeros in the case of AID) is available on RegOut. It is written if WReg=1, and the value is the low-order part of DIn.
AddrIn  FifoOut.Address
DIn  FifoOut.Data
RegOp  (ioRead+ioWrite+bioWrite) * FifoOut.Address[8]
WReg  (ioWrite+ bioWrite) * FifoOut.Address[8] * ~AccessError
rpOut  IF ~LookUpMap THEN rp
vpIn  IF Map THEN FifoOut.IOAddress[0..22) ELSE FifoOut.IOAddress[10..32)
AID  IF vp IN SharedArea THEN 0 ELSE aid
Multiple map caches
Two registers accessible through IORead and IOWrite are used to specify a mask and a pattern for vp. If vp*SubSetMask#SubSetPattern*SubSetMask, any request using vp (Map, Read Entry, or Write Entry) is ignored. SubSetPattern and SubSetMask are 22-bit quantities.
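The subset test can be illustrated in Python as below. The two-cache split on the low-order bit of vp is a made-up example, not a configuration taken from these notes.

```python
MASK22 = (1 << 22) - 1

def in_subset(vp, subset_mask, subset_pattern):
    """True when vp agrees with SubSetPattern on every bit selected by SubSetMask."""
    return (vp & subset_mask & MASK22) == (subset_pattern & subset_mask & MASK22)

# Hypothetical setup: two map caches, one serving even pages, one odd pages.
EVEN_CACHE = dict(subset_mask=0x000001, subset_pattern=0x000000)
ODD_CACHE = dict(subset_mask=0x000001, subset_pattern=0x000001)

def responders(vp):
    """Names of the caches that would accept a request for this vp."""
    caches = {"even": EVEN_CACHE, "odd": ODD_CACHE}
    return [name for name, regs in caches.items() if in_subset(vp, **regs)]
```

With complementary patterns under the same mask, exactly one cache answers any given vp; a mask of 0 makes a single cache accept every page.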
Mapping algorithm
For Map and ReadEntry, the following algorithm is used to compute rp and the flags. For WriteEntry, the write is not performed if commandAid=-1 or vp IN BypassArea.
vp IN BypassArea: vp*BypassMask = BypassPattern*BypassMask
vp IN SharedArea: vp*SharedMask = SharedPattern*SharedMask
flags: Dirty, KWtEnable, UWtEnable, URdEnable
rp ← SELECT TRUE FROM
    commandAid=-1 => vp,
    vp IN BypassArea => (BypassBase*BypassMask) + (vp*~BypassMask),
    ENDCASE => LookUp[vp, IF vp IN SharedArea THEN 0 ELSE commandAid];
flags ← SELECT TRUE FROM
    commandAid=-1, vp IN BypassArea => Dirty=TRUE, KWtEnable=TRUE, UWtEnable=FALSE, and URdEnable=FALSE,
    ENDCASE => LookUp[vp, IF vp IN SharedArea THEN 0 ELSE commandAid];
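A Python rendering of the selection above may help when writing a simulator. It is a sketch under assumptions: the bypass case is read as the masked BypassBase combined by bitwise OR with the unmasked bits of vp, aid=-1 is represented as 0xFFFF, and the table is modelled as a dictionary keyed by (vp, aid) in which a missing key stands for a miss. All Python names are hypothetical.

```python
MASK22 = (1 << 22) - 1
AID_MINUS_ONE = 0xFFFF  # aid = -1 seen as a 16-bit value
FIXED_FLAGS = {"Dirty": True, "KWtEnable": True, "UWtEnable": False, "URdEnable": False}

def in_area(vp, pattern, mask):
    """vp matches pattern on the bits selected by mask."""
    return (vp & mask) == (pattern & mask)

def translate(vp, command_aid, regs, table):
    """Return (rp, flags), or None on a miss in the entry table."""
    if command_aid == AID_MINUS_ONE:
        return vp, FIXED_FLAGS                      # identity mapping
    if in_area(vp, regs["BypassPattern"], regs["BypassMask"]):
        high = regs["BypassBase"] & regs["BypassMask"]
        low = vp & ~regs["BypassMask"] & MASK22
        return high | low, FIXED_FLAGS
    shared = in_area(vp, regs["SharedPattern"], regs["SharedMask"])
    aid = 0 if shared else command_aid              # shared pages use aid 0
    return table.get((vp, aid))
```

Note that the fixed flags correspond to the constant 1100 used in Output.sch when no lookup takes place.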
Reply Packet
All packets are made with
header = 1 Cmd 1 Error 0 RqDevId Z15 | First
data = Z32 | Second
First ← IF Map THEN (rp Z6 flags) ELSE address
Second ← IF Error THEN (DeviceID, Z19, fault)
ELSE SELECT Cmd FROM
Read Reg => (extended reg),
Read Entry => (rp, Z6, flags),
Map, Write => (XXXX, but actually the same as for ReadEntry),
ENDCASE => NULL;
Error codes:
MapFault (010) issued in case of a Map Cache miss  -- Map
DynaBusOtherFault (111) issued in case of a Map Cache miss -- ReadEntry
IOAccessFault (001) issued by any IOWrite if kernel=1 -- IOWrite
if kernel=1 on a BIOWrite, we ignore the request
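The reply layout and the fault codes can be put together in a short Python sketch. It assumes 64-bit header and data words with the fields in the order written above, takes DeviceID in the error word to be the map cache's own 10-bit DeviceID, and uses hypothetical function names.

```python
FAULT = {"MapFault": 0b010, "DynaBusOtherFault": 0b111, "IOAccessFault": 0b001}

def reply_header(cmd3, error, rq_dev_id, first):
    """header = 1 Cmd 1 Error 0 RqDevId Z15 | First, with Cmd = c1c2c3."""
    assert 0 <= cmd3 < 8 and 0 <= rq_dev_id < 1 << 10 and 0 <= first < 1 << 32
    return ((1 << 63) | (cmd3 << 60) | (1 << 59) | (int(bool(error)) << 58)
            | (rq_dev_id << 47) | first)

def error_word(device_id, fault):
    """Second = (DeviceID, Z19, fault): 10 + 19 + 3 bits."""
    assert 0 <= device_id < 1 << 10 and 0 <= fault < 8
    return (device_id << 22) | fault

def map_miss_reply(rq_dev_id, own_dev_id):
    """Reply to a Map request that missed: error bit set, MapFault in the data word."""
    header = reply_header(0b110, True, rq_dev_id, 0)
    data = error_word(own_dev_id, FAULT["MapFault"])   # data = Z32 | Second
    return header, data
```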
Booting
Set reset ...
After reset, the IOB sets the AID register to -1.
The boot program is running with aid=-1.
The IOB writes aid=-1 into all caches,
then writes start code in VM corresponding to the reschedule location,
then issues a reschedule,
then releases PReset of all processors
Then the map cache gets flushed with 256 IOWrites (cheating by knowing the hash function).
Hashing
Let's decompose vp=vpH|vpM|vpL, with vpL the low-order byte, vpM the next one, and vpH the high-order 6 bits. Similarly, aid=aidH|aidL, where aidL is the low-order byte.
Then h=vpL XOR vpM XOR aidL XOR aidH. The low-order byte aidL is not stored in the table, and can be restored as aidL=h XOR vpL XOR vpM XOR aidH.
This function is insensitive to the fact that some bits of vp might be stuck if the map cache maps only a part of the address space.
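The hash and the recovery of aidL can be written in a few lines of Python. This is a sketch of the formulas above; the function names are hypothetical.

```python
def split_vp(vp):
    """Return (vpH, vpM, vpL): high 6 bits, middle byte, low byte of a 22-bit vp."""
    return (vp >> 16) & 0x3F, (vp >> 8) & 0xFF, vp & 0xFF

def hash_index(vp, aid):
    """h = vpL XOR vpM XOR aidL XOR aidH, an 8-bit table index."""
    _, vp_m, vp_l = split_vp(vp)
    return vp_l ^ vp_m ^ (aid & 0xFF) ^ ((aid >> 8) & 0xFF)

def restore_aid(h, vp, aid_high):
    """Rebuild the full 16-bit aid from the index and the stored fields."""
    _, vp_m, vp_l = split_vp(vp)
    aid_low = h ^ vp_l ^ vp_m ^ aid_high
    return (aid_high << 8) | aid_low
```

With aid=0, the values vp=0..255 reach each of the 256 indices exactly once, which is one way a flush by 256 IOWrites could be arranged; whether the actual boot code does this is an assumption.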
RAMs
Available RAM cells are:
short:  53*30 area=1590
square:  41*39 area=1599
tall:  34*52 area=1768
cache:  26*40 area=1040
Fifo: 16 words of 78 bits.
Tall: 1052 by 3062
Flat: 700 by 4662
Entry Table:
Width is 22(vp)+22(rp)+16(aid)+4(flags)+1(valid)-8(hash)=57 bits
There are 256 words, or 14592 bits. With 8tr/cell, that's 120K transistors for the array.
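The arithmetic behind these figures, spelled out in Python as a check (the variable names are hypothetical):

```python
FIELD_BITS = {"vp": 22, "rp": 22, "aid": 16, "flags": 4, "valid": 1}
HASH_BITS = 8                 # bits implied by the table index, hence not stored
ENTRIES = 1 << HASH_BITS      # 256 words
TRANSISTORS_PER_CELL = 8

entry_width = sum(FIELD_BITS.values()) - HASH_BITS
array_bits = entry_width * ENTRIES
array_transistors = array_bits * TRANSISTORS_PER_CELL

print(entry_width, array_bits, array_transistors)
```

The exact product is 116736 transistors, which the text rounds to 120K.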
Table: array only.
                     57x256          2x57x128       4x57x128
Flat (JH style):     3021 by 7680    6042 by 3840
Square (JH style):   2337 by 9984    4674 by 4992
Tall (JMF style):    1938 by 13312   3876 by 6656
Cache cell:          1482 by 10240   2964 by 5120   5928 by 5120
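The array dimensions in this table follow from the cell sizes listed under "Available RAM cells". The Python sketch below reproduces them, assuming that "Flat" refers to the short cell and that NxBxW means N banks side by side, each of W words of B bits; the function name is hypothetical.

```python
# (width, height) of one RAM cell, from the list of available cells.
CELLS = {"short": (53, 30), "square": (41, 39), "tall": (34, 52), "cache": (26, 40)}

def array_size(cell, bits, words_per_bank, banks=1):
    """Return (x, y) of an array of `banks` side-by-side blocks of words_per_bank x bits."""
    cell_w, cell_h = CELLS[cell]
    return banks * bits * cell_w, words_per_bank * cell_h

for cell in CELLS:
    print(cell, array_size(cell, 57, 256), array_size(cell, 57, 128, banks=2))
```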
Table: Using the flat rams, the whole table takes up x=4060 by y=7290.
Area estimates
RAMs: x=4060 by y=10498
Fifo: 16 words of 78 bits -> x=1052 by y=3078
Table: two blocks of 128 words of 58 (odd version of 57) bits, each x=4060 by y=3698
SC block: the 2676 cells occupy 13.34 mm2 (no routing).
With 26 rows, the block is x=5800 by y=8628. Area is 50.0424 mm2, ratio = 3.75.
With 30 rows, the block is x=5192 by y=9508. Area is 49.3655 mm2, ratio = 3.70.
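The area and ratio figures for the SC block can be recomputed as follows. The sketch assumes the x and y dimensions are in microns (the only unit consistent with areas in mm2) and that "ratio" means routed block area divided by the 13.34 mm2 of raw cell area; the function name is hypothetical.

```python
CELL_AREA_MM2 = 13.34   # area of the 2676 standard cells, no routing

def block_area_and_ratio(x_um, y_um, cell_area_mm2=CELL_AREA_MM2):
    """Return (block area in mm2, block area / cell area)."""
    area_mm2 = x_um * y_um / 1e6
    return area_mm2, area_mm2 / cell_area_mm2

for rows, (x, y) in {26: (5800, 8628), 30: (5192, 9508)}.items():
    area, ratio = block_area_and_ratio(x, y)
    print(f"{rows} rows: {area:.4f} mm2, ratio {ratio:.2f}")
```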
Layout in 7 hr: 4 hr placement on Clarissa, 3 hr routing on Dorado.
Possible improvements
Reset logic: counter for h, 0 on valid (X on other bits), automaton providing the write pulse, started on rising edge of reset.
A special column with reset, or a reset automaton
Output enable on ram (to assemble larger rams without paying the mux)
Single-ported ram for map table. The table could be much bigger, but is it worth the risk in yield?
Notes
Goal: to provide a Map Cache for a prototype machine. This cache is simple, has short latency, and should be easy to debug and test. The miss rate should be better than 20%, but really, nobody knows, so it is pointless to try to improve performance. Another constraint is that this chip is to be designed in less than two months, in order to be fabricated with the Small Cache.
Yield: With 256 entries, the chip has close to 120K transistors. I could put in more entries, but in the absence of yield data, I don't want to take the risk of a zero yield.
Commercial RAM: Interfacing to commercial RAMs should be easy: keep the logic and if there are not enough pins, mux. Logic is designed so that slowing down the access to memory is no problem.
CAM: It should be easy to use a fully associative memory in place of this cache. Timing and latency could stay the same. Flush[aid] could be restored as an atomic operation.
DBus: Minimum hook-up to the DBus. DeviceID.