Summary of Map processor DynaBus interface

Mapping (aid, vp) to rp
  Map(a: AID, v: VirtualPage) Returns (r: RealPage, f: Flags)
    H : 11100,XX,iiiiiiiiii,000000000000000,vvvvvvvvvvvvvvvvvvvvvvXXXXXXXXXX
    D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,XXXXXXXXXXXXXXXXaaaaaaaaaaaaaaaa
    H : 11101,00,iiiiiiiiii,000000000000000,rrrrrrrrrrrrrrrrrrrrrrXXXXXXffff
    D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  NOTE: May also return in error if the entry is not present; the reply packet is then
    H : 11101,10,iiiiiiiiii,000000000000000,XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,IIIIIIIIIIccccccccccccccccccc010

IO Read (aid implicit in AID register)
  ReadMapCacheEntry(v: VirtualPage) Returns (r: RealPage, f: Flags) - implicit aid
    H : 10000,XX,iiiiiiiiii,000000000000000,0101nnnn0Xvvvvvvvvvvvvvvvvvvvvvv
    D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    H : 10001,00,iiiiiiiiii,000000000000000, -- same as in request packet --
    D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,rrrrrrrrrrrrrrrrrrrrrrXXXXXXffff
  NOTE: May also return in error if the entry is not present; the reply packet is then
    H : 10001,10,iiiiiiiiii,000000000000000, -- same as in request packet --
    D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,IIIIIIIIIIccccccccccccccccccc111
  ReadMapCacheRegister(r: RegNum) Returns (d: CARD32)
    H : 10000,XX,iiiiiiiiii,000000000000000,0101nnnn10XXXXXXXXXXXXXXXXXXXrrr
    D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    H : 10001,00,iiiiiiiiii,000000000000000, -- same as in request packet --
    D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,dddddddddddddddddddddddddddddddd

IO Write (aid implicit in AID register)
  WriteMapCacheEntry(v: VirtualPage, r: RealPage, f: Flags, V: Bool)
    H : 10010,1X,iiiiiiiiii,000000000000000,0101nnnn0Vvvvvvvvvvvvvvvvvvvvvvv
    D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,rrrrrrrrrrrrrrrrrrrrrrXXXXXXffff
    H : 10011,00,iiiiiiiiii,000000000000000, -- same as in request packet --
    D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  WriteMapCacheRegister(r: RegNum, d: CARD22)
    H : 10010,1X,iiiiiiiiii,000000000000000,0101nnnn10XXXXXXXXXXXXXXXXXXXrrr
    D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,XXXXXXXXXXdddddddddddddddddddddd
    H : 10011,00,iiiiiiiiii,000000000000000, -- same as in request packet --
    D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  NOTE: Any IOWrite may return in error if mode=1 (not kernel); the reply packet is then
    H : 10011,10,iiiiiiiiii,000000000000000, -- same as in request packet --
    D : XXXXX XX XXXXXXXXXX XXXXXXXXXXXXXXX,IIIIIIIIIIccccccccccccccccccc001

Header decoding
  A header is valid iff:
    Request           Cmd      ZA  DevT  DevN   VP in subset    term
    MapRequest        1 110 0  0   X     X      1               vp
    IOReadRequest     1 000 0  0   5     DevID  if func[0]=0    t*id*(f0+vp)
    IOWriteRequest    1 001 0  0   5     DevID  if func[0]=0    t*id*(f0+vp)
    BIOWriteRequest   1 010 0  0   5     X      if func[0]=0    t*(f0+vp)
  vp means that the virtual page number is in the subset specified by the SubSetMask and SubSetPattern registers.
  t means that the device type is mapcache (5).
  id means that the four low-order bits of the map cache's deviceID match the device number (nnnn) in the request.
  f0 is the high-order bit of the function code. It is 1 for IO operations on registers.
  Let's define Cmd=c0c1c2c3c4, map=c1, iorw=~c1*~c2, biow=~c1*c2*~c3, tio=t*(iorw*id+biow), and K=headerIn*ZA*c0*~c4. I fold c4 into ZA, since a 16-input gate is a beautiful thing, and get ~K = ~headerIn + ~ZA16 + ~c0. Then
    ValidHeader == K*(map*vp + tio*(f0+vp))
                == K*(vp*(map+tio) + tio*f0)
                == NOR[~K, ~(vp*(map+tio) + tio*f0)]
  In addition, for IOWriteRequest and BIOWriteRequest, the mode bit must be 0 (kernel). If it isn't, the header is accepted, but the write does not take place and an error is reported in the reply packet. Watch out: mode=0 means kernel.
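The valid-header equation and the kernel-mode rule can be cross-checked against a small software model. This is a minimal sketch under stated assumptions, not the chip's logic: it assumes DynaBus bit 0 is the most significant bit of each word, that the device type sits in address bits [0..4) and the device number in [4..8) as in the packet pictures above, and it treats headerIn as always true. The helper names (`field`, `pack_header`, `valid_header`, `access_error`) and the name `so` for the second mode-field bit are hypothetical.

```python
"""Hypothetical software model of the map cache header decoding."""

MAPCACHE_TYPE = 5  # "t": device type of the map cache


def field(word: int, first: int, count: int, width: int) -> int:
    """Bits [first..first+count) of a width-bit word, with bit 0 as the MSB."""
    return (word >> (width - first - count)) & ((1 << count) - 1)


def pack_header(cmd: int, mode: int, so: int, dev_id: int, za: int, addr: int) -> int:
    """Assemble a 64-bit header: Cmd(5) mode(1) so(1) DeviceID(10) ZA(15) address(32)."""
    return (cmd << 59) | (mode << 58) | (so << 57) | (dev_id << 47) | (za << 32) | addr


def valid_header(h: int, my_dev_id: int, subset_mask: int, subset_pattern: int) -> bool:
    """ValidHeader == K*(vp*(map+tio) + tio*f0), with headerIn assumed true."""
    c0, c1, c2, c3, c4 = (field(h, i, 1, 64) for i in range(5))
    za = field(h, 17, 15, 64)
    addr = field(h, 32, 32, 64)
    k = za == 0 and c0 == 1 and c4 == 0            # K: zero address bits, c0=1, request (c4=0)
    is_map = c1 == 1                                # map = c1
    iorw = c1 == 0 and c2 == 0                      # IORead (000) or IOWrite (001)
    biow = c1 == 0 and c2 == 1 and c3 == 0          # BIOWrite (010)
    t = field(addr, 0, 4, 32) == MAPCACHE_TYPE
    ident = field(addr, 4, 4, 32) == (my_dev_id & 0xF)  # id: low 4 bits of our DeviceID
    f0 = field(addr, 8, 1, 32) == 1                 # 1 for register operations
    # vp sits in address bits [0..22) for Map and [10..32) for IO requests.
    vp = field(addr, 0, 22, 32) if is_map else field(addr, 10, 22, 32)
    in_subset = (vp & subset_mask) == (subset_pattern & subset_mask)
    tio = t and ((iorw and ident) or biow)
    return k and ((in_subset and (is_map or tio)) or (tio and f0))


def access_error(h: int) -> bool:
    """AccessError == ~c1*(c2#c3)*mode: an IOWrite or BIOWrite with mode=1 (not kernel)."""
    c1, c2, c3 = (field(h, i, 1, 64) for i in (1, 2, 3))
    mode = field(h, 5, 1, 64)
    return c1 == 0 and c2 != c3 and mode == 1
```

In this model a register access (f0=1) is accepted regardless of the subset check, while a table access needs both the device match and vp in the subset, which mirrors the `t*id*(f0+vp)` term in the table.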
  The mode check on writes, with iow=~c1*~c2*c3 (IOWrite) and biow as defined above:
    AccessError == (iow+biow)*mode == ~c1*(c2#c3)*mode

Input FIFO
  The FIFO needs to hold 78 bits:
    (1)  Access error
    (3)  Command: map, ioRead, ioWrite, bioWrite
    (10) DeviceID of requestor
    (32) address
    (32) data = rp+flags, or aid, or any mask and pattern

Control.sch
  Cmd=c1c2c3
  Map           c1
  NotBroadcast  c1+~c2
  WTable        ~AccessError*~f0*~c1*(c2#c3)
  WReg          ~AccessError*f0*~c1*(c2#c3)
  ReadReg       f0*~c1*~c2*~c3
  RegOp         f0*~c1
  Error         AccessError + (Map+ReadEntry)*~match*lookUpMap
  selRpF        c1*~Error
  faultCode     Map*010 + ReadEntry*111 + Write*001

Output.sch
  rpf   IF LookUpMap THEN TableOut.(rp,flags) ELSE (Registers.rpOut,1100)
  reg   content of the selected register

Registers.sch
  All registers are 22 bits wide, except AID, a 16-bit register.
    (0) AID: used by ReadEntry and WriteEntry.
    (1) SharedPattern, (2) SharedMask: specify the shared area.
    (3) BypassPattern, (4) BypassMask, (5) BypassBase: specify the bypass area.
    (6) SubSetMask, (7) SubSetPattern: specify the subset of vp; used for multiple map caches.
  The register RegAddr is accessed if RegOp=1. Its content (left-filled with zeros in the case of AID) is available on RegOut. It is written if WReg=1, and the value written is the low-order part of DIn.
    AddrIn  FifoOut.Address
    DIn     FifoOut.Data
    RegOp   (ioRead+ioWrite+bioWrite) * FifoOut.Address[8]
    WReg    (ioWrite+bioWrite) * FifoOut.Address[8] * ~AccessError
    rpOut   IF ~LookUpMap THEN rp
    vpIn    IF Map THEN FifoOut.IOAddress[0..22) ELSE FifoOut.IOAddress[10..32)
    AID     IF vp IN SharedArea THEN 0 ELSE aid

Multiple map caches
  Two registers accessible through IORead and IOWrite specify a mask and a pattern for vp. If (vp&SubSetMask) # (SubSetPattern&SubSetMask), any request using vp (Map, ReadEntry, or WriteEntry) is ignored. SubSetPattern and SubSetMask are 22-bit quantities.

Mapping algorithm
  For Map and ReadEntry, the following algorithm is used to compute rp and the flags. For WriteEntry, the write is not performed if commandAid=-1 or vp IN BypassArea.
  vp IN BypassArea: vp&BypassMask = BypassPattern&BypassMask
  vp IN SharedArea: vp&SharedMask = SharedPattern&SharedMask
  flags: Dirty, KWtEnable, UWtEnable, URdEnable

  rp ← SELECT TRUE FROM
      commandAid=-1 => vp,
      vp IN BypassArea => (BypassBase&BypassMask) V (vp&~BypassMask),
      ENDCASE => LookUp[vp, IF vp IN SharedArea THEN 0 ELSE commandAid];
  flags ← SELECT TRUE FROM
      commandAid=-1, vp IN BypassArea => Dirty=TRUE, KWtEnable=TRUE, UWtEnable=FALSE, URdEnable=FALSE,
      ENDCASE => LookUp[vp, IF vp IN SharedArea THEN 0 ELSE commandAid];

Reply Packet
  All reply packets are built as
    header = 1 Cmd 1 Error 0 RqDevId Z15 | First
    data   = Z32 | Second
  where
    First ← IF Map THEN (rp Z6 flags) ELSE address
    Second ← IF Error THEN (DeviceID, Z19, fault) ELSE SELECT Cmd FROM
        ReadReg => (extended reg),
        ReadEntry => (rp, Z6, flags),
        Map, Write => (XXXX, but actually the same as for ReadEntry),
        ENDCASE => NULL;

  Error codes:
    MapFault (010)           issued on a Map Cache miss (Map)
    DynaBusOtherFault (111)  issued on a Map Cache miss (ReadEntry)
    IOAccessFault (001)      issued by any IOWrite if mode=1 (not kernel); on a BIOWrite with mode=1, the request is ignored

Booting
  Set reset ...
  After reset, the IOB sets the AID register to -1, so the boot program runs with aid=-1. The IOB writes aid=-1 into all caches, then writes the start code in VM at the reschedule location, then issues a reschedule, then releases PReset of all processors.
  Then the map cache is flushed with 256 IOWrites (cheating by knowing the hash function).

Hashing
  Decompose vp=vpH|vpM|vpL, with vpL the low-order byte, vpM the next byte, and vpH the high-order 6 bits. Similarly, aid=aidH|aidL, where aidL is the low-order byte. Then
    h = vpL XOR vpM XOR aidL XOR aidH
  The low-order byte aidL is not stored in the table; it can be restored as
    aidL = h XOR vpL XOR vpM XOR aidH
  This function is insensitive to the fact that some bits of vp might be stuck if the map cache maps only a part of the address space.
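The mapping algorithm and the hash above can be exercised together in a small model. This is a sketch under assumptions, not the chip's implementation: it assumes the table is direct-mapped with h as the word address (which the 256-word table and 8-bit h suggest), represents commandAid=-1 as the 16-bit value 0xFFFF, assumes the flags are packed Dirty, KWtEnable, UWtEnable, URdEnable from MSB to LSB (consistent with the 1100 constant in Output.sch), and assumes WriteEntry also uses aid 0 for shared pages so that LookUp can find them. All function and register-dictionary names are hypothetical.

```python
"""Hypothetical model of the map cache mapping algorithm and hash."""

MASK22 = (1 << 22) - 1
AID_NONE = 0xFFFF          # commandAid = -1, as a 16-bit AID
BYPASS_FLAGS = 0b1100      # Dirty=1, KWtEnable=1, UWtEnable=0, URdEnable=0 (assumed MSB-first)


def table_index(vp: int, aid: int) -> int:
    """h = vpL XOR vpM XOR aidL XOR aidH (8 bits)."""
    vp_l, vp_m = vp & 0xFF, (vp >> 8) & 0xFF
    aid_l, aid_h = aid & 0xFF, (aid >> 8) & 0xFF
    return vp_l ^ vp_m ^ aid_l ^ aid_h


def restore_aid_l(h: int, vp: int, aid_h: int) -> int:
    """aidL = h XOR vpL XOR vpM XOR aidH, so aidL need not be stored."""
    return h ^ (vp & 0xFF) ^ ((vp >> 8) & 0xFF) ^ aid_h


def in_area(vp: int, pattern: int, mask: int) -> bool:
    return (vp & mask) == (pattern & mask)


def effective_aid(vp: int, command_aid: int, regs: dict) -> int:
    """IF vp IN SharedArea THEN 0 ELSE commandAid."""
    return 0 if in_area(vp, regs["SharedPattern"], regs["SharedMask"]) else command_aid


def lookup(table: dict, vp: int, aid: int):
    """Direct-mapped lookup; returns (rp, flags), or None on a miss."""
    entry = table.get(table_index(vp, aid))
    if entry is None:
        return None
    e_vp, rp, e_aid_h, flags, valid = entry
    # aidL is implied by the index, so matching vp and aidH identifies the full aid.
    if valid and e_vp == vp and e_aid_h == (aid >> 8) & 0xFF:
        return rp, flags
    return None


def map_vp(vp: int, command_aid: int, regs: dict, table: dict):
    """rp/flags selection for Map and ReadEntry; None means a miss (fault)."""
    if command_aid == AID_NONE:
        return vp, BYPASS_FLAGS
    if in_area(vp, regs["BypassPattern"], regs["BypassMask"]):
        mask = regs["BypassMask"]
        return (regs["BypassBase"] & mask) | (vp & ~mask & MASK22), BYPASS_FLAGS
    return lookup(table, vp, effective_aid(vp, command_aid, regs))


def write_entry(table: dict, vp: int, command_aid: int, rp: int, flags: int,
                valid: bool, regs: dict) -> bool:
    """WriteEntry: skipped if commandAid=-1 or vp IN BypassArea."""
    if command_aid == AID_NONE or in_area(vp, regs["BypassPattern"], regs["BypassMask"]):
        return False
    aid = effective_aid(vp, command_aid, regs)  # assumption: shared pages stored under aid 0
    table[table_index(vp, aid)] = (vp, rp, (aid >> 8) & 0xFF, flags, valid)
    return True
```

Because aidL is recovered from the index, a table word in this model only holds vp, rp, aidH, flags and valid, that is 22+22+8+4+1 = 57 bits.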
RAMs
  Available RAM cells (width*height):
    short:   53*30  area=1590
    square:  41*39  area=1599
    tall:    34*52  area=1768
    cache:   26*40  area=1040
  Fifo: 16 words of 78 bits.
    Tall: 1052 by 3062
    Flat: 700 by 4662
  Entry Table: width is 22(vp)+22(rp)+16(aid)+4(flags)+1(valid)-8(hash)=57 bits, since the 8 bits of aidL are recovered from the hash index. There are 256 words, or 14592 bits. With 8 transistors per cell, that is about 117K (roughly 120K) transistors for the array.
  Table: array only.
                          57x256          2x57x128        4x57x128
    Flat (JH style):      3021 by 7680    6042 by 3840
    Square (JH style):    2337 by 9984    4674 by 4992
    Tall (JMF style):     1938 by 13312   3876 by 6656
    Cache cell:           1482 by 10240   2964 by 5120    5928 by 5120
  Table: using the flat RAMs, the whole table takes up x=4060 by y=7290.

Area estimates
  RAMs: x=4060 by y=10498
    Fifo: 16 words of 78 bits -> x=1052 by y=3078
    Table: two blocks of 128 words of 58 (odd version of 57) bits, each x=4060 by y=3698
  SC block: the 2676 cells occupy 13.34 mm2 (no routing).
    With 26 rows, the block is x=5800 by y=8628. Area is 50.0424 mm2, ratio = 3.75.
    With 30 rows, the block is x=5192 by y=9508. Area is 49.3655 mm2, ratio = 3.70.
  Layout in 7 hr: 4 hr placement on Clarissa, 3 hr routing on Dorado.

Possible improvements
  Reset logic: counter for h, 0 on valid (X on other bits), automaton providing the write pulse, started on the rising edge of reset. A special column with reset, or a reset automaton.
  Output enable on the RAM (to assemble larger RAMs without paying for the mux).
  Single-ported RAM for the map table.
  The table could be much bigger, but is it worth the risk in yield?

Notes
  Goal: to provide a Map Cache for a prototype machine. This cache is simple, has short latency, and should be easy to debug and test. The miss rate should be below 20%, but really, nobody knows, so it is pointless to try to improve performance. Another constraint is that this chip must be designed in less than two months, in order to be fabricated with the Small Cache.
  Yield: With 256 entries, the chip has close to 120K transistors.
  I could put more entries, but in the absence of yield data, I don't want to take the risk of a zero yield.
  Commercial RAM: Interfacing to commercial RAMs should be easy: keep the logic and, if there are not enough pins, multiplex. The logic is designed so that slowing down the access to memory is no problem.
  CAM: It should be easy to put a fully associative memory instead of this cache. Timing and latency could stay the same. Flush[aid] could be restored as an atomic operation.
  DBus: Minimum hook-up to the DBus. DeviceID.

MapCacheNotes.tioga
Copyright © 1987 by Xerox Corporation. All rights reserved.
Jean-Marc Frailong November 30, 1987 12:09:16 pm PST
Louis Monier February 14, 1988 9:40:06 pm PST