4 blocks per assoc in small cache.
CWS turns around with correct data.
Writes turn around at the cache with owner and ~shared.
Data always valid at and below cache with owner and shared below.
RB snooper can record valid data. Retry not needed.
Kill Block: to enable >2 levels.
Assoc limits: the last level isn't doing work unless assoc limits are reached (and kill is needed). Instead: larger blocks, more time, external matches.

words blocks smlLine bigLine
blocks 8
smlLine 32 4
bigLine 1024 32 8
bigCache 256 32 blocks/bigLine * 7 data bits/block 7 data words/bigLine

Big cache requirements:
The number of blocks matched in a big cache must be >= the sum of all the caches below it. A big cache can maintain data off chip to make room for more matchers. Since the data is maintained off chip on big caches, high-speed consistency maintenance operations triggered by monitoring the bus above must be simple. Specifically, CWS and WS are difficult to implement, since CWS requires a read, compare, write operation on off-chip data in 125 ns (5 cycles) and WS requires a 50 ns (2 cycle) write, assuming a 25 ns bus cycle.

Proposed Changes:
SmallCache stores 4 blocks per matcher rather than one. This makes it possible for a big cache to cover 8 to 16 small caches.
Simplify write protocols.

Old bus write operation protocols:
  CWS  Rqst 2 cycles  Rply 5 cycles  mirrors request (3 dummy cycles to provide time)
  WS   Rqst 2 cycles  Rply 2 cycles  mirrors request
  WB   Rqst 5 cycles  Rply 2 cycles  mirrors request address

Proposed bus write operation protocols:
  CWS  Rqst 2 cycles  Rply 5 cycles  update block; replies with new block
  WS   Rqst 2 cycles  Rply 5 cycles  update block; replies with new block
  WB   Rqst 5 cycles  Rply 5 cycles  update block; replies with new block

Use Bit Victim
Each cycle the victim pointer examines the use bit: IF set THEN clear and advance ELSE hold. To execute the victim, prevent movement for two cycles. This clears the use bit, and since the processor can't make any more requests, the use bit will stay cleared.
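The use-bit victim pointer resembles a one-hand clock (second-chance) replacement scheme that sweeps continuously. The Python below is a minimal cycle-level sketch under stated assumptions: blocks are indexed 0..n-1, the hypothetical `reference` method stands in for a processor access setting the use bit, and the hypothetical `execute_victim` models "prevent movement for two cycles" by holding the pointer while the use-bit logic keeps running. None of these names come from the design.

```python
class UseBitVictimPointer:
    """Cycle-level sketch of the use-bit victim pointer (all names hypothetical)."""

    def __init__(self, n_blocks: int) -> None:
        if n_blocks <= 0:
            raise ValueError("need at least one block")
        self.use = [False] * n_blocks  # one use bit per block
        self.ptr = 0                   # current victim candidate
        self.hold = 0                  # cycles during which the pointer may not move

    def reference(self, block: int) -> None:
        # Stand-in for a processor access: mark the block as recently used.
        self.use[block] = True

    def step(self) -> None:
        # One cycle: IF use bit set THEN clear it (and advance unless held) ELSE hold.
        if self.use[self.ptr]:
            self.use[self.ptr] = False
            if self.hold == 0:
                self.ptr = (self.ptr + 1) % len(self.use)
        if self.hold > 0:
            self.hold -= 1

    def execute_victim(self) -> int:
        # Prevent movement for two cycles. The use bit under the pointer gets
        # cleared and, assuming no processor references arrive meanwhile, it
        # stays cleared, so the block under the pointer is the victim.
        self.hold = 2
        for _ in range(2):
            self.step()
        return self.ptr


vp = UseBitVictimPointer(4)
vp.reference(0)
vp.reference(1)
for _ in range(3):
    vp.step()  # clears use bits 0 and 1, then holds at block 2
print(vp.ptr)  # prints 2
```

Holding the pointer rather than advancing it is what guarantees the chosen block's use bit is clear at the moment it is evicted.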
The first time an intermediate cache hands back an RBRqst, it sets shared to ensure that a write will propagate to the top.

Per-block state:
  shared
  owner
  sharedBelow -- may contain exact details rather than BOOL
  ownedBelow -- may contain exact details rather than BOOL

Invariants:
  shared => shared for all caches equal or below.
  shared => sharedBelow.
  sharedBelow #> shared.
  ownedBelow => ownedBelow for all caches above.
  ownedBelow => owned.
  owned #> ownedBelow.

Rules:
  Caches reply to botWrites to NOT shared blocks.
  Caches relay up botWrites to shared blocks.
  Caches reply to topReads from owner blocks.
  Caches relay down topReads from ownedBelow blocks.
  Caches reply to botReads from NOT ownedBelow blocks.
  Caches relay up botReads from NOT ownedBelow AND UnCached blocks.
  Caches update from topWrites to NOT sharedBelow blocks.
  Caches relay down topWrites to sharedBelow blocks.
  A victim block for which owner is set must be flushed.

Transitions:
  bottom WSRqst ~sharedAbove => WSRply bottom
  bottom WSRqst sharedAbove => WSRqst top
  top WSRply ownedBelow => WSRply bottom
  top RBRqst owned => RBRply top
  top RBRqst ownedBelow => RBRqst bottom

SnoopMatch[snoop, rAddr] should probably be enabled by ~myId.
The Snooper is only really needed as the FB enabler and for RBRply shared and replyStale.
The Snooper is initialized by RBRqst.
Snooper.shared is set if a matching memory reference is noticed: RB, WS, CWS, WB (not FB).
Snooper.rplyStale is set if a matching memory write reference is noticed: WS, CWS, WB.

Difficulty: When victimizing a block, you must pick some instant at which to look at the owner bit to decide whether to send an FBRqst. From that instant until the FBRqst actually appears on the bus, the bus must be monitored for WSRply, CWSRply, and WB, and if one is seen, the FB is canceled.
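The reply/relay rules listed above can be read as a small decision table. The Python below is a sketch of that table for a single cache level, assuming the per-block bits named in the notes (shared, owner, sharedBelow, ownedBelow) plus a hypothetical `cached` flag. The `route` function and `Action` names are illustrative. Two choices are assumptions rather than part of the notes: testing ownedBelow before owner on topReads (since ownedBelow => owned, the more specific bit is checked first), and treating botWrites as always hitting a present block (inclusion). Combinations the notes do not cover return `Action.NONE`.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REPLY = "reply"
    RELAY_UP = "relay up"
    RELAY_DOWN = "relay down"
    UPDATE = "update in place"
    NONE = "no rule in the notes"


@dataclass
class Block:
    cached: bool = True  # hypothetical: the block is present at this level
    shared: bool = False
    owner: bool = False
    shared_below: bool = False
    owned_below: bool = False


def route(op: str, source: str, b: Block) -> Action:
    """op is "read" or "write"; source is "top" (bus above) or "bottom" (cache below)."""
    if op == "write" and source == "bottom":
        # botWrite: reply if NOT shared, else relay up (assumes inclusion).
        return Action.RELAY_UP if b.shared else Action.REPLY
    if op == "read" and source == "top":
        # ownedBelow => owned, so check the more specific bit first (assumption).
        if b.owned_below:
            return Action.RELAY_DOWN
        if b.owner:
            return Action.REPLY
        return Action.NONE
    if op == "read" and source == "bottom":
        if b.owned_below:
            return Action.NONE  # not covered by the notes
        return Action.REPLY if b.cached else Action.RELAY_UP
    if op == "write" and source == "top":
        # topWrite: update if NOT sharedBelow, else relay down.
        return Action.RELAY_DOWN if b.shared_below else Action.UPDATE
    raise ValueError(f"unknown op/source: {op!r}/{source!r}")


print(route("write", "bottom", Block(shared=True)))  # prints Action.RELAY_UP
```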
Header: (Cmd, ModeOrFault, ReplyShared, DeviceID, Address)

ReadBlock 0000: Cache to Memory or Cache
  RBRqst    2   Header; ValidVictim, VictimAddress
  RBRply    5   Header; CyclicOrderData
WriteBlock 0001: External to Memory and Cache
  WBRqst    5   Header; CyclicOrderData
  WBRply    2   Header; x
WriteSingle 0010: Cache to (only) Caches
  WSRqst    2   Header; Data
  WSRply    2   Header; Data
CWriteSingle 0011: xxx
  CWSRqst   2   Header; Old, New
  CWSRply   5   Header; Old, New; x; x; x
FlushBlock 0100: Cache to (only) Memory
  FBRqst    5   Header; CyclicOrderData
  FBRply    2   Header; x
KillBlock ----: BigCache to SmlCaches
  KBRqst    5   Header; CyclicOrderData
  FBRply    2   Header; x
IORead 1000: Caches to IO
  IORRqst   2   Header; x
  IORRply   2   Header; Data
IOWrite 1001: Caches to IO
  IOWRqst   2   Header; Data
  IOWRply   2   Header; x
BIOWrite 1010: Caches to IO Type
  BIOWRqst  2   Header; Data
  BIOWRply  2   Header; x
Map 1110: SmlCache to MapCache
  MapRqst   2   VPage; AddrSpaceID
  MapRply   2   RPage, Flags; XXX
DeMap 1111: Clears VPValid in all Caches
  DMapRqst  2   RPage; x
  DMapRply  2   RPage; x

  RBRqst    -   Header; ValidVictim, VictimAddress
  RBRply    5   Header; CyclicOrderData
  WBRqst    5   Header; CyclicOrderData
  WBRply    -   Header; x
  WSRqst    -   Header; Data
  WSRply    -   Header; Data
  CWSRqst   -   Header; Old, New
  CWSRply   5   Header; Old, New; x; x; x
  FBRqst    5   Header; CyclicOrderData
  FBRply    -   Header; x
  KBRqst    ?   Header; CyclicOrderData
  FBRply    -   Header; x
  IORRqst   -   Header; x
  IORRply   -   Header; Data
  IOWRqst   -   Header; Data
  IOWRply   -   Header; x
  BIOWRqst  -   Header; Data
  BIOWRply  -   Header; x
  MapRqst   -   VPage; AddrSpaceID
  MapRply   -   RPage, Flags; x
  DMapRqst  2   RPage; x
  DMapRply  2   RPage; x

Rqstr/Rplyr  Lstnr
  RBRqst    C C M
  RBRply    M C
  WBRqst    C C M
  WBRply    M
  WSRqst    C C
  WSRply    T C
  CWSRqst   C C
  CWSRply   T C
  FBRqst
  FBRply
  KBRqst
  FBRply
  MapRqst   C
  MapRply   mc C
  DMapRqst
  DMapRply
  IORRqst   IO IO
  IORRply   IO T IO
  IOWRqst   IO IO
  IOWRply   IO T IO
  BIOWRqst  IO IO
  BIOWRply  T IO

Fetch or Store and miss => ReadBlock
Store and shared => WriteSingle
victim and owner => FlushBlock
RBRqst and ~self and owner => RBRply
waiting on RB and (WSRply, CWSRply, WBRqst) => RBRqst again (RplyStale)

For each block: shared, owner
For each outstanding request: sharedAccumulator, rplyStale, ReadBlockReply
For each block: existsBelow
  The existsBelow bit for a block is set only if some small cache also has a copy of the block. This allows a big cache to filter packets that appear on the main bus.

BigCache.tioga
BigCacheNotes.tioga
DynaBusLogicalSpecifications.tioga
DynaBusGuidelines.tioga

Time all referenced to Received Packets.
Multi level:
  Shared at top implies shared at bottom.
  Owner at bottom implies owner at top.
Two choices:
  one line in the big cache for each small cache line
  xxx watch victims / decode cache id / keep track of who has what

Dynabus: 64 bits per 25 ns x 4/7 < 200 MBytes/sec
Memory average 80 ns/bit = 12.5 Mbits/sec
8*200/12.5 = 128-bit bus to memory
32 4x256K rams => 2 MBytes
128 bits = 16 Bytes per mem access
A hit must be encoded into the RAM address
Ops
32 bit real address =>
Assume technology = .8 micron => 1/6 the area of 2 micron

BigCacheNotes.tioga
Don Curry February 9, 1988 5:13:52 pm PST

Notes White Board Size Notes

Flush block - Executing the Victim
After locating a victim block, you must pick some time to look at the owner bit to decide whether to send an FBRqst. If the bit is set, then from that time until the FBRqst actually appears on the bus, the bus must be monitored for WSRply, CWSRply, and WB. If one of these write packets is seen, then ownership has been lost and the FBRqst must be aborted.
There is still the time period (bus to snooper plus headerEnable to bus) which, if long enough for a matching WBRply or a WSRply/FB to sneak in, could cause problems. In the first implementation, this is taken care of by requiring memories to throw away any FBs that occur immediately following WBs. (The time period in question is too short for a WSRply/FB pair to sneak in.)

Read Block
Between the time of sending out an RBRqst and receiving the reply, the cache is responsible for monitoring the requested address just as if the data had already been received. That is, it sets shared if someone else does a memory reference to that location, and either records the new data of a WSRply, CWSRply, or WBRply or just marks the transaction as invalid using the replyStale bit of the Snooper.

Snooper
The Snooper is used in both cases described above:
Disable sending an FBRqst if the victim is no longer owned.
  The Snooper is enabled with the victim addr when the victim.owned bit is read.
  While waiting for the FBRqst to be sent, abortFB is set if: WSRply, CWSRply, WBRply
Notice new clients and values while waiting for RBRply.
  The Snooper is enabled with the requested addr when the RBRqst is seen on the bus.
  While waiting for the RBRply, replyStale is set if: WSRply, CWSRply, WBRply
  While waiting for the RBRply, shared is set if: RB, WS, CWS, WB

Write Block
WriteBlockRqst moves up to the root, where the write occurs. WriteBlockRply follows IsPresent down, clearing owner and updating data on the way. Caches do not pull shared/owner for a WBRqst. WBRply clears owner and leaves shared unchanged.

WriteSingle
WriteSingleRqst moves up to the point where the block is no longer sharedAbove. Then it is turned around and follows IsPresent down. The sender path sets owner and the previous owner is cleared. Caches pull shared for the Rqst and set their shared values and data based on the reply.

ReadBlock
ReadBlockRqst moves up to a level where the block is cached (default: top), where it is turned around and follows the requestor path back down. Intermediate caches set their shared bits on this new entry. Caches pull shared/owner, and the requestor sets shared based on the reply.

FlushBlock
FlushBlockRqst moves ownership up one level. At the top, this moves valid data to memory.

Victim pointer
There is a use bit associated with each block. Each cycle the victim pointer examines the use bit: IF set THEN clear and advance ELSE hold position. To execute the victim, prevent movement for two cycles. This clears the use bit, and since the processor can't make any more requests, the use bit will stay cleared.

Ensuring that an owner path gets initialized in a multi-level system
To ensure that ownership of a block is communicated all the way up the tree when the block is first written, all intermediate caches must set their shared bits true for the block when it is first read into the cache. If, after the first write into the block, it truly is not shared, these shared bits will be cleared. In the process, though, an owner path from the top will have been defined.

Change Notes
Multiple (4) blocks per associator in Bottom Cache.
  50 associators per small cache
  400 associators per second level cache
  8 Wds per Block
  32 Wds per associator (QuadBlock)
  1024 Wds per page

CWSRply returns the correct data (the computation is done only once, at the point of reflection).

All write requests (WS, CWS, WB) are reflected by the cache with the block present and ~shared. This is always true for the top cache server (memory).

All write replies (WS, CWS, WB) are 5 cycles and contain the entire new block.

While waiting for an RBRply:
  Array Data is maintained as if the block were already present. Any write reply to that block is performed and staleReply is set.
  Array Shared is maintained as if the block were already present. Bus shared is pulled on other requests.
  Array Owner will always be false.
  The arrival of the RBRply updates data only if NOT staleReply.

A cache which pulls owner on a BlockRead and does not currently have shared set must issue a modified RBRply (MRBRply) instead of an RBRply, in order to notify the cache above to update its data for the block and to ensure that the parent caches of shared caches have valid data. This replaces FB.

Consider a revolving Blank in addition to the victim (instead of the Snooper).

Consider that a bottom cache may issue an RB for an address it already has (VM page alias).

The victim can't be touched until the reply is received.

The client does an array match on all?

The server only receives: requests where owner is not set; replies addressed to it; MRBReply; FBRqst.
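The "while waiting for an RBRply" rules can be sketched as a small per-request state object. This is a minimal Python model, assuming one outstanding read per address and whole-block write replies (as the notes specify for WS, CWS, and WB). The `PendingRead` class and its method names are hypothetical, and combining the reply's shared indication with locally observed sharing is an assumption, not something the notes state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PendingRead:
    """Hypothetical per-request state while an RBRply is outstanding."""

    addr: int
    data: Optional[List[int]] = None  # array data, kept as if the block were present
    shared: bool = False              # array shared, kept as if the block were present
    owner: bool = False               # always false while waiting
    stale_reply: bool = False

    def on_other_request(self, addr: int) -> bool:
        """Another device's request to our address: returns True if we pull bus shared."""
        if addr != self.addr:
            return False
        self.shared = True
        return True

    def on_write_reply(self, addr: int, new_block: List[int]) -> None:
        """WS/CWS/WB reply carrying the whole new block: perform it, mark our reply stale."""
        if addr == self.addr:
            self.data = list(new_block)
            self.stale_reply = True

    def on_rb_reply(self, block: List[int], reply_shared: bool) -> Optional[List[int]]:
        """The RBRply updates data only if NOT stale_reply."""
        if not self.stale_reply:
            self.data = list(block)
        # Assumption: merge the reply's shared indication with local observations.
        self.shared = self.shared or reply_shared
        return self.data


p = PendingRead(addr=0x40)
p.on_write_reply(0x40, [9] * 8)             # a write reply overtakes our read
print(p.on_rb_reply([1] * 8, reply_shared=False))  # prints [9, 9, 9, 9, 9, 9, 9, 9]
```

Because the write replies carry the entire new block, the waiting cache can keep the newer data and simply ignore the stale RBRply instead of retrying the read.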
FB is only done by caches victimizing owned, unshared blocks. If the block was shared, the parent's copy would already be valid. The FB snooper only needs to watch for WBRply (WSRply and CWSRply can't happen).

Don Curry December 10, 1987 4:08:11 pm PST