NewSmallCacheAlgorithm.tioga
Pradeep Sindhu January 25, 1988 2:36:09 pm PST
New Cache Consistency Algorithm (A1)
0. Note on the New Algorithm
This document specifies an improvement to the current cache consistency algorithm A0. The purpose of the improvement is to clean up algorithm A0 conceptually in the area of FlushBlock and WriteBlock. The basic idea behind the cleanup is given in this section, while details are left to the section on Bus Side Specification.
The key idea is that FlushBlock is conceptually requested by the memory controller, not by the caches. Given this premise, the data returned by the FlushBlockReply was good at the time of the FlushBlockRequest, so all we have to do is snoop between the request and the reply to check whether the reply is still good. If it is, the memory controller writes the block into its RAM; otherwise it does not.
There are two separate implementations of the snooping that determines whether the reply data is good. The first is considerably more complicated for the memory controller, but simpler to understand: the memory controller keeps a snooper for each potential requestor on the bus and does all of the snooping for the FBRqsts by itself. This is bad because of the multiple snoopers, but it's easy to see why it works (it is completely analogous to the snooping for ReadBlock). In the second, the interval between FBRqst and FBRply is divided into two non-overlapping parts: a part where the cache snoops for as long as it can, and a part where the memory controller takes over the snooping because the cache is blind (it cannot see a packet just before the FBRply). In this latter implementation, the "snooping" done by the memory controller is very simple: it does not process an FBRply to a given address for some small number of cycles after an FBRply or WBRqst to that address.
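As a rough illustration of the second implementation, here is a minimal C sketch of the memory controller's filter. All names, types, and the length of the hold-off window are assumptions made for the example, not part of this specification.

#include <stdbool.h>
#include <stdint.h>

#define HOLDOFF_CYCLES 4    /* "some small number of cycles"; the value 4 is an assumption */

typedef struct {
    uint32_t adrs;   /* block address of the last FBRply or WBRqst seen */
    uint64_t cycle;  /* bus cycle on which it was seen */
    bool     valid;
} RecentWrite;

static RecentWrite recent;   /* one entry shown; a real controller might keep several */

/* Record every FBRply accepted and every WBRqst observed on the bus. */
void NoteWrite(uint32_t adrs, uint64_t now) {
    recent.adrs = adrs;  recent.cycle = now;  recent.valid = true;
}

/* May the data of an FBRply to this address be written into memory RAM? */
bool FBRplyIsGood(uint32_t adrs, uint64_t now) {
    if (recent.valid && recent.adrs == adrs && now - recent.cycle < HOLDOFF_CYCLES)
        return false;    /* a recent write to the same block makes the reply stale */
    return true;
}

The controller would call NoteWrite for every FBRply it accepts and every WBRqst it sees, and consult FBRplyIsGood before writing FBRply data into its RAM.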
Now, how does one get the memory controller to issue an FBRqst? It turns out one doesn't need to because the RBRqst already contains the address of the block to be flushed, so the cache can treat an RBRqst as a combination of the old RBRqst and FBRqst. The immediate problem is that when the requesting cache is owner of the address being requested, it won't be able to pull owner or reply because it is busy reading out the victim at this time, so the memory will respond by default. However, this is not really a problem, because the cache already has up-to-date data. When the RBRply comes, the cache will get a match on the real address, and this match can be used as a signal that it must ignore the data part of the reply. Thus, the sequence of events on the bus will be:
RBRqst; (FBRply); RBRply;
The FBRply is conditional (hence the parentheses).
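The following C fragment sketches the two decisions the requesting cache makes in this combined transaction; the struct and helper names are illustrative only, not part of the spec.

#include <stdbool.h>
#include <stdint.h>

/* Per-miss state kept by the requesting cache (field names are assumptions). */
typedef struct {
    bool     victimValid;    /* the victim block is valid and must be flushed */
    uint32_t victimRA;       /* VRA, carried in the data word of the RBRqst   */
    bool     arrayRealMatch; /* RA still matches in the array at RBRply time  */
} MissState;

/* After seeing MyRBRqst: put the conditional FBRply on the bus? */
bool EmitFBRply(const MissState *m) {
    return m->victimValid;
}

/* At MyRBRply: write the data word into the array?  If the array still matches
   on RA, this cache was owner and its copy is already current, so only the CAM
   entries are rewritten and the data part of the reply is ignored. */
bool WriteRBRplyData(const MissState *m) {
    return !m->arrayRealMatch;
}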
There is one added complication, which has to do with the Snooper. In A0, the Snooper was used for one thing at a time, while here we need the Snooper to snoop on two addresses at a time: RAdrs, and VRAdrs (the victim real address). The two addresses in the Snooper would be loaded at the same time as the Rqst buffer header and data, and the Snooper would be activated by the arrival of MyRBRqst. The part of the Snooper containing VRAdrs would not be needed beyond the FBRply, while the part containing RAdrs would be needed all the way up to RBRply.
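A minimal C sketch of such a two-address Snooper is given below; the field and function names are assumptions for illustration, and the lifetimes in the comments restate the text above.

#include <stdbool.h>
#include <stdint.h>

/* A Snooper that watches two addresses at once (field names are assumptions). */
typedef struct {
    bool     valid;     /* activated by the arrival of MyRBRqst                  */
    uint32_t rAdrs;     /* block being fetched; needed all the way to the RBRply */
    uint32_t vrAdrs;    /* victim real address; needed only up to the FBRply     */
    bool     vrLive;    /* cleared once the FBRply has gone by                   */
    bool     rplyStale;
    bool     shBit;
} Snooper;

/* Both halves are loaded together with the Rqst buffer header and data;
   activation happens later, when MyRBRqst is seen on the bus. */
void LoadSnooper(Snooper *s, uint32_t ra, uint32_t vra) {
    s->rAdrs = ra;  s->vrAdrs = vra;  s->vrLive = true;
    s->rplyStale = false;  s->shBit = false;  s->valid = false;
}

/* Which half matches a bus address?  0 = neither, 1 = RAdrs, 2 = VRAdrs. */
int SnooperMatch(const Snooper *s, uint32_t adrs) {
    if (!s->valid)                      return 0;
    if (adrs == s->rAdrs)               return 1;
    if (s->vrLive && adrs == s->vrAdrs) return 2;
    return 0;
}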
1. Bus Side Specification
The specification is broken up into several parts, one for each packet type on the DynaBus. Note that in the current algorithm, data is not kept consistent at all times. In particular, when a cache does an RBRqst and gets a non-orphan reply that is stale, it puts the block into the cache anyway with RPValid=1 and VPValid=0. Now clearly, the data in the block is out of date, but it cannot be accessed by anyone: on the processor side VPValid is 0; on the bus side, OwBit is 0, so no one can see this data. The reason we set RPValid is to stay consistent with the fact that in the RBRqst we said we were going to cache the block. This way, the big cache can maintain the exists-below bits exactly.
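The flag settings described here correspond to the RBRply handler below; the following C sketch (names assumed) shows why the stale copy is unreachable from both sides.

#include <stdbool.h>

/* Flag settings when the victim entry is filled from an RBRply, as in the
   RBRply handler below (names are assumptions). */
typedef struct { bool rpValid, vpValid, owner, shared; } BlockFlags;

BlockFlags FillVictimFromRBRply(bool rplyStale, bool rplyShared, bool snooperShBit) {
    BlockFlags f;
    f.rpValid = true;          /* we told the bus we would cache the block        */
    f.vpValid = !rplyStale;    /* a stale copy is invisible to the processor side */
    f.owner   = false;         /* and never supplied on the bus side              */
    f.shared  = rplyShared || snooperShBit;
    return f;
}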
The transaction WB will be used by a high-speed device to inject data into the memory system as follows: when it wants to inject a block, it simply puts a WBRqst on the bus. Caches check whether the WBRqst matches and, if it does, overwrite their contents and clear owner. This scheme has a hole in that an FBRqst immediately following the WBRqst will cause stale data to appear in memory. This, combined with the fact that no one is owner (because of the WBRqst), means we have a problem.
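For illustration, here is a C sketch of the cache-side snoop on an injected WBRqst, with the hole noted as a comment; the names and the block size are assumptions.

#include <stdbool.h>

#define WORDS_PER_BLOCK 16    /* assumed block size, for illustration only */

typedef struct {
    bool     owner;
    unsigned data[WORDS_PER_BLOCK];
} CacheLine;

/* Cache-side snoop of an injected WBRqst: on a real-address match the block is
   overwritten and ownership is dropped. */
void SnoopWBRqst(CacheLine *line, bool arrayRealMatch,
                 const unsigned newData[WORDS_PER_BLOCK]) {
    if (!arrayRealMatch) return;
    for (int i = 0; i < WORDS_PER_BLOCK; i++)
        line->data[i] = newData[i];
    line->owner = false;    /* nobody owns the block now, so an FBRqst arriving
                               right behind the WBRqst can deposit stale data in
                               memory with no owner left to correct it */
}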
At this time the only thing we know needs to be done to make the algorithm work for the big cache is to add a transaction called KillBlockRqst (KBRqst). The request would be issued by the big cache after it had made sure that it had current data by doing an RBRqst. A small cache seeing KBRqst would simply clear the valid bits (real and virtual) for that block.
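A hypothetical sketch of the KBRqst handler in a small cache might look as follows (C, names assumed); KBRqst does not yet appear in the per-packet specification that follows.

#include <stdbool.h>

/* Hypothetical small-cache handler for the proposed KBRqst. */
typedef struct { bool rpValid, vpValid; } ValidBits;

void OnKBRqst(ValidBits *b, bool arrayRealMatch) {
    if (arrayRealMatch) {
        b->rpValid = false;    /* clear real valid    */
        b->vpValid = false;    /* clear virtual valid */
    }
}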
RBRqst[RequestorId, RA, VRA]
Slave Caches
idMatch: BOOL ← myId=RequestorId;
IF idMatch
THEN {
Snooper.Valid ← 1;
RMatch[RA]; Using Victim Do {ClrRPValid; ClrVPValid};
Using Victim Do RdRam;
IF VictimValid THEN FIFO ← [Cmd: FBRply, VRA, Data]
}
ELSE {
IF Snooper.Match THEN {Snooper.ShBit ← 1; BShared ← 1};
RMatch[RA];
IF ArrayRealMatch THEN {
Using RMatch Do {RdRam; Array.ShBit ← 1};
BShared ← 1;
IF Array.OwBit THEN {BOwner ← 1; FIFO ← [Cmd: RBRply, RplyShared: 1, RA, Data]}
}
}
Memory Controller Slave
InitiateFetch[RA];
Wait[dso];
abort ← BOwner; RplyShared ← BShared;
IF ~abort THEN FIFO ← [Cmd: RBRply, RplyShared: RplyShared, RA, Data];
RBRply[RequestorId, RplyShared, RA, Data]
Slave Caches
idMatch: BOOL ← myId=RequestorId;
IF idMatch
THEN {Snooper.Valid ← 0; NonFBTIP ← 0}
ELSE quit;
RMatch[RA];
IF ArrayRealMatch
THEN Using RMatch Do {WtVCam; WtRCam -- Note: no WtRam!!}
ELSE Using Victim Do {
VPValid ← ~Snooper.RplyStale; RPValid ← 1;
WtVCam; WtRCam;
WtRam; Array.ShBit ← Snooper.ShBit OR RplyShared; Clr[Array.OwBit]
};
UnFreezeVictim[];
Memory controller
NoOp[];
WBRqst[RequestorId, RA, Data]
Slave Caches
idMatch: BOOL ← myId=RequestorId;
IF idMatch
THEN {Snooper.Valid ← 1}
ELSE {IF Snooper.Match THEN Snooper.RplyStale ← 1}
RMatch[RA];
IF ArrayRealMatch THEN {
Using RMatch Do {WtRam; Clr[Array.OwBit]};
}
Memory Controller Slave
Store[RA, Data];
FIFO ← [Cmd: WBRply, RplyShared: 0, RA];
WBRply[RequestorId, RplyShared, RA, Data]
Slave Caches
idMatch: BOOL ← myId=RequestorId;
IF idMatch THEN {Snooper.Valid ← 0; NonFBTIP ← 0};
Memory controller
NoOp[];
FBRqst[RequestorId, RA, Data]
Slave Caches
idMatch: BOOL ← myId=RequestorId;
IF idMatch
THEN Snooper.Valid ← 0;
Memory Controller Slave
Store[RA, Data];
FIFO ← [Cmd: FBRply, RplyShared: 0, RA];
FBRply[RequestorId, RplyShared, RA, Data]
Slave Caches
idMatch: BOOL ← myId=RequestorId;
IF idMatch THEN FBTIP ← 0;
Memory controller
NoOp[];
WSRqst[RequestorId, RA, Data]
Slave Caches
idMatch: BOOL ← myId=RequestorId;
IF idMatch
THEN Snooper.Valid ← 1
ELSE IF Snooper.Match THEN BShared ← 1;
RMatch[RA];
IF ArrayRealMatch THEN {
IF ~idMatch THEN BShared ← 1;
};
Memory Controller Slave
Wait[dso];
RplyShared ← BShared;
FIFO ← [Cmd: WSRply, RplyShared: RplyShared, RA, Data];
WSRply[RequestorId, RplyShared, RA, Data]
Slave Caches
idMatch: BOOL ← myId=RequestorId;
IF idMatch
THEN {Snooper.Valid ← 0; NonFBTIP ← 0}
ELSE IF Snooper.Match THEN Snooper.RplyStale ← 1;
RMatch[RA];
IF ArrayRealMatch THEN {
IF idMatch
THEN Using RMatch Do {WtRam; Array.ShBit ← RplyShared OR Snooper.ShBit; Set[Array.OwBit]}
ELSE Using RMatch Do {WtRam; Clr[Array.OwBit]}
};
Memory controller
NoOp[];
CWSRqst[RequestorId, RA, New, Old]
Slave Caches
idMatch: BOOL ← myId=RequestorId;
IF idMatch
THEN LdSnooper[RA]
ELSE IF Snooper.Match THEN BShared ← 1;
RMatch[RA];
IF ArrayRealMatch THEN {
IF ~idMatch THEN BShared ← 1;
};
Memory Controller Slave
Wait[dso];
RplyShared ← BShared;
FIFO ← [Cmd: CWSRply, RplyShared: RplyShared, RA, New, Old];
CWSRply[RequestorId, RplyShared, RA, New, Old]
Slave Caches
Note: Old value is in the MSW.
idMatch: BOOL ← myId=RequestorId;
IF idMatch
THEN NonFBTIP ← 0
ELSE IF Snooper.Match THEN Snooper.RplyStale ← 1;
RMatch[RA];
IF ArrayRealMatch THEN {
Using RMatch Do RdRam;
IF Old=ValueRead THEN Using RMatch Do WtRam[New];
IF idMatch
THEN Using RMatch Do {Set[Array.OwBit]; Array.ShBit ← RplyShared OR Snooper.ShBit}
ELSE Using RMatch Do Clr[Array.OwBit];
}
Memory controller
NoOp[];
IORRqst[RequestorId, IOA]
Slave Caches
RMatch[IOA];
IF ArrayRealMatch THEN {
Using RMatch Do RdRam;
FIFO ← [Cmd: IORRply, IOA, RamData]
}
Memory Controller Slave
Match[IOA];
IF match THEN {
FIFO ← [Cmd: IORRply, IOA, RdIOReg[IOA]]
}
IORRply[RequestorId, IOA, Data]
Slave Caches
idMatch: BOOL ← myId=RequestorId;
IF idMatch THEN NonFBTIP ← 0;
Memory controller
NoOp[];
IOWRqst[RequestorId, IOA, Data]
Slave Caches
RMatch[IOA];
IF ArrayRealMatch THEN {
Using RMatch Do IOAction[];
FIFO ← [Cmd: IOWRply, IOA, Data]
}
Memory Controller Slave
Match[IOA];
IF match THEN {
WtIOReg[IOA, Data];
FIFO ← [Cmd: IOWRply, IOA, Data]
}
IOWRply[RequestorId, IOA, Data]
Slave Caches
idMatch: BOOL ← myId=RequestorId;
IF idMatch THEN NonFBTIP ← 0;
Memory controller
NoOp[];
BIOWRqst[RequestorId, IOA, Data]
Slave Caches
NoOp[];
Memory Controller Slave
FIFO ← [Cmd: BIOWRply, IOA, Data];
BIOWRply[RequestorId, IOA, Data]
Slave Caches
idMatch: BOOL ← myId=RequestorId;
IF idMatch THEN NonFBTIP ← 0;
RMatch[IOA];
IF ArrayRealMatch THEN {
Using RMatch Do IOAction[];
}
Memory controller
NoOp[];
MapRqst[RequestorId, VA]
Slave Caches
NoOp[];
Memory Controller Slave
NoOp[];
MapRply[RequestorId, RA, Undefined] -- note the order: RA comes in header word
Slave Caches
This design requires RA to come in the header cycle
idMatch: BOOL ← myId=RequestorId;
IF idMatch THEN {
{NonFBTIP ← 0; RMatch[RA]} -- putting RA onto the bit lines
RCamRdLatch ← RA; -- We use the RdLatch as a temp reg for RA
LdFlags into VCamWtReg;
};
Memory controller
NoOp[];
DeMapRqst[RequestorId, RA]
Slave Caches
NoOp[];
Memory Controller Slave
FIFO ← [Cmd: DeMapRply, RA];
DeMapRply[RequestorId, RA]
Slave Caches
PRMatch[RP];
IF ArrayRealMatch THEN {
Using RMatch Do Clr[VPValid];
};
Memory controller
NoOp[];
2. Processor Side Specification
The processor side microcode comprises two actions: PFetch and PStore.
The PSide microcode uses the following variables to communicate with the BSide microcode:
NonFBTIP  a transaction other than FB is in progress
FBTIP  an FB transaction is in progress
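The following C sketch models how the two flags are intended to be used; it is an illustration of the handshake, not part of the microcode, and the function names are assumptions.

#include <stdbool.h>

static volatile bool NonFBTIP;   /* a transaction other than FB is in progress */
static volatile bool FBTIP;      /* an FB transaction is in progress           */

/* PSide pattern for any non-FB request: set the flag after issuing the request
   and spin until the BSide clears it when the matching reply arrives. */
void IssueAndWaitNonFB(void (*doRqst)(void)) {
    doRqst();
    NonFBTIP = true;
    while (NonFBTIP) { /* NoOp */ }
}

/* PSide pattern at the top of HandleMiss: the FBRply from the previous miss is
   not waited for when that miss completes; it is only checked for here, before
   the next miss starts. */
void WaitForPreviousFB(void) {
    while (FBTIP) { /* NoOp */ }   /* requires FBTIP to be cleared on Reset */
}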
PFetch[VA]
VMatch[VA];
IF ArrayVirtualMatch
THEN { -- PFetchHit
Using VMatch Do RdRam
}
ELSE RETURN [HandleMiss[]];
PStore[VA, Data]
VMatch[VA];
IF ArrayVirtualMatch
THEN { -- PStoreHit
IF ~Array.ShBit
THEN {Using VMatch Do WtRam; Set[Array.OwBit]}
ELSE {
Using VMatch Do RdRCam -- putting RA into RCamRdLatch;
Snooper.Adrs ← RCamRdLatch; Snooper.RplyStale ← 0; Snooper.ShBit ← 0;
RqstBuffer.Data ← PData;
RqstBuffer.Header ← WSRqst|myId|RCamRdLatch;
RqstBuffer.DoRqst; NonFBTIP ← 1;
WHILE NonFBTIP DO NoOp[];
RETURN RplyReg;
}
}
ELSE {HandleMiss[]; Goto PStore}
HandleMiss[]
WHILE FBTIP DO NoOp[]; -- NB: This requires FBTIP to be cleared on Reset
PVMatch[VP];
IF ArrayVirtualMatch
THEN {
Using VMatch Do RdRCam -- putting RP|Offset into RCamRdLatch;
}
ELSE {
RqstBuffer.Header ← MapRqst|myId|VCamRdLatch; RqstBuffer.DoRqst; NonFBTIP ← 1;
WHILE NonFBTIP DO NoOp[];
}
Snooper.Adrs ← RCamRdLatch; Snooper.RplyStale ← 0; Snooper.ShBit ← 0;
RqstBuffer.Header ← RBRqst|myId|RCamRdLatch;
FreezeVictim[];
Using Victim Do RdRCam; RqstBuffer.Data ← RCamRdLatch;
RqstBuffer.DoRqst; NonFBTIP ← 1;
WHILE NonFBTIP DO NoOp[];
RETURN RplyReg; -- NB: We don't wait for the FBRply. It's checked for before the next processor reference!