2. Functional Description
2.1 Generalities.
When the address of a command from below misses, the cache issues a RQ on the bus above. It then restarts the command, as in the hit case. When the address of a command from above misses, the cache ignores that command. From now on, we talk about a command on the assumption that it hits.
A cache monitors address cycles of commands issued by other caches on both buses. It pulls the shared line above on address match, and below when the address is marked shared.
The value of Master is set to 1 on Store from below; it is set to 0 on Store from above.
Victim selection in SC is done on a hex basis, where a hex is 4 quads. This happens on a line miss in SC, which provokes a RQ. The victim address is indicated during the 3rd cycle of the RQ (the second dead cycle), together with the hex size, so as to make this mechanism general, and also usable during victimisation in BC. It follows that ExistsBelow is set in BC on RQ from below, and it is cleared according to the victim address in the 3rd cycle of a RQ from below, provided that no other cache has pulled the shared line after that cycle. Victim selection in BC is interesting since every quad in the caches below must also be in BC: a good victim for BC is one for which none of the ExistsBelow bits is set.
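The BC victim preference described above can be sketched as follows. This is only an illustration: the Hex record, the pick_bc_victim name, and the fallback to the candidate with the fewest ExistsBelow bits set are assumptions, not part of the design.

```python
from dataclasses import dataclass, field
from typing import List, Optional

QUADS_PER_HEX = 4  # from the text: a hex is 4 quads


@dataclass
class Hex:
    """Hypothetical record for one victim candidate in BC."""
    address: int
    # One ExistsBelow bit per quad of the hex.
    exists_below: List[bool] = field(
        default_factory=lambda: [False] * QUADS_PER_HEX)


def pick_bc_victim(candidates: List[Hex]) -> Optional[Hex]:
    """Prefer a hex with no ExistsBelow bit set.

    Assumption: when every candidate has some bit set, fall back to the
    one with the fewest bits set (the text does not specify this case).
    """
    if not candidates:
        return None
    return min(candidates, key=lambda h: sum(h.exists_below))


candidates = [
    Hex(0x100, [True, False, False, False]),
    Hex(0x200, [False, False, False, False]),  # good victim: no bit set
    Hex(0x300, [True, True, False, False]),
]
victim = pick_bc_victim(candidates)
```

Choosing a hex with no ExistsBelow bit set means no quad has to be removed from the caches below when the hex is evicted from BC.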
Before issuing a transaction for which the main mBus is required (WS, IOR, IOW, IORF, IOWF, MR, MRD), a cache first acquires the main mBus. Each cache has two request wires running to the arbiter above: Rq and GRq (for global request). For global transactions it asserts both Rq and GRq. A GRq request from below flows through intermediate caches and comes out on the main mBus as Rq in one phase.
Unlike RQ and WQ, these other requests do not need to be restartable. A pBus command which asserts hold is handled in the same way, by first acquiring the main mBus.
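The request-wire rule can be illustrated with a small sketch. The command list is the one given above; the function name, the dictionary form of the result, and the assumption that a non-global command asserts Rq alone are illustrative only.

```python
# Transactions that require the main mBus, as listed in the text.
GLOBAL_COMMANDS = frozenset({"WS", "IOR", "IOW", "IORF", "IOWF", "MR", "MRD"})


def request_wires(command: str) -> dict:
    """Return the wires a cache asserts toward the arbiter above.

    Assumption: a non-global command (for example RQ or WQ) asserts
    Rq only; a global command asserts both Rq and GRq.
    """
    return {"Rq": True, "GRq": command in GLOBAL_COMMANDS}


ws_wires = request_wires("WS")   # global: both wires
rq_wires = request_wires("RQ")   # local: Rq only
```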
RQ restart. For a processor read we can let the processor go ahead as soon as the requested word is fetched by SC. For a processor write we must hold up the processor till the RQ is complete. The cost of this should be small since the number of writes that cause RQ's should be small. A data miss happens around 15% of the time, and a write ref happens 1 out of 21 cycles. Three cycles are added for each missing write, so the added cost is 3*0.15/21, or around 2%.
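The estimate above can be checked with a few lines of arithmetic. The figures are those quoted in the paragraph; the variable and function names are illustrative.

```python
# Figures quoted in the text.
data_miss_rate = 0.15       # a data miss happens around 15% of the time
cycles_per_write = 21       # one write reference every 21 cycles
stall_cycles_per_miss = 3   # cycles added for each missing write


def added_write_cost(miss_rate: float, cycles_per_write: int,
                     stall_cycles: int) -> float:
    """Extra stall cycles per processor cycle caused by writes that miss."""
    return stall_cycles * miss_rate / cycles_per_write


overhead = added_write_cost(data_miss_rate, cycles_per_write,
                            stall_cycles_per_miss)
print(f"{overhead:.1%}")  # prints "2.1%"
```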
If the fetched quad goes into an existing line then don't set quadValid until after commit. If it goes into a victim then don't set vpValid, rpValid, and quadValid until after commit.
WQ restart. Unlike RQ, we don't have to make the processor wait any longer than necessary. There are two cases to consider. For WQ's caused by flush or launder, the processor is not involved at all, so the assertion holds. For WQ's caused by a miss resulting from a processor reference, the processor must be forced to wait anyway, since the WQ is part of a WQ, RQ sequence. Inside BC, don't clear Master until after commit.
2.2 ReadQuad.
2.21 Issue RQ above.
mBus below: Miss(A)
?A x x
mBus above
←Rqst Gnt ←RQA x x D0 D1 D2 D3
Upon reading a fresh quad, a cache sets Shared to 1, Master and ExistsBelow to 0.
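As an illustration, the per-quad bits and the update rules stated so far (the fresh-quad rule above and the Master rule of 2.1) can be modelled as below. The class and method names are hypothetical, and only these two rules are covered.

```python
from dataclasses import dataclass


@dataclass
class QuadState:
    """Hypothetical model of the per-quad bits named in the text."""
    shared: bool = False
    master: bool = False
    exists_below: bool = False

    def on_fresh_read(self) -> None:
        # Fresh quad: Shared = 1, Master = 0, ExistsBelow = 0.
        self.shared = True
        self.master = False
        self.exists_below = False

    def on_store_from_below(self) -> None:
        # Master is set to 1 on Store from below.
        self.master = True

    def on_store_from_above(self) -> None:
        # Master is set to 0 on Store from above.
        self.master = False


q = QuadState()
q.on_fresh_read()
q.on_store_from_below()
```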
2.22 Monitor RQ above.
When Master, provide the data, using the mBus nMAbort convention. The correct value has to be obtained through a RQ below when ~Shared and ExistsBelow; otherwise the cache uses its internal storage.
mBus above: Master(A) and {Shared(A) or ~ExistsBelow(A)}
RQA x ←nMAb
mBus above: Master(A) and ~Shared(A) and ExistsBelow(A)
RQA x ←nMAb ←nMdv ←nMdv ←nMdv
mBus below:
←RQA x x D0 D1 D2 D3
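The two cases above reduce to a small decision on the Master, Shared and ExistsBelow bits. The sketch below is one reading of that rule; the enum and function names are illustrative, and the bus signalling (nMAb, nMdv) is not modelled.

```python
from enum import Enum


class Source(Enum):
    NOT_SUPPLIED = "this cache does not supply the data"
    INTERNAL = "internal storage"
    RQ_BELOW = "RQ on the bus below"


def rq_above_data_source(master: bool, shared: bool,
                         exists_below: bool) -> Source:
    """Where a cache monitoring an RQ above gets the data it supplies."""
    if not master:
        return Source.NOT_SUPPLIED
    if not shared and exists_below:
        # The correct value is held by a cache below: fetch it with an RQ.
        return Source.RQ_BELOW
    # Master and (Shared or not ExistsBelow): internal copy is correct.
    return Source.INTERNAL


case_internal = rq_above_data_source(master=True, shared=True, exists_below=True)
case_below = rq_above_data_source(master=True, shared=False, exists_below=True)
```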
2.23 Monitor RQ below.
The cache just provides the data, assuming the nMAbort line has not been pulled during cycle 3:
RQA x x
2.3 WriteSingle.
2.31 Issue WS above.
To ensure atomicity of Stores, a cache needs to acquire the main mBus before issuing a WS (see Hold operations).
←*Rqst *Gnt ←WSA ←D
2.33 Monitor WS above.
Modify internal value. If ExistsBelow, pass the WS on the bus below:
mBus above: ExistsBelow(A) {note: it must be the case that Shared(A)}
WSA D
mBus below:
←WSA ←D
2.32 Monitor WS below.
To ensure atomicity of Stores, a cache needs to acquire the main mBus before issuing a WS.
2.33 Issue WS above.
The value of a SharedBit gets updated to that of the SharedLine on every WS issued above by that cache.
2.34 Issue WS below.
2.4 WriteQuad.
A cache should not monitor WQ above. If it happens to own a copy of that quad, the copy has already been kept consistent through WS.
2.41 Monitor WQ below.
2.42 Issue WQ above.