Dorado Hardware Manual	Memory Section	14 September 1981
The Pipe
Information about each reference is recorded in the 16-word pipe memory. Pipe layout is shown in Figure 10, which you should have in front of you while reading this section. The processor reads the pipe with the B←Pipe0, ..., B←Pipe5 functions. You should note that Pipe0, 1, and 5 are read high-true, while Pipe2 and 3 are read low-true; Pipe4 contains a mixture of high and low-true fields; 150361₈ xor Pipe4’ produces high-true values for all fields in Pipe4. The discussion in this section assumes that all low-true fields have been inverted.
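The high-true conversion can be sketched as a one-line operation; the octal mask 150361 is from the text, and the 16-bit word width is an assumption:

```python
# XOR the raw B<-Pipe4' reading with 150361 (octal) so that every field in
# Pipe4 reads high-true, per the manual's convention. 16-bit word assumed.
PIPE4_MASK = 0o150361

def pipe4_high_true(raw):
    return (raw ^ PIPE4_MASK) & 0xFFFF
```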
It is illegal to do ALU arithmetic on pipe data (not valid soon enough for carry propagation), and B←Pipei is illegal in the same instruction with a reference because Hold won’t be computed properly.
The EmulatorFault, NFaults, and SRNFirstFault stuff in Pipe2, which duplicate what B←FaultInfo would read back, is not part of the pipe, although it is read by B←Pipe2’; B←Pipe2’ is simply a convenient decode for reading it back—this will be discussed in the section on fault handling, not here. Similarly, Dirty, Vacant, WP, BeingLoaded, NextVictim, and Victim stuff in Pipe5 is not part of the pipe and is read back by B←Pipe5 purely for decoding convenience. This information, used primarily for debugging, is discussed later.
The Task, SubTask, VA, and cache control stuff in Pipe0, 1, 2, and 5 is used both internally by the memory system and externally by the processor. Map and error stuff in Pipe3 and 4 is solely for memory management and diagnostic activities carried out by the processor.
Two main problems in dealing with the pipe are:
Finding the pipe entry for a particular reference;
Knowing when various bits become valid.
How the Pipe Is Addressed
System microcode is expected to use the pipe in only two situations: fault handling by task 15 (the "fault task") and reading map or base registers by task 0 (the "emulator"). Other tasks will not read the pipe. This rigid view of how the pipe will be used during system operation has motivated the implementation discussed below.
Pipe entries are addressed by 4-bit storage reference numbers, or SRNs, assigned to each storage reference. All task 0 and task 15 references except PreFetch← with miss (and implicit FlushStore and Victim references) use the SRN contained in ProcSRN exclusively; all other references share SRN’s 2 to 15, which form a ring buffer addressed by an invisible register called ASRN.
To read a pipe entry, first ProcSRN←B addresses the pipe entry, then the contents of that entry are read with B←Pipei. In system microcode, the emulator is expected to keep the value 0 in ProcSRN to avoid smashing the ring buffer on references; if the fault task needs to make a reference, it will normally load ProcSRN with 1 and use that SRN for the reference; the fault task will manipulate ProcSRN however it likes to examine the pipe but always restore it to 0 before blocking; other tasks will not use ProcSRN. This implementation is welded to the assumption that only the fault task will probe the pipe when io tasks are running.
Io task references and emulator PreFetch←’es that miss are assigned the cache address section’s SRN, called ASRN, at t2. ASRN is advanced to the next ring value iff the reference starts the map. In all other cases ASRN remains unchanged and is used by the next reference as well.
A reference starts the map unless it is a DummyRef←, a cache reference or PreFetch← that hits, or a Flush← that misses or gets a clean hit. A convenient way to guarantee that the map is started without worrying about the contents of the cache is to do a Map← in the emulator or an IOFetch← in any other task. The reasoning behind this treatment of ASRN is explained in the section on fault reporting.
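The ring-buffer behavior above can be modeled with a small sketch. This is a behavioral model, not the hardware: SRNs 2 to 15 form the ring, the reference takes the current ASRN, and ASRN advances only when the reference starts the map. The starting value is an assumption.

```python
# Behavioral sketch of ASRN assignment for io-task references and emulator
# PreFetch<- misses: the reference uses the current ASRN; ASRN advances
# around the ring of SRNs 2-15 only if the reference starts the map.
class ASRNRing:
    def __init__(self, start=2):          # initial value is an assumption
        self.asrn = start

    def assign(self, starts_map):
        """Return the SRN used by this reference; advance iff the map starts."""
        srn = self.asrn
        if starts_map:
            self.asrn = 2 if srn == 15 else srn + 1   # wrap 15 -> 2
        return srn
```

A reference that hits (and so never starts the map) reuses the same SRN, which is why a later storage reference overwrites that pipe entry.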
Tasks 1 to 14 generally cannot find out the SRN for their last reference. Even if this were determined somehow by polling all the pipe entries, there would be no assurance that, meanwhile, a higher priority task didn’t clobber the pipe entry.
Because of its single pipe entry, the emulator must wait for an earlier reference to finish or fault, before starting another. Of all emulator references, only a fetch, Store←, or PreFetch← might fault. However, PreFetch← doesn’t use the private pipe entry, so only a preceding fetch or Store← might still be in progress when a new reference is issued. If the new reference is another fetch or Store←, it will hold until the preceding one finishes (no problem). Hence, the only restriction imposed by the private pipe entry is that the emulator must cause hold with ←Md before issuing Map←, Flush←, or DummyRef←, if a fetch or Store← might still be in progress.
Timing constraints do not permit generating Hold in the above case. It has been observed that issuing Map← without holding for a previous Store← to finish will result in infinite DBufBusy (i.e., infinite Hold), so do not fail to issue ←Md before or concurrent with Map← or RMap←.
When the Pipe is Accessed
Conceptually, the pipe is three different memories. First, VA, task, subtask, and cache control bits in Pipe0, 1, 2, and 5 are written during the reference. Next, the 20 bits of map information in Pipe3 and Pipe4 are written following the map read-write (if any). Finally, the error correction-detection stuff in Pipe4 is written following the storage read (if any). The memory system needs one cycle for each of these accesses.
However, the hardware treats the pipe as only two separate memories internally, or as only a single memory for purposes of holding the processor. In other words, within the memory system Pipe0, 1, 2, and 5 may be accessed by one part of the pipeline, while another part independently accesses Pipe3 and 4. But processor accesses by B←Pipei are held, if the memory system wants any part of the pipe. Worse, the memory system uses the pipe between even clocks (t0 to t2), the processor between odd clocks (t1 to t3), so the processor is locked out for two cycles during each of these intervals.
Programs can safely read Pipe0, Pipe1, Pipe2, or Pipe5 (i.e., task, subtask, VA, and cache control stuff) in the cycle after any reference, since these are updated at the end of the cache address section cycle. B←Pipei in the cycle after a reference will hold for one cycle while the memory system uses the pipe.
Caution: values in a pipe entry are not reset at the onset of a reference, and Pipe3 and Pipe4 are not written at all unless storage is accessed. Consequently, Pipe3 and Pipe4 may refer to a previous reference.
The control bits in Pipe2’ and Pipe5, used by the memory system, also indicate (to the fault task) what kind of reference is described in the pipe, as follows:
CacheRef	a fetch or Store←
Store’		Store←’
IFURef		IFU fetches
RefType		distinguishes read, write, Map←, and other references
FlushStore	dirty victim write triggered by Flush←
ColVic		cache column of a hit, or of the victim on a miss
DummyRef← finishes immediately and only VA in Pipe0 and Pipe1 and the stuff in Pipe2 are relevant. For Flush←, cache information in Pipe5 is also valid. Flush← finishes immediately because the resulting FlushStore and dirty-victim write references (if any) are started in ring-buffer pipe entries.
Programs can read map stuff (Pipe3 and Ref, WP, Dirty, MapTrouble, and MemError in Pipe4) as soon as that part of the reference is complete. For Map←, completion of the map read is coincident with MapBufBusy going false, determined by polling. For a fetch or store, there is no way to distinguish completion of the map read from completion of the entire reference. Consequently, Pipe3 and Pipe4 are normally read by doing ←Md (which holds for completion), then reading the pipe.
For IOFetch←, IOStore←, and PreFetch← there is no way to tell when the reference has finished, except by waiting longer than the memory can possibly take to complete the reference.
IOStore←’s and dirty victim writes zero the Syndrome and EcFault fields in Pipe4. Hence, the only reference that leaves junk in these bits is Map←; the fault task can distinguish pipe entries for Map← by means of the RefType field.
All data in Pipe0, 1, 2, and 5 except FlushStore and ColVic are written at t3, and can be read immediately after a reference. However, FlushStore and ColVic are written at t4. Ordinarily, this would mean that their values could not be read safely; however, since B←Pipei is held in the cycle after a reference, the values will always be ok.
In the best case, map information in Pipe3 and Pipe4 will be loaded at t14, fault and error corrector information in Pipe4 at t48.
Faults and Errors
Remember that high-true values for all fields in the Pipe are used in the following discussion.
Errors
Several events cause memory errors from which the system does not recover. Errors halt the processor if MemoryPE is enabled (see "Error Handling"); if MemoryPE is disabled, the program continues without any error indication. MemoryPE conditions are:
Byte parity errors from the cache data memory (checked on write of a dirty victim, not on ←Md or IFU reads); the processor checks Md parity (see "Error Handling") and the IFU checks F/G parity;
Byte parity errors from fast input bus;
Cache address memory parity errors.
Faults
Other events cause faults. A fault condition is indicated in the MapTrouble, MemError, and EcFault fields of Pipe4 when it occurs; in addition, the fault task is woken to deal with the situation unless NoWake is true in Mcr. The encoding of the various errors is as follows:
Table 17: Fault Indications
Kind of Error		Name	MapTrouble	MemError	EcFault
Map parity error	MapPE	1		1		-
Page fault		PageFlt	1		0		-
Write-protect		WPFlt	1		0		-
Single error		SE	0		0		1
Double error		DE	0		1		1
In the above table, WPFlt and PageFlt have the same encoding; these must be distinguished by means of the Store’ bit in Pipe5 and the WP bit in Pipe4; WPFlt can only occur for Store←, IOStore←, or dirty-victim stores that encounter WP true.
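The decode in Table 17, together with the Store/WP disambiguation just described, can be sketched as follows. Names are illustrative and all inputs are assumed already converted to high-true, per the convention stated at the start of this section:

```python
# Sketch of how the fault task might classify a fault from the pipe fields.
# Inputs are high-true bits: MapTrouble/MemError/EcFault from Pipe4, plus the
# (inverted) Store bit from Pipe5 and WP from Pipe4.
def classify_fault(map_trouble, mem_error, ec_fault, store, wp):
    if map_trouble:
        if mem_error:
            return "MapPE"                       # map parity error
        # MapTrouble=1, MemError=0: PageFlt and WPFlt share this encoding;
        # a store that hit WP=true is a write-protect fault.
        return "WPFlt" if (store and wp) else "PageFlt"
    if ec_fault:
        return "DE" if mem_error else "SE"       # data errors
    return "none"
```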
MapTrouble might be true and reported to the fault task on a fetch or store that misses or an IOFetch←, IOStore←, FlushStore, or dirty-victim write. Flush← and DummyRef← never cause MapTrouble. Map←, PreFetch←, or IFU fetches might record MapTrouble in the pipe but never wake the fault task. Map faults on IFU fetches are reported instead to the IFU, which buffers the fault indication until an IFUJump occurs to an opcode with at least one instruction byte in the word affected by the map fault; then a trap occurs, as discussed in "Instruction Fetch Unit".
In system microcode, we expect a WPFlt and PageFlt due to IOFetch←, IOStore←, FlushStore, or a victim write to indicate a programming error; however MapPE might occur. Note that if any kind of MapTrouble occurs on a storage write (i.e., on an IOStore←, FlushStore, or victim write), storage is not modified and contains the old value; however, the map’s Dirty bit will be true, even though the storage write has not completed.
SE and DE may occur on any cache reference or PreFetch← that misses or on an IOFetch←. Map←, IOStore←, DummyRef←, and Flush← never cause these errors. Also note that fault task wakeup on an SE requires not only NoWake false but also ReportSE true in Mcr; the fault indication transmitted with the munch for an IOFetch← is set only for DE, never for SE.
Unlike map faults, data errors on IFU fetches and PreFetch←’es are reported to the fault task. This must be done for DE’s, which are fatal; for corrected SE’s, the fault causes no disruption to the program because the fault task, after logging the failure, simply lets the task that faulted continue.
The special things about a fault are:
If a program obeys the rules given earlier, hold will occur until any fault is reported or until the program can proceed safely.
EmulatorFault in B←FaultInfo is set true if a fault is described by the emulator or fault task pipe entry (0 or 1) pointed at by ProcSRN;
FirstFaultSRN in B←FaultInfo is loaded if FaultCnt is -1 (indicating no faults) or if FirstFaultSRN was previously zero;
FaultCnt in B←FaultInfo is incremented;
B←FaultInfo stuff is updated and the fault task is woken at the end of the storage pipeline, but sufficiently in advance of hold termination that it will surely run first. For this reason, any operation that might fault is illegal with tasking off.
References leave the pipeline in the order that they entered.
Subtleties: In the event of a miss with a dirty victim, the new cache entry read starts and finishes before the victim write. However, data transport of the victim to storage finishes before data transport of new data into the cache starts—storage actually reads new data first, but meanwhile transports the victim into a holding register on the storage board, from which it is written into storage after the read.
Pipe entries identified by EmulatorFault, FirstFaultSRN, and FaultCnt represent complete storage references;
The task that faulted is not blocked; hold terminates as though no fault had occurred; the task will continue unless the fault task changes its PC.
The fault task is expected to read B←FaultInfo, service all faults it describes, service stack underflow or overflow, then block. Because it is highest priority, the fault task cannot do much computing (io tasks that are lower priority have to be serviced); probably it should not make any memory references itself. Its normal actions are:
crash (uncorrectable data errors, map faults by tasks other than the emulator);
block letting the task that faulted continue (correctable data errors); or
change the TPC of the emulator to an appropriate trap routine (emulator map faults, stack overflow or underflow).
EmulatorFault and FaultCnt are automatically reset by B←FaultInfo. These can be read without reset in B←Pipe2 (primarily for use by Midas).
Several faults could occur while the fault task is running (due to references initiated before the fault task was awakened). In this case, when the fault task blocks, it will continue because of the pending wakeup, and so service the faults. Only while the fault task is running or while tasking is off is it possible for FaultCnt to become greater than one.
Remarks
The careful scheme in which ASRN is advanced only for storage references, and faults are reported in precise order, is essential. If faults were reported out of sequence, then the fault task might see Pipe0 to Pipe2 stuff inconsistent with Pipe3 and Pipe4 error indicators for a previous loop through the ring buffer.
The hardware must not and does not report MapTrouble until the end of the pipeline. If this were not true, then an SRN might report MapTrouble before its predecessor reported SE or DE; this could screw up the fault task.
In tasks other than the emulator, map faults will probably represent programming errors. In the emulator, page-not-in-memory and write-protect faults are expected, and the fault task will trap the emulator to a fault-handling Mesa program. Information saved by the trap microcode must be sufficient to continue the faulted opcode at the instruction that faulted.
The B←DBuf FF decode permits the fault task to retrieve data being written when a Store← faults.
Error Correction Faults
For error correction purposes, munches are divided into four quadwords, each containing 64 data and 8 check bits.
At the end of a storage read, the hardware indicates DE after a double-error or SE after a single error as discussed earlier. The SE or DE indication is unambiguous assuming at most two bits in error in any 64-bit quadword; for an odd number of errors greater than 2, the hardware erroneously reports an SE; for an even number of errors greater than 2, DE is reported. If several quadwords in a munch suffer errors, the hardware reports the first DE, if any, or the last SE, if no DE’s.
Error correction can be enabled/disabled by the LoadTestSyndrome function discussed later; when enabled, the hardware will complement (= correct) any SE; for DE’s the hardware does not modify any bits from storage.
The absolute address of the quadword containing the reported error is RP[0:15]..VA[24:27]..quadword[0:1] with 256-word pages, or RP[2:15]..VA[22:27]..quadword[0:1] with 1024-word pages. I.e., the word address of the first word in the quadword would be these 22 bits with two low-order zeroes appended.
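The bit concatenation above can be sketched directly; the field values are passed in as already-extracted integers, and the two page sizes select the two layouts given in the text:

```python
# Sketch of the absolute quadword address computation. With 256-word pages the
# 22-bit quadword address is RP[0:15]..VA[24:27]..quadword[0:1]; with
# 1024-word pages it is RP[2:15]..VA[22:27]..quadword[0:1]. Appending two
# zero bits yields the word address of the quadword's first word.
def quadword_first_word_address(rp_field, va_field, quadword, page_words=256):
    if page_words == 256:
        qw = (rp_field << 6) | ((va_field & 0xF) << 2) | (quadword & 0x3)
    else:  # 1024-word pages
        qw = (rp_field << 8) | ((va_field & 0x3F) << 2) | (quadword & 0x3)
    return qw << 2  # two low-order zeroes appended
```

For example, real page 1 with 256-word pages begins at word 256, and munch 1 within a page begins 16 words in, since a munch is four quadwords of four words each.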
SE and DE are derived from the 8-bit syndrome field in Pipe4. Syndrome = 0 means no error; neither DE nor SE should be true in this case. Syndrome non-0 with an odd number of 1’s should have SE indicated. Syndrome non-0 with an even number of 1’s or an invalid word code (discussed below) should have DE indicated.
See Figure 11 for the correspondence between syndrome and bits within the quadword.
For SE’s, syndrome specifies exactly which of the 64 data bits or 8 check bits was in error. If syndrome has a single one in it, then the corresponding checkbit was in error. When syndrome contains more than one 1, then syndrome[4:6] indicate which word in the quadword suffered the error as follows:
word 0	011
word 1	101
word 2	110
word 3	111
The other four values of syndrome[4:6] are impossible for an SE and are reported as a DE.
Syndrome[0:3] indicates the bit position within that word; unfortunately these bits are reversed, so that the bit number is given when the bits are taken in the order 3, 2, 1, 0. Syndrome[7] is the parity of the syndrome, and a double error is indicated by a non-zero syndrome having even parity.
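The full decode rule can be sketched as below. This is a behavioral model of the description above, with syndrome bit 0 taken as the most significant bit of the 8-bit field (an assumption matching the manual's [0:7] notation):

```python
# Behavioral sketch of syndrome decoding. bits[0] is syndrome[0] (MSB here).
WORD_CODES = {0b011: 0, 0b101: 1, 0b110: 2, 0b111: 3}   # syndrome[4:6] -> word

def decode_syndrome(syndrome):
    bits = [(syndrome >> (7 - i)) & 1 for i in range(8)]
    if syndrome == 0:
        return ("none",)                    # no error
    if sum(bits) % 2 == 0:
        return ("DE",)                      # non-zero with even parity
    if sum(bits) == 1:
        return ("SE-checkbit", bits.index(1))   # the check bit itself failed
    word = WORD_CODES.get(bits[4] << 2 | bits[5] << 1 | bits[6])
    if word is None:
        return ("DE",)                      # invalid word code reported as DE
    # bit position: syndrome[0:3] taken in the order 3, 2, 1, 0
    bit = bits[3] << 3 | bits[2] << 2 | bits[1] << 1 | bits[0]
    return ("SE", word, bit)
```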
Storage writes leave garbage in the EcFault and Syndrome fields of the Pipe; the fault task must distinguish these cases by means of the RefType field in Pipe2.
As discussed below in the "Testing" and "Initialization" sections, TestSyndrome is xor’ed with check bits that would otherwise be written on storage writes. This means that Syndrome-of-read equals TestSyndrome-of-write is an exact indication of no-error. However, the hardware always reports non-zero syndrome as an error, as discussed above, regardless of what’s in TestSyndrome.
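The xor relationship can be sketched in two lines; function names are illustrative:

```python
# Sketch of the TestSyndrome behavior: TestSyndrome is XORed into the check
# bits on a storage write, so on an error-free read the computed syndrome
# equals the TestSyndrome in effect at write time (0 in normal operation).
def stored_check_bits(true_check_bits, test_syndrome):
    return true_check_bits ^ test_syndrome      # what the write puts in storage

def read_syndrome(stored, recomputed_true_check_bits):
    return stored ^ recomputed_true_check_bits  # equals test_syndrome if no error
```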
Dirty is set in the cache after a Store← that misses, despite any fault, so when that munch is chosen as victim, it will be written back into storage. Consequently, if the fault task attempts recovery from a double error on a Store←, it may have to clear the cache address section’s Dirty bit for the munch using the tricky sequences discussed later.
Storage
Storage is organized into modules consisting of two identical boards per module. The modules appear in the chassis as shown in Figure 2. Depending on whether 16K-bit or 64K-bit IC’s are used, a module stores 256K or 1M 64-bit (+8 check bit) quadwords. A Dorado can have up to 4 modules, for a maximum of 16M words. Every module must be the same size—it is illegal to mix module sizes.
The module in slot 0 supplies the first quarter of real memory; slot 1, second quarter; slot 2, third quarter; and slot 3, fourth quarter. In other words, real memory addresses are not interleaved among modules and the address range covered by a particular module cannot be controlled by the firmware.
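The non-interleaved address arithmetic reduces to a single division; this is a sketch, with the quadword-to-word factor of 4 from the quadword definition:

```python
# Sketch of module selection: each module covers one contiguous quarter of
# real memory. With 64K-bit ICs a module holds 1M quadwords = 4M words;
# with 16K-bit ICs, 256K quadwords = 1M words.
WORDS_PER_QUADWORD = 4

def module_for_word_address(word_addr, quadwords_per_module):
    """Return the slot (0-3) whose module supplies this real word address."""
    return word_addr // (quadwords_per_module * WORDS_PER_QUADWORD)
```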
The B←Config’ function (Figure 10) returns M0, M1, M2, and M3 which are true only when a module is plugged into the corresponding storage board slot. ChipSize indicates what size ic’s are used on the storage boards. The memory system automatically adjusts itself to operate according to the IC size in use on the storage boards.
When 256k x 1 MOS storage ic’s become available, we plan to replace the 4k and 16k wires on the backplane by an extra address wire and a 256k wire; at this time we will lose the ability to handle 4k x 1 and 16k x 1 ic’s and the hardware will allow either 64k or 256k storage ic’s to be used.
MOS RAMs used on storage boards (and in the map) must be refreshed at regular intervals, else they drop data. This occurs during refresh references, one every 16 μs. Every MOS RAM on every storage board participates in every refresh reference, and one row of data is refreshed each time. This means that 64 (4k RAMs), 128 (16k RAMs), or 256 (64k RAMs) refresh references are required to refresh all data, so the refresh period is 2 or 4 ms. The specification on both 16k and 64k RAMs is a 2 ms refresh period at the maximum operating temperature (85°C). The dominant leakage term is exponential in temperature, so the refresh period can be doubled for each 5°C drop in operating temperature. Because the specification is conservative and because we have no intention of operating anywhere near 85°C, a 4 ms refresh period should be adequate.
The time for each refresh reference is 8 cycles (13 cycles with 4k-bit RAMs), the same as normal references. Refresh hardware competes for storage access with the cache data section and fast io references. During the first 8 μs of each 16 μs interval, refresh defers to normal references; during the last 8 μs, it preempts them.
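The refresh-period arithmetic is simple enough to check directly. The 16 μs reference interval is inferred from the stated 2 and 4 ms refresh periods for 128-row and 256-row RAMs:

```python
# One row per refresh reference, one reference every 16 microseconds
# (interval inferred from the 2 ms / 4 ms figures in the text), so the
# full-refresh period scales with the RAM's row count.
def full_refresh_period_ms(rows, interval_us=16):
    return rows * interval_us / 1000

# 16k RAMs: 128 rows -> about 2 ms; 64k RAMs: 256 rows -> about 4 ms
```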
The Cache
The physical cache structure consists of 256 entries in an array of 64 rows by 4 columns. Each entry holds 15 address bits, a parity bit for the address bits, four flag bits, and one munch of data (= 256 data bits + 32 check bits). Hence, the cache holds a total of 4k words of data.
The address section is implemented with 256-word RAM’s, but only 64 words are presently used. The data section uses 1kx1 RAM’s for storage. When sufficiently fast 4kx1 ECL RAM’s become available, we plan to use them in the cache data section and utilize all 256 words in the address section. In this case, the cache geometry will be 256 rows by 4 columns (16k words in the data section).
The cache address section stores 4 flag bits discussed below, 15 VA bits, and 1 parity bit. The way the VA bits are assigned depends upon whether or not 4k x 1 ECL RAM’s are used in the cache data section. VA[7:19] are stored in the address section for all configurations. Two other bits are either VA[5:6] or VA[20:21]; VA[5:6] are used with 4k ic’s in the cache data section (VA[20:21] then appear in the row address of the cache, so they don’t have to be stored). The hardware is also arranged so that the parity bit may be replaced by VA[4].
In other words, the cache initially implements a 2²⁵-word virtual memory with provision for expanding this to 2²⁷ words when 4k x 1 RAM’s are available or to 2²⁸ words at the cost of eliminating the parity bit in the address section. However, the map organization also limits virtual memory, probably to a smaller size than the cache limit, as discussed earlier.
Normally, the cache is invisible to the programmer except for problems with map/cache consistency discussed in the map section. However, features discussed below in "Testing" allow more direct access for checkout, initialization, and error recovery.
An address VA, if in the cache at all, must be in one of the four columns of the row addressed by VA[22:27] (or VA[20:27] if the cache is expanded). References compare the appropriate 15 or 16 bits of VA[4:21] with the values stored in each of the 4 columns to determine which cache entry, if any, contains VA.
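The lookup can be sketched as below. The model treats VA[4:27] as the munch address (bit 27 least significant), so the row is the low 6 bits and the stored tag is VA[7:21]; the Vacant handling anticipates the flag-bit discussion later in this section, and all names are illustrative:

```python
# Behavioral sketch of the cache lookup: row = VA[22:27] (64 rows), and the
# stored 15-bit tag VA[7:21] of each of the 4 columns is compared against VA.
def cache_lookup(va, cache_rows):
    """cache_rows: 64 rows of 4 (tag, vacant) pairs; returns hit column or None."""
    row = va & 0o77                 # VA[22:27]: low 6 bits of the munch address
    tag = (va >> 6) & 0x7FFF        # VA[7:21]: 15 stored address bits
    for col, (stored_tag, vacant) in enumerate(cache_rows[row]):
        if not vacant and stored_tag == tag:
            return col
    return None
```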
The VNV memory contains two two-bit entries for each row of the cache. The Victim field specifies the cache column displaced if a miss occurs in this row. The NextV field is the next victim. When a miss or a hit in Victim occurs, Victim←NextV is done. When a miss, hit in Victim, or hit in NextV occurs, NextV←Victim.0’,,NextV.1’ is done (i.e., NextV is loaded with a value different from both the original NextV and Victim). This strategy is not quite LRU, since there is a 50-50 guess on the ordering of the third and fourth replacements. This treatment of VNV is used for fetches, Store←, PreFetch←, and IFU fetches but not for IOFetch←, IOStore←, or Map←, which don’t use the cache.
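The VNV update rule for normal references can be sketched as a pure function. Victim.0 is taken as the high-order bit of the 2-bit column number, per the manual's bit numbering:

```python
# Behavioral sketch of the VNV update for fetches, Store<-, PreFetch<-, and
# IFU fetches. On a miss or hit in Victim: Victim <- NextV. On a miss, hit in
# Victim, or hit in NextV: NextV <- Victim.0',,NextV.1', which necessarily
# differs from both the old Victim (high bits differ) and old NextV (low bits
# differ).
def vnv_update(victim, nextv, hit_col):
    """hit_col: column of the hit, or None on a miss; returns (victim', nextv')."""
    miss = hit_col is None
    new_victim, new_nextv = victim, nextv
    if miss or hit_col == victim:
        new_victim = nextv
    if miss or hit_col == victim or hit_col == nextv:
        new_nextv = ((victim >> 1) ^ 1) << 1 | ((nextv & 1) ^ 1)
    return new_victim, new_nextv
```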
On a Flush←, Victim is written with 0 on a miss or with the column of the hit and NextV is written with Victim.0’,,NextV.1’. If the Flush← hit a dirty cache entry, then a FlushStore reference is fabricated which will wind up writing Victim (= column hit by the Flush←) back into storage. The FlushStore reference will also do Victim←NextV and NextV←Victim.0’,,NextV.1’ again. This means that the VNV entry for the row touched by a Flush← is effectively garbaged, which probably won’t affect performance much.
A better strategy for Flush← and IOStore← would be as follows: On a miss, Victim and NextV remain unchanged; on a hit in a column different from Victim, Victim←hit column, NextV←Victim; on a hit in Victim, no change.
The UseMcrV feature discussed in "Testing" allows Victim and NextV to be replaced by McrV and McrNV.
Associated with each cache entry are four flag bits that keep track of its state, as follows:
Dirty - set by Store←, cleared when loaded from storage. This bit does not imply anything about the map’s Dirty bit. The cache Dirty bit causes a storage write when the entry is chosen as victim, and the map’s Dirty bit is set at that time.
Vacant - set by hit on Flush←, hit on IOStore←, or Store← into a write-protected entry, cleared when the entry is loaded from storage. Vacant is not set after an SE or DE. Vacant prevents the entry from matching any VA presented to the cache.
WriteProtect - a copy of the map’s WP bit. It is copied from the map when the cache entry is loaded and not subsequently changed. If a Store← is attempted into a write-protected entry, the entry is invalidated, there is a cache fault, and a write protect fault will be reported by the map.
BeingLoaded - set while an entry is waiting for data from storage. Any reference that hits in the same row will remain in the cache address section until the bit goes off; any reference or ←Md following the one which hit a row being loaded will be held.
Remark
At the end of a miss, data from the error-corrector is loaded into the cache 16 bits/clock. Not until all 16 words of the munch have been loaded is Md loaded and the task (which has been held) allowed to continue. A scheme whereby the word being waited for is loaded into Md concurrent with writing it into the 1kx1 RAM’s has been considered but rejected as too complicated. This would reduce average miss time from about 28 cycles to about 24.
Initialization
This section outlines the order in which parts of the memory system can be initialized.
Clocks
The instruction decoding flipflops of the memory section are enabled when the processor clocks are enabled. All other memory clocks are enabled by a signal called RunRefresh, as discussed in "Dorado Debugging Interface".
When RunRefresh is true, clocks internal to the memory system always happen, even if the processor is halted. When RunRefresh is false, memory clocks run with the processor. Except for low-level debugging of the memory system itself, RunRefresh should be true. Otherwise, storage will not retain data at breakpoints.
Mcr Register
The Memory Control Register (Mcr) contains fields that affect the memory system (see Figure 10). Mcr is intended to facilitate testing, and in some cases initialization. The register can be loaded with the Mcr← function and read back over the DMux. Bits in Mcr are as follows (Some of these bits are loaded from A and others from B, as indicated in Figure 10):
dVA←Vic	On each reference, write the cache address entry selected by the row of VA and column of Victim (note: Victim determines the column, even on a hit) into VA of the pipe, so that VA[4:21] in the pipe contain the address from the cache. Also prevent both map and storage automata from starting (which prevents ring buffer pipe entries from being allocated to these as well). FDMiss should always be true when dVA←Vic is true.
FDMiss	"Force dirty miss" forces each cache reference to miss and store the victim, even if not dirty. Misses caused by FDMiss do not cause Hold (*details*).
UseMcrV	Use McrV as victim and McrNV as next victim for all cache misses instead of Victim and NextV from VNV.
McrV	The two-bit victim, or cache column used on a miss, when UseMcrV is true.
McrNV	The two-bit next victim when UseMcrV is true.
DisBR	"Disable base registers" prevents base registers from being added to D in computing VA and prevents BR from being written.
DisCF	"Disable cache flags" forces cache flags to read out as zeroes and prevents them from being written.
DisHold	"Disable Hold" unconditionally prevents hold and BLretry from occurring.
NoRef	Disable storage references.
WMiss	Wakeup fault task on every miss.
ReportSE’	Don’t wake up fault task after (correctable) single errors.
NoWake	Never wakeup fault task.
During normal operation every bit in Mcr should be 0, except possibly ReportSE’, if correctable errors are not being monitored. It is illegal to load Mcr while references are in progress (Changing DisHold is known to cause problems).
System Initialization
System initialization must get the map initialized as desired and the cache in agreement with the map. Initialization firmware should allow for cache rows containing several entries for the same address, which might occur after power up or after running diagnostics.
There are many ways to carry out this initialization. One is as follows:
1. Set NoWake and DisHold true in Mcr, so the fault task won’t disturb initialization, and so that BeingLoaded conditions won’t cause trouble.
2. Clear TestSyndrome.
3. Load the map as desired. Clear the cache as discussed in the Map section. After this the cache will be empty and Ref and Dirty in map entries will be smashed.
4. Reload the map as desired.
5. Read FaultInfo to kill any pending wakeup for the fault task.
6. Setup Mcr for normal activity (0 or ReportSE’).
Testing
This section outlines the order in which parts of the memory system can be tested, so that only a few new components are involved at each step.
VA, Adder, BR’s
The first step is to set NoWake, FDMiss, and DisBR to true. Now processor references will deposit Mar in VA of pipe entry 0 (in the emulator), or of every other pipe entry (in other tasks), so that this part of the pipe can be tested (LongFetch← allows all the VA bits to be tested). Next, setting DisBR false, loading BR’s, and making more processor references will allow BR’s and adder to be tested.
Cache Address Storage
Then set NoWake, FDMiss, and UseMcrV to true and use McrV and McrNV to select one column of the cache at a time. Each processor reference will store its VA into that column, and into the pipe, and will read out the old VA into the next ring buffer pipe entry (as the victim, because FDMiss is true). This allows the VA bits in the address memory to be initialized and tested. The column number in Pipe2 should read back the value in McrV in this case.
Above, address memory values are read using FDMiss, then VA is checked in the pipe entry created for the victim. A simpler method of reading any address section VA is as follows: Turn on DisBR, UseMcrV, and dVA←Vic. On processor references, the cache entry addressed by Mar[6:11] (the row) and McrV (the column) will then have its VA[7:21] written into VA[7:21] of the pipe entry for the reference.
The flag bits in the address section can be directly tested using B←Pipe5 and CFlags←A. These functions operate on the cache entry addressed by the row of the last reference and column of the hit or victim on a miss. Since the IFU or another task could have issued the last reference, these functions are realistically limited to initialization and checkout, where the last reference is known. Normally these will be used with UseMcrV and FDMiss true in Mcr, so McrV will select the column.
B←Pipe5 also reads V and NV from the selected row. CFlags←A won’t work if DisCF is true, and B←Pipe5 will read zeroes for all four flags in this case.
CFlags←A requires that Mar data continue without glitching during the preceding instruction as well. This means that data originating in RM or T must not have been loaded during either of the two previous instructions (else a glitch might occur when the multiplexor switched from the bypass to direct path) and that no higher priority tasks may intervene between the two instructions. Issuing CFlags←A in both instructions is the easiest way to drive Mar continuously for two cycles.
Cache Data Storage
Next, initialize the cache address section VA’s and flags so that the cache data section can be tested. To do this turn off FDMiss while leaving on NoWake, dVA←Vic, UseMcrV. Initialize the address section to a convenient range of virtual addresses by Store←’s to each munch with appropriate McrV values. In the instruction after each reference, write the flags to WP = false, BeingLoaded = false, Vacant = false with CFlags←A.
At the end of this setup, the address section will be loaded and have write access to the desired virtual addresses. Hence, Fetch←’es and Store←’s to these VA’s will not miss, and will access the 4k of cache data memory, which can thus be systematically tested.
Map
Next, turn off UseMcrV, leaving only NoWake turned on and use Map← to test the map. At the end of this test initialize the map, say, to map virtual addresses into corresponding real addresses.
Main Storage
Then finally the storage can be accessed and tested with fetches and Store←. FDMiss can be used to force storage references.
Fault Reporting
NoWake can be turned off and methods similar to the above can be used to test fault reporting.
IOFetch←, IOStore←, Fast IO Busses
Special hardware is needed to test these (the IOTest board).
Error Correction
In normal operation TestSyndrome contains 0 and Syndrome, written by the error corrector, should be 0 if no error was corrected or detected. For test purposes, TestSyndrome can be loaded with any non-zero value and one bit disables error correction altogether. If there are no storage failures, TestSyndrome should wind up in Syndrome after a storage read.
The error-corrector, MemError, ECfault, ReportSE’, and fault reporting can be tested using TestSyndrome.
The LoadTestSyndrome function causes TestSyndrome to be loaded from DBuf. This should normally be done after a Store←, as follows:
	TaskingOff;
	Store←RMAddr, DBuf←T;	*DBuf← data for TestSyndrome
	LoadTestSyndrome;
	TaskingOn;
TaskingOff is required because an intervening higher priority task might change the contents of DBuf.