DYNABUS LOGICAL SPECIFICATIONS
DRAGON PROJECT — FOR INTERNAL XEROX USE ONLY
The Dragon DynaBus
Logical Specifications
Pradeep Sindhu
Dragon-86-?? Written May 22, 1986 Revised May 4, 1987
© Copyright 1986 Xerox Corporation. All rights reserved.
Abstract: The DynaBus is a high bandwidth, synchronous, packet switched bus that will be used to interconnect the DRAGON memory system and IO devices. This document provides the logical specifications for this bus and explains how it operates. Electrical specifications are not covered here, but are taken up in the document "DynaBus Electrical Specifications".
Keywords: Memory Interconnect, Multiprocessor Bus, Packet Switching, Consistency Protocol
FileName: [Indigo]<Dragon>DynaBus>DynaBusLogicalSpecifications.tioga, .interpress
XEROX  Xerox Corporation
   Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, California 94304



Contents
1. Introduction
2. Bus Overview
3. Bus Interface
4. Bus Protocol
5. Data Consistency
6. ConditionalWriteSingle
7. Input Output
8. Error Handling
Appendix A. Encoding of the Command Field
Appendix B. Allocation of IO Address Space
Appendix C. Format of FaultCode
References
1. Introduction
The DynaBus is a high bandwidth, synchronous, packet switched bus that forms the backbone of the Dragon memory system. This bus is used at two levels in the memory hierarchy: At the first level, it interconnects the components of a Dragon cluster. These components include one or more Dragon processors attached to the bus via Small Caches; some memory, connected via a Big Cache; a Map Cache, and one or more IO devices (Figure 1). At the second level, the DynaBus interconnects clusters and main memory. The principal feature of this two level architecture is that it provides sufficient bandwidth to support a few tens of Dragon processors, high speed IO devices, and a high resolution color display, even when processors have a cycle time close to that of the bus. In our estimate, this assumption of balanced bus and processor cycle times most closely reflects what technology can deliver within the next few years.
 Figure 1. The Two-level Dragon Memory Hierarchy
This document describes the structure, protocol, and operation of the DynaBus at the logical level. It begins in the following section with an overview of the bus's main characteristics. Section 3 defines the bus interface, which encapsulates all of the knowledge about the bus that a device needs to have in order to connect to it. Section 4 uses this interface definition to specify the bus protocol: it discusses arbitration, lists the set of bus commands currently defined, and shows how these commands are used in the various transactions. Section 5 describes how the protocol is used to keep multiple copies of cached data consistent. Section 6 discusses why there is a special ConditionalWriteSingle transaction. Section 7 takes up IO. Finally, Section 8 discusses error handling. Electrical characteristics are not covered here at all, but appear in a companion document titled "DynaBus Electrical Specifications". The appendices contain details that do not fit into the flow of the main document.
2. Bus Overview
A key requirement for a multiprocessor bus is that it be able to deliver a large usable bandwidth. The cycle time of such a bus must be correspondingly small since the usable bandwidth is at most as large as the electrical bandwidth. Traditional circuit switched bus designs, which tie up the bus for the duration of each transaction, rapidly become unreasonable as the cycle time of the bus is decreased. To see this, we first need to note that reads to memory constitute most of the traffic on a memory bus. If we assume that memory access time is around an order of magnitude larger than bus cycle time, we see that a circuit switched bus will sit idle, waiting for memory, for a significant fraction of the time. One way to get around this difficulty is to increase the size of the transport unit to amortize the cost of memory accesses, but this compromises cache efficiency. To avoid these idle bus cycles, the DynaBus is packet switched: the request and reply for each transaction are dissociated and the bus is free to be used by other requesters in between. Although such dissociation is strictly necessary only for transactions that involve main memory, for uniformity the DynaBus uses it for all transactions; this decision also simplifies the solutions to a number of other problems (bus deadlock, data consistency) that are somewhat difficult to handle if the bus is circuit switched.
The DynaBus can be understood best in terms of three layers: cycles, packets, and transactions (these layers correspond to the electrical, logical, and functional levels of operation, respectively). A bus cycle is simply one complete period of the bus clock; it forms the unit of time and electrical information transfer on the bus; the information is typically either an address or data. A packet is a contiguous sequence of cycles; it is the mechanism by which one-way logical information transfer (including broadcast) occurs on the bus. The first cycle of a packet carries address and control information; subsequent cycles typically carry data. A transaction consists of a pair of request-reply packets which together perform some logical function (such as a memory read); the pair may be separated by an arbitrary number of cycles, modulo timeout.
There are around 70 signals on the DynaBus. One of these is the bus clock, which provides synchronization. Several others serve the control functions of sending and receiving packets, of indicating bus error, and of resetting the system. A multiplexed data/address path accounts for the majority of the signals (64 of them). During data cycles this path carries 64 bits of data; during address cycles it carries a command, some control bits, a device id, and an address; the address part is 47 bits and represents either a real address or an IO address, depending on the command. The unit of addressing is a 32-bit word. However, for efficient bus utilization, data is usually transferred in multiple word chunks called blocks. Finally, the signals Shared and Owner are used by the data consistency algorithm.
Each DynaBus has an arbiter that permits the bus to be time multiplexed amongst contending devices. These devices make requests to the arbiter using dedicated request lines, and the arbiter grants the bus using dedicated grant lines. Different devices may have different levels of priority, and the arbiter guarantees fair (bounded-time) service to each device within its priority level. Bus allocation is non-preemptive, however. A device that has been granted the bus is called bus master. The sequence of cycles initiated by a master will typically involve at least one other device on the bus; such a device is called a slave. Since the DynaBus is packet switched, every device must be capable of serving both as a master and as a slave. The notions of master and slave are adequate to describe the transmission of a single packet, but fail to capture transaction level interaction. The terms requestor and responder are used in this context to indicate the devices that place a request and the corresponding reply packet on the bus, respectively.
The following transactions are defined for the DynaBus: ReadBlock, FlushBlock, WriteBlock, WriteSingle, ConditionalWriteSingle, IORead, IOWrite, BIOWrite, Map, and DeMap. ReadBlock allows a cache to read a block from memory or another cache. WriteBlock allows producers of data outside the memory system to write a block into memory and matching caches. FlushBlock allows caches to write back dirty data to memory. WriteSingle is used by caches to update multiple copies of shared data. IORead, IOWrite and BIOWrite do reads and writes to IO devices. Map is used to read an entry from the Map Cache. DeMap clears the virtual valid bits for a real page in all caches. Finally, ConditionalWriteSingle implements the Dragon ConditionalStore instruction.
3. Bus Interface
All devices connect to the DynaBus via a standard bus interface, shown in Figure 2. This interface consists of five signal groups: the control port, arbitration port, receive port, send port, and the consistency port. The control port provides the clock. The arbitration port is for bus acquisition. The send and receive port signals allow a device to send and receive packets. The consistency port provides signals useful in keeping multiple copies of data in a consistent state.
This section simply names the signals in the interface and gives their functions without saying how they are used. Usage will be taken up in the bus protocol section, which immediately follows this one. The description uses the following conventions: S[a..b) denotes a group of b-a lines that encode the bits representing the signal S; most significant bits of the signal are written leftmost. A signal represented by a single wire is written unencumbered with the [..) notation. The term device will be used throughout to indicate an entity connected to the bus. A block will denote a chunk of 8 contiguous 32-bit words aligned in real address space such that the address of the first word is 0 MOD 8. All signal names in the interface use the device as reference, not the bus interface. Thus for example, DataOut is used to send data from the device to the bus.
Figure 2. DynaBus Logical Interface
Control Port
Clock (Input)
This signal provides the basic timing for interactions between a device and the interface. Any other clocks used by a device typically will be derived from this signal.
Arbitration Port
RequestOut[0..2) (Output)
These signals are used to make bus requests to the arbiter.
Grant (Input)
This signal is used by the arbiter to signal a device that it has bus grant. A device must drive onto the bus for exactly the cycles during which Grant is asserted.
HiPGrant (Input)
This signal tells a device whether the current grant is for a low priority request or a high priority one. This signal comes one cycle earlier than Grant and is valid for exactly one cycle.
LongGrant (Input)
This signal tells a device whether the current grant is for a 5 cycle packet or a 2 cycle packet. This signal also comes one cycle earlier than Grant and is valid for exactly one cycle.
Receive Port
HeaderCycleIn (Input)
This signal indicates the beginning of a packet. It is asserted when the header (first cycle) of a packet has arrived and is ready to be read from DataIn. In other words, HeaderCycleIn can be used to latch the header from DataIn.
DataIn[0..64) (Input)
These 64 lines encode the cycles of incoming packets. The header cycle for a packet always carries the bus command, a Mode/Fault bit, a RplyShared bit, the id of the transaction's requestor, and an address. Subsequent cycles may carry either data or address, depending on the command. Also depending on the command, the address may be either in real address space or IO address space. The fields within the 64 bits of the header cycle are arranged as follows:
Figure 3. Encoding of Header Cycle
For machines with a real address smaller than 47 bits, the address must appear in the low order bits of the field and the contents of the unused high order bits must be 0.
ParityIn (Input)
This signal carries parity computed over the DataIn lines. Its timing is currently undefined.
Send Port
HeaderCycleOut (Output)
This signal indicates the beginning of the packet being sent. It is asserted during the cycle in which the header is being sent. Note that HeaderCycleOut is generated by the device sending the packet, although it could also have been generated by the arbiter.
DataOut[0..64) (Output)
These 64 lines encode the cycles for outgoing packets. The encoding for the header cycle is the same as that given above.
ParityOut (Output)
This signal carries parity computed over the DataOut lines. Its timing is currently undefined.
Consistency Port
SharedOut (Output)
The DynaBus Shared line is used to maintain multiple copies of cached data in a consistent state. When an address cycle appears on the bus, all caches match the address against addresses they have stored. Caches that find a match use SharedOut to assert the Shared line. The delay from address in to SharedOut is implementation dependent and therefore not part of the bus specifications. There is no corresponding signal SharedIn because caches don't read the Shared line directly—its state is returned via a reply packet.
OwnerOut (Output)
The DynaBus Owner line is also used by caches to maintain consistency. A cache uses OwnerOut to assert the Owner line when a read request appears on the bus and the cache is the owner of that block (a cache is the owner of a block when its processor was the last one that did a write into the block). The delay from address in to OwnerOut is the same as for SharedOut. There is no corresponding signal OwnerIn because caches don't need to read the Owner line.
4. Bus Protocol
The DynaBus's protocol can be described best in terms of bus operation at three levels: cycles, packets, and transactions. These terms were defined earlier, but we will review them quickly. A cycle is one complete period of the bus clock; a packet is a contiguous sequence of cycles; and a transaction is a pair of request-reply packets. The first cycle of a packet always carries a device id, a command, a Mode/Fault bit, a RplyShared bit, and an address; subsequent cycles may carry data or address, depending on the command.
Before a transaction can begin, the requesting device must get bus mastership from the arbiter. Once it has the bus, the device puts a request packet on the bus one cycle at a time, releases the bus, and then waits for the reply packet. Packet transmission is uninterruptible in that no other device can take the bus away during this time, regardless of its priority. The transaction is complete when another device sends a reply packet. In principle an arbitrary number of cycles may separate the request and reply packets; in practice the separation is bounded by timeouts and by the need to do flow control, both of which are discussed later. Under normal operation, the protocol ensures a one-to-one correspondence between request packets and reply packets; however, because of errors, some request packets may not get a reply. Thus, devices must not depend on the number of request and reply packets being equal since this invariant will not in general be maintained. The protocol requires devices to provide a simple, but crucial guarantee: that they service request packets in arrival order.
The detailed description of the protocol below first takes up arbitration and flow control. It then discusses time outs, the mechanism used to detect no response. Next, it lists the bus commands currently defined, says how NoOps are issued, and explains the purpose of the device id. The last subsection provides the detailed makeup of the different transactions. The description of the cache consistency algorithm is deferred to the following section.
Arbitration and Flow Control
The arbiter serves two related purposes. First, it allows multiple contending devices to use the bus without stepping on one another. Second, it ensures that devices whose service times prevent them from keeping up with the maximum arrival rate of requests don't have to worry about internal buffers overflowing. This second function is referred to as flow control.
Each device interacts with the arbiter via a port consisting of two request wires, RequestOut[0..2) (referred to below simply as the Request wires), and three grant wires, Grant, HiPGrant, and LongGrant. The arbiter supports two basic types of ports, called normal and memory. A normal port has two request counters, one for high priority requests and the other for low priority requests. In addition, each counter has an associated packet length. A device indicates whether it wants to make a low or high priority request via its Request wires. It may also use these wires to seize the bus, locking out all requestors, and to release it. The Request wires are encoded as follows:
Encoding Interpretation
0 release system-wide hold if in effect, NoOp otherwise
1 assert system-wide hold
2 increase the number of low priority requests by one
3 increase the number of high priority requests by one
A memory port consists of a single request FIFO, with a single priority for all incoming requests. A device indicates the length of a packet via its Request wires. It may also use these wires to seize and release the bus. The Request wires are encoded as follows:
Encoding Interpretation
0 release system-wide hold if in effect, NoOp otherwise
1 assert system-wide hold
2 make a request for a 2 cycle packet
3 make a request for a 5 cycle packet
For both types of ports a device is permitted to make more than one request to the arbiter before any of the pending requests have been granted, subject to the implementation limit imposed by the arbiter. A device does this simply by holding the Request wires in the desired state for more than one cycle. One request is registered for each cycle in which the wires are in request state. For memory ports note that the FIFO guarantees that grants will be given in the order in which requests arrived.
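The rule that one request is registered per cycle can be illustrated with a minimal sketch of a normal port. The function name count_requests and the list-of-samples input are hypothetical, introduced only for illustration; the sketch also ignores the arbiter's implementation limit on pending requests and the system-wide hold.

```python
# Request wire encodings for a normal port, taken from the table in the text.
RELEASE_OR_NOOP = 0
ASSERT_HOLD = 1
LOW_PRIORITY_REQUEST = 2
HIGH_PRIORITY_REQUEST = 3


def count_requests(wire_values):
    """Return (low, high) request counts after sampling the wires once per cycle.

    wire_values is a hypothetical list holding the 2-bit Request value
    presented to the arbiter on each successive bus cycle.
    """
    low = high = 0
    for value in wire_values:
        if value == LOW_PRIORITY_REQUEST:
            low += 1
        elif value == HIGH_PRIORITY_REQUEST:
            high += 1
        # Encodings 0 and 1 concern the system-wide hold, not the counters.
    return low, high


# Holding the wires at 2 for three cycles registers three low priority requests.
print(count_requests([2, 2, 2, 0, 3]))  # (3, 1)
```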
The type of a port, as well as type dependent characteristics (the lengths for the high and low priorities for a normal port, and the priority for a memory port) are provided at initialization time using the Debug bus. For a normal port the priority of the high priority port must not be made less than that of the low priority port.
Grant is used by the arbiter to signal that a device has grant. If Grant is asserted in cycle i then the device can use the bus in cycle i. Finally, HiPGrant tells the device whether the current grant corresponds to a high priority request or not; and LongGrant tells the device whether the current grant corresponds to a 5 cycle packet or not. The timing of key signals at the pins of a requester is shown below for a two cycle request (DriveData of course doesn't appear at the pins of the chip, but the signals it enables do).
 Figure 4. Arbitration Timing for a Minimum Latency, Two-Cycle Packet; all signals are at the pins of a requesting device.
The arbitration delay shown (7 cycles) is the minimum. If there are multiple requestors, or if the bus is busy, Grant may take several more cycles depending on how many devices are ahead of the one making the request. Note that HiPGrant and LongGrant arrive one cycle before Grant and are valid for exactly one cycle.
The arbiter supports seven distinct priority levels, all of which are used. The algorithm it uses is round-robin with priority—that is, highest priority requests are served first and requests within a level are served round-robin. The assignment of levels to devices is as follows:
Priority      Meaning
0 (highest)   cache reply priority
1             memory priority
2             reserved priority (not available to devices)
3             display high priority
4             I/O priority
5             cache low priority
6 (lowest)    display low priority
The two highest priorities are assigned to requests to send reply packets. This ensures that devices can get rid of transactions quickly once they have serviced them and prevents output queues from building up indefinitely. Memory uses its priority to send both five cycle and two cycle packets. The two priorities assigned to the display are the lowest and the highest for request packets. Normally, the display uses its low priority to satisfy requests; since this is the lowest priority, the display will end up using whatever unused "holes" there are on the bus. Occasionally, when the display's queue gets dangerously empty, it switches to high priority.
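The "round-robin with priority" policy described above can be sketched as a toy model. This is not the arbiter's actual design: the class name, the use of device names, and the set of currently requesting devices are all assumptions made for illustration, and request counters, FIFOs, packet lengths, and grant timing are left out.

```python
from collections import deque


class PriorityRoundRobinArbiter:
    """Toy model: the lowest level number wins; ties are served round-robin."""

    def __init__(self, levels=7):
        # One queue of device names per priority level (0 is the highest).
        self.queues = [deque() for _ in range(levels)]

    def add_device(self, name, level):
        self.queues[level].append(name)

    def grant(self, requesting):
        """Return the device granted the bus, or None if nobody is requesting."""
        for queue in self.queues:
            for _ in range(len(queue)):
                device = queue[0]
                queue.rotate(-1)  # the device just examined moves to the back
                if device in requesting:
                    return device
        return None


arbiter = PriorityRoundRobinArbiter()
arbiter.add_device("cache0", 5)
arbiter.add_device("cache1", 5)
arbiter.add_device("memory", 1)
print(arbiter.grant({"cache0", "cache1", "memory"}))  # memory
print(arbiter.grant({"cache0", "cache1"}))            # cache0
print(arbiter.grant({"cache0", "cache1"}))            # cache1
```

Rotating the queue after each examination is one simple way to obtain bounded-time service within a level; the real arbiter may achieve fairness differently.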
Time Outs
To detect devices that do not respond, or devices that aren't there in the first place, the protocol requires devices to implement their own time out facilities. Each device must maintain a counter that starts counting bus cycles when a bus request for a request packet is issued. If maxWaitCycles cycles have elapsed before the corresponding reply packet is received, the device must assume that an error has occurred and either handle it on its own or assert the DBus Error line.
The determination of a system-wide value for maxWaitCycles is a little tricky because of the wide variance in expected service times. For example, a low priority device might take a long time to just get bus grant, while a higher priority device would get grant relatively quickly. A low priority device might in fact be forced to wait arbitrarily long if a higher priority device decides to hog the bus. The question of whether this ought to be considered an error is debatable.
To avoid getting entangled in these issues, the protocol simply specifies a system-wide lower bound on the limit maxWaitCycles and leaves it up to the device implementor to decide the exact value. Such a lower limit is needed to avoid generating frequent false alarms. A conservative lower limit can be arrived at by computing the worst-case service time for a cache request and increasing it by an order of magnitude for safety (caches are taken since they are the lowest priority devices that do not change their request priority). Assuming there are 8 caches and only one memory bank, the worst case service time is at most
= 8 * (number of cycles to service one request in an unloaded system)
= 8 * 25 cycles
= 200 cycles.
Increasing this by an order of magnitude gives 2000 cycles, which rounds up to 2048, the next power of two, so each device is required to have maxWaitCycles ≥ 2048.
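The arithmetic behind this bound can be reproduced in a few lines. The constant names are hypothetical, and rounding up to a power of two is an assumed reading of how the text arrives at the figure of 2048 rather than something the specification states.

```python
NUM_CACHES = 8            # number of caches assumed in the text
CYCLES_PER_REQUEST = 25   # unloaded service time used in the text
SAFETY_FACTOR = 10        # "an order of magnitude"

worst_case = NUM_CACHES * CYCLES_PER_REQUEST   # 200 cycles
padded = worst_case * SAFETY_FACTOR            # 2000 cycles

# Round up to the next power of two (assumed reason for the 2048 figure).
min_max_wait_cycles = 1 << (padded - 1).bit_length()

print(worst_case, padded, min_max_wait_cycles)  # 200 2000 2048
```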
NoOps
It sometimes happens that a device that has done an arbiter request has nothing to send when it gets grant. In this situation the device is expected to send a NoOp packet of the same length as the packet it had originally intended to send. It does this simply by not asserting HeaderCycleOut during its allocated header cycle. Thus, there is no special command to indicate NoOp.
Commands 
The packet command field is 5 bits. Four bits encode up to 16 different transactions, while the fifth bit encodes whether the packet is a request or a reply. Ten of the transactions are currently defined.
The description below lists commands by pairs (request/reply). For brevity, each command is given an abbreviation which appears in parentheses. The encoding for the commands appears in Appendix A.
ReadBlockRequest (RBRqst), ReadBlockReply (RBRply)
These commands are used by devices to read a block from the memory system.
WriteBlockRequest (WBRqst), WriteBlockReply (WBRply)
These commands are used to write a block to the memory system. The first command does the actual write, while the second is just an acknowledgement that the write has been performed.
FlushBlockRequest (FBRqst), FlushBlockReply (FBRply)
These commands are used to write a block to the memory system. The first command does the actual write, while the second is just an acknowledgement that the write has been performed.
WriteSingleRequest (WSRqst), WriteSingleReply (WSRply)
These commands are used by caches to update multiple copies of shared data. The first command requests a write, while the second actually performs it.
ConditionalWriteSingleRequest (CWSRqst), ConditionalWriteSingleReply (CWSRply)
These commands are used by caches to perform a read-modify-write operation on shared data. The first command merely requests the read-modify-write while the second actually performs it.
IOReadRequest (IORRqst), IOReadReply (IORRply)
These commands are used by devices to read a single word from the IO address space. The responding IO device must be unique.
IOWriteRequest (IOWRqst), IOWriteReply (IOWRply)
These commands are used by devices to write a single word out to the IO address space. The responding IO device must be unique.
BIOWriteRequest (BIOWRqst), BIOWriteReply (BIOWRply)
These commands are used by devices to Broadcast write a single word out to the IO address space. More than one device may act upon a BIOWriteRequest.
MapRequest (MapRqst), MapReply (MapRply)
These commands are used by caches to fetch missing map entries.
DeMapRequest (DeMapRqst), DeMapReply (DeMapRply)
These commands are used to clear virtual valid bits for a real page.
Mode/Fault
This bit has different interpretations for request and reply packets. In a request packet it supplies the mode (kernel=0, user=1) of the device that issued the request (when this device is a cache, the mode bit is the mode bit of the cache's processor). In a reply packet a 1 indicates that the device generating the reply encountered a fault, a 0 indicates no fault. When the fault bit is set in a reply packet, the 32 low order bits of the second cycle supply a FaultCode. The format of FaultCode is defined in Appendix C.
RplyShared
This bit is unused within request packets. Within reply packets it says whether the block is shared or not. Only devices that play the consistency game need to look at this bit.
Device ID
Every packet on the bus carries a device identifier in its first cycle. For request packets, this DeviceID carries the unique number of the device making the request. For reply packets it is the number of the intended recipient (i.e., the device that sent the request). Such an identifier is needed because the address part of a packet alone is not sufficient to disambiguate replies.
Devices that either have only one outstanding reply, or that have multiple outstanding replies but can somehow discriminate between them, need only one DeviceID. Other devices are allocated multiple DeviceIDs to allow them to disambiguate replies. These DeviceIDs must be contiguous and must be a power of two in number.
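The two constraints on a multiple-DeviceID allocation (contiguous, and a power of two in number) can be checked as in the sketch below. The function name is hypothetical, and no alignment requirement is assumed because the text does not state one.

```python
def is_valid_device_id_allocation(ids):
    """Check the two rules from the text: contiguous, power of two in number."""
    ids = sorted(ids)
    count = len(ids)
    if count == 0:
        return False
    power_of_two = count & (count - 1) == 0
    # Contiguous means the sorted ids form an unbroken run with no repeats.
    contiguous = ids == list(range(ids[0], ids[0] + count))
    return power_of_two and contiguous


print(is_valid_device_id_allocation([4, 5, 6, 7]))  # True
print(is_valid_device_id_allocation([4, 5, 6]))     # False: three ids
print(is_valid_device_id_allocation([4, 6]))        # False: not contiguous
```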
The DeviceID(s) for a device are loaded in at system initialization time via the Debug bus (see [DBusSpec] for details).
Address
The address part of the first cycle is 47 bits. In the current implementation only 32 of these bits are used—the high-order 15 must be 0. To ensure easy extension, devices must check that these high order bits are 0 and do nothing if they are not. For non-IO transactions, the 32 bits represent the address of a 32-bit word in real address space. For IO transactions the 32 bits represent the IO address of a 32 bit word in the IO address space.
For all transactions other than Map, the contents of the address field in the request packet and its corresponding reply must be identical.
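The high-order-bits check required of devices can be sketched as follows. The function name and the convention of returning None for "do nothing" are assumptions for illustration; the field is treated here as an already extracted 47-bit integer, since the position of the field within the header cycle is given by Figure 3.

```python
ADDRESS_BITS = 47   # width of the address field in the first cycle
USED_BITS = 32      # bits used in the current implementation


def accept_address(address_field):
    """Return the 32-bit address, or None if the packet must be ignored."""
    if not 0 <= address_field < (1 << ADDRESS_BITS):
        raise ValueError("address field is 47 bits wide")
    if address_field >> USED_BITS:
        # One of the 15 high-order bits is set: the device does nothing.
        return None
    return address_field


print(accept_address(0x1234))    # 4660
print(accept_address(1 << 32))   # None
```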
Transactions
The following transactions are currently defined.
ReadBlock
ReadBlock reads a block from main memory or from a cache, depending on whether main memory is consistent with the caches. Recall that a block is eight contiguous 32-bit words aligned in real address space such that the address of the first word is 0 MOD 8.
The request packet for ReadBlock consists of two address cycles. The first cycle provides the address of the block to be read; the second provides the address of the block being victimized within the cache requesting the ReadBlock. Bit 0 of the second cycle is a 1 if the victim address is valid, while bits 17-63 contain the address. The victim address allows a second level cache to determine when a block is no longer in any of the first level caches, and therefore is a candidate for being replaced.
The reply packet for ReadBlock is five cycles long. The first cycle is the address of the first word being returned, and the remaining four cycles carry the block data. The 64-bit data cycles of the block must be in cyclic order, with the cycle containing the addressed word appearing first.
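The cyclic ordering of data cycles can be made concrete with a small sketch. It assumes that data cycle k of a block holds the aligned word pair 2k and 2k+1, which the text implies but does not state; the function names are hypothetical.

```python
WORDS_PER_BLOCK = 8
WORDS_PER_CYCLE = 2   # one 64-bit cycle holds two 32-bit words
CYCLES_PER_BLOCK = WORDS_PER_BLOCK // WORDS_PER_CYCLE


def block_base(word_address):
    """Address of the first word of the block (0 MOD 8)."""
    return word_address - (word_address % WORDS_PER_BLOCK)


def data_cycle_order(word_address):
    """Indices (0..3) of the block's 64-bit cycles in the order they are sent."""
    first = (word_address % WORDS_PER_BLOCK) // WORDS_PER_CYCLE
    return [(first + i) % CYCLES_PER_BLOCK for i in range(CYCLES_PER_BLOCK)]


# Word 5 of the block at 0x100 lives in cycle 2, so cycle 2 is sent first.
print(hex(block_base(0x105)), data_cycle_order(0x105))  # 0x100 [2, 3, 0, 1]
```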
WriteBlock
WriteBlock writes a block to main memory and to matching caches. This transaction is used by producers of data (outside the memory system) to inject new data into the memory system.
The request packet is five cycles. The first cycle carries the address, and the remaining four carry the four cycles of block data. The data cycles must be in cyclic order, with the cycle containing the addressed word appearing first.
The reply packet, which is two cycles, serves as the acknowledgement that the write has been performed. The first cycle carries the address; the contents of the second cycle are undefined.
FlushBlock
FlushBlock writes a block to main memory from some cache—other caches do not listen. This transaction is provided (in addition to WriteBlock) to allow a cache to flush its data without conflicting with processor accesses within other caches.
The request packet is five cycles. The first cycle carries the address, and the remaining four cycles carry the four cycles of block data. The data cycles must be in cyclic order, with the cycle containing the addressed word appearing first.
The reply packet, which is two cycles, serves as the acknowledgement that the write has been performed. The first cycle carries the address; the contents of the second cycle are undefined.
WriteSingle
WriteSingle writes a 32-bit word out to real address space. This transaction is used by caches to maintain multiple copies of read/write data in a consistent state. Unlike FlushBlock, WriteSingle does not write the word out to memory, but is directed exclusively to caches.
The request packet for WriteSingle consists of two cycles, address and data, in that order. The reply packet is identical, except that the command is WriteSingleReply and the RplyShared bit indicates whether the datum is shared.
ConditionalWriteSingle
ConditionalWriteSingle does a read-modify-write to a given location in real address space. This transaction is provided to allow the actions of multiple processors to be synchronized.
The request packet is two cycles, the first of which carries the address. The second carries the old and new values, with the old value appearing in the most significant 32 bits and the new value in the least significant 32 bits.
The reply packet is five cycles. The first two cycles are identical to the request packet, except for command and RplyShared, while the contents of the remaining three cycles are undefined.
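The layout of the second request cycle (old value in the most significant 32 bits, new value in the least significant 32 bits) can be sketched as a pair of helpers. The function names are hypothetical; only the packing is shown, not the read-modify-write itself.

```python
MASK32 = 0xFFFFFFFF


def pack_cws_data(old, new):
    """Build the 64-bit second cycle: old in the high half, new in the low half."""
    if not (0 <= old <= MASK32 and 0 <= new <= MASK32):
        raise ValueError("old and new must be 32-bit values")
    return (old << 32) | new


def unpack_cws_data(cycle):
    """Split a 64-bit second cycle back into (old, new)."""
    return (cycle >> 32) & MASK32, cycle & MASK32


cycle = pack_cws_data(0x1, 0x2)
print(hex(cycle))              # 0x100000002
print(unpack_cws_data(cycle))  # (1, 2)
```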
IORead
The IORead and IOWrite transactions are used by caches to communicate with IO devices. The 32 bit address of an IORead or IOWrite defines a common, unmapped IO address space. Each IO device has a unique piece of the space allocated to itself, and is free to interpret IOReads and IOWrites to that piece in any manner it sees fit. Further details of the IO model and the makeup of the IO address space appear in Section 7 and Appendix B.
IORead reads a single word from IO address space. The request packet consists of two cycles, the first of which carries the IO address. This address appears in the low 32 bits of the 47 bit address field—the unused 15 high order bits must be 0. The contents of the second cycle are undefined.
The reply packet also consists of two cycles. The first carries the IO address in the request, while the second carries the data read from the IO device. The 32 bits of data appear in the low 32 bits of the 64 bit cycle; the unused high order bits are undefined.
IOWrite
IOWrite writes a single word to IO address space. The request packet is two cycles, the first carries the IO address, the second the data. The address appears in the low 32 bits of the 47 bit address field—the unused 15 high order bits must be 0. The 32 bits of data appear in the low 32 bits of the 64 bit cycle; the unused high order bits are undefined.
The reply packet is two cycles. The first cycle contains IOWriteReply as the command and the same address as in the corresponding request. The contents of the second cycle are undefined.
BIOWrite
BIOWrite (Broadcast IOWrite) writes a single word to the IO address space. This transaction is different from IOWrite in that there may be multiple devices that act on the request. The devices that act on a given BIOWrite are all instances of the same device type (see Appendix B); thus, BIOWrite is not a global broadcast, but only a broadcast to all devices of a given type.
The request packet is two cycles: the first carries the IO address and the second carries the data. The address appears in the low 32 bits of the 47 bit address field; the unused 15 high order bits must be 0. The 32 bits of data appear in the low 32 bits of the 64 bit cycle; the unused high order bits are undefined.
The reply packet is two cycles. The first cycle contains BIOWriteReply as the command and the same address as in the corresponding request. The contents of the second cycle are undefined.
Map
Map reads a virtual-page-to-real-page mapping entry from the Map Cache.
The request packet is two cycles. The first cycle contains the virtual page number, while the second contains the address space id in which the virtual page is to be mapped (see [MapSpec] for the detailed format).
The reply packet is also two cycles. The first cycle contains the real page and flags, while the second is undefined. Note that for Map, the address in the request and reply packets is not the same.
DeMap
DeMap causes the virtual valid bits for a given real page to be cleared in all caches.
The request packet is two cycles. The first cycle contains the real page number in bits 32..53, while the second cycle is undefined.
The reply packet is identical to the request packet except for the command.
5. Data Consistency
A basic decision in the Dragon architecture is that multiple copies of read/write data are allowed to exist when data is cached. An immediate consequence of this is that the copies must be kept consistent if computations are to produce correct results.
This section describes briefly how the above transactions can be used to maintain consistency in a system with multiple levels of cache. The description begins with a definition of data consistency. It then explains how consistency is maintained in a single level system. The two-level case is taken up last. Although only the single-level and two-level cases are described here, it should be noted that the algorithm works equally well for the N-level case.
Definition of Data Consistency
One way to define consistency for a particular datum A is to say that during each clock cycle all copies of A have the same value. While this definition is adequate and easy to understand, it is hard to implement efficiently when the potential number of copies is large. Fortunately, there is a weaker definition that is still adequate for correctness, but is much easier to implement in a large system. It is based on the notion of serializability.
To define serializability, consider the abstract model of a shared memory multiprocessor shown in Figure 5. There are some number of processors connected to a shared memory. Each processor has a private line to shared memory; over this line the processor issues the commands Fetch(A) and Store(A, D), where A is an address and D is data. For Fetch(A) the memory returns the value currently stored at A; for Store(A, D) it writes the value D into A and returns an indication that the write has completed. Let the starting time of an operation be the moment the request is sent over the line and the ending time be the moment either the return value or an indication of completion is received by the processor.
 Figure 5. Abstract Model of Shared Memory Multiprocessor
Let S be the state of the shared memory before the start of some computation C, and F be the state after C has completed. The Fetches and Stores of C are said to be serializable if there exists some serial order such that if these operations were performed in this order, without overlap, the same final state F would be reached starting from the initial state S. Two operations are said to overlap if the starting time of one is before the ending time of the other and the starting time of the other is before the ending time of the one.
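The definition can be made concrete with a brute-force check. The sketch below is illustrative only: the tuple representation of operations and the function names are assumptions, it compares final memory state exactly as the definition above does, and it tries every ordering, so it is practical only for a handful of operations.

```python
from itertools import permutations


def apply_serially(initial_state, operations):
    """Run operations one at a time, without overlap; return the final state."""
    memory = dict(initial_state)
    for kind, address, data in operations:
        if kind == "Store":
            memory[address] = data
        # A Fetch returns memory.get(address) but leaves the state unchanged.
    return memory


def is_serializable(initial_state, operations, final_state):
    """True if some serial order of the operations takes S to F."""
    return any(
        apply_serially(initial_state, order) == final_state
        for order in permutations(operations)
    )
```

For example, with S = {"A": 0} and the operations Store(A, 1) and Store(A, 2), both F = {"A": 1} and F = {"A": 2} are serializable outcomes, while F = {"A": 3} is not.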
Single Level Operation
For the purposes of explaining the consistency algorithm, a single level system consists of one or more processors connected to the DynaBus through caches, and a single main memory. The first thing to note about this configuration is that it is sufficient to maintain consistency between cached copies. Since processors have no way to access data except through their caches, the main memory copy can be stale without causing incorrect behavior.
The algorithm requires that for each block of data a cache keep two additional bits, shared and owner. For a given block, the shared bit indicates whether there are multiple copies of that block or not. This indication is not accurate, but conservative: if there is more than one copy then the bit is TRUE; if there is only one copy then the bit is probably FALSE, but may be TRUE. We will see later that this conservative indication is sufficient. The owner bit is set in a given cache if and only if the cache's processor wrote into the block last; thus at most one copy of a datum can have owner set. A cache is also required to maintain two state bits sharedAccumulator and rplyStale that correspond to an address for which the cache has sent a request but has not yet received a reply. The bit sharedAccumulator is used in computing the value to be put into the shared bit for the block addressed by the transaction, while rplyStale is used to determine if the data in a ReadBlockReply is valid when it arrives. In addition to this state, the algorithm uses two lines on the DynaBus, Shared and Owner that were described earlier in Section 3.
Generally, a cache initiates a ReadBlock transaction when its processor does a Fetch or Store to a block and the block is not in the cache; it initiates a FlushBlock when a block needs to get kicked out of the cache to make room for another one (only blocks with owner set are written out); and it initiates a WriteSingle when its processor does a write to a block that has the shared bit set. Caches do a match only if they see one of the following packet types: RBRqst, RBRply, WSRqst, WSRply, CWSRqst, CWSRply, and WBRqst. In particular, note that no match is done either for a FBRqst or a FBRply. This is because FB is used only to flush data from a cache to memory, not to notify other caches that data has changed. No match is done for a WBRply either, because all this packet does is acknowledge that the memory has processed the WBRqst.
When a cache issues an RBRqst or WSRqst, all other caches match the block address to see if they have the block. Each cache that matches asserts Shared to signal that the block is shared and also sets its own copy of the shared bit for that block. The requesting cache uses sharedAccumulator to compute the value of the shared bit. It cannot simply copy the value of Shared into the shared bit like the other caches, because the status of the block might change from not shared to shared between request and reply due to an intervening packet with the same address. It is precisely to accumulate changes in the shared bit caused by intervening packets that sharedAccumulator is needed, hence its name. This computation proceeds as follows: the requesting cache first clears sharedAccumulator, and waits and watches packets on the bus, just like other caches. If the address in any of these packets matches the address that was sent out, the cache sets sharedAccumulator. When it sees the reply packet, it ORs sharedAccumulator and rplyShared from the packet and puts the result into the shared bit for the block. This ensures that the shared bit is TRUE for a block whenever there are multiple copies, and that the shared bit is cleared eventually if there is only one copy. The clearing happens when only one copy is left and that copy's processor does a store. The store turns into a WSRqst, no one asserts Shared, and so the value the requestor computes for the shared bit is FALSE.
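The requestor's side of this computation can be sketched as a small piece of per-request state. The class and method names below are hypothetical; the sketch only models clearing the accumulator, setting it on a matching intervening packet, and ORing it with rplyShared when the reply arrives.

```python
class PendingRequest:
    """State a requesting cache keeps between a request and its reply."""

    def __init__(self, address):
        self.address = address
        self.shared_accumulator = False  # cleared when the request is sent

    def observe(self, packet_address):
        """Watch an intervening packet; a matching address sets the accumulator."""
        if packet_address == self.address:
            self.shared_accumulator = True

    def shared_bit_on_reply(self, rply_shared):
        """Value for the block's shared bit: sharedAccumulator OR rplyShared."""
        return self.shared_accumulator or rply_shared
```

Because the accumulator can only go from FALSE to TRUE while the request is outstanding, the result errs on the side of TRUE, which matches the conservative meaning of the shared bit.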
The manipulation of the owner bit is simpler. This bit is set each time a processor stores into one of the words of the block; it is cleared each time a WSRply arrives on the bus (except for the cache whose processor initiated the WSRqst). There are two cases to consider when a processor does a store. If the shared bit for the block is FALSE, then the cache updates the appropriate word and sets the owner bit right away. If the shared bit is TRUE, the cache puts out a WSRqst. When the memory sees the WSRqst, it turns it around as a WSRply with the same address and data, making sure that the shared bit in the reply is set to the value of the Shared line an appropriate number of cycles after the appearance of the WSRqst's header cycle. When the requestor sees the WSRply, it updates the word and also sets owner. Other caches that match on the WSRply update the word and clear owner. This guarantees that at most one copy of a block can ever have owner set. Owner may not be set at all, of course, if the block has not been written into since it was read from memory.
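The two store cases and the effect of a WSRply on the owner bit can be sketched as below. This is a simplified model under stated assumptions: the names are hypothetical, a block is a dictionary of words, and the update of the shared bit from the reply is left out to keep the focus on owner.

```python
class Block:
    """One cached block with its shared and owner bits."""

    def __init__(self, shared=False):
        self.shared = shared
        self.owner = False
        self.words = {}


def processor_store(block, offset, value):
    """Return True if a WSRqst must be issued, False if the store was local."""
    if not block.shared:
        block.words[offset] = value  # unshared: update the word right away
        block.owner = True
        return False
    return True  # shared: the caller puts out a WSRqst and waits for WSRply


def on_ws_reply(block, offset, value, is_requestor):
    """Apply a matching WSRply: update the word, then set or clear owner."""
    block.words[offset] = value
    block.owner = is_requestor  # requestor sets owner, all others clear it
```

Since every matching cache other than the requestor clears owner on the same WSRply, at most one copy holds owner after each shared store.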
When an RBRqst appears on the bus, two distinct cases are possible. Either some cache has owner set for the block or none has. In the first case the owner (and possibly other caches) assert Shared. The owner also asserts Owner, which prevents memory from responding, and then proceeds to supply the block via an RBRply. The second case breaks down into two subcases. In the first subcase no other cache has the block, Shared does not get asserted, and the block comes from memory. In the second subcase at least one other cache has the data, Shared does get asserted, but the block still comes from memory because no cache asserted Owner. Because the bus is packet switched, it is possible for ownership of a block to change between the request and its reply. Suppose for instance that a cache does an RBRqst at a time when memory was owner of the block, and before memory could reply, some other cache issues a WSRqst which generates a WSRply which in turn makes the issuing cache the owner. Since Owner wasn't asserted for the RBRqst, memory still believes it is owner, so it responds with the RBRply. To avoid taking this stale data, the cache that did the RBRqst uses its rplyStale state bit. It sets the bit to FALSE when it sends its RBRqst and sets it to TRUE if between the RBRqst and RBRply there is a WSRply, CWSRply, or WBRqst to the same address. When the RBRply is received, the cache checks if rplyStale is set. If it isn't, the RBRply is good and the block is put into the cache; if it is, the cache issues another RBRqst and waits for the reply. This process is repeated as many times as necessary to get valid data.
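The retry rule driven by rplyStale can be sketched by replaying a trace of bus packets. The function name and the trace format are assumptions made for illustration, and the sketch simplifies by treating every RBRply to the address as a reply to this cache's own request.

```python
STALING_PACKETS = {"WSRply", "CWSRply", "WBRqst"}


def count_read_attempts(address, observed_packets):
    """Count the RBRqsts a cache issues before it accepts an RBRply.

    observed_packets is the sequence of (packet_type, packet_address) pairs
    the cache sees on the bus after sending its first RBRqst.
    """
    attempts = 1
    rply_stale = False  # set to FALSE when the RBRqst is sent
    for packet_type, packet_address in observed_packets:
        if packet_address != address:
            continue
        if packet_type in STALING_PACKETS:
            rply_stale = True  # the block changed while the read was pending
        elif packet_type == "RBRply":
            if not rply_stale:
                return attempts  # reply is good; the block goes into the cache
            attempts += 1        # stale reply: issue another RBRqst
            rply_stale = False
    raise RuntimeError("trace ended before a valid RBRply arrived")
```

A trace with one intervening WSRply to the same address costs exactly one extra RBRqst; packets to other addresses have no effect.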
It is interesting to note that in the above algorithm the Shared and Owner lines are output only for caches and input only for memory. This is because the caches never need the value on the Owner line, and the value on the Shared line is provided in the reply packet so they don't need to look at the Shared line either.
In the discussion above, we did not say how the transactions CWS and WB work. We will do this now. CWS is identical in its manipulation of the Shared and Owner bits and the Shared and Owner lines to WS, so as far as consistency is concerned these transactions can be treated the same. WB, on the other hand, is identical to FB as far as memory is concerned. Caches ignore FB, but overwrite their data for a matching WBRqst and clear the owner bit for this block.
Two-Level Operation
For the purposes of the consistency algorithm, a two-level system consists of a number of one-level systems called clusters connected by a main bus that also has the system's main memory. Each cluster contains a single big cache, which connects the cluster to the main bus, and a private DynaBus, which connects the big cache to the small caches in the cluster. This private DynaBus is electrically and logically distinct from the DynaBuses of other clusters. From the standpoint of a private DynaBus, its big cache looks identical to the main memory in a single-level system. From the standpoint of the main DynaBus, a big cache looks and behaves very much like a small cache in a single-level system. Further, the design of the protocol and the consistency algorithm is such that a small cache cannot even discover whether it is in a one-level or a two-level system—the response from its environment is the same in either case. Thus the small cache's behavior is identical to what was described in the previous subsection.
The algorithm requires the big cache to keep all of the state bits a small cache maintains, plus some additional ones. These additional bits are the existsBelow bits, kept one bit per block of the big cache. The existsBelow bit for a block is set only if some small cache in that cluster also has a copy of the block. This bit allows a big cache to filter packets that appear on the main bus and put only those packets on the private bus for which the existsBelow bit is set. Without such filtration, all of the traffic on the main bus would appear on every private bus, defeating the purpose of a two-level organization.
We have already stated that the behavior of a small cache in a two-level system is identical to its behavior in a one-level system. We have also said that a big cache behaves like main memory at its private bus interface and a small cache at its main bus interface. What remains to be described is what the big cache does internally, and how packets on a private bus relate to those on the main bus and vice-versa.
When a big cache gets an RBRqst from its private bus, two cases are possible: either the block is there or it's not. If it's there, the cache simply returns the data via an RBRply, making sure that it sets the shared bit in the reply packet to its current state in the cache (recall that in the single-level system main memory returned the value on the Shared line for this bit). If the block is not in the cache, the cache puts out an RBRqst on the main bus. When the RBRply comes back the cache updates itself with the new data and its shared bit and puts the RBRply on the private bus. When a big cache gets a WSRqst on its private bus, it checks to see if the shared bit for the block is set. If it is not set, then it updates the data, sets owner, and puts a WSRply (with shared set to the value of the Shared line at the appropriate time) on the private bus. If shared is set, then it puts out a WSRqst on the main bus. The memory responds some time later with a WSRply. At this time the big cache updates the word, sets the owner bit, and puts a WSRply on the private bus with shared set to one. When a big cache gets a FBRqst, it simply updates the block and sends back an FBRply.
When a big cache gets an RBRqst on its main bus, it matches the address to see if it has the block. If there is a match and owner is set, then it responds with the data. However, there are two cases. If existsBelow is set, then the data must be retrieved from the private bus by placing an RBRqst on it. Otherwise the copy of the block it has is current, and it can return it directly. When a big cache gets a WSRqst on the main bus, it matches the address to see if the block is there and asserts Shared as usual, but takes no other action. When the WSRply comes by, however, and there is a match, it updates the data it has. In addition, if the existsBelow bit for that block happens to be set, it also puts the WSRply on the private bus. Note that this WSRply appears out of the blue on the private bus; that is, it has no corresponding request packet. This is another reason why the number of reply packets on a bus may exceed the number of request packets.
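The filtering role of existsBelow for a WSRply seen on the main bus can be sketched as follows. The names and the dictionary-based cache are hypothetical; only the match, the data update, and the forwarding decision are modeled.

```python
class BigCacheBlock:
    """One block of a big cache with its existsBelow bit."""

    def __init__(self, exists_below=False):
        self.exists_below = exists_below
        self.words = {}


def on_main_bus_ws_reply(blocks, block_address, offset, value):
    """Handle a WSRply from the main bus.

    Returns True if the WSRply must also be put on the private bus.
    """
    block = blocks.get(block_address)
    if block is None:
        return False                 # no match: the packet is filtered out
    block.words[offset] = value      # match: update the big cache's copy
    return block.exists_below        # forward only if a small cache has a copy
```

Only the last case generates private bus traffic, which is what keeps main bus activity from appearing on every private bus.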
This completes the description of the consistency algorithm. The description has been sketchy, but intentionally so, since it is meant to be an introduction rather than a rigorous specification. Such a specification is contained in [DynaAlg].
6. ConditionalWriteSingle
The DynaBus defines the ConditionalWriteSingle transaction with the idea that the Dragon read-modify-write instruction ConditionalStore will be implemented directly by caches. This transaction, like the Dragon instruction, takes three arguments: an address, an old value and a new value, and its semantics are:
ConditionalWriteSingle[address, oldval, newval] Returns[sample] =
{<begin critical section>
sample ← address^;
If sample=oldval Then address^ ← newval
<end critical section>
}
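The semantics above can be mirrored in software as a compare-and-swap. This is only a sketch: the Memory class is hypothetical, a lock stands in for the critical section, and unwritten locations are assumed to read as 0. The retry loop shows one way a read-modify-write might be built on top of it.

```python
import threading


class Memory:
    """Word-addressed memory offering ConditionalWriteSingle semantics."""

    def __init__(self):
        self._words = {}
        self._lock = threading.Lock()  # stands in for the critical section

    def conditional_write_single(self, address, oldval, newval):
        """Return the sampled value; write newval only if sample == oldval."""
        with self._lock:
            sample = self._words.get(address, 0)  # assumption: default is 0
            if sample == oldval:
                self._words[address] = newval
            return sample


def atomic_increment(memory, address):
    """Retry until the conditional write succeeds; return the new value."""
    expected = 0
    while True:
        sample = memory.conditional_write_single(address, expected, expected + 1)
        if sample == expected:
            return expected + 1
        expected = sample  # someone else changed the word; retry with its value
```

The caller learns whether the write took effect by comparing the returned sample with oldval, so no separate success flag is needed.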
Direct implementation of ConditionalWriteSingle by caches has several key advantages over alternative schemes. First, it is easy to show that this scheme functions correctly since the proof is identical to that for WriteSingle. Second, it allows the maximum possible concurrency for read-modify-writes to a particular location. And third, the cost of a single read-modify-write as seen by a processor is small, especially when the location is not shared.
7. Input Output
The DynaBus protocol was written with a particular model of IO devices in mind. In this model, all interactions with IO devices fall into one of two categories: control and data transfer. Control interactions are used to initiate IO and discover whether an earlier request has completed, while data transfer interactions actually move the data to and from memory. It is assumed that the bus bandwidth requirements of control interactions are small compared to those of data transfer. The model permits control and data interactions to be combined for devices with low data transfer rates.
Following the dictates of the model, this section first describes how control interactions work and then turns to data transfer.
Control
All control interactions are carried out through the use of IORead, IOWrite and BIOWrite transactions directed to a common, unmapped 32-bit IO address space. This address space is common in the sense that all processors see the same space, and it is unmapped in the sense that addresses issued by processors are the ones seen by the IO devices. Generally, each type of IO device is allocated a unique, contiguous chunk of IO space at system design time, and the device responds only if an IORead, IOWrite, or BIOWrite is directed to its chunk. The term IO device is being used here not just for real IO devices, but any device (such as a cache) that responds to a portion of the IO address space. See Appendix B for details on allocation of IO address space.
IOWrite transactions are typically used to set up IO transfers and to start IO. The address cycle of the request packet carries an IO address, while the data cycle carries 32 bits of data whose interpretation depends on the IO address. For block transfer devices, a processor typically will do a number of IOWrites to set up the transfer, and then a final IOWrite to initiate the transfer.
An IOWrite starts out at a small cache as an IOWRqst packet. The big cache of the cluster puts the IOWRqst on the main DynaBus, where it is picked up by all the other big caches. These caches put the IOWRqst on their private buses. Thus the IOWRqst is broadcast throughout the system. Broadcasting eliminates the need for requestors to know the location of devices in the hierarchy and makes for a simpler protocol. When the IOWRqst reaches the intended device, the device performs the requested operation and sends an IOWRply on its way. The IOWRply is broadcast in the same way as the IOWRqst, so it eventually makes its way to the requesting small cache. When the reply arrives, the small cache then lets its processor proceed.
IORead transactions read 32 bits of data from a device. This data may either be status bits that are useful in controlling the device, or data being transferred directly from the device to the processor.
IOReads work the same as IOWrites: an IORead starts out at a small cache as an IORRqst packet. The big cache of the cluster puts the IORRqst onto the main DynaBus, where it is picked up by other big caches and put on the private buses. Once the intended IO device receives the request, it reads the data from its registers and sends it on its way via an IORRply. The IORRply gets broadcast in exactly the same way as the IORRqst, and eventually makes its way to the cache that initiated the transaction. Note that for both IOReads and IOWrites exactly one device responds to a given IO address.
BIOWrites are used in cases where a processor needs to have more than one IO device act upon a command without having to explicitly send multiple IOWrites (interprocessor interrupts and map updates are examples where BIOWrites are useful).
A BIOWrite starts out at a small cache as a BIOWRqst packet. The big cache of the cluster puts the BIOWRqst on the main bus. The memory then generates a BIOWRply with the same parameters as the BIOWRqst, and all big caches put this BIOWRply on their private DynaBuses. Thus the BIOWRply is broadcast throughout the system. When the BIOWRply reaches the requesting small cache, the cache lets its processor proceed. Note that the reply is not generated by the IO device, but by main memory. The reason is that there is no unique IO device that can generate the reply packet. It is important to point out that errors that occur during a BIOWrite may not be caught by the requesting device's time out mechanism. If one of the intended recipients of the BIOWrite is broken, for instance, the requestor won't get any indication of it. This is a basic problem with broadcast operations, however, and there is no simple solution.
Data Transfer
A simple way to accomplish data transfer is to require that every IO device connect to the DynaBus via a cache. This way, data going in and out of the memory system automatically participates in the consistency algorithm, and no additional mechanism is needed.
Unfortunately, there is a disadvantage to this way of doing things. For input, the bus bandwidth required for data transfer is effectively doubled since a block must first be fetched to service a miss and then get written out when it is victimized. This doubling may be avoided by defining a new transaction, IOWriteStream, which is used directly to write to memory. The caches on the bus, of course, must also watch for IOWriteStreams and update their copies on a match.
8. Error Handling
The DynaBus presumes the following model for dealing with errors and other exceptional events. Each device provides its own capabilities for detecting errors, regardless of whether these errors are internal (for example a parity error on an internal bus) or result from interaction with other devices (for example transaction time-out, or illegal parameters within a request packet).
Once an error is detected, a device decides whether it can handle the error on its own, or needs to report the error to some other party. If the device is capable of handling the error, no facilities need to be provided by the bus, so this case is uninteresting. The errors that a device cannot handle by itself typically fall into two categories: recoverable (at least as far as the detecting device is concerned), and catastrophic.
When a device encounters a catastrophic error it uses the DBus to freeze the state of the machine so it can be examined by a debug processor (see [DBusSpec]). When it encounters a recoverable error while servicing a request, it uses the DynaBus Mode/Fault bit in the reply packet to report it. The least significant 32 bits of the first data word of the reply packet are set aside for the FaultCode (the format of FaultCode is described in Appendix C). Thus the only reporting mechanism the DynaBus provides is a facility for indicating that a transaction completed abnormally.
Appendix A. Encoding of the Command Field
The table below gives the encoding for the Command field within the header cycle of a DynaBus packet.
Encoding          Command
0000 0            ReadBlockRequest
0000 1            ReadBlockReply
0001 0            WriteBlockRequest
0001 1            WriteBlockReply

0010 0            WriteSingleRequest
0010 1            WriteSingleReply
0011 0            ConditionalWriteSingleRequest
0011 1            ConditionalWriteSingleReply

0100 0            FlushBlockRequest
0100 1            FlushBlockReply

[0101 0..0111 1]  Unused

1000 0            IOReadRequest
1000 1            IOReadReply
1001 0            IOWriteRequest
1001 1            IOWriteReply
1010 0            BIOWriteRequest
1010 1            BIOWriteReply

[1011 0..1101 1]  Unused

1110 0            MapRequest
1110 1            MapReply
1111 0            DeMapRequest
1111 1            DeMapReply
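In the table, the high four bits select the transaction and the low bit distinguishes request (0) from reply (1). A decoder built on that observation might look like the sketch below; the function name and the "Unused" return value are illustrative choices, not part of the specification.

```python
TRANSACTIONS = {
    0b0000: "ReadBlock",
    0b0001: "WriteBlock",
    0b0010: "WriteSingle",
    0b0011: "ConditionalWriteSingle",
    0b0100: "FlushBlock",
    0b1000: "IORead",
    0b1001: "IOWrite",
    0b1010: "BIOWrite",
    0b1110: "Map",
    0b1111: "DeMap",
}


def decode_command(command: int) -> str:
    """Decode a 5 bit Command field into its name from the table above."""
    if not 0 <= command < 32:
        raise ValueError("Command field is 5 bits")
    transaction = TRANSACTIONS.get(command >> 1)  # high four bits
    if transaction is None:
        return "Unused"
    return transaction + ("Reply" if command & 1 else "Request")
```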
Appendix B. Allocation of IO Address Space
This section defines how the I/O address space on the DynaBus is allocated amongst I/O devices. This address space can be as large as 47 bits, but currently it is limited to 32 bits since Dragons only supply 32 bit I/O addresses. The high order 15 bits of a DynaBus address must be 0; to facilitate future extensions, devices must check these bits and respond only if they are 0.
The I/O address space allocation problem is nontrivial only because the space requirements of different devices are quite different from one another. Devices such as the small caches have very modest requirements, while devices such as the IOP, which produces addresses on an industrial bus, need large chunks of address space. Thus the obvious solution of dividing the space equally amongst the maximum planned number of devices doesn't work.
The allocation method chosen here is to divide all I/O devices into three classes (small, medium, and large) according to how many I/O addresses a device is likely to need. A small device is given 2^10 contiguous addresses, a medium device 2^16, and a large device 2^24. Section B1 below defines the precise address encoding for devices in each class. Section B2 describes how devices must decode which I/O operations are meant for them. The last three sections, B3, B4, and B5, give the current allocations for devices in each of the three classes, respectively.
B1. DynaBus I/O Address Encoding
DynaBus I/O addresses are split up into three fields: a device type DevType, which is different for each type of device (e.g. small cache, IOP, map cache); a device number DevNum, which is different for each instance of a given type; and a DevOffset, which is the address of an I/O location within a particular device instance. Having an explicit concept of device type in the encoding is convenient because it allows us to address all devices of a given type via broadcast I/O operations.
The address format for each of the three classes is as follows:
Class   DevType (msb)  DevNum   DevOffset (lsb)
Small   12 bits        10 bits  10 bits
Medium  8 bits         8 bits   16 bits
Large   4 bits         4 bits   24 bits
The most significant bits of an address determine the class of a device as follows:
Value (12 bits)               Type
000000000000                  Reserved - no DynaBus device should respond to this DevType
000000000001 to 000000011111  Small devices (1K) - 31 different types supported, 1024 devices per type
00000010XXXX to 00011111XXXX  Medium devices (64K) - 30 different types supported, 256 devices per type
0010XXXXXXXX to 1111XXXXXXXX  Large devices (16M) - 14 different types supported, 16 devices per type
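The class rules and field widths above can be combined into a single decoder. The sketch below is an illustration under the stated encoding; the function name and the tuple it returns are assumptions, and the check of the 15 high order bits of the full DynaBus address is left out.

```python
def decode_io_address(address: int):
    """Split a 32 bit IO address into (class, DevType, DevNum, DevOffset)."""
    if not 0 <= address < (1 << 32):
        raise ValueError("IO address must fit in 32 bits")
    top12 = address >> 20
    if top12 == 0:
        return ("reserved", 0, None, None)  # no device should respond
    if top12 <= 0x01F:
        name, type_bits, num_bits, offset_bits = ("small", 12, 10, 10)
    elif top12 <= 0x1FF:
        name, type_bits, num_bits, offset_bits = ("medium", 8, 8, 16)
    else:
        name, type_bits, num_bits, offset_bits = ("large", 4, 4, 24)
    dev_type = address >> (32 - type_bits)
    dev_num = (address >> offset_bits) & ((1 << num_bits) - 1)
    dev_offset = address & ((1 << offset_bits) - 1)
    return (name, dev_type, dev_num, dev_offset)
```

The three ranges of the top 12 bits do not overlap, so the class of an address is determined before any field is extracted.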
The DevType for a device is hardwired internally for each device, while DevNum is derived from a device's DynaBus DeviceID, which is set up at system initialization. Which bits of DeviceID are chosen as the DevNum is entirely up to the designer of the device. The only guarantee the designer must make is that the bits selected result in unique DevNums being assigned to devices of that type (i.e. no two devices of the type have the same DevNum).
B2. Address decoding by devices
For IOReadRequest and IOWriteRequest packets, devices should check that the DevType and DevNum match the internally stored values. A match indicates that this device is being addressed. Both IOReadRequest and IOWriteRequest require the device to generate a reply.
For BIOWriteRequest, devices should check only that DevType matches the internally stored value. A match indicates that this device is being addressed. This allows BIOWrites to be directed to all devices of a given type. When a device is requested to do a BIOWrite, it must not generate a reply since multiple devices may have been addressed.
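The matching rules for the three request types can be sketched as below. The Device class and its parameters are hypothetical, addresses are taken to be the 32 bit IO address with the high order DynaBus bits already checked, and field widths are passed in according to the device's class.

```python
class Device:
    """A DynaBus IO device with hardwired DevType and configured DevNum."""

    def __init__(self, dev_type, dev_num, type_bits, num_bits):
        self.dev_type = dev_type
        self.dev_num = dev_num
        self.type_bits = type_bits
        self.num_bits = num_bits

    def is_addressed(self, command, address):
        """True if this device must act on the request."""
        offset_bits = 32 - self.type_bits - self.num_bits
        dev_type = address >> (32 - self.type_bits)
        dev_num = (address >> offset_bits) & ((1 << self.num_bits) - 1)
        if command in ("IOReadRequest", "IOWriteRequest"):
            return dev_type == self.dev_type and dev_num == self.dev_num
        if command == "BIOWriteRequest":
            return dev_type == self.dev_type  # DevNum is not compared
        return False

    def must_reply(self, command, address):
        """Only IORead and IOWrite are answered by the addressed device."""
        return command != "BIOWriteRequest" and self.is_addressed(command, address)
```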
B3. Allocation Table for Small devices
Small devices have a 12-bit DevType, ranging from 01H to 1FH, a 10 bit DevNum, and a 10-bit DevOffset. The following table describes the current allocation:
Type  Device  Comments
01 Small Cache Access to Small Cache registers
02 Display Access to Display control registers
03 Memory Controller
04 Free
05 Free
06 Free
07 Free
08 Free
09 Free
0A Free
0B Free
0C Free
0D Free
0E Free
0F Free
10 Free
11 Free
12 Free
13 Free
14 Free
15 Free
16 Free
17 Free
18 Free
19 Free
1A Free
1B Free
1C Free
1D Free
1E Free
1F Free
B4. Allocation Table for Medium devices
Medium devices have an 8-bit DevType, ranging from 02H to 1FH, an 8 bit DevNum, and a 16-bit DevOffset. The following table describes the current allocation:
Type  Device  Comments
02 IOP Access to SloBus I/O space and IOP registers
03 Free
04 Free
05 Free
06 Free
07 Free
08 Free
09 Free
0A Free
0B Free
0C Free
0D Free
0E Free
0F Free
10 Free
11 Free
12 Free
13 Free
14 Free
15 Free
16 Free
17 Free
18 Free
19 Free
1A Free
1B Free
1C Free
1D Free
1E Free
1F Free
B5. Allocation Table for Large devices
Large devices have a 4-bit DevType, ranging from 02H to 0FH, a 4 bit DevNum, and a 24-bit DevOffset. The following table describes the current allocation:
Type  Device  Comments
02 IOP Access to SloBus memory space per byte
03 IOP Access to SloBus memory space per halfword/fullword
04 Map Cache Access to Map Cache entries and control registers
05 Free
06 Free
07 Free
08 Free
09 Free
0A Free
0B Free
0C Free
0D Free
0E Free
0F Free
Appendix C. Format of FaultCode
The error reporting mechanism on the DynaBus includes a fault bit and 32 bits of information about the fault, FaultCode. This section defines the format of FaultCode.
FaultCode is divided up into a 3 bit MajorCode, which appears in the low-order three bits, and 29 bits of MinorCode which comprise the rest of the word. MajorCode divides up all faults into 8 categories that are important to distinguish quickly, while MinorCode provides a way to encode a large number of infrequent subcases (actually, MinorCode is used only for one of the major codes).
The encoding of MajorCode is as follows:
Encoding  Name               Meaning
000       MemAccessFault     first write to page or insufficient privilege
001       IOAccessFault      insufficient privilege to read or write IO location
010       MapFault           map cache miss
011       AUFault            arithmetic unit fault
100       DynaBusTimeOut     transaction timeout on DynaBus
111       DynaBusOtherFault  some other DynaBus fault reported via reply packet
MinorCode is meaningful only for DynaBusOtherFault. Its top 10 bits give the id of the reporting device, while the remaining 19 bits indicate the fault. The encoding of these 19 bits is left up to the designers of individual devices.
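Unpacking a FaultCode according to this layout might look like the sketch below. The function name, the dictionary result, and the "Unassigned" label for encodings not listed in the table are illustrative assumptions.

```python
MAJOR_CODES = {
    0b000: "MemAccessFault",
    0b001: "IOAccessFault",
    0b010: "MapFault",
    0b011: "AUFault",
    0b100: "DynaBusTimeOut",
    0b111: "DynaBusOtherFault",
}


def decode_fault_code(fault_code: int) -> dict:
    """Split a 32 bit FaultCode into MajorCode and MinorCode fields."""
    if not 0 <= fault_code < (1 << 32):
        raise ValueError("FaultCode is 32 bits")
    major = fault_code & 0b111   # low-order three bits
    minor = fault_code >> 3      # remaining 29 bits
    result = {"major": MAJOR_CODES.get(major, "Unassigned"), "minor": minor}
    if major == 0b111:
        # MinorCode is meaningful only for DynaBusOtherFault.
        result["device_id"] = minor >> 19                # top 10 bits
        result["device_fault"] = minor & ((1 << 19) - 1)  # low 19 bits
    return result
```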
References
[DynaElec] DynaBus Electrical Specifications.
[DynaAlg] DynaBus Consistency Algorithm Specifications.
[DynaImpl] DynaBus Implementation Guidelines.
[ArbSpec] Arbiter Specifications.
[DBusSpec] The Dragon DBus Specifications.
[MapSpec] The Dragon Map Processor.
ChangeLog
October 13, 1986: [PSS] Added Small Cache, Display, and Map Cache to device allocation tables.
January 26, 1987: [PSS] Made numerous changes to fix inaccuracies and get the document to conform to truth once again.