DYNABUS LOGICAL SPECIFICATIONS

DRAGON PROJECT — FOR INTERNAL XEROX USE ONLY

The Dragon DynaBus

Logical Specifications

Pradeep Sindhu

Dragon-86-?? Written May 22, 1986 Revised December 2, 1987

Abstract: The DynaBus is a high bandwidth, synchronous, packet switched bus that will be used to interconnect the DRAGON memory system and IO devices. This document provides the logical specifications for this bus and explains how it operates. Electrical specifications are not covered here, but are taken up in the document "DynaBus Electrical Specifications".

Keywords: Memory Interconnect, Multiprocessor Bus, Packet Switching, Consistency Protocol

FileName: [Indigo]<Dragon>DynaBus>DynaBusLogicalSpecifications.tioga, .interpress

XEROX Xerox Corporation
Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, California 94304

Contents

1. Introduction

2. Bus Overview

3. Bus Interface

4. Bus Protocol

5. Data Consistency

6. ConditionalWriteSingle

7. Input Output

8. Error Handling

Appendix A. Encoding of the Command Field

Appendix B. Allocation of IO Address Space

Appendix C. Format of FaultCode

References

1. Introduction

The DynaBus is a synchronous, packet switched bus designed to address the requirements of high bandwidth and data consistency within the memory system of a high performance shared memory multiprocessor. Within the Dragon System this bus is used at two levels in the memory hierarchy. At the first level, it interconnects the components of a Dragon cluster. These components include one or more Dragon processors attached to the bus via Small Caches; some memory, connected via a Big Cache; a Map Cache; and one or more IO devices (Figure 1). At the second level, the DynaBus interconnects clusters and main memory. The principal feature of this two level architecture is that it provides sufficient bandwidth to support a few tens of Dragon processors, high speed IO devices, and a high resolution color display, even when processors have a cycle time close to that of the bus. In our estimate, this assumption of balanced bus and processor cycle times most closely reflects what technology can deliver within the next few years.

Figure 1. The Two-level Dragon Memory Hierarchy

This document describes the structure, protocol, and operation of the DynaBus at the logical level. It begins in the following section with an overview of the bus's main characteristics. Section 3 defines the bus interface, which encapsulates all of the knowledge about the bus that a device needs to have in order to connect to it. Section 4 uses this interface definition to specify the bus protocol: it discusses arbitration, lists the set of bus commands currently defined, and shows how these commands are used in the various transactions. Section 5 describes how the protocol is used to keep multiple copies of cached data consistent. Section 6 discusses why there is a special ConditionalWriteSingle transaction. Section 7 takes up IO. Finally, Section 8 discusses error handling. Electrical characteristics are not covered here at all, but appear in a companion document titled "DynaBus Electrical Specifications". The appendices contain details that do not fit into the flow of the main document.

2. Bus Overview

A key requirement for a multiprocessor bus is that it be able to deliver a large usable bandwidth. The cycle time of such a bus must be correspondingly small since the usable bandwidth is at most as large as the electrical bandwidth. Traditional circuit switched bus designs, which tie up the bus for the duration of each transaction, rapidly become unreasonable as the cycle time of the bus is decreased. To see this, we first need to note that reads to memory constitute most of the traffic on a memory bus. If we assume that memory access time is approximately an order of magnitude larger than bus cycle time, we see that the bus remains idle for a significant fraction of the time. One way to avoid this difficulty is to increase the size of the transport unit to amortize the cost of memory accesses, but this compromises cache efficiency. To avoid these idle bus cycles, the DynaBus is packet switched: the request and reply for each transaction are dissociated and the bus is free to be used by other requesters in between. Although dissociation is strictly necessary only for transactions involving main memory, the DynaBus uses it for all transactions. Additionally, the dissociation is complete in that the number of delay cycles between request and reply is not fixed. In addition to making the design more uniform, these decisions have several important advantages. They eliminate bus deadlock, simplify the solution to the data consistency problem in a multi-level cache system, and make it possible to connect to slow devices such as bridges to industry standard buses.

Another important requirement for a multiprocessor bus is that it address the problem of maintaining consistency in the face of multiple updates to shared data. There are several well-known solutions to this problem for circuit switched buses, but unfortunately none of them is directly applicable to a packet switched bus. The DynaBus definition includes a novel scheme for maintaining consistency on a packet switched bus. This scheme is applicable to memory systems with more than one level of cache in the memory hierarchy where each level is connected by a packet switched bus.

The DynaBus can be understood best in terms of three layers: cycles, packets, and transactions (these layers correspond to the electrical, logical, and functional levels of operation, respectively). A bus cycle is simply one complete period of the bus clock; it forms the unit of time and electrical information transfer on the bus; the information is typically either an address or data. A packet is a contiguous sequence of cycles; it is the mechanism by which one-way logical information transfer (including broadcast) occurs on the bus. The first cycle of a packet carries address and control information; subsequent cycles typically carry data. A transaction consists of a pair of request-reply packets that together perform some logical function (such as a memory read); the pair may be separated by an arbitrary number of cycles, modulo timeout.

There are approximately 70 signals on the DynaBus. One of these is the bus clock, which provides synchronization. Several other signals serve the control functions of sending and receiving packets, of handling exceptional conditions, and of resetting the system. A multiplexed data/address path accounts for the majority of the signals (64 of them). During data cycles this path carries 64 bits of data; during address cycles it carries a command, some control bits, a device id, and an address; the address part is 47 bits and represents either a real address or an IO address, depending on the command. This permits direct access to memories of up to 128 Tera addressing units; in the case of Dragon the addressing unit is a 32-bit word. For efficient bus utilization, data is usually transferred in multiple word chunks called blocks. Finally, the signals Shared and Owner are used to maintain multiple copies of shared data in a consistent state.
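The addressing figure quoted above can be checked with a few lines of arithmetic. The sketch below uses Python purely for illustration; the constant and function names are hypothetical and are not part of the specification.

```python
# Illustrative arithmetic only; all names here are hypothetical.
ADDRESS_BITS = 47   # width of the address part of an address cycle
WORD_BYTES = 4      # the Dragon addressing unit is a 32-bit word
TERA = 2 ** 40      # "Tera" taken as a binary multiple (assumption)

addressable_words = 2 ** ADDRESS_BITS
addressable_bytes = addressable_words * WORD_BYTES


def in_tera(n: int) -> int:
    """Express n in units of 2**40."""
    return n // TERA


print(in_tera(addressable_words))  # 128 Tera words
print(in_tera(addressable_bytes))  # 512 Tera bytes
```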

Each DynaBus has an arbiter that permits the bus to be time multiplexed amongst contending devices, each of which is identified by a 10-bit unique id called the DynaBusId. These devices make requests to the arbiter using dedicated request lines, and the arbiter grants the bus using dedicated grant lines. Different devices may have different levels of priority, and the arbiter guarantees fair (bounded-time) service to each device within its priority level. Bus allocation is non-preemptive, however. A device that has been granted the bus is called bus master. The sequence of cycles initiated by a master will typically involve at least one other device on the bus; such a device is called a slave. Since the DynaBus is packet switched, every device must be capable of serving both as a master and as a slave. The notions of master and slave are adequate to describe the transmission of a single packet, but fail to capture transaction level interaction. The terms requestor and responder are used in this context to indicate the devices that place a request and the corresponding reply packet on the bus, respectively.

The following transactions are defined for the DynaBus: ReadBlock, FlushBlock, WriteBlock, WriteSingle, ConditionalWriteSingle, IORead, IOWrite, BIOWrite, Map, and DeMap. ReadBlock allows a cache to read a block from memory or another cache. FlushBlock allows caches to write back dirty data to memory. WriteBlock allows new data to be introduced into the memory system (for example reads from a disk). WriteSingle is a short transaction used by caches to update multiple copies of shared data without affecting main memory. IORead, IOWrite and BIOWrite initiate and check IO operations. The Map and DeMap transactions permit the implementation of high speed address mapping in a multiple address space environment. Finally, the ConditionalWriteSingle transaction provides the ability to do atomic read-modify-writes to shared data at the maximum rate transactions can be initiated on the bus. The encoding space leaves room for defining six other transactions.

3. Bus Interface

All devices connect to the DynaBus via a standard bus interface, shown in Figure 2. This interface consists of five signal groups: a control port, an arbitration port, a receive port, a send port, and a consistency port. The control port provides the clock. The arbitration port is for bus acquisition. The send and receive port signals allow a device to send and receive packets. The consistency port provides signals useful in keeping multiple copies of data in a consistent state.

This section simply names the signals in the interface and gives their functions without saying how they are used. Usage will be taken up in the bus protocol section, which immediately follows this one. The description uses the following conventions: S[a..b) denotes a group of b-a lines that encode the bits representing the signal S; most significant bits of the signal are written leftmost. A signal represented by a single wire is written unencumbered with the [..) notation. The term device will be used throughout to indicate an entity connected to the bus. A block will denote a chunk of 8 contiguous 32-bit words aligned in real address space such that the address of the first word is 0 MOD 8. All signal names in the interface use the device as reference, not the bus interface. Thus for example, DataOut is used to send data from the device to the bus.

Figure 2. DynaBus Logical Interface

Control Port

Clock (Input)

This signal provides the basic timing for interactions between a device and the interface. Any other clocks used by a device typically will be derived from this signal.

SStopOut (Output)

The signals SStopOut and SStopIn provide a way to synchronously stop all devices on the DynaBus. A device asserts SStopOut when it wants to bring the system to a synchronous stop, and keeps it asserted as long as it wants to maintain the stopped state. This stop signal is eventually seen as SStopIn by all devices on the DynaBus during the same cycle.

SStopIn (Input)

When the arbiter sees SStopIn, it stops granting requests, eventually halting all bus activity. Thus a device is not required to look at SStopIn, but it may do so to stop bus activity voluntarily.

DBus (Input/Output)

These signals constitute the system Debug bus (see [DBusSpec]).

DSelect (Input)

This signal tells a device that it is being addressed via the DBus (see [DBusSpec]).

Arbitration Port

RequestOut[0..2) (Output)

These signals are used to make bus requests to the arbiter.

Grant (Input)

This signal is used by the arbiter to signal a device that it has bus grant. A device must drive onto the bus for exactly the cycles during which Grant is asserted.

HiPGrant (Input)

This signal tells a device whether the current grant is for a high priority request or a low priority one. This signal comes one cycle earlier than Grant and is valid for exactly one cycle.

LongGrant (Input)

This signal tells a device whether the current grant is for a 5 cycle packet or a 2 cycle packet. This signal also comes one cycle earlier than Grant and is valid for exactly one cycle.

Receive Port

HeaderCycleIn (Input)

This signal indicates the beginning of a packet. It is asserted when the header (first cycle) of a packet has arrived and is ready to be read from DataIn. In other words, HeaderCycleIn can be used to latch the header from DataIn.

DataIn[0..64) (Input)

These 64 lines encode the cycles of incoming packets. The header cycle for a packet always has the same format. It carries the bus command, a Mode/Fault bit, a RplyShared bit, the id of the transaction's requestor, and an address. Subsequent cycles may carry either data or address, depending on the command. Also depending on the command, the address may be either in real address space or IO address space. The fields within the 64 bits of the header cycle are arranged as follows:

Figure 3. Encoding of Header Cycle

For machines with real and I/O addresses smaller than 47 bits, the address must appear in the low order bits of the field and the contents of the unused high order bits must be 0.
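As an illustration of how the header fields fill the 64 bits, here is a minimal Python sketch that packs and unpacks a header cycle. The field widths are taken from the text (a 5-bit command, one Mode/Fault bit, one RplyShared bit, a 10-bit device id, and a 47-bit address). The order of the fields, with the command in the most significant bits and the address in the least significant bits, is an assumption, since the layout of Figure 3 is not reproduced here. The names `Header`, `pack_header`, and `unpack_header` are hypothetical.

```python
from dataclasses import dataclass

# Field widths come from the text (5 + 1 + 1 + 10 + 47 = 64 bits).
# The ORDER of the fields is an assumption: Figure 3 is not reproduced here.
FIELDS = (
    ("command", 5),
    ("mode_fault", 1),
    ("rply_shared", 1),
    ("device_id", 10),
    ("address", 47),
)


@dataclass(frozen=True)
class Header:
    command: int
    mode_fault: int
    rply_shared: int
    device_id: int
    address: int


def pack_header(header: Header) -> int:
    """Pack the fields into one 64-bit integer, first field most significant."""
    word = 0
    for name, width in FIELDS:
        value = getattr(header, name)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_header(word: int) -> Header:
    """Inverse of pack_header: peel fields off from the least significant end."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return Header(**values)
```

A machine with a 32-bit address simply leaves the upper 15 bits of `address` at zero, which matches the rule that unused high order address bits must be 0.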

ParityIn (Input)

This signal carries parity computed over the DataIn lines. Its timing is currently undefined.

Send Port

HeaderCycleOut (Output)

This signal indicates the beginning of the packet being sent. It is asserted during the cycle in which the header is being sent. Note that HeaderCycleOut is generated by the device sending the packet, although it could also have been generated by the arbiter.

DataOut[0..64) (Output)

These 64 lines encode the cycles for outgoing packets. The encoding for the header cycle is the same as that given above.

ParityOut (Output)

This signal carries parity computed over the DataOut lines. Its timing is currently undefined.

Consistency Port

SharedOut (Output)

The DynaBus Shared line is used to maintain multiple copies of cached data in a consistent state. When an address cycle appears on the bus, all caches match the address against addresses they have stored. Caches that find a match use SharedOut to assert the Shared line. The delay from address in to SharedOut is implementation dependent and therefore not part of the bus specifications. There is no corresponding signal SharedIn because caches don't read the Shared line directly—its state is returned via a reply packet.

OwnerOut (Output)

The DynaBus Owner line is also used by caches to maintain consistency. A cache uses OwnerOut to assert the Owner line when a read request appears on the bus and the cache is the owner of that block (a cache is the owner of a block when its processor was the last one that did a write into the block). The delay from address in to OwnerOut is the same as for SharedOut. There is no corresponding signal OwnerIn because caches don't need to read the Owner line.

SpareIn[0..2) (Input)

These two lines are spare.

SpareOut[0..2) (Output)

These two lines are spare.

4. Bus Protocol

The DynaBus's protocol can be described best in terms of bus operation at three levels: cycles, packets, and transactions. These terms were defined earlier, but we will review them quickly. A cycle is one complete period of the bus clock; a packet is a contiguous sequence of cycles; and a transaction is a pair of request-reply packets. The first cycle of a packet always carries a device id, a command, a Mode/Fault bit, a RplyShared bit, and an address; subsequent cycles may carry data or address, depending on the command.

Before a transaction can begin, the requesting device must get bus mastership from the arbiter. Once it has the bus, the device puts a request packet on the bus one cycle at a time, releases the bus, and then waits for the reply packet. Packet transmission is uninterruptible in that no other device can take the bus away during this time, regardless of its priority. The transaction is complete when another device sends a reply packet. Note that an arbitrary number of cycles may elapse between the request and reply packets. This is not quite true because of timeouts and the need to do flow control, but we'll say more about these topics later. Under normal operation, the protocol ensures a one-to-one correspondence between request packets and reply packets; however, because of errors, some request packets may not get a reply. Thus, devices must not depend on the number of request and reply packets being equal since this invariant will not in general be maintained. The protocol requires devices to provide a simple, but crucial guarantee: that they service request packets in arrival order.

The detailed description of the protocol below first takes up arbitration and flow control. It then discusses time outs, the mechanism used to detect no response. Next, it lists the bus commands currently defined, says how NoOps are issued, and explains the purpose of the device id. The last subsection provides the detailed makeup of the different transactions. The description of the cache consistency algorithm is deferred to the following section.

Arbitration and Flow Control

The arbiter serves two related purposes. First, it time-multiplexes the bus accesses of multiple contending devices. Second, it implements flow control. For flow control we need to distinguish between two kinds of packets: requests and replies. A request packet is answered in a bounded time by a matching reply packet. The canonical example of a request-reply pair is a memory read: "What does memory location 234 contain?" -> "It contains 567." Since each of the two packets is independently arbitrated, some mechanism is necessary to avoid congestion. Congestion is being asked too many questions before being given the opportunity to answer. Such congestion might occur for example in a cache or a memory controller.

There are two mechanisms for avoiding congestion. The first is arbitration priorities. A reply packet always takes precedence over a request. This mechanism would eliminate the congestion problem entirely if devices were always ready to reply before the onset of congestion. Unfortunately this is not true for memory controllers, for which very long request queues would be required, so a second mechanism deals with congestion that impends before the device is ready to reply. The device demands system-wide hold, which takes precedence over all queries but yields to all replies. During system-wide hold the arbiter allows no further request packets. When the crisis has passed, the device releases its demand for system-wide hold.

Each device interacts with the arbiter via a dedicated port consisting of two request wires Request[0..1] and one grant wire Grant. Two other wires, HiPGrant and LongGrant, are shared by all devices connected to an arbiter. The arbiter supports two basic types of ports, called normal and memory. A normal port has two request counters, one for high priority requests and the other for low priority requests. In addition, each counter has an associated packet length. A device indicates whether it wants to make a low or high priority request via its Request wires. It may also use these wires to seize the bus, locking out all queries, and to release it. The Request wires are encoded as follows:

Encoding Interpretation
0 release this device's demand for system-wide hold, if any
1 demand system-wide hold
2 add a low priority request
3 add a high priority request

A memory port consists of a single request FIFO, with a single priority for all incoming requests. A device indicates the length of a packet via its Request wires. It may also use these wires to seize and release the bus. The Request wires are encoded as follows:

Encoding Interpretation
0 release this device's demand for system-wide hold, if any
1 demand system-wide hold
2 add a request for a 2 cycle packet
3 add a request for a 5 cycle packet

A device is permitted to have several pending requests within the arbiter, subject to an upper limit imposed by the arbiter implementation. A separate request is registered for each cycle in which the Request wires are in the "add a request" state. For memory ports the FIFO guarantees that grants will be given in the order in which requests arrived.
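The normal port encoding and the rule that one request is registered per cycle can be sketched as follows. This is a minimal Python model under stated assumptions: the class and member names are hypothetical, and the implementation-dependent upper limit on pending requests is not modelled.

```python
from enum import IntEnum


class NormalRequest(IntEnum):
    """Request-wire codes for a normal port, as listed in the first table."""
    RELEASE_HOLD = 0
    DEMAND_HOLD = 1
    ADD_LOW_PRIORITY = 2
    ADD_HIGH_PRIORITY = 3


class NormalPortSketch:
    """Hypothetical arbiter-side bookkeeping for one normal port."""

    def __init__(self) -> None:
        self.low_pending = 0    # low priority request counter
        self.high_pending = 0   # high priority request counter
        self.holding = False    # True while this device demands system-wide hold

    def clock(self, wires: int) -> None:
        """Sample the two Request wires once per bus cycle."""
        code = NormalRequest(wires)
        if code is NormalRequest.RELEASE_HOLD:
            self.holding = False
        elif code is NormalRequest.DEMAND_HOLD:
            self.holding = True
        elif code is NormalRequest.ADD_LOW_PRIORITY:
            self.low_pending += 1   # one request registered per cycle
        else:
            self.high_pending += 1
```

Holding the wires at 2 for two consecutive cycles therefore registers two separate low priority requests. A memory port would differ in that codes 2 and 3 select the packet length and the requests enter a single FIFO.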

The type of a port, as well as type-dependent characteristics (the lengths for the high and low priorities for a normal port, and the priority for a memory port) are provided at initialization time using the Debug bus. For a normal port the priority number of the high priority port (0, 1, 3, 4, 5, or 6) must be no greater than that of the low priority port, since lower priority numbers take precedence.

Grant is used by the arbiter to signal that a device has grant. Grant is asserted for as many cycles as the packet is long. If Grant is asserted in cycle i then the device can drive its outgoing bus segment in cycle i+1. HiPGrant and LongGrant describe a grant that is about to take place. The cycle before Grant is asserted, HiPGrant tells the device whether the next grant will correspond to a high priority request or not; and LongGrant tells the device whether the next grant will correspond to a 5 cycle packet or not. The timing of key signals at the pins of a requester is shown below for a two cycle request (DriveData of course doesn't appear at the pins of the chip, but the signals that it enables do).

Figure 4. Arbitration Timing for a Minimum Latency, Two-Cycle Packet; all signals are at the pins of a requesting device.

The arbitration delay shown (7 cycles) is the minimum. If bus traffic is heavy, competition may delay the arrival of Grant for a long time, depending on the sequence and priority of competing requests. Observe that HiPGrant and LongGrant arrive one cycle before Grant and are valid for exactly one cycle.

The arbiter supports six distinct priority levels. Highest priority requests are served first and requests within a level are served approximately round-robin. The current assignment of levels to devices is as follows:

Priority      Meaning
0 (highest)   cache reply priority
1             memory priority
3             display high priority
4             I/O priority
5             cache low priority
6 (lowest)    display low priority

The two highest priorities are assigned to reply packets. Memory uses its priority to send both five cycle and two cycle packets. The two priorities assigned to the display are the lowest and the highest for request packets. Normally, the display uses its low priority to satisfy requests; since this is the lowest priority, the display will end up using otherwise unusable bandwidth on the bus. Occasionally, when the display's queue gets dangerously empty, it adds a few high priority requests.
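The selection rule described above (serve the highest priority level first, rotate among devices within a level, and let only replies through during system-wide hold) can be sketched in Python. This is a simplified model, not the arbiter's actual implementation: strict round-robin stands in for the "approximately round-robin" service of the text, packet lengths and grant timing are ignored, and all names are hypothetical.

```python
PRIORITIES = (0, 1, 3, 4, 5, 6)   # levels from the table; 0 is highest
REPLY_PRIORITIES = (0, 1)         # the two highest levels carry reply packets


class ArbiterSketch:
    """Hypothetical model of grant selection only."""

    def __init__(self, devices):
        self.devices = list(devices)
        # Pending request count per priority level and device.
        self.pending = {p: {d: 0 for d in self.devices} for p in PRIORITIES}
        # Round-robin pointer per level: index of the device to consider first.
        self.next_index = {p: 0 for p in PRIORITIES}
        self.hold = False  # True while any device demands system-wide hold

    def request(self, device, priority):
        self.pending[priority][device] += 1

    def next_grant(self):
        """Return (device, priority) for the next grant, or None if idle."""
        # Assumption: during hold only the reply levels are eligible.
        levels = REPLY_PRIORITIES if self.hold else PRIORITIES
        n = len(self.devices)
        for p in levels:
            start = self.next_index[p]
            for offset in range(n):
                i = (start + offset) % n
                device = self.devices[i]
                if self.pending[p][device] > 0:
                    self.pending[p][device] -= 1
                    self.next_index[p] = (i + 1) % n
                    return device, p
        return None
```

With two caches each holding low priority requests at level 5, successive grants alternate between them, and a reply at level 0 is always granted ahead of both.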

The synchronous stop signal is used to enable and disable arbitration. After machine reset it is used to start arbitration synchronously. Thereafter it may be asserted and de-asserted at will. While it is asserted, no new packets will be granted, but the arbiter will continue to count requests, and grant them later when synchronous stop is de-asserted.

Time Outs

To detect devices that do not respond, or devices that aren't there in the first place, the protocol requires devices to implement their own time out facilities. Each device must maintain a counter that starts counting bus cycles when a bus request for a request packet is issued. If maxWaitCycles cycles have elapsed before the corresponding reply packet is received, the device must assume that an error has occurred and either handle it on its own or assert the DBus Error line.

The determination of a system-wide value for maxWaitCycles is a little tricky because of the wide variance in expected service times. For example, a low priority device might take a long time to just get bus grant, while a higher priority device would get grant relatively quickly. A low priority device might in fact be forced to wait arbitrarily long if a higher priority device decides to hog the bus. The question of whether this ought to be considered an error is debatable.

To avoid getting entangled in these issues, the protocol simply specifies a system-wide lower bound on the limit maxWaitCycles and leaves it up to the device implementor to decide the exact value. Such a lower limit is needed to avoid generating frequent false alarms. A conservative lower limit can be arrived at by computing the worst-case service time for a cache request and increasing it by an order of magnitude for safety (caches are taken since they are the lowest priority devices that do not change their request priority). Assuming there are 8 caches and only one memory bank, the worst case service time is at most

= 8 * (number of cycles to service one request in an unloaded system)
= 8 * 25 cycles
= 200 cycles.

Increasing this by an order of magnitude gives 2000 cycles, rounded here to 2048 (a power of two), so each device is required to have maxWaitCycles ≥ 2048.
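The time out rule can be sketched as a per-device counter. The Python below is a minimal illustration under stated assumptions: the class name, method names, and the choice to model a single outstanding request are hypothetical, and what the device does on expiry (handle the error itself or assert the DBus Error line) is left to the caller.

```python
CACHES = 8                     # number of caches assumed in the text
UNLOADED_SERVICE_CYCLES = 25   # cycles to service one request, unloaded system
WORST_CASE_CYCLES = CACHES * UNLOADED_SERVICE_CYCLES
MIN_MAX_WAIT_CYCLES = 2048     # system-wide lower bound on maxWaitCycles


class ReplyTimerSketch:
    """Hypothetical time out counter for one outstanding request."""

    def __init__(self, max_wait_cycles: int = MIN_MAX_WAIT_CYCLES) -> None:
        if max_wait_cycles < MIN_MAX_WAIT_CYCLES:
            raise ValueError("maxWaitCycles is below the system-wide lower bound")
        self.max_wait_cycles = max_wait_cycles
        self.elapsed = None  # None means no request is outstanding

    def start(self) -> None:
        """Call when the bus request for a request packet is issued."""
        self.elapsed = 0

    def reply_received(self) -> None:
        """Call when the matching reply packet arrives."""
        self.elapsed = None

    def tick(self) -> bool:
        """Advance one bus cycle; return True once the wait limit is reached."""
        if self.elapsed is None:
            return False
        self.elapsed += 1
        return self.elapsed >= self.max_wait_cycles
```

A device implementor remains free to pass a larger `max_wait_cycles`; the sketch only rejects values below the required bound.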

NoOps

It sometimes happens that a device that has done an arbiter request has nothing to send when it gets grant. In this situation the device is expected to send a NoOp packet of the same length as the packet it had originally intended to send. It does this simply by not asserting HeaderCycleOut during its allocated header cycle. Thus, there is no special command to indicate NoOp.

Commands

The packet command field is 5 bits. Four bits encode up to 16 different transactions, while the fifth bit encodes whether the packet is a request or a reply. Ten of the transactions are currently defined.

The description below lists commands by pairs (request/reply). For brevity, each command is given an abbreviation that appears in parentheses. The encoding for the commands appears in Appendix A.

ReadBlockRequest (RBRqst) Request to read a block from memory system
ReadBlockReply (RBRply) Reply containing the block

WriteBlockRequest (WBRqst) Request to write a block to the memory system
WriteBlockReply (WBRply) Reply indicating the write is done

FlushBlockRequest (FBRqst) Request to write a block from a cache to memory
FlushBlockReply (FBRply) Reply indicating the write is done

WriteSingleRequest (WSRqst) Request to update a single word of shared data in caches
WriteSingleReply (WSRply) Reply to update a single word of shared data in caches

ConditionalWriteSingleRequest (CWSRqst) Request to perform a read-modify-write to shared data
ConditionalWriteSingleReply (CWSRply) Reply to perform a read-modify-write to shared data

IOReadRequest (IORRqst) Request to read a single word from IO address space
IOReadReply (IORRply) Reply containing the word read

IOWriteRequest (IOWRqst) Request to write a single word to IO address space
IOWriteReply (IOWRply) Reply indicating the write was done

BIOWriteRequest (BIOWRqst) Request to broadcast write a single word to IO address space
BIOWriteReply (BIOWRply) Reply indicating it is ok to initiate the next operation, if any

MapRequest (MapRqst) Request to fetch a map entry from the Map Cache
MapReply (MapRply) Reply containing requested map entry or fault

DeMapRequest (DeMapRqst) Request to demap a physical page from caches
DeMapReply (DeMapRply) Reply to demap a physical page from caches

Mode/Fault

This bit has different interpretations for request and reply packets. In a request packet it supplies the mode (kernel=0, user=1) of the device that issued the request (when this device is a cache, the mode bit is the mode bit of the cache's processor). In a reply packet a 1 indicates that the device generating the reply encountered a fault, a 0 indicates no fault. When the fault bit is set in a reply packet, the 32 low order bits of the second cycle supply a FaultCode. The format of FaultCode is defined in Appendix C.

RplyShared

This bit is unused within request packets. Within reply packets it says whether the block is shared or not. Only devices that play the consistency game need to look at this bit.

Device ID

Every packet on the bus carries a device identifier in its first cycle. For request packets, this DeviceID carries the unique number of the device making the request. For reply packets it is the number of the intended recipient (i.e., the device that sent the request). Such an identifier is needed because the address part of a packet alone is not sufficient to disambiguate replies.

Devices that either have only one outstanding reply, or that have multiple outstanding replies but can somehow discriminate between them, need only one DeviceID. Other devices are allocated multiple DeviceIDs to allow them to disambiguate replies. These DeviceIDs must be contiguous and must be a power of two in number.

The DeviceID(s) for a device are loaded in at system initialization time via the Debug bus (see [DBusSpec] for details).
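One reason a contiguous, power-of-two range of DeviceIDs is convenient is that a device can recognize its own replies with a single mask-and-compare. The Python sketch below illustrates this. It additionally assumes that the base of the range is aligned to a multiple of its size, which the text does not state, and the function names are hypothetical.

```python
ID_BITS = 10  # DeviceIDs are 10-bit numbers


def _check_range(base_id: int, count: int) -> None:
    if count <= 0 or count & (count - 1):
        raise ValueError("count must be a power of two")
    if base_id % count:
        # Assumption, not stated in the text: the range is aligned to its size.
        raise ValueError("base_id is assumed to be a multiple of count")
    if not 0 <= base_id <= (1 << ID_BITS) - count:
        raise ValueError("range does not fit in the 10-bit id space")


def owns_reply(device_id: int, base_id: int, count: int) -> bool:
    """True if a reply's DeviceID falls in [base_id, base_id + count)."""
    _check_range(base_id, count)
    return (device_id & ~(count - 1)) == base_id


def slot_of(device_id: int, count: int) -> int:
    """Index of the outstanding request that this DeviceID identifies."""
    return device_id & (count - 1)
```

For example, a device allocated the four ids starting at 0x24 accepts a reply carrying 0x26 and treats it as the answer to its outstanding request number 2.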

Address

The address part of the first cycle is 47 bits. In the current implementation only 32 of these bits are used; the high-order 15 must be 0. To ensure easy extension, devices must check that these high order bits are 0 and do nothing if they are not. For non-IO transactions, the 32 bits represent the address of a 32-bit word in real address space. For IO transactions the 32 bits represent the IO address of a 32-bit word in the IO address space.

For all transactions other than Map, the contents of the address field in the request packet and its corresponding reply must be identical.

Transactions

The following transactions are currently defined.

ReadBlock

ReadBlock reads a block from main memory or from a cache, depending on whether main memory is consistent with the caches. Recall that a block is eight contiguous 32-bit words aligned in real address space such that the address of the first word is 0 MOD 8.

The request packet for ReadBlock consists of two address cycles. The first cycle provides the address of the block to be read; the second provides the address of the block being victimized within the cache requesting the ReadBlock. Bit 63 of this second cycle is a 1 if the address is valid, while bits 17-60 contain the block address; the value of bits 61 and 62 may be arbitrary. The victim address allows a second level cache to determine when a block is no longer in any of the first level caches, and therefore is a candidate for being replaced.

The reply packet for ReadBlock is five cycles long. The first cycle is the address of the first word being returned, and the remaining four cycles carry the cycles of block data. The 64-bit data cycles of the block must be in cyclic order, with the cycle containing the addressed word appearing first.
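The cyclic ordering of the data cycles can be illustrated with a short Python sketch. It assumes that data cycle k of a block holds words 2k and 2k+1, and that "cyclic order" means ascending order with wraparound; neither detail is spelled out in the text. The function name is hypothetical.

```python
from typing import List

WORDS_PER_BLOCK = 8    # a block is eight 32-bit words
WORDS_PER_CYCLE = 2    # one 64-bit data cycle carries two words
CYCLES_PER_BLOCK = WORDS_PER_BLOCK // WORDS_PER_CYCLE


def data_cycle_order(word_address: int) -> List[int]:
    """Order in which a block's four data cycles appear in the packet.

    The cycle containing the addressed word comes first; the rest follow
    in ascending order, wrapping around at the end of the block.
    """
    first = (word_address % WORDS_PER_BLOCK) // WORDS_PER_CYCLE
    return [(first + i) % CYCLES_PER_BLOCK for i in range(CYCLES_PER_BLOCK)]


print(data_cycle_order(0))  # [0, 1, 2, 3]
print(data_cycle_order(5))  # [2, 3, 0, 1]
```

Sending the addressed word's cycle first lets the requesting cache forward the word its processor is waiting for before the rest of the block has arrived.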

WriteBlock

WriteBlock writes a block to main memory and to matching caches. This transaction is used by producers of data (outside the memory system) to inject new data into the memory system.

The request packet is five cycles. The first cycle carries the address, and the remaining four carry the four cycles of block data. The data cycles must be in cyclic order, with the cycle containing the addressed word appearing first.

The reply packet, which is two cycles, serves as the acknowledgement that the write has been performed. The first cycle carries the address; the contents of the second cycle are undefined.

FlushBlock

FlushBlock writes a block to main memory from some cache—other caches do not listen. This transaction is provided (in addition to WriteBlock) to allow a cache to flush its data without conflicting with processor accesses within other caches.

The request packet is five cycles. The first cycle carries the address, and the remaining four cycles carry the four cycles of block data. The data cycles must be in cyclic order, with the cycle containing the addressed word appearing first.

The reply packet, which is two cycles, serves as the acknowledgement that the write has been performed. The first cycle carries the address; the contents of the second cycle are undefined.

WriteSingle

WriteSingle writes a 32-bit word out to real address space. This transaction is used by caches to maintain multiple copies of read/write data in a consistent state. Unlike FlushBlock, WriteSingle does not write the word out to memory, but is directed exclusively to caches.

The request packet for WriteSingle consists of two cycles, address and data, in that order. The reply packet is identical, except that the command is WriteSingleReply and the RplyShared bit indicates whether the datum is shared.

ConditionalWriteSingle

ConditionalWriteSingle does a read-modify-write to a given location in real address space. This transaction is provided to allow the actions of multiple processors to be synchronized.

The request packet is two cycles, the first of which carries the address. The second carries the old and new values, with the old value appearing in the most significant 32 bits and the new value in the least significant 32 bits.
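As a minimal sketch of the data cycle layout just described, the following Python helpers place the old value in the most significant 32 bits and the new value in the least significant 32 bits. The helper names are hypothetical and are not part of the specification.

```python
MASK32 = 0xFFFFFFFF


def pack_cws_data(old_value, new_value):
    """Build the 64-bit second cycle of a ConditionalWriteSingle request."""
    return ((old_value & MASK32) << 32) | (new_value & MASK32)


def unpack_cws_data(cycle):
    """Split the 64-bit data cycle back into (old_value, new_value)."""
    return (cycle >> 32) & MASK32, cycle & MASK32
```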

The reply packet is five cycles. The first two cycles are identical to the request packet, except for command and RplyShared, while the contents of the remaining three cycles are undefined.

IORead

The IORead and IOWrite transactions are used by caches to communicate with IO devices. The 32 bit address of an IORead or IOWrite defines a common, unmapped IO address space. Each IO device has a unique piece of the space allocated to itself, and is free to interpret IOReads and IOWrites to that piece in any manner it sees fit. Further details of the IO model and the makeup of the IO address space appear in Section 7 and Appendix B.

IORead reads a single word from IO address space. The request packet consists of two cycles, the first of which carries the IO address. This address appears in the low 32 bits of the 47 bit address field—the unused 15 high order bits must be 0. The contents of the second cycle are undefined.

The reply packet also consists of two cycles. The first carries the IO address in the request, while the second carries the data read from the IO device. The 32 bits of data appear in the low 32 bits of the 64 bit cycle; the unused high order bits are undefined.

IOWrite

IOWrite writes a single word to IO address space. The request packet is two cycles: the first carries the IO address and the second the data. The address appears in the low 32 bits of the 47 bit address field; the unused 15 high order bits must be 0. The 32 bits of data appear in the low 32 bits of the 64 bit cycle; the unused high order bits are undefined.

The reply packet is two cycles. The first cycle contains IOWriteReply as the command and the same address as in the corresponding request. The contents of the second cycle are undefined.

BIOWrite

BIOWrite (Broadcast IOWrite) writes a single word to the IO address space. This transaction is different from IOWrite in that there may be multiple devices that act on the request. The devices that act on a given BIOWrite are all instances of the same device type (see Appendix B); thus, BIOWrite is not a global broadcast, but only a broadcast to all devices of a given type.

The request packet is two cycles: the first carries the IO address and the second the data. The address appears in the low 32 bits of the 47 bit address field; the unused 15 high order bits must be 0. The 32 bits of data appear in the low 32 bits of the 64 bit cycle; the unused high order bits are undefined.

The reply packet is two cycles. The first cycle contains BIOWriteReply as the command and the same address as in the corresponding request. The contents of the second cycle are undefined.

Map

Map reads a virtual-page-to-real-page mapping entry from the Map Cache.

The request packet is two cycles. The first cycle contains the virtual page number, while the second contains the address space id in which the virtual page is to be mapped (see [MapSpec] for the detailed format).

The reply packet is also two cycles. The first cycle contains the real page and flags, while the second is undefined. Note that for Map, the address in the request and reply packets is not the same.

DeMap

DeMap causes the virtual valid bits for a given real page to be cleared in all caches.

The request packet is two cycles. The first cycle contains the real page number in bits 32..53, while the second cycle is undefined.

The reply packet is identical to the request packet except for the command.

5. Data Consistency

A basic decision in the Dragon architecture is that multiple copies of read/write data are allowed to exist when data is cached. An immediate consequence of this is that the copies must be kept consistent if computations are to produce correct results.

This section describes briefly how the above transactions can be used to maintain consistency in a system with multiple levels of cache. The description begins with a definition of data consistency. It then explains how consistency is maintained in a single level system. The two-level case is taken up last. Although only the single-level and two-level cases are described here, it should be noted that the algorithm works equally well for the N-level case.

Definition of Data Consistency

One way to define consistency for a particular datum A is to say that during each clock cycle all copies of A have the same value. While this definition is adequate and easy to understand, it is hard to implement efficiently when the potential number of copies is large. Fortunately, there is a weaker definition that is still adequate for correctness, but is much easier to implement in a large system. It is based on the notion of serializability.

To define serializability, consider the abstract model of a shared memory multiprocessor shown in Figure 5. There are some number of processors connected to a shared memory. Each processor has a private line to shared memory; over this line the processor issues the commands Fetch(A) and Store(A, D), where A is an address and D is data. For Fetch(A) the memory returns the value currently stored at A; for Store(A, D) it writes the value D into A and returns an indication that the write has completed. Let the starting time of an operation be the moment the request is sent over the line and the ending time be the moment either the return value or an indication of completion is received by the processor.

Figure 5. Abstract Model of Shared Memory Multiprocessor

Let S be the state of the shared memory before the start of some computation C, and F be the state after C has completed. The Fetches and Stores of C are said to be serializable if there exists some serial order such that if these operations were performed in this order, without overlap, the same final state F would be reached starting from the initial state S. Two operations are said to overlap if the starting time of one is before the ending time of the other and the starting time of the other is before the ending time of the one.
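The two definitions above can be made concrete with a small Python sketch. It is a brute-force illustration only: it considers Stores alone (Fetches do not change the memory state), represents memory as a dictionary, and tries every serial order, which is practical only for a handful of operations. All names are hypothetical.

```python
from itertools import permutations


def overlaps(op_a, op_b):
    """Each operation is a (start_time, end_time) pair."""
    return op_a[0] < op_b[1] and op_b[0] < op_a[1]


def is_serializable(initial, stores, final):
    """True if some serial order of the stores takes `initial` to `final`.

    stores is a list of (address, data) pairs; initial and final map
    addresses to values.
    """
    for order in permutations(stores):
        memory = dict(initial)
        for address, data in order:
            memory[address] = data
        if memory == final:
            return True
    return False
```

For two overlapping Stores of 1 and 2 to the same address, a final value of either 1 or 2 is serializable, while any other final value is not.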

Single Level Operation

For the purposes of explaining the consistency algorithm, a single level system consists of one or more processors connected to the DynaBus through caches, and a single main memory. The first thing to note about this configuration is that it is sufficient to maintain consistency between cached copies. Since processors have no way to access data except through their caches, the main memory copy can be stale without causing incorrect behavior.

The algorithm requires that for each block of data a cache keep two additional bits, shared and owner. For a given block, the shared bit indicates whether there are multiple copies of that block or not. This indication is not accurate, but conservative: if there is more than one copy then the bit is TRUE; if there is only one copy then the bit is probably FALSE, but may be TRUE. We will see later that this conservative indication is sufficient. The owner bit is set in a given cache if and only if the cache's processor wrote into the block last; thus at most one copy of a datum can have owner set. A cache is also required to maintain two state bits sharedAccumulator and rplyStale that correspond to an address for which the cache has sent a request but has not yet received a reply. The bit sharedAccumulator is used in computing the value to be put into the shared bit for the block addressed by the transaction, while rplyStale is used to determine if the data in a ReadBlockReply is valid when it arrives. In addition to this state, the algorithm uses two lines on the DynaBus, Shared and Owner that were described earlier in Section 3.

Generally, a cache initiates a ReadBlock transaction when its processor does a Fetch or Store to a block and the block is not in the cache; it initiates a FlushBlock when a block needs to get kicked out of the cache to make room for another one (only blocks with owner set are written out); and it initiates a WriteSingle when its processor does a write to a block that has the shared bit set. Caches do a match only if they see one of the following packet types: RBRqst, RBRply, WSRqst, WSRply, CWSRqst, CWSRply, and WBRqst. In particular, note that no match is done either for an FBRqst or an FBRply. This is because FB is used only to flush data from a cache to memory, not to notify other caches that data has changed. No match is done for a WBRply either, because all this packet does is acknowledge that the memory has processed the WBRqst.

When a cache issues an RBRqst or WSRqst, all other caches match the block address to see if they have the block. Each cache that matches asserts Shared to signal that the block is shared and also sets its own copy of the shared bit for that block. The requesting cache uses sharedAccumulator to compute the value of the shared bit. The reason it can't just copy the value of Shared into the shared bit like the other caches is that the status of the block might change from not shared to shared between request and reply due to an intervening packet with the same address. It is precisely to accumulate changes in the shared bit caused by intervening packets that sharedAccumulator is needed, hence its name. This computation proceeds as follows: the requesting cache first clears sharedAccumulator, and waits and watches packets on the bus, just like other caches. If the address in any of these packets matches the address that was sent out, the cache sets sharedAccumulator. When it sees the reply packet, it ORs sharedAccumulator and rplyShared from the packet and puts the result into the shared bit for the block. This ensures that the shared bit is TRUE for a block whenever there are multiple copies, and that the shared bit is cleared eventually if there is only one copy. The clearing happens when only one copy is left and that copy's processor does a store. The store turns into a WSRqst, no one asserts Shared, and so the value the requestor computes for the shared bit is FALSE.
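The requestor's side of this computation might be sketched in Python as follows. The class and method names are hypothetical; only sharedAccumulator and rplyShared come from the text, and the sketch models a single outstanding request.

```python
class PendingRequest:
    """State a requesting cache keeps between its request and the reply."""

    def __init__(self, address):
        self.address = address
        self.shared_accumulator = False   # cleared when the request is sent

    def observe(self, packet_address):
        """Called for every packet seen on the bus while waiting."""
        if packet_address == self.address:
            self.shared_accumulator = True

    def on_reply(self, rply_shared):
        """Return the value to store in the block's shared bit."""
        return self.shared_accumulator or rply_shared
```

If no intervening packet matches and the reply carries rplyShared = FALSE, the shared bit ends up FALSE, which is the clearing case described above.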

The manipulation of the owner bit is simpler. This bit is set each time a processor stores into one of the words of the block; it is cleared each time a WSRply arrives on the bus (except for the cache whose processor initiated the WSRqst). There are two cases to consider when a processor does a store. If the shared bit for the block is FALSE, then the cache updates the appropriate word and sets the owner bit right away. If the shared bit is TRUE, the cache puts out a WSRqst. When the memory sees the WSRqst, it turns it around as a WSRply with the same address and data, making sure that the shared bit in the reply is set to the value of the Shared line an appropriate number of cycles after the appearance of the WSRqst's header cycle. When the requestor sees the WSRply, it updates the word and also sets owner. Other caches that match on the WSRply update the word and clear owner. This guarantees that at most one copy of a block can ever have owner set. Owner may not be set at all, of course, if the block has not been written into since it was read from memory.
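The owner bit rules can be summarized in a short Python sketch. It tracks a single block, represents each cache by a name, and uses hypothetical function names; it is an illustration of the rules above rather than a model of the full protocol.

```python
def store_action(shared_bit):
    """Action a cache takes when its processor stores into a block it holds."""
    if shared_bit:
        return "issue WSRqst"            # owner is set later, on the WSRply
    return "update word and set owner"   # unshared block: no bus traffic


def apply_ws_reply(owner_bits, requester):
    """Return the owner bits after a WSRply for one block.

    owner_bits maps a cache name to its owner bit for the block; only
    caches that match on the reply are listed.
    """
    return {cache: cache == requester for cache in owner_bits}
```

After any WSRply, exactly one of the matching caches (the requestor) has owner set, which is why at most one copy can ever be the owner.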

When an RBRqst appears on the bus, two distinct cases are possible. Either some cache has owner set for the block or none has. In the first case the owner (and possibly other caches) assert Shared. The owner also asserts Owner, which prevents memory from responding, and then proceeds to supply the block via an RBRply. The second case breaks down into two subcases. In the first subcase no other cache has the block, Shared does not get asserted, and the block comes from memory. In the second subcase at least one other cache has the data, Shared does get asserted, but the block still comes from memory because no cache asserted Owner. Because the bus is packet switched, it is possible for ownership of a block to change between the request and its reply. Suppose for instance that a cache does an RBRqst at a time when memory was owner of the block, and before memory could reply, some other cache issues a WSRqst which generates a WSRply which in turn makes the issuing cache the owner. Since Owner wasn't asserted for the RBRqst, memory still believes it is owner, so it responds with the RBRply. To avoid taking this stale data, the cache that did the RBRqst uses its rplyStale state bit. It sets the bit to FALSE when it sends its RBRqst and sets it to TRUE if between the RBRqst and RBRply there is a WSRply, CWSRply, or WBRqst to the same address. When the RBRply is received, the cache checks if rplyStale is set. If it isn't, the RBRply is good and the block is put into the cache; if it is, the cache issues another RBRqst and waits for the reply. This process is repeated as many times as necessary to get valid data.
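The retry loop driven by rplyStale can be sketched in Python as below. Bus traffic is simulated as a list of rounds, each round being the packets seen between one RBRqst and its RBRply; this representation and the function name are assumptions made for the example.

```python
STALE_MAKERS = {"WSRply", "CWSRply", "WBRqst"}


def read_block(address, bus_rounds):
    """Return how many RBRqsts are needed before a valid RBRply arrives.

    bus_rounds is a list; element i holds the (packet_type, address) pairs
    observed between the i-th RBRqst and its RBRply.
    """
    requests = 0
    for seen in bus_rounds:
        requests += 1
        rply_stale = False                      # set FALSE when RBRqst is sent
        for packet_type, packet_address in seen:
            if packet_type in STALE_MAKERS and packet_address == address:
                rply_stale = True               # reply data will be stale
        if not rply_stale:
            return requests                     # RBRply is good
    raise RuntimeError("ran out of simulated bus traffic")
```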

It is interesting to note that in the above algorithm the Shared and Owner lines are output only for caches and input only for memory. This is because the caches never need the value on the Owner line, and the value on the Shared line is provided in the reply packet so they don't need to look at the Shared line either.

In the discussion above, we did not say how the transactions CWS and WB work. We will do this now. CWS is identical to WS in its manipulation of the shared and owner bits and of the Shared and Owner lines, so as far as consistency is concerned the two transactions can be treated the same. WB, on the other hand, is identical to FB as far as memory is concerned. Caches ignore FB, but overwrite their data for a matching WBRqst and clear the owner bit for this block.

Two-Level Operation

For the purposes of the consistency algorithm, a two-level system consists of a number of one-level systems called clusters connected by a main bus that also has the system's main memory. Each cluster contains a single big cache, which connects the cluster to the main bus, and a private DynaBus, which connects the big cache to the small caches in the cluster. This private DynaBus is electrically and logically distinct from the DynaBuses of other clusters. From the standpoint of a private DynaBus, its big cache looks identical to the main memory in a single-level system. From the standpoint of the main DynaBus, a big cache looks and behaves very much like a small cache in a single-level system. Further, the design of the protocol and the consistency algorithm is such that a small cache cannot even discover whether it is in a one-level or a two-level system—the response from its environment is the same in either case. Thus the small cache's behavior is identical to what was described in the previous subsection.

The algorithm requires the big cache to keep all of the state bits a small cache maintains, plus some additional ones. These additional bits are the existsBelow bits, kept one bit per block of the big cache. The existsBelow bit for a block is set only if some small cache in that cluster also has a copy of the block. This bit allows a big cache to filter packets that appear on the main bus and put only those packets on the private bus for which the existsBelow bit is set. Without such filtration, all of the traffic on the main bus would appear on every private bus, defeating the purpose of a two-level organization.
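The filtering role of existsBelow might be sketched as follows. The sketch covers only consistency packets addressed to a block; the tuple representation of packets and the function name are assumptions made for the example.

```python
def forward_to_private_bus(main_bus_packets, exists_below):
    """Select the main-bus packets a big cache repeats on its private bus.

    main_bus_packets is a list of (packet_type, block_address) pairs and
    exists_below maps a block address to its existsBelow bit; blocks the
    big cache does not hold are treated as having the bit clear.
    """
    return [packet for packet in main_bus_packets
            if exists_below.get(packet[1], False)]
```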

We have already stated that the behavior of a small cache in a two-level system is identical to its behavior in a one-level system. We have also said that a big cache behaves like main memory at its private bus interface and a small cache at its main bus interface. What remains to be described is what the big cache does internally, and how packets on a private bus relate to those on the main bus and vice-versa.

When a big cache gets an RBRqst from its private bus, two cases are possible: either the block is there or it's not. If it's there, the cache simply returns the data via an RBRply, making sure that it sets the shared bit in the reply packet to the OR of the value on the Shared line and the current state of the shared bit in the cache (recall that in the single-level system main memory returned the value on the Shared line for this bit). If the block is not in the cache, the cache puts out an RBRqst on the main bus. When the RBRply comes back the cache updates itself with the new data and its shared bit and puts the RBRply on the private bus. When a big cache gets a WSRqst on its private bus, it checks to see if the shared bit for the block is set. If it is not set, then it updates the data, sets owner, and puts a WSRply (with shared set to the value of the Shared line at the appropriate time) on the private bus. If shared is set, then it puts out a WSRqst on the main bus. The memory responds some time later with a WSRply. At this time the big cache updates the word, sets the owner bit, and puts a WSRply on the private bus with shared set to one. When a big cache gets an FBRqst, it simply updates the block and sends back an FBRply.

When a big cache gets an RBRqst on its main bus, it matches the address to see if it has the block. If there is a match and owner is set, then it responds with the data. However, there are two cases. If existsBelow is set, then the data must be retrieved from the private bus by placing an RBRqst on it. Otherwise the copy of the block it has is current, and it can return it directly. When a big cache gets a WSRqst on the main bus, it matches the address to see if the block is there and asserts Shared as usual, but takes no other action. When the WSRply comes by, however, and there is a match, it updates the data it has. In addition, if the existsBelow bit for that block happens to be set, it also puts the WSRply on the private bus. Note that this WSRply appears out of the blue on the private bus; that is, it has no corresponding request packet. This is another reason why the number of reply packets on a bus may exceed the number of request packets.

This completes the description of the consistency algorithm. This description has been sketchy, but intentionally so, since it is meant to be an introduction rather than a rigorous specification. Such a specification is contained in [DynaAlg].

6. ConditionalWriteSingle

The DynaBus defines the ConditionalWriteSingle transaction with the idea that the Dragon read-modify-write instruction ConditionalStore will be implemented directly by caches. This transaction, like the Dragon instruction, takes three arguments: an address, an old value and a new value, and its semantics are:

ConditionalWriteSingle[address, oldval, newval] Returns[sample] =

{<begin critical section>

sample ← address^;

If sample=oldval Then address^ ← newval

<end critical section>}

Direct implementation of ConditionalWriteSingle by caches has several key advantages over alternative schemes. First, it is easy to show that this scheme functions correctly since the proof is identical to that for WriteSingle. Second, it allows the maximum possible concurrency for read-modify-writes to a particular location. And third, the cost of a single read-modify-write as seen by a processor is small, especially when the location is not shared.
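The semantics given above can be mimicked in software with the following Python sketch. It is not how the caches implement the transaction: a threading.Lock stands in for the critical section, memory is a dictionary, and locations that were never written are assumed to read as 0. The class name is hypothetical.

```python
import threading


class Memory:
    """Toy word-addressed memory offering ConditionalWriteSingle semantics."""

    def __init__(self):
        self._words = {}
        self._lock = threading.Lock()   # models the critical section

    def conditional_write_single(self, address, oldval, newval):
        with self._lock:
            sample = self._words.get(address, 0)   # assumption: default 0
            if sample == oldval:
                self._words[address] = newval
            return sample
```

A processor could acquire a simple lock word by calling conditional_write_single(lock_address, 0, 1) repeatedly until the returned sample is 0, and release it by writing 0 back.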

7. Input Output

The DynaBus protocol was written with a particular model of IO devices in mind. In this model, all interactions with IO devices fall into one of two categories: control and data transfer. Control interactions are used to initiate IO and discover whether an earlier request has completed, while data transfer interactions actually move the data to and from memory. It is assumed that the bus bandwidth requirements of control interactions are small compared to those of data transfer. The model permits control and data interactions to be combined for devices with low data transfer rates.

Following the dictates of the model, this section first describes how control interactions work and then turns to data transfer.

Control

All control interactions are carried out through the use of IORead, IOWrite and BIOWrite transactions directed to a common, unmapped 32-bit IO address space. This address space is common in the sense that all processors see the same space, and it is unmapped in the sense that addresses issued by processors are the ones seen by the IO devices. Generally, each type of IO device is allocated a unique, contiguous chunk of IO space at system design time, and the device responds only if an IORead, IOWrite, or BIOWrite is directed to its chunk. The term IO device is being used here not just for real IO devices, but any device (such as a cache) that responds to a portion of the IO address space. See Appendix B for details on allocation of IO address space.

IOWrite transactions are typically used to set up IO transfers and to start IO. The address cycle of the request packet carries an IO address, while the data cycle carries 32 bits of data whose interpretation depends on the IO address. For block transfer devices, a processor typically will do a number of IOWrites to set up the transfer, and then a final IOWrite to initiate the transfer.

An IOWrite starts out at a small cache as an IOWRqst packet. The big cache of the cluster puts the IOWRqst on the main DynaBus, where it is picked up by all the other big caches. These caches put the IOWRqst on their private buses. Thus the IOWRqst is broadcast throughout the system. Broadcasting eliminates the need for requestors to know the location of devices in the hierarchy and makes for a simpler protocol. When the IOWRqst reaches the intended device, the device performs the requested operation and sends an IOWRply on its way. The IOWRply is broadcast in the same way as the IOWRqst, so it eventually makes its way to the requesting small cache. When the reply arrives, the small cache then lets its processor proceed.

IORead transactions read 32 bits of data from a device. This data may either be status bits that are useful in controlling the device, or data being transferred directly from the device to the processor.

IOReads work the same as IOWrites: An IORead starts out at a small cache as an IORRqst packet. The big cache of the cluster puts the IORRqst onto the main DynaBus, where it is picked up by other big caches and put on the private buses. Once the intended IO device receives the request, it reads the data from its registers and sends it on its way via an IORRply. The IORRply gets broadcast in exactly the same way as the IORRqst, and eventually makes its way to the cache that initiated the transaction. Note that for both IOReads and IOWrites exactly one device responds to a given IO address.

BIOWrites are used in cases where a processor needs to have more than one IO device act upon a command without having to explicitly send multiple IOWrites (interprocessor interrupts and map updates are examples where BIOWrites are useful).

A BIOWrite starts out at a small cache as a BIOWRqst packet. The big cache of the cluster puts the BIOWRqst on the main bus. The memory then generates a BIOWRply with the same parameters as the BIOWRqst, and all big caches put this BIOWRply on their private DynaBuses. Thus the BIOWRply is broadcast throughout the system. When the BIOWRply reaches the requesting small cache, the cache lets its processor proceed. Note that the reply is not generated by the IO device, but by main memory. The reason is that there is no unique IO device that can generate the reply packet. It is important to point out that errors that occur during a BIOWrite may not be caught by the requesting device's time out mechanism. If one of the intended recipients of the BIOWrite is broken, for instance, the requestor won't get any indication of it. This is a basic problem with broadcast operations, however, and there is no simple solution.

Data Transfer

A simple way to accomplish data transfer is to require that every IO device connect to the DynaBus via a cache. This way, data going in and out of the memory system automatically participates in the consistency algorithm, and no additional mechanism is needed.

Unfortunately, there is a disadvantage to this way of doing things. For input, the bus bandwidth required for data transfer is effectively doubled since a block must first be fetched to service a miss and then get written out when it is victimized. This doubling may be avoided by defining a new transaction, IOWriteStream, which is used directly to write to memory. The caches on the bus, of course, must also watch for IOWriteStreams and update their copies on a match.

8. Error Handling

The DynaBus presumes the following model for dealing with errors and other exceptional events. Each device provides its own capabilities for detecting errors, regardless of whether these errors are internal (for example a parity error on an internal bus) or result from interaction with other devices (for example transaction time-out, or illegal parameters within a request packet).

Once an error is detected, a device decides whether it can handle the error on its own, or needs to report the error to some other party. If the device is capable of handling the error, no facilities need to be provided by the bus, so this case is uninteresting. The errors that a device cannot handle by itself typically fall into two categories: recoverable (at least as far as the detecting device is concerned), and catastrophic.

When a device encounters a catastrophic error it uses the DBus to freeze the state of the machine so it can be examined by a debug processor (see [DBusSpec]). When it encounters a recoverable error while servicing a request, it uses the DynaBus Mode/Fault bit in the reply packet to report it. The least significant 32 bits of the first data word of the reply packet are set aside for the FaultCode (the format of FaultCode is described in Appendix C). Thus the only reporting mechanism the DynaBus provides is a facility for indicating that a transaction completed abnormally.

Appendix A. Encoding of the Command Field

The table below gives the encoding for the Command field within the header cycle of a DynaBus packet.

Encoding Command

0000 0 ReadBlockRequest
0000 1 ReadBlockReply
0001 0 WriteBlockRequest
0001 1 WriteBlockReply

0010 0 WriteSingleRequest
0010 1 WriteSingleReply
0011 0 ConditionalWriteSingleRequest
0011 1 ConditionalWriteSingleReply

0100 0 FlushBlockRequest
0100 1 FlushBlockReply

[0101 0..0111 1] Unused

1000 0 IOReadRequest
1000 1 IOReadReply
1001 0 IOWriteRequest
1001 1 IOWriteReply
1010 0 BIOWriteRequest
1010 1 BIOWriteReply

[1011 0..1101 1] Unused

1110 0 MapRequest
1110 1 MapReply
1111 0 DeMapRequest
1111 1 DeMapReply
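The table can be turned into a small decoder, sketched here in Python. The sketch assumes that the five bits are read as one integer with the request/reply bit as the least significant bit, which is how the table is laid out but is not stated explicitly; the names are hypothetical.

```python
TRANSACTIONS = {
    0b0000: "ReadBlock",
    0b0001: "WriteBlock",
    0b0010: "WriteSingle",
    0b0011: "ConditionalWriteSingle",
    0b0100: "FlushBlock",
    0b1000: "IORead",
    0b1001: "IOWrite",
    0b1010: "BIOWrite",
    0b1110: "Map",
    0b1111: "DeMap",
}


def decode_command(field):
    """Decode a 5-bit Command field into (transaction, kind), or None if unused."""
    transaction = TRANSACTIONS.get((field >> 1) & 0xF)
    if transaction is None:
        return None
    return transaction, ("Reply" if field & 1 else "Request")
```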

Appendix B. Allocation of IO Address Space

This section defines how the I/O address space on the DynaBus is allocated amongst I/O devices. This address space can be as large as 47 bits, but currently it is limited to 32 bits since Dragons only supply 32 bit I/O addresses. The high order 15 bits of a DynaBus address must be 0; devices must check these bits and respond only if they are 0, to facilitate future extensions.

The I/O address space allocation problem is nontrivial only because the space requirements of different devices are quite different from one another. Devices such as the small caches have very modest requirements, while devices such as the IOP, which produces addresses on an industrial bus, need large chunks of address space. Thus the obvious solution of dividing the space equally amongst the maximum planned number of devices doesn't work.

The allocation method chosen here is to divide all I/O devices into three classes, small, medium, and large, according to how many I/O addresses a device is likely to need. A small device is given 2^10 contiguous addresses, a medium device 2^16, and a large device 2^24. Section B1 below defines the precise address encoding for devices in each class. Section B2 describes how devices must decode which I/O operations are meant for them. The last three sections, B3, B4, and B5, give the current allocations for devices in each of the three classes, respectively.

B1. DynaBus I/O Address Encoding

DynaBus I/O addresses are split up into three fields: a device type DevType, which is different for each type of device (e.g. small cache, IOP, map cache); a device number DevNum, which is different for each instance of a given type; and a DevOffset that is the address of an I/O location within a particular device instance. Having an explicit concept of device type in the encoding is convenient because it allows us to address all devices of a given type via broadcast I/O operations.

The address format for each of the three classes is as follows:

          DevType (msb)   DevNum    DevOffset (lsb)

Small     12 bits         10 bits   10 bits
Medium    8 bits          8 bits    16 bits
Large     4 bits          4 bits    24 bits

The most significant bits of an address determine the class of a device as follows:

Value (12 bits)                  Type

000000000000                     Reserved - No DynaBus device should respond to this DevType

000000000001 to 000000011111     Small devices (1K) - 31 different types supported, 1024 devices per type

00000010XXXX to 00011111XXXX     Medium devices (64K) - 30 different types supported, 256 devices per type

0010XXXXXXXX to 1111XXXXXXXX     Large devices (16M) - 14 different types supported, 16 devices per type

The DevType for a device is hardwired internally for each device, while DevNum is derived from a device's DynaBus DeviceID, which is set up at system initialization. Which bits of DeviceID are chosen as the DevNum is entirely up to the designer of the device. The only guarantee the designer must make is that the bits selected result in unique DevNums being assigned to devices of that type (i.e. no two devices of the type have the same DevNum).
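The class determination and field split described in this section can be sketched in Python as follows. The function name and the tuple it returns are hypothetical; the field widths and class boundaries are taken from the two tables above.

```python
def decode_io_address(address):
    """Split a 32-bit IO address into (class, DevType, DevNum, DevOffset)."""
    if not 0 <= address < (1 << 32):
        raise ValueError("IO addresses are currently limited to 32 bits")
    top12 = address >> 20                       # the 12 most significant bits
    if top12 == 0:
        return ("reserved", 0, None, None)      # no device may respond
    if top12 <= 0x01F:                          # 12/10/10 split
        return ("small", top12, (address >> 10) & 0x3FF, address & 0x3FF)
    if top12 <= 0x1FF:                          # 8/8/16 split
        return ("medium", address >> 24, (address >> 16) & 0xFF, address & 0xFFFF)
    return ("large", address >> 28, (address >> 24) & 0xF, address & 0xFFFFFF)
```

For example, an address with DevType 05H in its top four bits decodes as a large device, which in the current allocation is the Map Cache.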

B2. Address decoding by devices

For IOReadRequest and IOWriteRequest packets, devices should check that the DevType and DevNum match the internally stored values. A match indicates that this device is being addressed. Both IOReadRequest and IOWriteRequest require the device to generate a reply.

For BIOWriteRequest, devices should check only that DevType matches the internally stored value. A match indicates that this device is being addressed. This allows BIOWrites to be directed to all devices of a given type. When a device is requested to do a BIOWrite, it must not generate a reply since multiple devices may have been addressed.
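A device's decode decision might look like the following Python sketch. The function signature, the string command names, and the way the device's class is passed in are all assumptions made for the example; the matching rules themselves follow the two paragraphs above, together with the requirement that the high order 15 address bits be 0.

```python
FIELD_WIDTHS = {          # (DevType, DevNum, DevOffset) widths in bits
    "small": (12, 10, 10),
    "medium": (8, 8, 16),
    "large": (4, 4, 24),
}


def device_responds(command, address, device_class, dev_type, dev_num):
    """True if a device with the given identity is addressed by the packet."""
    if address >> 32:                 # high order 15 bits must be 0
        return False
    _, num_bits, offset_bits = FIELD_WIDTHS[device_class]
    addr_type = address >> (num_bits + offset_bits)
    addr_num = (address >> offset_bits) & ((1 << num_bits) - 1)
    if command == "BIOWriteRequest":  # broadcast: DevNum is ignored
        return addr_type == dev_type
    if command in ("IOReadRequest", "IOWriteRequest"):
        return addr_type == dev_type and addr_num == dev_num
    return False
```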

B3. Allocation Table for Small devices

Small devices have a 12-bit DevType, ranging from 01H to 1FH, a 10 bit DevNum, and a 10-bit DevOffset. The following table describes the current allocation:

Type Device Comments

01 Small Cache Access to Small Cache registers
02 Display Access to Display control registers
03 Memory Controller
04 Free
05 Free
06 Free
07 Free
08 Free
09 Free
0A Free
0B Free
0C Free
0D Free
0E Free
0F Free
10 Free
11 Free
12 Free
13 Free
14 Free
15 Free
16 Free
17 Free
18 Free
19 Free
1A Free
1B Free
1C Free
1D Free
1E Free
1F Free

B4. Allocation Table for Medium devices

Medium devices have an 8-bit DevType, ranging from 02H to 1FH, an 8 bit DevNum, and a 16-bit DevOffset. The following table describes the current allocation:

Type  Device  Comments

02    IOB     Access to SloBus I/O space and IOB registers
03    Free
04    Free
05    Free
06    Free
07    Free
08    Free
09    Free
0A    Free
0B    Free
0C    Free
0D    Free
0E    Free
0F    Free
10    Free
11    Free
12    Free
13    Free
14    Free
15    Free
16    Free
17    Free
18    Free
19    Free
1A    Free
1B    Free
1C    Free
1D    Free
1E    Free
1F    Free

B5. Allocation Table for Large devices

Large devices have a 4-bit DevType, ranging from 02H to 0FH, a 4-bit DevNum, and a 24-bit DevOffset. The following table describes the current allocation:

Type  Device     Comments

02    IOB        Access to SloBus memory space per byte
03    IOB        Access to SloBus memory space per halfword/fullword
04    Free       (JMF may have his eyes on this one, so check with him)
05    Map Cache  Access to Map Cache entries and control registers
06    Free
07    Free
08    Free
09    Free
0A    Free
0B    Free
0C    Free
0D    Free
0E    Free
0F    Free
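The three address formats described in this appendix can be told apart by the DevType ranges alone, since the ranges do not overlap. The sketch below illustrates this under stated assumptions: the IO address is taken to be a 32-bit word with DevType in the most significant bits, followed by DevNum and then DevOffset, and the names `IOAddress`, `FORMATS`, and `split_io_address` are hypothetical. Addresses whose leading bits fall outside the listed DevType ranges are simply reported as unclassified.

```python
from typing import NamedTuple, Optional


class IOAddress(NamedTuple):
    size: str        # "small", "medium" or "large"
    dev_type: int
    dev_num: int
    dev_offset: int


# (size, DevNum bits, DevOffset bits, lowest DevType, highest DevType)
# Field widths and DevType ranges are those given in the allocation tables.
FORMATS = [
    ("large", 4, 24, 0x2, 0xF),       # 4-bit DevType
    ("medium", 8, 16, 0x02, 0x1F),    # 8-bit DevType
    ("small", 10, 10, 0x001, 0x01F),  # 12-bit DevType
]


def split_io_address(addr: int) -> Optional[IOAddress]:
    """Split an assumed 32-bit IO address into DevType, DevNum, DevOffset."""
    if not 0 <= addr < (1 << 32):
        raise ValueError("IO address must fit in 32 bits")
    for size, num_bits, offset_bits, lowest, highest in FORMATS:
        # DevType is whatever remains above the DevNum and DevOffset fields.
        dev_type = addr >> (num_bits + offset_bits)
        if lowest <= dev_type <= highest:
            dev_num = (addr >> offset_bits) & ((1 << num_bits) - 1)
            dev_offset = addr & ((1 << offset_bits) - 1)
            return IOAddress(size, dev_type, dev_num, dev_offset)
    return None  # leading bits match none of the listed DevType ranges
```

For example, under these assumptions an address whose top four bits are 5 decodes as a Large device of type 05 (the Map Cache), with the next four bits giving DevNum and the low 24 bits giving DevOffset.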

Appendix C. Format of FaultCode

The error reporting mechanism on the DynaBus includes a fault bit and 32 bits of information about the fault, FaultCode. This section defines the format of FaultCode.

FaultCode is divided into a 3-bit MajorCode, which occupies the low-order three bits, and a 29-bit MinorCode, which comprises the rest of the word. MajorCode divides all faults into 8 categories that are important to distinguish quickly, while MinorCode provides a way to encode a large number of infrequent subcases.

The encoding of MajorCode is as follows:

Encoding  Name               Meaning
000       MemAccessFault     first write to page or insufficient privilege
001       IOAccessFault      insufficient privilege to read or write IO location
010       MapFault           map cache miss
011       AUFault            arithmetic unit fault
100       DynaBusTimeOut     transaction timeout on DynaBus
111       DynaBusOtherFault  some other DynaBus fault reported via reply packet

The top 10 bits of MinorCode give the DynaBus DeviceID of the reporting device, while the remaining 19 bits indicate the fault. The encoding of these 19 bits is left up to the designers of individual devices.
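The FaultCode layout described above can be illustrated with a small pack/unpack sketch. It assumes that "top" means most significant, so that DeviceID occupies bits 31 to 22, the device-specific fault bits 21 to 3, and MajorCode bits 2 to 0. The function and field names are hypothetical, and encodings 101 and 110, which the table does not list, are reported as having no name.

```python
from typing import NamedTuple, Optional

# MajorCode names as listed in the encoding table; 101 and 110 are not listed.
MAJOR_CODE_NAMES = {
    0b000: "MemAccessFault",
    0b001: "IOAccessFault",
    0b010: "MapFault",
    0b011: "AUFault",
    0b100: "DynaBusTimeOut",
    0b111: "DynaBusOtherFault",
}


class FaultCode(NamedTuple):
    major_code: int    # 3 bits, low-order end of the word
    device_id: int     # 10 bits, top of MinorCode
    device_fault: int  # 19 bits, encoding chosen by the device designer


def pack_fault_code(major_code: int, device_id: int, device_fault: int) -> int:
    """Assemble a 32-bit FaultCode word from its three fields."""
    if not 0 <= major_code < (1 << 3):
        raise ValueError("MajorCode must fit in 3 bits")
    if not 0 <= device_id < (1 << 10):
        raise ValueError("DeviceID must fit in 10 bits")
    if not 0 <= device_fault < (1 << 19):
        raise ValueError("device fault must fit in 19 bits")
    minor_code = (device_id << 19) | device_fault  # 29-bit MinorCode
    return (minor_code << 3) | major_code


def unpack_fault_code(word: int) -> FaultCode:
    """Split a 32-bit FaultCode word back into its fields."""
    if not 0 <= word < (1 << 32):
        raise ValueError("FaultCode must fit in 32 bits")
    minor_code = word >> 3
    return FaultCode(word & 0b111, minor_code >> 19, minor_code & ((1 << 19) - 1))


def major_code_name(major_code: int) -> Optional[str]:
    """Name of a MajorCode, or None for encodings the table does not list."""
    return MAJOR_CODE_NAMES.get(major_code)
```

Because MajorCode sits in the low-order bits, a fault handler can dispatch on `word & 0b111` before looking at MinorCode at all, which matches the stated goal of distinguishing the major categories quickly.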

References

[DynaElec] DynaBus Electrical Specifications.
[DynaAlg] DynaBus Consistency Algorithm Specifications.
[DynaImpl] DynaBus Implementation Guidelines.
[ArbSpec] Arbiter Specifications.
[DBusSpec] The Dragon DBus Specifications.
[MapSpec] The Dragon Map Processor.
[ArbSpec] The Dragon Arbiter.

ChangeLog

October 13, 1986: [PSS] Added Small Cache, Display, and Map Cache to device allocation tables.

January 26, 1987: [PSS] Made numerous changes to fix inaccuracies and get the document to conform to truth once again.

December 2, 1987: [PSS] Changed Device Type of MapCache from 4 to 5.

December 2, 1987: [PSS] Changed description of FaultCode to make MinorCode applicable to all MajorCodes.