XEROX Internal Memo

To: Dragon Core
From: Rick Barth, PARC - CSL
Subject: M Bus Compatibility
Date: November 19, 1984

Introduction

Long, long ago and far, far away, say about 400 miles, Chuck Thacker, Ron Rider, Steve Dashiell and God knows who else sat down and decided what the main bus, called the M bus, should look like for a new memory system. For a variety of reasons communications between the Dragon design crew and the folks down south broke down over the last couple of years and the decisions regarding the common bus fell by the wayside. Sometime towards the end of August of this year Ron Rider came cruising by and Ed McCreight, Carl Black and I had a little meeting with him where we talked about how similar our main memory buses were and, gosh, wouldn't it be nice if they were the same. There have been a number of phone calls, we made a visit to El Segundo, and a number of messages have been transmitted since then.

This memo is an attempt to write down the state of current designs, future designs that are contemplated which depend on current designs, different options from which we can choose, the impact of the options upon the designs, specifically Dragon, and to come to some sensible conclusion about what we should do.

Current Designs

ESS

The current ESS design has only one custom part that depends on the M bus protocol. It is a cache and will be referred to as the ESS cache. The specific M bus to which an ESS cache may be attached is referred to as the ESS M bus. The ESS cache is being used to connect an 8086-style bus to the M bus. It has a small amount of cached data, about 16 quad words, and has facilities for mailboxes and stream operations. It has 24-bit real and virtual address fields, although the protocol allows space for 31-bit virtual addresses and 28-bit real addresses. The protocol has 3 free commands, one of which is actually used internally by the cache, leaving 2 free commands for expansion.
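
The field widths quoted above can be turned into a quick estimate of how much addressing headroom the protocol leaves beyond what the ESS cache implements. The following is only an illustrative sketch: the function and constant names are hypothetical, and the memo does not state the unit of addressing (byte, word or quad), so the results are counts of addresses, not sizes in bytes.

```python
# Illustrative arithmetic only; names are hypothetical and not taken from
# any ESS or Dragon specification.

IMPLEMENTED_BITS = 24        # real and virtual address fields in the ESS cache
PROTOCOL_REAL_BITS = 28      # real address width the protocol leaves room for
PROTOCOL_VIRTUAL_BITS = 31   # virtual address width the protocol leaves room for

FREE_COMMANDS = 3            # unassigned command encodings in the protocol
USED_INTERNALLY = 1          # one of them is used inside the cache itself


def address_count(bits: int) -> int:
    """Number of distinct addresses an n-bit field can name."""
    return 1 << bits


def headroom_factor(implemented: int, allowed: int) -> int:
    """How many times larger the allowed space is than the implemented one."""
    return address_count(allowed) // address_count(implemented)


# Command encodings actually left for expansion.
commands_left = FREE_COMMANDS - USED_INTERNALLY

if __name__ == "__main__":
    print("implemented addresses:", address_count(IMPLEMENTED_BITS))
    print("real headroom factor:",
          headroom_factor(IMPLEMENTED_BITS, PROTOCOL_REAL_BITS))
    print("virtual headroom factor:",
          headroom_factor(IMPLEMENTED_BITS, PROTOCOL_VIRTUAL_BITS))
    print("commands left for expansion:", commands_left)
```

Under these assumptions the protocol leaves a factor of 16 of real address space and a factor of 128 of virtual address space beyond the 24-bit fields, with 2 command encodings left for expansion.
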
The ESS cache does not have any flow-controlled I/O operations. This makes connecting the synchronous M bus to an asynchronous bus such as the VME bus more difficult. It also prevents the map processor from using the M bus during map operations, thereby forcing map memory accesses to take a different route. The I/O address space does not have large contiguous sections and is limited to a total of no more than 25 bits because 4 bits are used for the command field and at least 3 bits have been removed to ease 8086 compatibility. This does not allow addressing large frame buffers in the I/O space. Provisions for multiple address spaces are not yet implemented and the proposed change to allow for them does not cleanly implement the required operations.

ED is currently awaiting second silicon on this chip. First silicon showed great promise. A number of I/O controllers, including disk and Ethernet controllers, have been or are being constructed based on the capabilities of the ESS cache. Memory controllers have also been built to this specification.

Dragon

There is a design for the Dragon cache that has executed each transaction on the M bus and all of the commands on the P bus. The execution was driven by a black box Rosemary test program. The level of description in the Rosemary code is slightly above that of transistors. The layout for an entry of the cache has been done, although the first pass at the control logic for an entry requires too much space and so it will have to be redone. Some preliminary layout of the CAM and RAM drivers has been done, but the design has changed so much since these were laid out that at best they serve as guidelines to how big these blocks might be. Basically a very firm specification for the circuit design and layout portions of the design process is in hand. About 6 months of work remains prior to submitting first masks for processing.
Modifications that will be made to the Dragon cache

Byte write enable pins will be added to the cache. A bus mode pin that causes the parity bit of a word being written to be calculated internally will be added. These pins will be added to the design regardless of whether or not the entire Dragon cache is redesigned. They allow for attaching 68020 and 80186 processors to the Dragon cache. A small pile of SSI and MSI cruft will be required between the cache and such foreign processors but implementation will be straightforward. I plan to attach 68020 processors to a complete Dragon memory system in order to test it independently of the Dragon processor. If someone were to sit down and play with the 68020 and 80186 processor interfaces enough that we could be reasonably sure of getting the interface logic right, there is plenty of silicon area to embed the interface for the Dragon, 80186 and 68020 all on one die and use strap pins to select which processor is actually attached.

Currently there is no way to detect when an illegal memory reference is made. The controllers in every bus master will be modified to detect when the command wires on the bus have not been driven to a value different from the value with which the bus master started a transaction. A bus master that detects this condition must raise an error and release the bus. The cache can handle this cleanly by causing its processor to trap. I/O controllers might set an error flag. The map processor could set an error flag and return an error condition as the result of the IORead or IOWrite that started the map processor.

There are a few inconsistencies between the current M bus specification and the implementation that must be cleaned up.

Reasons to redesign the cache

Regardless of which M bus protocol is used there are some reasons to redesign the cache. The most important reason is to make the RAM static. Dynamic RAM is a holdover from Thacker's design that I never really thought all the way through.
The problem with it is that we cannot calculate the error rate that will be introduced by alpha particle hits. We have to build the cell and see what the error rate is. Waiting until we have built the cache and embedded it in the system to find out if the error rate is acceptable seems like a recipe for disaster. A single port static RAM will have about the same area as the dual port dynamic RAM in the current design. I can go through reasonable bandwidth arguments which show that a dual ported RAM is not needed for the Dragon anyway. Some thought should be given to the amount of bandwidth a 68020 or 80186 will require if attaching those processors in an optimal manner is felt to be very important.

Another reason to redesign the cache is to remove some of the complexity from the bus controller. I am still adapting to the delay characteristics of dual layer metal and I put some complexity into building a fast controller that could probably be eliminated through judicious use of second level metal.

Finally I have some ideas about how to do the design using a better methodology. This is not a very compelling argument from a project management point of view except that it keeps me interested in designing the beast. Designing it for the fourth time with changes to the specifications but not to the methodology is boring. Changing both could be at least somewhat interesting.

Redesigning the cache, assuming that strict compatibility with the ESS M bus is not required, that the design methodology is not changed, and that I do the redesign, will add about 3 months to the design schedule, changing it from 6 months to 9 months until first masks.

Future Designs

C Workstation

The C workstation, which, lest anyone become confused, has nothing to do with the C programming language, is planned to be introduced by Xerox in early 1987. The requirements that I have seen for it specify that it be a multiprocessor and allow foreign processors to be attached to it.
The notion is that some number of enhanced Mesa processors will be the main computing engines in this machine. The actual amount of manpower working on it right now is very small. During the summer of 1985 it is expected that the manpower will ramp up.

Steve Dashiell told me that they would like to move the I/O controllers from the ESS design directly into the C workstation. They expect to design a processor cache in CMOS so that it will perform adequately with 16 MHz processors. They realize that the ESS M bus protocol is inadequate for the C workstation but believe that the I/O controllers can still be salvaged, at least temporarily. I have trouble believing that the I/O controllers should be salvaged for a product which is slated to come out some time after 1Q 1987. More cost effective solutions are becoming available for 2 of the key controllers, disk and Ethernet, and a new workstation should be designed to take advantage of them.

It is not clear how real the schedule or the proposed design are at this point. It is especially interesting to note that none of the SBUs have signed up for this thing and that no systems software group has, to my knowledge, examined the proposal for feasibility. I believe that the hardware can be built, perhaps within the proposed schedule, but hardware doth not a workstation make. However, it is clear that the C workstation is receiving a fair amount of high level management attention.

Options

Conform to the ESS M bus

Although it is possible to be completely compatible with the ESS M bus, I believe this would not be a wise choice. In addition to the restrictions mentioned earlier there are a significant number of warts that would contribute to the complexity of the design. For example, the order in which words within a quad are returned is not as simple in the ESS design as in the Dragon. This was done so that the amount of M bus bandwidth available does not change when a single memory chip goes bad.
They do not store parity within the cache because of their desire to easily interface with 16-bit processors, and so they do not transport parity on the same cycle as the data. This is bad news for caches which do store parity with the data internally. They have the notion of stream buffers for I/O, which may be necessary for an M bus that supports a printer but are not necessary in a workstation. They do not insert wait states into the ReadQuad protocol in the same manner that wait states are inserted into the WriteQuad protocol. The page size is 512 bytes instead of 1024 bytes. This doubles the size of the page table and has some as yet unknown, but negative, impact on cache performance. They do not allow a cycle between bus grant to the cache and actual assumption of bus mastership by the cache. This imposes significant space penalties in the cache because the cache must be ready to respond as a slave or as a master, rather than time multiplexing a single set of data paths to accomplish one or the other. There is only space in their encoding for 16 processors, which may be enough but does not leave enough room for experimentation. They have no notion of bus hold or a method for examining the internal state of the chips in the system. This eliminates a powerful debugging aid and performance analysis tool.

No one of these problems is insurmountable; all can be justified for the ESS design; none are necessary or even desirable for the Dragon design. They increase complexity or degrade performance or both.

The reasons that were originally given for being compatible with the ESS M bus included the ability to use their boards, backplane and mechanical system. The boards included a memory controller, simple map processor, memory, disk controller and Ethernet controller. Since then it has become clear that we cannot use their backplane or map processor, and the memory, disk and Ethernet controllers would have to be modified to be acceptable.
We can still use their mechanical system and we could modify their controller boards to be acceptable.

Go our own way

We could continue the design assuming that there will be no exchange of hardware with ED. This is attractive because it leaves us with the most flexibility to modify the design as problems arise. Flexibility can lead to reduced design time since it is possible to modify the specifications to make hard problems go away rather than having to really solve them. It is unattractive because we end up doing all of the work and there is a much less clear path from research to product. Considering the Dragon as an isolated project, I believe that this route will give us working Dragons faster than trying to be compatible with the ESS bus but is potentially slower than the next alternative.

Cooperatively design a new bus and the chips that go on it

This option gives us the opportunity to clean up all the warts in the ESS M bus design that we have discovered so far. With this cooperative option we have less time to notice new warts before they solidify. There still remains time between now and when the C workstation is fully staffed to finish the architecture down to quite detailed simulations. I think this may be the fastest method of producing both working Dragons and working C workstations.

Conclusions

The major motivations for becoming compatible with an ED M bus are to eliminate needless replication of effort in research and development and enhance the exchange of ideas and results between research and development. We should pursue these goals. We should not attempt to do so by becoming compatible with the ESS design because of the deficiencies in the protocol outlined earlier. Convergence of the Dragon and ED M buses should occur in the C workstation. The I/O controller issue should be resolved by either redesigning the I/O controllers or designing a gate array that translates between the new and old buses.
Joint design of a cache for Dragon and the C workstation should be explored. Technical exchange of portions of the designs should occur even if independent design efforts are pursued.