MemoryController.tioga
Copyright © 1986 by Xerox Corporation. All rights reserved.
Last Edited by: Gasbarro June 10, 1986 6:58:50 pm PDT
Dragon Memory Controller Specification
Jim Gasbarro
© Copyright 1986 Xerox Corporation. All rights reserved.
Abstract: This document describes a strawman for the main memory system of the June '87 Dragon.
Created by: Jim Gasbarro
Keywords: Dragon, Memory, RAM, Error Correction
XEROX  Xerox Corporation
   Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, California 94304

For Internal Xerox Use Only
1. Memory Control
Performance Goals
The majority of bus cycles will consist of Read Block operations. If necessary, the memory controller should be tuned to respond to these operations at the expense of other types of bus commands. Most of this document deals with the tradeoffs involved in delivering read data in minimum time.
Page Mode vs. Nibble Mode Access
When accessing a bit in a DRAM, the row and column addresses are strobed into the chip sequentially using the Row Address Strobe (RAS) and Column Address Strobe (CAS) lines. Once both RAS and CAS have been asserted, the user has the option of deasserting and reasserting CAS to initiate another RAM cycle. The details of what happens during this second cycle depend on whether the RAM is of the page or nibble mode variety. If the RAM uses page mode access, the value on the Address lines is used to access a new column of the RAM. Thus, any bit in the "page" (i.e. same row address) can be accessed without going through the RAS portion of the cycle. In nibble mode, the address during subsequent CAS cycles is ignored and the next data in sequence is delivered. The data sequence is determined by a wrap-around count of the low order bits of the initial CAS address. Most nibble mode RAMs use the low two bits for the nibble mode address, providing fast nibble mode access to four bits.
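To make the wrap-around ordering concrete, the following C fragment (an illustrative sketch only; the function name and the two-bit wrap are assumptions based on the description above) computes the column addresses a nibble mode RAM would deliver on successive CAS cycles:

#include <stdio.h>

/* Sketch: order in which a nibble mode DRAM delivers data, assuming the
   low two column-address bits wrap around modulo 4 while the high-order
   bits stay fixed at the initial CAS address. */
static unsigned NibbleAddress(unsigned initialCasAddr, unsigned step)
{
    unsigned high = initialCasAddr & ~0x3u;            /* fixed high-order bits */
    unsigned low  = (initialCasAddr + step) & 0x3u;    /* wrap-around count     */
    return high | low;
}

int main(void)
{
    unsigned start = 0x2B;                             /* arbitrary example address */
    unsigned i;
    for (i = 0; i < 4; i++)
        printf("CAS cycle %u -> column address 0x%02X\n", i, NibbleAddress(start, i));
    return 0;                                          /* prints 0x2B, 0x28, 0x29, 0x2A */
}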
The question of whether to use page mode or nibble mode access for the RAMs has been a hotly contested issue. The key difference between the two schemes is the number of RAMs that must be cycled in order to obtain the required memory bandwidth. In the case of page mode access, the cycle time is sufficiently long (230ns min.) that a full block (256 bits) must be cycled in parallel to obtain the required bandwidth. This leads to the problem of how to multiplex the 256 bits down to a width that can be fed to the error correction chip(s) without consuming large amounts of board area and power. The advantage of using a nibble mode cycle is that the 256 bits arrive from the RAMs time multiplexed, four bits at a time, so no external components are needed to reduce the pin count; the cost is a somewhat longer cycle time. A cycle-by-cycle analysis of the two schemes is given below:
Page Mode Read Access (40ns cycles)
Cycle   Bus Operation     Memory         ECC
0       Grant
1       Read Block Cmd
2       Victim Address
3       Request           Start Access
4
5       Grant             Data 0
6       Address           Data 1         Data 0
7       Data 0            Data 2         Data 1
8       Data 1            Data 3         Data 2
9       Data 2                           Data 3
10      Data 3
Nibble Mode Read Access (40ns cycles)
Cycle   Bus Operation     Memory         ECC
0       Grant
1       Read Block Cmd
2       Victim Address
3                         Start Access
4
5                         Data 0
6       Request
7                         Data 1
8       Grant
9       Address           Data 2         Data 0
10      Data 0                           Data 1
11      Data 1            Data 3         Data 2
12      Data 2                           Data 3
13      Data 3
Tradeoffs
Bank Size - for page mode the minimum memory size and increment is 32 MBytes. This is quite large. For nibble mode the minimum size and increment is 8 MBytes.
Area - for page mode a large bank size implies large area increments for additional memory. Also the multiplexers to route the data to the ECC chips require substantial area.
For a fee of about $6K plus $6 per module, Texas Instruments will mount the muxes on the same substrate as the RAMs if we use their SIP DRAM packaging technique. However, this makes using memories from other manufacturers difficult.
Latency - using nibble mode adds approximately 120ns to the latency of the memory over page mode. Current estimates are that this will cause approximately a 5% loss in uniprocessor performance.
Bandwidth - under the assumption that the majority of bus commands will be Read Blocks with a one command FIFO, the page mode controller should be able to pipeline the operation of the RAMs in such a way as to completely saturate the bus. The nibble mode controller, due to the increased latency and reduced RAM bandwidth, at best approaches 50% bus saturation. Thus a minimum of two controllers is needed to maximize the number of processors that can be serviced.
Power - the biggest advantage of nibble mode over page mode is in the area of power dissipation. In a DRAM the highest power consuming operation is the cycling of the sense amplifiers. Nibble mode accesses, which do not cycle the sense amps, use considerably less power. When using page mode, 288 chips are cycled at a maximum operating current per chip of 100mA (Fujitsu spec). At 100% duty cycle the power dissipation is 144 watts, or 3/4 of the 6085's +5V current capability. This is indeed a frightening number. To compare the two access schemes on an equal level, look at the energy required to read a single block:
Page mode: 120ns * 100mA * 5V * 288 chips
  = 17.3uW-Sec/Read Block
Nibble mode: (120ns * 100mA + 3 * 70ns * 25mA) * 5V * 72 chips
  = 6.21uW-Sec/Read Block
This does not include the power necessary for the multiplexor chips needed for page mode operation. An example mux chip would be a 74ALS648. These add an additional 4.3uW-Sec/Read Block. Thus page mode accesses consume nearly four times the power of nibble mode accesses.
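The arithmetic above can be cross-checked with the short C program below (a sketch; the currents, cycle times, and chip counts are simply the figures quoted in this section):

#include <stdio.h>

/* Recompute the energy per Read Block for the two access schemes using the
   per-chip figures quoted above. */
int main(void)
{
    double v      = 5.0;                                        /* supply voltage        */
    double page   = 120e-9 * 0.100 * v * 288.0;                 /* full cycle, 288 chips */
    double nibble = (120e-9 * 0.100 + 3 * 70e-9 * 0.025) * v * 72.0;
                                                                /* full cycle + 3 nibble
                                                                   cycles, 72 chips      */
    double mux    = 4.3e-6;                                     /* 74ALS648 muxes, page
                                                                   mode only             */
    printf("page mode   : %.1f uW-Sec/Read Block\n", page * 1e6);    /* ~17.3 */
    printf("nibble mode : %.2f uW-Sec/Read Block\n", nibble * 1e6);  /* ~6.21 */
    printf("ratio       : %.1f to 1, including muxes\n", (page + mux) / nibble);
    return 0;
}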
Page Address Caching - if page mode is used and the controller is smart enough, the memory latency can be reduced by one or two cycles when a page address hit occurs. There is currently no simulation data available to indicate how useful this feature is.
Bus Loading - in order to achieve full bus bandwidth from the memory system when using nibble mode, it is necessary to have at least two memory controllers on the bus. This increases bus length and capacitive loading somewhat, but with the propagation mode bus scheme it should pose no great problem.
Packaging
The most attractive form of packaging for the DRAMs is the SIP (Single Inline Package) or SIMM (Single Inline Memory Module) form. The SIP package has leads, while the SIMM has a card edge style connector and requires a socket. In both of these package styles one can buy modules with eight or nine RAMs, mounted on one or both sides, with the die arranged with their long edge horizontal or vertical. According to one sales representative, 90% of sales go to single-sided 8 or 9 RAM modules with the die mounted vertically. Using this style of module will cause board spacing problems; it would be preferable not to use two board slots to accommodate the RAMs. Information will soon be available on angled sockets for reducing RAM height.
Drive Considerations
The RAM modules have a capacitance of 80pF on some of their input lines. Eight such modules in parallel will total nearly 700pF. It seems clear that some sort of external buffer will be necessary to drive this load. There are two main problems to worry about when driving such loads. The first is ground bounce. The currents one encounters when driving ten 700pF address lines can be enormous, resulting in large voltage differences between ground on the driving chip and ground on the RAM. This can cause all sorts of problems that result in loss of data. To solve this problem one need only ensure that half of the RAMs are driven with the address while the other half are driven with the complement of the address. In such a system the currents in the ground plane sum to zero and there is very little potential difference between ground on the driving and receiving chips. A rough estimate of the transient current involved is sketched below.
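The sketch estimates the transient current using I = C dV/dt; the 5ns edge rate is an assumption chosen for illustration only, not a figure from this specification:

#include <stdio.h>

/* Rough estimate of the transient current when driving ten 700pF address
   lines through a 5V swing.  The 5ns edge rate is assumed. */
int main(void)
{
    double cLine = 700e-12;                       /* capacitance per address line */
    double swing = 5.0;                           /* volts                        */
    double tRise = 5e-9;                          /* assumed edge rate            */
    double iLine = cLine * swing / tRise;         /* I = C * dV/dt                */
    printf("per line  : %.2f A\n", iLine);        /* ~0.7 A                       */
    printf("ten lines : %.1f A\n", 10 * iLine);   /* ~7 A transient               */
    return 0;
}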
The second potential problem associated with driving DRAMs is overshoot (or undershoot). The inductance of the PC board trace and the capacitance of the RAM inputs combine to form a transmission line. If the output impedance of the driver is not matched to the impedance of the line, reflection from the end of the line results in overshoot on the driven line. With SIP technology the board spacing will be much closer than with conventional DIPs. It may be the case that termination is not needed because the line is very short, but this is not clear. Parts have been purchased to build a test jig to measure the magnitude of this problem.
Refresh
The data sheets available at this time (TI, Hitachi, Fujitsu) all offer 512 cycle, 8 mSec, CAS-before-RAS automatic refresh. This mode makes use of an internal counter in the memory chips to supply the refresh address. Use of this counter reduces the amount of hardware that has to be built. If refresh cycles are distributed over time, it will be necessary to do:
8 mSec / 512 cycles = 1 refresh cycle every 15.6 uSec.
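A minimal sketch of how the controller might schedule distributed refresh, expressed in 40ns controller cycles, is given below; the names and the counter mechanism are assumptions, not part of this specification.

/* Sketch: issue one CAS-before-RAS refresh roughly every 15.6 uSec so that
   all 512 refresh cycles complete within the 8 mSec refresh period. */
#define CYCLE_NS            40L
#define REFRESH_PERIOD_NS   8000000L                               /* 8 mSec   */
#define REFRESH_CYCLES      512L
#define REFRESH_INTERVAL_NS (REFRESH_PERIOD_NS / REFRESH_CYCLES)   /* 15625 ns */
#define CYCLES_PER_REFRESH  (REFRESH_INTERVAL_NS / CYCLE_NS)       /* ~390     */

static long refreshCountdown = CYCLES_PER_REFRESH;

/* Called once per 40ns controller cycle (hypothetical hook). */
void EveryCycle(void)
{
    if (--refreshCountdown <= 0) {
        refreshCountdown = CYCLES_PER_REFRESH;
        /* RequestRefreshCycle();  hypothetical: queue a CAS-before-RAS cycle */
    }
}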
Interface
Memory Data      64
Memory Check      8
Memory Address   12
Data Clocks       4
Refresh Cycle     1
Start Cycle       1
Bus DAL          64
Bus Interface     7
Total Pins      161
2. Error Handling
Scrubbing
In order to prevent the accumulation of multi-bit errors, a background task of one of the Dragon processors will be to periodically scrub the entire memory system. The idea is to catch infrequently accessed words with single-bit failures before a fatal double-bit error occurs. Failures of this type are primarily due to alpha particle hits and thus are statistical in nature. The probability of two alpha particle hits to the same word is sufficiently small that scrubbing the entire memory twice per day is enough to reduce the probability of a double alpha particle hit to nearly zero.
The scrubbing process must read every location in memory to find those locations where a single error has occurred but has not yet been detected through normal access by some other process. For a system with 32 MBytes of memory, the scrubbing process must perform:
32 MBytes / (32 bytes/Read Block * 12 hours * 3600 seconds/hour) ≈ 25 Read Blocks/sec
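The required rate can be checked with the following sketch (the constants simply restate the figures above):

#include <stdio.h>

/* Scrub rate for 32 MBytes scrubbed once every 12 hours (twice per day),
   reading 32 bytes per Read Block. */
int main(void)
{
    double bytes      = 32.0 * 1024 * 1024;          /* 32 MBytes            */
    double blockBytes = 32.0;                        /* one Read Block       */
    double passSec    = 12.0 * 3600.0;               /* 12 hours             */
    double rate = bytes / blockBytes / passSec;      /* Read Blocks per sec  */
    printf("scrub rate = %.1f Read Blocks/sec\n", rate);   /* ~24.3, call it 25 */
    return 0;
}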
Single-bit Errors
When a single bit error is detected by the Memory Controller, the error is corrected in real time and the data is passed to the requesting processor with no indication of the fault. The controller does, however, post the failing Address, Check, and Parity bits in an IO location. This allows a software process (probably the same one that performs scrubbing) to note the error and perform a read-correct-write operation to "fix" the bad bit. It is important to note that the read-correct-write operation must be performed atomically, to avoid overwriting an intervening write operation from another process. Thus it must be possible to lock the bus arbitration to prevent other processors from acquiring the bus, and to set the priority of the correcting process high enough to ensure that no other process on the same processor can be scheduled. Given the Address, Check, and Parity bits, the error fixing process can keep track of failing memory chips for maintenance purposes. A sketch of this fix-up step is given below.
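In the sketch, BusLock, BusUnlock, ReadErrorLog, ReadBlock, and WriteBlock are hypothetical routines standing in for whatever the bus and kernel actually provide; they are not part of the controller interface described here.

/* Sketch of the atomic read-correct-write performed by the error fixing
   process.  The bus must be locked so that no other processor's write can
   fall between the read and the write-back. */
typedef struct {
    unsigned long address;                 /* failing address              */
    unsigned      check;                   /* posted Check bits            */
    unsigned      parity;                  /* posted Parity bits           */
} ErrorLog;

extern ErrorLog ReadErrorLog(void);        /* IO read of the posted error  */
extern void BusLock(void);                 /* lock bus arbitration         */
extern void BusUnlock(void);
extern void ReadBlock(unsigned long addr, unsigned long data[8]);
extern void WriteBlock(unsigned long addr, const unsigned long data[8]);

void FixSingleBitError(void)
{
    ErrorLog      log = ReadErrorLog();
    unsigned long block[8];                /* one 256-bit block             */

    /* The correcting process must also run at high enough priority that no
       other process on the same processor is scheduled inside the lock. */
    BusLock();
    ReadBlock(log.address, block);         /* read returns corrected data   */
    WriteBlock(log.address, block);        /* write-back clears the bad bit */
    BusUnlock();
}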
Multi-bit Errors
In the event of a multi-bit failure the Memory Controller will assert the Bus Error signal, stopping the failing processor.
3. Miscellaneous Operations
Write Single
Some device on the bus must be responsible for returning Write Single Reply packets. Currently this task is assigned to the memory controller. The operation involves watching for Write Single Request packets on the bus, arbitrating for bus control, and sending a Reply packet. The Memory Controllers must be programmable in some fashion so that only one controller performs this operation.
(I hope to eventually punt this operation to some other device)
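A rough sketch of the reply duty follows, with hypothetical names for the packet watching and arbitration primitives; only the controller enabled as the responder runs this loop.

/* Sketch of the Write Single reply task.  Packet, WatchForPacket,
   ArbitrateForBus, and SendPacket are hypothetical stand-ins for the
   controller's bus interface. */
typedef enum { WriteSingleRequest, WriteSingleReply } PacketType;
typedef struct { PacketType type; unsigned long address, data; } Packet;

extern int    replyEnabled;                    /* set on exactly one controller */
extern Packet WatchForPacket(PacketType t);
extern void   ArbitrateForBus(void);
extern void   SendPacket(Packet p);

void WriteSingleReplyTask(void)
{
    for (;;) {
        Packet req = WatchForPacket(WriteSingleRequest);
        if (!replyEnabled)
            continue;                          /* other controllers ignore it   */
        req.type = WriteSingleReply;           /* turn the request into a reply */
        ArbitrateForBus();
        SendPacket(req);
    }
}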
IO Read
For error logging purposes the memory controller must be capable of responding to IO Read commands and returning the last failing address and syndrome bits.