[Indigo]<Dragon>Documentation>Map>MapProcessor.tioga!3

MapProcessor.tioga

Written by: Sindhu, November 23, 1984 2:18:11 pm PST

Last Edited by: Sindhu, June 22, 1985 5:24:24 pm PDT

Pradeep Sindhu January 14, 1986 10:57:55 pm PST

THE DRAGON MAP PROCESSOR

DRAGON PROJECT — FOR INTERNAL XEROX USE ONLY

The Dragon Map Processor

Design Document

Release as [Indigo]<Dragon>Documentation>MapProcessor.tioga, .press

Abstract: This memo describes the Dragon Map Processor, the device that maps virtual addresses to real and provides per-page memory protection in Dragon. The Map Processor's main features are fast response to mapping requests; the use of a small, fixed fraction of main memory for mapping; the support of multiple address spaces with sharing; and an organization in which both function and performance can be enhanced relatively easily after initial implementation.

Xerox Corporation
Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, California 94304

Contents

1. Introduction

2. The Addressing Architecture

3. System Organization

4. System Functions

5. The Map Cache

6. The Map Cache Controller

7. The Map Table

8. Miscellaneous Issues

Appendix A. Size of Bypass Area

Appendix B. Layout of IO Address Space

Appendix C. Format of Map Entries

Appendix D. Notation

1. Introduction

The Dragon Map Processor is a single logical device that implements the mapping from virtual addresses to real and provides per-page memory protection. Its main features are: fast response to mapping requests on the average (300 ns); the use of a map table that consumes a small (< 1%), fixed fraction of real memory; the support of multiple virtual address spaces that are permitted to share memory; and an organization in which both function and performance can be enhanced relatively easily after hardware implementation. In spite of these features the design is simple.

This document begins with the addressing architecture supported by the Map Processor. It then describes the Map Processor's organization in terms of its three components: the Map Cache, which is a custom VLSI chip; the Map Cache Controller, whose functionality is implemented by ordinary Dragon processors; and the main memory Map Table, which stores the virtual to physical mapping information. The following section provides the functional specifications for the Map Processor; it also indicates how each function is invoked and where it is implemented. The next three sections describe each of the components in greater detail, while the last section addresses design issues that do not fit elsewhere. Important information is collected together in the Appendices.

2. The Addressing Architecture

The Map Processor's addressing architecture supports multiple address spaces, with fixed size pages being the unit of mapping and protection. The virtual addresses issued by each Dragon processor are mapped according to one of a number of independent addressing contexts, or address spaces. For a given virtual address the mapping to a real address can be thought of as a two step process, the first step being to determine the address space in which the virtual address must be interpreted, and the second being to perform the actual translation.

Before we describe each of these steps it is convenient to define basic terms and introduce some shorthand notation. A page is a contiguous chunk of memory 2^s 32-bit words long, aligned on a 2^s word boundary; s is fixed at 10. In our description virtual addresses will be denoted by va, real addresses by ra, virtual pages by vp, real pages by rp, and offsets into pages by offset. Both va and ra are 32-bit quantities, both vp and rp are (32-s)-bit quantities, and offset is an s-bit quantity. If vp and offset are the virtual page and offset corresponding to va, we will write va=vp|offset; similarly, we will write ra=rp|offset for real addresses. Four flag bits are kept for each page; these will be denoted by flags. Faults associated with address mapping will be communicated via a three-bit order field. The sizes of these quantities are constrained in the architecture by the relation ||rp||+||flags||+||order||=32, where ||x|| is shorthand for ⌈log2 x⌉, the number of bits needed to represent x. Address spaces will be denoted by their address space id, aid, a quantity constrained by ||aid||+||vp||=32. Processors will be identified by their processor number, pn.
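The notation above can be made concrete with a minimal sketch. The function names split, join, and bits are illustrative and do not come from the Dragon design; only s = 10 and the definitions of page|offset and ||x|| are taken from the text.

```python
S = 10                      # s in the text: a page is 2**S words long
OFFSET_MASK = (1 << S) - 1  # selects the low s bits of an address


def split(addr):
    """Split a 32-bit address into (page, offset), so that addr = page|offset."""
    return addr >> S, addr & OFFSET_MASK


def join(page, offset):
    """Form page|offset, that is page * 2**s + offset."""
    return (page << S) | offset


def bits(x):
    """||x||: ceil(log2 x), the number of bits needed to distinguish x values."""
    return (x - 1).bit_length()
```

With these definitions a 32-bit va always splits into a 22-bit vp and a 10-bit offset, and join(*split(va)) gives back va.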

Figure 1 illustrates address mapping for processor pn. Recall that the first step involves determining what address space to use for translating a given va. For this step the 32-bit address space of each processor is divided up into two non-overlapping regions: a 2^28 word shared region at the low end of the address space, and a (2^32 - 2^28) word switched region comprising the rest. If va lies in the shared region the address space used for mapping is space 0, otherwise it is the one currently loaded on the processor. At most one address space can be loaded on a given processor at a time, but the same space may be present simultaneously on different processors. The overall effect of the first step therefore is to map (pn, va<31:28>) onto aid. The second step translates va in the context of the aid given by the first step. This translation produces ra=rp|offset, where rp is obtained by looking up the entry corresponding to aid|vp in the Map Table.
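The two-step mapping can be sketched as follows. This is only an illustration under stated assumptions: the Map Table is modeled as a Python dict keyed by (aid, vp), the per-processor space registers as a dict keyed by pn, and a missing entry is modeled by a hypothetical MapFault exception. None of these names come from the design itself.

```python
S = 10                   # log2 of the page size in words
SHARED_LIMIT = 1 << 28   # the shared region is the low 2**28 words


class MapFault(Exception):
    """Hypothetical stand-in for a missing (aid, vp) entry."""


def select_space(space_register, pn, va):
    """Step 1: map (pn, va<31:28>) onto an address space id."""
    if va < SHARED_LIMIT:          # va<31:28> is zero: shared region
        return 0                   # the shared region always uses space 0
    return space_register[pn]      # switched region: space loaded on pn


def translate(map_table, space_register, pn, va):
    """Step 2: translate va in the context of the aid chosen by step 1."""
    aid = select_space(space_register, pn, va)
    vp, offset = va >> S, va & ((1 << S) - 1)
    if (aid, vp) not in map_table:
        raise MapFault((aid, vp))
    rp = map_table[(aid, vp)]
    return (rp << S) | offset      # ra = rp|offset
```

For example, two processors with different spaces loaded get the same ra for any va below 2^28, but may get different results for a va in the switched region.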

The architecture also provides a mechanism to bypass the above mapping. Bypassing is needed to provide a way for processors to access the Map Table and map-fault handling code without getting map faults in the process; it is also useful in turning off mapping altogether. Bypassing gets activated whenever a processor makes a reference to a portion of virtual address space called the map bypass area. This area appears at the same locations in all address spaces, and is defined by three (32-s)-bit quantities BypassMask, BypassPattern, and BypassBase. The mask and pattern determine the bypass area's location and size in virtual memory as follows: a va is defined to be in the bypass area if the bits of va under BypassMask match the corresponding bits of BypassPattern. BypassBase determines the starting location of the area in real memory. The real address produced by bypass mapping is ra=rp|offset, where rp = (BypassBase AND BypassMask) OR (vp AND ~BypassMask); that is, the bits of rp under the mask come from BypassBase and the remaining bits come from vp. The mask, pattern and base can all be modified.
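A small sketch of bypass mapping follows. It reads the formula for rp as (BypassBase AND BypassMask) OR (vp AND NOT BypassMask), which is an assumption about the intended operators; the function names and the register values used in the usage note are made up for illustration.

```python
VP_BITS = 22                      # 32 - s, with s = 10
VP_MASK = (1 << VP_BITS) - 1


def in_bypass(vp, mask, pattern):
    """True if the bits of vp under mask match the same bits of pattern."""
    return (vp & mask) == (pattern & mask)


def bypass_rp(vp, mask, base):
    """Real page for a vp in the bypass area (assumed AND/OR reading)."""
    # Bits under the mask come from the base; the rest pass through from vp.
    return (base & mask) | (vp & ~mask & VP_MASK)
```

With mask = 0x3F0000, pattern = 0 and base = 0x120000, virtual page 0x000ABC lies in the bypass area and maps to real page 0x120ABC. With mask = 0 every vp matches and bypass_rp returns vp unchanged, which is consistent with the identity mapping described for power up.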

This architecture is designed to facilitate sharing between address spaces. The shared area provides a restricted form of sharing where all of the vp's sharing a particular rp must be identical; this sharing is also all-or-nothing in that a page is shared by all spaces if it is shared by any. It is expected that all but a few cases of sharing will be covered by this restricted mechanism. The architecture also permits more general sharing in which the vp's sharing a particular rp are unrestricted and where each page may be shared by any subset of the spaces.

3. System Organization

The Map Processor consists of three components (Figure 2): a custom VLSI Map Cache, a Map Cache Controller, and a Map Table kept in main memory. The Map Cache is simply a performance accelerator that sits between processor caches and the Map Table. It contains around 1200 mapping entries and allows mapping requests that hit to be serviced in three cycles, or 300 ns. Processor caches themselves keep a limited number of entries (~50), so the Map Cache is really a second level cache. When a map miss occurs in a processor cache, the cache fires off a mapping request to the Map Cache. The Map Cache returns the entry if it is present, and signals a map fault if it is not. This fault is then handled by the processor attached to the cache that got the map miss. A small table in the Map Cache is used to determine the address space in which the references for each processor should be mapped (the first step of the mapping process described in the last section). The Map Cache also implements a number of other operations, including ones to flush entries, to manipulate the control bits for entries in processor caches, to switch address spaces, and to control the location and size of the map bypass area. All communication with the Map Cache occurs over the M-Bus.

The Map Cache Controller is a fictitious device whose functionality is implemented by ordinary Dragon processors. As explained above, whenever the Map Cache misses, the processor whose cache got the miss fields the map fault. This processor plays the role of the Cache Controller in fetching the missing entry from the Map Table and shipping it to the Map Cache. In addition to servicing misses, the Controller also implements a complete set of operations for manipulating the Map Table. The code for these operations is available to all processors and can be executed by any one of them. This split in functionality between Map Cache and Controller forces us to freeze in hardware only that portion of the design which is necessary for speed, leaving the remainder to be implemented by ordinary Dragon code. As a result, enhancements in both function and performance can be made relatively easily. For example, the structure of the Map Table can be left completely open since the hardware has no knowledge of it.

The final component is the Map Table. It serves as the repository for mapping information for all address spaces, mapping (aid, vp) to rp. For each real page the table also maintains four flag bits used for paging and memory protection; in high to low order these are: shared, wtEnable, dirty, and spare. Note that this order is compatible with ED's order which is: spare, wtEnable, dirty, and referenced. The Map Table stores information only about pages currently in main memory, and is structured as a hash table indexed by (aid, vp). This allows the table to fit in a small, fixed fraction of main memory (< 1%), in contrast with direct map schemes. These schemes consume table space proportional to all of virtual memory or to the fraction being used frequently, depending on how they are implemented. Assuming that the hash function spreads things properly, the average time to access the table will be quite good. The worst case time, however, depends on how collisions are resolved. Initially, linear chaining with "move to front on access" will be used; if this turns out to be a performance problem the scheme will be modified to use a tree structure. Dragon processors access the Map Table directly via the map bypass area.
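Linear chaining with "move to front on access" can be illustrated with a short sketch. This is not the Map Table's actual layout: buckets are modeled as Python lists, Python's built-in hash stands in for the real hash function, and the class and method names are invented for the example.

```python
class ChainedMap:
    """Hash table keyed by (aid, vp) with move-to-front collision chains."""

    def __init__(self, n_buckets):
        self.buckets = [[] for _ in range(n_buckets)]

    def _chain(self, key):
        # Python's hash() is only a placeholder for the real hash function.
        return self.buckets[hash(key) % len(self.buckets)]

    def write(self, aid, vp, rp, flags):
        """Add or overwrite the entry for (aid, vp)."""
        chain = self._chain((aid, vp))
        for i, (key, _) in enumerate(chain):
            if key == (aid, vp):
                chain[i] = (key, (rp, flags))
                return
        chain.insert(0, ((aid, vp), (rp, flags)))

    def read(self, aid, vp):
        """Return (rp, flags), moving the entry found to the chain's front."""
        chain = self._chain((aid, vp))
        for i, (key, value) in enumerate(chain):
            if key == (aid, vp):
                if i:
                    chain.insert(0, chain.pop(i))
                return value
        return None   # not in the table
```

The point of moving a found entry to the front is that frequently mapped pages end up near the head of their chain, so the average search stays short even when a chain grows.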

4. System Functions

This section describes the functions implemented by the Map Processor. Map Cache functions are taken up first, followed by Cache Controller functions. The description for each function includes its specification, the method used for invoking it, and miscellaneous information relevant largely only to the implementor.

4.1 Map Cache Functions

Two of the Map Cache functions are invoked via the dedicated M-Bus transactions ReadMapSetRef and ReadMapSetRefSetDirty, while the remainder are invoked by IOReadFlow and IOWriteFlow. For the former two the address portion of the transaction is restricted to be in the range [0..2^25). For the latter two it is restricted to be in the range [0..3*2^25).

ReadMapSetRef(pn: ProcessorNumber, vp: VirtualPage) Returns(rp: RealPage, order: Order, flags: Flags)

ReadMapSetRef returns the real page rp and flags corresponding to a given virtual page vp; the mapping is performed in the context of the address space currently loaded on processor pn, say aid. If there is no entry for (aid, vp) in the Map Cache a map fault is signalled by setting the order bits to indicate page fault. ReadMapSetRef is used by processor caches when they get a map miss while servicing a processor read, so its speed is important for good system performance.

Invocation: ReadMapSetRef(MAdrs). MAdrs[0..6]= 0; MAdrs[7..28]= vp; and MAdrs[29..31]= xxx; pn is derived from the number of the current M-Bus master.

Results: returned via Done(MAdrs, MData). MAdrs[0..21]= rp; MAdrs[22..24]= xxx; MAdrs[25..27]= order; MAdrs[28..31]= flags; MData[0..21]= rp; MData[22..24]= xxx; MData[25..27]= undefined; MData[28..31]= flags. If vp lies in the map bypass area the values for flags are: shared= FALSE, wtEnable= TRUE, dirty= TRUE, and spare= FALSE.

Implementation Notes: Note that MAdrs[29..31] for ReadMapSetRef is not interpreted but is turned around as MAdrs[22..24] and MData[22..24] for Done.
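The field layouts for ReadMapSetRef and its Done reply can be sketched as bit packing. The sketch assumes that bit 0 is the most significant bit of the 32-bit word, which is inferred from the statement that MAdrs[0..6]= 0 restricts the address to [0..2^25); the helper names are illustrative.

```python
def put(value, first, last):
    """Place value in bits first..last (inclusive), bit 0 being the MSB."""
    width = last - first + 1
    assert 0 <= value < (1 << width), "value does not fit in the field"
    return value << (31 - last)


def field(word, first, last):
    """Extract bits first..last (inclusive), bit 0 being the MSB."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def read_map_set_ref_address(vp, xxx=0):
    """MAdrs for the request: [0..6]= 0, [7..28]= vp, [29..31]= xxx."""
    return put(0, 0, 6) | put(vp, 7, 28) | put(xxx, 29, 31)


def decode_done(madrs):
    """Split the MAdrs word of the Done reply into its fields."""
    return {
        "rp": field(madrs, 0, 21),
        "xxx": field(madrs, 22, 24),
        "order": field(madrs, 25, 27),
        "flags": field(madrs, 28, 31),
    }
```

Under this bit numbering the largest 22-bit vp produces an address just below 2^25, in agreement with the stated range for the dedicated transactions.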

ReadMapSetRefSetDirty(pn: ProcessorNumber, vp: VirtualPage) Returns(rp: RealPage, order: Order, flags: Flags)

ReadMapSetRefSetDirty checks whether writes are permitted for virtual page vp in the address space currently loaded on processor pn and returns the (rp, flags) if they are. This function is invoked by a cache when its processor tries to write into an entry whose wtEnable bit is not set. If both the dirty and the wtEnable bits for this entry are set in the Map Cache, the invoking cache is permitted to do the write, otherwise a write protect fault is signaled. If the entry is not present in the first place, a map fault is signalled.

Invocation: ReadMapSetRefSetDirty(MAdrs). MAdrs[0..6]=0; MAdrs[7..28]=vp; and MAdrs[29..31]= xxx; pn is derived from the number of the current M-Bus master.

Implementation Notes: Note that MAdrs[29..31] for ReadMapSetRefSetDirty is not interpreted but is turned around as MAdrs[22..24] and MData[22..24] for Done.
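The decision made by ReadMapSetRefSetDirty can be summarized in a few lines. In this sketch the Map Cache is modeled as a dict keyed by (aid, vp), flags are a 4-bit value with shared, wtEnable, dirty and spare in high to low order as described in Section 3, and the outcome strings are placeholders for the actual order codes, which this document does not list.

```python
SHARED, WT_ENABLE, DIRTY, SPARE = 0b1000, 0b0100, 0b0010, 0b0001


def read_map_set_ref_set_dirty(map_cache, aid, vp):
    """Return (outcome, entry) for a write to page vp in space aid."""
    entry = map_cache.get((aid, vp))
    if entry is None:
        return "map fault", None             # entry not in the Map Cache
    rp, flags = entry
    if flags & WT_ENABLE and flags & DIRTY:
        return "ok", (rp, flags)             # the write may proceed
    return "write protect fault", None       # real, or first write to page
```

Note that a write protect fault here does not yet mean the write is illegal: it also occurs on the first write to a clean page, and the distinction is left to the Cache Controller software.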

ReadEntry(pn: ProcessorNumber, vp: VirtualPage) Returns(rp: RealPage, order: Order, flags: Flags)

Let aid be the address space currently loaded on processor pn. ReadEntry returns the rp and flags corresponding to (aid, vp) in the Map Cache; if the entry is not in the cache a map fault is signaled. This operation may be used to verify the contents of the Map Cache.

Invocation: IOReadFlow(MAdrs). MAdrs[0..6]=0; MAdrs[7..28]=vp; and MAdrs[29..31]= xxx; pn is derived from the number of the current M-Bus master.

Results: returned via Done(MAdrs, MData): MAdrs[0..24]= undefined; MAdrs[25..27]= noop; MAdrs[28..31]= undefined; MData[0..21]= rp; MData[22..24]= xxx; MData[25..27]= noop if entry is in cache, otherwise page fault; MData[28..31]= flags. If vp lies in the map bypass area the values for flags are: shared= FALSE, wtEnable= TRUE, dirty= TRUE, and spare= FALSE.

Implementation Notes:

WriteEntry(pn: ProcessorNumber, vp: VirtualPage, rp: RealPage, flags: Flags)

Let aid be the address space currently loaded on processor pn. WriteEntry puts the entry (aid, vp) -> (rp, flags) into the Map Cache. This operation is used by the Cache Controller to return an entry to the Map Cache after a miss.

Invocation: IOWriteFlow(MAdrs, MData). MAdrs[0..6]= 0; MAdrs[7..28]= vp; MAdrs[29..31]= xxx; MData[0..21]= rp; MData[22..24]= xxx; MData[25..27]= don't care; MData[28..31]= flags.

Implementation Notes:

FlushEntry(aid: AddressSpaceId, vp: VirtualPage)

FlushEntry causes the entry corresponding to (aid, vp) to be removed from the Map Cache. If the entry was not there to begin with, the operation has no effect. Note that this operation does not disturb any (aid, vp) entries in processor caches; an explicit ChangeFlags must be done to flush these out.

Invocation: IOWriteFlow(MAdrs, MData). MAdrs[0..6]= 1; MAdrs[7..28]= vp; MAdrs[29..31]= don't care; MData[0..21]= don't care; and MData[22..31]= aid.

Implementation Notes:

FlushSpace(aid: AddressSpaceId)

FlushSpace causes all entries for space aid to be removed from the Map Cache. If there are no entries for this space the operation has no effect.

Invocation: IOWriteFlow(MAdrs, MData). MAdrs[0..6]=2; MAdrs[7..9]= 0; MAdrs[10..31]= don't care; MData[0..21]= don't care; and MData[22..31]= aid.

Implementation Notes:

FlushCache()

FlushCache causes all entries to be removed from the Map Cache.

Invocation: IOWriteFlow(MAdrs, MData). MAdrs[0..6]= 2; MAdrs[7..9]= 1; MAdrs[10..31]= don't care; and MData[0..31]= don't care.

Implementation Notes:

ChangeFlags(rp: RealPage, order: Order, flags: Flags)

ChangeFlags prods the Map Cache into executing a Done M-Bus command with the specified parameters.

Invocation: IOWriteFlow(MAdrs, MData). MAdrs[0..6]= 2; MAdrs[7..9]= 2; MAdrs[10..31]= don't care; MData[0..21]= rp; MData[22..24]= xxx; MData[25..27]= order; MData[28..31]= flags.

Implementation Notes: Note that ChangeFlags should not be used to set wtEnable in processor caches as long as we permit different address spaces, because wtEnable may have different values for the same page (this is highly desirable, e.g. for a debugger). Also note that it is not necessary to keep both the wtEnable and dirty bits in the Map Cache; a single bit, wEd, which tracks the logic value wtEnable AND dirty, is sufficient. However, both bits will be kept so they can be returned for IOReads from processors.

WriteSpaceRegister(pn: ProcessorNumber, aid: AddressSpaceId)

WriteSpaceRegister writes aid into the space register for processor pn. Note that this operation does not invalidate the map entries in pn's caches; this should be done in software via DeMap.

Invocation: IOWriteFlow(MAdrs, MData). MAdrs[0..6]= 2; MAdrs[7..9]= 3; MAdrs[10..31]= don't care; MData[0..21]= don't care; MData[22..31]= aid; pn is derived from the current M-Bus master.

Implementation Notes:

ReadSpaceRegister(pn: ProcessorNumber) Returns(aid: AddressSpaceId)

ReadSpaceRegister returns the contents of the space register for processor pn.

Invocation: IOReadFlow(MAdrs). MAdrs[0..6]= 2; MAdrs[7..9]= 3; and MAdrs[10..31]= don't care; pn is derived from the current M-Bus master.

Implementation Notes:

WriteBypassRegister(reg: [BypassMask, BypassPattern, BypassBase], value: (32-s)BitQuantity)

WriteBypassRegister writes value into the bypass register reg. Recall that the contents of the bypass registers define the location and size of the map bypass area both in virtual address space and in real memory.

On power up the BypassMask register is set to 0 which causes the Map Cache to implement the identity function.

Invocation: IOWriteFlow(MAdrs, MData). MAdrs[0..6]= 2; MAdrs[7..9]= 4; MAdrs[10..29]= don't care; MAdrs[30..31]= reg; MData[0..21]= value; MData[22..31]= don't care.

Implementation Notes:

ReadBypassRegister(reg: [BypassMask, BypassPattern, BypassBase]) Returns(value: (32-s)BitQuantity)

ReadBypassRegister returns the contents of bypass register reg.

Invocation: IOReadFlow(MAdrs). MAdrs[0..6]= 2; MAdrs[7..9]= 4; MAdrs[10..29]= don't care; MAdrs[30..31]= reg.

Implementation Notes:
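The IO address encodings used by the functions of this section can be collected into one sketch. The values of MAdrs[0..6] and MAdrs[7..9] are taken from the Invocation lines above; the assumption that bit 0 is the most significant bit, the function and table names, and the numbering of the three bypass registers (which the text does not give) are all illustrative.

```python
# MAdrs[0..6] selects a group: 0 = ReadEntry/WriteEntry, 1 = FlushEntry,
# 2 = control operations, which are further selected by MAdrs[7..9].
CONTROL_OP = {
    "FlushSpace": 0,
    "FlushCache": 1,
    "ChangeFlags": 2,
    "SpaceRegister": 3,    # WriteSpaceRegister / ReadSpaceRegister
    "BypassRegister": 4,   # WriteBypassRegister / ReadBypassRegister
}

# Hypothetical register numbers for MAdrs[30..31]; not given in the text.
BYPASS_REG = {"BypassMask": 0, "BypassPattern": 1, "BypassBase": 2}


def entry_address(vp):
    """ReadEntry/WriteEntry: MAdrs[0..6]= 0, MAdrs[7..28]= vp."""
    return (0 << 25) | (vp << 3)


def flush_entry_address(vp):
    """FlushEntry: MAdrs[0..6]= 1, MAdrs[7..28]= vp."""
    return (1 << 25) | (vp << 3)


def control_address(op, reg=0):
    """Control operations: MAdrs[0..6]= 2, MAdrs[7..9]= op, MAdrs[30..31]= reg."""
    return (2 << 25) | (CONTROL_OP[op] << 22) | reg
```

Because MAdrs[0..6] only takes the values 0, 1 and 2, every address produced this way falls below 3*2^25, matching the range stated at the start of this section for IOReadFlow and IOWriteFlow.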

4.2 Cache Controller Functions

The Map Cache Controller provides two interfaces: one to service requests from the Map Cache and the other to service requests from Dragon software to manipulate the Map Table. Functions in the two interfaces have different speed requirements, and are also invoked differently. The Map Cache interface functions must be relatively efficient since they are invoked frequently, while the Map Table functions are not as critical. The former are invoked via traps initiated by the Map Cache, while the latter are invoked via procedure calls.

The first two functions below belong to the Map Cache interface. The remainder belong to the Map Table interface.

HandleMapFault(aid: AddressSpaceId, vp: VirtualPage)

HandleMapFault is called when there is a miss in the Map Cache. It first checks if the entry for (aid, vp) is in the Map Table. If the entry is there it is sent to the Map Cache via TakeEntry, otherwise a page fault is signaled.

Invocation: via map fault.

Implementation Notes:

HandleWPFault(aid: AddressSpaceId, vp: VirtualPage)

HandleWPFault is called by the Map Cache when it is servicing a MapForWrite request and finds dirty AND wtEnable to be FALSE. It first checks if the write protect fault is real or simply a result of this being the first write to the entry's page. If it is real, a write protect fault is signaled, otherwise the dirty bit for the entry is set in the Map Table and the updated entry sent to the Map Cache via TakeEntry.

Invocation: via write protect fault.

Implementation Notes: Note that the entry is guaranteed to be in the Map Table since it was in the Map Cache so we won't ever have to signal page fault.
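The two fault handlers of the Map Cache interface can be sketched as follows. This is an illustration under several assumptions: the Map Table and Map Cache are modeled as dicts keyed by (aid, vp), a dict assignment stands in for sending the entry via TakeEntry, a fault is taken to be "real" exactly when wtEnable is clear, and the exception names are hypothetical.

```python
WT_ENABLE, DIRTY = 0b0100, 0b0010   # flag bits, as ordered in Section 3


class PageFault(Exception):
    pass


class WriteProtectFault(Exception):
    pass


def handle_map_fault(map_table, map_cache, aid, vp):
    """Service a Map Cache miss for (aid, vp)."""
    entry = map_table.get((aid, vp))
    if entry is None:
        raise PageFault((aid, vp))        # page is not in main memory
    map_cache[(aid, vp)] = entry          # stands in for TakeEntry


def handle_wp_fault(map_table, map_cache, aid, vp):
    """Service a write protect fault for (aid, vp)."""
    rp, flags = map_table[(aid, vp)]      # entry is guaranteed to exist
    if not flags & WT_ENABLE:
        raise WriteProtectFault((aid, vp))   # the fault is real
    flags |= DIRTY                        # first write to this page
    map_table[(aid, vp)] = (rp, flags)
    map_cache[(aid, vp)] = (rp, flags)    # stands in for TakeEntry
```

After handle_wp_fault returns normally, a retried write finds both dirty and wtEnable set in the Map Cache and is allowed to proceed.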

ReadEntry(aid: AddressSpaceId, vp: VirtualPage) Returns(rp: RealPage, flags: Flags)

This operation is like Map(pn, vp) implemented by the Map Cache except that it returns the flags flags and real page rp for an explicit address space id aid rather than for the space currently loaded. If there is no entry corresponding to (aid, vp) a map table fault is signaled.

Invocation: via procedure call.

Implementation Notes:

WriteEntry(aid: AddressSpaceId, vp: VirtualPage, flags: Flags, rp: RealPage)

WriteEntry writes the entry (aid, vp) -> (rp, flags) into the Map Table. If an entry already existed for (aid, vp) it is overwritten, otherwise a new entry is added.

Invocation: via procedure call.

Implementation Notes:

GetNextEntry() Returns(rp: RealPage, flags: Flags)

GetNextEntry returns the next entry from the map table according to some enumeration order. The enumeration is not perfect: some entries in the table may not appear and one entry may appear more than once. However, the enumeration is good enough to allow most of the entries in the table to be listed. This operation will be used by the page replacement algorithm, amongst others.

Invocation: via procedure call.

Implementation Notes:

DeleteEntry(aid: AddressSpaceId, vp: VirtualPage) Returns(deleted: Bool)

This operation deletes the entry corresponding to aid|vp from the Map Table. An attempt to map the address aid|vp following this operation will result in a page fault unless an intervening operation has placed a new entry for aid|vp into the table.

Invocation: via procedure call.

Implementation Notes:

5. The Map Cache

The Map Cache is an accelerator for speeding up mapping requests issued by processor caches. It has space for around 1200 of the entries stored in the Map Table, and is able to respond to requests in 300 ns in the case of a hit. It is implemented as a single VLSI chip whose main link with the outside world is via the M-Bus. The chip also connects to the D-Bus to permit initialization and debugging.

Two overall aspects of the design are worth pointing out here because they contribute to simple implementation and good performance. The first is that the cache is pure: an entry is never modified within the cache once it has been read in; if modifications need to be made, the entry is flushed from the cache, modified in the Map Table, and read in again. The most important consequence of this is that the cache can hold many more entries by making use of dynamic cells. These cells are smaller than static ones, but also much more prone to alpha particle hits. However, since the cache never contains irreplaceable data, we don't need to worry about correcting errors; we can get away with simply detecting them and flushing erroneous entries. Another consequence is that entries never need to be written through or written back, making the control portions of the chip simpler. The second aspect of the design is that the cache functions as a slave to other devices. Thus it never has to arbitrate for the M-Bus, and all of its interactions with the bus are simple. This minimizes the control logic needed for interfacing to the M-Bus.

5.1 M-Bus Interface

During normal operation the Map Cache communicates with the outside world exclusively over the M-Bus. The only M-Bus transactions recognized are IOReads and IOWrites directed to the appropriate portion of IO address space. As noted above, the Map Cache always operates as a slave to some other device on the M-Bus (usually a processor cache), so it never has to arbitrate for the bus. The only M-Bus signals it uses are shown in the table below.

Signal           Direction   Use
MCmdAB[0..4)     M <-> C     recognize IORead and IOWrite, execute IODone and ChangeFlags
MDataAB[0..32)   M <-> C     read function name and parameters, return results
Master[0..6)     M -> C      read M-Bus master
MGnt             M -> C      tell when mastership is granted

5.2 Structure

The Map Cache chip consists of the following parts: M-Bus interface logic, control registers, an aid array, a cache array, victim select logic, refresh logic, and D-Bus interface logic.

5.2.1 M-Bus Interface Logic

This interface logic contains: pad drivers and receivers; circuitry to do address matching; latches to hold MCmd, IOAddress, IOData, and IODone response; array bypass logic that implements identity map mode as well as map bypass area addressing, and a control unit that steers the various operations.

5.2.2 Control Registers

The control registers store global mapping information, which includes identity map mode and MapBypassBase.

5.2.3 Aid Registers

This is a set of 2^||pn|| registers of ||aid|| bits each that give the mapping between processors and the address spaces currently loaded on them. During normal operation, these registers are accessed using pn, and the aid read out is used along with vp to look up (rp, flags).

5.2.4 Cache Array

This array holds the mapping between (aid, vp) and (rp, flags). It consists of a number of lines, where each line is made up of corresponding rows from four separate arrays: an ||aid||-bit associative memory array AIDArray; a (||vp||-2)-bit associative memory array VPArray; a 1-bit memory array MPar which stores the parity for the two associative arrays; and a 4*(||rp||+||flags||+1)-bit memory array RPFArray that contains four (rp, flags) pairs for each line (the remaining bit in this array is rpfPar). The top (||vp||-2) bits of the incoming vp and the aid bits from AidReg[pn] are used to do an associative lookup to select a line, while the bottom 2 bits of vp are used to select between the four (rp, flags) pairs potentially stored within that line.

In addition, each line has a number of control bits: ref is set whenever its line is used in mapping; lValid indicates whether the (aid, vp) pair in this line is valid; lBroken is used to turn off bad lines completely; and rpf0Valid..rpf3Valid indicate whether each of the four (rp, flags) pairs in this line is valid. All parts of the Cache Array except for these control bits are implemented using dynamic cells.

The final section is the address decoder which permits individual lines in the cache to be selected for victimization as well as for reading and writing during testing.
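The line organization of the Cache Array can be modeled in a few lines. This is a behavioral sketch only: a Python dict stands in for the associative match on (aid, top bits of vp), a None slot stands in for a clear rpfNValid bit, and capacity limits, parity and the ref, lValid and lBroken bits are left out. The class and method names are illustrative.

```python
class CacheArray:
    """Each line is tagged by (aid, vp >> 2) and holds four (rp, flags) slots."""

    def __init__(self):
        self.lines = {}   # (aid, vp >> 2) -> list of four slots

    def write(self, aid, vp, rp, flags):
        """Store (rp, flags) in the slot chosen by the bottom 2 bits of vp."""
        line = self.lines.setdefault((aid, vp >> 2), [None] * 4)
        line[vp & 3] = (rp, flags)

    def read(self, aid, vp):
        """Return (rp, flags), or None on a miss."""
        line = self.lines.get((aid, vp >> 2))
        if line is None:
            return None        # no line matches (aid, top bits of vp)
        return line[vp & 3]    # None if that slot is not valid
```

One consequence of this organization is that four consecutive virtual pages of the same space share a single tag, so a line can match while the particular slot requested is still invalid.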

5.2.5 Victim Select Logic

When a miss occurs in the Map Cache, the line selected as victim is the one pointed to by the victim pointer vPtr. For each cycle during which a victim is not needed, the ref bit for the line pointed to by vPtr is cleared and vPtr is incremented if ref was set. Thus vPtr moves from one line to the next till it lands on a line that has ref clear. It follows that vPtr will tend to be on lines that have ref clear when there is a request to find a victim. The ref bit is set whenever its line is used in mapping. Thus the lines used most frequently in mapping will be the least likely to be picked as victims, and so this procedure approximates least frequently used victim selection.
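The behavior of the victim pointer can be sketched as a small simulation. The class and method names are illustrative; only the rules for ref and vPtr come from the description above.

```python
class VictimSelector:
    """Models the ref bits and the victim pointer vPtr."""

    def __init__(self, n_lines):
        self.ref = [False] * n_lines
        self.vptr = 0

    def touch(self, line):
        """A line was used in mapping: set its ref bit."""
        self.ref[line] = True

    def idle_cycle(self):
        """A cycle in which no victim is needed."""
        if self.ref[self.vptr]:
            # Clear ref and move on; the pointer stays put on a clear line.
            self.ref[self.vptr] = False
            self.vptr = (self.vptr + 1) % len(self.ref)

    def victim(self):
        """The line to replace on a miss is the one vPtr points to."""
        return self.vptr
```

For instance, with four lines of which lines 0 and 1 were recently used, a few idle cycles clear both ref bits and leave the pointer resting on line 2, which then becomes the victim on the next miss.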

At the moment there is no logic to handle broken lines. A clever way to implement this in the future would be as follows. When lBroken is set, its line does not participate in matches and does not permit its ref bit to be cleared (when the victim pointer passes over it). The broken line will appear as though it is in use, so on the average it will not be picked as the victim. However, when it does get picked the effects will not be harmful. The entry will get written into the bogus line but this line will never match so it will appear as though the write entry was never done. The next reference to this entry will miss, and the entry will get written to the cache one more time. However by this time the victim pointer will in all likelihood have moved on to a legitimate line so the entry will be written correctly!! Isn't caching wonderful?

5.2.6 Refresh Logic

The refresh logic is responsible for maintaining information in dynamic cells. It is activated periodically whenever the Map Cache is not busy processing normal requests. For each activation it reads all the dynamic cells belonging to the line that has rSel set and then writes them back. It then clears rSel for this line and sets it for the next line. At any time at most one line is permitted to have rSel set.

5.2.7 D-Bus Interface Logic

-- to be written

5.3 Operation

-- to be written

5.4 D-Bus Interface

-- to be written

5.5 Miscellaneous Information

Maintenance of dirty bit: when a processor writes into a page the first time after it has been brought into main memory, its dirty bit must be set. Since the Map Cache has a copy of this bit, this copy must get set also. This is done by flushing the entry from the Map Cache and signalling a write protect fault in response to a MapForWrite from a processor cache. This operation is issued by a cache when it tries to do a write, but finds wtEnable clear. It is then up to the Map Cache Controller software to figure out whether the write protect fault is real or not and, if it is not, to update the dirty bit in the Map Table and send the updated entry to the Map Cache.

6. The Map Cache Controller

As stated earlier, the Map Cache Controller is implemented by software running on Dragon processors. This software provides two interfaces, one to service requests from the Map Cache and the other to perform more general manipulations on the Map Table.

Dragon processors access the Map Table through the special map bypass area provided in low virtual memory. This area is also meant to contain any other system code (such as some of the fault handlers) that cannot tolerate map faults during its execution.

7. The Map Table

The Map Table is stored in main memory, and maps pairs (aid, vp) to pairs (rp, flags). It is implemented as a hash table, where the hash index is derived by folding and XORing the 32 bits of (aid, vp). Each row of the table is 4 words long and is quadword aligned. A row contains two mapping entries and two pointers used in collision resolution.
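One way to derive a hash index by folding and XORing is sketched below. The text does not specify how the 32 bits are folded, so the chunking used here (successive index_bits-wide pieces of aid|vp XORed together) and the function name are assumptions made for illustration.

```python
VP_BITS = 22   # ||vp||, so that aid|vp is a 32-bit key


def hash_index(aid, vp, index_bits):
    """Fold the 32-bit key aid|vp into index_bits bits by XOR."""
    key = (aid << VP_BITS) | vp          # aid|vp
    mask = (1 << index_bits) - 1
    index = 0
    while key:
        index ^= key & mask              # XOR in the next chunk
        key >>= index_bits
    return index
```

Since each row is 4 words long and quadword aligned, the word address of a row relative to the start of the table would be 4 * hash_index(aid, vp, index_bits) under this scheme.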

Initially, the method for resolving collisions will be linear chaining. If this causes performance problems the method will be changed to tree chaining. This change will be straightforward since the access algorithms are in Dragon code.

Since there may be more than one mapping entry per physical page, it is possible for the Map Table to overflow. The number of entries in the table will be 1.5 times the number of real pages in main memory, making overflow extremely unlikely. However, overflow is still possible, and it is a problem because a miss in the Map Table does not necessarily mean page fault—it could also mean that there was not enough space in the table. To fix this problem, the Map Table will be maintained using the all-or-nothing rule, which states that either all of the vp's for a given rp are in the table or none are. With this rule, table miss once again means page fault, freeing the software from having to discriminate between page faults and table misses.

8. Miscellaneous Issues

-- Variable page size

-- Multiple Map Caches

-- Multiple M-Bus systems

-- In restricted sharing all vp's must have identical protection info

Appendix A. Size of Bypass Area

The map bypass area must be large enough to fit the Map Table as well as code for handling traps. Since the Map Table will be larger by far, we need only consider its size to calculate how large to make the bypass area. Map Table size in turn depends directly on the amount of real memory. Assuming that 4 Mbit chips will be available within the lifetime of Dragon, that we can surface mount 1024 memory chips per board, and that we will put at most 8 memory boards on the largest Dragons, the maximum amount of real memory comes out to 1 GWord.

The size of the Map Table is 1.5*(#pages in real memory)*4 words. Rounding the 1.5 up to 2, the Map Table occupies 2^23 words. Note that in rounding up 1.5 to 2 we've left plenty of space for code and any other data that needs to be kept in the bypass area.
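The arithmetic behind these figures can be checked with a short calculation. The variable names are illustrative; the chip size, chip count, board count, page size and the factor of 2 are the ones given in this appendix and in Section 2, and a 4 Mbit chip is taken to hold 2^22 bits.

```python
CHIP_BITS = 4 * 2 ** 20     # one 4 Mbit memory chip
CHIPS_PER_BOARD = 1024
BOARDS = 8
WORD_BITS = 32
PAGE_WORDS = 2 ** 10        # s = 10

# Maximum real memory, in 32-bit words.
real_words = CHIP_BITS * CHIPS_PER_BOARD * BOARDS // WORD_BITS

# Number of real pages, and Map Table size with 1.5 rounded up to 2.
real_pages = real_words // PAGE_WORDS
table_words = 2 * real_pages * 4

# Fraction of real memory taken by the table.
fraction = table_words / real_words
```

This gives real_words = 2^30 (1 GWord), real_pages = 2^20 and table_words = 2^23, so the table takes 1/128 of real memory, which is consistent with the "< 1%" figure quoted in the introduction.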

Appendix B. Layout of IO Address Space

Appendix C. Format of Map Entries

Appendix D. Notation

||x|| ⌈log2 x⌉, or the number of bits needed to represent x

a|b a*2^||b|| + b

vp virtual page

rp real page

s page size

aid address space id

pn processor number

Figure 1. Address mapping (AddressArch.press)

Figure 2. Map Processor organization (MapProcOrg.press)