The Memory System of a High-Performance Personal Computer

by Douglas W. Clark¹, Butler W. Lampson, and Kenneth A. Pier

January 1981

ABSTRACT

The memory system of the Dorado, a compact high-performance personal computer, has very high I/O bandwidth, a large paged virtual memory, a cache, and heavily pipelined control; this paper discusses all of these in detail. Relatively low-speed I/O devices transfer single words to or from the cache; fast devices, such as a color video display, transfer directly to or from main storage while the processor uses the cache. Virtual addresses are used in the cache and for all I/O transfers. The memory is controlled by a seven-stage pipeline, which can deliver a peak main-storage bandwidth of 530 million bits per second to service fast I/O devices and cache misses. Interesting problems of synchronization and scheduling in this pipeline are discussed. The paper concludes with some performance measurements that show, among other things, that the cache hit rate is over 99 percent.

A revised version of this paper will appear in IEEE Transactions on Computers.

CR CATEGORIES

6.34, 6.21

KEY WORDS AND PHRASES

bandwidth, cache, latency, memory, pipeline, scheduling, storage, synchronization, virtual memory

1. Present address: Digital Equipment Corporation, Tewksbury, Mass. 01876.

© Copyright 1981 by Xerox Corporation.

XEROX
PALO ALTO RESEARCH CENTER
3333 Coyote Hill Road / Palo Alto / California 94304

1. Introduction

This paper describes the memory system of the Dorado, a high-performance compact personal computer. This section explains the design goals for the Dorado, sketches its overall architecture, and describes the organization of the memory system. Later sections discuss in detail the cache (§2), the main storage (§3), interactions between the two (§4), and synchronization of the various parallel activities in the system (§5). The paper concludes with a description of the physical implementation (§6), and some performance measurements (§7).

1.1 Goals

A high-performance successor to the Alto computer [18], the Dorado is intended to provide the hardware base for the next generation of computer system research at the Xerox Palo Alto Research Center. The Dorado is a powerful but personal computing system supporting a single user within a programming system that extends from the microinstruction level to an integrated programming environment for a high-level language. It is physically small and quiet enough to occupy space near its users in an office or laboratory setting, and inexpensive enough to be acquired in considerable numbers. These constraints on size, noise, and cost have had a major effect on the design.

The Dorado is designed to rapidly execute programs compiled into a stream of byte codes [16]; the microcode that does this is called an emulator. Byte code compilers and emulators exist for Mesa [6, 13], Interlisp [4, 17], and Smalltalk [7]. An instruction fetch unit (IFU) in the Dorado fetches bytes from such a stream, decodes them as instructions and operands, and provides the necessary control and data information to the emulator microcode in the processor; it is described in another paper [9].
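To make the emulator's job concrete, the following is a minimal sketch of a byte-code interpreter loop in Python; the opcodes and their semantics are invented for illustration and bear no relation to the actual Mesa, Interlisp, or Smalltalk instruction sets. The IFU performs in hardware what the first two statements of the loop body do here.

    # Minimal sketch of a byte-code emulator loop (hypothetical opcodes,
    # not any real Dorado instruction set).
    def emulate(code: bytes, memory: list):
        pc, stack = 0, []
        while pc < len(code):
            op = code[pc]; pc += 1              # fetch and decode: the IFU's job
            if op == 0x01:                      # PUSH_LITERAL <byte operand>
                stack.append(code[pc]); pc += 1
            elif op == 0x02:                    # LOAD <address operand>
                stack.append(memory[code[pc]]); pc += 1
            elif op == 0x03:                    # ADD
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            else:
                raise ValueError(f"unknown byte code {op:#x}")
        return stack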
Further support for fast execution comes from a very fast microcycle, and a microinstruction set powerful enough to allow interpretation of a simple byte code in a single microcycle; these are described in a paper on the Dorado processor [10]. There is also a cache [2, 11] which has a latency of two cycles, and which can deliver a 16-bit word every cycle.

Another major goal is to support high-bandwidth input/output. In particular, color monitors, raster-scanned printers, and high-speed communications are all part of the computer research activities; these devices typically have bandwidths of 20 to 400 million bits per second. Fast devices must not excessively degrade program execution, even though the two functions compete for many of the same resources. Relatively slow devices, such as a keyboard or an Ethernet interface [12], must also be supported cheaply, without tying up the high-bandwidth I/O system. These considerations clearly suggest that I/O activity and program execution should proceed in parallel as much as possible. The memory system therefore allows parallel execution of cache accesses and main storage references. Its pipeline is fully segmented: a cache reference can start in every microinstruction cycle, and a main storage reference can start in every main storage cycle.

1.2 Gross structure of the Dorado

Figure 1 is a simplified block diagram of the Dorado. Aside from I/O, the machine consists of the processor, the IFU, and the memory system, which in turn contains a cache, a hardware virtual-to-real address map, and main storage. Both the processor and the IFU can make memory references and transfer data to and from the memory through the cache. Slow, or low-bandwidth, I/O devices communicate with the processor, which in turn transfers their data to and from the cache. Fast, or high-bandwidth, devices communicate directly with storage, bypassing the cache most of the time.

For the most part, data is handled sixteen bits at a time. The relatively narrow busses, registers, data paths, and memories which result from this choice help to keep the machine compact. This is especially important for the memory, which has a large number of busses. Packaging, however, is not the only consideration. Speed dictates a heavily pipelined structure in any case, and this parallelism in the time domain tends to compensate for the lack of parallelism in the space domain. Keeping the machine physically small also improves the speed, since physical distance (i.e., wire length) accounts for a considerable fraction of the basic cycle time. Finally, performance is often limited by the cache hit rate, which cannot be improved, and may be reduced, by wider data paths (if the number of bits in the cache is fixed).

[Figure 1: A simplified block diagram of the Dorado.]

Memory references specify a 16 or 28 bit displacement, and one of 32 base registers of 28 bits; the virtual address is the sum of the displacement and the base. Virtual address translation, or mapping, is implemented by table lookup in a dedicated memory.
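As a concrete illustration of this address arithmetic, here is a sketch in Python; the register contents, the 28-bit masking, and the names are our assumptions, not the hardware's.

    # Virtual address formation: a 28-bit base register plus a 16- or 28-bit
    # displacement from the microinstruction (values here are invented).
    BASE_REGISTERS = [0] * 32            # 32 base registers of 28 bits each
    BASE_REGISTERS[5] = 0x0123450        # hypothetical contents of base register 5

    def virtual_address(base_index: int, displacement: int) -> int:
        assert 0 <= base_index < 32
        return (BASE_REGISTERS[base_index] + displacement) & (2**28 - 1)

    print(hex(virtual_address(5, 0x1000)))   # -> 0x124450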
Main storage is the permanent home of data stored by the memory system. The storage is necessarily slow (i.e., it has long latency, which means that it takes a long time to respond to a request), because of its implementation in cheap but slow dynamic MOS RAMs (random access memories). To make up for being slow, storage is big, and it also has high bandwidth, which is more important than latency for sequential references. In addition, there is a cache which services non-sequential references with high speed (low latency), but is inferior to main storage in its other parameters. The relative values of these parameters are shown in Table 1.

                  Cache    Storage
    Latency         1        15
    Bandwidth       1         2
    Capacity        1       250

    Table 1: Parameters of the cache relative to storage

With one exception (the IFU), all memory references are initiated by the processor, which thus acts as a multiplexor controlling access to the memory (see §1.2 and [10]), and is the sole source of addresses. Once started, however, a reference proceeds independently of the processor. Each one carries with it the number of its originating task, which serves to identify the source or sink of any data transfer associated with the reference. The actual transfer may take place much later, and each source or sink must be continually ready to deliver or accept data on demand. It is possible for a task to have several references outstanding, but order is preserved within each type of reference, so that the task number plus some careful hardware bookkeeping is sufficient to match up data with references.

Table 2 lists the types of memory references executable by microcode. Figure 2, a picture of the memory system's main data paths, should clarify the sources and destinations of data transferred by these references (parts of Figure 2 will be explained in more detail later). All references, including fast I/O references, specify virtual, not real addresses. Although a microinstruction actually specifies a displacement and a base register which together form the virtual address, for convenience we will suppress this fact and write, for example, Fetch(a) to mean a fetch from virtual address a.

A Fetch from the cache delivers data to a register called FetchReg, from which it can be retrieved at any later time; since FetchReg is task-specific, separate tasks can make their cache references independently. An I/ORead reference delivers a 16-word block of data from storage to the FastOutBus (by way of the error corrector, as shown in Figure 2), tagged with the identity of the requesting task; the associated output device is expected to monitor this bus and grab the data when it appears. Similarly, the processor can Store one word of data into the cache, or do an I/OWrite reference which demands a block of data from an input device and sends it to storage (by way of the check-bit generator). There is also a Prefetch reference, which brings a block into the cache. Fetch, Store and Prefetch are called cache references. There are special references to flush data from the cache and to allow map entries to be read and written; these will be discussed later.

The instruction fetch unit is the only device that can make a reference independently of the processor. It uses a single base register, and is treated almost exactly like a processor cache fetch, except that the IFU has its own set of registers for receiving memory data (see [9] for details). In general we ignore IFU references from now on, since they add little complexity to the memory system.
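The task-tagging discipline might be modeled as follows; this is a behavioral sketch in Python, with all class and field names invented, showing how a task number travels with each reference and selects a task-specific FetchReg when data is finally delivered.

    # Sketch of task-tagged references. Each reference carries its originating
    # task number; each task has its own FetchReg, so completions can be
    # matched up with requests long after they were issued.
    from collections import deque
    from dataclasses import dataclass

    @dataclass
    class Reference:
        kind: str        # 'Fetch', 'Store', 'IORead', 'IOWrite', 'Prefetch'
        task: int        # originating task number
        vaddr: int       # virtual address

    class MemorySystem:
        def __init__(self, num_tasks=16):
            self.fetch_reg = [None] * num_tasks    # task-specific FetchReg
            self.pending = deque()                 # order preserved per type

        def start(self, ref: Reference):
            self.pending.append(ref)               # reference proceeds on its own

        def complete_one(self, data: int):
            ref = self.pending.popleft()
            if ref.kind == 'Fetch':
                self.fetch_reg[ref.task] = data    # deliver to that task's FetchReg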
[Figure 2: The main data paths of the memory system.]

2. The cache

[...]

Unfortunately this sleight-of-hand does not always work. The sequence

    (1) Fetch(address)
    (2) computation involving FetchReg

actually needs the data during cycle (2), which will therefore have to wait for one cycle (see §5.1). Data retrieved in cycle (1) would be the old value of FetchReg; this allows a sequence of fetches

    (1) Fetch(address1)
    (2) register1 ← FetchReg, Fetch(address2)
    (3) register2 ← FetchReg, Fetch(address3)
    (4) register3 ← FetchReg, Fetch(address4)
    . . .

to proceed at full speed.

3. The storage pipeline

Cache misses and fast I/O references use the storage portion of the pipeline, shown in Figure 3. In this section we first describe the operation of the individual pipeline stages, then explain how fast I/O references use them, and finally discuss how memory faults are handled. Using I/O references to expose the workings of the pipeline allows us to postpone until §4 a close examination of the more complicated references involving both cache and storage.

3.1 Pipeline stages

Each of the pipeline stages is implemented by a simple finite-state automaton that can change state on every microinstruction cycle. Resources used by a stage are controlled by signals that its automaton produces. Each stage owns some resources, and some stages share resources with others. Control is passed from one stage to the next when the first produces a start signal for the second; this signal forces the second automaton into its initial state. Necessary information about the reference type is also passed along when one stage starts another.

3.1.1 The ADDRESS stage

As we saw in §2, the ADDRESS stage computes a reference's virtual address and looks it up in CacheA. If it hits, and is not I/ORead or I/OWrite, control is passed to HITDATA. Otherwise, control is passed to MAP, starting a storage cycle. In the simplest case a reference spends just one microinstruction cycle in ADDRESS, but it can be delayed for various reasons discussed in §5.

3.1.2 The MAP stage

The MAP stage translates a virtual address into a real address by looking it up in a hardware table called the MapRAM, and then starts the STORAGE stage. Figure 5 illustrates the straightforward conversion of a virtual page number into a real page number. The low-order bits are not mapped; they point to a single word on the page.

Three flag bits are stored in MapRAM for each virtual page:

    ref, set automatically by any reference to the page;
    dirty, set automatically by any write into the page;
    writeProtect, set by memory-management software (using the MapWrite reference).

A virtual page not in use is marked as vacant by setting both writeProtect and dirty, an otherwise nonsensical combination. A reference is aborted by the hardware if it touches a vacant page, attempts to write a write-protected page, or causes a parity error in the MapRAM. All three kinds of map fault are passed down the pipeline to READTR2 for reporting; see §3.1.5.
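A minimal sketch of the MapRAM flag encoding and fault checks just described; the field names follow the text, while the data structure and method names are ours.

    # Sketch of a MapRAM entry and the map-fault checks.
    from dataclasses import dataclass

    @dataclass
    class MapEntry:
        real_page: int
        ref: bool = False            # set by any reference to the page
        dirty: bool = False          # set by any write into the page
        write_protect: bool = False  # set by memory-management software

        def vacant(self) -> bool:
            # writeProtect together with dirty is otherwise nonsensical,
            # so that combination marks a virtual page as not in use
            return self.write_protect and self.dirty

        def check(self, is_write: bool) -> str:
            if self.vacant():
                return 'page fault'           # reference aborted
            if is_write and self.write_protect:
                return 'write-protect fault'  # reference aborted
            return 'ok'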
[Figure 5: Conversion of a virtual page number into a real page number.]

[...]

[...] and completes the StorageRAM cycle (§3.1.3). READTR1 and READTR2 transport the data, control the error corrector, and deliver the data to FastOutBus (§3.1.5). Fault reporting, if necessary, is done by READTR2 as soon as the condition of the last quadword in the block is known (§3.3).

It is clear from Figure 7 that an I/ORead can be started every eight machine cycles, since this is the longest period of activity of any stage. This would result in 530 million bits per second of bandwidth, the maximum supportable by the memory system. The inner loop of a fast I/O task can be written in two microinstructions, so if a new I/ORead is launched every eight cycles, one-fourth of the processor capacity will be used. Because ADDRESS is used for only one cycle per I/ORead, other tasks (notably the emulator) may continue to hit in the cache when the I/O task is not running.

I/OWrite(x) writes into virtual location x a block of data delivered by a fast input device, together with appropriate Hamming code check bits. The data always goes to storage, never to the cache, but if address x happens to hit in the cache, the entry is invalidated by setting a flag (§4). Figure 8 shows that an I/OWrite proceeds through the pipeline very much like an I/ORead. The difference, of course, is that the WRITETR stage runs, and the READTR1 and READTR2 stages, although they run, do not transport data. Note that the write transport, from FastInBus to WriteBus, proceeds in parallel with mapping. Once the block has been loaded into WriteReg, STORAGE issues a write signal to StorageRAM. All that remains is to run READTR1 and READTR2, as explained above. If a map fault occurs during address translation, the write signal is blocked and the fault is passed along to be reported by READTR2.

[Figure 8: An I/OWrite in the pipeline.]

[...]

[...] other tasks. If more faults have occurred in the meantime, FaultCount will have been incremented again and the fault task will be reawakened.

The fault task does different things in response to the different types of fault. Single-bit errors, which are corrected, are not reported at all unless a special control bit in the hardware is set. With this bit set, the fault task can collect statistics on failing storage chips; if too many failures are occurring, the bit can be cleared and the machine can continue to run. Double-bit errors may be dealt with by re-trying the reference; a recurrence of the error must be reported to the operating system, which may stop using the failing memory, and may be able to reread the data from the disk if the page is not dirty, or determine which computation must be aborted. Page faults are the most likely reason to awaken the fault task, and together with write-protect faults are dealt with by yielding to memory-management software. MapRAM parity errors may disappear if the reference is re-tried; if they do not, the operating system can probably recover the necessary information.
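The fault task's policy can be summarized as a dispatch on fault type; the following sketch restates the decision logic above and is not Dorado microcode (the function and its arguments are invented).

    # Sketch of the fault task's responses to each fault type.
    def handle_fault(kind: str, retried: bool, report_single_bit: bool) -> str:
        if kind == 'single-bit error':
            # corrected by hardware; reported only when statistics are wanted
            return 'record chip statistics' if report_single_bit else 'ignore'
        if kind == 'double-bit error':
            return 'retry reference' if not retried else 'report to OS'
        if kind in ('page fault', 'write-protect fault'):
            return 'yield to memory-management software'
        if kind == 'MapRAM parity error':
            return 'retry reference' if not retried else 'OS recovers mapping'
        raise ValueError(kind)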
Microinstructions that read the various parts of History are provided, but only the emulator and the fault task may use them. These instructions use an alternate addressing path to History which does not interfere with the SRN addressing used by references in the pipeline. Reading base registers, the MapRAM, and CacheA can be done only by using these microinstructions.

This brings us to a serious difficulty with treating History as a pure ring buffer. To read a MapRAM entry, for example, the emulator must first issue a reference to that entry (normally a MapRead), and then read the appropriate part of History when the reference completes; similarly, a DummyRef (see Table 3) is used to read a base register. But because other tasks may run and issue their own references between the start of the emulator's reference and its reading of History, the emulator cannot be sure that its History entry will remain valid. Sixteen references by I/O tasks, for example, will destroy it.

To solve this problem, we designate History[0] as the emulator's "private" entry: MapRead, MapWrite, and DummyRef references use it, and it is excluded from the ring buffer. Because the fault task may want to make references of its own without disturbing History, another private entry is reserved for it. The ring buffer proper, then, is a 14-element memory used by all references except MapRead, MapWrite, and DummyRef in the emulator and fault task. For historical reasons, Fetch, Store and Flush references in the emulator and fault task also use the private entries; the tag mechanism (§4.1) ensures that the entries will not be reused too soon.
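The History arrangement might be modeled like this; the 16 entries, the emulator's History[0], the second private entry, and the 14-element ring come from the text, while the slot number for the fault task's private entry and all names are our assumptions.

    # Sketch of History: a 16-entry memory with two private slots and a
    # 14-element ring buffer for all other references.
    EMULATOR_PRIVATE, FAULT_PRIVATE = 0, 1   # History[0] per the text; slot 1 assumed
    RING = list(range(2, 16))                # the 14-element ring buffer proper

    class History:
        def __init__(self):
            self.entries = [None] * 16       # entry contents omitted in this sketch
            self.next = 0                    # position in the ring

        def slot_for(self, task: str, kind: str) -> int:
            private_kinds = {'MapRead', 'MapWrite', 'DummyRef'}
            if task == 'emulator' and kind in private_kinds:
                return EMULATOR_PRIVATE
            if task == 'fault' and kind in private_kinds:
                return FAULT_PRIVATE
            slot = RING[self.next]           # ordinary references use the ring,
            self.next = (self.next + 1) % 14 # which wraps after 14 references
            return slot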
It causes a store into the block to miss and set vacant; theresulting storage reference reports a write-protect fault ( 3.3). The beingLoaded flag is set for about 15 cycles while the entry is in the course of being loaded from storage; whenever the ADDRESSstage attempts to examine an entry, it waits until the entry is not beingLoaded, to ensure that theentry and its contents are not used while in this ambiguous state.When a cache reference misses, the block being referenced must be brought into the cache. Inorder to make room for it, some other block in the row must be displaced; this unfortunate iscalled the victim. CacheA implements an approximate least-recently-used rule for selecting thevictim. With each row, the current candidate for victim and the next candidate, called next victim,are kept. The victim and next victim are the top two elements of an LRU stack for that row;keeping only these two is what makes the replacement rule only approximately LRU. On a miss, thenext victim is promoted to be the new victim and a pseudo-random choice between the remainingtwo columns is promoted to be the new next victim. On each hit, the victim and next victim areupdated in the obvious way, depending on whether they themselves were hit.The flow of data in cache-storage interactions is shown in Figure 2. For example, a Fetch thatmisses will read an entire block from storage via the ReadBus, load the error-corrected block intoCacheD, and then make a one-word reference as if it had hit.What follows is a discussion of the four kinds of cache-storage interaction listed above.4.1 Clean missWhen the processor or IFU references a word w that is not in the cache, and the location chosen asvictim is vacant or holds data that is unchanged since it was read from storage (i.e., its dirty flag isnot set), a clean miss has occurred. The victim need not be written back, but a storage read mustbe done to load into the cache the block containing w. At the end of the read, w can be fetchedfrom the cache. A clean miss is much like an I/ORead, which was discussed in the previous section.The chief difference is that the block from storage is sent not over the FastOutBus to an outputdevice, but to the CacheD memory. Figure 9 illustrates a clean miss.All cache loads require a special cycle, controlled by READTR1, in which they get the correct cacheaddress from History and write the cache flags for the entry being loaded; the data paths ofCacheA are used to read this address and write the flags. This RThasA cycle takes priority over allother uses of CacheA and History, and can occur at any time with respect to ADDRESS, which alsoneeds access to these resources. Thus all control signals sent from ADDRESS are inhibited byRThasA, and ADDRESS is forced to idle during this cycle. Figure 9 shows that the RThasA cycleoccurs just before the first word of the new block is written into CacheD. (For simplicity andclarity we will not show RThasA cycles in the figures that follow.) During RThasA, the beingLoadedflag is cleared (it was set when the reference was in ADDRESS) and the writeProtected flag is copiedfrom the writeProtected bit in MapRAM. As soon as the transport into CacheD is finished, the wordreference that started the miss can be made, much as though it had hit in the first place. 
The flow of data in cache-storage interactions is shown in Figure 2. For example, a Fetch that misses will read an entire block from storage via the ReadBus, load the error-corrected block into CacheD, and then make a one-word reference as if it had hit.

What follows is a discussion of the four kinds of cache-storage interaction listed above.

4.1 Clean miss

When the processor or IFU references a word w that is not in the cache, and the location chosen as victim is vacant or holds data that is unchanged since it was read from storage (i.e., its dirty flag is not set), a clean miss has occurred. The victim need not be written back, but a storage read must be done to load into the cache the block containing w. At the end of the read, w can be fetched from the cache. A clean miss is much like an I/ORead, which was discussed in the previous section. The chief difference is that the block from storage is sent not over the FastOutBus to an output device, but to the CacheD memory. Figure 9 illustrates a clean miss.

All cache loads require a special cycle, controlled by READTR1, in which they get the correct cache address from History and write the cache flags for the entry being loaded; the data paths of CacheA are used to read this address and write the flags. This RThasA cycle takes priority over all other uses of CacheA and History, and can occur at any time with respect to ADDRESS, which also needs access to these resources. Thus all control signals sent from ADDRESS are inhibited by RThasA, and ADDRESS is forced to idle during this cycle. Figure 9 shows that the RThasA cycle occurs just before the first word of the new block is written into CacheD. (For simplicity and clarity we will not show RThasA cycles in the figures that follow.) During RThasA, the beingLoaded flag is cleared (it was set when the reference was in ADDRESS) and the writeProtected flag is copied from the writeProtected bit in MapRAM. As soon as the transport into CacheD is finished, the word reference that started the miss can be made, much as though it had hit in the first place. If the reference was a Fetch, the appropriate word is sent to FetchReg in the processor (and loaded into FetchRegRAM); if a Store, the contents of StoreReg are stored into the new block in the cache.

If the processor tries to use data it has fetched, it is prevented from proceeding, or held, until the word reference has occurred (see §5.1). Each fetch is assigned a sequence number called its tag, which is logically part of the reference; actually it is written into History, and read when needed by READTR1. Tags increase monotonically. The tag of the last Fetch started by each task is kept in StartedTag (it is written there when the reference is made), and the tag of the last Fetch completed by the memory is kept in DoneTag (it is written there as the Fetch is completed); these are task-specific registers. Since tags are assigned monotonically, and fetches always complete in order within a task, both registers increase monotonically. If StartedTag=DoneTag, all the fetches that the task has started have completed, and the processor need not be held.
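In code, the tag bookkeeping for one task might look like this (a sketch; register widths and wraparound are ignored, and the names follow the text).

    # Sketch of StartedTag/DoneTag bookkeeping for one task.
    class FetchTags:
        def __init__(self):
            self.started_tag = 0   # tag of the last Fetch started (task-specific)
            self.done_tag = 0      # tag of the last Fetch completed (task-specific)

        def start_fetch(self) -> int:
            self.started_tag += 1  # tags increase monotonically
            return self.started_tag

        def complete_fetch(self, tag: int):
            self.done_tag = tag    # fetches complete in order within a task

        def must_hold(self) -> bool:
            # hold the processor if it uses fetched data before all of its
            # started fetches are done
            return self.started_tag != self.done_tag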
[Figure 9: A clean miss.]

[...]

A Flush explicitly removes the block containing the addressed location from the cache, rewriting it in storage if it is dirty. Flush is used to remove a virtual page's blocks from the cache so that its MapRAM entry can be changed safely. If a Flush misses, nothing happens. If it hits, the hit location must be marked vacant, and if it is dirty, the block must be written to storage. To simplify the hardware implementation, this write operation is made to look like a victim write. A dirty Flush is converted into a FlushFetch reference, which is treated almost exactly like a Prefetch. Thus, when a Flush in ADDRESS hits, three things happen:

    the victim for the selected row of CacheA is changed to point to the hit column;
    the vacant flag is set;
    if the dirty flag for that column is set, the Flush is converted into a FlushFetch.

Proceeding like a Prefetch, this does a useless read (which is harmless because the vacant flag has been set), and then a write of the dirty victim. Figure 11 shows a dirty Flush. The FlushFetch spends two cycles in ADDRESS, instead of the usual one, because of an uninteresting implementation problem.

[Figure 11: A dirty Flush.]

[...]

[...] too expensive to do this, and we observed that stores are rare compared to fetches in any case.

History busy. As discussed in §3.3, a reference uses various parts of the History memory at various times as it makes its way through the pipeline. Microinstructions for reading History are provided, and they must be held if they will conflict with any other use.

The memory system must generate Hold for precisely the above reasons. It turns out, however, that there are several situations in which hardware or time can be saved if Hold is generated when it is not strictly needed. This was done only in cases that we expect to occur rarely, so the performance penalty should be small. An extra Hold has no logical effect, since it only converts the current microinstruction into a jump-to-self. One example of this situation is that a reference in the cycle after a miss is always held, even though it must be held only if the miss's victim is dirty or the map is busy; the reason is that the miss itself is detected barely in time to generate Hold, and there is no time for additional logic. Another example: uses of FetchReg are held while ADDRESS is busy, although they need not be, since they do not use it.

5.2 Waiting in ADDRESS

A reference in ADDRESS normally proceeds either to HITDATA (in the case of a hit) or to MAP (for a miss, a victim write or an I/O reference) after one cycle. If HITDATA or MAP is busy, it will wait in ADDRESS, causing subsequent references to be held because ADDRESS is busy, as discussed above.

HITDATA uses CacheD, and therefore cannot be started when CacheD is busy. A reference that hits must therefore wait in ADDRESS while CacheD is busy, i.e., during transports to and from storage, and during single-word transfers resulting from previous fetches and stores. Some additional hardware would have enabled a reference to be passed to HITDATA and wait there, instead of in ADDRESS, for CacheD to become free; ADDRESS would then be free to accept another reference. This performance improvement was judged not worth the requisite hardware.

When MAP is busy with an earlier reference, a reference in ADDRESS will wait if it needs MAP. An example of this is shown in Figure 10, where the victim write waits while MAP handles the read. However, even if MAP is free, a write must wait in ADDRESS until it can start WRITETR; since WRITETR always takes longer than MAP, there is no point in starting MAP first, and the implementation is simplified by the rule that starting MAP always frees ADDRESS. Figure 13 shows two back-to-back I/OWrites, the second of which waits one extra cycle in ADDRESS before starting both WRITETR and MAP.

The last reason for waiting in ADDRESS has to do with the beingLoaded flag in the cache. If ADDRESS finds that beingLoaded is set anywhere in the row it touches, it waits until the flag is cleared (this is done by READTR1 during the RThasA cycle). A better implementation would wait only if the flag is set in the column in which it hits, but this was too slow and would also require special logic to ensure that an entry being loaded is not chosen as a victim. Of course it would be much better to Hold a reference to a row being loaded before it ever gets into ADDRESS, but unfortunately the reference must be in ADDRESS to read the flags in the first place.

5.3 Waiting in MAP

The traffic control techniques discussed thus far, namely Hold and waiting in ADDRESS, are not sufficient to prevent all the conflicts shown in Table 4. In particular, neither deals with conflicts downstream in the pipeline. Such conflicts could be resolved by delaying a reference in ADDRESS until it was certain that no further conflicts with earlier references could occur. This is not a good idea, because references that hit, which is to say most references, must be held when ADDRESS is busy. If conflicts are resolved in MAP or later, hits can proceed unimpeded, since they do not use later sections of the pipeline.
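The waiting rules of §5.2 can be collected into a single predicate (a sketch; the signal names are ours, and in the hardware these conditions are evaluated by the stage automata rather than by one function).

    # Sketch of the conditions under which a reference waits in ADDRESS.
    def address_must_wait(ref_kind: str, hit: bool,
                          hitdata_busy: bool, cache_d_busy: bool,
                          map_busy: bool, writetr_busy: bool,
                          row_being_loaded: bool) -> bool:
        if row_being_loaded:                     # beingLoaded set anywhere in the row
            return True
        if hit:
            return hitdata_busy or cache_d_busy  # HITDATA needs CacheD
        if ref_kind == 'write':                  # victim write or I/OWrite
            return writetr_busy                  # must be able to start WRITETR
        return map_busy                          # misses and I/O reads need MAP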
[...]

7. Performance

[...] large: all over 99 percent. This is one reason that the number of held cycles is small: a miss can cause the processor to be held for about thirty cycles while a reference completes. In fact, the table shows that Hold and hit are inversely related over the programs measured. Beads has the lowest hit rate and the highest Hold rate; Placer has the highest hit rate and the lowest Hold rate.

The percentage of Store references is interesting because stores eventually give rise to dirty victim write operations, which consume storage bandwidth and cause extra occurrences of Hold by tying up the ADDRESS section of the pipeline. Furthermore, one of the reasons that the StoreReg register was not made task-specific was the assumption that stores would be relatively rare (see the discussion of StoreReg in §5.1). Table 5 shows that stores accounted for between 10 and 19 percent of all references to the cache.

Comparing the number of hits to the number of stores shows that the write-back discipline used in the cache was a good choice. Even if every miss had a dirty victim, the number of victim writes would still be much less than under the write-through discipline, under which every Store would cause a write. In fact, not all misses have dirty victims, as shown in the last column of the table. The percentage of misses with dirty victims varies widely from program to program. Placer, which had the highest frequency of stores and the lowest frequency of misses, naturally has the highest frequency of dirty victims. Beads, with the most misses but the fewest stores, has the lowest. The last three columns of the table show that write operations would increase about a hundredfold if write-through were used instead of write-back.
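The force of this argument is easy to see with a little arithmetic; the sketch below uses made-up placeholder rates (the measured values are in Table 5, which does not survive in this copy).

    # Sketch of the write-back vs. write-through comparison. With write-back,
    # storage writes happen only for dirty victims; with write-through, every
    # Store writes storage. All rates below are invented placeholders.
    references = 1_000_000
    store_fraction = 0.15          # Table 5 reports 10 to 19 percent
    miss_rate = 0.005              # hit rates were over 99 percent
    dirty_victim_fraction = 0.5    # varies widely from program to program

    write_through_writes = references * store_fraction
    write_back_writes = references * miss_rate * dirty_victim_fraction

    print(write_through_writes / write_back_writes)   # -> 60.0 with these values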
Acknowledgements

The concept and structure of the Dorado memory system are due to Butler Lampson and Chuck Thacker. Much of the design was brought to the register-transfer level by Lampson and Brian Rosen. Final design, implementation, and debugging were done by the authors and Ed McCreight, who was responsible for the main storage boards. Debugging software and microcode were written by Ed Fiala, Willie-Sue Haugeland, and Gene McDaniel. Haugeland and McDaniel were also of great help in collecting the statistics reported in §7. Useful comments on earlier versions of this paper were contributed by Forest Baskett, Gene McDaniel, Jim Morris, Tim Rentsch, and Chuck Thacker.

References

1. Bell, J. et al. An investigation of alternative cache organizations. IEEE Trans. Computers C-23, 4, April 1974, 346-351.
2. Bloom, L. et al. Considerations in the design of a computer with high logic-to-memory speed ratio. Proc. Gigacycle Computing Systems, AIEE Special Pub. S-136, Jan. 1962, 53-63.
3. Conti, C.J. Concepts for buffer storage. IEEE Computer Group News 2, March 1969, 9-13.
4. Deutsch, L.P. Experience with a microprogrammed Interlisp system. Proc. 11th Ann. Microprogramming Workshop, Pacific Grove, Nov. 1979.
5. Forgie, J.W. The Lincoln TX-2 input-output system. Proc. Western Joint Computer Conference, Los Angeles, Feb. 1957, 156-160.
6. Geschke, C.M. et al. Early experience with Mesa. Comm. ACM 20, 8, Aug. 1977, 540-552.
7. Ingalls, D.H. The Smalltalk-76 programming system: Design and implementation. 5th ACM Symp. Principles of Programming Languages, Tucson, Jan. 1978, 9-16.
8. Knuth, D.E. TEX and METAFONT: New Directions in Typesetting. American Math. Soc. and Digital Press, Bedford, Mass., 1979.
9. Lampson, B.W. et al. An instruction fetch unit for a high-performance personal computer. Technical Report CSL-81-1, Xerox Palo Alto Research Center, Jan. 1981. Submitted for publication.
10. Lampson, B.W., and Pier, K.A. A processor for a high-performance personal computer. Proc. 7th Int. Symp. Computer Architecture, SigArch/IEEE, La Baule, May 1980, 146-160. Also in Technical Report CSL-81-1, Xerox Palo Alto Research Center, Jan. 1981.
11. Liptay, J.S. Structural aspects of the System/360 Model 85. II. The cache. IBM Systems Journal 7, 1, 1968, 15-21.
12. Metcalfe, R.M., and Boggs, D.R. Ethernet: distributed packet switching for local computer networks. Comm. ACM 19, 7, July 1976, 395-404.