A Processor for a High-Performance Personal Computer

by Butler W. Lampson and Kenneth A. Pier

January 1981

ABSTRACT

This paper describes the design goals, microarchitecture, and implementation of the microprogrammed processor for a compact high performance personal computer. This machine supports a range of high level language environments and high bandwidth I/O devices. It also has a cache, a memory map, main storage, and an instruction fetch unit; these are described in other papers. The processor can be shared among 16 microcoded tasks, performing microcode context switches on demand with essentially no overhead. Conditional branches are done without any lookahead or delay. Microinstructions are fairly tightly encoded, and use an interesting variant on control field sharing. The processor implements a large number of internal registers, hardware stacks, a cyclic shifter/masker, and an arithmetic-logic unit, together with external data paths for instruction fetching, memory interface, and I/O, in a compact, pipelined organization.

The machine has a 60 ns microcycle, and can execute a simple macroinstruction in one cycle; the I/O bandwidth is 530 megabits/sec. The entire machine, including disk, display and network interfaces, is implemented with approximately 3000 MSI components, mostly ECL 10K; the processor is about 35% of this. In addition there are up to 4 storage modules, each with about 300 16K or 64K RAMs and 200 MSI components, for a maximum of 8 megabytes. The total volume, including power and cooling, is about 0.14 m³ (4.5 ft³). A number of machines are currently running.

A version of this paper appeared in Proc. 7th Symposium on Computer Architecture, SigArch/IEEE, La Baule, May 1980, 146-160.

CR CATEGORIES

6.34, 6.21

KEY WORDS AND PHRASES

architecture, controller, emulation, input/output, microprogram, pipeline, processor.

© Copyright 1981 by Xerox Corporation.

XEROX
PALO ALTO RESEARCH CENTER
3333 Coyote Hill Road / Palo Alto / California 94304

1.  Introduction

The machine described in this paper, called the Dorado, was designed by and for the Computer Science Laboratory (CSL) of the Xerox Palo Alto Research Center. CSL has approximately forty people doing research in most areas of computer science, including VLSI design, communications, programming systems, graphics and imaging, office automation, artificial intelligence, computational linguistics, and analysis of algorithms. There is a heavy emphasis on building usable prototype systems, and many such systems, both hardware and software, have been developed over the last seven years. Most are part of a personal computing environment which is loosely coupled to other such environments, and to service facilities for storage and printing, by a high bandwidth communication network [8].

The Dorado provides the hardware base for the next generation of system research in CSL. Earlier machines have limitations on virtual address size, real memory size, memory bandwidth, and processor speed that severely hamper our work. The size and speed of the Dorado minimize these limitations.
The paper has six sections. We begin by sketching the history of the machine's development (§ 2). Then we discuss the design goals for the Dorado (§ 3), and explain how these goals and the available technology determine the high level processor architecture (§ 4). Next, we present the most important details of the processor architecture (§ 5) and some interesting aspects of the implementation (§ 6). A final section describes the machine's performance (§ 7).

2.  History

The Dorado is a descendant of a small personal computer called the Alto, which was designed and built as an experimental machine in CSL during 1973 [8]. The Alto was a fairly simple machine, but it had several features which turned out to be important:

• a microprogrammed processor that is efficiently shared among all the device controllers as well as the virtual machine interpreter;
• a fairly high resolution display system that uses a full bitmap stored in the Alto main memory;
• a device for pointing at images on the display;
• an interface to a high bandwidth communication network.

The microarchitecture allows all the device controllers to share the full power of the processor, rather than having independent access to the memory. As a result, controllers can be small and yet the I/O interface provided to programs can be powerful. This concept of processor sharing is fundamental to the Dorado as well, and is more fully explained in § 4.

Although there are now many hundreds of Altos at work within Xerox and elsewhere, and they formed the hardware base for CSL until mid-1980, it was clear by 1976 that a large and rapidly increasing amount of effort was going into surmounting the Alto's limitations of space and speed, rather than trying out research ideas in experimental systems. CSL therefore began to design a new machine aimed at relieving these burdens. During 1976 and 1977, design work on the Dorado proceeded in CSL and the System Development Department.
Requirements and contributions from parts of Xerox outside of CSL affected the design considerably, as did the tendency toward grandiosity well known in second systems. The memory bandwidth and processor throughput were substantially increased.

In 1977, implementation of the laboratory prototype for the Dorado began. The prototype packaging and a design automation system had already been implemented, and were used for constructing and debugging Dorado Model 0. A small team of people worked steadily on all aspects of the Dorado system until summer of 1978, when the prototype successfully ran all the Alto software. During the summer and fall of 1978 we used the lessons learned in debugging and microcoding the Model 0, together with the significant improvements in memory technology since the Model 0 design was frozen, to redesign and reimplement nearly every section of the Dorado. We fixed some serious design errors and a number of annoyances to the microcoder, substantially expanded all the memories of the machine, and speeded up the basic cycle time. Dorado Model 1 came up in the spring of 1979.

During the next year several copies of this machine were built in the stitchweld technology used for the prototypes. Stitchwelding worked very well for prototypes, but is too expensive for even modest quantities. Its major advantages are packaging density and signal propagation characteristics very similar to those of the production technology, very rapid turnaround during development (three days for a complete 300-chip board, a few hours for a modest change), and complete compatibility with our design automation system.

At the same time, the design was transferred to multiwire circuit boards; the Manhattan wire routing and lower impedance of this technology slowed the machine down by about 15%. Dorados are now assembled with very little in-house labor, since boards and backpanels are manufactured and loaded by subcontractors. We do 100% continuity testing of the boards both before and after they are loaded with components and soldered. Checkout of an assembled machine is still non-trivial, but is a fairly predictable operation done entirely by technicians.

3.  Goals

This section of the paper describes the overall design goals for the Dorado.
The high level architecture of the processor, described in the next section, follows from these goals and the characteristics of the available technology.

The Dorado is intended to be a powerful but personal computing system. It supports a single user within a programming system which may extend from the microinstruction level to a fully integrated programming environment for a high-level language; programming at all levels must be relatively easy. The machine must be physically small and quiet enough to occupy space near its users in an office or laboratory setting, and cheap enough to be acquired in considerable numbers. These constraints on size, noise, and cost have a major effect on the design.

In order for the Dorado to quickly become useful in the existing CSL environment, it had to be compatible with the Alto software base. High-performance Alto emulation is not a requirement, however; since the existing software is also obsolescent and due to be replaced, the Dorado only needs to run it somewhat faster than the Alto can.

Instead, the Dorado is optimized for the execution of languages that are compiled into a stream of byte codes; this execution is called emulation. Such byte code compilers exist for Mesa [3, 6], Interlisp [2, 7] and Smalltalk [4]. An instruction fetch unit (IFU) in the Dorado fetches bytes from such a stream, decodes them as instructions and operands, and provides the necessary control and data information to the processor; it is described in another paper [5]. Further support for this goal comes from a very fast microcycle, and a microinstruction powerful enough to allow interpretation of a simple macroinstruction in a single microinstruction. There is also a cache which has a latency of two cycles, and can deliver a word every cycle. The goal of fast execution affects the choices of implementation technology, microstore organization, and pipeline organization.
It also mandates a number of specific features, for example, stacks built with high speed memory, and hardware base registers for addressing software contexts.

Another major goal for the Dorado is to support high-bandwidth input/output. In particular, color monitors, raster scanned printers, and high speed communications are all part of the research activities within CSL; one of these devices typically has a bandwidth of 20 to 400 megabits/second. Fast devices should not slow down the emulator too much, even though the two functions compete for many of the same resources. Relatively slow devices must also be supported, without tying up the high bandwidth I/O system. These considerations clearly suggest that I/O activity and emulation should proceed in parallel as much as possible. Also, it must be possible to integrate as yet undefined device controllers into the Dorado system in a relatively straightforward way. The memory system supports these requirements by allowing cache accesses and main storage references to proceed in parallel, and by fully segmented pipelining which allows a cache reference to start in every cycle, and a storage reference to start in every storage cycle; this system is described in another paper [1].

Any system for experimental research should provide adequate resources at many levels. For the processor, this means plenty of high speed internal storage as well as ample speed. Hardware support for handling arbitrary bit strings, both large and small, is also necessary.

4.  High level architecture

We now proceed to consider the major design decisions which shaped the Dorado processor. For the most part these were guided by the goals set out above, the available implementation technology, and our past experience. In this section we stay at a high level, reserving the details of the architecture for the next.

The Dorado fits into a very compact package, illustrated in Figure 1a; Figure 1b is a high-level block diagram. Circuits are mounted on large, high density logic boards (288 16-pin DIP logic packages plus 144 8-pin SIP resistor packages per board).
The boards slide horizontally into zero-insertion-force connectors mounted in dual backpanels ("sidepanels"); they are .625 inches apart. This density makes it possible to reconcile the goals of size and capability. Certain sacrifices are made, however. For example, it is not possible to access every signal with a scope probe for debugging and maintenance. We make up for this by providing sophisticated debugging facilities, diagnostics, and the ability to incrementally assemble and test a Dorado from the bottom up.

The entire machine, including disk, display and network interfaces, is implemented with approximately 3000 MSI components, mostly ECL 10K; the processor is about 35% of this. In addition there are up to 4 storage modules, each with about 300 16K or 64K RAMs and 200 MSI components, for a maximum of 8 megabytes. The total volume, including power and cooling, is about 0.14 m³ (4.5 ft³); this is without any enclosing cabinet, however, and the open machine is quite noisy. Including an 80 megabyte removable disk, it requires about 2.5 kW of AC power.

Most data paths are sixteen bits wide. The relatively small busses, registers, data paths, and memories which result help to keep the machine compact. Packaging, however, is not the only consideration. CSL has a large class of applications where doubling the data path width increases performance only a little, because some of the bits contain type codes, flags, or other tags which must be examined before an entire datum can be processed. Speed dictates a heavily pipelined structure in any case, and this parallelism in the time domain tends to compensate for the lack of parallelism in the space domain. Keeping the machine physically small also improves the speed, since physical distance accounts for a considerable fraction of the basic cycle time.
Finally, performance is often limited by the cache hit rate, which cannot be improved, and may be reduced, by wider data paths (if the number of bits in the cache is fixed).

[Figure 1a: Dorado chassis. Front and top views of the chassis, showing the board area (288 16-pin logic DIPs and 144 8-pin SIP terminators per board), the side panels and their wiring, the air plenum, the fans, and the +5, -5, -2, and +12 volt power supplies. Board slots are labeled for storage, memory addressing, instruction fetch unit, processor high byte, processor low byte, control section, microinstruction memory, pipe, map and storage control, cache data and error correction, disk/Ethernet controller, display controller, and I/O.]

[Figure 1b: Dorado block diagram. Blocks: instruction fetch unit, processor, cache, storage, slow input/output, fast input/output, Ethernet, disk, display, and keyboard. Data path labels: 265 MBits/sec (16 bits/60 ns) and 530 MBits/sec (256 bits/480 ns). Access time labels: 120 ns and 1.7 us. Size labels: 8K-32K bytes and 512K-16M bytes.]

Rather than putting processing capability in each I/O controller and using a shared bus or a switch to access the memory, the Dorado shares the processor among all the I/O devices and the emulator. This fundamental concept of the architecture, which motivates much of the processor design, was first tried in the Alto. It works for two main reasons.

• First, unless a system has both multiple memory busses (i.e., multi-ported memories) and multiple memory modules which can cycle independently, the main factor governing processor throughput is memory contention. Put simply, when I/O interfaces make memory references, the emulator ends up waiting for the memory. In this situation the processor might as well be working for the I/O device.
• Second, when the processor is available to each device, complex device interfaces can be implemented with relatively little dedicated hardware, since most of the control does not have to be duplicated in each interface. For low bandwidth devices, the force of this argument is reduced by the availability of LSI controller chips, but for data rates above one megabit/second no such chips exist as yet.

Of course, to make this sharing feasible, switching the processor must be nearly free of overhead, and devices must be able to make quick use of the processor resources available to them.

Many design decisions are based on the need for speed. Raw circuit speed is a beginning. Thus, the Dorado is implemented using the fastest commercially available technology which has a reasonable level of integration and is not too hard to package. In 1976, the obvious choice was the ECL 10K family of circuits; probably it still is. Secondly, the processor is organized around two pipelines. One allows a microinstruction to be started in each cycle, though it takes three cycles to complete execution. Another allows a processor context switch in each cycle, though it takes two cycles to occur. Thirdly, independent busses communicate with the memory, IFU, and I/O systems, so that the processor can both control and service them with minimal overhead.

Finally, the design makes the processor both accessible and flexible for users at the microcode level, so that when new needs arise for fast primitives, they can easily be met by new microcode.
In particular, the hardware eliminates constraints on microcode operations and sequencing often found in less powerful designs, e.g., delay in the delivery of intermediate results to registers or in calculating and using branch conditions, or pipeline delays that require padding of microinstruction sequences without useful work. We also included an ample supply of resources: 256 general registers, four hardware stacks, a fast barrel shifter, and fully writeable microstore, to make the Dorado reasonably easy to microcode.

5.  Low level architecture

This section describes in some detail the key ideas of the architecture. Implementation techniques and details are for the most part deferred to the next section; readers may want to jump ahead to see the application of these ideas in the processor. Along with each key idea is a reference to the places in the processor where it is used.

5.1 Tasks

There are 16 priority levels associated with microcode execution. These levels are called microtasks, or simply tasks. Each task is normally associated with some hardware and microcode which together implement a device controller. The tasks have a fixed priority, from task 0 (lowest) to task 15 (highest). Device hardware can request that the processor be switched to the associated task; such a wakeup request will be honored when no requests of higher priority are outstanding. The set of wakeup requests is arbitrated within the processor, and a task switch from one task to another occurs on demand, typically every ten or twenty microcycles when a high-speed device is running.

When a device acquires the processor (that is, the processor is running at the requested priority level and executing the microcode for that task), the device will presumably receive service from its microcode. Eventually the microcode will block, thus relinquishing the processor to lower priority tasks until it next requires service. While a given task is running, it has the exclusive attention of the processor.
This arrangement is similar in many ways to a conventional priority interrupt system. An important difference is that the tasks are like coroutines or processes, rather than subroutines: when a task is awakened, it continues execution at the point where it blocked, rather than restarting at a fixed point. This ability to capture part of the state in the program counter is very powerful.

Task 0 is not associated with a device controller; its microcode implements the emulators currently resident in the Dorado. Task 0 requests service from the processor at all times, but with the lowest priority.
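
The fixed-priority arbitration and the coroutine-like resumption described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the Dorado's hardware or microcode: the names `arbitrate`, `emulator`, `device`, and `run` are hypothetical, the device task numbers 5 and 12 are arbitrary, Python generators stand in for the per-task program counters, and a wakeup takes effect in the same cycle in which it is raised (the real arbitration is pipelined).

```python
from typing import Dict, Iterator, List

NUM_TASKS = 16                      # task 0 (lowest) .. task 15 (highest)
BLOCK, CONTINUE = "block", "continue"


def arbitrate(wakeups: int) -> int:
    """Return the highest-numbered task whose wakeup bit is set.

    Bit n of `wakeups` is the request for task n. Task 0 (the emulator)
    requests service at all times, so its bit is forced on.
    """
    wakeups = (wakeups & ((1 << NUM_TASKS) - 1)) | 1
    return wakeups.bit_length() - 1


def emulator() -> Iterator[str]:
    """Task 0: always has work and never blocks."""
    while True:
        yield CONTINUE


def device(steps_per_service: int) -> Iterator[str]:
    """A device task needing `steps_per_service` microinstructions per wakeup.

    Each `yield` is one microinstruction. After a Block the generator
    resumes right here on the next wakeup, like a coroutine.
    """
    while True:
        for _ in range(steps_per_service - 1):
            yield CONTINUE
        yield BLOCK


def run(tasks: Dict[int, Iterator[str]],
        requests: Dict[int, int], cycles: int) -> List[int]:
    """Simulate `cycles` microcycles; return the task run in each cycle.

    `requests` maps a cycle number to the wakeup bits raised in that cycle.
    """
    wakeups = 0
    trace = []
    for cycle in range(cycles):
        wakeups |= requests.get(cycle, 0)
        task = arbitrate(wakeups)
        trace.append(task)
        if next(tasks[task]) == BLOCK:
            wakeups &= ~(1 << task)   # request removed once service is done
    return trace


tasks = {0: emulator(), 5: device(2), 12: device(1)}
print(run(tasks, {1: 1 << 5, 2: 1 << 12}, 6))  # [0, 5, 12, 5, 0, 0]
```

In the trace, task 5 is preempted by task 12 after its first microinstruction and later continues with its second one rather than starting over, which is the coroutine behavior the text contrasts with a conventional interrupt handler.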

5.2 Task scheduling

Whenever resources (in this case, the processor) are multiplexed, context switching must happen only when the state being temporarily abandoned can be restored. In most multiplexed microcoded systems, this requires the microcode itself to explicitly poll for requests, save and restore state, and initiate context switches. A certain amount of overhead results. Furthermore, the presence of a cache introduces large and unpredictable delays in the execution of microcode (because of misses). A polling system would leave the processor idle during these delays, even though the work of another task can usually proceed in parallel. To avoid these costs, the Dorado does task switching on demand of a higher priority device, much like a conventional interrupt system. That is, if a lower priority task is executing and a higher priority device requests a wakeup, the lower priority task will be preempted; the higher priority device will be serviced without the consent or even the knowledge of the currently active task.
The polling overhead is absorbed by the hardware, which also becomes responsible for resuming a preempted task once the processor is relinquished by the higher priority device.

A controller will continue to request a wakeup until notified by the processor that it is about to receive service; it then removes the request, unless it needs more than one unit of service. When the microcode is done, it executes an operation called Block which releases the processor. The effect is that requesting service is done explicitly by device controllers, but scheduling of a given task is invisible to the microcode (and nearly invisible to the device hardware).

5.3 Task specific state

In order to allow the immediate task switching described above, the processor must be able to save and restore state within one microcycle. This is accomplished by keeping the vital state information throughout the processor not in a single rank of registers but in task specific registers. These are actually implemented with high speed memory that is addressed by a task number. Examples of task specific registers are the microcode program counter, the branch condition register, the microcode subroutine link register, the memory data register, and a temporary storage register for each task. The number of the task which will execute in the next microcycle is broadcast throughout the processor and used to address the task specific registers. Thus, data can be fetched from the high speed task specific memories and be available for use in the next cycle.

Not all registers are task specific. For example, COUNT and Q are normally used only by task 0. However, they can be used by other tasks if their contents are explicitly saved and restored.

5.4 Pipelining

There are two distinct pipelines in the Dorado processor. The main one fetches and executes microinstructions. The other handles task switching, arbitrates wakeup requests and broadcasts the next task number to the rest of the Dorado.
Each structure is synchronous, and there is no waiting between stages.

The instruction pipeline, illustrated in Figure 2, requires three cycles (divided into six half cycles) to completely execute a microinstruction. The first cycle is used to fetch it from microstore (time t-2 to t0). The result of the fetch is loaded into the microinstruction register MIR at t0. The second cycle is split; in the first half, operand fetches (as dictated by the contents of MIR) are performed and the results latched at t1 in two registers (A and B) which form inputs to the next stage. In the second half cycle, the ALU operation is begun. It is completed in the first half cycle of cycle three, and the result is latched in register RESULT (at t3). The second half of cycle three (t3 to t4) is used to load results from RESULT into operand registers.
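
The half-cycle timing just described can be tabulated with a short Python sketch. This is only an illustration of the schedule as stated in the text, under the assumption that a new microinstruction starts every full cycle with no stalls; the names `STAGES`, `schedule`, and `register_activity` are hypothetical, and half cycles are numbered from the start of the first microstore fetch (so t-2 is offset 0, t0 is offset 2, and t4 is offset 6 for the first instruction).

```python
from typing import Dict, List, Tuple

# (stage name, first half cycle, end half cycle), relative to fetch start.
STAGES: List[Tuple[str, int, int]] = [
    ("fetch", 0, 2),          # t-2 .. t0: read microstore, load MIR at t0
    ("operand fetch", 2, 3),  # t0 .. t1: latch A and B at t1
    ("ALU start", 3, 4),      # t1 .. t2: ALU operation begun
    ("ALU finish", 4, 5),     # t2 .. t3: latch RESULT at t3
    ("result store", 5, 6),   # t3 .. t4: RESULT into operand registers
]


def schedule(n: int) -> Dict[Tuple[int, str], Tuple[int, int]]:
    """Half-cycle interval of every stage for n back-to-back instructions."""
    table = {}
    for i in range(n):
        start = 2 * i  # assumed: one new microinstruction per full cycle
        for name, begin, end in STAGES:
            table[(i, name)] = (start + begin, start + end)
    return table


def register_activity(n: int) -> Dict[int, str]:
    """Map each half cycle to the operand-register access made in it.

    Raises ValueError if two accesses ever fall in the same half cycle.
    """
    activity: Dict[int, str] = {}
    for (i, name), (begin, _end) in schedule(n).items():
        if name == "operand fetch":
            use = f"read for instruction {i}"
        elif name == "result store":
            use = f"write for instruction {i}"
        else:
            continue
        if begin in activity:
            raise ValueError(f"conflict in half cycle {begin}")
        activity[begin] = use
    return activity


if __name__ == "__main__":
    for half_cycle, use in sorted(register_activity(3).items()):
        print(half_cycle, use)
```

In this model operand reads always fall in even half cycles and result writes in odd ones, so the two uses of the operand registers never collide. The sketch does not model how the hardware treats an instruction that needs the result of its immediate predecessor.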
Н  О_wТ хПJТ иН  ОшТ ∙Т √П>Н  ОWТ ┬П'Т ┴П8Н  ОсТ ─Н  О╗Т ├П>Т ┤ |w|
wН  О$	Т │ПWТ ┌ ~ О
≈О$wН  О	6~ О╘ О	6wТ ▌Т ▐П4zw~ О╘ О	6wН  ОHТ ├ПGТ ┤zwН  ОдТ ≤~ О7 Одwz wz wТ ≥П)Н  ОжТ ┘zwПQТ ├Н  ОRТ ∙zw~ Ое ОRwП$~ Ое ОRw~ Ое ОRwН  О dТ ─zw        Їц    H÷]eў                                                                                                                                                                    A PROCESSOR FOR A HIGH-PERFORMANCE PERSONAL COMPUTER8<==<ProcFig2.press<<==<ProcFig3.press<The figure also shows how the pipeline overlapping is achieved.  A new microinstruction begins atevery cycle time.  The operand registers are used in the first half cycle of every cycle to fetchoperands for the current instruction (during t0²t1).  The second half of every cycle is used to storeresults for the previous instruction (during t3²t4).  Figure 3 shows the task arbitration pipeline.  This pipeline is two stages long, and also requires onecycle per stage.  At the beginning of the pipeline (t0), wakeup requests from device controllers arelatched into the WAKEUP register.  During the first half cycle (t0²t1), arbitration is performed andthe highest priority task determined.  During the second half cycle (t1²t2), the microprogramaddress for the highest priority task is fetched from the task specific program counter TPC.  The tasknumber, its TPC, and the command to switch tasks (if the highest priority task is higher than thecurrently executing task) are loaded into registers at t2.  
In the second pipe cycle, the TPC is used to fetch the next microinstruction from the microstore, the entire processor uses the selected task number to fetch the appropriate task specific information, and device controllers are told which task will have the processor next. Finally, at t4 the task switch is complete, and the new task is in control of the processor; this time corresponds to t0 of the first microinstruction executed by the new task.

[Figure 2: Instruction pipeline and timing overlap]

[Figure 3: Task arbitration pipeline]

5.5 Microinstruction format

One of the key decisions made in the design of any microprogrammed processor is the format and semantics of the microinstruction. The Dorado's demands for compactness and power are at odds in this case. Compactness dictates that an essentially vertical structure be used, with encoded fields specifying many functions in a few bits. The details of the microinstruction format appear in § 6. The major features of interest here are the choice of successor instruction encoding, and the specification of a large number of functions which may be executed by the processor.

In a classical microprogrammed processor, each instruction carries with it the address of its successor, NEXTPC; this address is latched with the rest of the instruction, and then used directly to address the microstore for fetching the next instruction. NEXTPC may be modified by state within the processor during execution, but the basic idea is that enough bits must be present in each microword to address the whole microstore. This results in a uniform structure for addressing, and allows the next instruction fetch to proceed without any delay for decoding; it has the disadvantages of increasing the size and cost (and reducing the speed) of the microstore.
The lack of any decoding time also makes it impossible to specify a subroutine return or other major change in sequencing, and have it take effect immediately (branches can still use the scheme described below).

The alternative, used in the Dorado, is to divide the microstore into pages, use a few bits to specify a next address within the current page, and have a type field which can specify branches, calls, returns, transfers to another page, or whatever. At the start of a microcycle, the processor decodes the type field and accesses other information (such as the current page number or the return link) to compute NEXTPC. In addition, some types cause side effects such as loading the return link. The net result is substantially fewer bits to control microsequencing than a horizontal scheme would require (in the Dorado, 8 bits instead of about 16). The disadvantages are, of course, the cost and time for decoding this field, and the additional complexity of an assembler which can fit instructions onto pages appropriately.

Conditional branching is always a problem with pipelined instruction execution. Most designs use one of the following two schemes, and tolerate its drawbacks. The first requires that a branch be specified one (or more) instructions before it is taken. Although this simplifies and speeds up the hardware, it imposes severe constraints on the microcode organization, and often forces extra instructions to be executed. The second scheme detects the branch and inserts an asynchronous delay or an extra cycle to allow time for the new instruction to be fetched. This obviously slows down the machine.

Conditional branching in the Dorado is handled by allowing one of eight branch conditions to modify the low order bit of NEXTPC. This modification (a Boolean OR into the low order bit) takes place about half way into the instruction fetch cycle.
The microstore is organized so that this bit does not change the chip address, but instead selects a different chip from a set of chips whose outputs are tied directly together. Since access time from the chip select is considerably faster than from the address, the late arriving branch condition does not increase the total cycle time. For this to work, the assembler must place each false branch target at an even address, and the corresponding true branch target at the next higher odd address. An annoying consequence is that several conditional branches cannot have the same target; when this case arises the target must be duplicated. Everything has its price.

Another tradeoff occurs in the mechanism for controlling the functions of the processor at each microcycle. The Dorado encodes most of its operations (other than register selection, ALU operations, storing results, and memory references) in an eight bit function field called FF. This is quickly decoded at the beginning of every microinstruction execution cycle (during t0 to t1), and is used to invoke all of the less frequently used operations that the processor can do: controlling the I/O busses, reading and setting state in the memory and IFU, extracting an arbitrary field from a word, reading and loading most registers, non-standard carry and shift operations, and loading values into small registers. FF can also serve as an eight bit constant or as part of a full microstore address. This encoding saves many bits in the microinstruction, at the expense of allowing only one FF-specified operation to be done in each cycle, even though the data paths exist for doing many such operations in parallel.

5.6 Data bypassing

Recall that a microinstruction is initiated at the beginning of every cycle, but takes one cycle for instruction fetch and two cycles for execution. If an instruction uses a result generated by its immediate predecessor, it needs to get that result from an operand register before the predecessor has actually delivered the result to that register. Rather than forbidding such use of results, or delaying execution until the register has been loaded, we solved this problem with a technique called bypassing. The hardware detects that an operand specified in the current instruction is actually the result of the previous instruction. Rather than obtaining the operand from the usual source in a RAM, the processor takes it directly from the input to the RAM, which is the result of the previous instruction. Figure 4 illustrates the scheme.
Bypassing costs extra hardware for multiplexors and bypass detection logic, but the result is much smaller and faster microcode in many common cases. In the Model 0 Dorado, we omitted bypassing logic in a few places, and required the microcoder to avoid these cases. The result was a number of subtle bugs and a significant loss of performance.

[Figure 4: Bypassing example. The multiplexor is switched to the bypass path if the current operand address equals the previous result address.]

5.7 Memory delays

Pipelining and bypassing are effective ways to reduce delay and increase throughput within the processor. Interactions with the memory, however, pose different problems. Once a memory reference has been made, there must be some way to tell when the memory system has delivered the requested data. Two simple techniques are to wait a fixed (unfortunately, maximum) amount of time before using the data, or to explicitly poll the memory system. Neither is satisfactory for a high performance machine. First, the difference between the best case (cache hit) and the worst (cache miss plus memory system resource contention) is more than an order of magnitude. Second, useful work can often be performed by a given task before it uses the requested memory data. Third, even if a given task must wait for memory data before it can proceed, higher priority tasks may very well be able to do useful work in the meantime.

The Dorado manages this problem by making the memory keep track of when data is ready, and allowing the processor to keep executing instructions. Only instructions which use memory data or start memory references can be affected by the state of the memory. When such an instruction is executed, the memory checks to see whether it can be allowed to proceed. If so, no action is taken. But if the memory is busy, or the data being used is not ready, the memory responds by activating the signal Hold. The effect of Hold is to stop any state changes specified by the current instruction. However, all the clocks in the system keep running. This is important, because task switching must not be inhibited during memory delays. In effect, Hold converts the currently executing instruction into a "no operation, jump to self" instruction. If no task switch occurs, the instruction is executed again, and a new calculation is made to see whether it can proceed. Meanwhile, the memory pipeline is running, and sooner or later, the need for Hold will be gone as the pipeline progresses.

Note that if a task switch occurs while an instruction is held, the state is such that the held instruction may simply be restarted when the lower priority task is resumed by the processor. Cycles which would otherwise be dead time are consumed instead by higher priority tasks doing useful work.

5.8 Separate external interfaces

If most macroinstructions (byte codes) are to execute in a small number of cycles, hardware must be provided to make communication among processor, IFU, and memory very quick in the common cases.
The Dorado provides a number of data paths and control structures for this purpose, detailed in the block diagrams, Figures 5 and 6. All the busses are a full word wide and can be accessed in one cycle or less. The B input to the ALU is extended to the remainder of the Dorado (except I/O devices, which have their own busses) for the transfer of status and control between the processor and the other subsystems. The memory address bus is a copy of the A side ALU input. Memory data comes directly into the processor and is routed to a variety of destinations simultaneously, to make such operations as field manipulations and indirect addressing fast. The IFU can directly supply operand data to the processor, and any microinstruction can specify that it is the last of a macroinstruction, in which case the successor address is supplied by the IFU. This requires a microstore address bus and operand data bus directly from the IFU to the processor.

It is also desirable to make I/O transfers through the processor fast. To this end there is an I/O address bus and an I/O data bus for direct access to I/O controllers. The data bus can transfer one word per cycle, or 265 megabits/second, and both the memory reference and the I/O transfer can be specified in a single instruction, so that it is possible to move a sequence of words between the cache and a device at this rate. However, this subsystem is called the slow I/O system. There is also a more direct memory access I/O subsystem, the fast I/O system; it allows data to move directly between storage and I/O devices, in blocks of 16 words, without polluting the cache. Figure 1b shows a display controller that uses both slow and fast I/O systems.

5.9 Constants

Notice that there is no source for 16 bit constants within the processor. Such constants are necessary, particularly in device controller microcode where they often are used as commands, addresses or literal data.
It would be possible to include a constant box, addressed perhaps with an FF function, as a source for constants. However, such a box would have a limited size and, experience tells us, would not hold enough constants to satisfy a growing world.

Fortunately, a large fraction of the constants used in microcoding are either small positive or small negative (2's complement) integers, or sparsely populated bit vectors, with the property that one of the two eight bit fields in the constant is all zeroes or all ones. Thus a useful subset of constants can be specified using the eight bits of FF for one byte of the constant and two other bits to specify the other byte value and position. Using this technique, most 16 bit constants can be specified in one microinstruction, and any constant can be assembled in two microinstructions. (The "other" two bits come from the BSelect field in the microword.)
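One way to read the constant scheme of this section is sketched below in Python. The layout of the two extra bits (one flag for an all-ones other byte, one flag for the position of FF) and all function names are assumptions made for illustration; the text says only that two BSelect bits give the other byte's value and position. Combining two partial constants with a bitwise OR is likewise an assumption about how a general constant might be assembled in two microinstructions.

```python
from typing import Optional, Tuple

# (ff, other_byte_is_ones, ff_is_high_byte); this field layout is hypothetical.
Encoding = Tuple[int, bool, bool]


def encode(constant: int) -> Optional[Encoding]:
    """Return a one-instruction encoding of a 16 bit constant, or None."""
    if not 0 <= constant <= 0xFFFF:
        raise ValueError("constant must fit in 16 bits")
    high, low = constant >> 8, constant & 0xFF
    if high in (0x00, 0xFF):      # FF supplies the low byte
        return (low, high == 0xFF, False)
    if low in (0x00, 0xFF):       # FF supplies the high byte
        return (high, low == 0xFF, True)
    return None                   # neither byte is all zeroes or all ones


def decode(encoding: Encoding) -> int:
    """Rebuild the 16 bit constant from FF and the two extra bits."""
    ff, other_is_ones, ff_is_high = encoding
    other = 0xFF if other_is_ones else 0x00
    return (ff << 8) | other if ff_is_high else (other << 8) | ff


def split(constant: int) -> Tuple[Encoding, Encoding]:
    """Two encodable halves whose bitwise OR (an assumed ALU step) is the constant."""
    high_part = encode(constant & 0xFF00)
    low_part = encode(constant & 0x00FF)
    assert high_part is not None and low_part is not None
    return high_part, low_part


if __name__ == "__main__":
    print(encode(0xFFFD), encode(0x1234), split(0x1234))
```

Under this reading, -3 (0xFFFD) encodes as FF = 0xFD with an all-ones high byte, while 0x1234 has no single-instruction encoding and is split into 0x1200 and 0x0034.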
Т └ПPН  ОPMТ █Т ▌П@Н  ОNиТ ▀~w~w
Т ▄П8Н  ОMEТ ┤Т ┬ПIН  ОKаТ █П/Т ▌ ~wП-Н  ОJ=Т ┼ПcН  ОH╧Т еП>Т фН  ОG5Т ─П/~wП)Н  ОD
Т рПVТ сН  ОB├
Т пПCТ яН  ОAТ ґП;Т ўН  О?~Т ─Н  О;|Т XН  О8TwТ ≈П_Н  О6пТ ┘П1zwТ ├Н  О5LТ жТ вПIН  О3хТ ║ПCТ ╒Н  О2DТ ▓z wТ ⌠zwП+Н  О0юТ ┬ z w z wТ ┴П?Н  О/<Т ·П7Т ÷z wzwН  О-╦Т
Т	ПGН  О,4Т ╘Т ╙ПOН  О*╟zwТ ─П3Т │П0Н  О),Т ╡Т ЁПLzwН  О'╗Т ─ПAzwН  О$}Т іТ їz w z wП@z w z Н  О"ЫwТ ▀Т ▄z w z wz w z wП,Н  О!uТ ÷ПEТ ═z w z wН  ОЯТ ■Т ∙ПUТ ─ Н  ОmТ ёПC|w z w z wТ єН  ОИТ ┌z w z wТ ┐ |w z w z wП(Н  ОeТ ╞z w z wТ ╟П/Н  ОАТ ─П3z w z wН  ОБ|Т X	Н  ОЇwТ шТ э|wП*Н  О3	Т рТ сП<Н  О╞Т ┴П5|wТ ┼Н  О+zwТ я	Т рПOН  Ої	Т ─ПFН  О	|Т ▓Т ⌠ПAН  ОЬТ ■ПPТ ∙Н  ОtТ ⌡ПYТ °	Н  ОПТ ┐П&~wП0Т └
Н  ОlТ ≤Т ≥ПOН  ОХТ ╛П\Н  О dТ ─~w        ўЇ├    H÷^╒н                                                                                                    A PROCESSOR FOR A HIGH-PERFORMANCE PERSONAL COMPUTER12<==<ProcFig5.press<Ъ  FFIFUAdQ[14]TPCITPCOBLinkFFReadyWakeuprrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr>>>>>rrrrCBrIMOutCPRegTLink*TPC*+1rrProcessorControlTPIMOutBestNextPCBestNextTaskNEXTIMAddressThisTaskNextPCLinkNextControlFFASelectLoadControlBSelectALUOPRAddressLink>>>*{PE}{Switch}{StartCycle}{StartCycle}{StartCycle}{InstructionDecode}{TPCByPass}{Switch}{UseCPReg}{ReadIM}{RAddr}{ReadLink}{Link_B}r{  }NextCtrlBlockmultiplexormultiplexor select signaltask specificFigure 5: Control sectionThisTaskLastTaskThisPCThisPC+1MemoryInstructionIMThisPCMIRt-2t0t2t-4Priority EncoderTLinkByPassLink_B{}<==<<>rr>rBBBBTLinkAddrTPCAddr  НДО`Mz wТ XУ   zw zw z w zw z
w zw zН  wНtО d2         LЇЕ    H÷eC 'Н2ОR÷ПН-ОR÷ПН(HОR÷ПН*:ОOЭЧ $9Н.СОOЭЧ $9Н2░ОOЭЧ $9Н6sОOЭЧ $9Н*хОG┐ПН,sОI
Ч $─Н*│О>XПН,sОFfЧr $Н0ЕО;ШЧ $
▐Н.╛О5PЧ $9Н,sОJьЧ╚ $Н3О5PЧ $╚Н.╛ОOЧ $9Н4:О	ПН<ОOЧ $9Н:ЕОOЧ $9Н9иОOЧ $9Н8╛ОOЧ $9Н╚ОR÷ПН9ОOЭЧ $9Н
▌ОKУПН ОD╩Чх $НОKУПНОI
Чг GНхОC÷ЧU $Н2ОЫЧ $9Н5ЕО	ЗЧ $гН8╛ОаЧ $гН,sОаЧ $гН:WО┬Ч $гН0ЕО3Ч $9Н2░ОTnЧ $╚Н9ОTnЧ $╚Н9ОM|Ч $─Н ОGіЧ $╚Н,sОMцЧ $9НДОO╣Ч╚ GНVОVЧ╚ $НОVЧ $1PНОB┌Чг $НrОOьЧ $]НrОRЧ $Н│О&чЧ $+WН╚О&╨ЧЫ $НVО4ФЧ $┤НVО6IЧ+ $Н╚О2пЧ $9Н
 ОOьЧ $]НгО6ЄЧ $│НгОоЧ $&ЕНх|О┼ПН9О┼ПНО
ПНО&`ПНО/▀ПНхО1дПНrО1дПНrО5НПНrОDaПН╚ОFПН9ОFПН╚ОJ6ПНДОLІПН╚ОLІПН	rОLІПН╚ОQ(ПН)╛ОLІПН.eОLІПН2ОLІПН5ЕОLІПН:ЕОLІПН<░ОLІПН5ЕОQ(ПН2ОQ(ПН+ЕОJ}ПН+ЕОEдПН+ЕОA ПН+ЕО<≥ПН+ЕО8ІПН0WО8ІПН2░О2ПН.О2ПН0WО/рПН0WО&`ПН0WО
ПН0WОCПН.О
ПН(▐О
ПН'sОCПН+ЕОCПН+ЕО{ПН3О	{ПН5WОBПН5WОМПН-░ОМПН8О	{ПН8О{ПН6sОCПН9иОCПН9иОCПН8О
ПН9;О
ПН:WО
ПН;tО
ПН1sОЄПН╚ОAПН╚О>рПНО?ОПН(ЗО?ОПН)О≤ПН▐ОЧ▌ GНrО
ПНхОЄПН
 ОKfЧ $Н	rО9Ч
ж $НHО7BЧ $РН	rО7Ч
Ы $Н	rО7Ч $НРО)┌Ч	+ $НО'ЁЧ $РНРО'░Ч	O $НДОжЧ² $НДО,Чю $НДО,Ч $нН5WО²ЧР $Н<IОЧ $╚Н5WОСЧ $Н5WОСЧ $нН+ЕО
dЧ▌ $Н1sОщЧ $╚Н+ЕО╧Ч╡ $Н+ЕО╧Ч $нН3╛О
dЧr $Н8ОщЧ $╚Н)╛О²Ч  $Н.╛ОЧ $╚Н)╛ОСЧ$ $Н)╛ОСЧ $нН,sО)┌Ч² $Н5О'ЗЧ $╚Н,sО'вЧю $Н,sО'вЧ $нН*:О?╩Чr $Н.╛О>4Ч $╚Н*:О>Ч∙ $Н*:О>Ч $нН*:ОHФЧr $Н.╛ОG_Ч $╚Н*:ОG;Ч∙ $Н*:ОG;Ч $нН ОK┼Ч $РН
 ОMXЧ  $НхОIQЧ $9Н9ОIQЧ $9Н О5	Ч $9Н+ЕОCПН+WО;ЄЧ╚ GН-░О5	Ч╚ GН,sО;ШЧ $9Н╚ОOЧ $dН&ЕОAЧ GН.╛ОeЧ9 $Н(ОЧг GН8╛ОаЧ $UН5ЕО2Ч $╚Н*:ОDtЧr $Н2░ОzЧ9 GН5ЕОAЧж GН╚О┬Ч GН.О2Ч $╚Н*:ОжЧ  $Н*:О,Ч$ $Н*:ОBйЧ∙ $Н
 ОKfЧ$ $НхОA┼Ч GrН*│ОжЧ $ kН*│ОЧ$ $Н*хОAЧ $ GН/иОHЧ $@Н*хОeЧ$ $Н*│ОDtЧ $ kН*│ОD╩Чr $Н.╛ОC÷Ч k $Н*хОD╩Ч $ kН*хОEЧ∙ $Н/:ОCФЧ $@Н.СОCФЧ k $Н,sОEЧ $]Н,sОeЧ $КНДО4бЧ▌ GН/:ОЧ k $Н/│ОHЧ k $Н7░ОЧ▌ GН%хОOьЧ $]Н%:ОLІПН$╛pОR÷ПН$╛ОO╣Ч▐ GН▐ОKfЧ@ $Н8╛О╨ПН,sО	ПН*│ОsПН*хОCПН/иО#ПН0W|О!ПН.О
dЧ $нН-░ОBПН"spО
╚П	Н"sОПН╚ОK┼Ч $РН▐ОK┼Ч $РН▐ОMXЧ $Н5ЕО:ПН0ЕО!PЧ $╚Н)СО7ґП
Н
▌О7ґПН2░О7┴Ч $╚Н)О7fЧ $нН О1IПНРО'░Ч $Н-░О1IП	Н=ОOЭЧ $rН'sОKУПН*хО:ПН2░ОX.ПН.ОX.ПН!VОX.ПН²ОX.ПНРОX.ПН
жОX.ПНUОX.ПН;tОOЭЧ $╚Н=ОWПН9ОЧ GrН|О_ПНО≤ПН  ОЧ9 $Н  О	HЧ9 $Н9О+Чг $НДО{ПН╚pО─ПН2vОHФПНДО4·ПНДОЕПН  ОdПНrОFґПН▐ОO▒ПН#▐О;░ПН(▐О4ФПН7░ОхП
Н<IОПН=ґОЕПН:WОДП
Н"sОЕПН.╛ОVЧ $╚Н.|ОRсПН╚pОHПН&·ОC÷Ч² $Н*:ОBМЧ $╚Н.╛ОBМЧ $╚Н.СОCбЧ $Н)О9Ч	√ $Н)О7fЧ	√ $Н-О│Ч $нН-О│ЧК $Н4иО╔Ч $╚Н-О!,Чг $Н/│О%Ч $Н4:ОR÷ПН(HОX.ПН0ЕОeЧ $@НVО╚Ч $єН│ОOЧ $╚НО²ПНОHПНО─ПН╚О─ПН0ЕО$^Ч $²НVО(ПНHОsПН.eО(ПН-HОхПН<О/ПН;,О0╩ПН9;О,%Ч
ж GНCиО,lЧ G╚Н9;О2пЧ
▐ GН=ґrО-eПН=ОTJЧ9 $Н=pОTьПН0ЕО0-ЧV $Н,sОAТЧ
▐ $Н7О┬Ч $-░Н;tО)┌Ч╚ $НBО'ЗЧ $╚Н;tО'вЧн $Н;tО'вЧ $нН>иО)╔Ч $гН:WО+,Ч∙ $Н:WО┬Ч $хН=rО(ПН>иО'Ч $НDФО'Ч $5бН :О\═Ч$╛ $Н9;О,lЧ G╚Н0ЕО%·Чх $Н,sОжЧV $НгО6IЧ9 $НгО+Чr $Н:ОHЧД $НгОOЧ $9НVОжЧ $2Н  О7fПН ╡tО7ПН  rО(eПН ╡tО(ПН  rО,ПН ╡tОДПН  rОKУПН ╡tОKґПН О94Ч $9Н
 pОEьПН9ОYnЧ $╚НVОYnЧ $╚НrОYnЧ $╚НхОYnЧ $╚Н$ОYnЧ $╚Н*:ОYnЧ $╚Н.╛ОYnЧ $╚Н6sОYnЧ $╚Н9ОZУЧ/: $НHиОжЧ $GяН;tОV┐Чy $НGґО%·Ч $.пН=ОTJЧ
╡ $НгОRЧ9 $НЗvОlПН │О┬ПНHОПН%ЛОПН3╛ОщЧ $╚Н3╛О╧Ч∙ $Н3╛ОаЧ $9Н,sОЗЧ $Н)ОЧ  $Н-ОКЧ
  GН#pО dЯН|О≤ПН'sОHЧК $Н.╛О94Ч $гН.О5НПН6sОTJЧ $yН9ОUУЧ&W $Н9О

ПН8╛ОeЧy $Н8О,|ПН>;О&`ПН)ОOЧ $UНОOЧ $UН,sО┬Ч $гН*:ОOЧ $╚Н/:ОOЧ $╚Н(О┬Ч $╚Н√pОхПН(▐ОхПН'sО²ПН1sО+ПН:О+Ч $@Н0ЕО)┌Ч $]Н╚О)╔Ч $РН╛О╨П	Н╛ОCПН:WО┬Ч $гН,sО?ъЧ $Н>иО'%Ч $ уН :О[Ч $╚Ъ    L  j*Е    HМ\цL                                                                                                  SEC. 5LOW LEVEL ARCHITECTURE13<==<ProcFig6.press<Ъ  TSTACKQrrRBaseqIFU>>>>>+Shifter.......................RM..>..r..COUNTALUMemBase>>>>>>>TIOAIOAddressAdder..ALUFMrLSH 1RSH 1.......StackPtrRBaseABExternalBIODatadevicesRESULT,....>>>>>>>memorymemorydevicesto I/OsmallconstantconstantRESULTBdata from>IFUAqFigure 6: Data sectionRAddrRaddr>register or memorylatchmultiplexor latchmultiplexornarrower data pathFFfrom t1 to t2from t2 to t3.>RT*task specific****LoadcontrolcontrolLoadALUOPaddress toRRTTMDQMDQRTMemData (MD)FFFFFF(copy of A)MemAdMDMemBALUFMALUALUALU.IFUconstconst(copies of B)IODataCOUNTIOAdBaseRegsto/fromcontrol,memory,IFU,to/from,XB{FF, mask}{bypass}{bypass}}{}{{control}{control}rother 16 bit pathpointerStackData readyshortly after t1Latches followLatches followData readyshortly before t3RAMs readat t1, load at t4load at t3RegistersBalso to RESULTshift 1 bit left or rightShiftCtl)(also toto RESULT<==<<main bus (A, B, RESULT, MemData, IOData)SHIFTCTLSHIFTCTL  Н  О`MzТ GУ  Н%ОН;┬wНtО d2Ъ         4ЇЕ    H÷eC Н2░О3┌Чr $Н7О$;Ч $kН2░О$Ч $▐НО0-Ч $∙НО0-Ч ╡ $Н▐О0PЧ $rНО4÷Ч ▌ $НО'IЧ ▌ $Н▐О"ШЧ $rНО5-Ч▌ $Н╚О1ШЧ $UНО1ьЧ╡ $НО1ьЧ $yНДО&PЧ9 $Н╚О&PЧ─ $НО$┌Ч $yНО$┌Ч╡ $Н╚О$╔Ч $UНО'вЧ▌ $НО'IЧ ▌ $НО"вЧ $∙Н2rО%бПН┬pО.┌ПНоО/4Ч Ы $НО0╩Ч▌ $Н╚О-┴Ч $UНО-fЧ╡ $НО-fЧ $yН┬О╔Ч $9НЫО┌Ч╡ $НЫО┌Ч $]Н2rО ПНхО:╩Ч² $НdО8іЧ $9НхО8┐Чю $НхО8┐Ч $]Н]О(пЧ $гН]О(пЧ+ $НdО(ТЧ $єНДО+tЧ $9Н9О5-Ч $нН╚|О2ПНrО2ПНvО9XПНVxО$ыПН	rvО;ПНVО8Ч $UН	rО6ьЧг $Н9|О7ПН9О5ОПН9О4рПН╚О5ОПНО&їПНkО*WЧР $НєО+╩Чd $НєО*Ч $╚НхpО*{ПНО*4Ч $нНєО*Ч┤ $Н+О+	Ч@ $Н	rО,&Ч	r $Н	rО+≈Ч $ ╡Н	rО+≈Ч2 $Н▌О бЧ	╧ $Н╡О)┌Ч	╧ $НkО);Ч $9НО4бЧ
▐ $Н▐О%4Чг $НО=Ч
k $Н▐О;ЭЧ $9НО;ьЧ ╡ $НО;ьЧ $]НДО=Ч
$ $НVО;ьЧ $]НVО;ьЧ ╡ $НДО;ЭЧ $9НVО>Ч ▌ $НО>Ч ▌ $Н!3О%WЧ $┤НаО бЧ $√Н$О!чЧ $гН'sО/Ч $Н-░О(┴Ч $╚Н'sО(fЧ@ $Н'sО(fЧ $нН!zО*бЧ▌ $Н ]О,ШЧ╚ $Н(rО+-ПН-░О+ъЧ $Н/╔О+ъЧ $╡НО0tЧ▌ kНdО-┴Ч kUНО-fЧ╡ kНО-┴Ч k2НО1ШЧ k2НdО1ШЧ k2НdО4ФЧ$ kНAО1ьЧ$ kНО$╔Ч GUН┬О$┌Ч GUНAО'ЄЧk GНdО$┌Чk GНО"вЧ ╡ $Н╚О3іЧК $НоО.МЧd $Н╚О3_Ч $НО.МЧ $∙НхО/4Ч $NН3О3_Чd $НДО3іЧ9 $НДО3_Ч9 $НщО.МЧ $∙НщО.МЧ@ $НHО1mЧД $Н$О1&Ч	 $Н$О/4Ч Ы $Н$О1mЧ $Н╚О&	Ч─ $Н$О$Ч	r $НщО#пЧ	╧ $НДО&	Ч Ы $НHО&	Ч у $НщО#ТЧ $9Н$О$;Ч $РН╚О'&Ч
Ы $Н▌О {Ч	╧ $НО#ґЧ $dНО ·Ч $єН О&≈Ч $оН!zО%WЧ $GНщО(BЧ $╡Н,О.МЧ ╡ $Н▐О2BЧ┤ $НДО<яЧ
$ $Н%О(ґЧ $dН!zО*{Ч▌ $Н ]О,ЄЧ╚ $Н!zО*ФЧ $жН ]О'Ч $жН ]О-Ч $GНО<яЧ
k $Н ▌О2BЧ $Н9О4{Ч
k $Н/ЛО+╩Ч $╡Н-░О+≈Ч─ $Н$AО&-Чy $Н$О1ьЧ² $Н$О1mЧ┤ $Н/ЛО1mЧн $Н/ЛО5-Ч $ ]Н$AО5
Чо $Н$AО5QЧd $Н/╔О5QЧ $ │Н'sО.МЧ@ GН'sО(fЧ@ GН'sО(┴Ч G╚Н-lО(┴Ч G╚Н2░О3;Ч∙ kН2░О$Ч∙ kН2░О$^Ч kHН6╩О$;Ч kHН"√xО+ОПН"√О*DПН"√О(≥ПН"√О&НПН"√О DПН"√ОНПН"√ОCПН ]О&чЧ $НО(BЧk $НО.МЧd $НаО/4Ч╚ $Н!zО0≤ЧР $Н!3О0ъЧ9 $Н │О2BЧК $Н▐О2┴Ч	щ $Н%О3МЧG $Н%О44ЧG $НО#┴Чd $НО#BЧd $Н!zО%4ЧР $Н▐О$МЧ	щ $Н%О(┴ЧG $НzО+`ПНzО)'ПНVОПНzОрПНоО*DПНоО,}ПНоОНПНоО'ПН%ЛО$╣ПН%ЛО"|ПН1	О)nПН1	ОцПНrО|ПНЫО!╩Ч╡ $НоО4рПНzО4рПНyО9÷ЧN $НVО9÷Ч  $Н▌О8┐ЧК $Н$AО!╩Ч ╔ $Н"√О≥ПН│rО2ТПНО╩Ч ▌ $НО┌Ч $]НО┌Ч ╡ $Н▐О╔Ч $9Н!zОIЧ $єН!zО%Чk $Н"√xОНПН┬ОЕЧ $НоО·Ч $НЁОzЧ@ $Н▐ОбЧ $Н$AО;Ч─ $Н'О^Чщ $Н$eОиЧ │ $Н▐О?QЧ $9НО?-Ч ╡ $НО?-Ч $]НОAfЧ ▌ $НЫОH÷Ч╡ $НЫОFgЧ $]НЫОFgЧ╡ $Н┬ОF┼Ч $9НжОTKЧ╡ $НжОRЧ $]НжОRЧ╡ $Н/иОBЧ $─НkОBЧ*│ $Н ОвЧ+W $Н$О%·Чо $Н ОЗЧ $:WНzО=oПНОFЧ ▌ $Н▐ОBНЧ $UНОBйЧ ╡ $НОBйЧ $yНа|О?╗ПНОE╣Ч $Н╚ОG_Ч $НхОE▒Ч $РНОEьЧ $╚НхОEnЧd $НЁОDuЧ $Н*хО/4Ч $dНоxО>DПНrО?aПН²ОGїЧ╚ $Н²ОG_Ч╚ $НkОF┼Ч9 $Н▌ОFCЧ $Н*:|О+ОПНkОL┐Ч9 $НkОLйЧ9 $НyОKCЧн $НyОK┼Чн $НrxОCEПНоОD┘ПН╚ОKCЧ%Л $НЫОJJЧ▌ $НоvОK ПН3BrО+-ПН▐О<яЧN $Н%О4XЧ $²НО5QЧ $UНAvОRцПН9|ОPЎПН9ОO~ПН9ОN>ПН╧ОO~ПНyОOJЧ9 $Н√ОK ПН9ОJ║ПН9ОL⌠ПН▌ОPCЧг $НжОN.Ч╡ $НжОN.Ч $]НжОPgЧ╡ $Н]vОNЮПН┬ОO'Ч/: GН%:rОOыП	Н2ОY╣Ч  GН7ОTnЧ G▌Н2ОT'ЧG GН2ОT'Ч GжН)eОY▓Ч$ kН.eОUУЧ kДН)ОUУЧ kДН.╛ОUУЧ $Н)ОYыЧ▌ $Н)eОUУЧ$ kН/╔ОUўЧ┤ $Н.╛ОXuЧ─ $Н.╛ОX.Ч─ $Н2lОVїПН┬ОRЧ $]Н┬ОN.Ч $]Н0WxОP0ПН0WОMiПН/ЛОUgЧ@ $Н=ґО$Ч╧ $Н9иО)Ч² $Н9иО*бЧ² $Н9иО.Ч² $Н2ЁО7ґЧ GН6╩О5╩Ч G9Н2ЁО5tЧN GН2ЁО5tЧ G─Н2ШvО6mПН4иО3┌Ч $Н4;|О0=ПН7%О,mЧ $Н9иО*{Ч² $Н=ТО#пЧr $Н=ТО"%Ч $╚Н=ґО"%Ч $Н$AО"IЧk $Н=ТО"IЧР $Н9иО*бЧ $dН9иО)Ч $dН9┌О(пЧД $Н9┌О(ТЧ $2Н9иО,&Чy $Н7%О,&Ч─ $Н>МtО+	ПН>иО)^ПНA░xОрПНA░О рПНA░О"|ПНA░О$'ПНA░О%рПНA░О'}ПНA░О)'ПН9иО.;Ч $ Н9┌О-ТЧ $ Н9┌О-пЧД $Н▐О=Ч : $Н%О<яЧ] $Н:бО/ФЧ $ДН:{О/÷Ч $аН:{О/{Чг $Н:бО/бЧ─ $Н┬ОGїЧ%^ $Н3ОG_Ч#l $Н;tО1&Ч $AН;tО1&ЧР $Н;╩О1mЧ╚ $Н;╩О1▒Ч $Н┬ОK┼Ч&W $Н<░О2ТЧ╚ $НОL┐Ч▌ $НЫОJnЧ $9Н┬ОJJЧ $]Н<░О3Ч $
VН┬ОS.Ч(% $Н=О'IЧ@ $Н<ШvО'░ПН>;О%ЕПН4ЛО4бЧO $Н7ОWYЧД $Н7ОWЧД $Н9О=Ч $КНО4ФЧ $Н ▌О%4Ч $Н9О'░Ч $Н9О'mЧ
k $Н╚О'&Ч $-IН╚ОS╪Ч $КН╚О[└ЧCи $Н9ОR═Ч $²Н9О[ЧC; $Н  О3Ч $VН  О2┴Чє $Н ▌О$МЧ $Н ▌О%WЧ $Н ▌ОаЧ $OНщО1IЧ $9НДО3_Ч Ы $НДО1ШЧ $UНVО1ьЧ $yНVО$┌Ч $yНДО$╔Ч $UНVО'вЧ ▌ $НVО'вЧ ▌ $НДО&	Ч Ы $НщО#ТЧ $9НVО$┌Ч ╡ $НC;О,░Ч
  $НC;О,Ч	r $НLґОаЧ $dНM;О3Ч $│Н ▌О·ЧLB $Н  ОЧM_ $Н%:rО2BПН%:О&╩ПН%:О П	Н%:О"░ПНFpО!PПНFrО,вПНD4xОSПНDОOПНDОGПНDОЮПНDО`ПН=О%·Ч $Н=ґО6JЧ▌ $НB|О2≥ПНBО0ОПНBО/DПНBО#≥ПНBО!НПН ОS.Чy $Н▌ОLНЧ $@НkОFўЧ $ЫН9ОFЎПН9ОDлПНFpОZDПНFОV<ПНFОNuПНFОO╣ПНC;О#┴Ч GН$О.Ч G
▐Н$ОeЧ G@НVОEЭЧ GUНVОIЮЧ GUНVОMцЧ GНVОQ`Ч GюН rО@JПН О?-ПН ОПНОЕПНО╛ПНFpО[`П	НVОJnЧ$ $НVОNQЧ  $НVОCXЧ╚ $НVО@JЧ╚ $На|О< ПН	√vОT╣ПН	√ОH|ЧД $Н	√ОHцПН	rО:╩ЧД $НVО8Ч GUН+·ОS.Ч $гН+|ОQ⌠ПНyОS.Ч] $НpОПН	rvО7ПН	OО*{ПНVОбЧ $6PНД|ОНПНОHЧ╚ $НгОЧ $UНО
СЧн $НО
СЧ $yНДpОПН9О	жЧ ▌ $НгОOЧ $╚Н9О,Ч ╡ $Н9О,Ч $нНДО,ПНzО╨Ч ▌ $НО│Ч $]НzО│Ч ╡ $НzО│Ч $]НДОаЧ G9НОПНО,ПНUО	ЧД $НsО,Ч+ $НsОЧ+ $НsО	HЧ∙ $НsО,Ч∙ $НUО·ЧК $Н,sО·Ч▌ $Н,sОЧ▌ $Н,sО
╛Ч▌ $Н,sО
eЧ▌ $Н,sО╨Ч▌ $Н3О,ПНО·Ч $НvОЕПНyОQНЧ  $НVtО░ПНVО░ПН┬О!чЧД $НоО!≈Ч² $Н"√xО≥ПН┬О бЧ9 $Н&аО%Ч $9Н▐О@JЧ $Н╛О5ъЧ $
▐Н╛О5╩Чr $Н#|О2ПНrО2ТПНО%·ПН0ЕpОdПН3ОСПНrО%·ПН▐О9÷ПНОN╪ПН▐ОS.ПНdtО#┴ПН
ЫО"lПН
ЫО/бПНdО0ъПН4иО7ґЧ $┤Н1,О94Чю $Н1,vО9{ПНFpОWYП
Н!ЕtО2яПН!ЕО'&ПН!ЕО1&ПН!ЕО%{ПН!3О4{ПН!ЕО/{ПН!·О(пПН!ЕО#пПН%:О-BПН%:О+	ПНdrО\ПНхtОC÷ПНхО@▒ПНхОЕПН%:О7ПН0WОвЧ $КН%:rО5╩ПН=ґО6mЧ $ДН?ФtО._ПН>іО6▒ПН>;О5
ПН?{О,ЄПН<mО+	ПН<mО)^ПН ╔О7┼Чг $Н ╔О7BЧг $Н"√xО/DПН ЛvО7яПН :tО"%ПН О6ПН%:О$╔ПН>^О$^ПН>;О1ЄПН?4О3;ПН*rОX.ПН*:ОVкПНC;О,mЧ
  $НLяО·Ч $dНMО3Ч $│НC;О,&Ч	╧ $НLяО+╩Ч $ ▌Н ▌ОzЧLB $Н  О3ЧM_ $Н $О3Ч $VН kОzЧ $√Н kО%4Ч $2НО'mЧ $2НнО'&Ч $-IН$ОЗЧ $:WН▌О бЧ $%╔НkО бЧ $%╔Н▌ОBЧ $]НkОBЧ $]НkОeЧ*│ $Н ОЗЧ+W $Н▌ОFўЧ $ЫНkОLНЧ $@Н9О4ФЧ $НО=Ч $КНнОS╪Ч $КНОR═Ч $²Н╚О[`ЧCи $Н9ОZУЧC; $Н/ЛОBЧ $─Н03ОвЧ $КН03О%бЧ┤ $Н$AО&	Чy $Н$AО"%Чk $Н$AО!чЧ ╔ $Н$eО╔Ч │ $Н$eО^Ч─ $Н!3ОЧ $UН!zОЛЧР $Н'О;Чщ $Н>О"%Ч┤ $Н,sОzЧ▌ $Н,sО3Ч▌ $НFpО^ПНFОПНFО╩ПНFО3ПНAвxО`ПНFpО"░ПНAвxОЮПН$О1ЄЧ² $Н$О1IЧ┤ $Н	√ОTnЧД $Н	OО*4Чy $Н!VtО {ПНAО8;П
НVО/XПНVО"ПН+О0tПН
kО0tПН+О#ПН
kО#ПНСО
┬П	НСОхП	Н4;|О4▀ПН3pО
ПН'ОчЧ $єН!3ОчЧЫ $НєvО)^ПНО*WПН%:tО╛П
Н%:О░ПНVО╛ПНVО╛ПНEtО╛П
НEtО░ПН╚ОЕП	Н▐ОхПН╚О░П
Н╚О╛П	НVО5-Ч ▌ $НVО1ьЧ ╡ $Н!3О 4Ч9 $Н0WО%·Ч@ $Н0О1IЧ┤ $Н▌rОSuПН$О/XЧ $Н	OtО,mПН]О+tЧ $НOОeПН┬О {Чє $Н О&≈ЧU $Н$О%бЧР $Н&аО2BП	Н&·О3_ПН$О7ґЧ╚ $НоО7ТП	Н'spО dЯН3ОП(НdtОG<ПН<ЄО0	ПЪ    4  °╦Е    O
6.  Implementation

In this section we describe, at the block diagram level, the actual implementation of the Dorado processor.  There is only space to cover the most interesting points and to illustrate the key ideas from § 5.

6.1 Clocks

The Dorado has a fully synchronous clock system, with a clock tick every 30 nanoseconds.  A cycle consists of two successive clock ticks; it begins on an even tick, which is followed by an odd tick, and completes coincident with the beginning of a new cycle on the next even tick.  Even ticks may be labeled with names like t-2, t0, t2, t4 to denote events within a microinstruction execution or a pipeline, relative to some convenient origin.  Odd ticks are similarly labeled t-1, t1, t3.

6.2 The control section

The processor can be divided into two distinct sections, called control and data.  The control section fetches and broadcasts the microinstructions to the data section (and the remainder of the Dorado), handles task switching, maintains a subroutine link, and regulates the clock system.  It also has an interface to a console and monitoring microcomputer which is used for initialization and debugging of the Dorado.  Figure 5 is a block diagram of the control section.

6.2.1 Task pipeline

The task pipeline consists of an assortment of registers and a priority encoder.  All the registers are loaded on even clocks.  Wakeup requests are latched at t0 in WAKEUP, one bit per task; READY has corresponding bits for preempted and explicitly readied tasks.  The requests in WAKEUP and READY compete.  A task can be explicitly made ready by a microcode function.  The priority encoder produces the number of the highest priority task, which is loaded into BESTNEXTTASK and also used to read the TPC of this task into BESTNEXTPC; these registers are the interface between the two stages in this pipeline.
The NEXT bus normally gets the larger of BESTNEXTTASK and THISTASK.  THISTASK is loaded from NEXT, and LASTTASK is loaded from THISTASK, as the pipeline progresses.  This method of priority scheduling means that once a task is initiated, it must explicitly relinquish the processor before a lower priority task can run.  A bit in the microword, Block, is used to indicate that NEXT should get BESTNEXTTASK unconditionally (unless the instruction is held).

Note that it takes a minimum of two cycles from the time a wakeup changes to the time this change can affect the running task (one for the priority encoding, one to fetch the microinstruction).  This implies that a task must execute at least two microinstructions after its wakeup is removed before it blocks; otherwise it will continue to run, since the effects of its wakeup will not have been cleared from the pipe.  The device cannot remove the wakeup until it knows that the task will run (by seeing its number on NEXT).  Hence the earliest the wakeup can be removed is t0 of the first instruction (NEXT has the task number in the previous cycle, and the wakeup is latched at t0); thus the grain of processor allocation is two cycles for a task waking up after a Block.

Some trouble was taken to keep the grain small, for the following reason.  Since the memory is heavily pipelined and contains a cache which does not interact with high bandwidth I/O, the I/O microcode often needs to execute only two instructions, in which a memory reference is started and a count is decremented.  The processor can then be returned to another task.  The maximum rate at which storage references can be made is one every eight cycles (this is the cycle time of the main storage RAMs).
A two cycle grain thus allows the full memory bandwidth of 530 megabits/second to be delivered to I/O devices using only 25% of the processor.

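
The arithmetic behind these figures can be checked with a short sketch.  It is illustrative only: the 60 ns microcycle follows from the two 30 ns ticks of § 6.1, and the 16 word block moved per fast I/O storage reference is the figure quoted in § 7.  The constant and function names are invented for this example.

```python
# Figures taken from the text; the names are hypothetical.
CYCLE_NS = 60               # one microcycle = two 30 ns clock ticks
STORAGE_CYCLE_CYCLES = 8    # one storage reference every eight cycles
BLOCK_WORDS = 16            # words moved per fast I/O reference (from section 7)
WORD_BITS = 16              # Dorado word size


def io_bandwidth_megabits_per_sec():
    """Peak storage bandwidth when every storage cycle moves one block."""
    bits_per_reference = BLOCK_WORDS * WORD_BITS            # 256 bits
    ns_per_reference = STORAGE_CYCLE_CYCLES * CYCLE_NS      # 480 ns
    # bits per nanosecond, scaled to megabits per second
    return bits_per_reference / ns_per_reference * 1000


def processor_fraction(grain_cycles):
    """Share of microcycles spent if each reference costs `grain_cycles`."""
    return grain_cycles / STORAGE_CYCLE_CYCLES


if __name__ == "__main__":
    print(io_bandwidth_megabits_per_sec())
    print(processor_fraction(2))
```

Under these assumptions a two cycle grain costs 2/8 of the processor, and the computed bandwidth is about 533 megabits/sec, which the text rounds to 530.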

A simpler design would require the microcode to explicitly notify its device when the wakeup should be removed; it would then be unnecessary to broadcast NEXT to the devices.  Since this notification could not be done earlier than the first instruction, however, the grain would be three cycles rather than two, and 37.5% of the processor would be needed to provide the full memory bandwidth.  Other simplifications in the implementation would result from making the pipeline longer; in particular, squeezing the priority encoding and reading of TPC into one cycle is quite difficult.  Again, however, this would increase the grain.

6.2.2 Fetching microinstructions

Refer to the right hand side of Figure 5.  At t0 of every instruction, the microinstruction register MIR is loaded from the outputs of IM, the microinstruction memory, and the THISPC register is loaded with IMADDRESS.  The NEXTPC is quickly calculated based on the NextControl field in MIR, which encodes both the instruction type and some bits of NEXTPC; see Figure 7 for details.  This calculation produces THISTASKNEXTPC, so called because if a task switch occurs it is not used as the next IMADDRESS.
Instead, the BESTNEXTPC computed in the task pipeline is used as IMADDRESS.

[Figure 7: Next address formation.  For each instruction type (local jump/call, long jump/call, global call, conditional branch, return, IFU jump), the figure shows how the address bits of ThisTaskNextPC are formed from the NextControl field (NC), the current PC (CPC), Link, FF and the IFU address, and it lists the branch conditions.  Bit 15 is the least significant bit.  A long, local or conditional branch is a CALL if, before any modification by branch conditions or dispatches, ThisTaskNextPC[12:15]=0; otherwise it is a jump.]

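
The choice of the next task and of IMADDRESS described in § 6.2.1 and § 6.2.2 can be summarized in a small behavioral sketch.  It is a simplification, not the hardware: it collapses the two pipeline stages into a single step, ignores held instructions, and assumes (as the phrase "larger of" suggests) that a larger task number means higher priority.  The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PipeState:
    this_task: int           # THISTASK: task running now
    best_next_task: int      # BESTNEXTTASK: output of the priority encoder
    best_next_pc: int        # BESTNEXTPC: TPC entry read for BESTNEXTTASK
    this_task_next_pc: int   # THISTASKNEXTPC: computed from NextControl


def next_task(state, block):
    """Value placed on NEXT for the coming cycle."""
    if block:
        # Block gives up the processor: take the encoder's choice as is.
        return state.best_next_task
    # Otherwise the running task keeps going unless a higher one is ready.
    return max(state.best_next_task, state.this_task)


def next_im_address(state, block):
    """Return (NEXT, IMADDRESS) for the coming cycle."""
    nxt = next_task(state, block)
    if nxt != state.this_task:
        # Task switch: THISTASKNEXTPC is not used; fetch from BESTNEXTPC.
        return nxt, state.best_next_pc
    return nxt, state.this_task_next_pc


if __name__ == "__main__":
    s = PipeState(this_task=5, best_next_task=9,
                  best_next_pc=0o1000, this_task_next_pc=0o20)
    print(next_im_address(s, block=False))
```

Note that a task which blocks while the encoder still selects it simply continues, which matches the remark in § 6.2.1 that a task blocking too soon after its wakeup is removed will keep running.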

TPC is written with the previous value of THISTASKNEXTPC every cycle (at t3), and read for the task in BESTNEXTTASK every cycle as well.  Thus, TPC is constantly recording the program counter value for the current task, and also constantly preparing the value for the next task in case there is a task switch.

6.2.3 Miscellaneous features

There is a task specific subroutine linkage register, LINK, shown in Figure 5, which is loaded with the value in THISPC+1 on every microcode call or return.  Thus each task can have its own microcoded coroutines.  LINK can also be loaded from a data bus, so that control can be sent to an arbitrary computed address; this allows a microprogram to implement a stack of subroutine links, for example.  In addition to conditional branches, which select one of two NEXTPC values, there are also eight-way and 256-way dispatches, which use a value on the B bus to select one of eight, or one of 256 NEXTPC values.

Since the Dorado's microstore is writeable, there are data paths for reading and writing it.  Related paths allow reading and writing TPC.  These paths (through the register TPIMOUT) are folded into already existing data paths in the control section and are somewhat tortuous, but they are used infrequently and hence have been optimized for space.  In addition, another computer (either a separate microcomputer or an Alto) serves as the console processor for the Dorado; it is interfaced via the CPREG and a very small number of control signals.

6.3 The data section

Figure 6 is a block diagram of the data section, which is organized around an arithmetic/logic unit (ALU).
It implements most of the registers accessible to the programmer and the microcode functions for selecting operands, doing operations in the ALU and shifter, and storing results.  It also calculates branch conditions, decodes MIR fields and broadcasts decoded signals to the rest of the Dorado, supplies and accepts memory addresses and data, and supplies I/O data and addresses.

6.3.1 The microinstruction register

MIR (which actually belongs to the control section) is 34 bits wide and is partitioned into the following fields:

Field         Bits   Function
RAddress      4      Addresses the register bank RM.
ALUOp         4      Selects the ALU operation or controls the shifter.
BSelect       3      Selects the source for the B bus, including constants.
LoadControl   3      Controls loading of results into RM and T.
ASelect       3      Selects the source for the A bus, and starts memory references.
Block         1      Blocks an I/O task, selects a stack operation for task 0.
FF            8      Catchall for specifying functions.
NextControl   8      Specifies how to compute NEXTPC.

6.3.2 Busses

The major busses are A, B (ALU sources), RESULT, EXTERNALB, MEMADDRESS, IOADDRESS, IODATA, IFUDATA, and MEMDATA.

The ALU accepts two inputs (A and B) and produces one output (RESULT).  The input busses have a variety of sources, as shown in the block diagram.  RESULT usually gets the ALU output, but it is also sourced from many other places, including a one bit shift in either direction of the ALU output.  A copy of A is used for MEMADDRESS; two copies of B are used for EXTERNALB and IODATA.

MEMADDRESS provides a sixteen bit displacement, which is added to a 28 bit base register in the memory system to form a virtual address.  EXTERNALB is a copy of B which goes to the control, memory, and IFU sections, and IODATA is another copy which goes to the I/O system; the sources of
zwzwz	wzwzw Н  Оїzw Т ─zwН  О	|Т ┘ zwz wz wzwТ ├Н  ОЬТ ╒Т ёzwzwН  ОtТ ┐Т └ПDzwН  ОП Т бz wz	wz wТ цzwzw Н  Оlz	wТ іПNТ їН  ОХТ ≤П&zwz wТ ≥Н  О dТ │zwzwТ ┌П#z w z w        LЇЁ    H÷cuМ                                      SEC. 6IMPLEMENTATION17B can thus be sent to the entire processor.  Both  are bidirectional and can serve as a source for B aswell.  IOADDRESS is driven from a task specific register; it specifies the particular device and registerwhich should source or receive IODATA.IFUDATA and MEMDATA allow the processor to receive data from the IFU and memory in parallelwith other data transfers.  MEMDATA has the value of the memory word most recently fetched bythe current task; if the fetch is not complete, the processor is held when it tries to use MEMDATA.IFUDATA has an operand of the current macroinstruction; as each operand is used, the IFU presentsthe next one on IFUDATA.6.3.3 RegistersHere is a list and brief description of registers seen by the microprogrammer.  All are one word (16bits) wide.RM:a bank of 256 general purpose registers; a register can be read onto A, B, or theshifter, and loaded from RESULT under the control of LoadControl.  Normally, thesame register is both read and loaded in a given microinstruction, but loading of adifferent register can be specified by FF.STACK:a memory addressed by the STACKPTR register.  A word can be read or written,and STACKPTR adjusted up or down, in one microinstruction.  If STACK is used in amicroinstruction, it replaces any use of RM, and the RAddress field in the microwordtells how much to increment or decrement STACKPTR.  
The 256 word memory is divided into four 64 word stacks, with independent underflow and overflow checking.

T:  a task specific register used for working storage; like RM, it can be read onto A, B, or the shifter, and loaded from RESULT under the control of LoadControl.

COUNT:  a counter; it can be decremented and tested for zero in one microinstruction, using only the NextControl or FF field.  It is loaded from B or with small constants from FF.

SHIFTCTL:  a register which controls the direction and amount of shifting and the width of left and right masks; it is loaded from B or with values useful for field extraction from FF.

Q:  a hardware aid for multiply and divide instructions; it can be read onto A or B, and loaded from B, and is automatically shifted in useful ways during multiply and divide step microinstructions.

The next group of registers vary in width.  They are used as control or address registers, changed dynamically but infrequently by microcode.

RBASE:  RM addressing requires eight bits.  Four come from the RAddress field in the microword, and the other four are supplied from RBASE.  It is loaded from B or FF, and can be read onto RESULT.

STACKPTR:  an eight bit register used as a stack pointer.  Two bits of STACKPTR select a stack, and the least significant six bits a word in the stack.  The latter bits are incremented or decremented under control of the RAddress field whenever a stack operation is specified.

MEMBASE:  a five bit register which selects one of 32 base registers in the memory to be used for virtual address calculation.
It is loaded from the FF field or from B, and can be loaded from the IFU at the start of a macroinstruction.

ALUFM:  a 16 word memory which maps the four-bit ALUOp field into the six bits required to control the ALU.

IOADDRESS:  a task specific register which drives the IOADDRESS bus, and is loaded by I/O microcode to specify a device address for subsequent Input and Output operations.  It may be loaded from B or FF.

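
The address formation for RM and STACK described in this list can be sketched as follows.  The text does not say which half of the eight bit RM address RBASE supplies; the sketch assumes the high-order four bits.  It also takes the two stack-select bits of STACKPTR to be the two most significant, since the word index is stated to be the least significant six.  The function names are made up for illustration.

```python
def rm_address(rbase, raddress):
    """Form the eight bit RM address from RBASE and the RAddress field.

    Assumption: RBASE supplies the high-order four bits.
    """
    if not (0 <= rbase < 16 and 0 <= raddress < 16):
        raise ValueError("RBASE and RAddress are four bit values")
    return (rbase << 4) | raddress


def split_stackptr(stackptr):
    """Split the eight bit STACKPTR into (stack number, word within stack).

    Assumption: the stack number is the top two bits; the text only says
    that the word index is the least significant six bits.
    """
    if not 0 <= stackptr < 256:
        raise ValueError("STACKPTR is an eight bit value")
    return stackptr >> 6, stackptr & 0x3F


if __name__ == "__main__":
    print(rm_address(0x3, 0xA))
    print(split_stackptr(0b10000101))
```

Four stacks of 64 words account for the whole 256 word STACK memory, just as 16 values of RBASE times 16 values of RAddress cover the 256 registers of RM.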

6.3.4 The shifter

The Dorado has a 32 bit barrel shifter for handling bit-aligned data.  It takes 32 bits of input from RM and T, performs a left cycle of any number of bit positions, and places the result on A.  The ALU output may be masked during a shift instruction, either with zeroes or with data from MEMDATA.

The shifter is controlled by the SHIFTCTL register.  To perform a shift operation, SHIFTCTL is loaded (in one of a variety of ways) with control information, and then one of a group of "shift and mask" microoperations is executed.

6.4 Physical organization

Once the goal of a physically small but powerful machine was established, engineering design and material lead times forced us to develop the Dorado package before the implementation was more than partially completed, and the implementation then had to fit the package.  The data section is partitioned onto two boards, eight bits on each; the boards are about 70% identical.
The control section divides naturally into one board consisting of all the IM chips (high speed 1K x 1 bit ECL RAMs) and their associated address drivers, and a second board with the task switch pipeline, NEXTPC logic, and LINK register.

The sidepanel pins are distributed in clusters around the board edges to form the major busses.  The remaining edge pins are used for point to point connections between two specific boards.  The I/O busses go uniformly to all the I/O slots, but all the other boards occupy fixed slots specifically wired for their needs.  Half the pins available on the sideplanes are grounded, but wire lengths are not controlled except in the clock distribution system, and no twisted pair is used in the machine except for distribution of one copy of the master clock to each board.

We were very concerned throughout the design of the Dorado to balance the pipelines so that no one pipe stage is significantly longer than the others.  Furthermore, we worked hard to make the longest stage (which limits the speed of this fully synchronous machine) as short as possible.  The longest stage in the processor, as one might have predicted, is the IMADDRESS calculation and microinstruction fetch in the control slice.  There is about a 50 nanosecond limit for reliable operation in a stitchwelded machine, and 60 ns in a multiwired machine.  There are pipe stages of about the same length in the memory and IFU.

We also worked hard to get the most out of the available real estate, by hand tailoring the integrated circuit layout and component usage, and by incrementally adding function until nearly the entire board was in use.  We also found that performance could be significantly improved by careful layout of critical paths for minimum loading and wiring delay.  Although this was a very labor intensive operation, we believe it pays off.

7.  Performance

Four emulators have been implemented for the Dorado, interpreting the BCPL, Lisp, Mesa and Smalltalk instruction sets.
A load or store instruction typically takes only one or two microinstructions in Mesa (or BCPL), and five in Lisp.  The Mesa opcode can send a 16 bit word to or from memory in one microinstruction; Lisp deals with 32 bit items and keeps its stack in memory, so two loads and two stores are done in a basic data transfer operation.  More complex operations (such as read/write field or array element) take five to ten microinstructions in Mesa and ten to twenty in Lisp.  Note that Lisp does runtime checking of parameters, while in Mesa most checking is done at compile time.  Function calls take about 50 microinstructions for Mesa and 200 for Lisp.

The Dorado supports raster scan displays which are refreshed from a full bitmap in main memory; this bitmap has one bit for each picture element (dot) on the screen, for a total of .5-1 megabits
(more for gray-scale or color pictures).  A special operation called BitBlt (bit boundary block transfer) makes it easier to create and update bitmaps; for more information about BitBlt consult [9], where it is called RasterOp.  BitBlt makes extensive use of the shifting/masking capabilities of the processor, and attempts to prefetch data so that it will always be in the cache when needed.  The Dorado's BitBlt can move display objects around in memory at 34 megabits/sec for simple cases like erasing or scrolling a screen.  More complex operations, where the result is a function of the source object, the destination object and a filter, run at 24 megabits/sec.

I/O devices with transfer rates up to 10 megabits/sec are handled by the processor via the IODATA and IOADDRESS busses.  The microcode for the disk takes three cycles to transfer two words in this way; thus the 10 megabit/sec disk consumes 5% of the processor.  Higher bandwidth devices use the fast I/O system, which does not interact with the cache.  The fast I/O microcode for the display takes only two instructions to transfer a 16 word block of data from memory to the device.
This can consume the available memory bandwidth for I/O (530 megabits/sec) using only one quarter of the available microcycles (that is, two I/O instructions every eight cycles).

Recall that the NEXTPC scheme (§ 5.5 and § 6.2.2) imposes a rather complicated structure on the microstore, because of the pages, the odd/even branch addresses, and the special subroutine call locations.  We were concerned about the amount of microstore which might be wasted by automatic placement of instructions under all these constraints.  In fact, however, the automatic placer can use 99.9% of the available memory when called upon to place an essentially full microstore.

Acknowledgements

The early design of the Dorado processor was done by Chuck Thacker and Don Charnley.  The data section was redesigned and debugged by Roger Bates and Ed Fiala.  Peter Deutsch wrote the microcode assembler and instruction placer, and Ed Fiala wrote the Dorado assembler macros, the microprogram debugger, and the hardware manual.  Willie-Sue Haugeland, Nori Suzuki, Bruce Horn, Peter Deutsch, Ed Taft and Gene McDaniel are responsible for production and diagnostic microcode.

References

1. Clark, D.W. et al.  The memory system of a high-performance personal computer.  Technical Report CSL-81-1, Xerox Palo Alto Research Center, January 1981.  Revised version to appear in IEEE Transactions on Computers.

2. Deutsch, L.P.  Experience with a microprogrammed Interlisp system.  Proc. 11th Ann. Microprogramming Workshop, Pacific Grove, Nov. 1979.

3. Geschke, C.M. et al.  Early experience with Mesa.  Comm ACM 20, 8, Aug 1977, 540-552.

4. Ingalls, D.H.  The Smalltalk-76 programming system: Design and implementation.  5th ACM Symp. Principles of Programming Languages, Tucson, Jan 1978, 9-16.

5. Lampson, B.W. et al.  An instruction fetch unit for a high-performance personal computer.  Technical Report CSL-81-1, Xerox Palo Alto Research Center, Jan. 1981.  Submitted for publication.

6. Mitchell, J.G. et al.
Mesa Language Manual, Technical Report CSL-79-3, Xerox Palo Alto Research Center, April 1979.

7. Teitelman, W.  Interlisp Reference Manual, Xerox Palo Alto Research Center, Oct. 1978.

8. Thacker, C.P. et al.  Alto: A personal computer.  In Computer Structures: Readings and Examples, 2nd edition, Siewiorek, Bell and Newell, eds., McGraw-Hill, 1981.  Also in Technical Report CSL-79-11, Xerox Palo Alto Research Center, August 1979.

9. Newman, W.M. and Sproull, R.F.  Principles of Interactive Computer Graphics, 2nd ed.  McGraw-Hill, 1979.