This document is for internal Xerox use only.

Dorado Hardware Manual

by E.R. Fiala

contributions to the manual by
R. Bates, D. Boggs, B. Lampson, K. Pier, E. Taft, and C. Thacker

other help by
D. Clark, W. Crowther, W. Haugeland, G. McDaniel, and S. Ornstein

14 September 1981

This document describes the architecture and hardware design of the Dorado computer at a level appropriate for programming. At the date of this printing, approximately 22 systems have been released to users.

This release incorporates a major revision of the Display Controller chapter, medium revisions to the Disk Controller and Instruction Fetch Unit chapters, and minor revisions elsewhere.

Revision history:
14 February 1979: First complete manual exclusive of io controller chapters.
8 October 1979: Chapters on io controllers added; major revisions.
14 September 1981: Major revision to the Display Controller chapter, medium revision to Instruction Fetch Unit and Disk chapters, minor revisions elsewhere.

XEROX
Palo Alto Research Center
Computer Sciences Laboratory
3333 Coyote Hill Rd.
Palo Alto, California 94304

Table of Contents
1. Introduction
2. Overview
2.1 Control
2.2 Registers, Memories, and Data Paths
2.3 Timing
2.4 Instruction Fields
2.5 Notation
3. Processor Section
3.1 RM and STK Memories, RBase and StkP Registers
3.2 Cnt Register
3.3 Q Register
3.4 T Register
3.5 BSEL: B Multiplexor Select
3.6 ASEL: A Source/Destination Control
3.7 ALUF, ALU Operations
3.8 LC: Load Control for RM and T
3.9 FF: Special Function
3.10 Multiply and Divide
3.11 Shifter
3.12 Hold and Task Simulator
4. Control Section
4.1 Tasks
4.2 Task Switching
4.3 Next Address Generation
4.4 Conditional Branches
4.5 Subroutines and the Link Register
4.6 Dispatches
4.7 IFU Addressing
4.8 IM and TPC Access
4.9 Hold
4.10 Program Control of the DMux
5. Memory Section
5.1 Memory Addressing
5.2 Processor Memory References
5.3 IFU References
5.4 Memory Timing and Hold
5.5 The Map
5.6 An Automatic Storage Management Algorithm
5.7 Mesa Map Primitives
5.8 The Pipe
5.9 Faults and Errors
5.10 Storage
5.11 The Cache
5.12 Initialization
5.13 Testing
6. Instruction Fetch Unit
6.1 Overview of Operation
6.2 The IFUJump Entry Vector
6.3 Timing Summary
6.4 Use of MemBX and Duplicate Stk Regions
6.5 Traps
6.6 IFU Reset
6.7 Rescheduling
6.8 Breakpoints
6.9 Reading and Writing IFUM
6.10 Continuing from Processor Faults
6.11 IFU Testing
6.12 Details of Pipe Operation
6.13 Timing Details
7. Slow IO
7.1 Input/Output Functions
7.2 IO Opcodes
7.3 Wakeup, Block, and Next
7.4 SubTasks
7.5 Illegal Things IO Tasks Must Not Do
8. Fast IO
8.1 Transport
8.2 Wakeups and Microcode
8.3 Latency
9. Disk Controller
9.1 Disk Addressing
9.2 Sector Layout Considerations
9.3 General Firmware Organization
9.4 Task Wakeups
9.5 Control Register
9.6 Format RAM and Sequence PROMs
9.7 Tag Register
9.8 FIFO Register
9.9 Muffler Input
9.10 Error Detection and Correction
10. Display Controller
10.1 Operational Overview
10.2 Video Data Path
10.3 Horizontal and Vertical Control
10.4 Pixel Clock System
10.5 OIS Seven-Wire Video Interface
10.6 Processor Task Management
10.7 Slow IO Interface
10.8 DispM Terminal Interface
10.9 DDC Initialization Requirements
10.10 Speed and Resolution Limits
11. Ethernet Controller
11.1 Ethernet Packets
11.2 Controller Overview
11.3 Receiver
11.4 Transmitter
11.5 Clocks
11.6 Task Wakeups
11.7 Muffler Input
11.8 IOB Registers
11.9 Control Register
11.10 Status Register
12. Other IO and Event Counters
12.1 Junk Task Wakeup
12.2 General IO
12.3 Event Counters
13. Error Handling
13.1 Processor Errors
13.2 Control Section Errors
13.3 IFU Errors
13.4 Memory System Errors
13.5 Sources of Failure
13.6 Error Correction
14. Performance Issues
14.1 Cycle Time
14.2 Emulator Performance
14.3 IFU Not-Ready Wait
14.4 Microstore Requirements
14.5 Cache Efficiency and Miss Wait
14.6 Performance Degradation Due to IO Tasks
14.7 Cache and Storage Geometry
15. Glossary

List of Tables
1. Memories
2. Registers
3. Data Paths
4. Load Timing
5. Instruction Fields
6. RSTK Decodes for Stack Operations
7. BSEL Decodes
8. ASEL Decodes
9. ALUFM Control Values
10. LC Decodes
11. FF Decodes
12. ALUF Shift Decodes
13. Branch Conditions
14. Reserved Locations in the Microstore
15. Timing of a Dirty Miss
16. Map Configurations
17. Fault Indications
18. IFUM Fields
19. Operand Sequence for _Id
20. IFU FF Decodes
21. IO Register Addresses
22. Task Assignments
23. T-80 Specifications and Characteristics
24. OIS Terminal Microcomputer Messages
25. DDC Muffler Signals
26. Ethernet Muffler Signals
27. Error-Related Signals
28. Double Error Incidence vs. Repair Rate
29. Utilization of the Microstore
30. Execution Time vs. Cache Efficiency
31. Cache Geometry vs. LRU Behavior

List of Figures
1. Dorado: Programmer's View
2. Card Cage
3. Processor Hardware View
4. Shifter
5. Control Section
6. Next Address Formation
7. Instruction Timing
8. Overall Structure of the Memory System
9. Cache, Map, and Storage Addressing
10. The Pipe and Other Memory Registers
11. Error Correction
12. Instruction Fetch Unit Organization
13. Disk Controller
14. Display Controller
15. Display Controller IO Registers
16. Ethernet Controller
17. Programmers' Crib Sheet

1. Introduction

Dorado is a high-performance, medium-cost microprogrammed computer designed primarily to implement a virtual machine for the Mesa language, as described in "The Mesa Processor Principles of Operation," and to provide high storage bandwidth for picture-processing applications. Dorado aims more at word processing than at numerical applications.

The microprocessor has a nominal cycle time of 60 ns, and most Mesa opcodes will execute in one or two cycles; the overall average opcode execution time will be subject to a number of considerations discussed later. Dorado will also achieve respectable performance when implementing virtual machines for the Alto, Interlisp, and Smalltalk programming systems, although simple instructions for these run three to five times slower than Mesa.

Dorado is implemented primarily with MECL-10K integrated circuits; storage boards use MOS and Schottky-TTL components primarily. Backplanes and storage boards are printed circuits; other logic boards are stitchweld in prototypes and multiwire or PC in production machines.
The mainframe is divided into sections called Control, Processor, Instruction Fetch Unit (IFU), and Memory; peripheral control is accomplished by the Disk, Ethernet, and Display Controller sections, as discussed in chapters of this manual. The main data paths, shown in Figure 1, are 16 bits wide (the word size). The control section is shown in Figure 5. The Baseboard section, used to control the mainframe, is discussed in the "Dorado Debugging Interface" document.

The processor is organized around an Arithmetic and Logic Unit (ALU) whose two inputs are the A and B data paths (Figure 1), and whose output is normally routed to the Pd data path. Inputs to A, B, and Pd include all registers accessible to the programmer. In addition, 16-bit literal constants can be generated on B. B appears on the backplane for communication with the IFU, Control, and Memory sections.

The processor also includes a 32-bit-in/16-bit-out shifter-masker optimized for field insertion and extraction, with specialized paths for the bit-boundary block transfer (BitBlt) instruction.

An instruction fetch unit (the IFU) operating in parallel with the processor can handle up to four instruction sets with 256 opcodes each; opcodes may independently be specified as one, two, or three bytes long.

Emulator and IFU references to main memory are made through a 4k-word high-speed cache. Main storage can be configured in various sizes up to a maximum of 2^22 16-bit words when 64k x 1 RAMs are used.

The processor initiates data transfers between main memory and fast input/output devices. Sixteen 16-bit words are then transmitted without disturbing the processor data paths in about 1.68 µs (28 cycles). New references can be initiated every 8 cycles, so the total bandwidth of the memory, about 533 megabits/second, is available to devices with enough buffering.
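The fast-io and storage figures above follow from the nominal 60 ns cycle. The short Python sketch below simply redoes that arithmetic (a 16-word, 256-bit reference every 8 cycles, and 28 cycles per transfer); the function and constant names are illustrative only and are not part of any Dorado software.

```python
# Cross-check of the fast-io and storage-size figures quoted above.
# Assumes the nominal 60 ns cycle; names are illustrative only.

MAX_MAIN_WORDS = 2 ** 22  # upper limit on main storage, in 16-bit words (64k x 1 RAMs)


def fast_io_figures(cycle_ns=60.0, words=16, bits_per_word=16,
                    transfer_cycles=28, cycles_between_refs=8):
    """Return (transfer time in microseconds, peak bandwidth in megabits/second)."""
    transfer_us = transfer_cycles * cycle_ns / 1000.0
    # One 16-word reference can start every 8 cycles: 256 bits per 480 ns.
    bits_per_ns = (words * bits_per_word) / (cycles_between_refs * cycle_ns)
    bandwidth_mbit_s = bits_per_ns * 1000.0  # 1 bit/ns = 1000 megabits/second
    return transfer_us, bandwidth_mbit_s


if __name__ == "__main__":
    us, mbits = fast_io_figures()
    # prints: transfer: 1.68 us, bandwidth: 533 Mbit/s, max storage: 4194304 words
    print(f"transfer: {us:.2f} us, bandwidth: {mbits:.0f} Mbit/s, "
          f"max storage: {MAX_MAIN_WORDS} words")
```

Equivalently, the peak rate is two words per cycle, which is why a device must buffer a whole 16-word block to exploit it.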
2. Overview

Experience suggests that programmers will gradually develop a mental model something like Figure 1; until this mental model is well established, it is probably desirable to read the following with Figure 1 in view.

Dorado has Processor, Control, Memory, IFU, and IO controller sections. Io controllers are independent of each other and of the other sections; you will have to understand a particular io controller only if you are going to write microcode that controls it.

The memory and IFU are "slaves" to the processor/control section. In most situations, their external interface is simple relative to internal details of operation, and effective programming is usually possible without a detailed understanding of them.

However, programmers will have to understand the processor thoroughly because the different parts of the processor are controlled directly by instruction fields, and most of the processor will be used, even in a small program.

Programmers must also understand most of the control section, although fairly simple assembly language constructs are transformed into the complicated branch encodings needed by Dorado, so detailed understanding of Dorado branching is not required.

Control

Dorado supports up to 16 independent tasks at the microcode level. Each task has its own program counter (TPC), and other commonly-used registers are also replicated on a per-task basis. Tasks are scheduled automatically by the hardware in response to wakeup requests; task 15 has the highest priority and task 0 the lowest.

Emulator microcode runs entirely in task 0 (lowest priority); fault conditions normally wake up task 15, the "fault task" (highest priority). Other tasks are normally paired with io devices that issue wakeup requests when they need service.
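The fixed-priority rule can be pictured with a small Python model. It is only a sketch of the rule stated above, not of the task-switching hardware: it takes a set of pending wakeup requests and, as an assumption made for illustration, treats the emulator (task 0) as always eligible, so it runs whenever no other task is requesting service. All names are hypothetical.

```python
# Sketch of fixed-priority task selection among pending wakeup requests.
# Assumption (for illustration): the emulator, task 0, is always eligible,
# so it runs whenever no other task is requesting service.

NUM_TASKS = 16
EMULATOR_TASK = 0
FAULT_TASK = 15


def select_task(wakeups):
    """Return the task that would run next, given the tasks requesting wakeup."""
    pending = set(wakeups)
    for task in pending:
        if not 0 <= task < NUM_TASKS:
            raise ValueError(f"task {task} is outside 0..{NUM_TASKS - 1}")
    # A higher task number means a higher priority.
    return max(pending | {EMULATOR_TASK})


if __name__ == "__main__":
    print(select_task({3, 9}))            # an io task wins: 9
    print(select_task({3, FAULT_TASK}))   # the fault task preempts: 15
    print(select_task(set()))             # nothing pending: emulator, 0
```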
Task switching, discussed in "Control Section", is in most cases invisible to the programmer, because commonly-used registers are duplicated for each task.

In this manual, "instruction" refers to a microinstruction in the control store, as opposed to an opcode in the higher-level language interpreted by a microprogram. The JCN field in an instruction encodes a variety of jumps, calls, conditional jumps and calls, instruction dispatches, and returns for the current task.

Registers, Memories, and Data Paths

Tables 1, 2, and 3 describe memories, registers, and data paths in Dorado; these are diagrammed in Figure 1. The first two tables below focus on a particular register or memory and tell how it is used and where it connects; the third table focuses on particular data paths and shows how they connect various parts of the machine.

Table 1: Memories

Memory   Comments
IM       IM is a 4096-word x 34-bit (+2 parity) RAM used to store instructions. When written, the address is taken from Link and data from B 16 bits at a time (1 extra bit and parity from the RSTK field). When read, the address is taken from Link, and data is delivered to Link 9 bits at a time. The read or write is controlled by the JCN field and two or three low bits of RSTK.
ALUFM    ALUFM is a 16-word x 6-bit ALU control RAM addressed by the 4-bit ALUF field. Five ALUFM bits specify 16 boolean or 5 arithmetic operations on A and B. One bit is the input carry for arithmetic operations (modifiable by several functions). ALUFM[ALUF] is read onto Pd by the ALUFMEM function, or both read onto Pd and loaded from B by the ALUFMRW_ function.
RM       RM is a 256-word x 16-bit (+2 parity) RAM used for general storage by all tasks. The normal address is RBase[0:3],,RSTK[0:3]. Data can be read onto A or B and loaded from Pd or Md without using FF.
         Together with T, RM forms the input to the Shifter.
STK      STK is a 256-word x 16-bit (+2 parity) stack accessible only to the emulator, used instead of RM when the BLOCK bit in the instruction is 1. Its address comes from StkP, modified by -4 to +3 under control of RSTK.
IFUM     IFUM is a 1024-word x 24-bit (+3 parity) decoding memory containing 256 words for each of four instruction sets. The instruction set can be set by the InsSetOrEvent_ function. The low 8 address bits are normally an opcode fetched from the cache, but can be loaded from B by the BrkIns_ function to read or write IFUM itself. The IFUMLH_ and IFUMRH_ functions load, and the B_IFUMLH' and B_IFUMRH' functions read, different bits of IFUM. During normal operation IFUM controls decoding of the stream of opcodes and operands fetched from memory relative to BR 31, the code base.
MAIN     Main storage consists of a 64-row x 4-column x 16-word virtual cache coupled with one to four 256k x 16-bit memory modules (using 16k-bit storage chips). The IFU and processor independently access the cache, with IFU references deferring to the processor. The processor has two dissimilar methods of reference, one primarily to the cache (with "misses" initiating main memory action) and one directly to main memory (invalidating cache hits on writes, using dirty cache hits on reads). Fetch_, Store_, IFetch_, LongFetch_, and PreFetch_ are cache references. Md can be loaded into T or RM (LC field), routed onto B (BSEL field) or onto A (FF field), or used in a shift-and-mask operation (ASEL and ALUF fields). IOFetch_ and IOStore_ (ASEL field) initiate a 16-word transfer between an io device and memory without further processor interaction (using the Fin or Fout bus). Virtual addresses are transformed to absolute using the Map memory. All references leave information in the Pipe memory.
BR       A 32-word x 28-bit base register memory addressed by the MemBase register. The virtual address for any memory reference is BR[MemBase]+Mar.
         BR is loaded from Mar by the BrLo_A and BrHi_A functions and can be read indirectly onto B via the virtual address left in the Pipe after a memory reference (Pipe0 and Pipe1 functions).
Pipe     The 16-entry x 6-word pipe contains trace information left by memory references. This information includes the virtual address, map information, single-error and double-error information, cache control information, task, and subtask. It is automatically loaded during any memory reference and can be read onto B by the Pipe0, Pipe1, ..., Pipe5' functions.
Map      The Map is a 16k or 64k-word x 19-bit (+parity) memory used to transform virtual addresses to absolute. Addressed by VA[10:23], map entries contain 16 bits of real page, plus write-protect, dirty, and referenced bits. They can be written from B with Map_ (ASEL) and read from the Pipe after main storage references.

Table 3: Data Paths

Path     Comments
A        The 16-bit high-true A bus (called "alua" in hardware drawings) may be driven from T, RM, STK, Q, Id, Md, a small constant between 0 and 17₈, or the shifter. It is also possible to 'or' the low-true shifter output with one of the other A sources. The A bus is totally inside the processor section, not connected to any other sections of Dorado, and it is one of the two Alu inputs. The RF_A and WF_A functions, which load ShC for subsequent shift operations, receive data from A.
Mar      The 16-bit Mar bus transmits the displacement for a memory reference from the processor or IFU section to the memory section. The CFlags register, some bits of the Mcr register, and the BR memory in the memory section are also loaded from Mar.
         The processor drives Mar only when it is starting a reference or executing one of the functions between 120₈ and 127₈ (e.g., CFlags_A' and LoadMcr[A,B] are in this group of functions); during other instructions, the IFU may use Mar to initiate instruction fetches. Mar is driven low-true; when driven by the processor, it receives the same data as are driven onto A (but the shifter cannot drive Mar).
B        The 16-bit B bus consists of one data path inside the processor section (called "alub" in hardware drawings) and another on the backplane (called "Bmux" in hardware drawings); the IOB bus is driven from Alub on Output operations, when it is also an extension of B. Alub and Bmux may be driven directly high-true from registers inside the processor; alternatively, Bmux may be driven low-true from other sections, in which case the processor receives the data onto alub through inverters (so the data appears high-true on alub). The BSEL field in an instruction can specify that T, RM/STK, Q, or Md sources B; other sources and destinations loaded from B are specified in the FF field; BSEL and FF are used in combination to specify that a literal 8-bit constant (in either the left or right byte of the word, with 0's or 1's in the other byte) sources B. Alub is one of the two Alu inputs. The processor computes odd byte parity on alub; Bmux and IOB destinations may store or check the parity computed by the processor.
Pd       The Pd path ("Processor data") receives data from an 8-input multiplexor whose inputs are the Alu output (possibly shifted left or right one bit on Alu shift functions, or masked on a shifter operation), io device input data, and the infrequently read registers in the processor section. Pd may be
Pd may bewritten into the T register or the RM or STK memories.IdThe Id path ("IFU data") is used to send arguments from the IFU to the processor for interpretation.It can be routed onto A using ASEL (A_Id, Fetch_Id, Store_Id, or IFetch_RM/STK); alternatively,the TIsId or RIsId functions can be used to replace data from T or from RM/STK by IFUdatathese functions provide a roundabout method of getting Id onto B.MdThe Md path ("Memory data") moves data from the cache in the memory section into theprocessor. The processor latches Md and can route it onto A or B, load it into T and RM/STK, oruse it in a shift-and-mask operation.IOAThe IOA bus ("Input-output address") is driven from the TIOA register; it specifies the io deviceaffected by a Pd_Input or Output_B function.IOBThe IOB bus ("Input-output bus") is driven from alub on an Output_B function or received on Pdby a Pd_Input function; it transmits data to or from an io device.Fout("Fast output bus") transmits data from the error corrector to a fast output device.Fin("Fast input bus") transmits data from a fast input device (Presently, there are no fast inputdevices) to the syndrome generator.Sout("Storage output bus") transmits data from the syndrome generator to storage.Sin("Storage input bus") transmits data from storage to the error corrector. fp%5q5pGf"bsX `vtP ]uP^\)[\7ZfcXGW^T TyP TRS Qq6*O!!O`OO`ONFLL`K>B HYP*8FLEQIC`BIZ@e?AH=:-<8c:\90, 6KP 56KP56KPH4O3C//16 .P u.O.P..P<-V="+C*NF 'iP :''iP 4''iP>%Q$a% !}P !&O!}P!&!}P!|!&!}P*, P KOPHP!-PBB PTPTPJ PoPoPBB# ]P]P]P@ xP!xP!xP= =[Dorado Hardware ManualOverview14 September 19816TimingThe terminology used in discussing timing is as follows:clockThe 30 ns (nominal) atomic time period of the machine. 
         Clock period can be controlled by the baseboard microcomputer or through the manifold system, as discussed in the "Dorado Debugging Interface" document.¹
cycle    The duration of instructions: two clocks or 60 ns, except for instructions that read/write IM or TPC.
t0       The instant at which MIR (MicroInstruction Register) is loaded; the beginning of a cycle.
t1       The next instant after t0; always one clock later.
t2       The instant following t1; one clock after t1 except for instructions that read/write IM or TPC. Additional clocks intervening for these special cases, which only affect the control section, are denoted by t1a, t1b, etc.
t3, t4   Subsequent instants for an instruction. t3 of the previous instruction coincides with t1 of the current instruction; t4 with t2.
First half cycle    The interval from t0 to t1 (or t2 to t3).
Second half cycle   The interval from t1 to t2 (or t3 to t4).

As implied by this terminology, Dorado initiates a new instruction every cycle. Instructions are pipelined, requiring a total of three cycles for execution. Timing for a typical instruction is shown in Figure 7. At t-2, the next instruction address is determined and instruction fetch from IM begins; at t0, the instruction is loaded into MIR from IM. During the first half cycle, the selected register is read from RM or STK, and at t1 it is loaded into a register. During the next two clocks (t1-t3), addition is performed in the ALU; at t3 the result is loaded into a register for writing into RM/STK or T. During the final clock, RM is written.

Since a new instruction begins before the previous one finishes, paths exist to bypass the register being written if the following instruction specifies it as a source (these paths, inaccessible to the programmer, are not shown in Figure 1).

Most registers load from B at t3 (i.e., in the middle of the cycle following the load instruction). These may source B in the instruction after they are loaded. The load information and data are pipelined into the next cycle, as described above.
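The naming of instants can be checked with a small Python model. It assumes every instruction is an ordinary two-clock cycle (no IM or TPC access, which would insert t1a, t1b, ... clocks) and numbers clocks so that instruction n's t0 falls at clock 2n; the names are hypothetical. Under those assumptions, instant t_k of instruction n is clock 2n + k, which reproduces the coincidences stated above.

```python
# Model of the timing-instant names, assuming every instruction is an ordinary
# two-clock cycle (no IM or TPC access, which would add t1a, t1b, ... clocks).

CLOCK_NS = 30               # nominal clock period
CLOCKS_PER_CYCLE = 2        # so a cycle is 60 ns


def instant_clock(n, k):
    """Absolute clock number of instant t_k of instruction n (t0 of instruction 0 is clock 0)."""
    return CLOCKS_PER_CYCLE * n + k


def instant_ns(n, k):
    """Time of instant t_k of instruction n, in nanoseconds."""
    return instant_clock(n, k) * CLOCK_NS


if __name__ == "__main__":
    # t3 of one instruction coincides with t1 of the next; t4 with t2.
    print(instant_clock(4, 3) == instant_clock(5, 1),
          instant_clock(4, 4) == instant_clock(5, 2))
    # An instruction spans t-2 (address determined) through t4: three cycles.
    print(instant_ns(0, 4) - instant_ns(0, -2), "ns")
```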
Registers loaded at t2 may be used during the first half cycle of the following instruction. Usually, registers of this type hold some kind of control information, since control registers are normally clocked at t0 (= t2 of the previous instruction) and data-oriented registers at t1 (= t3 of the previous instruction).

Table 4 summarizes the time at which loading takes place and some other information.

¹ We actually operate with a clock period of 32 ns, slower than the 30 ns nominal period, and production machines typically become unreliable at about a 29 ns clock period.

6. FF is the catch-all field in which operations or data not otherwise specifiable can be encoded. Operations encoded in FF are called "functions". There are five ways FF is used:
a. To extend the branch address encoded in JCN (long goto, long call).
b. To form a constant on B as selected by BSEL.
c. To specify one of 64 common functions and branch conditions while the two leading bits modify the memory reference operation specified in ASEL.
d. To specify one of 256 functions and branch conditions, some of which use low bits of FF as literal values.
e. As a shift control value when ASEL decodes to "shift" and BSEL to a constant.

When FF is used as a function, it sometimes modifies the interpretation of other fields in the instruction.
For example:
a. 16 FF decodes modify RM write-address bits which would otherwise have come from RSTK or StkP.
b. 16 FF decodes modify RM write-address bits which would otherwise come from RBase.
c. 16 FF decodes select less common B sources, causing BSEL to encode a destination rather than a source for B.

7. JCN (in conjunction with the current address) encodes the next instruction address as follows:
a. One of 64 global Calls.
b. One of 60 local Gotos.
c. One of 4 local Calls.
d. One of 14 local conditional branches with 7 branch conditions.
e. One of 16 long Gotos/Calls (use FF field for rest of address).
f. One of 4 IFU jumps for next opcode (high 10 address bits from IFU).
g. Return.
h. TPC read/write.
i. IM read/write (use low bits of RSTK also).

8. P0 and P1 are odd parity on the left and right halves of IM. When wrong, these give rise to error signals (see "Theory of Operations") which stop the machine after (unfortunately) the instruction with bad parity has been executed. The artifice of deliberately loading both parity bits incorrectly is used to implement breakpoints.

Notation

The notation used in referring to fields in the instruction is that the left-most bit of the field is denoted as 0. Hence, the fields in the instruction are as follows: RSTK[0:3], ALUF[0:3], BSEL[0:2], LC[0:2], ASEL[0:2], BLOCK[0], FF[0:7], JCN[0:7].

The BLOCK bit is also called StackSelect, for its use in choosing STK instead of RM for the emulator task.

...77₈, and underflow occurs when location 0 is either read or written or when StkP[2:7] is decremented below 0.

StkP[2:7] are initialized to 0, denoting the empty stack. A push could do StkP_StkP+1 and
A push could do StkP_StkP+1 andwrite in one instruction. A pop does StkP_StkP1, and the item being popped off can bereferenced in the same instruction if desired.Table 6: RSTK Decodes for Stack OperationsRSTK[0]0 = no underflow on StkP = 0 at start or end1 = underflow when StkP originally 0 or finally 0.RSTK[1:3]Meaning 0no StkP change 1StkP_StkP+1 2StkP_StkP+2 3StkP_StkP+3 4StkP_StkP4 5StkP_StkP3 6StkP_StkP2 7StkP_StkP1In other words, RSTK[1:3] treated as a signed number are added to StkP[2:7] (StkP[0:1]don't change.). In the emulator, an attempt to underflow or overflow the stack generatesthe signal StkError:StkError = (BLOCK eq 1) & Emulator &[((StkP[2:7] + RSTK[1:3]) < 0) % ((StkP[2:7] + RSTK[1:3]) > 778) %((RSTK[0] eq 1) & ((StkP[2:7] eq 0) % ((StkP[2:7] + RSTK[1:3]) eq 0)))]StkError generates HOLD and wakes up the fault task (task 15) to deal with the situation,so the instruction causing StkError has not been executed when the fault task runs.StkUnd and StkOvf are remembered in flipflops read by the Pd_Pointers function. Theseget cleared (i.e., recomputed) when the next stack operation is executed by the emulator.The fault task can read them to decide whether stack underflow or overflow action isnecessary.Interpretation of underflow: StkP eq 0 denotes the empty stack. A stack adjustment mayoccur either by itself or with a read or write stack reference. StkP originally equal 0underflows if the top of stack is read or written; decrementing StkP below 0 is always anunderflow error; StkP equal 0 after modification underflows iff writing at the modifiedaddress. 
As a result of these underflow rules, the assembler sets RSTK[0] equal 1 for a stack reference only when either reading STK and incrementing the pointer or writing at the modified address and decrementing the pointer.

In other words, the microassembler must tell the hardware when to make the StkP equal 0 underflow checks, and it must do this correctly when the ModStkPBeforeW FF decode is used.

StkP can be loaded from B[8:15] using the StkP_B function; however, this is illegal in conjunction with a STK read or write in the same instruction (e.g., T_Stack, StkP_T leaves StkP unchanged).

StkP is saved at t2 of an instruction dispatched to by the IFU. The saved value may be reloaded into StkP at t2 by the RestoreStkP function; RestoreStkP is illegal in conjunction with a STK read or write in the same instruction.

RestoreStkP is useful only if opcodes are restarted after servicing map faults. However, we are also arranging for the IFU state, branch conditions, etc. of an interrupted opcode to be readable and reproducible, so that it will be possible simply to continue from the instruction that faulted. RestoreStkP will be useless if the continue method of restarting is adopted.

The opcode-restart method effectively prevents use of the IFU entry vector scheme discussed in "IFU Section," degrading performance by perhaps 2%, so it is desirable to continue from, rather than restart from, faults. Also, complicated opcodes may require special-case code in the fault handler before opcode restart is possible, so continuing from the instruction that faulted is likely to be simpler overall.

Two groups of FF decodes change the RM address for the write portion of an instruction. The first group of 16 FF decodes forces the write address to come from RBase[0:3],,FF[4:7].
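RM address formation can be sketched in Python, taking ",," as concatenation and using the Notation convention that bit 0 is the left-most bit, so FF[4:7] is the low four bits of the 8-bit FF field. The sketch covers the normal address RBase[0:3],,RSTK[0:3], the first-group write address RBase[0:3],,FF[4:7], and the second-group write address FF[4:7],,RSTK[0:3]. All helper names are hypothetical.

```python
# Sketch of RM address formation. ",," is concatenation; with bit 0 the
# left-most bit, FF[4:7] is the low four bits of the 8-bit FF field.
# Helper names are hypothetical.


def field(word, first, last, width):
    """Return word[first:last], numbering bits from 0 at the left (most significant) end."""
    nbits = last - first + 1
    shift = width - 1 - last
    return (word >> shift) & ((1 << nbits) - 1)


def concat4(high, low):
    """high,,low for two 4-bit values: an 8-bit RM address."""
    return ((high & 0xF) << 4) | (low & 0xF)


def rm_read_address(rbase, rstk):
    return concat4(rbase, rstk)                        # RBase[0:3],,RSTK[0:3]


def rm_write_address_group1(rbase, ff):
    return concat4(rbase, field(ff, 4, 7, 8))          # RBase[0:3],,FF[4:7]


def rm_write_address_group2(ff, rstk):
    return concat4(field(ff, 4, 7, 8), rstk)           # FF[4:7],,RSTK[0:3]


if __name__ == "__main__":
    rbase, rstk, ff = 0x3, 0x5, 0xA7
    print(hex(rm_read_address(rbase, rstk)))           # 0x35: register 5 of group 3
    print(hex(rm_write_address_group1(rbase, ff)))     # 0x37: register 7 of the same group
    print(hex(rm_write_address_group2(ff, rstk)))      # 0x75: register 5 of group 7
```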
This first group of decodes allows different registers in the same group of 16 to be used for the read and write portions of the instruction, or allows STK[StkP] to be used for the read portion and any of the 16 registers pointed to by RBase in the write portion.

The second group of 16 FF decodes forces the top four write address bits to come from FF[4:7]. The complete RM write address becomes FF[4:7],,RSTK[0:3]. This allows an arbitrary RM address to be written without having to load RBase in a previous instruction. Alternatively, if the i'th register in a group of 16 is read from RM, it permits the i'th register in a different group of 16 to be written in the same instruction. In conjunction with a read of STK, RSTK[0:3] will encode the StkP modification, and whatever RM word this happens to point to will be written (programmers will have to struggle to use this with a STK read).

Note: SubTask does not affect the write address for these functions.

Note that there is no way to read RM and write STK in one instruction.

The RisId FF decode causes Id to be substituted for RM/STK in the A, B, or shifter multiplexing.

There are branch conditions to test R[0] (R<0) and R[15] (R odd). These branch conditions are unaffected by the RisId FF decode; actual data from RM/STK is tested.

Cnt Register

The 16-bit Cnt register is provided for use as a loop counter. Since it is not task-specific, io tasks must save and restore it.

Cnt can be decremented and tested for 0 by the Cnt=0&-1 branch condition; loaded from B[0:15] or from small constants 1 to 16 (FF decodes); and read onto the Pd path (into T or RM/STK) by an FF decode.

Q Register

The 16-bit Q register is provided primarily for use as a shift register with multiply and divide, but will probably be used more widely by the emulator. Since it is not task-specific,
Since it is not task-specific,io tasks must save and restore it.Q can be read onto B (BSEL) or onto A (FF); it can be loaded from B (FF) and when FFspecifies an external B source in the memory, ifu, or control sections, it can also be loadedfrom B (BSEL). Q can be left-shifted or right-shifted one (bringing 0 into the vacant bit) bytwo FF decodes.T RegisterThe 16-bit T register is the primary register for data manipulation in the processor. Since itis task-specific io tasks do not have to save and restore it. T can be read onto B (BSEL) orA (ASEL); it can be loaded from Pd or Md (LC).BSEL: B Multiplexor SelectBSEL normally selects one of the "internal" processor sources for B, as shown in the"Primary" column in the table below (Note that although Md originates in the memorysection, it is latched by the processor and appears as an internal B source.). However, theFF field can be used to substitute some other source external to the processorthere aremany "external" sources in the control, IFU, and memory sections, and the codes for theseare given in Table 11. When an external source is specified, then BSEL instead encodesthe destination for B, as shown in the "External" column of the table below.The sources selected by BSEL are:Table 7: BSEL DecodesBSELPrimaryWith External Source 0Md 1RM/STK 2T 3QQ_B * 40,,FFInapplicable because FF is not available to encode an external source 53778,,FFInapplicable 6FF,,0Inapplicable 7FF,,3778Inapplicable*Note: BSEL decode for Q_B is needed in initializing Dorado from the baseboard or Alto. BecauseALUFM contents may be unknown, and data from the Alto is transmitted via the B_Link FF decode,some other field is needed to encode a destination that can then be routed into ALUFM.The values selected by BSEL=4-7 are 16-bit constants obtained by concatenating the 8-bitFF field with zeroes or ones. When this is done, normal effects of functions are disabled,so external B sources are impossible. 
In conjunction with a shift operation on A, BSEL=4 to 7 will cause the shifter controls to come directly from FF rather than from ShC, as discussed in "Shifter"; the Q register sources B when an FF-controlled shift is carried out.

The TisId and RisId FF decodes may be used with the B_T or B_RM/STK BSEL decodes, respectively, to accomplish B_Id.

The "External" decode of BSEL applies with Link, DBuf, Pipe0-Pipe5, FaultInfo, PCX, DecLo, DecHi, and other functions that source B on the backpanel, as selected by the FF decode. For these external sources, BSEL is interpreted as the destination for B rather than the source.

Note: When the memory or control section sources the external B bus, it is illegal to execute arithmetic ALU operations; these sources are not electrically stable soon enough to permit the extra 10 ns required for carry propagation. But if you are sure carries will not propagate into the high 8 bits of the ALU result, then the hardware is fast enough.

However, arithmetic is permitted when the IFU sources the external B bus, provided the previous instruction was not one of the slow B sources from the memory or control sections. This permits (Id)-(PCX')-1, common in emulator microcode.

This implies that an io task must never block on an instruction that reads B from a slow external source.

Hardware Implementation

The processor's internal version of B, called Alub, is driven by a 4-input multiplexor when sourced from within the processor; in this case an identical multiplexor drives the external bus, called Bmux (high-true). When the B source is external, both of these multiplexors are disabled, and the backpanel Bmux (low-true) is inverted through a gate onto Alub.
The multiplexor arrangement is shown in Figure 3.

The IFU section is on or off Bmux by t1+6 ns and the processor section is off by t1+7 ns, but the memory and control sections are not on or off until t1+16 ns. Hence, a slow Bmux source in the previous instruction prevents Bmux from stabilizing until t1+16 ns of the current instruction, leaving insufficient time to propagate Bmux onto Alub and finish carry propagation. However, because Bmux is gated onto Alub, and the gate shuts off quickly, arithmetic on internal Alub sources is always permissible.

Bmux sources in this manual are given high- or low-true names that agree with the way signals appear on Alub. For external sources this is inverted with respect to the sense of these signals on Bmux. However, because external sources cannot feed external destinations (there is no way to encode this in an instruction), the signal inversion is invisible to programmers.

External B sources from the IFU and internal sources are ready in time for arithmetic, but external sources from the memory and control sections are not (see the earlier section on "BSEL: B Multiplexor Select"). Internal A sources except the shifter are ready in time for arithmetic. Unless explicitly disabled by the FreezeBC function, the branch conditions ALU<0, ALU=0, Carry' (ALU carry out'), and Overflow are available for testing on the control card at t3.

The Overflow branch condition, defined as carry-out from bit 0 unequal to carry-out from bit 1, is true iff a signed arithmetic operation yields an incorrect result.

Normally, the ALU is routed directly onto Pd, and Pd is then written into either T or RM/STK.
However, several functions route ALU output shifted left or right 1 position onto Pd. Note that the ALU output of this instruction is used (not that of the previous one), and that ALUcarry is undefined on a logical ALU operation. The right shifts are:

   ALU rsh 1     (0 onto Pd[0])
   ALU rcy 1     (ALU[15] onto Pd[0])
   ALU arsh 1    (ALU[0] onto Pd[0], preserving the sign)
   ALU brsh 1    (ALUcarry onto Pd[0])
   Multiply      (ALUcarry onto Pd[0])

The left shifts are:

   ALU lsh 1     (0 onto Pd[15])
   ALU lcy 1     (ALU[0] onto Pd[15])
   Divide        (Q[0] onto Pd[15])
   CDivide       (Q[0] onto Pd[15])

Multiply, Divide, and CDivide have other effects as well, discussed later.

Note: The barrel shifter discussed in the "Shifter" section also uses the Pd multiplexor for masking, so it is illegal to combine barrel shifts and ALU shifts in the same instruction.

Note: The ALU<0, ALU=0, Carry', and Overflow branch conditions test the ALU output of the previous instruction executed by the task. Any shifting or masking that takes place in the Pd input multiplexor does not affect the result of these branch conditions.

Note: The values of Carry' and Overflow change only on arithmetic ALU operations. However, ALU_A may be either an arithmetic or a logical operation. In order to use XorCarry with ALU_A, we will probably use the arithmetic form of ALU_A, but the consequence of this is that Carry' will change on ALU_A. Programmers will have to be wary of this.

Note: Overflow is implemented correctly only for the A+B, A+B+1, A-B, and A-B-1 operations; other arithmetic ALU operations (A+1, A-1, 2A, 2A+1, etc.) may modify the branch condition erroneously.
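The Overflow definition and the Pd shift functions listed above can be checked with a small behavioral model. The following Python sketch is an illustration under stated assumptions, not a description of the ALU circuitry:

- The helper names (`alu_add`, `pd_shift`) and the string operation names are hypothetical.
- Words are Python ints in 0..0xFFFF, with bit 0 the most significant.
- Only addition (A+B plus an optional carry-in) is modeled.

The Carry' branch condition would be the complement of the `carry_out` value returned here.

```python
MASK16 = 0xFFFF


def alu_add(a: int, b: int, carry_in: int = 0) -> tuple[int, int, int]:
    """Return (result, carry_out, overflow) for the 16-bit sum a+b+carry_in.

    carry_out is the carry out of bit 0 (the MSB). overflow follows the
    manual's definition: carry out of bit 0 unequal to carry out of bit 1,
    where the carry out of bit 1 is the carry into bit 0.
    """
    total = (a & MASK16) + (b & MASK16) + carry_in
    carry_out = total >> 16
    carry_out_of_bit1 = ((a & 0x7FFF) + (b & 0x7FFF) + carry_in) >> 15
    return total & MASK16, carry_out, carry_out ^ carry_out_of_bit1


def pd_shift(op: str, alu: int, alu_carry: int = 0, q: int = 0) -> int:
    """Value placed on Pd by the ALU shift functions (hypothetical op names)."""
    alu &= MASK16
    msb, lsb = alu >> 15, alu & 1          # ALU[0] and ALU[15]
    right, left = alu >> 1, (alu << 1) & MASK16
    if op == "rsh":                        # 0 onto Pd[0]
        return right
    if op == "rcy":                        # ALU[15] onto Pd[0]
        return (lsb << 15) | right
    if op == "arsh":                       # ALU[0] onto Pd[0]
        return (msb << 15) | right
    if op in ("brsh", "multiply"):         # ALUcarry onto Pd[0]
        return (alu_carry << 15) | right
    if op == "lsh":                        # 0 onto Pd[15]
        return left
    if op == "lcy":                        # ALU[0] onto Pd[15]
        return left | msb
    if op in ("divide", "cdivide"):        # Q[0] onto Pd[15]
        return left | ((q >> 15) & 1)
    raise ValueError(f"unknown shift: {op}")


# ALU brsh 1 keeps the carry: (177777 + 177777)/2 recovers 177777 (octal).
result, carry, _ = alu_add(0xFFFF, 0xFFFF)
print(hex(pd_shift("brsh", result, carry)))   # prints 0xffff
print(alu_add(0x7FFF, 1))                     # prints (32768, 0, 1): signed overflow
```

The brsh example shows why a shift that brings the carry into Pd[0] is useful. It recovers the full 17-bit sum halved, with no extra instruction.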
LC: Load Control for RM and T

This field controls the loading and source selection for the RM/STK memory and T register. The eight combinations are:

Table 10: LC Decodes

LC   Meaning
0    No action
1    T_Pd
2    T_Md, RM/STK_Pd
3    T_Md
4    RM/STK_Md
5    T_Pd, RM/STK_Md
6    RM/STK_Pd
7    T_Pd, RM/STK_Pd

The only missing combination is T_Md, RM/STK_Md. It can be accomplished by combining an LC value of 5 with the TgetsMd FF decode. It is illegal to use TgetsMd with other LC decodes.

FF: Special Function

This field is the catch-all for functions not otherwise encoded in the instruction. For consistency with the hardware implementation, the 8-bit FF field is shown below as a two-bit field FA (= FF[0:1]) and two 3-bit fields, FB (= FF[2:4]) and FC (= FF[5:7]). Field values are given in octal.

The FF field is interpreted as a function iff:

   BSEL is not selecting a constant, and
   JCN does not select a "long" goto or call.

When ASEL selects one of the memory references, the FF decode is forced to be that of FA=0, because the FA field specifies the source for A or an alternate memory reference in this case.

The decoding assignments have been made with the following considerations:

   Functions that source the external BMux are grouped for easy decode of the signal that turns off the processor's B multiplexors.
   Operations that might be useful in conjunction with a memory reference are put in the first 64 decodes (FA=0), since FA is decoded as zero on memory references.
   Functions decoded by different hardware sections are arranged in groups to reduce decoding logic.
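The LC table and the FF field boundaries can be summarized in a short Python sketch. It is illustrative only, and the names `LC_DECODES`, `lc_loads`, `split_ff`, and `ff_is_function` are hypothetical. Figure 6 gives the JCN encodings of long gotos and calls, which are not listed here, so the sketch takes that fact as a boolean argument. It does not model the forcing of FA to 0 when ASEL selects a memory reference.

FA is two bits and FB and FC are three bits each. Writing FF as a three-digit octal number therefore gives FA, FB, and FC directly as its digits.

```python
# Table 10: LC value -> (source loaded into T, source loaded into RM/STK).
LC_DECODES = {
    0: (None, None),
    1: ("Pd", None),
    2: ("Md", "Pd"),
    3: ("Md", None),
    4: (None, "Md"),
    5: ("Pd", "Md"),
    6: (None, "Pd"),
    7: ("Pd", "Pd"),
}


def lc_loads(lc: int, tgets_md: bool = False) -> tuple:
    """Return (T source, RM/STK source), applying the TgetsMd FF decode."""
    t_src, rm_src = LC_DECODES[lc]
    if tgets_md:
        if lc != 5:
            raise ValueError("TgetsMd is illegal except with LC=5")
        t_src = "Md"              # LC=5 plus TgetsMd gives T_Md, RM/STK_Md
    return t_src, rm_src


def split_ff(ff: int) -> tuple:
    """Split FF into FA=FF[0:1], FB=FF[2:4], FC=FF[5:7] (bit 0 is the MSB)."""
    return (ff >> 6) & 0o3, (ff >> 3) & 0o7, ff & 0o7


def ff_is_function(bsel: int, jcn_is_long: bool) -> bool:
    """FF is a function iff BSEL is not a constant (4-7) and JCN is not long."""
    return bsel < 4 and not jcn_is_long


print(split_ff(0o144))             # prints (1, 4, 4): StkP _ B[8:15] in Table 11c
print(lc_loads(5, tgets_md=True))  # prints ('Md', 'Md')
```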
Table 11a: FF Decodes (FA = 0)

* The AMux is not disabled when the A_xx decodes below are used while ASEL selects a shift.

FB    FC    Function
0-1         A[12:15] _ FF[4:7]
2     0     A _ RM/STK
2     1     A _ T
2     2     A _ Md
2     3     A _ Q
2     4     XorCarry (complements ALUFM carry bit); see the "ALUF, ALU Operations" section
2     5     XorSavedCarry; see the "ALUF, ALU Operations" section
2     6     Carry20 (carry-in to bit 11 of ALU = 1); see the "ALUF, ALU Operations" section
2     7     ModStkPBeforeW (use modified StkP for write address of STK)
3     0
3     1     ReadMap. Modifies action of Map_ (see "Memory Section")
3     2     Pd _ Input (checks for IOB parity error)
3     3     Pd _ InputNoPE (no check for IOB parity error)
3     4     RisId (causes Id to replace RM/STK in A_RM/STK, B_RM/STK, and shifter)
3     5     TisId (causes Id to replace T in A_T, B_T, and shifter)
3     6     Output _ B
3     7     FlipMemBase (MemBase _ MemBase xor 1)
4-5         Replace RMaddr[0:3] by RBase[0:3] and RMaddr[4:7] by FF[4:7] for write of RM; forces RM to be written even if STK was read.
6     0-7   Branch conditions (see "Control"). In conjunction with an IFU jump in JCN, if the condition is true, IFU advance is disabled (see "IFU").
7     0     BigBDispatch _ B (256-way dispatch on B[8:15]; see "Control")
7     1     BDispatch _ B (8-way dispatch on B[13:15]; see "Control")
7     2     Multiply (Pd[0:15] _ ALUcarry,,ALU[0:14]; Q[0:15] _ ALU[15],,Q[0:14]; Q[14] OR'ed into TNIA[10] as slow branch; see "Multiply")
7     3     Q _ B
7     4
7     5     TgetsMd (in conjunction with LC=5, this causes T_Md, RM/STK_Md)
7     6     FreezeBC (freezes previous values of ALU and IOAtten' branch conditions for 1 cycle)
7     7     Reserved as a no-op

Table 11b: FF Decodes (FA = 1)

FB    FC    Action
0     0     PCF _ B. Loads PCF and starts fetching instructions
0     1     IFUTest _ B; dismisses junk wakeup. Bits used as follows: 0:7 TestFG, 8 TestParity, 9 TestFault, 10 TestMemAck, 11 TestMakeF_D, 12 TestFH', 13 TestSH', 14 enables testing
0     2     IFUTick
0     3     RescheduleNow (doesn't set Reschedule branch condition)
0     4     AckJunkTW_B. B[15]=1 shuts off junk task wakeups, =0 enables them; B[0:14] ignored
0     5     MemBase_B[3:7]
0     6     RBase_B[12:15]
0     7     Pointers_B (MemBase_B[3:7] and RBase_B[12:15])
1     0-7   Unused

Table 11c: FF Decodes (FA = 1)

* The following 8 FF decodes drive Mar from A.

FB    FC    Action
2     0-1   Unused
2     2     CFlags _ A' (see Figure 10) (Mar must be stable during prev. instr.)
2     3     BrLo _ A. BR[16:31] _ A[0:15]
2     4     BrHi _ A. BR[4:15] _ A[4:15]
2     5     LoadTestSyndrome from DBuf (see Figure 10)
2     6     LoadMcr[A,B] (see Figure 10)
2     7     ProcSRN _ B[12:15]
3     0     InsSetorEvent _ B. If B[0] = 0, then B[4:15] are controls for EventCntA and EventCntB; if B[0] = 1, then B[6:7] are loaded into the IFU's InsSet register.
3     1     EventCntB _ B, or equivalently GenOut_B (general output to printer, etc.)
3     2     Reschedule
3     3     NoReschedule
            (B data must be set up during the previous instruction and must not glitch when writing IFUMLH/RH; see the IFU section.)
3     4     IFUMRH _ B. Packeda_B.5, IFaddr'_B[6:15]
3     5     IFUMLH _ B. Sign_B.0, PE[0:2]_B[1:3], Length'_B[4:5], RBaseB'_B.6, MemB_B[7:9], TPause'_B.10, TJump_B.11, N_B[12:15]
3     6     IFUReset. Reset IFU
3     7     BrkIns _ B. Opcode_B[0:7] and set BrkPending
4     0     UseDMD (see "Control Section")
4     1     MidasStrobe _ B (see "Control Section")
4     2     TaskingOff
4     3     TaskingOn
4     4     StkP _ B[8:15]
4     5     RestoreStkP
4     6     Cnt _ B (overrules Cnt=0&-1 in the same instruction)
4     7     Link _ B (overrules loading of Link by Call or Return in same instruction)
5     0     Q lsh 1 (Q[0:14] _ Q[1:15], Q[15] _ 0)
5     1     Q rsh 1 (Q[1:15] _ Q[0:14], Q[0] _ 0)
5     2     TIOA[0:7] _ B[0:7] (Note: loaded from left half of B)
5     3
5     4     Hold&TaskSim _ B (Hold reg _ B[0:7], Task reg _ B[9:15]; see "Hold and Task Simulator")
5     5     WF _ A (load ShC with write-field controls; see "Shifter")
5     6     RF _ A (load ShC with read-field controls; see "Shifter")
5     7     ShC _ B (see "Shifter")
6     0     B _ FaultInfo'. B[8:11]_SRN for 1st fault, B[12:15]_number of faults
6     1     B _ Pipe0 (B_VaHi; see Figure 10)
6     2     B _ Pipe1 (B_VaLo; see Figure 10)
6     3     B _ Pipe2' (see Figure 10)
6     4     B _ Pipe3' (B_Map'; see Figure 10)
6     5     B _ Pipe4' (B_Errors'; see Figure 10)
6     6     B _ Config' (see Figure 10)
6     7     B _ Pipe5' (see Figure 10)
7     0     B _ PCX'
7     1     B _ EventCntA' (see "Other IO and Event Counters")
7     2     B _ IFUMRH' (low part of IFUM)
7     3     B _ IFUMLH' (high part of IFUM)
7     4     B _ EventCntB' (see "Other IO and Event Counters")
7     5     B _ DBuf (normally non-task-specific data from last Store_; see "Memory")
7     6     B _ RWCPReg (= Link_B' and B_CPReg)
7     7     B _ Link

Table 11d: FF Decodes (FA = 2)

FB    FC    Action
0-1         RBase _ FF[4:7]
2-3         Replace RMaddr[0:3] by FF[4:7] for write of RM. Forces RM to be written even if STK was read.
4           TIOA[5:7] _ FF[5:7] (TIOA[0:4] unchanged)
5     0-3   MemBaseX _ FF[6:7] (MemBase[0] _ 0, MemBase[1:2] _ MemBX[0:1], MemBase[3:4] _ FF[6:7])
5     4-7   MemBX _ FF[6:7]
6     0-1
6     2     Pd _ ALUFMRW (Pd _ ALUFMEM as below, ALUFMEM _ B.8, B[11:15])
6     3     Pd _ ALUFMEM (Pd.0 _ DMux data, Pd.8 and Pd[11:15] _ ALUFMEM[ALUF])
6     4     Pd _ Cnt (if Cnt=0&-1 in same instruction, unmodified value is read)
6     5     Pd _ Pointers (Pd[1:2] _ MemBX, Pd[3:7] _ MemBase, Pd[8] _ StkOvf, Pd[9] _ StkUnd, Pd[12:15] _ RBase)
6     6     Pd _ TIOA&StkP (Pd[0:7]_TIOA, Pd[8:15]_StkP; if the instruction modifies StkP concurrently, the MODIFIED value is read)
6     7     Pd _ ShC
7     0     Pd _ ALU rsh 1 (Pd[0] _ 0)
7     1     Pd _ ALU rcy 1 (Pd[0] _ ALU[15])
7     2     Pd _ ALU brsh 1 (Pd[0] _ ALUcarry)
7     3     Pd _ ALU arsh 1 (Pd[0] _ ALU[0], preserving sign)
7     4     Pd _ ALU lsh 1
7     5     Pd _ ALU lcy 1
7     6     Divide (Pd[0:15]_ALU[1:15],,Q[0]; Q[0:15]_Q[1:15],,ALUcarry)
7     7     CDivide (Pd[0:15]_ALU[1:15],,Q[0]; Q[0:15]_Q[1:15],,ALUcarry')

Table 11e: FF Decodes (FA = 3)

FB    Action
0-3   MemBase _ FF[3:7]
4-5   Cnt _ small constant (Cnt[0:10] _ 0, Cnt[11] _ 0 if FF[4:7] # 0 else 1, Cnt[12:15] _ FF[4:7]; i.e., values of 1 to 16 are loadable)
6-7   Wakeup[n]: initiate wakeup request for task FF[4:7]

Multiply and Divide

The Multiply, Divide, and CDivide functions operate on unsigned 16-bit operands. Unsigned rather than signed operands are used so that the algorithms will work properly on the extra words of multiple-precision numbers.

The actions caused by these functions are as follows:

Multiply:
   Result _ ALUCarry..ALU/2
   Q _ ALU[15]..Q/2
   Next branch address _ whatever it is OR 2 if Q[14] is 1.

Divide, CDivide:
   Result _ 2*ALU..Q[0]
   Q _ 2*Q..ALUCarry (Divide) or 2*Q..ALUCarry' (CDivide)

Complete examples of Multiply and Divide subroutines are given in the microassembler document. The inner loop time is 1 cycle/bit for multiply and 2 cycles/bit for divide.

Shifter

See Figure 4.

Dorado contains a 32-bit barrel shifter and associated logic optimized for field extraction, field insertion, and the BitBlt instruction.

The shifter is controlled by a 16-bit register, ShC. To perform a shift operation, ShC is loaded with 14 bits of control information in one of three ways discussed below. One of eight shift-and-mask operations is then executed in a subsequent instruction. Alternatively, (a limited selection of) shift controls may be specified in FF and BSEL concurrent with a shift; in this case, ShC is not modified. ASEL=7 causes a shift, and ALUF[0:2] select the kind of masking.

The execution of a shift instruction (after ShC has been loaded in a previous instruction) proceeds as follows:

   ShC[2] selects between T and RM/STK for the left-most 16 bits input to the shifter; ShC[3] selects between T and RM/STK for the right-most 16 bits. Using the RisId or TisId FF decode in the same instruction allows Id to replace either T or RM/STK in the shift.
   This 32-bit quantity is then left-cycled by the number of positions (0-15) given by ShC[4:7]. When ShC[2] and ShC[3] are both 1, the shifter left-cycles T; when both are 0, it left-cycles RM/STK. In these cases it operates as a 16-bit cycler. When ShC[2] and ShC[3] are loaded with complementary values, it left-cycles the 32-bit quantity R..T or T..R.

   The low-order 16 bits of shifted data are placed complemented on A by the shift, and the normal A source is disabled (except when the source for A is encoded in FF; see the ASEL section).

   ALUF[0:2] select one of eight mask operations (see below), and the first three ALUFM address bits are forced to 1, so that the ALU operation in either ALUFM 16₈ or ALUFM 17₈ can be performed. This must be a logical ALU operation using the shifted data on A and data on B, because there is insufficient time to propagate carries for an arithmetic operation. The intent is that ALUFM 16₈ contain the control for the "NOT A" ALU operation normally desired, while ALUFM 17₈ is used by BitBlt and other opcodes that need computed ALU operations.

   ALU output passes to the masking logic. The mask operation determines which of two independent masks in ShC are applied to the data. LMask contains 0 to 15 ones starting at bit 0, RMask 0 to 15 ones starting at bit 15.
The masked area(s) of ALU output corresponding to 1's in the mask are replaced either with zeroes or with corresponding bits from Md, according to the shift-and-mask function selected. Replace-with-Md generates HOLD if Md isn't ready yet, and the timing for this is the same as Md onto B (i.e., data is never ready sooner than the second instruction after the Fetch_).

   Masked data is routed onto Pd, then sent to the destination specified by LC.

Note: The Pd input multiplexor is used to carry out masking, so it is illegal to combine a shifter operation with an ALU shift in the same instruction.

Three functions load ShC: RF_A and WF_A treat A[8:15] as a Mesa field descriptor and transform the bits appropriately before loading ShC; they also load ShC[2:3] from A[2:3]. ShC_B allows an arbitrary value to be placed in ShC (used by BitBlt).

Microcode for the Mesa RF (Read Field) and WF (Write Field) opcodes is shown as an example of the use of the shifter. In these examples, a and b are the two operand bytes for the opcode, as discussed in "Instruction Fetch Unit." RF and WF both take a pointer from the top of the stack and add a to it as a displacement. RF fetches the word and pushes the field specified by b onto the stack. WF fetches the word and inserts into it a field taken from the rightmost bits of the word in the second position of the stack. It then restores the word to memory.

RF:   IFetch_Stack, TisId;             *Calculate the pointer. a replaces BR[MemBase] (MDS);
                                       *this value is then added to Stack to compute the
                                       *address for the pointer.
      Stack_Md, RF_Id;                 *IFU supplies b, the field descriptor
      IFUJump[0], Stack_ShiftLMask;    *Right-justify & mask the field, IFU to next instruction

WF:   T_(IFetch_Stack&-1)+T, TisId;    *Start fetch of word containing field
      WF_Id, RTemp_T;                  *IFU supplies b, the field descriptor
      T_ShMdBothMasks[Stack&-1];
      IFUJump[0], Store_RTemp, DBuf_T;

The shift controls come directly from FF if ASEL=7 (a shift) and if BSEL = 4, 5, 6, or 7, selecting a constant.
This specifies complete shift control in the instruction that does the shift, so ShC doesn't have to be loaded in a previous instruction. ShC isn't clobbered, so io tasks don't have to save and restore it. When BSEL controls a shift in this way, the B source is forced to be Q.

The mask operations are as follows:

Table 12: ALUF Shift Decodes

ALUF[0:2]*   Name             Effect
0            ShiftNoMask
1            ShiftLMask       masked bits on the left-hand side of the word replaced with 0's
2            ShiftRMask       masked bits on the right replaced with 0's
3            ShiftBothMasks   masked bits on both sides replaced with 0's
4            ShMdNoMask       unused (falls out of decoding)
5            ShMdLMask        masked bits replaced with Md
6            ShMdRMask        masked bits replaced with Md
7            ShMdBothMasks    masked bits replaced with Md

* ALUF[3] selects the ALU operation in either ALUFM 16₈ or 17₈.

ShiftLMask implements right shift and load-field operations. ShiftRMask implements left shift. ShiftBothMasks deposits the selected field into a word of zeroes. ShMdBothMasks deposits the selected field into data coming from memory. ShiftNoMask implements various cycle operations.

Note: On a shift, the ALU branch conditions apply to the unmasked ALU output.

Hold and Task Simulator

The hold and task simulators are provided for hardware checkout (programmers may skip this section).

Hold&TaskSim_B loads HOLDSIM[0:7] from B[8:14]..0 and TASKSIM[0:6] from B[1:7]. HOLDSIM is a recirculating shift register in which the presence of a 1 in bit 7 causes HOLD two instructions later. For example, Hold&TaskSim_200₈ will complete three instructions after the Hold&TaskSim_, HOLD the next cycle, and HOLD every seventh instruction (i.e., every eighth cycle) thereafter.
Since this register cannot be loaded with all 1's, and since its clocks are not disabled by HOLD, HOLD of infinite duration is impossible. To disable this debugging feature, the register must be loaded with 0.

TASKSIM is a seven-bit counter that determines the number of cycles before a task wakeup occurs. The task selected for wakeup must be jumpered on the backplane (else no-op). Whenever TASKSIM is loaded with a non-zero value, it counts up to 177₈, then generates a wakeup request when the counter overflows to 200₈. The wakeup request remains true until TASKSIM is reloaded.

Control Section

The control section interfaces the mainframe to the baseboard microcomputer or Alto that controls it, as detailed in the "Dorado Debugging Interface" document. In addition, the control section stores instructions in a 4k x 34-bit (+2 parity) IM ("Instruction Memory") and contains logic for sequencing through instructions and switching among tasks.

The current instruction is clocked into the MIR register at t0 and exported to the processor, memory, and IFU sections for decoding. The control section itself decodes the JCN field, the BLOCK bit, and its own FF decodes (Wakeup, B_Link, B_RWCPReg, Link_B, TaskingOn, TaskingOff, BDispatch_B, BigBDispatch_B, Multiply, MidasStrobe_B, UseDMD, and branch conditions).

The control section also exports the task number via the Next bus. Somewhat after t-2, the Next bus contains the number of the task that will execute an instruction at t0.

Figure 5 shows the overall organization of the control section. Figure 6 shows how branch control is encoded in JCN. Figure 7 shows the timing for regular instructions and for the multi-cycle TPC and IM read/write instructions.

Tasks

Dorado provides sixteen independent priority-scheduled tasks at the microcode level. Task 15 is highest priority, task 0 lowest.
Task 15 (the "fault task") is woken by StkError and by memory map and data error faults. Tasks 1-14 provide processing functions for io controllers implemented partially in hardware, partially in firmware; the present assignment of these tasks to device controllers is given in the "Slow IO" chapter. Task 0 (the "emulator") implements instruction sets (Mesa, Alto, etc.). In the absence of io activity, task 0 (always awake) controls the processor.

Essentially, io devices are paired to tasks when built, and a device controller can assert a wakeup request for the task with which it is paired. A program cannot modify the assignment of controllers to tasks (although the hardware change for this is easy). Additional flexibility in this area is not thought to be worth additional hardware cost.

Each task has its own program counter and subroutine return link, stored in the (task-specific) TPC and TLINK registers when the task is inactive. TPC may also be treated as a memory, so program counters for tasks other than the current task can be read and written by a program. This is discussed later in this chapter.

Task Switching

When device hardware requires service from a task, it activates its wakeup request line at t0. Wakeup requests are priority-encoded, and the highest priority request (BNT or "Best Next Task") is clocked at t2 and competes with the current task (CTASK) for control of the machine. If BNT is higher priority than CTASK, or if the current (non-emulator) instruction has BLOCK = 1, a task switch will take place; in this case, CTASK will be loaded from BNT at t4. This implies that the shortest delay from a wakeup request to the first instruction of the associated task is two cycles.

The 16 Wakeup[task] FF decodes allow any task to be woken, just as though a hardware device had activated its wakeup line. A minimum of two cycles elapses after the instruction containing Wakeup before the task executes its first instruction. The task responding to a Wakeup must not block sooner than the second instruction, or it will get reawakened.

When a task has been woken by Wakeup[task] or has executed one or more instructions and then deferred to a higher priority task, the fact that it is runnable is remembered in a Ready flipflop. The Ready flipflop is cleared only when the associated task blocks. In other words, there is no way to deactivate a task, after its Ready flipflop has been set, except by forcing it to execute an instruction that blocks.

The Wakeup[task] function must be executed with tasking off if it is possible that the specified task might be waking up for some other reason. Examples are a wakeup request from an external device, or a wakeup issued by yet another task. Otherwise, the control section may get horribly confused, and the machine will hang in the same task forever.

An acceptable sequence is:

      TaskingOff;
      Wakeup[task];
      TaskingOn;

The baseboard and Alto controllers may also clear the Ready flipflops by another mechanism, discussed in "Dorado Debugging Interface".

The emulator has no Ready flipflop and cannot block; the BLOCK bit in the instruction is interpreted as StackSelect for the emulator.

Task switching may occur after every instruction unless explicitly disabled by the TaskingOff function. The TaskingOn function reverses the effect of TaskingOff. TaskingOff is "atomic": an instruction containing TaskingOff will be held if a task switch is pending, and the next instruction will be executed in sequence without any intervening task switches.
TaskingOn is not immediately effective; at least two more instructions will be executed by the same task before task switching can occur.

It would be a programming error for a task to block with tasking off, but if it did, the block would fail, and it would continue execution.

It is illegal for a task to block in an instruction that might be held, if the wakeup line for the task might be dropped at t0 of the instruction. If this occurred, the instruction might inadvertently be repeated before the block occurred.

Remark

Multiple tasks seem better than a more conventional priority interrupt system because interference by input/output tasks is substantially reduced. As to the exact implementation, variations are possible. The current scheme requires more hardware than one in which the program explicitly indicates when a task switch is legal (as on Alto and D0). However, because Hold may last for about 30 cycles, a reliance upon explicit tasking would result in inadequate service for high priority tasks.

Next Address Generation

This section gives a low-level view of jump control. Because the microassembler and loader handle details of instruction placement automatically, programmers need not struggle with the encodings directly.
For this reason, programmers may wish to skim this section while concentrating on high-level jump concepts described in "Dorado Microassembler".

Read this with Figure 6 in front of you.

For the most part, instruction memory (IM) addressing paths are 16 bits wide, although only 12 bits are presently used. The extra width allows for future expansion to 13 or 14 bits when sufficiently fast 4k x 1 ECL RAMs are economically available. There are no plans to utilize the remaining 2 bits, but since nearly all hardware components in the control data paths are packaged 4 per can, the extra two bits are almost free. Also, the 16-bit-wide Link register can be used to hold full-word data items.

The various registers and data paths that contain IM addresses are numbered 0:15. Bits 4:15 are significant for the 4k-word microstore, while the quadrant bits 2:3 are ignored. This numbering conveniently word-aligns the bits while also allowing for future expansion. The discussion below assumes a 4k-word microstore.

Dorado does not have an incrementing instruction-address counter. Instead, the address of the next instruction is determined by modifying the current instruction address (CIA) in various ways. The Tentative Next Instruction Address (TNIA) is determined from JCN[0:7] in the instruction according to rules in Figure 6. TNIA addresses IM for the fetch of the next instruction unless a task switch occurs. If a task switch occurs, the program counter for the highest priority competing task (BNPC or "Best Next PC") addresses IM.

A 16k-word microstore is viewed as consisting of four 4k-word quadrants; each IM quadrant is viewed as containing 64 pages of 64 instructions.
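The split of an IM address into quadrant, page, and word can be sketched in Python. This is an illustration only, and the function names are hypothetical. It assumes the manual's 0:15 numbering with bit 15 least significant, so bits 2:3 form the quadrant, bits 4:9 the page, and bits 10:15 the word within the page. In octal, the low two digits of an address are therefore the word, and the next two are the page.

```python
def split_im_address(addr: int) -> tuple[int, int, int]:
    """Return (quadrant, page, word) for a 16-bit IM address."""
    quadrant = (addr >> 12) & 0o3   # bits 2:3; ignored with a 4k-word IM
    page = (addr >> 6) & 0o77       # bits 4:9: one of 64 pages
    word = addr & 0o77              # bits 10:15: one of 64 words in the page
    return quadrant, page, word


def join_im_address(quadrant: int, page: int, word: int) -> int:
    """Inverse of split_im_address (bits 0:1 are left zero)."""
    return (quadrant << 12) | (page << 6) | word


q, p, w = split_im_address(0o12345)
print(q, oct(p), oct(w))          # prints 1 0o23 0o45
```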
Values in JCN are provided for the following kinds of branches:

   Local branches to any of the 64 locations in the current page;
   Global branches to location 0 on any of the 64 pages of the current quadrant;
   Long branches to any location in the quadrant, using the 8-bit FF field to extend JCN (normal interpretation of FF is disabled);
   Conditional branches to any of 14 even locations in the current page, if the selected condition is false, or to the adjacent odd location, if the condition is true (7 branch conditions are available);
   IFU jumps to a starting address supplied by the IFU; JCN selects any one of up to 4 entries in the starting address vector (this is motivated by an entry-vector scheme discussed in "Instruction Fetch Unit");
   Read/write IM and read/write TPC, after which execution continues at .+1;
   Return to the address in Link.

Branch conditions may also be specified in FF, as discussed below. Several dispatches may also be specified in FF. These OR bits into the branch address computed by the following instruction.

If IM is expanded to 16k words, branching from one quadrant to another will only be possible by loading the Link register with a 14-bit address and then returning; jumps, calls, and IFUJumps will be confined to the current 4k-word IM quadrant.

Remarks on JCN Encoding

JCN cleverly encodes in 8 bits almost as much programming flexibility as would be possible with an arbitrarily large and general field. The main disadvantage is that MicroD is needed to postprocess assemblies and place instructions.

The earliest prototype of Dorado used a 7-bit JCN encoding that had fewer global and conditional branch targets, so programming was harder and additional instructions had to be inserted in a few places.
This was slightly worse than the 8-bit encoding, but it would have been feasible to stay with the 7-bit encoding and employ the bit thus saved for some other use in the instruction.

Local, global, and long branches are analogous, respectively, to local, page-zero, and indirect branches used on many minicomputers. However, Dorado scatters its global locations over the microstore rather than concentrating them in page zero; this is better than the minicomputer scheme for the following reason. During instruction placement, when a cluster of instructions is too large to fit on one page, a global allows it to be divided between two pages; but if all globals were in page zero, then page zero itself would quickly fill up. In other words, dispersing the globals is theoretically more powerful than concentrating them in page zero. Because MicroD does all the tedious work of placing instructions, this theoretical advantage is made practical. Minicomputers have not employed any program like MicroD, so they have used the less powerful but simpler page-zero scheme.

Local branches on Dorado are within a 64-word page, where minicomputers usually branch relative to the current PC. Relative branching is probably more powerful, but it cannot be used on Dorado because there is insufficient time for addition.

Long branches on Dorado use 4 bits of JCN in conjunction with the 8-bit FF field to specify any location in the 4k-word quadrant. Since BSEL never selects a constant in this case, an improvement on our scheme would have used 3 bits of JCN in conjunction with BSEL.0 and the 8-bit FF field; this would have freed 8 values of JCN to encode some other kind of branch.
In addition, 5 of the 256 values of JCN are unused and 1 is a duplicate (see Figure 6 for the 5 unused decodes; the replicated decode is the Global call on the Local page). We have variant JCN decodings that correct these problems, but they were not ready when the design was frozen.

Conditional Branches

IM is organized in two banks, with odd addresses in one bank and even addresses in the other. The address is needed shortly after t0, but the bank-select signal is not needed until 15 ns after the address. For this reason, conditional branches select between an even-odd pair of instructions (i.e., between the two banks) according to branch conditions that need not be stable until a little after t1.

Alternatively, a conditional branch may be encoded in FF in conjunction with any JCN addressing mode except a long branch. When this is done, the result of the branch test is ORed with TNIA[15].

This implies that for both FF-encoded and JCN-encoded branch conditions, the false target address is even and the true target is odd. Hence, it is possible to branch conditionally using only JCN while using FF for an unrelated function, or to encode a branch condition in FF while using any addressing mode in JCN other than a long branch.
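A minimal Python sketch of the target-pair rule follows. It models TNIA as a plain 16-bit integer in which TNIA[15] is the least significant bit, because Dorado numbers bits from the most significant end. The function name is hypothetical. The sketch also ORs the JCN-encoded and FF-encoded test results together, as the text states for the case where both are present.

```python
def branch_target(tnia: int, jcn_test: bool = False, ff_test: bool = False) -> int:
    """Address fetched after a conditional branch (sketch, not the hardware logic).

    tnia     -- 16-bit target from JCN; placement puts the false target at an even address
    jcn_test -- result of a branch condition encoded in JCN (False if none)
    ff_test  -- result of a branch condition encoded in FF (False if none)
    The two results are OR'ed, and the OR is applied to TNIA[15], the low-order
    bit, which selects between the even and odd IM banks.
    """
    taken = jcn_test or ff_test
    return (tnia & 0xFFFF) | (1 if taken else 0)


false_target = branch_target(0o200)
true_target = branch_target(0o200, ff_test=True)
print(oct(false_target), oct(true_target))  # 0o200 0o201
```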
If branch conditions are encoded in both FF and JCN, the branch test results are OR'ed, providing further flexibility.

The branch condition encodings are:

Table 13: Branch Conditions

JCN[5:7]   FF   Branch Condition
0          60   ALU=0
1          61   ALU<0
2          62   ALUcarry'
3          63   Cnt=0&-1 (decrements count after testing)
4          64   R<0 (RM or STK, whichever is selected, not overruled by RIsId)
5          65   R Odd (RM or STK, whichever is selected, not overruled by RIsId)
6          66   IOAtten' (non-emulator) or ReSchedule (emulator)
-          67   Overflow

ALU=0 and ALU<0 are the results of the last ALU operation executed by the current task. ALUcarry' (the saved carry-out of the ALU) and Overflow are the results of the last arithmetic ALU operation executed by the current task. (ALU_A may be stored in ALUFM as either an arithmetic or a logical operation, so programmers should be wary of smashing these branch conditions when ALU_A is used.) These conditions are saved in a RAM and may be frozen by the FreezeBC function for one cycle. In other words, the branch conditions are ordinarily loaded into the RAM at t3, but if FreezeBC is present, the RAM is not loaded and the values from the previous instruction for the same task apply.

The IOAtten' branch condition tests the task-specific IOAttention signal optionally generated by the io device associated with the current (non-emulator) task.

Remark on Target Pairs

The bank-select toggling trick, which allows branch conditions to be developed very late, is valuable. Without this trick, it would be necessary either to slow the instruction cycle or to restrict branch conditions to signals stable at t0. Neither alternative is palatable.

A more traditional implementation of conditional branches would go to the branch address if a condition were true, or fall through to the instruction at .+1 if it were false.
This traditional scheme is never faster, but it is sometimes more space-efficient than the target-pair scheme, because the target-pair scheme requires a duplicated instruction for every instance of a conditional branch to a single target, which is fairly common. The traditional scheme does not allow the DblGoto and DblCall constructs discussed in "Dorado Microassembler," but these are infrequent.

Subroutines and the Link Register

Dorado provides single-level subroutines by means of the (task-specific) Link register. A Call occurs on any instruction whose destination address is 0 mod 16 before any modification of TNIA due to branch conditions or dispatches. On a Call, Return, or IFUJump, Link is loaded with CIA+1.

Because Return loads Link with CIA+1, CoReturn constructs are possible. Because IFUJump also loads Link with CIA+1, the conditional exit feature discussed in the "Instruction Fetch Unit" chapter is possible.

CIA+1 is used rather loosely in this discussion; the actual value loaded into Link by a call or return is [(CIA & 177700) + ((CIA+1) & 77)], where both constants are octal. In other words, a call or return in location 77 (octal) of any page loads Link with location 0 of that page.

Link may be loaded and read by programs, so deeper subroutine nesting is possible if Link is saved and restored across calls.

The functions Link_B and B_RWCPReg, and the B dispatch functions discussed below, all of which load Link from B, overrule a call. In other words, if there are conflicting reasons for loading Link, Link_B wins over Link_CIA+1.

The B_RWCPReg function (= Link_B, B_CPReg') is provided primarily for initialization from the baseboard computer and for use by the Midas debugging program.
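The exact Link value given above and the 0-mod-16 call rule can be written as a short Python sketch. The helper names are hypothetical, and addresses are modeled as plain integers with the octal constants written using Python's 0o prefix. The sketch shows the wrap-around within a 64-word page, and it shows that 4 of the 64 locations in each page are call locations.

```python
def link_after_call(cia: int) -> int:
    """Value loaded into Link by a Call, Return, or IFUJump executed at address cia.

    The page part of CIA is kept and only the 6-bit offset is incremented,
    so location 77 (octal) of a page yields location 0 of the same page.
    """
    return (cia & 0o177700) + ((cia + 1) & 0o77)


def is_call(unmodified_target: int) -> bool:
    """A Call occurs when the target, before branch or dispatch ORing, is 0 mod 16."""
    return unmodified_target % 16 == 0


print(oct(link_after_call(0o1234)))  # 0o1235
print(oct(link_after_call(0o1277)))  # 0o1200 (wraps within the page)
print(sum(is_call(a) for a in range(0o1200, 0o1300)))  # 4 call locations per page
```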
Since the CPReg register clock is asynchronous to the Dorado clock system, a Dorado microprogram that reads CPReg (e.g., to receive information from the baseboard) must use some synchronization method to ensure that CPReg is stable during the cycle in which it is read.

Note: It is illegal to use an ALU branch condition in the instruction after Pd_RWCPReg if CPReg might have been loaded during the cycle in which it is read; this might result in an unstable IM address being presented to the control store.

Remark on Call/Jump

Deciding between call and jump based on the target address saves one bit in the instruction and costs little, for the following reasons. Instructions can be divided into four groups: those always jumped to, those always called, those for which Link can be smashed (i.e., "don't care" about call or jump), and those both jumped to and called.

A realistic guess is that over half of all instructions will be "don't care": they will be executed at the top level, not inside a subroutine, and the Link register will not contain anything of importance. Assembly-language declarations make this information available to MicroD.

The hardware makes 1/16 of the locations in each page "call locations." It is estimated that this is somewhat more than real programs will need, on average (although we vacillated about whether 1/8 or 1/16 of the targets should be calls).

In each page, MicroD first places instructions that must be called or must be jumped to. Because there are so many "don't care" instructions, it is unlikely that either the call slots or the jump slots in a page will be exceeded. Consequently, it will nearly always be possible to complete allocation of the call and jump targets without overflowing due to the call/jump restriction. After this, "don't care" instructions fill in the remaining slots.

The remaining situation, with which Dorado cannot cope, is an instruction both called and jumped to. This would arise in a subroutine whose entry instruction closed a loop (uncommon).
On Dorado, this situation requires duplicating the entry instruction, so it costs one location but no extra time.

Dispatches

Several FF decodes are dispatches, which OR various bits into TNIA[8:15] during the following instruction. The dispatch bits must be stable by t2.

The dispatches are:

BigBDispatch_B   B[8:15] (256-way dispatch)
BDispatch_B      B[13:15] (8-way dispatch)
Multiply         ORs Q[14] into TNIA[14]. (The value of Q[14] is captured in a flipflop at t2 of the instruction containing the Multiply function and is OR'ed into TNIA[14] during the next instruction for the same task.)

Example:

    BDispatch_T;
    Branch[300];        *branches to 300 OR T[13:15]

The two B dispatches load the Link register from B, then OR the appropriate bits of Link into TNIA during the next instruction for the task. Since Link is task-specific, this works correctly across task switching. The Q bit is loaded only during a multiply, and tasks other than the emulator are not allowed to use the multiply function.

The decision between call and jump in the instruction after a dispatch is unaffected by the dispatch bits; it depends only upon JCN. In other words, the instruction following a dispatch is a Call if its unmodified target address is 0 mod 16, else a jump.

It is possible to neutralize any bits in a dispatch by placing the target instructions at locations with 1's in the neutralized bits.
In other words, a dispatch on B[8:10] could be accomplished by locating the 8 target instructions at IM locations whose low five address bits are 1, e.g., at 37, 77, 137, 177, 237, 277, 337, and 377 (octal), and by branching to 37 (octal) in the instruction after the BigBDispatch_B.

Note: Methods discussed later for resuming a program interrupted by a page fault do not permit continuation when a fault occurs between a dispatch and the following instruction. For this reason, programmers should ensure that no fault can possibly occur, by holding for memory faults with _Md prior to or concurrent with the dispatch; also, stack operations that might overflow or underflow may not be used in the instruction after a dispatch.

Note: When the PC for another task is loaded using the LdTPC_ operation discussed later, any pending dispatch conditions for that task are cleared. The debugging program Midas does not clear pending dispatches, however, so it should be OK to put a breakpoint on the instruction after a dispatch or to single-step through a dispatch.

IFU Addressing

The IFU supplies ten bits of opcode starting address to the processor. During the last instruction of every opcode, exit to the next opcode is accomplished by IFUJump[n] (n = 0 to 3), which selects among four entry locations for the next opcode. The starting address supplied by the IFU is used for TNIA[4:13], and TNIA[14:15] are set to n. If the IFU is unprepared, it supplies a trap address instead of a starting address, and control goes to the nth location in a trap vector.

IFUJumps always load Link with CIA+1.
This is necessary to implement the following conditional exit feature for opcodes.

If an FF-encoded branch condition is true in the same instruction as an IFUJump, IFU advance to the next opcode is disabled. This kludge allows an opcode with common and uncommon exit conditions to finish with, for example, IFUJump[2,condition]. If the condition is false (common case), the IFU advances normally to the next opcode, starting at location 2 of the entry vector. Otherwise (uncommon case), control continues at location 3 of the entry vector, but the IFU does not advance, so emulation of the current opcode can continue.

Use of IFUJump and conditional IFUJump is discussed in "Instruction Fetch Unit."

IFU trap addresses and other reserved locations in the microstore are as follows:

Table 14: Reserved Locations in the Microstore

Reason                   Locations (octal)   Comment
Reschedule request*      14-17               Indicates that some previous instruction executed the ReSchedule function.
IFUM parity error*       74-77               Indicates a hardware failure in the IFUM storage.
IFU not ready*           34-37               The instructions in this vector should contain IFUJump[n], waiting for the IFU to become ready.
IFU data parity error*   4-7                 Parity wrong on data from the cache.
IFU map fault*           0-3                 The IFU buffers the fact of a map fault and completes all opcodes in the pipe ahead of the one experiencing the fault. Upon dispatch to the first instruction for the opcode affected by the fault, this trap occurs.
Midas Call command       7776
Midas Crash detect       7777

*IFU traps OR the 1's complement of the instruction set into bits 8:9 of the trap address, so the actual trap locations for Reschedule, for example, are 14-17, 114-117, 214-217, and 314-317.
The trap vector is 1 to 4 instructions long, according to the IFUJump programming convention, as discussed in the "Instruction Fetch Unit" chapter.

IM and TPC Access

See Figures 6 and 7.

IM is read and written by programs using a special decode of JCN in conjunction with the RSTK field of the instruction; TPC is also read and written using a special JCN decode. TaskingOff must be in force, and anything that might cause hold is illegal in the same instruction; hold is also illegal in the instruction after an IM or TPC read, when the data is accessed using B_Link.

It has been reported that IM_Md doesn't work because _Md causes hold at unexpected times.

After the read or write instruction, control passes to the next sequential instruction, i.e., to CIA+1 (with wrap-around at 64-word page boundaries). CIA+1 also winds up in Link.

Note: The hardware does not actually load Link with the IM or TPC data; instead, B_Link in the next cycle routes inverted data onto B using an alternate path. The Link register itself is smashed with CIA+1 as discussed above, and this value would be read (assuming it wasn't overwritten) in later instructions. This implies that continuation from a breakpoint or program-interrupt halt on the instruction following an IM or TPC read (i.e., on the B_Link instruction) won't work correctly.

Total time for an IM or TPC read or write operation is 6 clocks (i.e., three times as long as a normal instruction).

A 34-bit (+2 parity) IM word is read as four 9-bit quantities. The read address is taken from Link. Data must be read from Link[7:15] in the instruction immediately after the IM read. This data is inverted, and Link[0:6] contain 1's, so that when the entire word is 1's complemented the desired data will have leading 0's.
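The inversion convention for IM reads can be illustrated with a Python sketch. It models the 16-bit value seen by B_Link after each of the four reads and then reassembles the 36-bit word. The helper names are hypothetical. The sketch also assumes that byte select 0 picks the most significant 9 bits. The byte select comes from RSTK[2:3], but this section does not say which value selects which quarter of the word.

```python
IM_WORD_BITS = 36   # 34 data bits + 2 parity bits
MASK16 = 0xFFFF


def b_link_after_im_read(im_word: int, byte_select: int) -> int:
    """16-bit value routed onto B by B_Link after an IM read (model, not hardware).

    Assumption: byte_select 0 selects the most significant 9 bits of the word.
    Link[0:6] read as 1's; Link[7:15] carry the selected 9 bits, inverted.
    """
    assert 0 <= im_word < (1 << IM_WORD_BITS) and 0 <= byte_select < 4
    nine_bits = (im_word >> (9 * (3 - byte_select))) & 0x1FF
    return (0x7F << 9) | (~nine_bits & 0x1FF)


def read_im_word(im_word: int) -> int:
    """Rebuild the word from four reads by 1's-complementing each B_Link value."""
    word = 0
    for select in range(4):
        data = ~b_link_after_im_read(im_word, select) & MASK16  # leading 0's, then 9 data bits
        word = (word << 9) | data
    return word


sample = 0o123456701234
print(oct(b_link_after_im_read(sample, 0)))  # 0o177654
print(read_im_word(sample) == sample)        # True
```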
The byte select is RSTK[2:3].

IM writes also take the write address from Link, 16 bits of data from B, and 2 bits from RSTK; the half-word affected is also specified in RSTK.

Any task can read or write TPC for an arbitrary task other than itself (an attempt to set the TPC of the running task has unpredictable results). The task number is B[12:15], and data is taken from or written into Link. The assembly-language notations for these operations are RdTPC_B and LdTPC_B. After RdTPC_B, the 16 bits of data in Link are 1's complemented.

Note: The dispatch-pending conditions for a task whose TPC is loaded by LdTPC_ are cleared, so LdTPC_ works even when that task has just executed a BDispatch_B or BigBDispatch_B.

Hold

Many events in the memory system, StkError and the hold simulator in the processor, and several IFU error conditions generate hold. (The IFU error conditions cause a one-cycle hold iff an IFUJump occurs on the first cycle of the error.) The control section itself forces hold when a task switch occurs concurrent with TaskingOff. Hold, which is clocked at t1, occurs when the current instruction cannot be completed. Its effect on the hardware is to suspend the current instruction while completing the parts of the previous instruction that have been pipelined into the current cycle. In effect, it converts the current instruction into a Goto[.] while preserving branch conditions and some other state.

Higher-priority tasks are not prevented from running when the current task is experiencing Hold.

Remark

The address of the next instruction is needed at t0, but Hold is not generated until t1. Consequently, concurrence of Hold and BLOCK with a switch to a lower-priority task produces an anomalous situation called "Next Lies." The hardware disables clocks to CIA, TPC, and MIR when this occurs, so that the current instruction is repeated.
This results in some hardware complications discussed in the "Slow IO" chapter, but programmers need not worry about it.

Program Control of the DMux

Dorado contains a large number of multiplexors called mufflers, which allow a selected signal from a set of up to 2048 signals to be observed on a one-wire bus called the DMux. This provides a passive method by which the Baseboard section or the external Midas debugger can examine internal control signals and registers not otherwise observable.

The particular DMux signal is selected by shifting in an 11-bit address one bit at a time. Each board with mufflers contains a 12-bit address register that responds to the shifted address bits; the highest bit is ignored for the purpose of selecting the signal to be read. "Dorado Debugging Interface" discusses a clever generator algorithm that allows all 2048 signals to be read into a table in 2048+11 shift-read cycles.

The DMux address can also be executed as a control function. In this case the full 12-bit address determines what function is executed. This "manifold" mechanism is used to control power supplies, set the clock rate, enable/disable error-halt conditions, and test IM without involving other hardware.

The DMux facility can also be controlled directly by Dorado programs by means of the MidasStrobe_B and UseDMD functions. Essentially, the DMux address mechanism is controlled externally by the Baseboard (or by Midas operating through the Baseboard) when Dorado isn't running, and by Dorado when Dorado is running.

The MidasStrobe_B function causes B[4] to be shifted out as an address bit. This takes three cycles, so the program must execute three more instructions before doing another MidasStrobe_B function.
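This section does not say which generator algorithm "Dorado Debugging Interface" uses. A binary de Bruijn sequence is one standard technique with the required property, and it is swapped in here as an illustration. In such a stream every 11-bit pattern appears exactly once as a window, so each newly shifted bit presents a new address. The Python sketch below also models a 12-bit muffler address register whose highest bit is ignored. The class and function names and the shift direction are assumptions. The stream is 2048 + 10 bits long, which is close to the 2048+11 shift-read cycles quoted above. The exact accounting used by the real procedure is not described here.

```python
from typing import List

ADDR_BITS = 11   # address bits that select one of 2048 DMux signals
REG_BITS = 12    # width of each muffler board's address register


class MufflerAddressRegister:
    """Toy model of one board's address register (hypothetical, not the real logic)."""

    def __init__(self) -> None:
        self.value = 0

    def shift_in(self, bit: int) -> None:
        # The new bit enters at the low end; the oldest bit falls off the 12-bit register.
        self.value = ((self.value << 1) | (bit & 1)) & ((1 << REG_BITS) - 1)

    @property
    def selected(self) -> int:
        # The highest register bit is ignored; the low 11 bits pick the signal.
        return self.value & ((1 << ADDR_BITS) - 1)


def de_bruijn(k: int, n: int) -> List[int]:
    """Cyclic de Bruijn sequence B(k, n) of length k**n, built from Lyndon words."""
    a = [0] * (k * n)
    seq: List[int] = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq


def scan_all_addresses() -> List[int]:
    """Shift a linearized de Bruijn stream in and record each complete 11-bit address."""
    cyclic = de_bruijn(2, ADDR_BITS)
    stream = cyclic + cyclic[:ADDR_BITS - 1]   # 2048 + 10 bits in total
    reg = MufflerAddressRegister()
    visited: List[int] = []
    for i, bit in enumerate(stream):
        reg.shift_in(bit)
        if i >= ADDR_BITS - 1:                 # register now holds a full 11-bit window
            visited.append(reg.selected)
    return visited


addresses = scan_all_addresses()
print(len(addresses), len(set(addresses)))   # 2048 2048
```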
The DMux signal selected by the last 11 address bits shifted out is read on B[0] when the Pd_ALUFMEM function is executed.

The UseDMD function causes the current DMux address to be executed as a manifold operation.

The following subroutine reads the DMux signal selected by the address in T:

    Subroutine;
    ReadDMux:   Cnt_13S;
    RdDMuxLp:   MidasStrobe_T;                           *Shift out address in T[4]
                Noop;
                Noop;
                T_(T) lsh 1, Goto[RdDMuxLp,Cnt#0&-1];
                T_ALUFMEM;                               *T[0] returns the selected DMux signal
                Return;

Memory Section

Dorado supports a linear 22-bit to 28-bit virtual address space and contains a cache to increase memory performance. All memory addressing is done in terms of virtual addresses; later sections deal with the map and page faults. Figure 8 is a picture of the memory system; Figure 9 shows cache, map, and storage addressing. As Figure 8 suggests, the memory system is organized into three more-or-less independent parts: storage, cache data, and addressing.

Inputs to the memory system are NEXT (the task that will control the processor in the next cycle) from the control section, subtask from io devices, Mar (driven from A or by the IFU), MemBase, B, the fast input bus, and an assortment of control signals. Outputs are B and Md to the processor, the F/G registers for the IFU, the fast output bus (data, task, and subtask), and Hold.

The processor references the memory by providing a base register number (MemBase) and a 16-bit displacement (Mar), from which a 28-bit virtual address VA is computed; the kind of reference is encoded in the ASEL field of the instruction in conjunction with FF[0:1]. Subsequently, cache references transfer single 16-bit words between processor and cache, while fast io references independently transfer 256-bit munches between io devices and storage. There is a weak coupling between the two data sections, since data must sometimes be loaded into the cache from storage, or returned to storage.

The storage pipeline allows new requests every 8 cycles, but requires 28 cycles to complete a read. The state of the pipeline is recorded in a ring buffer called the pipe, where new entries are assigned for each storage reference. The processor can read the pipe for fault reporting or for access to the internal state of the memory system.

Memory Addressing

Processor memory references supply (explicitly) a 16-bit displacement D on Mar and (implicitly) a 5-bit task-specific base register number MemBase. Subtask[0:1] (see "Slow IO") are OR'ed with MemBase[2:3] to produce the 5-bit number sent to the memory. MemBase addresses 1 of 32 28-bit base registers. The full virtual address VA[4:31] is BR[MemBase]+D. D is an unsigned number.

The 28 bits in BR, VA, etc. are numbered 4:31 in the discussion here, consistent with the hardware drawings. This numbering conveniently relates to word boundaries.

Note that although the VA path is 28 bits wide, limitations imposed by cache and map geometry limit usable virtual memory to only 2^22 or 2^24 words in most configurations, as discussed in "The Map" section later.

MemBase can be loaded from the five low bits of FF, and the FlipMemBase function loads MemBase from its current value xor 1.
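The base-register selection and address sum just described can be sketched in Python. The names are hypothetical. Base registers are modeled as a list of 32 integers, and bits are numbered as in the text, with bit 0 most significant. Truncating the sum to 28 bits on overflow is an assumption, because this section does not state what happens in that case.

```python
from typing import Sequence

VA_MASK = (1 << 28) - 1     # VA[4:31] is 28 bits wide


def effective_membase(membase: int, subtask: int) -> int:
    """5-bit base register number sent to memory.

    Subtask[0:1] is OR'ed into MemBase[2:3]; with bit 0 as the most significant
    of the five bits, MemBase[2:3] carry weights 4 and 2, hence the shift by 1.
    """
    assert 0 <= membase < 32 and 0 <= subtask < 4
    return membase | (subtask << 1)


def flip_membase(membase: int) -> int:
    """FlipMemBase: MemBase xor 1."""
    return membase ^ 1


def virtual_address(base_regs: Sequence[int], membase: int, subtask: int, d: int) -> int:
    """VA = BR[MemBase] + D, with D an unsigned 16-bit displacement."""
    assert len(base_regs) == 32 and 0 <= d < (1 << 16)
    return (base_regs[effective_membase(membase, subtask)] + d) & VA_MASK


brs = [0] * 32
brs[6] = 0o1000000                                # base register 6 points at some structure
print(oct(virtual_address(brs, 4, 1, 0o17)))      # 0o1000017 (MemBase 4 OR subtask 1 -> BR 6)
```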
In addition, MemBase can be loaded from 0.MemBX[0:1].FF[6:7], where the purpose of the 2-bit MemBX register is discussed in "IFU Section." The IFU loads the emulator task's MemBase at the start of each opcode with a MemBX-relative value between 0 and 3.

The intent is to point base registers at active structures in the virtual space, so that memory references may specify a small displacement (usually 8 or 16 bits) rather than full 28-bit VAs. In the Mesa emulator, for example, two base registers point at local (MDS+L) and global (MDS+G) frames.

In any cycle with no processor memory reference, the IFU may make one. IFU references always use base register 31, the code base for the current procedure; the D supplied by the IFU is a word displacement in the code segment.

Programmers may think of Mar as an extension of A since, when driven by the processor, Mar contains the same information as A.

The base register addressed by MemBase can be loaded using the BrLo_A and BrHi_A functions. VA is written into the pipe memory on each reference, where it can be read as described later. The contents of the base register are VA-D on any reference.

Processor Memory References

Memory references are initiated only by the processor or IFU. This section discusses what happens only when references proceed unhindered. Subsequent sections deal with map faults, data errors, and delays due to Hold.

Processor references (encoded in the ASEL and FF[0:1] instruction fields, as discussed in the "Processor Section" chapter) have priority over IFU references, and are as follows:

Fetch_      Initiates a one-word fetch at VA. Data can be retrieved in any subsequent instruction by loading Md into R or T, by routing it onto the A or B data paths, or by masking it in a shift operation.
Store_      Stores data on B into VA.
LongFetch_  A fetch for which the complete 28-bit VA is (B[4:15],,Mar[0:15])+BR[MemBase].
IFetch_     A fetch for which BR[24:31] are replaced by Id from the IFU. When BR[24:31] are 0 (i.e., when BR points at a page boundary), this is equivalent to BR+Mar+Id, saving 1 instruction in many cases. Note: the IFU does not advance to the next item of _Id for IFetch_, so an accompanying TisId or RisId function is needed to advance.
PreFetch_   Moves the 16-word munch containing VA to the cache.
DummyRef_   Loads the pipe with VA for the reference without initiating cache, map, or storage activity.
Flush_      Removes the munch containing VA (if any) from the cache, storing it first if dirty (emulator or fault task only).
Map_        Loads the map entry for the page containing VA from B and clears Ref; the action is modified by the ReadMap function discussed later (emulator or fault task only).
IOFetch_    Initiates transfer of a munch from memory to an io device via the fast output bus (io task only).

The above timing is minimum, and delays may be longer if data is not in the cache or if the cache is still busy with an earlier reference.

Md remains valid until and during the next fetch by the task. If a Store_ intervenes between the Fetch_ and its associated _Md, then _Md will be held until the Store_ completes, but will then deliver data for the fetch exactly as though no Store_ had intervened.

Store_

Store_ loads the memory section's DBuf register from B data in the same instruction. On a hit, DBuf is passed to the cache data section during the next cycle.
On a miss, DBuf remains busy during storage access and is written into the cache afterwards.

Because DBuf is neither task-specific nor reference-specific, any Store_, even by another task, holds while DBuf is busy. However, barring misses, Store_'s in consecutive instructions never hold. A fetch or _Md by the same task will also hold for an unfinished Store_.

PreFetch_

PreFetch_ is useful for loading the cache with data needed in the near future. PreFetch_ does not clobber Md and never causes a map fault, so it can be used after a fetch and before reading Md.

IOFetch_

An IOFetch_ is initiated by the processor on behalf of a fast output device. When ready to accept a munch, a device controller wakes up a task to start its memory reference and do other housekeeping.

An IOFetch_ transfers the entire munch of which the requested address is a part (in 16 clocks, each transferring 16 data + 2 parity bits); the low 4 bits of VA are ignored by the hardware. If the munch is not in the cache, it comes directly from storage, and no cache entry is made. If it is in the cache and not dirty, the munch is still transferred from storage. Only when it is in the cache and dirty is the munch sent from the cache to the device (but with the same timing as if it had come from storage). In any case, no further interaction with the processor occurs once the reference has been started. As a result, requested data not in the cache (the normal case) is handled entirely by storage, so processor references proceed unhindered barring cache misses.

The destination device for an IOFetch_ identifies itself by means of the task and subtask supplied with the munch (= the task and subtask that issued the IOFetch_). The fast output bus, task, and subtask are bussed to all fast output devices. In addition, a Fault signal is supplied with the data (correctable single errors never cause this fault signal); the device may do whatever it likes with this information. More information relevant to IOFetch_ is in the "Fast IO" chapter.
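The address forms for LongFetch_, IFetch_, and IOFetch_ described above can be written out as a Python sketch. The function names are hypothetical. The sketch assumes that Id is an 8-bit value, since it replaces the 8 bits BR[24:31], and it also assumes that results wrap to 28 bits. The example calls show why IFetch_ equals BR+Mar+Id only when BR[24:31] are 0.

```python
VA_MASK = (1 << 28) - 1


def longfetch_va(br: int, b: int, mar: int) -> int:
    """LongFetch_: (B[4:15],,Mar[0:15]) + BR[MemBase], a full 28-bit sum."""
    displacement = ((b & 0xFFF) << 16) | (mar & 0xFFFF)
    return (br + displacement) & VA_MASK


def ifetch_va(br: int, id_byte: int, mar: int) -> int:
    """IFetch_: BR[24:31] (the low 8 bits of the 28-bit base) are replaced by Id."""
    return (((br & ~0xFF) | (id_byte & 0xFF)) + (mar & 0xFFFF)) & VA_MASK


def munch_address(va: int) -> int:
    """Address of the 16-word munch containing va; IOFetch_ ignores the low 4 bits."""
    return va & ~0xF


page_base = 0x1000100   # BR[24:31] == 0: BR points at a page boundary
print(hex(ifetch_va(page_base, 0x2A, 0x10)), hex(page_base + 0x10 + 0x2A))  # equal
print(hex(ifetch_va(0x1000105, 0x2A, 0)))  # 0x100012a: the low byte 0x05 is discarded
print(hex(munch_address(0x100013A)))       # 0x1000130
```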
IFU References

The F and G data registers shown in the IFU picture (Figure 11) are physically part of the memory system. The memory system fetches words referenced by the IFU directly into these registers. The IFU may have up to two references in progress at a time, but the second of these is issued only when the memory system is about to deliver data for the first reference.

An IFU reference cannot be initiated when the processor is either using Mar or referencing the pipe. For simplicity of decoding, the hardware disables IFU references when the processor is either making a reference or doing one of the functions 120 to 127 octal (CFlags_A', BrLo_A, BrHi_A, LoadTestSyndrome, or ProcSRN_B) or 160 to 167 octal (B_FaultInfo', B_Pipei, or B_Config').

The IFU is not prevented from making references while the processor is experiencing Hold, unless the instruction being held is making a reference or doing one of the functions mentioned above.

Memory Timing and Hold

Memory system control is divided into three more or less autonomous parts: the address, cache data, and storage sections. The storage section, in turn, has several automata that may be operating simultaneously on different references. Every reference requires one cycle in the address section, but thereafter an io reference normally deals only with storage, and a cache reference only with the cache data section. The address and cache data sections can handle one reference per cycle if all goes well. Thus, barring io activity and cache misses, the processor can make a fetch or store reference every cycle and never be held.

If the memory is unready to accept a reference or deliver Md, it inhibits execution with hold (which converts the instruction into a Goto[.] while freezing branch conditions, dispatches, etc.). The processor attempts the instruction again in the next cycle, unless a task switch occurs. If the memory is still not ready, hold continues.
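The IFU-inhibit rule from the IFU References subsection above can be summarized as a predicate. This Python sketch is hypothetical. It reduces each processor instruction to two facts: whether it makes a memory reference, and which FF function (if any) it executes. It ignores the separate limit of two IFU references in progress. As the text says, Hold by itself does not block the IFU; only what the held instruction is doing matters.

```python
from typing import Optional

# FF functions that also block IFU references: 120-127 and 160-167 (octal).
BLOCKING_FF_RANGES = (range(0o120, 0o130), range(0o160, 0o170))


def ifu_may_reference(processor_reference: bool, ff_function: Optional[int] = None) -> bool:
    """True if the IFU may start a memory reference in this cycle.

    processor_reference -- the current (possibly held) instruction makes a memory reference
    ff_function         -- the FF function that instruction executes, if any
    """
    if processor_reference:
        return False
    if ff_function is not None and any(ff_function in r for r in BLOCKING_FF_RANGES):
        return False
    return True


print(ifu_may_reference(False))          # True: no processor reference
print(ifu_may_reference(False, 0o121))   # False: 121 (octal) is in a blocking range
print(ifu_may_reference(False, 0o130))   # True: just outside the range
```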
If a task switch occurs, the instruction is reexecuted when control returns to the task; thus task switching is invisible to hold.

In the discussion below, cache references are those that normally get passed from the address section to the cache data section unless they miss (fetches, stores, and IFU fetches), while storage references unconditionally get passed to storage (IOFetch_, IOStore_, Map_, FlushStore arising from Flush_ with a dirty hit, and dirty-victim writes). PreFetch_ and DummyRef_ don't fall into either category.

Situations When Hold Occurs

A fetch, store, or _Md is held after a preceding fetch or store by the same task has missed, until all 16 words of the cache entry are loaded from storage (about 28 cycles).

Store_ is held if DBuf is busy with data not yet handed to the cache data or storage sections. LongFetch_ (unfortunately) is also held in this case. Since DBuf is not task-specific, this hold will occur even when the preceding Store_ was by another task.

An immediate _Md is held in the cycle after a fetch or store, and in the cycle after a deferred _Md.

The task-specific Md RAM is read from t2 to t3 for a deferred _Md in the preceding cycle and from t0 to t1 for an immediate _Md in the current cycle. Because these intervals coincide, hold is necessary when the tasks differ.
Unfortunately, hold occurs erroneously when the immediate and deferred _Md's are by the same task.

Any reference or _Md is held if the address section is busy in one of the ways discussed below.

_Md is erroneously held when the address section is busy, an unfortunate consequence of the hardware implementation, which combines logic for holding _Md on misses with logic for holding references when the address section is busy.

B_Pipei is held when coincident with any memory system use of the pipe. Each memory system access uses the pipe for one cycle but locks out the processor for two cycles. The memory system accesses the pipe t2 to t4 following any reference, so B_Pipei will be held in the instruction after any reference. Storage reads and writes access the pipe twice more; references that load the cache from storage access the pipe a third time.

Map_, LoadMcr, LoadTestSyndrome, and ProcSRN_ are not held for MapBuf busy; the program has to handle these situations itself by polling MapBufBusy or waiting long enough, as discussed in the Map section.

Flush_, Map_, and DummyRef_ are not held until a preceding fetch or store has finished or faulted. The emulator or fault task should force Hold with _Md before or coincident with issuing one of these references, if it might have a fetch or store in progress.

In the processor section, stack overflow and underflow and the hold simulator may cause holds; in the control section, TaskingOff or an IFUJump in conjunction with the onset of one of the rare IFU error conditions may cause one-cycle holds; there is also a backpanel signal called ExtHoldReq to which nothing is presently connected; it is reserved for input/output devices that may need to generate hold in some situation. All of these reasons for hold are discussed in the appropriate chapters.

Address Section Busy

The address section can normally be busy only if some previous reference has not yet been passed to the cache data section (for a cache reference that hits) or to storage (for a storage reference, or a cache reference or PreFetch_ that misses). A reference is passed on immediately unless either its destination is busy or the being-loaded condition discussed below occurs.

The address section is always busy in the two cycles after a miss, or in the cycle after a Flush_, Map_, IOFetch_, or IOStore_.

Hardware note: This allows Asrn to advance; for emulator and fault task fetch and store misses, which do not use Asrn, this hold is unnecessary. Unfortunately, the display controller's word task finishes each iteration with IOFetch_ and Block, so many emulator fetches and stores will be held for one cycle when a high-bandwidth display is being driven. Asrn is the internal register that contains the pipe address for storage references.

There are six other ways for the address section to be busy:

(1) A cache reference or PreFetch_ that misses, or a FlushStore, transfers storage data into the cache. At the end of this reference, as the first data word arrives, storage takes another address section cycle.

(2) The preceding cache reference hit but cannot be passed to the cache data section because the data section is busy transferring munches to/from storage (or to an io device if an IOFetch_ finds dirty data in the cache). Total time to fetch a munch from storage is about 28 cycles, but the cache data section is busy only during the last 10 of these cycles (9 for PreFetch_ or IOFetch_ with dirty hit), while data is written into the cache.
The cache data section is free during the interim.

(3) The preceding storage reference, or cache reference or PreFetch_ that missed, has not been passed on to storage because the storage section is busy. Storage is busy if it received a reference less than 8 cycles previously, and may be busy longer as follows:
    successive cache references must be 10 cycles apart;
    successive write references must be 11 cycles apart;
    with 4k storage ic's, successive references must be 13 cycles apart.

(4) A cache write (caused by a miss with a dirty victim or FlushStore) ties up the address section until the storage reference for the write is started; this happens 8 cycles after the storage reference for the miss or FlushStore is started. Note that the new munch fetch starts before the dirty victim store and that hold terminates right after the store is started.

(5) A reference giving rise to a cache write that follows any other cache miss will tie up the address section until the previous miss is finished.

(6) The address section is busy in the cycle after any reference that hits a cache row in which any column is being loaded from storage.

Any reference except IOFetch_, DummyRef_, or Map_ that hits a cache row in which any column is being loaded from storage remains in the address section until the BeingLoaded flag is turned off; i.e., for the first 19 of the 28 cycles required to service a miss, the reference is suspended in the address section; during the last 9 cycles of the miss, when the munch is transferred into the cache data section, the reference proceeds (except that a fetch or store will still be held because the cache data section is busy during these 9 cycles). This is believed to be very infrequent.

A more perfect implementation would suspend a reference in the address section only when the hit column, rather than any column in the row, was being loaded.
However, the situation is only costly when the suspended reference is by another task; since there are 64 rows, ~1.5% of all references will be held whenever any task is experiencing a miss. There is more discussion of this in the "Performance Issues" chapter.

References to storage arise as follows:
    a cache miss (from a cache reference or PreFetch_) causes a storage read;
    a cache reference or PreFetch_ miss with dirty victim also causes a storage write immediately after the read;
    a Flush_ which gets a dirty hit causes a FlushStore read reference, which in turn causes a storage write of the dirty victim;
    every io reference causes a storage read or write;
    a Map_ causes a reference to storage (actually only the map is referenced, but the timing is the same as for a full storage reference).

The following table shows the activity in various parts of the memory system during a fetch that misses in the cache and displaces a dirty victim; the memory system is assumed idle initially and nothing unusual happens.

Table 15: Timing of a Dirty Miss

    Time       Activity of Fetch                    Time       Activity of Dirty-Victim Write
    (cycles)                                        (cycles)
    0          Fetch_ starts
    1          in address section
    2-9        in address section (wait for map)    3-18       in ST automaton (generate syndrome,
                                                               transport to storage)
    2-9        in map automaton *                   10-17      in map automaton *
    7-14       in memory automaton *                15-22      in memory automaton *
    14-21      in Ec1 automaton                     22-29      in Ec1 automaton **
    21-28      in Ec2 automaton                     29-36      in Ec2 automaton **
    27         _Md succeeds

* The map automaton continues busy for two cycles after a reference is passed to the memory automaton because it is necessary for the Map storage chips to complete their cycle.
** The work of the dirty-victim write is complete after it has finished with the memory automaton, but it marches through Ec1 and Ec2 anyway for
fault reporting.

STOP! The sections which follow are about the Map, Pipe, Cache, Storage, Errors, and other internal details of the memory system. Only programmers of the fault task or memory system diagnostic software are expected to require this information. Since there are many complications, you are advised to skip to the next chapter.

The Map

VA is transformed into a real address by the map on the way to storage. The hardware is easily modifiable to create a page size of either 256, 1024, or 4096 words and to use either 16k, 64k, or 256k ic's for map storage. The table below shows the virtual memory (VM) sizes achievable with different map configurations. However, the cache configuration limits VM size independently, as discussed later, and this limit may be smaller than the Map limit.

This aging is performed by first clearing the entire cache using the clever algorithm discussed earlier, then sampling and zeroing Ref.

DeliverPage first returns the RP of any page on the vacant queue. If the vacant queue is empty, it next scans entries on the disk write-complete queue; if one is found that has not been referenced in the interim, its map entry is cleared and its RP is returned; if referenced, it is moved to bin 0. If the disk write-complete queue is empty, entries in bin 7 are scanned; if this bin is exhausted, bin 6 is scanned, etc., until finally bin 0 is scanned.
When an entry has been referenced, it is moved to bin 0; when unreferenced but dirty, it is put on the disk write queue; when unreferenced and clean, it is returned.

The caller of DeliverPage will frequently be a disk read or new page creation procedure. It should complete its work and then call ReturnPage(RP,0) to restore the page to the storage manager. ReturnPage will put the page on the vacant queue, if it is vacant, or into bin 0.

Mesa Map Primitives

Basic Mesa mapping primitives are:
    Associate[vp,rp,flags] adds virtual pages to the real memory, or removes them if flags=Vacant.
    SetFlags[vp,flags] RETURNS oldValue: flags reads and sets the flags. If flags=Vacant, the page is removed from the real memory.
    GetFlags[vp] RETURNS [rp,flags].

These are defined as indivisible operations and are implemented trivially on a machine with no cache (e.g., Dolphin). For example, if a SetFlags clears Dirty and sets WP, the returned value of Dirty tells correctly whether the page has been changed; no store into the page may occur between reading Dirty and setting WP.

One intended use of the primitives is illustrated by the following Mesa sequence for removing a virtual page from real memory:

    oldFlags _ SetFlags[v,WP];
    IF oldFlags.Dirty THEN WritePage[...];
    SetFlags[v,Vacant]

This sequence prevents the page from being changed during the write. Another possibility would be just to clear Dirty, and then to check it again after the write. This must be done properly, however, to avoid a race condition:

    WHILE (oldFlags_SetFlags[v, Vacant]).Dirty DO
      SetFlags[v, [Dirty: false]]; WritePage[...]; ENDLOOP

To avoid inconsistent map and cache entries, SetFlags[v, ...]
must remove entries for page v from the cache.

Unfortunately, since we don't want to make the cache removal process atomic, parts of the page already passed over by the removal process could be brought back into the cache before the process is complete. The implementation of the primitive must allow for this.

On Dorado it is, unfortunately, difficult to implement these primitives as indivisible operations because almost any change to map flags must be preceded by clearing all cache entries in the page. However, it is unacceptable to do this with TaskingOff because the time required might be as long as 16*10 cycles with 256-word pages or 64*10 cycles with 1024-word pages (one 10-cycle write for each 16-word munch, if every munch in the page is in the cache and dirty), which is too long. Consequently, io tasks will be active during the removal process, and one of them might move a munch back into the cache after it has been passed over by the removal process. For this reason, the present Mesa code flushes once with tasking on and then again with tasking off.

The implementation of SetFlags[v, ...] proceeds as follows (Associate is similar):
    Flush all cache entries for the page in question. If any entry is dirty, removal will cause a write and set Dirty in the map, as discussed earlier.
    Disable tasking.
    Flush all cache entries for the page again.
    oldFlags _ map[v].Flags
    If turning on WP: map[v].flags _ [WP: true, Dirty: false, Ref: false].
    If setting Vacant: map[v].flags _ [WP: true, Dirty: true, Ref: false].
    If turning off WP: map[v].flags _ [WP: false, Dirty: false, Ref: false].
    These are done with Map_, after which old data is retrieved from the pipe (possibly followed by PreFetch_ to set map[v].Ref true).
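To make the ordering of these steps concrete, the following Python sketch models SetFlags in software. It is a hypothetical simulation, not Dorado microcode or the actual Mesa implementation: the map is a dictionary from virtual page number to a MapFlags record, the cache is a dictionary from munch address to that entry's cache Dirty bit, and the names MapFlags, NEW_FLAGS, flush_page, and set_flags are illustrative. A 256-word page is assumed; tasking, Map_, and the pipe are not modeled, so the second flush only stands in for the one done with tasking off.

```python
from dataclasses import dataclass

PAGE_WORDS = 256   # assumed page size; the hardware also allows 1024 or 4096
MUNCH_WORDS = 16   # a cache entry holds one 16-word munch


@dataclass(frozen=True)
class MapFlags:
    wp: bool
    dirty: bool
    ref: bool


# Flags written by SetFlags for each kind of change, as listed in the text.
NEW_FLAGS = {
    "turn_on_wp": MapFlags(wp=True, dirty=False, ref=False),
    "set_vacant": MapFlags(wp=True, dirty=True, ref=False),
    "turn_off_wp": MapFlags(wp=False, dirty=False, ref=False),
}


def flush_page(cache, map_table, vp):
    """Remove every cached munch of page vp. Writing back a dirty munch
    sets Dirty in the page's map entry."""
    for munch_va in [va for va in cache if va // PAGE_WORDS == vp]:
        was_dirty = cache.pop(munch_va)
        if was_dirty:
            old = map_table[vp]
            map_table[vp] = MapFlags(wp=old.wp, dirty=True, ref=old.ref)


def set_flags(cache, map_table, vp, change):
    """Hypothetical model of SetFlags[vp, ...]; returns the old flags."""
    flush_page(cache, map_table, vp)   # first flush, with tasking on
    # On the real machine, tasking is disabled at this point.
    flush_page(cache, map_table, vp)   # catches munches reloaded by io tasks
    old_flags = map_table[vp]
    map_table[vp] = NEW_FLAGS[change]
    return old_flags


map_table = {3: MapFlags(wp=False, dirty=False, ref=True)}
cache = {3 * PAGE_WORDS: True, 3 * PAGE_WORDS + MUNCH_WORDS: False}
old = set_flags(cache, map_table, 3, "turn_on_wp")
# old.dirty is True: the dirty munch was written back before the flags were sampled
```

Because the flush writes back the dirty munch and sets Dirty in the map before the old flags are sampled, the returned flags correctly report that the page was modified, which is the property the indivisible primitive is meant to guarantee.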
Note: These primitives do not support the complete cache flush discussed earlier; another primitive will probably be needed to do this. Also, we really want a primitive that will allow the flags to be sampled and Ref zeroed without changing the value of WP or Dirty. And efficiency may demand primitives particularly tailored to the needs of whatever storage management algorithm is employed.

The Pipe

Information about each reference is recorded in the 16-word pipe memory. Pipe layout is shown in Figure 10, which you should have in front of you while reading this section. The processor reads the pipe with the B_Pipe0, ..., B_Pipe5 functions. You should note that Pipe0, 1, and 5 are read high-true, while Pipe2 and 3 are read low-true; Pipe4 contains a mixture of high- and low-true fields; 150361 (octal) xor Pipe4' produces high-true values for all fields in Pipe4. The discussion in this section assumes that all low-true fields have been inverted.

It is illegal to do ALU arithmetic on pipe data (not valid soon enough for carry propagation), and B_Pipei is illegal in the same instruction with a reference because Hold won't be computed properly.

The EmulatorFault, NFaults, and SRNFirstFault stuff in Pipe2, which duplicates what B_FaultInfo would read back, is not part of the pipe, although it is read by B_Pipe2'; B_Pipe2' is simply a convenient decode for reading it back. This will be discussed in the section on fault handling, not here.

Similarly, the Dirty, Vacant, WP, BeingLoaded, NextVictim, and Victim stuff in Pipe5 is not part of the pipe and is read back by B_Pipe5 purely for decoding convenience. This information, used primarily for debugging, is discussed later.

The Task, SubTask, VA, and cache control stuff in Pipe0, 1, 2, and 5 is used both internally by the memory system and externally by the processor.
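The polarity rules for reading the pipe can be gathered into one helper. This Python sketch is an illustration only: the function name high_true is hypothetical, pipe words are assumed to be 16 bits wide, and the helper simply applies the rules stated above (complement Pipe2 and Pipe3, xor the Pipe4 value with 150361 octal, pass Pipe0, Pipe1, and Pipe5 through unchanged).

```python
WORD_MASK = 0o177777    # pipe words are 16 bits wide
PIPE4_XOR = 0o150361    # xor mask given in the text for the Pipe4 value


def high_true(pipe_index, raw):
    """Convert the word read by B_Pipe<pipe_index> to all-high-true form."""
    raw &= WORD_MASK
    if pipe_index in (0, 1, 5):      # already read high-true
        return raw
    if pipe_index in (2, 3):         # read low-true: complement every bit
        return raw ^ WORD_MASK
    if pipe_index == 4:              # mixed polarity: flip only the low-true fields
        return raw ^ PIPE4_XOR
    raise ValueError("B_Pipe functions cover Pipe0 through Pipe5 only")
```

Since xor is its own inverse, applying the Pipe4 mask twice returns the original word, which is a convenient sanity check in diagnostic code.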
Map and error stuff in Pipe3 and 4 is solely for memory management and diagnostic activities carried out by the processor.

Two main problems in dealing with the pipe are:
    finding the pipe entry for a particular reference;
    knowing when various bits become valid.

How the Pipe Is Addressed

System microcode is expected to use the pipe in only two situations: fault handling by task 15 (the "fault task") and reading map or base registers by task 0 (the "emulator"). Other tasks will not read the pipe. This rigid view of how the pipe will be used during system operation has motivated the implementation discussed below.

Pipe entries are addressed by 4-bit storage reference numbers, or SRNs, assigned to each storage reference. All task 0 and task 15 references except PreFetch_ with miss (and implicit FlushStore and Victim references) use the SRN contained in ProcSRN exclusively; all other references share SRNs 2 to 15, which form a ring buffer addressed by an invisible register called ASRN.

To read a pipe entry, first ProcSRN_B addresses the pipe entry, then the contents of that entry are read with B_Pipei. In system microcode, the emulator is expected to keep the value 0 in ProcSRN to avoid smashing the ring buffer on references; if the fault task needs to make a reference, it will normally load ProcSRN with 1 and use that SRN for the reference; the fault task will manipulate ProcSRN however it likes to examine the pipe but always restore it to 0 before blocking; other tasks will not use ProcSRN. This implementation is welded to the assumption that only the fault task will probe the pipe when io tasks are running.

To io task references and emulator PreFetch_'es that miss, the cache address section's SRN, called ASRN, is assigned at t2.
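The SRN rules can be summarized in a small model of which pipe entry a reference uses. The Python sketch below is hypothetical: SrnModel and its parameters are illustrative names, ProcSRN is treated as an ordinary integer, the implicit FlushStore and victim references are ignored, the initial ASRN value of 2 is an assumption, and the caller says whether the reference starts the map, since ASRN advances only for references that start the map.

```python
RING_FIRST, RING_LAST = 2, 15   # SRNs 2..15 form the ring buffer
EMULATOR, FAULT_TASK = 0, 15


class SrnModel:
    def __init__(self, proc_srn=0):
        self.proc_srn = proc_srn   # ProcSRN, used by tasks 0 and 15
        self.asrn = RING_FIRST     # invisible ring pointer (initial value assumed)

    def assign(self, task, prefetch_miss=False, starts_map=True):
        """Return the SRN (pipe entry) used by one reference."""
        if task in (EMULATOR, FAULT_TASK) and not prefetch_miss:
            return self.proc_srn
        srn = self.asrn
        if starts_map:
            # advance around the ring 2, 3, ..., 15, 2, ...
            self.asrn = RING_FIRST if self.asrn == RING_LAST else self.asrn + 1
        return srn
```

A reference that does not start the map (for example, a cache reference that hits) leaves ASRN unchanged, so the next reference reuses the same ring entry.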
ASRN will be advanced to the next ring value iff the reference starts the map. In all other cases ASRN remains unchanged and is used by the next reference as well.

A reference starts the map unless it is a DummyRef_, a cache reference or PreFetch_ that hits, or a Flush_ that misses or gets a clean hit. A convenient way to guarantee that the map is started without worrying about the contents of the cache is to do a Map_ in the emulator or an IOFetch_ in any other task. The reasoning behind this treatment of ASRN is explained in the section on fault reporting.

Tasks 1 to 14 generally cannot find out the SRN for their last reference. Even if this were determined somehow by polling all the pipe entries, there would be no assurance that, meanwhile, a higher priority task didn't clobber the pipe entry.

Because of its single pipe entry, the emulator must wait for an earlier reference to finish or fault before starting another. Of all emulator references, only a fetch, Store_, or PreFetch_ might fault. However, PreFetch_ doesn't use the private pipe entry, so only a preceding fetch or Store_ might still be in progress when a new reference is issued. If the new reference is another fetch or Store_, it will hold until the preceding one finishes (no problem). Hence, the only restriction imposed by the private pipe entry is that the emulator must cause hold with _Md before issuing Map_, Flush_, or DummyRef_, if a fetch or Store_ might still be in progress.

Timing constraints do not permit generating Hold in the above case. It has been observed that issuing Map_ without holding for a previous Store_ to finish will result in infinite DBufBusy (i.e., infinite Hold), so do not fail to issue _Md before or concurrent with Map_ or RMap_.

When the Pipe Is Accessed

Conceptually, the pipe is three different memories. First, VA, task, subtask, and cache control bits in Pipe0, 1, 2, and 5 are written during the reference.
Next, the 20 bits of map information in Pipe3 and Pipe4 are written following the map read-write (if any). Finally, the error correction-detection stuff in Pipe4 is written following the storage read (if any). The memory system needs one cycle for each of these accesses.

However, the hardware treats the pipe as only two separate memories internally, or as only a single memory for purposes of holding the processor. In other words, within the memory system Pipe0, 1, 2, and 5 may be accessed by one part of the pipeline, while another part independently accesses Pipe3 and 4. But processor accesses by B_Pipei are held if the memory system wants any part of the pipe. Worse, the memory system uses the pipe between even clocks (t0 to t2), the processor between odd clocks (t1 to t3), so the processor is locked out for two cycles during each of these intervals.

Programs can safely read Pipe0, Pipe1, Pipe2, or Pipe5 (i.e., task, subtask, VA, and cache control stuff) in the cycle after any reference, since these are updated at the end of the cache address section cycle. B_Pipei in the cycle after a reference will hold for one cycle while the memory system uses the pipe.

Values in a pipe entry are not reset at the onset of a reference, and Pipe3 and Pipe4 are not written at all unless storage is accessed. Consequently, Pipe3 and Pipe4 may refer to a previous reference (*Caution*).

The control bits in Pipe2' and Pipe5, used by the memory system, also indicate (to the fault task) what kind of reference is described in the pipe, as follows:
    CacheRef     a fetch or Store_
    Store'       Store_'
    IFURef       IFU fetches
    RefType      distinguishes read, write, Map_, and other references
    FlushStore   dirty victim write triggered by Flush_
    ColVic       cache column of a hit, or of the victim on a miss

DummyRef_ finishes immediately, and only VA in Pipe0 and Pipe1 and the stuff in Pipe2 are relevant. For Flush_, cache information in Pipe5 is also valid. Flush_ finishes immediately because the resulting FlushStore and dirty-victim write references (if any) are started in ring-buffer pipe entries.

Programs can read map stuff (Pipe3 and Ref, WP, Dirty, MapTrouble, and MemError in Pipe4) as soon as that part of the reference is complete. For Map_, completion of the map read is coincident with MapBufBusy going false, determined by polling. For a fetch or store, there is no way to distinguish completion of the map read from completion of the entire reference. Consequently, Pipe3 and Pipe4 are normally read by doing _Md (which holds for completion), then reading the pipe.

For IOFetch_, IOStore_, and PreFetch_ there is no way to tell when the reference has finished, except by waiting longer than the memory can possibly take to complete the reference.

IOStore_'s and dirty victim writes zero the Syndrome and EcFault fields in Pipe4. Hence, the only reference that leaves junk in these bits is Map_; the fault task can distinguish pipe entries for Map_ by means of the RefType field.

All data in Pipe0, 1, 2, and 5 except FlushStore and ColVic are written at t3, and can be read immediately after a reference. However, FlushStore and ColVic are written at t4.
Ordinarily, this would mean that their values could not be read safely; however, since B_Pipei is held in the cycle after a reference, the values will always be ok.

In the best case, map information in Pipe3 and Pipe4 will be loaded at t14, fault and error corrector information in Pipe4 at t48.

Faults and Errors

Remember that high-true values for all fields in the Pipe are used in the following discussion.

Errors

Several events cause memory errors from which the system does not recover. Errors halt the processor if the MemoryPE error is enabled (see "Error Handling"). If MemoryPE is disabled, the program will continue without any error indication. MemoryPE conditions are:
    Byte parity errors from the cache data memory (checked on write of a dirty victim, not on _Md or IFU reads); the processor checks Md parity (see "Error Handling") and the IFU checks F/G parity;
    Byte parity errors from the fast input bus;
    Cache address memory parity errors.

Faults

Other events cause faults. A fault condition is indicated in the MapTrouble, MemError, and EcFault fields of Pipe4 when it occurs; in addition, the fault task is woken to deal with the situation unless NoWake is true in Mcr.
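A fault-task handler has to map the MapTrouble, MemError, and EcFault bits of Pipe4 onto the conditions listed in Table 17 below. The following Python sketch is a hypothetical decoder, not system microcode: it assumes the bits have already been made high-true, is_write is an illustrative input standing for whatever the handler derives from the Store' bit in Pipe5 (or the RefType field) to tell whether the reference was a write, and wp is the WP bit from Pipe4. Combinations that do not appear in the table are rejected.

```python
def classify_fault(map_trouble, mem_error, ec_fault, is_write, wp):
    """Return 'MapPE', 'PageFlt', 'WPFlt', 'SE', 'DE', or None (no fault)."""
    if map_trouble:
        if mem_error:
            return "MapPE"
        # PageFlt and WPFlt share an encoding; only a write that met WP is a WPFlt.
        return "WPFlt" if (is_write and wp) else "PageFlt"
    if ec_fault:
        return "DE" if mem_error else "SE"
    if mem_error:
        raise ValueError("MemError without MapTrouble or EcFault is not in Table 17")
    return None
```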
The encoding of the various errors is as follows:

Table 17: Fault Indications

    Kind of Error       Name      MapTrouble   MemError   EcFault
    Map parity error    MapPE     1            1
    Page fault          PageFlt   1            0
    Write-protect       WPFlt     1            0
    Single error        SE        0            0          1
    Double error        DE        0            1          1

In the above table, WPFlt and PageFlt have the same encoding; these must be distinguished by means of the Store' bit in Pipe5 and the WP bit in Pipe4; WPFlt can only occur for Store_, IOStore_, or dirty-victim stores that encounter WP true.

MapTrouble might be true and reported to the fault task on a fetch or store that misses or on an IOFetch_, IOStore_, FlushStore, or dirty-victim write. Flush_ and DummyRef_ never cause MapTrouble. Map_, PreFetch_, or IFU fetches might record MapTrouble in the pipe but never wake the fault task. Map faults on IFU fetches are reported instead to the IFU, which buffers the fault indication until an IFUJump occurs to an opcode with at least one instruction byte in the word affected by the map fault; then a trap occurs, as discussed in "Instruction Fetch Unit".

In system microcode, we expect a WPFlt or PageFlt due to IOFetch_, IOStore_, FlushStore, or a victim write to indicate a programming error; however, MapPE might occur. Note that if any kind of MapTrouble occurs on a storage write (i.e., on an IOStore_, FlushStore, or victim write), storage is not modified and contains the old value; however, the map's Dirty bit will be true, even though the storage write has not completed.

SE and DE may occur on any cache reference or PreFetch_ that misses or on an IOFetch_. Map_, IOStore_, DummyRef_, and Flush_ never cause these errors. Also note that fault task wakeup on an SE requires not only NoWake false but also ReportSE true (i.e., the ReportSE' bit false) in Mcr; the fault indication transmitted with the munch for an IOFetch_ is set only for DE, never for SE.

Unlike map faults, data errors on IFU fetches and PreFetch_'es are reported to the fault task.
This must be done for DE's, which are fatal; for corrected SE's, the fault causes no disruption to the program because the fault task, after logging the failure, simply lets the task that faulted continue.

MOS RAMs used on storage boards (and in the map) must be refreshed at regular intervals, else they drop data. This occurs during refresh references once every 16 µs. Every MOS RAM on every storage board participates in every refresh reference, and one row of data is refreshed each time. This means that 64 (4k RAMs), 128 (16k RAMs), or 256 (64k RAMs) refresh references are required to refresh all data, so the refresh period is about 2 ms with 16k RAMs or 4 ms with 64k RAMs. The specification on both 16k and 64k RAMs is a 2 ms refresh period at the maximum operating temperature (85°C). The dominant leakage term is exponential in temperature, so the refresh period can be doubled for each 5°C drop in operating temperature. Because the specification is conservative and because we have no intention of operating anywhere near 85°C, a 4 ms refresh period should be adequate.

The time for each refresh reference is 8 cycles (13 cycles with 4k-bit RAMs), the same as normal references. Refresh hardware competes for storage access with the cache data section and fast io references.
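The refresh arithmetic can be checked directly. This Python sketch takes one refresh reference every 16 µs, the interval implied by the stated 2 and 4 ms refresh periods, and the rule that the permissible period doubles for each 5°C drop below the 85°C specification point; the function names are illustrative.

```python
REFRESH_INTERVAL_US = 16                              # one refresh reference every 16 us
REFRESH_REFS = {"4k": 64, "16k": 128, "64k": 256}     # references needed to cover all rows


def refresh_period_us(ram):
    """Time for refresh references to visit every row once, in microseconds."""
    return REFRESH_REFS[ram] * REFRESH_INTERVAL_US


def allowed_period_ms(temp_c, spec_ms=2.0, spec_temp_c=85.0):
    """Permissible refresh period, doubling for each 5 C below the spec point."""
    return spec_ms * 2 ** ((spec_temp_c - temp_c) / 5)


for ram in REFRESH_REFS:
    print(ram, refresh_period_us(ram), "us")
# 4k 1024 us
# 16k 2048 us
# 64k 4096 us
```

At 75°C, for example, allowed_period_ms(75) gives 8.0 ms, comfortably above the 4.096 ms needed with 64k RAMs, which is consistent with the conclusion that a 4 ms refresh period should be adequate.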
During the first 8 µs of each 16 µs period, refresh defers to normal references; during the last 8 µs, it preempts normal references.

The Cache

The physical cache structure consists of 256 entries in an array of 64 rows by 4 columns. Each entry holds 15 address bits, a parity bit for the address bits, four flag bits, and one munch of data (= 256 data bits + 32 check bits). Hence, the cache holds a total of 4k words of data.

The address section is implemented with 256-word RAM's, but only 64 words are presently used. The data section uses 1k x 1 RAM's for storage. When sufficiently fast 4k x 1 ECL RAM's become available, we plan to use them in the cache data section and utilize all 256 words in the address section. In this case, the cache geometry will be 256 rows by 4 columns (16k words in the data section).

The cache address section stores 4 flag bits discussed below, 15 VA bits, and 1 parity bit. The way the VA bits are assigned depends upon whether or not 4k x 1 ECL RAM's are used in the cache data section. VA[7:19] are stored in the address section for all configurations. Two other bits are either VA[5:6] or VA[20:21]; VA[5:6] are used with 4k ic's in the cache data section (VA[20:21] then appear in the row address of the cache, so they don't have to be stored). The hardware is also arranged so that the parity bit may be replaced by VA[4].

In other words, the cache initially implements a 2^25-word virtual memory with provision for expanding this to 2^27 words when 4k x 1 RAM's are available or to 2^28 words at the cost of eliminating the parity bit in the address section. However, the map organization also limits virtual memory, probably to a smaller size than the cache limit, as discussed earlier.

Normally, the cache is invisible to the programmer except for problems with map/cache consistency discussed in the map section.
However, features discussed below in "Testing" allow more direct access for checkout, initialization, and error recovery.

An address VA, if in the cache at all, must be in one of the four columns of the row addressed by VA[22:27] (or VA[20:27] if the cache is expanded). References compare the appropriate 15 or 16 bits of VA[4:21] with the values stored in each of the 4 columns to determine which cache entry, if any, contains VA.

The VNV memory contains two two-bit entries for each row of the cache. The Victim field specifies the cache column displaced if a miss occurs in this row. The NextV field is the next victim. When a miss or a hit in Victim occurs, Victim_NextV is done. When a miss, hit in Victim, or hit in NextV occurs, NextV_Victim.0',,NextV.1' is done (i.e., NextV is loaded with a value different from both the original NextV and Victim). This strategy is not quite LRU, since there is a 50-50 guess on the ordering of the third and fourth replacements.

This treatment of VNV is used for fetches, Store_, PreFetch_, and IFU fetches but not for IOFetch_, IOStore_, or Map_, which don't use the cache.

On a Flush_, Victim is written with 0 on a miss or with the column of the hit, and NextV is written with Victim.0',,NextV.1'. If the Flush_ hit a dirty cache entry, then a FlushStore reference is fabricated which will wind up writing Victim (= column hit by the Flush_) back into storage. The FlushStore reference will also do Victim_NextV and NextV_Victim.0',,NextV.1' again.
This means that the VNV entry for the row touched by a Flush_ is effectively garbaged, which probably won't affect performance much.

A better strategy for Flush_ and IOStore_ would be as follows: on a miss, Victim and NextV remain unchanged; on a hit in a column different from Victim, Victim_hit column and NextV_Victim; on a hit in Victim, no change.

The UseMcrV feature discussed in "Testing" allows Victim and NextV to be replaced by McrV and McrNV.

Associated with each cache entry are four flag bits that keep track of its state, as follows:
    Dirty - set by Store_, cleared when loaded from storage. This bit does not imply anything about the map's Dirty bit. The cache Dirty bit causes a storage write when the entry is chosen as victim, and the map's Dirty bit is set at that time.
    Vacant - set by hit on Flush_, hit on IOStore_, or Store_ into a write-protected entry; cleared when the entry is loaded from storage. Vacant is not set after an SE or DE. Vacant prevents the entry from matching any VA presented to the cache.
    WriteProtect - a copy of the map's WP bit. It is copied from the map when the cache entry is loaded and not subsequently changed. If a Store_ is attempted into a write-protected entry, the entry is invalidated, there is a cache fault, and a write protect fault will be reported by the map.
    BeingLoaded - set while an entry is waiting for data from storage. Any reference that hits in the same row will remain in the cache address section until the bit goes off; any reference or _Md following the one which hit a row being loaded will be held.

Remark: At the end of a miss, data from the error-corrector is loaded into the cache 16 bits/clock. Not until all 16 words of the munch have been loaded is Md loaded and the task (which has been held) allowed to continue. A scheme whereby the word being waited for is loaded into Md concurrent with writing it into the 1k x 1 RAM's has been considered but rejected as too complicated. This would reduce average miss time from about 28 cycles to about 24.
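The row selection, tag comparison, and Victim/NextV replacement rule described in this section can be put together in a small software model. The Python sketch below is hypothetical and covers only the current 64-row configuration: it assumes VA bits are numbered 0 to 31 from the most significant end (so VA[28:31] selects the word within a munch), stores VA[7:21] as the 15-bit tag, represents a Vacant entry as None, and applies the VNV update for fetches and stores only, with no Flush_, data, or flag handling. CacheRow, CacheModel, va_bits, and next_v_value are illustrative names.

```python
ROWS, COLUMNS, MUNCH_WORDS = 64, 4, 16


def va_bits(va, first, last):
    """Extract VA[first:last], numbering bits 0..31 from the most significant end."""
    width = last - first + 1
    return (va >> (31 - last)) & ((1 << width) - 1)


def next_v_value(victim, next_v):
    """NextV <- Victim.0',,NextV.1' (bit 0 is the high-order bit of the 2-bit field)."""
    return (((victim >> 1) ^ 1) << 1) | ((next_v & 1) ^ 1)


class CacheRow:
    def __init__(self):
        self.tags = [None] * COLUMNS   # None stands for a Vacant entry
        self.victim, self.next_v = 0, 1


class CacheModel:
    def __init__(self):
        self.rows = [CacheRow() for _ in range(ROWS)]

    def reference(self, va):
        """Look up VA; return (hit, column). A miss loads the victim column."""
        row = self.rows[va_bits(va, 22, 27)]
        tag = va_bits(va, 7, 21)
        new_next_v = next_v_value(row.victim, row.next_v)  # uses the old values
        hit = row.tags.index(tag) if tag in row.tags else None
        if hit is None:
            column = row.victim
            row.tags[column] = tag
            row.victim, row.next_v = row.next_v, new_next_v
            return False, column
        if hit == row.victim:
            row.victim, row.next_v = row.next_v, new_next_v
        elif hit == row.next_v:
            row.next_v = new_next_v
        return True, hit
```

Starting from Victim = 0 and NextV = 1, four consecutive misses in one row land in columns 0, 1, 2, 3; a hit in the current Victim then moves the replacement point on, so the next miss displaces a different column.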
fp$q 5pFf b1qp `S5qp ^ L \_ Z$8 Y)(/ W^D U7 R"Z PWI N9" LE J8 I-MyFkt5-yE DyCyA stIsty?st <\pFx9qp9x7Ox6(5x3qpJx15x/!-x-Vq p)x+9x)8x'*x%Xq pFx#Lx!7x s tB+ =+  c d  0 =U*:Dorado Hardware ManualMemory Section14 September 198160InitializationThis section outlines the order in which parts of the memory system can be initialized.ClocksThe instruction decoding flipflops of the memory section are enabled when the processorclocks are enabled. All other memory clocks are enabled by a signal called RunRefresh, asdiscussed in "Dorado Debugging Interface".When RunRefresh is true, clocks internal to the memory system always happen, even if theprocessor is halted. When RunRefresh is false, memory clocks run with the processor.Except for low-level debugging of the memory system itself, RunRefresh should be true.Otherwise, storage will not retain data at breakpoints.Mcr RegisterThe Memory Control Register (Mcr) contains fields that affect the memory system (seeFigure 10). Mcr is intended to facilitate testing, and in some cases initialization. Theregister can be loaded with the Mcr_ function and read back over the DMux. Bits in Mcrare as follows (Some of these bits are loaded from A and others from B, as indicated inFigure 10):dVA_VicOn each reference, write the cache address entry selected by the rowof VA and column of Victim (Note: Victim determines the column, evenon a hit) into VA of the pipe, so that VA[4:21] in the pipe contain theaddress from the cache. Also prevent both map and storageautomata from starting (which prevents ring buffer pipe entries frombeing allocated to these as well). FDMiss should always be true whendVA_Vic is true.FDMiss"Force dirty miss" forces each cache reference to miss and store thevictim, even if not dirty. 
Misses caused by FDMiss do not cause Hold (*details*).
UseMcrV - Use McrV as victim and McrNV as next victim for all cache misses instead of Victim and NextV from VNV.
McrV - The two-bit victim, or cache column used on a miss, when UseMcrV is true.
McrNV - The two-bit next victim when UseMcrV is true.
DisBR - "Disable base registers" prevents base registers from being added to D in computing VA and prevents BR from being written.
DisCF - "Disable cache flags" forces cache flags to read out as zeroes and prevents them from being written.
DisHold - "Disable Hold" unconditionally prevents Hold and BLretry from occurring.
NoRef - Disable storage references.
WMiss - Wake up fault task on every miss.
ReportSE' - Don't wake up fault task after (correctable) single errors.
NoWake - Never wake up fault task.
During normal operation every bit in Mcr should be 0, except possibly ReportSE', if correctable errors are not being monitored. It is illegal to load Mcr while references are in progress (changing DisHold is known to cause problems).
System Initialization
System initialization must get the map initialized as desired and the cache in agreement with the map. Initialization firmware should allow for cache rows containing several entries for the same address, which might occur after power-up or after running diagnostics.
There are many ways to carry out this initialization. One is as follows:
1. Set NoWake and DisHold true in Mcr, so that the fault task won't disturb initialization and BeingLoaded conditions won't cause trouble.
2. Clear TestSyndrome.
3. Load the map as desired. Clear the cache as discussed in the Map section. After this, the cache will be empty, and Ref and Dirty in map entries will have been smashed.
4. Reload the map as desired.
5. Read FaultInfo to kill any pending wakeup for the fault task.
6. Set up Mcr for normal activity (0 or ReportSE').
Testing
This section outlines the order in which parts of the memory system can be tested, so that only a few new components are involved at each step.
VA, Adder, BR's
The first step is to set NoWake, FDMiss, and DisBR to true. Now processor references will deposit Mar in VA of pipe entry 0 (in the emulator), or of every other pipe entry (in other tasks), so that this part of the pipe can be tested (LongFetch_ allows all the VA bits to be tested). Next, setting DisBR false, loading BR's, and making more processor references will allow the BR's and the adder to be tested.
Cache Address Storage
Then set NoWake, FDMiss, and UseMcrV to true and use McrV and McrNV to select one column of the cache at a time. Each processor reference will store its VA into that column and into the pipe, and will read out the old VA into the next ring buffer pipe entry (as the victim, because FDMiss is true). This allows the VA bits in the address memory to be initialized and tested. The column number in Pipe2 should read back the value in McrV in this case.
Above, address memory values are read using FDMiss, then VA is checked in the pipe entry created for the victim. A simpler method of reading any address section VA is as follows: turn on DisBR, UseMcrV, and dVA_Vic. On processor references, the cache entry addressed by Mar[6:11] (the row) and McrV (the column) will then have its VA[7:21] written into VA[7:21] of the pipe entry for the reference.
The flag bits in the address section can be directly tested using B_Pipe5 and CFlags_A. These functions operate on the cache entry addressed by the row of the last reference and the column of the hit (or of the victim, on a miss).
Since the IFU or another task could have issued the last reference, these functions are realistically limited to initialization and checkout, where the last reference is known. Normally they will be used with UseMcrV and FDMiss true in Mcr, so that McrV selects the column.
B_Pipe5 also reads V and NV from the selected row. CFlags_A won't work if DisCF is true, and B_Pipe5 will read zeroes for all four flags in this case.
CFlags_A requires that Mar data continue without glitching during the preceding instruction as well. This means that data originating in RM or T must not have been loaded during either of the two previous instructions (else a glitch might occur when the multiplexor switches from the bypass path to the direct path) and that no higher priority task may intervene between the two instructions. Issuing CFlags_A in both instructions is the easiest way to drive Mar continuously for two cycles.
Cache Data Storage
Next, initialize the cache address section VA's and flags so that the cache data section can be tested. To do this, turn off FDMiss while leaving on NoWake, dVA_Vic, and UseMcrV. Initialize the address section to a convenient range of virtual addresses by Store_'s to each munch with appropriate McrV values. In the instruction after each reference, use CFlags_A to write the flags to WP = false, BeingLoaded = false, Vacant = false.
At the end of this setup, the address section will be loaded and have write access to the desired virtual addresses. Hence, Fetch_'es and Store_'s to these VA's will not miss and will access the 4k of cache data memory, which can thus be systematically tested.
Map
Next, turn off UseMcrV, leaving only NoWake turned on, and use Map_ to test the map.
At the end of this test, initialize the map, say, to map virtual addresses into corresponding real addresses.
Main Storage
Then, finally, the storage can be accessed and tested with fetches and Store_'s. FDMiss can be used to force storage references.
Fault Reporting
NoWake can be turned off, and methods similar to the above can be used to test fault reporting.
IOFetch_, IOStore_, Fast IO Busses
Special hardware is needed to test these (the IOTest board).
Error Correction
In normal operation TestSyndrome contains 0, and Syndrome, written by the error corrector, should be 0 if no error was corrected or detected. For test purposes, TestSyndrome can be loaded with any non-zero value, and one bit disables error correction altogether. If there are no storage failures, TestSyndrome should wind up in Syndrome after a storage read. The error corrector, MemError, ECfault, ReportSE', and fault reporting can be tested using TestSyndrome.
The LoadTestSyndrome function causes TestSyndrome to be loaded from DBuf. This should normally be done after a Store_, as follows:
    TaskingOff;
    Store_RMAddr, DBuf_T;    *DBuf_data for TestSyndrome
    LoadTestSyndrome;
    TaskingOn;
TaskingOff is required because an intervening higher priority task might change the contents of DBuf.
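The Mcr settings named in the initialization and testing procedures above can be collected into a table. This Python sketch is illustrative only: the Mcr class, the bit positions assigned by auto(), and the helper names is_normal and step_changes are hypothetical (the real layout is in Figure 10), the two-bit McrV and McrNV fields are omitted, and each phase lists only the bits the text says to turn on.

```python
from enum import Flag, auto
from typing import Tuple


class Mcr(Flag):
    """Single-bit Mcr fields named in the text.

    Bit positions come from auto() and are NOT the hardware layout
    (see Figure 10). The two-bit McrV and McrNV fields are omitted.
    """
    dVA_Vic = auto()
    FDMiss = auto()
    UseMcrV = auto()
    DisBR = auto()
    DisCF = auto()
    DisHold = auto()
    NoRef = auto()
    WMiss = auto()
    ReportSE_ = auto()  # ReportSE' in the manual
    NoWake = auto()


def is_normal(mcr: Mcr) -> bool:
    """Normal operation: every bit is 0, except possibly ReportSE'."""
    return (mcr & ~Mcr.ReportSE_) == Mcr(0)


def step_changes(prev: Mcr, cur: Mcr) -> Tuple[Mcr, Mcr]:
    """Return (bits to turn on, bits to turn off) going from prev to cur."""
    return cur & ~prev, prev & ~cur


# System initialization, step 1.
INIT = Mcr.NoWake | Mcr.DisHold

# Testing phases, in the order given in the text.
TEST_PHASES = [
    ("VA and pipe", Mcr.NoWake | Mcr.FDMiss | Mcr.DisBR),
    ("BR's and adder", Mcr.NoWake | Mcr.FDMiss),
    ("cache address storage", Mcr.NoWake | Mcr.FDMiss | Mcr.UseMcrV),
    # dVA_Vic may already be on from the alternative VA-read method.
    ("cache data storage", Mcr.NoWake | Mcr.dVA_Vic | Mcr.UseMcrV),
    ("map", Mcr.NoWake),
]

if __name__ == "__main__":
    prev = Mcr(0)
    for name, bits in TEST_PHASES:
        on, off = step_changes(prev, bits)
        print(f"{name}: turn on {on}, turn off {off}")
        prev = bits
```

Running the module prints, for each phase, which bits change relative to the previous one. Main storage and fault reporting are left out because the text does not spell out their exact Mcr settings.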