Dorado Hardware Manual, Instruction Fetch Unit, 14 September 1981
IFU Reset
The processor can reset the IFU by executing the IFUReset function. This clears all IFU error conditions, prevents further IFU memory references, clears the BrkIns← feature discussed earlier and the test features discussed later, and generally puts the IFU in a clean and operable state. The Reschedule feature is not affected by IFUReset.
IFUReset should be executed after power-on to shut off the IFU. A single IFUReset makes the IFU passive with respect to the rest of the Dorado. However, the IFU itself might not be operable until a second IFUReset is executed, because of a pathological condition: if BrkIns is loaded and Testing is true, the first IFUReset clears Testing but not BrkIns, and a second IFUReset is required to clear BrkIns.
If the IFU has any memory references outstanding when the first IFUReset is executed, those references will complete and disturb the top part of the IFU pipeline. A second IFUReset must therefore be issued after these references have all finished and before IFUM is read or written. A second IFUReset executed 36 or more cycles after the first is guaranteed to reset the IFU completely.
The worst case is when a miss has just started the storage pipeline with an IFU reference in the cache address section. In this case the IFU reference does not enter the storage pipeline until the 8th cycle and then takes 28 cycles to complete.
IFUReset should be executed prior to using BrkIns←. It should also be executed after reading or writing IFUM (to reset the BrkPending condition that is still lurking).
Rescheduling
Io tasks request service from the emulator by first indicating a request in some way (presently an RM location is used as a 16-bit table in which 1's indicate requests), then executing the Reschedule function, and finally blocking. The IFU and the processor store the reschedule condition in flipflops, which remain set until the NoReschedule function turns them off.
The next IFUJump after Reschedule transfers to the entry vector for the opcode as usual; the reschedule trap address drops into the IFAddr register at t2 of this instruction, and the first IFUJump after that dispatches into the reschedule trap vector. This means that the second IFUJump will trap unless it occurs in the instruction immediately after the first IFUJump, in which case the trap will not occur until the third IFUJump. IFUJumps that experience a NotReady trap are not counted.
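The counting rule can be made concrete with a small Python model. It is a sketch under stated assumptions, not a description of the hardware: each list entry stands for one emulator instruction executed after Reschedule, tagged with whether it does an IFUJump and whether that IFUJump takes a NotReady trap. The function name and the tuple encoding are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Instr = Tuple[bool, bool]  # (does an IFUJump, that IFUJump takes a NotReady trap)


def reschedule_trap_index(instrs: Sequence[Instr]) -> Optional[int]:
    """Index of the IFUJump that dispatches into the reschedule trap vector,
    counting from the first instruction after Reschedule, or None."""
    first: Optional[int] = None  # index of the first counted IFUJump
    for i, (is_ifujump, not_ready) in enumerate(instrs):
        if not is_ifujump or not_ready:
            continue  # not an IFUJump, or a NotReady trap, which is not counted
        if first is None:
            first = i  # dispatches to the opcode's entry vector as usual
        elif i == first + 1:
            continue  # immediately after the first: trap deferred to the next one
        else:
            return i
    return None
```

For example, IFUJumps in three consecutive instructions put the trap on the third; separating the first two by any other instruction moves the trap to the second.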
The entry vector at the reschedule trap location is entered as though it were the next opcode. When Reschedule is used by io tasks to request the wakeup of another process, this fact is unimportant. However, the other use of Reschedule is in continuation from map (and other) faults. In this application, the reschedule trap will wind up restoring the IFU state by executing an appropriate number of ←Id’s and eventually branching back to the instruction that experienced the fault. The continuation method is discussed later.
Opcodes which might execute for a long time, such as block transfer and BitBlt, must check for rescheduling explicitly, and the (emulator only) Reschedule branch condition makes this check easier. If such opcodes did not check for rescheduling, then service to the io device might be postponed for too long.
The reschedule flipflops are not cleared by IFUReset, so the NoReschedule function must be executed as part of system reset.
When the reschedule trap vector is entered, the IFU is in an undefined state except for PCX’, and PCF← is needed to restart the IFU at the continuation address.
Breakpoints
BrkIns←B implements debugging breakpoints straightforwardly. The idea is that a one-byte opcode, BrkP, is used to transfer control to a debugger while saving emulator state needed to continue later, and another opcode, Continue, is used to continue from breakpoints (For Mesa, BrkP and Continue are special cases of Xfer.).
BrkP may be substituted for any opcode in a program. The debugger gets control when BrkP is executed, saves state, and eventually can execute Continue to restore state from values saved by BrkP.
Continue first restores registers, then loads BrkIns with the opcode for which BrkP was substituted; then it uses PCF←B to restart the IFU at the breakpoint. The IFU will then start running; the first opcode fetched will again be the BrkP opcode, but the contents of BrkIns will be substituted for the one fetched from memory, and the program will continue correctly.
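A toy Python model of the one-shot substitution follows. The class, the dictionary standing in for memory, and the value chosen for BrkP are all hypothetical; only the behavior (the first opcode fetched after the restart is replaced by the BrkIns contents, and later opcodes come from memory) follows the text. For simplicity it assumes every opcode is one byte.

```python
from typing import Dict, List, Optional

BRKP = 0o377  # hypothetical BrkP opcode value, for illustration only


class OpcodeFetcher:
    """Toy model of opcode fetch with a one-shot BrkIns substitution."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory                # byte address -> opcode byte
        self.brk_ins: Optional[int] = None  # pending substitute, if any

    def load_brk_ins(self, opcode: int) -> None:
        """Stands in for BrkIns←B."""
        self.brk_ins = opcode & 0xFF

    def restart(self, pc: int, count: int) -> List[int]:
        """Stands in for PCF←pc followed by fetching `count` one-byte opcodes."""
        fetched = []
        for offset in range(count):
            op = self.memory[pc + offset]
            if self.brk_ins is not None:
                op = self.brk_ins    # the first opcode fetched is replaced
                self.brk_ins = None  # and the substitution is used only once
            fetched.append(op)
        return fetched
```

Memory still holds BrkP after the restart, so the breakpoint stays set; without the BrkIns load, the same restart would fetch BrkP again and re-enter the debugger.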
Without BrkIns←B the debugger would have to simulate the broken opcode before continuing at the following opcode, which would be harder. The example below shows a code sequence for the final part of Continue.
Continue:
. . .
IFUReset;*Stop future IFU fetches and clear pipe
T←41C;
Cnt←T;
IFUReset, Goto[.,Cnt#0&-1];*Reset after previous IFU fetches complete
BrkIns←Opcode;*Load opcode which BrkP replaced
PCF←BreakAddress;*Restart IFU at address of BrkP
Noop;*No-op required after PCF← before IFUJump
IFUJump[0];*Resume program
Note: IFUReset is required before BrkIns←, even when an opcode of type Pause is in progress.
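The T←41C, Cnt←T, IFUReset, Goto[.,Cnt#0&-1] sequence is what satisfies the 36-cycle rule from the IFU Reset discussion (8 cycles before the worst-case reference enters the storage pipeline, plus 28 cycles to complete). The Python sketch below counts cycles for that sequence. It assumes one cycle per instruction and that Cnt#0&-1 tests Cnt and then decrements it; both are assumptions rather than statements from this section. Holds or task switches could only add cycles, which errs on the safe side.

```python
WORST_CASE_ENTRY = 8   # cycle at which a pending IFU reference enters the storage pipeline
REFERENCE_CYCLES = 28  # cycles that reference then takes to complete
REQUIRED_GAP = WORST_CASE_ENTRY + REFERENCE_CYCLES  # 36 cycles


def last_reset_offset(cnt_init: int) -> int:
    """Cycles from the first IFUReset to the last IFUReset of
        IFUReset; T←cnt_init; Cnt←T; IFUReset, Goto[.,Cnt#0&-1];
    assuming one cycle per instruction and that Cnt#0&-1 tests Cnt, then decrements it."""
    cycle = 2  # IFUReset at cycle 0, T← at cycle 1, Cnt←T at cycle 2
    cnt = cnt_init
    while True:
        cycle += 1  # one execution of IFUReset, Goto[.,Cnt#0&-1]
        branch_back = cnt != 0
        cnt -= 1
        if not branch_back:
            return cycle
```

Under these assumptions last_reset_offset(0o41) is 36, exactly REQUIRED_GAP, while 0o40 would give only 35.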
Reading and Writing IFUM
In addition to its function related to breakpoints, BrkIns←B is used to address IFUM when reading or writing that memory.
When IFUM is loaded, it is addressed by the instruction set InsSet[0:1] and BrkIns. The data must remain on B for two cycles, so tasking must be disabled and the instruction following the one with IFUMLH/RH← must put the same data on B. If this data comes from RM or T, the register must not have been loaded in the cycle preceding the IFUMLH/RH← (because the bypass logic will change the B select from Pd or Md to RM or T, possibly glitching data on B). The following subroutines illustrate loading and reading back IFUM.
WriteIFUM:
IFUReset;*Stop future IFU fetches and clear the pipe
T←41C;
Cnt←T;
IFUReset, GoTo[.,Cnt#0&−1];*Reset after previously issued fetches complete
InsSetOrEvent←RMaddr0;*Load 2 instruction set bits forming IFUM address
BrkIns←RMaddr1;*Load 8 opcode bits forming IFUM address
TaskingOff;*Ensure no B glitch below and let BrkIns← settle for 1 cycle
IFUMLH←RMdataHi;*Write high part of IFUM
B←RMdataHi;*Keep data good a little longer (mustn’t glitch)
IFUMRH←RMdataLo;*Write low part of IFUM
B←RMdataLo, TaskingOn;*Keep data good a little longer
IFUReset, Return;*Clear BrkIns
ReadIFUM:
IFUReset;*Stop future IFU fetches and clear the pipe
T←41C;
Cnt←T;
IFUReset, GoTo[.,Cnt#0&−1];*Reset after previously issued fetches complete
BrkIns←RMaddr1;*Load 8 opcode bits forming IFUM address
InsSetOrEvent←RMaddr0;*Load 2 instruction set bits forming IFUM address
Noop;*Two instructions must elapse after loading BrkIns
*one after loading InsSet (?Two noops after loading InsSet
*might be better since this is a tight path?)
RMdataHi←IFUMLH;*Read IFUM into RM.
RMdataLo←IFUMRH;
IFUReset, Return;*Clear BrkIns
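Both subroutines address IFUM with 2 instruction-set bits and 8 opcode bits, giving 4 × 256 = 1024 addressable entries. The helper below sketches that address formation in Python; placing InsSet in the high-order bits is an assumption made for illustration, and the names are hypothetical.

```python
IFUM_ENTRIES = 4 * 256  # 2 InsSet bits x 8 opcode bits


def ifum_address(ins_set: int, opcode: int) -> int:
    """IFUM address from InsSet[0:1] and the opcode loaded into BrkIns.
    Placing InsSet in the high-order bits is an assumption for illustration."""
    if not (0 <= ins_set < 4 and 0 <= opcode < 256):
        raise ValueError("InsSet is 2 bits and BrkIns is 8 bits")
    return (ins_set << 8) | opcode
```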
Continuing from Processor Faults
Saving and restoring the state of an interrupted program requires some cleverness not only for the IFU, but also for the Control, Processor, and Memory sections. The emulator might fault for a data error, map fault, or stack overflow/underflow; for io tasks, stack overflow/underflow is impossible and map faults will probably be illegal, so only data error faults are legitimate. The discussion here will concentrate on map faults, though the same approach could be used for other fault conditions as well.
The fault task must use as few instructions as possible so that io tasks won't be preempted for too long. At a minimum, it must copy all pipe entries that contain memory faults into RM or Stk buffers, preserve DBuf, and save the emulator's TPC; it must also deal itself with any data error faults incurred by io tasks. It then restarts the emulator at a trap address. The emulator microprogram saves the rest of the emulator state and deduces the nature of the fault(s) using the methods discussed in "Memory Section".
The emulator fault microcode first saves ALU branch conditions and task-specific registers, then other information of interest. The saved information is stored where the Mesa (or whatever) program can get at it; then the trap microcode restarts Mesa at a trap procedure that will service the map fault (probably swap in a page from the disk); eventually, state will be restored and the opcode that faulted will be resumed at the instruction that faulted.
The IFU state may be saved via B←IFUMLH’ and B←PCX’. B←IFUMLH’ reads the current instruction set and IdCnt from B[0:4]; B[5:15] are IFUM bits which are not of interest when saving the state of the program, so the tricky code sequence given earlier for reading IFUM is not required. B←PCX’ reads the current PC.
The 3-bit counter IdCnt keeps track of how many ←Id's have been done; to avoid overflowing this counter, no more than 7 ←Id's may be done when executing any opcode. This is one (harmless) restriction on coding emulators. The other is that emulators must never map fault on the instruction after a dispatch (BDispatch←B, BigBDispatch←B, or Multiply); this can be assured by doing ←Md prior to or concurrent with any dispatch.
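The layout read by B←IFUMLH’ can be sketched in Python using the Dorado convention that bit 0 is the most significant bit of a 16-bit word. The assignment IdCnt = B[0:2] and InsSet = B[3:4] is inferred from the 3-bit IdCnt and from the 14000C mask in the Save microcode below; it is not stated directly in this section, and the helper names are hypothetical.

```python
from typing import Tuple


def field(word: int, first: int, last: int) -> int:
    """Bits first..last of a 16-bit word, right-justified (bit 0 = MSB)."""
    width = last - first + 1
    return (word >> (15 - last)) & ((1 << width) - 1)


def decode_ifu_state(b_ifumlh_inverted: int) -> Tuple[int, int]:
    """(IdCnt, InsSet) from the value of B←IFUMLH’, which is complemented."""
    word = ~b_ifumlh_inverted & 0xFFFF  # same as T←Not(IFUMLH’) in the Save code
    id_cnt = field(word, 0, 2)   # assumed: IdCnt occupies B[0:2]
    ins_set = field(word, 3, 4)  # assumed: InsSet occupies B[3:4] (the 14000C mask)
    return id_cnt, ins_set
```

Because IdCnt is only 3 bits wide, the decoded count can never exceed 7, which is the source of the 7-←Id limit.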
Sample microcode for saving emulator state is as follows:
%Must first save the volatile branch conditions; Overflow and Carry won’t change unless an arithmetic ALU operation is executed, so saving them can be deferred. T, the first item saved, is written into the RM region reserved for Save using the change-RBase-for-write FF decode.
%
Save:FreezeBC, DblGoTo[ALUls,ALUge,ALU<0];
ALUls:SavedT←T;
T←1C, GoTo[SaveBC];*ALU<0
ALUge:SavedT←T, DblGoTo[ALUgr,ALUeq,ALU#0];
ALUgr:T←0C, GoTo[SaveBC];*ALU>0
ALUeq:T←2C;*ALU=0
*Have a code in T for the ALU<0 and ALU=0 branch conditions: 0 for ALU>0, 1 for ALU<0, 2 for ALU=0
*(the order of the ConTab dispatch table used by Resume).
SaveBC:SavedALULEZ←T;*Save the branch condition code
T←Pointers;*T←MemBase, MemBX, and RBase
T←T Or (100000C);*Make negative
RBase←RBase[SaveRMRegion];
*Now choose two numbers such that their sum produces the correct ALUcry and Overflow branch
*conditions.
SavedPointers←T, MemBase←SaveBaseReg, DblGoto[Cry,NoCry,Carry];
Cry:DblGoTo[CryOvf,CryNoOvf,Overflow];
NoCry:DblGoTo[NoCryOvf,NoCryNoOvf,Overflow];
CryOvf:SaveA1←100000C;
SaveA2←100000C, GoTo[SaveRest];*Numbers such that SaveA1+SaveA2 produces
*Overflow and Carry result
NoCryNoOvf:
SaveA2←0C, GoTo[.+2];
CryNoOvf:SaveA2←1C;
SaveA1←177777C, GoTo[SaveRest];
NoCryOvf:SaveA1←77777C;
SaveA2←77777C, GoTo[SaveRest];
SaveRest:
SavedPCX←Not(PCX’);
T←Not(IFUMLH’);*Read IdCnt and InsSet in IFUMLH[0:4]
SavedIdCnt←LdF[T,0,2];
T←T and (14000C);
T←RSh[T,2];
SavedInsSet←T+(100000C);*Set up word for InsSetOrEvent← below
. . .*Code to save rest of state (all easy)
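The four SaveA1/SaveA2 pairs can be checked with a short Python model of a 16-bit two's-complement add: each pair should reproduce the Carry and Overflow conditions it was chosen for when Resume executes T←SaveA1 followed by Pd←T+SaveA2. The dictionary layout and function name are illustrative only.

```python
from typing import Tuple

# (SaveA1, SaveA2) chosen by the Save code, keyed by the (Carry, Overflow) state to restore.
PAIRS = {
    (True, True): (0o100000, 0o100000),  # CryOvf
    (True, False): (0o177777, 0o1),      # CryNoOvf
    (False, True): (0o77777, 0o77777),   # NoCryOvf
    (False, False): (0o177777, 0o0),     # NoCryNoOvf
}


def add16(a: int, b: int) -> Tuple[int, bool, bool]:
    """16-bit two's-complement add; returns (sum, carry, overflow)."""
    total = a + b
    result = total & 0xFFFF
    carry = total > 0xFFFF
    sign = 0x8000
    # Overflow: both operands have the same sign and the result's sign differs.
    overflow = (a & sign) == (b & sign) and (result & sign) != (a & sign)
    return result, carry, overflow
```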
Sample microcode for continuing is given below:
Resume:. . .*Restore all processor registers except T, Cnt, RBase,
*and MemBase.
InsSetOrEvent←SavedInsSet;*Restore the IFU instruction set number.
PCF←SavedPCX;*Restart IFU at address of the opcode that faulted
WakeUp[ContTask];*Wakeup the special task used for continuation.
Noop;*No-op required so that the instruction after the IFUJump
*below will be executed by the continuation task.
Cnt←SavedIdCnt, IFUJump[0];*Continue execution in the continuation task at Cont0
Resume1:Skip[Cnt=0&−1], At[Resume1Loc];*Reissue the appropriate number of ←Id’s to put
A←Id, GoTo[.−1];*the IFU in the state it was in at the fault.
Cnt←SavedCnt;*Restore Cnt
. . .*Restore Md by fetching from a convenient storage
*location. Then repeat the Fetch← or Store← that
*faulted using a convenient base register and restore
*the base register (complicated code here needs careful
*thought).
T←SaveA1;
Pd←T+SaveA2;*Restore Carry and Overflow branch conditions.
T←SavedT, TaskingOff;*Restore T register
*Below, the TaskingOff, WakeUp, TaskingOn sequence ensures that precisely one emulator instruction will
*be executed after the TaskingOn before the continuation task runs.
BDispatch←SavedALULEZ;*Dispatch to 0, 1, or 2 in table based on
*ALU>0, ALU<0, or ALU=0.
WakeUp[ContTask];*Wakeup the special task reserved for continuation.
Link←SavedLink, At[ConTab,0];*Restore Link and ALU>0
TaskingOn;
Pd←Not(Pointers←SavedPointers), GoTo[COK];
Link←SavedLink, At[ConTab,1];*Restore Link and ALU<0
TaskingOn;
Pd←Pointers←SavedPointers, GoTo[COK];
Link←SavedLink, At[ConTab,2];*Restore Link and ALU=0
TaskingOn;
Pd←(SavedPointers) xor (Pointers←SavedPointers), GoTo[COK];
COK:FreezeBC, GoTo[.];
*The special restart task needed for continuation
ContinueInit:
RBase←RBase[SavedTPC];*Initialization code for the task
*First of two wakeups comes here—change emulator’s TPC to Resume1 and block.
Cont0:Block;
T←Resume1Loc;
Link←T, TaskingOff;
LdTPC←0C;*Restart emulator at Resume1
TaskingOn;
Block;
*Second of two wakeups comes here. Reload emulator TPC with continuation address.
Cont1:Link←SavedTPC;
LdTPC←0C;*Restart emulator at saved continue address
Branch[Cont0];
IFU Testing
The IFU test control register is loaded by the IFUTest←B function; when not testing, this register should contain 1, and it is loaded with 1 by the IFUReset function. IFUTest.15 disables the periodic wakeup request to the Junk task discussed in the "Slow IO" chapter; when IFUTest.15 is 0, the junk wakeups occur 60 times/sec and are dismissed by any IFUTest← function.
IFUTest.14 (TestEn) enables IFU test mode; it is illegal for this bit to change from 0 to 1 when the IFU is active because, if this occurred in the same cycle that an IFU memory reference was issued, then the IFU would pollute the Mar bus indefinitely, making the memory system unusable by the processor.
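A Python sketch of the two low-order IFUTest bits described above, using Dorado bit numbering (bit 0 = MSB, so bit 15 has value 1 and bit 14 has value 2). The constant and function names are hypothetical, and only these two bits are modeled because theirs are the only individual control-bit positions stated here.

```python
def ifutest_bit(n: int) -> int:
    """Value of IFUTest bit n, with bit 0 as the most significant of 16."""
    return 1 << (15 - n)


JUNK_DISABLE = ifutest_bit(15)  # IFUTest.15: when 0, Junk-task wakeups occur 60 times/sec
TEST_EN = ifutest_bit(14)       # IFUTest.14 (TestEn): IFU test mode
RESET_VALUE = 1                 # what IFUReset loads


def load_is_legal(current: int, new: int, ifu_active: bool) -> bool:
    """False only for the forbidden case: TestEn changing from 0 to 1 while the IFU is active."""
    turning_on = not (current & TEST_EN) and bool(new & TEST_EN)
    return not (turning_on and ifu_active)
```

The reset value 1 therefore leaves test mode off with the junk wakeups disabled.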
The test features aim at two situations. First, they allow the IFU clocks to be controlled by a program, so a diagnostic can slowly step the IFU pipeline through its stages. Secondly, they allow data supplied by a diagnostic to be substituted for signals that would otherwise come from the memory system. This allows the IFU to be tested in the absence of the memory system, which allows scope probes to be inserted easily and decouples IFU problems from memory system problems.
The TestFH’ and TestSH’ bits in the IFUTest register enable the first-half-cycle and second-half-cycle clocks, respectively, which will occur between t2 and t4 of the cycle after the one issuing the IFUTick function. Thus, the IFU can be stepped through a PCF←B function as follows:
TaskingOff;
IFUTest←TestEn;
IFUTick;
PCF←value;
where PCF←value is just an example—any other IFU function or an IFUJump could be used instead.
The IFU’s memory interface is simulated by the TestFG, TestParity, TestFault, TestMemAck, and TestMakeF←D bits in IFUTest. Memory references are not issued by the IFU when TestEn is true. TestFG and TestParity are substituted for the FG byte and parity bit from the memory system; the other signals are control signals sent by the memory system in response to IFU references. They are supposed to work as follows:
MemAck occurs at t2 of a cycle in which the IFU makes a reference at t1, iff the memory system accepted the reference; if the memory system was busy and did not accept the reference, then MemAck does not occur, and the IFU should repeat its reference. The absence of MemAck serves approximately the same purpose for the IFU that Hold serves for the processor.
MakeF←D occurs at t1 of a cycle in which the memory system loads F at t3; in the event of a map fault, MakeF←D occurs at t1 of the cycle in which the memory system would have loaded F at t3 if the map fault had not occurred. The IFU can try to start a reference at t1, even though it has an unfinished reference in progress. The memory system will accept the reference iff MakeF←D occurs; otherwise, it will refuse the reference. In other words, the IFU’s second reference starts at t1 iff the first reference will deliver data at t3.
Fault is concurrent with (?) MakeF←D and indicates that the IFU reference experienced a map fault.
In other words, a memory reference can be simulated with the IFU test feature by (1) ticking the IFU through a cycle in which it makes a reference; (2) ticking the TestMemAck response of the memory system with IFUTest←B and IFUTick; (3) ticking TestMakeF←D; (4) ticking with TestFG and TestParity holding simulated memory data.
Details of Pipe Operation
The IFU is a six-stage pipeline, starting with words fetched from memory, and ending with opcode starting addresses delivered to the control section and operands delivered to the processor. The levels are named: F, G, H, J, M and X. Each level has a data-valid bit indicating whether or not it contains something useful.
PCF, PCJ, PCM, and PCX are PC’s for the corresponding pipe levels (except that PCF is a word PC rather than a byte PC). PCF, PCM, and PCX are independent of each other since jumps and PCF← may result in these all being different; PCJ is related to PCF by the number of valid bytes in the F/G/H levels; the hardware also uses PCFG, which contains PCF plus the number of valid bytes in the F/G levels. Operationally, F/G are a FIFO in which PCF is the write pointer, incremented as words are fetched from the cache, and PCFG is the read pointer, incremented as bytes are moved from F/G into J/H. Note that there is no PCH because PCH would equal PCJ+1.
Pipe control is straightforward in principle. The F and G levels are 16-bit registers filled from the cache. Following PCF←B, if there is space in the pipeline for another word, the IFU will start a reference at t1 of any cycle in which the processor is not using Mar (so as many as 2 IFU references can be outstanding). Cache words are stored in F at t1, then dropped into G at t2; bytes drop into H at t3 or J at t4; there are bypass paths to get bytes directly from F/G into J when H is invalid. As the processor executes opcodes, F and G become invalid, and the IFU refills them from memory automatically. This continues until the IFU is reset by the processor, or encounters a pause opcode.
The F and G registers are physically located on the MemD board. The four bytes in F/G are inputs to a multiplexor controlled by the IFU, and the multiplexor output is sent across the backplane to the IFU. BrkIns[0:7] or IFUTest[0:7] replace F/G data when using breakpoints, reading/writing IFUM, or using IFU test features.
While following the opcode stream, a jump will invalidate data in F. However, if a reference is in progress and F has not yet been filled by the memory system, then the IFU will invalidate the data when it arrives and restart the next reference immediately. In other words, the IFU cannot abandon the useless fetch; it must wait for it to finish and discard the result.
The J and H levels are one byte wide. For one-byte opcodes, H and J can be considered independent levels of the pipe; for two- or three-byte opcodes, however, it is appropriate to consider J/H as a single level in which J holds the opcode and H holds α.
If J is invalid, it is loaded from the next opcode (which may be in G, F, or H, depending on various conditions) at an even clock (t0), and H is loaded from the byte after the opcode (which is always in G) at the following odd clock (t1); if the byte after the opcode isn't ready, it drops into H at the next odd clock after it is ready. IFUM is addressed by InsSet and J, and its outputs reveal whether the byte in H is α (Length = 2 or 3) or the next opcode (Length = 1).
The conditions under which the M level can be loaded from J are that M is invalid (or about to become invalid) and:
Length = 1 -or-
Length = 2 and H is valid -or-
Length = 3 and H is valid and either F or G is valid.
If these conditions are met, the M level is loaded (t2) with information from IFUM and, if Length = 2 or 3, with α. If Length = 3, β then drops from G into H (t3).
If Length < 3, then the H/J level is now free to work on the next opcode. If Length = 1 and the next opcode happens to be in H, then H will drop into J at the same time (t2); otherwise, J will be loaded from the next opcode in F/G when it is ready.
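The M-level load condition reduces to a small predicate. This Python version is a sketch: the parameter names are hypothetical, and m_free stands for "M is invalid or about to become invalid".

```python
def can_load_m(m_free: bool, length: int, h_valid: bool, f_valid: bool, g_valid: bool) -> bool:
    """Whether level M can be loaded from J, per the three Length cases.
    m_free means M is invalid or about to become invalid."""
    if length not in (1, 2, 3):
        raise ValueError("IFUM Length must be 1, 2, or 3")
    if not m_free:
        return False
    if length == 1:
        return True                          # opcode only
    if length == 2:
        return h_valid                       # α must be in H
    return h_valid and (f_valid or g_valid)  # α in H, and β available in F or G
```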
When the processor does an IFUJump[n], level M presents information needed by the next opcode as follows:
IFaddr is TNIA[4:13] for the IFUJump;
MemBase is set to 0.MemBX.MemB[1:2] or 34₈+MemB[1:2];
RBase is set to 0 or 1;
N, Sign, Length, Packedα, and α are loaded into the X level;
β is loaded into the M level if Length = 3.
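The two MemBase forms can be written out as bit arithmetic in Python, assuming MemBase is 5 bits and MemBX is 2 bits, which is what the 0.MemBX.MemB[1:2] notation implies; the second form is 34 octal plus MemB[1:2]. Which form applies for a given opcode is not described in this section, so the sketch only computes the two candidate values, and the function names are hypothetical.

```python
def membase_from_membx(membx: int, memb_1_2: int) -> int:
    """0.MemBX.MemB[1:2]: a 0 bit, MemBX (2 bits), then MemB[1:2] (2 bits)."""
    return ((membx & 0b11) << 2) | (memb_1_2 & 0b11)


def membase_fixed(memb_1_2: int) -> int:
    """34 octal + MemB[1:2]."""
    return 0o34 + (memb_1_2 & 0b11)
```

The first form reaches base registers 0 through 17 octal, the second 34 through 37 octal.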
Referencing IFU operands with A←Id, TisId, or RisId affects the IFU in two ways: it causes the IFU to advance to the next item of Id, and, for a 3-byte instruction, taking α (α[4:7] when Packedα = 1) causes β to drop from M to X, freeing M for the next instruction.
IFetch← also uses Id, as discussed in "Memory Section", but does not advance the IFU to the next item of Id.
For a one- or two-byte opcode, the processor may do an IFUJump before referencing any operands with ←Id; this advances normally to the next opcode. For a three-byte opcode, however, the processor must reference all of α, so that β drops into X, before doing an IFUJump.
When a pause or jump is recognized, the IFU may already have filled the F and G levels erroneously (i.e., 4 bytes ahead). These levels are flushed and refilled along the jump path.
Timing Details
This section discusses timing details of the IFU pipeline assuming that all IFU references hit in the cache and are never deferred for processor references.
First case: Restart IFU at even byte
t0:An instruction with PCF←FOO is started, where FOO is even.
t2:F, G, H, J, and M levels are made invalid.
t3:Reference the word containing FOO.
t5:Reference the word containing FOO+2.
t7:Load F with data from the FOO reference; reference the word containing FOO+4.
t8:Load the first byte from F into J; load G from F; F becomes invalid; start reading the IFUM entry for J.
t9:Load the putative operand byte from G into H; G becomes invalid; load F from the FOO+2 reference.
t10:Distinguish 5 cases below.
FOO is a one-byte regular opcode
t10:Load M from IFUM; IFUJump will now succeed; load J from H (FOO+1); load G from F (FOO+2 and FOO+3); F and H become invalid; start reading the IFUM entry for J.
t11:Load H from G (FOO+2); load F from FOO+4 reference.
t12:— (The FOO+1 opcode would pop into M if IFUJump were done at t10.)
IFU is quiescent; F has two useful bytes, G one byte, J/H has two bytes; M level is ready and waiting for IFUJump.
FOO is a two-byte regular opcode
t10:Load M from IFUM and M[α] from H; IFUJump will now succeed; load J from F (FOO+2); load G from F (garbage and FOO+3); F and H become invalid; start reading the IFUM entry for J.
t11:Load H from G (FOO+3); G becomes invalid; load F from FOO+4 reference; reference the word containing FOO+6.
t12:Load G from F; F becomes invalid.
t15:Load F from the FOO+6 reference; now quiescent.
FOO is a three-byte regular opcode
t10:Load M from IFUM and M[α] from H; IFUJump will now succeed; load G from F (FOO+2 and FOO+3); H and F become invalid; J goes to a special state (β in H).
t11:Load H from G (FOO+2 = β); load F from the FOO+4 reference; now quiescent.
t12:— (The FOO+2 byte would pop from H into M[β] if IFUJump were done at t10.)
FOO is a one-byte jump opcode
t10:Load M from IFUM; IFUJump will now succeed; J, H, G, and F become invalid.
t11:Discard the FOO+4 reference; reference the first word along the jump path.
t13:Reference the second word along the jump path.
t15:Load F from the first word along the jump path.
t16:Load J from F, etc.
FOO is a two-byte jump opcode
t10:Load M from IFUM and M[α] from H; IFUJump will now succeed; G and F become invalid; J and H are in a special jump state, computing the jump address.
t11:Discard the FOO+4 reference; reference the first word along the jump path.
t12:J and H become invalid.
t13:Reference the second word along the jump path.
t15:Load F from the first word along the jump path, etc.
Second case: Restart IFU at odd byte
t0:An instruction with PCF←FOO is started, where FOO is odd.
t2:F, G, H, J, and M levels are invalid; IFUJump will trap at NotReady.
t3:Reference the word containing FOO.
t5:Reference the word containing FOO+1.
t7:Load F with data from the FOO reference; reference the word containing FOO+3.
t8:Load the second byte from F into J; F becomes invalid; start reading the IFUM entry for J.
t9:Load F from the FOO+1 reference.
t10:Distinguish 3 cases below (and the one and two-byte jump cases which are not repeated below).
FOO is a one-byte opcode
t10:Load M from IFUM; IFUJump will now succeed; load J from F (FOO+1); load G from F (garbage and FOO+2); F becomes invalid; start reading the IFUM entry for J.
t11:Load H from G (FOO+2); G becomes invalid; load F with the FOO+3 reference; reference the word containing FOO+5.
t12:Load G from F; F becomes invalid.
t15:Load F from the FOO+5 reference; now quiescent.
FOO is a two-byte opcode
t10:Load G from F (FOO+1 and FOO+2); F becomes invalid.
t11:Load H from G (FOO+1); load F with the FOO+3 reference.
t12:Load M from IFUM and M[α] from H; IFUJump will now succeed; load J from G (FOO+2); load G from F; F and H become invalid; start reading the IFUM entry for J.
t13:Reference the word containing FOO+5; load H from G (FOO+3).
t17:Load F with data from the FOO+5 reference; now quiescent.
FOO is a three-byte opcode
t10:Load G from F (FOO+1 and FOO+2); F becomes invalid.
t11:Load H from G (FOO+1); load F from the FOO+3 reference.
t12:Load M from IFUM and M[α] from H; IFUJump will now succeed; H becomes invalid; J is in a special state (β in H).
t13:Load H from G (FOO+2); load G from F (FOO+3 and FOO+4); F becomes invalid; reference the word containing FOO+5.
t17:Load F from the FOO+5 reference; now quiescent.
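Every F load in the tables above lands 4 t-steps after the reference that produced it. References issue at t1 of successive cycles (t3, t5, t7), so each t-step is half a cycle, and that is a 2-cycle cache-hit latency. The Python check below restates this under that assumption; the data list is transcribed from the tables, and the names are hypothetical.

```python
LATENCY_T = 4  # t-steps from issuing a reference to loading F (cache hit)

# (reference issued at, F loaded at), transcribed from the timing tables
OBSERVED = [
    (3, 7), (5, 9), (7, 11),  # restart: first three references, both cases
    (11, 15),                 # FOO+6 (even two-byte), FOO+5 (odd one-byte), first jump word
    (13, 17),                 # FOO+5 (odd two- and three-byte)
]


def f_load_time(ref_t: int) -> int:
    """t at which F is loaded for a reference issued at ref_t."""
    return ref_t + LATENCY_T


def cycle_of(t: int) -> int:
    """Cycle containing t, assuming cycle n runs from t(2n) to t(2n+2)."""
    return t // 2
```

For instance, cycle_of(3) is 1 and cycle_of(7) is 3, so the first word arrives two cycles after it was requested.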