:TITLE[MesaOP2]; *Opcodes 200 to 277b + refill trap instructions

%Ed Fiala 21 February 1984: Bummed 2 cycles each off JW, EXCH, ACD, ADDSB,
DCMP<0, UDCMP<0, timing, 4 cycles off DEXCH, 1 to 7 cycles off J5-J8.
Add @J5, @J7, @JDEB, @JDNEB for Klamath; delete @OP276; move @LP to MesaOP3;
change TrapParm to xfTrapParm0; absorb refill trap mi from elsewhere.
%

%The PCB,,PCBhi base register points at the current instruction quadword.
PCB[14:15] are 0, and the low 3 bits of the PC (which point at a byte within
the quadword) are kept in PCF. Since code segments cannot cross 64K boundaries
and are limited to 32K words in length, the two bytes of PCBhi are forced to
be equal, rather than having the least significant byte differ from the msb
by 1. This facilitates negative jumps.

Refill occurs when, at the onset of a NextInst or NextData, PCF contains a
value greater than 7b. In this case the mi is aborted and the trap mi at
location 0, 'LoadPageExternal[0], GoToExternal[377]', is executed, sending
control to location 377b on the page that caused refill. Identical mi exist
on all pages from which refill might occur.

Refill timing is as follows: The aborted mi, trap mi at 0, and PFetch4 use
6 cycles; memory wait uses 12 more, totalling 18 cycles. Assuming that the
next byte in the instruction stream crosses a quadword boundary 1/8 of the
time, a charge of 18/8 = 2.25 cycles/byte is appropriate.

Ideally, a jump opcode should be charged execution time + 2.25 cycles/byte
corrected by the improvement or worsening of the quadword position in the
instruction quadword as a result of the jump x 2.25 cycles. However, if all
quadword positions are equally likely jump targets, the position will be
improved by 1 byte on average (because the original value of PCF is 1..10b;
after the jump it is 0..7b), so average execution time + 2.25 cycles/byte in
the opcode - 2.25 cycles is charged to a jump.
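
The charging rule above is plain arithmetic and can be checked outside the
microcode. The Python sketch below is an illustration only: the function
names are invented here, every byte position is assumed equally likely (as
the text assumes), and the 10-cycle, 2-byte jump in the last line is a
made-up input, not a figure from this file.

```python
from fractions import Fraction

REFILL_CYCLES = 18        # aborted mi + trap mi + PFetch4 (6) plus memory wait (12)
BYTES_PER_QUADWORD = 8

def refill_charge_per_byte():
    # One byte in eight is assumed to cross a quadword boundary.
    return Fraction(REFILL_CYCLES, BYTES_PER_QUADWORD)

def mean_position_gain():
    # PCF is 1..10b (1..8 decimal) before a jump and 0..7 after it.
    before = Fraction(sum(range(1, 9)), 8)
    after = Fraction(sum(range(0, 8)), 8)
    return before - after

def jump_charge(execution_cycles, opcode_bytes):
    # Execution time + 2.25 cycles/byte - 2.25 cycles for the average gain.
    per_byte = refill_charge_per_byte()
    return execution_cycles + per_byte * opcode_bytes - per_byte * mean_position_gain()

print(float(refill_charge_per_byte()))   # 2.25
print(mean_position_gain())              # 1
print(float(jump_charge(10, 2)))         # 12.25
```

Exact fractions are used so that the quarter-cycle figures come out without
rounding.
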
However, counting is more difficult when memory references are in progress
at the tail of the opcode. Consider the following sequence, for example:
	PFetch1[LOCAL,Stack];
	LU _ NextInst[IBuf];
	NIRet;
The 1st mi of the next opcode will be aborted once (two cycles) to ensure
that PCX and SStkP will remain valid for the previous opcode on a fault.
Any fault (sans memory transport) aborts the 4th mi after the PFetch1.
If the NextInst causes refill, timing for this sequence will be 25 cycles;
otherwise, timing will be 9 cycles if the next opcode does not reference [S]
for a long time or 15 cycles if [S] is referenced in the 1st mi of the next
opcode. Assuming the NextInst causes refill 1/8 of the time, this sequence
will average 11.25 cycles if [S] is not referenced for a long time or 16.25
cycles if [S] is referenced in the 1st mi of the next opcode. All of these
times assume the quadword is not retransmitted from MC2 as a result of a
correctable storage failure.

Similarly, for the following:
	PStore1[LOCAL,Stack];
	LU _ NextInst[IBuf];
	NIRet;
If the NextInst causes refill, the timing for this sequence is 35 cycles;
otherwise, timing is ~9 cycles if [S] is not written and no reference occurs
for a long time, 17 cycles if [S] is written or a new reference is issued in
the 1st mi of the next opcode. Assuming NextInst causes refill 1/8 of the
time, this sequence averages 12.25 cycles if the next opcode doesn't write
[S] or start a reference and 19.25 cycles if it does one of these things in
its 1st mi.
%

MesaRefill: PCF _ RZero, At[MesaRefillLoc];
*Hold page fault on page 0 while assuring that the PCB_PCB+4 doesn't happen
*until AFTER the page fault. Even if the PCB+4 were in the mi immediately
*after the PFetch4, it would be uncertain whether or not that mi had
*completed, so the fault handler could not be certain what to do.
	Nop;
	Nop;
	PCB _ (PCB) + (4C), Return;

*Mandatory refill trap mi for NextInst and NextData operations.
:IF[WithDShift]; ***************************************
	PFetch4[PCB,IBuf,4], GoToP[MesaRefill], At[LShift[dsPage0,10],377]; *3b
:ENDIF; ************************************************
	PFetch4[PCB,IBuf,4], GoToP[MesaRefill], At[LShift[opPage0,10],377];
	PFetch4[PCB,IBuf,4], GoToP[MesaRefill], At[LShift[opPage1,10],377];
	PFetch4[PCB,IBuf,4], GoToP[MesaRefill], At[LShift[opPage2,10],377];
	PFetch4[PCB,IBuf,4], GoToP[MesaRefill], At[LShift[opPage3,10],377];
	PFetch4[PCB,IBuf,4], GoToP[MesaRefill], At[LShift[moPage,10],377]; *10b
	PFetch4[PCB,IBuf,4], GoToP[MesaRefill], At[LShift[bbP1,10],377]; *11b
:IF[WithFloatingPoint]; ********************************
	PFetch4[PCB,IBuf,4], GoToP[MesaRefill], At[LShift[fpPage0,10],377]; *13b
:ENDIF; ************************************************
	PFetch4[PCB,IBuf,4], GoToP[MesaRefill], At[LShift[xfPage1,10],377]; *15b
	PFetch4[PCB,IBuf,4], GoToP[MesaRefill], At[LShift[prPage,10],377]; *16b

%NOTE: after any PFetch4 on page 6 faults, the fault handler resumes after
filling IBuf with 377b bytes; this means that it is inadvisable to use the
bypass kludge after the PFetch4 and control must remain on page 6 (i.e., no
tasking is allowed in the 3 mi after the PFetch4 which refills IBuf). If the
bypass kludge were used, transport for a preceding PFetch4, such as the one
in RDC.Mc, could experience error correction and advance page fault time so
that the mi containing the bypass kludge was aborted, and this would then
execute incorrectly.

NOTE: The final mi of a jump opcode writes the PCB register, so it is illegal
to begin any opcode with a PCB-relative fetch with non-zero displacement.

PCF points to the byte after the last one fetched (i.e., PCF = 1..10b).
PCX is loaded from PCF at T2 of the first mi executed. NextInst is illegal in
the mi after PCF_.

@J2 or @JB cannot be used instead of J2 or JB because, after page faults, PCX
is wrong for continuation since it is smashed at an opcode starting
instruction.
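
The timing figures quoted on the short-jump opcodes that follow can be
reproduced numerically. The Python sketch below is an illustration only: the
function names are invented here, PCF is assumed equally likely to be any of
1..10b on entry, and the 15-cycle and 24-cycle costs are simply the ones
quoted for @J5 to @J7.

```python
from fractions import Fraction

REFILL_CYCLES = 18

def fixed_jump_timing(base_cycles, displacement):
    # @J2..@J4: base cycles plus a refill charge of 18 * (displacement/8).
    return base_cycles + Fraction(REFILL_CYCLES * displacement, 8)

def short_jump_average(displacement, same_quadword=15, next_quadword=24):
    # @J5..@J7: PCF is 1..8 on entry; the target byte is PCF + displacement - 1.
    # Targets 0..7 lie in the quadword already in IBuf, 8 and up in the next one.
    total = 0
    for pcf in range(1, 9):
        target = pcf + displacement - 1
        total += same_quadword if target <= 7 else next_quadword
    return Fraction(total, 8)

print(float(fixed_jump_timing(6, 2)))    # 10.5   (@J2)
print(float(fixed_jump_timing(8, 3)))    # 14.75  (@J3)
print(float(fixed_jump_timing(10, 4)))   # 19.0   (@J4)
print(float(short_jump_average(5)))      # 20.625 (@J5)
print(float(short_jump_average(6)))      # 21.75  (@J6)
print(float(short_jump_average(7)))      # 22.875 (@J7)
```
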
%
@CATCH: SkipData, CallX[P6Tail], Opcode[200];
@J2: SkipData, CallX[P6Tail], Opcode[201]; *Timing: 6+(18*(2/8)) = 10.5
@J3: SkipData, CallX[J2], Opcode[202]; *Timing: 8+(18*(3/8)) = 14.75
@J4: SkipData, CallX[J3], Opcode[203]; *Timing: 10+(18*(4/8)) = 19.00

ShortJumpFix: T _ (PCFReg) - T;
	RTemp _ T, GoTo[P6PCFTail,ALU<0];
	PFetch4[PCB,IBuf,4];
	PCB _ (PCB) + (4C);
P6PCFTail: PCF _ RTemp;
	Nop; *NextInst illegal in mi after PCF_; no tasking here
P6Tail: LU _ NextInst[IBuf];
P6Tailx: PCB _ (PCB) and not (3C), NIRet;

%In these opcodes, PCF is in the range 1..10b, so J5, for example, must add
5-1 to PCF to get the target; this result is in the range 5..14b. For @J5 to
@J7, the target can wind up in either the quadword already in IBuf or the
quadword after that. Since (X-10b) mod 10b = X, the ShortJumpFix subroutine
computes PCF+disp-1-10b = PCF-(11b-disp), which is .ls. 0 if the target is in
the current quadword. Timing for these is 15 cycles if in the same quadword
or 24 cycles if in the next quadword.
%
*Avg. timing = 20.625 cycles
@J5: T _ 4C, GoTo[ShortJumpFix], Opcode[204];
*Avg. timing = 21.75 cycles
@J6: T _ 3C, GoTo[ShortJumpFix], Opcode[205];
*Avg. timing = 22.875 cycles
@J7: T _ 2C, GoTo[ShortJumpFix], Opcode[206];

%PFetch4[PCB,...] should be ok in 1st mi of opcode because, even if PCB was
loaded in the previous mi at P6Tailx, the value of PCB being loaded and the
old value are equally good for the reference. But this caused problems
somehow??
Timing = 20 cycles
%
@J8: PCB _ (PCB) + (4C), Opcode[207]; *Timing = 18 cycles
	PFetch4[PCB,IBuf,0];
	T _ (PCFReg) - 1;
	RTemp _ T, GoTo[P6PCFTail];

J2: SkipData, CallX[P6Tail]; *Odd
Ejmp: T _ NextData[IBuf], CallX[JBr]; *Even
JB: T _ NextData[IBuf], CallX[JBr]; *Odd
Enojmp: SkipData, CallX[P6Tail]; *Even

%Jump Byte: alpha is a signed displacement from the opcode. AllOnes is used
as a temporary, restored when done. Note that RH[PCBhi] = LH[PCBhi] since
code can't cross 64k boundary.
Timing: 28.25 (+3 if negative displacement) cycles.
%
@JB: T _ NextData[IBuf], Opcode[210];
*JBr timing: 24 cycles (+3 if displacement negative).
JBr: T _ (PCFReg) + T + 1, Skip[H2Bit8'];
	PCB _ (PCB) - (200C); *Offset 400b bytes if neg. displacement
*Jump here on @JW, @JIB, and @JIW.
*AllOnes _ alpha+PCF-2 for @JB or (alpha..beta)+PCF-3 for @JW, where -2
*or -3 displaces alpha to the 1st opcode byte.
JBy: AllOnes _ (Form-3[AllOnes]) + T;
	T _ RSh[AllOnes,1], Skip[R>=0];
	T _ (LSh[R400,7]) or T;
	PFetch4[PCB,IBuf];
	PCF _ AllOnes;
	PCB _ (PCB) + T;
	AllOnes _ (Zero) - 1, GoTo[P6Tail];

%Jump Word. alpha..beta is a 2's complement displacement relative to the
first opcode byte. Because PCF is read before the final NextData, the code
must allow for the possibility that PCF was 10b when read, but will be 1b
after the final NextData (and PCB will have been advanced). To do this, the
high bit of PCF is cleared on read-out.
Timing: 30.5 cycles (+3 if displacement negative).
%
@JW: LU _ CycleControl _ NextData[IBuf], Opcode[211]; *get alpha
	T _ (Cycle&PCXF) and not (370C);
	T _ (NextData[IBuf]) + T + 1, CallX[JBy];

PairComp: T _ Stack&-1, UseCTask;
	LU _ (LdF[Cycle&PCXF,0,4]) xor T, Return;

*Jump Equal Pair.
*Jump if the 1st nibble in alpha .eq. TOS with displacement equal to the 2nd
*nibble in alpha + 4 = -2 + (4+PCF+pair right); pop stack once.
*Timing: 17.5 cycles (no jump), 12.25 cycles + JBr (jumps).
@JEP: LU _ CycleControl _ CNextData[IBuf], Call[PairComp], Opcode[212];
	T _ 4C, Skip[ALU=0];
	LU _ NextInst[IBuf], CallX[P6Tailx];
	T _ (LdF[Cycle&PCXF,4,4]) + T, GoTo[JBr];

stkdif: LU _ (Stack&-1) - T, Return;

*Jump Equal Byte.
*Jump with displacement alpha if TOS .eq. 2OS; pop stack twice.
*Timing: 10.25 + JBr (jumps), 17.5 cycles (no jump)
@JEB: T _ Stack&-1, UseCTask, Call[stkdif], Opcode[213];
JEQBx: DblGoTo[Ejmp,J2,ALU=0];

*Jump Equal Byte Byte.
*Jump with displacement beta if TOS .eq. alpha; pop stack once.
*Timing: 12.5 cycles + JBr (jumps), 19.75 cycles (no jump).
@JEBB: T _ NextData[IBuf], Opcode[214];
	LU _ (Stack&-1) xor T;
	Skip[ALU=0];
	SkipData, CallX[P6Tail];
	T _ (NextData[IBuf]) - 1, CallX[JBr];

*Jump Not Equal Pair.
*Jump if the 1st nibble in alpha .ne. TOS with displacement equal to the 2nd
*nibble in alpha + 4 = -2 + (4+PCF+pair right); pop stack once.
*Timing: 16.5 cycles (no jump), 12.25 cycles + JBr (jumps).
@JNEP: LU _ CycleControl _ CNextData[IBuf], Call[PairComp], Opcode[215];
	T _ 4C, Skip[ALU#0];
	LU _ NextInst[IBuf], CallX[P6Tailx];
	T _ (LdF[Cycle&PCXF,4,4]) + T, GoTo[JBr];

*Jump Not Equal Byte.
*Jump with displacement alpha if TOS .ne. 2OS; pop stack twice.
*Timing: 16.5 cycles (no jump), 11.25 cycles + JBr (jumps).
@JNEB: T _ Stack&-1, UseCTask, Call[stkdif], Opcode[216];
JNEBx: DblGoTo[JB,Enojmp,ALU#0];

*Jump Not Equal Byte Byte.
*Jump with displacement beta if alpha .ne. TOS; pop stack once.
*Timing: 18.75 cycles (no jump), 13.5 cycles + JBr (jumps).
@JNEBB: T _ NextData[IBuf], Opcode[217];
	LU _ (Stack&-1) xor T;
	Skip[ALU#0];
	SkipData, CallX[P6Tail];
	T _ (NextData[IBuf]) - 1, CallX[JBr];

JLBpos: DblGoTo[J2,Ejmp,Ovf']; *Even
JLBneg: DblGoTo[JB,Enojmp,Ovf']; *Odd

*Jump Less Byte.
*Jump with displacement alpha if integer 2OS .ls. TOS; pop stack twice.
*Timing: 18.5 or 20.5 cycles (no jump), 13.25 cycles + JBr (jumps).
@JLB: T _ Stack&-1, UseCTask, Call[stkdif], Opcode[220];
JLBx: FreezeResult, DblGoTo[JLBpos,JLBneg,ALU>=0];

JGEBpos: DblGoTo[JB,Enojmp,Ovf']; *Even
JGEBneg: DblGoTo[J2,Ejmp,Ovf']; *Odd

*Jump Greater Equal Byte.
*Jump with displacement alpha if integer 2OS .ge. TOS; pop stack twice.
*Timing: 19.5 cycles (no jump), 12.25 or 13.25 cycles + JBr (jumps).
@JGEB: T _ Stack&-1, UseCTask, Call[stkdif], Opcode[221];
JGEBx: FreezeResult, DblGoTo[JGEBpos,JGEBneg,ALU>=0];

stksw: T _ Stack&+1, Return;

*Jump Greater Byte.
*Jump with displacement alpha if integer 2OS .gr. TOS; pop stack twice.
*Timing: 20.5 or 22.5 cycles (no jump), 15.25 cycles + JBr (jumps).
@JGB: Stack&-1, UseCTask, Call[stksw], Opcode[222];
	LU _ (Stack&-2) - T, GoTo[JLBx];

*Jump Less Equal Byte.
*Jump with displacement alpha if integer 2OS .le. TOS; pop stack twice.
*Timing: 21.5 cycles (no jump), 14.25 or 15.25 cycles + JBr (jumps).
@JLEB: Stack&-1, UseCTask, Call[stksw], Opcode[223];
	LU _ (Stack&-2) - T, GoTo[JGEBx];

*Jump Unsigned Less Byte.
*Jump with displacement alpha if cardinal 2OS .ls. TOS; pop stack twice.
*Timing: 10.25 cycles + JBr (jumps), 17.5 cycles (no jump).
@JULB: T _ Stack&-1, UseCTask, Call[stkdif], Opcode[224];
JULBx: DblGoTo[J2,Ejmp,Carry];

*Jump Unsigned Greater Equal Byte.
*Jump with displacement alpha if cardinal 2OS .ge. TOS; pop stack twice.
*Timing: 11.25 cycles + JBr (jumps), 16.5 cycles (no jump).
@JUGEB: T _ Stack&-1, UseCTask, Call[stkdif], Opcode[225];
JUGEBx: DblGoTo[JB,Enojmp,Carry];

*Jump Unsigned Greater Byte.
*Jump with displacement alpha if cardinal 2OS .gr. TOS; pop stack twice.
*Timing: 12.25 cycles + JBr (jumps), 19.5 cycles (no jump).
@JUGB: Stack&-1, UseCTask, Call[stksw], Opcode[226];
	LU _ (Stack&-2) - T, GoTo[JULBx];

*Jump Unsigned Less Equal Byte.
*Jump with displacement alpha if cardinal 2OS .le. TOS; pop stack twice.
*Timing: 12.25 cycles + JBr (jumps), 18.5 cycles (no jump).
@JULEB: Stack&-1, UseCTask, Call[stksw], Opcode[227];
	LU _ (Stack&-2) - T, GoTo[JUGEBx];

*Jump Zero 3.
*Jump with displacement 3 if TOS .eq. 0; pop stack once.
*Timing: 16.5 cycles (jumps), 11.25 cycles (no jump).
@JZ3: LU _ Stack&-1, Opcode[230];
	Skip[ALU=0];
	LU _ NextInst[IBuf], CallX[P6Tailx];
J3: SkipData, CallX[J2];

*Jump Zero 4.
*Jump with displacement 4 if TOS .eq. 0; pop stack once.
*Timing: 20.75 cycles (jumps), 11.25 cycles (no jump).
@JZ4: LU _ Stack&-1, Opcode[231];
	Skip[ALU=0];
	LU _ NextInst[IBuf], CallX[P6Tailx];
	SkipData, CallX[J3];

*Jump Zero Byte.
*Jump with displacement alpha if TOS .eq. 0; pop stack once.
*Timing: 8.25 + JBr (jumps), 15.5 cycles (no jump)
@JZB: LU _ Stack&-1, GoTo[JEQBx], Opcode[232];

*Jump Not Zero 3.
*Jump with displacement 3 if TOS .ne. 0; pop stack once.
*Timing: 17.5 cycles (jumps), 10.25 cycles (no jump).
@JNZ3: LU _ Stack&-1, Opcode[233];
	Skip[ALU#0];
	LU _ NextInst[IBuf], CallX[P6Tailx];
	SkipData, CallX[J2];

*Jump Not Zero 4.
*Jump with displacement 4 if TOS .ne. 0; pop stack once.
*Timing: 21.75 cycles (jumps), 10.25 cycles (no jump).
@JNZ4: LU _ Stack&-1, Opcode[234];
	Skip[ALU#0];
	LU _ NextInst[IBuf], CallX[P6Tailx];
	SkipData, CallX[J3];

*Jump Not Zero Byte.
*Jump with displacement alpha if TOS .ne. 0; pop stack once.
*Timing: 14.5 cycles (no jump), 9.25 cycles + JBr (jumps).
@JNZB: LU _ Stack&-1, GoTo[JNEBx], Opcode[235];

StkDDif: LU _ (Stack&+1) xor T, Return;

*Jump Double Equal Byte. Jump if the doublewords at TOS,,2OS and
*3OS,,4OS are equal.
*Timing = 4.5+13 cycles (no jump); x
@JDEB: T _ Stack&-2, UseCTask, Call[StkDDif], Opcode[236];
	T _ Stack&-2, Skip[ALU=0];
	SkipData, Stack&-1, CallX[P6Tail];
	LU _ (Stack&-1) xor T, GoTo[JEQBx];

*Jump Double Not Equal Byte. Jump if the doublewords at TOS,,2OS and
*3OS,,4OS are not equal.
@JDNEB: T _ Stack&-2, UseCTask, Call[StkDDif], Opcode[237];
	T _ Stack&-2, Skip[ALU#0];
	SkipData, Stack&-1, CallX[P6Tail];
	LU _ (Stack&-1) xor T, GoTo[JNEBx];

CODEToRTemp: PFetch1[CODE,RTemp];
	T _ PCFReg, Return;

P6PopComp: T _ Stack&-1, UseCTask;
	LU _ (Stack) - T, Return;

*Jump Indexed Byte.
*Alpha,,beta is a CODE-relative pointer to an array of bytes; TOS is a byte
*index to the array (even bytes in bits 0:7 of a word, odd bytes in bits
*8:15); jump with unsigned displacement fetched from the byte array.
*Timing: 21.75 cycles (no jump), 32.5 cycles + JBy (jumps).
@JIB: LU _ CycleControl _ CNextData[IBuf], Call[P6PopComp], Opcode[240];
	T _ RSh[Stack,1], Skip[Carry'];
	SkipData, CallX[P6Pop]; *Exit to next opcode
	T _ (NextData[IBuf]) + T; *Add beta
	T _ (LHMask[Cycle&PCXF]) + T, Call[CODEToRTemp];
	Stack&-1, Skip[R Odd];
	T _ (LdF[RTemp,0,10]) + T, GoTo[JBy];
	T _ (RHMask[RTemp]) + T, GoTo[JBy];

*Jump Indexed Word.
*Alpha,,beta is a CODE-relative pointer to an array of words indexed by TOS;
*carry out a PC-relative jump using the signed displacement from the array.
*Timing: 21.75 cycles (no jump), 32.5 cycles + JBy (jumps).
@JIW: LU _ CycleControl _ CNextData[IBuf], Call[P6PopComp], Opcode[241];
	T _ Stack&-1, Skip[Carry'];
	SkipData, CallX[P6Tail]; *Flush beta and exit
	T _ (NextData[IBuf]) + T; *add beta
	T _ (LHMask[Cycle&PCXF]) + T, Call[CODEToRTemp];
	T _ (RTemp) + T, GoTo[JBy];

*Recover. Timing: 6.25 cycles.
@REC: LU _ NextInst[IBuf], Opcode[242];
P6PushTailx: Stack&+1, NIRet;

*Recover Two. Timing: 6.25 cycles.
@REC2: LU _ NextInst[IBuf], Opcode[243];
	Stack&+2, NIRet;

*Discard. Timing: 6.25 cycles.
@DIS: LU _ NextInst[IBuf], Opcode[244];
P6PopTailx: Stack&-1, NIRet;

*Discard Two. Timing: 6.25 cycles.
@DIS2: LU _ NextInst[IBuf], Opcode[245];
	Stack&-2, NIRet;

*Exchange. Timing: 12.25 cycles.
@EXCH: T _ Stack&-1, Opcode[246];
*The preceding Stack&-1 has interlocked any PFetch to the stack.
*Stack&+1 _ here interlocks a PStore1 at Stack properly.
	Stack&+1 _ Stack&+1, LoadPage[moPage];
	Stack&-1 _ T, GoToP[moPsh];

*Double Exchange. Timing: 20.25 cycles.
*See interlocking comments for EXCH.
@DEXCH: T _ Stack&-2, Opcode[247]; *Exch(StkP-2,StkP)
	Stack&+2 _ Stack&+2;
	Stack&-2 _ T;
	Stack&-1;
	T _ Stack&+2; *Exch(StkP-3,StkP-1)
	Stack&-2 _ Stack&-2, LoadPage[moPage];
	Stack&+2 _ T, GoToP[moPsh];

*Duplicate. Timing: 8.25 cycles.
*Stack&+1 _ Stack&+1 won't interlock odd PFetch2.
@DUP: T _ Stack&-1, Opcode[250];
SkipPushT: *Here from @LP in MesaOP3 and MesaESC.
	LU _ NextInst[IBuf];
SkipPushTx: Stack&+2 _ T, NIRet;

*Double Duplicate.
*Timing: 10.25 cycles.
@DDUP: T _ Stack&-1, Opcode[251];
	Stack&+2 _ Stack&+2, GoTo[P6PushT];

%Exchange Discard. Timing: 8.25 cycles.
	NextInst[IBuf];
	Stack&-1 _ Stack&-1, NIRet;
might work but Stack&-1 _ Stack&-1 does not interlock an unaligned PStore2 at
StkP-3/StkP-2. A dangerous opcode sequence would be some unaligned PStore2
opcode (followed by NextInst, NIRet, and 1 aborted mi = 6 cycles + 2 for
transport), followed by REC2 (4 cycles), followed by EXDIS. In this case the
Stack&-1 _ would take place 1 cycle before the PStore2 finished. Since the
final cycle of the PStore2 will not have transport (because to be unaligned
the referenced words must be 1 and 2 within the quadword), the faster code
for EXDIS given above probably works. I am unsure what error-correction would
do to this sequence.
%
@EXDIS: T _ Stack&-2, GoTo[P6PushT], Opcode[252];

ACDx: Stack&-1 _ Stack&-1, NIRet;

*Negate. Timing: 10.25 cycles.
*... Stack&+1 _ (Zero) - T, NIRet; gets stack underflow.
@NEG: T _ Stack&-1, Opcode[253];
	T _ (Zero) - T, GoTo[P6PushT];

*Increment. Timing: 8.25 cycles.
@INC: T _ (Stack&-1) + 1, GoTo[P6PushT], Opcode[254];

*Decrement. Timing: 8.25 cycles.
@DEC: T _ (Stack&-1) - 1, GoTo[P6PushT], Opcode[255];

P6PS2Safety: *This mi wasted for PStore2 safety (?ugh?)
	Stack&+1, LU _ T, Return;

*Double Increment. Timing: 14.25 cycles.
@DINC: T _ Stack&-2, Call[P6PS2Safety], Opcode[256];
	Stack _ (Stack) + 1;
	T _ (RZero) + T, UseCOutAsCIn, GoTo[P6PushT];

*Double. Timing: 8.25 cycles.
@DBL: T _ LSh[Stack&-1,1], Opcode[257];
P6PushT: LU _ NextInst[IBuf];
	Stack&+1 _ T, NIRet;

*Double Double. Timing: 14.25 cycles.
@DDBL: T _ LSh[Stack&-2,1], Call[P6PS2Safety], Opcode[260];
	T _ (RSh[Stack,17]) + T;
	Stack _ LSh[Stack,1], GoTo[P6PushT];

*Triple. Timing: 10.25 cycles.
@TRPL: T _ LSh[Stack&-1,1], Opcode[261];
	Stack&+1, GoTo[Addx];

*And. Timing: 8.25 cycles.
@AND: T _ Stack&-1, Opcode[262];
	LU _ NextInst[IBuf];
	Stack _ (Stack) and T, NIRet;

*Ior. Timing: 8.25 cycles.
@IOR: T _ Stack&-1, Opcode[263];
	LU _ NextInst[IBuf];
	Stack _ (Stack) or T, NIRet;

*Add Signed Byte. Timing: 12.5 cycles pos., 15.5 cycles neg.
@ADDSB: T _ NextData[IBuf], Opcode[264];
*Here an immediately preceding unaligned PFetch2 to the stack has had
*at least the following sequence: PFetch2[...,Stack]; NextInst; NIRet;
*1 aborted mi; so when it gets to Addx, enough time will have elapsed
*for the PFetch2 to have completed, so interlocking it is not a problem.
	LU _ T, GoTo[Addx,H2Bit8'];
	T _ (LHMask[AllOnes]) + T, GoTo[Addx];

*Add. Timing: 8.25 cycles.
@ADD: T _ Stack&-1, Opcode[265];
Addx: LU _ NextInst[IBuf];
	Stack _ (Stack) + T, NIRet;

*Subtract. Timing: 8.25 cycles.
@SUB: T _ Stack&-1, Opcode[266];
Subx: LU _ NextInst[IBuf];
	Stack _ (Stack) - T, NIRet;

GetTDecStk2: T _ Stack&-2, Return; *grab it, point to lsb of second doubleword

*Double Add. Timing: 16.25 or 17.25 cycles.
@DADD: MNBR _ Stack&-1, Call[GetTDecStk2], Opcode[267]; *point to lsb of top doubleword
	Stack _ (Stack) + T; *add low bits
	Stack&+1, Skip[Carry];
	T _ MNBR, GoTo[Addx]; *pick up high bits of top doubleword
	T _ (MNBR) + 1, GoTo[Addx]; *pick up high bits of top doubleword

*Double Subtract. Timing: 16.25 or 17.25 cycles.
@DSUB: MNBR _ Stack&-1, Call[GetTDecStk2], Opcode[270]; *point to lsb of top doubleword
	Stack _ (Stack) - T; * subtract low bits
	Stack&+1, Skip[Carry']; *point to msb of second doubleword
	T _ MNBR, GoTo[Subx]; *remember msb of top doubleword (TOS)
	T _ (MNBR) + 1, GoTo[Subx];

*Add Double to Cardinal. Timing: 16.25 cycles.
*Cardinal at TOS, Double at 2OS,,3OS.
@ADC: T _ Stack&-3, Call[P6PS2Safety], Opcode[271];
	Stack _ (Stack) + T;
	Stack&+1, FreezeResult;
	Stack _ (Stack) + 1, UseCOutAsCIn, GoTo[P6Tail];

*Add Cardinal to Double. Timing: 14.25 or 15.25 cycles.
*Double at TOS,,2OS, Cardinal at 3OS.
@ACD: LU _ Stack&-2, Opcode[272];
	T _ Stack&+1;
	Stack&-1 _ (Stack&-1) + T;
	Stack&+2, Skip[Carry];
	LU _ NextInst[IBuf], CallX[ACDx];
	LU _ NextInst[IBuf];
	Stack&-1 _ (Stack&-1) + 1, NIRet;

*Add Local 0 to Immediate Byte.
:IF[CacheLocals]; ************************************
*Timing: 12.5 cycles.
@AL0IB: T _ NextData[IBuf], Opcode[273];
	T _ (LocalCache0) + T, GoTo[P6PushT];
:ELSE; ***********************************************
*Timing: 20.5 cycles.
@AL0IB: PFetch1[LOCAL,Stack,0], Opcode[273];
	T _ NextData[IBuf], CallX[Addx];
:ENDIF; **********************************************

%Multiply--high half of 32-bit product is left above the top of the Stack
	multiplier in RTemp (from the argument at TOS)
	multiplicand in T (from the argument at TOS-1)
	product low in Stack, hi in RTemp1
The first loop flushes leading 0's in the multiplier with timing 2 cycles/0;
The second loop processes 0's in 6 cycles and 1's in 10 or 11 cycles.
Note how a low-order 1 in the multiplier serves as an end flag.

Timing:	18.25 cycles if multiplier is 0, else
	58.25 cycles if multiplier is 1, else
	(16.25 to 19.25) + 2*LZ + (16-LZ)*6 + (4 or 5)*(NOnes) cycles
	= (48.25 to 51.25) + (16-LZ)*4 + (4 or 5)*NOnes cycles.

NOTE: For random numbers, this algorithm averages about 23 cycles faster than
the one on the next page. However, when the multiplier has many leading +
trailing zeroes, it is worse than the other. For products less than 16d bits,
this algorithm is 28 cycles slower for a multiplier with a single 1 bit but
gains 2 cycles for each additional 1 bit in the multiplier. For products
greater than 16d bits, it is 23 cycles slower for a multiplier with a single
1 bit and gains 5 cycles for each additional 1 bit in the multiplier after
the product has exceeded 16 bits. Since 87 percent of all multiplies are
preceded by small constant pushes, the other algorithm probably averages
faster than this one, but this one is 6b mi smaller, so we use it.
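
The shift-and-add scheme and the timing expression above can be sketched in
Python. This is an illustration, not a transcription of the microcode: the
function names are invented here, LZ is taken to mean the number of leading
zeroes in the 16-bit multiplier, operands are treated as unsigned, and the
model does not reproduce the low-order end-flag trick.

```python
def leading_zeroes16(x):
    # Number of leading 0 bits in a 16-bit value (16 for x == 0).
    return 16 - x.bit_length()

def mul16(multiplicand, multiplier):
    # MSB-first shift-and-add: double the product, then add the
    # multiplicand whenever the current multiplier bit is 1.
    product = 0
    for bit in range(15, -1, -1):
        product <<= 1
        if (multiplier >> bit) & 1:
            product += multiplicand
    return product & 0xFFFFFFFF   # low word -> Stack, high word -> RTemp1

def timing_bounds(multiplier):
    # (min, max) cycles from the general expression; the comment quotes
    # separate figures for multipliers 0 and 1.
    lz = leading_zeroes16(multiplier)
    ones = bin(multiplier).count("1")
    body = 2 * lz + (16 - lz) * 6
    return (16.25 + body + 4 * ones, 19.25 + body + 5 * ones)

print(hex(mul16(0xFFFF, 0xFFFF)))   # 0xfffe0001
print(timing_bounds(1))             # (56.25, 60.25)
```

The two forms of the expression agree because 2*LZ + (16-LZ)*6 equals
32 + (16-LZ)*4, and the 58.25 cycles quoted for a multiplier of 1 falls
inside the bounds printed above.
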
%
@MUL: RTemp1 _ T _ 30C, Call[MulSU], Opcode[274];
*2nd loop shifts the product RTemp1/Stack left 1 and conditionally adds the
*multiplicand T based upon sign of the multiplier RTemp, which is left-shifted
*until the right-most 1 bit is seen.
	RTemp _ (RTemp) SALUFOP T, GoTo[Mul1,R<0];
Mul0: Stack _ (Stack) SALUFOP T;
	RTemp1 _ (RTemp1) SALUFOP T, UseCOutAsCIn, Return;
Mul1: RTemp1 _ (RTemp1) SALUFOP T, UseCOutAsCIn, GoTo[MulLast,ALU=0];
	Stack _ (LSh[Stack,1]) + T, Skip[R<0];
	RTemp1 _ (RTemp1) - 1, UseCOutAsCIn, Return;
	RTemp1 _ (RTemp1) + 1, UseCOutAsCIn, Return;

*Force the low bit of multiplier RTemp to 1 for the end test.
*Initialize the high product word (RTemp1) to 0; low product word (TOS-1)
*already contains the multiplicand, so we don't zero it and add the
*multiplicand on the 1st multiplier 1 (but we have to test multiplier for 0).
MulSU: RTemp1 _ (RTemp1) - (SALUF _ T); *SALUF = 30b is LU _ 2A
	T _ (Stack&-1) SALUFOP T, Skip[R>=0]; *Multiplier*2 from TOS
	RTemp _ (Zero) + T + 1, GoTo[MulSUX]; *Multiplier .ls. 0
	RTemp _ (Zero) + T + 1, Skip[ALU#0]; *Multiplier .ge. 0
	T _ Stack _ 0C, GoTo[mdPush]; *Multiplier .eq. 0
*One mi loop shifts off leading 0's in multiplier.
	RTemp _ (RTemp) SALUFOP T, Skip[R<0];
	RTemp _ (RTemp) SALUFOP T, GoTo[.,R>=0];
MulSUX: T _ Stack, Return; *Multiplicand from TOS-1

MulLast: T _ RSh[RTemp1,1], Skip[Carry'];
	T _ (LSh[AllOnes,17]) or T;
mdPush: Stack&+1 _ T; *Even
P6Pop: LU _ NextInst[IBuf], CallX[P6PopTailx];

%Multiply--high half of 32-bit product is left above the top of the Stack
	product low in Stack, hi in RTemp1
	multiplicand low in xBuf, hi in xBuf1
	multiplier in RTemp
The first loop runs until the multiplicand being left-shifted 1 each step
overflows into the high word. It has timing of 4 cycles on 0's, 10 on 1's;
this loop doesn't task on 0's (worst case without tasking ~ 70 cycles on a
multiplier of 100000b and multiplicand of 1). Note that the end test need be
made only when processing a multiplier 1.
The second loop runs until the last multiplier 1 is processed with timing of
6 cycles on 0's, 14 on 1's.

Timing:	20.25 cycles if the multiplier is 0, else
	30.25 cycles if the multiplier is 1, else
	20.25 + (1 if product .ge. 2^16) +
	+ (4/multiplier 0) + (10/1) cycles while product .ls. 2^16
	+ (6/multiplier 0) + (14/1) cycles while product .ge. 2^16
where the zeroes are those between the leftmost and rightmost ones in the
multiplier.

PopToT: T _ Stack&-1, FreezeResult, Return;

@MUL: RTemp1 _ T _ 30C, Opcode[274]; *SALUF = 30b is LU _ 2A
	RTemp1 _ (RTemp1) - (SALUF _ T), Call[PopToT]; *RTemp1 _ 0
	RTemp _ T, UseCTask, Call[PopToT];
	Stack&+1 _ 0C, Skip[ALU#0]; *tests RTemp _ T
	T _ Stack&+1 _ 0C, GoTo[P6Pop];
	xBuf _ T, Call[.+1]; *1st loop
	RTemp _ RSh[RTemp,1], GoTo[MulZ,R Even];
MulO: Stack _ (Stack) + T, Skip[ALU#0];
	T _ (RTemp1) + 1, UseCoutAsCin, GoTo[mdPush];
	T _ xBuf _ (xBuf) SALUFOP T, FreezeResult, Skip[R<0];
	RTemp1 _ (RTemp1) + 1, UseCOutAsCIn, Return;
	RTemp1 _ (RTemp1) + 1, UseCOutAsCIn, GoTo[MulL];
MulZ: T _ xBuf _ (xBuf) SALUFOP T, Skip[R<0];
*Must replicate the mi at MulO-1 because the opcode dispatch locations are
*only four apart on this page.
	RTemp _ RSh[RTemp,1], DblGoTo[MulO,MulZ,R Odd];
MulL: xBuf1 _ 1C, Call[.+1]; *2nd loop
	RTemp _ RSh[RTemp,1], GoTo[MulLZ,R Even];
MulLO: Stack _ (Stack) + T, GoTo[.+3,ALU#0];
	T _ xBuf1, FreezeResult;
	T _ RTemp1 _ (RTemp1) + T + 1, UseCOutAsCIn, GoTo[mdPush];
	T _ xBuf1, FreezeResult;
	RTemp1 _ (RTemp1) + T + 1, UseCOutAsCIn;
MulLZ: T _ xBuf _ (xBuf) SALUFOP T; *Double the multiplicand
	xBuf1 _ (xBuf1) SALUFOP T, UseCOutAsCIn, Return;
%

*Double Compare (signed).
*If 3OS,,4OS < TOS,,2OS, push -1, 17.25 cycles
*If 3OS,,4OS = TOS,,2OS, push 0, timing 16.25 cycles
*If 3OS,,4OS > TOS,,2OS, push 1, 18.25 cycles
*Add 4 cycles if high order words are equal and low order words unequal.
@DCMP: T _ (Stack&-2) + (100000C), Opcode[275];
	Stack _ (Stack) + (100000C), GoTo[DCMPy];

*Unsigned Double Compare
@UDCMP: T _ Stack&-2, Opcode[276];
*Compare msb's, point at lsb of high doubleword
*grab lsb of top doubleword, point at lsb of second doubleword
DCMPy: LU _ (Stack&+1) - T;
	T _ Stack&-2, FreezeResult, Skip[ALU=0];
	Stack _ (Zero) + 1, DblGoTo[DUCompL,DUCompG,Carry'];
	Stack _ (Stack) - T; *compare low words
	FreezeResult, Skip[ALU#0];
	LU _ NextInst[IBuf], Call[P6Tailx];
	Stack _ (Zero) + 1, DblGoTo[DUCompL,DUCompG,Carry'];
DUCompL: LU _ NextInst[IBuf];
	Stack _ (Stack) or not (0C), NIRet;
DUCompG: LU _ NextInst[IBuf], Call[P6Tailx];

P6Undef: LoadPage[opPage0];
	RTemp _ sOpcodeTrap, GoToP[SDTrap];

@OP277: xfTrapParm0 _ 277C, GoTo[P6Undef], Opcode[277];

:END[MesaOP2];