:TITLE[MesaOP2];

*Ed Fiala 21 May 1982.

%PCB,,PCBhi is a base register pointing to the current instruction quadword.
PCB[14:15] are 0, and the low 3 bits of the PC (which point to a byte within
the quadword) are kept in PCF. Since code segments cannot cross 64K
boundaries and are limited to 32K words in length, the two bytes of PCBhi
are forced to be equal, rather than having the least significant byte differ
from the msb by 1. This facilitates negative jumps.

Refill occurs when, at the onset of a NextInst or NextData, PCF contains a
value greater than 7b. In this case the mi is aborted and the trap mi at
location 0, 'LoadPageExternal[0], GoToExternal[377]', is executed, sending
control to location 377b on the page that caused refill. Identical mi exist
on all pages from which refill might occur.

Refill timing is as follows: The aborted mi, trap mi at 0, and PFetch4 use
6 cycles; memory wait uses 12 more, totalling 18 cycles. This time is
distributed over the 8 bytecodes in a quadword, so non-jump opcodes are
charged 2.25 cycles/byte. Opcodes which jump are independently charged for
refilling IBuf - 2.25 cycles because the next byte never causes refill.

In other words, in counting opcode execution, assume that the next byte in
the instruction stream crosses a quadword boundary 1/8 of the time, so a
charge of 18/8 = 2.25 cycles is made for this.

However, counting is more difficult when memory references are in progress
at the tail of the opcode. Consider the following sequence, for example:
	PFetch1[LOCAL,Stack];
	LU _ NextInst[IBuf];
	NIRet;
The 1st mi of the next opcode will be aborted once (two cycles) to ensure
that PCX and SStkP will remain valid for the previous opcode on a fault.
Any fault aborts the 4th mi after the PFetch1. If the NextInst causes
refill, timing for this sequence will be 25 cycles; otherwise, timing will
be 9 cycles if the next opcode does not reference TOS for a long time or
15 cycles if TOS is referenced in the 1st mi of the next opcode.
Assuming that the NextInst causes refill 1/8 of the time, this sequence will
average 11.25 cycles if TOS is not referenced for a long time or 16.25
cycles if TOS is referenced in the 1st mi of the next opcode. All of these
times assume the quadword is not retransmitted from MC2 as a result of a
correctable storage failure.

Similarly, for the following:
	PStore1[LOCAL,Stack];
	LU _ NextInst[IBuf];
	NIRet;
If the NextInst causes refill, the timing for this sequence is 35 cycles;
otherwise, timing is ~9 cycles if TOS is not written and no reference occurs
for a long time, 17 cycles if TOS is written or a new reference is issued in
the 1st mi of the next opcode. Assuming that NextInst causes refill 1/8 of
the time, this sequence averages 12.25 cycles if the next opcode doesn't
write TOS or start a reference and 19.25 cycles if it does one of these
things in its 1st mi.
%

MesaRefill:	PCF _ RZero, At[MesaRefillLoc];
*Hold page fault on page 0 while assuring that the PCB_PCB+4 doesn't happen
*until AFTER the page fault. Even if the PCB+4 were in the mi immediately
*after the PFetch4, it would be uncertain whether or not that mi had
*completed, so the fault handler could not be certain what to do.
	Nop;
	Nop;
	PCB _ (PCB) + (4C), Return;

	PFetch4[PCB,IBuf,4], GoToP[MesaRefill], At[LShift[opPage2,10],377];

%NOTE: after any PFetch4 on page 6 faults, the fault handler will resume
after filling IBuf with 377b bytes; this means that it is inadvisable to use
the bypass kludge after the PFetch4 and control must remain on page 6. If
the bypass kludge were used, the transport for a preceding PFetch4, such as
the one in RDC.Mc, could experience error correction and advance the time of
a page fault so that the mi containing the bypass kludge was aborted, and
this would execute incorrectly.
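As a cross-check on the refill-timing arithmetic in the header comment at the top of this file, the following Python sketch (not part of the microcode) recomputes the per-byte refill charge and the 1/8-weighted averages. The function names are hypothetical, and the model assumes nothing beyond "refill happens on 1 byte in 8".

```python
from fractions import Fraction

REFILL_CYCLES = 6 + 12        # aborted mi + trap mi + PFetch4, plus memory wait
BYTES_PER_QUADWORD = 8        # a refill is spread over 8 bytecodes

def refill_charge():
    """Cycles charged per byte for refill: 18/8."""
    return Fraction(REFILL_CYCLES, BYTES_PER_QUADWORD)

def average_cycles(no_refill, with_refill):
    """Average of a tail sequence, assuming refill 1/8 of the time."""
    p = Fraction(1, BYTES_PER_QUADWORD)
    return (1 - p) * no_refill + p * with_refill

if __name__ == "__main__":
    print(float(refill_charge()))            # per-byte charge
    print(float(average_cycles(15, 25)))     # PFetch1 tail, TOS referenced at once
    print(float(average_cycles(9, 35)))      # PStore1 tail, no early reference
    print(float(average_cycles(17, 35)))     # PStore1 tail, early write or reference
```

Under this simple weighting the 16.25, 12.25 and 19.25 figures quoted in the header are reproduced exactly. The same weighting applied to 9 and 25 cycles gives 11, not the quoted 11.25, so that figure may rest on accounting that is not spelled out in the comment.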
%

JnComO:	PFetch4[PCB,IBuf];
JnComO1:	Nop;	*Avoid bypass kludge
	T _ PCB _ (PCB) + T;
	PCB _ (LSh[PCB,1]) + 1;
JnFin:	PCF _ PCB, PCB _ T, NoRegILockOK;	*Only low 3 bits of PCF loaded
	Nop;	**NextInst illegal in the mi after PCF_.
P6Tail:	LU _ NextInst[IBuf];	*Odd; paired with JJmp
P6Tailx:	PCB _ (PCB) and not (3C), NIRet;

JnComZ:	PFetch4[PCB,IBuf];
	Nop;	*Avoid bypass kludge
	T _ PCB _ (PCB) + T;
	PCB _ LSh[PCB,1], GoTo[JnFin];

*Even; paired with P6Tail
JJmp:	T _ (PCF.word) + T, DblGoTo[JnComO,JnComZ,R Odd];

JnEe:	T _ (PCF.word) + T, GoTo[JnComO];	*Even; paired with JnEo
JnEo:	T _ (PCF.word) + T + 1, GoTo[JnComZ];	*Odd; paired with JnEe

*Note: cannot use @J2 or @JB because, after page faults, PCX is wrong
*for continuation since it is smashed at an opcode starting instruction.
J2:	SkipData, CallX[P6Tail];	*Odd
Ejmp:	T _ NextData[IBuf], CallX[JBr];	*Even
JB:	T _ NextData[IBuf], CallX[JBr];	*Odd
Enojmp:	SkipData, CallX[P6Tail];	*Even

%Jn, n=2-4, 6, 8.
PCF points to the byte beyond the opcode when execution starts (i.e.,
1<=PCF<=10b), so if PCF is odd, the opcode is the even byte of the current
word, else the odd byte of the previous word. PCX is loaded from PCF at T2
of the first mi executed. The word displacement of the target from PCF[0:2]
and the final lsb of the PC are:
	n	PCF even	PCF odd
	2	0,1		1,0
	3	1,0		1,1
	4	1,1		2,0
	6	2,1		3,0
	8	3,1		4,0
%
@CATCH:	SkipData, CallX[P6Tail], Opcode[200];
@J2:	SkipData, CallX[P6Tail], Opcode[201];	*Timing: 6+(18*(2/8)) = 10.5
@J3:	SkipData, CallX[J2], Opcode[202];	*Timing: 8+(18*(3/8)) = 14.75
@J4:	SkipData, CallX[J3], Opcode[203];	*Timing: 10+(18*(4/8)) = 19.00

*Timing: 22 to 23 cycles
@J6:	T _ 2C, Cycle&PCXF, DblGoTo[JnEo,JnEe,R Odd], Opcode[204];
@J8:	T _ 3C, Cycle&PCXF, DblGoTo[JnEo,JnEe,R Odd], Opcode[205];

%Jump Byte: alpha is a signed displacement from the opcode.
AllOnes is used as a temporary, restored when done.
Note that RH[PCBhi] = LH[PCBhi] since code can't cross 64k boundary.
Timing: 28.25 (+3 if negative displacement) cycles.
%
@JB:	T _ NextData[IBuf], Opcode[206];
*JBr timing: 24 cycles (+3 if displacement negative).
JBr:	T _ (PCFReg) + T + 1, Skip[H2Bit8'];
	PCB _ (PCB) - (200C);	*Offset 400b bytes if neg. displacement
*Jump here on @JW, @JIB, and @JIW.
*AllOnes _ alpha+PCF-2 for @JB or alpha..beta+PCF-3 for @JW, where -2 or -3
*displaces alpha to the 1st byte of the opcode.
JBy:	AllOnes _ (Form-4[AllOnes]) + T + 1;
	T _ RSh[AllOnes,1], Skip[R>=0];
	T _ (LSh[R400,7]) or T;	*Need this for (alpha+PCF) .ls. -2
	PFetch4[PCB,IBuf];
	PCF _ AllOnes;
	PCB _ (PCB) + T;
	AllOnes _ (Zero) - 1, GoTo[P6Tail];

*Jump Word
*alpha,,beta is a 2's complement displacement.
*Timing: 32.5 cycles (+3 if displacement negative).
@JW:	LU _ CycleControl _ NextData[IBuf], Opcode[207];	*get alpha
	T _ NextData[IBuf];
	T _ (LHMask[Cycle&PCXF]) or T;	*Alpha,,Beta
	T _ (PCFreg) + T, GoTo[JBy];

PairComp:	T _ Stack&-1, UseCTask;
	LU _ (LdF[Cycle&PCXF,0,4]) xor T, Return;

*Jump Equal Pair.
*Jump if the 1st nibble in alpha .eq. TOS with displacement equal to the 2nd
*nibble in alpha + 4 = -2 + (4+PCF+pair right); pop stack once.
*Timing: 17.5 cycles (no jump), 12.25 cycles + JBr (jumps).
@JEP:	LU _ CycleControl _ CNextData[IBuf], Call[PairComp], Opcode[210];
	T _ 4C, Skip[ALU=0];
	LU _ NextInst[IBuf], CallX[P6Tailx];
	T _ (LdF[Cycle&PCXF,4,4]) + T, GoTo[JBr];

stkdif:	LU _ (Stack&-1) - T, Return;

*Jump Equal Byte.
*Jump with displacement alpha if TOS .eq. 2OS; pop stack twice.
*Timing: 10.25 + JBr (jumps), 17.5 cycles (no jump)
@JEB:	T _ Stack&-1, UseCTask, Call[stkdif], Opcode[211];
JEQBx:	DblGoTo[Ejmp,J2,ALU=0];

*Jump Equal Byte Byte.
*Jump with displacement beta if TOS .eq. alpha; pop stack once.
*Timing: 12.5 cycles + JBr (jumps), 19.75 cycles (no jump).
@JEBB:	T _ NextData[IBuf], Opcode[212];
	LU _ (Stack&-1) xor T;
	Skip[ALU=0];
	SkipData, CallX[P6Tail];
	T _ (NextData[IBuf]) - 1, CallX[JBr];

*Jump Not Equal Pair.
*Jump if the 1st nibble in alpha .ne.
*TOS with displacement equal to the 2nd
*nibble in alpha + 4 = -2 + (4+PCF+pair right); pop stack once.
*Timing: 16.5 cycles (no jump), 12.25 cycles + JBr (jumps).
@JNEP:	LU _ CycleControl _ CNextData[IBuf], Call[PairComp], Opcode[213];
	T _ 4C, Skip[ALU#0];
	LU _ NextInst[IBuf], CallX[P6Tailx];
	T _ (LdF[Cycle&PCXF,4,4]) + T, GoTo[JBr];

*Jump Not Equal Byte.
*Jump with displacement alpha if TOS .ne. 2OS; pop stack twice.
*Timing: 16.5 cycles (no jump), 11.25 cycles + JBr (jumps).
@JNEB:	T _ Stack&-1, UseCTask, Call[stkdif], Opcode[214];
JNEBx:	DblGoTo[JB,Enojmp,ALU#0];

*Jump Not Equal Byte Byte.
*Jump with displacement beta if alpha .ne. TOS; pop stack once.
*Timing: 18.75 cycles (no jump), 13.5 cycles + JBr (jumps).
@JNEBB:	T _ NextData[IBuf], Opcode[215];
	LU _ (Stack&-1) xor T;
	Skip[ALU#0];
	SkipData, CallX[P6Tail];
	T _ (NextData[IBuf]) - 1, CallX[JBr];

JLBpos:	DblGoTo[J2,Ejmp,Ovf'];	*Even
JLBneg:	DblGoTo[JB,Enojmp,Ovf'];	*Odd

*Jump Less Byte.
*Jump with displacement alpha if integer 2OS .ls. TOS; pop stack twice.
*Timing: 18.5 or 20.5 cycles (no jump), 13.25 cycles + JBr (jumps).
@JLB:	T _ Stack&-1, UseCTask, Call[stkdif], Opcode[216];
JLBx:	FreezeResult, DblGoTo[JLBpos,JLBneg,ALU>=0];

JGEBpos:	DblGoTo[JB,Enojmp,Ovf'];	*Even
JGEBneg:	DblGoTo[J2,Ejmp,Ovf'];	*Odd

*Jump Greater Equal Byte.
*Jump with displacement alpha if integer 2OS .ge. TOS; pop stack twice.
*Timing: 19.5 cycles (no jump), 12.25 or 13.25 cycles + JBr (jumps).
@JGEB:	T _ Stack&-1, UseCTask, Call[stkdif], Opcode[217];
JGEBx:	FreezeResult, DblGoTo[JGEBpos,JGEBneg,ALU>=0];

stksw:	T _ Stack&+1, Return;

*Jump Greater Byte.
*Jump with displacement alpha if integer 2OS .gr. TOS; pop stack twice.
*Timing: 20.5 or 22.5 cycles (no jump), 15.25 cycles + JBr (jumps).
@JGB:	Stack&-1, UseCTask, Call[stksw], Opcode[220];
	LU _ (Stack&-2) - T, GoTo[JLBx];

*Jump Less Equal Byte.
*Jump with displacement alpha if integer 2OS .le. TOS; pop stack twice.
*Timing: 21.5 cycles (no jump), 14.25 or 15.25 cycles + JBr (jumps).
@JLEB:	Stack&-1, UseCTask, Call[stksw], Opcode[221];
	LU _ (Stack&-2) - T, GoTo[JGEBx];

*Jump Unsigned Less Byte.
*Jump with displacement alpha if cardinal 2OS .ls. TOS; pop stack twice.
*Timing: 10.25 cycles + JBr (jumps), 17.5 cycles (no jump).
@JULB:	T _ Stack&-1, UseCTask, Call[stkdif], Opcode[222];
JULBx:	DblGoTo[J2,Ejmp,Carry];

*Jump Unsigned Greater Equal Byte.
*Jump with displacement alpha if cardinal 2OS .ge. TOS; pop stack twice.
*Timing: 11.25 cycles + JBr (jumps), 16.5 cycles (no jump).
@JUGEB:	T _ Stack&-1, UseCTask, Call[stkdif], Opcode[223];
JUGEBx:	DblGoTo[JB,Enojmp,Carry];

*Jump Unsigned Greater Byte.
*Jump with displacement alpha if cardinal 2OS .gr. TOS; pop stack twice.
*Timing: 12.25 cycles + JBr (jumps), 19.5 cycles (no jump).
@JUGB:	Stack&-1, UseCTask, Call[stksw], Opcode[224];
	LU _ (Stack&-2) - T, GoTo[JULBx];

*Jump Unsigned Less Equal Byte.
*Jump with displacement alpha if cardinal 2OS .le. TOS; pop stack twice.
*Timing: 12.25 cycles + JBr (jumps), 18.5 cycles (no jump).
@JULEB:	Stack&-1, UseCTask, Call[stksw], Opcode[225];
	LU _ (Stack&-2) - T, GoTo[JUGEBx];

*Jump Zero 3.
*Jump with displacement 3 if TOS .eq. 0; pop stack once.
*Timing: 16.5 cycles (jumps), 11.25 cycles (no jump).
@JZ3:	LU _ Stack&-1, Opcode[226];
	Skip[ALU=0];
	LU _ NextInst[IBuf], CallX[P6Tailx];
J3:	SkipData, CallX[J2];

*Jump Zero 4.
*Jump with displacement 4 if TOS .eq. 0; pop stack once.
*Timing: 20.75 cycles (jumps), 11.25 cycles (no jump).
@JZ4:	LU _ Stack&-1, Opcode[227];
	Skip[ALU=0];
	LU _ NextInst[IBuf], CallX[P6Tailx];
	SkipData, CallX[J3];

*Jump Zero Byte.
*Jump with displacement alpha if TOS .eq. 0; pop stack once.
*Timing: 8.25 + JBr (jumps), 15.5 cycles (no jump)
@JZB:	LU _ Stack&-1, GoTo[JEQBx], Opcode[230];

*Jump Not Zero 3.
*Jump with displacement 3 if TOS .ne. 0; pop stack once.
*Timing: 17.5 cycles (jumps), 10.25 cycles (no jump).
@JNZ3:	LU _ Stack&-1, Opcode[231];
	Skip[ALU#0];
	LU _ NextInst[IBuf], CallX[P6Tailx];
	SkipData, CallX[J2];

*Jump Not Zero 4.
*Jump with displacement 4 if TOS .ne. 0; pop stack once.
*Timing: 21.75 cycles (jumps), 10.25 cycles (no jump).
@JNZ4:	LU _ Stack&-1, Opcode[232];
	Skip[ALU#0];
	LU _ NextInst[IBuf], CallX[P6Tailx];
	SkipData, CallX[J3];

*Jump Not Zero Byte.
*Jump with displacement alpha if TOS .ne. 0; pop stack once.
*Timing: 14.5 cycles (no jump), 9.25 cycles + JBr (jumps).
@JNZB:	LU _ Stack&-1, GoTo[JNEBx], Opcode[233];

CODEToRTemp:	PFetch1[CODE,RTemp];
	T _ PCFReg, Return;

P6PopComp:	T _ Stack&-1, UseCTask;
	LU _ (Stack) - T, Return;

*Jump Indexed Byte.
*Alpha,,beta is a CODE-relative pointer to an array of bytes; TOS is a byte
*index to the array (even bytes in bits 0:7 of a word, odd bytes in bits
*8:15); jump with unsigned displacement fetched from the byte array.
*Timing: 21.75 cycles (no jump), 32.5 cycles + JBy (jumps).
@JIB:	LU _ CycleControl _ CNextData[IBuf], Call[P6PopComp], Opcode[234];
	T _ RSh[Stack,1], Skip[Carry'];
	SkipData, CallX[P6Pop];	*Exit to next opcode
	T _ (NextData[IBuf]) + T;	*Add beta
	T _ (LHMask[Cycle&PCXF]) + T, Call[CODEToRTemp];
	Stack&-1, Skip[R Odd];
	T _ (LdF[RTemp,0,10]) + T, GoTo[JBy];
	T _ (RHMask[RTemp]) + T, GoTo[JBy];

*Jump Indexed Word.
*Alpha,,beta is a CODE-relative pointer to an array of words indexed by TOS;
*carry out a PC-relative jump using the signed displacement from the array.
*Timing: 21.75 cycles (no jump), 32.5 cycles + JBy (jumps).
@JIW:	LU _ CycleControl _ CNextData[IBuf], Call[P6PopComp], Opcode[235];
	T _ Stack&-1, Skip[Carry'];
	SkipData, CallX[P6Tail];	*Flush beta and exit
	T _ (NextData[IBuf]) + T;	*add beta
	T _ (LHMask[Cycle&PCXF]) + T, Call[CODEToRTemp];
	T _ (RTemp) + T, GoTo[JBy];

*Recover. Timing: 6.25 cycles.
@REC:	LU _ NextInst[IBuf], Opcode[236];
	Stack&+1, NIRet;

*Recover Two. Timing: 6.25 cycles.
@REC2:	LU _ NextInst[IBuf], Opcode[237];
	Stack&+2, NIRet;

*Discard. Timing: 6.25 cycles.
@DIS:	LU _ NextInst[IBuf], Opcode[240];
P6PopTailx:	Stack&-1, NIRet;

*Discard Two. Timing: 6.25 cycles.
@DIS2:	LU _ NextInst[IBuf], Opcode[241];
	Stack&-2, NIRet;

*Exchange. Timing: 14.25 cycles.
@EXCH:	T _ Stack&-1, Opcode[242];
*The preceding Stack&-1 has interlocked any PFetch to the stack.
*Want to do MNBR _ Stack, Stack _ T, NoRegILockOK; here instead of
*the next two mi, but that won't interlock a PStore1 at Stack properly.
	LU _ MNBR _ Stack&-1;
	Stack&+1 _ T;
	T _ MNBR, GoTo[P6PushT];

ExchST:	MNBR _ Stack, Stack _ T, NoRegILockOK;
	T _ MNBR, Return;

*Double Exchange. Timing: 24.25 cycles.
*See interlocking comments for EXCH.
@DEXCH:	LU _ Stack&-3, Opcode[243];
	T _ Stack&+2, Call[ExchST];	*Exch(StkP-3,StkP-1)
	Stack&-2 _ T;
	Stack&+3;	*Exch(StkP,StkP-2)
	T _ Stack&-2, Call[ExchST];
SkipPushT:	LU _ NextInst[IBuf];
SkipPushTx:	Stack&+2 _ T, NIRet;

*Duplicate. Timing: 8.25 cycles.
*Stack&+1 _ Stack&+1 won't interlock odd PFetch2.
@DUP:	T _ Stack&-1, GoTo[SkipPushT], Opcode[244];

*Double Duplicate. Timing: 10.25 cycles.
*Stack&+2 _ Stack&+2 won't interlock odd PFetch2.
@DDUP:	T _ Stack&-1, Opcode[245];
	Stack&+2 _ Stack&+2, GoTo[P6PushT];

*Exchange Discard. Timing: 8.25 cycles.
*Stack&-1 _ Stack&-1 does not interlock a PStore2 at StkP-3/StkP-2.
@EXDIS:	T _ Stack&-2, GoTo[P6PushT], Opcode[246];

*Negate. Timing: 10.25 cycles.
@NEG:	T _ Stack&-1, Opcode[247];
	T _ (Zero) - T, GoTo[P6PushT];

*Increment. Timing: 8.25 cycles.
@INC:	T _ (Stack&-1) + 1, GoTo[P6PushT], Opcode[250];

*Decrement. Timing: 8.25 cycles.
@DEC:	T _ (Stack&-1) - 1, GoTo[P6PushT], Opcode[251];

P6PS2Safety:
*This mi wasted for PStore2 safety (?ugh?)
	Stack&+1, LU _ T, Return;

*Double Increment. Timing: 14.25 cycles.
@DINC:	T _ Stack&-2, Call[P6PS2Safety], Opcode[252];
	Stack _ (Stack) + 1;
	T _ (RZero) + T, UseCOutAsCIn, GoTo[P6PushT];

*Double. Timing: 8.25 cycles.
@DBL:	T _ LSh[Stack&-1,1], Opcode[253];
P6PushT:	LU _ NextInst[IBuf];
	Stack&+1 _ T, NIRet;

*Double Double. Timing: 14.25 cycles.
@DDBL:	T _ LSh[Stack&-2,1], Call[P6PS2Safety], Opcode[254];
	T _ (RSh[Stack,17]) + T;
	Stack _ LSh[Stack,1], GoTo[P6PushT];

*Triple. Timing: 10.25 cycles.
@TRPL:	T _ LSh[Stack&-1,1], Opcode[255];
	Stack&+1, GoTo[Addx];

*And. Timing: 8.25 cycles.
@AND:	T _ Stack&-1, Opcode[256];
	LU _ NextInst[IBuf];
	Stack _ (Stack) and T, NIRet;

*Ior. Timing: 8.25 cycles.
@IOR:	T _ Stack&-1, Opcode[257];
	LU _ NextInst[IBuf];
	Stack _ (Stack) or T, NIRet;

*Add Signed Byte. Timing: 14.5 cycles pos., 17.5 cycles neg.
@ADDSB:	T _ NextData[IBuf], Opcode[260];
	RTemp _ T, Skip[H2Bit8'];
	T _ (RTemp) + (177400C);
*This mi wasted to interlock immediately preceding unaligned PFetch2[x,Stack].
	T _ (Stack&-1) + T, GoTo[P6PushT];

*Add. Timing: 8.25 cycles.
@ADD:	T _ Stack&-1, Opcode[261];
Addx:	LU _ NextInst[IBuf];
	Stack _ (Stack) + T, NIRet;

*Subtract. Timing: 8.25 cycles.
@SUB:	T _ Stack&-1, Opcode[262];
Subx:	LU _ NextInst[IBuf];
	Stack _ (Stack) - T, NIRet;

GetTDecStk2:	T _ Stack&-2, Return;	*grab it, point to lsb of second doubleword

*Double Add. Timing: 16.25 or 17.25 cycles.
@DADD:	MNBR _ Stack&-1, Call[GetTDecStk2], Opcode[263];	*point to lsb of top doubleword
	Stack _ (Stack) + T;	*add low bits
	Stack&+1, Skip[Carry];
	T _ MNBR, GoTo[Addx];	*pick up high bits of top doubleword
	T _ (MNBR) + 1, GoTo[Addx];	*pick up high bits of top doubleword

*Double Subtract. Timing: 16.25 or 17.25 cycles.
@DSUB:	MNBR _ Stack&-1, Call[GetTDecStk2], Opcode[264];	*point to lsb of top doubleword
	Stack _ (Stack) - T;	*subtract low bits
	Stack&+1, Skip[Carry'];	*point to msb of second doubleword
	T _ MNBR, GoTo[Subx];	*remember msb of top doubleword (TOS)
	T _ (MNBR) + 1, GoTo[Subx];

*Add Double to Cardinal. Timing: 16.25 cycles.
*Cardinal at TOS, Double at 2OS,,3OS.
@ADC:	T _ Stack&-3, Call[P6PS2Safety], Opcode[265];
	Stack _ (Stack) + T;
	Stack&+1, FreezeResult;
	Stack _ (Stack) + 1, UseCOutAsCIn, GoTo[P6Tail];

*Add Cardinal to Double. Timing: 16.25 or 17.25 cycles.
*Double at TOS,,2OS, Cardinal at 3OS.
@ACD:	LU _ Stack&-2, Opcode[266];
	T _ Stack&+1;
	Stack&-1 _ (Stack&-1) + T;
	Stack&+2, Skip[Carry];
***Could save 2 cycles by expending 2 mi here.
	Stack&-1 _ Stack&-1, GoTo[P6Tail];
	Stack&-1 _ (Stack&-1) + 1, GoTo[P6Tail];

*Add Local 0 to Immediate Byte.
:IF[CacheLocals]; ************************************
*Timing: 12.5 cycles.
@AL0IB:	T _ NextData[IBuf], Opcode[267];
	T _ (LocalCache0) + T, GoTo[P6PushT];
:ELSE; ***********************************************
*Timing: 20.5 cycles.
@AL0IB:	PFetch1[LOCAL,Stack,0], Opcode[267];
	T _ NextData[IBuf], CallX[Addx];
:ENDIF; **********************************************

%Multiply--high half of 32-bit product is left above the top of the Stack
	multiplier in RTemp (from the argument at TOS)
	multiplicand in T (from the argument at TOS-1)
	product low in Stack, hi in RTemp1
The first loop flushes leading 0's in the multiplier with timing 2 cycles/0;
The second loop processes 0's in 6 cycles and 1's in 10 or 11 cycles.
Note how a low-order 1 in the multiplier serves as an end flag.
Timing:	18.25 cycles if multiplier is 0, else
	58.25 cycles if multiplier is 1, else
	(16.25 to 19.25) + 2*LZ + (16-LZ)*6 + (4 or 5)*(NOnes) cycles =
	(48.25 to 51.25) + (16-LZ)*4 + (4 or 5)*NOnes cycles.

NOTE: For random numbers, this algorithm averages about 23 cycles faster
than the one on the next page. However, when the multiplier has many
leading + trailing zeroes, it is worse than the other. For products less
than 16d bits, this algorithm is 28 cycles slower for a multiplier with a
single 1 bit but gains 2 cycles for each additional 1 bit in the multiplier.
For products greater than 16d bits, it is 23 cycles slower for a multiplier
with a single 1 bit and gains 5 cycles for each additional 1 bit in the
multiplier after the product has exceeded 16 bits. Since 87 percent of all
multiplies are preceded by small constant pushes, the other algorithm
probably averages faster than this one, but this one is 6b mi smaller, so
we use it.
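The two forms of the general-case multiply timing given above can be checked against each other with a small Python sketch (not part of the microcode). The function names are hypothetical. Reading LZ as the count of leading zeros in the 16-bit multiplier and NOnes as the count of its 1 bits is an assumption, since the comment does not define them.

```python
def mul_timing_first_form(base, lz, n_ones, per_one):
    """base + 2*LZ + (16-LZ)*6 + per_one*NOnes, with base in 16.25..19.25."""
    return base + 2 * lz + (16 - lz) * 6 + per_one * n_ones

def mul_timing_second_form(base, lz, n_ones, per_one):
    """(base+32) + (16-LZ)*4 + per_one*NOnes, i.e. the 48.25..51.25 form."""
    return (base + 32) + (16 - lz) * 4 + per_one * n_ones

def leading_zeros_and_ones(multiplier):
    """Assumed meaning of LZ and NOnes for a nonzero 16-bit multiplier."""
    assert 0 < multiplier < (1 << 16)
    lz = 16 - multiplier.bit_length()
    n_ones = bin(multiplier).count("1")
    return lz, n_ones

if __name__ == "__main__":
    lz, n_ones = leading_zeros_and_ones(0o1234)
    print(mul_timing_first_form(16.25, lz, n_ones, 4))
    print(mul_timing_second_form(16.25, lz, n_ones, 4))
```

Both forms reduce to base + 96 - 4*LZ + per_one*NOnes, so they agree for every LZ; the multiplier values 0 and 1 are quoted as separate special cases in the comment and are not modelled here.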
%
@MUL:	RTemp1 _ T _ 30C, Call[MulSU], Opcode[270];
*2nd loop shifts the product RTemp1/Stack left 1 and conditionally adds the
*multiplicand T based upon sign of the multiplier RTemp, which is left-shifted
*until the right-most 1 bit is seen.
	RTemp _ (RTemp) SALUFOP T, GoTo[Mul1,R<0];
Mul0:	Stack _ (Stack) SALUFOP T;
	RTemp1 _ (RTemp1) SALUFOP T, UseCOutAsCIn, Return;
Mul1:	RTemp1 _ (RTemp1) SALUFOP T, UseCOutAsCIn, GoTo[MulLast,ALU=0];
	Stack _ (LSh[Stack,1]) + T, Skip[R<0];
	RTemp1 _ (RTemp1) - 1, UseCOutAsCIn, Return;
	RTemp1 _ (RTemp1) + 1, UseCOutAsCIn, Return;

*Force the low bit of multiplier RTemp to 1 for the end test.
*Initialize the high product word (RTemp1) to 0; low product word (TOS-1)
*already contains the multiplicand, so we don't zero it and add the
*multiplicand on the 1st multiplier 1 (but we have to test multiplier for 0).
MulSU:	RTemp1 _ (RTemp1) - (SALUF _ T);	*SALUF = 30b is LU _ 2A
	T _ (Stack&-1) SALUFOP T, Skip[R>=0];	*Multiplier*2 from TOS
	RTemp _ (Zero) + T + 1, GoTo[MulSUX];	*Multiplier .ls. 0
	RTemp _ (Zero) + T + 1, Skip[ALU#0];	*Multiplier .ge. 0
	T _ Stack _ 0C, GoTo[mdPush];	*Multiplier .eq. 0
*One mi loop shifts off leading 0's in multiplier.
	RTemp _ (RTemp) SALUFOP T, Skip[R<0];
	RTemp _ (RTemp) SALUFOP T, GoTo[.,R>=0];
MulSUX:	T _ Stack, Return;	*Multiplicand from TOS-1

MulLast:	T _ RSh[RTemp1,1], GoTo[mdPush,Carry'];
	T _ (LSh[AllOnes,17]) or T, GoTo[MdPush];

mdPush:	Stack&+1 _ T;	*Even
P6Pop:	LU _ NextInst[IBuf], CallX[P6PopTailx];

%Multiply--high half of 32-bit product is left above the top of the Stack
	product low in Stack, hi in RTemp1
	multiplicand low in xBuf, hi in xBuf1
	multiplier in RTemp
The first loop runs until the multiplicand being left-shifted 1 each step
overflows into the high word. It has timing of 4 cycles on 0's, 10 on 1's;
this loop doesn't task on 0's (worst case without tasking ~ 70 cycles on a
multiplier of 100000b and multiplicand of 1). Note that the end test need
be made only when processing a multiplier 1.
The second loop runs until the last multiplier 1 is processed with timing of
6 cycles on 0's, 14 on 1's.
Timing:	20.25 cycles if the multiplier is 0, else
	30.25 cycles if the multiplier is 1, else
	20.25 + (1 if product .ge. 2^16)
	+ (4/multiplier 0) + (10/1) cycles while product .ls. 2^16
	+ (6/multiplier 0) + (14/1) cycles while product .ge. 2^16
where the zeroes are those between the leftmost and rightmost ones in the
multiplier.

PopToT:	T _ Stack&-1, FreezeResult, Return;

@MUL:	RTemp1 _ T _ 30C, Opcode[270];	*SALUF = 30b is LU _ 2A
	RTemp1 _ (RTemp1) - (SALUF _ T), Call[PopToT];	*RTemp1 _ 0
	RTemp _ T, UseCTask, Call[PopToT];
	Stack&+1 _ 0C, Skip[ALU#0];	*tests RTemp _ T
	T _ Stack&+1 _ 0C, GoTo[P6Pop];
	xBuf _ T, Call[.+1];
*1st loop
	RTemp _ RSh[RTemp,1], GoTo[MulZ,R Even];
MulO:	Stack _ (Stack) + T, Skip[ALU#0];
	T _ (RTemp1) + 1, UseCoutAsCin, GoTo[mdPush];
	T _ xBuf _ (xBuf) SALUFOP T, FreezeResult, Skip[R<0];
	RTemp1 _ (RTemp1) + 1, UseCOutAsCIn, Return;
	RTemp1 _ (RTemp1) + 1, UseCOutAsCIn, GoTo[MulL];
MulZ:	T _ xBuf _ (xBuf) SALUFOP T, Skip[R<0];
*Must replicate the mi at MulO-1 because the opcode dispatch locations are
*only four apart on this page.
	RTemp _ RSh[RTemp,1], DblGoTo[MulO,MulZ,R Odd];
MulL:	xBuf1 _ 1C, Call[.+1];
*2nd loop
	RTemp _ RSh[RTemp,1], GoTo[MulLZ,R Even];
MulLO:	Stack _ (Stack) + T, GoTo[.+3,ALU#0];
	T _ xBuf1, FreezeResult;
	T _ RTemp1 _ (RTemp1) + T + 1, UseCOutAsCIn, GoTo[mdPush];
	T _ xBuf1, FreezeResult;
	RTemp1 _ (RTemp1) + T + 1, UseCOutAsCIn;
MulLZ:	T _ xBuf _ (xBuf) SALUFOP T;	*Double the multiplicand
	xBuf1 _ (xBuf1) SALUFOP T, UseCOutAsCIn, Return;
%

*Double Compare (signed).
*If 3OS,,4OS < TOS,,2OS, push -1, 19.25 cycles
*If 3OS,,4OS = TOS,,2OS, push 0, timing 16.25 cycles
*If 3OS,,4OS > TOS,,2OS, push 1, 18.25 cycles
*Add 4 cycles if the high order words are equal and low order words unequal.
@DCMP:	T _ (Stack&-2) + (100000C), Opcode[271];
	Stack _ (Stack) + (100000C), GoTo[DCMPy];

*Unsigned Double Compare
@UDCMP:	T _ Stack&-2, Opcode[272];
*Compare msb's, point at lsb of high doubleword
*grab lsb of top doubleword, point at lsb of second doubleword
DCMPy:	LU _ (Stack&+1) - T;
	T _ Stack&-2, FreezeResult, Skip[ALU=0];
	Stack _ (Zero) + 1, DblGoTo[DUCompL,DUCompG,Carry'];
	Stack _ (Stack) - T;	*compare low words
	FreezeResult, Skip[ALU#0];
	LU _ NextInst[IBuf], Call[P6Tailx];
	Stack _ (Zero) + 1, DblGoTo[DUCompL,DUCompG,Carry'];
DUCompL:	Stack _ (Stack) or not (0C), GoTo[P6Tail];
DUCompG:	LU _ NextInst[IBuf], Call[P6Tailx];

*ESC and ESCL (opcodes 273-274) are in MesaESC.Mc

*Lengthen Pointer.
*Timing: 13.25 cycles non-NIL, 10.25 cycles NIL.
@LP:	T _ Stack&-1, Opcode[275];	*Pop to interlock PFetch2's
	Skip[ALU=0];	*Test for NIL
	T _ RSh[MDShi,10];	*push MDShi if non-NIL
	LU _ NextInst[IBuf], CallX[SkipPushTx];

P6Undef:	LoadPage[opPage0];
	RTemp _ sOpcodeTrap, GoToP[SDTrap];

@OP276:	TrapParm _ 276C, GoTo[P6Undef], Opcode[276];
@OP277:	TrapParm _ 277C, GoTo[P6Undef], Opcode[277];

:END[MesaOP2];
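@DCMP above adds 100000b to both high-order words and then joins the @UDCMP path at DCMPy. The Python sketch below (not part of the microcode) models why that bias lets an unsigned comparison give the signed result. The function names are hypothetical, and each double is passed as a (high word, low word) pair of 16-bit values, following the TOS,,2OS notation in the comments.

```python
def to_signed32(hi, lo):
    """Interpret hi,,lo (two 16-bit words) as a two's-complement 32-bit value."""
    value = (hi << 16) | lo
    return value - (1 << 32) if value & (1 << 31) else value

def udcmp(a_hi, a_lo, b_hi, b_lo):
    """Unsigned double compare: -1, 0 or 1 as a is below, equal to, above b."""
    a = (a_hi, a_lo)          # tuples compare high word first, then low word
    b = (b_hi, b_lo)
    return (a > b) - (a < b)

def dcmp(a_hi, a_lo, b_hi, b_lo):
    """Signed double compare: bias both high words by 100000b, then compare unsigned."""
    bias = 0o100000
    return udcmp((a_hi + bias) & 0xFFFF, a_lo, (b_hi + bias) & 0xFFFF, b_lo)

if __name__ == "__main__":
    # -1 compared with 0: signed says below, unsigned says above
    print(dcmp(0xFFFF, 0xFFFF, 0, 0), udcmp(0xFFFF, 0xFFFF, 0, 0))
```

Adding 100000b flips the sign bit of each high word, which maps the signed range onto the unsigned range in the same order, so a single comparison routine serves both opcodes.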