:TITLE[MesaOP2];*Ed Fiala 21 May 1982.

%PCB,,PCBhi is a base register pointing to the current instruction quadword.
PCB[14:15] are 0, and the low 3 bits of the PC (which point to a byte within
the quadword) are kept in PCF. Since code segments cannot cross 64K
boundaries and are limited to 32K words in length, the two bytes of PCBhi
are forced to be equal, rather than having the least significant byte differ
from the msb by 1. This facilitates negative jumps.

Refill occurs when, at the onset of a NextInst or NextData, PCF contains a
value greater than 7b. In this case the mi is aborted and the trap mi at
location 0, ’LoadPageExternal[0], GoToExternal[377]’, is executed, sending
control to location 377b on the page that caused refill. Identical mi exist
on all pages from which refill might occur.

Refill timing is as follows: the aborted mi, the trap mi at 0, and the PFetch4
use 6 cycles; memory wait uses 12 more, totalling 18 cycles. This time is
distributed over the 8 bytes in a quadword, so non-jump opcodes are charged
2.25 cycles/byte. Opcodes that jump are instead charged separately for
refilling IBuf at the jump target, with no 2.25-cycle charge, because the
byte following a jump never causes refill.

In other words, in counting opcode execution, assume that the next byte in the
instruction stream crosses a quadword boundary 1/8 of the time, so a charge
of 18/8 = 2.25 cycles is made for this.
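
A minimal Python model of this scheme follows (an illustration, not the
microcode). It assumes a byte PC measured from the start of the code
segment. split_pc, next_byte, and REFILL_CYCLES are hypothetical names. A
refill is modeled the way MesaRefill does it: PCF ← 0 and PCB ← PCB+4.

```python
# Hypothetical model of PCB/PCF and IBuf refill; not cycle-accurate.
REFILL_CYCLES = 18   # aborted mi + trap mi + PFetch4 (6) + memory wait (12)


def split_pc(byte_pc):
    """Split a byte PC into (PCB, PCF)."""
    pcb = (byte_pc >> 1) & ~3    # word address of the quadword; PCB[14:15] = 0
    pcf = byte_pc & 7            # byte within the 4-word (8-byte) quadword
    return pcb, pcf


def next_byte(pcb, pcf):
    """Fetch one byte via NextInst/NextData; also report whether a refill ran."""
    refill = pcf > 7
    if refill:                   # trap to MesaRefill: PCF <- 0, PCB <- PCB + 4
        pcb, pcf = pcb + 4, 0
    return pcb, pcf + 1, refill


# One refill per 8 sequential bytes gives the per-byte charge.
PER_BYTE_CHARGE = REFILL_CYCLES / 8
```

In this model, fetching 16 bytes sequentially from the start of a quadword
triggers exactly one refill, when the 9th byte is requested.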

However, counting is more difficult when memory references are in progress
at the tail of the opcode. Consider the following sequence, for example:
PFetch1[LOCAL,Stack];
LU ← NextInst[IBuf];
NIRet;
The 1st mi of the next opcode will be aborted once (two cycles) to ensure
that PCX and SStkP will remain valid for the previous opcode on a fault.
Any fault aborts the 4th mi after the PFetch1. If the NextInst causes
refill, timing for this sequence will be 25 cycles; otherwise, timing will
be 9 cycles if the next opcode does not reference TOS for a long time or
15 cycles if TOS is referenced in the 1st mi of the next opcode. Assuming
that the NextInst causes refill 1/8 of the time, this sequence will average
11 cycles if TOS is not referenced for a long time or 16.25 cycles if TOS
is referenced in the 1st mi of the next opcode. All of these times assume
the quadword is not retransmitted from MC2 as a result of a correctable
storage failure.

Similarly, for the following:
PStore1[LOCAL,Stack];
LU ← NextInst[IBuf];
NIRet;
If the NextInst causes refill, the timing for this sequence is 35 cycles;
otherwise, timing is ~9 cycles if TOS is not written and no reference occurs
for a long time, or 17 cycles if TOS is written or a new reference is issued in
the 1st mi of the next opcode. Assuming that NextInst causes refill 1/8 of
the time, this sequence averages 12.25 cycles if the next opcode doesn’t
write TOS or start a reference and 19.25 cycles if it does one of these
things in its 1st mi.
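
The averages above weight the refill timing by 1/8 and the no-refill timing
by 7/8. A small Python check follows (average_cycles is a hypothetical
helper):

```python
def average_cycles(refill_cycles, normal_cycles, p_refill=1 / 8):
    """Expected cycles when the NextInst causes refill with probability p_refill."""
    return p_refill * refill_cycles + (1 - p_refill) * normal_cycles


# PFetch1 tail: 25 cycles with refill; 9 or 15 cycles without.
pfetch_idle, pfetch_tos = average_cycles(25, 9), average_cycles(25, 15)
# PStore1 tail: 35 cycles with refill; ~9 or 17 cycles without.
pstore_idle, pstore_tos = average_cycles(35, 9), average_cycles(35, 17)
```

With the figures above, the four averages come out to 11, 16.25, 12.25, and
19.25 cycles.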
%

MesaRefill:
PCF ← RZero, At[MesaRefillLoc];
*Hold page fault on page 0 while ensuring that the PCB←PCB+4 doesn’t happen
*until AFTER the page fault. Even if the PCB+4 were in the mi immediately
*after the PFetch4, it would be uncertain whether or not that mi had
*completed, so the fault handler could not be certain what to do.
Nop;
Nop;
PCB ← (PCB) + (4C), Return;

PFetch4[PCB,IBuf,4], GoToP[MesaRefill], At[LShift[opPage2,10],377];

%NOTE: after any PFetch4 on page 6 faults, the fault handler will resume after
filling IBuf with 377b bytes; this means that it is inadvisable to use the
bypass kludge after the PFetch4 and control must remain on page 6. If the
bypass kludge were used, the transport for a preceding PFetch4, such as the
one in RDC.Mc, could experience error correction and advance the time of a
page fault so that the mi containing the bypass kludge was aborted, and
this would execute incorrectly.
%
JnComO:
PFetch4[PCB,IBuf];
JnComO1:
Nop;*Avoid bypass kludge
T ← PCB ← (PCB) + T;
PCB ← (LSh[PCB,1]) + 1;
JnFin:
PCF ← PCB, PCB ← T, NoRegILockOK;*Only low 3 bits of PCF loaded
Nop;**NextInst illegal in the mi after PCF←.
P6Tail:
LU ← NextInst[IBuf];*Odd; paired with JJmp
P6Tailx:
PCB ← (PCB) and not (3C), NIRet;

JnComZ:
PFetch4[PCB,IBuf];
Nop;*Avoid bypass kludge
T ← PCB ← (PCB) + T;
PCB ← LSh[PCB,1], GoTo[JnFin];

*Even; paired with P6Tail
JJmp:
T ← (PCF.word) + T, DblGoTo[JnComO,JnComZ,R Odd];

JnEe:
T ← (PCF.word) + T, GoTo[JnComO];*Even; paired with JnEo
JnEo:
T ← (PCF.word) + T + 1, GoTo[JnComZ];*Odd; paired with JnEe

*Note: cannot use @J2 or @JB because, after page faults, PCX is wrong
*for continuation since it is smashed at an opcode-starting instruction.
J2:
SkipData, CallX[P6Tail];*Odd
Ejmp:
T ← NextData[IBuf], CallX[JBr];*Even

JB:
T ← NextData[IBuf], CallX[JBr];*Odd
Enojmp:
SkipData, CallX[P6Tail];*Even

%Jn, n=2-4, 6, 8. PCF points to the byte beyond the opcode when execution
starts (i.e., 1<=PCF<=10b), so if PCF is odd, the opcode is the even byte of
the current word, else the odd byte of the previous word. PCX is loaded from
PCF at T2 of the first mi executed. The word displacement of the target from
PCF[0:2] and the final lsb of the PC are:
 n     PCF even    PCF odd
       (wd,lsb)    (wd,lsb)
 2       0,1         1,0
 3       1,0         1,1
 4       1,1         2,0
 6       2,1         3,0
 8       3,1         4,0
%
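
%The displacement table above follows from simple byte arithmetic. In this
Python sketch (jn_target is a hypothetical name), PCF is taken as a byte
offset from the start of the current quadword, and the jump is relative to
the opcode byte at PCF-1:

```python
def jn_target(n, pcf):
    """Return (word displacement from word PCF[0:2], final PC lsb) for Jn.

    pcf is the byte index just past the opcode (1..8), so the opcode is at
    byte pcf - 1: the even byte of word pcf >> 1 when pcf is odd, or the odd
    byte of the previous word when pcf is even.
    """
    target = (pcf - 1) + n              # jumps are relative to the opcode byte
    return (target >> 1) - (pcf >> 1), target & 1


for n in (2, 3, 4, 6, 8):
    print(n, jn_target(n, 2), jn_target(n, 3))   # PCF even, PCF odd
```

J6 and J8 below appear to follow the same rule. T starts at n/2-1 and gets
one more when PCF is odd, and the final lsb selects JnComO or JnComZ.
%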
@CATCH:
SkipData, CallX[P6Tail], Opcode[200];
@J2:
SkipData, CallX[P6Tail], Opcode[201];*Timing: 6+(18*(2/8)) = 10.5
@J3:
SkipData, CallX[J2], Opcode[202];*Timing: 8+(18*(3/8)) = 14.75
@J4:
SkipData, CallX[J3], Opcode[203];*Timing: 10+(18*(4/8)) = 19.00
*Timing: 22 to 23 cycles
@J6:
T ← 2C, Cycle&PCXF, DblGoTo[JnEo,JnEe,R Odd], Opcode[204];
@J8:
T ← 3C, Cycle&PCXF, DblGoTo[JnEo,JnEe,R Odd], Opcode[205];

%Jump Byte: alpha is a signed displacement from the opcode.
AllOnes is used as a temporary, restored when done. Note that
RH[PCBhi] = LH[PCBhi] since code can’t cross 64k boundary.
Timing: 28.25 (+3 if negative displacement) cycles.
%
@JB:
T ← NextData[IBuf], Opcode[206];
*JBr timing: 24 cycles (+3 if displacement negative).
JBr:
T ← (PCFReg) + T + 1, Skip[H2Bit8’];
PCB ← (PCB) - (200C);*Offset 400b bytes if neg. displacement
*Jump here on @JW, @JIB, and @JIW.
*AllOnes ← alpha+PCF-2 for @JB or alpha,,beta+PCF-3 for @JW, where the -2 or -3
*makes the displacement relative to the 1st byte of the opcode.
JBy:
AllOnes ← (Form-4[AllOnes]) + T + 1;
T ← RSh[AllOnes,1], Skip[R>=0];
T ← (LSh[R400,7]) or T;*Need this for (alpha+PCF) .ls. -2
PFetch4[PCB,IBuf];
PCF ← AllOnes;
PCB ← (PCB) + T;
AllOnes ← (Zero) - 1, GoTo[P6Tail];
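
%A Python model of the JBr/JBy arithmetic for @JB follows (an illustration,
not part of the microcode). It assumes Form-4[AllOnes] yields -4, which is
what the comment above JBy implies (alpha+PCF+1-4+1 = alpha+PCF-2).
jb_target and direct_target are hypothetical names. Python's >> on a
negative int is an arithmetic shift, which is what the LSh[R400,7] fix-up
appears to supply in the microcode.

```python
def jb_target(pcb, pcf, alpha):
    """Model of @JB -> JBr -> JBy; returns the new (PCB, PCF).

    pcb: word address of the current quadword (multiple of 4)
    pcf: byte index just past alpha, so the opcode is at byte pcf - 2
    alpha: operand byte 0..255, interpreted as a signed displacement
    """
    t = pcf + alpha + 1              # JBr: T <- PCFReg + T + 1
    if alpha & 0x80:                 # negative: offset PCB by 200b words (400b bytes)
        pcb -= 0o200
    allones = t - 3                  # JBy, assuming Form-4[AllOnes] = -4
    pcb += allones >> 1              # word offset, sign preserved
    return pcb & ~3, allones & 7     # P6Tail clears PCB[14:15]; PCF gets low 3 bits


def direct_target(pcb, pcf, alpha):
    """Reference: sign-extend alpha and add it to the opcode's byte address."""
    disp = alpha - 256 if alpha & 0x80 else alpha
    byte = 2 * pcb + (pcf - 2) + disp
    return (byte >> 1) & ~3, byte & 7
```

Checking jb_target against direct_target over all 256 alpha values is a
quick way to confirm that the 400b-byte PCB offset for a negative alpha is
equivalent to sign-extending alpha.
%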

*Jump Word
*alpha,,beta is a 2’s complement displacement.
*Timing: 32.5 cycles (+3 if displacement negative).
@JW:
LU ← CycleControl ← NextData[IBuf], Opcode[207]; *get alpha
T ← NextData[IBuf];
T ← (LHMask[Cycle&PCXF]) or T;*Alpha,,Beta
T ← (PCFreg) + T, GoTo[JBy];

PairComp:
T ← Stack&-1, UseCTask;
LU ← (LdF[Cycle&PCXF,0,4]) xor T, Return;

*Jump Equal Pair.
*Jump if the 1st nibble in alpha .eq. TOS, with displacement (from the opcode)
*equal to the 2nd nibble in alpha + 4, i.e., target = -2 + (4+PCF+pair right);
*pop stack once.
*Timing: 17.5 cycles (no jump), 12.25 cycles + JBr (jumps).
@JEP:
LU ← CycleControl ← CNextData[IBuf], Call[PairComp], Opcode[210];
T ← 4C, Skip[ALU=0];
LU ← NextInst[IBuf], CallX[P6Tailx];
T ← (LdF[Cycle&PCXF,4,4]) + T, GoTo[JBr];
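
%The pair operand of @JEP and @JNEP packs a comparison value and a short
displacement into one byte. A Python sketch follows, assuming the 1st nibble
is the high-order 4 bits of alpha (bit 0 is the most significant bit in
Xerox numbering). jep and jnep are hypothetical names.

```python
def jep(tos, alpha):
    """@JEP: displacement from the opcode if TOS equals the 1st nibble, else None."""
    value, disp = alpha >> 4, alpha & 0xF
    return disp + 4 if tos == value else None


def jnep(tos, alpha):
    """@JNEP: the same operand decoding with the opposite test."""
    value, disp = alpha >> 4, alpha & 0xF
    return disp + 4 if tos != value else None
```
%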

stkdif:
LU ← (Stack&-1) - T, Return;

*Jump Equal Byte.
*Jump with displacement alpha if TOS .eq. 2OS; pop stack twice.
*Timing: 10.25 + JBr (jumps), 17.5 cycles (no jump)
@JEB:
T ← Stack&-1, UseCTask, Call[stkdif], Opcode[211];
JEQBx:
DblGoTo[Ejmp,J2,ALU=0];

*Jump Equal Byte Byte.
*Jump with displacement beta if TOS .eq. alpha; pop stack once.
*Timing: 12.5 cycles + JBr (jumps), 19.75 cycles (no jump).
@JEBB:
T ← NextData[IBuf], Opcode[212];
LU ← (Stack&-1) xor T;
Skip[ALU=0];
SkipData, CallX[P6Tail];
T ← (NextData[IBuf]) - 1, CallX[JBr];

*Jump Not Equal Pair.
*Jump if the 1st nibble in alpha .ne. TOS, with displacement (from the opcode)
*equal to the 2nd nibble in alpha + 4, i.e., target = -2 + (4+PCF+pair right);
*pop stack once.
*Timing: 16.5 cycles (no jump), 12.25 cycles + JBr (jumps).
@JNEP:
LU ← CycleControl ← CNextData[IBuf], Call[PairComp], Opcode[213];
T ← 4C, Skip[ALU#0];
LU ← NextInst[IBuf], CallX[P6Tailx];
T ← (LdF[Cycle&PCXF,4,4]) + T, GoTo[JBr];

*Jump Not Equal Byte.
*Jump with displacement alpha if TOS .ne. 2OS; pop stack twice.
*Timing: 16.5 cycles (no jump), 11.25 cycles + JBr (jumps).
@JNEB:
T ← Stack&-1, UseCTask, Call[stkdif], Opcode[214];
JNEBx:
DblGoTo[JB,Enojmp,ALU#0];

*Jump Not Equal Byte Byte.
*Jump with displacement beta if alpha .ne. TOS; pop stack once.
*Timing: 18.75 cycles (no jump), 13.5 cycles + JBr (jumps).
@JNEBB:
T ← NextData[IBuf], Opcode[215];
LU ← (Stack&-1) xor T;
Skip[ALU#0];
SkipData, CallX[P6Tail];
T ← (NextData[IBuf]) - 1, CallX[JBr];

JLBpos:
DblGoTo[J2,Ejmp,Ovf’];*Even
JLBneg:
DblGoTo[JB,Enojmp,Ovf’];*Odd

*Jump Less Byte.
*Jump with displacement alpha if integer 2OS .ls. TOS; pop stack twice.
*Timing: 18.5 or 20.5 cycles (no jump), 13.25 cycles + JBr (jumps).
@JLB:
T ← Stack&-1, UseCTask, Call[stkdif], Opcode[216];
JLBx:
FreezeResult, DblGoTo[JLBpos,JLBneg,ALU>=0];

JGEBpos:
DblGoTo[JB,Enojmp,Ovf’];*Even
JGEBneg:
DblGoTo[J2,Ejmp,Ovf’];*Odd

*Jump Greater Equal Byte.
*Jump with displacement alpha if integer 2OS .ge. TOS; pop stack twice.
*Timing: 19.5 cycles (no jump), 12.25 or 13.25 cycles + JBr (jumps).
@JGEB:
T ← Stack&-1, UseCTask, Call[stkdif], Opcode[217];
JGEBx:
FreezeResult, DblGoTo[JGEBpos,JGEBneg,ALU>=0];

stksw:
T ← Stack&+1, Return;

*Jump Greater Byte.
*Jump with displacement alpha if integer 2OS .gr. TOS; pop stack twice.
*Timing: 20.5 or 22.5 cycles (no jump), 15.25 cycles + JBr (jumps).
@JGB:
Stack&-1, UseCTask, Call[stksw], Opcode[220];
LU ← (Stack&-2) - T, GoTo[JLBx];

*Jump Less Equal Byte.
*Jump with displacement alpha if integer 2OS .le. TOS; pop stack twice.
*Timing: 21.5 cycles (no jump), 14.25 or 15.25 cycles + JBr (jumps).
@JLEB:
Stack&-1, UseCTask, Call[stksw], Opcode[221];
LU ← (Stack&-2) - T, GoTo[JGEBx];

*Jump Unsigned Less Byte.
*Jump with displacement alpha if cardinal 2OS .ls. TOS; pop stack twice.
*Timing: 10.25 cycles + JBr (jumps), 17.5 cycles (no jump).
@JULB:
T ← Stack&-1, UseCTask, Call[stkdif], Opcode[222];
JULBx:
DblGoTo[J2,Ejmp,Carry];

*Jump Unsigned Greater Equal Byte.
*Jump with displacement alpha if cardinal 2OS .ge. TOS; pop stack twice.
*Timing: 11.25 cycles + JBr (jumps), 16.5 cycles (no jump).
@JUGEB:
T ← Stack&-1, UseCTask, Call[stkdif], Opcode[223];
JUGEBx:
DblGoTo[JB,Enojmp,Carry];

*Jump Unsigned Greater Byte.
*Jump with displacement alpha if cardinal 2OS .gr. TOS; pop stack twice.
*Timing: 12.25 cycles + JBr (jumps), 19.5 cycles (no jump).
@JUGB:
Stack&-1, UseCTask, Call[stksw], Opcode[224];
LU ← (Stack&-2) - T, GoTo[JULBx];

*Jump Unsigned Less Equal Byte.
*Jump with displacement alpha if cardinal 2OS .le. TOS; pop stack twice.
*Timing: 12.25 cycles + JBr (jumps), 18.5 cycles (no jump).
@JULEB:
Stack&-1, UseCTask, Call[stksw], Opcode[225];
LU ← (Stack&-2) - T, GoTo[JUGEBx];
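
%The conditional jumps above decide from the flags of a single 16-bit
subtraction, 2OS - TOS. Signed "less" is (result negative) XOR overflow,
which the ALU>=0 and Ovf' dispatch through JLBpos and JLBneg computes.
Unsigned "less" is the absence of carry, since the ALU carry is 1 when no
borrow occurs. A Python sketch with hypothetical names:

```python
MASK = 0xFFFF


def sub_flags(a, b):
    """Flags from the 16-bit ALU computing a - b (a = 2OS, b = TOS)."""
    diff = (a - b) & MASK
    carry = a >= b                              # no borrow
    ovf = bool((a ^ b) & (a ^ diff) & 0x8000)   # operand signs differ, result sign flipped
    neg = bool(diff & 0x8000)
    return diff, carry, ovf, neg


def jlb_jumps(a, b):
    """Integer 2OS .ls. TOS: sign of the result XOR overflow."""
    _, _, ovf, neg = sub_flags(a, b)
    return neg != ovf


def julb_jumps(a, b):
    """Cardinal 2OS .ls. TOS: carry clear, i.e., a borrow occurred."""
    return not sub_flags(a, b)[1]
```

JGB and JLEB reuse the same tests with the operands swapped (TOS - 2OS), as
the microcode does via stksw.
%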

*Jump Zero 3.
*Jump with displacement 3 if TOS .eq. 0; pop stack once.
*Timing: 16.5 cycles (jumps), 11.25 cycles (no jump).
@JZ3:
LU ← Stack&-1, Opcode[226];
Skip[ALU=0];
LU ← NextInst[IBuf], CallX[P6Tailx];
J3:
SkipData, CallX[J2];

*Jump Zero 4.
*Jump with displacement 4 if TOS .eq. 0; pop stack once.
*Timing: 20.75 cycles (jumps), 11.25 cycles (no jump).
@JZ4:
LU ← Stack&-1, Opcode[227];
Skip[ALU=0];
LU ← NextInst[IBuf], CallX[P6Tailx];
SkipData, CallX[J3];

*Jump Zero Byte.
*Jump with displacement alpha if TOS .eq. 0; pop stack once.
*Timing: 8.25 + JBr (jumps), 15.5 cycles (no jump)
@JZB:
LU ← Stack&-1, GoTo[JEQBx], Opcode[230];

*Jump Not Zero 3.
*Jump with displacement 3 if TOS .ne. 0; pop stack once.
*Timing: 17.5 cycles (jumps), 10.25 cycles (no jump).
@JNZ3:
LU ← Stack&-1, Opcode[231];
Skip[ALU#0];
LU ← NextInst[IBuf], CallX[P6Tailx];
SkipData, CallX[J2];

*Jump Not Zero 4.
*Jump with displacement 4 if TOS .ne. 0; pop stack once.
*Timing: 21.75 cycles (jumps), 10.25 cycles (no jump).
@JNZ4:
LU ← Stack&-1, Opcode[232];
Skip[ALU#0];
LU ← NextInst[IBuf], CallX[P6Tailx];
SkipData, CallX[J3];

*Jump Not Zero Byte.
*Jump with displacement alpha if TOS .ne. 0; pop stack once.
*Timing: 14.5 cycles (no jump), 9.25 cycles + JBr (jumps).
@JNZB:
LU ← Stack&-1, GoTo[JNEBx], Opcode[233];


CODEToRTemp:
PFetch1[CODE,RTemp];
T ← PCFReg, Return;

P6PopComp:
T ← Stack&-1, UseCTask;
LU ← (Stack) - T, Return;

*Jump Indexed Byte.
*Alpha,,beta is a CODE-relative pointer to an array of bytes; 2OS is a byte
*index to the array (even bytes in bits 0:7 of a word, odd bytes in bits
*8:15) and TOS is a limit. If index .ls. limit (unsigned), jump with the
*unsigned displacement fetched from the byte array; pop stack twice.
*Timing: 21.75 cycles (no jump), 32.5 cycles + JBy (jumps).
@JIB:
LU ← CycleControl ← CNextData[IBuf], Call[P6PopComp], Opcode[234];
T ← RSh[Stack,1], Skip[Carry’];
SkipData, CallX[P6Pop];*Exit to next opcode
T ← (NextData[IBuf]) + T;*Add beta
T ← (LHMask[Cycle&PCXF]) + T, Call[CODEToRTemp];
Stack&-1, Skip[R Odd];
T ← (LdF[RTemp,0,10]) + T, GoTo[JBy];
T ← (RHMask[RTemp]) + T, GoTo[JBy];

*Jump Indexed Word.
*Alpha,,beta is a CODE-relative pointer to an array of words indexed by 2OS;
*if 2OS .ls. TOS (unsigned), carry out a PC-relative jump using the signed
*displacement from the array; pop stack twice.
*Timing: 21.75 cycles (no jump), 32.5 cycles + JBy (jumps).
@JIW:
LU ← CycleControl ← CNextData[IBuf], Call[P6PopComp], Opcode[235];
T ← Stack&-1, Skip[Carry’];
SkipData, CallX[P6Tail];*Flush beta and exit
T ← (NextData[IBuf]) + T;*add beta
T ← (LHMask[Cycle&PCXF]) + T, Call[CODEToRTemp];
T ← (RTemp) + T, GoTo[JBy];
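
%A Python sketch of the @JIB and @JIW table lookups follows, reading the
microcode as: the limit is popped from TOS, the index is 2OS, and the lookup
happens only if index .ls. limit as cardinals. The CODE segment is modeled
as a list of 16-bit words, and all names here are hypothetical. The returned
displacement is relative to the opcode, because JBy subtracts 3 from
PCF+displacement.

```python
def jib_displacement(code, table, index, limit):
    """@JIB: unsigned byte displacement, or None when no jump is taken."""
    if not index < limit:                    # cardinal compare fails: no jump
        return None
    word = code[table + (index >> 1)]
    # Even bytes are in bits 0:7 (high half), odd bytes in bits 8:15.
    return word >> 8 if (index & 1) == 0 else word & 0xFF


def jiw_displacement(code, table, index, limit):
    """@JIW: signed word displacement, or None when no jump is taken."""
    if not index < limit:
        return None
    word = code[table + index]
    return word - 0x10000 if word & 0x8000 else word
```
%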

*Recover. Timing: 6.25 cycles.
@REC:
LU ← NextInst[IBuf], Opcode[236];
Stack&+1, NIRet;

*Recover Two. Timing: 6.25 cycles.
@REC2:
LU ← NextInst[IBuf], Opcode[237];
Stack&+2, NIRet;

*Discard. Timing: 6.25 cycles.
@DIS:
LU ← NextInst[IBuf], Opcode[240];
P6PopTailx:
Stack&-1, NIRet;

*Discard Two. Timing: 6.25 cycles.
@DIS2:
LU ← NextInst[IBuf], Opcode[241];
Stack&-2, NIRet;

*Exchange. Timing: 14.25 cycles.
@EXCH:
T ← Stack&-1, Opcode[242];
*The preceding Stack&-1 has interlocked any PFetch to the stack.
*Want to do MNBR ← Stack, Stack ← T, NoRegILockOK; here instead of
*the next two mi, but that won’t interlock a PStore1 at Stack properly.
LU ← MNBR ← Stack&-1;
Stack&+1 ← T;
T ← MNBR, GoTo[P6PushT];

ExchST:
MNBR ← Stack, Stack ← T, NoRegILockOK;
T ← MNBR, Return;

*Double Exchange. Timing: 24.25 cycles.
*See interlocking comments for EXCH.
@DEXCH:
LU ← Stack&-3, Opcode[243];
T ← Stack&+2, Call[ExchST];*Exch(StkP-3,StkP-1)
Stack&-2 ← T;
Stack&+3;*Exch(StkP,StkP-2)
T ← Stack&-2, Call[ExchST];
SkipPushT:
LU ← NextInst[IBuf];
SkipPushTx:
Stack&+2 ← T, NIRet;

*Duplicate. Timing: 8.25 cycles.
*Stack&+1 ← Stack&+1 won’t interlock odd PFetch2.
@DUP:
T ← Stack&-1, GoTo[SkipPushT], Opcode[244];

*Double Duplicate. Timing: 10.25 cycles.
*Stack&+2 ← Stack&+2 won’t interlock odd PFetch2.
@DDUP:
T ← Stack&-1, Opcode[245];
Stack&+2 ← Stack&+2, GoTo[P6PushT];

*Exchange Discard. Timing: 8.25 cycles.
*Stack&-1 ← Stack&-1 does not interlock a PStore2 at StkP-3/StkP-2.
@EXDIS:
T ← Stack&-2, GoTo[P6PushT], Opcode[246];

*Negate. Timing: 10.25 cycles.
@NEG:
T ← Stack&-1, Opcode[247];
T ← (Zero) - T, GoTo[P6PushT];

*Increment. Timing: 8.25 cycles.
@INC:
T ← (Stack&-1) + 1, GoTo[P6PushT], Opcode[250];

*Decrement. Timing: 8.25 cycles.
@DEC:
T ← (Stack&-1) - 1, GoTo[P6PushT], Opcode[251];

P6PS2Safety:
*This mi wasted for PStore2 safety (?ugh?)
Stack&+1, LU ← T, Return;

*Double Increment. Timing: 14.25 cycles.
@DINC:
T ← Stack&-2, Call[P6PS2Safety], Opcode[252];
Stack ← (Stack) + 1;
T ← (RZero) + T, UseCOutAsCIn, GoTo[P6PushT];

*Double. Timing: 8.25 cycles.
@DBL:
T ← LSh[Stack&-1,1], Opcode[253];
P6PushT:
LU ← NextInst[IBuf];
Stack&+1 ← T, NIRet;

*Double Double. Timing: 14.25 cycles.
@DDBL:
T ← LSh[Stack&-2,1], Call[P6PS2Safety], Opcode[254];
T ← (RSh[Stack,17]) + T;
Stack ← LSh[Stack,1], GoTo[P6PushT];

*Triple. Timing: 10.25 cycles.
@TRPL:
T ← LSh[Stack&-1,1], Opcode[255];
Stack&+1, GoTo[Addx];

*And. Timing: 8.25 cycles.
@AND:
T ← Stack&-1, Opcode[256];
LU ← NextInst[IBuf];
Stack ← (Stack) and T, NIRet;

*Ior. Timing: 8.25 cycles.
@IOR:
T ← Stack&-1, Opcode[257];
LU ← NextInst[IBuf];
Stack ← (Stack) or T, NIRet;

*Add Signed Byte. Timing: 14.5 cycles pos., 17.5 cycles neg.
@ADDSB:
T ← NextData[IBuf], Opcode[260];
RTemp ← T, Skip[H2Bit8’];
T ← (RTemp) + (177400C);
*This mi wasted to interlock immediately preceding unaligned PFetch2[x,Stack].
T ← (Stack&-1) + T, GoTo[P6PushT];

*Add. Timing: 8.25 cycles.
@ADD:
T ← Stack&-1, Opcode[261];
Addx:
LU ← NextInst[IBuf];
Stack ← (Stack) + T, NIRet;

*Subtract. Timing: 8.25 cycles.
@SUB:
T ← Stack&-1, Opcode[262];
Subx:
LU ← NextInst[IBuf];
Stack ← (Stack) - T, NIRet;

GetTDecStk2:
T ← Stack&-2, Return; *grab it, point to lsb of second doubleword

*Double Add. Timing: 16.25 or 17.25 cycles.
@DADD:
MNBR ← Stack&-1, Call[GetTDecStk2], Opcode[263]; *point to lsb of top doubleword
Stack ← (Stack) + T;*add low bits
Stack&+1, Skip[Carry];
T ← MNBR, GoTo[Addx];*pick up high bits of top doubleword
T ← (MNBR) + 1, GoTo[Addx];*pick up high bits of top doubleword

*Double Subtract. Timing: 16.25 or 17.25 cycles.
@DSUB:
MNBR ← Stack&-1, Call[GetTDecStk2], Opcode[264]; *point to lsb of top doubleword
Stack ← (Stack) - T; * subtract low bits
Stack&+1, Skip[Carry’]; *point to msb of second doubleword
T ← MNBR, GoTo[Subx]; *remember msb of top doubleword (TOS)
T ← (MNBR) + 1, GoTo[Subx];
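
%@DADD and @DSUB work a word at a time. The low words are combined first,
and the carry (or borrow) is folded into the high word of the top
doubleword, which is held in MNBR. A Python sketch with hypothetical names,
where (hi1, lo1) is the top doubleword TOS,,2OS and (hi2, lo2) is the one
below it, 3OS,,4OS:

```python
MASK = 0xFFFF


def dadd(hi2, lo2, hi1, lo1):
    """@DADD: (hi2,,lo2) + (hi1,,lo1); returns the result as (hi, lo)."""
    lo = lo2 + lo1
    carry = int(lo > MASK)                 # Skip[Carry] picks MNBR or MNBR+1
    return (hi2 + hi1 + carry) & MASK, lo & MASK


def dsub(hi2, lo2, hi1, lo1):
    """@DSUB: (hi2,,lo2) - (hi1,,lo1); a borrow adds 1 to the subtrahend."""
    lo = lo2 - lo1
    borrow = int(lo < 0)                   # ALU carry clear on the low words
    return (hi2 - (hi1 + borrow)) & MASK, lo & MASK
```
%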

*Add Double to Cardinal. Timing: 16.25 cycles.
*Cardinal at TOS, Double at 2OS,,3OS.
@ADC:
T ← Stack&-3, Call[P6PS2Safety], Opcode[265];
Stack ← (Stack) + T;
Stack&+1, FreezeResult;
Stack ← (Stack) + 1, UseCOutAsCIn, GoTo[P6Tail];

*Add Cardinal to Double. Timing: 16.25 or 17.25 cycles.
*Double at TOS,,2OS, Cardinal at 3OS.
@ACD:
LU ← Stack&-2, Opcode[266];
T ← Stack&+1;
Stack&-1 ← (Stack&-1) + T;
Stack&+2, Skip[Carry];
***Could save 2 cycles by expending 2 mi here.
Stack&-1 ← Stack&-1, GoTo[P6Tail];
Stack&-1 ← (Stack&-1) + 1, GoTo[P6Tail];

*Add Local 0 to Immediate Byte.
:IF[CacheLocals]; ************************************
*Timing: 12.5 cycles.
@AL0IB:
T ← NextData[IBuf], Opcode[267];
T ← (LocalCache0) + T, GoTo[P6PushT];
:ELSE; ***********************************************
*Timing: 20.5 cycles.
@AL0IB:
PFetch1[LOCAL,Stack,0], Opcode[267];
T ← NextData[IBuf], CallX[Addx];
:ENDIF; **********************************************

%Multiply--high half of 32-bit product is left above the top of the Stack
multiplier in RTemp (from the argument at TOS)
multiplicand in T (from the argument at TOS-1)
product low in Stack, hi in RTemp1

The first loop flushes leading 0’s in the multiplier with timing 2 cycles/0;
The second loop processes 0’s in 6 cycles and 1’s in 10 or 11 cycles.
Note how a low-order 1 in the multiplier serves as an end flag.
Timing: 18.25 cycles if multiplier is 0, else
58.25 cycles if multiplier is 1, else
(16.25 to 19.25) + 2*LZ + (16-LZ)*6 + (4 or 5)*(NOnes) cycles =
(48.25 to 51.25) + (16-LZ)*4 + (4 or 5)*NOnes cycles.

NOTE: For random numbers, this algorithm averages about 23 cycles faster than
the one on the next page. However, when the multiplier has many leading +
trailing zeroes, it is worse than the other. For products less than 16d bits,
this algorithm is 28 cycles slower for a multiplier with a single 1 bit but
gains 2 cycles for each additional 1 bit in the multiplier. For products
greater than 16d bits, it is 23 cycles slower for a multiplier with a single
1 bit and gains 5 cycles for each additional 1 bit in the multiplier after the
product has exceeded 16 bits. Since 87 percent of all multiplies are
preceded by small constant pushes, the other algorithm probably averages
faster than this one, but this one is 6b mi smaller, so we use it.
%
@MUL:
RTemp1 ← T ← 30C, Call[MulSU], Opcode[270];
*2nd loop shifts the product RTemp1/Stack left 1 and conditionally adds the
*multiplicand T based upon sign of the multiplier RTemp, which is left-shifted
*until the right-most 1 bit is seen.
RTemp ← (RTemp) SALUFOP T, GoTo[Mul1,R<0];
Mul0:
Stack ← (Stack) SALUFOP T;
RTemp1 ← (RTemp1) SALUFOP T, UseCOutAsCIn, Return;
Mul1:
RTemp1 ← (RTemp1) SALUFOP T, UseCOutAsCIn, GoTo[MulLast,ALU=0];
Stack ← (LSh[Stack,1]) + T, Skip[R<0];
RTemp1 ← (RTemp1) - 1, UseCOutAsCIn, Return;
RTemp1 ← (RTemp1) + 1, UseCOutAsCIn, Return;

*Force the low bit of multiplier RTemp to 1 for the end test.
*Initialize the high product word (RTemp1) to 0; low product word (TOS-1)
*already contains the multiplicand, so we don’t zero it and add the
*multiplicand on the 1st multiplier 1 (but we have to test multiplier for 0).
MulSU:
RTemp1 ← (RTemp1) - (SALUF ← T);*SALUF = 30b is LU ← 2A
T ← (Stack&-1) SALUFOP T, Skip[R>=0];*Multiplier*2 from TOS
RTemp ← (Zero) + T + 1, GoTo[MulSUX];*Multiplier .ls. 0
RTemp ← (Zero) + T + 1, Skip[ALU#0];*Multiplier .ge. 0
T ← Stack ← 0C, GoTo[mdPush];*Multiplier .eq. 0
*One mi loop shifts off leading 0’s in multiplier.
RTemp ← (RTemp) SALUFOP T, Skip[R<0];
RTemp ← (RTemp) SALUFOP T, GoTo[.,R>=0];
MulSUX:
T ← Stack, Return;*Multiplicand from TOS-1

MulLast:
T ← RSh[RTemp1,1], GoTo[mdPush,Carry’];
T ← (LSh[AllOnes,17]) or T, GoTo[MdPush];

mdPush:
Stack&+1 ← T;*Even
P6Pop:
LU ← NextInst[IBuf], CallX[P6PopTailx];
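
%The end-flag idea in the multiply above can be modeled in Python as below.
This is a sketch, not a transcription. It keeps the multiplier in a 17-bit
variable rather than using the 16-bit RTemp and sign tests, and mul_sentinel
is a hypothetical name. As in MulSU, a 1 is appended below the multiplier
and leading 0's are discarded. The leading 1 supplies the initial product
(the multiplicand already in the low product word). The remaining bits are
then processed most significant first until only the appended 1 is left.

```python
def mul_sentinel(multiplicand, multiplier):
    """16 x 16 -> 32-bit unsigned multiply, MSB first, with a low-order end flag."""
    if multiplier == 0:
        return 0
    reg = ((multiplier << 1) | 1) & 0x1FFFF    # multiplier followed by the end flag
    while not (reg & 0x10000):                 # shift off leading 0's
        reg = (reg << 1) & 0x1FFFF
    reg = (reg << 1) & 0x1FFFF                 # consume the leading 1 ...
    product = multiplicand                     # ... which contributes the multiplicand
    while reg != 0x10000:                      # stop when only the end flag remains
        bit = reg >> 16
        reg = (reg << 1) & 0x1FFFF
        product = (product << 1) + (multiplicand if bit else 0)
    return product
```
%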

%Multiply--high half of 32-bit product is left above the top of the Stack
product low in Stack, hi in RTemp1
multiplicand low in xBuf, hi in xBuf1
multiplier in RTemp
The first loop runs until the multiplicand being left-shifted 1 each step
overflows into the high word. It has timing of 4 cycles on 0’s, 10 on 1’s;
this loop doesn’t task on 0’s (worst case without tasking ~ 70 cycles on a
multiplier of 100000b and multiplicand of 1). Note that the end test need be
made only when processing a multiplier 1. The second loop runs until the
last multiplier 1 is processed with timing of 6 cycles on 0’s, 14 on 1’s.

Timing: 20.25 cycles if the multiplier is 0, else
30.25 cycles if the multiplier is 1, else
20.25 + (1 if product .ge. 2↑16)
+ (4/multiplier 0) + (10/1) cycles while product .ls. 2↑16
+ (6/multiplier 0) + (14/1) cycles while product .ge. 2↑16
where the zeroes are those between the leftmost and rightmost ones in the
multiplier.

PopToT: T ← Stack&-1, FreezeResult, Return;

@MUL:
RTemp1 ← T ← 30C, Opcode[270];*SALUF = 30b is LU ← 2A
RTemp1 ← (RTemp1) - (SALUF ← T), Call[PopToT];*RTemp1 ← 0
RTemp ← T, UseCTask, Call[PopToT];
Stack&+1 ← 0C, Skip[ALU#0];*tests RTemp ← T
T ← Stack&+1 ← 0C, GoTo[P6Pop];
xBuf ← T, Call[.+1];
*1st loop
RTemp ← RSh[RTemp,1], GoTo[MulZ,R Even];
MulO:
Stack ← (Stack) + T, Skip[ALU#0];
T ← (RTemp1) + 1, UseCoutAsCin, GoTo[mdPush];
T ← xBuf ← (xBuf) SALUFOP T, FreezeResult, Skip[R<0];
RTemp1 ← (RTemp1) + 1, UseCOutAsCIn, Return;
RTemp1 ← (RTemp1) + 1, UseCOutAsCIn, GoTo[MulL];

MulZ:
T ← xBuf ← (xBuf) SALUFOP T, Skip[R<0];
*Must replicate the mi at MulO-1 because the opcode dispatch locations are
*only four apart on this page.
RTemp ← RSh[RTemp,1], DblGoTo[MulO,MulZ,R Odd];
MulL:
xBuf1 ← 1C, Call[.+1];
*2nd loop
RTemp ← RSh[RTemp,1], GoTo[MulLZ,R Even];
MulLO:
Stack ← (Stack) + T, GoTo[.+3,ALU#0];
T ← xBuf1, FreezeResult;
T ← RTemp1 ← (RTemp1) + T + 1, UseCOutAsCIn, GoTo[mdPush];
T ← xBuf1, FreezeResult;
RTemp1 ← (RTemp1) + T + 1, UseCOutAsCIn;
MulLZ:
T ← xBuf ← (xBuf) SALUFOP T;*Double the multiplicand
xBuf1 ← (xBuf1) SALUFOP T, UseCOutAsCIn, Return;
%
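
%For comparison, the LSB-first method of the alternative MUL above can be
sketched in Python as follows (mul_lsb_first is a hypothetical name). The
microcode splits the work into two loops: one while the doubled multiplicand
still fits in xBuf, and one after it spills into xBuf1. The sketch uses
Python integers, so it needs only one loop, ending as soon as no multiplier
1's remain.

```python
def mul_lsb_first(multiplicand, multiplier):
    """16 x 16 -> 32-bit unsigned multiply, least significant multiplier bit first."""
    product = 0
    addend = multiplicand        # plays the role of xBuf1,,xBuf, doubled each step
    while multiplier:            # early exit once the remaining multiplier is 0
        if multiplier & 1:
            product += addend
        multiplier >>= 1
        addend <<= 1
    return product & 0xFFFFFFFF
```
%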

*Double Compare (signed).
*If 3OS,,4OS < TOS,,2OS, push -1; timing 19.25 cycles.
*If 3OS,,4OS = TOS,,2OS, push 0; timing 16.25 cycles.
*If 3OS,,4OS > TOS,,2OS, push 1; timing 18.25 cycles.
*Add 4 cycles if the high order words are equal and low order words unequal.
@DCMP:
T ← (Stack&-2) + (100000C), Opcode[271];
Stack ← (Stack) + (100000C), GoTo[DCMPy];

*Unsigned Double Compare
@UDCMP:
T ← Stack&-2, Opcode[272];
*Compare msb’s, point at lsb of high doubleword
*grab lsb of top doubleword, point at lsb of second doubleword
DCMPy:
LU ← (Stack&+1) - T;
T ← Stack&-2, FreezeResult, Skip[ALU=0];
Stack ← (Zero) + 1, DblGoTo[DUCompL,DUCompG,Carry’];
Stack ← (Stack) - T;*compare low words
FreezeResult, Skip[ALU#0];
LU ← NextInst[IBuf], Call[P6Tailx];
Stack ← (Zero) + 1, DblGoTo[DUCompL,DUCompG,Carry’];

DUCompL:
Stack ← (Stack) or not (0C), GoTo[P6Tail];
DUCompG:
LU ← NextInst[IBuf], Call[P6Tailx];
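
%@DCMP reuses the unsigned compare by adding 100000b to both high words,
which maps signed order onto unsigned order. A Python sketch follows (dcmp is
a hypothetical name), where a = 3OS,,4OS and b = TOS,,2OS:

```python
def dcmp(hi_a, lo_a, hi_b, lo_b, signed=True):
    """Return -1, 0, or 1 comparing a = hi_a,,lo_a with b = hi_b,,lo_b."""
    if signed:                       # @DCMP: add 100000b to both high words
        hi_a = (hi_a + 0x8000) & 0xFFFF
        hi_b = (hi_b + 0x8000) & 0xFFFF
    a = (hi_a << 16) | lo_a          # an unsigned compare now gives signed order
    b = (hi_b << 16) | lo_b
    return (a > b) - (a < b)
```
%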

*ESC and ESCL (opcodes 273-274) are in MesaESC.Mc

*Lengthen Pointer.
*Timing: 13.25 cycles non-NIL, 10.25 cycles NIL.
@LP:
T ← Stack&-1, Opcode[275];*Pop to interlock PFetch2’s
Skip[ALU=0];*Test for NIL
T ← RSh[MDShi,10];*push MDShi if non-NIL
LU ← NextInst[IBuf], CallX[SkipPushTx];

P6Undef:
LoadPage[opPage0];
RTemp ← sOpcodeTrap, GoToP[SDTrap];

@OP276:
TrapParm ← 276C, GoTo[P6Undef], Opcode[276];
@OP277:
TrapParm ← 277C, GoTo[P6Undef], Opcode[277];

:END[MesaOP2];