:IF[AltoMode];
TITLE[MesaJ.Alto.Mode];
:ELSEIF[CedarMode];
TITLE[MesaJ.Cedar.Mode];
:ELSE;
TITLE[MesaJ.Pilot.Mode];
:ENDIF;
%
Edit by Fiala 1 February 1982: Change MesaRefill to avoid race against
page fault; avoid bypass kludge after various PFetch4’s which cause
trouble when error correction transport advances page faults.
Edit by Fiala 12 June 1981: Fixed @JB and JB problem analogous to J2/J3;
faster MUL and DIV algorithms; MesaRefillLoc.
Edit by Fiala 23 April 1981: save 1 mi in MesaRefill.
Edit by Neely April 2, 1981 5:57 PM Create a J2: and J3: separate from @J2:
and @J3: because can not call or goto an opcode because of possible page
faults in the instruction stream.
Edit by Fiala 30 March 1981 Fix harmless bug in JW saving 3 mi.
Edit by Fiala 5 March 1981: Fix JB 0 and JB 1; TrapParm←0 at UnImpP6.
Edit by Fiala February 12, 1981 9:55 AM Fix UnImpP6 for Pilot; implement
Cedar undefined opcode trap change; tiny improvement to DIV.
Edit by Jim Frandeen January 5, 1981 4:33 PM: Delete CallX Macro definition
for new D0Lang.
Edit by Johnsson September 29, 1980 8:46 AM; trap fixup
Edit by Fiala August 25, 1980 6:31 PM Incorporate Alto Mesa improvements
since September 1979: bum out 25b mi net; speed up AND, OR, XOR, Jx, JEQx,
JNEx, JIB, JIW, DIV, LDIV, SHIFT, DCOMP, and DUCOMP; remove range
restrictions on two byte jumps (-128 to +127 now legal).
Edit by Johnsson July 28, 1980 5:38 PM: trap fixup
Edit by Jim Frandeen January 22, 1980 11:24 AM: cleanup for D0Lang Version 6.
Edit by Jim Frandeen December 22, 1979 9:43 PM: cleanup for D0Lang Version 5.
MESA JUMP AND ARITHMETIC INSTRUCTIONS
PCB,,PCBhi is a base register pointing to the current instruction quadword.
PCB[14:15] are 0, and the low 3 bits of the PC (which point to a byte within
the quadword) are kept in PCF. Since code segments cannot cross 64K
boundaries and are limited to 32K words in length, the two bytes of PCBhi
are forced to be equal, rather than having the least significant byte differ
from the most significant byte by 1, as is the normal case for base registers.
Refill occurs when, at the onset of a NextInst or NextData, PCF contains a
value greater than 7b. In this case the mi is aborted and the trap mi at
location 0, ’LoadPage[0], GoTo[377]’, is executed, sending control to
location 377 on the page that caused refill. MesaRefill6 below is the mi
for page 6; identical mi exist on other pages from which refill might occur.
Refill timing is as follows: The aborted mi, trap mi at 0, and PFetch4 use
6 cycles; in Pilot mode, memory wait uses 12 more, totalling 18 cycles;
in Alto mode, memory wait and byte swapping add 20, totalling 26 cycles.
This time is distributed over the 8 bytecodes in a quadword; any additional
refills caused by jumps are charged to the jump bytecodes themselves. Hence,
Pilot mode refill accounts for 2.25 and Alto mode for 3.25 cycles/non-jump
bytecode.
%
*NOTE: MicroD almost can’t place page 6 due to shortage of branch condition
*target pairs. This occurs as follows: of the 128 branch target pairs,
*opcode entries use 64; opcode entries which do a call use 13 more, leaving
*only 51 for everything else, which is barely enough.
MesaRefill:*Called from many places
PCF ← RZero, At[MesaRefillLoc];
:IF[AltoMode]; ****************************************
PCB ← (PCB) + (4C);
SwapBytes:*Called from MesaX
IBuf ← LCy[IBuf,10];
SwapBytes1:*Called from MesaX
IBuf1 ← LCy[IBuf1,10];
IBuf2 ← LCy[IBuf2,10];
IBuf3 ← LCy[IBuf3,10], Return;
:ELSE; ************************************************
*Hold page fault on page 0 and do not advance PCB until a fault is impossible;
*otherwise, there is a race between the fault and the PCB←PCB+4; this race
*is particularly bad if error correction happens.
Nop;
Nop;
PCB ← (PCB) + (4C), Return;
%xBuf ← T;
Nop;
UseCTask;
T ← APCTask&APC, LoadPage[opPage2];
xBuf1 ← T;
OnPage[opPage2];
T ← (SStkP&NStkP) xor (377C), Task;
xBuf2 ← IP[FFault]C;
StkP ← xBuf2, xBuf2 ← T, NoRegILockOK, Call[.+1];
LU ← (Stack) and (40000C);
T ← xBuf, Skip[ALU=0];
Return;
StkP ← xBuf2;
APCTask&APC ← xBuf1;
PCB ← (PCB) + (4C), Return;
%
:ENDIF; ***********************************************
MesaRefill6:
PFetch4[PCB,IBuf,4], GoToP[MesaRefill], At[3377]; *Page 6 refill
%NOTE: if any PFetch4 on page 6 faults, the fault handler fills IBuf with
377b bytes and then resumes; this means that the mi after the PFetch4 should
not use the bypass kludge and must remain on page 6. If the bypass kludge
were used, the transport for a preceding PFetch4, such as the one in RDC.Mc,
could experience error correction and advance the time of a page fault so
that the mi containing the bypass kludge was aborted; that mi would then
execute incorrectly.
%
JnComO:PFetch4[PCB,IBuf];
JnComO1:
Nop;
T ← PCB ← (PCB) + T;
PCB ← (LSh[PCB,1]) + 1;
JnFin:PCF ← PCB, PCB ← T, NoRegILockOK;*Only low 3 bits of PCF loaded
:IF[AltoMode]; ****************************************
JSwapx:IBuf ← LCy[IBuf,10];
IBuf1 ← LCy[IBuf1,10];
IBuf2 ← LCy[IBuf2,10];
IBuf3 ← LCy[IBuf3,10];
P6Tail:LU ← NextInst[IBuf];*Odd; paired with JJmp
:ELSE; ************************************************
Nop;**NextInst is illegal in the mi after PCF←.
JSwapx:
P6Tail:LU ← NextInst[IBuf];*Odd; paired with JJmp
:ENDIF; ***********************************************
P6Tailx:
PCB ← (PCB) and not (3C), NIRet;
JnComZ:PFetch4[PCB,IBuf];
Nop;
T ← PCB ← (PCB) + T;
PCB ← LSh[PCB,1], GoTo[JnFin];
JJmp:T ← (PCF.word) + T, DblGoTo[JnComO,JnComZ,R Odd];*Even; paired with P6Tail
JnEe:T ← (PCF.word) + T, GoTo[JnComO];
JnEo:T ← (PCF.word) + T + 1, GoTo[JnComZ];
%Jn, n=2-9. PCF points to the byte beyond the opcode when execution starts
(i.e., 1<=PCF<=10b), so if PCF is odd, the opcode is the even byte of the
current word, else the odd byte of the previous word. PCX is loaded from PCF
at T2 of the first mi executed. The word displacement of the target from
PCF[0:2] and the final lsb of the PC are:
 n    PCF even    PCF odd
 2      0,1         1,0
 3      1,0         1,1
 4      1,1         2,0
 5      2,0         2,1
 6      2,1         3,0
 7      3,0         3,1
 8      3,1         4,0
 9      4,0         4,1
%
*Note: cannot use @J2, @J3, or @JB because, after page faults,
*PCF and PCX are wrong for continuation.
J2:SkipData, CallX[P6Tail];
J3:SkipData, CallX[J2];
JB:T ← NextData[IBuf], CallX[JBr];*Paired with Enojmp
@J2:SkipData, CallX[P6Tail], Opcode[200];*Time = 6+(18*(2/8)) = 10.50
@J3:SkipData, CallX[J2], Opcode[201];*Time = 8+(18*(3/8)) = 14.75
@J4:SkipData, CallX[J3], Opcode[202];*Time =10+(18*(4/8)) = 19.00
@J5:T ← 2C, GoTo[JJmp], Opcode[203];*Time =22 to 23
@J6:T ← 2C, Cycle&PCXF, DblGoTo[JnEo,JnEe,R Odd], Opcode[204];
@J7:T ← 3C, GoTo[JJmp], Opcode[205];
@J8:T ← 3C, Cycle&PCXF, DblGoTo[JnEo,JnEe,R Odd], Opcode[206];
@J9:PCB ← (PCB) + (4C), Opcode[207];**NOT PFetch4 (bypass kludge)
PFetch4[PCB,IBuf,0], GoTo[JSwapx];*PCF unchanged (Time = 21.50)
%Jump Byte: alpha is a signed displacement from the opcode.
AllOnes is used as a temporary, restored when done. Note that
RH[PCBhi] = LH[PCBhi] since code can’t cross 64k boundary.
%
@JB:T ← NextData[IBuf], Opcode[210];
JBr:T ← (PCFReg) + T, Skip[H2Bit8'];
PCB ← (PCB) - (200C);*Offset by 400b bytes if negative displacement
:IF[AltoMode]; ****************************************
*T has displacement relative to LAST byte of opcode.
**JB 0 is a problem here with PCF .eq. 0.
JBx:AllOnes ← (AllOnes) + T;
:ELSE; ************************************************
*T has signed BYTE displacement relative to FIRST byte of opcode.
**JB 0 with PCF .eq. 0 or 1 and JB 1 with PCF .eq. 0 are problems.
AllOnes ← (Form-2[AllOnes]) + T;
:ENDIF; ***********************************************
JBy:T ← RSh[AllOnes,1], Skip[R>=0];
T ← (LSh[R400,7]) or T;
*Fault handler fills IBuf with 377b bytes and resumes if this PFetch4 faults.
PFetch4[PCB,IBuf];
PCF ← AllOnes;
PCB ← (PCB) + T;
AllOnes ← (Zero) - 1, GoTo[JSwapx];
*Jump Word: alpha,,beta is a 2’s complement displacement.
:IF[AltoMode]; ****************************************
@JW:Cycle&PCXF, Skip[R Even], Opcode[211];
CSkipData;
T ← NextData[IBuf];*get beta
LU ← CycleControl ← NextData[IBuf];*get alpha
T ← (LHMask[Cycle&PCXF]) or T;*CycleControl is in bits 0:7
*T has 2’s complement displacement relative to LAST byte of the opcode
JWx:T ← (PCFReg) + T, GoTo[JBx];
:ELSE; ************************************************
@JW:LU ← CycleControl ← NextData[IBuf], Opcode[211]; *get alpha
T ← NextData[IBuf];*get beta
T ← (LHMask[Cycle&PCXF]) or T;*CycleControl is in bits 0:7
*T has 2’s complement displacement relative to the FIRST byte of the opcode
JWx:T ← (PCFReg) + T;
AllOnes ← (Form-4[AllOnes]) + T + 1, GoTo[JBy];*-4+T+1 = T-3
:ENDIF; ***********************************************
stkdif:LU ← (Stack&-1) - T, Return;
Ejmp:T ← NextData[IBuf], CallX[JBr];*Even; paired with J2
Enojmp:SkipData, CallX[P6Tail];*Even; paired with JB
JNEjO:T ← (Zero) + T + 1, GoTo[JnComZ];*Odd
JNEjE:PFetch4[PCB,IBuf], GoTo[JnComO1];*Even
*APC[13:14] contains 1, 2, or 3 for JEQ2, JEQ4, or JEQ6, respectively,
*as left by the last NIRet.
JEQev:LU ← (Stack&-1) - T, UseCTask;
T ← (LdF[APCTask&APC,13,2]) - 1, Skip[ALU=0];
P6Taila:LU ← NextInst[IBuf], Call[P6Tailx];*Odd
JJmpa:T ← (PCF.word) + T, DblGoTo[JNEjO,JNEjE,R Odd];*Even
JEQod:LU ← (Stack&-1) - T, UseCTask;
T ← LdF[APCTask&APC,13,2], DblGoTo[JJmp,P6Tail,ALU=0];
*Jump Equal n, n=2..9:
@JEQ2:T ← Stack&-1, CallX[JEQev], Opcode[212];*CIA=3051b
@JEQ3:T ← Stack&-1, CallX[JEQod], Opcode[213];*CIA=3055b
@JEQ4:T ← Stack&-1, CallX[JEQev], Opcode[214];
@JEQ5:T ← Stack&-1, CallX[JEQod], Opcode[215];
@JEQ6:T ← Stack&-1, CallX[JEQev], Opcode[216];
@JEQ7:T ← Stack&-1, CallX[JEQod], Opcode[217];
@JEQ8:T ← Stack&-1, UseCTask, Call[stkdif], Opcode[220];
T ← 3C, DblGoTo[JJmpa,P6Taila,ALU=0];
@JEQ9:T ← Stack&-1, UseCTask, Call[stkdif], Opcode[221];
T ← 4C, DblGoTo[JJmp,P6Tail,ALU=0];
*Jump Equal Byte
@JEQB:T ← Stack&-1, UseCTask, Call[stkdif], Opcode[222];
JEQBx:DblGoTo[Ejmp,J2,ALU=0];
*APC[12:14] contains 1,2,2,3,3,4,4,5 for JNE2 to JNE9.
JNEev:LU ← (Stack&-1) - T, UseCTask;
T ← (LdF[APCTask&APC,12,3]) - 1, Skip[ALU#0];
LU ← NextInst[IBuf], Call[P6Tailx];
T ← (PCF.word) + T, DblGoTo[JNEjO,JNEjE,R Odd];*Odd
JNEod:LU ← (Stack&-1) - T, UseCTask;
T ← (LdF[APCTask&APC,12,3]) - 1, Skip[ALU#0];
LU ← NextInst[IBuf], Call[P6Tailx];*Even
T ← (PCF.word) + T, DblGoTo[JnComO,JnComZ,R Odd];*Odd
*Jump Not Equal n, n=2..9:
@JNE2:T ← Stack&-1, CallX[JNEev], Opcode[223];*CIA=3115b
@JNE3:T ← Stack&-1, CallX[JNEod], Opcode[224];
@JNE4:T ← Stack&-1, CallX[JNEev], Opcode[225];
@JNE5:T ← Stack&-1, CallX[JNEod], Opcode[226];
@JNE6:T ← Stack&-1, CallX[JNEev], Opcode[227];
@JNE7:T ← Stack&-1, CallX[JNEod], Opcode[230];
@JNE8:T ← Stack&-1, CallX[JNEev], Opcode[231];
@JNE9:T ← Stack&-1, CallX[JNEod], Opcode[232];
*Jump Not Equal Byte
@JNEB:T ← Stack&-1, UseCTask, Call[stkdif], Opcode[233];
JNEBx:DblGoTo[JB,Enojmp,ALU#0];
JLBpos:DblGoTo[J2,Ejmp,Ovf'];*Even
JLBneg:DblGoTo[JB,Enojmp,Ovf'];*Odd
JGEBpos:DblGoTo[JB,Enojmp,Ovf'];*Even
JGEBneg:DblGoTo[J2,Ejmp,Ovf'];*Odd
stksw:T ← Stack&+1, Return;
*Jump Less Byte - jump if (TOS-1) < TOS
@JLB:T ← Stack&-1, UseCTask, Call[stkdif], Opcode[234];
JLBx:FreezeResult, DblGoTo[JLBpos,JLBneg,ALU>=0];
*Jump Greater Equal Byte
@JGEB:T ← Stack&-1, UseCTask, Call[stkdif], Opcode[235];
JGEBx:FreezeResult, DblGoTo[JGEBpos,JGEBneg,ALU>=0];
*Jump Greater Byte
@JGB:Stack&-1, UseCTask, Call[stksw], Opcode[236];
LU ← (Stack&-2) - T, GoTo[JLBx];
*Jump Less Equal Byte
@JLEB:Stack&-1, UseCTask, Call[stksw], Opcode[237];
LU ← (Stack&-2) - T, GoTo[JGEBx];
*Jump Unsigned Less Byte
@JULB:T ← Stack&-1, UseCTask, Call[stkdif], Opcode[240];
JULBx:DblGoTo[J2,Ejmp,Carry];
*Jump Unsigned Greater Equal Byte
@JUGEB:T ← Stack&-1, UseCTask, Call[stkdif], Opcode[241];
JUGEBx:DblGoTo[JB,Enojmp,Carry];
*Jump Unsigned Greater Byte
@JUGB:Stack&-1, UseCTask, Call[stksw], Opcode[242];
LU ← (Stack&-2) - T, GoTo[JULBx];
*Jump Unsigned Less Equal Byte
@JULEB:Stack&-1, UseCTask, Call[stksw], Opcode[243];
LU ← (Stack&-2) - T, GoTo[JUGEBx];
*Jump Zero Equal Byte
@JZEQB:LU ← Stack&-1, GoTo[JEQBx], Opcode[244];
*Jump Zero Not Equal Byte
@JZNEB:LU ← Stack&-1, GoTo[JNEBx], Opcode[245];
*Jump Indexed Byte and Jump Indexed Word
CODEToRTemp:PFetch1[CODE,RTemp], Return;
:IF[AltoMode]; ****************************************
@JIB:LoadPage[opPage3], GoTo[UnImpP6], Opcode[246];
@JIW:LU ← Cycle&PCXF, Skip[R Even], Opcode[247];
CSkipData;*Odd byte can’t cause refill
T ← Stack&-1;
LU ← (Stack) - T;
T ← Stack&-1, Skip[Carry'];
SkipData, Call[J2];
T ← (NextData[IBuf]) + T;*get beta
LU ← CycleControl ← NextData[IBuf];*get alpha
:ELSE; ************************************************
P6PopComp:
T ← Stack&-1, UseCTask;
LU ← (Stack) - T, Return;
@JIB:LU ← CycleControl ← CNextData[IBuf], Call[P6PopComp], Opcode[246];
T ← RSh[Stack,1], Skip[Carry'];
SkipData, Call[P6Pop];*Exit to next opcode
T ← (NextData[IBuf]) + T;*add beta
T ← (LHMask[Cycle&PCXF]) + T, Call[CODEToRTemp];
Stack&-1, Skip[R Odd];
T ← LdF[RTemp,0,10], GoTo[JWx];
T ← RHMask[RTemp], GoTo[JWx];
@JIW:LU ← CycleControl ← CNextData[IBuf], Call[P6PopComp], Opcode[247];
T ← Stack&-1, Skip[Carry'];
SkipData, Call[P6Tail];*Flush beta and exit
T ← (NextData[IBuf]) + T;*add beta
:ENDIF; **********************************************
T ← (LHMask[Cycle&PCXF]) + T, Call[CODEToRTemp];
T ← RTemp, GoTo[JWx];
@ADD:T ← Stack&-1, Opcode[250];
Addx:LU ← NextInst[IBuf];
Stack ← (Stack) + T, NIRet;
@SUB:T ← Stack&-1, Opcode[251];
Subx:LU ← NextInst[IBuf];
Stack ← (Stack) - T, NIRet;
%Multiply--high half of 32-bit product is left above the top of the Stack
product low in Stack, hi in RTemp1
multiplicand in T (from the argument at TOS-1)
multiplier in RTemp (from the argument at TOS)
The first loop flushes leading 0’s in the multiplier with timing 2 cycles/0;
The second loop processes 0’s in 6 cycles and 1’s in 10 or 11 cycles.
Note how a low-order 1 in the multiplier serves as an end flag.
Total time is (14 to 17) + 2*LZ + (16-LZ)*6 + (4 or 5)*(NOnes), where LZ is
the number of leading 0’s in the multiplier and NOnes is its number of 1’s.
NOTE: For random numbers, this algorithm averages about 35 cycles faster than
the algorithm commented on the next page. However, when the multiplier has
many leading + trailing zeroes, it is worse than the other.
This algorithm has 10 to 13 cycles less setup overhead and the same loop time
if the product is less than 16d bits; for larger products it has 15 to 18
cycles less setup overhead and 3 to 4 cycles less per multiplier 1; however,
in all cases, it requires 2 more cycles per leading or trailing multiplier 0
than the other algorithm.
In the most unfavorable case (1 x 1), the other algorithm is 17 cycles faster;
for all other bad cases, it is no more than 13 cycles faster. Since 87
percent of all multiplies are preceded by small constant pushes, the other
algorithm may average faster than this one, but this one is 5b mi smaller, so
we use it.
%
@MUL:RTemp1 ← T ← 30C, Call[MulSU], Opcode[252];
*2nd loop shifts the product RTemp1/Stack left 1 and conditionally adds the
*multiplicand T based upon sign of the multiplier RTemp, which is left-shifted
*until the right-most 1 bit is seen.
RTemp ← (RTemp) SALUFOP T, GoTo[Mul1,R<0];
Mul0: Stack ← (Stack) SALUFOP T;
RTemp1 ← (RTemp1) SALUFOP T, UseCOutAsCIn, Return;
Mul1:RTemp1 ← (RTemp1) SALUFOP T, UseCOutAsCIn, GoTo[MulLast,ALU=0];
Stack ← (LSh[Stack,1]) + T, Skip[R<0];
RTemp1 ← (RTemp1) - 1, UseCOutAsCIn, Return;
RTemp1 ← (RTemp1) + 1, UseCOutAsCIn, Return;
*Force the low bit of multiplier RTemp to 1 for the end test.
*Initialize the high product word (RTemp1) to 0; low product word (TOS-1)
*already contains the multiplicand, so we don’t zero it and add the
*multiplicand on the 1st multiplier 1 (but we have to test multiplier for 0).
MulSU:RTemp1 ← (RTemp1) - (SALUF ← T);*SALUF = 30b is LU ← 2A
T ← (Stack&-1) SALUFOP T, Skip[R>=0];*Multiplier*2 from TOS
RTemp ← (Zero) + T + 1, GoTo[MulSUX];*Multiplier .ls. 0
RTemp ← (Zero) + T + 1, Skip[ALU#0];*Multiplier .ge. 0
T ← Stack ← 0C, GoTo[mdPush];*Multiplier .eq. 0
*One mi loop shifts off leading 0’s in multiplier.
RTemp ← (RTemp) SALUFOP T, Skip[R<0];
RTemp ← (RTemp) SALUFOP T, GoTo[.,R>=0];
MulSUX:T ← Stack, Return;*Multiplicand from TOS-1
MulLast:
T ← RSh[RTemp1,1], GoTo[mdPush,Carry'];
T ← (LSh[AllOnes,17]) or T, GoTo[mdPush];
mdPush:Stack&+1 ← T;
P6Pop:LU ← NextInst[IBuf];
Stack&-1, NIRet;
PopToT: T ← Stack&-1, FreezeResult, Return;
@DBL:T ← LSh[Stack&-1,1], Opcode[253];
PushTP6:LU ← NextInst[IBuf];
Stack&+1 ← T, NIRet;
%Multiply--high half of 32-bit product is left above the top of the Stack
product low in Stack, hi in RTemp1
multiplicand low in RTemp, hi in xfMX
multiplier in xfMY
The first loop, which runs until the multiplicand being left-shifted 1 each
step overflows into the high word, has timing of 4 cycles on 0’s, 10 on 1’s;
this loop doesn’t task on 0’s (worst case without tasking ~ 70 cycles on a
multiplier of 100000b and multiplicand of 1).
Note that the end test need be made only when processing a multiplier 1.
The second loop, which runs until the last multiplier 1 is processed, has
timing of 6 cycles on 0’s, 14 on 1’s.
Total time 26 cycles + (14 per 1) + (6 per 0 between the leftmost and
rightmost 1’s) + (4 per 0 to the right of the rightmost 1).
@MUL:RTemp1 ← T ← 30C, Opcode[252];*SALUF = 30b is an LU ← 2A
RTemp1 ← (RTemp1) - (SALUF ← T), Call[PopToT];*RTemp1 ← 0
xfMY ← T, UseCTask, Call[PopToT];
Stack&+1 ← 0C, Skip[ALU#0];*tests xfMY ← T
T ← Stack&+1 ← 0C, GoTo[P6Pop];
RTemp ← T, Call[.+1];
*1st loop
xfMY ← RSh[xfMY,1], GoTo[MulZ,R Even];
MulO:Stack ← (Stack) + T, GoTo[MulLast,ALU=0];
T ← RTemp ← (RTemp) SALUFOP T, FreezeResult, Skip[R<0];
RTemp1 ← (RTemp1) + 1, UseCOutAsCIn, Return;
RTemp1 ← (RTemp1) + 1, UseCOutAsCIn, GoTo[MulL];
MulZ:T ← RTemp ← (RTemp) SALUFOP T, Skip[R<0];
*Must replicate the mi at MulO-1 because the opcode dispatch locations are
*only four apart on this page.
xfMY ← RSh[xfMY,1], DblGoTo[MulO,MulZ,R Odd];
MulL:xfMX ← 1C, Call[.+1];
*2nd loop
xfMY ← RSh[xfMY,1], GoTo[MulLZ,R Even];
MulLO:Stack ← (Stack) + T, GoTo[MulLLast,ALU=0];
T ← xfMX, FreezeResult;
RTemp1 ← (RTemp1) + T + 1, UseCOutAsCIn;
MulLZ:T ← RTemp ← (RTemp) SALUFOP T;*Double the multiplicand
xfMX ← (xfMX) SALUFOP T, UseCOutAsCIn, Return;
MulLLast:
T ← xfMX, FreezeResult;
T ← RTemp1 ← (RTemp1) + T + 1, UseCOutAsCIn, GoTo[mdPush];
MulLast:
T ← (RTemp1) + 1, UseCoutAsCin;
Old divide inner loop
LU ← (Zero) + T;*Carry←0
Stack ← (Stack) SALUFOP T, UseCOutAsCIn, Call[DvStart];
*Loop time: 13 to 15 cycles/bit.
LU ← RTemp1;
RTemp ← (RTemp) - T, Skip[ALU>=0]; *subtract divisor
Stack ← (Stack) SALUFOP T, GoTo[div1]; *No carry test--q bit 1
Stack ← (Stack) SALUFOP T, UseCOutAsCIn, Skip[Carry];*Subtract ok?
RTemp ← (RTemp) + T, FreezeResult;*No--undo
div1:xfTemp ← (xfTemp) - 1, FreezeResult, Skip[R>=0];
T ← RTemp, GoTo[mdPush];
DvStart:
RTemp ← (RTemp) SALUFOP T, UseCOutAsCIn, Skip[R<0];
RTemp1 ← Zero, Return;
RTemp1 ← (RTemp1) or not (0C), Return;
%
*Divide--(0,,TOS-1)/TOS. Single word dividend and divisor, no overflow check.
*The remainder is left above the Stack.
@DIV:MNBR ← Stack&-1, Opcode[254];
RTemp ← 0C, GoTo[LDIVx];
%Long Divide--(TOS-1,,TOS-2)/TOS. Double-word dividend, single-word divisor,
no overflow check. The remainder is left above the Stack.
dividend low in Stack; hi in RTemp
divisor in T
quotient appears in Stack; remainder in RTemp
Timing ~ 30 + (9/quotient 1) + (12/quotient 0) cycles
%
@LDIV:MNBR ← Stack&-1, Call[PopToT], Opcode[255];
RTemp ← T;
LDIVx:T ← 31C;*SALUFOP = A+A+1
SALUF ← T;
T ← MNBR;*T ← divisor
:IF[AltoMode]; ****************************************
LU ← (RTemp) - T;*no checks in Alto mode
xfTemp ← 16C, Skip[Carry'];
T ← 0C, GoTo[mdPush];*0 divisor or divide check
:ELSE; ************************************************
LU ← (RTemp) - T, GoTo[DvsNZ,ALU#0];
RTemp ← sZeroDivisor;
DvTrap: LoadPage[opPage0];
T ← SStkP, GoToP[BackSPPCandTrap];
DvsNZ:xfTemp ← 16C, Skip[Carry'];
RTemp ← sDivideCheck, GoTo[DvTrap];
:ENDIF; ***********************************************
RTemp1 ← (Zero) - T;*Save minusdivisor
*The 1st bit shifted into stack is "Don’t care."
Stack ← (Stack) SALUFOP T, Call[DvStart];
*Shift the high dividend while simultaneously subtracting the divisor and
*adding in the bit shifted out of the low word in the previous step.
T ← RTemp ← (LSh[RTemp,1]) + T + 1, Skip[R>=0];
*Shift the low dividend while bringing in the quotient bit
Stack ← (Stack) SALUFOP T, GoTo[DvSub];*q bit 1
*Subtract ok?
Stack ← (Stack) SALUFOP T, UseCOutAsCIn, GoTo[DvSub,Carry];
*No--undo by adding back the divisor. Would like to simply add in the next
*step (i.e., ((X+divs) lshift 1) - divs .eq. (X lshift 1) + divs), but the
*carry gets screwed up.
T ← RTemp1, FreezeResult;
T ← RTemp ← (RTemp) - T, FreezeResult;
DvSub:xfTemp ← (xfTemp) - 1, FreezeResult, Skip[R>=0];
Stack&+1 ← T, GoTo[P6Pop];
*Get minusdivisor + bit shifted out of low word
DvStart:
T ← (RTemp1) - 1, UseCOutAsCIn, Return;
@NEG:T ← Stack&-1, Opcode[256];
T ← (Zero) - T, GoTo[PushTP6];
@INC:T ← (Stack&-1) + 1, GoTo[PushTP6], Opcode[257];
@AND:T ← Stack&-1, Opcode[260];
LU ← NextInst[IBuf];
Stack ← (Stack) and T, NIRet;
@OR:T ← Stack&-1, Opcode[261];
LU ← NextInst[IBuf];
Stack ← (Stack) or T, NIRet;
@XOR:T ← Stack&-1, Opcode[262];
LU ← NextInst[IBuf];
Stack ← (Stack) xor T, NIRet;
@SHIFT:T ← Stack&-1, Opcode[263];
RTemp ← T, DblGoTo[ShiftRight,ShiftLeft,ALU<0];
ShiftRight: RTemp ← (RTemp) + (17C);
CycleControl ← RTemp, Skip[Carry];
Stack ← Zero, GoTo[P6Tail];*Shift count > 17, use zero
LU ← NextInst[IBuf];
Stack ← RF[Stack], NIRet;
ShiftLeft:LU ← (RTemp) and not (17C);
RTemp ← (RTemp) xor (17C), Skip[ALU=0];
Stack ← Zero, GoTo[P6Tail];*Shift count > 17, use zero
*T has positive count. Form 0,,-count, then use WFA
CycleControl ← RTemp;
LU ← NextInst[IBuf];
Stack ← WFA[Stack], NIRet;
@DADD:MNBR ← Stack&-1, Call[GetTDecStk2], Opcode[264]; *point to lsb of top doubleword
Stack ← (Stack) + T; *add low bits
Stack&+1, Skip[Carry];
T ← MNBR, GoTo[Addx]; *pick up high bits of top doubleword
T ← (MNBR) + 1, GoTo[Addx]; *same, plus the carry out of the low-word add
@DSUB:MNBR ← Stack&-1, Call[GetTDecStk2], Opcode[265]; *point to lsb of top doubleword
Stack ← (Stack) - T; * subtract low bits
Stack&+1, Skip[Carry']; *point to msb of second doubleword
T ← MNBR, GoTo[Subx]; *remember msb of top doubleword (TOS)
T ← (MNBR) + 1, GoTo[Subx];
GetTDecStk2:T ← Stack&-2, Return; *grab it, point to lsb of second doubleword
*Double Compare:
*If (TOS-2),,(TOS-3) < TOS,,(TOS-1), push -1
*If (TOS-2),,(TOS-3) = TOS,,(TOS-1), push 0
*If (TOS-2),,(TOS-3) > TOS,,(TOS-1), push 1
*Comparisons are signed
@DCOMP:T ← (Stack&-2) + (100000C), Opcode[266];
Stack ← (Stack) + (100000C), GoTo[DUCOMPy];
*Comparisons are unsigned
@DUCOMP:T ← Stack&-2, Opcode[267];
*Compare msb’s, point at lsb of high doubleword
*grab lsb of top doubleword, point at lsb of second doubleword
DUCOMPy:
LU ← (Stack&+1) - T;
T ← Stack&-2, FreezeResult, Skip[ALU=0];
Stack ← (Zero) + 1, DblGoTo[DUCompL,DUCompG,Carry'];
Stack ← (Stack) - T;*compare low words
FreezeResult, Skip[ALU#0];
LU ← NextInst[IBuf], Call[P6Tailx];
Stack ← (Zero) + 1, DblGoTo[DUCompL,DUCompG,Carry'];
DUCompL:Stack ← (Stack) or not (0C), GoTo[P6Tail];
DUCompG:LU ← NextInst[IBuf], Call[P6Tailx];
*ADD01 - on D0, equivalent to ADD
@ADD01:T ← Stack&-1, GoTo[Addx], Opcode[270];
*Unused opcodes on page 6
:IF[AltoMode]; ****************************************
LoadPage[opPage3], GoTo[UnImpP6], Opcode[271];
LoadPage[opPage3], GoTo[UnImpP6], Opcode[272];
LoadPage[opPage3], GoTo[UnImpP6], Opcode[273];
LoadPage[opPage3], GoTo[UnImpP6], Opcode[274];
LoadPage[opPage3], GoTo[UnImpP6], Opcode[275];
LoadPage[opPage3], GoTo[UnImpP6], Opcode[276];
LoadPage[opPage3], GoTo[UnImpP6], Opcode[277];
UnImpP6:
T ← sUnimplemented, GoToP[kfcr];
:ELSEIF[CedarMode]; ************************************
RTemp ← 271C, GoTo[UnImpP6], Opcode[271];
RTemp ← 272C, GoTo[UnImpP6], Opcode[272];
RTemp ← 273C, GoTo[UnImpP6], Opcode[273];
RTemp ← 274C, GoTo[UnImpP6], Opcode[274];
RTemp ← 275C, GoTo[UnImpP6], Opcode[275];
RTemp ← 276C, GoTo[UnImpP6], Opcode[276];
RTemp ← 277C, GoTo[UnImpP6], Opcode[277];
UnImpP6:
LoadPage[opPage0];
TrapParm ← 0C, GoToP[UndefTrap];
:ELSE; ************************************************
LoadPage[opPage0], GoTo[UnImpP6], Opcode[271];
LoadPage[opPage0], GoTo[UnImpP6], Opcode[272];
LoadPage[opPage0], GoTo[UnImpP6], Opcode[273];
LoadPage[opPage0], GoTo[UnImpP6], Opcode[274];
LoadPage[opPage0], GoTo[UnImpP6], Opcode[275];
LoadPage[opPage0], GoTo[UnImpP6], Opcode[276];
LoadPage[opPage0], GoTo[UnImpP6], Opcode[277];
UnImpP6:
RTemp ← sUnimplemented, GoToP[BackPCandTrap];
:ENDIF; ***********************************************
:END[MesaJ];