:TITLE[MesaOP3];*Opcodes 300b - 377b
%
Ed Fiala 11 July 1984: Make xfFault not fetch the local frame overhead and
not call Loadgc when FromReschedule=1.
Ed Fiala 14 March 1984: Add GA1, CAW, LFC opcodes; move LP here from
MesaOP2 (renumbered & modified); eliminate LFC1..LFCB, D0, DB; move GAB,
GAW, EFC0..EFCB, SFC, KFCB, BLT..BLTCL for Klamath. Save 2 mi with
TGetsCycle subroutine. Double index into SD in kfcr subroutine. Change
xfTemp1 to xfOldPC in SavPCinFrame. Change trap parameters everywhere.
Improve xfer timing about 15 percent by moving buffer refill back into
both the type 0 and type 1 paths and by changing the way the PC is
restored on page faults. Speed up BLT and BLTC by 9 cycles + 2 cycles/word;
remove BLT fault fixup. Change entry at prFault. Move refill to
MesaOP2.mc. Change opcode 377 to enter Fault.mc differently.
Ed Fiala 21 December 1982: Make opcodes 372-375 undefined only when not Cedar.
Ed Fiala 13 October 1982: Add SavPCinFrame1 label for MesaESC; fix bug in PC
backup at xfTrap; add tasking at LFC and RET.
Ed Fiala 23 August 1982: Fix xfer allocate to not overwrite frame’s FSI word
with the FSI of the chain because sometimes this FSI contains 37b rather than
the chain’s FSI.
Ed Fiala 26 May 1982: Remove local cache refill at BLTint; eliminate
MemStat← at StorePCTrap for frame faults; add MemStat←Normal at P7Tailx;
do Xfer local cache refill only on type 0.
Ed Fiala 20 May 1982: Replace NoPushSD by PushSD in MemStat; fix bug at
xfT0e+1; improve tasking at xfT0e return; cause PushSD only on indirect
xfers saving 10 cycles in ordinary case of EFC; interchange at
BLTdonex for CacheLocals=1.
Ed Fiala 3 May 1982: Merge xfLM1toT into SavPCinFrame saving 1 mi.
Ed Fiala 15 December 1981: Create from old Pilot MesaX and other opcodes;
substantial rework of page placement and registers; numerous Xfer changes
for new Pilot.
%

*LoadRAM jumps here at end of initialization to start the emulator.
*StkP points at FFault; set "trap on page fault" bit.
InitEnd:
Stack ← (Stack) or (1C), At[InitEndLoc];
T ← xfMX, LoadPage[xfPage1];
StkP ← RZero, GoToP[xfT2];

%Xfer transfers control from one context to another. Affected are the
LOCAL, GLOBAL, CODE/CODEhi, PCB/PCBhi/PCF, xfGFIWord, and xfBrkByte
registers and the values in the overhead words of the local frame
(return link, global link, and return pc).

Entries:
SavePCXfer	from EFC, @SFC, @KFCB, and BackTrap (MesaOP0).
Xfer	from @XE, @XF, @PO, @POR.
TrapXfer	from xfTrap2.
xfT2	from InitEnd.
RetGo	from Reschedule (MesaP).
xfLFC	from @LFC.
Duplicate Xfer and Xfer+1 for entry from @RET.
xfFault	from Fault.mc when EarlyXfer=1 in MemStat and a
fault occurs.

Permanent registers used
xfBrkByte
40400b + break bytecode to execute (Known bugs!!).
xfXTSReg
Xfer trap status register--right-shifted
1 bit each Xfer; an odd value means trap at exit
from the Xfer just before entering the new context.
Input registers
xfTemp/1
destination link (xfTemp[16..17b] has a type code for
the link: 0 = frame link, 1 = procedure link, 2 =
indirect link).
xfMX
some of the xfer entries are deeper with the destination
link in xfMX rather than xfTemp/1; garbage for @LFC.
xfMY
source link--may become trap parameter or return link;
garbage for @RET.
MemStat
possible FreeFrame, Trap, or FromReschedule bits set;
EarlyXfer bit is set during the entire xfer.
MNBR
current frame’s LOCAL if FreeFrame=0 in MemStat.
xfOldPC
2*PCB+PCF if FreeFrame=1 in MemStat else 2*(PCB-CODE)+PCF.
xfFSI
frame size index if FreeFrame=1 in MemStat.
xfFrame
pointer to frame (i.e., to LOCAL-4) if FreeFrame=1.
Temporary registers
xfCount, xfSD, CycleControl, xfTrapParm0-3, RTemp1.
prConditionQ (=RTemp) must not be smashed by the code at xfFault.
xBuf-xBuf3 are used by Fault.mc, so these must not hold information needed
to undo the xfer.

Various value restrictions:
CODE
quadaligned
left-half CODEhi = right-half CODEhi because code segments don’t cross
64k boundaries.
PCB
quadaligned (PCB*2 + PCF = byte pc of next byte to execute)
PCBhi = CODEhi
0 <= PCB-CODE <= 32k
GLOBAL
quadaligned; overhead words at GLOBAL-4 to GLOBAL-1.
LOCAL
quadaligned; overhead words at LOCAL-4 to LOCAL-1; LOCAL-4 and
LOCAL+3 are on the same page.
GLOBALhi = LOCALhi = MDShi
AV
page-aligned, locked in storage.

Page faults are dealt with in three steps (1-3); (4) and (5) give supporting requirements:
(1) Bytecodes that page fault prior to setting EarlyXfer in MemStat
will be restarted after normal fault service.
(2) A page fault with EarlyXfer will rebuild LOCAL from MNBR if
FreeFrame=0 in MemStat, else from xfFrame+4; then GLOBAL is restored from
the local frame overhead; finally, CODE, etc. are rebuilt for the old
context by the Loadgc subroutine. If either FromReschedule=1 or Trap=1
in MemStat, the restart PC is already in the frame. Otherwise, if
FreeFrame=1, the unsaved return byte PC is restored as xfOldPC - 2*CODE; if
FreeFrame=0, the saved byte PC is in xfOldPC. Finally,
the return byte PC is backed up using PCX as in (1) above and saved in
the frame; then FromReschedule is set to 1 in MemStat so that Reschedule
won’t overwrite the PC which has been saved in the frame.
(3) If the final NextInst out of xfer causes refill and buffer refill
page faults, IBuf will be filled with 377 bytecodes and the fault handler
will reenter and complete the Xfer; bytecode 377 microcode (illegal in
normal programs) then deals with the trap.
(4) To make (2) above possible, the two opcodes which set FreeFrame=1
(RET and XF) preserve LOCAL-4 in xfFrame and 2*PCB+PCF in xfOldPC.
(5) The fault handler must not dispatch on MemStat when a NextInst is
responsible for a page fault, so that if NextInst causes refill on a type 1
xfer, it will complete correctly. No faults happen after freeing the frame
because buffer refill precedes that.

Destination (xfMX) and Source (xfMY) links will be pushed onto the stack
if PushSD=1 in MemStat, but this must not happen until EarlyXfer faults
(step 2 above) are impossible because PO, POR, and SFC depend upon stack
arguments which would be overwritten, making restart impossible. The
stack is popped twice to its original level after these links are pushed.

If the Trap bit in MemStat is 1, xfTrapParm0-3 are stored in Local 0-3;
a type 0 Xfer also stores the source link (xfMY) in the return link
word of the frame being entered.

The following trap and fault conditions are detected by Xfer:

1) If after following all indirect links the final destination link is 0, a
control trap occurs. This will back up and save the PC in the old frame,
then reenter Xfer with the Trap bit set in MemStat, the destination link
changed to the sControlTrap entry in SD, and the original source link in
xfTrapParm0. This occurs legitimately when initializing port linkages;
otherwise, it indicates a bug.

2) If either the global frame address or byte PC for the new context is 0,
an unbound trap occurs with the destination link in xfTrapParm0-1. This
indicates that the procedure doesn’t exist (i.e., a bug). Type 0 xfers
instead get a control trap when the global frame address is 0.

3) If the new code base is odd, a code trap occurs--like (1) with the
global frame in xfTrapParm0. This is used for start traps.

4) If a type 1 Xfer’s frame allocation fails, a frame fault occurs.
This backs up and saves the PC in the old frame, then enters prFault in
MesaP with the new frame’s FSI in prData, qFrameFaultOs in T, and PCB odd.
A process switch will occur to the frame fault process; PCB odd indicates
"context invalid" and prevents both StkP←SStkP and storing PC in the frame
upon a page fault. Completion of an Xfer will turn off "context invalid."

5) If xfXTSReg is odd, an Xfer trap occurs provided xfGFIWord bit 14 is
also 1. This trap occurs in the new context with the sXferTrap entry in
SD as the destination link and the new frame as the source link. The
original destination link is in xfTrapParm0-1 and a code indicating the
xfer type in xfTrapParm2. Codes for xfer traps are:
RET	0
SFC, KFCB, EFC	1
LFC	2
PO, POR	3
XE, XF	4
traps	5
process switch	6

Since the timing for xfer includes buffer refill at the destination, the
first code byte of an xfer is, for accounting purposes, not charged 2.25
cycles for buffer refill. However, other code bytes are charged 2.25
cycles. To get total time for an Xfer opcode, add the values in the tables
below; the first table below includes time for buffer refill.

Timing from beginning of opcode to the label Xfer:
28.00 cycles EFC type 1 code link, 29.00 cycles if frame link;
(-1 cycles if type 0, +2 if type 2) + 6.25 cycles if EFCB.
14.00 cycles on RET (Same timing for types 0 or 2; type 1 illegal)
25.00 cycles on SFC type 1 (-1 cycles type 0, +2 cycles type 2)
29.25 cycles on KFCB type 1 (-1 cycles type 0, +2 cycles type 2)
48.25 cycles on PO or POR type 1 (+1 cycles type 0, +4 cycles type 2)
52.50 cycles on @XE
50.50 cycles on @XF

Timing from @LFC to the label xfLFC:
33.50 cycles.

Timing from xfLFC to xfGo:
64.00 in the ordinary case where FreeFrame=0 and PushSD=0
(assumes the next opcode begins with a memory reference).

Timing from Xfer to xfGo:
20 cycles/indirection (type 2 destination link) +
39 cycles for type 0 Local return (+19 cycles if non-Local)
(+16 cycles if Trap=1)
92 cycles for type 1 (+11 cycles if Trap=1 in MemStat) when
FreeFrame=0 and PushSD=0

Timing from xfGo to exit:
20 cycles if FreeFrame=0 and PushSD=1;
10 cycles if FreeFrame=0 and PushSD=0;
40 cycles if FreeFrame=1 (PushSD=1 is free; assumes the next
opcode begins with a memory reference).

Total time for typical procedure calls:
28+92+10+2 + 14+39+19+40 = 132+112 = 244 cycles for EFC+RET;
33.5+64+10+2 + 14+39+40 = 109.5+93 = 202.5 cycles for LFC+RET.
%
OnPage[xfPage1];

xfPushTR:
Stack&+1 ← T, Return;

xfRet:
Return;

xfBufRefill:
PFetch4[CODE,IBuf], GoTo[xfTtoPCB,R Even];
xfSD ← sCodeTrap;
T ← Global, GoTo[xfSetTrapParm];
xfTtoPCB:
PCB ← T, Return;*Bypass kludge

xfSaveRETPC:
T ← (LSh[PCB,1]) + T;
xfTtoxfOldPC:
xfOldPC ← T, Return;

%Compute CODE-relative byte PC in xfOldPC and return LOCAL-1 in T so that
the caller can immediately save the PC in the frame.
CODEhi .eq. PCBhi and |PCB-CODE| < 32K, so the result fits in one word.
Note that code segments never cross 64k boundaries so that adding negative
displacements to PCB never involves carries into PCBhi.
Called by SavePCXfer, @PO, @POR, @XE, and MesaP.
%
SavPCinFrame:
T ← PCFreg;
SavPCinFrame1:
T ← (LSh[CODE,1]) - T;
T ← (LSh[PCB,1]) - T;
xfOldPC ← T;
T ← (MNBR ← LOCAL) - 1, Return;
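*Worked example of the computation above (register values invented for
*illustration): with CODE = 1000b, PCB = 1014b, and PCF = 5, the mi compute
*2*CODE - PCF = 2000b - 5 = 1773b and then 2*PCB - 1773b = 2030b - 1773b = 35b,
*which matches 2*(PCB-CODE) + PCF = 2*14b + 5 = 35b.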


%FreeSub frees a frame except for the final PStore1[MDS,xfFrame] which
is not done for tasking reasons. None of FreeSub’s memory references can
page fault because AV must be resident and L[-4] to L[+3] are on the same
page and not swapped out. L[0] is quadaligned. @FF opcode enters at
FreeSub0, Xfer at FreeSub0 and FreeSub1.

input registers
xfFrame	pointer to frame - 4
xfFSI	displacement to head-of-list in AV
temps
xBuf	frame chain link
%
FreeSub0:
xfFSI ← T ← (xfFSI) + (AVOffset);
FreeSub1:
PFetch1[MDS,xBuf];*get head of list from AV
T ← xfFrame ← (xfFrame) + (4C);
PStore4[MDS,xBuf];*link ← head
T ← xfFSI, Return;


%Loadgc loads GLOBAL, xfGFIWord, CODE, and CODEhi given a pointer to a
global frame; also loads PCF from xfTemp1. Generates an Unbound trap
if xfTemp1=0 or a Code trap if CODE is odd.

input registers
xfTemp	global frame address
xfTemp1	new CODE-relative byte PC
output registers
xfGFIWord
CODE,CODEhi	code base register
PCBhi	same value as CODEhi
GLOBAL	global base register
PCF	loaded from xfTemp1

Calls to Loadgc from xfTrap & xfTrap1; to Loadgc2 from xfer type 1, and
to Loadgc3 from xfer type 0. Fill xfGFIWord, CODE, and CODEhi with new
global overhead; the register preceding xfGFIWord (xfGarb0) is smashed.
%
Loadgc:
T ← (xfTemp) - (4C);
Loadgc2:
PFetch4[MDS,xfGarb0];
*Note that the value loaded into PCF depends only on the relative byte PC
*here--subtracting CODE is unnecessary because it is quadword-aligned.
Loadgc3:
LU ← PCF ← xfTemp1;
T ← (Form4[AllOnes]) + T, Skip[ALU#0];
xfSD ← sUnboundTrap, GoTo[xfDestParmTrap];*New PC = 0
GLOBAL ← T;
T ← RHMask[CODEhi];
*CODE doesn’t cross 64K boundary to avoid complications when adding
*negative displacements to CODE or PCB.
T ← CODEhi ← (LSh[CODEhi,10]) or T;
PCBhi ← T, Return;

SavePCXfer:
T ← PCFreg, Call[SavPCinFrame1];
PStore1[MDS,xfOldPC];
*MemStat will contain 0 here except on the jump from BackTrap in MesaOP0,
*when it contains Trap!EarlyXfer!xfTypeTrap.
MemStat ← (MemStat) or (Or[EarlyXfer!,xfTypeCALL!]C);
TrapXfer:
xfMY ← (Zero) + T + 1;*xfMY ← LOCAL
Xfer:
T ← Form-2[xfTemp], Skip[R Even];
xfMX ← (Zero) + T + 1, DblGoTo[xfT1,xfUnbound,ALU#0];
LU ← Dispatch[xfTemp,16,1];
*Save original word 0 of control link to use as a trap parameter later.
xfMX ← T, Disp[xfT0];

xfUnbound:
xfSD ← sUnboundTrap, GoTo[xfDestParmTrap];

*An indirect link is even, so use PFetch2. Source and destination links
*are pushed onto the stack only when an indirect link is encountered.
*Since SFC and PO have stack arguments which must not be smashed until
*page faults are impossible, the links cannot be pushed yet.
xfT2:
PFetch2[MDS,xfTemp], Task, At[xftype,1];*Indirect link
MemStat ← (MemStat) or (PushSD);
T ← Form-2[xfTemp], Skip[R Even];
DblGoTo[xfT1,xfUnbound,ALU#0];
LU ← Dispatch[xfTemp,16,1];
T ← xfTemp, Disp[xfT0];

*LOCAL is offset by -2 here; jump if this xfer is servicing a trap.
xfT0e:
LOCAL ← (LOCAL) + (2C), GoTo[xfT0d,ALU#0];
:IF[CacheLocals]; ********************************
*Note that xfTrapParm0-3 and LocalCache0-3 are identical registers when
*CacheLocals is true.
PFetch4[LOCAL,LocalCache0,0];
:ENDIF; ******************************************
xfT0c:
T ← GLOBAL, Return;
xfT0d:
PStore4[LOCAL,xfTrapParm0,0];
T ← (LOCAL) - (3C);*Bypass kludge ok
PStore1[MDS,xfMY];*Save return link at [LOCAL-3]↑
*Disable interrupts if trapping to a frame.
xfWDC ← (xfWDC) + 1, GoTo[xfT0c];

%Frame link. T points at frame being reentered. That frame has FSI in
word -4, return link (just dispatched on) in word -3, GLOBAL in word -2,
and return PC in word -1.
%
xfT0:
T ← (Form-2[AllOnes]) + T, GoTo[.+3,ALU#0], At[xftype,0];
xfSD ← sControlTrap;
T ← xfMY, GoTo[xfSetTrapParm];
*xfTemp,,xfTemp1 ← new GLOBAL,,return PC (byte PC relative to CODE).
PFetch2[MDS,xfTemp];
*LOCAL is offset by -2 here.
LOCAL ← T;
LU ← (MemStat) and (Trap), Call[xfT0e];
**2 dead cycles here.
LU ← (xfTemp) xor T;
*Equal old and new GLOBAL implies equal CODE, CODEhi, PCBhi, and xfGFIWord,
*so Loadgc need not be called. Type 1 Xfers don’t need this improvement
*because local function calls are used.
*NewG=0 is illegal here.
T ← (xfTemp) - (4C), Skip[ALU#0];
PCF ← xfTemp1, Skip;
PFetch4[MDS,xfGarb0], Call[Loadgc3];
T ← RSh[xfTemp1,1], Call[xfBufRefill];

%An MC1 fault aborts before the 4th mi after the PFetch4[CODE,IBuf] in
xfBufRefill. This is the last fault which can happen since FreeSub’s
references can’t fault.

Free the old frame if FreeFrame=1 in MemStat; push the source and dest.
links if PushSD=1 in MemStat. PushSD=1 only on indirect xfers; FreeFrame=1
only on @RET and @XF.
%
*Dispatch on PushSD & FreeFrame bits
xfGo:
LU ← Dispatch[MemStat,7,2];
PCB ← Form-4[PCB], Disp[.+1];
xfFSI ← RHMasK[xfFSI], At[xfMSTab,1];
T ← xfFSI ← (xfFSI) + (AVOffset), Call[FreeSub1];
PStore1[MDS,xfFrame], GoTo[xfGo1];

xfFSI ← RHMasK[xfFSI], Call[FreeSub0], At[xfMSTab,3];
PStore1[MDS,xfFrame];
T ← xfMX, At[xfMSTab,2];
Stack&+1 ← T;
T ← xfMY, Call[xfPushTR];
Stack&-2;
xfGo1:
xfXTSReg ← RSh[xfXTSReg,1], DblGoTo[XferTrap,xfTail,R Odd], At[xfMSTab,0];


*Have destination link-1 in T known to be non-zero. Odd placement.
**From SavePCXfer’s PStore1, there are 2 dead cycles here at the PFetch4
**in Loadgc, so 1 more mi can be put here almost for free.
xfT1:
T ← (Form-4[AllOnes]) + T, Call[Loadgc2];

%Enter at xfLFC from LFC with CODE-relative byte PC in xfTemp1.
Allocate new local frame and initialize local overhead words.

First byte of code is the frame size index (FSI) for the new frame. The
word at MDS[AVOffset+FSI] is the header for a chain of frames in which the
low two bits of each pointer are interpreted as follows:
00	Word[0:15d] points to a quadaligned frame; the overhead
words are -4 to -1 with respect to this pointer; the link to
the next frame is word 0; overhead words and words 0 to 3
(which become Locals 0-3) are guaranteed on the same page.
x1	No frames available--trap with original FSI as parameter.
10	No more frames of the exact size wanted, but Word[0:13d]
are an alternate FSI.
References to the FSI table can’t page or write protect fault; the first
reference to any frame might page fault, but since the four overhead words
and the first four locals are guaranteed to be on the same page, no faults
will happen after that.
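
For illustration only (header values invented, not taken from a real AV):
1230b ends in binary 00, so it points to a frame; 32b ends in binary 10, so
allocation retries with alternate FSI RSh[32b,2] = 6; 1b is odd, so no frame
is available and the frame fault is raised with the original FSI.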

The original FSI from the 1st code byte is preserved in CycleControl and
becomes the trap parameter, if frame allocation fails; the current FSI
+ AVOffset is kept in xfCount. Note that the FSI in the new frame’s
overhead word is not necessarily the same as the FSI for the list from
which the frame is obtained; it may be different when the allocation
software wants the frame returned to a different list.

RET can wind up with a type 1 link (after indirection) in the case of fork
termination.

Timing from Xfer to here = 30 cycles.
Timing from @LFC to here = 33.5 cycles.
%
xfLFC:
T ← RSh[xfTemp1,1], Call[xfBufRefill];
T ← AVOffset;
*4 cycles suspended here unless an io task intervenes
T ← (CycleControl ← CNextData[IBuf]) + T, Call[xfGF1];
*Have original FSI in CycleControl, current FSI + AVOffset in T.
*Also return here on indirection.
*9 cycles suspended here unless an io task intervenes.
LU ← LdF[xfTemp,16,1], GoTo[xfGF0,R Even];
*Frame allocation failed--cause FrameFault with original FSI as parameter.
*To share backup code, 1st part of FrameFault is treated as trap.
T ← RSh[Cycle&PCXF,10];
prConditionQ ← qFrameFaultOs, GoTo[xfFault];
xfGF0:
T ← xfTemp, GoTo[xfGF2,ALU=0];
xfTemp ← RSh[xfTemp,2];*Indirect
AllocInd:
*Here from @AF in MesaESC.
T ← (xfTemp) + (AVOffset);
xfGF1:
PFetch1[MDS,xfTemp];
TtoxfCount:
*Here from @AF in MesaESC.
xfCount ← T, Return;

%Alloc succeeded--fetch word 0, pointer to next free frame in chain.
This is the last reference which can page fault before XferT1Fixup.
Timing from xfLFC to here = 36 + 19/indirect allocate cycles.
%
xfGF2:
PFetch1[MDS,RTemp1];*Word 0 of frame = link to next frame.
LOCAL ← T, Task;
T ← (LOCAL) - (4C);
%PFetch1-PStore4 into the overhead words here (of xfRSAV (frame’s FSI),
xfMY (return link), GLOBAL, and GLOBALhi) is substantially faster than two
PStore1’s (PStore2 can’t be used because the two words are unaligned).
Only xfMY and GLOBAL need to be stored. The FSI in word -4 of the frame is
not always the same as the FSI of the chain--sometimes it is 37b, if the
frame handler wants the frame to be freed after execution.
%
PFetch1[MDS,xfRSAV];
LU ← (MemStat) and (Trap);
PStore4[MDS,xfRSAV], FreezeResult;
T ← xfCount, GoTo[.+3,ALU=0];
*PStore4 is ok because frame is quadaligned and extends to at least LOCAL+3.
PStore4[LOCAL,xfTrapParm0,0];
Call[xfRet];
*Complete frame allocation by storing the link into MDS[FSI+AVOffset].
PStore1[MDS,RTemp1], GoTo[xfGo];

%Jump to xfDestParmTrap from Loadgc with sUnboundTrap in xfSD if the byte
PC of the destination is 0 or from xfUnbound if the global frame address of
a type 1 xfer is 0. For an indirect destination link, only xfMX will be
valid, and xfTemp1 will contain garbage; this is ok. For a direct link
xfMX/xfTemp1 are the destination link.

Jump to xfSetTrapParm with new GLOBAL in T and sCodeTrap in xfSD from
xfBufRefill, if the new value for CODE is odd.

Jump to xfSetTrapParm with source link in T and sControlTrap in xfSD, if
the destination link is 0.

Jump to xfSetTrapParm with GLOBAL in T, sUnboundTrap in xfSD, and 0 in
xfTrapParm1 if the new PC of an LFC (i.e., alpha,,beta) is 0.

Jump to xfFault with FSI in T and with qFrameFaultOs in prConditionQ on a
type 1 xfer frame allocation failure. External jump to xfFaultLoc from
Fault.mc on page, write protect, and MOB faults with the fault
parameter in T and fault queue in prConditionQ. Making xfSD negative
allows faults and traps to share code for backing up the PC and restoring
StkP.

XferTraps enter at xfTrap2 with the new context’s CODE-relative PC in PCB
and the new LOCAL and StkP already setup. An XferTrap returns to the new
context.

FromReschedule=1 in MemStat only on the xfer from Reschedule. For this
xfer, LOCAL must be restored, StkP remains correct during the xfer, and
the PC in the frame must not be clobbered. The code must be careful
because SStkP and PCX will have values for another process in this case,
and the global frame pointed at by the frame being entered (= backed up to
here) may not be in storage. Though the normal Reschedule xfer is through a
type 0 link to an existing frame (which can’t trap), it might (?) go through
a type 2 link and trap during process forking or whatever. Also, the global
frame overhead reference might fault, so the global frame must not be
referenced after a fault or trap begins. Since local frames may eventually
also be made non-resident, the local frame should not be touched either.

Nested traps and faults, i.e., those occurring when Trap=1 in MemStat, are
treated like those with FromReschedule=1. Namely, the backed up PC is
already saved in the frame, StkP is correct, and GLOBAL, CODE, CODEhi, etc.
are consistent, so no further references to the local and global frame of
the old context are required.

For the remaining xfers, those with FromReschedule=0 and Trap=0, all
registers smashed must be restored so that the information saved by
Reschedule is correct. Registers which may be clobbered are LOCAL, GLOBAL,
CODE, etc., PCB, and PCF. When FreeFrame=1 in MemStat, LOCAL is restored
from xfFrame+4 and the return 2*PCB+PCF is restored from xfOldPC.
Otherwise, LOCAL is restored from MNBR, and the return PC is in xfOldPC.

Because GLOBAL may be inconsistent with CODE, CODEhi, PCBhi, etc., if
a fault or trap happened during the Loadgc subroutine, Loadgc is called to
make these registers self-consistent for the old context before saving the
backed up PC.
%
xfDestParmTrap:
T ← xfTemp1;
xfTrapParm1 ← T, Task;
T ← xfMX;
xfSetTrapParm:
xfTrapParm0 ← T, GoTo[xfTrap];

*GoToExternal[xfFaultLoc] from Fault.mc with fault parameter in T and
*prConditionQ holding the queue pointer for fault notification when
*EarlyXfer=1 in MemStat; also here on frame faults.
**NOTE: must not smash prConditionQ = RTemp here.
xfFault:
*xfSD and prData are the same register
xfSD ← prData ← T, At[xfFaultLoc];
xfSD ← prData ← (xfSD) or (100000C);
xfTrap:
LU ← (MemStat) and (FreeFrame), Call[xfTrap0];
*When FreeFrame=0, LOCAL is restored from MNBR. Then the old GLOBAL, CODE,
*etc. are restored preparatory to saving the old PC in the frame except
*when FromReschedule=1 or Trap=1 in MemStat; in these latter cases, the PC
*in the frame and StkP are already correct so no references to the old
*context’s local or global frames are made.
LU ← (MemStat) and (Or[Trap!,FromReschedule!]C);
LOCAL ← T, GoTo[xfTrap2,ALU#0];*Restore LOCAL
T ← (LOCAL) - (2C), GoTo[xfTrap3];

*This is a subroutine only to task.
**Defeat xfTemp1=0 check in Loadgc.
xfTrap0:
xfTemp1 ← 377C, GoTo[xfTrap1,ALU#0];
T ← MNBR, Return;

*When FreeFrame=1, LOCAL is restored from xfFrame+4 and the unsaved
*return PC can be computed by restoring first GLOBAL, then CODE, and
*finally subtracting 2*CODE from the value 2*PCB+PCF saved in xfOldPC.
xfTrap1:
T ← (xfFrame) + (2C);
PFetch1[MDS,xfTemp], Task;
T ← (xfFrame) + (4C);
LOCAL ← T, Call[Loadgc];
T ← LSh[CODE,1];
xfOldPC ← (xfOldPC) - T, GoTo[xfBackPC];
xfTrap3:
PFetch1[MDS,xfTemp], Call[Loadgc];*xfTemp ← old GLOBAL
*FromReschedule=1 and Trap=1 are impossible here because of the branch to
*xfTrap2 earlier.
xfBackPC:
T ← (PCXreg) - 1;
*Backup PC by PCF - [(PCX-1) mod 10b]
T ← (LdF[xfOldPC,15,3]) - T, Skip[ALU>=0];
T ← 7C, GoTo[.-1];
xfOldPC ← (xfOldPC) - T, Skip[ALU>=0];
*Crossed quadword boundary in opcode.
xfOldPC ← (xfOldPC) - (10C);
*Restore StkP to beginning of opcode (needed for @SFC, @PO, and @POR)
T ← SStkP;
RTemp1 ← T;
StkP ← RTemp1;
xfTrap2:
*xfSD is < 0 here only on a fault.
T ← (xfSD) + (SDOffset), GoTo[.+3,R<0];
T ← (xfSD) + T;***Could bum this mi.
PFetch2[MDS,xfTemp];
*The source link for traps is always the current frame’s LOCAL.
LU ← (MemStat) and (Or[FromReschedule!,Trap!]C);
T ← (MNBR ← LOCAL) - 1, Skip[ALU#0];
PStore1[MDS,xfOldPC];*Save backed up PC
prData ← xfSD ← (prData) and not (100000C), GoTo[.+3,R<0];
*It is ok to clear the FromReschedule bit in MemStat at this point because
*the Trap bit will also prevent the PC from being saved in the frame.
MemStat ← Or[EarlyXfer!,xfTypeTRAP!]C;
MemStat ← (MemStat) or (Trap), GoTo[TrapXfer];
*Page, write protect, and frame faults are handled by Fault Notification
LoadPage[opPage0];
*Set flag so that when Reschedule calls SavPCinFrame, it won’t save PC.
MemStat ← FromReschedule, GoToP[prFault];*In MesaP

*Enter from xfGo1 if xfXTSReg was odd with new CODE-relative PC in xfTemp1.
*The 2nd word of the destination link is of interest only when the xfer
*has a type 1 link without indirection; in this case the original byte PC
*is still in xfTemp1.
XferTrap:
LU ← LdF[xfGFIWord,16,1];
T ← xfMX, Skip[ALU#0];
LU ← NextInst[IBuf], CallX[xfTailx];
xfTrapParm0 ← T, Task;
T ← xfTemp1;
xfTrapParm1 ← T, Call[xfTtoxfOldPC];
T ← LdF[MemStat,12,3];
xfTrapParm2 ← T;
MemStat ← Normal;
xfSD ← sXferTrap, GoTo[xfTrap2];

*Timing: 6.25 cycles
@LI0:
LU ← NextInst[IBuf], Opcode[300];*LIn for n = 0-10d
Stack&+1 ← 0C, NIRet;
@LI1:
LU ← NextInst[IBuf], Opcode[301];
Stack&+1 ← 1C, NIRet;
@LI2:
LU ← NextInst[IBuf], Opcode[302];
Stack&+1 ← 2C, NIRet;
@LI3:
LU ← NextInst[IBuf], Opcode[303];
Stack&+1 ← 3C, NIRet;
@LI4:
LU ← NextInst[IBuf], Opcode[304];
Stack&+1 ← 4C, NIRet;
@LI5:
LU ← NextInst[IBuf], Opcode[305];
Stack&+1 ← 5C, NIRet;
@LI6:
LU ← NextInst[IBuf], Opcode[306];
Stack&+1 ← 6C, NIRet;
@LI7:
LU ← NextInst[IBuf], Opcode[307];
Stack&+1 ← 7C, NIRet;
@LI8:
LU ← NextInst[IBuf], Opcode[310];
Stack&+1 ← 10C, NIRet;
@LI9:
LU ← NextInst[IBuf], Opcode[311];
Stack&+1 ← 11C, NIRet;
@LI10:
LU ← NextInst[IBuf], Opcode[312];
Stack&+1 ← 12C, NIRet;

*Load -1 (8.25 cycles).
*LU ← NextInst[IBuf]; Stack&+1 ← (Stack&+1) or not (0C), NIRet;
*is faster here, but illegal to read stack, which might be empty.
@LIN1:
T ← (Zero) - 1, GoTo[P7PushT], Opcode[313];

*Load Immediate Negative Infinity (6.25 cycles).
@LINI:
LU ← NextInst[IBuf], Opcode[314];
Stack&+1 ← 100000C, NIRet;

*Load Immediate Byte (10.5 cycles).
@LIB:
T ← NextData[IBuf], CallX[P7PushT], Opcode[315];

TGetsCycle:
T ← LHMask[Cycle&PCXF], Return;

*Load Immediate Word (16.75 cycles).
@LIW:
LU ← CycleControl ← CNextData[IBuf], Call[TGetsCycle], Opcode[316];
LADRBx:
T ← (NextData[IBuf]) + T, CallX[P7PushT];

*Load Immediate Negative Byte (12.5 cycles).
@LINB:
T ← 177400C, GoTo[LADRBx], Opcode[317];

*Load Immediate High Byte (12.5 cycles).
@LIHB:
LU ← CycleControl ← CNextData[IBuf], Call[TGetsCycle], Opcode[320];
P7PushT:
LU ← NextInst[IBuf];
Stack&+1 ← T, NIRet;

*Load Immediate Double Zero (8.25 cycles).
@LID0:
T ← Stack&+1 ← 0C, GoTo[P7PushT], Opcode[321];

*Local Address 0 to 3, 6, 8, Byte, and Word (8.25, 12.5, and 18.75 cycles).
@LA0:
T ← (LOCAL) + (0C), GoTo[P7PushT], Opcode[322];
@LA1:
T ← (LOCAL) + (1C), GoTo[P7PushT], Opcode[323];
@LA2:
T ← (LOCAL) + (2C), GoTo[P7PushT], Opcode[324];
@LA3:
T ← (LOCAL) + (3C), GoTo[P7PushT], Opcode[325];
@LA6:
T ← (LOCAL) + (6C), GoTo[P7PushT], Opcode[326];
@LA8:
T ← (LOCAL) + (10C), GoTo[P7PushT], Opcode[327];
@LAB:
T ← LOCAL, GoTo[LADRBx], Opcode[330];*Local Address Byte
@LAW:
T ← LOCAL, Opcode[331];*Local Address Word
GLWx:
LU ← CycleControl ← NextData[IBuf];
T ← (LHMask[Cycle&PCXF]) + T, GoTo[LADRBx];

*Global Address 0, 1, Byte, and Word (8.25, 8.25, 12.5, and 18.75 cycles).
@GA0:
T ← GLOBAL, GoTo[P7PushT], Opcode[332];*Global Address 0
@GA1:
T ← (GLOBAL) + 1, GoTo[P7PushT], Opcode[333];*Global Address 1
@GAB:
T ← GLOBAL, GoTo[LADRBx], Opcode[334];*Global Address Byte
@GAW:
T ← GLOBAL, GoTo[GLWx], Opcode[335];*Global Address Word

*Code Address Word (22.75 cycles).
@CAW:
LU ← CycleControl ← CNextData[IBuf], Call[TGetsCycle], Opcode[336];
T ← (NextData[IBuf]) + T;
T ← (CODE) + T;
Stack&+1 ← T;
*Carry=1 is impossible because code segments can’t cross 64k boundaries.
T ← RSh[CODEhi,10], GoTo[P7PushT];

*External Function Call 0 to 12, Byte
@EFC0:
T ← (RZero) or not (1C), GoTo[EFC], Opcode[337];*-2
T ← (RZero) or not (3C), GoTo[EFC], Opcode[340];*-4
T ← (RZero) or not (5C), GoTo[EFC], Opcode[341];*-6, etc.
T ← (RZero) or not (7C), GoTo[EFC], Opcode[342];
T ← (RZero) or not (11C), GoTo[EFC], Opcode[343];
T ← (RZero) or not (13C), GoTo[EFC], Opcode[344];
T ← (RZero) or not (15C), GoTo[EFC], Opcode[345];
T ← (RZero) or not (17C), GoTo[EFC], Opcode[346];
T ← (RZero) or not (21C), GoTo[EFC], Opcode[347];
T ← (RZero) or not (23C), GoTo[EFC], Opcode[350];
T ← (RZero) or not (25C), GoTo[EFC], Opcode[351];
T ← (RZero) or not (27C), GoTo[EFC], Opcode[352];
T ← (RZero) or not (31C), GoTo[EFC], Opcode[353];
@EFCB:
T ← (MNBR ← NextData[IBuf]) + 1, Opcode[354];
T ← (MNBR) + T;
T ← (RZero) or not T;*(2xalpha + 1)’ = -2xalpha - 2
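*Worked example (alpha = 3 chosen for illustration): T = 2x3 + 1 = 7, whose
*ones complement is -8d = -2x3 - 2; the frame link path below adds -4 more,
*giving -12d = -2x3 - 6.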
EFC:
xfGFIWord, LoadPage[xfPage1], Skip[R Even];
PFetch2[CODE,xfTemp], GoToP[SavePCXfer];*Code link
*0th global link is at GLOBAL-6 (skips four overhead words).
T ← (Form-4[AllOnes]) + T, GoToP[.+1];*Frame link (-2xalpha - 6)
OnPage[xfPage1];
PFetch2[GLOBAL,xfTemp], GoTo[SavePCXfer];

*Timing to xfLFC: (28x3/4) + 52/8 + 48/8 = 21 + 6.5 + 6 = 33.5 cycles
@LFC:
T ← (LSh[CODE,1]) - 1, Opcode[355];
T ← (PCFreg) - T, Task;
*xfOldPC ← 2 x (PCB-CODE) + PCFreg + 2 = CODE-relative return PC
T ← (LSh[PCB,1]) + T + 1;
xfOldPC ← T, Task;
T ← (MNBR ← LOCAL) - 1;
PStore1[MDS,xfOldPC];*Save return PC at LOCAL-1
*Delay tasking after PStore1 because an io task will hang in page wait.
*
xfMX ← 0C;*Destination link (not needed)
xfMY ← (Zero) + T + 1;*Source link
MemStat ← Or[EarlyXfer!,xfTypeLFC!]C;
LU ← CycleControl ← NextData[IBuf];
T ← LHMask[Cycle&PCXF];
*T ← (PCF ← NextData[IBuf]) + T may not always work.
T ← (NextData[IBuf]) + T;*T ← alpha,,beta
xfTemp1 ← T, LoadPage[xfPage1], Skip[ALU=0];
PCF ← xfTemp1, GoToP[xfLFC];
xfTrapParm1 ← T, GoToP[.+1];
OnPage[xfPage1];
T ← GLOBAL;
xfSD ← sUnboundTrap, GoTo[xfSetTrapParm];

*Stack Function Call
@SFC:
T ← Stack&-1, Opcode[356];
xfTemp1 ← T;
T ← Stack&-1, LoadPage[xfPage1];
xfTemp ← T, GoToP[SavePCXfer];

*Return. Fetch frame FSI into xfFSI and return link into xfMX.
@RET:
T ← (LOCAL) - (4C), Opcode[357];
PFetch2[MDS,xfFSI];*xfFSI ← FSI number for frame deallocation
*xfMX ← return link (can’t fault)
***I think the source link is don’t care, but what about zeroing ports?
*
xfMY ← 0C;
xfFrame ← T, LoadPage[xfPage1];
MemStat ← Or[FreeFrame!,EarlyXfer!,xfTypeRET!]C, GoToP[.+1];
OnPage[xfPage1];
T ← PCFreg, Call[xfSaveRETPC];
*Also jump here from Reschedule. Type 1 illegal; will be type 0 or 2.
RetGo:
LU ← Dispatch[xfMX,16,1];
T ← xfMX, Disp[xfT0];

*Kernel Function Call Byte
@KFCB:
T ← NextData[IBuf], Opcode[360];
xfSD ← T;
*Entry at kfcr after xfSD ← T ← SDindex from Fault.mc (sStackError),
*Cedar (sCatastrophe), and MesaP (sRescheduleError).
kfcr:
xfSD ← (xfSD) + (SDOffset), At[KFCRLoc];
T ← (xfSD) + T, LoadPage[xfPage1];
PFetch2[MDS,xfTemp], GoToP[SavePCXfer];
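*Worked example of the double index at kfcr (SDindex = 5 chosen for
*illustration): T = (5 + SDOffset) + 5 = SDOffset + 12b, so the PFetch2 above
*reads the word pair starting at SDOffset + 12b.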

*@ME (361) and @MX (362) are in MesaP.mc.

P7Tail:
LU ← NextInst[IBuf], At[P7TailLoc];
*MemStat←Normal is relied upon by the BLT opcodes. This works only because
*Fault.Mc ALWAYS continues the opcode without dispatching on MemStat when
*a NextInst page or write protect fault occurs.
P7Tailx:
MemStat ← Normal, NIRet;

P7Ret:
Return;

%BLock Transfer.
[S]/	short pointer to destination
[S-1]/	cardinal word count
[S-2]/	short pointer to source
Timing: 18.25 cycles for count = 0, else 15.25 cycles + 25 cycles/word.
*NOTE: the best single-word loop would be 2 cycles/word faster.
%
@BLT:
LP ← 0C, Opcode[363];
T ← MDShi;
*Also jump here from @BLTC with CODEhi in T.
BLTcom:
Stack&-1;*Point StkP at count
LU ← Stack&-1;*Test for count=0, point StkP at source.
LPhi ← T, Skip[ALU#0];*Complete source base register
Stack&-1, GoTo[P7Tail];*Word count = 0
T ← Stack, Call[.+1];*T ← source displacement
*Loop here with "Return"; additional tasking is undesirable within the
*loop because any io task to run would wait for the PStore1.
PFetch1[LP,RTemp];
Stack&+2;*Point at dest
T ← Stack&-1;*Get dest, point to count
PStore1[MDS,RTemp];
T ← (RZero) + T + 1;*T ← dest+1
Nop;
Nop;
*Begin updates no sooner than 4th mi after PStore1, after any faults will
*have happened.
Stack ← (Stack) - 1;*Decrement count
Stack&+1 ← T, Skip[ALU#0];*Increment destination
Stack&-3, GoTo[BLTdonex];
Stack&-2, Skip[IntPending];*Point at source
T ← Stack ← (Stack) + 1, Return;*Increment source; T ← new source
Stack ← (Stack) + 1, LoadPage[opPage0];
Stack&+2, GoToP[NopInt];

:IF[CacheLocals]; ********************************
*Jump here from BLTcom, @BLTL, @BLTCL, and from @DSK and BLTLR1 in MesaESC.Mc.
BLTdonex:
PFetch4[LOCAL,LocalCache0,0], GoTo[P7Tail];
:ELSE; *******************************************
BLTdonex:
LU ← NextInst[IBuf], CallX[P7Tailx];
:ENDIF; ******************************************


*StackLPDest0 is called by @BLTL and @BLTCL, StackLPDest2 by @BLECL in
*MesaESC.
StackLPDest0:
LPDestHi ← T;
LPDestHi ← (LSh[LPDestHi,10]) + T + 1;
T ← Stack&-2;*Skip over the count word.
StackLPDest2:
LPDest ← T, Return;
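%Illustration only, not microcode: a Python sketch of the value that
StackLPDest0 leaves in LPDestHi. It assumes that numeric constants are octal
(so LSh[...,10] shifts by 8 bits) and that T holds an 8-bit high part from
RHMask on entry. The function names are hypothetical, and the decode step
assumes the same layout that @LP relies on when it applies RSh[MDShi,10].

```python
def base_hi_word(hi_byte):
    """Model LPDestHi <- T; LPDestHi <- (LSh[LPDestHi,10]) + T + 1."""
    hi = hi_byte & 0xFF                    # RHMask keeps the right byte
    return ((hi << 8) + hi + 1) & 0xFFFF   # 16-bit mask is an assumption

def hi_byte_of(base_hi):
    """Recover the high part with an 8-bit right shift, as RSh[...,10] does."""
    return (base_hi >> 8) & 0xFF

word = base_hi_word(3)
```
%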

%Block Transfer Long.
[S]..[S-1]/
long pointer to destination
[S-2]/
word count
[S-3]..[S-4]/
long pointer to source
This implementation requires that REC never be used following BLTL because
values on the stack may have been modified by page faults or interrupts to
reflect intermediate transfer states.
Timing: 35.25 cycles + 23 cycles/word
%
@BLTL:
T ← RHMask[Stack&-1], Call[StackLPDest0], Opcode[364];
RTemp1 ← (RTemp1) or not (0C);
T ← RHMask[Stack&-1];
LPhi ← T, LoadPage[opPage1];
*Fixup: source+T, dest+T, count+1
MemStat ← BltLFixup, Call[StackLPy];
Stack&+3, Call[BLTLgo];*Point to count
*Loop here
PFetch1[LP,RTemp];
*Count is decremented immediately after the PFetch1 so that it will
*always have been decremented if the PFetch1 page faults.
*(Unless a PFetch4 by an io task experiences error correction immediately
*after the PFetch1 and the PFetch1 page faults).
Stack ← (Stack) - 1;*Decrement count
PStore1[LPdest,RTemp];
*This mi is needed both to make 3 mi after the PStore1 and to avoid the
*bypass kludge.
GoTo[BLTLint,IntPending];
BLTLgo:
LU ← Stack;
Skip[ALU#0];
Stack&-3, GoTo[BLTdonex];
*T doesn’t change until 4th mi after PStore1, so it will still have the old
*value if PStore1 page faults (???).
T ← RTemp1 ← (RTemp1) + 1, Return;

BLTLbump:
Stack ← (Stack) + T, At[BLTLFixLoc];*Abs. placement for Fault.mc
Stack&+1, FreezeResult;
Stack ← (Stack) + 1, UseCOutAsCIn, Return;
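%Illustration only, not microcode: BLTLbump read as a double-word add of T to
a long pointer held in two stack words, low word first. This Python sketch
assumes that "(Stack) + 1" with UseCOutAsCIn adds the carry out of the
low-word add in place of the constant 1. The function name is hypothetical.

```python
def bump_long_pointer(low, high, t):
    """Add t to the 32-bit pointer (high, low) held as two 16-bit words."""
    total = low + t                  # Stack <- (Stack) + T
    carry = total >> 16              # carry out, frozen by FreezeResult
    return total & 0xFFFF, (high + carry) & 0xFFFF

no_carry = bump_long_pointer(0x0010, 5, 4)
with_carry = bump_long_pointer(0xFFFE, 1, 3)
```
%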

BLTLint:
Nop;
Nop;
*No change to StkP or T until 4th mi after PStore1 in case of fault.
Stack&-2;
T ← (RTemp1) + 1, Call[BLTLbump];
Stack&+2, Call[BLTLbump];
*Local cache refill not needed here because none of the BLT opcodes jumping
*here depends upon any locals. Jump here from BLTLR1 in MesaESC.Mc.
BLTint:
LoadPage[opPage0];
MemStat ← Normal, GoToP[NopInt];

@BLTC:
T ← CODE, Opcode[365];*Block Transfer Code
LP ← T;
T ← CODEhi, GoTo[BLTcom];

%Block Transfer Code Long
[S]..[S-1]/
Long pointer to destination
[S-2]/
Word count
[S-3]/
Source offset from CODE (cardinal)
**Implement BLTCLR (reversed) here
Point LPDest base reg at destination and StkP at source offset.
%
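%Illustration only, not microcode: a Python sketch of the BLTCL loop as read
from the code below. Each pass moves the word at index count-1, so the copy
runs from the last word to the first and the count on the stack is the only
state that changes. The names bltcl and max_words are hypothetical, and
max_words merely stands in for a pending interrupt.

```python
def bltcl(code, dest, source_offset, count, max_words=None):
    """Copy count words from code[source_offset:] into dest, last word first.

    Returns the count left on the stack (0 when the transfer is complete).
    """
    moved = 0
    while count != 0:
        if max_words is not None and moved == max_words:
            break                                   # interrupt pending
        dest[count - 1] = code[source_offset + count - 1]
        count -= 1                                  # decrement count
        moved += 1
    return count

code = [10, 11, 12, 13, 14]
dest = [0, 0, 0]
left = bltcl(code, dest, 1, 3, max_words=1)   # one word, then interrupted
partial = list(dest)
done = bltcl(code, dest, 1, left)             # restart with remaining count
```
%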
@BLTCL:
T ← RHMask[Stack&-1], Call[StackLPDest0], Opcode[366];
T ← (Stack&+1) - 1;*Get source offset, point StkP at count
LU ← Stack;
*Loop here with StkP pointing at Count after LU ← Count.
BLTCL1:
T ← (Stack) + T, Skip[ALU#0];*T ← count + source offset - 1.
Stack&-2, GoTo[BLTdonex];
PFetch1[CODE,RTemp];
T ← (Stack&-1) - 1;
*Page or write protect fault aborts 4th mi after PStore1; do not update count
*until the 4th mi, so that no fixup action is needed in Fault.Mc.
PStore1[LPDest,RTemp], Call[P7Ret];***Nop better than Call here
T ← (Stack&+1) - 1, Call[P7Ret];
Stack ← (Stack) - 1, GoTo[BLTCL1,IntPending'];*Decrement count
Stack&+2, GoTo[BLTint];

*Lengthen Pointer.
*Timing: 13.25 cycles non-NIL, 10.25 cycles NIL.
@LP:
T ← Stack&-1, LoadPage[opPage2], Opcode[367];*Pop to interlock PFetch2’s
SkipP[ALU=0];*Test for NIL
OnPage[opPage2];
T ← RSh[MDShi,10];*push MDShi if non-NIL
LU ← NextInst[IBuf], CallX[SkipPushTx];

*ESC and ESCL (opcodes 370-371) are in MesaESC.Mc

OnPage[opPage3];
P7Undef:
LoadPage[opPage0];
RTemp ← sOpcodeTrap, GoToP[SDTrap];

:UNLESS[WithCedar]; ******************************
@OP372:
xfTrapParm0 ← 372C, GoTo[P7Undef], Opcode[372];
@OP373:
xfTrapParm0 ← 373C, GoTo[P7Undef], Opcode[373];
@OP374:
xfTrapParm0 ← 374C, GoTo[P7Undef], Opcode[374];
@OP375:
xfTrapParm0 ← 375C, GoTo[P7Undef], Opcode[375];
:ENDIF; ******************************************
@OP376:
xfTrapParm0 ← 376C, GoTo[P7Undef], Opcode[376];

%Cause pagefault trap--should not occur as a bytecode. Fault.Mc fills IBuf
with 377b bytes when it wants to continue the opcode which faulted but
trap at entry to the next opcode. This circumlocution is needed to
execute the microinstruction after the NextInst and still service the
fault.
***What happens if an interrupt intervenes here? This would be fatal.
%
@FaultFinish:
LoadPageExternal[FaultPage], Opcode[377];
GoToExternal[StartMemTrapLoc];

:END[MesaOP3];