DragonStackSave.mesa
Russ Atkinson, August 17, 1983 8:40 pm
DIRECTORY
HandCoding,
HandCodingPseudos;
DragonStackSave: CEDAR PROGRAM
IMPORTS HandCoding, HandCodingPseudos
= BEGIN OPEN HandCoding, HandCodingPseudos;
RegArray: TYPE = ARRAY Reg OF Word;
DumpFrame: TYPE = LONG POINTER TO DumpFrameRep;
DumpFrameRep: TYPE = RECORD [
link: DumpFrame, -- link to previous frame
lastPC: Word, -- continuation PC for this frame
nRegs: Word, -- number of regs used for this frame at last save
others: DumpFrame, -- if # NIL, then it is a pointer to an aux frame
regs: RegArray -- addressable by local regs
];
regOff: CARDINAL = 4; -- offset of DumpBlock.regs
LocalAllocationRep: TYPE = RECORD [
lockPtr: GlobalAllocationPtr, -- pointer to global lock for frame allocation
ptrs: LocalArray -- start of frame pointers
];
LocalArray: TYPE = ARRAY [0..16] OF DumpFrame;
type of holder for frames per processor, limit is quite arbitrary (need not be fixed)
NIL is stored at the very end to make test for empty easy
GlobalAllocationPtr: TYPE = LONG POINTER TO GlobalAllocationRep;
GlobalAllocationRep: TYPE = RECORD [
lockPtr: Word, -- pointer to global lock for frame allocation
gFree: Word, -- pointer to the next free frame in frames
emergency: Word, -- pointer to the emergency limit for frames (not yet used)
frames: GlobalArray -- space for the global pool of dump frames
];
GlobalArray: TYPE = ARRAY [0..1024] OF DumpFrame;
type of holder for frames, limit is quite arbitrary (need not be fixed)
NIL is stored at the very end to make test for empty easy
StackMargin: NAT ← 8;
Number of words in EU stack to reserve to handle stack overflow.
framesToTransfer: NAT ← 4;
Number of frames to transfer between local and global frame pools.
StackUnderflowTrap: PROC = {
This trap occurs when there are no more entries in the IFU stack and a return is executed. There may well be words on the EU stack that must be preserved (since they are being returned to the frame we want to restore).
drLI0[]; SetInterruptEnable[];
disable interrupts
RestoreStackFrame[];
restore the youngest frame in memory, continuation PC is on the stack
drLI1[]; SetInterruptEnable[];
enable interrupts
drSJ[];
transfer to the frame
};
StackOverflowTrap: PROC = {
Note: assume that there are sufficient words on the EU stack to complete the saving of one frame. The stack overflow trap is disabled on entry to this routine, and must be reenabled on the way out.
SaveStackFrame[];
save the stack frame
drRET[];
return to the faulting instruction in the most recent frame (we assume that this will re-enable interrupts if they were enabled at the start of the trap)
};
SaveStackFrame: PROC = {
This sequence saves one stack frame to memory. We assume that we will NOT be rescheduled, and that there will be no stack overflow during this routine.
Timing (assuming all frames from local array, no traps, bus collisions, etc.)
<= 16 regs => 38+n*2 cycles.
> 16 regs => 53+n*2 cycles + 1 mispredicted jump.
Note that stack overflow occurs with a few (governed by the stack overflow register in the IFU) extra words on the stack. In this "routine" we assume that we have at least two words left on the stack.
Allocate a fixed dump block to hold the first 16 registers. Then save the linkage, the current PC, and the number of registers in use to the dump block.
shortLabel: Label = GenLabel[];
jumpLabel: Label = GenLabel[];
AllocDumpFrame[];
Allocate a dump frame to use, it is left on the stack.
LReg[hook]; drPSB[0]; PReg[hook];
store the hook into it; set the hook to the new frame; leave hook on stack
GetEldestPC[]; drPSB[1];
Store the continuation PC
GetEldestNregs[]; PReg[nregs];
determine the # of regs to store; save that number on stack and in nregs
drPSB[2];
store that number into the save block
drLI0[]; drWSB[3];
clear the others field, remove the hook from the stack
drLIB[16];
drRJLEBJ[popLeft: TRUE, right: nregs, dist: UseLabel8[shortLabel]];
If there are no more than 16 locals, skip the extra code that dumps an extra block.
AllocDumpFrame[];
Allocate a dump frame to use, it is placed on the stack.
PReg[temp]; LReg[hook]; drWB[3];
(hook+3)^ ← temp, which sets the "others" field
drLIB[16]; drRSUB[c: nregs, a: nregs, b: popSrc];
nregs ← nregs - 16
GetEldestRL[]; LReg[nregs]; drADD[]; SetRL[];
set the current RL to the eldest RL + nregs - 16
FOR i: NAT DECREASING IN [0..15] DO
Generate 16 stores of locals to the save block
drWAI[reg1: [reg[i]], reg2: temp, disp: regOff+i];
ENDLOOP;
SetLabel[shortLabel];
at this point nregs <= 16
GetEldestRL[]; SetRL[];
set the current RL to the eldest RL so we can access the locals of the frame to be saved
drRSUB[c: pushDst, a: const0, b: nregs];
top of stack now has 0 - nregs
drRSUB[c: topDst, a: topSrc, b: nregs];
top of stack now has 0 - nregs*2
drRSUB[c: topDst, a: topSrc, b: nregs];
top of stack now has 0 - nregs*3
IndexedJump[jumpLabel];
jump into the save table
FOR i: NAT DECREASING IN [0..15] DO
Generate 16 stores of locals to the save block
drWAI[reg1: [reg[i]], reg2: hook, disp: regOff+i];
ENDLOOP;
SetLabel[jumpLabel];
FlushEldest[];
Remove the eldest IFU frame (the one we just put into memory)
};
RestoreStackFrame: PROC = {
This routine restores one stack frame (pointed to by hook). It returns the continuation PC of the restored frame on the stack. It assumes that the IFU stack is empty. We assume that we will NOT be rescheduled during this routine.
Timing (assuming all frames from local array, no traps, bus collisions, etc.)
<= 16 regs => 35+n cycles.
> 16 regs => 47+n cycles + 1 mispredicted jump.
shortLabel: Label = GenLabel[];
jumpLabel: Label = GenLabel[];
MoveRegI[nregs, hook, const2];
Put the # of registers in nregs
MoveRegI[temp, hook, const3]; LReg[temp];
Put the frame extension address in temp (and on the stack)
drRJEBJ[popLeft: TRUE, right: const0, dist: UseLabel8[shortLabel]];
Jump if the extension address is 0 (flush stack in either case)
GetRL[]; drADDNB[NegByte[16]]; SetRL[];
Set the current RL to the current RL - 16
FOR i: NAT DECREASING IN [0..15] DO
restore each word
drRAI[reg1: [reg[i]], reg2: temp, disp: regOff+i];
ENDLOOP;
FreeDumpFrame[];
drLIB[16]; drRADD[c: nregs, a: nregs, b: popSrc];
nregs ← nregs - 16
SetLabel[shortLabel];
This is point where we restore frames without extensions.
MoveReg[temp, hook]; MoveRegI[hook, hook, const0];
temp ← hook; hook ← hook^
GetRL[]; drRSUB[c: topDst, a: topSrc, b: nregs];
put the current RL on the stack and subtract the # of regs
drDUP[]; SetRL[];
dup the new RL and set it in the IFU
drADDNB[NegByte[StackMargin]]; GetSPLimit[];
set the stack overflow to be 8 less than the new RL
drRSUB[c: pushDst, a: const0, b: nregs];
top of stack now has 0 - nregs
drRSUB[c: topDst, a: topSrc, b: nregs];
top of stack now has 0 - nregs*2
drRSUB[c: topDst, a: topSrc, b: nregs];
top of stack now has 0 - nregs*3
IndexedJump[jumpLabel];
jump into the table
FOR i: NAT DECREASING IN [0..15] DO
restore each word
drRAI[reg1: [reg[i]], reg2: temp, disp: regOff+i];
ENDLOOP;
SetLabel[jumpLabel];
FreeDumpFrame[];
LRegI[temp, const1];
push the continuation PC
};
AllocDumpFrame: PROC = {
Allocate a dump frame using the local array of frames, defaulting to the global array of frames if not immediately successful. There is no provision for failure from the global array at this time.
When there are frames in the local frame array, the allocation time will be a optimal 3 cycles. If frames must be pulled from the global area, things will be substantially slower (roughly 40 cycles for 4 frames pulled from the global array).
exitLabel: Label = GenLabel[];
retryLabel: Label = GenLabelHere[];
LRegI[free, const0]; AddReg[free, const1];
Allocate a frame; Adjust the free pointer (free cycle due to the fetch)
drRJNEBJ[right: const0, dist: UseLabel8[exitLabel]];
Jump if we got the frame; leave 0 on stack if not.
drRFX[c: topDst, a: base, b: topSrc];
Replace top of stack (was 0) with the addr of the lock.
SubReg[free, const1];
Adjust for the bogus increment of free (free cycle due to the fetch)
drCST[c: topDst, a: const0, b: process];
Try for the global lock for the frame allocator.
drJCST[UseLabel8[retryLabel]];
If at first we do not succeed, try, try again.
LRegI[base, const0]; drRSB[1];
Get the free pointer on the stack (lock addr is also left on stack)
FOR i: NAT IN [0..framesToTransfer) DO
drRSB[i];
get the frame address
drRSD[c: free, b: free, a: popSrc];
(free ← free - 1)^ ← [S]; S ← S - 1
ENDLOOP;
drADDB[framesToTransfer];
adjust the global free frame pointer (gFree) for the number of frames allocated
drPSB[1];
Write gFree back, leaving the addr of the lock on the stack.
drLI0[]; drWSB[0];
Clear out the lock, which lets other processors get their chance.
drJB[UseLabel8[retryLabel]];
Go to retry the allocation from the start (stack is back to ground level).
SetLabel[exitLabel];
This is the successful exit point. The allocated frame is on the stack.
};
FreeDumpFrame: PROC [] = {
Free a dump frame (in temp) using the local array of frames, defaulting to the global array of frames if not immediately successful. There is no provision for failure from the global array at this time.
When there is room in the local array, the time to free a frame is 4 cycles (assuming no cache misses, of course). If frames must be put into the global area, things will be substantially slower (roughly 36 cycles for 4 frames put into the global array).
exitLabel: Label = GenLabel[];
retryLabel: Label = GenLabelHere[];
drRSUB[c: pushDst, a: free, b: const1];
Put free-1 on the stack
drRJNEBJ[popLeft: TRUE, right: base, dist: UseLabel8[exitLabel]];
Determine if free-1 will collide with base, jump if not.
LRegI[base, const0];
Put the addr of the lock on the stack
drCST[c: topDst, a: const0, b: process];
Try for the global lock for the frame allocator.
drJCST[UseLabel8[retryLabel]];
If at first we do not succeed, try, try again.
LRegI[base, const0]; drRSB[1];
get the global free pointer (gFree) on the stack (leaving the lock addr under it)
drADDNB[NegByte[framesToTransfer]];
gFree ← gFree - framesToTransfer; leave gFree on stack
FOR i: NAT IN [0..framesToTransfer) DO
LRegI[free, const0]; AddReg[free, const1]; drPSB[i];
(gFree+i)^ ← free^; free ← free + 1
ENDLOOP;
drPSB[1];
write gFree back, leaving the lock addr on the stack
drLI0[]; drWSB[0];
clear out the lock, which lets other processors get theirs
SetLabel[exitLabel];
This is the successful exit point. There is room between free and base.
drRSD[c: free, b: free, a: temp];
(free ← free - 1)^ ← temp
};
END.