DIRECTORY HandCoding, HandCodingPseudos; DragonStackSave: CEDAR PROGRAM IMPORTS HandCoding, HandCodingPseudos = BEGIN OPEN HandCoding, HandCodingPseudos; RegArray: TYPE = ARRAY Reg OF Word; DumpFrame: TYPE = LONG POINTER TO DumpFrameRep; DumpFrameRep: TYPE = RECORD [ link: DumpFrame, -- link to previous frame lastPC: Word, -- continuation PC for this frame nRegs: Word, -- number of regs used for this frame at last save others: DumpFrame, -- if # NIL, then it is a pointer to an aux frame regs: RegArray -- addressable by local regs ]; regOff: CARDINAL = 4; -- offset of DumpBlock.regs LocalAllocationRep: TYPE = RECORD [ lockPtr: GlobalAllocationPtr, -- pointer to global lock for frame allocation ptrs: LocalArray -- start of frame pointers ]; LocalArray: TYPE = ARRAY [0..16] OF DumpFrame; GlobalAllocationPtr: TYPE = LONG POINTER TO GlobalAllocationRep; GlobalAllocationRep: TYPE = RECORD [ lockPtr: Word, -- pointer to global lock for frame allocation gFree: Word, -- pointer to the next free frame in frames emergency: Word, -- pointer to the emergency limit for frames (not yet used) frames: GlobalArray -- space for the global pool of dump frames ]; GlobalArray: TYPE = ARRAY [0..1024] OF DumpFrame; StackMargin: NAT _ 8; framesToTransfer: NAT _ 4; StackUnderflowTrap: PROC = { drLI0[]; SetInterruptEnable[]; RestoreStackFrame[]; drLI1[]; SetInterruptEnable[]; drSJ[]; }; StackOverflowTrap: PROC = { SaveStackFrame[]; drRET[]; }; SaveStackFrame: PROC = { shortLabel: Label = GenLabel[]; jumpLabel: Label = GenLabel[]; AllocDumpFrame[]; LReg[hook]; drPSB[0]; PReg[hook]; GetEldestPC[]; drPSB[1]; GetEldestNregs[]; PReg[nregs]; drPSB[2]; drLI0[]; drWSB[3]; drLIB[16]; drRJLEBJ[popLeft: TRUE, right: nregs, dist: UseLabel8[shortLabel]]; AllocDumpFrame[]; PReg[temp]; LReg[hook]; drWB[3]; drLIB[16]; drRSUB[c: nregs, a: nregs, b: popSrc]; GetEldestRL[]; LReg[nregs]; drADD[]; SetRL[]; FOR i: NAT DECREASING IN [0..15] DO drWAI[reg1: [reg[i]], reg2: temp, disp: regOff+i]; ENDLOOP; SetLabel[shortLabel]; GetEldestRL[]; SetRL[]; drRSUB[c: pushDst, a: const0, b: nregs]; drRSUB[c: topDst, a: topSrc, b: nregs]; drRSUB[c: topDst, a: topSrc, b: nregs]; IndexedJump[jumpLabel]; FOR i: NAT DECREASING IN [0..15] DO drWAI[reg1: [reg[i]], reg2: hook, disp: regOff+i]; ENDLOOP; SetLabel[jumpLabel]; FlushEldest[]; }; RestoreStackFrame: PROC = { shortLabel: Label = GenLabel[]; jumpLabel: Label = GenLabel[]; MoveRegI[nregs, hook, const2]; MoveRegI[temp, hook, const3]; LReg[temp]; drRJEBJ[popLeft: TRUE, right: const0, dist: UseLabel8[shortLabel]]; GetRL[]; drADDNB[NegByte[16]]; SetRL[]; FOR i: NAT DECREASING IN [0..15] DO drRAI[reg1: [reg[i]], reg2: temp, disp: regOff+i]; ENDLOOP; FreeDumpFrame[]; drLIB[16]; drRADD[c: nregs, a: nregs, b: popSrc]; SetLabel[shortLabel]; MoveReg[temp, hook]; MoveRegI[hook, hook, const0]; GetRL[]; drRSUB[c: topDst, a: topSrc, b: nregs]; drDUP[]; SetRL[]; drADDNB[NegByte[StackMargin]]; GetSPLimit[]; drRSUB[c: pushDst, a: const0, b: nregs]; drRSUB[c: topDst, a: topSrc, b: nregs]; drRSUB[c: topDst, a: topSrc, b: nregs]; IndexedJump[jumpLabel]; FOR i: NAT DECREASING IN [0..15] DO drRAI[reg1: [reg[i]], reg2: temp, disp: regOff+i]; ENDLOOP; SetLabel[jumpLabel]; FreeDumpFrame[]; LRegI[temp, const1]; }; AllocDumpFrame: PROC = { exitLabel: Label = GenLabel[]; retryLabel: Label = GenLabelHere[]; LRegI[free, const0]; AddReg[free, const1]; drRJNEBJ[right: const0, dist: UseLabel8[exitLabel]]; drRFX[c: topDst, a: base, b: topSrc]; SubReg[free, const1]; drCST[c: topDst, a: const0, b: process]; drJCST[UseLabel8[retryLabel]]; LRegI[base, const0]; drRSB[1]; FOR i: NAT IN [0..framesToTransfer) DO drRSB[i]; drRSD[c: free, b: free, a: popSrc]; ENDLOOP; drADDB[framesToTransfer]; drPSB[1]; drLI0[]; drWSB[0]; drJB[UseLabel8[retryLabel]]; SetLabel[exitLabel]; }; FreeDumpFrame: PROC [] = { exitLabel: Label = GenLabel[]; retryLabel: Label = GenLabelHere[]; drRSUB[c: pushDst, a: free, b: const1]; drRJNEBJ[popLeft: TRUE, right: base, dist: UseLabel8[exitLabel]]; LRegI[base, const0]; drCST[c: topDst, a: const0, b: process]; drJCST[UseLabel8[retryLabel]]; LRegI[base, const0]; drRSB[1]; drADDNB[NegByte[framesToTransfer]]; FOR i: NAT IN [0..framesToTransfer) DO LRegI[free, const0]; AddReg[free, const1]; drPSB[i]; ENDLOOP; drPSB[1]; drLI0[]; drWSB[0]; SetLabel[exitLabel]; drRSD[c: free, b: free, a: temp]; }; END.  DragonStackSave.mesa Russ Atkinson, August 17, 1983 8:40 pm type of holder for frames per processor, limit is quite arbitrary (need not be fixed) NIL is stored at the very end to make test for empty easy type of holder for frames, limit is quite arbitrary (need not be fixed) NIL is stored at the very end to make test for empty easy Number of words in EU stack to reserve to handle stack overflow. Number of frames to transfer between local and global frame pools. This trap occurs when there are no more entries in the IFU stack and a return is executed. There may well be words on the EU stack that must be preserved (since they are being returned to the frame we want to restore). disable interrupts restore the youngest frame in memory, continuation PC is on the stack enable interrupts transfer to the frame Note: assume that there are sufficient words on the EU stack to complete the saving of one frame. The stack overflow trap is disabled on entry to this routine, and must be reenabled on the way out. save the stack frame return to the faulting instruction in the most recent frame (we assume that this will re-enable interrupts if they were enabled at the start of the trap) This sequence saves one stack frame to memory. We assume that we will NOT be rescheduled, and that there will be no stack overflow during this routine. Timing (assuming all frames from local array, no traps, bus collisions, etc.) <= 16 regs => 38+n*2 cycles. > 16 regs => 53+n*2 cycles + 1 mispredicted jump. Note that stack overflow occurs with a few (governed by the stack overflow register in the IFU) extra words on the stack. In this "routine" we assume that we have at least two words left on the stack. Allocate a fixed dump block to hold the first 16 registers. Then save the linkage, the current PC, and the number of registers in use to the dump block. Allocate a dump frame to use, it is left on the stack. store the hook into it; set the hook to the new frame; leave hook on stack Store the continuation PC determine the # of regs to store; save that number on stack and in nregs store that number into the save block clear the others field, remove the hook from the stack If there are no more than 16 locals, skip the extra code that dumps an extra block. Allocate a dump frame to use, it is placed on the stack. (hook+3)^ _ temp, which sets the "others" field nregs _ nregs - 16 set the current RL to the eldest RL + nregs - 16 Generate 16 stores of locals to the save block at this point nregs <= 16 set the current RL to the eldest RL so we can access the locals of the frame to be saved top of stack now has 0 - nregs top of stack now has 0 - nregs*2 top of stack now has 0 - nregs*3 jump into the save table Generate 16 stores of locals to the save block Remove the eldest IFU frame (the one we just put into memory) This routine restores one stack frame (pointed to by hook). It returns the continuation PC of the restored frame on the stack. It assumes that the IFU stack is empty. We assume that we will NOT be rescheduled during this routine. Timing (assuming all frames from local array, no traps, bus collisions, etc.) <= 16 regs => 35+n cycles. > 16 regs => 47+n cycles + 1 mispredicted jump. Put the # of registers in nregs Put the frame extension address in temp (and on the stack) Jump if the extension address is 0 (flush stack in either case) Set the current RL to the current RL - 16 restore each word nregs _ nregs - 16 This is point where we restore frames without extensions. temp _ hook; hook _ hook^ put the current RL on the stack and subtract the # of regs dup the new RL and set it in the IFU set the stack overflow to be 8 less than the new RL top of stack now has 0 - nregs top of stack now has 0 - nregs*2 top of stack now has 0 - nregs*3 jump into the table restore each word push the continuation PC Allocate a dump frame using the local array of frames, defaulting to the global array of frames if not immediately successful. There is no provision for failure from the global array at this time. When there are frames in the local frame array, the allocation time will be a optimal 3 cycles. If frames must be pulled from the global area, things will be substantially slower (roughly 40 cycles for 4 frames pulled from the global array). Allocate a frame; Adjust the free pointer (free cycle due to the fetch) Jump if we got the frame; leave 0 on stack if not. Replace top of stack (was 0) with the addr of the lock. Adjust for the bogus increment of free (free cycle due to the fetch) Try for the global lock for the frame allocator. If at first we do not succeed, try, try again. Get the free pointer on the stack (lock addr is also left on stack) get the frame address (free _ free - 1)^ _ [S]; S _ S - 1 adjust the global free frame pointer (gFree) for the number of frames allocated Write gFree back, leaving the addr of the lock on the stack. Clear out the lock, which lets other processors get their chance. Go to retry the allocation from the start (stack is back to ground level). This is the successful exit point. The allocated frame is on the stack. Free a dump frame (in temp) using the local array of frames, defaulting to the global array of frames if not immediately successful. There is no provision for failure from the global array at this time. When there is room in the local array, the time to free a frame is 4 cycles (assuming no cache misses, of course). If frames must be put into the global area, things will be substantially slower (roughly 36 cycles for 4 frames put into the global array). Put free-1 on the stack Determine if free-1 will collide with base, jump if not. Put the addr of the lock on the stack Try for the global lock for the frame allocator. If at first we do not succeed, try, try again. get the global free pointer (gFree) on the stack (leaving the lock addr under it) gFree _ gFree - framesToTransfer; leave gFree on stack (gFree+i)^ _ free^; free _ free + 1 write gFree back, leaving the lock addr on the stack clear out the lock, which lets other processors get theirs This is the successful exit point. There is room between free and base. (free _ free - 1)^ _ temp Κ φ˜šœ™Jšœ&™&—J˜šΟk ˜ Jšœ ˜ Jšœ˜J˜—šœ ˜Jšœ˜%Jšœœœ˜+J˜J–20 sp tabStopsšœ œœœ˜#J–20 sp tabStopsš œ œœœœ˜/–36 sp tabStopsšœœœ˜J–36 sp tabStopsšœΟc˜*J–36 sp tabStopsšœž!˜/J–36 sp tabStopsšœ ž2˜?J–36 sp tabStopsšœž1˜DJ–36 sp tabStopsšœž˜+J–36 sp tabStopsšœ˜J–36 sp tabStopsšœœž˜1—J˜–36 sp tabStopsšœœœ˜#J–36 sp tabStopsšœž.˜LJ–36 sp tabStopsšœž˜+J–36 sp tabStopsšœ˜–24 sp tabStopsšœ œœ œ ˜.J–24 sp tabStopsšœU™UJ–24 sp tabStops™9——J–24 sp tabStops˜J–24 sp tabStopsš œœœœœ˜@–24 sp tabStopsšœœœ˜$J–24 sp tabStopsšœž.˜=J–24 sp tabStopsšœ ž+˜8J–24 sp tabStopsšœž;˜LJ–24 sp tabStopsšœž+˜?J–24 sp tabStopsšœ˜–24 sp tabStopsšœ œœ œ ˜1J–24 sp tabStopsšœG™GJ–24 sp tabStops™9——J˜šœ œ˜Jšœ@™@—šœœ˜JšœB™B—J–24 sp tabStops™J˜šΟnœœ˜JšœΫ™Ϋšœ˜Jšœ™—šœ˜JšœE™E—šœ˜Jšœ™—šœ˜Jšœ™—J˜J˜—šŸœœ˜JšœΖ™Ζšœ˜Jšœ™—šœ˜J™™—J˜J˜—šŸœœ˜Jšœ˜™˜šœM™MJšœ™Jšœ1™1—JšœΙ™ΙJ˜Jšœ™™™J˜Jšœ˜Jšœ˜J˜˜Jšœ6™6—šœ!˜!JšœJ™J—šœ˜Jšœ™—šœ˜JšœH™H—šœ ˜ Jšœ%™%—šœ˜J™6—J™Jšœ ˜ šœœ-˜CJšœS™SJ˜˜Jšœ8™8—šœ ˜ Jšœ/™/—šœ1˜1Jšœ™—šœ-˜-Jšœ0™0—š œœ œœ ˜#Jšœ.™.Jšœ2˜2Jšœ˜—J™šœ˜Jšœ™—J˜—šœ˜JšœX™X—šœ(˜(Jšœ™—šœ'˜'Jšœ ™ —šœ'˜'Jšœ ™ —šœ˜Jšœ™—š œœ œœ ˜#Jšœ.™.Jšœ2˜2Jšœ˜—Jšœ˜šœ˜Jšœ=™=—J˜J˜—šŸœœ˜Jšœθ™θšœM™MJšœ™Jšœ/™/—Jšœ˜Jšœ˜J˜šœ˜Jšœ™—šœ)˜)Jšœ:™:—šœœ.˜CJ™?J˜šœ'˜'Jšœ)™)—š œœ œœ ˜#Jšœ™Jšœ2˜2Jšœ˜—J˜šœ1˜1Jšœ™—J˜šœ˜Jšœ9™9—J˜—šœ2˜2Jšœ™—šœ0˜0Jšœ:™:—šœ˜J™$—šœ,˜,J™3—J˜šœ(˜(Jšœ™—šœ'˜'Jšœ ™ —šœ'˜'Jšœ ™ —šœ˜Jšœ™—š œœ œœ ˜#Jšœ™Jšœ2˜2Jšœ˜—Jšœ˜J˜šœ˜Jšœ™—J˜J˜—šŸœœ˜Jšœ€ΟbE™ΕJšœσ™σJ˜Jšœ˜Jšœ#˜#J˜Jšœ*˜*JšœG™GJ˜šœ4˜4Jšœ2™2J˜˜%Jšœ7™7—šœ˜JšœD™D—šœ(˜(Jšœ0™0—šœ˜Jšœ.™.—J˜˜JšœC™C—šœœœ˜&˜ Jšœ™—šœ#˜#Jšœ#™#—Jšœ˜—šœ˜JšœO™O—˜ Jšœ<™<—˜JšœA™A—šœ˜JšœJ™J——J˜˜JšœH™HJ˜—J˜J˜—šŸ œœ˜Jšœ† E™ΛJšœ€™€J˜Jšœ˜Jšœ#˜#J˜šœ'˜'Jšœ™—šœœ+˜AJšœ8™8J˜˜Jšœ%™%—šœ(˜(Jšœ0™0—šœ˜Jšœ.™.—J˜˜JšœQ™Q—šœ#˜#Jšœ6™6—šœœœ˜&šœ4˜4Jšœ#™#—Jšœ˜—˜ Jšœ4™4—˜Jšœ:™:—J˜˜JšœH™H—J˜—šœ!˜!Jšœ™—J˜J˜—Jšœ˜—J˜—…—34