*----------------------------------------------------------- Title[BitBlt.mc...November 7, 1982 4:06 PM...Taft]; * Bit-boundary block-transfer (PrincOps, chapter 8) *----------------------------------------------------------- * Refer to the PrincOps for primary documentation. * All numbers are octal except timings, which are decimal. * Data structures: * BitBltArg: TYPE = MACHINE DEPENDENT RECORD [ MSC[BB.dstLo, 0]; * dst: BitAddress, MSC[BB.dstHi, 1]; MSC[BB.dstBit, 2]; MSC[BB.dstBpl, 3]; * dstBpl: INTEGER, MSC[BB.srcLo, 4]; * src: BitAddress, MSC[BB.srcHi, 5]; MSC[BB.srcBit, 6]; MSC[BB.srcBpl, 7]; * srcBpl: INTEGER, MSC[BB.width, 10]; * width: CARDINAL, MSC[BB.height, 11]; * height: CARDINAL, MSC[BB.flags, 12]; * flags: BitBltFlags ]; * BitAddress: TYPE = MACHINE DEPENDENT RECORD [ * word: LONG POINTER, * reserved: [0..7777B], * bit: [0..17B] ]; * BitBltFlags: TYPE = MACHINE DEPENDENT RECORD [ MC[BBF.direction, 100000]; * direction: {forward, backward}, MC[BBF.disjoint, 40000]; * disjoint: BOOLEAN, MC[BBF.disjointItems, 20000]; * disjointItems: BOOLEAN, MC[BBF.gray, 10000]; * gray: BOOLEAN, MC[BBF.srcFunc, 4000]; * srcFunc: {normal, complement}, MC[BBF.dstFunc, 3000]; * dstFunc: {null, and, or, xor}, * reserved: [0..777B] ← 0 ]; * -- re-interpretation of srcBpl if gray = TRUE * GrayParm: TYPE = MACHINE DEPENDENT RECORD [ * reserved: [0..17B] ← 0, * yOffset: [0..17B], * widthMinusOne: [0..17B] ← 0, -- present spec requires 0 * heightMinusOne: [0..17B] ]; % The 20 BitBlt Functions (combinations of gray, srcFunc, and dstFunc) are divided into 4 classes: A dest ← f(gray) B dest ← f(gray, dest) C dest ← f(source) D dest ← f(source, dest) The distribution of functions into classes is: Function (class) gray srcFunc dstFunc 0 (C) FALSE normal null 1 (D) FALSE normal and 2 (D) FALSE normal or 3 (D) FALSE normal xor 4 (C) FALSE complement null 5 (D) FALSE complement and 6 (D) FALSE complement or 7 (D) FALSE complement xor 10 (A) TRUE normal null 11 (B) TRUE normal and 12 (B) TRUE normal or 13 (B) TRUE normal xor 14 (A) TRUE complement null 15 (B) TRUE complement and 16 (B) TRUE complement or 17 (B) TRUE complement xor Terminology: when referring to words in a scan line, "left" and "right" refer to the words with lower and higher addresses, respectively, independent of the direction of processing the scan line; "first" and "last" refer to the first and last words encountered in the direction of processing. Since the Dorado can do BitBlt in either direction at equal cost, this implementation always uses the direction specified by the programmer. The disjoint bit is ignored, and disjointItems is used only to determine whether or not "touching" of the data is required in some cases (see below). The destination item is considered to consist of a left partial word, some number of full (body) words, and a right partial word. If the item begins or ends on a word boundary, the left or right word is still considered to be a partial word. The destination bits preserved in the left and right partial words are determined by the SHC register's LMask and RMask, respectively, where LMask = dst.bit, RMask = 17-((dst.bit+width-1) mod 20). The destination width in words, including the first and last partial words, is computed by DWidth = (width + dst.bit + 17)/20. Similarly, the source width in words, including the first and last partial words, is computed by SWidth = (width + src.bit + 17)/20. DWidth and SWidth differ by at most +/-1. If DWidth > 2 then the destination consists of a left partial word, DWidth-2 full body words, and a right partial word. If DWidth = 2 then the item consists of a left partial word immediately followed by a right partial word. If DWidth = 1 then the entire destination item lies within a single word, not crossing a word boundary. Effectively, the left and right partial words are one and the same. This case (called the "thin" case) requires special handling, as both LMask and RMask must be applied simultaneously. Processing of the scan line is controlled by two counters: the Cnt register (loaded from ICnt), which counts the inner loop (20 words or fewer), and MCount, which counts the outer loop. Roughly speaking, Cnt is loaded initially with DWidth mod 20 and MCount with DWidth/20. The inner loop is executed Cnt times; then, until MCount is exhausted, Cnt is reloaded with 20 and the inner loop is executed 20 more times. The reason for this arrangement is to permit a PreFetch to be executed every 20 words. More precisely: Before each horizontal loop, the Cnt register is loaded with if DWidth < 23 then DWidth-1 else (DWidth-3) mod 20 +2, so: Cnt > 1 if the body contains one or more words. Cnt = 1 if there are no body words. Cnt = 0 in the thin case. Cnt is decremented twice before the main loop is reached, so upon entry to the main loop it contains DWidth-3 (assuming DWidth<23), which is precisely the correct value for going around the loop DWidth-2 times. (The extra 2 in DWidth are the first and last partial words, which are handled outside the word loop.) MCount is loaded with (DWidth-3)/20 -1, so: MCount < 0 if DWidth < 23 MCount = 0 if 23 <= DWidth < 42, etc. Each time the inner loop terminates, if MCount<0 the scan line is finished, but if MCount>=0, MCount is decremented, Cnt is loaded with 17, and the main loop is reentered for another 20 iterations. When working left-to-right, the source may be thought of as T,,SrcWd -- that is, with T on the left and SrcWd on the right. In any given operation, T contains the previous source word and SrcWd contains the current one, and the shift operation consists of left-shifting T,,SrcWd by 0 to 17 bits (to align it with the destination) and storing the leftmost 20 bits of the result. Since the shifter is actually a cycler and produces the rightmost 20 bits (rather than the leftmost) of the cycled result, the shifter must be set up to left-cycle an additional 20 bits (by exchanging SHA and SHB). Hence SHA=R and SHB=T. When working right-to-left, the source may be thought of as SrcWd,,T, where T contains the previous source word and SrcWd the current, as before. This is exactly symmetric with the left-to-right case and requires only that SHA and SHB be exchanged. Hence SHA=T and SHB=R. Regardless of the direction, if there are at least as many data bits in the first word of the source item as in the first word of the destination item, then all the destination bits come from SrcWd during the first shift and T need not be loaded at all. On the other hand, if there are fewer bits in the first word of the source item, the destination bits come from both SrcWd and T. In this case, T must contain the first source word and SrcWd the second. These two situations must be distinguished, since one requires fetching only a single word whereas the other requires fetching two words before operating the shifter for the first time. A similar situation arises at the end of an item; that is, all the bits in the last partial destination item may come from the leftover source word in T, or it may be necessary to fetch an additional word into SrcWd. These situations must be distinguished also. An earlier version of this microcode omitted the test and simply fetched the "extra" word always; unfortunately, this sometimes resulted in touching a word outside the bit map, which caused problems in a paged environment when the bit map happened to be page-aligned. Two bits in BBCtrl control the fetching of the "extra" word in the above two cases. In the normal case in which both srcBpl and dstBpl are multiples of 16, the ShC register and all horizontal loop control information can be computed just once for all scan lines. If either srcBpl or dstBpl is not a multiple of 16, this doesn't work, and all this stuff has to be recomputed for each scan line. I consider this case to be anomalous -- the PrincOps specification is overly general, or perhaps was motivated by clever hacks such as being able to BitBlt parallelograms. Therefore, this implementation takes the easy way out in this case by doing just one scan line, saving state as if an interrupt had occurred, and restarting BitBlt from the beginning. This is slow as blazes, but the user gets what he deserves....... All the horizontal loops work in both directions by use of a trick: ALUF[15] is redefined to be A+1 if moving left-to-right but A-1 if right-to-left. This ALU function is invoked by the operation "A+/-1", and is used to advance source and destination pointers along the scan line. A consequence of this is that ALUF[15] (normally A AND NOT B) cannot be used by I/O tasks. BBDst and BBSrc are 16-bit displacements relative to base registers BBDstBR and BBSrcBR. If moving top-to-bottom they start near zero and count up; if bottom-to-top they start at ~2↑15 and count down. If more than ~2↑15 words of bit map are processed, one of these displacements becomes negative. This is detected and causes BitBlt to restart, recomputing the base registers and displacements. BR displacement overflow is handled this way because it takes too much code to recompute the BRs and displacements in mid-stream. The cost of restarting BitBlt from scratch in this case is unimportant, since such a large amount of data is involved. (Indeed, in normal programs an interrupt will occur before 2↑15 words are processed, so BitBlt will be restarted anyway.) BitBlt is normally invoked with the stack containing a single item: TOS: POINTER TO BitBltArg; However, when resuming from an interrupt or fault, the stack looks like this: TOS: (POINTER TO BitBltArg)+1; TOS+1: CARDINAL; -- number of scan lines already done Note that the PrincOps requires BitBltArg to be 16-word aligned, so the two cases can be distinguished by looking at the low-order bit of the pointer. (Note that we distinguish the cases differently from the manner specified in the PrincOps, which examines the stack depth to see whether intermediate state is present or absent. The latter is messy, especially in face of the fact that a page fault restores StkP to its initial value.) BitBlt exits in one of three ways: 1. Normal completion: BitBlt renders the stack empty and exits. 2. Interrupt request pending: BitBlt leaves at TOS+1 the number of scan lines already processed and branches to the interrupt handler. BitBlt is subsequently restarted with the value left in TOS+1. 3. Page fault: BitBlt is careful to ensure restartability in the face of page faults. That is, any time a page fault can occur, TOS+1 contains the number of scan lines already done, and restarting BitBlt with that argument will cause the correct thing to happen. Local stack usage: TOS: BitBltArg TOS+1: number of scan lines completed TOS+2: saved ALUFM[17] TOS+3: saved ALUFM[15] This is, of course, relative to the initial StkP value, which is the one that will be restored by RestoreStkP if a page fault occurs. During the main loops, StkP usually points to TOS+1. Also, remember that traps and faults save and restore two words above TOS. Approximate timing for initialization and cleanup (excluding main loops): 65 cycles +10 if direction=backward +29 if resuming from interrupt + one of the following: gray: 13 cycles minimum +6 if resuming from interrupt +37 if resuming and (heightMinusOne+1) is not a power of 2 ~gray: 21 cycles minimum +29 if resuming from interrupt +2 if direction=backward Totals: gray: 78 cycles minimum, 160 maximum ~gray: 86 cycles minimum, 156 maximum Note: these overheads are incurred on every scan line if dst.bit or src.bit is not a multiple of 16. See comments above main loops for main loop times. % *----------------------------------------------------------- * R-register assignments: *----------------------------------------------------------- SetRMRegion[BBRegs]; * The RMRegion itself is defined in RegisterDefs.mc RVN[BBDst]; * Address of next word to process RVN[BBSrc]; RVN[DstInc]; * Address increment between scan lines -- RVN[SrcInc]; * negative if direction = backward RVN[DRast]; * Raster length (bits, later words) -- RVN[SRast]; * negative if direction = backward RVN[PrefDst]; * Address of next munch to PreFetch RVN[PrefSrc]; RVN[DPrefOffset]; * Offset of leftmost word of next scan line relative RVN[SPrefOffset]; * to first word of current scan line RVN[VCount]; * Vertical line count RVN[MCount]; * Horizontal munch count RVN[ICnt]; * Initial value of Cnt register for word loops RVN[BBDisp]; * Control flags and horizontal loop dispatch * B0=1 => direction = backward * B8:15 = dispatch value relative to HorizontalDisp * (see below for details) RVN[BBCtrl]; * Control flags: * B0=1 => 2 source words required for first dest word * B15=1 => 2 source words required for last dest word RVN[SrcWd]; * Leftover source word -- must be RVREL 17 * Alias used when the source is gray: RME[GrayBump, SRast]; * Size of gray brick in words -1 * Aliases used during initialization: RME[Width, DPrefOffset]; * Width of item in bits RME[DWidth, ICnt]; * Width of destination item in words RME[SWidth, PrefSrc]; * Width of source item in words RME[Skew, DstInc]; * Destination-source skew, mod 20 RME[BBMasks, PrefDst]; * LMask and RMask values to be loaded into SHC RME[BBFlags, SrcInc]; * BitBlt flags word RME[DstX, BBDst]; * Destination starting X in bits, later in words RME[SrcX, BBSrc]; * Source starting X in bits, later in words RME[IBBSrc, SPrefOffset]; * Initial BBSrc (gray only) RME[BBTemp, SrcWd]; * Must be RVRel 17 because it is an arg to MulSub. *----------------------------------------------------------- * Other definitions *----------------------------------------------------------- * Base-register assignments % -- Actually defined in ADefs.mc. First two must be an even-odd pair. BR[BBDstBR, ?]; * BitBlt destination base BR[BBSrcBR, ?]; * BitBlt source base BR[LPtr, ?]; * Scratch (emulator use only) % * ALU functions defined by BitBlt. * The ALUF Ram is loaded by BitBlt with the desired operations. XALUOP[,BBOp,,17,E]; * A BBOp B -- logical operation invoked with shifter XAOP[,+/-1,15,E]; * A +/-1 -- A+1 or A-1 depending on horizontal * direction. This value of ALUF is normally * A AND NOT B and is restored by BitBlt when done. * This means, however, that A AND NOT B cannot be used * by other tasks. * Layout of BBDisp register: * B0 = 0 if direction = forward, 1 if backward * B8-15: BigBDispatch value for setup and body dispatches. * The following addressing constraints apply: * (1) B13 = 1 and B15 = 1 if a real source bit map is used (0 if gray block). * (2) B14 = 0 if the item(s) must be touched before beginning a transfer. * (3) B11 = 1 if the destination is an operand. * (4) Certain targets are tied together by Call constraints. * (5) Must not stomp on too many JCN conditional branch targets. * These bits are rather carefully selected to permit the same BBDisp to be * used in three different dispatches. Set[SrcFlg, 5]; Set[NoTouchFlg, 2]; Set[DstFlg, 20]; * BBDispX defines the page used for dispatches on BBDisp. Set[BBDispX, 1200]; M[BBAt, At[BBDispX, Add[#1, #2]]]; * Page-relative entry points for setup routines. * All must have B11 = 1, to neutralize body dispatch (DstFlag). Set[GrayDstSetup&TouchLoc, 70]; Set[GrayDstSetupLoc, 72]; Set[SrcDstSetup&TouchLoc, 75]; Set[SrcDstSetupLoc, 77]; * Must be SrcDstSetup&TouchLoc + 2 Set[HorizontalDispLoc, 70]; * Target of setup dispatch * Page-relative entry points for body routines. * Each pair must differ only in the value of DstFlag, and the addresses * must neutralize any one bits that may have been present for setup dispatch. Set[GrayBodyLoc, 12]; Set[GrayDstBodyLoc, 32]; Set[SrcBodyLoc, 17]; * These must be xxxx1111 because they contain Set[SrcDstBodyLoc, 37]; * conditional Call instructions *----------------------------------------------------------- NewOp; ESCEntry[BITBLT]; *----------------------------------------------------------- T← MDSHi, MemBase← LPtr, Call[BRHiGetsT]; * See whether this is an initial call or a resumption, and set low bit * of BitBltArg pointer to indicate that further restarts are resumptions. T← Stack&+1← (Stack&+1) OR (1C), Branch[.+2, R odd]; Stack← A0; * Initial call, zero # of lines done T← T-1, StkP+1, RBase← RBase[BBRegs]; BRLo← T; * Set up BR to point to BitBltArg Fetch← BB.flags; BBFlags← MD, Fetch← BB.height; * Dispatch on gray, srcFunc, dstFunc T← LDF[BBFlags, 4, 11]; BigBDispatch← T; Branch[BBFunctionTable]; *----------------------------------------------------------- BBFunctionTable: DispTable[20]; * gray, srcFunc, dstFunc, ALU op T← 1C, Branch[BBFC]; * 0, 0, 0 NOT A T← 21C, Branch[BBFD]; * 0, 0, 1 NOT A AND B T← 5C, Branch[BBFD]; * 0, 0, 2 NOT A OR B T← 15C, Branch[BBFD&T]; * 0, 0, 3 A EQV B T← 37C, Branch[BBFC]; * 0, 1, 0 A T← 35C, Branch[BBFD]; * 0, 1, 1 A AND B T← 27C, Branch[BBFD]; * 0, 1, 2 A OR B T← 23C, Branch[BBFD&T]; * 0, 1, 3 A XOR B T← 1C, Branch[BBFA]; * 1, 0, 0 NOT A T← 21C, Branch[BBFB]; * 1, 0, 1 NOT A AND B T← 5C, Branch[BBFB]; * 1, 0, 2 NOT A OR B T← 15C, Branch[BBFB&T]; * 1, 0, 3 A EQV B T← 37C, Branch[BBFA]; * 1, 1, 0 A T← 35C, Branch[BBFB]; * 1, 1, 1 A AND B T← 27C, Branch[BBFB]; * 1, 1, 2 A OR B T← 23C, Branch[BBFB&T]; * 1, 1, 3 A XOR B *----------------------------------------------------------- * Select the appropriate dispatch word for the function. BBFA: BBDisp← Add[NoTouchFlg]C, Branch[SetBBF]; BBFB: BBDisp← Add[DstFlg, NoTouchFlg]C, Branch[SetBBF]; BBFB&T: BBDisp← Add[DstFlg]C, Branch[SetBBF]; BBFC: BBDisp← Add[SrcFlg, NoTouchFlg]C, Branch[SetBBF]; BBFD: BBDisp← Add[SrcFlg, DstFlg, NoTouchFlg]C, Branch[SetBBF]; BBFD&T: BBDisp← Add[SrcFlg, DstFlg]C, Branch[SetBBF]; SetBBF: Stack&-1← ALUFMRW← T, ALUF[17], * Set ALU function and save old T← MD; * T← height in scan lines * VCount← (scan lines left to do)-1. Stack has scan lines already done. VCount← T-(Stack)-1; * height - (scan lines done) -1 *----------------------------------------------------------- * Set up destination base address *----------------------------------------------------------- * Registers in use: (permanent) BBDisp, VCount; (temporary) BBFlags. T← (Fetch← BB.dstBpl)-1, Branch[BitBltDone1, ALU<0]; Fetch← T, DRast← MD; * DRast← bits per line PD← (DRast) AND (17C); * See if integral words per line T← Stack, Branch[.+2, ALU=0]; T← Stack, Reschedule; * No, process only one scan line Q← T, Branch[DstMult, ALU#0]; * Branch if resuming from interrupt T← (Fetch← BB.dstLo)+1, DstX← MD; * DstX← dst.bit T← MD, Fetch← T; * T← low base word address BBTemp← MD, Branch[DstSetupBR]; * BBTemp← high base word address * Following garbage is only executed when resuming BitBlt from an interrupt * or fault. We must compute the base address for the next scan line. * n.b. our life is made very difficult by the presence of the BitAddress * structure and the specification of bit map width in bits rather than words, * to say nothing of features such as bits/line being negative if direction = * backwards. I assume these were intended to help the Dolphin implementation, * but they are definitely not helpful! Alto BitBlt both had a cleaner * interface and was easier to implement. DstMult: DstX← MD; * DstX← dst.bit T← DRast, Call[MulSub]; * T,,Q ← Q*T, clobbers BBTemp * Now have 32-bit bit offset in T,,Q. Add it to bit map base address. Fetch← BB.dstLo, DRast, Branch[.+2, R>=0], DispTable[1, 2, 2]; * Squash final Multiply dispatch T← T-(Stack); * DRast negative, adjust high result * to make it 32-bit 2's complement DstX← (DstX)+Q; * Add low bit offset to dst.bit T← BBTemp← A← T, XorSavedCarry; * BBTemp← high bit offset T← RCY[T, DstX, 4]; * Convert to word offset T← T+MD; * Add to low base word address DstX← (DstX) AND (17C); * Mask new BitAddress.bit Fetch← BB.dstHi; BBTemp← RSH[BBTemp, 4]; * BBTemp← high word offset BBTemp← (BBTemp)+MD, XorSavedCarry; * Add to high base word address * Together again here. Now BBTemp,,T = adjusted base value. DstSetupBR: MemBase← BBDstBR, DRast, Branch[DstReallySetupBR, R>=0]; * Processing bottom-to-top. * Decrease base by 2↑15, and (later) increase initial displacement by 2↑15. T← T-(100000C); BBTemp← (BBTemp)-1, XorSavedCarry; BBDisp← (BBDisp) OR (100000C); * Set flag to remember direction * Now finally ready to load the base register! DstReallySetupBR: BRLo← T; BRHi← BBTemp; *----------------------------------------------------------- * Set up source base address *----------------------------------------------------------- * Registers in use: (permanent) BBDisp, VCount, DRast; * (temporary) BBFlags, DstX. MemBase← LPtr; T← (Fetch← BB.srcBpl)-1; Fetch← T, SRast← MD, BBDisp, * SRast← bits per line Branch[SrcIsGray, R even]; * Branch if source bit map is gray PD← (SRast) AND (17C); * See if integral words per line T← Stack, Branch[.+2, ALU=0]; T← Stack, Reschedule; * No, process only one scan line Q← T, Branch[SrcMult, ALU#0]; * Branch if resuming from interrupt SrcX← MD, T← (Fetch← BB.srcLo)+1; * SrcX← src.bit T← MD, Fetch← T; * T← low base word address BBTemp← MD, Branch[SrcSetupBR]; * BBTemp← high base word address * Following garbage is only executed when resuming BitBlt from an interrupt * or fault. We must compute the base address for the next scan line. SrcMult: SrcX← MD; * SrcX← src.bit T← SRast, Call[MulSub]; * T,,Q ← Q*T, clobbers BBTemp * Now have 32-bit bit offset in T,,Q. Add it to bit map base address. Fetch← BB.srcLo, SRast, Branch[.+2, R>=0], DispTable[1, 2, 2]; * Squash final Multiply dispatch T← T-(Stack); * SRast negative, adjust high result * to make it 32-bit 2's complement SrcX← (SrcX)+Q; * Add low bit offset to src.bit T← BBTemp← A← T, XorSavedCarry; * BBTemp← high bit offset T← RCY[T, SrcX, 4]; * Convert to word offset T← T+MD; * Add to low base word address SrcX← (SrcX) AND (17C); * Mask new BitAddress.bit Fetch← BB.srcHi; BBTemp← RSH[BBTemp, 4]; * BBTemp← high word offset BBTemp← (BBTemp)+MD, XorSavedCarry; * Add to high base word address * Together again here. Now BBTemp,,T = adjusted base value. SrcSetupBR: MemBase← BBSrcBR, SRast, Branch[.+3, R>=0]; * Processing bottom-to-top. * Decrease base by 2↑15, and (later) increase initial displacement by 2↑15. T← T-(100000C); BBTemp← (BBTemp)-1, XorSavedCarry; * Now finally ready to load the base register! PD← (BBFlags) AND (Or[BBF.disjoint!, BBF.disjointItems!]C); BRLo← T, Branch[.+2, ALU#0]; BBDisp← (BBDisp) AND (Not[NoTouchFlg]C); * Not disjoint, must touch BRHi← BBTemp, Branch[BBSetupX]; *----------------------------------------------------------- * Set up gray source block *----------------------------------------------------------- * Registers in use: (permanent) BBDisp, VCount, DRast; * (temporary) DstX. * Re-interpret srcBpl as a GrayParm record. * Note: assume direction = forward always (required by PrincOps). * BBSrcBR will end up pointing one beyond the end of the gray brick MOD 2↑16, * and BBSrc will address it using the negative indexing trick. SrcIsGray: SrcX← MD, T← (Fetch← BB.srcLo)+1; * SrcX← src.bit BBTemp← MD, Fetch← T; * BBTemp← low base address T← LDF[SRast, 4, 10]; * Extract yOffset GrayBump← (SRast) AND (17C); * GrayBump← heightMinusOne -- * GrayBump and SRast are same register T← (GrayBump)-T; * T← heightMinusOne-yOffset BBTemp← (BBTemp)+T+1, MemBase← BBSrcBR; * End of brick +1 BBTemp← MD, BRLo← BBTemp; * Load up the base register BBTemp← (BBTemp)-1, XorSavedCarry; * high base-1 for negative indexing BRHi← BBTemp; PD← Stack; * Resuming from interrupt? T← (GrayBump)-T, * Recover yOffset Branch[NoGrayDiv, ALU=0]; * Branch if no division required * Following garbage is only executed when resuming BitBlt from an interrupt * or fault. We must compute the grayIndex for the next scan line. * In general this is (yOffset + scanLinesDone) MOD height. I expect height * usually to be a power of two, so optimize this case and only do the * full-blown MOD operation when necessary. T← (Stack)+T; * scanLinesDone + yOffset T← (GrayBump)+1, Q← T; * T← height, Q← scanLinesDone+yOffset BBTemp← T; PD← (GrayBump) AND T; * = 0 iff height is power of 2 T← (GrayBump) AND Q, Branch[.+2, ALU=0]; * DivSub performs [T: remainder, Q: quotient] ← (T,,Q)/BBTemp * and clobbers BBCtrl -- beware! T← A0, Call[DivSub]; Nop; * Together again here. Now T = appropriate initial y coordinate. NoGrayDiv: IBBSrc← T-(GrayBump)-1; * Init negative index for references *----------------------------------------------------------- * X-coordinate setup: widths, margins, skew, masks, etc. *----------------------------------------------------------- * Registers in use: (permanent) BBDisp, VCount, DRast, SRast (= GrayBump); * (temporary) DstX, SrcX (= BBSrc). BBSetupX: MemBase← LPtr; Fetch← BB.width; Width← MD, T← A0; * Compute LMask and RMask. * LMask = dst.bit. * RMask = 17-((dst.bit+width-1) mod 20) = (-dst.bit-width) mod 20. * For reference, SHC fields are: * B2: SHA=T, B3: SHB=T, B4-7: count, B8-11: RMask, B12-15: LMask T← T-(Width); * T← -width T← T-(DstX), * T← -dst.bit-width Branch[BitBltDone2, ALU>=0]; * Stop here if width <= 0 T← DPF[T, 4, 4], StkP+2; * B8-11←((-dst.bit-width)mod 20) lsh 4 BBMasks← T OR (DstX); * B12-15← dst.bit * Compute destination width in words, including first and last partial words. * DWidth← (width + dst.bit +17) / 20. T← (Width)+(17C); T← T+(Q← DstX); DWidth← RSH[T, 4]; BBCtrl← (BBDisp)-(BBDisp)-1, * BBCtrl[0] ← 1 in case gray source Branch[BBNoSetupSrcX, R even]; * Branch if gray * Compute source width in words, including first and last partial words. * SWidth← (width + src.bit +17) / 20. T← (SrcX)+(17C); T← (Width)+T; SWidth← T← RSH[T, 4]; * Set flags to control fetching of "extra" first and last words. * Except in the "thin" case (DWidth=1), the setup/finish routines for the * horizontal loops nominally fetch 1 word and store 2 words; an extra fetch * may be required at the beginning, the end, or both, depending on the number * of words in the source and destination items (see introductory comments). * DWidth and SWidth differ by at most +/-1. * If SWidth = DWidth+1, an extra source word must be fetched at both ends. * If SWidth = DWidth-1 or SWidth = 1, no extra source words need be fetched. * If SWidth = DWidth # 1, an extra source word must be fetched at one end: * if src.bit mod 20 > dst.bit mod 20 then left else right. * Set BBCtrl[0]← extraLeft, BBCtrl[1:15]← extraRight. * (these flags are exchanged later if working right-to-left.) * T still has SWidth; Q still has dst.bit. * The following 2 instructions set BBCtrl← -1 if SWidth>DWidth, 0 otherwise. PD← (DWidth)-T; BBCtrl← T-T-1, XorSavedCarry, Branch[SetupSkew, ALU#0]; * Set BBCtrl← 100000 if src.bit > dst.bit, 77777 otherwise. PD← (SrcX)-Q-1; * Q=DstX; carry iff src.bit > dst.bit BBCtrl← 100000C; BBCtrl← (BBCtrl)-1, XorSavedCarry, Branch[SetupSkew]; BBNoSetupSrcX: Nop; * X-coordinate setup (cont'd) * Compute skew = (src.bit-dst.bit) mod 20, and determine horizontal direction. * Still have Q = dst.bit. SetupSkew: T← (SrcX)-Q; PD← BBDisp, Branch[.+2, R even]; Skew← T AND (17C), DblBranch[SetupRtoL, SetupLtoR, ALU<0]; BBSrc← IBBSrc; * Gray, init source ptr Skew← T AND (17C); T← 200C, Branch[NoSrcThinChk]; * ALUFM control for "A+1" * direction = forward: work from left to right. * Set SHA=R, SHB=T, ALUF[15]="A+1". * Note: if skew = 0, set SHA=R, SHB=R. SetupLtoR: T← 200C, Branch[NoSrcThinChk, ALU=0]; * ALUFM control for "A+1" Skew← (Skew) OR (20C), Branch[SetupALU&ShC]; * SHB=T * direction = backward: work from right to left. * Set SHA=T, SHB=R, ALUF[15]="A-1". * Advance starting X coordinates to rightmost ends of items. * Note: if skew = 0, set SHA=R, SHB=R, and do not exchange extra-word flags. SetupRtoL: BBDisp← (BBDisp) OR (100000C), Branch[.+3, ALU=0]; Skew← (Skew) OR (40C); * SHA=T BBCtrl← (BBCtrl) LCY 1; * Exchange source extra-word flags T← (Width)-1; DstX← (DstX)+T; * Advance to rightmost X-coordinates SrcX← (SrcX)+T; * Advance non-gray source T← 36C; * ALUFM control for "A-1" * Have ALUFM control in T for "A+/-1" operation. SetupALU&ShC: PD← (SWidth)-1; * Thin source check (see below) NoSrcThinChk: Stack&-2← ALUFMRW← T, A+/-1, * Set ALU function, save old value. Branch[SetupShC, ALU#0]; * Leave StkP -> TOS (scan line count) * If we would have fetched an extra source word, but there is only one source * word to fetch, then reset source extra-word flags and set SHA=R, SHB=R. BBCtrl← A0, Branch[.+2, R>=0]; Skew← (Skew) AND (17C); T← LSH[BBMasks, 10], Branch[.+2]; * Placement * Merge skew with masks and load SHC. SetupShC: T← LSH[BBMasks, 10]; * Shift masks to B0-7 T← LCY[T, Skew, 10]; * Concatenate SHA, SHB, count, masks T← DWidth, ShC← T; *----------------------------------------------------------- * Compute word addresses and increments *----------------------------------------------------------- * Registers in use: (permanent) BBDisp, VCount, DRast, SRast, Width, BBCtrl; * (temporary) DstX, SrcX, DWidth, SWidth. * Note: at this point DRast and SRast are still in units of bits. * Convert DstX to X word displacement relative to leftmost word of * first scan line. Note: BBDst is the same register as DstX. BBDst← RSH[DstX, 4]; * BBDst← DstX/20 * Compute destination inter-scan-line word address increment. * DstInc← DRast + (if direction=forward then -DWidth else DWidth). * Also compute first PreFetch offset (from BBDst). * = leftmost word on next scan line, even if processing right-to-left. * DPrefOffset← if direction=forward then DRast else DRast-DWidth. * Finally, if direction=backward then add 2↑15 to the word address, * which will be counted down toward the base register which was initialized * to the leftmost word of the first scan line -2↑15. * Note: T = DWidth here. DRast← RSH[DRast, 4], * Convert DRast to words Branch[DstIncRtoL, R<0]; * Branch if direction=backward DstIncLtoR: DstInc← (DRast)-T; * L to R: DRast-DWidth DPrefOffset← T← DRast, Branch[DstIncDone]; DstIncRtoL: DRast← (DRast) OR (170000C); * Extend sign DstInc← (DRast)+T; * R to L: DRast+DWidth DPrefOffset← T← (DRast)-T; BBDst← (BBDst)+(100000C); DstIncDone: PrefDst← (BBDst)+T; * Initial PreFetch address * Is a source item required? If not, skip source setup. BBDisp, Branch[BBNoSource, R even]; * Convert SrcX to X word displacement relative to leftmost word of * first scan line. Note: BBSrc is the same register as SrcX. BBSrc← RSH[SrcX, 4]; * BBSrc← SrcX/20 * Compute source inter-scan-line word address increment. * SrcInc← SRast + (if direction=forward then -SWidth else SWidth). * Also compute first PreFetch offset (from BBSrc). * = leftmost word on next scan line, even if processing right-to-left. * SPrefOffset← if direction=forward then SRast else SRast-SWidth. * Finally, if direction=backward then add 2↑15 to the word address, * which will be counted down toward the base register which was initialized * to the leftmost word of the first scan line -2↑15. T← SWidth; SRast← RSH[SRast, 4], * Convert SRast to words Branch[SrcIncRtoL, R<0]; * Branch if direction=backward SrcIncLtoR: SrcInc← (SRast)-T; * L to R: SRast-SWidth SPrefOffset← T← SRast, Branch[SrcIncDone]; SrcIncRtoL: SRast← (SRast) OR (170000C); * Extend sign SrcInc← (SRast)+T; * R to L: SRast+SWidth SPrefOffset← T← (SRast)-T; BBSrc← (BBSrc)+(100000C); SrcIncDone: PrefSrc← (BBSrc)+T; * Initial PreFetch address *----------------------------------------------------------- * Final adjustments prior to entering vertical loop *----------------------------------------------------------- * Compute ICnt, the initial value of the Cnt register for each loop. * ICnt← if DWidth <= 22B then DWidth-1 else NOT (DWidth-3) [= 2-DWidth]. BBNoSource: PD← (DWidth)-(23C); ICnt← T← (DWidth)-1, Branch[.+2, ALU<0]; ICnt← (1S)-T; * = (2S)-(DWidth) * Never need to "touch" data in the "thin" case. * ALU=0 here iff the "thin" case applies (DWidth=1). Stack← (Stack)-1, Branch[.+2, ALU#0]; BBDisp← (BBDisp) OR (Add[NoTouchFlg]C); * Enter vertical loop with (scan lines done)-1 on TOS and ALU>=0. MCount← T-T-1, MemBase← BBSrcBR; PD← A0, Branch[BBVerticalLoop]; *----------------------------------------------------------- * BitBlt vertical loop (per-scan-line) * At top of loop, the following invariants hold: * VCount = (number of scan lines remaining)-1 * Stack = (number of scan lines done)-1 * MemBase = BBSrcBR * ALU<0 iff destination BR displacement has overflowed * Vertical loop overhead: * 6 cycles for loop control and destination pointer update * +4 cycles for source pointer update if source item is used * +4 cycles if item is greater than 20B words wide *----------------------------------------------------------- BBVerticalLoop: VCount← (VCount)-1, T← MD, * Force faults from previous item Branch[BBDoneOrOverflow, ALU<0, R<0]; * If positive, ICnt has the desired initial value of Cnt (DWidth-1). T← NOT (Cnt← ICnt), Branch[SmallBlock, R>=0]; * Item greater than one munch wide. Set up separate munch and word counts. * T now has DWidth-3, where DWidth is the number of words in the destination * item, including first and last partial words. * MCount ← (DWidth-3)/20 -1, Cnt ← (DWidth-3) mod 20 +2. MCount← RSH[T, 4]; T← T AND (17C); T← T+(2C); MCount← (MCount)-1, Cnt← T; * This dispatch goes to one of: GrayDstSetup, GrayDstSetup&Touch, * SrcDstSetup, or SrcDstSetup&Touch. SmallBlock: BigBDispatch← BBDisp; Stack← (Stack)+1, * Advance scan lines done BranchExternal[Add[BBDispX, HorizontalDispLoc]]; *----------------------------------------------------------- AdvanceSrcDst: * Control returns here at the end of individual horizontal loops that * involve both source and destination items. * BBSrc, BBDst point one beyond last word processed. *----------------------------------------------------------- T← SrcInc; BBSrc← T← (BBSrc)+T; PrefSrc← T+(SPrefOffset); T← BBDst, Branch[SrcBROverflow, ALU<0]; *----------------------------------------------------------- AdvanceDst: * Control returns here at the end of individual horizontal loops that * involve only a destination item (and gray, which is handled separately). * BBDst and T point one beyond last word processed. *----------------------------------------------------------- BBDst← T← T+(DstInc); PrefDst← T← T+(DPrefOffset), DblBranch[BBReschedPending, BBVerticalLoop, Reschedule]; SrcBROverflow: Branch[BBDoneOrOverflow]; *----------------------------------------------------------- * Either VCount is exhausted or one of the BR displacements overflowed * (or conceivably both events occurred at the same time). * If VCount is exhausted then return normally; otherwise restart BitBlt. * Note: scan line count at TOS is one behind. *----------------------------------------------------------- BBDoneOrOverflow: Branch[BBReschedPending]; * Handle same as Reschedule request *----------------------------------------------------------- * Reschedule pending. Branch to the reschedule trap handler. * After handling the interrupt, it will restart the BitBlt with the state * left on the stack. *----------------------------------------------------------- BBReschedPending: T← MD, VCount, Branch[.+2, R>=0]; * We just processed the last scan line, so exit normally. StkP+2, Branch[BitBltDone]; * Not done yet: either BR displacement overflowed or Reschedule is pending. * Restore clobbered registers, check for interrupts, and then restart the instruction. * Note: scan line count at TOS is one behind, so fix it. Stack&-1← (Stack&-1)+1; Call[RestoreStdRegs]; Branch[.+2, Reschedule]; * Start instruction over if no interrupt pending Branch[@BITBLT]; * Reschedule request is pending. Give interrupt a chance to take. If it doesn't, * restart the instruction from the current point. Call[InterruptLongRunningOpcode]; Branch[@BITBLT]; *----------------------------------------------------------- * Really done. Restore clobbered ALUFM, flush stack, and exit. *----------------------------------------------------------- * Get here during initialization when only ALUFM[17] has been clobbered. BitBltDone1: Nop; BitBltDone2: StkP+1, Branch[.+2]; * Get here at end when both ALUFM[17] and ALUFM[15] have been clobbered. BitBltDone: ALUFMRW← Stack&-1, ALUF[15]; ALUFMRW← Stack&-2, ALUF[17]; * Must clear the low-order bit of BitBltArg so that if the caller does * a REC to recover it, he will see the right thing. Stack&-1← (Stack&-1)-1, RBase← RBase[MesaRegsMain]; ShC← StdShC, NextOpcode; *----------------------------------------------------------- * Horizontal loops *----------------------------------------------------------- % Organization of horizontal (per-word) loops: There are a number of variations, depending on the following: 1. Whether the source is normal or gray; 2. Whether or not the destination is an operand; 3. Whether or not the destination is "thin" (one word per scan line); 4. Whether or not data need be "touched" before doing the transfer. There are fewer than 16 total cases because some of these combinations cannot occur (for example, the thin case never requires touching). Each case has a "setup" routine, a "body" routine, and a "finish" routine. Many of these routines are shared among cases, and the flow of control is determined by a complicated network of dispatches on BBDisp. Finish routines exit to the vertical loop by a branch to AdvanceDst or AdvanceSrcDst. It is never necessary to "touch" if any of the following is true: a. The destination is "thin"; b. The destination is not an operand of the BitBlt function; c. The function can be performed multiple times with the same effect as performing it only once (i.e., anything except XOR, assuming the source and destination items don't overlap). The "touch" and "no touch" cases are distinguished as part of the dispatch. To reduce miss wait and increase performance, while processing each scan line we fall out of the main loop once per munch and PreFetch one munch for the next scan line. This strategy depends on the assumption that each scan line is less than 100 munches long (for a 100-row cache, which is what the Dorado has at present). Note that PreFetches are done left-to-right even if transfers are done right-to-left. Note that the current implementation is imperfect in that the last munch of the next scan line may not be prefetched, or an extra munch prefetched unnecessarily, because the main loop does not terminate at munch boundaries but rather at multiples of 20 words from the end of the scan line. Also, the first scan line is not prefetched, and a line past the end of the last scan line is prefetched unnecessarily. These defects should not affect performance noticeably in normal use. % TopLevel; KnowRBase[BBRegs]; %*----------------------------------------------------------- Case A: dest ← f(gray) BBOp is taken from the following table: Function (class) gray srcFunc dstFunc ALU operation 10 (A) TRUE normal null NOT A 14 (A) TRUE complement null A Entry point to this code is at GrayDestSetup. It dispatches here after setup. Timing, per scan line, including vertical loop overhead: 19 cycles minimum in the normal case +1 cycle per full word (excluding first and last partial words) +4 + (2 * # of munches) cycles if item is wider than 20B words 12 cycles total in the "thin" case %*----------------------------------------------------------- * Entry point for srcFunc = normal. * T has first partial word to be stored, and SrcWd has gray word. GrayBody: BBAt[GrayBodyLoc], BBDst← (Store← BBDst)+/-1, DBuf← T; T← ShiftNoMask[SrcWd], Branch[GrayEnd, Cnt=0&-1]; * Just replicate the shifted gray word throughout the destination body. GrayLoop: BBDst← (Store← BBDst)+/-1, DBuf← T, Branch[GrayLoop, Cnt#0&-1]; GrayEnd: MCount← (MCount)-1, Cnt← 17S, Branch[GrayLast, R<0]; PrefDst← (PreFetch← PrefDst), Carry20, Branch[GrayLoop]; GrayLast: Fetch← BBDst, Branch[DstFinish]; * Store last partial word. This is the tail of cases A and B. * Have already fetched the last word (at BBDst). DstFinish: BBDisp, Branch[.+2, R<0]; T← XShMDRMask[SrcWd], B← MD, Branch[StoreLastDst]; T← XShMDLMask[SrcWd], B← MD, Branch[StoreLastDst]; * Thin destination slice, for cases A and B. DstThin: T← XShMDBothMasks[SrcWd], B← MD; StoreLastDst: BBDst← T← (Store← BBDst)+/-1, DBuf← T, FlipMemBase, Branch[AdvanceDst]; %*----------------------------------------------------------- Case B: dest ← f(gray, dest) BBOp is taken from the following table: Function (class) gray srcFunc dstFunc ALU operation 11 (B) TRUE normal and NOT A AND B 12 (B) TRUE normal or NOT A OR B 13 (B) TRUE normal xor A EQV B 15 (B) TRUE complement and A AND B 16 (B) TRUE complement or A OR B 17 (B) TRUE complement xor A XOR B The gray word is kept in SrcWd and put directly into the shifter. Case A also dispatches to one of case B's entry points, GrayDstSetup. The setup code dispatches to the correct body routine. Timing, per scan line, including vertical loop overhead: 18 cycles minimum in the normal case +3 cycles per full word (excluding first and last partial words) +6 + (2 * # of pages) cycles if data needs to be "touched" +4 + (2 * # of munches) cycles if item is wider than 20B words 13 cycles total in the "thin" case %*----------------------------------------------------------- GrayDstSetup&Touch: BBAt[GrayDstSetup&TouchLoc], T← DRast, FlipMemBase, Call[TouchDst]; FlipMemBase; GrayDstSetup: BBAt[GrayDstSetupLoc], T← BBSrc← (Fetch← BBSrc)+1, FlipMemBase; * Fetch gray word SrcWd← MD, Fetch← BBDst, * SrcWd← gray word Branch[.+2, ALU<0]; * Exhausted gray block? BBSrc← T-(GrayBump)-1; * Yes, reset for next scan line PrefDst← (PreFetch← PrefDst), Carry20, Branch[DstThin, Cnt=0&-1]; * The following dispatch may modify GrayBody to GrayDstBody. BigBDispatch← BBDisp, Branch[.+2, R<0]; T← XShMDLMask[SrcWd], B← MD, Branch[GrayBody]; T← XShMDRMask[SrcWd], B← MD, Branch[GrayBody]; * Body routine for case B only. GrayDstBody: BBAt[GrayDstBodyLoc], BBDst← T← (Store← BBDst)+/-1, DBuf← T; PrefSrc← (Fetch← T)+/-1, Branch[GrayDstEnd, Cnt=0&-1]; * Inner loop runs with one word fetched ahead (now in MD). * PrefSrc is used as a temporary -- runs one word ahead of BBDst. GrayDstLoop: PrefSrc← (Fetch← PrefSrc)+/-1, T← MD; T← XShiftNoMask[SrcWd], B← T; BBDst← (Store← BBDst)+/-1, DBuf← T, Branch[GrayDstLoop, Cnt#0&-1]; GrayDstEnd: MCount← (MCount)-1, Cnt← 17S, Branch[DstFinish, R<0]; PrefDst← (PreFetch← PrefDst), Carry20, Branch[GrayDstLoop]; %*----------------------------------------------------------- Case C: dest ← f(source) BBOp is taken from the following table: Function (class) gray srcFunc dstFunc ALU operation 0 (C) FALSE normal null NOT A 4 (C) FALSE complement null A Entry points to this code are at SrcDstSetup and SrcDstSetup&Touch. They dispatch here after setup. Timing, per scan line, including vertical loop overhead: 25 cycles minimum in the normal case +4 cycles per full word (excluding first and last partial words) +12 + (4 * # of pages) cycles if data needs to be "touched" +1 cycle if 2 source words are required for the first destination word +3 cycles if gray is required +4 + (5 * # of munches) cycles if item is wider than 20B words 17 cycles minimum in the "thin" case +1 cycle if 2 source words are required %*----------------------------------------------------------- * Body routine for case C only. SrcBody: BBAt[SrcBodyLoc], T← BBDst← (Store← BBDst)+/-1, DBuf← T, FlipMemBase, **** Program around MicroD problem. Desired branch clause is: **** Call[SrcEnd, Cnt=0&-1]; **** but we must actually write: BRGO@[0] RETCL@[2] JCN[43]; BBSrc← (Fetch← BBSrc)+/-1, FlipMemBase, BBAt[SrcBodyLoc, 1]; **** MicroD problem SrcWd← MD, T← SrcWd; T← XShiftNoMask[SrcWd], Branch[SrcBody]; * This is called as a subroutine at the end of each munch. * It either exits the horizontal loop or returns to do another munch. SrcEnd: Subroutine; MCount← (MCount)-1, Cnt← 17S, DblBranch[SrcDstFinish, SrcDstMore, R<0], BBAt[SrcBodyLoc, 2]; **** MicroD problem SrcDstMore: PrefSrc← (PreFetch← PrefSrc), Carry20; FlipMemBase; PrefDst← (PreFetch← PrefDst), Carry20; FlipMemBase, Return; TopLevel; * Store last partial word, for cases C and D. SrcDstFinish: BBCtrl, Branch[.+2, R even]; * Fetch extra word at end? BBSrc← (Fetch← BBSrc)+/-1, FlipMemBase, Branch[.+2]; FlipMemBase; SrcWd← MD, T← SrcWd, Fetch← T; BBDisp, Branch[.+2, R<0]; T← XShMDRMask[SrcWd], B← MD, Branch[StoreLastSrcDst]; T← XShMDLMask[SrcWd], B← MD, Branch[StoreLastSrcDst]; * Thin destination slice, for cases C and D. SrcDstThin: T← XShMDBothMasks[SrcWd], B← MD; StoreLastSrcDst: BBDst← (Store← BBDst)+/-1, DBuf← T, FlipMemBase, Branch[AdvanceSrcDst]; %*----------------------------------------------------------- Case D: dest ← f(source, dest) BBOp is taken from the following table: Function (class) gray srcFunc dstFunc ALU operation 1 (D) FALSE normal and NOT A AND B 2 (D) FALSE normal or NOT A OR B 3 (D) FALSE normal xor A EQV B 5 (D) FALSE complement and A AND B 6 (D) FALSE complement or A OR B 7 (D) FALSE complement xor A XOR B Case C also dispatches to these entry points. The setup code dispatches to the correct body routines. Timing, per scan line, including vertical loop overhead: 25 cycles minimum in the normal case + 5 cycles per full word (excluding first and last partial words) +12 + (4 * # of pages) cycles if data needs to be "touched" + 1 cycle if 2 source words required for the first destination word + 3 cycles if gray is required + 4 + (5 * # of munches) cycles if item is wider than 20B words 17 cycles minimum in the "thin" case + 1 cycle if 2 source words are required %*----------------------------------------------------------- SrcDstSetup&Touch: BBAt[SrcDstSetup&TouchLoc], T← DRast, FlipMemBase, Call[TouchDst]; T← SRast, FlipMemBase, Call[TouchSrc]; SrcDstSetup: BBAt[SrcDstSetupLoc], BBCtrl, DblBranch[SrcDstSetup2, SrcDstSetup1, R<0]; SrcDstSetup2: BBSrc← (Fetch← BBSrc)+/-1; SrcDstSetup1: PrefSrc← (PreFetch← PrefSrc), Carry20; T← MD, BBSrc← (Fetch← BBSrc)+/-1, FlipMemBase; SrcWd← MD, Fetch← BBDst; PrefDst← (PreFetch← PrefDst), Carry20, Branch[SrcDstThin, Cnt=0&-1]; * The following dispatch may modify SrcBody to SrcDstBody. BigBDispatch← BBDisp, Branch[.+2, R<0]; T← XShMDLMask[SrcWd], B← MD, Branch[SrcBody]; T← XShMDRMask[SrcWd], B← MD, Branch[SrcBody]; * Body routine for case D only. SrcDstBody: BBAt[SrcDstBodyLoc], T← BBDst← (Store← BBDst)+/-1, DBuf← T, FlipMemBase, **** Program around MicroD problem. Desired branch clause is: **** Call[SrcDstEnd, Cnt=0&-1]; **** but we must actually write: BRGO@[0] RETCL@[2] JCN[103]; BBSrc← (Fetch← BBSrc)+/-1, FlipMemBase, BBAt[SrcDstBodyLoc, 1]; **** MicroD problem SrcWd← MD, T← SrcWd, Fetch← T; T← XShiftNoMask[SrcWd], B← MD, Branch[SrcDstBody]; * This is called as a subroutine at the end of each munch. * It either exits the horizontal loop or returns to do another munch. SrcDstEnd: Subroutine; MCount← (MCount)-1, Cnt← 17S, DblBranch[SrcDstFinish, SrcDstMore, R<0], BBAt[SrcDstBodyLoc, 2]; **** MicroD problem TopLevel; *----------------------------------------------------------- TouchSrc: * Touches every page of the source item for this scan line. * Entry: BBSrc = address of first word of source item * T = SRast = words per source scan line; negative => bottom-to-top * SrcInc = distance between last word of one scan line and * first word of the next; negative => bottom-to-top * MemBase = BBSrcBR * Exit: T and BBTemp clobbered * Method: touch the last words of the item. If the item is >401B * words long, also touch intermediate words at multiples of 400B words from * the last word of the item. * Note: we don't also need to touch the first word of the item, since if a fault * occurs on it during the BitBlt word loop, nothing will have changed yet. * Note: |SRast-SrcInc| gives the number of words touched in the item; * the value is positive if moving left-to-right, negative if right-to-left. * Timing: 5 cycles for first page or less * +1 cycle if right-to-left * +2 cycles per additional page *----------------------------------------------------------- Subroutine; T← T-(SrcInc)-1; BBTemp← (BBSrc)+T, DblBranch[STouchRtoL, STouchLtoR, ALU<0]; STouchLtoR: * T has (# words in source item)-1, BBTemp points to last (rightmost) word. T← T-(400C)-1; BBTemp← (Fetch← BBTemp)-(400C), Branch[.+2, ALU<0]; T← T-(400C), Branch[.-1]; Return; STouchRtoL: * T has -(# words in source item)-1, BBTemp points to last (leftmost) word -2. BBTemp← (BBTemp)+(2C); * Fix the off-by-2 address T← T+(400C)+1; * Should be T+402, but no matter BBTemp← (Fetch← BBTemp)+(400C), Branch[.+2, ALU>=0]; T← T+(400C), Branch[.-1]; Return; *----------------------------------------------------------- TouchDst: * Touches every page of the destination item for this scan line. * Entry: BBDst = address of first word of destination item * T = DRast = words per destination scan line; negative => bottom-to-top * DstInc = distance between last word of one scan line and * first word of the next; negative => bottom-to-top * MemBase = BBDstBR * Exit: T and BBTemp clobbered * Note: must dirty any words touched, to force write-protect fault. * Timing: 6 cycles for first page or less * +1 cycle if right-to-left * +3 cycles per additional page *----------------------------------------------------------- Subroutine; T← T-(DstInc)-1; BBTemp← (BBDst)+T, DblBranch[DTouchRtoL, DTouchLtoR, ALU<0]; DTouchLtoR: * T has (# words in source item)-1, BBTemp points to last (rightmost) word. T← T-(400C)-1; DTouchLoopLtoR: Fetch← BBTemp, PD← T; T← T-(400C), Branch[DTouchDoneLtoR, ALU<0]; Store← BBTemp, DBuf← MD; BBTemp← (BBTemp)-(400C), Branch[DTouchLoopLtoR]; DTouchDoneLtoR: Store← BBTemp, DBuf← MD, Return; DTouchRtoL: * T has -(# words in source item)-1, BBTemp points to last (leftmost) word -2. BBTemp← (BBTemp)+(2C); * Fix the off-by-2 address T← T+(400C)+1; * Should be T+402, but no matter DTouchLoopRtoL: Fetch← BBTemp, PD← T; T← T+(400C), Branch[DTouchDoneRtoL, ALU>=0]; Store← BBTemp, DBuf← MD; BBTemp← (BBTemp)+(400C), Branch[DTouchLoopRtoL]; DTouchDoneRtoL: Store← BBTemp, DBuf← MD, Return;