*----------------------------------------------------------- Title[AltoBitBlt.mc...February 17, 1983 10:38 AM...Taft]; * Bit-boundary block-transfer * This version emulates Alto BitBlt, and has been extended to * permit use of long pointers. *----------------------------------------------------------- % Refer to the Alto Hardware Manual for primary documentation. All numbers are octal except timings, which are decimal. BBTable format: 0 Function (see below) 1 unused 2 DBCA destination base core address 3 DBMR destination bit map raster 4 DLX destination left X 5 DTY destination top Y 6 DW destination width (also source width) 7 DH destination height (also source height) 10 SBCA source base core address 11 SBMR source bit map raster 12 SLX source left X 13 STY source top Y 14 Gray0 15 Gray1 16 Gray2 17 Gray3 20 LSBCAlo long pointer to source bit map 21 LSBCAhi 22 LDBCAlo long pointer to destination bit map 23 LDBCAhi Function: B0=1 Use the long pointers in words 20-23 and ignore DBCA, SBCA Following 2 bits defined only if NOT using long pointers: B10=1 Source block is in the alternate bank (XM) B11=1 Destination block is in the alternate bank (XM) B12:13 SourceType B14:15 Operation The 20 BitBlt Functions (combinations of SourceType and Operation) are divided into 6 classes: A dest _ gray B dest _ f(gray, dest) C dest _ f(source) D dest _ f(source, gray) E dest _ f(source, dest) F dest _ f(source, gray, dest) The distribution of functions into classes is: Function (class) SourceType Operation 0 (C) 0 source 0 dest _ source 1 (E) 0 source 1 dest _ source OR dest 2 (E) 0 source 2 dest _ source XOR dest 3 (E) 0 source 3 dest _ NOT source AND dest 4 (C) 1 NOT source 0 dest _ source 5 (E) 1 NOT source 1 dest _ source OR dest 6 (E) 1 NOT source 2 dest _ source XOR dest 7 (E) 1 NOT source 3 dest _ NOT source AND dest 10 (D) 2 source AND gray 0 dest _ source 11 (F) 2 source AND gray 1 dest _ source OR dest 12 (F) 2 source AND gray 2 dest _ source XOR dest 13 (F) 2 source AND gray 3 dest _ NOT source AND dest 14 (A) 3 gray 0 dest _ source 15 (B) 3 gray 1 dest _ source OR dest 16 (B) 3 gray 2 dest _ source XOR dest 17 (B) 3 gray 3 dest _ NOT source AND dest If DLX <= SLX, the BitBlt horizontal loop works left-to-right; if DLX > SLX, right-to-left. Similarly, if DTY < STY, the vertical loop works top-to- bottom; if DTY >= STY, bottom-to-top. This is so that the correct thing will happen if source and destination blocks overlap. (Note that this test depends on the assumption that if the blocks overlap, they belong to the same bit map, i.e., DBCA=SBCA and DBMR=SBMR. No check is made for this.) The "=" cases could be handled either way; however, the DTY=STY case must be handled bottom-to-top because the documented algorithm for proper phasing of the gray block depends on this. Terminology: when referring to words in a scan line, "left" and "right" refer to the words with lower and higher addresses, respectively, independent of the direction of processing the scan line; "first" and "last" refer to the first and last words encountered in the direction of processing. The destination block is considered to consist of a left partial word, some number of full (body) words, and a right partial word. If the block begins or ends on a word boundary, the left or right word is still considered to be a partial word. The destination bits preserved in the left and right partial words are determined by the SHC register's LMask and RMask, respectively, where LMask = DLX mod 20, RMask = 17-((DLX+DW-1) mod 20). The destination width in words, including the first and last partial words, is computed by DWidth = (DW + (DLX mod 20) + 17)/20. Similarly, the source width in words, including the first and last partial words, is computed by SWidth = (DW + (SLX mod 20) + 17)/20. DWidth and SWidth differ by at most +/-1. If DWidth > 2 then the destination consists of a left partial word, DWidth-2 full body words, and a right partial word. If DWidth = 2 then the block consists of a left partial word immediately followed by a right partial word. If DWidth = 1 then the entire destination block lies within a single word, not crossing a word boundary. Effectively, the left and right partial words are one and the same. This case (called the "thin" case) requires special handling, as both LMask and RMask must be applied simultaneously. Processing of the scan line is controlled by two counters: the Cnt register (loaded from ICnt), which counts the inner loop (20 words or fewer), and MCount, which counts the outer loop. Roughly speaking, Cnt is loaded initially with DWidth mod 20 and MCount with DWidth/20. The inner loop is executed Cnt times; then, until MCount is exhausted, Cnt is reloaded with 20 and the inner loop is executed 20 more times. The reason for this arrangement is to permit a PreFetch to be executed every 20 words. More precisely: Before each horizontal loop, the Cnt register is loaded with if DWidth < 23 then DWidth-1 else (DWidth-3) mod 20 +2, so: Cnt > 1 if the body contains one or more words. Cnt = 1 if there are no body words. Cnt = 0 in the thin case. Cnt is decremented twice before the main loop is reached, so upon entry to the main loop it contains DWidth-3 (assuming DWidth<23), which is precisely the correct value for going around the loop DWidth-2 times. (The extra 2 in DWidth are the first and last partial words, which are handled outside the word loop.) MCount is loaded with (DWidth-3)/20 -1, so: MCount < 0 if DWidth < 23 MCount = 0 if 23 <= DWidth < 42, etc. Each time the inner loop terminates, if MCount<0 the scan line is finished, but if MCount>=0, MCount is decremented, Cnt is loaded with 17, and the main loop is reentered for another 20 iterations. When working left-to-right, the source may be thought of as T,,SrcWd -- that is, with T on the left and SrcWd on the right. In any given operation, T contains the previous source word and SrcWd contains the current one, and the shift operation consists of left-shifting T,,SrcWd by 0 to 17 bits (to align it with the destination) and storing the leftmost 20 bits of the result. Since the shifter is actually a cycler and produces the rightmost 20 bits (rather than the leftmost) of the cycled result, the shifter must be set up to left-cycle an additional 20 bits (by exchanging SHA and SHB). Hence SHA=R and SHB=T. When working right-to-left, the source may be thought of as SrcWd,,T, where T contains the previous source word and SrcWd the current, as before. This is exactly symmetric with the left-to-right case and requires only that SHA and SHB be exchanged. Hence SHA=T and SHB=R. Regardless of the direction, if there are at least as many data bits in the first word of the source block as in the first word of the destination block, then all the destination bits come from SrcWd during the first shift and T need not be loaded at all. On the other hand, if there are fewer bits in the first word of the source block, the destination bits come from both SrcWd and T. In this case, T must contain the first source word and SrcWd the second. These two situations must be distinguished, since one requires fetching only a single word whereas the other requires fetching two words before operating the shifter for the first time. A similar situation arises at the end of a block; that is, all the bits in the last partial destination block may come from the leftover source word in T, or it may be necessary to fetch an additional word into SrcWd. These situations must be distinguished also. An earlier version of this microcode omitted the test and simply fetched the "extra" word always; unfortunately, this sometimes resulted in touching a word outside the bit map, which caused problems in a paged environment when the bit map happened to be page-aligned. Two bits in BBFlags control the fetching of the "extra" word in the above two cases. All the horizontal loops work in both directions by use of a trick: ALUF[15] is redefined to be A+1 if moving left-to-right but A-1 if right-to-left. This ALU function is invoked by the operation "A+/-1", and is used to advance source and destination pointers along the scan line. A consequence of this is that ALUF[15] (normally A AND NOT B) cannot be used by I/O tasks. BBDst and BBSrc are 16-bit displacements relative to base registers BBDstBR and BBSrcBR. If moving top-to-bottom they start near zero and count up; if bottom-to-top they start at ~2^15 and count down. If more than ~2^15 words of bit map are processed, one of these displacements becomes negative. This is detected and causes BitBlt to restart, recomputing the base registers and displacements. BR displacement overflow is handled this way because it takes too much code to recompute the BRs and displacements in mid-stream. The cost of restarting BitBlt from scratch in this case is unimportant, since such a large amount of data is involved. (Indeed, in normal programs an interrupt will occur before 2^15 words are processed, so BitBlt will be restarted anyway.) *----------------------------------------------------------- * BitBlt Calling Sequence *----------------------------------------------------------- The BitBlt subroutine is called by: SCall[BitBltSub] with the Top-of-Stack (TOS) containing the following arguments: TOS-1: pointer to BBTable (in current space) TOS: 0 Assumes RBase[AEmRegs] when called and leaves it that way upon return. Assumes that the BBTable is relative to the MemBase in effect at the time of the call. The caller shouldn't assume anything about MemBase upon return. BitBlt exits in one of three ways: 1. Normal completion: BitBlt sets TOS to zero and returns to caller+2. 2. Interrupt request pending: BitBlt sets TOS to the number of scan lines already processed and returns to caller+1. The calling emulator processes the interrupt and then restarts the BitBlt with the value left in TOS. Note: if the BitBlt source or destination block is larger than 2^15 words, BitBlt stops after processing at most 2^15 words and returns +1, exactly as if an interrupt had occurred. 3. Page fault: BitBlt is careful to ensure restartability in the face of page faults. That is, any time a page fault can occur, TOS contains the number of scan lines already done, and restarting BitBlt with that argument will cause the correct thing to happen. (However, a BitBlt that causes a write-protect fault is NOT guaranteed to be restartable.) Local stack usage: TOS-1: BBTable TOS: number of scan lines completed TOS+1: not used (preserved for Alto Emulator's benefit) TOS+2: return Link TOS+3: saved ALUFM[17] TOS+4: saved ALUFM[15] Approximate timing for initialization and cleanup (excluding main loops): 69 cycles (minimum) +4 if working bottom-to-top +21 if DTY # 0 or working bottom-to-top +1 if DWidth = 1 or DWidth > 22B Add the following if a source bit map is required: +31 +5 if working right-to-left +4 if working bottom-to-top +21 if STY # 0 or working bottom-to-top +5 if DWidth=SWidth +1 if SWidth = 1 +2 if STY = DTY and SBMR = DBMR 95 cycles maximum in destination-only cases 164 cycles maximum in source-destination cases See comments above main loops for main loop times. % *----------------------------------------------------------- * R-register assignments: *----------------------------------------------------------- SetRMRegion[BBRegs]; * The RMRegion itself is defined in RegisterDefs.mc RVN[BBDst]; * Address of next word to process RVN[BBSrc]; RVN[DstInc]; * Address increment between scan lines -- RVN[SrcInc]; * negative if working bottom-to-top RVN[DRast]; * Raster length (words) -- RVN[SRast]; * negative if working bottom-to-top RVN[PrefDst]; * Address of next munch to PreFetch RVN[PrefSrc]; RVN[DPrefOffset]; * Offset of leftmost word of next scan line relative RVN[SPrefOffset]; * to first word of current scan line RVN[VCount]; * Vertical line count RVN[MCount]; * Horizontal munch count RVN[ICnt]; * Initial value of Cnt register for word loops RVN[BBDisp]; * Control flags and horizontal loop dispatch * B0=1 => work right-to-left * B8:15 = dispatch value relative to HorizontalDisp * (see below for details) RVN[BBFlags]; * Control flags: * B0=1 => 2 source words required for first dest word * B15=1 => 2 source words required for last dest word RVN[SrcWd]; * Leftover source word -- must be RVREL 17 * Additional registers, overlaid with emulator temps in the AEmRegs block. RME[Gray0, ETemp0]; * The gray block RME[Gray1, ETemp1]; RME[Gray2, ETemp2]; RME[Gray3, ETemp3]; * Aliases used during initialization RME[Width, DPrefOffset]; * Width of block in bits RME[DWidth, ICnt]; * Width of destination block in words RME[SWidth, PrefSrc]; * Width of source block in words RME[Skew, DstInc]; * Destination-source skew, mod 20 RME[BBMasks, PrefDst]; * LMask and RMask values to be loaded into SHC RME[BBFunc, SrcInc]; * BitBlt function word RME[DstX, BBDst]; * Destination starting X in bits, later in words RME[SrcX, BBSrc]; * Source starting X in bits, later in words RME[DstY, SPrefOffset]; * Destination starting Y in scan lines RME[SrcY, SRast]; * Source starting Y in scan lines RME[BBTemp, SrcWd]; * Must be RVRel 17 because it is an arg to MulSub. *----------------------------------------------------------- * Other definitions *----------------------------------------------------------- * Base-register assignments % -- Actually defined in ADefs.mc. First two must be an even-odd pair. BR[BBDstBR, ?]; * BitBlt destination base BR[BBSrcBR, ?]; * BitBlt source base BR[ScratchBR, ?]; * Scratch (emulator use only) % * ALU functions defined by BitBlt. * The ALUF Ram is loaded by BitBlt with the desired operations. XALUOP[,BBOp,,17,E]; * A BBOp B -- logical operation invoked with shifter XAOP[,+/-1,15,E]; * A +/-1 -- A+1 or A-1 depending on horizontal * direction. This value of ALUF is normally * A AND NOT B and is restored by BitBlt when done. * This means, however, that A AND NOT B cannot be used * by other tasks. * Layout of BBDisp register: * B0 = 0 if working left-to-right, 1 if right-to-left * B8-15: BigBDispatch value for setup and body dispatches. * The following addressing constraints apply: * (1) B9 = 1 and B15 = 1 if a source bit map is used. * (2) B14 = 0 if the block(s) must be touched before beginning a transfer. * (3) B12 = 1 and B13 = 0 if gray is used; B12 = 0 and B13 = 1 otherwise. * (4) B11 = 1 if the destination is an operand. * (5) Certain targets are tied together by Call constraints. * These bits are rather carefully selected to permit the same BBDisp to be * used in three different dispatches. Set[SrcFlg, 101]; Set[NoTouchFlg, 2]; Set[NoGrayFlg, 4]; Set[GrayFlg, 10]; Set[DstFlg, 20]; * BBDispX defines the first of two pages used for dispatches on BBDisp. * BBDispX must refer to an even page. Set[BBDispX, 1200]; M[BBAt, At[BBDispX, Add[#1, #2]]]; * Page-relative entry points for setup routines. * All must have B11 = 1, to neutralize body dispatch (DstFlag). Set[GrayDstSetup&TouchLoc, 30]; Set[GrayDstSetupLoc, 32]; Set[SrcDstSetup&TouchLoc, 125]; Set[SrcDstSetupLoc, 127]; * Must be SrcDstSetup&TouchLoc + 2 Set[SrcGrayDstSetup&TouchLoc, 131]; Set[SrcGrayDstSetupLoc, 133]; * Must be SrcGrayDstSetup&TouchLoc + 2 Set[HorizontalDispLoc, 20]; * Target of setup dispatch * Page-relative entry points for body routines. * They must neutralize all bits used for the setup dispatch. Set[GrayBodyLoc, 16]; Set[GrayDstBodyLoc, 36]; Set[SrcGrayBodyLoc, 117]; * These must be xxxx1111 because they contain Set[SrcGrayDstBodyLoc, 137]; * conditional Call instructions *----------------------------------------------------------- BitBltSub: * Entry point * Preliminaries: read BBTable, decode function *----------------------------------------------------------- KnowRBase[AEmRegs]; Subroutine; T_ Link, StkP-1; TopLevel; Fetch_ Stack&+3; * Fetch Function from BBTable Stack&+1_ T, MemBase_ ScratchBR; * Set up ScratchBR to point to base of BBTable. Gray2_ MD, T_ VAHi; * Gray2_ function word BRHi_ T; T_ VALo; BRLo_ T; * Read gray block and save in R-registers. Gray0_ (Fetch_ 17S) AND MD; * Gray0_ function dispatch Gray3_ MD, T_ (Fetch_ 16S)-1; Gray2_ MD, T_ (Fetch_ T)-1, Q_ Gray2; * Q_ function word Gray1_ MD, Fetch_ T; * Dispatch on BitBlt function to compute BBOP and BBDisp. Gray0_ MD, BigBDispatch_ Gray0; RBase_ RBase[BBRegs], Branch[BBFunctionTable]; *----------------------------------------------------------- BBFunctionTable: DispTable[20]; * SourceType, Operation T_ 1C, Branch[BBFC]; * 0, 0 NOT A T_ 5C, Branch[BBFE]; * 0, 1 NOT A OR B T_ 15C, Branch[BBFE&T]; * 0, 2 A EQV B T_ 35C, Branch[BBFE]; * 0, 3 A AND B T_ 37C, Branch[BBFC]; * 1, 0 A T_ 27C, Branch[BBFE]; * 1, 1 A OR B T_ 23C, Branch[BBFE&T]; * 1, 2 A XOR B T_ 21C, Branch[BBFE]; * 1, 3 NOT A AND B T_ 1C, Branch[BBFD]; * 2, 0 NOT A T_ 5C, Branch[BBFF]; * 2, 1 NOT A OR B T_ 15C, Branch[BBFF&T]; * 2, 2 A EQV B T_ 35C, Branch[BBFF]; * 2, 3 A AND B T_ 1C, Branch[BBFA]; * 3, 0 NOT A T_ 5C, Branch[BBFB]; * 3, 1 NOT A OR B T_ 15C, Branch[BBFB&T]; * 3, 2 A EQV B T_ 35C, Branch[BBFB]; * 3, 3 A AND B *----------------------------------------------------------- * Select the appropriate dispatch word for the function. BBFA: BBDisp_ Add[GrayFlg, NoTouchFlg]C, Branch[SetBBF]; BBFB: BBDisp_ Add[GrayFlg, DstFlg, NoTouchFlg]C, Branch[SetBBF]; BBFB&T: BBDisp_ Add[GrayFlg, DstFlg]C, Branch[SetBBF]; BBFC: BBDisp_ Add[SrcFlg, NoGrayFlg, NoTouchFlg]C, Branch[SetBBF]; BBFD: BBDisp_ Add[SrcFlg, GrayFlg, NoTouchFlg]C, Branch[SetBBF]; BBFE: BBDisp_ Add[SrcFlg, NoGrayFlg, DstFlg, NoTouchFlg]C, Branch[SetBBF]; BBFE&T: BBDisp_ Add[SrcFlg, NoGrayFlg, DstFlg]C, Branch[SetBBF]; BBFF: BBDisp_ Add[SrcFlg, GrayFlg, DstFlg, NoTouchFlg]C, Branch[SetBBF]; BBFF&T: BBDisp_ Add[SrcFlg, GrayFlg, DstFlg]C, Branch[SetBBF]; SetBBF: Stack&+1_ ALUFMRW_ T, ALUF[17]; * Set ALU function and save old *----------------------------------------------------------- * X-coordinate setup: widths, margins, skew, masks, etc. *----------------------------------------------------------- * Fetch the X-coordinate information. MCount_ NOT (Fetch_ 6S); * MCount_ small negative, fetch DW Width_ MD, Fetch_ 12S; * Fetch SLX SrcX_ MD, T_ A0, Fetch_ 4S; * Fetch DLX -- must be fetched last * Compute LMask and RMask. * LMask = DLX mod 20, RMask = 17-((DLX+DW-1) mod 20) = (-DLX-DW) mod 20. * For reference, SHC fields are: * B2: SHA=T, B3: SHB=T, B4-7: count, B8-11: RMask, B12-15: LMask DstX_ MD, T_ T-(Width); * T_ -DW T_ T-(DstX); * T_ -DLX-DW BBMasks_ DPF[T, 4, 4, MD]; * B8-11_ ((-DLX-DW) mod 20) lsh 4, * B12-15_ DLX mod 20 * Compute destination width in words, including first and last partial words. * DWidth_ (Width + (DLX mod 20) +17) / 20. T_ (DstX) AND (17C); T_ (Width)+T; T_ T+(17C); DWidth_ RSH[T, 4]; BBFunc_ Q, BBDisp, Branch[.+2, R odd]; * Source block required? * BBFunc_ full BitBlt function word * Source block not required. Set shift count to send R straight thru shifter, * and handle as left-to-right case. Skew_ A0, Branch[SetupLtoR]; * SHA=R, SHB=R * Compute source width in words, including first and last partial words. * SWidth_ (Width + (SLX mod 20) +17) / 20. T_ (SrcX) AND (17C); T_ (Width)+T; T_ T+(17C); SWidth_ T_ RSH[T, 4]; * Set flags to control fetching of "extra" first and last words. * Except in the "thin" case (DWidth=1), the setup/finish routines for the * horizontal loops nominally fetch 1 word and store 2 words; an extra fetch * may be required at the beginning, the end, or both, depending on the number * of words in the source and destination blocks (see introductory comments). * DWidth and SWidth differ by at most +/-1. * If SWidth = DWidth+1, an extra source word must be fetched at both ends. * If SWidth = DWidth-1 or SWidth = 1, no extra source words need be fetched. * If SWidth = DWidth # 1, an extra source word must be fetched at one end: * if SLX mod 20 > DLX mod 20 then left else right. * Set BBFlags[0]_ extraLeft, BBFlags[1:15]_ extraRight. * (these flags are exchanged later if working right-to-left.) * T still has SWidth; MD still has DLX. * The following 2 instructions set BBFlags_ -1 if SWidth>DWidth, 0 otherwise. PD_ (DWidth)-T; BBFlags_ T-T-1, XorSavedCarry, Branch[SetupSkew, ALU#0]; * Set BBFlags_ 100000 if SLX mod 20 > DLX mod 20, 77777 otherwise. T_ (SrcX) AND (17C); BBFlags_ (17S) AND MD; * DLX mod 20 PD_ T-(BBFlags)-1; * Carry iff SLX mod 20 > DLX mod 20 BBFlags_ 100000C; BBFlags_ (BBFlags)-1, XorSavedCarry; * X-coordinate setup (cont'd) * Compute skew = (SLX-DLX) mod 20, and decide on horizontal direction. SetupSkew: T_ (SrcX)-MD; Skew_ T AND (17C), DblBranch[SetupRtoL, SetupLtoR, ALU<0]; * SLX >= DLX: work from left to right. Set SHA=R, SHB=T, ALUF[15]="A+1". * Note: if skew = 0, set SHA=R, SHB=R. SetupLtoR: T_ 200C, Branch[NoSrcThinChk, ALU=0]; * ALUFM control for "A+1" Skew_ (Skew) OR (20C), Branch[SetupALU&ShC]; * SHB=T * SLX < DLX: work from right to left. * Set SHA=T, SHB=R, ALUF[15]="A-1". * Advance starting X coordinates to rightmost ends of blocks. * Note: if skew = 0, set SHA=R, SHB=R, and do not exchange extra-word flags. SetupRtoL: BBDisp_ (BBDisp) OR (100000C), Branch[.+3, ALU=0]; Skew_ (Skew) OR (40C); * SHA=T BBFlags_ (BBFlags) LCY 1; * Exchange source extra-word flags T_ (Width)-1; DstX_ (DstX)+T; * Advance to rightmost X-coordinates SrcX_ (SrcX)+T; T_ 36C; * ALUFM control for "A-1" * Have ALUFM control in T for "A+/-1" operation. SetupALU&ShC: PD_ (SWidth)-1; * Thin source check (see below) NoSrcThinChk: Stack&-4_ ALUFMRW_ T, A+/-1, * Set ALU function, save old value. Branch[SetupShC, ALU#0]; * Leave StkP -> TOS (scan line count) * If we would have fetched an extra source word, but there is only one source * word to fetch, then reset source extra-word flags and set SHA=R, SHB=R. BBFlags_ A0, Branch[.+2, R>=0]; Skew_ (Skew) AND (17C); T_ LSH[BBMasks, 10], Branch[.+2]; * Placement * Merge skew with masks and load SHC. SetupShC: T_ LSH[BBMasks, 10]; * Shift masks to B0-7 T_ LCY[T, Skew, 10]; * Concatenate SHA, SHB, count, masks PD_ (Width)-1, ShC_ T; * Convert DstX and SrcX to X word displacements relative to start of * first scan line. Note: BBDst is the same register as DstX, and * BBSrc is the same register as SrcX. BBDst_ RSH[DstX, 4], Branch[.+2, ALU>=0]; * BBDst_ DstX/20 Branch[BitBltDone]; * Width<=0 -- nothing to do BBSrc_ RSH[SrcX, 4]; * BBSrc_ SrcX/20 *----------------------------------------------------------- * Fetch and set up Y-coordinate information *----------------------------------------------------------- Fetch_ 7S; * Fetch DH T_ MD, Fetch_ 5S; * Fetch DTY DstY_ MD, Fetch_ 13S; * Fetch STY -- must be fetched last Skew_ (DstY)-MD; * Test and remember vertical direction * VCount_ (scan lines left to do)-1. Stack has scan lines already done. VCount_ T-(Stack)-1, Branch[.+2, ALU<0]; * DH - (scan lines done) -1 * DTY >= STY, work from bottom to top. Start with lowest line not yet done. T_ VCount, Branch[.+2]; * DTY < STY, work from top to bottom. Start with highest line not yet done. T_ Stack; DstY_ (DstY)+T; * Compute starting Y coordinates SrcY_ T+MD; * MD still has STY *----------------------------------------------------------- * Set up destination base address and increments *----------------------------------------------------------- * Compute starting destination scan line offset relative to base of bit map. * DstY has scan line number. T_ DstY, Fetch_ 3S; * Fetch DBMR DRast_ MD, T_ A0, Q_ T, Branch[.+2, ALU=0]; * Don't multiply if DstY=0 T_ DRast, Call[MulSub]; * T,,Q _ Q*T, clobbers BBTemp * Now have 32-bit offset in T,,Q. Add it to bit map base address. **** Program around MicroD problem. Desired statement is: **** PD_ (BBFunc) AND (20C), DblCall[DFetchBBLong, DFetchBBShort, R<0]; **** but we must actually write: PD_ (BBFunc) AND (20C), BRGO@[0] RETCL@[3] JCN[44] GPW0@[11400]; Set[DFetchBBShortLoc, 1420]; **** * Now T,,BBTemp = adjusted base value. * Note: Skew<0 here iff working top-to-bottom. MemBase_ BBDstBR, Skew, Branch[DstLoadBR, R<0]; * Processing bottom-to-top. * Decrease base by 2^15, and increase initial displacement by 2^15. BBTemp_ (BBTemp)-(100000C); T_ T-1, XorSavedCarry; BBDst_ (BBDst)+(100000C); DRast_ (0S)-(DRast); * DRast_ -DBMR * Now finally ready to load the base register! DstLoadBR: T_ DWidth, BRHi_ T; * DWidth for code below BRLo_ BBTemp; * Compute destination inter-scan-line word address increment. * DstInc_ DRast + (if left-to-right then -DWidth else DWidth). * Also compute first PreFetch offset (from BBDst). * = leftmost word on next scan line, even if processing right-to-left. * DPrefOffset_ if L-to-R then DRast else DRast-DWidth. * Note that DRast already has DBMR or -DBMR as appropriate. T = DWidth. BBDisp, Branch[DstIncRtoL, R<0]; * Test direction DstIncLtoR: DstInc_ (DRast)-T; * L to R: DRast-DWidth DPrefOffset_ T_ DRast, Branch[.+3]; DstIncRtoL: DstInc_ (DRast)+T; * R to L: DRast+DWidth DPrefOffset_ T_ (DRast)-T; PrefDst_ (BBDst)+T; * Initial PreFetch address *----------------------------------------------------------- * Set up source base address and increments *----------------------------------------------------------- * Is a source block required? If not, skip source setup. BBDisp, Branch[BBNoSource, R even]; * If a source-destination overlap exists, set the "touch" flag. * It is too hard to decide precisely whether an overlap exists, * but if SBMR=DBMR and STY=DTY, an overlap MAY exist, so we set the flag. * Note that the flag may already be set as a consequence of operation=XOR. * (Note that the sense of the flag is complemented: "touch" => NoTouchFlag=0.) T_ SrcY, MemBase_ ScratchBR; Fetch_ 3S; * Fetch DBMR BBTemp_ MD, Fetch_ 11S; * Fetch SBMR T_ (DstY) XOR T; BBTemp_ (BBTemp) XOR MD; PD_ T OR (BBTemp); T_ SrcY, Branch[.+2, ALU#0]; * Skip if SBMR#DBMR or STY#DTY BBDisp_ (BBDisp) AND (Not[NoTouchFlg]C), Branch[.-1]; * Compute starting source scan line offset relative to base of bit map. * T = ALU = SrcY = starting source scan line number. SRast_ MD, T_ A0, Q_ T, Branch[.+2, ALU=0]; * Don't multiply if SrcY=0 T_ SRast, Call[MulSub]; * T,,Q _ Q*T, clobbers BBTemp * Now have 32-bit offset in T,,Q. Add it to bit map base address. **** Program around MicroD problem. Desired statement is: **** PD_ (BBFunc) AND (40C), DblCall[SFetchBBLong, SFetchBBShort, R<0]; **** but we must actually write: PD_ (BBFunc) AND (40C), BRGO@[0] RETCL@[3] JCN[104] GPW0@[11400]; Set[SFetchBBShortLoc, 1440]; **** * Now T,,BBTemp = adjusted base value. * Note: DRast>=0 here iff working top-to-bottom. MemBase_ BBSrcBR, DRast, Branch[SrcLoadBR, R>=0]; * Processing bottom-to-top. * Decrease base by 2^15, and increase initial displacement by 2^15. BBTemp_ (BBTemp)-(100000C); T_ T-1, XorSavedCarry; BBSrc_ (BBSrc)+(100000C); SRast_ (0S)-(SRast); * SRast_ -SBMR * Now finally ready to load the base register! SrcLoadBR: T_ SWidth, BRHi_ T; * SWidth for code below BRLo_ BBTemp; * Compute source inter-scan-line word address increment. * SrcInc_ SRast + (if left-to-right then -SWidth else SWidth). * Also compute first PreFetch offset (from BBSrc). * = leftmost word on next scan line, even if processing right-to-left. * SPrefOffset_ if L-to-R then SRast else SRast-SWidth. * Note that SRast already has SBMR or -SBMR as appropriate. T = SWidth. BBDisp, Branch[SrcIncRtoL, R<0]; * Test direction SrcIncLtoR: SrcInc_ (SRast)-T; * L to R: SRast-SWidth SPrefOffset_ T_ SRast, Branch[.+3]; SrcIncRtoL: SrcInc_ (SRast)+T; * R to L: SRast+SWidth SPrefOffset_ T_ (SRast)-T; PrefSrc_ (BBSrc)+T; * Initial PreFetch address *----------------------------------------------------------- * Final adjustments prior to entering vertical loop *----------------------------------------------------------- * Compute ICnt, the initial value of the Cnt register for each loop. * ICnt_ if DWidth <= 22B then DWidth-1 else NOT (DWidth-3) [= 2-DWidth]. * MCount starts out negative (was initialized long ago). BBNoSource: PD_ (DWidth)-(23C); ICnt_ T_ (DWidth)-1, Branch[.+2, ALU<0]; ICnt_ (1S)-T; * = (2S)-(DWidth) * Never need to "touch" data in the "thin" case. * ALU=0 here iff the "thin" case applies (DWidth=1). Stack_ (Stack)-1, Branch[.+2, ALU#0]; BBDisp_ (BBDisp) OR (Add[NoTouchFlg]C); * Enter vertical loop with (scan lines done)-1 on TOS and ALU>=0. T_ A0; PD_ Q_ T, Branch[BBVerticalLoop]; * Init gray value to 0 *----------------------------------------------------------- * xFetchBBShort: * Fetch BitBlt short pointer for destination or source * Entry: ALU#0 iff alternate bank bit set * T,,Q = 32-bit displacement * Exit: T,,BBTemp = adjusted base address *----------------------------------------------------------- Subroutine; **** Note: flush absolute placement when MicroD is fixed **** DFetchBBShort: At[DFetchBBShortLoc], * Fetch DBMR BBTemp_ Q, Fetch_ 2S, DblBranch[BBNormal, BBAlternate, ALU=0]; SFetchBBShort: At[SFetchBBShortLoc], * Fetch SBMR BBTemp_ Q, Fetch_ 10S, DblBranch[BBNormal, BBAlternate, ALU=0]; BBNormal: BBTemp_ (BBTemp)+MD, RBase_ RBase[AEmRegs]; T_ T+(EmuBRHiReg), XorSavedCarry, Branch[FetchBBRet]; KnowRBase[BBRegs]; BBAlternate: BBTemp_ (BBTemp)+MD, RBase_ RBase[AEmRegs]; T_ T+(EmuXMBRHiReg), XorSavedCarry; FetchBBRet: RBase_ RBase[BBRegs], Return; *----------------------------------------------------------- * xFetchBBLong: * Fetch BitBlt long pointer for destination or source * Entry: T,,Q = 32-bit displacement * Exit: T,,BBTemp = adjusted base address *----------------------------------------------------------- Subroutine; **** Note: flush absolute placement when MicroD is fixed **** DFetchBBLong: At[DFetchBBShortLoc, 1], BBTemp_ 22C, Branch[.+2]; * Destination long pointer low word SFetchBBLong: At[SFetchBBShortLoc, 1], BBTemp_ 20C; * Source long pointer low word BBTemp_ (Fetch_ BBTemp)+1; BBTemp_ MD, Fetch_ BBTemp; BBTemp_ (BBTemp)+Q; T_ T+MD, XorSavedCarry, Return; TopLevel; *----------------------------------------------------------- * BitBlt vertical loop (per-scan-line) * At top of loop, the following invariants hold: * VCount = (number of scan lines remaining)-1 * Stack = (number of scan lines done)-1 * MemBase = BBDstBR in destination-only cases, BBSrcBR otherwise * ALU<0 iff destination BR displacement has overflowed * Vertical loop overhead: * 6 cycles for loop control and destination pointer update * +4 cycles for source pointer update if source block is used * +4 cycles if block is greater than 20B words wide *----------------------------------------------------------- BBVerticalLoop: VCount_ (VCount)-1, Branch[BBDoneOrOverflow, ALU<0, R<0]; * If positive, ICnt has the desired initial value of Cnt (DWidth-1). T_ NOT (Cnt_ ICnt), Branch[SmallBlock, R>=0]; * Block greater than one munch wide. Set up separate munch and word counts. * T now has DWidth-3, where DWidth is the number of words in the destination * block, including first and last partial words. * MCount _ (DWidth-3)/20 -1, Cnt _ (DWidth-3) mod 20 +2. MCount_ RSH[T, 4]; T_ T AND (17C); T_ T+(2C); MCount_ (MCount)-1, Cnt_ T; * This dispatch goes to one of: GrayDstSetup, GrayDstSetup&Touch, * SrcDstSetup, SrcDstSetup&Touch, SrcGrayDstSetup, or SrcGrayDstSetup&Touch. SmallBlock: BigBDispatch_ BBDisp; Stack_ (Stack)+1, Branch[HorizontalDisp]; * Advance scan lines done *----------------------------------------------------------- AdvanceSrcDst: * Control returns here at the end of individual horizontal loops that * involve both source and destination blocks. * BBSrc, BBDst point one beyond last word processed. *----------------------------------------------------------- T_ SrcInc; BBSrc_ T_ (BBSrc)+T; PrefSrc_ T+(SPrefOffset); T_ BBDst, Branch[SrcBROverflow, ALU<0]; *----------------------------------------------------------- AdvanceDst: * Control returns here at the end of individual horizontal loops that * involve only a destination block. * BBDst and T point one beyond last word processed. *----------------------------------------------------------- BBDst_ T_ T+(DstInc); PrefDst_ T_ T+(DPrefOffset), DblBranch[BBReschedPending, BBVerticalLoop, Reschedule]; *----------------------------------------------------------- * Reschedule pending. See if interrupt is really being requested, * and if so, terminate processing and return +1 from BitBltSub. * The calling emulator will process the interrupt and then * call BitBltSub again. * T = PrefDst here. *----------------------------------------------------------- BBReschedPending: VCount, RBase_ RBase[NWW], Branch[.+2, R>=0]; * We just processed the last scan line, so return normally. Branch[BitBltDone]; * Test NWW to see whether an interrupt is really pending. PD_ NWW, NoReschedule; Branch[.+2, ALU>0]; * No interrupt pending. Restore RBase and ALU and continue vertical loop. PD_ T, RBase_ RBase[BBRegs], Branch[BBVerticalLoop]; * Interrupt really pending. Restore clobbered ALUFM locations before * taking interrupt. Note: scan line count at TOS is one behind, so fix it. BBInitiateInterrupt: T_ Stack&+3_ (Stack&+3)+1, RescheduleNow, * Force immediate trap Branch[BitBltDone1]; *----------------------------------------------------------- * Either VCount is exhausted or one of the BR displacements overflowed * (or conceivably both events occurred at the same time). * If VCount is exhausted then return normally; otherwise restart BitBlt. * The easiest way to restart is to pretend an interrupt occurred. * Note: scan line count at TOS is one behind, and VCount is over-decremented. *----------------------------------------------------------- BBDoneOrOverflow: VCount_ (VCount)+1; SrcBROverflow: VCount, Branch[BitBltDone, R<0]; RBase_ RBase[AEmRegs], Branch[BBInitiateInterrupt]; *----------------------------------------------------------- * Really done. Set TOS=0, restore clobbered ALUFM, and return. *----------------------------------------------------------- BitBltDone: T_ Stack&+3_ A0, RBase_ RBase[AEmRegs]; BitBltDone1: ALUFMRW_ Stack&+1, ALUF[17]; * Restore ALUF[17] ALUFMRW_ Stack&-2, A+/-1; * Restore ALUF borrowed for A+/-1 PD_ T, Link_ Stack&-2; Subroutine; Return[ALU=0]; * Return +2 if done, +1 otherwise TopLevel; *----------------------------------------------------------- * Horizontal loops *----------------------------------------------------------- % Organization of horizontal (per-word) loops: There are a number of variations, depending on the following: 1. Whether or not a source block is used; 2. Whether or not a gray block is used; 3. Whether or not the destination is an operand; 4. Whether or not the destination is "thin" (one word per scan line); 5. Whether or not data need be "touched" before doing the transfer. There are fewer than 32 total cases because some of these combinations cannot occur (for example, no source block and no gray block). Each case has a "setup" routine, a "body" routine, and a "finish" routine. Many of these routines are shared among cases, and the flow of control is determined by a complicated network of dispatches on BBDisp. Finish routines exit to the vertical loop by a branch to AdvanceDst or AdvanceSrcDst. It is never necessary to "touch" if any of the following is true: a. The destination is "thin"; b. The destination is not an operand of the BitBlt function; c. The function can be performed multiple times with the same effect as performing it only once (i.e., anything except XOR, assuming the source and destination blocks don't overlap). The "touch" and "no touch" cases are distinguished as part of the dispatch. Q is used exclusively to hold the complement of the gray value for the current scan line, or zero if gray is not used. To reduce miss wait and increase performance, while processing each scan line we fall out of the main loop once per munch and PreFetch one munch for the next scan line. This strategy depends on the assumption that each scan line is less than 100 munches long (for a 100-row cache, which is what the Dorado has at present). Note that PreFetches are done left-to-right even if transfers are done right-to-left. Note that the current implementation is imperfect in that the last munch of the next scan line may not be prefetched, or an extra munch prefetched unnecessarily, because the main loop does not terminate at munch boundaries but rather at multiples of 20 words from the end of the scan line. Also, the first scan line is not prefetched, and a line past the end of the last scan line is prefetched unnecessarily. These defects should not affect performance noticeably in normal use. % TopLevel; KnowRBase[BBRegs]; %*----------------------------------------------------------- Case A: dest _ gray ShC = R straight through. BBOp = NOT A. Entry point to this code is at GrayDestSetup. It dispatches here after setup. Timing, per scan line, including vertical loop overhead: 20 cycles minimum in the normal case +1 cycle per full word (excluding first and last partial words) +4 + (2 * # of munches) cycles if block is wider than 20B words 13 cycles total in the "thin" case %*----------------------------------------------------------- * T has first partial word to be stored, and SrcWd has (uncomplemented) * gray word. GrayBody: BBAt[GrayBodyLoc], Q_ SrcWd; BBDst_ (Store_ BBDst)+/-1, DBuf_ T, Branch[GrayEnd, Cnt=0&-1]; GrayLoop: BBDst_ (Store_ BBDst)+/-1, DBuf_ Q, Branch[GrayLoop, Cnt#0&-1]; GrayEnd: MCount_ (MCount)-1, Cnt_ 17S, Branch[GrayLast, R<0]; PrefDst_ (PreFetch_ PrefDst), Carry20, Branch[GrayLoop]; GrayLast: Fetch_ BBDst, Branch[DstFinish]; %*----------------------------------------------------------- Case B: dest _ f(gray, dest) ShC = R straight through. BBOp is taken from the following table: Operation SourceType BBOP source OR dest gray NOT A OR B source XOR dest gray A EQV B NOT source AND dest gray A AND B The gray word is kept in SrcWd and put directly into the shifter. Case A also dispatches to one of case B's entry points, GrayDstSetup. The setup code dispatches to the correct body routine. Timing, per scan line, including vertical loop overhead: 18 cycles minimum in the normal case +3 cycles per full word (excluding first and last partial words) +6 + (2 * # of pages) cycles if data needs to be "touched" +4 + (2 * # of munches) cycles if block is wider than 20B words 13 cycles total in the "thin" case %*----------------------------------------------------------- GrayDstSetup&Touch: BBAt[GrayDstSetup&TouchLoc], T_ DRast, Call[TouchDst]; BDispatch_ VCount, Branch[.+2]; * Duplicate for placement GrayDstSetup: BBAt[GrayDstSetupLoc], BDispatch_ VCount; RBase_ RBase[Gray0], Call[GetGray]; KnowRBase[BBRegs]; SrcWd_ NOT T, Fetch_ BBDst; * Label the target for the horizontal dispatch. It is never actually reached * by the dispatch, since the branch address is modified by BigBDispatch. * The instruction so labelled should be on the same page as the actual * targets, and it should not otherwise be constrained. HorizontalDisp: BBAt[HorizontalDispLoc], PrefDst_ (PreFetch_ PrefDst), Carry20, Branch[DstThin, Cnt=0&-1]; * The following dispatch may modify GrayBody to GrayDstBody. BigBDispatch_ BBDisp, Branch[.+2, R<0]; T_ XShMDLMask[SrcWd], B_ MD, Branch[GrayBody]; T_ XShMDRMask[SrcWd], B_ MD, Branch[GrayBody]; * Body routine for case B only. GrayDstBody: BBAt[GrayDstBodyLoc], BBDst_ T_ (Store_ BBDst)+/-1, DBuf_ T; BBSrc_ (Fetch_ T)+/-1, Branch[GrayDstEnd, Cnt=0&-1]; * Inner loop runs with one word fetched ahead (now in MD). * BBSrc runs one word ahead of BBDst. GrayDstLoop: BBSrc_ (Fetch_ BBSrc)+/-1, T_ MD; T_ XShiftNoMask[SrcWd], B_ T; BBDst_ (Store_ BBDst)+/-1, DBuf_ T, Branch[GrayDstLoop, Cnt#0&-1]; GrayDstEnd: MCount_ (MCount)-1, Cnt_ 17S, Branch[DstFinish, R<0]; PrefDst_ (PreFetch_ PrefDst), Carry20, Branch[GrayDstLoop]; * Store last partial word. This is the tail of case A also. * Have already fetched the last word (at BBDst). DstFinish: BBDisp, Branch[.+2, R<0]; T_ XShMDRMask[SrcWd], B_ MD, Branch[StoreLastDst]; T_ XShMDLMask[SrcWd], B_ MD, Branch[StoreLastDst]; * Thin destination slice, for cases A and B. DstThin: T_ XShMDBothMasks[SrcWd], B_ MD; StoreLastDst: BBDst_ T_ (Store_ BBDst)+/-1, DBuf_ T, Branch[AdvanceDst]; %*----------------------------------------------------------- Case C: dest _ f(source) Case D: dest _ f(source, gray) BBOp is taken from the following table: Operation SourceType BBOP source source NOT A (Q = 0) source NOT source A (Q = 0) source source AND gray NOT A (Q = NOT gray) Q is ORed with shifter output on the AMux. Entry points to this code are at SrcDstSetup, SrcDstSetup&Touch, SrcGrayDstSetup, and SrcGrayDstSetup&Touch. They dispatch here after setup. Timing, per scan line, including vertical loop overhead: 25 cycles minimum in the normal case +4 cycles per full word (excluding first and last partial words) +12 + (4 * # of pages) cycles if data needs to be "touched" +1 cycle if 2 source words are required for the first destination word +3 cycles if gray is required +4 + (5 * # of munches) cycles if block is wider than 20B words 17 cycles minimum in the "thin" case +1 cycle if 2 source words are required +3 cycles if gray is required %*----------------------------------------------------------- * Body routine for cases C and D only. SrcGrayBody: BBAt[SrcGrayBodyLoc], T_ BBDst_ (Store_ BBDst)+/-1, DBuf_ T, FlipMemBase, **** Program around MicroD problem. Desired branch clause is: **** Call[SrcGrayEnd, Cnt=0&-1]; **** but we must actually write: BRGO@[0] RETCL@[2] JCN[43]; BBSrc_ (Fetch_ BBSrc)+/-1, FlipMemBase, BBAt[SrcGrayBodyLoc, 1]; **** MicroD problem SrcWd_ MD, T_ SrcWd; T_ XShiftNoMask[SrcWd], A_ Q, Branch[SrcGrayBody]; * This is called as a subroutine at the end of each munch. * It either exits the horizontal loop or returns to do another munch. SrcGrayEnd: Subroutine; MCount_ (MCount)-1, Cnt_ 17S, DblBranch[SrcDstFinish, SrcDstMore, R<0], BBAt[SrcGrayBodyLoc, 2]; **** MicroD problem TopLevel; %*----------------------------------------------------------- Case E: dest _ f(source, dest) Case F: dest _ f(source, gray, dest) BBOp is taken from the following table: Operation SourceType BBOP source OR dest source NOT A OR B (Q = 0) source XOR dest source A EQV B (Q = 0) NOT source AND dest source A AND B (Q = 0) source OR dest NOT source A OR B (Q = 0) source XOR dest NOT source A XOR B (Q = 0) NOT source AND dest NOT source NOT A AND B (Q = 0) source OR dest source AND gray NOT A OR B (Q = NOT gray) source XOR dest source AND gray A EQV B (Q = NOT gray) NOT source AND dest source AND gray A AND B (Q = NOT gray) Q is ORed with shifter output on the AMux. Cases C and D also dispatch these entry points. The setup code dispatches to the correct body routines. Timing, per scan line, including vertical loop overhead: 25 cycles minimum in the normal case + 5 cycles per full word (excluding first and last partial words) +12 + (4 * # of pages) cycles if data needs to be "touched" + 1 cycle if 2 source words required for the first destination word + 3 cycles if gray is required + 4 + (5 * # of munches) cycles if block is wider than 20B words 17 cycles minimum in the "thin" case + 1 cycle if 2 source words are required + 3 cycles if gray is required %*----------------------------------------------------------- SrcGrayDstSetup&Touch: BBAt[SrcGrayDstSetup&TouchLoc], T_ DRast, FlipMemBase, Call[TouchDst]; T_ SRast, FlipMemBase, Call[TouchSrc]; SrcGrayDstSetup: BBAt[SrcGrayDstSetupLoc], BDispatch_ VCount; RBase_ RBase[Gray0], Call[GetGray]; KnowRBase[BBRegs]; Q_ T, BBFlags, DblBranch[SrcDstSetup2, SrcDstSetup1, R<0]; SrcDstSetup&Touch: BBAt[SrcDstSetup&TouchLoc], T_ DRast, FlipMemBase, Call[TouchDst]; T_ SRast, FlipMemBase, Call[TouchSrc]; SrcDstSetup: BBAt[SrcDstSetupLoc], BBFlags, DblBranch[SrcDstSetup2, SrcDstSetup1, R<0]; SrcDstSetup2: BBSrc_ (Fetch_ BBSrc)+/-1; SrcDstSetup1: PrefSrc_ (PreFetch_ PrefSrc), Carry20; T_ MD, BBSrc_ (Fetch_ BBSrc)+/-1, FlipMemBase; SrcWd_ MD, Fetch_ BBDst; PrefDst_ (PreFetch_ PrefDst), Carry20, Branch[SrcDstThin, Cnt=0&-1]; * The following dispatch may modify SrcGrayBody to SrcGrayDstBody. BigBDispatch_ BBDisp, Branch[.+2, R<0]; T_ XShMDLMask[SrcWd], A_ Q, B_ MD, Branch[SrcGrayBody]; T_ XShMDRMask[SrcWd], A_ Q, B_ MD, Branch[SrcGrayBody]; * Cases E and F (cont'd) * Body routine for cases E and F only. SrcGrayDstBody: BBAt[SrcGrayDstBodyLoc], T_ BBDst_ (Store_ BBDst)+/-1, DBuf_ T, FlipMemBase, **** Program around MicroD problem. Desired branch clause is: **** Call[SrcGrayDstEnd, Cnt=0&-1]; **** but we must actually write: BRGO@[0] RETCL@[2] JCN[103]; BBSrc_ (Fetch_ BBSrc)+/-1, FlipMemBase, BBAt[SrcGrayDstBodyLoc, 1]; **** MicroD problem SrcWd_ MD, T_ SrcWd, Fetch_ T; T_ XShiftNoMask[SrcWd], A_ Q, B_ MD, Branch[SrcGrayDstBody]; * This is called as a subroutine at the end of each munch. * It either exits the horizontal loop or returns to do another munch. * Note: code following SrcGrayDstMore is used by cases C and D also. SrcGrayDstEnd: Subroutine; MCount_ (MCount)-1, Cnt_ 17S, DblBranch[SrcDstFinish, SrcDstMore, R<0], BBAt[SrcGrayDstBodyLoc, 2]; **** MicroD problem SrcDstMore: PrefSrc_ (PreFetch_ PrefSrc), Carry20; FlipMemBase; PrefDst_ (PreFetch_ PrefDst), Carry20; FlipMemBase, Return; TopLevel; * Store last partial word, for cases C, D, E, and F. SrcDstFinish: BBFlags, Branch[.+2, R even]; * Fetch extra word at end? BBSrc_ (Fetch_ BBSrc)+/-1, FlipMemBase, Branch[.+2]; FlipMemBase; SrcWd_ MD, T_ SrcWd, Fetch_ T; BBDisp, Branch[.+2, R<0]; T_ XShMDRMask[SrcWd], A_ Q, B_ MD, Branch[StoreLastSrcDst]; T_ XShMDLMask[SrcWd], A_ Q, B_ MD, Branch[StoreLastSrcDst]; * Thin destination slice, for cases C, D, E, and F. SrcDstThin: T_ XShMDBothMasks[SrcWd], A_ Q, B_ MD; StoreLastSrcDst: BBDst_ (Store_ BBDst)+/-1, DBuf_ T, FlipMemBase, Branch[AdvanceSrcDst]; *----------------------------------------------------------- * Other subroutines *----------------------------------------------------------- *----------------------------------------------------------- GetGray: * Get the gray word for the current scan line. * Calling sequence: * BDispatch_ VCount; * RBase_ RBase[Gray0], Call[GetGray]; * Exit: T = complement of gray word * RBase = BBRegs * Note: returns Gray(n mod 4), where n is the number of scan lines remaining * after the current scan line. Note that VCount = n-1. *----------------------------------------------------------- Subroutine; KnowRBase[Gray0]; T_ NOT (Gray1), RBase_ RBase[BBRegs], Return, DispTable[10], Global; KnowRBase[Gray0]; T_ NOT (Gray2), RBase_ RBase[BBRegs], Return; KnowRBase[Gray0]; T_ NOT (Gray3), RBase_ RBase[BBRegs], Return; KnowRBase[Gray0]; T_ NOT (Gray0), RBase_ RBase[BBRegs], Return; KnowRBase[Gray0]; T_ NOT (Gray1), RBase_ RBase[BBRegs], Return; KnowRBase[Gray0]; T_ NOT (Gray2), RBase_ RBase[BBRegs], Return; KnowRBase[Gray0]; T_ NOT (Gray3), RBase_ RBase[BBRegs], Return; KnowRBase[Gray0]; T_ NOT (Gray0), RBase_ RBase[BBRegs], Return; *----------------------------------------------------------- TouchSrc: * Touches every page of the source item for this scan line. * Entry: BBSrc = address of first word of source item * T = SRast = words per source scan line; negative => bottom-to-top * SrcInc = distance between last word of one scan line and * first word of the next; negative => bottom-to-top * MemBase = BBSrcBR * Exit: T and BBTemp clobbered * Method: touch first and last words of the item. If the item is >401B * words long, also touch intermediate words at multiples of 400B words from * the last word of the item. * Note: |SRast-SrcInc| gives the number of words touched in the item; * the value is positive if moving left-to-right, negative if right-to-left. * Timing: 5 cycles for first page or less * +1 cycle if right-to-left * +2 cycles per additional page *----------------------------------------------------------- Subroutine; T_ T-(SrcInc)-1; BBTemp_ (Fetch_ BBSrc)+T, DblBranch[TouchRtoL, TouchLtoR, ALU<0]; *----------------------------------------------------------- TouchDst: * Touches every page of the destination item for this scan line. * Entry: BBDst = address of first word of destination item * T = DRast = words per destination scan line; negative => bottom-to-top * DstInc = distance between last word of one scan line and * first word of the next; negative => bottom-to-top * MemBase = BBDstBR * Exit: T and BBTemp clobbered * Timing: 5 cycles for first page or less * +1 cycle if right-to-left * +2 cycles per additional page *----------------------------------------------------------- Subroutine; T_ T-(DstInc)-1; BBTemp_ (Fetch_ BBDst)+T, DblBranch[TouchRtoL, TouchLtoR, ALU<0]; TouchLtoR: * T has (# words in source item)-1, BBTemp points to last (rightmost) word. T_ T-(400C)-1; BBTemp_ (Fetch_ BBTemp)-(400C), Branch[.+2, ALU<0]; T_ T-(400C), Branch[.-1]; Return; TouchRtoL: * T has -(# words in source item)-1, BBTemp points to last (leftmost) word -2. BBTemp_ (BBTemp)+(2C); * Fix the off-by-2 address T_ T+(400C)+1; * Should be T+402, but no matter BBTemp_ (Fetch_ BBTemp)+(400C), Branch[.+2, ALU>=0]; T_ T+(400C), Branch[.-1]; Return;