:TITLE[BITBLT];

%Last edited by E. Fiala 29 April 1980

BBTable format
WORD    NAME
 0      Function                bit 0 Long Bitblt; bits 14-17 function
 1      unused
 2      DBCA        Dest BCA    Base Core Address of dest bit map
 3      DBMR        Dest BMR    Bit Map Raster width in words (>=0)
 4      DLX         Dest LX     Block's Left X offset from 1st bit of scan-line (>= 0)
 5      DTY         Dest TY     Block's Top Y offset from 1st scan-line (>=0)
 6      DW          Dest W      Width in bits of block (>=0)
 7      DH          Dest H      Height in scan-lines of block (>=0)
10      SBCA        Src BCA
11      SBMR        Src BMR     (>=0??)
12      SLX         Src LX      (>=0)
13      STY         Src TY      (>=0)
14      Gray0       These four words are the Gray Block
15      Gray1       Gray0 is used on the 1st item,
16      Gray2       Gray1 on the 2nd, Gray2 on the 3rd,
17      Gray3       Gray3 on the 4th, Gray0 on the 5th, etc.
20      LongSrcLo   This pair instead of MDShi/SBCA if long
21      LongSrcHi
22      LongDestLo  This pair instead of MDShi/DBCA if long
23      LongDestHi

BitBlt functions:
[X=0 uses BBFB (Dest & Mask'), X=1 uses BBFBX (Dest unmasked)]
[MA=0 causes Src & Mask, MA=1 causes Src or Mask']

CODE  MA  X  SAlufOp    Action                  Computation
 0    0   0  R or T     S                       (D & M') or (S & M)
 1    0   1  R or T     D or S                  D or (S & M)
 2    0   1  R # T      D # S                   D # (S & M)
 3    0   1  R & T'     D & S'                  D & (S & M)'
 4    1   0  R or T'    S'                      (D & M') or (S or M')'
 5    1   1  R or T'    D or S'                 D or (S or M')'
 6    1   1  R # T'     D # S'                  D # (S or M')'
 7    1   1  R & T      D & S                   D & (S or M')
10    0   1  R or T     (D & S') or (S & G)     (D & (S & M)') or (S & M & G)
11    0   1  R or T     D or (S & G)            D or (S & M & G)
12    0   1  R # T      D # (S & G)             D # (S & M & G)
13    0   1  R & T'     D & (S & G)'            D & (S & M & G)'
14    0   0  R or T     G                       (D & M') or (G & M)
15    0   1  R or T     D or G                  D or (G & M)
16    0   1  R # T      D # G                   D # (G & M)
17    0   1  R & T'     D & G'                  D & (G & M)'

bbFunction is in the following format:
00      mesa long pointer during early initialization;
        L-to-R = 0 / R-to-L = 1 after early init.
01-03   unused
04-07   which-innerloop index
10-13   unused
14-17   Bitblt function code
%

*BBFA dispatch values
Set[bbItem,3];          *item refill
Set[bbSDRef,4];         *source and destination refill
Set[bbSRef,5];          *source refill
Set[bbDRef,6];          *destination refill
Set[bbNoRef,7];         *no refill

Set[bbILtype0,00];      *functions 1-3 and 5-7
Set[bbILtype1,02];      *functions 0 and 4
Set[bbILtype2,04];      *function 10
Set[bbILtype3,06];      *functions 11-13
Set[bbILtype4,10];      *function 14
Set[bbILtype5,12];      *functions 15-17

OnPage[bbPage];

*Mesa entry here with StkP .eq. 2, Stack holding 2*scan-lines completed, and
*Stack0 in AC2 (a pointer to the BitBlt table)
MesaBitBLT:
        T _ MDShi;
        AC2hi _ T;              *Long pointer to BitBLT table in AC2,AC2hi
        T _ (Cycle&PCXF) or (100000C);
*Alto entry here with StkP pointing at AC1 (2*scan-lines completed),
*(Cycle&PCXF) and not (100000C) in T, and pointer to BitBlt table in AC2
*(a base reg.); the BitBlt table is known to start at an even address.
bbBitBlt:
        PFetch2[AC2,bbItemWid,6], Task;         *fetch dw and dh
        bbGray1 _ (bbGray1) or not (0C);        *-1=don't touch pages; 0=do
        AC0 _ T, Skip[BPCChk'];                 *Save PCF and Mesa/Alto flag in AC0
        PC _ (PC) + (4C);                       *Advance PC on imminent refill
        PFetch1[AC2,bbFunction,0];              *Even
        bbNegBitsLeft _ Zero;                   *0 L-to-R, variable R-to-L
*Setup bbSrcWLo/bbSrcQHi and bbDestWLo/bbDestQHi double-precision; in
*short mode, sbca/dbca are fetched into bbSrcWLo/bbDestWLo and MDShi rsh 8 is
*copied into bbSrcQHi/bbDestQHi; in long mode the long pointers are fetched
*into bbSrcQLo/Hi and bbDestQLo/Hi and then the Lo parts are copied into
*bbSrcWLo and bbDestWLo, respectively.
        PFetch1[AC2,bbDBMR,3];
        T _ Rsh[Stack,1];                       *T _ scan-lines completed
        bbItemsLeft _ (bbItemsLeft) - T - 1;
        PFetch1[AC2,bbSBMR,11], Goto[bbExit,Alu<0];     *exit if no items
        T _ bbItemWid;
***Maybe this check should be for item width > 0
        bbNegItemWid _ (Zero) - T, Skip[Alu#0];
        PCF _ AC0, DblGoto[bbMDone,bbNDone,R<0];        *Exit if width=0
        PFetch2[AC2,bbDlx,4];                   *fetch dlx and dty
        bbFunction _ (bbFunction) and not (177400C), Goto[bbShort,R>=0];
        T _ 22C, Task;
        PFetch2[AC2,bbDestQLo];
        T _ 20C;
        PFetch2[AC2,bbSrcQLo], Goto[bbDir];

bbShort:
        PFetch1[AC2,bbDestQLo,2];               *bbDestQLo _ dbca
        T _ Rsh[MDShi,10];
        bbDestQHi _ T;
        PFetch1[AC2,bbSrcQLo,10], Task;         *bbSrcQLo _ sbca
        bbSrcQHi _ T;

%Determination of BitBlt directions (T=Top, B=Bottom, L=Left, R=Right):
T-to-B, L-to-R is fastest, so do that when the source is not used;
R-to-L is slowest, using multiple item setups/scan-line, so avoid it
when possible.
  B-to-T, L-to-R        (dty > sty)
  T-to-B, R-to-L        (dty = sty) and (dlx - slx >= 100b) and
                        (item >= 100b wide) and (slx+scanlinewidth > dlx)
  T-to-B, L-to-R        (dty < sty) or (dty = sty) when not doing R-to-L
This dangerously assumes overlap is impossible unless DBCA .eq. SBCA (or the
equivalent in long mode) and no check is made for this equality.

Timing from bbBitBlt to here: 47 short, 48 long (cycles).
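
Worked example of the direction rules above (illustrative values that are
not from the original source; all numbers octal; "scanlinewidth" is read
here as the item width dw, which is an assumption based on the "width <
non-overlap" comment in the code that follows):
  dty = sty, slx = 100b, dlx = 200b, dw = 300b
  dlx - slx = 100b      (>= 100b)
  dw        = 300b      (>= 100b wide)
  slx + dw  = 400b      (> dlx = 200b, so the source and destination
                         spans overlap horizontally)
  All three conditions hold, so this case would go T-to-B, R-to-L.
  With dlx = 120b instead, dlx - slx = 20b < 100b and the transfer
  would stay T-to-B, L-to-R.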
%

bbDir:  Dispatch[bbFunction,14,4], Call[bbFnSetup];     *Odd
        SAluf _ T, T _ bbDBMR;
        bbGray2 _ T, Goto[.+3,MB'];
        T _ Rsh[Stack,1];
        bbDty _ (bbDty) + T, Goto[bbDestInit];  *T-to-B if no source
        T _ bbDty;
        LU _ (bbSty) - T;                       *calc sty - dty
        T _ bbSBMR, FreezeResult;
        bbGray _ T, FreezeResult, Goto[bbTtoB,Alu>=0];
        bbSBMR _ (Zero) - T;                    *Odd; B-to-T if SrcY < DestY
        T _ bbDBMR, Task;
        bbDBMR _ (Zero) - T;
        T _ bbItemsLeft, Goto[bbGenlInit];

*SrcY >= DestY: will go T-to-B
bbTtoB: T _ bbSlx, Skip[Alu=0];
        T _ Rsh[Stack,1], Goto[bbGenlInit];     *L-to-R if SrcY > DestY
*SrcY = DestY
        bbGray1 _ Zero;                         *Force pages to be touched
        T _ (bbDlx) - T;                        *T _ dlx - slx
        LU _ (Lsh[AllOnes,6]) and T, Skip[ALU>=0];
        T _ Rsh[Stack,1], Goto[bbGenlInit];     *L-to-R if DestX < SrcX
*DestX >= SrcX; bbNegSDNonOverlap _ - (DestX - SrcX)
        bbNegSDNonOverlap _ (Zero) - T, Skip[Alu#0];    *Even
        T _ Rsh[Stack,1], Goto[bbGenlInit];     *L-to-R if (DestX - SrcX) < 100b
        LU _ LdF[bbItemWid,0,12];
        LU _ (bbItemWid) - T, Skip[Alu#0];
        T _ Rsh[Stack,1], Goto[bbGenlInit];     *L-to-R if item < 100b long
        T _ Rsh[Stack,1], Goto[bbGenlInit,Carry'];      *or width < non-overlap
        bbFunction _ (bbFunction) or (100000C), Goto[bbGenlInit];       *R-to-L

*General init: T has items completed if T-to-B, items remaining if B-to-T.
*Time: (78 to 82 B-to-T), (74 to 91 T-to-B, L-to-R), (95 R-to-L) + 1 if long
bbGenlInit:
        bbDty _ (bbDty) + T;                    *Even
**Save 3 mi both here and in dest init by using a multiply loop that is
**9*nzeroes to the right + 13*nones (i.e., slows init about 5 + 2 cycles/bit)
        LU _ bbGray;
        bbSrcWLo _ Zero, Goto[bbAddSF,Alu=0];   *sbmr=0 not impossible
        T _ bbSty _ (bbSty) + T, Call[bbYHi0];
*bbSrcQLo/QHi + (sty*sbmr) + (slx rsh 4); product may be > 16 bits
        bbGray _ Rsh[bbGray,1], Goto[bbNoAddS,R Even];
*Multiply timing: 6*nzeroes right of the left-most one + 14*nones in sbmr.
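*Worked example of the source address arithmetic described above
*(illustrative values that are not from the original source; all numbers
*octal; assumes no scan-lines have been completed yet, so sty is the value
*from the BitBlt table):
*  sbmr = 40b, sty = 3, slx = 45b
*  sty*sbmr  = 3*40b = 140b words
*  slx rsh 4 = 2 words; slx & 17b = 5 is the bit offset within that word
*  word displacement added to bbSrcQLo/QHi = 140b + 2 = 142b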
        bbSrcQLo _ (bbSrcQLo) + T, Goto[.+3,Alu#0];
        T _ (bbYHi) + 1, UseCOutAsCIn;
        bbSrcQHi _ (bbSrcQHi) + T, Goto[bbAddSF];
        T _ (bbYHi) + 1, UseCOutAsCIn;
        bbSrcQHi _ (bbSrcQHi) + T;
bbNoAddS:
        T _ bbSty _ Lsh[bbSty,1], DblGoto[bbLshyHi1,bbLshyHi0,R<0];

bbAddSF:
        T _ RHMask[bbSrcQHi];                   *Even
        bbSrcQHi _ (Lsh[bbSrcQHi,10]) + T + 1;  *bbSrcQHi in base reg. format
        T _ LdF[bbSlx,14,4], Call[bbNegIWSub];
*bbSlast _ (Slx & 17) + ItemWid - 1 = displacement to last bit of scan-line
*Add (slx rsh 4) and copy WLo into QLo; point bbSlx at bit in 1st quadword.
        bbSlast _ (Zero) - T - 1, Call[bbSBWQ];

*Time to here: (1 if long) + {(68 if no source) else multiply time +
* [(107 to 112 B-to-T), (101 to 119 T-to-B, L-to-R), (122 to 123 R-to-L)]}
bbDestInit:
        LU _ bbGray2;
        bbDestWLo _ Zero, Goto[bbAddDF,Alu=0];
        T _ bbDty, Call[bbYHi0];
*bbDestQLo/QHi + (dty*dbmr) + (dlx rsh 4); product may be > 16 bits
        bbGray2 _ Rsh[bbGray2,1], Goto[bbNoAddD,R Even];
        bbDestQLo _ (bbDestQLo) + T, Goto[.+3,Alu#0];
        T _ (bbYHi) + 1, UseCOutAsCIn;
        bbDestQHi _ (bbDestQHi) + T, Goto[bbAddDF];
        T _ (bbYHi) + 1, UseCOutAsCIn;
        bbDestQHi _ (bbDestQHi) + T;
bbNoAddD:
        T _ bbDty _ Lsh[bbDty,1], DblGoto[bbLshyHi1,bbLshyHi0,R<0];

bbAddDF:
        T _ RHMask[bbDestQHi];                  *Even
        bbDestQHi _ (Lsh[bbDestQHi,10]) + T + 1;
        PFetch2[AC2,bbGray2,16], Call[bbDBWQ];
        T _ LdF[bbDlx,14,4], Call[bbNegIWSub];  *offset to last scan-line bit
        PFetch2[AC2,bbGray,14], Skip[MB];
        MNBR _ bbNegItemWid, Goto[bbTchS];
        MNBR _ bbNegItemWid, Goto[bbNoTchS];

%Time to bbTchD/S: (1 if long) + [(103 to 104 if no source) else
(144 to 150 B-to-T), (136 to 155 T-to-B, L-to-R), (157 to 159 R-to-L)]
To this add entry/exit overhead less time between bbItemRefill and bbTchD/S:
Alto--16 + 16; Alto Mesa--10 + 30; Pilot Mesa--10 + 15
(-13 T-to-B), (-17 B-to-T), (-10 no source).
Total time: [152 to 158 B-to-T, 148 to 167 T-to-B L-to-R, 169 to 171 R-to-L,
118 to 119 no source] + [0 Pilot, 7 Alto, or 15 Alto Mesa] + [1 if long].
To this add about 60 cycles/multiply (1 multiply if no source else 2).
%

bbyHi0: bbYHi _ 0C, Return;
bbLshyHi0:
        bbYHi _ Lsh[bbYHi,1], Return;
bbLshyHi1:
        bbYHi _ (Lsh[bbYHi,1]) + 1, Return;

bbNegIWSub:
        T _ (bbNegItemWid) - T;
        T _ (bbGray1) or T, Skip[R Odd];
        T _ (Lsh[AllOnes,1]) and T;             *Make bbLast be odd
        bbDlast _ (Zero) - T - 1, Return;       *Mask bbLast be 0

*T[08] _ MA', T[09] _ MB, T[10:15] _ Alu op
*The MB branch condition is used to indicate "no source."
bbFnSetup:
        PFetch2[AC2,bbSlx,12], Disp[.+1];       *fetch slx and sty
        bbFunction _ 1000C, At[bbF,0];          *0,R or T
bbOr:   T _ 204C, Return, At[bbF,1];            *0,R or T
bbX:    T _ 263C, Goto[bbForceTch], At[bbF,2];  *0,R xor T
bbAN:   T _ 227C, Return, At[bbF,3];            *0,R & T'
        bbFunction _ 1000C, At[bbF,4];          *1,R or T'
        T _ 074C, Return, At[bbF,5];            *1,R or T'
        T _ 054C, Goto[bbForceTch], At[bbF,6];  *1,R xnor T
        T _ 056C, Return, At[bbF,7];            *1,R & T
        bbFunction _ 2000C, Goto[bbOr], At[bbF,10];     *0,R or T
        bbFunction _ 3000C, Goto[bbOr], At[bbF,11];     *0,R or T
        bbFunction _ 3000C, Goto[bbX], At[bbF,12];      *0,R xor T
        bbFunction _ 3000C, Goto[bbAN], At[bbF,13];     *0,R & T'
        T _ 304C, Goto[bbTR4], At[bbF,14];      *0,R or T
        T _ 304C, Goto[bbTR5], At[bbF,15];      *0,R or T
        T _ 363C, Goto[bbTR5Tch], At[bbF,16];   *0,R xor T
        T _ 327C, Goto[bbTR5], At[bbF,17];      *0,R & T'

bbTR5:  bbFunction _ 5000C, Return;             *type 5; no source
bbTR4:  bbFunction _ 4000C, Return;             *type 4; no source
bbTR5Tch:
        bbFunction _ 5000C;                     *type 5; touch pages
bbForceTch:
        bbGray1 _ Zero, Return;

%Approx.
item refill times starting at bbItemRefill are below; the inner-loop
dependent constant given in the comments before the inner loops must be
added to these and the time is 1 (Alto) or 9 (Mesa) cycles greater if
interrupts are disabled:
  L-to-R, src used:     58 (T-to-B) or 61 (B-to-T) cycles
                        [+ 12 + 11*(NDestPages+NSrcPages-2) if xor/xnor
                        functions or sty=dty]
  L-to-R, src unused:   41 [+ 5 + 11*(NDestPages-1) if xor function]
  R-to-L:               11*(NDestPages+NSrcPages) + 93 cycles
  R-to-L, continuing item: 62 cycles, 47 or 63 on last continuation

When both src and dest are used the no-refill case will occur at most 6
times followed by a src-refill, dest-refill, or item-refill. If src and
dest are word-aligned, at most 3 no-refill loops will occur followed by a
src-dest-refill or item refill. When only the dest is used, at most 3
no-refill loops occur followed by a src-dest-refill or item refill.
%

bbSrcFetch:
        PFetch4[bbSrcQLo,bbSrc], Return;

*Note: Doing PStore4 first allows both PFetch4's to be launched before
*transport for either occurs. If the PFetch4 for the src were done first,
*the PFetch4 for the dest could not be launched before transport for both
*preceding references had finished.
bbSrcDestRfl:
        PStore4[bbDestQLo,bbDest];
        T _ bbSrcWLo _ (bbSrcWLo) + (4C);
        PFetch4[bbSrcQLo,bbSrc], Skip;
bbDestRfl:
        PStore4[bbDestQLo,bbDest];
        T _ bbDestWLo _ (bbDestWLo) + (4C);
        PFetch4[bbDestQLo,bbDest], Return;

*The bbIA, bbIB, bbIE, and bbIF dispatch tables could be united by revising
*the SrcDestRefill subroutine to check MB; this would save 12b mi but slow
*inner loops by 2 to 3 cycles.
bbInnerLoops:
*functions 1-3 and 5-7; refill times: i=4+I, sd=28, s=20, d=26, n=4
bbIA1:  bbDlx _ (DB _ bbDlx) + T, Call[bbFixG], At[bbI,bbILtype0];
        T _ BBFA[SB[bbSrc]] or T;
bbIA2:  DB[bbDest] _ BBFBX[DB[bbDest]] SAlufOp T, Disp[.+1];
bbItemRefill:
        T _ bbDestWLo, Goto[bbItemRfl], At[bbIA,bbItem];
bbSrcDestRefill:
        T _ bbDestWLo, Goto[bbSrcDestRfl], At[bbIA,bbSDRef];
bbSrcRefill:
        T _ bbSrcWLo _ (bbSrcWLo) + (4C), Goto[bbSrcFetch], At[bbIA,bbSRef];
bbDestRefill:
        T _ bbDestWLo, Goto[bbDestRfl], At[bbIA,bbDRef];
        T _ BBFA[SB[bbSrc]] or T, Goto[bbIA2], At[bbIA,bbNoRef];

*functions 0 and 4; refill times: i=4+I, sd=32, s=18, d=26, n=4
bbIB1:  bbDlx _ (DB _ bbDlx) + T, Call[bbFixG], At[bbI,bbILtype1];
        T _ BBFA[SB[bbSrc]] or T;
bbIB2:  DB[bbDest] _ BBFB[DB[bbDest]] SAlufOp T, Disp[.+1];
        T _ bbDestWLo, Goto[bbItemRefill], At[bbIB,bbItem];
        T _ bbDestWLo, Goto[bbSrcDestRfl], At[bbIB,bbSDRef];
        T _ bbSrcWLo _ (bbSrcWLo) + (4C), Goto[bbSrcFetch], At[bbIB,bbSRef];
        T _ bbDestWLo, Goto[bbDestRfl], At[bbIB,bbDRef];
        T _ BBFA[SB[bbSrc]] or T, Goto[bbIB2], At[bbIB,bbNoRef];

*function 10; refill times: i=8+I, sd=36, s=22, d=30, n=8
bbIC1:  bbDlx _ (DB _ bbDlx) + T, Call[bbFixG], At[bbI,bbILtype2];
        T _ BBFA[SB[bbSrc]] or T;
        DB[bbDest] _ (DB[bbDest]) and not T, Disp[.+1];
        T _ PCF[bbGray] and T, Goto[bbICi], At[bbIC,bbItem];
        T _ PCF[bbGray] and T, Goto[bbICsd], At[bbIC,bbSDRef];
        T _ PCF[bbGray] and T, Goto[bbICs], At[bbIC,bbSRef];
        T _ PCF[bbGray] and T, Goto[bbICd], At[bbIC,bbDRef];
        T _ PCF[bbGray] and T, Goto[bbICr], At[bbIC,bbNoRef];

*functions 11-13; refill times: i=6+I, sd=34, s=20, d=28, n=6
bbID1:  bbDlx _ (DB _ bbDlx) + T, Call[bbFixG], At[bbI,bbILtype3];
        T _ BBFA[SB[bbSrc]] or T;
        T _ PCF[bbGray] and T, Disp[.+1];
bbICi:  DB[bbDest] _ BBFBX[DB[bbDest]] SAlufOp T, Goto[bbItemRefill],
                At[bbID,bbItem];
bbICsd: DB[bbDest] _ BBFBX[DB[bbDest]] SAlufOp T, Goto[bbSrcDestRefill],
                At[bbID,bbSDRef];
bbICs:  DB[bbDest] _ BBFBX[DB[bbDest]] SAlufOp T, Goto[bbSrcRefill],
                At[bbID,bbSRef];
bbICd:  DB[bbDest] _ BBFBX[DB[bbDest]] SAlufOp T, Goto[bbDestRefill],
                At[bbID,bbDRef];
bbICr:  DB[bbDest] _ BBFBX[DB[bbDest]] SAlufOp T, Return, At[bbID,bbNoRef];

*function 14; refill times: i=4+I, sd=26, s=never, d=never, n=4
bbIE1:  bbDlx _ (DB _ bbDlx) + T, Call[bbFixG], At[bbI,bbILtype4];
        T _ BBFA[PCF[bbGray]];
bbIE2:  DB[bbDest] _ BBFB[DB[bbDest]] SAlufOp T, Disp[.+1];
        T _ bbDestWLo, Goto[bbItemRflNS], At[bbIE,bbItem];
        T _ bbDestWLo, Goto[bbDestRfl], At[bbIE,bbSDRef];
        T _ BBFA[PCF[bbGray]], Goto[bbIE2], At[bbIE,bbNoRef];

*functions 15-17; refill times: i=4+I, sd=26, s=never, d=never, n=4
bbIF1:  bbDlx _ (DB _ bbDlx) + T, Call[bbFixG], At[bbI,bbILtype5];
        T _ BBFA[PCF[bbGray]];
bbIF2:  DB[bbDest] _ BBFBX[DB[bbDest]] SAlufOp T, Disp[.+1];
        T _ bbDestWLo, Goto[bbItemRflNS], At[bbIF,bbItem];
        T _ bbDestWLo, Goto[bbDestRfl], At[bbIF,bbSDRef];
        T _ BBFA[PCF[bbGray]], Goto[bbIF2], At[bbIF,bbNoRef];

bbFixG: PCF _ Stack, BBFBX, Return;

bbItemRflNS:
        PStore4[bbDestQLo,bbDest], Call[bbCntI];
        LU _ NWW, DblGoto[bbIntOff,bbIntOn,R<0];

*Worst case time to return from bbIR1 is 42 (src used).
bbItemRfl:
        PStore4[bbDestQLo,bbDest], Call[bbIR1];
*Test for interrupts and done:
*Item refill time: 12.
        LU _ NWW, Skip[R>=0];
bbIntOff:
        bbItemsLeft _ (bbItemsLeft) - 1, Skip;
bbIntOn:
        bbItemsLeft _ (bbItemsLeft) - 1, Skip[Alu#0];
        T _ bbDBMR, DblGoto[bbAdvD,bbExit1,Alu>=0];
        LU _ PCF _ AC0, Skip[Alu>=0];
        DblGoto[bbMDone,bbNDone,Alu<0];
        LU _ xfWDC, DblGoto[bbMesaInt,bbNovaInt,Alu<0];

bbIR1:  T _ bbNegBitsLeft, Goto[bbRtoLCont,R<0];
bbCntI: Stack _ (Stack) + (2C), Return;

*At bbMesaInt and bbNovaint, we have committed to taking an interrupt;
*control can only get back by restarting the opcode.
*Enter MIPend knowing it will not return.
bbMesaInt:
        IntType _ 1C, Skip[Alu#0];      *IntType _ 1 (PC backup if stopping)
        T _ bbDBMR, Goto[bbAdvD];
        LoadPage[prPage];
**Worst case time since return is 17 cycles at entry to MIPend
        T _ (SStkP&NStkP) xor (377C), GotoP[MIPend];    *T _ StkP

*Since the Nova only checks for interrupts during jumps, we simulate JMP .
**Could call intEnt here and handle the (rare) return, if WW, ACTIVE, and
**DMA (used by intEnt) did not clobber BitBlt registers other than
**bbSrc and bbDest.
bbNovaInt:
        T _ PCF[RZero], LoadPage[nePage], Goto[bbNExit];
*       LoadPage[xoPage];
*       Call[intEnt];
*       T _ bbDBMR, Goto[bbAdvD];

bbNDone:
        T _ PCF[RZero] + 1, LoadPage[nePage];
bbNExit:
        PFetch4[PC,IBuf], GotoP[JmpFin];

bbMDone:
        PFetch4[PC,IBuf,0];             *Restore IBuf
:IF[AltoMode]; ******************************
        Stack&-2, LoadPage[0];
        PFetch4[LOCAL,LocalCache0,4], Call[SwapBytes];
*In Alto mode, the opcode after BitBlt is the even byte of the next word,
*so if BitBlt occurs on an even byte, the next byte must be skipped.
        Cycle&PCXF, LoadPage[4], Skip[R Even];
        CSkipData, GotoP[P4Tail];       *Won't cause refill
        GotoP[P4Tail];
:ELSE; **************************************
        Stack&-2, LoadPage[4];
        PFetch4[LOCAL,LocalCache0,4], GotoP[P4Tail];
:ENDIF; *************************************

bbExit: PCF _ AC0, DblGoto[bbMDone,bbNDone,R<0];
bbExit1:
        PCF _ AC0, DblGoto[bbMDone,bbNDone,R<0];

*Item refill time: 16 (no src), 18 (src used)
*bbDestW _ bbDestQ _ bbDestW + bbDBMR
bbAdvD: T _ bbDestQLo _ (bbDestQLo) + T, Goto[bbAD1,Alu>=0];    *Even
        MNBR _ bbNegItemWid, Skip[Carry];
        bbDestQHi _ (bbDestQHi) - (400C) - 1;
        bbDestWLo _ Zero;
*bbSrcW _ bbSrcQ _ bbSrcW + bbSBMR (we know src is used for B-to-T)
        T _ bbSBMR;
        bbSrcQLo _ (bbSrcQLo) + T, Call[bbASn];

%Avoid page touching and simply refill bbSrc/bbDest except when either a
non-restartable function is being executed (xor, xnor) or dty=sty, in which
case there is possible src-dest overlap.
When touching, begin with the last page of the scan-line and finish with a
PFetch4 of the 1st quadword. Initial displacement is
[(ItemWidth - 1 + (startbit & 17)) rsh 4] + non-page bits of start word,
where the first term is computed during initialization. The choice to
touch/not-touch is indicated in bbSLast/bbDLast which contain 0 when not
touching or some odd value when touching.
%

bbTchS: T _ Rsh[bbSlast,4], Goto[bbNTS,R Even];
bbTS:   T _ (RHMask[bbSrcQLo]) + T;             *Odd
        T _ (Lsh[AllOnes,10]) and T, Call[.+2];
        T _ (Lsh[AllOnes,10]) + T;
        PFetch4[bbSrcQLo,bbSrc], Skip[Alu=0];   *Even
        Return;
        bbSrcWLo _ Zero;
        SB _ bbSlx, Goto[bbTchD];

*Can eliminate these 3 mi and move the bbNTS label at cost of 2 cycles.
bbNTS:  PFetch4[bbSrcQLo,bbSrc];                *Even
        SB _ bbSlx;
        bbSrcWLo _ Zero, Goto[bbNTD];

bbAD1:  MNBR _ bbNegItemWid, Skip[Carry'];      *Even
        bbDestQHi _ (bbDestQHi) + (400C) + 1;
bbNS1:  bbDestWLo _ Zero, Goto[bbNoTchS,MB];
        T _ bbSBMR;
        T _ bbSrcQLo _ (bbSrcQLo) + T, Call[bbASp];
        T _ Rsh[bbSlast,4], DblGoto[bbNTS,bbTS,R Even];

bbNoTchS:
        SB _ bbDlx;
bbTchD: T _ Rsh[bbDlast,4], Goto[bbNTD,R Even]; *Odd
        T _ (RHMask[bbDestQLo]) + T;            *Odd
        T _ (Lsh[AllOnes,10]) and T, Call[.+2];
        T _ (Lsh[AllOnes,10]) + T;
bbNTD:  PFetch4[bbDestQLo,bbDest], Skip[Alu=0]; *Even
        Return;
*Item refill time to here: 33 + [5 + 11*(NDPages-1) if xor function] (no src)
*50 (T-to-B), 53 (B-to-T) +
*[12 + 11*(NDPgs+NSPgs-2) if xor/xnor functions or sty=dty]
        Dispatch[bbFunction,4,4], Goto[bbItemSetup,R>=0];
*New R-to-L item
        T _ bbNegItemWid;                       *Odd
        T _ (MNBR _ bbNegSDNonOverlap) - T, Goto[bbRtoLNew];    *T _ bits left

bbRtoLCont:                                     *Here when continuing R-to-L item.
        LU _ (MNBR _ bbNegSDNonOverlap) - T;    *Odd
        bbDestWLo _ Zero, Skip[Carry];
        T _ MNBR _ bbNegBitsLeft;
*Worst case time from Return to here is 41 (continue), 34 on new item.
*Initially T will contain ItemWidth-SDNonOverlap; subsequent iterations
*T will contain -SDNonOverlap until the last iteration for the item, when
*T will contain -BitsLeft.  bbNegBitsLeft is 0 at the beginning of each item.
bbRtoLNew:
        bbNegBitsLeft _ (bbNegBitsLeft) - T, Call[bbSlxFix];    *Odd
        bbDlx _ (LdF[bbDlx,14,4]) + T, Call[bbSBWQ];
        SB _ bbSlx;
        PFetch4[bbSrcQLo,bbSrc,0], Call[bbDBWQ];
        PFetch4[bbDestQLo,bbDest,0];
        bbSrcWLo _ Zero;
        Dispatch[bbFunction,4,4];
bbItemSetup:
        T _ Lsh[bbDBMR,4], Disp[bbInnerLoops];  *Even

%bbSBWQ and bbDBWQ are used both by initialization and in the R-to-L case.
bbASn/bbASp are called by item refill to advance QLo/QHi by a signed word
displacement, bbSBMR. The equivalent code in bbDBWQ is open-coded for item
refill.
%
bbSBWQ: T _ Rsh[bbSlx,4], Skip[R>=0];
        T _ (Lsh[AllOnes,14]) or T;
        T _ bbSrcQLo _ (bbSrcQLo) + T, Goto[bbASp,Alu>=0];
bbASn:  T _ Lsh[bbSrcQLo,4], Skip[Carry];
        bbSrcQHi _ (bbSrcQHi) - (400C) - 1;
        bbSlx _ (LdF[bbSlx,14,4]) + T, Return;
bbASp:  T _ Lsh[bbSrcQLo,4], Skip[Carry'];
        bbSrcQHi _ (bbSrcQHi) + (400C) + 1;
bbSlxFix:
        bbSlx _ (LdF[bbSlx,14,4]) + T, Return;

bbDBWQ: T _ Rsh[bbDlx,4], Skip[R>=0];
        T _ (Lsh[AllOnes,14]) or T;
        T _ bbDestQLo _ (bbDestQLo) + T, Goto[bbADp,Alu>=0];    *Even
        T _ Lsh[bbDestQLo,4], Skip[Carry];
        bbDestQHi _ (bbDestQHi) - (400C) - 1;
        bbDlx _ (LdF[bbDlx,14,4]) + T, Return;
bbADp:  T _ Lsh[bbDestQLo,4], Skip[Carry'];
        bbDestQHi _ (bbDestQHi) + (400C) + 1;
bbDlxFix:
        bbDlx _ (LdF[bbDlx,14,4]) + T, Return;

:END[BitBlt];