:TITLE[BITBLT];
%Last edited by E. Fiala 29 April 1980
BBTable format (word offsets are octal):
WORD  NAME        FIELD     MEANING
0     Function              bit 0 = Long BitBlt; bits 14-17 = function
1     unused
2     DBCA        Dest BCA  Base Core Address of dest bit map
3     DBMR        Dest BMR  Bit Map Raster width in words (>= 0)
4     DLX         Dest LX   Block’s Left X offset from 1st bit of scan-line (>= 0)
5     DTY         Dest TY   Block’s Top Y offset from 1st scan-line (>= 0)
6     DW          Dest W    Width in bits of block (>= 0)
7     DH          Dest H    Height in scan-lines of block (>= 0)
10    SBCA        Src BCA
11    SBMR        Src BMR   (>= 0??)
12    SLX         Src LX    (>= 0)
13    STY         Src TY    (>= 0)
14    Gray0                 These four words are the Gray Block.
15    Gray1                 Gray0 is used on the 1st item,
16    Gray2                 Gray1 on the 2nd, Gray2 on the 3rd,
17    Gray3                 Gray3 on the 4th, Gray0 on the 5th, etc.
20    LongSrcLo             This pair is used instead of MDShi/SBCA if long
21    LongSrcHi
22    LongDestLo            This pair is used instead of MDShi/DBCA if long
23    LongDestHi
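A C view of the same layout may help when reading the PFetch displacements in the code (2 for DBCA, 6 for DW/DH, 12 for SLX/STY, 14 and 16 for the gray words, 20 and 22 for the long pointers). This is a sketch only: the struct, macro, and helper names are hypothetical, not from any Xerox header, and it assumes 16-bit words with no padding between uint16_t members, which holds on common compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the BBTable; offsets in comments are octal words. */
typedef struct {
    uint16_t function;      /* 0:  bit 0 = long BitBlt, bits 14-17 = function */
    uint16_t unused;        /* 1 */
    uint16_t dbca;          /* 2:  dest base core address */
    uint16_t dbmr;          /* 3:  dest raster width in words */
    uint16_t dlx;           /* 4:  dest left x (bits) */
    uint16_t dty;           /* 5:  dest top y (scan-lines) */
    uint16_t dw;            /* 6:  width in bits */
    uint16_t dh;            /* 7:  height in scan-lines */
    uint16_t sbca;          /* 10 */
    uint16_t sbmr;          /* 11 */
    uint16_t slx;           /* 12 */
    uint16_t sty;           /* 13 */
    uint16_t gray[4];       /* 14-17: the Gray Block */
    uint16_t long_src_lo;   /* 20 */
    uint16_t long_src_hi;   /* 21 */
    uint16_t long_dest_lo;  /* 22 */
    uint16_t long_dest_hi;  /* 23 */
} BBTable;

/* Word offset of a field, for comparison with the PFetch displacements. */
#define BB_WORD(field) (offsetof(BBTable, field) / sizeof(uint16_t))

/* Gray0 on the 1st item, Gray1 on the 2nd, ..., Gray0 again on the 5th.
   item_index counts from 0. */
static uint16_t bb_gray_for_item(const BBTable *t, unsigned item_index) {
    return t->gray[item_index & 3u];
}

/* Bit 0 is the most significant bit, so the long flag is 0x8000. */
static int bb_is_long(const BBTable *t) {
    return (t->function & 0x8000u) != 0;
}
```

With this layout, BB_WORD(slx) evaluates to 012 and BB_WORD(long_dest_lo) to 022, matching the displacements used by the fetches.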
BitBlt functions (codes are octal; # is xor, ’ is complement, G is the Gray word):
[X=0 uses BBFB (Dest & Mask’), X=1 uses BBFBX (Dest unmasked)]
[MA=0 causes Src & Mask, MA=1 causes Src or Mask’]
CODE  MA  X  SAlufOp   Action               Computation
0     0   0  R or T    S                    (D & M’) or (S & M)
1     0   1  R or T    D or S               D or (S & M)
2     0   1  R # T     D # S                D # (S & M)
3     0   1  R & T’    D & S’               D & (S & M)’
4     1   0  R or T’   S’                   (D & M’) or (S or M’)’
5     1   1  R or T’   D or S’              D or (S or M’)’
6     1   1  R # T’    D # S’               D # (S or M’)’
7     1   1  R & T     D & S                D & (S or M’)
10    0   1  R or T    (D & S’) or (S & G)  (D & (S & M)’) or (S & M & G)
11    0   1  R or T    D or (S & G)         D or (S & M & G)
12    0   1  R # T     D # (S & G)          D # (S & M & G)
13    0   1  R & T’    D & (S & G)’         D & (S & M & G)’
14    0   0  R or T    G                    (D & M’) or (G & M)
15    0   1  R or T    D or G               D or (G & M)
16    0   1  R # T     D # G                D # (G & M)
17    0   1  R & T’    D & G’               D & (G & M)’
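The Computation column can be transcribed into C to check a table entry against concrete 16-bit words. This is a sketch that evaluates the formulas as written, one word at a time. It does not model the inner loops, the quadword refills, or how the mask M is formed at the item edges, and the function name bb_combine is hypothetical.

```c
#include <stdint.h>

/* Evaluate the Computation column for function code fn (octal 000-017).
   d = dest word, s = source word, m = mask word, g = gray word.
   An out-of-range code leaves d unchanged. */
static uint16_t bb_combine(unsigned fn, uint16_t d, uint16_t s,
                           uint16_t m, uint16_t g) {
    /* Work in 32-bit unsigned so ~ is well defined; keep the low 16 bits. */
    uint32_t D = d, S = s, M = m, G = g, r;
    switch (fn) {
    case 000: r = (D & ~M) | (S & M);          break; /* S */
    case 001: r = D | (S & M);                 break; /* D or S */
    case 002: r = D ^ (S & M);                 break; /* D # S */
    case 003: r = D & ~(S & M);                break; /* D & S' */
    case 004: r = (D & ~M) | ~(S | ~M);        break; /* S' */
    case 005: r = D | ~(S | ~M);               break; /* D or S' */
    case 006: r = D ^ ~(S | ~M);               break; /* D # S' */
    case 007: r = D & (S | ~M);                break; /* D & S */
    case 010: r = (D & ~(S & M)) | (S & M & G); break; /* (D & S') or (S & G) */
    case 011: r = D | (S & M & G);             break; /* D or (S & G) */
    case 012: r = D ^ (S & M & G);             break; /* D # (S & G) */
    case 013: r = D & ~(S & M & G);            break; /* D & (S & G)' */
    case 014: r = (D & ~M) | (G & M);          break; /* G */
    case 015: r = D | (G & M);                 break; /* D or G */
    case 016: r = D ^ (G & M);                 break; /* D # G */
    case 017: r = D & ~(G & M);                break; /* D & G' */
    default:  r = D;                           break;
    }
    return (uint16_t)(r & 0xFFFFu);
}
```

Every code leaves D unchanged wherever M is 0, which follows directly from the formulas; that is what allows the partial words at the edges of an item to be written back without disturbing neighbouring bits.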
bbFunction is in the following format (bit numbers are octal, counted from
the most significant bit):
00     Mesa long-pointer flag during early initialization;
       L-to-R = 0 / R-to-L = 1 after early init.
01-03  unused
04-07  inner-loop index (one of the bbILtype values)
10-13  unused
14-17  BitBlt function code
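Because the bit numbers count from the most significant bit and are written in octal, field 04-07 occupies the 0x0F00 bits of the word and field 14-17 the 0x000F bits. The sketch below extracts the fields and maps a function code to the inner-loop index listed with the bbILtype constants. The helper names are hypothetical.

```c
#include <stdint.h>

/* Extract a field given MSB-0 bit numbers first..last, as in the table. */
static unsigned bb_field(uint16_t w, unsigned first, unsigned last) {
    unsigned width = last - first + 1;
    return (w >> (15 - last)) & ((1u << width) - 1u);
}

static unsigned bb_fn_code(uint16_t w)    { return bb_field(w, 014, 017); }
static unsigned bb_loop_index(uint16_t w) { return bb_field(w, 004, 007); }
/* Valid only after early init; before that, bit 00 is the long flag. */
static unsigned bb_r_to_l(uint16_t w)     { return bb_field(w, 000, 000); }

/* Inner-loop index for each function code, following the bbILtype comments:
   0 for 1-3 and 5-7, 2 for 0 and 4, 4 for 10, 6 for 11-13,
   010 for 14, 012 for 15-17. */
static unsigned bb_loop_type_for(unsigned fn) {
    switch (fn) {
    case 000: case 004:           return 002;
    case 010:                     return 004;
    case 011: case 012: case 013: return 006;
    case 014:                     return 010;
    case 015: case 016: case 017: return 012;
    default:                      return 000;  /* 1-3, 5-7 */
    }
}
```

Read this way, the constants 1000C through 5000C stored by bbFnSetup are simply the loop index shifted into bits 04-07 (for example 3000C has index 6, the value of bbILtype3).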
%
*BBFA dispatch values
Set[bbItem,3];*item refill
Set[bbSDRef,4];*source and destination refill
Set[bbSRef,5];*source refill
Set[bbDRef,6];*destination refill
Set[bbNoRef,7];*no refill
Set[bbILtype0,00];*functions 1-3 and 5-7
Set[bbILtype1,02];*functions 0 and 4
Set[bbILtype2,04];*function 10
Set[bbILtype3,06];*functions 11-13
Set[bbILtype4,10];*function 14
Set[bbILtype5,12];*functions 15-17
OnPage[bbPage];
*Mesa entry here with StkP .eq. 2, Stack holding 2*scan-lines completed, and
*Stack0 in AC2 (a pointer to the BitBlt table)
MesaBitBLT: T ← MDShi;
AC2hi ← T;*Long pointer to BitBLT table in AC2,AC2hi
T ← (Cycle&PCXF) or (100000C);
*Alto entry here with StkP pointing at AC1 (2*scan-lines completed),
*(Cycle&PCXF) and not (100000C) in T, and pointer to BitBlt table in AC2
*(a base reg.); the BitBlt table is known to start at an even address.
bbBitBlt:
PFetch2[AC2,bbItemWid,6], Task;*fetch dw and dh
bbGray1 ← (bbGray1) or not (0C);*-1=don’t touch pages; 0=do
AC0 ← T, Skip[BPCChk’];*Save PCF and Mesa/Alto flag in AC0
PC ← (PC) + (4C);*Advance PC on imminent refill
PFetch1[AC2,bbFunction,0];*Even
bbNegBitsLeft ← Zero;*0 L-to-R, variable R-to-L
*Set up bbSrcWLo/bbSrcQHi and bbDestWLo/bbDestQHi double-precision; in
*short mode, sbca/dbca are fetched into bbSrcWLo/bbDestWLo and MDShi rsh 8 is
*copied into bbSrcQHi/bbDestQHi; in long mode the long pointers are fetched
*into bbSrcQLo/Hi and bbDestQLo/Hi and then the Lo parts are copied into
*bbSrcWLo and bbDestWLo, respectively.
PFetch1[AC2,bbDBMR,3];
T ← Rsh[Stack,1];*T ← scan-lines completed
bbItemsLeft ← (bbItemsLeft) - T - 1;
PFetch1[AC2,bbSBMR,11], Goto[bbExit,Alu<0];*exit if no items
T ← bbItemWid;
***Maybe this check should be for item width > 0
bbNegItemWid ← (Zero) - T, Skip[Alu#0];
PCF ← AC0, DblGoto[bbMDone,bbNDone,R<0];*Exit if width=0
PFetch2[AC2,bbDlx,4];*fetch dlx and dty
bbFunction ← (bbFunction) and not (177400C), Goto[bbShort,R>=0];
T ← 22C, Task;
PFetch2[AC2,bbDestQLo];
T ← 20C;
PFetch2[AC2,bbSrcQLo], Goto[bbDir];
bbShort:
PFetch1[AC2,bbDestQLo,2];*bbDestQLo ← dbca
T ← Rsh[MDShi,10];
bbDestQHi ← T;
PFetch1[AC2,bbSrcQLo,10], Task;*bbSrcQLo ← sbca
bbSrcQHi ← T;
%Determination of BitBlt directions (T=Top, B=Bottom, L=Left, R=Right):
T-to-B, L-to-R is fastest, so do that when the source is not used; R-to-L
is slowest, using multiple item setups/scan-line, so avoid it when possible.
B-to-T, L-to-R (dty > sty)
T-to-B, R-to-L (dty = sty) and (dlx - slx >= 100b) and (item >= 100b wide)
and (slx + item width > dlx), i.e. the item overlaps itself
T-to-B, L-to-R (dty < sty) or (dty = sty) when not doing R-to-L
This dangerously assumes overlap is impossible unless DBCA .eq. SBCA
(or the equivalent in long mode) and no check is made for this equality.
Timing from bbBitBlt to here: 47 short, 48 long (cycles).
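The rules above can be restated as a C decision function. This is a sketch that follows the branch sequence from bbDir to bbGenlInit as we read it; the enum and function names are hypothetical and all inputs are assumed non-negative. The last comparison uses width >= dlx - slx because the microcode appears to test the carry of (item width) - (dlx - slx), which also selects R-to-L when the two are equal.

```c
#include <stdbool.h>

typedef enum { BB_TTOB_LTOR, BB_BTOT_LTOR, BB_TTOB_RTOL } BBDirection;

/* Sketch of the direction choice; all arguments assumed non-negative. */
static BBDirection bb_choose_direction(bool source_used, int dty, int sty,
                                       int dlx, int slx, int item_width) {
    if (!source_used) return BB_TTOB_LTOR;      /* fastest case */
    if (dty > sty)    return BB_BTOT_LTOR;      /* dest below src */
    if (dty < sty)    return BB_TTOB_LTOR;
    /* dty == sty: same scan-lines, so horizontal overlap matters. */
    int nonoverlap = dlx - slx;
    if (nonoverlap < 0)          return BB_TTOB_LTOR; /* dest left of src */
    if (nonoverlap < 0100)       return BB_TTOB_LTOR; /* pieces too narrow */
    if (item_width < 0100)       return BB_TTOB_LTOR; /* item too narrow */
    if (item_width < nonoverlap) return BB_TTOB_LTOR; /* no overlap */
    return BB_TTOB_RTOL;
}
```

The sketch does not model the side effect of the dty = sty case, where the microcode also clears bbGray1 so that pages are touched.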
%
bbDir:Dispatch[bbFunction,14,4], Call[bbFnSetup];*Odd
SAluf ← T, T ← bbDBMR;
bbGray2 ← T, Goto[.+3,MB’];
T ← Rsh[Stack,1];
bbDty ← (bbDty) + T, Goto[bbDestInit];*T-to-B if no source
T ← bbDty;
LU ← (bbSty) - T;*calc sty - dty
T ← bbSBMR, FreezeResult;
bbGray ← T, FreezeResult, Goto[bbTtoB,Alu>=0];
bbSBMR ← (Zero) - T;*Odd; B-to-T if SrcY < DestY
T ← bbDBMR, Task;
bbDBMR ← (Zero) - T;
T ← bbItemsLeft, Goto[bbGenlInit];
*SrcY >= DestY: will go T-to-B
bbTtoB:T ← bbSlx, Skip[Alu=0];
T ← Rsh[Stack,1], Goto[bbGenlInit];*L-to-R if SrcY > DestY
*SrcY = DestY
bbGray1 ← Zero;*Force pages to be touched
T ← (bbDlx) - T;*T ← dlx - slx
LU ← (Lsh[AllOnes,6]) and T, Skip[ALU>=0];
T ← Rsh[Stack,1], Goto[bbGenlInit];*L-to-R if DestX < SrcX
*DestX >= SrcX; bbNegSDNonOverlap ← - (DestX - SrcX)
bbNegSDNonOverlap ← (Zero) - T, Skip[Alu#0];*Even
T ← Rsh[Stack,1], Goto[bbGenlInit];*L-to-R if (DestX - SrcX) < 100b
LU ← LdF[bbItemWid,0,12];
LU ← (bbItemWid) - T, Skip[Alu#0];
T ← Rsh[Stack,1], Goto[bbGenlInit];*L-to-R if item < 100b long
T ← Rsh[Stack,1], Goto[bbGenlInit,Carry’]; *or width < non-overlap
bbFunction ← (bbFunction) or (100000C), Goto[bbGenlInit]; *R-to-L
*General init: T has items completed if T-to-B, items remaining if B-to-T.
*Time: (78 to 82 B-to-T), (74 to 91 T-to-B, L-to-R), (95 R-to-L) + 1 if long
bbGenlInit:
bbDty ← (bbDty) + T;*Even
**Save 3 mi both here and in dest init by using a multiply loop that is
**9*nzeroes to the right + 13*nones (i.e., slows init about 5 + 2 cycles/bit)
LU ← bbGray;
bbSrcWLo ← Zero, Goto[bbAddSF,Alu=0];*sbmr=0 not impossible
T ← bbSty ← (bbSty) + T, Call[bbYHi0];
*bbSrcQLo/QHi + (sty*sbmr) + (slx rsh 4); product may be > 16 bits
bbGray ← Rsh[bbGray,1], Goto[bbNoAddS,R Even];
*Multiply timing: 6*nzeroes right of the left-most one + 14*nones in sbmr.
bbSrcQLo ← (bbSrcQLo) + T, Goto[.+3,Alu#0];
T ← (bbYHi) + 1, UseCOutAsCIn;
bbSrcQHi ← (bbSrcQHi) + T, Goto[bbAddSF];
T ← (bbYHi) + 1, UseCOutAsCIn;
bbSrcQHi ← (bbSrcQHi) + T;
bbNoAddS: T ← bbSty ← Lsh[bbSty,1], DblGoto[bbLshyHi1,bbLshyHi0,R<0];
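%The loop above forms bbSrcQ + sty*sbmr as a 32-bit sum built from 16-bit
pieces: bbGray holds the multiplier (sbmr) and is shifted right once per step,
bbYHi/bbSty hold the multiplicand as a 32-bit value shifted left once per step,
and each add into bbSrcQLo carries into bbSrcQHi. A minimal C sketch of that
shift-and-add scheme follows. It uses plain 32-bit integers instead of the
base-register format applied afterwards at bbAddSF, does not reproduce the
trick of looping through the subroutine return, and the function names are
hypothetical. Adding slx rsh 4 (done later by bbSBWQ) gives the first word.

```c
#include <stdint.h>

/* base + y*raster by shift-and-add with a 16-bit multiplier,
   as the bbGray/bbSty loop does. */
static uint32_t bb_row_address(uint32_t base, uint16_t y, uint16_t raster) {
    uint32_t addend = y;          /* bbYHi:bbSty, shifted left each step */
    uint16_t multiplier = raster; /* bbGray, shifted right each step */
    uint32_t sum = base;          /* bbSrcQHi:bbSrcQLo */
    while (multiplier != 0) {
        if (multiplier & 1u)
            sum += addend;        /* microcode: add to QLo, carry into QHi */
        multiplier >>= 1;
        addend <<= 1;
    }
    return sum;
}

/* First word of the item: add the word part of the left x offset. */
static uint32_t bb_first_word(uint32_t base, uint16_t y, uint16_t raster,
                              uint16_t lx) {
    return bb_row_address(base, y, raster) + (lx >> 4);
}
```

The loop runs once per bit of the multiplier up to its highest one bit, which is why the timing comments count zeroes to the right of the left-most one.
%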
bbAddSF:T ← RHMask[bbSrcQHi];*Even
bbSrcQHi ← (Lsh[bbSrcQHi,10]) + T + 1;*bbSrcQHi in base reg. format
T ← LdF[bbSlx,14,4], Call[bbNegIWSub];
*bbSlast ← (Slx & 17) + ItemWid - 1 = displacement to last bit of scan-line
*Add (slx rsh 4) and copy WLo into QLo; point bbSlx at bit in 1st quadword.
bbSlast ← (Zero) - T - 1, Call[bbSBWQ];
*Time to here: (1 if long) + {(68 if no source) else multiply time +
* [(107 to 112 B-to-T), (101 to 119 T-to-B, L-to-R), (122 to 123 R-to-L)]}
bbDestInit:
LU ← bbGray2;
bbDestWLo ← Zero, Goto[bbAddDF,Alu=0];
T ← bbDty, Call[bbYHi0];
*bbDestQLo/QHi + (dty*dbmr) + (dlx rsh 4); product may be > 16 bits
bbGray2 ← Rsh[bbGray2,1], Goto[bbNoAddD,R Even];
bbDestQLo ← (bbDestQLo) + T, Goto[.+3,Alu#0];
T ← (bbYHi) + 1, UseCOutAsCIn;
bbDestQHi ← (bbDestQHi) + T, Goto[bbAddDF];
T ← (bbYHi) + 1, UseCOutAsCIn;
bbDestQHi ← (bbDestQHi) + T;
bbNoAddD: T ← bbDty ← Lsh[bbDty,1], DblGoto[bbLshyHi1,bbLshyHi0,R<0];
bbAddDF:T ← RHMask[bbDestQHi];*Even
bbDestQHi ← (Lsh[bbDestQHi,10]) + T + 1;
PFetch2[AC2,bbGray2,16], Call[bbDBWQ];
T ← LdF[bbDlx,14,4], Call[bbNegIWSub];*offset to last scan-line bit
PFetch2[AC2,bbGray,14], Skip[MB];
MNBR ← bbNegItemWid, Goto[bbTchS];
MNBR ← bbNegItemWid, Goto[bbNoTchS];
%Time to bbTchD/S: (1 if long) + [(103 to 104 if no source) else
(144 to 150 B-to-T), (136 to 155 T-to-B, L-to-R), (157 to 159 R-to-L)]
To this add entry/exit overhead less time between bbItemRefill and bbTchD/S:
Alto--16 + 16; Alto Mesa--10 + 30; Pilot Mesa--10 + 15
(-13 T-to-B), (-17 B-to-T), (-10 no source).
Total time: [152 to 158 B-to-T, 148 to 167 T-to-B L-to-R, 169 to 171 R-to-L,
118 to 119 no source] + [0 Pilot, 7 Alto, or 15 Alto Mesa] + [1 if long].
To this add about 60 cycles/multiply (1 multiply if no source else 2).
%
bbYHi0:bbYHi ← 0C, Return;
bbLshyHi0:bbYHi ← Lsh[bbYHi,1], Return;
bbLshyHi1:bbYHi ← (Lsh[bbYHi,1]) + 1, Return;
bbNegIWSub:T ← (bbNegItemWid) - T;
T ← (bbGray1) or T, Skip[R Odd];
T ← (Lsh[AllOnes,1]) and T;*Touching: make T even so bbLast is odd
bbDlast ← (Zero) - T - 1, Return;*Not touching: T = -1 makes bbLast 0
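%bbNegIWSub leaves bbDlast as the bit displacement to the last bit of the
item, forced odd, when pages are to be touched, and 0 when they are not;
the source-side caller copies the same value into bbSlast. bbGray1 is -1 for
don't-touch and 0 for touch. The C sketch below replays the arithmetic on
16-bit two's-complement words; the function name is hypothetical. In the
touching case the result works out to ((lx & 17b) + width - 1) | 1, so
Rsh[...,4] still gives the word displacement of the last word, while the low
bit doubles as the touch flag tested by R Even at bbTchS/bbTchD.

```c
#include <stdint.h>

/* touch_flag mirrors bbGray1: 0xFFFF = don't touch pages, 0 = touch. */
static uint16_t bb_last(uint16_t lx, uint16_t item_width, uint16_t touch_flag) {
    uint16_t neg_width = (uint16_t)(0u - item_width);       /* bbNegItemWid */
    uint16_t t = (uint16_t)(neg_width - (lx & 017u));       /* bbNegItemWid - T */
    t = (uint16_t)(touch_flag | t);                         /* (bbGray1) or T */
    if ((touch_flag & 1u) == 0)                             /* Skip[R Odd] not taken */
        t = (uint16_t)(t & 0xFFFEu);                        /* Lsh[AllOnes,1] and T */
    return (uint16_t)(0u - t - 1u);                         /* (Zero) - T - 1 */
}
```

For example, lx = 5 and width = 16 give 21 when touching: the last bit is bit 20 of the item's first word span, in word 21 rsh 4 = 1.
%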
*T[08] ← MA’, T[09] ← MB, T[10:15] ← Alu op
*The MB branch condition is used to indicate "no source."
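%The T values loaded by bbFnSetup (for example 204C, 074C, 304C) pack the
three fields named above. Bit numbers in that comment count from the most
significant bit, so T[08] is the 200b bit (MA'), T[09] the 100b bit (MB, "no
source"), and T[10:15] the low six bits (the ALU operation). The small C
decoder below makes the packing concrete. The struct and function names are
hypothetical, and the ALU-op names are only those written in the bbFnSetup
comments; other op values are not interpreted.

```c
#include <stdint.h>

typedef struct {
    int ma;         /* 1 = Src or Mask', 0 = Src & Mask */
    int no_source;  /* MB */
    unsigned op;    /* 6-bit ALU operation field */
} BBSAluf;

static BBSAluf bb_decode_saluf(uint16_t t) {
    BBSAluf f;
    f.ma = (t & 0200u) ? 0 : 1;      /* T[08] holds MA' */
    f.no_source = (t & 0100u) != 0;  /* T[09] holds MB */
    f.op = t & 077u;                 /* T[10:15] */
    return f;
}

/* ALU-op values as named in the bbFnSetup comments. */
static const char *bb_op_name(unsigned op) {
    switch (op) {
    case 004: return "R or T";
    case 063: return "R xor T";
    case 027: return "R & T'";
    case 074: return "R or T'";
    case 054: return "R xnor T";
    case 056: return "R & T";
    default:  return "?";
    }
}
```

Decoding 304C, for instance, gives MA = 0, no source, op 04 ("R or T"), which agrees with function 14 in the table at the top of the file.
%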
bbFnSetup:
PFetch2[AC2,bbSlx,12], Disp[.+1];*fetch slx and sty
bbFunction ← 1000C, At[bbF,0];*0,R or T
bbOr:T ← 204C, Return, At[bbF,1];*0,R or T
bbX:T ← 263C, Goto[bbForceTch], At[bbF,2];*0,R xor T
bbAN:T ← 227C, Return, At[bbF,3];*0,R & T’
bbFunction ← 1000C, At[bbF,4];*1,R or T’
T ← 074C, Return, At[bbF,5];*1,R or T’
T ← 054C, Goto[bbForceTch], At[bbF,6];*1,R xnor T
T ← 056C, Return, At[bbF,7];*1,R & T
bbFunction ← 2000C, Goto[bbOr], At[bbF,10];*0,R or T
bbFunction ← 3000C, Goto[bbOr], At[bbF,11];*0,R or T
bbFunction ← 3000C, Goto[bbX], At[bbF,12];*0,R xor T
bbFunction ← 3000C, Goto[bbAN], At[bbF,13];*0,R & T’
T ← 304C, Goto[bbTR4], At[bbF,14];*0,R or T
T ← 304C, Goto[bbTR5], At[bbF,15];*0,R or T
T ← 363C, Goto[bbTR5Tch], At[bbF,16];*0,R xor T
T ← 327C, Goto[bbTR5], At[bbF,17];*0,R & T’
bbTR5:bbFunction ← 5000C, Return;*type 5; no source
bbTR4:bbFunction ← 4000C, Return;*type 4; no source
bbTR5Tch:bbFunction ← 5000C;*type 5; touch pages
bbForceTch:bbGray1 ← Zero, Return;
%Approx. item refill times starting at bbItemRefill are below; the inner-loop
dependent constant given in the comments before the inner loops must be
added to these and the time is 1 (Alto) or 9 (Mesa) cycles greater if
interrupts are disabled:
L-to-R, src used:         58 (T-to-B) or 61 (B-to-T) cycles
    [+ 12 + 11*(NDestPages+NSrcPages-2) if xor/xnor functions or sty=dty]
L-to-R, src unused:       41 [+ 5 + 11*(NDestPages-1) if xor function]
R-to-L:                   11*(NDestPages+NSrcPages) + 93 cycles
R-to-L, continuing item:  62 cycles, 47 or 63 on last continuation
When both src and dest are used the no-refill case will occur at
most 6 times followed by a src-refill, dest-refill, or item-refill.
If src and dest are word-aligned, at most 3 no-refill loops will occur
followed by a src-dest-refill or item refill.
When only the dest is used, at most 3 no-refill loops occur followed
by a src-dest-refill or item refill.
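These refill formulas can be collected into a small calculator for rough estimates. The sketch below covers new items only (not R-to-L continuations), takes the page counts and the touching condition as inputs, and leaves out the inner-loop constant and the interrupts-disabled adjustment, which a caller would add. The enum and function names are hypothetical.

```c
/* Item refill cases from the table above. */
typedef enum { BB_LTOR_TTOB, BB_LTOR_BTOT, BB_LTOR_NOSRC, BB_RTOL } BBRefillCase;

/* touching: for src-used cases, xor/xnor function or sty = dty;
   for the no-source case, xor function. */
static int bb_item_refill_cycles(BBRefillCase c, int touching,
                                 int n_dest_pages, int n_src_pages) {
    switch (c) {
    case BB_LTOR_TTOB:
    case BB_LTOR_BTOT: {
        int cycles = (c == BB_LTOR_TTOB) ? 58 : 61;
        if (touching)
            cycles += 12 + 11 * (n_dest_pages + n_src_pages - 2);
        return cycles;
    }
    case BB_LTOR_NOSRC:
        return 41 + (touching ? 5 + 11 * (n_dest_pages - 1) : 0);
    case BB_RTOL:
        return 11 * (n_dest_pages + n_src_pages) + 93;
    }
    return -1;  /* not reached for valid cases */
}
```

For example, a T-to-B xor whose item spans one dest page and one src page comes to 58 + 12 = 70 cycles before the inner-loop constant is added.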
%
bbSrcFetch:PFetch4[bbSrcQLo,bbSrc], Return;
*Note: Doing PStore4 first allows both PFetch4’s to be launched before
*transport for either occurs. If the PFetch4 for the src were done first,
*the PFetch4 for the dest could not be launched before transport for both
*preceding references had finished.
bbSrcDestRfl:PStore4[bbDestQLo,bbDest];
T ← bbSrcWLo ← (bbSrcWLo) + (4C);
PFetch4[bbSrcQLo,bbSrc], Skip;
bbDestRfl:PStore4[bbDestQLo,bbDest];
T ← bbDestWLo ← (bbDestWLo) + (4C);
PFetch4[bbDestQLo,bbDest], Return;
*The bbIA, bbIB, bbIE, and bbIF dispatch tables could be united by revising
*the SrcDestRefill subroutine to check MB; this would save 12b mi but slow
*inner loops by 2 to 3 cycles.
bbInnerLoops:
*functions 1-3 and 5-7; refill times: i=4+I, sd=28, s=20, d=26, n=4
bbIA1:bbDlx ← (DB ← bbDlx) + T, Call[bbFixG], At[bbI,bbILtype0];
T ← BBFA[SB[bbSrc]] or T;
bbIA2:DB[bbDest] ← BBFBX[DB[bbDest]] SAlufOp T, Disp[.+1];
bbItemRefill:
T ← bbDestWLo, Goto[bbItemRfl], At[bbIA,bbItem];
bbSrcDestRefill:
T ← bbDestWLo, Goto[bbSrcDestRfl], At[bbIA,bbSDRef];
bbSrcRefill:
T ← bbSrcWLo ← (bbSrcWLo) + (4C), Goto[bbSrcFetch], At[bbIA,bbSRef];
bbDestRefill:
T ← bbDestWLo, Goto[bbDestRfl], At[bbIA,bbDRef];
T ← BBFA[SB[bbSrc]] or T, Goto[bbIA2], At[bbIA,bbNoRef];
*functions 0 and 4; refill times: i=4+I, sd=32, s=18, d=26, n=4
bbIB1:bbDlx ← (DB ← bbDlx) + T, Call[bbFixG], At[bbI,bbILtype1];
T ← BBFA[SB[bbSrc]] or T;
bbIB2:DB[bbDest] ← BBFB[DB[bbDest]] SAlufOp T, Disp[.+1];
T ← bbDestWLo, Goto[bbItemRefill], At[bbIB,bbItem];
T ← bbDestWLo, Goto[bbSrcDestRfl], At[bbIB,bbSDRef];
T ← bbSrcWLo ← (bbSrcWLo) + (4C), Goto[bbSrcFetch], At[bbIB,bbSRef];
T ← bbDestWLo, Goto[bbDestRfl], At[bbIB,bbDRef];
T ← BBFA[SB[bbSrc]] or T, Goto[bbIB2], At[bbIB,bbNoRef];
*function 10; refill times: i=8+I, sd=36, s=22, d=30, n=8
bbIC1:bbDlx ← (DB ← bbDlx) + T, Call[bbFixG], At[bbI,bbILtype2];
T ← BBFA[SB[bbSrc]] or T;
DB[bbDest] ← (DB[bbDest]) and not T, Disp[.+1];
T ← PCF[bbGray] and T, Goto[bbICi], At[bbIC,bbItem];
T ← PCF[bbGray] and T, Goto[bbICsd], At[bbIC,bbSDRef];
T ← PCF[bbGray] and T, Goto[bbICs], At[bbIC,bbSRef];
T ← PCF[bbGray] and T, Goto[bbICd], At[bbIC,bbDRef];
T ← PCF[bbGray] and T, Goto[bbICr], At[bbIC,bbNoRef];
*functions 11-13; refill times: i=6+I, sd=34, s=20, d=28, n=6
bbID1:bbDlx ← (DB ← bbDlx) + T, Call[bbFixG], At[bbI,bbILtype3];
T ← BBFA[SB[bbSrc]] or T;
T ← PCF[bbGray] and T, Disp[.+1];
bbICi:DB[bbDest] ← BBFBX[DB[bbDest]] SAlufOp T, Goto[bbItemRefill], At[bbID,bbItem];
bbICsd:DB[bbDest] ← BBFBX[DB[bbDest]] SAlufOp T, Goto[bbSrcDestRefill], At[bbID,bbSDRef];
bbICs:DB[bbDest] ← BBFBX[DB[bbDest]] SAlufOp T, Goto[bbSrcRefill], At[bbID,bbSRef];
bbICd:DB[bbDest] ← BBFBX[DB[bbDest]] SAlufOp T, Goto[bbDestRefill], At[bbID,bbDRef];
bbICr:DB[bbDest] ← BBFBX[DB[bbDest]] SAlufOp T, Return, At[bbID,bbNoRef];
*function 14; refill times: i=4+I, sd=26, s=never, d=never, n=4
bbIE1:bbDlx ← (DB ← bbDlx) + T, Call[bbFixG], At[bbI,bbILtype4];
T ← BBFA[PCF[bbGray]];
bbIE2:DB[bbDest] ← BBFB[DB[bbDest]] SAlufOp T, Disp[.+1];
T ← bbDestWLo, Goto[bbItemRflNS], At[bbIE,bbItem];
T ← bbDestWLo, Goto[bbDestRfl], At[bbIE,bbSDRef];
T ← BBFA[PCF[bbGray]], Goto[bbIE2], At[bbIE,bbNoRef];
*functions 15-17; refill times: i=4+I, sd=26, s=never, d=never, n=4
bbIF1:bbDlx ← (DB ← bbDlx) + T, Call[bbFixG], At[bbI,bbILtype5];
T ← BBFA[PCF[bbGray]];
bbIF2:DB[bbDest] ← BBFBX[DB[bbDest]] SAlufOp T, Disp[.+1];
T ← bbDestWLo, Goto[bbItemRflNS], At[bbIF,bbItem];
T ← bbDestWLo, Goto[bbDestRfl], At[bbIF,bbSDRef];
T ← BBFA[PCF[bbGray]], Goto[bbIF2], At[bbIF,bbNoRef];
bbFixG:PCF ← Stack, BBFBX, Return;
bbItemRflNS:
PStore4[bbDestQLo,bbDest], Call[bbCntI];
LU ← NWW, DblGoto[bbIntOff,bbIntOn,R<0];
*Worst case time to return from bbIR1 is 42 (src used).
bbItemRfl:
PStore4[bbDestQLo,bbDest], Call[bbIR1];
*Test for interrupts and done:
*Item refill time: 12.
LU ← NWW, Skip[R>=0];
bbIntOff: bbItemsLeft ← (bbItemsLeft) - 1, Skip;
bbIntOn:bbItemsLeft ← (bbItemsLeft) - 1, Skip[Alu#0];
T ← bbDBMR, DblGoto[bbAdvD,bbExit1,Alu>=0];
LU ← PCF ← AC0, Skip[Alu>=0];
DblGoto[bbMDone,bbNDone,Alu<0];
LU ← xfWDC, DblGoto[bbMesaInt,bbNovaInt,Alu<0];
bbIR1:T ← bbNegBitsLeft, Goto[bbRtoLCont,R<0];
bbCntI:Stack ← (Stack) + (2C), Return;
*At bbMesaInt and bbNovaInt, we have committed to taking an interrupt;
*control can only get back by restarting the opcode.
*Enter MIPend knowing it will not return.
bbMesaInt:
IntType ← 1C, Skip[Alu#0];*IntType ← 1 (PC backup if stopping)
T ← bbDBMR, Goto[bbAdvD];
LoadPage[prPage];
**Worst case time since return is 17 cycles at entry to MIPend
T ← (SStkP&NStkP) xor (377C), GotoP[MIPend];*T ← StkP
*Since the Nova only checks for interrupts during jumps, we simulate JMP .
**Could call intEnt here and handle the (rare) return, if WW, ACTIVE, and
**DMA (used by intEnt) did not clobber BitBlt registers other than
**bbSrc and bbDest.
bbNovaInt:
T ← PCF[RZero], LoadPage[nePage], Goto[bbNExit];
*LoadPage[xoPage];
*Call[intEnt];
*T ← bbDBMR, Goto[bbAdvD];
bbNDone:
T ← PCF[RZero] + 1, LoadPage[nePage];
bbNExit:PFetch4[PC,IBuf], GotoP[JmpFin];
bbMDone:
PFetch4[PC,IBuf,0];*Restore IBuf
:IF[AltoMode]; ******************************
Stack&-2, LoadPage[0];
PFetch4[LOCAL,LocalCache0,4], Call[SwapBytes];
*In Alto mode, the opcode after BitBlt is the even byte of the next word,
*so if BitBlt occurs on an even byte, the next byte must be skipped.
Cycle&PCXF, LoadPage[4], Skip[R Even];
CSkipData, GotoP[P4Tail];*Won’t cause refill
GotoP[P4Tail];
:ELSE; **************************************
Stack&-2, LoadPage[4];
PFetch4[LOCAL,LocalCache0,4], GotoP[P4Tail];
:ENDIF; *************************************
bbExit:PCF ← AC0, DblGoto[bbMDone,bbNDone,R<0];
bbExit1: PCF ← AC0, DblGoto[bbMDone,bbNDone,R<0];
*Item refill time: 16 (no src), 18 (src used)
*bbDestW ← bbDestQ ← bbDestW + bbDBMR
bbAdvD:T ← bbDestQLo ← (bbDestQLo) + T, Goto[bbAD1,Alu>=0];*Even
MNBR ← bbNegItemWid, Skip[Carry];
bbDestQHi ← (bbDestQHi) - (400C) - 1;
bbDestWLo ← Zero;
*bbSrcW ← bbSrcQ ← bbSrcW + bbSBMR (we know src is used for B-to-T)
T ← bbSBMR;
bbSrcQLo ← (bbSrcQLo) + T, Call[bbASn];
%Avoid page touching and simply refill bbSrc/bbDest, except when either a
non-restartable function (xor, xnor) is being executed or dty=sty, where the
src and dest may overlap. When touching, begin with the last
page of the scan-line and finish with a PFetch4 of the 1st quadword.
Initial displacement is [(ItemWidth - 1 + (startbit & 17)) rsh 4] + non-page
bits of start word, where the first term is computed during initialization.
The choice to touch/not-touch is indicated in bbSlast/bbDlast, which contain
0 when not touching or some odd value when touching.
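The touching sequence at bbTchS/bbTchD can be modeled as the list of PFetch4 displacements it issues. This sketch assumes 256-word pages (the page mask is Lsh[AllOnes,10], i.e. 177400b), assumes the displacement is taken from T, and uses a hypothetical function name. Starting from the displacement that reaches the page holding the last word, it steps back one page at a time and ends with displacement 0, the first quadword.

```c
#include <stddef.h>
#include <stdint.h>

/* Store the PFetch4 displacements, in issue order, into out[] and return
   how many there are (at most max_out are stored). start_word is the
   word address held in QLo; last is the bbSlast/bbDlast value. */
static size_t bb_touch_displacements(uint16_t start_word, uint16_t last,
                                     uint16_t *out, size_t max_out) {
    size_t n = 0;
    if ((last & 1u) == 0) {            /* even (0): no touching, one fetch */
        if (n < max_out) out[n++] = 0;
        return n;
    }
    /* Offset within the start page plus word displacement of the last word. */
    uint16_t t = (uint16_t)((start_word & 0377u) + (last >> 4));
    t &= 0177400u;                     /* whole pages beyond the start page */
    for (;;) {
        if (n < max_out) out[n++] = t;
        if (t == 0) break;             /* first quadword is fetched last */
        t = (uint16_t)(t - 0400u);     /* back one 256-word page */
    }
    return n;
}
```

An item whose words stay within the start page therefore costs a single PFetch4, the same as the non-touching path.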
%
bbTchS:T ← Rsh[bbSlast,4], Goto[bbNTS,R Even];
bbTS:T ← (RHMask[bbSrcQLo]) + T;*Odd
T ← (Lsh[AllOnes,10]) and T, Call[.+2];
T ← (Lsh[AllOnes,10]) + T;
PFetch4[bbSrcQLo,bbSrc], Skip[Alu=0];*Even
Return;
bbSrcWLo ← Zero;
SB ← bbSlx, Goto[bbTchD];
*Can eliminate these 3 mi and move the bbNTS label at cost of 2 cycles.
bbNTS:PFetch4[bbSrcQLo,bbSrc];*Even
SB ← bbSlx;
bbSrcWLo ← Zero, Goto[bbNTD];
bbAD1:MNBR ← bbNegItemWid, Skip[Carry’];*Even
bbDestQHi ← (bbDestQHi) + (400C) + 1;
bbNS1:bbDestWLo ← Zero, Goto[bbNoTchS,MB];
T ← bbSBMR;
T ← bbSrcQLo ← (bbSrcQLo) + T, Call[bbASp];
T ← Rsh[bbSlast,4], DblGoto[bbNTS,bbTS,R Even];
bbNoTchS:SB ← bbDlx;
bbTchD:T ← Rsh[bbDlast,4], Goto[bbNTD,R Even];*Odd
T ← (RHMask[bbDestQLo]) + T;*Odd
T ← (Lsh[AllOnes,10]) and T, Call[.+2];
T ← (Lsh[AllOnes,10]) + T;
bbNTD:PFetch4[bbDestQLo,bbDest], Skip[Alu=0];*Even
Return;
*Item refill time to here: 33 + [5 + 11*(NDPages-1) if xor function] (no src)
*50 (T-to-B), 53 (B-to-T) +
*[12 + 11*(NDPgs+NSPgs-2) if xor/xnor functions or sty=dty]
Dispatch[bbFunction,4,4], Goto[bbItemSetup,R>=0];
*New R-to-L item
T ← bbNegItemWid;*Odd
T ← (MNBR ← bbNegSDNonOverlap) - T, Goto[bbRtoLNew];*T ← bits left
bbRtoLCont:*Here when continuing R-to-L item.
LU ← (MNBR ← bbNegSDNonOverlap) - T;*Odd
bbDestWLo ← Zero, Skip[Carry];
T ← MNBR ← bbNegBitsLeft;
*Worst case time from Return to here is 41 (continue), 34 on new item.
*Initially T contains ItemWidth-SDNonOverlap; on subsequent iterations,
*T contains -SDNonOverlap until the last iteration for the item, when
*T contains -BitsLeft. bbNegBitsLeft is 0 at the beginning of each item.
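%As we read the comments above, the R-to-L case splits an item that overlaps
itself (dlx > slx on the same scan-line) into pieces no wider than the
non-overlap distance dlx - slx, moves the rightmost piece first, and moves
each piece left to right with the ordinary inner loop. A bit-level C sketch of
that ordering on one scan-line of 16-bit words follows (bit 0 is the most
significant bit of word 0). The helper names are hypothetical; it moves one
bit at a time rather than by quadwords, and it performs a plain copy only.

```c
#include <stdint.h>

/* Bit i of a scan-line: word i >> 4, bit (i & 15) counted from the MSB. */
static int bb_get_bit(const uint16_t *line, unsigned i) {
    return (line[i >> 4] >> (15u - (i & 15u))) & 1;
}

static void bb_set_bit(uint16_t *line, unsigned i, int v) {
    uint16_t bit = (uint16_t)(0x8000u >> (i & 15u));
    if (v) line[i >> 4] = (uint16_t)(line[i >> 4] | bit);
    else   line[i >> 4] = (uint16_t)(line[i >> 4] & ~bit);
}

/* Copy width bits from slx to dlx within one scan-line, with dlx > slx.
   Pieces are at most dlx - slx wide and are taken right to left; inside a
   piece the bits are copied left to right. */
static void bb_copy_rtol(uint16_t *line, unsigned slx, unsigned dlx,
                         unsigned width) {
    unsigned nonoverlap = dlx - slx;
    unsigned left = width;                 /* bits not yet copied */
    while (left > 0) {
        unsigned piece = left < nonoverlap ? left : nonoverlap;
        unsigned start = left - piece;     /* offset of this piece in the item */
        for (unsigned k = 0; k < piece; k++)
            bb_set_bit(line, dlx + start + k,
                       bb_get_bit(line, slx + start + k));
        left = start;
    }
}
```

Because no piece is wider than dlx - slx, a piece's source never overlaps its own destination or any destination bits already written, so each piece can run L-to-R. The microcode only takes this path when dlx - slx is at least 100b, keeping the number of pieces per scan-line small.
%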
bbRtoLNew:
bbNegBitsLeft ← (bbNegBitsLeft) - T, Call[bbSlxFix];*Odd
bbDlx ← (LdF[bbDlx,14,4]) + T, Call[bbSBWQ];
SB ← bbSlx;
PFetch4[bbSrcQLo,bbSrc,0], Call[bbDBWQ];
PFetch4[bbDestQLo,bbDest,0];
bbSrcWLo ← Zero;
Dispatch[bbFunction,4,4];
bbItemSetup:T ← Lsh[bbDBMR,4], Disp[bbInnerLoops];*Even
%bbSBWQ and bbDBWQ are used both by initialization and in the R-to-L case.
bbASn/bbASp are called by item refill to advance QLo/QHi by a signed
word displacement, bbSBMR. The equivalent code in bbDBWQ is open-coded
for item refill.
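Judging from bbAddSF/bbAddDF and the carry fix-ups in these routines, the high word of a base register keeps the high address bits in its left byte and that value plus 1 in its right byte, so a carry out of QLo is propagated by adding 401b (400C + 1) and a borrow by subtracting it. The C sketch below models only that reading. It converts between an address (8-bit high part, 16-bit QLo) and the QLo/QHi pair, and it advances the pair by a signed word displacement such as bbSBMR. The type and function names are hypothetical, and the hardware meaning of the right byte is an assumption.

```c
#include <stdint.h>

typedef struct { uint16_t lo, hi; } BBBase;  /* bbXxxQLo, bbXxxQHi */

/* Mirrors bbAddSF: hi' = (hi lsh 8) + (hi & 377b) + 1, truncated to 16 bits. */
static BBBase bb_base_from_address(uint32_t addr) {
    uint16_t hi = (uint16_t)((addr >> 16) & 0377u);
    BBBase b = { (uint16_t)addr, (uint16_t)((hi << 8) + hi + 1u) };
    return b;
}

/* The high address bits live in the left byte of the encoded high word. */
static uint32_t bb_base_address(BBBase b) {
    return ((uint32_t)(b.hi >> 8) << 16) | b.lo;
}

/* Mirrors bbASp/bbASn: add a signed word displacement to lo, then fix hi
   by +401b on a carry (positive case) or -401b on a borrow (negative). */
static BBBase bb_base_advance(BBBase b, int16_t disp) {
    uint32_t sum = (uint32_t)b.lo + (uint16_t)disp;  /* 17-bit add shows carry */
    int carry = (sum >> 16) != 0;
    b.lo = (uint16_t)sum;
    if (disp >= 0 && carry)  b.hi = (uint16_t)(b.hi + 0401u);
    if (disp < 0 && !carry)  b.hi = (uint16_t)(b.hi - 0401u);
    return b;
}
```

Advancing by a displacement and then by its negation returns the original pair, which is the property item refill relies on when bbSBMR or bbDBMR is negative for B-to-T.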
%
bbSBWQ:T ← Rsh[bbSlx,4], Skip[R>=0];
T ← (Lsh[AllOnes,14]) or T;
T ← bbSrcQLo ← (bbSrcQLo) + T, Goto[bbASp,Alu>=0];
bbASn: T ← Lsh[bbSrcQLo,4], Skip[Carry];
bbSrcQHi ← (bbSrcQHi) - (400C) - 1;
bbSlx ← (LdF[bbSlx,14,4]) + T, Return;
bbASp:T ← Lsh[bbSrcQLo,4], Skip[Carry’];
bbSrcQHi ← (bbSrcQHi) + (400C) + 1;
bbSlxFix:bbSlx ← (LdF[bbSlx,14,4]) + T, Return;
bbDBWQ:T ← Rsh[bbDlx,4], Skip[R>=0];
T ← (Lsh[AllOnes,14]) or T;
T ← bbDestQLo ← (bbDestQLo) + T, Goto[bbADp,Alu>=0];*Even
T ← Lsh[bbDestQLo,4], Skip[Carry];
bbDestQHi ← (bbDestQHi) - (400C) - 1;
bbDlx ← (LdF[bbDlx,14,4]) + T, Return;
bbADp:T ← Lsh[bbDestQLo,4], Skip[Carry’];
bbDestQHi ← (bbDestQHi) + (400C) + 1;
bbDlxFix:bbDlx ← (LdF[bbDlx,14,4]) + T, Return;
:END[BitBlt];