{File name: NewLispFFTB2.mc
Description:  DLion Lisp complex Fast Fourier Transform with Floating point chips
Purcell:  11-Apr-84 11:55:04	1032 chip diag bug worked around
Purcell: 31-Jan-84 13:37:09	Created}

{The LispFFT opcode performs one pass of an FFT of fixed geometry, not in place, decimation in time. 
Each cycle of the inner loop reads 2 adjacent complex samples, performs the FFT butterfly and writes the 2 complex results to separate destinations.  The 16 memory references overlap 10 pipelined float operations.  The twiddle factor is initialized to unity and periodically 'rotated' by a twiddle delta.
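
As a rough illustration of what "one pass of fixed geometry, not in place" could look like, here is a minimal Python sketch (not part of the microcode). The names fft_pass, bit_reverse and fft are hypothetical, and the twiddle schedule and the bit-reversed input order are assumptions taken from the textbook constant-geometry decimation-in-time algorithm, not read from this file. Each pass reads adjacent pairs and writes the results to the two halves of a separate destination; the twiddle stays constant over a run of consecutive butterflies, in the spirit of TCnt.

```python
import cmath

def fft_pass(src, stage, m):
    # One out-of-place pass: read adjacent pairs, write to the two halves.
    n = len(src)
    half = n // 2
    dst = [0j] * n
    for i in range(half):
        # Assumed schedule: twiddle is constant over runs of 2**(m - stage) butterflies.
        q = i >> (m - stage)
        w = cmath.exp(-2j * cmath.pi * q / (1 << stage))
        a, c = src[2 * i], src[2 * i + 1]
        dst[i] = a + c * w          # (AW + BW*j)
        dst[i + half] = a - c * w   # (CW + DW*j)
    return dst

def bit_reverse(x, m):
    # Reorder x so that element i comes from the index with its m bits reversed.
    return [x[int(format(i, "0%db" % m)[::-1], 2)] for i in range(len(x))]

def fft(x):
    # Full transform as m identical-geometry passes; len(x) must be a power of 2.
    m = len(x).bit_length() - 1
    data = bit_reverse(x, m)
    for stage in range(1, m + 1):
        data = fft_pass(data, stage, m)
    return data
```

In this sketch every pass uses the same read and write pattern, which is what lets a single opcode serve all passes; only the twiddle run length changes from pass to pass.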

Let the first complex butterfly input (AA + BB*j) be represented by two 32 bit IEEE floating point numbers located at (RRv + 2*HCnt + 2*RCnt).  (WRCnt=WCnt in each butterfly)  Let the second adjacent input be (CC + DD*j) and the outputs be (AW + BW*j) at (ABv + HCnt + WCnt) and (CW + DW*j) at (CDv + HCnt + WCnt).  Let (XX + YY*j) be the complex twiddle factor and (XW + YW*j) be an intermediate result.  Then the butterfly equations are:

YW ← CC*YY + DD*XX
XW ← CC*XX - DD*YY
DW ← BB - YW
CW ← AA - XW
BW ← BB + YW
AW ← AA + XW
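
The six equations above amount to one complex multiply followed by a complex add and subtract. A minimal Python sketch (not part of the microcode; the function name butterfly is hypothetical) that transcribes them and checks the result against ordinary complex arithmetic:

```python
def butterfly(aa, bb, cc, dd, xx, yy):
    # Intermediate product (XW + YW*j) = (CC + DD*j) * (XX + YY*j)
    yw = cc * yy + dd * xx
    xw = cc * xx - dd * yy
    # (AW + BW*j) is the sum output, (CW + DW*j) the difference output
    dw = bb - yw
    cw = aa - xw
    bw = bb + yw
    aw = aa + xw
    return aw, bw, cw, dw

a, c, w = complex(1.0, 2.0), complex(3.0, -1.0), complex(0.0, 1.0)
aw, bw, cw, dw = butterfly(a.real, a.imag, c.real, c.imag, w.real, w.imag)
assert complex(aw, bw) == a + c * w
assert complex(cw, dw) == a - c * w
```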

Xbus Float operands and operations(*,+,-) by time (result emerges 5 steps after operation due to pipeline):

(DD) *XX *YY CC *YY *XX DX DY .. +CY CX -DY .. BB -YW +YW XW AA- +AA DW BW DD CW AW


Pipeline overlap diagram of memory references (W=write, N=nop):

!  DW BW     NN DW  BW AW CW AW AW              !                                               !
!DD     DD CC     CC              BB BB NN AA AA!  DW BW     NN DW  BW AW CW AW AW              !
!                                               !DD     DD CC     CC              BB BB NN AA AA!

In the above diagram time passes from left to right, with "!"s separating cycles of the inner loop. Note that each horizontal line represents a complete butterfly operation spread over two cycles.  In the first cycle the input samples are all read and the butterfly begins. In the next cycle the butterfly is completed and all the output samples are written (overlapped with reads for the next butterfly). All the virtual memory mapping and faults occur very early in the cycle, so a cycle that gets past 3 operations will complete.  The writing of a cycle can be thought of as preceding the reading, even though both are overlapped.  A series of cycles can be represented:

|3 W3 R3  |1 W3 R2  |1 W2 R1  |1 W1 R0  |0 W0 RT ||2 WT R3  |1 W3 R2  |1 W2 R1  |1 W1 R0  |0 W0 RT ||

There are four cycle types: Initial(3), Start(2), Middle(1) and End(0). Each cycle is introduced by a vertical bar, "|", followed by its type number.
The write address is indicated after the W and the read address after the R.  The twiddle delta is read (RT) during cycle type 0 and the twiddle factor is written (WT) during cycle type 2. A series of 4 butterflies of the same twiddle consists of cycles of type: 2, 1, 1, 1, 0 (the very first series begins with type 3 instead of 2).  The last cycle 0 and the next cycle 2 together change the twiddle factor by complex multiplication by the "twiddle delta" argument.  This twiddle change is implemented as a butterfly operation whose source is the twiddle delta and whose destination is the twiddle register.
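
Reading the cycle diagram above, the order of cycle types can be sketched in a few lines of Python (not part of the microcode; cycle_types and its parameters are hypothetical names, and the sketch ignores restarts after a fault, which re-enter through the Initial cycle):

```python
INITIAL, START, MIDDLE, END = 3, 2, 1, 0

def cycle_types(groups, butterflies_per_group):
    # One group = all butterflies sharing a twiddle factor.
    seq = []
    for g in range(groups):
        # The first group opens with Initial, later groups with Start.
        seq.append(INITIAL if g == 0 else START)
        # Remaining sample reads happen in Middle cycles.
        seq.extend([MIDDLE] * (butterflies_per_group - 1))
        # End writes the last result and reads the twiddle delta.
        seq.append(END)
    return seq

assert cycle_types(2, 4) == [3, 1, 1, 1, 0, 2, 1, 1, 1, 0]
```

Each group takes one more cycle than it has butterflies because a butterfly is read in one cycle and written in the next.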

The cycle type is encoded in Link2.  Cycle types have unique beginnings and mostly common tails.

Cycle type 3 is initialization: the argument table is read into u regs including twiddle uXX uYY.  RCnt ← WCnt and FF←AB←CD←0.

Cycle type 2 uses twiddle argument as write address; updates twiddle u registers: uXX, uYY; and at end of cycle clears write address to force remapping AB ← CD ← 0; like cycle 1 it decrements and branches on WCnt.

Cycle type 1 is the normal inner loop; it decrements read and write address offsets: WCnt,RCnt; if WCnt<0 then changes to cycle type 0 else continue type 1.

Cycle type 0 reads the twiddle delta instead of an input sample; does not decrement addresses; if HCnt=0 then ends opcode else goto cycle 2.
 
Restartability: Two arguments HCnt and LCnt record forward progress; their decreasing sum is the offset for writing output samples.  After writing to offset zero the opcode exits, advancing the PC.  Earlier exits due to page faults or interrupts do not advance the PC, so the FFT is resumed according to HCnt and LCnt.  Also, as the twiddle factor changes it is stored in the argument record.  Since the source and destination arrays do not overlap, any butterfly may be repeated or its results partially stored, as long as LCnt/HCnt only advances past completed butterflies.

Call: (LispFFT vArg LCnt) where vArg: [TW (4) FFv (6) ABv (8) CDv (10) TCnt HCnt (12) Z (16) TD] is the argument record, which must not cross a page boundary.  All the counts and addresses are in units of words but refer to quad words (complex samples), so they must be zero mod 4.  In addition, FFv refers to a pair of complex quadwords and must be 0 mod 8.

Unload parameters from lisp stack in real memory to "stack" Uregisters}
{
RegDef[umXX0,	U,	02];{move to fZ=0 before using} {TWiddle factor 4 word complex float}
RegDef[ulXX,	U,	03];
RegDef[umYY0,	U,	04];{move to fZ=0 before using; STK points here}
RegDef[ulYY0,	U,	05];{fp chip doesn't like ulYY here for some STRANGE reason}
RegDef[uFFvH,	U,	06];{Virtual Address of source (From) array base.}
RegDef[uFFvL,	U,	07];
RegDef[uABvH,	U,	08];{Virtual Address of destination array base.}
RegDef[uABvL,	U,	09];
RegDef[uCDvH,	U,	0A];{Virtual Address of midpoint in destination array base.}
RegDef[uCDvL,	U,	0B];
RegDef[uTCnt,	U,	0C];{count of butterflies with same twiddle (x4)}
RegDef[uHCnt,	U,	0D];{high portion of count of butterflies remaining (x4)}
Set[ft.twiddle.offset, 4 ];
Set[ft.twiddle.delta.offset, 20'd ];{Twiddle Delta(8 words): address of complex zero followed by packed complex root of unity to change twiddle}
Set[ft.HCnt.offset, 11'd ];

RegDef[umXX,	U,	50];	{umXX,FTimes{0}}
RegDef[umYY,	U,	60];{uRx}	{umYY,FTimes{0}}
RegDef[ulYY,	U,	51];{alternate home for ulYY see ulYY0}

RegDef[uArgH,	U,	68];{virtual address of argument record}
RegDef[uArgL,	U,	69];{rA=Rx}
RegDef[ulCC,	U, 	4];	{ulCC, STK{FPlus}}
RegDef[umYW,	U, 	4];	{umYW, STK{FPlus}}

RegDef[FF,	R,	3];{PV}  {Real address in "From" array. Equal to Source + 2*(HCnt + RCnt) }
RegDef[rhFF,	RH,	3];
RegDef[AB,	R,	4];{S} {Real address in destination array. Equal to Dest + HCnt + WCnt } 
RegDef[rhAB,	RH,	4];
RegDef[CD,	R,	1];{TOSH}{Real address in destination array. Equal to Dest + HCnt + WCnt }
RegDef[rhCD,	RH,	1];

RegDef[rmAW,	R,	2];{TT}
RegDef[rlBW2,	R,	2];{TT}
RegDef[rTCnt,	R,	2];{TT}
RegDef[rmBW,	R,	6];{Rx}
RegDef[rlDY,	R,	6];{Rx}
{RegDef[rmCY,	R,	6];{Rx}{remove eventually}}
RegDef[rRCnt,	R,	2];{TT}{uRCnt, rRCnt, FMinus} 
RegDef[uRCnt,	U,	26];{uPV}{low portion of count of butterflies remaining to Read(x4)}
RegDef[uWCnt,	U,	47];{low portion of count of butterflies remaining to Write(x4)}
RegDef[ulDY,	U,	54];

{RegDef[TOS,	R,	0];}
{RegDef[rmDY,	R,	0];}
{RegDef[rlCY,	R,	0];{remove eventually}}
RegDef[rmYW,	R, 	0];
RegDef[rmAA,	R,	0];
RegDef[rlDW,	R,	0];
RegDef[rmDW,	R,	0];
RegDef[rlCW2,	R,	0];

{RegDef[PC,	R, 	5];}
RegDef[rlDD,	R, 	5];
RegDef[rlBW,	R, 	5];
RegDef[rmCW,	R, 	5];
RegDef[umCW,	U, 	58];	{rmCW,umCW}
RegDef[rlCC,	R, 	5];
RegDef[rlCW,	R, 	5];
RegDef[ulCW,	U, 	52];	{rlCW,ulCW}
RegDef[rHCnt,	R, 	5];
RegDef[rlAW,	R, 	5];
RegDef[rmDY,	R, 	5];	{rmDY,umDY}
RegDef[umDY,	U, 	56];	{umDY,FMinus{6}}
{RegDef[rlDY,	R, 	5];}{moved to 6=Rx}
RegDef[rlBB,	R, 	5];
RegDef[ulBB,	U, 	53];
RegDef[rlYW,	R, 	5];
RegDef[rlAA,	R, 	5];
RegDef[ulAA,	U, 	55];	{rlAA,ulAA,FMinusB{5}}


MacroDef[fpUMS, FloatUMS];
MacroDef[fpULS, FloatULS];
MacroDef[fpUMP, FloatUMP];
MacroDef[fpULP, FloatULP];
MacroDef[FloatR, FloatResult];
MacroDef[FTimes, FLTimes.A.B];
MacroDef[FPlus, FLPlus];
MacroDef[FMinus, FLBMinusA];
MacroDef[FMinusB, FLAMinusB];
{MacroDef[WriteOK,LOOPHOLE[wok]];}

SetTask[0];
RegDef[uBlock0, UY, 0];

Set[L2.init, 3 ];
Set[L2.start, 7{2} ];
Set[L2.middle, 1 ];
Set[L2.finish, 0 ];
{ Set[L0.xRedoFtFF, 7 ],
  Set[L0.RedoFtARG, 0C ],{% inconsistent with cons}
  Set[L0.RedoFtAB, 7 ],
  Set[L0.RedoFtCD, 0 ],
  Set[L3.fixFFT, 06 ];{extended fixup from L1.fixFV}
  }
}
{initialization save R registers, set float mode, set stackP, perform one FloatA← to start main loop on odd beat}
{	in Bank 1
@FFT:	opcode[371'b],
	Bank ← FFTBank,	c1;
	,	c2;
	CROSS[FFTEntry],		c3;
}

	FloatMode.RN.AI.IEEE, fpULS{PIPElined},		c1, at[FFTEntry];
	uPC ← PC,	c2;
	FloatA ← STK,	{GIGO}	c3;

ftS:	uuPV ← PV,	c1;
	uTT ← S,	c2;
	Noop,	c3;
	
{fetch argument record address from stack}	
	MAR ← [rhS, S+0],	c1;
	L1 ← L1.NoFixesB2,	c2;
	TT ← MD,	c3;
	
	MAR ← [rhS, S - 1], L0 ← L0.RedoFtARG,	c1;
	uWCnt{fixesTOS} ← TOS, CANCELBR[$, 2],	c2;
	rhTT ← MD,	c3;

{map argument record address}
	Map ← TT ← [rhTT,TT + 0], L1 ← L1.FFTFixesB2,	c1;
	TOSH ← 1,	 ,c2;
	Rx ← rhRx ← MD, XwdDisp{XDirtyDisp},		c3;

{fetch args}	
	MAR ← Q	← [rhRx, TT+0],  DISP2[ftARGmp],	c1, at[L0.RedoFtARG,10,WMapFixCallerB2];
	uArgH ← Rx, Rx ← Q,	c2, at[1, 4, ftARGmp];
	uArgL ← Rx, TT ← MD,	c3;

ftArg:	Ybus ← TOSH xor 0D, ZeroBr,	c1;
	TOSH ← TOSH + 1, AltUaddr, BRANCH[$, ftArgEnd],	c2;
	uBlock0 ← TT,	c3;

	MAR ← Rx ← [rhRx, Rx+1],	c1;
	Noop, CANCELBR[$, 2],	c2;
	TT ← MD, GOTO[ftArg],	c3;

ftArgEnd:
	stackP ← 04{FPlus},{note pending AltUaddr} GOTO[ftCy3],	c3;

{remap}	
	Noop, CANCELBR[WLMapFixB2, 3],	c2, at[0, 4, ftARGmp];
	Noop, CANCELBR[WLMapFixB2, 3],	c2, at[2, 4, ftARGmp];
	Noop, CANCELBR[WLMapFixB2, 3],	c2, at[3, 4, ftARGmp];

{if interrupted, restore TOS from uWCnt}	
ftDone:	TOSH ← smallpl,	c1;
	rhPV ← nRhS,	c2;
	rhS ← nRhS,	c3;

	PC ← uPC,	c1;
	PV ← uuPV,	c2;
	S ← uTT,	c3;

	Bank ← EmuBank,	c1;
	rhTOSH ← nRhS,	c2;
	CROSS[FFTExit],	c3;

{	in Bank 1
	stackP ← TOS ← 0,	c1, at[FFTExit];
	PC ← PC + PC16, IBDisp, L2 ← L2.0,	c2;
	S ← S - 2, L2 ← L2.0, DISPNI[OpTableB2],	c3;
}


	,	c1, at[L1.FFTFixesB2, 10,  TrapFixesB2];
	TOS ← uWCnt,	c2;
	TOSH ← smallpl,	c3;

	rhPV ← nRhS,	c1;
	stackP ← S xor S, rhS ← nRhS,	c2;
	rhTOSH ← nRhS,	c3;

	PC ← uPC,	c1;
	PV ← uuPV,	c2;
	S ← uTT, GOTO[B2CrossToPFaultc1],	c3;

{cycle type 3: init}
ftCy3:	uWCnt ← TOS, L2 ← L2.init,	c1, at[L2.init, 10, ftCy];
	uRCnt ← TOS,	c2;
	rRCnt ← TOS,	c3;
	
        Rx ← umXX0,	c1;
	umXX ← Rx,	c2;
	Rx ← umYY0,	c3;
	
	umYY ← Rx,	c1;
	Rx ← ulYY0,	c2;
	ulYY ← Rx,	c3;
	
	AB ← 0 + 0, rhAB ← 0,	c1;
	CD ← 0 + 0, rhCD ← 0, c2;
	FF ← 0 + 0, rhFF ← 0, GOTO[ftCy1],	c3;
	
{cycle type 2: start}
ftCy2:	rhAB ← uArgH,	c1, at[L2.start, 10, ftCy];
	rhCD ← uArgH,	c2;
	AB ← uArgL, L2 ← L2.start,	c3;
	
	AB ← AB + ft.twiddle.offset,	c1;
	CD ← AB,	c2;
	FF ← 0 + 0, rhFF ← 0, GOTO[ftCy1],	c3;

{cycle type 0: finish}
ftCy0:	rhFF ← uArgH,	c1, at[L2.finish, 10, ftCy];
	FF ← uArgL, L2 ← L2.finish,	c2;
	FF ← FF + ft.twiddle.delta.offset, GOTO[ftCy1],	c3;
























{cycle type 1: middle}{Check early for page crossings and remap}
ftCy1:
flDD:	MAR← FF← [FF,FF-1],		FloatA	← ulAA,			c1, at[L2.middle, 10, ftCy];
	STK{f+}	← rmAA,	    		FloatA	← rmAA LRot0,{f+}fpULS,	c2, {+AA} BRANCH[$,ftFFcros,1];
ftFFok:	STK{AA}	← rlAA,    rlDD	← MD, 					c3; {above:}{STK,FPlus}
flDW:	MAR← CD	← [CD,CD-1],		FloatA	← STK{AA}, {rev}fpUMS,	c1;
	MDR	←	   rlDW←	FloatA ← FloatR,	fpULS,	c2, {DW} BRANCH[$,ftCDcros,1];
ftCDok:	STK	← rlDD,	   rmDW← 	FloatA ← FloatR,	fpUMS,	c3;
flBW:	MAR← AB	← [AB,AB-1],						c1;
	MDR	←	   rlBW	←  	FloatA ← FloatR,		c2, {BW} BRANCH[$,ftABcros,1];
ftABok:			      Q	← uRCnt,				c3;
fmDD:	MAR← FF	← [FF,FF-1],						c1;
			   rmBW	← 	FloatA ← FloatR,		c2,	 CANCELBR[$,2];
			   rlBW2 ← rlBW,FloatAB	← MD, 		fpUMS,	c3; {DD}
flCC:	MAR← FF	← [FF,FF-1],		FloatAB	← STK{lDD},	fpULS,	c1;
			   rmCW	← 	FloatA ← FloatR,	fpUMS,	c2, {CW} CANCELBR[$,2];
	umCW	← rmCW,    rlCC ← MD, 		L2Disp,			c3;	{rmCW,umCW}
flNN:	STK{CC}	← rlCC,	   rlCW	← 	FloatA ← FloatR,	fpULS,	c1, BRANCH[flNN2, NewXY, 3];

{cycle 2: write twiddle: uXX uYY, increase RCnt, decrease and store HCnt}
NewXY:	ulCW	← rlCW,			c2;
	ulYY	← rlBW2,		c3;
ftWT1:	rHCnt ← uHCnt,			c1;
	rTCnt ← uTCnt,			c2;
	rHCnt ← rHCnt - rTCnt,		c3;
ftWT2:	Q{rRCnt} ← Q + rTCnt,		c1;
	TT ← uArgL,		c2;
	rhTT ← uArgH,			c3;
ftWT3:	MAR ← [rhTT, TT+ft.HCnt.offset],		c1;
	MDR ← uHCnt ← rHCnt,	WriteOK, CANCELBR[$ ,2],		c2;
	umYY ← rmBW, rmAW ← FloatA ← FloatR,	c3;
ftWT4:	umXX ← rmAW,			c1;
	rlAW ← FloatA ← FloatR,			c2;
	ulXX ← rlAW, GOTO[FmDW],	c3;

flNN2:	ulCW	← rlCW,	   rmAW	← 	FloatA ← FloatR,	        c2; {AW} {rlCW,ulCW}
			   rlAW	← 	FloatA ← FloatR,		c3;
FmDW:	MAR← CD	← [CD,CD-1],		FloatA	← umXX,	FTimes,		c1; {*XX}
	MDR	← rmDW,	    		FloatA	← ulXX,	WriteOK,	c2, CANCELBR[$,2];
			 		FloatA	← umYY,	FTimes,		c3; {*YY}
fmCC:	MAR← FF	← [FF,FF-1],		FloatA	← ulYY,			c1;
			   rlCW2	← ulCW, 				c2,	CANCELBR[$,2];
	uWCnt	← Q,			FloatAB	← MD,			c3; {CC}
flCW:	MAR← CD	← [CD,CD-1],		FloatAB	← ulCC{STK},		c1;
	MDR	← rlCW2,			FloatA	← umYY,	FTimes, WriteOK,c2, {*YY} CANCELBR[$,2];
	 				FloatA	← ulYY,			c3;
fmBW:	MAR← AB	← [AB,AB-1],		FloatA	← umXX,	FTimes, fpUMP,	c1; {*XX}
	MDR	← rmBW,			FloatA	← ulXX,	WriteOK,fpULP,	c2,	CANCELBR[$,2];
			rmBW{} ←	FloatAB ← FloatR,	fpUMP,	c3; {DX}
flAW:	MAR← AB	← [AB,AB-1],		FloatAB	← FloatR,	fpULP,	c1;
	MDR	← rlAW,	   rmDY	← 	FloatA ← FloatR, WriteOK,	c2, {DY} CANCELBR[$,2];
	{umDY	← rmDY,}	   rlDY	← 	FloatA ← FloatR, 	c3;	    {rmDY,umDY}
fmCW:	MAR← CD	← [CD,CD-1],		FloatA{Nop} ← umCW,	fpUMP,	c1; {..}
	MDR	← umCW,			FloatA{N}← umCW, WriteOK,fpULP, c2, CANCELBR[$,2];
	ulDY ← rlDY, {rmBW{6} ←}	FloatA	← FloatR, FPlus,fpUMP,	c3, L2Disp; {+CY}
fmAW:	MAR← AB	← [AB,AB-1],		FloatA	← FloatR, 	fpULP,	c1, BRANCH[$, NewW, 5];
fmAW2:	MDR	← rmAW,			FloatAB← FloatR, WriteOK,	c2, {CX} CANCELBR[$,2];
	umDY ← rmDY,		FloatAB	← FloatR,		c3;
FlBB:	MAR← FF	← [FF,FF-1], 		FloatA	← umDY,	FMinus,		c1; {-DY}   {umDY,FMinus}
{	Ybus	← rlDY,			FloatA	← rlDY LRot0,{=>}fpULS,	c2,	CANCELBR[$, 2];}
					FloatA	← ulDY,	c2,	CANCELBR[$, 2];
			   rlBB ← MD,	FloatA{Nop} ← MD,		c3, {..} L2Disp;
fmBB:	MAR← FF	← [FF,FF-1],		FloatA{Nop} ← FloatR,		c1, BRANCH[ftPaw, $, 6];
	STK{ulBB} ← rlBB,  rRCnt ← Q-4, CarryBr{>=0}, CANCELBR[$,2],	c2{,     at[1, 10, ftTst]};
fmBB3:			  L2 ← 0{or 1}, FloatAB	← MD,		fpUMS,	c3, {BB} BRANCH[ftRCZ, $];
FmNN:					FloatAB	← STK{ulBB},	fpULS,	c1;
FmNN2:	{uRCnt	← rRCnt, }  rmYW ←	FloatA	← FloatR, FMinus,	c2; {-YW}
	umYW{STK}{f+}	← rmYW,	   rlYW ←	FloatA	← FloatR,		c3;
flAA:	MAR← FF	← [FF,FF-1], 		FloatA	← STK{umYW}{f+}	fpUMS,	c1; {+YW}	{STK,FPlus}
	Ybus	← rlYW,			FloatA	← rlYW LRot0,	fpULS,	c2,	 CANCELBR[$,2];
	uRCnt	← rRCnt,	   rlAA	← MD,					c3;
fmAA:	MAR← FF	← [FF,FF-1],		FloatAB	← FloatR, 		c1; {XW}
			   {Q ← rRCnt,} FloatAB	← FloatR,  L2Disp,	c2,	 CANCELBR[$,2];
	ulAA	← rlAA,	   rmAA ← 	FloatA	← MD,FMinusB,DISP4[ftCy],c3; {AA-} {rlAA,ulAA,FMinusB}



{cycle 2,3: Clear Write Map entries:}
NewW:	MDR	← rmAW,	AB ← 0 and AB,		FloatAB ← FloatR, WriteOK,	c2, {CX} CANCELBR[$,2];
	umDY ← rmDY,	CD ← 0 and CD,		FloatAB	← FloatR, GOTO[FlBB],	c3;
			
{cycle 0: Pause: don't decrement rRCnt{Q}; End Test HCnt=0 }
ftPaw:	Ybus ← uHCnt, ZeroBr, L2 ← L2.start{7}, CANCELBR[$, 7], c2;
	STK{ulBB} ← rlBB, rRCnt ← Q, FloatAB ← MD,	fpUMS, BRANCH[$, ftDone],	c3;
	FloatAB	← STK{ulBB},	fpULS, GOTO[FmNN2],	c1;

{cycle 1 => cycle 0 when Read Counter Zero}
ftRCZ:	FloatAB	← STK{ulBB},	fpULS, GOTO[FmNN2],	c1;


{FF crosses:}

ftFFcar: rhTT ← Q LRot0,		c3;
	Map ← [rhTT,TT],	GOTO[ftFFcn],	c1;

ftFFcros: Q ← uHCnt, L0 ← L0.xRedoFtFF,	c3;
	TT ← Q + rRCnt{TT},		c1;
	TT ← TT LRot1, rhTT ← uFFvH,	c2;
	Rx ← uFFvL,		c3;
	
	Map ← TT ← [rhTT,TT + Rx], CarryBr,	c1;
ftFFcn:	Q ← rhTT + 1, BRANCH[$, ftFFcar], LOOPHOLE[byteTiming],	c2;
	FF ← rhFF ← MD, XRefBr,		c3;
	
ftFFre:	MAR← Q	← [FF,TT+7],  BRANCH[ftFFmp, $],	c1, at[L0.xRedoFtFF,10,RMapFixCallerB2];
	FF ← Q, BRANCH[ftFFok, ftFFcros{not expected},1],	c2;
ftFFmp:	Rx ← FF, CANCELBR[RLMapFixB2, 3]{return to ftFFre},	c2;


{AB crosses:}

ftABcar: rhTT ← Q LRot0,		c3;
	Map ← [rhTT,TT],	GOTO[ftABcn],	c1;

ftABcros: TT ← uHCnt, L0 ← L0.RedoFtAB,	c3;
	Q ← uWCnt,		c1;
	TT ← TT + Q{1}, c2;
	Noop,	c3;
	Noop,	c1;
	rhTT ← uABvH,	c2;

{	AB ← uWCnt,		c1;
	TT ← TT + AB{0}, rhTT ← uABvH{0},	c2;}
	Q ← uABvL,		c3;
	
	Map ← TT ← [rhTT,TT + Q], CarryBr,	c1;
ftABcn:	Q ← rhTT + 1, BRANCH[$, ftABcar], LOOPHOLE[byteTiming],	c2;
	AB ← rhAB ← MD, XwdDisp{XDirtyDisp},		c3;
	
ftABre:	MAR← Q	← [AB,TT+3],  DISP2[ftABmp],	c1, at[L0.RedoFtAB,10,WMapFixCallerB2];
	MDR	← rlBW, AB ← Q, BRANCH[ftABok, ftABcros{not expected},1],	c2, at[1, 4, ftABmp];

	Rx ← AB, CANCELBR[WLMapFixB2, 3],	c2, at[0, 4, ftABmp];
	Rx ← AB, CANCELBR[WLMapFixB2, 3],	c2, at[2, 4, ftABmp];
	Rx ← AB, CANCELBR[WLMapFixB2, 3],	c2, at[3, 4, ftABmp];

{CD crosses:}

ftCDcar: rhTT ← Q LRot0,		c3;
	Map ← [rhTT,TT],	GOTO[ftCDcn],	c1;

ftCDcros: TT ← uHCnt, L0 ← L0.RedoFtCD,	c3;
	Q ← uWCnt,		c1;
	TT ← TT + Q{1}, c2;
	Noop,	c3;
	Noop,	c1;
	rhTT ← uCDvH{0},{%}	c2;
	Q ← uCDvL,		c3;
	
	Map ← TT ← [rhTT,TT + Q], CarryBr,	c1;
ftCDcn:	Q ← rhTT + 1, BRANCH[$, ftCDcar], LOOPHOLE[byteTiming],	c2;
	CD ← rhCD ← MD, XwdDisp{XDirtyDisp},		c3;
	
ftCDre:	MAR← Q	← [CD,TT+3],  DISP2[ftCDmp],	c1, at[L0.RedoFtCD,10,WMapFixCallerB2];
	MDR	← rlDW, CD ← Q, BRANCH[ftCDok, ftCDcros{not expected},1],	c2, at[1, 4, ftCDmp];

	Rx ← CD, CANCELBR[WLMapFixB2, 3],	c2, at[0, 4, ftCDmp];
	Rx ← CD, CANCELBR[WLMapFixB2, 3],	c2, at[2, 4, ftCDmp];
	Rx ← CD, CANCELBR[WLMapFixB2, 3],	c2, at[3, 4, ftCDmp];

{CW:	MAR← CD	← [CD,CD-1],		FloatA{Nop} ← umCW,	{fpUMP,}	c1; {..}
	MDR	← umCW,			FloatA{N}← umCW, WriteOK,{fpULP,} c2, CANCELBR[$,2];
		 		{rmBW{6} ←}	FloatAB	← umXX, FTimes, fpUMP,	c3, L2Disp; {+CY}
	MAR← AB	← [AB,AB-1],		FloatAB	← ulXX, 	fpULP,	c1, BRANCH[fmAW2, NewW, 5];
	
{fmCW:}
mCW:	FloatA{Nop} ← FloatR,	fpUMP,	c1; {..}
	FloatA{Nop} ← FloatR,,	fpULP,	c2;
	rmCY{rmBW} ←	FloatA ← FloatR, FPlus,	c3; {+CY}
	
mAW:	rlCY{rmAA} ←	FloatA	← FloatR,	c1;
	AB ← 0,	c2;
	CD ← 0,	GOTO[FlBB],	c3;
	
lBB:	FloatA ← FloatR, 	c1;
	FloatA ← FloatR,	c2;
	FloatA{Nop} ← FloatR, GOTO[fmBB],	c3, {..} L2Disp;}