:IF[WithFloatingPoint]; ******************************** TITLE[MesaFP]; *Mesa single-precision floating point %Ed fiala 8 June 1982: Added better Unpack routine as a comment; debug FSc. Ed Fiala 20 May 1982: Fix register assignments. Ed Fiala 11 May 1982: Reverse args for FSC. Ed Fiala 2 April 1982: Adapt entry and trapping to new Pilot conventions; move page assignments to GlobalDefs. Ed Fiala 25 September 1981: Prefix names with "fp"; move fpSticky register for Cedar; implement square root and floating scale; insert more page breaks. Ed Fiala 15 June 1981: Improve speed about 60 percent; bum 7 registers; expended 23b mi net. Produce 0 on underflow for Wilhelm (non-spec) as option. Ed Fiala 20 March 1981: add Pilot conditionals; bum 2 registers; save 11b mi. Original version by Jim Boyce. Floating point numbers are represented in two different formats. Arguments on the stack are IEEE single precision reals with the following format: bit 0 sign (1 means negative) 1:8 exponent biased by 127d (exponent of 177b with a fraction of 0 is the integer 1; exponent of 0 with a fraction of 0 is true 0; exponent of 0 with non-0 fraction is denormalized [i.e., the fraction does not have an implicit "1." in front of it]; exponent of 377b with a fraction of 0 is infinity; exponent of 377b with non-0 fraction is Not-A-Number) 9:31 fraction (implicitly preceded by '1.' if non-0 and normalized.) These numbers appear on the stack with the word holding bits 0-15 nearer to the top of stack. Most operations unpack the operands into a format that is easier to work with. This format uses three words interpreted as follows: ExpSign bits 1:8 are the exponent (biased by 127d), bit 15d the sign. NOTE: FAdd, FSub, FMp, FDiv, etc. must and do compensate for the addition of 1 to the exponent which occurs in the Renorm subroutine. Frac The fraction. When unpacked, the leading one appears in Frac[1]; when about to be packed again, the leading bit is in Frac[0], Frac[31d] is a StickyBit (the inclusive or of all remaining bits); Fix and Round handle rounding slightly differently. FracH 16 high order bits of Frac. FracL 16 low order bits of Frac. Three registers for each of the two arguments hold a floating point number in unpacked form. These registers have names with 1's or 2's in them. Numbers are often referred to by index (SB or DB). Arg1 has index 2 and Arg2 has index 3. The IEEE specification calls for four different rounding modes, two methods of handling underflow/denormalized numbers (we have added a third), two kinds of infinity. We are in the process of converting to a Real package that records all mode settings in the fpSticky register accessible to microcode. When this conversion is complete, microcode will optionally implement as much of the package as it wants to, but trap when an opcode cannot be completed in microcode. We fall short in that the fpSticky register is not saved in the Mesa process state. In all other situations, microcode either completes the operation or traps to software, as indicated below. POSSIBLE IMPROVEMENTS: 1) Put call to RoundFrac in FDiv exit saving 6 cycles. 2) Faster Renorm called from Float and subtraction. 3) Eliminate 3 cycles in zero-arg testing for all two-arg opcodes. 4) Use 31b max shift rather than 40b for Unnorm entry from FAdd/FSub; possibly do the Unnorm entry differently for negative fpTemp. 5) Implement variant rounding modes, infinities, gradual denormalization, infinity on overflow, not-a-number on divide by zero, mod on fix/round overflow. 6) Get the fpSticky register and ? software bits in the process state. FPSTICKY REGISTER 0 0 OR 1 into fpSticky[15d] on every inexact result 1 Trap on any inexact result 1-2 0 Trap on any denormalized result (user may be interested in the loss of precision) 1 Substitute 0 on underflow (non-IEEE; requested by Wilhelm) 2 Gradually denormalize on underflow (not in microcode) 3 -- 3 0 Projective infinity (only one unsigned infinity; compare of anything with infinity traps; not sure what other operations are supposed to do). 1 Affine infinity (+ infinity and - infinity both defined; arithmetic and comparisons work as expected). 4-5 0 Round to nearest (unbiased; round to even if halfway) 1 Round toward 0 (truncate)--not in microcode 2 Round toward plus infinity--not in microcode 3 Round toward minus infinity--not in microcode 6 0 Trap if denormalized args are supplied 1 Normalize the arguments and then use them (not in microcode) 7 0 Trap on invalid operations (compare of projective infinity, Not-a-number as an argument) 1 Result is the infinity or not-a-number (not in microcode) 8 0 Trap on overflow of Fix or Round operation 1 Return low-order 16d bits of the result (not in microcode) 9 0 Trap on divide-by-zero 1 Stuff in not-a-number on divide-by-zero and continue (not in microcode) 10d 0 Trap on arithmetic overflow 1 Stuff in infinity on arithmetic overflow and continue (not in microcode) 11-14d -- undefined 15d 0 All results have been exact 1 One or more inexact results have occurred (i.e., rounding has taken place). Softare is expected to execute the FSticky opcode to exchange a new value on the stack with the value currently in Sticky; then it executes floating point opcodes in the new mode settings. The fpSticky register should be saved with the Mesa process state. Also, the software state of the floating point processor includes occurrence bits for all the events which have traps: Inexact result occurred (temporarily kept in fpSticky), Denormalized arguments were supplied; Invalid operations occurred; Fix/Round overflow occurred; Divide by zero occurred; Arithmetic overflow occurred; Underflow occurred. The microcode need not keep these occurrence bits because the software can originally cause the events to trap, remember the event in storage, and turn off the microcode trap enable. **What about keeping the events in the process state? In the current microcode implementation, any argument with 377b or 0b exponent and any result which overflows will trap to software. ROUNDING MODES Rounding occurs at SB[fpFracL] bit 7 for arithmetic or bit 15d for Fix and Round. The bit following the one rounded is the rounding bit, and all lower bits are sticky bits. If sticky bits are all 0, the result is exact, and no rounding occurs. Otherwise, if the InexactResult trap is enabled, trap. Otherwise, round in the direction specified by the rounding mode. "Round to nearest" goes in the obvious direction when the sticky bits are .ne. 0; if rounding bit is 1 and sticky bits .eq. 0, the number is exactly halfway between two possible results, so by convention round to the nearest even number. % *40, 53, and 70 are available temporaries. *Note that TrapParm=50 and RTemp=52 are also used, but the registers *fpExpSign, fpFracH, and fpFracL are not used (because they are used with *SB[fpExpSign], etc. RV[fpExpSign,70]; RV[fpFracH,44]; RV[fpFracL,64]; RV[fpExpSign1,72]; RV[fpFrac1H,46]; RV[fpFrac1L,66]; RV[fpExpSign2,73]; RV[fpFrac2H,47]; RV[fpFrac2L,67]; RV[fpSticky,75]; *See page 1 comment--NOT A TEMPORARY RV[fpCode,44]; *Fix, Round RV[fpCCTemp,71]; *Unnorm RV[fpTemp,45]; *Unnorm, FAdd, FSub RV[fpCount,71]; *FMul RV[fpFrac2HLSh1,51]; *FMul RV[fpFrac2HLSh1p1,45]; *FMul RV[fpOne,44]; *FMul RM[fpFrac1LL,IP[fpFrac2L]]; *FMul RV[fpTFrac,51]; *FDiv RV[fpFrac2H',71]; *FDiv RV[fpFrac2H'p1,45]; *FDiv RV[fpFrac2Hp1,44]; *FDiv RM[fpRhi,IP[fpFrac1L]]; *FSqRt RM[fpRlo,IP[fpFrac1H]]; *FSqRt %Esc bytecodes with alpha in [100b to 117b] are floating point operations. The 1st mi below is the Esc dispatch table entry for alpha in this range. RTemp holds alpha and alpha+EscTrapOffset is the trap location. T is loaded with 30b, stored in SALUF as the A + A + Cy0 ALU operation. RTemp and T hold the Esc alpha byte at entry here; two arg opcodes dispatch on the low four bits of RTemp not only initially but also out of Unpack2. The various Fix/Round opcodes use fpCode to control dispatches as follows: 10:11 FixDone dispatch (Long 0, Integer 1, Cardinal 2) 12:15 Unpack dispatch (FAdd 0, FSub 1, FMul 2, FDiv 3, FComp 4, Fix 6, Round 10, FSqRt 13, FSc 14); 2-arg opcodes dispatch on RTemp[12:15d], 1-arg opcodes on fpCode[12:15d]. Timing from the beginning of the opcode through the dispatch table below is 10 cycles + 2*2.25 cycles for buffer refill = 14.5 cycles; approximate timing for the whole opcode is shown in the table below. % TrapParm _ T, LoadPage[fpPage0], Disp[.+1], At[EscTop,4]; fpFrac2H _ T _ 30C, GoToP[Unpack2], DispTable[20]; *FAdd (~154) fpFrac2H _ T _ 30C, GoToP[Unpack2]; *FSub (~156) fpFrac2H _ T _ 30C, GoToP[Unpack2]; *FMul (~387) fpFrac2H _ T _ 30C, GoToP[Unpack2]; *FDiv (~420) fpFrac2H _ T _ 30C, GoToP[Unpack2]; *FComp (~92) fpCode _ 6C, GoToP[Unpack1]; *Fix (~70) fpFrac2H _ T _ 30C, GoToP[Float]; *Float (~45+Renorm) fpCode _ 26C, GoToP[Unpack1]; *FixI (~74) fpCode _ 46C, GoToP[Unpack1]; *FixC (~68) T _ Stack&-1, GoToP[FSticky]; *FSticky (~24) LoadPage[opPage0], GoToP[fpTrapExit]; *FRem unimplemented fpCode _ 10C, GoToP[Unpack1]; *Round (~83) fpCode _ 30C, GoToP[Unpack1]; *RoundI (~87) fpCode _ 50C, GoToP[Unpack1]; *RoundC (~81) fpCode _ 13C, GoToP[Unpack1]; *FSqRt (~480) fpCode _ 14C, GoToP[.+1]; *FSc (~92) OnPage[fpPage0]; T _ (Stack&-1) - 1; fpFrac1H _ T, GoTo[Unpack1]; %Unpack2 unpacks the first Real on the stack into fpExpSign2, fpFrac2H, and fpFrac2L and the 2nd into fpExpSign1, fpFrac1H, and fpFrac1L and returns SB pointing at arg1 and fpFrac1H in T for FAdd, FSub, FMul, and FDiv. Unpack1 unpacks one real on the stack into fpExpSign2, fpFrac2H, and fpFrac2L and returns SB and DB pointing at arg2 for Fix and Round. Unpack also filters out numbers which the microcode can't handle, traping on unnormalized numbers, Infinities, and NotANumbers. Timing = 31 cycles for Unpack1, 48 for 2 args (-3 per arg that is 0 and +1 per arg that is negative). % OnPage[fpPage0]; *Simultaneously do SALUF_30 and fpFrac2H_60b, where SB_fpFrac2H puts the two *leading ones (i.e., bits 10:11d) into the word-addressing part of SB and *select word 3 of an addressed quadword. Unpack2: fpFrac2H _ (fpFrac2H) + (SALUF _ T); fpFrac2H _ (SB _ fpFrac2H) - 1, Task; T _ LdF[Stack,11,7]; *BBFBX causes SBX_SB; the SB_ here takes effect when BBFBX in the last mi of *Unpack is executed. SB _ fpFrac2H, fpFrac2H _ T, BBFBX, NoRegILockOK, Call[Unpack2A]; fpFrac2H _ (LSh[fpFrac2H,7]) or T, Call[Unpack]; Dispatch[RTemp,14,4]; T _ fpFrac1H _ (LSh[fpFrac1H,7]) or T, Disp[@FAdd]; Unpack1: fpFrac2H _ T _ 30C; fpFrac2H _ (fpFrac2H) + (SALUF _ T), Task; SB _ fpFrac2H; DB _ fpFrac2H, BBFBX, Call[Unpack]; *BBFBX will cause SBX_SB Dispatch[fpCode,14,4]; T _ fpFrac2H _ (LSh[fpFrac2H,7]) or T, Disp[@Fix]; *Position exponent in bits 1:8 and sign in bit 15d because: *(a) XOR'ing signs and adding or subtracting exponents in FMul and FDiv can *be accomplished in one operation; *(b) Convenient overflow/underflow test in sign bit of word; and *(c) Convenient for packing result. Unpack: T _ LdF[Stack,11,7]; SB[fpFracH] _ T; Unpack2A: T _ (LdF[Stack&-1,1,17]) and not T, GoTo[UnpNeg,R<0]; SB[fpExpSign] _ T, GoTo[UnpNZ,ALU#0]; *Zero exponent may represent true 0 or denormalized number. UnpZ: T _ Stack&-1; LU _ SB[fpFracH] or T; SB[fpFracL] _ T, BBFBX, GoTo[DenormTrap,ALU#0]; CInexactOK: Return; UnpNeg: SB[fpExpSign] _ (Zero) + T + 1, GoTo[UnpZ,ALU=0]; UnpNZ: LU _ SB[fpExpSign] + (200C); SB[fpFracH] _ SB[fpFracH] or (200C), GoTo[NaNTrap,ALU<0]; T _ LSh[Stack,7]; SB[fpFracL] _ T, BBFBX; T _ RSh[Stack&-1,11], Return; %This is supposed to be a better version of the above. Caller does SB[fpFracL] _ T, BBFBX; in the mi after the return. Unpack: T _ (Stack) and not (177C), GoTo[UnpNeg,R<0]; SB[fpExpSign] _ T, GoTo[UnpNZ,ALU#0]; UnpZ: T _ LSh[Stack&-1,11]; LU _ (Stack) or T; *DenormTrap coincident with NaNTrap (costs 2 cycles/zero argument). SB[fpFracH] _ T, DblGoTo[DenormTrap,UnpZRet,ALU#0]; UnpNeg: T _ (LdF[Stack,1,17]) and T; SB[fpExpSign] _ (Zero) + T + 1, GoTo[UnpZ,ALU=0]; UnpNZ: T _ (LSh[Stack&-1,11]) + 1; SB[fpFracH] _ T; T _ RSh[Stack,11]; LU _ SB[fpExpSign] + (200C); SB[fpFracH] _ (RCy[SB[fpFracH],2]) or T, GoTo[NaNTrap,ALU<0]; UnpZRet: T _ LSh[Stack&-1,7], Return; % %Entries are PackA and Renorm; NormGo is entered by FAdd/FSub and FSc; Float contains a copy of the mi at Renorm. Renorm left-shifts the fraction SB[fpFracH/L] until SB[fpFracH][0] is 1 and subtracts the number of positions shifted from the exponent. Then it rounds SB[fpFracH],,SB[fpFracL] at SB[fpFracL][7] using SB[fpFracL][8] for rounding and SB[fpFracL][9:15d] as sticky bits. Finally, it performs range checks, pushes the number onto the stack, and exits. NOTE: If an overflow occurs, the possible range of exponents is 377 to 376+376-200 or 377b to 574b (For FMul with 0 NormSteps). For an underflow, the possible range is 0 or 777b down to 0-376b+177b-1; i.e., 0 or 777b down to 600b (For FDiv). Timing from Renorm to exit: 8 x nsteps + 30 + (if sticky bits .eq. 0 after normalizing: 1 if odd or 3 if even) + (3 if fpFracL carries rounding up (+5 if fpFracH also carries)). The number of normalizing steps will be: 0 for FDiv or FSqRt 1 for FSc (starting at NormGo); 0 or 1 for FMul; 0 or 1 for FAdd/FSub if signs are the same; 1 to 24d for FAdd/FSub if signs differ (starting at LongNormGo); 0 to 15d steps for Float (starting at LongRenorm). % NormStep: *Only Renorm and copy of Renorm in Float jump to NormStep. SB[fpExpSign] _ SB[fpExpSign] - (200C); LongNormGo: *Odd placement; paired with mi at SubNonNeg+1. NormGo: SB[fpFracL] _ SB[fpFracL] SALUFOP T, Task; SB[fpFracH] _ SB[fpFracH] SALUFOP T, UseCOutAsCIn; *Even placement; paired with MulDone+2. The load of T here is used at Pack2 *via RndNr, Inexact, InexactNoTrap. Renorm: T _ LSh[SB[fpFracH],10], GoTo[NormStep,R>=0]; Pack: Dispatch[fpSticky,4,2]; *Dispatch on rounding mode LU _ SB[fpFracL] and (177C), Disp[.+1]; *Round to nearest: Test sticky bits; begin assuming sticky bits .ne. 0 SB[fpFracL] _ SB[fpFracL] + (200C), DblGoTo[RndEx,RndNr,ALU=0], DispTable[4]; LoadPage[opPage0], GoTo[fpTrapExit]; *Truncate (i.e., round toward 0) LoadPage[opPage0], GoTo[fpTrapExit]; *Round toward plus infinity LoadPage[opPage0], GoTo[fpTrapExit]; *Round toward minus infinity *Assumption correct--continue rounding. RndNr: SB[fpFracH] _ SB[fpFracH] + 1, UseCOutAsCIn, DblGoTo[RndUp,Inexact,Carry]; *Assumption wrong--all sticky bits are 0--fixup result. RndEx: Dispatch[SB[fpFracL],7,3], Skip[Carry']; *If carry occurred, then the number was exactly halfway between an odd *number and the next higher even number, and it is now correctly rounded. SB[fpFracH] _ SB[fpFracH] + 1, GoTo[RndUp]; T _ LSh[SB[fpFracH],10], Disp[.+1]; Inexact: *Was halfway and odd--now fixed up. fpSticky _ (fpSticky) or (1C), DblGoTo[InexactTrap,InexactNoTrap,R<0], At[RoundDisp,0]; *Was exact and even--adding 200b didn't affect it. LU _ SB[fpExpSign] + (400C), DblGoTo[OverUnderflow,Pack2,R<0], At[RoundDisp,2]; *Was halfway beteen even & next higher odd--undo +200b. SB[fpFracL] _ SB[fpFracL] - (200C), GoTo[Inexact], At[RoundDisp,4]; RndUp: T _ LSh[SB[fpFracH],10], GoTo[.+3,Carry']; *Adding 400b to SB[fpExpSign] here while leaving SB[fpFracH] .eq. 0 works *except when exponent is 376b--false overflow gets generated. SB[fpExpSign] _ SB[fpExpSign] + (200C); SB[fpFracH] _ 100000C; fpSticky _ (fpSticky) or (1C), GoTo[InexactTrap,R<0]; InexactNoTrap: *Was exact and odd. LU _ SB[fpExpSign] + (400C), DblGoTo[OverUnderflow,Pack2,R<0], At[RoundDisp,6]; Pack2: T _ (RSh[SB[fpFracL],10]) or T, GoTo[Overflow,ALU<0]; Stack&+1 _ T; *Push low-order result word T _ SB[fpExpSign] and not (177C), Task; *Simultaneously offset the exponent by 1 while OR'ing in the fraction. T _ (LdF[SB[fpFracH],0,10]) + T; *Push high-order result & exit PackA: T _ (LSh[SB[fpExpSign],17]) or T, GoTo[FlPush]; %Improvement for Float and FAdd/FSub with different signs. Timing: 12/tristep + (5, 12, or 14 for the exit 0, 1, or 2 step cases). This is 2 slower for 0 steps, 1 faster for 1 step, and 5 faster for 2 steps; 12 faster for 3 steps; etc. Renorm goes 1 faster if all places which might use more than 1 NormStep jump to LongNorm. LongNorm: Dispatch[SB[fpFracH],1,2], GoTo[PackL,R<0]; SB[fpExpSign] _ SB[fpExpSign] - (200C), Disp[.+1]; Norm3: T _ RSh[SB[fpFracL],15], At[x,0]; SB[fpFracL] _ LSh[SB[fpFracL],3], Task; SB[fpFracH] _ (LSh[SB[fpFracH],3]) or T; SB[fpExpSign] _ SB[fpExpSign] - (400C), GoTo[LongNorm]; Norm2: T _ RSh[SB[fpFracL],16], At[x,1]; SB[fpFracL] _ LSh[SB[fpFracL],2]; SB[fpFracH] _ (LSh[SB[fpFracH],2]) or T; SB[fpExpSign] _ SB[fpExpSign] - (200C), GoTo[PackL]; NormGo: SB[fpFracL] _ SB[fpFracL] SALUFOP T, Skip, At[x,2]; SB[fpFracL] _ SB[fpFracL] SALUFOP T, At[x,3]; SB[fpFracH] _ SB[fpFracH] SALUFOP T, UseCOutAsCIn; PackL: T _ LSh[SB[fpFracH],10], GoTo[Pack]; NormStep: SB[fpExpSign] _ SB[fpExpSign] - (200C), GoTo[NormGo]; % %NegFrac performs a 2's complement negation on SB[fpFracH],,SB[fpFracL] and returns the new SB[fpFracL] in T. Timing: 6 cycles if SB[fpFracL] is 0, else 7 cycles. % NegFrac: T _ SB[fpFracL]; NegFrac1: SB[fpFracH] _ SB[fpFracH] xnor (0C), Skip[ALU=0]; T _ SB[fpFracL] _ (Zero) - T, Return; SB[fpFracH] _ SB[fpFracH] + 1, Return; %RoundCard rounds SB[fpFracH],,T using MNBR[0] as rounding bit and MNBR[1:17b] as sticky bits; T is unchanged if no rounding occurs, or contains the new SB[fpFracL] if rounding occurs. Timing: 4 if exact, 12 if sticky bits .ne. 0, 13 if sticky bits .eq. 0. % RoundCard: LU _ (MNBR) SALUFOP T, Skip[ALU#0]; Return; *Exact *Round up if halfway and sticky bits non-0. T _ (RZero) + T, UseCOutAsCIn, Skip[ALU=0]; SB[fpFracH] _ SB[fpFracH] + 1, UseCOutAsCIn, GoTo[CInexact]; *Sticky bits all 0, rounding bit 1: round up if odd, down if even, but just *added 1, so result is correct if SB[fpFracL] now even, else 1 too big. SB[fpFracH] _ SB[fpFracH] + 1, UseCOutAsCIn; T _ (Form-2[AllOnes]) and T; CInexact: fpSticky _ (fpSticky) or (1C), DblGoTo[InexactTrapA,CInexactOK,R<0]; %Unnorm right-shifts the fraction DB[fpFracH/L] by fpTemp[1:8] positions with underflow going into MNBR. MNBR[1:15] are sticky bits. Unnorm is called on FAdd, FSub, Fix, FixI, FixC, Round, RoundI, and RoundC. On FAdd and FSub, it right-shifts the fraction of the number with the smaller exponent by an amount equal to the difference in the exponents, so that the fractions can be added. On Fix or Round, it shifts the fraction so that its exponent can be made 235b, which positions the decimal point to the right of fpFracL bit 15d. Use of MNBR to capture underflow and the 37b maximum shift is needed for Fix/Round, which use MNBR[0] for rounding and MNBR[1:15d] as sticky bits. FAdd/FSub worst case is when signs differ and subtraction of fractions necessitates two normalization steps afterwards; in this case DB[fpFracL] bit 10d will become the rounding bit, so a maximum shift of 31b suffices, MNBR is not needed, and underflowing bits can result in OR'ing into any of DB[fpFracL] bits 11d to 15d. Timing: 8 cycles if shift count .ge. 40b (i.e., the leading fraction bit in DB[fpFracH] bit 1 is shifted further than bit 0 of MNBR); 14 if count is 0; 25 if count is 1 to 17b; 16 if count is 20b; or 27 if count is 21b to 37b. % Unnorm: LU _ (fpTemp) - (10000C); T _ (LdF[fpTemp,5,4]) - 1, GoTo[.+3,ALU<0]; *More than two words--any value .ls. 40000b ok here MNBR _ R400, BBFBX; T _ DB[fpFracH] _ 0C, Return; fpCCTemp _ T, BBFBX, GoTo[Unnorm0,ALU<0]; CycleControl _ fpCCTemp; *CycleControl _ RSCount - 1, so MWX _ RSCount - 1 and DBX _ 0. For WFA, this *will left-cycle by 17b - DBX - MWX (= 20b - RSCount) and mask with 1's in *bits DBX to MWX (= 0..(RSCount - 1)). fpCCTemp _ 16C; Dispatch[fpTemp,4,1]; fpCCTemp _ (fpCCTemp) - T, LoadPage[fpPage3], Disp[.+1]; MNBR _ WFA[DB[fpFracL]], GoToP[.+1], At[UnnormDisp,0]; *1 to 17b OnPage[fpPage3]; T _ WFA[DB[fpFracH]]; *CycleControl _ 16b - (RSCount mod 20b - 1), knowing RSCount is non-zero, or *DBX _ 0 and MWX _ 17b - (RSCount mod 20b). For RF, this left-cycles by *MWX+DBX+1 (= 20b - RSCount) and masks with 1's in bits 17b-MWX to 17b *(= RSCount mod 20b to 17b). CycleControl _ fpCCTemp; T _ (RF[DB[fpFracL]]) or T; DB[fpFracH] _ RF[DB[fpFracH]], Return; *21b to 37b LU _ DB[fpFracL] - 1, LoadPage[fpPage0], GoToP[.+1], At[UnnormDisp,1]; OnPage[fpPage3]; * +1 always .eq. or (1C) here T _ (WFA[DB[fpFracH]]) + 1, UseCOutAsCIn, GoToP[.+1]; OnPage[fpPage0]; CycleControl _ fpCCTemp, fpCCTemp _ T, NoRegILockOK; MNBR _ fpCCTemp; T _ RF[DB[fpFracH]]; Unnorm1: DB[fpFracH] _ 0C, Return; Unnorm0: Dispatch[fpTemp,4,1]; T _ MNBR _ DB[fpFracL], Disp[.+1]; MNBR _ RZero, Return, DispTable[2]; *No shift T _ DB[fpFracH], GoTo[Unnorm1]; *Shift exactly 1 word %Arrive at OverUnderflow for exponents 401b to 777b with the exponent field offset by -1. Exponents of 0 or of 777b down to 577b represent underflow; 377b to 576b represent overflow. Overflow always traps. Gradual denormalization is not handled in microcode and traps. A microcode implementation should sensibly handle not only denormalized results but also denormalized arguments. For an underflowing result, microcode would first right-shift the fraction one position because the implicit leading 1 isn't there with a zero exponent. Then it would right-shift the fraction while counting the exponent up to zero. Rounding would take place after shifting, and the special case in which rounding normalizes the number would be dealt with. Unpacking would normalize the fraction while counting the exponent to a negative value. % OverUnderflow: LU _ SB[fpExpSign] - (137400C); *Distinguish overflow and underflow LU _ (fpSticky) and (20000C), Skip[ALU<0]; DblGoTo[Pack0,FloatTrap,ALU#0]; *Underflow--check mode LoadPage[opPage0], GoTo[fpTrapExit]; *Overflow Overflow: *For exponent .eq. 377b or 400b LoadPage[opPage0], GoTo[fpTrapExit]; InexactTrap: LoadPage[opPage0], GoTo[fpTrapExit]; FixTrapA: LoadPage[opPage0], GoTo[fpTrapExit]; FixTrapB: LoadPage[opPage0], GoTo[fpTrapExit]; FixTrap: LoadPage[opPage0], GoTo[fpTrapExit]; NaNTrap: LoadPage[opPage0], GoTo[fpTrapExit]; InexactTrapA: DenormTrap: LoadPage[opPage0], GoTo[fpTrapExit]; FloatTrap: LoadPage[opPage0]; *Even placement; paired with mi at Pack0. fpTrapExit: RTemp _ (RTemp) + (Sub[EscTrapOffset!,SDOffset!]C), GoToP[P4Trap]; %Addition and subtraction are almost the same. 1) Unpack the arguments, negate Arg2 if FSub, check for zeroes. 2) Point DB at the argument with the smaller exponent, SB at the other, and call Unnorm to right-shift the DB argument by the difference in the exponents. 3) Add or subtract magnitudes, depending on the signs. 5) Renormalize the result, round, and push it on the stack. Get here after T _ fpFrac1H; SB selects arg1. Timing @FAdd to Renorm: 24 + (2 if FSub) + (5 if Exp1 .ls. Exp2) + (7 if subtraction produces negative fraction) + (1 if Unnorm produces ones in MNBR). Add Unnorm time to this. For those cases which enter Renorm at NormGo, 4 cycles have been subtracted. Subtraction can produce a negative result only when Exp1 .eq. Exp2; Unnorm produces a sticky bit only for exponent differences larger than 7. 0 or 1 NormSteps will be performed after addition; 1 or 2 NormSteps will be performed after subtraction usually; but up to 24d NormSteps might be performed if 0.5 < arg1/arg2 < 2.0. % @FSub: fpExpSign2 _ (fpExpSign2) xor (1C), At[UnpackRet,1]; *Complement sign @FAdd: LU _ (fpFrac2H) and T, At[UnpackRet,0]; *Simultaneously compute the exponent difference in bits 0:8 while XOR'ing *the signs in bit 15d and setting up a 3 in bits 10:11d for SB_ and DB_. *The "1" in "61C" here prevents sign subtraction from propagating higher. T _ (fpExpSign2) - (61C), GoTo[Add0,ALU=0]; T _ (fpExpSign1) - T, Call[FAddA]; *Return here after Unnorm. Or the Sticky bit into DB[fpFracL] bit 15 and test *cycle the (Sign1 xor Sign2) bit into the ALU sign for testing. LU _ MNBR; LU _ RCy[fpTemp,1], Skip[ALU=0]; T _ (RSh[AllOnes,17]) or T, DblGoTo[AddFrac,SubFrac,ALU<0]; DblGoTo[AddFrac,SubFrac,ALU<0]; *fpTemp _ [(Exp1-Exp2) lsh 7] + 61b + S1 - S2 FAddA: fpTemp _ T, Skip[ALU<0]; *Exp1 .ge. Exp2: Point SB at arg1 (it already does), DB at arg2. DB _ fpTemp, GoTo[Unnorm]; *Exp1 .ls. Exp2: Point SB at arg2, DB at arg1; negate the count. T _ 157C; *fpTemp _ [(Exp1-Exp2)' lsh 7] + 41b + S1 - S2; SB _ 3. fpTemp _ (SB _ fpTemp) xnor T; *fpTemp _ [(Exp2-Exp1) lsh 7] + 21b + S1 - S2; DB _ 2. fpTemp _ (DB _ fpTemp) + T + 1, GoTo[Unnorm]; *The signs are equal; add the fractions. DB[fpFracL] is in T. AddFrac: SB[fpFracL] _ SB[fpFracL] + T, Task; T _ DB[fpFracH] + 1, UseCOutAsCIn; SB[fpFracH] _ SB[fpFracH] + T, GoTo[Renorm]; *The signs are different; subtract the fractions. DB[fpFracL] is in T. SubFrac: SB[fpFracL] _ SB[fpFracL] - T; T _ DB[fpFracH], FreezeResult; SB[fpFracH] _ T _ (SB[fpFracH]) - T, UseCOutAsCIn; *13 cycles since last task here. LU _ SB[fpFracL] or T, GoTo[SubNeg,ALU<0]; SubNonNeg: SB[fpExpSign] _ SB[fpExpSign] - (200C), GoTo[LongNormGo,ALU#0]; T _ Stack&+1 _ 0C, GoTo[FlPush]; *Get here only when the exponents are equal and the DB fraction is larger *than the SB fraction. 2 to 24d NormSteps will be performed. SubNeg: T _ SB[fpFracL], Call[NegFrac1]; *Odd placement *Simultaneously complement the sign and subtract 1 from the exponent. SB[fpExpSign] _ SB[fpExpSign] - (177C), GoTo[LongNormGo]; %Multiply. 1) Unpack arguments leaving normalized fractions in fpFracH[1:15],,fpFracL[0:8]. 2) Check for zero arguments. GoTo Mul0 if either arg denormalized; since Unpack has trapped for any non-0 denormalized arg, the "and" at @FMul is always non-0 for both args non-0. 3) Set sign and exponent of Frac1 to Exp1+Exp2-177b+1-1, where -177b compensates for the excess in Exp1+Exp2, +1 for the NormStep normally required, and -1 for the exponent offset in Renorm; altogether this is Exp1+Exp2-177b. Adding fpExpSign1 to fpExpSign2 also XOR's the sign bit for the result. 4) Shift fpFrac2 so non-0 bits are in fpFrac2H[8:15],,fpFrac2L[0:15]; then copy fpFrac2L into MNBR for the inner loops; fpFrac2L becomes the low-order product word fpFrac1LL. 5) Left-shift fpFrac1H/L 2 positions and add fpFrac2H to fpFrac1L; the leading multiplier bit is known to be 1, so the inner loop is entered with the multiplier lsh 1 in fpFrac1H/L/LL[0:25b] and the multiplicand in fpFrac1H/L/LL[30:57b]. 6) Perform the shift and add loop 23d times, since the 1st of 24d steps was done before starting the loop. Initially, two 0 bits lie between the low-order multiplier and high-order product bits. Each step shifts the multiplier and product in fpFrac1H/L/LL and conditionally adds the multiplicand in fpFrac2H/MNBR. Carries may make the 2nd of the two 0's between the multiplier and product into a 1, but the 1st 0 will remain 0. 7) Do 0 or 1 NormSteps at Renorm. Timing @FMul to Renorm for non-0 args: (36 or 39) + 8*(10/zero or 19/one) + 15*(6/zero, [11 or 14]/one). Total averages 293.75 cycles. % @FMul: LU _ (fpFrac2H) and T, At[UnpackRet,2]; T _ (fpExpSign2) - (37400C), GoTo[Mul0,ALU=0]; fpExpSign1 _ (fpExpSign1) + T, LoadPage[fpPage4]; T _ RSh[fpFrac1L,16]; OnPage[fpPage4]; fpFrac1H _ (LSh[fpFrac1H,2]) or T, Task; T _ LSh[fpFrac2H,11]; fpFrac1LL _ (RSh[fpFrac2L,7]) or T; *For that 1st multiplier 1 T _ fpFrac2H _ RSh[fpFrac2H,7], Task; fpFrac1L _ (LSh[fpFrac1L,2]) or T; fpOne _ 1C; LU _ (MNBR _ fpFrac2L) SALUFOP T; T _ (fpFrac2H) SALUFOP T, UseCOutAsCIn; fpFrac2HLSh1 _ T, LoadPage[fpPage2]; fpFrac2HLSh1p1 _ (Zero) + T + 1; OnPage[fpPage2]; fpCount _ 6C, Call[FMNoA1]; *Total of 5 + 3*fpCount iterations %With this unrolled loop, the exterior step runs in 10 + 9/add cycles and interior steps in 6 + (5 or 8)/add cycles; the interior step can be replicated 0 to 4 times, trading space for speed. Each replication costs 10d mi each (+5 mi for initializing the three registers needed). fpCount and entry point have to be adjusted. The 1st fast loop saves an average of 2.625/cycles bit; the 2nd saves an additional .875 cycles/bit; 34d saves .4375 cycles/bit, etc. I haven't found any way of making the slow exterior loop look more like the fast one and still get in the required tasking. % fpCount _ (fpCount) - 1, GoTo[MulDone,R<0]; fpFrac1LL _ (fpFrac1LL) SALUFOP T; fpFrac1L _ (fpFrac1L) SALUFOP T, UseCOutAsCIn; fpFrac1H _ (fpFrac1H) SALUFOP T, UseCOutAsCIn, GoTo[FMAd1,R<0]; FMNoA1: fpFrac1LL _ (fpFrac1LL) SALUFOP T; fpFrac1L _ (fpFrac1L) SALUFOP T, UseCOutAsCIn; FMAdS2: fpFrac1H _ (fpFrac1H) SALUFOP T, UseCOutAsCIn, GoTo[FMAd2,R<0]; FMNoA2: fpFrac1LL _ (fpFrac1LL) SALUFOP T; fpFrac1L _ (fpFrac1L) SALUFOP T, UseCOutAsCIn; FMAdS3: *Additional loop replications go here fpFrac1H _ (fpFrac1H) SALUFOP T, UseCOutAsCIn, DblGoTo[FMAd3,FMNoA3,R<0]; FMAd1: T _ (MNBR) SALUFOP T; fpFrac1LL _ (LSh[fpFrac1LL,1]) + T, Skip[R<0]; T _ (fpFrac2HLSh1) + 1, UseCOutAsCIn, Skip; T _ (fpFrac2HLSh1p1) + 1, UseCOutAsCIn; fpFrac1L _ (LSh[fpFrac1L,1]) + T, GoTo[FMAdS2,R>=0]; **Carry is impossible for the 1st 7 iterations of the loop. T _ (fpOne) + 1, UseCOutAsCIn; fpFrac1H _ (LSh[fpFrac1H,1]) + T, GoTo[FMNoA2,R>=0]; FMAd2: T _ (MNBR) SALUFOP T; fpFrac1LL _ (LSh[fpFrac1LL,1]) + T, Skip[R<0]; T _ (fpFrac2HLSh1) + 1, UseCOutAsCIn, Skip; T _ (fpFrac2HLSh1p1) + 1, UseCOutAsCIn; fpFrac1L _ (LSh[fpFrac1L,1]) + T, GoTo[FMAdS3,R>=0]; T _ (fpOne) + 1, UseCOutAsCIn; fpFrac1H _ (LSh[fpFrac1H,1]) + T, GoTo[FMNoA3,R>=0]; FMAd3: *Additional loop replications go here T _ MNBR; fpFrac1LL _ (fpFrac1LL) + T; T _ fpFrac2H, FreezeResult; fpFrac1L _ (fpFrac1L) + T, UseCOutAsCIn; FMNoA3: fpFrac1H _ (fpFrac1H) + 1, UseCOutAsCIn, Return; *If fpFrac2L ne 0, set the fpFracL1[15d] to control rounding. MulDone: LU _ (fpFrac1LL) - 1, LoadPage[fpPage0]; *7 cycles from last task to Renorm. fpExpSign1 _ (fpExpSign1) - (200C), GoToP[Renorm,Carry']; OnPage[fpPage0]; *10 cycles from last task to Renorm. fpFrac1L _ (fpFrac1L) or (1C), GoTo[Renorm]; %Division. Arg1 is dividend, arg2 divisor; quotient in MNBR/fpTFrac. 1) Unpack arguments. 2) Trap if divisor is 0; else return 0 quotient if dividend is 0. 3) Copy fpFrac1L into fpTFrac so that fpFrac1L can accumulate the quotient. 4) Set sign and exponent of the result: Exp1-Exp2+177b-1 is the exponent if the initial subtract succeeds, where 177b supplies the offset and -1 compensates for the +1 in Renorm; if the initial subtract fails, an additional -1 compensates for the extra quotient bit generated. The subtraction of the exponents also xor's the signs, but it is necessary to offset fpExpSign1 by 2 to ensure that if arg1 is positive and arg2 negative, the carry from subtracting the signs won't affect the exponent calculation. 5) Compute the variations on fpFrac2H needed for the inner loop. 6) Initialize fpFrac1L to 1. On the 1st iteration, a 1 will be produced if dividend fraction is larger than divisor fraction; if so, 25d iterations produce a 24d bit result and an extra bit for rounding. If the 1st iteration produces a 0, a 26th iteration is required, and fpFrac1L is reinitialized to 1. The 1st inner loop exit occurs when the leading 1 has migrated out of the sign bit; the 16d bits in fpFrac1L are copied into MNBR, and fpFrac1L is reinitialized to 200b; the inner loop is then repeated 9d more times. After the 2nd inner loop exit, 25d quotient bits are in MNBR/fpFrac1L; the remainder serves as a sticky bit. The 1st inner loop exit is distinguished from the 2nd by the sign of fpFrac1L. 7) The shift and add-or-subtract loop is primed with an initial subtraction; then each step combines shifting the dividend with adding or subtracting the divisor according to the quotient bit produced by the previous step. 8) Undo the last subtraction if the right-most result bit is a 1, so that the remainder will correctly be 0 or non-0 for rounding. 9) Copy MNBR into fpFrac1H and OR into fpFrac1L for rounding if remainder non-0. 10) No normalization is required; round and push result. Timing @FDiv to Renorm: (52 or 53) + 24*(10 to 12) + (12 or 13 if dividend < divisor) + (3 if 25th bit is 0) + (3 if final remainder .ne. 0). % @FDiv: LU _ (fpFrac2H) and T, At[UnpackRet,3]; *Go if either arg denormalized, since denormalized implies 0 because Unpack *will have trapped for any non-zero denormalized argument. T _ (fpFrac2H) xnor (0C), Skip[ALU#0]; T _ fpExpSign2, GoTo[Div0]; fpFrac2H' _ T; fpFrac2Hp1 _ (Zero) - T; fpFrac2H'p1 _ (Zero) + T + 1; *Have to open-code the first subtraction. T _ fpFrac2L, LoadPage[fpPage1]; T _ (fpFrac1L) - T; OnPage[fpPage1]; fpTFrac _ T, FreezeResult, Task; T _ (fpFrac2H') + 1, UseCOutAsCIn; fpFrac1H _ (fpFrac1H) + T, Call[DivStart]; *Loop here 10 to 12 cycles fpFrac1H _ (LSh[fpFrac1H,1]) + T; *Shift in quotient bit fpFrac1L _ (fpFrac1L) SALUFOP T, UseCOutAsCIn, GoTo[DivSub,Carry]; *Shift the remainder left 1 bit and add the divisor. DivAdd: T _ fpFrac2L, FreezeResult, GoTo[DivAddLast,Carry]; DivAdR: fpTFrac _ (LSh[fpTFrac,1]) + T, Skip[R<0]; T _ (fpFrac2H) + 1, UseCOutAsCIn, Return; T _ (fpFrac2Hp1) + 1, UseCOutAsCIn, Return; *Shift the remainder left 1 bit and subtract the divisor. DivSub: T _ fpFrac2L, FreezeResult, GoTo[DivSubLast,Carry]; DivSbR: fpTFrac _ (LSh[fpTFrac,1]) - T, Skip[R<0]; T _ (fpFrac2H') + 1, UseCOutAsCIn, Return; T _ (fpFrac2H'p1) + 1, UseCOutAsCIn, Return; *Initialize fpFrac1L so the sign will become 1 on the final iteration; *a 1 will then be shifted into the sign for a normalized result. DivStart: T _ fpExpSign2 _ (fpExpSign2) - (37400C), GoTo[DivExtraStep,Carry']; fpExpSign1 _ (fpExpSign1) - T; fpFrac1L _ 3C, GoTo[DivSub]; *Initial subtract succeeded *Initial subtract failed--extra iteration and exponent is one smaller. DivExtraStep: T _ (fpExpSign2) + (200C); fpExpSign1 _ (fpExpSign1) - T; fpFrac1L _ 1C, GoTo[DivAdd]; *Get to DivAddLast or DivSubLast once after the result bits for the *high quotient word are accumulated and again after the final iteration. DivAddLast: fpExpSign1 _ (fpExpSign1) + 1, DblGoTo[FDivX,.+2,ALU>=0]; DivSubLast: fpExpSign1 _ (fpExpSign1) + 1, GoTo[FDivX,ALU>=0]; MNBR _ fpFrac1L, Skip[R Odd]; fpFrac1L _ 200C, GoTo[DivAdR]; fpFrac1L _ 200C, GoTo[DivSbR]; *Really done; position the fraction for normalization and rounding. FDivX: fpFrac1L _ LSh[fpFrac1L,7], Skip[R Even]; T _ fpFrac1H, LoadPage[fpPage0], GoTo[DivMore1]; fpTFrac _ (fpTFrac) + T; T _ (fpFrac2H) + 1, UseCOutAsCIn; T _ (fpFrac1H) + T, LoadPage[fpPage0]; DivMore1: LU _ (fpTFrac) or T; OnPage[fpPage0]; T _ MNBR, Skip[ALU=0]; fpFrac1L _ (fpFrac1L) or (1C); *20 to 27 cycles from last task to Renorm. fpFrac1H _ T, GoTo[Renorm]; %Square root. Conceptually, 25d iteration steps produce 24 data bits + 1 extra bit for rounding; fpRhi/RLo and FracH/L form a single register left-shifted 2 each step; fpRhi/RLo, called R, are initially 0. The square root Q accumulates in fpFrac2L/H (initially 0). An iteration is as follows: If 4*R - 4*Q - 1 >= 0, then R _ 4*R - 4*Q - 1 and Q _ 2*Q + 1, else R _ 4*R; Q _ 2*Q. or alternatively as follows (with Q multiplied by 2): Q _ 2*Q; R _ 4*R if R - Q - 1 >= 0 do [ R _ R - Q - 1; Q _ Q + 2 ] The exact algorithm is as follows: 1) Unpack1 leaves SB pointing at Frac2 and a normalized fraction in Frac2[1:24]. With the decimal point between bits 1 and 2, Exp2 is biased by 177b and 1 <= Frac2 < 2. 2) Return 0 if Frac2H is 0. 3) Trap if the sign is 1 (imaginary result). 4) Conditionally left-shift Frac 1 bit if Exp2 is even and then compute the exponent as Exp2 _ (Exp2-177b)/2 + 177b = (Exp2+177b)/2; after this 1 <= Frac < 4 and 1 <= SqRt[Frac] < 2. **ACTUALLY Exp2 _ (Exp2+176b)/2-1/2 5) The first Q bit is known to be 1; this step is done by hand while initializing registers. 6) 7 steps shift FracH only into fpRlo and build Q only in fpFrac2L; since Q is multiplied by 2 throughout, this iteration terminates when the leading 1 in Q is in bit 7. 7) 6 steps shift FracL only into fpRlo and build Q only in fpFrac2L; the final step shifts in a zero; this iteration terminates at the onset of the step in which the leading 1 in fpFrac2L is shifted into bit 0. 8) 11d steps shift in zeroes, building R in both fpRhi/Rlo and Q in both fpFrac2H/2L; before starting this iteration both R and Q are left-shifted so that the final step will leave the leading 1 in Q in the sign bit of fpFrac2H. 9) 1 is OR'ed into fpFrac2L bit 15d if either fpFracLLL or fpFracLL is non-zero ('sticky' bit for rounding). 10) No normalization is required; round and push the result. Timing @FSqRt to Renorm: 32 + (3 if Frac lsh 1) + 7*(13/zero or 12/one) + 6*(12/zero or 11/one) + 11*(18 to 20/zero or 19 to 21/one) = 380 to 429 cycles. % *T has fpFrac2H just put through the ALU. @FSqRt: T _ fpExpSign2 _ (fpExpSign2) + (LShift[176,7]C), Skip[ALU#0], At[UnpackRet,13]; T _ Stack&+1 _ 0C, GoTo[PackA]; fpExpSign2 _ RCy[fpExpSign2,1], GoTo[FSq0,H2Bit8]; T _ fpFrac2L _ (fpFrac2L) SALUFOP T; fpFrac2H _ (fpFrac2H) SALUFOP T, UseCOutAsCIn, GoTo[FSq1]; *Here Frac2L is copied into Frac1L so that the root can wind up in Frac2. *Copying Frac2H into Frac1H is unnecessary because Frac2H is consumed before *the square root accumulating in Frac2L overflows. *1st iteration, done by hand while copying Frac2L to Frac1L, is: * Frac2L _ 2, FracRlo _ RSh[Frac2H,16] - 1, Frac2H _ LSh[Frac2H,2] FSq0: T _ fpFrac2L; FSq1: fpExpSign2 _ (fpExpSign2) - (LShift[1,6]C), Skip[R>=0]; LoadPage[opPage0], GoTo[fpTrapExit]; fpFrac1L _ T, LoadPage[fpPage5]; T _ (RSh[fpFrac2H,16]) - 1; OnPage[fpPage5]; fpRlo _ T; *Work with 2*Q throughout; initialize fpFrac2L with the 1st root bit lsh 1. T _ fpFrac2L _ 4C, Call[.+2]; *7 iterations of 1st loop exhaust Frac2H. T _ fpFrac2L _ (fpFrac2L) SALUFOP T, GoTo[FSq2,H2Bit8]; T _ (LdF[fpFrac2H,2,2]) - T - 1; *Odd placement fpRlo _ (LSh[fpRlo,2]) + T; FSq0X: fpFrac2H _ LSh[fpFrac2H,2], GoTo[.+3,Carry]; FSqLpa: T _ fpFrac2L; fpRLo _ (fpRlo) + T + 1, Return; FSqLpb: T _ fpFrac2L _ (fpFrac2L) + (2C), Return; *7 iterations of 2nd loop exhaust fpFrac2L producing a 15d-bit result. FSq2: T _ (LdF[fpFrac2H,2,2]) - T - 1; *Even placement fpRlo _ (LSh[fpRlo,2]) + T, Call[FSq0X]; T _ fpFrac2L _ (fpFrac2L) SALUFOP T; T _ (RSh[fpFrac1L,16]) - T - 1, GoTo[FSq3,ALU<0]; fpRlo _ (LSh[fpRlo,2]) + T; fpFrac1L _ LSh[fpFrac1L,2], DblGoTo[FSqLpb,FSqLpa,Carry]; *The leading 1 in fpFrac2L was just left-shifted into bit 0 and 14d result *bits are in fpFrac2L. Now left-shift both Frac2 and R so termination will *occur after 11d iterations with the leading 1 in fpFrac2H bit 0. The initial *shift also accomplishes the shift of Frac2 and R for the next step, so Frac2 *is shifted by 6 (for a total shift of 7 since we just shifted Frac2 by 1) and *R is shifted by 8d (one more than the Q shift). FSq3: T _ RSh[fpRLo,10]; fpRHi _ T, LoadPage[fpPage6]; T _ RSh[fpFrac2L,12]; OnPage[fpPage6]; fpFrac2H _ T, LoadPage[fpPage7]; T _ fpFrac2L _ LSh[fpFrac2L,6]; OnPage[fpPage7]; fpRLo _ (LSh[fpRLo,10]) - T - 1, Call[FSq8]; *Rhi _ 4*Rhi + 2 bits from Rlo T _ RSh[fpRlo,16]; fpRhi _ (LSh[fpRhi,2]) + T; *Q _ 2*Q; Rlo _ 4*Rlo - Qlo - 100b; Rhi _ Rhi + Qhi' + Carry T _ fpFrac2L _ (fpFrac2L) SALUFOP T; fpRlo _ (LSh[fpRlo,2]) - T - 1, Skip[Carry]; T _ fpFrac2H _ LSh[fpFrac2H,1], DblGoTo[FSq4,FSq5,Carry]; T _ fpFrac2H _ (LSh[fpFrac2H,1]) + 1, DblGoTo[FSq4,FSq5,Carry]; FSq4: T _ (fpRhi) - T, DblGoTo[FSq9,FSq7,ALU<0]; FSq5: T _ (fpRhi) - T - 1, GoTo[FSq7,ALU>=0]; FSq9: FreezeResult, Call[FSq7]; *OR sticky bits into fpFrac2L bit 15d. T _ (fpRhi) or T, LoadPage[fpPage0]; LU _ (AllOnes) + T; OnPage[fpPage0]; fpFrac2L _ (fpFrac2L) + 1, UseCOutAsCIn, GoTo[Renorm]; OnPage[fpPage7]; FSq7: fpFrac2L _ (fpFrac2L) + (200C), GoTo[FSq6,Carry]; T _ fpFrac2L _ (fpFrac2L) - (200C); T _ fpRlo _ (fpRlo) + T + 1, Return; FSq6: fpRhi _ T; T _ fpRlo _ (fpRlo) and not (77C), Return; FSq8: T _ fpFrac2H, DblGoTo[FSq4,FSq5,Carry]; %Floating scale. 2OS,,3OS are a floating point number whose exponent is scaled by TOS, and integer such that -202b <= TOS <= 200b. (Larger values reverse overflow and underflow reporting, so overflow could substitute 0 rather than trapping in some cases.). The opcode definition requires that FSc work only for -200b <= TOS <= 200b. % @FSc: T _ LSh[fpFrac1H,7], Skip[ALU#0], At[UnpackRet,14]; T _ Stack&+1 _ 0C, GoTo[PackA]; fpExpSign2 _ (fpExpSign2) + T, GoTo[NormGo]; *ZERO ARGUMENTS. %FAdd/FSub: Get to Add0 knowing that (fpFrac1H) and (fpFrac2H) .eq. 0; since both args were either normalized or zero if passed by Unpack, passed by one/both fractions must be zero. If just one is 0, return the other. If both are 0, return 0 (negative 0 if both args are negative 0). SB points at arg1. % Add0: LU _ fpFrac1H; LU _ fpFrac2H, Skip[ALU=0]; Stack&+2, GoTo[NormReturn]; *fpFrac1H .ne. 0, so fpFrac2H .eq. 0 T _ fpExpSign2, Skip[ALU#0]; *fpFrac1H and fpFrac2H both 0. fpExpSign1 _ (fpExpSign1) and T, GoTo[Pack0]; *The following would work except that on FSub the sign must be complemented: * Stack&+3; * Stack&-2 _ Stack&-2; * Stack&+3; * Stack&-2 _ Stack&-2, GoTo[NormReturn]; fpTemp _ 60C; *fpFrac2H .ne. 0 SB _ fpTemp; BBFBX, GoTo[Renorm]; %Divide. Trap if divisor (fpFrac2) is 0; else dividend (Frac1) is 0, so return 0 with sign the xor of the dividend and divisor signs. Since Unpack has already trapped for all denormalized args except zeroes, the checks for 0 look at fpFracH bit 1. T contains fpExpSign2. Multiply. Return 0 with sign equal to the xor of the multiplier and multiplicand signs. T contains fpExpSign2 - 40000b. % Div0: LU _ (fpFrac2H) and (40000C); Mul0: fpExpSign1 _ (fpExpSign1) xor T, GoTo[FloatTrap,ALU=0]; Pack0: T _ Stack&+1 _ 0C, GoTo[PackA]; %Floating Comparison returns INTEGER -1 if arg1 .ls. arg2, 0 if arg1 .eq. arg2, or 1 if arg1 .gr. arg2. Timing @FComp to completion: 15 or 16 cycles if .eq., 27 to 30 cycles if .ne. **Unpacking apparently serves only to eliminate not-a-numbers and denormalized **numbers from testing. % @FComp: T _ LSh[fpExpSign1,17], Task, At[UnpackRet,4]; Stack&+3; LU _ (LSh[fpExpSign2,17]) xor T; T _ Stack&-2, GoTo[CompDiffSign,ALU<0]; LU _ (Stack&+3) - T; *LU _ arg1Lo - arg2Lo T _ Stack&-2, FreezeResult; *T _ arg1Hi - arg2Hi T _ (Stack&-1) - T, UseCOutAsCIn, GoTo[CompNon0Test,ALU#0]; Comp0Test: Stack _ T, FreezeResult, GoTo[NormReturn,ALU=0]; fpExpSign2 _ (fpExpSign2) + 1, DblGoTo[.+3,.+2,Carry']; CompNon0Test: fpExpSign2 _ (fpExpSign2) + 1, Skip[Carry']; fpExpSign1, DblGoTo[CompL,CompG,R Odd]; fpExpSign2, DblGoTo[CompL,CompG,R Odd]; CompG: Stack _ 1C, GoTo[NormReturn]; CompL: Stack _ (Zero) - 1, GoTo[NormReturn]; NormReturn: LU _ NextInst[IBuf]; NIRet; *If the signs differ, it is nevertheless possible that both fractions are 0 *(positive and negative 0). When the signs differ, we don't care which way *the Carry branch condition goes at Comp0Test+1 (???). CompDiffSign: T _ fpFrac1H; T _ (fpFrac2H) or T, GoTo[Comp0Test]; %Float LONG INTEGER to REAL 1) Copy the argument into Frac1 2) Check sign, and negate Frac1 if negative. 3) Avoid 16d of NormSteps if 1st word is 0; special case entirely 0. 4) Set exponent and Renormalize. Timing Float to Renorm: 21 cycles if positive, 28 if negative; +4 if 1st word insignificant. % Float: T _ Stack&-1, SALUF _ T; fpFrac1H _ 40C, Task; SB _ fpFrac1H, fpFrac1H _ T, NoRegILockOK; fpExpSign1 _ T, BBFBX; *Start building sign T _ Stack&-1, Call[FloatA]; *Have fpFrac1L in T from either FloatA or NegFrac. FloatB: fpExpSign1 _ (fpExpSign1) - (200C); LU_ fpFrac1H; fpExpSign1 _ (fpExpSign1) + (43400C), Skip[ALU=0]; fpExpSign1 _ (fpExpSign1) + (4000C), GoTo[LongRenorm]; fpFrac1H _ T; fpFrac1L _ 0C, Skip[ALU=0]; LongRenorm: T _ LSh[SB[fpFracH],10], DblGoTo[Pack,NormStep,R<0]; T _ Stack&+1 _ 0C, GoTo[FlPush]; FloatA: fpFrac1L _ T; fpExpSign1 _ RSh[fpExpSign1,17], DblGoTo[NegFrac,FloatB,R<0]; %Fix or Round REAL to LONG INTEGER, INTEGER, or CARDINAL. 1) Unpack the argument. 2) Unnormalize Frac1 so that the decimal point is positioned to the right of fpFrac1L bit 15d. 3) Round, if appropriate (controlled by fpCode) 4) Negate the fraction if sign is negative. 5) Several things depending on the result type (controlled by fpCode) LONG INTEGER: Push double-precision result. INTEGER: Trap if positive or negative number is larger than 15d bits, else push single-precision result. CARDINAL: Trap if fraction word 0 is non-zero, else push single-precision result. Timing from @Fix to exit (not including Unnorm): Fix 21 if 0, 23 if .gr. 0, 29 if .ls 0; FixC 19 if 0, 21 if .gr. 0, 27 if .ls 0; FixI 24 if 0, 26 if .gr. 0, 33 if .ls. 0; Add 2 + RoundCard time for the analogous Round operations. % *Computation is 236b - Exp1 (the low order +1 prevents sign subtraction *from propagating into the exponent field and inverts the branch condition). *GoTo FixTrap if exponent .gr. 235b; in this case the number is .ge. 2^31d *(31d bits holds all numbers .ls. 2^31. FixRA: SB[fpExpSign] _ T _ SB[fpExpSign] - (200C); LU _ SB[fpFracH], GoTo[FixTrap,ALU>=0]; fpTemp _ (Zero) - T, GoTo[Unnorm,ALU#0]; T _ SB[fpFracL], GoTo[FrPos]; *Fraction is 0 @Fix: SB[fpExpSign] _ SB[fpExpSign] - (47000C) - 1, Call[FixRA], At[UnpackRet,6]; *SB[fpFracL] is in T here. SB[fpExpSign], DblGoTo[FRPos,FRNeg,R Odd]; @Round: SB[fpExpSign] _ SB[fpExpSign] - (47000C) - 1, Call[FixRA], At[UnpackRet,10]; *SB[fpFracL] is in T here. LU _ MNBR, Call[RoundCard]; *SB[fpFracL] is still in T. SB[fpExpSign], Skip[R Odd]; FRNeg: LU _ T, Call[NegFrac1]; FRPos: Dispatch[fpCode,12,2]; LU _ SB[fpFracH], Disp[.+1]; *Cardinal GoTo[FixTrapA,ALU#0], At[FixRndDisp,2]; FlPush: LU _ NextInst[IBuf]; Stack&+1 _ T, NIRet; *Integer SB[fpExpSign], FreezeResult, GoTo[.+3,R Even], At[FixRndDisp,1]; LU _ T, GoTo[FixTrapB,ALU#0]; DblGoTo[FixTrapA,FlPush,ALU<0]; LU _ SB[fpFracH] xnor (0C); LU _ (LSh[AllOnes,17]) xor T, DblGoTo[FixTrapB,.-2,ALU#0]; *Long integer Stack&+1 _ T, At[FixRndDisp,0]; T _ SB[fpFracH], GoTo[FlPush]; %FSticky reads and sets the InexactResultTrapEnable and InexactResult Bit. It swaps Stack and fpSticky. % FSticky: MNBR _ fpSticky, fpSticky _ T, NoRegILockOK; T _ MNBR, GoTo[FlPush]; PFetch4[PCB,IBuf,4], GoToExternal[MesaRefillLoc], At[LShift[fpPage0,10],377]; END[MesaFP]; :ELSE; ************************************************ TITLE[No.floating.point.microcode]; :ENDIF; *********************************************** e6(2048)\f5 14318f0 10f5 61f0 10f5 56f0 10f5 7621f0 10f5 83f0 10f5 42f0 10f5 38f0 10f5 38f0 10f5 37f0 10f5 37f0 10f5 54f0 10f5 76f0 10f5 15259f0 10f5