DragonFP.tioga

Floating Point Instructions

Several conditions have contributed toward a change in the floating point control scheme:
    A desire to reduce the size of the Instruction Decode PLA
    New multicycle floating point instructions (single and double precision divide)
    Newly available chained floating point operations
    Possible future instruction timing problems
    Opportunity to use the EU stack as a source of mode changes

The cost of the proposed changes will be the use of 2 more 3 byte opcodes, four floating point shadow registers in the EU, and some floating point flags added to the IFUStatus bits. In addition, when floating point instructions are active, it may be necessary to save/restore up to all four of the floating point shadow registers during process switching (just as carry and the field register must be saved/restored).

Although each floating point device (ALU and multiplier) has on-board double word registers for both the A and B operands, the proposed scheme only allows the A operand registers to be used to hold values between instructions. Interprocess integrity is preserved by maintaining shadow registers in the EU (FpAluM, FpAluL, FpMultM, FpMultL) and by keeping flags describing the way in which the registers for each device were last loaded (single, double, integer, clear). Clearing these flags indicates that the registers are no longer important and need not be maintained during process switches.

SFP alpha - Store FP - is used to load the A operand registers for either the floating point ALU or multiplier (as well as their EU shadows) from the EU Stack and to set the appropriate IFUStatus flags. SFP is also used to set the mode in the floating point devices.
    alpha[0]
        0 => ALU
        1 => Multiplier
    alpha[1]
        0 => Store A Operand
        1 => Execute Floating Point modeControlOperator
    alpha[6..7]
        3 => A operand is integer - stores one word from stack
        2 => A operand is double - stores two words from stack
        1 => A operand is single - stores one word from stack
        0 => A operand is clear - clears flags - indicates end of FP operations for the device

Floating Point Mode

Both the ALU and multiplier devices have 16 bits of mode state. This mode state must be changed 4 bits at a time using the SFP instruction, where the top of the EU stack contains a modeControlOperator as described below. The IFU register FPMode maintains a shadow copy of the mode state of both devices (16 multiplier mode bits at the high end). Currently, only one of the four nybls for each device need be manipulated by the 'user'. The following table describes the types of mode information:

    User mode:
        Floating point rounding    {nearest, zero, plusInfinity, minusInfinity}
        Fixed point rounding       {fpRounding, zero}
        Fast mode                  {Fast, IEEE}
    Mode fixed by implementation:
        Flowthrough Timing         => 1
        Accumulation Timing        ALU => 0, Multiplier => 1

modeControlOperator:
    If alpha[1]=1 then modeControlOperator is taken from stack
    modeControlOperator[ 0..23] = 8 (funny control bit)
    modeControlOperator[26..27] = modeNyblIndex
    modeControlOperator[28..31] = modeNybl

    modeNyblIndex=1
        modeNybl[0..1] = Mult/Div accumulation rate timer control
            00 => 1 cycle for ALU @ 100ns cycle
            01 => 2 cycles for Mult @ 100ns cycle
    modeNyblIndex=2
        modeNybl[0..2] = Flowthrough timer control
            001 => 2 cycles for both devices @ 100ns cycle
    modeNyblIndex=3=User Nybl
        modeNybl[0..1] = Floating point rounding mode
            00 Round toward nearest
            01 Round toward zero
            10 Round toward positive infinity
            11 Round toward negative infinity
        modeNybl[2] = Fixed point rounding mode
            0 Round according to Floating point rounding mode
            1 Round toward zero
        modeNybl[3] = Fast mode
            0 IEEE mode
            1 Flush denormalized operands and results to zero

For example, to set
the user nybl in the multiplier for
    Floating Point -> Round toward positive infinity,
    Fixed -> Round according to Floating point rounding mode,
    Fast mode -> IEEE
one would:
    LIDB 00111000 00001000    (remember [S+1] _ 0 0 beta alpha, S _ S+1)
    SFP  11000000

The shadow registers may be read or written without side effects using LIP and SIP.

FLIP/FLOP alpha beta - Floating point operation

FLIP => put result In device. FLOP => put result On stack.

The FLIP and FLOP instructions specify the FP operation. They also specify the type of B operand (always located on the EU stack) and the destination of the result.

The B operand can be:
    none
    single
    double

The result of the operation can go in one of three places:
    top of the EU Stack (FLOP)
    the A operand of the FP ALU (FLIP)
    the A operand of the FP multiplier (FLIP)

FLIP and FLOP instructions also contain a 6 bit field which allows the number of wait cycles to be specified.

FLIP/FLOP alpha beta
    alpha[0]
        0 => ALU operation
        1 => Multiplier operation
    alpha[1]
        0 => FLIP => put result In ALU, FLOP => put result On stack
        1 => FLIP => put result In Mult, FLOP => put result On stack
    alpha[2..7]
        6 bit function code
    beta[0..1]
        0 => no B operand
        1 => B operand is single
        2 => B operand is double
    beta[2..7] = 58 - # wait cycles
        This field helps to reduce the size of the decode PLA by precomputing an internal microcycle branch. So far there are exactly 4 values, depending on the function code:
            double divide   => 2
            single divide   => 31
            double multiply => 54
            otherwise       => 56

FPMaskStatus register

A 4 bit status code is returned at the end of each FLIP or FLOP instruction. This is ORed into the corresponding bit in the 16 bit status field in the FPMaskStatus register. Because the first 3 bits of the status field represent the result of compare operations, they are always cleared prior to this update. A floating point fault is generated if the bit corresponding to the status code in the 16 bit mask of the FPMaskStatus register is not set.
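The status update just described can be sketched in Python. This is only an illustrative model, not the hardware definition: the function name `update_status`, the use of 16-element lists indexed by status code, and the reading of "first 3 bits" as the bits for codes 0, 1 and 2 are all assumptions made for the example.

```python
# Hypothetical model of the FPMaskStatus update; names and layout are assumptions.
COMPARE_CODES = (0, 1, 2)  # assumed: status bits 0..2 hold the compare results


def update_status(status_bits, mask_bits, code):
    """Return (new_status_bits, fault) after an instruction reports `code`.

    status_bits and mask_bits are 16-element lists of 0/1, indexed by status code.
    """
    if not 0 <= code <= 15:
        raise ValueError("status code must fit in 4 bits")
    new_status = list(status_bits)
    for i in COMPARE_CODES:
        new_status[i] = 0         # compare bits are cleared before every update
    new_status[code] = 1          # OR in the bit selected by the 4 bit code
    fault = mask_bits[code] == 0  # fault when the matching mask bit is not set
    return new_status, fault
```

In this model every status bit other than the three compare bits is sticky: once set it stays set until the register is rewritten (for example with SIP).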
The FPMaskStatus register may be read or written using LIP and SIP.

The Weitek 1164/1165 documentation contains extensive descriptions of how various status codes are generated. The code value, priority and description are as follows (10 is the highest priority status):

    Code   Priority   Description
    0000   1          Result was +/- zero exact (compare equal)
    0001   1          Result was +/- infinity exact (compare less than)
    0010   1          Result was finite and nonzero exact (compare greater than)
    0011   1          Result was finite and nonzero inexact
    0101   2          Overflow and inexact
    0110   3          Underflow
    0111   4          Underflow and inexact
    1000   5          Operand A Denormalized
    1001   5          Operand B Denormalized
    1010   6          Operands A+B Denormalized
    1011   7          Divide by zero
    1100   9          Operand A NaN
    1101   9          Operand B NaN
    1110   10         Operands A+B NaN
    1111   8          Invalid operation

IFUStatus register

The IFUStatus register contains the following state:
    Traps        {enabled, disabled}
    Reschedule   {clear, waiting}
    Mode         {kernel, user}
    FPAluAreg    {clear, single, double, integer}
    FPMultAreg   {clear, single, double, integer}

Floating Point Functions

Double precision operands are stacked with the MSW pushed first.
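As an illustration of this stacking order, here is a small Python sketch that splits an IEEE 754 double into its two 32 bit words and pushes them MSW first. The helper names `double_to_words` and `push_double` and the use of a Python list as the EU stack are assumptions for the example only.

```python
import struct


def double_to_words(x):
    """Split an IEEE 754 double into (msw, lsw) unsigned 32 bit words."""
    msw, lsw = struct.unpack(">II", struct.pack(">d", x))
    return msw, lsw


def push_double(stack, x):
    """Push x onto a list-modelled stack, MSW first, so the LSW ends up on top."""
    msw, lsw = double_to_words(x)
    stack.append(msw)
    stack.append(lsw)
    return stack
```

For example, `double_to_words(1.0)` gives `(0x3FF00000, 0x00000000)`, so after `push_double` the word `0x3FF00000` sits just below the top of the stack.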
Interpretation of symbols in following function tables:
    I32 = integer
    F** = F32 or F64 - floating point number
    W** = W32 or W64 - wrapped FP num
    U** = U32 or U64 - unrounded wrapped FP num
    *32 = F32 or U32 - result may be too close to zero
    *64 = F64 or U64 - result may be too close to zero

Multiplication
    xxx 000  F32/U32 _ F32 * F32            000 xxx   A  *  B
    xxx 001  F64/U64 _ F32/F64 * F32/F64    001 xxx  |A| *  B
    xxx 010  F32/U32 _ W32 * F32            010 xxx   A  * |B|
    xxx 011  F64/U64 _ W32/W64 * F32/F64    011 xxx  |A| * |B|
    xxx 100  F32/U32 _ F32 * W32            100 xxx  - A  *  B
    xxx 101  F64/U64 _ F32/F64 * W32/W64    101 xxx  -|A| *  B
    xxx 110  F32/U32 _ W32 * W32            110 xxx  - A  * |B|
    xxx 111  F64/U64 _ W32/W64 * W32/W64    111 xxx  -|A| * |B|

ALU
    ALU-Subtract                    ALU-Compare (status only)
    00 000 0  *32 _ F32 - F32       10 000 0  F32 - F32
    00 000 1  *64 _ F64 - F64       10 000 1  F** - F**
    00 001 0  *32 _ |F32 - F32|     10 001 0
    00 001 1  *64 _ |F64 - F64|     10 001 1
    00 010 0                        10 010 0  |F32| - |F32|
    00 010 1                        10 010 1  |F**| - |F**|
    00 011 0  *32 _ F32 / F32       10 011 0
    00 011 1  *64 _ F** / F**       10 011 1
    00 100 0  *32 _ -F32 + 0        10 100 0  F32 - 0
    00 100 1  *64 _ -F64 + 0        10 100 1  F** - 0
    00 101 0                        10 101 0
    00 101 1                        10 101 1
    00 110 0  *32 _ -F32 + 0        10 110 0
    00 110 1  *64 _ -F64 + 0        10 110 1
    00 111 0  *32 _ W32 / F32       10 111 0
    00 111 1  *64 _ W** / F**       10 111 1

    ALU-Add                         ALU-Convert
    01 000 0  *32 _ F32 + F32       11 000 0  U32 to F32 (exact)
    01 000 1  *64 _ F64 + F64       11 000 1  U64 to F64 (exact)
    01 001 0  *32 _ |F32 + F32|     11 001 0  D32 to W32
    01 001 1  *64 _ |F64 + F64|     11 001 1  D64 to W64
    01 010 0  *32 _ |F32| + |F32|   11 010 0  U32 to F32 (inexact)
    01 010 1  *64 _ |F64| + |F64|   11 010 1  U64 to F64 (inexact)
    01 011 0  *32 _ F32 / W32       11 011 0
    01 011 1  *64 _ F** / W**       11 011 1
    01 100 0  *32 _ F32 + 0         11 100 0  F** to I32
    01 100 1  *64 _ F64 + 0         11 100 1  F** to I32
    01 101 0                        11 101 0  I32 to F32
    01 101 1                        11 101 1  I32 to F64
    01 110 0  *32 _ |F32| + 0       11 110 0  F64 to *32
    01 110 1  *64 _ |F64| + 0       11 110 1  F32 to F64
    01 111 0  *32 _ W32 / W32       11 111 0
    01 111 1  *64 _ W** / W**       11 111 1
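
The Multiplication table reads as two independent 3 bit halves of the 6 bit function code: the first three bits select sign and absolute-value handling, the last three select operand wrapping and precision. The Python sketch below decodes a code under that reading. It is an interpretation of the table, not a definition taken from the hardware documentation; the function name `decode_mult_function`, the returned dictionary keys, and the assumption that the first three bits are the most significant bits of the integer are all hypothetical.

```python
def decode_mult_function(code):
    """Describe a 6 bit multiplier function code (assumed layout: sss ttt)."""
    if not 0 <= code <= 0b111111:
        raise ValueError("function code must fit in 6 bits")
    sign_bits, type_bits = code >> 3, code & 0b111
    # First three bits (right-hand column): negate result, |B|, |A|.
    negate = bool(sign_bits & 0b100)
    abs_b = bool(sign_bits & 0b010)
    abs_a = bool(sign_bits & 0b001)
    # Last three bits (left-hand column): B wrapped, A wrapped, double result.
    b_wrapped = bool(type_bits & 0b100)
    a_wrapped = bool(type_bits & 0b010)
    double = bool(type_bits & 0b001)
    a = "|A|" if abs_a else "A"
    b = "|B|" if abs_b else "B"
    return {
        "expression": ("-" if negate else "") + a + " * " + b,
        "result": "F64/U64" if double else "F32/U32",
        "a_wrapped": a_wrapped,
        "b_wrapped": b_wrapped,
    }
```

For instance, `decode_mult_function(0b101011)` combines row `101 xxx` (-|A| * B) with row `xxx 011` (double result, A operand wrapped).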