DragonFPNotes.tioga
Last modified by Curry - July 10, 1984 11:25:15 am PDT
Main Files Affected
DragonFP.mesa DragonFPImpl.mesa
DragonMicrocode.mesa DragonMicrocodeImpl.mesa
FP.rose IFP.rose IDecoder.rose IPipe.rose IFU.rose
    MicroInstruction  Cell interface
(add)   fpCSLoad     Lev0FPCSLoadBA
(add)   fpCSUAlu    Lev0FPCSUAluBA
(add)   fpCSUMult    Lev0FPCSUMultBA
(add)   XASources (7+9=16 => add one bit):
fpLdMode,
fpLdAMsw,  fpLdALsw,
fpLdBMsw,  fpLdBLsw,
fpUlMultMsw, fpUlMultLsw,
fpUlAluMsw, fpUlAluLsw,
FPStatusB  >EnumType["DragonFP.Status"],
FPCSLoadAB  <EnumType["DragonFP.CSLoad"],
FPCSUAluAB <EnumType["DragonFP.CSUnload"],
FPCSUMultAB <EnumType["DragonFP.CSUnload"],
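The "7+9=16 => add one bit" note can be checked with a few lines of arithmetic. This is only a sketch: it assumes XASources is a densely binary-encoded select field, and the helper name select_bits is hypothetical, not taken from the Dragon sources.

```python
def select_bits(n_sources: int) -> int:
    """Bits needed to give each of n_sources a distinct binary code (0 .. n_sources-1)."""
    if n_sources < 1:
        raise ValueError("need at least one source")
    # The largest code is n_sources - 1; its bit length is the field width.
    return max(1, (n_sources - 1).bit_length())


existing = 7   # sources before the FP additions (from the note above)
added = 9      # fpLdMode .. fpUlAluLsw
old_width = select_bits(existing)
new_width = select_bits(existing + added)
```

Under that assumption the field grows from 3 bits to 4, which is the one extra bit the note calls for.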
KBus during Phase A
It is the responsibility of the Microcode (using multicycles) to ensure that there is no conflict for the KBus when it is used at level 4 to move data from the EU to the IFU. SIFUR instructions are assessed a 3 cycle penalty by this requirement. Since the other instructions using the KBus at level 4 (SJ, SFC and SFCI) are branching instructions, they incur no penalty. When the KBus is used in this way, the CAddr passed to the EU in the previous phase B is in the range of IFU Registers.
KBus conflicts caused by FP ops should never happen since the load signals are driven from level 3 and the unload signals from level 2 (just like EUAluRtIsK).
Floating Point operation
The two floating point chips operate using a single phase clock which makes its active transition between PhA and PhB. In order to save about 12 to 15 pins (3 pins redundant), the KBus is used during PhA to move the load[0..5], unload[0..2] and function[0..5] signals from the IFU to the FP chips. The 3 ChipSelect signals are driven over separate pins.
Reject and Fault occurring in the instruction preceding a FP op are handled correctly (it says here). That is, the Load Chip Select signal is disabled during the next A,B cycle and the IFP section of the IFU knows to ignore a set mode function.
The only guaranteed state associated with FP operations is the Mode, Mask and Flag registers. There are no FP ops which allow 'partial' operations which assume the previous state of internal FP chip registers (i.e. AM, AL, BM, BL etc). Each FP op loads all its operands from the stack, waits a function-specific number of cycles, then moves any results back to the stack.
IFP
The IFP is a section of the IFU which deals with two sets of data:
It maintains a copy of the mode register.
Write: The mode register is written as a side effect of setting the mode registers in the two FP chips. When both the high order Function bits (FPAlu, FPMult) are set and FPCSLoad is enabled, Function is interpreted as a new nibble in the Mode register as described in the Weitek documentation.
Read: The mode register may be read using a LIFUR instruction (Lev0BAddr matches FPMode).
It maintains copies of the 16 bit Mask and Flag registers and issues Reject and FPFault.
Write: CSUnLoad during PhA causes the floating point Status signals during PhB (1.5 cycles later) to be decoded. The three unsticky flags (used for legal floating point comparisons) are cleared and the flag corresponding to the decoded status signal is set. If any of the current flags is not masked then Reject and FPFault are asserted. The reject causes the EU to freeze and not store the FP data currently on the PBus, and the fault causes the IDecoder logic during the next phase A to generate an FP trap exception.
Write: The Mask and Flag registers may also be written using a SIFUR instruction (Lev3Caddr matches FPMaskFlag).
Read: The Mask and Flag registers may be read using a LIFUR instruction (Lev0BAddr matches FPMaskFlag).
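The flag update on CSUnLoad described above can be sketched as follows. The notes do not give the bit layout, so everything concrete here is an assumption made for illustration: the positions of the three unsticky flags (UNSTICKY), the convention that a 1 in the mask suppresses the corresponding flag, and the names update_flags and status_bit.

```python
WORD = 0xFFFF                    # Mask and Flag registers are 16 bits wide
UNSTICKY = 0b0000_0000_0000_0111  # ASSUMED positions of the three unsticky flags


def update_flags(flags: int, mask: int, status_bit: int) -> tuple[int, bool]:
    """Return (new_flags, fault) after decoding one status value.

    status_bit is the ASSUMED flag index selected by the decoded Status signals.
    fault stands for Reject and FPFault, which are asserted together.
    """
    if not 0 <= status_bit < 16:
        raise ValueError("status_bit must select one of 16 flags")
    # Clear the unsticky flags, keep the sticky ones, then set the new flag.
    new_flags = ((flags & ~UNSTICKY) | (1 << status_bit)) & WORD
    # Any flag that is set and not masked raises Reject and FPFault.
    fault = (new_flags & ~mask & WORD) != 0
    return new_flags, fault
```

With every flag masked (mask = 0xFFFF) no fault is raised; clearing one mask bit makes the matching status trap.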
FP format: FPOp alpha
alpha: BYTE = FPMult: BOOL, FPAlu: BOOL, FPFunction: CARDINAL[0..63]
IF FPMult AND FPAlu
THEN Set Mode
ELSE Execute FPFunction for specified device. (one BOOL must be TRUE)
Binary op operands (A op B) are stacked in the EU with A pushed first.
Double precision operands are stacked in the EU with the most significant word pushed first.
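A minimal sketch of the alpha byte decode, assuming FPMult is the most significant bit, FPAlu the next, and FPFunction the low six bits (this matches the tables in these notes, where ALU ops begin 01 and Set Mode begins 11). The names Alpha, decode_alpha and classify are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alpha:
    fp_mult: bool
    fp_alu: bool
    function: int  # FPFunction, 0..63


def decode_alpha(alpha: int) -> Alpha:
    """Split the alpha byte into its three fields (ASSUMED bit order: Mult, Alu, Function)."""
    if not 0 <= alpha <= 0xFF:
        raise ValueError("alpha is a single byte")
    return Alpha(bool(alpha & 0x80), bool(alpha & 0x40), alpha & 0x3F)


def classify(alpha: int) -> str:
    """Apply the IF FPMult AND FPAlu rule from the notes."""
    a = decode_alpha(alpha)
    if a.fp_mult and a.fp_alu:
        return "set-mode"
    if a.fp_mult:
        return "mult"
    if a.fp_alu:
        return "alu"
    raise ValueError("one of FPMult, FPAlu must be TRUE")
```

For example, classify(0b01010001) selects the ALU with function 010001, and classify(0b11000100) is a Set Mode.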
alpha for ALU OPs
Subtract                     Compare (returns only status)
01 00 000 0  F32 - F32       01 10 000 0  F32 - F32
01 00 000 1  F64 - F64       01 10 000 1  F64 - F64
01 00 001 0  |F32 - F32|     01 10 001 0  -F32 + F32
01 00 001 1  |F64 - F64|     01 10 001 1  -F64 + F64
01 00 010 0                  01 10 010 0  |F32| - |F32|
01 00 010 1                  01 10 010 1  |F64| - |F64|
01 00 011 0                  01 10 011 0
01 00 011 1                  01 10 011 1
01 00 100 0  -F32 + 0        01 10 100 0  F32 - 0
01 00 100 1  -F64 + 0        01 10 100 1  F64 - 0
01 00 101 0                  01 10 101 0
01 00 101 1                  01 10 101 1
01 00 110 0  -F32 + 0        01 10 110 0
01 00 110 1  -F64 + 0        01 10 110 1
01 00 111 0                  01 10 111 0
01 00 111 1                  01 10 111 1
Add                          Convert
01 01 000 0  F32 + F32       01 11 000 0  U32 to D32 (exact)
01 01 000 1  F64 + F64       01 11 000 1  U64 to D64 (exact)
01 01 001 0  |F32 + F32|     01 11 001 0  D32 to W32
01 01 001 1  |F64 + F64|     01 11 001 1  D64 to W64
01 01 010 0  |F32| + |F32|   01 11 010 0  U32 to D32 (inexact)
01 01 010 1  |F64| + |F64|   01 11 010 1  U64 to D64 (inexact)
01 01 011 0                  01 11 011 0
01 01 011 1                  01 11 011 1
01 01 100 0  F32 + 0         01 11 100 0  F32 to I32
01 01 100 1  F64 + 0         01 11 100 1  F64 to I32
01 01 101 0                  01 11 101 0  I32 to F32
01 01 101 1                  01 11 101 1  I32 to F64
01 01 110 0  |F32| + 0       01 11 110 0  F32 to F64
01 01 110 1  |F64| + 0       01 11 110 1  F64 to F32
01 01 111 0                  01 11 111 0
01 01 111 1                  01 11 111 1
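The ALU tables suggest that the six FPFunction bits split into a 2 bit group, a 3 bit operation selector and a final bit that pairs the 32 and 64 bit variants. The sketch below assumes exactly that layout; the field names and decode_alu_function are hypothetical, and in the Convert group the last bit only distinguishes the two entries of a pair rather than strictly meaning "double".

```python
GROUPS = ("subtract", "add", "compare", "convert")  # group codes 00, 01, 10, 11


def decode_alu_function(function: int) -> tuple[str, int, bool]:
    """Split a 6 bit ALU FPFunction into (group, selector, second_of_pair).

    ASSUMED layout, read off the tables: gg sss p.
    """
    if not 0 <= function <= 0x3F:
        raise ValueError("FPFunction is 6 bits")
    group = GROUPS[(function >> 4) & 0b11]
    selector = (function >> 1) & 0b111
    second_of_pair = bool(function & 1)  # F64 variant in the arithmetic groups
    return group, selector, second_of_pair
```

For example, alpha 01 01 001 1 has FPFunction 010011, which decodes to the add group, selector 001, second entry of the pair (the |F64 + F64| row).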
alpha for Mult OPs
10 xxx 000 F32 * F32
10 xxx 001 F64 * F64
10 xxx 010 W32 * F32
10 xxx 011 W64 * F64
10 xxx 100 F32 * W32
10 xxx 101 F64 * W64
10 xxx 110 W32 * W32
10 xxx 111 W64 * W64
10 000 xxx A * B
10 001 xxx |A| * B
10 010 xxx A * |B|
10 011 xxx |A| * |B|
10 100 xxx - A * B
10 101 xxx -|A| * B
10 110 xxx - A * |B|
10 111 xxx -|A| * |B|
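Read bit by bit, the two multiplier tables look regular: the upper three FPFunction bits control signs and the lower three select operand formats. The sketch below encodes that reading as an assumption; decode_mult_function is a hypothetical name and the meaning of the W formats is left to the Weitek documentation.

```python
def decode_mult_function(function: int) -> tuple[str, str]:
    """Return (operand formats, sign handling) for a 6 bit multiplier FPFunction.

    ASSUMED layout, read off the tables: n b a | B A d
      n = negate result, b = take |B|, a = take |A|
      B = B operand is W format, A = A operand is W format, d = 64 bit
    """
    if not 0 <= function <= 0x3F:
        raise ValueError("FPFunction is 6 bits")
    sign = (function >> 3) & 0b111
    fmt = function & 0b111
    width = 64 if fmt & 0b001 else 32
    a_kind = "W" if fmt & 0b010 else "F"
    b_kind = "W" if fmt & 0b100 else "F"
    a = "|A|" if sign & 0b001 else "A"
    b = "|B|" if sign & 0b010 else "B"
    neg = "-" if sign & 0b100 else ""
    return f"{a_kind}{width} * {b_kind}{width}", f"{neg}{a} * {b}"
```

For example, FPFunction 001 011 decodes to W64 * F64 computed as |A| * B.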
alpha for Set Mode
1100RRIF = mode0
RR Floating point rounding mode
00 Round toward nearest
01 Round toward zero
10 Round toward Positive infinity
11 Round toward negative infinity
I Fixed point rounding mode
0 Round according to Floating point rounding mode
1 Round toward zero
F Fast mode (this is not in the doc but is supposed to exist)
0 IEEE mode
1 Flush denormalized operands and results to zero
1101AAxx = mode1
AA  Multiplier Accumulation rate (does this really exist?)
00 Clock/1
01 Clock/2
10 Clock/3
11 Clock/4
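The Set Mode encodings above can be unpacked with a small decoder. This is a sketch that follows the bit patterns 1100RRIF and 1101AAxx as written in these notes; the dictionary keys and the name decode_set_mode are hypothetical, and the F bit and the accumulation rate carry the same doubts the notes already raise.

```python
ROUNDING = ("nearest", "zero", "positive infinity", "negative infinity")


def decode_set_mode(alpha: int) -> dict:
    """Decode a Set Mode alpha byte (both FPMult and FPAlu set)."""
    if not 0 <= alpha <= 0xFF or (alpha >> 6) != 0b11:
        raise ValueError("not a Set Mode alpha byte")
    nibble_select = (alpha >> 4) & 0b11
    if nibble_select == 0b00:  # 1100RRIF = mode0
        return {
            "register": "mode0",
            "fp_rounding": ROUNDING[(alpha >> 2) & 0b11],
            "fixed_round_toward_zero": bool(alpha & 0b10),
            "fast_mode": bool(alpha & 0b01),
        }
    if nibble_select == 0b01:  # 1101AAxx = mode1
        return {"register": "mode1", "clock_divisor": ((alpha >> 2) & 0b11) + 1}
    raise ValueError("mode nibble not described in these notes")
```

For example, 11000110 selects mode0 with floating point rounding toward zero, fixed point rounding toward zero, and IEEE (not fast) mode.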
Types of FPFunctions => separate decoding in the Decoder PLA
Type                    Operands     Result   Time     Cycles
Set Mode                -            -        120 ns   2
Unary Single            one Single   Single   240 ns   3
Unary Double            one Double   Double   240 ns   3
Binary Single           two Single   Single   240 ns   3
Binary Double ALU       two Double   Double   240 ns   3
Binary Double Mult      two Double   Double   360 ns   4
Compare Unary Single    one Single   -        240 ns   3
Compare Unary Double    one Double   -        240 ns   3
Compare Binary Single   two Single   -        240 ns   3
Compare Binary Double   two Double   -        240 ns   3
Convert Size Single     one Single   Double   240 ns   3
Convert Size Double     two Double   Single   240 ns   3
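The timing table can be held as a lookup keyed by function type, which is roughly what separate decoding in the Decoder PLA amounts to. The sketch below is illustrative only: the key strings and the names FP_TIMING and cycles_for are hypothetical. One observation from the table itself is that every row satisfies cycles = time / 120 ns + 1; the notes do not say what the extra cycle covers.

```python
# (time in ns, cycles) per function type, copied from the table above.
FP_TIMING = {
    "set mode": (120, 2),
    "unary single": (240, 3),
    "unary double": (240, 3),
    "binary single": (240, 3),
    "binary double alu": (240, 3),
    "binary double mult": (360, 4),
    "compare unary single": (240, 3),
    "compare unary double": (240, 3),
    "compare binary single": (240, 3),
    "compare binary double": (240, 3),
    "convert size single": (240, 3),
    "convert size double": (240, 3),
}


def cycles_for(kind: str) -> int:
    """Number of cycles the microcode must allow for the given function type."""
    return FP_TIMING[kind.lower()][1]


# Consistency check of the observation above against every row.
rows_consistent = all(ns // 120 + 1 == cycles for ns, cycles in FP_TIMING.values())
```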