[Indigo]<Dragon>FP>Rosemary>DragonFPNotes.tioga!1

DragonFPNotes.tioga

Last modified by Curry - July 10, 1984 11:25:15 am PDT

Main Files Affected

DragonFP.mesa DragonFPImpl.mesa

DragonMicrocode.mesa DragonMicrocodeImpl.mesa

FP.rose IFP.rose IDecoder.rose IPipe.rose IFU.rose

MicroInstruction Cell interface

(add) fpCSLoad Lev0FPCSLoadBA

(add) fpCSUAlu Lev0FPCSUAluBA

(add) fpCSUMult Lev0FPCSUMultBA

(add) XASources (7+9=16 => add one bit):

fpLdMode,

fpLdAMsw, fpLdALsw,

fpLdBMsw, fpLdBLsw,

fpUlMultMsw, fpUlMultLsw,

fpUlAluMsw, fpUlAluLsw,

FPStatusB >EnumType["DragonFP.Status"],

FPCSLoadAB <EnumType["DragonFP.CSLoad"],

FPCSUAluAB <EnumType["DragonFP.CSUnload"],

FPCSUMultAB <EnumType["DragonFP.CSUnload"],

KBus during Phase A

It is the responsibility of the Microcode (using multicycles) to ensure that there is no conflict for the KBus when it is used at level 4 to move data from the EU to the IFU. SIFUR instructions are assessed a 3 cycle penalty by this requirement. Since the other instructions using the KBus at level 4 (SJ, SFC and SFCI) are branching instructions, they incur no penalty. When the KBus is used in this way, the CAddr passed to the EU in the previous phase B is in the range of IFU Registers.

KBus conflicts caused by FP ops should never happen since the load signals are driven from level 3 and the unload signals from level 2 (just like EUAluRtIsK).

Floating Point operation

The two floating point chips operate using a single phase clock which makes its active transition between PhA and PhB. In order to save about 12 to 15 pins (3 pins redundant), the KBus is used during PhA to move the load[0..5], unload[0..2] and function[0..5] signals from the IFU to the FP chips. The 3 ChipSelect signals are driven over separate pins.
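
The pin saving can be illustrated by packing the three signal groups into one 15-bit value. This is only a sketch: the field widths (6, 3 and 6 bits) come from the paragraph above, but the bit ordering on the KBus and the names `pack_kbus` and `unpack_kbus` are assumptions made for illustration, not the actual wiring.

```python
# Field widths taken from the notes: load[0..5], unload[0..2], function[0..5].
# The ordering of the fields within the word is an assumption.
FIELDS = (("load", 6), ("unload", 3), ("function", 6))


def pack_kbus(load, unload, function):
    """Pack the three FP control fields into a single 15-bit integer."""
    word = 0
    for value, (name, width) in zip((load, unload, function), FIELDS):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_kbus(word):
    """Recover the three fields from a word built by pack_kbus."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields
```

The three widths sum to 15, which matches the upper end of the "12 to 15 pins" estimate.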

Reject and Fault occurring in the instruction preceding a FP op are handled correctly (it says here). That is, the Load Chip Select signal is disabled during the next A,B cycle and the IFP section of the IFU knows to ignore a set mode function.

The only guaranteed state associated with FP operations is that of the Mode, Mask and Flag registers. There are no FP ops which allow 'partial' operations which assume the previous state of internal FP chip registers (i.e. AM, AL, BM, BL etc). Each FP op loads all its operands from the stack, waits a function-specific number of cycles, then moves any results back to the stack.

IFP

The IFP is a section of the IFU which deals with two sets of data:

It maintains a copy of the mode register.

Write: The mode register is written as a side effect of setting the mode registers in the two FP chips. When both the high order Function bits (FPAlu, FPMult) are set and FPCSLoad is enabled, Function is interpreted as a new nibble in the Mode register as described in the Weitek documentation.

Read: The mode register may be read using a LIFUR instruction (Lev0BAddr matches FPMode).

It maintains copies of the 16-bit Mask and Flag registers and issues Reject and FPFault.

Write: CSUnLoad during PhA causes the floating point Status signals during PhB (1.5 cycles later) to be decoded. The three unsticky flags (used for legal floating point comparisons) are cleared and the flag corresponding to the decoded status signal is set. If any of the current flags are not masked then Reject and FPFault are asserted. The reject causes the EU to freeze and not store the FP data currently on the PBus, and the fault causes the IDecoder logic during the next phase A to generate a FP trap exception.

Write: The Mask and Flag registers may also be written using a SIFUR instruction (Lev3Caddr matches FPMaskFlag).

Read: The Mask and Flag registers may be read using a LIFUR instruction (Lev0BAddr matches FPMaskFlag).
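
The flag update performed on CSUnLoad can be sketched as follows. This is an illustration under stated assumptions, not the Rosemary model: the notes do not give bit positions, so the sketch assumes the three unsticky flags occupy bits 0..2, that the decoded status selects one flag by index 0..15, and that a 1 in the Mask register suppresses the corresponding flag. The name `update_flags` is hypothetical.

```python
UNSTICKY = 0b111  # assumption: the three unsticky flags are bits 0..2


def update_flags(flags, mask, status_bit):
    """Return (new_flags, reject_and_fault) after one decoded status.

    flags, mask -- 16-bit register values
    status_bit  -- index 0..15 of the flag chosen by the decoded Status
                   (the Status-to-flag mapping is an assumption)
    A 1 bit in mask is assumed to mean "this flag is masked".
    """
    if not 0 <= status_bit < 16:
        raise ValueError("status_bit must be in 0..15")
    # Clear the unsticky flags, then set the flag for the new status.
    new_flags = ((flags & ~UNSTICKY) | (1 << status_bit)) & 0xFFFF
    # Reject and FPFault are asserted if any current flag is not masked.
    reject_and_fault = (new_flags & ~mask & 0xFFFF) != 0
    return new_flags, reject_and_fault
```

Note that a sticky flag left set by an earlier operation still triggers Reject and FPFault if it is unmasked, since the test is applied to all current flags.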

FP format: FPOp alpha

alpha: BYTE = FPMult: BOOL, FPAlu: BOOL, FPFunction: CARDINAL[0..63]

IF FPMult AND FPAlu

THEN Set Mode

ELSE Execute FPFunction for specified device. (one BOOL must be TRUE)
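
The alpha byte layout above can be sketched as a small decoder. This is a minimal Python sketch rather than the Mesa implementation; the name `decode_alpha` and the returned tuple are illustrative, and the bit order (FPMult as the most significant bit, FPAlu next, FPFunction in the low six bits) is inferred from the field order and the encodings listed in these notes.

```python
def decode_alpha(alpha):
    """Split an FPOp alpha byte into (kind, fp_function).

    Assumed bit order: FPMult is bit 7, FPAlu is bit 6,
    FPFunction is bits 5..0.
    """
    if not 0 <= alpha <= 0xFF:
        raise ValueError("alpha must fit in one byte")
    fp_mult = bool(alpha & 0x80)
    fp_alu = bool(alpha & 0x40)
    fp_function = alpha & 0x3F
    if fp_mult and fp_alu:
        kind = "set-mode"   # both BOOLs TRUE: Set Mode
    elif fp_mult:
        kind = "mult"       # execute FPFunction on the multiplier
    elif fp_alu:
        kind = "alu"        # execute FPFunction on the ALU
    else:
        raise ValueError("one of FPMult or FPAlu must be TRUE")
    return kind, fp_function
```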

Binary op operands (A op B) are stacked in the EU with A pushed first.

Double precision operands are stacked in the EU with the most significant word pushed first.

alpha for ALU OPs

Subtract                       Compare (returns only status)

01 00 000 0   F32 - F32        01 10 000 0   F32 - F32

01 00 000 1   F64 - F64        01 10 000 1   F64 - F64

01 00 001 0   |F32 - F32|      01 10 001 0   -F32 + F32

01 00 001 1   |F64 - F64|      01 10 001 1   -F64 + F64

01 00 010 0                    01 10 010 0   |F32| - |F32|

01 00 010 1                    01 10 010 1   |F64| - |F64|

01 00 011 0                    01 10 011 0

01 00 011 1                    01 10 011 1

01 00 100 0   -F32 + 0         01 10 100 0   F32 - 0

01 00 100 1   -F64 + 0         01 10 100 1   F64 - 0

01 00 101 0                    01 10 101 0

01 00 101 1                    01 10 101 1

01 00 110 0   -F32 + 0         01 10 110 0

01 00 110 1   -F64 + 0         01 10 110 1

01 00 111 0                    01 10 111 0

01 00 111 1                    01 10 111 1

Add                            Convert

01 01 000 0   F32 + F32        01 11 000 0   U32 to D32 (exact)

01 01 000 1   F64 + F64        01 11 000 1   U64 to D64 (exact)

01 01 001 0   |F32 + F32|      01 11 001 0   D32 to W32

01 01 001 1   |F64 + F64|      01 11 001 1   D64 to W64

01 01 010 0   |F32| + |F32|    01 11 010 0   U32 to D32 (inexact)

01 01 010 1   |F64| + |F64|    01 11 010 1   U64 to D64 (inexact)

01 01 011 0                    01 11 011 0

01 01 011 1                    01 11 011 1

01 01 100 0   F32 + 0          01 11 100 0   F32 - I32

01 01 100 1   F64 + 0          01 11 100 1   F64 - I32

01 01 101 0                    01 11 101 0   I32 - F32

01 01 101 1                    01 11 101 1   I32 - F64

01 01 110 0   |F32| + 0        01 11 110 0   F32 - F64

01 01 110 1   |F64| + 0        01 11 110 1   F64 - F32

01 01 111 0                    01 11 111 0

01 01 111 1                    01 11 111 1
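
The ALU encodings above share a common field structure, which can be sketched as a decoder for the six FPFunction bits. The split into a 2-bit group, a 3-bit variant and a 1-bit width selector is read off from the way the rows are written, not stated in the notes, and the names here are hypothetical. For the Convert group the meaning of the low bit varies from row to row, so it is reported only as a raw flag.

```python
# Group codes as they appear in the second bit pair of each ALU row.
ALU_GROUPS = {0b00: "subtract", 0b01: "add", 0b10: "compare", 0b11: "convert"}


def decode_alu_function(fp_function):
    """Split a 6-bit ALU FPFunction into (group, variant, low_bit)."""
    if not 0 <= fp_function <= 0x3F:
        raise ValueError("FPFunction must be in 0..63")
    group = ALU_GROUPS[(fp_function >> 4) & 0b11]
    variant = (fp_function >> 1) & 0b111  # row within the group
    low_bit = bool(fp_function & 1)       # 32-bit vs 64-bit form in most rows
    return group, variant, low_bit
```

For example, the row `01 01 000 1` (F64 + F64) has FPFunction `010001`, which decodes to the add group, variant 0, with the low bit set.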

alpha for Mult OPs

10 xxx 000   F32 * F32

10 xxx 001   F64 * F64

10 xxx 010   W32 * F32

10 xxx 011   W64 * F64

10 xxx 100   F32 * W32

10 xxx 101   F64 * W64

10 xxx 110   W32 * W32

10 xxx 111   W64 * W64

10 000 xxx   A * B

10 001 xxx   |A| * B

10 010 xxx   A * |B|

10 011 xxx   |A| * |B|

10 100 xxx   -A * B

10 101 xxx   -|A| * B

10 110 xxx   -A * |B|

10 111 xxx   -|A| * |B|
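
The two groups of rows above describe independent 3-bit fields within the six FPFunction bits: the low three bits select the operand types and the upper three select sign handling. The per-bit meanings in the sketch below are inferred from the listed rows rather than stated in the notes, and the function and key names are hypothetical.

```python
def decode_mult_function(fp_function):
    """Decode the 6-bit multiplier FPFunction into its apparent flag bits."""
    if not 0 <= fp_function <= 0x3F:
        raise ValueError("FPFunction must be in 0..63")
    sign_field = (fp_function >> 3) & 0b111  # the 000..111 "A * B" rows
    type_field = fp_function & 0b111         # the 000..111 "F32 * F32" rows
    return {
        "negate": bool(sign_field & 0b100),  # result is negated
        "abs_b": bool(sign_field & 0b010),   # B replaced by |B|
        "abs_a": bool(sign_field & 0b001),   # A replaced by |A|
        "b_is_w": bool(type_field & 0b100),  # B operand is W rather than F
        "a_is_w": bool(type_field & 0b010),  # A operand is W rather than F
        "double": bool(type_field & 0b001),  # 64-bit rather than 32-bit
    }
```

For example, sign field `101` with type field `011` decodes to -|A| * B with operands W64 * F64.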

alpha for Set Mode

1100RRIF = mode0

RR Floating point rounding mode

00 Round toward nearest

01 Round toward zero

10 Round toward Positive infinity

11 Round toward negative infinity

I Fixed point rounding mode

0 Round according to Floating point rounding mode

1 Round toward zero

F Fast mode (this is not in the doc but is supposed to exist)

0 IEEE mode

1 Flush denormalized operands and results to zero

1101AAxx = mode1

AA Multiplier Accumulation rate (does this really exist?)

00 Clock/1

01 Clock/2

10 Clock/3

11 Clock/4
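
The Set Mode encodings above can be sketched as a decoder that returns the fields of whichever mode nibble is being written. This is an illustrative sketch: the name `decode_set_mode` and the dictionary keys are hypothetical, and only the two nibbles described in these notes (mode0 and mode1) are handled. As the notes themselves point out, the Fast bit and the accumulation rate field are not confirmed by the documentation.

```python
ROUNDING = ("nearest", "zero", "positive infinity", "negative infinity")


def decode_set_mode(alpha):
    """Decode a Set Mode alpha byte (top two bits both set)."""
    if not 0 <= alpha <= 0xFF or (alpha >> 6) != 0b11:
        raise ValueError("not a Set Mode alpha byte")
    select = (alpha >> 4) & 0b11  # which mode nibble is written
    nibble = alpha & 0x0F
    if select == 0b00:            # 1100RRIF = mode0
        return {
            "register": "mode0",
            "fp_rounding": ROUNDING[nibble >> 2],
            "fixed_round_toward_zero": bool(nibble & 0b10),
            "fast": bool(nibble & 0b01),
        }
    if select == 0b01:            # 1101AAxx = mode1
        return {
            "register": "mode1",
            "accumulation_clock_divisor": (nibble >> 2) + 1,  # Clock/1..Clock/4
        }
    raise ValueError("only mode0 and mode1 are described in these notes")
```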

Types of FPFunctions => separate decoding in the Decoder PLA

Type                    Operands     Result   Time    Cycles

Set Mode                -            -        120ns   2

Unary Single            one Single   Single   240ns   3

Unary Double            one Double   Double   240ns   3

Binary Single           two Single   Single   240ns   3

Binary Double ALU       two Double   Double   240ns   3

Binary Double Mult      two Double   Double   360ns   4

Compare Unary Single    one Single   -        240ns   3

Compare Unary Double    one Double   -        240ns   3

Compare Binary Single   two Single   -        240ns   3

Compare Binary Double   two Double   -        240ns   3

Convert Size Single     one Single   Double   240ns   3

Convert Size Double     two Double   Single   240ns   3
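
The timing table can be carried as a simple lookup keyed by function type, as a decoder might do when choosing how many cycles to wait. The sketch below copies the Time and Cycles columns from the table; the names `FP_TIMING`, `cycles_for` and `times_match_cycles` are hypothetical. It also checks an observation about the listed values, namely that every row satisfies time = 120ns * (cycles - 1); this relation is read off the table and is not stated in the notes.

```python
# (time in ns, cycles) for each function type, copied from the table.
FP_TIMING = {
    "Set Mode": (120, 2),
    "Unary Single": (240, 3),
    "Unary Double": (240, 3),
    "Binary Single": (240, 3),
    "Binary Double ALU": (240, 3),
    "Binary Double Mult": (360, 4),
    "Compare Unary Single": (240, 3),
    "Compare Unary Double": (240, 3),
    "Compare Binary Single": (240, 3),
    "Compare Binary Double": (240, 3),
    "Convert Size Single": (240, 3),
    "Convert Size Double": (240, 3),
}


def cycles_for(kind):
    """Return the cycle count listed for a function type."""
    return FP_TIMING[kind][1]


def times_match_cycles():
    """Check that every row satisfies time = 120ns * (cycles - 1)."""
    return all(t == 120 * (c - 1) for t, c in FP_TIMING.values())
```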