[Indigo]<Dragon>Documentation>PrincOps>xxxSingleProcOp.tioga!2

Bland, February 14, 1986 11:01:55 am PST

Fiala, February 18, 1986 4:36:22 pm PST

Copyright © 1985, 1986, by Xerox Corporation. All rights reserved.

/ivy/bland/doc/SProcOp.tioga

DRAFT

SINGLE PROCESSOR OPERATION - VERSION 3

FOR INTERNAL XEROX USE ONLY

2.0 Single Processor Operation

The Instruction Fetch Unit (IFU) and the Execution Unit (EU) form a single logical entity. Together, they constitute the fixed-point execution engine for the Dragon Processor. The IFU reads a stream of variable-length machine instructions and decodes them. It then expands them into microinstructions and controls the pipelining of these microinstructions. As a microinstruction progresses through the pipeline, less and less work is done in the IFU and more is performed in the EU.

Each Dragon Processor contains a 128-register stack, 16 registers for the local frame, 12 registers for constants, and 16 auxiliary registers. The following figure shows the logical organization of a Dragon Processor's registers.

IFU/EU - A Logical Model

[Artwork node; type 'ArtworkInterpress on' to command tool]

[Artwork node; type 'ArtworkInterpress on' to command tool]

[Artwork node; type 'ArtworkInterpress on' to command tool]

All the registers named in this logical model are contained 256-register enumerated type, ProcessorRegister. Here is a Cedar definition for ProcessorRegister:

ProcessorRegister: TYPE = MACHINE DEPENDENT {

Stack (0) the EU Stack (128 registers)

Spare (128) not used; possibly non-existent register

ToKBus (129) send result on K bus to IFU

MAR (130) MemoryAddressRegister

Field (131) Field register

Constants (132) Base of EU constant register (12 regs)

AuxRegs (144) Base of EU aux registers (16 regs)

[160..239] don't correspond to any existing registers.

YoungestL (240) youngest L in IFU stack

YoungestPC (241) youngest PC in IFU stack

EldestL (242) eldest L in IFU stack

EldestPC (243) eldest PC in IFU stack (rd removes, wt adds)

Status (244) IFU status

SLimit (245) stack limit register

[246..255] don't correspond to any existing registers

};

Some registers contained in ProcessorRegister are accessed from the IFU and some from the EU. Definitions for the registers in ProcessorRegister are given in the following sections on the IFU and EU as appropriate.

2.1 Instruction Fetch Unit

The IFU contains a small stack of the most recent procedure contexts. A procedure context for the IFU includes a program counter (PC) and an index (L) into the EU stack which is the base register of the local variables.

The IFU has nine accessible registers which are defined below. These registers may be read by the LIP (Load from Internal Processor register) instruction and written by the SIP (Store to Internal Processor register) instruction.

PC (Program Counter)

PC is the program counter (a byte address). Instruction space is limited to 232 bytes (roughly 4 gigabytes), even though the full Dragon address space is 232 words.

S (Stack pointer)

S is a 7-bit index into Stack.

SLimit (Stack Limit)

SLimit gives the limit for S. A stack overflow occurs when an instruction modifies S in such a way that the new value of S is in the range [SLimit..SLimit+16).

L (Local frame base)

L is a 7-bit index into Stack which serves as the base register for the current local frame. The first 16 registers at or above L are easily addressable through the instruction set.

EldestL

EldestL contains L of the oldest procedure context in the IFU stack.

EldestPC

EldestPC contains the PC of the oldest procedure context in the IFU stack. EldestPC is read by the LIP (Load from Internal Processor register) instruction. A read does the following:

Stack[S+1] ← IFUStack[Eldest].PCBits; S ← S + 1;

Eldest ← Eldest + 1;

EldestPC is written by the SIP (Store to Internal Processor register) instruction:

IFUStack[Eldest - 1].PCBits ← Stack[S]; S ← S — 1;

Eldest ← Eldest — 1;

If traps are enabled and there are already 11 entries in the IFU Stack, any instruction which would cause another entry will instead generate an IFU overflow trap. Upon entry into the overflow routine, the PC and L of the offending instruction are recorded as the last values in the IFU Stack.

YoungestL

YoungestL contains the L of the most recent procedure context in the IFU stack.

YoungestPC

YoungestPC contains the PC of the most recent procedure context in the IFU stack. This is useful in trap routines that need to examine code, or in cases where the trap routine wants to skip the instruction that caused the trap.

Status

Status logically contains only three bits of information: mode (user or kernel), traps (enabled or disabled), and rescheduling (pending or not pending). In order to facilitate partial changes in the status word during write (i.e. SIP) operations, each status bit is paired with a control bit which selects either the old or new value to be the result of the write. This control bit is called the keep bit since when it is true, it assures that the old value of the field is kept. When the status word is read into Stack[S+1] all of its paired keep bits are set to false. This facilitates restoration of saved state (LIP) at some future time.

Status: Machine Dependent Record[
reserved: twenty-six Bits ← false, Reserved bits are not currently used and must be set to 0.
userModeKeep: Bool ← false, ifTrue => when writing, keep old value
userMode: Bool ← false, TRUE => user, FALSE => kernel
trapsEnabledKeep: Bool ← false, ifTrue => when writing, keep old value
trapsEnabled: Bool ← false, TRUE => traps enabled
rescheduleKeep: Bool ← false, ifTrue => when writing, keep old value
reschedule: Bool ← false, TRUE => reschedule pending
];

Note: Arithmetic involving S and L is always performed modulo the size of the EU stack, without detection of underflow or overflow; it produces values in the range [0..127]. If traps are enabled and the stack pointer S is in the range [SLimit..SLimit + 16) at the end of any instruction, an EU stack overflow trap occurs.

2.2 The Execution Unit

The Execution Unit (EU) contains a 32-bit Arithmetic Logic Unit (ALU) and a Field Unit (FU) for shifting, masking and inserting fields. The EU contains the address and data pathway to its data cache (or caches) but does not control these caches. It also contains several multiplexers to select operands and implement pipeline short-circuits.

The Execution Unit contains a bank of registers that are 32 bits wide. These registers contain the most recent elements of the data stack for a process. There are also registers that contain constants, as well as special purpose quantities. This architecture permits most elementary operations involving local variables to be performed in 1 EU cycle. However, this architecture requires special attention be paid to migrating the contents of a process stack between registers and memory.

The EU has the following registers:

Stack

Stack indicates the 128 registers used for local variables in recent local frames.

Locals

Locals indicates the 16 local registers in Stack used for the current local frame. The L register in the IFU points at the base of Locals. When the local frame has fewer than 16 registers, the excess locals are synonomous with lower positions on the Stack.

AuxRegs

AuxRegs indicates the 16 auxiliary registers. Most of these registers will be used for the runtime support of higher-level languages, and it is illegal to write into the first 8 auxiliary registers when in User mode.

Constants

Constants indicates the 12 registers normally used to hold constants. Although these registers can be written in Kernel mode, they hold constants for runtime environments and cannot be written in User mode. They are more general in that they can be used in more addressing modes than the AuxRegs.

Field (Field unit register)

Field indicates a special register which can be loaded by the FSDB instruction; the value in this register is used by the RFU instruction to control the Field Unit.

MAR (Memory Address Register)

The MAR register is loaded with the memory address whenever an EU memory reference is delayed by the Wait signal from the cache. In the event of a page or write protect fault, MAR must be read by the fault routine before it issues any EU memory references.

ToKBus

ToKBus is loaded with the value being sent from the EU back to the IFU on SFC, SFCI, SJ, and SIP instructions. Writing this register with SIP serves no useful purpose and might interfere with normal use of the K bus.

Carry

A 1-bit register which is discussed in the "Arithmetic Operations" section below.

2.2.1 Arithmetic and Logical Unit (ALU)

2.2.1.1 Arithmetic Operations

It is intended that Dragon provide efficient instruction-level support for 32-bit 2's-complement integer arithmetic, which is expected to be much more common than any other kind. N-precision 2's-complement arithmetic is also supported but is expected to need little instruction level support. In constrast, 32-bit cardinal arithmetic is only incidentally supported, and 1's-complement arithmetic is not supported.

With these objectives in mind, arithmetic operands are treated in one of three ways: as signed numbers (or twos-complement integers) in the range [-231..231); as unsigned numbers (or cardinals) in the range [0..232); or as Lisp integers in the range [-229..229), where the top three bits must be either all 0's or all 1's.

These different interpretations of the operands require different handling of the Carry bit and of overflow conditions. Accordingly, five distinct kinds of addition/subtraction are defined:

Unsigned

On addition, the 1-bit Carry flip-flop supplies the adder's carry-in and is loaded from the adder carry-out; on subtraction, the complement of Carry bit supplies the adder carry-in, and Carry is loaded from the complement of the adder's carry-out. No traps are taken. The only instructions using this kind of arithmetic are RUADD and RUSUB, and executing one of these operations is the only way to load Carry with the value 1. Unsigned (= cardinal) arithmetic is used for the low-order terms of multiple-precision arithmetic.

Signed

On addition, the Carry bit supplies carry-in, and on subtraction it supplies the complement of carry-in, just as in Unsigned arithmetic. Carry is always loaded with 0 at the end of the instruction. Overflow, which causes a Trap, is defined to occur when the numerical result is not in the range [-231..231), or, equivalently, when the carry out of bit 0 is unequal to the carry out of bit 1. The integer overflow trap, like other traps, occurs before any machine state has been modified, so no information is lost when it is taken. Many instructions use signed arithmetic, which is intended for address computation, loop counting, single-precision, and the high-order term of multiple-precision arithmetic.

Lisp

The Carry bit is not used as an input, and is always set to 0. If either of the operands or the result is not in the range [-229..229) (top three bits being all 0s or all 1s), a Lisp NaN (Not a Number) Trap is taken. Like other traps, Lisp NaN occurs before any machine state has been modified, so no information is lost when it is taken.

Vanilla

The 1-bit Carry flip-flop is not used as an input and is not modified. No traps are taken. The only instructions using this kind of arithmetic are RVADD and RVSUB, which are intended for situations where overflow may occur but does not represent an error.

Instructions facilitating multiply and divide do not exist but will be provided later by an Arithmetic Unit (AU) which will be part of every Dragon processor. The AU will also support both 32-bit and 64-bit IEEE standard floating point arithmetic, but in other respects the AU is not presently defined. The AU will be controlled from the P bus using IO instructions.

The Carry bit may only be addressed indirectly. To read the Carry bit into Rc, execute a RUADD instruction: Stack[S+1] ← Constant[0] + Constant[0]; S ← S + 1. This will yield a 1 on the top of Stack if the Carry was set to 1, and a 0 if it was not. To set the Carry bit, load the saved Carry bit onto the Stack and execute a RUSUB instruction: Stack[S] ← Constant[0] — Stack[S]; S ← S — 1. This instruction will set the Carry bit in the EU and a word will be discarded from Stack.

The notation A-B-Carry is an abbreviation for A+~B+~Carry, where ~B is the 32-bit complement of B, and ~Carry is the 1-bit complement of Carry. The notation A-B, where A and B are 32-bit quantities, denotes the 3-way, signed addition of A+~B+1. It is worth noting the clever trick of complementing the value in Carry for subtraction, which allows the signed subtraction instructions also to be used for the high-order subtraction of two n-precision numbers.

Lisp arithmetic is intended for an implementation in which pointers to storage and numbers occur within the same 32-bit space. For example, one proposal was to divide the 32-bit address space according to the high-order 3 bits of an address as follows:

7 Negative Lisp integers
6 Negative floating point numbers
5 Negative floating point numbers
4 Cons storage
3 Ref storage
2 Positive floating point numbers
1 Positive floating point numbers
0 Positive Lisp integers and Kernel

With this kind of arrangement, arbitrary-precision integers can be implemented as follows: An integer in the range [-229..229) is encoded as a Lisp integer; integers outside this range are stored in a vector in Ref space where word 0 is the number of 32-bit words in the vector and words 1 to n are the n-precision integer. Under the assumption that integers in the range [-229..229) are vastly more common than those outside this range, addition and subtraction are compiled, respectively, into Lisp add or subtract instructions, which will trap and be interpreted by the NaN trap software in the uncommon case. (Arbitrary-precision floating point numbers embedded in pointers like the Lisp integers have also been discussed, as suggested in this table.)

To support 32-bit cardinal arithmetic, instructions which trapped on adder carry-out = 1 (i.e., on Cardinal overflow) would be needed; these do not exist, though the Vanilla or Unsigned arithmetic operations might sometimes be usable. However, it is possible to carry out cardinal addition and subtraction as follows: First complement the sign bit of each operand; then execute an integer operation; finally, complement the sign bit of the result. It can easily be shown that a cardinal comparison is equivalent to an integer comparison with the sign bit of the two operands complemented.

2.2.1.2 Logical Operations

Logical operations are performed on 32-bit quantities that are treated as arrays of 32 booleans. For example, the Or instruction reads: Stack[S-1] ← Stack[S] or Stack[S-1]; S ← S — 1. It performs a logical OR for each of the 32 bit positions.

2.2.1.3 Comparative Operations

Comparative operations are performed on 32-bit signed quantities. The Carry bit is not used or set. For example, the Jump Not Equal Byte Byte instruction reads: If Stack[S] ~= zExt[literal], then PC ← PC + sExt[displacement]; S ← S — 1. Note: Numbers that are zero-extended are understood to be positive (see the definition of zExt in Section 2.3.

2.2.2 Instruction Execution and Sequencing

2.2.3 Field Unit Operations

The field unit enables shifting, rotation, insertion, and masking of fields. It takes two words, and produces a one word result, under the control of a field descriptor. The field descriptor is a Machine Dependent Record. This means that it may not be packed for storage efficiency or arranged in an order that differs from the left to right order of the original record declaration. It is supplied either through a 16-bit constant or through the Field register in the EU; it has the following format:

[Artwork node; type 'ArtworkInterpress on' to command tool]

The four fields are defined as follows:

r1 - r3

r1 - r3 are bits that are not currently used and must be set to zero.

insert

insert governs the choice of background and low bits of the mask. If insert is false the output of the instruction is the logical And of the width and shift. If insert is true the instruction performs an insert operation.

width

Width gives the number of right-justified one's in the mask. (If width = 0, there are no one's. If width = 32, the mask is entirely one's.)

shift

Shift gives the number of bits to left-shift. the double word.

A Cedar program, entitled FieldUnit, describes the operation of the Field Unit. Here are definitions used by the FieldUnit program:

DragonTooth: CEDAR DEFINITIONS = {

Dragon teeth determine how big dragon bytes are.

Bit: TYPE = MACHINE DEPENDENT {zero (0), one (1)};

Zero: Bit = zero;

One: Bit = one;

BitsPerWord: CARDINAL = 32;

BitIndex: TYPE = CARDINAL [0..BitsPerWord);

FieldWidth: TYPE = CARDINAL [0..BitsPerWord];

ShiftIndex: TYPE = CARDINAL [0..BitsPerWord];

Word: TYPE = ARRAY BitIndex OF Bit;

ZeroesWord: Word = ALL[Zero];

OnesWord: Word = ALL[One];

}.

Here is the user interface to the FieldUnitImpl procedure:

DIRECTORY

DragonTooth;

FieldUnit: CEDAR DEFINITIONS = {

FieldDescriptor: TYPE = MACHINE DEPENDENT RECORD [

r1, r2, r3: DragonTooth.Bit ← DragonTooth.Zero,

Not currently used (reserved) but must be set to 0.

insert: BOOL ← FALSE,

The choice of background and low bits of mask.

width: DragonTooth.FieldWidth ← LAST[DragonTooth.FieldWidth],

When insert is FALSE (the easy case), this is the width of the field being extracted.

When insert is TRUE (the hard case), this is the width of the field plus the shift amount.

Note that width should always be greater than the shift amount to assure that extraction takes place from the left Word. Neither the code nor the hardware implementation enforce this constraint.

shift: DragonTooth.ShiftIndex ← FIRST[DragonTooth.ShiftIndex]

the number of bits to left-shift the concatenation of left and right.

];

Operate: PROCEDURE [left, right: DragonTooth.Word, fieldOp: FieldDescriptor] RETURNS [output: DragonTooth.Word];

There are two cases,

either

double-word-left-shifting left and right and returning some right-justified field from the result. This case is also useful for single-word-left-shifting (ZeroesWord as right), rotating (use the same value for left and right), and right-shifting (ZeroesWord as left, and left-shift by BitsPerWord minus the desired right shift).

or

extracting a right-justified field from left and inserting it into the middle of right.

}.

Here are definitions for procedures called by the FieldUnit program:

DragonToothOps: CEDAR DEFINITIONS = {

Operations on Dragon bits.

BitWiseAnd: PROCEDURE [this, that: DragonTooth.Word] RETURNS [DragonTooth.Word];

Form the logical AND of the bits in the two words.

BitWiseOr: PROCEDURE [this, that: DragonTooth.Word] RETURNS [DragonTooth.Word];

Form the logical OR of the bits in the two words.

BitWiseNot: PROCEDURE [this: DragonTooth.Word] RETURNS [DragonTooth.Word];

Form the logical NOT of the bits in the word.

BitWiseMultiplex: PROCEDURE [ifZero, ifOne, selector: DragonTooth.Word] RETURNS [DragonTooth.Word];

Use the bits of selector to choose the corresponding bits from either ifZero or ifOne.

DoubleWordShiftLeft: PROCEDURE [left, right: DragonTooth.Word, shiftAmount: DragonTooth.ShiftIndex] RETURNS [DragonTooth.Word];

Shifts two words left by shiftAmount and returns the resulting leftmost word.

}.

Here is the implementation of the FieldUnit program:

DIRECTORY

FieldUnit,

DragonTooth USING [Word, ZeroesWord, OnesWord],

DragonToothOps USING [BitWiseAnd, BitWiseMultiplex, DoubleWordShiftLeft];

FieldUnitImpl: CEDAR PROGRAM

IMPORTS DragonToothOps

EXPORTS FieldUnit = {

Operate: PUBLIC PROCEDURE [left, right: DragonTooth.Word, fieldOp: FieldUnit.FieldDescriptor] RETURNS [output: DragonTooth.Word] = {

There are two cases,

either

double-word-left-shifting left and right and returning some right-justified field from the result. This case is also useful for single-word-left-shifting (ZeroesWord as right), rotating (use the same value for left and right), and right-shifting (ZeroesWord as left, and left-shift by BitsPerWord minus the desired right shift).

or

extracting a right-justified field from left and inserting it into the middle of right.

widthMask: DragonTooth.Word ← DragonToothOps.DoubleWordShiftLeft[left: DragonTooth.ZeroesWord, right: DragonTooth.OnesWord, shiftAmount: fieldOp.width];

A mask for a right-justified field of length fieldOp.width.

shifted: DragonTooth.Word ← DragonToothOps.DoubleWordShiftLeft[left: left, right: right, shiftAmount: fieldOp.shift];

shift the contents of the field into position.

IF fieldOp.insert

THEN {

This is the hard case, where a right-justified field is extracted from left and inserted into the middle of right. In this case the operands aren't named very well.

`left' holds the right-justified source bits to be inserted,

and so `shifted' holds the source bits shifted into alignment with the destination bits.

`right' holds the word into which the field is inserted,

`widthMask' holds a mask for the field plus `shift' extra bits on the right.

`shiftMask' (defined below) will be used to mask off those extra bits.

The mask for the field is constructed and then used to extract the source bits from `shifted' and to extract the bits around the destination from `right'.

shiftMask: DragonTooth.Word ← DragonToothOps.DoubleWordShiftLeft[left: DragonTooth.OnesWord, right: DragonTooth.ZeroesWord, shiftAmount: fieldOp.shift];

fieldMask: DragonTooth.Word ← DragonToothOps.BitWiseAnd[widthMask, shiftMask];

note that if fieldOp.shift e fieldOp.width then fieldMask is all 0's.

output ← DragonToothOps.BitWiseMultiplex[ifZero: right, ifOne: shifted, selector: fieldMask];

the source bits from `shifted' and the surrounding bits of `right'.

}

ELSE {

The easy case, extract a right-justified field from the shifted Words.

output ← DragonToothOps.BitWiseAnd[shifted, widthMask];

};

};

}.

2.3 Instruction formats

There are eleven instruction formats. They vary in length from 1 to 5 bytes and specify operations on from 0 to 3 operands. Nine of the formats have an opcode occupying 8 bits. Two formats have a 4-bit opcode followed by a 4-bit specification for an operand. The operands may be specified implicitly or explicitly; they may be registers or be determined from an index register and an offset. In all cases the way in which the operands are specified is determined implicitly from the opcode.

Here are brief definitions of terms and abbreviations used in describing the instruction formats and instructions:

Implicit (abbreviated I) — The location of the operands are implicit in the opcode of the instruction. Some of the instructions specify some operands implicitly and some operands explicitly. The term implicit is used to describe the format for a particular instruction if and only if no operands are designated explicitly.

Literal (abbreviated L)— The instruction contains the operand itself, rather than an address or other information describing where the operand is. A frequently-used synonym for the term literal is immediate.

Register (abbreviated R) — The address fields of the instruction specify register operands.

Indexed Register (abbreviated X) — Indexed-Register instructions have an operand whose address is the sum of an offset and the contents of a register.

Mem — memory word

Mem[addr] — the contents of that register or memory at address addr

Aux — auxiliary register

Const — constants register

ProcReg — processor register

ProcBus — processor bus

FieldDesc — field unit descriptor

FieldOp — field operation

FP — floating point

Offset — offset indicates a non-negative byte displacement.

Displacement — displacement indicates a signed byte displacement.

sExt[x] — The signed magnitude of x, a 2's complement number, is extended to the width of the destination.

zExt[x] — The unsigned (positive) magnitude of x is extended of the width of the destination.

m — an integer in the range [0..8)

n — an integer in the range [0..8)

Implicit Operand Specification:
I Implicit

[Artwork node; type 'ArtworkInterpress on' to command tool]

This format is used to perform stack operations. The operand (if any) is implicitly specified by in the opcode.

Example: Add. ADD: Stack[S-1] ← Stack[S] + Stack[S-1] + Carry; Carry ← 0; S ← S — 1;

Literal Operand Specification:
LB Literal Byte

[Artwork node; type 'ArtworkInterpress on' to command tool]

For LB instructions the literal byte following the opcode is used in 1 of 3 ways. It is zero-extended to 32 bits for stack operations: operand ← zExt[literal]. It is used as a 32-bit signed displacement when computing a new PC: operand ← sExt[literal]. And, it is used to calculate a displacement from S or L: operand ← literal. In calculating displacement from S or L, all arithmetic is performed modulo the size of the EU Stack.

Example: Add Byte. ADDB: Stack[S] ← Stack[S] + zExt[literal] + Carry; Carry ← 0.

LH Literal Halfword

[Artwork node; type 'ArtworkInterpress on' to command tool]

For LH instructions the literal halfword following the opcode is used in 1 of 3 ways. It is zero-extended to 32 bits for stack operations: operand ← zExt[literal]. It is used as a 32-bit signed displacement when computing a new PC: operand ← sExt[literal]. The low-order 13 bits are used as a descriptor for Field Unit operations: operand ← Low13Bits[literal].

Example: Load Immediate Double Byte. Stack[S+1] ← zExt[literal]; S ← S + 1.

LW Literal Word

[Artwork node; type 'ArtworkInterpress on' to command tool]

For LW instructions the operand is the 32-bit literal quantity following the opcode, operand ← literal.

Example: Load Immediate Quad Byte. LIQB: Stack[S+1] ← literal; S ← S + 1.

LBD Literal Byte Displacement

[Artwork node; type 'ArtworkInterpress on' to command tool]

LBD instructions have two operands. The first operand is a literal used for comparison with the top of the stack, operand1 ← zExt[literal]. The second operand is a signed displacement, operand2 ← sExt[displacement], used in computing the new PC, PC ← PC+operand2.

Example: Jump Equal Byte Byte. JEBB: If Stack[S] = zExt[literal] then PC ← PC + sExt[displacement]; S ← S — 1.

Register Operand Specification:

The Registers to Register, Quick Register and Register Displacement formats take operands from registers. The Registers to Register format selects a destination operand (Rc) and 2 source operands (Ra and Rb). The Quick Register format is a tighter form of the Registers to Register format; it provides the same decoding as the Registers to Register format for the Rb operand, but limits the possibilities for Ra and Rc to 4 different pairs of locations. The Register Displacement format provides the same decoding as the Registers to Register format for the Rb operand and gives 4 possibilities for the second operand, Rs. (Rs stands for short source register.)

The algorithms used to select Rc, Ra and Rb, and Rs are given here by the CEDAR program, OperandSpeciferImpl.

Here are the definitions used by the CEDAR implementation.

OperandSpecifier: CEDAR DEFINITIONS = {

Here are the type definitions used in accessing Locals[], AuxRegs[] and Constants[]:

AuxiliaryRegisterIndex: TYPE = CARDINAL [0..15];

LocalRegisterIndex: TYPE = CARDINAL [0..15];

ConstantRegisterIndex: TYPE = CARDINAL [0..11];

ShortConstantIndex: TYPE = ConstantRegisterIndex [0..1];

Some types for strong type checking.

SourceDeltaS: TYPE = INTEGER [-1..0];

DestinationDeltaS: TYPE = INTEGER [0..+1];

Here are the abstract locations and specifiers for source operands (Ra and Rb):

SourceLocation: TYPE = {AuxReg, Local, Constant, Top, Under};

SourceSpecifier: TYPE = RECORD [

SELECT location: SourceLocation FROM

AuxReg => [aux: AuxiliaryRegisterIndex],

Local => [local: LocalRegisterIndex],

Constant => [constant: ConstantRegisterIndex],

Top => [deltaS: SourceDeltaS],

Under => [deltaS: SourceDeltaS],

ENDCASE

];

Here are the abstract locations and specifiers for destination operands (Rc):

DestinationLocation: TYPE = {AuxReg, Local, Constant, Top, Under, Push};

DestinationSpecifier: TYPE = RECORD [

SELECT location: DestinationLocation FROM

AuxReg => [aux: AuxiliaryRegisterIndex],

Local => [local: LocalRegisterIndex],

Constant2 => [constant: ConstantRegisterIndex],

Top => [-- deltaS: 0 --],

Under => [-- deltaS: 0 --],

Push => [-- deltaS: 1 --],

ENDCASE

];

Here are the abstract locations and specifiers for shortCASpecifier operands (Rc and Ra):

ShortCASpecifier: TYPE = RECORD [

location: ShortCASelector

];

These are the supporting definitions that reflect the mappings between bit patterns and names

SourceSelector: TYPE = MACHINE DEPENDENT {

Constant0 (0), Constant1, Constant2, Constant3,

Constant4 (4), Constant5, Constant6, Constant7,

Constant8 (8), Constant9, Constant10, Constant11,

Top (12), Under, PopTop, PopUnder (15)

};

DestinationSelector: TYPE = MACHINE DEPENDENT {

Constant0 (0), Constant1, Constant2, Constant3,

Constant4 (4), Constant5, Constant6, Constant7,

Constant8 (8), Constant9, Constant10, Constant11,

Top (12), Under, Push, Reserved (15)

};

ShortCASelector: TYPE = MACHINE DEPENDENT {

TopAtop(0), PushAtop, PushA0, PushA1 (3)

};

ShortSourceSelector: TYPE = MACHINE DEPENDENT {

Constant0 (0), Constant1, Top, PopTop (3)

};

Translations from bit patterns to operand specifiers

SourceOperand: PUBLIC PROCEDURE [auxFlag: BOOL, operandFlag: BOOL, operandSelector: SourceSelector] RETURNS [SourceSpecifier];

DestinationOperand: PUBLIC PROCEDURE [auxFlag: BOOL, operandFlag: BOOL, operandSelector: DestinationSelector] RETURNS [DestinationSpecifier];

ShortCAOperand: PUBLIC PROCEDURE [operandSelector: OperandSpecifier.ShortCASelector] RETURNS [C: OperandSpecifier.DestinationSpecifier, A: OperandSpecifier.SourceSpecifier];

ShortSourceOperand: PUBLIC PROCEDURE [operandSelector: ShortSourceSelector] RETURNS [SourceSpecifier];

}.

Here is the code that actually selects the registers to be used for destination operands (Rc), source operands (Ra and Rb) and short source operands (Rs):

DIRECTORY

OperandSpecifier;

OperandSpecifierImpl: CEDAR PROGRAM

EXPORTS OperandSpecifier = {

SourceOperand: PUBLIC PROCEDURE [auxFlag: BOOL, operandFlag: BOOL, operandSelector: OperandSpecifier.SourceSelector] RETURNS [OperandSpecifier.SourceSpecifier] = {

IF operandFlag

THEN

IF auxFlag

THEN RETURN [[AuxReg[aux: ORD[operandSelector]]]]

ELSE RETURN [[Local[local: ORD[operandSelector]]]]

ELSE

SELECT operandSelector FROM

IN [Constant0..Constant11] =>

RETURN [[Constant[constant: ORD[operandSelector]]]];

Top =>

RETURN [[Top[deltaS: 0]]];

Under =>

RETURN [[Under[deltaS: 0]]];

PopTop =>

RETURN [[Top[deltaS: -1]]];

PopUnder =>

RETURN [[Under[deltaS: -1]]];

ENDCASE =>

ERROR;

};

DestinationOperand: PUBLIC PROCEDURE [auxFlag: BOOL, operandFlag: BOOL, operandSelector: OperandSpecifier.DestinationSelector] RETURNS [OperandSpecifier.DestinationSpecifier] = {

IF operandFlag

THEN

IF auxFlag

THEN RETURN [[AuxReg[aux: ORD[operandSelector]]]]

ELSE RETURN [[Local[local: ORD[operandSelector]]]]

ELSE

SELECT operandSelector FROM

IN [Constant0..Constant11] =>

RETURN [[Constant[constant: ORD[operandSelector]]]];

Top =>

RETURN [[Top[-- deltaS: 0 --]]];

Under =>

RETURN [[Under[-- deltaS: 0 --]]];

Push =>

RETURN [[Push[-- deltaS: +1 --]]];

Reserved =>

ERROR;

ENDCASE =>

ERROR;

};

Short encoding for C and A operands of the QR format instructions. The B operand is always a general operand.

ShortCAOperand: PUBLIC PROCEDURE [operandSelector: OperandSpecifier.ShortCASelector] RETURNS [C: OperandSpecifier.DestinationSpecifier, A: OperandSpecifier.SourceSpecifier] = {

SELECT operandSelector FROM

TopAtop =>

RETURN [[Top[--deltaS: 0--]], [Top[deltaS: 0]]];

PushAtop =>

RETURN [[Push[--deltaS: 1--]], [Top[deltaS: 0]]];

PushA0 =>

RETURN [[Push[--deltaS: 1--]], [Constant[constant: 0]]];

PushA1 =>

RETURN [[Push[--deltaS: 1--]], [Constant[constant: 1]]];

ENDCASE =>

ERROR;

};

ShortSourceOperand: PUBLIC PROCEDURE [operandSelector: OperandSpecifier.ShortSourceSelector] RETURNS [OperandSpecifier.SourceSpecifier] = {

SELECT operandSelector FROM

Constant0 =>

RETURN [[Constant[constant: 0]]];

Constant1 =>

RETURN [[Constant[constant: 1]]];

Top =>

RETURN [[Top[deltaS: 0]]];

PopTop =>

RETURN [[Top[deltaS: -1]]];

ENDCASE =>

ERROR;

};

}.

RRR Registers to Register

[Artwork node; type 'ArtworkInterpress on' to command tool]

Instructions in the RRR format perform an operation with the contents of 2 registers and store the result in a third register, Rc ← Ra op Rb. F[] contains four boolean flags that determine which registers the three operand specifiers, a, b, and c, refer to.

Using the code for operand specifiers given above, Ra, Rb and Rc are specified by the following procedure calls:

Ra: OperandSpecifier.SourceSpecifier ← OperandSpecifier.SourceOperand[auxFlag: F[4], operandFlag: F[1], operandSelector: a];

Rb: OperandSpecifier.SourceSpecifier ← OperandSpecifier.SourceOperand[[auxFlag: F[4], operandFlag: F[3], operandSelector: b];

Rc: OperandSpecifier.DestinationSpecifier ← OperandSpecifier.DestinationOperand: [auxFlag: F[4], operandFlag: F[2], operandSelector: c];

Example: Register OR. ROR: Rc ← Ra or Rb.

Here are three of the most significant uses of the RRR instruction. For a complete listing of the important uses of the RRR instruction, see Appendix xx.

Rb Rc Ra

[Artwork node; type 'ArtworkInterpress on' to command tool]

[Artwork node; type 'ArtworkInterpress on' to command tool]

This instruction is an optimization of DUP followed by an operation.

It saves a memory reference.

[Artwork node; type 'ArtworkInterpress on' to command tool]

QR Quick Register

[Artwork node; type 'ArtworkInterpress on' to command tool]

The QR format is a tighter encoding of the RRR format. Rb is specified in the same manner that it is specified in the RRR format. Ra and Rc can take on 4 different pairs of values:

[Artwork node; type 'ArtworkInterpress on' to command tool]

Instructions in the QR format take less space than instructions in the RRR format; consequently, more of them can fit into the IFU cache. QR instructions will execute faster than RRR instructions and should be used when possible.

Using the code for operand specifiers given above, Ra, Rb and Rc are specified by the following procedure calls:

Ra & Rc: OperandSpecifier.ShortCASpecifier ← OperandSpecifier.ShortCAOperand[operandSelector: ShortCASelector];

Rb: OperandSpecifier.SourceSpecifier ← OperandSpecifier.SourceOperand[[auxFlag: F[2], operandFlag: F[1], operandSelector: b];

Example: Quick Lisp SUBtract. QLSUB: Stack[S] ← Stack[S] — Rb. Carry ← 0.

RD Register Displacement

[Artwork node; type 'ArtworkInterpress on' to command tool]

This format provides access to 3 operands, Rs (short source register), Rb, and a displacement. Instructions using this format compare Rs and Rb and jump to the PC given by PC+sExt[displacement] if the indicated comparision of Rs with Rb is true.

Using the code for operand specifiers given above, Rs and Rb are specified by the following procedure calls:

Rs: OperandSpecifierShort.SourceSpecifier ← OperandSpecifier.ShortSourceOperand[operandSelector: Rs];

Rb: OperandSpecifier.SourceSpecifier ← OperandSpecifier.SourceOperand[[auxFlag: F[2], operandFlag: F[1], operandSelector: b];

Example: Register Jump Equal Byte. RJEB: If Rs = Rb then PC ← PC + sExt[displacement].

LR Locals Register

[Artwork node; type 'ArtworkInterpress on' to command tool]

LR instructions have a 4-bit opcode. The next 4 bits are an index into Locals and specify the operand, operand = Locals[n].

Example: Store Register n. SRn: Locals[n] ← Stack[S]; S ← S — 1. { :n| 0 <= n < 16}.

Indexed Register Operand Specification:

XO Index Register Offset

[Artwork node; type 'ArtworkInterpress on' to command tool]

This format provides access to an operand at a given offset from the index register, Locals[n]. The operand is Mem[Locals[n]+ zExt[offset]].

Example: SRIn: Stack[S+1] ←Mem[Locals[n] + zExt[offset]]; S ← S + 1. {:n| 0 <= n < 16}

XRO Index-Register Register-Offset

[Artwork node; type 'ArtworkInterpress on' to command tool]

This format addresses 2 operands. The first operand specifies a Locals register, Locals[regA]. The second operand is used in 2 ways: it can be used to specify a memory location that is the sum of a Locals register and an offset, Mem[Locals[regB] + zExt[offset]], or it can indicate a memory location that is the sum of a register from AuxRegs and an offset, Mem[AuxRegs[regB] + zExt[offset].

Example: WRI: Mem[Locals[regB]+ zExt[offset]] ← Locals[regA].

2.4 Traps, Faults, and Checks

Conditions which result in an interruption of the normal instruction flow are sometimes called "traps," "faults," or "checks"; such conditions are all called Traps here and handled in the same way. In addition, the KFC (Kernel Function Call) instruction shares some properties of Traps. A trap sends control to location 4,002,000B + 20B * Trap No., where the trap no. is chosen according to the table below:

Trap Name Priority Trap No

Reset 1 07B location for control when RESET is asserted
IFU stack ovf 2 10B IFU stack has 12 frames
EU page fault 3 41B DPbus indicates EU Cache page fault
EU write fault 3 42B DPbus indicates EU Cache write protect fault
AU fault ? 43B DPbus indicates an AU fault
Address check 4 24B store into address < 100000000B from user mode
Integer overflow 4 30B integer result out-of-range
Bounds check 4 31B RBC, QBC, or BC opcode trap
Lisp ovf or NaN 4 32B Lisp arithmetic argument or result out-of-range
Kernel instr ? 37B kernel instruction attempted in user mode
EU stack ovf 5 11B called when, after an opcode, traps are enabled
and S is in the range [SLimit..SLimit+16).
IFU page fault 6 01B IFU page fault
Reschedule 7 12B called when RESCHEDULE is acknowledged

All Traps enter kernal mode, disable traps, and push the old IFU Status onto the EU stack. Instructions which result in traps have no effect on the machine state; it is as though the instruction which caused the trap were not executed. The only traps that can be disabled are Reschedule, EU stack overflow, and IFU stack overflow.

The KFC instruction is like an Xop in that it traps to a location which is 4,000,000B + 20B * opcode, and it advances the PC like an Xop, so that a procedure return will send control to the location after the KFC; however, KFC pushes the old IFU status and disables traps like a Trap.

There is no IFU stack underflow trap; the first location in the IFU call stack must be made to contain a return link to an appropriate trap procedure, thereby preventing underflow. When traps are enabled, the IFU stack then has 10 entries for normal calls and converts the 11th call (which uses the 12th call stack entry) into an IFU stack overflow trap. When traps are disabled, all 15 call stack entries are available.

Another way of dealing with stack underflow is to make the first location in the IFU call stack point to a special unmapped address, so that returning there causes an IFU page fault. This immediately puts the machine in Kernel mode with traps disabled, and it leaves the special unmapped address on the call stack. Then the IFU page fault trap procedure distinguishes this event from an ordinary code page fault according to the trap address.

A deficiency in the Dragon design is that, when another trap or KFC uses the 12th call stack entry, IFU stack overflow will not happen; if, for example, the KFC trap immediately reenables traps and does a procedure call, then the 13th call will cause IFU stack overflow. In this case, the IFU stack overflow handler has only two spare call stack entries and a nested page fault trap may be possible. Consequently, spare call stack entries have to be carefully husbanded.

Any instruction will cause an EU stack overflow trap if, at the end of the instruction, traps are enabled and S is in the range [SLimit..SLimit+16). This trap detects both underflow and overflow conditions, though underflow would represent a programming error.

EU Stack Overflow would ordinarily occur as the result of an instruction which advanced S into the restricted region, but could also occur as a result of reenabling traps with S already inside the restricted region. However, the IFU stack overflow trap only occurs at the onset of instructions that do calls, LFC, DFC, SFC, SFCI, and Xops. This means that if the IFU call stack depth were in the restricted region (12 to 15) and traps were reenabled, no IFU stack overflow trap would occur until the next call. However, there should never be a normally occurring situation in which traps are enabled when either the EU or IFU stack overflow condition exists at the same time.

If pushing IFU Status onto the EU stack at the onset of a Trap results in EU stack overflow, the trap will complete before the EU stack overflow occurs; this is why EU stack overflow is of lower priority than other EU conditions. (Nor will the EU stack overflow occur immediately after the trap instruction because then it is disabled.)

An IFU page fault can never occur in conjunction with EU stack overflow or any of the EU trap conditions, so its relative priority with respect to these other conditions is arbitrary. Since Reschedule is the lowest priority trap, and since it is disabled at the onset of any other trap, it will never nest with the other traps.

2.4.1 User Mode

User mode can be turned on by using SIP[IFUStatus] or RETK to change IFUStatus. Once in user mode, kernel mode can be reentered by executing a KFC instruction or by trapping. In User mode the following actions are illegal:

IO instructions: IOL, ION, IOS

The first byte after the opcode contains a user-mode-illegal bit. If this bit is set, the IO operation will cause a kernel instruction trap when an IO instruction is executed in user mode.

SIP instruction

The SIP instruction causes a kernel instruction trap when executed in user mode.

LIP[IFUEldestPC]

This instruction modifies the IFU state as well as reading the PC. It causes a kernel instruction trap when executed in user mode.

Writes into the 12 constants or the first 8 auxiliary registers

These writes (accomplished using instructions in the RRR format) will cause a kernel instruction trap when executed in user mode.

Writes to an address <100000000B

Such writes will cause an address check trap when executed in user mode.

.

2.4.2 Undefined Instructions (Xops)

The instruction length for every instruction is determined from a computation on the high-order three bits of its opcode. Those opcodes not specifically defined as machine instructions are called Xops. Executing an Xop is semantically equivalent to a function call (DFC or LFC) to its trap location. Each instruction has its own trap location at 4,000,000B + 20B * opcode no. In addition, Xops of length two, three or five bytes push the literal following their opcodes, so two-byte Xops push a one-byte literal value, three-byte Xops a two-byte literal value, and five-byte Xops a four-byte literal value. Execution of an Xop does not change the status of the machine to Kernel mode.

2.4.3 Undefined User Operations

There is a second category of undefined opcodes which must not be executed. These opcodes, six in all, have undefined effects. The execution of these six opcodes has been forbidden in order to simplify the amount of logic required to implement the instruction set. More specifically, these opcodes will do something defined and possibly even useful, but those opcodes may be reused for something else on future versions of the machine. These opcodes are indicated in the instruction map by three asterisks, (***).

2.5 Arithmetic Unit (AU)

The arithmetic unit will be interfaced to the EU P bus and controlled by IO operations. It has not been specified, but the proposal is for it to provide IEEE standard 32-bit and 64-bit floating point arithmetic, 32-bit single-precision integer (signed) multiply and divide, and multi-precision integer multiply and divide.