[Indigo]<Dragon>Doc>DragOps.tioga!28

Dragon Opcodes

To DragonCore^.pa Date August 23, 1984

From Russ Atkinson Location PARC

Subject Dragon Opcodes Organization CSL

XEROX

Release as [Indigo]<Dragon>Doc>DragOps.tioga
Last edited by Russ Atkinson, August 23, 1984 9:51:57 am PDT

Abstract This document describes the Dragon instruction set as seen by the machine code programmer and compiler builder.

This document is a DRAFT, and most of the information in it is volatile. This document is also INCOMPLETE, and should be viewed with some suspicion. Revisions are currently being sent through Russ Atkinson.

Introduction

The Dragon is a multi-processor general purpose computer currently under development in the Computer Science Laboratory at Xerox PARC. Most of the parts will be fabricated in custom CMOS. The Dragon will have a shared address space of 2^32 32-bit words. It should have very high performance at relatively low cost.

Each Dragon processor has two logical parts, the IFU (instruction fetch unit) and the EU (execution unit). The IFU fetches instructions and directs the EU to perform arithmetic, logical, shifting, fetching, and storing operations. There are also two caches on memory, one for the IFU and one for the EU.

A Dragon word is 32 bits wide. In this document we follow the Mesa convention of numbering the bits from left to right, so that the most significant bit has index 0, and the least significant bit has index 31. This corresponds to the Mesa declaration:

Word: TYPE = PACKED ARRAY [0..32) OF BOOL;
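
As a rough illustration of this bit numbering (a Python sketch, not part of the Dragon specification; the helper names get_bit and to_bool_array are hypothetical), bit index 0 selects the most significant bit of a 32-bit word:

```python
WORD_BITS = 32  # a Dragon word is 32 bits wide

def get_bit(word: int, index: int) -> int:
    """Return bit `index` of a word, where index 0 is the most significant bit."""
    if not 0 <= index < WORD_BITS:
        raise ValueError("bit index must be in [0..32)")
    return (word >> (WORD_BITS - 1 - index)) & 1

def to_bool_array(word: int) -> list:
    """Unpack a word in the order that PACKED ARRAY [0..32) OF BOOL indexes it."""
    return [bool(get_bit(word, i)) for i in range(WORD_BITS)]
```

With this convention, get_bit(word, 31) is the low-order bit that most other machines would call bit 0.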

The remainder of this document assumes some familiarity with the Mesa language, since we use it to describe data formats and the effect of instructions. There are some changes associated with putting Mesa on Dragon, detailed in [Indigo]<Dragon>Doc>DragonMesaChanges.tioga.

EU state

The EU performs various operations on local registers, and communicates with the outside world through a cache attached to a bus (the M-bus). The EU has a 32-bit ALU, a Field Unit (FU) for shifting and other field operations, and a multiplier unit.

The EU has a set of registers that contain the most recent elements of the data stack for a process. There are also registers that contain constants, as well as special purpose quantities. This architecture permits most elementary operations involving local variables to be performed in 1 EU cycle. However, this architecture requires special attention to migrating the contents of a process stack between registers and memory.

The EU has the following registers:

Stack denotes the stack registers (ARRAY [0..StackSize) OF Word)

StackSize = 128 in current version.

Locals denotes the local registers (Locals[x] == Stack[(x+L) MOD StackSize])

note that Locals is aliased with Stack.
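
The aliasing of Locals with Stack can be sketched as follows. This is an illustrative Python model only; the class name EUStack and its methods are assumptions made for the example, and only the formula Locals[x] == Stack[(x+L) MOD StackSize] comes from the text above.

```python
STACK_SIZE = 128  # StackSize in the current version

class EUStack:
    """Toy model of the EU stack registers with a movable local base L."""

    def __init__(self):
        self.stack = [0] * STACK_SIZE
        self.L = 0  # base of the local frame (an index into stack)

    def local_index(self, x: int) -> int:
        # Locals[x] == Stack[(x+L) MOD StackSize]
        return (x + self.L) % STACK_SIZE

    def read_local(self, x: int) -> int:
        return self.stack[self.local_index(x)]

    def write_local(self, x: int, value: int) -> None:
        self.stack[self.local_index(x)] = value & 0xFFFFFFFF
```

Because the index wraps modulo StackSize, a local frame that starts near the top of the register array continues at index 0.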

AuxRegs denotes the auxiliary registers (ARRAY [0..15] OF Word). Most of these registers will be used for runtime support for higher-level languages (e.g. Mesa).

Constants is an array of registers in the EU (ARRAY [0..11] OF Word). Although these registers are not really constant as far as the hardware is concerned, they are used to hold constants in the Mesa runtime. They are more general than the AuxRegs in that they can be used in more addressing modes.

Field denotes a special register that can be used to control the field unit operations, and also participates in multiplication and division.

MAR (Memory Address Register) denotes the register which holds the address given in the last memory operation. It is used to report EU page faults.

IFU state

The IFU turns a stream of variable-length instructions into signals that control the EU. It also performs address calculations used for jumps, calls and returns.

The IFU contains a small stack of the most recent procedure contexts. A procedure context for the IFU is simply a program counter (PC) and an index into the EU stack giving the base of the local variables.

The IFU has the following registers (actually more than these, but for simplicity):

PC is the program counter (a byte address). Instruction space is limited to 2^32 bytes (roughly 4 gigabytes), even though the full Dragon address space is 2^32 words.

S is the stack pointer (an index into Stack). Many instructions take operands implicitly using the stack. Procedure calls use the stack for arguments and returns.

L gives the base of the local frame (an index into Stack). The first 16 registers at or above L are easily addressable through Dragon instructions.

SLimit gives the limit for S. A stack overflow occurs when S is incremented to reach SLimit.

Instruction formats

This section describes the various instruction formats. Note that the description [0..255] specifies an unsigned number occupying 8 bits, while [-128..127] specifies a signed number occupying 8 bits. The description of the instruction format gives tentative bit assignments, followed by a list of the instructions using the format (in {} brackets), followed by a rough description of how the format is interpreted.

Arithmetic involving S or L will always be performed modulo the size of the EU stack, without detection of underflow or overflow. We will not further indicate this limitation of precision.

For convenience, we use the following abbreviations to refer to numbers obtained from bytes that follow the opcode byte:

Alpha is the first byte after the opcode. AlphaZ is Alpha extended to 32 bits with high-order 0s. AlphaS is Alpha extended to 32 bits with the sign (high-order) bit of Alpha.

Beta is the second byte after the opcode. BetaZ is Beta extended with 0s. When taken as a pair of four bit numbers, BetaL is the leftmost half of Beta, and BetaR is the rightmost half.

Gamma is the third byte after the opcode. Delta is the fourth byte after the opcode.

AlphaBeta is the unsigned unaligned halfword following the opcode byte, interpreted as Alpha + Beta*256. AlphaBetaZ is AlphaBeta extended to 32 bits with 0s. AlphaBetaS is AlphaBeta extended to 32 bits with the sign (high-order) bit of AlphaBeta.

AlphaBetaGammaDelta is the 32-bit unsigned unaligned quantity following the opcode byte. The number is interpreted as AlphaBeta + 256*256*(Gamma + Delta*256).
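
The abbreviations above can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, code is taken to be a bytes object, pc is the index of the opcode byte, and sign-extended values are returned as signed Python integers rather than as 32-bit patterns.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def alpha_z(code: bytes, pc: int) -> int:
    return code[pc + 1]  # first byte after the opcode, zero-extended

def alpha_s(code: bytes, pc: int) -> int:
    return sign_extend(code[pc + 1], 8)

def alpha_beta(code: bytes, pc: int) -> int:
    # Alpha is the low-order byte: Alpha + Beta*256
    return code[pc + 1] + code[pc + 2] * 256

def alpha_beta_s(code: bytes, pc: int) -> int:
    return sign_extend(alpha_beta(code, pc), 16)

def alpha_beta_gamma_delta(code: bytes, pc: int) -> int:
    gamma, delta = code[pc + 3], code[pc + 4]
    return alpha_beta(code, pc) + 256 * 256 * (gamma + delta * 256)
```

For the byte sequence 10 FF 01 02 03 (hex) with the opcode at index 0, AlphaZ is 255, AlphaS is -1, AlphaBeta is 01FF (hex), and AlphaBetaGammaDelta is 030201FF (hex).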

OI - Operation Implicit

[op: [0..255]] -- 1 byte

{ADD, BNDCK, DIS, DUP, EXCH, EXDIS, J1, LCn, MDIV, MUL, RDIV, RETN, RETT, SFC, SFCI, SJ, SUB, UDIV, UMUL}

For OI instructions the operand (if any) is implicit in the opcode.

OB - Operation Byte

[op: [0..255], lit: [0..255]] -- 2 bytes

{ADDB, AL, AS, ASL, CST, EP, IN, J2, JB, LEUR, LIB, LIFUR, OUT, PSB, RB, RET, RSB, SEUR, SIFUR, SUBB, WB, WSB}

For OB instructions the operand is given by AlphaZ or AlphaS.

ODB - Operation Double Byte

[op: [0..255], lit: [0..65535]] -- 3 bytes

{ADDDB, FSDB, J3, JDB, LFC, LGF, LIDB, SHL, SHR}

For ODB instructions the operand is given by AlphaBetaZ or AlphaBetaS.

OQB - Operation Quad Byte

[op: [0..255], addr: Word] -- 5 bytes

{DFC, DJ, J5, LIQB}

For OQB instructions the operand is AlphaBetaGammaDelta.

JBB - Jump Byte Byte

[op: [0..255], dist: [-128..127], lit: [0..255]] -- 3 bytes

{JEBB, JEBBJ, JNEBB, JNEBBJ}

For JBB instructions the new PC is the byte address given by PC+AlphaS. BetaZ is used for comparison with the top of stack.

LR - Local Register

[op: [0..15], reg: [0..15]] -- 1 byte

{LRn, SRn}

For LR instructions the operand is Locals[reg], which is either pushed to or popped from the stack.

LRB - Local Register Byte

[op: [0..15], reg: [0..15], disp: [0..255]] -- 2 bytes

{LRIn, SRIn}

For these instructions, the operand is (Locals[reg]+AlphaZ)^, which is either pushed to or popped from the stack.

LRRB - Local Register Register Byte

[op: [0..255], disp: [0..255], reg1,reg2: [0..15]] -- 3 bytes

{RAI, RRI, WAI, WRI}

For all instructions, reg1 (BetaL) indicates a local register. For RRI and WRI, reg2 (BetaR) indicates a local register as well. For RAI and WAI, reg2 indicates an auxiliary register.

RR - Register to Register

[op: [0..255], c,a: [0..15], aOpt,cOpt,bOpt,aux: BOOL, b: [0..15]] -- 3 bytes

{RADD, RAND, RBC, RFU, ROR, RRX, RSUB, RUADD, RUSUB, RVADD, RVSUB, RXOR}

For these instructions the effect is roughly Rc←Ra op Rb. SE is used to indicate the effect of the instruction on S. SE←0 at the start of the instruction, and S←S+SE at the end of the instruction.

Ra: IF NOT aOpt

THEN IF aux THEN AuxRegs[a] ELSE Locals[a]

ELSE SELECT a FROM

< 12 => Constants[a]

12 => Stack[S];

13 => Stack[S-1];

14 => Stack[S]; SE←SE-1

15 => Stack[S-1]; SE←SE-1

Rb: IF NOT bOpt

THEN IF aux THEN AuxRegs[b] ELSE Locals[b]

ELSE SELECT b FROM

< 12 => Constants[b]

12 => Stack[S];

13 => Stack[S-1];

14 => Stack[S]; SE←SE-1

15 => Stack[S-1]; SE←SE-1

Rc: IF NOT cOpt

THEN IF aux THEN AuxRegs[c] ELSE Locals[c]

ELSE SELECT c FROM

< 12 => Constants[c]

12 => Stack[S];

13 => Stack[S-1];

14 => Stack[S+1]; SE←SE+1

15 => Stack[S+1]; SE←SE+1
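
The operand selection above can be modelled with the following Python sketch. It is an illustration only: the class EU and the functions source, destination, and rr_execute are hypothetical names, the decoded fields are passed in directly rather than extracted from Beta and Gamma, and every stack reference is evaluated against the value of S at the start of the instruction, which is an assumption since the text describes the effect only roughly.

```python
STACK_SIZE = 128

class EU:
    def __init__(self):
        self.stack = [0] * STACK_SIZE
        self.aux = [0] * 16
        self.constants = [0] * 12
        self.L = 0
        self.S = 0

def source(eu, reg, opt, aux):
    """Return (value, stack_effect) for an Ra or Rb operand."""
    if not opt:
        if aux:
            return eu.aux[reg], 0
        return eu.stack[(eu.L + reg) % STACK_SIZE], 0
    if reg < 12:
        return eu.constants[reg], 0
    index = eu.S if reg in (12, 14) else eu.S - 1  # 12, 14: [S]; 13, 15: [S-1]
    effect = -1 if reg >= 14 else 0                # 14, 15 also pop
    return eu.stack[index % STACK_SIZE], effect

def destination(eu, reg, opt, aux):
    """Return (register_array, index, stack_effect) for the Rc operand."""
    if not opt:
        if aux:
            return eu.aux, reg, 0
        return eu.stack, (eu.L + reg) % STACK_SIZE, 0
    if reg < 12:
        return eu.constants, reg, 0
    if reg == 12:
        return eu.stack, eu.S % STACK_SIZE, 0
    if reg == 13:
        return eu.stack, (eu.S - 1) % STACK_SIZE, 0
    return eu.stack, (eu.S + 1) % STACK_SIZE, +1   # 14, 15: push

def rr_execute(eu, op, a, b, c, a_opt, b_opt, c_opt, aux):
    """Rc <- Ra op Rb, then S <- S + SE."""
    ra, se_a = source(eu, a, a_opt, aux)
    rb, se_b = source(eu, b, b_opt, aux)
    regs, index, se_c = destination(eu, c, c_opt, aux)
    regs[index] = op(ra, rb) & 0xFFFFFFFF
    eu.S = (eu.S + se_a + se_b + se_c) % STACK_SIZE
```

For example, with a=1 (aOpt false), b=2 (bOpt true), and c=14 (cOpt true), the sketch computes Locals[1] op Constants[2] and pushes the result.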

RJB - Register Jump Byte

[op: [0..255], dist: [-128..127], sdd,sd,opt,aux: BOOL, reg: [0..15]] -- 3 bytes

{RJEB, RJEBJ, RJGB, RJGBJ, RJGEB, RJGEBJ, RJLB, RJLBJ, RJLEB, RJLEBJ, RJNEB, RJNEBJ}

For these instructions the effect is to jump to the byte PC given by PC+AlphaS if the indicated comparison of Ra with Rb is true. Beta is decoded as much as possible in the same way that Beta is decoded for the RR format instructions. The comparison is always made involving either [S] or [S-1] with a register determined by the decoding of Beta. SE is used as in the RR format description.

IF sdd THEN SE←SE-1

Ra: IF sd THEN Stack[S-1] ELSE Stack[S]

Rb: IF NOT opt

THEN IF aux THEN AuxRegs[reg] ELSE Locals[reg]

ELSE SELECT reg FROM

< 12 => Constants[reg]

12 => Stack[S];

13 => Stack[S-1];

14 => Stack[S]; SE←SE-1

15 => Stack[S-1]; SE←SE-1

Instruction descriptions

The following is a list of the instructions currently planned for the first version of the Dragon IFU. This instruction set is relatively sparse, to leave room for more advanced IFUs.

Notation

[x] means the contents of EU register x. Hence [S-1] is the second word on the stack (arithmetic within square brackets is performed modulo the stack size).

S is the stack pointer and L is the Register Local pointer (so [L+1] is register local 1).

(y)^ is the contents of memory location y.

Loads and Stores

Name format opcodes

DUP OI 1

DUPlicate. [S+1]←[S]; S←S+1.

EXCH OI 1

EXCHange. temp←[S]; [S]←[S-1]; [S-1]←temp.

EXDIS OI 1

EXCHange discard. [S-1]←[S]; S←S-1.

LCn OI 8

Load Constant n. [S+1]←Constants[n]; S←S+1.

n indicates one of the first 8 constant registers.

LIB OB 1

Load Immediate Byte. [S+1]←AlphaZ; S←S+1.

LIDB ODB 1

Load Immediate Double Byte. [S+1]←AlphaBetaZ; S←S+1.

LIQB OQB 1

Load Immediate Quad Byte. [S+1]←AlphaBetaGammaDelta; S←S+1.

LIQB is useful for loading 32-bit constants.

LRn LR 16

Load Register n. [S+1]←[L+n]; S←S+1.

SRn LR 16

Store Register n. [L+n]←[S]; S←S-1.
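
The register transfers listed above can be exercised with a small Python model. This is a sketch for illustration; the class Machine and its method names are hypothetical, and only the transfers in the comments are taken from the descriptions above.

```python
STACK_SIZE = 128
MASK32 = 0xFFFFFFFF

class Machine:
    def __init__(self):
        self.stack = [0] * STACK_SIZE
        self.S = 0
        self.L = 0

    def _at(self, index):
        return index % STACK_SIZE  # arithmetic on S and L is modulo the stack size

    def _push(self, value):
        self.stack[self._at(self.S + 1)] = value & MASK32
        self.S = self._at(self.S + 1)

    def lib(self, alpha):   # [S+1] <- AlphaZ; S <- S+1
        self._push(alpha & 0xFF)

    def dup(self):          # [S+1] <- [S]; S <- S+1
        self._push(self.stack[self.S])

    def exch(self):         # temp <- [S]; [S] <- [S-1]; [S-1] <- temp
        below = self._at(self.S - 1)
        self.stack[self.S], self.stack[below] = self.stack[below], self.stack[self.S]

    def exdis(self):        # [S-1] <- [S]; S <- S-1
        self.stack[self._at(self.S - 1)] = self.stack[self.S]
        self.S = self._at(self.S - 1)

    def lr(self, n):        # [S+1] <- [L+n]; S <- S+1
        self._push(self.stack[self._at(self.L + n)])

    def sr(self, n):        # [L+n] <- [S]; S <- S-1
        self.stack[self._at(self.L + n)] = self.stack[self.S]
        self.S = self._at(self.S - 1)
```

Note that EXDIS keeps the top of stack and discards the word beneath it, which is the combined effect of EXCH followed by DIS.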

Reads & Writes

Name format opcodes

LGF ODB 1

Load Global Frame. [S+1]←([GB]+AlphaBetaZ)^; S←S+1.

This operation is used to load global frames. GB denotes auxiliary register 0 in the EU. Unfortunately we have to have a global frame table, but at least it is quite large (64K frames). The decision to use this instruction is not entirely final, since LIQB may be preferable.

LRIn LRB 16

Load Register Indirect n. [S+1]←([L+n]+AlphaZ)^; S←S+1.

PSB OB 1

Put Swapped Byte. ([S-1]+AlphaZ)^←[S]; S←S-1.

RAI LRRB 1

Read Auxiliary register Indirect. [L+BetaL]←(AuxRegs[BetaR]+AlphaZ)^.

RB OB 1

Read Byte. [S]←([S]+AlphaZ)^.

RRI LRRB 1

Read Register Indirect. [L+BetaL]←([L+BetaR]+AlphaZ)^.

RRX RR 1

Read Register indeXed. [Rc]←([Ra]+[Rb])^.

RSB OB 1

Read Save Byte. [S+1]←([S]+AlphaZ)^; S←S+1.

SRIn LRB 16

Store Register Indirect n. ([L+n]+AlphaZ)^←[S]; S←S-1.

WAI LRRB 1

Write Auxiliary register Indirect. (AuxRegs[BetaR]+AlphaZ)^←[L+BetaL].

WB OB 1

Write Byte. ([S]+AlphaZ)^←[S-1]. S←S-2.

WRI LRRB 1

Write Register Indirect. ([L+BetaR]+AlphaZ)^←[L+BetaL].

WSB OB 1

Write Swapped Byte. ([S-1]+AlphaZ)^←[S]; S←S-2.

Arithmetic and logical

All arithmetic is performed using 32-bit 2s-complement arithmetic. Provisions are made for extended precision arithmetic by using Carry for some instructions.

Integer overflow is detected and trapped for most arithmetic. Overflow is defined to occur for addition if the two operands have equal sign and the sign of the result is not equal to the sign of the operands. Overflow cannot occur for addition if the operand signs differ. Overflow is defined for subtraction by considering it to be addition with modified operands.
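
The overflow rule can be sketched in Python as follows. This is an illustration, not the ALU design: the names are hypothetical, and treating subtraction as a + NOT b + 1 is one reading of "addition with modified operands", assumed here for the example.

```python
MASK32 = 0xFFFFFFFF
SIGN = 0x80000000

class IntegerOverflowTrap(Exception):
    pass

def add_with_carry_in(a: int, b: int, carry_in: int) -> int:
    """32-bit add that raises when the signed result overflows."""
    result = (a + b + carry_in) & MASK32
    same_sign = (a & SIGN) == (b & SIGN)
    # Overflow: operand signs are equal but the result sign differs from them.
    if same_sign and (result & SIGN) != (a & SIGN):
        raise IntegerOverflowTrap
    return result

def add32(a: int, b: int) -> int:
    return add_with_carry_in(a & MASK32, b & MASK32, 0)

def sub32(a: int, b: int) -> int:
    # Assumption: a - b is computed as a + (NOT b) + 1.
    return add_with_carry_in(a & MASK32, ~b & MASK32, 1)
```

Words are passed as unsigned 32-bit patterns, so -1 is written 0xFFFFFFFF.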

Some languages, especially Lisp, need to distinguish between numbers and addresses within a word. The convention in Dragon is to provide checking for Lisp NaN (Not a Number), which is defined to occur when the two top bits of a word are not equal. Lisp NaN checking is performed on both operands and results.
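
The Lisp NaN test amounts to comparing bits 0 and 1 of the word (in Mesa numbering). A minimal Python sketch, with the hypothetical name is_lisp_nan:

```python
def is_lisp_nan(word: int) -> bool:
    """True when the two most significant bits of a 32-bit word differ."""
    top_two = (word >> 30) & 0b11
    return top_two in (0b01, 0b10)
```

Under this definition the words that are numbers are exactly those that fit in 31 bits as signed values.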

Name format opcodes

ADD OI 1

ADD. [S-1]←[S]+[S-1]; S←S-1. Carry is not used or set. Trap on integer overflow.

ADDB OB 1

Add Byte. [S]←[S]+AlphaZ. Carry is not used or set. Trap on integer overflow.

ADDDB ODB 1

Add Double Byte. [S]←[S]+AlphaBetaZ. Carry is not used or set. Trap on integer overflow.

BNDCK OI 1

BouNDs ChecK. Bounds Check trap if [S-1] < 0 OR [S-1]-[S] >= 0; S←S-1. No change to Carry.

This instruction is used to check indexes against bounds. It is also used when a number is narrowed to fit into a subrange.
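
A Python sketch of the check, for illustration only. The stack is modelled as a list with the top of stack at the end, the names are hypothetical, and the trap is raised before S is changed, following the general rule that a trapping instruction has no effect.

```python
class BoundsCheckTrap(Exception):
    pass

def to_signed(word: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    word &= 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word

def bndck(stack: list) -> None:
    """Trap if [S-1] < 0 or [S-1]-[S] >= 0; otherwise S <- S-1."""
    index = to_signed(stack[-2])
    bound = to_signed(stack[-1])
    if index < 0 or index - bound >= 0:
        raise BoundsCheckTrap(f"index {index} not in [0..{bound})")
    stack.pop()  # discard the bound, leaving the checked index on top
```

A successful check leaves the index on top of the stack, ready to be used by a following read or write.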

MDIV OI 1

Modulus DIVide. [S-2],[S-1] = [S-2],[S-1] MDIV [S]; S←S-1. Initially, x = [S-2],[S-1] (most significant word in [S-2]), y = [S], then [S-2] ← q; [S-1] ← r; where x = q*y + r, and |r| < |y|, and Sign[r] = Sign[y]. Integer overflow trap on overflow or divide by zero.

MUL OI 1

MULtiply. [S-1],[S] ← [S-1]*[S]. Signed multiplication, 32x32 -> 64 bits. No traps possible.

RADD RR 1

RAND RR 1

RBC RR 1

Note: if Rb = FIRST[INT], RBC will fault when Ra < 0, which is useful for assignments between INT and CARD.

RDIV OI 1

Remainder DIVide. [S-2],[S-1] = [S-2],[S-1] RDIV [S]; S←S-1. Initially, x = [S-2],[S-1] (most significant word in [S-2]), y = [S], then [S-2] ← q; [S-1] ← r; where x = q*y + r, and |r| < |y|, and Sign[r] = Sign[x]. Integer overflow trap on overflow or divide by zero.
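
The difference between MDIV and RDIV is only in the sign convention for the remainder. The following Python sketch illustrates it with unbounded integers; the function names are hypothetical, x stands for the signed double-word dividend, and the overflow test (quotient must fit in a signed 32-bit word) is an assumption about what "overflow" means here.

```python
class IntegerOverflowTrap(Exception):
    pass

def _check_quotient(q: int) -> None:
    if not -(1 << 31) <= q < (1 << 31):
        raise IntegerOverflowTrap("quotient does not fit in 32 bits")

def mdiv(x: int, y: int):
    """Return (q, r) with x = q*y + r, |r| < |y|, Sign[r] = Sign[y]."""
    if y == 0:
        raise IntegerOverflowTrap("divide by zero")
    q, r = divmod(x, y)  # Python's divmod rounds the quotient toward -infinity
    _check_quotient(q)
    return q, r

def rdiv(x: int, y: int):
    """Return (q, r) with x = q*y + r, |r| < |y|, Sign[r] = Sign[x]."""
    if y == 0:
        raise IntegerOverflowTrap("divide by zero")
    q = abs(x) // abs(y)  # round the quotient toward zero
    if (x < 0) != (y < 0):
        q = -q
    r = x - q * y
    _check_quotient(q)
    return q, r
```

For example, dividing -7 by 2 gives q = -4, r = 1 under MDIV and q = -3, r = -1 under RDIV.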

RLADD RR 1

RLSUB RR 1

ROR RR 1

RSUB RR 1

RUADD RR 1

RUSUB RR 1

RVADD RR 1

RVSUB RR 1

RXOR RR 1

SUB OI 1

SUBtract. [S-1]←[S-1]-[S]; S←S-1. Carry is not used or set. Trap on integer overflow.

SUBB OB 1

Subtract Byte. [S]←[S]-AlphaZ. Carry is not used or set. Trap on integer overflow.

UDIV OI 1

Unsigned DIVide. [S-2],[S-1] = [S-2],[S-1] UDIV [S]; S←S-1. Initially, x = [S-2],[S-1] (most significant word in [S-2]), y = [S], then [S-2] ← q; [S-1] ← r; where x = q*y + r, and r < y. x,y,q, and r are treated as unsigned numbers. Integer overflow trap on overflow or divide by zero.

UMUL OI 1

Unsigned MULtiply. [S-1],[S] ← [S-1]*[S]. Unsigned multiplication, 32x32->64 bits. No traps possible.
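
The 32x32 -> 64 bit products of MUL and UMUL can be sketched in Python as below. The function names are hypothetical, and the sketch returns the pair as (high word, low word); which of [S-1] and [S] receives the high-order word is not stated above, so that ordering is an assumption made by analogy with the divide instructions.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def to_signed(word: int) -> int:
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word

def umul(a: int, b: int):
    """Unsigned multiply: return (high, low) words of the 64-bit product."""
    product = (a & MASK32) * (b & MASK32)
    return (product >> 32) & MASK32, product & MASK32

def mul(a: int, b: int):
    """Signed multiply: return (high, low) words of the 64-bit product."""
    product = (to_signed(a) * to_signed(b)) & MASK64
    return (product >> 32) & MASK32, product & MASK32
```

Since the full 64-bit product is kept, neither operation can overflow, which is consistent with "No traps possible".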

Field unit operations

Name format opcodes

FSDB ODB 1

Field Setup Double Byte. Field←AlphaBeta.

Stores AlphaBeta to Field to set up the field descriptor for a field unit operation.

RFU RR 1

This is a general shift operation, including extract, insert, and shift. The Field register supplies the field descriptor.

SHL ODB 1

SHift Left. [S]←FieldUnit[[S], 0, AlphaBeta].

This operation shifts and masks single words according to the field descriptor in AlphaBeta. It is especially useful for shifting left.

SHR ODB 1

SHift Right. [S]←FieldUnit[[S], [S], AlphaBeta].

This operation shifts and masks single words according to the field descriptor in AlphaBeta. It is especially useful for shifting right, rotating, and extracting fields.

IFU index adjusting instructions

Name format opcodes

AL OB 1

Add to L. L←L+AlphaZ.

This instruction is especially useful for saving and restoring stack frames, but may have other exotic uses.

AS OB 1

Add to Stack. S←S+AlphaZ.

Stack overflow is not checked by this instruction, so it is primarily useful for discarding words from the stack.

ASL OB 1

Add to Stack from L. S←L+AlphaZ.

Stack overflow is not checked by this instruction, so it is primarily useful for discarding words from the stack when the distance from L is known, but S could be at any level.

DIS OI 1

DIScard. S←S-1.

EP OB 1

Enter Procedure. L←S+Alpha.

This instruction is used at procedure entry to establish the base of the local frame.

Unconditional jumps

Name format opcodes

DJ OQB 1

Direct Jump. PC←AlphaBetaGammaDelta.

JB OB 1

Jump Byte. PC←PC+AlphaS.

JDB ODB 1

Jump Double Byte. PC←PC+AlphaBetaS.

Jn O* 4

n IN {1, 2, 3, 5}. Jumps to byte address PC+n. These jump instructions are implemented as null operations for speed. J4 is not an opcode because there are no 4 byte opcodes.

SJ OI 1

Stack Jump. PC←PC+[S]. S←S-1. This operation is useful for computed jumps.

Conditional jumps

Each of the following conditional jumps takes two opcodes: one where the prediction is to not jump (no trailing J to the opcode), and one where the prediction is to jump (a trailing J to the opcode). Comparison treats the two numbers as 32-bit signed numbers, unsigned comparisons being judged less common.

Name format opcodes

JEBB(J) JBB 2

Jump Equal Byte Byte. IF BetaZ = [S] THEN PC←PC+AlphaS. S←S-1.

JNEBB(J) JBB 2

Jump Not Equal Byte Byte. IF BetaZ # [S] THEN PC←PC+AlphaS. S←S-1.

RJEB(J) RJB 2

RJGB(J) RJB 2

RJGEB(J) RJB 2

RJLB(J) RJB 2

RJLEB(J) RJB 2

RJNEB(J) RJB 2

Calls and returns

Name format opcodes

DFC OQB 1

Direct Function Call. Calls procedure at AlphaBetaGammaDelta.

LFC ODB 1

Local Function Call. Calls procedure at PC+AlphaBetaS.

SFC OI 1

Stack Function Call. Calls procedure via PC←[S]. S←S-1. The return PC and current L are saved in the IFU stack (possibly causing IFU stack overflow).

SFCI OI 1

Stack Function Call Indirect. Calls procedure via PC←([S])^. The return PC and current L are saved in the IFU stack (possibly causing IFU stack overflow).

RET OB 1

RETurn. Returns to (PC, L) saved in IFU stack (stack underflow trap if IFU stack is empty). The stack is adjusted to S←L+Alpha.

RETN OI 1

RETurn No adjustment. Returns to (PC, L) saved in IFU stack (stack underflow trap if IFU stack is empty). There is no change to S.

RETT OI 1

RETurn from Trap. Returns to (PC, L) saved in IFU stack (stack underflow trap if IFU stack is empty). Traps are enabled on return. There is no change to S.

Special operations

Name format opcodes

CST OB 1

Conditional STore. Takes 3 words of data on the stack: ptr = [S-2], new = [S-1], old = [S]. [S+1]←(ptr+AlphaZ)^ {hold}; IF [S+1] = old THEN (ptr+AlphaZ)^←new ELSE []←(ptr+AlphaZ)^ {release hold}; S←S+1.

CST is used to achieve atomic modification of storage that is shared between multiple processors. The sampled value is pushed onto the stack to be tested against the old value. If they are equal the store was successful in updating the designated word.
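
The following Python sketch models the behaviour of CST for illustration. The class SharedMemory and the function cst are hypothetical names, a lock stands in for the {hold} on the memory word, and the stack is a list with the top of stack at the end.

```python
import threading

class SharedMemory:
    def __init__(self):
        self.words = {}
        self._hold = threading.Lock()  # stands in for the bus-level hold

    def conditional_store(self, address: int, old: int, new: int) -> int:
        """Sample the word; store new only if the sample equals old."""
        with self._hold:
            sample = self.words.get(address, 0)
            if sample == old:
                self.words[address] = new
            return sample

def cst(stack: list, memory: SharedMemory, alpha: int) -> None:
    """ptr = [S-2], new = [S-1], old = [S]; push the sampled value."""
    ptr, new, old = stack[-3], stack[-2], stack[-1]
    stack.append(memory.conditional_store(ptr + alpha, old, new))
```

After the instruction the caller compares the pushed sample with old; a typical use is a retry loop that recomputes new from the sample until the store succeeds.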

IN OB 1

INput operation. [S]←([S]+AlphaZ)^; bypasses cache. Details to be supplied.

LEUR OB 1

Load from EU Register. [S+1]←PReg[Alpha]; S←S+1.

LEUR reads data from the EU register indicated by the Alpha byte to the stack. Register assignments are given by DragOpsCross.ProcessorRegister.

LIFUR OB 1

Load from IFU Register. [S+1]←PReg[Alpha]; S←S+1.

LIFUR reads data from the IFU register indicated by the Alpha byte to the stack. Register assignments are given by DragOpsCross.ProcessorRegister.

OUT OB 1

OUTput operation. ([S]+AlphaZ)^←[S-1]; S←S-2; bypasses cache. Details to be supplied.

SEUR OB 1

Store to EU Register. PReg[Alpha]←[S]; S←S-1.

SEUR stores a word from the top of stack into a designated EU register. Register assignments are given by DragOpsCross.ProcessorRegister.

SIFUR OB 1

Store to IFU Register. PReg[Alpha]←[S]; S←S-1.

SIFUR stores a word from the top of stack into a designated IFU register. Register assignments are given by DragOpsCross.ProcessorRegister.

Processor Registers

(Note: this description is partial).

In the IFU we need the following registers:

Status - contains trap enable/disable bit. May contain other bits as we think of them.

EldestL - contains eldest L in the IFU stack.

EldestPC - contains eldest PC in the IFU stack, reading this register removes the eldest entry, while writing this register adds a new eldest entry.

YoungestL - contains youngest L in the IFU stack.

YoungestPC - contains youngest PC in the IFU stack. This is useful in trap routines that need to examine code, or in cases where the trap routine wants to skip the offending instruction.

SLimit - contains current stack limit.

L - contains current value of L register. (Note: not available through LIFUR and SIFUR for current IFU)

S - contains current value of S register. (Note: not available through LIFUR and SIFUR for current IFU)

In the EU we need the following registers:

Field - Field Unit Control.

MAR - Memory Address Register.

Things left out

There are several instructions left out of the above list, for various reasons. To a first approximation these are the floating point operations.

XOPs

XOPs are the undefined instructions in the instruction set. To permit efficient emulation of extended instruction sets, all XOPs behave as procedure calls to an address linearly dependent on the instruction code. XOPs have different lengths, depending on the instruction code. All XOPs of more than 1 byte push AlphaBetaGammaDelta onto the stack before they call, although parts of AlphaBetaGammaDelta are not used, depending on the length of the XOP.

For implementation reasons some undefined operations do not trap, but also do not produce well-defined results. These operations will be detailed when the PLA for the IFU is sufficiently determined.

Timing estimates

This section gives approximate times for the Dragon instruction set. Overall, an estimate of 2 cycles per instruction is a good guess for average time.

cycles operation

36? MDIV, RDIV, UDIV

19? MUL, UMUL

8 CST

5 conditional jump incorrectly predicted

5 SFCI, SFC, SJ

4 SIFUR

3 2,3,5 byte XOPs

2 1 byte XOPs

2 conditional jump correctly predicted

2 JB, JDB, DJ, returns, LFC, DFC

2 EXCH

1 all others

Additional 1 cycle delays are caused by the following:

* using the results of a cache fetch on the next EU cycle

* a word-boundary spanning instruction when the instruction buffer is empty (for example, jumping to an instruction that spans a word boundary)

Returns are delayed until previous calls or returns have completed (3 cycles from start of previous call or return).

There are additional delays due to the fetch-ahead of the instruction buffer. At the start of every IFU cycle a fetch-ahead is performed if the bus is not busy from the previous use of the IFU cache. This may cause the bus to be busy when there is a cache miss, but otherwise does not add cycles. For straight line code the instruction buffer can usually keep up with cache misses.

For the EU cache, assume 6 cycles for writing a dirty victim, 6 cycles for a quad read, and 6 cycles for a map miss. These will take more time, of course, when there is contention by other processors.

Opcode restrictions

This section describes the various restrictions on opcodes. These restrictions reduce the amount of hardware in the IFU. Further restrictions may be imposed. Changes are mostly at the whim of the IFU designer (these restrictions have been changed several times).

Instruction length

The instruction length is inferred from the 3 high bits of the opcode byte as follows:

000 => 1 byte [000B..037B] (1 byte XOPs)

001 => 5 bytes [040B..077B] OQB format

01x => 1 byte [100B..177B] OI, LR formats

10x => 2 bytes [200B..277B] OB, LRB formats

11x => 3 bytes [300B..377B] RR, RJB, ODB, JBB, LRRB formats
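
The length decoding in the table above can be written as a small Python function. This is a sketch for illustration; the name instruction_length is hypothetical.

```python
def instruction_length(opcode: int) -> int:
    """Return the instruction length in bytes from the 3 high bits of the opcode."""
    if not 0 <= opcode <= 0o377:
        raise ValueError("opcode must be a single byte")
    high3 = opcode >> 5
    if high3 == 0b000:
        return 1  # [000B..037B] 1 byte XOPs
    if high3 == 0b001:
        return 5  # [040B..077B] OQB format
    if high3 in (0b010, 0b011):
        return 1  # [100B..177B] OI, LR formats
    if high3 in (0b100, 0b101):
        return 2  # [200B..277B] OB, LRB formats
    return 3      # [300B..377B] three-byte formats
```

No opcode decodes to a length of 4, which matches the absence of a J4 instruction.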

Format restrictions

LR & LRB format

must be in aligned blocks of 16 opcodes (there are four such blocks currently)

JBB & RJB format

low bits may govern the jump code (details?)

Calls and Returns

This section gives tentative information about the strategy used for procedure calls and returns on Dragon.

Simple call

The simple case of calling a procedure first pushes any arguments expected by the procedure, then calls the procedure via DFC or LFC. The first instruction is normally an EP instruction, which sets L to the base of the arguments. Therefore, the arguments become the initial local variables without moving those arguments.

The return PC and the return L are pushed onto the IFU stack by the call. If there is insufficient room to do this, an IFU stack overflow trap is taken after the call has transferred control.

Global frames

If the procedure needs access to a global frame, then it is the responsibility of the procedure to set up a register with the pointer to the global frame. Current plans are to use the LGF instruction to load the global frame pointer from the global frame table into a local register. The LIQB instruction could be used to set up the global frame pointer in the case where there are few procedures for the frame, but space considerations will normally make it more desirable to use the 3-byte LGF instead of the 5-byte LIQB, since that will save 2 bytes per procedure.

Simple return

The simple case of returning from a procedure uses the RET opcode, which specifies how much to adjust the stack before returning. If the IFU stack is not empty, then the return PC and L are taken from the IFU stack, and the most recent entry in the IFU stack is discarded. If the IFU stack is empty when a RET is performed, the results are undefined. S is adjusted according to the alpha byte. For cases where the stack should not be adjusted on return the RETN opcode is used.
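
A simple call and return can be traced with the following Python sketch. It is illustrative only: the class Processor and its methods are hypothetical, the IFU stack is modelled as an unbounded list (so overflow is not shown), and RET is assumed to compute S from the callee's L before L is restored.

```python
STACK_SIZE = 128

class StackUnderflowTrap(Exception):
    pass

class Processor:
    def __init__(self):
        self.stack = [0] * STACK_SIZE
        self.S = 0
        self.L = 0
        self.PC = 0
        self.ifu_stack = []  # saved (return PC, L) pairs

    def push(self, value: int) -> None:
        self.S = (self.S + 1) % STACK_SIZE
        self.stack[self.S] = value & 0xFFFFFFFF

    def call(self, target: int, return_pc: int) -> None:
        """DFC or LFC: save the return PC and current L, then transfer."""
        self.ifu_stack.append((return_pc, self.L))
        self.PC = target

    def ep(self, alpha: int) -> None:
        """EP: L <- S + Alpha (modulo the stack size)."""
        self.L = (self.S + alpha) % STACK_SIZE

    def ret(self, alpha: int) -> None:
        """RET: S <- L + Alpha, then restore (PC, L) from the IFU stack."""
        if not self.ifu_stack:
            raise StackUnderflowTrap
        self.S = (self.L + alpha) % STACK_SIZE
        self.PC, self.L = self.ifu_stack.pop()
```

With two arguments pushed, EP with Alpha = 255 sets L to S-1 (since arithmetic is modulo the stack size), so the arguments become Locals[0] and Locals[1]. A RET with Alpha = 0 then leaves a single result in the slot that held the first argument.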

Procedure variables

For various reasons covered below, procedure variables are called with one more level of indirection than simple procedures. Procedure variables are implemented as pointers to words that contain the starting address of the procedure. To call through a procedure variable, the procedure variable is pushed, then a call is made to ([S])^ using SFCI. This convention leaves an extra word on the stack, so procedure variable calls must go to a different entry point than simple procedure calls.

Nested procedures

Nested procedures are implemented by placing the starting address in the local frame extension (the part of the local frame required to be in memory). The procedure variable for this nested procedure will be a pointer to this word. Therefore, on entry to the nested procedure, the address of the frame (plus an offset) will be on the stack, which makes computing the static link easy (a SUBB instruction).

Interface function call

Interface function calls are both more flexible and more involved than simple procedure calls. Interface records are referred to via positions in the global frame. Procedures exported through those interfaces have procedure variables in various slots of the interface record. To make an interface function call one simply sets up the arguments, fetches the procedure variable from the interface record, and performs a procedure variable call.

Multiple global frames

Although procedures must be specially compiled to use multiple global frames, there is no additional mechanism beyond forcing all calls to routines with multiple global frames to use the indirect procedure call. The address on top of the stack can then be used to find the global frame (probably via adding a constant).

Coroutines

Coroutine calls are handled by traps. The data structures to be used are not yet defined. However, coroutine calls will be roughly as expensive as process switches.

Traps

This section gives tentative information about traps generated by the IFU. The general approach to traps is to have the instruction that generates the trap have no effect, and the return PC for the trap routine be the PC of the trapping instruction. Maskable traps (Reschedule, EU Stack overflow, and IFU stack overflow) disable further maskable traps until they are reenabled (usually via RETT).

Reschedule

The reschedule trap occurs when the RESCHEDULE line is raised and interrupts are enabled. An attempt is made to have idle processors notice the RESCHEDULE line before non-idle processors (this is controlled by software). If the RESCHEDULE line is raised while interrupts are disabled, the reschedule trap will be deferred until interrupts are enabled again. The response to the reschedule trap is quite complex, and will not be covered in this note.

EU Stack overflow

The stack overflow trap occurs when an attempt to increase the EU stack pointer by 1 (S←S+1) would result in S crossing the stack overflow limit register (and traps are enabled). The limit, which is set by software, must allow sufficient room for the trap handler. The handler for this trap should migrate the eldest frame in the EU and IFU stack registers to memory, then return via RETT.

IFU stack overflow

The IFU stack overflow trap occurs when an attempt is made to call a procedure while the IFU stack is full (and stack overflow is enabled). Some number of IFU frames are still available after this trap occurs, so calls can be made by the trap handler. The handler for this trap is the same as for EU stack overflow.

Stack underflow

The stack underflow trap occurs when a return instruction (RET, RETN, or RETT) tries to return to an empty IFU stack. The trap handler must arrange to migrate a frame from memory to the IFU and EU registers. This is not a true trap, since the IFU stack is left empty on entry to the handler. When the transfer takes place, maskable traps are disabled.

ALU fault

The ALU fault trap occurs when the ALU detects integer overflow, a bounds check (due to the BNDCK instruction) or a NIL check (due to the NILCK instruction), or Lisp NaN. The handler for ALU fault should turn this trap into the appropriate error, depending on the instruction that raised the fault.

EU page fault

The EU page fault trap occurs when a reference to unmapped memory is made by the EU. The faulting address is available through a special EU register (MAR). The handler should determine if the page is valid, then either cause the page to be made present, or cause a page fault error. Interrupts are disabled during the trap handler.

IFU page fault

The IFU page fault trap occurs when a reference to unmapped memory is made by the IFU. The faulting address is available as the return PC. The handler should determine if the page is valid, then either cause the page to be made present, or cause a page fault error.

Field Unit Specification

This section gives a Mesa specification for the Field Unit functions and the field descriptor format. We expect this description to be made more human-readable in the near future.

Note: this entire section will be revised according to what the EU can provide. Treat the information herein as entertaining fiction!

Word: TYPE = PACKED ARRAY [0..bitsPerWord) OF BOOL;

ZerosWord: Word = LOOPHOLE[LONG[0]];

OnesWord: Word = LOOPHOLE[LONG[-1]];

FieldDescriptor: TYPE = MACHINE DEPENDENT RECORD [

reserved: [0..7] ← 0,

-- reserved bits, not currently used, but must be 0s

insert: BOOL ← FALSE,

-- governs choice of background and low bits of mask

mask: [0..32] ← 32,

-- mask gives # of right-justified 1s in the mask

-- (mask = 0 => no 1s, mask = 32 => all 1s)

shift: [0..32] ← 0

-- gives # of bits to left-shift the double word

];

Operate: PROC [Left,Right: Word, fd: FieldDescriptor] RETURNS [out: Word] = {

shifter: Word = DoubleWordShiftLeft[Left, Right, fd.shift];

-- The shifter output has the input double word shifted left by fd.shift bits

mask: Word ← SingleWordShiftRight[OnesWord, 32-fd.mask];

-- The default mask has fd.mask 1s right-justified in the word

IF fd.insert THEN

mask ← DragAnd[mask, SingleWordShiftLeft[OnesWord, MIN[fd.mask, fd.shift]]];

-- fd.insert => clear rightmost MIN[fd.mask, fd.shift] bits of the mask

out ← DragAnd[mask, shifter];

-- 1 bits in the mask select the shifter output

IF fd.insert THEN out ← DragOr[out, DragAnd[DragNot[mask], Right]];

-- fd.insert => 0 bits in the mask select bits from Right to OR in to the result

};

END.
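
As a rough illustration of the FieldDescriptor record, the following Python sketch packs a descriptor into a 16-bit value. It assumes the fields are laid out left to right in declaration order (3 reserved bits, 1 insert bit, 6 mask bits, 6 shift bits). That layout and the helper names pack_fd and unpack_fd are assumptions of this sketch, not definitions taken from DragOpsCross.

```python
def pack_fd(insert, mask, shift):
    """Pack FD[insert, mask, shift] into 16 bits (assumed layout, reserved = 0)."""
    if not 0 <= mask <= 32 or not 0 <= shift <= 32:
        raise ValueError("mask and shift must be in [0..32]")
    # Assumed bit positions: insert at bit 12, mask in bits 6..11, shift in bits 0..5
    # (counting from the least significant end).
    return (int(bool(insert)) << 12) | (mask << 6) | shift


def unpack_fd(bits):
    """Inverse of pack_fd; returns (insert, mask, shift)."""
    return bool((bits >> 12) & 1), (bits >> 6) & 0x3F, bits & 0x3F
```

Under this assumed layout, adding a small bit index to a packed descriptor changes only the shift field, provided the sum stays within [0..32].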

Definitions used from DragOpsCrossUtils

DoubleWordShiftLeft: PROC

[w0,w1: Word, dist: SixBitIndex] RETURNS [Word] = TRUSTED INLINE {

This procedure shifts two Dragon words left by dist bits and returns the leftmost word.

<< code omitted >>

};

SingleWordShiftLeft: PROC

[word: Word, dist: SixBitIndex] RETURNS [Word] = TRUSTED INLINE {

This procedure shifts one Dragon word left by dist bits and returns the shifted word.

<< code omitted >>

};

SingleWordShiftRight: PROC

[word: Word, dist: SixBitIndex] RETURNS [Word] = TRUSTED INLINE {

This procedure shifts one Dragon word right by dist bits and returns the shifted word.

<< code omitted >>

};

DragAnd: PROC [a,b: Word] RETURNS [Word] = INLINE {

This procedure is a 32-bit AND

<< code omitted >>

};

DragOr: PROC [a,b: Word] RETURNS [Word] = INLINE {

This procedure is a 32-bit OR

<< code omitted >>

};

DragNot: PROC [w: Word] RETURNS [Word] = INLINE {

This procedure is a 32-bit NOT

<< code omitted >>

};
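
The Operate procedure can be exercised with a small executable model. The Python sketch below follows the Mesa code line by line, so the default mask is right-justified. Words are modeled as Python integers in [0, 2^32), and the bodies of the shift helpers are guesses based on their one-line descriptions, since the original code is omitted. The function names and the choice to pass the descriptor fields as separate arguments are assumptions of the sketch.

```python
MASK32 = 0xFFFFFFFF


def double_word_shift_left(w0, w1, dist):
    """Shift the 64-bit value w0:w1 left by dist bits and return the leftmost word."""
    return ((((w0 << 32) | w1) << dist) >> 32) & MASK32


def single_word_shift_left(word, dist):
    return (word << dist) & MASK32


def single_word_shift_right(word, dist):
    return (word & MASK32) >> dist


def operate(left, right, insert, mask, shift):
    """Model of Operate[Left, Right, fd] with fd = FD[insert, mask, shift]."""
    shifter = double_word_shift_left(left, right, shift)
    # Default mask: `mask` ones, right-justified in the word.
    m = single_word_shift_right(MASK32, 32 - mask)
    if insert:
        # Clear the rightmost MIN[mask, shift] bits of the mask.
        m &= single_word_shift_left(MASK32, min(mask, shift))
    out = m & shifter
    if insert:
        # 0 bits in the mask select bits from Right.
        out |= (~m & MASK32) & right
    return out
```

With this model, operate(0, 0x41424344, False, 8, 16) selects the second byte from the left (0x42), and operate(0x5A, 0x41424344, True, 24, 16) replaces that byte, giving 0x415A4344. Both calls supply the word being edited as Right, which is a choice made for this sketch.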

Instruction Set Summary

Name form # Description

ADD OI 1 [S-1]←[S]+[S-1]; S←S-1; trap on overflow

ADDB OB 1 [S]←[S]+AlphaZ; trap on overflow

ADDDB OB 1 [S]←[S]+AlphaBetaZ; trap on overflow

AL OB 1 L←L+Alpha

AS OB 1 S←S+Alpha

ASL OB 1 S←L+Alpha

BNDCK OI 1 trap if [S] < 0 or [S-1]-[S] >= 0; S←S-1

CST OB 1 [S+1]←([S-2]+AlphaZ)^; [S+1]=[S] => ([S-2]+AlphaZ)^←[S-1]; S←S+1; special: atomic

DFC OQB 1 call proc at AlphaBetaGammaDelta

DIS OI 1 S←S-1

DJ OQB 1 PC ← AlphaBetaGammaDelta

DUP OI 1 [S+1]←[S]; S←S+1

EP OB 1 L←S+Alpha

EXCH OI 1 [S+1]←[S]; [S]←[S-1]; [S-1]←[S+1]

EXDIS OI 1 [S-1]←[S]; S←S-1

FSDB ODB 1 Field←AlphaBeta

IN OB 1 [S]←([S]+AlphaZ)^; special: uses IO lines

JB OB 1 PC←PC+Alpha

JDB ODB 1 PC←PC+AlphaBetaS

JEBBj JBB 2 BetaZ = [S] => PC←PC+AlphaS; S←S-1

Jn O* 4 Noop of length 1, 2, 3, or 5 bytes (used as jump)

JNEBBj JBB 2 BetaZ # [S] => PC←PC+AlphaS; S←S-1

LCn OI 8 [S+1]←Constants[n]; S←S+1

LEUR OB 1 [S+1]←PReg[Alpha]; S←S+1

LFC ODB 1 call proc at PC+AlphaBetaS

LGF ODB 1 [S+1]←([GB]+AlphaBetaZ)^; S←S+1

LIB OB 1 [S+1]←AlphaZ; S←S+1

LIDB ODB 1 [S+1]←AlphaBetaZ; S←S+1

LIFUR OB 1 [S+1]←PReg[Alpha]; S←S+1

LIQB OQB 1 [S+1]←AlphaBetaGammaDelta; S←S+1

LRIn LRB 16 [S+1]←([L+n]+AlphaZ)^; S←S+1

LRn LR 16 [S+1]←[L+n]; S←S+1

MDIV OI 1 [S-2],[S-1] ← [S-2],[S-1] / [S]; signed, Sign[rem] = Sign[divisor]

MUL OI 1 [S-1],[S] ← [S-1]*[S]; signed

OUT OB 1 ([S]+AlphaZ)^←[S-1]; S←S-2; special: uses IO lines

PSB OB 1 ([S-1]+AlphaZ)^←[S]; S←S-1

RADD RR 1 Rc←Ra+Rb+carry; carry←0; trap on overflow

RAI LRRB 1 [L+BetaL]←(AuxRegs[BetaR]+AlphaZ)^

RAND RR 1 Rc←Ra AND Rb

RB OB 1 [S]←([S]+AlphaZ)^

RBC RR 1 trap if Ra < 0 OR Ra-Rb >= 0; Rc←Ra

RDIV OI 1 [S-2],[S-1] ← [S-2],[S-1] / [S]; signed, Sign[rem] = Sign[dividend]

RET OB 1 S←L+Alpha; return from proc

RETN OB 1 return from proc

RETT OB 1 return from proc; enable traps

RFU RR 1 [Rc]←FieldUnit[[Ra],[Rb],Field]

RJEBj RJB 2 Ra=Rb => PC←PC+AlphaS

RJGBj RJB 2 Ra>Rb => PC←PC+AlphaS

RJGEBj RJB 2 Ra>=Rb => PC←PC+AlphaS

RJLBj RJB 2 Ra<Rb => PC←PC+AlphaS

RJLEBj RJB 2 Ra<=Rb => PC←PC+AlphaS

RJNEBj RJB 2 Ra#Rb => PC←PC+AlphaS

RLADD RR 1 Rc←Ra+Rb; carry←0; trap on overflow or Lisp NaN

RLSUB RR 1 Rc←Ra-Rb; carry←0; trap on overflow or Lisp NaN

ROR RR 1 Rc←Ra OR Rb

RRI LRRB 1 [L+BetaL]←([L+BetaR]+AlphaZ)^

RRX RR 1 [Rc]←([Ra]+[Rb])^

RSB OB 1 [S+1]←([S]+AlphaZ)^; S←S+1

RSUB RR 1 Rc←Ra-Rb-carry; carry←0; trap on overflow

RUADD RR 1 Rc←Ra+Rb+carry; set carry

RUSUB RR 1 Rc←Ra-Rb-carry; set carry

RVADD RR 1 Rc←Ra+Rb

RVSUB RR 1 Rc←Ra-Rb

RXOR RR 1 Rc←Ra XOR Rb

SEUR OB 1 PReg[Alpha]←[S]; S←S-1

SFC OI 1 call proc at [S]; S←S-1

SFCI OI 1 call proc at ([S])^

SHL ODB 1 [S]←FieldUnit[[S],0,AlphaBeta]

SHR ODB 1 [S]←FieldUnit[[S],[S],AlphaBeta]

SIFUR OB 1 PReg[Alpha]←[S]; S←S-1

SJ OI 1 PC←PC+[S]

SRIn LRB 16 ([L+n]+AlphaZ)^←[S]; S←S-1

SRn LR 16 [L+n]←[S]; S←S-1

SUB OI 1 [S-1]←[S-1]-[S]; S←S-1; trap on overflow

SUBB OB 1 [S]←[S]-AlphaZ; trap on overflow

WAI LRRB 1 (AuxRegs[BetaR]+AlphaZ)^←[L+BetaL]

WB OB 1 ([S]+AlphaZ)^←[S-1]; S←S-2

WRI LRRB 1 ([L+BetaR]+AlphaZ)^←[L+BetaL]

WSB OB 1 ([S-1]+AlphaZ)^←[S]; S←S-2

UDIV OI 1 [S-2],[S-1] ← [S-2],[S-1] / [S]; unsigned

UMUL OI 1 [S-1],[S] ← [S-1]*[S]; unsigned
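
To make the stack notation in the summary concrete, here is a minimal Python sketch of a few of the stack instructions. It models only the register stack and the S pointer. The class and method names, the stack size, and the use of an exception for the overflow trap are all assumptions of this sketch, and the real trap mechanism is not modeled.

```python
class OverflowTrap(Exception):
    """Stands in for the overflow trap; the real trap mechanism is not modeled."""


class StackModel:
    def __init__(self, size=16):
        self.regs = [0] * size   # register stack, addressed as [S], [S-1], ...
        self.S = 0               # index of the current top of stack

    def lib(self, alpha_z):      # LIB: [S+1]←AlphaZ; S←S+1
        self.regs[self.S + 1] = alpha_z
        self.S += 1

    def dup(self):               # DUP: [S+1]←[S]; S←S+1
        self.regs[self.S + 1] = self.regs[self.S]
        self.S += 1

    def exch(self):              # EXCH: [S+1]←[S]; [S]←[S-1]; [S-1]←[S+1]
        r, s = self.regs, self.S
        r[s + 1] = r[s]          # the slot above the top is used as scratch
        r[s] = r[s - 1]
        r[s - 1] = r[s + 1]

    def exdis(self):             # EXDIS: [S-1]←[S]; S←S-1
        self.regs[self.S - 1] = self.regs[self.S]
        self.S -= 1

    def add(self):               # ADD: [S-1]←[S]+[S-1]; S←S-1; trap on overflow
        total = self.regs[self.S] + self.regs[self.S - 1]
        if not -2**31 <= total < 2**31:
            raise OverflowTrap("ADD overflow")
        self.regs[self.S - 1] = total
        self.S -= 1
```

Note that EXCH as specified overwrites [S+1], which is why the model writes the scratch slot rather than swapping in place.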

Sample code sequences

Notation

In the following examples, FD[insert,mask,shift] denotes a field descriptor with indicated fields. RR format operations are written with the operands in the order Ra,Rb,Rc.

Packed sequence fetch/store

Assume the following Cedar/Mesa code:

r: REF TEXT; -- in local register Lr (32-bit maxLength in word 1)

i: INT; -- in local register Li

c: CHAR; -- in local register Lc

c ← r[i]; -- generates the following code {11 cycles, 26 bytes}

LRI Lr,1 -- push the word containing the bound

RBC Li,[S],[S] -- bounds check the index given (leave Li on stack)

SHR FD[0,30,30] -- make a word index

RADD [S],Lr,[S] -- get the word address (-1)

RB 2 -- fetch the word containing the desired char

LR Li -- push the character index

SHL FD[0,5,3] -- make it a bit index into the word

ADDDB FD[0,8,8] -- add in the rest of the field descriptor

SEUR Field -- set the Field register to control the shift

RFU [S]-,C0,Lc -- extract the char and store it to c

r[i] ← c; -- generates the following code {14 cycles, 32 bytes}

LRI Lr,1 -- push the word containing the bound

RBC Li,[S],[S] -- bounds check the index given (leave Li on stack)

SHR FD[0,30,30] -- make a word index

RADD [S],Lr,[S] -- get the word address (-1)

RSB 2 -- fetch the array word (leave addr on stack)

LIDB FD[1,32,24] -- push the base field desc

LR Li -- push the character index

SHL FD[0,5,3] -- make it a bit index into the word

SHR FD[1,10,5] -- also insert it into the mask position

SUB -- adjust the field descriptor

SEUR Field -- set the Field register to control the shift

RFU Lc,[S],[S] -- insert the char into the array word

WSB 2 -- store the changed word

Procedure body & call

AddFunny: PROC [x,y: INT] RETURNS [INT] = {

z: INT ← x+y;

IF z > 1 THEN z ← z + z;

RETURN [z];

};

Generates

EP 377B -- point L at x

RADD Lx,Ly,[S+1]+ -- z: INT ← x+y

RJLEB 6,[S],C1 -- IF z > 1 THEN

RADD Lz,Lz,Lz -- z ← z + z

ROR Lz,Lz,Lx -- RETURN [z]

RET 0 -- (S ← L+0)

u ← AddFunny[v, w] + 1;

Assume u,v,w are in locals Lu,Lv,Lw

Assume AddFunny to be called via DFC

Generates

LR Lv -- push v

LR Lw -- push w

DFC AddFunny -- AddFunny[v, w]

RADD [S-1]-,C1,Lu -- u ← ... + 1

Arithmetic precision changes

Multiple-precision arithmetic quantities are stored with higher order words in lower addresses, even within the register stack. That is, the word at [S-1] is more significant than the word at [S]. Double precision numbers show up in multiplication and division.

Extend 32-bit signed number on stack to 64-bit signed number on stack {3 cycles, 7 bytes}

DUP -- low-order word on top of stack

RUADD [S-1],[S],[S] -- carry bit ← sign bit & put garbage in [S-1]

RSUB C0,C0,[S-1] -- negate sign bit into high-order word, clear the carry
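
The carry trick in this sequence can be checked with a small Python sketch. It models only the arithmetic (adding a word to itself carries out the sign bit, and 0-0-carry yields an all-zeros or all-ones word), not the stack addressing. The function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def extend_32_to_64(x):
    """Return (high, low) words for the sign extension of the 32-bit word x."""
    x &= MASK32
    carry = (x + x) >> 32              # carry out of the unsigned add = sign bit of x
    high = (0 - 0 - carry) & MASK32    # 0 when carry = 0, all ones when carry = 1
    return high, x
```

The high word comes first in the result, matching the convention that the more significant word sits at [S-1].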

Narrow 64-bit signed number on stack to 32-bit signed number on stack {4 cycles, 10 bytes}

RUADD [S]-,[S],[S+1]+ -- carry ← sign bit of low order word, garbage at [S+1]

RADD [S-1],C0,[S+1]+ -- [S+1]+ ← sign bit plus high-order word

RBC [S]-,C1,[S] -- bounds check fault when [S] # 0, pop stack

EXDIS -- flush the high-order word

Extend 16-bit signed number on stack to 32-bit signed number on stack {4 cycles, 12 bytes}

RXOR [S+1]+,C0,[S] -- push NOT of the number

SHR FD[0,17,0] -- isolate high bits of number

ADDDB 100000B -- bump the sign, which carries through

ROR [S-1],[S]-,[S-1] -- OR back the extension

Narrow 32-bit signed number on stack to 16-bit signed number on stack {4 cycles, 10 bytes}

DUP

SHR FD[0,17,0] -- isolate the 17 high bits, no shift

ADDDB 100000B -- bump the sign, which carries through

RBC [S]-,C1,[S] -- bounds check fault when [S] # 0, pop stack

Recent Changes

23 Aug 84

definition fixes to BNDCK & RBC

... to fix up carelessness.

22 Aug 84

RBC added

... due to interesting uses in arithmetic changes and other special bounds checking.

21 Aug 84

Cleanup pass

... to fix bugs in descriptions. Arithmetic precision changes were added.

MUL & DIV -> MUL, UMUL, RDIV, MDIV, UDIV

... since we think we need signed & unsigned versions of these operations. The difference between RDIV and MDIV is subtle (based on sign of remainder), and may not be in the final machine.
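
The remainder-sign difference can be illustrated with a short Python sketch. It assumes MDIV behaves like floored division (remainder takes the sign of the divisor) and RDIV like truncated division (remainder takes the sign of the dividend), as the summary entries suggest. It ignores the double-word dividend and the stack placement of the results, and the function names are only labels for the two conventions.

```python
def mdiv(dividend, divisor):
    """Floored division: the remainder takes the sign of the divisor."""
    return divmod(dividend, divisor)


def rdiv(dividend, divisor):
    """Truncated division: the remainder takes the sign of the dividend."""
    q = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        q = -q
    return q, dividend - q * divisor
```

The two conventions agree whenever the dividend and divisor have the same sign or the division is exact; they differ only for inexact divisions with operands of opposite sign.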

24 Apr 84

Added timing estimates

... to aid in choice of instructions when hand coding.

6 Apr 84

Added MUL & DIV

... as place holders for the eventual instructions. The current assumption is that they will perform complete signed multiply & divide.

Added DJ & ASL

... to make the instruction set more complete. DJ is useful for filling up trap vectors and other long transfers. ASL is useful for cutting back the stack to a known value relative to L, which we do in exiting a block with extra stuff on the stack.

ARL -> AL

... to make the names a little more consistent.

21 Mar 84

Stack underflow trap added

... by arrangement between McCreight and Atkinson. This makes stack save/restore much easier to code (and faster as well).

16 Mar 84

SPR => SEUR & SIFUR, LPR => LEUR & LIFUR

... at the request of the IFU designers. Notice that this reverses a decision taken on 27 Feb 84. All instructions use identical decoding of Alpha, so EU registers and IFU registers share the same encoding space (see the declaration of ProcessorRegister in DragOpsCross).

15 Mar 84

REC dumped

... since it was largely useless.

9 Mar 84

0-byte instructions dumped

... since the IFU no longer needed them. This lets us determine the instruction length based on the top 3 bits of the opcode.

6 Mar 84

LILDB dumped

... since it complicated the IFU and is not likely to be a very high frequency instruction.

CST changed to OB format

... to simplify the IFU. This was made possible by an IFU change that allows [S-2] to be easily addressable.

1 Mar 84

IN & OUT replace MAP

... Alpha no longer designates the operation. These operations perform read & write operations with cache bypassing (I think).

27 Feb 84

LIFUR & LEUR => LPR, SIFUR & SEUR => SPR

... the location of the register is given by Alpha, not the opcode. The registers of interest are described under LPR.

RL => L

... which is in better agreement with S as a name.

RLADD & RLSUB added

... to support Lisp (& maybe Smalltalk) arithmetic. A new trap has been added for Lisp NaN.

21 Feb 84

Write Protect fault added

... so we could distinguish it from page fault.

RIF dumped

... since it only had a 1-byte advantage over two read instructions.

10 Feb 84

Code generation samples

... were added.

7 Feb 84

Field Descriptor mask field widened

To help with boundary conditions in generating field descriptors on the fly, the mask field of FieldDescriptor was widened to include 32. This also makes it follow the same convention as the shift field. We may wish to revisit both decisions when we see more cases.

NILCK dumped

NILCK is used to check for NIL pointers before they are dereferenced. We do not anticipate using NILCK often enough to warrant its inclusion. In cases where we would use it, inserting an extra fetch through the pointer will be quite sufficient. We can even afford to keep all of the low 64K of virtual memory unmapped to further reduce this problem.

SHL & SHR

Two useful shift operations, SHL and SHR, have been introduced to replace frequently occurring cases of FSDB followed by RFU.

6 Feb 84

SFCI replaces SFCB

SFCI replaces SFCB to save a byte. This increases the number of interfaces that can be called from a given global frame, since we can have 4 bytes of fetching preceding the SFCI. It also allows us to dump RIF if that becomes necessary.

RRX replaces RFX

Simple name change.

LCn replaces LIn

LCn now allows short access to the first 8 constants. LCn replaces LIn, which was limited to the first 5 constants.

LRRB bytes swapped

For more commonality with instructions that add AlphaZ before fetching (Curry's suggestion).

AND, OR, and XOR dumped

They are not used often enough to warrant separate instructions. RAND, ROR, and RXOR should be used instead.

Field Unit instructions changed (again)

The Field Unit instructions have become FSDB (Field Setup Double Byte) and RFU (Register Field Unit). The previous instructions (FUDB, FUI, RFUI) are eliminated. FSDB and RFU are sufficient to do everything we need, although they are not always the most compact or efficient means to do so. We may identify special cases later in the design.

3 Feb 84

EU register names (Field, MQ)

The old name Q is now Field; ICAND is now MQ.

Field Unit instructions changed

The Field Unit is now accessible to 3 instructions: FUDB, FUI, RFUI. These instructions all take two words of data, and take the field descriptor from either AlphaBeta or Field, and produce one word of output.