Dragon Instruction Set
To: DragonCore^.pa
From: Russ Atkinson, PARC - CSL
Subject: Dragon Instruction Set
Date: July 15, 1985
Path Name: /Indigo/Dragon/Documentation/DragOps.tioga
Last edited by: Russ Atkinson on July 16, 1985 11:58:36 am PDT
Abstract: This document describes the Dragon instruction set as seen by the machine code programmer and compiler builder.
Attributes: technical, Cedar, Dragon
Copyright © 1984, 1985 by Xerox Corporation. All rights reserved.
This document is a DRAFT, and most of the information in it is volatile. This document is also INCOMPLETE, and should be viewed with some suspicion. Revisions are currently being sent through Russ Atkinson.
Introduction
The Dragon is a multi-processor general purpose computer currently under development in the Computer Science Laboratory at Xerox PARC. Most of the parts will be fabricated in custom CMOS. The Dragon will have a shared address space of 2^32 32-bit words. It should have very high performance at relatively low cost.
Each Dragon processor has two logical parts, the IFU (instruction fetch unit) and the EU (execution unit). The IFU fetches instructions and directs the EU to perform arithmetic, logical, shifting, fetching, and storing operations. Each processor also has two memory caches, one for the IFU and one for the EU.
A Dragon word is 32 bits wide. In this document we follow the Mesa convention of numbering the bits from left to right, so that the most significant bit has index 0, and the least significant bit has index 31. This corresponds to the Mesa declaration:
Word: TYPE = PACKED ARRAY [0..32) OF BOOL;
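Because Mesa numbers bits from the most significant end, Mesa bit i of a word corresponds to a conventional right shift of 31 - i. The following minimal Python sketch shows the conversion; the helper names mesa_mask and mesa_bit are illustrative only and not part of any Dragon software.

```python
WORD_BITS = 32  # a Dragon word is 32 bits wide


def mesa_mask(index: int) -> int:
    """Mask selecting Mesa bit `index` (bit 0 = most significant)."""
    if not 0 <= index < WORD_BITS:
        raise IndexError("Mesa bit index must be in [0..32)")
    return 1 << (WORD_BITS - 1 - index)


def mesa_bit(word: int, index: int) -> int:
    """Value (0 or 1) of Mesa bit `index` in a 32-bit word."""
    return 1 if word & mesa_mask(index) else 0


print(hex(mesa_mask(0)), hex(mesa_mask(31)))  # 0x80000000 0x1
```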
The remainder of this document assumes some familiarity with the Mesa language, since we use it to describe data formats and the effect of instructions. There are some changes associated with putting Mesa on Dragon, detailed in [Indigo]<Dragon>Documentation>DragonMesaChanges.tioga.
EU state
The EU performs various operations on local registers, and communicates with the outside world through a cache attached to a bus (the M-bus). The EU has a 32-bit ALU, a Field Unit (FU) for shifting and other field operations, and a multiplier unit.
The EU has a set of registers that contain the most recent elements of the data stack for a process. There are also registers that contain constants, as well as special purpose quantities. This architecture permits most elementary operations involving local variables to be performed in 1 EU cycle. However, this architecture requires special attention to migrating the contents of a process stack between registers and memory.
The EU has the following registers:
Stack
denotes the 128 registers used for local variables in recent local frames.
Locals
denotes the 16 local registers, which are those registers in Stack used for the current local frame. The L register in the IFU indicates where Locals starts.
AuxRegs
denotes the 16 auxiliary registers. Most of these registers will be used for runtime support for higher-level languages (i.e. Cedar).
Constants
denotes the 12 registers normally used to hold constants. Although these registers are not really constant as far as the hardware is concerned, they are used to hold constants in the Cedar runtime. They are more general than the AuxRegs in that they can be used in more addressing modes.
Field (Field unit register)
denotes a special register that can be used to control the field unit operations. This is a separate register to allow the field unit to operate on two words from registers and have sufficient control bits.
MAR (Memory Address Register)
denotes the register which holds the address given in the last memory operation. It is used to report EU page faults.
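Locals is simply a 16-register window into Stack that starts at the IFU's L register, and the document states that arithmetic on S and L is performed modulo the size of the EU stack. A local register reference can therefore be modelled as below. This is a minimal Python sketch; the class and method names are hypothetical.

```python
STACK_SIZE = 128          # the 128 Stack registers
MASK32 = 0xFFFFFFFF       # registers hold 32-bit words


class EUStack:
    """Hypothetical model of Stack with the L and S indices."""

    def __init__(self) -> None:
        self.regs = [0] * STACK_SIZE
        self.S = 0            # 7-bit stack pointer
        self.L = 0            # 7-bit local frame base

    def local_index(self, n: int) -> int:
        """Stack index of Locals[n], for n in [0..16)."""
        if not 0 <= n < 16:
            raise IndexError("only 16 locals are directly addressable")
        return (self.L + n) % STACK_SIZE

    def read_local(self, n: int) -> int:
        return self.regs[self.local_index(n)]

    def write_local(self, n: int, value: int) -> None:
        self.regs[self.local_index(n)] = value & MASK32


eu = EUStack()
eu.L = 120                 # a frame near the top of the register file
eu.write_local(10, 42)     # lands in Stack[(120 + 10) mod 128] = Stack[2]
print(eu.local_index(10), eu.regs[2])   # 2 42
```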
IFU state
The IFU turns a stream of variable-length instructions into signals that control the EU. It also performs address calculations used for jumps, calls and returns.
The IFU contains a small stack of the most recent procedure contexts. A procedure context for the IFU is simply a program counter (PC) and an index into the EU stack giving the base of the local variables.
The IFU has the following accessible registers:
PC (Program Counter)
is the program counter (a byte address). Instruction space is limited to 2^32 bytes (roughly 4 gigabytes), even though the full Dragon address space is 2^32 words.
S (Stack pointer)
S is a 7-bit index into Stack. Many instructions take operands implicitly using the stack. Procedure calls use the stack for arguments and return values.
L (Local frame base)
L is a 7-bit index into Stack. The first 16 registers at or above L are easily addressable through Dragon instructions.
SLimit (Stack Limit)
gives the limit for S. A stack overflow occurs when S is incremented to reach or exceed SLimit.
EldestL
contains the eldest L in the IFU stack.
EldestPC
contains the eldest PC in the IFU stack. Reading this register removes the eldest entry, while writing this register adds a new eldest entry.
YoungestL
contains the youngest L in the IFU stack.
YoungestPC
contains the youngest PC in the IFU stack. This is useful in trap routines that need to examine code, or in cases where the trap routine wants to skip the offending instruction.
Status
has the rest of the abstract state of the IFU. It controls whether traps are enabled, whether a reschedule is waiting, and whether the processor is in kernel or user mode. Each status field has an accompanying bit to indicate whether the field should be set by a SIP instruction. The fields are (no positions yet assigned):
Traps {enabled, disabled}
Reschedule {clear, waiting}
Mode {kernel, user}
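The IFU stack behaves like a double-ended queue of (PC, L) contexts: calls and returns work at the youngest end, while trap handlers spill and refill at the eldest end through EldestPC and EldestL. The Python sketch below assumes a depth of 8 entries and treats each (PC, L) pair as a unit; both are assumptions, since the actual depth and the sequencing of EldestL and EldestPC accesses are not given here.

```python
from collections import deque

IFU_DEPTH = 8  # hypothetical depth; not given in this document


class IFUStack:
    def __init__(self, depth: int = IFU_DEPTH) -> None:
        self.depth = depth
        self.entries: deque = deque()  # left = eldest, right = youngest

    def call(self, return_pc: int, l: int) -> None:
        """A call saves the return PC and current L as the youngest entry."""
        # Simplification: the real IFU traps while some entries remain free.
        if len(self.entries) >= self.depth:
            raise OverflowError("IFU stack overflow")
        self.entries.append((return_pc, l))

    def ret(self) -> tuple:
        """A return takes back the youngest (PC, L)."""
        if not self.entries:
            raise IndexError("IFU stack underflow")
        return self.entries.pop()

    def youngest(self) -> tuple:
        """YoungestPC / YoungestL: inspect without removing."""
        return self.entries[-1]

    def read_eldest(self) -> tuple:
        """Reading EldestPC removes the eldest entry (spill to memory)."""
        return self.entries.popleft()

    def write_eldest(self, pc: int, l: int) -> None:
        """Writing EldestPC adds a new eldest entry (refill from memory)."""
        self.entries.appendleft((pc, l))


ifu = IFUStack()
ifu.call(0x100, 16)
ifu.call(0x200, 32)
spilled = ifu.read_eldest()   # (0x100, 16) moves out to memory
ifu.write_eldest(*spilled)    # ...and is later restored as the eldest entry
print(ifu.ret(), ifu.ret())   # (512, 32) (256, 16)
```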
Instruction formats
This section describes the various instruction formats. Note that the description [0..255] specifies an unsigned number occupying 8 bits, while [-128..127] specifies a signed number occupying 8 bits. The description of the instruction format gives tentative bit assignments, followed by a rough description of how the format is interpreted.
Arithmetic involving S or L will always be performed modulo the size of the EU stack, without detection of underflow or overflow. We will not further indicate this limitation of precision.
For convenience, we use the following abbreviations to refer to numbers obtained from bytes that follow the opcode byte:
Alpha is the first byte after the opcode. AlphaZ is Alpha extended to 32 bits with high-order 0s. AlphaS is Alpha sign-extended to 32 bits with the high-order bit of Alpha.
Beta is the second byte after the opcode. BetaZ is Beta extended to 32 bits with 0s. BetaS is Beta sign-extended to 32 bits with the high-order bit of Beta. When taken as a pair of four bit numbers, BetaL is the leftmost half of Beta, and BetaR is the rightmost half.
AlphaBeta is the unsigned unaligned halfword following the opcode byte, interpreted as Alpha*256 + Beta. AlphaBetaZ is AlphaBeta extended to 32 bits with 0s. AlphaBetaS is AlphaBeta sign-extended to 32 bits with the high-order bit of AlphaBeta.
Gamma is the third byte after the opcode. Delta is the fourth byte after the opcode.
AlphaBetaGammaDelta is the 32-bit unsigned unaligned quantity following the opcode byte. The number is interpreted as AlphaBeta*256*256 + Gamma*256 + Delta.
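The abbreviations above amount to a handful of byte-extraction rules. This minimal Python sketch assumes `code` is a bytes object holding the instruction stream and `pc` is the byte address of the opcode; the function names mirror the abbreviations but are otherwise hypothetical, and the opcode byte value used in the example is a placeholder.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide field to a 32-bit word (returned unsigned)."""
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & MASK32


def alpha_z(code: bytes, pc: int) -> int:
    return code[pc + 1]


def alpha_s(code: bytes, pc: int) -> int:
    return sign_extend(code[pc + 1], 8)


def beta_s(code: bytes, pc: int) -> int:
    return sign_extend(code[pc + 2], 8)


def beta_l(code: bytes, pc: int) -> int:
    return code[pc + 2] >> 4      # leftmost four bits of Beta


def beta_r(code: bytes, pc: int) -> int:
    return code[pc + 2] & 0xF     # rightmost four bits of Beta


def alpha_beta_z(code: bytes, pc: int) -> int:
    return code[pc + 1] * 256 + code[pc + 2]


def alpha_beta_s(code: bytes, pc: int) -> int:
    return sign_extend(alpha_beta_z(code, pc), 16)


def alpha_beta_gamma_delta(code: bytes, pc: int) -> int:
    # AlphaBeta*256*256 + Gamma*256 + Delta, i.e. big-endian order
    return int.from_bytes(code[pc + 1:pc + 5], "big")


code = bytes([0x00, 0xFF, 0x80, 0x12, 0x34])   # 0x00: placeholder opcode
print(alpha_z(code, 0), hex(alpha_s(code, 0)))                       # 255 0xffffffff
print(hex(alpha_beta_s(code, 0)), beta_l(code, 0), beta_r(code, 0))  # 0xffffff80 8 0
print(hex(alpha_beta_gamma_delta(code, 0)))                          # 0xff801234
```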
OI Operation Implicit
[op: [0..255]] -- 1 byte
For OI instructions the operand (if any) is implicit in the opcode.
OB Operation Byte
[op: [0..255], lit: [0..255]] -- 2 bytes
For OB instructions the operand is given by AlphaZ or AlphaS.
ODB Operation Double Byte
[op: [0..255], b0,b1: [0..255]] -- 3 bytes
For ODB instructions the operand is given by AlphaBetaZ or AlphaBetaS.
OQB Operation Quad Byte
[op: [0..255], b0,b1,b2,b3: [0..255]] -- 5 bytes
For OQB instructions the operand is AlphaBetaGammaDelta.
JBB Jump Byte Byte
[op: [0..255], lit: [0..255], dist: [-128..127]] -- 3 bytes
For JBB instructions the new PC is the byte address given by PC+BetaS. AlphaZ is used for comparison with the top of stack.
LR Local Register
[op: [0..15], reg: [0..15]] -- 1 byte
For LR instructions the operand is Locals[reg], which is either pushed to or popped from the stack.
LRB Local Register Byte
[op: [0..15], reg: [0..15], disp: [0..255]] -- 2 bytes
For these instructions, the operand is (Locals[reg]+AlphaZ)^, which is either pushed to or popped from the stack.
LRRB Local Register Register Byte
[op: [0..255], disp: [0..255], reg1,reg2: [0..15]] -- 3 bytes
For all LRRB instructions, reg1 (BetaL) indicates a local register. For RRI and WRI, reg2 (BetaR) indicates a local register as well. For RAI and WAI, reg2 indicates an auxiliary register.
RR Register to Register
[op: [0..255], aOpt,cOpt,bOpt,aux: BOOL, b: [0..15], c,a: [0..15]] -- 3 bytes
For these instructions the effect is roughly Rc←Ra op Rb. SE is used to indicate the effect of the instruction on S. SE←0 at the start of the instruction, and S←S+SE at the end of the instruction.
Ra: IF NOT aOpt
THEN IF aux THEN AuxRegs[a] ELSE Locals[a]
ELSE SELECT a FROM
< 12 => Constants[a]
12 => Stack[S];
13 => Stack[S-1];
14 => Stack[S]; SE←SE-1
15 => Stack[S-1]; SE←SE-1
Rb: IF NOT bOpt
THEN IF aux THEN AuxRegs[b] ELSE Locals[b]
ELSE SELECT b FROM
< 12 => Constants[b]
12 => Stack[S];
13 => Stack[S-1];
14 => Stack[S]; SE←SE-1
15 => Stack[S-1]; SE←SE-1
Rc: IF NOT cOpt
THEN IF aux THEN AuxRegs[c] ELSE Locals[c]
ELSE SELECT c FROM
< 12 => Constants[c]
12 => Stack[S];
13 => Stack[S-1];
14 => Stack[S+1]; SE←SE+1
15 => Stack[S+1]; SE←SE+1
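The decoding table above can be read as a function from (operand role, opt bit, aux bit, 4-bit field) to a register location plus a contribution to SE, with all operands located relative to the S in effect at the start of the instruction. The Python sketch below represents locations as hypothetical (bank, index) pairs. As a check, it decodes ADD's long form RADD [S-1], [S-1], [S]-, which by our reading of the table is a=13, b=14, c=13 with all three opt bits set; the result should pop one word.

```python
STACK_SIZE = 128   # S, L and Stack indices wrap modulo 128


def decode(role: str, opt: bool, aux: bool, field: int, S: int, L: int) -> tuple:
    """Return ((bank, index), se_delta) for Ra/Rb (role "a"/"b") or Rc ("c")."""
    if not opt:
        if aux:
            return ("AuxRegs", field), 0
        return ("Stack", (L + field) % STACK_SIZE), 0    # Locals[field]
    if field < 12:
        return ("Constants", field), 0
    if field == 12:
        return ("Stack", S % STACK_SIZE), 0
    if field == 13:
        return ("Stack", (S - 1) % STACK_SIZE), 0
    if role == "c":                                      # 14 and 15 both push
        return ("Stack", (S + 1) % STACK_SIZE), +1
    offset = 0 if field == 14 else -1                    # 14 pops [S], 15 pops [S-1]
    return ("Stack", (S + offset) % STACK_SIZE), -1


S, L = 10, 0
ra, se_a = decode("a", True, False, 13, S, L)
rb, se_b = decode("b", True, False, 14, S, L)
rc, se_c = decode("c", True, False, 13, S, L)
new_S = (S + se_a + se_b + se_c) % STACK_SIZE           # S←S+SE at the end
print(ra, rb, rc, new_S)   # ('Stack', 9) ('Stack', 10) ('Stack', 9) 9
```

The result matches the ADD description: [S-1]←[S]+[S-1]; S←S-1.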
QR Quick Register
[op: [0..255], spare0,spare1,bOpt,aux: BOOL, b: [0..15]] -- 2 bytes
For these instructions the effect is roughly [S]←[S] op Rb, where Rb is decoded as for RR format (this includes the stack effect). This format is the "quick" register format because it takes up less space than the RR format, while covering many of the same cases, and we want to emphasize that shorter instructions are quicker, since more can fit in the IFU cache than the longer variety.
Actually, we just wanted a word that began with Q.
RJB Register Jump Byte
[op: [0..255], sdd,sd,opt,aux: BOOL, reg: [0..15], dist: [-128..127]] -- 3 bytes
For these instructions the effect is to jump to the byte PC given by PC+BetaS if the indicated comparison of Ra with Rb is true. The comparison is always made involving either [S] or [S-1] with a general register (specified as in the RR format). The stack effect, SE, is used as in the RR format description.
IF sdd THEN SE←SE-1
Ra: IF sd THEN Stack[S-1] ELSE Stack[S]
Rb: IF NOT opt
THEN IF aux THEN AuxRegs[reg] ELSE Locals[reg]
ELSE SELECT reg FROM
< 12 => Constants[reg]
12 => Stack[S];
13 => Stack[S-1];
14 => Stack[S]; SE←SE-1
15 => Stack[S-1]; SE←SE-1
Instruction descriptions
The following is a list of the instructions currently planned for the first version of the Dragon IFU. This instruction set is relatively sparse, to leave room for more advanced IFUs.
Notation
[x] means the contents of EU register x. Hence [S-1] is the second word on the stack (arithmetic within square brackets is performed modulo the stack size).
S is the stack pointer and L is the local frame base pointer (so [L+1] is local register 1).
(y)^ is the contents of memory location y.
Loads and Stores
The Dragon convention is that a load places a word on the stack from a source, and a store takes a word from the stack into a destination.
Name format opcodes
DUP OI 1
DUPlicate. [S+1]←[S]; S←S+1.
EXDIS OI 1
EXCHange discard. [S-1]←[S]; S←S-1.
LCn OI 8
Load Constant n. [S+1]←Constants[n]; S←S+1.
n indicates one of the first 8 constant registers. The full complement of 12 instructions was not provided, since the frequency of use of the last 4 constants is not sufficient to justify using up 4 more opcodes (sheer speculation, really).
LIB OB 1
Load Immediate Byte. [S+1]←AlphaZ; S←S+1.
LIDB ODB 1
Load Immediate Double Byte. [S+1]←AlphaBetaZ; S←S+1.
LIQB OQB 1
Load Immediate Quad Byte. [S+1]←AlphaBetaGammaDelta; S←S+1.
LIQB is useful for loading 32-bit constants. Note that the byte order in the instruction stream is reversed from the byte order on the stack.
LRn LR 16
Load Register n. [S+1]←[L+n]; S←S+1.
SRn LR 16
Store Register n. [L+n]←[S]; S←S-1.
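The load and store instructions above only move words between Stack, the local registers, and literals, adjusting S by one. A minimal Python sketch of their effect follows; the class and method names are hypothetical, and index arithmetic wraps modulo 128 as for S and L.

```python
STACK_SIZE = 128          # EU Stack registers; S and L wrap modulo 128
MASK32 = 0xFFFFFFFF


class LoadStoreModel:
    """Toy model of DUP, EXDIS, LIB, LRn and SRn (names are hypothetical)."""

    def __init__(self, S: int = 0, L: int = 0) -> None:
        self.stack = [0] * STACK_SIZE
        self.S, self.L = S, L

    def _push(self, value: int) -> None:      # [S+1]←value; S←S+1
        self.S = (self.S + 1) % STACK_SIZE
        self.stack[self.S] = value & MASK32

    def _pop(self) -> int:                    # read [S]; S←S-1
        value = self.stack[self.S]
        self.S = (self.S - 1) % STACK_SIZE
        return value

    def dup(self) -> None:                    # DUP: [S+1]←[S]; S←S+1
        self._push(self.stack[self.S])

    def exdis(self) -> None:                  # EXDIS: [S-1]←[S]; S←S-1
        top = self._pop()
        self.stack[self.S] = top

    def lib(self, alpha: int) -> None:        # LIB: [S+1]←AlphaZ; S←S+1
        self._push(alpha & 0xFF)

    def lr(self, n: int) -> None:             # LRn: [S+1]←[L+n]; S←S+1
        self._push(self.stack[(self.L + n) % STACK_SIZE])

    def sr(self, n: int) -> None:             # SRn: [L+n]←[S]; S←S-1
        self.stack[(self.L + n) % STACK_SIZE] = self._pop()


m = LoadStoreModel(S=20, L=16)
m.lib(7)      # S = 21, [21] = 7
m.sr(2)       # [L+2] = [18] = 7, S = 20
m.lr(2)       # S = 21, [21] = 7
m.dup()       # S = 22, [22] = 7
print(m.S, m.stack[18])   # 22 7
```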
Reads & Writes
Reads & Writes perform a single memory access through the EU cache. Due to the structure of the EU, one gets a free add of a register and a constant (or two registers) for the address before sending a fetch request through to the cache. For writes, one can only get a free add of a register and a constant for the address.
Name format opcodes
LGF ODB 1
Load Global Frame. [S+1]←(AuxRegs[0]+AlphaBetaZ)^; S←S+1.
This operation may be used to load global frames. Unfortunately we have to have a global frame table, but at least it is quite large (64K frames). The decision to use this instruction is not entirely final, since LIQB may be preferable.
LRIn LRB 16
Load Register Indirect n. [S+1]←([L+n]+AlphaZ)^; S←S+1.
PSB OB 1
Put Swapped using Byte offset. ([S-1]+AlphaZ)^←[S]; S←S-1.
QRX QR 1
Quick Read indeXed. [S]←([S]+Rb)^. QRX is a short form for RRX [S],[S],Rb.
RAI LRRB 1
Read Auxiliary register Indirect. [L+BetaL]←(AuxRegs[BetaR]+AlphaZ)^.
RB OB 1
Read using Byte offset. [S]←([S]+AlphaZ)^.
RRI LRRB 1
Read Register Indirect. [L+BetaL]←([L+BetaR]+AlphaZ)^.
RRX RR 1
Register Read indeXed. Rc←(Ra+Rb)^.
RSB OB 1
Read Save using Byte offset. [S+1]←([S]+AlphaZ)^; S←S+1.
SRIn LRB 16
Store Register Indirect n. ([L+n]+AlphaZ)^←[S]; S←S-1.
WAI LRRB 1
Write Auxiliary register Indirect. (AuxRegs[BetaR]+AlphaZ)^←[L+BetaL].
WB OB 1
Write using Byte offset. ([S]+AlphaZ)^←[S-1]; S←S-2.
WRI LRRB 1
Write Register Indirect. ([L+BetaR]+AlphaZ)^←[L+BetaL].
WSB OB 1
Write Swapped using Byte offset. ([S-1]+AlphaZ)^←[S]; S←S-2.
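The stack-relative read and write forms differ mainly in which of [S] and [S-1] holds the pointer and how many words are popped. The minimal Python sketch below assumes a word-addressed memory modelled as a dict and models the stack as a Python list whose last element is [S]; both are modelling choices, not Dragon structures.

```python
MASK32 = 0xFFFFFFFF


def rb(st: list, mem: dict, alpha: int) -> None:
    """RB: [S]←([S]+AlphaZ)^. The pointer is replaced by the data."""
    st[-1] = mem[(st[-1] + alpha) & MASK32]


def rsb(st: list, mem: dict, alpha: int) -> None:
    """RSB: [S+1]←([S]+AlphaZ)^; S←S+1. The pointer is kept."""
    st.append(mem[(st[-1] + alpha) & MASK32])


def wb(st: list, mem: dict, alpha: int) -> None:
    """WB: ([S]+AlphaZ)^←[S-1]; S←S-2. Pointer on top, data beneath."""
    ptr = st.pop()
    data = st.pop()
    mem[(ptr + alpha) & MASK32] = data


def wsb(st: list, mem: dict, alpha: int) -> None:
    """WSB: ([S-1]+AlphaZ)^←[S]; S←S-2. Data on top, pointer beneath."""
    data = st.pop()
    ptr = st.pop()
    mem[(ptr + alpha) & MASK32] = data


def psb(st: list, mem: dict, alpha: int) -> None:
    """PSB: ([S-1]+AlphaZ)^←[S]; S←S-1. The pointer stays for reuse."""
    data = st.pop()
    mem[(st[-1] + alpha) & MASK32] = data


mem: dict = {}
st = [1000, 5]
psb(st, mem, 0)   # (1000+0)^ ← 5; pointer 1000 left on the stack
st.append(6)
wsb(st, mem, 1)   # (1000+1)^ ← 6; both words popped
st.append(1000)
rsb(st, mem, 1)   # push (1000+1)^; the stack is now [1000, 6]
print(mem, st)    # {1000: 5, 1001: 6} [1000, 6]
```

PSB leaves the pointer in place, so a run of stores into one record needs only one pointer push.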
Arithmetic and logical
All arithmetic is performed using 32-bit 2s-complement arithmetic as a basis, although there are different operations depending on the treatment of Carry and the testing for overflow. The four kinds of addition/subtraction arithmetic are:
Vanilla - the Carry bit is neither used nor set. No overflow checking is performed, and no traps can result. This is the kind of arithmetic that Mesa requires for CARDINAL arithmetic (or for unchecked signed arithmetic).
Signed - the Carry bit is used for addition/subtraction, and set to 0 always; overflow causes an integer overflow trap. Overflow is defined to occur for signed arithmetic if the numerical result is not in the range [-2^31..2^31).
Note that the notation A-B-Carry is an abbreviation for A+~B+~Carry, where ~B is the 32-bit complement of B, and ~Carry is the 1-bit complement of Carry. The notation A-B, where A and B are 32-bit quantities, denotes A+~B+1.
Unsigned - the Carry bit is used for addition/subtraction, and set by the carry out of the adder; there is no trap. This kind of operation is used for the lower-order words of multiple precision integer arithmetic. It is also the only way to set Carry.
Lisp - the Carry bit is not used as an input, and is set to 0 always; if either of the operands or the result is not in the range [-2^28..2^28) (the sign bit and the three bits below it being all 0s or all 1s), a Lisp NaN (Not a Number) trap is taken.
If an integer overflow trap or Lisp NaN trap is taken, the trapping instruction has not had any side effects, so no information has been lost.
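The four kinds of arithmetic can be compared side by side. In this minimal Python sketch, words are unsigned ints in [0, 2^32), traps are modelled as exceptions raised before any result is produced, and unsigned_sub follows the A+~B+~Carry definition with Carry←~CarryOut. All function and exception names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


class IntegerOverflowTrap(Exception):
    pass


class LispNaNTrap(Exception):
    pass


def to_signed(w: int) -> int:
    """Interpret a 32-bit word as a 2s-complement integer."""
    return w - (1 << 32) if w & 0x80000000 else w


def vanilla_add(a: int, b: int) -> int:
    """Carry neither used nor set; wraps silently."""
    return (a + b) & MASK32


def signed_add(a: int, b: int, carry: int) -> tuple:
    """Carry used, then cleared; trap unless result is in [-2^31..2^31)."""
    total = to_signed(a) + to_signed(b) + carry
    if not -(1 << 31) <= total < (1 << 31):
        raise IntegerOverflowTrap()
    return total & MASK32, 0


def unsigned_add(a: int, b: int, carry: int) -> tuple:
    """Carry used, then set from the adder's carry out; never traps."""
    total = a + b + carry
    return total & MASK32, total >> 32


def unsigned_sub(a: int, b: int, carry: int) -> tuple:
    """A-B-Carry = A+~B+~Carry; Carry←~CarryOut (Carry = 1 means borrow)."""
    total = a + (~b & MASK32) + (1 - carry)
    return total & MASK32, 1 - (total >> 32)


def lisp_add(a: int, b: int) -> tuple:
    """Carry ignored and cleared; NaN trap if any value leaves [-2^28..2^28)."""
    sa, sb = to_signed(a), to_signed(b)
    total = sa + sb
    if not all(-(1 << 28) <= v < (1 << 28) for v in (sa, sb, total)):
        raise LispNaNTrap()
    return total & MASK32, 0


# Double-precision add of 0x00000001_FFFFFFFF and 1: low words first.
lo, c = unsigned_add(0xFFFFFFFF, 0x00000001, 0)
hi, c = signed_add(0x00000001, 0x00000000, c)
print(hex(hi), hex(lo))   # 0x2 0x0
```

The low word uses Unsigned arithmetic so that its carry out feeds the Signed add of the high word, which then checks the full result for overflow.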
Name format opcodes
ADD OI 1
Add. [S-1]←[S]+[S-1]+Carry; Carry←0; S←S-1. Trap on integer overflow. ADD is a short form for RADD [S-1], [S-1], [S]-.
ADDB OB 1
Add Byte. [S]←[S]+AlphaZ+Carry; Carry←0. Trap on integer overflow.
ADDDB ODB 1
Add Double Byte. [S]←[S]+AlphaBetaZ+Carry; Carry←0. Trap on integer overflow.
BC OI 1
Bounds Check. Bounds check trap if [S-1] < 0 OR [S-1]-[S] >= 0; S←S-1. Carry not used or set. BC is an abbreviation for RBC [S-1], [S-1], [S]-.
LADD OI 1
Lisp Add. [S-1]←[S]+[S-1]; Carry←0; S←S-1. Trap on Lisp NaN. LADD is an abbreviation for RLADD [S-1], [S-1], [S]-.
LSUB OI 1
Lisp Subtract. [S-1]←[S-1]-[S]; Carry←0; S←S-1. Trap on Lisp NaN. LSUB is an abbreviation for RLSUB [S-1], [S-1], [S]-.
QADD QR 1
Quick Add. [S]←[S]+Rb+Carry; Carry←0. Trap on integer overflow. QADD is a short form for RADD [S], [S], Rb.
QAND QR 1
Quick And. [S]←[S] AND Rb. QAND is a short form for RAND [S], [S], Rb.
QBC QR 1
Quick Bounds Check. Bounds Check trap if [S] < 0 OR [S] - Rb >= 0. Carry not used or set. QBC is an abbreviation for RBC [S], [S], Rb.
QLADD QR 1
Quick Lisp Add. [S]←[S]+Rb; Carry←0. Trap on Lisp NaN. QLADD is a short form for RLADD [S], [S], Rb.
QLSUB QR 1
Quick Lisp Subtract. [S]←[S]-Rb; Carry←0. Trap on Lisp NaN. QLSUB is a short form for RLSUB [S], [S], Rb.
QOR QR 1
Quick Or. [S]←[S] OR Rb. QOR is an abbreviation for ROR [S], [S], Rb.
QSUB QR 1
Quick Subtract. [S]←[S]-Rb-Carry; Carry←0. Trap on integer overflow. QSUB is an abbreviation for RSUB [S], [S], Rb.
RADD RR 1
Register Add. Rc←Ra+Rb+Carry; Carry←0. Trap on integer overflow.
RAND RR 1
Register And. Rc←Ra AND Rb.
RBC RR 1
Register Bounds Check. Bounds Check trap if Ra < 0 or Ra - Rb >= 0; Rc←Ra. Carry not used or set.
This instruction is used to check indexes against bounds. It is also used when a number is narrowed to fit into a subrange. Note that if Rb = FIRST[INT], RBC will fault when Ra < 0, which is useful for assignments between INT and CARD.
RLADD RR 1
Register Lisp Add. Rc←Ra+Rb; Carry←0. Trap on Lisp NaN.
RLSUB RR 1
Register Lisp Subtract. Rc←Ra-Rb; Carry←0. Trap on Lisp NaN.
ROR RR 1
Register Or. Rc←Ra OR Rb.
RSUB RR 1
Register Subtract. Rc←Ra-Rb-Carry; Carry←0. Trap on integer overflow.
RUADD RR 1
Register Unsigned Add. Rc←Ra+Rb+Carry; Carry←CarryOut. No trap.
RUSUB RR 1
Register Unsigned Subtract. Rc←Ra-Rb-Carry; Carry←~CarryOut. No trap.
RVADD RR 1
Register Vanilla Add. Rc←Ra+Rb; Carry not used or set. No trap.
RVSUB RR 1
Register Vanilla Subtract. Rc←Ra-Rb; Carry not used or set. No trap.
RXOR RR 1
Register Xor. Rc←Ra XOR Rb.
SUB OI 1
Subtract. [S-1]←[S-1]-[S]-Carry; Carry←0; S←S-1. Trap on integer overflow. SUB is a short form for RSUB [S-1], [S-1], [S]-.
SUBB OB 1
Subtract Byte. [S]←[S]-AlphaZ-Carry; Carry←0. Trap on integer overflow.
SUBDB ODB 1
Subtract Double Byte. [S]←[S]-AlphaBetaZ-Carry; Carry←0. Trap on integer overflow.
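The bounds check used by RBC, BC and QBC, and the FIRST[INT] trick noted under RBC, can be checked with a short Python sketch. It assumes, consistent with the notation A-B = A+~B+1, that Ra - Rb is formed modulo 2^32 and only its sign bit is tested, rather than being checked for overflow. The names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
FIRST_INT = 0x80000000        # FIRST[INT] as a 32-bit pattern


class BoundsCheckTrap(Exception):
    pass


def to_signed(w: int) -> int:
    return w - (1 << 32) if w & 0x80000000 else w


def rbc(ra: int, rb: int) -> int:
    """Trap if Ra < 0 OR Ra - Rb >= 0; otherwise Rc←Ra."""
    diff = (ra - rb) & MASK32             # 32-bit wrapped difference
    if to_signed(ra) < 0 or to_signed(diff) >= 0:
        raise BoundsCheckTrap(hex(ra))
    return ra


def traps(ra: int, rb: int) -> bool:
    try:
        rbc(ra, rb)
        return False
    except BoundsCheckTrap:
        return True


print(traps(3, 10), traps(10, 10), traps(0xFFFFFFFF, 10))          # False True True
# With Rb = FIRST[INT], only negative Ra values fault:
print(traps(0x7FFFFFFF, FIRST_INT), traps(0x80000000, FIRST_INT))  # False True
```

With Rb = FIRST[INT], Ra - Rb wraps to a negative word for every non-negative Ra, so only the Ra < 0 test can fire, which is what an INT to CARD assignment needs.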
Field unit operations
The field unit allows shifting, rotation, insertion, and masking of fields. It takes two words, and produces a one word result, all under the control of a field descriptor, which can be supplied either through a 16-bit constant, or through the Field register in the EU. The Field register can be set either by the FSDB instruction, or the SIP Field instruction. For more details, see the Field Unit section.
Name format opcodes
FSDB ODB 1
Field Setup Double Byte. Field←AlphaBeta+[S]; S←S-1. Carry is not used or set.
Stores AlphaBeta+[S] to Field to set up the field descriptor for a field unit operation. This instruction is equivalent to (but smaller and faster than) the two instructions:
ADDDB AlphaBeta; SIP Field.
RFU RR 1
Register Field Unit. Rc←FieldUnit[Ra, Rb, Field].
This is a general shift operation, including extract, insert, and shift. The Field register supplies the field descriptor.
SHD ODB 1
Shift Double. [S-1]←FieldUnit[[S-1], [S], AlphaBeta]; S←S-1.
This operation shifts and masks single words according to the field descriptor in AlphaBeta. It is especially useful for isolating arbitrary fields from a pair of words, or inserting fields. SHD does not affect the Field register.
SHL ODB 1
Shift Left. [S]←FieldUnit[[S], 0, AlphaBeta].
This operation shifts and masks single words according to the field descriptor in AlphaBeta. It is especially useful for shifting left, although it can also take any right justified field in a word and shift it left. SHL does not affect the Field register.
SHR ODB 1
Shift Right. [S]←FieldUnit[[S], [S], AlphaBeta].
This operation shifts and masks single words according to the field descriptor in AlphaBeta. It is especially useful for shifting right, rotating words, and extracting fields. SHR does not affect the Field register.
Arithmetic Unit instructions
The plan is to have an Arithmetic accelerator (AU) available to the IFU, which will provide single and double precision fixed and floating point multiply and divide as well as other floating point operations. If no AU is present in a particular cluster, the following two instructions will trap to software implementations. Currently, the AU is thought to have two readable and writable double precision input registers, a readable and writable control word containing mask, status, mode and function information, and a readable double precision result register. The AU automatically computes the result corresponding to the two operands and the control word, and starts over every time any part of its input state changes. If and when the result words are read, it may delay the read using Reject or report trouble using Fault. The 5 input words are the only state that needs to be saved and restored across a process switch.
Name format opcodes
AUMV OB 1
AU Move. Alpha (which becomes the DPBus command) specifies the movement of data between the AU and the EU stack. The exact encoding is tentative and may be found in AUDoc.tioga. The EU stack may grow by 1, remain the same or shrink by 1 or 2 as a result of this instruction. The hardware implementation of this instruction may be delayed by DPBus Reject and may generate an AUFault.
AUOP ODB 1
AU Op. This instruction is identical to AUMV except that Beta is used to specify a new value for the function portion of the AU's control word.
IFU index adjusting instructions
The registers used for a local frame are bounded by the 7-bit IFU registers L and S. It is useful to be able to adjust either L or S relative to either L or S.
Name format opcodes
AL OB 1
Add to L. L←L+Alpha.
This instruction is especially useful for saving and restoring stack frames, but may have other exotic uses.
ALS OB 1
Add to L from Stack. L←S+Alpha.
This instruction is used at procedure entry to establish the base of the local frame.
AS OB 1
Add to Stack. S←S+Alpha.
Stack overflow is not checked by this instruction, so it is primarily useful for discarding words from the stack.
ASL OB 1
Add to Stack from L. S←L+Alpha.
Stack overflow is not checked by this instruction, so it is primarily useful for discarding words from the stack when the distance from L is known, but S could be at any level.
DIS OI 1
Discard. S←S-1. DIS is a frequently used short form for AS 255.
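Because S and L are 7-bit indices, adding an unsigned Alpha byte is equivalent to adding Alpha mod 128, which is how an unsigned byte can move S or L downward. The Python sketch below (function names hypothetical) shows that DIS, defined as AS 255, really is S←S-1. The ALS example for a three-argument procedure is an illustrative assumption about how a compiler might use ALS; any Alpha congruent to -2 mod 128 would do.

```python
STACK_SIZE = 128   # S and L are 7-bit indices into Stack


def al(S: int, L: int, alpha: int) -> tuple:    # AL:  L←L+Alpha
    return S, (L + alpha) % STACK_SIZE


def als(S: int, L: int, alpha: int) -> tuple:   # ALS: L←S+Alpha
    return S, (S + alpha) % STACK_SIZE


def as_(S: int, L: int, alpha: int) -> tuple:   # AS:  S←S+Alpha
    return (S + alpha) % STACK_SIZE, L


def asl(S: int, L: int, alpha: int) -> tuple:   # ASL: S←L+Alpha
    return (L + alpha) % STACK_SIZE, L


# DIS is AS 255: 255 = 2*128 - 1, so adding it is the same as subtracting 1.
print(as_(0, 0, 255), as_(50, 0, 255))   # (127, 0) (49, 0)
# Three arguments in Stack[40..42] (S = 42): ALS 254 makes L point at the first.
print(als(42, 7, 254))                   # (42, 40)
```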
Unconditional jumps
Name format opcodes
JB  OB 1
Jump using Byte offset. PC←PC+AlphaS.
JDB ODB 1
Jump using Double Byte offset. PC←PC+AlphaBetaS.
JQB  OQB 1
Jump using Quad Byte offset. PC←AlphaBetaGammaDelta.
Jn  O* 4
n IN {1, 2, 3, 5}. Jumps to byte address PC+n. These jump instructions are implemented as null operations for speed. J4 is not an opcode because there are no 4 byte opcodes.
SJ  OI 1
Stack Jump. PC←PC+[S]. S←S-1. This operation is useful for computed jumps.
Conditional jumps
Each of the following conditional jumps takes two opcodes: one where the prediction is to not jump (no trailing J to the opcode), and one where the prediction is to jump (a trailing J to the opcode). Comparison treats the two numbers as 32-bit signed numbers, unsigned comparisons being judged less common.
Name format opcodes
JEBBj JBB 2
Jump Equal Byte Byte. IF AlphaZ = [S] THEN PC←PC+BetaS. S←S-1.
JNEBBj JBB 2
Jump Not Equal Byte Byte. IF AlphaZ#[S] THEN PC←PC+BetaS. S←S-1.
RJEBj RJB 2
Register Jump Equal Byte. IF Ra=Rb THEN PC←PC+BetaS.
RJGBj RJB 2
Register Jump Greater Byte. IF Ra>Rb THEN PC←PC+BetaS.
RJGEBj RJB 2
Register Jump Greater Equal Byte. IF Ra>=Rb THEN PC←PC+BetaS.
RJLBj RJB 2
Register Jump Less Byte. IF Ra<Rb THEN PC←PC+BetaS.
RJLEBj RJB 2
Register Jump Less Equal Byte. IF Ra<=Rb THEN PC←PC+BetaS.
RJNEBj RJB 2
Register Jump Not Equal Byte. IF Ra#Rb THEN PC←PC+BetaS.
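A minimal Python sketch of the conditional jump semantics follows. It reads PC as the address of the jump instruction itself, which agrees with the Jn description above, where an n-byte Jn targets PC+n (the next instruction). It assumes JEBB pops [S] whether or not the jump is taken, and it ignores the predicted (j) variants, which by the timing estimates differ in cost rather than outcome. The names are hypothetical.

```python
def to_signed(w: int) -> int:
    return w - (1 << 32) if w & 0x80000000 else w


def sign_extend8(byte: int) -> int:
    return byte - 256 if byte & 0x80 else byte


def jebb(pc: int, stack: list, alpha: int, beta: int) -> int:
    """JEBB: IF AlphaZ = [S] THEN PC←PC+BetaS; S←S-1. Returns the next PC."""
    top = stack.pop()                  # the comparand is popped either way
    if top == alpha:                   # AlphaZ is zero-extended
        return pc + sign_extend8(beta)
    return pc + 3                      # fall through past the 3-byte JBB


RJ_TESTS = {
    "RJEB": lambda a, b: a == b, "RJNEB": lambda a, b: a != b,
    "RJLB": lambda a, b: a < b, "RJLEB": lambda a, b: a <= b,
    "RJGB": lambda a, b: a > b, "RJGEB": lambda a, b: a >= b,
}


def rj_taken(name: str, ra: int, rb: int) -> bool:
    """Register jumps compare Ra and Rb as 32-bit signed numbers."""
    return RJ_TESTS[name](to_signed(ra), to_signed(rb))


print(jebb(100, [5], 5, 0xF0), jebb(100, [6], 5, 0xF0))   # 84 103
print(rj_taken("RJLB", 0xFFFFFFFF, 1))                     # True
```

The last line shows why signedness matters: 0xFFFFFFFF is -1, so it is less than 1, whereas an unsigned comparison would say the opposite.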
Calls and returns
Name format opcodes
DFC OQB 1
Direct Function Call. Calls procedure at AlphaBetaGammaDelta.
KFC OI 1
Kernel Function Call. [S+1]←Status; PC←InstTrap[KFC]; set kernel mode; S←S+1. This instruction is used by user mode code to call kernel operations.
LFC ODB 1
Local Function Call. Calls procedure at PC+AlphaBetaS.
SFC OI 1
Stack Function Call. Calls procedure via PC←[S]; S←S-1. The return PC and current L are saved in the IFU stack (possibly causing IFU stack overflow).
SFCI OI 1
Stack Function Call Indirect. Calls procedure via PC←([S])^. The return PC and current L are saved in the IFU stack (possibly causing IFU stack overflow).
RET OB 1
Return. Returns to (PC, L) saved in IFU stack (stack underflow trap if IFU stack is empty). The stack is adjusted to S←L+Alpha.
RETK OB 1
Return from Kernel. Status←[S]; S←L+AlphaZ; returns to (PC, L) saved in IFU stack (stack underflow trap if IFU stack is empty). This instruction is used by kernel mode code to return to user mode code, and to return from the reschedule trap while atomically enabling traps. An instruction trap occurs if this instruction is executed in user mode.
RETN OI 1
Return No adjustment. Returns to (PC, L) saved in IFU stack (stack underflow trap if IFU stack is empty). There is no change to S.
Special operations
Name format opcodes
CST OB 1
Conditional Store. Takes 3 words of data on the stack: ptr = [S-2], new = [S-1], old = [S]. [S+1]←(ptr+AlphaZ)^ {hold}; IF [S+1] = old THEN (ptr+AlphaZ)^←new ELSE []←(ptr+AlphaZ)^ {release hold}; S←S+1.
CST is used to achieve atomic modification of storage that is shared between multiple processors. The sampled value is pushed onto the stack to be tested against the old value. If they are equal the store was successful in updating the designated word.
IN OB 1
Input operation. [S]←([S]+AlphaZ)^; bypasses cache. Details to be supplied.
LIP OB 1
Load from Internal Processor register. [S+1]←PReg[Alpha]; S←S+1.
LIP reads data from the EU or IFU register indicated by the Alpha byte to the stack. Register assignments are given by DragOpsCross.ProcessorRegister.
OUT OB 1
Output operation. ([S]+AlphaZ)^←[S-1]; S←S-2; bypasses cache. Details to be supplied.
PIN ODB 1
Pbus Input operation. [S]←PBus[Alpha,[S+BetaZ]]. Details to be supplied. An instruction trap occurs if this instruction is executed in user mode.
POUT ODB 1
Pbus Output operation. [S]←PBus[Alpha,[S+BetaZ]]; S←S-1. Details to be supplied. An instruction trap occurs if this instruction is executed in user mode.
SIP OB 1
Store to Internal Processor register. PReg[Alpha]←[S]; S←S-1.
SIP stores a word from the top of stack into a designated EU or IFU register. Register assignments are given by DragOpsCross.ProcessorRegister. An instruction trap occurs if this instruction is executed in user mode.
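The effect of CST described above, together with the usual retry loop for an atomic update, can be sketched in Python. Here a threading.Lock stands in for the hardware hold on the memory word, the Memory class is hypothetical, and unwritten words are assumed to read as 0; this illustrates the semantics, not the bus protocol.

```python
import threading

MASK32 = 0xFFFFFFFF


class Memory:
    """Hypothetical shared word memory; the lock models CST's hold."""

    def __init__(self) -> None:
        self.words: dict = {}
        self.hold = threading.Lock()


def cst(mem: Memory, ptr: int, new: int, old: int, alpha: int = 0) -> int:
    """Sample (ptr+AlphaZ)^ under hold; store `new` only if it equals `old`."""
    addr = (ptr + alpha) & MASK32
    with mem.hold:
        sampled = mem.words.get(addr, 0)
        if sampled == old:
            mem.words[addr] = new & MASK32
    return sampled          # the value CST pushes for the caller to test


def atomic_increment(mem: Memory, ptr: int) -> None:
    while True:
        old = mem.words.get(ptr, 0)
        if cst(mem, ptr, old + 1, old) == old:   # equal: the store happened
            return


def worker(mem: Memory, ptr: int, times: int) -> None:
    for _ in range(times):
        atomic_increment(mem, ptr)


mem = Memory()
threads = [threading.Thread(target=worker, args=(mem, 64, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(mem.words[64])   # 4000
```

If another processor changes the word between the plain read and the CST, the sampled value differs from old, nothing is stored, and the loop simply tries again.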
Timing estimates
This section gives approximate minimal instruction times for Dragon. Simulation results indicate an estimate of 2 cycles per instruction is a reasonable guess for average time.
cycles  operation
8       CST
5       conditional jump incorrectly predicted
5       SFCI, SFC, SJ
4       SIP
3       2, 3, 5 byte Xops
3       KFC, RETK
2       1 byte Xops
2       conditional jump correctly predicted to jump
2       JB, JDB, DJ, RET, RETN, LFC, DFC
1       conditional jump correctly predicted to continue
1       all others
Using the results of a cache fetch on the next EU cycle will introduce a 1 cycle delay.
Executing a word-boundary spanning instruction when the instruction buffer is empty (for example, jumping to an instruction that spans a word boundary) will introduce a 1 cycle delay.
Returns are delayed until previous calls or returns have completed (3 cycles from start of previous call or return).
There are additional delays due to the fetch-ahead of the instruction buffer. At the start of every IFU cycle a fetch-ahead is performed if the bus is not busy from the previous use of the IFU cache. This may cause the bus to be busy when there is a cache miss, but otherwise does not add cycles. For straight line code the instruction buffer can usually keep up with cache misses.
When the Field register is set by an FSDB or SIP Field instruction, there will be a delay if there is an attempt to perform an RFU instruction in the next two cycles. This is to allow the Field contents time to reach the shadow register in the EU.
For the EU cache, assume 6 cycles for writing a dirty victim, 6 cycles for a quad read, and 6 cycles for a map miss. These will take more time, of course, when there is contention by other processors.
Calls and Returns
This section gives tentative information about the strategy used for procedure calls and returns on Dragon.
Simple call
The simple case of calling a procedure first pushes any arguments expected by the procedure, then calls the procedure via DFC or LFC. The first instruction is normally an ALS instruction, which sets L to the base of the arguments. Therefore, the arguments become the initial local variables without moving those arguments.
The return PC and the return L are pushed onto the IFU stack by the call. If there is insufficient room to do this, an IFU stack overflow trap is taken after the call has transferred control.
Global frames
If the procedure needs access to a global frame, then it is the responsibility of the procedure to set up a register with the pointer to the global frame. Current plans are to use the LGF instruction to load the global frame pointer from the global frame table into a local register. The LIQB instruction could be used to set up the global frame pointer in the case where there are few procedures for the frame, but space considerations will normally make it more desirable to use the 3-byte LGF instead of the 5-byte LIQB, since that will save 2 bytes per procedure.
Simple return
The simple case of returning from a procedure uses the RET opcode, which specifies how much to adjust the stack before returning. If the IFU stack is not empty, then the return PC and L are taken from the IFU stack, and the most recent entry in the IFU stack is discarded. If the IFU stack is empty when a RET is performed, a stack underflow trap is taken. S is adjusted according to the alpha byte. For cases where the stack should not be adjusted on return the RETN opcode is used.
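The simple call and return sequence can be traced with a small Python model. Several details are assumptions made for illustration and are not stated here: the return PC is taken to be the byte after the 5-byte DFC, RET's S←L+Alpha is taken to use the callee's L before the saved L is restored, and the example procedure takes two arguments and leaves one result. The class and method names are hypothetical.

```python
STACK_SIZE = 128


class CallModel:
    """Hypothetical model of DFC / ALS / RET bookkeeping."""

    def __init__(self, S: int, L: int, pc: int) -> None:
        self.stack = [0] * STACK_SIZE
        self.S, self.L, self.pc = S, L, pc
        self.ifu: list = []          # IFU stack of (return PC, L), youngest last

    def push(self, value: int) -> None:
        self.S = (self.S + 1) % STACK_SIZE
        self.stack[self.S] = value

    def dfc(self, target: int) -> None:
        # Assumption: the return PC is the byte after the 5-byte DFC.
        self.ifu.append((self.pc + 5, self.L))
        self.pc = target

    def als(self, alpha: int) -> None:
        self.L = (self.S + alpha) % STACK_SIZE

    def ret(self, alpha: int) -> None:
        # Assumption: S←L+Alpha uses the callee's L, before L is restored.
        self.S = (self.L + alpha) % STACK_SIZE
        self.pc, self.L = self.ifu.pop()


cpu = CallModel(S=20, L=10, pc=0x400)
cpu.push(3)                      # first argument  -> Stack[21]
cpu.push(4)                      # second argument -> Stack[22]
cpu.dfc(0x800)                   # enter the callee
cpu.als(255)                     # L = (22 + 255) mod 128 = 21: args are Locals[0..1]
cpu.stack[cpu.L] += cpu.stack[cpu.L + 1]   # body: Locals[0] = Locals[0] + Locals[1]
cpu.ret(0)                       # leave the result as the new top of stack
print(hex(cpu.pc), cpu.L, cpu.S, cpu.stack[cpu.S])   # 0x405 10 21 7
```

The arguments become the callee's first locals without being copied, which is the point of starting the procedure with ALS.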
Procedure variables
For various reasons covered below, procedure variables are called with one more level of indirection than simple procedures. Procedure variables are implemented as pointers to words that contain the starting address of the procedure. To call through a procedure variable, the procedure variable is pushed, then a call is made to ([S])^ using SFCI. This convention leaves an extra word on the stack, so procedure variable calls must go to a different entry point than simple procedure calls.
Nested procedures
Nested procedures are implemented by placing the starting address in the local frame extension (the part of the local frame required to be in memory). The procedure variable for this nested procedure will be a pointer to this word. Therefore, on entry to the nested procedure, the address of the frame (plus an offset) will be on the stack, which makes computing the static link easy (a SUBB instruction).
Interface function call
Interface function calls are both more flexible and more involved than simple procedure calls. Interface records are referred to via positions in the global frame. Procedures exported through those interfaces have procedure variables in various slots of the interface record. To make an interface function call one simply sets up the arguments, fetches the procedure variable from the interface record, and performs a procedure variable call.
Multiple global frames
Although procedures must be specially compiled to use multiple global frames, there is no additional mechanism beyond forcing all calls to routines with multiple global frames to use the indirect procedure call. The address on top of the stack can then be used to find the global frame (probably via adding a constant). Support for multiple global frames can be delayed until there is need.
Coroutines
Coroutine calls are handled by traps. The data structures to be used are not yet defined. However, coroutine calls will be roughly as expensive as process switches. Support for coroutines can be delayed until there is need.
Traps
This section gives tentative information about traps generated by the IFU. The general approach is that the trapping instruction has no effect, and the return PC for the trap routine is the PC of the trapping instruction. Maskable traps (Reschedule, EU stack overflow, and IFU stack overflow) disable further maskable traps until they are re-enabled (usually via RETK).
Reschedule
The reschedule trap occurs when the RESCHEDULE line is raised and interrupts are enabled. An attempt is made to have idle processors notice the RESCHEDULE line before non-idle processors (this is controlled by software). If the RESCHEDULE line is raised while interrupts are disabled, the reschedule trap will be deferred until interrupts are enabled again. The response to the reschedule trap is quite complex, and will not be covered in this note.
EU Stack overflow
The stack overflow trap occurs when an attempt to increase the EU stack pointer by 1 (S←S+1) would result in S crossing the stack overflow limit register (and traps are enabled). The limit, which is set by software, must allow sufficient room for the trap handler. The handler for this trap should migrate the oldest frame in the EU and IFU stack registers to memory, then return via RETK.
IFU stack overflow
The IFU stack overflow trap occurs when a procedure call is attempted while the IFU stack is full (and stack overflow is enabled). Some IFU frames remain available after this trap occurs, so the trap handler can itself make calls. The handler for this trap is the same as for EU stack overflow.
Stack underflow
The stack underflow trap occurs when a return instruction (RET, RETN, or RETK) tries to return to an empty IFU stack. The trap handler must arrange to migrate a frame from memory to the IFU and EU registers. This is not a true trap, since the IFU stack is left empty on entry to the handler. When the transfer takes place, maskable traps are disabled.
ALU fault
The ALU fault trap occurs when the ALU detects integer overflow, a failed bounds check (from BC, QBC, or RBC), or a Lisp NaN. The handler for ALU fault should turn this trap into the appropriate error, depending on the instruction that raised the fault.
Protection fault
The first 2^24 words in each virtual address space are designated the kernel reserved area; this area will hold code and data the kernel wishes to protect. Any instruction executed in user mode that attempts to write into this reserved region gets a protection fault.
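The protection rule reduces to a single comparison on the target word address. A minimal Python sketch, assuming addresses are plain integers and that only writes are checked (the text mentions no read restriction); `write_faults` is a hypothetical name, not part of any Dragon software:

```python
# Hypothetical model of the protection-fault rule described above.
KERNEL_RESERVED_WORDS = 1 << 24   # first 2^24 words of each virtual address space
ADDRESS_SPACE_WORDS = 1 << 32     # 2^32 32-bit words

def write_faults(address: int, user_mode: bool) -> bool:
    """True if a write to word `address` would raise a protection fault."""
    if not 0 <= address < ADDRESS_SPACE_WORDS:
        raise ValueError("Dragon word addresses are 32 bits")
    # Only user-mode writes into the kernel reserved area fault.
    return user_mode and address < KERNEL_RESERVED_WORDS
```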
EU page fault
The EU page fault trap occurs when a reference to unmapped memory is made by the EU. The faulting address is available through a special EU register (MAR). The handler should determine if the page is valid, then either cause the page to be made present, or cause a page fault error. Interrupts are disabled during the trap handler.
AU fault
The AU fault trap occurs when an AUMV or AUOP instruction specifies reading of one of the result words and one of six unmasked exceptions is present in the Arithmetic Unit. The floating point exceptions are Inexact Result, UnderFlow, OverFlow, Divide by Zero and Invalid Operation. The fixed point exception is Divide OverFlow. The handler for this fault must meet the requirements for IEEE floating point exception handling.
IFU page fault
The IFU page fault trap occurs when a reference to unmapped memory is made by the IFU. The faulting address is available as the return PC. The handler should determine if the page is valid, then either cause the page to be made present, or cause a page fault error.
Instruction trap
Undefined instructions (also known as Xops) cause instruction traps. There is a separate instruction trap for each opcode. Some instructions are defined in kernel mode but not in user mode; these instructions also cause instruction traps when executed in user mode.
To permit efficient emulation of extended instruction sets, all Xops behave as procedure calls to an address linearly dependent on the instruction code. Xops have different lengths, depending on the instruction code. Xops of length 2 push Alpha onto the stack (high-order bytes undefined), Xops of length 3 push AlphaBeta onto the stack (high-order bytes undefined), and Xops of length 5 push AlphaBetaGammaDelta.
For implementation reasons some undefined operations do not trap, but also do not produce well-defined results. These operations will be detailed when the PLA for the IFU is sufficiently determined.
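The Xop convention can be sketched as a function from the opcode and its trailing bytes to a call target and a pushed word. This is a minimal Python sketch under stated assumptions: the text says only that the target is "linearly dependent on the instruction code", so `XOP_BASE` and `XOP_STRIDE` are hypothetical values, and zeros stand in for the undefined high-order bytes. Operand bytes are combined most significant first, following the byte-order definition in the Recent Changes section.

```python
# Hypothetical trap-vector layout: base and stride are NOT specified by the text.
XOP_BASE = 0x400
XOP_STRIDE = 4

def xop_dispatch(opcode: int, operand_bytes: bytes):
    """Return (call target, word pushed or None) for an undefined opcode.

    operand_bytes holds the 0, 1, 2, or 4 bytes after the opcode,
    i.e. an instruction length of 1, 2, 3, or 5 bytes.
    """
    if len(operand_bytes) not in (0, 1, 2, 4):
        raise ValueError("Xops are 1, 2, 3, or 5 bytes long")
    target = XOP_BASE + XOP_STRIDE * opcode   # linear in the opcode
    if not operand_bytes:
        return target, None                   # length 1: nothing pushed
    # Alpha, AlphaBeta, or AlphaBetaGammaDelta; undefined high bytes shown as 0.
    return target, int.from_bytes(operand_bytes, "big")
```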
Field Unit
This section gives a brief Mesa specification for the Field Unit functions and the field descriptor format.
FieldDescriptor: TYPE = MACHINE DEPENDENT RECORD [
reserved: [0..7] ← 0,
-- reserved bits, not currently used, but must be 0s
insert: BOOL ← FALSE,
-- governs choice of background and low bits of mask
mask: [0..32] ← 32,
-- mask gives # of right-justified 1s in the mask
-- (mask = 0 => no 1s, mask = 32 => all 1s)
shift: [0..32] ← 0
-- gives # of bits to left-shift the double word
];
Operate: PROC [Left,Right: Word, fd: FieldDescriptor] RETURNS [out: Word] = {
shifter: Word = DoubleWordShiftLeft[Left, Right, fd.shift];
-- The shifter output has the input double word shifted left by fd.shift bits
mask: Word ← SingleWordShiftRight[OnesWord, 32-fd.mask];
-- The default mask has fd.mask 1s right-justified in the word
IF fd.insert THEN
mask ← DragAnd[mask, SingleWordShiftLeft[OnesWord, MIN[fd.mask, fd.shift]]];
-- fd.insert => clear rightmost fd.shift bits of the mask
out ← DragAnd[mask, shifter];
-- 1 bits in the mask select the shifter output
IF fd.insert THEN out ← DragOr[out, DragAnd[DragNot[mask], Right]];
-- fd.insert => 0 bits in the mask select bits from Right to OR into the result
};
Definitions used from DragOpsCrossUtils
DoubleWordShiftLeft: PROC [w0,w1: Word, dist: SixBitIndex] RETURNS [Word];
This procedure shifts two Dragon words left by dist bits and returns the leftmost word.
SingleWordShiftLeft: PROC [word: Word, dist: SixBitIndex] RETURNS [Word];
This procedure shifts one Dragon word left by dist bits and returns the shifted word.
SingleWordShiftRight: PROC [word: Word, dist: SixBitIndex] RETURNS [Word];
This procedure shifts one Dragon word right by dist bits and returns the shifted word.
DragAnd: PROC [a,b: Word] RETURNS [Word];
This procedure is a 32-bit AND
DragOr: PROC [a,b: Word] RETURNS [Word];
This procedure is a 32-bit OR
DragNot: PROC [w: Word] RETURNS [Word];
This procedure is a 32-bit NOT
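For experimenting with field descriptors, here is a Python transliteration of Operate and the helpers it uses. Python ints stand in for Word (bit 0 is still the most significant bit). The `fd_word` packing (reserved 3 bits, insert 1, mask 6, shift 6) is an inference from the record layout and from the FSDB arithmetic in the packed-sequence example, not a stated format; all names are hypothetical.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF      # one Dragon word
ONES_WORD = MASK32

def double_word_shift_left(w0: int, w1: int, dist: int) -> int:
    """Shift the double word w0:w1 left by dist bits; return the leftmost word."""
    return ((((w0 << 32) | w1) << dist) >> 32) & MASK32

def single_word_shift_left(word: int, dist: int) -> int:
    return (word << dist) & MASK32

def single_word_shift_right(word: int, dist: int) -> int:
    return (word & MASK32) >> dist

@dataclass(frozen=True)
class FieldDescriptor:
    insert: bool = False
    mask: int = 32   # number of right-justified 1s in the mask (0..32)
    shift: int = 0   # left shift applied to the double word (0..32)

def operate(left: int, right: int, fd: FieldDescriptor) -> int:
    """Follows the Mesa Operate procedure step by step."""
    shifter = double_word_shift_left(left, right, fd.shift)
    mask = single_word_shift_right(ONES_WORD, 32 - fd.mask)
    if fd.insert:
        # clear the rightmost MIN[mask, shift] bits of the mask
        mask &= single_word_shift_left(ONES_WORD, min(fd.mask, fd.shift))
    out = mask & shifter
    if fd.insert:
        # 0 bits in the mask select bits from Right
        out |= ~mask & right & MASK32
    return out

def fd_word(fd: FieldDescriptor) -> int:
    """Inferred 16-bit packing: reserved(3) insert(1) mask(6) shift(6)."""
    return (int(fd.insert) << 12) | (fd.mask << 6) | fd.shift
```

For example, `operate(i, i, FieldDescriptor(False, 30, 30))` yields i / 4, matching the `SHR FD[0,30,30]` step used in the sample code sequences.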
Instruction Set Summary
Name Form Opcodes Description
ADD OI 1 [S-1]←[S]+[S-1]; S←S-1; trap on overflow
ADDB OB 1 [S]←[S]+AlphaZ; trap on overflow
ADDDB OB 1 [S]←[S]+AlphaBetaZ; trap on overflow
AL OB 1 L←L+Alpha
ALS OB 1 L←S+Alpha
AS OB 1 S←S+Alpha
ASL OB 1 S←L+Alpha
AUMV OB 1 Alpha specifies S change and Pbus cmnd
AUOP ODB 1 Alpha specifies S change and Pbus cmnd, Beta specifies AU function
BC OI 1 trap if [S-1] < 0 or [S-1]-[S] >= 0; S←S-1
CST OB 1 [S+1]←([S-2]+AlphaZ)^; [S+1]=[S] => ([S-2]+AlphaZ)^←[S-1]; S←S+1; special: atomic
DFC OQB 1 call proc at AlphaBetaGammaDelta
DIS OI 1 S←S-1
DUP OI 1 [S+1]←[S]; S←S+1
EXDIS OI 1 [S-1]←[S]; S←S-1
FSDB ODB 1 Field←AlphaBeta+[S]; S←S-1
IN OB 1 [S]←([S]+AlphaZ)^; special: uses IO lines
JB OB 1 PC←PC+Alpha
JDB ODB 1 PC←PC+AlphaBetaS
JEBBj JBB 2 AlphaZ = [S] => PC←PC+BetaS; S←S-1
Jn O* 4 Noop of length 1, 2, 3, or 5 bytes (used as jump)
JNEBBj JBB 2 AlphaZ # [S] => PC←PC+BetaS; S←S-1
JQB OQB 1 PC←AlphaBetaGammaDelta
KFC OI 1 [S+1]←Status; PC ← InstTrap[KFC]; Status.mode ← kernel; S←S+1
LCn OI 8 [S+1]←Constants[n]; S←S+1
LIP OB 1 [S+1]←PReg[Alpha]; S←S+1
LFC ODB 1 call proc at PC+AlphaBetaS
LGF ODB 1 [S+1]←([GB]+AlphaBetaZ)^; S←S+1
LIB OB 1 [S+1]←AlphaZ; S←S+1
LIDB ODB 1 [S+1]←AlphaBetaZ; S←S+1
LIQB OQB 1 [S+1]←AlphaBetaGammaDelta; S←S+1
LRIn LRB 16 [S+1]←([L+n]+AlphaZ)^; S←S+1
LRn LR 16 [S+1]←[L+n]; S←S+1
OUT OB 1 ([S]+AlphaZ)^←[S-1]; S←S-2; special: uses IO lines
PIN ODB 1 [S]←Pbus[Alpha,[S+BetaZ]]; special: uses Pbus directly {kernel}
POUT ODB 1 [S]←Pbus[Alpha,[S+BetaZ]]; S←S-1; special: uses Pbus directly {kernel}
PSB OB 1 ([S-1]+AlphaZ)^←[S]; S←S-1
QADD QR 1 [S]←[S]+Rb+carry; carry←0; trap on overflow
QAND QR 1 [S]←[S] AND Rb
QBC QR 1 trap if [S] < 0 OR [S]-Rb>= 0
QLADD QR 1 [S]←[S]+Rb; carry←0; trap on Lisp NaN
QLSUB QR 1 [S]←[S]-Rb; carry←0; trap on Lisp NaN
QOR QR 1 [S]←[S] OR Rb
QRX QR 1 [[S]]←([[S]]+Rb)^
QSUB QR 1 [S]←[S]-Rb-carry; carry←0; trap on overflow
RADD RR 1 Rc←Ra+Rb+carry; carry←0; trap on overflow
RAI LRRB 1 [L+BetaL]←(AuxRegs[BetaR]+AlphaZ)^
RAND RR 1 Rc←Ra AND Rb
RB OB 1 [S]←([S]+AlphaZ)^
RBC RR 1 trap if Ra < 0 OR Ra-Rb>= 0; Rc←Ra
RET OB 1 S←L+Alpha; return from proc
RETK OB 1 Status←[S]; S←S-1; return from proc; {kernel}
RETN OB 1 return from proc
RFU RR 1 [Rc]←FieldUnit[[Ra],[Rb],Field]
RJEBj RJB 2 Ra=Rb => PC←PC+BetaS
RJGBj RJB 2 Ra>Rb => PC←PC+BetaS
RJGEBj RJB 2 Ra>=Rb => PC←PC+BetaS
RJLBj RJB 2 Ra<Rb => PC←PC+BetaS
RJLEBj RJB 2 Ra<=Rb => PC←PC+BetaS
RJNEBj RJB 2 Ra#Rb => PC←PC+BetaS
RLADD RR 1 Rc←Ra+Rb; carry←0; trap on Lisp NaN
RLSUB RR 1 Rc←Ra-Rb; carry←0; trap on Lisp NaN
ROR RR 1 Rc←Ra OR Rb
RRI LRRB 1 [L+BetaL]←([L+BetaR]+AlphaZ)^
RRX RR 1 [Rc]←([Ra]+[Rb])^
RSB OB 1 [S+1]←([S]+AlphaZ)^; S←S+1
RSUB RR 1 Rc←Ra-Rb-carry; carry←0; trap on overflow
RUADD RR 1 Rc←Ra+Rb+carry; set carry
RUSUB RR 1 Rc←Ra-Rb-carry; set carry
RVADD RR 1 Rc←Ra+Rb
RVSUB RR 1 Rc←Ra-Rb
RX OI 1 [S-1]←([S-1]+[S])^; S←S-1
RXOR RR 1 Rc←Ra XOR Rb
SIP OB 1 PReg[Alpha]←[S]; S←S-1; {kernel}
SFC OI 1 call proc at [S]; S←S-1
SFCI OI 1 call proc at ([S])^
SHL ODB 1 [S]←FieldUnit[[S],0,AlphaBeta]
SHR ODB 1 [S]←FieldUnit[[S],[S],AlphaBeta]
SJ OI 1 PC←PC+[S]
SRIn LRB 16 ([L+n]+AlphaZ)^←[S]; S←S-1
SRn LR 16 [L+n]←[S]; S←S-1
SUB OI 1 [S-1]←[S-1]-[S]; S←S-1; trap on overflow
SUBB OB 1 [S]←[S]-AlphaZ; trap on overflow
SUBDB ODB 1 [S]←[S]-AlphaBetaZ; trap on overflow
WAI LRRB 1 (AuxRegs[BetaR]+AlphaZ)^←[L+BetaL]
WB OB 1 ([S]+AlphaZ)^←[S-1]; S←S-2
WRI LRRB 1 ([L+BetaR]+AlphaZ)^←[L+BetaL]
WSB OB 1 ([S-1]+AlphaZ)^←[S]; S←S-2
Instruction Map
This instruction map shows the current placement of instruction codes in the opcode space. It has been arranged to minimize the decoding difficulty for the IFU. This map is subject to change.
Octal Form Len --0 --1 --2 --3 --4 --5 --6 --7
00- OI 1 xop xop xop xop xop xop xop xop
01- OI 1 xop xop xop xop xop xop xop xop
02- OI 1 xop xop xop xop xop xop xop xop
03- OI 1 xop xop xop xop xop xop xop xop
04- OQB 5 xop xop xop xop xop xop xop xop
05- OQB 5 xop xop xop xop xop xop xop xop
06- OQB 5 xop DFC LIQB xop xop xop J5 JQB
07- OQB 5 xop xop xop xop xop xop xop xop
10- OI 1 OR AND RX BC ADD SUB LADD LSUB
11- OI 1 DUP DIS xop EXDIS SFC SFCI RETN xop
12- OI 1 xop xop xop xop KFC xop J1 SJ
13- OI 1 LC0 LC1 LC2 LC3 LC4 LC5 LC6 LC7
14- LR 1 LR0 LR1 LR2 LR3 LR4 LR5 LR6 LR7
15- LR 1 LR8 LR9 LR10 LR11 LR12 LR13 LR14 LR15
16- LR 1 SR0 SR1 SR2 SR3 SR4 SR5 SR6 SR7
17- LR 1 SR8 SR9 SR10 SR11 SR12 SR13 SR14 SR15
20- QR 2 QOR QAND QRX QBC QADD QSUB QLADD QLSUB
21- OB 2 ALS AL ASL AS CST xop RET RETK
22- OB 2 LIP SIP LIB xop ADDB SUBB J2 JB
23- OB 2 RB WB RSB WSB IN OUT AUMV PSB
24- LRB 2 LRI0 LRI1 LRI2 LRI3 LRI4 LRI5 LRI6 LRI7
25- LRB 2 LRI8 LRI9 LRI10 LRI11 LRI12 LRI13 LRI14 LRI15
26- LRB 2 SRI0 SRI1 SRI2 SRI3 SRI4 SRI5 SRI6 SRI7
27- LRB 2 SRI8 SRI9 SRI10 SRI11 SRI12 SRI13 SRI14 SRI15
30- RR 3 ROR RAND RRX RBC RADD RSUB RLADD RLSUB
31- RR 3 RXOR *** RFU *** RVADD RVSUB RUADD RUSUB
32- ODB 3 LGF LFC LIDB xop ADDDB SUBDB J3 JDB
33- ODB 3 RAI WAI RRI WRI PIN POUT AUOP xop
34- RJB 3 *** RJEB RJLB RJLEB *** RJNEB RJGEB RJGB
35- RJB 3 *** RJNEBJ RJGEBJ RJGBJ *** RJEBJ RJLBJ RJLEBJ
36- JBB 3 JEBB JNEBB JEBBJ JNEBBJ xop xop xop xop
37- ODB 3 SHL SHR SHD FSDB xop xop xop xop
*** => Undefined behavior
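Because the Len column is constant across each group of four rows, the instruction length depends only on the top 3 bits of the opcode (the property noted in the 9 Mar 84 change). A small Python sketch that decodes the length and cross-checks it against the map's rows; the names are hypothetical:

```python
# Len column of the instruction map, one entry per row of eight opcodes
# (rows 00- through 37- octal, so row = opcode >> 3).
LEN_BY_ROW = ([1] * 4 + [5] * 4 + [1] * 4 + [1] * 4 +
              [2] * 4 + [2] * 4 + [3] * 4 + [3] * 4)

def instruction_length(opcode: int) -> int:
    """Instruction length in bytes, decoded from the top 3 bits of the opcode."""
    if not 0 <= opcode <= 0o377:
        raise ValueError("opcode must fit in one byte")
    return (1, 5, 1, 1, 2, 2, 3, 3)[opcode >> 5]
```

For example, DFC (061B) is 5 bytes, ADDB (224B) is 2 bytes, and LGF (320B) is 3 bytes.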
Sample code sequences
Notation
In the following examples, FD[insert,mask,shift] denotes a field descriptor with the indicated fields. RR format operations are written with the operands in the order Rc,Ra,Rb (destination first).
The following constant registers are reserved:
c0: 0, c1: 1, c2: 2, c3: 3, c4: 4, c5: -2, c6: -1, c8: FIRST[INT], c9: 100000B (the examples write these as C0, C1, and so on, and refer to the 100000B constant as CNSI)
Packed sequence fetch/store
Assume the following Cedar/Mesa code:
r: REF TEXT; -- in local register Lr (32-bit maxLength in word 1)
i: INT; -- in local register Li
c: CHAR; -- in local register Lc
c ← r[i]; -- generates the following code {10 cycles}
 -- FD: 0 => [0,8,8], 1 => [0,8,16], 2 => [0,8,24], 3 => [0,8,32]
LRI Lr,1 -- push the word containing the bound
LR Li -- push the character index
SHL FD[0,5,3] -- make it a bit index into the word
FSDB [0,8,8] -- add in the rest of the field descriptor & set Field
RBC [S],Li,[S] -- bounds check the index given (leave Li in [S])
SHR FD[0,30,30] -- [s] = i / 4 {word index}
QADD Lr -- [s] = word address (-2)
RB 2 -- fetch the word containing the desired char
RFU Lc,C0,[S]- -- extract the char and store it to c
r[i] ← c; -- generates the following code {12 cycles}
 -- FD: 0 => [1,24,32], 1 => [1,16,24], 2 => [1,8,16], 3 => [1,0,8]
LRI Lr,1 -- push the word containing the bound
RVSUB [S+1]+,c3,Li -- push 3 - i
SHL FD[0,5,3] -- [s]←([s] MOD 4) * 8
SHR FD[1,12,6] -- [s]←[s] + [s]*64
FSDB [1,0,8] -- adjust the field descriptor & set Field
RBC [S],Li,[S] -- bounds check the index given (leave Li in [S])
SHR FD[0,30,30] -- [s] = i / 4 {word index}
QADD Lr -- [s] = word address (-2)
RSB 2 -- fetch the array word (leave addr on stack)
RFU [S],[S],Lc -- insert the char into the array word
WSB 2 -- store the changed word
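The two sequences above compute a word address and a byte position from i. A minimal Python model of their net effect, assuming (as the `RB 2`/`RSB 2` offsets and the "32-bit maxLength in word 1" note suggest) that characters start at word 2 of the TEXT object and are packed four per word with character 0 in the high-order byte. Memory is a plain list of words; `fetch_char` and `store_char` are hypothetical names, and the store is modeled directly rather than through a field descriptor.

```python
MASK32 = 0xFFFFFFFF

def _check(memory, r, i):
    max_length = memory[r + 1]             # word 1 holds maxLength
    if i < 0 or i - max_length >= 0:       # RBC trap condition
        raise IndexError("bounds check fault")

def fetch_char(memory, r, i):
    """Net effect of c ← r[i]."""
    _check(memory, r, i)
    word = memory[r + 2 + i // 4]          # SHR FD[0,30,30]; QADD Lr; RB 2
    shift = 8 + 8 * (i % 4)                # extract with FD [0,8,shift]
    return (word >> (32 - shift)) & 0xFF

def store_char(memory, r, i, c):
    """Net effect of r[i] ← c."""
    _check(memory, r, i)
    addr = r + 2 + i // 4
    pos = 24 - 8 * (i % 4)                 # bit offset of the byte from the right
    cleared = memory[addr] & ~(0xFF << pos) & MASK32
    memory[addr] = cleared | ((c & 0xFF) << pos)
```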
Procedure body & call
AddFunny: PROC [x,y: INT] RETURNS [INT] = {
z: INT ← x+y;
IF z > 1 THEN z ← z + z;
RETURN [z];
};
Generates
ALS 377B -- point L at x
RADD [S+1]+,Lx,Ly -- z: INT ← x+y
RJLEB 6,[S],C1 -- IF z > 1 THEN
RADD Lz,Lz,Lz -- z ← z + z
ROR Lx,Lz,Lz -- RETURN [z]
RET 0 -- (S ← L+0)
u ← AddFunny[v, w] + 1;
Assume u,v,w are in locals Lu,Lv,Lw
Assume AddFunny to be called via DFC
Generates
LR Lv -- push v
LR Lw -- push w
DFC AddFunny -- AddFunny[v, w]
RADD Lu,[S-1]-,C1 -- u ← ... + 1
Field extraction & insertion
This Mesa code
r: RECORD [field0: [0..7), field1: [0..7), field2: [0..7)];
-- each field has 3 bits, fields are right-justified
-- assume r is allocated in local register Lr
...
r.field1 ← r.field0+1;
Generates {8 cycles}
LR Lr -- push r
DUP  -- push r again
SHR [0,3,26] -- isolate field0
ADDB 1 -- add 1 to it
LIB 7 -- push 7
BC  -- bounds check it
SHD [1,3,3] -- insert the number into field1 position
SR Lr -- store it back to r
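The extraction step works because SHR passes [S] as both halves of the double word, so the shifter performs a left rotate. A minimal Python sketch of the sequence's effect, assuming field0 sits 6 bits from the right (as the rotate by 26 implies) and field1 3 bits from the right; the SHD insertion is modeled directly, and the function names are hypothetical:

```python
MASK32 = 0xFFFFFFFF

def shr(word: int, mask: int, shift: int) -> int:
    """SHR with a non-insert descriptor [0,mask,shift]: rotate left, keep low bits."""
    rotated = ((word << shift) | (word >> (32 - shift))) & MASK32
    return rotated & ((1 << mask) - 1)

def set_field1_to_field0_plus_1(r: int) -> int:
    """Effect of r.field1 ← r.field0+1 on the word holding r."""
    field0 = shr(r, 3, 26)             # SHR [0,3,26]: rotate right by 6, keep 3 bits
    value = field0 + 1                 # ADDB 1
    if value < 0 or value - 7 >= 0:    # LIB 7; BC
        raise OverflowError("bounds check fault")
    # SHD [1,3,3]: deposit value into field1 (3 bits from the right)
    return (r & ~(0b111 << 3) & MASK32) | (value << 3)
```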
Arithmetic precision changes
Multiple-precision arithmetic quantities are stored with higher order words in lower addresses, even within the register stack. That is, the word at [S-1] is more significant than the word at [S]. Double precision numbers show up in multiplication and division.
Extend 32-bit signed number on stack to 64-bit signed number on stack {3 cycles}
DUP -- low-order word on top of stack
RUADD [S-1],[S],[S] -- carry bit ← sign bit & put garbage in [S-1]
RSUB [S-1],C0,C0 -- negate sign bit into high-order word, (clear carry)
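The trick is that adding a word to itself as unsigned numbers carries out exactly its sign bit, and 0-0-carry then yields 0 or -1. A Python model of the three instructions, assuming carry is clear on entry; `extend_32_to_64` is a hypothetical name and the result is returned as (high, low), matching the [S-1], [S] layout:

```python
MASK32 = 0xFFFFFFFF

def extend_32_to_64(word: int):
    """Model of DUP; RUADD [S-1],[S],[S]; RSUB [S-1],C0,C0."""
    carry = 0                          # assumed clear on entry
    total = word + word + carry        # RUADD: unsigned add of the duplicated word
    carry = total >> 32                # carry-out equals the sign bit of word
    high = (0 - 0 - carry) & MASK32    # RSUB: 0 - 0 - carry gives 0 or -1
    return high, word
```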
Narrow 64-bit signed number on stack to 32-bit signed number on stack {4 cycles}
RUADD [S+1]+,[S],[S]- -- carry ← sign bit of low order word, garbage at [S+1]
RADD [S+1]+,C0,[S-1] -- push sign bit plus high-order word (clear carry)
RBC [S],[S]-,C1 -- bounds check fault when [S] # 0, pop stack
EXDIS   -- flush the high-order word
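The narrowing sequence accepts the value only if the high-order word plus the low word's sign bit is exactly 0 (high = 0 with sign 0, or high = -1 with sign 1). A Python model, raising an exception where the hardware would take a bounds check fault; the name is hypothetical, and the case where RADD itself would trap on overflow is folded into the same fault:

```python
MASK32 = 0xFFFFFFFF

def narrow_64_to_32(high: int, low: int) -> int:
    """Model of RUADD; RADD; RBC; EXDIS on the pair (high at [S-1], low at [S])."""
    carry = (low + low) >> 32                 # RUADD: carry ← sign bit of low word
    check = (0 + high + carry) & MASK32       # RADD [S+1]+,C0,[S-1]
    signed = check - (1 << 32) if check >> 31 else check
    if signed < 0 or signed - 1 >= 0:         # RBC ...,C1: fault unless check = 0
        raise OverflowError("bounds check fault")
    return low                                # EXDIS drops the high-order word
```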
Extend 16-bit signed number on stack to 32-bit signed number on stack {2 cycles}
(assumes leftmost 16 bits are 0s)
RXOR [S],CNSI,[S] -- complement the sign bit
QSUB CNSI -- subtract 100000B, which carries through
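Flipping bit 15 and then subtracting 100000B (2^15) maps 0..77777B to itself and 100000B..177777B to the negative range, which is exactly sign extension. A Python model, assuming carry is clear so QSUB is a plain subtraction; the name is hypothetical:

```python
MASK32 = 0xFFFFFFFF
SIGN16 = 0o100000   # 100000B

def extend_16_to_32(half: int) -> int:
    """Model of RXOR [S],CNSI,[S]; QSUB CNSI."""
    if not 0 <= half <= 0xFFFF:
        raise ValueError("leftmost 16 bits must be 0s")
    flipped = half ^ SIGN16                  # complement the 16-bit sign bit
    return (flipped - SIGN16) & MASK32       # subtract 100000B, borrowing through
```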
Narrow 32-bit signed number on stack to 16-bit signed number on stack {4 cycles}
(ensures that the resulting leftmost 16 bits are 0s)
RVADD [S],CNSI,[S] -- bump the sign (100000B)
RXOR [S],CNSI,[S] -- complement the sign back to its original
LIQB 200000B -- push the limit+1
BC  -- bounds check fault when not in 16-bit range
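The bump-then-restore pair leaves a 16-bit value unchanged (with its upper 16 bits cleared) when it is in range, and pushes out-of-range values outside 0..2^16-1. A Python model, treating BC as trapping when the checked value at [S-1] is negative or not below the limit at [S], here 2^16; the name is hypothetical:

```python
MASK32 = 0xFFFFFFFF
SIGN16 = 0o100000   # 100000B

def narrow_32_to_16(word: int) -> int:
    """Model of RVADD; RXOR; LIQB; BC. Raises where BC would fault."""
    bumped = (word + SIGN16) & MASK32        # RVADD: add 100000B, no overflow trap
    restored = bumped ^ SIGN16               # RXOR: complement the bit back
    limit = 1 << 16                          # exclusive limit (200000B)
    value = restored - (1 << 32) if restored >> 31 else restored
    if value < 0 or value - limit >= 0:      # BC
        raise OverflowError("bounds check fault")
    return restored
```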
Recent Changes
12 Sept 85
Changed definition of byte order to:
AlphaBeta = Alpha*256 + Beta
AlphaBetaGammaDelta = AlphaBeta*256*256 + Gamma*256 + Delta
Affected format definitions:
ODB
was: [op: [0..255], b1,b0: [0..255]] -- 3 bytes
is: [op: [0..255], b0,b1: [0..255]] -- 3 bytes
OQB
was: [op: [0..255], b3,b2,b1,b0: [0..255]] -- 5 bytes
is: [op: [0..255], b0,b1,b2,b3: [0..255]] -- 5 bytes
Added Arithmetic Unit instructions:
AUMV
AUOP
Added AUFault description
Fixed PIN and POUT Defs as ODB's
Fixed PIN-IN and POUT-OUT transposition in table
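The byte-order definition in the 12 Sept 85 entry makes Alpha the most significant byte, so multi-byte operands are big-endian. A minimal Python sketch of the two formulas; the `sign_extend` helper is hypothetical and assumes, as the names suggest, that a Z suffix (AlphaZ) means zero-extended and an S suffix (BetaS, AlphaBetaS) means sign-extended:

```python
def alpha_beta(alpha: int, beta: int) -> int:
    """AlphaBeta = Alpha*256 + Beta."""
    return alpha * 256 + beta

def alpha_beta_gamma_delta(alpha: int, beta: int, gamma: int, delta: int) -> int:
    """AlphaBetaGammaDelta = AlphaBeta*256*256 + Gamma*256 + Delta."""
    return alpha_beta(alpha, beta) * 256 * 256 + gamma * 256 + delta

def sign_extend(value: int, bits: int) -> int:
    """Assumed meaning of the S suffix: interpret `bits` bits as two's complement."""
    sign = 1 << (bits - 1)
    return (value & (2 * sign - 1)) - ((value & sign) << 1)
```

Both formulas agree with Python's `int.from_bytes(..., "big")`, which is a convenient cross-check.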
15 Jul 85
Removed SMUL, UDIV, and floating point
... since these instructions are not supported by Dragon 1, and are not yet settled for Dragon 2.
Removed EXCH
... since it does not get used enough to justify the added IFU complexity.
19 Dec 84
Change to FSDB
FSDB now adds AlphaBeta to the top of stack and places the result in Field, instead of merely placing AlphaBeta in Field. This saves 1 instruction (and 1 cycle) in reading and writing bytes. The examples were updated to reflect this change.
Textual changes
... were made to improve the organization and readability of the document. Several typos were also fixed. A field extract/insert example was added.
Lisp NaN changed
... to test the top three bits instead of the top two bits. This is in keeping with the projected Lisp tag scheme on Dorados.
DJ => JQB
... for a little more uniformity.
17 Dec 84
ALU ops changes (QR format)
The QR format was established, more OI format operations were added, and now there is a common subset of operations (ADD, AND, BC, LADD, LSUB, OR, RX, SUB) in the OI, QR, and RR1 formats. Also, SMUL and UDIV are the surviving multiply and divide operations. Also, SUBDB has been added.
Field ops changes
SHD now allows extraction of fields without requiring that the fields be part of a word.
Kernel/user mode
KFC (Kernel Function Call) has been added to call kernel routines, and RETK (return from kernel) has been added (RETK also replaces RETT).
LEUR, LIFUR => LIP, SEUR, SIFUR => SIP
Once again, the instructions use Alpha to distinguish which unit they address.
Floating point instructions
For the first time, a sketch of the floating point instructions has been provided. The current instructions are SFP, LFIP, and FLOP, but may change.
BNDCK => BC, EP => ALS
... for uniformity.
New PIN & POUT
... as place holders for Pbus access instructions. Details not yet defined.
28 Aug 84
fixes to examples
... to make them more accurate (there were bugs).
23 Aug 84
definition fixes to BNDCK & RBC
... to fix up carelessness.
22 Aug 84
RBC added
... due to interesting uses in arithmetic changes and other special bounds checking.
21 Aug 84
Cleanup pass
... to fix bugs in descriptions. Arithmetic precision changes were added.
MUL & DIV -> MUL, UMUL, RDIV, MDIV, UDIV
... since we think we need signed & unsigned versions of these operations. The difference between RDIV and MDIV is subtle (based on sign of remainder), and may not be in the final machine.
24 Apr 84
Added timing estimates
... to aid in choice of instructions when hand coding.
6 Apr 84
Added MUL & DIV
... as place holders for the eventual instructions. The current assumption is that they will perform complete signed multiply & divide.
Added DJ & ASL
... to make the instruction set more complete. DJ is useful for filling up trap vectors and other long transfers. ASL is useful for cutting back the stack to a known value relative to L, which we do in exiting a block with extra stuff on the stack.
ARL -> AL
... to make the names a little more consistent.
21 Mar 84
Stack underflow trap added
... by arrangement between McCreight and Atkinson. This makes stack save/restore much easier to code (and faster as well).
16 Mar 84
SPR => SEUR & SIFUR, LPR => LEUR, LIFUR
... at the request of the IFU designers. Notice that this reverses a decision taken on 27 Feb 84. All instructions use identical decoding of Alpha, so EU registers and IFU registers share the same encoding space (see the declaration of ProcessorRegister in DragOpsCross).
15 Mar 84
REC dumped
... since it was largely useless.
9 Mar 84
0-byte instructions dumped
... since the IFU no longer needed them. This lets us determine the instruction length based on the top 3 bits of the opcode.
6 Mar 84
LILDB dumped
... since it complicated the IFU and is not likely to be a very high frequency instruction.
CST changed to OB format
... to simplify the IFU. This was made possible by an IFU change that allows [S-2] to be easily addressable.
1 Mar 84
IN & OUT replace MAP
... Alpha no longer designates the operation. These operations perform read & write operations with cache bypassing (I think).
27 Feb 84
LIFUR & LEUR => LPR, SIFUR & SEUR => SPR
... the location of the register is given by Alpha, not the opcode. The registers of interest are described under LPR.
RL => L
... which is in better agreement with S as a name.
RLADD & RLSUB added
... to support Lisp (& maybe Smalltalk) arithmetic. A new trap has been added for Lisp NaN.
21 Feb 84
Write Protect fault added
... so we could distinguish it from page fault.
RIF dumped
... since it only had a 1-byte advantage over two read instructions.
10 Feb 84
Code generation samples
... were added.
7 Feb 84
Field Descriptor mask field widened
To help with boundary conditions in generating field descriptors on the fly, the mask field of FieldDescriptor was widened to include 32. This also makes it follow the same convention as the shift field. We may wish to revisit both decisions when we see more cases.
NILCK dumped
NILCK is used to check for NIL pointers before they are dereferenced. We do not anticipate using NILCK often enough to warrant its inclusion. In cases where we would use it, inserting an extra fetch through the pointer will be quite sufficient. We can even afford to keep all of the low 64K of virtual memory unmapped to further reduce this problem.
SHL & SHR
Two useful shift operations, SHL and SHR, have been introduced to replace frequently occurring cases of FSDB followed by RFU.
6 Feb 84
SFCI replaces SFCB
SFCI replaces SFCB to save a byte, which increases the possible number of interfaces that can be called from a given global frame, since we can have 4 bytes of fetching preceding the SFCI. This allows us to dump RIF if it becomes necessary.
RRX replaces RFX
Simple name change.
LCn replaces LIn
LCn now allows short access to the first 8 constants. LCn replaces LIn, which was limited to the first 5 constants.
LRRB bytes swapped
For more commonality with instructions that added AlphaZ before fetching (Curry's suggestion).
AND, OR, and XOR dumped
They are not used often enough to warrant separate instructions. RAND, ROR, and RXOR should be used instead.
Field Unit instructions changed (again)
The Field Unit instructions have become FSDB (Field Setup Double Byte) and RFU (Register Field Unit). The previous instructions (FUDB, FUI, RFUI) are eliminated. FSDB and RFU are sufficient to do everything we need, although they are not always the most compact or efficient means to do so. We may identify special cases later in the design.