Overview
Dragon is a 32-bit machine. In this note, "word" denotes a 32-bit quantity, and usually implies an aligned quantity. Primitive arithmetic operations are 32-bit signed (with overflow checking), or 32-bit unsigned (with no checking). Fetching and storing is done in 32-bit words. Data addresses are 32-bit word addresses. Code addresses are 32-bit byte addresses.
DragOps procedures and processes have 32-bit descriptors, and differ substantially from those in Rubicon or Klamath PrincOps. Other than the addressing limits due to the 32-bit byte addresses, there is no limit on the number of procedures or processes.
DragOps local frames are implemented in a different manner than PrincOps local frames, although the user should not be able to see the difference in normal code. System code will need to be rewritten.
The remainder of this note should be viewed as a proposal, rather than a manifesto. Most of the points are open to debate.
Translation aids
There will be an interactive source-to-source translation program to aid in the conversion between current Mesa and Dragon Mesa. Its translations will be somewhat limited, since it works only by source conversion and by checking for patterns in the source. The user will have various conversion options, including warnings and optional substitutions for source patterns. These will be detailed below.
The compiler will have a checking option that will attempt to catch a few doubtful things that the translator will not be able to check. These checks will be based on sizes and types.
32-bit arithmetic changes
Dragon will have 32-bit signed (INT) and 32-bit unsigned (CARD) arithmetic. Dragon Mesa makes a stronger distinction than current Mesa between signed and unsigned arithmetic due to the difference between their operations. Signed arithmetic is the recommended norm, and has overflow detection built in. Unsigned arithmetic is an accommodation to modular arithmetic, and has no checking, except when converting between signed and unsigned. Address arithmetic uses signed arithmetic and comparisons, although we assume that no objects contain the address equivalent to LAST[INT] or FIRST[INT].
INT & CARD have order of halfwords reversed from current Mesa
This affects people who use LOOPHOLE to get the high/low halves of 32-bit numbers. It also affects compiler literals. It is not clear if the translator can give useful warnings about these things.
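One way to avoid depending on halfword order is to extract the halves arithmetically rather than through LOOPHOLE. The following Python sketch illustrates the idea. The function names are hypothetical, and the sketch assumes that current Mesa stores the low half first and Dragon Mesa the high half first; the text above states only that the order is reversed.

```python
def split_halfwords(value):
    """Return (high, low) 16-bit halves of a 32-bit quantity, by arithmetic."""
    value &= 0xFFFFFFFF
    return (value >> 16) & 0xFFFF, value & 0xFFFF


def join_halfwords(high, low):
    """Rebuild the 32-bit quantity from its halves."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


def layout_low_first(value):
    # ASSUMPTION: models current Mesa, low half at the lower address.
    high, low = split_halfwords(value)
    return [low, high]


def layout_high_first(value):
    # ASSUMPTION: models Dragon Mesa, high half at the lower address.
    high, low = split_halfwords(value)
    return [high, low]
```

Code that overlays a two-halfword record on a number sees the two layouts swap, while code written in the style of split_halfwords gives the same answer under either layout.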
Signed arithmetic is performed with overflow checking
This change will reveal lots of errors about arithmetic assumptions involving 16-bit quantities, and may be the most costly part of the conversion. The rationale is that we now have hardware support for detecting loss of precision in intermediate calculations, so we should use it to try to catch bugs. Applications that perform hashing should use unsigned arithmetic to avoid the overflow checking.
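The difference between the two kinds of arithmetic can be modelled in a few lines of Python. This is only a sketch of the intended semantics, not of the Dragon hardware; the names int_add, card_add, card_hash, and ArithmeticOverflow are invented for illustration, as is the multiplier in the hash.

```python
INT_FIRST = -2**31
INT_LAST = 2**31 - 1
CARD_MASK = 0xFFFFFFFF


class ArithmeticOverflow(Exception):
    """Stands in for whatever the overflow check would raise (hypothetical)."""


def int_add(a, b):
    """Signed 32-bit add: the result must fit in INT, otherwise it is an error."""
    result = a + b
    if not INT_FIRST <= result <= INT_LAST:
        raise ArithmeticOverflow("%d + %d does not fit in 32 bits" % (a, b))
    return result


def card_add(a, b):
    """Unsigned 32-bit add: wraps modulo 2**32 with no check."""
    return (a + b) & CARD_MASK


def card_hash(data, seed=0):
    """A toy multiplicative hash (illustrative only) that relies on wraparound."""
    h = seed
    for byte in data:
        h = (h * 31 + byte) & CARD_MASK
    return h
```

A hash such as card_hash overflows 32 bits routinely and by design, which is why it belongs in unsigned arithmetic; the same loop written with checked signed operations would fail on long inputs.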
When to use signed and when to use unsigned arithmetic?
When there are only signed values the compiler uses signed arithmetic. When there are only unsigned values the compiler uses unsigned arithmetic. When there are mixed expressions, the compiler uses signed arithmetic, but a warning about mixed arithmetic is given when the compiler is using the checking option mentioned above.
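The selection rule can be restated as a small decision function. The Python below is a sketch of the rule as worded above, not of the compiler; the function name, the string labels, and the warning text are all hypothetical.

```python
def choose_arithmetic(operand_kinds, checking=False):
    """Return (arithmetic, warning) for an expression.

    operand_kinds: iterable of "signed" / "unsigned", one per operand.
    checking: True when the compiler's checking option is enabled.
    """
    kinds = set(operand_kinds)
    if not kinds or not kinds <= {"signed", "unsigned"}:
        raise ValueError("operands must be 'signed' or 'unsigned'")
    if kinds == {"unsigned"}:
        return "unsigned", None
    if kinds == {"signed"}:
        return "signed", None
    # Mixed expression: signed arithmetic, with a warning only under checking.
    warning = "mixed signed/unsigned arithmetic" if checking else None
    return "signed", warning
```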
Subranges
Subranges of arithmetic types are also signed and unsigned. The default is signed, so [0..456) is a subrange of INT, while CARD[0..456) is a subrange of CARD, and is a different type than [0..456). NAT is a built-in signed subrange representing the intersection of INT and CARD, and is equivalent to INT[0..2147483647].
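The bounds of NAT follow directly from intersecting the two 32-bit ranges. The short Python check below works this out; the tuple representation and the helper name intersect are only for illustration.

```python
# Inclusive (first, last) bounds of the two 32-bit arithmetic types.
INT_RANGE = (-2**31, 2**31 - 1)
CARD_RANGE = (0, 2**32 - 1)


def intersect(a, b):
    """Intersection of two inclusive ranges, or None if they do not overlap."""
    first = max(a[0], b[0])
    last = min(a[1], b[1])
    return (first, last) if first <= last else None


NAT_RANGE = intersect(INT_RANGE, CARD_RANGE)
```

NAT_RANGE comes out as (0, 2147483647), matching INT[0..2147483647].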
Comparisons
Signed comparisons will be faster and smaller than unsigned comparisons, and unsigned comparisons will be faster and smaller than mixed comparisons, although all three classes are supported.
REAL support
The order of halfwords is reversed from the current Mesa. The sign bit of a REAL number is in the same position (in both versions of Mesa) as the sign bit of INT. As with current Mesa, 32-bit IEEE standard floating point is supported. Automatic coercions from CARD to REAL and INT to REAL are supported. Coercions from REAL to INT or REAL to CARD are not supported. The language supports no modes other than the default; any such support is left to the system.
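The claim about the sign bit can be checked against the IEEE single-precision format using Python's standard struct module. This is a sketch of the bit layout only; it says nothing about how Dragon stores the word in memory, and the helper names are hypothetical.

```python
import struct


def real_bits(x):
    """32-bit IEEE single-precision pattern of x, as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def int_bits(n):
    """32-bit two's-complement pattern of n, as an unsigned integer."""
    return n & 0xFFFFFFFF


def sign_bit(bits):
    """Most significant bit of a 32-bit pattern."""
    return (bits >> 31) & 1
```

For both a negative REAL and a negative INT, sign_bit returns 1, so a test of the top bit distinguishes negative from non-negative values of either type (with the usual IEEE caveat that -0.0 also has the bit set).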
Changes related to 32-bit words vs. 16-bit words
We have chosen to lengthen quantities rather than provide compatibility through extensive declaration of subranges or support for MDS. For addresses, the choice is relatively easy, since the limitations of a 16-bit address space are well understood. For arithmetic quantities, it is because we believe that most programmers use INTEGER and CARDINAL for their speed on 16-bit machines rather than to check for their limited ranges.
LONG POINTER => POINTER, LONG STRING => STRING
POINTER and STRING will denote 32-bit word addresses. The translator will take LONG POINTER into POINTER, and LONG STRING into STRING. The user will have the option of a warning on occurrences of short POINTER or short STRING in the source.
LONG INTEGER => INT, INTEGER => INT
INT is the preferred type for fixed-point arithmetic. The translator will take LONG INTEGER or INTEGER into INT, although the user will have the option to turn INTEGER into the subrange INT[-32768..32767].
LONG CARDINAL => CARD, CARDINAL => CARD
It is recommended that INT be used in preference to CARD, since this will result in better checking for arithmetic overflow. However, it is recognized that unsigned arithmetic has its uses. The translator will take LONG CARDINAL or CARDINAL into CARD, although the user will have the option to turn CARDINAL into the subrange CARD[0..177777B].
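The default substitutions for pointer, string, and arithmetic types amount to a handful of pattern rewrites. The Python sketch below shows the flavor of such a pass. It is not the proposed translator: the rule table and the function name are hypothetical, the optional subrange substitutions and warnings are left out, and no attempt is made to skip comments or string literals.

```python
import re

# Each rule is (pattern, replacement). Word boundaries keep identifiers such as
# INTEGERS or MYCARDINAL from being rewritten by accident.
_RULES = [
    (re.compile(r"\b(?:LONG\s+)?INTEGER\b"), "INT"),
    (re.compile(r"\b(?:LONG\s+)?CARDINAL\b"), "CARD"),
    (re.compile(r"\bLONG\s+(POINTER|STRING)\b"), r"\1"),
]


def translate_types(source):
    """Apply the default type substitutions to a fragment of Mesa source text."""
    for pattern, replacement in _RULES:
        source = pattern.sub(replacement, source)
    return source
```

For example, translate_types("p: LONG POINTER TO INTEGER") yields "p: POINTER TO INT". A short POINTER in the input is left alone, which is exactly the case for which the optional warning is intended.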
PROCESS (32 bits)
PROCESS is now interpreted as the 32-bit word address of a process control block, although the compiler treats this address as opaque.
Sequences normally have 32-bit signed bounds
They used to have 16-bit (or subrange) bounds. Sequences indexed by signed quantities are OK, but negative bounds are not allowed (checked at creation). Sequence bounds may be anywhere within the common part of the record if forced there by a MACHINE DEPENDENT declaration. If the location and width of the sequence bound is not specified, the compiler is free to put it in any convenient place, although we expect the bound to normally occupy a whole word.
STRING & TEXT have 32-bit signed bounds
They used to have 16-bit (or 15-bit) bounds. These types become equivalent to the following Dragon Mesa:
TEXT: TYPE = RECORD[
length: INT,
seq: PACKED SEQUENCE maxLength: INT OF CHAR];
STRING: TYPE = POINTER TO TEXT;
Note that negative bounds are prohibited, as with other sequences. Bounds checking is performed against the maxLength sequence bound, although it is conventional to use length to indicate the number of characters of information actually present in the string.
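The distinction between the checked bound (maxLength) and the conventional length can be modelled as follows. This Python class is an illustrative stand-in for the TEXT record, with hypothetical method names; it is not a description of the runtime.

```python
class Text:
    """Toy model of TEXT: a length field plus a bounded character sequence."""

    def __init__(self, max_length):
        if max_length < 0:
            # Negative bounds are rejected when the sequence is created.
            raise ValueError("negative sequence bound")
        self.max_length = max_length
        self.length = 0  # conventional count of meaningful characters
        self._chars = ["\0"] * max_length

    def _check(self, index):
        # Bounds checking is against max_length, not length.
        if not 0 <= index < self.max_length:
            raise IndexError("index %d outside [0..%d)" % (index, self.max_length))

    def fetch(self, index):
        self._check(index)
        return self._chars[index]

    def store(self, index, ch):
        self._check(index)
        self._chars[index] = ch

    def append(self, ch):
        self.store(self.length, ch)
        self.length += 1

    def contents(self):
        return "".join(self._chars[:self.length])
```

Note that fetching an index at or beyond length but below maxLength passes the bounds check; only the convention around length says the character there is not meaningful.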
We intend to retain bitwise compatibility between Rope.ROPE and STRING. This means that we also intend to right-justify the length field in word 0, and the maxLength field in word 1. This leaves room for the tag field of Rope.RopeRep.
MACHINE DEPENDENT
The offsets need to be changed to 32-bit quantities (probably by hand). The translator will optionally produce a warning for each occurrence of MACHINE DEPENDENT. As in current Mesa, the bits will be numbered from left to right, so bit 0 is most significant, and bit 31 is least significant. Variant record tags may be anywhere within the common part of the record if forced there by a MACHINE DEPENDENT declaration.
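Because bit 0 is the most significant bit, a field's position must be converted to a right shift before it can be extracted. The Python sketch below shows the conversion for a single 32-bit word; extract_field is a hypothetical helper, not part of any Mesa interface.

```python
WORD_BITS = 32


def extract_field(word, first_bit, width):
    """Extract a field from a 32-bit word using left-to-right bit numbering.

    first_bit is the number of the field's leftmost bit (0 = most significant);
    width is the number of bits in the field.
    """
    if width <= 0 or first_bit < 0 or first_bit + width > WORD_BITS:
        raise ValueError("field does not fit in one word")
    shift = WORD_BITS - (first_bit + width)  # bits to the right of the field
    return (word >> shift) & ((1 << width) - 1)
```

With this numbering, a field described as bits 0..15 is the high halfword, and bits 16..31 the low halfword, of the containing word.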
SIZE
A programmer trying to write code that is compatible for a variety of machines will treat SIZE as returning the size in addressable units, which may or may not be related to the word length of the target machine. For a Dragon target machine, SIZE[type] returns the size of the type in 32-bit units. For a Dorado target machine, SIZE[type] returns the size of the type in 16-bit units.
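The relation between a type's width in bits and its SIZE on each target is a rounded-up division by the addressable unit. The Python sketch below makes this concrete. The function name is hypothetical, and rounding up to a whole unit is an assumption of the sketch rather than a statement about how the compiler pads.

```python
DRAGON_UNIT_BITS = 32
DORADO_UNIT_BITS = 16


def size_in_units(bits, unit_bits):
    """Number of addressable units needed to hold `bits` bits (rounded up)."""
    if bits < 0 or unit_bits <= 0:
        raise ValueError("bits must be >= 0 and unit_bits > 0")
    return -(-bits // unit_bits)  # ceiling division
```

A record of two 32-bit fields occupies 64 bits, so it would have SIZE 2 on a Dragon target and SIZE 4 on a Dorado target. Code that allocates SIZE[type] units stays correct on both, while code that hard-wires either number does not.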
@
Asking for the address of a field that is not on an addressable boundary results in a warning from the compiler, in which case the address generated is the address of the containing addressable unit.
LOOPHOLE
The translator will offer the option of a warning whenever a LOOPHOLE is used.
Variables pad to full words
Variables will pad out to full 32-bit words, instead of 16-bit words.
New features
WORD (32 bits), HALFWORD (16 bits), BYTE (8 bits)
WORD (an old synonym for CARDINAL) is now interpreted as a 32-bit non-arithmetic quantity, HALFWORD (a new type) as a 16-bit non-arithmetic quantity, and BYTE as an 8-bit non-arithmetic quantity. The user will be able to coerce these quantities (without checks) by using INT[x] or CARD[x] (where x is a WORD, HALFWORD, or BYTE). The intention is that programmers should be able to use WORD in most places that UNSPECIFIED is used. The translator will allow conversion of WORD in old programs to CARDINAL, CARDINAL[0..177777B], or WORD, and conversion of BYTE in old programs to either CARDINAL[0..377B] or BYTE.
OFFSET
The OFFSET of a component is its distance from the base in the same units that SIZE uses. For example, suppose that we have the following source:
RT: TYPE = RECORD [a: INT, b: CARD];
off: INT = OFFSET[RT.b];
Then off = 1 for a Dragon target machine, and off = 2 for a Dorado target machine. Asking for the OFFSET of a component that is not on an addressable boundary results in a warning from the compiler, in which case the offset generated is the offset of the containing addressable unit.
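The two results in the example, and the behavior for unaligned components, can be reproduced from the component's bit offset. The Python function below is a sketch under the assumption that field b starts 32 bits into the record on both targets; offset_in_units and its warning flag are hypothetical names.

```python
def offset_in_units(bit_offset, unit_bits):
    """Return (offset, warn) for a component starting at bit_offset.

    offset is the offset of the containing addressable unit; warn is True
    when the component does not start on an addressable boundary.
    """
    if bit_offset < 0 or unit_bits <= 0:
        raise ValueError("bit_offset must be >= 0 and unit_bits > 0")
    units, remainder = divmod(bit_offset, unit_bits)
    return units, remainder != 0
```

For RT.b, offset_in_units(32, 32) gives an offset of 1 (Dragon) and offset_in_units(32, 16) gives 2 (Dorado), both without a warning. A component starting at bit 40 on Dragon would yield offset 1 with the warning flag set.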
Features reluctantly included for compatibility
The use of these features is not recommended, but they are supported in an attempt to reduce the cost of converting to Dragon Mesa.
LONG UNSPECIFIED => UNSPEC, UNSPECIFIED => UNSPEC
It is recommended that LOOPHOLE be used in preference to UNSPECIFIED. However, UNSPECIFIED will be supported for the sake of compatibility. The translator will take LONG UNSPECIFIED or UNSPECIFIED into UNSPEC, although the user will have the option of a warning on occurrences of UNSPECIFIED or LONG UNSPECIFIED in the source. As in current Mesa, an UNSPEC value can be assigned to any variable of compatible width (up to one word). Also, an UNSPEC variable can receive any value of compatible width. UNSPEC values do not participate in arithmetic without an explicit coercion.
LONG DESCRIPTOR => DESCRIPTOR
It is recommended that most uses of DESCRIPTOR be converted to uses of some kind of SEQUENCE. However, DESCRIPTOR will be supported for the sake of compatibility. The translator will take LONG DESCRIPTOR into DESCRIPTOR (now based on 32-bit quantities). The user will have the option of a warning on occurrences of short DESCRIPTOR in the source.
Changes from Rubicon mostly realized in Klamath
32-bit procedure descriptors
Procedure descriptors will be 32 bits wide, as in Klamath. Unlike Klamath, a procedure descriptor is to be uniformly treated as a 32-bit word address of a 32-bit byte program counter (PC). This change applies to PROC. SIGNAL and ERROR are also made 32 bits wide, and their format is made the same as PROC (calling a SIGNAL or ERROR results in an appropriate call to the signal handling mechanism).
Signal handling based on PC ranges
This change allows us to handle signals arising from instructions other than procedure call.
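One way to picture handling based on PC ranges is a table that maps ranges of code addresses to handlers, searched with the PC of the faulting instruction. The Python sketch below is purely illustrative: the table format, the class name, and the restriction to non-overlapping (non-nested) ranges are assumptions, not the proposed DragOps design.

```python
import bisect


class HandlerTable:
    """Maps half-open PC ranges [start, end) to handler names (hypothetical)."""

    def __init__(self, entries):
        # entries: iterable of (start_pc, end_pc, handler); ranges must not overlap.
        self._entries = sorted(entries)
        self._starts = [entry[0] for entry in self._entries]

    def lookup(self, pc):
        """Return the handler covering pc, or None if no range contains it."""
        i = bisect.bisect_right(self._starts, pc) - 1
        if i >= 0:
            start, end, handler = self._entries[i]
            if start <= pc < end:
                return handler
        return None
```

Because the lookup is keyed on the PC alone, it works the same way whether the signal arises at a procedure call or at, say, an arithmetic instruction that overflows.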
Other changes
NEW, START, STOP of modules not allowed
This mess needs to be cleaned up anyway. Unless there is a strong outcry, multiple instances of global frames for a given module will not be supported.
PORT not allowed
Coroutines are not supported on Dragon. There is a proposal by Howard Sturgis to add coroutines in a manner different from the current support, and I believe that this should be considered separately after the initial conversion.
STATE not allowed
There is no state vector on Dragon. We also get rid of RETURN WITH state, TRANSFER WITH state, STATE ← state, and state ← STATE.
ARRAY[0..0) not allowed
ARRAY [0..0) OF T is a crock. The translator should be able to turn this into
RECORD[SEQUENCE COMPUTED INT OF T]
MACHINE CODE syntax changed
The interior of a MACHINE CODE procedure will be in some simple assembly language. The compiler will support labels to aid in coding conditionals.
Still up in the air
These issues have not yet been decided, although our leanings are usually obvious.
Should we provide built-in types for INT[-32768..32767] & CARD[0..177777B]?
Although my guess is that most people will want to use the 32-bit quantities, if enough people frequently use 16-bit quantities then we could provide INT16 and CARD16 (unless some other name is preferred). This is only marginally useful, since these types could be easily declared in the Basics interface.
Should we support LONG REAL?
Dragon hardware has provision for 64-bit IEEE floating point support. Eventually we may want support for this type in Mesa. The only question is: how soon?
POINTER TO FRAME?
This construct is essentially a hack to get at multiple global frames, or to sneakily access non-exported variables from global frames. It is not used much at all. Hal Murray indicates that it can be useful when (due to politics) one cannot recompile a module to add necessary access. However, the presence of the abstract machine may make this issue moot.
MUTABLE variant records?
In current Mesa the tag field of variant records is mostly mutable in unsafe code, and immutable in safe code. It would be possible to distinguish between mutable and immutable tag fields. If the variant part of a record is reference containing, then the tag is always immutable. If the tag is immutable, then variant record assignments will perform a check for equal tags before allowing the remainder of the assignment to proceed. If the tag is mutable, then the assignment can proceed. When generating code for NEW[RT], where RT is a variant record type, the maximum amount of space is allocated if the tag is mutable, and only the required amount of space for the given variant is allocated if the tag is immutable.
Note that composite RC assignment cannot reasonably be atomic on Dragon. This effectively prevents mutable tags for RC records, since two assignments occurring in parallel could violate storage safety.
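The assignment and allocation rules suggested for mutable and immutable tags can be summarized in a short Python model. Since the question is still open, this is only a sketch of the rules as worded above; the dictionary representation of a variant record and all names here are hypothetical.

```python
class TagMismatch(Exception):
    """Raised when an immutable-tag assignment would change the tag."""


def assign_variant(dst, src, tag_mutable):
    """Assign src to dst, where each is {"tag": ..., "fields": {...}}."""
    if not tag_mutable and dst["tag"] != src["tag"]:
        # Immutable tag: the tags must already agree before anything is copied.
        raise TagMismatch("%r != %r" % (dst["tag"], src["tag"]))
    dst["tag"] = src["tag"]
    dst["fields"] = dict(src["fields"])


def allocation_units(variant_sizes, tag, tag_mutable):
    """Space for NEW[RT]: largest variant if the tag can change, else exact."""
    if tag_mutable:
        return max(variant_sizes.values())
    return variant_sizes[tag]
```

The allocation rule explains the check: a record allocated for exactly one variant cannot safely be overwritten with a larger one, so an immutable tag must be verified before the copy.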
Should ROPE be built-in?
Given its widespread use, probably. There is some question about how to fold it into the language.
Potential cleanup
Here are a few items that we could clean up in this translation. There is no architectural reason to eliminate them, but most people agree that they are warts.
UNSPECIFIED?
UNSPECIFIED is an implicit LOOPHOLE. It would be better to make such uses explicit.
BOOLEAN => BOOL?
... just to remove unnecessary aliases.