A Modula-2 One-Pass Compiler for Dragon
Juerg Gutknecht
October 1985
1. Introduction
2. Program and Data Structure
3. Runtime Organization
4. Memory Allocation
5. Code Generation
6. Object File Generation
7. Conclusion
1. Introduction
By nature, programming language compilers perform a double function: determine whether a given sequence of symbols (the source program) is a syntactically correct construct, and, if so, translate the source program into an equivalent sequence of machine instructions. Many effective parsing techniques have been developed to make the first step realizable by a computer program. Probably the most elegant is the method of recursive descent. It is based on a program that closely mirrors the underlying grammar: a procedure is associated with each production and instantiated whenever this production is recognized. The state of the parser is completely captured on the procedure activation stack. No explicit data structure has to be built up to represent the parsed sequence of symbols. This attractive characteristic of recursive descent can be fully exploited only if an explicit data structure is not needed to interface with the translation pass. An obvious way out is to integrate the translation pass into the parsing process. The result is a one-pass compiler.
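The idea can be illustrated by a minimal sketch, here in Python rather than the compiler's own Cedar. Each procedure mirrors one production of a toy expression grammar, and "code" (postfix tokens, standing in for machine instructions) is emitted the moment a production is recognized, so no parse tree is ever built. The grammar and all names are illustrative, not taken from the compiler.

```python
def compile_expr(tokens):
    """Parse and translate in one pass; returns postfix 'code'."""
    pos = [0]
    out = []

    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None

    def advance():
        pos[0] += 1

    def factor():                  # Factor = number | "(" Expression ")"
        if peek() == "(":
            advance()
            expression()
            assert peek() == ")", "expected closing parenthesis"
            advance()
        else:
            out.append(peek())     # emit a load of the operand
            advance()

    def term():                    # Term = Factor {"*" Factor}
        factor()
        while peek() == "*":
            advance()
            factor()
            out.append("*")        # emit code as soon as recognized

    def expression():              # Expression = Term {("+"|"-") Term}
        term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            term()
            out.append(op)

    expression()
    return out
```

The parser's entire state lives on the procedure activation stack; the only output is the emitted code.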
The absence of explicit parse-tree structures and intermediate files contributes decisively to both a compiler's simplicity and its efficiency. This was the principal motivation behind a recent project, carried out by N. Wirth and the author at the Federal Institute of Technology in Zurich. Its goal was the design and implementation of a one-pass compiler for the language Modula-2 on the basis of recursive descent. As a matter of fact, the goal was more ambitious: we planned to develop a root-compiler from which variants for actual processors can easily be derived. Not surprisingly, we found the key to success in a rigorous extraction of the part that depends on the target machine [3, 4]. To summarize the opposing objectives: the closest possible interweaving of the intrinsic part and the machine-specific part from a dynamic point of view, and the greatest possible decoupling of these two parts from a static point of view. Variants already exist for Lilith (Am2901), Ceres (National Semiconductor 32000) and the Apple Macintosh (Motorola 68000) [5]. All these compilers are implemented in Modula-2 itself.
The current paper presents a Modula-2 compiler whose target is Xerox's Dragon computer [6]. It was constructed by the author at Xerox PARC in the summer of 1985. Although formulated in the Cedar language [2], the compiler should be regarded as a member of the family introduced above. While the intrinsic part is merely a translation from Modula-2 to Cedar, the same cannot be said about the machine-dependent component. A reduced instruction set and several architectural peculiarities, for example a two-level store, necessitate extended concepts for code generation and drive one-pass compilation to its limits. We shall particularly pursue this aspect in the subsequent Sections.
2. Program and Data Structure
In general, a large program consists of different functionally closed units, or modules, that are capable of managing themselves. Actions of these modules are normally triggered, directly or indirectly, by the main program that controls the overall process.
In the Modula-2 compiler under discussion, the parser acts as main program. We recognize the following functional modules:
- Scanner: Converts a character stream into a Modula-2 symbol stream.
- Symbol table handler: Maintains the symbol table, i.e. a dynamic linked structure displaying scope hierarchy and describing declared objects.
- Symbol file handler: Supports separate compilation, i.e. generates and reads symbol files.
- Code generator: Generates the actual sequence of machine instructions.
Figure 1 shows the interdependence of these components. In reality, we have further subdivided the parser and the code generator, mainly to attain acceptable module lengths. In addition, we have introduced a base module that merely declares the global data structure. Finally, the whole module configuration is subordinate to a registry module that registers the compile command modula and a command decode for decoding object files. Figure 1 summarizes the resulting module structure, and Figure 2 displays an imaginary snapshot of the symbol table. We note that the newest version of the root-compiler utilizes binary search trees instead of linear lists [3]. This adaptation has been shown to accelerate compilation by up to 30%.
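The symbol table organization just mentioned can be sketched as follows, again in Python for illustration only: each scope holds its declared objects in a binary search tree ordered by name, and scopes are chained along the static nesting, so lookup searches the innermost scope first and then walks outward. Class and field names are invented, not the compiler's.

```python
class Obj:
    """A declared object: a name plus BST links (replacing list links)."""
    def __init__(self, name, kind):
        self.name, self.kind = name, kind
        self.left = self.right = None

class Scope:
    def __init__(self, outer=None):
        self.root = None            # BST of objects declared here
        self.outer = outer          # enclosing scope (static nesting)

    def insert(self, obj):
        node, parent = self.root, None
        while node is not None:
            parent = node
            if obj.name == node.name:
                raise KeyError("duplicate declaration: " + obj.name)
            node = node.left if obj.name < node.name else node.right
        if parent is None:
            self.root = obj
        elif obj.name < parent.name:
            parent.left = obj
        else:
            parent.right = obj

    def find(self, name):
        scope = self
        while scope is not None:    # innermost scope first
            node = scope.root
            while node is not None:
                if name == node.name:
                    return node
                node = node.left if name < node.name else node.right
            scope = scope.outer     # then the next outer scope
        return None
```

A tree search costs O(log n) comparisons per scope instead of the O(n) of a linear list, which is the source of the speedup cited above.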
A short functional description of the individual modules follows. Numbers in parentheses indicate the approximate number of source program lines. The overall program length amounts to about 5000 lines.
- M2Impl (220): Registers commands modula and decode.
- M2D/M2DImpl (220): Defines global data structure.
- M2F/M2FImpl (200): Generates code for built-in Modula-2 functions and procedures.
- M2G/M2GImpl (720): Generates code for statements.
- M2H/M2HImpl (700): Generates code for expressions.
- M2I/M2IImpl (450): Generates Dragon instructions, generates object files.
- M2P/M2PImpl (670): Parses compilation units, blocks, declarations and statements.
- M2Q/M2QImpl (570): Parses types and expressions.
- M2R/M2RImpl (600): Generates symbol files and reference files, reads symbol files.
- M2S/M2SImpl (380): Scans source text and converts to Modula-2 symbols.
- M2T/M2TImpl (320): Maintains symbol table.
The command modula accepts a sequence of module names (extension def for definition modules, mod for implementation modules). Definition modules are compiled into symbol files (extension sbl), implementation modules into pairs of object files (obj) and reference files (rfc). Reference files are generalized symbol files and serve symbolic debugging [4]. The decode command accepts a sequence of pairs of object files and text files; the latter contain the decoded object file.
3. Runtime Organization
Let us first recapitulate Dragon's resources and architecture. The Dragon processor consists of two components: the instruction fetch unit and the execution unit. Both are permanently connected to the memory. The execution unit provides several sets of word-size registers. Two of them are of relevance to the presented compiler: 12 registers c[0], c[1], ..., c[11] and a stack of 128 registers r[0], r[1], ..., r[127]. The c-registers are intended to contain distinguished constant values. In particular, our compiler assumes c[i] to contain the integer constant i for i = 0...4, and c[6] to represent a mask of all ones (value -1). The r-register stack is circularly ordered. We shall call any sequence of 16 contiguous registers in the stack a window. The two registers r[S-1] and r[S] at the top of the stack and the members r[L], r[L+1], ..., r[L+15] of the active register window are accessible by machine instructions (see Appendix 1). Registers of the active window are called local registers. They are assigned to local data. Two pointers S and L in the instruction fetch unit define the state of the register stack. Further, the instruction fetch unit maintains a program counter (PC) register and a stack of procedure contexts. A procedure context is a pair (L, PC) of a pointer L to a register window and a pointer PC to a program location. Procedure call instructions extend the procedure context stack, return instructions reduce it. The procedure context stack displays the dynamic link of procedure activations.
With these foundations in mind, we now turn to the runtime model, more precisely, to the hierarchical representation of instantiated procedures and modules. Figure 3 may serve as a sample accompanying our explanations. Whenever a procedure is activated, its entry routine creates a new local frame. A frame consists of a data part and two linkage control pointers. The data part contains the local data of the activated procedure. It is placed in local registers, and, if necessary, in a memory frame extension. We shall postpone the detailed explanation of its organization to the next Section. Both linkage control pointers are always part of the local register window. The first is the static link (SL). It points to the base of the frame extension that represents the next lower static level. In the case of a call to an outermost procedure, SL is the global base and is loaded by the called procedure. In all other cases, SL is expected to be provided by the caller as an additional parameter. The second linkage pointer (FRX) points to the procedure's own frame extension. Notice that in special cases one or both of these pointers may be void. Allocating a local register for them anyway is then a tribute to the one-pass compiling strategy.
We extend this scheme of procedure activation frames canonically to include the scope of global data. However, there is a principal difference between local and global data. The latter outlast the corresponding program body, i.e. the module's initialization part. Therefore, global data are allocated by the module loader entirely in memory. However, both linkage control pointers remain in local registers. The FRX-pointer becomes the global base register and the SL-pointer points to the base of the literal (string) table. Thus, access to literals fits harmoniously in the scheme of access to outer level data by tracing the static link.
Examining the technique of data access more closely, we make two observations. First, access to data of the next outer scope is as efficient as access to data in the local memory frame extension. In both cases, a local register (SL or FRX) acts as a base. The Dragon instructions LRI (load indirect) and SRI (store indirect) ideally support this access mode. Notice a desirable consequence: non-nested procedures have particularly efficient access to global data. The second observation is less than pleasing: it is architecturally impossible to access a data element if it is located in the register window of a different frame. There exist two cases necessitating such access: 1) a procedure addresses a local register variable of an outer scope, and 2) a procedure addresses a local register variable handed over as variable parameter. In both cases it is the caller's responsibility to allocate these externally accessed variables in the appropriate memory extension. The one-pass strategy is again put to a severe test.
Our solution is based on backup copies, in the memory frame extension, of externally accessed local register variables. To that end we associate with each procedure the set of local register variables that are accessed (imported) from the next outer scope, and with each procedure call the set of local register variables that are handed over as variable parameters (exported) to the called procedure. At a procedure call, the caller backs up all exported register variables in the local frame extension. In the case of a nested procedure call, the backups include the set of register variables imported from the caller. Remember that Modula-2 does not allow nested procedures to be assigned to procedure variables. Therefore, this method is correct for calls of procedure variables as well.
As a matter of fact, external access of local register variables is not the only case requiring a memory backup of local registers. A memory copy of the static link itself may be necessary to establish the static link chain. Our Modula-2 compiler handles both cases in a unified way. Whenever a memory backup of local registers is demanded, a backup area of at most 16 words is included in the memory frame extension as its first section. The i-th local register is mapped to the i-th memory backup word with two exceptions: the SL-register is sent to word 0 and the 0-th local register to the word that corresponds to the SL-register number, i.e. the mapping twists the pair (SL, local register 0). Notice that the memory word corresponding to the FRX-register is unused so far.
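The twisted mapping can be stated in a few lines. In this hedged sketch, `sl_no` stands for the SL-register's number within the local window, which (as Section 4 explains) follows the parameter cells and thus varies per procedure; the function name is ours, not the compiler's.

```python
def backup_word(reg_no, sl_no):
    """Map local register reg_no to its backup word in the memory
    frame extension. The mapping is the identity except that it swaps
    the pair (SL register, local register 0): SL goes to word 0, and
    register 0 goes to SL's word."""
    if reg_no == sl_no:
        return 0
    if reg_no == 0:
        return sl_no
    return reg_no
```

The twist keeps the mapping a bijection on the 16 window positions while pinning SL's backup copy to a fixed, easily found location (word 0 of the extension).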
In contrast to procedures, modules do not generate new frames. A sub-module's initialization sequence is activated at entry to its ancestor procedure by a subroutine call instruction, but without changing the L-pointer. Before a main module can take control, its constituents have to be loaded from the object file by the module loader. We already know that the loader is expected to allocate the global data frame. In addition, its functions include loading the literal data frame and the code frame. We shall explain the object file in more detail in Section 6. The actual initialization of a main module is triggered by import statements of client modules. The module initialization flag prevents multiple initialization. It is allocated to the unused memory companion of the FRX-register, i.e. to the second word of the global frame.
In addition to the Modula-2 runtime organization as described above, system runtime support is required. It includes backup and restore of the r-register stack in the case of overflow and process switching, as well as routines to be called to handle traps and frequently occurring situations that are inappropriate or awkward to code inline. These include block moves, integer multiplication and division, real-number arithmetic, memory allocation and deallocation, and bitset constructors (see Appendix 2). All runtime routines are called by a special CALL-instruction. Parameters are made available on the stack.
In this Section, we have outlined the runtime hierarchy of frames in the large. The next Section will be devoted to the organization within a single data frame.
4. Memory Allocation
The basic unit of data to be stored in a frame (local or global) is the variable. Variables appear in three different forms: As formal parameters, as named variables and as anonymous variables. Anonymous variables are used to store the record-base in WITH-statements. Allocation of variables of all three forms is handled uniformly, although at different times. Formal parameters are allocated by the procedure caller, named variables by the called procedure at entry, and anonymous variables by the called procedure during statement processing.
To each variable a cell of wordsize is assigned. If the variable is elementary, it is placed directly in the cell. If the variable is structured (i.e. of array-type or record-type), it is placed at a certain memory location, and the cell points to that location. Open-array parameters need a slightly different handling. Their description is split up into two cells, the first pointing to the respective memory location, and the second indicating the length of the array.
In global frames, all cells are stored in memory, one after the other. In local frames, cells are placed first in local registers, and, in case of overflow, in the frame's memory extension. All parameter cells are assumed to fit in local registers. The two linkage control registers (Section 3) immediately follow the parameter cells. Cells for named and anonymous variables fill up the 16 local registers, and, possibly, overflow to the memory extension (see Figure 3). This scheme limits the number of parameters to 14. A second limit applies to the total number of cells. In favor of uniform and efficient access via the FRX register using LRI and SRI instructions, this number (including the two linkage control registers) is limited to 256. It is perhaps worth emphasizing that only one cell is associated with iteratively structured variables (for example an array of records). Namely, within the assigned memory area, the whole structured variable is canonically unrolled.
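A minimal sketch of this local-frame layout, with the two limits just stated, might look as follows (one word per cell; function and names are illustrative, not the compiler's):

```python
def allocate_cells(n_params, n_vars):
    """Sketch of the local frame layout: parameter cells first (all
    must fit in registers), then the two linkage registers SL and FRX,
    then variable cells filling the 16-register window, the remainder
    overflowing into the memory frame extension. Returns the pair
    (cells in registers, cells in the memory extension)."""
    assert n_params <= 14, "at most 14 parameter cells"
    assert n_params + 2 + n_vars <= 256, "at most 256 cells per frame"
    used = n_params + 2                 # parameters + SL + FRX
    in_regs = min(n_vars, 16 - used)    # fill up the register window
    return used + in_regs, n_vars - in_regs
```

For instance, a procedure with 10 parameters and 20 variable cells fills the whole window (12 + 4 cells) and places the remaining 16 cells in the memory extension.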
We have now a complete survey of the possible contents of a memory frame extension. Recalling memory backup of static link and externally accessed registers (Section 3), the following situations demand allocation of a memory extension to the frame of procedure P:
- P contains a nested procedure.
- P contains structured local variables (including parameters).
- P contains more than 14 local variables (including parameters).
- P hands over one of its local register variables as variable parameter.
It is noteworthy that, statistically, most procedure activations will manage without a memory frame extension. In fact, investigations have shown that, in an object-oriented environment, most calls are calls to procedures without any of the above properties. Once one is aware of these conditions, the percentage can (and should) obviously be increased by an appropriate programming style.
Considering the necessity of runtime calls to allocate and deallocate memory, it is desirable (though not necessary) to collect all memory parts of a frame into one connected block. This is in fact practiced, with the sole exception of open arrays. Open-array value parameters are allocated separately, depending on the extra cell describing their size.
In connection with allocation and deallocation of local frames, perhaps unexpected difficulties appear. The procedure entry sequence and, possibly, return sequences have to be compiled before the exact layout of the frame is determined. Indeed, the statement sequence itself can contribute to the frame layout in two ways: by passing over variable parameters (demanding a memory frame extension) and by WITH-statements (introducing anonymous variables). Both the number of eventually occupied local registers and the size of the memory extension are therefore unknowns, and both require particularly sophisticated handling. Given the fact that no parameterized instruction is usable, the first unknown necessitates an unknown number of instructions to be generated to adjust the stack pointer. The second unknown includes the case that no memory extension is needed at all, a case that obviously deserves special handling.
We have overcome these problems by making extensive use of the fixup technique. In fact, the entry sequence and all return sequences have to be fixed before exiting the procedure. Although no run-time overhead has to be paid (noop instructions are eliminated in the object file, see Section 6), this seems not to be the ultimate solution. After all, this is an impressive example of how complexity may increase in a one-pass compilation environment if resources are to be allocated implicitly as required.
In Modula-2, variables are not initialized automatically. However, formal parameters and cells that point to the memory frame extension do have to be initialized; the former by the caller, the latter by the initialization sequence of the called procedure. A new difficulty comes to the fore. At the time of evaluating the actual parameters, no memory frame extension of the called procedure exists yet. Hence, structured value parameters cannot be assigned to their formal counterparts at this time. In contrast to the previous problem, an elegant solution exists here. Instead of the (address of the) formal parameter, the actual parameter is passed over and the copy operation postponed to procedure initialization. The initializing sequence then simply copies the actual parameter value to the correct location within the memory extension and fixes the respective formal parameter.
This terminates our outline of data allocation. In the next Section we shall learn how data structures are processed by Dragon instructions.
5. Code Generation
The data objects declared in a module are the basic constituents of a program. The symbol table specifies their properties and attributes exhaustively and in a concise way. For example, information about type, size and allocation (cell) of each object is stored in the symbol table, or, in the case of a procedure object, the set of all imported variables (see Section 3). However, these objects are not the only items the code generator finds when processing a statement sequence. A new category of items appears while compiling selectors, expressions and procedure calls, for example. Taking our pattern from [3], we associate an item descriptor (item for short) with each compiled construct. Analogously to object descriptors, items specify the attributes of the corresponding construct. For example, the item controlling a procedure call specifies the set of exported variables (see Section 3).
In contrast to objects in the symbol table, items are of a transient nature. Typically, they are local variables of compiling procedures. A new item is created whenever the compilation of an appropriate construct starts. During its lifetime, the item assumes various states or modes. The set of possible modes is indicated in Figure 4. We emphasize that carefully tailoring the item mode space to the machine's architecture and instruction set is absolutely crucial to the quality of the generated code.
Before a data element can actually be processed, it has to be made available, or loaded. Dragon provides three kinds of resources to keep a loaded data element: the register stack (top), local registers and constants registers. stkMd (resp. regMd) indicates that the element is loaded on top of the stack (resp. in a local register). In conMd, an integer constant can be regarded as loaded in a constants register if it represents one of the special values 0, 1, 2, 3, 4 or -1 (see Section 3).
Obviously, the notion of (genuine) loading is restricted to data elements of wordsize. In the case of compound data we shall interpret loading the object as loading its address. Notice that this extension is in perfect concordance with the cell concept (see Section 4). Of course, if allocated in memory, loading the address has an intrinsic meaning also for data elements of wordsize. As a matter of fact, it is frequently a first step to loading the element itself. The two modes sadrMd and ladrMd cover the cases of loaded address on the stack and in a local register. Their exact definition is motivated by the members LRI, SRI, RSB, WSB of Dragon's load and store instructions arsenal. In the sadrMd, the base address resides on the stack, in the ladrMd it is contained in a local register. In both modes, a positive byte offset precisely specifies the position of the data element.
Going down another step in the scale of "loadedness" we meet the modes sinxMd and linxMd. As suggested by their names, they refer to indexed data. Both modes assume the index value on top of the stack. sinxMd takes the base address from the word immediately below the stack top, linxMd takes it from a local register. These modes are closely related to the load instructions RX and QRX. Here, no store counterparts exist. The modes typMd, procMd, codMd, varMd and conMd are initial modes. One of these modes is assigned to a fresh item at its creation, depending on the kind of the underlying object.
The two remaining modes opnMd and jmpMd are of particular interest. Both are artificial modes and delay the emission of code. In essence, they prevent the compiler from prematurely emitting code to load data onto the stack and thus allow for a slightly context-sensitive handling. The opnMd specifies an open operation, i.e. an operator and two operands on the stack or in registers (reflecting the RR-instruction format). Take as a simple example the statements n := i+j and n := i+j+k, where i, j, k and n are integer variables in local registers. In the first case, the result i+j should be assigned directly to n, using the register add instruction RADD, whereas in the second statement the same result must first be loaded onto the stack as an intermediate.
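The delayed-operation idea can be sketched as follows, in Python for illustration. An opnMd item merely remembers the operator and its two operands; code is emitted only once the destination is known. RADD is the register add named in the text; "SADD" is an invented stand-in for a stack add, and all helper names are ours.

```python
code = []                                 # emitted instructions

def add(x, y):
    # opnMd: remember the open operation, emit nothing yet
    return ("opnMd", "ADD", x, y)

def load_stk(item):
    # force a pending operation onto the stack
    _, op, x, y = item
    code.append(("S" + op, x, y))         # e.g. SADD (illustrative name)
    return "stk"

def assign_reg(dest, item):
    if isinstance(item, tuple) and item[0] == "opnMd":
        _, op, x, y = item
        code.append(("R" + op, dest, x, y))   # one RR instruction: RADD
    else:
        code.append(("STORE", dest, item))

# n := i + j : a single RADD, no intermediate load
assign_reg("n", add("i", "j"))
# n := i + j + k : the inner sum must first go onto the stack
assign_reg("n", add(load_stk(add("i", "j")), "k"))
```

The first statement compiles to one three-address RADD; the second forces the inner sum onto the stack before the final RADD, exactly the distinction drawn above.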
The jmpMd is even more sophisticated. It subsumes information about a whole conditional expression (to be conditionally evaluated in Modula-2). This information splits up into two parts. Following [5], the first part includes two chains, the true-jump chain and the false-jump chain, both linking jump instructions within the object code. As soon as the destination of either the true-jumps or the false-jumps is determined, the corresponding chain is fixed up. The second part specifies the most recent relation in the conditional expression, i.e. a relational operator and an argument on the stack or in a local register. The other argument is assumed to be on the stack (reflecting the RJB jump-instruction format). If an explicit Boolean value has to be tested (represented by 0 for FALSE and by 1 for TRUE), JBB jump instructions are generated instead of RJB instructions. Notice that the most recent relation replaces the condition code of conventional processors.
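The chain technique of [5] can be sketched in a few lines (Python, illustrative names): a forward jump whose target is still unknown stores the previous chain head in its own displacement field, and once the destination is determined the chain is walked and each displacement patched.

```python
obj = []                                  # emitted code: [opcode, operand]

def emit_jump(op, chain_head):
    """Emit a forward jump; its displacement field temporarily stores
    the previous chain head (or None), linking the chain."""
    obj.append([op, chain_head])
    return len(obj) - 1                   # the new chain head

def fix_up(chain_head, dest):
    """Destination now known: walk the chain, patching displacements."""
    at = chain_head
    while at is not None:
        nxt = obj[at][1]                  # follow the link
        obj[at][1] = dest - at            # patch the real displacement
        at = nxt
```

No auxiliary data structure is needed: the unfinished code itself carries the chain, which suits one-pass compilation well.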
The necessity of fixing up forward jump destinations is inherent in one-pass compiling. At the time of emitting a jump instruction, not even the order of magnitude of the jump displacement is known. It is certainly justifiable to emit an instruction with a capacity for a displacement of, say, two bytes. Unfortunately, in the case of the Dragon, conditional jump instructions provide a one-byte displacement only. However, a long-distance jump to D on condition C (JMP C, D) can be expressed by an indirect jump sequence (JMP NOT C, L; JMP D; L: ...). Therefore, it is a reasonable solution to emit the substituting sequence throughout, and to transform unnecessary long jumps into short jumps before generating the object file. We shall go into more detail on this transformation in the next Section.
Once all item modes have been fixed, the art of code generation essentially consists in loading data operands in the correct sequence at the right time into the right resource. Accordingly, the heart of the code generator is a collection of loading procedures. As with item modes, loading procedures occur in pairs: Load and LoadStk on one hand, LoadAdr and LoadAdrStk on the other. All of these procedures take an item as parameter. For example, calling LoadStk for any item, except an item in jmpMd, puts that item into stkMd. Calling Load for an item in varMd results in regMd, if the variable cell is located in a local register, otherwise in stkMd. Loading an sadrMd item always ends up in stkMd. Loading the address of a wordsize variable whose cell is in the memory extension gives ladrMd, and loading the address of an item in linxMd ends up in sadrMd. Finally, what does it mean to load an item that is in opnMd? In fact, it means evaluating the pending operation in such a manner that the result goes onto the stack.
In a way, the assignment generator is a store counterpart to the diverse loading procedures. It depends on a preparing procedure, which essentially guarantees that the left-hand side of the assignment statement is in a state of loaded address or in regMd. The assignment generator itself performs type checking and the actual assignment. In the case of an opnMd item to be assigned to a local register variable, it generates the corresponding RR-instruction.
Let us add a few comments on the compiler's present state as regards code optimization, data packing and bounds checking. We have seen that the two modes opnMd and jmpMd conceptually support lazy evaluation. Further, early loading of constants is carefully avoided. In particular, the compiler evaluates constant expressions at compile time. Finally, integer multiplications and divisions by powers of 2 are resolved into shift operations. In the current version, the word is the elementary memory unit. No packing at all is provided. Obvious candidates for packing are arrays of characters, short integers or cardinals. Packing would be based extensively on Dragon's sophisticated field unit. For example, field operations allow replacing single bytes in a word. To compile an assignment statement a[i] := ch, where ch is of type CHAR and a is a packed array of characters, the assignment preparation routine would have to supply the address of the word containing a[i], the displacement of a[i] within that word, and the word itself. Checks on subrange bounds are unconditionally emitted. However, stack size checks are omitted in the current version. In rare cases, this could lead to situations that are incorrectly handled by the system's runtime routine that manages the register stack.
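The shift resolution just mentioned amounts to a standard strength reduction; a hedged sketch (Python, with invented opcode names) of the decision for multiplication:

```python
def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0

def reduce_mul(n):
    """Resolve multiplication by the constant n into a left shift when
    n is a power of two; otherwise fall back to the runtime multiply
    routine (Section 3). Opcode names are illustrative only."""
    if is_power_of_two(n):
        return ("SHIFT_LEFT", n.bit_length() - 1)
    return ("CALL", "Multiply")
```

Division by a power of two is handled analogously with a right shift (with the usual care for negative operands).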
6. Object File Generation
In principle, emitted instructions are definitive. However, we have seen that fixups are indispensable at isolated places. Two categories exist: forward jumps, and procedure entries and returns. Remember that in both cases the provisionally emitted code covers the rarely occurring worst case. It therefore appears advisable to delay the actual writing to file in order to enable removing superfluous instructions afterwards. As a matter of fact, the object file generator includes a code compression process. It is governed by two transformation rules: 1) if the displacement fits in one byte, replace an indirect jump sequence by a direct jump, 2) eliminate empty operations (noops). The process is complicated by the need to relocate the remaining code.
In that it combines the activities of compressing the code and writing it to file, our algorithm avoids actually moving code. First step: divide the entire code sequence into segments, each of them starting at a compressible location. Second step: write the segments to the object file, one after the other, and relocate displacements of jump instructions on the basis of the segment table. The segment table displays the amount of length reduction of each segment. Again, an exceptional case complicates the algorithm. In addition to jumps, there exists another kind of relocatable instruction: loads of jump displacements to prepare a stack jump SJ. To distinguish these load instructions, the compiler emits a virtual operation code LIPC. Subsequently, the compression algorithm translates it into an ordinary load and relocates its operand.
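The two-step structure can be illustrated by a simplified sketch (Python). Here each instruction occupies one unit, a relocation table plays the role of the segment table, and jump operands are absolute code indices rather than the byte displacements the real compiler relocates; only noop elimination is modeled, not the long-to-short jump substitution.

```python
def compress(code):
    """Step 1: compute each instruction's new address after dropping
    noops (the relocation table). Step 2: write out the surviving
    instructions, relocating jump targets through that table.
    Instructions are (opcode, operand) pairs."""
    new_addr, n = [], 0
    for op, _ in code:                  # step 1: build the table
        new_addr.append(n)
        if op != "NOOP":
            n += 1
    out = []
    for op, arg in code:                # step 2: write and relocate
        if op == "NOOP":
            continue
        out.append((op, new_addr[arg]) if op == "JMP" else (op, arg))
    return out
```

Note that a jump whose target was a removed noop lands on the instruction that followed it, and that the code is never moved in memory: it is compressed while being written out.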
The object file appears to the module loader as a unit that consistently describes a single module at runtime. The description includes the code frame, the literal string frame and the size of the global data frame. In the present version, strings in the literal string table have to be unpacked when loaded. An entire runtime configuration is in general built up by joining together a whole collection of modules. Places within individual modules where intermodular linkages become explicit are unavoidable. It is the linker's responsibility to establish correct linkages at these places. A table of pointers is included in each object file to indicate the linkage places. The exact specification is encoded at each linkage place itself.
Two kinds of linkages principally exist: linkages to global data and linkages to (exported) program routines. Linkages of the first kind require the base of a literal or global frame (possibly the module's own) as operand of a LIQB instruction, linkages of the second kind require a jump destination in a DFC instruction. In the first case, the frame is identified by the provisional operand ModNo*256+0 if it is a literal frame, and by ModNo*256+1 if it is a global data frame. The module number ModNo is an index into the object file's module table, specifying the names of all imported modules. ModNo 0 always refers to self. In the second case, the external procedure is coded as ModNo*256 + ProcNo. Here, ProcNo is the index into the entry table of ModNo's object file. The entry table shows the entry addresses of all exported procedures, relative to the base of the code frame. ProcNo 0 always specifies the module's main body.
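The provisional operand encoding amounts to packing two small indices into one word; a trivial sketch (Python, our function names):

```python
def encode_link(mod_no, proc_no):
    """Provisional linkage operand: ModNo*256 + ProcNo. ProcNo 0 is
    the module's main body; for data linkages the low byte is 0 for
    the literal frame and 1 for the global data frame. ModNo 0 refers
    to self."""
    assert 0 <= proc_no < 256
    return mod_no * 256 + proc_no

def decode_link(operand):
    """Recover (ModNo, ProcNo) from a linkage operand, as the linker
    must when resolving a linkage place."""
    return divmod(operand, 256)
```

The linker replaces each such provisional operand by the real frame base or entry address once all modules of the configuration are known.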
7. Conclusion
A one-pass Modula-2 compiler has been constructed for the Dragon computer, a specimen of a new generation of processors with a somewhat reduced instruction set and a two-level store. This compiler is a cross-compiler and is implemented in the Cedar language under the Cedar operating system. It is a derivative of an existing root-compiler formulated in Modula-2.
By far the largest part of the translation of the root-compiler from Modula-2 to Cedar was straightforward. However, two difficulties are perhaps worth mentioning. They stem from differing views of the two languages in two respects. First, Cedar allows no variable parameters but does allow multiple output parameters. In most cases, a variable parameter in Modula-2 could be replaced by a pair of parameters to the corresponding Cedar procedure, one input and one output. A different way was taken in the case of items. In Modula-2, items are local variables of compiling procedures and are passed over as variable parameters. In the Cedar variant, items are allocated on the heap. They are handed over as locally declared references pointing to the actual item records. We note that this is a good example of how the garbage collector becomes involved unnecessarily.
The second difficulty is more severe. Variant records play an important role in a compiler. In fact, objects of all kinds (types, constants, variables, procedures) have to be linked together in the symbol table, i.e. they have to be described by the same base record type, though by different variants. An analogous statement holds for structures. In an environment with integrated garbage collection, such as Cedar, variants that themselves contain references to collectible storage are impossible. We therefore substituted REF ANY connections for variants in Cedar. Unfortunately, this translation carries a substantial drawback, rooted in the awkward method of accessing records that are connected via a REF ANY pointer. Consider, for example, the following Modula-2 statement:

IF (x^.varpar # y^.varpar) OR ((x^.typ # y^.typ) &
    ((x^.typ^.form # Array) OR NOT x^.typ^.dyn OR
     (y^.typ^.form # Array) OR NOT y^.typ^.dyn OR
     (x^.typ^.ElemTyp # y^.typ^.ElemTyp))) THEN error
END

Its Cedar counterpart reads:

IF (x^.varpar) # (y^.varpar) THEN error
ELSE
  IF x^.typ # y^.typ THEN
    IF (x^.typ^.form # Array) OR (y^.typ^.form # Array) THEN error
    ELSE {
      xarray: M2D.ArrayPtr ← NARROW [x^.typ^.ext];
      yarray: M2D.ArrayPtr ← NARROW [y^.typ^.ext];
      IF NOT xarray^.dyn OR NOT yarray^.dyn OR
         (xarray^.ElemTyp # yarray^.ElemTyp) THEN error
    }
The notational advantage of conditionally evaluated expressions is thus more than lost. This drawback is particularly ironic in our case, as the compiler does not profit from garbage collection at all. In fact, a simple mark-and-release of the heap data of the currently compiled scope would serve just as well.
Concentrating now on the actual compiling activity, a few words on one-pass restrictions are in order. In general, they take effect in two respects. The first concerns the language itself: objects must be declared textually before their use in statements. In Modula-2, the only critical point is mutually recursive procedure calls. The newest version of Modula-2 [1] in fact provides a forward declaration facility for procedures.
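With this facility, mutually recursive procedures (here with invented names) can be declared before use and therefore compiled in one pass:

PROCEDURE Odd (n: CARDINAL): BOOLEAN; FORWARD;

PROCEDURE Even (n: CARDINAL): BOOLEAN;
BEGIN
  (* Even calls Odd before Odd's full declaration has been seen *)
  IF n = 0 THEN RETURN TRUE ELSE RETURN Odd(n-1) END
END Even;

PROCEDURE Odd (n: CARDINAL): BOOLEAN;
BEGIN
  IF n = 0 THEN RETURN FALSE ELSE RETURN Even(n-1) END
END Odd;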
The second class of one-pass restrictions is imposed by the target machine. First, the local, point-by-point nature of one-pass compilation is itself a restriction: a one-pass code generator never takes a global view, so extreme context-sensitivity or global optimizations are a priori excluded. Second, in one-pass compilation, resources (typically storage) have to be assigned to objects explicitly before their first use; in particular, the assignment cannot be made dependent on how an object is used across a whole program section. In summary, we state two rules for the one-pass suitability of a machine architecture: 1) effective use of the instruction set must not require context-sensitive handling, and 2) resources have to be either homogeneous or structurally dedicated. Roughly speaking, the more diversity and "usage-directedness" resource assignment exhibits, the less suited the architecture is to one-pass compilation.
Dragon's three-operand RR-instructions provide a typical example of a concept requiring context-sensitive handling. These instructions led to the open-operation item mode opnMd. However, context-sensitivity turned out not to be a vital drawback of the Dragon architecture with regard to one-pass compilation. In fact, careful tuning of item states together with modest local optimizations (see Section 5) pays off in code of very reasonable quality. The inhomogeneous two-level store and the non-uniformity in expressing long-distance and short-distance jumps do not comply with rule 2). We have seen that both significantly increase the code generator's complexity and require sophisticated post-processing. In a way, the extent and complexity of the post-processing needed is a measure of how far the pure one-pass concept is overtaxed. A further example of intended usage-directed resource allocation is Dragon's lack of dedicated base registers for the current frame extension and the global base. The price is the implicit reservation of two local registers for each procedure activation, whether they are actually used or not.
In addition, there are a couple of characteristics of the Dragon architecture that render code generation more difficult than necessary, independent of the one-pass restriction. First, it is the double nature of Dragon as both a stack machine and a register machine that makes the loading of data a multi-dimensional and subtle business. The code generator continuously balances between safety (loading onto the stack) and efficiency (delaying the loading onto the stack). Notice that stkMd is very susceptible to becoming obsolete.
Operators do not behave uniformly with respect to the accepted locations of their operands. RR operations, for example, accept both operands on the stack, in local or constant registers, or in any combination; RJB operations, however, assume the first operand to be on the stack. This violates the recommendable principle of regularity. Further, no store counterparts to the RX and QRX instructions exist, an example of an (undesirable) asymmetry of the instruction set. Finally, the missing quick variant QXOR of the register XOR operation is a simple incompleteness.
In the end, the very fact that this one-pass Modula-2 compiler for Dragon exists in its current form proves the usefulness and power of all the components involved. A single statistical figure to conclude: an investigation of several realistic modules showed an average instruction length of 2.1 bytes, with variations from 1.9 to 2.2 bytes.
References
[1] N. Wirth: Programming in Modula-2, Springer, 1985
[2] Cedar Reference Manual
[3] N. Wirth: A New Modula-2 Compiler, to be published
[4] J. Gutknecht: A New Way to Efficient Modula-2 Symbol Files, to be published
[5] N. Wirth: A Comparison of Microprocessor Architectures in the Light of Code Generation by a Compiler, to be published
[6] DragOps, internal document