9. Compilation Methods
Introduction
The "meat" of Saffron, in my opinion, is the Casaba code which actually compiles things. Basically, the compile operations go something like this: You start out with the parse tree of something you want to compile (a module, block, statement, initial value, or expression), the context tree you've built so far, the program graph you've built so far, and the CompilerState.
When compiling an expression, you'll also have a TypeGraphNode, the target type of the expression. And you don't care about the program graph, because compiling an expression never adds anything to the program graph.
You get back the following. (1) A new context tree...the old one, but with sub-contexts created during the compilation hanging off of it. (2) A program fragment, the code for the execution of the particular module/block/statement you compiled. (3) A new program graph...the old one, but with new procedure graphs added for any transfer type values that were encountered during the compilation.
Again, it's a bit different for expressions. Instead of getting a program fragment, you get a Value. From this value, you can extract its type and the code which places the value on top of the runtime stack. Also, you don't get a program graph back, since you didn't pass one in.
There's also some funny stuff regarding definitions modules.
And initial values.
Compiling a Block
A block consists of the following, all of which are optional:
* An OPEN list
* Signal catchers, introduced by ENABLE
* Declarations (constants, variables, types)
* Statements
* Exit list
The code which compiles blocks is in SaffronBlockCompileMethods.ThreeC4. There are two procedures which enter this code, CompileFrameBlock and Compile. CompileFrameBlock is used to compile blocks which occupy new frames (global frames in the case of program blocks, stack frames in the case of procedure blocks) and possibly have argument and result lists. Compile is used to compile plain ol' blocks which are nested within frames. The code does the following:
Create a local context.
Start off by creating an empty context and an empty field list. CompileFrameBlock has to add the argument and result fields (from the transferType argument) to the field list. (*** This isn't implemented yet.) Then call InternalCompileBlock to continue.
Add
EXITS list to the context.
*** Not implemented.
Call CompileScope to continue.
Add
OPEN names to the context.
*** Not implemented. This is one feature which used to work, partially, but I bashed. If you look in Saffron2.0, you'll notice that Saffron2.0 lets you open interfaces. Well, there's a lot of other things that can be opened (records for instance); this involves hanging on to some parse trees and building some a structure (to be a field of the local context) which handles the bizarre scoping rules in OPEN statements. Also, all the lookup routines will have to know how to cope with this structure. For your amusement, see sections 4.4.2 and 7.2.2.2 of the Mesa manual, and section 3.4.2 of CLRM.
Do something with the
ENABLE clauses.
*** Obviously, not yet implemented.
Add the
EXITS list to the context.
*** Not implemented.
Build a field list.
The function AddDeclarationsToFieldList is used to build a field list with one field for each name in the declarations. Identifiers appearing where type expressions are expected are represented as IdentifierTGN's (they are not looked up yet).
Look up the type names.
Now go back and look up all the IdentiferTGNs. Replace the body field of each identifer TypeGraphNode with named TypeGraphNode corresponding to the identifier. (Note: At this point, all value expressions are still trash or unparsed or defaultMe.)
Build a dependency graph.
Analyze the dependency graph.
Do a topological sort of the dependency graph, performing the following on each node:
If the node is a value node which does not ultimately depend on the runtime state, then it must represent a static (compile-time constant) value. Use the expression compiler to change the value's parse tree into a static representation.
If the node is a value node which ultimately depends on the runtime state, then it represents a variable or runtime constant. Do nothing with it at this time.
If the node is a FIRST, LAST, or SIZE node which does not ultimately depend on the runtime state, then use the expression compiler to compute a static value and stuff this value into the appropriate slot of the appropriate type graph node.
If the node is a FIRST, LAST, or SIZE node which ultimately depends on the runtime state, then raise an error.
This will determine all the compile-time (static) quantities in the dependency graph.
Add the dependency graph to the local context.
We don't need to keep the dependency graph around. The only reason we do is so we can print it out when we print out the local context.
*** I cheat here; AnalyzeDependencies also performs this step. There really otta be a separate primitive which does this simple operation; I'm just too lazy busy to write it.
Store the contents.
Freeze the local context.
Produce a context tree.
Compile the declarations.
Proceed sequentially through the declaration list, performing the following on each declaration:
If it is a TYPE declaration, ignore it. (What about type codes, etc.?)
If it is a constant declaration, and the value slot of any corresponding entry in the field list contains a static ValueNode, then the field represents a compile-time constant declaration. Since compile-time constants have already been determined, simply ignore the field. (If we want these values around at runtime, for debugging or other reasons, then emit an appropriate StoreLocal instruction.)
Otherwise, the field represents a constant or variable declaration which requires a runtime representation. Compile the field's initialization expression; the result will be a context tree and a program graph fragment which places the initial value on top of the runtime stack. Concatenate this code with a StoreLocal instruction. Hang the child context tree from the context tree.
Compile the statements.
Expressions
Overview
There are three sets of functions that are applied to expressions. The first set is used while building the dependency graph; these functions look at an expression and determine what other quantities in the dependency graph that the expression depends upon. The second set is used to evaluate expressions while the dependency graph is being walked through. The third set is used to compile expressions during the general process of compiling declarations and statements.
Building the Dependency Graph
The interesting functions which assemble the dependency graph are AddValueDependencies, AddFirstDependencies, AddLastDependencies, and AddSizeDependencies. The methods for these functions appear in SaffronDependencyGraphMethods.ThreeC4.
The specification for AddValueDependencies[tree, dg, dgn] reads like this: A particular node dgn in the dependency graph dg depends on the value of this expression tree. Add edges to the dependency graph from dgn to every node in the dependency graph that tree depends on.
The specification for AddFirstDependencies[tree, dg, dgn] reads like this: A particular node dgn in the dependency graph dg depends on the first element of the type described by this type-expression tree. Add edges to the dependency graph from dgn to every node in the dependency graph that the first element of the type described by tree depends on.
AddLastDependencies and AddSizeDependencies do essentially the same thing that AddFirstDependencies does, except that they care about the last element (AddLastDependencies) and the size (AddSizeDependencies) of the type described by tree, not the first element.
Evaluating Expressions
The interesting functions which evaluate expressions are EvaluateAndTypeCheckExpression, EvaluateExpression, EvaluateSizeOfTypeExpression, EvaluateFirstOfTypeExpression, and EvaluateLastOfTypeExpression. EvaluateAndTypeCheckExpression is a primitive function implemented in SaffronTypeConformanceImpl.Mesa. The others are tree-recursive functions whose methods appear in SaffronExpressionCompileMethods.ThreeC4.
The various Evaluate routines are used while the dependency graph for a block is being walked. At this time, the local context for that block is still under construction. The Values that the Evaluate functions return are always compile-time constants.
The Evaluate functions take the following arguments:
(1) A parse tree for whatever expression is being evaluated.
(2) The local context of the block in which the expressions appears. Since the local context is still being built, it does not yet contain any information about names declared within that block. It is only useful for extracting its parent context rib, which has already been built and frozen.
(3) The field list for the contents of the block in which the expression appears. Here is where information is available about the names declared within the block.
(4) The CompilerState, which is needed here and there.
(5) A type graph node, the target type of the expression. Except in EvaluateAndTypeCheckExpression, the target type is used
only as a hint for evaluating the expression.
The only case I can think of right now where the target type is actually needed is when evaluating identifiers; if the target type is an enumerated type, then the identifier should be considered an element of the enumerated type in preference to being considered as a variable identifier. Take a look at the Anomaly for Enumeration Literals in CLRM, section 4.7. *** BTW, this isn't implemented.
The EvaluateAndTypeCheckExpression function calls EvaluateExpression, then checks the type of its result against the target type. If the two types are not compatible (I forget which is the appropriate relationship), then EvaluateAndTypeCheckExpression raises an error.
EvaluateAndTypeCheckExpression should not change the type of its argument; this will cause trouble if we start using a target type of "TOP"...
EvaluateExpression is the function that really does the work of evaluating an expression. It has all the gory methods.
The functions EvaluateFirstOfTypeExpression, EvaluateSizeOfTypeExpression, and EvaluateLastOfTypeExpression are applied to type expressions; each produces the appropriate value of the type which the type expression describes.
For type safety reasons, EvaluateAndTypeCheckExpression is the only function which should be called to evaluate expressions. No one should ever call EvaluateExpression, except for EvaluateAndTypeCheckExpression itself.
*** Currently, some of the methods which evaluate the arguments of polymorphic operators call EvaluateExpression. This is wrong. Here is how they should work. The numeric operators should call EvaluateAndTypeCheckExpression with a target type of "TOP" for each of their arguments, then use the DemandNumber primitive to ensure that the arguments have numeric types. The equals and not-equals operators should call EvaluateAndTypeCheckExpression with a target type of "TOP" for the first argument, then call EvaluateAndTypeCheckExpression on the second argument, using the first argument's type for the target type.
Compiling Expressions
If you thought that evaluating expressions is hairy, wait until you try compiling them! The functions which compile expressions are similar to those which evaluate them. They are CompileAndTypeCheckExpression, CompileExpression, CompileSizeOfTypeExpression, CompileFirstOfTypeExpression, and CompileLastOfTypeExpression. CompileAndTypeCheckExpression is a primitive function implemented in SaffronTypeConformanceImpl.Mesa; the others are tree-recursive functions whose methods appear in SaffronExpressionCompileMethods.ThreeC4.
It's very important that if Evaluate*Expression accepts an expression and returns a value, then the corresponding Compile*Expression function also return the same value.
The sundry CompileExpression routines are used while declarations and statements are being compiled. At this time, the local context (wherein the expressions appear) has been completely built and frozen into a rib. The Values returned by CompileExpression routines are generally fated to become part of a PushConstant program graph node.
The CompileExpression functions take the following arguments:
(1) A parse tree for whatever expression is being compiled.
(2) The context tree whose rib is the very same rib I was talking about two paragraphs above.
(3) The CompilerState, which is needed here and there.
(4) The target type. Just like in the Evaluate functions, the target type is used only as a hint for evaluating the expression, except in CompileAndTypeCheckExpression
The CompileExpression functions return the following values:
(1) A Value, which might be runtime or static. The interesting parts of this Value are its type, which is used by CompileAndTypeCheckExpression, and the program fragment which pushes the value onto the runtime stack.
(2) A new and improved context tree which looks just like the one passed in as an argument, except that some sub-context trees are now hanging off of it. Actually, I can't think of any reason right now why the context tree would ever have to be modified while compiling an expression...it may very well be the case that this value need not be returned.
In general, each of the Compile*Expression functions behaves similarly to the corresponding Evaluate*Expression function. The main difference is that Compile*Expression has to accept all valid expressions, whereas Evaluate*Expression only has to deal with expressions that can be constant-folded. Hence, the Compile methods look a lot hairier.
L-Values and R-Values
Here's a really half-baked idea that I started to implement, but never gave much thought to. The idea is to play around with a representation for l-values (locations). We already have such a representation, Parameterized Field Descriptors. L-values represent locations that can be stored into; i.e., they can occur on the left hand side of assignment statements. Variables are a simple case of l-values; we also have record fields, array elements, and chains of the aforementioned.
The function CompileLValue, in SaffronExpressionCompileMethods.ThreeC4, takes an expression which has presumably appeared on the left hand side of an assignment statement. It produces a Parameterized Field Descriptors and a Type Graph Node. The tgn describes the l-value's type; it is used as the target type when the right hand side of the assignment is compiled.
Well, a lot of things that can be l-values can also be r-values. (R-values are pretty much all kinds of values; anything than can appear on the right hand side of an assignment statement, which is practically everything). And since we've already got (supposedly) a function which copes with hairy l-values, why write another hairy function to deal with the same kind of expressions when they appear as r-values?
Instead, there's CompileLValueIntoRValue, a primitive function implemented in SaffronContextCreateCTImpl.Mesa.
That's about all I can say about l-values and r-values, because I haven't actually implemented very much of the aforementioned functions. I should mention that none of this stuff is applicable to the Evaluate routines, since the concept of "location" or "l-value" is meaningless when you're evaluating a compile-time constant expression. This means that something has to be thought out for evaluating expressions that look like l-values (variables, record fields, array elements).