Saffron Documentation, September, 1988 XEROX To From Saffron People James Rauen PARC CSL Subject Date Saffron: An Experimental Cedar Compiler September 2, 1988 <> FOR XEROX INTERNAL USE ONLY Abstract Saffron is an experimental Cedar compiler implemented in Casaba. This memo describes the current state of the implementation of Saffron, and occasionally even hints at what might be done next. Filed As [Indigo]7.0>Documentation>SaffronDocSept1988.Tioga 1. Introduction What's Going On Here This memo is The Saffron Documentation. It describes the condition in which I have left Saffron as of early September, 1988. You may find the Tioga levels buttons useful when reading through here, especially if you are looking for the documentation of a particular file or feature. If I haven't written documentation for a particular feature, though, the Tioga levels buttons probably won't be much help... I have viciously abused tense, fonts, formality, and other grammodocumentary conventions. But you shouldn't care too much, unless you're refereeing this as a paper to be published. Hah! If you're looking for info about a particular Saffron file, you can always search this document for the file name. If you want to play with the code, the DF file and other such goodies are described in section 3. If you're going to massively muck with the code, however, I implore you to do what I did...make a DF file for a new version of Saffron (I'd call it version 4.0 this time) and leave the 3.0 stuff as a monument to immutability. (I can't remember how many times I've thanked myself for keeping 2.0 around instead of blowing it away...) Disclaimer: This is not intended to be an ongoing, updatable, flexible document; I don't think that Saffron is stable enough for such a thing to exist. However, if one were to write such a document, I would hope that one might find this memo to be of some use in the process. In Case You Care saf|fron n. 1. A plant, Crocus sativus, native to the Old World, having purple or white flowers with orange stigmas. 2. The dried stigmas of this plant, used to color foods and as a cooking spice and a dyestuff. 3. Moderate or strong orange-yellow to moderate orange. In this sense, also called ''saffron yellow.'' See color. [Middle English saffran, from Old French safran, from Medieval Latin safranum, from Arabic za'faran.] --saf"fron adj. History The primary goal of Saffron is to demonstrate that Casaba is a viable environment for developing a compiler for a real language. A secondary goal is to produce a principled implementation of a Cedar compiler which may later be used as a testbed for experimenting with the language. (Ask anybody who knows about the existing Cedar compiler how easy it would be to experiment with...) A tertiary goal is to Indispensible References Without question, you should get your hands on the following references: [1] Mesa Language Manual, preferably the Xerox Development Environment edition (because it comes in a nice binder). This is the only easy-to-read description of anything that remotely resembles the Cedar language. My edition is dated November 1984. I'd strongly avoid any earlier version, especially the blue-and-white or the ancient orange editions. [2] A Description of the Cedar Language: A Cedar Language Reference Manual. The definitive and impenetrable definition of Cedar. It takes a while to understand anything that Butler Lampson is saying, but the effort is unquestionably worth it. There were incredibly few cases of Weird Cedar that I encountered which Butler did not also call attention to. CLRM is remarkably thorough. This is a blue-and-white, CSL-83-15. My edition misspells CSL on the spine (imagine that!). Also beware: there's one printing which is missing a few pages. You may also want the following: [3] On Adding Garbage Collection and Runtime Types to a Strongly-Typed, Statically-Checked, Concurrent Language. The first seven pages contain an excellent description of the various language features which were added to (or changed in) Mesa to produce Cedar. They make a good companion to the Mesa manual. This is a blue-and-white, CSL-84-7, by Paul Rovner. Other Stuff Three asterisks (***) mark things that still have to be done. You can bet there are a lot of them. If you're lucky, some of them will even be thought out to some extent. 2. Conventions Discussion This section describes various naming conventions I have used in Saffron. You probably want to skip this section until you're actually digging into the code. File Names Just about all of the file names in Saffron begin with the name "Saffron". This makes it take longer to type file names to the command tool. Exception: The BigIntegers stuff, which is a data abstraction that I really think belongs somewhere else on CedarChest, if only someone would implement everything... Opened Interfaces I think that the anonymous form of OPEN is revolting. However, there's nothing wrong with the renaming form of OPEN, and I use it quite liberally in the primitives. (I got really sick of retyping "SaffronContextPrivateTypes") Here are the names I use. Abbreviation Interface Name AT SaffronATDef BC BigCardinals BD SaffronBaseDef EH SaffronErrorHandling GEN SaffronGenericDef PG SaffronProgramGraphPrivateTypes PGD SaffronPGDef PT SaffronContextPrivateTypes TA SaffronTargetArchitecture Casaba Variables Most of the time, the identifiers I use for Casaba variables give a pretty clear hint to what type they are. (For example, it's a fairly good guess that the type of contextTree1 is ContextTree.) But sometimes, for aesthetics' sake, I like short-and-dirty identifiers. Here are some conventions I have followed. Names Type cs CompilerState ct ContextTree env Environment ffl FrozenFieldList fl FieldList lc LocalContext pf ProgramFragment pg ProgramGraph tgn TypeGraphNode 3. Read-Me-First Files and Directories Directories Right now, everything lives on [Indigo]7.0>Saffron3.0> and its various subdirectories. The DF file will tell you all you need to know. Except, of course, that the DF file doesn't live on [Indigo]7.0>Saffron3.0> or any of its various subdirectories. The DF file lives on [Indigo]7.0>Top>. Make The make file is MakeSaffron3.0.cm. This runs ThreeC4 on all the Casaba source files (except SaffronCG, which we hardly ever change, and is really slow to recompile). Then it calls makedo on the DF file. DF The DF file is Saffron3.0.DF. Observe that the extra period in the file name confuses makedo and smodel, so you always have to type the ".DF" extension. Config The configuration file is Saffron.config. Easy enough. Install The install file is Saffron.install. Also really easy. 4. Control The top-level (if there is such a thing) control of Saffron is in SaffronDriver.mesa. This module defines a CommandTool command, "Saffron", which invokes (big surprise) Saffron. The command line also takes argument flags which ultimately wind up in the CompilerState which is passed around just about everywhere. The driver enables various interesting signals and errors (see Error Handling for details). If Saffron encounters any error in the program it is compiling, it raises one of these signals. There is also an InternalError error which is not enabled; we want internal errors to pop up a notifier window and otherwise crash the compilation. The driver calls BD.ReadDefFile to invoke the parser. It winds up with the root of the program's parse tree. It then applies the (Casaba) tree-recursive function MakeEnvironment to the root of the parse tree, which ultimately returns an environment. The environment contains an entry for each file that had to be compiled (the file being compiled, plus all the files in its DIRECTORY, plus all the files in their DIRECTORYs, and so on.) At this point, the driver aborts the compilation if any errors were detected. Note that fatal errors and internal errors will already have caused the compilation to terminate. If logging had been requested (the "-t" option (for historical reasons (it originally meant "print type graph"))) then the driver writes the context tree and program graph of each file to a log file. *** Once we're beyond the stage of producing the program graph, the driver should then call some huge routine to compile (and optimize!) the program graph into target machine instructions. If possible, it would sure be nice to turn the program graph into Mimosa intermediate language, and then take advantage of the Mimosa back end. 5. Parsing Concrete Grammar We start with the concrete grammar, defined in SaffronCG.ThreeC4. The concrete grammar is a BNF grammar which describes the Cedar source language. Casaba (in particular, GenOneCasabaParser) automatically takes this description and magically produces a parser. Most parser generators (yacc, for instance) gobble attribute grammars, which let you specify somewhat arbitrary actions to be perfomed when each reduction of a grammar production occurs. (If you don't know, but do care about, what I'm talking about, then find and read any book with a dragon on the cover authored by Aho, et al. Casaba parsers are different. The only actions associated with a production describe how to build part of a parse tree. Whenever the parser reduces a production in the concrete grammar, it builds a portion of a parse tree defined by the... Abstract Grammar The abstract grammar is defined in SaffronAG.ThreeC4. It describes what Cedar parse trees look like. Basically, there are numerous kinds of parse tree nodes. Each is defined as an AbstractType. Each abstract type declaration also indicates which tree-recursive functions can act on that kind of abstract parse tree node. (This is a royal PITA, in my opinion). Each kind of abstract type has several productions associated with it. Each of these productions is defined as an AbstractProduction. 6. Casaba Type and Function Declarations Most of the primitive functions and types used by Casaba are declared in SaffronBaseDecls.ThreeC4. Some of them, the ones which explicitly manipulate parts of the abstract parse tree, are declared in SaffronAG.ThreeC4. There are also a few in SaffronProgramGraphDecls.ThreeC4. These primitive functions and types are exported by various implementation modules written in Cedar. The tree-recursive functions are all declared in SaffronTreeDecls.ThreeC4. There are also some primitive functions which the Casaba code doesn't call, but other scattered primitives all seem to need. These functions are declared in SaffronContext.Mesa. Here is a table of the relevant Casaba files, the cedar interfaces which they generate, and my abbreviations for the interfaces: Casaba File Cedar Interface Abbreviation SaffronAG SaffronATDef AT SaffronBaseDecls SaffronBaseDef BD (none) SaffronContext (none) SaffronProgramGraphDecls SaffronPGDef PGD 7. Type and Value Primitives Type Declarations The file SaffronContextPrivateTypes.Mesa contains the type definitions that Saffron uses to represent the type graph (Cedar types and Cedar values) and the context tree. Type Graph Creation Primitives The file SaffronContextCreateCTImpl.Mesa implements primitive functions which build type graph nodes, build context trees, build field lists, and perform lookups. Actually, it was originally just supposed to hold context tree and type graph node constructors, but lately I've been stuffing random primitive functions in. Values Value is the data abstraction which Saffron uses to represent Cedar values. It is also one of the data abstractions in Saffron that sorely needs rethinking. Later on in this section, I discuss what I think needs to be changed in the representation of values. The file SaffronValueImpl.Mesa implements primitive functions which build Values. The definition of Value is in SaffronContextPrivateTypes.Mesa. As things stand right now, a Value is a variant record that can be any of the following: dummy: Dummy values are used to represent things that haven't been implemented yet. There shouldn't be any dummy values kicking around once all of Saffron is implemented. A dummy value has one field, a descriptive ROPE. unparsed: This represents a value that hasn't been parsed yet. It has one field, an expression parse tree. defaultMe: This represents a value that should be replaced by a default. I think that this is the same idea as OMITTED in CLRM. static: This represents a "static" (or "manifest" or "compile-time", if you prefer) constant. There are three fields: a program fragment which will place the value on top of the runtime stack, a type graph node (the type of the value), and the value itself. The value itself is a REF ANY whose actual type depends on the type graph node. See SaffronContextPrivateTypes.Mesa for details. trash: This represents TRASH of the appropriate type. Just like static values, trash values have a type field and a program fragment which pushes trash onto the runtime stack. runtime: This represents a runtime value. It has two fields: its type, and a program fragment. Unlike the code for static and trash nodes, however, this program fragment can be arbitrarily big and hairy. One thing that I think needs to be fixed is the distinction between static values and all values. I suggest the following. (1) Use the term "manifest" instead of "static"; I have clung to "static" because Butler Lampson uses it in CLRM. But then I read the "Report on the Programming Language Euclid", another blue-and-white by Butler Lampson, et al., in which he uses the term "manifest", which I prefer mightily. (2) Introduce a subclass of Value called ManifestValue; ManifestValue includes all the cases of Value except for runtime values. Also, ManifestValue doesn't have a "code" field. (It's trivial to assemble a PushConstant instruction). (3) Get rid of StaticValue, which tries to be what ManifestValue should be. Note that we don't want any program fragments in the type graph or context tree. Why? Because the representation of program fragments tends to get smashed when they get catenated together into larger program fragments. Program fragments only belong in the program graph. This is one of the motivations behind ManifestValues; a ManifestValue has no code in its representation, so it is always safe to put ManifestValues in the context tree and type graph. (P.S. It's kosher for the type graph and context tree to point to ProcedureGraphs...the representation of a ProcedureGraph is immutable. In fact, every procedure value points to some procedure graph.) *** Right now, there really isn't any way to represent the value of an inline procedure. I think that there should be another case of values (and ManifestValues) called inlineProc which contains the parse tree for the inline proc. This kind of value can also be used to represent initialization expressions for type defaults. (something else I haven't implemented). Remember, regular (as opposed to inline) procedures are NOT compile-time constants. Therefore, they are NOT manifest values. I emphasize this point because it took me an incredibly long time to learn it. The reasons are numerous: one, it screws up the dependency analysis if you assume that regular procedures are compile-time constants; two, be serious -- how is the compiler supposed to know the exact bit-for-bit representation of a procedure, especially before the pass that generates target machine code; three, I'm sure you can think of other reasons. Just trust me on this one. Regular procedures are NOT compile-time constants. 8. Program Graph Primitives 9. Compilation Methods Introduction The "meat" of Saffron, in my opinion, is the Casaba code which actually compiles things. Basically, the compile operations go something like this: You start out with the parse tree of something you want to compile (a module, block, statement, initial value, or expression), the context tree you've built so far, the program graph you've built so far, and the CompilerState. When compiling an expression, you'll also have a TypeGraphNode, the target type of the expression. And you don't care about the program graph, because compiling an expression never adds anything to the program graph. You get back the following. (1) A new context tree...the old one, but with sub-contexts created during the compilation hanging off of it. (2) A program fragment, the code for the execution of the particular module/block/statement you compiled. (3) A new program graph...the old one, but with new procedure graphs added for any transfer type values that were encountered during the compilation. Again, it's a bit different for expressions. Instead of getting a program fragment, you get a Value. From this value, you can extract its type and the code which places the value on top of the runtime stack. Also, you don't get a program graph back, since you didn't pass one in. There's also some funny stuff regarding definitions modules. And initial values. Compiling a Module Compiling a Block A block consists of the following, all of which are optional: * An OPEN list * Signal catchers, introduced by ENABLE * Declarations (constants, variables, types) * Statements * Exit list <> Create a local context. <> Add EXITS list to the context. <<*** Not implemented.>> <> Add OPEN names to the context. *** Not implemented. This is one feature which used to work, partially, but I bashed. If you look in Saffron2.0, you'll notice that Saffron2.0 lets you open interfaces. Well, there's a lot of other things that can be opened (records for instance); this involves hanging on to some parse trees and building some a structure (to be a field of the local context) which handles the bizarre scoping rules in OPEN statements. Also, all the lookup routines will have to know how to cope with this structure. For your amusement, see sections 4.4.2 and 7.2.2.2 of the Mesa manual, and section 3.4.2 of CLRM. Do something with the ENABLE clauses. *** Obviously, not yet implemented. Add the EXITS list to the context. *** Not implemented. Build a field list. The function AddDeclarationsToFieldList is used to build a field list with one field for each name in the declarations. Identifiers appearing where type expressions are expected are represented as IdentifierTGN's (they are not looked up yet). Look up the type names. Now go back and look up all the IdentiferTGNs. Replace the body field of each identifer TypeGraphNode with named TypeGraphNode corresponding to the identifier. (Note: At this point, all value expressions are still trash or unparsed or defaultMe.) Build a dependency graph. Analyze the dependency graph. <> <> <> <> <> <> Add the dependency graph to the local context. We don't need to keep the dependency graph around. The only reason we do is so we can print it out when we print out the local context. *** I cheat here; AnalyzeDependencies also performs this step. There really otta be a separate primitive which does this simple operation; I'm just too lazy busy to write it. Store the contents. Freeze the local context. Produce a context tree. Compile the declarations. Proceed sequentially through the declaration list, performing the following on each declaration: If it is a TYPE declaration, ignore it. (What about type codes, etc.?) <> <> Compile the statements. Expressions Overview There are three sets of functions that are applied to expressions. The first set is used while building the dependency graph; these functions look at an expression and determine what other quantities in the dependency graph that the expression depends upon. The second set is used to evaluate expressions while the dependency graph is being walked through. The third set is used to compile expressions during the general process of compiling declarations and statements. Building the Dependency Graph The interesting functions which assemble the dependency graph are AddValueDependencies, AddFirstDependencies, AddLastDependencies, and AddSizeDependencies. The methods for these functions appear in SaffronDependencyGraphMethods.ThreeC4. The specification for AddValueDependencies[tree, dg, dgn] reads like this: A particular node dgn in the dependency graph dg depends on the value of this expression tree. Add edges to the dependency graph from dgn to every node in the dependency graph that tree depends on. The specification for AddFirstDependencies[tree, dg, dgn] reads like this: A particular node dgn in the dependency graph dg depends on the first element of the type described by this type-expression tree. Add edges to the dependency graph from dgn to every node in the dependency graph that the first element of the type described by tree depends on. AddLastDependencies and AddSizeDependencies do essentially the same thing that AddFirstDependencies does, except that they care about the last element (AddLastDependencies) and the size (AddSizeDependencies) of the type described by tree, not the first element. Evaluating Expressions The interesting functions which evaluate expressions are EvaluateAndTypeCheckExpression, EvaluateExpression, EvaluateSizeOfTypeExpression, EvaluateFirstOfTypeExpression, and EvaluateLastOfTypeExpression. EvaluateAndTypeCheckExpression is a primitive function implemented in SaffronTypeConformanceImpl.Mesa. The others are tree-recursive functions whose methods appear in SaffronExpressionCompileMethods.ThreeC4. The various Evaluate routines are used while the dependency graph for a block is being walked. At this time, the local context for that block is still under construction. The Values that the Evaluate functions return are always compile-time constants. The Evaluate functions take the following arguments: (1) A parse tree for whatever expression is being evaluated. (2) The local context of the block in which the expressions appears. Since the local context is still being built, it does not yet contain any information about names declared within that block. It is only useful for extracting its parent context rib, which has already been built and frozen. (3) The field list for the contents of the block in which the expression appears. Here is where information is available about the names declared within the block. (4) The CompilerState, which is needed here and there. (5) A type graph node, the target type of the expression. Except in EvaluateAndTypeCheckExpression, the target type is used only as a hint for evaluating the expression. The only case I can think of right now where the target type is actually needed is when evaluating identifiers; if the target type is an enumerated type, then the identifier should be considered an element of the enumerated type in preference to being considered as a variable identifier. Take a look at the Anomaly for Enumeration Literals in CLRM, section 4.7. *** BTW, this isn't implemented. The EvaluateAndTypeCheckExpression function calls EvaluateExpression, then checks the type of its result against the target type. If the two types are not compatible (I forget which is the appropriate relationship), then EvaluateAndTypeCheckExpression raises an error. EvaluateAndTypeCheckExpression should not change the type of its argument; this will cause trouble if we start using a target type of "TOP"... EvaluateExpression is the function that really does the work of evaluating an expression. It has all the gory methods. The functions EvaluateFirstOfTypeExpression, EvaluateSizeOfTypeExpression, and EvaluateLastOfTypeExpression are applied to type expressions; each produces the appropriate value of the type which the type expression describes. For type safety reasons, EvaluateAndTypeCheckExpression is the only function which should be called to evaluate expressions. No one should ever call EvaluateExpression, except for EvaluateAndTypeCheckExpression itself. *** Currently, some of the methods which evaluate the arguments of polymorphic operators call EvaluateExpression. This is wrong. Here is how they should work. The numeric operators should call EvaluateAndTypeCheckExpression with a target type of "TOP" for each of their arguments, then use the DemandNumber primitive to ensure that the arguments have numeric types. The equals and not-equals operators should call EvaluateAndTypeCheckExpression with a target type of "TOP" for the first argument, then call EvaluateAndTypeCheckExpression on the second argument, using the first argument's type for the target type. Compiling Expressions If you thought that evaluating expressions is hairy, wait until you try compiling them! The functions which compile expressions are similar to those which evaluate them. They are CompileAndTypeCheckExpression, CompileExpression, CompileSizeOfTypeExpression, CompileFirstOfTypeExpression, and CompileLastOfTypeExpression. CompileAndTypeCheckExpression is a primitive function implemented in SaffronTypeConformanceImpl.Mesa; the others are tree-recursive functions whose methods appear in SaffronExpressionCompileMethods.ThreeC4. It's very important that if Evaluate*Expression accepts an expression and returns a value, then the corresponding Compile*Expression function also return the same value. The sundry CompileExpression routines are used while declarations and statements are being compiled. At this time, the local context (wherein the expressions appear) has been completely built and frozen into a rib. The Values returned by CompileExpression routines are generally fated to become part of a PushConstant program graph node. The CompileExpression functions take the following arguments: (1) A parse tree for whatever expression is being compiled. (2) The context tree whose rib is the very same rib I was talking about two paragraphs above. (3) The CompilerState, which is needed here and there. (4) The target type. Just like in the Evaluate functions, the target type is used only as a hint for evaluating the expression, except in CompileAndTypeCheckExpression The CompileExpression functions return the following values: (1) A Value, which might be runtime or static. The interesting parts of this Value are its type, which is used by CompileAndTypeCheckExpression, and the program fragment which pushes the value onto the runtime stack. (2) A new and improved context tree which looks just like the one passed in as an argument, except that some sub-context trees are now hanging off of it. Actually, I can't think of any reason right now why the context tree would ever have to be modified while compiling an expression...it may very well be the case that this value need not be returned. In general, each of the Compile*Expression functions behaves similarly to the corresponding Evaluate*Expression function. The main difference is that Compile*Expression has to accept all valid expressions, whereas Evaluate*Expression only has to deal with expressions that can be constant-folded. Hence, the Compile methods look a lot hairier. L-Values and R-Values Here's a really half-baked idea that I started to implement, but never gave much thought to. The idea is to play around with a representation for l-values (locations). We already have such a representation, Parameterized Field Descriptors. L-values represent locations that can be stored into; i.e., they can occur on the left hand side of assignment statements. Variables are a simple case of l-values; we also have record fields, array elements, and chains of the aforementioned. The function CompileLValue, in SaffronExpressionCompileMethods.ThreeC4, takes an expression which has presumably appeared on the left hand side of an assignment statement. It produces a Parameterized Field Descriptors and a Type Graph Node. The tgn describes the l-value's type; it is used as the target type when the right hand side of the assignment is compiled. Well, a lot of things that can be l-values can also be r-values. (R-values are pretty much all kinds of values; anything than can appear on the right hand side of an assignment statement, which is practically everything). And since we've already got (supposedly) a function which copes with hairy l-values, why write another hairy function to deal with the same kind of expressions when they appear as r-values? Instead, there's CompileLValueIntoRValue, a primitive function implemented in SaffronContextCreateCTImpl.Mesa. That's about all I can say about l-values and r-values, because I haven't actually implemented very much of the aforementioned functions. I should mention that none of this stuff is applicable to the Evaluate routines, since the concept of "location" or "l-value" is meaningless when you're evaluating a compile-time constant expression. This means that something has to be thought out for evaluating expressions that look like l-values (variables, record fields, array elements). 10. Error and Signal Handling Where they Are The places to look are SaffronErrorHandling.Mesa and SaffronErrorHandlingImpl.Mesa. Error and Signal Types There are six kinds of errors and signals that Saffron raises. Look at the description of the driver to find out how they are handled. Here they are, from the most drastic to the most harmless: Impossible (Error) Impossible errors occur only in the primitives. They occur in the ENDCASE branch of a SELECT statement when the ENDCASE should never be reached. Even more specifically, they only occur when the SELECT statement is dispatching on elements of an enumerated type or discriminating a variant record. Internal (Error) Internal errors are used to signal bugs in Saffron. There are basically three situations where internal errors might occur. (1) Detecting that a representation invariant has been violated. (2) Discovering that an impossible condition has happened, where the situation is anything more complicated than the case described in the preceding section. (3) Indicating that something isn't implemented yet. Fatal (Error) Used to report an unrecoverable error in the Cedar program being compiled. Immediately terminates the compilation. Error (Signal) This is a signal, not an error. Used to report a recoverable error in the Cedar program being compiled. Watch it with this one...you don't want to resume from one of these if anything in Saffron's representation is amiss...otherwise, internal errors might start occurring later in the compilation. Warning (Signal) Used to report a warning message about something in the Cedar program being compiled. Message (Signal) Used to send a message to the report stream. I use this one mainly for debugging. Something to Fix *** Right now, only the primitives can hook into this machinery. Casaba code has to call a procedure called "Error", which is independent of all this stuff. I have put some procedures into the aforementioned files which will allow Casaba code to raise these various and sundry errors. I would recommend that the following be done: (1) Rename the signals and errors to something else. Why? So that (2) can be done without horrible name conflicts. (2) Rename the procedures which invoke the various and sundry signals and errors to something more palatable. Like "InternalError", "FatalError", "Error", "Warning", and "Message", perhaps. (3) Add BaseFunction or CedarFunction declarations to the Casaba sources, probably SaffronBaseDecls.ThreeC4, for these invocation procedures. (4) Get rid of the old Error function, and replace its invocations with the appropriate new procedure. 11. Stuff enumerated types MakeType