Saffron Documentation, September, 1988 XEROX To From Saffron People James Rauen PARC CSL Subject Date Saffron: An Experimental Cedar Compiler September 2, 1988 FOR XEROX INTERNAL USE ONLY Abstract Saffron is an experimental Cedar compiler implemented in Casaba. This memo describes the current state of the implementation of Saffron, and occasionally even hints at what might be done next. Filed As [Indigo]7.0>Documentation>SaffronDocSept1988.Tioga 1. Introduction What's Going On Here This document is The Saffron Documentation. It describes the condition in which I have left Saffron as of the end of August, 1988. You may find the Tioga levels buttons useful when reading through here, especially if you are looking for the documentation of a particular file or feature. If I haven't written documentation for a particular feature, though, the Tioga levels buttons probably won't be much help... I have viciously abused tense, fonts, formality, and other grammodocumentary conventions. But you shouldn't care too much, unless you're refereeing this as a paper to be published. Hah! If you're looking for info about a particular Saffron file, you can always search this document for the file name. In Case You Care saf|fron n. 1. A plant, Crocus sativus, native to the Old World, having purple or white flowers with orange stigmas. 2. The dried stigmas of this plant, used to color foods and as a cooking spice and a dyestuff. 3. Moderate or strong orange-yellow to moderate orange. In this sense, also called ''saffron yellow.'' See color. [Middle English saffran, from Old French safran, from Medieval Latin safranum, from Arabic za'faran.] --saf"fron adj. History Indispensible References Without question, you should get your hands on the following references: [1] Mesa Language Manual, preferably the Xerox Development Environment edition (because it comes in a nice binder). This is the only easy-to-read description of anything that remotely resembles the Cedar language. My edition is dated November 1984. I'd strongly avoid any earlier version, especially the blue-and-white or the ancient orange editions. [2] A Description of the Cedar Language: A Cedar Language Reference Manual. The definitive and impenetrable definition of Cedar. It takes a while to understand anything that Butler Lampson is saying, but the effort is unquestionably worth it. There were incredibly few cases of Weird Cedar that I encountered which Butler did not also call attention to. CLRM is remarkably thorough. This is a blue-and-white, CSL-83-15. My edition misspells CSL on the spine (imagine that!). Also beware: there's one printing which is missing a few pages. You may also want the following: [3] On Adding Garbage Collection and Runtime Types to a Strongly-Typed, Statically-Checked, Concurrent Language. The first seven pages contain an excellent description of the various language features which were added to (or changed in) Mesa to produce Cedar. They make a good companion to the Mesa manual. This is a blue-and-white, CSL-84-7, by Paul Rovner. Other Stuff Three asterisks (***) mark things that still have to be done. You can bet there are a lot of them. If you're lucky, some of them will even be thought out to some extent. 2. Conventions Discussion This section describes various naming conventions I have used in Saffron. You probably want to skip this section until you're actually digging into the code. File Names Just about all of the file names in Saffron begin with the name "Saffron". This makes it take longer to type file names to the command tool. Exception: The BigIntegers stuff, which is a data abstraction that I really think belongs somewhere else on CedarChest, if only someone would implement everything... Opened Interfaces I think that the anonymous form of OPEN is revolting. However, there's nothing wrong with the renaming form of OPEN, and I use it quite liberally in the primitives. (I got really sick of retyping "SaffronContextPrivateTypes") Here are the names I use. Abbreviation Interface Name AT SaffronATDef BC BigCardinals BD SaffronBaseDef EH SaffronErrorHandling GEN SaffronGenericDef PG SaffronProgramGraphPrivateTypes PGD SaffronPGDef PT SaffronContextPrivateTypes TA SaffronTargetArchitecture Casaba Variables Most of the time, the identifiers I use for Casaba variables give a pretty clear hint to what type they are. (For example, it's a fairly good guess that the type of contextTree1 is ContextTree.) But sometimes, for aesthetics' sake, I like short-and-dirty identifiers. Here are some conventions I have followed. Names Type cs CompilerState ct ContextTree env Environment ffl FrozenFieldList fl FieldList lc LocalContext pf ProgramFragment pg ProgramGraph tgn TypeGraphNode 3. Read-Me-First Files and Directories Directories Right now, everything lives on [Indigo]7.0>Saffron3.0> and its various subdirectories. The DF file will tell you all you need to know. Except, of course, that the DF file doesn't live on [Indigo]7.0>Saffron3.0> or any of its various subdirectories. The DF file lives on [Indigo]7.0>Top>. Make The make file is MakeSaffron3.0.cm. This runs ThreeC4 on all the Casaba source files (except SaffronCG, which we hardly ever change, and is really slow to recompile). Then it calls makedo on the DF file. DF The DF file is Saffron3.0.DF. Observe that the extra period in the file name confuses makedo and smodel, so you always have to type the ".DF" extension. Config The configuration file is Saffron.config. Easy enough. Install The install file is Saffron.install. Also really easy. 4. Control The top-level (if there is such a thing) control of Saffron is in SaffronDriver.mesa. This module defines a CommandTool command, "Saffron", which invokes (big surprise) Saffron. The command line also takes argument flags which ultimately wind up in the CompilerState which is passed around just about everywhere. The driver enables various interesting signals and errors (see Error Handling for details). If Saffron encounters any error in the program it is compiling, it raises one of these signals. There is also an InternalError error which is not enabled; we want internal errors to pop up a notifier window and otherwise crash the compilation. The driver calls BD.ReadDefFile to invoke the parser. It winds up with the root of the program's parse tree. It then applies the (Casaba) tree-recursive function MakeEnvironment to the root of the parse tree, which ultimately returns an environment. The environment contains an entry for each file that had to be compiled (the file being compiled, plus all the files in its DIRECTORY, plus all the files in their DIRECTORYs, and so on.) At this point, the driver aborts the compilation if any errors were detected. Note that fatal errors and internal errors will already have caused the compilation to terminate. If logging had been requested (the "-t" option (for historical reasons (it originally meant "print type graph"))) then the driver writes the context tree and program graph of each file to a log file. *** Once we're beyond the stage of producing the program graph, the driver should then call some huge routine to compile (and optimize!) the program graph into target machine instructions. If possible, it would sure be nice to turn the program graph into Mimosa intermediate language, and then take advantage of the Mimosa back end. 5. Parsing Concrete Grammar We start with the concrete grammar, defined in SaffronCG.ThreeC4. The concrete grammar is a BNF grammar which describes the Cedar source language. Casaba (in particular, GenOneCasabaParser) automatically takes this description and magically produces a parser. Most parser generators (yacc, for instance) gobble attribute grammars, which let you specify somewhat arbitrary actions to be perfomed when each reduction of a grammar production occurs. (If you don't know, but do care about, what I'm talking about, then find and read any book with a dragon on the cover authored by Aho, et al. Casaba parsers are different. The only actions associated with a production describe how to build part of a parse tree. Whenever the parser reduces a production in the concrete grammar, it builds a portion of a parse tree defined by the... Abstract Grammar The abstract grammar is defined in SaffronAG.ThreeC4. It describes what Cedar parse trees look like. Basically, there are numerous kinds of parse tree nodes. Each is defined as an AbstractType. Each abstract type declaration also indicates which tree-recursive functions can act on that kind of abstract parse tree node. (This is a royal PITA, in my opinion). Each kind of abstract type has several productions associated with it. Each of these productions is defined as an AbstractProduction. 6. Casaba Type and Function Declarations Most of the primitive functions and types used by Casaba are declared in SaffronBaseDecls.ThreeC4. Some of them, the ones which explicitly manipulate parts of the abstract parse tree, are declared in SaffronAG.ThreeC4. There are also a few in SaffronProgramGraphDecls.ThreeC4. These primitive functions and types are exported by various implementation modules written in Cedar. The tree-recursive functions are all declared in SaffronTreeDecls.ThreeC4. There are also some primitive functions which the Casaba code doesn't call, but other scattered primitives all seem to need. These functions are declared in SaffronContext.Mesa. Here is a table of the relevant Casaba files, the cedar interfaces which they generate, and my abbreviations for the interfaces: Casaba File Cedar Interface Abbreviation SaffronAG SaffronATDef AT SaffronBaseDecls SaffronBaseDef BD (none) SaffronContext (none) SaffronProgramGraphDecls SaffronPGDef PGD 7. Type and Value Primitives Type Declarations The file SaffronContextPrivateTypes.Mesa contains the type definitions that Saffron uses to represent the type graph (Cedar types and Cedar values) and the context tree. Type Graph Creation Primitives The file SaffronContextCreateCTImpl.Mesa implements primitive functions which build type graph nodes, build context trees, build field lists, and perform lookups. Actually, it was originally just supposed to hold context tree and type graph node constructors, but lately I've been stuffing random primitive functions in. Value Primitives The file SaffronValueImpl.Mesa implements primitive functions which build Saffron representations of Cedar values. 8. Program Graph Primitives 9. Compilation Methods Introduction The "meat" of Saffron, in my opinion, is the Casaba code which actually compiles things. Basically, the compile operations go something like this: You start out with the parse tree of something you want to compile (a module, block, statement, initial value, or expression), the context tree you've built so far, the program graph you've built so far, and the CompilerState. When compiling an expression, you'll also have a TypeGraphNode, the target type of the expression. And you don't care about the program graph, because compiling an expression never adds anything to the program graph. You get back the following. (1) A new context tree...the old one, but with sub-contexts created during the compilation hanging off of it. (2) A program fragment, the code for the execution of the particular module/block/statement you compiled. (3) A new program graph...the old one, but with new procedure graphs added for any transfer type values that were encountered during the compilation. Again, it's a bit different for expressions. Instead of getting a program fragment, you get a Value. From this value, you can extract its type and the code which places the value on top of the runtime stack. Also, you don't get a program graph back, since you didn't pass one in. There's also some funny stuff regarding definitions modules. And initial values. Compiling a Module Compiling a Block A block consists of the following, all of which are optional: * An OPEN list * Signal catchers, introduced by ENABLE * Declarations (constants, variables, types) * Statements * Exit list Create a local context. Add EXITS list to the context. Add OPEN names to the context. *** Not implemented. This is one feature which used to work, partially, but I bashed. If you look in Saffron2.0, you'll notice that Saffron2.0 lets you open interfaces. Well, there's a lot of other things that can be opened (records for instance); this involves hanging on to some parse trees and building some a structure (to be a field of the local context) which handles the bizarre scoping rules in OPEN statements. Also, all the lookup routines will have to know how to cope with this structure. For your amusement, see sections 4.4.2 and 7.2.2.2 of the Mesa manual, and section 3.4.2 of CLRM. Do something with the ENABLE clauses. *** Obviously, not yet implemented. Add the EXITS list to the context. *** Not implemented. Build a field list. The function AddDeclarationsToFieldList is used to build a field list with one field for each name in the declarations. Identifiers appearing where type expressions are expected are represented as IdentifierTGN's (they are not looked up yet). Look up the type names. Now go back and look up all the IdentiferTGNs. Replace the body field of each identifer TypeGraphNode with named TypeGraphNode corresponding to the identifier. (Note: At this point, all value expressions are still trash or unparsed or defaultMe.) Build a dependency graph. Analyze the dependency graph. Add the dependency graph to the local context. We don't need to keep the dependency graph around. The only reason we do is so we can print it out when we print out the local context. *** I cheat here; AnalyzeDependencies also performs this step. There really otta be a separate primitive which does this simple operation; I'm just too lazy busy to write it. Store the contents. Freeze the local context. Produce a context tree. Compile the declarations. Proceed sequentially through the declaration list, performing the following on each declaration: If it is a TYPE declaration, ignore it. (What about type codes, etc.?) Compile the statements. Expressions Overview There are three sets of functions that are applied to expressions. The first set is used while building the dependency graph; these functions look at an expression and determine what other quantities in the dependency graph that the expression depends upon. The second set is used to evaluate expressions while the dependency graph is being walked through. The third set is used to compile expressions during the general process of compiling declarations and statements. Building the Dependency Graph The interesting functions which assemble the dependency graph are AddValueDependencies, AddFirstDependencies, AddLastDependencies, and AddSizeDependencies. The methods for these functions appear in SaffronDependencyGraphMethods.ThreeC4. The specification for AddValueDependencies[tree, dg, dgn] reads like this: A particular node dgn in the dependency graph dg depends on the value of this expression tree. Add edges to the dependency graph from dgn to every node in the dependency graph that tree depends on. The specification for AddFirstDependencies[tree, dg, dgn] reads like this: A particular node dgn in the dependency graph dg depends on the first element of the type described by this type-expression tree. Add edges to the dependency graph from dgn to every node in the dependency graph that the first element of the type described by tree depends on. AddLastDependencies and AddSizeDependencies do essentially the same thing that AddFirstDependencies does, except that they care about the last element (AddLastDependencies) and the size (AddSizeDependencies) of the type described by tree, not the first element. Evaluating Expressions The interesting functions which evaluate expressions are EvaluateAndTypeCheckExpression, EvaluateExpression, EvaluateSizeOfTypeExpression, EvaluateFirstOfTypeExpression, and EvaluateLastOfTypeExpression. EvaluateAndTypeCheckExpression is a primitive function implemented in SaffronTypeConformanceImpl.Mesa. The others are tree-recursive functions whose methods appear in SaffronExpressionCompileMethods.ThreeC4. The various Evaluate routines are used while the dependency graph for a block is being walked. At this time, the local context for that block is still under construction. The values that the Evaluate functions return are always compile-time constants. The Evaluate functions take the following arguments: (1) A parse tree for whatever expression is being evaluated. (2) The local context of the block in which the expressions appears. Since the local context is still being built, it does not yet contain any information about names declared within that block. It is only useful for extracting its parent context rib, which has already been built and frozen. (3) The field list for the contents of the block in which the expression appears. Here is where information is available about the names declared within the block. (4) The CompilerState, which is needed here and there. (5) A type graph node, the target type of the expression. Except in EvaluateAndTypeCheckExpression, the target type is used only as a hint for evaluating the expression. The only case I can think of right now where the target type is actually needed is when evaluating identifiers; if the target type is an enumerated type, then the identifier should be considered an element of the enumerated type in preference to being considered as a variable identifier. Take a look at the Anomaly for Enumeration Literals in CLRM, section 4.7. *** BTW, this isn't implemented. The EvaluateAndTypeCheckExpression function calls EvaluateExpression, then checks the type of its result against the target type. If the two types are not compatible (I forget which is the appropriate relationship), then EvaluateAndTypeCheckExpression raises an error. EvaluateAndTypeCheckExpression should not change the type of its argument; this will cause trouble if we start using a target type of "TOP"... EvaluateExpression is the function that really does the work of evaluating an expression. It has all the gory methods. The functions EvaluateFirstOfTypeExpression, EvaluateSizeOfTypeExpression, and EvaluateLastOfTypeExpression are applied to type expressions; each produces the appropriate value of the type which the type expression describes. For type safety reasons, EvaluateAndTypeCheckExpression is the only function which should be called to evaluate expressions. No one should ever call EvaluateExpression, except for EvaluateAndTypeCheckExpression itself. *** Currently, some of the methods which evaluate the arguments of polymorphic operators call EvaluateExpression. This is wrong. Here is how they should work. The numeric operators should call EvaluateAndTypeCheckExpression with a target type of "TOP" for each of their arguments, then use the DemandNumber primitive to ensure that the arguments have numeric types. The equals and not-equals operators should call EvaluateAndTypeCheckExpression with a target type of "TOP" for the first argument, then call EvaluateAndTypeCheckExpression on the second argument, using the first argument's type for the target type. Compiling Expressions 10. Brief summary of files SaffronDeclarationAnalysisImpl.mesa Primitives for constructing and analyzing dependency graphs. 11. Stuff enumerated types abstractions for values head Node, repeat as needed, nest for subheads if appropriate body node, repeat as needed To submit a completed entry to the CSL Notebook, use /Cedar/CedarChest7.0/Forms/CSL-NotebookMsg.form (available in a Cedar Command Tool via "openr CSL-NotebookMsg.form" or "form CSL-NotebookMsg") For further information about the CSL Notebook, see either /Cedar/CedarChest7.0/Documentation/CSL-NotebookDoc.tioga or /Indigo/CSL-Notebook/Documentation/CSL-NotebookDoc.tioga There is some magic Tioga formatting in this form. The whole document has a local style appropriate for producing the Xerox corporate standard Internal Memo design [reference Xerox Corporate Identification Graphics Standards, Section 2, page 15]. The first node has a Mark property, insideHeader, that makes CSL Notebook Entry and the title appear on each page. The second node has no text content and a special format that interacts with the third node, the Xerox logo, to position the logo on the first printed page. The To and Subject nodes are Tioga subtrees with sufficient empty nodes to space them properly to conform to the Internal Memo design. ôCopyright c 1988 by Xerox Corporation. All rights reserved. The code which compiles blocks is in SaffronBlockCompileMethods.ThreeC4. There are two procedures which enter this code, CompileFrameBlock and Compile. CompileFrameBlock is used to compile blocks which occupy new frames (global frames in the case of program blocks, stack frames in the case of procedure blocks) and possibly have argument and result lists. Compile is used to compile plain ol' blocks which are nested within frames. The code does the following: Start off by creating an empty context and an empty field list. CompileFrameBlock has to add the argument and result fields (from the transferType argument) to the field list. (*** This isn't implemented yet.) Then call InternalCompileBlock to continue. *** Not implemented. Call CompileScope to continue. Do a topological sort of the dependency graph, performing the following on each node: If the node is a value node which does not ultimately depend on the runtime state, then it must represent a static (compile-time constant) value. Use the expression compiler to change the value's parse tree into a static representation. If the node is a value node which ultimately depends on the runtime state, then it represents a variable or runtime constant. Do nothing with it at this time. If the node is a FIRST, LAST, or SIZE node which does not ultimately depend on the runtime state, then use the expression compiler to compute a static value and stuff this value into the appropriate slot of the appropriate type graph node. If the node is a FIRST, LAST, or SIZE node which ultimately depends on the runtime state, then raise an error. This will determine all the compile-time (static) quantities in the dependency graph. If it is a constant declaration, and the value slot of any corresponding entry in the field list contains a static ValueNode, then the field represents a compile-time constant declaration. Since compile-time constants have already been determined, simply ignore the field. (If we want these values around at runtime, for debugging or other reasons, then emit an appropriate StoreLocal instruction.) Otherwise, the field represents a constant or variable declaration which requires a runtime representation. Compile the field's initialization expression; the result will be a context tree and a program graph fragment which places the initial value on top of the runtime stack. Concatenate this code with a StoreLocal instruction. Hang the child context tree from the context tree. Ê j•StyleDef± BeginStyle (Cedar) AttachStyle (abstract) "for abstract nodes" { head4 AlternateFontFamily 14 en tabStops } StyleRule (root) "format for root nodes" { cedarRoot } ScreenRule (root) "format for root nodes" { 0 firstHeaders 0 lastDropFolio FontPrefix docStandard 36 pt topMargin 36 pt headerMargin .5 in footerMargin .5 in bottomMargin 1.75 in leftMargin 1.5 in rightMargin 5.25 in lineLength 24 pt topIndent 24 pt topLeading 0 leftIndent 0 rightIndent } PrintRule (pagenumber) "format for folio on each page" { unleaded AlternateFontFamily } StyleRule (positionInternalMemoLogo) "Xerox logo: screen" { docStandard 1 pt leading 1 pt topLeading 1 pt bottomLeading } ScreenRule (positionInternalMemoLogo) "for Xerox logo" { docStandard 1 pt leading 1 pt topLeading -36 pt bottomLeading -1.5 in leftIndent } PrintRule (internalMemoLogo) "Xerox logo: screen" { "Logo" family 18 bp size 20 pt topLeading 20 pt bottomLeading } ScreenRule (internalMemoLogo) "for Xerox logo" { "Logo" family 18 pt size 12 pt leading -36 pt topLeading 36 pt bottomLeading -1.5 in leftIndent } PrintRule (memoHead) "for the To, From, Subject nodes at front of memos" { docStandard AlternateFontFamily 240 pt tabStops } StyleRule EndStyle˜Iunleaded•Mark insideHeaderšÐbo&˜&Ipositioninternalmemologo˜Iinternalmemologošœ˜memoheadšÏsœž˜NšœÏtœ ˜Nšœ ˜ N˜N˜N˜—šžœž˜ Nšœ˜Nšœ˜N˜N˜—Iabstractšœ Ïmœ1™Q˜¦Q˜¤Q˜6 šœ}£œ*˜«QšžŽ˜Ž—— šœ˜Qšž&Ðbsže˜Ž—Qšœw˜wQšœâ˜âQšœ~¤œW˜ÛQšœë˜ë—P˜——P˜ š¨#˜#R˜<—P˜ R˜R˜šÐrtœ=®˜?Rš®œ®˜ š®œ4˜5indentšœ/˜/Qšœ^˜^— šœ:˜:Sšœ=˜=Sšœ8®˜9——Qš ®œ¶œÇÐosœ¯œt®˜“——…—Qgv