CSL Notebook Entry: Saffron Dealer Notes and Slides

XEROX

To: CSL                                From: James Rauen, PARC CSL
Subject: Dealer notes and slides: Saffron, an experimental Cedar compiler
Date: August 31, 1988

FOR XEROX INTERNAL USE ONLY

Abstract

Cedar gives its programmers more than enough rope to hang themselves, but rarely allows them to kick the chair out from underneath. The incredible maze of Cedar "features" makes it difficult to produce a coherent specification of the language, even more so an implementation, not to mention a coherent implementation. I will describe Saffron, a compiler which attempts to coherently implement Cedar. I will discuss my work on Saffron this summer, describe some of Cedar's "features" which make it such a joy to implement, and show some highlights from the trip I took into the abyss of Weird Cedar.

Attributes: Dealer, Cedar, compiler, Casaba, informal

References: CSL-83-15, "A Description of the Cedar Language"

Very Brief Introduction

These are the slides I used for my Saffron dealer on August 30, 1988, along with some notes.

Slide 1 (Title)

Saffron
An Experimental Cedar Compiler
James Rauen
August 31, 1988

I had originally titled this talk "Weird Cedar", intending to talk about all the weird nooks and crannies of Cedar that I spent this summer exploring. But Howard Sturgis and Bill Jackson talked me out of it and convinced me that I should talk about Saffron instead.

Slide 2

Why another Cedar compiler?
· Demonstrate the viability of Casaba
· Principled implementation
  Easy to maintain
  Easy to verify
· Experiments

The first question you might ask is, "Why do we need another Cedar compiler?" After all, we already have one working, and there are plenty of people in the lab who are porting Cedar to other architectures.

Well, the immediate goal (by immediate, I mean in the foreseeable future) of Saffron is to demonstrate that Casaba is a viable compiler implementation language. The best way to show this is to implement a real language, like Cedar, using Casaba.

It would also be really nice to have a principled implementation of Cedar kicking around. I haven't looked at the existing compiler much, but I've heard some spine-tingling horror stories about it. The problem with the existing compiler is that it is an aggregate of ten (?) years of work, during which time the language definition was constantly changing. So it isn't as though the compiler could simply have been designed and then implemented. A principled Cedar implementation would make it easy (?) to compare the language semantics with the implementation itself. Also, a principled Cedar compiler would be a useful testbed for adding (and removing!) features from the language.

Slide 3

What is Cedar?
· CLRM
· Mesa manual
· The implementation
· Religion

The first problem I ran into was finding a definition of Cedar. Needless to say, this wasn't an easy task. There are at least four major sources, none of which is completely suitable.

The "definitive" source is the Cedar Language Reference Manual (CLRM), Butler Lampson's work. Butler's book reads like very elegant poetry written in Serbo-Croatian: you know it's elegant, but you don't necessarily understand it right away. And if you ever do take the time to plow through it, you'll find that Butler presents a very elegant definition of a language which he calls Cedar. Butler's Cedar bears a very strong resemblance to the current version of Cedar we all know and love, but there are also a lot of differences.
Not the least of these differences is that Butler's Cedar is polymorphic. There are also some discrepancies and inconsistencies in his definition of "current Cedar". So we can't just say that CLRM is the definitive definition of Cedar, not without changing our idea of what Cedar is. There are also some bugs in CLRM which no one has gotten around to fixing. I'll get to some of these later.

But CLRM isn't the only definition around. There's also the Mesa manual. If you consider Cedar to be a strict superset of Mesa, where Mesa is the subset of programs which don't have the keyword CEDAR in their headers, then the Mesa subset should agree with what the Mesa manual says. On the other hand, even if you don't buy the argument that Mesa is a strict subset of Cedar, I still think you'd agree that Cedar constructs should behave like their Mesa counterparts whenever possible. Either way, the Mesa manual provides at least a partial definition of Cedar.

And then there's the implementation: the current Cedar compiler.

Finally, we have the Cedar Religion. Every Cedar programmer has his/her own way of envisioning how the language "should be".

What this all boils down to is that there is no canonical definition of Cedar. So, when I say to myself, "I'm going to sit down and write a Cedar compiler," I'm at a bit of a loss to say exactly which Cedar I'm compiling.

(I'm never going to say all this. Looks like only posterity will get to see it.)

Slide 4

Saffron
· Front end of a compiler
· Implements Cedar/Mesa ± Religion
· Implemented in Casaba

By "implements Cedar give-or-take Religion", I mean that we're sticking as close as possible to Butler and the current implementation. But there are also a few places where we add a feature, for completeness' sake... that's where the religion comes in.

One thing we're not doing is eliminating disgusting language features. I might look at something like implementation modules in directory clauses, or side-effecting initialization expressions, and say, "That's disgusting." And Howard will jump up and down and might even say, "That's illegal." But somebody will then point out that, hey, some crucial code buried deep within the system depends on that language feature. In any event, we're not pruning the language at all.

Slide 5

Parser
· Concrete Grammar
  154 nonterminals
  475 productions
· Abstract Grammar
  120 nonterminals
  364 productions

A numbers slide. (Howard pointed out that I probably won't have the time to do the spiel on concrete grammars versus abstract grammars. Faithful reader: if you're interested, find Howard or one of his Casaba papers.) Casaba automatically generates a parser based on these grammars. This winds up giving us an abstract parse tree.

Slide 6

Recursive Functions
· Applied to the abstract parse tree
· Build:
  Context Tree
  Type Graph
  Program Graph
· Example:
  CompileFrameBlock: TreeRecursiveFunction
    [Tree, TypeGraphNode.transferType, ContextTree.parent, ProgramGraph.arg, CompilerState]
    Returns [ContextTree, ProgramFragment, ProgramGraph.res]
    DamagedReps [ContextTree.parent, ProgramGraph.arg];

Now that we have the parse tree, we define recursive functions to apply to the tree. In Saffron, we write functions which build three major data structures: the context tree, the type graph, and the program graph. I'll talk about these in detail in the next three slides. (A rough sketch of the shape of such a function follows.)
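A note for the reader, not on a slide: here is roughly the shape of such a tree-recursive function, sketched in Python rather than Casaba. Everything in it (the names Tree, Context, compile_block, and so on) is invented for illustration; Saffron's real functions are Casaba code over its own representations. The point is only the shape: each function consumes a parse-tree node plus inherited attributes (the parent context, the compiler state) and synthesizes results (a child context node, a program fragment), just as the CompileFrameBlock signature above suggests.

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Tree:                            # abstract parse tree node
        op: str
        children: list["Tree"] = field(default_factory=list)

    @dataclass
    class Context:                         # one node of the context tree
        parent: Optional["Context"] = None
        names: dict = field(default_factory=dict)

    def compile_block(tree: Tree, parent: Context, state: dict):
        """Rough analogue of CompileFrameBlock: build the context-tree
        node for this block, then recur on its contents, collecting a
        fragment of intermediate code."""
        ctx = Context(parent=parent)
        fragment = []
        for child in tree.children:
            fragment += compile_node(child, ctx, state)
        return ctx, fragment

    def compile_node(tree: Tree, ctx: Context, state: dict):
        if tree.op == "block":             # a nested block opens a new scope
            _, fragment = compile_block(tree, ctx, state)
            return fragment
        return [("emit", tree.op)]         # stand-in for real code generation

    # E.g.: a block containing an assignment and a nested empty block.
    ctx, code = compile_block(
        Tree("block", [Tree("assign"), Tree("block")]), Context(), {})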
Slide 7

Context Tree
Mimics the block structure of a program. One node for each block/scope:
· Opened names
· Enabled signals
· Field List
· Exit List
· Dependency graph
· Parent context pointer

(O patient reader: note that opened names, enabled signals, and exit lists aren't yet implemented, just thought about.)

Slide 8

Type Graph
· Predefined types
· Constructed types
  Pair: TYPE = RECORD [x: INT, y: INT];
  u: LIST OF REAL;
  DIRECTORY Foo: TYPE Bar USING [Baz];
· Runtime state

Predefined types: INT, CARD32, BOOLEAN, etc. There is a facility for any constructed type. In the first example, "Pair", we construct type graph nodes for both the RECORD type itself and the named type "Pair". The second example shows that we also build type graph nodes for anonymous types (LIST OF REAL). Directory entries are type declarations, too, so we have them in the type graph.

Here's where we take off on a tangent. The type graph also includes type graph nodes describing the type structure of the current state of the computation. This includes the frame types of all the lexically visible stack frames (following the static links). We also know the type structure of the execution stack at all times, although we don't build type graph nodes for it; they can be inferred from the program graph (next slide).

Slide 9

Program Graph
· Intermediate language
· Retains all type information
· Type checking

This is our intermediate language. Unusual feature: it retains all the type information of the source code. As I mentioned in the last slide, given the program graph and the type graph, we can describe the exact type structure of a computation at every instruction. This is the departure point for one of Howard's radical type theories: do the type checking at this point, instead of while we walk the parse tree.

Slide 10

Environment

<< [Artwork node; type 'Artwork on' to command tool] >>

Environments. As we compile a file which includes others, we build up an environment structure indexed by file name. We only have to deal with each file once. The example here is RopeImpl. Here are the files in the various directories:

RopeImpl: Basics, PrincOpsUtils, Rope, RopePrivate
Basics: PrincOps
PrincOps: nothing
PrincOpsUtils: Basics, PrincOps
Rope: Basics, PrincOps
RopePrivate: PrincOps, Rope

(A small sketch of this build-each-file-once scheme appears below.)
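Again not on a slide, and not Saffron's actual code: a little Python sketch of the idea, using the directories above. The names (DIRECTORY, environment) are mine. The point is that the environment table is memoized by file name, so a shared interface like PrincOps is processed exactly once no matter how many files include it.

    # Hypothetical sketch of Slide 10's environment structure.
    DIRECTORY = {
        "RopeImpl":      ["Basics", "PrincOpsUtils", "Rope", "RopePrivate"],
        "Basics":        ["PrincOps"],
        "PrincOps":      [],
        "PrincOpsUtils": ["Basics", "PrincOps"],
        "Rope":          ["Basics", "PrincOps"],
        "RopePrivate":   ["PrincOps", "Rope"],
    }

    environments: dict[str, dict] = {}     # file name -> its environment

    def environment(name: str) -> dict:
        """Build (or fetch) the environment for one file."""
        if name not in environments:
            # First build the environments of everything this file includes...
            included = {dep: environment(dep) for dep in DIRECTORY[name]}
            # ...then record this file's own environment.  A real compiler
            # would analyze the file's contents here; we only note the imports.
            environments[name] = {"includes": included}
        return environments[name]

    environment("RopeImpl")
    print(list(environments))   # six entries, each file dealt with once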
Everything I've described so far was fairly well understood, and a lot of it was implemented, before I picked up the ball. About two months ago, Howard and I found ourselves at the brink of a huge pit. Here it is...

Slide 11

Into the Abyss

Tree: TYPE = REF TreeBody;
TreeBody: TYPE = RECORD [
  nodeData: REF ANY,
  children: LIST OF Tree ];

a: INT = b;
b: INT = 3;

Declare-before-use is more than slightly relaxed in Cedar. I started looking at code fragments like this, and thinking really hard. Type declarations can depend on other type declarations (so we don't need forward declarations, as in Pascal, or other abhorrences). The values of named constants depend on other declarations further down the line. (In the second example, if I replace b: INT = 3 with b: INT _ 3, then the value of a changes from 3 to TRASH.)

The following paragraphs are obsolete; they refer to the example

a: INT = b;
b: INT = a;

First, I noticed what happens if you scratch out the second "a" and replace it with something like "3". Cedar folds forward references to compile-time constants, which means that the value of "a" is 3. But if I change the second equals sign to a gets arrow, then the value of "a" suddenly becomes zero (or garbage?).

So it turns out that you cannot, in general, know the value of an expression until you've looked at all the declarations which follow it. There are lots of other things I can worry about if I feel like it. What happens if I wrap an identity function around "a"? It makes a difference whether or not the function is inline. What if one of my initializations has a side effect, like assigning to some other variable? Well, there's a lot more flaming I could do along these lines, but I don't have time to get into any of it now.

Slide 12

Deeper and Deeper

Foo: TYPE = INT[k..k+FIRST[Bar]];
Bar: TYPE = INT[FIRST[Foo]..10];
k: INT = LAST[Bar];

Here we have dependency relationships all over the place. These include properties of types (FIRST, LAST, SIZE) which have to be constant, and named constant values.

Slide 13

The Problem
· Sequential processing of declarations
EXCEPT FOR
· Named constants with compile-time values (but not runtime constants)
AND
· Constant properties of types

This is the problem.

Slide 14

The Solution
Dependency Graph

Nodes for:
· Values of named constants
· Bodies of inline procedures
· Initialization expressions of types
· SIZE, FIRST, LAST properties of types
· Runtime state

Evaluate nodes in topological order. Reject if:
· Cyclic, or
· A supposedly constant type property depends on runtime state

The solution: we build a dependency graph. A dependency graph is a directed graph with nodes for: (1) the values of named constants; (2) the bodies of inline procedures, so we can detect and reject recursive inline procedures; (3) the initialization expressions of types, since these also behave like inline procedures; (4) the properties of types which must be known at compile time (SIZE of all types, FIRST and LAST of element types); and (5) the runtime state, so we can express quantities that depend on the runtime state.

For each block in the program, we build a dependency graph for the names declared in that block. Then we walk through the graph, visiting a node only after we have visited all the nodes that it depends on. As we visit the nodes, we can compute the constant values of named constants and constant type properties. We reject the program if there is a cycle in the dependency graph, or if a property of a type which is supposed to be known at compile time (like the type's size) depends on the runtime state.

Slide 15

Example

Foo: TYPE = INT[k..k+FIRST[Bar]];
Bar: TYPE = INT[FIRST[Foo]..10];
k: INT = LAST[Bar];

<< [Artwork node; type 'Artwork on' to command tool] >>

I'll work through the example here. Visit the nodes in this order: LAST[Bar], VALUE[k], FIRST[Foo], FIRST[Bar], LAST[Foo], SIZE[Foo], SIZE[Bar]. Sure, it looks easy now, but we sweated this stuff out for several weeks back in July. (The sketch below replays this example.)
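Mechanically, the walk through Slide 15's example looks like this. Another hypothetical sketch, in Python rather than Casaba: the dependency edges are transcribed by hand from the three declarations, and the topological sort and cycle rejection come from Python's standard graphlib module.

    from graphlib import TopologicalSorter, CycleError

    # Dependencies transcribed by hand from:
    #   Foo: TYPE = INT[k..k+FIRST[Bar]];
    #   Bar: TYPE = INT[FIRST[Foo]..10];
    #   k: INT = LAST[Bar];
    # Each node is mapped to the nodes it depends on.
    deps = {
        "LAST[Bar]":  [],                          # the literal 10
        "VALUE[k]":   ["LAST[Bar]"],               # k = LAST[Bar]
        "FIRST[Foo]": ["VALUE[k]"],                # Foo's lower bound is k
        "FIRST[Bar]": ["FIRST[Foo]"],              # Bar's lower bound
        "LAST[Foo]":  ["VALUE[k]", "FIRST[Bar]"],  # upper bound is k+FIRST[Bar]
        "SIZE[Foo]":  ["FIRST[Foo]", "LAST[Foo]"],
        "SIZE[Bar]":  ["FIRST[Bar]", "LAST[Bar]"],
    }

    try:
        order = list(TopologicalSorter(deps).static_order())
        print(order)    # one legal visit order; Slide 15's order is another
    except CycleError:
        print("reject: cyclic declarations")  # e.g. a: INT = b; b: INT = a;

The other rejection rule on Slide 14 (a supposedly constant type property depending on the runtime state) would amount to one more check over the same graph: add a runtime-state node, and reject if it is reachable from any SIZE, FIRST, or LAST node along the dependency edges.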
Slide 16

Saffron Now
· Builds type graphs and context trees
· Dependency analysis
· Generates some intermediate code:
  Initialization (declarations)
  Expressions
  Assignments
  A few simple statements
  Blocks
  Transfer type values (procedures, ...)

Here's where Saffron is right now. It builds type graphs and the context tree (it already did before I got here, so that's not very interesting). Saffron performs dependency analysis on a block-by-block basis, as described in the preceding slides. And we're generating some intermediate code (the program graph). This includes initialization code for variable declarations; expressions (folding constants, generating code for what has to be done at runtime); assignment statements; a few simple statements (IF-THEN-ELSE works, and NULL is really easy, but that's about it so far); blocks; and transfer type values (things like procedure bodies -- we're coping with all the lexical structure of Cedar programs). Oh, and Saffron has representations for Cedar values (compile-time or runtime) and can manipulate them.

Slide 17

Wrapping Up
· Cedar
· Casaba
· Acknowledgements
  Howard Sturgis
  Bill Jackson
  Carl Hauser
  Russ Atkinson
  Peter Kessler

Cedar: I've been working with it off and on for over two years now, since the early days of 6.0. Working on compiling Cedar gives you, ah, an entirely new perspective on the language. Before, my attitude was, "Yes, it's an incredibly powerful language which I love to use even though it's very crufty in places." Now I really know how crufty it is. But I still like it.

Casaba: Howard Sturgis, my supervisor, is the whole reason I'm working on this. Bill, whom I've also been working with, answered all the obscure Casaba questions that Howard passed on. Carl, Russ, and Peter: on the few occasions that I came up with a truly bizarre Cedar question, they always knew the answer.