<> <> <<>> SchemePretty PCEDAR 2.0 % FOR INTERNAL XEROX USE ONLY SchemePretty Mike Spreitzer Ó Copyright 1989 Xerox Corporation. All rights reserved. Abstract: This package provides pretty-printing functions for Scheme. Created by: Mike Spreitzer Maintained by: Mike Spreitzer Keywords: Scheme, Pretty-Print XEROX Xerox Corporation Palo Alto Research Center 3333 Coyote Hill Road Palo Alto, California 94304 For Internal Xerox Use Only 1. User Interfaces Tioga Operations The SchemePrettyView command registers two new operations with Tioga and the Edit Tool: SchemePrettyPrintExpr and SchemePrettyPrintData. They both read Scheme forms starting at the beginning of the current Tioga primary selection, and continue reading up to but not including the first form that starts outside the current primary selection. The read forms are pretty printed, as program or data according to which operation is used. The pretty results replace the read forms, and are selected. In [PCedar2.0]SchemePretty.tip is a TIP table suitable for layering in front of Tioga's, and that puts SchemePrettyPrintExpr and SchemePrettyPrintData under Scheme-L and Scheme-Shift-L respectively. The Scheme key is also known as (Left) Alt. Currently, the pretty printer only outputs plain text --- no Tioga node structure. If the selected forms span multiple nodes, the spanned node breaks will be obliterated when the selected forms are replaced with their prety versions. The pretty printer assumes that firstIndent=restIndent --- that is, the first line of a node is indented the same as the other lines. Note that this is not true in Scheme.style! But it is in Ascii.style. The pretty printer no longer discards comments. The formatting parameters are taken from User profile entries. The margin used in the Tioga operations is what's left after deducting the leftIndent of the node containing the start of the selection and the number of characters before the start of the selection in its line. Commander Commands Syntax: SchemePrettyPrintExpr * SchemePrettyPrintData * = (margin ) | (sia ) | (simplelen ) | (simplequote ) | (debug ) | (debugss ) | (context ) Description: Reads Scheme from standard input, pretty-prints it to standard output. Command line arguments override a fixed default formatting style. Command line arguments are read, but not evaluated, by Scheme. "f" looks are applied to the output stream via the "%l" formatting hack. Examples: % SchemePrettyPrintExpr (margin 15) (x y) (x y) (sdf oiu oiu lkj uy lkjh sdf) (sdf oiu oiu lkj uy lkjh sdf) -- Aborted Syntax: SchemePrettyPrintFile ( [ _] | ) * = -margin | -sia | -simplelen | -simplequote | -debug | -debugss | -context Description: Produces a Tioga-formatted file from another. Examples: % SchemePrettyPrintExpr (margin 15) (x y) (x y) (sdf oiu oiu lkj uy lkjh sdf) (sdf oiu oiu lkj uy lkjh sdf) -- Aborted User Profile Options SchemePretty.margin: INT _ 80 <> SchemePretty.sia: INT _ 3 <> SchemePretty.simpleLen: INT _ 2 <> SchemePretty.simpleQuote: BOOL _ TRUE <> SchemePretty.rules: Token _ "SchemePretty.rules" <> SchemePretty.keywordLooks: Token _ "k" <> SchemePretty.argumentLooks: Token _ "c" <> SchemePretty.defineeLooks: Token _ "n" <> 2. The Pretty-Printing Theory SchemePretty reads text with its own special reader, in order to retain comments. SchemePretty prints Scheme forms using StructuredStreams. StructuredStreams takes as input a stream of characters delimited into nested objects with breakpoints; the output stream of characters differs from the input only at the breakpoints, where linebreaks and other whitespace may be inserted. In particular, a breakpoint has: a condition, an offset, and a sep. The sep is a ROPE that will be output if the break is not taken. The condition controls when the break is taken, and the offset determines the indentation of the new line. There are six possible conditions: never, miser, width, lookLeft, united, and always. Never and always are obvious (if somewhat degenerate). A width break breaks only if necessary to avoid making a line too long. A lookLeft break will break if the width would, or if any object since the last sibling break broke (recursively). A united break will break if the width would, or if the containing object breaks (recursively). A miser break will break only if there is no other way to avoid making some line too long; for every top-level object, a minimal number of miser breaks is taken. SchemePretty always uses a single space as the sep for a breakpoint. By careful placement of object boundaries relative to parentheses, only two offsets are ever needed: 0 and sia (the Standard Indentation Amount, normally 2). SchemePretty has no use for the width condition --- lookLeft is always preferred. Scheme comments are attached to non-comment forms, either following them or preceeding them. Some comments could be considered either attached to the following non-comment or the preceeding one; the SchemePretty reader chooses one in an as-yet undocumented way. SchemePretty extends the set of possible Scheme forms with Commented. A Commented has a prefixed sequence of linebreaks and comments, a non-comment form, and a postfixed sequence of linebreaks and comments. The SchemePretty reader records every comment, and every linebreak that isn't separated from a comment by non-comment text, in a Commented form. SchemePretty converts Scheme forms to StructuredStreams input by following an ordered list of rules. The basic conversion step is parameterized by a context (a name) and a depth (an integer), takes a subject Scheme form, and produces almost-StructuredStreams-input according to the rules; the almost-StructuredStreams-input is then converted into StructuredStreams input in a standard way. Almost-StructuredStreams-input consists of a sequence of: object boundaries, breakpoints, printing characters, buffered strings. The resultant StructuredStreams input is the same sequence, with the following exceptions. Buffered strings are delayed until just before the next printing characters or object begin. Each linebreak in a buffered string becomes an always breakpoint whose indentation is that of its cohort breakpoints (they must all have the same indentation) if there are any, otherwise 0. The cohort breakpoints of a buffered line are those that are not separated from it by an object boundary or printing characters. The cohort breakpoints of each buffered string that includes linebreaks are deleted. A form is classified as either leaf, atom, simple, or complex, according to the following rules. Every form is a leaf, except for vectors, pairs, the empty list, and some Commented forms. A form that is or contains a Commented form is complex, except when it is a Commented whose non-Comment part is a leaf, in which case the Commented is also a leaf. Commentless quote and quasiquote forms are classified just like their cadr, except that those with leaf cadr are atoms. Improper lists are treated as if they were proper ones, with the non-list cdr treated as the last element. A commentless vector or list is complex if it contains a non-atom element or has more than 2 elements; the others are simple. Users may change three parameters of the printing: (1) the sia, (2) the maximum length of a simple list or vector, and (3) whether the quotation exception applies. 3. The Pretty-Printing Meta-Rules A rule is written in one of the following two ways: contextName pattern (contextName depth) pattern Rules that indicate a depth apply only at the given depth; rules that don't indicate a depth apply at all depths. The almost-StructuredStreams-input is produced from the context, depth, and subject by applying the rules in order, until one of them matches. Each rule describes both the condition under which it matches and the almost-StructuredStreams-input it produces in a mostly integrated way. The following meta-rules describe how a pattern prescribes the matching test and the almost-StructuredStreams-input to emit, for a non-comment subject. Note that the canonical treatment of improper lists is built into these meta-rules. leaf => match iff a leaf; emit via print definee => match iff a leaf; emit via print, with definee looks argument => match iff a leaf; emit via print, with argument looks atom => match iff a leaf or atom; emit by recursion simple => match iff not complex; emit by recursion complex => match iff complex; emit by recursion any => match always; emit by recursion contextName => match always; emit by recursion into contextName at current depth (precommentless x) => match a subject if x would and the subject is not preceeded by comments; emit like x (postcommentless x) => match a subject if x would and the subject is not followed by comments; emit like x (commentless x) => match a subject if x would and the subject is neither preceeded nor followed by comments; emit like x (enter contextName i) => match always; emit by recursion into contextName at depth i (inc contextName) => match always; emit by recursion into contextName at current depth +1 (dec contextName) => match always; emit by recursion into contextName at current depth -1 (me pat emi) => match like pat; emit like emi (mei pat ctx) = (me pat (inc ctx)) (med pat ctx) = (me pat (dec ctx)) (quote x) => match the symbol or string x; emit directly, with keyword looks (quasiquote aSymbol) => match the symbol aSymbol; emit directly (lb x) => match a list that x would; emit left paren, whatever x emits, right paren (ob x) => match like x; emit obj begin, like x, obj end (lob x) => match a list that x would; emit left paren, obj begin, like x, obj end, right paren (vp x) => match vector whose vector->list x would; emit #, what x would for that vector->list (strip-list x) => match & emit like x (which should match Tail NTs), but consider this to match Form NTs [use carefully!]. (or . list) => match if any element of list would; emit according to matched element (begin-obj . x) => match like x; emit obj begin, like x (end-obj . x) => match like x; emit obj end, like x (cw . x) => match like x; emit a c/w bp, like x; c in {n, m, l, u, a}, w in {0, s} (readermacro "aString" y) => match never; emit "aString", then like y on cadr of subject (dotform y) => match never; emit ". ", then like y (skip x . y) => match a pair whose car matches x and cdr matches y; emit cdr like x (insert "aString" y) => match like y; emit "aString", then like y (x . y) => match a pair whose car matches x and hacked cdr matches y; emit like x, (if hacked then buffer ". "), like y where a hacked z is just z if that's a pair, otherwise it's the list (z) () => match the empty list; emit nothing If none of rules of the context match the subject, the rules of a default context are applied. There are two default contexts, named form-default and list-default; which one is used is determined by the syntactic category begin printed (see next secion). When the subject is a Commented form, the prefixed linebreaks and comments are emitted, the rules are applied to the non-comment part, and then the postfixed linebreaks and comments are emitted. Static Checking of the Rules Two aspects of the rules are checked statically when they are read: syntactic coherence and color coherence. A context-free grammar for Scheme might use two non-terminals, Form and Tail, something like this: Form ::= ( Tail ) | Symbol | Number | String | ... Tail ::= Each context is suitable for emitting text that should be parsed into exactly one of those two non-terminals. For example, consider the rules expr (lob (expr ls . expr)) expr (lb expr) expr leaf The use of `expr' after the dot is syntactically inconsistent with the other uses. For example, it would reformat the input `(a b)' as `(a (b))'; the error is in treating the cdr of the list `(a b)' --- that is, `(b)' --- as a Form instead of a Tail. SchemePretty associates each rule fragment with a non-terminal (NT), and statically checks the structure of the rule set (by meta-rules given below). The rule set must be checked as a whole because a single rule may not contain enough information. For example, a rule like foo (ob bar) introduces an interdependency between the syntactic (and color) categories of contexts foo and bar. Between any two non-paren tokens, there should be some whitespace. This is color coherence. For example, the rules expr (lob (expr expr)) expr leaf violate color coherence because there's no whitespace output between the two sub-expressions in the first rule; the input `(a b)' would be reformatted as `(ab)'. With each rule fragment, SchemePretty associates a beginning color (BC), which is either White or Black. The ending color of every context is assumed to be Black (thus, you'd better not write your rules to depend on it being White! This usually isn't a problem --- and when it is, we'll just make that a variable along with the beginning color.) The structure of the rule set is statically checked for color coherence. Following is a summary of the constraints imposed locally by each rule fragment. The notation `rf => [Nt, Bc]' means that a rule fragment of form rf is considered to have NT Nt and BC Bc. The left-side notation `sf[Nt,Bc]' means that sub-form sf is constrained to have NT Nt and BC Bc. Where `ctx' appears, it is a reference to the context of one of whose rules this is a fragment. Free means unconstrained. leaf => [Form, Black] definee => [Form, Black] argument => [Form, Black] atom => [ctx.NT, ctx.BC] simple => [ctx.NT, ctx.BC] complex => [ctx.NT, ctx.BC] any => [ctx.NT, ctx.BC] contextName => [contextName.NT, contextName.BC] (precommentless x) => [x.NT, x.BC] (postcommentless x) => [x.NT, x.BC] (commentless x) => [x.NT, x.BC] (enter contextName i) => [contextName.NT, contextName.BC] (inc contextName) => [contextName.NT, contextName.BC] (dec contextName) => [contextName.NT, contextName.BC] (me pat emi) => [emi.NT, emi.BC] Note: no static checking is done inside pat. (quote aSymbol) => [Form, Black] (quasiquote aSymbol) => [Form, Black] (lb x[Tail,Free]) => [Form, Black] (ob x) => [x.NT, x.BC] (lob x[Tail,Free]) => [Form, Black] (vp x[Form,Black]) => [Form, Black] (strip-list x[Tail,Free]) => [Form, Black] (or . list) => [list.first.NT, list.first.BC] (begin-obj . x) => [x.NT, x.BC] (end-obj . x) => [x.NT, x.BC] (cw . x) => [x.NT, White] (readermacro "aString" y[Form,Free]) => [Form, Black] (dotform y[Form,Free]) => [Tail, Black] (skip x . y) => [y.NT, y.BC] (insert "aString" y) => [y.NT, White] (x[Form,Free] . y[Tail,White]) => [Tail, x.BC] () => [Tail, Free] 4. The Pretty-Printing Rules The rules can be seen in SchemePretty.rules, and are themselves output of the pretty-printer (using the rules context). 7. Old, Rejected Proposals 3A. The Pretty-Printing Meta-Rules A rule is written in one of the following two ways: contextName pattern depth contextName pattern Rules that indicate a depth apply only at the given depth; rules that don't indicate a depth apply at all depths. The almost-StructuredStreams-input is produced from the context, depth, and subject by applying the rules in order, until one of them matches. Each rule describes both the condition under which it matches and the almost-StructuredStreams-input it produces in a mostly integrated way. The following meta-rules describe how a pattern prescribes the matching test and the almost-StructuredStreams-input to emit, for a non-comment subject. atom => match iff neither pair nor vector; emit via print simple => match iff simple; emit by recursion complex => match iff not simple; emit by recursion any => match always; emit by recursion contextName => match always; emit by recursion into contextName at current depth (enter contextName i) => match always; emit by recursion into contextName at depth i (inc contextName) => match always; emit by recursion into contextName at current depth +1 (dec contextName) => match always; emit by recursion into contextName at current depth -1 (me pat em) => match like pat; emit like em (quote aSymbol) => match the symbol aSymbol; emit directly (or . list) => match if any element of list would; emit according to matched element (parens . x) => match iff x would; emit left paren, whatever x emits, right paren (object . x) => match x; emit obj begin, x, obj end (both . x) => match x; emit left paren, obj begin, x, obj end, right paren (begin-obj . x) => match x; emit obj begin, x (end-obj . x) => match x; emit obj end, x (cw . x) => match x; emit a c/w bp, x; c in {n, m, l, u, a}, w in {0, s} ("aString" . x) => match x; emit "aString", x (cadrOnly . x) => match a list whose second element matches x; emit second element according to x (commentless . x) => match a list that both matches x and has no comments attached to any element; emit according to x (x . y) => match a pair whose car matches x and hacked cdr matches y; emit x, (if hacked then buffer ". "), y where a hacked z is just z if that's a pair, otherwise it's the list (z) () => match the empty list; emit nothing If none of rules match, the following default rules apply: context atom context (parens) context (any . list) list () list (ls any . list) When the subject is a Commented form, the prefixed linebreaks and comments are emitted, the rules are applied to the non-comment part, and then the postfixed linebreaks and comments are emitted. 4A. The Pretty-Printing Rules expr (parens any) expr (me (commentless 'quote any) (cadrOnly "'" data)) expr ('quote ns data) data (me (commentless 'quote any) (cadrOnly "`" any)) data (me (commentless 'quasiquote any) (cadrOnly "`" any)) data (me (commentless 'unquote any) (cadrOnly "," any)) data (me (commentless 'unquote-splicing any) (cadrOnly ",@" any)) expr (me (commentless 'quasiquote any) (cadrOnly "`" (enter templ 1))) expr ('quasiquote ns (enter templ 1)) 0 templ expr templ (me (commentless 'quote any) (cadrOnly "'" templ)) templ (me (commentless 'quasiquote any) (cadrOnly "`" (inc templ))) templ (me (commentless 'unquote any) (cadrOnly "," (dec templ))) templ (me (commentless 'unquote-splicing any) (cadrOnly ",@" (dec templ))) templ ('unquote ns (dec templ)) templ ('unquote-splicing ns (dec templ)) templ ('quasiquote ns (inc templ)) expr (both (or 'define 'set!) ns atom ls simple) expr (both (or 'define 'set!) ns any . body-s) body-s () body-s (as expr . body-s) expr (both 'lambda ns formals ls simple) expr (both 'lambda ns formals . body-s) formals (parens) formals (both any . formalTail) formalTail () formalTail (l0 any . formalTail) expr (both (or 'unless 'when) ns any . body-s) expr (both 'export ns formals . exportTailDef) exportTailDef (as) exportTailDef (as as (me ('define . any) expr) . exportTailDef) exportTailDef (as as expr . exportTailOther) exportTailOther (as) exportTailOther (as as (me ('define . any) expr) . exportTailDef) exportTailOther (as expr . exportTailOther) expr (both 'if ns simple ls . (object atom l0 atom)) expr (both 'if ns simple . simpleIf) expr (both 'if ns complex . complexIf) simpleIf (ls (me simple expr) us (me simple expr)) simpleIf (as expr us expr) complexIf (as (me atom expr) ls (me atom expr)) complexIf (as expr us expr) expr (both 'cond ms . (object . condTail)) expr (both 'case ns any as . (object . condTail)) condTail () condTail (condClause) condTail (condClause a0 . condTail) condClause (both (me simple expr) l0 (me simple expr) ) condClause (both expr a0 expr) condClause (both (me simple expr) l0 '=> ns (me simple expr) ) condClause (both expr a0 '=> ns expr) condClause (both expr . condClauseTail) condClauseTail () condClauseTail (a0 expr . condClauseTail) expr (both 'let ns begin-obj atom m0 (both . bindings) end-obj . body-s) expr (both (or 'let 'let* 'letrec 'with) ns (both . bindings) . body-s) bindings () bindings (binding) bindings (binding a0 . bindings) binding (both expr ms expr) expr (both 'begin ms . (object . body-0)) body-0 () body-0 (expr) body-0 (a0 expr . body-0) expr (both 'do ns (both . bindings) as condClause . body-s) expr (both 'extend-syntax ns formals . extendTail) extendTail () extendTail (as extension . extendTail) extension (both expr . body-s) expr (both simple ms . args) expr (both complex a0 . args) args () args (object expr) args (object (me simple expr) . args-s) args (object (me complex expr) . args-c) args-s (l0 expr) args-s (l0 (me simple expr) . args-s) args-s (a0 (me complex expr) . args-c) args-c (a0 expr) args-c (a0 (me simple expr) . args-s) args-c (a0 (me complex expr) . args-c) 3B. The Pretty-Printing Meta-Rules A rule is written in one of the following two ways: contextName pattern depth contextName pattern Rules that indicate a depth apply only at the given depth; rules that don't indicate a depth apply at all depths. The almost-StructuredStreams-input is produced from the context, depth, and subject by applying the rules in order, until one of them matches. Each rule describes both the condition under which it matches and the almost-StructuredStreams-input it produces in a mostly integrated way. The following meta-rules describe how a pattern prescribes the matching test and the almost-StructuredStreams-input to emit, for a non-comment subject. Note that the canonical treatment of improper lists is built into these meta-rules. atom => match iff neither pair nor vector; emit via print simple => match iff simple; emit by recursion complex => match iff not simple; emit by recursion any => match always; emit by recursion contextName => match always; emit by recursion into contextName at current depth (enter contextName i) => match always; emit by recursion into contextName at depth i (inc contextName) => match always; emit by recursion into contextName at current depth +1 (dec contextName) => match always; emit by recursion into contextName at current depth -1 (me pat em) => match like pat; emit like em (quote aSymbol) => match the symbol aSymbol; emit directly (parens x) => match iff x would; emit left paren, whatever x emits, right paren (object x) => match x; emit obj begin, x, obj end (both x) => match x; emit left paren, obj begin, x, obj end, right paren (cadrOnly x) => match a list whose second element matches x; emit second element according to x (commentless x) => match a list that both matches x and has no comments attached to any element; emit according to x (or . list) => match if any element of list would; emit according to matched element (begin-obj . x) => match x; emit obj begin, x (end-obj . x) => match x; emit obj end, x (cw . x) => match x; emit a c/w bp, x; c in {n, m, l, u, a}, w in {0, s} ("aString" . x) => match x; emit "aString", x (x . y) => match a pair whose car matches x and hacked cdr matches y; emit x, (if hacked then buffer ". "), y where a hacked z is just z if that's a pair, otherwise it's the list (z) () => match the empty list; emit nothing If none of rules match, the following default rules apply: context atom context (parens ()) context (any . list) list () list (ls any . list) When the subject is a Commented form, the prefixed linebreaks and comments are emitted, the rules are applied to the non-comment part, and then the postfixed linebreaks and comments are emitted. 4B. The Pretty-Printing Rules <> expr (parens (any)) <> <> expr (both ((or 'define 'set!) ns atom ls simple)) expr (both ((or 'define 'set!) ns any . body-s)) body-s () body-s (as expr . body-s) <> expr (both ('lambda ns formals ls simple)) expr (both ('lambda ns formals . body-s)) formals (parens ()) formals (both (any . formalTail)) formalTail () formalTail (l0 (any . formalTail)) <> expr (both ((or 'unless 'when) ns any . body-s)) <> expr (both ('export ns formals . exportTailOther)) exportTailDef (as) exportTailDef (as as (me ('define . any) expr) . exportTailDef) exportTailDef (as as expr . exportTailOther) exportTailOther (as) exportTailOther (as as (me ('define . any) expr) . exportTailDef) exportTailOther (as expr . exportTailOther) <> expr (both ('if ns simple ls begin-obj atom l0 atom end-obj)) expr (both ('if ns simple . simpleIf)) expr (both ('if ns complex . complexIf)) simpleIf (ls (me simple expr) us (me simple expr)) simpleIf (as expr us expr) complexIf (as (me atom expr) ls (me atom expr)) complexIf (as expr us expr) <> expr (both ('cond ms . condTail)) expr (both ('case ns any as . condTail)) condTail (object condSeries) condSeries () condSeries (condClause) condSeries (condClause a0 . condSeries) condClause (both ((me simple expr) l0 (me simple expr)) ) condClause (both (expr a0 expr)) condClause (both ((me simple expr) l0 '=> ns (me simple expr)) ) condClause (both (expr a0 '=> ns expr)) condClause (both (expr . condClauseTail)) condClauseTail () condClauseTail (a0 expr . condClauseTail) <> expr (both ('let ns begin-obj atom m0 (both bindings) end-obj . body-s)) expr (both ((or 'let 'let* 'letrec 'with) ns (both bindings) . body-s)) bindings () bindings (binding) bindings (binding a0 . bindings) binding (both (atom ms expr)) binding (both (expr ls expr)) <> expr (both ('begin ms . beginTail)) beginTail (object body-0) body-0 () body-0 (expr) body-0 (a0 expr . body-0) <> expr (both ('do ns (both bindings) as condClause . body-s)) <> expr (both ('extend-syntax ns formals . extendTail)) extendTail () extendTail (as extension . extendTail) extension (both (expr . body-s)) <> <<(quote ;a comment here!>> <> <> expr (me (commentless ('quote any)) (cadrOnly ("'" data))) expr ('quote ns data) data (me (commentless ('quote any)) (cadrOnly ("'" any))) data (me (commentless ('quasiquote any)) (cadrOnly ("`" any))) data (me (commentless ('unquote any)) (cadrOnly ("," any))) data (me (commentless ('unquote-splicing any)) (cadrOnly (",@" any))) <> expr (me (commentless ('quasiquote any)) (cadrOnly ("`" (enter templ 1)))) expr ('quasiquote ns (enter templ 1)) 0 templ expr <> templ (me (commentless ('quote any)) (cadrOnly ("'" templ))) templ (me (commentless ('quasiquote any)) (cadrOnly ("`" (inc templ)))) templ (me (commentless ('unquote any)) (cadrOnly ("," (dec templ)))) templ (me (commentless ('unquote-splicing any)) (cadrOnly (",@" (dec templ)))) <> templ ('unquote ns (dec templ)) templ ('unquote-splicing ns (dec templ)) templ ('quasiquote ns (inc templ)) <> expr (both (atom ms . args)) expr (both (simple ls . args)) expr (both (complex a0 . args)) args () args (expr) args (object ((me simple expr) . args-s)) args (object ((me complex expr) . args-c)) args-s (l0 expr) args-s (l0 (me simple expr) . args-s) args-s (a0 (me complex expr) . args-c) args-c (a0 expr) args-c (a0 (me simple expr) . args-s) args-c (a0 (me complex expr) . args-c)