JunoParseUnparse.mesa (ex Lexer + Parser + ParseTable + OldUnparser)

Adapted from Parser.mesa, last Edited by: GNelson, January 10, 1983 5:50 pm
Merged with some of Lexer.mesa, coded by by GNelson (?), September 9, 1982 4:13 pm
Merged with OldUnparser.mesa, last Edited by: GNelson, October 11, 1983 9:23 pm
Merged with ParseTable.mesa, last Edited by: GNelson, August 24, 1983 10:30 pm
Merged with OldParseWindow.mesa, last Edited by: GNelson, October 11, 1983 9:29 pm
Last edited by by Stolfi, June 13, 1984 12:25:54 pm PDT

The Parser/Unparser susbsystem of Juno is concerned with the translation between the external Juno language and the internal "symbolic Expression" (i.e., parse tree) representation.

This interface declares the data structures and procedures used by the Juno Parser/Unparser subsystem. The syntax of the external language is determined mainly by:

(1) a lexical scanner table, that describes how to group characters into lexemes. It includes a character type table, a table of lexeme aliases, and a list of two-chracter operators.

(2) a parse/unparse table, that gives the binding type and power of each lexical item, and some formatting hints for its unparsing.

(3) a static syntax checker, a procedure that validates the resulting parse tree.

These three items (and therefore the exact syntax of the external language) must be supplied by the client. This interfaces merely gives the structure and tools for building those objects.

DIRECTORY

Rope USING [ROPE, Length, Substr];

JunoParseUnparse: CEDAR DEFINITIONS

IMPORTS Rope =

BEGIN
OPEN Rope;

- - - - SYMBOLIC EXPRESSIONS

Se: TYPE = REF ANY; -- Symbolic Expression. Either:

ATOM (variable, operator, or function name)
REF REAL, REF INT
ROPE
LIST OF Se of the form (operator arg) or (operator arg arg)

- - - - FAKE ROPE STREAMS

Stream: TYPE = REF StreamRep;

StreamRep: TYPE = RECORD [pos: INT, rope: ROPE, len: INT];

A Stream simulates an IO.STREAM, but is cheaper and easier to set up, read, and backspace.

StreamFromRope: PROC [rope: ROPE] RETURNS [stream: Stream] = INLINE

{RETURN [NEW[StreamRep ← [pos: 0, rope: rope, len: rope.Length[]]]]};

Contents: PROC [stream: Stream] RETURNS [ROPE] = INLINE

{OPEN stream; RETURN [Substr[base: rope, start: pos, len: len-pos]]};

- - - - SYNTAX TABLES FOR LEXING/PARSING/UNPARSING

Table: TYPE = REF TableRep;

TableRep: TYPE = RECORD

[-- lexer stuff
maxAtomLength: INTEGER, -- max length of identifiers. Default is 200
maxStringLength: INTEGER, -- max length of strings. Default is 200
chType: ARRAY CHAR OF CharType, -- described below
twoCharOps: LIST OF TwoCharOp, -- list of two-character lexemes

-- parser/unparser stuff
defaultAtomProps: REF Props -- default properties for atoms (identifiers).
];

CharType: TYPE = {letter, digit, blank, quote, op, invalid, other};

Identifiers must start with letter and continue with letters or digits;
blanks are ignored; quotes delimit strings; ops form one- or two-character
operators; invalids are invalid in any context; others are valid only inside strings.

TwoCharOp: TYPE = RECORD [chars: RECORD [c1, c2: CHAR], atom: ATOM]; -- atom = $c1c2

Props: TYPE = RECORD -- parse/unparse properties of lexemes

[closer: ATOMNIL,
infix: BOOLFALSE,
prefix: BOOLFALSE,
postfix: BOOLFALSE,
matchfix: BOOLFALSE,
subfix: BOOLFALSE,
busfix: BOOLFALSE,
closefix: BOOLFALSE,
bindingPower: BindingPower ← 0,
identifier: BOOLFALSE,
unparseType: UnparseType ← zero];

BindingPower: TYPE = INTEGER [-1..100]; -- operator binding power (-1 is reserved)

NewTable: PROC RETURNS [table: Table];

Creates a new syntax table for lexing/parsing/unparsing.

All of the
table's fields are initialized to default values. Maximum length of atoms and strings is set to 200 each.The table.chType is initialized as follows:

letters to letter; digits to digit; double quotes to quote;
CR, SP, LF, FF, NUL, and TAB to blank;
+-*/|\()[]{}<>=~@#$%&^←:;,.!? to op;
and all others to other.

SetCharType: PROC [table: Table, from, to: CHAR, chType: CharType];

Sets char type for all characters in [from..to]

EnterTwoCharOp: PROC [table: Table, c1, c2: CHAR];

Makes c1c2 into a double character lexeme; e.g. -> and //. Both chars must be of type op.

EnterAlias: PROC [table: Table, alias, standard: ATOM];

Defines standard as the preferred form of alias.

Both may be any atoms, including one- or two-character operators. The lexer will automatically replace every scanned atom by its standard form, if specified, before passing it to the caller. This is to allow typing english names for operators that have no keys, such as "forall" or "and".

EnterAtomProps: PROC [table: Table, atom: ATOM, props: Props];

Defines parse/unparse properties of an atomic lexeme (identifier or operator).

The atom should be its own preferred form (i.e., should not be an alias)

GetAtomProps: PROC [atom: ATOM, table: Table] RETURNS [name: ATOM, rp: REF Props];

Maps aliases into standard names and gets parse/unparse props from table.

Will return table.defaultAtomProps if not defined.

- - - - LEXER

Lexeme: TYPE = REF; -- Lexical item (REF INT, REF REAL, ATOM, or ROPE);

Exhausted: PROC [stream: Stream, table: Table] RETURNS [BOOL];

Skips blank characters from the given stream, returns TRUE if gets to end of it.

Lex: PROC [stream: Stream, table: Table] RETURNS [lex: Lexeme, error: ROPE, rp: REF Props];

Parses the longest prefix of the stream that is a valid lexeme.

If the parse is successful, Lex returns error = NIL, lex= lexeme parsed, rp = its properties. If lex is an atom without special properties, will set its properties to table.defaultAtomProps and return them. Aliases will be automatically mapped to their preferred forms. rp = NIL for REF INTs, REF REALs, and ROPEs.

If the stream is exhausted, or no prefix of it is a valid lexeme, Lex returns lex = rp = NIL and error = some non-empty message. Note that you will get an error if the strream begins with a blank.

In any case, the characters that were not used in lex are left in the stream. The part the was lexed to lex can be determined by comparing stream.pos before and after the call.

To use Lex, define and setup a table table, create a stream from the desired rope, then execute

WHILE NOT Exhausted[stream, table] DO
[lex, error] ← Lex[stream, table];
IF error # NIL THEN EXIT;
<use lex>
IF <lex was not good> THEN EXIT;
ENDLOOP.
IF error # NIL THEN {-- lexer or parser error-- ...}

- - - - PARSER

Parse: PROC [stream: Stream, table: Table]
RETURNS [expr: Se, error: ROPE, openCount: INTEGER];

Parses a maximal-length expresion from the given rope.

If the parsing succeeds, the symbolic expression thus obtained is returned in expr, error is NIL, and openCount is 0.

If the parser fails, i.e. no prefix of the given rope is a valid expression, error is set to a non-null message, and expr is the (usually incomple) expression parsed so far. openCount will be the number of openfix operators (parentheses) whose matching closefixes were not parsed.

In any case, the part of the stream that was not parsed to expr is left in Contents[stream].

You will get an error if the stream is empty or contains only blanks.

- - - - UNPARSER

UnparseType: TYPE = {zero, infixA, infixB, matchA, infixC, prefix, matchB, infixD, infixE};
Explained (?) in unparser module

Unparse: PROC
[f: LIST OF Se,
table: Table,
culprit: Se ← NIL,
openCount: INTEGER𡤀,
leftMargin: INTEGER ← 0, rightMargin: INTEGER ← 69]
RETURNS [ROPE];

Unparses a Car[expr], and produces a rope.

- - - - SYNTAX CHECKING

SyntacticPredicate: TYPE = PROC [f: Se] RETURNS [r: VerdictAndCulprit];

Checks if Car[f] is of a specified form.
If so, returns [yes, NIL].
If not, returns [no,f] or [ofCourseNot, f]

Verdict: TYPE = {yes, no, ofCourseNot};

VerdictAndCulprit: TYPE = RECORD [verdict: Verdict, culprit: REF ANYNIL];

Culprit relevant only if verdict is no or ofCourseNot, and is the sub-expression whose Car is wrong.

Or: PROC
[v1, v2, v3, v4, v5: VerdictAndCulprit ← [ofCourseNot]] RETURNS [r: VerdictAndCulprit];

Caution: will compute all arguments before testing them.

False: SyntacticPredicate; -- always returns [ofCourseNot, f]

HasForm: PROC [f: Se, op: REF, Arg1: SyntacticPredicate, Arg2: SyntacticPredicate ← NIL]
RETURNS [r: VerdictAndCulprit];

Tests if Car[f] is a list (op, a1, a2) where Arg1[a1] and Arg2[a2] yield [yes].
The op may be an atom or a list of atoms (meaning any of them).

Is: PROC [f: Se, op: REF] RETURNS [r: VerdictAndCulprit];

Tests if Car[f] is the atom op.
The op may be a list of atoms (meaning any of them).

- - - - CAR, CDR, ETC

Car: PROC [r: Se] RETURNS [Se];

Cdr: PROC [r: Se] RETURNS [Se];

Cadr: PROC [r: Se] RETURNS [Se];

Caddr: PROC [r: Se] RETURNS [Se];

Cddr: PROC [r: Se] RETURNS [Se];

Caadr: PROC [r: Se] RETURNS [Se];

Cdar: PROC [r: Se] RETURNS [Se];

Cddar: PROC [r: Se] RETURNS [Se];

END.