JunoParseUnparse.mesa (ex Lexer + Parser + ParseTable + OldUnparser)
Adapted from Parser.mesa, last Edited by: GNelson, January 10, 1983 5:50 pm
Merged with some of Lexer.mesa, coded by by GNelson (?), September 9, 1982 4:13 pm
Merged with OldUnparser.mesa, last Edited by: GNelson, October 11, 1983 9:23 pm
Merged with ParseTable.mesa, last Edited by: GNelson, August 24, 1983 10:30 pm
Merged with OldParseWindow.mesa, last Edited by: GNelson, October 11, 1983 9:29 pm
Last edited by by Stolfi, June 13, 1984 12:25:54 pm PDT
The Parser/Unparser susbsystem of Juno is concerned with the translation between the external Juno language and the internal "symbolic Expression" (i.e., parse tree) representation.
This interface declares the data structures and procedures used by the Juno Parser/Unparser subsystem. The syntax of the external language is determined mainly by:
(1) a lexical scanner table, that describes how to group characters into lexemes. It includes a character type table, a table of lexeme aliases, and a list of two-chracter operators.
(2) a parse/unparse table, that gives the binding type and power of each lexical item, and some formatting hints for its unparsing.
(3) a static syntax checker, a procedure that validates the resulting parse tree.
These three items (and therefore the exact syntax of the external language) must be supplied by the client. This interfaces merely gives the structure and tools for building those objects.
DIRECTORY
Rope USING [ROPE, Length, Substr];
- - - - FAKE ROPE STREAMS
Stream: TYPE = REF StreamRep;
StreamRep:
TYPE =
RECORD [pos:
INT, rope:
ROPE, len:
INT];
A Stream simulates an IO.STREAM, but is cheaper and easier to set up, read, and backspace.
StreamFromRope:
PROC [rope: ROPE]
RETURNS [stream: Stream] =
INLINE
{RETURN [NEW[StreamRep ← [pos: 0, rope: rope, len: rope.Length[]]]]};
Contents:
PROC [stream: Stream]
RETURNS [ROPE] =
INLINE
{OPEN stream; RETURN [Substr[base: rope, start: pos, len: len-pos]]};
- - - - SYNTAX TABLES FOR LEXING/PARSING/UNPARSING
Table: TYPE = REF TableRep;
TableRep:
TYPE =
RECORD
[-- lexer stuff
maxAtomLength: INTEGER, -- max length of identifiers. Default is 200
maxStringLength: INTEGER, -- max length of strings. Default is 200
chType: ARRAY CHAR OF CharType, -- described below
twoCharOps: LIST OF TwoCharOp, -- list of two-character lexemes
-- parser/unparser stuff
defaultAtomProps: REF Props -- default properties for atoms (identifiers).
];
CharType:
TYPE = {letter, digit, blank, quote, op, invalid, other};
Identifiers must start with letter and continue with letters or digits;
blanks are ignored; quotes delimit strings; ops form one- or two-character
operators; invalids are invalid in any context; others are valid only inside strings.
TwoCharOp: TYPE = RECORD [chars: RECORD [c1, c2: CHAR], atom: ATOM]; -- atom = $c1c2
Props:
TYPE =
RECORD
-- parse/unparse properties of lexemes
[closer: ATOM ← NIL,
infix: BOOL ← FALSE,
prefix: BOOL ← FALSE,
postfix: BOOL ← FALSE,
matchfix: BOOL ← FALSE,
subfix: BOOL ← FALSE,
busfix: BOOL ← FALSE,
closefix: BOOL ← FALSE,
bindingPower: BindingPower ← 0,
identifier: BOOL ← FALSE,
unparseType: UnparseType ← zero];
BindingPower: TYPE = INTEGER [-1..100]; -- operator binding power (-1 is reserved)
NewTable:
PROC
RETURNS [table: Table];
Creates a new syntax table for lexing/parsing/unparsing.
All of the table's fields are initialized to default values. Maximum length of atoms and strings is set to 200 each.The table.chType is initialized as follows:
letters to letter; digits to digit; double quotes to quote;
CR, SP, LF, FF, NUL, and TAB to blank;
+-*/|\()[]{}<>=~@#$%&^←:;,.!? to op;
and all others to other.
SetCharType:
PROC [table: Table, from, to:
CHAR, chType: CharType];
Sets char type for all characters in [from..to]
EnterTwoCharOp:
PROC [table: Table, c1, c2:
CHAR];
Makes c1c2 into a double character lexeme; e.g. -> and //. Both chars must be of type op.
EnterAlias:
PROC [table: Table, alias, standard:
ATOM];
Defines standard as the preferred form of alias.
Both may be any atoms, including one- or two-character operators. The lexer will automatically replace every scanned atom by its standard form, if specified, before passing it to the caller. This is to allow typing english names for operators that have no keys, such as "forall" or "and".
EnterAtomProps:
PROC [table: Table, atom:
ATOM, props: Props];
Defines parse/unparse properties of an atomic lexeme (identifier or operator).
The atom should be its own preferred form (i.e., should not be an alias)
GetAtomProps:
PROC [atom:
ATOM, table: Table]
RETURNS [name:
ATOM, rp:
REF Props];
Maps aliases into standard names and gets parse/unparse props from table.
Will return table.defaultAtomProps if not defined.
- - - - LEXER
Lexeme: TYPE = REF; -- Lexical item (REF INT, REF REAL, ATOM, or ROPE);
Exhausted:
PROC [stream: Stream, table: Table]
RETURNS [
BOOL];
Skips blank characters from the given stream, returns TRUE if gets to end of it.
Lex:
PROC [stream: Stream, table: Table]
RETURNS [lex: Lexeme, error: ROPE, rp:
REF Props];
Parses the longest prefix of the stream that is a valid lexeme.
If the parse is successful, Lex returns error = NIL, lex= lexeme parsed, rp = its properties. If lex is an atom without special properties, will set its properties to table.defaultAtomProps and return them. Aliases will be automatically mapped to their preferred forms. rp = NIL for REF INTs, REF REALs, and ROPEs.
If the stream is exhausted, or no prefix of it is a valid lexeme, Lex returns lex = rp = NIL and error = some non-empty message. Note that you will get an error if the strream begins with a blank.
In any case, the characters that were not used in lex are left in the stream. The part the was lexed to lex can be determined by comparing stream.pos before and after the call.
To use Lex, define and setup a table table, create a stream from the desired rope, then execute
WHILE NOT Exhausted[stream, table] DO
[lex, error] ← Lex[stream, table];
IF error # NIL THEN EXIT;
<use lex>
IF <lex was not good> THEN EXIT;
ENDLOOP.
IF error # NIL THEN {-- lexer or parser error-- ...}
- - - - UNPARSER
UnparseType:
TYPE = {zero, infixA, infixB, matchA, infixC, prefix, matchB, infixD, infixE};
Explained (?) in unparser module
Unparse:
PROC
[f:
LIST
OF Se,
table: Table,
culprit: Se ←
NIL,
openCount:
INTEGER𡤀,
leftMargin:
INTEGER ← 0, rightMargin:
INTEGER ← 69]
RETURNS [
ROPE];
Unparses a Car[expr], and produces a rope.
- - - - SYNTAX CHECKING
SyntacticPredicate:
TYPE =
PROC [f: Se]
RETURNS [r: VerdictAndCulprit];
Checks if Car[f] is of a specified form.
If so, returns [yes, NIL].
If not, returns [no,f] or [ofCourseNot, f]
Verdict: TYPE = {yes, no, ofCourseNot};
VerdictAndCulprit:
TYPE =
RECORD [verdict: Verdict, culprit:
REF
ANY ←
NIL];
Culprit relevant only if verdict is no or ofCourseNot, and is the sub-expression whose Car is wrong.
Or:
PROC
[v1, v2, v3, v4, v5: VerdictAndCulprit ← [ofCourseNot]]
RETURNS [r: VerdictAndCulprit];
Caution: will compute all arguments before testing them.
False: SyntacticPredicate; -- always returns [ofCourseNot, f]
HasForm:
PROC [f: Se, op:
REF, Arg1: SyntacticPredicate, Arg2: SyntacticPredicate ←
NIL]
RETURNS [r: VerdictAndCulprit];
Tests if Car[f] is a list (op, a1, a2) where Arg1[a1] and Arg2[a2] yield [yes].
The op may be an atom or a list of atoms (meaning any of them).
Is:
PROC [f: Se, op:
REF]
RETURNS [r: VerdictAndCulprit];
Tests if Car[f] is the atom op.
The op may be a list of atoms (meaning any of them).
- - - - CAR, CDR, ETC
Car: PROC [r: Se] RETURNS [Se];
Cdr: PROC [r: Se] RETURNS [Se];
Cadr: PROC [r: Se] RETURNS [Se];
Caddr: PROC [r: Se] RETURNS [Se];
Cddr: PROC [r: Se] RETURNS [Se];
Caadr: PROC [r: Se] RETURNS [Se];
Cdar: PROC [r: Se] RETURNS [Se];
Cddar: PROC [r: Se] RETURNS [Se];