<< JunoParseUnparse.mesa (ex Lexer + Parser + ParseTable + OldUnparser)>> << >> <> <> <> <> <> <> <<>> << The Parser/Unparser susbsystem of Juno is concerned with the translation between the external Juno language and the internal "symbolic Expression" (i.e., parse tree) representation.>> << This interface declares the data structures and procedures used by the Juno Parser/Unparser subsystem. The syntax of the external language is determined mainly by:>> << (1) a lexical scanner table, that describes how to group characters into lexemes. It includes a character type table, a table of lexeme aliases, and a list of two-chracter operators. >> << (2) a parse/unparse table, that gives the binding type and power of each lexical item, and some formatting hints for its unparsing.>> << (3) a static syntax checker, a procedure that validates the resulting parse tree.>> << These three items (and therefore the exact syntax of the external language) must be supplied by the client. This interfaces merely gives the structure and tools for building those objects.>> DIRECTORY Rope USING [ROPE, Length, Substr]; JunoParseUnparse: CEDAR DEFINITIONS IMPORTS Rope = BEGIN OPEN Rope; << - - - - SYMBOLIC EXPRESSIONS >> Se: TYPE = REF ANY; -- Symbolic Expression. Either: << ATOM (variable, operator, or function name) REF REAL, REF INT ROPE LIST OF Se of the form (operator arg) or (operator arg arg)>> << - - - - FAKE ROPE STREAMS>> Stream: TYPE = REF StreamRep; StreamRep: TYPE = RECORD [pos: INT, rope: ROPE, len: INT]; << A Stream simulates an IO.STREAM, but is cheaper and easier to set up, read, and backspace.>> StreamFromRope: PROC [rope: ROPE] RETURNS [stream: Stream] = INLINE {RETURN [NEW[StreamRep _ [pos: 0, rope: rope, len: rope.Length[]]]]}; Contents: PROC [stream: Stream] RETURNS [ROPE] = INLINE {OPEN stream; RETURN [Substr[base: rope, start: pos, len: len-pos]]}; << - - - - SYNTAX TABLES FOR LEXING/PARSING/UNPARSING>> Table: TYPE = REF TableRep; TableRep: TYPE = RECORD [-- lexer stuff maxAtomLength: INTEGER, -- max length of identifiers. Default is 200 maxStringLength: INTEGER, -- max length of strings. Default is 200 chType: ARRAY CHAR OF CharType, -- described below twoCharOps: LIST OF TwoCharOp, -- list of two-character lexemes -- parser/unparser stuff defaultAtomProps: REF Props -- default properties for atoms (identifiers). ]; CharType: TYPE = {letter, digit, blank, quote, op, invalid, other}; << Identifiers must start with letter and continue with letters or digits; blanks are ignored; quotes delimit strings; ops form one- or two-character operators; invalids are invalid in any context; others are valid only inside strings.>> TwoCharOp: TYPE = RECORD [chars: RECORD [c1, c2: CHAR], atom: ATOM]; -- atom = $c1c2 Props: TYPE = RECORD -- parse/unparse properties of lexemes [closer: ATOM _ NIL, infix: BOOL _ FALSE, prefix: BOOL _ FALSE, postfix: BOOL _ FALSE, matchfix: BOOL _ FALSE, subfix: BOOL _ FALSE, busfix: BOOL _ FALSE, closefix: BOOL _ FALSE, bindingPower: BindingPower _ 0, identifier: BOOL _ FALSE, unparseType: UnparseType _ zero]; BindingPower: TYPE = INTEGER [-1..100]; -- operator binding power (-1 is reserved) NewTable: PROC RETURNS [table: Table]; << Creates a new syntax table for lexing/parsing/unparsing.>> << All of the table's fields are initialized to default values. Maximum length of atoms and strings is set to 200 each.The table.chType is initialized as follows: >> << letters to letter; digits to digit; double quotes to quote; CR, SP, LF, FF, NUL, and TAB to blank; +-*/|\()[]{}<>=~@#$%&^_:;,.!? to op; and all others to other.>> SetCharType: PROC [table: Table, from, to: CHAR, chType: CharType]; << Sets char type for all characters in [from..to]>> EnterTwoCharOp: PROC [table: Table, c1, c2: CHAR]; << Makes c1c2 into a double character lexeme; e.g. -> and //. Both chars must be of type op.>> EnterAlias: PROC [table: Table, alias, standard: ATOM]; << Defines standard as the preferred form of alias.>> << Both may be any atoms, including one- or two-character operators. The lexer will automatically replace every scanned atom by its standard form, if specified, before passing it to the caller. This is to allow typing english names for operators that have no keys, such as "forall" or "and".>> EnterAtomProps: PROC [table: Table, atom: ATOM, props: Props]; << Defines parse/unparse properties of an atomic lexeme (identifier or operator).>> << The atom should be its own preferred form (i.e., should not be an alias)>> GetAtomProps: PROC [atom: ATOM, table: Table] RETURNS [name: ATOM, rp: REF Props]; << Maps aliases into standard names and gets parse/unparse props from table.>> << Will return table.defaultAtomProps if not defined.>> << - - - - LEXER>> Lexeme: TYPE = REF; -- Lexical item (REF INT, REF REAL, ATOM, or ROPE); Exhausted: PROC [stream: Stream, table: Table] RETURNS [BOOL]; << Skips blank characters from the given stream, returns TRUE if gets to end of it.>> Lex: PROC [stream: Stream, table: Table] RETURNS [lex: Lexeme, error: ROPE, rp: REF Props]; << Parses the longest prefix of the stream that is a valid lexeme.>> << If the parse is successful, Lex returns error = NIL, lex= lexeme parsed, rp = its properties. If lex is an atom without special properties, will set its properties to table.defaultAtomProps and return them. Aliases will be automatically mapped to their preferred forms. rp = NIL for REF INTs, REF REALs, and ROPEs.>> << If the stream is exhausted, or no prefix of it is a valid lexeme, Lex returns lex = rp = NIL and error = some non-empty message. Note that you will get an error if the strream begins with a blank. >> << In any case, the characters that were not used in lex are left in the stream. The part the was lexed to lex can be determined by comparing stream.pos before and after the call. >> << To use Lex, define and setup a table table, create a stream from the desired rope, then execute WHILE NOT Exhausted[stream, table] DO [lex, error] _ Lex[stream, table]; IF error # NIL THEN EXIT; IF THEN EXIT; ENDLOOP. IF error # NIL THEN {-- lexer or parser error-- ...} >> << - - - - PARSER>> Parse: PROC [stream: Stream, table: Table] RETURNS [expr: Se, error: ROPE, openCount: INTEGER]; << Parses a maximal-length expresion from the given rope.>> << If the parsing succeeds, the symbolic expression thus obtained is returned in expr, error is NIL, and openCount is 0. >> << If the parser fails, i.e. no prefix of the given rope is a valid expression, error is set to a non-null message, and expr is the (usually incomple) expression parsed so far. openCount will be the number of openfix operators (parentheses) whose matching closefixes were not parsed. >> << In any case, the part of the stream that was not parsed to expr is left in Contents[stream].>> << You will get an error if the stream is empty or contains only blanks.>> << - - - - UNPARSER>> UnparseType: TYPE = {zero, infixA, infixB, matchA, infixC, prefix, matchB, infixD, infixE}; <> Unparse: PROC [f: LIST OF Se, table: Table, culprit: Se _ NIL, openCount: INTEGER_0, leftMargin: INTEGER _ 0, rightMargin: INTEGER _ 69] RETURNS [ROPE]; << Unparses a Car[expr], and produces a rope.>> << - - - - SYNTAX CHECKING>> SyntacticPredicate: TYPE = PROC [f: Se] RETURNS [r: VerdictAndCulprit]; << Checks if Car[f] is of a specified form. If so, returns [yes, NIL]. If not, returns [no,f] or [ofCourseNot, f] >> Verdict: TYPE = {yes, no, ofCourseNot}; VerdictAndCulprit: TYPE = RECORD [verdict: Verdict, culprit: REF ANY _ NIL]; << Culprit relevant only if verdict is no or ofCourseNot, and is the sub-expression whose Car is wrong.>> Or: PROC [v1, v2, v3, v4, v5: VerdictAndCulprit _ [ofCourseNot]] RETURNS [r: VerdictAndCulprit]; << Caution: will compute all arguments before testing them.>> False: SyntacticPredicate; -- always returns [ofCourseNot, f] HasForm: PROC [f: Se, op: REF, Arg1: SyntacticPredicate, Arg2: SyntacticPredicate _ NIL] RETURNS [r: VerdictAndCulprit]; << Tests if Car[f] is a list (op, a1, a2) where Arg1[a1] and Arg2[a2] yield [yes]. The op may be an atom or a list of atoms (meaning any of them).>> Is: PROC [f: Se, op: REF] RETURNS [r: VerdictAndCulprit]; << Tests if Car[f] is the atom op. The op may be a list of atoms (meaning any of them).>> << - - - - CAR, CDR, ETC>> Car: PROC [r: Se] RETURNS [Se]; Cdr: PROC [r: Se] RETURNS [Se]; Cadr: PROC [r: Se] RETURNS [Se]; Caddr: PROC [r: Se] RETURNS [Se]; Cddr: PROC [r: Se] RETURNS [Se]; Caadr: PROC [r: Se] RETURNS [Se]; Cdar: PROC [r: Se] RETURNS [Se]; Cddar: PROC [r: Se] RETURNS [Se]; END. <<>>