-- JunoLexer.mesa (was Lexer.mesa)
-- Coded September 9, 1982 4:13 pm
-- Last edited by Stolfi March 13, 1984 3:26:34 am PST

-- A simple lexer for alphanumeric identifiers, unsigned integers, unsigned reals
-- without exponent parts (no trailing E), single- and double-character operators,
-- and quoted strings with no escapes or other special characters embedded in them.
-- Strings may not span two lines. Lex returns each lexeme as a REF INT, REF REAL,
-- ATOM, or ROPE.

-- End-of-file in mid-lexeme is an error: the last char in the stream must be of
-- type blank. If you can't guarantee this any other way, you can do so with
-- IO.AppendStreams.

DIRECTORY Atom, Rope, IO;

JunoLexer: DEFINITIONS

= BEGIN

Handle: TYPE = REF HandleRec;

HandleRec: TYPE = RECORD
[error: Rope.ROPE, -- initially NIL; set to an error message on a lexical error
eof: BOOL, -- initially FALSE; set to TRUE at end of stream
a: REF ANY, -- the last lexeme returned by Lex
buf: REF TEXT, -- the characters that lexed to a
in: IO.STREAM, -- source of the characters to be broken into lexemes
type: ARRAY CHAR OF CharType, -- type of each character; see CharType below
opList: LIST OF OpRec]; -- list of two-character lexemes

NewHandle: PROC RETURNS [h: Handle];
-- Initializes all of h's fields except h.in;
-- h.type is initialized by DefaultCharTypes (below).
-- To read lexemes, do h ← NewHandle[]; h.in ← <some input source>; then
-- call Lex repeatedly.

CharType: TYPE = {letter, digit, blank, quote, op};

AddOpPair: PUBLIC PROC [h: Handle, c1, c2: CHAR];
-- Makes c1c2 a two-character lexeme, e.g., -> or ..
-- Both chars must be of type op.
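
-- Example (a sketch, not part of the interface): a client that wants := and <=
-- each read as one lexeme, rather than as two single-character operators, could
-- call the following after h ← NewHandle[]. This assumes the four characters
-- involved still have type op in h.type, as DefaultCharTypes leaves them.
--   JunoLexer.AddOpPair[h, ':, '=];
--   JunoLexer.AddOpPair[h, '<, '=];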

Lex: PUBLIC PROC[h: Handle];
-- Sets h.a to the next lexeme from stream h.in,
-- OR sets h.eof to TRUE and h.a to NIL,
-- OR sets h.error to a non-NIL message and h.a to NIL.
-- In the last case (h.error # NIL), the characters removed
-- from h.in are put back, so you can reread them.
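
-- Example (a sketch, not part of the interface): a client loop that lexes a whole
-- ROPE and discriminates each lexeme by its type. It assumes the client imports
-- IO and Rope, and that IO.RIS (a rope input stream) and Rope.Concat exist in your
-- Cedar release; text is a hypothetical ROPE. The appended blank keeps end of
-- stream from falling in mid-lexeme. Replace each NULL with code using n^, x^, at, or s.
--   h: JunoLexer.Handle ← JunoLexer.NewHandle[];
--   h.in ← IO.RIS[Rope.Concat[text, " "]];
--   DO
--     JunoLexer.Lex[h];
--     IF h.eof OR h.error # NIL THEN EXIT;
--     WITH h.a SELECT FROM
--       n: REF INT => NULL;
--       x: REF REAL => NULL;
--       at: ATOM => NULL;
--       s: Rope.ROPE => NULL;
--       ENDCASE => NULL;
--     ENDLOOP;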

OpRec: TYPE = RECORD [opname: RECORD [CHAR, CHAR], op: ATOM];
-- An element of h.opList: a two-character operator and the ATOM that stands for it.

-- The default char types in h.type are set by the following procedure:
DefaultCharTypes: PUBLIC PROC[h: Handle];
-- Sets h.type as follows: letters to letter; digits to digit; '" to quote;
-- CR, SP, LF, FF, NUL, and TAB to blank; and all others to op.
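
-- Example (a sketch, not part of the interface): since h.type is an ordinary field,
-- a client can adjust the defaults after NewHandle, assuming Lex consults h.type on
-- every call. For instance, to let $ appear inside identifiers:
--   h ← JunoLexer.NewHandle[];
--   h.type['$] ← letter;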

END.