-- edited by Teitelman April 6, 1983 3:46 pm
DIRECTORY
Atom USING [MakeAtom, MakeAtomFromChar],
CedarScanner USING [CharFromToken, GetProc, GetClosure, Token, GetToken, ContentsFromToken, RealFromToken, IntFromToken, IntegerOverflow, SingleFromToken, RopeFromToken, AtomFromToken],
IO USING[CharProc, GetChar, Backup, EndOf, CharsAvail, Error, EndOfStream, BreakProc, SetIndex, GetIndex, NUL, PeekChar, ROPE, STREAM, CR, ESC, LF, TAB, SP, FF, BS],
List USING [DReverse],
IOExtras USING [FromTokenProc],
RefText USING [ObtainScratch, ReleaseScratch, Append],
Rope USING [Cat, Equal, FromChar, FromRefText, Concat];
InputImpl: CEDAR PROGRAM
IMPORTS Atom, List, IO, Rope, CedarScanner, RefText
EXPORTS IO, IOExtras =
BEGIN OPEN IO;
-- constants
dot: CHARACTER = '.;
comma: CHARACTER = ',;
semicolon: CHARACTER = ';;
colon: CHARACTER = ':;
lpar: CHARACTER = '(;
rpar: CHARACTER = ');
lbracket: CHARACTER = '[;
rbracket: CHARACTER = '];
stringdelim: CHARACTER = '";
minus: CHARACTER = '-;
plus: CHARACTER = '+;
dollar: CHARACTER = '$;
upArrow: CHARACTER = '^;
MINUS: ATOM = Atom.MakeAtomFromChar[minus];
PLUS: ATOM = Atom.MakeAtomFromChar[plus];
DOLLAR: ATOM = Atom.MakeAtomFromChar['$];
UPARROW: ATOM = Atom.MakeAtomFromChar['^];
-- Signals
SyntaxError: PUBLIC ERROR[stream: STREAM, msg: ROPE ← NIL] = CODE;
-- Parsing the input stream as a sequence of characters: GetSequence
GetSequence: PUBLIC PROC [stream: STREAM, charProc: CharProc ← LineAtATime] RETURNS [value: ROPE] = {
text: REF TEXT = RefText.ObtainScratch[100];
{
ENABLE UNWIND => RefText.ReleaseScratch[text];
i: NAT ← 0;
maxLength: NAT = text.maxLength;
char: CHARACTER;
quit, include: BOOLEAN ← FALSE;
CopyOver: PROC = {
rope: ROPE;
IF i = 0 THEN RETURN;
text.length ← i;
rope ← Rope.FromRefText[text];
value ← Rope.Concat[value, rope]; -- if value is NIL, i.e. the first time, Concat simply returns rope, which is just what we want.
i ← 0;
}; -- of CopyOver
DO
IF stream.EndOf[] THEN EXIT;
char ← stream.GetChar[];
[quit, include] ← charProc[char];
IF include THEN {
IF i = maxLength THEN CopyOver[];
text[i] ← char;
i ← i + 1;
}
ELSE IF quit THEN stream.Backup[char];
IF quit THEN EXIT;
ENDLOOP;
CopyOver[];
RefText.ReleaseScratch[text];
};
}; -- of GetSequence
LineAtATime: PUBLIC CharProc = {RETURN[char = CR, char # CR]};
EveryThing: PUBLIC CharProc = {RETURN[FALSE, TRUE]};
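-- Example (a sketch, not part of the original module): a client can pass its own CharProc to GetSequence.
-- DigitsOnly is a hypothetical name. The sketch assumes that CharProc takes char and returns [quit, include], as LineAtATime does above, and that GetSequence is reached through IO.
--   DigitsOnly: IO.CharProc = {isDigit: BOOL = char IN ['0..'9]; RETURN[NOT isDigit, isDigit]};
--   digits: ROPE ← IO.GetSequence[stream, DigitsOnly];
-- On the input 123abc this would return "123". The 'a is excluded and causes quit, so GetSequence backs it up onto the stream.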
GetLine: PUBLIC PROC [stream: STREAM] RETURNS [line: ROPE] = {
-- Reads characters until the next CR and returns everything read up to, but not including, the CR (EOF also acts as a CR). The CR itself is read and discarded.
line ← GetSequence[stream, LineAtATime];
[] ← stream.GetChar[ ! IO.EndOfStream => CONTINUE];
RETURN[line];
};
-- Parsing the input stream as a sequence of Cedar Tokens: GetCedarToken
-- Interfacing to the Mesa scanner
-- The CedarScanner interface wants a closure, and operates by calling the corresponding procedure with an index. So as not to go back and forth with SetIndex and GetIndex (which some streams do not implement), the data field of the closure contains a REF TEXT and a stream, and characters that are read are copied to that buffer.
-- For each call to GetToken, the index passed to CedarScanner is reinitialized to 0, and refers to the corresponding character in the buffer. When CedarScanner asks for a character that has already been read, we simply get it from the buffer. This work is all done in GetClosureProc, the procedure used for all closures.
-- GetCedarScannerToken is the top-level procedure called by GetAtom, GetBool, etc. It takes a procedural argument which is supposed to get the corresponding value out of the token, and executes it under a catch phrase which catches SyntaxError. It also takes care of releasing the scratch text. GetCedarScannerToken1 checks for errors such as tokenEOF and tokenERROR, and backs the stream up when reading went too far.
CedarScannerToken: TYPE = CedarScanner.Token;
GetClosure: TYPE = CedarScanner.GetClosure;
FromTokenProc: TYPE = IOExtras.FromTokenProc;
GetClosureData: TYPE = RECORD[stream: STREAM, text: REF TEXT];
CreateClosure: PROC [stream: STREAM] RETURNS [closure: GetClosure, closureData: REF GetClosureData] = INLINE {
closureData ← NEW[GetClosureData ← [stream: stream, text: RefText.ObtainScratch[16]]];
closure ← GetClosure[GetClosureProc, closureData];
};
GetCedarScannerToken: PUBLIC PROC [stream: STREAM, fromTokenProc: FromTokenProc] = {
closure: GetClosure;
closureData: REF GetClosureData;
token: CedarScannerToken;
syntaxError: ROPE;
{ENABLE UNWIND => RefText.ReleaseScratch[closureData.text];
[closure, closureData] ← CreateClosure[stream];
token ← GetCedarScannerToken1[stream, closure, closureData];
IF fromTokenProc # NIL THEN fromTokenProc[closure, token ! SyntaxError =>
{syntaxError ← Rope.Cat[CedarScanner.ContentsFromToken[closure, token], " ", msg];
CONTINUE;
}];
RefText.ReleaseScratch[closureData.text];
};
IF syntaxError # NIL THEN ERROR SyntaxError[stream, syntaxError];
};
-- Takes a closure and its closureData, calls CedarScanner, backs up the stream if necessary, and checks for some error conditions. This is a separate procedure because it is also called from GetRefAny, so that the same closure can be reused.
GetCedarScannerToken1: PROC [stream: STREAM, closure: GetClosure, closureData: REF GetClosureData] RETURNS [token: CedarScannerToken] = {
text: REF TEXT;
tooFar: INT;
closureData.text.length ← 0;
token ← CedarScanner.GetToken[closure, 0];
text ← closureData.text; -- may have been replaced by a larger buffer if the original one overflowed.
SELECT token.kind FROM
tokenEOF => ERROR IO.EndOfStream[stream];
tokenERROR => ERROR SyntaxError[stream, token.msg];
tokenCOMMENT => RETURN[GetCedarScannerToken1[stream, closure, closureData]];
ENDCASE;
IF (tooFar ← text.length - token.next) # 0 THEN {
-- the scanner may have read past the end of the token
IF tooFar < 0 THEN ERROR;
IF tooFar = 1 THEN stream.Backup[text[text.length - 1]]
ELSE stream.SetIndex[stream.GetIndex[] - tooFar];
};
};
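-- Worked example (a sketch; it assumes the scanner needs one character of lookahead to end an identifier, and that token.next is the buffer index just past the token):
-- scanning the input  foo bar  for its first token, GetClosureProc is asked for indices 0 through 3 and buffers f, o, o and the space, so text.length = 4.
-- The scanner returns the token for foo with token.next = 3, hence tooFar = 4 - 3 = 1 and the space is handed back with stream.Backup.
-- Had the scanner looked ahead by two or more characters, tooFar would exceed 1 and the stream would be repositioned with SetIndex instead.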
GetClosureProc: CedarScanner.GetProc -- [data: REF, index: INT] RETURNS [CHAR] -- = {
closureData: REF GetClosureData ← NARROW[data];
char: CHAR;
text: REF TEXT ← closureData.text;
IF index < text.length THEN RETURN[text[index]]; -- backed up
char ← closureData.stream.GetChar[! IO.EndOfStream => GOTO EOF];
IF index # text.length THEN ERROR;
-- This check is made here, rather than before the preceding statement, because the scanner wants EOF indicated by two NULs, so we get called again with index one beyond text.length. We do not simply add the NUL to text, because Backup in GetCedarScannerToken1 would then try to back up over the NUL, which is wrong. We really need a single character that corresponds to EOF.
IF text.length >= text.maxLength THEN {
newText: REF TEXT = RefText.ObtainScratch[2 * text.maxLength];
newText.length ← 0;
RefText.Append[to: newText, from: text];
RefText.ReleaseScratch[text];
closureData.text ← text ← newText;
};
text[text.length] ← char;
text.length ← text.length + 1;
RETURN[IF char = '& THEN 'A ELSE char]; -- hand & to the scanner as a letter, so that it is treated like any other identifier character while parsing.
EXITS
EOF => RETURN[NUL];
};
GetCedarToken: PUBLIC PROC [stream: STREAM] RETURNS [value: ROPE] = {
fromTokenProc: FromTokenProc = {
IF token.kind = tokenCOMMENT THEN value ← GetCedarToken[stream]
ELSE value ← CedarScanner.ContentsFromToken[closure, token]};
GetCedarScannerToken[stream, fromTokenProc];
};
-- Basic input routines for each value type: GetAtom, GetInt, GetCard, GetBool, GetReal, GetRope
GetAtom: PUBLIC PROC [stream: STREAM] RETURNS [value: ATOM] = {
fromTokenProc: FromTokenProc = {
SELECT token.kind FROM
tokenATOM => value ← CedarScanner.AtomFromToken[closure, token];
ENDCASE => ERROR SyntaxError[stream, "is not an atom"];
};
GetCedarScannerToken[stream, fromTokenProc];
};
GetId: PUBLIC PROC [stream: STREAM] RETURNS [value: ROPE] = {
fromTokenProc: FromTokenProc = {
SELECT token.kind FROM
tokenID => value ← CedarScanner.ContentsFromToken[closure, token];
ENDCASE => ERROR SyntaxError[stream, "does not describe an identifier"];
};
GetCedarScannerToken[stream, fromTokenProc];
};
GetBool: PUBLIC PROC [stream: STREAM] RETURNS [value: BOOLEAN] = {
fromTokenProc: FromTokenProc = {
r: ROPE;
SELECT token.kind FROM
tokenID => r ← CedarScanner.ContentsFromToken[closure, token];
ENDCASE => ERROR SyntaxError[stream, "does not describe a BOOL"];
IF Rope.Equal[s1: r, s2: "TRUE", case: FALSE] THEN value ← TRUE
ELSE IF Rope.Equal[s1: r, s2: "FALSE", case: FALSE] THEN value ← FALSE
ELSE ERROR SyntaxError[stream, "does not describe a BOOL"];
};
GetCedarScannerToken[stream, fromTokenProc];
}; -- GetBool
GetCard: PUBLIC PROC [stream: STREAM] RETURNS [value: LONG CARDINAL] = {
fromTokenProc: FromTokenProc = {
SELECT token.kind FROM
tokenINT => value ← CedarScanner.IntFromToken[closure, token ! CedarScanner.IntegerOverflow => ERROR SyntaxError[stream, " - integer overflow"]];
ENDCASE => ERROR SyntaxError[stream, "does not describe a CARD"]; -- ought to be able to catch WrongKind errors and convert them.
};
GetCedarScannerToken[stream, fromTokenProc];
};
GetInt: PUBLIC PROC [stream: STREAM] RETURNS [value: INT] = {
fromTokenProc: FromTokenProc = {
SELECT token.kind FROM
tokenINT => value ← CedarScanner.IntFromToken[closure, token ! CedarScanner.IntegerOverflow => ERROR SyntaxError[stream, " - integer overflow"]];
tokenSINGLE => {
c: CHAR = CedarScanner.SingleFromToken[closure, token];
SELECT c FROM
'- => value ← (-GetInt[stream]);
'+ => value ← GetInt[stream];
ENDCASE => GOTO Error;
};
ENDCASE => GOTO Error;
EXITS
Error => ERROR SyntaxError[stream, "does not describe an INT"];
};
GetCedarScannerToken[stream, fromTokenProc];
};
GetReal: PUBLIC PROC [stream: STREAM] RETURNS [value: REAL] = {
fromTokenProc: FromTokenProc = {
SELECT token.kind FROM
tokenREAL => value ← CedarScanner.RealFromToken[closure, token];
tokenINT => value ← CedarScanner.IntFromToken[closure, token];
tokenSINGLE => {
c: CHAR = CedarScanner.SingleFromToken[closure, token];
SELECT c FROM
'- => value ← (-GetReal[stream]);
'+ => value ← GetReal[stream];
ENDCASE => GOTO Error;
};
ENDCASE => GOTO Error;
EXITS
Error => ERROR SyntaxError[stream, "does not describe a REAL"];
};
GetCedarScannerToken[stream, fromTokenProc];
};
GetRope: PUBLIC PROC [stream: STREAM] RETURNS [value: ROPE] = {
fromTokenProc: FromTokenProc = {
SELECT token.kind FROM
tokenROPE => value ← CedarScanner.RopeFromToken[closure, token];
ENDCASE => ERROR SyntaxError[stream, "is not a ROPE"]; -- ought to be able to catch WrongKind errors and convert them.
};
GetCedarScannerToken[stream, fromTokenProc];
};
-- Reading Ref Anys: GetRefAny, GetRefAnyLine
-- GetRefAnyLine calls GetRefAny repeatedly, constructing a list of the values returned, until an object is immediately followed by a CR. E.g. if the user types FOO FIE FUM{cr}, it returns ($FOO, $FIE, $FUM). If the user types FOO FIE FUM{sp} {cr}, it continues reading on the next line. Useful for line-oriented command interpreters. (Note that a CR typed inside a list has the same effect as a space.)
GetRefAnyLine: PUBLIC PROCEDURE [stream: STREAM] RETURNS [list: LIST OF REF ANY ← NIL] = {
temp: REF ANY;
list ← LIST[GetRefAny[stream]];
UNTIL NOT IO.CharsAvail[stream] DO
IF IO.PeekChar[stream] = IO.CR THEN {[] ← stream.GetChar[]; EXIT};
temp ← GetRefAny[stream: stream];
list ← CONS[temp, list];
ENDLOOP;
RETURN[List.DReverse[list]];
}; -- of GetRefAnyLine
GetRefAny: PUBLIC PROCEDURE [stream: STREAM] RETURNS [object: REF ANY] = {
closure: GetClosure;
closureData: REF GetClosureData;
RightParen: SIGNAL = CODE;
Comma: SIGNAL = CODE;
GetRefAny0: PROCEDURE [stream: STREAM] RETURNS [object: REF ANY] = {
token: CedarScanner.Token;
UNTIL IO.EndOf[stream] DO
token ← GetCedarScannerToken1[stream, closure, closureData];
SELECT token.kind FROM
tokenID => {
r: ROPE = CedarScanner.ContentsFromToken[closure, token];
RETURN[IF Rope.Equal[r, "NIL"] THEN NIL
ELSE IF Rope.Equal[r, "TRUE"] THEN NEW[BOOL ← TRUE]
ELSE IF Rope.Equal[r, "FALSE"] THEN NEW[BOOL ← FALSE]
ELSE Atom.MakeAtom[r]];
};
tokenINT => RETURN[NEW[INT ← CedarScanner.IntFromToken[closure, token]]];
tokenREAL => RETURN[NEW[REAL ← CedarScanner.RealFromToken[closure, token]]];
tokenROPE => RETURN[CedarScanner.RopeFromToken[closure, token]];
tokenCHAR => RETURN[NEW[CHAR ← CedarScanner.CharFromToken[closure, token]]];
tokenATOM => RETURN[CedarScanner.AtomFromToken[closure, token]];
tokenSINGLE => {
c: CHAR = CedarScanner.SingleFromToken[closure, token];
SELECT c FROM
'( => {
lst, tail: LIST OF REF ANY;
obj: REF ANY;
UNTIL IO.EndOf[stream] DO
obj ← GetRefAny0[stream ! RightParen => EXIT; Comma => LOOP];
IF tail # NIL THEN {tail.rest ← LIST[obj]; tail ← tail.rest}
ELSE {tail ← LIST[obj]; lst ← tail};
ENDLOOP;
RETURN[lst];
};
') => SIGNAL RightParen;
'^ => NULL; -- e.g. ^3, makes print and read be inverses.
', => SIGNAL Comma;
'-, '+ => {
obj: REF ANY = GetRefAny0[stream];
WITH obj SELECT FROM
x: REF INT => IF c = '- THEN x^ ← -x^;
x: REF REAL => IF c = '- THEN x^ ← -x^;
ENDCASE => ERROR SyntaxError[stream, Rope.Concat["Illegal character: ", Rope.FromChar[c]]];
RETURN[obj];
};
ENDCASE => ERROR SyntaxError[stream, Rope.Concat["Illegal character: ", Rope.FromChar[c]]];
};
tokenDOUBLE => ERROR SyntaxError[stream, Rope.Concat["Illegal input: ", CedarScanner.ContentsFromToken[closure, token]]];
tokenCOMMENT => ERROR SyntaxError[stream, Rope.Concat["Illegal input: ", CedarScanner.ContentsFromToken[closure, token]]];
ENDCASE => ERROR; -- tokenEOF and tokenERROR are handled in GetCedarScannerToken1, which raises IO.EndOfStream and SyntaxError respectively.
ENDLOOP;
}; -- of GetRefAny0
[closure, closureData] ← CreateClosure[stream];
object ← GetRefAny0[stream !
RightParen => ERROR SyntaxError[stream, "unmatched right paren"];
Comma => ERROR SyntaxError[stream, "Illegal character: ,"]
];
RefText.ReleaseScratch[closureData.text];
}; -- of GetRefAny
-- Parsing the input stream as a sequence of arbitrary tokens: GetToken
GetToken: PUBLIC PROC [stream: STREAM, breakProc: BreakProc ← TokenProc] RETURNS [ROPE] = {
anySeen: BOOL ← FALSE;
charProc: CharProc = {
SELECT breakProc[char] FROM
break => {include ← NOT anySeen; quit ← TRUE};
sepr => {include ← FALSE; quit ← anySeen};
other => {include ← TRUE; quit ← FALSE; anySeen ← TRUE};
ENDCASE => ERROR;
};
RETURN[GetSequence[stream, charProc]];
};
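-- Example (a sketch; s is a hypothetical stream whose remaining contents are  foo(bar, 12)  ):
-- successive calls of GetToken[s], using the default TokenProc, return "foo", "(", "bar", "12" and ")".
-- A break character ends the token in progress and is backed up, and is then returned as a one-character token by the next call. Separators are skipped before a token and backed up after one.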
SkipOver: PUBLIC PROC [stream: STREAM, skipWhile: BreakProc] = {
char: CHARACTER;
IF skipWhile = NIL THEN RETURN;
DO
IF stream.EndOf[] THEN RETURN;
SELECT skipWhile[(char ← stream.GetChar[])] FROM
other, break => EXIT;
ENDCASE;
ENDLOOP;
stream.Backup[char];
}; -- of SkipOver
WhiteSpace: PUBLIC BreakProc = {
RETURN[SELECT char FROM
IO.SP, IO.CR, IO.LF, IO.TAB => sepr,
ENDCASE => other];
};
IDProc: PUBLIC BreakProc = {
RETURN[SELECT char FROM
SP, CR, ESC, LF, TAB, ',, ':, '; => sepr,
ENDCASE => other];
};
TokenProc: PUBLIC BreakProc = {
RETURN[SELECT char FROM
'[, '], '(, '), '{, '}, '", '+, '-, '*, '/, '@, '← => break,
SP, CR, ESC, LF, TAB, ',, ':, '; => sepr,
ENDCASE => other];
};
-- miscellaneous
PeekChar: PUBLIC PROC [self: STREAM] RETURNS [char: CHARACTER] = {
char ← self.GetChar[]; self.Backup[char]
};
BackSlashChar: PUBLIC PROC [char: CHARACTER, stream: STREAM ← NIL] RETURNS [CHARACTER] = {
-- Interprets \ conventions, e.g. maps \n to CR, \t to TAB, etc. Raises SyntaxError if \ is not followed by an acceptable character. If char is in ['0..'7], then stream must be supplied, or else SyntaxError is raised. If stream is supplied, two more characters are read; if both are octal digits, the corresponding character is returned, otherwise SyntaxError is raised.
SELECT char FROM
'n, 'N, 'r, 'R => RETURN[CR];
't, 'T => RETURN[TAB];
'b, 'B => RETURN[BS];
'l, 'L => RETURN[LF];
'f, 'F => RETURN[FF];
'\\, '\', '\", ESC => RETURN[char];
IN ['0..'7] =>
IF stream # NIL THEN {
d2, d3: CHARACTER;
d2 ← stream.GetChar[];
d3 ← stream.GetChar[];
IF d2 IN ['0..'7] AND d3 IN ['0..'7] THEN RETURN[0C + ((char - '0) * 64) + ((d2 - '0) * 8) + (d3 - '0)];
};
ENDCASE;
Error [SyntaxError, stream];
};
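-- Example (a sketch; s is a hypothetical stream positioned just after the first digit of an octal escape):
--   BackSlashChar['n] returns CR, and BackSlashChar['t] returns TAB.
--   BackSlashChar['1, s], with the characters 01 next on s, reads both and returns the character with code 1*64 + 0*8 + 1 = 65, i.e. 'A.
--   BackSlashChar['1] with no stream, or BackSlashChar['q], falls through to the error at the end of the procedure.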
END.
Change Log
8-Mar-82 23:26:47 W.T. Added " to characters recognized by TokenProc
16-Apr-82 17:51:49 W.T. Added +, -, *, @, ← to characters recognized by TokenProc
April 23, 1982 12:10 am W.T. took out hack for readrefany reading foo.fie. made it just return an atom
June 10, 1982 3:01 pm W.T. fixed bug in GetRope, GetRefText: StopAndPutBackChar was not being checked.
June 21, 1982 10:32 pm W.T. fixed bug in ReadStringDelim. If more than 256 characters appeared between "", then caused a boundsfault.
Edited on December 14, 1982 10:50 pm, by Teitelman
defined GetMesaToken
changes to: DIRECTORY , IMPORTS , GetMesaToken , getProcType (local of GetMesaToken)
Edited on January 28, 1983 5:49 pm, by Teitelman
release scratch reftext on unwind, rather than before error
changes to: GetCedarScannerToken1
Edited on March 25, 1983 3:55 pm, by Teitelman
changes to: fromTokenProc (local of GetReal) fixed bug so that GetReal when encountering an integer would not cause an error.
Edited on March 29, 1983 5:54 pm, by Teitelman
changed GetRefAny to raise an error if it saw a comment. This is consistent with behaviour of GetInt, GetReal, etc. Client should use filtercomments stream if wants to throw away comments.
changes to: GetRefAny0 (local of GetRefAny)
Edited on April 6, 1983 3:46 pm, by Teitelman
catch CedarScanner.IntegerOverflow and convert it to SyntaxError
changes to: DIRECTORY, fromTokenProc (local of GetCard), fromTokenProc (local of GetInt)