-- ScriptTokenizer.mesa
-- Last Edited by Mitchell, February 11, 1983 6:15 pm
DIRECTORY
  Bindings,
  IO,
  RefText,
  ScriptScan;
ScriptTokenizer: CEDAR PROGRAM
  IMPORTS Bindings, IO, RefText
  EXPORTS ScriptScan
= BEGIN
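-- Scanner state, set up by InitScan.
source: IO.STREAM ← NIL;  -- stream being tokenized
c: REF TEXT ← NIL;  -- 256-character text buffer, allocated by InitScan

CharClass: TYPE ~ {lcAlpha, ucAlpha, digit, separator, blank};

-- classMap[ch] is the lexical class of ch; filled in by InitScan.
classMap: ARRAY CHAR OF CharClass ← ALL[blank];

InitScan: PUBLIC PROC [stream: IO.STREAM, bindTbl: Bindings.BTHandle, z: ZONE] = BEGIN
  -- bindTbl and z are not used yet.
  source ← stream;
  c ← RefText.New[256];
  FOR ch: CHAR IN CHAR DO
    SELECT ch FROM
      IN ['A..'Z] => classMap[ch] ← ucAlpha;
      IN ['a..'z] => classMap[ch] ← lcAlpha;
      IN ['0..'9] => classMap[ch] ← digit;
      '<, '>, '!, ... => classMap[ch] ← separator;
      ENDCASE => classMap[ch] ← blank;  -- NUL .. ControlZ, and anything not listed above
    ENDLOOP;
  END;

-- Sample input line and, beneath it, the CharClass of each character
-- (s = separator, b = blank, u = ucAlpha, l = lcAlpha, d = digit):
-- { PARA$ justified←T font.size← 3.7E2 { font | face ← <TimesRoman> } <now ... ty.>}
-- sbuuuusblllllllllsubllllsllllsbdsdudbsbllllbsbllllbsbsulllllllllsbsbsllllllllllsss

GetToken: PUBLIC PROC RETURNS [t: Token] = BEGIN
  -- Plan:
  --   Get a page at a time and map each character into its CharClass in a second, packed array.
  --   Keep a ring buffer of Tokens in a separate array (to save allocation costs).
  --   Skip over string constants as fast as possible, as below.
  SELECT IO.GetChar[source] FROM
    '< => UNTIL IO.GetChar[source] = '> DO NULL ENDLOOP;  -- skip the string constant up to its closing '>
    ENDCASE => NULL;
  END;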
END.
Change Log
Created by Mitchell, February 7, 1983 6:13 pm