-- ScriptTokenizer.mesa
-- Last Edited by Mitchell, February 11, 1983 6:15 pm
DIRECTORY
  Bindings,
  IO,
  RefText,
  ScriptScan;

ScriptTokenizer: CEDAR PROGRAM
  IMPORTS Bindings, RefText
  EXPORTS ScriptScan =
  BEGIN
c: REF TEXT;
InitScan: PUBLIC PROC [stream: IO.STREAM, bindTbl: Bindings.BTHandle, z: ZONE] =
  BEGIN
  c ← RefText.New[256];
  END;
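InitScan allocates a 256-character buffer, and the plan recorded in GetToken is to read the input a page at a time. The Python sketch below shows page-at-a-time reading. It assumes a page is the same 256 characters as that buffer, which the source does not state. `pages` and `PAGE_SIZE` are hypothetical names.

```python
import io
from typing import Iterator, TextIO

# Assumption: a "page" is 256 characters, the size of the buffer InitScan allocates.
PAGE_SIZE = 256


def pages(stream: TextIO, page_size: int = PAGE_SIZE) -> Iterator[str]:
    """Yield the stream's contents a page at a time; the last page may be short."""
    while True:
        page = stream.read(page_size)
        if not page:  # empty read means end of stream
            return
        yield page


if __name__ == "__main__":
    for page in pages(io.StringIO("x" * 600)):
        print(len(page))  # prints 256, then 256, then 88
```

Reading fixed-size pages keeps the per-character work inside a loop over an in-memory string rather than one stream call per character.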
CharClass: TYPE ~ {lcAlpha, ucAlpha, digit, separator, blank};

classMap: ARRAY CHAR OF CharClass ← [
  NUL .. ControlZ: blank,
  'A .. 'Z: ucAlpha,
  'a .. 'z: lcAlpha,
  '0 .. '9: digit,
  '<, '>, '!, ... : separator
  ];
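-- Sample input line, with the class of each character written beneath it
-- (s = separator, b = blank, u = ucAlpha, l = lcAlpha, d = digit):
-- { PARA$ justified←T font.size← 3.7E2 { font | face ← <TimesRoman> } <now ... ty.>}
-- sbuuuusblllllllllsubllllsllllsbdsdudbsbllllbsbllllbsbsullllullllsbsbsllllllllllsss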
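The character table and the annotated sample line can be checked with a small Python model. This is a sketch, not the module's code, and it rests on three assumptions:

- every character the table does not list (the `...` in the separator row) is a separator;
- the space is blank, as the sample shows;
- characters outside the 8-bit range, such as a Unicode `←`, are separators.

`CharClass`, `CLASS_MAP`, `classify_page`, and `render` are hypothetical names. Following the plan recorded in GetToken, the classes go into a parallel byte array, one byte per character.

```python
from enum import IntEnum


class CharClass(IntEnum):
    # Same order as the Mesa enumeration {lcAlpha, ucAlpha, digit, separator, blank}.
    LC_ALPHA = 0
    UC_ALPHA = 1
    DIGIT = 2
    SEPARATOR = 3
    BLANK = 4


# One letter per class, as in the sample annotation.
LETTER = {
    CharClass.LC_ALPHA: "l",
    CharClass.UC_ALPHA: "u",
    CharClass.DIGIT: "d",
    CharClass.SEPARATOR: "s",
    CharClass.BLANK: "b",
}


def build_class_map() -> list:
    """Return a 256-entry table from character code to CharClass."""
    # Assumption: anything the table does not list is a separator.
    table = [CharClass.SEPARATOR] * 256
    for code in range(0, 27):  # NUL .. ControlZ (codes 0 .. 26)
        table[code] = CharClass.BLANK
    table[ord(" ")] = CharClass.BLANK  # assumption: space is blank, as in the sample
    for first, last, cls in (("A", "Z", CharClass.UC_ALPHA),
                             ("a", "z", CharClass.LC_ALPHA),
                             ("0", "9", CharClass.DIGIT)):
        for code in range(ord(first), ord(last) + 1):
            table[code] = cls
    return table


CLASS_MAP = build_class_map()


def classify_page(page: str) -> bytearray:
    """Map each character of a page to its class, stored one byte per character."""
    classes = bytearray(len(page))
    for i, ch in enumerate(page):
        code = ord(ch)
        # Assumption: characters beyond the 8-bit table (e.g. Unicode '←') are separators.
        classes[i] = CLASS_MAP[code] if code < 256 else CharClass.SEPARATOR
    return classes


def render(classes: bytearray) -> str:
    """Spell a class array with the sample's one-letter codes."""
    return "".join(LETTER[CharClass(c)] for c in classes)
```

Run on the part of the sample before the `...` elision, `render(classify_page(...))` should reproduce the annotated class string, with the R of TimesRoman classed as ucAlpha (`u`).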
GetToken: PUBLIC PROC RETURNS [t: Token] =
  BEGIN
  -- Get a page at a time and map each character into its CharClass in a second, packed array.
  -- Keep a ring buffer of Tokens in a separate array (to save allocation costs).
  -- Skip over string constants as fast as possible, e.g.:
  --   '< => {
  --     UNTIL nextchar[stream] = '> DO NULL ENDLOOP;
  --     };
  END;
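The plan for GetToken keeps Tokens in a ring buffer instead of allocating one per call. The Python sketch below illustrates that idea. `Token` here is a hypothetical record, since the real Token type is not shown in this module, and `TokenRing` is likewise a hypothetical name. The slots are allocated once and overwritten in rotation, so a returned token stays valid only until the ring wraps around to its slot.

```python
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical fields; the real Token type is declared elsewhere.
    kind: str = ""
    start: int = 0
    length: int = 0


class TokenRing:
    """A fixed set of Token records reused in rotation instead of allocating new ones."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("ring size must be positive")
        self.slots = [Token() for _ in range(size)]  # allocated once, up front
        self.next = 0

    def emit(self, kind: str, start: int, length: int) -> Token:
        """Overwrite the next slot in place and return it."""
        tok = self.slots[self.next]
        tok.kind, tok.start, tok.length = kind, start, length
        self.next = (self.next + 1) % len(self.slots)
        return tok
```

The ring size bounds how many recent tokens a caller can hold at once. A caller that needs a token for longer must copy it before the ring comes back around.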
END.
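The string-constant fragment in GetToken reads one character at a time until it sees `>`. When a whole page is already buffered, the same skip can instead be a single search of the page. The Python sketch below makes four assumptions:

- constants are delimited by `<` and `>`;
- constants do not nest;
- there is no escape for `>` inside a constant;
- each constant ends on the same page.

`skip_string_constant` is a hypothetical name.

```python
def skip_string_constant(page: str, pos: int) -> int:
    """Given page[pos] == '<', return the index just past the closing '>'."""
    if page[pos] != "<":
        raise ValueError("not at the start of a string constant")
    # In CPython, str.find scans in native code, the analogue of skipping "as fast as possible".
    end = page.find(">", pos + 1)
    if end < 0:
        # Assumption: a real tokenizer would fetch the next page here instead of failing.
        raise ValueError("unterminated string constant")
    return end + 1
```

For example, with `page = "face ← <TimesRoman> }"` and `start = page.index("<")`, the slice `page[start:skip_string_constant(page, start)]` is `"<TimesRoman>"`.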
-- Change Log
-- Created by Mitchell, February 7, 1983 6:13 pm