-- ScriptTokenizer.mesa
-- Last Edited by Mitchell, February 11, 1983 6:15 pm

DIRECTORY
  Bindings,
  ScriptScan;

ScriptTokenizer: CEDAR PROGRAM
  IMPORTS Bindings
  EXPORTS ScriptScan =
BEGIN

c: REF TEXT;

InitScan: PUBLIC PROC [stream: IO.STREAM, bindTbl: Bindings.BTHandle, z: ZONE] =
  BEGIN
  c _ RefText.New[256];
  END;

classMap: ARRAY CHAR OF CharClass _ [
  NUL .. ControlZ: blank
  'A .. 'Z: ucAlpha
  'a .. 'z: lcAlpha
  '0 .. '9: digit
  '<,'>, '!, ... : separator
  ];

CharClass: TYPE ~ {lcAlpha, ucAlpha, digit, separator, blank};

-- { PARA$ justified_T font.size_ 3.7E2 { font | face _ } }
-- sbuuuusblllllllllsubllllsllllsbdsdudbsbllllbsbllllbsbsulllllllllsbsbsllllllllllsss

GetToken: PUBLIC PROC RETURNS [t: Token] =
  BEGIN
  -- get a page at a time and map each character into its CharClass in a second packed array.
  -- keep a ring buffer of Tokens in a separate array (to save allocation costs)
  -- skip over string constants as fast as possible.
  '< => {UNTIL nextchar[stream]='> DO NULL ENDLOOP;
  END;

END.

-- Change Log
-- Created by Mitchell, February 7, 1983 6:13 pm
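
The character-class table above can be illustrated with a small runnable sketch. It is written in Python rather than Cedar and is not the module's implementation. The names `build_class_map` and `classify` are hypothetical. Two assumptions go beyond the source: the space character is treated as `blank` (the sample class line marks spaces with `b`, although `NUL .. ControlZ` alone does not cover space), and every character not otherwise listed is treated as `separator`, because the source elides the full separator list.

```python
from enum import Enum


class CharClass(Enum):
    """Mirrors the CharClass enumeration; values are the one-letter codes
    used in the sample class line (l, u, d, s, b)."""
    LC_ALPHA = "l"
    UC_ALPHA = "u"
    DIGIT = "d"
    SEPARATOR = "s"
    BLANK = "b"


def build_class_map():
    """Build a 256-entry lookup table indexed by character code."""
    # Assumption: anything not set below is a separator; the source
    # gives only '<, '>, '! and an ellipsis for this class.
    table = [CharClass.SEPARATOR] * 256
    # NUL .. ControlZ (codes 0 through 26) are blank, as in the source.
    for code in range(0, 27):
        table[code] = CharClass.BLANK
    # Assumption: space is blank too, inferred from the sample class line.
    table[ord(" ")] = CharClass.BLANK
    for code in range(ord("A"), ord("Z") + 1):
        table[code] = CharClass.UC_ALPHA
    for code in range(ord("a"), ord("z") + 1):
        table[code] = CharClass.LC_ALPHA
    for code in range(ord("0"), ord("9") + 1):
        table[code] = CharClass.DIGIT
    return table


CLASS_MAP = build_class_map()


def classify(text):
    """Return one class letter per input character."""
    letters = []
    for ch in text:
        code = ord(ch)
        # Characters outside the 8-bit table fall back to separator
        # (an assumption; the source only considers CHAR).
        cls = CLASS_MAP[code] if code < 256 else CharClass.SEPARATOR
        letters.append(cls.value)
    return "".join(letters)
```

Under these assumptions, `classify("PARA$ justified_T")` yields four `u`, then `s`, `b`, nine `l`, `s`, `u`, which agrees with the corresponding stretch of the class line in the source. Precomputing the table keeps per-character work to a single indexed lookup, which is presumably the point of mapping a whole page of characters into a second packed array before tokenizing.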