DIRECTORY Rope USING [ROPE]; ACFind: CEDAR DEFINITIONS ~ BEGIN ROPE: TYPE ~ Rope.ROPE; Ref: TYPE ~ REF Rep; ActionProc: TYPE ~ PROC[position: INT, keyFound: ROPE] RETURNS [quit: BOOL _ FALSE]; Create: PROC [keys: LIST OF ROPE, caseSensitive: BOOL _ FALSE] RETURNS [Ref]; Find: PROC [self: Ref, text: ROPE, action: ActionProc] RETURNS [foundAny: BOOLEAN _ FALSE]; MoveRep: TYPE ~ RECORD[SEQUENCE l: NAT OF State]; State: TYPE ~ REF StateRep _ NIL; StateRep: TYPE ~ RECORD [ keys: LIST OF ROPE _ NIL, -- what keys have been matched, if any? failure: State _ NIL, -- the next state, by default offset: CHAR _ '\000, -- offset to first char of the move sequence move: REF MoveRep _ NIL -- maps characters into States ]; Rep: TYPE ~ RECORD [ initialState: State _ NIL, caseSensitive: BOOLEAN _ FALSE ]; END. ACFind.mesa Copyright c 1985 by Xerox Corporation. All rights reserved. Dave Rumph, June 18, 1986 10:11:32 am PDT The Aho-Corasick pattern matching algorithm, as described in "Efficient String Matching: An Aid to Bibliographic Search," Communications of the ACM 18:6 (June 1975). It is very effective in searching repeatedly for a finite number of fixed patterns containing no wildcards. The efficiency of the algorithm is based on building a Finite State Automaton (FSA), which is then run over the input data. The FSA can be either deterministic or non-deterministic; the deterministic version, implemented here, requires more overhead time to setup, but then runs faster. Building and Using FSA's Procedure to be called for each key matched. Position is at the end of keyFound. Subtract Rope.Size[keyFound] (a fast operation) to get the beginning. Build the FSA, creating a new instance if required. Step through the text, calling the ActionProc for each match. Data Structures All the global stuff for one instance of a search Κύ˜codešœ ™ Kšœ Οmœ1™