Spelling Corrector/Pattern Completer Edited by Teitelman on November 3, 1982 11:23 am
DIRECTORY
Generator USING [Handle],
Rope USING [ROPE]
;
Spell: CEDAR DEFINITIONS =
BEGIN
General comments
This interface supports two distinct operations, spelling correction and pattern completion. Spelling correction involves attempting to transform an unrecognized rope, called the unknown, into a known rope, which is a member of some computable set of ropes, according to various heuristics, e.g. removal of doubled characters, transpositions, case errors, etc. For example, correct Compille to Compile (doubled character), UsrExecImpl to UserExecImpl (missing character), ViewerOsp to ViewerOps (transposition). Pattern completion involves transforming the unknown rope into a known candidate or candidates using specified transformations. (Currently the only pattern supported is *, which matches any sequence of characters. In the future, all patterns recognized by the Tioga Edit Tool will be supported.)
There are two ways to call this package. In the first case, the client has in hand an unknown rope and wants to find the single correct rope that it specifies. The unknown rope may be a misspelling of some candidate in the source of candidates (see discussion of sources below), or it may be a pattern which uniquely specifies the desired candidate, e.g. the user might type R*.profile in a context where a file name was required, knowing that this is sufficient to be unambiguous. So that the client need not distinguish these two cases, a single procedure, GetTheOne, is provided. If the unknown is a pattern, GetTheOne calls the pattern matcher and returns the matching candidate if there is exactly one; if there are zero or more than one successful matches, GetTheOne returns NIL. If the unknown is not a pattern, then actual spelling correction will be performed. In either case, if the user's profile has so indicated as described below, the user will be informed of the action, or confirmation will be requested before the action is taken.
The second use of this package occurs when the client has in hand an unknown and wants all candidates that match it, i.e., a list of ropes. Again, the unknown may be a pattern, or simply a misspelling. For example, the user might type to the userexec Delete *.mesa$, or Delete Fooo.mesa. So that the client need not distinguish these two cases, there is a single procedure, GetMatchingList, which handles both. If the unknown is a pattern, GetMatchingList returns a list of all candidates that match the pattern. If the unknown is a misspelling, GetMatchingList returns a list consisting of the single correct spelling if one is found, and NIL otherwise.
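The pattern half of the two calling styles above can be sketched in Python. This is an illustration only, not the Cedar implementation: the names star_matches and the_one are hypothetical, matching is assumed to be case-sensitive, and the ' quoting convention for a literal * is ignored.

```python
import re

def star_matches(pattern, candidates):
    """Return every candidate matched by pattern, where * matches any
    sequence of characters (the only pattern the interface describes)."""
    # Escape everything except *, which becomes ".*".
    regex = re.compile(".*".join(re.escape(p) for p in pattern.split("*")),
                       re.DOTALL)
    return [c for c in candidates if regex.fullmatch(c)]

def the_one(pattern, candidates):
    """Mimic the pattern branch of GetTheOne: a result only when exactly
    one candidate matches, otherwise None (standing in for NIL)."""
    matches = star_matches(pattern, candidates)
    return matches[0] if len(matches) == 1 else None
```

With candidates Rope.profile, User.profile and Rope.mesa, the pattern R*.profile selects a single candidate, while *.profile is ambiguous and yields None.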
sources of candidates for correction/completion
For both GetTheOne and GetMatchingList, the source of candidates can either be a LIST OF ROPE, or a generator which produces ROPEs. Generators can be easily created using GeneratorFromProcs or GeneratorFromEnumerator as described below. In both cases, an optional filter predicate can be specified to select a subset of the candidates for consideration.
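How the two sources and the filter combine can be sketched as follows, in Python rather than Cedar. The function name candidate_stream is hypothetical. The ordering (generator first, then the list) follows the comment on the generator argument of GetTheOne; everything else is an assumption made for illustration.

```python
from itertools import chain

def candidate_stream(unknown, spelling_list=None, generator=None,
                     filter_fn=None):
    """Yield candidates from generator until it runs out, then from
    spelling_list, keeping only those the filter accepts."""
    source = chain(generator or (), spelling_list or ())
    for candidate in source:
        # filter_fn mirrors Filter[candidateRope, unknown]: True means
        # "consider this candidate". No filter means accept everything.
        if filter_fn is None or filter_fn(candidate, unknown):
            yield candidate
```

If both sources are absent the stream is empty, which corresponds to the correction failing immediately.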
Confirmation and messages for spelling correction
Spelling correction is divided into three classes in order of decreasing certainty: (1) corrections involving case errors only, (2) corrections in which all mistakes are accounted for, i.e., the only mistakes are doubled characters or transpositions, e.g., typing compille or compiel for compile (but not compil), (3) corrections in which there are some (albeit very few) actual errors, i.e., extraneous characters or omitted characters. (Pattern completion is considered to fall between classes (1) and (2).) The user can independently control for which classes of corrections confirmation is required, for which classes he is to be informed, and for which classes the spelling corrector is enabled at all, by appropriate settings of the parameters confirm, inform, and disabled (which are packaged into a single record of type Modes, described below). The value of disabled is of the enumerated type {never, someMistakes, allAccountedFor, patternMatch, caseError, always}, and specifies that corrections are disabled for that class and all classes that precede it in the enumerated type. For example, if disabled is someMistakes, then compille will be corrected to compile, but compil won't, because it has a missing character and hence falls into the class someMistakes. If disabled is always, then the corrector is turned off completely. Similarly, the values of inform and confirm specify that informing/confirming is requested for that class and all that precede it in the enumerated type. For example, if inform is allAccountedFor, then the user will be informed of corrections of class allAccountedFor as well as those of class someMistakes. If informing is requested, but no inform procedure was supplied in the corresponding call, then the user is simply not informed.
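To make the distinction between the classes concrete, here is a rough Python sketch that classifies one (unknown, candidate) pair. It is not the corrector's algorithm, which this interface does not specify: the function names are hypothetical, case is folded before checking doubled characters and transpositions (an assumption), and anything that is not a pure case error or fully accounted for is reported as "other" rather than guessing how few errors count as someMistakes.

```python
def _accounted(u, c):
    """True if u can be turned into c using only removal of a doubled
    character and swapping of two adjacent characters."""
    if u == c:
        return True
    if not u or not c:
        return False
    if u[0] == c[0] and _accounted(u[1:], c[1:]):
        return True
    # Doubled character: drop one copy and retry.
    if len(u) >= 2 and u[0] == u[1] and _accounted(u[1:], c):
        return True
    # Transposition: the first two characters are swapped.
    if (len(u) >= 2 and len(c) >= 2 and u[0] == c[1] and u[1] == c[0]
            and _accounted(u[2:], c[2:])):
        return True
    return False

def classify(unknown, candidate):
    if unknown == candidate:
        return "exact"
    if unknown.lower() == candidate.lower():
        return "caseError"
    if _accounted(unknown.lower(), candidate.lower()):
        return "allAccountedFor"
    return "other"
```

Under these assumptions compille and compiel both classify as allAccountedFor against compile, while compil does not, matching the examples in the text. The recursion is exponential in the worst case, which is tolerable only because the ropes involved are short.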
These parameters can be specified by entries in the user profile for Spell.Confirm, Spell.Inform, and Spell.Disabled, or explicitly passed in as arguments to GetTheOne and GetMatchingList. The default settings are disabled = never, confirm = allAccountedFor, inform = patternMatch.
For those cases for which confirmation is requested, a timeout and a default value for confirmation can be specified via the user profile entries Spell.Timeout and Spell.DefaultConfirm respectively. The default setting for timeout is -1, meaning never time out. If confirmation is requested, but no confirming procedure has been supplied in the corresponding call, then the behaviour is the same as though the user rejected the correction, i.e. did not confirm.
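The "that class and all classes that precede it" rule amounts to an ordinal comparison on the enumerated type. The Python sketch below models it; the names applies and actions_for are hypothetical, and only the enumeration order and the default settings are taken from the text.

```python
from enum import IntEnum

class CorrectionClass(IntEnum):
    # Same order as the Cedar enumerated type.
    NEVER = 0
    SOME_MISTAKES = 1
    ALL_ACCOUNTED_FOR = 2
    PATTERN_MATCH = 3
    CASE_ERROR = 4
    ALWAYS = 5

def applies(setting, correction):
    """A setting covers its own class and every class preceding it."""
    return correction <= setting

def actions_for(correction,
                disabled=CorrectionClass.NEVER,
                confirm=CorrectionClass.ALL_ACCOUNTED_FOR,
                inform=CorrectionClass.PATTERN_MATCH):
    """Return (enabled, confirm_requested, inform_requested) for one
    correction class. The keyword defaults are the documented defaults."""
    return (not applies(disabled, correction),
            applies(confirm, correction),
            applies(inform, correction))
```

Because no correction is ever of class never, a setting of never covers nothing, and a setting of always covers every class.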
Types, global parameters
ROPE: TYPE = Rope.ROPE;
SpellingList: TYPE = LIST OF ROPE;
SpellingGenerator: TYPE = REF SpellingGeneratorRecord;
SpellingGeneratorRecord: TYPE = RECORD[clientData: REF ANY, private: REF SpellingGeneratorPrivateRecord];
SpellingGeneratorPrivateRecord: TYPE;
Filter: TYPE = PROCEDURE[candidateRope: ROPE, unknown: ROPE] RETURNS [accept: BOOLEAN]; -- true means consider the candidate
AbortProc: TYPE = PROC;
responsible for checking for an abort condition, and also doing the abort. For example, when called from UserExecUtilities.GetTheOne, this procedure would be IF UserExec.UserAbort[exec] THEN ERROR UserAborted;
ConfirmProc: TYPE = PROC [msg: ROPE, timeout: INT, defaultConfirm: BOOL] RETURNS [yes: BOOL];
output rope to your favorite stream, display in viewer, whatever, then request confirmation. timeout is how long to wait, in milliseconds, before returning default. If timeout = -1, then wait forever, i.e. user must confirm.
InformProc: TYPE = PROC [msg: ROPE];
output to your favorite stream, display in viewer, whatever
CorrectionClass: TYPE = {never, someMistakes, allAccountedFor, patternMatch, caseError, always};
Modes: TYPE = REF READONLY ModesRecord;
ModesRecord: TYPE = RECORD[
inform, confirm, disabled: CorrectionClass,
timeout: INT, -- In msecs. -1 means never time out
defaultConfirm: BOOL -- the default value to be used in case a confirmation times out, i.e. TRUE means the correction is confirmed, FALSE means it is rejected.
];
defaultModes: Modes;
the modes obtained from the user profile
GetTheOne, GetMatchingList
GetTheOne: PROCEDURE[
unknown: ROPE,
spellingList: SpellingList ← NIL,
generator: SpellingGenerator ← NIL, -- if both spellingList and generator are non-NIL, candidates will be taken from generator until it runs out, and then from spellingList. (If both spellingList and generator are NIL, then the correction will fail immediately.)
abort: AbortProc ← NIL, -- NIL => never abort.
confirm: ConfirmProc ← NIL, -- NIL => If confirmation is required, act as though user said No.
inform: InformProc ← NIL, -- NIL => no output.
filter: Filter ← NIL,
modes: Modes ← NIL -- NIL means use values in user profile
]
RETURNS [ROPE];
GetMatchingList: PROCEDURE[
pattern: ROPE,
spellingList: SpellingList ← NIL, -- arguments same interpretation as for GetTheOne
generator: SpellingGenerator ← NIL,
abort: AbortProc ← NIL,
confirm: ConfirmProc ← NIL,
inform: InformProc ← NIL,
filter: Filter ← NIL,
modes: Modes ← NIL
]
RETURNS [LIST OF ROPE];
Note...
the interface UserExecUtilities provides a convenient way of calling GetTheOne and GetMatchingList by specifying either an ExecHandle or a Viewer. These procedures will specify appropriate values for abort, confirm, and inform when calling the corresponding procedures in Spell, e.g. if given an ExecHandle, a Yes No menu will be posted and the user can confirm either via menu or by typing Y or N.
Spelling Generators
GeneratorFromProcs: PROC [
initialize: PROCEDURE[self: SpellingGenerator] ← NIL,
generate: PROCEDURE [self: SpellingGenerator] RETURNS [candidate: REF ANY], -- must narrow to ROPE or REF TEXT.
terminate: PROCEDURE[self: SpellingGenerator] ← NIL, -- to be called when finished.
clientData: REF ANY ← NIL
] RETURNS [SpellingGenerator];
Note...
the value of generate must narrow to ROPE or REF TEXT. If the value narrows to a REF TEXT, a ROPE will be created only for those candidates that the spelling corrector is going to retain across calls to generate. This enables generate to return a scratch ref text for inspection, and not have to allocate a new rope for each candidate it generates.
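The initialize/generate/terminate protocol of GeneratorFromProcs can be pictured with a Python generator. This is a loose analogy, not the Cedar mechanism: the name candidates_from_procs is hypothetical, the dictionary stands in for the SpellingGenerator record, and using None (NIL) from generate to signal exhaustion is an assumption that the interface does not state.

```python
def candidates_from_procs(generate, initialize=None, terminate=None,
                          client_data=None):
    """Drive three client procedures as one stream of candidates."""
    state = {"clientData": client_data}  # stand-in for the record
    if initialize is not None:
        initialize(state)
    try:
        while True:
            candidate = generate(state)
            if candidate is None:        # assumed end-of-candidates signal
                break
            yield candidate
    finally:
        # Runs when the stream is exhausted or abandoned early.
        if terminate is not None:
            terminate(state)
```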
GeneratorFromEnumerator: PROC [
enumerator: PROC[self: Generator.Handle],
clientData: REF ANY
] RETURNS [SpellingGenerator];
This procedure is useful when you have a procedure that enumerates, e.g. EnumerateGlobalFrames, and wish to use it to provide candidates for the spelling corrector. Simply write an enumerator procedure which uses Generator.Produce (see the Generator interface for an example) to produce candidates from inside the enumerator, and pass this procedure in as the enumerator argument.
File correction/completion.
GetTheFile: PROC [
unknown: Rope.ROPE,
defaultExt: Rope.ROPE ← NIL,
abort: AbortProc ← NIL,
confirm: ConfirmProc ← NIL,
inform: InformProc ← NIL,
modes: Modes ← NIL
] RETURNS [correct: Rope.ROPE];
If unknown does not have an extension, defaultExt will be used. If the unknown does not have an extension and defaultExt is NIL, then unknown will only be compared against files with no extension.
If unknown has an extension, and the extension (minus trailing $ if any) is not one of those on extensionList (defined below), first attempt spelling correction on extensionList. If this succeeds, see if the file name is now correct, and if so, return without doing any directory enumeration. Otherwise, attempt correction considering only files with the indicated extension.
If the unknown extension is not on extensionList and cannot be corrected to one that is, attempt correction considering files with any extension, i.e. both the root and the extension may or may not be misspelled. (Note that this differs from the procedure followed if unknown does not contain an extension and defaultExt is specified: in that case it is assumed that defaultExt is correct, and only files with the indicated extension are examined.)
In the event of a successful correction, the informing message will contain an extension only if the unknown contained an extension. However, the value returned, if non-NIL, will always be the full name of the file.
Here are some examples:
GetTheFile["Mumble.mesa"] => look at files with extension mesa for a misspelling of Mumble (because "mesa" is on extensionList)
GetTheFile["Mumble", "mesa"] => same as above, except correction message will say Mumble -> rather than Mumble.mesa ->
GetTheFile["Mumble.msa"] => correct msa to mesa, see if mumble.mesa exists, and if not, proceed as in case 1
GetTheFile["Mumble", "msa"] => only look at files with extension msa
GetTheFile["Mumble.frob"] => examine all files with non-null extensions, because "frob" is not on extensionList.
GetTheFile["Mumblefrob"] => only look at files with no extensions
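The extension rules illustrated by these examples can be summarized as a small decision function. The Python sketch below only decides which files would be examined; it does no correction of the root name and no directory enumeration, and it omits the shortcut of returning immediately when the corrected name already exists. The names plan_file_search and correct_ext are hypothetical, correct_ext stands in for spelling correction against extensionList, and the case-insensitive comparison of extensions is an assumption.

```python
def plan_file_search(unknown, default_ext, extension_list, correct_ext):
    """Return ("only", ext), ("any", None) or ("none", None), naming the
    set of files GetTheFile would consider for this unknown."""
    root, dot, ext = unknown.rpartition(".")
    if not dot:
        # No extension typed: trust default_ext, or look only at
        # files that have no extension at all.
        if default_ext is None:
            return ("none", None)
        return ("only", default_ext)
    if ext.endswith("$"):
        ext = ext[:-1]                  # ignore a trailing $
    ext = ext.lower()
    if ext in extension_list:
        return ("only", ext)
    fixed = correct_ext(ext, extension_list)
    if fixed is not None:
        return ("only", fixed)          # e.g. msa corrected to mesa
    return ("any", None)                # root and extension both suspect
```

Note the asymmetry the text calls out: a misspelled defaultExt such as msa is taken at face value, whereas the same misspelling typed as part of unknown is corrected first.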
GetMatchingFileList: PROC [
unknown: Rope.ROPE,
defaultExt: Rope.ROPE ← NIL,
abort: AbortProc ← NIL,
confirm: ConfirmProc ← NIL,
inform: InformProc ← NIL,
modes: Modes ← NIL
] RETURNS [files: LIST OF ROPE];
file extensions
extensionList: SpellingList;
used in spelling correction of file names as described above under GetTheFile. initialized to (mesa, bcd, cm, config, commands, profile, df, doc, press, style, abbreviations). The user is free to add entries to this list.
AddExtension: PROC [ext: ROPE]; -- adds ext to extensionList.
Miscellaneous
IsAPattern: PROC [unknown: ROPE] RETURNS[BOOLEAN];
currently true if unknown contains * not preceded by a '
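Read literally, the rule for IsAPattern is a single scan for an unquoted *. A Python rendering of that reading follows; the function name is a transliteration, and how a doubled quote such as ''* should behave is not specified, so it is treated here exactly as the rule is written.

```python
def is_a_pattern(unknown):
    """True if unknown contains a * that is not immediately preceded
    by a single quote."""
    previous = ""
    for ch in unknown:
        if ch == "*" and previous != "'":
            return True
        previous = ch
    return False
```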
END.