SoundexImpl.mesa
Copyright © 1985 by Xerox Corporation. All rights reserved.
Doug Terry, November 26, 1985 4:39:52 pm PST
DIRECTORY
Ascii,
RefText,
Rope,
Soundex;
SoundexImpl: CEDAR PROGRAM
IMPORTS Ascii, RefText, Rope
EXPORTS Soundex
~ BEGIN
OPEN Soundex;
ROPE: TYPE ~ Rope.ROPE;
The Soundex algorithm presented in Knuth's "The Art of Computer Programming", Vol. 3, page 392 is as follows:
1. Retain the first letter of the name, and drop all occurrences of a, e, h, i, o, u, w, y in other positions.
2. Assign the following numbers to the remaining letters after the first:
b, f, p, v --> 1
c, g, j, k, q, s, x, z --> 2
d, t --> 3
l --> 4
m, n --> 5
r --> 6
3. If two or more letters with the same code were adjacent in the original name (before step 1), omit all but the first.
4. Convert to the form "letter, digit, digit, digit" by adding trailing zeros (if there are less than three digits), or by dropping rightmost digits (if there are more than three).
codeSize: CARDINAL = 8; -- the above algorithm assumes a codeSize of 4
NameToCode: PUBLIC PROC [name: ROPE] RETURNS [SoundexCode] ~ {
LetterToDigit: Rope.TranslatorType = {
[old: CHAR] RETURNS [new: CHAR]
new ← SELECT Ascii.Lower[old] FROM
'a, 'e, 'h, 'i, 'o, 'u, 'w, 'y => '0,
'b, 'f, 'p, 'v => '1,
'c, 'g, 'j, 'k, 'q, 's, 'x, 'z => '2,
'd, 't => '3,
'l => '4,
'm, 'n => '5,
'r => '6,
ENDCASE => '7;
};
codename: ROPE;
scode: REF TEXT ← RefText.ObtainScratch[codeSize];
prevchar, c: CHAR;
IF name = NIL THEN RETURN [NIL];
codename ← Rope.Translate[base: name, start: 1, translator: LetterToDigit];
scode ← RefText.InlineAppendChar[scode, Ascii.Upper[Rope.InlineFetch[name,0]]];
prevchar ← LetterToDigit[Rope.InlineFetch[name,0]];
FOR i: INT IN [0..Rope.InlineLength[codename]) DO
c ← Rope.InlineFetch[codename, i];
IF c # prevchar AND c # '0 THEN {
scode ← RefText.InlineAppendChar[scode, c];
IF RefText.Length[scode] = codeSize THEN EXIT;
};
prevchar ← c;
ENDLOOP;
FOR i: INT IN [RefText.Length[scode]..codeSize) DO
scode ← RefText.InlineAppendChar[scode, '0];
ENDLOOP;
codename ← Rope.FromRefText[scode];
RefText.ReleaseScratch[scode];
RETURN[codename];
};
END.
Doug Terry, October 18, 1985 1:46:04 pm PDT
created.
Doug Terry, November 8, 1985 6:50:06 pm PST
changes to: NameToCode, DIRECTORY, EXPORTS, ~
Doug Terry, November 26, 1985 4:39:52 pm PST
changes to: ~