The Soundex algorithm presented in Knuth's "The Art of Computer Programming", Vol. 3, page 392 is as follows:
1. Retain the first letter of the name, and drop all occurrences of a, e, h, i, o, u, w, y in other positions.
2. Assign the following numbers to the remaining letters after the first:
b, f, p, v --> 1
c, g, j, k, q, s, x, z --> 2
d, t --> 3
l --> 4
m, n --> 5
r --> 6
3. If two or more letters with the same code were adjacent in the original name (before step 1), omit all but the first.
4. Convert to the form "letter, digit, digit, digit" by adding trailing zeros (if there are less than three digits), or by dropping rightmost digits (if there are more than three).
NameToCode:
PUBLIC
PROC [name:
ROPE]
RETURNS [SoundexCode] ~ {
LetterToDigit: Rope.TranslatorType = {
[old: CHAR] RETURNS [new: CHAR] -- old Cedar7.0 impl
[old: CHAR] RETURNS [CHAR] -- note "new" not declared as return type
new:
CHAR ¬
SELECT Ascii.Lower[old]
FROM
'a, 'e, 'h, 'i, 'o, 'u, 'w, 'y => '0,
'b, 'f, 'p, 'v => '1,
'c, 'g, 'j, 'k, 'q, 's, 'x, 'z => '2,
'd, 't => '3,
'l => '4,
'm, 'n => '5,
'r => '6,
ENDCASE => '7;
RETURN[new]; -- must explicitly return new here
};
codename: ROPE;
scode: REF TEXT ¬ RefText.ObtainScratch[codeSize];
prevchar, c: CHAR;
IF name = NIL THEN RETURN [NIL];
codename ¬ Rope.Translate[base: name, start: 1, translator: LetterToDigit];
scode ¬ RefText.InlineAppendChar[scode, Ascii.Upper[Rope.Fetch[name,0]]];
prevchar ¬ LetterToDigit[Rope.Fetch[name,0]];
FOR i:
INT
IN [0..Rope.Length[codename])
DO
c ¬ Rope.Fetch[codename, i];
IF c # prevchar
AND c # '0
THEN {
scode ¬ RefText.InlineAppendChar[scode, c];
IF RefText.Length[scode] = codeSize THEN EXIT;
};
prevchar ¬ c;
ENDLOOP;
FOR i:
INT
IN [RefText.Length[scode]..codeSize)
DO
scode ¬ RefText.InlineAppendChar[scode, '0];
ENDLOOP;
codename ¬ Rope.FromRefText[scode];
RefText.ReleaseScratch[scode];
RETURN[codename];
};