DIRECTORY Rope; MessyCode: CEDAR DEFINITIONS = { Auf Cedar ROPE: TYPE = Rope.ROPE; Alphabet: TYPE = REF AlphabetPrivate; AlphabetPrivate: TYPE = RECORD [ chars: ROPE, includes: ARRAY CHAR OF BOOL ]; Code: TYPE = REF CodePrivate; CodePrivate: TYPE = RECORD [ to: ARRAY Language OF DecisionTree, dangles: ARRAY Language OF INT, size: INT ]; Language: TYPE = {A, B}; DecisionTree: TYPE = REF DecisionTreePrivatePrivate; DecisionTreePrivate: TYPE = RECORD [ variant: SELECT kind: * FROM branch => [a: ARRAY CHAR OF DecisionTree _ ALL[NIL]], leaf => [r: Relation], ENDCASE]; Relation: TYPE = REF RelationPrivate; RelationPrivate: TYPE = RECORD [ codeword: ARRAY Language OF ROPE, index: INT]; OtherLanguage: ARRAY Language OF Language = [A: B, B: A]; Translate: PROC [code: Code, source: ROPE, to: Language] RETURNS [translated: ROPE]; }. MessyCode.Mesa Copyright c 1986 by Xerox Corporation. All rights reserved. Spreitzer, January 17, 1986 12:12:21 pm PST What's going on We are concerned with establishing a 1-to-1 association between the strings of two languages. One language, call it Ll, consists of all the finite non-empty sequences of characters from some alphabet S. The other language, call it Ls, consists of all the finite non-empty sequences of characters drawn from S' (where S' G S) that begin with a character from S'' (S'' G S'). We want some restrictions on the association. We want the association to basically have the form of a bunch of (literal) pattern replacements --- that is, where pattern a appears in the string in Ll, pattern b appears in the string in Ls, for a bunch of client-specified a and b. The technique used is as follows. We restrict the a's to be single characters. Ignore for the moment the restriction on the initial character in Ls. With each c (character from S) we associate some non-empty string b from S', where more than one c can be associated with the same string. We insist that there be some edge-labeled tree T, where every parent node has |S'| childward edges, each labeled with a different character in S', such that the set of paths is the set of bs. To map from a string sl in Ll to a string ss in Ls: Initialize ss to be the concatenation of the bs associated with the cs in sl. Compute the correction of ss relative to sl. Tag ss by the correction. To invert: Extract the correction and an initial sl from ss. Correct sl by the correction. The correction tells which c is actually associated with an instance of a b. The correction is a number that is almost a base n encoding of what to do for each c (but n varies from position to position). The correction is encoded in ss by a method similar to Tioga's method of representing formatting information in a file: as "trashy stuff" to the right of an "escape string". We deal with the restriction on the initial character in Ls by using a different association from cs to bs until after some non-empty b is reached, and taking care of the possibility that no non-empty b is reached. Now we say it again, precisely. We are defining a 1-to-1 relation, call it Code, between strings in Ls and Ll. This relation is determined by a client-specified function t: S X Bool _ S'*, which must obey certain restrictions. They are: t(c, true) B Ls Btrue = {b | E c , t(c, true)=b} Bfalse = {b | E c , t(c, false)=b} UniquePrefix(Btrue, S'', S') UniquePrefix(Bfalse, S', S') UniquePrefix(B, S0, Sr) W B G S09Sr* & A s B S09Sr# : E! b B B, s' B Sr# , s = b9s' (9 is the concatenation operator) We define Code in terms of two other 1-to-1 relations: Code(ss, sl) W E sm B Lm, d B N , S(sm, sl, d, true) & T(ss, sm, d) Lm = B09Br* N is the set of non-negative integers The relation S describes the substitution of bs for cs and the computation of the correction. The relation T describes how to tag a string by a correction. S(sm, sl, d, i) = sl=e & sm=e & d=0 (where e is the empty string) V E c B S, slr B Ll, b B Bi, smr B Bfalse*, dr B N , sm=b9smr & sl=c9slr & t(c, i)=b & d = idx(c, i) + nc(b, i) * dr & S(smr, slr, dr, false) idx(c, i) = |{c' B S | c' < c & t(c, i)=t(c', i)}| (using any total order < that's convenient) nc(b, i) = |{c B S | t(c, i)=b}| Κq– "cedar" style˜code™Kšœ Οmœ1™‘œ‘œœœ‘œΟuœ2™οMš‘œ‘œœ ™Mš‘ œ‘œœ‘œœ‘œ‘œ‘œ™ Mš‘ œ‘œœ‘œœ‘œ‘œ ‘œ™"Mšœ ‘ œ‘œ‘œ™Mšœ ‘ œ‘œ‘œ™š œ ‘œ‘ œ‘ œ™Mš ‘œœ‘ ‘ ’™ Mš'œœ‘œœ‘ ‘ Πmuœœ‘œœ‘œ‘œœ‘ £œœ‘œ‘‘œ™.—Mšœœ™!—šœ6™6Mš*œ‘ œ‘ œœœ‘ œœ œ‘œœœ‘ œ‘ œ‘œœ‘ œ‘ œ‘œ™CMš œ œ‘ ‘ ’™ M™%—šœ-‘œ‘œg™œš œ‘ œ‘ œ‘œ™Mš‘ œ‘œœ‘ œ‘œœ‘œ ‘œ™/š'œœ‘œœ‘œ‘ œœ œ‘œœ‘ œ‘ œœ‘ ’œ‘ œœ™4Mš‘ œ‘‘ œœ‘ œ‘‘ œœ‘œ‘œ‘™Mš œ‘œ‘œΠdgœ‘œ‘ ™Mš œ‘ œ‘ œ‘ œ™——šœ‘œ ‘œœ‘œ‘œ‘œœ‘œ‘œ‘œ‘œ™2M™+—Mšœ€œ‘œ ‘œœ‘œ‘œ‘œ‘œ™ ——˜ Kšžœžœžœ˜K˜Kšœ žœžœ˜%šœžœžœ˜ Kšœžœ˜ Kšœ žœžœžœž˜K˜—K˜Kšœžœžœ ˜šœ žœžœ˜Kšœžœ žœ˜#Kšœ žœ žœžœ˜Kšœž˜ K˜—K˜Kšœ žœ ˜K˜Kšœžœžœ˜4šœžœžœ˜$šœ žœ ž˜Kš œžœžœžœžœžœ˜5Kšœ˜Kšžœ˜ ——K˜Kšœ žœžœ˜%šœžœžœ˜ Kšœ žœ žœžœ˜!Kšœžœ˜ —K˜šœžœ žœ ˜)K˜—K˜Kš Οn œžœžœžœžœ˜TK˜—K˜——…—NΙ