DIRECTORY Rope; MessyCode: CEDAR DEFINITIONS = { Auf Cedar ROPE: TYPE = Rope.ROPE; Alphabet: TYPE = REF AlphabetPrivate; AlphabetPrivate: TYPE = RECORD [ chars: ROPE, includes: ARRAY CHAR OF BOOL ]; Code: TYPE = REF CodePrivate; CodePrivate: TYPE = RECORD [ to: ARRAY Language OF DecisionTree, dangles: ARRAY Language OF INT, size: INT ]; Language: TYPE = {A, B}; DecisionTree: TYPE = REF DecisionTreePrivatePrivate; DecisionTreePrivate: TYPE = RECORD [ variant: SELECT kind: * FROM branch => [a: ARRAY CHAR OF DecisionTree _ ALL[NIL]], leaf => [r: Relation], ENDCASE]; Relation: TYPE = REF RelationPrivate; RelationPrivate: TYPE = RECORD [ codeword: ARRAY Language OF ROPE, index: INT]; OtherLanguage: ARRAY Language OF Language = [A: B, B: A]; Translate: PROC [code: Code, source: ROPE, to: Language] RETURNS [translated: ROPE]; }. MessyCode.Mesa Copyright c 1986 by Xerox Corporation. All rights reserved. Spreitzer, January 17, 1986 12:12:21 pm PST What's going on We are concerned with establishing a 1-to-1 association between the strings of two languages. One language, call it Ll, consists of all the finite non-empty sequences of characters from some alphabet S. The other language, call it Ls, consists of all the finite non-empty sequences of characters drawn from S' (where S' G S) that begin with a character from S'' (S'' G S'). We want some restrictions on the association. We want the association to basically have the form of a bunch of (literal) pattern replacements --- that is, where pattern a appears in the string in Ll, pattern b appears in the string in Ls, for a bunch of client-specified a and b. The technique used is as follows. We restrict the a's to be single characters. Ignore for the moment the restriction on the initial character in Ls. With each c (character from S) we associate some non-empty string b from S', where more than one c can be associated with the same string. We insist that there be some edge-labeled tree T, where every parent node has |S'| childward edges, each labeled with a different character in S', such that the set of paths is the set of bs. To map from a string sl in Ll to a string ss in Ls: Initialize ss to be the concatenation of the bs associated with the cs in sl. Compute the correction of ss relative to sl. Tag ss by the correction. To invert: Extract the correction and an initial sl from ss. Correct sl by the correction. The correction tells which c is actually associated with an instance of a b. The correction is a number that is almost a base n encoding of what to do for each c (but n varies from position to position). The correction is encoded in ss by a method similar to Tioga's method of representing formatting information in a file: as "trashy stuff" to the right of an "escape string". We deal with the restriction on the initial character in Ls by using a different association from cs to bs until after some non-empty b is reached, and taking care of the possibility that no non-empty b is reached. Now we say it again, precisely. We are defining a 1-to-1 relation, call it Code, between strings in Ls and Ll. This relation is determined by a client-specified function t: S X Bool _ S'*, which must obey certain restrictions. They are: t(c, true) B Ls Btrue = {b | E c , t(c, true)=b} Bfalse = {b | E c , t(c, false)=b} UniquePrefix(Btrue, S'', S') UniquePrefix(Bfalse, S', S') UniquePrefix(B, S0, Sr) W B G S09Sr* & A s B S09Sr# : E! b B B, s' B Sr# , s = b9s' (9 is the concatenation operator) We define Code in terms of two other 1-to-1 relations: Code(ss, sl) W E sm B Lm, d B N , S(sm, sl, d, true) & T(ss, sm, d) Lm = B09Br* N is the set of non-negative integers The relation S describes the substitution of bs for cs and the computation of the correction. The relation T describes how to tag a string by a correction. S(sm, sl, d, i) = sl=e & sm=e & d=0 (where e is the empty string) V E c B S, slr B Ll, b B Bi, smr B Bfalse*, dr B N , sm=b9smr & sl=c9slr & t(c, i)=b & d = idx(c, i) + nc(b, i) * dr & S(smr, slr, dr, false) idx(c, i) = |{c' B S | c' < c & t(c, i)=t(c', i)}| (using any total order < that's convenient) nc(b, i) = |{c B S | t(c, i)=b}| Э╩qЦ "cedar" styleШcodeЩKЪЬ ╧mЬ1ЩбЬбЬЭЬЭЬбЬ╧uЬ2ЩяMЪбЬбЬЭЬаЩMЪбаЬбЬЭЬбЬЭЬбЬбЬбЬЩ MЪбаЬбЬЭЬбЬЭЬбЬбЬ бЬЩ"MЪЬ баЬбЬбЬЩMЪЬ баЬбЬбЬЩЪ Ь бЬбаЬбаЬЭЩMЪ бЬЭЬбаЭбавЩ MЪ'ЭЬЭЬбЬЭЬбаЭба╨muЬЭЬбЬЭЬбЬбЬЭЬбагЬЭЬбЬбЭбЬЩ.ЧMЪЬЭЬЩ!ЧЪЬ6Щ6MЪ*ЬбаЬбаЬЭЬЭЬбаЬЭЬаЬбЬЭЬЭЬбаЬбаЬбЬЭЬбаЬбаЬбЬЩCMЪ ЬаЬбаЭбавЩMЩ%ЧЪЬ-бЬбЬgЩЬЪ ЬбаЬбаЬбЬЩMЪбаЬбЬЭЬбаЬбЬЭЬбЬ бЬЩ/Ъ'ЭЬЭЬбЬЭЬбЬбаЬЭЬаЬбЬЭЬбаЬбаЬЭЬбавЬбаЬЭЬЭЩ4MЪбаЬбЭбаЬЭЬбаЬбЭбаЬЭЬбЬбЬбЩMЪЭЬбЬбЬ╨dgЬбЬбаЩMЪЭЬбаЬбаЬбаЬЩЧЧЪЬбЬ бЬЭЬбЬбЬбЬЭЬбЬбЬбЬбЬЩ2MЩ+ЧMЪЬдЬбЬ бЬЭЬбЬбЬбЬбЬЩ ЧЧШ KЪЮЬЮЬЮЬШKШKЪЬ ЮЬЮЬШ%ЪЬЮЬЮЬШ KЪЬЮЬШKЪЬ ЮЬЮЬЮЬЮШKШЧKШKЪЬЮЬЮЬ ШЪЬ ЮЬЮЬШKЪЬЮЬ ЮЬШ#KЪЬ ЮЬ ЮЬЮЬШKЪЬЮШ KШЧKШKЪЬ ЮЬ ШKШKЪЬЮЬЮЬШ4ЪЬЮЬЮЬШ$ЪЬ ЮЬ ЮШKЪЬЮЬЮЬЮЬЮЬЮЬШ5KЪЬШKЪЮЬШ ЧЧKШKЪЬ ЮЬЮЬШ%ЪЬЮЬЮЬШ KЪЬ ЮЬ ЮЬЮЬШ!KЪЬЮЬШЧKШЪЬЮЬ ЮЬ Ш)KШЧKШKЪ ╧n ЬЮЬЮЬЮЬЮЬШTKШЧKШЧЧЕЧN╔