[Cyan]<DATools6.1>EDIFAndCore>MessyCode.Mesa!1

MessyCode.Mesa

Spreitzer, January 17, 1986 12:12:21 pm PST

DIRECTORY Rope;

MessyCode: CEDAR DEFINITIONS = {

What's going on

We are concerned with establishing a 1-to-1 association between the strings of two languages. One language, call it Ll, consists of all the finite non-empty sequences of characters from some alphabet S. The other language, call it Ls, consists of all the finite non-empty sequences of characters drawn from S' (where S' ‚ S) that begin with a character from S'' (S'' ‚ S'). We want some restrictions on the association. We want the association to basically have the form of a bunch of (literal) pattern replacements --- that is, where pattern a appears in the string in Ll, pattern b appears in the string in Ls, for a bunch of client-specified a and b.

The technique used is as follows. We restrict the a's to be single characters. Ignore for the moment the restriction on the initial character in Ls. With each c (character from S) we associate some non-empty string b from S', where more than one c can be associated with the same string. We insist that there be some edge-labeled tree T, where every parent node has |S'| childward edges, each labeled with a different character in S', such that the set of paths is the set of bs.

To map from a string sl in Ll to a string ss in Ls:

Initialize ss to be the concatenation of the bs associated with the cs in sl.

Compute the correction of ss relative to sl.

Tag ss by the correction.

To invert:

Extract the correction and an initial sl from ss.

Correct sl by the correction.

The correction tells which c is actually associated with an instance of a b. The correction is a number that is almost a base n encoding of what to do for each c (but n varies from position to position). The correction is encoded in ss by a method similar to Tioga's method of representing formatting information in a file: as "trashy stuff" to the right of an "escape string".

We deal with the restriction on the initial character in Ls by using a different association from cs to bs until after some non-empty b is reached, and taking care of the possibility that no non-empty b is reached.

Now we say it again, precisely. We are defining a 1-to-1 relation, call it Code, between strings in Ls and Ll. This relation is determined by a client-specified function t: S Bool b S'*, which must obey certain restrictions. They are:

t(c, true) Ls

Btrue = {b | c t(c, true)=b}

Bfalse = {b | c t(c, false)=b}

UniquePrefix(Btrue, S'', S')

UniquePrefix(Bfalse, S', S')

UniquePrefix(B, S0, Sr) {

B ‚ S0"Sr*

' s S0"Sr : ! b B, s' Sr s = b"s'

(" is the concatenation operator)

We define Code in terms of two other 1-to-1 relations:

Code(ss, sl) { sm Lm, d N S(sm, sl, d, true) ' T(ss, sm, d)

Lm = B0"Br*

N is the set of non-negative integers

The relation S describes the substitution of bs for cs and the computation of the correction. The relation T describes how to tag a string by a correction.

S(sm, sl, d, i) =

sl=e ' sm=e ' d=0 (where e is the empty string)

( c S, slr Ll, b Bi, smr Bfalse*, dr N

sm=b"smr ' sl=c"slr ' t(c, i)=b

' d = idx(c, i) + nc(b, i) * dr

' S(smr, slr, dr, false)

idx(c, i) = |{c' S | c' < c ' t(c, i)=t(c', i)}|

(using any total order < that's convenient)

nc(b, i) = |{c S | t(c, i)=b}|

Auf Cedar

ROPE: TYPE = Rope.ROPE;

Alphabet: TYPE = REF AlphabetPrivate;

AlphabetPrivate: TYPE = RECORD [

chars: ROPE,

includes: ARRAY CHAR OF BOOL

];

Code: TYPE = REF CodePrivate;

CodePrivate: TYPE = RECORD [

to: ARRAY Language OF DecisionTree,

dangles: ARRAY Language OF INT,

size: INT

];

Language: TYPE = {A, B};

DecisionTree: TYPE = REF DecisionTreePrivatePrivate;

DecisionTreePrivate: TYPE = RECORD [

variant: SELECT kind: * FROM

branch => [a: ARRAY CHAR OF DecisionTree ← ALL[NIL]],

leaf => [r: Relation],

ENDCASE];

Relation: TYPE = REF RelationPrivate;

RelationPrivate: TYPE = RECORD [

codeword: ARRAY Language OF ROPE,

index: INT];

OtherLanguage: ARRAY Language OF Language

= [A: B, B: A];

Translate: PROC [code: Code, source: ROPE, to: Language] RETURNS [translated: ROPE];