What's going on
We are concerned with establishing a 1-to-1 association between the strings of two languages. One language, call it Ll, consists of all the finite non-empty sequences of characters from some alphabet S. The other language, call it Ls, consists of all the finite non-empty sequences of characters drawn from S' (where S' ‚ S) that begin with a character from S'' (S'' ‚ S'). We want some restrictions on the association. We want the association to basically have the form of a bunch of (literal) pattern replacements --- that is, where pattern a appears in the string in Ll, pattern b appears in the string in Ls, for a bunch of client-specified a and b.
The technique used is as follows. We restrict the a's to be single characters. Ignore for the moment the restriction on the initial character in Ls. With each c (character from S) we associate some non-empty string b from S', where more than one c can be associated with the same string. We insist that there be some edge-labeled tree T, where every parent node has |S'| childward edges, each labeled with a different character in S', such that the set of paths is the set of bs.
To map from a string sl in Ll to a string ss in Ls:
Initialize ss to be the concatenation of the bs associated with the cs in sl.
Compute the correction of ss relative to sl.
Tag ss by the correction.
To invert:
Extract the correction and an initial sl from ss.
Correct sl by the correction.
The correction tells which c is actually associated with an instance of a b. The correction is a number that is almost a base n encoding of what to do for each c (but n varies from position to position). The correction is encoded in ss by a method similar to Tioga's method of representing formatting information in a file: as "trashy stuff" to the right of an "escape string".
We deal with the restriction on the initial character in Ls by using a different association from cs to bs until after some non-empty b is reached, and taking care of the possibility that no non-empty b is reached.
Now we say it again, precisely. We are defining a 1-to-1 relation, call it Code, between strings in Ls and Ll. This relation is determined by a client-specified function t: S Bool b S'*, which must obey certain restrictions. They are:
t(c, true) Ls
Btrue = {b | c t(c, true)=b}
Bfalse = {b | c t(c, false)=b}
UniquePrefix(Btrue, S'', S')
UniquePrefix(Bfalse, S', S')
UniquePrefix(B, S0, Sr) {
B ‚ S0"Sr*
'