<<>>
CharacterDiscussion.mail
    18-May-90    To: Mike Spreitzer    Re: Character Proposals (Part 1)
Date: 18 May 90 10:10:25 PDT
From: Kenneth A Pier:PARC:Xerox
Subject: Re: Character Proposals (Part 1)
In-reply-to: "Mike Spreitzer:PARC:Xerox's message of Thu, 17 May 90 08:44:39 PDT"
To: Mike Spreitzer:PARC:Xerox
cc: PCedarImplementors:PARC:Xerox, SchemeXeroxImplementors:PARC:Xerox

I am keeping track of the messages Re: Character Proposals as a PFUDGe type discussion on 
[CedarCommon2.0]<Discussions>CharacterDiscussion.mail.
    21-May-90    Mike Spreitzer    Character Proposals (Part 0)
Date: Mon, 21 May 90 08:13:24 PDT
From: Mike Spreitzer:PARC:Xerox
Subject: Character Proposals (Part 0)
To: PCedarImplementors, SchemeXeroxImplementors
Cc: Mike Spreitzer:PARC:Xerox

    I should mention, for the DLs, why I think this is interesting to think about now.
    There are three problems I'm addressing:

1.  We want to use more than the ASCII characters inside PARC.  I personally am interested in using mathematical symbols in a new 
programming language, and in using Greek letters in existing programming languages.  I also want to use mathematical symbols in 
English text; this includes comments in programs.  While it is true that Tioga already handles more than ASCII, Tioga is not all of 
Cedar and Scheme.  For example, a program pretty printer could (and mine actually does) use multiple ROPEs and STREAMs to move the 
contents of comments through several packages.  Also, I think Xerox as a company should support non-English texts, as nearly as 
possible, as well as it does English ones.  While we're not responsible for making products, sticking our heads in the ASCII sand 
makes it that much harder to transfer our work to product programs, and assumes that there are no interesting problems to be solved 
concerning multinational text.

2.  We want to interoperate with systems external to PARC that use more than the ASCII characters.  I deduce this from two slightly 
smaller assertions: (A) we want to interoperate with systems external to PARC, (B) some external systems will, and some already do, 
use character codings other than ASCII.  One such system is Viewpoint.  Pavel alleges that much of Europe uses an 8-bit coding 
called `Latin 1', which coincides with ASCII in the first 128 codes, and diverges in the second.  The X consortium is using 32-bit 
character/key codes.  Surely other organizations are also feeling the pressure, and we can expect more widespread use of non-ASCII 
codings in the future.

3.  We are already interoperating with non-Cedar systems that use character codings other than PARC ASCII, but we don't properly 
translate the character codes.  PARC ASCII associates up-arrow and left-arrow with the numbers that real, current ASCII associates 
with circumflex and underscore.  You can tell that we don't even interoperate properly with UNIX or other external ASCII systems 
because the underscore/left-arrow confusion shows up all over the place.  XCC associates the international currency symbol with the 
number that PARC ASCII associates with the dollar-sign.  We are so confused about this that if you switch between screen and print 
StyleKind (try it! [look in the Places menu]) this character ($) changes appearance!  The problem is that Cedar doesn't recognize 
when it is importing characters from, or exporting characters to, a foreign system, and thus doesn't do any of the necessary 
translations.  Note that merely recognizing system boundaries and doing character translation does not necessarily require a larger 
character set; but if you accept point (2), it does.
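
A minimal sketch of the boundary translation that point 3 asks for, written in Python rather than Cedar so that it can be run as-is.  The two code positions (16r5E and 16r5F) are the standard ASCII positions of circumflex and underscore; the function names, and the use of Unicode arrows to stand for the PARC ASCII glyphs, are assumptions made for illustration only.

```python
# Hypothetical boundary translation between PARC ASCII and current ASCII.
# Inside the system a character is identified by its meaning (a one-character
# string here), not by the number a particular coding assigns to it.

# The two positions where the codings disagree about meaning.
PARC_ASCII_MEANING = {0x5E: "\u2191", 0x5F: "\u2190"}   # up-arrow, left-arrow
REAL_ASCII_MEANING = {0x5E: "^", 0x5F: "_"}             # circumflex, underscore


def import_byte(code, from_parc):
    """Return the character that a 7-bit code means in the source coding."""
    if not 0 <= code < 128:
        raise ValueError("not a 7-bit code: %d" % code)
    table = PARC_ASCII_MEANING if from_parc else REAL_ASCII_MEANING
    return table.get(code, chr(code))


def export_char(ch, to_parc):
    """Return the code for ch in the target coding, or raise if it has none."""
    table = PARC_ASCII_MEANING if to_parc else REAL_ASCII_MEANING
    for code, meaning in table.items():
        if meaning == ch:
            return code
    # Every other 7-bit position means the same thing in both codings.
    if len(ch) == 1 and ord(ch) < 128 and ord(ch) not in table:
        return ord(ch)
    raise LookupError("no translation for %r" % ch)
```

With this in place the same byte, 16r5F, imports as a left-arrow from a PARC ASCII source and as an underscore from a UNIX one, and an attempt to export an underscore to PARC ASCII fails loudly instead of silently turning into a left-arrow.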

    I think this is interesting to think about now because of the youth of PCedar and Scheme.  With PCedar3.0 coming up, there will 
    soonish be an opportunity to recompile everything, which means that changes to the Rope and IO interfaces are within the Pale 
    (for DCedar, I think the expectation is that we will never go to a higher major version, and thus will never again recompile 
    everything).  SchemeXerox is also quite young; there is very little in it now that depends on the character coding being used 
    (eg, there is no editor, no file reader/writer, and nothing that images text --- it gets all these things from Cedar).  We 
    haven't even got Modula-3 on PCR yet.  I think that if we are going to expand the character set, there will be fewer software 
    engineering obstacles if we do it now than any time later.
    The biggest downside I see is that this requires work, and person-cycles are in very short supply.  Is this likely to ease in the 
    future?  I think the biggest open question is: is it worth the work?  In order to answer that, we must explore just how little 
    work we can get away with.  To do that means sketching out a design, which is what I'm trying to do with this discussion.
    Still to come: the ROPE and STREAM level interfaces question, and the compiler support question.

    Thoughts?
    Mike
    17-May-90    Mike Spreitzer    Character Proposals (Part 1)
Date: Thu, 17 May 90 08:44:39 PDT
From: Mike Spreitzer:PARC:Xerox
Subject: Character Proposals (Part 1)
To: PCedarImplementors, SchemeXeroxImplementors
Cc: Mike Spreitzer:PARC:Xerox

    I propose that Cedar and Scheme eventually adopt XCC as their standard character code.  I claim that one of the biggest bogons in 
    Cedar's current handling of characters is that it doesn't know its boundaries: Cedar doesn't take care to note when it's 
    getting characters from a foreign system and convert them into its own code (of course, with Cedar's code currently having only 
    8 bits, that's a pretty hopeless thing to do anyway).  I propose that the Cedar and Scheme interfaces with foreign systems be 
    expanded to discuss character translation.  I propose to start developing interfaces now for the new characters.  The rest of 
    this message is concerned with the design of those interfaces.
    I think it would be great if we could expand Rope.ROPE and IO.STREAM to handle the new characters, because that could save us a lot 
    of editing.  But we can only do that when we're ready to recompile all of PCedar and either freeze DCedar or recompile all of 
    it too.  Did I hear the sentiment that it's nearly time to freeze DCedar expressed at PFUDGE yesterday?
    Regardless of whether we expand Rope and IO or make new interfaces, I think we should start with an interface defining characters.  
    It might look something like this:

    Char: CEDAR DEFINITIONS = {

        CHR: TYPE = RECORD [CARD];
            <<The CARD is the XCC code for the character represented by data of this type.>>

        noCHR: CHR = [CARD32.LAST];
            <<This is a special CHR that does not represent any character --- like NIL, which is a REF that doesn't point to any memory.>>

        Valid: PROC [CHR] RETURNS [BOOL];
            <<Does our standard coding associate a character with the given number?  Valid[noCHR] = FALSE.>>

        Ord: PROC [CHR] RETURNS [CARD];
        Val: PROC [CARD] RETURNS [CHR];
        Widen: PROC [CHAR] RETURNS [CHR];
            <<Equivalent to Import[[the Cedar language's built-in coding, ..]].>>
        Narrow: PROC [CHR] RETURNS [CHAR];
            <<Equivalent to Export[.., the Cedar language's built-in coding], except that BoundsFault is raised instead of NoTranslation.>>

        Coding: TYPE = REF CodingPrivate;
        CodingPrivate: TYPE;
            <<Represents some association between numbers and characters.  Example Codings are ASCII, XCC, and JIS.>>

        LookupCoding: PROC [ATOM] RETURNS [Coding];
            <<Returns NIL if the given ATOM doesn't name a known coding.  There may be other ways (in other interfaces) to obtain Codings.>>

        CodedChar: TYPE = RECORD [cdg: Coding, chr: CHR];
            <<Represents a character in some foreign coding.>>

        Defined: PROC [CodedChar] RETURNS [BOOL];
            <<Does the given coding associate some character with the given number?>>

        Describe: PROC [CodedChar] RETURNS [ATOM];
            <<Returns an English description of the character that the given coding associates with the given number.  Returns NIL iff NOT 
            Defined.>>

        Corresponds: PROC [CHR, CodedChar] RETURNS [BOOL];
            <<Foreign codings may be ambiguous --- they can associate more than one meaning with one of their characters.  Our coding attempts 
            to be unambiguous.  Thus, one CodedChar corresponds, in general, to a set of CHRs.  This and the next few procedures 
            expose this correspondence.>>

        NumCorrespondents: PROC [CodedChar] RETURNS [INT];
            <<Returns the size of the set of CHRs that correspond to the given CodedChar.  Size will be 0 iff (NOT Defined) OR we haven't yet 
            extended our CHRs with a new number to correspond to a newly discovered CodedChar.>>

        NthCorrespondent: PROC [CodedChar, INT] RETURNS [CHR];
            <<Returns one of the CHRs that correspond to the given CodedChar.  Raises BoundsFault if given index is >= NumCorrespondents.  There 
            is no significance to the ordering, except that correspondent 0 is the one returned by Import.>>

        Import: PROC [CodedChar] RETURNS [CHR];
            <<Returns the CHR that best represents the given character.  Equivalent to NthCorrespondent[.., 0], except that instead of raising 
            BoundsFault, Import raises NoTranslation --- and the caller may RESUME with a CHR to use anyway.>>

        Export: PROC [chr: CHR, to: Coding] RETURNS [CHR];
            <<IF Corresponds[chr, [to, ans]], returns ans.  Otherwise, raises NoTranslation, and returns what the caller RESUMEs with.>>

        NoTranslation: SIGNAL [CHR] RETURNS [CHR];
            <<Raised by translation procedures when there is no translation in the target coding for the source character.  Client can RESUME 
            with a translation to use.>>

        }.
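
    The correspondence procedures above can be modeled in a few lines.  This is a sketch in Python, not Cedar, and only of the semantics described in the comments: a CHR is modeled as an int, a CodedChar as a (coding name, number) pair, and the resumable SIGNAL as an optional callback.  The table contents and all numbers in it are made up and are not real XCC codes.

```python
# Hypothetical model of the Corresponds / Import / Export semantics.

class NoTranslation(Exception):
    """Stands in for the NoTranslation SIGNAL when nobody resumes it."""


# Made-up correspondence table: CodedChar -> ordered list of CHRs.
# Entry 0 of each list is the one Import returns.
CORRESPONDENTS = {
    ("toy", 45): [45, 9001],   # an ambiguous foreign code with two meanings
    ("toy", 65): [65],
}


def corresponds(chr_, coded):
    return chr_ in CORRESPONDENTS.get(coded, [])


def num_correspondents(coded):
    return len(CORRESPONDENTS.get(coded, []))


def nth_correspondent(coded, n):
    chrs = CORRESPONDENTS.get(coded, [])
    if not 0 <= n < len(chrs):
        raise IndexError("BoundsFault")
    return chrs[n]


def import_char(coded, resume=None):
    """Best CHR for coded; if there is none, use resume (models RESUME) or raise."""
    if num_correspondents(coded) > 0:
        return nth_correspondent(coded, 0)
    if resume is not None:
        return resume(coded)
    raise NoTranslation(coded)


def export_char(chr_, to, resume=None):
    """A number in coding `to` that corresponds to chr_, else resume or raise."""
    for (coding, number), chrs in CORRESPONDENTS.items():
        if coding == to and chr_ in chrs:
            return number
    if resume is not None:
        return resume(chr_)
    raise NoTranslation(chr_)
```

    Note that Import and Export are not inverses in general: here both CHR 45 and CHR 9001 export to the foreign number 45, but that number always imports as CHR 45.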
    One of the issues raised by this interface is the data type used to represent our new characters.  Scheme is fortunately opaque --- 
    programs can't tell how characters are represented, and there are no constraints on the behavior of integer->char and 
    char->integer (the Scheme equivalents of VAL and ORD) that prevent expansion.  The SchemeXerox representation for character 
    currently allows 16 bits, and could relatively easily be expanded by another 2 bits; this should be able to hold all the 
    numbers we'll need in the foreseeable future.  I'm assuming that we don't want to widen Cedar's built-in type CHAR.  I chose to 
    declare the Cedar representation to take at least 32 bits, instead of 16, because (1) you should always prepare for growth, and 
    (2) the XCC standard explicitly states that it may require more than 16 bits in the future.  Alternatives for the Cedar 
    representation that come to mind are:
        CHR: TYPE = CARD;
        CHR: TYPE[SIZE[CARD32]];
        CHR: TYPE = RECORD [CHRRep]; CHRRep: TYPE[SIZE[CARD32]];
    I disfavor the first alternative because it prevents Cedar's type system from distinguishing characters and cardinals.  I actually 
    think I prefer the second alternative to what I proposed earlier, but refrained from making that my actual proposal because I 
    seem to remember that such constructions are disfavored by Cedar programmers in general.  I favor making the representation 
    opaque because information hiding is good in general.  Alternative 3 makes the representation opaque, but in a way that may not 
    trouble most programmers --- for example, I think it makes NARROW[.., REF CHR] possible.
    Another issue is the names for the Cedar interface and type.  Alternatives that come to mind are: Char.X, Char.CHAR (does the 
    compiler allow this?), Char.ECHAR, Char.Char, CH.AR, and Char.XCHAR.
    One problem that always arises in this area is the poorly standardized, overloaded English terms used to discuss it.  We need names 
    for: a number that represents a character (sometimes called a character code; I refrain from using that term because it's 
    ambiguous), an association between numbers and characters (sometimes called a character code, sometimes called a character set; 
    I chose to call this a coding), and a set of characters (sometimes called a repertoire, sometimes a character set; I've avoided 
    needing this term so far).
    Should the translation stuff go in the basic interface, or somewhere else?  It needs to be extremely available: wherever Cedar or 
    Scheme meets a foreign system --- and that may be very deep inside Cedar's or Scheme's own implementation --- characters must 
    be translated.  I think putting it in the basic interface is good because that emphasizes the fact that translation is 
    available wherever these new characters are.
    Should Codings be looked up by name, or fetched from an interface that simply exports the codings we need as individual items (eg, 
    Coding.NewAscii, Coding.JIS6226, Coding.PressMath)?  I favor both.  Having an interface that exports some particular codings as 
    individual items is good because references to interface items are better than indirection through ATOMs (or whatever we use 
    for names).  Having the lookup by name is good because that's part of the support for extending our interoperability without 
    recompiling or running any code.
    I think we should be able to extend our interoperability without recompiling and running any code.  This means extending the set of 
    codings, adding associations between CodedChars and CHRs, and adding CHRs.  This can be done, for example, by keeping these 
    things in files in standard places --- somewhat analogously to the way we define fonts by files.
    Gotta go; more later.  Thoughts?

    Mike
    17-May-90    Doug Wyatt    Re: Character Proposals (Part 1)
Date: 17 May 90 11:03:12 PDT
From: Doug Wyatt:PARC:xerox
Subject: Re: Character Proposals (Part 1)
In-reply-to: "Mike Spreitzer:PARC:Xerox's message of Thu, 17 May 90 08:44:39 PDT"
To: Mike Spreitzer:PARC:Xerox
cc: PCedarImplementors:PARC:Xerox, SchemeXeroxImplementors:PARC:Xerox

    Here's another possible representation for CHR ...
        CHR: TYPE = MACHINE DEPENDENT {nil(CARD.LAST)};
    ORD and VAL will work on this, and you can test chr=nil.  It's tempting to name some of the values too, but 1) you're using the 
    same CHR type for different codings, and 2) the compiler might not deal gracefully with an enumeration containing hundreds or 
    thousands of names.
    Actually, I prefer XCHAR (or ECHAR) to CHR; X is too short and uninformative, CHAR and Char are unacceptable name conflicts.  
    You're not serious about CH.AR, are you?
    -- D.
    17-May-90    Mike Spreitzer    Re: Character Proposals (Part 1)
Date: Thu, 17 May 90 11:25:19 PDT
From: Mike Spreitzer:PARC:Xerox
Subject: Re: Character Proposals (Part 1)
In-reply-to: "Doug Wyatt:PARC:xerox's message of 17 May 90 11:03:12 PDT"
To: Doug Wyatt:PARC:xerox
Cc: Mike Spreitzer:PARC:Xerox, PCedarImplementors:PARC:Xerox, SchemeXeroxImplementors:PARC:Xerox

    I like the enumerated type idea.  The main use will be for only one coding, so I think it would be OK to name some of the values.  
    That brings up the issue of what happens when XCC-1-... becomes XCC-2-...; it's easy enough to deal with additions to the 
    coding, but what happens when changes are made?  We have this problem regardless of whether we put some name-code bindings in 
    interfaces.  Note that the current technology (CHAR) simply forbids changes (and doesn't do too well on additions, either).
    My thought behind the name `Char.X' is that clients (and impls) would use the full name `Char.X'; the name only appears short in 
    the interface itself.  I'm not sure how seriously to take `CH.AR'; do we believe in the idea of designing for the full name 
    only?
    I'm not sure how tightly we want to bind ourselves to XCC.  For example, as we discover places where XCC fails to hold to 
    One-Semantic-One-Code we may wish to diverge our standard coding a bit from XCC.  If we admit we may not always use XCC, then 
    maybe using X in the name is misleading; that's why I suggested using E, and other schemes.
    Mike
    17-May-90    Foote:OSBU North    Re: Character Proposals (Part 1)
Date: 17-May-90 12:02:31 PDT
Subject: Re: Character Proposals (Part 1)
In-Reply-To: Originator: "::", UniqueString: "Mike Spreitzer:PARC's message of 17 May 90 11:26:12 PDT (Thursday)"
Message-ID: Originator: "James K. Foote:OSBU North:Xerox", UniqueString: "17-May-90 12:02:31 PDT"
To: Mike Spreitzer:PARC:Xerox
Cc: Doug Wyatt:PARC:Xerox, PCedarImplementors:PARC:Xerox, SchemeXeroxImplementors:PARC:Xerox
Reply-To: Foote:OSBU North:Xerox
From: Foote:OSBU North:Xerox

> I'm not sure how tightly we want to bind ourselves to XCC.  

Make sure that the benefits of not using XCC are high enough to justify the costs.  The costs may include converters for 
interoperability and might include new fonts.
 
Further, I'd recommend that if you're not going to use XCC then it would be better to invent your own standard than to be close to, 
but not exactly the same as, XCC.

But then what do I know. 

    -- Jim
    18-May-90    Mike Spreitzer    Character Proposals (Part 1, cont'd)
Date: Fri, 18 May 90 08:01:51 PDT
From: Mike Spreitzer:PARC:Xerox
Subject: Character Proposals (Part 1, cont'd)
To: PCedarImplementors, SchemeXeroxImplementors
Cc: Mike Spreitzer:PARC:Xerox

    Another issue raised by the Char interface is the use of "name" data in that interface --- which is more primitive than the Rope or 
    whatever interface is used to define our standard for "name" data.  That is, what should LookupCoding take as an argument, in 
    an interface "below" Rope?  What should Describe return?  I think the right answer is that those procedures don't belong in the 
    Char interface --- they can be lifted to some other interface that is above Rope (or equivalent).
    This brings us to the issue of how to layer a system so that nothing depends on something above it.  I think the following 
    organization would work:
        Higher Stuff
        Knowledge of how to import/export with an arbitrary system
        Knowledge of how to import/export with base operating system (SunOS, Mac, ..)
        Knowledge of pure CHR operations and CHR <-> CHAR operations
    Here's how the arbitrary import/export operations could be implemented in a way that allows extensibility without changing code.  
    Imagine that we assign a name to every foreign coding we understand.  With each foreign coding we associate a file whose name 
    is derived (in some deterministic way that's accessible to people) from the coding's name.  Using the lower-level knowledge of 
    how to translate into the coding of the base operating system, we can translate that file name, open the file, and translate 
    its contents.  The file's contents is a listing (in the base operating system's coding) that gives, for each number that the 
    foreign coding associates with a character, the set of corresponding CHRs and an English (how chauvinistic!) description.  
    People can thus extend the set of understood foreign codings by creating such files, and can extend the association between 
    CodedChars and CHRs by editing such files; these edits can be done in Cedar or Scheme or the base operating system.  This 
    scheme relies on something else, probably code, to implement the translation between CHRs and the base operating system's 
    coding; extending this association would be more difficult, but also less likely to be required.
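
    To make the file idea concrete, here is a sketch in Python (not Cedar) of reading such a listing.  Nothing above fixes a file layout; the tab-separated format, the `#` comment convention, and the function name are all assumptions, and the sample numbers used with it are arbitrary.

```python
# Hypothetical coding-description listing, one line per foreign number:
#   <foreign number> TAB <comma-separated CHR numbers> TAB <English description>
# The text is assumed to have been translated already from the base
# operating system's coding by the lower layer.

def parse_coding_listing(text):
    """Return {foreign_number: (list_of_chrs, description)}."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        number, chrs, description = line.split("\t", 2)
        chr_list = [int(c.strip(), 0) for c in chrs.split(",") if c.strip()]
        table[int(number, 0)] = (chr_list, description)
    return table
```

    An entry with an empty CHR field models a foreign character that is Defined but has no correspondent yet, which is the case where NumCorrespondents would return 0.  Adding a correspondent is then a one-line edit to the file, with no recompilation.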

    Mike
    18-May-90    Mike Spreitzer    Character Proposals (Part 2)
Date: Fri, 18 May 90 08:22:46 PDT
From: Mike Spreitzer:PARC:Xerox
Subject: Character Proposals (Part 2)
To: PCedarImplementors, SchemeXeroxImplementors
Cc: Mike Spreitzer:PARC:Xerox

    Not only do we need the ability to translate to/from a foreign coding, we need to know when to translate.  For plain-text files, 
    that's probably easy: assume it's in the base operating system's coding, but provide clients (and thus, indirectly, users) the 
    ability to indicate otherwise.  What about Tioga files?  In current Tioga files, you cannot determine what character is meant 
    at a given position just by looking at the CHAR.  For some positions, XCC is used, and you can determine the intended character 
    by looking also at the CharSet (which ain't easy from a program, unless you're using TiogaAccess).  However, most of our Tioga 
    characters don't use XCC --- they use "PARC ASCII", or worse.  Worse is using "looks" (or Postfix properties, or other Tioga 
    style hackery) to select a Press font that images a glyph corresponding to an entirely different character than what anybody's 
    ASCII assigns to the CHAR.  Each Press font uses an independent coding --- do most Press fonts (ie, all of them except the 
    Math, Greek, and symbol ones) share one or a few codings?  Anyway, if the Tioga document isn't so old that its style 
    annotations are too broken to tell you what font applies at a given position, you can tell which coding applies.
    I'm not going to ask for changes in Tioga (nor would I turn them down!).  But I think that for programs to properly read and write 
    Tioga documents, we need a package that correctly converts a Tioga document to/from a stream of CHR.  This means that the 
    TStyle level of Tioga has to be below programs that read/write Tioga documents --- which doesn't seem too unreasonable.
    Do we want to make a new kind of file, which is not full Tioga, nor plain Ascii, but has CHRs in it?
    18-May-90    Christian P Jacobi    Re: Character Proposals
Date: Fri, 18 May 90 11:58:18 PDT
From: Christian P Jacobi:PARC:Xerox
Subject: Re: Character Proposals
In-reply-to: "Mike Spreitzer:PARC:Xerox's message of Fri, 18 May 90 08:22:46 PDT"
To: Mike Spreitzer:PARC:Xerox
Cc: PCedarImplementors:PARC:Xerox, SchemeXeroxImplementors:PARC:Xerox

Just for the curious:
I have stored what I have not yet deleted from the last two weeks mail about internationalization for X windows. ---> 
/net/palain/palain/jacobi/charcodemail.maillog
For my taste pretty incomprehensible... but can we afford to ignore it?
Christian
 
    21-May-90    Mike Spreitzer    Character Proposals (Part 3)
Date: Mon, 21 May 90 08:26:54 PDT
From: Mike Spreitzer:PARC:Xerox
Subject: Character Proposals (Part 3)
To: PCedarImplementors, SchemeXeroxImplementors
Cc: Mike Spreitzer:PARC:Xerox

Maybe XCC is not the coding we should adopt.  It has been alleged that there is other work (I'm looking into some) on >8-bit 
standards that stands a better chance of winning in the marketplace.  On the other hand, XCC is currently being used by Mother X.  
If we are going to build one system with some parts written in Scheme and others in Cedar (and others in Modula-3?), these 
languages will either have to agree on one character coding (political yuk) or translate character codes as part of inter-language 
interoperability (performance yuk).  Note that SchemeXerox and PCedar already store their character codes differently (SchemeXerox 
shifts the number and adds tag bits).

Mike
    22-May-90    Mike Spreitzer    Character Proposals (Part 4)
Date: Tue, 22 May 90 09:33:32 PDT
From: Mike Spreitzer:PARC:Xerox
Subject: Character Proposals (Part 4)
To: PCedarImplementors, SchemeXeroxImplementors
Cc: Mike Spreitzer:PARC:Xerox

    Much of the impact on existing code comes from whatever is done about the Rope and IO interfaces.  Following are some alternatives.

    I.  Make new interfaces (eg, ERope and EIO).  Least disruptive to the existing system, but requires the most work to get to a new 
    system where expanded characters are used ubiquitously.  PCedar.depends lists 1630 modules that depend directly on one or both 
    of Rope and IO.  To get to a point where expanded characters are used ubiquitously, each of those 1630 modules would have to be 
    edited.  Since some of those modules are themselves interfaces --- some of them pretty central, like FS and PFS --- other 
    problems ensue.  For an interface like FS and PFS, whose use of ROPEs and STREAMs is essentially confined to procedure 
    arguments and results, alternate procedures that take and return EROPEs and ESTREAMs can be added.  But FS and PFS also declare 
    Error, which passes a ROPE; that should be changed at some point to pass an EROPE.  In other interfaces, the 
    non-argument-or-result use of ROPEs and STREAMs is more central.  For example, a Viewer has a name field that is declared to be 
    a ROPE; that should eventually change to an EROPE.  Each use of ROPE or STREAM in the type of an interface item will require 
    either: (1) adding an alternate E-item to the interface, and thus eventual editing of all the clients, or (2) changing the type 
    of that item to use EROPE and ESTREAM, and editing and recompiling the implementation and all the clients in one release step.

    II.  Extend Rope and IO, without changing the types of existing public items.  Redefine a ROPE to mean a sequence of ECHAR, instead 
    of a sequence of CHAR, and a STREAM analogously; this is natural since most clients are using them for sequences of characters, 
    and our notion of characters is changing.  The existing Rope and IO procedures that take or yield CHARs are left as they are, 
    and new alternate versions that traffic in ECHARs are added.  When a procedure is required to convert an ECHAR into a CHAR and 
    there is no correct conversion, an error is raised.  The text variant of RopeRep can be left as it is, adding all the new 
    variants (including the inevitable new wide flat one) to those grouped under the node variant; thus REF TEXT and Rope.Text are 
    still bitwise equivalent.  Since extremely few Cedar modules know about the internal structure of the node variant of RopeRep, 
    and IO's impl can easily be made to deal with stream classes that don't provide the expanded class procedures, simply expanding 
    these definitions --- without changing the rest of the system to actually pass or receive larger characters --- can be done 
    with very little disruption to the existing system.   It does require recompiling most of Cedar --- but that's expected to be 
    done for PCedar3.0 anyway.  It also requires either freezing or recompiling most of DCedar.  Is it time to freeze DCedar?  
    Anyway, as far as PCedar is concerned, there is very little disruption at first; what I'm talking about is adding RopeRep 
    variants and STREAM class procedures that nobody uses at first.  Then, the conversion of the rest of the system to actually use 
    larger characters can proceed without requiring further interface changes (except for interfaces with items that use CHAR in 
    their type --- I think there are far fewer of these than ones that use ROPE or STREAM, but I don't know of an easy way to check 
    that).  The lack of interface changes means that the changes to the rest of the system don't all need to be done at the same 
    time; the rest of the system can be upgraded incrementally, and each upgrade to a package does not disturb its clients (unless 
    and until one client tries to use that upgraded package to pass an expanded character to another client that isn't yet prepared 
    to deal with them, in which case the receiving client will get a runtime error).  A disadvantage is that Cedar's type system 
    doesn't help a client tell when a service has been upgraded to work with expanded characters; this is the flip side of not 
    having the type system require editing or recompilation every time a service is upgraded.  I think this is a win because many 
    modules that use ROPEs and STREAMs just do `pipefitting' with them, and don't fondle characters individually; these modules 
    never need to be edited in this proposal (but they must in proposal I).  I did a little study to quantify this claim; it's 
    quick and dirty and only approximate, but I think informative.  I consulted PCedar.depends to find all the modules that 
    reference either or both of Rope and IO; these are the 1630 mentioned earlier.  I studied those whose names begin with either A 
    or B; there are 111 of these.  I examined the Rope and IO interfaces, and identified the items that yield CHARs; they are:
        Rope: Fetch InlineFetch Map Translate Flatten InlineFlatten ToRefText AppendChars UnsafeMoveChars ContainingPiece
        IO: GetChar GetBlock UnsafeGetBlock PeekChar TOS TextFromTOS GetCedarToken GetToken BreakProc CharClass GetLine GetByte GetHWord 
        GetFWord CreateStreamProcs CreateStream STREAMRecord StreamProcs Value ValueType
    Using XRef, I found and examined all the modules that reference these items.  Of the 111 Rope/IO-referencing modules, only 27 
    depend on a ROPE or STREAM element fitting into a CHAR.  This suggests that following proposal II instead of I would save over 
    3/4 of the editing.  I noticed that most of the editing that would be required is pretty brainless.  This study overestimates 
    the saving because it doesn't look at other interfaces that mention CHAR.
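
    The arithmetic behind the "over 3/4" figure, as a small Python check.  The three counts are the ones quoted above; the extrapolation to all 1630 modules assumes the A/B sample is representative, which the study does not establish.

```python
from fractions import Fraction

sampled = 111        # Rope/IO-referencing modules whose names begin with A or B
need_edit_ii = 27    # of those, modules that depend on an element fitting in a CHAR
total = 1630         # all modules PCedar.depends lists as using Rope or IO

# Share of the sampled modules that need no editing under proposal II.
saved = 1 - Fraction(need_edit_ii, sampled)

# Rough count of modules to edit if the sample ratio held for the whole system.
estimate = round(total * Fraction(need_edit_ii, sampled))
```

    Here `saved` is 84/111, or about 0.757, and `estimate` comes to roughly 396 modules instead of 1630.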
    Some clients use ROPEs and STREAMs to pass 8-bit bytes, rather than characters.  The expanded ROPEs and STREAMs can, of course, 
    also pass bytes; there will even be implementations that only pass bytes, and thus pay no time or space penalty.  How should 
    sequences of characters be encoded in files?  All the popular encodings should be supported.
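
    A sketch of how proposal II looks to a client, in Python rather than Cedar.  The class and method names are stand-ins for Rope.Fetch and the proposed EFetch, elements are modeled as ints, and "no correct conversion" is simplified to "does not fit in 8 bits"; all of that is assumed for illustration.

```python
# Hypothetical model of proposal II: one rope type, two fetch procedures.

class NarrowingError(Exception):
    """Raised when an expanded character has no CHAR equivalent."""


class Rope:
    def __init__(self, codes):
        self._codes = list(codes)

    def efetch(self, index):
        """New procedure: yields the full-width character."""
        return self._codes[index]

    def fetch(self, index):
        """Existing procedure: yields a CHAR, or fails at run time."""
        code = self._codes[index]
        if code > 0xFF:
            raise NarrowingError("element %d does not fit in a CHAR" % index)
        return code

    def concat(self, other):
        """Pipefitting: never looks at individual characters."""
        return Rope(self._codes + other._codes)
```

    A module that only calls `concat` works unchanged with wide characters.  A module that calls `fetch` keeps working on 8-bit text and gets the runtime error only when an un-upgraded client is actually handed an expanded character.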

    III.  Extend Rope and IO, changing public items that traffic in CHAR to traffic in ECHAR instead; add ECHAR to the built-in types 
    of the compiler, and make CHAR a subrange of ECHAR.  Russ says adding a new built-in type requires very little compiler work, 
    and a recompilation of everything (every .mob contains a list of all the built-in types).  He wasn't sure how much work would 
    be involved in making CHAR a subrange of ECHAR.  Proposal III impacts the rest of the system much like proposal II, except that 
    the ECHAR -> CHAR conversion (and potential error) would occur outside Rope.Fetch or IO.GetChar instead of inside.  The 
    advantage of this proposal is that clients whose only use of Fetch or GetChar is to compare the returned character to a literal 
    do not require eventual editing (unless they logically should be comparing against more characters) under proposal III (but do 
    under I and II).  In the study mentioned above, this reduces the number of modules that must be edited from 27 to 24 (assuming 
    their character comparisons remain appropriate).  One disadvantage is that the conversion must then be done for every ROPE and 
    STREAM, even those that only implement 8-bit characters.  Another disadvantage is that it is harder to tell which packages have 
    been upgraded --- there is no longer the clue of their using EFetch instead of Fetch.

    I think proposal II is the best.  What do you think?

    Mike