-- CharDefs -- Revised by Tse: 17-Mar-83 14:13:51 -- Owner: GCurry --Public definitions for simple (16-bit) characters. --A simple character is one which has a 16-bit representation within the Star framework. --Character Set - The 16-bit character space is partitioned into uniform-sized blocks of 400B, called character sets. The character set space is partitioned into groups corresponding to anticipated usage. For example, character sets 0 through 57B. representing the first 60B x 400B characters are dedicated to non-Kanji (i.e., non-ideographic) characters; character sets 60b through 117B are devoted to Japanese Kanji, and 120B through 337B are reserved for Chinese Kanji. Eash of these major partitions is further partitioned by sub-function (DIscussion of these is deferred until a final specification becomes available - Joe Becker is considering these assignments). Character sets 340B through 377B are not currently used. DIRECTORY CharADefs USING [Codes0, Codes1, Codes2, Codes3, Codes4], CharBDefs USING [Codes20, Codes21, Codes40], StandardDefs USING [Cv]; CharDefs: DEFINITIONS = BEGIN OPEN StandardDefs; --==================== -- ON THE REPRESENTATION OF CHARACTERS : --Some preliminary experiments have shown that the code generated to deal with text may be rather sensitive to the (machine) representation of characters. For this reason we would like to defer the question of the "official" character representation, while admitting the possibility of a two word representation. If you need to know the SIZE of the type "Char" (as opposed to knowing only that it has at two fields called chset and code), please tell Gael Curry. --==================== Class: TYPE = { ClassBreak, ClassRoman, ClassHiragana, ClassKatakana, ClassKanji}; Char: TYPE = MACHINE DEPENDENT RECORD [ --Character chset(0:0..7): Chset, --Public field code(0:8..15): Code]; --Public field --WARNING !!!!!!! Chset: TYPE = CARDINAL [0..340B); --!!!!!! FILED IN DU AS PART OF ASPECT TYPE (see WSFontDefs). Check with appropriate group before changing --WARNING !!!!!!! Chsetsnonkanji: TYPE = Chset [0B..60B); Chsetskanji: TYPE = Chset [60B..340B); chsetRoman: Chset = 0; chsetGreek: Chset = 1; chsetRussian: Chset = 1; chsetCyrillic: Chset = 1; chsetKana: Chset = 2; chsetHebrew: Chset = 3; chsetBopomofo: Chset = 4; chsetSymbol: Chset = 20B; -- chsetJSymbol: Chset = 21B; ++ This is on a later page. chsetRendering: Chset = 40B; chsetTiles: CharDefs.Chset = LAST[CharDefs.Chset]; Code: TYPE = CARDINAL [0..400B); --==================== -- ACCENTED CHARACTERS : --This is included here for future compatibility. We hope to represent accented characters in the workstation the same as on the net. See also WSCharDefs.AqPlainCharacter. --==================== AccentedChar: TYPE = MACHINE DEPENDENT RECORD [ --Accented character accents(0:0..15): PACKED ARRAY Codes0 [unused300B..accentHacek] OF BOOLEAN, char(1:0..15): Char]; --FUNCTIONS ON CHARACTERS --==================== -- ON THE "MEANINGS" OF CHARACTERS : --Different characters designations may mean the same. There are at least two ways to deal with this possibility - (1) invent a space of meanings and map designations to it, and (2) choose a canonical designation for each meaning. We will try to use the second approach. To be precise, the space of designations is partitioned into equivalence classes based on equality of meaning, and a representative from each class is chosen as the canonical designation. --==================== Meaning: PRIVATE PROCEDURE [char: Char] RETURNS [Char] = INLINE --Returns the canonical designation for char. BEGIN --Used to worry about aliasing here RETURN[char] END; MeaningEqual: PROCEDURE [char1, char2: Char] RETURNS [BOOLEAN] = INLINE --Determines whether char1 means the same as char2. BEGIN RETURN[Meaning[char1] = Meaning[char2]]; --There are faster ways END; TextMeaningEqual: PROCEDURE [char1, char2: Char] RETURNS [BOOLEAN] = INLINE --The Star Functional Specification provides that "Match on Text" ignores case. This function provides that test. BEGIN RETURN[MeaningEqual[UpperCase[char1], UpperCase[char2]]]; --There are faster ways END; --==================== -- ON THE "ORDERINGS" OF CHARACTERS : -- Characters also are comparable under (at least) one notion of order. Note, however, that this is not the ordering that Mesa provides on the representation of "Char". --==================== Ordering: TYPE = MACHINE DEPENDENT{none(0), less(1), equal(2), greater(3)}; Order: PROCEDURE [char1, char2: Char] RETURNS [Ordering]; --Ordering determines the ordering between characters for the purpose of sorting. Note that --"Order[ char1, char2 ] = equal" isn't necessarily equivalent to --"MeaningEqual [ char1, char2 ]", or to --"TextMeaningEqual [ char1, char2 ]". --==================== -- MISCELLANEOUS FUNCTIONS ON CHARACTERS : -- Some of the functions below are tied to the Star Functional Specification more tightly than those above. In order to protect themselves against changes to the FS, clients should use functions on simple characters provided by this interface whenever possible (Please tell us about oversights). --==================== UpperCase: PROCEDURE [char: Char] RETURNS [Char] = INLINE --UpperCase returns the upper-case counterpart of character char. It is not clear what upper case MEANS for an arbitrary character, but whatever it comes to mean for Star, this operation implements it. BEGIN SELECT char.chset FROM chsetRendering => IF LOOPHOLE[char.code, Codes40] = Codes40[sigmaFinal] THEN char ← [chsetGreek, LOOPHOLE[Codes1[upperSigma], Code]] ELSE char.code ← UpperCase40[char.code]; ENDCASE => char.code ← SELECT char.chset FROM chsetRoman => UpperCase0[char.code], chsetGreek, chsetCyrillic => UpperCase1[char.code], --Add arms as necessary for other alphabets, ENDCASE => char.code; RETURN[char]; END; LowerCase: PROCEDURE [char: Char] RETURNS [Char] = INLINE --LowerCase returns the lower-case counterpart of character char. It is not clear what lower case MEANS for an arbitrary character, but whatever it comes to mean for Star, this operation implements it. BEGIN char.code ← SELECT char.chset FROM chsetRoman => LowerCase0[char.code], chsetGreek, chsetCyrillic => LowerCase1[char.code], chsetRendering => LowerCase40[char.code], --Add arms as necessary for other alphabets, ENDCASE => char.code; RETURN[char]; END; Break: PROCEDURE [char: Char] RETURNS [BOOLEAN]; --Determines whether char has the "break" property. Word: PROCEDURE [char: Char] RETURNS [BOOLEAN] = INLINE --Determines whether char has the "word" property. BEGIN RETURN[NOT Break[char]]; END; IsAlphanumeric: PROCEDURE [char: Char] RETURNS[BOOLEAN]; -- Determines whether char is an alphanumeric character in it's charset. -- I.e. IsAlphanumeric will return TRUE if char is a digit, 0-9, or if the charset in char is language charset and the code represents a character in the alphabet of that language. Otherwise it will return FALSE. -- Char Sets: chsetJSymbol: Chset = 21B; -- General & Japanese Symbols. -- JIS-1: chsetKanji0: Chsetskanji = FIRST[Chsetskanji]; chsetKanji1: Chsetskanji = chsetKanji0 + 1; chsetKanji2: Chsetskanji = chsetKanji0 + 2; chsetKanji3: Chsetskanji = chsetKanji0 + 3; chsetKanji4: Chsetskanji = chsetKanji0 + 4; chsetKanji5: Chsetskanji = chsetKanji0 + 5; chsetKanji6: Chsetskanji = chsetKanji0 + 6; chsetKanji7: Chsetskanji = chsetKanji0 + 7; chsetKanji8: Chsetskanji = chsetKanji0 + 8; chsetKanji9: Chsetskanji = chsetKanji0 + 9; chsetKanji10: Chsetskanji = chsetKanji0 + 10; chsetKanji11: Chsetskanji = chsetKanji0 + 11; -- JIS-2: chsetKanji12: Chsetskanji = chsetKanji0 + 12; chsetKanji13: Chsetskanji = chsetKanji0 + 13; chsetKanji14: Chsetskanji = chsetKanji0 + 14; chsetKanji15: Chsetskanji = chsetKanji0 + 15; chsetKanji16: Chsetskanji = chsetKanji0 + 16; chsetKanji17: Chsetskanji = chsetKanji0 + 17; chsetKanji18: Chsetskanji = chsetKanji0 + 18; chsetKanji19: Chsetskanji = chsetKanji0 + 19; chsetKanji20: Chsetskanji = chsetKanji0 + 20; chsetKanji21: Chsetskanji = chsetKanji0 + 21; chsetKanji22: Chsetskanji = chsetKanji0 + 22; chsetKanji23: Chsetskanji = chsetKanji0 + 23; chsetKanji24: Chsetskanji = chsetKanji0 + 24; chsetKanji25: Chsetskanji = chsetKanji0 + 25; -- TYPES AND CONSTANTS FOR JAPANESE LOOKUP SYSTEM: -- Types: CharLookup: TYPE = Char; -- Lookup char. -- Constants: charKanjiLookup: Char = Roman[space]; -- Kanji lookup char. --CHARACTER SET 0 -- Codes which are not to be used in a particular character set are named "unused#B"; codes which are available, but unassigned are named "available#B" The actual code assignments may change up to a point, so clients should reference characters symbolically whenever possible. Codes0: TYPE = CharADefs.Codes0; romanFL: Code = LOOPHOLE[Codes0[lowerA]]; -- First lowercase Roman char romanLL: Code = LOOPHOLE[Codes0[lowerZ]]; -- Last lowercase Roman char romanFU: Code = LOOPHOLE[Codes0[upperA]]; -- First uppercase Roman char romanLU: Code = LOOPHOLE[Codes0[upperZ]]; -- Last uppercase Roman char digitF: Code = LOOPHOLE[Codes0[digit0]]; -- First digit digitL: Code = LOOPHOLE[Codes0[digit9]]; -- Last digit space: Code = LOOPHOLE[Codes0[space]]; -- space newLine: Code = LOOPHOLE[Codes0[newLine]]; newPar: Code = LOOPHOLE[Codes0[newParagraph]]; quote: Code = LOOPHOLE[Codes0[rightQuote]]; doubleQuote: Code = LOOPHOLE[Codes0[doubleQuote]]; rightParen: Code = LOOPHOLE[Codes0[rightParenthesis]]; leftParen: Code = LOOPHOLE[Codes0[leftParenthesis]]; period: Code = LOOPHOLE[Codes0[period]]; exclamMark: Code = LOOPHOLE[Codes0[exclamationMark]]; questionMark: Code = LOOPHOLE[Codes0[questionMark]]; --CHARACTER SET 1: Greek and Cyrillic alphabets Codes1: TYPE = CharADefs.Codes1; --CHARACTER SET 2: Japanese "Hiragana & Katakana" Codes2: TYPE = CharADefs.Codes2; --CHARACTER SET 3: Hebrew Codes3: TYPE = CharADefs.Codes3; -- Other Type: RgHiragana: TYPE = [hirSmallA..hirN]; RgKatakana: TYPE = [katSmallA..katSmallKe]; -- Constants: -- chars: charChouon: Char = Kana[chouon]; -- chouon(long vowel) -- codes: hiraganaF: Code = LOOPHOLE[Codes2[hirSmallA]]; -- First Hiragana hiraganaL: Code = LOOPHOLE[Codes2[hirN]]; -- Last Hiragana katakanaF: Code = LOOPHOLE[Codes2[katSmallA]]; -- Fisrt Katakana katakanaL: Code = LOOPHOLE[Codes2[katSmallKe]]; -- Last Katakana hirKurikaeshi: Code = LOOPHOLE[Codes2[hirKurikaesi]]; -- Hiragana repeat hirKurikaeshiDakuon: Code = LOOPHOLE[Codes2[hirKurikaesiDakuon]]; -- Hiragana dakuon repeat katChouon: Code = charChouon.code; -- Katakana long vowel hirSmallA: Code = LOOPHOLE[Codes2[hirSmallA]]; hirN: Code = LOOPHOLE[Codes2[hirN]]; hirKurikaesi: Code = LOOPHOLE[Codes2[hirKurikaesi]]; -- hiragana repeat hirKurikaesiDakuon: Code = LOOPHOLE[Codes2[hirKurikaesiDakuon]]; -- hiragana dakuon repeat katSmallA: Code = LOOPHOLE[Codes2[katSmallA]]; katN: Code = LOOPHOLE[Codes2[katN]]; katSmallKe: Code = LOOPHOLE[Codes2[katSmallKe]]; katKurikaesi: Code = LOOPHOLE[Codes2[katKurikaesi]]; -- katakana repeat katKurikaesiDakuon: Code = LOOPHOLE[Codes2[katKurikaesiDakuon]]; -- katakana dakuon repeat kanjiKuriKaeshi: Code = 0B; -- Kanji repeat touten: Code = LOOPHOLE[Codes2[touten]]; -- Japanese comma kuten: Code = LOOPHOLE[Codes2[kuten]]; -- Japanese period hajimeKagiKakko: Code = LOOPHOLE[Codes2[hajimeKagiKakko]]; -- Japanese open quote owariKagiKakko: Code = LOOPHOLE[Codes2[owariKagiKakko]]; -- Japanese closed quote hajimeNijuKagiKakko: Code = LOOPHOLE[Codes2[hajimeNijuKagiKakko]]; -- Japanese open double quote owariNijuKagiKakko: Code = LOOPHOLE[Codes2[owariNijuKagiKakko]]; -- Japanese cloded double quote -- offsets: offsetHiragana: Cv = hirSmallA; offsetKatakana: Cv = katSmallA; offsetKatHir: Cv = offsetKatakana - offsetHiragana; --CHARACTER SET 4: Chinese Bopomofo Phonic Characters Codes4: TYPE = CharADefs.Codes4; --CHARACTER SET 20B: General and Technical symbols Codes20: TYPE = CharBDefs.Codes20; --CHARACTER SET 21B: General & Japanese symbols Codes21: TYPE = CharBDefs.Codes21; --CHARACTER SET 40B: Ligatures etc. Codes40: TYPE = CharBDefs.Codes40; --CHARACTER SET 60B throuch 73B: Japanese Kanji JIS-1 -- Types: RgChsetKanjiJIS1: TYPE = [chsetKanji0..chsetKanji11]; -- JIS-1 Kanji chsets -- Constants: -- chars: charKanjiKurikaesi: Char = [chsetKanji0, 0B]; charFirstKanjiJIS1: Char = [chsetKanji0, 0B]; charLastKanjiJIS1: Char = [chsetKanji11, 241B]; -- chars: numbers charKansuji0: Char = [chsetKanji0, 1B]; charKansuji1: Char = [chsetKanji0, 3B]; charKansuji2: Char = [chsetKanji0, 4B]; charKansuji3: Char = [chsetKanji0, 5B]; charKansuji4: Char = [chsetKanji0, 26B]; charKansuji5: Char = [chsetKanji0, 13B]; charKansuji6: Char = [chsetKanji0, 50B]; charKansuji7: Char = [chsetKanji0, 70B]; charKansuji8: Char = [chsetKanji0, 47B]; charKansuji9: Char = [chsetKanji0, 104B]; charKansujiJuu: Char = [chsetKanji0, 10B]; charKansujiHyaku: Char = [chsetKanji0, 146B]; charKansujiSen: Char = [chsetKanji0, 55B]; charKansujiMan: Char = [chsetKanji0, 17B]; charKansujiOku: Char = [chsetKanji1, 362B]; charKansujiCho: Char = [chsetKanji5, 275B]; -- CHARACTER SET 74B throuch 111B: Japanese Kanji JIS-2 -- Types: RgChsetKanjiJIS2: TYPE = [chsetKanji12..chsetKanji25]; -- JIS-2 Kanji chsets -- Constants: -- chars: charFirstKanjiJIS2: Char = [chsetKanji12, 0B]; charLastKanjiJIS2: Char = [chsetKanji25, 104B]; UpperCase0: PROCEDURE [code: Code] RETURNS [Code] = INLINE --Returns the code representing the upper-casification of the indicated (assumed chset 0) code BEGIN code0: Codes0 = LOOPHOLE[code]; RETURN[ SELECT code0 FROM IN [lowerA..lowerZ] => code - 40B, IN [lowerAEdipthong..lowerDstroke], IN [lowerIJligature..lowerOEligature], IN [lowerThorn..lowerEng] => code - 10B, ENDCASE => code]; END; UpperCase1: PROCEDURE [code: Code] RETURNS [Code] = INLINE --Returns the code representing the upper-casification of the indicated (assumed chset 1) code BEGIN code1: Codes1 = LOOPHOLE[code]; RETURN[ SELECT code1 FROM IN [lowerAlpha..lowerOmega] => code - 40B, -- lower case Greek IN [lowerA..lowerYa] => code - 60B, -- lower case Cyrillic ENDCASE => code]; END; UpperCase40: PROCEDURE [code: Code] RETURNS [Code] = INLINE --Returns the code representing the upper-casification of the indicated (assumed chset 40) code BEGIN code40: Codes40 = LOOPHOLE[code]; RETURN[ SELECT code40 FROM IN [upperAring..lowerCcedilla] => IF LOOPHOLE[code40] MOD 2 = 0B -- even codes are lowercase THEN code - 1B ELSE code, ENDCASE => code]; END; LowerCase0: PROCEDURE [code: Code] RETURNS [Code] = INLINE --Returns the code representing the lower-casification of the indicated (assumed roman) code BEGIN code0: Codes0 = LOOPHOLE[code]; RETURN[ SELECT code0 FROM IN [upperA..upperZ] => code + 40B, IN [upperAEdipthong..upperDstroke], IN [upperIJligature..upperOEligature], IN [upperThorn..upperEng] => code + 10B, ENDCASE => code]; END; LowerCase1: PROCEDURE [code: Code] RETURNS [Code] = INLINE --Returns the code representing the lower-casification of the indicated (assumed chset 1) code BEGIN code1: Codes1 = LOOPHOLE[code]; RETURN[ SELECT code1 FROM IN [upperAlpha..upperOmega] => code + 40B, -- upper case Greek IN [upperA..upperYa] => code + 60B, -- upper case Cyrillic ENDCASE => code]; END; LowerCase40: PROCEDURE [code: Code] RETURNS [Code] = INLINE --Returns the code representing the lower-casification of the indicated (assumed chset 40) code BEGIN code40: Codes40 = LOOPHOLE[code]; RETURN[ SELECT code40 FROM IN [upperAring..lowerCcedilla] => IF LOOPHOLE[code40] MOD 2 = 1 -- odd codes are uppercase THEN code + 1B ELSE code, ENDCASE => code]; END; Number0: PROCEDURE [code: Code] RETURNS [CARDINAL] = INLINE --Returns the numeric value of the (assumed Roman digit) code BEGIN RETURN[code - LOOPHOLE[Codes0[digit0], CARDINAL]]; END; Roman: PROCEDURE [codes0: Codes0] RETURNS [Char] = INLINE BEGIN RETURN[[chsetRoman, LOOPHOLE[codes0]]]; END; Greek, Russian, Cyrillic: PROCEDURE [codes1: Codes1] RETURNS [Char] = INLINE BEGIN RETURN[[chsetCyrillic, LOOPHOLE[codes1]]]; END; Kana: PROCEDURE [codes2: Codes2] RETURNS [Char] = INLINE BEGIN RETURN[[chsetKana, LOOPHOLE[codes2]]]; END; Hebrew: PROCEDURE [codes3: Codes3] RETURNS [Char] = INLINE BEGIN RETURN[[chsetHebrew, LOOPHOLE[codes3]]]; END; Bopomofo: PROCEDURE [codes4: Codes4] RETURNS [Char] = INLINE BEGIN RETURN[[chsetBopomofo, LOOPHOLE[codes4]]]; END; -- Get Hiragana: Convert katakana to hiragana. Don't convert if code is not katakana code. Hiragana: PROCEDURE [code: Code] RETURNS [Code] = INLINE BEGIN RETURN[ IF code IN [katSmallA..katN] OR code IN [katKurikaesi..katKurikaesiDakuon] THEN code - offsetKatHir ELSE code]; END; -- Get Katakana: Convert hiragana to katakana. Don't convert if code is not hiragana code. Katakana: PROCEDURE [code: Code] RETURNS [Code] = INLINE BEGIN RETURN[ IF code IN RgHiragana OR code IN [hirKurikaesi..hirKurikaesiDakuon] THEN code + offsetKatHir ELSE code]; END; Symbol: PROCEDURE [codes20: Codes20] RETURNS [Char] = INLINE BEGIN RETURN[[chsetSymbol, LOOPHOLE[codes20]]]; END; JSymbol: PROCEDURE [codes21: Codes21] RETURNS [Char] = INLINE BEGIN RETURN[[chsetJSymbol, LOOPHOLE[codes21]]]; END; Rendering: PROCEDURE [codes40: Codes40] RETURNS [Char] = INLINE BEGIN RETURN[[chsetRendering, LOOPHOLE[codes40]]]; END; END. -- of CharDefs LOG September 8, 1980 7:53 AM GCurry created December 27, 1980 11:02 AM GCurry renamed, consolidated various files January 7, 1981 5:01 PM S. Finkel Fixed MeaningEqual January 10, 1981 2:57 PM GCurry Make Chset PUBLIC January 25, 1981 7:21 PM GCurry Reflect new Cosmopolitan ranges, new character set 0 definition. February 9, 1981 9:11 AM GCurry Update to OIS Character Set. March 9, 1981 10:47 AM Finkel Add JStar char definitions. April 4, 1981 3:39 PM Mader Add Character set 1, and 20 definitions, changed character set 21 to JSymbol. June 16, 1981 12:36 PM Gittins Add accented chars (Character set 40B) June 19, 1981 11:55 AM Gittins Add chsetRendering & Upper/Lowercase40 Add names>340B to charset 0, andextend upper/lower case. July 2, 1981 11:18 AM Buelow Move Codes0, Codes1, Codes2 to CharADefs; Codes20, Codes21, Codes40 to CharBDefs for Star 19.1. July 31, 1981 10:50 AM Morrison Added chsetHebrew, Hebrew, Codes3. December 7, 1981 11:27 AM Otto Added chsetTile, formerly in CharForgotDefs 22-Jan-82 13:46:02 Tripp Added Class definition and several character constants. 7-Feb-82 18:40:09 Laaser Added IsAlphanumeric interface principally for Cusp in the international product. 11-Mar-82 K.Akada - Fixed Proc Hiragana and Katakana . 16-Jun-82 8:48:25 Tripp Extend character set range for Chinese kanji. 14-Jul-82 18:45:04 Finkel Added warning comment about changing Chset TYPE 8-Sep-82 21:34:12 Becker/Tripp Added Character Set 4: Chinese "Bopomofo" Phonetic Alphabet 17-Mar-83 14:15:56 Tse Return the original char if no uppercase/lowercase in UpperCase40 & LowerCase40. Special case for lower case sigma char.