Rope.mesa
Copyright Ó 1984, 1985, 1986, 1987, 1988, 1991, 1992 by Xerox Corporation. All rights reserved.
created by Russ Atkinson
Beach, February 21, 1985 9:26:19 am PST
Russ Atkinson (RRA) July 13, 1987 9:50:50 pm PDT
Christian Jacobi, January 12, 1988 5:42:52 pm PST
Carl Hauser, February 10, 1988 3:00:08 pm PST
Willie-s, August 1, 1991 5:28 pm PDT
Michael Plass, February 21, 1992 4:30 pm PST
Doug Wyatt, November 22, 1991 4:26 pm PST
A ROPE is (nominally) an immutable object containing a sequence of characters indexed starting at 0 for Length characters. The representation allows certain operations to be performed without copying all of the characters at the expense of adding additional nodes to the object. The use of immutable ropes alleviates concerns about "ownership" of the storage, and allows sharing of ropes by concurrent processes.
For a more complete explanation of ropes, see RopeDoc.tioga.
Implementation note: the bit pattern of the text variant is GUARANTEED to be consistent with the built-in Cedar type TEXT, although the types will not agree either at compile-time or runtime. This means that one can Rope.Text or REF TEXT interchangeably provided that the runtime type is not examined, that the contents are not modified, and provided that the compiler can be fooled (via LOOPHOLE). Since REF TEXT is bitwise compatible with Mesa LONG STRING for lengths < 32K characters, it is OK to pass flat ropes to procedures that expect LONG STRING, provided that those routines do not modify the contents. For short STRING you are on your own (see ConvertUnsafe).
DIRECTORY
Basics USING [Comparison, UnsafeBlock];
Rope: CEDAR DEFINITIONS = BEGIN
Part 1: Basic operations and definitions
ROPE: TYPE = REF RopeRep;
XROPE: TYPE = REF XRopeRep; -- a supertype of ROPE for larger character sets (see XRope)
Comparison: TYPE = Basics.Comparison;
Stand-in for Basics.Comparison
UnsafeBlock: TYPE = Basics.UnsafeBlock;
Stand-in for Basics.UnsafeBlock
NoRope: ERROR;
... is signalled if rope is invalid variant or some other invariant has broken. This is a serious error, and indicates that either storage is corrupted, or the user has supplied a bad routine when making a rope, or some other nasty bug.
Note: BoundsFault = RuntimeError.BoundsFault; it is raised by many of the Rope operations.
There are no values of a len parameter that raise errors: if len is too long it is shortened to indicate the rest of the rope, and if len < 0 the behavior will be as if len = 0.
Cat: PROC [r1, r2, r3: ROPE, r4, r5: ROPE ¬ NIL] RETURNS [ROPE];
... returns the concatenation of up to five ropes (limit based on eval stack depth). BoundsFault occurs if the result would be longer than LAST[INT].
Concat: PROC [base, rest: ROPE] RETURNS [ROPE];
... is the two-rope (faster) version of Cat. BoundsFault occurs if the result would be longer than LAST[INT].
Compare: PROC [s1, s2: ROPE, case: BOOL ¬ TRUE] RETURNS [Comparison];
... returns the lexicographic comparison of the two ropes based on CHAR collating sequence.
case => case of characters is significant.
Equal: PROC [s1, s2: ROPE, case: BOOL ¬ TRUE] RETURNS [BOOL];
... tests contents equality of s1 and s2. Faster than Compare.
case => case of characters is significant.
Fetch: PROC [base: ROPE, index: INT ¬ 0] RETURNS [CHAR];
... fetches indexed character from given ropes.
BoundsFault occurs if index < 0 or index is >= Length[base].
Index: PROC [s1: ROPE, pos1: INT ¬ 0, s2: ROPE, case: BOOL ¬ TRUE] RETURNS [INT];
... returns the smallest character position N such that N >= pos1 and Equal[Substr[s1, N, Length[s2]], s2, case]. If s2 does not occur in s1 at or after pos1, Length[s1] is returned.
case => case of characters is significant. BoundsFault occurs when pos1 < 0.
Find: PROC [s1, s2: ROPE, pos1: INT ¬ 0, case: BOOL ¬ TRUE] RETURNS [INT];
... is like Index, returning the smallest character position N such that N >= pos1 and Equal[Substr[s1, N, Length[s2]], s2, case], except that Find returns -1 if s2 is not found.
case => case of characters is significant. BoundsFault occurs when pos1 < 0.
FindBackward: PROC [s1, s2: ROPE, pos1: INT ¬ MaxLen, case: BOOL ¬ TRUE] RETURNS [INT];
... is like Find, except that it returns the largest character position N such that N <= pos1.
IsEmpty: PROC [r: XROPE] RETURNS [BOOL];
... is equivalent to Length[r] = 0.
Length: PROC [base: XROPE] RETURNS [INT];
... returns the # of characters in the given rope.
Size: PROC [base: XROPE] RETURNS [INT];
Size[base] = Length[base]
Replace: PROC [base: ROPE, start: INT ¬ 0, len: INT ¬ MaxLen, with: ROPE ¬ NIL]
RETURNS [ROPE];
... returns rope with given range replaced by new. BoundsFault occurs when start < 0 or start > Length[base] or the result would be longer than LAST[INT].
Substr: PROC [base: ROPE, start: INT ¬ 0, len: INT ¬ MaxLen] RETURNS [ROPE];
... returns a subrope of the base. BoundsFault occurs if start < 0 or start > Length[base].
CompareSubstrs: PROC [
s1: ROPE, start1: INT ¬ 0, len1: INT ¬ MaxLen,
s2: ROPE, start2: INT ¬ 0, len2: INT ¬ MaxLen,
case: BOOL ¬ TRUE] RETURNS [Comparison];
... returns Compare[Substr[s1, start1, len1], Substr[s2, start2, len2], case].
EqualSubstrs: PROC [
s1: ROPE, start1: INT ¬ 0, len1: INT ¬ MaxLen,
s2: ROPE, start2: INT ¬ 0, len2: INT ¬ MaxLen,
case: BOOL ¬ TRUE] RETURNS [BOOL];
... returns Equal[Substr[s1, start1, len1], Substr[s2, start2, len2], case].
Part 2: Extended operations and definitions
Literal: PROC [rope: ROPE] RETURNS [ROPE] ~ INLINE { RETURN[rope] };
The identity function, useful to convey the desired type of a literal to the compiler in certain contexts, e.g. when the target type is a REF ANY.
Run: PROC [s1: ROPE, pos1: INT ¬ 0, s2: ROPE, pos2: INT ¬ 0,
case: BOOL ¬ TRUE, len: INT ¬ MaxLen] RETURNS [INT];
... returns largest number of chars N <= len such that EqualSubstrs[s1,pos1,N, s2,pos2,N, case].
IsPrefix: PROC [prefix: ROPE, subject: ROPE, case: BOOL ¬ TRUE] RETURNS [BOOL];
... returns Run[s1: prefix, s2: subject, case: case]=Length[prefix]; that is, returns TRUE iff prefix is a prefix of subject.
Match: PROC [pattern, object: ROPE, case: BOOL ¬ TRUE] RETURNS [BOOL];
... returns TRUE iff object matches the pattern, where the pattern may contain * to indicate that 0 or more characters will match. If case is true, then case matters.
SkipTo: PROC [s: ROPE, pos: INT ¬ 0, skip: ROPE] RETURNS [INT];
... returns the lowest position N in s such that s[N] is in the skip rope and N >= pos. If pos > Length[s] or no such character occurs in s, then return Length[s]. BoundsFault occurs when pos < 0.
SkipOver: PROC [s: ROPE, pos: INT ¬ 0, skip: ROPE] RETURNS [INT];
... returns the lowest position N in s such that s[N] is NOT in the skip rope and N >= pos. If pos > Length[s] or no such character occurs in s, then return Length[s]. BoundsFault occurs when pos < 0.
Map: PROC [base: ROPE, start: INT ¬ 0, len: INT ¬ MaxLen, action: ActionType]
RETURNS [BOOL];
... applies the action to the given range of characters in the rope. Returns TRUE when some action returns TRUE. BoundsFault occurs when start < 0 or start > Length[base].
Translate: PROC [base: ROPE, start: INT ¬ 0, len: INT ¬ MaxLen,
translator: TranslatorType ¬ NIL] RETURNS [new: ROPE];
... applies the translation to get a new rope. If the resulting size > 0, then new does not share with the original rope! If xtranslator = NIL, the identity translation is performed for extented characters. If translator = NIL, the xtranslator is used for all characters.
Lower: TranslatorType;
... changes chars to lower case
Upper: TranslatorType;
... changes chars to upper case
Flatten: PROC [base: ROPE, start: INT ¬ 0, len: INT ¬ MaxLen] RETURNS [Text];
... returns a flat rope from the given range of characters. BoundsFault occurs if the resulting length would be > LAST[TextBound].
FromProc: PROC [len: INT, p: PROC RETURNS [CHAR], maxPiece: INT ¬ MaxLen]
RETURNS [ROPE];
... returns a new rope given a length and a proc to apply for each CHAR
FromChars: PROC [PROC [PROC [CHAR]]] RETURNS [ROPE];
... calls a generating proc which calls its argument proc once for each CHAR
FromRopes: PROC [PROC [PROC [ROPE]]] RETURNS [ROPE];
... calls a generating proc which calls its argument proc once for each ROPE
it's best for the ropes to have approximately uniform size
NewText: PROC [size: TextBound] RETURNS [Text];
... returns a new Rope.Text, contents uninitialized, Length[NewText[size]] = size
FromRefText: PROC [s: REF READONLY TEXT,
start: TextBound ¬ 0, len: TextBound ¬ TextBound.LAST] RETURNS [Text];
... makes a rope from a (substring of a) REF READONLY TEXT
causes copying, can raise BoundsFault for invalid start
ToRefText: PROC [base: ROPE] RETURNS [REF TEXT];
... makes a REF TEXT from a rope
causes copying, can raise BoundsFault if the copied part can't fit
FromChar: PROC [c: CHAR] RETURNS [Text];
... makes a rope from a character
MakeRope: PROC [base: REF, size: INT, fetch: FetchType,
map: MapType ¬ NIL, move: MoveType ¬ NIL] RETURNS [ROPE];
... returns a rope using user-supplied procedures and data. Note that the user procedures MUST survive as long as the rope does!
MakeConstantRope: PROC [char: CHAR, length: INT] RETURNS [ROPE];
result.Length[] = length.
result.Fetch[i] = char, for i in [0 .. length).
AppendChars: PROC [buffer: REF TEXT, rope: ROPE, start: INT ¬ 0, len: INT ¬ MaxLen]
RETURNS [charsMoved: TextBound];
... appends characters to the end of a REF TEXT buffer, starting at start within the rope. The move stops if there are no more characters from the rope OR len characters have been moved OR the buffer is full (buffer.length = buffer.maxLength); charsMoved is always the # of characters appended. NOTE: the user is responsible for protecting buffer from concurrent modifications.
UnsafeMoveChars: UNSAFE PROC [block: UnsafeBlock, rope: ROPE, start: INT ¬ 0]
RETURNS [charsMoved: INT];
... moves characters to an UnsafeBlock, starting at start within the rope. The move stops if there are no more characters from the rope OR block.count characters have been moved; charsMoved is always the # of characters moved. This UNSAFE operation is mainly for the implementation of IO.RIS. Most clients should use AppendChars.
ContainingPiece: PROC [rope: ROPE, index: INT ¬ 0]
RETURNS [base: ROPE, start: INT, len: INT];
... finds the largest piece containg the given index such that the result is either a text or an object variant. (NIL, 0, 0) is returned if the index is NOT in the given rope.
Balance: PROC [base: ROPE, start: INT ¬ 0, len: INT ¬ MaxLen, flat: INT ¬ FlatMax]
RETURNS [ROPE];
... returns a balanced rope, possibly with much copying of components
flat' ¬ MIN[MAX[flat,FlatMax], LAST[TextBound]]
len' ¬ MIN[MAX[len,0], Size[base]-start]
start < 0 OR start > Size[base] => bounds fault
the resulting maxDepth will be limited by 2+log2[len'/flat']
VerifyStructure: PROC [s: ROPE] RETURNS [leaves, nodes, maxDepth: INT];
... traverses the structure of the given rope; return the number of leaves, nodes and the max depth of the rope extra checking is performed to verify invariants a leaf is a text or object variant a node is a non-NIL, non-leaf variant shared leaves and nodes are multiply counted.
VerifyFailed: ERROR;
occurs when VerifyStructure finds a bad egg
should not happen, of course
Part 3: Miscellaneous definitions
FetchType: TYPE = PROC [data: REF, index: INT] RETURNS [CHAR];
... is the type of fetch routine used to make a user rope.
MapType: TYPE = PROC [base: REF, start, len: INT, action: ActionType]
RETURNS [quit: BOOL ¬ FALSE];
... is the type of user routine used to map over a subrope; returns TRUE if some action returns TRUE.
MoveType: TYPE = UNSAFE PROC [block: UnsafeBlock, data: REF, start: INT]
RETURNS [charsMoved: INT ¬ 0];
... is the type of user routine used to move characters to an UnsafeBlock. The move should stop if there are no more characters from the rope OR block.count characters have been moved; charsMoved is always the # of characters moved. The data given is from the object variant. This uses an UnsafeBlock in order to integrate well with IO.UnsafeGetBlock.
ActionType: TYPE = PROC [c: CHAR] RETURNS [quit: BOOL ¬ FALSE];
... is the type of routine applied when mapping; returns TRUE to quit from Map.
TranslatorType: TYPE = PROC [old: CHAR] RETURNS [CHAR];
... is the type of routine supplied to Translate.
StringBound: TYPE = CARD16;
The bounds type for LONG STRING
TextBound: TYPE = NAT15;
The bounds type for REF TEXT
UncheckedFlat: TYPE = POINTER TO UncheckedFlatRep;
UncheckedFlatRep: TYPE = MACHINE DEPENDENT RECORD [
May be used by TRUSTED code for high-performance access to a Text variant.
No user-servicable parts inside; refer servicing to qualified personnel.
len: StringBound,
max: StringBound,
chars: PACKED SEQUENCE COMPUTED StringBound OF CHAR
];
XRopeRep: PRIVATE TYPE = MACHINE DEPENDENT RECORD [
See RopeImpl for the official explanation of the RopeRep variants.
SELECT xtag: * FROM
eightbit => [
length: NAT15, -- Only valid for text variant
v: SELECT tag: * FROM
text => [PACKED SEQUENCE max: NAT15 OF CHAR],
node => [
depth: [0..2**13) ¬ 0,
cases: SELECT case: * FROM
substr => [size: INT, base: ROPE, start: INT],
concat => [size: INT, base, rest: ROPE, pos: INT],
replace => [size: INT, base, replace: ROPE, start, oldPos, newPos: INT],
object => [size: INT, base: REF, fetch: FetchType, map: MapType, move: MoveType]
ENDCASE
],
ENDCASE
],
extended => [
size: INT[0..MaxLen),
ext: REF ExtendedRep
],
ENDCASE
];
MaxLen: INT = LAST[INT];
FlatMax: TextBound = 24;
Text: TYPE = REF TextRep; -- the small, flat variant handle
TextRep: TYPE = XRopeRep.eightbit.text; -- useful for creating new text variants
RopeRep: TYPE = XRopeRep.eightbit; -- the byte-oriented variety
ExtendedRep: TYPE; -- For extension to larger character sets - operations in another interface
END.