Rope.mesa
Copyright © 1984, 1985 by Xerox Corporation. All rights reserved.
Paul Rovner, August 8, 1983 12:28 pm
Russ Atkinson, February 19, 1985 1:27:19 pm PST
Beach, February 21, 1985 9:26:19 am PST
Doug Wyatt, February 26, 1985 3:25:27 pm PST
A Rope is (nominally) an immutable object containing a sequence of characters indexed starting at 0 for Length characters. The representation allows certain operations to be performed without copying all of the characters, at the expense of adding additional nodes to the object. The use of immutable ropes alleviates concerns about "ownership" of the storage, and allows sharing of ropes by concurrent processes.
For a more complete explanation of ropes, see RopeDoc.tioga.
Implementation note: the bit pattern of the text variant is GUARANTEED to be consistent with the built-in Cedar type TEXT, although the types will not agree either at compile-time or runtime. This means that one can use Rope.Text and REF TEXT interchangeably provided that the runtime type is not examined, that the contents are not modified, and that the compiler can be fooled (via LOOPHOLE). Since REF TEXT is bitwise compatible with Mesa LONG STRING for lengths < 32K characters, it is OK to pass flat ropes to procedures that expect LONG STRING, provided that those routines do not modify the contents. For short STRING you are on your own (see ConvertUnsafe).
DIRECTORY
Basics USING [Comparison],
PrincOps USING [zBNDCK, zLI0, zRSTRL];
Rope: CEDAR DEFINITIONS
= BEGIN
Part 1: Basic operations and definitions
ROPE: TYPE = REF RopeRep;
NoRope: ERROR;
... is signalled if the rope is an invalid variant or some other invariant has been broken. This is a serious error, and indicates that either storage is corrupted, or the user has supplied a bad routine when making a rope, or there is some other nasty bug.
Note: BoundsFault = Runtime.BoundsFault; it is raised by many of the Rope operations.
There are no values of a len parameter that raise errors: if len is too long it is shortened to indicate the rest of the rope, and if len < 0 the behavior will be as if len = 0.
Cat: PROC [r1, r2, r3, r4, r5: ROPE ← NIL] RETURNS [ROPE];
... returns the concatenation of up to five ropes (limit based on eval stack depth). BoundsFault occurs if the result would be longer than LAST[INT].
Concat: PROC [base, rest: ROPE ← NIL] RETURNS [ROPE];
... is the two-rope (faster) version of Cat. BoundsFault occurs if the result would be longer than LAST[INT].
Compare: PROC [s1, s2: ROPE, case: BOOL ← TRUE] RETURNS [Basics.Comparison];
... returns the lexicographic comparison of the two ropes based on CHAR collating sequence. case => case of characters is significant.
Equal: PROC [s1, s2: ROPE, case: BOOL ← TRUE] RETURNS [BOOL];
... tests contents equality of s1 and s2. Faster than Compare. case => case of characters is significant.
Fetch: PROC [base: ROPE, index: INT ← 0] RETURNS [c: CHAR];
... fetches the indexed character from the given rope. BoundsFault occurs if index < 0 or index >= Length[base].
InlineFetch: PROC [base: ROPE, index: INT] RETURNS [c: CHAR] = TRUSTED INLINE {
... is the fast version of Fetch, since no procedure call is done when the rope is flat.
IF base # NIL THEN {
first: CARDINAL ← LOOPHOLE[base, LONG POINTER TO CARDINAL]^;
IF first < 100000B --base.tag=text-- THEN RETURN [QFX[base, index, first]];
};
RETURN [Fetch[base, index]];
};
Index: PROC [s1: ROPE, pos1: INT ← 0, s2: ROPE, case: BOOL ← TRUE] RETURNS [INT];
... returns the smallest character position N such that N >= pos1 and Equal[Substr[s1, N, Length[s2]], s2, case]. If s2 does not occur in s1 at or after pos1, Length[s1] is returned. case => case of characters is significant. BoundsFault occurs when pos1 < 0.
Find: PROC [s1, s2: ROPE, pos1: INT ← 0, case: BOOL ← TRUE] RETURNS [INT];
... is like Index, returning the smallest character position N such that N >= pos1 and Equal[Substr[s1, N, Length[s2]], s2, case], except that Find returns -1 if not found. case => case of characters is significant. BoundsFault occurs when pos1 < 0.
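The only difference between Index and Find is the value returned when s2 is not found. The sketch below models that contract in Python, with ropes represented as Python strings. The names find_rope and index_rope are hypothetical, IndexError stands in for BoundsFault, and case folding via str.lower is an assumption; this is an illustration of the documented behavior, not the Cedar implementation.

```python
def _equal(a: str, b: str, case: bool) -> bool:
    # case = True means the case of characters is significant.
    return a == b if case else a.lower() == b.lower()

def find_rope(s1: str, s2: str, pos1: int = 0, case: bool = True) -> int:
    """Smallest N >= pos1 where s2 occurs in s1, or -1 if there is none."""
    if pos1 < 0:
        raise IndexError("BoundsFault: pos1 < 0")
    n2 = len(s2)
    for n in range(pos1, len(s1) - n2 + 1):
        if _equal(s1[n:n + n2], s2, case):
            return n
    return -1

def index_rope(s1: str, pos1: int, s2: str, case: bool = True) -> int:
    """Same search as find_rope, but returns len(s1) when s2 is not found."""
    n = find_rope(s1, s2, pos1, case)
    return len(s1) if n < 0 else n
```

Index suits callers that go on to slice with the result (the not-found value is still a valid position), while Find suits callers that want an explicit failure value to test.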
IsEmpty: PROC [r: ROPE] RETURNS [BOOL];
... is equivalent to Length[r] = 0.
InlineIsEmpty: PROC [r: ROPE] RETURNS [BOOL] = TRUSTED INLINE {
... is the fast version of IsEmpty.
IF r = NIL THEN RETURN [TRUE];
{ first: INTEGER ← LOOPHOLE[r, LONG POINTER TO INTEGER]^;
IF first > 0 THEN RETURN [FALSE];
IF first = 0 THEN RETURN [TRUE];
};
RETURN [LOOPHOLE[r, REF node RopeRep].size = 0];
};
Length: PROC [base: ROPE] RETURNS [INT];
... returns the # of characters in the given rope.
InlineLength: PROC [base: ROPE] RETURNS [INT] = TRUSTED INLINE {
... returns Length[base].
IF base = NIL THEN RETURN [0];
{ first: INTEGER ← LOOPHOLE[base, LONG POINTER TO INTEGER]^;
IF first >= 0 THEN RETURN [ExtendPositive[first]];
};
RETURN [LOOPHOLE[base, REF node RopeRep].size];
};
Size: PROC [base: ROPE] RETURNS [INT];
Size[base] = Length[base]
InlineSize: PROC [base: ROPE] RETURNS [INT] = TRUSTED INLINE {
InlineSize[base] = InlineLength[base].
Duplicates InlineLength to avoid a compiler bug dealing with copying of inline procedures.
IF base = NIL THEN RETURN [0];
{ first: INTEGER ← LOOPHOLE[base, LONG POINTER TO INTEGER]^;
IF first >= 0 THEN RETURN [ExtendPositive[first]];
};
RETURN [LOOPHOLE[base, REF node RopeRep].size];
};
Replace: PROC [base: ROPE, start: INT ← 0, len: INT ← MaxLen, with: ROPE ← NIL]
RETURNS [ROPE];
... returns a rope with the given range replaced by the contents of with. BoundsFault occurs when start < 0 or start > Length[base] or the result would be longer than LAST[INT].
Substr: PROC [base: ROPE, start: INT ← 0, len: INT ← MaxLen] RETURNS [ROPE];
... returns a subrope of the base. BoundsFault occurs if start < 0 or start > Length[base].
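The rules for start and len stated above (len never raises an error, start can) are modelled below in Python, with a rope represented as a Python string. The function name substr and the constant MAX_LEN are hypothetical; the value 2**31 - 1 assumes a 32-bit INT, and IndexError stands in for BoundsFault.

```python
MAX_LEN = 2**31 - 1  # assumption: stands in for MaxLen = LAST[INT] with a 32-bit INT

def substr(base: str, start: int = 0, length: int = MAX_LEN) -> str:
    """Model of Substr: start is bounds-checked, length is clamped."""
    if start < 0 or start > len(base):
        raise IndexError("BoundsFault: start out of range")
    # len < 0 behaves as len = 0; an overlong len means "the rest of the rope".
    length = min(max(length, 0), len(base) - start)
    return base[start:start + length]
```

Note that start = Length[base] is legal and yields an empty result; only start > Length[base] faults.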
Part 2: Extended operations and definitions
Run: PROC [s1: ROPE, pos1: INT ← 0, s2: ROPE, pos2: INT ← 0, case: BOOL ← TRUE]
RETURNS [INT];
... returns largest number of chars N such that Equal[Substr[s1,pos1,N], Substr[s2,pos2,N], case]. 0 is returned if pos1 >= Length[s1] or pos2 >= Length[s2]. BoundsFault is raised if pos1 < 0 or pos2 < 0.
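As an illustration of the Run contract, here is a small Python model with ropes represented as strings. The name run is hypothetical, IndexError stands in for BoundsFault, and case folding via str.lower is an assumption.

```python
def run(s1: str, pos1: int, s2: str, pos2: int, case: bool = True) -> int:
    """Length of the longest common run starting at s1[pos1] and s2[pos2]."""
    if pos1 < 0 or pos2 < 0:
        raise IndexError("BoundsFault: negative position")
    n = 0
    # The loop never starts if either position is at or past the end,
    # which gives the documented result of 0 in that case.
    while pos1 + n < len(s1) and pos2 + n < len(s2):
        a, b = s1[pos1 + n], s2[pos2 + n]
        if not case:
            a, b = a.lower(), b.lower()
        if a != b:
            break
        n += 1
    return n
```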
Match: PROC [pattern, object: ROPE, case: BOOL ← TRUE] RETURNS [BOOL];
... returns TRUE iff object matches the pattern, where the pattern may contain * to indicate that 0 or more characters will match. If case is true, then case matters.
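One way to implement matching where * is the only wildcard is a single pass with backtracking to the most recent *. The Python sketch below shows that technique on strings; the name match is hypothetical, the algorithm is a common choice and not necessarily the one used by the Cedar implementation, and case folding via str.lower is an assumption.

```python
def match(pattern: str, obj: str, case: bool = True) -> bool:
    """True iff obj matches pattern, where * matches 0 or more characters."""
    if not case:
        pattern, obj = pattern.lower(), obj.lower()
    p = o = 0          # current positions in pattern and obj
    star = -1          # position of the most recent * in pattern, if any
    mark = 0           # position in obj where that * started matching
    while o < len(obj):
        if p < len(pattern) and pattern[p] == "*":
            star, mark = p, o      # tentatively let * match nothing
            p += 1
        elif p < len(pattern) and pattern[p] == obj[o]:
            p += 1
            o += 1
        elif star >= 0:
            mark += 1              # let the last * absorb one more character
            o = mark
            p = star + 1
        else:
            return False
    # Any trailing * in the pattern can match the empty remainder.
    while p < len(pattern) and pattern[p] == "*":
        p += 1
    return p == len(pattern)
```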
SkipTo: PROC [s: ROPE, pos: INT ← 0, skip: ROPE] RETURNS [INT];
... returns the lowest position N in s such that s[N] is in the skip rope and N >= pos. If pos > Length[s] or no such character occurs in s, then return Length[s]. BoundsFault occurs when pos < 0.
SkipOver: PROC [s: ROPE, pos: INT ← 0, skip: ROPE] RETURNS [INT];
... returns the lowest position N in s such that s[N] is NOT in the skip rope and N >= pos. If pos > Length[s] or no such character occurs in s, then return Length[s]. BoundsFault occurs when pos < 0.
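SkipTo and SkipOver are complementary scans, and alternating them is a simple way to split a rope into tokens. The Python model below uses strings for ropes; the names skip_to, skip_over and tokens are hypothetical, and IndexError stands in for BoundsFault.

```python
def skip_to(s: str, pos: int, skip: str) -> int:
    """Lowest N >= pos with s[N] in skip, else len(s)."""
    if pos < 0:
        raise IndexError("BoundsFault: pos < 0")
    for n in range(pos, len(s)):
        if s[n] in skip:
            return n
    return len(s)

def skip_over(s: str, pos: int, skip: str) -> int:
    """Lowest N >= pos with s[N] NOT in skip, else len(s)."""
    if pos < 0:
        raise IndexError("BoundsFault: pos < 0")
    for n in range(pos, len(s)):
        if s[n] not in skip:
            return n
    return len(s)

def tokens(s: str, separators: str = " ") -> list:
    """Split s on runs of separator characters using the two scans."""
    out, pos = [], 0
    while True:
        start = skip_over(s, pos, separators)   # skip leading separators
        if start == len(s):
            return out
        pos = skip_to(s, start, separators)     # scan to the end of the token
        out.append(s[start:pos])
```

Because both scans return Length[s] instead of faulting when pos is past the end, the loop needs no special case for the final token.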
Map: PROC [base: ROPE, start: INT ← 0, len: INT ← MaxLen, action: ActionType]
RETURNS [BOOL];
... applies the action to the given range of characters in the rope. Returns TRUE when some action returns TRUE. BoundsFault occurs when start < 0 or start > Length[base].
Translate: PROC [base: ROPE, start: INT ← 0, len: INT ← MaxLen,
translator: TranslatorType ← NIL] RETURNS [new: ROPE];
... applies the translation to get a new rope. If the resulting size > 0, then new does not share with the original rope! If translator = NIL, the identity translation is performed.
Flatten: PROC [base: ROPE, start: INT ← 0, len: INT ← MaxLen] RETURNS [Text];
... returns a flat rope from the given range of characters. BoundsFault occurs if the resulting length would be > LAST[NAT].
InlineFlatten: PROC [r: ROPE] RETURNS [Text] = TRUSTED INLINE {
... is the fast version of Flatten, since there is no procedure call for something already flat.
IF r = NIL THEN RETURN [NIL];
IF LOOPHOLE[r, LONG POINTER TO INTEGER]^ >= 0 THEN RETURN [LOOPHOLE[r]];
RETURN [Flatten[r]]
};
FromProc: PROC [len: INT, p: PROC RETURNS [CHAR], maxPiece: INT ← MaxLen]
RETURNS [ROPE];
... returns a new rope given a proc to apply for each CHAR
NewText: PROC [size: NAT] RETURNS [Text];
... returns a new Rope.Text, contents uninitialized, Length[NewText[size]] = size
FromRefText: PROC [s: REF READONLY TEXT] RETURNS [Text];
... makes a rope from a REF READONLY TEXT
causes copying
ToRefText: PROC [base: ROPE] RETURNS [REF TEXT];
... makes a REF TEXT from a rope
causes copying
FromChar: PROC [c: CHAR] RETURNS [Text];
... makes a rope from a character
MakeRope: PROC [base: REF, size: INT, fetch: FetchType,
map: MapType ← NIL, append: AppendCharsType ← NIL] RETURNS [ROPE];
... returns a rope using user-supplied procedures and data. Note that the user procedures MUST survive as long as the rope does!
AppendChars: PROC [buffer: REF TEXT, rope: ROPE, start: INT ← 0, len: INT ← LAST[INT]]
RETURNS [charsMoved: NAT];
... appends characters to the end of a REF TEXT buffer, starting at start within the rope. The move stops if there are no more characters from the rope OR len characters have been moved OR the buffer is full (buffer.length = buffer.maxLength). charsMoved is always the # of characters appended. NOTE: the user is responsible for protecting buffer from concurrent modifications.
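The three stopping conditions can be seen in a small Python model. The class RefText and the function append_chars are hypothetical stand-ins for REF TEXT and AppendChars, with the rope represented as a string. The text does not say what happens for a negative start, so rejecting it here is an assumption of the sketch.

```python
MAX_LEN = 2**31 - 1  # assumption: stands in for LAST[INT] with a 32-bit INT

class RefText:
    """Stand-in for a REF TEXT buffer with a fixed maxLength."""
    def __init__(self, max_length: int):
        self.max_length = max_length
        self.chars = []

    @property
    def length(self) -> int:
        return len(self.chars)

def append_chars(buffer: RefText, rope: str, start: int = 0,
                 length: int = MAX_LEN) -> int:
    """Append up to length chars of rope[start:] and return the count moved."""
    if start < 0:
        raise IndexError("assumed fault: start < 0")  # not specified by the source
    moved = 0
    while (moved < length                          # len characters moved
           and start + moved < len(rope)           # rope exhausted
           and buffer.length < buffer.max_length): # buffer full
        buffer.chars.append(rope[start + moved])
        moved += 1
    return moved
```

A caller that wants the whole rope in fixed-size pieces can loop, advancing start by charsMoved and emptying the buffer each time, until 0 is returned.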
ContainingPiece: PROC [rope: ROPE, index: INT ← 0]
RETURNS [base: ROPE, start: INT, len: INT];
... finds the largest piece containing the given index such that the result is either a text or an object variant. (NIL, 0, 0) is returned if the index is NOT in the given rope.
Balance: PROC [base: ROPE, start: INT ← 0, len: INT ← MaxLen, flat: INT ← FlatMax]
RETURNS [ROPE];
... returns a balanced rope, possibly with much copying of components
flat' ← MIN[MAX[flat,FlatMax], LAST[NAT]]
len' ← MIN[MAX[len,0], Size[base]-start]
start < 0 OR start > Size[base] => bounds fault
the resulting maxDepth will be limited by 2+log2[len'/flat']
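The parameter normalization and the depth limit above can be worked through numerically. The Python sketch below computes flat', len' and the stated bound 2+log2[len'/flat'] for a rope of a given size. The function name balance_params is hypothetical, LAST_NAT = 32767 assumes a 16-bit INTEGER, MAX_LEN assumes a 32-bit INT, and IndexError stands in for the bounds fault. The sketch returns None for the bound when len' = 0, since the logarithm is undefined there.

```python
import math

FLAT_MAX = 24         # FlatMax from this interface
LAST_NAT = 32767      # assumption: LAST[NAT] with a 16-bit INTEGER
MAX_LEN = 2**31 - 1   # assumption: MaxLen = LAST[INT] with a 32-bit INT

def balance_params(size: int, start: int = 0, length: int = MAX_LEN,
                   flat: int = FLAT_MAX):
    """Return (flat', len', depth bound) as described for Balance."""
    if start < 0 or start > size:
        raise IndexError("BoundsFault: start out of range")
    flat_p = min(max(flat, FLAT_MAX), LAST_NAT)   # flat' is never below FlatMax
    len_p = min(max(length, 0), size - start)     # len' is clamped to the rope
    bound = 2 + math.log2(len_p / flat_p) if len_p > 0 else None
    return flat_p, len_p, bound
```

For example, a rope of 24 * 1024 = 24576 characters balanced with the default flat gives len'/flat' = 1024, so the depth is limited by 2 + 10 = 12.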
VerifyStructure: PROC [s: ROPE] RETURNS [leaves, nodes, maxDepth: INT];
... traverses the structure of the given rope; returns the number of leaves, nodes and the max depth of the rope
extra checking is performed to verify invariants
a leaf is a text or object variant
a node is a non-NIL, non-leaf variant
shared leaves and nodes are multiply counted
VerifyFailed: ERROR;
occurs when VerifyStructure finds a bad egg
should not happen, of course
Part 3: Miscellaneous definitions
FetchType: TYPE = PROC [data: REF, index: INT] RETURNS [CHAR];
... is the type of fetch routine used to make a user rope.
MapType: TYPE = PROC [base: REF, start, len: INT, action: ActionType]
RETURNS [quit: BOOL ← FALSE];
... is the type of user routine used to map over a subrope; returns TRUE if some action returns TRUE.
ActionType: TYPE = PROC [c: CHAR] RETURNS [quit: BOOL ← FALSE];
... is the type of routine applied when mapping; returns TRUE to quit from Map.
TranslatorType: TYPE = PROC [old: CHAR] RETURNS [new: CHAR];
... is the type of routine supplied to Translate.
AppendCharsType: TYPE = PROC [buffer: REF TEXT, data: REF, start: INT, len: INT]
RETURNS [charsMoved: NAT];
... is the type of user routine used to append characters to the end of a REF TEXT buffer. The move should stop if there are no more characters from the rope OR len characters have been moved OR the buffer is full (buffer.length = buffer.maxLength). charsMoved is always the # of characters appended. The data given is from the object variant.
ExtendPositive: PRIVATE PROC [x: INTEGER] RETURNS [INT]
= TRUSTED MACHINE CODE { PrincOps.zLI0 };
Takes an integer assumed to be positive and makes it into an INT.
QFetch: PRIVATE PROC [base: Text, index: CARDINAL] RETURNS [CHAR]
= TRUSTED MACHINE CODE { PrincOps.zRSTRL, 4 };
quick fetch using the index, no NIL or bounds checking
QFX: PRIVATE PROC [base: ROPE, index: CARDINAL, bound: CARDINAL] RETURNS [CHAR]
= TRUSTED MACHINE CODE { PrincOps.zBNDCK; PrincOps.zRSTRL, 4 };
quick fetch using the index & bounds checking
RopeRep: PRIVATE TYPE = RECORD [
SELECT tag: * FROM
text => [length: NAT, text: PACKED SEQUENCE max: CARDINAL OF CHAR],
node => [
size: INT,
cases: SELECT case: * FROM
substr => [base: ROPE, start: INT, depth: INTEGER],
concat => [base, rest: ROPE, pos: INT, depth: INTEGER],
replace => [base, replace: ROPE, start, oldPos, newPos: INT, depth: INTEGER],
object => [base: REF, fetch: FetchType, map: MapType, append: AppendCharsType]
ENDCASE
]
ENDCASE
];
MaxLen: INT = LAST[INT];
FlatMax: CARDINAL = 24;
Text: TYPE = REF TextRep; -- the small, flat variant handle
TextRep: TYPE = RopeRep[text]; -- useful for creating new text variants
END.
For those who care, this is the official explanation of the RopeRep variants:
Note: NIL is allowed as a valid ROPE.
Note: ALL integer components of the representation must be non-negative.
SELECT x: x FROM
text => {
[0..x.length) is the range of char indexes
[0..x.max) is the number of chars of storage reserved
all Rope operations creating new text objects init x.length = x.max
x.length <= x.max is required
the bit pattern of the text IS IDENTICAL to TEXT and StringBody!!!!
};
node => {
[0..x.size) is the range of char indexes
SELECT x:x FROM
substr => {
x.base contains chars indexed by [0..x.size)
[0..x.size) in x ==> [x.start..x.start+x.size) in x.base
Size[x.base] >= x.start + x.size
};
concat => {
x.base contains chars indexed by [0..x.pos)
[0..x.pos) in x ==> [0..x.pos) in x.base
x.rest contains the chars indexed by [x.pos..x.size)
[x.pos..x.size) in x ==> [0..x.size-x.pos) in x.rest
x.pos = Size[x.base] AND x.size = x.pos + Size[x.rest]
};
replace => {
x.base contains chars indexed by [0..x.start), [x.newPos..x.size)
[0..x.start) in x ==> [0..x.start) in x.base
[x.newPos..x.size) in x ==> [x.oldPos..Size[x.base]) in x.base
x.replace contains the chars indexed by [x.start..x.newPos)
[x.start..x.newPos) in x ==> [0..x.newPos-x.start) in x.replace
x.size >= x.newPos >= x.start AND x.oldPos >= x.start
x.size - x.newPos = Size[x.base] - x.oldPos
};
object => {
x.base is the data needed by the user-supplied operations
x.fetch[x.base, i] should fetch the ith char AND x.fetch # NIL
x.map[x.base, st, len, action] implements Map[x, st, len, action]
x.append[buffer, x.base, st, len] implements AppendChars[buffer, x, st, len]
it is OK to have x.map = NIL OR x.append = NIL
};
ENDCASE => ERROR NoRope
}
ENDCASE => ERROR NoRope
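The index mappings above are enough to write a fetch that walks down the structure without copying. The Python sketch below models the text, substr, concat and replace variants as immutable dataclasses and applies exactly those mappings. All class and function names are hypothetical, the depth field and the object variant are omitted for brevity, and IndexError stands in for BoundsFault; it is a model of the invariants, not the Cedar implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Text:
    chars: str
    @property
    def size(self) -> int:
        return len(self.chars)

@dataclass(frozen=True)
class Substr:
    size: int
    base: object
    start: int

@dataclass(frozen=True)
class Concat:
    size: int
    base: object
    rest: object
    pos: int          # pos = Size[base]

@dataclass(frozen=True)
class Replace:
    size: int
    base: object
    replace: object
    start: int
    old_pos: int      # first index in base after the replaced range
    new_pos: int      # first index in the result after the new chars

def substr(base, start, length):
    return Substr(length, base, start)

def concat(base, rest):
    return Concat(base.size + rest.size, base, rest, base.size)

def replace(base, start, length, new):
    size = base.size - length + new.size
    return Replace(size, base, new, start, start + length, start + new.size)

def fetch(x, i: int) -> str:
    if i < 0 or i >= x.size:
        raise IndexError("BoundsFault")
    while True:
        if isinstance(x, Text):
            return x.chars[i]
        if isinstance(x, Substr):
            x, i = x.base, i + x.start
        elif isinstance(x, Concat):
            if i < x.pos:
                x = x.base
            else:
                x, i = x.rest, i - x.pos
        elif i < x.start:                      # Replace: prefix from base
            x = x.base
        elif i < x.new_pos:                    # Replace: the new chars
            x, i = x.replace, i - x.start
        else:                                  # Replace: suffix from base
            x, i = x.base, i - x.new_pos + x.old_pos

def to_str(x) -> str:
    return "".join(fetch(x, i) for i in range(x.size))
```

Each operation allocates one small node and shares its operands, which is the trade described at the top of this file: no copying of characters, at the cost of extra nodes that every fetch must walk through.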