RopeDoc.tioga
Copyright © 1985 by Xerox Corporation. All rights reserved.
Russ Atkinson (RRA) January 29, 1985 10:18:25 am PST
Rope Documentation
ROPE is Cedar's standard "string" type: an immutable garbage-collected sequence of characters. The Rope interface provides a large set of useful operations on ropes, including rope concatenation, subrope extraction, and rope scanning. A client can provide his own specialized implementation of ROPE. The standard implementation of ROPE avoids copying for most rope concatenation and subrope extraction operations. Because they are immutable, ropes can be safely shared between programs without concern for storage ownership or synchronization.
1. Introduction to Rope
A rope (type Rope.ROPE) is a garbage-collected sequence of characters. A rope is immutable; the sequence of characters denoted by a rope never changes. Ropes may be shared freely among independently-written applications, since no application will hand out a rope and then attempt to free its storage or somehow alter the characters it contains. Because of this, rope is the standard "string" type for public Cedar interfaces.
Rope is an interface that defines the type ROPE and a set of operations on ropes. Rope.mesa and Rope.bcd are brought over during Cedar installation, and the Rope interface's implementation is included in the Cedar boot file. Most Cedar programs either define ROPE: TYPE = Rope.ROPE immediately, or else OPEN Rope with a USING list. The object-style notation applies to many procedures in the Rope interface (r.Fetch[i] is the same as Rope.Fetch[r, i], if r is a ROPE), so it is seldom necessary to give the interface name explicitly even if the interface is not opened.
The Cedar language provides rope literals. Such a literal is denoted by a quoted string, for instance, "This is a rope literal\n". The compiler determines that the quoted string is a rope (and not a STRING or REF TEXT) from the target type established by the context of the literal, as in r: ROPE ← "now is the time".
In the current implementation the type Rope.ROPE is a REF to a variant record. One variant just contains a sequence of characters; in this case the rope is a text rope, or Rope.Text. Other variants contain references to one or more other ropes. These variants are allow the rope implementation to avoid copying when performing most Cat (rope concatenaction), Substr (subrope extraction), and Replace (an optimized combination of Cat and Substr that is often useful) operations. A large rope created using these operations will be a directed acyclic graph. This structure is a compromise; many operations slow down as the graph becomes deeper, but the Cat, Substr, and Replace operations are much faster than they would be if they had to copy long arguments. Unless a client is striving for the ultimate in efficiency, he has no need to understand more details of the rope representation.
A client may use his own representation for a rope by implementing a small set of basic operations on the new representation. Examples of this are given later in this documentation.
A brief summary of all the operations in the Rope interface appears below. The goal of this summary is to convey the general character of what exists, rather than all of the details. So in this summary, procedures are grouped into classes according to their function. The detailed description that comes in the next section is arranged alphabetically by procedure name.
In all operations, NIL is equivalent to the null rope (""). However, not all ropes with length of zero are NIL; the correct way to test whether or not a rope r is null is by r.Length[] = 0. In fact, it is seldom useful to compare two ropes r1 and r2 using the built-in compare operation r1 = r2, since this just compares pointer values; Rope.Compare[r1, r2] should be used instead.
MaxLen is the maximum number of characters in a rope; it is LAST[INT] = 2147483647. The maximum number of characters in a text rope is LAST[NAT] = 32767.
The Rope interface always describes a range of characters in a rope (called a piece in the comments below) by specifying the first character position (start) and the number of characters to include (len). If start is negative or greater than the length of the rope, a bounds fault occurs (RuntimeError.BoundsFault, the same error used by array bounds checking). A negative value of len describes an empty piece, while an excessively large value of len describes a piece extending to the end of the rope. In the procedures below, start always defaults to 0 and len always defaults to MaxLen.
Several operations take a case argument, which means "case significant". If case is TRUE, upper case characters are treated as distinct from lower case characters; if case is FALSE, an upper case character is treated as the corresponding lower case character. In the procedures below, case always defaults to TRUE (case is significant).
Constructing a rope
FromChar [c:
CHAR]
RETURNS [Text]
returns a new rope containing the single character c
FromProc [len:
INT, p:
PROC
RETURNS [
CHAR], maxPiece:
INT]
RETURNS [
ROPE]
returns a new rope r, Length[r] = len, contents from sequentially calling p
FromRefText [s:
REF
TEXT]
RETURNS [Text]
returns a new rope, contents resulting from s
MakeRope
[base: REF ANY, size: INT, fetch: FetchType, map: MapType, pieceMap: PieceMapType]
RETURNS [ROPE]
returns a client-defined rope with given size and operations
The two primitive operations on a rope
Fetch [base:
ROPE, index:
INT]
RETURNS [
CHAR]
returns the indexed character in the rope (or raises BoundsFault)
Length [base:
ROPE]
RETURNS [
INT]
returns the number of characters in base.
Constructing a rope from other ropes
Cat [r1, r2, r3, r4, r5: ROPE ← NIL] RETURNS [ROPE]
Concat [base, rest:
ROPE]
RETURNS [
ROPE]
returns the concatenation of the given ropes
Replace [base:
ROPE, start, len:
INT, with:
ROPE]
RETURNS [
ROPE]
returns a new rope, contents resulting from base with the given piece replaced by replace
Substr [base:
ROPE, start, len:
INT]
RETURNS [
ROPE]
returns a new rope, contents from base (from start for len)
Translate [base:
ROPE, start, len:
INT, translator: TranslatorType]
RETURNS [
ROPE]
returns a new rope, contents from translating base (from start for len)
Comparing ropes
Compare [s1, s2:
ROPE, case:
BOOL]
RETURNS [Basics.Comparison]
returns lexicographic comparison of the contents
Equal [s1, s2:
ROPE, case:
BOOL]
RETURNS [
BOOL]
returns s1 = s2 (true iff s1 and s2 contain same sequence of characters, modulo case)
IsEmpty [r:
ROPE]
RETURNS [
BOOL]
returns Length[r] = 0
Simple pattern matching on ropes
Find [s1, s2:
ROPE, pos1:
INT, case:
BOOL]
RETURNS [
INT]
returns index of s2 in s1 (not before pos1), or -1 if s2 not found
Map [base:
ROPE, start, len:
INT, action: ActionType]
RETURNS [
BOOL]
applies action to each character in the given piece
Match [pattern, object:
ROPE, case:
BOOL]
RETURNS [
BOOL]
returns true iff object matches pattern (using * matching)
Run [s1:
ROPE, pos1:
INT, s2:
ROPE, pos2:
INT, case:
BOOL]
RETURNS [
INT]
returns position of first difference between s1 (not before pos1) and s2 (not before pos2)
SkipOver [s:
ROPE, pos:
INT, skip:
ROPE]
RETURNS [
INT]
returns first position not before pos of char in s not in skip
SkipTo [s:
ROPE, pos:
INT, skip:
ROPE]
RETURNS [
INT]
returns first position not before pos of char in s in skip
Constructing a mutable sequence of characters from a rope
ToRefText [base:
ROPE]
RETURNS [
REF
TEXT]
returns a new REF TEXT, contents from base
Dealing with a rope's internal structure
Balance [base:
ROPE, start, len, flat:
INT]
RETURNS [
ROPE]
returns a balanced tree for the specified piece
ContainingPiece [ref:
ROPE, index:
INT]
RETURNS [base:
ROPE, start, len:
INT]
returns the leaf piece containing the index for len chars
Flatten [base:
ROPE, start, len:
INT]
RETURNS [Text]
returns a text variant rope, contents resulting from the given piece
PieceMap [base:
ROPE, start, len:
INT, action: PieceActionType, mapUser:
BOOL]
RETURNS [BOOL]
applies action to each flat piece in the given piece
VerifyStructure [s:
ROPE]
RETURNS [leaves, nodes, maxDepth:
INT]
performs consistency checks and gathers counts for s
2. Rope Operations
Signals raised through the interface
The following procedures raise
RuntimeError.BoundsFault:
Fetch, if index e Length[base] or index < 0.
Any procedure returning a text rope, if the length of the rope would exceed LAST[NAT].
Any procedure returning a rope, if the length of the rope would exceed MaxLen.
Any procedure with a start parameter, if start < 0 or start exceeds the length of the rope.
Any procedure with a pos (or pos1 or pos2) parameter, if pos < 0.
There are no values of a len parameter that raise errors: if len is too long it is shortened to indicate the rest of the rope, and if len < 0 the behavior will be as if len = 0.
VerifyStructure raises VerifyFailed when it finds a problem in the rope it is examining. This error should not normally occur.
Any procedure may raise NoRope. This indicates that an internal consistency check has failed, and is not intended to be caught by normal clients. This error probably means that some unsafe operation in the client program has violated the type structure, though it may mean that the rope implementation is broken.
Details of the operations
Balance [base:
ROPE, start:
INT ← 0, len:
INT ← MaxLen, flat:
INT ← FlatMax]
RETURNS [
ROPE]
Returns the specified piece of base as a balanced tree. The leaves of the returned rope will be a maximum of flat characters if they are copied. The leaves will be a minimum of flat/2 characters in any case (unless there is a single leaf). The flat argument will be converted to MIN[MAX[flat,FlatMax],MaxLen] before being used. BoundsFault occurs when start < 0 or start > Length[base]. The len argument will be assigned MAX[MIN[len,Length[base]-start],0] before being used. Larger values of flat will result in less depth, but probably more copying.
Compare [s1, s2:
ROPE, case:
BOOL ←
TRUE]
RETURNS [Basics.Comparison]
Compares the contents of s1 and s2 lexicographically, using ASCII character codes. Returns less if s1 < s2, equal if s1 = s2, and greater if s1 > s2. If case is TRUE, the characters are compared exactly. If case is FALSE, upper case characters are translated into corresponding lower case characters before comparison.
Cat [r1, r2, r3, r4, r5:
ROPE ←
NIL]
RETURNS [
ROPE]
Returns a new rope, contents resulting from concatenating r1 .. r5. Copying occurs only when the result rope is small. BoundsFault occurs when the length of the resulting rope would exceed LAST[INT].
Concat [base, rest:
ROPE]
RETURNS [
ROPE]
Concatentates two ropes. Equivalent to, but faster than, Cat[base, rest].
ContainingPiece [ref:
ROPE, index:
INT ← 0]
RETURNS [base:
ROPE, start:
INT, len:
INT]
ContainingPiece[ref, index] returns the largest piece in ref such that base is either a text or object variant, and the characters indexed by [start..start+len) in base are the same as that indexed by [index..index+len) in ref. If index < 0 or index e Length[ref] then [NIL, 0, 0] is returned. Intended for use by applications demanding efficient access.
Equal [s1, s2:
ROPE, case:
BOOL ←
TRUE]
RETURNS [
BOOL]
Returns TRUE iff contents of s1 and s2 are the same, defined as: Length[s1] = Length[s2] AND FOR i IN [0..Length[s1]): Fetch[s1,i] = Fetch[s2,i] (modulo case significance).
Find [s1, s2:
ROPE, pos1:
INT ← 0, case:
BOOL ←
TRUE]
RETURNS [
INT]
Let N =
Find[s1, s2, pos1, case]. If N # -1 then N is the least N such that
N e pos1 & Equal[Substr[s1, N, Length[s2], s2, case]
If N = -1, then there is no such index. Note that if case is FALSE, then upper case characters are treated as their lower case equivalents. BoundsFault occurs when pos1 < 0.
Fetch [base:
ROPE, index:
INT ← 0]
RETURNS [
CHAR]
If 0 d index < Length[base], Fetch returns the indexed character in the rope, otherwise raises bounds fault. If the indexed character occurs in a client-defined rope, the client-supplied fetch routine may be called.
Flatten [base:
ROPE, start:
INT ← 0, len:
INT ← MaxLen]
RETURNS [Text]
If the piece given by base, start, and len is short enough to fit in a text variant of rope, then returns a text variant object including the contents of the specified piece. BoundsFault occurs when start < 0 or start > Length[base] or the result would more than LAST[NAT] characters. The object returned may be the same object that existed prior to the Flatten invocation.
FromChar [c:
CHAR]
RETURNS [Text]
Returns a new text rope containing the single character c.
FromProc [len:
INT, p:
PROC
RETURNS [
CHAR], maxPiece:
INT ← MaxLen]
RETURNS [
ROPE]
Returns a new rope r, where Length[r]=len, and the contents result from sequentially calling p. If len > maxPiece then the rope returned will be a balanced tree of flat ropes, each piece being no larger than maxPiece. The implementation may impose minimum and maximum limits on maxPiece (currently [24..LAST[NAT]]), and will adjust values outside of that range to be within the range.
FromRefText [s:
REF
TEXT]
RETURNS [Text]
Returns a rope with length and contents from the given REF TEXT. Copying will occur.
Index [s1:
ROPE, pos1:
INT, s2:
ROPE, case:
BOOL ←
TRUE]
RETURNS [
INT]
Similar to Find. Let N = Index[s1, pos1, s2, case]. If N # Length[s1] then N is the least integer such that
N e pos1 & Equal[Substr[s1, N, Length[s2], s2, case]
If N = Length[s1] then either no such occurence of s2 occurs in s1, or s2 is empty. Note that if case is FALSE, then upper case characters are treated as their lower case equivalents. BoundsFault occurs when pos1 < 0.
Length [base:
ROPE]
RETURNS [
INT]
Returns the number of characters in base. Note that Length[NIL] = 0.
MakeRope [base:
REF, size:
INT, fetch: FetchType, map: MapType ←
NIL,
pieceMap: PieceMapType ← NIL] RETURNS [ROPE]
Returns a new rope, implemented using the client-supplied size, base, and operations. Calling Rope operations using a client-defined rope results in calling the client-supplied procedures with the base reference. If the client does not supply map or pieceMap operations they will be implemented by default operations within the Rope implementation (using the client-supplied fetch procedure). The argument types are declared as:
FetchType: TYPE = PROC [REF, INT] RETURNS [CHAR]
MapType:
TYPE =
PROC
[base: REF, start, len: INT, action: ActionType] RETURNS [BOOL]
PieceMapType:
TYPE =
PROC
[base: REF, start, len: INT, action: PieceActionType] RETURNS [BOOL]
The fetch operation is used to fetch single characters. The map and pieceMap operations are used to map over characters and pieces from a client-defined rope, and are present for the sake of efficiency. The operations supplied should appear to be strictly functional, otherwise the correct behavior of rope operations cannot be guaranteed.
Map [base:
ROPE, start:
INT ← 0, len:
INT ← MaxLen, action: ActionType]
RETURNS [
BOOL]
Applies the action procedure to each character in the given piece, or until the action returns
TRUE. Map returns
TRUE if it was stopped by action returning
TRUE. The type of action is declared as:
ActionType: TYPE = PROC [CHAR] RETURNS [BOOL]
BoundsFault occurs when start < 0 or start > Length[base].
Match [pattern, object:
ROPE, case:
BOOL ←
TRUE]
RETURNS [
BOOL]
Returns TRUE iff object matches pattern. Matching takes place from left to right, and the * character in the pattern will match any number of characters in the object. If case is FALSE, alphabetic characters match either their upper or lower case equivalents, other such characters must match exactly.
Examples that return
TRUE:
Match["a*b", "axb", TRUE]
Match["Ab", "aB", FALSE]
Examples that return
FALSE:
Match["a*b", "aaa", TRUE]
Match["Ab", "aB", TRUE]
PieceMap [base:
ROPE, start:
INT ← 0, len:
INT ← MaxLen,
action: PieceActionType, mapUser: BOOL ← TRUE] RETURNS [BOOL]
Applies the action procedure to each leaf piece in the representation of the piece specified by base, start, and len. PieceMap terminates when all leaf pieces have been visited or the action returns TRUE. A leaf piece is either a text variant or a client-defined object. PieceMap returns TRUE if it was stopped by action returning TRUE. The leaf pieces that make up the given piece will either be of the text variant, or of the client-implemented variant, or (if mapUser is TRUE) of whatever rope variant is supplied by the client's PieceMap operation. The action routine supplied in client variants is only used if it is not NIL and the mapUser parameter is TRUE. The type of action is declared as:
PieceActionType: TYPE = PROC [ROPE, INT, INT] RETURNS [BOOL]
BoundsFault occurs when start < 0 or start > Length[base]
Replace [base:
ROPE, start:
INT ← 0, len:
INT ← MaxLen, replace:
ROPE ←
NIL]
RETURNS [
ROPE]
Returns a new rope, x, where x results from base with the piece indexed by start for len replaced by the contents of replace.
BoundsFault occurs when start < 0 or start >
Length[base] or the length of the resulting rope would exceed
LAST[
INT]. The following is true:
Equal[
Replace[base, start, len, replace],
Cat[Cat[Substr[base, 0, start], replace], Substr[base, start+len]],
TRUE]
Replace combines a delete with an insert at the same location, and is therefore suited for many text editing applications.
Run [s1:
ROPE, pos1:
INT ← 0, s2:
ROPE, pos2:
INT ← 0, case:
BOOL ←
TRUE]
RETURNS [
INT]
Returns largest number of chars N such that Equal[Substr[s1,pos1,N], Substr[s2,pos2,N], case]. If pos1 > Length[s1] or pos2 > Length[s2], then 0 will be returned. BoundsFault occurs when pos1 < 0 or pos2 < 0.
Size [base:
ROPE]
RETURNS [
INT]
Size[base] = Length[base].
SkipOver [s:
ROPE, pos:
INT ← 0, skip:
ROPE]
RETURNS [
INT]
Returns first position not before pos where Fetch[s] # a character in skip. If pos > Length[s], then pos will be returned. Otherwise, if no such position exists, Length[s] will be returned. BoundsFault occurs when pos < 0.
SkipTo [s:
ROPE, pos:
INT ← 0, skip:
ROPE]
RETURNS [
INT]
Returns first position not before pos where Fetch[s] = a character in skip. If pos > Length[s], then pos will be returned. Otherwise, if no such position exists, Length[s] will be returned. BoundsFault occurs when pos < 0.
Substr [base:
ROPE, start:
INT ← 0, len:
INT ← MaxLen]
RETURNS [
ROPE]
Returns a new rope, x, with contents from the given piece of base. Copying will not occur unless the resulting rope is small. The rope returned may have existed prior to the invocation of Substr. BoundsFault occurs when start < 0 or start > Length[base].
ToRefText [base:
ROPE]
RETURNS [
REF
TEXT]
Returns a new REF TEXT containing contents from the given rope. Copying always occurs.
Translate [base:
ROPE, start:
INT ← 0, len:
INT ← MaxLen, translator: TranslatorType ←
NIL]
RETURNS [ROPE]
Returns a new
ROPE constructed by applying the translator procedure to each character in the indicated piece. A
NIL translator performs no translation, but the returned rope will be a copy of the given rope. TranslatorType is defined as:
TranslatorType: TYPE = PROC [old: CHAR] RETURNS [new: CHAR];
BoundsFault occurs when start < 0 or start > Length[base].
VerifyStructure [s:
ROPE]
RETURNS [leaves, nodes, maxDepth:
INT]
Examines the rope s, verifying that certain representation invariants hold, and counting the leaves (text and object variants) and nodes (substr, concat, and replace variants) in the rope. The maximum depth of the rope is also returned. VerifyFailed occurs when a bad representation is discovered. This operation is intended for diagnostic use only.
3. Rope Pragmatics
Implementation of rope literals
Currently, each rope literal in a program module consumes two words of space in that module's global frame, hence in the MDS. Unlike STRING literals, rope literals do not consume large amounts of global or local frame space, and it is not valid to follow a rope literal with 'L. There is a very good chance that the global frame cost for rope literals will be reduced to a small constant someday before the MDS disappear
The binder detects multiple occurrences of the same rope literal, and only allocates a single rope no matter how often the same literal appears. This sharing is permitted since ropes are assumed to be immutable. Someday the loader may perform the same merging of rope literals. The merging algorithm could even be improved to detect long ropes that differ only slightly, and generate a more compact representation.
Accumulating characters to make a rope
The downside of rope immutability is that adding a single character to a rope always involves storage allocation. This is fine if somebody is actually interested in the resulting rope, but in many applications a program only wishes to accept characters one by one until some event happens, then produce a rope for the final result. The obvious solution is to accumulate the characters in a REF TEXT buffer (see the escape from rope section below), then produce a rope using Rope.FromText.
A standard way to accomplish the same thing without directly manipulating a REF TEXT is to use st: IO.STREAM ← IO.ROS[], so the client can perform IO.PutChar[st, char] until all the characters are needed as a rope; then get the rope via IO.RopeFromROS[st].
The IO interface provides several other more specialized ways of generating ropes. See the IOStream documentation for more information.
If efficiency is crucial, a client can create a Rope.Text using RopeInline.NewText, and set its length and contents directly. In this case the client must be careful to avoid changing the object (length or contents) after passing it to any external client as a rope, and to ensure that the length field is consistent with the maximum length of the text rope (length d max).
Unsafe conversions to and from rope
Some Cedar programs must deal with
STRINGs passed through existing Mesa interfaces. The
ConvertUnsafe interface is provided for these programs to use. It contains the following procedures, which are almost self-explanatory:
AppendRefText: PROC [to: LONG STRING, from: REF READONLY TEXT];
AppendRope: PROC [to: LONG STRING, from: ROPE];
ToRefText: PROC [from: LONG STRING] RETURNS [REF TEXT];
ToRope: PROC [from: LONG STRING] RETURNS [Rope.Text];
Note that no allocation of uncounted storage is done behind the ConvertUnsafe interface (the client passes storage into the Append* procedures). Also note that the compiler will lengthen a STRING into a LONG STRING, so no separate procedures are needed to deal with short STRINGs.
In some cases the following hack is justified:
r: Rope.Text ← Rope.Flatten[sourceRope];
s:
LONG
STRING ←
LOOPHOLE[r];
now use s ...
This should be used only if high efficiency is important and if there is high probability that sourceRope is already flat (is a text rope). This trick is banned if the string s is being passed to someone who might modify it, or someone who might retain and use a reference to it after the (reference-counted) reference r has disappeared. Think carefully before using this one!
Speedups from using other rope interfaces
For applications that are limited in performance by the performance of the Rope package, one possible way of improving the performance (at considerable cost in code space) is to use some of the inline routines defined in Rope. It is recommended that this step be delayed until proven to be necessary. The operations of interest are InlineFetch, InlineIsEmpty, InlineLength, and InlineFlatten.
For an application that exhibits good locality of reference to a long rope (successive Fetch calls are often consecutive or have nearly-equal indices), a considerable speedup is available by in effect opening an IO.STREAM on the rope (to keep state between fetches).
For clients who need to access large files as if they were ropes, the RopeFile package should be used.
Escape from rope
Sometimes rope is just not what you need: you need a structure to hold together some characters, and you update these characters so frequently (and pass them around in such well-controlled ways) that the extra expense of rope allocation is not acceptable. Then you should use REF TEXT (documented in the language manual). The RefText interface provides operations identical to the rope comparison and pattern matching procedures, but for REF TEXT. It also provides some procedures with no analogy for rope. These include a destructive append operation and a pool of preallocated TEXTs that can be time-shared among separate applications. See the interface RefText.mesa for more information.
How deep ropes impact performance
Because ropes try to avoid copying for Cat, Substr, and Replace operations, the depth of the tree used to represent a rope object can become arbitrarily deep. For invocations of Map and PieceMap operations this additional depth may lead to errors due to exhaustion of frame space (which comes from MDS). The standard implementation attempts to keep the depth small by flattening small ropes into text variants and by rebalancing deep ropes. A client may wish to further reduce the depth of a rope object through various means, including rebalancing the tree (via Balance).
There are practical limits on flattening. No flat piece composed solely of a text variant may be longer than LAST[NAT]. Also, virtual memory space may be quite limited, so large amounts of flattening of large ropes will first fragment, then exhaust virtual memory.
The cost of the Fetch operation will be proportional to the depth of the tree in the worst case. The cost of the Map and PieceMap operations is mostly dependent on the number of pieces present in the given range of indexes, and the number of times the action routines are called.
Two examples of user-defined rope
The first example is a simple computed sequence of characters corresponding to the ASCII character set. That is, the sequence is 256 characters long, character 0 has code 0, and so on. We write the procedures as follows:
MyFetch:
PROC [base:
REF, index:
INT]
RETURNS [
CHAR] = {
RETURN [LOOPHOLE[LowHalf[index], CHAR]]};
MakeAsciiSet:
PROC []
RETURNS [
ROPE] = {
RETURN [MakeRope[base: NIL, size: 256, fetch: MyFetch]]};
MakeAsciiSet creates a rope that can be used as any other rope. It "behaves" as a sequence of 256 characters in rope operations. However, it does not occupy as much space as a sequence of 256 characters.
The second example is that of a file that has been made into a rope. To keep things simple we shall assume the following file interface:
OpenRead: PROC [name: ROPE] RETURNS [File]
ReadBuffer: PROC [file: File, index: INT, buffer: REF TEXT]
ByteLength: PROC[file: File] RETURNS [INT]
OpenRead excludes others from writing the file, so we don't have to worry about its contents changing. Given the above operations, we can write the following program to make a file into a rope:
MyRef: TYPE = REF MyData;
MyData:
TYPE =
MONITORED
RECORD [
need: BOOL, -- TRUE if buffer is invalid
length: INT, -- byte length of file
pageIndex: INT, -- byte index of page in buffer
file: File, -- file object for this rope
buf: REF TEXT -- buffer for this rope
];
FileFetch:
PROC [r:
REF, index:
INT]
RETURNS [
CHAR] = {
FileFetchEntry:
ENTRY
PROC [rr: MyRef, index:
INT]
RETURNS [
CHAR] =
INLINE {
pageIndex: INT ← rr.pageIndex;
IF index >= rr.length THEN ERROR; -- should never happen
IF rr.need
OR index
NOT
IN [pageIndex..pageIndex+PageSize)
THEN {
need a different buffer of characters
rr.pageIndex ← pageIndex ← index MOD PageSize;
ReadPage[rr.file, pageIndex, rr.buf];
rr.need ← FALSE};
RETURN [rr.buf[NARROW[index-pageIndex, NAT]]];
};
WITH r SELECT FROM
rr: MyRef => RETURN[FileFetchEntry[rr, index]];
ENDCASE => ERROR; -- should never happen
};
MakeFileRope:
PROC [name:
ROPE]
RETURNS [
ROPE] = {
file: File ← OpenRead[name];
length: INT ← ByteLength[file];
buf: REF TEXT ← NEW[TEXT[PageSize]];
base: MyRef ←
NEW[MyData ←
[need: TRUE, length: length, pageIndex: 0, file: file, buf: buf]];
RETURN [MakeRope[base: base, size: length, fetch: FileFetch]];
};
The above program fragment is not a very good implementation for a file rope. We have made no provision for closing the file or dealing with an error such as an aborted transaction. Random access to this file rope could lead to intolerable thrashing, with a page read for every fetch. The lack of copying for Cat, Substr, and Replace would serve to make the problem worse, since many fragments of the file could be accessible. See the RopeFile implementation for a more practical approach.
A. Appendix: Modules in the Rope implementation
The Rope package interfaces are described by [Indigo]<Cedar>Top>Rigging.df; the implementation is described by [Indigo]<Cedar>Top>RiggingMaker.df. The principal modules are:
Rope -- definitions
RopePrivate -- private definitions
RopeImpl -- implementation for most operations
RopeImplExt -- implementation for scanning operations
-- (Find, Match, SkipOver, SkipTo, Run, VerifyStructure)