RedBlackTreeImpl.mesa
Copyright Ó 1985, 1986, 1987, 1991 by Xerox Corporation. All rights reserved.
MBrown on October 20, 1983 11:37 pm
Russ Atkinson (RRA) February 11, 1987 11:19:03 pm PST
Carl Hauser, April 11, 1985 4:45:25 pm PST
JKF October 27, 1988 8:26:53 am PDT
Notes on the data structure
The color of a node should be thought of as the color of the edge coming into the node. The general color invariants for red-black trees are that (1) all external nodes are black, while internals are red or black, and (2) all paths from an internal node to an external node contain the same number of black arcs.
The tree edges of a 2-3-4 tree become black arcs in the binarized red-black representation, while the internal 2-, 3-, and 4-nodes are represented by connected red subtrees: the degenerate one with one node, both of the possible two node trees, and the completely balanced tree on three nodes. Another way of stating this invariant is to say that no path contains two consecutive red arcs.
The top-down insertion algorithm below is from Program 5 of "A Dichromatic Framework for Balanced Trees" by Leo Guibas and Bob Sedgewick. The deletion algorithm incorporates several changes to the algorithm in that paper. This program was originally adapted from Dave Gifford's StringStorageImpl.Mesa.
DIRECTORY
Basics USING [Comparison],
RedBlackTree;
RedBlackTreeImpl:
CEDAR
MONITOR
LOCKS self USING self: Table
EXPORTS RedBlackTree
SHARES RedBlackTree
= BEGIN OPEN RedBlackTree;
Comparison: TYPE = Basics.Comparison;
RedBlack: TYPE = BOOL;
Red: BOOL = TRUE;
Black: BOOL = FALSE;
DuplicateKey: PUBLIC ERROR = CODE;
BadTable: PUBLIC ERROR = CODE;
doChecking:
BOOL =
TRUE;
forces extra validity checking of inputs.
Private: TYPE = REF PrivateRep;
PrivateRep:
TYPE =
RECORD [
size: INT,
getKey: GetKey,
compare: Compare,
y: Node, -- This is a dummy node that is at both links of z
z: Node]; -- This is a dummy node used as the child of leaf (external) nodes
Locking mechanism
InnerProc: TYPE = PROC [private: Private, y: Node, z: Node, root: Node];
DoInner:
ENTRY
PROC [self: Table, inner: InnerProc] =
TRUSTED {
ENABLE UNWIND => NULL;
private: Private ¬ LOOPHOLE[self.private];
root: Node ¬ self.root;
IF root = NIL OR root.rbLLink # NIL THEN RETURN WITH ERROR BadTable;
inner[private, private.y, private.z, root];
};
Procedures exported to RedBlackTree
Create:
PUBLIC
PROC [getKey: GetKey, compare: Compare]
RETURNS [table: Table] = {
private: Private ¬ NEW[PrivateRep ¬ [size: 0, getKey: getKey, compare: compare, y: NEW[NodeRep], z: NEW[NodeRep]]];
root: Node ¬ NEW[NodeRep];
y: Node ¬ private.y;
z: Node ¬ private.z;
table ¬ NEW[TableRep];
table.root ¬ root;
table.private ¬ private;
root.rbColor ¬ Black;
root.rbLLink ¬ NIL;
root.rbRLink ¬ z;
y.rbColor ¬ Red; y.rbLLink ¬ y.rbRLink ¬ NIL;
z.rbColor ¬ Black; z.rbLLink ¬ z.rbRLink ¬ y;
};
Size:
PUBLIC
PROC [self: Table]
RETURNS [size:
INT ¬ 0] = {
inner: InnerProc = {
InnerProc: TYPE = PROC [private: Private, y: Node, z: Node, root: Node];
size ¬ private.size;
};
IF self # NIL THEN DoInner[self, inner];
};
Insert:
PUBLIC
PROC [self: Table, dataToInsert: UserData, insertKey: Key] = {
node: Node ¬ NEW[NodeRep ¬ [data: dataToInsert]];
InsertNode[self, node, insertKey];
};
InsertNode:
PUBLIC
PROC [self: Table, nodeToInsert: Node, insertKey: Key] = {
Requires self.rbColor = self.rbRLink.rbColor = z.rbColor = Black, y.rbColor = Red.
keyAlreadyPresent: BOOL ¬ FALSE;
inner: InnerProc = {
InnerProc: TYPE = PROC [private: Private, y: Node, z: Node, root: Node];
x, gg, g, f: Node;
c: Comparison ¬ greater; --since actual tree sits in right subtree of header node...
search the tree, rebalancing on the way down.
f ¬ root;
x ¬ f.rbRLink;
DO
IF (x.rbLLink.rbColor=Red)
AND (x.rbRLink.rbColor=Red)
THEN {
IF x = z
THEN {
an external node has been found; check to be sure that a duplicate key is not present.
IF c = equal THEN {keyAlreadyPresent ¬ TRUE; EXIT};
perform the insertion.
x ¬ nodeToInsert;
x.rbLLink ¬ z;
x.rbRLink ¬ z;
IF c = less THEN f.rbLLink ¬ x ELSE f.rbRLink ¬ x;
c ¬ equal;
private.size ¬ private.size + 1;
};
do color flip.
x.rbLLink.rbColor ¬ Black;
x.rbRLink.rbColor ¬ Black;
x.rbColor ¬ Red;
IF f.rbColor=Red
THEN {
two reds in a row, so rebalance (gg may be self).
g ¬ Balance[gg, g, f, x];
x ¬ g;
};
};
IF c = equal THEN EXIT;
gg ¬ g; g ¬ f; f ¬ x;
--x ¬ IF (c ¬ private.compare[insertKey, x.data]) = less THEN x.rbLLink ELSE x.rbRLink;
c ¬ private.compare[insertKey, x.data];
IF c = equal THEN {keyAlreadyPresent ¬ TRUE; EXIT};
x ¬ IF c = less THEN x.rbLLink ELSE x.rbRLink;
ENDLOOP;
root.rbRLink.rbColor ¬ Black; --root is always black
};
DoInner[self, inner];
IF keyAlreadyPresent THEN ERROR DuplicateKey;
};
Delete:
PUBLIC
PROC [self: Table, deleteKey: Key]
RETURNS [deletedNode: Node ¬
NIL] = {
inner: InnerProc = {
InnerProc: TYPE = PROC [private: Private, y: Node, z: Node, root: Node];
f, result, parentOfResult: Node;
x, g, b: Node;
c: Comparison;
result ¬ NIL;
f ¬ root; x ¬ f.rbRLink;
IF x = z THEN RETURN;
y.rbColor ¬ Black; --children of external nodes have no red to contribute to rebalancing.
Inject a red node at the root if necessary.
IF (x.rbLLink.rbColor = Black) AND (x.rbRLink.rbColor = Black) THEN x.rbColor ¬ Red;
IF (c ¬ private.compare[deleteKey, x.data]) = equal
THEN {
result ¬ x;
parentOfResult ¬ f;
};
Search the tree for the symmetric order succecessor of the node containing key
(or the node itself if its RLink is z)
DO
{
g ¬ f;
f ¬ x;
IF c = less
THEN {b ¬ x.rbRLink; x ¬ x.rbLLink}
ELSE {b ¬ x.rbLLink; x ¬ x.rbRLink};
If x is the node we're to return, save pointers to it and its parent for later.
IF x # z
AND (c ¬ private.compare[deleteKey, x.data]) = equal
THEN {
result ¬ x;
parentOfResult ¬ f;
};
Note that if x is Red, no rotations happen now; this is what re-establishes the properties of g and f eventually...
IF x.rbColor=Red OR x.rbLLink.rbColor=Red OR x.rbRLink.rbColor=Red THEN LOOP;
IF b.rbColor=Red
THEN {
Single rotation to move the red link b onto the search path.
IF b = f.rbLLink
THEN { f.rbLLink ¬ b.rbRLink; b.rbRLink ¬ f; }
ELSE { f.rbRLink ¬ b.rbLLink; b.rbLLink ¬ f; };
f.rbColor ¬ Red; b.rbColor ¬ Black;
IF f = g.rbLLink THEN g.rbLLink ¬ b ELSE g.rbRLink ¬ b;
Move back up the path to allow g and f to get re-established
x ¬ b; GOTO FixCompare;
};--IF b.rbColor=Red
IF x = z THEN EXIT;
It is essential that this exit test be right here - after the single rotation, but before the double rotations.
x.rbColor ¬ Red;
IF b.rbLLink.rbColor = Red
THEN {
b.rbLLink.rbColor ¬ Black;
x ¬ Balance[g, f, b, b.rbLLink];
GOTO FixCompare;
};
IF b.rbRLink.rbColor = Red
THEN {
b.rbRLink.rbColor ¬ Black;
x ¬ Balance[g, f, b, b.rbRLink];
GOTO FixCompare;
};
"Color flip"
f.rbColor ¬ Black;
b.rbColor ¬ Red;
EXITS
FixCompare => c ¬ private.compare[deleteKey, x.data];
};
ENDLOOP;
root.rbRLink.rbColor ¬ Black;
z.rbColor ¬ Black;
y.rbColor ¬ Red;--undo the color mess
If search has been successful, f is now the symmetric order successor of result (or result itself if result.rbRLink = z), and g is its parent. f is Red, unless it is also the root. x and b are irrelevant.
IF result #
NIL
THEN {
detach f from the tree.
IF g.rbLLink = f THEN g.rbLLink ¬ z ELSE g.rbRLink ¬ z;
if f is not the result node, splice f into tree in place of result.
IF f # result
THEN {
IF parentOfResult.rbLLink = result
THEN parentOfResult.rbLLink ¬ f
ELSE parentOfResult.rbRLink ¬ f;
f.rbLLink ¬ result.rbLLink; f.rbRLink ¬ result.rbRLink;
f.rbColor ¬ result.rbColor;
};
private.size ¬ private.size - 1;
};
deletedNode ¬ result;
};
DoInner[self, inner];
};
LookupNode:
PUBLIC
PROC [self: Table, lookupKey: Key]
RETURNS [node: Node ¬
NIL] = {
inner: InnerProc = {
InnerProc: TYPE = PROC [private: Private, y: Node, z: Node, root: Node];
node ¬ root.rbRLink;
DO
IF node = z THEN {node ¬ NIL; EXIT};
SELECT private.compare[lookupKey, node.data]
FROM
equal => EXIT;
less => node ¬ node.rbLLink;
greater => node ¬ node.rbRLink;
ENDCASE => ERROR;
ENDLOOP;
};
DoInner[self, inner];
};
Lookup:
PUBLIC
PROC [self: Table, lookupKey: Key]
RETURNS [data: UserData ¬
NIL] = {
inner: InnerProc = {
InnerProc: TYPE = PROC [private: Private, y: Node, z: Node, root: Node];
equalNode: Node ¬ root.rbRLink;
DO
IF equalNode = z THEN RETURN;
SELECT private.compare[lookupKey, equalNode.data]
FROM
equal => EXIT;
less => equalNode ¬ equalNode.rbLLink;
greater => equalNode ¬ equalNode.rbRLink;
ENDCASE => ERROR;
ENDLOOP;
data ¬ equalNode.data;
};
DoInner[self, inner];
};
Lookup3:
PUBLIC
PROC [self: Table, lookupKey: Key]
RETURNS [leftData, equalData, rightData: UserData ¬
NIL] = {
inner: InnerProc = {
InnerProc: TYPE = PROC [private: Private, y: Node, z: Node, root: Node];
equalNode: Node ¬ root.rbRLink;
leftNode: Node ¬ NIL;
rightNode: Node ¬ NIL;
DO
IF equalNode = z THEN {equalNode ¬ NIL; EXIT};
SELECT private.compare[lookupKey, equalNode.data]
FROM
equal => {
IF equalNode.rbLLink # z
THEN {
leftNode ¬ equalNode.rbLLink;
WHILE leftNode.rbRLink # z DO leftNode ¬ leftNode.rbRLink ENDLOOP;
};
IF equalNode.rbRLink # z
THEN {
rightNode ¬ equalNode.rbRLink;
WHILE rightNode.rbLLink # z DO rightNode ¬ rightNode.rbLLink ENDLOOP;
};
EXIT;
};
less => {rightNode ¬ equalNode; equalNode ¬ equalNode.rbLLink};
greater => {leftNode ¬ equalNode; equalNode ¬ equalNode.rbRLink;};
ENDCASE => ERROR;
ENDLOOP;
IF equalNode # NIL THEN equalData ¬ equalNode.data;
IF leftNode # NIL THEN leftData ¬ leftNode.data;
IF rightNode # NIL THEN rightData ¬ rightNode.data;
};
DoInner[self, inner];
};
LookupSmallest:
PUBLIC
PROC [self: Table]
RETURNS [data: UserData ¬
NIL] = {
inner: InnerProc = {
InnerProc: TYPE = PROC [private: Private, y: Node, z: Node, root: Node];
smallestNode: Node ¬ root.rbRLink;
IF smallestNode=z THEN RETURN;
UNTIL smallestNode.rbLLink=z DO smallestNode ¬ smallestNode.rbLLink ENDLOOP;
data ¬ smallestNode.data;
};
DoInner[self, inner];
};
LookupNextLarger:
PUBLIC
PROC [self: Table, lookupKey: Key]
RETURNS [data: UserData ¬
NIL] = {
inner: InnerProc = {
InnerProc: TYPE = PROC [private: Private, y: Node, z: Node, root: Node];
largerNode: Node ¬ NIL;
x: Node ¬ root.rbRLink;
UNTIL x=z
DO
IF private.compare[lookupKey, x.data] = less
THEN {largerNode ¬ x; x ¬ x.rbLLink}
ELSE x ¬ x.rbRLink;
ENDLOOP;
IF largerNode # NIL THEN data ¬ largerNode.data;
};
DoInner[self, inner];
};
LookupLargest:
PUBLIC
PROC [self: Table]
RETURNS [data: UserData ¬
NIL] = {
inner: InnerProc = {
InnerProc: TYPE = PROC [private: Private, y: Node, z: Node, root: Node];
largestNode: Node ¬ root.rbRLink;
IF largestNode=z THEN {largestNode ¬ NIL; RETURN};
UNTIL largestNode.rbRLink=z DO largestNode ¬ largestNode.rbRLink ENDLOOP;
data ¬ largestNode.data;
};
DoInner[self, inner];
};
LookupNextSmaller:
PUBLIC
PROC [self: Table, lookupKey: Key]
RETURNS [data: UserData ¬
NIL] = {
inner: InnerProc = {
InnerProc: TYPE = PROC [private: Private, y: Node, z: Node, root: Node];
smallerNode: Node ¬ NIL;
x: Node ¬ root.rbRLink;
UNTIL x=z
DO
IF private.compare[lookupKey, x.data] = greater
THEN { smallerNode ¬ x; x ¬ x.rbRLink }
ELSE x ¬ x.rbLLink;
ENDLOOP;
IF smallerNode # NIL THEN data ¬ smallerNode.data;
};
DoInner[self, inner];
};
DestroyTable:
PUBLIC
PROC [self: Table] = {
inner: InnerProc = {
InnerProc: TYPE = PROC [private: Private, y: Node, z: Node, root: Node];
new: Table ¬ Create[private.getKey, private.compare];
self.private ¬ new.private;
self.root ¬ new.root;
};
DoInner[self, inner];
};
EnumerateIncreasing:
PUBLIC
PROC [self: Table, procToApply: EachNode] = {
quick and dirty recursive implementation, has overhead of one procedure call per call to procToApply.
inner: InnerProc = {
InnerProc: TYPE = PROC [private: Private, y: Node, z: Node, root: Node];
VisitSubtree:
PROC [node: Node]
RETURNS [stop:
BOOL ¬
FALSE] = {
link: Node ¬ node.rbLLink;
IF link # z AND VisitSubtree[link] THEN RETURN [TRUE];
IF procToApply[node.data] THEN RETURN [TRUE];
link ¬ node.rbRLink;
IF link # z THEN RETURN [VisitSubtree[link]];
};
IF root = NIL OR root.rbRLink = z THEN RETURN;
[] ¬ VisitSubtree[root.rbRLink];
};
DoInner[self, inner];
};
EnumerateDecreasing:
PUBLIC
PROC [self: Table, procToApply: EachNode] = {
quick and dirty recursive implementation, has overhead of one procedure call per call to procToApply.
inner: InnerProc = {
InnerProc: TYPE = PROC [private: Private, y: Node, z: Node, root: Node];
VisitSubtree:
PROC [node: Node]
RETURNS [stop:
BOOL ¬
FALSE] = {
link: Node ¬ node.rbRLink;
IF link # z AND VisitSubtree[link] THEN RETURN [TRUE];
IF procToApply[node.data] THEN RETURN [TRUE];
link ¬ node.rbLLink;
IF link # z THEN RETURN [VisitSubtree[link]];
};
IF root = NIL OR root.rbRLink = z THEN RETURN;
[] ¬ VisitSubtree[root.rbRLink];
};
DoInner[self, inner];
};
CheckTable:
PUBLIC
PROC [self: Table] = {
Raises BadTable if the table t is not well-formed.
inner: InnerProc = {
InnerProc: TYPE = PROC [private: Private, y: Node, z: Node, root: Node];
Assert: PROC [p: BOOL] = {IF NOT p THEN ERROR BadTable };
Check1:
PROC [x: Node, maxKey: Key]
RETURNS [Key,
INTEGER,
BOOL] = {
Returns largest key in subtree rooted at x, number of black arcs on paths to external nodes from x, and whether or not this node is red. ERRORs if it finds two reds in a row, finds different distances to the external nodes (counting only red arcs), or finds keys out of order. If x is an internal node, increments count.
dl, dr: INTEGER;
redChild: BOOL;
IF x = z THEN RETURN [maxKey, 0, FALSE];
[maxKey, dl, redChild] ¬ Check1[x.rbLLink, maxKey];
Assert[~(redChild AND (x.rbColor=Red))];
Assert[private.compare[maxKey,x.data] = (IF count=0 THEN equal ELSE less)];
count ¬ count + 1;
[maxKey, dr, redChild] ¬ Check1[x.rbRLink, private.getKey[x.data]];
Assert[~(redChild AND (x.rbColor=Red))];
Assert[dl=dr];
RETURN [maxKey, dl+(IF x.rbColor=Black THEN 1 ELSE 0), x.rbColor=Red];
};
count: INT ¬ 0;
check structure of the table header, sentinels, etc.
Assert[z.rbLLink=y];
Assert[z.rbRLink=y];
check structure of table.
IF root.rbRLink # z
THEN {
smallest: Node ¬ root.rbRLink;
UNTIL smallest.rbLLink = z DO smallest ¬ smallest.rbLLink ENDLOOP;
[] ¬ Check1[root.rbRLink, private.getKey[smallest.data]];
};
};
DoInner[self, inner];
};
Private procedure, used by Insert (1 call), Delete (2 calls).
Balance:
PROC [gg, g, f, x: Node]
RETURNS [Node] = {
Balances a local portion of the tree.
g is a child of gg (it may be either child).
if g -rlink-> f -rlink-> x, or x <-llink- f <-llink- g, then only a single rotation is needed; otherwise two rotations are used. when called from insert, g is black and both f and x are red. after rebalancing, the new g is black, and is the parent of both the new f and the new x, which are both red.
t: Node;
tc: RedBlack;
do first rotation if necessary
IF f=g.rbLLink
THEN {
IF x=f.rbRLink
THEN {
f.rbRLink ¬ x.rbLLink; x.rbLLink ¬ f;
t ¬ f;
f ¬ x;
x ¬ t;
};
}
ELSE {
IF x=f.rbLLink
THEN {
f.rbLLink ¬ x.rbRLink; x.rbRLink ¬ f;
t ¬ f; f ¬ x; x ¬ t;
};
};
do second rotation
IF x=f.rbLLink
THEN { g.rbLLink ¬ f.rbRLink; f.rbRLink ¬ g; }
ELSE { g.rbRLink ¬ f.rbLLink; f.rbLLink ¬ g; };
update link in great grandparent
IF g = gg.rbRLink THEN gg.rbRLink ¬ f ELSE gg.rbLLink ¬ f;
do color swap (when called from Insert, g is always Black and f is always Red)
tc ¬ g.rbColor; g.rbColor ¬ f.rbColor; f.rbColor ¬ tc;
return g, new child of gg (name changed in second rotation)
RETURN [f]
};
END.
CHANGE LOG
Created by MBrown on January 18, 1980 10:58 AM
It worked the first time!
Changed by MBrown on January 19, 1980 8:50 PM
Implemented EnumerateIncreasing. Fixed a bug in CheckTable that defeated checking of
key order. Made changes to allow MinusInfinity as a Key. Changed Insert to use
keyPresent flag, as in Guibas-Sedgewick Program 5.
Changed by MBrown on 20-Jan-80 13:28
Changed Balance to compare pointers rather than keys, and eliminated the code for
"swap g and f" since only g is returned from Balance.
Changed by MBrown on March 4, 1980 2:17 PM
Moved inline definitions of Node structure to RedBlackTree, which we now OPEN.
Changed by MBrown on April 13, 1980 4:21 PM
Coded Delete. Made LookupSmallest and LookupNextLarger PUBLIC, since they may be
useful in testing Delete.
Changed by MBrown on April 13, 1980 6:01 PM
Delete seemed to work the first time, a surprise. I removed some the redundant
checks that were there for debugging; the old version will be called
RedBlackCheckedImpl.
Changed by MBrown on May 1, 1980 10:03 PM
Changed Delete to avoid a search from the root to find the parent of the result node.
This involved actually understanding the algorithm; "cleaning up" the way rebalancing
is done introduced several bugs.
Changed by MBrown on May 7, 1980 6:36 PM
Added DestroyTable and Size. We now allocate/free TableObjects using procs in the
RedBlackTree interface. Renamed module.
Changed by MBrown on August 11, 1980 10:17 PM
Major revisions (prompted by first real client, the Juniper locks code, which will
make further changes for its special needs). We take the key comparison function from
the interface, instead of requiring "<" to be applicable. This means that comparisons
may be expensive, so we save the result in a local variable to avoid doing twice the
required number. MinusInfinity is no longer treated specially in the code. We do
explicit field selections for the links and color fields, instead of using inline
procs from the interface. Added Lookup3 proc for Juniper locks application. We now
pass in ALL storage, including storage for table objects. y and z now point to nodes
in the global frame of this module, rather than being replicated for each table; thus
it is now unsafe for multiple processes to be inside of an instance of this module,
even if they operate on distinct tables (it was always unsafe to perform concurrent
operations on a single table).
Changed by MBrown on October 29, 1980 8:42 PM
Eliminated all reference to MinusInfinity (it was being used in CheckTable).
Changed by MBrown on January 7, 1981 9:47 PM
Changes for collectible storage, module monitor.
Changed by MBrown on March 1, 1982 3:30 pm
New version for use in Alpine. No allocation done in package, even for sentinels. No
more "AssignKey[,]", which was used only for sentinels; instead, check for z explicitly. No
more "Key[]", and Compare now takes a Key and a Node. A Table is just a Node; no count
of number of nods in a table is kept in the table.
Other variants that are possible (but hopefully are not required):
(1) use < and = instead of 3-way compares.
(2) use object monitor (requires replicating z and y per-table), or no monitor.
(3) Table and Node are different types (this is required for object monitor, to hold y and z.)
Changed by MBrown on March 2, 1982 4:20 pm
Table is a RECORD [Node]. Clean up errors in interface.
Changed by MBrown on June 28, 1982 11:12 am
CEDAR implementation.
Changed by MBrown on August 26, 1982 11:11 pm
Use Environment.Comparison.
Changed by MBrown on November 10, 1982 6:34 pm
DuplicateKey is now an ERROR.
Changed by MBrown on October 20, 1983 11:38 pm
Russ Atkinson (RRA) May 7, 1985 12:09:51 pm PDT
Completely revised to make this a monitorized data abstraction. The user's data is now REFs, and the user supplies the comparison routine(s).
JKF October 27, 1988 8:22:37 am PDT
Made a change to InsertNode to fix a bug associated with insertion of duplicate keys.