\input CS445Hdr % This is CS445N6B.tex of March 02, 00:30 \def\header{6. Intersection problems} \def\title{Lectures 11--13} \setcount0 10 \sect{6.3.3. Mairson's segment intersection algorithm.} There is a special case of the pairwise segment intersection problem in which the theoretical lower bound for the running time can actually be attained. Harry Mairson recently discovered an algorithm that, given two sets of $n$ segments $S$ and $T$ such that the elements in each set are pairwise disjoint, reports all the $I$ intersecting pairs $(s\in S, t\in T)$ in time $O(n\log n + I)$.\par Like the general segment intersection algorithm, Mairson's too is based on the sweeping line technique, except that two separate sorted lists, one for $S$ and one for $T$, are used to keep track of the segments that intersect the current sweeping line. Since there are no $S$-$S$ intersections, the $S$ list changes only when endpoints are encountered, and the same holds for the $T$ list; $S$-$T$ intersections do not affect the ordering of segments within $S$ or $T$. Therefore, we do not have to process these intersections in increasing order of $x$-coordinates, which is what accounted for the $O(I\log n)$ term in the previous algorithm.\par A segment $r\in S\un T$ is inserted in the appropriate list when the sweeping line encounters its leftmost endpoint, and deleted when we encounter the rightmost one. Just before deleting $r$, we could in principle detect and report all its intersections with segments currently in the opposite list. It is easy to see that, in this way, we would be sure to detect every $S$-$T$ intersection exactly once. The problem with this idea is that the trivial implementation (test the segment $r$ against all segments in the other list) could take $\Omega(n^2)$ time in the worst case, even if there were no intersections at all.\par When the right endpoint of $s\in S$ is processed, we can easily locate the two segments of the $T$ list that are immediately above and below it. 
It would be nice if all segments intersected by $s$ were consecutive elements of the current $T$ list, starting with one of these two neighbors:\fig4 If this were always the case, then all we would have to do is start at these two segments and proceed away from $s$, both upwards and downwards along the $T$ list, testing each segment for intersection with $s$ and terminating the search at the first failure in each direction. The time required for this search would then be $O(1)$ for each reported intersection (i.e., for each successful test), plus $O(\log n)$ for each endpoint (this includes the time to locate the endpoint of $s$ among the segments of $T$ and to perform the two final, failing tests of the search). The total time then would be the optimal one, $O(n\log n + I)$.\par Unfortunately, there are situations in which the list $T$ will contain ``shielding'' segments disjoint from $s$ mixed with the intersecting ones. For example, when the endpoint of $s_1$ is hit in the figure below,\fig4 the $T$ list would contain segments $t_1$ to $t_7$ (in order from top to bottom); the segment $t_3$ would be shielding $t_1$ and $t_2$ from $s_1$, and $t_6$ would be shielding $t_7$.\par A segment $r\in S\un T$ can ``shield'' an $S$-$T$ intersection, i.e.\ it can prevent its detection by the linear search described above, only if the left endpoint $p$ of $r$ sits in an {\it $S$-$T$ cone}, the region bounded above and below by two intersecting segments $s\in S$ and $t\in T$ and extending to the right of their intersection. Mairson's idea is to detect and destroy every cone containing $p$, as soon as $p$ is encountered, by splitting one of the two segments defining the cone at some point between the intersection and the current sweep line.\fig4 After this operation, $r$ can no longer shield the intersection of $s$ and $t$. Every time a new endpoint is about to be processed, Mairson's algorithm guarantees that none of the currently active segments begins in an $S$-$T$ cone. 
This condition will be called the {\it main invariant}, and it implies that there are no shielding segments in the current lists.\par Fortunately, it is not hard to test whether a new left endpoint $p$ is in an $S$-$T$ cone. Obviously, segments that are not in the current $S$ and $T$ lists cannot be candidates for $s$ or $t$, and from the main invariant (plus the fact that there are no $S$-$S$ or $T$-$T$ crossings) it follows that either all cones containing $p$ have their upper side in $T$ and their lower side in $S$, or all are the other way around. It also follows that in each list the sides of those cones are consecutive elements {\it immediately} above or below $p$:\fig4 It doesn't matter which of the two sides of a cone we split, so, to make things simpler, let's always split the lower ones. Since the situation is symmetrical in $S$ and $T$, let's assume these are in $S$. If $t$ is the $T$ segment immediately above $p$, we can destroy all cones containing $p$ by splitting their lower sides at a point just to the right of the segment $t$, as indicated in the figure above\footfig1(3.5){Mairson's own version of the algorithm splits those segments at the current sweep abscissa ($x_p$), as shown in the figure at right. Actually, any splitting point between the intersection and $x=x_p$ would do, except for the order in which the intersections are reported and for a few extra details in the algorithm. Note that if we split $s$ at $x=x_p$ instead of at $s\inte t$, its new left endpoint can sit in $S$-$T$ cones not containing $p$ (like the cone $t_2$-$s_2$ in the figure); therefore, these cones too would have to be located and destroyed. However, this is a relatively trivial problem, which requires only a minor change in the algorithm.}. Splitting one such segment $s$ produces a left piece $s\pr$ and a right piece $s\prr$; besides replacing $s$ by $s\prr$ in the $S$ list, we have to report all intersections of $s\pr$ with members of the $T$ list. 
By using the main invariant once more, we can prove that $s\pr$ intersects one or more {\it consecutive} segments from the $T$ list, the lowest of which is $t$.\par The formal description of the algorithm is given below. To avoid worrying about degeneracies, we assume the segments to be open, i.e.\ not to include their endpoints. The lists of active $S$ and $T$ segments are represented as balanced search trees \prg{SL} and \prg{TL}, and are manipulated by the following routines:\par \def\tb{\hangindent 20pt\hangafter1\ \ \ \ } \tb\bu\ \prg{Locate($p$,$L$)}, where $L$ is \prg{SL} or \prg{TL}, returns the pair $(a,b)$ of segments from $L$ that are just above and below the point $p$;\par \tb\bu\ \prg{Delete($a$,$L$)} deletes segment $a$ from tree $L$;\par \tb\bu\ \prg{Insert($c$,$a$,$b$)} inserts segment $c$ between segments $a$ and $b$ in the tree (\prg{SL} or \prg{TL}) that contains both;\par \tb\bu\ \prg{Above($a$)}, \prg{Below($a$)} return the segment immediately above or below $a$ in the list containing $a$.\par The first three operations (including the rebalancing of the trees) take $O(\log n)$ time, and with a little care it is possible to implement the last two so that they take just $O(1)$ time.\par \def\intep{\mathrel{\inte}} The trees \prg{SL} and \prg{TL} initially contain two dummy horizontal segments at $y=\pm\infty$, each stretching from $x=-\infty$ to $x=+\infty$. We also use $a\intep b$ as a predicate meaning ``segment $a$ intersects segment $b$ on or to the left of the current sweep line ($x=x_p$)''.\par \penalty-3000 \algorithm{6.2}{Reporting intersections between two sets of pairwise disjoint segments.}{ \step 1. [Sort points.] {Sort all $2n$ segment endpoints by increasing $x$-coordinate. Let $p$ be the leftmost of them.} \step 2. [New left endpoint.] {Compute $(ta, tb)\gets\prg{Locate($p$,TL)}$, $(sa,sb)\gets\prg{Locate($p$,SL)}$. If $ta\intep sb$, let $(a,b)\gets(ta, sb)$; if $sa\intep tb$, let $(a,b)\gets (sa,tb)$. 
If neither case holds, jump to step 6.} \step 3. [Split segment $b$.] {Let $q$ be the intersection of $a$ and $b$. Set $a\pr\gets a$.} \step 4. [Report intersections of left part.] {Report $(a\pr, b)$ as an intersecting pair. Set $a\pr \gets \prg{Above($a\pr$)}$; if $a\pr\intep b$ go back to step 4.} \step 5. [Move endpoint of $b$.] {Change the left endpoint of $b$ (as stored in the corresponding tree) to $q$. Set $b\gets \prg{Below($b$)}$; if $a\intep b$ then set $a\pr\gets a$ and go back to step 4.} \step 6. [Insert new segment.] {(At this point we know $p$ does not sit in any $S$-$T$ cone.) Let $c$ be the segment whose left endpoint is $p$; if $c\in S$ then $\prg{Insert($c$,$sa$,$sb$)}$ else $\prg{Insert($c$,$ta$,$tb$)}$. Go to step 10.} \step 7. [New right endpoint.] {Let $c$ be the segment whose right endpoint is $p$. If $c\in S$ then $\prg{Delete($c$,SL)}$ and let $(a, b)\gets \prg{Locate($p$,TL)}$, else $\prg{Delete($c$,TL)}$ and let $(a, b)\gets \prg{Locate($p$,SL)}$. If $c\intep a$ then go to step 8; if $c\intep b$ go to step 9. Otherwise go to step 10.} \step 8. [Report upgoing intersection.] {Report $(c,a)$. Set $a\gets \prg{Above($a$)}$; if $c\intep a$ repeat step 8, otherwise go to step 10.} \step 9. [Report downgoing intersection.] {Report $(c, b)$. Set $b\gets \prg{Below($b$)}$; if $c\intep b$ repeat step 9, otherwise go to step 10.} \step 10. [Pick next point.] {Advance $p$ to the next endpoint in order of increasing $x$-coordinates. If $p$ is a left endpoint, go back to step 2, otherwise go back to step 7.}} Assuming the main invariant holds when we reach step 2, it will also hold when we reach step 6; this stems from the fact that the only active $S$-$T$ cones that may contain the new left endpoints introduced in step 5 are the ones that contain $p$, and these are all destroyed in steps 3--5. 
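As a concrete illustration, the predicate $a\intep b$ used throughout algorithm 6.2 could be realized as follows. This is only a rough Python sketch, assuming segments in general position and given as pairs of endpoints; the function names (\prg{cross}, \prg{segments\_cross}, \prg{intep}) are ours, not part of the algorithm's notation.

```python
def cross(o, a, b):
    # Signed area of the triangle o-a-b (> 0 iff a -> b turns left around o).
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def segments_cross(a, b):
    # Proper crossing of the open segments a and b (general position assumed).
    p, q = a
    r, s = b
    return (cross(p, q, r) * cross(p, q, s) < 0 and
            cross(r, s, p) * cross(r, s, q) < 0)

def intep(a, b, xp):
    # a "intep" b: the segments cross, and the crossing point lies
    # on or to the left of the current sweep line x = xp.
    if not segments_cross(a, b):
        return False
    (x1, y1), (x2, y2) = a
    (x3, y3), (x4, y4) = b
    # Standard determinant formula for the intersection abscissa.
    den = (x1-x2)*(y3-y4) - (y1-y2)*(x3-x4)
    t = ((x1-x3)*(y3-y4) - (y1-y3)*(x3-x4)) / den
    xi = x1 + t*(x2-x1)
    return xi <= xp
```

In a careful implementation one would use exact arithmetic (or at least a consistent tie-breaking rule) for these tests; the sketch above uses plain floating point.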
Since the remaining steps do not affect the invariant, and the latter is trivially true the first time we reach step 2, we conclude it will be true every time. Steps 2, 6, 7 and 10 are executed $O(n)$ times and each takes $O(\log n)$ time, giving a total of $O(n\log n)$. Step 1, executed once, also takes $O(n\log n)$. The remaining steps are executed $I$ times overall, and since each costs $O(1)$ the total time is $O(n\log n + I)$. Applying this algorithm to the figure below,\fig7 the following things will happen:\par \digress{ \parskip 1pt plus1pt $t_1$, $t_3$, $t_4$, $t_9$, $s_3$, $s_4$ inserted\par \dp intersection $t_9s_4$ reported; $t_9$ deleted\par \dp $s_2$ inserted\par \dp $t_2$ found in $t_1$-$s_3$ cone\par \dp intersection $s_3t_1$ reported; $s_3$ split\par \dp $t_2$ inserted\par \dp $s_1$ found in $s_2$-$t_4$ cone\par \dp intersections $s_2t_4$, $s_2t_3$, $s_2t_2$, $s_2t_1$ reported; $s_2$ split\par \dp intersections $s_3t_4$, $s_3t_3$ reported; $s_3$ split\par \dp intersections $s_4t_4$, $s_4t_3$ reported; $s_4$ split\par \dp $s_1$ inserted\par \dp $t_1$, $t_2$ deleted\par \dp $t_6$, $t_7$, $t_5$ inserted\par \dp intersection $t_5s_1$ reported; $t_5$ deleted\par \dp $t_8$ found in $t_7$-$s_2$ cone\par \dp intersections $t_7s_2$, $t_7s_1$ reported; $t_7$ split\par \dp intersection $t_6s_2$ reported; $t_6$ split\par \dp $t_8$ inserted\par \dp $t_3$, $t_4$, $s_1$ deleted\par \dp intersection $s_2t_8$ reported; $s_2$ deleted\par \dp intersections $s_3t_7$, $s_3t_6$ reported; $s_3$ deleted\par \dp intersections $s_4t_7$, $s_4t_6$ reported; $s_4$ deleted\par \dp $t_6$, $t_7$, $t_8$ deleted\par } Mairson's algorithm can be used to find the intersection of two arbitrary simple polygons $P$ and $Q$, whose edge sets are taken to be $S$ and $T$, respectively. The result is the set $V$ of all points at which the boundary of $P$ crosses that of $Q$. 
If this set is empty, then either $P$ and $Q$ are disjoint, $P\subset Q$, or $Q\subset P$; we can decide among these three cases in linear time, with no preprocessing, by checking whether a vertex of one polygon is inside the other.\par \digress{If the set is not empty, the intersection is a collection of one or more simple polygons, whose perimeters are chains belonging alternately to the perimeters of $P$ and $Q$, connected by points of $V$.\par \dp We probably would like to get each component of the intersection represented in the standard format for a simple polygon, namely as a list of its vertices (or edges) in counterclockwise order. Barring degeneracies, if two edges $pp\pr$ of $P$ and $qq\pr$ of $Q$ meet at a vertex $v\in V$, one of the two segments $vp\pr$ and $vq\pr$ (or at least a prefix of it) will be the edge of the intersection that follows $v$ in counterclockwise order.\fig3\par \dp We can easily find out which of the two satisfies this condition. Suppose it is $vp\pr$; we can list the vertices of that component in counterclockwise order by walking c.c.w.\ along $P$, starting at $v$, until we encounter another vertex of $V$; at that point we switch to the perimeter of $Q$, and we continue in the same manner until we get back to the starting point $v$.\par \dp There is a big problem with this method: we must know the ordering of the points of $V$ along the perimeter of each of the two polygons. Since we already know the ordering of the edges along each perimeter, all we need is the ordering of the $v$'s along each edge. Since each edge can have $\Omega(I/n)$ intersections, just sorting them would bring back the $\Omega(I\log n)$ term we avoided by using Mairson's algorithm.\par \dp Fortunately, Mairson's algorithm can be modified to give the intersections of each edge in increasing order of $x$-coordinate (although intersections of different edges are still randomly mixed). 
The essential modifications consist in destroying enclosing $S$-$T$ cones also when the sweep line encounters a {\it right} endpoint, and in reporting all intersections detected in the processing of each endpoint in the order opposite to that of algorithm 6.2, i.e.\ from left to right. The invariant then becomes ``no segment endpoint to the left of the sweep line sits in an active $S$-$T$ cone''. } \sect{6.3.4 Rectangle intersections and McCreight's priority search trees.} Another special case of the pairwise intersection problem for which asymptotically optimal performance can be attained is the one where $\Cscr$ is a set of rectangles aligned with the $x$ and $y$ axes.\fig4 This special case arises in the checking of VLSI design rules, particularly under the Mead-Conway methodology.\par The rectangles of $\Cscr$ that intersect a given vertical line define on it a collection of intervals. As the line sweeps from left to right over the rectangles, each interval will be seen to ``enter the scene'' at some definite moment, stay put for a while, and then suddenly disappear. If two rectangles intersect, there must be a moment when the corresponding intervals are both present and overlapping; since the intervals don't move up and down along the sweeping line, the overlap can be detected as soon as the second interval appears.\par Therefore, we have reduced the rectangle intersection problem to that of maintaining a dynamic collection of intervals supporting insertions, deletions, and queries of the following kind: given a test interval $T$, report all intervals in the collection that currently overlap $T$. 
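The reduction just described can be sketched in a few lines of Python. This is only an illustration of the sweep (all names are ours, and a plain dictionary stands in for the efficient interval structure, so the query below is linear rather than logarithmic):

```python
def rectangle_intersections(rects):
    # Report all intersecting pairs among axis-aligned rectangles,
    # given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    events = []                    # (x, kind, index); kind 0 = leave, 1 = enter
    for i, (x1, y1, x2, y2) in enumerate(rects):
        events.append((x1, 1, i))  # the interval [y1, y2] enters the scene
        events.append((x2, 0, i))  # ... and leaves at x = x2
    # At equal x, departures precede arrivals, so rectangles sharing
    # only a boundary are not reported (we treat them as "open").
    events.sort()
    active, pairs = {}, []
    for x, kind, i in events:
        if kind == 1:
            y1, y2 = rects[i][1], rects[i][3]
            for j, (u1, u2) in active.items():
                if y1 <= u2 and u1 <= y2:   # vertical intervals overlap
                    pairs.append((j, i))
            active[i] = (y1, y2)
        else:
            del active[i]
    return pairs
```

Replacing the dictionary scan by the structure described next brings the total cost down to $O(n\log n + I)$.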
If we could implement this structure in such a way that insertions and deletions take $O(\log n)$ time, and the interval query takes $O(\log n + s)$ time when there are $s$ hits, then we would be able to report all $I$ pairs of intersecting rectangles from $\Cscr$ in $O(n\log n + I)$ time.\par To solve this one-dimensional dynamic problem, we map it back into a two-dimensional (dynamic) one, by representing each interval $[x,y]$ as a point with coordinates $(x,y)$. For simplicity, let's assume all coordinates are between 0 and 1; then $0\leq x\leq y\leq 1$, implying that all intervals are mapped into points of the unit square above the main diagonal:\fig4\par Two intervals $T=[x,y]$ and $T\pr=[x\pr,y\pr]$ will intersect iff $x\leq y\pr$ and $y\geq x\pr$. In the two-dimensional analogy, this is equivalent to saying that $T\pr$ is above and to the left of the mirror image $\bar T=[y,x]$ of $T$ around the main diagonal.\fig4 So, the overlapping interval query is equivalent to a two-dimensional range query of a somewhat restricted kind, namely ``find all points inside a given rectangle whose top and left sides are anchored to the boundaries of the unit square''.\par In 1980, E. McCreight presented an elegant solution to this problem by representing the collection of points $(x,y)$ corresponding to our intervals in a data structure he called a {\it priority search tree}. This data structure is actually more powerful than what we need. Besides allowing insertions and deletions in time $O(\log n)$, it also allows us to find in $O(\log n + s)$ time all $s$ points $(x,y)$ with $a\leq x\leq b$ and $0\leq y\leq c$, where $a$, $b$ and $c$ are arbitrary given numbers. 
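The equivalence between interval overlap and the point-versus-mirror-image test above can be checked mechanically; here is a throwaway sketch (function names ours):

```python
import random

def overlaps(t, u):
    # Intervals [x, y] and [x2, y2] overlap iff x <= y2 and x2 <= y.
    return t[0] <= u[1] and u[0] <= t[1]

def above_left_of_mirror(u, t):
    # The point (x2, y2) lies (weakly) above and to the left of the
    # mirror image (y, x) of t = [x, y]: that is, x2 <= y and y2 >= x.
    return u[0] <= t[1] and u[1] >= t[0]

# The two predicates agree on random intervals in the unit square:
random.seed(1)
for _ in range(1000):
    t = tuple(sorted(random.uniform(0, 1) for _ in range(2)))
    u = tuple(sorted(random.uniform(0, 1) for _ in range(2)))
    assert overlaps(t, u) == above_left_of_mirror(u, t)
```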
This ``$1{1\over 2}$-dimensional query'' means reporting all points that fall into a rectangle with the base anchored to the bottom of the unit square\foot1{For the interval query problem, as it was phrased above, we would prefer to have it anchored at the top, but McCreight's paper defined the trees the other way around. Oh well, that is life$\ldots$}:\fig4\par McCreight's priority search trees also allow us to find the point of smallest $x$, largest $x$, or smallest $y$ in the given rectangle (but not the one with largest $y$), in just $O(\log n)$ time. As in a standard binary search tree, the root of McCreight's structure contains one point of the set, and the remaining points are assigned to the left or to the right subtree according to whether they are to the left or to the right of a vertical line. Unlike a standard tree, however, the point chosen as the root is {\it not} the one whose $x$-coordinate separates the two subtrees, but the one with smallest $y$-coordinate. The root also contains the range of $x$-coordinates spanned by the nodes of the tree, and the $x$-coordinate of the vertical line separating the two subtrees. The same is true recursively for all subtrees. As can be seen, the resulting structure is at the same time a heap in the $y$-direction and a search tree in the $x$-direction!\par \def\root{{\char'162\char'157\char'157\char'164}} It is not much harder to search, insert or delete an element in such trees than in standard ones. Let's consider first how to find the point of minimum $x$ in the query region. We begin by checking whether $y\root>c$; if so, we can stop, since by the heap property every point in the tree has $y\geq y\root$. If not, we search recursively for the point of minimum $x$ in the left subtree that falls into the region; if no such point is found, we look also in the right subtree. We can ignore either subtree if the region is completely on the ``wrong'' side of the vertical separating line. The final answer is either the result of this search or the root of the tree (if the latter falls in the region). 
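A minimal static sketch of such a tree and of the minimum-$x$ search may make the structure clearer. This is our own Python rendering, not McCreight's formulation; insertion and deletion, which need a balanced-tree skeleton to reach $O(\log n)$, are omitted:

```python
class Node:
    def __init__(self, point, split, left, right):
        self.point = point   # the point of smallest y in this subtree
        self.split = split   # x-coordinate of the separating vertical line
        self.left = left
        self.right = right

def build(points):
    # The root holds the point of smallest y; the remaining points are
    # divided by a vertical line through the median x-coordinate.
    if not points:
        return None
    pts = sorted(points)                  # by x
    root = min(pts, key=lambda p: p[1])   # smallest y becomes the root
    rest = [p for p in pts if p is not root]
    split = rest[len(rest) // 2][0] if rest else root[0]
    return Node(root, split,
                build([p for p in rest if p[0] < split]),
                build([p for p in rest if p[0] >= split]))

def min_x(node, a, b, c):
    # Point of minimum x with a <= x <= b and y <= c, or None.
    if node is None or node.point[1] > c:
        return None          # heap property: every y below is even larger
    best = None
    if a < node.split:       # the region meets the left half-plane
        best = min_x(node.left, a, b, c)
    if best is None and b >= node.split:
        best = min_x(node.right, a, b, c)
    x, y = node.point
    if a <= x <= b and (best is None or x < best[0]):
        best = node.point    # the root itself may beat the recursive answer
    return best
```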
The point of maximum $x$ can be found in an entirely analogous way.\par Finally, let's consider how to enumerate all points included in the given rectangle. If $y\root>c$ we can stop; otherwise, if $a\leq x\root \leq b$ we report the root's point. Then we apply the same procedure recursively to one or both subtrees, depending on the position of the region relative to the dividing line.\par It is not hard to see why these algorithms work, but it is not clear that they run in $O(\log n)$ time. In the classical binary search scheme each probe eliminates half of the search tree, which is not always the case here: after some probes we may have to continue the search down both subtrees. Clearly, some extra proof is required.\par \def\[{\mathrel{[}} \def\]{\mathrel{]}} Each node visited by the algorithms above falls in one of six classes, depending on how the range of $x$-coordinates of all its descendants (graphically denoted by $<>$) is related to the interval $[a, b]$ (denoted by $\[\,\]$). The six cases are: $<>\[\,\]$, $\[\,\]<>$, $<\[>\]$, $\[<\]>$, $<\[\,\]>$ and $\[<>\]$.\par Nodes of type $<>\[\,\]$ or $\[\,\]<>$ are never visited by the algorithms above. Any two nodes of type $<\[\,\]>$ have to be on the same path, so there can be no more than $O(\log n)$ of them in the whole tree; the same holds for nodes of type $<\[>\]$ and $\[<\]>$.\par The point stored in a node of type $\[<>\]$ will be in the range $a\leq x\root \leq b$, so the algorithm for locating the point of minimum $y$ will stop as soon as it gets to one of these nodes, i.e.\ it will visit one of them only if its parent is a node of a different type. Since at most $O(\log n)$ of the latter are visited, the same will be true of the former.\par As for the point enumeration algorithm, consider the tree consisting only of the {\it visited} nodes. A node of type $\[<>\]$ will be visited only if its parent is of a different type, or the parent is also of type $\[<>\]$ and has $0\leq y\leq c$. 
In the latter case the parent will be one of the $s$ reported points, and the former case can occur only $O(\log n)$ times. We conclude that this ``visitation'' tree has at most $O(\log n) + s$ internal nodes, and therefore at most $2(O(\log n) + s) + 1=O(\log n + s)$ total nodes.\par A similar argument gives the same result for the algorithm for finding the point of minimum $x$ in the given region. In the ``visitation'' tree as defined above, if a node $p$ has a left descendant of type $\[<>\]$ that is not a leaf, then that descendant will be in the query region, and the algorithm will not visit the right subtree of $p$. It follows that any two internal nodes of type $\[<>\]$ must be on the same path, and this in turn implies that the number of such internal nodes is at most $O(\log n)$. The number of internal nodes of other types is subject to the same bounds, so we conclude that the total number of visited nodes is also $O(\log n)$. \vfill\eject\end