\input CS445Hdr % This is CS445N4B of February 9, 1982 12:00 AM
\setcount0 20
\def\header{4. Convex hull algorithms}
\def\title{Lecture 8}
\sect{4.6. Analysis of ``split and merge'' algorithms.}
In the previous sections, we discussed two recursive algorithms for finding the convex hull (4.3 and 4.4) that could be described as using the ``split-and-merge'' method. The running time $H(n)$ of such algorithms is given by the following general recurrence:
$$H(n)\leq\alter{$O(1)$|if $n=1$,\cr\vb
$H(\alpha n)+H(\beta n)+S(n)+M(n)$|otherwise,\cr}\eqno(1)$$
where $S(n)$ is the time required to split a problem of size $n$ into two subproblems whose sizes are at most $\alpha n$ and $\beta n$, and $M(n)$ is the time required to merge the two solutions. The formula above holds both for maximum (worst-case) and for average times.\par
Recurrence (1) is extremely common in the analysis of combinatorial algorithms. The asymptotic behavior of $H(n)$ depends on that of $S(n)+M(n)$, and on the constants $\alpha$ and $\beta$. Some of the most common cases are summarized in the table below:
$$\vbox{\tabskip0pt \halign{\ctr{\hskip10pt #}|\ctr{\hskip10pt #}|\ctr{\hskip10pt #}\cr
$\alpha+\beta$|$S(n)+M(n)$|$H(n)$\cr
\noalign{\vskip6pt\hrule\vskip6pt}
$\null\leq 1$|$O(n)$|$O(n\log n)$\cr\vb
$\null\leq 1$|$O(n^p)$, $p<1$|$O(n)$\cr\vb
$\null\leq c<1$|$O(n)$|$O(n)$\cr\vb
$\null\leq 1$|$\dispstyle O\biglp{n\over{\log n}}\bigrp$|$O(n\log\log n)$\cr\vb
$\null\leq 1$|$\dispstyle O\biglp{n\over{(\log n)^p}}\bigrp$, $p>1$|$O(n)$\cr}}\eqno(2)$$
For example, in algorithm 4.3 (Preparata-Hong) the split and merge operations took $O(n)$ time, and $\alpha=\beta={1\over2}$\foot1{This is not exactly true, since $\lceil n/2\rceil$ can be as large as ${2n\over3}$ (for example, when $n=3$). When $n$ is not a power of 2, we need a more general version of recurrence (1), namely\ftpar
\null\ftfill$\dispstyle H(n)\leq H(\alpha_n n)+H(\beta_n n)+S(n)+M(n)$\ftpar
where $\alpha_n$ and $\beta_n$ are allowed to depend on $n$. Table (2) remains valid in this general formulation, as long as we have $\alpha_n\leq\delta$ and $\beta_n\leq\delta$ for some constant $\delta$ strictly less than 1.}, so the total running time is $O(n\log n)$. The same is true of algorithm 4.4 (Shamos-Hoey). In the latter, the split operation requires a preliminary sorting step, so the bound is tight: the algorithm requires $\Theta(n\log n)$ time\foot2{We say $f(n)=\Omega(g(n))$ iff we have $\leftv f(n)\rightv\geq cg(n)$ for some positive constant $c$ and all sufficiently large $n$. We say that $f(n)=\Theta(g(n))$ iff $f(n)=O(g(n))$ and $f(n)=\Omega(g(n))$, i.e. we have $c_1g(n)\leq\leftv f(n)\rightv\leq c_2g(n)$ for some positive constants $c_1, c_2$ and for all sufficiently large $n$.}.\par
Actually, in both algorithms the merge operation requires only $O(h_1+h_2)$ operations, where $h_1$ and $h_2$ are the numbers of vertices on the hulls of the two subproblems. In the worst case, we may have $h_1+h_2=n$; however, the {\it average} value of this quantity grows much more slowly than $n$, as we shall see.\par
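The entries of table (2) are easy to check numerically. The sketch below (in Python, which is not part of these notes; the helper name and the choice $\alpha=\beta={1\over2}$ are ours, for illustration only) evaluates recurrence (1), taken with equality, for several of the $S(n)+M(n)$ rows of the table:

```python
from functools import lru_cache
import math

def solve_recurrence(n, split_merge):
    """Exact solution of recurrence (1), taken with equality, for
    alpha = beta = 1/2:  H(n) = H(ceil(n/2)) + H(floor(n/2)) + (S+M)(n),
    with H(1) = 1."""
    @lru_cache(maxsize=None)
    def H(m):
        if m <= 1:
            return 1.0
        return H((m + 1) // 2) + H(m // 2) + split_merge(m)
    return H(n)

if __name__ == "__main__":
    rows = [                                          # rows of table (2)
        ("S+M = n",          lambda m: float(m)),             # O(n log n)
        ("S+M = sqrt(n)",    lambda m: math.sqrt(m)),         # O(n)
        ("S+M = n/lg n",     lambda m: m / math.log2(m)),     # O(n log log n)
        ("S+M = n/(lg n)^2", lambda m: m / math.log2(m) ** 2),# O(n)
    ]
    for n in (2 ** 10, 2 ** 15, 2 ** 20):
        for name, sm in rows:
            print(f"n = {n:>9}  {name:<16}  H(n)/n = "
                  f"{solve_recurrence(n, sm) / n:8.3f}")
```

For the first row the ratio $H(n)/n$ grows like $\lg n$ and for the third it grows like $\lg\lg n$, while for the second and fourth it approaches a constant, in agreement with the table.\par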
We already remarked that for algorithm 4.3 the statistical distribution of the points in the two subproblems is the same as in the original one. If we let $\bar h(n)$ be the average number of vertices on the convex hull of $n$ random points of the plane, then the average merge time for algorithm 4.3 will be
$$O\left(\bar h\biglp\lceil{n\over2}\rceil\bigrp+\bar h\biglp\lfloor{n\over2}\rfloor\bigrp\right)=O\left(\bar h(n)\right),$$
and the split time is $O(1)$.\par
We should note that the function $\bar h(n)$ is also relevant to the analysis of Jarvis' algorithm (4.1): as we saw, its running time is $O(nh)$, where $h$ is the size of the resulting hull. The asymptotic behavior of $\bar h(n)$ is therefore worthy of a more detailed analysis.\par
\sect{4.7. Random points in a triangle.}
In this section we will analyze the behavior of $\bar h(n)$ in a very simple case: we will assume the $n$ given points are uniformly distributed inside a triangular region of the plane. Moreover, we will henceforth assume that the points are independent of each other.\par
There are several obvious ways to tackle the analysis of $\bar h(n)$. For example, we might try to use some ``divide and conquer'' method (like the one used in algorithm 4.3) to get a recurrence relation for $\bar h(n)$. We could also try to compute the probability that one point falls inside the triangle defined by three others, or that the $(i+1)$-th point falls inside the hull of the preceding $i$ ones, and somehow use this result to get the average number of vertices of the hull.\par
Unfortunately, these ideas generally translate into rather messy statistical formulas, which are very hard to evaluate. The approach we will follow in this section is the one used by A. R\'enyi and R. Sulanke (1963) to prove the result below:
\theorem{4.4}{The convex hull of a set $P$ of $n$ random points uniformly and independently distributed in an arbitrary triangle $T$ has, on the average,
$$\eqalign{\bar h(n)|=2H_{n-1}\cr\vb
\null|=2(1+{1\over2}+{1\over3}+\cdots+{1\over{n-1}})\cr\vb
\null|=2(\ln n+\gamma+o(1))\cr}$$
vertices.}
\proof{Note that the number of edges of the convex hull is equal to the number of vertices, and that each edge is determined by two of the given points. Therefore, $\bar h(n)$ is also the average number of pairs $p_i, p_j\in P$ such that the segment $p_ip_j$ is an edge of the hull. If we define the random variables
$$\epsilon_{ij}=\alter{1|if $p_ip_j$ is an edge of $\hull(P)$,\cr\va
0|otherwise,\cr}\eqno{(3)}$$
then the number of edges (vertices) of $\hull(P)$ will be
$$\sum_{1\leq i<j\leq n}\epsilon_{ij}.$$
Taking expected values, $\bar h(n)$ is the sum, over all pairs $i<j$, of the probability that the segment $p_ip_j$ is an edge of $\hull(P)$; evaluating these probabilities for points uniformly distributed in a triangle yields the formula above.}
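Theorem 4.4 lends itself to a quick empirical check. The following sketch (Python again; the particular triangle, the sample sizes, and the helper names are our own choices, and the monotone-chain hull routine is used only as a black box, not as one of the algorithms of this chapter) estimates $\bar h(n)$ by simulation and compares it with $2H_{n-1}$:

```python
import math, random

def point_in_triangle(A, B, C):
    """One point uniformly distributed in triangle ABC
    (the standard square-root trick)."""
    u, v = random.random(), random.random()
    s = math.sqrt(u)
    return ((1 - s) * A[0] + s * ((1 - v) * B[0] + v * C[0]),
            (1 - s) * A[1] + s * ((1 - v) * B[1] + v * C[1]))

def hull_size(points):
    """Number of convex hull vertices (Andrew's monotone chain)."""
    pts = sorted(set(points))
    def half(seq):
        out = []
        for p in seq:
            while len(out) >= 2 and (
                    (out[-1][0] - out[-2][0]) * (p[1] - out[-2][1])
                  - (out[-1][1] - out[-2][1]) * (p[0] - out[-2][0])) <= 0:
                out.pop()
            out.append(p)
        return out[:-1]
    return len(half(pts)) + len(half(pts[::-1]))

if __name__ == "__main__":
    A, B, C = (0.0, 0.0), (1.0, 0.0), (0.3, 1.0)
    n, trials = 1000, 200
    avg = sum(hull_size([point_in_triangle(A, B, C) for _ in range(n)])
              for _ in range(trials)) / trials
    print(f"observed average: {avg:.2f}")
    print(f"2 H_(n-1)       : {2 * sum(1 / k for k in range(1, n)):.2f}")
```

With $n=1000$ the theorem predicts $2H_{999}\approx 14.97$ hull vertices on the average, and the observed average should agree with it to within the sampling error.\par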
\sect{4.8. A lower bound for the convex hull problem.}
Is the $\Theta(n\log n)$ worst-case bound of algorithms 4.3 and 4.4 the best possible? A result of A. C. Yao shows that no asymptotically faster algorithm exists, as long as we restrict ourselves to algorithms that obtain all their information about the input through {\it quadratic tests}:
\theorem{4.5}{(A. C. Yao, 1981.) Any decision-tree algorithm that computes the convex hull of $n$ points on the plane by means of quadratic tests must perform $\Omega(n\log n)$ such tests in the worst case.}
A quadratic test is a comparison of a quantity $F(x_1, y_1, x_2, y_2, \ldotss, x_n, y_n)$ against zero ($\null>0$, $\null=0$, etc.), where $F$ is a second-degree polynomial and the $x_i, y_i$ are the coordinates of the input points. Note that the logical tests performed by the algorithms we discussed so far (like the cross-product test, or the function $\prg{SameSide($p$, $q$, $r$, $s$)}$) can be rewritten using a bounded number of such quadratic tests. This holds even if we allow the tests to be applied to intermediate variables, as long as these are linear functions of the inputs. As a matter of fact, all convex hull algorithms that can be found in the literature perform only quadratic tests on the input points, so theorem 4.5 is extremely important.\par
For a given value of $n$, a convex hull algorithm can be ``unfolded'' so as to become a binary decision tree, where each internal node $t$ is associated with a fixed quadratic test $f_t(x_1, y_1, x_2, y_2, \ldotss, x_n, y_n)\geq 0$ and each leaf $s$ with a subset $N_s$ of $\leftset 1, 2, \ldotss, n\rightset$.
Each input $P=\leftset p_1, p_2, \ldotss, p_n\rightset$ defines a path in this tree, where the edge taken after each internal node $t$ is determined by whether $P$ satisfies or fails the associated test $f_t$. The leaf $s$ where this path ends is such that $\hull(P)=\leftset p_i\relv i\in N_s\rightset$.\par
The central idea of Yao's proof is the following. Let $n=2m$, and let $R^*$ be a regular polygon with $m$ vertices $\leftset r_1^*, r_2^*, \ldotss, r_m^*\rightset$.\par
For each permutation $\sigma$ of $1, 2, \ldotss, m$, define the family $\Pscr_\sigma$ as being the collection of all $2m$-point sets $P=\leftset r_1, r_2, \ldotss, r_m, q_1, q_2, \ldotss, q_m\rightset$, where $\leftset r_1, r_2, \ldotss, r_m\rightset$ is a ``slightly perturbed'' copy of $R^*$, and each $q_i$ is a point on the segment $r_{\sigma(i)}r_{\sigma(i)+1}$\foot1{Here, as usual, we assume the $r_i$ to be cyclically indexed, i.e. $r_i$ actually means $r_{(i-1)\modop m+1}$.} and somewhere near its middle point:\fig5\par
For each member $P=\leftset r_1, r_2, \ldotss, r_m, q_1, q_2, \ldotss, q_m\rightset$ of $\Pscr_\sigma$, we have $\hull(P)=\leftset r_1, r_2, \ldotss, r_m\rightset$, but the points $q_1, q_2, \ldotss, q_m$ are on the ``boundary'' of decision: an infinitesimal change in their coordinates would be enough to make them vertices of the hull. When the algorithm is applied to such a $P$, it must check each point $q_{\sigma^{-1}(i)}$ against the adjacent vertices $r_i$ and $r_{i+1}$ for collinearity. In doing so, the algorithm will essentially determine for each $q_i$ the corresponding side $r_{\sigma(i)}r_{\sigma(i)+1}$, i.e. it will determine the permutation $\sigma$. Since each test has only two outcomes, to distinguish between the $m!$ possible permutations the algorithm needs at least $\lg(m!)=\Omega(n\log n)$ tests.\par
Of course, this argument is little more than hand-waving, but unfortunately the details are too complex to give here in full. For them, the reader is referred to the original paper [A. C. Yao, {\it A lower bound to finding convex hulls}, JACM 28,4 (Oct. 1981) 780--787].\par
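To make the counting step concrete, the sketch below (Python; the perturbation scheme, with the $q_i$ pushed slightly {\it inside} the polygon so that the hull is unambiguous, is our simplification, not Yao's exact construction) builds one member of a family like $\Pscr_\sigma$ and exhibits the two quantities of the argument: the hull has only the $m$ vertices $r_k$, while determining $\sigma$ requires $\lg(m!)$ binary tests:

```python
import math, random

def yao_instance(m, sigma, eps=1e-6):
    """A set in the spirit of the family P_sigma: the m vertices r_k of a
    regular m-gon, plus, for each i, a point q_i near the midpoint of the
    side r_{sigma(i)} r_{sigma(i)+1}.  Here the q_i are pushed a distance
    ~eps toward the center so that the hull is unambiguous; in the proof
    itself they lie exactly on the sides."""
    r = [(math.cos(2 * math.pi * k / m), math.sin(2 * math.pi * k / m))
         for k in range(m)]
    q = []
    for i in range(m):
        a, b = r[sigma[i]], r[(sigma[i] + 1) % m]
        mx, my = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2
        q.append((mx * (1 - eps), my * (1 - eps)))   # slightly inside
    return r + q

def hull_size(points):
    """Number of convex hull vertices (Andrew's monotone chain)."""
    pts = sorted(set(points))
    def half(seq):
        out = []
        for p in seq:
            while len(out) >= 2 and (
                    (out[-1][0] - out[-2][0]) * (p[1] - out[-2][1])
                  - (out[-1][1] - out[-2][1]) * (p[0] - out[-2][0])) <= 0:
                out.pop()
            out.append(p)
        return out[:-1]
    return len(half(pts)) + len(half(pts[::-1]))

if __name__ == "__main__":
    m = 8
    sigma = list(range(m))
    random.shuffle(sigma)                    # one of the m! permutations
    P = yao_instance(m, sigma)
    print(hull_size(P))                      # m: only the r_k are vertices
    print(math.lgamma(m + 1) / math.log(2))  # lg(m!), tests needed for sigma
```

For $m=8$ the hull has exactly 8 vertices while $\lg(m!)\approx 15.3$; in general $\lg(m!)=\Theta(m\log m)=\Theta(n\log n)$.\par
\vfill\eject\end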