[_CD6_]<ralpine>CHSpaper>S3r1.tex!22

\boldhead{\bf 3.1 Spatial Distributions and Anti-Entropy}

We argued above that a nonuniform spatial distribution can significantly reduce the traffic generated by anti-entropy without unacceptably increasing its convergence time. The network topologies considered were very regular: $D$-dimensional rectilinear grids.

Use of a nonuniform distribution becomes even more attractive when we consider the actual CIN topology. In particular, the CIN includes small sets of critical links, such as a pair of transatlantic links that are the only routes connecting $n𡤁$ sites in Europe to $n𡤂$ sites in North America. Currently $n𡤁$ is a few tens, and $n𡤂$ is several hundred. Using a uniform distribution, the expected number of conversations on these transatlantic links per round of anti-entropy is $2n𡤁 n𡤂 /(n𡤁 + n𡤂 )$. This is about 80 conversations, shared between the two links. By comparison, when averaged over all links, the expected traffic per link per cycle is less than 6 conversations. It is the unacceptably high traffic on critical links, rather than the average traffic per link, that makes uniform anti-entropy too costly for use in our system. This observation originally inspired our study of nonuniform spatial distributions.

To learn how network traffic could be reduced, we performed extensive simulations of anti-entropy behavior using the actual CIN topology and a number of different spatial distributions.

Preliminary results indicated that distributions parameterized by $Q←s (d)$ adapt well to the ``local dimension'' of the network as suggested in Section 3, and perform significantly better than distributions with any direct dependence on $d$. In particular, $1/Q←s (d)^2$ outperforms $1/dQ←s (d)$. The results using spatial distributions of the form $Q←s(d)^{-a}$ for anti-entropy were very encouraging. However, early simulations of rumor mongering uncovered a few problem spots in the CIN topology when spatial distributions were used.

After examining these results, we developed a slightly different class of distributions that are less sensitive to sudden increases in $Q←s (d)$. These distributions proved to be more effective for both anti-entropy and rumor mongering on the CIN topology. Informally, let each site $s$ build a list of the other sites sorted by their distance from $s$, and then select anti-entropy exchange partners from the sorted list according to a function $f(i)$. This function gives the probability of choosing a site as a function of its position $i$ in the list. For $f$ we can use the spatial distribution function that would be used on a uniform linear network. Of course, two sites at the same distance from $s$ in the real network (but at different positions in the list) should not be selected with different probabilities. We can arrange for this by averaging the probabilities of selecting equidistant sites; i.e., by selecting each of the sites at distance $d$ with probability proportional to

$$ p(d) = { \sum←{i=Q(d-1)+1}^{Q(d)} f(i) \over Q(d) - Q(d-1) }. $$

Letting $f = i^{-a}$, where $a$ is a parameter to be determined, and approximating the summation by an integral, we obtain

$$ p(d) \approx {{Q(d-1)^{-a+1} - Q(d)^{-a+1}}\over {Q(d) - Q(d-1)}}.\eqno (3.1.1)$$

\noindent

Note for $a$ = 2 the right side of (3.1.1) reduces to

$$1/(Q←s (d-1)Q←s (d)),$$

\noindent
which is $O(d^{-2D})$ on a $D$-dimensional mesh, as discussed in Section 3.

Simulation results reported in this and the next section use either uniform distributions or distributions of the above form for selected values of $a$. (To avoid the singularity at Q(d) = 0 we add one to Q(d) throughout equation (3.1.1).)

Table 4 summarizes results for the CIN topology using push-pull anti-entropy with no connection limit. This table represents 250 runs, each propagating a single update introduced at a randomly chosen site. The ``Compare Traffic'' figures represent the number of anti-entropy comparisons per cycle, averaged over all network links and averaged separately for the transatlantic link to Bushey, England. ``Update Traffic'' represents the total number of anti-entropy exchanges in which the update had to be sent from one site to the other. (The distinction between compare and update traffic can be significant if checksums are used for database comparison, as discussed in Section 1.3).

\vskip 10pt

\noindent

{\bf Table 4.} Simulation results for anti-entropy, no connection limit.

\vskip 5pt\penalty 1000
{\vbox{\tabskip=0pt \offinterlineskip
\def\tr{\noalign{\hrule}}

\halign to\hsize{\strut#& \vrule#\tabskip=1em plus2em&\hfil#\hfil& \vrule#& \hfil#\hfil& \vrule#&\hfil#\hfil& \vrule#& \hfil#\hfil& \vrule#& \hfil#\hfil& \vrule#& \hfil#\hfil& \vrule#&\hfil#\hfil& \vrule#\tabskip=0pt\cr\tr
&&Spatial&&\hfil$t←{last}$\hfil&&\hfil$t←{ave}$\hfil&&\multispan3\hfil Compare Traffic\hfil &&\multispan3\hfil Update Traffic\hfil&\cr
&&Distribution&& && &&Average&&Bushey&&Average&&Bushey&\cr\tr

&& uniform && 7.8 && 5.3 && 5.9 && 75.7 && 5.8 && 74.4 &\cr\tr
&& $a=1.2$ && 10.0 && 6.3 && 2.0 && 11.2 && 2.6 && 17.5 &\cr\tr
&& $a=1.4$ && 10.3 && 6.4 && 1.9 && 8.8 && 2.5 && 14.1 &\cr\tr
&& $a=1.6$ && 10.9 && 6.7 && 1.7 && 5.7 && 2.3 && 10.9 &\cr\tr
&& $a=1.8$ && 12.0 && 7.2 && 1.5 && 3.7 && 2.1 && 7.7 &\cr\tr
&& $a=2.0$ && 13.3 && 7.8 && 1.4 && 2.4 && 1.9 && 5.9 &\cr\tr}}}

\vskip 10pt

\noindent

Two points are worth mentioning:

\item{1.} Comparing the $a=2$ results with the uniform case, convergence time $t←{last}$ degrades by less than a factor of 2, while average traffic per round improves by a factor of more than 4. Arguably, we could afford to perform anti-entropy twice as frequently with the nonuniform distribution, thereby getting an equivalent convergence rate with a factor of two improvement in average traffic.

\item{2.} Again comparing the $a=2$ results with the uniform case, the compare traffic on the transatlantic link falls from 75.7 to 2.4, a factor of more than 30. Traffic on this link is now less than twice the mean. We view this as the most important benefit of nonuniform distributions for the CIN topology. Using this distribution, anti-entropy exchanges do not overload critical network links.

\noindent

The results in Table 4 assume no connection limit. This assumption is quite unrealistic -- the actual Clearinghouse servers can support only a few anti-entropy connections at once. Table 5 gives simulation results under the most pessimistic assumption: connection limit of 1 and hunt limit 0. As above, the table represents 250 runs, each propagating a single update introduced at a randomly chosen site.

\vskip 10pt\penalty-400

\noindent

{\bf Table 5.} Simulation results for anti-entropy, connection limit 1.

\penalty 1000\vskip 5pt\penalty 1000
{\vbox{\tabskip=0pt \offinterlineskip
\def\tr{\noalign{\hrule}}

\halign to\hsize{\strut#& \vrule#\tabskip=1em plus2em&\hfil#\hfil& \vrule#& \hfil#\hfil& \vrule#&\hfil#\hfil& \vrule#& \hfil#\hfil& \vrule#& \hfil#\hfil& \vrule#& \hfil#\hfil& \vrule#&\hfil#\hfil& \vrule#\tabskip=0pt\cr\tr
&&Spatial&&\hfil$t←{last}$\hfil&&\hfil$t←{ave}$\hfil&&\multispan3\hfil Compare Traffic\hfil&&\multispan3\hfil Update Traffic\hfil&\cr
&&Distribution&& && &&Average&&Bushey&&Average&&Bushey&\cr\tr

&& uniform && 11.0 && 7.0 && 3.7 && 47.5 && 5.8 && 75.2 &\cr\tr
&& $a=1.2$ && 16.9 && 9.9 && 1.1 && 6.4 && 2.7 && 18.0 &\cr\tr
&& $a=1.4$ && 17.3 && 10.1 && 1.1 && 4.7 && 2.5 && 13.7 &\cr\tr
&& $a=1.6$ && 19.1 && 11.1 && 0.9 && 2.9 && 2.3 && 10.2 &\cr\tr
&& $a=1.8$ && 21.5 && 12.4 && 0.8 && 1.7 && 2.1 && 7.0 &\cr\tr
&& $a=2.0$ && 24.6 && 14.1 && 0.7 && 0.9 && 1.9 && 4.8 &\cr\tr}}}

\vskip 10pt

\noindent

Note:

\item{3.} The ``Compare Traffic'' figures in Table 5 are significantly lower than those in Table 4, reflecting the fact that fewer successful connections are established per cycle. The convergence times in Table 5 are correspondingly higher than those in Table 4. These effects are more pronounced with the less uniform distributions.

\item{4.} The total comparison traffic (which is the product of convergence time and ``Compare Traffic'') does not change significantly when the connection limit is imposed; the compare traffic per cycle decreases, while the number of cycles increases.

\noindent

To summarize: using a spatial distribution with anti-entropy can significantly reduce traffic on critical links that would be ``hot spots'' if a uniform distribution were used. The most pessimistic connection limit slows convergence but does not significantly change the total amount of traffic generated in distributing the update; it just slows down distribution of the update somewhat.

Based on our early simulation results, a nonuniform anti-entropy algorithm using a $1/Q(d)^2$ distribution was implemented as part of an internal release of the Clearinghouse service. The new release has now been installed on the entire CIN and has produced dramatic improvements both in network load generated by the Clearinghouse servers and in consistency of their databases.