\boldhead{3. Spatial Distributions}
Up to this point we have regarded the network as uniform, but in reality the cost of sending an update to a nearby site is much lower than the cost of sending the update to a distant site. To reduce communication costs, we would like our protocols to favor nearby neighbors, but there are drawbacks to too much local favoritism. It is easiest to begin exploring this tradeoff on a line.
Assume, for the moment, that the database sites are arranged on a linear network, and that each site is one link from its nearest neighbors. If we were using only nearest neighbors for anti-entropy exchanges, then the traffic per link per cycle would be $O(1)$, but it would take $O(n)$ cycles to spread an update. At the other extreme, if we were using uniform random connections, then the average distance of a connection would be $O(n)$, so that even though the convergence would be $O(\log n)$, the traffic per link per cycle would be $O(n)$. In general, let the probability of connecting to a site at distance $d$ be proportional to $d^{-a}$, where $a$ is a parameter to be determined. Analysis shows that the expected traffic per link is:
$$ T(n) = \cases{O(n), & $a<1$;\cr
O(n/\log n), & $a=1$;\cr
O(n^{2-a}), & $1 < a < 2$;\cr
O(\log n), & $a = 2$; \cr
O(1), &$a > 2$.\cr}$$
Convergence times for the $d^{-a}$ distribution are much harder to compute exactly. Informal equations and simulations suggest that they follow the reverse pattern: for $a > 2$ the convergence is polynomial in $n$, and for $a < 2$ the convergence is polynomial in $\log n$. This strongly suggests using a $d^{-2}$ distribution for spreading updates on a line, since both traffic and convergence would scale well as $n$ goes to infinity.
A realistic network bears little resemblance to the linear example used above (surprisingly, small sections of the CIN are in fact linear, but the bulk of the network is more highly connected), so it is not immediately obvious how to generalize the $d^{-2}$ distribution. The above reasoning can be generalized to higher dimensional rectilinear meshes of sites, suggesting good convergence (polynomial in $\log n$) with distributions as tight as $d^{-2D}$, where $D$ is the dimension of the mesh. Moreover, the traffic drops to $O(\log n)$ as soon as the distribution is $d^{-(D+1)}$, so we have a broader region of good behavior, but it is dependent on the dimension of the mesh, $D$. This observation led us to consider letting each site $s$ independently choose connections according to a distribution that is a function of $Q←s (d)$, where $Q←s (d)$ is the cumulative number of sites at distance $d$ or less from $s$. On a $D$-dimensional mesh, $Q←s (d)$ is $\Theta(d^D)$, so that a distribution like $1/Q←s (d)^2$ is $\Theta(d^{-2D})$, regardless of the dimension of the mesh. We conjectured that the $Q←s (d)$ function's ability to adapt to the dimension of a mesh would make it work well in an arbitrary network, and that the asymptotic properties would make distributions between $1/dQ←s (d)$ and $1/Q←s (d)^2$ have the best scaling behavior. The next section describes further practical considerations for the choice of distribution and our experience with $1/Q←s (d)^2$.