\input Paper.tex
\centerline{\titlefont Section 2 of the Paper}

{\noindent\bf 2. Deletion}

{\noindent\bf 2.0 Death Certificates}

Using either anti-entropy or rumor mongering, we cannot delete an item from the database simply by removing a local copy and expecting the ``absence'' of the item to spread to other sites. Just the opposite happens: the propagation mechanism spreads old copies of the item from elsewhere in the database back to the site where we deleted it. Unless we can simultaneously delete all copies of an obsolete item from the database, it will eventually be ``resurrected'' in this way.

To remedy this problem we replace deleted items with {\sl death certificates}, which carry time stamps and spread like ordinary data. During propagation, when old copies of deleted items meet death certificates, the old items are removed. If we keep a death certificate long enough, it eventually cancels all old copies of its associated item.

Unfortunately this does not completely solve the problem: we must still decide when to delete the death certificates themselves, so that they do not ultimately consume all available storage at the sites. One strategy is to retain each death certificate until it can be determined that every site has received it. Sarin and Lynch [??] describe a protocol for making this determination, based on the distributed snapshot algorithm of Chandy and Lamport [??]. Separate protocols are needed for adding and removing sites (the site addition protocol is not actually described in [??]). If any site has failed since the creation of a death certificate, that death certificate cannot be deleted until the site removal protocol has been run. In a network of several hundred sites this restriction is quite significant: in our experience there is a fairly high probability that at any given time some site will be down for hours or even days, preventing the distributed snapshot or site removal algorithm from completing.

A much simpler strategy is to hold death certificates for some fixed length of time, such as 30 days, and then discard them. With such a threshold we run the risk that obsolete items older than the threshold will be resurrected, as described above. Fortunately, this risk can be made very small.

{\noindent\bf 2.1 Death Certificate Decay}

There is a distributed way of extending the threshold back further than the space on any one server would permit. This scheme, which we call {\sl death certificate decay}, uses two thresholds, $\tau_1$ and $\tau_2$. Each server keeps every death certificate timestamped within $\tau_1$ of the current time; once a death certificate is older than $\tau_1$, it is discarded at rate $1/\tau_2$, so the probability that it is retained falls off exponentially:
$$P_1(t) = \cases{1, & $0 \le t < \tau_1$;\cr e^{-{t - \tau_1 \over \tau_2}}, & $t \ge \tau_1$.\cr}$$
(For simplicity we ignore the differences between sites' local clocks. It is realistic to assume that the clock synchronization error is at most $\epsilon \ll \tau_1$; this has no significant effect on the arguments below.) Here $P_1(t)$ is the probability that a death certificate of age $t$ is still present in the database of a single site. This probability drops off very quickly beyond $\tau_1$. However, as will be seen below, we can ensure that resurrection of a data item cannot occur so long as its death certificate is present at {\sl any} site in the network.

For $n$ sites, the probability that at least one site holds a given death certificate of age $t$ is
$$P_n(t) = 1 - (1 - P_1(t))^n.$$
To compare this with a fixed threshold $\tau$, assume that the rate of deletion is steady over time, so that for equivalent use of storage at each site we should choose $\tau_1 + \tau_2 = \tau$. For example, suppose we have $500$ sites using a fixed $\tau = 30$ days, and we replace it with decaying thresholds $\tau_1 = 10$ days and $\tau_2 = 20$ days. There is now a negligible ($10^{-100}$) chance that a death certificate of age $30$ days has been discarded by every site, and a significant chance ($.965$) that a certificate of age $110$ days is still held by some site.
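These figures are easy to check numerically. The sketch below (plain Python; the function names are ours, chosen for this illustration only) evaluates $P_1$ and $P_n$ directly and reproduces both probabilities quoted above.

import math

def p1(t, tau1, tau2):
    # P1(t): probability that a single site still holds a death
    # certificate of age t, per the case formula above.
    if t < tau1:
        return 1.0
    return math.exp(-(t - tau1) / tau2)

def pn(t, tau1, tau2, n):
    # Pn(t): probability that at least one of n sites holds it.
    return 1.0 - (1.0 - p1(t, tau1, tau2)) ** n

n, tau1, tau2 = 500, 10.0, 20.0          # tau1 + tau2 = tau = 30 days

# Chance that ALL 500 sites have discarded a 30-day-old certificate:
print((1.0 - p1(30, tau1, tau2)) ** n)   # ~2e-100, negligible

# Chance that SOME site still holds a 110-day-old certificate:
print(pn(110, tau1, tau2, n))            # ~0.966, the ".965" quoted above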
{\noindent\bf 2.2 Anti-Entropy with Death Certificate Decay}

If anti-entropy is used for distributing updates, decaying death certificates older than $\tau_1$ should not normally be propagated during anti-entropy exchanges. Whenever such a death certificate encounters an obsolete data item, however, the death certificate must be ``reactivated'' in some way, so that it propagates to all sites and cancels the obsolete item.

The obvious way to reactivate a death certificate is to set its timestamp forward to the current clock value. This might be acceptable in a system that did not allow deleted data items to be ``reinstated.'' In general it is incorrect, because somewhere in the network there could be a legitimate update with a timestamp between the original and revised timestamps of the death certificate (e.g., one reinstating the deleted item). Such an update would incorrectly be cancelled by the reactivated death certificate.

To avoid this problem, we store a second timestamp, called the {\sl activation timestamp}, with each death certificate. Initially the ordinary and activation timestamps are the same. A death certificate cancels a corresponding data item if its ordinary timestamp is greater than the timestamp of the item; it is propagated during anti-entropy if its activation timestamp is within $\tau_1$ of the current time. To reactivate a death certificate, we set its activation timestamp to the current time, leaving its ordinary timestamp unchanged. This has the desired effect of propagating the reactivated death certificate without cancelling more recent updates.
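The two-timestamp rule is simple enough to state as code. The following is a minimal sketch in the same spirit, assuming one record per death certificate and a caller that supplies the current time; the record layout and names are ours, for illustration only.

from dataclasses import dataclass

@dataclass
class DeathCert:
    ordinary: float     # timestamp of the deletion; never changes
    activation: float   # initially equal to the ordinary timestamp

def cancels(cert, itemTimestamp):
    # Cancellation tests the ORDINARY timestamp, so a legitimate
    # later update (e.g. one reinstating the item) survives even a
    # reactivated certificate.
    return cert.ordinary > itemTimestamp

def shouldPropagate(cert, now, tau1):
    # Anti-entropy propagates the certificate only while its
    # ACTIVATION timestamp is within tau1 of the current time.
    return now - cert.activation < tau1

def reactivate(cert, now):
    # On meeting an obsolete copy of the item: refresh only the
    # activation timestamp; the ordinary timestamp is unchanged.
    cert.activation = now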
{\noindent\bf 2.3 Rumor Mongering with Death Certificate Decay}

If rumor mongering is used for distributing updates, decaying death certificates require no special treatment. Each death certificate is created as an active rumor; eventually it propagates through the network and becomes inactive at all sites. Whenever a death certificate with a timestamp older than $\tau_1$ encounters an obsolete data item at some site, it is reactivated simply by making it an active rumor again at that site. The normal rumor mongering mechanism then distributes the death certificate throughout the network, without regard for the fact that its timestamp may be quite old.

{\noindent\bf 2.4 Scaling Properties}

Notice that the performance of the death certificate decay algorithm, measured by the retained history in death certificates, improves as the number of sites increases. More precisely, we can choose a fixed probability $p$ and compute the time $t$ at which $P_n(t)$ falls below $p$, as a function of $n$. The result is
$$t = \tau_2 \ln n + O(1).$$
(To see this, write $P_n(t) = 1 - (1 - P_1(t))^n \approx 1 - e^{-nP_1(t)}$; then $P_n(t) = p$ requires $nP_1(t) = \ln{1 \over 1-p}$, a constant, and solving $ne^{-(t - \tau_1)/\tau_2} = \ln{1 \over 1-p}$ for $t$ yields $t = \tau_1 + \tau_2 \ln n - \tau_2 \ln\ln{1 \over 1-p} = \tau_2 \ln n + O(1)$.) That is, for a database with $n$ sites we can expect to retain a factor of $O(\log n)$ more history by using death certificate decay.

As described above, the time required for a new update to propagate through the network, using either anti-entropy or rumor mongering, is also expected to be $O(\log n)$, so at first glance this algorithm would appear to scale indefinitely. Unfortunately, this is not the case. The problem is that the expected time to propagate a new update grows with $n$, and will eventually exceed the threshold value $\tau_1$, which does not. While the propagation time is less than $\tau_1$, it is seldom necessary to reactivate old death certificates; once the propagation time grows beyond $\tau_1$, reactivation of old death certificates becomes more and more frequent. This places additional load on the network for propagating the old death certificates, thereby further degrading the propagation time. The ultimate result is catastrophic failure. To avoid such a failure, system parameters must be chosen to keep the expected propagation time safely below $\tau_1$.
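A rough feasibility check is easy to make. The sketch below is ours and purely illustrative: it assumes an update reaches essentially all sites in about $\log_2 n$ anti-entropy cycles (consistent with the $O(\log n)$ propagation time above) and that each cycle takes a fixed period; both the cycle length and the safety margin are stand-in values, not measurements.

import math

def expectedPropagationDays(nSites, cycleMinutes=30.0):
    # Rough model: about log2(n) anti-entropy cycles to reach all
    # sites, each cycle of fixed length (assumed, not measured).
    return math.log2(nSites) * cycleMinutes / (60.0 * 24.0)

def decayParamsSafe(nSites, tau1Days, cycleMinutes=30.0, margin=4.0):
    # The constraint from this section: keep the expected
    # propagation time below tau1, with a comfortable margin.
    return margin * expectedPropagationDays(nSites, cycleMinutes) < tau1Days

# With tau1 = 10 days and 30-minute cycles, even a million sites
# needs only about 20 cycles (roughly 10 hours), far below tau1.
print(decayParamsSafe(10**6, tau1Days=10.0))   # True

\par\vfill\end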