\nopagenumbers\null\vskip .8in

\centerline{\titlefont
Epidemic Algorithms for Replicated Database Maintenance}

{\it
\vskip .7in
\centerline{Alan Demers, Mark Gealy, Dan Greene, Carl Hauser, Wes Irish, John Larson,}

\centerline{Sue Manning, Scott Shenker, Howard Sturgis, Dan Swinehart, Doug Terry, and Don Woods}
}

\vskip 0.5in


\centerline{Xerox Palo Alto Research Center}


\vskip .9in

{\noindent\bf Abstract}
When a database is replicated at many sites, maintaining mutual consistency among the sites in the face of updates is a significant problem. This paper describes several randomized algorithms for distributing updates and driving the replicas toward consistency. The algorithms are very simple and require few guarantees from the underlying communication system, yet they ensure that the effect of every update is eventually reflected in all replicas. The cost and performance of the algorithms are tuned by choosing appropriate distributions in the randomization step. The algorithms are closely analogous to epidemics, and the epidemiology literature aids in understanding their behavior. One of the algorithms has been implemented in the Clearinghouse servers of the Xerox Corporate Internet, solving long-standing problems of high traffic and database inconsistency.

\noindent {\bf CR Categories and Subject Descriptors:} C.2.4 [Computer-Communication Networks]: Distributed Systems---{\sl distributed databases}.

\noindent{\bf General Terms:} Algorithms, Experimentation, Performance, Theory.

\noindent {\bf Additional Keywords and Phrases:} epidemiology, rumors, consistency, name service, electronic mail.

\noindent
An earlier version of this paper appeared in the Proceedings of the Sixth Annual ACM Symposium on Principles of Distributed Computing, Vancouver, August 1987, pages 1--12.

\noindent
February 7, 1989
\vfill\eject
\pageno=1
\headline{\ifodd\pageno\rightheadline\else\leftheadline\fi}
\def\rightheadline{\hfil \subtitlefont Epidemic Algorithms for Replicated Database Maintenance\hfil\folio}
\def\leftheadline{\subtitlefont\folio\hfil Epidemic Algorithms for Replicated Database Maintenance\hfil}
\def\footline{\subtitlefont\hfil Xerox PARC, CSL-89-1, January 1989\hfil}
\boldhead{0. Introduction}
Considering a database replicated at many sites in a large, heterogeneous, slightly unreliable and slowly changing network of several hundred or thousand sites, we examine several methods for achieving and maintaining consistency among the sites. Each database update is injected at a single site and must be propagated to all the other sites or supplanted by a later update. The sites can become fully consistent only when all updating activity has stopped and the system has become quiescent. On the other hand, assuming a reasonable update rate, most information at any given site is current. This relaxed form of consistency has been shown to be quite useful in practice [Bi]. Our goal is to design algorithms that are efficient and robust and that scale gracefully as the number of sites increases.

\noindent
Important factors to be considered in examining algorithms for solving this problem include
\item{$\bullet$} the time required for an update to propagate to all sites, and
\item{$\bullet$} the network traffic generated in propagating a single update. Ideally network traffic is proportional to the size of the update times the number of servers, but some algorithms create much more traffic.

In this paper we present analyses, simulation results, and practical experience with several strategies for spreading updates. The methods examined include the following (a schematic sketch of all three appears after the list):
\item{1.} {\it Direct mail:} each new update is immediately mailed from its entry site to all other sites. This is timely and reasonably efficient but not entirely reliable since individual sites do not always know about all other sites and since mail is sometimes lost.
\item{2.} {\it Anti-entropy:} every site regularly chooses another site at random and by exchanging database contents with it resolves any differences between the two. Anti-entropy is extremely reliable but requires examining the contents of the database and so cannot be used too frequently. Analysis and simulation show that anti-entropy, while reliable, propagates updates much more slowly than direct mail.
\item{3.} {\it Rumor mongering:} sites are initially ``ignorant''; when a site receives a new update it becomes a ``hot rumor''; while a site holds a hot rumor, it periodically chooses another site at random and ensures that the other site has seen the update; when a site has tried to share a hot rumor with too many sites that have already seen it, the site stops treating the rumor as hot and retains the update without propagating it further. Rumor cycles can be more frequent than anti-entropy cycles because they require fewer resources at each site, but there is some chance that an update will not reach all sites.
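\medskip\noindent
The following Python sketch is not taken from any implementation discussed in this paper; it is a minimal illustration, under assumed data structures, of how the three strategies differ. The timestamped key-value entries, the simulated mail-loss probability, and the threshold {\tt k} that determines when a rumor stops being hot are illustrative assumptions, and the anti-entropy exchange is shown in its push-pull form, with entries flowing in both directions.
\medskip
\begingroup \parindent=0pt \tt \obeylines
\obeyspaces\def {\ }\catcode`\_=12\catcode`\#=12\catcode`\{=12\catcode`\}=12\catcode`\%=12
import random

class Site:
    def __init__(self, name, network):
        self.name = name
        self.network = network  # list of all Site objects, including this one
        self.db = {}            # key -> (value, timestamp); newest wins
        self.hot = {}           # hot rumors: key -> futile-push count

    def originate(self, key, value, time):
        # An update is injected at exactly one site.
        self.db[key] = (value, time)
        self.direct_mail(key)   # strategy 1
        self.hot[key] = 0       # remembered as a hot rumor for strategy 3

    def receive(self, key, entry):
        if key not in self.db or self.db[key][1] < entry[1]:
            self.db[key] = entry

    def direct_mail(self, key):
        # Strategy 1: mail the update to every other known site at once.
        # Unreliable: the site list may be incomplete and mail may be lost.
        for other in self.network:
            if other is not self and random.random() > 0.05:  # simulated loss
                other.receive(key, self.db[key])

    def anti_entropy(self):
        # Strategy 2: pick one partner at random and resolve all
        # differences, leaving both sites with the newer entry per key.
        partner = random.choice([s for s in self.network if s is not self])
        for key in set(self.db) | set(partner.db):
            mine, theirs = self.db.get(key), partner.db.get(key)
            if mine is None or (theirs is not None and theirs[1] > mine[1]):
                self.db[key] = theirs
            elif theirs is None or theirs[1] < mine[1]:
                partner.db[key] = mine

    def rumor_monger(self, k=2):
        # Strategy 3: push every hot rumor to one random partner; once a
        # rumor has been pushed to k partners that already knew it, stop
        # treating it as hot (the update is kept but no longer spread).
        partner = random.choice([s for s in self.network if s is not self])
        for key in list(self.hot):
            old = partner.db.get(key)
            already_knew = old is not None and old[1] >= self.db[key][1]
            partner.receive(key, self.db[key])
            if not already_knew:
                partner.hot.setdefault(key, 0)  # the partner spreads it too
            else:
                self.hot[key] += 1
                if self.hot[key] >= k:
                    del self.hot[key]           # removed: no longer hot here
\endgroup
\medskip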
Anti-entropy and rumor mongering are both examples of epidemic processes, and results from the theory of epidemics [Ba] are applicable. Our understanding of these mechanisms benefits greatly from the existing mathematical theory of epidemiology, although our goals differ (we would be pleased by the rapid and complete spread of an update). Moreover, we have the freedom to design the epidemic mechanism, rather than having to model an existing disease. We adopt the terminology of the epidemiology literature and call a site that holds an update it is willing to share {\sl infective} with respect to that update. A site is {\sl susceptible} if it has not yet received the update, and a site is {\sl removed} if it has received the update but is no longer willing to share it. Anti-entropy is an example of a {\sl simple} epidemic: one in which sites are always either susceptible or infective.
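\medskip\noindent
To make the terminology concrete, the short Monte Carlo sketch below (again illustrative Python, not part of the paper's analysis) estimates how many rounds a simple epidemic needs before every one of {\tt n} sites is infective, when in each round every site pulls from one partner chosen uniformly at random; the pull variant and the parameters {\tt n} and {\tt trials} are assumptions made only for this example, and the estimated count grows roughly logarithmically in {\tt n}.
\medskip
\begingroup \parindent=0pt \tt \obeylines
\obeyspaces\def {\ }\catcode`\_=12\catcode`\#=12\catcode`\{=12\catcode`\}=12\catcode`\%=12
import random

def simple_epidemic_rounds(n, trials=200):
    # Estimate the number of rounds a simple epidemic (every site is
    # either susceptible or infective, as in anti-entropy) takes to
    # carry a single update from one site to all n sites.
    total = 0
    for _ in range(trials):
        infective = [False] * n
        infective[0] = True           # the update is injected at one site
        rounds = 0
        while not all(infective):
            rounds += 1
            before = list(infective)  # everyone acts on the same snapshot
            for i in range(n):
                partner = random.randrange(n)
                # pull: a susceptible site that contacts an infective
                # partner becomes infective itself
                if before[partner]:
                    infective[i] = True
        total += rounds
    return total / trials

print(simple_epidemic_rounds(1000))   # roughly proportional to log n
\endgroup
\medskip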
Choosing partners uniformly results in fairly high network traffic, leading us to consider spatial distributions in which the choice tends to favor nearby servers. Analyses and simulations on the actual topology of the Xerox Corporate Internet reveal distributions for both anti-entropy and rumor mongering that converge nearly as rapidly as the uniform distribution while reducing the average and maximum traffic per link. The resulting anti-entropy algorithm has been installed on the Xerox Corporate Internet and has resulted in a significant performance improvement.
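\medskip\noindent
The following fragment sketches one plausible family of such nonuniform choices, in which a partner at distance {\tt d} is selected with probability proportional to {\tt 1/d**a}; the distance function, the exponent {\tt a}, and the use of Python's {\tt random.choices} are assumptions made for this illustration, and setting {\tt a} to zero recovers the uniform choice. The particular distributions evaluated on the Xerox Corporate Internet are developed later in the paper.
\medskip
\begingroup \parindent=0pt \tt \obeylines
\obeyspaces\def {\ }\catcode`\_=12\catcode`\#=12\catcode`\{=12\catcode`\}=12\catcode`\%=12
import random

def choose_partner(me, sites, distance, a=2.0):
    # Spatially biased partner selection: a site at distance d from 'me'
    # is chosen with probability proportional to 1 / d**a, so nearby
    # sites are strongly favored while distant ones are still reached.
    others = [s for s in sites if s is not me]
    weights = [1.0 / distance(me, s) ** a for s in others]
    return random.choices(others, weights=weights, k=1)[0]
\endgroup
\medskip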
We should point out that extensive replication of a database is expensive. It should be avoided whenever possible by hierarchical decomposition of the database or by caching. Even so, the results of our paper are interesting because they indicate that significant replication can be achieved, with simple algorithms, at each level of a hierarchy or in the backbone of a caching scheme.