Grapevine:an Exercise in Distributed Computingby Andrew D. Birrell, Roy Levin, Roger M. Needham* and Michael D. SchroederCSL-82-4June 1982[P82-00035] c Copyright Association for Computing Machinery 1982. Printed with permission.Abstract: Grapevine is a multicomputer system on the Xerox research internet. It providesfacilities for the delivery of digital messages such as computer mail; for naming people,machines, and services; for authenticating people and machines; and for locating serviceson the internet. This paper has two goals: to describe the system itself and to serve as acase study of a real application of distributed computing. Part I describes the set of servicesprovided by Grapevine and how its data and function are divided among computers on theinternet. Part II presents in more detail selected aspects of Grapevine that illustrate novelfacilities or implementation techniques, or that provide insight into the structure of adistributed system. Part III summarizes the current state of the system and the lessonslearned from it so far.This paper appeared in Communications of the ACM, Vol. 25, No. 4, April 1982.* Roger M. Needham's regular address is University of Cambridge Computer Laboratory, Corn Exchange Street,Cambridge, CB2 3QG, United KingdomXEROXXerox CorporationPalo Alto Research Centers3333 Coyote Hill RoadPalo Alto, California 94304Ip E$@qX1ALr@q<siF]t s7`utN.q v B,Z/**&5$'K% V#!5!TOX:w"vjtV "xtvXtmttd` 7=KGRAPEVINEiiCR Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]:Distributed Systemsdistributed applications, distributed databases; C.4 [Performance ofSystems]reliability, availability, and serviceability; D.4.7 [Operating Systems]:Organization and Designdistributed systems; H.2.4 [Database Management]:Systemsdistributed systems; H.2.7 [Database Management]: Database Administration;H.4.3 [Information Systems Applications]: Communications Applicationselectronic mailGeneral Terms: Design, Experimentation, Reliability"Kyzyrq&vqv=w% vq  vw-vqv wvq v wv qvkqvw dqX v%( D?Q#eYGRAPEVINE1Part I: Description of Grapevine1. IntroductionGrapevine is a system that provides message delivery, resource location, authentication, and accesscontrol services in a computer internet. The implementation of Grapevine is distributed andreplicated. By distributed we mean that some of the services provided by Grapevine involve the useof multiple computers communicating through an internet; by replicated we mean that some of theservices are provided equally well by any of several distinct computers. The primary use ofGrapevine is delivering computer mail, but Grapevine is used in many other ways as well. TheGrapevine project was motivated by our desire to do research into the structure of distributedsystems and to provide our community with better computer mail service.Plans for the system were presented in an earlier paper [5]. This paper describes the completedsystem. The mechanisms discussed below are in service supporting more than 1500 users.Designing and building Grapevine took about three years by a team that averaged two to threepersons.1.1 Environment for GrapevineFigure 1 illustrates the kind of computing environment in which Grapevine was constructed andoperates. A large internet of this style exists within the Xerox Corporation research anddevelopment community. This internet extends from coast-to-coast in the U.S.A., to Canada, and toEngland. It contains over 1500 computers on more than 50 local networks.<==>U>UU>>U%U&U&>%>%U%>U&&%U>%>U5U6U6>5>5U5>U665U>5>U5U6U6>5>5U5>U665U>5>U+U,U,>+>+U+>U,,+U>+>U%U&U&>%>%U%>U&&%U>%>UUU>>U>UU>>UUU>>U>UU>>UUU>>U>UU>>U   $ &&% $ & $%&& UU>>U>UU>>UUU>>U>UU>>U Workstation Workstation Workstation WorkstationServere Workstation WorkstationGatewayGatewaytelephone lineEthernetEthernetnServere<==<<EthernetZpq<8pU{ri!NtI[p UGLEs pEC=s pB0%7@f]>R <G9#=7O631+4i/Os,GpU*} Z(P&I#=dX) \>[ #8 ' ?@ @@  @@ @@EEv!Ev!%EEv! @@AA AAAA!!%Ev!5EEv!/ @@;AA/ @@/AA5!!5Ev! ?@ AA AAAA!!!!5EEv!5Ev!!!+EEv!+Ev! % @@% @@% AA1 AA+!!%EEv!%Ev!1  !!/ @?/ @@;AA/AA5!!EEv!Ev! @? @@+AAAA%!!EEv!Ev! ?? ?@ AAAA!!EEv!Ev! @? @@AAAA!! @@ @@+AAAA%!!@@@!@;@;@  !AA AA)AA,AA/AAAA AAEEv!Ev!EEv!Ev!|% <  9p d p d p2 d p2 d p" d p d p d p dp( dp D p d p`  p" d p dp4 d ( <=$GRAPEVINE2Most computing is done in personal workstation computers [12]; typically each workstation has amodest amount of local disk storage. These workstations may be used at different times fordifferent tasks, although generally each is used only by a single individual. The internet connectingthese workstations is a collection of Ethernet local networks [6], gateways, and long distance links(typically telephone lines at data rates of 9.6 to 56 Kbps). Also connected to the internet are servercomputers that provide shared services to the community, such as file storage or printing.Protocols already exist for communicating between computers attached to the internet [11]. Theseprotocols provide a uniform means for addressing any computer attached to any local network inorder to send individual packets or establish and use byte streams. The individual packets aretypically small (up to 532 bytes), and are sent unreliably (though with high probability of success)with no acknowledgement. The byte stream protocols provide reliable, acknowledged, transmissionof unlimited amounts of data [1].1.2 Services and ClientsOur primary consideration when designing and implementing Grapevine was its use as the deliverymechanism for a large, dispersed computer mail system. A computer mail system allows a group ofhuman users to exchange messages of digital text. The sender prepares a message using some sortof text editing facility and names a set of recipients. He then presents the message to a deliverymechanism. The delivery mechanism moves the message from the sender to an internal buffer foreach recipient, where it is stored along with other messages for that recipient until he wants toreceive them. We call the buffer for a recipient's messages an inbox. When ready, the recipientcan read and process the messages in his inbox with an appropriate text display program. Therecipient names supplied by the sender may identify distribution lists: named sets of recipients,each of whom is to receive the message. We feel that computer mail is both an importantapplication of distributed computing and a good test bed for ideas about how to structuredistributed systems.Buffered delivery of a digital message from a sender to one or more recipients is a mechanism thatis useful in many contexts: it may be thought of as a general communication protocol, with thedistinctive property that the recipient of the data need not be available at the time the sender wishesto transmit the data. Grapevine separates this message delivery function from message creation andinterpretation, and makes the delivery function available for a wider range of uses. Grapevine doesnot interpret the contents of the messages it transports. Interpretation is up to the various messagemanipulation programs that are software clients of Grapevine. A client program implementing acomputer mail user interface will interpret messages as interpersonal, textual memos. Other clientsmight interpret messages as print files, digital audio, software, capabilities, or data base updates.Grapevine also offers authentication, access control, and resource location services to clients. Forexample, a document preparation system might use Grapevine's resource location service to find asuitable printing server attached to the internet (and then the message delivery service to transfer adocument there for printing) or a file server might use Grapevine's authentication and access controlservices to decide if a read request for a particular file should be honored.Grapevine's clients run on various workstation and server computers attached to the internet.Grapevine itself is implemented as programs running on server computers dedicated to Grapevine.A client accesses the services provided by Grapevine through the mediation of a software packagerunning on the client's computer. The Grapevine computers cooperate to provide services that aredistributed and replicated.[pqpV s p0U$8#SZ51Q"BOasMpZJ;&I'!=G\_EGC">A!\^GRAPEVINE32. Design GoalsWe view distributed implementation of Grapevine both as a design goal and as the implementationtechnique that best meets the other design goals. A primary motivation for the Grapevine projectwas implementing a useful distributed system in order to understand some system structures thatmet a real set of requirements. Once we chose message delivery as the functional domain for theproject, the following specific design goals played a significant role in determining system structure.Grapevine makes its services available to many different clients. Thus, it should make noassumptions about message content. Also, the integrity of these services should not in any waydepend on correctness of the clients. Though the use of an unsatisfactory client program will affectthe service given to its user, it should not affect the service given to others. These two goals helpdetermine the distribution of function between Grapevine and its clients.Two goals relate to Grapevine's reliability properties. First, a user or client implementor should feelconfident that if a message is accepted for delivery then it will either be made available to itsintended recipients or returned with an indication of what went wrong. The delivery mechanismshould meet this goal in the face of user errors (such as invalid names), client errors (such asprotocol violations), server problems (such as disk space congestion or hardware failures), orcommunication difficulties (such as internet link severance or gateway crashes). Second, failure of asingle Grapevine server computer should not mean the unavailability of the Grapevine services toany client.The typical interval from sending a message to its arrival in a recipient's inbox should be a fewminutes at most. The typical interactive delay perceived by a client program when delivering orreceiving a message should be a few seconds at most. Since small additions to delivery times arenot likely to be noticed by users, it is permissible to improve interactive behavior at the expense ofdelivery time.Grapevine should allow decentralized administration. The users of a widespread internet naturallybelong to different organizations. Such activities as admission of users, control of the names bywhich they are known, and their inclusion in distribution lists should not require an unnaturaldegree of cooperation and shared conventions among administrations. An administrator should beable to implement his decisions by interacting directly with Grapevine rather than by sendingrequests to a central agency.Grapevine should work well in a large size range of user communities. Administrators should beable to implement decentralized decisions to adjust storage and computing resources in convenientincrements when the shape, size, or load patterns of the internet change.Grapevine should provide authentication of senders and recipients, message delivery secure fromeavesdropping or content alteration, and control on use and modification of its data bases.M&pq<8pH rBp"=A(O?^%:=B;,;8K6:%5+Q3a,:1I.I,]*2,). U'c%9%#C#`"  V1KfG"D 9)3/4R i2- H 0/S 7I.6)d7$ =N@GRAPEVINE43. Overview3.1 Registration Data BaseGrapevine maintains a registration data base that maps names to information about the users,machines, services, distribution lists, and access control lists that those names signify. This data baseis used in controlling the message delivery service; is accessed directly for the resource location,access control and authentication services; and is used to configure Grapevine itself. Grapevine alsomakes the values in the data base available to clients to apply their own semantics.There are two types of entries in the registration data base: individual and group. We call the nameof an entry in the registration data base an RName.A group entry contains a set of RNames of other data base entries, as well as additional informationthat will be discussed later. Groups are a way of naming collections of RNames. The groups forma naming network with no structural constraints. Groups are used primarily as distribution lists:specifying a group RName as a recipient for a message causes that message to be sent to allRNames in that group and in contained groups. Groups also are used to represent access controllists and collections of like resources.An individual entry contains an authenticator (a password), a list of inbox sites, and a connect site,as well as additional information that will be discussed later. The inbox site list indicates, in orderof preference, the Grapevine computers where the individual's messages may be buffered. The waythese multiple inboxes are used is discussed in Section 4.2. The connect site is an internet addressfor making a connection to the individual. Thus, an individual entry specifies ways ofauthenticating the identity of and communicating with  by message delivery or internetconnection  the named entity. Individuals are used to represent human users and servers, inparticular the servers that implement Grapevine. Usually the connect site is used only forindividuals that represent servers. Specifying an individual RName (either a human or a server) asa recipient of a message causes the message to be forwarded to and buffered in an inbox for thatRName.3.2 FunctionsFollowing is a list of the functions that Grapevine makes available to its clients. Responses to errorconditions are omitted from this description. The first three constitute Grapevine's delivery service.Accept message:[sender, password, recipients, message-body] _ okThe client presents a message body from the sender for delivery to the recipients. Thesender must be the RName of an individual and the password must authenticate thatindividual (see below). The recipients are individual and group RNames. The individualscorrespond directly to message recipients while the groups name distribution lists. AfterGrapevine acknowledges acceptance of the message the client can go about its otherbusiness. Grapevine then expands any groups specified as recipients to produce thecomplete set of individuals that are to receive the message and delivers the message to aninbox for each.WpqpQr LsIpsp &H [FBMDwTBT?7s psp=-sp:M9%<7<[5r@3-21(.s ps ps p- .:+?D)u W'1&%5tp$ tp+&"J[ I Us p [-)sps p%up R #6 Y6#/+4%..;d >X,8GRAPEVINE5Message polling:[individual] _ {empty, non-empty}Message polling is used to determine whether an individual's inboxes contain messages thatcan be retrieved. We chose not to authenticate this function so it would respond faster andload the Grapevine computers less.Retrieve messages:[name, password] _ sequence of messages _ okThe client presents an individual's name and password. If the password authenticates theindividual then Grapevine returns all messages from the corresponding inboxes. When theclient indicates "ok," Grapevine erases these messages from those inboxes.Grapevine's authentication, access control, and resource location services are implemented by theremaining functions. These are called the registration service, because they are all based on theregistration data base.Authenticate:[individual, password] _ {authentic, bogus}The authentication function allows any client to determine the authenticity of an individual.An individual/password combination is authentic if the password matches the one in theindividual's registration data base entry.1Membership:[name, group] _ {in, out}Grapevine returns an indication of whether the name is included in the group. Usually theclient is interpreting the group as an access control list. There are two forms of themembership function. One indicates direct membership in the named group; the otherindicates membership in its closure.Resource location:[group] _ members[individual] _ connect site[individual] _ ordered list of inbox sitesThe first resource location function returns a group's membership set. If the group isinterpreted as a distribution list, this function yields the individual recipients of a messagesent to the distribution list; if the group is interpreted as the name of some service, thisfunction yields the names of the servers that offer the service. For a group representing aservice, combining the first function with the second enables a client to discover the internetaddresses of machines offering the service, as described in Section 5. The third function isused for message delivery and retrieval as described in Section 4.Registration data base update and inquiry:There are various functions for adding and deleting names in the registration data base, andfor inspecting and changing the associated values. 1 This password-based authentication scheme is intrinsically weak. Passwords are transmitted over the internet as clear-text and clients of the authentication service see individuals' passwords. It also does not provide two-way authentication:clients cannot authenticate servers. The Grapevine design includes proper encryption-based authentication and securityfacilities that use Needham and Schroeder's protocols [9] and the Federal Data Encryption Standard [8]. These betterfacilities, however, are not implemented yet.Wpq<8pR?s gp upO7L Ml/-K"Hs gp upupE7"CIAJ>A=)*sp;_8Ws gpup5N&73"41*2Fq.s gp up +6$)J (**&I$#As gpup g!v up g up<W]D(4yI8%Bs* pA 2H q}pqwN.nA4d- : >X,<GRAPEVINE63.3 RegistriesWe use a partitioned naming scheme for RNames. The partitions serve as the basis for dividing theadministrative responsibility, and for distributing the data base among the Grapevine computers.We structure the name space of RNames as a two-level hierarchy. An RName is a character stringof the form F.R where R is a registry name and F is a name within that registry. Registries cancorrespond to organizational, geographic, or other arbitrary partitions that exist within the usercommunity. A two-level hierarchy is appropriate for the size and organizational complexity of ouruser community, but a larger community or one with more organizational diversity would cause usto use a three-level scheme. Using more levels would not be a fundamental change to Grapevine.3.4 Distribution of FunctionAs indicated earlier, Grapevine is implemented by code that runs in dedicated Grapevinecomputers, and by code that runs in clients' computers. The code running in a Grapevine computeris partitioned into two parts, called the registration server and the message server. Although oneregistration server and one message server cohabit each Grapevine computer, they should bethought of as separate entities. (Message servers and registration servers communicate with oneanother purely by internet protocols.) Several Grapevine computers are scattered around theinternet, their placement being dictated by load and topology. Their registration servers worktogether to implement the registration service. Their message servers work together to implementthe delivery service. As we will see in Sections 4 and 5, message and registration services are eachclients of the other.The registration data base is distributed and replicated. Distribution is at the grain of a registry;that is, each registration server contains either entries for all RNames in a registry or no entries forthat registry. Typically no registration server contains all registries. Also, each registry is replicatedin several different registration servers. Each registration server supports, by publicly availableinternet protocols, the registration functions described above for names in the registries that itcontains. Any server that contains the data for a registry can accept a change to that registry. Thatserver takes the responsibility for propagating the change to the other relevant servers.Any message server is willing to accept any message for delivery, thus providing a replicated mailsubmission service. Each message server will accept message polling and retrieval requests forinboxes on that server. An individual may have inboxes on several message servers, thus replicatingthe delivery path for the individual.If an increase in Grapevine's capacity is required to meet expanding load, then another Grapevinecomputer can be added easily without disrupting the operation of existing servers or clients. Ifusage patterns change, then the distribution of function among the Grapevine computers can bechanged for a particular individual, or for an entire registry. As we shall see later this redistributionis facilitated by using the registration data base to describe the configuration of Grapevine itself.The code that runs in clients' machines is called the GrapevineUser package. There are severalversions of the GrapevineUser package: one for each language or operating environment. Theirfunction and characteristics are sufficiently similar, however, that they may be thought of as a singlepackage. This package has two roles: it implements the internet protocols for communicating withparticular Grapevine servers; and it performs the resource location required to choose which serverto contact for a particular function, given the data distribution and server availability situation of themoment. GrapevineUser thus makes the multiple Grapevine servers look like a single service. Aclient using the GrapevineUser package never has to mention the name or internet address of a\pqpX,s U$pNSY7)Q(7O spspsp sp!M,6L/4.Jd&9HBCs@xp.)>I< spsp ;K9N_7\54+3 V2#7.0Y-PJ+*>)a 'P&&O$[R"YPH7-)%!)8V UV[50 6sp $&7 YG(9!B%E.\d] >^dGRAPEVINE7particular Grapevine server. The GrapevineUser package is not trusted by the rest of Grapevine.Although an incorrect package could affect the services provided to any client that uses it, it cannotaffect the use of Grapevine by other clients. The implementation of Grapevine, however, includesengineering decisions based on the known behavior of the GrapevineUser package, on theassumption that most clients will use it or equivalent packages.3.5 Examples of How Grapevine WorksWith Figure 2 we consider examples of how Grapevine works. If a user named P.Q were usingworkstation 1 to send a message to X.Y, then events would proceed as follows. After the user hadprepared the message using a suitable client program, the client program would call the deliveryfunction of the GrapevineUser package on workstation 1. GrapevineUser would contact someregistration server, such as A, and use the Grapevine resource location functions to locate anymessage server, such as B; it would then submit the message to B. For each recipient, B woulduse the resource location facilities, and suitable registration servers (such as A) to determine thatrecipient's best inbox site. For the recipient X.Y, this might be message server C, in which case Bwould forward the message to C. C would buffer this message locally in the inbox for X.Y. If themessage had more recipients, the message server B might consult other registration servers andforward the message to multiple message servers. If some of the recipients were distribution lists, Bwould use the registration servers to obtain the members of the appropriate groups.<==  @@ @@    @@@@@      @@@@  @@@    @@           !!    "! """##!"  "! """##!"*)(***++)*:98:::;;9: user "P.Q"t GrapevineUserClient program GrapevineUser GrapevineUserFile Server "E"MessagesendiMessage Registration Registration authenticate, membershiplocate locatet authenticate retrievet Server "A"o Server "D"oClient program Server "B"o Server "C"olocate forwardFTP authenticater<==<< connectiont user "X.Y"t Workstation 1 Workstation 2 GRAPEVINE\pq<8pWLU@&TFRO(.P@Kks#HcpEsp F sp;DS C4%A9sp*?nsp&spsp=Qsp;0spsps:p spsp4sp8C0sp 6y'>s4pS/!)dX# o>]** d!> ' " #AA AAAAAAAAAAAAAA AA AA%AA%AA%AA%AA %AA%AA%AA%AA%AA%AA%AA %AA#%AA&%AA)%AA,%AA-#AA- AA-AA-AA-AA-AA-AA-AA- AA- AA AA @? AA @@ @@AA @@AA   !! AA !!!!! @@ AA AA AA AA AA AA AA AA AA AA AA AA# AA& AA) AA, AA @@AA @@+AA @@! @@AA @@#AA @?)   !!  !!+AA @@'!!/ /AA/ @@;AA AAAA/! @@7!!!!  + 9  1 d< & p d p d p d p" d p2 "d p2 dp dp  p dp # p!  p` #p  p& d p D p d p ! p"  p" d p  p  p@ d p $p2 Dp $ p *dp2  p" d p@ D p. d p` 'D 3p =`=$GRAPEVINE8When X.Y wishes to use workstation 2 to read his mail, his client program calls the retrievalfunction of the GrapevineUser package in workstation 2. GrapevineUser uses some registrationserver (such as D) that contains the Y registry to locate inbox sites for X.Y, then connects to each ofthese inbox sites to retrieve his messages. Before allowing this retrieval, C uses a registration serverto authenticate X.Y.If X.Y wanted to access a file on the file server E through some file transfer program (FTP) the fileserver might authenticate his identity and check access control lists by communicating with someregistration server (such as A).3.6 Choice of FunctionsThe particular facilities provided by Grapevine were chosen because they are required to supportcomputer mail. The functions were generalized and separated so other applications also could makeuse of them. If they want to, the designers of other systems are invited to use the Grapevinefacilities. Two important benefits occur, however, if Grapevine becomes the only mechanism forauthentication and for grouping individuals by organization, interest, and function. First, ifGrapevine performs all authentications, then users have the same name and password everywhere,thus simplifying many administrative operations. Second, if Grapevine is used everywhere forgrouping, then the same group structure can be used for many different purposes. For example, asingle group can be an access control list for several different file servers and also be a distributionlist for message delivery. The groups in the registration data base can capture the structure of theuser community in one place to be used in many ways.4. Message DeliveryWe now consider the message delivery service in more detail.4.1 AcceptanceTo submit a message for delivery a client must establish an internet connection to a message server;any operational server will do. This resource location step, done by the GrapevineUser package, isdescribed in Section 5. Once such a connection is established, the GrapevineUser package simplytranslates client procedure calls into the corresponding server protocol actions. If that particularmessage server crashes or otherwise becomes inaccessible during the message submission, then theGrapevineUser package locates another message server (if possible) and allows the client to restartthe message submission.The client next presents the RName and password of the sender, a returnTo RName, and a list ofrecipient RNames. The message server authenticates the sender by using the registration service. Ifthe authentication fails, the server refuses to accept the message for delivery. Each recipientRName is then checked to see if it matches an RName in the registration data base. All invalidrecipient names are reported back to the client. In the infrequent case that no registration serverfor a registry is accessible, all RNames in that registry are presumed for the time being to be valid.The server constructs a property list for the message containing the sender name, returnTo name,recipient list, and a postmark. The postmark is a unique identification of the message, and consists[pqpVsp9U(4S6 spsp$spQk\9GRAPEVINE9of the server's clock reading at the time the message was presented for delivery together with theserver's internet address. Next, the client machine presents the message body to the server. Theserver puts the property list and message body in reliable storage, indicates that the message isaccepted for delivery, and closes the connection. The client may cancel delivery anytime prior tosending the final packet of the message body, for example after being informed of invalidrecipients.Only the property list is used to direct delivery. A client might obtain the property values byparsing a text message body and require that the parsed text be syntactically separated as a"header", but this happens before Grapevine is involved in the delivery. The property list stayswith the message body throughout the delivery process and is available to the receiving client.Grapevine guarantees that the recipient names in the property list were used to control the deliveryof the message, and that the sender RName and postmark are accurate. 4.2 Transport and BufferingOnce a message is accepted for delivery, the client may go about its other business. The messageserver, however, has more to do. It first determines the complete list of individuals that shouldreceive the message by recursively enumerating groups in the property list. It obtains from theregistration service each individual's inbox site list. It chooses a destination message server for eachon the basis of the inbox site list ordering and its opinion of the present accessibility of the othermessage servers. The individual names are accumulated in steering lists, one for each messageserver to which the message should be forwarded and one for local recipients. The message serverthen forwards the message and appropriate steering list to each of the other servers, and places themessage in the inboxes for local recipients. Upon receiving a forwarded message from anotherserver, the same algorithm is performed using the individuals in the incoming steering list as therecipients, all of which will have local inboxes unless the registration data base has changed. Themessage server stores the property list and body just once on its local disk and places references tothe disk object in the individual's inboxes. This sharing of messages that appear in more that onelocal inbox saves a considerable amount of storage in the server.2With this delivery algorithm, messages for an individual tend to accumulate at the server that is firston the inbox site list. Duplicate elimination, required because distribution lists can overlap, isachieved while adding the message into the inboxes by being sure never to add a message if thatsame message, as identified by its postmark, was the one previously added to that inbox. Thisduplicate elimination mechanism fails under certain unusual circumstances such as servers crashingor the data base changing during the delivery process, but requires less computation than thealternative of sorting the list of recipient individuals. In some circumstances delivery must be delayed: for example, all of an individual's inbox sites or aregistry's registration servers may be inaccessible. In such cases the message is queued for laterdelivery.In some circumstances delivery will be impossible: for example, a recipient RName may beremoved from the registration data base between validation and delivery, or a valid distribution listmay contain invalid RNames. Occasionally delivery may not occur within a reasonable time: for 2 As another measure to conserve disk storage, messages from an inbox not emptied within 7 days are copied to a fileserver and the references in the inbox are changed to point at these archived copies. Archiving is transparent to clients:archived messages are transferred back through the message server when messages from the inbox are retrieved.[pq<8pVGU4 s pS6DQkQOQM JII7%G8 TEn-2C-7AF\[GRAPEVINE10example, a network link may be down for several days. In such cases the message server mails acopy of the message to an appropriate RName with a text explanation of what the problem was andwho did not get the message. The appropriate RName for this error notification may be thereturnTo name recorded in the message's property list or the owner of the distribution list thatcontained the invalid name, as recorded in a group entry in the registration data base. Even thiserror notification can fail, however, and ultimately such messages end up in a dead letter inbox forconsideration by a human administrator.4.3 RetrievalTo retrieve new messages for an individual, a client invokes the GrapevineUser package todetermine the internet addresses of all inbox sites for the individual, and to poll each site for newmessages by sending it a single inbox check packet containing the individual's RName. For eachpositive response, GrapevineUser connects to the message server and presents the individual's nameand password. If these are authentic, then the message server permits the client to inspect waitingmessages one at a time, obtaining first the property list and then the body. When a client has safelystored the messages, it may send an acknowledgement to the message server. On receipt of thisacknowledgement, the server discards all record of the retrieved messages. Closing the retrievalconnection without acknowledgement causes the message server to retain these messages. For thebenefit of users who want to inspect new messages when away from their personal workstation, themessage server also allows the client to specify that some messages from the inbox be retained andsome be discarded.There is no guarantee that messages will be retrieved in the order they were presented for delivery.Since the inbox is read first-in, first-out and messages tend to accumulate in the first inbox of anindividual's inbox site list, however, this order is highly likely to be preserved. The postmark allowsclients who care to sort their messages into approximate chronological order. The order isapproximate because the postmarks are based on the time as perceived by individual messageservers, not on any universal time. 4.4 Use of Replication in Message DeliveryReplication is used to achieve a highly available message delivery service. Any message server canaccept any message for delivery. Complete replication of this acceptance function is importantbecause the human user of a computer mail client may be severely inconvenienced if he cannotpresent a message for delivery when he wants to. He would have to put the message somewhereand remember to present it later. Fortunately, complete replication of the acceptance function ischeap and simple to provide. Message transport and buffering, however, are not completelyreplicated. Once accepted for delivery, the crash of a single message server can delay delivery of aparticular message until the server is operational again, by temporarily trapping the message in aforwarding queue or an inbox.3 Allowing multiple inboxes for an individual replicates the deliverypath. Unless all servers containing an individual's inbox sites are inaccessible at once, new messagesfor that individual can get through. We could have replicated messages in several of an 3 The servers are programmed so any crash short of a physical disk catastrophe will not lose information. Writing asingle page to the disk is used as the primitive atomic action.WpqpRHQ7(O7<MlGKDI9s p H 'Bs ?p:> -8!\>D"G Xe U  Pq p ;a-Xl.qUd? R>XCGRAPEVINE11individuals's inboxes, but the expense and complexity of doing so does not seem to be justified bythe extra availability it would provide. If the immediate delivery of a message is important then itsfailure to arrive is likely to be noticed outside the system; it can be sent again because a deliverypath for new messages still exists.5. The Registration Data BaseThe registration data base is used by Grapevine to name registration servers, message servers, andindeed, registries themselves. This recursive use of the registration data base to represent itselfresults in an implementation that is quite compact.5.1 Implementing RegistriesOne registry in the data base is of particular importance, the registry named GV (for Grapevine).The GV registry is replicated in every registration server; all names of the form *.gv exist in everyregistration server. The GV registry controls the distribution and replication of the registration database, and allows clients to locate appropriate registration servers for particular RNames.Each registration server is represented as an individual in the GV registry. The connect site for thisindividual is the internet address where clients of this registration server can connect to it. (Theauthenticator and inbox site list in the entry are used also, as we will see later.)The groups of the GV registry are the registries themselves; reg is a registry if and only if thereexists a group reg.gv. The members of this group are the RNames of the registration servers thatcontain the registry. The GV registry is represented this way too. Since the GV registry is in everyregistration server, the membership set for gv.gv includes the RNames of all registration servers. 5.2 Message Server NamesEach message server is represented as an individual in the MS registry (for message servers). Theconnect site in this entry is the internet address where clients of this message server can connect toit. (The authenticator and inbox site list in the entry are used also, as we will see later.) It ismessage server RNames that appear in individuals' inbox site lists. A group in the MS registry, Maildrop.ms, contains as members some subset (usually, but notnecessarily, all) of the message server RNames. This group is used to find a message server thatwill accept a message for delivery.5.3 Resource LocationThe registration data base is used to locate resources. In general, a service is represented as a groupin the data base; servers are individuals. The members of the group are the RNames of the serversoffering the service; the connect sites of the individuals are the internet addresses for the servers.To contact an instance of the service, a client uses the GrapevineUser package to obtain themembership of the group and then to obtain the connect site of each member. The client then maychoose among these addresses, for example on the basis of closeness and availability.\pq;pW5>$UjMSJQ#JrEpMCRB3=s9p#+qp59)999983qpFspsp 6ispqp:4Z1'qp sp/32.T* qsp(sp#).spK'cqp2qp %,sp swp;qp/ )w4y wP<)F qsp s p3EAz# as Yp`+8M O.6*dU ?]"JGRAPEVINE12The GrapevineUser package employs such a resource location strategy to find things in thedistributed registration data base. Assume for a moment that there is a way of getting the internetaddress of some operational registration server, say Cabernet.gv. GrapevineUser can find theinternet addresses of those registration servers that contain the entry for RName f.r by connecting toCabernet.gv and asking it to produce the membership of r.gv. GrapevineUser can pick a particularregistration server to use by asking Cabernet.gv to produce the connect site for each server in r.gvand attempting to make a connection until one responds. If f.r is a valid name, then anyregistration server in r.gv has the entry for it. At this point GrapevineUser can extract any neededinformation from the entry of f.r, for example, the inbox site list.Similarly, GrapevineUser can obtain the internet addresses of message servers that are willing toaccept messages for delivery by using this resource location mechanism to locate the servers in thegroup MailDrop.ms. Any available server on this list will do.In practice, these resource location algorithms are streamlined so that although the generalalgorithms are very flexible, the commonly occurring cases are handled with acceptable efficiency.For example, a client may assume initially that any registration server contains the data base entryfor a particular name; the registration server will return the requested information or a name notfound error if this registration server knows the registry, and otherwise will return a wrong servererror. To obtain a value from the registration data base a client can try any registration server; onlyin the case of a wrong server response does the client need to perform the full resource locationalgorithm.We are left with the problem of determining the internet address of some registration server inorder to get started. Here it is necessary to depend on some more primitive resource locationprotocol. The appropriate mechanism depends on what primitive facilities are available in theinternet. We use two mechanisms. First, on each local network is a primitive name lookup server,which can be contacted by a broadcast protocol. The name lookup server contains an infrequentlyupdated data base that maps character strings to internet addresses. We arrange for the fixedcharacter string GrapevineRServer to be entered in this data base and mapped to the internetaddresses of some subset of the registration servers in the internet. The GrapevineUser package canget a set of addresses of registration servers using the broadcast name lookup protocol, and send adistinctive packet to each of these addresses. Any accessible registration server will respond to suchpackets, and the client may then attempt to connect to whichever server responds. Second, webroadcast a distinctive packet on the directly connected local network. Again, any accessibleregistration server will respond. This second mechanism is used in addition to the first because,when there is a registration server on the local network, the second method gives response fasterand allows a client to find a local registration server when the name lookup server is down.EpqpAC?:L=p5s p;=sp9s p+sp&8%s p %s6Ep)sp4{spJ2sp#/01-P,s p-) $8'@*8%u0sp& #4's!pPs p44K s pD x@ R>D sNpS Xsp+ X #N Y/86&.0';. Wd@ !>F4GRAPEVINE13Part II: Grapevine as a Distributed System6. Updating the Registration Data Base The choice of methods for managing the distributed registration data base was largely determinedby the requirement that Grapevine provide highly available, decentralized administrative functions.Administrative functions are performed by changing the registration data base. Replication of thisdata base makes high availability of administrative functions possible. An inappropriate choice ofthe method for ensuring the consistency of copies of the data, however, might limit this potentialhigh availability. In particular, if we demanded that data base updates be atomic across all servers,then most servers would have to be accessible before any update could be started. For Grapevine,the nature of the services dependent on the registration data allows a looser definition of consistencythat results in higher availability of the update function. Grapevine guarantees only that the copiesof a registration data base entry eventually will have the same new value following an update to oneof them. If all servers containing copies are up and can communicate with one another,convergence will occur within a few minutes at most. While an update is converging, clients maydetect inconsistency by reading the value of an entry from several servers.6.1 RepresentationThe value for each entry in the registration data base is represented mainly as a collection of lists.The membership set of a group is one such list. Each list is represented as two sublists of items,called the active sublist and the deleted sublist. An item consists of a string and a timestamp. Aparticular string can appear only once in a list, either in the active or the deleted sublist. Atimestamp is a unique identifier whose most significant bits are a time and least significant bits aninternet address. The time is that perceived by the server that placed the item in the list; theaddress is that server's. Because a particular server never includes the same time in two differenttimestamps, all timestamps from all servers are totally ordered.4For example, Figure 3 presents the complete entry for a group named "LaurelImp^.pa" from theregistration data base as it appeared in early April 1981. There are three such lists in this entry: themembership set labelled members and two access control lists labelled owners and friends (seeSection 6.5 for the semantics of these). There are five current members followed by thecorresponding five timestamps, and one deleted member followed by the corresponding timestamp.The owners and friends lists each contain one name and no deletions are recorded from either.A registration data base entry also contains a version timestamp. This timestamp, which has thesame form as an item timestamp, functions as an entry's version number. Whenever anything in anentry changes the version timestamp increases in value, usually to the maximum of the othertimestamps in the entry. When interrogating the data base, a client can compare the versiontimestamp on which it based some cached information with that in the data base. If the cachedtimestamp matches then the client is saved the expense of obtaining the data base value again andrecomputing the cached information. The version timestamp appears in the prefix line in Figure 3. 4 The item timestamps in the active sublist are used to imply the preference order for the inbox site list in anindividual's entry; older items are prefered. Thus, deleting then adding a site name moves it to the end of thepreference ordering.Zpq;pXUri+Pt'K[pGILGCE`D1>$Bf%A@8)>$C=;+;<C!9q5"7T 5K0s-pK+/4*% spsp8([;&& Y$F"N!0@!q(pC]Ysp&spsp3%.03!<+R `7)-. K 8& 6Kk#?Slq)EWd$ >\TGRAPEVINE14Prefix: [1-Apr-81 12:46:45, 3#14], type = group, LaurelImp^.paRemark: (stamp=[22-Aug-80 23:42:14, 3#22]) Laurel TeamMembers: Birrell.pa, Brotz.pa, Horning.pa, Levin.pa, Schroeder.paStamp-list: [23-Aug-80 17:27:45, 3#22], [23-Aug-80 17:42:35, 3#22], [23-Aug-80 19:04:54, 3#22], [23-Aug-80 19:31:01, 3#22], [23-Aug-80 20:50:23, 3#22]DelMembers: Butterfield.paStamp-list: [25-Mar-81 14:15:12, 3#14]Owners: Brotz.paStamp-list: [22-Aug-80 23:43:09, 3#14]DelOwners: noneStamp-list: nullFriends: LaurelImp^.paStamp-list: [1-Apr-81 12:46:45, 3#14]DelFriends: noneStamp-list: nullFigure 3: A Group from the Registration Data Base6.2 Primitive OperationsGrapevine uses two primitive operations on the lists in a registration data base entry. An updateoperation can add or delete a list item. To add/delete the string s to/from a list, any item with thematching string in either of the sublists first is removed. Then a timestamp t is produced from theserver's internet address and clock. Finally the item (s,t) is added to the active/deleted sublist. Amerge operation combines two versions of a complete list to produce a new list with the most recentinformation from both. Each string that appears in either version will appear precisely once in theresult. Each string will be in the active or deleted sublist of the result according to the largesttimestamp value associated with that string in either version. That largest timestamp value alsoprovides the timestamp for the string in the result. Keeping the sublists sorted by string valuegreatly increases the speed with which the merge can be performed. The update and mergeoperations are atomic in each particular server.6.3 PropagationThe administrative interface to Grapevine is provided by client software running in anadministrator's computer. To make a change to the data of any registry, a client machine uses theresource location facilities of the GrapevineUser package to find and connect to some registrationserver that knows about that registry. That registration server performs an update operation on thelocal copy of an entry. Once this update has been completed the client can go about its otherbusiness. The server propagates the change to the replicas of the entry in other servers. The meansused to propagate the change is Grapevine's delivery service itself, since it gives a guarantee ofdelivery and provides buffering when other servers are temporarily inaccessible. As described inSection 5.1, the members of the group that represent a registry are the registration servers thatcontain a copy of the data for that registry. Thus, if the change is to an entry in the reg registry,the accepting server sends a change message to the members, other than itself, of the distribution listreg.gv. A change message contains the name of the affected entry and the entire new value for theentry. Registration servers poll their inboxes for new messages every 30 seconds. When a changemessage is received by a server it uses merge operations to combine the entry from the changemessage with its own copy.\+pqpW{tq8U#tq/Rtq9QH_O7N?L&JdtqH&G\ECtq A%@x>';N;pX26s3p4(s1p"!spsp/Nsp.$6sp*,Ysp2+*:*(E&L%/J#dX!0sxp8S9QM YNAZ M *7 #C YHsp s p&spXF. Sd  =]E8GRAPEVINE15With this propagation algorithm, the same final state eventually prevails everywhere. When a clientmakes multiple updates to an entry at the same server, a compatible sequence of entry values willoccur everywhere, even if the resulting change messages are processed in different orders bydifferent servers. If two administrators perform conflicting updates to the data base such as addingand removing the same member of a group, initiating the updates at different servers at nearly thesame time, it is hard to predict which one of them will prevail; this appears to be acceptable, sincethe administrators presumably are not communicating with each other outside the system. Also,since copies will be out of step until the change messages are received and acted upon, clients mustbe prepared to cope with transient inconsistencies. The algorithms used by clients have to beconvergent in the sense that an acceptable result will eventually ensue even if different andinconsistent versions of the registration data appear at various stages in a computation. Themessage delivery algorithms have this property. Similar update propagation techniques have beenproposed by others who have encountered situations that do not demand instantaneous consistency[10, 13].If deleted items were never removed from an entry, continued updates would cause the data base togrow. Deleted items are kept in an entry so that out-of-order arrival of change messages involvingaddition followed by deletion of the same string will not cause the wrong final state. Deleted itemsalso provide a record of recent events for use by human administrators. We declare an upperbound of 14 days on the clock asynchrony among the registration servers, on message deliverydelay, and on administrative hindsight. The Grapevine servers each scan their local data base oncea day during inactive periods and purge all deleted items older than the bound.If a change message gets destroyed because of a software bug or equipment failure, there is adanger that a permanent inconsistency will result. Since a few destroyed messages over the life ofthe system are inevitable, we must provide some way to resynchronize the data base. At one pointwe dealt with this problem by detecting during the merge operation whether the local copy of theentry contained information that was missing from the incoming copy. Missing information causedthe server to send the result of the merge in a change message to all servers for the registry. Whilethis "anti-entropy" mechanism tended to push the data base back into a consistent state, the effectwas too haphazard to be useful; errors were not corrected until the next change to an entry. Ourpresent plan for handling long-term inconsistencies is for each registration server periodically, sayonce a night, to compare its copy of the data base for a registry with another and to use merges toresolve any inconsistencies that are discovered. The version timestamp in each entry makes thiscomparison efficient: if two version timestamps are equal then the entries match. Care must betaken that the comparisons span all registration servers for a registry, or else disconnected regions ofinconsistency can survive.6.4 Creating and Deleting NamesThe rule that the latest timestamp wins does not deal adequately with the creation of new names. Iftwo administrators connect to two different registration servers at about the same time and try tocreate a new data base entry with the same name, it is likely that both will succeed. When this database change propagates, the entry with the latest time timestamp will prevail. The losingadministrator may be very surprised, if he ever finds out. Because the later creation could betrapped in a crashed registration server for some time, an administrator could never be sure that hiscreation had won. For name creation we want the earlier creation to prevail. To achieve this effect,we faced the possibility of having to implement one of the known and substantial algorithms foratomic updates to replicated databases [3], which seemed excessive, or of working out a way tomake all names unique by appending a hidden timestamp, which seemed complex. We instead fell\pq;pX /5V>"?TtFRSPW O XMI%9KLIHGs pLFYDTEBI@=(9;Y :"H8W N62*4C2O/?.$I,ZH*J(D&U%/X #eZ!UNU :CpHspa@" [ $*0 Y*5!Dsp.D/Q dE =]kGRAPEVINE16back on observations about the way in which systems of this nature are used. For each registrythere is usually some human-level centralization of name creation, if only to deal with questions ofsuitability of RNames (not having a junior clerk preempt the RName which everyone wouldassociate with the company president). We consider this centralization enough to solve the problem.Note that there is no requirement that a particular server be used for name creation: there is nocentralization at the machine level.Deleting names is straightforward. A deleted entry is marked as such and retained in the data basewith a version timestamp. Further updates to a deleted entry are not allowed. Recreation of adeleted entry is not allowed. Sufficiently old deleted entries are removed from the data base by thepurging process described in Section 6.3.6.5 Access ControlsAn important aspect of system administration is control of who can make which administrativechanges. To address this need we associate two access control lists with each group: the owners listand the friends list. These lists appear in the example entry in Figure 3. The interpretation of theseaccess lists is the responsibility of the registration server. For ordinary groups the conventions areas follows: membership in the owners list confers permission to add or remove any group member,owner, or friend; membership in the friends list confers permission to add or remove oneself. Thenames in the owners and friends lists may themselves be the names of groups. Quite separately,clients of the registration server have freedom to use membership in groups for access controlpurposes about which the registration server itself knows nothing at all. The owners and friendslists on the groups that represent registries are used to control name creation and deletion withinregistries; these lists also provide the default access controls on groups whose owners list is empty.While we have spent some time adjusting the specific semantics of the Grapevine access controls,we do not present further details here. 6.6 Other Consequences of ChangesThe registration servers and message servers are normal clients of one another's services, with nospecial relationship. Registration servers use message server delivery functions and message serversuse the registration service to authenticate clients, locate inboxes, etc. This view, however, is notquite complete. If a change is made to the inbox locations of any individual, notice has to be givento all message servers that are removed, so they can redeliver any messages for that individualbuffered in local inboxes. Notice is given by the registration server delivering a message to themessage servers in question informing them of the change. Correctness requires that the lastregistration server that changes its copy of the entry emit the message; we achieve this effect byhaving each registration server emit such a message as the change is made. A message serverreceiving an inbox removal message simply redelivers all messages in the affected inbox. Redeliveryis sufficient to rebuffer the messages in the proper server. In the system as implemented asimplification is made; inbox removal messages are sent to all inbox sites for the affected individual,not just to removed sites. While this may appear to be wasteful, it is most unusual for any siteother than the primary one to have anything to redeliver. Other registration service clients that use the registration data base to control resource bindings mayalso desire notification of changes to certain entries. A general notification facility would requireallowing a notification list to be associated with any data base entry. Any change to an entry wouldZ>pqpUkQ SdQBP *:N@3sp'Lv$In$>G"=E+:D)>s;p:":" Os 8Wps pN6?(4B2^1-M/b6(-J+F*(>(8>"&m(!Ts!Lp?#]JU!OW+7T,6A,H b< PG;I/KdH >[XKGRAPEVINE17result in a message being sent to the RNames on its notification list. We have not provided thisgeneral facility in the present implementation, but would do so if the system were reimplemented.7. Finding an Inbox SiteThe structure and distribution of the Grapevine registration data base are quite complex, with manyindirections. Algorithms for performing actions based on this data base should execute reliably inthe face of administrative changes to the registration data base (including those which causedynamic reconfiguration of the system) and multiple servers that can crash independently. In theirfull generality such algorithms are expensive to execute. To counter this, we have adopted atechnique of using caches and hints to optimize these algorithms. By cache we mean a record ofthe parameters and results of previous calculations. A cache is useful if accessing it is much fasterthan repeating the calculation and frequently produces the required value. By hint we mean a valuethat is highly likely to be correct and that is faster to check than to recalculate. To illustrate howcaches and hints can work, we describe here in some detail how the message server caches hintsabout individuals' inbox sites.The key step in the delivery process is mapping the name of an individual receiving a message tothe preferred inbox site. The mapping depends upon the current state of the registration data baseand the availability of particular message servers. To make this mapping process as efficient aspossible, each message server maintains an inbox site cache that maps RNames of individuals to ahint for the currently preferred inbox site. Each message server also maintains a down server listcontaining the names of message servers that it believes to be inaccessible at present. A messageserver is placed on this list when it does not accept connections or fails during a connection. Therules for using the inbox site cache to determine the preferred message server for a recipient I are:1.If an entry for I is in the cache and the site indicated for I in the cache is not on the downserver list, then use that site;2.Otherwise get the inbox site list for I from the registration service; cache and return for usethe first site not on the down server list; if the selected site is not first on the list, mark theentry as "secondary."There has to be a rule for removing message servers from the down server list; this happens whenthe server shows signs of life by responding to a periodic single packet poll.When a message server is removed from the down server list, the inbox site cache must be broughtup to date. Any entry that is marked as "secondary" and that is not the revived site could be thereas a substitute for the revived site; all such entries are removed from the cache. This heuristicremoves from the cache a superset of the entries whose preferred inbox site has changed (but notall entries in the cache) and will cause recalculation of the preferred inbox site for those entries thenext time they are needed.We noted earlier that changing an individual's inbox site list may require a message server toredeliver all messages in that individual's inbox, and that this redelivery is triggered by messagesfrom registration servers to the affected message servers. The same changes also can cause sitecaches to become out-of-date. Part of this problem is solved by having the inbox redeliverymessages also trigger appropriate site cache flushing in the servers that had an affected inbox.Unfortunately any message server potentially has a site cache entry made out-of-date by the change.\pq;pX VV>aO7rJp VHSFFCDEBIA)Fsp?^W=Osp;D#9+3845, T3ac17*/+sp#.=s,6p sp;*l_(Vsps{%psp,sp#{ &sp4211)S ^NVX%?6,6*,K a YVKG\/IdC6 >]MGRAPEVINE18Instead of sending a message to all message servers, we correct the remaining obsolete caches byproviding feedback from one message server to another when incorrect forwarding occurs as a resultof an out-of-date cache. Thus, the site cache really does contain hints.To summarize the cache flushing and redelivery arrangements, then, registration servers removeservers from an inbox site list and send messages to all servers originally on the list. Each respondsby removing any entry for the subject individual from its site cache and redelivering any messagesfound in that individual's inbox. During this redelivery process, the cache entry will naturally berefreshed. Other message servers with out-of-date caches may continue to forward messages herefor the subject individual. Upon receiving any message forwarded from another server, then, thetarget message server repeats the inbox site mapping for each name in the steering list. If thepreferred site is indeed this target message server, then the message is added to the correspondinginbox. If not, then the target site does the following:1.Forwards the message according to the new mapping result;2.Sends a cache flush notification for the subject individual back to the server that incorrectlyforwarded the message here.The cache flush notification is a single packet sent unreliably: if it fails to arrive, another one willbe provoked in due course. This strategy results in the minimum of cache flush notifications beingsent  one to each message server whose cache actually needs attention, sent when the need forattention has become obvious. This mechanism is more economical than the alternative of sendingcache flush notifications to all message servers, and even if that were done it would still benecessary to cope with the arrival of messages at old inbox sites.8. System ConfigurationAs described in Section 5, the configuration of the Grapevine system is controlled by its registrationdata base. Various entries in the data base define the servers available to Grapevine and the waysin which the data and functions of Grapevine are distributed among them. We now considerprocedures for reconfiguring Grapevine.8.1 Adding and Deleting Registry ReplicasThe set of registration servers that contain some registry is defined by the membership set for thecorresponding group in the GV registry. When a change occurs to this membership set, the affectedserver(s) need to acquire or discard a copy of the registry data. To discover such changes, eachregistration server simply monitors all change messages for groups in the GV registry, watching foradditions or deletions of its own name. A registration server responds to being deleted bydiscarding the local replica of the registry. With the present implementation, a registration serverignores being added to a registry site list. Responding to a registry addition in the obvious way by connecting to another registration server for the registry and retrieving the registry data  is notsufficient. Synchronization problems arise that can lead to the failure to send change messages tothe added server. Solving these problems may require the use of global locks, but we would prefera solution more compatible with the looser synchronization philosophy of Grapevine. For thepresent obtaining a registry replica is triggered manually, after waiting for the updates to the GVregistry to propagate and after ensuring that other such reconfigurations are not in progress.]"pqpXOsp,V-5TIQ.0O@'NU LRW J_H`FIE'4/C]8{@T9{=LN;8zG"6C4upT3C1OB/B(}r#dp/7!V B's)pDqp,N\7qp> ^ #Fu Yp_up`RE.\qdpI =^<<GRAPEVINE198.2 Creating ServersInstalling a new Grapevine computer requires creating a new registration server and a new messageserver. To create the new registration server named, say, Zinfandel.gv, a system administrator firstcreates that individual (with password) in the registration data base, and gives it a connect site thatis the internet address of the new computer. Next, Zinfandel.gv is added to the membership set ofall registries that are to be recorded in this new registration server. To create the new messageserver named, say, Zinfandel.ms, the administrator creates that individual with the same connect site,then adds Zinfandel.ms to MailDrop.ms. Both servers are assigned inbox sites.Once the data base changes have been made, the registration and message servers are started on thenew computer. The first task for each is to determine its own name and password so that it mayauthenticate itself to the other Grapevine servers. A server obtains its name by noting its owninternet address, which is always available to a machine, then consulting the data base in a differentregistration server to determine which server is specified to be at that address: the registrationserver looks for a name in the group gv.gv, the message server looks for a name in the groupMailDrop.ms. Having found its name, the server asks a human operator to type its password; theoperator being able to do this correctly is the fundamental source of the server's authority. Theserver verifies its password by the authentication protocol, again using a registration server that isalready in operation, and then records its name and password on its own disk. The new registrationserver then consults some other registration server to obtain the contents of the GV registry in orderto determine which groups in the GV registry contain its name: these specify which registries thenew server should contain. It then contacts appropriate other servers to obtain copies of the database for these registries. Because the new server can authenticate itself as an individual in the GVregistry, other registration servers are willing to give it entire data base entries, including individuals'passwords.Obtaining the registry replicas for the new registration server suffers from the same synchronizationproblems as adding a registry replica to an existing server. We solve them the same way, by waitingfor the administrative updates to the GV registry to propagate before starting the new computer andavoiding other simultaneous reconfigurations.8.3 Stopping and Restarting ServersStopping a server is very easy. Grapevine computers can be stopped without disturbing any diskwrite in progress. The message and registration server are programmed so that, when interruptedbetween disk page writes, they can be restarted without losing any permanent information. While amessage or registration server is not running, messages for it accumulate in its inboxes in messageservers elsewhere, to be read after it restarts.Whenever a message and registration server restart, each verifies its name and password byconsulting other servers, and verifies that its internet address corresponds to the connect siterecorded for it in the data base; if necessary it changes the connect site recorded in the data base.Updating the connect site allows a server to be moved to a new machine just by moving thecontents of the disk. After restarting, a registration server acts on all accumulated data base changemessages before declaring itself open for business.Using the internet, it is possible, subject to suitable access controls, to load a new software versioninto a remote running Grapevine computer, stop it, and restart it with the new version.Yjpq;pTsQpGO's pMF!L/4s pJdOH s pGF s ps p(C^A&9@2\>g\ <H:%sp9s pT7=(:5r^3V 1Rqp0!qp*.HR,}#@q*pl( %#B$(<"J&qp / -gs#^p@T B!B40,W a/1 Z 5$*=73/PdW >ZTGRAPEVINE208.4 Other ReconfigurationsOne form of reconfiguration of the system requires great care: changing the location of inbox sitesfor a registration server. Unless special precautions are taken, the registration server may neverencounter the change message telling it about a new inbox site, because that message is waiting forit at the new site. A similar problem arises when we change the internet address of a messageserver that contains a registration server's inbox. Restrictions on where such data base changes canbe initiated appear to be sufficient to solve these problems, but we have not automated them.Although this resolution of this problem is somewhat inelegant, the problem is not common enoughto justify special mechanisms.pqps p[ $Q Y'< QY '6/ Sd R'>TGRAPEVINE21Part III: Conclusions9.Present StateThe Grapevine system was first made available to a limited number of clients during 1980. Atpresent, Fall 1981, it is responsible for most of the mail traffic and distribution lists on the Xeroxresearch internet. There are five dedicated Grapevine computers, each containing a registrationserver and a message server. The computers are physically distributed among northern andsouthern California and New York. The registration data base contains about 1500 individuals and500 groups, divided mainly into four major registries; there are two other registries used by non-mail clients of the registration service, plus the GV and MS registries. The total message trafficamounts to some 2500 messages each working day, with an average of 4 recipients each; themessages average about 500 characters, and are almost exclusively text.The registration data base also is used for authentication and configuration of various file servers,for authentication and access control in connection with maintenance of the basic software and databases that support our internet gateways, and for resource location associated with remote procedurecall binding. The registration data base is administered almost exclusively by non-technical staff.There are at least three separate computer mail interface programs in use for human-readable mail.Most mail system users add and delete themselves from various distribution lists, removing thistiresome job from administrative staff.The Grapevine registration and message servers are programmed in Mesa [7]. They contain some33,000 lines of custom written code, together with standard packages for runtime support and PUP-level communications. The Grapevine computers are Altos [12] with 128K bytes of main memoryand 5M bytes of disk storage. A running Grapevine computer has between 40 and 70 Mesaprocesses [4], and can handle 12 simultaneous connections. The peak load of messages handled bya single message server so far exceeds 150 per hour and 1000 messages per day. One serverhandled 30,000 messages while running for 1000 hours. The maximum number of primary inboxesthat have been assigned to a server is 380.10.DiscussionThe fundamental design decision to use a distributed data base as the basis for Grapevine's messagedelivery services has worked out well. The distributed data base allowed us to meet the designgoals specified in Section 2, and has not generated operational difficulties. The distributed updatealgorithms that trade atomic update for increased availability have had the desired effect. Thetemporary inconsistencies do not bother the users or administrators and the ability to continue database changes while the internet is partitioned by failed long-distance links is exercised enough to beappreciated.In retrospect, our particular implementation of the data base for Grapevine was too inflexible. Asthe use of the system grew, the need for various extensions to the values recorded in individual andgroup entries has become apparent. Reformatting the existing distributed data base to include space[pq;pVriO}{ Jdp PHY FNE!8C9QAoJ?2qpqp'= O<G9'>7<D5r:*3O1I0=".G'+? P)tHqp'F%9$]"JO T+r{ pDJ!D 4K jG G S.(<dN 0 S>\&GRAPEVINE22for the new values is difficult operationally. In a new implementation we would consider providingfacilities for dynamic extension of the value set in each entry. With value set extension, however,we would keep the present update algorithm and its loose consistency guarantees. These guaranteesare sufficient for Grapevine's functional domain, and their simplicity and efficiency are compelling.There is a requirement in a message system for some data base which allows more flexibledescriptions of recipients or distribution lists to be mapped onto message system RNames (such asthe white or yellow page services of the telephone system), but in our view that service falls outsideof Grapevine's domain. A system which provides more flexibility in this direction is described in[2].Providing all naming semantics by indirection through the registration data base has been verypowerful. It has allowed us to separate the concept of naming a recipient from that of addressingthe recipient. For example, the fact that a recipient is named Birrell.pa says nothing about wherehis messages should be sent. This is in contrast to many previous message systems. Indirections alsoprovide us with flexibility in configuring the system.One feature which recurs in descriptions of Grapevine is the concept of a "group" as ageneralization of a distribution list. Our experience with use of the system confirms the utility ofuse of the single "group" mechanism for distribution lists, access control lists, services, andadministrative purposes.Clients other than computer mail interfaces are beginning to use Grapevine's naming,authentication, and resource location facilities. Their experience suggests that these are animportant set of primitives to provide in an internet for constructing other distributed applications.Message transport as a communication protocol for data other than textual messages is a usefuladdition to our set of communication protocols. The firm separation between Grapevine and itsclients was a good decision; it allows us to serve a wide variety of clients and to give usefulguarantees to our clients, even if the clients operate in different languages and in differentcomputing environments. At several points in Grapevine, we have defined and implemented mechanisms of substantialversatility. As a consequence, the algorithms to implement these mechanisms in their full generalityare expensive. The techniques of caches and hints are powerful tools that allow us to regainacceptable efficiency without sacrificing "correct" structure. The technique of adding caches andhints to a general mechanism is preferable to the alternative style of using special case short cutmechanisms whose existence complicates algorithmic invariants.Grapevine was built partly to demonstrate the assertion that a properly designed replicated systemcan provide a very robust service. The chance of all replicas being unavailable at the same timeseems low. Our experience suggests that unavailability due to hardware failure follows this pattern.No more than one Grapevine computer at a time has ever been down because of a hardwareproblem. On the other hand, some software bugs do not exhibit this independence. Generally allservers are running the same software version. If a client's action provokes a bug that causes aparticular server to fail, then in taking advantage of the service replication that client may causemany servers to fail. A client once provoked a protocol bug when attempting to present a messagefor delivery. By systematically trying again at each server in MailDrop.ms, that client soon crashedall the Grapevine computers. Another widespread failure occurred as a result of a malformedregistration data base update propagating to all servers for a particular registry. We conclude that itis hard to design a replicated system that is immune from such coordinated software unreliability. XspqpSQQ9+P QN@JLuOJ^HSGHEKBC Q@x(sps >p$s p<8-;68  ;6FR4{#<2/S/T%-1-, Y*H$:(~O&M$Q #  OKe(5CF!>HN VW F W $/2 Y318)@s p@/ed9)f ~>Y\GRAPEVINE23Our experience with Grapevine has reinforced our belief in the value of producing "real"implementations of systems to test ideas. At several points in the implementation, reality forced usto rethink initial design proposals: for example, the arrangements to ensure long-term consistency ofthe data base in the presence of lost messages. There is no alternative to a substantial usercommunity when investigating how the design performs under heavy load and incrementalexpansion.11.AcknowledgementsMany people have contributed to the success of the Grapevine project. Bob Taylor and BobMetcalfe recognized early the need for work on computer mail systems and encouraged us todevelop Grapevine. Ben Wegbreit participated in the initial system design effort. Many colleagueshave helped the project in various ways: Dave Boggs, Doug Brotz, Jeremy Dion, Jim Horning,Robert Kierr, and Ed Taft deserve special mention. Jerry Saltzer and several anonymous refereeshave made valuable commentaries on earlier drafts of the paper.12.References1.Boggs, D.R., Shoch, J.F., Taft, E.A., and Metcalfe, R.M. PUP: an internetwork architecture.IEEE Trans. on Communications 28, 4 (April 1980), 612-634.2.Dawes, D.R., Harris, S., Magoon, M., Maveety, S., and Petty, D. The design and serviceimpact of COCOSAn electronic office system. In Computer Message Systems. R.P. Uhlig (Ed.)North Holland, New York, 1981, 373-384.3.Gifford, D.K. Weighted voting for replicated data. In Proc. 7th Symposium on OperatingSystems Principles. (Dec. 1979), ACM Order No. 534 790, 150-162.4.Lampson, B.W., and Redell, D.D. Experience with processes and monitors in Mesa. Comm.ACM 23, 2 (Feb. 1980), 105-117.5.Levin, R., and Schroeder, M.D. Transport of electronic messages through a network.TeleInformatics 79, North Holland, 1979, 29-33.6.Metcalfe, R.M., and Boggs, D.R. Ethernet: Distributed packet switching for local computernetworks. Comm. ACM 19, 7 (July 1976), 395-404.7.Mitchell, J.G., Maybury, W., and Sweet, R. Mesa language manual (Version 5.0), TechnicalReport CSL-79-3, Xerox Palo Alto Research Center, 1979.8.National Bureau of Standards, Data encryption standard. Federal Information ProcessingStandards 46, Jan. 1977.Tpq;pO>N<)LRLJNH 4 !F ?r{:p N9F7<S5qH3L1?*r{ %p=#sp ?$ spT'L8sL p/y3sp  5sp G  s p*/67.sd p  7>UGRAPEVINE249.Needham, R.M., and Schroeder, M.D. Using encryption for authentication in large networksof computers. Comm. ACM 21, 12 (Dec. 1978), 993-999.10.Rothnie, J.B., Goodman, N., and Bernstein, P.A. The redundant update methodology of SDD-1: A system for distributed data bases (The fully redundant case.) Computer Corporation ofAmerica, June 1977.11.Shoch, J.F. Internetwork naming, addressing, and routing. In Proc. 17th IEEE ComputerSociety International Conference, September 1978, IEEE Cat. No. 78 CH 1388-8C, 72-79.12.Thacker, C.P., McCreight, E.M., Lampson, B.W., Sproull, R.F., and Boggs, D.R. Alto: Apersonal computer. In D.P. Siewiorek, C.G. Bell, and A. Newell, Computer Structures: Principlesand Examples. (2nd Ed.) McGraw-Hill, New York, 1981.13.Thomas, R.H. A solution to the update problem for a multiple copy data base which useddistributed control. Bolt, Beranek, and Newman Technical Report #3340, July 1976.pqp"7Ds p<Qq@ ?s,  p5NAs 7 p(/.)dR I=x HELVETICA HELVETICA  TIMESROMAN HELVETICA HELVETICAMATH  HELVETICA  HELVETICA LOGO TIMESROMAN  TIMESROMAN TIMESROMAN  TIMESROMAN TIMESROMAN  TIMESROMAN  HELVETICA MATH  TIMESROMAN  TIMESROMAN TIMESROMAN  TIMESROMAN  TIMESROMAN HELVETICA ^" ,4<D N!o x    de (-e.f qedee~e-f:f ded~edee,Tude8d~e/e feM"Tedd~e e/e e eMe0eB'?e\i~ffe eDeD~e/eN~fee/j/[Grapevine.press Schroeder.paO13-Jul-82 10:21:09 PDT: