Adding Voice to an Office Computer Network

by D. C. SWINEHART, L. C. STEWART, and S. M. ORNSTEIN
Computer Science Laboratory
Xerox Palo Alto Research Center

Captions

Figure 1. The physical architecture of the Etherphone system. Etherphones and servers are connected to Ethernet local networks, interconnected by a packet gateway. A 1.5 Mbps Ethernet is used because of the availability of a VLSI controller.

Figure 2. Components of a typical Etherphone office installation including telephone set, microphone and speaker, Etherphone processor, and Ethernet transceiver.

Figure 3. Internal structure of the Etherphone hardware. The slave processor handles voice signal processing while the main processor handles network communications. The two processors normally execute in parallel and communicate through shared memory.

Abstract: This paper describes the architecture and initial implementation of an experimental telephone system developed by the Computer Science Laboratory at the Xerox Palo Alto Research Center (PARC CSL). A specially designed processor (Etherphone™) connects to a telephone instrument and transmits digitized voice, signalling, and supervisory information in discrete packets over an Ethernet local area network. When used by itself, an Etherphone processor provides the functions of a conventional telephone, but it comes into its own when combined with the capabilities of a nearby office workstation, a voice file service, and other shared services such as databases. Most of the work so far has gone into the basic provisions for voice switching and transmission. Today the system supports ordinary telephone calls and simple voice message services. We will expand these functions as we explore the integration of voice with our experimental office systems.

Introduction

Voice, in meetings and in conversation, has always played a major role in the conduct of business in the office.
Since the introduction of the telephone, the role of voice in the office has reached major proportions. There have been many improvements in telephony since commercial service began in the 1880's, but except for direct dialing, few have resulted in functional improvements for the subscriber. The advent of computers and new communications technologies has begun to allow the developers of telephone equipment to provide substantial increases in function and convenience. In addition to improvements in voice communications, these advances include data and video transmission, with recent forays into electronic mail and other non-voice functions.

At PARC, we and our colleagues have approached the problem of office productivity from a different perspective. Our research into personal information systems has provided considerable experience with the design, production, and use of personal computers and programs that augment or supplant traditional paper-based office activities. At the heart of this work is the personal computer, such as the Xerox 8010 Professional Workstation. The personal workstation gives its user a coherent and integrated user interface supporting the creation, modification, storage, and printing of documents, memos, messages, and personal databases. High-speed Ethernet communications link each workstation to those of other users and to other computers that provide a variety of services, such as file storage, electronic mail distribution, and high-quality printing.

Ironically, the typical office in our laboratory includes a multi-function workstation and a stand-alone telephone instrument: two isolated systems. Telephone equipment manufacturers have approached the merger of these technologies by treating office applications as extensions to their basic telephone service (Refs. 27-29).
With the Etherphone project, we have the opportunity to gain additional insights into the potential for voice by instead treating voice communications as an extension to our existing office systems.

This paper discusses our goals and our reasons for undertaking the project. An overview of the system architecture is presented, and the fundamentals of Ethernet voice transmission are described. These introductory sections provide the context for a description of the structure and functions of the major system components, including applications packages and programs that have been developed for existing workstations.

Objectives

The drawbacks of the telephone are as evident as its benefits. So many calls fail to reach the intended party that the term "telephone tag" has been coined to describe the resulting frustration. Conversely, when a call is successful, the called party often wishes it had failed; unwanted or ill-timed telephone calls are a major source of interruption and annoyance in the workplace.

As long as people remain busy and mobile, many of these problems will defy solution by technical means. Moreover, great care is required whenever one considers changes to a social institution. The conventions of telephone use are deeply established. Although people are often annoyed by the present arrangements, they also tend to be quite conservative and to resent changes in the system. Nevertheless, we feel that improvements in several areas will prove useful:

Improved Control

The twelve pushbuttons and hookswitch on a telephone, combined with a set of call progress tones from the central office, provide a reasonable user interface for placing and answering telephone calls.
However, newer telephone systems have made available myriad additional features without substantially improving the user interface. Multiple-line telephones and telephones with additional pushbuttons for common functions improve matters, but most attempts to provide more telephone capabilities become too complicated to control long before their designers run out of ideas. Many features are never used because they are too hard to remember and too hard to use.

Similar problems with the composition and editing of text and graphical documents led to the development of our workstation-based systems (Ref. 23). The contextual power of the high-resolution display screen, "mouse" pointing device, and menu-based software have increased the ability of the casual user to master large numbers of operations and quite complex situations. We believe that the management of interactive voice will submit to the same techniques, provided the workstations have adequate programmed access to the telephone-switching functions.

Informed "Agents"

A private secretary can act as an agent for an executive who cannot afford the time to answer every incoming call. The secretary can take messages, locate elusive conversants, and screen incoming calls. In the modern office, this kind of assistance is rare. Attempts to simulate some of these functions with simple recording machines have been notoriously unsuccessful, since many callers find them annoying or intimidating.
While an unattended system cannot provide the finely tuned discrimination of a private secretary, it can provide at least simple filters of the flavor: "No calls for half an hour, unless it's Bob, or unless Fred calls about the budget."

Conversely, for the individual who needs to be out of the office but who wishes to be available by telephone, we can employ the capabilities of our distributed systems to allow the telephone system to track his location. The same kinds of information in the hands of human attendants can supply informed assistance to those calling from outside the Etherphone system.

Voice as Data

Leaving a recorded message can often be an effective substitute for those calls that go unanswered. Telephone answering machines make this possible, but their imperative nature demands a well-constructed, composed message rather than an informal communication. Few of us are able to produce such a composition in real time. These machines alienate callers, who often hang up without leaving any useful message.

Our approach to this problem is to shift the control over producing a voice message from the callee to the caller. The system can allow the caller to record, audition, and modify a message before sending it, or deciding not to. The message might contain text, recorded voice, or a combination. If experience with electronic text messages is any guide, users will come to prefer recorded messages to telephone calls in some situations. Experience with stand-alone voice message systems that are beginning to appear tends to support the value of caller-controlled voice messages (Refs. 15, 26, 28).

Voice in messages and documents will find application far beyond simple telephony. We intend to explore voice annotation of ordinary office documents, spoken reminders, musical prompts, and other combinations of voice and visual media.

In summary, we feel that our computing and communications environment can be brought to bear on voice in two ways.
One way is to improve our control over otherwise ordinary voice communications; the other is to incorporate the voice medium into our integrated environment on an equal footing with text, graphics, and data. While the facilities we have built or proposed are not entirely new, we believe that the full integration of voice into our computer-based office systems will make existing functions more effective and new applications possible.

System Architecture

PARC Environment

For most office applications, a distributed network of personal workstations, interconnected by high-bandwidth local-area networks or by specialized PABX lines, is supplanting the traditional central computing installation. At PARC, both the standard Ethernet, operating at 10.0 Mbps, and an earlier research Ethernet, operating at 3.0 Mbps, are in general use (Refs. 16, 6). These networks link personal workstations to each other and to shared resources such as file servers, mail servers, and high-quality laser printers (Refs. 2, 17). Communications outside the local area are provided primarily by leased lines, most operating at 9.6 or 56 Kbps, with some operating at up to 1.5 Mbps (Refs. 3, 20). Several types of personal workstations are in use, including the Xerox Alto, Dorado (1132), 1108 and 1100 processors (Refs. 24, 14). These same processors are used as the controllers for most servers. Every workstation has a high-resolution bitmap display (1000 by 800 pixels), a keyboard, and a mouse (Ref. 23).

In our laboratory, most users work within the Cedar programming environment. Cedar is an experimental integrated system that is used for software development and computer-aided design, as well as for day-to-day operation of office applications, such as document preparation, graphics, and electronic mail (Refs.
5, 23).

Our approach to applications research is to build prototype applications based on the Cedar environment, then to use them in our daily work. This approach encourages us to carry the development of our implementations far enough that performance and reliability problems do not impair our ability to evaluate their usefulness. Once they are introduced, these applications supply both ideas and useful components for further developments. Most of our applications are structured so that users can either operate them directly, or write programs that combine their functions with those of other facilities.

Basic Architectural Decisions

The goals of improved telephone functionality and integrated voice add two new requirements to the existing capabilities in Cedar: rapid and versatile control over voice transmission and switching, and a high performance voice storage capability.

We surveyed several possible architectures for the system. Subscriber line access to a voice storage system and use of our existing Centrex service through DTMF signalling was rejected due to the low bandwidth and the inflexible control over switching. Use of a commercial PABX was attractive, but the necessary switching capabilities were not commercially available. The use of our own computers to control the switching operations of a standard PABX was rejected primarily due to the nontechnical difficulties of arranging the necessary cooperative effort.

Although reluctant to undertake the necessary development work, we concluded that the most effective way to satisfy our needs was to construct our own transmission and switching system. We elected to use the Ethernet to transmit voice as well as control information.
We chose Ethernet because of its pervasiveness in our environment and in order to demonstrate the effectiveness of Ethernet for voice.

The potential of the Ethernet for packet voice transmission has been described elsewhere (Ref. 22). In the following section, we present only a brief analysis.

Ethernet Transmission of Voice

Traditional telephony relies upon circuit-switched communications. Using packet-switched communications to provide the same functions raises some interesting issues:

Delay

The first issue one must address is the end-to-end (microphone to receiver) delay for interactive telephone calls, since even relatively short delays can be disturbing. Specifically, end-to-end delays of less than 20 milliseconds are generally undetectable, delays of 40 to 80 milliseconds can cause trouble when significant echo is present, and delays greater than 100 milliseconds begin to disrupt normal conversation (Refs. 1, 10, 4).

In a packet voice system, the delay has two components: packetization delay and transmission delay. Packetization delay arises because the first sample of a packet cannot be transmitted until the last sample has been digitized. The delay introduced is equal to the product of the sampling interval (125 microseconds is typical) and the packet size. Transmission delays can result from software delays and network-access delays.

Ethernet transmission exhibits complex relationships among packet size, offered load, and access delay. Studies based on typical packet sizes (Refs. 16, 9, 13, 21, 25) have shown that when offered loads fall below a specific threshold there are few collisions, and network-access delays are negligible in relation to the other delay components. Above the threshold, delays climb rapidly as the network becomes saturated. Sending only large packets suitable for data transmission pushes this threshold well above 90% of full utilization. Shortening the packets to a size suitable for interactive voice reduces the threshold to the 50 to 80% range.
One must engineer an Ethernet voice system to operate below the knee of this delay curve.

Our voice hardware and protocols limit packetization and transmission delay to under 40 milliseconds. We have found that voice delay due to Ethernet transmission is not a problem within the local area. However, it may become significant when our delays are added to others, notably satellite transmission times.

Capacity

One major advantage of Ethernet voice transmission is its efficient sharing of a high-bandwidth channel among users. Combined with a method that we call silence detection, this sharing permits concentration effects similar to those of Time Assigned Speech Interpolation (TASI) (Ref. 1). The silence-detection method involves transmitting only those packets containing significant signal energy.

To estimate the resulting capacity, consider an implementation whose voice packets contain 160 eight-bit samples (representing 20 milliseconds of voice), collected at 125 microsecond intervals. Using the 10 Mbps standard Ethernet, each packet includes 64 bytes of overhead, so each (one-way) voice connection consumes 90 Kbps. Assuming the worst case, where traffic must be limited to 50% utilization to avoid unacceptable access delay, the bandwidth available in a network dedicated to voice transmission is 5 Mbps, yielding 28 full-duplex "trunks" (55 one-way voice streams). If we estimate a TASI advantage of 1.6 for this size trunk group (Ref. 1), a dedicated 10 Mbps Ethernet can support about 45 conversations. If at most 20% of telephones are in use at once, such a network could support in excess of 225 subscribers.
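
The arithmetic behind this capacity estimate can be retraced in a few lines. The following is only a sketch under the assumptions stated in the text (160-sample packets, 64 bytes of overhead, 50% usable bandwidth, a TASI advantage of 1.6, and 20% peak usage); the function and constant names are illustrative and do not come from the Etherphone software.

```python
# Sketch of the capacity estimate; all names here are hypothetical.
SAMPLE_RATE_HZ = 8000        # one 8-bit sample every 125 microseconds
SAMPLES_PER_PACKET = 160     # 20 milliseconds of voice per packet
OVERHEAD_BYTES = 64          # per-packet overhead on the 10 Mbps Ethernet
ETHERNET_BPS = 10_000_000
MAX_UTILIZATION_PCT = 50     # worst-case usable share of the cable
TASI_ADVANTAGE = 1.6         # gain from silence detection (Ref. 1)
PEAK_USAGE_PCT = 20          # share of telephones in use at once


def estimate_capacity():
    """Return the intermediate and final figures of the estimate."""
    packet_bits = (SAMPLES_PER_PACKET + OVERHEAD_BYTES) * 8
    # Packets are sent at SAMPLE_RATE_HZ / SAMPLES_PER_PACKET per second.
    stream_bps = packet_bits * SAMPLE_RATE_HZ // SAMPLES_PER_PACKET
    usable_bps = ETHERNET_BPS * MAX_UTILIZATION_PCT // 100
    one_way_streams = usable_bps // stream_bps
    # A full-duplex trunk needs one stream in each direction.
    trunks = round(usable_bps / (2 * stream_bps))
    conversations = round(trunks * TASI_ADVANTAGE)
    subscribers = conversations * 100 // PEAK_USAGE_PCT
    return {
        "stream_bps": stream_bps,
        "one_way_streams": one_way_streams,
        "trunks": trunks,
        "conversations": conversations,
        "subscribers": subscribers,
    }


if __name__ == "__main__":
    for name, value in estimate_capacity().items():
        print(f"{name}: {value}")
```

The per-stream rate comes out at 89,600 bps, which the text rounds to 90 Kbps. The trunk count is rounded to the nearest whole trunk to match the figure of 28 quoted above.
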
This argument requires that control traffic and other non-voice traffic be negligible compared to voice traffic.

From these arguments, it seems clear that a simple single-cable Ethernet voice-transmission system can support a sizeable installation. A multiple-cable installation would be needed to carry voice traffic above these limits, or to support significant data traffic in addition to voice. Individual cables would be interconnected by packet gateways or conventional circuit switches. The very different statistics of data and voice traffic might make it advisable to partition a large system into parallel data and voice cables. Installation of a multiple-cable system would require attention to usage patterns and other traffic-engineering considerations.

Security

Security can be a problem with Ethernet or any other broadcast medium, since no physical protection against eavesdropping is possible. We impose security on our transmissions by encrypting all voice and control traffic using the Data Encryption Standard (Ref. 7). The availability of single-chip DES devices operating at 900 Kbps makes this possible. We distribute the keys using a method similar to the trusted authentication server approach advocated by Needham and Schroeder (Ref. 18).

New Functions

The broadcast nature of the Ethernet admits some new communication possibilities. Conference calls among several sites can be achieved without the need for special conference-bridge hardware: multicast techniques permit voice packets from a given site to be received and combined by each of the others (Ref. 8). In
the extreme, voice from a meeting or conference can be transmitted once and received by any number of listeners, without consuming any more bandwidth than a single conversation consumes.

Etherphone Voice Protocols

The voice protocols for the Etherphone system are based on the preceding analysis. We transmit 8000 8-bit samples per second, using the industry-standard mu-255 encoding (Ref. 11). The Etherphone processor actually supports two related protocols, one for interactive voice (telephone calls), the other to play and record stored voice.

The interactive-voice protocol was designed to meet a delay budget of 40 milliseconds end-to-end. This delay consists of a packetization delay of 20 milliseconds, hardware latency of 5 milliseconds for encryption and Ethernet transmission, software delays of up to 5 milliseconds, and what we call an anti-jitter delay of 10 milliseconds. To meet this budget, the protocol specifies the transmission of fifty packets per second, each containing 160 voice samples and 36 bytes of addressing and control overhead.

Anti-jitter delay is introduced by buffering at the receiving station to allow for variations in the arrival times of packets. One might say the protocol operates with about one-half packet of buffering. This delay budget does not allow time for retransmissions in the event of lost packets. This has proven acceptable, because the native packet loss rate of well-designed Ethernet components has been shown to be less than one packet in two million (Ref. 12).

Recording and playback of stored voice has slightly different characteristics. Transmission delay is not particularly important, provided that the start-up delay from request to playback is reasonably short. The stored-voice protocol calls for 100 milliseconds of buffering.
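
The delay and buffering figures for the two protocols follow from the sampling interval and the packet size. The sketch below simply recomputes them from the numbers given in this section; the names are illustrative assumptions, not identifiers from the Etherphone software.

```python
# Sketch recomputing the protocol figures; all names are hypothetical.
SAMPLE_INTERVAL_US = 125     # 8000 samples per second
SAMPLES_PER_PACKET = 160
OVERHEAD_BYTES = 36          # addressing and control overhead per packet

# Components of the interactive-voice delay budget, in milliseconds.
INTERACTIVE_BUDGET_MS = {
    "packetization": 20,
    "hardware": 5,           # encryption and Ethernet transmission
    "software": 5,
    "anti_jitter": 10,
}
STORED_VOICE_BUFFER_MS = 100


def packet_duration_ms():
    """Packetization delay: sampling interval times packet size."""
    return SAMPLE_INTERVAL_US * SAMPLES_PER_PACKET / 1000


def packets_per_second():
    return 1_000_000 // (SAMPLE_INTERVAL_US * SAMPLES_PER_PACKET)


def one_way_stream_bps():
    """Bits per second for one direction, including packet overhead."""
    return (SAMPLES_PER_PACKET + OVERHEAD_BYTES) * 8 * packets_per_second()


def buffering_in_packets(buffer_ms):
    """Express a buffering delay as a number of voice packets."""
    return buffer_ms / packet_duration_ms()


if __name__ == "__main__":
    print("total budget (ms):", sum(INTERACTIVE_BUDGET_MS.values()))
    print("packets per second:", packets_per_second())
    print("one-way stream (bps):", one_way_stream_bps())
    print("interactive buffering (packets):",
          buffering_in_packets(INTERACTIVE_BUDGET_MS["anti_jitter"]))
    print("stored-voice buffering (packets):",
          buffering_in_packets(STORED_VOICE_BUFFER_MS))
```

Under these figures the interactive protocol buffers half a packet, while the stored-voice protocol buffers five packets.
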
The additional buffering is desirable because the shared voice file server may not be able to schedule the transmission times of packets as accurately as a dedicated voice terminal can. The additional buffering also permits retransmission, although we have not found the need to implement it.

Major System Components

Hardware choices

Perhaps the most significant aspect of the Etherphone system architectural design was the choice of voice terminal equipment. Given the existing PARC environment, voice terminals were the only new equipment required. The alternatives we explored were to add special hardware and software to each workstation, or to design a standardized Ethernet telephone peripheral, the Etherphone™, that would do all of the necessary voice conversion and transmission. We chose the Etherphone approach for several reasons. We wanted to make voice capabilities available to users over our full range of workstations. Furthermore, we did not want to redesign the voice hardware for each type of workstation, nor were we confident that all of them could handle the transmission and control of telephone conversations with sufficient reliability and without substantial loss of performance in their other duties.

An additional hardware-related decision was motivated by economics. We were able to limit the size and speed of the Etherphone processor, and thus its cost, by restricting its activities to those that could not be readily performed by writing software for the existing workstations or servers.

Etherphone users converse with each other using only Ethernet communications. To make calls outside the system, however, access to the public telephone network is needed.
The preferred method, transplanted from the conventional PABX environment, would involve connecting a group of central-office trunk lines to a server that would complete the connections over the Ethernet. For our prototype system, we chose instead to provide each Etherphone with a jack for connection to an existing subscriber line. To complete telephone calls involving non-Etherphone users, the Etherphone connects the telephone instrument to the subscriber line. This choice allows the relatively small community of Etherphone users to retain their short extension numbers, avoids the cost of developing a trunk server, and permits direct use of the subscriber line as a backup if the hardware or software fails.

System Organization

In addition to the Etherphones, we have designed and implemented a telephone control server and a voice file server, both of which are Cedar-based applications programs. Figure 1 shows the structure of the resulting Etherphone system.

The Etherphone is responsible for voice digitization, for the analog interface to the switched telephone network, and for implementing the voice-transmission protocols. It is designed to be a voice peripheral, without local intelligence or system control responsibility. Instead, it reports user actions to the telephone control server, which returns detailed commands that control the Etherphone functions.

The telephone control server is the system controller. Its responsibilities are analogous to the control responsibilities of a conventional PABX: monitoring the state of the system, keeping track of the state of each Etherphone, and setting up all connections. It must also coordinate the operation of the software in users' workstations. This workstation software provides improved user interfaces for the telephone and other more experimental applications.

The voice file server uses a general-purpose computer with attached disk storage for voice.
It performs standard file server functions, but is specialized for the real-time needs of telephony.

Finally, existing standard file servers and database services are used for storage of telephone directory information and for storage of users' call filters and other information.

The following sections amplify the descriptions of these major system components.

Etherphone

The Etherphone is a dual-processor computer with interfaces for the Ethernet, DES encryption, RS-232 serial devices, and voice. The main processor is responsible for network communications and control while the slave processor handles voice signal-processing functions. An analog board, a digital board, and a power supply occupy a convection-cooled cabinet measuring 12" x 13" x 5". This package is designed to sit under a user's desk, while the telephone set, microphone, and speaker occupy positions of convenience. This arrangement is shown in Figure 2. Figure 3 is a block diagram of the Etherphone's internal architecture.

The analog board is centered around an 8 by 8 analog crossbar switch that interconnects various voice sources and sinks: telephone set, telephone line, analog-to-digital converters, DTMF decoder, microphone and speaker, and line-level auxiliary inputs and outputs. The analog-to-digital conversion functions are accomplished by coder-decoder (CODEC) devices. The telephone set and telephone line interfaces are connected by relays so that a power failure or system crash will restore standard telephone service.

The digital board contains the two processors, memory systems, timing logic, and digital interfaces for the Ethernet, RS-232 devices, DES encryption, analog-to-digital conversion, and control of the analog hardware.
A watchdog timer is provided for automatic recovery from an intermittent fault.

The main processor is an Intel 8088 with 8K bytes of EPROM and 56K of RAM. DES encryption is accomplished by a 900 Kbps single-chip device with direct memory access (DMA). The Ethernet controller is an internal Xerox LSI part using the protocols of the 3 Mbps experimental Ethernet, but operating at 1.5 Mbps. The RS-232 interface permits the connection of a local display and keyboard as an option.

Programs written in the C language supply all of the main processor's functions except for a small number of low-level device drivers, which are written in an assembly language. The software includes a round-robin-scheduled multitask operating system, and a network package that implements both the voice protocols and protocols for connecting a local control program to the telephone control server. This local control program communicates user actions (such as lifting the switchhook or keying in a "5" on the telephone) to the server without interpretation. Commands from the control server instruct the Etherphone's local control program to establish Ethernet voice connections, configure the crossbar switch, generate programmed tone sequences, and manage the telephone line interface.

The slave processor is also an 8088. It has access to 8K of private EPROM, 2K of RAM, and shared access to 48K of main memory. The slave processor executes a small, carefully coded, assembly-language program. Its functions include silence detection, the combination of incoming packets to achieve conference-call bridging, echo suppression, gain control, CODEC control, and low-level buffer management in shared memory.

The overall performance of the Etherphone is sufficient to support two simultaneous Ethernet conversations.
For example, two conversations are needed in order to forward an arriving outside call to an attendant's Etherphone when the telephone set is already being used for an Ethernet call. In a different configuration, the Etherphone will support a four-party conference call.

Because the voice sampling rate of each Etherphone is set by its own crystal-controlled clock, synchronization of two Etherphones participating in a conversation presents an intriguing problem. A pair of Etherphones may have clocks differing by as much as one part in 10,000. In the steady state, this frequency error causes the quantity of buffered voice at the receiving Etherphone to slowly increase or decrease. Since silence detection is used to reduce transmission bandwidth, the correct buffer depth is reestablished during each silent interval. Communications with the voice file server avoid this problem by use of software feedback: the file server is driven by the Etherphone clock.

Telephone Control Server

The telephone control server performs the functions normally associated with the common control facilities in a conventional PABX or telephone switching office. However, because it does not have to switch or transmit any of the actual voice traffic, it has no special hardware needs and requires only modest computing power to support its activities. The telephone control server supplies two classes of functions: Etherphone intelligence and call processing.

The control server contains the Etherphone intelligence software that interprets user actions at an Etherphone. Most Etherphones have only the standard DTMF keypad, but a small display and keyboard can be added to provide a more effective user interface when a workstation is not available (see Figure 1).
In either case, control server software interprets and responds to the user's actions. These responses include service requests such as setting up and taking down connections, and sending control instructions back to the Etherphones.

The control server also performs the telephone-control functions usually referred to as call processing. It provides control and bookkeeping for connections in collaboration with the Etherphone user interface programs for the participating parties. The control server provides access to a number of databases, including directories establishing the relationships among Etherphone network addresses, the names of individuals and services (such as the voice file server), telephone extensions, office locations, and nearby workstations. The call-processing portion of the control server has an additional function: it synchronizes the activities of each Etherphone with the workstation applications that manage that Etherphone and that use it to provide their voice connections. When user actions coming from the Etherphone and from the workstations are inconsistent, the server resolves the conflicts.

The telephone control server includes some maintenance facilities not directly connected with system operation. For example, the server contains software to detect the distress packets that Etherphones broadcast when they are first powered on or when unrecoverable failures occur. This caretaker software responds by performing automatic downloading and remote maintenance of Etherphone software.
By consulting another database, the control server can support the simultaneous operation of Etherphones with different software or hardware configurations. The same database assigns each Etherphone to a particular telephone control server, so that one server and a few Etherphones can be used to develop new system versions, while another server supports normal operations for the remaining users.

For system control, we were able to take advantage of the recent development of remote procedure call protocols (Ref. 19). Remote Procedure Call (RPC) is a means for transforming the message-passing semantics of packet-switching networks into the procedure-call format familiar to programmers. RPC largely relieves applications programmers from any worries about packet formats, addressing, reliable communication, and security. This makes it possible to defer decisions about the partitioning of a distributed system until very late in the design. Our task was reduced to producing an RPC implementation for the Etherphone control processor, then specifying the procedural interfaces between different system components.

Voice File Server

The voice file server uses the stored-voice protocol to record and play back digitized voice segments. It runs on a Xerox Dorado computer that can be equipped with as many 300 Mbyte disks as usage warrants. At 8000 bytes per second, the voice-storage capacity of each disk is over 8 hours.

The disk is organized so that storage is allocated in one-second units. This permits disk activity on behalf of a single record or playback operation to be limited to one contiguous disk transfer per second. Nevertheless, user software can specify the order and duration of voice-segment access at a grain of one millisecond. This facility makes it possible to experiment with voice editing. The very high network-communication loads presented by multiple voice-protocol connections present a special challenge.
The current implementation is capable of handling about eight simultaneous transfers.

All voice is stored in encrypted form. The associated keys are stored by the telephone control server in a directory along with information granting appropriate access to each voice segment. Since each voice segment is stored only once, regardless of the number of users granted access, the file server directory also keeps track of the number of outstanding references to each segment. Voice storage is reclaimed automatically when no references remain.

Workstation Applications

Most users of the Etherphone system will have both an Etherphone and a personal workstation. Although there can be conflicting requests for service originating from the workstation and the user's Etherphone, voice-related applications software on the workstation generally takes priority in decision making. As an example, consider an arriving call. The Etherphone is always prepared to announce the call with a ringing tone, but workstation software is first given the choice of rejecting the call, allowing the Etherphone to ring normally, or choosing some other way to announce the call.

Our goals include both producing useful workstation applications and making the Etherphone functions available to other programmers who would like to produce voice-related applications. For this reason, the workstation software includes a basic package of support facilities, then another layer of specific application programs that operate as clients of the basic package. The support package provides application programs with a procedural interface that is as simple as we could make it without reducing the available functions.
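
The arriving-call example given above, in which workstation software gets the first choice among three outcomes, can be sketched in Python as follows. Every name here (`Announce`, `handle_arriving_call`, the policy callback) is hypothetical, and the fallback to ordinary ringing when no workstation policy is present or when the policy fails is an assumption suggested by the remark that the Etherphone is always prepared to ring. It is not the Cedar support package's actual interface.

```python
from enum import Enum
from typing import Callable, Optional


class Announce(Enum):
    REJECT = "reject"   # refuse the call
    RING = "ring"       # let the Etherphone ring normally
    CUSTOM = "custom"   # workstation announces the call some other way


# Hypothetical callback type: given a caller's name, pick an outcome.
Policy = Callable[[str], Announce]


def handle_arriving_call(caller: str, policy: Optional[Policy] = None) -> Announce:
    """Decide how an arriving call is announced.

    Workstation software (the policy) is consulted first. If there is no
    policy, or it raises or returns something unexpected, the call falls
    back to a normal ring (an assumption of this sketch).
    """
    if policy is None:
        return Announce.RING
    try:
        decision = policy(caller)
    except Exception:
        return Announce.RING
    return decision if isinstance(decision, Announce) else Announce.RING


def screening_policy(caller: str) -> Announce:
    """Example policy: reject unidentified callers, announce others on screen."""
    return Announce.REJECT if caller == "unknown" else Announce.CUSTOM


if __name__ == "__main__":
    print(handle_arriving_call("Swinehart"))                    # no workstation policy
    print(handle_arriving_call("Swinehart", screening_policy))
    print(handle_arriving_call("unknown", screening_policy))
```

Keeping the ring as the default means that a missing or misbehaving workstation program cannot prevent a call from being announced at all.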
The simplicity is obtained primarily by suppressing necessary but uninteresting details: communications with the telephone control server and the synchronization of workstation actions with those of the Etherphone itself. The support package is also responsible for preventing actions by applications programs that could reduce the reliability of the basic telephone system and other applications. The dedicated facilities of the voice transmission and switching system make this task easier than it might otherwise be.

The existing Cedar applications layer provides a rudimentary user interface for placing and receiving telephone calls, for managing a personal telephone directory, and for dealing with voice messages. The telephone-management facilities allow call placement by name rather than by number, or by pointing at recipients' names in a directory displayed on the screen. Other displays announce the names of callers and present logs of telephone activity. The voice-message system is patterned after our text-mail system. Facilities are provided for recording and reviewing messages, for giving a voice message a text subject field, and for directing delivery of the message to one or to several recipients. A table of contents of the user's messages is also provided. Individual messages can be played, deleted, or saved for a later time.

Since the Cedar environment is available on several different types of workstations, workstation access to the Etherphone facilities will be available at least to all Cedar users.
Furthermore, the functionality of the support package is not extensive, so we do not expect it to be difficult to build similar packages that will make advanced voice capabilities available to users of other local programming environments.

Status and Plans

We are currently testing an initial system using prototype Etherphone hardware. We are planning to install additional Etherphones, offering service to a sizable segment of our laboratory, in order to test the value of the enhanced control and message capabilities and to allow Cedar programmers to develop their own applications. With the enabling facilities in place, we are just beginning to explore the potential applications of integrated voice in the office environment.

Acknowledgments

We would like to thank Jim Horning, Ken Pier, Mary Claire van Leunen, Meg Withgott, and Polle Zellweger for their valuable comments during the preparation of this paper.

References

1. American Telephone and Telegraph Company, Notes on The Network, 1980.
2. A. D. Birrell, R. Levin, R. M. Needham, M. D. Schroeder, "Grapevine: An Exercise in Distributed Computing," Comm. ACM, 25, 4 (April 1982), 260-274.
3. D. R. Boggs, J. F. Shoch, E. A. Taft, R. M. Metcalfe, "Pup: An Internetwork Architecture," IEEE Trans. Communications, Com-28, 4 (April 1980), 612-624.
4. P. T. Brady, "Effects of Transmission Delay on Conversational Behavior on Echo-Free Telephone Circuits," Bell System Technical Journal, 50, 1 (January 1971), 115-134.
5. D. K. Brotz, "Laurel Manual," Report CSL-81-6, Xerox Palo Alto Research Center, May 1981.
6. R. C. Crane, E. A. Taft, "Practical Considerations in Ethernet Local Network Design," Proc. 13th Hawaii International Conference on Systems Sciences, Honolulu, January 1980, 166-174.
7. Data Encryption Standard, Federal Information Processing Standard (FIPS), Publication 46, National Bureau of Standards, U.S. Department of Commerce, January 1977.
8. J. W. Forgie, "Voice Conferencing in Packet Networks," International Conference on Communications, Seattle, June 1980.
9. T. A. Gonsalves, "Packet-Voice Communication on an Ethernet Local Computer Network: an Experimental Study," Proc. SIGCOMM 1983 Symposium on Communications Architectures and Protocols, Austin TX, March 1983, 178-185.
10. J. R. Cavanaugh, R. W. Hatch, J. L. Sullivan, "Models for the Subjective Effects of Loss, Noise, and Talker Echo on Telephone Connections," Bell System Technical Journal, 55, 9 (November 1976), 1319-1371.
11. H. H. Henning, J. W. Pan, "D2 Channel Bank: System Aspects," Bell System Technical Journal, 51 (October 1972), 1641-1657.
12. J. F. Shoch, J. A. Hupp, "Measured Performance of an Ethernet Local Network," Comm. ACM, 23, 12 (December 1980), 711-721.
13. D. H. Johnson, G. C. O'Leary, "A Local Access Network for Packetized Digital Voice Communication," IEEE Trans. Communications, Com-29, 5 (May 1981), 679-688.
14. B. W. Lampson, K. A. Pier, "A Processor for a High-Performance Personal Computer," Proc. 7th Symposium on Computer Architecture, SigArch/IEEE, La Baule, May 1980, 146-160.
15. N. F. Maxemchuk, "An Experimental Speech Storage and Editing Facility," Bell System Technical Journal, 59 (October 1980), 1383-1395.
16. R. M. Metcalfe, D. R. Boggs, "Ethernet, Distributed Packet Switching for Local Computer Networks," Comm. ACM, 19, 7 (July 1976), 395-404.
17. J. G. Mitchell, J. Dion, "A Comparison of Two Network Based File Servers," Comm. ACM, 25, 4 (April 1982), 233-245.
18. R. M. Needham, M. D. Schroeder, "Using Encryption for Authentication in Large Networks of Computers," Comm. ACM, 21, 12 (December 1978), 993-998.
19. B. J. Nelson, Remote Procedure Call, Report CSL-81-9, Xerox Palo Alto Research Center, May 1981.
20. Internet Transport Protocols, Xerox System Integration Standard XSIS 028112, December 1981.
21. G. J. Nutt, D. L. Bayer, "Performance of CSMA/CD Networks Under Combined Voice and Data Loads," IEEE Trans. Communications, Com-30, 1 (January 1982), 6-11.
22. J. F. Shoch, "Carrying Voice Traffic Through an Ethernet Local Network - A General Overview," presented at the IFIP WG 6.4 Int. Workshop Local-Area Computer Networks, Zurich, Switzerland, August 1980.
23. D. C. Smith, E. Harslem, C. Irby, R. Kimball, "The Star User Interface, an Overview," Proc. National Computer Conference (June 1982), 515-528.
24. C. P. Thacker, E. M. McCreight, B. W. Lampson, R. F. Sproull, D. R. Boggs, "ALTO: A Personal Computer," in Computer Structure: Readings and Examples, D. Sieworek, C. G. Bell, and A. Newell, Eds., McGraw-Hill, New York, 1981.
25. F. A. Tobagi, N. Gonzalez-Cowley, "On CSMA-CD Networks and Voice Communication," Proc. Infocom (1982), 122-1278.
26. W. D. Guns, "Computer Based Voice Mail: Status and Outlook," SRI International Business Intelligence Program, File No. 83-757, January 1983.
27. P. A. Strudwick, R. E. Adkins, "Integrating Voice and Data at the Human Interface: the Key to Effective Communications in the Office of the 80's," Proc. National Telecommunications Conference (November 1981), New Orleans, page D1.2.
28. H. G. Jud, R. E. Winter, "A Modern Integrated PABX with Centralized Message Recording and Remote Distribution," Proc. National Telecommunications Conference (November 1981), New Orleans, page F3.3.
29. J. Edwards, D. Lieberman, "The ROLM CBX as an Integrated Voice/Data Communications Switching Unit," Proc. National Telecommunications Conference (November 1981), New Orleans, page F3.1.

The authors are located at the Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304.

globecomp.bravo, Swinehart, August 29, 1983 9:44 AM