An Experimental Environment
for Voice System Development
Daniel C. Swinehart
Douglas B. Terry
Polle T. Zellweger
Xerox Palo Alto Research Center
Abstract
The Etherphone(TM) system has been developed to explore methods for extending existing multi-media office environments with the 
facilities needed to handle the transmission, storage, and manipulation of voice.  Based on a hardware architecture that uses 
microprocessor-controlled telephones to transmit voice over an Ethernet that also supports a voice file server and a voice 
synthesis server, this system has been used for applications such as directory-based call placement, call logging, call filtering, 
and automatic call forwarding.  Voice mail, voice annotation of multi-media documents, voice editing using standard text editing 
techniques, and applications of synthetic voice use the Etherphones for voice transmission.  Recent work has focused on the 
creation of a comprehensive voice system architecture, both to specify programming interfaces for custom uses of voice, and to 
specify the role of different system components, so that equipment from multiple vendors could be integrated to provide 
sophisticated voice services. 
1. Introduction
Authors' address: Computer Science Laboratory, Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304.
Arpanet: Swinehart.PA@Xerox.COM, Terry.PA@Xerox.COM, Zellweger.PA@Xerox.COM
Suppose Alexander Graham Bell had waited to invent the telephone until personal workstations and distributed computing networks had 
been invented.  What approach would he take in introducing voice communications into the modern computing environment?  It was an 
attempt to answer this question that led to the creation of a voice communications project within the Computer Science Laboratory 
at Xerox PARC.
Stated more concretely, the project's aim was to extend our existing multi-media office environment with the facilities needed to 
handle the transmission, storage, and manipulation of voice.  We believed that it should be possible to deal with voice as easily 
as we can manage text, electronic mail, or images.  The desired result was an integrated workstation that could satisfy nearly all 
of a user's communications and computing needs.
A basic requirement was to provide conventional telephone facilities (so that casual users would not have to read a manual to make 
a phone call), but our goals went far beyond that.  We had observed that most enhanced voice communications facilities had been 
developed by designers of telephone systems.  In contrast, we wished to draw on our experience as developers of personal 
information systems running on powerful workstations with graphical interfaces.  We were convinced that the user would prefer to 
perform voice management functions using the power and convenience of workstation facilities such as on-screen menus, text editors, 
and comprehensive informational displays.
These aims led us to explore two related research domains:
    "Taming the telephone": Despite an immense investment in research and development over the last 110 years, the user interface and 
    the functionality of the telephone still leave much to be desired.  We contend that the personal workstation, combined with a 
    telephone system whose characteristics we can control, makes it possible to better match the behavior of the office telephone 
    with the needs of its users.  There are gains to be had in the placement of calls, the handling of incoming calls, and the 
    capabilities available to telephone attendants (that is, switchboard operators, receptionists, and secretaries). 
    "Taming the tape recorder": We also believe that workstation techniques for creating, editing, and storing text or images can be 
    modified to deal with digitally-recorded voice.   Application areas such as electronic mail, document annotation, and dictation 
    are candidates for improvement.  Speech synthesis and recognition devices can be added to provide translation between textual 
    and spoken information.
These two sets of activities are clearly related.  A carefully designed system can support novel applications of  both live and 
recorded voice.
In this overview we will describe the Etherphone(TM) system that we have developed and used to explore ways of integrating voice into 
a personal information environment.  The following sections sketch the present hardware architecture, describe some of the more 
compelling applications that have been built to exploit it, and briefly explore the software and systems issues that have surfaced.
2. Etherphone System Description
In designing our prototype voice system, we surveyed several possible hardware architectures, including using our existing Centrex 
service or a commercially available PABX.  We concluded that the most effective way to satisfy our needs was to construct our own 
transmission and switching system [Swinehart et al 1983].  Ethernet local-area networks, which provide the data communications 
backbone supporting personal distributed computing at Xerox PARC, have proven to be an effective medium for transmitting voice as 
well as data.  Our prototype voice system consists of the following types of components connected by Ethernets, as shown in Figure 
1.

Figure 1.  Etherphone system components.
Etherphones: telephone/speakerphone instruments that include a microcomputer, encryption hardware, and an Ethernet controller.  
Etherphones digitize, packetize, and encrypt telephone-quality voice and send it to each other directly over an Ethernet.  
Etherphone software is written in C.  The current environment contains approximately 50 Etherphones, which are used daily by 
members of the Computer Science Laboratory as their only telephone service.  A connection to the standard direct-dial telephone 
line provides access to telephones outside the Etherphone system.  Additional information on the Etherphone hardware and the Voice 
Transmission Protocol can be found in a previous report [Swinehart et al 1983].
Voice Control Server: a program that provides control functions similar to a conventional PABX and manages the interactions between 
all the other components.  It runs on a dedicated server that also maintains databases for public telephone directories, 
Etherphone-workstation assignments, and other shared information.  The Voice Control Server is programmed in the Cedar programming 
environment [Swinehart et al 1986].
Voice File Server: a service that can hold conversations with Etherphones in order to record or play back user utterances.  In 
addition to managing stored voice in a special-purpose file system, the Voice File Server provides operations for combining, 
rearranging, and replacing parts of existing voice recordings to create new voice objects.  For security reasons, voice remains 
encrypted when stored on the file server.
Text-to-speech Server: a service that receives text strings and returns the equivalent spoken text to the user's Etherphone.  Each 
text-to-speech server is constructed by connecting a commercial text-to-speech synthesizer to a dedicated Etherphone and is 
available on a first-come-first-served basis.  Two speech synthesizers, purchased from different manufacturers, have been installed.
Workstations: high-performance personal computers with large bitmapped displays and mouse pointing devices.  Workstations are the 
key to providing enhanced user interfaces and control over the voice capabilities.  We rely on the extensibility of the local 
programming environmentbe it Cedar, Interlisp, Unix, or whateverto facilitate the integration of voice into workstation-based 
applications.  Workstation program libraries implement the client programmer interface to the voice system.
In addition, the architecture allows for the inclusion of other specialized sources or sinks of voice, such as speech recognition 
equipment or music synthesizers.
All of the communication required for control in the voice system is accomplished via a remote procedure call (RPC) protocol 
[Birrell and Nelson 1984].  For instance, conversations are established between two or more parties (Etherphones, servers, and so 
on) by performing remote procedure calls to the Voice Control Server.  During the course of a conversation, RPCs emanating 
from the Voice Control Server inform participants about various activities concerning the conversation.  Active parties in a 
conversation use the Voice Transmission Protocol to actually exchange voice.  Multiple implementations of the RPC mechanisms permit 
the integration of workstation programs and voice applications programmed in different environments.
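The control flow just described can be sketched in a few lines.  This is a minimal in-process model in Python, not the Cedar RPC interface: the class, method, and state names (VoiceControlServer, create_conversation, "ringing", and so on) are hypothetical, and ordinary function calls stand in for remote procedure calls in both directions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical callback type: the server reports (conversation id, new state)
# to each party.  A plain function call stands in for an RPC.
ProgressReport = Callable[[int, str], None]


@dataclass
class Conversation:
    conv_id: int
    parties: List[str]
    state: str = "idle"


@dataclass
class VoiceControlServer:
    """Toy stand-in for the control server; all names are assumptions."""
    callbacks: Dict[str, ProgressReport] = field(default_factory=dict)
    conversations: Dict[int, Conversation] = field(default_factory=dict)
    next_id: int = 1

    def register(self, party: str, callback: ProgressReport) -> None:
        # A party (Etherphone, server, workstation) registers for reports.
        self.callbacks[party] = callback

    def create_conversation(self, caller: str, callee: str) -> int:
        conv = Conversation(self.next_id, [caller, callee])
        self.conversations[conv.conv_id] = conv
        self.next_id += 1
        self._set_state(conv, "ringing")
        return conv.conv_id

    def answer(self, conv_id: int) -> None:
        self._set_state(self.conversations[conv_id], "active")

    def hang_up(self, conv_id: int) -> None:
        self._set_state(self.conversations.pop(conv_id), "idle")

    def _set_state(self, conv: Conversation, state: str) -> None:
        # Every state transition is reported to every participant.
        conv.state = state
        for party in conv.parties:
            self.callbacks[party](conv.conv_id, state)


log: List[tuple] = []
server = VoiceControlServer()
for name in ("alice", "bob"):
    server.register(name, lambda cid, st, who=name: log.append((who, cid, st)))

cid = server.create_conversation("alice", "bob")
server.answer(cid)
server.hang_up(cid)
```

After the three calls, log holds one progress report per party per state transition.  The voice itself never passes through this control path; in the real system it travels separately over the Voice Transmission Protocol.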
3. Examples of Applications to Date
Most of our user-level applications to date have been created in the Cedar environment, although limited functions have been 
provided for Interlisp and for standalone Etherphones.  This section describes the voice applications that are available in Cedar, 
including telephone management, text-to-speech synthesis, and voice annotation and editing.  Figure 2 shows a typical Cedar screen 
using voice, text, and graphics to support programming and document preparation activities.
In order to make voice a first-class citizen of the Cedar environment, Etherphone functions are typically available in several 
ways: through an Etherphone control panel, through commands that can be issued in a typescript, and through procedures that can be 
invoked from client programs.  This integration of voice capabilities will be discussed more fully in the next section.
Figure 2.  A Cedar screen in use.  The two windows in the upper left show a document preparation task, including voice annotation 
of this paper.  The bottom window of this pair shows new voice (represented by arrowheads) being inserted in the middle of an 
existing voice annotation.  The two windows in the lower left show a programming task that is monitoring part of the voice 
annotation system.  The two windows in the upper right show images created with several graphical illustration packages.  The 
window at the lower right accepts user commands, similar to a Unix shell.  The bottom row of icons represents files and tools that 
are active but are not currently being manipulated by the user.
3.1. Telephone management
The telephone management functions provide improved capabilities for both placing and receiving calls.  
Figure 3 shows an Etherphone control window, called Finch, and a personal telephone directory window.
Users can place calls by specifying a name, a number, or other attributes of the called party.  A system directory database for 
local Xerox employees (about 1000 entries) is stored on the Voice Control Server.  Etherphone users can also create personal 
directories, which are consulted before the system directory to locate the desired party.  A soundex search mechanism [Knuth 1973] 
compensates for some kinds of spelling errors.
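As an illustration of the lookup order and the soundex fallback, here is a small Python sketch.  The soundex function follows the simple form of the algorithm (no special treatment of intervening h or w); the find_party function, the precedence of exact over soundex matches within each directory, and the sample names and numbers are assumptions made for the example, not details of the Etherphone implementation.

```python
from typing import Dict, Optional, Tuple

# Letter-to-digit table of the soundex method.
CODES: Dict[str, str] = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character soundex code (simple variant)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # an uncoded letter (vowel, h, w, y) breaks a run
    return (result + "000")[:4]


def find_party(name: str,
               personal: Dict[str, str],
               system: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Consult the personal directory first, then the system directory."""
    for directory in (personal, system):
        for entry, number in directory.items():
            if entry.lower() == name.lower():
                return entry, number
        for entry, number in directory.items():
            if soundex(entry) == soundex(name):
                return entry, number
    return None


# Hypothetical directory contents.
personal = {"Terry": "555-0101"}
system = {"Swinehart": "555-0100", "Zellweger": "555-0102"}
match = find_party("Swinhart", personal, system)
```

Here the misspelled "Swinhart" shares the code S563 with "Swinehart", so the call can still be placed.  The same coding maps both "Knuth" and "Kant" to K530, which shows that the method tolerates some spelling errors at the cost of occasional false matches.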
A variety of convenient workstation dialing methods are provided.  A user can fill in fields in the Finch tool, select names or 
numbers from anywhere on the screen, or use either of two directory tools that present browsable lists of names and associated 
telephone numbers as speed-dialing buttons.  Calls can also be placed by name or number from the telephone keypad.
Calls are announced audibly, visually, and functionally.  Each Etherphone user selects a personalized ring tune, such as a few bars 
of "Mary Had a Little Lamb".  This tune is played at a destination Etherphone to announce calls to that user.  The caller hears the 
same tune as a ringback confirmation.  During ringing, the telephone icon jangles with a superimposed indication of the caller, as 
shown in the middle portion of Figure 4.  An active conversation is represented by an icon of two people conversing, with a 
superimposed indication of the other party (also shown in Figure 4).  The system automatically fills in the Finch tool's Calling 
Party or Called Party field to allow easy redialing.  It also creates a new entry in a conversation log.  A user can consult the 
conversation log to discover who called while he was out of the office.
Figure 3.  Two workstation telephone management windows.  The upper Finch window provides a two-dimensional user interface to the 
Etherphone system.  It includes an Etherphone control menu (the first line, including `Phone', `Answer', etc. buttons), a redialing 
area (the second line), an area for system status reports, and a conversation log (indicating a call in progress to Aquarius 
Theater info).  The lower window shows a portion of a personal telephone directory, which is a set of speed-dialing buttons that 
can be created easily from an ordinary text file.  The call in progress was placed by clicking on the underlined `Aquarius Theater 
info' entry.
Our methods of following a user around an office building rely upon the personalized ring tunes, which allow Etherphone users to 
identify calls to them wherever they may be: in their own offices, within earshot, or at other Etherphones.  If an Etherphone user 
logs in at a workstation, his calls are automatically forwarded to the adjacent Etherphone.  An additional feature, called 
visiting, allows him to register his presence with a second workstation or Etherphone, such as during a meeting.  Registering with 
the destination location allows users to travel more freely than forwarding calls from the home location does.  Each visit request 
cancels any earlier requests; visiting the home location cancels visiting.  The common problem of forgetting to cancel forwarding 
is further eased by ringing both Etherphones during visiting.
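The visiting rules stated above are simple enough to capture in a short sketch.  The following Python model is an assumption-laden illustration: the VisitRegistry class, its method names, and the location labels are hypothetical, and login-based forwarding is left out for brevity.

```python
from typing import Dict, List


class VisitRegistry:
    """Toy model of visiting; names and structure are assumptions."""

    def __init__(self, home: Dict[str, str]) -> None:
        self.home = dict(home)            # user -> home Etherphone
        self.visiting: Dict[str, str] = {}  # user -> visited Etherphone

    def visit(self, user: str, etherphone: str) -> None:
        if etherphone == self.home[user]:
            # Visiting the home location cancels visiting.
            self.visiting.pop(user, None)
        else:
            # Each visit request replaces any earlier request.
            self.visiting[user] = etherphone

    def ring_targets(self, user: str) -> List[str]:
        # During visiting, both the home and the visited Etherphone ring.
        targets = [self.home[user]]
        if user in self.visiting:
            targets.append(self.visiting[user])
        return targets


registry = VisitRegistry({"terry": "office"})
registry.visit("terry", "conference-room")
registry.visit("terry", "lab")
during_visit = registry.ring_targets("terry")
registry.visit("terry", "office")
after_return = registry.ring_targets("terry")
```

Because the home Etherphone always rings, a user who forgets to cancel a visit still receives calls at home, which is the point made in the text.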
Figure 4.  Etherphone system icons.  The two icons at the left show a closed personal telephone directory and a Finch icon at rest. 
 The four central icons show several stages of an incoming call from Doug Wyatt (username Wyatt.pa): the three left icons of the 
group are animated during ringing, while the right conversation icon is used after the call has been answered.  The rightmost icon 
shows an outgoing call to Polle Zellweger's home.  Animation and visual feedback in the icons provide useful information without 
consuming valuable screen area.
3.2. Text-to-speech synthesis
A user or program can generate speech as easily as printing a message on the display by using one of the Text-to-speech Servers.  A 
user can select text in a display window and click the Finch tool's SpeakText menu button.  A program can call a procedure with the 
desired text as a parameter.  These features are implemented by creating a "conversation" between the user's Etherphone and a 
Text-to-speech Server.  The system sets up a connection to the Text-to-speech Server, sends the text (via RPC), returns the 
digitized audio signal (via the Voice Transmission Protocol), and closes the connection when the text has been spoken.  A similar 
mechanism is used for voice recording and playback.
Our primary uses for text-to-speech so far have been in programming environment and office automation applications.  Programming 
environment tasks have included spoken progress indicators, prompts, and error messages.  Office automation applications have 
included proofreading (especially comparing versions of a document when one version has no electronic form, such as proofing 
journal galleys) and audio reminder messages generated by calendar programs.
3.3. Voice annotation and editing
This section describes the addition of a voice annotation mechanism to Tioga, the standard text-editing program in Cedar.  Tioga is 
essentially a high-quality galley editor, supporting the creation of text documents using a variety of type faces, type styles, and 
paragraph formats.  Tioga includes the ability to incorporate illustrations and scanned images into its documents.  Tioga is the 
underlying editor for all textual applications in Cedar, including the electronic mail system, the system command interpreter, and 
other tools that require the user to enter and manipulate text.  Wherever Tioga is used, all of its formatting and multi-media 
facilities are available.  Thus, by adding voice annotation to Tioga, we have made it available to a variety of Cedar applications.
Any text character within a Tioga document can be annotated with an audio recording of arbitrary length.  The user interface of the 
voice annotation system is designed to be lightweight and easy to use, since spontaneity in adding vocal annotations is essential.  
Voice within a document is shown as a distinctive shape superimposed around a character, so that the document's visual layout is 
unaffected.  Furthermore, adding voice to a document does not alter its contents as observed by other programs (such as compilers).
To add an annotation, the user simply selects the desired character within a text window and buttons AddVoice in that window's 
menu.  Recording begins immediately, using either a hands-free microphone or the telephone handset, and continues until the user 
buttons STOP.  A voice annotation becomes part of the document, although the voice data physically resides on the Voice File 
Server.  Copies of the document may be stored on shared file servers or sent directly to other users as electronic mail.  To listen 
to voice, a user selects a region containing one or more voice icons and buttons PlayVoice.
Simple voice editing is available: users can select a voice annotation and open a window showing its basic sound-and-silence 
profile.  Sounds from the same or other voice windows can be cut and pasted together using the same editing operations supported by 
the Tioga editor.  A lightweight `dictation facility' that uses a record/stop/backup model can be used to record and incorporate 
new sounds conveniently.  Editing is done largely at the phrase level (never at the phoneme level), representing the granularity at 
which editing can be done with best results and least effort for an office situation.  The dictation facility can also be used when 
placing voice annotations directly into documents.
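One way to picture cut-and-paste editing of recorded voice is to treat each voice object as a list of intervals within immutable stored recordings, so that edits rearrange references rather than copy samples.  The Python sketch below assumes that representation purely for illustration; the Piece type, the function names, and the recording identifiers are hypothetical and are not taken from the Voice File Server interface.

```python
from typing import List, NamedTuple, Tuple


class Piece(NamedTuple):
    recording: str  # id of an immutable stored recording (hypothetical)
    start: int      # offset into that recording, in samples
    length: int     # number of samples


Voice = List[Piece]


def duration(voice: Voice) -> int:
    return sum(p.length for p in voice)


def split(voice: Voice, at: int) -> Tuple[Voice, Voice]:
    """Split a voice object at sample position `at`."""
    left: Voice = []
    right: Voice = []
    pos = 0
    for p in voice:
        if pos + p.length <= at:
            left.append(p)
        elif pos >= at:
            right.append(p)
        else:
            k = at - pos  # the split point falls inside this piece
            left.append(Piece(p.recording, p.start, k))
            right.append(Piece(p.recording, p.start + k, p.length - k))
        pos += p.length
    return left, right


def cut(voice: Voice, start: int, end: int) -> Tuple[Voice, Voice]:
    """Remove [start, end); return (remaining voice, removed clip)."""
    head, rest = split(voice, start)
    removed, tail = split(rest, end - start)
    return head + tail, removed


def paste(voice: Voice, at: int, clip: Voice) -> Voice:
    """Insert a clip at sample position `at`."""
    head, tail = split(voice, at)
    return head + list(clip) + tail


memo: Voice = [Piece("rec-A", 0, 8000)]
remaining, clip = cut(memo, 2000, 5000)
moved = paste(remaining, duration(remaining), clip)
```

In this example a three-thousand-sample phrase is cut from the middle of a recording and pasted at the end.  The underlying recording "rec-A" is never modified, so the original annotation and the edited one can coexist.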
Sound-and-silence profiles alone do not provide adequate contextual information for users to identify desired editing locations.  
Several contextual aids are provided.  A playback cue moves along the voice profile during playback, indicating exactly the 
position of the voice being heard (see Figure 5).  While playback is in progress, a user can perform edits immediately or mark 
locations for future edits.  Simple temporary markers can be used to keep track of important boundaries during editing operations, 
while permanent textual markers can be used to mark significant locations within extended transcriptions.  Finally, the system 
provides a visual indication of the voice-editing history in an editing window.  Newly-added voice appears in a bright yellow 
color, while less-recently-added phrases become gradually darker as new editing operations occur.
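The text does not say how the sound-and-silence profile is computed.  One plausible approach, sketched here in Python under that caveat, is to classify fixed-size frames by peak amplitude and merge consecutive frames of the same kind into runs.  The frame size, the threshold, and the function names are assumptions chosen for the example.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def sound_profile(samples: Sequence[int],
                  frame_size: int = 160,
                  threshold: int = 500) -> List[Tuple[bool, int]]:
    """Return runs of (is_sound, number_of_frames)."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples), frame_size)]
    # A frame counts as sound if any sample reaches the threshold.
    flags = [max(abs(s) for s in frame) >= threshold for frame in frames]
    return [(is_sound, sum(1 for _ in run))
            for is_sound, run in groupby(flags)]


def render(profile: List[Tuple[bool, int]]) -> str:
    """Draw sound as '#' and silence as '.', one character per frame."""
    return "".join(("#" if is_sound else ".") * n for is_sound, n in profile)


# Two silent frames, two loud frames, one silent frame.
samples = [0] * 320 + [1000, -1200] * 160 + [0] * 160
profile = sound_profile(samples)
picture = render(profile)
```

A profile of this kind carries no information about what was said, which is why the contextual aids described above (playback cue, markers, and edit-history coloring) are needed.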
More information about the voice annotation and editing system can be found in [Ades and Swinehart 1986].
Figure 5.  Voice annotation and editing.  The upper window shows voice annotations being added to a Tioga document (the voice 
annotation system documentation).  The third line of menu buttons ("AddVoice PlayVoice ...") near the top of the window is used to 
manipulate voice.  In the second line of text, a voice annotation appears on the word "short", indicated by the comic-strip dialog 
`balloon' surrounding the character.  In the fifth line, a similar annotation has been opened for editing in the lower window, 
labelled "Sound Viewer #1".  In the sound-and-silence profile in the lower window, white rectangles depict silence, while dark 
rectangles depict sound.  The profile contains several contextual indicators to orient the user during editing.  The playback cue 
(the gray rectangle underneath the word "score") indicates the progress of voice playback.  A temporary marker in the form of a 
small cross has been placed in the voice section marked "Intro".  The textual annotations with arrows are permanent markers that 
will be stored with the document.
4. Progress toward a Voice System Architecture
The original goals of the Etherphone project were to produce experimental prototypes that could "tame the telephone" or "tame the 
tape recorder" in novel and useful ways.  As the project developed, however, a more fundamental goal emerged:  to create and 
experimentally validate a comprehensive architecture for voice applications.  The best way to explain the value of a voice 
architecture is to list some of the properties it should have:
    Completeness. It must be able to specify the role of telephone transmission and switching, workstations, voice file servers, and 
    other network services in supporting all the kinds of  capabilities we have identified, such as telephone services and recorded 
    voice services.
    Programmability. It must permit workstation programmers to modify existing voice-related applications and to create new ones.  
    Simple applications should be easy to write, requiring little or no detailed understanding of how the system is implemented.  
    More elaborate applications might require a more thorough knowledge of the underlying facilities.  The architecture must be 
    designed to minimize the effect of faulty programming on the reliability and performance of the overall system. (Users of 
    experimental software might experience program failure or reduced performance, but other users should not.)
    Openness. It should define the role of each major component, so that different kinds of components could be used to provide the 
    same functions.  In this way, multiple vendors of telephone and office equipment could cooperate to provide advanced voice 
    functions in conjunction with workstation-based applications.  For example, a conventional PABX (business telephone system) 
    could be used in place of the Etherphones to provide voice switching.  Ideally, such an architecture would evolve into a 
    standard for voice component interconnection.
The development of the Etherphone system has included an ongoing effort to define such an architecture, and to implement the system 
in compliance with it.  Following the general methodology employed by such modern communications architectures as the ISO reference 
model [ISO 1981] or the Xerox Network Systems protocols [Xerox 1981], the voice architecture is expressed as a set of 
layers, each calling on the capabilities of the layer below it through well-defined interfaces or protocols.  We have identified 
five distinct layers.  From highest to lowest, these are the Applications layer, the Service layer, the Conversation layer, the 
Transmission layer, and the Physical layer.
The best way we have found to explain this organization is from the inside out, beginning with the heart of the architecture, the 
Conversation layer.  It provides a uniform approach to the establishment and management of voice connections, or conversations, 
among the various services.  It also provides a standard method for distributing conversation state transitions and other progress 
reports to the various participants in each conversation.  All communications between services are mediated by Conversation layer 
facilities.  In the Etherphone system, the functions of the Conversation layer are implemented entirely within the Voice Control 
Server.  However, the architecture does not mandate centralized control.  For example, Etherphones built with larger memories and 
more powerful processors could support a distributed implementation, each managing the conversations that it or its associated 
workstation initiated.
The Service layer defines the various voice-related services (such as telephone functions, voice recording and storage, voice 
playback, speech synthesis, and speech recognition) that form the basis for the voice applications.  Each of the services must 
follow the uniform Conversation layer protocols in creating voice connections with other services.  However, each can register with 
the Conversation layer additional service-specific interfaces (protocol specifications).  Connections may be formed between similar 
services (as in a call from one telephone to another), or among different services (such as a connection from a telephone to the 
recording service, mediated by a workstation program).  It is not expected that ordinary programmers will produce new services; the 
services provide both the basic user facilities and interfaces to the building blocks for higher-level applications.  In the 
Etherphone system, some services are implemented on the server machine that contains the Voice Control Server, others on separate 
server machines, still others on individual workstations. 
The Applications layer represents client applications that use the voice capabilities of the architecture.  To establish voice 
connections, a client uses simplified facilities provided by a service that resides on the workstation along with the application.  
Client applications also negotiate with the Conversation layer to gain access to specialized interfaces provided by other services. 
 The previous section illustrated many of the present voice applications using the Etherphone environment.
Logically below the Conversation layer is the Transmission layer.  This layer represents the actual methods for representing 
digital voice, for transmitting and switching voice, and for communicating control information among the components of the system.  
In the Etherphone system, voice is transmitted digitally, in discrete packets, using a standard 64 kilobits/second voice 
representation and our own voice transmission protocol.  Other transmission and switching methods could be substituted without 
affecting the nature of the programs operating in layers above the Conversation layer.  Possibilities include synchronous digital 
transmission, or even analog transmission.  The only requirement is that these components provide interfaces that allow the 
implementation of Conversation layer protocols.  As we have mentioned, the control protocol selected for all control communications 
in the system was a locally-produced remote procedure call protocol.  Other remote procedure protocols or message-based protocols 
would work equally well.
Finally, the Physical layer represents the actual choice of communications media, for the transmission of both voice information 
and control (not necessarily the same media!).  Besides the research Ethernet that we use (operating at 1.5 megabits/second), voice 
transmission on standard Ethernets, synchronous or token-oriented ring networks, digital PABX switches, or analog telephone 
switches could be used.
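The two figures quoted for the Transmission and Physical layers (64 kilobits/second voice and a 1.5 megabits/second Ethernet) allow a rough packet budget to be worked out.  The Python sketch below does the arithmetic; the 20 millisecond packet interval and the header size parameter are assumptions for illustration only, since the text does not give the packet format of the Voice Transmission Protocol.

```python
from typing import Tuple

VOICE_BITS_PER_SECOND = 64_000         # from the text
ETHERNET_BITS_PER_SECOND = 1_500_000   # research Ethernet, from the text


def packet_budget(interval_ms: int = 20,
                  header_bytes: int = 0) -> Tuple[int, float, float]:
    """Return (payload bytes per packet, packets per second,
    fraction of the Ethernet used by one direction of one call).

    interval_ms and header_bytes are illustrative assumptions.
    """
    payload_bytes = VOICE_BITS_PER_SECOND * interval_ms // (8 * 1000)
    packets_per_second = 1000 / interval_ms
    wire_bits = (payload_bytes + header_bytes) * 8 * packets_per_second
    return payload_bytes, packets_per_second, wire_bits / ETHERNET_BITS_PER_SECOND


payload, rate, share = packet_budget()
```

Under these assumptions each packet carries 160 bytes of voice, 50 packets are sent per second in each direction, and one direction of a call occupies a little over 4 percent of the raw network capacity before packet headers are counted.  A two-way conversation uses twice that.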
Looking at the architectural layers, it becomes easy to see how our efforts differ from work being done elsewhere (see footnote 1).  Most of the 
current efforts to "integrate voice and data", such as those systems built around the Integrated Services Digital Network (ISDN) 
definitions [Decina 1982], deal only with the Transmission and Physical layers.  Other systems that include voice, such as the 
Diamond research effort at BBN [Thomas et al 1985], commercial voice mail services, etc., support some specialized applications 
exhibiting very scanty Service and Conversation layers.  They mostly build directly on capabilities corresponding to our 
Transmission layer.  By contrast, we have concentrated our efforts on Conversation and Service layer specifications, and on the 
architecture in general.
Footnote 1: A project recently initiated at Bell Communications Research has been exploring similar goals and methods [Herman et al 1986].
To date, only one instance each of the Physical, Transmission, and Conversation layers has been implemented.  We have used the 
resulting facilities extensively to produce the various Services and Applications described in the preceding sections.  We have 
produced a relatively complete workstation service for Cedar workstations, and a preliminary implementation for Interlisp.
We are not yet fully satisfied with the architecture, particularly the interface between the Conversation and Service layers.  This 
interface has proven to be somewhat clumsy to use, while at the same time restricting the number of capabilities that can be 
readily produced.  Recent progress is encouraging, however.
5. Current and Future Directions
Our efforts to date have been to build basic facilities for voice communication, based on the general architecture outlined in the 
previous section, then to produce a few interesting applications demonstrating the unique characteristics of the architecture and 
the flexibility of the Etherphone system for experimenting with novel voice applications.  Voice project members have built most of 
the applications, although a few programmers have included telephone management or voice synthesis activities in their applications 
using interfaces provided by the Service layer.
We have begun to explore a number of new directions and enhancements to the current capabilities.
We have a skeletal implementation of call filtering that provides options based on the subject, urgency, or caller's identity to 
decrease the intrusiveness of the telephone for the callee.  Our plan to integrate telephone conversation logs into the electronic 
mail system should have a side benefit of making the additional filtering information natural for the caller to supply.
We are considering novel kinds of interactive voice connections, such as all-day "background" telephone calls, use of the telephone 
system to broadcast internal talks or meetings (as a sort of giant conference telephone call), and conference calls that allow 
side-conversations to take place.
We plan to use conferencing capabilities (already supported by the hardware) to incorporate text-to-speech or recorded voice into 
telephone calls.  Among possible uses for text-to-speech are reading electronic mail over the telephone to a remote lab member as 
in Phone Slave [Schmandt and Arons 1984] or MICE [Herman et al 1986] (but without dedicating a synthesizer solely to this task) and 
playing program-generated messages to callers, such as prompts or reports of the user's location (possibly by consulting the user's 
calendar program, such as "Dr. Smith is at the Distributed Systems seminar now, please call back after 5 o'clock").
We are also exploring a novel scripting mechanism for creating viewing paths through an electronic document or set of documents 
[Zellweger 1986].  Built on the capabilities of the voice architecture, scripted multi-media documents can contain any combination 
of text, pictures, audio, and action.  Scripts can be used in a wide variety of ways, such as for formal demonstrations and 
presentations, for informal communications, and for organizing collections of information.
Finally, we would like to extend the system to other media, such as still and real-time video, other workstations, and other 
architectures.
We have discovered that managing real-time and stored voice in a distributed environment presents some interesting problems in the 
areas of distributed systems [Terry 1986], user interface design, and voice transmission and processing technologies.   We intend 
to continue to investigate these problems.
6. References
[Ades and Swinehart 1986]    Ades, S. and Swinehart, D. Voice annotation and editing in a workstation environment, Proc. of AVIOS 
Voice Applications '86 conference, September 1986.
[Birrell and Nelson 1984]    Birrell, A. and Nelson, B. Implementing remote procedure call. ACM TOCS 2, 1, February 1984.
[Decina 1982]    Decina, Maurizio.  Progress towards user access arrangements in Integrated Services Digital Networks, IEEE Trans. 
on Communications 30, September 1982, 2117-2130.
[Herman et al 1986]    Herman, G., Ordun, M., Riley, C., and Woodbury, L.  The Modular Integrated Communications Environment 
(MICE): a system for prototyping and evaluating communications services.  To appear.
[ISO 1981]    International Organization for Standardization. ISO open systems interconnection: Basic reference model. ISO/TC 97/SC, 
16, 719, August 1981.
[Knuth 1973]    Knuth, D.  The Art of Computer Programming, Volume 3, Addison-Wesley, page 392.
[Schmandt and Arons 1984]    Schmandt, C. and Arons, B. Phone Slave: A Graphical Telecommunications Interface. Proc. Society for 
Information Display 1984 International Symposium, June 1984.
[Swinehart et al 1983]    Swinehart, D., Stewart, L., and Ornstein, S. Adding voice to an office computer network. Proc. of 
GlobeCom 83, IEEE Communications Society Conference, November 1983.
[Swinehart et al 1986]    Swinehart, D., Zellweger, P., Beach, R., and Hagmann, R. A Structural View of the Cedar Programming 
Environment. ACM Trans. Programming Languages and Systems 8, 4, October 1986.
[Terry 1986]    Terry, D. Distributed System Support for Voice in Cedar, Proc. of Second European SIGOPS Workshop on Distributed 
Systems, August 1986.
[Thomas et al 1985]    Thomas, R., Forsdick, H., Crowley, T., Schaaf, R., Tomlinson, R., Travers, V., Robertson, G. Diamond: A 
Multimedia Message System Built Upon a Distributed Architecture. IEEE Computer, December 1985, 65-77.
[Xerox 1981]    Xerox Corporation.  An Internetwork Architecture. XSIS 028112, Xerox Corporation, Stamford, Conn., December 1981.
[Zellweger 1986]    Zellweger, P.  Scripted Documents.  To appear.