Draft paper Distribution of Etherphone Control submitted to GlobeCom '87, due March 1987
Copyright © 1987 by Xerox Corporation. All rights reserved.
Draft last edited by Dan Swinehart, March 17, 1987 9:09:16 am PST
Distribution of Control in the Etherphone System
A day in the life of an Etherphone user
Daniel C. Swinehart

Computer Science Laboratory
Xerox Palo Alto Research Center
--- draft; do not distribute widely ---
Abstract: The Etherphone system provides an experimental environment to explore methods for including the transmission, storage, and manipulation of voice in existing electronic office systems. Some Etherphone capabilities are available from any telephone, while more sophisticated functions are obtained through applications running on an associated workstation. Knowledge of the user's identity and preferences forms the basis for capabilities not usually available in telephone systems. Several examples illustrate the kinds of features that exist or are planned for the Etherphone system.
Etherphones have been designed to support the ready implementation of additional applications. Most enhanced telephone options and other voice services can be entirely implemented in the workstation environment. Those that prove particularly useful may migrate to a server in order to increase their availability from stand-alone telephones and from a wide range of operating system environments and workstation types. For server-based functions, permanent user-specified options can be stored in a database. These methods form a flexible basis for experimentation with voice in an integrated personal information environment.
1. Introduction
The Etherphone system is an experimental facility that has been developed at the Xerox Palo Alto Research Center. Its objective is to explore methods for extending existing electronic office environments with the facilities needed to handle the digital transmission, storage, and manipulation of voice. The system has been designed to permit the exploration of a wide range of telephony, voice mail, voice annotation, and other multi-media document applications, while preserving simple conventional telephone behavior as a subset of its capabilities.
From the beginning, although the project has required the construction of a complete system, our emphasis has been not on the particular choices of switching or voice transmission methods, terminal equipment, or network architectures, but on functionality. More specifically, it has been our aim to provide a flexible environment for novel applications, fully integrated into the office workstation environment, across a wide range of voice applications: telephony, recorded voice and music, synthesized voice and music. Our overall approach should extend to include speech transcription and the control of video recording, switching, and transmission, although we have not explored these areas.
Finally, our intent was to produce not only applications but also programmable facilities that would allow other developers to include sophisticated voice functions in their own applications, without having to redevelop all of the underlying facilities.
This is an area of active research, and we mention some of the more relevant activities here. Many others have investigated the challenges of transmitting voice as datagrams on local area networks [4, 1, 5, 24]. Some notable approaches to high-level methods for programming telephone and voice functions include work by Ruiz [13], Richards et al. [12], DeTreville [3], and Herman et al. [6]. Advanced applications of recorded voice and document annotation have been extensively reported [8, 23, 11, 25]. A few products containing advanced telephone management and integrated voice capabilities are beginning to appear [8, 9, 10]. The Speech Research Group at MIT's Media Laboratory has produced remarkable results in its explorations of intelligent voice-directed interfaces to telephones, answering machines, and office systems in general [15, 16]. However, only recently have other broad-based experimental testbeds for office voice applications begun to emerge, most notably the MICE system [6] and the Island project [1].
Earlier publications on the Etherphone system have emphasized the system objectives, the overall hardware and software organization [19, 21], the specific software methodologies used to handle recorded voice [22], and the user interface designs that support voice annotation and editing [2] and scripted documents [29]. In this report, our intent is to demonstrate how the Etherphone system can be used to improve the nature of the telephone in the office setting. The emphasis is primarily on telephone control and management functions, referring to the annotation and editing capabilities only as they relate to these functions. The exposition now continues with the presentation of a series of typical office situations, and the way the Etherphone system deals with them. A brief implementation section outlines the general approach to development and execution of these applications.
2. A day in the life of an Etherphone user
2.1 User Model
In the scenarios that follow, we assume that each user's desk is equipped with a multipurpose workstation, such as a Xerox 6085 [28], or an Apple Macintosh. All the examples in this paper are taken from an implementation on a Xerox Dorado workstation [27] running the Cedar programming environment [20]. Each workstation is connected to a communication network. Associated with each workstation is a conventional telephone handset and keypad, and a hands-free speakerphone, whose speaker is used in place of a ringer. The network functions include an experimental PABX for managing voice connections: to other internal phones, to outside lines, and to a variety of services, including a service for storing and playing back digitally-recorded voice, and a text-to-speech synthesizer.
2.2 Scenarios
In this section, we wish to give the reader an understanding of the present and potential capabilities of the Etherphone system, through a series of descriptive examples. This is not an exhaustive enumeration of the Etherphone's features, but is intended to evoke its look, feel, and general approach to integration.
We will follow the activities of a peripatetic professional, Karmen Foozle, through a very busy day. Karmen spends a part of her day in the office, the rest visiting colleagues, attending meetings, and so on. She has frequent dealings with people over the telephone, and many of them have schedules as frantic as hers is.
2.2.1 Call forwarding is replaced by "Visiting"
As the day begins, Karmen is visiting Lee Strum in his office three doors from her own. Heard above the sounds of the office is a simple melody that Karmen recognizes as her personal ring tune, or motif. A few seconds later, Kim Bagley's motif joins in, in counterpoint to Karmen's tune. So that Karmen need not rush down the hallway to answer Kim's call, Lee turns to his workstation and registers Karmen as a visitor (Figure xx). Immediately, Lee's own telephone repeats the ring-duet, Lee's workstation presents a description of the call (Figure xx), and Karmen lifts the receiver with a friendly "Hi, Kim, what's up?" During Karmen's meeting with Lee, two additional calls find her in a similar way, and Lee also answers one that rings with his own motif. An additional call to Karmen after she has returned to her office reminds Lee to terminate the visiting arrangement (Figure xx). Had the visit to Lee's office been a scheduled meeting, Karmen's appointments calendar would have made and cancelled these arrangements automatically.
2.2.2 Many flexible call-filtering methods are available
Karmen uses the morning hours for activities requiring intense concentration that the disturbance of frequent telephone calls would disrupt. Yesterday, she modified her personal profile to reject all normal-urgency calls altogether for several hours (Figure xx). For the remainder of that morning, internal callers were given a visual explanation for being turned away, while outside callers were routed to an attendant.
Today, Karmen simply wants to reduce the annoyance level of the telephone without ignoring it altogether. She again changes her personal profile (Figure xx) to request subdued ringing. When Frances Brodsky calls, Karmen hears only a single, short, soft tone. On her screen appears the usual visual information: Frances' name and an indication that the call is neither extremely urgent nor merely social. If Karmen were to ignore the call completely, it would be handled as if she were not there at all: Frances would have the standard options of speaking to an attendant, composing and sending a voice message, negotiating a later call with Frances' calendar, or simply trying again later. This time, Karmen indicates (Figure xx) to Frances that she'll take the interruption if it's important enough. Frances, after a moment's thought, decides that it is and reissues the call at a higher priority, backed up by a short statement of the subject of the call (Figure xx). Karmen answers, and they agree to meet later in the day to resolve a budget crisis.
2.2.3 Calls are to individuals, not locations
After lunch, Karmen takes advantage of Brett Kornwald's temporary absence to spend an hour preparing a presentation at Brett's full-color workstation. Logging in tells the telephone system where Karmen is. For the duration of Karmen's occupancy, although Brett's incoming calls still ring there, so do Karmen's. Outgoing calls are identified as hers; in fact, the workstation and associated telephone behave in all ways as if they belonged to Karmen.
2.2.4 Background calls and recorded voice support distributed meetings
For their mid-afternoon budget discussion, Karmen and Frances remain in their own offices, connected by a background telephone call. Both of them place and receive several calls to other parties during this meeting, sometimes adding them to the background call, sometimes superseding it. The system knows enough about the background connection to reestablish it, in a speakerphone configuration, whenever the standard behavior would otherwise be to hang up the phone. By remaining in their own offices, the participants have full access to their paper documents, workstation documents, policy manuals, etc. Using teleconferencing capabilities such as those described by Sarin [14], Lantz [7], or Stefik et al. [17], they can view a shared budget document, discussing changes as they make them. Karmen and Frances create, edit, and attach voice annotations to specific locations in this document [2], combining these annotations into a narrative intended to convince their manager that several proposed new projects, although expensive, will be well worth it (Figure xx). When the manager later plays this list of annotations, the corresponding budget document locations will be brought into view on his workstation, in synchrony with the audio presentation [29].
2.2.5 A negotiated approach to conference call establishment saves time and annoyance
Following the budget discussion, Karmen wants to advise her own team of the outcome. She turns to her workstation and composes a request to set up a conference telephone call, to take place ten minutes later. The call will involve Karmen, Lee, Kim, Pat Fisher, and Frances. Karmen selected their names from a hierarchical telephone/personnel directory that merges Karmen's personal entries with organizational and community directories (Figure xx). Each participant or his workstation agent is consulted to determine if they will be available. Pat's calendar system accepts on his behalf, since it knows that he will be in his office, accepting visitors, and not otherwise engaged. Similarly, Frances already knows about the call, and has indicated a willingness to participate. The others must be consulted. Reacting to a distinctive melody on their telephones, Kim and Lee independently consult their workstations to learn of the proposed call. Each of them indicates whether he or she will be available (Figure xx: Kim can make it; Lee cannot). Meanwhile, Karmen engages in other activities, undisturbed until the scheduled time arrives, when another distinctive tone advises all parties that the conference connection has been established among all their telephones.
As the call begins, Karmen has the "floor": hers is the only voice that will be transmitted to the other conference participants. After her opening introduction, she grants the floor to each of the other participants in turn for a reaction to the budget proposal, then opens the conference up for the remainder of the call to full interactive discussion [18, 7]. During the conversation, she lets all the participants see and hear the narrated document that she and Frances prepared earlier. A copy of the document itself appears on each workstation, and sections appear, as before, in synchrony with the narration.
2.2.6 The Etherphone can behave as a broadcast medium
Despite her best intentions, Karmen finds herself with so much paperwork to clean up at the end of the day that she cannot attend the late afternoon in-service training session. Instead, she places a call to the training room (Figure xx), which adds her speakerphone to an ongoing conference call carrying the audio portion of the meeting. During the meeting she mostly listens, occasionally obtaining the floor in order to ask a question or clarify a point. Using similar methods, Karmen has access to television and radio broadcasts, shared recorded audio files, etc.
2.2.7 The lot of the telephone attendant can also be made a happier one
After Karmen has left for the day, George Geargrinder, a salesman from another firm, telephones her. Since Karmen has signed off from her workstation, the call rings immediately at an attendant's workstation. As the attendant converses with Mr. Geargrinder, she uses the voice annotation capabilities [2] to record into a form that has appeared on her workstation his answers to questions such as his name, the reason for his call, and when he might be reached. The attendant also enters his name and telephone number textually. Having assured the caller that the message will be delivered, the attendant sends the message, which is already addressed to Karmen. It is waiting for her in her electronic mailbox the next morning, along with her conventional text-only mail. Having listened to him tell her why he wishes to speak with her, she uses the handy "return call" button on the message border to call him back. Voice mail from internal callers may include their photographs to identify them quickly, in place of the evocative sketch that accompanies the message from George Geargrinder.
Readers familiar with the Phone Slave system [15] will realize that the Geargrinder call follows closely the Phone Slave's automated scenario. We do not consider existing automated answering facilities, even those that are as advanced as the Phone Slave, sufficiently advanced to provide adequate service in the office setting, and therefore have proposed enhancing the role of the attendant rather than eliminating it.
3. Implementation
3.1 Hardware environment
The Etherphone voice facilities are provided by a rather unconventional hardware architecture. In place of a conventional PABX, voice is transmitted as digital datagrams over an internetwork of Ethernets [26] connected to each other by telephone lines. Digital to analog conversion and datagram transmission are performed by microprocessors that connect the user's telephone hardware to the Ethernet. Server machines connected to the network provide:
· control of voice conversation establishment and switching;
· coordination of workstations and adjacent telephones in cooperatively controlling voice activities;
· intelligence for the stand-alone behavior of telephone instruments;
· basic switching control, coordination and association of workstations and adjacent telephones;
· voice recording services, voice synthesis services, other services.
Voice switching is accomplished by controlling the addressing of packets among Etherphones and services. Etherphones achieve full-duplex conferencing by digitally merging multiple Ethernet datagram sequences, transmitted by the other participants, before relaying the results to the user in analog form. All control is also through datagram-based protocols in the internetwork. Figure HA is a diagram of the hardware architecture, which has been described more fully elsewhere [19, 21].
Figure HA. Etherphone system components.
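The digital merging of datagram sequences described in this section can be illustrated by a minimal sketch. It is not the Etherphone implementation: it assumes that each packet has already been decoded to 16-bit linear samples, that one packet per remote participant is available for the same time slice, and that overflow is handled by simple clipping. The name mix_packets and the sample range are hypothetical.

```python
from typing import List, Sequence

# Assumed sample format: 16-bit signed linear PCM (an assumption, not from the paper).
SAMPLE_MIN, SAMPLE_MAX = -32768, 32767


def mix_packets(packets: Sequence[Sequence[int]]) -> List[int]:
    """Merge one packet of samples from each remote participant.

    Samples are summed position by position and clipped to the assumed
    16-bit range. A packet shorter than the longest one contributes
    silence past its end.
    """
    if not packets:
        return []
    length = max(len(p) for p in packets)
    mixed: List[int] = []
    for i in range(length):
        total = sum(p[i] for p in packets if i < len(p))
        mixed.append(max(SAMPLE_MIN, min(SAMPLE_MAX, total)))
    return mixed
```

In this sketch the local user's own packet is simply left out of the input, so each participant hears everyone but himself or herself.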
3.2 The role of distributed control
In this section we describe the role of distributed control and intelligence in the development and implementation of the user interface examples of Section 2. We will not discuss further the transmission, recording, editing, and annotation facilities of the Etherphone system [19, 2, 22]. Here we will outline:
· the object-oriented "conversation management" design that supports the incorporation of workstations and voice services, and that provides control of the voice system;
· the basic capabilities provided for the extensibility of services, and client programmer control of conversation-establishment and termination decisions;
· the use of a database to assist with the migration of functions from the workstation to servers, and the reasons for wanting to do it.
3.2.1 Object-oriented conversation model
The central facility of the Etherphone software is a system kernel that manages the creation, maintenance, and description of successive states of voice connections (called conversations). Objects denoted as parties represent participants in conversations, for the purpose of either participating in actual voice transmissions, controlling the conversations, or both. Different object classes represent different party types: Etherphone processors, telephone trunks to the public network, channels to the voice file service or to the text-to-speech synthesizers, or workstation control programs supporting the kinds of applications we saw in Section 2. Figure CNV(a) shows the system's representation of a connection between two stand-alone Etherphones; Figure CNV(b) shows a more elaborate configuration, involving several Etherphones, with associated workstations, in a conference arrangement that also includes a channel to the recording service (so that a narrated document can be played to all the participants). Notice the special role of workstation parties as objects that extend and alter the behavior of associated Etherphones [system 2]. This figure corresponds to a conference call that might have taken place while Karmen Foozle was using Brett Kornwald's workstation (Section 2.2.3). The system prefers the identity of a workstation party to that of the associated Etherphone, so the telephone behaves as if it were Karmen's. Notice also the Visiting association between Kim Bagley and Karmen. Apparently, at the time represented by this figure, Kim was visiting Karmen in Brett's office, so that calls to Kim would also ring there.
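As a rough illustration of how the login and Visiting associations might determine where a call rings, consider the following sketch. The function ringing_phones and its three tables are hypothetical, and the sketch assumes that a user's own telephone continues to ring while the user is logged in or visiting elsewhere; the system's actual rules may differ.

```python
from typing import Dict, List


def ringing_phones(
    callee: str,
    home_phone: Dict[str, str],    # user -> telephone in the user's own office
    logged_in_at: Dict[str, str],  # user -> telephone beside the workstation in use
    visiting: Dict[str, str],      # visitor -> host being visited
) -> List[str]:
    """Return the telephones that ring for a call to `callee` (no duplicates)."""
    phones: List[str] = []

    def add(phone: str) -> None:
        if phone not in phones:
            phones.append(phone)

    add(home_phone[callee])
    if callee in logged_in_at:
        add(logged_in_at[callee])
    host = visiting.get(callee)
    if host is not None:
        # The visitor's calls follow the host to wherever the host currently is.
        add(logged_in_at.get(host, home_phone[host]))
    return phones
```

With Karmen logged in at Brett's workstation and Kim registered as visiting Karmen, a call to Kim would ring both Kim's own telephone and Brett's, which matches the situation described for Figure CNV(b).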
At its heart, the system kernel provides programming facilities for
· the registration and departure of new parties;
· the creation and termination of conversations;
· invitation of parties to join conversations (usually resulting in ringing actions);
· the advance of parties through the various conversation states (dial tone, ringback, actively conversing, etc.);
· the distribution of informational reports to all participants when significant events occur (for example, when any party leaves a conversation by hanging up).
The basic system also includes the implementations of object classes representing Etherphone instruments, telephone trunks, the recording service and the voice synthesis service.
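The kernel facilities listed above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the real kernel is written in Cedar and reached through network protocols, while the class names, the three-state model, and the method names below are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class State(Enum):
    IDLE = "idle"        # has left the conversation (hung up)
    RINGING = "ringing"  # invited, not yet answered
    ACTIVE = "active"    # actively conversing


@dataclass
class Party:
    name: str
    kind: str  # e.g. "etherphone", "trunk", "recorder", "workstation"
    reports: List[Tuple[str, str, str]] = field(default_factory=list)

    def report(self, conv_id: str, who: str, event: str) -> None:
        """Receive an informational report about a conversation."""
        self.reports.append((conv_id, who, event))


class Kernel:
    def __init__(self) -> None:
        self.parties: Dict[str, Party] = {}
        self.conversations: Dict[str, Dict[str, State]] = {}
        self._next_id = 0

    def register(self, party: Party) -> None:
        self.parties[party.name] = party

    def depart(self, name: str) -> None:
        """Remove a party, first taking it out of any conversations."""
        for conv_id, members in list(self.conversations.items()):
            if name in members:
                self.advance(conv_id, name, State.IDLE)
        del self.parties[name]

    def create_conversation(self) -> str:
        self._next_id += 1
        conv_id = f"conv-{self._next_id}"
        self.conversations[conv_id] = {}
        return conv_id

    def invite(self, conv_id: str, name: str) -> None:
        """Inviting a party normally results in a ringing action."""
        self.advance(conv_id, name, State.RINGING)

    def advance(self, conv_id: str, name: str, state: State) -> None:
        """Move a party to a new state and report the event to all participants."""
        members = self.conversations[conv_id]
        members[name] = state
        for member in list(members):
            self.parties[member].report(conv_id, name, state.value)
        if state is State.IDLE:
            del members[name]
            if not members:  # the last party has left: terminate the conversation
                del self.conversations[conv_id]
```

A workstation party in this sketch is just another Party whose report method would drive a display; that is the sense in which workstation programs can follow the full status of every call.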
3.2.2 Facilities permitting workstation participation in conversation management
The user interfaces that Karmen and her friends used, along with many of the more elaborate behaviors of Section 2, were implemented not in a server but on the individual workstations. Registration of a workstation sets up associations like the one shown in Figure xx. Subsequently, the implementation of the workstation party object can participate in the switching control actions and receive the informational reports. Thus, the full status of every call can be reported to users, and calls can be initiated from the workstation. To permit such modifications as call filtering (no ringing), workstation implementations can instruct the telephone party object implementation, through entries in a database maintained by the kernel, to abstain from the normal ringing behavior in response to incoming calls. Inhibition of "ringback" while placing complex conference calls, and a variety of other workstation extensions, are supported in a similar way. Timeout-based safeguards cause a return to conventional telephone behavior if the workstation does not respond to these real-time events in a timely manner.
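The interplay between the database entry that inhibits ringing and the timeout safeguard might look roughly like the following. This is a sketch under stated assumptions, not the system's code: the database is modelled as a dictionary, the key layout and the timeout value are invented for illustration, and ask_workstation stands for whatever mechanism delivers the workstation's decision, returning None when no reply arrives in time.

```python
from typing import Callable, Dict, Optional, Tuple

RING_TIMEOUT_SECONDS = 2.0  # assumed value; the paper does not give a figure


def incoming_call_action(
    database: Dict[Tuple[str, str], str],
    phone: str,
    ask_workstation: Callable[[float], Optional[str]],
) -> str:
    """Decide what the telephone party does when a call arrives at `phone`."""
    if database.get((phone, "ringing")) != "inhibit":
        return "ring"  # no workstation override: conventional behavior
    decision = ask_workstation(RING_TIMEOUT_SECONDS)
    if decision is None:
        return "ring"  # workstation did not answer in time: fall back
    return decision    # e.g. "subdued-tone" or "reject"
```

The point of the design is that the telephone never depends on the workstation for basic service: an absent entry and a silent workstation both lead to ordinary ringing.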
3.2.3 The migration of facilities from workstations to servers
An important theme in the Etherphone system is the development of enhanced telephone applications as programs that reside on workstations, then the later migration to shared servers of facilities that prove particularly useful. Examples of facilities that have already made this transition are a telephone directory package and the package that manages the editing and storage of recorded voice [2]. A function that would benefit from this kind of migration, but which at present is implemented in workstation code, is the ability to filter incoming calls based on a caller's name, a call's subject, or a call's urgency.
The advantages of development on the workstation are obvious, among them: it can be accomplished without interrupting service to install new software; it better insulates other users from the ill effects of undebugged software; it permits rapid turnaround for changes, extensive debugging facilities, development at one's own terminal, and so on. The advantages to migrating well-understood facilities to reside on a server are perhaps less obvious; in fact, some of them came as a surprise to us. Server residency makes some facilities usable directly through the telephone touchpad, for telephones that are not located near workstations, or whose workstations are not active. It may reduce the performance requirements of the workstation. Most importantly, it reduces the complexity of software that must be redeveloped for a new programming environment or workstation type.
However, migration to servers also introduces new problems. For applications, such as custom-tailored call filtering, that ought to continue to operate when the workstation is not functioning, a method is needed to record the user's specifications. In cases like this, we have found that it is a simple matter to record the specifications in the kernel-maintained database mentioned in the previous section. The normal migration path is to the server that implements the kernel system, but this is not required, since all of the kernel's facilities are available through network communications.
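To make the idea of recording filtering specifications in the database concrete, here is one possible sketch. Everything in it is an assumption made for illustration: the rule format, the three-level urgency scale, the key naming, and the use of JSON as the stored form. Rules are tried in order and the first matching rule decides the action.

```python
import json
from typing import Dict, List

# Assumed urgency scale, lowest to highest.
URGENCY = {"social": 0, "normal": 1, "urgent": 2}


def store_filter(database: Dict[str, str], user: str, rules: List[dict]) -> None:
    """Record the user's filter rules so a server can apply them later."""
    database[f"{user}/call-filter"] = json.dumps(rules)


def filter_call(database: Dict[str, str], user: str,
                caller: str, subject: str, urgency: str) -> str:
    """Return the action for an incoming call; the first matching rule wins."""
    raw = database.get(f"{user}/call-filter")
    if raw is None:
        return "ring"  # no specification recorded: conventional behavior
    for rule in json.loads(raw):
        if "caller" in rule and rule["caller"] != caller:
            continue
        if ("subject_contains" in rule
                and rule["subject_contains"].lower() not in subject.lower()):
            continue
        if "min_urgency" in rule and URGENCY[urgency] < URGENCY[rule["min_urgency"]]:
            continue
        return rule["action"]
    return "ring"
```

Because the rules are plain data held in the database rather than code on the workstation, the same filter keeps working when the user's workstation is switched off.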
4. Conclusions
Most of the capabilities implied by the vignettes of Section 2 either exist today in the Etherphone system, or could be readily achieved by applications programs developed on workstations using existing Etherphone services. Other capabilities, including the Geargrinder episode and some advanced conversation management ideas (such as the ability to hold private side conversations while participating in a larger conference) would require some additional low-level development, but no modifications to the overall architecture.
The distributed systems paradigm of a rich environment of services and their clients, connected through logical paths within a general-purpose internetwork, extends well to voice. Many of the user facilities of the Etherphone system are unmatched in existing voice products. We believe this is due in part to the system design that permits participation of many object classes, both centrally-located and distributed, which provides a flexible environment for development. One disappointment has been the difficulty of providing a sufficiently straightforward interface to the applications programmer who needs to have full access to the control facilities. At present, the applications programmer must choose between either very simple and limited capabilities or an excessively complex programming task. Extending these facilities to capture the full set of basic underlying voice functions in a sensible voice architecture is a challenge for future work.
Acknowledgments
References
[1] S. Ades. An Architecture for Integrated Services on the Local Area Network. Ph.D. Thesis, Cambridge University, February 1987.
[2] S. Ades and D. C. Swinehart. "Voice annotation and editing in a workstation environment," Proceedings AVIOS Voice Applications '86, September 1986, pages 13-28.
[3] J. DeTreville. "Phoan: An Intelligent System for Distributed Control Synthesis." Publication information as yet unknown.
[4] J. DeTreville and W. David Sincoskie. "A Distributed Experimental Communications System." IEEE Journal on Selected Areas in Communications SAC-1(6):1070-1075, December 1983.
[5] T. A. Gonsalves. "Packet-Voice Communication on an Ethernet Local Computer Network: an Experimental Study." Proceedings SIGCOMM 1983 Symposium on Communications Architectures and Protocols, Austin TX, March 1983, pages 178-185.
[6] G. Herman, M. Ordun, C. Riley, and L. Woodbury. "The Modular Integrated Communications Environment (MICE): a system for prototyping and evaluating communications services." Bell Communications Research, 435 South Street, Morristown NJ 07960. To appear.
[7] K. Lantz. "An Experiment in Integrated Multimedia Conferencing." Proceedings Conference on Computer-Supported Cooperative Work, Austin, TX, December, 1986.
[8] R. Nicholson. "Integrating voice in the office world." BYTE 8(12):177-184, December 1983.
[9] [A reference to Northern Telecom's Meridian telephone is intended here.]
[10] [A reference to a Panasonic multi-featured telephone is intended here.]
[11] J. K. Reynolds, J. B. Postel, A. R. Katz, G. G. Finn, and A. L. DeSchon. "The DARPA experimental multimedia mail system." Computer 18(10):82-89, October 1985.
[12] J. T. Richards, S. J. Boies, and J. D. Gould. "Rapid Prototyping and System Development: Examination of an Interface Toolkit for Voice and Telephony Applications." Proceedings CHI'86 Conference on Human Factors in Computing Systems, Boston, Mass., April 1986, pages 216-220.
[13] A. Ruiz. "Voice and telephony applications for the office workstation." Proceedings 1st International Conference on Computer Workstations, San Jose, CA, November 1985, pages 158-163.
[14] S. K. Sarin. Interactive On-line Conferences. Ph.D. thesis, Massachusetts Institute of Technology, Report MIT/LCS/TR-330.
[15] C. Schmandt and B. Arons. "Phone Slave: A Graphical Telecommunications Interface." Proceedings Society for Information Display 1984 International Symposium, June 1984.
[16] C. Schmandt and B. Arons. "Voice Interaction in an Integrated Office and Telecommunications Environment." Proceedings 1985 Conference of American Voice Input/Output Society, October 1985.
[17] M. Stefik, G. Foster, D. Bobrow, K. Kahn, S. Lanning, and L. Suchman. "Beyond the Chalkboard: Computer Support for Collaboration and Problem Solving in Meetings." Comm. ACM 30(1):32-47, January 1987.
[18] [A reference to a paper by David Stodolsky on conference call floor control is intended here. It is this work that inspired the floor control notion in this paper.]
[19] D. C. Swinehart, L. C. Stewart, and S. M. Ornstein. "Adding voice to an office computer network." Proceedings IEEE GlobeCom '83, November 1983. Also available as Xerox Palo Alto Research Center, Technical Report CSL-83-8, February 1984.
[20] D. C. Swinehart, P. T. Zellweger, R. J. Beach, and R. B. Hagmann. "A structural view of the Cedar programming environment." ACM Transactions on Programming Languages and Systems 8(4):419-490, October 1986.
[21] D. C. Swinehart, D. B. Terry, and P. T. Zellweger. "An experimental environment for voice system development." IEEE Office Knowledge Engineering Newsletter, February 1987.
[22] D. B. Terry and D. C. Swinehart. "Managing Stored Voice in the Etherphone System." To appear.
[23] R. H. Thomas, H. C. Forsdick, T. R. Crowley, R. W. Schaaf, R. S. Tomlinson, V. M. Travers, and G. G. Robertson. "Diamond: A multimedia message system built on a distributed architecture." Computer 18(12):65-78, December 1985.
[24] F. A. Tobagi and N. Gonzalez-Cowley. "On CSMA-CD Networks and Voice Communication." Proceedings Infocom, 1982, pages 122-1278.
[25] H. Wilder and N. Maxemchuk. "Virtual Editing II: the User Interface," Proceedings of the SIGOA Conference on Office Automation Systems, Philadelphia, Penn. 1982.
[26] Xerox Corporation, Intel Corporation, Digital Equipment Corporation. The Ethernet: A Local Area Network; Data Link Layer and Physical Layer Specifications. Version 2.0, November 1982.
[27] Xerox Corporation. The Dorado: A high-performance personal computer. Three papers. Xerox Palo Alto Research Center, Report No. CSL-81-1, January 1981.
[28] [A reference describing the Xerox 6085 Professional workstation is intended here.]
[29] P. Zellweger. "Scripted Documents." To appear.