Distribution of Intelligence in the Etherphone System
A day in the life of an Etherphone user

Daniel C. Swinehart
Computer Science Laboratory
Xerox Palo Alto Research Center

Abstract: The Etherphone system is an experimental facility to explore methods for extending existing electronic office environments with the facilities needed to handle the transmission, storage, and manipulation of voice. Users control some of the Etherphone system's capabilities from any telephone, but more sophisticated functions are obtained through applications running on an associated workstation. Knowledge of the users' identities and user preference specifications form the basis for a number of capabilities not usually available in telephone systems. Several examples illustrate the kinds of facilities that exist or are planned for the Etherphone system. The system has been designed to support the ready implementation of this kind of activity. Most telephone options and other voice services can be implemented on a workstation in the Etherphone environment. Those that prove particularly useful may migrate to a server (not necessarily the server), in order to increase their availability from stand-alone telephones and from a wide range of operating environments and workstation types. For server-based functions, permanent user-specified options can be stored in a database. We conclude that these methods form a flexible basis for experimentation with voice in an integrated personal information environment.

1. Introduction

The Etherphone system is an experimental facility that has been developed at PARC to explore methods for extending existing electronic office environments with the facilities needed to handle the transmission, storage, and manipulation of voice. It is intended for exploration of a wide range of telephony, voice mail, voice annotation, and other multi-media document applications. It begins with the office environment and incorporates voice, in contrast to the more common opposite approach.
We hope that the change in point of view will lead to novel approaches. (On the other hand, the system still has to be familiar to use for basic functions.)

From the start, although we've had to build a lot of specific things, the emphasis has been not on specific choices of switching, transmission, terminal equipment, hardware or network architectures, but on functionality. More specifically, we aim to provide a flexible environment for programming novel applications, in a fully-integrated way, across a wide range of voice applications: telephony, recorded voice and music, and synthesized voice and music. The methods should extend to cover speech transcription and video recording, switching, and transmission. We aim also to produce voice facilities that serve people well, rising above the traditions that preserve methods that are no longer technically mandated. Finally, the idea was to produce not only applications but also programmable facilities that would allow developers to include sophisticated voice applications without having to redevelop all the underlying facilities.

Others have investigated Ethernet telephony and packet voice transmission protocols [Sinkoskie, Ades], high-level methods for specifying telephone and voice functions [Olympic system, Ruiz 1985, Phoan, Herman et al 1986], recorded voice applications [Sydis, diamond, arpa, Maxemchuk], and some products with telephone management [Sydis, Meridian, Panasonic]. Media Lab researchers have had remarkable results in exploring intelligent interfaces to the telephone, the answering machine, and the office system in general [Schmandt and Arons 1984, Intelligent desktop]. Only now are other experimental testbeds emerging to explore the range of user applications.
[Herman et al 1986, Ades]

We've written a lot about the goals of the system, its hardware and software organization from a systems standpoint [GlobeCom '83, IEEE], some specific software methodologies intended to handle voice [ManagingVoice], and user interfaces for less-interactive functions -- document editing and annotation, scripted documents [Ades/Swinehart, Zellweger 1987]. Here the intent is to demonstrate how this system is used to improve the nature of the telephone in the office setting. The emphasis is primarily on telephone control and management functions, and only secondarily on recorded voice applications, for which see ibid. Exposition will be by presenting a collection of typical office situations and the way the Etherphone system deals with them. A short implementation section outlines the general approach to development and execution of these novel applications.

2. User interface

User Model

The user has a multipurpose workstation (such as a 6085; performance requirements are not beyond a Mac or PC) connected to an internetwork. All examples in this paper show a Xerox Dorado workstation running the Cedar environment. Associated with each workstation is a conventional telephone handset and a hands-free telephone terminal, whose speaker is used in place of a ringer. Functions in the network include an experimental PBX (under our control) for managing voice connections, with standard access to other internal phones and outside lines, plus a variety of services such as a service for storing and playing back digitally-recorded voice and a voice synthesizer.

Scenarios

We want to give the reader a good picture of the present and potential capabilities of the Etherphone system, through a series of descriptive examples. This is just a sampler. The system is of course capable of a wider range of functions than there is space to mention. We will follow the activities of a peripatetic professional, Karmen Foozle, during part of a very busy day.
Karmen spends a part of her day in the office, the rest visiting colleagues, attending meetings, and so on. She has frequent dealings with people over the telephone, and many of them have schedules as frantic as hers is.

Call forwarding is replaced by "Visiting"

As the day begins, Karmen is visiting Lee Strum in his office three doors down from her own. Heard above the sounds of the office is a simple melody that Karmen recognizes as her personal ring tune, or motif. Shortly thereafter, Kim Bagley's motif joins in, in counterpoint to Karmen's tune. So that Karmen need not rush down the hallway to answer Kim's call, Lee turns to his workstation and registers Karmen as a visitor (Figure xx). Immediately, Lee's own telephone repeats the ring-duet and Karmen lifts the receiver with a friendly "Hi, Kim, what's up?" During Karmen's meeting with Lee, three additional calls find her in a similar way, and Lee answers one that rings with his own motif. An additional call to Karmen after she has returned to her office reminds Lee to terminate the visiting arrangement (Figure xx). Had the visit to Lee's office been a scheduled meeting, Karmen's appointments calendar would have made these arrangements automatically unless the appointment had been annotated with "no calls".

Many flexible call-filtering methods are available

Karmen uses the morning hours for activities requiring intense concentration, which frequent telephone calls would disrupt. Yesterday, she modified her personal profile to reject all normal-urgency calls altogether for several hours (Figure xx). For the remainder of the morning, internal callers were given a visual explanation for being turned away, while outside callers were routed to an attendant. (We do not consider existing automated answering facilities, even those as sophisticated as the Phone Slave, sufficiently advanced to provide adequate service in the office setting.)
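The rejection policy in this scenario can be pictured as a small decision function. The following Python sketch is illustrative only: the names Urgency, Profile and dispose, the three urgency levels, and the returned strings are all assumptions made for the example, not part of the Etherphone software.

```python
from dataclasses import dataclass
from enum import IntEnum


class Urgency(IntEnum):
    """Hypothetical urgency levels; the text implies at least these three."""
    SOCIAL = 0
    NORMAL = 1
    URGENT = 2


@dataclass
class Profile:
    """A user's personal profile, reduced to one filtering option."""
    min_urgency: Urgency = Urgency.SOCIAL  # default: accept every call


def dispose(profile: Profile, urgency: Urgency, internal_caller: bool) -> str:
    """Decide what happens to an incoming call under the callee's profile."""
    if urgency >= profile.min_urgency:
        return "ring"
    if internal_caller:
        # Internal callers have workstations, so they can be shown a reason.
        return "show explanation to caller"
    # Outside callers have only a telephone, so a person takes the call.
    return "route to attendant"
```

With a profile of Profile(min_urgency=Urgency.URGENT), a normal-urgency internal call yields "show explanation to caller" and a normal-urgency outside call yields "route to attendant", matching the behavior described for Karmen's morning.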
Today, Karmen simply wants to reduce the impact of the telephone without ignoring it altogether. She again changes her personal profile (Figure xx) to request subdued ringing. When Frances Brodsky calls, Karmen hears only a single, short, soft tone. On her screen appears the usual visual information: Frances' name and an indication that it's neither extremely urgent nor merely a social call. If she were to ignore the call completely, it would be handled as if she were not there at all. Frances would have the options of speaking to an attendant, leaving a voice message, scheduling a later call, or simply trying again later. This time, Karmen indicates (Figure xx) to Frances that she'll take the interruption if it's important enough. Frances, after a moment's thought, decides that it is and reissues the call, at a higher priority backed up by a short subject string (Figure xx). Karmen answers, and they agree to meet to resolve a budget crisis later in the day.

Calls are to individuals, not locations

After lunch, Karmen takes advantage of Brett Kornwald's temporary absence to spend an hour preparing a presentation at Brett's full-color workstation. Logging in tells the telephone system where Karmen is. For the duration of Karmen's occupancy, although Brett's incoming calls still ring there, so do Karmen's. Outgoing calls are identified as hers; in fact, the workstation and associated telephone behave in all ways as if they belonged to Karmen.

Background calls and recorded voice support distributed meetings

For their mid-afternoon budget discussion, Karmen and Frances remain in their own offices, connected by a background telephone call. Both of them place and receive several calls to other parties during this meeting, sometimes adding them to the background call, sometimes superseding it. The system knows enough about the background connection to reestablish it, in a speakerphone configuration, whenever the standard behavior would otherwise be to hang up the phone.
Throughout the meeting, the participants have full access to their paper documents, workstation documents, policy manuals, etc. Using teleconferencing capabilities such as [Colab] or [Lantz1], they can view a shared budget document and discuss changes as they make them. Karmen and Frances create, edit, and attach voice annotations to specific locations in this document [Ades/Swinehart], combining these annotations into a narrative intended to convince their manager that several proposed new projects, although expensive, will be well worth it (Figure xx). When the manager later plays this list of annotations, the corresponding budget document locations will be brought into view on his workstation, in synchrony with the audio presentation [Zellweger].

A negotiated approach to conference call establishment saves time and annoyance

Following the budget discussion, Karmen wants to advise her own team of the outcome. She turns to her workstation and composes a request to set up a conference telephone call, to take place ten minutes later. The call will involve Karmen, Lee, Kim, Pat Fisher, and Frances (Figure xx). Karmen selected their names from a hierarchical telephone/personnel directory that merges Karmen's personal entries with organizational and community directories. Each participant, or the participant's workstation agent, is consulted to determine availability. Pat's calendar system accepts on his behalf, since it knows that he will be in his office, accepting visitors, and not otherwise engaged. Similarly, Frances already knows about the call, and has indicated a willingness to participate. The others must be consulted. Reacting to a distinctive melody on their telephones, Kim and Lee independently consult their workstations to learn of the proposed call. Each indicates whether they will be available (Figure xx: Kim can make it; Lee cannot).
Meanwhile, Karmen engages in other activities, undisturbed until the scheduled time arrives, when another distinctive tone advises all parties that the conference connection has been established among all their telephones. As the call begins, Karmen has the "floor": hers is the only voice that will be transmitted to the other conference participants. After her opening introduction, she grants the floor to each of the other participants in turn for a reaction to the budget proposal, then opens the conference up for the remainder of the call to full interactive discussion. [Stodowsky ref and comment -- get one, also Lantz for computer conferencing]. During the conversation, she lets all the participants see and hear the narrated document that she and Frances prepared earlier. A copy of the document itself appears on each workstation, and sections appear, as before, in synchrony with the narration.

The Etherphone can behave as a broadcast medium

Despite her best intentions, Karmen finds herself with so much paperwork to clean up at the end of the day that she cannot attend the late afternoon in-service training session. Instead, she places a call to the training room (Figure xx), which adds her speakerphone to an ongoing conference call carrying the audio portion of the meeting. During the meeting she mostly listens, occasionally obtaining the floor in order to ask a question or clarify a point. Using similar methods, Karmen has access to television and radio broadcasts, shared recorded audio files, etc.

The lot of the telephone attendant can also be made a happier one

After Karmen has left for the day, George Geargrinder, a salesman from another firm, telephones her. Since Karmen has signed off from her workstation, the call rings immediately at an attendant's workstation. As the attendant converses with Mr.
Geargrinder, she uses the voice annotation capabilities [Ades/Swinehart] to record into a form that has appeared on her workstation his answers to questions such as his name, the reason for his call, and when he might be reached. The attendant also enters his name and telephone number textually. Having assured the caller that the message will be delivered, the attendant sends the message, which is already addressed to Karmen. It is waiting for Karmen in her electronic mailbox the next morning, along with her conventional text-only mail. Having listened to him explain why he wishes to speak with her, Karmen uses the handy "return call" button on the message border to call him back. Voice mail from internal callers may include their photographs to identify them quickly, in place of the evocative sketch that accompanies the message from George Geargrinder.

Readers familiar with the Phone Slave system [Schmandt and Arons 1984] will realize that the Geargrinder call follows closely the Phone Slave's automated scenario. Impressive as that system is, we ... comment from earlier, and therefore have proposed enhancing the role of the attendant rather than eliminating it.

3. Implementation

Hardware Environment

Voice facilities are provided by a rather unconventional hardware architecture. In place of a conventional PABX, voice is transmitted over an internetwork of Ethernets and telephone lines as digital datagrams. Digitization and transmission are achieved by microprocessors connecting the user's telephone hardware to the Ethernet. Server machines on the internet provide intelligence for individual telephones when used stand-alone, basic switching control, coordination and association of workstations and adjacent telephones, voice recording services, voice synthesis services, etc. Switching is achieved by controlling the addressing of packets in the network; conferencing, by merging multiple Ethernet datagram sequences at the receiving site. All control is also through datagram-based protocols on the internetwork.
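As a rough illustration of conferencing by merging datagram sequences at the receiving site, the Python sketch below sums the oldest queued packet from each sender. It rests on several assumptions not stated in this paper: 16-bit linear samples, a fixed (and unrealistically short) packet length, and in-order arrival. The class name ConferenceReceiver and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List

SAMPLES_PER_PACKET = 4                   # hypothetical; real packets are longer
MAX_SAMPLE, MIN_SAMPLE = 32767, -32768   # assumes 16-bit linear samples


class ConferenceReceiver:
    """Merges voice datagrams from several senders at the receiving site."""

    def __init__(self) -> None:
        # One queue of pending packets per sending party.
        self.queues: Dict[str, Deque[List[int]]] = {}

    def receive(self, source: str, samples: List[int]) -> None:
        """Queue one datagram's worth of samples from one sender."""
        self.queues.setdefault(source, deque()).append(samples)

    def next_output(self) -> List[int]:
        """Sum the oldest packet from each sender, clipping to sample range."""
        mixed = [0] * SAMPLES_PER_PACKET
        for queue in self.queues.values():
            if not queue:
                continue  # this sender is silent or its packet is late
            packet = queue.popleft()
            for i, sample in enumerate(packet[:SAMPLES_PER_PACKET]):
                mixed[i] += sample
        return [max(MIN_SAMPLE, min(MAX_SAMPLE, s)) for s in mixed]
```

Because the merge happens at each receiver, no central mixing device is needed in this sketch; each listener simply receives a separate datagram sequence from every other speaker.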
See Figure xx and [GlobeCom '83, IEEE].

Use of distributed control/intelligence to implement, aid in development of user interface examples

We will not talk about the facilities required to do transmission [summary in GlobeCom '83, future publication], recording, editing and annotation [Ades/Swinehart, ManagingVoice], or even the detailed control methods [future publication]. We will instead outline (a) the object-oriented "conversation management" design that supports the incorporation of workstations, recording service, synthesizer, ... and provides program control; (b) the basic capabilities provided for extensibility of services (object style), and control of conversation-establishment/termination decisions; (c) use of a database to assist with migration of facilities to server, and why we do that anyhow; (d) perhaps something about reliance on other distributed intelligences to make the system tractable to build by two or three folks. This is not an exhaustive description of Etherphone software by any means, just indicative of the approach used to get flexibility, ext...y, and goodies.

a. Object oriented conversation model

The basic facility of Etherphone software is a facility for creating, maintaining, and describing the states of voice connections (called conversations). Objects called parties represent participants in conversations, either for purposes of supplying/absorbing actual voice transmissions, control, or both. Different object classes represent Etherphone processors, telephone trunks to the outside world, channels to the voice file service or to the synthesizers, or workstation control programs that support the kinds of applications we saw in Section 2.
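One way to picture the party classes just listed is the following Python sketch. All class, field, and method names are assumptions chosen for illustration, as is the choice of which classes carry voice and which exert control; the Etherphone software itself was not written in Python and its actual interfaces are not shown in this paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Party:
    """A participant in a conversation (illustrative names throughout)."""
    name: str
    carries_voice: bool = True     # supplies or absorbs voice transmissions
    exerts_control: bool = False   # takes part in controlling the conversation


@dataclass
class EtherphoneParty(Party):
    """An Etherphone processor: assumed here to carry voice and exert control."""
    exerts_control: bool = True


class TrunkParty(Party):
    """A telephone trunk to the outside world."""


class RecordingServiceParty(Party):
    """A channel to the voice file service."""


class SynthesizerParty(Party):
    """A channel to a voice synthesizer."""


@dataclass
class WorkstationParty(Party):
    """A workstation control program: control only, no voice."""
    carries_voice: bool = False
    exerts_control: bool = True


@dataclass
class Conversation:
    """A voice connection among any number of parties."""
    parties: List[Party] = field(default_factory=list)

    def add(self, party: Party) -> None:
        self.parties.append(party)

    def voice_parties(self) -> List[Party]:
        """Parties that supply or absorb voice."""
        return [p for p in self.parties if p.carries_voice]

    def controllers(self) -> List[Party]:
        """Parties that take part in control."""
        return [p for p in self.parties if p.exerts_control]
```

Treating a workstation as one more class of party, rather than as a special case, is what lets new kinds of participants join a conversation without changes to the conversation object itself.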
Figure xx(a) shows the system's representation of a connection between two stand-alone Etherphones; Figure xx(b) shows a more elaborate configuration, involving several Etherphones, with associated workstations, in a conference arrangement that also includes a channel to the recording service so that a narrated document can be played to all the participants. Notice the special role of workstation parties as objects that extend and alter the behavior of associated Etherphones [system 2]. This figure corresponds to a conference call that takes place while Karmen Foozle is using Brett Kornwald's workstation. The system prefers the identity of a workstation party to that of the associated Etherphone, so the telephone behaves as if it were Karmen's. Notice also the Visiting association between Kim Bagley and Karmen. Apparently Kim is visiting Karmen in Brett's office, so that calls to Kim will ring there.

At its heart, the system kernel provides programming facilities that permit the registration and departure of new parties, the creation and termination of conversations, invitation of parties to join conversations (usually resulting in ringing actions), and the advance of parties through the various conversation states (dial tone, ringback, actively conversing, etc.). In addition, the kernel distributes informational reports to all participants when significant events occur. The basic system also includes the implementations of object classes representing Etherphone instruments, telephone trunks, the recording service and the voice synthesis service.

b. Facilities for service extension, conversation state decisions

The user interfaces that Karmen and her friends used, along with many of the more elaborate behaviors, were implemented not in a server but on the individual workstations. Registration sets up associations like the one shown in the figure.
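The kernel facilities and workstation registration described above might be sketched as follows in Python. This is a simplification under stated assumptions: the Kernel class and its method names are hypothetical, only dial tone, ringback and active conversation are states named in the text, and reports go to every registered listener rather than only to the participants of one conversation.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class State(Enum):
    """Dial tone, ringback and active come from the text; the rest are assumed."""
    IDLE = "idle"
    DIAL_TONE = "dial tone"
    RINGING = "ringing"
    RINGBACK = "ringback"
    ACTIVE = "actively conversing"


Report = Tuple[str, State]            # (party name, state it just entered)
Listener = Callable[[Report], None]


class Kernel:
    """Hypothetical stand-in for the conversation-management kernel."""

    def __init__(self) -> None:
        self.states: Dict[str, State] = {}
        self.listeners: Dict[str, Listener] = {}
        self.workstation_user: Dict[str, str] = {}  # Etherphone -> logged-in user

    def register(self, party: str, listener: Optional[Listener] = None) -> None:
        """Admit a party; a listener, if given, receives informational reports."""
        self.states[party] = State.IDLE
        if listener is not None:
            self.listeners[party] = listener

    def depart(self, party: str) -> None:
        """Remove a party and stop sending it reports."""
        self.states.pop(party, None)
        self.listeners.pop(party, None)

    def associate_workstation(self, etherphone: str, user: str) -> None:
        """Record that a user's workstation party is attached to an Etherphone."""
        self.workstation_user[etherphone] = user

    def identity(self, etherphone: str) -> str:
        """Prefer the workstation party's identity to the Etherphone's own."""
        return self.workstation_user.get(etherphone, etherphone)

    def advance(self, party: str, new_state: State) -> None:
        """Move a party to a new state and report the event to every listener."""
        self.states[party] = new_state
        for listener in list(self.listeners.values()):
            listener((party, new_state))

    def invite(self, caller: str, callee: str) -> None:
        """Invite the callee to converse: caller hears ringback, callee rings."""
        self.advance(caller, State.RINGBACK)
        self.advance(callee, State.RINGING)

    def answer(self, caller: str, callee: str) -> None:
        """Both parties become active once the callee answers."""
        self.advance(callee, State.ACTIVE)
        self.advance(caller, State.ACTIVE)
```

In this sketch, a workstation that registers a listener sees every state change as a (party, state) report, which is the information a workstation program would need to display call status on the screen.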
Subsequently the implementation of the workstation party object can participate in the switching control actions and receive the informational reports. Thus the full status of every call can be reported to users, and calls can be initiated from the workstation. To permit such modifications as call filtering (no ringing), workstation implementations can instruct telephone implementations, through entries in a database maintained by the kernel, to abstain from the normal ringing behavior in response to incoming calls. Inhibition of "ringback" while placing complex conference calls, and a variety of other workstation extensions, are supported in a similar way. Timeout-based safeguards cause a return to conventional telephone behavior if the workstation does not respond to these real-time events in a timely manner.

c. Value of migration of facilities to servers; use of databases to facilitate

An important theme in the Etherphone is the development of enhanced telephone applications as programs that reside on workstations, then the later migration to shared servers of facilities that prove particularly useful. Examples of facilities that have already made this transition are the organizational telephone directory, and the package that manages the editing of recorded voice [ref]. A function that would benefit from this kind of migration is the capability to filter incoming calls based on caller name, subject, or urgency.

The advantages of development on the workstation are obvious: there is no need to interrupt service, rebuild a complicated program, or risk incorrect behavior for other users while applications are under development. Turnaround for changes is rapid, debugging facilities are extensive, the programmer needn't leave his own terminal, and so on.

The advantages of migrating well-understood facilities to a server are less obvious -- they were to some extent a surprise to us. Migration makes facilities usable directly on stand-alone telephones or when workstations are not active.
It reduces the performance required of the workstation and its software. Most importantly, it reduces the complexity of the software that must be provided by the workstation when developing voice interface applications for a new programming environment or workstation type. (Examples: Viewpoint, Interlisp, Microsoft Windows.)

For applications such as custom-tailored call filtering that should continue to operate when the workstation is not functioning, we need a way to record the user's specifications. In instances like this, we've found that it is a simple matter to record the specifications in the same kernel-owned database that records the workstation's intentions to participate in the call-progress behavior.

d. Reliance on environment to make job easie(r)

Development of a system like the Etherphone has been a tractable task for a small team only because the development was done in a rich, powerful internetwork environment. File management, database management, communications protocols, authentication services, storage management, and process scheduling all combine to reduce the effort required. Similar statements can be made about program development tools: editors, compilers, debuggers, monitoring tools -- in an integrated system that runs along with the applications on both workstations and servers. The existence of extensible applications such as the multi-media editor, message system, and window package also helps, since these serve as vehicles for the applications we've described.

4. Conclusions

Most of the capabilities outlined in Section 2's collection of vignettes either exist today in the Etherphone system, or could be readily achieved by applications programs developed on workstations using existing Etherphone services.
Other capabilities, including the Geargrinder episode and some advanced conversation management ideas (such as the ability to hold private side conversations while participating in a larger conference) would require some additional low-level development, but no modifications to the overall architecture.

The paradigm of a rich distributed environment full of services and clients, with logical connections expressed in high-level languages in a full point-to-point network independent of physical architecture, extends well to voice, functionally. Many of the user facilities are unmatched in existing voice products. The basic design that allows participation of many object classes, both centrally-located and distributed, has provided a flexible environment.

A disappointment has been the complexities that still present themselves to the workstation programmer who wants to have full access to the control facilities -- it's not a no-brainer. Another is the degree to which new areas need a lot of system-level work (like side-conversations). Extending these facilities to capture the full set of basic underlying voice facilities in a sensibly-presented voice architecture is a challenge for the future.

Acknowledgments

References

[Sinkoskie] [Find a reference for the BellCore packet telephones. In Phoan stuff somewhere?]
[Ades] [Use Stephen's thesis or its best reference for the best access to references to other packet voice telephony, also other stuff.]
[Herman et al 1986] Herman, G., Ordun, M., Riley, C., and Woodbury, L. The Modular Integrated Communications Environment (MICE): a system for prototyping and evaluating communications services. Bell Communications Research, 435 South Street, Morristown NJ 07960. To appear.
[Sydis] R. Nicholson. Integrating voice in the office world. BYTE 8(12):177-184, December 1983.
[Meridian] [Something about NT's Meridian system -- was there a user interface paper?]
[Panasonic] [Literature about the phone sold at Macy's?]
[Phoan] [DeTreville's Phoan system]
[OlympicSystem] [Reference for the programming environment for the Olympic messaging system.]
[arpa] J. K. Reynolds, J. B. Postel, A. R. Katz, G. G. Finn, and A. L. DeSchon. The DARPA experimental multimedia mail system. Computer 18(10):82-89, October 1985.
[diamond] R. H. Thomas, H. C. Forsdick, T. R. Crowley, R. W. Schaaf, R. S. Tomlinson, V. M. Travers, and G. G. Robertson. Diamond: A multimedia message system built on a distributed architecture. Computer 18(12):65-78, December 1985.
[Maxemchuk] H. Wilder and N. Maxemchuk. Virtual Editing II: the User Interface. Proceedings of the SIGOA Conference on Office Automation Systems, Philadelphia, Penn., 1982.
[Schmandt and Arons 1984] Schmandt, C. and Arons, B. Phone Slave: A Graphical Telecommunications Interface. Proc. Society for Information Display 1984 International Symposium, June 1984.
[Intelligent Desktop] Schmandt, C. and Arons, [Intelligent desktop reference]
[GlobeCom '83] D. C. Swinehart, L. C. Stewart, and S. M. Ornstein. Adding voice to an office computer network. Proceedings IEEE GlobeCom '83, November 1983. Also available as Xerox Palo Alto Research Center, Technical Report CSL-83-8, February 1984.
[IEEE] D. C. Swinehart, D. B. Terry, and P. T. Zellweger. An experimental environment for voice system development. IEEE Office Knowledge Engineering Newsletter, February 1987.
[Ades/Swinehart] S. Ades and D. C. Swinehart. Voice annotation and editing in a workstation environment. Proceedings AVIOS Voice Applications '86, September 1986, pages 13-28.
[Zellweger 1987] Zellweger, P. Scripted Documents. To appear.
[Colab] Stefik et al. Comm. ACM Colab reference.
[Lantz1] Teleconferencing system reference.
[Lantz2] Floor control reference.
[Stodowsky] Floor control reference.
[ManagingVoice] D. B. Terry and D. C. Swinehart. Managing Stored Voice in the Etherphone System. To appear.
[Ruiz 1985] Ruiz, A. Voice and telephony applications for the office workstation. Proc.
1st International Conference on Computer Workstations, San Jose, CA, November 1985, 158-163.

A paper on Distributed control (user and programmer) for GlobeCom '87, due March 1987
Copyright 1987 by Xerox Corporation. All rights reserved.
Draft last edited by Dan Swinehart, March 14, 1987 5:35:52 pm PST