Voice Proposal
Inter-Office Memorandum
To: File
Date: August 20, 1981
From: S. M. Ornstein, L. Stewart, D. Swinehart
Location: Palo Alto
Subject: Voice Project Proposal
Organization: CSL
XEROX
Filed on: <Audio>Doc>VoiceProposal.bravo
ABSTRACT
This is only the first part of the Voice Proposal. More sections will become available later.
This memo presents what we think is an appropriate voice project for CSL, discusses visions of new functionality that might come into our lives as a result of the endeavor, describes the enabling architecture as we see it, and then discusses a plan for setting out.
INTRODUCTION
Why voice? We see two domains. First, in the real-time world, our information management skills can give us improved control over voice communications (taming the telephone). Second, when we can integrate voice with our other endeavors (voice as data), we can add a new dimension to all those activities. By ``voice’’, we do not mean ``audio’’, as in music, and we do not mean ``speech’’, as in speech synthesis or speech recognition. We do mean the integration of telephone service and the recording and retrieval of stored voice.
We have concluded that the most useful way to begin adding voice to our systems is to provide a new and better, Ethernet-based, form of telephone service. It seems likely that we can make genuine improvements in our own lives; and, in the process, we will build those fundamental tools for handling voice from which other extensions will grow naturally.
Our eventual goal is to build fully integrated systems that include voice as naturally as our systems now include text. Within about two years, we plan to provide every member of CSL with an Ethernet telephone, or Etherphone.
The Etherphone system will transmit voice and control information over the Ethernet rather than over conventional phone wires. The Etherphones themselves will serve as simple terminals (Ethernet peripherals), without much intelligence of their own. Users’ workstations will provide enhanced user interfaces. A Voice File Server will enable storage and retrieval of voice. An Etherphone server will control the collection of Etherphones and generally manage the system. A gateway function will connect the Etherphone system to the public switched telephone network.
Visions for the future
We do not necessarily advocate all of the features we can envision. In truth, we do not know exactly what we will find useful and elegant. The provision of plain-old-telephone-service is a complex enterprise; we may be able to do a better or less expensive job of it than has been done, but that alone is not sufficient reason for us to undertake a large-scale project. We have marshalled our thoughts and come up with a series of visions of problems as they exist today and of ways in which an Etherphone system could improve our lives. We expect that the design of particular functionality (integrated systems) will be a continuing challenge long after a sufficient number of capabilities have been built.
Most of our proposed improvements in ``telephone’’ service have to do with getting phone participants more smoothly into direct voice communication -- or circumventing (supplementing) the need for such direct communication through voice messages. We argue that the telephone provides reasonable communication once you are talking to the other person. It is in the process of negotiating the establishment of a conversation between the participants that the phone system is most inadequate and annoying. For principals, a personal secretary mediates both incoming and outgoing calls and thereby takes over most of the burden. We hope that we can provide some of this same relief for others. We also believe that voice in messages and documents will find application far beyond simple telephony.
We must be careful, because the conventions of phone use are deeply established. Although people are often annoyed by the present arrangements, they also tend to be quite conservative and to resent changes in the system. Our proposals could cause substantial shifts of burden between callee and caller -- as will become evident below.
The reader must keep in mind that there are problems for which we would like (and could use) solutions but which we do not intend to try to solve. In particular, we do not plan to address issues of artificial intelligence (speech understanding), speech recognition (single words), or even speech synthesis from text. Obviously such capabilities could greatly enhance our systems, but we lack the resources for such work and further, when such capabilities become available, we will be able to fairly easily incorporate them into our systems.
We present here six visions for the future -- ways we think that an integrated environment including the telephone system can improve our lives.
1. BETTER PLACEMENT AND RECEIPT OF CALLS
The caller’s plight
Often one does not know and cannot find the desired number. The phone book cannot be found or is out of date. Sometimes when looking for a service rather than an individual, one may not even know what to look for. The organization of a directory often does not match one’s needs: ``Well, his first name is Tom.’’
Even after a number is found, one’s call attempts are often stymied. It is simply too much work to dial and re-dial a phone. The number is misdialled, or the line is busy, or no one answers. (A modern day variant is ``please hold on, all of our agents are busy.’’) The call may be answered, but the person sought may not be available. The call may be answered by an inflexible answering machine.
These problems occur in subtle combination! The call is not answered, but it was the wrong number anyway.
Solutions
Most of the difficulties of finding the right number can be solved by placing calls by name rather than number and by use of appropriate data bases (e.g. white pages, yellow pages). These matters have little to do with telephones, but we can smooth the interface of data base systems and telephone systems by automatically placing calls (autodialling) rather than requiring the user to copy the number from the data base terminal to the telephone.
We can also provide ``call placement’’ services at a level well above simple number translation and automatic dialling. We could construct a calendar system incorporating a schedule of calls, and note taking and time accounting facilities. A lawyer might be interested in such a system.
The problems that arise when the desired party is not reached are both harder to solve and more closely tied to the telephone itself. We will have to distinguish between calls placed within our new system and those calls placed to points outside the system. For internal calls, the calling station will be able to distinguish between busy and no-answer, and perhaps could know who answers. The calling party could then choose to leave a message, to try again later, or to try somewhere else. Handling all these cases for outside calls would require signal processing and speech understanding beyond the scope of our intentions; we plan to let the caller listen in and decide what should happen next.
The callee’s plight
People are often busy with tasks more important than handling a particular call. If only it were possible to tell who is calling, what the subject is, and how long the call will take. We live in a social environment in which answering the telephone has unduly high priority. In addition, one is often out ``for just a moment’’ and misses calls. Answering machines seem too impersonal and the receptionist too remote. In essence, we would like to provide everyone with the same services provided to those who have personal secretaries.
Solutions
Let your telephone help field calls. For arriving inside calls, one’s telephone can know who is calling. We will develop ways of letting the caller indicate the relative importance of calls and the general subject area. We will provide a call filter, instructions to one’s telephone of the flavor: ``No calls for half an hour, except if it’s Bob or if Fred calls about the budget.’’ If the caller really wants to get through, we can provide an override button. This negotiation process can even proceed if you are already using the phone for something else.
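The call filter described above can be illustrated with a small sketch. It assumes, purely for illustration, that an inside call carries the caller’s identity, an optional subject area, and an override flag; the names Call, CallFilter, and admits are hypothetical and do not describe any planned Etherphone interface. A call the filter rejects would, in this reading, leave the caller free to leave a message or press override.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class Call:
    caller: str                    # identity known for calls arriving from inside the system
    subject: Optional[str] = None  # subject area the caller indicated, if any
    override: bool = False         # caller pressed the (hypothetical) override button


@dataclass
class CallFilter:
    until: float  # time, in seconds, at which the filter lapses
    always_admit: Set[str] = field(default_factory=set)
    admit_on_subject: Set[Tuple[str, str]] = field(default_factory=set)

    def admits(self, call: Call, now: float) -> bool:
        """Return True if the phone should ring for this call."""
        if now >= self.until:
            return True   # filter has lapsed; ring as usual
        if call.override:
            return True   # the caller insists on getting through
        if call.caller in self.always_admit:
            return True   # e.g. Bob, on any subject
        if call.subject is None:
            return False
        return (call.caller, call.subject) in self.admit_on_subject


# "No calls for half an hour, except if it's Bob or if Fred calls about the budget."
f = CallFilter(until=30 * 60, always_admit={"Bob"}, admit_on_subject={("Fred", "budget")})
print(f.admits(Call("Fred", "budget"), now=600))  # True
print(f.admits(Call("Fred", "lunch"), now=600))   # False
```

Keeping the filter as plain data rather than code is one way to let a workstation display it and let its owner change it from anywhere.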
These kinds of functions will be much harder to provide for calls arriving from outside the system. An expert calling in from outside will be able to use pushbutton phone signals to engage in negotiations. For other callers, the filter will operate as for internal calls, but will not directly support negotiations. Instead, callers will be forwarded to an attendant who can act as their agent.
It is clear that new burdens will fall upon the caller, if only to judge the importance of one’s call. One is used to barriers when one makes a call to or receives one from a person with a secretary, but otherwise, as a caller, one is used to getting through and may resent buffering by a non-human mechanism. Although we can try to soften the resentment by making the mechanisms as polite as possible, there seems no way to get around the fact that we are redressing an age-old imbalance -- the caller has traditionally had the ability to interrupt the callee.
2. BETTER MANAGEMENT OF FANCY FEATURES
Advanced ``features’’ of traditional telephone systems are too hard to use. Such features fall into two classes: those used separately from use of the phone, and those used during calls. Forwarding is a good example of a facility used while not making a call. It is too easy to forward one’s phone and then forget about it. One often receives forwarded calls intended for someone else. One often forgets to forward one’s phone before leaving the office. Features of the second class, those used during a conversation, include setting up a conference call (including a third person in an existing conversation), and call-waiting (receiving another call while you are already engaged in conversation):
``I have one person on hold, want to form a conference, but the second callee doesn’t answer. Let’s see now . . . flash, pause, . . . now what do I do next . . .’’
``How exciting to be part of the first interstellar phone call! [beep] Uh oh, I have another call, could you wait a moment, your excellency? [flash] Hello? A year’s supply of what? No, I’m not interested! [slam] oops. . . ’’
Solutions
We will use our workstations to control the telephone, bringing all of our user interface expertise to bear. We will clearly indicate the state of the system: filters, forwarding, calls on hold. We will provide reasonable user interfaces for controlling features. It will be possible to enter control commands, under password control, from anywhere in the system, not just from one’s office. Such capabilities will allow much better control over as seemingly simple a matter as call forwarding.
While we do not necessarily advocate any particular feature, a computer-controlled telephone system could also provide such features as paging, intercom, personalized dial tones, personalized ringing, and so on. Although we don’t know exactly how to implement it, suppose everyone wore a tiny device (a Locator) which broadcast his or her identity locally to the nearest telephone. Such a mechanism would allow the system to recognize when you were near your phone. Thus it could stop trying to set up a call for you after you left your office. Furthermore, as recipient of your calls, your phone, sensing your Locator, could change its answering strategy when you left your office. It could begin to answer calls with ``He’s out; please leave a message’’. Alternatively, with a more sophisticated system, the phone nearest wherever you were could ring with a ``signature tune’’, assuming your profile indicates that’s what you wanted.
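As a rough sketch of how the Locator could change answering strategy, assume (hypothetically) that the system keeps a table of which phone last heard each person’s Locator, plus a per-user profile flag for the ``signature tune’’ preference. The function route_incoming and the Profile fields are illustrative names only; the memo itself leaves the mechanism open.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Profile:
    follow_me_tune: bool  # hypothetical preference: ring the nearest phone with a signature tune


def route_incoming(owner: str, home_phone: str,
                   sightings: Dict[str, str], profile: Profile) -> str:
    """Decide how to handle a call for `owner`.

    `sightings` maps a person to the phone whose Locator receiver last
    heard them (an assumed data structure, not a specified one).
    """
    where = sightings.get(owner)
    if where == home_phone:
        return f"ring {home_phone}"                    # owner is in the office
    if where is not None and profile.follow_me_tune:
        return f"ring {where} with signature tune"     # follow the owner
    # Owner is away (or not seen anywhere): answer on their behalf.
    return f"answer at {home_phone}: He's out; please leave a message"
```

The same sightings table would also tell the system when to stop trying to set up a call for someone who has left the office.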
There is an implication that telephones not associated with nearby workstations will need better user interfaces. The telephone, extended in the ways we envision, is too difficult to control with only twelve buttons and the switchhook. A larger keypad or keyboard and a one line display seem appropriate.
3. BETTER HUMAN-ASSISTED CALL MANAGEMENT
This section discusses possible solutions for the telephone problems associated with attendants and receptionists. The kinds of problems that arise are usually due to insufficient information presented to the attendant, too great a workload, and poor communications between the attendant and potential callees.
The caller’s plight
This situation is not too bad for a person calling into a good manual attendant system. A light may indicate exactly who is being called. The attendant may know the callee is out and answer immediately. In more ``advanced’’ systems -- which are usually bought for reasons of cost-reduction rather than improved convenience -- the attendant doesn’t know who is being called, the phone rings for a long time before anyone answers, and one is always put on hold. (These problems are particularly acute when a call is forwarded several times before the guard at the door finally answers -- and he is not equipped to deal with phone calls at all!) The situation is better in some systems: there may be a distinguished console which can display the called number, but usually there can only be one such console per installation. In CSL, we have a key system for the laboratory ``on top’’ of the PARC Centrex telephone system.
The attendant’s plight
With many calls in different states at once, calls are easily lost or mishandled. Since the call director or receptionist’s console is a special hardwired device, the attendant load cannot be split or transferred. Usually messages are transcribed and filed on paper.
The callee’s plight
Messages don’t always get through. If they do, they are usually late. Because one has to explicitly check for messages, it is hard to know when a message has arrived. Messages are too short: ``Please call back Fred.’’ By when? About what?
Solutions
We will implement attendant features as extensions to the standard telephone/workstation functions. We will identify the original callee, by name as well as number. We will use a well designed user interface to indicate the status of multiple calls. Potential callees will be able to leave general or individual messages for potential callers: ``If Fred calls, tell him I’m at home.’’
Because attendant facilities can be implemented with a standard telephone/workstation setup, the attendant’s duties can be easily transferred (including calls in progress!) or shared. Messages for callees will be voice messages instead of handwritten notes.
We should add that we have not necessarily invented many new features. Modern PABX systems can handle these kinds of functions, but often the facilities are only available at the main console. Subgroups within an organization do not have good access to the facilities, and in any event, the functions are not well integrated with the office environment.
4. VOICE MESSAGES
Voice messages are a quite general facility. The facility might better be called ``storage, retrieval, and editing of segments of voice.’’
The Curse of the Dictating Machine
. . . . .wonder if this will make the goddamn thing work . . . testing . . . testing . . one, two, three Dear Mr. Brown colon paragraph I wonder if you have any idea how much trouble your blasted firm no leave out the blasted has caused our no better make that we have had a good deal of trouble difficulty with the by the way send a copy of this to Peter too with the bobble sockets you have been making for our ouija boards they ouch (click) easy honey tend to fall out . . . . .
The Curse of the Answering Machine
Answering machines, which today are relatively inexpensive and quite powerful, ought to solve many of the problems that we have identified. Using answering machines, it is now possible to convey and exchange information even when both parties are not available at the same time. One can also use them as remote dictating machines, as crude call-filtering mechanisms, and so on. Use of an answering device can help redress the callee/caller imbalance.
But in practice, especially in an internal office situation, anyone who has encountered an answering machine instead of the intended callee knows how unsuitable a conversant such a device is; it is one of the more intimidating of modern inventions. Its non-interactive nature essentially demands a well-constructed, composed message rather than an informal communication, and few of us are able to produce such a composition in real time.
Solutions
Digital recording, editing, filing, dissemination, and playback of segments of voice permit us both to construct more reasonable versions of existing facilities and to construct entirely new facilities.
For users within the system, the caller, not the callee, will make the decision to leave a recorded message -- or to take some other action -- when an attempted call fails. This removes much of the intimidation. The caller will also be able to thoughtfully compose a message by editing and review before the message is delivered. Finally, our existing message systems can be used to identify and organize voice messages for the callee’s benefit.
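One way composing by editing could work is sketched below, under an assumption the memo does not make: that a message is represented as a list of pieces, each referring to an interval of a stored recording, so that an edit rewrites the list and never copies or alters the recorded voice. Piece and delete_span are hypothetical names, and this piece-list technique is our illustration rather than the proposed design.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Piece:
    recording: str  # hypothetical ID of a stored recording on the voice file server
    start_ms: int   # start of the interval within that recording
    end_ms: int     # end (exclusive) of the interval

    @property
    def length(self) -> int:
        return self.end_ms - self.start_ms


def delete_span(pieces: List[Piece], cut_start: int, cut_end: int) -> List[Piece]:
    """Remove message time [cut_start, cut_end) without touching stored audio."""
    result: List[Piece] = []
    t = 0  # message time at which the current piece begins
    for p in pieces:
        p_start, p_end = t, t + p.length
        if p_start < cut_start:  # keep the part of p before the cut
            keep_end = min(p_end, cut_start)
            result.append(Piece(p.recording, p.start_ms, p.start_ms + (keep_end - p_start)))
        if p_end > cut_end:      # keep the part of p after the cut
            keep_start = max(p_start, cut_end)
            result.append(Piece(p.recording, p.start_ms + (keep_start - p_start), p.end_ms))
        t = p_end
    return result


# Drop "your blasted firm" (say, 2.0 s to 3.0 s) from a 10-second take.
take = [Piece("take1", 0, 10_000)]
edited = delete_span(take, 2_000, 3_000)
```

Because the stored recording is never modified, the caller can review the edited message and undo an edit before delivering it.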
For outside callers, we will not rely solely on automatic facilities, for the reasons given above. (Callers who understand our automatic systems may be able to log in using tone signalling, then to behave as if they were internal users.) Rather, we envision providing facilities that will allow human attendants to handle incoming calls better and more efficiently than do current systems. Callers from outside the system, after they are routed to the attendant, will be able to leave quite complex messages without the problems of paper transcription. Inside callers will also have the option of speaking to an attendant when problems arise.
As an example, consider the following hypothetical fragment of conversation between an outside caller and an attendant (this scenario is a liberally revised version extracted from a 1975 memo by Ed McCreight):
. . . .
Receptionist: ``Ms. Foozle is not in the office. May I give her a message?’’
Caller: ``Yes,
[Receptionist buttons Telephone Message. A Laurel message form appears, with a text header addressed to Foozle; the following interchange is recorded and associated with that form]
would you tell her that George Geargrinder called, and ask her to call me back at area code 408, 555-8024?’’
Receptionist: ``All right, Mr. Geargrinder, I’ll make sure she gets the message.’’
[Receptionist (optionally enters a From: field and a subject field, then) buttons Deliver.]
5. ANNOTATED DOCUMENTS
By annotated document, we mean a more or less ordinary text document with voice messages attached to particular words, sentences, or paragraphs. Envision an executive making comments on a proposal, or a playwright expressing the voice tone of a particular line. We want to make voice a full partner with text and with graphics.
Begin with an ``ordinary’’ document, for example a Tioga document including text, graphics, and other structure. Select a point of interest and record some comments. A distinctive icon marks the location of a comment. Selecting that icon causes the playback of the associated voice.
It will be possible to obtain a new window permitting one to edit or amend a comment. We might be able to provide sufficient digital signal processing to play back a comment at a higher or lower speed.
6. TELE-TIOGA
Fully integrated systems, mixing text, voice, and imaginal material on an equal footing, will present both unforeseen opportunities and unforeseen problems. One possibility is ``document-level telephony,’’ wherein two or more participants use their workstations and their telephones to collaborate interactively on the form and content of a document. Both participants see the same display and discuss their work over a conference call. This area is not strictly a voice application, since no special voice capabilities are required. It is just one of the many possibilities for systems combining the capabilities of computers, large displays, and digital and voice communications.