Inter-Office Memorandum
To: Thacker, McCreight, Baskett, Lampson, Rovner, Crowther
From: S. M. Ornstein, L. Stewart, D. Swinehart
Date: June 26, 1981
Location: Palo Alto
Subject: Audio Proposal
Organization: CSL
XEROX
Filed on: <Ornstein>AudioProposal.memo
ABSTRACT
INTRODUCTION
Background and Discussion: Three alternative approaches
1. Centrex
2. Control of a PBX
3. Etherphone
Discussion
VISIONARY OVERVIEW
New User Services
Benefits to the Callee
Benefits to the Caller
A Potential Benefit for Both - a Locator
Voice Messages
Integration with Regular Telephones
PLANS
Etherphone I
Requirements
1. Processor/Memory/Ethernet
2. Audio hardware/audio switching
3. User interface, buttons/lights/display
Proposal
Details of the Etherphone I
Hardware
Functions
Bit transport
Supervision
Making connections
Placing a call
Receiving a call
During a call
Enhancements
Call waiting
Call restrictions
Redialing and Speed Calling
Answering machines
Call forwarding
APPENDIX - THE MICROCOMPUTER BASED ETHERPHONE I
1. Hardware
2. Software
ABSTRACT
This memo presents what we think is an appropriate Audio game for CSL, discusses a vision of new functionality that might come into our lives as a result of the endeavor, describes the enabling architecture as we see it, and then discusses a plan for setting out.
INTRODUCTION
We have concluded that the most useful way to add "audio" into our systems is to start by trying to provide a new and better, Ethernet-based, form of telephone service. There are two reasons for this choice: first, it seems likely that we can make some genuine improvements in our lives; and second, in the process we will have to build those fundamental tools for voice handling (means of turning voice into bits and vice versa, packaging up, transmitting and storing these bits, etc.) from which other extensions into the world of audio will grow naturally.
One can separate three sub-areas of work: telephony, filing, and system integration. Telephony has to do with the transmission of voice data and with elementary control functions of placing calls. Filing has to do with the ability to store and retrieve voice messages from a file server on the Ethernet. Integration has to do with advanced control facilities such as use of a data base to store telephone numbers, and with the manipulation of voice data in cooperation with our other activities: voice Laurel messages, voice annotation of documents, etc. Integration depends upon the basic utilities provided by Telephony and Filing. We therefore plan to work on them first.
Background and Discussion: Three alternative approaches
1. Centrex
The current Centrex system provides a single direct dial number for each office and has a few value-added features, such as call-forwarding and an attendant console. (One can forward calls to another number and one’s phone transfers to the attendant automatically after three rings.)
Several years ago, a device called the Ross Box (after Bill Ross) was available. This device connected to the Diablo port of an Alto and permitted the machine to pick up and dial one’s office phone. In addition, an early version of the Alto audio board was used to build a simple voice message system. One could call a special number, tap out someone’s initials, and leave them a message. Put together, these components, or more modern versions thereof, provide quite a bit of functionality. Basically the Ross Box equivalent could provide a way of using all of our data base knowledge to look up phone numbers and dial them for us. (i.e. Laurel could provide an alternative for answering a voice message that included "Call Back".) The audio board device, in small numbers, provides a way of getting voice on and off the Ethernet, where our programs can work with it, file it, annotate with it, and so on. The A/D and D/A conversions are provided by a server, rather than by one’s own machine.
This system provides the capability for a number of really impressive systems: voice messages, voice annotation of documents, semi-automatic call placement, and so on. There are also some crippling disadvantages: our control of the operation of the voice transmission is limited and uncertain, and an important resource, one’s telephone, is tied up for extended periods. The second problem is really a consequence of the first. In the centrex system, the only control available over the telephone system is obtained by electrically picking up the office telephone and electronically generating beeps. This can be made to work, but is a slow and uncertain process. The progress of an attempted call is determined by the return of various noises over the voice path: ringing, dial-tone, busy, reorder, etc. It is quite difficult for a machine to sort out these noises; they cannot be ignored because calls do not always get through: a line is busy, the network fails, etc. In addition, the placing of a call requires several seconds: one or two to dial the call, perhaps one for the system to connect a local call, and as many as six seconds for ringing to be detected at the destination. This means that for applications such as annotation of a document, one’s office phone is effectively tied up. Callers will get a busy signal! Placing and taking down calls every few seconds to try to avoid this circumstance might make the phone company very unhappy. Telephone exchanges allocate common equipment based on human frequencies of call placement. Machine speeds would upset the phone company’s "traffic engineering." We might obtain "call-waiting" or similar functions from the phone company, but again, it would be hard for our machines to recognize the associated "beep" and almost as hard to handle the situation in a reasonable manner.
Of course we could buy a second telephone line for everyone but the speed and uncertainty of call placement would still be with us.
2. Control of a PBX
Under this scenario we would replace our present telephone system with a commercial PBX and use a computer to control the operation of an otherwise standard telephone switch. D/A and A/D conversion functions for manipulation of voice-as-data would be done by a server, and call placement would be done either by manually dialing one’s telephone or by having one’s workstation instruct the PBX (possibly through another server) to place a call.
In this system, while calls outside the PBX would still be slow and uncertain, calls inside would be very fast and we would have easy access to the state of the system. If we want to know if a number is busy, just (digitally) ask the PBX. It would be possible to connect to a server for just a few seconds to record a voice annotation, while still remaining open for incoming calls. We could instruct the switch to do just about anything, such as forwarding calls after leaving one’s office, rather than before. At base, these functions are available because the phone system would be entirely under our control. We would still require an adequate number of servers to meet our D/A and A/D needs.
The key disadvantage of this system is that we do not have such a controllable PBX (getting one would cost quite a bit), and before installing such a system, we would have to negotiate control of the switch with the vendor.
3. Etherphone
The basic premise of the Etherphone approach is that actual transmission of the voice data is done in digital form over the Ethernet. There are many variations, but the eventual system might provide each CSL member with a 20-40 chip microcomputer based telephone interfaced to the Ethernet. Connections to the outside telephone world would be done by servers with trunks to the phone company. This scenario has the advantage of complete control over the telephone transmission system. We benefit by the natural multiplexing of the ether and by direct access to voice-as-data. Control of the system is distributed; negotiation for a call might take place directly between the source and destination Etherphones.
The disadvantages of this system are the uncertainties of Ethernet voice (not too serious), and the major fact that a 20-40 chip Etherphone cannot be built at least until a single chip Ethernet controller is available.
Discussion
Well, where did the KTS proposal come from? It is a first step on the route to a full Etherphone system. We feel that the centrex option, control of the existing phone system, is unacceptable because it does not offer sufficient reliable functionality. We feel that the PBX route, control of a commercial telephone switch, is impracticable for us, because we do not have one. (However, someone ought to do it!) The third alternative, Etherphone, is difficult to pursue now because it is expensive to build an Etherphone today (although that will change).
The Ethernet KTS proposal is a combination of the centrex and Etherphone scenarios. By building a few expensive Etherphones now, we can give a few people all the benefits of the Etherphone and develop all the required protocols and work on applications while at the same time working towards our true goal of the 20-40 chip Etherphone for everyone. In addition, the KTS idea, wherein all the clients retain their original phone lines, avoids the problem that not everyone has an Etherphone. No one’s view of the phone system need change; the same 4-digit number still works, but for those with Etherphones, many value-added functions become available.
What about slight variations of this scheme?
One obvious way to avoid the expensive separate Etherphone is to place audio hardware in our workstations. This approach is probably fine for annotation of documents, but our workstations are not designed for 100% availability (You can’t get calls while you are in the debugger.) and they are not designed for real-time performance (Your call to your friend breaks up because the collector starts running.) Basically, if we want to use the system while a special program is running then workstation audio hardware is fine, but we can’t build a telephone system that way -- it has to work all the time.
One way to avoid using up Alto I’s is to construct stand-alone Etherphones out of commercial 16 bit microcomputers. At the present time, both the processor and Ethernet would be full boards, the audio hardware would be a few extra chips, and a fairly bulky power supply and cabinet would be needed. On top of that, we would probably not have a very good development environment. Our early efforts would be greatly diverted by hardware and software development struggles.
Providing workstation audio hardware may still be a good idea. The proposed auxiliary board for the D-machines could include simple audio hardware for a small additional cost. While users of such hardware would not have full integration of all audio functions, they would be able to use voice messages, annotation, and so on by running special programs. (I claim this is diversionary and therefore a bad idea - should we leave it out of this memo?)
VISIONARY OVERVIEW
Fig. 1 shows a block diagram of the sort of intermediate range system we are contemplating. The Etherphone is a "smart" telephone, containing a computer with a modest keyboard, a one line LCD, ethernet interface, bell, etc. It digitizes its analog data and ships it (as well as call set-up control information) over the ethernet - instead of over the usual phone wires. Typically the receiver will be another Etherphone, participating in a call set-up or an on-going conversation. In the latter case the receiving Etherphone simply reconverts the digitized voice (from the ethernet) back to analog form and feeds it to the handset. Sometimes, as discussed below, voice traffic will go to or come from an Audio File Server which embodies special real-time capabilities necessary for servicing voice messages.
Because telephone service must continue to be provided even during times when one’s workstation is otherwise occupied, and because we don’t believe in the "always-ready-to-help" workstation (at least not for now), the Etherphone is logically free-standing; i.e. it will provide the new, improved phone services described below without any help from the user’s workstation. Certain functions, however, such as name-lookup, assistance in managing (negotiating) connections, etc. may be provided by a separate Etherphone Server. The exact division of labor between the Etherphones and the Etherphone Server is not yet clear; the purpose of the server is to allow us to keep down the size of the Etherphone itself by sharing low duty-cycle functions.
For an individual who has a Workstation, there will be a Super-Laurel program that works in conjunction with the Etherphone and Audio File Server to allow one to listen to (receive), construct (edit), file, and transmit voice messages (or other voice documents for that matter).
Our plans call for a phased compatible growth into this new world. During this process there will be old-style phones (in CSL and certainly in the "outside" world) with which we will need to communicate. That means we will (eventually) need a POT gateway (plain-old-telephone) which allows conversations between our system and the regular phone system.
So to summarize, the proposed system includes the following elements:
1. Etherphones
2. Etherphone Server
3. Voice File Server
4. Related workstation programs
5. POT Gateway
The next section provides the reader with a preview of some of the improved services a user might see in this new world.
New User Services
Most of the improvements in "telephone" service have to do with getting phone participants more smoothly into direct voice communication - or circumventing (supplementing) the need for such direct communication through voice messages. We argue that the telephone provides reasonable communication, once you’re talking to the other person; that it is in the process of negotiating the establishment of a conversation between the participants that the phone system is most inadequate and annoying. For the caller this means dialing, waiting for the busy signal, redialing, waiting for the callee to answer, or a secretary, waiting for the real callee to be found and come to the phone, or waiting while the airline reservation system plays music at you......etc. For the callee, it means listening to the ringing, answering, establishing who is calling, what they want, whether it’s important enough to bother with now, etc. For "Big Guns" a personal secretary mediates both incoming and outgoing calls and thereby takes over most of the burden. We hope that through the Etherphone we can provide some of this same relief for smaller guns.
In general we are mucking with a very touchy business here since the conventions of phone use are deeply established, and although people are often annoyed by the present arrangements, they also tend to be quite conservative and to resent changes in the system. Our proposals could cause substantial shifts of burden between callee and caller as will become evident below.
First let’s distinguish between a conversation and a message. Although the distinction seems obvious, in talking about audio people often use the terms improperly (and sometimes even interchangeably). A conversation is a two way matter - it is used if you have a question or some matter that needs to be interactively discussed. A message is something you have to impart (maybe short, maybe long) that doesn’t require immediate, interactive response. That’s why we call Laurel a "message" system. Obviously you can turn a sequence of messages into a sort of conversation but it’s awkward - and becomes increasingly useless as the degree of interactivity needed rises. Mostly what we are going to be discussing is mediating conversations.
Benefits to the Callee
In the current phone system, the caller has the upper hand. There you are talking to someone in your office; the phone rings; you don’t know who it is, how urgent it is - nothing. Naturally you play it safe and interrupt your conversation to answer - frequently to your regret. If you’re lucky the person on the other end will:
1. quickly identify him/herself
2. tell you what it’s about, and
3. if it’s not a short question, ask if you’re free to talk now.
Many people aren’t that polite - in which case you have to listen until you figure out it can wait, and then interrupt to say "I’ll call you back" or equivalent. If you’re lucky enough to have a private secretary, you can provide her(him) with a filter for incoming calls which will typically let through only certain ones (emergencies, girlfriends, boyfriends, etc.).
In our visionary world, the Etherphone will similarly filter incoming calls, letting through only those you have specified as admissible. Your filter can be as complex as you are willing to specify. ("John and Mary under any circumstances, Peter if he calls about the budget, others only if they claim it’s urgent, but under no circumstances George or Nelson"). Let us consider calls occurring within our visionary system. ("Outside" calls, from regular phones, cannot be so selectively treated but must either be accepted or rejected - more simple filtering). When a call arrives, your Etherphone doesn’t ring immediately, but rather enters a negotiating phase. How this proceeds depends on whether a real live caller is "on the line" - or his Etherphone agent (placing the call for him - see the following section). Let us assume that the receiving Etherphone can distinguish between these and that it is a live caller. The Etherphone then asks who is calling - by feeding the caller a pre-stored message from the Audio File Server. The caller identifies him(her)self (via his(her) keyboard). If it’s John or Mary, your phone will then ring directly. If it’s George or Nelson, without ringing your phone, the Etherphone will tell them that you’re unavailable (again, by feeding them a pre-stored message). If it’s Peter, he gets a message that asks whether it’s urgent. He can (by button pushing) claim urgency, perhaps thereby penetrating your filter, in which case your phone will then ring. Alternatively your Etherphone will be able to "take" a (voice) message - complete with typed in title, sender, etc. Voice messages are discussed in a later section.
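The filter in this scenario can be pictured as a small rule table. The following is only a rough sketch of one way such a filter might be expressed; the class name, the outcome names, and the rule ordering are our own assumptions for illustration and are not part of any existing Etherphone design.

```python
from dataclasses import dataclass, field

# Hypothetical outcome names, chosen only for this illustration.
RING = "ring"
REFUSE = "refuse"
ASK_URGENCY = "ask-urgency"
TAKE_MESSAGE = "take-message"


@dataclass
class CallFilter:
    """Illustrative callee filter: who may ring the phone, and when."""
    always: set = field(default_factory=set)     # callers admitted under any circumstances
    never: set = field(default_factory=set)      # callers refused under all circumstances
    by_topic: dict = field(default_factory=dict) # caller -> set of admitted topics

    def decide(self, caller, topic=None, urgent=None):
        """Return the next step of the negotiation for one incoming call.

        urgent is None until the caller has been asked whether it is urgent.
        """
        if caller in self.never:
            return REFUSE
        if caller in self.always:
            return RING
        if topic is not None and topic in self.by_topic.get(caller, set()):
            return RING
        if urgent is None:
            return ASK_URGENCY
        return RING if urgent else TAKE_MESSAGE


# The example filter quoted in the text above.
example = CallFilter(
    always={"John", "Mary"},
    never={"George", "Nelson"},
    by_topic={"Peter": {"budget"}},
)
print(example.decide("Peter"))               # caller is asked about urgency first
print(example.decide("Peter", urgent=True))  # a claim of urgency penetrates the filter
```

Checking the exclusion list first means that a claim of urgency cannot override "under no circumstances"; that ordering is a design choice of this sketch.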
From this scenario, it is clear that when the callee is registered as "busy", new burdens fall upon the caller - to identify himself, specify the topic and the importance of his call, etc. One is used to such buffering when one makes a call to or receives one from a Big Gun, but otherwise as a caller, you’re used to getting through (unless, of course, the receiver’s phone is busy or doesn’t answer) and will resent buffering by a non-human mechanism. Although one can try to soften this by making the mechanism as polite as possible, there seems no way to get around the fact that we are fixing an age-old problem wherein the caller has had the ability to interrupt the callee (except for very strong willed individuals or those with private secretaries - rarely do I observe people letting their phones ring while they continue a conversation). This "upper hand" has, in the past, been justified because the only way to filter incoming calls was for a human to answer and find out whether it was important. People don’t mind if you answer but then tell them you’re busy and ask them to call back; I postulate that this is in part because they have an appeal - they can say it’s short, or important or whatever - i.e. can claim urgency. I believe that giving them that option (in the proposed system) will go a long way toward relieving the unhappiness that is to some extent inevitable (if we are to clean up the clutter of incoming phone calls).
Benefits to the Caller
For outgoing calls, you can call the receiver by name. You can either make the call while staying on the line - that is you will deal directly with what happens from the far end - or you can turn the call over to the Etherphone to make it for you, in which case the Etherphone will let you know (by ringing itself) when the callee answers. Just as with a secretary - while it is trying to get you back on the phone, it plays the callee a "Just a minute, Mr. Foo is calling you" message from the Audio File Server.
The Etherphone should ideally distinguish between (1) the remote phone’s ringing, (2) a busy signal, (3) a real person answering, and (4) the remote Etherphone answering. How much of this sort of thing we can do remains to be seen. In your telephone profile you can specify your "persistence" constant - i.e. how long to go on ringing, whether and how frequently to try again on a busy call, etc.
With this scheme, the callee can get bothered by "posthumous" calls - as when the caller goes out between the time he gives his Etherphone a call to place and the time it gets through. (Note that we presently have a similar problem when someone fails to un-forward his phone). It is therefore foolish to implement much "persistence" in call making. Besides, if the placing of calls is easy, the burden of pushing the "Try Again" button will seem modest. Actually, it will be better for a recipient to let his Etherphone field all incoming calls so that, by negotiating with the calling Etherphone, it can filter out such "posthumous" calls.
If your Etherphone places a call and encounters a negotiating Etherphone (rather than a live human) at the other end, your phone will generally call you for help - by ringing your Etherphone and telling you on the display what’s needed. You may be required to type in your name, topic, and urgency. Of course your name can be defaulted and the topic and urgency given in advance. Or it may be that the receiving Etherphone has no filter which excludes a call from you - in which case both Etherphones will know that the call is ready and will simultaneously ring.
In this vision, one doesn’t have to reach far to encounter absurd situations where confusion arises as to whether it is a machine playing pre-stored messages at you or a live person on the other end. Clearly there must be unambiguous ways for the Etherphone to distinguish these cases. (Presumably a human can relatively easily distinguish).
A Potential Benefit for Both - a Locator
Suppose everyone wore a tiny device which broadcast his/her identity very locally, say 15 feet - to the nearest telephone. Such a mechanism would allow the Etherphone to recognize when you were near your phone. Thus it could desist from trying to set up a call for you when you left your office. Furthermore, as recipient of your calls, your Etherphone, using its Locator, could change its answering strategy when you left your office. It could begin to answer calls with "He’s out; please leave a message". Alternatively, with a more sophisticated system the Etherphone nearest you could ring wherever you went - assuming your profile indicates that’s what you want.
Voice Messages
Voice messages travel from an Etherphone to the Audio File Server to await later retrieval. If the recipient has a workstation, then news about the message (title, sender, etc. but not the contents) will go into his regular Laurel mailbox - and will show up there in the listing of his messages. When he "displays" a voice message, it plays out (from the Audio File Server) on his Etherphone - instead of on his screen. If the recipient has no workstation but only an Etherphone, then the Etherphone’s more limited features will provide a primitive capability for auditing voice mail. The LCD display will let you scan the voice message list (titles, senders) sequentially, listening only to those you choose.
Integration with Regular Telephones
In the fullness of time, we imagine that all telephones within CSL (maybe all PARC) will be Etherphones and that all communication with regular telephones will take place via a POT Server. However in order to create and use a believable prototype Etherphone, it must be able to talk to regular telephones even before we build a POT Server. In order to allow for this, our initial Etherphones will connect not only to the Ethernet but also to a regular phone line. There will be a convention which will allow the Etherphone to distinguish whether a call that is being placed is to go over the Ethernet or out over the regular phone lines. The signaling etc. will, of course, be entirely different in the two cases. However automatic dialing by name will be available in either case.
PLANS
Our long range view of the Etherphone is that it should be perhaps a 20-40 chip machine - containing a microprocessor, 64K words of memory, A/D and D/A, and a single chip Ethernet controller, etc. We considered using the Stanford SUN computer for starters, (see Appendix for more discussion) but this seemed not to offer sufficient advantage - it wasn’t what we ultimately wanted (too big - especially with the present Ethernet interface) and yet it wasn’t something we had conveniently in hand. It did offer us the opportunity to start directly on a proper microprocessor and in a likely language, but we felt it likely that the program would remain small and so could relatively easily be transcribed later. Besides it is likely to be rewritten several times in getting it right. Finally, we are fortuitously arriving at a period when Altos will become "surplus" items.
So we decided to start by building an Alto-based Etherphone I, restricting ourselves to only those features which will be duplicated in the eventual (Etherphone II) machine. This will allow us to get off the ground with a demonstration system and proceed with software development while we allow time for the Ethernet chip to come into existence and be incorporated into the design of Etherphone II. We plan to build two to five Alto I based Etherphone I’s and thus get started with the protocol and management problems while in parallel we pursue the design of an integrated Etherphone II.
Etherphone I
Requirements
1. Processor/Memory/Ethernet
We are talking about an Alto class machine. The fundamental requirements are that the processor pass data between the audio and the Ethernet at 64,000 bits per second in each direction at the same time. Additional cycles are needed to handle the user interface, and some level of other Ethernet communications (call management functions).
2. Audio hardware/audio switching
Telephone industry compatible speech or better. Telephone compatibility means 8000 eight bit samples per second, mu-255 encoded. Better means more bits with a linear encoding or more samples per second, or both; for example 12 bit linear encoding at 10K or 12K samples per second. (Mu-law is equivalent to 13 bit encoding at low signal levels, somewhat less at high signal levels.)
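The data rates implied by these alternatives follow from multiplying samples per second by bits per sample. The short sketch below works the arithmetic; it assumes "10K" and "12K" mean 10,000 and 12,000 samples per second, and the function and constant names are ours.

```python
def bit_rate(samples_per_second, bits_per_sample):
    """Raw one-way data rate of an uncompressed voice stream, in bits per second."""
    return samples_per_second * bits_per_sample


# Telephone compatible: 8000 eight bit (mu-255 coded) samples per second.
TELEPHONE_RATE = bit_rate(8000, 8)

# "Better": 12 bit linear coding, assuming 10K = 10,000 and 12K = 12,000 samples/s.
LINEAR_10K_RATE = bit_rate(10_000, 12)
LINEAR_12K_RATE = bit_rate(12_000, 12)

print(TELEPHONE_RATE, LINEAR_10K_RATE, LINEAR_12K_RATE)
```

The telephone compatible case gives the 64,000 bits per second figure used in the processor requirement; the "better" cases roughly double that load in each direction.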
For mu-255 encoding, either a single chip codec or a 12 bit linear A/D and D/A seem appropriate. If the linear D/A is used, then table lookup translation of mu-255 coded bytes would be used. Single chip codecs will only work at 8000 samples per second.
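To make the table lookup idea concrete, here is a rough sketch using the continuous mu-law companding curve with mu = 255. It is an illustration under stated assumptions only: it is not the bit-exact segmented coding performed by commercial codecs, and the function names, the byte layout (sign bit plus 7 bit magnitude), and the 2047 full-scale value for a 12 bit D/A are choices made for this example.

```python
import math

MU = 255  # the companding constant referred to by "mu-255"


def compress(x):
    """Continuous mu-law curve: linear sample in [-1, 1] -> companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y):
    """Inverse of compress()."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)


def encode(x):
    """Quantize a linear sample to one byte: sign bit plus 7 bit companded magnitude."""
    x = max(-1.0, min(1.0, x))
    magnitude = min(127, int(abs(compress(x)) * 128))
    return (0x80 if x < 0 else 0x00) | magnitude


def build_decode_table(full_scale=2047):
    """Build the 256-entry table that maps each coded byte to a 12 bit linear value."""
    table = []
    for code in range(256):
        magnitude = code & 0x7F
        # Reconstruct at the midpoint of the quantization step.
        value = round(expand((magnitude + 0.5) / 128) * full_scale)
        table.append(-value if code & 0x80 else value)
    return table


DECODE_TABLE = build_decode_table()


def decode(code):
    """Translate one coded byte to a linear D/A value by table lookup."""
    return DECODE_TABLE[code]


print(encode(0.5), decode(encode(0.5)))
```

Once the table is built, decoding costs one memory reference per sample, which is the attraction of the lookup approach when a linear D/A is used.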
An additional hardware requirement is for silence detection. In the Alto-auburn world, the microcode returns the sum of the absolute values of all the samples in an audio control block. This value is close enough to the sum of squares to be useful for silence detection. For a microprocessor, if we cannot afford to look at every sample, then this function must be hardware. Of course, the silence detector can be an analog energy detector!
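The sum-of-absolute-values test is simple enough to sketch directly. The block below is an illustration only, not the Auburn microcode; the function names and the per-sample threshold are assumptions that would need tuning against real speech and line noise.

```python
def block_energy(samples):
    """Sum of the absolute values of all samples in one audio block."""
    return sum(abs(s) for s in samples)


def is_silent(samples, threshold_per_sample=16):
    """Treat a block as silence when its average absolute level is below the threshold.

    threshold_per_sample is a hypothetical value in linear sample units.
    """
    if not samples:
        return True  # an empty block carries no speech
    return block_energy(samples) < threshold_per_sample * len(samples)


quiet_block = [2, -3, 1, 0] * 40       # low-level line noise
loud_block = [900, -850, 700, -650] * 40
print(is_silent(quiet_block), is_silent(loud_block))
```

Scaling the threshold by the block length keeps the test independent of how many samples are carried in each block.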
It seems likely that there will be several audio sources/sinks around. Easy examples are the Telco line from the wall, the user’s handset, a handsfree telephone, a speaker, recorder jacks, etc. The audio peripheral needs adequate analog switching or relays or digital conversion to freely connect these devices.
Some amount of random digital inputs and outputs are needed: to connect a data access arrangement, to out-pulse, etc.
3. User interface, buttons/lights/display
At least a 12 button phone industry keyboard is required, suitably interfaced to the processor. In principle, we could actually use a DTMF pad plus a DTMF decoder, but a switch array seems cheaper. The processor will probably want to synthesize DTMF for user-feedback if a switch array is used. (Note that the KTS proposal requires outpulsing or out-DTMFing anyway). In a system with enough common equipment, the DTMF receiver could be common, elsewhere on the net!
For higher levels of functionality, digital output is required, from a few lights to a single line display, to a several line display. The user input could be anything up to a full typewriter keyboard or touch-panel. (A several line display plus keyboard gives written message capability too.)
Proposal
We propose to build a two to five station Ethernet key telephone system. Each station would consist of an Alto I with an Auburn audio interface but no disk or display. The audio hardware would connect both to local audio devices (a telephone handset, a speakerphone, a mike and speaker), and to the telephone company office telephone line.
The basic idea of the Ether KTS operation would be to place telephone calls between Etherphones by transmitting the voice over the Ethernet, but placing calls to other non-Etherphone locations by dialing out over the standard office line. Once the basic transmission facilities are in place, we can begin exploring the voice universe by slowly incorporating value-added functions to the system.
A key ingredient of the design is 100% availability of telephone service. There would be a mechanical switch to bypass the Etherphone and connect the telephone handset directly to the line. This switch would normally connect the line to the Etherphone, but could be thrown in case of panic or debugging to leave the telephone working.
The chief beauty of this proposal is that it permits us to get under way with existing basic facilities:
1. Existing computer and development environment.
The Alto comes equipped with an ethernet controller, memory, and a processor. It also comes with a keyboard, which could be used for user input. Use of the display for user output will require us to limit ourselves to six lines. (Use of the display also requires a font, etc. etc.) We should probably not use the Alto disk.
Assuming use of Auburn on the Alto I, the software options are reduced basically to bcpl. Bcpl comes with a Pup package for communications, an operating system, timers, multi-processing, debugger, etc.
2. Existing audio hardware.
The Auburn board was designed for the Alto II, but also works on an Alto I, provided that Mesa is not used. (Mesa uses up the RAM, so no processor devices can be used.) The Auburn board is a heavyweight device. It provides 12 bit linear coded audio at variable sample rates, and includes a substantial amount of digital and analog switching. It includes microphone/speaker, telephone (4-wire), and line levels. It includes enough digital switching for a DAA and then some. It includes a touch-tone detector and silence detector. It does not include a hybrid or AGC, which are needed for 2-wire devices. The Auburn system includes microcode for sum of absolute value silence detection, for mu-law encoding, for tone generation, and for 16 Kbps ADPCM. Auburn seems to do a little more than needed (variable sample rate), but in general, its extra functionality is not in those directions we find important.
[[[[[Another possibility is the CAT interface, which is a single codec audio interface for printer ports. It can be used through the Diablo port of an Alto I, but does not provide any audio switching or digital switching capabilities.]]]]] (Do we believe in this? If not, let’s omit mention of it).
If we can restrict ourselves to live within the functional capability of the eventual Etherphone II, by discarding the Alto disk and most of the display, then the use of the Alto I appears to allow us to start work immediately on the development of protocols and management rather than delay such efforts with hardware and development environment work. Note that even with the Alto we will need the analog hybrid and connection to the phone company.
Details of the Etherphone I
This section describes what the first working Etherphone is supposed to do. Naturally, there are an immense number of options to sort through.
Hardware
The Alto based Etherphone I includes a local telephone set (handset plus switchhook) and a centrex telephone line. We plan to build two to five of these models. There is some amount of analog circuitry for connecting the centrex line that does not yet exist, but everything else is available.
Functions
The first Etherphone has all the functionality of the existing office phones, but nothing else. It provides plain old telephone service. Calls from the Etherphone to another Etherphone take place over the Ethernet. Calls to or from other locations use the Etherphone telephone set, but actually place the call over the centrex line. After initial development, the Etherphone is the only phone in the office. The intent of this is to produce a real system, instead of a demonstration model. If an Etherphone could talk only to other Etherphones, we would not get much use out of the two to five we plan to build; members of the audio project do not make enough phone calls to each other to make heavy use of the system, but they do make a few such calls, plus a lot of others. We intend to construct this prototype to give the appearance that there are a lot of Etherphones, while there may only be a few to start.
The POTS functionality does not sound like much, but there are a lot of details. We break down the basic functionality into three areas: supervision, connection, and transport.
Bit transport
Transport refers to an Etherphone-to-Etherphone connection over the Ethernet. The process consists of managing a full duplex 64,000 bits per second data stream with small controlled delay. We feel that about 50 packets per second will be necessary, giving about 40-50 milliseconds of end to end delay (one packet plus slop). On top of the basic requirement for a fairly large number of packets per second, we want to add silence detection. Silence accounts for something over 40% of a typical full duplex call. Simply not transmitting the bits for a silence interval could nearly double the capacity of an Ethernet for voice traffic.
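The figures above can be checked with a little arithmetic. The Python sketch below derives the payload size and the amount of audio carried per packet from the 64,000 bits per second and 50 packets per second figures, and estimates the capacity gain from suppressing silence. Packet headers are ignored, and the names are illustrative assumptions.

```python
BIT_RATE = 64_000          # bits per second, one direction
PACKETS_PER_SECOND = 50    # packets per second, one direction

# Voice payload carried by each packet, ignoring any header overhead.
payload_bytes = BIT_RATE // 8 // PACKETS_PER_SECOND

# Amount of audio (in milliseconds) that one packet represents.
packet_ms = 1000 / PACKETS_PER_SECOND


def capacity_gain(silence_fraction):
    """Factor by which voice capacity grows if silent intervals are
    simply not transmitted (headers and detector errors ignored)."""
    return 1 / (1 - silence_fraction)
```

Under these assumptions each packet carries 160 bytes, or 20 milliseconds of speech; the remainder of the 40-50 millisecond estimate is the slop. A silence fraction of exactly 40% gives a gain of about 1.67, and the gain approaches 2 as the fraction approaches 50%.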
Supervision
Supervision refers to the management of basic control functions: when is the phone off-hook, provision of dial-tone, reading the pushbuttons, generating signalling tones, etc. This is a substantial task because the Etherphone must exchange signaling with the centrex line in order to make a believable system. (Man, isn’t this a big extra job?)
Making connections
Connection management refers to all the problems of taking the button pushes from the local phone (keyboard) and using the information to place a call. In the case of Etherphone to Etherphone communications, call placement would be accomplished by some form of the rendezvous protocol. In the case of calls using the centrex line, signalling information must be sent either by dial pulses or by DTMF (What’s DTMF?).
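For reference, DTMF (dual-tone multi-frequency, i.e. touch-tone) signalling represents each button as the sum of two sine tones, one chosen by the keypad row and one by the column. The Python sketch below uses the standard row and column frequencies; the 8000 samples per second rate, the tone duration, the amplitude, and the function names are assumptions for illustration, not a description of the Auburn tone generator.

```python
import math

ROW_HZ = (697, 770, 852, 941)          # standard DTMF low-group tones
COL_HZ = (1209, 1336, 1477, 1633)      # standard DTMF high-group tones
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (low, high) tone pair in Hz for one keypad button."""
    if len(key) != 1:
        raise ValueError("expected a single keypad character")
    for row_index, row in enumerate(KEYPAD):
        col_index = row.find(key)
        if col_index >= 0:
            return ROW_HZ[row_index], COL_HZ[col_index]
    raise ValueError("not a DTMF key: " + repr(key))


def dtmf_samples(key, duration_s=0.05, rate=8000, amplitude=0.5):
    """Generate float samples for one button press.

    Each tone is scaled to half the requested amplitude so the sum
    never exceeds it.
    """
    low, high = dtmf_frequencies(key)
    count = round(duration_s * rate)
    half = amplitude / 2
    return [
        half * (math.sin(2 * math.pi * low * n / rate)
                + math.sin(2 * math.pi * high * n / rate))
        for n in range(count)
    ]
```

The same samples could be sent both to the centrex line and to the local handset, so that the user hears the tones while dialing.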
Placing a call
Remember that this first system behaves just like an ordinary telephone. In order to place a call, the user first picks up the handset. The computer detects the closed switchhook contacts and begins to generate a locally produced dialtone. Just as in a standard phone, the dialtone indicates to the user the system’s readiness to place a call. After getting dialtone, the user can use the keyboard to dial a number. Our current proposal is for numbers like 3XXX to refer to other Etherphones while other numbers refer to use of the centrex line. In either case, the computer must generate DTMF tones for the user to hear while he is pushing buttons. If the first digit is not a 3, the Etherphone immediately takes the centrex line off-hook and replaces the local dial-tone by a full duplex audio path between the local handset and centrex. The computer continues to generate tones according to the buttons and sends those tones both to centrex and to the local handset. For a centrex call, the Etherphone does nothing else until the switchhook is again open. If the first dialed digit is a 3, the Etherphone accumulates 4 digits and interprets the last three as an ethernet host address. A rendezvous protocol is used to establish an ethernet connection to that address. If there is no response, a re-order (fast busy) tone is generated locally. If the destination Etherphone is in use, either by another Ether-call or by a centrex call, a busy tone is generated. If the remote Etherphone is free, a ringing tone is generated and the remote Etherphone generates an audible ring. Once both Etherphones are off-hook, a full duplex audio path is established, using 64,000 bits per second mu-law coded audio. About 50 packets per second in each direction will be necessary to keep the delays low. Silence detection will be used to reduce the total 100 packets per second requirement.
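The dialing rules just described can be sketched as a small classifier plus a mapping from rendezvous outcome to call-progress tone. This is only a sketch in Python: the function names and the status strings are assumptions, and the host address is kept as a string of three digits because the radix in which it would be interpreted is not settled here.

```python
def classify_dialed(digits):
    """Decide how to route the digits dialed so far.

    Returns a (route, host) pair:
      ("idle", None)        nothing dialed yet, keep giving dialtone
      ("centrex", None)     first digit is not 3, go off-hook on centrex
      ("collecting", None)  first digit is 3, still fewer than 4 digits
      ("ether", host)       4 digits collected, host is the last three
    """
    if not digits:
        return ("idle", None)
    if digits[0] != "3":
        return ("centrex", None)
    if len(digits) < 4:
        return ("collecting", None)
    return ("ether", digits[1:4])


# Tone played locally for each (hypothetical) rendezvous outcome.
PROGRESS_TONES = {
    "no_response": "reorder",   # fast busy
    "in_use": "busy",
    "free": "ringing",
}


def progress_tone(status):
    """Map a rendezvous outcome to the tone generated for the caller."""
    return PROGRESS_TONES[status]
```

For example, `classify_dialed("3123")` yields `("ether", "123")`, while any number beginning with another digit is handed to the centrex line as soon as the first digit arrives.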
Receiving a call
If ringing is detected at the centrex line, the Etherphone generates an audible ring. When the local handset is picked up, the Etherphone picks up the centrex line and establishes a voice path. (To completely mimic the usual telephone, the keyboard and tone generators should be active also.)
If a connection request arrives over the network, the Etherphone generates an audible ring and simultaneously keeps the calling Etherphone informed of the status of the local switchhook.
During a call
If an Ethernet connection request arrives while the local Etherphone is in use, a busy indication should be returned to the requestor. If the centrex line rings while the Etherphone is in use, the situation is more difficult. We can either implement call waiting and produce the usual beep in the local handset, or we can arrange things so that the centrex line is made busy whenever the Etherphone is in use.
Enhancements
We turn to a description of some value added functions that can be incrementally added to the above system without adding any hardware.
Call waiting - Rather than returning a busy signal to an Ethernet request and making the centrex line busy, we can return ringing and generate a beep in the local handset. This would mimic the PTT offering, but we could perhaps improve it by giving the local user a way to specify whether busy or ringing is to be returned. We could also give the caller the option of generating the beep only after getting a busy indication. (There is lots of room for thought!)
Call restrictions - We can easily add some non-call placement commands to the local Etherphone, like "do not disturb for X minutes" or accept only calls from certain Etherphones. Since we have no way to identify callers from centrex, only a blanket restriction could be made. This is a sensitive sociological area!
Redialing and Speed Calling - It will be simple for the Etherphone to remember a small set of numbers for single button operation and simple for it to redial the last number dialled if it was busy or no-answer.
Answering machines - Announcements could be generated, if they were short enough to keep in main memory, but more general facilities would require an audio file server.
Call forwarding - The Etherphone could be controlled from any other Etherphone or (with software) from any workstation. One could instruct the Etherphone in one’s office to forward the centrex line and arriving Etherphone calls to wherever one was. One’s office Etherphone could then automatically forward the centrex line and refer calling Etherphones to the new number.
APPENDIX - THE MICROCOMPUTER BASED ETHERPHONE I
The use of a standard 16 bit microcomputer would be a lot closer to the eventual Etherphone II than the Alto I, but a substantial amount of work would be needed to get anything working.
1. Hardware
The requirement for ethernet communications very nearly requires the use of the Stanford (SUN terminal) multibus ethernet controller. There are other possibilities: the Intel chip (1982 at the earliest), the VLSI systems area (many months at least), the Intel board level product ($4000), and the MEC controller (unknown, 1.5 Mbps), but none of them seem very attractive.
There are 8086 multibus single board computers and Z8000 SBCs available now for around $1000. The Stanford 68000 board will probably be available in several months. There is at least one other 68000 multibus computer on the way. The 68000 systems are likely to be at least $2000. Given the performance capabilities of these machines (don’t forget the NSC16000 and TI9900), there is no particular reason to prefer any one of them.
The 8086 has always seemed more popular around Xerox, probably because it was available first. There are several projects around which use it. There is a report of a C compiler and there is for sure an assembler. Webster intends to build ethernet printers using the Stanford multibus ethernet controller and an 8086 SBC.
The Z8000 probably has a C compiler available for Unix, but we don’t know for sure.
The 68000 has a Unix based C compiler and there is a C pup-package for it at Stanford. The 68000 board at Stanford is great overkill for us. It includes virtual memory.
No matter what we do, we would need audio hardware for such a system. We will need it anyway for the eventual Etherphone of course, but that will be a fully integrated design probably with no bus, while this device would need to speak multibus protocol (probably). One disadvantage of microcomputer audio compared with the Alto is that since there is no micro-machine multiplexing (TASK) going on, the microcomputer audio interface would need to be buffered.
The best approach looks like a multibus based system using the Stanford ethernet controller and either an 8086 or a 68000, but either way, it looks like >$3000 per box.
2. Software
There is a Unix emulator on the Parc-VAX, but it does not work perfectly, and the VAX does not yet have full ethernet support.
The best looking development environment for the 16 bit micros, independent of the one chosen, seems to be Unix and C, but we would have to live with a poor development environment and few packages compared to bcpl and swat. If we use C, we have a Pup package of sorts, from Stanford, although we might have to rewrite it to talk 8086 interrupts or something.
ADDITIONAL STUFF NOT YET ABSORBED:
Given that you have decided to pass real time voice via the ethernet, the reasons you need to have a special phone-computer (other than your work-station) are:
1. There will still be stand alone phones (no workstation) with which you must communicate
2. Your workstation won’t be smart enough (for many years if ever) to be available for handling phone calls with the sort of availability you expect and need for your phone.
3. There will be diversity of machines and you don’t want to have to build sixteen different kinds of hardware
So why pass real time voice over the Ethernet?
1. If you continue to use the PTT, then you will constantly have to adapt to their changing rules - you will be using signaling techniques and protocols not under your control
2. Ethernet is cheaper for people who aren’t already committed - it (time multiplexing of packets) is the wave of the future even for the phone company (?)
3. Call placing and receiving is smoother - and faster.
4. There are some features you can’t get by the "low-rent" arrangement
1. conferencing
2. Intercom - because call setup is quicker and you might re-establish connection whenever voice speaks - that lets you use the phone for other uses while also doing low-duty intercoming
3. phone busy - let you carry on several conversations at once - like sequencing extensions but without any special hardware
5. Serendipity
6. Fun
Reasons not to pass real-time voice over the Ethernet:
1. There will be delay problems for sure
2. There may be some bad delays - e.g. Dorado file transfers - if we don’t use a separate voice ethernet
3. Capacity uncertainty - Gateways
4. Costs more (the computer you need is bigger - more buffering - although you don’t have to pay for the PTT)
5. You still need a special computer to pass Ethernet voice out into the outside world of the phone system
6. "We don’t understand the problem" - the Phone Co. does