[Indigo]<Voice>Stewart>applans.bravo!2

PLANS

Our long range view of the Etherphone is that it should be perhaps a 20-40 chip machine, containing a microprocessor, 64K words of memory, A/D and D/A, a single chip Ethernet controller, etc. We considered using the Stanford SUN computer for starters (see Appendix for more discussion), but this seemed not to offer sufficient advantage: it wasn’t what we ultimately wanted (too big, especially with the present Ethernet interface) and yet it wasn’t something we had conveniently in hand. It did offer us the opportunity to start directly on a proper microprocessor and in a likely language, but we felt that the program would probably remain small and so could be transcribed relatively easily later. Besides, it is likely to be rewritten several times in getting it right. Finally, we are fortuitously arriving at a period when Altos will become "surplus" items.

So we decided to start by building an Alto-based Etherphone I, restricting ourselves to only those features which will be duplicated in the eventual (Etherphone II) machine. We plan to build two to five Alto I based Etherphone I’s and thus get started with the protocol and management problems while in parallel we pursue the design of an integrated Etherphone II.

Integration with Regular Telephones

In the fullness of time, we imagine that all telephones within CSL (maybe all PARC) will be Etherphones and that all communication with regular telephones will take place via a POT Server. However, to create and use a believable prototype Etherphone, it must be able to talk to regular telephones even before we build a POT Server. To allow for this, our initial Etherphones will connect not only to the Ethernet but also to a regular phone line. There will be a convention which will allow the Etherphone to distinguish whether a call that is being placed is to go over the Ethernet or out over the regular phone lines. The signaling etc. will, of course, be entirely different in the two cases. However, automatic dialing by name will be available in either case.

Etherphone I

Requirements

1. Processor/Memory/Ethernet

We are talking about an Alto class machine. The fundamental requirement is that the processor pass data between the audio and the Ethernet at 64,000 bits per second in each direction at the same time. Additional cycles are needed to handle the user interface, and some level of other Ethernet communications (call management functions).

2. Audio hardware/audio switching

We want telephone industry quality speech or better. Telephone quality means 8000 eight bit samples per second, mu-255 encoded. Better means more bits with a linear encoding or more samples per second, or both; for example 12 bit linear encoding at 10K or 12K samples per second. (Mu-law is equivalent to 13 bit encoding at low signal levels, somewhat less at high signal levels.)
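As an illustration of the mu-255 encoding mentioned above, here is a minimal sketch using the continuous mu-law companding curve. It is not the bit-exact segmented code used by telephone codecs, and the function names and the signed 8 bit code range are assumptions made for the example. The last function estimates the linear resolution that the curve provides near zero, which comes out at roughly 13 bits.

```python
import math

MU = 255  # the "mu-255" companding constant

def mu_compress(x):
    """Map a linear sample in [-1, 1] onto the companded range [-1, 1]."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def encode_8bit(x):
    """Quantize the companded value to a signed code in -127..127 (assumed layout)."""
    return round(mu_compress(x) * 127)

def equivalent_linear_bits_near_zero(code_bits=8):
    """Linear word length that would match the companded step size near zero."""
    slope_at_zero = MU / math.log1p(MU)  # derivative of the curve at x = 0
    return code_bits + math.log2(slope_at_zero)
```

Small signals receive a disproportionate share of the codes: an input at 1% of full scale already maps to well over a fifth of the code range.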

For mu-255 encoding, either a single chip codec or a 12 bit linear A/D and D/A seem appropriate. If the linear D/A is used, then table lookup translation of mu-255 coded bytes would be used. Single chip codecs will only work at 8000 samples per second.
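The table lookup translation could be sketched as follows. This is only an illustration: the byte layout (top bit as sign, low seven bits as magnitude) and the use of the continuous mu-law expansion curve are assumptions, and a real codec-compatible table would follow the segmented telephone standard instead.

```python
import math

MU = 255

def expand(y):
    """Inverse of the continuous mu-law curve: companded [-1, 1] -> linear [-1, 1]."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def build_table():
    """Precompute one 12 bit linear value for each of the 256 possible coded bytes."""
    table = []
    for code in range(256):
        sign = -1 if code & 0x80 else 1        # assumed layout: top bit is the sign
        magnitude = (code & 0x7F) / 127        # assumed layout: low 7 bits, scaled to 0..1
        table.append(round(sign * expand(magnitude) * 2047))  # signed 12 bit range
    return table

TABLE = build_table()

def decode(codes):
    """Translate a block of coded bytes to linear samples for the D/A, one lookup each."""
    return [TABLE[c] for c in codes]
```

The point of the table is that per-sample work reduces to a single indexed load, which matters if the processor has few cycles to spare per sample.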

An additional hardware requirement is for silence detection. In the Alto-Auburn world, the microcode returns the sum of the absolute values of all the samples in an audio control block. This value is close enough to the sum of squares to be useful for silence detection. For a microprocessor, if we cannot afford to look at every sample, then this function must be implemented in hardware. Of course, the silence detector can be an analog energy detector!
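A software version of the sum of absolute values measure might look like the sketch below. The threshold is a hypothetical value chosen only for illustration; a usable one would have to be tuned against real line noise.

```python
def block_energy(samples):
    """Sum of absolute sample values over one audio block."""
    return sum(abs(s) for s in samples)

def is_silent(samples, threshold_per_sample=16):
    """Declare a block silent when its mean absolute level is below the threshold.

    threshold_per_sample is an assumed figure in linear sample units.
    """
    return block_energy(samples) < threshold_per_sample * len(samples)
```

Scaling the threshold by the block length keeps the decision independent of how many samples are placed in each block.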

It seems likely that there will be several audio sources/sinks around. Easy examples are the telephone line from the wall, the user’s handset, a handsfree telephone, a speaker, recorder jacks, etc. The audio peripheral needs adequate analog switching or relays or digital conversion to freely connect these devices.

Some number of digital inputs and outputs are needed: to connect a data access arrangement, to out-pulse, etc.

3. User interface, buttons/lights/display

At least a 12 button phone industry keyboard is required, suitably interfaced to the processor. In principle, we could actually use a DTMF (dual tone multifrequency, Touch-Tone is a registered trademark) pad plus a DTMF decoder, but a switch array seems cheaper. The processor will probably want to synthesize DTMF for user feedback if a switch array is used. (Note that the KTS proposal requires outpulsing or out-DTMFing anyway.) In a system with enough common equipment, the DTMF receiver could be common, elsewhere on the net!
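DTMF synthesis in software amounts to summing two sine waves, one for the key's row and one for its column. The sketch below uses the standard DTMF frequencies for a 12 button pad; the function names, the 0.5 amplitude scaling, and the default duration are assumptions made for illustration.

```python
import math

ROW_HZ = (697, 770, 852, 941)      # standard DTMF low group
COL_HZ = (1209, 1336, 1477)        # standard DTMF high group (12 button pad)
KEYPAD = ("123", "456", "789", "*0#")

def dtmf_frequencies(key):
    """Return the (row, column) tone pair for one keypad button."""
    if len(key) != 1:
        raise ValueError("expected a single key")
    for r, row in enumerate(KEYPAD):
        c = row.find(key)
        if c != -1:
            return ROW_HZ[r], COL_HZ[c]
    raise ValueError("not a keypad button: " + repr(key))

def dtmf_samples(key, duration_s=0.1, sample_rate=8000):
    """Generate linear samples in [-1, 1] for the given key."""
    low, high = dtmf_frequencies(key)
    n = round(duration_s * sample_rate)
    step = 2 * math.pi / sample_rate
    return [0.5 * (math.sin(low * step * t) + math.sin(high * step * t))
            for t in range(n)]
```

For example, `dtmf_samples("5")` produces 800 samples containing 770 Hz and 1336 Hz, which could then be mu-law coded or sent to a linear D/A.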

For higher levels of functionality, digital output is required, from a few lights to a single line display, to a several line display. The user input could be anything up to a full typewriter keyboard or touch-panel. (A several line display plus keyboard gives written message capability too.)

Proposal

We propose to build a two to five station Ethernet key telephone system. Each station would consist of an Alto I with an Auburn audio interface but no disk or display. The audio hardware would connect both to local audio devices (a telephone handset, a handsfree phone, a mike and speaker) and to the telephone company office telephone line.

The chief beauty of this proposal is that it permits us to get under way with existing facilities:

1. Existing computer and development environment.

The Alto comes equipped with an Ethernet controller, memory, and a processor. It also comes with a keyboard, which could be used for user input. Use of the display for user output will require us to limit ourselves to a few lines. (Use of the display also requires a font, etc. etc.) We should probably not use the Alto disk other than for debugging.

Assuming use of Auburn on the Alto I, the software options are reduced basically to BCPL. BCPL comes with a Pup package for communications, an operating system, timers, multi-processing, debugger, etc.

2. Existing audio hardware.

The Auburn board was designed for the Alto II, but also works on an Alto I, provided that Mesa is not used. (Mesa uses up the RAM, so no processor devices can be used.) The Auburn board is a heavyweight device. It provides 12 bit linear coded audio at variable sample rates, and includes a substantial amount of digital and analog switching. It includes microphone/speaker, telephone (4-wire), and line levels. It includes enough digital switching for a DAA and then some. It includes a touch-tone detector and silence detector. It does not include a hybrid or AGC, which are needed for 2-wire devices. The Auburn system includes microcode for sum of absolute value silence detection, for mu-law encoding, for tone generation, and for 16 Kbps ADPCM. Auburn seems to do a little more than needed (variable sample rate), but in general, its extra functionality is not in those directions we find important.

If we can restrict ourselves to live within the functional capability of the eventual Etherphone II, by discarding the Alto disk and most of the display, then the use of the Alto I appears to allow us to start work immediately on the development of protocols and management rather than delay such efforts with hardware and development environment work. Note that even with the Alto we will need the analog hybrid and connection to the phone company.

Details of the Etherphone I

This section describes what the first working Etherphone is supposed to do. Naturally, there are an immense number of options to sort through.

Hardware

The Alto based Etherphone I includes a local telephone set (handset plus switchhook) and a centrex telephone line. We plan to build two to five of these models. There is some amount of analog circuitry for connecting the office phone line that does not yet exist, but everything else is available.

Functions

The first Etherphone has all the functionality of the existing office phones, but nothing else. It provides plain old telephone service. Calls from the Etherphone to another Etherphone take place over the Ethernet. Calls to or from other locations use the Etherphone telephone set, but actually place the call over the centrex line. After initial development, the Etherphone is the only phone in the office. The intent of this is to produce a real system, instead of a demonstration model. If an Etherphone could talk only to other Etherphones, we would not get much use out of the two to five we plan to build; members of the audio project do not make enough phone calls to each other to make heavy use of the system, but they do make a few such calls, plus a lot of others. We intend to construct this prototype to give the appearance that there are a lot of Etherphones, while there may only be a few to start.

This POTS functionality does not sound like much, but there are a lot of details. We break down the basic functionality into three areas: supervision, connection, and transport.

Bit transport

Transport refers to an Etherphone-to-Etherphone connection over the Ethernet. The process consists of managing a full duplex 64,000 bits per second data stream with small controlled delay. We feel that about 50 packets per second will be necessary, giving about 40-50 milliseconds of end to end delay (one packet plus slop). On top of the basic requirement for a fairly large number of packets per second, we want to add silence detection. Silence accounts for something over 40% of a typical full duplex call. Simply not transmitting the bits for a silence interval could nearly double the capacity of an Ethernet for voice traffic.
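The packet arithmetic behind these figures can be worked through in a few lines. This is a back-of-envelope sketch: it ignores packet header overhead, and the simple model of capacity gain (voice capacity scales inversely with the fraction of time actually transmitted) is an assumption made for the example.

```python
BIT_RATE = 64_000          # bits per second in each direction
PACKETS_PER_SECOND = 50    # per direction

bits_per_packet = BIT_RATE // PACKETS_PER_SECOND    # 1280 bits of audio
bytes_per_packet = bits_per_packet // 8             # 160 mu-law samples
ms_per_packet = 1000 / PACKETS_PER_SECOND           # 20 ms of speech per packet

def capacity_gain(silence_fraction):
    """Factor by which voice capacity grows if silent intervals are not sent."""
    if not 0 <= silence_fraction < 1:
        raise ValueError("silence_fraction must be in [0, 1)")
    return 1 / (1 - silence_fraction)
```

Each packet thus carries 20 ms of speech, so the quoted end to end delay leaves the remainder for buffering and slop. Under this model the gain is about 1.67 at exactly 40% silence and reaches 2 at 50%.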

Supervision

Supervision refers to the management of basic control functions: when is the phone off-hook, provision of dial-tone, reading the pushbuttons, generating signalling tones, etc. The Etherphone must exchange signaling with the centrex line in order to place external calls, but the burden can be put on the user to identify busy vs. ringing. These functions already exist in the Callup program using the Ross Box.

Making connections

Connection management refers to all the problems of taking the button pushes from the local phone (keyboard) and using the information to place a call. In the case of Etherphone to Etherphone communications, call placement would be accomplished by some form of the rendezvous protocol. In the case of calls using the centrex line, signalling information must be sent either by dial pulses or by DTMF.

Placing a call

Remember that this first system behaves just like an ordinary telephone. In order to place a call, the user first picks up the handset. The computer detects the closed switchhook contacts and begins to generate a locally produced dialtone. Just as in a standard phone, the dialtone indicates to the user the system’s readiness to place a call. After getting dialtone, the user can use the keyboard to dial a number. Our current proposal is for numbers like 3XXX to refer to other Etherphones while other numbers refer to use of the centrex line. In either case, the computer must generate DTMF tones for the user to hear while he is pushing buttons.

If the first digit is not a 3, the Etherphone immediately takes the centrex line off-hook and replaces the local dialtone by a full duplex audio path between the local handset and centrex. The computer continues to generate tones according to the buttons and sends those tones both to centrex and to the local handset. For a centrex call, the Etherphone does nothing else until the switchhook is again open.

If the first dialed digit is a 3, the Etherphone accumulates 4 digits and interprets the last three as an Ethernet host address. A rendezvous protocol is used to establish an Ethernet connection to that address. If there is no response, a re-order (fast busy) tone is generated locally. If the destination Etherphone is in use, either by another Ether-call or by a centrex call, a busy tone is generated. If the remote Etherphone is free, a ringing tone is generated and the remote Etherphone generates an audible ring. Once both Etherphones are off-hook, a full duplex audio path is established, using 64,000 bits per second mu-law coded audio. About 50 packets per second in each direction will be necessary to keep the delays low. Silence detection will be used to reduce the total 100 packets per second requirement.
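The dialing convention described above can be sketched as a small routing function. The names and return values are hypothetical. The three address digits are returned as a string because the text does not say in which number base they are to be interpreted.

```python
from typing import Optional, Tuple

def route(dialed: str) -> Tuple[str, Optional[str]]:
    """Decide what to do with the keys dialed so far under the 3XXX convention."""
    if not dialed:
        return ("dialtone", None)      # off-hook, nothing dialed yet
    if dialed[0] != "3":
        return ("centrex", None)       # take the centrex line off-hook at once
    if len(dialed) < 4:
        return ("collecting", None)    # still accumulating the 4 digits
    return ("ethernet", dialed[1:4])   # last three digits name the host

# Locally generated tone for each outcome of the connection attempt.
TONE_FOR_OUTCOME = {
    "no_response": "reorder",   # fast busy
    "in_use": "busy",
    "free": "ringing",
}
```

For example, `route("3142")` selects an Ethernet call to host digits "142", while `route("9")` hands the call to the centrex line as soon as the first key is pressed.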

Receiving a call

If ringing is detected at the centrex line, the Etherphone generates an audible ring. When the local handset is picked up, the Etherphone picks up the centrex line and establishes a voice path. (To completely mimic the usual telephone, the keyboard and tone generators should be active also.)

If a connection request arrives over the network, the Etherphone generates an audible ring and simultaneously keeps the calling Etherphone informed of the status of the local switchhook.

During a call

If an Ethernet connection request arrives while the local Etherphone is in use, a busy indication should be returned to the requestor. If the centrex line rings while the Etherphone is in use, the situation is more difficult. We can either implement call waiting and produce the usual beep in the local handset, or we can arrange things so that the centrex line is made busy whenever the Etherphone is in use.
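The choices for an arriving call can be summarized in a short decision function. This is a sketch only: the function name, the string labels, and the `call_waiting` flag are assumptions used to illustrate the two alternatives for the centrex line.

```python
def on_incoming(source, in_use, call_waiting=False):
    """Return the action for a call arriving from 'ethernet' or 'centrex'."""
    if source not in ("ethernet", "centrex"):
        raise ValueError("unknown call source: " + repr(source))
    if not in_use:
        return "ring"                  # audible ring at the local Etherphone
    if source == "ethernet":
        return "busy"                  # busy indication returned to the requestor
    # Centrex ring while in use: either beep the handset (call waiting), or
    # ignore it, on the assumption that the line was meant to be made busy.
    return "beep" if call_waiting else "ignore"

def should_busy_out_centrex(in_use, call_waiting=False):
    """True when the centrex line should be held busy to prevent ringing."""
    return in_use and not call_waiting
```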

Enhancements

We turn to a description of some value added functions that can be incrementally added to the above system without adding any hardware.

Call waiting - Rather than returning a busy signal to an Ethernet request and making the centrex line busy, we can return ringing and generate a beep in the local handset. This would mimic the PTT offering, but we could perhaps improve it by giving the local user a way to specify whether busy or ringing is to be returned. We could also give the caller the option of generating the beep only after getting a busy indication. (There is lots of room for thought!)

Call restrictions - We can easily add some non-call placement commands to the local Etherphone, like "do not disturb for X minutes" or "accept only calls from certain Etherphones". Since we have no way to identify callers from centrex, only a blanket restriction could be made there. This is a sensitive sociological area!

Redialing and Speed Calling - It will be simple for the Etherphone to remember a small set of numbers for single button operation, and simple for it to redial the last number dialed if it was busy or did not answer.

Answering machines - Announcements could be generated, if they were short enough to keep in main memory, but more general facilities would require an audio file server.

Call forwarding - The Etherphone could be controlled from any other Etherphone or (with software) from any workstation. One could instruct the Etherphone in one’s office to forward the centrex line and arriving Etherphone calls to wherever one was. One’s office Etherphone could then automatically forward the centrex line and refer calling Etherphones to the new number.