Voice Proposal: Alternatives
Inter-Office Memorandum
To: File    Date: September 13, 1981
From: L. Stewart    Location: Palo Alto
Subject: Voice Project: Alternatives    Organization: CSL
XEROX
Filed on: <Audio>Doc>VPAlternatives.bravo
ALTERNATIVE ARCHITECTURES
We have identified three alternative system approaches to voice. These differ (in various ways) in the area of transmission and switching and in the area of control. We believe that very nearly the same collection of functions can be provided through any of these approaches, but that the choice greatly affects the performance and elegance of those functions. We should not forget that eventually a difference in degree becomes a difference in kind: a system that takes 1/2 second to place a call is very different from a system that takes 10 seconds to place a call.
One can separate three sub-areas of voice work: telephony, filing, and system integration. Telephony has to do with the transmission of voice data, with the elementary control functions of placing calls, and with terminal equipment (hardware). It is in telephony that the architectures described below differ. Filing has to do with storage and retrieval of voice messages and with composition and editing capabilities. Integration has to do with advanced control facilities such as use of a data base to store telephone numbers, and with the manipulation of voice data in cooperation with our other activities. It is important to note that filing and some kinds of voice work could proceed with a voice system that is not combined with the telephone system. We could concentrate on voice editing and annotation and ignore telephony, but we think that there is a lot to be gained by the integration of telephony and filing.
1. Use the existing telephone system
The present CSL phone system provides a single direct dial number for each office and has a few value-added features such as call-forwarding and an attendant console. (One can forward calls to another number, and one’s phone transfers to the attendant automatically after three rings.) The distinguishing features of this system are that control information is "in-band", consisting of beeps and clicks on the voice channel, and that both voice and control information pass through conventional phone wires and through a central switch (exchange). These remarks also apply to most PABXs.
To use the transmission and switching facilities of the existing phone system and to provide control over its operation, we would build a device which would connect to a workstation and permit the machine to pick up and dial one’s office phone. In addition, we would need some number of voice I/O interfaces on server machines. The voice I/O device, in small numbers, would provide a way of getting voice on and off the Ethernet, where our programs can work with it, file it, and so on. The A/D and D/A conversions are provided by a server, rather than by one’s own voice terminal.
This system provides the capability for a number of really impressive systems: voice messages, voice annotation of documents, semi-automatic call placement, and so on. There are also some crippling disadvantages. Any voice recording or playback would require placing a call to one of the servers! Calls are placed by electrically picking up the office telephone and electronically generating beeps. This can be made to work, but is a slow and somewhat uncertain process. The progress of an attempted call is determined by the return of various noises over the voice path: ringing, dial-tone, busy, fast busy, etc. It is quite difficult for a machine to sort out these noises; they cannot be ignored because calls do not always get through. In addition, the placing of a call requires several seconds: one or two to dial the call, perhaps one for the system to connect a local call, and as many as six seconds for ringing to be detected at the destination. This means that for applications such as annotation of a document, one’s office phone is effectively tied up since we could not afford the overhead of setting up a call for playback of a 10 second segment.
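The timing argument in the paragraph above can be made concrete with a little arithmetic. The following is only a sketch: the constants are the rough figures quoted above, and the function names are illustrative, not part of any existing system.

```python
# Rough timing figures taken from the paragraph above (seconds).
DIAL_S = (1.0, 2.0)        # "one or two to dial the call"
CONNECT_S = 1.0            # "perhaps one for the system to connect a local call"
RING_DETECT_MAX_S = 6.0    # "as many as six seconds for ringing to be detected"


def worst_case_setup_s():
    """Worst-case time to place a call through the existing phone system."""
    return DIAL_S[1] + CONNECT_S + RING_DETECT_MAX_S


def setup_overhead_fraction(segment_s, setup_s):
    """Fraction of total line time spent on call setup rather than on audio."""
    return setup_s / (setup_s + segment_s)


if __name__ == "__main__":
    setup = worst_case_setup_s()
    print(f"worst-case setup: {setup:.0f} s")
    print(f"overhead for a 10 s segment: {setup_overhead_fraction(10.0, setup):.0%}")
```

With these figures the worst-case setup time is 9 seconds, so playing back a 10 second segment would spend nearly half of the line time on call setup. This is why the line would in practice be held open for the whole annotation session.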
Many of the call placement and call receipt functions, for calls within the system, could be handled by internet communication: one’s workstation could check whether the callee’s phone was in use before placing a call. There would still be a small "window" through which other calls could creep. Functions like complex forwarding would be quite difficult, since we could use only the switching capabilities built into the standard phone system.
The essential difficulty with use of a standard telephone system is the lack of sufficiently fast and versatile control of the switching machinery.
Belleville is essentially following this strategy. His voice box is an audio interface for a workstation. His device includes a tape recorder so that low power workstations can control the operation of the voice peripheral without handling voice data directly. Higher power workstations also have access to the digitized voice.
2. Control of a PABX
We could replace our present telephone system with a commercial PABX and use one of our own computers to control it. D/A and A/D conversion functions for manipulation of voice-as-data would be done by servers, and call placement would be done either by manually dialing one’s telephone or by having one’s workstation instruct the control computer to set up a call.
In this system, the voice data still travels through more or less conventional wires and switches, but the control information is entirely digital, on the Ethernet, and under our control. Calls inside the PABX would be very fast and we would have easy access to the state of the system. It would be possible to connect to a server for just a few seconds to record a voice annotation, while still remaining available for incoming calls. Complex switching operations would be possible because the phone system would be entirely under our control. We would still require an adequate number of servers to meet our D/A and A/D needs.
The key disadvantages of this system are that we do not have such a controllable PABX and, before installing such a system, we would have to negotiate control of the switch with the vendor.
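As a rough illustration of what "entirely digital" control buys us, the sketch below models a control computer that owns all of the line state. It is hypothetical throughout: the class, its methods, and the line names are invented for illustration and do not describe any actual PABX interface. The point is that the busy check and the connection happen in one place, so there is no "window" through which another call can creep.

```python
class SwitchController:
    """Hypothetical control computer that owns all PABX line state."""

    def __init__(self, lines):
        # Map each line to the line it is connected to (None means idle).
        self.peer = {line: None for line in lines}

    def is_busy(self, line):
        """True if the line is currently connected to another line."""
        return self.peer[line] is not None

    def place_call(self, caller, callee):
        """Connect two lines if both are idle; return True on success."""
        if caller == callee or self.is_busy(caller) or self.is_busy(callee):
            return False
        self.peer[caller] = callee
        self.peer[callee] = caller
        return True

    def hang_up(self, line):
        """Release a line and whatever it was connected to."""
        other = self.peer[line]
        if other is not None:
            self.peer[other] = None
        self.peer[line] = None


if __name__ == "__main__":
    switch = SwitchController(["office-a", "office-b", "voice-server"])
    print(switch.place_call("office-a", "voice-server"))  # record an annotation
    switch.hang_up("office-a")                            # a few seconds later
    print(switch.place_call("office-b", "office-a"))      # office-a is free again
```

Because setup and teardown are just state changes in the controller, a workstation could hold a server connection only for the few seconds an annotation takes.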
3. Build our own
If we build our own transmission and switching, we will clearly have sufficient control over it to supply the facilities we need. The possible approaches lie along the spectrum from constructing a more or less conventional telephone switch to a fully distributed architecture of telephones connected to a local network. In our environment, the local network of choice is the Ethernet -- we call this approach Etherphone. Somewhere in the middle of the spectrum are systems with multiple line controllers connecting 8 to 32 telephones each to the Ethernet (or interconnected with high speed lines).
One might call the three approaches to construction of transmission and switching facilities the centralized, the hierarchical, and the distributed. Construction of a standard telephone switch lies largely outside our expertise. The hierarchical approach of shared Etherphones is described more fully later, but briefly it might be a bit less expensive in trade for a larger and more difficult design. For these reasons, we are concentrating on Etherphone.
Discussion
We feel that the first option -- control of the existing phone system -- is unacceptable because it does not offer sufficiently reliable functionality and performance. The PABX route -- control of a commercial telephone switch -- is impracticable for us because we do not have one. The Etherphone approach is the only alternative which provides sufficiently versatile transmission, switching, and control capabilities and which is available to us. At a later time, we may wish to integrate both hierarchical and "centralized" components into the transmission and switching system. If we do a good enough job with the control protocols for the system, such integration may not be too difficult.
SHARED VS. SINGLE LINE ETHERPHONES
This is a series of notes by Stewart on this general subject. Dan Swinehart’s comments marked (DCS).
The shared Etherphone is thought of as a Dandelion class machine without a display or disk, but with a highly capable Ethernet controller and a capable T1 style TDM interface. Belleville’s rough estimates place such a design at about $200 to $300 per line. Thacker has suggested that a shared controller might be more reasonable than single line micro based Etherphones.
1. Etherphones do not need much memory. I estimate 2K - 4K program (ROM) and 2K - 4K RAM. Of these estimates, the program memory is the weakest. We expect to put most of the control machinery in the Etherphone server and not in the Etherphone but . . . As for RAM, we are building a system with 30 - 40 millisecond delays, which translates to 400 bytes of audio buffering in each direction. Double that for double buffering and add some extra packet buffers for control and static and stack space. An Etherphone cannot use very much memory unless we were to hang on a lot of extraneous functionality -- 64K is way too much.
(DCS) In a "third generation" system, I’d expect most of the functions of the Etherphone server to have drifted back into the Etherphone -- or at least into one Etherphone on each cable -- so that cheap systems could be made and sold. The memory, esp. program memory, will then have to be somewhat larger. I believe your memory sizes within a factor of two or so, for the Etherphone II (one for everyone in CSL, but still not a product).
2. The shared Etherphone and the POTS gateway have considerable commonality. Both have an Ethernet controller on one side and a T1 line on the other. There are a couple of differences. With the shared Etherphone, we also need a way to get the button and light I/O done. We might use T1 time slots for this or we might run a different line.
(DCS) To build the Etherphone II B, one needs a reasonably powerful keyboard and some kind of alphanumeric display. That is easier if there’s a processor nearby. The T1 time slot scheduler for all this stuff could get pretty hairy.
However, if we go with the single line Etherphone, we (CSL) might never build a POTS gateway because we may not be able to put it to much use without divorcing ourselves from the Xerox Palo Alto phone system. (If we eventually go for replacing that system we would have more resources to use to build the POTS server.)
3. Ethernet transceivers should not be an issue. Not only will they get cheaper, but we can share them across several devices. The blue book calls for 50 meter transceiver cables. There will typically be a lot of phones and workstations inside a given 100 m circle. Even if we decide to run a separate 10 MHz net for phones and for workstations, adjacent offices can share transceivers.
a) Don’t forget, though, that coax is much cheaper than transceiver cable. We could afford to let the coax lie in loops in order to get the tap points where we want them.
4. The single line Etherphone design is less work for us. (This requires accepting the idea that we won’t design the POTS gateway for a while, if at all.)
[Hardware] It seems much easier to design a microcomputer out of standard parts on a single board than to (say) take apart a Dandelion and replace the display controller with some kind of T1/button/lights interface. (If we build a Dolphin board, the economic justification for building the shared device vanishes, yes?)
[Software] The shared controller must handle over 400 packets per second for 8 simultaneous users. There are laws of large numbers for concentration of telephones but 8-24 phones doesn’t seem like enough to get much advantage. It is harder to write the program.
Part of the same picture is that the shared controller is more complicated. We don’t know just how many conversations a given machine could handle, and the economics (see below) are critically dependent on this number.
5. The economic justification is not that clear.
a) The codecs and analog electronics come out about even.
b) You need a certain amount of ram per conversation, plus additional ram for the more complicated program. The shared machine can use larger cheaper dynamic memory chips though.
c) The program memory scales less than linear for the shared Etherphone, but its size does increase. (DCS) The shared Etherphone memory would have to be much faster stuff for the faster processor.
d) In trade for multiple Ethernet controllers and transceiver cable interfaces, the shared Etherphone needs multiple cable drivers for the T1 loop.
e) The power supply has economies of scale, as does the box (but don’t forget that all the phones have boxes too, and that eventually the single line Etherphone could fit inside.) (DCS) The phone box will probably need more power than can be sent down the loop cable anyhow. A speakerphone does. So do most keyboards, displays, etc.
f) The T1 interface stuff is not that small, especially since the button and light information needs to be handled.
g) You need a certain number of processor cycles per conversation, plus a few to run the more complicated program (scales more than linear!).
h) You might use a common encryption unit instead of distributed, but this requires doubling memory bandwidth, plus setup time etc.
(DCS) If we were convinced that a non-shared EP could never be made economically feasible (favorable or neutral), I think we’d have to look at some other architecture. We are not convinced of that. It is more important to explore a design whose architecture looks superior when evaluated by any standard other than cost. I think, for us, that’s the non-shared EP.
6. Shared Etherphones might improve the Ethernet utilization. Suppose that those conversations between two particular shared units are sent as a single packet stream. (This requires substantially more cycles in the shared Etherphone to unscramble the conversations.) However, we get 500 phones per 10 MHz Ethernet without shared controllers, so who cares?
(DCS) Samples from all conversations from a particular shared EP could be sent in a single packet, using multicast techniques. This would cut contention and overhead even more, at a cost of some hairy descrambling in all receivers. Silence detection, over short intervals anyhow, becomes useless in such a system. It’s all very hairy.
7. Our forward vision to a product Etherphone is not crystal clear, but:
a) Single chip micros are getting cheaper.
b) Memories are getting bigger and more reliable.
c) Ethernet controllers will be like jelly beans.
d) Transceivers will be cheaper in the expected large quantities. (If telephones aren’t a big market, what is?)
e) The random logic can be built as a custom part.
f) Wafer scale?
8. Larger computers are getting cheaper too. There is a product in the works with an I/O bus for server use that we might (in a few years) plug our T1 controller into.
9. It is harder to upgrade the voice quality of the shared controller than of single line Etherphones. Suppose we decide to go to 12 bits at 10 kHz: T1 and codecs no longer work. The single line Etherphone can make easier use of parallel single chip converters. (But we might run analog phones to the shared controller and have a multiplexed A/D D/A.)
(DCS) Consider even different phones with different, or variable, voice/sound quality, depending on the application and need.
10. The shared controller is an uninteresting part of the design space. Clearly there is a full spectrum of telephone system configurations, from single line Etherphones all the way to centralized PABXs. The shared controller lies in the middle of the space, with the problems of both ends and some of the advantages of each. The small simple distributed Etherphone is more appealing.
11. It will be a lot easier to experiment and play around with somewhat overpowered single line Etherphones than with a highly tuned, shared program.
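Several of the numeric claims in notes 1, 4, and 9 above can be checked with a few lines of arithmetic. The sketch below assumes telephone-grade coding of 8000 samples per second at 8 bits per sample (64 kbit/s, the rate of one T1 time slot). The notes do not state this assumption, and the function names are illustrative only.

```python
import math

# Assumed coding, not stated in the notes: 8000 samples/s, 8 bits each
# (64 kbit/s, the rate of a single T1 time slot).
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
T1_SLOT_BPS = 64_000


def audio_bytes(delay_ms, rate_hz=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE):
    """Bytes of audio covering delay_ms of speech in one direction."""
    return math.ceil(rate_hz * bits * delay_ms / 8000)


def double_buffered_ram(delay_ms):
    """Audio RAM for two buffers per direction, in both directions."""
    return 2 * 2 * audio_bytes(delay_ms)


def packets_per_second(users, packet_ms):
    """Packets per second when each user both sends and receives a stream."""
    return users * 2 * 1000 / packet_ms


def bit_rate(rate_hz, bits):
    """Raw bit rate of one direction of one conversation."""
    return rate_hz * bits


if __name__ == "__main__":
    for ms in (30, 40):
        print(ms, "ms:", audio_bytes(ms), "bytes per direction,",
              double_buffered_ram(ms), "bytes double buffered")
    print("8 users, 40 ms packets:", packets_per_second(8, 40), "packets/s")
    print("12 bits at 10 kHz:", bit_rate(10_000, 12), "bit/s vs", T1_SLOT_BPS)
```

Under these assumptions, 30 to 40 milliseconds of audio is 240 to 320 bytes per direction, the same order as the 400 bytes quoted in note 1 (which would correspond to 50 milliseconds). Double buffering in both directions still fits comfortably within 2K to 4K of RAM. Eight users exchanging 40 millisecond packets give 400 packets per second, matching note 4. Finally, 12 bits at 10 kHz is 120 kbit/s, which no longer fits a 64 kbit/s T1 time slot, as note 9 observes.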
Discussion
Given that we choose one approach to start, nothing prevents us from either changing our minds later or even mixing single line and shared controllers in the same system. (The POTS server is essentially this.) The single line Etherphone we can start now; to start the shared machine now, we would (probably) have to use Dolphins.
We will be done sooner and at work on the bigger part of the project (applications) if we do the single line Etherphone. The hardware is the least important part of the thing anyway. The economics are close enough that we stand a good chance of being wrong whatever we decide.
WORKSTATION AUDIO HARDWARE
One obvious way to avoid the expensive separate Etherphone is to place audio hardware in our workstations. This approach is probably fine for annotation of documents, but our workstations are not designed for 100% availability (you can’t get calls while you are in the debugger) and they are not designed for real-time performance (your call to your friend breaks up because the collector starts running). Basically, if we want to use the system only while a special program is running, then workstation audio hardware is fine, but we can’t build a telephone system that way -- it has to work all the time.
STANDARD MICROCOMPUTER BASED ETHERPHONES
One way to avoid building a microprocessor system from scratch is to construct stand alone Etherphones out of commercial 16 bit microcomputers. At the present time, both the processor and Ethernet would be full boards, the audio hardware would be a few extra chips, and a fairly bulky power supply and cabinet would be needed. This approach is interesting because we would use a standard single board computer that someone else has debugged but has the disadvantage that any such computer would be more general purpose than we need.
The requirement for ethernet communications very nearly dictates the use of the Stanford (SUN terminal) multibus ethernet controller. There are other possibilities: the Intel chip (1982 at the earliest), the VLSI systems area (many months at least), the Intel board level product ($4000), and the SLC.
There are 8086 multibus single board computers and Z8000 SBC available now for around $1000. The Stanford 68000 board will probably be available in several months. The 68000 systems are likely to be at least $2000. Given the performance capabilities of these machines (don’t forget the NSC16000 and TI9900), there is no particular reason to choose one over another.
The 8086 has always seemed more popular around Xerox, probably because it was available first. There are several projects around which use it. There is a C compiler and there is an assembler. Webster intends to build ethernet printers using the Stanford multibus ethernet controller and an 8086 SBC.
The Z8000 probably has a C compiler available for Unix, but we don’t know for sure.
The 68000 has a Unix based C compiler and there is a C pup-package for it at Stanford. The 68000 board at Stanford is great overkill for us. It includes virtual memory.
No matter what we do, we would need encryption and audio hardware for such a system. We will need it anyway for the eventual Etherphone of course, but that will be a fully integrated design probably with no bus, while this device would need to speak multibus protocol (probably). One disadvantage of microcomputer audio compared with the Alto is that since there is no micro-machine multiplexing (TASK) going on, the microcomputer audio interface would need to be buffered.
The best approach looks like a multibus based system using the Stanford ethernet controller and either an 8086 or a 68000, but either way, it looks like >$3000 per box.