IEEEPaper.tioga
Swinehart, September 30, 1986 6:33:57 am PDT
An Experimental Environment for Voice System Development (or something)
by Moe, Larry, and Curly
Intro
We've built this voice stuff as an integral part of a multi-media programming & other environment. Goal: to see what a mature office architecture could do for voice:
Taming the telephone (what would Alex do?)
Applications of recorded voice, and so on
other motherhood from before
In this note we will describe the particular(?) systems & hardware/software system we built. But we want to stress that we noticed the need for a voice architecture as well. Cedar architecture involves separate, hierarchical, open architectures for communications, filing, putting together OS (lip service to Cedar structure), user interface, and so on. In comparison, most voice-based architectures are pretty crude or custom-tailored, and not extensible.
Third goal, as yet only partly realized: develop similar open architecture for voice that can
Support extended applications
Allow applications from multiple languages and environments
Simple things are simple, elaborate ones possible
Etherphone Project Description
System developed over past few years in CSL — space won't permit, but see CSL 83-8 and refer to Fig 1. Need updated Figure 1, including synthesizer(s).
Experimental Environment — Hardware architecture
Tried lots of things — for adequate control & flex, decided to build own. Only critical piece of new hardware: device called Etherphone — voice transmission on Ethernet — concept: control through Ethernet, too; Etherphone is network peripheral; bring transmission&voice switching to network. [Gloss over trunk completely, except one sentence later on.]
Remainder of core system in software —
Control server
. Provides interpretation of telephone actions, controls them.
. Supports connections for other sources of voice traffic, such as voice recording server.
. As we'll see, manages the interactions between the workstation (WS) and its own telephone and other services.
. Various specialized databases.
. Primary enforcer & provider of voice architecture.
Voice file service, including recording & playback, management of utterances, management of details of editing.
Synthesizer — Etherphone with synthesizer hardware (two manufacturers) and specialized code in server to meter text... too specific.
Other possibilities — recognition equipment, music synthesizer supplied by ordinary computers on net, and so on.
All of above not very interesting without incorporating the office workstation. At PARC, high-performance personal computers and TS machines (Xerox machines, Vaxen, Suns, PC's — all on the network). Environments such as Interlisp, Smalltalk, Unix, Cedar. Have done the server development in Cedar, most of workstation development in Cedar (but there's an Interlisp existence proof). Many ways to look at Cedar (see Cedar papers); for this purpose, multiple document-management activities through use of structured, multi-media editor (Tioga); programming facilities remain fully available for further development directly in same integrated environment. Typical approach is to build applications in terms of others —> Electronic mail uses full document editor for text . . . See Fig. 2 for typical Cedar Screen.
System exists in Internetwork Environment depicted in Fig 1; other workstations and services worldwide directly communicate. Applications of this and similar environments fairly mature by now, as you might find in [some of ours, some of somebody else's].
Examples of Applications to Date
Informational displays -- who is calling (by tune and icon) -> Fig 3.
Simple commands — ?
Phoning from DB or browsing: see Fig. 4 for DBT Browser.
Voice Annotation and Editing —
Set up connection to file server instead of other phone -> can record arbitrary-length dictation, connect to document. Fig 5.
Also in Fig 5, picture of a segment of voice; editing the picture edits the voice. Annotations, markings, and color cues combine w/ graceful features to provide assistance in editing and locating things later [TV paper citations].
Built up from voice record/edit capabilities & Tioga, + ability to manage recorded-voice values. Can be used wherever Tioga can, so is avail. for constructing voice messages.
Further example of applications building on each other — narrated documents — cite Polle paper, show in Fig 6 (imagine scrolling action).
State (Summary)
In daily use by 50 people as sole phone (connections to other lines & outside trunks provided in undescribed ways); have developed applications above. Still working to define architecture that would make these applications easier to build and more robust (problem with competing uses for connections). Need to experiment with applications of same architecture to different workstation environments, different hardware architectures (could even combine them).
Acknowledgments, References, 250 pages of appendices