Etherphone: An Experimental Voice System Architecture
February 1987
Daniel C. Swinehart
Douglas B. Terry
Polle T. Zellweger
Computer Science Laboratory
Xerox Palo Alto Research Center
Overview
· What the user sees: applications and user interface
· What the programmer sees: system and software architecture
· Conclusions
What the User Sees
· Initial objectives for the Etherphone system
· Demonstration of user interface and Etherphone architecture (video)
· Recent applications
Original Voice Project Objectives
· Taming the telephone
better placement and receipt of calls
better handling of special features
better human-assisted call handling
· Voice as data
recorded voice messages
synthesized text-to-speech messages
speech recognition (not yet)
voice-annotated documents
integration with other interactive applications
· Programmer control over all of the above
Recent Applications
· Voice annotation
· Voice editing
· Scripted documents
Voice annotation
· Begin with Tioga multi-media document editor
· Add voice annotations
· Requires new implementation methods:
Storage-by-reference
Cross-document sharing
Voice editing
· User expands annotations to obtain visual voice representation in separate window
· Standard text-editing operations apply to voice
Voice-annotated documents
Integration with other interactive applications
· "Sound-and-silence" design oriented towards simple phrase, sentence-level modifications
· Specialized additions permit
Control of voice dictation
Cues, marks, text annotations to compensate for lack of directly-intelligible information
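The "sound-and-silence" view can be illustrated with a small sketch that splits a sample sequence into alternating sound and silence runs. This is only one plausible way to do it: the amplitude threshold, the minimum silence length, and the function name are all assumptions for illustration and do not come from the Etherphone system.

```python
def sound_and_silence(samples, threshold=500, min_silence=3):
    """Split samples into ('sound' | 'silence', start, end) runs.

    threshold and min_silence are illustrative values, not Etherphone settings.
    """
    # Step 1: label each sample as loud (True) or quiet (False).
    labels = [abs(s) >= threshold for s in samples]

    # Step 2: run-length encode the labels as [loud, start, end) runs.
    runs = []
    for i, loud in enumerate(labels):
        if runs and runs[-1][0] == loud:
            runs[-1][2] = i + 1
        else:
            runs.append([loud, i, i + 1])

    # Step 3: absorb short quiet gaps between two sound runs, so that a brief
    # pause inside a phrase does not split the phrase.
    merged = []
    for idx, (loud, start, end) in enumerate(runs):
        is_interior = 0 < idx < len(runs) - 1
        if not loud and is_interior and end - start < min_silence:
            loud = True
        if merged and merged[-1][0] == loud:
            merged[-1][2] = end
        else:
            merged.append([loud, start, end])

    return [("sound" if loud else "silence", s, e) for loud, s, e in merged]
```

An editor built on such runs can offer cut, copy, and paste at phrase boundaries without showing the waveform itself.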
Scripted documents
· Work just beginning
· Combines playback of annotations and other embedded actions
· Script is:
time-ordered sequence of timed actions
applied to specified locations in one or more documents
· Entry into hypermedia research
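The script definition above (a time-ordered sequence of timed actions applied to document locations) can be sketched as a small data structure. All names, the millisecond time unit, and the use of a character offset as the "location" are assumptions made for illustration; the slides do not specify a representation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TimedAction:
    start_ms: int   # offset from the start of the script (assumed unit)
    document: str   # hypothetical document name
    location: int   # hypothetical character offset within that document
    action: str     # e.g. "play-voice" or "highlight" (illustrative labels)


@dataclass
class Script:
    actions: list = field(default_factory=list)

    def add(self, action):
        """Insert an action and keep the sequence ordered by start time."""
        self.actions.append(action)
        # list.sort is stable, so actions with equal times keep insertion order.
        self.actions.sort(key=lambda a: a.start_ms)

    def due(self, t0_ms, t1_ms):
        """Return the actions whose start time falls in [t0_ms, t1_ms)."""
        return [a for a in self.actions if t0_ms <= a.start_ms < t1_ms]
```

A playback loop would call `due` once per tick and dispatch each returned action to the document it names.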
What the programmer sees
· Recent objectives: a voice system architecture
· Review of Etherphone hardware architecture
· A sampling of software issues
conversation management (telephone calls)
managing stored voice
Overall Goal: A Voice System Architecture
· Comprehensive
full control of telephone functions
flexible management of recorded voice
extensible: admits new applications, services, networks
· User-programmable
language-level interfaces
simple things are simple
complicated things make sense
acceptable impact on reliability
· Supports interoperability
multiple workstations and environments
multiple networks and protocols
multiple telephone transmission and switching choices
Other Architectures
· Programming environments
· Network architectures
TCP/IP, XNS, SNA
· Document architectures
Interpress, PostScript
Voice Architecture Layering
[Figure: voice architecture layering; artwork not reproduced]
Review of System Components
· Etherphone microprocessor-based telephone
· Workstation applications
· Ethernet transmission, switching, and control
· Shared voice control server
· Shared voice file server
· Other shared services:
Speech synthesizers
Music generators
Speech recognizers
. . .
Network structure
[Figure: network structure; artwork not reproduced]
Network structure
[Figure: network structure; artwork not reproduced]
Alternate Structure
[Figure: alternate network structure; artwork not reproduced]
Conversation/Party Model
[Figure: conversation/party model; artwork not reproduced]
Simple Conversations
[Figure: simple conversations; artwork not reproduced]
More Elaborate Conversation
[Figure: more elaborate conversation; artwork not reproduced]
Conversation Management
· one or more parties per conversation
sources and sinks of voice
non-voice participants, e.g. workstations
· conversation establishment
state machine (idle, notified, ringing, talking, ...)
parties act autonomously
notifications of state changes
· authenticated or presumed identities
· active parties use Voice Transmission Protocol
· broadcast reports during active conversation
encryption key distribution
recording started, playback finished, etc.
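The per-party state machine and its state-change notifications can be sketched as follows. The four state names come from the slide; the transition table, the class and method names, and the callback signature are assumptions for illustration, and the slide's "..." indicates further states that are not modeled here.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    NOTIFIED = "notified"
    RINGING = "ringing"
    TALKING = "talking"


# Assumed transition table: the slides name the states but not the edges.
ALLOWED = {
    State.IDLE: {State.NOTIFIED},
    State.NOTIFIED: {State.RINGING, State.IDLE},
    State.RINGING: {State.TALKING, State.IDLE},
    State.TALKING: {State.IDLE},
}


class Conversation:
    def __init__(self):
        self.parties = {}     # party name -> current State
        self.observers = []   # callables told about every state change

    def join(self, party):
        self.parties[party] = State.IDLE

    def subscribe(self, callback):
        self.observers.append(callback)

    def advance(self, party, new_state):
        """Move one party to a new state and notify all observers."""
        old = self.parties[party]
        if new_state not in ALLOWED[old]:
            raise ValueError(
                f"{party}: {old.value} -> {new_state.value} not allowed")
        self.parties[party] = new_state
        for notify in self.observers:
            notify(party, old, new_state)
```

Each party advances on its own, which mirrors "parties act autonomously"; observers stand in for the workstations and telephones that receive notifications.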
Conversation Management -- Challenges
· Novel conversation models
background calls, lectures
asymmetric participation in multiple conversations
under workstation program control
· Workstation/telephone partnership
workstation decisions override telephone
workstation tracks telephone-initiated activities
default behavior when workstation fails
· Reliability in the face of:
simultaneous actions by all parties
possibly-unreliable client code in workstation
distributed control, failures
real-time nature of the activity
Managing Recorded Voice
Goals: stored voice should be
· shareable
· editable
· available to diverse workstation clients
Realities: voice, unlike text,
· takes lots of space (64 Kbits/sec)
· cannot be printed or displayed
· requires special I/O devices
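To make "lots of space" concrete, the quoted 64 kbit/s rate can be turned into storage figures. The sketch below assumes K means 1000 and that voice is stored uncompressed at the transmission rate; the function name is illustrative.

```python
BITS_PER_SECOND = 64_000  # rate quoted on the slide, assuming K = 1000


def voice_bytes(seconds, bits_per_second=BITS_PER_SECOND):
    """Bytes needed to store `seconds` of uncompressed voice."""
    return seconds * bits_per_second // 8


one_second = voice_bytes(1)      # 8,000 bytes
one_minute = voice_bytes(60)     # 480,000 bytes
one_hour = voice_bytes(3600)     # 28,800,000 bytes
```

At this rate a one-minute message occupies about as much space as a few hundred pages of plain text, which is one reason to share voice by reference instead of copying it.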
Recorded Voice Objects
· recording/playing voice sets up conversation between Etherphone and Voice File Server
· editing operations similar to those for character strings
Concatenation, Substring, Replace, etc.
exception: need to determine talkspurts and silence intervals
· editing operations produce new (immutable) voice objects
· operations performed on server
· voice shared and manipulated by reference
· voice data never copied or decrypted
(except when played)
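The string-like editing operations can be sketched with immutable objects that only hold references to stored samples. This is a minimal illustration under assumptions: the interval representation, the field names, and the sample-offset units are not taken from the slides, and talkspurt detection is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    file_id: str   # hypothetical identifier of a stored voice file
    start: int     # sample offset within that file
    length: int    # number of samples


@dataclass(frozen=True)
class VoiceObject:
    intervals: tuple = ()

    def __len__(self):
        return sum(iv.length for iv in self.intervals)

    def concat(self, other):
        """New object referring to self's samples followed by other's."""
        return VoiceObject(self.intervals + other.intervals)

    def substring(self, start, length):
        """New object referring to samples [start, start + length)."""
        out, pos, end = [], 0, start + length
        for iv in self.intervals:
            lo, hi = max(start, pos), min(end, pos + iv.length)
            if lo < hi:
                out.append(Interval(iv.file_id, iv.start + lo - pos, hi - lo))
            pos += iv.length
        return VoiceObject(tuple(out))

    def replace(self, start, length, other):
        """New object with [start, start + length) replaced by other."""
        head = self.substring(0, start)
        tail = self.substring(start + length, len(self) - start - length)
        return head.concat(other).concat(tail)
```

Every operation returns a new object and touches only interval descriptors, so the underlying voice data is never copied and existing objects stay valid for other documents that share them.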
Editing Voice Objects
[Figure: editing voice objects; artwork not reproduced]
Conclusions
· Distributed systems issues and challenges
· Future directions
Distributed Systems Issues in the Etherphone Project
· Communication
real-time voice protocol
Ethernet (or internet) transmission
multicast for conference calls
control via RPC
scheduled requests
multicast for reports
· Concurrency management
every client also a server
rules for ensuring consistency without deadlock
· Voice management
voice manipulated by reference
voice files vs. voice objects
· Databases
used for many purposes
frequent, shared access and update
Distributed Systems Issues (cont.)
· Security
voice is DES-encrypted during transmission
remains encrypted on Voice File Server
key distribution by secure RPC
· Fault-tolerance
Voice Control Server not replicated
rely on existing phone system for backup
· Naming and binding
by name, feep name, location, etc.
· Programmability
simple methods not fully reliable
reliable methods still too complicated
Directions
· Voice architecture development
functionality
programmability
· Additional interactive telephone applications
call filtering
"background" calls
broadcast talks, meetings
conference calls with side conversations
· Additional media
still video
real-time video
· Integration of recorded voice applications with multi-media document research
improved annotation methods
scripted documents