Distributed Systems Support for Voice in Cedar
October 1986



Douglas B. Terry
Daniel C. Swinehart
Polle T. Zellweger

Computer Science Laboratory
Xerox Palo Alto Research Center
Plot
· User interface and system overview (video)
· Project goals
original, functional goals
architectural goals
· Towards an architecture for voice integration
· Voice ropes: management of recorded voice
· Conclusions
(show videotape here.)
Original Voice Project Goals
· Taming the telephone
better placement and receipt of calls
better handling of special features
better human-assisted call handling
· Voice as data
recorded voice messages
synthesized text-to-speech messages
speech recognition (not yet)
voice-annotated documents
integration with other interactive applications
· Programmer control over all of the above
Overall Goal: A Voice Systems Architecture
· User-programmable
language-level interfaces
simple things are simple
complicated things make sense
acceptable impact on reliability
· Supports interoperability
multiple workstations and environments
multiple networks and protocols
multiple telephone transmission and switching choices
· Extensible
admits new applications
admits new services
admits new workstations, networks, . . .
Voice Architecture Layering
[Figure: voice architecture layers (Interpress artwork IntervoiceLayers.IP, not reproduced)]
Conversation Management
· one or more parties per conversation
sources and sinks of voice
non-voice participants, e.g. workstations
· conversation establishment
state machine (idle, notified, ringing, ringback, active, ...)
parties act autonomously
notifications of state changes
· authenticated or presumed identities
· active parties use Voice Transmission Protocol
· broadcast reports during active conversation
encryption key distribution
recording started, playback finished, etc.
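A minimal sketch of the per-party state machine described above, in Python. The state names idle, notified, ringing, ringback, and active come from the slide; the slide elides the remaining states and lists no transitions, so the event names, the transition table, and the Party class are assumptions made only for illustration.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    NOTIFIED = "notified"
    RINGING = "ringing"
    RINGBACK = "ringback"
    ACTIVE = "active"


# Hypothetical transition table: (current state, event) -> next state.
# Event names are invented for this sketch.
TRANSITIONS = {
    (State.IDLE, "place_call"): State.RINGBACK,        # caller side
    (State.IDLE, "incoming"): State.NOTIFIED,          # callee side
    (State.NOTIFIED, "ring"): State.RINGING,
    (State.RINGING, "answer"): State.ACTIVE,
    (State.RINGBACK, "far_end_answered"): State.ACTIVE,
}


class Party:
    """One conversation participant; each party changes state on its own."""

    def __init__(self, name):
        self.name = name
        self.state = State.IDLE
        self.observers = []  # callbacks that receive state-change notifications

    def handle(self, event):
        """Apply an event; return False if it is not valid in the current state."""
        if event == "hang_up" and self.state is not State.IDLE:
            new_state = State.IDLE  # assumed: hanging up is legal in any non-idle state
        else:
            new_state = TRANSITIONS.get((self.state, event))
        if new_state is None:
            return False
        old_state, self.state = self.state, new_state
        for observer in self.observers:
            observer(self.name, old_state, new_state)
        return True
```

Rejecting an invalid event instead of raising keeps one misbehaving client from disturbing the other parties, which matches the autonomy the slide asks for; a real implementation would also have to handle simultaneous actions by several parties.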
Conversation Management -- Challenges
· Novel conversation models
background calls, lectures, "watercooler" simulators
asymmetric participation in (multiple) conversations
under workstation program control
· Workstation/telephone partnership
workstation decisions override telephone decisions
workstation tracks telephone-initiated activities
default behavior when workstation fails or drops out
· Reliability in the face of:
simultaneous actions by all parties
possibly-unreliable client code in workstation
distributed control, failures
real-time nature of the activity
Managing Recorded Voice
Goals: stored voice should be
· shareable
· editable
· available to diverse workstation clients
Realities: voice, unlike text,
· takes lots of space (64 Kbits/sec)
· cannot be printed or displayed
· requires special I/O devices
Recorded Voice Objects
· recording/playing voice sets up conversation between Etherphone and Voice File Server
· editing operations similar to those for character strings
Concatenation, Substring, Replace, etc.
exception: need to determine talkspurts and silence intervals
· editing operations produce new (immutable) voice objects
· operations performed on server
· voice shared and manipulated by reference
· voice data never copied or decrypted
(except when played)
Voice Storage
· two-level storage hierarchy: tunes and voice objects
many-to-many relationship between tunes and voice objects
· tunes
encrypted voice samples (block-mode)
stored on Voice File Server
· voice objects
sequence of tune intervals <tune#, start, length>
structure stored in database
· editing builds up complex voice object structures
premise: number of edits is small
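The storage structure above can be sketched as follows: a voice object is an immutable sequence of <tune#, start, length> intervals, and each editing operation builds a new sequence without touching the stored samples. The class and function names are assumptions, as is the choice of samples as the unit for start and length, which the slide does not specify.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Interval:
    tune: int    # tune number on the Voice File Server
    start: int   # offset into the tune (assumed unit: samples)
    length: int  # extent of the interval (assumed unit: samples)


@dataclass(frozen=True)
class VoiceObject:
    intervals: Tuple[Interval, ...] = ()

    def length(self) -> int:
        return sum(iv.length for iv in self.intervals)


def concatenate(a: VoiceObject, b: VoiceObject) -> VoiceObject:
    return VoiceObject(a.intervals + b.intervals)


def substring(v: VoiceObject, start: int, length: int) -> VoiceObject:
    if start < 0 or length < 0 or start + length > v.length():
        raise ValueError("range lies outside the voice object")
    end = start + length
    out = []
    pos = 0  # position of the current interval within the voice object
    for iv in v.intervals:
        iv_end = pos + iv.length
        lo, hi = max(start, pos), min(end, iv_end)
        if lo < hi:  # the requested range overlaps this interval
            out.append(Interval(iv.tune, iv.start + (lo - pos), hi - lo))
        pos = iv_end
    return VoiceObject(tuple(out))


def replace(v: VoiceObject, start: int, length: int, w: VoiceObject) -> VoiceObject:
    head = substring(v, 0, start)
    tail = substring(v, start + length, v.length() - (start + length))
    return concatenate(concatenate(head, w), tail)
```

Every edit can add intervals to the result, so lookups slow down as a voice object accumulates edits; that cost stays tolerable under the slide's premise that the number of edits is small.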
Voice Interests
· garbage collection:
A tune is garbage if it is not a component of any voice object.
A voice object is garbage if no client has an interest in that voice object.
· interests
similar to reference counts
grouped according to classes
stored in a database
· client operations
Retain — registers interest in a voice object
Forget — drops interest in a voice object
· class-specific garbage collection for interests
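A rough sketch of how Retain, Forget, and the two garbage rules fit together. The in-memory InterestDB class stands in for the database and is purely hypothetical; treating an interest as a (class, client reference, voice object) triple, so that repeated Retain calls are idempotent, is also an assumption.

```python
class InterestDB:
    """Hypothetical in-memory stand-in for the interest database."""

    def __init__(self):
        self.interests = set()    # (class_name, client_ref, voice_id) triples
        self.voice_objects = {}   # voice_id -> set of component tune numbers
        self.tunes = set()        # tune numbers held on the file server

    def add_voice_object(self, voice_id, tune_numbers):
        self.voice_objects[voice_id] = set(tune_numbers)
        self.tunes |= set(tune_numbers)

    def retain(self, voice_id, class_name, client_ref):
        """Register an interest in a voice object."""
        self.interests.add((class_name, client_ref, voice_id))

    def forget(self, voice_id, class_name, client_ref):
        """Drop a previously registered interest, if present."""
        self.interests.discard((class_name, client_ref, voice_id))

    def collect(self):
        """Apply both garbage rules; return (dead voice objects, dead tunes)."""
        live = {voice_id for (_, _, voice_id) in self.interests}
        dead_voice = sorted(v for v in self.voice_objects if v not in live)
        for voice_id in dead_voice:
            del self.voice_objects[voice_id]
        # A tune survives only while some remaining voice object uses it.
        used = set().union(*self.voice_objects.values())
        dead_tunes = sorted(self.tunes - used)
        self.tunes &= used
        return dead_voice, dead_tunes
```

Voice objects are collected before tunes because the many-to-many relationship means a tune can only be freed once the last voice object that refers to it is gone.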
Sample Interest Classes
· TiogaVoice
interest registered when file copied to file server
interest information includes file name
voice object collected if file no longer exists
· Timeout
interest information includes timeout period
voice object collected if time since creation is greater than timeout
· WalnutMsg
only explicit garbage collection
interest registered when message received
interest dropped when message deleted
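The three sample classes differ only in how an interest expires, which the sketch below expresses as one predicate per class. The Interest record, the info dictionary keys, and the sweep function are assumptions for illustration; the expiry rules themselves follow the slide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Interest:
    voice_id: str
    class_name: str                           # "TiogaVoice", "Timeout", or "WalnutMsg"
    info: dict = field(default_factory=dict)  # class-specific interest information


def timeout_expired(interest: Interest, created_at: float, now: float) -> bool:
    # Timeout class: expires once the voice object is older than the timeout period.
    return now - created_at > interest.info["timeout"]


def tioga_expired(interest: Interest, file_exists: Callable[[str], bool]) -> bool:
    # TiogaVoice class: expires once the annotated file no longer exists.
    return not file_exists(interest.info["file_name"])


def sweep(interests: List[Interest], created_at: Dict[str, float], now: float,
          file_exists: Callable[[str], bool]) -> List[Interest]:
    """Return the interests that survive class-specific collection."""
    kept = []
    for it in interests:
        if it.class_name == "Timeout" and timeout_expired(it, created_at[it.voice_id], now):
            continue
        if it.class_name == "TiogaVoice" and tioga_expired(it, file_exists):
            continue
        # WalnutMsg interests are only ever dropped explicitly, never by a sweep.
        kept.append(it)
    return kept
```

A voice object whose last interest disappears in such a sweep then becomes garbage under the rule on the previous slide.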
(scenario on sending voice mail goes here.)
Distributed Systems Issues
in the Voice Project
· Communication
Real-time voice protocol
Ethernet (or internet) transmission
64 Kbits/sec sent as 50 packets/sec (160 bytes of voice per packet)
multicast for conference calls
Control via RPC
scheduled requests
multicast for reports
· Voice management
voice manipulated by reference
voice files vs. voice objects
· Security
voice DES encrypted during transmission
remains encrypted on Voice File Server
key distribution by secure RPC
· Fault-tolerance
Voice Control Server not replicated
rely on existing phone system for backup
· Naming and binding
by name, feep name, location, etc.
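The transmission figures under Communication can be checked with a little arithmetic. This sketch assumes that 64 Kbits/sec means 64,000 bits per second and ignores packet headers and encryption overhead; the function names are illustrative.

```python
BIT_RATE = 64_000       # voice bit rate from the slide, in bits per second
PACKETS_PER_SEC = 50    # packet rate from the slide


def voice_bytes_per_packet(bit_rate=BIT_RATE, packets_per_sec=PACKETS_PER_SEC):
    """Voice payload carried by each packet, in bytes."""
    return bit_rate // 8 // packets_per_sec


def packet_interval_ms(packets_per_sec=PACKETS_PER_SEC):
    """Time between successive packets, in milliseconds."""
    return 1000 / packets_per_sec


def storage_bytes(seconds, bit_rate=BIT_RATE):
    """Space needed to store the given duration of recorded voice."""
    return bit_rate // 8 * seconds
```

Under these assumptions each packet carries 160 bytes, which is 20 ms of speech, and one minute of recorded voice occupies 480,000 bytes.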