Cedar Voice System
September 1986

The Cedar Voice System



Douglas B. Terry

in collaboration with

Daniel C. Swinehart
Polle T. Zellweger

Computer Science Laboratory
Xerox Palo Alto Research Center
Goals of the Voice Project
· Taming the telephone
better placement and receipt of calls
better handling of special features
better human-assisted call handling
· Voice as data
recorded voice messages
synthesized text-to-speech messages
voice-annotated documents
integration with other interactive applications
· Programmer control over all of the above
· Not speech recognition!
Etherphones
· speak Voice Transmission Protocol
· respond to control commands (RPC)
set up connection
generate dial tone, busy signal
· report events
off hook, ringing, touch tone buttons
· provide analog interface to telephone network
Voice Control Server
· manages connections
· monitors and controls state of each Etherphone
· coordinates workstation user interfaces
· stores voice objects
· maintains databases
Etherphone-workstation assignments
directory assistance (white/yellow pages)
ring tunes
Voice File Server
· speaks Voice Transmission Protocol
· records voice files
· plays voice files
· stores ~9.5 hours of voice (300 Mbytes)
· performs backup of voice files
· maintains file directory
· participates in garbage collection
Workstations
· user interface for telephony
call placement
call status
white pages
· user interface for recorded voice
mail system
voice editing
· program access to voice and telephone features
Voice Transmission Protocol
· 64 Kbits/sec = 50 packets/sec
· 20 msec packet = 160 8-bit samples
· silence detection => 50% fewer packets sent
· DES encryption
· ~40 msec end-to-end delay
20 msec — packetization
10 msec — encryption and delivery
10 msec — anti-jitter
· no retransmission
· capacity
10 Mbit ethernet => ~225 users
3 Mbit ethernet => ~100 users
1.5 Mbit ethernet => ~60 users
... assuming no data traffic
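The packet and delay figures on this slide follow from simple arithmetic. A minimal sketch in Python, assuming 8000 samples/sec (implied by 64 Kbits/sec at 8 bits per sample). The helper `raw_one_way_streams` is hypothetical: it gives a payload-only upper bound on one-way streams and ignores packet headers, two-way conversations, and Ethernet utilization limits, so it is not expected to reproduce the user estimates above.

```python
SAMPLE_RATE_HZ = 8000   # assumption: implied by 64 Kbits/sec with 8-bit samples
BITS_PER_SAMPLE = 8
PACKET_MSEC = 20

bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE                  # bits/sec per stream
samples_per_packet = SAMPLE_RATE_HZ * PACKET_MSEC // 1000    # samples in one packet
packets_per_sec = 1000 // PACKET_MSEC                        # packets/sec per stream

# Delay budget as itemized on the slide (milliseconds).
delay_budget_msec = {
    "packetization": 20,
    "encryption and delivery": 10,
    "anti-jitter": 10,
}
end_to_end_msec = sum(delay_budget_msec.values())


def raw_one_way_streams(link_bits_per_sec, silence_fraction=0.5):
    """Payload-only upper bound on simultaneous one-way voice streams.

    silence_fraction is the share of packets suppressed by silence
    detection (0.5 per the slide). Headers and contention are ignored.
    """
    effective_bits_per_sec = bit_rate * (1 - silence_fraction)
    return int(link_bits_per_sec // effective_bits_per_sec)


if __name__ == "__main__":
    print(bit_rate, samples_per_packet, packets_per_sec, end_to_end_msec)
    print(raw_one_way_streams(10_000_000))
```

The gap between this raw bound and the per-network user counts on the slide reflects overheads the slide does not itemize.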
Conversation Management
· one or more parties per conversation
sources and sinks of voice
non-voice participants, e.g. workstations
· conversation establishment
state machine (idle, notified, ringing, ringback, active, ...)
parties act autonomously
notifications of state changes
· authenticated or presumed identities
· active parties use Voice Transmission Protocol
· broadcast reports during active conversation
encryption key distribution
recording started, playback finished, etc.
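The conversation model above can be sketched as a per-party state machine with notifications sent to every subscriber. This is an illustration only: the state names come from the slide, but the transition table, the class names `Party` and `Conversation`, and the callback signature are all assumptions, not the Voice Control Server's actual interface.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    NOTIFIED = auto()
    RINGING = auto()
    RINGBACK = auto()
    ACTIVE = auto()


# Hypothetical transition table: the slide names the states but does not
# list which transitions are legal.
ALLOWED = {
    State.IDLE: {State.NOTIFIED, State.RINGBACK},
    State.NOTIFIED: {State.RINGING, State.IDLE},
    State.RINGING: {State.ACTIVE, State.IDLE},
    State.RINGBACK: {State.ACTIVE, State.IDLE},
    State.ACTIVE: {State.IDLE},
}


class Party:
    """A participant: an Etherphone, a server, or a workstation."""

    def __init__(self, name):
        self.name = name
        self.state = State.IDLE


class Conversation:
    def __init__(self):
        self.parties = []
        self.observers = []  # callables taking (party_name, old_state, new_state)

    def join(self, party):
        self.parties.append(party)

    def subscribe(self, callback):
        self.observers.append(callback)

    def advance(self, party, new_state):
        """Move one party to a new state and notify every observer."""
        if new_state not in ALLOWED[party.state]:
            raise ValueError(f"{party.name}: {party.state.name} -> {new_state.name} not allowed")
        old_state = party.state
        party.state = new_state
        for notify in self.observers:
            notify(party.name, old_state, new_state)
```

Each party advances on its own, which mirrors the "parties act autonomously" point: the conversation only validates the step and reports it.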
Recorded Voice Objects
· recording/playing voice sets up conversation between Etherphone and Voice File Server
· editing operations similar to those for character strings
Concatenation, Substring, Replace, etc.
exception: need to determine talkspurts and silence intervals
· editing operations produce new voice objects
· operations performed on server
· voice data never copied or decrypted
(except when played)
Voice Storage
· two-level storage hierarchy: tunes and voice objects
many-to-many relationship between tunes and voice objects
· tunes
encrypted voice samples (block-mode)
stored on Voice File Server
· voice objects
sequence of tune intervals <tune#, start, length>
structure stored in database
· editing builds up complex voice object structures
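A voice object as a sequence of tune intervals, with string-like editing that never touches the stored samples, can be sketched as follows. The names `Interval`, `VoiceObject`, `concat`, `substring`, and `replace` are illustrative, and measuring `start` and `length` in samples is an assumption; the slides do not give the units or the actual interface.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Interval:
    tune: int     # tune number on the Voice File Server
    start: int    # offset within the tune (assumed to be in samples)
    length: int   # extent of the interval (assumed to be in samples)


@dataclass(frozen=True)
class VoiceObject:
    intervals: Tuple[Interval, ...]

    def __len__(self):
        return sum(iv.length for iv in self.intervals)


def concat(a, b):
    """New voice object: a followed by b. No voice data is copied."""
    return VoiceObject(a.intervals + b.intervals)


def substring(v, start, length):
    """New voice object covering [start, start + length) of v."""
    if start < 0 or length < 0 or start + length > len(v):
        raise ValueError("range outside voice object")
    end = start + length
    out = []
    pos = 0  # position in v of the current interval's first sample
    for iv in v.intervals:
        lo = max(start, pos)
        hi = min(end, pos + iv.length)
        if lo < hi:  # the requested range overlaps this interval
            out.append(Interval(iv.tune, iv.start + (lo - pos), hi - lo))
        pos += iv.length
    return VoiceObject(tuple(out))


def replace(v, start, length, new):
    """New voice object with [start, start + length) of v replaced by new."""
    head = substring(v, 0, start)
    tail = substring(v, start + length, len(v) - start - length)
    return concat(concat(head, new), tail)
```

Because every operation returns a new immutable object that only references tunes, the encrypted samples stay where they are, and repeated edits simply produce longer interval lists.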
Data Management
[Figure: LBbtree.ip (Interpress artwork)]
· voice database requirements
simple data model, good performance, sharing
· operation logs
append-only (or read-only)
human readable and editable
stored on a variety of file systems
· indexing provided by BTrees
BTree entry = [log id, byte address]
· logs represent the truth!
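The log-plus-index arrangement can be sketched as below. This is an illustration under stated assumptions: a Python dict stands in for the BTree, an in-memory buffer stands in for a log file, and the `key: value` entry format and the class names `Log` and `Database` are invented for the example. The point it shows is that the index holds only [log id, byte address] pairs and can always be rebuilt by scanning the logs.

```python
import io


class Log:
    """Append-only log of human-readable, newline-terminated entries."""

    def __init__(self, log_id):
        self.log_id = log_id
        self._buf = io.BytesIO()  # stand-in for a file on some file system

    def append(self, key, value):
        """Write one entry at the end and return its byte address."""
        self._buf.seek(0, io.SEEK_END)
        address = self._buf.tell()
        self._buf.write(f"{key}: {value}\n".encode("utf-8"))
        return address

    def read_at(self, address):
        """Return (key, value) for the entry starting at address."""
        self._buf.seek(address)
        line = self._buf.readline().decode("utf-8").rstrip("\n")
        key, _, value = line.partition(": ")
        return key, value

    def scan(self):
        """Yield (key, byte address) for every entry, oldest first."""
        address = 0
        for line in self._buf.getvalue().splitlines(keepends=True):
            yield line.decode("utf-8").partition(": ")[0], address
            address += len(line)


class Database:
    def __init__(self, logs):
        self.logs = {log.log_id: log for log in logs}
        self.index = {}  # dict standing in for a BTree: key -> (log id, byte address)
        self.rebuild_index()

    def rebuild_index(self):
        """Recreate the index from the logs alone; later entries supersede earlier ones."""
        self.index.clear()
        for log in self.logs.values():
            for key, address in log.scan():
                self.index[key] = (log.log_id, address)

    def write(self, log_id, key, value):
        address = self.logs[log_id].append(key, value)
        self.index[key] = (log_id, address)

    def read(self, key):
        log_id, address = self.index[key]
        return self.logs[log_id].read_at(address)[1]
```

Updates are new appended entries rather than overwrites, so a lost or corrupted index costs only a rescan of the logs.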
Status of the Voice System
· ~50 Etherphones in daily use in CSL
· Voice Control Server and Voice File Server stable but evolving
· Also, Text-to-Speech server
· Voice Mail and telephone directory services are main applications
· New applications: voice-annotated documents, dictation machine, automated scripts, etc.