<<Voice Project Activity Report for 2d half 1985>>
    <<Copyright © 1986 by Xerox Corporation.  All rights reserved.>>
    <<Swinehart, January 14, 1986 10:03:00 am PST>>
Executive Summary
The CSL Voice project is an effort to define and validate experimentally a prototype architecture, for which we have coined the 
term "Intervoice," that can incorporate live and recorded voice into office systems.  The methodology is to produce working, 
extensible prototypes, primarily in Cedar.
There has been significant progress during the last six months: a major revision to the telephone service has been designed and 
largely implemented; an implementation of a common database package needed to support a number of functions of the present 
Intervoice has been completed; a voice synthesis service has been added; and a preliminary Interlisp-D version of the workstation 
program was demonstrated.  An extensive questionnaire was distributed to the present Etherphone users (the most obvious customers 
for this work), soliciting comments on the adequacy of the existing design and ideas for future extensions.  Plans include the 
production of advanced applications that will exhibit the power of the underlying architecture, and a continued attempt to define 
the Intervoice architecture in a way that could accommodate components running on other hardware and in other environments.
Introduction
Voice is a vital form of interpersonal communication in the office.  As a non-interactive medium (in messages and as annotations to 
text documents) it has yet to be fully assessed, but it appears to have considerable value.  (The August 5, 1985 issue of the 
"Seybold Report on Office Systems" says: "Although voice applications have been relatively slow to take off in the United States, 
we suspect that lack of integration is the villain.  Once voice functions are truly intertwined with text and data applications in 
an organization's information systems, they will become an irresistible and compelling adjunct to any office.")  The appearance of 
integrated voice capabilities in Xerox workstation products is overdue.
The CSL Voice project is an effort to define and validate experimentally a prototype architecture, for which we have coined the 
term "Intervoice," that can incorporate live and recorded voice into office systems.  Intervoice must be able to specify the role 
of telephone transmission and switching, workstations, voice file servers, and other network services in supporting voice 
communications  both real-time telephone calls and recorded voice as it is stored, manipulated, and experienced in documents.  It 
must be an open architecture, permitting programmers to create new voice-related applications and modify existing ones without 
having to understand the detailed workings of the voice system and without endangering its normal operation.  Ideally, this 
architecture could evolve into a standard defining the role of each major component, so that multiple vendors could cooperate to 
provide advanced voice functions in conjunction with our workstations.
The methodology is to produce working, extensible prototypes, primarily in the Cedar programming environment; to demonstrate the 
present state of the architecture; and to serve as an experimental base for improving it.  The present experimental environment 
includes:
Etherphones: telephone/speakerphone instruments that communicate digitized voice and control information over the Ethernet.  Each 
includes a microcomputer and encryption hardware.  Approximately forty Etherphones are in daily use within CSL.
Telephone service: a control program that provides PBX functions and manages the communications of all the other components.  It 
runs on a dedicated server.  This service and workstation program libraries implement the client programmer interface to the 
Intervoice architecture.
Voice file service: a service that can connect to Etherphones in lieu of or in addition to other conversants.  Supports digital 
recording, playback, and flexible editing of user utterances.
Synthesized voice service: a recently added facility described in more detail below.
Workstation programs: provide enhanced user interfaces and control over the voice capabilities.  The Cedar version is called Finch, 
and has been in use for some time.
There has been significant progress during the last six months: a major revision to the telephone service has been designed and 
largely implemented; an implementation of a common database package needed to support a number of functions of the present 
Intervoice has been completed; a voice synthesis service has been added; and a preliminary Interlisp-D version of the workstation 
program was demonstrated.  An extensive questionnaire was distributed to the present Etherphone users (the most obvious customers 
for this work), soliciting comments on the adequacy of the existing design and ideas for future extensions.
Activities
Database Facilities
The voice system requires access to a number of high-performance, shared databases that may reside on servers or personal 
workstations.  A simple data management system, called LoganBerry, has been developed to support these needs.  LoganBerry treats 
data as a set of untyped key:value pairs.  Data is maintained in one or more logs, which can be stored on a local disk, an Alpine 
file server, or in stable battery-backed RAM.  To prevent a database from being corrupted by processor crashes, new database 
entries are always appended to the end of a log file; delete operations are simply logged.  Monitor locks provide the necessary 
concurrency control.  LoganBerry databases can be backed up using the DF facilities already present in Cedar.
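The append-only discipline described above can be illustrated with a minimal Python sketch. It is not LoganBerry's Cedar implementation: the class name `AppendOnlyLog`, the JSON record format, the `primary_key` parameter, and the in-memory offset index are all assumptions made for illustration. The sketch shows only the central idea that writes and deletes are both appended to the log, so existing records are never overwritten and the index can be rebuilt by replaying the log.

```python
import json
import os
import tempfile


class AppendOnlyLog:
    """Toy log-structured store (hypothetical; not the LoganBerry API)."""

    def __init__(self, path, primary_key):
        self.path = path
        self.primary_key = primary_key
        self.index = {}  # primary-key value -> byte offset of the live entry
        open(path, "ab").close()  # create the log file if it is missing
        self._replay()

    def _replay(self):
        # Rebuild the index by scanning every record from the start of the log.
        offset = 0
        with open(self.path, "rb") as f:
            for line in f:
                record = json.loads(line)
                if record["op"] == "write":
                    self.index[record["entry"][self.primary_key]] = offset
                else:  # a logged delete
                    self.index.pop(record["key"], None)
                offset += len(line)

    def _append(self, record):
        data = (json.dumps(record) + "\n").encode("utf-8")
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the record to stable storage
        return offset

    def write(self, entry):
        # A new or updated entry is appended; older versions stay in the log.
        offset = self._append({"op": "write", "entry": entry})
        self.index[entry[self.primary_key]] = offset

    def delete(self, key_value):
        # A delete is simply logged; nothing is erased from the file.
        self._append({"op": "delete", "key": key_value})
        self.index.pop(key_value, None)

    def read(self, key_value):
        offset = self.index.get(key_value)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())["entry"]


# Usage with made-up data: update an entry, delete another, then reopen the log.
path = os.path.join(tempfile.mkdtemp(), "directory.log")
log = AppendOnlyLog(path, primary_key="name")
log.write({"name": "alice", "phone": "100"})
log.write({"name": "alice", "phone": "300"})
log.delete("alice")
reopened = AppendOnlyLog(path, primary_key="name")
```

The sketch omits the concurrency control that monitor locks provide in the real system, and it does not repair a record left half-written by a crash.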
The basic query operations fetch an entry (or enumerate a range of entries) comprising a number of key:value pairs, given one such 
pair.  These queries are efficiently supported through the use of B-Tree indices.  A query package layered on top of these basic 
operations allows one to formulate queries involving multiple keys, a variety of pattern-matching techniques, and merged databases.
All LoganBerry operations can be invoked locally or via remote procedure calls (RPC).  A general-purpose browser permits 
interactive querying of LoganBerry databases.
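The basic query operations can be sketched as follows. This is an illustration under stated assumptions rather than the LoganBerry interface: a sorted list searched with Python's `bisect` module stands in for a B-Tree index, and the names `SortedIndex`, `TinyDatabase`, `fetch`, `enumerate_range`, and `fetch_all` are hypothetical. It shows an exact fetch given one key:value pair, enumeration of a range of entries, and a multi-key query layered on those two.

```python
import bisect


class SortedIndex:
    """Ordered index on one key; a sorted list stands in for a B-Tree."""

    def __init__(self):
        self._pairs = []  # sorted list of (value, entry_id)

    def add(self, value, entry_id):
        bisect.insort(self._pairs, (value, entry_id))

    def range(self, low, high):
        # (low,) sorts before every (low, entry_id), so this finds the first match.
        i = bisect.bisect_left(self._pairs, (low,))
        result = []
        while i < len(self._pairs) and self._pairs[i][0] <= high:
            result.append(self._pairs[i])
            i += 1
        return result


class TinyDatabase:
    """Entries are dicts of untyped (string) key:value pairs."""

    def __init__(self, indexed_keys):
        self.entries = []
        self.indices = {key: SortedIndex() for key in indexed_keys}

    def add(self, entry):
        entry_id = len(self.entries)
        self.entries.append(entry)
        for key, index in self.indices.items():
            if key in entry:
                index.add(entry[key], entry_id)

    def fetch(self, key, value):
        """All entries containing the pair key:value."""
        return [self.entries[i] for _, i in self.indices[key].range(value, value)]

    def enumerate_range(self, key, low, high):
        """Entries whose value for `key` lies in [low, high], in index order."""
        return [self.entries[i] for _, i in self.indices[key].range(low, high)]

    def fetch_all(self, pairs):
        """Multi-key query: entries matching every pair in the dict `pairs`."""
        id_sets = [{i for _, i in self.indices[k].range(v, v)}
                   for k, v in pairs.items()]
        common = set.intersection(*id_sets) if id_sets else set()
        return [self.entries[i] for i in sorted(common)]


# Usage with made-up directory data.
db = TinyDatabase(indexed_keys=["name", "office"])
db.add({"name": "adams", "office": "2101"})
db.add({"name": "baker", "office": "2101"})
db.add({"name": "young", "office": "2250"})
same_office = db.fetch("office", "2101")
early_names = db.enumerate_range("name", "a", "c")
```

Pattern matching and merged databases, which the query package also supports, are left out of this sketch.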
Although the creation of LoganBerry was motivated by the needs of the voice project, its facilities have wider utility.  It has 
been released as a Cedar library package, and has already been used to produce a database of publication references.
Speech Synthesis Services
In addition to the voice file server, which supports voice recording and playback, the Etherphone system now includes two voice 
synthesizers produced by different manufacturers.  Each can convert arbitrary ASCII text to reasonably intelligible audio output, 
with control of speaking speed, pitch, and other voice characteristics.  Words that do not follow usual English pronunciation rules 
can be specified as a sequence of phonemes.
Each synthesizer is connected to a dedicated Etherphone, called a text-to-speech server.  Both servers are available to any 
Etherphone-equipped workstation, on a first-come, first-served, one-user-at-a-time basis.  Normal Etherphone connections are used 
to transmit the voice from the server to the user's Etherphone.  A user or program can generate speech as easily as printing a 
message on the display.  
Uses so far have been general applications of text-to-speech in the electronic office, including: proofreading (especially 
comparing versions of a document when one version has no electronic form), audio reminders, program progress indicators, and error 
messages.  The controllability of the generated speech suggests interesting future research in "audio style" for documents, in 
which speed, pitch, and other voice characteristics could be used to communicate italicization, boldface, or quotations. 
Redesign of Telephone Service
The telephone service was designed to support a wide range of advanced capabilities.  Two years' operational experience with the 
existing design has revealed the need for an improved version, whose design and implementation have been largely completed during 
the last six months.  The primary improvements are:
A number of special-purpose databases have been replaced by LoganBerry databases, resulting in a much simpler system structure.  
These include the assignments of users to Etherphones, system-wide telephone directories, user-specific telephone options, and 
filing information needed to manage recorded voice messages.  More of the control of the system has been implemented in terms of 
databases that describe the required configurations or behavior.
It has always been a goal of the Etherphone system to give the workstation controlling a telephone priority in decisions 
such as whether to answer a call, without affecting the reliability of the underlying telephone system when the workstation 
fails or behaves poorly.  A revision of the control program gives each of the participating workstation and Etherphone 
processors more autonomy while preserving the needed priority.  It has also enabled better management of conference calls, multiple 
simultaneous calls, and the background use of the single voice channel to each office (allowing, for example, the playing of 
synthesized voice without interfering with incoming telephone calls).
Within an office setting, people often use workstations away from their own offices and telephones, or visit other offices for 
extended periods.  The system has been extended to allow the automatic forwarding of telephone calls to the location where a user 
is known to be, as well as the proper caller identification for outgoing calls.  This is an improvement over the forwarding 
capabilities of other systems, which require forwarding to be manually requested before leaving the office.
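The location-based forwarding in the last improvement above amounts to a lookup in a database of user assignments. The following Python sketch makes that concrete under several assumptions: the class `LocationDirectory`, its method names, and the rule that a login at a workstation records the nearby Etherphone are all hypothetical, and at most one visitor per Etherphone is assumed.

```python
class LocationDirectory:
    """Hypothetical model of where each user can currently be reached."""

    def __init__(self, home_phones):
        self.home_phones = dict(home_phones)  # user -> Etherphone in own office
        self.current = {}                     # user -> Etherphone where last seen

    def login(self, user, phone):
        # Called when a user starts using a workstation beside `phone`.
        self.current[user] = phone

    def logout(self, user):
        self.current.pop(user, None)

    def ring_target(self, callee):
        """Etherphone that should ring for an incoming call to `callee`."""
        return self.current.get(callee, self.home_phones[callee])

    def caller_id(self, phone):
        """Who is presumed to be placing an outgoing call from `phone`."""
        for user, where in self.current.items():
            if where == phone:
                return user  # a user known to be at this phone
        for user, home in self.home_phones.items():
            if home == phone:
                return user  # otherwise, the phone's owner
        return None


# Usage with made-up users and Etherphone names.
directory = LocationDirectory({"alice": "phone-35-2101", "bob": "phone-35-2250"})
directory.login("alice", "phone-35-2250")        # alice visits bob's office
target = directory.ring_target("alice")          # calls follow her automatically
caller = directory.caller_id("phone-35-2250")    # her outgoing calls carry her name
```

No manual forwarding request appears anywhere in the sketch, which is the point of the improvement: the forwarding follows from information the system already has.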
Intervoice Progress
The Intervoice architecture that is emerging to describe the structure and functional roles of the Etherphone system components is 
beginning to rely heavily on the use of flexible, high performance shared databases that are very hard to damage.  The ones we have 
now store declarative information, described above.  We will need to add the ability to store methods for execution when triggered 
by later events, as well as call-back information that can trigger execution in other parts of the system when the database is 
changed.  These requirements are strikingly similar to those identified by SCL for their object server.
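The call-back requirement can be pictured with a short Python sketch. It describes a capability that is planned, not one that exists, so every name in it (`TriggeringStore`, `on_change`, `put`) is hypothetical, and it shows only one plausible shape: clients register a procedure against a key, and the store invokes it whenever the stored value changes.

```python
class TriggeringStore:
    """Hypothetical key:value store that notifies clients of changes."""

    def __init__(self):
        self.data = {}
        self.callbacks = {}  # key -> list of registered procedures

    def on_change(self, key, callback):
        # Register `callback(key, old_value, new_value)` for later changes.
        self.callbacks.setdefault(key, []).append(callback)

    def put(self, key, value):
        old = self.data.get(key)
        self.data[key] = value
        if old != value:
            # Trigger execution elsewhere in the system only on a real change.
            for callback in self.callbacks.get(key, []):
                callback(key, old, value)


# Usage with made-up data: another component reacts when a user moves.
events = []
store = TriggeringStore()
store.on_change("location/alice", lambda k, old, new: events.append((k, old, new)))
store.put("location/alice", "phone-35-2101")
store.put("location/alice", "phone-35-2101")  # no change, so no call-back
store.put("location/alice", "phone-35-2250")
```

In a distributed setting the call-backs would presumably be delivered by remote procedure call, which this in-process sketch does not attempt.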
Plans
Plans include the implementation of "voice ropes", a design done last year to provide flexible primitives for editing voice.  Using 
voice ropes, LoganBerry databases, and the improved telephone service, we intend first to add full voice annotation to Tioga 
documents, then to use the result as a component of a project integrating voice messaging and telephone conversation logging with 
Walnut electronic mail facilities, seeking a common paradigm spanning interactive conversations and more conventional off-line 
messages.  LoganBerry will also be used as the basis for an improved telephone directory service, combining personal and public 
directories that can be either browsed or queried.  Use of the speech synthesizer by applications programmers will be encouraged.  
Possible applications include audio confirmation of the number dialled as a call begins, reading electronic mail over the telephone 
to a remote lab member, and playing program-generated prompts and messages to callers ("X is at the Forum now, please call back 
after 5pm").  The definition of a voice architecture to support all these facilities will continue.
[Swinehart, Terry, Zellweger]