Voice Communications Area
    The only forms of office automation most offices have yet seen are the electric typewriter, the copier, and the telephone.  Voice 
    communication is and will remain among the most powerful tools that people use to get their work done.  If Xerox is to remain a 
    force in offices it will have to understand the role of voice and the relationship of voice to its other products and services. 
    In CSL, apart from the intriguing systems problems that we have encountered while building voice applications, we have two 
    major reasons for wanting to study voice communications:
    Despite the immense investment in research and development over the last 110 years, the functionality of the telephone (now 
    extended by the answering machine and its more sophisticated voice-mail siblings) still leaves quite a bit to be desired.  The 
    telephone is easy to learn but hard to use, in the sense that an enormous proportion of attempted calls are unsuccessful, 
    either because they fail to reach the intended party, or because they succeed at inopportune times.  The attempts to integrate 
    the telephone with more powerful user interfaces and with voice mail capabilities are making some progress in taming the 
    telephone, and in exploiting the use of recorded voice in conjunction with visual documents, but all of those we have seen lack 
    much of the insight we have gained into effective ways to integrate voice with the workstation capabilities that we have begun to 
    master.
    Even so, the need to interact with the voice products and services of other vendors is even more compelling than we have found in 
    our other areas of interest (those addressed by Interpress, Interscript, XNS, international mail standards, etc.).  For example, 
    it is unlikely that Xerox would be able to capture a significant percentage of the business telephone switch market, even if we 
    knew how to or wanted to.  Having identified the functions we want our systems to provide, we need to understand how to achieve 
    them in conjunction with competitors' business telephone systems and with the existing telephone network.  This should include 
    both means for accommodating our requirements to the existing systems, and careful advice to the industry for improving the 
    interface to their capabilities.  This ought to take the form of a new voice communications architecture (in the spirit of XNS 
    and/or Interscript).
What Do We Want to Do?
    With the current Etherphone system, we have done the "means" part; now it's time to get to the "ends."  We have begun designing 
    extensions to the Etherphone that will provide:
    A more interesting set of telephone-management tools for the workstation.
    Voice ropes and some sample applications of them.  Voice ropes are fragments of recorded voice that can be segmented and combined 
    in a manner analogous to the treatment of text in Cedar.  Additional operations are required for recording, playing, and 
    identifying phrase or sentence boundaries, and for dealing with the immutable and voluminous nature of recorded voice.  The 
    applications we will pursue include Tioga-based visually-cued editors for voice, the annotation of Tioga files with voice 
    commentary, and the use of annotated Tioga documents as a means for providing integrated voice mail.
    Implementation of Etherphone-related workstation programs in other environments.  We're willing to work with programmers who would 
    like to implement Finch-like programs in XDE, Interlisp, or Smalltalk. The most likely first example will be in Interlisp, 
    since Henry Thompson's RPC implementation is nearing completion.
    We have no plans to explore the theory of speech recognition, or to incorporate any existing speech-recognition capabilities into 
    our prototypes, but we do intend to produce a design that would accommodate a successful speech-recognition implementation, 
    including provisions for indicating and assisting in the correction of incomplete or imperfect translations.
    InterVoice.  An architecture for office voice communications.
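    The voice ropes item in the list above can be made more concrete with a small sketch.  It is illustrative only, not the 
    Etherphone design:  it assumes that a rope is an immutable list of intervals pointing into stored recordings, so that editing 
    rearranges references and never copies or alters the voluminous samples.  The names Interval, VoiceRope, recording_id, and the 
    sample-count units are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Interval:
    """A span of samples [start, end) inside one immutable stored recording."""
    recording_id: str  # hypothetical key into some voice store
    start: int
    end: int

    def __len__(self) -> int:
        return self.end - self.start


@dataclass(frozen=True)
class VoiceRope:
    """An immutable sequence of intervals; every edit returns a new rope."""
    intervals: Tuple[Interval, ...] = ()

    def __len__(self) -> int:
        return sum(len(iv) for iv in self.intervals)

    def concat(self, other: "VoiceRope") -> "VoiceRope":
        return VoiceRope(self.intervals + other.intervals)

    def substring(self, start: int, end: int) -> "VoiceRope":
        if not 0 <= start <= end <= len(self):
            raise ValueError("substring bounds out of range")
        pieces = []
        offset = 0  # rope position at which the current interval begins
        for iv in self.intervals:
            lo = max(start, offset)
            hi = min(end, offset + len(iv))
            if lo < hi:  # this interval overlaps the requested span
                pieces.append(Interval(iv.recording_id,
                                       iv.start + (lo - offset),
                                       iv.start + (hi - offset)))
            offset += len(iv)
        return VoiceRope(tuple(pieces))

    def replace(self, start: int, end: int, new: "VoiceRope") -> "VoiceRope":
        """Splice `new` in place of samples [start, end), as in text editing."""
        head = self.substring(0, start)
        tail = self.substring(end, len(self))
        return head.concat(new).concat(tail)


# Replace the middle of one recording with part of another.
greeting = VoiceRope((Interval("rec-A", 0, 8000),))
comment = VoiceRope((Interval("rec-B", 100, 2100),))
edited = greeting.replace(2000, 6000, comment)
```

    In this sketch the edited rope holds three intervals (the head of rec-A, the excerpt of rec-B, and the tail of rec-A), while 
    both underlying recordings remain untouched.  Recording, playback, and the detection of phrase or sentence boundaries are 
    left out.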
    The architecture proposal needs further discussion.  Architectures as diverse as Interpress and XNS have two important attributes 
    in common:
    They describe, in exacting detail, the range of capabilities that their clients have available, the way those capabilities are 
    structured, and exactly how they are used.  XNS describes the interfaces that clients use to communicate with other 
    applications; Interpress defines the file formats that clients need to obey in order to get documents printed.  In both cases, 
    it is possible to define precisely, as subsets of the total architectures, any restrictions that a particular implementation 
    might place on its clients.
    They serve as detailed specifications for the implementors of the services that are needed to support their clients.  The best 
    example comes from XNS:  each layer in the architecture has to be implemented by somebody, whose clients will then implement 
    the higher levels.  This is very important for voice; we want to be able to specify precisely the features that a PBX should 
    provide, for example.
    In the voice area, one can identify some simple protocols (e.g., the emerging ISDN) at about the level of RS-232, perhaps SDLC, or 
    the lower levels of XNS, but higher-level architectures tend to be implicit in the implementations of telephone switching 
    systems, PBXs and the like.  All the existing proposals for integrating data and voice, or integrating local area networks and 
    PBXs, inside and outside of Xerox, are expressed at this primitive level, so they are not as interesting as they sound at 
    first.  To use a specific example, they would give little guidance for how to build an integrated Etherphone-like system using 
    a Northern Telecom SL-1 switch and an EMS voice mail machine (the one Xerox sells now).  Given an InterVoice architecture, it 
    should be much easier to decide which pieces of it Xerox wants to buy and which pieces it wants to make.  With any luck, it 
    could be presented as a de facto standard in the spirit of XNS.  We are attempting to cast the design for extensions to the 
    Etherphone prototype in terms of such an architecture.
    We have identified functions, and flexible methods for providing them, that dominate anything we have seen in the market or in 
    research projects we know of.  We also know of no attempt, successful or otherwise, to produce a comprehensive architectural 
    specification for the use of voice in personal information systems; "InterVoice" has a very good chance of serving as an 
    archetype for this kind of thing.