Voice Communications Area

The only forms of office automation most offices have yet seen are the electric typewriter, the copier, and the telephone. Voice communication is and will remain among the most powerful tools that people use to get their work done. If Xerox is to remain a force in offices, it will have to understand the role of voice and the relationship of voice to its other products and services.

In CSL, apart from the intriguing systems problems that we have encountered while building voice applications, we have two major reasons for wanting to study voice communications:

Despite the immense investment in research and development over the last 110 years, the functionality of the telephone (now extended by the answering machine and its more sophisticated voice-mail siblings) still leaves quite a bit to be desired. The telephone is easy to learn but hard to use, in the sense that an enormous proportion of attempted calls are unsuccessful, either because they fail to reach the intended party or because they succeed at inopportune times. Attempts to integrate the telephone with more powerful user interfaces and with voice-mail capabilities are making some progress in taming the telephone, and in exploiting recorded voice in conjunction with visual documents, but all the ones we've seen lack much of the insight we have into effective ways to integrate voice with the workstation capabilities that we have begun to master.

Even so, the need to interact with the voice products and services of other vendors is even more compelling than we have found in our other areas of interest (those addressed by Interpress, Interscript, XNS, international mail standards, etc.). For example, it is unlikely that Xerox would be able to capture a significant percentage of the business telephone switch market, even if we knew how to or wanted to.
Having identified the functions we want our systems to provide, we need to understand how to achieve them in conjunction with competitors' business telephone systems and with the existing telephone network. This should include both means for accommodating our requirements to the existing systems and careful advice to the industry for improving the interface to their capabilities. The result ought to take the form of a new voice communications architecture (in the spirit of XNS and/or Interscript).

What Do We Want to Do?

With the current Etherphone system, we have done the "means" part; now it's time to get to the "ends." We have begun designing extensions to the Etherphone that will provide:

A more interesting set of telephone-management tools for the workstation.

Voice ropes and some sample applications of them. Voice ropes are fragments of recorded voice that can be segmented and combined in a manner analogous to the treatment of text in Cedar. Additional operations are required for recording, playing, and identifying phrase or sentence boundaries, and for dealing with the immutable and voluminous nature of recorded voice. The applications we will pursue include Tioga-based visually cued editors for voice, the annotation of Tioga files with voice commentary, and the use of annotated Tioga documents as a means of providing integrated voice mail.

Implementation of Etherphone-related workstation programs in other environments. We're willing to work with programmers who would like to implement Finch-like programs in XDE, Interlisp, or Smalltalk. The most likely first example will be in Interlisp, since Henry Thompson's RPC implementation is nearing completion.
We have no plans to explore the theory of speech recognition, or to incorporate any existing speech-recognition capabilities into our prototypes, but we do intend to produce a design that would accommodate a successful speech-recognition implementation, including provisions for indicating and assisting in the correction of incomplete or imperfect translations.

InterVoice. An architecture for office voice communications.

The architecture proposal needs further discussion. Architectures as diverse as Interpress and XNS have two important attributes in common:

They describe, in exacting detail, the range of capabilities that their clients have available, the way those capabilities are structured, and exactly how they are used. XNS describes the interfaces that clients use to communicate with other applications; Interpress defines the file formats that clients need to obey in order to get documents printed. In both cases, it is possible to define precisely, as subsets of the total architectures, any restrictions that a particular implementation might place on its clients.

They serve as detailed specifications for the implementors of the services that are needed to support their clients. The best example comes from XNS: each layer in the architecture has to be implemented by somebody, whose clients will then implement the higher levels. This is very important for voice; we want to be able to specify precisely the features that a PBX should provide, for example.

In the voice area, one can identify some simple protocols (e.g., the emerging ISDN) at about the level of RS-232, perhaps SDLC, or the lower levels of XNS, but higher-level architectures tend to be implicit in the implementations of telephone switching systems, PBXs, and the like. All the existing proposals for integrating data and voice, or integrating local area networks and PBXs, inside and outside of Xerox, are expressed at this primitive level, so they are not as interesting as they sound at first.
To use a specific example, these proposals would give little guidance on how to build an integrated Etherphone-like system using a Northern Telecom SL-1 switch and an EMS voice mail machine (the one Xerox sells now).

Given an InterVoice architecture, it should be much easier to decide which pieces of it Xerox wants to buy and which pieces it wants to make. With any luck, it could be presented as a de facto standard in the spirit of XNS. We are attempting to cast the design for extensions to the Etherphone prototype in terms of such an architecture. We have identified functions, and flexible methods for providing them, that dominate anything we have seen in the market or in research projects we know of. We also know of no attempt, successful or otherwise, to produce a comprehensive architectural specification for the use of voice in personal information systems; "InterVoice" has a very good chance of serving as an archetype for this kind of thing.
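
As a rough illustration of the voice ropes listed among the planned Etherphone extensions, the following Python sketch shows one way that recorded voice could be segmented and combined like text while the recordings themselves stay immutable. It is a sketch under stated assumptions, not the Etherphone design: the representation (a rope as a sequence of intervals over stored recordings), the names Interval and VoiceRope, and the recording identifiers are all hypothetical and do not come from this report.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Interval:
    """A run of samples inside one stored recording (never copied or changed)."""
    recording_id: str  # hypothetical key into a voice file store
    start: int         # index of the first sample of the run
    length: int        # number of samples in the run


@dataclass(frozen=True)
class VoiceRope:
    """An immutable sequence of intervals; every edit returns a new rope."""
    intervals: Tuple[Interval, ...] = ()

    def __len__(self) -> int:
        return sum(iv.length for iv in self.intervals)

    def concat(self, other: "VoiceRope") -> "VoiceRope":
        return VoiceRope(self.intervals + other.intervals)

    def substr(self, start: int, length: int) -> "VoiceRope":
        """Return samples [start, start + length) as a new rope."""
        if start < 0 or length < 0 or start + length > len(self):
            raise ValueError("substring out of range")
        end = start + length
        out = []
        pos = 0  # rope position of the current interval's first sample
        for iv in self.intervals:
            lo = max(start, pos)
            hi = min(end, pos + iv.length)
            if lo < hi:  # this interval overlaps the requested range
                out.append(Interval(iv.recording_id, iv.start + (lo - pos), hi - lo))
            pos += iv.length
        return VoiceRope(tuple(out))

    def replace(self, start: int, length: int, new: "VoiceRope") -> "VoiceRope":
        """Return a rope with samples [start, start + length) replaced by `new`."""
        if length < 0:
            raise ValueError("length must be non-negative")
        head = self.substr(0, start)
        tail = self.substr(start + length, len(self) - start - length)
        return head.concat(new).concat(tail)


# Usage: splice a short correction into the middle of a longer recording.
greeting = VoiceRope((Interval("rec-1", 0, 8000),))
correction = VoiceRope((Interval("rec-2", 0, 2000),))
edited = greeting.replace(3000, 1000, correction)
```

Because an edit only rearranges references to stored samples, the voluminous recordings are never copied, and the original rope remains valid after the edit. Recording, playback, and phrase-boundary detection would need further operations that this sketch does not attempt.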