last edited by Doug Terry, June 30, 1986 3:29:33 pm PDT
Copyright © 1986 by Xerox Corporation. All rights reserved.
Voice Ropes: Motivations, Design, and Implementation
Doug Terry
Introduction
The notion of "voice ropes", voice sequences that can be easily recorded, manipulated, and played, has been a goal of the Voice Project in CSL for quite some time. This document outlines the current design for a system providing voice ropes. This design has evolved over the years to its present form. In particular, Dan Swinehart and Luis Cabrera worked on an early version of such a design, which was distributed to VoiceProject^.pa on December 10, 1984. The following section outlining the motivation for voice ropes was adapted from Dan's message.
Motivation
We have a couple of years or so of experience now with the management of recorded voice messages. The interface at present deals with very low-level objects: "tunes", or voice files, and intervals within them. Any higher-level structures have been left to client programs, namely Finch. For a number of reasons, primary among them the desire to have a permanent representation of compositions of recorded voice segments and the desire to allow access to these capabilities from other programming environments and from the stand-alone telephone, we are proposing to define an editing and storage facility for recorded voice, to be located on the same server that implements Bluejay. This will have the additional benefit of moving the database code for managing the stored voice information to the server.
There are three problems: how to edit the voice (and store the edits, since rearranging the actual voice bits is not practical), how to denote ownership of the resulting structures, and what the client interface should be (what the functions are and how they interact with the other voice protocols).
To solve the first problem, we have designed a package of functions based on the model of the Cedar Rope. If one thinks of an utterance as an immutable sequence of voice samples, then the capabilities for composing new ropes from old -- concatenation, substring, possibly replace as a compound operation -- are just the ones we need to edit voice. Additional functions are needed for recording, playback, and some specialized operations such as determining where the talkspurts are within a "voice rope."
Although we will use the model of a rope, we do not propose to implement the package in terms of Cedar ropes. Unlike Cedar ropes, these objects are going to have to outlast any given Cedar session, so there will need to be a permanent representation for them -- a database. Moreover, the "by-value" model of passing REFs around using RPC would not be suitable when applied to voice ropes, so we need a different kind of identifier than a REF to a rope structure for representing them.
Voice Ropes
A voice rope is an object that represents a segment of recorded voice. Thus, it is an object with a length and a collection of voice samples. Voice ropes are denoted by unique identifiers, which are themselves ROPEs. The generation of identifiers is up to the voice rope service.
VoiceRope: TYPE = RPC.ShortROPE; -- Identifies the voice rope.
A distinction exists between a voice rope and a voice rope's ID. A voice rope always resides on the Voice server; only voice rope IDs are given out to clients. The type defined above, VoiceRope, is actually a voice rope ID. Sometimes the distinction between a voice rope and its ID is blurred in this document. A voice rope ID is created for each immutable voice rope. Operations that manipulate voice ropes create new ones with new IDs.
The implementation of voice ropes is as a sequence of intervals within tunes. Simple voice ropes consist of a single interval within a single tune, perhaps the whole tune. More complex voice ropes can be constructed by the editing operations provided in the next section. The internal structure of a voice rope is dependent solely on what operations have been performed to construct the voice rope; the structure says nothing about the silence intervals within a voice rope, for instance.
A voice rope ID can be simply a timestamp of when the voice rope was created. However, the granularity of the timestamp must be fine enough that voice ropes created in succession have unique IDs. A reasonable algorithm for generating voice rope IDs is to concatenate the current time (in seconds) with the creator's RName; if several ropes are created in the same second then time is temporarily moved forward so that unique IDs are assured. The algorithm assumes that voice ropes are generated no more frequently than once per second on the average.
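The ID algorithm just described can be sketched as follows. This is only an illustration in Python, not the service's code: the class name, the injectable clock, and the "<seconds>#<RName>" layout of the ID are assumptions made for the example. For simplicity the sketch moves time forward for any creator, not just for the one that collided.

```python
import time


class VoiceRopeIdGenerator:
    """Issues IDs of the form '<seconds>#<creator>' (hypothetical layout)."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so the sketch can be exercised
        self._last_issued = 0    # last second value handed out

    def new_id(self, creator: str) -> str:
        now = int(self._clock())
        # If this second has already been used, temporarily move time forward.
        stamp = max(now, self._last_issued + 1)
        self._last_issued = stamp
        return f"{stamp}#{creator}"


# Two ropes created within the same second still receive distinct IDs.
gen = VoiceRopeIdGenerator(clock=lambda: 1000)
first = gen.new_id("Terry.pa")
second = gen.new_id("Terry.pa")
```

As long as ropes are created no more often than once per second on average, the issued timestamps drift back toward real time.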
Operations on Voice Ropes
The following operations on voice ropes rely mostly on the underlying Bluejay facilities. A short description of how to implement the operation follows each procedure declaration. Some additional parameters may be needed. To support voice editing efficiently, all of the routines actually take an interval of a voice rope as an argument.
Record: PROC [] RETURNS [VoiceRope];
Calls on RecordTune to record a new tune. A VoiceRope is then created with the new tune as its contents. The ID for the VoiceRope is returned to the caller.
Play: PROC [vr: VoiceRope, start: INT ← 0, len: INT ← MaxLen] RETURNS [];
Plays the given interval of the voice rope. Must call PlayTune for each tune component of the voice rope that resides completely or partially within the voice rope interval. Questions remain about how to integrate this routine with those for playing other sources of voice/sounds.
DescribeRope: PROC [vr: VoiceRope, minSilence: VoiceTime ← -1] RETURNS [LIST OF RECORD[start, len: INT]];
Returns a list of intervals within the voice rope that represent talkspurts. A talkspurt is defined to be any sequence of voice samples separated from its neighbors by at least minSilence ms of silence. The list is constructed by calling DescribeTune for each tune component of the voice rope.
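One way to build the list is to clip each tune's talkspurts to the part of the tune the rope actually uses and then shift them into rope coordinates. The Python sketch below illustrates this under several assumptions: TuneInterval and describe_rope are hypothetical names, describe_tune stands in for DescribeTune and is assumed to have applied minSilence already, and talkspurts that touch across a component boundary are merged.

```python
from typing import Callable, List, NamedTuple, Tuple


class TuneInterval(NamedTuple):
    tune_id: int
    start: int    # offset within the tune, in samples
    length: int   # number of samples used from the tune


def describe_rope(
    components: List[TuneInterval],
    describe_tune: Callable[[int], List[Tuple[int, int]]],
) -> List[Tuple[int, int]]:
    """Return rope-relative (start, length) talkspurts."""
    result: List[Tuple[int, int]] = []
    pos = 0  # rope-relative offset of the current component
    for c in components:
        c_end = c.start + c.length
        for s, n in describe_tune(c.tune_id):
            lo = max(s, c.start)          # clip to the interval in use
            hi = min(s + n, c_end)
            if lo >= hi:
                continue
            rope_start = pos + (lo - c.start)
            if result and result[-1][0] + result[-1][1] == rope_start:
                # Contiguous with the previous talkspurt: extend it.
                prev_start, prev_len = result.pop()
                result.append((prev_start, prev_len + (hi - lo)))
            else:
                result.append((rope_start, hi - lo))
        pos += c.length
    return result


tunes = {1: [(0, 10), (20, 10)], 2: [(5, 10)]}
rope = [TuneInterval(1, 5, 20), TuneInterval(2, 0, 20)]
spurts = describe_rope(rope, lambda tune_id: tunes[tune_id])
```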
Fetch: PROC [vr: VoiceRope, index: INT] RETURNS [VoiceSample];
Calls FetchVoiceSample after discovering the correct index into the correct tune component of the voice rope. Maybe we need a way of retrieving a block of voice samples.
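Discovering "the correct index into the correct tune component" amounts to walking the components while accumulating their lengths. A minimal Python sketch follows; TuneInterval and locate_sample are hypothetical names, and the actual call to FetchVoiceSample is left out.

```python
from typing import List, NamedTuple, Tuple


class TuneInterval(NamedTuple):
    tune_id: int
    start: int    # offset within the tune, in samples
    length: int   # number of samples used from the tune


def locate_sample(components: List[TuneInterval], index: int) -> Tuple[int, int]:
    """Map a rope-relative index to (tune_id, tune-relative index)."""
    if index < 0:
        raise IndexError("negative index")
    pos = 0  # rope-relative offset of the current component
    for c in components:
        if index < pos + c.length:
            return (c.tune_id, c.start + (index - pos))
        pos += c.length
    raise IndexError("index beyond end of voice rope")


rope = [TuneInterval(1, 100, 50), TuneInterval(2, 0, 30)]
where = locate_sample(rope, 50)   # first sample of the second component
```

A block fetch could reuse the same walk and continue across component boundaries instead of returning at the first hit.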
The following editing operations do not involve Bluejay at all. Each operation results in a new voice rope since voice ropes are immutable.
Cat: PROC [vr1, vr2, vr3, vr4, vr5: VoiceRope ← NIL] RETURNS [VoiceRope];
A new voice rope is created that is the concatenation of the tune components of the voice rope arguments.
Substr: PROC [vr: VoiceRope, start: INT ← 0, len: INT ← MaxLen] RETURNS [VoiceRope];
A new voice rope is constructed from the tune components that overlap the given interval. The tunes on the end may have to have their intervals adjusted.
Replace: PROC [vr: VoiceRope, start: INT ← 0, len: INT ← MaxLen, with: VoiceRope ← NIL] RETURNS [VoiceRope];
The given interval of the voice rope is replaced with the specified voice rope. The work required is the same as doing Cat[Substr[vr, 0, start], with, Substr[vr, start+len]] except that voice rope IDs are not created for the intermediate results.
Length: PROC [vr: VoiceRope] RETURNS [INT];
The length of the voice rope is determined by adding the lengths of its tune components.
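Because a voice rope is just an ordered list of tune intervals, the four editing operations reduce to list manipulation. The Python sketch below works on in-memory lists and is only an illustration: the names, the MAX_LEN stand-in for MaxLen, and the use of lists instead of voice rope IDs are assumptions. The real service would store each result in the database and hand back a new ID.

```python
from typing import List, NamedTuple, Sequence

MAX_LEN = 2**31 - 1  # stand-in for MaxLen


class TuneInterval(NamedTuple):
    tune_id: int
    start: int    # offset within the tune, in samples
    length: int   # number of samples used from the tune


Rope = List[TuneInterval]


def length(rope: Sequence[TuneInterval]) -> int:
    return sum(c.length for c in rope)


def cat(*ropes: Sequence[TuneInterval]) -> Rope:
    out: Rope = []
    for r in ropes:
        out.extend(r)
    return out


def substr(rope: Sequence[TuneInterval], start: int = 0, n: int = MAX_LEN) -> Rope:
    end = min(start + n, length(rope))
    out: Rope = []
    pos = 0  # rope-relative offset of the current component
    for c in rope:
        lo, hi = max(start, pos), min(end, pos + c.length)
        if lo < hi:
            # Components at either end get their intervals trimmed here.
            out.append(TuneInterval(c.tune_id, c.start + (lo - pos), hi - lo))
        pos += c.length
    return out


def replace(rope: Sequence[TuneInterval], start: int = 0, n: int = MAX_LEN,
            with_: Sequence[TuneInterval] = ()) -> Rope:
    # Same work as Cat[Substr[vr, 0, start], with, Substr[vr, start+len]].
    return cat(substr(rope, 0, start), with_, substr(rope, start + n))


original = [TuneInterval(1, 0, 100)]
patch = [TuneInterval(2, 10, 5)]
edited = replace(original, 40, 20, patch)
```

Note that no voice samples are copied: the edit only produces a new list of intervals, and the original list is left untouched.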
Voice File Server Support
This design of the voice rope system assumes that the voice file server, Bluejay, provides the following operations:
RecordTune: records some voice as a "tune" and returns its TuneID.
PlayTune: plays a specified interval of a specified tune.
DeleteTune: reclaims the storage occupied by a tune; the TuneID may be subsequently reused.
DescribeTune: returns an indication of the non-silence and silence intervals of a specified tune.
FetchVoiceSample: returns a specified sample (8 bits = 125 microseconds of voice) of a specified tune. There are certain difficulties in providing this function since Bluejay doesn't know the encryption key.
Actually, for the initial implementation, this required underlying functionality is provided by the FinchSmarts running on the workstation. Eventually, the code for managing voice ropes will migrate to the server and call more directly on Bluejay.
VoiceRope Database Design
Information about voice ropes and their structure is maintained in a LoganBerry database. Entries in the VoiceRope database are as follows:
VRID: the voice rope ID
Creator: the creator of the voice rope (must get from RPC)
Length: length of voice rope (can be computed from following info but maybe want to cache it)
TuneID: a tune ID
TuneInt: start/len interval within the tune
TuneID: a tune ID
TuneInt: start/len interval within the tune
...
Note that each tune component of a voice rope is listed as a separate TuneID and TuneInt attribute; we rely on LoganBerry to maintain the ordering of tune intervals within a voice rope. We may want other attributes associated with the voice rope such as those needed for access control. However, we would need some way for clients to assign values for such attributes.
Two main indices are maintained on the VoiceRope database: one on VRID so that information about a voice rope can be accessed given its ID, and one on TuneID so that all of the voice ropes using a particular tune can be easily determined. We may also want an index on Creator.
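The entry layout and the two indices can be mimicked with ordinary dictionaries. The Python sketch below is not LoganBerry and does not use its interface; VoiceRopeDB and its methods are hypothetical names chosen for the example. It assumes each entry is written once, which matches the immutability of voice ropes.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Component = Tuple[int, int, int]  # (TuneID, start, len), kept in rope order


class VoiceRopeDB:
    def __init__(self) -> None:
        self.by_vrid: Dict[str, dict] = {}                      # index on VRID
        self.by_tune: Dict[int, Set[str]] = defaultdict(set)    # index on TuneID

    def write(self, vrid: str, creator: str, components: List[Component]) -> None:
        entry = {
            "VRID": vrid,
            "Creator": creator,
            "Length": sum(n for _, _, n in components),  # cached length
            "Tunes": list(components),
        }
        self.by_vrid[vrid] = entry
        for tune_id, _, _ in components:
            self.by_tune[tune_id].add(vrid)

    def delete(self, vrid: str) -> None:
        entry = self.by_vrid.pop(vrid)
        for tune_id, _, _ in entry["Tunes"]:
            self.by_tune[tune_id].discard(vrid)

    def ropes_using(self, tune_id: int) -> Set[str]:
        """All voice ropes that include the given tune."""
        return set(self.by_tune.get(tune_id, ()))


db = VoiceRopeDB()
db.write("vr1", "Terry.pa", [(7, 0, 100), (8, 10, 20)])
db.write("vr2", "Terry.pa", [(7, 50, 10)])
```

The TuneID index is what lets a collector decide cheaply whether a tune is still part of any voice rope.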
Interests in Voice Ropes
Clients must explicitly express an interest in retaining a voice rope in order to prevent it from being garbage collected. Interests belong to a particular class; different applications employing voice ropes can use different interest classes to control how those voice ropes are managed. One class of interest might be "TiogaDocumentOnFileServer"; within this class clients are identified by their full file name.
InterestClass: ROPE; -- the type of interest (could also be an ATOM)
NewInterest: PROC [vr: VoiceRope, client: ROPE, class: InterestClass, name: ROPE ← NIL, data: ROPE ← NIL] RETURNS [];
The specified client of the specified class expresses a new interest in the voice rope. Client-defined names and data can be associated with the voice rope. Calls to this routine simply add an entry to the VoiceInterest database (calls with identical arguments are idempotent).
DropInterest: PROC [vr: VoiceRope, client: ROPE, class: InterestClass] RETURNS [];
The specified client of the specified class drops its interest in the voice rope.
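The pair of operations can be sketched in Python as follows. The sketch assumes that an interest is identified by (voice rope, client, class), since those are the arguments DropInterest takes; that choice of key, the class name, and the has_interest helper are assumptions for illustration. A repeated registration leaves the existing entry unchanged.

```python
import time
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (voice rope ID, client, interest class)


class VoiceInterestDB:
    def __init__(self, clock=time.time) -> None:
        self._clock = clock
        self._entries: Dict[Key, dict] = {}

    def new_interest(self, vr: str, client: str, interest_class: str,
                     name: Optional[str] = None, data: Optional[str] = None) -> None:
        key = (vr, client, interest_class)
        # setdefault makes repeated identical calls idempotent.
        self._entries.setdefault(
            key, {"Timestamp": self._clock(), "Name": name, "Data": data})

    def drop_interest(self, vr: str, client: str, interest_class: str) -> None:
        # Dropping an interest that was never expressed is a no-op.
        self._entries.pop((vr, client, interest_class), None)

    def has_interest(self, vr: str) -> bool:
        return any(key[0] == vr for key in self._entries)

    def __len__(self) -> int:
        return len(self._entries)


interests = VoiceInterestDB(clock=lambda: 0.0)
doc = "/Ivy/Terry/DocumentWithVoice.Tioga"
interests.new_interest("vr1", doc, "TiogaDocumentOnFileServer")
interests.new_interest("vr1", doc, "TiogaDocumentOnFileServer")  # no second entry
```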
We might also need routines that allow clients to determine what interests they have expressed. For instance, what if a client wants to retrieve the voice rope with a particular name in a particular class or all of the interests created by "Terry.pa"? Such operations can be done by directly calling LoganBerry, but this does not provide the proper layer of abstraction from the underlying mechanisms.
VoiceInterest Database Design
Information about interests in voice ropes is also maintained in a LoganBerry database. Entries in the VoiceInterest database are as follows:
IID: an interest ID that serves as the primary key (maybe not used for anything else)
Creator: the creator of the interest, e.g. "Terry.pa"
Timestamp: the create time of the interest entry
VRID: the ID of a voice rope
Client: the interested party, e.g. "/Ivy/Terry/DocumentWithVoice.Tioga"
Class: the interest class, e.g. "TiogaDocumentOnFileServer"
Name: a client specified name for the interest
Data: client specified data associated with the interest
Indices in the VoiceInterest database are maintained on IID; Creator; VRID, so that one can determine which interests refer to a given voice rope; Client; Class, so that the entries of a class can be enumerated; and Name.
Garbage Collection
Discovering when the storage used by a particular tune can be reclaimed represents a particularly thorny problem. A garbage collector can be built assuming that garbage can be detected. We use the following definitions of "garbage":
A tune is garbage if it is not a component of any voice ropes.
A voice rope is garbage if no client has an interest in that voice rope.
An interest is garbage if it is no longer needed according to a class-specific algorithm.
These working definitions assume that the voice rope system is the only client of the voice storage server.
Garbage collection takes place at all three levels. A single collector exists for tunes, and one suffices for voice ropes. The Voice Rope Collector refuses to collect voice ropes that are too young in order to prevent it from collecting a newly created voice rope before a client has the opportunity to express an interest in it. We may need the same assurances at the tune level. Different garbage collectors are used to collect outdated interests of different classes since the definition of what constitutes garbage depends on the class of an interest.
A garbage collector can be either aggressive or non-aggressive. Aggressive garbage collectors not only delete information they determine to be garbage, but also try to discover additional garbage that might be generated by the delete, whereas non-aggressive garbage collectors rely on other passes (or collectors) to determine new garbage. For example, suppose A references B and that A is determined to be garbage; an aggressive garbage collector would try to collect B as well as A, while a non-aggressive collector would simply delete A and let the fact that A was the last one to reference B be detected at a later time.
CollectTunes: PROC [] RETURNS [];
The Tune Collector enumerates the complete set of TuneIDs and calls CollectTune for each one. CollectTune[tuneID] queries the VoiceRope database to determine if any voice ropes make use of the tune. If not, then the tune is deleted.
CollectVoiceRopes: PROC [] RETURNS [];
The Voice Rope Collector enumerates the VoiceRope database and calls CollectVoiceRope for each entry. CollectVoiceRope[vr] queries the VoiceInterest database to determine if an interest exists in the voice rope. If not, and the voice rope is older than some period of time, then the entry is deleted from the VoiceRope database. In addition, an aggressive implementation would call CollectTune for each tune component of the voice rope.
InterestInfo: TYPE ~ RECORD[
vrID: VoiceRope,
creator: ROPE, -- actually an RName
timestamp: BasicTime.GMT,
client: ROPE,
class: InterestClass,
name: ROPE,
data: ROPE];
IsGarbageProc: TYPE ~ PROC[interest: InterestInfo] RETURNS [BOOLEAN ← FALSE];
CollectInterests: PROC [class: InterestClass, proc: IsGarbageProc] RETURNS [];
The Voice Interest Collector enumerates all entries of the specified class in the VoiceInterest database and calls the IsGarbageProc for each one. If this call returns TRUE, then the entry is deleted from the VoiceInterest database. In addition, an aggressive implementation would call CollectVoiceRope for the voice rope referenced in the deleted interest entry.
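The aggressive cascade through the three levels can be sketched in Python as below. Everything here is a simplification for illustration: the in-memory Store replaces the two databases and Bluejay, MIN_ROPE_AGE is a made-up threshold for "too young", and the function names only mirror the procedures described above.

```python
from typing import Callable, Dict, List, NamedTuple, Set


class Interest(NamedTuple):
    vr_id: str
    client: str
    interest_class: str
    timestamp: float


class Store:
    def __init__(self) -> None:
        self.tunes: Set[int] = set()
        self.ropes: Dict[str, dict] = {}   # vr_id -> {"created": t, "tunes": [ids]}
        self.interests: List[Interest] = []


MIN_ROPE_AGE = 60.0  # seconds; hypothetical value


def collect_tune(store: Store, tune_id: int) -> None:
    # A tune is garbage if it is not a component of any voice rope.
    if not any(tune_id in r["tunes"] for r in store.ropes.values()):
        store.tunes.discard(tune_id)


def collect_voice_rope(store: Store, vr_id: str, now: float) -> None:
    rope = store.ropes.get(vr_id)
    if rope is None:
        return
    if any(i.vr_id == vr_id for i in store.interests):
        return                      # some client is still interested
    if now - rope["created"] < MIN_ROPE_AGE:
        return                      # too young to collect
    del store.ropes[vr_id]
    for tune_id in set(rope["tunes"]):
        collect_tune(store, tune_id)        # aggressive step


def collect_interests(store: Store, interest_class: str,
                      is_garbage: Callable[[Interest], bool], now: float) -> None:
    doomed = [i for i in store.interests
              if i.interest_class == interest_class and is_garbage(i)]
    for i in doomed:
        store.interests.remove(i)
        collect_voice_rope(store, i.vr_id, now)   # aggressive step


store = Store()
store.tunes = {1, 2}
store.ropes = {"a": {"created": 0.0, "tunes": [1]},
               "b": {"created": 0.0, "tunes": [1, 2]}}
store.interests = [Interest("a", "", "Timeout", 0.0),
                   Interest("b", "", "Timeout", 0.0),
                   Interest("b", "doc", "TiogaDocumentOnFileServer", 0.0)]
collect_interests(store, "Timeout", lambda i: True, now=1000.0)
```

In this run rope "a" is collected, but tune 1 survives because rope "b" still uses it, and "b" survives because one interest in it remains.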
I propose that the aggressive style of garbage collection be employed. With such an approach, a separate Tune Collector is not needed very much since any tune is always included in some voice rope. These tunes will be collected as voice ropes are collected. The Tune Collector would only be needed to discover tunes that were created but did not ever become part of a voice rope (undoubtedly because of a machine crash). We could get by for quite a while without an explicit Tune Collector.
Similarly, a separate Voice Rope Collector would not be needed much if we assumed that every voice rope was always included in some interest. For instance, the Record operation could always register an interest of class "Timeout" for all newly recorded voice ropes. If no other interests in a voice rope are ever expressed, then the voice rope would be collected when the timeout interest expires.
Comments from Dan Swinehart: A nice attribute of this scheme is that a laid-back collector is also possible: one that wanders half-heartedly through the database, collecting things every once in a while. Such a beast could be left running all the time. The laid-back collector would need a registered set of IsGarbageProcs, tuned to class, that it would call in its meanderings. It would be locally aggressive, globally mellow (aggressive pursuit from one level to another, but a slower journey through any given level).
Examples
Two classes of interests that seem particularly useful are "TiogaDocumentOnFileServer" and "Timeout". Let's look at these as examples of how class-specific garbage collection might be used. In addition, we may want classes for "VoiceEditingSession", "ArchivedTiogaDocument", etc.
Interests of class "TiogaDocumentOnFileServer" are registered for a Tioga document containing voice annotations. The interest's client is the full file name of the document, including a version number. The data field of the interest entry may be needed to store the file's create date since version numbers are probably not sufficient to uniquely identify a particular file. The file must reside on a publicly accessible file server (or at least on a machine running the STP server). Smodel might be modified to automatically call NewInterest for files containing voice as it copies them to a file server. The IsGarbageProc gets a file name from an interest entry and checks if the file exists; if the file has been deleted then it returns TRUE, otherwise it returns FALSE.
The "Timeout" class of interests is used to ensure that a voice rope exists for a minimum period of time. For instance, electronic messages containing voice may be given a timeout of 2 weeks. Recipients wishing to retain a message could express a new interest in the message; otherwise the message would be garbage collected when the timeout expires. For timeout interests, the client is undefined, while the data field holds the timeout value. The IsGarbageProc returns TRUE if the time since the interest entry was created is longer than the timeout value.
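The two class-specific procedures might look roughly like this in Python. The sketch assumes that timestamps and timeout values are expressed in seconds and that the data field holds the timeout as a decimal string; the extra now and file_exists parameters are additions that make the example testable and are not part of IsGarbageProc as declared. A real check for a file on a remote file server would replace os.path.exists.

```python
import os
import time
from typing import Callable, NamedTuple, Optional


class InterestInfo(NamedTuple):
    vr_id: str
    creator: str
    timestamp: float      # creation time of the interest entry, in seconds
    client: str
    interest_class: str
    name: str
    data: str


def timeout_is_garbage(interest: InterestInfo, now: Optional[float] = None) -> bool:
    """TRUE once the interest is older than the timeout stored in its data field."""
    if now is None:
        now = time.time()
    return now - interest.timestamp > float(interest.data)


def tioga_document_is_garbage(
        interest: InterestInfo,
        file_exists: Callable[[str], bool] = os.path.exists) -> bool:
    """TRUE if the document named by the interest's client no longer exists."""
    return not file_exists(interest.client)


TWO_WEEKS = 14 * 24 * 60 * 60
message = InterestInfo("vr1", "Terry.pa", 1000.0, "", "Timeout", "", str(TWO_WEEKS))
expired = timeout_is_garbage(message, now=1000.0 + TWO_WEEKS + 1)
```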
Conclusions
The implementation of voice ropes consists of three packages: Bluejay (the repository for voice files), which stores its files on a separate, specially formatted disk; the Voice Rope implementation itself, which represents its information as a couple of LoganBerry databases, one for the voice rope structures and one for interests; and a garbage collection package.
The notion of voice ropes is separate from the garbage collection of unwanted tunes. A voice rope package provides clients the ability to manipulate voice in familiar and useful ways. Garbage collection is necessary to free up the space occupied by unneeded recorded voice; some way of expressing interests in voice is required in order to determine what is "garbage".