Date: 10 Dec 84 10:52:18 PST
From: Swinehart.pa
Subject: New Proposal for voice ropes (working paper)
To: VoiceProject^
Cc: Swinehart
Reply-to: Swinehart.pa
[Based on ongoing conversations with Luis Cabrera. I'll issue a new version from time to time until the design firms up. Comments always welcome!]
Introduction
We have a year or so of experience now with the management of recorded voice messages. The interface at present deals with very low-level objects: "tunes", or voice files, and intervals within them. Any higher-level structures have been left to client programs, namely Finch. For a number of reasons, primary among them the desire to have a permanent representation of compositions of recorded voice segments and the desire to allow access to these capabilities from other programming environments and from the stand-alone telephone, we are proposing to define an editing and storage facility for recorded voice, to be located on the same server that implements Bluejay. This will have the additional benefit of moving the Cypress code for managing the stored voice information to the server.
There are three problems: how to edit the voice (and store the edits, since rearranging the actual voice bits is not practical), how to denote ownership of the resulting structures, and what the client interface should be (what the functions are, and how they interact with the other voice protocols).
To solve the first problem, we have designed a package of functions based on the model of the Cedar Rope. If one thinks of an utterance as an immutable sequence of voice samples, then the set of capabilities for composing new ropes from old -- concatenation, substring, possibly replace as a compound operation -- are just the ones we need to edit voice. Additional functions are needed for recording, playback, and some specialized operations such as determining where the talkspurts are within a "voice rope."
Although we will use the model of a rope, we do not propose to implement the package in terms of Cedar ropes. Unlike Cedar ropes, these objects are going to have to outlast any given Cedar session, so there will need to be a permanent representation for them -- probably a Cypress database. Moreover, the "by-value" model of passing REFs around using RPC would not be suitable when applied to voice ropes, so we need a different kind of identifier than a REF to a rope structure for representing them.
The choice of Cypress for storing voice ropes suggests a solution for the second problem: managing the ropes and their ownership. A preliminary implementation for voice messages made this approach seem feasible. It looks like a full-blown trace-and-sweep garbage collector is going to be needed for these things.
There were earlier proposals to allow for additional information to be attached to voice ropes: for example, text representations of their contents, or other annotations that should be tightly associated with them. We've decided that this is overloading a concept, and we'll leave these associations to higher level applications.
Finally, the protocol issues are not fully resolved at this writing. At least the recording and playback will have to be a part of the full switching protocol (ThParty and all that.) Many of the functions for manipulating the resulting ropes should not have to be. On the other hand, at least the process of locating the proper server to talk to should be connected with the switching protocol, since there may eventually be multiple such servers, and one wants to get the right one. Unresolved.
Client Functions
[Changed little from earlier designs; See the Implementation and representation section for newer information.]
Data types
A voice rope is an object that represents a segment of recorded voice. Thus it is an object with a length and a collection of voice samples. Voice ropes are denoted by unique identifiers, which are themselves ROPEs (or ATOMs, whichever turns out to be more efficient.) The generation of identifiers is up to the voice rope service.
VoiceRope: TYPE = RPC.ShortROPE; -- Identifies the voice rope.
VoiceRopeSequence: TYPE = REF VoiceRopeSequenceRecord;
VoiceRopeSequenceRecord: TYPE = RECORD [ s: SEQUENCE length: NAT OF VoiceRope ];
Editing functions
These assume that you already have some voice ropes.
Cat: PROC[vr1, ..., vr4: VoiceRope←NIL]
RETURNS [ VoiceRope]; -- wish we had variable-length lists.
CatSeq: PROC[voiceRopeSequence: VoiceRopeSequence]
RETURNS [ VoiceRope]; -- wish we had variable-length lists.
Substr: PROC[vr: VoiceRope, first: INT←0, len: INT←Rope.MaxLen]
RETURNS [ VoiceRope];
Replace: -- Like Rope.Replace, if it turns out to be useful.
Length: PROC[vr: VoiceRope] RETURNS [length: INT];
FetchBlock: PROC[vr: VoiceRope, first: INT←0, len: INT ← Rope.MaxLen]
RETURNS [samples: Rope.ROPE]; -- All of the voice samples from the substring. Could simplify by requiring client to do substring first.
FetchEnergies: PROC[ . . . ]; -- Maybe later
DescribeRope: PROC[
vr: VoiceRope, minSilence: Thrush.VoiceTime ← -1, returnSequence: BOOL←FALSE]
RETURNS [ description: VoiceRopeSequence ];
Identify talkspurts separated by at least minSilence ms. each. Return a set of subropes representing the talkspurts as a sequence. Initial and terminal silences are eliminated even if shorter than the minSilence interval.
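[Illustration only, in Python rather than Cedar.] A minimal sketch of the segmentation that DescribeRope is meant to perform. It assumes the rope is available as a list of per-millisecond energy values and that anything below a hypothetical threshold counts as silence; neither assumption comes from this proposal, and the -1 default for minSilence is not modeled.

```python
from typing import List, Tuple


def talkspurts(energies: List[int], min_silence: int,
               threshold: int = 1) -> List[Tuple[int, int]]:
    """Return (first, length) pairs, in ms, one per talkspurt.

    energies[i] is an assumed energy value for millisecond i. Silences
    shorter than min_silence are absorbed into the surrounding talkspurt;
    initial and terminal silences are always dropped.
    """
    spurts: List[Tuple[int, int]] = []
    start = None        # first voiced ms of the talkspurt being built
    last_voiced = None  # most recent voiced ms seen
    for i, energy in enumerate(energies):
        if energy < threshold:
            continue  # silent millisecond
        if start is None:
            start = i
        elif i - last_voiced - 1 >= min_silence:
            # The gap was long enough: close the previous talkspurt.
            spurts.append((start, last_voiced - start + 1))
            start = i
        last_voiced = i
    if start is not None:
        spurts.append((start, last_voiced - start + 1))
    return spurts
```

Each returned pair could then be handed to Substr to produce the subropes of the description.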
There may need to be functions for extracting the encryption key explicitly, although in general dealing with the key will be the purview of the Play/Record code and of the FetchBlock code. Probably some additional authentication stuff should be designed into this interface now, rather than later. Conceal as much of these machinations as you can, based on the notion that RPC itself carries authenticatable identity information.
A workstation-side package could be produced that would convert VoiceRopes to regular ropes, if desired. Rope-manipulation operations could then be done at high speed, Fetch[char] could be implemented, and the like, using a thin veneer on top of the VoiceRope interface to provide the mappings. Not clear it's valuable. This interface would need to supply a function that would actually reflect the resulting edits in the VoiceRope system, in cases where the rope needs to be kept.
Record/Playback functions
[See Thrush/ThParty stuff for reference. Or ignore this section.]
Interval becomes: [ ropeID, otherID, record/play, queue, request/start/finished], and progress answers in kind. The reason for otherID is to identify a record request when it comes back.
Will work for loon, with otherID again being a tag and the ropeID being the actual tune description, or a file name, or whatever.
It still seems important to represent attempts to play and record voice ropes in the ThParty switching protocol. In the former system, the Initiator called ThParty.Advance[to: active, interval: tune/interval/key specs, other params], which would be relayed to all smarts involved in the conversation, as would additional progress steps as recording or playback began or ended.
It seems reasonable to retain something of this same flavor, since all the smarts do need to know. But probably we should add a separate function:
ThParty.Play[ playSpecID: ROPE, queueIt: BOOL];
playSpecID isn't denoted as a voice rope because this same interface could also be used to specify the inputs to other sources of sound: Loon, Speech-Plus, and so on.
ThParty.Record[ otherID: ROPE, queueIt: BOOL];
The reason for otherID is to identify a record request when its value returns via progress calls. I think. Since Record can't return the result directly.
{Aside: Progress: It's time to break up the single Progress entry in ThSmarts into several, each of which is responsible for indicating a different kind of progress: call switching state, tune playing, encryption keys, .... Within the Thrush implementation, conversation events in the log should now be represented as REF ANYs, and the size of the monolithic ConvEvent scaled down. The reason we need multiple progress calls is that REF ANY's don't travel well. Possibly add a generic Progress call, though, with room for an ATOM or two -- including a subtype spec -- and a ROPE or two, so that new applications can be tried without recompiling everything.}
In response to a Play/Record there will be (sometimes) a new encryption key table promulgated, as well as request/start/stop progress reports. Record[stop] will include the resulting voiceRope ID and the otherID, so that the original requestor can find its own request. Play does not specify a voice rope because the playSpecID could easily be used to specify the inputs to other sources of sound: Loon, Speech-Plus, and so on.
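[Illustration only, in Python rather than Cedar.] A rough sketch of how a requestor might use otherID to find its own Record request among the progress reports relayed to every party. The class, the "req-N" tag format, and the state names are hypothetical; the actual progress interface is still unresolved.

```python
import itertools
from typing import Dict, Optional


class RecordRequests:
    """Tracks outstanding Record requests by a caller-chosen tag (otherID)."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._pending: Dict[str, str] = {}  # otherID -> requestor's note
        self.finished: Dict[str, str] = {}  # otherID -> resulting rope ID

    def request(self, note: str) -> str:
        """Issue a new otherID to pass along with a Record call."""
        other_id = f"req-{next(self._counter)}"
        self._pending[other_id] = note
        return other_id

    def on_progress(self, other_id: str, state: str,
                    voice_rope_id: Optional[str] = None) -> bool:
        """Handle one progress report; return True if it was ours."""
        if other_id not in self._pending:
            return False  # some other party's request
        if state == "finished" and voice_rope_id is not None:
            # The finished report carries the new rope's ID.
            del self._pending[other_id]
            self.finished[other_id] = voice_rope_id
        return True
```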
Access to interfaces
ThParty maybe should have a generic: give me an InterfaceRecordSpec[$Type] function, that will return the values needed to bind to various interfaces. Also a RegisterRecordSpec[] function. This would supplant Grapevine as a source of bindings for the utility interfaces like VoiceRopes and SpeechPlus and BluejayUtils. One would ask one's party, perhaps mentioning the other party in a connection, for the interface, so that server-specific bindings could be supported. This could maybe supplant or combine with the existing ExportOther... thing that speeds up bindings to Thrush sometimes?? The problem: there are a lot of source-specific functions -- like VoiceRope -- that need to be done and that need not go through the whole ThParty trip every time. But there needs to be a way to get a specific binding, in case eventually there's more than one for some functions. For now, add enough parameters but allow some of them to be null, and do something simple.
DB functions, from Nuthatch
These should probably be part of the VoiceRope interface:
Remember[rope]; -- Makes the rope permanent (subject to timeout), owned by its creator.
KnowAbout[rope, otherID, otherIDType, retain: T/F];
Does a Remember on behalf of the creator, and then adds information in the directory that the caller has a reference to this rope. If retain is FALSE, however, the timeout associated with the rope can cause it to disappear. KnowAbout is the way that clients keep handles on ropes, so it's useful (??) even if retain is FALSE.
otherID/otherIDType are used to give a caller-relative unique name to the rope. The type is used to distinguish among independent uses for the rope: Grapevine message, document annotation, system noise -- to be more sure that otherID is sufficiently unique, and to give some semantic clues to browsers. Depending on the type, the otherID might be a reference that the garbage collector can use directly to determine whether this knowledge of the rope is still valid. See below.
ForgetAbout[rope, otherID, otherIDType]
The caller is no longer interested in knowing about this rope, at least in terms of otherID.
Enumerate[...something or other....]
It will probably prove useful to enumerate ropes based on various partial specifications.
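[Illustration only, in Python rather than Cedar.] A small in-memory model of Remember, KnowAbout, and ForgetAbout as described above. Passing the creator explicitly is an assumption of the sketch, and timeouts are left out.

```python
from typing import Dict, Set, Tuple


class RopeDirectory:
    """In-memory stand-in for the directory kept by the voice rope service."""

    def __init__(self) -> None:
        self.remembered: Dict[str, str] = {}  # rope -> creator
        # (rope, otherID, otherIDType) -> retain flag
        self.known: Dict[Tuple[str, str, str], bool] = {}

    def remember(self, rope: str, creator: str) -> None:
        """Make the rope permanent, owned by its creator."""
        self.remembered.setdefault(rope, creator)

    def know_about(self, rope: str, creator: str, other_id: str,
                   other_id_type: str, retain: bool) -> None:
        """Remember on the creator's behalf, then record the caller's handle."""
        self.remember(rope, creator)
        self.known[(rope, other_id, other_id_type)] = retain

    def forget_about(self, rope: str, other_id: str,
                     other_id_type: str) -> None:
        """Drop the caller's handle, if present."""
        self.known.pop((rope, other_id, other_id_type), None)

    def retained(self) -> Set[str]:
        """Ropes that at least one entry asks to retain."""
        return {rope for (rope, _, _), retain in self.known.items() if retain}
```

Keying entries on the full (rope, otherID, otherIDType) triple lets independent uses of the same rope come and go without disturbing each other.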
Random Old Notes
We've decided that as part of a periodic voice file maintenance procedure there should be a full Trace & Sweep garbage collector that uses the "KnowAbout" entries to enumerate all voice ropes and uses of Bluejay Tunes. It will not be responsible for GC'ing the KnowAbout entries, which are the responsibility of higher-level packages, (see the file reference, below, however), but it will be able to combine trace information with timeout information during the sweep phase to eliminate Cypress structures defining vRopes that aren't needed any more, and also to develop tune reference counts, to know which tunes to flush. A side-effect of this decision is that the expensive reference counting mechanisms that summarize the "retain" bits in the current Nuthatch implementation can be dispensed with -- a good thing, since it would get a lot more expensive when VoiceRopes were added.
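[Illustration only, in Python rather than Cedar.] A sketch of the sweep and the tune reference counts it develops. It assumes the rope IDs cited by still-valid KnowAbout entries have already been gathered (timeouts applied), and that each rope is described by a list of (tune ID, start, length) pieces; the function and parameter names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Interval = Tuple[str, int, int]  # (tune ID, start, length)


def trace_and_sweep(ropes: Dict[str, List[Interval]],
                    known: Iterable[str],
                    all_tunes: Set[str]):
    """Return (surviving ropes, tune reference counts, tunes to flush)."""
    # Trace: a rope survives only if some KnowAbout entry still cites it.
    live = {vr for vr in known if vr in ropes}
    # Sweep: everything else is dropped.
    survivors = {vr: ropes[vr] for vr in live}
    counts: Counter = Counter()
    for intervals in survivors.values():
        # Count each tune once per rope that uses it.
        for tune in {t for t, _, _ in intervals}:
            counts[tune] += 1
    # Tunes that no surviving rope refers to can be flushed.
    flush = all_tunes - set(counts)
    return survivors, counts, flush
```

Because the counts are rebuilt from scratch on each pass, nothing has to be maintained incrementally between collections.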
There should be a 1-1 correspondence between a Jukebox and a "Nuthatch" data base -- the one that keeps track of voice ropes on the Jukebox. There should be no other access to a Jukebox than through Voice ropes. This solves a tricky problem of when to believe the tune count information derived during the T&S GC activity -- and coalesces a whole lot of previously disparate stuff.
Some combination of the VoiceRope implementation and Bluejay can keep track of the allocation of tunes to users for recording. Part of the active state of BluejayUtilsImpl.
Implementation
Discussion
The implementation of voice ropes consists of three packages: Bluejay (the repository for voice files), which stores its files on a separate, specially formatted disk; the Voice Rope implementation itself, which represents its information as a Cypress database backed up by logs that are also written on Alpine (??); and a client-program package. Here we'll discuss only the first two.
Bluejay: A hang-up to higher-level designs has been the rigid tune structure. A present-day Bluejay Tune is given a name that is an index into a fixed-size table, presently around 3000 entries. One is therefore obliged to practice tune conservation and complicated tune management. What we'd actually prefer is that utterances just be assigned regions in a large virtual memory. Since more than one might be being recorded at once, that is impractical.
We intend to revise Bluejay's directory system so that it will generate UniqueId's for tunes from a large (48-64 bit) address space, as it is recording new ones. A tune can now be an utterance, or (at the discretion of the client (voice rope implementation)) a set of utterances. But it's considered OK to make it a single utterance. It is now more likely that entire tunes will become free simultaneously, so that complicated internal reallocations are not necessary. So the indexed file directory will now have to be hashed, and will have to be able to be variable-sized. If this proves difficult at first, we'll fake it with a hash-table implementation on top of the present tune-structure -- 1:1 with recorded tunes.
Now a tune can easily be associated with a single encryption-key, so we'll store it in the leader page for the tune. We'll also store a creation time, and a creator RName. Possibly also a length.
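[Illustration only, in Python rather than Cedar.] A sketch of tune IDs drawn from a large address space and of a variable-sized, hashed directory holding the leader information. The 64-bit layout (32-bit timestamp over 32 random bits) is an assumption chosen for the example, as are all of the names.

```python
import secrets
import time
from typing import Dict, Optional


def new_tune_id(now: float) -> int:
    """Build a 64-bit ID: timestamp in the high half, random bits below."""
    seconds = int(now) & 0xFFFFFFFF
    return (seconds << 32) | secrets.randbits(32)


class TuneDirectory:
    """Variable-sized directory keyed by tune ID (a dict is a hash table)."""

    def __init__(self) -> None:
        self._leaders: Dict[int, dict] = {}

    def create(self, creator: str, key: bytes,
               now: Optional[float] = None) -> int:
        """Allocate a fresh tune and record its leader information."""
        created = time.time() if now is None else now
        tune_id = new_tune_id(created)
        while tune_id in self._leaders:  # very unlikely; retry on collision
            tune_id = new_tune_id(created)
        self._leaders[tune_id] = {
            "key": key, "created": created, "creator": creator}
        return tune_id

    def leader(self, tune_id: int) -> dict:
        return self._leaders[tune_id]

    def free(self, tune_id: int) -> None:
        """Release a whole tune at once."""
        del self._leaders[tune_id]

    def __len__(self) -> int:
        return len(self._leaders)
```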
Voice Ropes: These are stored in Cypress. Represented by entities whose names are their VoiceRope ID's. Described by relationships in the voice rope relation, sketched below.
We've made one significant decision about voice ropes: their representations will be flat, rather than hierarchical. Not quite true: they will always be described by a one-level hierarchy. This means that the description of a voice rope will be a linear list of [TuneID, TuneInterval] pairs. This avoids having to wander around in Cypress to get the description of a rope that is the result of repeated editing. It will be cost-effective only if editing does not produce ropes that are too complicated. Our preliminary predictions of how applications use these things indicate that this is a pretty good bet.
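[Illustration only, in Python rather than Cedar.] A sketch of how Cat and Substr could work directly on the flat description, producing a new linear list each time rather than a tree of edits. Merging adjacent pieces of the same tune is an optional tidy-up added for the example, not something this proposal specifies.

```python
from typing import List, Optional, Tuple

Interval = Tuple[str, int, int]  # (tune ID, start, length)
Rope = List[Interval]            # flat description of one voice rope


def length(rope: Rope) -> int:
    return sum(n for _, _, n in rope)


def cat(*ropes: Rope) -> Rope:
    """Concatenate descriptions into a new flat list."""
    out: Rope = []
    for rope in ropes:
        for tune, start, n in rope:
            if out and out[-1][0] == tune and out[-1][1] + out[-1][2] == start:
                # Adjacent pieces of the same tune merge into one interval.
                out[-1] = (tune, out[-1][1], out[-1][2] + n)
            else:
                out.append((tune, start, n))
    return out


def substr(rope: Rope, first: int = 0, count: Optional[int] = None) -> Rope:
    """Describe rope[first : first + count] as a new flat list."""
    if count is None:
        count = length(rope)
    end = first + count
    out: Rope = []
    pos = 0  # offset of the current piece within the rope
    for tune, start, n in rope:
        lo = max(first, pos)
        hi = min(end, pos + n)
        if lo < hi:  # the piece overlaps the requested range
            out.append((tune, start + (lo - pos), hi - lo))
        pos += n
    return out
```

A Replace could be composed from these as cat(substr(r, 0, i), new, substr(r, j)), and the result is still a single-level list.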
During recording, the system can use some default minSilence value to produce a set of intervals for an utterance -- corresponding to talkSpurts -- rather than a single entry. This will make subsequent DescribeRope operations more efficient.
Interest: We toyed with combining the representation of voice ropes with the representation of the interest by clients in them (needed for goodstuff retention during garbage collection); this would require producing a new VoiceRope, containing a copy of the description of the old one, each time someone new expressed interest in a rope. Since we want to be able to copy a file containing VoiceRope references, then indicate an interest by that file in the ropes without changing the contents of the file, we chose to stay with a separate interest relation.
Tuples in the interest relation allow a client to associate with a VoiceRope a unique identifier (an InterestID), of a specified InterestType, along with an optional timeout. As long as this tuple exists, the associated voice rope will not be collected. There are three ways for an interest tuple to go away:
1. Explicit deletion by client via ForgetAbout.
2. Expiration of optional timeout.
3. Disappearance of the referent of the associated InterestID. (Discussion continues)
In a system like Walnut (with voice), there is sufficient control over create, copy, move, and delete operations that a client-controlled management of these interest tuples is feasible. Once we start dumping VoiceRope id's into Tioga files and start copying the files around, all bets are off. We don't want to saddle our users with too difficult a management job. So the idea is that for uses of this type we use a special InterestType ("File"), then supply a fully-specified file name (possibly time-stamped as well) to an IFS or Alpine file as the InterestID. When the garbage collector runs, it assumes that if the file no longer exists, the creator is no longer interested.
One can now copy a file with voice ropes in it. The copy will be able to refer to the contained ropes without danger of their disappearing until the original file (the one with the back-reference in its interest relation) has gone away and the garbage collector has noticed. To hang onto the references longer, some program or other needs to make a new interest entry citing the new file copy.
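[Illustration only, in Python rather than Cedar.] A sketch of how the collector might decide which interest tuples are still valid. Explicit deletion is simply removal by the client; the sketch covers the other two cases. The file_exists callback stands in for whatever check is made against IFS or Alpine, and the field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Interest:
    vr_id: str
    type: str                           # e.g. "GVID", "SysNoise", "IFSFile"
    interest_id: str
    viewer: str
    expiration: Optional[float] = None  # None means no timeout


FILE_TYPES = {"IFSFile", "AlpineFile"}


def live_interests(tuples: Iterable[Interest], now: float,
                   file_exists: Callable[[str], bool]) -> List[Interest]:
    """Keep the tuples whose timeout and referent are both still good."""
    keep = []
    for t in tuples:
        if t.expiration is not None and t.expiration <= now:
            continue  # the optional timeout has expired
        if t.type in FILE_TYPES and not file_exists(t.interest_id):
            continue  # the file named by the InterestID has gone away
        keep.append(t)
    return keep


def retained_ropes(tuples: Iterable[Interest]) -> Set[str]:
    """Voice ropes protected by at least one surviving tuple."""
    return {t.vr_id for t in tuples}
```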
FS.Copy could eventually pick up some smarts about the files it's copying -- Fiala needs this ability to deal with compressed files, too. In the meantime, the temptation is to change SModel so that as it puts a file on a file server, if it has voice in it (a property of the root node) it will update the appropriate interest relations. (Probably need a property parameter that suppresses this automatic update of interest.)
Logging: (Early thoughts) The VoiceRope server needs to keep a Walnut-style log of all its operations. It will have to be segmented so that nearly-full logs are not included in expunge operations -- like current Walnut design. This log is for reliability. (It can be kept short if we also back up entire segments once in a while, but no present plans.)
It appears there will also need to be a log at least optionally maintained by each client. This will allow update requests (to the interest relation stuff, anyhow) to be made even when the VoiceRope server is down. Lets Walnut and SModel proceed.
This can be made less onerous by developing a semi-automatic logging technique based on "pickling Apply"s -- storing procedure names, parameters, and possibly return values based on Cedar AM or Lupine-like automatic processing. So logging is made simply a matter of some simple initialization, then making the logged procedure calls in a peculiar way.
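[Illustration only, in Python rather than Cedar.] A sketch of the "pickled Apply" idea for a client-side log: calls are recorded by name and parameters while the server is unreachable and applied later. It uses Python's own pickle module in place of Cedar AM or Lupine machinery, and the class and method names are hypothetical.

```python
import pickle
from typing import Any, Callable, Dict, List


class ApplyLog:
    """Records procedure applications so they can be replayed later."""

    def __init__(self) -> None:
        self.entries: List[bytes] = []
        self._procs: Dict[str, Callable[..., Any]] = {}

    def register(self, proc: Callable[..., Any]) -> Callable[..., Any]:
        """Simple initialization: make a procedure known to the log by name."""
        self._procs[proc.__name__] = proc
        return proc

    def logged_call(self, name: str, *args: Any) -> None:
        """The 'peculiar' call: store name and parameters, do not run yet."""
        if name not in self._procs:
            raise KeyError(name)
        self.entries.append(pickle.dumps((name, args)))

    def replay(self) -> int:
        """Apply every logged call in order, then empty the log."""
        count = 0
        for entry in self.entries:
            name, args = pickle.loads(entry)
            self._procs[name](*args)
            count += 1
        self.entries.clear()
        return count
```

Storing each entry as bytes means the same log could be written to a file and replayed in a later session.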
Types
[See also the voice rope types, above.]
TuneID: TYPE[2 to 4]; -- Unique ID. Time stamp plus possibly some obfuscating randomness.
VoiceRope: ROPE; -- as above.
Describe a unique interval within a Tune.
Tune -2 or something will indicate a silent interval of the specified length.
UTID: TYPE = RECORD [
tuneID: TuneID,
start: INT,
length: INT
];
VoiceRope Relation:
VoiceRopeTuple: TYPE = RECORD [
vrID: VoiceRope, -- its ID
creator: ROPE, -- e.g., Cabrera.PA
length: INT,
intervals: SEQUENCE numPieces: INT OF UTID
];
Interest Relation:
InterestType: TYPE = ROPE; -- e.g., "GVID", "SysNoise", "IFSFile", "AlpineFile"
InterestID: TYPE = ROPE;
e.g., "Bland.pa $ 52#65@10 Dec 84 09:34:12 PST", -- GVID type
"BeepTune", -- SysNoise type
"[Ivy]<Swinehart>Documentation>Script.tioga!5[14 Dec 84 09:34:12 PST]" -- IFSFile type
InterestTuple: TYPE = RECORD [
vrID: VoiceRope, -- its ID
type: InterestType,
interestID: InterestID,
viewer: ROPE, -- e.g., Ritchie.PA (owner of the interest entry)
expiration: BasicTime.GMT
];