TiogaVoiceDoc.tioga
Copyright © 1987, 1988, 1990 by Xerox Corporation. All rights reserved.
Polle Zellweger (PTZ) April 24, 1990 7:59:44 pm PDT
TIOGAVOICE: VOICE ANNOTATION AND EDITING
CEDAR 7.0 — FOR INTERNAL XEROX USE ONLY
TiogaVoice
Voice annotation and editing
Polle Zellweger and Stephen Ades
© Copyright 1987, 1988 Xerox Corporation. All rights reserved.
Abstract: TiogaVoice allows users to incorporate voice annotations into Tioga documents. The user interface is designed to be lightweight and easy to use, since spontaneity in adding vocal annotations is essential. Voice within a document is shown as a distinctive shape superimposed around a character, so that the document's visual layout and its contents as observed by other programs (e.g., compilers) are unaffected. Users point at text selections and use menus to add and listen to voice.
Simple voice editing is available: users can select a voice annotation and open a window showing its sound profile. Sounds from the same or other voice windows can be cut and pasted together, and a lightweight `dictation facility' that uses a record/stop/backup model can be used to record and incorporate new sounds conveniently. Editing is done largely at the phrase level (never at the phoneme level), representing the granularity at which editing can be done fastest and with least effort. The voice itself can be annotated with text. This and several other features have been designed to add speed and meaning to the editing process. The dictation facility can also be used when placing annotations straight into documents.
Created by: Stephen Ades and Polle Zellweger
Maintained by: Polle Zellweger <PolleZ>
Keywords: voice annotation, voice editing, multimedia documents, Tioga, Finch
References: S. Ades and D. Swinehart. Voice annotation and editing in a workstation environment, Xerox PARC report CSL-86-3, September 1986, available online as /Indigo/CSL-Notebook/entries/86CSLN-0023.tioga. The current user interface differs slightly from the one reported there (for example, the new user interface uses PopUpButtons).
XEROX  Xerox Corporation
   Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, California 94304

1. Background
This document describes how to annotate Tioga documents with voice and how to play back or edit that voice. Let's begin with a simple description of the voice recording and playback capabilities provided by the Etherphone system.
Requirements
Your workstation must have an adjacent Etherphone, and you must be running both Finch and the TiogaVoice system. A request to record or play back voice establishes a conversation between your Etherphone and the voice file server.
See /Cedar/CedarChest®/FinchDoc.tioga for more information about Finch and the Etherphone system.
Recording and editing voice
The first thing you should know is that all voice bits are stored on the voice file server; voice is included in documents by reference. This minimizes the impact of voice's high storage cost on users. Voice requires up to 64 kilobits per second (standard telephone quality) in our implementation, depending on how much silence it contains.
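To give a feel for the numbers, here is a small Python sketch of the arithmetic behind the 64 kilobits per second figure. The function name is ours, and we assume that a kilobit is 1000 bits; because silence reduces the stored size, the results are upper bounds rather than measurements.

```python
# Upper-bound storage cost of voice at standard telephone quality.
# Assumption: 1 kilobit = 1000 bits; silence only lowers these figures.
BITS_PER_SECOND = 64_000


def max_voice_bytes(seconds: int) -> int:
    """Return the largest number of bytes that `seconds` of voice can occupy."""
    return seconds * BITS_PER_SECOND // 8


one_minute = max_voice_bytes(60)          # 480,000 bytes
eight_hours = max_voice_bytes(8 * 3600)   # 230,400,000 bytes
```

A one-minute annotation can therefore cost close to half a megabyte on the voice file server, while the document that refers to it stays small.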
A two-level scheme analogous to Cedar ropes permits safe and efficient references to voice, as well as efficient editing of it. A piece of voice recorded at one time is stored in digital form as a voice file on the voice file server. Voice files are edited (without copying the voice bits) by means of voice ropes. A voice rope is a database entry, also stored on the voice file server, that specifies a sequence of segments of one or more voice files. Because voice ropes are immutable, documents that contain references to voice ropes behave identically to documents that contain actual voice bits, but they are much smaller.
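The two-level scheme can be illustrated with a minimal Python sketch. This is not the voice file server's actual representation: the class names, the field names, the use of sample offsets, and the file identifier "vf1" are all assumptions made for illustration. The point is only that editing builds a new immutable rope out of segment descriptions and never copies or alters the voice bits.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Segment:
    file_id: str   # hypothetical voice file identifier
    start: int     # offset into the voice file, in samples
    length: int    # number of samples


@dataclass(frozen=True)
class VoiceRope:
    segments: Tuple[Segment, ...] = ()

    def __len__(self) -> int:
        return sum(s.length for s in self.segments)

    def concat(self, other: "VoiceRope") -> "VoiceRope":
        return VoiceRope(self.segments + other.segments)

    def substr(self, start: int, length: int) -> "VoiceRope":
        """Return a new rope covering [start, start + length) of this rope."""
        out = []
        pos = 0
        end = start + length
        for s in self.segments:
            seg_start, seg_end = pos, pos + s.length
            lo, hi = max(start, seg_start), min(end, seg_end)
            if lo < hi:  # this segment overlaps the requested range
                out.append(Segment(s.file_id, s.start + (lo - seg_start), hi - lo))
            pos = seg_end
        return VoiceRope(tuple(out))

    def delete(self, start: int, length: int) -> "VoiceRope":
        """Return a new rope with [start, start + length) removed."""
        tail_start = start + length
        return self.substr(0, start).concat(
            self.substr(tail_start, len(self) - tail_start))


# Ten seconds of voice at an assumed 8000 samples per second.
original = VoiceRope((Segment("vf1", 0, 80000),))
# Cut one second out of the middle; `original` is left untouched.
edited = original.delete(16000, 8000)
```

After the edit, the new rope names two segments of the same voice file, and any document that still refers to the original rope plays exactly what it played before.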
You should also be aware that the voice on the voice file server is managed by a garbage collection scheme based on voice interests. TiogaVoice registers these interests whenever you use Cedar to copy a Tioga file to a global location (i.e., a file server).
Note: Local-only files, files copied when TiogaVoice is not running, and files created by chatting to an IFS and performing IFS copies, do not register voice interests for voice that they contain. Therefore, that voice may be garbage collected if space is needed on the voice file server. The Tioga contents of Walnut messages are also temporarily in this category, pending the release of a new WalnutVoice based on TiogaVoice.
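The consequence of this note can be pictured with a small Python sketch of interest-based collection. The registry class, its method names, and the rope and file names are hypothetical; the real voice management system is described in the reference below. The sketch only shows that a voice rope with no registered interest is a candidate for collection.

```python
from collections import defaultdict


class InterestRegistry:
    """Hypothetical model: which documents have an interest in which voice ropes."""

    def __init__(self):
        self._interests = defaultdict(set)  # rope id -> interested documents

    def register(self, rope_id, document):
        self._interests[rope_id].add(document)

    def forget(self, rope_id, document):
        self._interests[rope_id].discard(document)

    def collectible(self, known_ropes):
        """Return the ropes that no document has registered an interest in."""
        return sorted(r for r in known_ropes if not self._interests.get(r))


registry = InterestRegistry()
# A file copied to a file server with TiogaVoice running registers its voice.
registry.register("rope-1", "/server/Report.tioga")
# "rope-2" lives only in a local file, so nothing was registered for it.
candidates = registry.collectible(["rope-1", "rope-2"])
```

In this model "rope-2" is the only candidate, which is why voice in local-only files should not be regarded as permanent.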
For further description of the voice management system, including voice ropes and voice interests, see: D. Terry and D. Swinehart. Managing stored voice in the Etherphone system. To appear in ACM TOCS, February 1988. Stored as /Indigo/CSL-Notebook/entries/88CSLN-0002.tioga .
A final caveat: although backup mechanisms exist for the voice stored on the voice file server, they are carried out only infrequently at present. We have not experienced any problems with our voice file system or its disk, but a failure is always possible.
2. Getting started
To start up TiogaVoice:
% PushV Commands
% Bringover -pm TiogaVoice.df
% InitializeTiogaVoice -- brings over fonts into the proper subdirectory; alternatively, you can full boot
% Pop
% TiogaVoice
Every Tioga viewer created after TiogaVoice is started will contain a Voice menu button for controlling voice operations. The remainder of this document describes the recording and playback operations available in Tioga viewers, the voice editing operations available in voice viewers, a CommandTool command, and how to recover lost voice ropes.
Note: TiogaVoice is a bit of a gfi hog. It uses 9 to 16 gfis more than Finch, depending on how much you already have in your environment, for a total of 119 gfis (measured from a very naked state).
 30  Interpress
  4  ImagerMemory
  3  Artwork
  4  PopUpButtons
  2  TiogaButtons
 37  LoganBerry
 30  Finch
  9  TiogaVoice
119  total
3. Operations available in a Tioga viewer
The top-level menu button Voice, at the far right of a Tioga viewer's first menu line, toggles the voice menu on and off the bottom menu line, as shown in the following screen picture. The pictured demo file, FrenchDialogue.tioga, is available as part of the TiogaVoice df file. It contains a French dialogue with a scanned picture and voice annotations in French and English.
[Artwork node; type 'Artwork on' to command tool]
Menu operations
AddVoice:
To add a voice annotation to a Tioga document, simply select a character, button AddVoice in the menu header, wait for the beep, and speak into the microphone or telephone handset (the latter will give much better quality). When you have finished speaking, button STOP in the menu header. TiogaVoice will place a voice balloon (intended to evoke a cartoon voice balloon) around the character to indicate the presence of voice. Two sample voice balloons are shown in the screen picture above.
Notes:
1. Voice balloons are visible only when TiogaVoice is running and Artwork is on. If TiogaVoice is not running, opening a document that contains voice will post a warning message in the system Message Window [in the form of a Tioga style error message: "Style error in "Postfix" rule at 0..0 in doc - Document contains voice; load TiogaVoice to see annotations"].
2. If you have a pending-delete selection, TiogaVoice will remove all voice in the selection, leaving the characters, and place the new voice as described below.
3. If you select more than one character for an AddVoice operation, TiogaVoice will place the annotation on the character nearest the caret.
4. If you have a point selection, TiogaVoice will place the annotation on the following character. If there is no following character, it will try the preceding character. If there is no preceding character, see the next case.
5. If you have no selection, when recording finishes TiogaVoice will open a voice editing window on the newly-recorded voice to allow you to store it wherever you like.
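The placement rules in notes 3 through 5 can be summarized in a short Python sketch. The function and parameter names are ours, and we assume that the character "nearest the caret" is the last selected character when the caret is at the end of the selection and the first selected character otherwise. The removal of existing voice for a pending-delete selection (note 2) is not modeled.

```python
from typing import Optional, Tuple


def annotation_target(text_len: int,
                      selection: Optional[Tuple[int, int]],
                      caret_at_end: bool = True) -> Optional[int]:
    """Return the index of the character to annotate, or None when the
    new voice should go to a voice editing window instead.

    `selection` is (start, end) with `end` exclusive; start == end is a
    point selection lying just before character `start`.
    """
    if selection is None:            # note 5: no selection
        return None
    start, end = selection
    if start == end:                 # note 4: point selection
        if start < text_len:
            return start             # following character
        if start > 0:
            return start - 1         # preceding character
        return None                  # empty document: same as no selection
    # note 3: several characters selected; pick the one nearest the caret
    return end - 1 if caret_at_end else start
```

For example, in a five-character document a point selection at the very end, annotation_target(5, (5, 5)), falls back to the preceding character at index 4.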
PlayVoice:
This sentence contains a sample voice annotation, indicated by the voice balloon around the character "v" in "voice". To hear the voice, select the word "voice" and button PlayVoice in the voice menu. This sentence contains a Stephen Ades memorial annotation, recorded in April 1986.
The PlayVoice button plays all voice annotations in the selection in succession.
STOP:
The STOP button halts all voice playback and/or recording operations in progress.
EditVoice:
The EditVoice button opens a voice viewer on each voice annotation in the selection. Each voice balloon in the selection is also opened to display the name of its corresponding voice viewer. That is, each voice balloon is replaced by a grey rectangle enclosing the words "Sound Viewer #n". The following screen picture shows an open voice balloon and its corresponding voice viewer.
[Artwork node; type 'Artwork on' to command tool]
Various editing operations can be performed on the voice in a voice viewer (see Section 4). These changes will not be reflected in any Tioga document until one of the following two voice viewer menu commands is invoked: Save or Store.
Open and closed voice balloons
A voice viewer has no more than one home document at a time. The home document for a voice viewer is changed only by the Store voice viewer command.
The home locations of a voice viewer are the locations (in the home document) that will be updated when you button Save in that voice viewer. The home locations of a voice viewer named "Sound Viewer #n" are shown by grey rectangles containing its name (which mark the immediately preceding character). A voice viewer may have zero, one, or several home locations at a time:
It has zero if it was created by buttoning DictationViewer in a Tioga or voice viewer, or by buttoning AddVoice in a Tioga viewer with no Tioga selection. It also has zero if its link to its home document was broken via the home document's DeleteLinks menu button.
It typically has one, equal to either its original location before editing or the last place that it was stored (via its Store menu button).
It has more than one if text including an open voice balloon was copied to another place in the home document. (If text including an open voice balloon is copied to a different document, the link in the new document is broken and that copy of the open voice balloon reverts to a closed voice balloon. The unedited version of the voice is stored in the new document.)
Please note that whether a voice balloon is open or closed does not say anything about the edited state of the corresponding voice viewer; it merely shows the existence of a link between the text and the corresponding voice viewer.
DeleteVoice:
This button removes all voice annotations in the selection, leaving the text untouched.
Note: This operation does not delete the voice from the voice file server. Voice ropes are managed by a garbage collection scheme that makes them available for deletion when their reference count goes to zero. Actual voice rope garbage collection occurs infrequently at this time. (In fact, as of December 1987 we have yet to delete anything, because we have not exhausted the 8+ hours of storage on the voice file server.)
DictationViewer:
This button starts recording and opens a new voice viewer to allow you to control that dictation. The resulting recorded voice can be stored in any Tioga document. If recording is already in progress, the entire newly-recorded section of voice is placed in the new viewer and recording continues. The DictationOps menu button in the new viewer, which provides a model of a dictation machine, may prove particularly useful in reviewing and/or correcting the newly-recorded voice (see Section 4).
DeleteLinks:
This button breaks all of the TiogaVoice system's internal links between voice viewers and this document. As a result, no voice viewer created from this document will remember where to Save itself to (use Store instead).
Basic Tioga operations
Copying text with annotations
Copying an annotated character copies both the text and its annotation.
If the annotated character is open for voice editing, the unedited version of the annotation is copied. If an open annotated character is copied to another place in the same document, TiogaVoice creates another open annotated character, which represents another home location for the corresponding voice viewer. If an open annotated character is copied to a different document, TiogaVoice closes the annotated character, indicating that it is not a home location for the corresponding voice viewer.
Editing text with annotations
Selecting an annotated character pending-delete and replacing it with other character(s) deletes its voice annotation. However, an annotated character's looks or other properties can be changed without discarding its annotation. Inserting characters or editing characters that are not annotated does not disturb any annotations in the document.
To edit a text segment without discarding its annotations, select the segment, invoke the EditVoice menu button to create voice viewers on the annotations, replace the text, and store the voice annotations in the desired new locations.
Deleting text with annotations
Deleting an annotated character deletes both the text and its annotation.
4. Operations available in a voice viewer
Voice representation
A voice viewer contains a voice capillary (from capillary tube), which is a sequence of dark and light characters that represent the structure of sound and silence in one voice rope. Dark characters indicate the presence of sound, while light characters indicate silence. Some characters have dark and light portions; they contain both sound and silence. Each character in a voice viewer represents one-quarter second of voice. All characters are fixed-width, so the visible length of a voice capillary is proportional to the duration of its voice rope.
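As an illustration of this representation, the following Python sketch turns a list of sound intervals into a row of characters, one per quarter second. The function name, the millisecond units, and the three stand-in characters ("#" for dark, "." for light, "+" for part dark and part light) are assumptions of the sketch; the real viewer draws special fixed-width glyphs. Sound intervals are assumed not to overlap.

```python
from typing import List, Tuple

CHAR_MS = 250  # one capillary character covers one-quarter second


def render_capillary(total_ms: int, sounds: List[Tuple[int, int]]) -> str:
    """Render `total_ms` of voice; `sounds` lists (start_ms, end_ms) of sound."""
    chars = []
    for cell_start in range(0, total_ms, CHAR_MS):
        cell_end = min(cell_start + CHAR_MS, total_ms)
        # Milliseconds of sound that fall inside this character's time slot.
        covered = sum(max(0, min(end, cell_end) - max(start, cell_start))
                      for start, end in sounds)
        if covered == 0:
            chars.append(".")          # silence only
        elif covered == cell_end - cell_start:
            chars.append("#")          # sound only
        else:
            chars.append("+")          # both sound and silence
    return "".join(chars)


# Two seconds of voice: sound for 0.6 s, a pause, then 0.5 s more sound.
capillary = render_capillary(2000, [(0, 600), (1000, 1500)])
```

The result is "##+.##..", eight characters for two seconds of voice.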
Contextual cues
Contextual cues help users find desired editing locations within the relatively simple voice capillary representation. There are four kinds of contextual cues: a playback cue, temporary voice markers, permanent textual annotations, and a color representation of the recent editing history.
Playback cue
The playback cue is a gray rectangle that moves along the voice capillary during playback. It shows the location of the voice currently being played.
Temporary voice markers
Temporary voice markers can be used to mark locations for future voice or textual edits; they last for one voice editing session. A temporary voice marker appears as a small x in a voice capillary. Use the Mark menu button to insert a temporary voice marker (see the description of the Mark menu button below).
Permanent textual annotations
Simple textual annotations can be permanently added to a voice capillary by selecting a location and typing the desired text. No looks or properties are permitted in textual annotations. Limited editing capabilities are available, namely BackSpace (BS or CTRL-A) and BackWord (CTRL-W).
Alternatively, textual annotations can be copied between text and voice viewers using the standard Tioga operations (namely SHIFT-select or CTRL-S select). In addition to easing the annotation of transcripts, this method can be used to add multi-lingual or mathematical annotations expressed as 16-bit Xerox Character Codes. [For the expert, textual annotations are expressed in the Xerox String Encoding, so BackSpace and BackWord may encounter more characters than those shown.]
In some cases only part of a textual annotation is displayed. Textual annotations are clipped by the edge of the voice viewer in the case of a multi-line voice capillary (that is, a textual annotation does not fold to a new line), or by the beginning of a following textual annotation.
Color editing history
When a voice viewer is on a color display, color is used to show the recent editing history across all voice viewers. The results of the four most recent voice editing operations are shown in the following colors (from most to least recent): yellow, orange, red, and brown. All older segments are shown in black. You can think of this scale as either lighter colors to darker colors or as a measure of the "heat" of the operation, like hot metals (excluding white-hot because it would be invisible). If you have a color display, voice viewers open there by default.
When a voice viewer is on a monochrome display, all portions of the voice capillary are shown in uniform black. Grey patterns are not used to approximate the different colors because they would make it difficult to see the playback cue.
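The color rule can be written down as a small Python sketch. The function name and the idea of numbering operations by age (0 for the most recent) are ours; the colors and the fallback to black come from the description above.

```python
# Colors for the four most recent voice editing operations, newest first.
HISTORY_COLORS = ["yellow", "orange", "red", "brown"]


def segment_color(edit_age: int, color_display: bool = True) -> str:
    """Return the display color for a segment produced `edit_age` operations
    ago (0 is the most recent operation; ages are never negative)."""
    if not color_display or edit_age >= len(HISTORY_COLORS):
        return "black"
    return HISTORY_COLORS[edit_age]
```

For example, segment_color(0) is "yellow", segment_color(4) is "black", and any segment on a monochrome display is "black".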
Basic Tioga operations
Basic operations, such as selecting, deleting, and copying voice, use the same commands as in Tioga text viewers.
To select a voice character (one-quarter second of voice), use the left mouse button over the desired character. To select a voice phrase, use the middle mouse button. A voice phrase is a segment of voice between two voice boundaries; either a temporary voice marker or a significant segment of silence acts as a voice boundary. Use the right mouse button to extend a character or phrase selection.
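Phrase selection can be sketched in Python as follows. The sketch works on a row of stand-in characters in which "." means silence and any other character contains sound. It assumes that a silence is "significant" when it lasts at least two characters (half a second); this threshold, the function name, and the choice to treat a significant silence as a segment of its own are our assumptions, not a description of the actual implementation.

```python
import re
from typing import Iterable, Tuple


def phrase_bounds(capillary: str, index: int,
                  markers: Iterable[int] = (),
                  min_silence: int = 2) -> Tuple[int, int]:
    """Return (start, end), end exclusive, of the phrase containing `index`.

    `markers` are positions of temporary voice markers, each lying just
    before the character with that index.
    """
    cuts = {0, len(capillary)} | set(markers)
    # Both edges of every significant silence act as voice boundaries.
    for run in re.finditer(r"\.{%d,}" % min_silence, capillary):
        cuts.update(run.span())
    start = max(c for c in cuts if c <= index)
    end = min(c for c in cuts if c > index)
    return start, end


capillary = "##+.##..###"
first_phrase = phrase_bounds(capillary, 1)                 # (0, 6)
last_phrase = phrase_bounds(capillary, 9)                  # (8, 11)
with_marker = phrase_bounds(capillary, 1, markers=[4])     # (0, 4)
```

Note that the single quarter-second of silence at position 3 is too short to split the first phrase, while a temporary voice marker at position 4 does split it.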
Use the DEL key to delete a previously-selected voice segment, or select the segment while holding down the CTRL key.
Use SHIFT-select to copy voice segments from any open voice viewer. Use CTRL-SHIFT-select to move voice segments from any open voice viewer.
These operations can also be performed via the Operations menu in the EditTool.
Note: there is no UNDO for voice editing operations. However, note that a reference to the original unmodified voice still exists in the Tioga document until you save the voice viewer.
Menu operations
Add:
These operations control the recording of new voice into voice viewers. Recording begins after the beep. Arrows are added to the viewer at regular intervals (every 1/4 second) to indicate recording progress.
AddAtSelection: (Add[left]) This operation is the voice viewer analogue of the Tioga viewer AddVoice operation. It begins recording new voice at the voice selection.
DictationViewer: (Add[middle]) This button is analogous to the DictationViewer menu button in a Tioga text viewer. It opens a new voice viewer and begins recording into it. This operation can be invoked while recording is in progress to allow reviewing and/or correcting the newly-recorded voice (via the DictationOps menu button, described below). In this case the entire newly-recorded section of voice is placed in the new viewer, and recording continues. A temporary voice marker is placed in the original voice viewer at the point where voice was being added. This temporary voice marker allows you to copy the newly-recorded section of voice back into the original voice viewer when it has been completed to your satisfaction.
Play:
PlayViewer: (Play[left]) Plays the entire contents of the voice viewer.
PlaySelection: (Play[middle]) Plays the contents of the voice selection.
STOP:
Use the STOP button to halt any recording or playback operation in progress.
The Disconnect menu button in the Finch viewer can also be used to halt recording or playback, but it may discard the last few seconds of voice.
DictationOps:
These operations were provided specifically to mimic the "stop, listen, [erase,] resume" model of a dictation machine. They use the concept of fresh (vs stale) voice. Stale voice is voice that is already in a document that you are editing, whereas voice that is currently being recorded is fresh. The idea is that users are more likely to wish to edit the fresh voice than the stale: to correct the choice of words or phrasing, eliminate coughs or other distractions, eliminate unwanted hesitations in the wrong place, and so on. When a voice viewer is on a color display, fresh voice is shown in a light color.
On a monochrome display, fresh voice is not visually distinguished from stale voice. Nevertheless, the operations still use this concept. The DictationViewer button (Add[middle]) can be used to prevent new material from merging indistinguishably with the old on a monochrome display. If only DictationOps are used to modify the voice in this viewer, the end of the viewer will coincide with the end of the freshest voice.
PlayFromSelection: (DictationOps[left]) Plays from the voice selection to the end of the freshest voice.
ReplaceFromSelection: (DictationOps[middle]) Deletes the voice from the current selection to the end of the freshest voice and starts recording at the new end of the voice viewer.
AddAtEnd: (DictationOps[right]) Begins recording new voice at the end of the freshest voice.
A sample use of the dictation operations goes like this: You record until you make a mistake or you wish to listen to what has just been dictated. Repeatedly selecting the approximate point with the mouse and buttoning PlayFromSelection allows you to quickly find the end of the last phrase to be retained. Then ReplaceFromSelection is used to replace the remainder of the previously-recorded segment with new dictation. AddAtEnd would be used instead if the replayed passage turned out to be acceptable.
The dictation operations stand apart from the normal rules for playback/recording (which are that one must be stopped manually before another can be requested). A playback or record request from the DictationOps menu implicitly cancels any already in progress.
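The three dictation operations can be modeled with a short Python sketch. Letters stand in for units of recorded voice, and the class and method names are ours. The sketch assumes that recording takes place instantly and that only dictation operations are used, so it does not model the beep, the playback cue, or the cancellation of operations in progress.

```python
class DictationModel:
    """Toy model of one voice viewer driven only by DictationOps."""

    def __init__(self, stale: str = ""):
        self.voice = list(stale)            # one item per unit of voice
        self.fresh_end = len(self.voice)    # end of the freshest voice

    def _record(self, at: int, new_voice: str) -> None:
        # Insert the new voice at `at`; it becomes the freshest voice.
        self.voice[at:at] = list(new_voice)
        self.fresh_end = at + len(new_voice)

    def play_from_selection(self, sel: int) -> str:
        """Play from the selection to the end of the freshest voice."""
        return "".join(self.voice[sel:self.fresh_end])

    def replace_from_selection(self, sel: int, new_voice: str) -> None:
        """Delete from the selection to the end of the freshest voice,
        then record in its place."""
        del self.voice[sel:self.fresh_end]
        self._record(sel, new_voice)

    def add_at_end(self, new_voice: str) -> None:
        """Record new voice at the end of the freshest voice."""
        self._record(self.fresh_end, new_voice)


viewer = DictationModel("old ")
viewer.add_at_end("the quick brwn fx")         # dictate, with a mistake
heard = viewer.play_from_selection(14)         # listen: "brwn fx"
viewer.replace_from_selection(14, "brown fox") # re-dictate from that point
result = "".join(viewer.voice)
```

After these steps the viewer holds "old the quick brown fox", and the end of the viewer coincides with the end of the freshest voice, as described above.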
Mark:
MarkAtPlayback: (Mark[left]) Places a temporary voice marker at the current playback location. This allows users to mark important locations while listening.
MarkAtSelection: (Mark[middle]) Places a temporary voice marker at the beginning of the current voice selection. If the current voice selection is pending-delete, it also places a temporary voice marker at the end of the voice selection.
DeleteViewerMarks: (Mark[CTRL-left]) Deletes all of the temporary markers in the voice viewer.
DeleteSelectionMarks: (Mark[CTRL-middle]) Deletes all of the temporary markers in the voice selection.
Store:
The Store menu button stores the current contents of the voice viewer at the first character of the Tioga selection, sets the (single) home location equal to that character (thus forgetting any previous home locations for this voice viewer), and makes the enclosing Tioga document the home document.
The rules for interpreting Tioga selections are similar to those for the Tioga AddVoice menu button:
1. If you have a pending-delete selection, TiogaVoice will remove all voice in the selection, leaving the characters, and store the voice as described below.
2. If you select more than one character for a Store operation, TiogaVoice will place the annotation on the character nearest the caret.
3. If you have a point selection, TiogaVoice will place the annotation on the following character. If there is no following character, it will try the preceding character. If there is no preceding character, an error message will appear in the system Message Window.
Save:
The Save menu button saves the current contents of the voice viewer at all matching open voice balloons (all home locations) in the home document.
AdjustSilences:
Reduces every silence in the voice viewer that is longer than the length set by the SetCriticalSilenceLength CommandTool command to that length; shorter silences are left unchanged. The default value is 500 milliseconds (see Section 5).
5. CommandTool commands
SetCriticalSilenceLength <lengthInMilliseconds>
When the AdjustSilences button is bugged for any voice viewer, all silences of more than this length will be reduced to this length.
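The effect of AdjustSilences can be sketched in Python as follows. The function name and the representation of voice as a list of (kind, duration) pairs are assumptions made for illustration; only the rule itself, and the 500 millisecond default, come from this document.

```python
from typing import List, Tuple


def adjust_silences(segments: List[Tuple[str, int]],
                    critical_ms: int = 500) -> List[Tuple[str, int]]:
    """Shorten every silence longer than `critical_ms` to `critical_ms`.

    Each segment is ("sound" or "silence", duration in milliseconds).
    Sound segments and shorter silences are returned unchanged.
    """
    return [(kind, min(ms, critical_ms) if kind == "silence" else ms)
            for kind, ms in segments]


before = [("sound", 1200), ("silence", 1800), ("sound", 700), ("silence", 300)]
after = adjust_silences(before)
```

Here the 1800 millisecond pause shrinks to 500 milliseconds, the 300 millisecond pause is untouched, and the total length drops from 4000 to 2700 milliseconds.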
6. Recovering lost voice ropes
Recording new voice and/or editing previously-existing voice create new voice ropes throughout an editing session. However, changes to a Tioga document that contains voice become permanent only when the document is saved. If for some reason you wish to access a voice rope that no longer has any references to it (for example, if your machine crashes after the voice is created but before the document is saved), see a voice wizard: Dan Swinehart, Doug Terry, or Polle Zellweger.