XEROX
Filed on: <Audio>Doc>VPVFS.bravo
Storage and retrieval of voice from a network file server will serve as the connection between the "real-time" voice world of Etherphones and the non-interactive world of voice messages.
FEASIBILITY
In raw horsepower, our existing collection of hardware provides adequate performance for a multi-user voice storage system. Nevertheless, we feel that it will be a challenge to provide for multiple simultaneous file stores and retrieves.
Precedents
A single Alto with model 31 disks, running the standard BCPL FTP, can support a real-time voice file retrieval at 128 Kbps, using 8000 bytes of buffering at the receiver. The same Alto is incapable of extending a file at 16 Kbps, although writing into an existing file works well.
An unloaded IFS can be used for file storage at 64 Kbps and retrieval at 128 Kbps. Juniper can both store and retrieve at 64 Kbps although the initial delays are long.
A "voice File Server 0" program, by W. Nowicki, supports a single file store or retrieve on a Dorado using the file stream interface of Pilot.
PERFORMANCE REQUIREMENTS
Bandwidth
64 Kbps voice corresponds to 8000 bytes per second, which is about 16 Alto disk pages per second or 4 IFS pages per second. Providing two one-second disk buffers per voice connection to a file server would permit (for contiguous files) the use of only one disk seek per second per connection while using only 16,000 bytes of memory. 64 Kbps is not very fast.
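The arithmetic above can be checked with a short sketch. The page sizes (512 bytes for an Alto disk page, 2048 bytes for an IFS page) are assumptions inferred from the figures quoted in this memo, and the function names are illustrative only.

```python
import math

VOICE_BITS_PER_SECOND = 64_000
BYTES_PER_SECOND = VOICE_BITS_PER_SECOND // 8  # 8000 bytes per second

ALTO_PAGE_BYTES = 512   # assumed Alto disk page size
IFS_PAGE_BYTES = 2048   # assumed IFS page size


def pages_per_second(page_bytes):
    """Whole pages needed each second to carry one voice stream."""
    return math.ceil(BYTES_PER_SECOND / page_bytes)


def buffer_bytes(buffers_per_connection=2, seconds_per_buffer=1):
    """Memory used by the disk buffers of one voice connection."""
    return buffers_per_connection * seconds_per_buffer * BYTES_PER_SECOND


if __name__ == "__main__":
    print(pages_per_second(ALTO_PAGE_BYTES), "Alto pages per second")
    print(pages_per_second(IFS_PAGE_BYTES), "IFS pages per second")
    print(buffer_bytes(), "bytes of buffering per connection")
```

With one-second buffers, each connection needs one buffer refill (and so, for a contiguous file, one seek) per second.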
Capacity
A single T-80 disk with Pilot formatting provides 113,680 512 byte pages for a total capacity of 58.2 Mbytes, corresponding to just over two hours of voice. A T-300 with 2048 byte sectors provides 253.7 Mbytes, corresponding to 8.8 hours of voice. While we have not tried to guess how voice messages may be used, such amounts seem adequate for the next year or so.
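As a rough check of these capacity figures, the following sketch converts disk capacity to hours of 64 Kbps voice. It assumes that "Mbytes" means 10^6 bytes and that no space is lost to file system overhead or saved by silence suppression.

```python
BYTES_PER_SECOND = 8000  # 64 Kbps voice

T80_BYTES = 113_680 * 512   # Pilot-formatted T-80: pages times bytes per page
T300_BYTES = 253_700_000    # T-300 figure from the memo, taken as 10^6-byte Mbytes


def hours_of_voice(capacity_bytes):
    """Hours of uncompressed 64 Kbps voice that fit in capacity_bytes."""
    return capacity_bytes / BYTES_PER_SECOND / 3600


if __name__ == "__main__":
    print(f"T-80:  {hours_of_voice(T80_BYTES):.2f} hours")
    print(f"T-300: {hours_of_voice(T300_BYTES):.2f} hours")
```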
Timing
The voice file server must speak to the Etherphones. Because file storage and retrieval is less interactive than conversation, we feel that the voice transmission protocol used to communicate with the file server may reasonably involve higher delays in order to reduce the number of packets per second. (There may be a problem since we need to support the recording of conversations.)
When multiple actions are proceeding at once, however, a very large number (several hundred) of packets per second may be involved.
During playback, the voice file server must maintain a clock accurate enough to meter outbound data to an Etherphone. The Etherphone does not have enough memory to handle uneven flow. One possibility would be for the Etherphone to occasionally (once a second) transmit the time to the server. Such a scheme would only require the file server to have a clock accurate to 50 milliseconds or so over the course of a second.
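One way the periodic time report could be used is sketched below. This is only an illustration under stated assumptions: the class name PlaybackPacer, its methods, and the idea of tracking a single clock offset are all hypothetical, and times are plain seconds passed in by the caller rather than read from a real clock.

```python
class PlaybackPacer:
    """Meters outbound voice against the Etherphone's notion of time."""

    def __init__(self, bytes_per_second=8000):
        self.bytes_per_second = bytes_per_second
        self.offset = 0.0    # Etherphone time minus server time
        self.start = None    # playback start, in Etherphone time
        self.sent = 0        # bytes already shipped

    def sync(self, server_now, etherphone_now):
        """Handle a time report from the Etherphone (about once a second)."""
        self.offset = etherphone_now - server_now
        if self.start is None:
            self.start = etherphone_now

    def bytes_due(self, server_now):
        """Bytes that should be shipped now to stay on schedule."""
        if self.start is None:
            return 0
        elapsed = (server_now + self.offset) - self.start
        target = round(elapsed * self.bytes_per_second)
        return max(0, target - self.sent)

    def record_sent(self, count):
        self.sent += count
```

Because the offset is refreshed on every report, server clock drift accumulates for at most one reporting interval, which is why a clock good to tens of milliseconds per second would suffice.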
ENVIRONMENT
We believe that Pilot running on a Dolphin with Trident disks is the appropriate environment for the voice file server.
The reasons in favor of using Pilot are:
1. Alto is on the way out and Pilot is on the way in; we should go with the wave of the future rather than get undertowed by the past.
2. Pilot (Cascade) provides a better programming environment than Alto Mesa, so the VFS will come up faster.
3. The Pilot interface to the disk is better than the Alto one and provides good performance for large transfers. Hence we may avoid writing new disk management routines.
4. D machines provide much better performance than Altos -- performance will be important.
5. There is at least some chance that Alpine will provide sufficient performance for a VFS. If not, we may still be able to borrow some components.
Reasons against using Pilot are:
1. There is not yet a Trident disk controller for Dolphins, so we will have to use Dorados until the Trident controller becomes available for Dolphins. Another alternative is to use the main disk of a Dolphin.
The consensus is that I should start working in Pilot/Dorado land, move with Cedar as it becomes available, and migrate to Dolphins when the controller is born.
IDEAS
The voice file server will provide services mediated by the Etherphone Server. The fundamental file server requirement is speed. We will put most of the control machinery elsewhere.
Piece Tables
The file server function must provide piece table like functions in order to permit editing of voice messages without copying of data.
Arguments for implementing piece tables directly:
- All real-time stuff should be in the VFS, no real-time stuff should be outside of it. If piece tables are outside, then they have to satisfy real-time constraints.
- If piece tables are outside, then storage management of the VFS must be at least partially outside, since pieces may have multiple references and cannot be recycled without knowing the (external) state of those references.
Arguments for implementing piece tables elsewhere:
- Inclusion of the piece tables in the VFS will complicate it, thereby delaying its operability and jeopardizing its ability to run in real time.
- If piece tables are managed externally, then we need not implement them at all for starters. When an implementation occurs, we have much more flexibility to play with alternatives.
Letting the real-time constraints of piece table management get outside the VFS is really no big deal: the bandwidth of this information will be 2-6 orders of magnitude less than the bandwidth of the actual audio, so external implementations should have no trouble keeping up with the VFS. In most cases, a single message can give all the pieces for an entire playback. The general consensus is that the 2nd alternative (no piece table in the VFS) is the way to go.
What about problems with too many too small pieces? If pieces get too fragmented then the VFS won't be able to play them back in real time. Furthermore, what constitutes "too fragmented" is dependent on the particular VFS implementation, and so should be decided by the VFS rather than some external party. Our proposal is that the VFS should provide a command CheckPiecesOk that will indicate if a given set of pieces can be played back in real time. Ostensibly this command would be used by editors: if the answer is "no way" then the editor can combine pieces to achieve real-time ability.
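A minimal sketch of how a CheckPiecesOk test might work is given below. The model is an assumption made for illustration, not a description of any VFS implementation: each piece is taken to cost one disk seek, transfer time is ignored, and playback starts with a fixed amount of buffered audio. The parameter values (50 ms seek, 1000 ms buffer) are placeholders.

```python
def check_pieces_ok(piece_lengths_ms, seek_ms=50, buffer_ms=1000):
    """Return True if the pieces can be fetched without the buffer running dry.

    piece_lengths_ms: duration of each piece, in playback order.
    seek_ms:          assumed cost of locating one piece on disk.
    buffer_ms:        assumed audio buffered ahead of the play point.
    """
    lead_ms = buffer_ms
    for length_ms in piece_lengths_ms:
        # Buffered audio drains in real time while the disk seeks.
        lead_ms -= seek_ms
        if lead_ms < 0:
            return False
        # The fetched piece extends the lead, up to the buffer size.
        lead_ms = min(buffer_ms, lead_ms + length_ms)
    return True
```

Under this model a long run of pieces shorter than the seek time eventually fails, while pieces of a second or so always pass. An editor receiving False could merge adjacent short pieces into a new file and try again.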
What is the minimum grain of pieces? Should pieces be specified as whole disk pages (substantial fractions of a second at a time), individual samples, or at some intermediate level? For encryption, units on the order of 64 bits at a time or larger have to be dealt with together. We propose that the start and stop times for pieces should be specified in 8-sample chunks. This turns out to be one millisecond's worth, and that has a nice ring to it. What kind of time resolution do real audio wizards need? If the VFS were used to edit the soundtrack for a film would 1 millisecond resolution be adequate? One possibility is that clients must require enough data to place pieces on reasonable boundaries (if they need the resolution).
We suspect that 10 to 50 millisecond resolution would be adequate.
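The proposed 8-sample grain can be illustrated as follows. The sketch assumes 8000 samples per second at 8 bits per sample (consistent with 64 Kbps and 8000 bytes per second), under which one chunk is exactly one millisecond and exactly one 64-bit encryption unit. The function names are hypothetical.

```python
SAMPLES_PER_SECOND = 8000
BITS_PER_SAMPLE = 8          # assumed: 64 Kbps / 8000 samples per second
SAMPLES_PER_CHUNK = 8        # proposed grain for piece boundaries

CHUNK_MS = 1000 * SAMPLES_PER_CHUNK / SAMPLES_PER_SECOND   # 1.0
CHUNK_BITS = SAMPLES_PER_CHUNK * BITS_PER_SAMPLE           # 64


def chunk_to_byte_offset(chunk_index):
    """Byte offset in the voice stream where a chunk begins."""
    return chunk_index * SAMPLES_PER_CHUNK * BITS_PER_SAMPLE // 8


def snap_sample_to_chunk(sample_index):
    """Round a sample index down to the chunk that contains it."""
    return sample_index // SAMPLES_PER_CHUNK
```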
High level interface
The high-level command interface for the VFS (i.e. that used between it and the Etherphone server) should provide something like the following calls:
- CreateFile: does the obvious thing, sets file's ref count to 1.
- ExpungeFile: may not be necessary, see reference count stuff below.
- Append: audio is supplied from some internet source and is appended to the end of the file. This operation and file deletion are the only ones that modify the contents of voice files. In particular, there are no insert or replace operations.
- Playback: a piecelist containing (file, extent) pairs is specified; the indicated audio is shipped to a sink somewhere in the internet.
- StopTransfer: causes an Append or Playback operation to terminate.
- Copy (?): simply appends a piece of one file onto the end of another file. This command could be simulated with an Append+Playback combination where both source and sink are the VFS. This may not be necessary, especially for starters.
- IncRefCount: used for storage management. Causes the VFS to increment the reference count for a given file.
- DecRefCount: causes the VFS to decrement the reference count for a given file. When a reference count becomes zero, the VFS should feel free to reallocate its space. See Page Level Reference Counts below for an alternate mechanism for reference counting.
- ChangeStopTime: see Reliability below.
- CheckPiecesOk: see the discussion of fragmented pieces under Piece Tables above.
Additional maintenance-level commands may be needed, such as:
- Enumerate directory: get a list of file id's.
- Store and retrieve via FTP (or some other "data" protocol). Clients other than Etherphones may want to store or retrieve voice files in a non-interactive way.
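To make the shape of the high-level interface concrete, here is a toy in-memory model of a few of the calls listed above. It is a sketch only: the names ToyVFS and Piece, the use of millisecond extents, and passing audio directly as bytes (rather than naming an internet source or sink) are all assumptions. StopTransfer, ChangeStopTime, CheckPiecesOk, and Copy are omitted.

```python
from dataclasses import dataclass

BYTES_PER_MS = 8  # 64 Kbps voice is 8000 bytes per second


@dataclass(frozen=True)
class Piece:
    file_id: int
    start_ms: int
    length_ms: int


class ToyVFS:
    def __init__(self):
        self._files = {}     # file id -> bytearray of voice data
        self._refs = {}      # file id -> reference count
        self._next_id = 1

    def create_file(self):
        """CreateFile: new empty file with a reference count of 1."""
        file_id = self._next_id
        self._next_id += 1
        self._files[file_id] = bytearray()
        self._refs[file_id] = 1
        return file_id

    def append(self, file_id, audio):
        """Append: the only operation that adds to a file's contents."""
        self._files[file_id].extend(audio)

    def playback(self, pieces):
        """Playback: concatenate the audio named by a piece list."""
        out = bytearray()
        for piece in pieces:
            lo = piece.start_ms * BYTES_PER_MS
            hi = lo + piece.length_ms * BYTES_PER_MS
            out += self._files[piece.file_id][lo:hi]
        return bytes(out)

    def inc_ref_count(self, file_id):
        self._refs[file_id] += 1

    def dec_ref_count(self, file_id):
        """DecRefCount: reclaim the file when the count reaches zero."""
        self._refs[file_id] -= 1
        if self._refs[file_id] == 0:
            del self._files[file_id]
            del self._refs[file_id]

    def file_ids(self):
        """Enumerate directory: list of live file ids."""
        return sorted(self._files)
```

Note that editing never touches stored audio here: a reordered message is just a different piece list handed to playback.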
File Storage Model
It is assumed that the VFS will use some mechanism to avoid storing lots of zeros for silence. One technique: use a "magic" block number in file descriptors to indicate "all zeroes so no storage was allocated." The technique used in the Voice File Server 0 program is to store essentially images of the voice packets received, so that silence is not stored and every piece of the file carries sequence number information.
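The "magic" block number technique might look like the sketch below. The sentinel value, the 512-byte block size, and the class name are assumptions chosen for illustration; the sketch follows the memo in treating silence as all-zero data.

```python
SILENCE = -1        # hypothetical "magic" block number: no storage allocated
BLOCK_BYTES = 512   # assumed block size


class SparseVoiceFile:
    def __init__(self):
        self.block_map = []   # file descriptor: one entry per logical block
        self.blocks = []      # blocks that actually occupy storage

    def append_block(self, data):
        if len(data) != BLOCK_BYTES:
            raise ValueError("expected a full block")
        if not any(data):
            # All zeroes: record silence without allocating storage.
            self.block_map.append(SILENCE)
        else:
            self.block_map.append(len(self.blocks))
            self.blocks.append(bytes(data))

    def read_block(self, index):
        number = self.block_map[index]
        if number == SILENCE:
            return bytes(BLOCK_BYTES)
        return self.blocks[number]
```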
One possible implementation of voice files is to store the Ether packets exactly as they arrive. This makes recording and playback easy, but makes it somewhat harder to find a particular time in a file (even binary search will take time: the effect will be to increase the minimum piece size).
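The search cost mentioned above can be seen in a small sketch. It assumes each stored packet carries a start time in milliseconds (the memo only says that sequence number information is kept) and that these times are available as a sorted list; on a real disk each probe of the search would be a page read.

```python
import bisect


def find_packet(start_times_ms, t_ms):
    """Index of the last stored packet starting at or before t_ms.

    start_times_ms must be sorted. Gaps between packets are silence
    that was never stored, so byte offsets cannot be computed from
    the time directly and a search is needed.
    Returns None if t_ms precedes the first packet.
    """
    index = bisect.bisect_right(start_times_ms, t_ms) - 1
    return index if index >= 0 else None
```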
Reliability
What if the Etherphone server dies? In order to avoid dangling recording sessions that eat up all available disk space, all Append and Playback operations should have timeouts. One possible scheme:
- On issuing each Append/Playback command, the EPS specifies an "autostop" time. If this time is reached, then the VFS will automatically terminate the Append/Playback. For Playback, the piece list implies a stop time anyway.
- To continue an Append, the EPS periodically issues ChangeStopTime messages to advance the autostop time farther into the future. In messages containing several pieces, the ChangeStopTime command can be combined with additional piece specifications.
Note that this also gives the EPS a modicum of control over storage allocation in the VFS, since it can interrogate the VFS to find out how much space is left and, if space is getting tight, determine who gets to use how much.
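The autostop scheme can be sketched as follows. The class and method names are hypothetical, and times are plain numbers supplied by the caller so that the behavior is easy to follow.

```python
class AppendSession:
    """One Append in progress, guarded by an autostop time."""

    def __init__(self, autostop_time):
        self.autostop_time = autostop_time
        self.stopped = False
        self.bytes_written = 0

    def change_stop_time(self, new_time):
        """ChangeStopTime: has no effect once the session has stopped."""
        if not self.stopped:
            self.autostop_time = new_time

    def write(self, now, data):
        """Accept voice data unless the autostop time has passed."""
        if self.stopped or now >= self.autostop_time:
            self.stopped = True
            return False
        self.bytes_written += len(data)
        return True
```

If the EPS dies, no further ChangeStopTime messages arrive and the session stops by itself at the last autostop time it was given.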
Page Level Reference Counts
An alternate scheme for reference counting is to place reference counts on segments of files rather than on whole files. If piece tables start getting used a lot then there will be many files sitting around with only one small piece in use. If reference counts are on a file basis, then the whole file will have to sit around wasting space. An alternative is to do reference counting on a piece basis, with the IncRefCount and DecRefCount commands referring to pieces rather than files. The VFS could then mark individual blocks. This doesn't appear to involve substantially more work for either party, but would allow blocks to be recycled individually (e.g. turn unused blocks to silence). This may be left out of the initial implementation since few piece tables will span multiple files. However, even if a piece table only spans part of one file, 90% of the file may still be unused.
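A sketch of reference counting on a piece basis is shown below, with IncRefCount and DecRefCount taking extents and the counts kept per block. The 64 ms block (a 512-byte page at 8 bytes per millisecond) and all names are assumptions for illustration.

```python
from collections import Counter


class BlockRefCounts:
    def __init__(self, block_ms=64):
        self.block_ms = block_ms
        self.counts = Counter()   # (file id, block index) -> references

    def _blocks(self, file_id, start_ms, length_ms):
        """Blocks touched by the extent [start_ms, start_ms + length_ms)."""
        first = start_ms // self.block_ms
        last = (start_ms + length_ms - 1) // self.block_ms
        return [(file_id, b) for b in range(first, last + 1)]

    def inc_ref_count(self, file_id, start_ms, length_ms):
        for key in self._blocks(file_id, start_ms, length_ms):
            self.counts[key] += 1

    def dec_ref_count(self, file_id, start_ms, length_ms):
        """Drop one reference; return blocks that can now be recycled."""
        freed = []
        for key in self._blocks(file_id, start_ms, length_ms):
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.counts[key]
                freed.append(key)
        return freed
```

When the original whole-file reference is dropped, only the blocks still named by some other piece remain allocated; the rest could be turned to silence.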
We are still divided on this issue.