[Indigo]<Voice>LCSAudio>voicevsdata.bravo!1

Real-time communications, and packet voice in particular, place substantially different demands on a datagram-based internet than data traffic does. Relevant facts about the requirements of voice communications are presented. The differing needs of real-time (voice) users and data users are discussed. Some possible ways of managing the operation of a combined voice and data internet are described. A concrete proposal for incorporation of voice and other real-time applications into OISCP is made.

Telephone quality voice can be achieved with rates of 8000 bits per second or so, but at this writing, the required techniques are computationally expensive. Intermediate bit rates are a possibility, but 64,000 bits per second represents the present telephone industry standard. For this reason, we restrict our attention to 64 Kbps telephone industry compatible speech. Such a digital voice signal is produced by sampling the voice 8000 times per second and representing each sample as an 8 bit encoding of the amplitude of the voice.
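The arithmetic behind the 64 Kbps figure, and the time it takes a talker to fill one packet at that rate, can be sketched as follows. The function names are introduced here for illustration only, and the payload size is a free parameter, not something fixed at this point in the memo.

```python
SAMPLE_RATE_HZ = 8000    # samples per second, as stated above
BITS_PER_SAMPLE = 8      # one 8 bit amplitude code per sample


def bit_rate_bps(sample_rate_hz=SAMPLE_RATE_HZ, bits_per_sample=BITS_PER_SAMPLE):
    """Raw bit rate of the sampled voice signal."""
    return sample_rate_hz * bits_per_sample


def packet_fill_time_ms(payload_bytes, rate_bps=None):
    """Milliseconds of speech needed to fill a packet of payload_bytes."""
    if rate_bps is None:
        rate_bps = bit_rate_bps()
    return payload_bytes * 8 * 1000 / rate_bps


if __name__ == "__main__":
    print(bit_rate_bps())            # bits per second
    print(packet_fill_time_ms(160))  # fill time for a 160 byte payload
```

The fill time matters because it is delay added before the packet can even be sent: larger packets cost less header overhead but more delay.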

While a conversation between people is usually full duplex (both people can talk at once), usually only one participant at a time is speaking. In addition, when a person is speaking, there are often gaps between words or sentences. On the other hand, both participants occasionally speak at once. Over conversations in general, something like 47% of the full-duplex channel is used.

The laws of large numbers apply to these statistics. Useful data points derive from the telephone industry use of Time Assigned Speech Interpolation (TASI), in which a certain number of trunk circuits (e.g. transoceanic cable circuits) are overcommitted. If 24 full duplex trunks are available, usually 36 conversations can be supported, for a ratio of 1.5. If 150 circuits are available, 300 conversations can usually be supported, for a ratio of 2.0. [Notes on the Network, BSTJ] These statistical effects are usually referred to as the TASI advantage.
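The effect of overcommitment can be explored with a simple binomial model. The sketch below is an illustration, not the telephone industry's engineering method: it assumes that each talker is active independently of the others with the same probability, which real conversations only approximate. The name `overload_probability` and the `activity` parameter are introduced here.

```python
from math import comb


def overload_probability(talkers, channels, activity):
    """Probability that more than `channels` of `talkers` are active at once.

    Assumes independent talkers, each active with probability `activity`
    (a simplifying assumption made for this sketch).
    """
    return sum(
        comb(talkers, k) * activity**k * (1 - activity) ** (talkers - k)
        for k in range(channels + 1, talkers + 1)
    )


if __name__ == "__main__":
    # Same overcommitment ratio of 1.5 at two different sizes.
    for talkers, channels in [(36, 24), (360, 240)]:
        print(talkers, channels, overload_probability(talkers, channels, 0.45))
```

Holding the overcommitment ratio fixed and increasing the number of circuits makes the overload probability fall, which is why a larger group of trunks can be overcommitted by a larger ratio.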

Consistent service is perhaps the wrong title for this concept. Once set up, a voice connection should maintain an adequate quality. In the presence of network overloading it would be better to reject (block) connection attempts altogether than to offer poor quality. (A corollary to this is that it is certainly better to block new calls than to degrade old ones.)

Suppose a medium bandwidth (9600 bits per second to 56,000 bits per second) asynchronous (start-stop) serial line with no flow control is connected to an OIS gateway. If the internet side of the gateway cannot keep up, the gateway buffers will eventually overflow and data will be lost. This example exists today with the Research Internet Data Line Scanner (DLS); if a user of the dial-out services cannot keep up with the speed of the DLS line, data is lost.

This example is not strictly real-time. There are no particular delay requirements, but there is a bandwidth requirement. As a somewhat contrived example, suppose an internet is used to link two such non-flow controlled circuits (of the same speed). The delay introduced by the internet (by buffering and transit delay) can only increase. Once capacity on the outgoing circuit is left idle, the time lost can never be made up.

Suppose there is a producer of data for the real-time application that delivers data at a constant rate. The data is collected at the originating end until a full packet is accumulated. The packets are sent (at a constant rate) to the receiver, where the data is doled out (at a constant rate) to the consumer of data. Some amount of data, perhaps less than a packet’s worth, perhaps more, is buffered at the receiver to smooth out jitter in the arrival of packets.
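The receiver-side smoothing described above can be sketched as a small calculation. This is a minimal model under stated assumptions: packets are sent at a fixed interval, the receiver starts playing a fixed time after the first packet arrives, and a packet that arrives after its playout time counts as an empty-buffer event. The function name and the use of integer milliseconds are choices made for this sketch.

```python
def count_underruns(delays_ms, interval_ms, prebuffer_ms):
    """Count packets that arrive too late to be played on schedule.

    delays_ms[i] is the network delay of packet i, which is sent at time
    i * interval_ms. Playout of packet 0 starts prebuffer_ms after it
    arrives, and each later packet is due one interval after the previous.
    """
    if not delays_ms:
        return 0
    playout_start = delays_ms[0] + prebuffer_ms
    late = 0
    for i, delay in enumerate(delays_ms):
        arrival = i * interval_ms + delay
        deadline = playout_start + i * interval_ms
        if arrival > deadline:
            late += 1  # buffer was empty when this packet was needed
    return late


if __name__ == "__main__":
    delays = [10, 30, 10, 50]  # hypothetical per-packet delays in ms
    print(count_underruns(delays, interval_ms=20, prebuffer_ms=20))
    print(count_underruns(delays, interval_ms=20, prebuffer_ms=40))
```

In this model a packet is late exactly when its delay exceeds the first packet's delay plus the prebuffer time, so a deeper buffer trades added delay for fewer gaps.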

For a printer application, an empty buffer might mean a missed scan line. For a voice application, an empty buffer might mean a momentary hiccup in the conversation. Probably neither case is absolutely catastrophic, but such hiccups must not occur at more than some acceptable rate.

To illustrate the operation of the first principle, we offer two examples, the Ethernet and point to point links between gateways. On the Ethernet, everyone is equal. Generally speaking, when the load is low, everyone gets the bandwidth they need and the issue is moot. When the offered load exceeds the bandwidth of the net, the contending users share equally [Hupp & Shoch]. (In fact, the contending users share the actual bandwidth in proportion to the rate at which they become ready, see elsewhere.) For the point to point line case, the current gateway program allocates the line first come first served, and maintains a queue of packets to transmit on the line. If the queue exceeds a certain limit, packets are dropped. By symmetry, everyone's packets are dropped with equal probability. (Actually, gateways promote small packets; by generating great numbers of small packets, a client could get 100% of a phone line, locking out other users. I think this is a bug.)

The effect of all this, from the standpoint of a data-only internet, is that even when the communications capacity is greatly overcommitted, all users get at least some of it, and thus all users make forward progress (if you wait long enough, your file transfer will finish).

Real-time communications (including voice) are fundamentally different. Once a connection is set up, it should get the bandwidth it needs. It is better to refuse service altogether than to use internet bandwidth providing poor service. Consider what would happen if an "equal-sharing" network were slowly loaded with telephone calls. As the first users pile on, everything works fine; there is enough capacity for all. At some point, the demand exceeds the supply and the sharing property of the network allocates the available bandwidth equally to all contending users. All the telephone calls fail at once! It would have been better to refuse service to the "last-straw" phone call, thus limiting the outrage.

These matters can be interpreted as optimizing an objective function. The objective function for data users might be the sum of the logarithms of the bandwidths per user: more bandwidth is better, the channel is shared equally, and getting zero bandwidth is infinitely bad. The objective function for real-time users might be the sum of step functions with the various jumps at the required bandwidths for the various users: more than a certain amount is ok, less than that amount produces nothing. The maximum of this function is achieved by allocating the requested bandwidth to each user until capacity is reached; other users get nothing (but may try again later). It is not clear how to combine voice and data users within this model without adding information on the relative worths of data and voice.
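The two objective functions can be written down directly and evaluated on the two allocation policies. The following is a sketch only: the function names are introduced here, and `admit_in_order` grants requests in arrival order, which reaches the maximum of the step objective when all users request the same bandwidth (as with 64 Kbit voice) but not necessarily when requests differ.

```python
import math


def log_utility(alloc):
    """Data-user objective: sum of log bandwidths; zero bandwidth is -infinity."""
    return sum(math.log(b) if b > 0 else -math.inf for b in alloc)


def step_utility(alloc, required):
    """Real-time objective: one unit per user who gets at least what is required."""
    return sum(1 for b, r in zip(alloc, required) if b >= r)


def equal_share(capacity, users):
    """Split capacity evenly among all contending users."""
    return [capacity / users] * users


def admit_in_order(capacity, required):
    """Grant each request in full while capacity remains; refuse the rest."""
    alloc, left = [], capacity
    for r in required:
        if r <= left:
            alloc.append(r)
            left -= r
        else:
            alloc.append(0)
    return alloc


if __name__ == "__main__":
    required = [64] * 20  # twenty 64 Kbit calls offered to 1000 Kbit of capacity
    shared = equal_share(1000, len(required))
    admitted = admit_in_order(1000, required)
    print(step_utility(shared, required), step_utility(admitted, required))
    print(log_utility(shared), log_utility(admitted))
```

Equal sharing scores best on the logarithmic objective and worst on the step objective; admission in order does the reverse. That is the conflict between the two kinds of user, restated in numbers.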

There is an interesting analogy here between data vs. voice users of communications and time sharing vs. personal computers. The capacity of a time shared computer is allocated equally (usually) among contending users; the capacity of a collection of personal computers is allocated in "sufficient size" chunks up to the limit of the number of computers and none thereafter. With personal computers, the advantage of adding capacity is that more people can work; with time sharing, everyone can still work, but their work gets done faster.

In the telephone industry, there is the notion of probability of blocking. Given a certain number of physical trunks between A and B, and certain statistics of the numbers and durations of calls placed, there will be a certain probability that all the trunks will be busy when a call arrives. Traffic engineering is the business of providing enough trunks so that the probability of blocking is acceptably low, subject to economic constraints. (Typically, users are charged more during periods of high load than at other times. This tends to even out the loading and raise the average utilization of the trunks.)
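One standard model for the probability of blocking is the Erlang B formula. Its use here is an assumption, since no particular model is named above; it supposes that calls arrive at random (Poisson arrivals) and that a blocked call is simply lost rather than queued. The sketch uses the usual numerically stable recurrence; `offered_erlangs` is the call arrival rate multiplied by the mean call duration.

```python
def erlang_b(trunks, offered_erlangs):
    """Blocking probability for `trunks` circuits under Erlang B assumptions."""
    blocking = 1.0  # with zero trunks every call is blocked
    for n in range(1, trunks + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking


def trunks_needed(offered_erlangs, target_blocking):
    """Smallest number of trunks whose blocking probability meets the target."""
    n = 0
    while erlang_b(n, offered_erlangs) > target_blocking:
        n += 1
    return n


if __name__ == "__main__":
    print(erlang_b(24, 16.0))        # blocking with 24 trunks, 16 erlangs offered
    print(trunks_needed(16.0, 0.01))  # trunks for at most 1% blocking
```

The second function is the traffic engineering question in miniature: given the offered load and an acceptable blocking probability, how many trunks must be provided.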

2. Class of service. We will need some bits to use for internet voice. Consider the type-case of two 10 Mbit Ethernets connected by a point-to-point 1.5 Mbit link. There is plenty of bandwidth around, but it is not infinite. My proposal is for a "real-time" bit in the class of service field, with a few bits for "how much". The how-much field might be the log of the required bandwidth or something. (We might want a special bit for 64 Kbit voice, see below "TASI").

A pair of routers connected by a 1.5 Mbit line would have a parameter indicating that up to 1 Mbit of line capacity may be used for voice (or other real-time), with the remainder reserved for data. When there is less than 1 Mbit of real-time traffic flowing, the idle capacity can be used for data datagrams (and the data queue empties faster), but when there is real-time traffic around, it gets reserved capacity. The routers keep an eye on packets coming in. Suppose the router sees a real-time, how-much=64 Kbit packet for a new source-destination pair. The router takes this as a hint that a new "stream" is being set up and makes a table entry "reserving" capacity for the connection. By using the how-much field together with the packet length, the router can predict when the next packet of the connection is expected. The table entry can be deleted (timed-out) if the next packet doesn’t show up. (Thus there is no "stream setup" protocol, it is all done with hints.) When it happens that the (n+1)-th apparent stream shows up, the router drops the packet and sends an error reply "no capacity now". What must happen in a system like this is that:

It may be necessary to have an explicit voice bit (with "64-Kbit" subscript) rather than just "real-time". A typical phone call uses each half-duplex path slightly under 50% of the time. Overseas cable is overcommitted for this reason. A "Time assigned speech interpolation" unit assigns you a trunk only when you are talking. Thus 24 actual 2-way trunks can carry 36 conversations or 150 actual trunks can carry 300 conversations. (The laws of large numbers are not fully in gear with only 24 trunks.) I envision a speech connection would send 50 160-data-byte packets per second while talking and would also send small packets at a lesser rate during silence in order to let the routers know (via the hint mechanism) that the ’connection’ was still there. The router could actually get away with allowing, say, 20 ’connections’ over the 1 Mbit of capacity rather than only 16. Only for brief periods would the offered load from the 20 conversations exceed 1 Mbit.
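The hint mechanism, together with the idea of admitting somewhat more connections than the reserved capacity strictly allows, might be sketched as below. This is an illustration under assumptions, not a description of any existing gateway code: the class and method names, the `overbook` factor, the `grace` multiplier on the predicted packet gap, and the `min_hold_s` floor (which keeps an entry alive across the slower packets sent during silence) are all hypothetical.

```python
class RealTimeAdmission:
    """Hint-based stream table for one outgoing line (illustrative sketch)."""

    def __init__(self, capacity_bps, overbook=1.0, grace=3.0, min_hold_s=1.0):
        self.capacity_bps = capacity_bps  # line capacity set aside for real-time
        self.overbook = overbook          # > 1.0 admits more than strict capacity
        self.grace = grace                # predicted gaps to wait before timing out
        self.min_hold_s = min_hold_s      # assumed floor on the hold time
        self.streams = {}                 # (src, dst) -> (rate_bps, expiry time)

    def reserved_bps(self):
        return sum(rate for rate, _ in self.streams.values())

    def _expire(self, now):
        stale = [key for key, (_, expiry) in self.streams.items() if expiry <= now]
        for key in stale:
            del self.streams[key]

    def on_packet(self, src, dst, rate_bps, length_bytes, now):
        """Handle one real-time packet; return 'forward' or 'no capacity now'."""
        self._expire(now)
        key = (src, dst)
        if key not in self.streams:
            limit = self.capacity_bps * self.overbook
            if self.reserved_bps() + rate_bps > limit:
                return "no capacity now"  # packet dropped, error reply sent
        # How-much plus packet length predicts when the next packet is due.
        predicted_gap_s = length_bytes * 8 / rate_bps
        hold_s = max(self.grace * predicted_gap_s, self.min_hold_s)
        self.streams[key] = (rate_bps, now + hold_s)
        return "forward"


if __name__ == "__main__":
    line = RealTimeAdmission(capacity_bps=128_000)
    print(line.on_packet("a", "b", 64_000, 160, now=0.0))
    print(line.on_packet("c", "d", 64_000, 160, now=0.0))
    print(line.on_packet("e", "f", 64_000, 160, now=0.0))
```

Every packet of an admitted stream refreshes its table entry, so no explicit setup or teardown is exchanged. An `overbook` value of 1.25 would correspond to allowing 20 connections where strict accounting allows 16.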