ArpaTCPDoc.tioga
Last Edited by: Nichols, September 1, 1983 5:07 pm
Last Edited by: Taft, January 4, 1984 10:59 am
Doug Terry, July 15, 1988 5:08:27 pm PDT
The Cedar TCP Implementation
CEDAR 7.0 — FOR INTERNAL XEROX USE ONLY
Doug Terry and others
© Copyright 1987 Xerox Corporation. All rights reserved.
Abstract: This is a collection of notes on the implementation of DARPA's Transmission Control Protocol (TCP) in Cedar. TCP is an end-to-end reliable byte-stream protocol allowing communication between hosts in interconnected computer networks, such as the DARPA Internet.
Created by: Doug Terry and others
Maintained by: Doug Terry <Terry.pa>
Keywords: protocol, transport, TCP/IP, Arpanet
XEROX  Xerox Corporation
   Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, California 94304

1. Introduction
This is a collection of notes on the implementation of DARPA's Transmission Control Protocol (TCP) in Cedar. TCP is an end-to-end reliable byte-stream protocol allowing communication between hosts in interconnected computer networks, such as the DARPA Internet. It is part of a layered hierarchy of protocols which support internetwork applications. As such, it makes use of an underlying, potentially unreliable datagram service (IP) and is used by application protocols (FTP, SMTP, Telnet, etc.).
Many people have contributed to this TCP implementation over the years. They include David Nichols, Ed Taft, Hal Murray, John Larson, and Doug Terry.
2. Usage
ArpaTCP.mesa provides the interface to this package. Use CreateTCPStream to get an ordinary IO.STREAM for the TCP connection. Reading waits for data and returns it; writing sends data. Flush forces any partially filled packets to be sent with the TCP Push (PSH) control bit set. When you are through sending data, call Close. After the Close you may still receive incoming data, but you may not send any more. AbortTCPStream sends a TCP reset and destroys the connection entirely.
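For comparison, BSD-style sockets offer the same half-close that Close provides here. The Python sketch below uses the standard socket module, not the Cedar interface. A socketpair stands in for the two ends of a TCP connection, which is an assumption made to keep the example local. After shutdown(SHUT_WR), the closing side can no longer send. The peer reads the remaining data and then sees end of stream, while the closing side can still receive.

```python
import socket

def read_until_eof(sock: socket.socket) -> bytes:
    """Read until the peer shuts down its sending half (recv returns b"")."""
    data = b""
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            return data
        data += chunk

def half_close_demo() -> tuple:
    # A connected pair of stream sockets stands in for a TCP connection.
    a, b = socket.socketpair()
    try:
        a.sendall(b"request")
        a.shutdown(socket.SHUT_WR)      # analogous to Close: stop sending only
        received = read_until_eof(b)    # peer gets the data, then end of stream

        b.sendall(b"reply")             # the closing side can still receive
        b.shutdown(socket.SHUT_WR)
        reply = read_until_eof(a)
        return received, reply
    finally:
        a.close()
        b.close()
```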
Urgent data is sent by calling SetUrgent, which sets the urgent pointer in outgoing packets to the current position in the stream. WaitForUrgentData waits asynchronously for urgent data from the other end; it returns only when urgent data is received or the connection closes. Warning: this code has never been tested.
CreateTCPStream will raise the ERROR ArpaTCP.Error if it cannot open the connection for some reason. The ERROR IO.Error[$Failed, self] will be raised during an IO operation on the stream if the stream state changes to some state in which that operation cannot ever succeed (e.g. the remote end closes the connection while a write is pending, or the remote TCP stops responding to connection-level probes). A client who wishes to find out the reason for the error may call ArpaTCP.ErrorFromStream[self]. End of stream (resulting from a normal remotely-initiated close) is signalled by ERROR IO.EndOfStream. It is not sufficient to use IO.EndOf to detect end of stream since the TCP package may not know about the end of stream until after the read call begins.
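The following minimal Python model (not the Cedar code) shows why IO.EndOf is not enough. EndOfStream, ToyTCPInput, and its methods are hypothetical stand-ins for IO.EndOfStream and the input side of the stream. A reader can see end_of() return False, then block in get_char, and only afterwards learn that the remote end has closed. For that reason, end of stream has to be reported by the read itself.

```python
import threading

class EndOfStream(Exception):
    """Hypothetical Python stand-in for Cedar's IO.EndOfStream."""

class ToyTCPInput:
    """Hypothetical model of a TCP stream's receive side (not the Cedar code)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._data = bytearray()
        self._fin = False          # set once the remote end's FIN is processed

    def deliver(self, data: bytes) -> None:
        """Called by the receiver side when in-order data arrives."""
        with self._cond:
            self._data += data
            self._cond.notify_all()

    def remote_close(self) -> None:
        """Called by the receiver side when the remote end closes."""
        with self._cond:
            self._fin = True
            self._cond.notify_all()

    def end_of(self) -> bool:
        # Reports only what is known now; a FIN may still be on its way.
        with self._cond:
            return self._fin and not self._data

    def get_char(self) -> int:
        with self._cond:
            # Block until there is data or the remote end has closed.
            self._cond.wait_for(lambda: self._data or self._fin)
            if self._data:
                ch = self._data[0]
                del self._data[0]
                return ch
            raise EndOfStream()

def read_all(stream: ToyTCPInput) -> bytes:
    """Read to end of stream by catching EndOfStream, not by polling end_of()."""
    out = bytearray()
    while True:
        try:
            out.append(stream.get_char())
        except EndOfStream:
            return bytes(out)
```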
When you open a connection, you may optionally specify a timeout value (in milliseconds) for data on the connection. If a read or write operation has to wait that long for data (or for buffer space at the remote end), then it will raise the SIGNAL ArpaTCP.Timeout. It can be resumed to try again. Note that this timeout is distinct from the connection-level timeout that occurs if the remote TCP stops responding altogether; the latter gives rise to IO.Error as described above.
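Cedar SIGNALs can be resumed, but Python exceptions cannot. This Python sketch therefore emulates resuming ArpaTCP.Timeout with a callback. Each time the timeout expires without progress, on_timeout is called: returning True means "try again", and returning False gives up with TimeoutError. The function and parameter names are hypothetical.

```python
import threading

def wait_with_resumable_timeout(cond: threading.Condition, ready, timeout_ms: int,
                                on_timeout) -> None:
    """Wait on cond until ready() is true. The caller must hold cond's lock.

    Every expiry of timeout_ms calls on_timeout(); True plays the role of
    RESUMEing the timeout signal (wait again), False abandons the operation.
    """
    while not cond.wait_for(ready, timeout=timeout_ms / 1000.0):
        if not on_timeout():
            raise TimeoutError("no progress within %d ms" % timeout_ms)
```

In this model, the handler decides after each expiry whether to keep waiting. This mirrors a Cedar catch phrase that either RESUMEs the signal or lets the operation be abandoned.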
To create a server, open the connection with active set to FALSE. This creates a single connection in the listen state. Call WaitForListenerOpen to wait for a remote host to open a connection to it. Once that connection is open, you must open another connection in listening mode to handle new connections. I would like to see the interface changed here.
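The one-connection-per-listener pattern can be sketched in Python. OneShotListener, wait_for_listener_open, and serve are hypothetical stand-ins for a passive CreateTCPStream and WaitForListenerOpen, and a queue of peer names plays the part of arriving connection requests. The key step is creating the next listener as soon as the previous one opens. In the real package, a request arriving before that point presumably finds no listener, which this queue-based model does not capture.

```python
import queue
import threading

class OneShotListener:
    """Hypothetical stand-in for a stream created with active = FALSE."""

    def __init__(self, pending: queue.Queue):
        self._pending = pending
        self.peer = None            # set when a remote host opens the connection

    def wait_for_listener_open(self) -> None:
        if self.peer is not None:
            raise RuntimeError("listener is already an open connection")
        self.peer = self._pending.get()   # blocks until a peer connects

def serve(pending: queue.Queue, handle, max_connections: int) -> None:
    listener = OneShotListener(pending)
    workers = []
    for _ in range(max_connections):
        listener.wait_for_listener_open()
        # The listener has become the connection; make a fresh listener at
        # once so that later connection attempts have somewhere to go.
        conn, listener = listener, OneShotListener(pending)
        worker = threading.Thread(target=handle, args=(conn,))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()
```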
3. Implementation
The main parts of the implementation are ArpaTCPMain (which exports ArpaTCP), ArpaTCPOps, ArpaTCPLogging, ArpaTCPTransmit, ArpaTCPStates, and ArpaTCPReceiving, which are each discussed below.
ArpaTCPMain exports the ArpaTCP interface and implements the stream operations. The only interface it uses in the rest of the TCP code is ArpaTCPOps. It knows a little about the structure of a handle, particularly the input and output buffers. This module is modeled fairly heavily on the corresponding module in the BSP code.
ArpaTCPOps is the main interface that provides TCPness. It has most of the type and variable declarations for the TCP implementation, and it is OPENed by most of the other modules. It provides packet streams (more or less) to ArpaTCPMain, and utility routines to the rest of the implementation.
The main data type provided is the TCPHandle. It is a monitored record which contains all the state about a particular TCP connection. ArpaTCPOpsImpl and ArpaTCPReceivingImpl are both monitors that lock the handle passed to their routines. ArpaTCPTransmitImpl is not a monitor, but its routines are only called by procedures that already have the handle lock.
ArpaTCPOps has two routines, StartupTCP and ShutdownTCP, which start and stop the TCP package. They are not exported, but they can be called from the debugger. When debugging, call ShutdownTCP before running a new version. StartupTCP forks two processes, the receiver process and the retransmit process. The receiver process sits in a loop getting datagrams from the IP package and processing them. The retransmit process wakes up periodically and examines all active connections, looking for packets that should be retransmitted and for connections that should be shut down due to timeouts.
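One wake-up of the retransmit process might look like the following Python sketch. The record types, field names, and both interval constants are hypothetical. The fixed retransmission interval matches the fixed-length timers noted under Known bugs. The receiver process would be a second thread that loops on a blocking receive and calls a routine like ProcessRcvdSegment for each datagram.

```python
import time
from dataclasses import dataclass, field

RETRANSMIT_INTERVAL = 1.0   # hypothetical fixed retransmission timer, seconds
CONNECTION_TIMEOUT = 30.0   # hypothetical "remote TCP is gone" limit, seconds

@dataclass
class Segment:
    seq: int
    data: bytes
    sent_at: float          # time of the most recent (re)transmission

@dataclass
class Conn:
    last_heard: float       # when the remote TCP was last heard from
    unacked: list = field(default_factory=list)
    closed: bool = False

def retransmit_pass(conns, now, send):
    """One wake-up: resend stale segments, drop silent connections."""
    for c in conns:
        if c.closed:
            continue
        if now - c.last_heard > CONNECTION_TIMEOUT:
            c.closed = True          # shut down: the peer stopped responding
            continue
        for seg in c.unacked:
            if now - seg.sent_at >= RETRANSMIT_INTERVAL:
                send(c, seg)
                seg.sent_at = now    # restart this segment's timer

def retransmit_process(conns, stop, send, period=0.5):
    """Background loop; stop is a threading.Event. A real implementation
    would take each connection's lock before examining it."""
    while not stop.wait(period):
        retransmit_pass(conns, time.monotonic(), send)
```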
ArpaTCPStates provides the interface to create and destroy handles. ArpaTCPStatesImpl is a monitor protecting the list of all handles. The locking rule is: if you need both locks, always acquire the handle lock before the ArpaTCPStates lock.
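This Python sketch models the lock ordering rule. Handle, states_lock, and the helper functions are hypothetical names, not the Cedar code. Every path that needs both locks takes the handle lock first, so no two threads can each hold one lock while waiting for the other. As a consequence, code that walks the handle list and must also lock each handle cannot do so while holding states_lock. One way around this, shown here as an assumption, is to snapshot the list first.

```python
import threading

states_lock = threading.Lock()   # models the ArpaTCPStates monitor lock
all_handles = []                 # the list that monitor protects

class Handle:
    def __init__(self, name: str):
        self.name = name
        self.lock = threading.Lock()   # models the per-handle monitor lock
        self.state = "open"

def create_handle(name: str) -> Handle:
    h = Handle(name)
    with states_lock:                  # only the states lock is needed
        all_handles.append(h)
    return h

def destroy_handle(h: Handle) -> None:
    with h.lock:                       # handle lock first...
        h.state = "closed"
        with states_lock:              # ...then the states lock
            all_handles.remove(h)

def visit_all(visit) -> None:
    # Locking a handle while holding states_lock would invert the order,
    # so copy the list and release states_lock before taking handle locks.
    with states_lock:
        handles = list(all_handles)
    for h in handles:
        with h.lock:
            visit(h)
```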
ArpaTCPReceiving contains the routines that handle data coming in from the net. It exports ProcessRcvdSegment, which is called by the receiver process, and it calls ArpaTCPTransmit to send acks and similar responses. It queues incoming data on the readyToReadQueue in the handle; that queue is guaranteed to be in order and to contain only non-empty segments. There is also a fromNetQueue, which holds packets received out of order.
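The following simplified Python model shows the two receive queues. Here ready_to_read plays the readyToReadQueue and from_net plays the fromNetQueue; these names and the dict representation are assumptions. Sequence numbers are plain integers, so wrap-around is ignored. Bytes that have already been received are trimmed, which keeps the ready queue in order and free of empty segments.

```python
class Reassembler:
    """Hypothetical model of in-order delivery with an out-of-order queue."""

    def __init__(self, rcv_nxt: int):
        self.rcv_nxt = rcv_nxt      # next sequence number expected
        self.ready_to_read = []     # in order, never empty segments
        self.from_net = {}          # seq -> data for segments that came early

    def _trim(self, seq: int, data: bytes):
        # Drop any leading bytes that were already accepted.
        if seq < self.rcv_nxt:
            data = data[self.rcv_nxt - seq:]
            seq = self.rcv_nxt
        return seq, data

    def _accept(self, data: bytes) -> None:
        self.ready_to_read.append(data)
        self.rcv_nxt += len(data)

    def receive(self, seq: int, data: bytes) -> None:
        seq, data = self._trim(seq, data)
        if not data:
            return                  # pure duplicate or empty segment
        if seq > self.rcv_nxt:
            # Out of order: hold it, keeping the longer copy on a repeat.
            if len(data) > len(self.from_net.get(seq, b"")):
                self.from_net[seq] = data
            return
        self._accept(data)
        # The gap may now be filled: move held segments that have become usable.
        while self.from_net:
            first = min(self.from_net)
            if first > self.rcv_nxt:
                break               # still a hole before the held data
            _, held = self._trim(first, self.from_net.pop(first))
            if held:
                self._accept(held)
```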
ArpaTCPTransmit provides routines to send data to the net. There are simple routines like SendFIN, which sends a single packet, and more complex ones like TryToSend, which may send data and/or acks. To send data, queue the outgoing datagrams on the toNetQueue and call TryToSend.
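Here is a minimal Python sketch of queueing data for the net and letting a TryToSend-style routine release it as the send window allows. The Sender class, try_to_send, and the simplified window update are assumptions for illustration. The real routine also sends acks, whereas this model neither splits segments nor sends zero-window probes, and it ignores sequence number wrap-around.

```python
from collections import deque

class Sender:
    """Hypothetical model of a to-net queue plus a TryToSend-like routine."""

    def __init__(self, iss: int, transmit):
        self.snd_una = iss      # oldest unacknowledged sequence number
        self.snd_nxt = iss      # next sequence number to send
        self.snd_wnd = 0        # window most recently advertised by the peer
        self.to_net = deque()   # queued data not yet sent
        self.transmit = transmit

    def queue(self, data: bytes) -> None:
        self.to_net.append(data)
        self.try_to_send()

    def ack(self, ack_no: int, window: int) -> None:
        # Simplified: accept any ack within [snd_una, snd_nxt] and its window.
        if self.snd_una <= ack_no <= self.snd_nxt:
            self.snd_una = ack_no
            self.snd_wnd = window
        self.try_to_send()

    def try_to_send(self) -> None:
        # Send queued segments while each fits entirely in the usable window.
        while self.to_net:
            usable = self.snd_una + self.snd_wnd - self.snd_nxt
            if len(self.to_net[0]) > usable:
                break
            data = self.to_net.popleft()
            self.transmit(self.snd_nxt, data)
            self.snd_nxt += len(data)
```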
ArpaTCPLogging provides routines to print debugging information on an output stream. Setting either of the streams logFile or pktFile to a non-NIL value causes debugging information to appear on it. These streams are holdovers from the original implementation; logFile differs from pktFile in that timestamps are printed as well.
4. Known bugs
We neither send nor process the TCP options. A valid TCP must implement all options.
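RFC 793 defines the option layout. End of Option List (kind 0) and No-Operation (kind 1) are single bytes. Every other option carries a length byte that counts the kind and length bytes, so a receiver can skip options it does not understand. The Python sketch below parses that layout. The function name and the choice to return (kind, value) pairs are illustrative, and only the Maximum Segment Size option (kind 2) is decoded.

```python
def parse_tcp_options(opts: bytes) -> list:
    """Parse a TCP header's options field using the RFC 793 layout."""
    result = []
    i = 0
    while i < len(opts):
        kind = opts[i]
        if kind == 0:                  # End of Option List
            break
        if kind == 1:                  # No-Operation (padding)
            i += 1
            continue
        if i + 1 >= len(opts):
            raise ValueError("truncated option")
        length = opts[i + 1]           # includes the kind and length bytes
        if length < 2 or i + length > len(opts):
            raise ValueError("bad option length")
        body = opts[i + 2:i + length]
        if kind == 2:                  # Maximum Segment Size, 16 bits
            if length != 4:
                raise ValueError("bad MSS length")
            result.append((2, int.from_bytes(body, "big")))
        else:                          # unknown option: keep raw, skip safely
            result.append((kind, bytes(body)))
        i += length
    return result
```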
Sequence number wrap-around is not properly handled (this is very delicate).
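RFC 793 specifies that all sequence number arithmetic is modulo 2**32, so ordinary integer comparisons give wrong answers once the numbers wrap. A common remedy is to compare through the modular difference, sketched here in Python with illustrative helper names. The result is meaningful only when the two numbers are less than 2**31 apart.

```python
MOD = 1 << 32          # TCP sequence numbers are 32-bit
HALF = 1 << 31

def seq_add(a: int, n: int) -> int:
    """Advance sequence number a by n, wrapping at 2**32."""
    return (a + n) % MOD

def seq_lt(a: int, b: int) -> bool:
    """True if a comes before b (ambiguous when exactly 2**31 apart)."""
    return 0 < (b - a) % MOD < HALF

def seq_leq(a: int, b: int) -> bool:
    return a == b or seq_lt(a, b)

def in_window(seq: int, left: int, right: int) -> bool:
    """True if left <= seq < right in modular order."""
    return (seq - left) % MOD < (right - left) % MOD
```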
IO.Close is used to close down the outbound half of a connection. The caller is then allowed to continue reading from the stream, contrary to what IO.mesa says is legal. The implementation should perhaps provide some other means to close half of a connection and should prevent operations on a stream once it has been closed.
Urgent send and receive have never been tested.
TCP does not handle large IP datagrams, i.e., those that do not fit into a single buffer.
TCP should do a better job of ensuring that too many buffers do not get allocated and that buffers get freed properly. Currently, buffers are allocated as needed by a sender; received buffers are simply dropped on the floor when no longer needed.
There is no way to abort a TCP connection that has gone bad except to wait for it to time out (no one is looking at process.abort, for example), and furthermore the timeout does not operate for some kinds of calls (such as Close and AbortTCPStream). Moreover, if a connection does time out and I either (a) abort the process from the signal viewer, or (b) try to close the connection from the signal viewer, the connection is never properly closed (and the other end keeps sending keep-alives). In case (b), the Close or AbortTCPStream hangs indefinitely. I presume the same would happen if I actually caught the signal and tried to do the close there instead of from the interpreter. (Weiser, 16 Oct 87)
The stream procedures in ArpaTCPMain are not monitored, so concurrent calls could get the buffer state thoroughly fouled up. Either move the TCPHandle object monitor out to ArpaTCPMain (from ArpaTCPOpsImpl) or add a new object monitor protecting just the buffers associated with the TCPHandle.
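The second suggestion can be sketched in Python as follows. OutputSide is a hypothetical object with a single lock covering its buffer, so concurrent put_char and flush calls cannot lose bytes or interleave partial buffers. The helper _flush_locked is called only with the lock held. This is in the same spirit as the rule that ArpaTCPTransmit's routines are called only by procedures already holding the handle lock.

```python
import threading

class OutputSide:
    """Hypothetical output half of a stream guarded by one buffer lock."""

    def __init__(self, send, capacity: int = 512):
        self._lock = threading.Lock()
        self._buf = bytearray()
        self._capacity = capacity
        self._send = send               # hands a full or flushed buffer onward

    def put_char(self, ch: int) -> None:
        with self._lock:
            self._buf.append(ch)
            if len(self._buf) >= self._capacity:
                self._flush_locked()

    def flush(self) -> None:
        with self._lock:
            self._flush_locked()

    def _flush_locked(self) -> None:
        # Caller must already hold self._lock.
        if self._buf:
            self._send(bytes(self._buf))
            self._buf = bytearray()
```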
Each packet causes a new TCPRcvBuffer or TCPSendBuffer to be allocated. This is surely a serious inefficiency. Change to maintain a pool of free buffers that is periodically swept up by some background process. But note that determining safely that a buffer is no longer being referenced requires more careful treatment of buffer handles in the main stream procedures.
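A bounded pool of free buffers could look like the following Python sketch. BufferPool and its parameters are hypothetical. It relies on explicitly returning buffers through put() rather than on the background sweep proposed above, which is only safe once every buffer handle has a clear owner. It also caps the number of buffers in use at once, which bears on the allocation concern in the earlier note on buffers.

```python
import threading

class BufferPool:
    """Hypothetical bounded pool of fixed-size packet buffers."""

    def __init__(self, size: int, limit: int):
        self.size = size            # bytes per buffer
        self.limit = limit          # most buffers that may be in use at once
        self._free = []             # returned buffers ready for reuse
        self._in_use = 0
        self._cond = threading.Condition()

    def get(self, timeout=None) -> bytearray:
        with self._cond:
            # Every allocated buffer is either in use or on the free list,
            # so _in_use < limit also covers the "reuse a free one" case.
            if not self._cond.wait_for(lambda: self._in_use < self.limit, timeout):
                raise TimeoutError("buffer pool exhausted")
            self._in_use += 1
            return self._free.pop() if self._free else bytearray(self.size)

    def put(self, buf: bytearray) -> None:
        with self._cond:
            self._in_use -= 1
            self._free.append(buf)
            self._cond.notify()     # wake one waiting get()
```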
Fixed-length timers are used for retransmission, zero-window probes, etc. Ideally, measurements of round-trip delays should be obtained and used to derive timer values.
Also, the timeout value specified in TCPHandle is used for several different purposes including when to give up trying to send a packet and when to give up waiting for a packet. There should be separate timers for these different cases.
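The usual way to derive timers from measurements is the smoothed round-trip time and variance estimator of Jacobson and Karels, later standardized in RFC 6298. The Python sketch below follows that RFC's update rules and constants, with times in seconds. The class name and the default clock granularity are assumptions. Following Karn's algorithm, samples should come only from segments that were not retransmitted.

```python
class RtoEstimator:
    """Retransmission timeout from RTT samples, per RFC 6298 (seconds)."""

    ALPHA = 1 / 8
    BETA = 1 / 4
    K = 4

    def __init__(self, granularity: float = 0.1,
                 min_rto: float = 1.0, max_rto: float = 60.0):
        self.srtt = None
        self.rttvar = None
        self.g = granularity        # hypothetical clock granularity
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.rto = 1.0              # initial RTO before any measurement

    def sample(self, r: float) -> float:
        if self.srtt is None:       # first measurement
            self.srtt = r
            self.rttvar = r / 2
        else:                       # RTTVAR must be updated before SRTT
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        self.rto = min(max(rto, self.min_rto), self.max_rto)
        return self.rto
```

Separate estimators, or separate multiples of one estimate, could then serve the distinct purposes listed above, such as retransmission versus giving up on a silent peer.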