RPCPktIO.mesa - Reliable transmission and reception of packets
Copyright © 1984, 1986 by Xerox Corporation. All rights reserved.
Bob Hagmann, February 11, 1985 4:37:25 pm PST
Swinehart, November 24, 1986 7:53:27 am PST
Polle Zellweger (PTZ) August 2, 1985 1:39:38 pm PDT
Russ Atkinson (RRA) October 29, 1985 8:57:16 am PST
Hal Murray, June 3, 1986 4:06:28 pm PDT
DIRECTORY
BasicTime USING [GetClockPulses, MicrosecondsToPulses, Pulses],
Booting USING [RegisterProcs, RollbackProc],
PrincOpsUtils USING [Free, LongCopy, PsbHandleToIndex, ReadPSB],
Process USING [Detach, MsecToTicks, priorityClient3, priorityForeground, SetPriority],
PrincOps USING [PsbIndex, PsbNull],
Pup USING [Address, Host, Net, Socket],
PupBuffer USING [Buffer],
PupHop USING [InitialTimeoutPulses],
PupSocket USING [AllocBuffer, CreateServer, FreeBuffer, Get, GetMyAddress, Send, Socket, waitForever],
PupSocketBackdoor USING [SetDirectReceive],
PupWKS USING [rpc],
RPC USING [CallFailed, Conversation, unencrypted],
RPCInternal USING [ConversationObject, DecryptPkt, DoSignal, EncryptPkt, PktExchangeState, pktLengthOverhead, ReplyToRFA, RollbackRPCBinding, RollbackRPCPktStreams, RollbackRPCSecurity, ServerMain, StartRPCBinding, StartRPCPktStreams, StartRPCSecurity],
RPCLupine USING [DataLength, Dispatcher, Header, RPCPkt],
RPCPkt USING [CallCount, Header, Machine, PktID];
RPCPktIO: MONITOR
IMPORTS BasicTime, Booting, PrincOpsUtils, Process, PupHop, PupSocket, PupSocketBackdoor, RPC, RPCInternal
EXPORTS RPCInternal, RPCLupine
SHARES RPCLupine = {
Header: PUBLIC TYPE = RPCPkt.Header;
ConversationObject: PUBLIC TYPE = RPCInternal.ConversationObject;
Conversation: TYPE = REF ConversationObject;
ConcreteHeader: PROC [abstract: LONG POINTER TO RPCLupine.Header] RETURNS [LONG POINTER TO Header] = INLINE {
RETURN [ abstract ];
};
callSequence: RPCPkt.CallCount ← 0; -- monotonic from this host --
******** sending and receiving packets ********
myHost: RPCPkt.Machine;
sent: CARDINAL ← 0;
recvd: CARDINAL ← 0;
retransmitted: CARDINAL ← 0;
"PktExchange" implements a reliable packet exchange. There are five cases:
sending: transmit data, wait for ack
receiving: transmit ack, wait for data
call: transmit data, wait for data
endCall: transmit data, wait for ack or data (data => start of new call)
authReq: transmit RFA, wait for data (like "call", but no retransmissions)
Data and RFA packets are retransmitted until an ack is received. Ack packets are retransmitted only in response to "pleaseAck" requests. No acknowledgement after maxTransmissions transmissions (20 by default) is fatal (CallFailed[timeout]). When the transmitted packet has been acknowledged (if needed), a ping is sent periodically (ack packet saying pleaseAck). If necessary the ping is retransmitted until it has been acked. Failure to receive an ack for the ping is fatal (CallFailed[timeout]). Provided pings continue to be acknowledged, there is no limit on how long we will wait for the next data packet. If the client gets impatient, he can abort this process. (This corresponds to the local-machine semantics of a procedure call.)
minRetransmitMsecs: CARDINAL = 100; -- retransmit delay for local net --
msecsPerHop: CARDINAL = 500; -- approximate typical gateway hop delay? --
minPingMsecs: CARDINAL = 5000; -- initial interval between pings --
maxPingSecs: CARDINAL = 300; -- long-term ping interval --
maxTransmissions: CARDINAL ← 20; -- give up after too many transmissions --
signalTimeout: BOOL ← TRUE; -- debugging switch --
The retransmission delay is incremented by minRetransmitMsecs each time we time out, but is reset when we receive the desired packet. If n=maxTransmissions and i=minRetransmitMsecs and j=hops*msecsPerHop, we give up after i*(n*n+n)/2+n*j msecs. For the local network, that comes to 21 seconds. For a two-hop route, that comes to 41 seconds. The ping delay is doubled at each ping, up to maxPingSecs, which is 5 minutes. The maxPingSecs value is reached after about 5 minutes.
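-- Worked example of the figures above (illustrative arithmetic only, assuming the default constants declared in this module):
--   local net: n=20, i=100, j=0: 100*(20*20+20)/2 = 21000 msecs (21 seconds)
--   two hops: j=2*500=1000: 21000 + 20*1000 = 41000 msecs (41 seconds)
--   pings: intervals of 5, 10, 20, 40, 80 and 160 secs total 315 secs, after which the interval is held at maxPingSecs (300)
-- A conversation that supplies its own nonzero maxTransmissions changes n accordingly.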
minRetransmitPulses: BasicTime.Pulses =
BasicTime.MicrosecondsToPulses[LONG[1000] * minRetransmitMsecs];
pulsesPerHop: BasicTime.Pulses =
BasicTime.MicrosecondsToPulses[LONG[1000] * msecsPerHop];
minPingPulses: BasicTime.Pulses =
BasicTime.MicrosecondsToPulses[LONG[1000] * minPingMsecs];
maxPingPulses: BasicTime.Pulses =
BasicTime.MicrosecondsToPulses[LONG[1000]*LONG[1000] * maxPingSecs];
transmitLocalPkts: BOOL ← TRUE;
Constants for WaitUntilSent (inside PktExchange)
shortWait: BasicTime.Pulses = -- 10 msecs --
BasicTime.MicrosecondsToPulses[LONG[1000] * 10];
longerWait: BasicTime.Pulses = -- 5 secs --
BasicTime.MicrosecondsToPulses[LONG[1000]*LONG[1000] * 5];
Waiting and wanting
idlerPkt: PupBuffer.Buffer ← NIL;
idlerCond: CONDITION ← [timeout:0];
waiterCond: CONDITION ← [timeout:Process.MsecToTicks[minRetransmitMsecs]];
waiterPkts: REF WaiterArray = NEW[WaiterArray ← ALL[NIL]];
WaiterArray: TYPE = ARRAY PrincOps.PsbIndex OF PupBuffer.Buffer;
wanting: REF WantingArray = NEW[WantingArray ← ALL[FALSE]];
WantingArray: TYPE = PACKED ARRAY PrincOps.PsbIndex OF BOOL;
Serving and idling
Number of server processes will never exceed "serversMax". If number of idle servers exceeds "idlersMax", one will abort. If number of idle servers drops below "idlersMin", one is forked.
RecvdPktTooLong: ERROR = CODE;
KillServer: ERROR = CODE;
servers: NAT ← 0;
serversMax: NAT ← 20;
idlers: NAT ← 0;
idlersMax: NAT = 6;
idlersMin: NAT = 2;
PktExchange: PUBLIC PROC [inPkt: RPCLupine.RPCPkt, length: RPCLupine.DataLength, maxlength: RPCLupine.DataLength, state: RPCInternal.PktExchangeState, signalHandler: RPCLupine.Dispatcher ← NIL] RETURNS [newPkt: BOOL, newLength: RPCLupine.DataLength] = {
On exit if newPkt=TRUE, the packet has been decrypted if needed.
Normally, transmits inPkt and copies result into inPkt. If a signal occurs, calls DoSignal which handles the signal protocol up to the last resumption packet. DoSignal returns the last resume packet for us to transmit, which we do by assigning it to outPkt; that packet was allocated in DoSignal's local frame, which we must later deallocate.
outPkt: RPCLupine.RPCPkt ← inPkt; -- altered after a signal --
outPktFrame: POINTER ← NIL; -- DoSignal's local frame --
newLen: NAT ← 0;
We use this internally to represent newLength, since it is possible for a newLength calculation to exceed the bounds of RPCLupine.DataLength (for received packets that are encrypted, but not ours.)
GT: PROC[conversation: RPC.Conversation] RETURNS [
transmissions: CARDINAL←maxTransmissions, sgnlTimeout: BOOL←signalTimeout] = INLINE {
conv: Conversation ← LOOPHOLE[conversation];
Can't afford the needed procedure call to convert to concrete.
IF conv=NIL THEN RETURN;
IF conv.maxTransmissions#0 THEN transmissions ← conv.maxTransmissions;
sgnlTimeout ← SELECT conv.timeoutEnable FROM
always => TRUE, never => FALSE, dontCare => signalTimeout, ENDCASE=>ERROR;
};
maxTrans: CARDINAL;
sgnlTimeout: BOOL;
[maxTrans, sgnlTimeout] ← GT[outPkt.convHandle];
DO -- loop for signal handlers --
sentTime: BasicTime.Pulses; -- initialized after sending any packet --
NewCallNumber: ENTRY PROC RETURNS [RPCPkt.CallCount] = INLINE {
RETURN [ callSequence ← callSequence+1 ];
};
reply: PupBuffer.Buffer;
recvdHeader: LONG POINTER TO Header;
myPSB: PrincOps.PsbIndex = PrincOpsUtils.PsbHandleToIndex[PrincOpsUtils.ReadPSB[]];
acked: BOOL;
thisPktID: RPCPkt.PktID;
header: LONG POINTER TO Header = @outPkt.header;
pingPulses: BasicTime.Pulses ← minPingPulses;
header.srceHost ← myHost;
header.srceSoc ← PupWKS.rpc;
header.srcePSB ← myPSB;
IF header.pktID.pktSeq = 0 THEN { -- first packet of a call; yucky interface!
header.type ← SELECT state FROM
sending => [0, rpc, notEnd, pleaseAck, call],
call => [0, rpc, end, dontAck, call],
authReq => [0, rpc, end, dontAck, rfa],
ENDCASE => ERROR; --receiving, endCall
header.pktID.callSeq ← NewCallNumber[];
header.pktID.pktSeq ← 1;
acked ← FALSE; }
ELSE {
header.type ← SELECT state FROM
sending => [0, rpc, notEnd, pleaseAck, data],
receiving => [0, rpc, end, dontAck, ack],
call => [0, rpc, end, dontAck, data],
endCall => [0, rpc, end, dontAck, data],
ENDCASE => ERROR; --authReq
IF state # receiving THEN { -- header.type.class = data
header.pktID.pktSeq ← header.pktID.pktSeq+1; acked ← FALSE; }
ELSE acked ← TRUE; };
thisPktID ← header.pktID;
SetWanting[myPSB];
DO -- loop for pings --
ENABLE UNWIND => {
ClearWanting[myPSB];
IF outPktFrame # NIL THEN PrincOpsUtils.Free[outPktFrame];
};
{
transmissions: CARDINAL ← 0;
retransmitPulses: BasicTime.Pulses;
retransmitPulses ← PupHop.InitialTimeoutPulses[header.destHost.net, minRetransmitPulses];
IF retransmitPulses = 0 THEN retransmitPulses ← minRetransmitPulses; -- unreachable
IF outPkt.convHandle # RPC.unencrypted THEN
header.length ← RPCInternal.EncryptPkt[outPkt, length]
ELSE header.length ← length + RPCInternal.pktLengthOverhead;
header.oddByte ← no;
DO -- loop for retransmissions --
{
GeneralSend[outPkt];
sentTime ← BasicTime.GetClockPulses[];
sent ← sent+1;
wait for response: an ack or the reply or a timeout
DO -- loop for each incoming packet (including garbage) --
reply ← MyReceive[
myPSB, sentTime, (IF acked THEN pingPulses ELSE retransmitPulses)];
IF reply = NIL THEN
IF acked THEN GOTO ping
ELSE { header.type.ack ← pleaseAck; GOTO retransmit };
recvdHeader ← LOOPHOLE[@reply.byteLength];
IF recvdHeader.type.class = rfa THEN {
[] ← RPCInternal.ReplyToRFA[
reply, header--encrypted--, thisPktID--clear--, inPkt.convHandle];
can't set "acked", because we must retransmit our data packet until we obtain the correct destPSB from some ack pkt
LOOP; };
IF outPkt.convHandle # RPC.unencrypted
AND recvdHeader.conv = header.conv
AND recvdHeader.srceHost = header.destHost THEN {
ok: BOOL;
[ok, newLen] ← RPCInternal.DecryptPkt[recvdHeader, outPkt.convHandle];
IF ~ok AND recvdHeader.type.class#ack THEN
ERROR RPC.CallFailed[runtimeProtocol ! UNWIND => GiveBackBuffer[reply] ]; }
ELSE newLen ← recvdHeader.length - RPCInternal.pktLengthOverhead;
IF recvdHeader.conv = header.conv
AND recvdHeader.srceHost = header.destHost
AND recvdHeader.pktID.activity = thisPktID.activity THEN -- pkt is related to our call
SELECT TRUE FROM
recvdHeader.pktID.callSeq = thisPktID.callSeq => -- pkt is in our call
SELECT TRUE FROM
a) pkt has next sequence number to ours
recvdHeader.pktID.pktSeq = 1+thisPktID.pktSeq => {
IF state = sending OR state = endCall THEN
he's not allowed to generate that pktSeq!
ERROR RPC.CallFailed[runtimeProtocol ! UNWIND => GiveBackBuffer[reply] ];
SELECT recvdHeader.type.class FROM
data => GOTO done;
ack => {
This can happen if state=call and callee sent next data pkt, but it was lost and now he is responding to our retransmission or ping. Soon, he will retransmit his data pkt.
acked ← TRUE;
GiveBackBuffer[reply]; };
ENDCASE => --call,rfa--
ERROR RPC.CallFailed[runtimeProtocol
! UNWIND => GiveBackBuffer[reply] ]; };
b) pkt has same sequence number as ours
recvdHeader.pktID.pktSeq = thisPktID.pktSeq => {
SELECT recvdHeader.type.class FROM
ack => {
acknowledgement of our packet
IF header.type.class = call THEN
header.destPSB ← recvdHeader.srcePSB;
acked ← TRUE;
IF state = sending OR state = endCall
THEN {GiveBackBuffer[reply]; reply←NIL; GOTO done}
ELSE -- state = call, authReq, or receiving --
Even if "pleaseAck", we don't need to ack it,
because other end should send data pkt soon --
GiveBackBuffer[reply];
};
data, call => -- retransmission of his packet --
IF state = receiving
THEN IF recvdHeader.type.ack = pleaseAck
THEN {
GiveBackBuffer[reply]; reply ← NIL;
GOTO retransmit -- we're already sending an ack --
}
ELSE GiveBackBuffer[reply]
ELSE ERROR RPC.CallFailed[runtimeProtocol
! UNWIND => GiveBackBuffer[reply] ];
ENDCASE => --rfa --
ERROR RPC.CallFailed[runtimeProtocol
! UNWIND => GiveBackBuffer[reply] ];
};
c) pkt has earlier sequence number than ours
recvdHeader.pktID.pktSeq < thisPktID.pktSeq =>
GiveBackBuffer[reply]; -- no need to ack it --
d) pkt has some future sequence number
ENDCASE =>
ERROR RPC.CallFailed[runtimeProtocol ! UNWIND => GiveBackBuffer[reply] ];
recvdHeader.pktID.callSeq > thisPktID.callSeq AND state=endCall => {
IF recvdHeader.type.class # call
THEN ERROR RPC.CallFailed[runtimeProtocol ! UNWIND => GiveBackBuffer[reply] ];
acks our last packet, so we can handle the call
GOTO done
}
ENDCASE => {
wrong call: let someone else do it
recvdHeader.destPSB ← PrincOps.PsbNull;
EnqueueAgain[reply];
}
ELSE {
wrong conversation or activity: let someone else do it
recvdHeader.destPSB ← PrincOps.PsbNull;
EnqueueAgain[reply];
};
Incoming RFA packets don't reach here.
ENDLOOP--for each incoming packet--;
EXITS
retransmit => {
transmissions ← transmissions+1;
IF (transmissions = maxTrans AND sgnlTimeout) OR state = authReq
Don't retransmit RFA: caller will retransmit call pkt. Otherwise, if a spurious worker process occurred because of call pkt retransmission before response to RFA, then the spurious worker process sits needlessly retransmitting RFA's until it times out.
THEN { SIGNAL RPC.CallFailed[timeout]; transmissions ← 0 };
retransmitted ← retransmitted+1;
retransmitPulses ← retransmitPulses + minRetransmitPulses;
};
};
ENDLOOP-- for each transmission --;
EXITS
ping => {
header.type ← [0, rpc, end, pleaseAck, ack];
length ← 0; header.pktID ← thisPktID;
acked ← FALSE;
pingPulses ← MIN[maxPingPulses, pingPulses * 2];
};
};
only exit from loop is "GOTO done" or UNWIND
REPEAT done => {
This isn't covered by any UNWIND
ClearWanting[myPSB];
IF outPktFrame # NIL THEN PrincOpsUtils.Free[outPktFrame];
IF reply = NIL
THEN { --restore clear pktID-- header.pktID ← thisPktID; RETURN [FALSE, newLen] }
ELSE {
IF recvdHeader.outcome = signal AND state = call
AND signalHandler # NIL
THEN [outPkt, length, outPktFrame] ←
RPCInternal.DoSignal[reply, newLen, signalHandler, inPkt.convHandle]
ELSE {
IF newLen > maxlength THEN
ERROR RPC.CallFailed[runtimeProtocol ! UNWIND => GiveBackBuffer[reply]];
PrincOpsUtils.LongCopy[
from: recvdHeader, to: @inPkt.header, nwords: recvdHeader.length];
GiveBackBuffer[reply];
RETURN [TRUE, newLen]
};
}
};
ENDLOOP-- for each ping--;
We get here only after coming back from a call of DoSignal
ENDLOOP-- for signal handlers --;
we can't get here!
};
GeneralSend: PROC [pkt: RPCLupine.RPCPkt] = {
b: PupBuffer.Buffer = PupSocket.AllocBuffer[socket];
PrincOpsUtils.LongCopy[from: @(pkt.header), to: @(b.byteLength),
nwords: ConcreteHeader[@pkt.header].length];
PupSocket.Send[socket, b, b.dest];
};
SetWanting: ENTRY PROC [myPSB: PrincOps.PsbIndex] = INLINE {
wanting[myPSB] ← TRUE;
};
ClearWanting: PROC [myPSB: PrincOps.PsbIndex] = INLINE {
spare: PupBuffer.Buffer;
InnerClear: ENTRY PROC = INLINE {
wanting[myPSB] ← FALSE;
IF (spare ← waiterPkts[myPSB]) # NIL THEN waiterPkts[myPSB] ← NIL;
};
InnerClear[];
IF spare # NIL THEN GiveBackBuffer[spare] --ignore it, outside monitor--;
};
MyReceive: ENTRY PROC [myPSB: PrincOps.PsbIndex, sentTime, waitTime: BasicTime.Pulses] RETURNS [recvd: PupBuffer.Buffer] = INLINE {
ENABLE UNWIND => NULL;
Returns NIL if no packet arrives within waitTime pulses after sentTime
DO
IF (recvd ← waiterPkts[myPSB]) = NIL
THEN IF BasicTime.GetClockPulses[] - sentTime < waitTime
THEN WAIT waiterCond
ELSE EXIT
ELSE { waiterPkts[myPSB] ← NIL; EXIT };
ENDLOOP;
};
IdleReceive: PUBLIC PROC [pkt: RPCLupine.RPCPkt, maxlength: CARDINAL] = {
b: PupBuffer.Buffer;
InnerIdler: ENTRY PROC = {
IF idlers >= idlersMax THEN { servers ← servers-1; RETURN WITH ERROR KillServer[] };
idlers ← idlers + 1;
DO WAIT idlerCond; IF idlerPkt # NIL THEN EXIT; ENDLOOP;
IF idlers < idlersMin AND servers < serversMax THEN {
servers←servers+1;
Process.Detach[FORK Server[]]; };
b ← idlerPkt;
{
recvdHeader: LONG POINTER TO Header = LOOPHOLE[@b.byteLength];
idlerPkt ← NARROW[idlerPkt.ovh.next];
b.ovh.next ← NIL;
IF recvdHeader.length > maxlength THEN ERROR RecvdPktTooLong[]; --NULL??
PrincOpsUtils.LongCopy[
from: @b.byteLength, to: @pkt.header, nwords: recvdHeader.length];
};
};
InnerIdler[];
GiveBackBuffer[b];--outside monitor--
};
QueuesScrambled: ERROR = CODE;
EnqueueRecvd: ENTRY SAFE PROC [socket: PupSocket.Socket, b: PupBuffer.Buffer, foo: REF ANY] RETURNS [PupBuffer.Buffer] = TRUSTED { -- SAFE/TRUSTED For PupSocketBackdoor
Dispatch packet to appropriate RPC process, if any. Packet is known to be a Pup, and is addressed to our socket.
header: LONG POINTER TO Header = LOOPHOLE[@b.byteLength];
IF header.type.subType = rpc THEN {
destPSB: PrincOps.PsbIndex = header.destPSB;
recvd ← recvd+1;
IF ~(destPSB IN (PrincOps.PsbNull..LAST[PrincOps.PsbIndex]])
OR ~wanting[destPSB] THEN {
give to idle process to deal with
Pkts are dealt with LIFO by idlers, but it doesn't matter (much)
IF idlers = 0 THEN RETURN[b]; -- too busy, drop the packet
b.ovh.next ← idlerPkt;
idlerPkt ← b;
idlers ← idlers-1;
NOTIFY idlerCond; }
ELSE {
someone wants this packet: give it to them
IF waiterPkts[destPSB] # NIL THEN RETURN [b];
waiterPkts[destPSB] ← b;
BROADCAST waiterCond; };
RETURN[NIL] }
ELSE RETURN[b];
};
EnqueueAgain: PUBLIC PROC [b: PupBuffer.Buffer] = {
b ← EnqueueRecvd[socket, b, NIL];
IF b # NIL THEN GiveBackBuffer[b];
};
socket: PupSocket.Socket;
Listener: PROC = {
Process.SetPriority[Process.priorityClient3]; -- Same as Drivers
Dan's voice code got horribly confused with lower priority (even foreground) before direct recv. We never understood why. /HGM, May 86
socket ← PupSocket.CreateServer[
local: PupWKS.rpc,
sendBuffers: 10,
recvBuffers: 10,
getTimeout: PupSocket.waitForever];
PupSocketBackdoor.SetDirectReceive[socket, EnqueueRecvd, NIL];
DO
b: PupBuffer.Buffer ← PupSocket.Get[socket];
b ← EnqueueRecvd[socket, b, NIL];
IF b # NIL THEN GiveBackBuffer[b];
ENDLOOP;
};
AllocBuffer: PUBLIC PROC RETURNS [b: PupBuffer.Buffer] = {
RETURN[PupSocket.AllocBuffer[socket]];
};
SendBuffer: PUBLIC PROC [b: PupBuffer.Buffer] = {
PupSocket.Send[socket, b, b.dest];
};
GiveBackBuffer: PUBLIC PROC [b: PupBuffer.Buffer] = {
PupSocket.FreeBuffer[b];
};
listenerProcess: PROCESS;
Server: PROC = {
Process.SetPriority[Process.priorityForeground];
RPCInternal.ServerMain[ ! KillServer => CONTINUE ];
};
******** Initialization ********
Initialize: ENTRY PROC = {
myAddr: Pup.Address ← PupSocket.GetMyAddress[];
myHost ← [myAddr.net, myAddr.host];
RPCInternal.StartRPCBinding[myHost]; -- exports "RPCInternal.exportTable"
RPCInternal.StartRPCSecurity[myHost]; -- exports "RPCInternal.firstConversation"
RPCInternal.StartRPCPktStreams[myHost]; -- initialize connection states
Booting.RegisterProcs[r: Rollback];
servers ← servers+1;
Process.Detach[FORK Server[]];
listenerProcess ← FORK Listener[];
};
Rollback: Booting.RollbackProc = TRUSTED {
InnerRollback: ENTRY PROC = TRUSTED {
RPCInternal.RollbackRPCPktStreams[]; -- forget connection counts
RPCInternal.RollbackRPCSecurity[]; -- invalidate local conversations
RPCInternal.RollbackRPCBinding[]; -- nothing?
};
InnerRollback[];
};
Initialize[];
}.
Bob Hagmann November 1, 1984 7:45:11 am
maxTransmissions ← 20 == this was set up to 100 during the time when Luther was thrashing
Bob Hagmann February 11, 1985 4:37:25 pm PST
changes to: GiveBackBuffer
Polle Zellweger (PTZ) August 2, 1985 1:39:27 pm PDT
changes to: PktExchange -- use newLen (NAT) rather than newLength [0..254] so that full-sized encrypted packets for other conversations don't cause bounds faults (the padding and/or the key can add a few words)
Swinehart, November 22, 1986 3:36:00 pm PST
Add maxTransmissions parameter embedded in ConversationObject.
changes to: DIRECTORY , PktExchange