About processes:
1. All processes execute in a single address space, although we will want to extend our model to multiple address spaces some day (Dragon has address space switching support).
2. Processes are (relatively) lightweight. Most of the process switch time is devoted to saving and restoring registers. We estimate 100-300 msecs per process switch, depending on the amount of state.
3. Processes are dynamically assigned to processors by the scheduler, so that processes can be considered virtual processors. We think of the scheduler as executing in the cracks between processes.
4. The scheduler attempts to run higher priority processes in preference to lower priority processes. However, there is no guarantee that a higher priority process will preempt a lower priority process, even for single processor systems.
About locks:
1. Primitive spin locks (PSLs) are the most primitive locks. PSLs are built out of CStore, are only acquired with reschedule off, and must be pinned. A single processor system should never wait for PSLs, obviously. In a multi-processor system PSLs must never be held long, or they become a bottleneck. The most important (and currently only) PSL is the metaLock, which controls access to the global process state information (especially the meta queues).
2. General spin locks (GSLs) use CStore and directed Yield. If CStore cannot acquire the lock, then the process attempting to acquire the lock donates its cycles to the process holding the lock using directed Yield. If the process holding the lock cannot be run, directed Yield acts like undirected Yield. GSLs can be non-pinned.
3. General queues (GQs) use a GSL to serialize ownership of the queue, and a circular queue of processes to hold the processes that are waiting. See Enqueue and Dequeue for details.
4. Monitor locks (MLs) are built from a GQ and an ownership word, which is used to denote the process owning the lock. See MonitorEntry and MonitorExit for details.
5. Condition variables (CVs) do not assume that a monitor is held, although one is always present. CVs also support timeouts. CVs are built from a GQ and a timeout word that holds the initial value of the timeout period. A flags word contains a bit that can enable aborts to affect the CV.
6. Interrupt condition variables (ICVs) are the mechanism for turning device signals and interrupts into ICV notifications. ICVs do not assume that a monitor is held, since none is ever present. ICVs also support timeouts. ICVs are built from a CV and a request bit for interrupt requests so we do not lose them while not waiting for the ICV. ICVs must be pinned, to allow the scheduler to remove requests from them.
Carl Hauser, March 13, 1987 3:48:54 pm PST
External procedures
Fork:
PUBLIC PROC[proc:
PROC ANY RETURNS ANY, argPtr: WordSeq, nArgs:
INT, nRets:
INT]
RETURNS [new: Process ←
NIL] = {
... forks a new process, which will initially call the given procedure using the given arguments. The number of words of arguments and returns is also necessary.
In the calling process:
F1: allocate a new process. Return NIL if we can't do this. (Error??)
-- CHH: PrincOps ProcessImpl.Fork waits until a process is available. Do we really want to change the semantics? March 12, 1987
F2: newFX ← Allocate a FX of (IF nRets > maxStackRets THEN nRets ELSE 0)+(IF nArgs > maxStackArgs THEN nArgs ELSE 0) words.
F3: copy the arguments from argPtr^ for nArgs words to newFX
(the original argument storage belongs to the caller)
F4: setup the process block & make it ready to run Forkee, having set things up so that Forkee's arguments will be correct. Forkee[proc, newFX, nArgs, nRets]
NotYetImplemented[];
};
Forkee:
PROC [proc:
PROC ANY RETURNS ANY, argPtr: WordSeq, nArgs:
INT, nRets:
INT]
RETURNS [
--doesn't--] ~ {
retRecord: WordSeq;
FF0: IF nRets > maxStackRets THEN
{ retRecord ← argPtr^;
argPtr^ ← argPtr + nArgs;
};
FF1: Depending on nArgs, either push argPtr or push nArgs from argPtr^;
FF2: Push appropriate descriptor information for proc
FF3: Call proc using an inline SFCI instruction; it eventually will return or not as it sees fit.
JJ1: proc's RETURN record may either be in the FX pointed to by argPtr at offset nArgs or on the stack.
JJ2: wait for JOIN or DETACH by another process.
JJ3: IF JOINed
THEN {
Copy return values in stack or FX to a Nacho;
Give the Nacho to the other process
}
JJ4: FrameAllocator.Free[argPtr];
JJ5: place the process in the free list (state ← free), free any remaining resources (possibly a FX) and quietly disappear
NotYetImplemented[];
};
Join:
PUBLIC PROC [process: Process, retPtr: WordSeq] = {
... joins the specified process, which waits for the process to complete, then copies the return values to the address given by retPtr.
In the calling process:
J1: wait for process to be in the done state (wait on condition?)
J2: copy the return values to retPtr^
J3: place the process in the free list (state ← free)
J4: return to the caller
NotYetImplemented[];
};
DirectedYield:
PUBLIC SAFE PROC[nextState: ProcessState, nextProcess: Process, when: Ticks] =
TRUSTED {
... yields the processor to the nextProcess, placing the current process in nextState. If nextProcess = NIL or is not ready, then the executing processor is yielded to the "best" ready process. The current process will be placed in the nextState. DirectedYield inhibits critical traps, and performs a ProcessSwitch. The when word is used for passing through timeout information. This routine must be called only when critical traps are enabled.
DisableInterrupts[];
CurrentProcess.when ← LOOPHOLE[when];
MarkFrame[];
Indicates that the current frame should not be saved.
ProcessSwitch[nextState, nextProcess];
At this point, the processor may not be running the same process as it was at the start of this routine. That is OK, because the return PC is not the same, either, since we will get it from the hook.
EnableInterrupts[];
};
Wait:
PUBLIC PROC [cond: Condition, lock: MonitorLock] = {
... waits for the condition variable cond to be the target of a Notify, Broadcast, or timeout. If lock # NIL, then lock must be released during the possible pause, and lock must be re-acquired after the Wait has finished.
Enqueue[@cond.queue];
Place self on the condition variable; we do this before giving up the monitor to avoid missing a Notify, which would be wrong; we cannot get a Notify between now and releasing the monitor lock, since one must possess the monitor lock in order to do a Notify.
IF lock #
NIL THEN MonitorExit[lock];
Exit the monitor; after this the condition variable can be Notify'd by another process
IF CurrentProcess.abort # requested
OR NOT cond.flags[abortEnableBit]
THEN
Go to sleep, choosing the "best" guy to run next. If we get a Notify between after exiting the monitor and before going to sleep, the Notify will remove this process from the condition, yet will NOT place this process on the ready queue (because our state is not yet waitingCV).
DirectedYield[waitingCV, NIL, cond.timeout];
DequeueSelf[@cond.queue];
If the current process is still on the condition queue then it must remove itself. DequeueSelf has no effect if the current process is not on a queue, so a race with some other process trying to dequeue this process is benign.
IF lock #
NIL THEN MonitorEntry[lock];
Reenter the monitor.
IF CurrentProcess.abort = requested
AND cond.flags[abortEnableBit]
THEN TestAbort[];
Final test for abort request while we were waiting on the CV.
};
WaitInterrupt:
PUBLIC PROC [cond: Condition] = {
... waits for cond to be the target of an interrupt request, or timeout. Unlike Wait, there are no monitor locks involved, and we have to proceed immediately if interrupts have occurred since the last WaitInterrupt.
DO
First, a quick test for requests during the time that the interrupt condition was not held.
currentFlags: CondFlags ← cond.flags;
IF currentFlags[condRequestBit]
THEN {
currentFlags[condRequestBit] ← FALSE;
cond.flags ← currentFlags;
EXIT;
};
Enqueue on cond with interrupts disabled to avoid a deadlock with interrupt servicing; note the assumption that interrupt condition variables are pinned.
DisableInterrupts[];
Enqueue[@cond.queue];
EnableInterrupts[];
Now that cond is held, see if there were interrupts since the last sample. If so, we don't really want to go to sleep, we want to service those requests.
IF NOT cond.flags[condRequestBit]
THEN
IF CurrentProcess.abort # requested
OR NOT cond.flags[abortEnableBit]
THEN
DirectedYield[waitingICV, NIL, cond.timeout];
We dequeue at this point to allow another process to use cond. DequeueSelf has no effect if the current process is not on the queue.
DequeueSelf[@cond.queue];
ENDLOOP;
IF CurrentProcess.abort = requested AND cond.flags[abortEnableBit] THEN TestAbort[];
};
Notify:
PUBLIC PROC [cond: Condition] = {
... notifies the "best" process waiting on cond that the condition has occurred. The protecting monitor lock is presumed to be held, but is not needed for this primitive.
best: Process = Dequeue[@cond.queue];
IF best #
NIL THEN MakeReady[best];
There really was a process waiting on the CV. Although a process may be made ready, the process switch is not immediately required. The process switch may occur when the protecting monitor lock is released.
};
Broadcast:
PUBLIC PROC [cond: Condition] = {
... notifies all waiting processes waiting on cond that the condition has occurred. To eliminate race problems, all of the waiting processes are made ready while holding onto the queue. The protecting monitor lock is presumed to be held, but is not needed for this primitive.
tail: Process ← cond.queue.chain;
IF tail #
NIL THEN {
AcquireQueue[@cond.queue];
IF (tail ← cond.queue.chain) #
NIL THEN {
cond.queue.chain ← NIL;
DO
Remove all of the processes from the circular queue and make them ready. Although processes may be made ready, the process switch is not immediately required. The process switch may occur when the protecting monitor lock is released.
this: Process = tail.next;
next: Process ← this.next;
tail.next ← next;
this.queue ← NIL;
this.next ← NIL;
MakeReady[this];
IF this = next THEN EXIT; -- we just removed the last one
ENDLOOP;
};
ReleaseQueue[@cond.queue];
};
};
MonitorEntry:
PUBLIC PROC [lock: MonitorLock] = {
... is performed at the entry to any monitored procedure to lock up the monitor for exclusive access. This operation is performed frequently, so it should be fast in the case where the monitor is not already owned.
self: Process = CurrentProcess;
DO
Try for conditional store of current PSB index with 0 wait chain. If we succeed, then we are in, and can return.
IF CStoreProcess[ptr: @lock.owner] THEN RETURN;
Place self on the monitor queue.
Enqueue[@lock.queue];
If the monitor is still held, then go to sleep, yielding to best process, only to wake up when the monitor queue is OK. Otherwise, some process released the monitor while we were trying to Enqueue, so we should get off the monitor queue and try again.
{
owner: Process = lock.owner;
IF owner #
NIL THEN {
THROUGH [0..spinTries) WHILE owner.state = running DO ENDLOOP;
IF lock.owner # NIL THEN DirectedYield[waitingML, lock.owner, 0];
};
DequeueSelf[@lock.queue];
};
ENDLOOP;
};
MonitorExit:
PUBLIC PROC [lock: MonitorLock] = {
... frees up the monitor lock. If there is a process waiting the monitor, dequeue the process from the monitor queue and place it in the ready queue. The current coding does nothing if we do not own the lock. Should we die instead of doing nothing?
IF lock.owner = CurrentProcess
THEN {
lock.owner ←
NIL;
Give up ownership of the lock.
IF lock.queue.chain #
NIL THEN {
Make ready the next process waiting for the monitor. There is a slight possibility that some other process will jump in ahead of this one while the monitor is free, but fairness is not guaranteed.
next: Process = Dequeue[@lock.queue];
IF next # NIL THEN MakeReady[next];
};
IF deferredWakeups
THEN RaiseReschedule[];
deferredWakeups = TRUE indicates that we deferred a desirable wakeup while processing a Broadcast or Notify for some monitor. Statistically speaking, it was probably for this monitor. So we raise reschedule here to get the best processes running.
};
};
TestAbort:
PUBLIC PROC = {
... tests for a requested abort.
self: Process = CurrentProcess;
IF self.abort = requested THEN {self.abort ← none; ERROR ABORTED};
};
RequestAbort:
PUBLIC PROC [process: Process] = {
... requests the given process to abort itself.
SELECT process.abort
FROM
none => {
process.abort ← requested;
MakeReady[process];
RaiseReschedule[];
};
inhibited =>
process.abort ← delayed;
ENDCASE;
};
Special external procedures
ShuffleReadyQueues:
PUBLIC PROC = {
... is called by the timeslicer service. It advances all processes waiting in the ready queues by one, and directs all undirected processes to switch to the best available process(es).
DisableInterrupts[];
AcquireMetaLock[];
Just order each processor to switch to the best available process.
FOR processor: Processor ← tailProcessor.next, processor.next
DO
SELECT processor.orders
FROM
noChange => {
Force a switch to the best available process
processor.orders ← switchToGiven;
processor.switchTo ← NIL;
};
ENDCASE;
IF processor = tailProcessor THEN EXIT;
ENDLOOP;
Last, raise the reschedule interrupt to make all processors choose the best available. Even the executing processor will eventually respond.
RaiseReschedule[];
ReleaseMetaLock[];
EnableInterrupts[];
};
CausePanicStop:
PUBLIC PROC = {
... causes all processors except the current one to come to a screeching halt with properly saved state. When we want to perform stack tracing, delicate debugging, or prepare for world-swap debugging, we use this routine to shut down the world. On entry, interrupts are assumed to be enabled and the metaLock is free. On exit, interrupts are disabled, the metaLock is held, and the panicStopLock is held.
self: Process = CurrentProcess;
DO
DisableInterrupts[];
Inhibit interrupts while we try to acquire the panicStopLock.
IF CStoreProcess[ptr: @panicStopLock]
THEN EXIT;
If panicStopLock is acquired, exit with interrupts disabled.
EnableInterrupts[];
Allow interrupts to occur while we are waiting. This allows the current processor to panic stop if it is requested to do so.
WHILE panicStopLock #
NIL DO
Spin without using CStore.
ENDLOOP;
ENDLOOP;
MarkFrame[];
Indicates that only frames older than the current frame are to be saved.
SaveProcessState[self];
We save the process state to make sure that the calling process is stable, and to make sure that we have enough stack registers to complete processing. We must have interrupts disabled to call this routine, but we don't need the metaLock.
DO
current: Processor ← CurrentProcessor;
done: BOOL ← TRUE;
AcquireMetaLock[];
FOR processor: Processor ← current.next, processor.next
WHILE processor # current
DO
SELECT processor.orders
FROM
noChange => {
Force a switch to the best available process
processor.orders ← panicStop;
processor.switchTo ← NIL;
done ← FALSE;
};
stopped => {};
ENDCASE => done ← FALSE;
ENDLOOP;
IF done THEN EXIT;
ReleaseMetaLock[];
RaiseReschedule[];
ENDLOOP;
At this point all of the processors except the current processor have entered the stopped state, and are waiting for different orders to be given. We return to the caller with interrupts off and the metaLock held.
};
ReleasePanicStop:
PUBLIC PROC = {
... allows the stopped processors to resume with normal processing. Call this procedure to release all processes from their stopped state. We assume that the metaLock is held and interrupts are off, and that the current process holds the panicStopLock. On exit, interrupts are off, and the current process has released the metaLock and the panicStopLock.
FOR processor: Processor ← tailProcessor.next, processor.next
DO
SELECT processor.orders
FROM
stopped => processor.orders ← noChange;
ENDCASE;
IF processor = tailProcessor THEN EXIT;
ENDLOOP;
At this point all of the other processors have been released from bondage. All we have to do now is release the panicStopLock and the metaLock, enable interrupts, and return.
panicStopLock ← NIL;
ReleaseMetaLock[];
EnableInterrupts[];
};
SatisfyPageFault:
PUBLIC PROC [startPage, endPage:
INT] = {
... for each process P in the page fault with startPage d P.when d endPage, make the process ready.
DisableInterrupts[];
AcquireMetaLock[];
{
tail: Process ← pageFaultMetaQueue;
p: Process ← tail;
WHILE p #
NIL DO
next: Process ← p.meta;
SELECT p.state
FROM
waitingPage =>
IF WordToInt[p.when]
IN [startPage..endPage]
THEN {
DequeueMeta[@pageFaultMetaQueue, p];
EnqueueReady[p];
place the process on the ready queue (also maintains highestReadyPriority)
IF p.priority > lowestRunningPriority
THEN deferredWakeups ←
TRUE;
The new process might be worth waking, but we don't do it here. We just remember that it was desirable.
};
ENDCASE;
p ← next;
IF p = tail THEN EXIT;
ENDLOOP;
};
ReleaseMetaLock[];
EnableInterrupts[];
IF deferredWakeups THEN RaiseReschedule[];
};
Internal procedures
ProcessSwitch:
PROC [nextState: ProcessState, nextProcess: Process] = {
... saves the state of the current process. If the process is not waiting, then it is placed on the ready queue. If nextProcess # NIL then it is the next process to run. Otherwise, the "best" ready process is chosen. During this sequence, we assume that critical traps are disabled. Note that if nextProcess # NIL, we must switch to it if it is ready, even if the current process is "better" (see comments in AcquireQueue to see why we require this).
self: Process = CurrentProcess;
SaveProcessState[self];
This is expensive enough to do outside of the region that holds the metaLock.
AcquireMetaLock[];
From now on, don't allow changes to the ready queue by other processors.
SELECT nextState
FROM
waitingML => {
Check to see if this process is trying to go to sleep on a monitor lock that has already been exited, which could happen if the MonitorExit occurred after this process got on the queue, but before the Yield. We have carefully arranged for this to be OK with Dequeue and friends.
IF self.next = NIL THEN GO TO bailOut;
self.state ← nextState;
};
waitingCV, waitingICV => {
Check to see if this process has already been removed from the condition queue, which could happen if the Notify occurred after this process got on the queue, but before the Yield. We have carefully arranged for this to be OK with Dequeue and friends.
IF self.next = NIL THEN GO TO bailOut;
IF WordToInt[self.when] > 0 THEN EnqueueMeta[@timeoutMetaQueue, self];
self.state ← nextState;
};
waitingPage => {
A page fault has occured, so we need to put the offending process to sleep on the page fault metaQueue, then notify the page fault handling process.
EnqueuePageFault[self];
DelicateNotify[pageFaultCond];
};
ENDCASE =>
EnqueueReady[self];
IF nextProcess =
NIL OR nextProcess.state # ready
THEN {
This is an undirected yield OR a directed yield to a process that is not ready for some peculiar (possibly transient) reason. So just choose the best process to run.
IF nextProcess #
NIL AND nextProcess.state = waitingPage
AND self.meta =
NIL THEN {
This is a directed yield to a process that is page faulting, so we should also wait for that page to be ready. It does us no good to proceed until that process is ready.
self.when ← nextProcess.when;
EnqueuePageFault[self];
};
nextProcess ← NIL;
Now choose the best process to run.
FOR priority: Priority
DECREASING IN [Priority.
FIRST..highestReadyPriority]
DO
tail: Process = readyQueueArray[priority];
IF tail # NIL THEN {nextProcess ← tail.next; EXIT};
ENDLOOP;
};
At this point nextProcess # NIL AND nextProcess.state = ready. The current intention is to have as many idle processes as processors, and to make sure that the idle processes are always ready to run.
DequeueMeta[@readyQueueArray[nextProcess.priority], nextProcess];
Note: this is currently the ONLY place that a process is removed from a ready queue!
DO
Sigh, we need to recompute the highestReadyPriority.
pri: Priority ← highestReadyPriority;
IF pri = Priority.FIRST OR readyQueueArray[pri] # NIL THEN EXIT;
highestReadyPriority ← pri.PRED;
ENDLOOP;
Now make nextProcess run on the executing processor.
nextProcess.state ← running;
CurrentProcessor.running ← CurrentProcess ← nextProcess;
SELECT self.priority
FROM
nextProcess.priority => {};
lowestRunningPriority => {
The process we are switching away from has the same priority as lowestRunningPriority, so it is possible that the lowestRunningPriority needs to be recomputed.
new: Priority ← tailProcessor.running.priority;
FOR each: Processor ← tailProcessor.next, each.next
WHILE each # tailProcessor
DO
IF new > each.running.priority THEN new ← each.running.priority;
ENDLOOP;
lowestRunningPriority ← new;
};
ENDCASE;
Now release the metaLock and restore the new state.
ReleaseMetaLock[];
IF nextProcess # self
THEN RestoreProcess[nextProcess];
Note that if we are the same process that no state needs to be restored, since nothing should have changed!
EXITS bailOut => ReleaseMetaLock[];
We get here when we have NOT found a better process to run. Therefore there is no change necessary, and things get done more quickly.
};
MakeReady:
PROC [process: Process] = {
... places the given process on the ready queue using the meta link, provided that the process was waiting for a queue. Also removes process from the timeout queue if it is there. On entry, we require that critical traps must NOT be disabled.
DisableInterrupts[];
AcquireMetaLock[];
IF process.queue #
NIL THEN
SELECT process.state
FROM
waitingCV, waitingML, waitingICV => {
priority: Priority = process.priority;
DequeueMeta[@timeoutMetaQueue, process];
remove from timeout queue if present
EnqueueReady[process];
place the process on the ready queue (also maintains highestReadyPriority)
IF priority > lowestRunningPriority
THEN deferredWakeups ←
TRUE;
The new process might be worth waking, but we don't do it here. We just remember that it was desirable.
};
ENDCASE;
ReleaseMetaLock[];
EnableInterrupts[];
};
Enqueue:
PROC [q: Queue, p: Process ← CurrentProcess] = {
... places the process on the queue in FIFO order. Priority processing is the responsibility of Dequeue.
AcquireQueue[q];
{
tail: Process = q.chain;
IF tail =
NIL
THEN p.next ← p
ELSE {p.next ← tail.next; tail.next ← p};
q.chain ← p;
p.queue ← q;
};
ReleaseQueue[q];
};
Dequeue:
PROC [q: Queue]
RETURNS [best: Process ←
NIL] = {
... performs dequeue by process priority. We perform a linear search of every process in the queue to find the best. The best process in the queue is the first one we find with the maximum priority. Performing the priority testing in Dequeue is more expensive for the case where there are several things on the queue, but it allows us to change process priorities without altering the queue. If the process we find has also timed out, we have to handle the case elsewhere, but it is OK for Dequeue to dequeue such a process. Dequeue does not remove such a process from the timeout queue.
lag: Process ← q.chain;
IF q.chain #
NIL THEN {
AcquireQueue[q];
best ← DequeueInner[q];
ReleaseQueue[q];
};
};
DequeueInner:
PROC [q: Queue]
RETURNS [best: Process ←
NIL] =
INLINE {
... is the inside part of Dequeue. While we still dequeue by process priority, we assume that we have already grabbed ownership of the queue. This allows us to have common code for this case and the more delicate notification required by interrupt requests & page faults.
lag: Process ← q.chain;
IF lag #
NIL THEN {
tail: Process = lag;
best ← lag.next;
IF lag = best
THEN q.chain ←
NIL
only process in queue, so clear the chain
ELSE {
head: Process ← best; -- loop invariant: lag.next = best
this: Process ← head;
DO
next: Process ← this.next;
IF next = head THEN EXIT; -- we have wrapped
IF next.priority > best.priority
THEN {lag ← this; best ← next};
new candidate for the best process
this ← next;
ENDLOOP;
IF tail = best
THEN q.chain ← lag;
don't queue chain point to deleted process
lag.next ← best.next;
};
best.queue ← NIL;
best.next ← NIL; -- the last change, due to test in ProcessSwitch
};
};
DequeueSelf:
PROC [q: Queue] = {
... dequeues the current process from the given queue. This operation has no effect if the process was not in the queue.
IF CurrentProcess.next #
NIL THEN {
lag: Process ← q.chain;
IF lag #
NIL THEN {
AcquireQueue[q];
IF (lag ← q.chain) #
NIL THEN {
tail: Process = lag;
DO
p: Process = lag.next;
IF p = CurrentProcess
THEN {
IF p = lag
THEN q.chain ←
NIL
last one in queue, so show no chain
ELSE {
IF p = tail
THEN q.chain ← lag;
don't let deleted one be the queue head
lag.next ← p.next;
};
CurrentProcess.queue ← NIL;
p.next ← NIL; -- the last change, due to test in ProcessSwitch
EXIT;
};
lag ← p;
IF lag = tail THEN EXIT;
ENDLOOP;
};
ReleaseQueue[q];
};
};
};
AcquireQueue:
PROC [q: Queue] = {
... grabs ownership of the given queue. Performs Yield[ready, owner] until queue is owned. This directed Yield is necessary to overcome nasty interactions between spin locks and preemptive scheduling.
WHILE NOT CStoreProcess[ptr: @q.busy]
DO
owner: Process ← q.busy;
IF owner = NIL THEN LOOP;
IF owner.state # running
THEN DirectedYield[ready, owner, 0];
Try to donate our cycles to the owner of the queue. Note that if the owner is running on another processor that there is not need to Yield, since the queue should be given up quite soon.
ENDLOOP;
};
ReleaseQueue:
PROC [q: Queue] = {
... releases ownership of the queue. Assumes that the current process owns the queue.
q.busy ← NIL;
};
Internal procedures (interrupts disabled, must hold metaLock)
DelicateNotify:
PROC [cond: Condition] =
INLINE {
... does a queue notification with the understanding that interrupts are disabled and the metaLock is held, so we just can't yield the processor. The condition must be resident, of course.
q: Queue = @cond.queue;
cond.flags[condRequestBit] ←
TRUE;
Register the request just in case we can't get the queue.
IF CStoreProcess[ptr: @q.busy]
THEN {
We got the queue, so we can make the best process in it ready. If we did not get the queue, then the owner of it will examine the flags pretty soon and see the notification. This only really works if the owner is trying to do a WaitInterrupt, though.
best: Process ← DequeueInner[q];
IF best #
NIL THEN {
cond.flags[condRequestBit] ←
FALSE;
The waiting process has been poked, no need to do it more than once.
DequeueMeta[@timeoutMetaQueue, best];
remove from timeout queue if present
EnqueueReady[best];
place the process on the ready queue (also maintains highestReadyPriority)
IF best.priority > lowestRunningPriority
THEN deferredWakeups ←
TRUE;
The new process might be worth waking, but we don't do it here. We just remember that it was desirable.
};
};
};
AssignProcessors:
PROC = {
... sets processor orders based on whether the a ready process is "better" than any running processes; we run through the ready processes in highest to lowest order. At the end of the reassignment, if we have to change the orders of any processor other than our own then we raise reschedule to get the other processors to do their switching.
The cost of algorithm in this procedure has an upper bound of O(N*MIN[N,MAX[K,R]), where N is the number of processors, K is highestReadyPriority-lowestRunningPriority, and R is the number of ready processes with priority IN (lowestRunningPriority..highestReadyPriority]. In practice, the algorithm has somewhat better average behavior, since in most cases there will have been only one process made ready by a single device interrupt, an idle processor will answer the interrupt first, and immediately assign the new process to itself with no further reschedule activity. In this (anticipated) normal case the cost is O(K)+O(1). This algorithm could be made faster in the high cost cases by keeping the processors sorted by priority, but that would increase costs in ProcessSwitch so it might not be a net perfomance gain.
needReschedule: BOOL ← FALSE;
current: Processor = CurrentProcessor;
firstAvail: Processor ← current;
deferredWakeups ←
FALSE;
At this point we are about to reassign all of the processors that need it. So nothing is really deferred.
FOR priority: Priority
DECREASING IN (lowestRunningPriority..highestReadyPriority]
DO
tail: Process = readyQueueArray[priority];
IF tail #
NIL THEN {
FOR process: Process ← tail.next, process.next
DO
best: Processor ← NIL;
bestPriority: Priority ← process.priority;
{
DO
The first scan is for some available processor that could run the requested process. If none is found, then we exit all loops.
best ← firstAvail;
IF best.orders = noChange
THEN {
np: Priority = best.running.priority;
IF np < bestPriority THEN {bestPriority ← np; EXIT};
};
firstAvail ← best.next;
IF firstAvail = current THEN GO TO done;
IF best.switchTo = process AND best.orders = switchToGiven THEN
GO TO alreadyAssigned;
ENDLOOP;
IF bestPriority # lowestRunningPriority
THEN
If the first available processor found is not already at the minimum possible priority then it is worth scanning for another processor with a lower priority.
FOR p: Processor ← best.next, p.next
WHILE p # current
DO
SELECT p.orders
FROM
switchToGiven => IF p.switchTo = process THEN GO TO alreadyAssigned;
noChange => {
np: Priority = p.running.priority;
IF np < bestPriority
THEN {
best ← p;
bestPriority ← np;
IF np = lowestRunningPriority THEN EXIT;
};
};
ENDCASE;
ENDLOOP;
We have found the best choice for the given process.
best.switchTo ← process;
best.orders ← switchToGiven;
IF best # current
THEN needReschedule ←
TRUE;
If we are gave orders to another processor, then we rememebr the fact.
IF best = firstAvail
THEN {
The best processor was the first one we found. Therefore we advance firstAvail to the next processor (unless that happens to be the current processor, in which case we are completely done scanning).
firstAvail ← best.next;
IF firstAvail = current THEN GO TO done;
};
EXITS alreadyAssigned => {
In this case the process was already assigned to a processor. So we want to choose another processor (starting at firstAvail) and another process. This should be rare.
};
};
IF process = tail THEN EXIT;
ENDLOOP;
EXITS done => EXIT;
};
ENDLOOP;
IF needReschedule THEN RaiseReschedule[];
};
ReleaseMetaLock:
PROC =
INLINE {
... releases ownership of the meta lock. On entry, we require that critical traps are disabled and that the metaLock is held. On exit, the metaLock is not held, although critical traps are still disabled.
metaLock ← NIL;
};
EnqueueReady:
PROC [process: Process] =
INLINE {
... places the given process on the ready queue using the meta link. On entry, we require that critical traps are disabled and that the metaLock is held. On exit, process.state = ready.
priority: Priority = process.priority;
EnqueueMeta[@readyQueueArray[priority], process];
IF priority > highestReadyPriority THEN highestReadyPriority ← priority;
process.state ← ready;
};
EnqueuePageFault:
PROC [process: Process] = {
... places the given process on the page fault queue using the meta link. On entry, we require that critical traps are disabled and that the metaLock is held. On exit, process.state = waitingPage.
EnqueueMeta[@pageFaultMetaQueue, process];
process.state ← waitingPage;
};
EnqueueMeta:
PROC [metaQueue: MetaQueue, process: Process] =
INLINE {
... places the given process on the given meta queue using the meta link. On entry, we require that critical traps are disabled and that the metaLock is held.
IF process.meta =
NIL THEN {
tail: Process = metaQueue^;
IF tail =
NIL
THEN process.meta ← process
ELSE {process.meta ← tail.meta; tail.meta ← process};
metaQueue^ ← process;
};
};
DequeueMeta:
PROC [metaQueue: MetaQueue, process: Process] =
INLINE {
... removes the given process from the given meta queue (timeout, ready, pageFault). On entry, we require that critical traps must be disabled and that the metaLock is held. The process state is not changed except for the meta link.
IF process.meta #
NIL THEN {
remove from meta queue if it is there
tail: Process = metaQueue^;
IF tail #
NIL THEN {
lag: Process ← tail;
DO
p: Process = lag.meta;
IF p = process
THEN {
IF lag = p
THEN metaQueue^ ←
NIL
last one in queue, so show no chain
ELSE {
IF p = tail
THEN metaQueue^ ← lag;
don't let deleted one be the queue head
lag.meta ← p.meta;
};
p.meta ← NIL;
EXIT;
};
lag ← p;
IF lag = tail THEN EXIT;
ENDLOOP;
};
};
};
Low-level primitives
DisableInterrupts:
PROC [] = {
... inhibits interrupts (must be a procedure call for Dragon)
NotYetImplemented[];
};
EnableInterrupts:
PROC [] = {
... enables interrupts (must be a procedure call for Dragon)
NotYetImplemented[];
};
CStoreProcess:
PROC[ptr: ProcessPtr, old: Process ←
NIL, new: Process ← CurrentProcess]
RETURNS [stored:
BOOL] =
-- MACHINE CODE -- {
... atomically performs {IF (stored ← ptr^ = old) THEN ptr^ ← new}.
IF (stored ← ptr^ = old) THEN ptr^ ← new;
};
CStoreWord:
PROC[ptr: WordPtr, old, new: Word]
RETURNS [stored:
BOOL] =
-- MACHINE CODE -- {
... atomically performs {IF (stored ← ptr^ = old) THEN ptr^ ← new}.
IF (stored ← ptr^ = old) THEN ptr^ ← new;
};
ReadAndClearDeviceRequests:
PROCRETURNS [rw: RequestWord ← NullRequestWord] =
-- MACHINE CODE -- {
... atomically reads and clears the device requests. This is done by reading the appropriate EU I/O location (Interrupt Request Word) then clearing the bits read.
NotYetImplemented[];
};
DeviceRequestsPending:
PROCRETURNS [b:
BOOL ←
FALSE] =
-- MACHINE CODE -- {
... Determines whether or not device requests are pending, but does not alter them. This can be done as a funny I/O command (preferable) or via a more synthetic method. If the latter, then it must agree with ReadAndClearDeviceRequests.
NotYetImplemented[];
};
RaiseReschedule:
PROC =
-- MACHINE CODE -- {
... raises the Reschedule interrupt, which, if disabled, will occur when it becomes enabled again. Wi
deferredWakeups ← FALSE;
NotYetImplemented[];
};
WordToInt:
PROC [x: Word]
RETURNS [
INT] =
-- MACHINE CODE -- {
... returns x > 0. This will be the correct implementation for Dragons, although it would do the wrong thing on Dorados.
RETURN [LOOPHOLE[x]];
};
EUFetchSpecial:
PROC[which: EUstateIndex]
RETURNS [w: Word ← WordZero] =
-- MACHINE CODE -- {
... fetches a word from the given special EU register.
NotYetImplemented[];
};
EUStoreSpecial:
PROC [which: EUstateIndex, x: Word] =
-- MACHINE CODE -- {
... stores a word into the given special EU register.
NotYetImplemented[];
};
SaveStack:
PROC =
INLINE {
... saves the stack (updates the current value of the hook EU register), returning the number of frames saved. The stack is saved up to (and including) the eldest marked frame. Also updates the framesLeft EU register. For excruciating details, see GenStack.mesa.
NotYetImplemented[];
};
MarkFrame:
PROC =
INLINE -- MACHINE CODE -- {
... marks the youngest frame on the IFU stack as being the last one to save. The eldest frame marked is actually the last one saved by SaveStack. This procedure probably needs to be called when interrupts are disabled.
NotYetImplemented[];
};
NotYetImplemented:
PROC = {
... marks operations with missing implementation details.
NotYetImplemented[];
};
Differences Between PrincOps and DragOps
Note: this section requires knowledge of the PrincOps architecture. I freely admit that my knowledge of said architecture is spotty at best. This section is also incomplete.
Multiple processors
With current PrincOps machines atomicity is acheived through a combination of locks, inhibition of interrupts, priority levels, and non-interruptibility of non-faulting microcode routines. In the DragOps arhcitecture, only locks (based on memory bus arbitration) and inhibition of interrupts for a single processor (based on IFU inhibition of the Reschedule interrupt) can be used as sources of atomicity. This complicates life, you betcha.
Memory vs. time
For PrincOps, memory was assumed to be expensive, and cycles were assumed to be expendable if it saved memory. For example, a process block costs only 8 bytes when no state vector is used, and only costs 36 additional bytes when a state vector is used (the unusual case).
The DragOps, memory is assumed to be relatively cheap, and cycles should only be spent to avoid gross memory costs. For example, a process block costs at least 40 bytes in the normal case, and there is no exceptional case.
The ready queue
For PrincOps, the head of the ready queue is the currently running process. For DragOps, running processes are not on the ready queue at all. Instead, running processes are located through the current ring of processors.
The DragOps ready queue is special in that it is actually an array of metaqueues, one for each priority level. This bounds the search for the best ready process to a time proportional to the number of priority levels (small), yet also allows us to place processes on the ready queue in unit time.
Queues and meta queues
For PrincOps, all queues are circular queues of processes linked through a particular word in the PSB (Process State Block). Condition variables, monitor locks, and even the ready queue are all queues where the queue pointer points at the tail of the processor queue. For DragOps, there are Queues and MetaQueues. Although both are circular queues, and both point at the tail, the link words are different. A DragOps process can be queued at the same time on both a queue and a metaqueue.
DragOps queues are used by "normal" Mesa monitors and condition variables. MetaQueues are used for more specialized reasons, like the ready queue (waiting for processor), the pageFault queue (waiting for a page to be brought into memory), and the timeout queue (waiting for some time to expire). Queues need not be pinned, while MetaQueues must be pinned. The scheduler is permitted to move processes between MetaQueues, but is not permitted to touch queues. All of this is done in the name of simplicity, you betcha.
Monitor locks & condition variables
For PrincOps, monitor locks and condition variables are managed by microcode that has carefully been constructed to not take interrupts at awkward times.
For DragOps, managing monitor lock and condition queues is too complex to be performed in a single instruction, so each monitor lock and condition has a spin lock that must be acquired before the queue can be examined or modified. A potential problem with this scheme is that a higher-priority process may attempt to acquire the spin lock while a lower-priority process possess the lock. Therefore, each spin lock has the address of the owning process, so a process that fails to acquire the lock can yiled control directly to the owning process. Although this is technically a violation of preemptive scheduling, it is necessary to prevent livelock (where no progress can be made because of the higher-priority process spinning).
State vectors
For PrincOps, the eval stack and other cleanup information is contained in a state vector. Minimal stack restrictions on monitor entry and condition wait ensure that the eval stack is empty when such processes go to sleep, which tends to keep the number of state vectors low.
For DragOps, there are no state vectors. When a process is about to go to sleep, it saves the frames in the IFU and EU stacks to memory, saves the appropriate EU registers to the process block, and puts itself on the appropriate metaqueue (if any). The only problem is ensuring that there are sufficient nachos to accomodate the stack of the process. The current thinking is to have an emergency-level process put every other process to sleep when frames are about to run out. When that has been done, a world-swap debug can take place.
An improvement on this scheme can be made if there is an error generated for a particular process when it has used "too many" frames. Since this is typically due to a process executing unbounded recursion, we can tolerate at least a few runaways without having to world-swap.
In both cases, we have to have some mechanism for keeping track of how many frames are in use by a given process, and how many are left in the global pool of frames. Details are yet to be specified.
Costs
For PrincOps, a process switch is on the order of 5 procedure call times. A procedure call (and return) is on the order of 5 microseconds (on the Dorado).
For DragOps, a process switch is on the order of 100 procedure call times (very rough estimate). Partly, this is because procedure calls are about 10 times faster, and partly this is because there is more per process state to save and restore.
Timeout
For PrincOps, a timeout occurs when the clock ticks and some PSB that is awaiting timeout has its wait count go to 0. All PSBs have to be touched at every tick, although this is done via microcode.
For DragOps, at each tick every process in timeoutMetaQueue has its when field decremented. If that field reaches 0, the process is placed in the ready queue. It then becomes the responsibility of the process to remove itself from the appropriate condition variable. This makes page faults (if any) due to touching the condition variable occur in the "right" process. Only PSBs waiting for timeout have to be touched at every tick.
Aborts
For PrincOps (as implemented for Cedar), aborts are enabled/inhibited on a per condition variable basis, and aborts are tested only by CVs & ICVs with aborts enabled and by explicit tests.
For DragOps, aborts are enabled/inhibited on a per process basis and on a CV basis. Abort requests are tested when waiting for CVs, ICVs, and MLs, as well as the normal explcit tests. Is it worth having the enable/disable on a per-process basis? (it costs rather little)
Things to do
This section is a list of things that still need some design work.
Fork / join
These need to be coded in greater detail.
Page faults
How will the fault handler work and place things to sleep waiting for page fault? For an EU page fault, the offending memory address is in the MAR register in the EU. For an IFU page fault the offending PC is for the start of the instruction, so the memory address to page in may be the next page (since this fits well with page clustering anyway, we may want to always page in the following page on an IFU page fault whenever the following page is also valid).
Soft and hard emergency limits on frames
These need support in the scheduler, as well as in stack save/restore. Reaching the soft limit (per process) should raise an error. This is just an attempt to restrain runaway processes. Reaching the hard limit will cause system shutdown and a call to the world-swap debugger. This probably will only happen if a significant number of runaways occur.
Timeout mechanism
In current Cedar relatively few condition variables have timeouts (the default is not to have timeouts). The current design for the timeout queue is based on the assumption that this situation will continue on Dragon. The current design has O(n) insertion and deletion of entries from the queue, O(n) updating for every tick, and O(1) testing, where n is the number of processes waiting for timeout. A heapsort-style priority queue has O(n log n) insertion and deletion, and O(1) testing, but it also has more complicated code, and needs an auxilliary pinned data structure (1 word / process). The current simple design will be used until it becomes a bottleneck.
Should the timeout "process" just be part of the scheduler? How will elapsed time be determined, and to what accuracy? Just for the record, there are three clocks of current interest to Cedar:
1: Time of day clock. This need not have resolution of better than a second, and it need not be locally stable, but it should have very good long-term accuracy.
2: Timeout clock. This need not have great accuracy, but it must generate interrupts, and it should have resolution of 10-100 milliseconds. More resolution is OK provided that timeout interrupts don't occur more often than every 10 milliseconds.
3: Elapsed time clock. This need not have great long-term accuracy, but it should have very good short-term accuracy, have roughly microsecond resolution, and it must be quite cheap to read. This clock can easily synthesize the Timeout clock.
Stack trace
SafeStorage needs a way to trace out the process stacks atomically. That is, all possible references in the process stacks must be discovered without any reference containing processes making headway. Dump frames are already pinned, and the panicStop mechanism can be used to stop all processes except for the current process (we may also want to have an stopRC command that would stop all processes that had not declared themselves to be non-RC). We can allocate frame extensions out of a special pool of memory that allows scanning. Possibly we should also make this pool pinned?
Reschedule & page fault
Setting L at procedure essentially transfers registers from the called frame to the calling frame, as far as stack save and restore are concerned. It will only work properly if both the calling frame and the called frame are in the EU (and IFU) registers. If the various compilers (and assembly language coders) guarantee that no calls are made before setting L, then there are only two sources of having the calling frame migrated into memory - reschedule & page fault. For these traps, if the youngest frame below the trap is P, and the next youngest is Q, and Q has zero registers, then both P and Q have to be restored before returning from the trap.