DragonProcesses.mesa
Copyright © 1984, 1985 by Xerox Corporation. All rights reserved.
Russ Atkinson (RRA) October 28, 1985 9:32:46 pm PST
About processes:
1. All processes execute in a single address space, although this will have to change some day.
2. Processes are lightweight. Besides the process stack there is relatively little state information.
3. Processes are dynamically mapped onto processors by the scheduler, so that processes can be considered virtual processors. We think of the scheduler as executing in the cracks between processes.
4. The scheduler attempts to run higher priority processes in preference to lower priority processes. However, there is no guarantee that a higher priority process will preempt a lower priority process.
About locks:
1. Primitive spin locks (PSLs) are the most primitive locks. PSLs are built out of CStore, are only acquired with reschedule off, and must be pinned. A single processor system should never wait for PSLs, obviously. In a multi-processor system PSLs must never be held long, or they become a bottleneck. The most important (and currently only) PSL is the metaLock, which controls access to the global process state information (especially the meta queues).
2. General spin locks (GSLs) use CStore and directed Yield. If CStore cannot acquire the lock, then the process attempting to acquire the lock donates its cycles to the process holding the lock using directed Yield. If the process holding the lock cannot be run, directed Yield acts like undirected Yield. GSLs can be non-pinned.
3. General queues (GQs) use a GSL to serialize ownership of the queue, and a circular queue of processes to hold the processes that are waiting. See Enqueue and Dequeue for details.
4. Monitor locks (MLs) are built from a GQ and an ownership word, which is used to denote the process owning the lock. See MonitorEntry and MonitorExit for details.
5. Condition variables (CVs) do not assume that a monitor is held, although one is always present. CVs also support timeouts. CVs are built from a GQ and a timeout word that holds the initial value of the timeout period.
6. Interrupt condition variables (ICVs) are the mechanism for turning device signals and interrupts into ICV notifications. ICVs do not assume that a monitor is held, since none is ever present. ICVs also support timeouts. ICVs are built from a CV and a request count for interrupt requests so we do not lose them while not waiting for the ICV. ICVs must be pinned, to allow the scheduler to remove requests from them, and to allow the request count to be changed by devices hooked up to physical memory.
DragonProcesses: PROGRAM = BEGIN
Types
Basic
WordPtr: TYPE = LONG POINTER TO Word;
Word: TYPE = PACKED ARRAY [0..31] OF BOOL;
WordZero: Word ← ALL[FALSE];
RequestKind: TYPE = [0..31];
RequestWordPtr: TYPE = LONG POINTER TO RequestWord;
RequestWord: TYPE = PACKED ARRAY RequestKind OF BOOL;
NullRequestWord: RequestWord = ALL[FALSE];
StackRegister: TYPE = MACHINE DEPENDENT {min (0), max (127)};
A StackRegister is an index into the EU stack registers.
Locks
MetaQueue: TYPE = LONG POINTER TO Process;
MetaQueueRep: TYPE = Process;
Queue: TYPE = LONG POINTER TO QueueRep;
QueueRep: TYPE = RECORD [
busy: Process, -- busy is a GSL that protects the chain
chain: Process -- chain points to the tail of a circular process queue
];
MonitorLock: TYPE = LONG POINTER TO MonitorLockRep;
MonitorLockRep: TYPE = RECORD [
queue: QueueRep, -- the GQ of waiting processes
owner: Process -- the process owning the lock
];
Condition: TYPE = LONG POINTER TO ConditionRep;
ConditionRep: TYPE = RECORD [
queue: QueueRep, -- the GQ of waiting processes
timeout: Word -- the timeout initialization
];
InterruptCondition: TYPE = LONG POINTER TO InterruptConditionRep;
InterruptConditionRep: TYPE = RECORD [
queue: QueueRep, -- the GQ of waiting processes
timeout: Word, -- the timeout initialization
requests: Word -- 0 => no requests, # 0 => pending requests
];
Process data
ProcessPtr: TYPE = LONG POINTER TO Process;
Process: TYPE = LONG POINTER TO ProcessRep;
ProcessRep: TYPE = RECORD [
queue: Queue, -- pointer to queue that this process is waiting for
next: Process, -- next process in above circular queue
meta: Process, -- next process in ready queue, timeout queue or page fault queue
when: Word, -- when timeout will occur OR page number for fault
priority: Priority, -- priority of this process
state: ProcessState, -- process state
euState: EUstate -- useful registers in EU
];
Priority: TYPE = MACHINE DEPENDENT {
slothful (0), -- user-level deep background processing (idle)
sluggish (1), -- user-level background processing
normal (2), -- user-level normal processing
perky (3), -- user-level foreground processing
nervous (4), -- system-level ?? processing
jumpy (5), -- system-level ?? processing
excited (6), -- system-level real-time processing
hyper (7) -- system-level emergency processing
};
ProcessState: TYPE = MACHINE DEPENDENT {
free (0), -- process is on free list
running (1), -- process is running (assigned to a processor)
ready (2), -- process is ready to run (on ready queue)
waitingPage (3), -- process is waiting for page fault (on page fault queue)
waitingCV (4), -- process is waiting for CV
waitingML (5), -- process is waiting for ML
waitingICV (6), -- process is waiting for ICV
done (7)  -- process is done (waiting for Join)
};
EUstateIndex: TYPE = MACHINE DEPENDENT {
Carry (0), -- the carry bit (not really a register)
Field (1), -- shifter control
Hook (2), -- pointer to the youngest frame saved to memory (a nacho)
FramesLeft (3) -- frames left before fault occurs
};
EUstate: TYPE = ARRAY EUstateIndex OF Word;
Nachos
Nachos are used to save EU & IFU registers associated with local frames. A chain of nachos is used to represent the process call stack. Nachos are fixed size to simplify and speed up allocation. Nachos are pinned to allow us to put a process to sleep without taking page faults.
Nacho: TYPE = LONG POINTER TO NachoRep;
NachoRep: TYPE = RECORD [
link: Nacho, -- link to the next elder frame in the process stack
nextPC: Word, -- the continuation PC for the frame
nRegs: Word, -- the # of registers saved in this Nacho
others: Nacho, -- the link to the area for more saved registers
regs: RegArray -- the saved registers (local variables)
];
RegArray: TYPE = ARRAY Reg OF Word;
Reg: TYPE = [0..15];
Processor data
Processors are chained together in a circular queue that is not modified after initialization.
Processor: TYPE = LONG POINTER TO ProcessorRep;
ProcessorRep: TYPE = RECORD [
next: Processor, -- next processor in ring
orders: ProcessorOrders, -- orders for what to do after reschedule
switchTo: Process, -- if orders = switchToGiven, then this is the one to switch to
running: Process -- the process currently being run {# NIL}
];
ProcessorOrders: TYPE = MACHINE DEPENDENT {
reset (0), -- useful during system init (?)
noChange (1), -- ignore the reschedule
switchToGiven (2), -- switch to process given by processor.switchTo (NIL => to best)
panicStop (3), -- save current process, then spin on these orders
stopped (4) -- stopped in response to panicStop
};
Global State
RunawayLimit: Word ← RunawayLimit;
... arbitrary limit on stack depth to guard against runaway processes. This limit could also be kept on a per-process basis at the cost of extra space. It is left as an unspecified constant.
EmergencyLimit: Word ← EmergencyLimit;
... arbitrary limit on remaining stack frames to ensure that enough is left to save the state information for all running processes. It is left as an unspecified constant.
metaLock: Process ← NIL;
the spin lock for meta queues; NIL means not busy; any process that desires to examine or modify any meta queue must acquire this lock and spin until it is free; the possessor of this lock must have traps disabled, and must not page fault
schedulerLock: Process ← NIL;
the lock for the scheduler; NIL means not busy; the possessor of this lock must have traps disabled, and must not page fault; the possessor of this lock has the sole responsibility to handle interrupts and to reschedule other processors; possessing this lock does not permit using a meta queue without acquiring the metaLock as well
readyQueueArray: ARRAY Priority OF Process;
All ready processes are queued in this array of metaqueues.
pageFaultMetaQueue: MetaQueueRep;
All processes waiting page faults are queued in this metaqueue.
pageFaultHandlerQueueRep: InterruptConditionRep;
pageFaultHandlerQueue: InterruptCondition = @pageFaultHandlerQueueRep;
This condition is notified (broadcast?) for every new page fault. The faulting processes are queued up on the pageFaultMetaQueue.
timeoutMetaQueue: MetaQueueRep;
All processes waiting timeouts are queued in this metaqueue.
reqWord: RequestWord ← NullRequestWord;
reqPtr: RequestWordPtr = @reqWord;
Requests for interrupt service are made through the reqWord. There are 32 different interrupt requests that are associated with conditions. For a device to request an interrupt, it sets its bit in the reqWord, then raises reschedule.
requestQueueArray: ARRAY RequestKind OF InterruptConditionRep;
External interrupts cause notifies through these conditions.
tailProcessor: Processor ← NIL;
Processors are arranged in a ring. This variable refers to the tail of the ring.
tailFreeProcess: Process ← NIL;
Free processes are arranged in a ring. This variable refers to the tail of the ring. This variable is protected by the metaLock.
panicStopLock: Process ← NIL;
Possessing this lock gives the current processor the exclusive right to come to a panic stop.
lowestRunningPriority: Priority ← slothful;
This is the priority of the lowest priority running process. It is used in the scheduler to accelerate testing for handling of reschedule interrupts. This value should only be used as a hint unless one is in possession of the metaLock.
deferredWakeups: BOOLFALSE;
TRUE if there have been calls to MakeReady since the last call to the scheduler or RaiseReschedule. This is a statistical hack to reduce the number of wakeups of processes that have been Notified before the monitor lock has been released.
spinTries: INT ← 8;
# of times to spin-test a spin-lock. This number should be 0 for a single-processor system, since there is no advantage to extra spins.
Reschedule Interrupt
RescheduleTrap: PROC = {
... is called whenever the reschedule interrupt is enabled and either the reschedule line is asserted OR reschedule traps become enabled and there is a deferred reschedule pending. In either case, we start up with critical traps inhibited, and a special return (hand-coded in the RescheduleTrap vector) enables critical traps on exit.
entryPriority: Priority ← CurrentProcess[].priority;
DO
** Step 1 ** try to grab ownership of the scheduler (if at lowest priority)
IF entryPriority = lowestRunningPriority AND schedulerLock = NIL THEN
This processor is one of the best to try the scheduler, and the scheduler is free.
IF CStoreProcess[ptr: @schedulerLock, old: NIL, new: CurrentProcess[]] THEN {
At this point we have acquired the scheduler.
** Step 2 ** sample & clear request word
currentReq: RequestWord;
DO
currentReq ← reqPtr^;
IF currentReq = NullRequestWord THEN EXIT;
IF CStoreReq[ptr: reqPtr, old: currentReq, new: NullRequestWord] THEN EXIT;
ENDLOOP;
** Step 3 ** service requests in currentReq
AcquireMetaLock[]; -- don't allow changes to the ready queue by other processors
IF currentReq # NullRequestWord THEN ServiceRequests[currentReq];
** Step 4 ** determine if processor assignments need to be reshuffled
AssignProcessors[];
** Step 5 ** release locks & test for new interrupt requests
schedulerLock ← NIL;
ReleaseMetaLock[];
};
** Step 6 ** normal exit, just follow orders
DO
SELECT CurrentProcessor[].orders FROM
noChange => EXIT;
switchToGiven => {
ProcessSwitch[ready, GetL[], CurrentProcessor[].switchTo];
CurrentProcessor[].orders ← noChange; -- reset the orders once followed
};
panicStop => {
This state occurs whenever we need to put the world to sleep in a hurry.
SaveProcessState[CurrentProcess[], GetL[]];
CurrentProcessor[].orders ← stopped; -- show that we really are stopped
};
ENDCASE;
ENDLOOP;
** Step 7 ** test for residual unserviced interrupts
IF reqPtr^ = NullRequestWord OR schedulerLock # NIL THEN EXIT;
entryPriority ← lowestRunningPriority;
By this point we have given a lowestRunningPriority processor the first chance to get in and process the interrupt. However, lowestRunningPriority must be treated as a hint, not as the truth. So if there are still device requests pending and no one has the scheduler, we go back to try to service those requests using the current processor. We change entryPriority to try to force an entry into the scheduler.
ENDLOOP;
};
External procedures
Fork: PROC [frame: Nacho] RETURNS [new: Process ← NIL] = {
... forks a new process, which will initially call the procedure at frame.nextPC with arguments given in the frame.
F1: allocate a new process. Return NIL if we can't do this.
F2: init the process block with the nacho (and other goodies)
F3: place the process in the ready list.
NotYetImplemented[];
};
Join: PROC [process: Process] RETURNS [frame: Nacho] = {
... joins an old process, which waits for the process to complete, then returns the return values as a nacho.
J1: wait for the process to be in the done state (wait on condition?)
J2: take the nacho from the process (must be only one, else in trouble)
J3: place the process in the free list (state ← free)
J4: return the nacho to the caller
NotYetImplemented[];
};
Yield: PROC [nextState: ProcessState ← ready, nextProcess: Process ← NIL, when: Word] = {
... yields the processor to the nextProcess. If nextProcess = NIL, chooses the "best" process to yield to. The current process will be placed in the nextState. Yield inhibits critical traps, and performs a ProcessSwitch. This routine must be called only when critical traps are enabled.
InhibitInterrupts[];
CurrentProcess[].when ← when;
ProcessSwitch[nextState, GetL[], nextProcess];
At this point, the processor may not be running the same process as it was at the start of this routine. That is OK, because the return PC is not the same, either, since we will get it from the hook.
EnableInterrupts[];
};
Wait: PROC [cond: Condition, lock: MonitorLock ← NIL] = {
... waits for the condition variable to be the target of a Notify, Broadcast, or Timeout. It is assumed that the given monitor lock is owned, and must be released during the possible pause. The monitor lock must also be re-acquired after the Wait has finished.
self: Process = CurrentProcess[];
Place self on the condition variable, setting the timeout; we do this before giving up the monitor to avoid missing a Notify, which would be wrong; we cannot get a Notify between now and releasing the monitor lock, since one must possess the monitor lock in order to do a Notify.
Enqueue[@cond.queue, self];
Exit the monitor; after this the condition variable can be Notify'd by another process
IF lock # NIL THEN MonitorExit[lock];
Go to sleep, choosing the "best" guy to run next. If we get a Notify between after exiting the monitor and before going to sleep, the Notify will remove this process from the condition, yet will NOT place this process on the ready queue (because our state is not yet waitingCV).
Yield[waitingCV, NIL, cond.timeout];
Cleanup the case where we are still on the condition queue. DequeueSelf has no effect if the current process is not on a queue, so a race with some other process trying to dequeue this process is benign.
IF self.next # NIL THEN DequeueSelf[@cond.queue];
Reenter the monitor.
IF lock # NIL THEN MonitorEntry[lock];
};
WaitInterrupt: PROC [cond: InterruptCondition] = {
... waits for the interrupt condition variable to be the target of an interrupt request, or Timeout. Unlike Wait, there are no monitor locks involved (although a version of this XOP could be produced that did release a monitor lock). Also unlike Wait, we have to proceed immediately if interrupts have occurred since the last WaitInterrupt.
self: Process = CurrentProcess[];
DO
First, a quick kill for requests during the time that the interrupt condition was not held.
currentRequests: Word = cond.requests;
IF currentRequests # WordZero THEN {cond.requests ← WordZero; RETURN};
Enqueue on the interrupt condition with interrupts disabled to avoid a deadlock with interrupt servicing; note the assumption that interrupt condition variables are pinned.
InhibitInterrupts[];
Enqueue[@cond.queue, self];
EnableInterrupts[];
Now that the interrupt condition is held, see if there were interrupts since the last sample. If so, we don't really want to go to sleep, we want to service those requests.
IF cond.requests = WordZero THEN Yield[waitingICV, NIL, cond.timeout];
We dequeue at this point to allow another process to use the interrupt condition. DequeueSelf has no effect if the current process is not on the queue.
DequeueSelf[@cond.queue];
ENDLOOP;
};
Notify: PROC [cond: Condition, lock: MonitorLock ← NIL] = {
... notifies the condition variable that the condition has occurred.
best: Process = Dequeue[@cond.queue];
IF best # NIL THEN MakeReady[best];
There really was a process waiting on the CV. Therefore, it is free to run, but we don't reschedule immediately (leave it to MonitorExit).
};
Broadcast: PROC [cond: Condition, lock: MonitorLock ← NIL] = {
... notifies all waiting processes that the condition has occurred. To eliminate race problems, all of the waiting processes are made ready while holding onto the queue.
tail: Process ← cond.queue.chain;
IF tail # NIL THEN {
AcquireQueue[@cond.queue];
IF (tail ← cond.queue.chain) # NIL THEN {
cond.queue.chain ← NIL;
DO
Remove all of the processes from the circular queue and make them ready.
this: Process = tail.next;
next: Process ← this.next;
tail.next ← next;
this.queue ← NIL;
this.next ← NIL;
MakeReady[this];
IF this = next THEN EXIT; -- we just removed the last one
ENDLOOP;
};
ReleaseQueue[@cond.queue];
};
};
MonitorEntry: PROC [lock: MonitorLock] = {
... is performed at the entry to any monitored procedure to lock up the monitor for exclusive access. This operation is performed frequently, so it should be fast in the case where the monitor is not already owned.
self: Process = CurrentProcess[];
DO
Try for conditional store of current PSB index with 0 wait chain. If we succeed, then we are in, and can return.
IF CStoreProcess[ptr: @lock.owner, old: NIL, new: self] THEN RETURN;
Place self on the monitor queue.
Enqueue[@lock.queue, self];
If the monitor is still held, then go to sleep, yielding to best process, only to wake up when the monitor queue is OK. Otherwise, some process released the monitor while we were trying to Enqueue, so we should get off the monitor queue and try again.
IF lock.owner # NIL
THEN Yield[waitingML, NIL, WordZero]
ELSE DequeueSelf[@lock.queue];
ENDLOOP;
};
MonitorExit: PROC [lock: MonitorLock] = {
... frees up the monitor lock. If there is a process waiting the monitor, dequeue the process from the monitor queue and place it in the ready queue. The current coding does nothing if we do not own the lock. Should we die instead of doing nothing?
IF lock.owner = CurrentProcess[] THEN {
lock.owner ← NIL;
Give up ownership of the lock.
IF lock.queue.chain # NIL THEN {
Make the next process waiting for the monitor ready to go after it. There is a slight possibility that some other process will jump in ahead of this one while the monitor is free, but that's life.
next: Process = Dequeue[@lock.queue];
IF next # NIL THEN MakeReady[next];
};
IF deferredWakeups THEN RaiseReschedule[];
deferredWakeups = TRUE indicates that we deferred a desirable wakeup while processing a Broadcast or Notify for some monitor. Statistically speaking, it was probably for this monitor. So we raise reschedule here to get the best processes running.
};
};
Special external procedures
ShuffleReadyQueues: PROC = {
This procedure is called by the timeslicer service. It advances all processes waiting in the ready queues by one, and directs all undirected processes to switch to the best available process(es).
InhibitInterrupts[];
AcquireMetaLock[];
First, order each processor to switch to the best available process.
FOR processor: Processor ← tailProcessor.next, processor.next DO
SELECT processor.orders FROM
noChange => {
Force a switch to the best available process
processor.orders ← switchToGiven;
processor.switchTo ← NIL;
};
ENDCASE;
IF processor = tailProcessor THEN EXIT;
ENDLOOP;
Second, rotate each ready queue.
FOR priority: Priority DECREASING IN Priority DO
Rotate the ready queues for each priority.
tail: Process = readyQueueArray[priority];
IF tail # NIL THEN readyQueueArray[priority] ← tail.meta;
ENDLOOP;
RaiseReschedule[];
ReleaseMetaLock[];
EnableInterrupts[];
};
CausePanicStop: PROC = {
When we want to perform stack tracing, delicate debugging, or prepare for world-swap debugging, we use this routine to shut down the world. On entry, interrupts are assumed to be enabled and the metaLock is free. On exit, interrupts are disabled, the metaLock is held, and the panicStopLock is held.
self: Process = CurrentProcess[];
InhibitInterrupts[];
WHILE NOT CStoreProcess[ptr: @panicStopLock, old: NIL, new: self] DO
spin using local cache only to reduce bus contention
WHILE panicStopLock # NIL DO SaveProcessState[self, GetL[]]; ENDLOOP;
ENDLOOP;
SaveProcessState[self, GetL[]];
We save the process state to make sure that the calling process is stable, and to make sure that we have enough stack registers to complete processing.
DO
done: BOOLTRUE;
AcquireMetaLock[];
FOR processor: Processor ← tailProcessor.next, processor.next DO
IF processor # CurrentProcessor[] THEN
SELECT processor.orders FROM
noChange => {
Force a switch to the best available process
processor.orders ← panicStop;
processor.switchTo ← NIL;
};
stopped => done ← FALSE;
ENDCASE => done ← FALSE;
IF processor = tailProcessor THEN EXIT;
ENDLOOP;
IF done THEN EXIT;
ReleaseMetaLock[];
RaiseReschedule[];
ENDLOOP;
At this point all of the processors except the current processor have entered the stopped state, and are waiting for different orders to be given. We return to the caller with interrupts off and the metaLock held.
};
ReleasePanicStop: PROC = {
Call this procedure to release all processes from their stopped state. We assume that the metaLock is held and interrupts are off, and that the current process holds the panicStopLock. On exit, interrupts are off, and the current process has released the metaLock and the panicStopLock.
FOR processor: Processor ← tailProcessor.next, processor.next DO
SELECT processor.orders FROM
stopped => processor.orders ← noChange;
ENDCASE;
IF processor = tailProcessor THEN EXIT;
ENDLOOP;
At this point all of the other processors have been released from bondage. All we have to do now is release the panicStopLock and the metaLock, enable interrupts, and return.
panicStopLock ← NIL;
ReleaseMetaLock[];
EnableInterrupts[];
};
Internal procedures
ProcessSwitch: PROC [nextState: ProcessState, limitL: StackRegister, nextProcess: Process ← NIL] = INLINE {
... saves the state of the current process. If the process is not waiting, then it is placed on the ready queue. If nextProcess # NIL then it is the next process to run. Otherwise, the "best" ready process is chosen. During this sequence, we assume that critical traps are disabled. Note that if nextProcess # NIL, we must switch to it if it is ready, even if the current process is "better" (see comments in AcquireQueue to see why we require this).
self: Process = CurrentProcess[];
SaveProcessState[self, limitL];
AcquireMetaLock[]; -- don't allow changes to the ready queue by other processors
SELECT nextState FROM
running =>
GO TO bailOut;
ready =>
EnqueueReady[self];
waitingPage =>
EnqueuePageFault[self];
waitingML => {
Check to see if this process is trying to go to sleep on a monitor lock that has already been exited, which could happen if the MonitorExit occurred after this process got on the queue, but before the Yield. We have carefully arranged for this to be OK with Dequeue and friends.
IF self.next = NIL THEN GO TO bailOut;
self.state ← nextState};
waitingCV, waitingICV => {
Check to see if this process has already been removed from the condition queue, which could happen if the Notify occurred after this process got on the queue, but before the Yield. We have carefully arranged for this to be OK with Dequeue and friends.
IF self.next = NIL THEN GO TO bailOut;
IF Positive[self.when] THEN EnqueueMeta[@timeoutMetaQueue, self];
self.state ← nextState};
ENDCASE;
SELECT TRUE FROM
nextProcess = NIL OR nextProcess.state # ready => {
This is an undirected yield OR a directed yield to a process that is not ready for some peculiar (possibly transient) reason. So just choose the best process to run.
nextProcess ← PickBestProcess[];
IF nextProcess = self THEN GO TO bailOut};
nextProcess.state = waitingPage => {
This is a directed yield to a process that is page faulting, so we should also wait for that page to be ready. It does us no good to proceed until that process is ready.
self.when ← nextProcess.when;
EnqueuePageFault[self];
nextProcess ← PickBestProcess[]};
ENDCASE;
This is a directed yield to a ready process. So we just switch to that process.
At this point nextProcess # NIL. The current intention is to have as many idle processes as processors, and to make sure that the idle processes are always ready to run.
Now take the selected nextProcess from the ready queue and run it.
DequeueMeta[@readyQueueArray[nextProcess.priority], nextProcess];
nextProcess.state ← running;
CurrentProcessor[].running ← nextProcess;
SELECT self.priority FROM
nextProcess.priority => {};
lowestRunningPriority => {
The process we are switching away from has the same priority as lowestRunningPriority, so it is possible that the lowestRunningPriority needs to be recomputed.
new: Priority ← tailProcessor.running.priority;
FOR each: Processor ← tailProcessor.next, each.next WHILE each # tailProcessor DO
IF new > each.running.priority THEN new ← each.running.priority;
ENDLOOP;
lowestRunningPriority ← new;
};
ENDCASE;
Now release the metaLock and restore the new state.
ReleaseMetaLock[];
RestoreProcess[nextProcess];
EXITS bailOut => ReleaseMetaLock[];
We get here when we have NOT found a better process to run. Therefore there is no change necessary, and things get done more quickly.
};
MakeReady: PROC [process: Process] = {
... places the given process on the ready queue using the meta link, provided that the process was waiting for a queue. Also removes process from the timeout queue if it is there. On entry, we require that critical traps must NOT be disabled.
InhibitInterrupts[];
AcquireMetaLock[];
IF process.queue = NIL THEN
SELECT process.state FROM
waitingCV, waitingML, waitingICV => {
DequeueMeta[@timeoutMetaQueue, process];
remove from timeout queue if present
EnqueueReady[process];
place the process on the ready queue
IF process.priority > lowestRunningPriority THEN deferredWakeups ← TRUE;
The new process might be worth waking, but we don't do it here. We just remember that it was desirable.
};
ENDCASE;
ReleaseMetaLock[];
EnableInterrupts[];
};
Enqueue: PROC [q: Queue, p: Process] = INLINE {
... places the process on the queue in FIFO order. Priority processing is the responsibility of Dequeue.
AcquireQueue[q];
{tail: Process = q.chain;
IF tail = NIL
THEN p.next ← p
ELSE {p.next ← tail.next; tail.next ← p};
q.chain ← p;
p.queue ← q};
ReleaseQueue[q];
};
Dequeue: PROC [q: Queue] RETURNS [best: Process ← NIL] = INLINE {
... performs dequeue by process priority. We perform a linear search of every process in the queue to find the best. The best process in the queue is the first one we find with the maximum priority. Performing the priority testing in Dequeue is more expensive for the case where there are several things on the queue, but it allows us to change process priorities without altering the queue. If the process we find has also timed out, we have to handle the case elsewhere, but it is OK for Dequeue to dequeue such a process. Dequeue does not remove such a process from the timeout queue.
lag: Process ← q.chain;
IF lag = NIL THEN RETURN; -- quick kill for nothing in the queue
AcquireQueue[q];
lag ← q.chain; -- have to refetch, in case it changed
IF lag # NIL THEN {
tail: Process = lag;
best ← lag.next;
IF lag = best
THEN q.chain ← NIL
only process in queue, so clear the chain
ELSE {
head: Process ← best; -- loop invariant: lag.next = best
this: Process ← head;
DO
next: Process ← this.next;
IF next = head THEN EXIT; -- we have wrapped
IF next.priority > best.priority THEN {lag ← this; best ← next};
new candidate for the best process
this ← next;
ENDLOOP;
IF tail = best THEN q.chain ← lag;
don't let deleted one be the queue head
lag.next ← best.next;
};
best.queue ← NIL;
best.next ← NIL; -- the last change, due to test in ProcessSwitch
};
ReleaseQueue[q];
};
DequeueSelf: PROC [q: Queue] = {
... dequeues the current process from the given queue. This operation has no effect if the process was not in the queue.
self: Process = CurrentProcess[];
IF self.next # NIL THEN {
lag: Process ← q.chain;
IF lag # NIL THEN {
AcquireQueue[q];
IF (lag ← q.chain) # NIL THEN {
tail: Process = lag;
DO
p: Process = lag.next;
IF p = self THEN {
IF p = lag
THEN q.chain ← NIL
last one in queue, so show no chain
ELSE {
IF p = tail THEN q.chain ← lag;
don't let deleted one be the queue head
lag.next ← p.next;
};
self.queue ← NIL;
p.next ← NIL; -- the last change, due to test in ProcessSwitch
EXIT;
};
lag ← p;
IF lag = tail THEN EXIT;
ENDLOOP;
};
ReleaseQueue[q];
};
};
};
AcquireQueue: PROC [q: Queue] = INLINE {
... grabs ownership of the given queue. Performs Yield[ready, owner] until queue is owned. This directed Yield is necessary to overcome nasty interactions between spin locks and preemptive scheduling. To avoid premature Yield on multi-processor systems, we spin a short time waiting for the spin lock to clear before trying the more expensive Yield.
WHILE NOT CStoreProcess[ptr: @q.busy, old: NIL, new: CurrentProcess[]] DO
owner: Process ← q.busy;
THROUGH [0..spinTries) WHILE owner # NIL DO owner ← q.busy; ENDLOOP;
IF owner # NIL THEN Yield[ready, owner, WordZero];
Try to donate our cycles to the owner of the queue.
ENDLOOP;
};
ReleaseQueue: PROC [q: Queue] = INLINE {
... releases ownership of the queue. Assumes that the current process owns the queue.
q.busy ← NIL;
};
Internal procedures (interrupts disabled, must not hold metaLock)
AcquireMetaLock: PROC = INLINE {
... grabs ownership of all of the meta queues. On entry, we require that critical traps are disabled, of course. On exit, we hold the meta lock.
WHILE NOT CStoreProcess[ptr: @metaLock, old: NIL, new: CurrentProcess[]] DO
spin using local cache only to reduce bus contention
WHILE metaLock # NIL DO ENDLOOP;
ENDLOOP;
};
SaveProcessState: PROC [process: Process, limitL: StackRegister] = INLINE {
... saves frames and process info from the current processor for the given process. On entry, we require that critical traps are disabled. On exit, there are no frames in the IFU stack (so a return will provoke a stack underflow). Note that this procedure must be an INLINE to keep from putting another frame on the stack.
Further, the limitL argument is used to limit the stack saving. The IFU & EU stacks are only saved up to (but not including) the last frame that has limitL as its saved L value. This ensures that we return to the proper context when we wake up a process.
save stack frames
SaveStack[limitL];
save EU regs
FOR which: EUstateIndex IN EUstateIndex DO
process.euState[which] ← EUFetchSpecial[which];
ENDLOOP;
check for local stack overflow
IF NOT Positive[process.euState[FramesLeft]] THEN
cause a stack overflow fault for the caller
NotYetImplemented[];
};
RestoreProcess: PROC [process: Process] = INLINE {
... restores the given process to a running state. On entry, we require that critical traps are disabled, and process.state = running. Note that this procedure must be an INLINE to keep from putting another frame on the stack.
restore EU regs
FOR which: EUstateIndex IN EUstateIndex DO
EUStoreSpecial[which, process.euState[which]];
ENDLOOP;
};
Internal procedures (interrupts disabled, must hold metaLock)
ServiceRequests: PROC [req: RequestWord] = INLINE {
... services requests in the req word by tweaking the appropriate interrupt condition queues. This loop is not the fastest way to do this, but it is not worth improving unless we find out that it is taking a lot of time.
FOR reqx: RequestKind IN RequestKind DO
IF req[reqx] THEN {
who: Process;
cond: InterruptCondition = @requestQueueArray[reqx];
cond.requests ← ALL[TRUE];
who ← Dequeue[@cond.queue];
IF who # NIL THEN
SELECT who.state FROM
waitingCV, waitingML, waitingICV => {
DequeueMeta[@timeoutMetaQueue, who];
EnqueueReady[who];
};
ENDCASE;
req[reqx] ← FALSE;
IF req = NullRequestWord THEN EXIT;
};
ENDLOOP;
};
AssignProcessors: PROC = {
... sets processor orders based on whether the a ready process is "better" than any running processes; we run through the ready processes in highest to lowest order. At the end of the reassignment, if we have to change the orders of any processor other than our own then we raise reschedule to get the other processors to do their switching.
needReschedule: BOOLFALSE;
current: Processor = CurrentProcessor[];
deferredWakeups ← FALSE;
At this point we are about to reassign all of the processors that need it. So nothing is really deferred.
FOR priority: Priority DECREASING IN Priority WHILE priority > lowestRunningPriority DO
tail: Process = readyQueueArray[priority];
IF tail # NIL THEN {
FOR process: Process ← tail.next, process.next DO
best: Processor ← NIL;
FOR processor: Processor ← current, processor.next DO
SELECT processor.orders FROM
switchToGiven =>
If the process is already assigned, then try for another process.
IF processor.switchTo = process THEN EXIT;
noChange =>
This processor is available for assignment.
SELECT TRUE FROM
processor.running.priority >= process.priority => {};
This processor is already running something at least as good as the candidate process, so we should not reassign it.
best = NIL OR processor.running.priority < best.running.priority =>
This processor is the best we have seen so far to run the given process. It may not be the best overall.
best ← processor;
ENDCASE;
ENDCASE;
IF processor.next = current THEN {
There are no more processors to check.
IF best = NIL THEN GO TO done;
If no suitable processor, then we will not find a suitable assignment for any other process at the same or a lower priority, so terminate the search.
We have found a best choice for the given process.
best.switchTo ← process;
best.orders ← switchToGiven;
IF best # CurrentProcessor[] THEN needReschedule ← TRUE;
EXIT;
};
ENDLOOP;
IF process = tail THEN EXIT;
ENDLOOP;
EXITS done => EXIT;
};
ENDLOOP;
IF needReschedule THEN RaiseReschedule[];
};
PickBestProcess: PROC RETURNS [best: Process ← NIL] = INLINE {
... returns the best process in the ready queue and returns it. On entry, we require that critical traps are disabled and that the metaLock is held. No change is made to the process selected.
FOR priority: Priority DECREASING IN Priority DO
tail: Process = readyQueueArray[priority];
IF tail # NIL THEN RETURN [tail.next];
ENDLOOP;
};
ReleaseMetaLock: PROC = INLINE {
... releases ownership of the meta lock. On entry, we require that critical traps are disabled and that the metaLock is held. On exit, the metaLock is not held, although critical traps are still disabled.
metaLock ← NIL;
};
EnqueueReady: PROC [process: Process] = INLINE {
... places the given process on the ready queue using the meta link. On entry, we require that critical traps are disabled and that the metaLock is held. On exit, process.state = ready.
EnqueueMeta[@readyQueueArray[process.priority], process];
process.state ← ready;
};
EnqueuePageFault: PROC [process: Process] = INLINE {
... places the given process on the page fault queue using the meta link. On entry, we require that critical traps are disabled and that the metaLock is held. On exit, process.state = waitingPage.
EnqueueMeta[@pageFaultMetaQueue, process];
process.state ← waitingPage;
};
EnqueueMeta: PROC [metaQueue: MetaQueue, process: Process] = INLINE {
... places the given process on the given meta queue using the meta link. On entry, we require that critical traps are disabled and that the metaLock is held.
IF process.meta = NIL THEN {
tail: Process = metaQueue^;
IF tail = NIL
THEN process.meta ← process
ELSE {process.meta ← tail.meta; tail.meta ← process};
metaQueue^ ← process;
};
};
DequeueMeta: PROC [metaQueue: MetaQueue, process: Process] = INLINE {
... removes the given process from the given meta queue (timeout, ready, pageFault). On entry, we require that critical traps must be disabled and that the metaLock is held. The process state is not changed.
IF process.meta # NIL THEN {
remove from meta queue if it is there
tail: Process = metaQueue^;
IF tail # NIL THEN {
lag: Process ← tail;
DO
p: Process ← lag.meta;
IF p = process THEN {
IF lag = p
THEN metaQueue^ ← NIL
last one in queue, so show no chain
ELSE {
IF p = tail THEN metaQueue^ ← lag;
don't let deleted one be the queue head
lag.meta ← p.meta;
};
p.meta ← NIL;
EXIT;
};
lag ← p;
IF lag = tail THEN EXIT;
ENDLOOP;
};
};
};
Low-level primitives
InhibitInterrupts: PROC [] = {
... inhibits interrupts
NotYetImplemented[];
};
EnableInterrupts: PROC [] = {
... enables interrupts
NotYetImplemented[];
};
CStoreProcess: PROC [ptr: ProcessPtr, old,new: Process] RETURNS [stored: BOOL] = {
... atomically performs {IF (stored ← ptr^ = old) THEN ptr^ ← new}.
IF (stored ← ptr^ = old) THEN ptr^ ← new;
};
CStoreWord: PROC [ptr: WordPtr, old,new: Word] RETURNS [stored: BOOL] = {
... atomically performs {IF (stored ← ptr^ = old) THEN ptr^ ← new}.
IF (stored ← ptr^ = old) THEN ptr^ ← new;
};
CStoreReq: PROC [ptr: RequestWordPtr, old,new: RequestWord] RETURNS [stored: BOOL] = {
... atomically performs {IF (stored ← ptr^ = old) THEN ptr^ ← new}.
IF (stored ← ptr^ = old) THEN ptr^ ← new;
};
RaiseReschedule: PROC = {
... raises the Reschedule interrupt, which, if disabled, will occur when it becomes enabled again.
deferredWakeups ← FALSE;
NotYetImplemented[];
};
CurrentProcess: PROC RETURNS [Process ← NIL] = {
... returns the current process. This is likely to be an Aux register.
NotYetImplemented[];
};
CurrentProcessor: PROC RETURNS [Processor ← NIL] = {
... releases ownership of the queue. This is likely to be an Aux register.
NotYetImplemented[];
};
Positive: PROC [x: Word] RETURNS [BOOL] = {
... returns x > 0. Currently ignore differences between halfword order on Dragons and Dorados.
RETURN [LOOPHOLE[x, INT] > 0];
};
VanillaAdd: PROC [x,y: Word] RETURNS [Word ← WordZero] = {
... returns x+y, no checking for overflow, no change to carry.
NotYetImplemented[];
};
VanillaSub: PROC [x,y: Word] RETURNS [Word ← WordZero] = {
... returns x-y, no checking for overflow, no change to carry.
NotYetImplemented[];
};
EUFetchSpecial: PROC [which: EUstateIndex] RETURNS [Word ← WordZero] = {
... fetches a word from the given special EU register.
NotYetImplemented[];
};
EUStoreSpecial: PROC [which: EUstateIndex, x: Word] = {
... stores a word into the given special EU register.
NotYetImplemented[];
};
GetL: PROC RETURNS [StackRegister ← min] = {
... returns the current value of L for the caller. This must be called with traps disabled to avoid having L move around after it has been sampled.
NotYetImplemented[];
};
SaveStack: PROC [limitL: StackRegister] = INLINE {
... saves the stack (updates the current value of the hook EU register), returns the number of frames saved. Also updates the FramesLeft EU register. For excruciating details, see DragonStackSave.mesa.
NotYetImplemented[];
};
NotYetImplemented: PROC = {
... marks operations with missing implementation details.
NotYetImplemented[];
};
Initialization
Initialization must setup the process blocks, the processor blocks, and the nachos. There will be a spin lock for performing initialization, with the first processor to grab the lock performing most of the work. Remaining processors will simply allocate themselves a processor object and wait for the scheduler to perform its tricks. None of this seems very hard, but none of it is written yet, either.
Init: PROC = {
NotYetImplemented[];
};
END.
Differences Between PrincOps and DragOps
Note: this section requires knowledge of the PrincOps architecture. I freely admit that my knowledge of said architecture is spotty at best. This section is also incomplete.
Multiple processors
With current PrincOps machines atomicity is acheived through a combination of locks, inhibition of interrupts, priority levels, and non-interruptibility of non-faulting microcode routines. In the DragOps arhcitecture, only locks (based on memory bus arbitration) and inhibition of interrupts for a single processor (based on IFU inhibition of the Reschedule interrupt) can be used as sources of atomicity. This complicates life, you betcha.
Memory vs. time
For PrincOps, memory was assumed to be expensive, and cycles were assumed to be expendable if it saved memory. For example, a process block costs only 8 bytes when no state vector is used, and only costs 36 additional bytes when a state vector is used (the unusual case).
The DragOps, memory is assumed to be relatively cheap, and cycles should only be spent to avoid gross memory costs. For example, a process block costs at least 40 bytes in the normal case, and there is no exceptional case.
The ready queue
For PrincOps, the head of the ready queue is the currently running process. For DragOps, running processes are not on the ready queue at all. Instead, running processes are located through the current ring of processors.
The DragOps ready queue is special in that it is actually an array of metaqueues, one for each priority level. This bounds the search for the best ready process to a time proportional to the number of priority levels (small), yet also allows us to place processes on the ready queue in unit time.
Queues and meta queues
For PrincOps, all queues are circular queues of processes linked through a particular word in the PSB (Process State Block). Condition variables, monitor locks, and even the ready queue are all queues where the queue pointer points at the tail of the processor queue. For DragOps, there are Queues and MetaQueues. Although both are circular queues, and both point at the tail, the link words are different. A DragOps process can be queued at the same time on both a queue and a metaqueue.
DragOps queues are used by "normal" Mesa monitors and condition variables. MetaQueues are used for more specialized reasons, like the ready queue (waiting for processor), the pageFault queue (waiting for a page to be brought into memory), and the timeout queue (waiting for some time to expire). Queues need not be pinned, while MetaQueues must be pinned. The scheduler is permitted to move processes between MetaQueues, but is not permitted to touch queues. All of this is done in the name of simplicity, you betcha.
Monitor locks & condition variables
For PrincOps, monitor locks and condition variables are managed by microcode that has carefully been constructed to not take interrupts at awkward times.
For DragOps, managing monitor lock and condition queues is too complex to be performed in a single instruction, so each monitor lock and condition has a spin lock that must be acquired before the queue can be examined or modified. A potential problem with this scheme is that a higher-priority process may attempt to acquire the spin lock while a lower-priority process possess the lock. Therefore, each spin lock has the address of the owning process, so a process that fails to acquire the lock can yiled control directly to the owning process. Although this is technically a violation of preemptive scheduling, it is necessary to prevent livelock (where no progress can be made because of the higher-priority process spinning).
State vectors
For PrincOps, the eval stack and other cleanup information is contained in a state vector. Minimal stack restrictions on monitor entry and condition wait ensure that the eval stack is empty when such processes go to sleep, which tends to keep the number of state vectors low.
For DragOps, there are no state vectors. When a process is about to go to sleep, it saves the frames in the IFU and EU stacks to memory, saves the appropriate EU registers to the process block, and puts itself on the appropriate metaqueue (if any). The only problem is ensuring that there are sufficient nachos to accomodate the stack of the process. The current thinking is to have an emergency-level process put every other process to sleep when frames are about to run out. When that has been done, a world-swap debug can take place.
An improvement on this scheme can be made if there is an error generated for a particular process when it has used "too many" frames. Since this is typically due to a process executing unbounded recursion, we can tolerate at least a few runaways without having to world-swap.
In both cases, we have to have some mechanism for keeping track of how many frames are in use by a given process, and how many are left in the global pool of frames. Details are yet to be specified.
Costs
For PrincOps, a process switch is on the order of 5 procedure call times. A procedure call (and return) is on the order of 5 microseconds (on the Dorado).
For DragOps, a process switch is on the order of 100 procedure call times (very rough estimate). Partly, this is because procedure calls are about 10 times faster, and partly this is because there is more per process state to save and restore.
Timeout
For PrincOps, a timeout occurs when the clock ticks and some PSB that is awaiting timeout has its wait count go to 0. All PSBs have to be touched at every tick, although this is done via microcode.
For DragOps, at each tick every process in timeoutMetaQueue has its when field decremented. If that field reaches 0, the process is placed in the ready queue. It then becomes the responsibility of the process to remove itself from the appropriate condition variable. This makes page faults (if any) due to touching the condition variable occur in the "right" process. Only PSBs waiting for timeout have to be touched at every tick.
Things to do
This section is a list of things that still need some design work.
Fork / join
These need to be coded in greater detail.
Page faults
How will the fault handler work and place things to sleep waiting for page fault? Where do we get the address of the word touched by the faulting process?
Soft and hard emergency limits on frames
These need support in the scheduler, as well as in stack save/restore. Reaching the soft limit (per process) should raise an error. This is just an attempt to restrain runaway processes. Reaching the hard limit will cause system shutdown and a call to the world-swap debugger. This probably will only happen if a significant number of runaways occur.
Timeout mechanism
In current Cedar relatively few condition variables have timeouts (the default is not to have timeouts). The current design for the timeout queue is based on the assumption that this situation will continue on Dragon. The current design has O(n) insertion and deletion of entries from the queue, and O(1) testing, where n is the number of processes waiting for timeout. A heapsort-style priority queue has O(n log n) insertion and deletion, and O(1) testing, but it also has more complicated code, and needs an auxilliary pinned data structure (1 word / process). The current simple design will be used until it becomes a bottleneck.
Should the timeout "process" just be part of the scheduler? How will elapsed time be determined, and to what accuracy? Just for the record, there are three clocks of current interest to Cedar:
1: Time of day clock. This need not have resolution of better than a second, and it need not be locally stable, but it should have very good long-term accuracy.
2: Timeout clock. This need not have great accuracy, but it must generate interrupts, and it should have resolution of 10-100 milliseconds. More resolution is OK provided that interrupts don't occur more often than every 10 milliseconds.
3: Elapsed time clock. This need not have great long-term accuracy, but it should have very good short-term accuracy, have roughly microsecond resolution, and it must be quite cheap to read. This clock can easily synthesize the Timeout clock.
Stack trace
SafeStorage needs a way to trace out the process stacks atomically. That is, all possible references in the process stacks must be discovered without any reference containing processes making headway. Dump frames are already pinned, and the panicStop mechanism can be used to stop all processes except for the current process (we may also want to have an stopRC command that would stop all processes that had not declared themselves to be non-RC). We can allocate frame extensions out of a special pool of memory that allows scanning. Possibly we should also make this pool pinned?
Reschedule & page fault
Setting L at procedure essentially transfers registers from the called frame to the calling frame, as far as stack save and restore are concerned. It will only work properly if both the calling frame and the called frame are in the EU (and IFU) registers. If the various compilers (and assembly language coders) guarantee that no calls are made before setting L, then there are only two sources of having the calling frame migrated into memory - reschedule & page fault. For these traps, if the youngest frame below the trap is P, and the next youngest is Q, and Q has zero registers, then both P and Q have to be restored before returning from the trap.