[Pixel]<DragonWare0.1>MesaRuntime>ProcessImpl.mesa!2

ProcessImpl.mesa

Russ Atkinson (RRA) October 9, 1986 4:57:12 am PDT

About processes:

1. All processes execute in a single address space, although we will want to extend our model to multiple address spaces some day (Dragon has address space switching support).

2. Processes are (relatively) lightweight. Most of the process switch time is devoted to saving and restoring registers. We estimate 100-300 msecs per process switch, depending on the amount of state.

3. Processes are dynamically assigned to processors by the scheduler, so that processes can be considered virtual processors. We think of the scheduler as executing in the cracks between processes.

4. The scheduler attempts to run higher priority processes in preference to lower priority processes. However, there is no guarantee that a higher priority process will preempt a lower priority process, even for single processor systems.

About locks:

1. Primitive spin locks (PSLs) are the most primitive locks. PSLs are built out of CStore, are only acquired with reschedule off, and must be pinned. A single processor system should never wait for PSLs, obviously. In a multi-processor system PSLs must never be held long, or they become a bottleneck. The most important (and currently only) PSL is the metaLock, which controls access to the global process state information (especially the meta queues).

2. General spin locks (GSLs) use CStore and directed Yield. If CStore cannot acquire the lock, then the process attempting to acquire the lock donates its cycles to the process holding the lock using directed Yield. If the process holding the lock cannot be run, directed Yield acts like undirected Yield. GSLs can be non-pinned.

3. General queues (GQs) use a GSL to serialize ownership of the queue, and a circular queue of processes to hold the processes that are waiting. See Enqueue and Dequeue for details.

4. Monitor locks (MLs) are built from a GQ and an ownership word, which is used to denote the process owning the lock. See MonitorEntry and MonitorExit for details.

5. Condition variables (CVs) do not assume that a monitor is held, although one is always present. CVs also support timeouts. CVs are built from a GQ and a timeout word that holds the initial value of the timeout period. A flags word contains a bit that can enable aborts to affect the CV.

6. Interrupt condition variables (ICVs) are the mechanism for turning device signals and interrupts into ICV notifications. ICVs do not assume that a monitor is held, since none is ever present. ICVs also support timeouts. ICVs are built from a CV and a request bit for interrupt requests so we do not lose them while not waiting for the ICV. ICVs must be pinned, to allow the scheduler to remove requests from them.

Carl Hauser, March 13, 1987 3:48:54 pm PST

DIRECTORY

DragOps,

Process;

ProcessImpl: PROGRAM

EXPORTS Process

= BEGIN

OPEN Process;

Types

Basic

WordSeq: TYPE = LONG POINTER TO WordSeqRep;

WordSeqRep: TYPE = RECORD [SEQUENCE COMPUTED CARDINAL OF Word];

Word: TYPE = DragOps.Word;

WordZero: Word ← DragOps.ZerosWord;

WordPtr: TYPE = LONG POINTER TO Word;

StackRegister: TYPE = MACHINE DEPENDENT {min (0), max (127)};

A StackRegister is an index into the EU stack registers.

AbortState: TYPE = MACHINE DEPENDENT {

none (0), -- process not requested to abort

requested (1), -- process abort requested

inhibited (2), -- process abort not allowed & not requested

delayed (3) -- process abort requested but not allowed

};

Nachos

Nachos are used to save EU & IFU registers associated with local frames. A chain of nachos is used to represent the process call stack. Nachos are fixed size to simplify and speed up allocation. Nachos are pinned to allow us to put a process to sleep without taking page faults.

Nacho: TYPE = LONG POINTER TO NachoRep;

NachoRep: TYPE = RECORD [

link: Nacho, -- link to the next elder frame in the process stack

nextPC: Word, -- the continuation PC for the frame

nRegs: Word, -- the # of registers saved in this Nacho

others: Nacho, -- the link to the area for more saved registers

regs: RegArray -- the saved registers (local variables)

];

RegArray: TYPE = ARRAY Reg OF Word;

Reg: TYPE = [0..15];

Global State

Aux registers

CurrentProcess: Process ← NIL;

This aux register holds the current process block address. Outside of regions holding the metaLock it must be consistent with CurrentProcessor.

CurrentProcessor: Processor ← NIL;

This aux register holds the current processor block address. Outside of regions holding the metaLock it must be consistent with CurrentProcess.

Global frame variables

runawayLimit: Word ← runawayLimit;

... arbitrary limit on stack depth to guard against runaway processes. This limit could also be kept on a per-process basis at the cost of extra space. It is left as an unspecified constant.

emergencyLimit: Word ← emergencyLimit;

... arbitrary limit on remaining stack frames to ensure that enough is left to save the state information for all running processes. It is left as an unspecified constant.

metaLock: Process ← NIL;

the spin lock for meta queues; NIL means not busy; any process that desires to examine or modify any meta queue must acquire this lock and spin until it is free; the possessor of this lock must have traps disabled, and must not page fault

schedulerLock: Process ← NIL;

the lock for the scheduler; NIL means not busy; the possessor of this lock must have interrupts disabled, and must not page fault; the possessor of this lock has the sole responsibility to handle interrupts and to reschedule other processors; possessing this lock does not permit using a meta queue without acquiring the metaLock as well

readyQueueArray: ARRAY Priority OF Process;

All ready processes are queued in this array of metaqueues.

pageFaultMetaQueue: MetaQueueRep;

All processes waiting page faults are queued in this metaqueue.

pageFaultCond: Condition;

All processes waiting to handle page faults are queued on this interrupt condition.

timeoutMetaQueue: MetaQueueRep;

All processes waiting timeouts are queued in this metaqueue.

requestQueueArray: ARRAY RequestKind OF ConditionRep;

External interrupts cause notifies through these conditions.

tailProcessor: Processor ← NIL;

Processors are arranged in a ring. This variable refers to the tail of the ring.

tailFreeProcess: Process ← NIL;

Free processes are arranged in a ring. This variable refers to the tail of the ring. This variable is protected by the metaLock.

panicStopLock: Process ← NIL;

Possessing this lock gives the current processor the exclusive right to come to a panic stop.

spinTries: NAT ← 10;

This is the number of times to spin-test in MonitorEntry when the monitor lock is held by another running processor. The "correct" value will de determined by experimentation.

lowestRunningPriority: Priority ← Priority.FIRST;

This is the priority of the lowest priority running process. It is used in the scheduler to accelerate testing for handling of reschedule interrupts. This value should only be used as a hint unless one is in possession of the metaLock.

highestReadyPriority: Priority ← Priority.FIRST;

This is the priority of the highest priority process on the ready queue. It is used as an accelerator for various cases.

deferredWakeups: BOOL ← FALSE;

TRUE if there have been calls to MakeReady since the last call to the scheduler or RaiseReschedule AND the priority of the process to make ready is likely to be worth running immediately. This is a statistical hack to reduce the number of wakeups of processes that have been Notified before the monitor lock has been released.

It may be worth keeping this flag on a finer basis than globally, or by having some other hack entirely.

External procedures

Fork: PUBLIC PROC
[proc: PROC ANY RETURNS ANY, argPtr: WordSeq, nArgs: INT, nRets: INT]
RETURNS [new: Process ← NIL] = {

... forks a new process, which will initially call the given procedure using the given arguments. The number of words of arguments and returns is also necessary.

In the calling process:

F1: allocate a new process. Return NIL if we can't do this. (Error??)

-- CHH: PrincOps ProcessImpl.Fork waits until a process is available. Do we really want to change the semantics? March 12, 1987

F2: newFX ← Allocate a FX of (IF nRets > maxStackRets THEN nRets ELSE 0)+(IF nArgs > maxStackArgs THEN nArgs ELSE 0) words.

F3: copy the arguments from argPtr^ for nArgs words to newFX

(the original argument storage belongs to the caller)

F4: setup the process block & make it ready to run Forkee, having set things up so that Forkee's arguments will be correct. Forkee[proc, newFX, nArgs, nRets]

NotYetImplemented[];

};

Forkee: PROC [proc: PROC ANY RETURNS ANY, argPtr: WordSeq, nArgs: INT, nRets: INT] RETURNS [--doesn't--] ~ {

retRecord: WordSeq;

FF0: IF nRets > maxStackRets THEN

{ retRecord ← argPtr^;

argPtr^ ← argPtr + nArgs;

};

FF1: Depending on nArgs, either push argPtr or push nArgs from argPtr^;

FF2: Push appropriate descriptor information for proc

FF3: Call proc using an inline SFCI instruction; it eventually will return or not as it sees fit.

JJ1: proc's RETURN record may either be in the FX pointed to by argPtr at offset nArgs or on the stack.

JJ2: wait for JOIN or DETACH by another process.

JJ3: IF JOINed

THEN {

Copy return values in stack or FX to a Nacho;

Give the Nacho to the other process

}

JJ4: FrameAllocator.Free[argPtr];

JJ5: place the process in the free list (state ← free), free any remaining resources (possibly a FX) and quietly disappear

NotYetImplemented[];

};

Join: PUBLIC PROC [process: Process, retPtr: WordSeq] = {

... joins the specified process, which waits for the process to complete, then copies the return values to the address given by retPtr.

In the calling process:

J1: wait for process to be in the done state (wait on condition?)

J2: copy the return values to retPtr^

J3: place the process in the free list (state ← free)

J4: return to the caller

NotYetImplemented[];

};

DirectedYield: PUBLIC SAFE PROC
[nextState: ProcessState, nextProcess: Process, when: Ticks] = TRUSTED {

... yields the processor to the nextProcess, placing the current process in nextState. If nextProcess = NIL or is not ready, then the executing processor is yielded to the "best" ready process. The current process will be placed in the nextState. DirectedYield inhibits critical traps, and performs a ProcessSwitch. The when word is used for passing through timeout information. This routine must be called only when critical traps are enabled.

DisableInterrupts[];

CurrentProcess.when ← LOOPHOLE[when];

MarkFrame[];

Indicates that the current frame should not be saved.

ProcessSwitch[nextState, nextProcess];

At this point, the processor may not be running the same process as it was at the start of this routine. That is OK, because the return PC is not the same, either, since we will get it from the hook.

EnableInterrupts[];

};

Wait: PUBLIC PROC [cond: Condition, lock: MonitorLock] = {

... waits for the condition variable cond to be the target of a Notify, Broadcast, or timeout. If lock # NIL, then lock must be released during the possible pause, and lock must be re-acquired after the Wait has finished.

Enqueue[@cond.queue];

Place self on the condition variable; we do this before giving up the monitor to avoid missing a Notify, which would be wrong; we cannot get a Notify between now and releasing the monitor lock, since one must possess the monitor lock in order to do a Notify.

IF lock # NIL THEN MonitorExit[lock];

Exit the monitor; after this the condition variable can be Notify'd by another process

IF CurrentProcess.abort # requested OR NOT cond.flags[abortEnableBit] THEN

Go to sleep, choosing the "best" guy to run next. If we get a Notify between after exiting the monitor and before going to sleep, the Notify will remove this process from the condition, yet will NOT place this process on the ready queue (because our state is not yet waitingCV).

DirectedYield[waitingCV, NIL, cond.timeout];

DequeueSelf[@cond.queue];

If the current process is still on the condition queue then it must remove itself. DequeueSelf has no effect if the current process is not on a queue, so a race with some other process trying to dequeue this process is benign.

IF lock # NIL THEN MonitorEntry[lock];

Reenter the monitor.

IF CurrentProcess.abort = requested AND cond.flags[abortEnableBit] THEN TestAbort[];

Final test for abort request while we were waiting on the CV.

};

WaitInterrupt: PUBLIC PROC [cond: Condition] = {

... waits for cond to be the target of an interrupt request, or timeout. Unlike Wait, there are no monitor locks involved, and we have to proceed immediately if interrupts have occurred since the last WaitInterrupt.

First, a quick test for requests during the time that the interrupt condition was not held.

currentFlags: CondFlags ← cond.flags;

IF currentFlags[condRequestBit] THEN {

currentFlags[condRequestBit] ← FALSE;

cond.flags ← currentFlags;

EXIT;

};

Enqueue on cond with interrupts disabled to avoid a deadlock with interrupt servicing; note the assumption that interrupt condition variables are pinned.

DisableInterrupts[];

Enqueue[@cond.queue];

EnableInterrupts[];

Now that cond is held, see if there were interrupts since the last sample. If so, we don't really want to go to sleep, we want to service those requests.

IF NOT cond.flags[condRequestBit] THEN

IF CurrentProcess.abort # requested OR NOT cond.flags[abortEnableBit] THEN

DirectedYield[waitingICV, NIL, cond.timeout];

We dequeue at this point to allow another process to use cond. DequeueSelf has no effect if the current process is not on the queue.

DequeueSelf[@cond.queue];

ENDLOOP;

IF CurrentProcess.abort = requested AND cond.flags[abortEnableBit] THEN TestAbort[];

};

Notify: PUBLIC PROC [cond: Condition] = {

... notifies the "best" process waiting on cond that the condition has occurred. The protecting monitor lock is presumed to be held, but is not needed for this primitive.

best: Process = Dequeue[@cond.queue];

IF best # NIL THEN MakeReady[best];

There really was a process waiting on the CV. Although a process may be made ready, the process switch is not immediately required. The process switch may occur when the protecting monitor lock is released.

};

Broadcast: PUBLIC PROC [cond: Condition] = {

... notifies all waiting processes waiting on cond that the condition has occurred. To eliminate race problems, all of the waiting processes are made ready while holding onto the queue. The protecting monitor lock is presumed to be held, but is not needed for this primitive.

tail: Process ← cond.queue.chain;

IF tail # NIL THEN {

AcquireQueue[@cond.queue];

IF (tail ← cond.queue.chain) # NIL THEN {

cond.queue.chain ← NIL;

Remove all of the processes from the circular queue and make them ready. Although processes may be made ready, the process switch is not immediately required. The process switch may occur when the protecting monitor lock is released.

this: Process = tail.next;

next: Process ← this.next;

tail.next ← next;

this.queue ← NIL;

this.next ← NIL;

MakeReady[this];

IF this = next THEN EXIT; -- we just removed the last one

ENDLOOP;

};

ReleaseQueue[@cond.queue];

};

MonitorEntry: PUBLIC PROC [lock: MonitorLock] = {

... is performed at the entry to any monitored procedure to lock up the monitor for exclusive access. This operation is performed frequently, so it should be fast in the case where the monitor is not already owned.

self: Process = CurrentProcess;

Try for conditional store of current PSB index with 0 wait chain. If we succeed, then we are in, and can return.

IF CStoreProcess[ptr: @lock.owner] THEN RETURN;

Place self on the monitor queue.

Enqueue[@lock.queue];

If the monitor is still held, then go to sleep, yielding to best process, only to wake up when the monitor queue is OK. Otherwise, some process released the monitor while we were trying to Enqueue, so we should get off the monitor queue and try again.

{

owner: Process = lock.owner;

IF owner # NIL THEN {

THROUGH [0..spinTries) WHILE owner.state = running DO ENDLOOP;

IF lock.owner # NIL THEN DirectedYield[waitingML, lock.owner, 0];

};

DequeueSelf[@lock.queue];

};

ENDLOOP;

};

MonitorExit: PUBLIC PROC [lock: MonitorLock] = {

... frees up the monitor lock. If there is a process waiting the monitor, dequeue the process from the monitor queue and place it in the ready queue. The current coding does nothing if we do not own the lock. Should we die instead of doing nothing?

IF lock.owner = CurrentProcess THEN {

lock.owner ← NIL;

Give up ownership of the lock.

IF lock.queue.chain # NIL THEN {

Make ready the next process waiting for the monitor. There is a slight possibility that some other process will jump in ahead of this one while the monitor is free, but fairness is not guaranteed.

next: Process = Dequeue[@lock.queue];

IF next # NIL THEN MakeReady[next];

};

IF deferredWakeups THEN RaiseReschedule[];

deferredWakeups = TRUE indicates that we deferred a desirable wakeup while processing a Broadcast or Notify for some monitor. Statistically speaking, it was probably for this monitor. So we raise reschedule here to get the best processes running.

};

TestAbort: PUBLIC PROC = {

... tests for a requested abort.

self: Process = CurrentProcess;

IF self.abort = requested THEN {self.abort ← none; ERROR ABORTED};

};

RequestAbort: PUBLIC PROC [process: Process] = {

... requests the given process to abort itself.

SELECT process.abort FROM

none => {

process.abort ← requested;

MakeReady[process];

RaiseReschedule[];

};

inhibited =>

process.abort ← delayed;

ENDCASE;

};

Reschedule Interrupt

RescheduleTrap: PROC = {

... is called whenever the reschedule interrupt is enabled and either the reschedule line is asserted OR reschedule traps become enabled and there is a deferred reschedule pending. In either case, we start up with critical traps inhibited, and a special return (hand-coded in the RescheduleTrap vector) enables critical traps on exit.

RescheduleTrap is primarily called in response to two causes: device interrupts and explicit condition variable wakeups. Device interrupts are turned into condition variable wakeups in this routine, so the appearance outside of this routine is that we only have condition variable wakeups. Statistically we only get enter RescheduleTrap if there is a ready process with priority >= lowestRunningPriority, and we try to give the scheduler to a processor with priority = lowestRunningPriority. The major exception to this is time-slicing, which should be kept to a modest rate (the current system uses about 10 per second).

entryPriority: Priority ← CurrentProcess.priority;

First, try to grab ownership of the scheduler (if at lowest priority)

IF entryPriority.ORD <= lowestRunningPriority.ORD AND schedulerLock = NIL THEN

This processor is one of the best to try the scheduler, and the scheduler is free.

IF CStoreProcess[ptr: @schedulerLock] THEN {

At this point we have acquired the scheduler.

currentReq: RequestWord ← ReadAndClearDeviceRequests[];

sample & clear request word (currently protected by the schedulerLock)

AcquireMetaLock[];

don't allow changes to the ready queue by other processors

IF currentReq # NullRequestWord THEN

Service requests in the currentReq word by tweaking the appropriate interrupt condition queues. This loop is not the fastest way to do this, but it is not worth improving unless we find out that it is taking a lot of time.

FOR reqx: RequestKind IN RequestKind DO

IF currentReq[reqx] THEN {

cond: Condition = @requestQueueArray[reqx];

DelicateNotify[cond];

currentReq[reqx] ← FALSE;

IF currentReq = NullRequestWord THEN EXIT;

};

ENDLOOP;

IF highestReadyPriority.ORD > lowestRunningPriority.ORD THEN

There are ready processes that are better than at least some processor, so go do the bulk reassignment.

AssignProcessors[];

release locks (in the given order)

schedulerLock ← NIL;

ReleaseMetaLock[];

};

Normal exit, just follow orders.

SELECT CurrentProcessor.orders FROM

noChange => EXIT;

If no orders to follow, then exit the inner loop.

switchToGiven => {

The executing processor has been asked to either switch to a particular process or to switch to the "best" available process.

nextProcess: Process ← CurrentProcessor.switchTo;

CurrentProcessor.orders ← noChange;

Reset the orders, so the executing processor is available for more orders
(like during the call to ProcessSwitch).

IF (nextProcess # NIL AND nextProcess.state = ready)
OR CurrentProcess.priority <= highestReadyPriority THEN {

It is likely that a switch is necessary. Otherwise keep running the current process.

MarkFrame[];

Indicates that only frames older than the current frame are to be saved.

ProcessSwitch[ready, nextProcess];

};

panicStop => {

This state occurs whenever we need to put the world to sleep in a hurry.

self: Process = CurrentProcess;

MarkFrame[];

Indicates that only frames older than the current frame are to be saved.

SaveProcessState[self];

Save the state so it can be examined/modified.

CurrentProcessor.orders ← stopped;

show that we really are stopped

RestoreProcess[self];

Just in case some piece of state was modified while we were stopped.

};

ENDCASE;

ENDLOOP;

Test for residual unserviced interrupts

IF schedulerLock # NIL THEN EXIT;

If some process already has the scheduler lock then we need not persist.

IF DeviceRequestsPending[] THEN EXIT;

If no device requests, then there is no point in oing back for another try at the scheduler.

IF entryPriority.ORD > lowestRunningPriority.ORD THEN

entryPriority ← entryPriority.PRED;

By this point we have given a lowestRunningPriority processor the first chance to get in and process the interrupt. However, lowestRunningPriority must be treated as a hint, not as the truth. So if there are still device requests pending and no one has the scheduler, we go back to try to service those requests using the current processor. We lower entryPriority to try to force an entry into the scheduler.

ENDLOOP;

};

Special external procedures

ShuffleReadyQueues: PUBLIC PROC = {

... is called by the timeslicer service. It advances all processes waiting in the ready queues by one, and directs all undirected processes to switch to the best available process(es).

DisableInterrupts[];

AcquireMetaLock[];

Just order each processor to switch to the best available process.

FOR processor: Processor ← tailProcessor.next, processor.next DO

SELECT processor.orders FROM

noChange => {

Force a switch to the best available process

processor.orders ← switchToGiven;

processor.switchTo ← NIL;

};

ENDCASE;

IF processor = tailProcessor THEN EXIT;

ENDLOOP;

Last, raise the reschedule interrupt to make all processors choose the best available. Even the executing processor will eventually respond.

RaiseReschedule[];

ReleaseMetaLock[];

EnableInterrupts[];

};

CausePanicStop: PUBLIC PROC = {

... causes all processors except the current one to come to a screeching halt with properly saved state. When we want to perform stack tracing, delicate debugging, or prepare for world-swap debugging, we use this routine to shut down the world. On entry, interrupts are assumed to be enabled and the metaLock is free. On exit, interrupts are disabled, the metaLock is held, and the panicStopLock is held.

self: Process = CurrentProcess;

DisableInterrupts[];

Inhibit interrupts while we try to acquire the panicStopLock.

IF CStoreProcess[ptr: @panicStopLock] THEN EXIT;

If panicStopLock is acquired, exit with interrupts disabled.

EnableInterrupts[];

Allow interrupts to occur while we are waiting. This allows the current processor to panic stop if it is requested to do so.

WHILE panicStopLock # NIL DO

Spin without using CStore.

ENDLOOP;

MarkFrame[];

Indicates that only frames older than the current frame are to be saved.

SaveProcessState[self];

We save the process state to make sure that the calling process is stable, and to make sure that we have enough stack registers to complete processing. We must have interrupts disabled to call this routine, but we don't need the metaLock.

current: Processor ← CurrentProcessor;

done: BOOL ← TRUE;

AcquireMetaLock[];

FOR processor: Processor ← current.next, processor.next WHILE processor # current DO

SELECT processor.orders FROM

noChange => {

Force a switch to the best available process

processor.orders ← panicStop;

processor.switchTo ← NIL;

done ← FALSE;

};

stopped => {};

ENDCASE => done ← FALSE;

ENDLOOP;

IF done THEN EXIT;

ReleaseMetaLock[];

RaiseReschedule[];

ENDLOOP;

At this point all of the processors except the current processor have entered the stopped state, and are waiting for different orders to be given. We return to the caller with interrupts off and the metaLock held.

};

ReleasePanicStop: PUBLIC PROC = {

... allows the stopped processors to resume with normal processing. Call this procedure to release all processes from their stopped state. We assume that the metaLock is held and interrupts are off, and that the current process holds the panicStopLock. On exit, interrupts are off, and the current process has released the metaLock and the panicStopLock.

FOR processor: Processor ← tailProcessor.next, processor.next DO

SELECT processor.orders FROM

stopped => processor.orders ← noChange;

ENDCASE;

IF processor = tailProcessor THEN EXIT;

ENDLOOP;

At this point all of the other processors have been released from bondage. All we have to do now is release the panicStopLock and the metaLock, enable interrupts, and return.

panicStopLock ← NIL;

ReleaseMetaLock[];

EnableInterrupts[];

};

SatisfyPageFault: PUBLIC PROC [startPage, endPage: INT] = {

... for each process P in the page fault with startPage d P.when d endPage, make the process ready.

DisableInterrupts[];

AcquireMetaLock[];

{

tail: Process ← pageFaultMetaQueue;

p: Process ← tail;

WHILE p # NIL DO

next: Process ← p.meta;

SELECT p.state FROM

waitingPage => IF WordToInt[p.when] IN [startPage..endPage] THEN {

DequeueMeta[@pageFaultMetaQueue, p];

EnqueueReady[p];

place the process on the ready queue (also maintains highestReadyPriority)

IF p.priority > lowestRunningPriority THEN deferredWakeups ← TRUE;

The new process might be worth waking, but we don't do it here. We just remember that it was desirable.

};

ENDCASE;

p ← next;

IF p = tail THEN EXIT;

ENDLOOP;

};

ReleaseMetaLock[];

EnableInterrupts[];

IF deferredWakeups THEN RaiseReschedule[];

};

Internal procedures

ProcessSwitch: PROC [nextState: ProcessState, nextProcess: Process] = {

... saves the state of the current process. If the process is not waiting, then it is placed on the ready queue. If nextProcess # NIL then it is the next process to run. Otherwise, the "best" ready process is chosen. During this sequence, we assume that critical traps are disabled. Note that if nextProcess # NIL, we must switch to it if it is ready, even if the current process is "better" (see comments in AcquireQueue to see why we require this).

self: Process = CurrentProcess;

SaveProcessState[self];

This is expensive enough to do outside of the region that holds the metaLock.

AcquireMetaLock[];

From now on, don't allow changes to the ready queue by other processors.

SELECT nextState FROM

waitingML => {

Check to see if this process is trying to go to sleep on a monitor lock that has already been exited, which could happen if the MonitorExit occurred after this process got on the queue, but before the Yield. We have carefully arranged for this to be OK with Dequeue and friends.

IF self.next = NIL THEN GO TO bailOut;

self.state ← nextState;

};

waitingCV, waitingICV => {

Check to see if this process has already been removed from the condition queue, which could happen if the Notify occurred after this process got on the queue, but before the Yield. We have carefully arranged for this to be OK with Dequeue and friends.

IF self.next = NIL THEN GO TO bailOut;

IF WordToInt[self.when] > 0 THEN EnqueueMeta[@timeoutMetaQueue, self];

self.state ← nextState;

};

waitingPage => {

A page fault has occured, so we need to put the offending process to sleep on the page fault metaQueue, then notify the page fault handling process.

EnqueuePageFault[self];

DelicateNotify[pageFaultCond];

};

ENDCASE =>

EnqueueReady[self];

IF nextProcess = NIL OR nextProcess.state # ready THEN {

This is an undirected yield OR a directed yield to a process that is not ready for some peculiar (possibly transient) reason. So just choose the best process to run.

IF nextProcess # NIL AND nextProcess.state = waitingPage AND self.meta = NIL THEN {

This is a directed yield to a process that is page faulting, so we should also wait for that page to be ready. It does us no good to proceed until that process is ready.

self.when ← nextProcess.when;

EnqueuePageFault[self];

};

nextProcess ← NIL;

Now choose the best process to run.

FOR priority: Priority DECREASING IN [Priority.FIRST..highestReadyPriority] DO

tail: Process = readyQueueArray[priority];

IF tail # NIL THEN {nextProcess ← tail.next; EXIT};

ENDLOOP;

};

At this point nextProcess # NIL AND nextProcess.state = ready. The current intention is to have as many idle processes as processors, and to make sure that the idle processes are always ready to run.

DequeueMeta[@readyQueueArray[nextProcess.priority], nextProcess];

Note: this is currently the ONLY place that a process is removed from a ready queue!

Sigh, we need to recompute the highestReadyPriority.

pri: Priority ← highestReadyPriority;

IF pri = Priority.FIRST OR readyQueueArray[pri] # NIL THEN EXIT;

highestReadyPriority ← pri.PRED;

ENDLOOP;

Now make nextProcess run on the executing processor.

nextProcess.state ← running;

CurrentProcessor.running ← CurrentProcess ← nextProcess;

SELECT self.priority FROM

nextProcess.priority => {};

lowestRunningPriority => {

The process we are switching away from has the same priority as lowestRunningPriority, so it is possible that the lowestRunningPriority needs to be recomputed.

new: Priority ← tailProcessor.running.priority;

FOR each: Processor ← tailProcessor.next, each.next WHILE each # tailProcessor DO

IF new > each.running.priority THEN new ← each.running.priority;

ENDLOOP;

lowestRunningPriority ← new;

};

ENDCASE;

Now release the metaLock and restore the new state.

ReleaseMetaLock[];

IF nextProcess # self THEN RestoreProcess[nextProcess];

Note that if we are the same process that no state needs to be restored, since nothing should have changed!

EXITS bailOut => ReleaseMetaLock[];

We get here when we have NOT found a better process to run. Therefore there is no change necessary, and things get done more quickly.

};

MakeReady: PROC [process: Process] = {

... places the given process on the ready queue using the meta link, provided that the process was waiting for a queue. Also removes process from the timeout queue if it is there. On entry, we require that critical traps must NOT be disabled.

DisableInterrupts[];

AcquireMetaLock[];

IF process.queue # NIL THEN

SELECT process.state FROM

waitingCV, waitingML, waitingICV => {

priority: Priority = process.priority;

DequeueMeta[@timeoutMetaQueue, process];

remove from timeout queue if present

EnqueueReady[process];

place the process on the ready queue (also maintains highestReadyPriority)

IF priority > lowestRunningPriority THEN deferredWakeups ← TRUE;

The new process might be worth waking, but we don't do it here. We just remember that it was desirable.

};

ENDCASE;

ReleaseMetaLock[];

EnableInterrupts[];

};

Enqueue: PROC [q: Queue, p: Process ← CurrentProcess] = {

... places the process on the queue in FIFO order. Priority processing is the responsibility of Dequeue.

AcquireQueue[q];

{

tail: Process = q.chain;

IF tail = NIL

THEN p.next ← p

ELSE {p.next ← tail.next; tail.next ← p};

q.chain ← p;

p.queue ← q;

};

ReleaseQueue[q];

};

Dequeue: PROC [q: Queue] RETURNS [best: Process ← NIL] = {

... performs dequeue by process priority. We perform a linear search of every process in the queue to find the best. The best process in the queue is the first one we find with the maximum priority. Performing the priority testing in Dequeue is more expensive for the case where there are several things on the queue, but it allows us to change process priorities without altering the queue. If the process we find has also timed out, we have to handle the case elsewhere, but it is OK for Dequeue to dequeue such a process. Dequeue does not remove such a process from the timeout queue.

lag: Process ← q.chain;

IF q.chain # NIL THEN {

AcquireQueue[q];

best ← DequeueInner[q];

ReleaseQueue[q];

};

DequeueInner: PROC [q: Queue] RETURNS [best: Process ← NIL] = INLINE {

... is the inside part of Dequeue. While we still dequeue by process priority, we assume that we have already grabbed ownership of the queue. This allows us to have common code for this case and the more delicate notification required by interrupt requests & page faults.

lag: Process ← q.chain;

IF lag # NIL THEN {

tail: Process = lag;

best ← lag.next;

IF lag = best

THEN q.chain ← NIL

only process in queue, so clear the chain

ELSE {

head: Process ← best; -- loop invariant: lag.next = best

this: Process ← head;

next: Process ← this.next;

IF next = head THEN EXIT; -- we have wrapped

IF next.priority > best.priority THEN {lag ← this; best ← next};

new candidate for the best process

this ← next;

ENDLOOP;

IF tail = best THEN q.chain ← lag;

don't queue chain point to deleted process

lag.next ← best.next;

};

best.queue ← NIL;

best.next ← NIL; -- the last change, due to test in ProcessSwitch

};

DequeueSelf: PROC [q: Queue] = {

... dequeues the current process from the given queue. This operation has no effect if the process was not in the queue.

IF CurrentProcess.next # NIL THEN {

lag: Process ← q.chain;

IF lag # NIL THEN {

AcquireQueue[q];

IF (lag ← q.chain) # NIL THEN {

tail: Process = lag;

p: Process = lag.next;

IF p = CurrentProcess THEN {

IF p = lag

THEN q.chain ← NIL

last one in queue, so show no chain

ELSE {

IF p = tail THEN q.chain ← lag;

don't let deleted one be the queue head

lag.next ← p.next;

};

CurrentProcess.queue ← NIL;

p.next ← NIL; -- the last change, due to test in ProcessSwitch

EXIT;

};

lag ← p;

IF lag = tail THEN EXIT;

ENDLOOP;

};

ReleaseQueue[q];

};

AcquireQueue: PROC [q: Queue] = {

... grabs ownership of the given queue. Performs Yield[ready, owner] until queue is owned. This directed Yield is necessary to overcome nasty interactions between spin locks and preemptive scheduling.

WHILE NOT CStoreProcess[ptr: @q.busy] DO

owner: Process ← q.busy;

IF owner = NIL THEN LOOP;

IF owner.state # running THEN DirectedYield[ready, owner, 0];

Try to donate our cycles to the owner of the queue. Note that if the owner is running on another processor that there is not need to Yield, since the queue should be given up quite soon.

ENDLOOP;

};

ReleaseQueue: PROC [q: Queue] = {

... releases ownership of the queue. Assumes that the current process owns the queue.

q.busy ← NIL;

};

Internal procedures (interrupts disabled, must not hold metaLock)

AcquireMetaLock: PROC = INLINE {

... grabs ownership of all of the meta queues. On entry, we require that critical traps are disabled, of course. On exit, we hold the meta lock.

WHILE NOT CStoreProcess[ptr: @metaLock] DO

spin using local cache only to reduce bus contention

WHILE metaLock # NIL DO ENDLOOP;

ENDLOOP;

};

SaveProcessState: PROC [process: Process] = INLINE {

... saves frames and process info from the current processor for the given process. On entry, we require that critical traps are disabled. On exit, there are no frames in the IFU stack (so a return will provoke a stack underflow). Note that this procedure must be an INLINE to keep from putting another frame on the stack.

save stack frames

SaveStack[];

save EU regs

FOR which: EUstateIndex IN EUstateIndex DO

process.euState[which] ← EUFetchSpecial[which];

ENDLOOP;

check for local stack overflow

IF WordToInt[process.euState[framesLeft]] <= 0 THEN

cause a stack overflow fault for the caller

NotYetImplemented[];

};

RestoreProcess: PROC [process: Process] = INLINE {

... restores the given process to a running state. Also changes the result of CurrentProcess. On entry, we require that critical traps are disabled, and process.state = running. Note that this procedure must be an INLINE to keep from putting another frame on the stack.

restore EU regs

FOR which: EUstateIndex IN EUstateIndex DO

EUStoreSpecial[which, process.euState[which]];

ENDLOOP;

};

Internal procedures (interrupts disabled, must hold metaLock)

DelicateNotify: PROC [cond: Condition] = INLINE {

... does a queue notification with the understanding that interrupts are disabled and the metaLock is held, so we just can't yield the processor. The condition must be resident, of course.

q: Queue = @cond.queue;

cond.flags[condRequestBit] ← TRUE;

Register the request just in case we can't get the queue.

IF CStoreProcess[ptr: @q.busy] THEN {

We got the queue, so we can make the best process in it ready. If we did not get the queue, then the owner of it will examine the flags pretty soon and see the notification. This only really works if the owner is trying to do a WaitInterrupt, though.

best: Process ← DequeueInner[q];

IF best # NIL THEN {

cond.flags[condRequestBit] ← FALSE;

The waiting process has been poked, no need to do it more than once.

DequeueMeta[@timeoutMetaQueue, best];

remove from timeout queue if present

EnqueueReady[best];

place the process on the ready queue (also maintains highestReadyPriority)

IF best.priority > lowestRunningPriority THEN deferredWakeups ← TRUE;

The new process might be worth waking, but we don't do it here. We just remember that it was desirable.

};

AssignProcessors: PROC = {

... sets processor orders based on whether the a ready process is "better" than any running processes; we run through the ready processes in highest to lowest order. At the end of the reassignment, if we have to change the orders of any processor other than our own then we raise reschedule to get the other processors to do their switching.

The cost of algorithm in this procedure has an upper bound of O(N*MIN[N,MAX[K,R]), where N is the number of processors, K is highestReadyPriority-lowestRunningPriority, and R is the number of ready processes with priority IN (lowestRunningPriority..highestReadyPriority]. In practice, the algorithm has somewhat better average behavior, since in most cases there will have been only one process made ready by a single device interrupt, an idle processor will answer the interrupt first, and immediately assign the new process to itself with no further reschedule activity. In this (anticipated) normal case the cost is O(K)+O(1). This algorithm could be made faster in the high cost cases by keeping the processors sorted by priority, but that would increase costs in ProcessSwitch so it might not be a net perfomance gain.

needReschedule: BOOL ← FALSE;

current: Processor = CurrentProcessor;

firstAvail: Processor ← current;

deferredWakeups ← FALSE;

At this point we are about to reassign all of the processors that need it. So nothing is really deferred.

FOR priority: Priority DECREASING IN (lowestRunningPriority..highestReadyPriority] DO

tail: Process = readyQueueArray[priority];

IF tail # NIL THEN {

FOR process: Process ← tail.next, process.next DO

best: Processor ← NIL;

bestPriority: Priority ← process.priority;

{

The first scan is for some available processor that could run the requested process. If none is found, then we exit all loops.

best ← firstAvail;

IF best.orders = noChange THEN {

np: Priority = best.running.priority;

IF np < bestPriority THEN {bestPriority ← np; EXIT};

};

firstAvail ← best.next;

IF firstAvail = current THEN GO TO done;

IF best.switchTo = process AND best.orders = switchToGiven THEN
GO TO alreadyAssigned;

ENDLOOP;

IF bestPriority # lowestRunningPriority THEN

If the first available processor found is not already at the minimum possible priority then it is worth scanning for another processor with a lower priority.

FOR p: Processor ← best.next, p.next WHILE p # current DO

SELECT p.orders FROM

switchToGiven => IF p.switchTo = process THEN GO TO alreadyAssigned;

noChange => {

np: Priority = p.running.priority;

IF np < bestPriority THEN {

best ← p;

bestPriority ← np;

IF np = lowestRunningPriority THEN EXIT;

};

ENDCASE;

ENDLOOP;

We have found the best choice for the given process.

best.switchTo ← process;

best.orders ← switchToGiven;

IF best # current THEN needReschedule ← TRUE;

If we are gave orders to another processor, then we rememebr the fact.

IF best = firstAvail THEN {

The best processor was the first one we found. Therefore we advance firstAvail to the next processor (unless that happens to be the current processor, in which case we are completely done scanning).

firstAvail ← best.next;

IF firstAvail = current THEN GO TO done;

};

EXITS alreadyAssigned => {

In this case the process was already assigned to a processor. So we want to choose another processor (starting at firstAvail) and another process. This should be rare.

};

IF process = tail THEN EXIT;

ENDLOOP;

EXITS done => EXIT;

};

ENDLOOP;

IF needReschedule THEN RaiseReschedule[];

};

ReleaseMetaLock: PROC = INLINE {

... releases ownership of the meta lock. On entry, we require that critical traps are disabled and that the metaLock is held. On exit, the metaLock is not held, although critical traps are still disabled.

metaLock ← NIL;

};

EnqueueReady: PROC [process: Process] = INLINE {

... places the given process on the ready queue using the meta link. On entry, we require that critical traps are disabled and that the metaLock is held. On exit, process.state = ready.

priority: Priority = process.priority;

EnqueueMeta[@readyQueueArray[priority], process];

IF priority > highestReadyPriority THEN highestReadyPriority ← priority;

process.state ← ready;

};

EnqueuePageFault: PROC [process: Process] = {

... places the given process on the page fault queue using the meta link. On entry, we require that critical traps are disabled and that the metaLock is held. On exit, process.state = waitingPage.

EnqueueMeta[@pageFaultMetaQueue, process];

process.state ← waitingPage;

};

EnqueueMeta: PROC [metaQueue: MetaQueue, process: Process] = INLINE {

... places the given process on the given meta queue using the meta link. On entry, we require that critical traps are disabled and that the metaLock is held.

IF process.meta = NIL THEN {

tail: Process = metaQueue^;

IF tail = NIL

THEN process.meta ← process

ELSE {process.meta ← tail.meta; tail.meta ← process};

metaQueue^ ← process;

};

DequeueMeta: PROC [metaQueue: MetaQueue, process: Process] = INLINE {

... removes the given process from the given meta queue (timeout, ready, pageFault). On entry, we require that critical traps must be disabled and that the metaLock is held. The process state is not changed except for the meta link.

IF process.meta # NIL THEN {

remove from meta queue if it is there

tail: Process = metaQueue^;

IF tail # NIL THEN {

lag: Process ← tail;

p: Process = lag.meta;

IF p = process THEN {

IF lag = p

THEN metaQueue^ ← NIL

last one in queue, so show no chain

ELSE {

IF p = tail THEN metaQueue^ ← lag;

don't let deleted one be the queue head

lag.meta ← p.meta;

};

p.meta ← NIL;

EXIT;

};

lag ← p;

IF lag = tail THEN EXIT;

ENDLOOP;

};

Low-level primitives

DisableInterrupts: PROC [] = {

... inhibits interrupts (must be a procedure call for Dragon)

NotYetImplemented[];

};

EnableInterrupts: PROC [] = {

... enables interrupts (must be a procedure call for Dragon)

NotYetImplemented[];

};

CStoreProcess: PROC
[ptr: ProcessPtr, old: Process ← NIL, new: Process ← CurrentProcess]
RETURNS [stored: BOOL] = -- MACHINE CODE -- {

... atomically performs {IF (stored ← ptr^ = old) THEN ptr^ ← new}.

IF (stored ← ptr^ = old) THEN ptr^ ← new;

};

CStoreWord: PROC
[ptr: WordPtr, old, new: Word]
RETURNS [stored: BOOL] = -- MACHINE CODE -- {

... atomically performs {IF (stored ← ptr^ = old) THEN ptr^ ← new}.

IF (stored ← ptr^ = old) THEN ptr^ ← new;

};

ReadAndClearDeviceRequests: PROC
RETURNS [rw: RequestWord ← NullRequestWord] = -- MACHINE CODE -- {

... atomically reads and clears the device requests. This is done by reading the appropriate EU I/O location (Interrupt Request Word) then clearing the bits read.

NotYetImplemented[];

};

DeviceRequestsPending: PROC
RETURNS [b: BOOL ← FALSE] = -- MACHINE CODE -- {

... Determines whether or not device requests are pending, but does not alter them. This can be done as a funny I/O command (preferable) or via a more synthetic method. If the latter, then it must agree with ReadAndClearDeviceRequests.

NotYetImplemented[];

};

RaiseReschedule: PROC = -- MACHINE CODE -- {

... raises the Reschedule interrupt, which, if disabled, will occur when it becomes enabled again. Wi

deferredWakeups ← FALSE;

NotYetImplemented[];

};

WordToInt: PROC [x: Word] RETURNS [INT] = -- MACHINE CODE -- {

... returns x > 0. This will be the correct implementation for Dragons, although it would do the wrong thing on Dorados.

RETURN [LOOPHOLE[x]];

};

EUFetchSpecial: PROC
[which: EUstateIndex]
RETURNS [w: Word ← WordZero] = -- MACHINE CODE -- {

... fetches a word from the given special EU register.

NotYetImplemented[];

};

EUStoreSpecial: PROC [which: EUstateIndex, x: Word] = -- MACHINE CODE -- {

... stores a word into the given special EU register.

NotYetImplemented[];

};

SaveStack: PROC = INLINE {

... saves the stack (updates the current value of the hook EU register), returning the number of frames saved. The stack is saved up to (and including) the eldest marked frame. Also updates the framesLeft EU register. For excruciating details, see GenStack.mesa.

NotYetImplemented[];

};

MarkFrame: PROC = INLINE -- MACHINE CODE -- {

... marks the youngest frame on the IFU stack as being the last one to save. The eldest frame marked is actually the last one saved by SaveStack. This procedure probably needs to be called when interrupts are disabled.

NotYetImplemented[];

};

NotYetImplemented: PROC = {

... marks operations with missing implementation details.

NotYetImplemented[];

};

Initialization

Initialization must setup the process blocks, the processor blocks, and the nachos. There will be a spin lock for performing initialization, with the first processor to grab the lock performing most of the work. Remaining processors will simply allocate themselves a processor object and wait for the scheduler to perform its tricks. None of this seems very hard, but none of it is written yet, either.

Init: PROC = {

NotYetImplemented[];

};

END.

Differences Between PrincOps and DragOps

Note: this section requires knowledge of the PrincOps architecture. I freely admit that my knowledge of said architecture is spotty at best. This section is also incomplete.

Multiple processors

With current PrincOps machines atomicity is acheived through a combination of locks, inhibition of interrupts, priority levels, and non-interruptibility of non-faulting microcode routines. In the DragOps arhcitecture, only locks (based on memory bus arbitration) and inhibition of interrupts for a single processor (based on IFU inhibition of the Reschedule interrupt) can be used as sources of atomicity. This complicates life, you betcha.

Memory vs. time

For PrincOps, memory was assumed to be expensive, and cycles were assumed to be expendable if it saved memory. For example, a process block costs only 8 bytes when no state vector is used, and only costs 36 additional bytes when a state vector is used (the unusual case).

The DragOps, memory is assumed to be relatively cheap, and cycles should only be spent to avoid gross memory costs. For example, a process block costs at least 40 bytes in the normal case, and there is no exceptional case.

The ready queue

For PrincOps, the head of the ready queue is the currently running process. For DragOps, running processes are not on the ready queue at all. Instead, running processes are located through the current ring of processors.

The DragOps ready queue is special in that it is actually an array of metaqueues, one for each priority level. This bounds the search for the best ready process to a time proportional to the number of priority levels (small), yet also allows us to place processes on the ready queue in unit time.

Queues and meta queues

For PrincOps, all queues are circular queues of processes linked through a particular word in the PSB (Process State Block). Condition variables, monitor locks, and even the ready queue are all queues where the queue pointer points at the tail of the processor queue. For DragOps, there are Queues and MetaQueues. Although both are circular queues, and both point at the tail, the link words are different. A DragOps process can be queued at the same time on both a queue and a metaqueue.

DragOps queues are used by "normal" Mesa monitors and condition variables. MetaQueues are used for more specialized reasons, like the ready queue (waiting for processor), the pageFault queue (waiting for a page to be brought into memory), and the timeout queue (waiting for some time to expire). Queues need not be pinned, while MetaQueues must be pinned. The scheduler is permitted to move processes between MetaQueues, but is not permitted to touch queues. All of this is done in the name of simplicity, you betcha.

Monitor locks & condition variables

For PrincOps, monitor locks and condition variables are managed by microcode that has carefully been constructed to not take interrupts at awkward times.

For DragOps, managing monitor lock and condition queues is too complex to be performed in a single instruction, so each monitor lock and condition has a spin lock that must be acquired before the queue can be examined or modified. A potential problem with this scheme is that a higher-priority process may attempt to acquire the spin lock while a lower-priority process possess the lock. Therefore, each spin lock has the address of the owning process, so a process that fails to acquire the lock can yiled control directly to the owning process. Although this is technically a violation of preemptive scheduling, it is necessary to prevent livelock (where no progress can be made because of the higher-priority process spinning).

State vectors

For PrincOps, the eval stack and other cleanup information is contained in a state vector. Minimal stack restrictions on monitor entry and condition wait ensure that the eval stack is empty when such processes go to sleep, which tends to keep the number of state vectors low.

For DragOps, there are no state vectors. When a process is about to go to sleep, it saves the frames in the IFU and EU stacks to memory, saves the appropriate EU registers to the process block, and puts itself on the appropriate metaqueue (if any). The only problem is ensuring that there are sufficient nachos to accomodate the stack of the process. The current thinking is to have an emergency-level process put every other process to sleep when frames are about to run out. When that has been done, a world-swap debug can take place.

An improvement on this scheme can be made if there is an error generated for a particular process when it has used "too many" frames. Since this is typically due to a process executing unbounded recursion, we can tolerate at least a few runaways without having to world-swap.

In both cases, we have to have some mechanism for keeping track of how many frames are in use by a given process, and how many are left in the global pool of frames. Details are yet to be specified.

Costs

For PrincOps, a process switch is on the order of 5 procedure call times. A procedure call (and return) is on the order of 5 microseconds (on the Dorado).

For DragOps, a process switch is on the order of 100 procedure call times (very rough estimate). Partly, this is because procedure calls are about 10 times faster, and partly this is because there is more per process state to save and restore.

Timeout

For PrincOps, a timeout occurs when the clock ticks and some PSB that is awaiting timeout has its wait count go to 0. All PSBs have to be touched at every tick, although this is done via microcode.

For DragOps, at each tick every process in timeoutMetaQueue has its when field decremented. If that field reaches 0, the process is placed in the ready queue. It then becomes the responsibility of the process to remove itself from the appropriate condition variable. This makes page faults (if any) due to touching the condition variable occur in the "right" process. Only PSBs waiting for timeout have to be touched at every tick.

Aborts

For PrincOps (as implemented for Cedar), aborts are enabled/inhibited on a per condition variable basis, and aborts are tested only by CVs & ICVs with aborts enabled and by explicit tests.

For DragOps, aborts are enabled/inhibited on a per process basis and on a CV basis. Abort requests are tested when waiting for CVs, ICVs, and MLs, as well as the normal explcit tests. Is it worth having the enable/disable on a per-process basis? (it costs rather little)

Things to do

This section is a list of things that still need some design work.

Fork / join

These need to be coded in greater detail.

Page faults

How will the fault handler work and place things to sleep waiting for page fault? For an EU page fault, the offending memory address is in the MAR register in the EU. For an IFU page fault the offending PC is for the start of the instruction, so the memory address to page in may be the next page (since this fits well with page clustering anyway, we may want to always page in the following page on an IFU page fault whenever the following page is also valid).

Soft and hard emergency limits on frames

These need support in the scheduler, as well as in stack save/restore. Reaching the soft limit (per process) should raise an error. This is just an attempt to restrain runaway processes. Reaching the hard limit will cause system shutdown and a call to the world-swap debugger. This probably will only happen if a significant number of runaways occur.

Timeout mechanism

In current Cedar relatively few condition variables have timeouts (the default is not to have timeouts). The current design for the timeout queue is based on the assumption that this situation will continue on Dragon. The current design has O(n) insertion and deletion of entries from the queue, O(n) updating for every tick, and O(1) testing, where n is the number of processes waiting for timeout. A heapsort-style priority queue has O(n log n) insertion and deletion, and O(1) testing, but it also has more complicated code, and needs an auxilliary pinned data structure (1 word / process). The current simple design will be used until it becomes a bottleneck.

Should the timeout "process" just be part of the scheduler? How will elapsed time be determined, and to what accuracy? Just for the record, there are three clocks of current interest to Cedar:

1: Time of day clock. This need not have resolution of better than a second, and it need not be locally stable, but it should have very good long-term accuracy.

2: Timeout clock. This need not have great accuracy, but it must generate interrupts, and it should have resolution of 10-100 milliseconds. More resolution is OK provided that timeout interrupts don't occur more often than every 10 milliseconds.

3: Elapsed time clock. This need not have great long-term accuracy, but it should have very good short-term accuracy, have roughly microsecond resolution, and it must be quite cheap to read. This clock can easily synthesize the Timeout clock.

Stack trace

SafeStorage needs a way to trace out the process stacks atomically. That is, all possible references in the process stacks must be discovered without any reference containing processes making headway. Dump frames are already pinned, and the panicStop mechanism can be used to stop all processes except for the current process (we may also want to have an stopRC command that would stop all processes that had not declared themselves to be non-RC). We can allocate frame extensions out of a special pool of memory that allows scanning. Possibly we should also make this pool pinned?

Reschedule & page fault

Setting L at procedure essentially transfers registers from the called frame to the calling frame, as far as stack save and restore are concerned. It will only work properly if both the calling frame and the called frame are in the EU (and IFU) registers. If the various compilers (and assembly language coders) guarantee that no calls are made before setting L, then there are only two sources of having the calling frame migrated into memory - reschedule & page fault. For these traps, if the youngest frame below the trap is P, and the next youngest is Q, and Q has zero registers, then both P and Q have to be restored before returning from the trap.