Heading:z18697qjk40(635)\g FileStore transaction code, version 1z18697y756qjk40\g Page Numbers: Yes X: 527 Y: 10.5"z18697qjk40\g CSL Notebook Entryz18697l4445y762\f5bg To: Alpine designers Date: February 20, 1981z18697l4445d2998e21(0,65535)(1,4445)(5,11684)(6,13653)\f1g3f0t2 1t0 16t6 1f1t0 5f0t7 1t0 From: Mark Brown Location: PARC/CSLz18697l4445d2998y716e25(6,13664)\f1g5f0t2 1t0 10t6 1f1t0 9f0t7 1t0 Subject: FileStore transaction code, version 1 File: [Ivy]Doc>FileStoreTransCode1.bravoz19579l4445d2998e25\f1g8f0t2 1t0 37t6 1f1t0 5f0t7 1f1t0 42f0 XEROX z18697l508y644e14(2116)\f2g5f0 Attributes: informal, technical, Database, Distributed computing, Filingz18697l4896d2999e10(0,4904)(1,65535)(5,65535)(6,65535)\f1g11f0 References: [Ivy]Doc>FileStoreDesign1.bravoz18697l4896d2999e10\f1 11f0 1f1 17g16G6f0 Abstract: This memo contains Mesa-like code implementing the Transaction Coordinator and Transaction Worker sections of FileStore.z18697l4896d2999e10j\f1g9f0 z18697e10(2116)\gv The following paragraphs provide some context for the code. The code is organized around the manipulation of volatile objects: collectible records, monitors, etc. Since this is a concurrent program there is no choice; we don't have stable collectible records or stable monitors. But the crucial invariants of the program involve the contents of the disk, since this is all that is available during crash recovery.d2998e12j(0,3352)(1,2998)(2,2998) The logd2998e12j\b7B The log is an infinite, sequential write-once, random access read, store for log records. The log is implemented using a circular buffer of disk pages. We make it seem infinite by (1) writing a page on tape before reusing it, and (2) never permitting a transaction to run so long that it would require access to portions of the log that now reside on tape. We achieve (2) by aborting any long-running transaction. The tape copy of the log can be used for recovery from media failures. If this recovery is not needed, the tape copy is unnecessary.d2998e12j\4i3I70i11I Each log record is uniquely identified by its log record ID. Depending on the structure of log records or pages, this might be the index relative to the beginning of the log of the first (or last) byte of the log record. It should be possible to do a random access read to an online log record, given the log record ID.d2998e12j\46i13I We make checkpoints in the log in order to bound the amount of work that is required during recovery. A checkpoint record contains, among other things, a list of all active transactions at the time of the checkpoint (both transactions whose outcome is not yet determined, and transactions that are committed or aborted but still have work to do.)d2998e12j\8i11I The restart record is the "root" of the log. It points to the most recent checkpoint. The restart record should be written in two independent places in order to tolerate failures during write. The restart record itself should be found without searching.d2998e12j\4i14I At restart it is also necessary to find the end of the log (the most recently written log record) by reading forward from the most recent checkpoint. There seem to be several ways of structuring log records or log pages to achieve this.d2998e12j Interfaced2998e12jk40\i9I We have not attempted to define a complete Mesa interface to the log. The code below uses the following three log calls.d2998e12j Log.Write: PROC [record: LogRecord, forceLog: BOOLEAN] RETURNS [LogRecordID];l4057d3704e6k40\f1g4f0 7f1 5f0 30f1 7f0 2f1 7f0 Writes record to the log, and returns its ID. If forceLog, then waits until record actually appears on the disk before returning.l4410j\g7i6I64i6I Log.Force: PROC [] RETURNS [LogRecordID];l4057d3704e6k40\f1g4f0 7f1 5f0 3f1 7f0 Returns the ID of the last record in the log. Waits until this record actually appears on the disk before returning.l4410j\g Log.FindTransState: PROC [transID: TransID] RETURNS [{committed, aborted, unknown}];l4057d3704e6k40\f1g4f0 16f1 5f0 19f1 7f0 Searches online log for outcome of given transaction.l4410j\g The code below uses several log record types. The precise encoding should not be taken seriously; only the information carried by each record is significant. The encoding will have an impact on the complexity of the recovery algorithm, however.d2998e12j Log.LogRecord: TYPE = RECORD [;l4057d3704e6\f1g4f0 11f1 5f0 1f1 8f0 transID: TransID,l4057e1\g SELECT * FROMl4057e1\f1g13f0 -- coordinatorl4410\g begin => [],l4410\g registerWorker => [worker: FileStoreID],l4410\g collecting => [workers: LIST OF FileStoreID, places: LIST OF FileStoreID],l4410\g24f1 8f0 21f1 8f0 committing => [],l4410\g aborting => [],l4410\g committed => [],l4410\g aborted => [],l4410\g -- workerl4410\g workerRegistered => [],l4410\g workerReady => [places: LIST OF FileStoreID],l4410\g24f1 8f0 workerCommitting => [],l4410\g workerAborting => [],l4410\g workerCommitted => [],l4410\g workerAborted => [],l4410\g -- both (one phase protocol on single FileStore)l4410\g workerCommittingAndCoordinatorCommitted => [],l4410\g workerAbortingAndCoordinatorAborted => [],l4410\g -- ...l4410\g ENDCASE ];l4410\f1g7f0 Recoverable statesd2998e12jk60\b18B We define the recoverable state of a transaction agent (coordinator or worker) to be the state to which it returns upon crash recovery. Only states recorded in logs are recoverable.d2998e12j\14i17I Recoverable states of a coordinatord2998e12jk40\i35I Unknown. A transaction is unknown if (1) it has no COLLECTING record in the log, or (2) has either a COMMITTED or an ABORTED record in the log. (It has not entered the first phase of the commit process, or it has committed and carried out all of its intentions, or it has aborted and undone all of its pre-commit updates.)d2998e12j\27i7I18f1 10f0 40f1 9f0 7f1 7f0 Collecting. A transaction is collecting if (1) it has a COLLECTING record in the log, and (2) has no COMMITTING or ABORTING record in the log. (It wishes to commit, is collecting "ready votes" from workers, but has not yet decided the transaction outcome.)d2998e12j\30i10I17f1 10f0 35f1 10f0 4f1 8f0 Committing. A transaction is committing if it has a COMMITTING record in the log but has no COMMITTED record in the log. (It has committed but has not yet heard "ack" from all of its workers.)d2998e12j\30i10I13f1 10f0 30f1 9f0 Aborting. A transaction is aborting if it has a ABORTING record in the log but has no ABORTED record in the log. (It has aborted but has not yet heard "ack" from all of its workers.)d2998e12j\28i8I13f1 8f0 30f1 7f0 Note that the outcome of a transaction (committed or aborted) is not viewed as a recoverable state, but instead these two outcomes are merged into the unknown state. Recovery must be prepared to deal with transactions that are so old that no online record of them exists. These transactions might have committed or aborted, but our algorithms should be insensitive to this, since otherwise we would be forced to mount a tape and search for the old transaction.d2998e12j\14i7I19i9I4i7I91i7I We consider that part of the coordinator's role in the system is to respond to requests for information about the outcome of specific transactions (this is the motivation for encoding the coordinator's FileStore in a TransID.) Workers should have no part in this. As a practical matter, the system is unlikely to answer requests that cannot be resolved by searching the online log.d2998e12j Recoverable states of a workerd2998e12jk40\i30I Unknown. A transaction is unknown if (1) it has no READY record in the log, or (2) has either a COMMITTED or an ABORTED record in the log. That is, either it has not entered the first phase of the commit process, or it has committed and carried out all of its intentions, or it has aborted and undone all of its pre-commit updates.d2998e12j\27i7I18f1 5f0 40f1 9f0 7f1 7f0 Ready. A transaction is ready if (1) it has a READY record in the log, and (2) has no COMMITTING or ABORTING record in the log. That is, it has voted ready to the coordinator, but has not yet heard the outcome of the transaction from the coordinator.d2998e12j\25i5I17f1 5f0 35f1 10f0 4f1 8f0 Committing. A transaction is committing if it has a COMMITTING record in the log but has no COMMITTED record in the log. That is, it has committed but not carried out all of its intentions.d2998e12j\30i10I13f1 10f0 30f1 9f0 Aborting. A transaction is aborting if it has a ABORTING record in the log but has no ABORTED record in the log. That is, it has aborted but not undone all of its pre-commit updates.d2998e12j\28i8I13f1 8f0 30f1 7f0 Discussiond2998e12j\i10I The worker states committing and aborting are an optimization. They allow less undo and redo work at restart by certifying that certain transactions are completely finished. This is also one of the functions of checkpoints, but checkpoints are taken less frequently than committing and aborting records are written. (Checkpoints themselves may be viewed as an optimization, since in principle the state of the file store can be restored by reading the log. In the same sense, the file store is an optimization over doing a linear scan of the log for each read access.)d2998e12j\18i10I5i8I232i10I5i9I Crash recoveryd2998e12j\b14B During crash recovery, we read the log to determine the recoverable state of each transaction that has been active since the most recent checkpoint. Then we perform the actions described below (later we'll have Mesa code for these too):d2998e12j Coordinatord2998e12j\i11I Unknown. If the transaction has no COMMITTED or ABORTED record in the log, we abort it (write ABORTED to the log, and optionally notify all of the workers whose identity has been logged); otherwise there is nothing to do.d2998e12j\36f1 9f0 4f1 7f0 39f1 7f0 Collecting. We call all workers again (they are identified in the COLLECTING record), asking for ready/notReady decisions.d2998e12j\67f1 10f0 Committing, Aborting. We call all workers again, telling them to commit or abort.d2998e12j Workerd2998e12j\i6I Unknown. If the transaction has no COMMITTED or ABORTED record in the log, we abort it (we must notify the coordinator); otherwise there is nothing to do.d2998e12j\36f1 9f0 4f1 7f0 36i4I Ready. We restore the transaction's volatile state from the log, reacquire the transaction's resources (locks), then wait for word from the coordinator.d2998e12j Committing, Aborting. We restore the transaction's volatile state from the log. This mainly means reconstructing the buffer pool at the time of the crash, as well as lists of other deferred updates such as file shortening and deletion. Locks do not have to be reacquired for pages (unlike the ready case), since in normal operation the buffer pool is treated as "truth". We then set in progress whatever processes are normally used to carry out intentions or undo updates. These are carefully coded to be idempotent, so they can tolerate any number of crashes.d2998e12j Remote procedure calld2998e12j\b21B The remote calls in this code are FileStore[F].PrepareWorker, FileStore[F].FinishWorker, and FileStore[F].RegisterWorker, where F is a FileStoreID. I chose this strange syntax since it cannot possibly be confused with the syntax we'll end up with.d2998e12j My assumption about a remote call is that when it returns, it has executed at the remote site one or more times. The result of the call, if any, is the result of one of the calls at the remote site.d2998e12j I do not quite understand what happens when the remote call fails because the remote site doesn't respond and we time out. I would prefer not to catch a signal in this case. But this leaves out procedures that return no results.d2998e12j\139i3I It may be helpful to consider the three remote procedures used here. If a call to RegisterWorker fails, then the proper response is to abort the operation that required the RegisterWorker. At a higher level, the client may then decide to abort the transaction. If a call to PrepareWorker fails, we must assume the worst: a call got through and the worker is, in fact, ready. The coordinator can decide to abort the transaction, but he retains the responsibility to notify the worker of this fact. Similarly, if a call to FinishWorker fails, we must repeat the call until it succeeds, since the worker must be assumed ready.d2998e12j Deficienciesd2998e12j\b12B There is some potential for useful special-casing of readonly workers. They do not have to participate in phase two of commit, since commit and abort are the same for a readonly worker. The code does not take advantage of this.d2998e12j The implementation of FinishTransaction is crude since it does not attempt to contact workers in parallel. In a real implemetation, we expect to have a group of processes whose role in life is to call PrepareWorker and FinishWorker. The process doing the FinishTransaction might be responsible for one of the calls, or (more likely) none of them. In the latter case, it could wait on a CV associated with the Coordinator record, and be notified as the responses arrived.d2998e12j Recovery needs to be coded to a similar level of detail. A strategy for cleaning up obsolete Coordinators and Workers needs to be specified; the current thing would accumulate them indefinitely.d2998e12j CoordinatorImpl: MONITOR LOCKS t USING t: Coordinatorl3351d2998e24k792\f5b17f1 14f0 1f1 7f0 14f1 IMPORTS Log, CoordinatorTable EXPORTS FileStore = BEGINl3351e1\f1b47B3b CreateTransaction: PUBLIC PROC [stableCreation: BOOLEAN] RETURNS [t: TransID] = {l3351d2998e6\bg17B2f1 11f0 18f1 7f0 2f1 7f0 c: Coordinator _ ConsCoordinator[l3704d3351\g transID: [ coordinatorFileStoreID: FileStorePrivate.MyFileStoreID[], idOnFileStore: FileStorePrivate.UniqueLongCardinalOnMyFileStore[]],l4057d3704\g35f1 17f0 32f1 17f0 transState: unknown];l4057d3704\g c.beginRecordID _ Log.Write[record: [c.transID, begin[]], forceLog: stableCreation];l3704d3351\g18f1 3f0 c.state _ active; IF stableCreation THEN c.recoverableState _ active;l3704d3351\g17f1 4f0 16f1 5f0 RETURN [c.transID] };l3704d3351\f1g6f0 ConsCoordinator: PROC [transID: TransID, transState: TransState] RETURNS [c: Coordinator] = {l3351d2998e6\g17f1 4f0 44f1 7f0 c _ NEW[CoordinatorObject _ [l3704d3351\g4f1 3f0 transID: transID, state: transState, recoverableState: transState,l4057d3704\g coordinator: MyFileStoreID[] ]];l4057d3704\g CoordinatorTable.Insert[c];l3704d3351\f1g16f0 11f1 RETURN[c] };l3704d3351\f1g6f0 CoordinatorFileStoreIDFromTransID: PUBLIC PROC [t: TransID] RETURNS [FileStoreID] = {l3351d2998e6\bg33B2f1 11f0 14f1 7f0 RETURN [t.coordinatorFileStoreID] };l3704d3351\f1g6f0 RegisterWorker: PUBLIC ENTRY PROC [t: TransID, w: FileStoreID, stableRegistration: BOOLEAN] RETURNS [success: BOOLEAN] = {l3351d2998e6\bg14B2f1 17f0 50f1 7f0 2f1 7f0 11f1 7f0 -- The caller (a Worker) should have no monitors locked.l3704d3351\f1g c: Coordinator _ CoordinatorFromTransID[t: t, whereToLook: volatileOnly];l3704d3351\g IF c = NIL OR c.state # active THEN RETURN [FALSE];l3704d3351\f1g2f0 5f1 7f0 17f1 11f0 2f1 5f0 2f1 IF ~IsMember[w, c.workers] THEN { c.workers _ Append[w, c.workers]; [] _ Log.Write[record: [t, registerWorker[w]], forceLog: stableRegistration] };l3704d3351\f1g2f0 25f1 5f0 2f1 1f0 39f1 3f0 RETURN [TRUE] };l3704d3351\f1g6f0 2f1 4f0 CoordinatorFromTransID: PROC [t: TransID, whereToLook: {volatileOnly, volatileOrLog}] RETURNS [Coordinator] = {l3351d2998e6\g24f1 4f0 58f1 7f0 IF CoordinatorFileStoreIDFromTransID[t] # FileStorePrivate.MyFileStoreID[] THEN ERROR;l3704d3351\f1g2f0 40f1 17f0 16f1 10f0 c: Coordinator _ CoordinatorTable.Lookup[t];l3704d3351\g17f1 16f0 11f1 IF c # NIL OR whereToLook = volatileOnly THEN RETURN[c];l3704d3351\f1g3f0 4f1 7f0 27f1 11f0 4f1 -- No volatile record of t exists. t has been recorded in the log unless it was created with stableCreation = FALSE. Search the online portion of the log for the outcome of t. (We should keep volatile records of recent transactions to keep the frequency of this operation down to an acceptable level. By doing some work at checkpoint time, only checkpoint records need to be looked at, and these are linked together. We can use the fact that TransIDs are generated in a fixed order to bound the search in the case that no record was made of a transaction.)l3704d3351\f1g s: TransState _ Log.FindTransState[t];l3704d3351\g16f1 3f0 19f1 SELECT s FROMl3704d3351\f1g6f0 3f1 4f0 committed, aborted => NULL;l4057d3704\g22f1 4f0 -- normal case.l4410d4057\f1g unknown => s _ FileStorePrivate.ResolveUnknownTransaction[t];l4057d3704\g15f1 16f0 29f1 -- t not found in log; appeal to a higher authority.l4410d4057\f1g active, collecting, committing, aborting => ERROR;l4057d3704\g44f1 ENDCASE => ERROR;l4057d3704\f1g7f0 4f1 6f0 RETURN[ConsCoordinator[t, s]] };l3704d3351\f1g6f0 FinishTransaction: PUBLIC ENTRY PROC [t: TransID, requestedOutcome: {commit, abort}, commitRecordPlaces: LIST OF FileStoreID] RETURNS [outcome: {committed, aborted, unknown}] = {l3351d2998e6k144\bg17B2f1 17f0 69f1 7f0 14f1 7f0 -- To notify its coordinator of a spontaneous abort, a worker causes FinishTransaction[t, abort, NIL] (with no Worker monitors locked).l3704d3351\f1g c: Coordinator _ CoordinatorFromTransID[t: t, whereToLook: volatileOrLog];l3704d3351\g IF c = NIL THEN RETURN [unknown];l3704d3351\f1g2f0 5f1 3f0 1f1 12f0 DO SELECT c.state FROMl3704d3351\f1g9f0 9f1 4f0 active => EXIT;l4057d3704\g10f1 -- Normal case.l4410d4057\f1g committed => RETURN [committed];l4057d3704\g13f1 6f0 aborted => RETURN [aborted];l4057d3704\g11f1 6f0 collecting => WAIT c.outcomeDetermined;l4057d3704\g14f1 5f0 19f1 -- Another activation of this procedure is at work. Wait for it to finish.l4410d4057\f1g ENDCASE => ERROR; -- includes committing, aborting, unknown.l4057d3704\f1g7f0 4f1 6f0 1f1 42f0 ENDLOOP;l4057d3704\f1g8f0 IF c.workers = NIL THEN {l3704d3351\f1g2f0 13f1 3f0 1f1 5f0 -- Funny transaction; no workers! Don't complain.l4057d3704\f1g [] _ Log.Write[record: [t, requestedOutcome-ed[]], forceLog: TRUE];l4057d3704\g5f1 3f0 53f1 4f0 c.recoverableState _ outcome _ requestedOutcome; GOTO done };l4057d3704\g49f1 4f0 IF commitRecordPlaces = NIL THEN commitRecordPlaces _ LIST [c.coordinator];l3704d3351\f1g2f0 22f1 4f0 1f1 5f0 21f1 4f0 IF c.workers.first = c.coordinator AND c.workers.rest = NIL AND commitRecordPlaces.first = c.coordinator AND commitRecordPlaces.rest = NIL THEN {l3704d3351\f1g3f0 32f1 3f0 18f1 8f0 41f1 3f0 27f1 8f0 -- Single FileStore case; use one phase protocol. Worker will do the necessary log write.l4057d3704\f1g c.recoverableState _ outcome _ FileStorePrivate.PrepareAndFinishWorker[t, requestedOutcome];l4057d3704\g31f1 17f0 GOTO done };l4057d3704\f1g4f0 IF requestedOutcome = abort THEN {l3704d3351\f1g2f0 26f1 5f0 -- Abort request. Use one phase protocol.l4057d3704\f1g [] _ Log.Write[record: [t, aborting[]], forceLog: TRUE];l4057d3704\g5f1 3f0 42f1 4f0 FOR w: FileStoreID IN c.workers DOl4057d3704\f1g4f0 15f1 3f0 9f1 3f0 [] _ FileStore[w].PrepareWorker[transID: t, requestedOutcome: abort, commitRecordPlaces: NIL];l4410d4057\g89f1 3f0 ENDLOOP;l4410d4057\f1g [] _ Log.Write[record: [t, aborted[]], forceLog: FALSE];l4057d3704\g5f1 3f0 41f1 5f0 c.recoverableState _ outcome _ aborted; GOTO done };l4057d3704\g41f1 5f0 -- Trying to commit a multi FileStore transaction. Use two phase protocol.l3704d3351\f1g [] _ Log.Write[record: [t, collecting[c.workers, commitRecordPlaces]], forceLog: TRUE];l3704d3351\g5f1 3f0 73f1 4f0 c.state _ c.recoverableState _ collecting;l3704d3351\g -- We are now responsible for doing the job. To do this properly we should have a number of processes whose role in life is to call PrepareWorker or FinishWorker, and wait for the return. If we wait in the monitor, others who enter will wait for us to determine the outcome. In this crude implementation, we do not wait.l3704d3351\f1g outcome _ committed;l3704d3351\g FOR w: FileStoreID IN c.workers DOl3704d3351\f1g4f0 15f1 3f0 9f1 3f0 IF FileStore[w].PrepareWorker[transID: c.transID, requestedOutcome: requestedOutcome, commitRecordPlaces: NIL] = notReady THEN outcome _ aborted;l4057d3704\f1g2f0 104f1 3f0 13f1 4f0 ENDLOOP;l4057d3704\f1g [] _ Log.Write[record: [t, outcome-ing[]], forceLog: TRUE];l3704d3351\g5f1 3f0 45f1 4f0 c.state _ c.recoverableState _ outcome-ing;l3704d3351\g FOR w: FileStoreID IN c.workers DOl3704d3351\f1g4f0 15f1 3f0 9f1 3f0 FileStore[w].FinishWorker[transID: c.transID, requestedOutcome: outcome];l4057d3704\g ENDLOOP;l4057d3704\f1g [] _ Log.Write[record: [t, outcome-ed[]], forceLog: FALSE];l3704d3351\g5f1 3f0 44f1 5f0 c.recoverableState _ outcome-ed; GOTO done;l3704d3351\g34f1 5f0 EXITSl3704d3351\f1g5f0 done => RETURN[c.state _ outcome]l3704d3351\g10f1 6f0 };l3704d3351\g END.--CoordinatorImpll3351d2998e1\f1b3f0B3f5b15f0B WorkerImpl: MONITOR LOCKS w USING w: Workerl3351d2998e24k792\f5b12f1 14f0 1f1 7f0 9f1 IMPORTS Log, WorkerPrivate, WorkerTable EXPORTS FileStore, FileStorePrivate = BEGINl3351e1\f1b13g15G PrepareWorker: PUBLIC PROC [t: TransID, requestedOutcome: {commit, abort}, commitRecordPlaces: LIST OF FileStoreID] RETURNS [{readonlyReady, ready, notReady}] = {l3351d2998e6\bg13B2f1 11f0 47f1 1f0 21f1 7f0 14f1 7f0 PrepareWorkerEntry: ENTRY PROC [w: Worker, requestedOutcome: {commit, abort}, commitRecordPlaces: LIST OF FileStoreID] RETURNS [{readonlyReady, ready, notReady}] = {l3704d3351\g20f1 10f0 46f1 1f0 21f1 7f0 13f1 8f0 SELECT w.state FROMl4057d3704\f1g6f0 9f1 4f0 active => NULL; -- normal case.l4410d4057\g10f1 4f0 1f1 16f0 ready => GOTO readyReturn;l4410d4057\g9f1 5f0 committed => RETURN [ready]; --result is arbitraryl4410d4057\g13f1 6f0 10f1 21f0 aborted => RETURN [notReady]; --result is arbitraryl4410d4057\g11f1 6f0 13f1 21f0 ENDCASE => ERROR;l4410d4057\f1g7f0 4f1 5f0 IF requestedOutcome = commit AND PrepareToCommit[w: w, twoPhase: TRUE] THEN {l4057d3704\f1g2f0 27f1 4f0 32f1 4f0 2f1 5f0 [] _ Log.Write[record: [w.transID, workerReady[commitRecordPlaces]], forceLog: TRUE];l4410d4057\g5f1 4f0 70f1 4f0 w.state _ w.recoverableState _ ready;l4410d4057\g36f1 1f0 GOTO readyReturn }l4410d4057\f1g5f0 ELSE {l4057d3704\f1g5f0 AbortWorker[w]; RETURN [notReady] }l4410d4057\g15f1 8f0 EXITSl4057d3704\f1g5f0 readyReturn => RETURN [IF w.readonly THEN readonlyReady ELSE ready];l4410d4057\g15f1 6f0 2f1 3f0 11f1 5f0 14f1 5f0 };--PrepareWorkerEntryl4057d3704\g w: Worker _ WorkerTable.Lookup[t];l3704d3351\g12f1 11f0 IF w = NIL THEN RETURN [notReady];l3704d3351\f1g2f0 5f1 3f0 1f1 4f0 1f1 6f0 RETURN [PrepareWorkerEntry[w, requestedOutcome, commitRecordPlaces]];l3704d3351\f1g6f0 };l3704d3351\g FinishWorker: PUBLIC PROC [t: TransID, requiredOutcome: {commit, abort}] = {l3351d2998e6\bg12B2f1 11f0 PrepareWorkerEntry: ENTRY PROC [w: Worker, requiredOutcome: {commit, abort}] = {l3704d3351\g20f1 10f0 SELECT w.state FROMl4057d3704\f1g6f0 9f1 4f0 ready => NULL; -- normal case.l4410d4057\g9f1 4f0 1f1 16f0 ENDCASE => RETURN; --we must have already finished.l4410d4057\f1g7f0 4f1 6f0 1f1 33f0 IF requestedOutcome = commit THEN CommitWorker[w: w, twoPhase: TRUE]l4057d3704\f1g2f0 27f1 4f0 30f1 4f0 ELSE AbortWorker[w: w, twoPhase: TRUE] };l4057d3704\f1g5f0 28f1 4f0 w: Worker _ WorkerTable.Lookup[t];l3704d3351\g12f1 11f0 IF w = NIL THEN RETURN; --we must have already finished.l3704d3351\f1g2f0 5f1 3f0 1f1 4f0 1f1 6f0 1f1 33f0 FinishWorkerEntry[w, requiredOutcome] };l3704d3351\g PrepareAndFinishWorker: PUBLIC PROC [t: TransID, requestedOutcome: {commit, abort}] RETURNS [{committed, aborted}] = {l3351d2998e6\bg22B2f1 11f0 49f1 7f0 PrepareAndFinishWorkerEntry: ENTRY PROC [w: Worker, requestedOutcome: {commit, abort}] RETURNS [{committed, aborted}] = {l3704d3351\g29f1 10f0 47f1 8f0 SELECT w.state FROMl4057d3704\f1g6f0 9f1 4f0 active => NULL; -- normal case.l4410d4057\g10f1 4f0 1f1 16f0 ENDCASE => ERROR; --this is local only, and protected by coordinator's monitor.l4410d4057\f1g7f0 4f1 5f0 1f1 62f0 IF requestedOutcome = commit AND PrepareToCommit[w: w, twoPhase: FALSE] THEN {l4057d3704\f1g2f0 27f1 4f0 32f1 5f0 2f1 5f0 CommitWorker[w: w, twoPhase: FALSE]; RETURN [committed] }l4410d4057\g29f1 5f0 2f1 8f0 ELSE {l4057d3704\f1g5f0 AbortWorker[w: w, twoPhase: FALSE]; RETURN [aborted] }l4410d4057\g28f1 5f0 2f1 8f0 w: Worker _ WorkerFromTransID[t: t, createWorkerIfNecessary: FALSE];l3704d3351\g61f1 5f0 IF w = NIL THEN ERROR; --this is local only, and protected by coordinator's monitor.l3704d3351\f1g2f0 5f1 3f0 1f1 4f0 1f1 5f0 1f1 62f0 PrepareAndFinishWorkerEntry[w, requestedOutcome] };l3704d3351\g PrepareToCommit: PROC [w: Worker, twoPhase: BOOLEAN] RETURNS [ready: BOOLEAN] = {l3351d2998e6\g17f1 4f0 23f1 7f0 1f1 8f0 9f1 7f0 -- Here we must check to ensure that the transaction has all of the resources necessary for commitment. This is much easier if they have actually been allocated already, for instance file creation and file lengthening not deferred. If updates have been deferred using update mode locks, these must be converted to exclusive mode at this time. If twoPhase = TRUE, then all exclusive locks must be written into the log.l3704d3351\f1g RETURN [ ... ] };l3704d3351\f1g6f0 CommitWorker: PROC [w: Worker, twoPhase: BOOLEAN _ TRUE] = {l3351d2998e6\g14f1 4f0 23f1 7f0 3f1 4f0 IF twoPhase THENl3704d3351\f1g2f0 10f1 4f0 [] _ Log.Write[record: [w.transID, workerCommitting[]], forceLog: TRUE];l4057d3704\g5f1 4f0 57f1 4f0 ELSEl3704d3351\f1g4f0 [] _ Log.Write[record: [w.transID, workerCommittingAndCoordinatorCommitted[]], forceLog: TRUE];l4057d3704\g5f1 4f0 80f1 4f0 -- Here we perform the volatile actions associated with transaction commitment. These are: make all buffer pool entries written by this transaction be the "truth" versions (discard old versions if present in buffer pool). Carry out other deferred updates such as file deletion and shortening. Finally, release locks held by this transaction.l3704d3351\f1g [] _ Log.Write[record: [w.transID, workerCommitted[]], forceLog: FALSE];l4057d3704\g5f1 4f0 56f1 5f0 };l3704d3351\g AbortWorker: PROC [w: Worker, twoPhase: BOOLEAN _ TRUE] = {l3351d2998e6\g13f1 4f0 23f1 7f0 3f1 4f0 IF twoPhase THENl3704d3351\f1g2f0 10f1 4f0 [] _ Log.Write[record: [w.transID, workerAborting[]], forceLog: FALSE];l4057d3704\g5f1 4f0 55f1 5f0 ELSEl3704d3351\f1g4f0 [] _ Log.Write[record: [w.transID, workerAbortingAndCoordinatorAborted[]], forceLog: FALSE];l4057d3704\g5f1 4f0 76f1 5f0 -- Log force is not required since transaction will be aborted in recovery (but we might do it anyway if it turns out to simplify recovery).l4410d4057\f1g -- Here we perform the volatile actions associated with transaction abortion. These are: discard all buffer pool entries written by this transaction. Undo non-deferred updates such as file creation and lengthening. Finally, release locks held by this transaction.l3704d3351\f1g [] _ Log.Write[record: [w.transID, workerAbortted[]], forceLog: FALSE];l4057d3704\g5f1 4f0 55f1 5f0 };l3704d3351\g END.--WorkerImpll3351d2998e1\f1b3f0B3f5b10f0B WorkerImpl2: MONITORl3351d2998e24\f5b13f1 IMPORTS Log, WorkerTable EXPORTS FileStorePrivate = BEGINl3351e1\f1b49B3b CreateWorker: PUBLIC PROC [t : TransID, stableCreation: BOOLEAN] RETURNS [Worker] = {l3351d2998e6\bg12B2f1 11f0 31f1 7f0 1f1 8f0 -- This implementation ensures that (1) no duplicate Workers (two Workers with the same transID) are created, (2) no monitors are locked when a remote call to RegisterWorker is made. This implementation sometimes makes a redundant remote call to RegisterWorker (in the case that two processes try to migrate the same transaction to the same FileStore at about the same time), but RegisterWorker must be prepared for duplicated calls anyway.l3704d3351\f1g LogWorker: ENTRY PROC [w: Worker] = {l3704d3351\g11f1 11f0 IF w.state = unknown THEN { [] _ Log.Write[record: [t, workerRegistered[]], forceLog: FALSE]; w.state _ active };l4057d3704\f1g3f0 17f1 6f0 8f1 3f0 50f1 5f0 IF stableCreation AND w.recoverableState = unknown THEN { [] _ Log.Force[]; w.recoverableState _ active } };l4057d3704\f1g3f0 15f1 4f0 28f1 6f0 7f1 3f0 w: Worker _ WorkerTable.Lookup[t];l3704d3351\g12f1 11f0 IF w # NIL THEN RETURN[w];l3704d3351\f1g3f0 4f1 15f0 IF FileStore[CoordinatorFileStoreIDFromTransID[t]].RegisterWorker[ t, MyFileStoreID[], stableCreation] THEN {l3704d3351\f1g12f0 91f1 4f0 w _ ConsWorker[transID: t, transState: unknown];l4057d3704\g LogWorker[w] };l4057d3704\g RETURN [w] };l3704d3351\f1g6f0 ConsWorker: ENTRY PROC [transID: TransID, transState: TransState] RETURNS [w: Worker] = {l3351d2998e6\g12f1 10f0 44f1 7f0 -- This procedure never takes very long, which is good since every transaction creation passes through this monitor.l3704d3351\f1g IF (w _ WorkerTable.Lookup[t]) # NIL THEN RETURN[w];l3704d3351\f1g3f0 5f1 11f0 14f1 15f0 w _ NEW[WorkerObject _ [l3704d3351\g4f1 3f0 transID: transID, state: transState, recoverableState: transState,l4057d3704\g coordinator: CoordinatorFileStoreIDFromTransID[transID],l4057d3704\g worker: FileStorePrivate.MyFileStoreID[] ]];l4057d3704\g8f1 17f0 WorkerTable.Insert[w];l3704d3351\f1g11f0 11f1 RETURN[w] };l3704d3351\f1g6f0 END.--WorkerImpl2l3351d2998e1\f1b3f0B3f5b11f0B FileStorePrivate: DEFS = BEGINl3351d2998e24k792\f5b18f1 4B3b l3351d2998e1\g FileStoreID: TYPE = MACHINE DEPENDENT RECORD [ ... ];l3351d2998e1\g13f1 4f0 3f1 24f0 FileObject: TYPE = RECORD [ file: FileID, trans: Worker, fileLockMode: LockMode ];l3351d2998e1\g12f1 4f0 3f1 6f0 l3351d2998e1\g TransactionID: TYPE = MACHINE DEPENDENT RECORD [ coordinatorFileStoreID: FileStoreID, idOnFileSystem: LONG CARDINAL ];l3351d2998e1\g15f1 4f0 3f1 24f0 56f1 13f0 -- Note that a FileStoreID, not a MachineID, is needed (because of CoordinatorFileStoreIDFromTransID).l3704d3351e1\g TransState: TYPE = MACHINE DEPENDENT { unknown, active, ready, collecting, committing, aborting, committed, aborted };l3351d2998e1\g12f1 4f0 3f1 17f0 -- or something like that ... this is the union of coordinator and worker states.l3704d3351e1\g CoordinatorObject: TYPE = MONITORED RECORD [ transID: TransID, state, recoverableState: TransState, beginRecordID: LogRecordID _ NULL, coordinator: FileStoreID, workers: LIST OF FileStoreID _ NIL ];l3351d2998e1\g19f1 4f0 3f1 16f0 87f1 4f0 37f1 7f0 15f1 3f0 WorkerObject: TYPE = MONITORED RECORD [ transID: TransID, state, recoverableState: TransState, coordinator: FileStoreID, worker: FileStoreID ];l3351d2998e1\g14f1 4f0 3f1 16f0 l3351d2998e1\g MyFileStoreID: PROC [] RETURNS [FileStoreID];l3351d2998e6\bg13B2f1 4f0 3f1 8f0 UniqueLongCardinalOnMyFileStore: PROC [] RETURNS [LONG CARDINAL];l3351d2998e6\bg31B2f1 4f0 3f1 8f0 2f1 13f0 ResolveUnknownTransaction: PROC [t: TransID] RETURNS [outcome: TransState {committed, aborted}];l3351d2998e6\bg25B2f1 4f0 14f1 7f0 PrepareAndFinishWorker: PROC [t: TransID, requestedOutcome: {commit, abort}] RETURNS [{committed, aborted}];l3351d2998e6\bg22B2f1 4f0 49f1 7f0 -- Caller (an implementation of BFS.FinishTranasction) asserts that w is the only worker of the transaction and that w.worker = w.coordinator. Caller requests w to finish with the specified outcome, writing the COMMIT record on w.coordinator. Worker can return the opposite of the requested outcome.l3704d3351e1\g212f1 6f0 CreateWorker: PROC [t: TransID] RETURNS [Worker];l3351d2998e6\bg12B2f1 4f0 13f1 8f0 -- If worker exists for this transaction, just returns it; otherwise creates it.l3704d3351\f1g l3351d2998e1\g END.--FileStorePrivatel3351d2998e1\f1b3f0B3f5b16f0B CoordinatorTable: DEFS = BEGINl3351d2998e24\f5b18f1 4f0 3f1 -- This is a monitor.l3704d3351\f1g Insert: PROC [c: Coordinator];l3351d2998e6\bg6B2f1 4f0 -- Caller asserts that no Coordinator x in the table has x.transID = c.transID. Add c to the table.l3704d3351\f1g Lookup: PROC [t: TransID] RETURNS [Coordinator];l3351d2998e6\bg6B2f1 4f0 14f1 7f0 -- Return c in table such that c.transID = t; return NIL if no such c exists.l3704d3351\f1g Delete: PROC [c: Coordinator];l3351d2998e6\bg6B2f1 4f0 -- Caller asserts that c is in the table. Remove c from table.l3704d3351\f1g END.--CoordinatorTablel3351d2998e1\f1b3f0B3f5b16f0B WorkerTable: DEFS = BEGINl3351d2998e24\f5b13f1 4f0 3f1 -- This is a monitor.l3704d3351\f1g Insert: PROC [w: Worker];l3351d2998e6\bg6B2f1 4f0 -- Caller asserts that no Worker x in the table has x.transID = w.transID. Add w to the table.l3704d3351\f1g Lookup: PROC [t: TransID] RETURNS [Worker];l3351d2998e6\bg6B2f1 4f0 14f1 7f0 -- Return w in table such that w.transID = t; return NIL if no such w exists.l3704d3351\f1g Delete: PROC [w: Worker];l3351d2998e6\bg6B2f1 4f0 -- Caller asserts that w is in the table. Remove w from table.l3704d3351\f1g END.--WorkerTablel3351d2998e1\f1b3f0B3f5b11f0B