FileStore public interfaces, version 8
6. Transactions and locks
All FileStore actions are carried out under transactions that ensure atomicity and consistency of updates in the face of concurrent requests from multiple clients, crashes of servers or clients, etc.
Briefly, a client first requests that some FileStore create a transaction; the FileStore subsequently serves as the coordinator of that transaction. If actions are to be carried out in other FileStores under the same transaction, the client must cause those FileStores to become workers for that transaction.
Reads and writes are performed under the transaction. The transaction machinery ensures that other clients will not see the state of the FileStore in a partially-updated (inconsistent) state; this is done by setting locks on files or parts of files, as discussed in section 6.2.
Eventually the client either commits or aborts the transaction. Committing causes all writes occurring under the transaction to be made permanent in the state of the FileStore and to be made visible to other transactions. Aborting causes all the writes to be abandoned, and leaves the permanent FileStore state as if those writes had never occurred. A transaction can also be aborted by a server or client crash or by a detected deadlock among transactions attempting to lock data in conflicting ways.
6.1. AlpineTransaction interface
All Mesa definitions in this section are declared in AlpineTransaction.
ConversationID: TYPE = RPC.ConversationID;
TransID: TYPE [9];
Create: PROCEDURE [conversation: ConversationID] RETURNS [trans: TransID];
Creates a new transaction, for which this FileStore is to be the coordinator.
A TransID may be treated as a capability, since it contains enough bits of unpredictable information to make it extremely difficult to forge. Therefore a transaction is shared among clients or private to the initiator according to whether or not the initiator hands the TransID to other clients. Alpine enforces no restrictions on who may present a particular TransID.
A FileStore on a workstation will not normally use its local file system to coordinate transactions involving multiple FileStores, since a crashed coordinator can tie up resources on other file systems. It is perfectly acceptable for a workstation’s FileStore to coordinate its own transactions, or to act as a worker in a multiple file system transaction, since in these cases the only resources that it can tie up in a crash are its own.
FileStore: TYPE = BodyDefs.RName;
CreateWorker: PROCEDURE [conversation: ConversationID, trans: TransID, coordinator: FileStore];
--! Unknown {coordinator, transID};
Informs the FileStore that it is to serve as a worker under an existing transaction trans which is being coordinated by coordinator. The client must call CreateWorker before performing any operations under trans on a FileStore that is not the transaction’s coordinator. CreateWorker gives rise to some communication between the worker and the coordinator; Unknown[coordinator] is raised if the worker cannot locate or cannot contact the coordinator.
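The calling sequence described above can be illustrated with a small in-memory model. This is a Python sketch, not Alpine code: the FileStore class, its registry (standing in for whatever name lookup resolves an RName), and the use of a random token as the TransID are all assumptions made for illustration, and the conversation argument is omitted.

```python
import secrets


class Unknown(Exception):
    """Models the Unknown error; `what` is 'transID' or 'coordinator'."""

    def __init__(self, what):
        super().__init__(what)
        self.what = what


class FileStore:
    # Hypothetical stand-in for name lookup: FileStore name -> FileStore object.
    registry = {}

    def __init__(self, name):
        self.name = name
        self.coordinated = set()   # transactions this FileStore coordinates
        self.working = {}          # trans -> coordinator name, for worker roles
        FileStore.registry[name] = self

    def create(self):
        # An unpredictable token, so that the TransID can act as a capability.
        trans = secrets.token_hex(16)
        self.coordinated.add(trans)
        return trans

    def create_worker(self, trans, coordinator):
        store = FileStore.registry.get(coordinator)
        if store is None:
            raise Unknown("coordinator")   # coordinator cannot be located
        if trans not in store.coordinated:
            raise Unknown("transID")       # coordinator does not know trans
        self.working[trans] = coordinator
```

A client would call create on one FileStore, then create_worker on every other FileStore it intends to touch under that transaction, before issuing any reads or writes there.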
RequestedOutcome: TYPE = {abort, commit, commitAndContinue};
Outcome: TYPE = {abort, commit, unknown};
Finish: PROCEDURE [conversation: ConversationID, trans: TransID, requestedOutcome: RequestedOutcome] RETURNS [outcome: Outcome, newTrans: TransID];
--! Unknown {transID};
Requests that transaction trans be finished in the specified way. This call must be made to the FileStore that is coordinating the transaction.
If transaction trans has already committed or aborted, Finish simply returns the outcome. Otherwise, Finish waits if necessary to assure a known outcome (which may differ from the one requested), and returns that outcome.
If requestedOutcome=commit (and outcome=commit), all actions previously requested are committed, all locks are released, and trans is terminated. If requestedOutcome=commitAndContinue (and outcome=commit), actions are likewise committed and trans terminated, but locks are downgraded rather than released, and a new transaction newTrans is created holding those locks. Locks are downgraded as follows: write, update, readIntendWrite, and readIntendUpdate to read; intendWrite and intendUpdate to intendRead. All OpenFileHandles referring to trans are changed to refer instead to newTrans, so the client may subsequently perform operations on existing open files under the new transaction.
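The downgrade rule for commitAndContinue can be written as a simple mapping. The following Python sketch is illustrative only; the treatment of read and intendRead (left unchanged) is an assumption, since the text does not mention them, and the object names in the usage lines are hypothetical.

```python
# Lock downgrade applied when Finish is called with commitAndContinue.
DOWNGRADE = {
    "write": "read",
    "update": "read",
    "readIntendWrite": "read",
    "readIntendUpdate": "read",
    "intendWrite": "intendRead",
    "intendUpdate": "intendRead",
    # Not mentioned in the text; assumed to carry over unchanged.
    "read": "read",
    "intendRead": "intendRead",
}


def downgrade_locks(held):
    """Given {object: mode} held by trans, return the locks carried into newTrans."""
    return {obj: DOWNGRADE[mode] for obj, mode in held.items()}


# Hypothetical lock set held by a transaction just before commitAndContinue.
held = {"fileA": "intendWrite", "fileA/page3": "write", "fileB": "readIntendUpdate"}
carried = downgrade_locks(held)
```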
The following are all the signals raised in AlpineTransaction:
Unknown: ERROR [what: UnknownType];
UnknownType: TYPE = {transID, coordinator};
6.2. Semantics of locks
This is a brief functional overview of Alpine file locks; for background material and further detail, see ‘‘Alpine lock manager concepts’’.
All locking is performed implicitly, as a side-effect of read and write operations on files and other objects. Consequently there is no public interface for dealing explicitly with locks; however, most operations provide optional means for a certain amount of client control over locks. Local clients, i.e., ones located on the same machine as the FileStore, can access the lock manager directly for the purpose of setting higher-level ‘‘logical’’ locks.
LockMode: TYPE = {read, update, write, intendRead, intendUpdate, intendWrite, readIntendUpdate, readIntendWrite};
When an object such as a file page is accessed, it is first locked in some mode. A read access must lock the object in read, update, or write mode, while a modify access must lock the object in update or write mode. Locking in update or write mode during a read access is sometimes useful when the client knows that it will later modify the object during the same transaction. The distinction between update and write modes will be explained shortly.
Once a lock is set on an object under some transaction, it will prevent certain types of access by any other transaction until the transaction holding the lock has terminated. The interaction between locks is defined by the function Compat[r, e], where r is a lock mode being requested and e is the mode of an existing lock on the same object by another transaction; it returns TRUE if r may be set immediately and FALSE if r may not be set until e has been removed.
Compat[read, read]=TRUE      Compat[read, update]=TRUE      Compat[read, write]=FALSE
Compat[update, read]=TRUE    Compat[update, update]=FALSE   Compat[update, write]=FALSE
Compat[write, read]=FALSE    Compat[write, update]=FALSE    Compat[write, write]=FALSE
The interaction between read and write locks is conventional: multiple readers or at most one writer (but not both) can coexist. An update lock is effectively a read lock at the time it is set but is converted automatically into a write lock at the time the transaction is committed (which is when any modifications to the locked object are logically performed); thus an update can proceed in parallel with any ongoing reads, but committing the update may require waiting until all read locks have been removed.
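For reference, the nine Compat entries can be transcribed into a lookup table. This Python sketch only restates the table; the helper can_set, which checks a request against all locks held by other transactions on the same object, is an illustrative addition and not part of any Alpine interface.

```python
# Compat[r, e]: r is the requested mode, e is the mode of an existing lock
# held by another transaction on the same object.
COMPAT = {
    ("read", "read"): True,    ("read", "update"): True,    ("read", "write"): False,
    ("update", "read"): True,  ("update", "update"): False, ("update", "write"): False,
    ("write", "read"): False,  ("write", "update"): False,  ("write", "write"): False,
}


def compat(requested, existing):
    return COMPAT[(requested, existing)]


def can_set(requested, existing_modes):
    """True if `requested` may be set immediately, given other transactions' locks."""
    return all(compat(requested, e) for e in existing_modes)
```

Note that the table is not symmetric: a read may be set against an existing update lock and an update against existing reads, but a write is blocked by any existing lock.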
A transaction may upgrade one of its own locks by converting it to a stronger lock, with relative strength given by read < update < write. This is performed automatically, for example, when an object is read and later written during a single transaction. Of course, this upgrade may be blocked by locks set by other transactions.
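The strength ordering can be expressed as a small Python sketch. The function name is hypothetical, and the sketch ignores blocking by other transactions; it only computes the mode a transaction ends up holding.

```python
# Relative strength of the basic lock modes: read < update < write.
STRENGTH = {"read": 0, "update": 1, "write": 2}


def upgraded_mode(held, needed):
    """Mode held after an access that needs `needed`, given the mode already held."""
    return held if STRENGTH[held] >= STRENGTH[needed] else needed
```

For example, an object read and later written in the same transaction goes from read to write, while a further read leaves the write lock in place.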
Files are locked at two levels: the entire file may be locked in some mode, or pages, properties, etc., within the file may be individually locked. Locking the entire file substantially reduces the amount of work the server must do, since individual locks then need not be applied during each operation on the file; this is the appropriate style of locking for bulk file transfers and transactions involving private files. On the other hand, locking individual pages is required for shared data bases to maintain adequate concurrency.
The whole-file lock modes are read, update, and write, just as for other objects; these locks effectively ‘‘cover’’ individual operations whose modes are no stronger than the whole-file lock.
When pages of a file are to be locked individually as operations are performed, the entire file is locked in a weaker intention mode that specifies the strongest page lock expected; these modes are intendRead, intendUpdate, and intendWrite. For all x,y, Compat[intendx, intendy]=TRUE, but Compat[intendx, y]=Compat[x, y] and Compat[x, intendy]=Compat[x, y]. This enables detection of potential conflicts between page and file locks at the time the file is locked.
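The intention-mode rule extends the basic Compat table mechanically. The Python sketch below covers the three basic modes and the three intention modes only; the combination modes are not handled, and the helper names are illustrative assumptions.

```python
# Basic Compat table: (requested, existing) -> may be set immediately.
BASE = {
    ("read", "read"): True,    ("read", "update"): True,    ("read", "write"): False,
    ("update", "read"): True,  ("update", "update"): False, ("update", "write"): False,
    ("write", "read"): False,  ("write", "update"): False,  ("write", "write"): False,
}


def split(mode):
    """Return (is_intention, base_mode), e.g. 'intendWrite' -> (True, 'write')."""
    if mode.startswith("intend"):
        base = mode[len("intend"):]
        return True, base[0].lower() + base[1:]
    return False, mode


def compat(requested, existing):
    r_intent, r = split(requested)
    e_intent, e = split(existing)
    if r_intent and e_intent:
        return True          # two intention locks never conflict at the file level
    return BASE[(r, e)]      # otherwise the intention prefix is ignored
```

Under this rule a whole-file read lock conflicts with another transaction's intendWrite, so the potential page-level conflict is caught when the file is locked rather than when the first page is touched.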
If an individual operation would require a lock stronger than the one implied by an intention-mode file lock, the file lock is automatically upgraded; this upgrade can of course be blocked by file locks set by other transactions.
Additionally, there are combination modes readIntendUpdate and readIntendWrite, which immediately lock the entire file in read mode with intention to perform individual operations requiring update or write locks.
Most operations have an optional lockOption argument, which permits the client to specify the lock mode to be used and the action to be taken if a conflict occurs; the default values of these arguments are appropriate for most applications. If lockOption.mode is too weak for the operation, it is ignored and the default used instead. If the lock cannot immediately be set and lockOption.ifConflict=fail, the operation immediately raises the error LockFailed[conflict]; otherwise the operation waits until the lock can be set, raising LockFailed[deadlock] or LockFailed[timeout] if a deadlock or timeout occurs. Not all deadlocks can immediately be detected as such; a lock timeout is treated as evidence of a deadlock. <The timeout interval is to be determined.>
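The conflict policy can be sketched as follows in Python. Several things here are assumptions: the value name "wait" for the non-failing policy, the polling loop (a real lock manager would presumably queue the request), and the timeout value, which the text leaves undetermined. Deadlock detection is omitted, so only the conflict and timeout failures are modeled.

```python
import time


class LockFailed(Exception):
    """Models LockFailed; `failure` is 'conflict' or 'timeout' in this sketch."""

    def __init__(self, failure):
        super().__init__(failure)
        self.failure = failure


def set_lock(can_set_now, if_conflict="wait", timeout_s=0.05, poll_s=0.01):
    """Try to set a lock.

    can_set_now: zero-argument callable reporting whether the requested mode
    is currently compatible with all locks held by other transactions.
    """
    if can_set_now():
        return
    if if_conflict == "fail":
        raise LockFailed("conflict")       # fail immediately on conflict
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:     # otherwise wait for the lock
        time.sleep(poll_s)
        if can_set_now():
            return
    raise LockFailed("timeout")            # treated as evidence of deadlock
```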
7. FileStore global operations
The AlpineAccess interface contains procedures for querying the global state of the FileStore and for operating on the owner data base.
nullVolumeGroupID: VolumeGroupID = [...];
GetNextVolumeGroup: PROCEDURE [conversation: ConversationID, previousGroup: VolumeGroupID] RETURNS [group: VolumeGroupID, members: LIST OF VolumeID];
--! Unknown {volumeGroupID};
Enables the client to enumerate the volume groups and volumes comprising a FileStore. If previousGroup=nullVolumeGroupID, GetNextVolumeGroup returns the first volume group; if previousGroup is the last volume group in the FileStore, it returns nullVolumeGroupID.
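A client-side enumeration loop built on this protocol might look like the following Python sketch. The in-memory fake_get_next function and its group and volume names are hypothetical stand-ins for a real FileStore, None stands in for nullVolumeGroupID, and the conversation argument is omitted.

```python
NULL_GROUP = None  # stands in for nullVolumeGroupID


def enumerate_volume_groups(get_next_volume_group):
    """Yield (group, members) pairs by calling get_next_volume_group repeatedly."""
    previous = NULL_GROUP
    while True:
        group, members = get_next_volume_group(previous)
        if group == NULL_GROUP:
            return                 # previous was the last group
        yield group, members
        previous = group


# Hypothetical FileStore with two volume groups.
_GROUPS = [("vg1", ["volA", "volB"]), ("vg2", ["volC"])]


def fake_get_next(previous):
    ids = [g for g, _ in _GROUPS]
    if previous == NULL_GROUP:
        index = 0
    elif previous in ids:
        index = ids.index(previous) + 1
    else:
        raise KeyError("Unknown[volumeGroupID]")
    return _GROUPS[index] if index < len(_GROUPS) else (NULL_GROUP, [])
```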
<This description will surely evolve to include procedures for obtaining additional information about VolumeGroups and individual Volumes.>
Each VolumeGroup has associated with it an owner data base, as mentioned in section 4. The following procedures operate on the owner data base in various ways. All operations are performed under a transaction specified by the client. Many of the operations require that the client be a member of the ‘‘Alpine wheels’’ group, as discussed in section 4.2.
PageCount: TYPE = AlpineEnvironment.PageCount;
CreateOwner: PROCEDURE [conversation: ConversationID, trans: TransID, vol: VolumeGroupID, owner: OwnerName, spaceQuota: PageCount, createAccess, ownerAccess: AccessList];
--! AccessFailed {alpineWheel}, LockFailed, OperationFailed {ownerAlreadyExists, ownerDataBaseFull, tooManyRNames}, Unknown {transID, volumeGroupID};
DestroyOwner: PROCEDURE [conversation: ConversationID, trans: TransID, vol: VolumeGroupID, owner: OwnerName];
--! AccessFailed {alpineWheel}, LockFailed, OperationFailed {spaceInUseByThisOwner}, Unknown {owner, transID, volumeGroupID};
GetNextOwner: PROCEDURE [conversation: ConversationID, trans: TransID, vol: VolumeGroupID, previousOwner: OwnerName, lockDataBase: BOOLEAN ← FALSE] RETURNS [owner: OwnerName, spaceQuota, spaceConsumed: PageCount, createAccess, ownerEntryAccess: AccessList];
--! LockFailed, Unknown {volumeGroupID};
<lockDataBase=FALSE is of doubtful value, since a complete enumeration will read-lock the entire data base anyway.>
Each owner has associated with it two values relating to disk usage, spaceQuota and spaceConsumed, and two access control lists, createAccess and ownerEntryAccess. Members of an owner’s createAccess list are permitted to create files whose disk space is charged to that owner. Members of ownerEntryAccess are permitted to make changes to the owner entry itself.
OwnerProperty: TYPE = {space, createAccess, ownerEntryAccess};
OwnerPropertyValuePair: TYPE = RECORD [
SELECT property: OwnerProperty FROM
space => [spaceQuota, spaceConsumed: PageCount],
createAccess => [createAccess: AccessList],
ownerEntryAccess => [ownerEntryAccess: AccessList],
ENDCASE];
ReadOwnerProperties: PROCEDURE [conversation: ConversationID, trans: TransID, vol: VolumeGroupID, owner: OwnerName, desiredProperties: LIST OF OwnerProperty ← NIL] RETURNS [properties: LIST OF OwnerPropertyValuePair];
--! LockFailed, Unknown {owner, transID, volumeGroupID};
If desiredProperties=NIL, ReadOwnerProperties returns all properties for owner, ordered as in the declaration of OwnerProperty; otherwise it returns the desired properties in the order requested.
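The ordering rule can be illustrated with a Python sketch over a hypothetical owner entry represented as a dictionary; the representation and function name are assumptions, and locking and error handling are omitted.

```python
# Declaration order of OwnerProperty.
OWNER_PROPERTY_ORDER = ("space", "createAccess", "ownerEntryAccess")


def read_owner_properties(entry, desired_properties=None):
    """Return (property, value) pairs; None or an empty list models NIL."""
    order = desired_properties or OWNER_PROPERTY_ORDER
    result = []
    for name in order:
        if name == "space":
            # The space variant carries both spaceQuota and spaceConsumed.
            result.append((name, (entry["spaceQuota"], entry["spaceConsumed"])))
        else:
            result.append((name, list(entry[name])))
    return result
```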
WriteOwnerProperties: PROCEDURE [conversation: ConversationID, trans: TransID, group: VolumeGroupID, owner: OwnerName, properties: LIST OF OwnerPropertyValuePair];
--! AccessFailed {alpineWheel, ownerEntry}, LockFailed, OperationFailed {tooManyRNames}, Unknown {owner, transID, volumeGroupID};
Writes all properties given in the properties list. In the case of the space property, only spaceQuota is written and spaceConsumed is ignored. Writing createAccess requires that the client be a member of the owner’s ownerEntryAccess list; writing space or ownerEntryAccess requires that the client be an Alpine wheel.
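The access rules for writing owner properties can be sketched in Python as follows. The dictionary representation of the owner entry is hypothetical. Two behaviors are assumptions not stated in the text: all access checks run before any property is written, and an Alpine wheel gets no special exemption from the ownerEntryAccess check when writing createAccess.

```python
class AccessFailed(Exception):
    """Models AccessFailed; missing_access is 'alpineWheel' or 'ownerEntry'."""

    def __init__(self, missing_access):
        super().__init__(missing_access)
        self.missing_access = missing_access


def write_owner_properties(entry, client, is_wheel, properties):
    """Apply a list of (property, value) pairs to the owner entry."""
    # Assumption: check every property before writing any of them.
    for name, _ in properties:
        if name in ("space", "ownerEntryAccess") and not is_wheel:
            raise AccessFailed("alpineWheel")
        if name == "createAccess" and client not in entry["ownerEntryAccess"]:
            raise AccessFailed("ownerEntry")
    for name, value in properties:
        if name == "space":
            quota, _consumed = value       # spaceConsumed is ignored on write
            entry["spaceQuota"] = quota
        else:
            entry[name] = list(value)
```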
The following are all the signals raised by the procedures described in this section:
AccessFailed: ERROR [missingAccess: NeededAccess];
NeededAccess: TYPE = {alpineWheel, ownerEntry};
LockFailed: ERROR [failure: LockFailure];
LockFailure: TYPE = {..., deadlock, timeout};
OperationFailed: ERROR [why: OperationFailure];
OperationFailure: TYPE = {ownerAlreadyExists, ownerDataBaseFull, tooManyRNames, spaceInUseByThisOwner};
Unknown: ERROR [what: UnknownType];
UnknownType: TYPE = {owner, transID, volumeGroupID};
References
[Alpine, 1981]
Alpine documentation is kept on-line in the [Ivy]<Alpine>doc> directory. In the file names, ‘‘*’’ stands for a number; successive releases of a document are assigned the next higher number.
‘‘Alpine file server overview’’, AlpineOverview.press
‘‘Alpine lock manager concepts’’, LockConcepts*.bravo
‘‘FileStore interface internals’’, FileStoreInternal*.bravo
[Needham & Schroeder, 1978]
Roger M. Needham and Michael D. Schroeder, ‘‘Using Encryption for Authentication in Large Networks of Computers’’, Communications of the ACM, vol. 21, no. 12, December 1978.
Change history
Version 8; October 13, 1981 5:29 PM. Add AlpineFile.Close, LockPages, UnlockPages. Finishing a transaction with commitAndContinue now returns a new TransID rather than permitting additional operations under the old transaction. Substantially simplify interface to RPC runtime machinery.
Version 7; September 21, 1981 8:38 AM. Further refine to conform with RPC design. Add high water mark mechanism. Delete the individual property read/write procedures. Document semantics of locks. Change LockMode to LockOption. Change to read/write owner entry in same style as file properties.
Version 6; September 14, 1981 8:46 AM. Change style of this memo to make it serve better as client programmer’s documentation for the FileStore public interface; remove internal details (implementation strategies and the like) to a separate memo, ‘‘FileStore interface internals’’. Split the former FileStore interface into four pieces: AlpineEnvironment, AlpineAccess, AlpineFile, and AlpineTransaction. Bring conversation initiation and authentication into conformity with the current RPC design. Make a first cut at specifying the signals.
Version 5; August 25, 1981 12:41 PM. Make interface handle-oriented, since RPC requires it; see Handle, Create, Destroy. A Handle corresponds to a client talking to a server, not to a volume or a machine; some procedures take a volume (or volume group) parameter. Make the identification FileStore = log, and introduce a separate notion of logical disk volume and volume group. Add no-logging option to Open/CreateFile; this takes the place of noTransaction, since it handles the one situation in which we can see that clients will want to give up recovery to reduce the size of the log (large-scale replication of files, where the home copy is kept logged and other copies are unlogged.) OpenFileID -> OpenFileHandle, AccessListID -> AccessList (a list of RNames.) Add section on owner database (CreateOwner, ... , DestroyOwner), and expanded file properties section to include all properties we now plan to support.
Version 4; May 29, 1981 5:36 PM. Introduced notion of file owner, who is charged/credited for pages used/released in Create/DeleteFile and SetLength.
Version 3; May 15, 1981 11:02 AM. Introduced types OpenFileID, ClientID, AccessListID, ... . Added noTransaction. More detail on file opening and creation; support file create with and without an externally supplied ID (used to be only with.) Length -> ByteLength throughout. List-oriented file property operations. Much more work required on property interface, including locking issues.
Version 2; March 12, 1981 11:22 AM. Added more detail to the ‘‘Files’’ section, including procedures to manipulate file attributes. Made lock modes consistent with LockDesign0. Made minor changes to Transaction section: RegisterWorker takes a ‘‘restart ID’’ to allow unilateral abort of worker; FinishTransaction and FinishWorker take a requiredOutcome that may be commitAndContinue or commitAndTerminate; expanded comments on Transaction section operations.
Version 1; February 25, 1981 3:54 PM. Lock stuff moved out (to LockConcepts0.bravo, LockDesign0.bravo.) Sierra -> Alpine.
Version 0; February 19, 1981 3:16 PM. Interface name changed from ‘‘BFS’’. We are attempting to erase the distinction between the regular interface and ‘‘on the wire’’. The volatile structures ‘‘Coordinator’’, ‘‘Worker’’, and ‘‘File’’ have disappeared. The type LockID has been defined, following Ed Taft’s suggestion.
Unfinished business
Read ahead / write behind hints in the interface. Be sure that fast streams can be implemented on top of FileStore.
Log/recovery interface.
Cache registration (at what level should this be done?)
Lock interface (no conceptual difficulty, details can be deferred to implementation time.)
There used to be a CoordinatorFileSystemIDFromTransID operation, which is now hard to implement given that FileStores are identified by RNames rather than UniqueIDs. How necessary is this operation?
ContentID property?
Indefinitely deferred business
Validation of remote page caches (through a history database that is coordinated with file page updates.)
Transaction save points?