FileStore interface, version 1
CSL Notebook Entry
To: Alpine designers
Date: February 25, 1981
From: Mark Brown
Location: PARC/CSL
Subject: FileStore interface, version 1
File: [Ivy]<Alpine>Doc>FileStoreDesign1.bravo
Attributes: informal, technical, Database, Distributed computing, Filing
Abstract: This memo proposes a design for Alpine’s FileStore interface. The design is far from complete, but the facilities for multiple file system transactions have been worked out in some detail.
Change history
February 25, 1981 3:54 PM. Lock stuff moved out (LockConcepts0.bravo, LockDesign0.bravo). Sierra -> Alpine.
February 19, 1981 3:16 PM. Interface name changed from "BFS". We are attempting to erase the distinction between the regular interface and "on the wire". The volatile structures "Coordinator", "Worker", and "File" have disappeared. The type LockID has been defined, following Ed Taft’s suggestion.
Introduction
This memo proposes a design for Alpine’s FileStore interface. FileStore is a low-level file system interface, several instances of which may exist on a single machine. Some instances will represent local file systems, while another may have a stub implementation that talks to a remote file system via RPC.
FileStore provides primitives for coordinating transactions that involve multiple file systems. These are given in the sections "Transaction Coordinator" and "Transaction Worker" below. Some care has been taken to provide primitives that can take advantage of important special cases such as single file system transactions and readonly transactions. We do not take the Lampson-Sturgis paper’s "always ready" approach to transactions, since the benefits of this approach are unclear while its cost is a heavier transaction mechanism. The transaction primitives are arranged so that duplicate calls, which will happen occasionally with a cheap RPC, will not cause problems. Some aspects of the interface are designed with an eye to supporting replicated commit records as a future option, but the interface does not fully support them now.
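The point about duplicate calls can be illustrated with a minimal sketch. It is written in Python rather than Mesa, and it models only the coordinator's bookkeeping: the names Coordinator, State, and forget are hypothetical, the TransID is assumed to be a (FileStoreID, sequence number) pair, and the worker prepare phase and stable storage are left out entirely.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    COMMITTED = "committed"
    ABORTED = "aborted"


class Coordinator:
    """Toy, in-memory stand-in for the coordinator half of a FileStore."""

    def __init__(self, store_id):
        self.store_id = store_id
        self._next_seq = 0
        self._state = {}    # TransID -> State
        self._workers = {}  # TransID -> set of worker FileStoreIDs

    def create_transaction(self):
        # Assumption: a TransID is a (FileStoreID, sequence number) pair.
        t = (self.store_id, self._next_seq)
        self._next_seq += 1
        self._state[t] = State.ACTIVE
        self._workers[t] = set()
        return t

    def register_worker(self, t, w):
        # Refused unless t is active; a repeated registration is a no-op.
        if self._state.get(t) is not State.ACTIVE:
            return False
        self._workers[t].add(w)
        return True

    def finish_transaction(self, t, requested):
        if t[0] != self.store_id:
            raise ValueError("transaction was not created on this FileStore")
        state = self._state.get(t)
        if state is None:
            return "unknown"  # no record left of this transaction
        if state is State.ACTIVE:
            # The outcome is decided exactly once, by the first call.
            state = State.COMMITTED if requested == "commit" else State.ABORTED
            self._state[t] = state
        # A duplicate or late call only reads back the recorded outcome.
        return state.value

    def forget(self, t):
        # Hypothetical helper: discard the record of an old transaction.
        self._state.pop(t, None)
        self._workers.pop(t, None)
```

In this model a second finish_transaction call on the same transaction returns the outcome chosen by the first call, whatever it requests, and a second register_worker call for the same worker leaves the worker set unchanged.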
A FileStore on a workstation will not normally use its local file system to coordinate multiple file system transactions, since a crashed coordinator can tie up resources on other file systems. It is perfectly ok for a workstation’s FileStore to coordinate its own transactions, or to act as a worker in a multiple file system transaction, since in these cases the only resources that it can tie up in a crash are its own.
Several planned-for or at least talked-about items are missing from the interface as proposed here. Assuming that we wish to build a file transfer protocol on top of FileStore, the FileStore should accept some form of prefetch hint to help it schedule transfers. Databases and fast streams may wish to deal with groups of pages (what Pilot calls "uniform swap units") rather than single pages. Databases will want to log application-specific undo and redo actions and set application-specific locks. We planned to support encryption at a low level. The FileStore should perform low-level authentication, at least to the extent of verifying that it is being called by a known piece of software. We have not yet specified the means of validating file caches or manipulating file properties (read, write, and create times, string names, etc.), or any form of "broken read lock", transaction commit without releasing locks (Juniper "checkpoint"), or transaction save point.
FileStore: DEFS = BEGIN
FileStoreID: TYPE = System.UniversalID;
-- Unique ID of a file store.
FileID: TYPE = System.UniversalID;
-- Unique ID of a file.
TransID: TYPE [SIZE[FileStoreID]+SIZE[LONG CARDINAL]];
-- Unique ID of a transaction.
LockMode: TYPE = MACHINE DEPENDENT
{ S, U, X, IS, IU, IX, SIU, SIX };
-- or something like that ... . This type will really be defined in an interface specific to locks.
-- Files
CreateFile: PROC [f: FileID, initialPageLength: PageLength, t: TransID];
-- Pass in FileID so that a copy of an immutable file can be created.
DeleteFile: PROC [f: FileID, t: TransID];
-- This could be made available in both U and X mode (see WritePage below), but it is probably not worth the trouble.
LockFile: PROC [f: FileID, t: TransID, fileLockMode: LockMode];
-- Locks file f in mode fileLockMode for access under transaction t.
ReadPage: PROC [f: FileID, t: TransID, p: PageIndex, pageLockMode: LockMode {S, U, X} ← S]
RETURNS [data: Page];
-- Reads data from page p of file f. Client will specify a pageLockMode stronger than S if it expects to update the page later in the transaction. Page locking will not be done if an existing file lock subsumes the required page lock. An existing file lock may be converted to a stronger mode in order to lock the page in the requested mode.
WritePage: PROC
[f: FileID, t: TransID, p: PageIndex, data: Page, pageLockMode: LockMode {U, X} ← U];
-- Writes data on page p of file f. Client will specify pageLockMode = U if it wishes WritePage to return immediately without waiting for readers of page p to leave (client may still have to wait, but waiting will happen at commit time).
GetLength: PROC [f: FileID, t: TransID, lengthLockMode: LockMode {S, U, X} ← S]
RETURNS [pageLength: PageLength];
SetLength: PROC
[f: FileID, t: TransID, pageLength: PageLength, lengthLockMode: LockMode {U, X} ← U];
-- Transaction Coordinator
CreateTransaction: PROC [stableCreation: BOOLEAN ← FALSE] RETURNS [t: TransID];
-- Creates a Coordinator for a brand-new transaction. If stableCreation, then CreateTransaction will not return until a stable record of creating the transaction has been written. Without stableCreation, there is a small window in which a coordinator crash will cause the coordinator to forget that the transaction ever existed. Since this crash certainly aborts the transaction, stableCreation is not mandatory.
CoordinatorFileSystemIDFromTransID: PROC [t: TransID] RETURNS [FileStoreID];
-- Returns location of coordinator of transaction t ("location of coordinator" means "file system that will record the outcome"). We expect the encoding of the FileStoreID in the TransID to be relatively simple.
RegisterWorker: PROC [t: TransID, w: FileStoreID, stableRegistration: BOOLEAN ← FALSE] RETURNS [success: BOOLEAN];
-- Requests permission to perform actions under transaction t on file system w. The request is a noop if w is already a worker for t. The request is refused (returns success = FALSE) if t.state # active. If stableRegistration, then RegisterWorker will not return until the registration has been written to stable storage. Without stableRegistration, there is a small window in which a coordinator crash will cause the coordinator to forget that the worker is part of the transaction. Since this crash is guaranteed to abort the transaction, stableRegistration is not mandatory, but it allows workers to have a higher tolerance of transaction inactivity before unilaterally aborting a transaction.
FinishTransaction: PROC [t: TransID, requestedOutcome: {commit, abort},
commitRecordPlaces:
LIST OF FileStoreID ← NIL]
RETURNS [outcome: {committed, aborted, unknown}];
-- Caller asserts that t was created on this FileStore (ERROR if not true), and requests t to commit or abort. If the transaction t has already committed or aborted, returns the outcome; returns unknown if t is so old that the coordinator cannot find the outcome. Otherwise, the COMMIT record will be written to commitRecordPlaces (NIL means this file store).
-- Transaction Worker
PrepareWorker: PROC [t: TransID, requestedOutcome: {commit, abort}, commitRecordPlaces: LIST OF FileStoreID]
RETURNS [{readonlyReady, ready, notReady}];
-- Caller (an implementation of FinishTransaction above) requests the worker w to get ready to finish (commit or abort). Worker immediately returns ready if it is already ready. The commit record for transaction t will be stored on the indicated file systems (passing the places allows for future use of replicated commit, in which a ready worker can attempt to determine the transaction outcome itself). If w cannot guarantee to commit when FinishTransaction is called (for instance, if no volatile record of the worker can be found on this FileStore), it must respond notReady. (readonlyReady is the same as ready, but the worker certifies that phase two will be a no-op for it, regardless of the required outcome.)
FinishWorker: PROC [t: TransID, requiredOutcome: {commit, abort}];
-- Caller (an implementation of FinishTransaction above) asserts that w is recoverably in the ready state (because of a call on PrepareWorker), and demands that w finish with the specified outcome. Note that if no volatile record of the worker can be found on this FileStore, then the worker has already finished due to a previous call on FinishWorker, and this call should return immediately.
END.--FileStore
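The way an implementation of FinishTransaction might drive PrepareWorker and FinishWorker can be sketched as follows. This is a Python model rather than Mesa, and it makes several assumptions that the interface does not state: the Worker class and its wrote and can_prepare flags are hypothetical, the commit record and all stable storage are omitted, and a worker that answers notReady is assumed to abort on its own, so phase two is sent only to workers that answered ready.

```python
class Worker:
    """Toy worker half of a FileStore, holding state for one transaction."""

    def __init__(self, name, wrote=True, can_prepare=True):
        self.name = name
        self.wrote = wrote              # did it update anything under t?
        self.can_prepare = can_prepare  # can it promise to commit?
        self.state = "active"           # active -> ready | readonly -> finished
        self.outcome = None

    def prepare_worker(self, t, requested):
        # Duplicate calls simply repeat the earlier answer.
        if self.state == "ready":
            return "ready"
        if self.state == "readonly":
            return "readonlyReady"
        if self.state != "active" or not self.can_prepare:
            return "notReady"
        if not self.wrote:
            # Phase two will be a no-op whatever the outcome.
            self.state = "readonly"
            return "readonlyReady"
        self.state = "ready"
        return "ready"

    def finish_worker(self, t, required):
        if self.state != "ready":
            return  # already finished or read-only: nothing to do
        self.outcome = required
        self.state = "finished"


def finish_transaction(t, requested, workers):
    """Two-phase finish over a list of Worker objects (a sketch only)."""
    # Phase one: ask every worker to get ready.
    replies = {w.name: w.prepare_worker(t, requested) for w in workers}
    if requested == "commit" and "notReady" not in replies.values():
        outcome = "commit"
    else:
        outcome = "abort"
    # A real coordinator would record the outcome stably at this point.
    # Phase two: only workers that answered ready need to be told.
    for w in workers:
        if replies[w.name] == "ready":
            w.finish_worker(t, outcome)
    return "committed" if outcome == "commit" else "aborted"
```

In this model a read-only worker is dropped after phase one, which is the saving that readonlyReady is meant to allow, and a repeated finish_worker call on a finished worker returns without changing its outcome.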