OpenFileMap and LogMap, version 1
Inter-Office Memorandum
To: Alpine project        Date: September 8, 1981
From: Ed Taft             Location: PARC/CSL
Subject: OpenFileMap and LogMap, version 1        File: [Ivy]<Alpine>Doc>FileLogMap1.bravo
XEROX
Attributes: informal, draft, technical, Alpine, distributed computing, filing
Abstract: OpenFileMap and LogMap are internal Alpine interfaces supporting FileStore. They maintain volatile state information required for efficient access to files and to the log.
The OpenFileMap and LogMap are volatile structures supporting the implementation of the Alpine FileStore. They are accessed through internal interfaces that are not exported (via RPC) to Alpine clients.
The OpenFileMap maintains state information associated with an OpenFileHandle, which represents a single client’s access to a single file under a single transaction.
The LogMap maintains, in an efficiently accessible form, the differences between the abstract state of the FileStore (including both committed and uncommitted transactions) and the actual contents of the base file system. The LogMap is an entirely redundant version of the Log, and much of its contents are simply references to Log entries.
OpenFileMap
Opening a file, as presently specified in FileStore, serves two purposes:
1. It defines a point at which file access control and whole-file locking are performed.
2. It associates a brief handle, the OpenFileMap.Handle, with a considerable amount of state that is thereafter kept by the server and need not be supplied in each call. This state includes the transaction ID, the client’s identity, access rights to the file, and any additional per-file state that FileStore may wish to retain from one file operation to the next.
It is the function of OpenFileMap to maintain this state information. However, OpenFileMap understands only the syntax and not the semantics of the state information. It is the caller’s responsibility to authenticate the ClientID, verify the access rights, register the lock mode with the Locks manager, etc.
Interface
OpenFileMap: DEFINITIONS =
BEGIN
Handle: TYPE [2];
A Handle must be unique for all time, to enable detection of dangling references. However, it is specific to a particular FileStore instance, so it need not be a full UniqueID. A Handle is not a capability: all uses of it must be validated by the caller, by calling Validate (below), before calling any other procedures exported via OpenFileMap.
ConversationID: TYPE = RPC.ConversationID;
The handle associated with a (potentially) secure conversation with some client. It is maintained by RPC and appears as an argument of each procedure exported to remote clients by FileStore.
Create: PROCEDURE [conversation: ConversationID, client: ClientID, trans: TransID, volume: VolumeID, file: FileID] RETURNS [Handle];
The caller asserts that conversation represents an authenticated conversation with the client denoted by ClientID. The other arguments are not checked but are simply remembered in state associated with the returned Handle. Remaining fields of that state are initialized to reasonable default values (e.g., no locks, no access, etc.).
Passing a VolumeID may or may not be necessary, depending on our decision whether or not to provide a way to replicate a file with the same FileID as the original.
Validate: PROCEDURE [handle: Handle, conversation: ConversationID];
Raises a signal if handle is invalid or if conversation does not match the ConversationID associated with handle. This is intended to be called upon entry to every procedure exported from FileStore that operates on an open file. Its implementation is expected to be efficient.
Destroy: PROCEDURE [handle: Handle];
Immediately destroys the Handle and associated state information. Should be called when a transaction has been committed or aborted. (Do we need to keep a file open between committing and completing the transaction? I think not.)
SetLockMode: PROCEDURE [handle: Handle, lockMode: LockMode];
SetAccessRights: PROCEDURE [handle: Handle, accessRights: AccessRights];
... etc., for setting state information that may vary while a file is open. (These are perhaps not good examples.)
GetClientID: PROCEDURE [handle: Handle] RETURNS [ClientID];
GetFileID: PROCEDURE [handle: Handle] RETURNS [FileID];
... etc., to retrieve state information associated with the open file.
Enumerate: PROCEDURE [proc: EnumerationProc];
EnumerateTransaction: PROCEDURE [proc: EnumerationProc, trans: TransID];
EnumerationProc: TYPE = PROCEDURE [handle: Handle];
For each entry in the OpenFileMap (Enumerate), or for each entry corresponding to trans (EnumerateTransaction), calls proc[handle].
EnumerateTransaction is used to find all Handles belonging to a transaction; its implementation is expected to be efficient. It is used twice when preparing to commit a transaction: first to flush (via the FilePageMgr) all pages of files written by the transaction, and then to destroy all the Handles themselves.
END.
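By way of illustration, here is a sketch of the intended calling pattern. The FileStore procedures shown are hypothetical and their argument lists abbreviated; only the OpenFileMap calls are taken from the interface above.

ReadProperties: PUBLIC PROCEDURE [conversation: ConversationID, openFile: OpenFileMap.Handle] =
BEGIN
OpenFileMap.Validate[handle: openFile, conversation: conversation];  -- signals if stale or mismatched
-- ... the operation proper, using state retrieved via GetClientID, GetFileID, etc. ...
END;

PrepareToCommit: PROCEDURE [trans: TransID] =
BEGIN
FlushOne: OpenFileMap.EnumerationProc =  -- [handle: Handle]
BEGIN
-- flush, via the FilePageMgr, all pages of the file written under trans
END;
OpenFileMap.EnumerateTransaction[proc: FlushOne, trans: trans];  -- first pass: flush
OpenFileMap.EnumerateTransaction[proc: OpenFileMap.Destroy, trans: trans];  -- second pass: destroy
-- (the second pass assumes the implementation tolerates deletion during enumeration)
END;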
Implementation
It is possible to assign realistic yet relatively modest bounds on the total storage required for the OpenFileMap; e.g., 200 simultaneous clients, with an average of 20 open files each, at 50 words per OpenFileMap entry (a wildly generous allowance), amounts to only 200K words. Therefore it is reasonable to maintain this data structure entirely in virtual memory, presumably using Cedar collectible storage for the values.
The map representation should be highly efficient for lookup, since a lookup must be performed on nearly every FileStore operation. Efficiency of insertion and deletion are of secondary importance, since those operations are performed only for OpenFile and Commit/Abort. There is no ordering relationship among Handles. Consequently, a hash table is probably the best data structure for the OpenFileMap. (Since the representation of a Handle is private to the implementation, it can use low-order bits as an index into the hash table.)
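For concreteness, one possible private representation follows. This is a sketch only; the table size, field names, and signals are illustrative.

tableSize: CARDINAL = 256;
Handle: PUBLIC TYPE = RECORD [index, sequence: CARDINAL];  -- two words, as the interface requires
Entry: TYPE = REF EntryObject;
EntryObject: TYPE = RECORD [
  sequence: CARDINAL,  -- must match handle.sequence; incremented each time a slot is reused
  conversation: ConversationID,
  client: ClientID, trans: TransID, volume: VolumeID, file: FileID,
  lockMode: LockMode, accessRights: AccessRights,
  next: Entry];  -- chain of entries whose Handles hash to the same slot
table: ARRAY [0..tableSize) OF Entry;
InvalidHandle, InvalidConversation: SIGNAL = CODE;

Validate: PUBLIC PROCEDURE [handle: Handle, conversation: ConversationID] =
BEGIN
-- the low-order bits of handle.index select the hash table slot
FOR e: Entry ← table[handle.index MOD tableSize], e.next UNTIL e = NIL DO
  IF e.sequence = handle.sequence THEN
    BEGIN
    IF e.conversation # conversation THEN SIGNAL InvalidConversation;
    RETURN;
    END;
ENDLOOP;
SIGNAL InvalidHandle;  -- dangling or forged Handle
END;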
LogMap
The LogMap serves two purposes:
1. It keeps track of all the logged intentions required to carry out a transaction after it is committed. Consulting the LogMap for this purpose is logically equivalent to searching the log for all the transaction’s entries, but is intended to be much more efficient.
2. It maintains information required to divert or modify FileStore actions when the abstract state of the FileStore is represented in the log rather than in the base file system. The most common example of such a diverted action is a read of data previously written in the same transaction.
For the first purpose, the LogMap needs to remember only the existence of a log entry. Consequently, every deferred action written into the log must be recorded in the LogMap.
The LogMap could also record entries for immediate actions, i.e., those that must be undone if the transaction is aborted. This would eliminate the need to search the log (backward) for such entries. We believe that aborting a transaction is sufficiently uncommon that the cost of such searches is acceptable.
For the second purpose, the LogMap additionally needs to remember things about the log entry. That is, it needs to understand enough about the semantics of the action specified by the log entry so that it can maintain an up-to-date model of the virtual state of the FileStore.
The decision whether or not the LogMap understands certain types of entries is somewhat arbitrary. We might consider structuring the LogMap as two levels, the lower of which just registers uninterpreted log entries and the upper of which understands the semantics of log entries.
Assumptions and limitations
In its present form, the LogMap deals only with log entries for file-level activity. It might be extended to deal with higher-level (logical) logging, but I don’t have a good understanding of what is required at higher levels.
This design of the LogMap maintains at most two ‘‘virtual’’ states of any given piece of the FileStore: the one seen by the (single) transaction currently updating that piece, and the one seen by all other transactions. That is, if an intention to change some piece of the FileStore exists in the log, and the transaction has not been either carried out or aborted, and a new request is made to access the same piece of the FileStore, then a client participating in the transaction should see the state recorded in the log, while any other client should see the state of the base.
This arrangement is intended to support ‘‘update mode’’ transactions, where reads and writes do not conflict at the time they are requested but rather are serialized at commit time. However, there must be at most one deferred write pending to any given piece of the FileStore, so that the LogMap need not maintain arbitrarily many ‘‘virtual’’ states. This depends on the correct operation of Locks.
This version of LogMap assumes only full-page updates; but extending it to handle partial-page updates should be straightforward.
Interface
LogMap: DEFINITIONS =
BEGIN
LogID: TYPE = Log.ShortLogRecordID;
General types and operations. These are for registering uninterpreted log entries whose semantics are not known to the LogMap.
EntityKey: TYPE = RECORD [
file: FileID,
SELECT type: * FROM
filePage => [page: PageNumber],
property => [property: Property],
size => NULL,
...
ENDCASE];
We refer to each independently updatable piece of the FileStore (file page, file property, etc.) as an Entity (need a better term??). For the purposes of the LogMap, every Entity is addressable by means of an EntityKey. For every distinguishable type of deferred file-level log record there must be a distinct EntityKey.type.
Location: TYPE = {base, log};
The current state of any given Entity (as viewed under some transaction) is located either in the base or in the log; the LogMap’s primary function is to remember which.
RegisterUpdate: PROCEDURE [entity: EntityKey, trans: TransID, logID: LogID];
Registers with the LogMap a log entry, identified by logID, for performing a deferred update to entity under transaction trans.
If RegisterUpdate is called multiple times on the same entity during one transaction, the latest update overrules the earlier ones in the sense that it is the only one accessible via LocateEntity (below); however, the earlier updates remain in the LogMap and are revealed by EnumerateTransaction (below). Exception: if the LogMap knows enough about the semantics of the updates to determine that a later update has entirely overruled the effects of an earlier one, then the earlier LogMap entry is deleted entirely.
It is illegal to register multiple updates to the same entity under different transactions.
An important consequence of these semantics is that after a transaction is committed, any deferred update to an Entity under that transaction must be carried out before any new updates to the same Entity may be registered. It is the caller’s responsibility to ensure this. Perhaps LogMap should export a procedure to assist this.
LocateEntity: PROCEDURE [entity: EntityKey, trans: TransID] RETURNS [location: Location, logID: LogID];
Determines the location of the current state of entity as viewed by transaction trans. If the LogMap contains no entry for entity, or if the last update to entity was under a different transaction than trans and that transaction has not been committed, then returns location=base; logID is undefined. If the last update to entity was under the same transaction as trans or that transaction has been committed, returns location=log; logID identifies the log entry for that update.
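The decision rule is summarized by the following sketch (implementation commentary, not part of the interface). Lookup and Committed are hypothetical internals, nullLogID a hypothetical distinguished value standing for the undefined result, and Update the internal record sketched under Implementation below.

LocateEntity: PUBLIC PROCEDURE [entity: EntityKey, trans: TransID]
  RETURNS [location: Location, logID: LogID] =
BEGIN
u: Update ← Lookup[entity];  -- most recently registered update for entity, if any
IF u = NIL THEN RETURN [location: base, logID: nullLogID];
IF u.trans = trans OR Committed[u.trans] THEN RETURN [location: log, logID: u.logID];
RETURN [location: base, logID: nullLogID];  -- uncommitted update under another transaction
END;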
EnumerateTransaction: PROCEDURE [proc: EnumerationProc, trans: TransID];
EnumerationProc: TYPE = PROCEDURE [logID: LogID] RETURNS [delete: BOOLEAN];
For each entry in the LogMap corresponding to transaction trans, calls proc[logID], where logID identifies the log record. After proc returns, if delete=TRUE then the entry is deleted from the LogMap. The LogMap is enumerated in the order in which the entries were originally registered (i.e., in ascending order of logID).
EnumerateTransaction is used by FileStore when carrying out the deferred actions of a committed transaction. In this context, proc should return delete=TRUE always; delete=FALSE may be useful for checkpointing transactions.
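For instance, carrying out a committed transaction might be sketched as follows; the log-reading and applying steps are elided:

CarryOut: PROCEDURE [trans: TransID] =
BEGIN
DoOne: LogMap.EnumerationProc =  -- [logID: LogID] RETURNS [delete: BOOLEAN]
BEGIN
-- read the log record identified by logID and apply it to the base file system
RETURN [delete: TRUE];  -- the intention has been carried out; forget it
END;
LogMap.EnumerateTransaction[proc: DoOne, trans: trans];
END;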
Specialized types and operations. These are for registering log entries whose semantics are known to the LogMap. This specialization is useful for two reasons. First, interactions between updates can be taken into account. Second, significant optimizations in internal representation are possible; for example, compound updates to consecutive file pages can be represented as single LogMap entries.
PageRun: TYPE = FilePageMgr.PageRun;
-- = RECORD [firstPage: PageNumber, count: CARDINAL ← 1];
RegisterWritePages: PROCEDURE [file: FileID, pageRun: PageRun, trans: TransID, logID: LogID];
Records in the LogMap the fact that data intended to be written to file at pageRun has been entered into the log as a record identified by logID. FileStore should call this procedure with the result returned by Log.RecordDeferredPageWrite.
Writing pages previously written during the same transaction is OK; for any given page, the LogMap remembers only the LogID of the most recent write.
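In other words, the write path pairs a log append with a LogMap registration. A sketch, with the arguments of Log.RecordDeferredPageWrite elided since its signature is not specified here:

WritePages: PROCEDURE [file: FileID, pageRun: PageRun, trans: TransID] =
BEGIN
logID: LogID ← Log.RecordDeferredPageWrite[...];  -- enter the data into the log
LogMap.RegisterWritePages[file: file, pageRun: pageRun, trans: trans, logID: logID];
END;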
MappedPageRun: TYPE = RECORD [
SELECT location: Location FROM
base => [pageRun: PageRun],
log => [logID: LogID, offset: PageCount],
ENDCASE];
LocateReadPages: PROCEDURE [file: FileID, pageRun: PageRun, trans: TransID] RETURNS [LIST OF MappedPageRun];
Determines whether the data described by file and pageRun, as viewed under transaction trans, resides in the base or in the log. The result MappedPageRun can then be used as an argument to FilePageMgr.ReadPages or Log.PageRead, depending on variant.
Since part of the run might reside in the base and part in the log, LocateReadPages may break the run into pieces and return multiple MappedPageRuns. Also, since the run might begin in the middle of a log record, the log variant of MappedPageRun includes a page offset.
LocateReadPages consults only the LogMap, not the base. Thus it will not detect errors such as reading from a nonexistent file or page in the base; the FilePageMgr is expected to detect such errors. However, it will detect and raise a signal for errors such as reading from a file that has been deleted according to the log, since in this case the abstract state of the FileStore differs from the base.
RegisterSize: PROCEDURE [file: FileID, pageCount: PageCount, trans: TransID, logID: LogID];
Records in the LogMap a deferred change in the size of a file; pageCount is the new size. Deleting a file is registered as setting its size to zero. If the file’s size is decreased, LogMap may forget earlier deferred writes to the deleted portion of the file.
LocateSize: PROCEDURE [file: FileID, trans: TransID] RETURNS [location: Location, pageCount: PageCount, logID: LogID];
Locates the size of file as viewed under transaction trans. If location=base, the file’s size is in the base, and pageCount and logID are undefined. If location=log, pageCount is the current virtual size of the file and logID identifies the log record that most recently changed it.
END.
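To illustrate the intended read path, here is a sketch of how FileStore might use LocateReadPages. Argument lists for the FilePageMgr are abbreviated, and Log.PageRead is the operation proposed under Miscellaneous remarks below.

ReadPages: PROCEDURE [file: FileID, pageRun: PageRun, trans: TransID] =
BEGIN
FOR l: LIST OF MappedPageRun ← LogMap.LocateReadPages[file: file, pageRun: pageRun, trans: trans], l.rest UNTIL l = NIL DO
  WITH m: l.first SELECT FROM
    base => FilePageMgr.ReadPages[file: file, pageRun: m.pageRun];  -- current data is in the base
    log => Log.PageRead[logID: m.logID, offset: m.offset];  -- data was written earlier under trans
  ENDCASE;
ENDLOOP;
END;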
Implementation
Since every update operation can potentially give rise to a new LogMap entry, it is impossible to assign reasonable bounds to the total size of the LogMap without restricting the number of file operations per transaction. On the other hand, good performance is crucial, so the implementation of the LogMap must be very efficient, at least for the most common cases.
This argues for a hybrid implementation that uses an efficiently manageable internal data structure for well-behaved transactions (those with relatively few deferred updates), but that automatically overflows to some less efficient external data structure (e.g., a B-tree) when necessary.
The domain of the LogMap is unordered at the file level but ordered below the file level; and references to the LogMap can be expected to exhibit a high degree of locality. Lookups, insertions, and deletions must all be relatively efficient. These considerations suggest the use of a hash table for FileIDs, with each entry rooting some sort of tree in which the sub-file EntityKeys are stored. The hash table and the internal tree (e.g., a red-black tree) are mapped directly onto VM. The overflow B-tree is built on an external file; access to that file is managed by the FilePageMgr.
The LogMap entries for a transaction must be maintained in a list so that they may be enumerated in order of LogID. Additionally, a separate map keyed by TransID must be kept, to anchor these lists. It’s likely that we will need to keep other things on a per-transaction basis as well; so a separate TransactionMap interface may be called for.
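The structure just described might look like the following; all names and sizes are illustrative, and the overflow machinery is omitted.

Update: TYPE = REF UpdateObject;
UpdateObject: TYPE = RECORD [
  logID: LogID,
  entity: EntityKey,
  trans: TransID,
  nextInFile: Update,  -- ordered by EntityKey within the file (a balanced tree in practice)
  nextInTrans: Update];  -- per-transaction list, kept in ascending LogID order

FileEntry: TYPE = REF FileEntryObject;
FileEntryObject: TYPE = RECORD [
  file: FileID,
  updates: Update,  -- root of the sub-file structure
  next: FileEntry];  -- hash chain
fileTable: ARRAY [0..128) OF FileEntry;  -- hashed on FileID

TransEntry: TYPE = REF TransEntryObject;
TransEntryObject: TYPE = RECORD [
  trans: TransID,
  firstUpdate, lastUpdate: Update,  -- anchors the list enumerated by EnumerateTransaction
  next: TransEntry];
transTable: ARRAY [0..128) OF TransEntry;  -- the separate map keyed by TransID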
LogMap is a module monitor that protects the private LogMap data structures. Additionally, the caller of procedures such as LocateReadPages is assumed to have acquired a lock on the current transaction to ensure the continued validity of information about entity location.
Miscellaneous remarks
A number of the TYPEs now defined in FileStore (e.g., PageRun, Property) will need to be moved to some independent definitions module, to avoid circular compilation dependencies.
It seems to me that we need a Log.PageRead operation that reads the data written by Log.RecordDeferredPageWrite. FileStore needs this to perform ReadPages for data located in the log rather than the base.
Perhaps the various LogMap procedures that accept both a FileID and a TransID should instead be passed an OpenFileMap.Handle. However, there isn’t any other reason for the LogMap to know about open files, so maybe this isn’t a good idea.
We need to develop an Alpine-wide policy on how to do enumerations. The examples here accept an enumeration procedure that is called for every instance; but the Pilot-style ‘‘stateless enumeration’’ may be better. (The main issue is that of monitor locks.)
Change history
September 8, 1981. OpenFileMap: Create and Validate take a ConversationID as an argument, and the caller is expected to call Validate before calling any other procedures. LogMap: Only deferred actions are kept in the LogMap. Clients inside and outside a transaction may see different Locations for an EntityKey. EnumerateTransaction enumerates in the forward direction only. Implementation uses a hash table and (normally) an internal tree, rather than a B-tree.