ExtendibleHash.mesa
Copyright Ó 1988 by Xerox Corporation. All rights reserved.
Bob Hagmann July 7, 1989 9:34:52 am PDT
ExtendibleHash: CEDAR DEFINITIONS
= BEGIN
Entries, records, and keys
Extendible hash functions are used when hashing is desired but no good estimate is possible of the number and size of items to be stored. This interface specifies the basics of Extendible hashing. The client must add all knowledge of the representation of the hash function, Entries, Records, and Keys. The client also provides the backing store for the hash table.
ExtendibleHashTable: TYPE = REF ExtendibleHashTableRep;
ExtendibleHashTableRep: TYPE; -- This represents the volatile state of an open hash table
Entry: TYPE = LONG POINTER TO EntryRep;
EntryRep: TYPE = MACHINE DEPENDENT RECORD [
entryWordSize(0): CARD32,
entryContents(WORDS[INT]): ARRAY [0..0) OF CARD32
];
HashValue: TYPE = RECORD [high: CARD32, low: CARD32];
EntSize: TYPE = CARD;
UpdateType: TYPE = {insert, replace, insertOrReplace};
Underlying page storage
The permanent storage for a hash table is represented by a client-provided PageStorage object; Pages are stored in an array of "pages" numbered from 0 relative to that object. Each page is typically the underlying physical page size (e.g., wordsPerPage) or some multiple thereof; PageSize defines the "reasonable" range of page sizes, within which one may expect the package to operate with reasonable performance.
The page storage is expected to include a substantial amount of caching. That is, if a given PageNumber was referenced and released recently, a new reference to that PageNumber is expected to have very low cost on the average.
wordsPerPage: INT = 512;
PageStorage: TYPE = REF ANY;
PageNumber: TYPE = CARD;
statePage: PageNumber = 0;
Page containing the top-level permanent state of the hash. The reason this may be of interest to the PageStorage primitives is explained below under "Locking and consistency".
PageSize: TYPE = CARD;
PagePtr: TYPE = LONG POINTER --TO ARRAY [0..pageSize) OF WORD--;
ReferenceType: TYPE = {read, write, new};
UpdateState: TYPE = {modified, unchanged};
Operations on page storage
ReferencePage: TYPE = UNSAFE PROC [storage: PageStorage, number: PageNumber, type: ReferenceType, clientPerCallData: REF] RETURNS [ptr: PagePtr];
Makes a reference to the specified page. Type=read means that the package will not modify the page; write means that it intends to modify the page before releasing it. Type=new is the same as type=write except that the package asserts that the page has not been used since the hash was initialized (i.e., is not presently part of the hash) and that its former contents are uninteresting.
No provision exists for this operation to fail (e.g., because page storage is exhausted or a transaction aborts). Any errors that it raises must be caught by the client of the package, and thereafter the state of the hash is undefined.
ReleasePage: TYPE = UNSAFE PROC [storage: PageStorage, number: PageNumber, update: UpdateState ← unchanged, clientPerCallData: REF];
Releases a page referenced by ReferencePage. If update=startOfUpdate (which occurs only when number=statePage) then an update operation involving multiple page writes has started, and the reference being released is the first write reference of the update. If update=endOfUpdate then an update operation has ended, and the reference being released is the last write reference of the operation; the package asserts that there are no remaining references in progress for that update. (Unchanged means that the reference being released is neither the beginning nor the end of an update.) If an update operation writes only a single single page, that page reference will be released with endOfUpdate. If an update operation writes multiple pages (and maintainRecomputableState=TRUE), the operation will be bracketed with write references to the statePage released with startOfUpdate and endOfUpdate respectively. For an elaboration on this, see "Locking and consistency" below.
PageStoragePrimitives: TYPE = RECORD [
referencePage: ReferencePage,
releasePage: ReleasePage
];
Global operations
State: TYPE = {closed, suspended, open};
A hash table can be in one of three states. A table that is closed is inaccessible for any operation besides Open or SetState; an attempted call will result in Error[closed]. A suspended table is likewise inaccessible, but an attempted call will wait indefinitely until the table is changed to some other state. An open table is available for regular operations.
New: PROC [ storPrim: PageStoragePrimitives, initialState: State[closed..suspended] ← closed] RETURNS [hashTable: ExtendibleHashTable];
Creates a new ExtendibleHashTable object with no associated storage instance, and sets its initial state as requested.
Open: PROC [hashTable: ExtendibleHashTable, storage: PageStorage, pageSize: PageSize, initialize: BOOLEANFALSE, initialDirectorySize: CARD ← 512, clientPerCallData: REFNIL];
Associates a PageStorage with an ExtendibleHashTable, and changes the table's state to open. (If it is already open, it is first closed as described under SetState.) The set of pages comprising the table instance is identified by the PageStorage. PageSize is the "logical" size of a pages in 32-bit words. If initialize=TRUE, an empty table is created; if FALSE, the storage pages are assumed to contain a well-formed table already.
Raises Error[badSeal] if the statePage does not contain a valid seal; raises Error[wrongPageSize] if pageSize differs from the one that the hash was built with. If initialize is TRUE, initialDirectorySize must be a power of two.
SetState: PROC [hashTable: ExtendibleHashTable, state: State[closed..suspended]];
If the table is open, waits until there is no operation in progress (which can take arbitrarily long, depending on the activities of other processes accessing the table). Then sets the table's state to the specified value. This procedure is intended to force the table into a quiescent, stable state, e.g., immediately before initiating a checkpoint or rollback.
GetState: PROC [hashTable: ExtendibleHashTable] RETURNS [state: State, entryCount: CARD, greatestPage: PageNumber, log2DirectorySize: CARD, storage: PageStorage];
Returns the table's state, the number of entries contained in the table, the highest PageNumber of any page which is or ever has been part of the table, the log2 of the directory size, and the PageStorage handle currently in use.
Validate: PROC [hashTable: ExtendibleHashTable, hashFunction: PROC [entry: Entry] RETURNS [hashValue: HashValue]];
Scans the entire table and verifies that it is well-formed. Raises Error[entrySizesWrong] if an inconsistency is detected between the sizes of entries and the space used in pages. Raises Error[other] if there are other (unspecified) problems with the table. All these errors indicate fatal structural problems whose presence may cause the hash table package to malfunction if any access operations are performed.
Raises Error[closed] if the hash's state is closed.
Hash access
These are all UNSAFE, but more efficient than the corresponding SAFE procedures since no storage allocation is required.
ReadEntry: UNSAFE PROC [hashTable: ExtendibleHashTable, hashValue: HashValue, proc: UNSAFE PROC [entry: Entry, wordsForEntriesOnPage: INT, overFlowPageProc: UNSAFE PROC RETURNS [entry: Entry, wordsForEntriesOnPage: INT]], clientPerCallData: REFNIL];
Finds the hash page and arranges it to be in memory via the "underlying page storage" representation primitives. The entry returned is the first entry on the page. It may or may not be the entry desired.
The client is required to look at the entries in succession and find the one needed. The number of valid 32 bit words is in wordsForEntriesOnPage. (This loop is done in the client to overcome performance problems if the size and compare functions are done by the hash package calling the client.) If the page is exhausted, the overFlowPageProc can be called to obtain further overflow pages. There may be zero, one, or many overflow pages. A value of NIL for entry means tha the overflow chain is done.
The page (and the overflow pages) containing the entry is locked, so proc and overFlowPageProc may not invoke any operations. It is illegal for Proc to write into entry^.
DeleteEntry: UNSAFE PROC [hashTable: ExtendibleHashTable, hashValue: HashValue, proc: UNSAFE PROC [entry: Entry, wordsForEntriesOnPage: INT, overFlowPageProc: UNSAFE PROC RETURNS [entry: Entry, wordsForEntriesOnPage: INT]] RETURNS [entryToDelete: Entry], clientPerCallData: REFNIL];
Finds the hash page and arranges it to be in memory via the "underlying page storage" representation primitives. The entry returned is the first entry on the page. The number of valid 32 bit words is in wordsForEntriesOnPage. It may or may not be the entry desired. The client is required to look at the entries in succession and find the one needed. When found, it is returned in entryToDelete. If NIL is returned, no entry is deleted.
UpdateEntry: UNSAFE PROC [
hashTable: ExtendibleHashTable,
hashValue: HashValue,
words: EntSize,
findProc: UNSAFE PROC [entry: Entry, wordsForEntriesOnPage: INT, overFlowPageProc: UNSAFE PROC RETURNS [entry: Entry, wordsForEntriesOnPage: INT]] RETURNS [entryToUpdate: Entry],
modifyProc: UNSAFE PROC [entry: Entry],
updateType: UpdateType ← insertOrReplace,
clientPerCallData: REFNIL];
Inserts an entry whose size is the specified number of 32 bit words into the hash table (that is, the size is the size of the entryContents). findProc is called when the update is for replace or insertOrReplace to find the matching entry, if any. modifyProc is called with entry pointing at exactly the right amount of storage in a hash or overflow page.
If updateType=insert then the update is required to be an insertion of a key not already present in the table. Similarly, if updateType=replace then the update is required to be a replacement of an entry already present in the table. If the updateType is violated, Error[wrongUpdateType] is raised and the operation is not performed.
After modifyProc is called, it is required that findProc is returns the entry just inserted, and Error[wrongEntryProduced] is raised if this does not hold; the state of the hash table is undefined if this error occurs.
Exceptions
Error: ERROR [reason: Reason];
Reason: TYPE = {badSeal, closed, entrySizesWrong, other, wrongEntryProduced, wrongPageSize, wrongUpdateType, entrySizeTooBig, notPowerOfTwoDirectorySize, notAnEntry, outOfRoom, overWriteOfNextEntry};
END.