FILE: Hash.mesa
Last edited by Ousterhout, May 10, 1983 10:43 am
Christian LeCocq May 16, 1986 11:56:03 am PDT
This file defines the interface to a simple hash table module that binds strings to values.
Hash: CEDAR DEFINITIONS =
BEGIN
NewTable:
PROC[nBuckets:
INT ← 16]
RETURNS [table: Table];
Creates a new HashTable with the given number of buckets. It's OK to start HashTables out small: they will grow automatically as more entries are allocated with HashFind.
Find:
PROC[table: Table, name: Rope.
ROPE]
RETURNS [entry: Entry];
Finds an entry by the given name in the given table. If there isn't any entry by that name, a new one is allocated. If the table becomes too large, it is grown automatically. For new entries, the clientData field is always NIL.
Enumerate:
PROC[table: Table, pattern: Rope.
ROPE, proc: EnumProc,
clientData: REF ANY ← NIL, errorStream: IO.STREAM ← NIL];
Enumerates all the entries in table with names that match the pattern by calling proc with the entry and clientData as arguments. If an error occurs and errorStream isn't NIL, then diagnostics are output. The pattern can have any of the following forms:
1. "*x": matches all elements of the table whose names contain the substring x. Since this involves a complete search of the entire hash table, it may be slow.
2. "a<x:y>b": matches names of the form acb where c gets successive numerical values between x and y, inclusive. Each string will be looked up in the hash table. This is relatively fast. Use backslash as a quote character to get < and > characters in the string without special interpretation.
3. Strings not in the above categories: they are just looked up as they are.
The * and <> forms may be combined. Note: a pattern of "*" will enumerate all of the entries in the table. TRUE is returned if the pattern was correct, FALSE if it didn't make sense.
Stats:
PUBLIC
PROC[table: Table, stream:
IO.
STREAM];
Print stats about bucket utilization on stream.
Entry: TYPE = REF EntryRec;
EntryRec: TYPE = RECORD [
Each entry points to its reference rope, a piece of client data, and a link to the next entry in the same bucket. There is also a pointer to the next entry in the overall list for the whole table (see the comment below under tables). The information is visible to clients, but should be read-only except for clientData.
name: Rope.ROPE,
clientData: REF ANY,
nextInBucket: Entry,
nextOverall: Entry
];
EnumProc:
TYPE =
PROC[entry: Entry, clientData:
REF
ANY];
Client procedure called during hash table enumeration.
Table: TYPE = REF TableRec;
TableRec: TYPE = RECORD [
A hash table contains a pointer to an array (sequence) of buckets, each of which is a linked list of entries that hash to the same bucket. In addition, the table contains a count of the total number of entries we've made in this table (so we know when it's getting too full), and some random information used by the hashing function. In addition, the entries are linked together in a list. This isn't strictly necessary, since the entries can all be found by searching the buckets, but it allows searching in order of allocation, which tends to get things that are close together in virtual memory (for less page thrashing).
buckets: REF Buckets,
nEntries: CARDINAL,
hashDivide: INT,
hashMask: CARDINAL,
firstEntry: Entry
];
Buckets:
TYPE =
RECORD [
s: SEQUENCE length: CARDINAL OF Entry
];
END.