BitTableLookup.mesa - interface for space efficient set membership test, useful for implementing sleazy spelling checkers.
-- Last Edited by
-- Nix on October 27, 1983 1:15 pm
DIRECTORY
IO USING [STREAM],
Rope USING [ROPE];
BitTableLookup: CEDAR DEFINITIONS =
BEGIN
ROPE: TYPE = Rope.ROPE;
STREAM: TYPE = IO.STREAM;
Create: PROC [size: INT] RETURNS [table: Table];
Creates a table with the given number of bits of capacity. The number of bits should be a prime number that is approximately 20 times the number of words in the given table.
ERRORS: Raises Failed[TableTooLarge] if the table size is greater than 1,048,576 bits; tables of this size suffice to hold 50,000 items while maintaining a probability of ~0.001 that non-members will erroneously be called members.
NumberHashFunctions: PROC [] RETURNS [numberHashFunctions: INT];
Returns the number of hash functions used by the lookup/insertion procedure.
Read: PROC [in: STREAM] RETURNS [table: Table];
Reads in a Table from the given stream.
ERRORS: Raises Failed[BadTable] if it detects that the stream does not contain a valid bit table.
Write: PROC [table: Table, out: STREAM];
Writes out a Table to the given stream.
Insert: PROC [table: Table, word: REF TEXT];
Inserts the word into the given Table.
Lookup: PROC [table: Table, word: REF TEXT] RETURNS [member: BOOLEAN];
Looks up the word in the Table and returns TRUE iff it thinks that the word is in there. If the word actually was in the Table, then this will always return TRUE. However if the word actually was not in the table then it may erroneously return TRUE, although this won't happen very often.
The representation of the bit table.
RowContents: TYPE = PACKED ARRAY [0..4096) OF BOOL;
RowPrime: CARDINAL = 4093; -- Prime less than 4096.
TableContents: TYPE = RECORD[
bits: SEQUENCE rowSize: CARDINAL OF REF RowContents
A vector of bits, so declared because there is a restriction on the maximum number of bits in a straightforward packed array of bool.
];
Table: TYPE = REF TableContents;
Errors
FailureTypes: TYPE = {TooLarge, BadTable};
Failed: ERROR[ why: FailureTypes ];
There are currently two kinds of errors:
TooLarge: raised by Create when asked to create a bit table that is larger than implementation restrictions currently allow (currently raised at about 1,000,000 bits).
BadTable: raised by Read when it finds that the stream does not contain a valid table.
END.
CHANGE LOG
Created by Nix on September 15, 1983 10:36 am