@make(article)
@disable(contents)
@begin(center)
HASH.LSP

A Hash-Coded Dictionary Facility

for Interlisp-10 & Interlisp-D

Christopher Lane

modeled after work by

L. M. Masinter
W. van Melle
R. M. Kaplan

August 1983
@end(center)

@section(Introduction)

The hashfile package provides a facility for storing and retrieving
information on files from within Interlisp.

HASH.LSP is a new implementation of the Interlisp-10 hashfile package
which works in both Interlisp-10 and Interlisp-D, being written in
Interlisp and operating on streams of bytes.  This package implements
both the current Interlisp-10 package as well as the original
implementation of Masinter and van Melle which is still used in EMYCIN
based systems, and will be referred to as the EMYCIN package.  This
document only deals with differences between the various packages.
Users should consult the Interlisp Reference Manual and the EMYCIN
documentation for details.

@section(Basics)

Hashfiles are created by calling CREATEHASHFILE.  A "HashFile", as
referenced in this document, is the datum returned by CREATEHASHFILE
or OPENHASHFILE, currently an array record containing the hashfile
name, and the number of slots in the file, the used slots and other
details.  All other functions with hashfile arguments use this datum.
A NIL hashfile argument still refers to SYSHASHFILE, that is in the
EMYCIN sense of the current hashfile, not a global user hashfile as
in Interlisp-10. Keys are strings or atoms, as in the other systems.

Interlisp-10 hashfiles come in several flavors, according to the values
stored in them.  The EMYCIN system provides even more flexibility.
This system @b(only) supports the most general EXPR type of hashfiles;
the VALUETYPE and ITEMLENGTH arguments are ignored.

Two key hashing @b(is) supported in this system but is discouraged
as it is only in EMYCIN, not in the Interlisp-10 system.  The functions
that manipulate `secret pages' do not exist in this implementation.

The package sysloads the DFOR10.COM package from the LispUsers directory
for Interlisp-10 users.

@section(Functions)
The functions implemented are:

@begin(description)
createhashfile[filename,valuetype,itemlength,#entries]

openhashfile[filename,access(,itemlength)(,#entries)(,smash)]

lookuphashfile[key,value,hashfile,calltype(,key2)]

puthashfile[key,value,hashfile(,key2)]

gethashfile[key,hashfile(,key2)]

hashfiledata[hashfile] (EMYCIN)

hashfilep[hashfile(,write?)]

hashfileprop[hashfile,property]

hashfilename[hashfile]

closehashfile[hashfile(,reopen)]

clearhashfiles[close,release] (EMYCIN)

maphashfile[hashfile,mapfn(,double)]

hashfilesplst[hashfile(,xword)]

collectkeys[hashfile,double,mkstring] (EMYCIN)

rehashfile[hashfile(,newname(,newvaluetype))]

@end(description)


@section(Global Variables)

The variables used by the system of interest to the user.

@begin(description)
HASHTEXTCHAR {↑A} The character separating two key hashkeys.

HASHFILERDTBL {ORIG} The hashfile read table.

HFGROWTHFACTOR {3} The ration of total slots to used slots when a
hashfile is created.

REHASHGAG {NIL} Flags whether to print message when rehashing;
initially off.

HASHFILEDEFAULTSIZE {512} Size used when #Entries is omited.

HASHLOADFACTOR {.875} The ratio, used slots/total slots, at which
the system rehashes the file, initially 7/8.

SYSHASHFILE {NIL} The current hashfile.

@end(description)

@section(Implementation)

The hashfile package views files as a sequence of bytes, randomly
accessible.  No notice is made of pages and it is assumed that the
host computer buffers I/O suffieciently.

Hash files consist of a layer of pointers followed by ascii data.
Pointers are 3 bytes wide, preceeded by a status byte.  The pointers
point to key PNAMES in the data section, where each key is followed
by its value.  Deleted key pointers are reused, deleted data space
is not, so rehashing is required if many items have been "replaced".

The data section starts at 4*HASHFILE:SIZE + 9, and consists of
alternating keys and values.  As deleted data is not re-written, files
should not be accessed directly.

When a key hashes into a used slot, a probe value is added to it to
find the next slot to search.  The probe value is a small prime derived
from the original hash key.

@section(Limitations)

The system currently is able to manipulate files on the local disk
and read hashfiles over the network, however, writing hash files over
the network runs into problems with the leaf servers.

Due to the pointer size, only hashfiles of < 6 million intial entries
can be created, though these can grow to 14 million entries before
automatic rehashing exceeds the pointer limit.  The total file length
is limited to 16 million bytes.  No range checking is done for these
limits.