File [eris]<LispManual>1983org>HashArrays.im!4



{Begin SubSec Hash Arrays}
{Title Hash Arrays}
{Text


{Note Interlisp-D doesn't GC any hash keys. }

{Note Will the garbage collector ever delete entries from a hash table?
Personal opinion:  I think that the GC should NOT delete entries, and that you should always be able to retrieve everything you put in a hash array using MAPHASH ---mjs}

{Tag HashArrays}

{index *PRIMARY* hash arrays}
{index *PRIMARY* hash keys}
{index *PRIMARY* hash values}

Hash arrays provide a mechanism for associating arbitrary lisp objects ("hash keys") with other objects ("hash values"), such that the hash value associated with a particular hash key can be quickly obtained.  A set of associations could be represented as a list or array of pairs, but these schemes are very inefficient when the number of associations is large.  There are functions for creating hash arrays, putting a hash key/value pair in a hash array, and quickly retrieving the hash value associated with a given hash key.

By default, the hash array functions use {fn EQ} for comparing hash keys.  This means that if non-atoms are used as hash keys, the exact same object (not a copy) must be used to retrieve the hash value.  However, the user can override this default for any hasharray by specifying the functions used to compare hash keys and to "hash" a hash key to a number.  This can be used, for example, to create hash arrays where {fn EQUAL} but non-{fn EQ} strings will hash to the same value.  Specifying alternate hashing algorithms is described below.


{note paragraph about GC of non-atomic hash keys (done in I-10, not in I-D)?}


In the description of the functions below, the argument {arg HARRAY} should be a value of the function {fn HASHARRAY}, which is used to create hash arrays.  For convenience in interactive program development, it may also be {lisp NIL}, in which case a hash array provided by the system, {index SYSHASHARRAY Var}{var SYSHASHARRAY}, is used; the user must watch out for confusions if this form is used to associate more than one kind of value with the same key.

Note:  For backwards compatibility, the hash array functions will accept a list whose {fn CAR} is a hash array, and whose {fn CDR} is the "overflow method" for the hash array (see below).  However, hash array functions are guaranteed to perform with maximum efficiency only if a direct value of {fn HASHARRAY} is given.



{FnDef {FnName HASHARRAY} {FnArgs MINKEYS OVERFLOW HASHBITSFN EQUIVFN}
{Text
Creates a hash array containing at least {arg MINKEYS} hash keys, with overflow method {arg OVERFLOW}. See discussion of overflow behavior below.

If {arg HASHBITSFN} and {arg EQUIVFN} are non-{lisp NIL}, they specify the hashing function and comparison function used to interpret hash keys.  This is described in the section on user-specified hashing functions below.  If {arg HASHBITSFN} and {arg EQUIVFN} are {lisp NIL}, the default is to hash {fn EQ} hash keys to the same value. 
}}


{FnDef {FnName HARRAY} {FnArgs MINKEYS}
{Text
Provided for backward compatibility, this is equivalent to {lisp (HASHARRAY MINKEYS 'ERROR)}.
}}


{FnDef {FnName HASHARRAYP} {FnArgs X}
{Text
Returns {arg X} if it is a possible hash array argument to other functions, an {lisp HARRAYP} datum or a list whose {fn CAR} is an {lisp HARRAYP} datum.  Otherwise, returns {lisp NIL}.
}}


{FnDef {FnName HARRAYP} {FnArgs X}
{Text
Returns {arg X} if it is an {lisp HARRAYP} datum, as returned by the function {fn HARRAY}.  Unlike {fn HASHARRAYP}, value is {lisp NIL} for lists whose {fn CAR} is an {lisp HARRAYP}.  {fn HASHARRAYP} should probably be used instead in most circumstances.
}}


{FnDef {FnName HARRAYPROP} {FnArgs HARRAY PROP NEWVALUE}
{Text
Returns the property {arg PROP} of {arg HARRAY}; {arg PROP} can have the system-defined values {lisp SIZE} (returns the maximum occupancy of {arg HARRAY}), {lisp NUMKEYS} (number of occupied slots), {lisp OVERFLOW} (overflow method), {lisp HASHBITSFN} (hashing function) and {lisp EQUIVFN} (comparison function).  Except for {lisp SIZE} and {lisp NUMKEYS}, a new value may be specified as {arg NEWVALUE}.

By using other values for {arg PROP}, the user may also set and get arbitrary property values, to associate additional information with a hash array.

Warning:  Changing the {lisp HASHBITSFN} or {lisp EQUIVFN} of a non-empty hash array can make some elements non-accessable.  See the section on user-specified hashing functions below.
}}


{FnDef {FnName HARRAYSIZE} {FnArgs HARRAY}
{Text
Equivalent to {lisp (HARRAYPROP HARRAY 'SIZE)}; returns the number of slots in {arg HARRAY}.
}}


{FnDef {FnName CLRHASH} {FnArgs HARRAY}
{Text
Clears all hash keys/values from {arg HARRAY}.  Returns {arg HARRAY}.
}}


{FnDef {FnName PUTHASH} {FnArgs KEY VAL HARRAY}
{Text
Associates the hash value {arg VAL} with the hash key {arg KEY} in {arg HARRAY}.  Replaces the previous hash value, if any.  If {arg VAL} is {lisp NIL}, any old association is removed (hence a hash value{index hash value} of {lisp NIL} is not allowed).  If {arg HARRAY} is full when {fn PUTHASH} is called with a key not already in the hash array, the function {fn HASHOVERFLOW} is called, and the {fn PUTHASH} is applied to the value returned (see below).  Returns {arg VAL}.
}}


{FnDef {FnName GETHASH} {FnArgs KEY HARRAY}
{Text
Returns the hash value associated with the hash key {arg KEY} in {arg HARRAY}.  Returns {lisp NIL}, if {arg KEY} is not found.
}}


{FnDef {FnName REHASH} {FnArgs OLDHARRAY NEWHARRAY}
{Text
Hashes all hash keys and values in {arg OLDHARRAY} into {arg NEWHARRAY}.  The two hash arrays do not have to be (and usually aren't) the same size.  Returns {arg NEWHARRAY}.
}}


{FnDef {FnName MAPHASH} {FnArgs HARRAY MAPHFN}
{Text
{arg MAPHFN} is a function of two arguments.  For each hash key in {arg HARRAY}, {arg MAPHFN} will be applied to (1) the hash value, and (2) the hash key.  For example,

{lispcode
[MAPHASH A
   (FUNCTION (LAMBDA (VAL KEY)
                  (if (LISTP KEY) then (PRINT VAL)]}

will print the hash value for all hash keys that are lists.  {fn MAPHASH} returns {arg HARRAY}.
}}


{FnDef {FnName DMPHASH} {FnArgs HARRAY{SUB 1} HARRAY{SUB 2} {ellipsis} HARRAY{SUB N}}
{Type NOSPREAD NLAMBDA}
{Text
Prints on the primary output file {fn LOAD}able forms which will restore the hash-arrays contained as the values of the atoms {arg HARRAY{SUB 1}}, {arg HARRAY{SUB 2}}, {ellipsis} {arg HARRAY{SUB N}}.  Example: {lisp (DMPHASH SYSHASHARRAY)} will dump the system hash-array.

Note: all {fn EQ} identities except atoms and small integers are lost by dumping and loading because {fn READ} will create new structure for each item.  Thus if two lists contain an {fn EQ} substructure, when they are dumped and loaded back in, the corresponding substructures while {fn EQUAL} are no longer {fn EQ}.  The {lisp HORRIBLEVARS} file package command ({PageRef FileCom HORRIBLEVARS}) provides a way of dumping hash tables such that these identities are preserved.
}}


{note use CIRCULARVARS or GENERALVARS instead of HORRIBLEVARS  ---lmm}




{Begin SubSec Hash Overflow}
{Title Hash Overflow}
{Text

{index hash overflow}

When a hash array becomes full, attempting to add another hash key will cause the function {fn HASHOVERFLOW}{index HASHOVERFLOW Fn} to be called.  This will either automatically enlarge the hash array, or cause the error {lisp HASH TABLE FULL}.{index HASH TABLE FULL Error}  How hash overflow is handled is determined by the value of the {lisp OVERFLOW} property of the hash array (which can be accessed by {fn HARRAYPROP}).  The possibilities for the overflow method are:

{Begin LabeledList the form that is passed to the hash function}

{Name the atom {lisp ERROR}}
{Text
The error {lisp HASH ARRAY FULL}{index HASH ARRAY FULL Error} is generated when the hash array overflows.  This is the default overflow behavior for hash arrays returned by {fn HARRAY}.
}

{Name {lisp NIL}}
{Text
The array is automatically enlarged by 1.5. This is the default overflow behavior for hash arrays returned by {fn HASHARRAY}.
}

{Name a positive integer {arg N}}
{Text
The array is enlarged to include {arg N} more slots than it currently has.
}

{Name a floating point number {arg F}}
{Text
The array is changed to include {arg F} times the number of current slots.
}

{Name a function or lambda expression {arg FN}}
{Text
Upon hash overflow, {arg FN} is called with the hash array as its argument.  If {arg FN} returns a number, that will become the size of the array.  Otherwise, the new size defaults to 1.5 times its previous size.  {arg FN} could be used to print a message, or perform some monitor function.
}

{End LabeledList the form that is passed to the hash function}


Note:  For backwards compatibility, the hash array functions accept a list whose {fn CAR} is the hash array, and whose {fn CDR} is the overflow method.  In this case, the overflow method specified in the list overrides the overflow method set in the hash array.  Note that hash array functions are guaranteed to perform with maximum efficiency only if a direct value of {fn HASHARRAY} is given.


}{End SubSec Hash Overflow}


{Begin SubSec User-Specified Hashing Functions}
{Title User-Specified Hashing Functions}
{Text

{index Hashing Functions}

In general terms, when a hash key is looked up in a hash array, it is converted to an integer, which is used to index on a linear array.  If the hash key is not the same as the one found at that index, other indices are tried until it is found.

The important features of this algorithm, for users of hash arrays are (1) the "hashing function" used to convert a hash key to an integer; and (2) the comparison function used to compare the hash key found in the linear array with the hash key being looked up.  In order for hash arrays to work correctly, any two objects which are equal according to the comparison function must "hash" to equal integers.

By default, the Interlisp hash array functions use a hashing function that extracts an integer from low-level object pointers, and use {fn EQ} for comparing hash keys.  This means that if non-atoms are used as hash keys, the exact same object (not a copy) must be used to retrieve the hash value.

In some situations this default is inconvenient.  For example, it may be useful to use strings as hash keys, without the restriction that {fn EQUAL} but not {fn EQ} strings are considered to be different hash keys.

The user can override this default behavior for any hasharray by specifying the functions used to compare hash keys and to "hash" a hash key to a number.  This can be done by giving the {arg HASHBITSFN} and {arg EQUIVFN} arguments to {fn HASHARRAY} ({PageRef Fn HASHARRAY}), or by changing the {lisp HASHBITSFN} and {lisp EQUIVFN} hash array properties using the function {fn HARRAYPROP} ({PageRef Fn HARRAYPROP}).

The {lisp EQUIVFN} property should be a function of two arguments that should return non-{lisp NIL} when its arguments are to be considered equal.  The {lisp HASHBITSFN} property should be a function of one argument that produces an integer (preferably a small positive integer in the range [0..2↑16)) with the property that objects that are considered equal produce the same hash bits.

The following function is useful for creating hash arrays that take strings as hash keys:

{FnDef {Name STRINGHASHBITS} {Args STR}
{Text
Hashes the string {arg STR} into an integer that can be used as a {lisp HASHBITSFN} for a hash array.  Strings which are {fn STREQUAL} hash to the same integer.
}}

Example:

(HASHARRAY {arg MINKEYS} {arg OVERFLOW} 'STRINGHASHBITS 'STREQUAL)

creates a hash array where you can use strings as hash keys.

Warning:  Changing the {lisp HASHBITSFN} or {lisp EQUIVFN} of a non-empty hash array with {fn HARRAYPROP} does NOT rehash the old contents of the hash array.  The results of doing this are not defined.  Therefore, changing the {lisp HASHBITSFN} or {lisp EQUIVFN} of a non-empty hash array is not recommended.


}{End SubSec User-Specified Hashing Functions}


}{End SubSec Hash Arrays}