File [Erinyes]<LispManual>ChapHashArrays.im!5



{Begin Chapter Hash Arrays}
{Title Hash Arrays}
{Text


{Note Interlisp-D doesn't GC any hash keys. }

{Note Will the garbage collector ever delete entries from a hash table?
Personal opinion:  I think that the GC should NOT delete entries, and that you should always be able to retrieve everything you put in a hash array using MAPHASH ---mjs}

{Tag HashArrays}

{index *PRIMARY* Hash arrays}
{index *PRIMARY* Hash keys}
{index *PRIMARY* Hash values}

Hash arrays provide a mechanism for associating arbitrary lisp objects ("hash keys") with other objects ("hash values"), such that the hash value associated with a particular hash key can be quickly obtained.  A set of associations could be represented as a list or array of pairs, but these schemes are very inefficient when the number of associations is large.  There are functions for creating hash arrays, putting a hash key/value pair in a hash array, and quickly retrieving the hash value associated with a given hash key.

By default, the hash array functions use {fn EQ} for comparing hash keys.  This means that if non-atoms are used as hash keys, the exact same object (not a copy) must be used to retrieve the hash value.  However, the user can override this default for any hash array by specifying the functions used to compare hash keys and to "hash" a hash key to a number.  This can be used, for example, to create hash arrays where {fn EQUAL} but non-{fn EQ} strings will hash to the same value.  Specifying alternative hashing algorithms is described below ({PageRef Term Hashing functions}).


{note paragraph about GC of non-atomic hash keys (done in I-10, not in I-D)?}


In the description of the functions below, the argument {arg HARRAY} should be a value of the function {fn HASHARRAY}, which is used to create hash arrays.  For convenience in interactive program development, it may also be {lisp NIL}, in which case a hash array provided by the system, {index SYSHASHARRAY Var}{var SYSHASHARRAY}, is used; the user must watch out for confusions if this form is used to associate more than one kind of value with the same key.

Note:  For backwards compatibility, the hash array functions will accept a list whose {fn CAR} is a hash array, and whose {fn CDR} is the "overflow method" for the hash array (see below).  However, hash array functions are guaranteed to perform with maximum efficiency only if a direct value of {fn HASHARRAY} is given.



{FnDef {FnName HASHARRAY} {FnArgs MINKEYS OVERFLOW HASHBITSFN EQUIVFN}
{Text
Creates a hash array containing at least {arg MINKEYS} hash keys, with overflow method {arg OVERFLOW}. See discussion of overflow behavior below ({PageRef Term Hash overflow}).

If {arg HASHBITSFN} and {arg EQUIVFN} are non-{lisp NIL}, they specify the hashing function and comparison function used to interpret hash keys.  This is described in the section on user-specified hashing functions below ({PageRef Term Hashing functions}).  If {arg HASHBITSFN} and {arg EQUIVFN} are {lisp NIL}, the default is to hash {fn EQ} hash keys to the same value. 
}}


{FnDef {FnName HARRAY} {FnArgs MINKEYS}
{Text
Provided for backward compatibility, this is equivalent to {lisp (HASHARRAY MINKEYS 'ERROR)}.
}}



{index *PRIMARY* HARRAYP Fn}

{FnDef {FnName HARRAYP} {FnArgs X}
{Text
Returns {arg X} if it is a hash array object; otherwise {lisp NIL}.

Note that {fn HARRAYP} returns {lisp NIL} if {arg X} is a list whose {fn CAR} is an {lisp HARRAYP}, even though this is accepted by the hash array functions.
}}


{FnDef {FnName HARRAYPROP} {FnArgs HARRAY PROP NEWVALUE}
{FnType Nospread}
{Text
Returns the property {arg PROP} of {arg HARRAY}; {arg PROP} can have the system-defined values {lisp SIZE} (returns the maximum occupancy of {arg HARRAY}), {lisp NUMKEYS} (number of occupied slots), {lisp OVERFLOW} (overflow method), {lisp HASHBITSFN} (hashing function) and {lisp EQUIVFN} (comparison function).  Except for {lisp SIZE} and {lisp NUMKEYS}, a new value may be specified as {arg NEWVALUE}.

By using other values for {arg PROP}, the user may also set and get arbitrary property values, to associate additional information with a hash array.

Note:  The {lisp HASHBITSFN} or {lisp EQUIVFN} properties can only be changed if the hash array is empty.
}}


{FnDef {FnName HARRAYSIZE} {FnArgs HARRAY}
{Text
Equivalent to {lisp (HARRAYPROP HARRAY 'SIZE)}; returns the number of slots in {arg HARRAY}.
}}


{FnDef {FnName CLRHASH} {FnArgs HARRAY}
{Text
Clears all hash keys/values from {arg HARRAY}.  Returns {arg HARRAY}.
}}


{FnDef {FnName PUTHASH} {FnArgs KEY VAL HARRAY}
{Text
Associates the hash value {arg VAL} with the hash key {arg KEY} in {arg HARRAY}.  Replaces the previous hash value, if any.  If {arg VAL} is {lisp NIL}, any old association is removed (hence a hash value of {lisp NIL} is not allowed).  If {arg HARRAY} is full when {fn PUTHASH} is called with a key not already in the hash array, the function {fn HASHOVERFLOW} is called, and the {fn PUTHASH} is applied to the value returned (see below).  Returns {arg VAL}.
}}


{FnDef {FnName GETHASH} {FnArgs KEY HARRAY}
{Text
Returns the hash value associated with the hash key {arg KEY} in {arg HARRAY}.  Returns {lisp NIL}, if {arg KEY} is not found.
}}


{FnDef {FnName REHASH} {FnArgs OLDHARRAY NEWHARRAY}
{Text
Hashes all hash keys and values in {arg OLDHARRAY} into {arg NEWHARRAY}.  The two hash arrays do not have to be (and usually aren't) the same size.  Returns {arg NEWHARRAY}.
}}


{FnDef {FnName MAPHASH} {FnArgs HARRAY MAPHFN}
{Text
{arg MAPHFN} is a function of two arguments.  For each hash key in {arg HARRAY}, {arg MAPHFN} will be applied to (1) the hash value, and (2) the hash key.  For example,

{lispcode
[MAPHASH A
   (FUNCTION (LAMBDA (VAL KEY)
                  (if (LISTP KEY) then (PRINT VAL)]}

will print the hash value for all hash keys that are lists.  {fn MAPHASH} returns {arg HARRAY}.
}}


{FnDef {FnName DMPHASH} {FnArgs HARRAY{SUB 1} HARRAY{SUB 2} {ellipsis} HARRAY{SUB N}}
{Type NOSPREAD NLAMBDA}
{Text
Prints on the primary output file {fn LOAD}able forms which will restore the hash-arrays contained as the values of the atoms {arg HARRAY{SUB 1}}, {arg HARRAY{SUB 2}}, {ellipsis} {arg HARRAY{SUB N}}.  Example: {lisp (DMPHASH SYSHASHARRAY)} will dump the system hash-array.

Note: all {fn EQ} identities except atoms and small integers are lost by dumping and loading because {fn READ} will create new structure for each item.  Thus if two lists contain an {fn EQ} substructure, when they are dumped and loaded back in, the corresponding substructures while {fn EQUAL} are no longer {fn EQ}.  The {lisp HORRIBLEVARS} file package command ({PageRef FileCom HORRIBLEVARS}) provides a way of dumping hash tables such that these identities are preserved.
}}


{note use CIRCULARVARS or GENERALVARS instead of HORRIBLEVARS  ---lmm}




{Begin SubSec Hash Overflow}
{Title Hash Overflow}
{Text

{index *PRIMARY* Hash overflow}

When a hash array becomes full, attempting to add another hash key will cause the function {fn HASHOVERFLOW}{index HASHOVERFLOW Fn} to be called.  This will either automatically enlarge the hash array, or cause the error {lisp HASH TABLE FULL}.{index HASH TABLE FULL Error}  How hash overflow is handled is determined by the value of the {lisp OVERFLOW} property of the hash array (which can be accessed by {fn HARRAYPROP}).  The possibilities for the overflow method are:

{Begin LabeledList the form that is passed to the hash function}

{Name the litatom {lisp ERROR}}
{Text
The error {lisp HASH ARRAY FULL}{index HASH ARRAY FULL Error} is generated when the hash array overflows.  This is the default overflow behavior for hash arrays returned by {fn HARRAY}.
}

{Name {lisp NIL}}
{Text
The array is automatically enlarged by 1.5. This is the default overflow behavior for hash arrays returned by {fn HASHARRAY}.
}

{Name a positive integer {arg N}}
{Text
The array is enlarged to include {arg N} more slots than it currently has.
}

{Name a floating point number {arg F}}
{Text
The array is changed to include {arg F} times the number of current slots.
}

{Name a function or lambda expression {arg FN}}
{Text
Upon hash overflow, {arg FN} is called with the hash array as its argument.  If {arg FN} returns a number, that will become the size of the array.  Otherwise, the new size defaults to 1.5 times its previous size.  {arg FN} could be used to print a message, or perform some monitor function.
}

{End LabeledList the form that is passed to the hash function}


Note:  For backwards compatibility, the hash array functions accept a list whose {fn CAR} is the hash array, and whose {fn CDR} is the overflow method.  In this case, the overflow method specified in the list overrides the overflow method set in the hash array.  Note that hash array functions are guaranteed to perform with maximum efficiency only if a direct value of {fn HASHARRAY} is given.


}{End SubSec Hash Overflow}


{Begin SubSec User-Specified Hashing Functions}
{Title User-Specified Hashing Functions}
{Text

{index *PRIMARY* Hashing functions}

In general terms, when a key is looked up in a hash array, it is converted to an integer, which is used to index into a linear array.  If the key is not the same as the one found at that index, other indices are tried until it the desired key is found.  The value stored with that key is then returned (from {fn GETHASH}) or replaced (from {fn PUTHASH}).

The important features of this algorithm, for purposes of customizing hash arrays, are (1) the "hashing function" used to convert a key to an integer; and (2) the comparison function used to compare the key found in the array with the key being looked up.  In order for hash arrays to work correctly, any two objects which are equal according to the comparison function must "hash" to equal integers.

By default, the Interlisp hash array functions use a hashing function that computes an integer from the internal address of a key, and use {fn EQ} for comparing keys.  This means that if non-atoms are used as hash keys, the exact same object (not a copy) must be used to retrieve the hash value.

There are some applications for which the {fn EQ} constraint is too restrictive.  For example, it may be useful to use strings as hash keys, without the restriction that {fn EQUAL} but not {fn EQ} strings are considered to be different hash keys.

The user can override this default behavior for any hash array by specifying the functions used to compare keys and to "hash" a key to a number.  This can be done by giving the {arg HASHBITSFN} and {arg EQUIVFN} arguments to {fn HASHARRAY} ({PageRef Fn HASHARRAY}).

The {arg EQUIVFN} argument is a function of two arguments that returns non-{lisp NIL} when its arguments are considered equal.  The {arg HASHBITSFN} argument is a function of one argument that produces a positive small integer (in the range [0..2↑16-1]) with the property that objects that are considered equal by the {arg EQUIVFN} produce the same hash bits.

For an existing hash array, the function {fn HARRAYPROP} ({PageRef Fn HARRAYPROP}) can be used to examine the hashing and equivalence functions as the {lisp HASHBITSFN} and {lisp EQUIVFN} hash array properties.  These properties are read-only for non-empty hash arrays, as it makes no sense to change the equivalence relationship once some keys have been hashed.

The following function is useful for creating hash arrays that take strings as hash keys:

{FnDef {Name STRINGHASHBITS} {Args STRING}
{Text
Hashes the string {arg STRING} into an integer that can be used as a {lisp HASHBITSFN} for a hash array.  Strings which are {fn STREQUAL} hash to the same integer.
}}

Example:

{lispcode (HASHARRAY {arg MINKEYS} {arg OVERFLOW} 'STRINGHASHBITS 'STREQUAL)}

creates a hash array where you can use strings as hash keys.


}{End SubSec User-Specified Hashing Functions}


}{End Chapter Hash Arrays}