File [eris]<LispManual>HashArrays.im!2



{Begin SubSec Hash Arrays}
{Title Hash Arrays}
{Text


{Note Interlisp-D doesn't GC any hash keys. }

{Note Interlisp-Vax doesn't throw away any hash keys}

{Note Will the garbage collector ever delete entries from a hash table?
Personal opinion:  I think that the GC should NOT delete entries, and that you should always be able to retrieve everything you put in a hash array using MAPHASH ---mjs}

{Note Exactly how are hash arrays different from arrays?  Do we want to document this here?  Personal opinion: we want to treat hash arrays as a totally different data type ---mjs}

{Tag HashArrays}

{index *PRIMARY* hash arrays}
{index *PRIMARY* hash keys}
{index *PRIMARY* hash values}

Hash arrays provide a mechanism for associating arbitrary lisp objects ("hash keys") with other objects ("hash values"), such that the hash value associated with a particular hash key can be quickly obtained.  A set of associations could be represented as a list or array of pairs, but these schemes are very inefficient when the number of associations is large.  There are functions for creating hash arrays, putting a hash key/value pair in a hash array, and quickly retrieving the hash value associated with a given hash key.

Hash keys can be any lisp object, but is should be noted that the hash array functions use {fn EQ} for comparing hash keys.  Therefore, if non-atoms are used as hash keys, the exact same object (not a copy) must be used to retrieve the hash value.

{note paragraph about GC of non-atomic hash keys (done in I-10, not in I-D)?}


In the description of the functions below, the argument {arg HARRAY} has one of three forms:
{lisp NIL}, in which case a hash array provided by the system, {index SYSHASHARRAY Var}{var SYSHASHARRAY}, is used; 
a hash-array created by the function {fn HARRAY};
or a list, {fn CAR} of which is a hash array.
The latter form is used for specifying what is to be done on overflow, as described below.


{Note What does 'size' MEAN??  Is size='max number hash links', or 2 times that?

vaguely, size=# of "cells" of storage  ---lmm

do we want to doc this??}


{FnDef {FnName HARRAY} {FnArgs LEN}
{Text
Creates a hash array containing at least {arg LEN} hash keys.
}}



{FnDef {FnName HARRAYSIZE} {FnArgs HARRAY}
{Text
Returns the size of {arg HARRAY}; the number of hash keys it can hold before becoming "full".
}}


{FnDef {FnName CLRHASH} {FnArgs HARRAY}
{Text
Clears all hash keys/values from {arg HARRAY}.
Returns {arg HARRAY}.
}}


{FnDef {FnName PUTHASH} {FnArgs KEY VAL HARRAY}
{Text
Associates the hash value {arg VAL} with the hash key {arg KEY} in {arg HARRAY}.  Replaces the previous hash value, if any.  If {arg VAL} is {lisp NIL}, any old association is removed (hence a hash value{index hash value} of {lisp NIL} is not allowed).  If {arg HARRAY} is full when {fn PUTHASH} is called with a key not already in the hash array, the function {fn HASHOVERFLOW} is called, and the {fn PUTHASH} is done to the value returned (see below).  Returns {arg VAL}.
}}


{FnDef {FnName GETHASH} {FnArgs KEY HARRAY}
{Text
Returns the hash value associated with the hash key {arg KEY} in {arg HARRAY}.  Returns {lisp NIL}, if {arg KEY} is not found.
}}


{FnDef {FnName REHASH} {FnArgs OLDHARRAY NEWHARRAY}
{Text
Hashes all hash keys and values in {arg OLDHARRAY} into {arg NEWHARRAY}.  The two hash arrays do not have to be (and usually aren't) the same size.  Returns {arg NEWHARRAY}.

{note in Interlisp-Vax, if OLDHARRAY=NEWHARRAY, rehashes OLDHARRAY "in place".}
}}


{FnDef {FnName MAPHASH} {FnArgs HARRAY MAPHFN}
{Text
{arg MAPHFN} is a function of two arguments.
For each hash key in {arg HARRAY}, {arg MAPHFN} will be
applied to (1) the hash value, and (2) the hash key.  For example,

{lispcode
[MAPHASH A
   (FUNCTION (LAMBDA (VAL KEY)
                  (if (LISTP KEY) then (PRINT VAL)]}

will print the hash value for all hash keys that are lists.  {fn MAPHASH} returns {arg HARRAY}.
}}


{FnDef {FnName DMPHASH} {FnArgs HARRAY{SUB 1} HARRAY{SUB 2} {ellipsis} HARRAY{SUB N}}
{Type NOSPREAD NLAMBDA}
{Text
Prints on the primary output file {fn LOAD}able forms which will restore the hash-arrays contained as the values of the atoms
{arg HARRAY{SUB 1}}, {arg HARRAY{SUB 2}},
{ellipsis} {arg HARRAY{SUB N}}.
Example: {lisp (DMPHASH SYSHASHARRAY)} will dump the system hash-array.

Note: all {fn EQ} identities except atoms and small integers are lost by dumping and loading because {fn READ} will create new structure for each item.
Thus if two lists contain an {fn EQ} substructure, when they are dumped and loaded back in, the corresponding substructures while {fn EQUAL} are no longer {fn EQ}.
The {lisp HORRIBLEVARS} file package command ({PageRef FileCom HORRIBLEVARS}) provides a way of dumping hash tables such that these identities are preserved.
}}


{note use CIRCULARVARS or GENERALVARS instead of HORRIBLEVARS  ---lmm}




{Begin SubSec Hash Overflow}
{Title Hash Overflow}
{Text

{index hash overflow}

{note who created this kludgy way of specifying overflow?  Why not simply have the overflow action be determined by a property of an HARRAY?  Then, you wouldn't have to pass around these dumb lists.}


When a hash array becomes full, attempting to add another hash key will cause the function {fn HASHOVERFLOW}{index HASHOVERFLOW Fn} to be called.  This will either automatically enlarge the hash array, or cause the error {lisp HASH TABLE FULL}.{index HASH TABLE FULL Error}  How hash overflow is handled is determined by the form that was passed to {fn PUTHASH}:


{Begin LabeledList the form that is passed to the hash function}

{Name {arg HARRAY}}
{Text
If a plain hash array is passed to a hash function, and it overflows, the error {lisp HASH ARRAY FULL}{index HASH ARRAY FULL Error} is generated.
}

{Name {lisp NIL}}
{Text
If a hash function is passed {lisp NIL} as its {arg HARRAY} argument, the system hash array {var SYSHASHARRAY}{index SYSHASHARRAY Var} is used.  This array is not used by the system, but is provided for the user.  If {var SYSHASHARRAY} overflows, it is automatically enlarged by 1.5.
}

{Name {lisp ({arg HARRAY} . {arg N})}}
{Text
{arg N} is a positive integer.  This form specifies that upon hash overflow, a new hash-array is created with {arg N} more cells than the current hash-array.
}

{Name {lisp ({arg HARRAY} . {arg F})}}
{Text
{arg F} is a floating point number.  This form specifies that upon hash overflow, the new hash array will be {arg F} times the size of the current hash-array.
}

{Name {lisp ({arg HARRAY} . {arg FN})}}
{Text
{arg FN} is a function name or a lambda expression.  This form specifies that upon hash overflow, {arg FN} is called with {lisp ({arg HARRAY} . {arg FN})} as its argument.  If {arg FN} returns a number, the number will be the size of the new hash array.  Otherwise, the new size defaults to 1.5 times the size of the old hash array.  {arg FN} could be used to print a message, or perform some monitor function.
}

{Name {lisp ({arg HARRAY})}}
{Text
Equivalent to {lisp ({arg HARRAY} . 1.5)}.
}


{End LabeledList the form that is passed to the hash function}


If a list form is used, upon hash overflow the new hash-array is {fn RPLACA}ed into the dotted pair, and {fn HASHOVERFLOW} returns it.


}{End SubSec Hash Overflow}



}{End SubSec Hash Arrays}