{Note The next few paragraphs seem to be a little too explicit.  Do we really want to go into this much detail?  How about describing hash arrays as 'black-boxes' that you give a hash-item to and receive a hash-value in return?  Perhaps there could be a seperate section later describing the algorithm in a little more detail.  ---mjs}{index hash links}A hash link is an association between any Interlisp pointer (atoms, numbers, arrays, strings, lists, et al) called the {index hash-item}{term hash-item}, and any other Interlisp pointer called the {index hash-value}{term hash-value}.Hash links are implemented by computing an address, called the {term hash-address}{index hash-address}, in a specified array, called the {term hash-array},{index hash-array} and storing the {term hash-item}{index hash-item} and the {term hash-value}{index hash-value} into the cell with that address.  The contents of that cell, i.e. the {term hash-item} and {term hash-value}, is then called the {index hash-link}{term hash-link}.{footThe term {term hash link} (unhyphenated) refers to the process of associating information this way, or the "association" as an abstract concept.}{comment endfootnote}Since the hash-array is obviously much smaller than the total number of possible hash-items,{footwhich is the total number of Interlisp pointers, i.e. in Interlisp-10, 256K.}{comment endfootnote}the hash-address computed from {arg ITEM} may already contain a hash-link.If this link is from {arg ITEM},{foot{fn EQ} is used for comparing {arg ITEM} with the hash-item in the cell.}{comment endfootnote}the new hash-value simply replaces the old hash-value.Otherwise, another hash-address (in the same hash-array) must be computed, etc, until an empty cell is found,{footWhen the hash array becomes 7/8 full, it is considered to be full, and the array is either enlarged, or an error is generated, as described below in the discussion of overflow.}{comment endfootnote}or a cell containing a hash-link from {arg ITEM}.{index hash links}When a hash link for {arg ITEM} is being retrieved, the hash-address is computed using the same algorithm asthat employed for making the hash link.If the corresponding cell is empty, there is no hash link for {arg ITEM}.If it contains a {index hash-link}hash-link from {arg ITEM}, the {index hash-value}hash-value is returned.Otherwise, another hash-address must be computed, and so forth.Note that more than one hash link can be associated with a given {index hash-item}hash-item by using more than one {index hash-array}hash-array.----------{var SYSHASHARRAY} is not used by the system, it is provided solely for the user's benefit.  It is initially 512 words large, and is automatically enlarged by 50% whenever it is "full".  See page {PageRef L!6}.-----------In Interlisp-10, the size of the hash array may be increased so that it is relatively prime to possible probe intervals.{note is this worth mentioning?}{Begin Note}Date: 28 Oct 1981 11:16 PSTFrom: KAPLAN at PARC-MAXCFYI, hash arrays in Interlisp-D currently ARE a power of 2, in the interests ofefficiency.  That might change if we get microcode support for hashing.{End Note}-----------By using an array argument of a special form, the user can provide for automatic enlargement of a {index hash-array}hash-array when it overflows,i.e., is full and an attempt is made to store a hash link into it.The array argument is either of the form[1] {lisp ({arg HARRAY} . {arg N})}, {arg N} a positive integer;[2] {lisp ({arg HARRAY} . {arg F})}, {arg F} a floating point number;[3] {lisp ({arg HARRAY})};or[4] {lisp ({arg HARRAY} . {arg FN})}, {arg FN} a function name or a lambda expression.In the first case, a new hash-array is created with {arg N} more cells than the current hash-array.In the second case, the new hash array will be {arg F} times the size of the current hash-array.The third case, {lisp ({arg HARRAY})},is equivalent to {lisp ({arg HARRAY} . 1.5)}.In the fourth case, {lisp ({arg HARRAY} . {arg FN})}, {arg FN} is called with {lisp ({arg HARRAY} . {arg FN})} as its argument.  If {arg FN} returns a number, the number will be the size of the new hash array.Otherwise, the new size defaults to 1.5 times the size of the old hash array, e.g. {arg FN} could be used to print a message, or perform some monitor function.In each case, the new hash-array is {fn RPLACA}ed into the dotted pair, and the computation continues.If a {index hash-array}hash-array overflows, and the array argument used was not one of these three forms, the error{lisp HASH TABLE FULL}{index HASH TABLE FULL EM} is generated, which will either cause a break or unwind to the last {index ERRORSET}{fn ERRORSET}, as per treatment of errors described in{SectionRef L!ERRORS}.The system hash array,{index SYSHASHARRAY SY} {var SYSHASHARRAY}, is automatically enlarged by 1.5 when it is full.-----------{Begin Note}Date: 23 Sep 1981 0740-PDTFrom: Dave Dyer <DDYER at USC-ISIB>To: Lispdiscussion^.pa at PARC-MAXC I'd like opinions to what extent hash arrays should be usableinterchangably with ordinary arrays. Should SETA and ELT work? Should both SETA and SETD retreve the key and data fields respectively? Should ARRAYSIZE work?  And which number, actual size or HARRAYSIZE   should it return if those are different? I tend to favor the view that ARRAYS and HARRAYS should not beinterchangable, to allow maximum flexibility in the choice ofthe representation of hash arrays across various Interlisps.(in which case, Interlisp-10 should be made more restrictive)Date: 25 Sep 1981 10:09 PDTFrom: Bobrow at PARC-MAXCIn-reply-to: DDYER's message of 23 Sep 1981 0740-PDTI agree that ARRAYS and HARRAYS should not be interchangable.  The garbage collection properties of the two should be able to be different.  Having something as a key in a hash array should not necessarily allow one to hold on to it.Date: 25 Sep 1981 14:17 PDTFrom: Masinter at PARC-MAXCSubject: Re: Arrays and hasha arraysThe issue has very little to do with GC semantics, which relates as much toMAPHASH rather than ELT and SETA, but whether HARRAYP can beimplemented other than as an array.My opinions are:HARRAYP can be a separate datatype from ARRAYP. All current lisp systemcode is believed to be written in such a way that that is true (it is stated so inthe VM). Note that TYPENAME must return the appropriate token when givenan ARRAYP or HARRAYP.Thus, ARRAYSIZE, SETA, ELT, SETD, ELTD are NOT expected to work onHARRAYPs; the only operations valid for things created by HARRAY areGETHASH, PUTHASH, CLRHASH, MAPHASH, HARRAYSIZE. The requirement on HARRAYSIZE is as follows:	(HARRAYSIZE (HARRAY n)) ge n	X must hold (HARRAYSIZE X) - (HARRAYSIZE Y) more key/value		atributes than Y.(In Interlisp-D, while things created by HARRAYP will be ARRAYP and respondto ARRAYSIZE, the ELT and SETA functions will complain when given aHARRAYP.){End Note}{Begin Note}Date: 19 Apr 1982 16:03 PSTFrom: JonL at PARC-MAXCSubject: Re: COPYARRAYI agree pretty strongly with Dyer that "hash-arrays" should not be thoughtof as arrays -- I've a version of a hashing package written in NIL/CommonLispwhich implements a HASH-TABLE as a semi-intelligent "object", and generallyone array isn't enough to hold the appropriate data (about a dozen more statevariables were needed, and in some cases a second data array).  Thefunctionalities implemented were a superset of those documented in theLISPMachine manual for hash-on-eq and hash-on-equal.Perhaps the current Interlisp hasharray should be kept for backwardscompatibility only, and some future package could implement a moredeveloped facility.  If so, it would be fairly important to have a typeof array which holds pointers, but which does not cause the pointers to beprotected/copied during a GC; alternatively, there could be a type ofpointer array which the GC sweeps after marking everything else, andjust deletes entries which haven't been marked elsewhere.  Thus it would bepossible for "GC" methods to take care of the case where you want a table entryto go away when no one else points to its key.  Incidentaly, the use of EXTENDs (i.e., "object-oriented" programming) madeit easy to put in special PRINT methods for these hash-tables, but let the defaultclass heirarchy values stand for most other methods, e.g., COPY.Date: 19 APR 1982 2246-PSTFrom: MASINTER.PAIt is clear that Interlisp says that HARRAYPs can be differentfrom ARRAYP, and it is a quirk of Interlisp-10 that COPYARRAworks (I'm actually not sure).But what is this about GC's "sweeping"?{End Note}{Begin Note}-------------------------------------the following few messages deal with the problem of hash array entries being deleted by the garbage collector -----mjsDate: 3 NOV 1977 1925-PSTDate: 29 APR 1975 1115-PDTFrom: DEUTSCHSubject: GCTo:   hartley at BBNBcc:   teitelman, bobrowI have (apparently ) finally encountered a situation where I reallyneed to know under precisely what circumstances the G.C. will deletean entry from a hash array.  I have a (unfortunately large) programwhich works properly if I do MINFS(25000) and RECLAIM) before Istart running it, but if I don't, it blows up with a NON-NUMERIC ARGNIL which results from something which it "knows" is in a hash arraynot being there.  In the latter case, a GC: 8 occurs during a phaseof the computation when there are many hash arrays beingfilled which will be examined later.Just to explain my current understanding: it is my impression thatthe G.C. will delete an entry from a hash array if the key(the thing you give to GETHASH) is otherwise reclaimable, i.e. ifthere are no references to the key other than from entry keys inhash arrays, and the key is not a small number or litatom.If this is true, I think it may be what is screwing me, sinceI think I may have hash arrays which at some point in their livesonly exist to be scanned with MAPHASH and whose keys no longer havereferences to them in some cases.  (This is not a contrived situation:imagine a system for manipulating sets which represents sets ashash arrays.  Then the keys almost certainly have no other referencesto them, and the ONLY important use of the arrays is MAPHASH.)-------29-APR-75 13:21:41-PDT,1050;000000000001Net mail from site BBN-TENEXB rcvd at 29-APR-75 13:21:33Date: 29 APR 1975 1619-EDTFrom: HARTLEY at BBN-TENEXBSubject: HASH ARRAYS AND GCTo:   DEUTSCH at PARCcc:   TEITELMAN at PARC, BOBROW at PARC, LEWISPETER,YOUR SITUATION IS A MORE COMPLICATED VERSION OF A SITUATIONTHAT AROSE HERE AWHILE AGO AND HAS BEEN BUGGINGME EVER SINCE.A USER SAVED A HASH ARRAY WITH DUMPHASH(SPELLING?) ANDFOUND THAT SHE COULDNT GET IT LOADED IN BECAUSEA GC OCCURED WHILE IT WAS LOADINGAND ALL THE HASH ENTRIES WENT AWAY. HER INTENT WASTO LOAD THE HASH ARRAY AND THEN MAPHASH THRU ITTO CONSTRUCT THE REST OF HER DATA STRUCTURE (HAD SHEGOTTENT O THE STAGE OF CREATING THE REST OF THESTRUCTURE BEFORE THE GC THEN THE HASH ARRAY WOULD NOTHAVE DISAPPEARED (THE CONTENTS OF THE HASH ARRAY THET IS)).THE LESSON TO BE LEARNED IS THAT EITHER MAPHASH SHOULD NOTEXIST, OR THAT THE GARBAGE COLLECTOR SHOULD NEVERDELETE ENTRIES FROM A HASH ARRAY.THIS PROBLEM IS RELATED TO MY OBJECTIONS TO THEDESIRE FOR MAPATOMS.Date: 29 APR 1975 1414-PDTFrom: DEUTSCHSubject: hash arrays & gcTo:   hartley at BBNBcc:   teitelman, bobrow, lewis at BBNBWell, I'm glad to learn I'm not alone in being bitten by this bug.The logical problem is fairly subtle.  I'm not willing to give upMAPHASH (and I see it as being different than MAPATOMS in that itrequires an explicit act to create and delete hash associations whereasatoms are created "at need" and therefore may more reasonable bereclaimed "when no longer needed").  What is really going in is thatsome hash arrays are sort of like property list associations,which persist indefinitely even if the user has forgotten all aboutthe atom that possesses them, whereas others are more like associationsby direct pointers (like list-records), which disappear when the keyis no longer accessible.  Litatoms can possess both types ofassociations: an example of the former is the EXPR property, whereasa "memo function" like a numeric hash from the name might be an exampleof the latter property.I confess I don't see a good solution to this problem.  The one thatcomes to mind is a bit in the hash array that tells the g.c. whetherit is allowed to delete entries, but somehow this is unsatisfying.However, if (as it appears) the difference between the two situationsis really the user's "intent", then there is no way the system canalways do the right thing, and some kind of explicit user-settable bitis really required.Comments?-----------------------------------the following few messages deal with the problem of MAPHASH (in Interlisp-10) working incorrectly if a garbage collection occurs at the wrong time-----mjsDate: 1 Mar 1982 13:09 PSTFrom: Masinter at PARC-MAXCSubject: Re: MAPHASH problems in Interlisp??(1) You have unfortunately run into one of the more subtle problems withMAPHASH in Interlisp-10: if a garbage collection which MOVES STORAGEoccurs during the middle of a MAPHASH, it is possible for the hashpointers to move around, and for entries to be missed and for some entriesto be visitied twice. This is the only situation in which MAPHASH willomit items or present them twice (note that "rehashing" actually copies theoriginal array into another one, so that if a rehash occurs because of overflow,you may get outdated information but not any duplicates.)(2) The problem is that if a reclaim needs to increase the size of one of thecontiguous areas (such as array space or string pointer space), it may actuallymove around pages of atoms. It isn't that atoms get compacted but rather thatother spaces have to increase which causes the atoms to get moved around.(3) The way that I worked around this problem when I ran into it was	(a) MAPHASH down the array, collecting a list of the "keys"	(b) MAPC down that list, performing the operationThis guarantees that no string/array/pname garbage collection will occur duringMAPHASH.There are some proposals for fixing this problem in Interlisp-10 (e.g., marking thearray that it is being maphashed, and if so marked, not rehashing during areclaim but fixing it the next puthash) but so far (for the last 4 years) noprogress on fixing it.This bug is not present in other Interlisp implementations, as far as I know. Date:  1 Mar 1982 1732-PSTFrom: Steve Crocker <Crocker at USC-ISIF>Subject: Re: MAPHASH problems in Interlisp??An alternative strategy is not to use MAPHASH at all.  An auxiliarylist may be kept, suitably updated whenever items are added to or deletedfrom the table.If this sounds ridiculously expensive, I submit that it competes with theproposed solution for some frequencies of MAPHASHing, adding, deleting andaccessing elements.It would be interesting to see the crossover point.{End Note}