1 LISP LIBRARY PACKAGES MANUAL 1 LISP LIBRARY PACKAGES MANUAL HASH 1 EXTENSIONS TO LISP 1 HASH 6 Hash is a hash-coded dictionary facility, providing much the same functionality as hash arrays do, except that the data is stored on a file. 2 Introduction 1 The Hash package permits information associated with string or atom keys'' to be stored on and retrieved from files. The information (or values'') associated with the keys in a file may be numbers, strings, or arbitary Interlisp expressions. The associations are maintained by a hashing scheme that minimizes the number of file operations it takes to access a value from its key. 2 Hash Files 1 The information is saved in a hash file,'' which is analogous to a hash array. Actually, the phrase hash file refers either to the file itself, or to the handle on that file which is used by the Hash package. The latter, of data type HashFile, is the datum returned by CREATEHASHFILE or OPENHASHFILE, currently an array record containing the hash file name, the number of slots in the file, the used slots, and other details. All other functions with hash file arguments use this datum. In older implementations, (e.g., for Interlisp-10) hash files came in several varieties, according to the types of value stored in them. The EMYCIN system provided even more flexibility. This system only supports the most general EXPR type of hash files and EMYCIN-style TEXT entries, in the same file. The VALUETYPE and ITEMLENGTH arguments are for the most part ignored. Two-key hashing is supported in this system but is discouraged as it is only used in EMYCIN, not in the Interlisp-10 system. The functions GETPAGE, DELPAGE, and GETPNAME, which manipulate secret pages, do not exist in this implementation. However, it is permissible to write data at the end of a hash file; that data will be ignored by the Hash package, and can be used to store additional data. Creating a Hash File 1 (CREATEHASHFILE File ValueType ItemLength #Entries Smash CopyFn) [Function] Creates a new hash file named File. All other arguments are optional. ValueType can be EXPR or TEXT, however both kinds of entries can exist on the same file so EXPR is best. ItemLength is not used by the system but is currently saved on the file (if less than 256) for future use. #Entries is an estimate of the number of entries the file will have. (This should be a realistic guess.) Smash is a hash file datum to reuse. CopyFn is a function to be applied to entries when the file is rehashed (see the description of REHASHFILE). 2 Using a Hash File 1 Opening and Closing Hash Files 1 Before you can use a hash file with this package, you have to open it, using the function (OPENHASHFILE File Access ItemLength #Entries Smash) [Function] Reopens the previously existing hash file File. Access may be INPUT (or NIL), in which case File is opened for reading only, or BOTH, in which case File is open for both input and output. Causes an error NOT A HASHFILE, if File is not recognized as a hash file. ItemLength and #Entries are for backward compatibility with EMYCIN where OPENHASHFILE also created new hash files; these arguments should be avoided. Smash is a hash file datum to reuse. If Access is BOTH and File is a hash file open for reading only, OPENHASHFILE attempts to close it and reopen it for writing. Otherwise, if File designates an already open hash file, OPENHASHFILE is a no-op. OPENHASHFILE returns a hash file datum. When you are finished using a hash file, you should close it, using the function (CLOSEHASHFILE HashFile ReOpen) [Function] Closes HashFile. If ReOpen is non-NIL it should be one of the accepted access types. In this case the file is closed and then immediately reopened with access = ReOpen. This is used to make sure the hash file is valid on the disk. Storing and Retrieving Data 1 (PUTHASHFILE Key Value HashFile Key2) [Function] Puts Value under Key in HashFile. If Value is NIL, any previous entry for Key is deleted. Key2 is for EMYCIN two-key hashing: Key2 is internally appended to Key and they are treated as a single key. (GETHASHFILE Key HashFile Key2) [Function] Gets the value stored under Key in HashFile. Key2 is necessary if it was supplied to PUTHASHFILE. (LOOKUPHASHFILE Key Value HashFile CallType Key2) [Function] A generalized entry for inserting and retrieving values; provides certain options not available with GETHASHFILE or PUTHASHFILE. LOOKUPHASHFILE looks up Key in HashFile. CallType is an atom or a list of atoms. These keywords are interpreted as follows: RETRIEVE If Key is found, then if CallType is or contains RETRIEVE, the old value is returned from LOOKUPHASHFILE; otherwise returns T. DELETE If CallType is or contains DELETE, the value associated with Key is deleted from the file. REPLACE If CallType is or contains REPLACE, the old value is replaced with Value. INSERT If CallType is or contains INSERT, LOOKUPHASHFILE inserts Value as the value associated with Key. Combinations are possible. For example, (RETRIEVE DELETE) will delete a key and return the old value. (PUTHASHTEXT Key SRCFIL HashFile Start End) [Function] Puts text from SRCFIL onto HashFile under Key. Start and End are passed directly to COPYBYTES. (GETHASHTEXT Key HashFile DSTFIL) [Function] Uses COPYBYTES to retrieve text stored under Key on HashFile. The bytes are appended to the file DSTFIL. Functions for Manipulating Hash Files 1 (HASHFILEP HashFile Write?) [Function] Returns HashFile if it is a valid, open hash file datum or returns the hash file datum associated with HashFile if it is the name of an open hash file. If Write? is non-NIL, HashFile must also be open for write access. (HASHFILEPROP HashFile Property) [Function] Returns the value of a Property of a HashFile datum. Currently accepted properties are NAME, ACCESS, VALUETYPE, ITEMLENGTH, SIZE, #ENTRIES, CopyFn and STREAM. (HASHFILENAME HashFile) [Function] Same as (HASHFILEPROP HashFile 'NAME). (MAPHASHFILE HashFile MapFn Double) [Function] Maps over HashFile applying MapFn. If MapFn takes two arguments, it is applied to Key and Value. If MapFn only takes one argument, it is only applied to Key and saves the cost of reading value from the file. If Double is non-NIL, then MapFn is applied to (Key1 Key2 Value) or (Key1 Key2) if the MapFn only takes two arguments. (REHASHFILE HashFile NewName NewValueType) [Function] As keys are replaced, space in the data section of the file is not reused (though space in the key section is). Eventually the file may need rehashing to reclaim the wasted data space. REHASHFILE is really a special case of COPYHASHFILE, and creates a new file. If NewName is non-NIL, it is taken as the name of the rehashed file. NewValueType is a no-op. The system automatically rehashes files when 7/8 of the key section is filled. The system will print a message when automatically rehashing a file if the global variable REHASHGAG is non-NIL. Certain applications save data outside Hash's normal framework. Hash files for those applications will need a custom CopyFn, which is used to copy data during the rehashing process. The CopyFn is used as the FN argument to COPYHASHFILE during the rehashing. (COPYHASHFILE HashFile NewName FN ValueType LeaveOpen) [Function] Makes a copy of HashFile under NewName. Each key and value pair is moved individually and if FN is supplied, is applied to (KEY VALUE HASHFILE NEWHASHFILE) and what it returns is used as the value of the key in the new hash file (this lets the user intervene, perhaps to copy out-of-band data associated with VALUE). ValueType is a no-op. If LeaveOpen is non-NIL then the new hash file datum is returned open, otherwise the new Hash file is closed and the name is returned. (HASHFILESPLST HashFile XWord) [Function] Returns an Interlisp generator for the keys in HashFile, usable with the spelling corrector. If XWord is supplied, only keys starting with the prefix in XWord are generated. Global Variables 1 The variables used by the system of interest to the user are: HASHFILEDEFAULTSIZE 512 Size used when #Entries is omitted or is too small. HASHFILERDTBL ORIG The hash file read table. HASHLOADFACTOR .875 The ratio, used slots/total slots, at which the system rehashes the file, initially 7/8. HASHTEXTCHAR ^A The character separating two key hash keys. HFGROWTHFACTOR 3 The ration of total slots to used slots when a hash file is created. REHASHGAG NIL Flags whether to print message when rehashing; initially off. SYSHASHFILE NIL The current hash file. SYSHASHFILELST NIL An alist of open hash files. 2 Implementation 1 The Hash package views files as a sequence of bytes, randomly accessible. No notice is made of pages and it is assumed that the host computer buffers I/O sufficiently. Hash files consist of a short header section (8 bytes), a layer of pointers (4*HASHFILE:Size bytes) followed by ASCII data. Pointers are three bytes wide, preceeded by a status byte. The pointers point to key PNAMES in the data section, where each key is followed by its value. Deleted key pointers are reused, deleted data space is not, so rehashing is required if many items have been replaced.'' The data section starts at 4*HASHFILE:Size + 9, and consists of alternating keys and values. As deleted data is not rewritten, not all data in the data section is valid. When a key hashes into a used slot, a probe value is added to it to find the next slot to search. The probe value is a small prime derived from the original hash key. 2 Limitations 1 The system currently is able to manipulate files on CORE, DSK, FLOPPY and over the network, via leaf servers. Hash files cannot be used with NS servers until they support random access files. Due to the pointer size, only hash files of < 6 million initial entries can be created, though these can grow to 14 million entries before automatic rehashing exceeds the pointer limit. The total file length is limited to 16 million bytes. No range checking is done for these limits.(LIST ((PAGE NIL (PAPERSIZE Letter FOLIOINFO (ARABIC ) STARTINGPAGE# 282) (0 0 612 792) ((FOLIO NIL (PARALOOKS (QUAD LEFT) CHARLOOKS (SUPERSCRIPT 0 INVISIBLE OFF SELECTPOINT OFF PROTECTED OFF SIZE 10 FAMILY MODERN OVERLINE OFF STRIKEOUT OFF UNDERLINE OFF EXPANSION REGULAR SLOPE REGULAR WEIGHT MEDIUM INVERTED OFF USERINFO NIL STYLE NIL) FORMATINFO (ARABIC )) (54 12 288 36) NIL) (HEADING NIL (HEADINGTYPE FOOTINGV) (54 27 558 36) NIL) (HEADING NIL (HEADINGTYPE VERSOHEAD) (54 762 558 36) NIL) (TEXT NIL NIL (54 54 504 618) NIL))) (PAGE NIL (PAPERSIZE Letter FOLIOINFO (ARABIC )) (0 0 612 792) ((FOLIO NIL (PARALOOKS (QUAD RIGHT) CHARLOOKS (SUPERSCRIPT 0 INVISIBLE OFF SELECTPOINT OFF PROTECTED OFF SIZE 10 FAMILY MODERN OVERLINE OFF STRIKEOUT OFF UNDERLINE OFF EXPANSION REGULAR SLOPE REGULAR WEIGHT MEDIUM INVERTED OFF USERINFO NIL STYLE NIL) FORMATINFO (ARABIC )) (270 12 288 36) NIL) (HEADING NIL (HEADINGTYPE FOOTINGR) (54 27 558 36) NIL) (HEADING NIL (HEADINGTYPE RECTOHEAD) (54 762 558 36) NIL) (TEXT NIL NIL (54 54 504 684) NIL))) (PAGE NIL (PAPERSIZE Letter FOLIOINFO (ARABIC )) (0 0 612 792) ((FOLIO NIL (PARALOOKS (QUAD LEFT) CHARLOOKS (SUPERSCRIPT 0 INVISIBLE OFF SELECTPOINT OFF PROTECTED OFF SIZE 10 FAMILY MODERN OVERLINE OFF STRIKEOUT OFF UNDERLINE OFF EXPANSION REGULAR SLOPE REGULAR WEIGHT MEDIUM INVERTED OFF USERINFO NIL STYLE NIL) FORMATINFO (ARABIC )) (54 12 288 36) NIL) (HEADING NIL (HEADINGTYPE FOOTINGV) (54 27 558 36) NIL) (HEADING NIL (HEADINGTYPE VERSOHEAD) (54 762 558 36) NIL) (TEXT NIL NIL (54 54 504 684) NIL)))))*TT/T0TT. . ../T.)T( )T)T)2T)T(B PAGEHEADING VERSOHEADB PAGEHEADING RECTOHEADA PAGEHEADINGFOOTINGVA PAGEHEADINGFOOTINGRMODERN MODERN MODERNMODERNMODERN  HRULE.GETFNMODERN  HRULE.GETFNMODERN  HRULE.GETFNMODERN  HRULE.GETFNMODERN    HRULE.GETFNMODERN HRULE.GETFNMODERN   HRULE.GETFNMODERN   HRULE.GETFNMODERN   HRULE.GETFNMODERN     HRULE.GETFNMODERN ; % ` ac g HRULE.GETFNMODERN  HRULE.GETFNMODERN   HRULE.GETFNMODERN  Z M *04H$  r@(Q  A HRULE.GETFNMODERN    '  1 5 L ^ 2 8 /g  #  - % & HRULE.GETFNMODERN    W-% s          &07     <  v?0 !   8  {   /)4 HRULE.GETFNMODERN  >Y,E> HRULE.GETFNMODERN  HRULE.GETFNMODERN  { HRULE.GETFNMODERN   HRULE.GETFNMODERN  'z