Page Numbers: Yes First Page: 1 Heading:z18697x3e12qjk72(635) June 16, 1977 10:41 AM [IVY]document>str-lev-mem-compz18697y774x3e12qjk72 Status: This is the old version, and has not been updated to reflect the current implementation. It probably needs substantial modification, but the basic ideas are still the same.l3528d2999x3e12\bi8B172b A proposal for description compiling and bootstrapping in KRL-1z18697x2e12ck72\b58f1 3f0 In the LISP 1.5 manual there was a description of the LISP interpreter in LISP. It was used only as a formal way of specifying the semantics, and the actual interpreter was machine coded, attempting to duplicate those semantics. In the AltoLisp system, the interpreter is coded in LISP, but that code is actually what runs. This bootstrapping is possible because there is a compiler which can convert LISP code into bytes which the hardware (microcode) can run.d3528x2e15jk40(2116)\7f1 4f0 43f1 4f0 16f1 4f0 205f1 4f0 117f1 4f0 I propose that KRL-1 be implemented with an analogous bootstrapping of data structures. In the past we have written KRL units describing the actual KRL objects such as descriptions, units, perspectives etc. Brian did a fairly complete set last summer. But these are like the LISP interpreter in the 1.5 manual -- done after the fact, and useful only as a form of documentation. In order to really use them, we need to build a description compiler, which plays the role analogous to the byte compiler in AltoLisp. We have been thinking of building such a compiler anyway, in order to allow people to compile their domain-related data structures in order to get better efficiency (by a factor of at least 3), both in space and time.d3528x2e3jk40\15f1 3f0 99f1 3f0 29f1 3f0 126f1 4f0 148bi20BI The result would be that the underlying KRL data structures, such as description, unit, and the various descriptor types would not be implemented as they were in KRL-0 with a set of record declarations. Instead they would be defined as KRL units, and all accesses to them would be done with KRL functions such as Seek and Describe. This includes all accesses to them in the functions which define Seek, Describe and Match! Processing which goes on in those functions could be the result of triggers appearing in the defining units.d3528x2e3jk40\40f1 3f0 26i11I2i4I76f1 3f0 72f1 3f0 52f1 3f0 The sections below outline:d3528x2e3jk40 * The basic notion of description compilingl4269d4798x2e3jk40 * A plan to bootstrap the entire systeml4269d4798x2e3jk40 * Some specific proposals for structuring compiled descriptions l4269d4798x2e3jk40 Description compilingz18697x2e12k72(635)\b If we think of KRL descriptive structures as data structures, there is a price paid for the flexibility of multiple description, inherited properties, procedural attachment, etc. Every time a data access (Seek) or data modification (Describe) is done, there is a high overhead in searching down lists of perspectives, lists of fillerpairs, etc. to find the right place in the structure, and in looking for triggers on the template unit, traps on the instance unit, etc. etc. For a great many of the instances built up in actual uses of KRL, the following are likely to be true:z18697d3528x2e3jk40(2116)\15f1 3f0 520f1 3f0 * The instance will be described using only a single template, or templates from a single chain in the further-specification hierarchy.z18344l4233d3528x2e3jk40 * The fillers of the fillerpairs will not make use of full-fledged descriptive power, but will be of a uniform type for all instances. Most of them will be either:z18344l4233d3528x2e3jk40 * unit co-references (including the implicit ones realized by strings, numbers and LISP structures);z18344l5503d4798x2e3jk40\83f1 4f0 * explicitly enumerated sets and sequences whose members are all explicit coreferences.z18344l5503d4798x2e3jk40 * The triggers associated with the templates will be static -- they will all be known before the program begins running, and will not change during the course of running.z18344l4233d3528x2e3jk40 * Most calls to Seek, Describe, etc. can be declared ahead of time as to what the nature of their description (e.g. Explicit unit coreference) will be.z18344l4233d3528x2e3jk40 Some systems (such as all normal computer languages and some AI systems like MDS) opt for efficiency by making these simple cases be the only ones possible. We can take advantage of the simplifications allowed in these cases, without giving up general power, by having a mixed mode in which some parts (but not all) of the descriptive structures are selectively compiled into records. These records would be like CLISP or ALGOL W, etc. in having a fixed structure with a pre-determined number of fields. A record would be allocated all at once, and accesses to it would be made directly (i.e. indexed adressing if the implementation is array-like, or pre-compiled CAR-CDR chains if list structures are used). The user would declare (using meta-descriptions) just what is to be compiled, and the system would automatically do the appropriate bookkeeping, updating, etc.z18697d3528x2e3jk40\377bi7BI31f1 5f0 4f1 7f0 236f1 7f0 The main implementation mechanisms are:z18697d3528x2e3jk40 * The underlying data structure for a unit has two parts -- a compiled record, and an explicit description. Any unit can have one or the other or both. Issue: this raises some questions about Unit names, unit headers, swapping, etc. which aren't worked out.z18344l4233d3528x2e3jk40\152i * There exist a set of units for meta-describing other units, declaring what the compiled form is to be. z18344l4233d3528x2e3jk40 * Seek, Describe and Match are CLISP forms which are compiled by a compiler which uses the declarations associated with units, and perhaps extra declarations associated with individual calls. This compiler is the most complex part of the implementation. The simple cases are:z18344l4233d3528x2e3jk40\i1I30f1 5f0 * Seek: Ifz18344l5503d4798x2e3jk40 the path describing where to Seek is explicitly present in the call to Seek, andz18344l6773d6068x2e3jk40 this path corresponds to a declared field of a record, andz18344l6773d6068x2e3jk40 the specifier telling what kind of description is sought corresponds to the declaration for what that field will contain, andz18344l6773d6068x2e3jk40 the process description says to only try using immediate informaton (i.e. not to do a generalized search)z18344l6773d6068x2e3jk40 then compile it as a quick tag check (see below) and a record access.z18344l6773d6068x2e3jk40 * Describe: Ifz18344l5503d4798x2e3jk40 the path describing where to put the new description is explicitly present in the call to Seek, andz18344l6773d6068x2e3jk40 this path corresponds to a declared field of a record, andz18344l6773d6068x2e3jk40 the description being added corresponds to the declaration for what that field will contain, (Note this can be determined either because the new description is quoted or MakeKRL'ed in the call to Describe, or through some kind of declaration associated with the call), andz18344l6773d6068x2e3jk40\174f1 3f0 the process description says to only try using immediate triggers (i.e. not to do a generalized search for inferences)z18344l6773d6068x2e3jk40 then compile it as the sequence:z18344l6773d6068x2e3jk40 a quick tag checkz18344l8043d7338x2e3jk40 a record storez18344l8043d7338x2e3jk40 an explicit series of calls to the triggers which are associated (at any level in the furher-specified hierarchy) with the slot and the kind of description being addedz18344l8043d7338x2e3jk40 z18344l5503d4798x2e3jk40 * Each record includes some kind of recognizable tag indicating the state of the declarations at time of compilation (e.g. an atom which is GENSYMMED anew every time potentially relevant declarations change). Compiled functions check to make sure that this tag matches the declarations at the time the function was compiled, and if not, the system recompiles the record or call (whichever is out of date) into the new form (using the old and new declarations) or failing this, falls back on intepretation. z18344l4233d3528x2e3jk40\140f1 9f0 * In all cases where the compiled record is insufficient, the system simply falls back on the intepreted corresponding functions. Some obvious examples of mixed mode are:z18344l4233d3528x2e3jk40 The main example is the implementation of meta-description. Each KRL object (unit, description, etc.) has both a compiled record (containing everything we had in the data records in the KRL-0 version), and an ordinary description (which we now call the meta-description).z18344l5503d4798x2e3jk40\66f1 3f0 118f1 3f0 Objects which have some slots which are much more generally used than others can choose to compile only a subset. As an obvious example, the unit for String will have at least a slot for length: and one for characterSequence:. Only the characterSequence appears in the compiled form, with length compiled as an ephemeral (see the last section) which is computed whenever it is needed.z18344l5503d4798x2e3jk40\151i6I31i8I12i18I12i17I36i6I Objects which normally have all of their description in compiled form need not be changed to add additional description. Descriptions can be attached, for example, which carry along historical information for use in debugging. This does not change the efficiency of what is already there.z18344l5503d4798x2e3jk40 Objects which are normally filled in with unit-coreferences can be used for meta-evaluation in which descriptions are added instead. The descriptions and normal fillers coexist in the two parts.z18344l5503d4798x2e3jk40 Bootstrappingz18697x2e12k72(635)\b Given the ability to compile descriptions into an efficient form, there is no reason to have explicit access to the underlying implementation (which will be INTERLISP records or datatypes of some sort). The writers of KRL-1 programs, including the system builders will not use functions corresponding to the old GetFillerPair, AddFiller, etc. All accesses to the structures will be through use of the higher-level functions Seek, Describe, and Match. The declarations of the basic data objects (including units and descriptions) will be in the form of units and descriptions and associated compiler meta-descriptions.z18697d3528x2e15jk40(2116)\157f1 9f0 53f1 3f0 13bi29BI234bi33BI In order to get this off the ground it is necessary to prime the pump with some hand-coded facsimiles of what the thing about to be programmed would have done if it had already been programmed. Once this first stage is completed, and it really is running, the hand-coded stuff can be replaced with the corresponding real thing. If all this seems mysterious, think of the analogous fact that Larry is in the process of getting the LISP compiler (which is written in LISP) to compile itself, so he can quit using the one which runs on MAXC. After a first pass of everything is running, then later versions can be produced using the earlier versions, so updating can go on with no further hand-coding.z18697d3528x2e3jk40 Figure 1 shows the different components and their logical connections. Figure 2 shows the time dependencies. The stages involved in the process are:z18697d3528x2e3jk40 Stage 1: Write a first version of the necessary units for basic structures and compiling. Decide on the compilation declarations for them. On the basis of this, produce hand-coded versions of:z18697l4057d3528x2e3jk40\b8B * A set of compiler-structures which describe both the basic KRL structures and the compiler structures themselvesz18697l5327d4798x2e3jk40\61f1 3f0 * A reader which produces compiled basic structures from textz18697l5327d4798x2e3jk40 Stage 2: Using the reader from Stage 1, read in the first version of the compiler for seek, describe and match. This compiler is a LISP program, except that it can include quoted KRL structures.z18697l4057d3528x2e3jk40\b8B172f1 3f0 Stage 3: Use the stage 1 reader, the stage 2 compiler and the stage 1 hand-coded structures to compile the following set of simple KRL-1 programs (which are written using no multi-process scheduling, and having only trivially compilable occurences of Seek, Describe, and Match):z18697l4057d3528x2e3jk40\b8B123f1 3f0 * A declaration compiler which produces compiler structures by looking at meta-descriptionsz18697l5327d4798x2e3jk40 * A Seek/Describe/Match compiler which handles more cases, and uses the actual unitsz18697l5327d4798x2e3jk40 * A translator which takes a compiled description which was compiled under an old set of declarations and produces a new compiled structure corresponding to new declarations.z18697l5327d4798x2e3jk40 Stage 4: Use the stage 3 programs and the actual unit definitions to go back and re-compile the stage 3 programs. Note that if everything goes well, the result should be exactly the same as the result of stage 3, but done with no feet on the ground. All of the hand-coded pieces are now vestigial. The translator isn't strictly necessary, assuming that on the way up, the units and declarations never got changed. However, it is vital in the case where changes are made.z18697l4057d3528x2e3jk40\b8B Stage 5: Use the stage 1 reader, the unit definitions, and the stage 4 programs to read and compile the following which are written in full KRL-1:z18697l4057d3528x2e3jk40\b9B131f1 3f0 * A schedulerz18697l5327d4798x2e3jk40 * An interpreter for the non-compiled cases of Seek, Describe, and Matchz18697l5327d4798x2e3jk40 * A reader and printerz18697l5327d4798x2e3jk40 Stage 6: We now should have a full running version of KRL-1. Note that the interpreter and scheduler don't need to be bootstrapped in, since they call themselves recursively and bottom out on simple cases which are compiled.z18697l4057d3528x2e3jk40\b8B46f1 3f0 Stage 7: Write smarter versions of the various compilers, using full KRL-1. This is an endless process, but it may be useful to make one step up from the versions written for stage 3.z18697l4057d3528x2e3jk40\b8B61f1 3f0 z18697l4057d3528x2e3jk40 Some detailed issuesz18697x2e12k72(635)\b Basic form of recordsz18697x2e12k72\b There are some tradeoffs of time/space/flexibility in deciding just what the records should be like. The major issues come up because of further-specification. I can think of two obvious proposals:z18697d3528x2e3jk40(2116) * Packed records. Each record consists of a simple linear sequence of fields. Further-specification hierarchies are restricted to be trees (only one parent for any unit). If B and C are further specified A's, then any instance of B will have a record whose fields are those of A, followed by those of B, and any instance of C will have a record whose fields are those of A, followed by those of C. Thus, a Seek which specifies a slot of an A can be compiled so that it will work on instances of both B and C, as well as instances labelled only as A. The advantage of this scheme is in the efficiency of the storage, and in the fact that all compiled code consists of simple access.z18697l4057d3528x2e3jk40\2b15B160b1B5b1B23b1B25b1B46b1B23b1B22b1B46b1B23b1B45b1B59b1B5b1B40b1B * Linked entries. Each record consists of a linked list of entries, each of which names a unit (and corresponding compile-time tag), and contains a simple linear sequence of fields. A Unit can be a further specification of any number of others, as long as none of the compiled names conflict. The fields stored in any entry are those which are specified in the Unit, but not in any of its further-specification parents. Accesses must include a run-time search for the appropriate entry. This leaves full flexibility in specifying hierarchies of structure inheritance, but is somewhat wasteful of both space and time. If we weren't bootstrapping all the way down to the fundamental structures, I would definitely favor this one. It looks like there should be some way to make it not really cost much in the case where the individual instances don't have multiple entries, but this leads to some complications in compiling. z18697l4057d3528x2e3jk40\2b15B What can be declared?z18697x2e12k72(635)\b Figure 3 gives a proposed outline of declarations which can be associated with a single slot. The declaration for a Unit basically just names the slots which are to be included:z18697d3528x2e3jk40(2116) * Type. The field can be specified to contain a description of a given form (specified with a meta-description), or to contain a naked unit-pointer which will be interpreted as a co-reference to that unit.z18697l4057d3528x2e3jk40\2b5B * Multiplicity The field can be specified to contain just one element of its type, or a set or a sequence (which will be represented with LISP lists)z18697l4057d3528x2e3jk40\2b13B123f1 4f0 * Changeability The field can be declared unchangeable (once it is specified, attempts to respecify it cause an error), or changeable. Set and sequence records which are changeable can be further declared as monotone-increasing (in which case simple LISP functions are used for adding) or arbitrary. An ephemeral field is different, in that it does not appear in the record. The declaration associates a procedure with getting and one with putting, and compiles calls to these whenever the corresponding field access would have been done.z18697l4057d3528x2e3jk40\2b13B Compiler structuresz18697x2e12k72(635)\b The information used by the compiler is spread around in a variety of meta-descriptions associated with the unit and its slots. This will be condensed into a simple data structure used by the compiler, and care must be taken to be sure that it gets updated appropriately (through triggers on the units for declarations). This structure must include:z18697d3528x2e3jk40(2116) * the unitz18697d3528x2e3jk40 * the name of the slotz18697d3528x2e3jk40 * the type of fieldz18697d3528x2e3jk40 * the associated procedures if it is ephemeralz18697d3528x2e3jk40 * triggers associated with that slot (perhaps in a compiled form)z18697d3528x2e3jk40