[Indigo]<KRL>document>str-lev-mem-comp!2

Page Numbers: Yes First Page: 1
Heading:

June 16, 1977 10:41 AM [IVY]<KRL>document>str-lev-mem-comp

Status: This is the old version, and has not been updated to reflect the current implementation. It probably needs substantial modification, but the basic ideas are still the same.

A proposal for description compiling and bootstrapping in KRL-1

In the LISP 1.5 manual there was a description of the LISP interpreter in LISP. It was used only as a formal way of specifying the semantics, and the actual interpreter was machine coded, attempting to duplicate those semantics. In the AltoLisp system, the interpreter is coded in LISP, but that code is actually what runs. This bootstrapping is possible because there is a compiler which can convert LISP code into bytes which the hardware (microcode) can run.

I propose that KRL-1 be implemented with an analogous bootstrapping of data structures. In the past we have written KRL units describing the actual KRL objects such as descriptions, units, perspectives etc. Brian did a fairly complete set last summer. But these are like the LISP interpreter in the 1.5 manual -- done after the fact, and useful only as a form of documentation. In order to really use them, we need to build a description compiler, which plays the role analogous to the byte compiler in AltoLisp. We have been thinking of building such a compiler anyway, in order to allow people to compile their domain-related data structures in order to get better efficiency (by a factor of at least 3), both in space and time.

The result would be that the underlying KRL data structures, such as description, unit, and the various descriptor types would not be implemented as they were in KRL-0 with a set of record declarations. Instead they would be defined as KRL units, and all accesses to them would be done with KRL functions such as Seek and Describe. This includes all accesses to them in the functions which define Seek, Describe and Match! Processing which goes on in those functions could be the result of triggers appearing in the defining units.

The sections below outline:

* The basic notion of description compiling

* A plan to bootstrap the entire system

* Some specific proposals for structuring compiled descriptions

Description compiling

If we think of KRL descriptive structures as data structures, there is a price paid for the flexibility of multiple description, inherited properties, procedural attachment, etc. Every time a data access (Seek) or data modification (Describe) is done, there is a high overhead in searching down lists of perspectives, lists of fillerpairs, etc. to find the right place in the structure, and in looking for triggers on the template unit, traps on the instance unit, etc. etc. For a great many of the instances built up in actual uses of KRL, the following are likely to be true:

* The instance will be described using only a single template, or templates from a single chain in the further-specification hierarchy.

* The fillers of the fillerpairs will not make use of full-fledged descriptive power, but will be of a uniform type for all instances. Most of them will be either:

* unit co-references (including the implicit ones realized by strings, numbers and LISP structures);

* explicitly enumerated sets and sequences whose members are all explicit coreferences.

* The triggers associated with the templates will be static -- they will all be known before the program begins running, and will not change during the course of running.

* Most calls to Seek, Describe, etc. can be declared ahead of time as to what the nature of their description (e.g. Explicit unit coreference) will be.

Some systems (such as all normal computer languages and some AI systems like MDS) opt for efficiency by making these simple cases be the only ones possible. We can take advantage of the simplifications allowed in these cases, without giving up general power, by having a mixed mode in which some parts (but not all) of the descriptive structures are selectively compiled into records. These records would be like CLISP or ALGOL W, etc. in having a fixed structure with a pre-determined number of fields. A record would be allocated all at once, and accesses to it would be made directly (i.e. indexed adressing if the implementation is array-like, or pre-compiled CAR-CDR chains if list structures are used). The user would declare (using meta-descriptions) just what is to be compiled, and the system would automatically do the appropriate bookkeeping, updating, etc.

The main implementation mechanisms are:

* The underlying data structure for a unit has two parts -- a compiled record, and an explicit description. Any unit can have one or the other or both. Issue: this raises some questions about Unit names, unit headers, swapping, etc. which aren’t worked out.

* There exist a set of units for meta-describing other units, declaring what the compiled form is to be.

* Seek, Describe and Match are CLISP forms which are compiled by a compiler which uses the declarations associated with units, and perhaps extra declarations associated with individual calls. This compiler is the most complex part of the implementation. The simple cases are:

* Seek: If

the path describing where to Seek is explicitly present in the call to Seek, and

this path corresponds to a declared field of a record, and

the specifier telling what kind of description is sought corresponds to the declaration for what that field will contain, and

the process description says to only try using immediate informaton (i.e. not to do a generalized search)

then compile it as a quick tag check (see below) and a record access.

* Describe: If

the path describing where to put the new description is explicitly present in the call to Seek, and

this path corresponds to a declared field of a record, and

the description being added corresponds to the declaration for what that field will contain, (Note this can be determined either because the new description is quoted or MakeKRL’ed in the call to Describe, or through some kind of declaration associated with the call), and

the process description says to only try using immediate triggers (i.e. not to do a generalized search for inferences)

then compile it as the sequence:

a quick tag check

a record store

an explicit series of calls to the triggers which are associated (at any level in the furher-specified hierarchy) with the slot and the kind of description being added

* Each record includes some kind of recognizable tag indicating the state of the declarations at time of compilation (e.g. an atom which is GENSYMMED anew every time potentially relevant declarations change). Compiled functions check to make sure that this tag matches the declarations at the time the function was compiled, and if not, the system recompiles the record or call (whichever is out of date) into the new form (using the old and new declarations) or failing this, falls back on intepretation.

* In all cases where the compiled record is insufficient, the system simply falls back on the intepreted corresponding functions. Some obvious examples of mixed mode are:

The main example is the implementation of meta-description. Each KRL object (unit, description, etc.) has both a compiled record (containing everything we had in the data records in the KRL-0 version), and an ordinary description (which we now call the meta-description).

Objects which have some slots which are much more generally used than others can choose to compile only a subset. As an obvious example, the unit for String will have at least a slot for length: and one for characterSequence:. Only the characterSequence appears in the compiled form, with length compiled as an ephemeral (see the last section) which is computed whenever it is needed.

Objects which normally have all of their description in compiled form need not be changed to add additional description. Descriptions can be attached, for example, which carry along historical information for use in debugging. This does not change the efficiency of what is already there.

Objects which are normally filled in with unit-coreferences can be used for meta-evaluation in which descriptions are added instead. The descriptions and normal fillers coexist in the two parts.

Bootstrapping

Given the ability to compile descriptions into an efficient form, there is no reason to have explicit access to the underlying implementation (which will be INTERLISP records or datatypes of some sort). The writers of KRL-1 programs, including the system builders will not use functions corresponding to the old GetFillerPair, AddFiller, etc. All accesses to the structures will be through use of the higher-level functions Seek, Describe, and Match. The declarations of the basic data objects (including units and descriptions) will be in the form of units and descriptions and associated compiler meta-descriptions.

In order to get this off the ground it is necessary to prime the pump with some hand-coded facsimiles of what the thing about to be programmed would have done if it had already been programmed. Once this first stage is completed, and it really is running, the hand-coded stuff can be replaced with the corresponding real thing. If all this seems mysterious, think of the analogous fact that Larry is in the process of getting the LISP compiler (which is written in LISP) to compile itself, so he can quit using the one which runs on MAXC. After a first pass of everything is running, then later versions can be produced using the earlier versions, so updating can go on with no further hand-coding.

Figure 1 shows the different components and their logical connections. Figure 2 shows the time dependencies. The stages involved in the process are:

Stage 1: Write a first version of the necessary units for basic structures and compiling. Decide on the compilation declarations for them. On the basis of this, produce hand-coded versions of:

* A set of compiler-structures which describe both the basic KRL structures and the compiler structures themselves

* A reader which produces compiled basic structures from text

Stage 2: Using the reader from Stage 1, read in the first version of the compiler for seek, describe and match. This compiler is a LISP program, except that it can include quoted KRL structures.

Stage 3: Use the stage 1 reader, the stage 2 compiler and the stage 1 hand-coded structures to compile the following set of simple KRL-1 programs (which are written using no multi-process scheduling, and having only trivially compilable occurences of Seek, Describe, and Match):

* A declaration compiler which produces compiler structures by looking at meta-descriptions

* A Seek/Describe/Match compiler which handles more cases, and uses the actual units

* A translator which takes a compiled description which was compiled under an old set of declarations and produces a new compiled structure corresponding to new declarations.

Stage 4: Use the stage 3 programs and the actual unit definitions to go back and re-compile the stage 3 programs. Note that if everything goes well, the result should be exactly the same as the result of stage 3, but done with no feet on the ground. All of the hand-coded pieces are now vestigial. The translator isn’t strictly necessary, assuming that on the way up, the units and declarations never got changed. However, it is vital in the case where changes are made.

Stage 5: Use the stage 1 reader, the unit definitions, and the stage 4 programs to read and compile the following which are written in full KRL-1:

* A scheduler

* An interpreter for the non-compiled cases of Seek, Describe, and Match

* A reader and printer

Stage 6: We now should have a full running version of KRL-1. Note that the interpreter and scheduler don’t need to be bootstrapped in, since they call themselves recursively and bottom out on simple cases which are compiled.

Stage 7: Write smarter versions of the various compilers, using full KRL-1. This is an endless process, but it may be useful to make one step up from the versions written for stage 3.

Some detailed issues

Basic form of records

There are some tradeoffs of time/space/flexibility in deciding just what the records should be like. The major issues come up because of further-specification. I can think of two obvious proposals:

* Packed records. Each record consists of a simple linear sequence of fields. Further-specification hierarchies are restricted to be trees (only one parent for any unit). If B and C are further specified A’s, then any instance of B will have a record whose fields are those of A, followed by those of B, and any instance of C will have a record whose fields are those of A, followed by those of C. Thus, a Seek which specifies a slot of an A can be compiled so that it will work on instances of both B and C, as well as instances labelled only as A. The advantage of this scheme is in the efficiency of the storage, and in the fact that all compiled code consists of simple access.

* Linked entries. Each record consists of a linked list of entries, each of which names a unit (and corresponding compile-time tag), and contains a simple linear sequence of fields. A Unit can be a further specification of any number of others, as long as none of the compiled names conflict. The fields stored in any entry are those which are specified in the Unit, but not in any of its further-specification parents. Accesses must include a run-time search for the appropriate entry. This leaves full flexibility in specifying hierarchies of structure inheritance, but is somewhat wasteful of both space and time. If we weren’t bootstrapping all the way down to the fundamental structures, I would definitely favor this one. It looks like there should be some way to make it not really cost much in the case where the individual instances don’t have multiple entries, but this leads to some complications in compiling.

What can be declared?

Figure 3 gives a proposed outline of declarations which can be associated with a single slot. The declaration for a Unit basically just names the slots which are to be included:

* Type. The field can be specified to contain a description of a given form (specified with a meta-description), or to contain a naked unit-pointer which will be interpreted as a co-reference to that unit.

* Multiplicity The field can be specified to contain just one element of its type, or a set or a sequence (which will be represented with LISP lists)

* Changeability The field can be declared unchangeable (once it is specified, attempts to respecify it cause an error), or changeable. Set and sequence records which are changeable can be further declared as monotone-increasing (in which case simple LISP functions are used for adding) or arbitrary. An ephemeral field is different, in that it does not appear in the record. The declaration associates a procedure with getting and one with putting, and compiles calls to these whenever the corresponding field access would have been done.

Compiler structures

The information used by the compiler is spread around in a variety of meta-descriptions associated with the unit and its slots. This will be condensed into a simple data structure used by the compiler, and care must be taken to be sure that it gets updated appropriately (through triggers on the units for declarations). This structure must include:

* the unit

* the name of the slot

* the type of field

* the associated procedures if it is ephemeral

* triggers associated with that slot (perhaps in a compiled form)