[Indigo]<KRL>document>str-syntax!2

Page Numbers: Yes First Page: 1
Heading:

UNPUBLISHED WORKING DRAFT - DO NOT COPY OR DISTRIBUTE
September 20, 1978 4:03 PM 2: Syntax [IVY]<KRL>document>str-syntax

2. Specification of the syntax of KRL-1

Part 1. Overview of KRL-1 syntax

The syntax of KRL-1 departs in several significant ways from that of KRL-0. The major overall effect is a drastic reduction in the number of bracketing and punctuation characters, and a smoothing out of some non-uniformities. The design of the syntax can be described in terms of 4 basic ideas:

* The use of key words based on English, both initially and as internal structure markers

* An LR(1) grammar for the syntactic forms

* Horizontal positioning as a means of specifying scope

* Interspersed "footnotes"

This document first describes the basic ideas, then gives a detailed BNF specification of the forms recognized by the KRL-1 parser, then ends with a collection of examples illustrating the use of the syntactic forms.

The basic syntactic structures

As in KRL-0, the syntax depends heavily on the use of English-like keywords. This correspondence with English is intended to be mnemonic, but does not imply that the system understands the "normal" usage of these words. Their occurence in structures is prescribed formally, as with the reserved symbols of any computer language. Key words (and some symbols, such as "=") are used both to distinguish the type of structure and to serve as internal delimiters. In most of the basic forms, (listed in the BNF specification below) the first token is a keyword which determines uniquely which syntactic form is intended. Often the internal words introduce optional elements, and do not appear in instances where those elements are left out. The following descriptor forms use the keywords "the", "a", "an", "with", "from", "viewedAs", and the character "=": <Note: The capitalization of keywords is ignored by the parser, but governed by conventions discussed later>

The actor from Event17 viewedAs an Act
An Act with actor = A Student

In a small set of forms, the second token, rather than the first, serves as the identifying key. Slots and functionals are the primary examples. Slots are indicated by an atomic name, followed by a colon, followed by the rest of the slot, as in:

actor: A Person

In one of the forms for functionals, the functional name is followed by a left parenthesis, which is one of a matched pair surrounding the list of arguments for the functional, as in:

Between(Block1, Block2)

Every form is recognizable by looking at its first two tokens, one of which will be a key. The basic syntax can therefore be described by an LR(1) grammar. This differs from the syntax of LISP in a significant way -- the first element of any syntactic form in KRL-1 will always be a single token. This makes it is impossible to recursively embed one form as the first element of another. This has implications for the two dimensional syntax discussed below.

Two dimensional syntax

In any programming language which allows the recursive embedding of expressions or statements, there is a problem of specifying where the embedded expression ends. In Algol-like languages, this is done by having all embedded statements be either a primitive (non-recursive) syntactic form, or bracketed in a "begin...end" pair. Sequences of statements at the same level are separated by a punctuation character (often semi-colon). In LISP-like languages, embedding is indicated through a balanced nested parenthesis structure.

Advanced systems for programming in higher level languages often contain a set of routines which do "prettyprinting" of recursively embedded structures. A structure can be printed (to hardcopy or a terminal) with indentation from the left margin indicating depth of nesting. Thus, instead of the nearly unreadable (but legal):

(DEFINEQ(FUNNYFUN(X Y)(COND((LISTP X)(CAR Y))((EQ(CADR Y)(CAR (COND((NUMBERP X)Y)(T(CDR Y)))))(CDR Y))(T(CONS X(CDR Y))))))

a listing from a typical LISP system will contain:

(DEFINEQ
(FUNNYFUN (X Y)
(COND ((LISTP X) (CAR Y))
((EQ (CADDR Y)
(CAR (COND ((NUMBERP X) Y)
(T (CDR Y)))))
(CDR Y))
(T (CONS X (CDR Y))))))

The treatment of scoping in KRL-1 syntax is based on the realization that in fact, the exact parenthesis count is almost totally ignored by human programmers. The easiest way to see if parenthesization is correct is to have the system prettyprint the structure, and see whether the indentation looks right. The parentheses are only needed by the LISP reader, since it throws away the information provided by the "white space", treating all strings of blanks, tabs, and new lines as equivalent. Since a parser is perfectly capable of counting spaces and tabs, we decided to dispense with explicit bracketing characters whenever possible and replace them with the information provided by indentation. A program is treated as a piece of text with two-dimensional positioning on a page, instead of being thought of primarily as a linear sequence of characters.

A related approach was tried by Landin for ISWIM in the middle 60’s <Landin, P.J. Grammars for languages that use indentation, UNIVAC S.P. Research Report, April, 1965>. The experiment was somewhat of a failure <Jim Morris, personal communication>, since students found it almost impossible to maintain the desired horizontal positioning relationships when editing their programs on a keypunch. Fortunately we are no longer forced to carry our programs around in decks. Programming in KRL-1 is expected to be done using display terminals which can display positioned text and allow the user to edit it in place, including moving chunks of text to the right and left on the screen.

The scope of any syntactic form is determined by the positioning of succeeding text characters on the page. It can be determined by the following algorithm:

To find the scope of any syntactic form:

Find the horizontal position of the left edge of the first symbol of the form

Find the first "offsides" line below the line where the form begins -- a line of text which has a character at this position, or anywhere to the left of it

All lines up to, but not including the offsides line are in the scope of the form.

A set of adjacent aligned items are all co-elements of a common superior scope. This is like the alignment of elements in a LISP prettyprint, but differs in that there is no need to have a pair of bracketing elements. Thus, the following forms represent different structures:

A Person with mother = Diana
A Programmer
A RedHead

---------

A Person with mother = Diana
A Programmer
A RedHead

In both cases, there is a red-haired person whose mother is Diana, but in the first case that person is also a programmer, while in the second, Diana is a programmer.

The two dimensional scoping rule could not be used in the same way in LISP because the first element of a list can be another list, and so on indefinitely. It works in KRL-1 because the grammar does not allow embedding in the first position of a syntactic form.

At least for the moment, horizontal position is determined by counting all characters including blanks as width 1, and interpreting tabs as advancing to the next position equal to 0 mod 8. It is however possible for the user to specify a tab width other than 8, via the function KrlTabWidth(n).

Two and a half dimensional syntax

In KRL-0 there were devices in the syntax for including "features", "triggers", and a variety of other specialized structures. What these all had in common was that they were in some sense ancillary to the "real" description structure. They could be thought of conceptually as footnotes -- providing additional useful information, but not in the mainstream of what was being said. Other programming languages have less of this problem because they are more linear in their underlying semantics. However, comments and verification statements (e.g. assertions or justifications) have this same flavor -- belonging at a particular point in a program, but not as part of its main sequence.

In KRL-1 there is a standard mechanism for all such structures, based on the normal English text conventions for footnoting. Following the initial word of any syntactic form there can be a note reference which consists of an uparrow (↑) followed by an integer. Anywhere later in the same unit (which forms the largest level of context for all KRL syntax) there must appear a footnote, which begins with the same integer, followed by a colon (:), followed by a description which conforms to all the normal syntactic rules for descriptions. The description contained in the footnote is treated semantically as a description of the structure in which the note reference appeared. It can in turn contain a note reference which refers to a further footnote (although such embeddings will probably be quite rare).

In standard KRL-1 hardcopy listings (including the examples given below), the note reference is in a superscript position, as is typographically standard for footnotes in English text. However, the uparrow is included so that this subtle positioning difference need not be used to indicate when a note reference is intended, since currently standard text formats do not have the flexibility to handle fractional-line vertical spacing.

The exact specification of the permissible location of note references is in the BNF, wherein the occurence of an asterisk (*) in the definition of a form indicates a place where a note reference may appear.

The placement of footnotes is not critical (as with footnotes in English text). A footnote can be inserted anywhere in other structures following its note reference. It will be extracted by the parser, which will then continue with whatever structure it was parsing. The numbers are only for establishing the correspondence, and do not need to be sequential or ordered. Several note references can refer to a common footnote.

In the current implementation, the internal form does not actually include the numbers, but replaces them with corresponding pointers. The printer will number the footnotes in order from 1 whenever it prints the structure. When note references can refer to the same footnote, separate copies will be made, and when the unit is reprinted they will not be remerged.

Explicit bracketing and separation

There are cases where the two dimensional syntax is overly clumsy, and it is desirable to provide character-string information for bracketing and separation. There are two conventions used for this:

* A semicolon can be used anywhere to indicate the termination of the deepest nested descriptor preceding it. The following two forms are equivalent:

A Bite with biter = Fido; A Dog

---
A Bite with biter = Fido
A Dog

* Square brackets "[" and "]" can be inserted around any descriptor or set of descriptors to explicitly indicate the scoping. The following are equivalent:

[A Bite with biter = Fido] An Event

---
A Bite with biter = Fido
An Event

Square brackets take precedence over white space in determining scoping. This is useful when the nesting gets very deep, and most of the code would appear squashed against the right hand edge of the page. The following forms are equivalent:

A Bite with [
biter = [
The bitten from
Event17 viewedAs a Bite]]

---
A Bite with biter = The bitten from Event17 viewedAs a Bite

In general, the parser will assume that a descriptor has been terminated whenever the following token could not be a legal continuation of that descriptor. Thus it will interpret:

A Dog Fido

as a sequence of two descriptors. In ambigous cases, it is only the most deeply nested descriptor which is terminated. Thus the following two are equivalent:

A Bite with biter = A Dog Fido bitten = Sally

--------

A Bite with biter = A Dog
Fido
bitten = Sally

Bracketed sequences

KRL-1 contains a syntax for specifying all the members of a set or sequence using a pair of bracketing characters -- "{" and "}" for sets, "<" and ">" for ordered sequences. The following two forms are valid descriptors:

{A Dog, A Cat, Fido}

<Event1, A Bite, Event1, A Bite>

The first describes a set containing three elements, one of which is known to be a dog, one is known to be a cat, and the third (which is distinct from the first) known to be Fido. The second form describes a sequence of four elements. Note that it can contain repetitions.

There are many other forms for describing sets and sequences. Most of these do not involve an explicit enumeration of the elements. However, this is such a common and useful case that we have chosen to provide a special syntax. These forms follow the general LR(1) rule described above, in that the initial bracketing character serves as a key, indicating what the structure is. The commas are required between elements. Note that the following two descriptions are different:

{A Dog Fido}|{A Dog, Fido}

The one on the left describes a set with a single element for which we have two descriptions. The second describes a two element set. The same convention is used for separating multiple arguments of a functional form. They are enclosed in parentheses "(" and ")" and separated by commas.

Allowable names, conventions for upper and lower case and fonts

Any combination of numbers and upper and lower case letters can be used for any name (except that a sequence which can be read by INTERLISP as a number will be treated as a pointer to the LISP form of that number). Spaces and punctuation characters cannot be used in names. There is a set of conventions for using upper and lower case, but these do not form part of the formal syntax of KRL-1. Reserved words will be recognized with letters in any combination of cases. All other words are assumed to be unique exactly as specified. The two names "THISisAfunnyNAME" and "ThIsIsAfUnNyNaMe" are both legal, and are distinct (as are the rest of the 216 possible combinations). The standard conventions are:

* Unit names and functional names always have their initial letter in upper case.

* Slot names always have their initial letter in lower case.

* Reserved words which appear as the initial key word for a descriptor have their first letter in upper case. All other reserved words begin in lower case. Some words can appear in both forms. Thus:

A Bite with biter = The bitten from a Bite with biter = Fido

* Names can consist of multiple English words, run together with the initial letter of each word after the first in upper case. Examples are: ThisIsAUnitName, thisIsASlotName.

* All characters which are not forced by one of the above conventions to be in upper case are in lower case.

Font and typeface choices are not significant in the formal syntax of KRL-1. However, there are standard conventions followed by the printing routines:

* All names appear in boldface when they are being defined. This includes the appearance of a unit name following "#", of a slot name preceding a ":", and of a functional name which is being defined in a HasFunctional. Note references and footnote labels are also bold. Otherwise no boldface is used.

* All footnotes (including note references and labels) are in a smaller font. This includes those comments which are footnotes.

* Note references are in superscript position. Footnote labels are not.

Quoting and interfaces with LISP

Since KRL can be used to describe its own structures, it must have a way to incorporate a structure into another structure not as a description of a referent, but as the referent itself. Such a referent can in some ways be thought of as similar to quoted forms in LISP, although in KRL-1, the semantics of quotation are somewhat more complex, since a quoted form can contain syntactic forms whose semantic interpretation depends on the syntactic context in which they appear, as well as forms whose content is dependent on the surrounding LISP context when they are interpreted (see below). The document Datatypes describes the semantics of quoted forms. The syntax, however is quite simple. A quoted description is a form following all of the usual syntactic rules, whose initial keyword is the character backslash (\). This is closely analagous to the use of the single quote (’) character in INTERLISP.

Since KRL-1 (like KRL-0) uses INTERLISP for defining procedures and passing arguments, it is necessary to provide an interface for including LISP forms in KRL descriptions and vice versa. In the KRL syntax, the character single quote (’) is a keyword which introduces a descriptor which asserts that the referent of the containing description is the quoted LISP object. Numbers and strings can be included in KRL syntax without a preceding single quote, but are treated as if they were, that is as the referent of the containing description. As in INTERLISP, strings must be bracketed by a pair of double quotes ("...") and cannot contain a double quote character unless it is preceded by the escape character (%).

A more active interface is provided by the "Lisp" keyword, which permits description of a LISP form to be evaluated together with a binding environment.

Various forms beginning with "bang" (!) and a keyword serve to introduce a LISP object into a descriptor, to be evaluated in the current context whenever the descriptor is converted to its memory level version. See the Interface document for details on these forms, which are called surrogates. Note that rules of two dimensional structure, footnotes, etc. do not apply within the LISP forms following single quote or !keyword. They are actually parsed by handing control over to LISP READ, which returns when it has found a legal expression (which can be an atom, number, list, etc.).

The KRL-1 environment in INTERLISP includes a special read-macro character for the LISP reader. Whenever a character sequence is being read by the LISP reader (whether from a file or a teletype), a backslash (\) causes control to be passed to the KRL-1 reader. A matching forward slash (/ or //, see below) causes the KRL-1 reader to terminate, and pass back a nexus (see the Interface document for details) which will then be inserted in its place in the LISP structure being read. Note that this implies that no forward slashes appear in KRL-1 syntax. As an example, we might have as part of a file-resident function definition:

(SETQ NEWDESCS <\A Dog/ OLDDESC:1 \[An↑1 IntegerSequence <1,2,3>]
1: A Default
/>]

This is a LISP form with two embedded KRL forms. Note that the outermost pair of angle brackets (<...>) is handled by CLISP, as is the colon in OLDDESC:1 and the final terminating square bracket. However, within the scope of a slash pair (\.../), these same characters are part of the KRL-1 syntax described in this section.

Important note: a double forward slash (//) is required to terminate a KRL structure in terminal typein.

If the first token after the backslash is the unit marker (#), a single unit will be read, defined, and returned.

For details on all of these issues see the document Interface.

Comments

There are two ways of introducing comments into KRL structures. Both methods are shown in the examples in part 3.

The first way is purely syntactic, at the message level only. The atom -- (two dashes) introduces a comment which extends to the end of the current line. Thus a comment is delimited by -- at one end and carriage return at the other. Such a comment is completely transparent and equivalent to a simple carriage return. Note however that the -- must be recognizable as an atom, and so must be preceeded and followed by separators.

Alternatively, there is a system defined functional Comment, of one argument. A footnote with that functional with a string as argument is the approved way to permanently and directly comment a part of a unit or other structure. Note however that all the system overhead involved in metadescription is caused thereby, so caveat scriptor.

Part 2. BNF description of the basic syntax

This section is a formal grammar for the different descriptor types. It is not a full specification of the syntax, since it does not attempt to express the significance of horizontal positioning and the interspersal of footnotes. Taken by itself, it is incomplete and allows ambiguous structures. In connection with the specification of scoping and footnotes above, it gives a full specification.

The meta-syntax for expressing the BNF is that suggested by Niklaus Wirth <"What can we do about the unnecessary diversity of notation for syntactic definitions", Unpublished memo>. The following is quoted from his memo:

This meta-language can therefore conveniently be used to define its own syntax, which may serve here as an example of its use. The word identifier is used to denote nonterminal symbol, and literal stands for terminal symbol. For brevity, identifier and character are not defined in further detail.

syntax = {production}.
production= identifier "=" expression "." .
expression= term {"|" term}.
term= factor {factor}.
factor= identifier | literal | "(" expression ")" |
"[" expression "]" | "{" expression "}" .
literal= """ character {character} """.

Repetition is denoted by curly brackets, i.e. {a} stands for nil | a | aa | aaa | ... . Optionality is expressed by square brackets, i.e. [a] stands for a | nil . Parentheses merely serve for grouping, e.g. (a|b)c stands for ac | bc . Terminal symbols are enclosed in quote marks. This, and the elimination of angular brackets around nonterminal symbols is consistent with common practice in programming languages.

The identifier literalAtom expands to any literal atom (not number) in LISP. The identifier S-expression expands to (anything reeadable as) a single INTERLISP expression, including lists, literal atoms, numbers, strings, and user-defined data types. The identifier integer is any LISP integer. Except for the effect on positioning as discussed above, all sequences of spaces, tabs and carriage returns are treated equivalently, as the terminator of a single token.

----------------------------

unit = "#" {"#"} unitName [unitNoteReference] {slot}.

unitName = atom.

atom = literalAtom | "!Name" S-expression.

!N is currently recognized as an abbreviation of !Name.

unitNoteReference = noteReference.

noteReference = "↑" integer.

footnote = integer ":" description.

This specification does not state where the footnotes appear. Standardly, a note reference can follow the initial word of any form. The exact inventory of places where they may appear is indicated by the occurence of askerisks (*) in the definitions below. Footnotes on units are the only ones which can contain functional specifications. They also contain descriptions of how the unit is used as a structure, and thus may contain a different class of descriptions than can appear in any other place.

slot = slotName ":" [slotNoteReference] [slotDescription].

slotName = atom.

slotNoteReference = noteReference.

slotDescription = description.

The slotName must be unique with respect to the unit. The slot footnote contains all the trigger and trap information associated with this slot alone. Other trigger and trap information linked to more than one slot is found in the unit footnote.

description = descriptorList | "[" * [descriptorList] "]" *.

Footnotes on a bracketed descriptor list are taken to apply to each descriptor in the list individually, not to the list as a whole.

* = [noteReference].

descriptorList = [descriptor] {[";"] description}.

Semicolons are optional as described above.

metaShift = "@" description.

surrogate = "!" surrogateType S-expression.

The following abbreviations for surrogate types are currently recognized:

dr, descrdescriptor
dn, descndescription
l lisp
c, coref coreference
k krl
ppost

coreference = unitPointer | slotPointer | localReference.

unitPointer = unitName *.

slotPointer = "The" * slotName ("inUnit" | "fromUnit") unitName.

localReference = "My" * slotName [("inUnit" | "fromUnit") unitName].

If the optional part is omitted, the lexically enclosing unit will be supplied.

pointer = krlPointer | lispPointer.

krlPointer = quotedStructure | structureRef | localNamedRef.

quotedStructure = quotedUnit | quotedDescriptor | quotedAnchor.

quotedUnit = "\" * unit.

quotedDescriptor = "\" * "~" descriptor.

quotedAnchor = "\" * description.

structureRef = "Structure" slotName [("inUnit" | "fromUnit") unitName].

localNamedRef = "StructureNamed" atom [("inUnit" | "fromUnit") unitName].

lispPointer = number * | string * | "’" S-expression *.

number = as recognized by the LISP reader.

string = as recognized by the LISP reader.

sequenceEnumeration = "<" * [descriptionSequence] ">" *.

setEnumeration = "{" * [descriptionSequence] "}" *.

If the descriptionSequence is left out, these descriptors stand for the empty sequence and empty set.

descriptionSequence = elementDescription {"," elementDescription}.

elementDescription = description | "..." | "!!" surrogateType S-expression.

perspective = ("A" | "An") * prototype [pairList] ["thatIs" description].

prototype = ["@"] unitName.

pairList = "with" fillerPair {fillerPair}.

fillerPair = slotName "=" description
| "[" fillerPair "]"
| "!!!" surrogateType S-expression "=" S-expression.

The slotName must come from the unit specified by the prototype of the perspective. The printer chooses between "A" and "An" depending on whether the prototype begins with a vowel or a consonant.

specification = specForm1 | specForm2.

specForm1 = "The" * slotName "from" description
("as" | "viewedAs") perspective.

specForm2 = "The" * slotName "from" (perspective | "My" slotName).

The form with My slotName is an abbreviation which fills in viewedAs mySlotPersp where mySlotPersp is the lexically first perspective descriptor in the definition of the slot with name slotName in the lexically enclosing unit.

functional = funForm1 | funForm2.

funForm1 = [whichForm *] functionalName * "(" [argSequence] ")" [pairList].

funForm2 = whichForm * functionalName * [arg].

whichForm = "Which" | "WhichIs".

functionalName = atom.

argSequence = descriptionSequence.

arg = description.

Note that a functional with zero or one argument may be written without explicit parentheses in which case it must be written with the Which or WhichIs prefix and may not include extra fillerpairs: e.g. either FatherOf(Max) or WhichIs FatherOf Max.

case = "Using" * focusDescription
["matchWith" matchDescription]
["selectFrom"] casePair {selectionPair}.

casePair = keyDescription "->" resultingDescription
| "[" casePair "]".

focusDescription = description.

keyDescription = description.

resultingDescription = description.

matchDescription = description.

selectRef = "Its" slotName.

The selectRef is a special form which can appear only inside the resultingDescription of a select descriptor. In the context of Using the employer from G0034 viewedAs a Business selectFrom a Carpenter -> Its tools, the descriptor Its tools is equivalent to The tools from [the employer from G0034 viewedAs a Business] viewedAs a Carpenter.

lispInvocation = "Lisp" description [bindingList].

bindingList = "binding" fillerPair {fillerPair}.

In a LISP invocation descriptor the left hand sides of the filler pairs are interpreted as variable names and the right hand sides as values. Tail surrogates are not supported.

functionalSpecification =
[whichForm *] "HasFunctional" * "(" focusSlotDesig "," functionalNameForm
{"," slotDesig} ")" [pairList].

This is a special form which can appear only as a footnote associated with a unitNoteReference.

functionalNameForm = [whichForm] ["Interpreted"] functionalName.

focusSlotDesig = ["MemberOf"] slotName.

slotDesig = {"Quoted" | "MemberOf" | "Optional" | "Sequence" | "Set"} slotName.

The Sequence and Set alternatives are only for the argSlotDesigSequence, where they may appear only last: they specifie that the functional may have arbitrarily many arguments which are to be collected as a sequence or set. The Optional alternative specifies that no filler pair should be constructed if the given instance argument is not present, rather than causing an error or building an empty collection. The Which and WhichIs alternatives for the functionalNameForm indicate preferred external format for instances of the functional. The number of elements in argSlotDesigSequence and argSequence must be the same for a particular functionalSpecification and all its instances, modulo the effect of Optional, Set, and Sequence.

Part 3. Examples

# Bring

self: A Happening
An Act with actor = My bringer

A Go with
goer = My bringer
destination = My destination
source = My source

A Go with
goer = My brought
destination = My destination
source = My source

bringer: A Person

brought: A PhysicalObject

source: A Place

destination: A Place

# # # # # # # # # # # # # # # #

# Family↑11: HasFunctional(woman, MotherOf, MemberOf offspring)
HasFunctional(man, FatherOf, MemberOf offspring)
HasFunctional(man, HusbandOf, woman)
HasFunctional(woman, WifeOf, man)
HasFunctional(MemberOf parents, ParentOf, MemberOf offspring)
HasFunctional(MemberOf offspring, ChildOf, MemberOf parents)
HasFunctional(parents, HaveFamily, Set offspring)
woman: An Adult; A Female
man: An Adult; A Male
parents: {My woman, My man}
offspring: SetUnion(My femaleChildren, My maleChildren)
maleChildren: SetOf(A Child; A Male)
femaleChildren: SetOf(A Child; A Female)

# Family1-- This and the following unit give semantically equivalent
-- information
self: A Family with
parents = {John, Sue}
offspring = {Susie, Sally, Joe}

# Family1-- This and the preceding unit give semantically equivalent
-- information
self: A Family with
parents = {John, Sue}
Which↑1 HaveFamily(Susie, Sally, Joe)
1: Comment("This functional is defined on the Family unit")