PROBLEM SET #2 LISP: LANGUAGE AND LITERATURE May 22, 1984
————————————————————————————————————————————
What is important about the code just given (both versions) is that the grammar of the language it recognizes is entirely encoded in the recognition procedures themselves; nowhere is there an explicit "statement" of the grammatical rules. Of course the information that the rules contain is embedded within the procedures we have just examined, but extracting those rules from the procedures would take work.
The information that is built up about the grammatical structure of each expression recognized by this code is also encoded implicitly, this time not in the structure of the program we just examined, but rather in the structure of the computational process engendered by that program in the course of recognizing a particular expression. Although it might seem that this latter information must be implicit if the former is, that is not the case; we will see a counter-example in a moment. Because both the grammar and the "parse" are thus never explicitly represented, we will call a recognizer like ARITH-RECOGNIZE doubly implicit.
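Before tackling the question below, it may help to see the shape of a doubly implicit recognizer in a more familiar setting. The following is a sketch in Python (not 3-LISP) for a deliberately tiny toy grammar of our own choosing, S ::= "(" S ")" | "x"; it illustrates the idea, and is not a solution to Question 2-a:

```python
# Doubly implicit recognizer for the toy grammar  S ::= "(" S ")" | "x".
# The grammar lives only in the control structure of the code, and the
# "parse" exists only in the pattern of recursive calls; neither is ever
# represented as data.

def recognize(string):
    rest = recognize_s(list(string))
    return "accepted" if rest == [] else "rejected"

def recognize_s(tokens):
    # Consume one S from the front; return the unconsumed tokens, or
    # None if no S is present.
    if tokens and tokens[0] == "x":
        return tokens[1:]
    if tokens and tokens[0] == "(":
        rest = recognize_s(tokens[1:])
        if rest is not None and rest[:1] == [")"]:
            return rest[1:]
    return None
```

Notice that nothing here "is" the grammar: to recover the rule S ::= "(" S ")" you would have to read the control flow.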
Question 2-a: (first question!) Based on your understanding of ARITH-RECOGNIZE, define a doubly implicit recognizer for the 3-NOTE language. You should end up with a package of recursive routines, of which 3-NOTE-RECOGNIZE will be the public "top-level" one. You may do it in functional style (like ARITH-RECOGNIZE), or in object-oriented style (like A2-RECOGNIZE), as you wish. It should support the following behaviour:
1> (3-note-recognize "234")
1= "accepted"
1> (3-note-recognize " (+ 2 3) ")
1= "accepted"
1> (3-note-recognize "(lambda [x)]")
1= "rejected"
1> (3-note-recognize "")
1= "rejected"
1> (3-note-recognize " (a . b) ")
1= "rejected"
1> (3-note-recognize " ’foo ")
1= "rejected"
Question 2-b: Suppose we define an extension to 3-NOTE, called 3-NOTE+, which extends the language to support:
o Up and down arrows, i.e., expressions of the form ↑expression and ↓expression;
o Handle expressions, i.e., expressions of the form ’expression;
o Dots, i.e., expressions of the form (expression1 . expression2).
Modify your procedures to recognize 3-NOTE+. I.e., you should have a new procedure, called 3-NOTE+-RECOGNIZE, supporting:
1> (3-note+-recognize " ’foo ")
1= "accepted"
1> (3-note+-recognize "’’↑’↓’[]")
1= "accepted"
2-6. Implicit Parsers and Translators
The 3-LISP recognizer, since it maps expressions onto information about them (i.e., the information as to whether they are in the language or not), is of the form e ⇒ i(e), in terms of our original diagram. We next present code for another routine that handles ARITH expressions, which is implicit in the sense that the ARITH grammar is still entirely encoded implicitly in the programs, but which nonetheless constructs explicit representations of the grammatical structures of the expressions that it accepts. I.e., we will present a singly implicit parser. It will produce what are known as parse trees: expressions in a simple language that encode information about the expressions the parser accepts. Specifically, (ARITH-PARSE string) will return "rejected" if the string designated by string is not in ARITH, but will return a parse tree if the string is in the language.
Specifically, we have:
1> (arith-parse "234")
1= [’expression [’numeral-expression "234"]]
1> (arith-parse "+(2,3)")
1= [’expression [’complex [’operator "+"]
"("
[’expression [’numeral-expression "2"]]
","
[’expression [’numeral-expression "3"]]
")"]]
1> (arith-parse "*(10,/(20,30))")
1= [’expression [’complex [’operator "*"]
"("
[’expression [’numeral-expression "10"]]
","
[’expression [’complex [’operator "/"]
"("
[’expression [’numeral-expression "20"]]
","
[’expression [’numeral-expression "30"]]
")"]]
")"]]
1> (arith-parse "+(3)")
1= "rejected"
Once again, we have two options in defining such a parser: functional or object-oriented. The code for the functional version (ARITH-PARSE) is analogous to that for ARITH-RECOGNIZE, but more complex because the inner procedures have to perform two tasks at once: returning a parse tree for the expression fragment that they recognize, and also returning the sequence of as-yet unrecognized tokens that still need to be parsed. If all the information were passed up from the leaves of the tree, this would be a simple matter of composition, but the category labels (’expression, ’complex, etc.) have to be added at each node. They cannot simply be pasted together with the results passed up from the lower nodes, but must instead be inserted into the parse half of each result, leaving the sequence of as-yet unparsed tokens intact.
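As an illustration of the parse/tail bookkeeping just described, here is a sketch of the functional strategy in Python rather than 3-LISP (the names and the exception-based error handling are ours, not the problem set's; the input is assumed to be an already-tokenized list):

```python
# Each inner procedure returns a (parse, tail) pair: the tree built so
# far and the tokens not yet consumed.  Category labels ("expression",
# "complex") are inserted at each node on the way back up, while the
# tail is threaded through untouched.

OPERATORS = ["+", "-", "*", "/"]

class Reject(Exception):
    """Raised on any grammatical error (the role of an error-in-parse throw)."""

def arith_parse(tokens):
    try:
        parse, tail = parse_expression(tokens)
    except Reject:
        return "rejected"
    return parse if tail == [] else "rejected"

def parse_expression(tokens):
    if not tokens:
        raise Reject()
    if tokens[0] in OPERATORS:
        node = ["complex", ["operator", tokens[0]]]
        tail = parse_token("(", tokens[1:])
        node.append("(")
        sub, tail = parse_expression(tail)
        node.append(sub)
        tail = parse_token(",", tail)
        node.append(",")
        sub, tail = parse_expression(tail)
        node.append(sub)
        tail = parse_token(")", tail)
        node.append(")")
        return ["expression", node], tail
    if tokens[0].isdigit():
        return ["expression", ["numeral-expression", tokens[0]]], tokens[1:]
    raise Reject()

def parse_token(token, tokens):
    # Ensure the next token is the expected one; return the remaining tail.
    if tokens and tokens[0] == token:
        return tokens[1:]
    raise Reject()
```

Note how inserting the category label at each node is exactly what keeps this from being a simple matter of composition.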
The object-oriented version, however, is simpler, since the lexical analysis object retains the information about how much of the input expression has been parsed. The inner parsing routines merely have to return an appropriate encoding of the parse itself. A complete listing:
;;; A2-PARSE Main parsing routine. Returns "rejected" if <string> is not
;;; ======== a well-formed ARITH expression; otherwise a parse tree.
(define A2-PARSE
(lambda [string]
(letseq [[s (lexan-object string)]
[answer (dynamic-catch (a2-parse-expression s))]]
(if (and (= (next-token s) ’eot)
(sequence answer))
answer
"rejected"))))
;;; Ancillary parsing functions:
;;;
;;; The following procedures call A2-ERROR-IN-PARSE immediately upon
;;; finding a grammatical error of any sort, which does not return (since
;;; the procedures below are not designed to do anything sensible about
;;; continuing from errors), but instead does a DYNAMIC-THROW (q.v.).
;;; A2-PARSE-EXPRESSION Attempts to parse an arbitrary ARITH expression.
;;; ===================
(define A2-PARSE-EXPRESSION
(lambda [s]
[’expression
(cond [(member (next-token s) ["+" "*" "-" "/"])
(cons ’complex
(cons [’operator (get-token s)]
(a2-parse-args s)))]
[(numeral-expression (next-token s))
[’numeral-expression (get-token s)]]
[$true (a2-error-in-parse)])]))
;;; A2-PARSE-ARGS Attempts to parse an ARITH args expression
;;; =================
(define A2-PARSE-ARGS
(lambda [s]
[(a2-parse-token "(" s)
(a2-parse-expression s)
(a2-parse-token "," s)
(a2-parse-expression s)
(a2-parse-token ")" s)]))
;;; A2-PARSE-TOKEN Ensures that the next token in s is <token>; returns it
;;; ============== as the parse.
(define A2-PARSE-TOKEN
(lambda [token s]
(if (not (= token (get-token s)))
(a2-error-in-parse)
token)))
;;; A2-ERROR-IN-PARSE Signals an error by sending back something which isn’t
;;; ================= a sequence, and therefore cannot be a valid parse.
(define A2-ERROR-IN-PARSE
(lambda []
(dynamic-throw ’not-a-sequence)))
The following, in contrast, is code for a functional version:

;;;==========================================================================
;;; Simple ARITH Expression Parser
;;;==========================================================================
;;;
;;; Parses ARITH expressions.
;;; Built on top of the Lexical Analyser
;;; ------------------------------------
;;; Assumed grammar: (<token> is the primitive lexical item)
;;;
;;; <expression> ::= <numeral> | <complex>
;;;
;;; <complex> ::= <operator> "(" <expression> "," <expression> ")"
;;;
;;; <operator> ::= "+" | "-" | "*" | "/"
;;; ARITH-PARSE Main parsing routine. Returns a parse tree, or else
;;; =========== the string "rejected" if not a well-formed ARITH expression.
(define ARITH-PARSE
(lambda [string]
(let [[parse (dynamic-catch
(arith-parse-expression (lexical-analysis string)))]]
(if (null (second parse))
(first parse)
"rejected"))))
;;; Ancillary parsing functions:
;;;
;;; The following procedures call ARITH-ERROR-IN-PARSE immediately upon
;;; finding a grammatical error of any sort, which does not return (since
;;; the procedures below are not designed to do anything sensible about
;;; continuing from errors), but instead does a DYNAMIC-THROW (q.v.).
;;; Each of the parsing routines will return a parse/tail pair;

;;; ARITH-PARSE-EXPRESSION Attempts to parse an arbitrary ARITH expression.
;;; ======================
(define ARITH-PARSE-EXPRESSION
(lambda [tail]
(append-results [’expression]
(cond [(member (first tail) ["+" "*" "-" "/"])
(arith-parse-args [’complex [’operator (first tail)]]
(rest tail))]
[(numeral-expression (first tail))
[[’numeral-expression (first tail)] (rest tail)]]
[$true (arith-error-in-parse)]))))
;;; ARITH-PARSE-ARGS attempts to recognize an ARITH args expression
;;; ====================
(define ARITH-PARSE-ARGS
(lambda [new tail]
(letseq [[parse (append-results new (arith-parse-token "(" tail))]
[parse (append-results (first parse)
(arith-parse-expression (second parse)))]
[parse (append-results (first parse)
(arith-parse-token "," (second parse)))]
[parse (append-results (first parse)
(arith-parse-expression (second parse)))]
[parse (append-results (first parse)
(arith-parse-token ")" (second parse)))]]
parse)))
(define APPEND-RESULTS
(lambda [new result]
[(append new [(first result)]) (second result)]))
(define ARITH-PARSE-TOKEN
(lambda [token tail]
(if (and (not (null tail))
(= token (first tail)))
[token (rest tail)]
(arith-error-in-parse))))
;;; ARITH-ERROR-IN-PARSE Signals an error by throwing a parse/tail pair whose
;;; ==================== tail is non-null, so that ARITH-PARSE returns "rejected".
(define ARITH-ERROR-IN-PARSE
(lambda []
(dynamic-throw [[] ["something"]])))
Note: The code for both of these versions can be loaded by typing:
(load "{turing}<3-lisp.problem-sets>arith-parser.3-lisp")
Question 2-c: Based on your experience writing 3-NOTE-RECOGNIZE, and the parsers just presented, write a parser for the 3-NOTE language. You should define a 3-NOTE-PARSE procedure, with associated other procedures, so as to support:
1> (3-note-parse "234")
1= [’expression [’numeral-expression "234"]]
1> (3-note-parse "[abc def]")
1= [’expression [’rail-expression "["
[’expression [’identifier-expression "abc"]]
[’expression [’identifier-expression "def"]]
"]"]]
1> (3-note-parse "(lambda [x] 3)")
1= [’expression
[’pair-expression "("
[’expression [’identifier-expression "lambda"]]
[’expression [’rail-expression
"["
[’expression [’identifier-expression "x"]]
"]"]]
[’expression [’numeral-expression "3"]]
")"]]
1> (3-note-parse "+(2,3)")
1= "rejected"
In both ARITH and 3-NOTE, various special tokens ([, (, etc.) are included in expressions primarily in order to indicate the category of an expression. Tokens like this (they include λ in the λ-calculus, the quantification signs ∀ and ∃ in the predicate calculus, etc.) are called syncategorematic, meaning that they are used to indicate syntactic category, not as carriers of meaning in and of themselves. Our parsers for these languages, both ARITH-PARSE and 3-NOTE-PARSE, included these tokens in the parse tree, in spite of the fact that the parse tree, in virtue of its node structure, conveys the information about syntactic category directly, therefore making their inclusion entirely redundant.
Question 2-d: Write a modified version of your 3-NOTE-PARSE, call it 3-NOTE-a-PARSE, that avoids including the syncategorematic tokens in the parse trees it constructs (you will probably want to do this by copying your code from 3-NOTE-PARSE and editing it). It should support:
1> (3-note-a-parse "(lambda [x y] 3)")
1= [’expression
[’pair-expression
[’expression [’identifier-expression "lambda"]]
[’expression [’rail-expression [’expression [’identifier-expression "x"]]
[’expression [’identifier-expression "y"]]]]
[’expression [’numeral-expression "3"]]]]
We mentioned in the introduction that 3-LISP’s internalizer was a kind of translator, where one language was the language of notation, the other the language of internal structures. Since 3-NOTE is a subset of 3-LISP’s standard notation, it would be possible to write a program (called, say, 3-NOTE-INTERNALIZE), that returned not a parse tree but rather the internal structure into which its argument string would be converted. I.e., 3-NOTE-INTERNALIZE would implement a subset of 3-LISP’s own INTERNALIZE. This procedure, like the ones before, would have the 3-NOTE grammar encoded entirely implicitly within the programs, as well as the grammar of the target language of 3-LISP internal structures. It would also encode information about the grammatical structure of any particular expression in the structure of the computational process generated. So it is doubly implicit for both languages, or quadruply implicit.
Question 2-e: Write such a 3-NOTE-INTERNALIZE. One way to do so would be to run over the parse tree returned by 3-NOTE-PARSE, and create internal structure from that. That would of course be less implicit. However, you should instead modify your definition of 3-NOTE-PARSE to make a stand-alone 3-NOTE-INTERNALIZE. Your code should support the following; you can check it on more complex cases (at least complex cases within 3-NOTE) by comparing it with the system’s own INTERNALIZE. Note: you may (indeed, will definitely want to) use INTERNALIZE to convert numeral expressions into numerals, and identifier-expressions into atoms.
1> (3-note-internalize "(+ 2 x)")
1= ’(+ 2 x)
1> (3-note-internalize "[[] [abc]]")
1= ’[[] [abc]]
1> (3-note-internalize "(first ’[a b c])")
1= "rejected" ; This would have worked if you had
; defined 3-NOTE+-INTERNALIZE instead.
2-7. A Digression, on the way towards Explicit Grammars
For reasons of modularity, generality, extensibility, and so forth, it is often preferable to have the grammar, as well as the information about the grammatical structure of the particular expression, be represented explicitly. I.e., it is often convenient to have doubly explicit transducers (though not always; see the discussion in section 2.10). For example, note the great similarities between ARITH-RECOGNIZE and 3-NOTE-RECOGNIZE, and between ARITH-PARSE and 3-NOTE-PARSE. In this section we will write several procedures (recognizers, parsers, etc.) that take explicit grammars as arguments.
One feature of the procedures we defined in the last few sections was that the particular grammars we used had various properties that made the code easier: they were not left-recursive (there were no possible derivations of the form non-terminal-i ⇒ ... non-terminal-i ...); they had no deletion rules (all non-terminals in a parse expanded into at least one character in the original string); they could be parsed with only a single token of "look-ahead"; and you could tell where an expression ended. When we write more general routines that take grammars explicitly, we will not want to restrict ourselves so much, and will therefore have to adopt more general programming techniques. Although we will never release ourselves from the "no-left-recursion" limitation, the programs we write will not be restricted in any of the other ways.
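To see concretely why left recursion is fatal to recursive descent, consider the following small Python experiment (the grammar and all names are our own; the point is only that the descent procedure calls itself on unchanged input):

```python
import sys

# Left-recursive grammar:  expr ::= expr "+" num  |  num
# A naive recursive-descent recognizer for the first rule calls itself
# on the very same token sequence before consuming anything, so it can
# make no progress and recurses forever.

def recognize_num(tokens):
    return tokens[1:] if tokens and tokens[0].isdigit() else None

def recognize_expr(tokens):
    rest = recognize_expr(tokens)      # <-- same input: infinite descent
    if rest is not None and rest[:1] == ["+"]:
        return recognize_num(rest[1:])
    return recognize_num(tokens)

old_limit = sys.getrecursionlimit()
sys.setrecursionlimit(500)
try:
    recognize_expr(["1", "+", "2"])
    outcome = "terminated"
except RecursionError:
    outcome = "infinite recursion"
finally:
    sys.setrecursionlimit(old_limit)
```

The standard cure, which the problem set adopts, is to write the grammar with right-recursive rules instead.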
Before presenting any transducers, however, we need a digression on "filtering" and "collecting". In many of the procedures we write, we need convenient ways to collect, from a space of possibilities, those that meet some criteria. For example, suppose that we wanted to define a procedure called FILTER-EVEN that, given as argument a sequence of integers, would return a sequence consisting, in order, of those elements of the original sequence that were even. A simple definition is the following:
(define FILTER-EVEN
(lambda [seq]
(cond [(null seq) seq]
[(even (first seq))
(cons (first seq)
(filter-even (rest seq)))]
[$true (filter-even (rest seq))])))
We can abstract this, and define a FILTER procedure that takes as an argument not only the sequence itself, but also the predicate to be used to test its elements. A possible definition:
(define FILTER
(lambda [predicate seq]
(cond [(null seq) seq]
[(predicate (first seq))
(cons (first seq)
(filter predicate (rest seq)))]
[$true (filter predicate (rest seq))])))
Another commonly needed utility is one that takes a function F that maps objects of type T onto sequences, and a sequence SEQ of objects of type T, and then appends together the results of applying F to the elements of SEQ. For example, the FRINGE procedure defined in the last problem set was of this variety. Since APPEND is defined over an indefinite number of arguments, we saw in an earlier section that FRINGE could be defined as follows:
(define FRINGE
(lambda [seq]
(if (not (sequence seq))
[seq]
(append . (map fringe seq)))))
This suggests defining a general utility, called COLLECT, that takes a function and a sequence of arguments, and appends the results together:
(define COLLECT
(lambda [fun seq]
(append . (map fun seq))))
A more general version, allowing FUN to be of other than single arity, is the following (i.e., (COLLECT fun args1 ... argsk) appends together the results of mapping fun, of arity k, over the sequences of arguments args1 ... argsk):
(define COLLECT
(lambda args
(append . (map . args))))
This allows the following compact definition of FRINGE:
(define FRINGE
(lambda [seq]
(if (not (sequence seq))
[seq]
(collect fringe seq))))
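The append-of-maps pattern behind COLLECT, and the FRINGE definition built on it, can be mirrored in Python (a sketch; itertools.chain plays the role of APPEND over an indefinite number of arguments, and Python lists stand in for 3-LISP sequences):

```python
from itertools import chain

def collect(fun, seq):
    # Map fun over seq, then append all the resulting sequences together.
    return list(chain.from_iterable(fun(x) for x in seq))

def fringe(seq):
    # The fringe of a nested sequence: its leaves, in order.
    if not isinstance(seq, list):
        return [seq]
    return collect(fringe, seq)
```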
These techniques can all be combined in rather powerful ways. Since
(append [] [a1 ... ak]) = (append [a1 ... ak] []) = [a1 ... ak]
we can define FILTER as follows:
(define FILTER
(lambda [predicate seq]
(collect (lambda [element]
(if (predicate element)
[element]
[]))
seq)))
Also, we can define a filtered mapping utility, FILTERED-MAP, that applies a function to just those elements of a sequence that satisfy some predicate:
(define FILTERED-MAP
(lambda [fun predicate seq]
(map fun (filter predicate seq))))
or equivalently:
(define FILTERED-MAP
(lambda [fun predicate seq]
(collect (lambda [element]
(if (predicate element)
[(fun element)]
[]))
seq)))
For example:
1> (set x [1 2 3 4 5 6 7])
1= [1 2 3 4 5 6 7]
1> (filter even x)
1= [2 4 6]
1> (filtered-map square even x)
1= [4 16 36]
The real power of COLLECT is illustrated in the second definition of FILTERED-MAP. COLLECT can be used as a version of FILTERED-MAP, where the decision about whether to include the result (for a given element of the original sequence), and the decision as to what result to include, are taken together in some other code. Note, in particular, that (COLLECT fun seq) is essentially (MAP fun seq), except that it appends the results together. So if COLLECT is used with a fun that is set up to produce either a single-element sequence containing an answer, or else the null sequence, it can effectively decide whether or not the answer should be put into the final sequence. I.e., it is just like a MAP function except that it can make the extra decision as to whether to have its answer included in the final result.
For example:
(collect (lambda [element]
           (if ... some long computation involving element ...
               [expression-1]    ; include expression-1 in the answer
               []))              ; don't include anything in the answer
         sequence)
We define two simple routines, SINGLETON and NOTHING, in order to make this more modular and abstract:
(define SINGLETON (lambda [ans] [ans]))
(define NOTHING (lambda [] []))
Thus:
1> (set x [1 2 3 4 5 6 7])
1= [1 2 3 4 5 6 7]
1> (collect (lambda [element]
(if (even element)
(singleton (square element))
(nothing)))
x)
1= [4 16 36]
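The same pipeline reads almost identically in Python (again a sketch with our own names; collect is as in the earlier Python sketch):

```python
from itertools import chain

def collect(fun, seq):
    # Append together the sequences produced by fun on each element.
    return list(chain.from_iterable(fun(x) for x in seq))

def singleton(ans):
    return [ans]

def nothing():
    return []

def filtered_map(fun, predicate, seq):
    # Include fun(element) in the result exactly when predicate(element) holds.
    return collect(lambda e: singleton(fun(e)) if predicate(e) else nothing(),
                   seq)

x = [1, 2, 3, 4, 5, 6, 7]
even = lambda n: n % 2 == 0
square = lambda n: n * n
```

With these definitions, both filtered_map(square, even, x) and the explicit collect-with-singleton/nothing version yield [4, 16, 36], mirroring the transcript above.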
We will use FILTERED-MAP and COLLECT (with SINGLETON and NOTHING) extensively in the explicit grammar procedures of the next few sections.
2-8. Recognizers with Explicit Grammars
In this section we define a collection of so-called "top-down recursive-descent recognizers", called RECOGNIZE-n, defined over grammars represented as a sequence of rules, where each rule is a sequence of a left-hand-side symbol and a right-hand side consisting of zero or more symbols. For example, we will have:
1> (set *camelot* [[’s ’np ’vp]
[’vp ’v ’np]
[’vp ’v]
[’v "slept"]
[’v "loved"]
[’v "wielded"]
[’np "arthur"]
[’np "gwen"]
[’np "excalibur"]])
1=
...
1> (recognize-1 "arthur slept" ’s *camelot*)
1= "accepted"
1> (recognize-1 "arthur wielded excalibur" ’s *camelot*)
1= "accepted"
1> (recognize-1 "arthur gwen" ’s *camelot*)
1= "rejected"
We will present several versions, starting with a conceptually simple one, leading towards versions that are both more general and more compact. The code for all three versions can be loaded by typing:
(load "{turing}<3-lisp.problem-sets>general-recognizer.3-lisp")
The first, called RECOGNIZE-1, is a back-tracking recognizer that takes as arguments an expression (string), a start-symbol, and a grammar, and returns "accepted" or "rejected" depending on whether the expression is an element of the language described by the grammar. You should be able to read the code, given a brief description of the general strategy. We define a frontier to be a sequence of symbols, usually non-terminal symbols, and say that an expression (token sequence) is covered by the frontier if the token sequence can be divided into a series of groups of tokens, each of which is described by the corresponding symbol in the frontier.
For example, in the grammar just used as an example, the token sequence
["wielded" "excalibur" "gwen"]
would be covered by the frontier
[’vp ’np]
since ["wielded" "excalibur"] is a vp and ["gwen"] is an np.
(define RECOGNIZE-1
(lambda [expression start-symbol grammar]
(if (recognize-frontier-1 (lexical-analysis expression)
[start-symbol]
grammar)
"accepted"
"rejected")))
(define RECOGNIZE-FRONTIER-1
(lambda [tokens frontier grammar]
(cond [(null frontier) (null tokens)]
[(terminal (first frontier))
(and (not (null tokens))
(match (first frontier) (first tokens))
(recognize-frontier-1 (rest tokens) (rest frontier) grammar))]
[(nonterminal (first frontier))
(some (lambda [rhs]
(recognize-frontier-1 tokens
(append rhs (rest frontier))
grammar))
(rhsides (first frontier) grammar))])))
(define TERMINAL
(lambda [symbol]
(or (= symbol ’identifier)
(= symbol ’numeral-expression)
(string symbol))))
(define NONTERMINAL (compose not terminal))
(define MATCH
(lambda [symbol token]
(cond [(= symbol ’identifier) (identifier token)]
[(= symbol ’numeral-expression) (numeral-expression token)]
[$true (= symbol token)])))
(define RHSIDES
(lambda [nonterm grammar]
(filtered-map rest
(lambda [rule] (= nonterm (first rule)))
grammar)))
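For comparison, the backtracking strategy of RECOGNIZE-1 can be rendered in Python (a sketch with our own conventions: tokenization is plain whitespace splitting, and a symbol counts as a terminal exactly when it never appears as a left-hand side, so terminal and non-terminal names must not collide):

```python
# A grammar is a list of rules; each rule is a left-hand-side symbol
# followed by the right-hand-side symbols.
CAMELOT = [["s", "np", "vp"],
           ["vp", "v", "np"],
           ["vp", "v"],
           ["v", "slept"], ["v", "loved"], ["v", "wielded"],
           ["np", "arthur"], ["np", "gwen"], ["np", "excalibur"]]

def rhsides(nonterm, grammar):
    # All right-hand sides of rules whose left-hand side is nonterm.
    return [rule[1:] for rule in grammar if rule[0] == nonterm]

def recognize_1(expression, start_symbol, grammar):
    ok = recognize_frontier(expression.split(), [start_symbol], grammar)
    return "accepted" if ok else "rejected"

def recognize_frontier(tokens, frontier, grammar):
    # Can tokens be grouped so that each group is described by the
    # corresponding symbol of the frontier?
    if not frontier:
        return not tokens
    head, rest = frontier[0], frontier[1:]
    rhss = rhsides(head, grammar)
    if rhss:   # non-terminal: try each rule, splicing its rhs into the frontier
        return any(recognize_frontier(tokens, rhs + rest, grammar)
                   for rhs in rhss)
    # terminal: must match the next token exactly
    return (bool(tokens) and tokens[0] == head
            and recognize_frontier(tokens[1:], rest, grammar))
```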
For example, we could define the following grammar for ARITH:
(set *ARITH*
[[’expression ’numeral-expression]
[’expression ’complex]
[’complex ’operator "(" ’expression "," ’expression ")"]
[’operator "+"]
[’operator "-"]
[’operator "*"]
[’operator "/"]])
and then use it to yield:
1> (recognize-1 "234" ’expression *arith*)
1= "accepted"
1> (recognize-1 "+(2,3)" ’expression *arith*)
1= "accepted"
1> (recognize-1 "(2 3)" ’expression *arith*)
1= "rejected"
RECOGNIZE-1 looks for only a single way to accept the incoming expression; we can define a RECOGNIZE-2 that simultaneously looks for all possible ways to recognize the incoming expression:
(define RECOGNIZE-2
(lambda [expression start-symbol grammar]
(if (some null
(recognize-symbol-2 (lexical-analysis expression)
start-symbol
grammar))
"accepted"
"rejected")))
(define RECOGNIZE-SYMBOL-2
(lambda [tokens symbol grammar]
(cond [(null tokens) (nothing)]
[(match symbol (first tokens))
(singleton (rest tokens))]
[$t (collect (lambda [rhs]
(recognize-frontier-2 tokens rhs grammar))
(rhsides symbol grammar))])))
(define RECOGNIZE-FRONTIER-2
(lambda [tokens frontier grammar]
(cond [(null frontier) (singleton tokens)]
[(null tokens) (nothing)]
[$t (collect (lambda [fragment]
(recognize-frontier-2 fragment (rest frontier) grammar))
(recognize-symbol-2 tokens (first frontier) grammar))])))
RECOGNIZE-SYMBOL-2 and RECOGNIZE-FRONTIER-2 differ primarily in that the former deals with a single symbol, whereas the latter is more complicated because it has to recognize a full sequence of them. But a more abstract version is possible, which combines these facilities into one:
(define RECOGNIZE-3
(lambda [expression start-symbol grammar]
(if (some null
(recognize-frontier-3 (lexical-analysis expression)
[start-symbol]
grammar))
"accepted"
"rejected")))
(define RECOGNIZE-FRONTIER-3
(lambda [tokens frontier grammar]
(cond [(null frontier) (singleton tokens)]
[(null tokens) (nothing)]
[$t (collect (lambda [fragment]
(recognize-frontier-3 fragment (rest frontier) grammar))
(if (match (first frontier) (first tokens))
(singleton (rest tokens))
(collect (lambda [rhs]
(recognize-frontier-3 tokens rhs grammar))
(rhsides (first frontier) grammar))))])))
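RECOGNIZE-3's all-ways-at-once strategy also carries over to Python (a sketch, with the same conventions as the earlier Python recognizer; each call returns every possible sequence of leftover tokens, so a list of tails replaces backtracking):

```python
from itertools import chain

CAMELOT = [["s", "np", "vp"],
           ["vp", "v", "np"],
           ["vp", "v"],
           ["v", "slept"], ["v", "loved"], ["v", "wielded"],
           ["np", "arthur"], ["np", "gwen"], ["np", "excalibur"]]

def collect(fun, seq):
    # Append together the sequences produced by fun on each element.
    return list(chain.from_iterable(fun(x) for x in seq))

def rhsides(nonterm, grammar):
    return [rule[1:] for rule in grammar if rule[0] == nonterm]

def recognize_3(expression, start_symbol, grammar):
    tails = recognize_frontier_3(expression.split(), [start_symbol], grammar)
    return "accepted" if any(tail == [] for tail in tails) else "rejected"

def recognize_frontier_3(tokens, frontier, grammar):
    # Returns EVERY possible sequence of as-yet unrecognized tokens.
    if not frontier:
        return [tokens]          # the SINGLETON case
    if not tokens:
        return []                # the NOTHING case
    head, rest = frontier[0], frontier[1:]
    rhss = rhsides(head, grammar)
    if rhss:                     # non-terminal: all ways, via every rule
        starts = collect(lambda rhs: recognize_frontier_3(tokens, rhs, grammar),
                         rhss)
    elif tokens[0] == head:      # terminal: consume one matching token
        starts = [tokens[1:]]
    else:
        starts = []
    return collect(lambda frag: recognize_frontier_3(frag, rest, grammar),
                   starts)
```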
Question 2-f: Write a grammar, acceptable by RECOGNIZE-1, -2, or -3, for 3-NOTE. (Note: you will have to replace the Kleene "+" and "*", using recursive rules instead.) Try it on each of:
[a]
(a)
[]
()
(lambda [x] y)
(lambda [x) y]
2-9. Parsing with Explicit Grammars
The next step is to build a parser, rather than just a recognizer, that works with an explicit representation of a grammar. Our parsing procedure will be similar to that of RECOGNIZE-2 above, except that whereas each internal routine in RECOGNIZE-2 returned a sequence of as-yet unrecognized tokens, our new routines will return a two-element sequence, the first element of which is the parse, and the second of which is the as-yet unparsed tokens. Actually, they will return a sequence of such two-element sequences, since they are designed to discover all possible parses (as RECOGNIZE-2 did). A top-level procedure:
(define PARSE
(lambda [expression start-symbol grammar]
(let [[good-parses
(filtered-map first
(compose null second)
(parse-symbol (lexical-analysis expression)
start-symbol
grammar))]]
(if (null good-parses)
"rejected"
good-parses))))
PARSE-SYMBOL takes a list of tokens to be parsed, a symbol, and a grammar; if the symbol matches the first token directly, the result is simply the tree fragment consisting of that symbol and that first token, with the remaining tokens passed back as well, since they still need to be parsed (note the use of SINGLETON).
(define PARSE-SYMBOL
(lambda [tokens symbol grammar]
(if (and (not (null tokens))
(match symbol (first tokens)))
(singleton [[symbol (first tokens)] (rest tokens)])
(collect (lambda [rhs]
(parse-frontier symbol tokens rhs grammar))
(rhsides symbol grammar)))))
The code for these two procedures can be loaded by typing:
(load "{turing}<3-lisp.problem-sets>general-parser.3-lisp")
Question 2-g: Write an appropriate PARSE-FRONTIER to fit in with the two procedures just defined. The reason PARSE-FRONTIER is given SYMBOL as an argument is so that it can take each of the parse trees that it builds up from the right hand sides of the rules (rhs), and add the SYMBOL node on top of them. You will want to study RECOGNIZE-FRONTIER-2 carefully, and also understand the differences between RECOGNIZE-SYMBOL-2 and PARSE-SYMBOL. Together, these should enable you to define PARSE-FRONTIER.
One way to understand this last question is that we have provided the base case for a recursive definition, and are asking you to provide the corresponding recursive one.
2-10. Extensions and Further Explorations
Though this is the end of the problem set, it is by no means the end of the road we have been exploring. For example, it is clear, given all of these various procedure definitions, that it would be powerful to abstract over all the various sorts of transductions (parsing, recognition, translation) and define a general-purpose procedure that in some sense would take the task as an explicit argument, as well as the grammar. One way to do this is to associate with each grammar rule a procedure that sets out the function to be computed over expressions that are described by that rule. By using the right kinds of procedures, one can specify translation, analysis, etc.
Given such machinery, it would be possible, by associating sufficiently powerful procedures with each rule, to translate 3-LISP expressions not just into their internalizations, but into normal-form results. I.e., one could implement the 3-LISP processor in such a system! The task is relatively straightforward except for the question of environments, but by using sufficiently fancy associated procedures, one can pass environments up and down the tree of the computation that is engendered by the structure of the expression.
Simpler than these is the task of writing a genuine interpretation procedure: something that, as described in the introduction, can be done only if the expressive power of the theoretical language is adequate to refer to the semantic domain of the language in question. In 3-LISP’s case this would restrict us to languages about numbers, sequences, internal structures, etc.
A final extension is a task that is suggested by the examples we have considered: a procedure that takes a grammar as an expression (parsed according to a grammar for grammars), and generates the code for an implicit parsing procedure, in some programming language. Such systems, called parser generators, are a very powerful way of obtaining efficient implementations of parsers for new computer languages (since implicit parsers are often more efficient than explicit ones).