{Begin SubSec Pattern Matching} {Title Pattern Matching} {Text {note The pattern match compiler was written by L. M. Masinter.} {index *PRIMARY* Pattern matching} {index *PRIMARY* Pattern match compiler} {Tag PatternMatch} Interlisp provides a fairly general pattern match facility that allows the user to specify certain tests that would otherwise be clumsy to write, by giving a pattern which the datum is supposed to match. Essentially, the user writes "Does the (expression) X look like (the pattern) P?" For example, {lisp (match X with (& 'A -- 'B))} asks whether the second element of {lisp X} is an {lisp A}, and the last element a {lisp B}. The implementation of the matching is performed by computing (once) the equivalent Interlisp expression which will perform the indicated operation, and substituting this for the pattern, and {it not} by invoking each time a general purpose capability such as that found in FLIP or PLANNER. For example, the translation of {lisp (match X with (& 'A -- 'B))} is: {lispcode (AND (EQ (CADR X) 'A) (EQ (CAR (LAST (CDDR X))) 'B))} Thus the pattern match facility is really a pattern match compiler, and the emphasis in its design and implementation has been more on the efficiency of object code than on generality and sophistication of its matching capabilities. The goal was to provide a facility that could and would be used even where efficiency was paramount, e.g., in inner loops. As a result, the pattern match facility does not contain (yet) some of the more esoteric features of other pattern match languages, such as repeated patterns, disjunctive and conjunctive patterns, recursion, etc. However, the user can be confident that what facilities it does provide will result in Interlisp expressions comparable to those he would generate by hand. Wherever possible, already existing Interlisp functions are used in the translation, e.g., the translation of {lisp ($ 'A $)} uses {fn MEMB}, {lisp ($ ('A $) $)} uses {fn ASSOC}, etc. {index MATCH (Pattern Matching Operator)} The syntax for pattern match expressions is {lisp (match {arg FORM} with {arg PATTERN})}, where {arg PATTERN} is a list as described below. If {arg FORM} appears more than once in the translation, and it is not either a variable, or an expression that is easy to (re)compute, such as {lisp (CAR Y), (CDDR Z)}, etc., a dummy variable will be generated and bound to the value of {arg FORM} so that {arg FORM} is not evaluated a multiple number of times. For example, the translation of {lisp (match (FOO X) with ($ 'A $))} is simply {lisp (MEMB 'A (FOO X))}, while the translation of {lisp (match (FOO X) with ('A 'B --))} is: {lispcode [PROG ($$2) (RETURN (AND (EQ (CAR (SETQ $$2 (FOO X))) 'A) (EQ (CADR $$2) 'B]} In the interests of efficiency, the pattern match compiler assumes that all lists end in {lisp NIL}, i.e., there are no {fn LISTP} checks inserted in the translation to check tails. For example, the translation of {lisp (match X with ('A & --))} is {lisp (AND (EQ (CAR X) (QUOTE A)) (CDR X))}, which will match with {lisp (A B)} as well as {lisp (A . B)}. Similarly, the pattern match compiler does not insert {fn LISTP} checks{index LISTP checks in pattern matching} on elements, e.g., {lisp (match X with (('A --) --))} translates simply as {lisp (EQ (CAAR X) 'A)}, and {lisp (match X with (($1 $1 --) --))} as {lisp (CDAR X)}. Note that the user can explicitly insert {fn LISTP} checks himself by using {lisp @}, as described below, e.g., {lisp (match X with (($1 $1 --)@LISTP --))} translates as {lisp (CDR (LISTP (CAR X)))}. Note: The insertion of {fn LISTP} checks for {it elements} is controlled by the variable {var PATLISTPCHECK}.{index PATLISTPCHECK Var} When {var PATLISTPCHECK} is {lisp T}, {fn LISTP} checks are inserted, e.g., {lisp (match X with (('A --) --))} translates as: {lisp (EQ (CAR (LISTP (CAR (LISTP X)))) 'A)}. {var PATLISTPCHECK} is initially {lisp NIL}. Its value can be changed within a particular function by using a local CLISP declaration (see {PageRef Tag CLISPLocalDeclarations}). Note: Pattern match expressions are translated using the DWIM and CLISP facilities, using all CLISP declarations in effect (standard/fast/undoable) (see {PageRef Tag CLISPDeclarations}). {Begin SubSec Pattern Elements} {Title Pattern Elements} {Text A pattern consists of a list of pattern elements. Each pattern element is said to match either an element of a data structure or a segment. For example, in the editor's pattern matcher, "{lisp --}" ({PageRef (in Edit Pattern) --}) matches any arbitrary segment of a list, while {lisp &} or a subpattern match only one element of a list. Those patterns which may match a segment of a list are called {it segment} patterns; those that match a single element are called {it element} patterns. }{End SubSec Pattern Elements} {Begin SubSec Element Patterns} {Title Element Patterns} {Text {index *PRIMARY* Element patterns in pattern matching} There are several types of element patterns, best given by their syntax: {Begin LabeledList several types of element patterns} {Label {index $1 (in pattern matching)}{index & (in pattern matching)}{lisp $1} or {lisp &}} {Item Matches an arbitrary element of a list. } {Label {index ' (in pattern matching)}{lisp '{arg EXPRESSION}}} {Item Matches only an element which is equal to the given expression e.g., {lisp 'A}, {lisp '(A B)}. {fn EQ}, {fn MEMB}, and {fn ASSOC} are automatically used in the translation when the quoted expression is atomic, otherwise {fn EQUAL}, {fn MEMBER}, and {fn SASSOC}. } {Label {index = (in pattern matching)}{lisp ={arg FORM}}} {Item Matches only an element which is {fn EQUAL} to the value of {arg FORM}, e.g., {lisp =X}, {lisp =(REVERSE Y)}. } {Label {index == (in pattern matching)}{lisp =={arg FORM}}} {Item Same as {lisp =}, but uses an {fn EQ} check instead of {fn EQUAL}. } {Label {arg ATOM}} {Item The treatment depends on setting of {var PATVARDEFAULT}.{index PATVARDEFAULT Var} If {var PATVARDEFAULT} is {lisp '} or {lisp QUOTE}, same as {lisp '{arg ATOM}}. If {var PATVARDEFAULT} is {lisp =} or {lisp EQUAL}, same as {lisp ={arg ATOM}}. If {var PATVARDEFAULT} is {lisp ==} or {lisp EQ}, same as {lisp =={arg ATOM}}. If {var PATVARDEFAULT} is {lisp _} or {lisp SETQ}, same as {lisp {arg ATOM}_&}. {var PATVARDEFAULT} is initially {lisp '}. {var PATVARDEFAULT} can be changed within a particular function by using a local CLISP declaration (see {PageRef Tag CLISPLocalDeclarations}). Note: numbers and strings are always interpreted as though {index PATVARDEFAULT Var}{var PATVARDEFAULT} were {lisp =}, regardless of its setting. {fn EQ}, {fn MEMB}, and {fn ASSOC} are used for comparisons involving small integers. } {Label {lisp ({arg PATTERN{sub 1}} {ellipsis} {arg PATTERN{sub N}})}} {Item Matches a list which matches the given patterns, e.g., {lisp (& &)}, {lisp (-- 'A)}. } {Label {index @ (in pattern matching)}{lisp {arg ELEMENT-PATTERN}@{arg FN}}} {Item Matches an element if {arg ELEMENT-PATTERN} matches it, and {arg FN} (name of a function or a {lisp LAMBDA} expression) applied to that element returns non-{lisp NIL}. For example, {lisp &@NUMBERP} matches a number and {lisp ('A --)@FOO} matches a list whose first element is {lisp A}, and for which {lisp FOO} applied to that list is non-{lisp NIL}. For "simple" tests, the function-object is applied before a match is attempted with the pattern, e.g., {lisp ((-- 'A --)@LISTP --)} translates as {lisp (AND (LISTP (CAR X)) (MEMB 'A (CAR X)))}, not the other way around. {arg FN} may also be a {arg FORM} in terms of the variable {lisp @}, e.g., {lisp &@(EQ @ 3)} is equivalent to {lisp =3}. } {Label {index * (in pattern matching)}{lisp *}} {Item Matches any arbitrary element. If the entire match succeeds, the element which matched the {lisp *} will be returned as the value of the match. Note: Normally, the pattern match compiler constructs an expression whose value is guaranteed to be non-{lisp NIL} if the match succeeds and {lisp NIL} if it fails. However, if a {lisp *} appears in the pattern, the expression generated could also return {lisp NIL} if the match succeeds and {lisp *} was matched to {lisp NIL}. For example, {lisp (match X with ('A * --))} translates as {lisp (AND (EQ (CAR X) 'A) (CADR X))}, so if {lisp X} is equal to {lisp (A NIL B)} then {lisp (match X with ('A * --))} returns {lisp NIL} even though the match succeeded. } {Label {index ~ (in pattern matching)}{lisp ~{arg ELEMENT-PATTERN}}} {Item Matches an element if the element is {it not} matched by {arg ELEMENT-PATTERN}, e.g., {lisp ~'A}, {lisp ~=X}, {lisp ~(-- 'A --)}. } {Label {lisp (*ANY* {arg ELEMENT-PATTERN} {arg ELEMENT-PATTERN} {ellipsis})}} {Item Matches if any of the contained patterns match. } {End LabeledList several types of element patterns} }{End SubSec Element Patterns} {Begin SubSec Segment Patterns} {Title Segment Patterns} {Text {index Segment patterns in pattern matching} {Begin LabeledList Segment Patterns} {Label {index $ (dollar) (in pattern matching)}{index -- (in pattern matching)}{lisp $} or {lisp --}} {Item Matches any segment of a list (including one of zero length). } {End LabeledList Segment Patterns} The difference between {lisp $} and {lisp --} is in the type of search they generate. For example, {lisp (match X with ($ 'A 'B $))} translates as {lisp (EQ (CADR (MEMB 'A X)) 'B)}, whereas {lisp (match X with (-- 'A 'B $))} translates as: {lispcode [SOME X (FUNCTION (LAMBDA ($$2 $$1) (AND (EQ $$2 'A) (EQ (CADR $$1) 'B]} Thus, a paraphrase of {lisp ($ 'A 'B $)} would be "Is the element following the {it first} {lisp A} a {lisp B}?", whereas a paraphrase of {lisp (-- 'A 'B $)} would be "Is there {it any} {lisp A} immediately followed by a {lisp B}?" Note that the pattern employing {lisp $} will result in a more efficient search than that employing {lisp --}. However, {lisp ($ 'A 'B $)} will not match with {lisp (X Y Z A M O A B C)}, but {lisp (-- 'A 'B $)} will. Essentially, once a pattern following a {lisp $} matches, the {lisp $} never resumes searching, whereas {lisp --} produces a translation that will always continue searching until there is no possibility of success. However, if the pattern match compiler can deduce from the pattern that continuing a search after a particular failure cannot possibly succeed, then the translations for both {lisp --} and {lisp $} will be the same. For example, both {lisp (match X with ($ 'A $3 $))} and {lisp (match X with (-- 'A $3 --))} translate as {lisp (CDDDR (MEMB (QUOTE A) X))}, because if there are not three elements following the first {lisp A}, there certainly will not be three elements following subsequent {lisp A}'s, so there is no reason to continue searching, even for {lisp --}. Similarly, {lisp ($ 'A $ 'B $)} and {lisp (-- 'A -- 'B --)} are equivalent. {Begin LabeledList More Segment Patterns} {Label {lisp $2}, {lisp $3}, etc.} {Item Matches a segment of the given length. Note that {lisp $1} is not a segment pattern. } {Label {index ! (in pattern matching)}{lisp !{arg ELEMENT-PATTERN}}} {Item Matches any segment which {arg ELEMENT-PATTERN} would match as a list. For example, if the value of {lisp FOO} is {lisp (A B C)}, {lisp !=FOO} will match the segment {lisp {ellipsis} A B C {ellipsis}} etc. Note: Since {lisp !} appearing in front of the last pattern specifies a match with some {it tail} of the given expression, it also makes sense in this case for a {lisp !} to appear in front of a pattern that can only match with an atom, e.g., {lisp ($2 !'A)} means match if {fn CDDR} of the expression is the atom {lisp A}. Similarly, {lisp (match X with ($ ! 'A))} translates to {lisp (EQ (CDR (LAST X)) 'A)}. } {Label {index ! (in pattern matching)}{lisp !{arg ATOM}}} {Item treatment depends on setting of {var PATVARDEFAULT}.{index PATVARDEFAULT Var} If {var PATVARDEFAULT} is {lisp '} or {lisp QUOTE}, same as {lisp !'{arg ATOM}} (see above discussion). If {var PATVARDEFAULT} is {lisp =} or {lisp EQUAL}, same as {lisp !={arg ATOM}}. If {var PATVARDEFAULT} is {lisp ==} or {lisp EQ}, same as {lisp !=={arg ATOM}}. If {var PATVARDEFAULT} is {lisp _} or {lisp SETQ}, same as {lisp {arg ATOM}_$}. } {Label {lisp .}} {Item {index . (in pattern matching)}The atom "{lisp .}" is treated {it exactly} like "{lisp !}". In addition, if a pattern ends in an atom, the "{lisp .}" is first changed to "{lisp !}", e.g., {lisp ($1 . A)} and {lisp ($1 ! A)} are equivalent, even though the atom "{lisp .}" does not explicitly appear in the pattern. One exception where "{lisp .}" is not treated like "{lisp !}": "{lisp .}" preceding an assignment does not have the special interpretation that "{lisp !}" has preceding an assignment (see below). For example, {lisp (match X with ('A . FOO_'B))} translates as: {lispcode (AND (EQ (CAR X) 'A) (EQ (CDR X) 'B) (SETQ FOO (CDR X)))} but {lisp (match X with ('A ! FOO_'B))} translates as: {lispcode (AND (EQ (CAR X) 'A) (NULL (CDDR X)) (EQ (CADR X) 'B) (SETQ FOO (CDR X)))} } {Label {lisp {arg SEGMENT-PATTERN}@{arg FUNCTION-OBJECT}}} {Item {index @ (in pattern matching)}Matches a segment if the segment-pattern matches it, and the function object applied to the corresponding segment (as a list) returns non-{lisp NIL}. For example, {lisp ($@CDDR 'D $)} matches {lisp (A B C D E)} but not {lisp (A B D E)}, since {lisp CDDR} of {lisp (A B)} is {lisp NIL}. Note: an {lisp @} pattern applied to a segment will require {it computing} the corresponding structure (with {fn LDIFF}) each time the predicate is applied (except when the segment in question is a tail of the list being matched). } {End LabeledList More Segment Patterns} }{End SubSec Segment Patterns} {Begin SubSec Assignments} {Title Assignments} {Text {index *PRIMARY* Assignments in pattern matching} Any pattern element may be preceded by {index _ (in pattern matching)}"{lisp {arg VARIABLE}_}", meaning that if the match succeeds (i.e., everything matches), {arg VARIABLE} is to be set to the thing that matches that pattern element. For example, if {lisp X} is {lisp (A B C D E)}, {lisp (match X with ($2 Y_$3))} will set {lisp Y} to {lisp (C D E)}. Note that assignments are not performed until the entire match has succeeded, so assignments cannot be used to specify a search for an element found earlier in the match, e.g., {lisp (match X with (Y_$1 =Y --))} will {it not} match with {lisp (A A B C ...)}, unless, of course, the value of {lisp Y} was {lisp A} before the match started. This type of match is achieved by using place-markers, described below. If the variable is preceded by a {lisp !},{index ! (in pattern matching)} the assignment is to the {it tail} of the list as of that point in the pattern, i.e., that portion of the list matched by the remainder of the pattern. For example, if {lisp X} is {lisp (A B C D E)}, {lisp (match X with ($ !Y_'C 'D $))} sets {lisp Y} to {lisp (C D E)}, i.e., {fn CDDR} of {lisp X}. In other words, when {lisp !} precedes an assignment, it acts as a modifier to the {lisp _}, and has no effect whatsoever on the pattern itself, e.g., {lisp (match X with ('A 'B))} and {lisp (match X with ('A !FOO_'B))} match identically, and in the latter case, {lisp FOO} will be set to {lisp CDR} of {lisp X}. Note: {lisp *_{arg PATTERN-ELEMENT}} and {lisp !*_{arg PATTERN-ELEMENT}} are acceptable, e.g., {lisp (match X with ($ 'A *_('B --) --))} translates as: {lispcode [PROG ($$2) (RETURN (AND (EQ (CAADR (SETQ $$2 (MEMB 'A X))) 'B) (CADR $$2]} }{End SubSec Assignments} {Begin SubSec Place-Markers} {Title Place-Markers} {Text {index Place markers in pattern matching} {indexX {Name #} {Type in pattern matching} {Text {lisp #{arg N}} ({arg N} a number)} } Variables of the form {lisp #{arg N}}, {arg N} a number, are called place-markers, and are interpreted specially by the pattern match compiler. Place-markers are used in a pattern to mark or refer to a particular pattern element. Functionally, they are used like ordinary variables, i.e., they can be assigned values, or used freely in forms appearing in the pattern, e.g., {lisp (match X with (#1_$1 =(ADD1 #1)))} will match the list {lisp (2 3)}. However, they are not really variables in the sense that they are not bound, nor can a function called from within the pattern expect to be able to obtain their values. For convenience, regardless of the setting of {var PATVARDEFAULT}, the first appearance of a defaulted place-marker is interpreted as though {var PATVARDEFAULT} were {lisp _}. Thus the above pattern could have been written as {lisp (match X with ( 1 =(ADD1 1)))}. Subsequent appearances of a place-marker are interpreted as though {var PATVARDEFAULT} were {lisp =}. For example, {lisp (match X with (#1 #1 --))} is equivalent to {lisp (match X with (#1_$1 =#1 --))}, and translates as {lisp (AND (CDR X) (EQUAL (CAR X) (CADR X))}. (Note that {lisp (EQUAL (CAR X) (CADR X))} would incorrectly match with {lisp (NIL)}.) }{End SubSec Place-Markers} {Begin SubSec Replacements} {Title Replacements} {Text {index *PRIMARY* Replacements in pattern matching} The construct {lisp {arg PATTERN-ELEMENT}_{arg FORM}} specifies that if the match succeeds, the part of the data that matched is to be {it replaced} with the value of {arg FORM}. For example, if {lisp X =(A B C D E)}, {lisp (match X with ($ 'C $1_Y $1))} will replace the third element of {lisp X} with the value of {lisp Y}. As with assignments, replacements are not performed until after it is determined that the entire match will be successful. Replacements involving segments splice the corresponding structure into the list being matched, e.g., if {lisp X} is {lisp (A B C D E F)} and {lisp FOO} is {lisp (1 2 3)}, after the pattern {lisp ('A $_FOO 'D $)} is matched with {lisp X}, {lisp X} will be {lisp (A 1 2 3 D E F)}, and {lisp FOO} will be {fn EQ} to {lisp CDR} of {lisp X}, i.e., {lisp (1 2 3 D E F)}. Note that {lisp ($ FOO_FIE $)} is ambiguous, since it is not clear whether {lisp FOO} or {lisp FIE} is the pattern element, i.e., whether {lisp _} specifies assignment or replacement. For example, if {index PATVARDEFAULT Var}{var PATVARDEFAULT} is {lisp =}, this pattern can be interpreted as {lisp ($ FOO_=FIE $)}, meaning search for the value of {lisp FIE}, and if found set {lisp FOO} to it, or {lisp ($ =FOO_FIE $)} meaning search for the value of {lisp FOO}, and if found, store the value of {lisp FIE} into the corresponding position. In such cases, the user should disambiguate by not using the {var PATVARDEFAULT} option, i.e., by specifying {lisp '} or {lisp =}. Note: Replacements are normally done with {lisp RPLACA} or {lisp RPLACD}. The user can specify that {fn /RPLACA} and {fn /RPLACD} should be used, or {fn FRPLACA} and {fn FRPLACD}, by means of CLISP declarations (see {PageRef Tag CLISPdeclarations}). }{End SubSec Replacements} {Begin SubSec Reconstruction} {Title Reconstruction} {Text {index *PRIMARY* Reconstruction in pattern matching} {index => (in pattern matching)} The user can specify a value for a pattern match operation other than what is returned by the match by writing {lisp (match {arg FORM{sub 1}} with {arg PATTERN} => {arg FORM{sub 2}})}. For example, {lisp (match X with (FOO_$ 'A --) => (REVERSE FOO))} translates as: {lispcode [PROG ($$2) (RETURN (COND ((SETQ $$2 (MEMB 'A X)) (SETQ FOO (LDIFF X $2)) (REVERSE FOO]} Place-markers in the pattern can be referred to from within {arg FORM}, e.g., the above could also have been written as {lisp (match X with (!#1 'A --) => (REVERSE #1))}. If {lisp ->}{index -> (in pattern matching)} is used in place of {lisp =>}, the expression being matched is also {it physically changed} to the value of {arg FORM}. For example, {lisp (match X with (#1 'A !#2) -> (CONS #1 #2))} would remove the second element from {lisp X}, if it were equal to {lisp A}. In general, {lisp (match {arg FORM{sub 1}} with {arg PATTERN} -> {arg FORM{sub 2}})} is translated so as to compute {arg FORM{sub 2}} if the match is successful, and then smash its value into the first node of {arg FORM{sub 1}}. However, whenever possible, the translation does not actually require {arg FORM{sub 2}} to be computed in its entirety, but instead the pattern match compiler uses {arg FORM{sub 2}} as an indication of what should be done to {arg FORM{sub 1}}. For example, {lisp (match X with (#1 'A !#2) -> (CONS #1 #2))} translates as {lisp (AND (EQ (CADR X) 'A) (RPLACD X (CDDR X)))}. }{End SubSec Reconstruction} {Begin SubSec Examples} {Title Examples} {Text Example: {lisp (match X with (-- 'A --))} {lisp --} matches any arbitrary segment. {lisp 'A} matches only an {lisp A}, and the second {lisp --} again matches an arbitrary segment; thus this translates to {lisp (MEMB 'A X)}. Example: {lisp (match X with (-- 'A))} Again, {lisp --} matches an arbitrary segment; however, since there is no {lisp --} after the {lisp 'A}, {lisp A} must be the last element of {lisp X}. Thus this translates to: {lisp (EQ (CAR (LAST X)) 'A)}. Example: {lisp (match X with ('A 'B -- 'C $3 --))} {fn CAR} of {lisp X} must be {lisp A}, and {fn CADR} must be {lisp B}, and there must be at least three elements after the first {lisp C}, so the translation is: {lispcode (AND (EQ (CAR X) 'A) (EQ (CADR X) 'B) (CDDDR (MEMB 'C (CDDR X))))} Example: {lisp (match X with (('A 'B) 'C Y_$1 $))} Since {lisp ('A 'B)} does not end in {lisp $} or {lisp --}, {lisp (CDDAR X)} must be {lisp NIL}. The translation is: {lispcode (COND ((AND (EQ (CAAR X) 'A) (EQ (CADAR X) 'B) (NULL (CDDAR X)) (EQ (CADR X) 'C) (CDDR X)) (SETQ Y (CADDR X)) T))} Example: {lisp (match X with (#1 'A $ 'B 'C #1 $))} {lisp #1} is implicitly assigned to the first element in the list. The {lisp $} searches for the first {lisp B} following {lisp A}. This {lisp B} must be followed by a {lisp C}, and the {lisp C} by an expression equal to the first element. The translation is: {lispcode [PROG ($$2) (RETURN (AND (EQ (CADR X) 'A) (EQ [CADR (SETQ $$2 (MEMB 'B (CDDR X] 'C) (CDDR $$2) (EQUAL (CADDR $$2) (CAR X]} Example: {lisp (match X with (#1 'A -- 'B 'C #1 $))} Similar to the pattern above, except that {lisp --} specifies a search for {it any} {lisp B} followed by a {lisp C} followed by the first element, so the translation is: {lispcode [AND (EQ (CADR X) 'A) (SOME (CDDR X) (FUNCTION (LAMBDA ($$2 $$1) (AND (EQ $$2 'B) (EQ (CADR $$1) 'C) (CDDR $$1) (EQUAL (CADDR $$1) (CAR X]} }{End SubSec Examples} }{End SubSec Pattern Matching}