LoganQueryDoc.tioga
Doug Terry, July 20, 1992 5:12 pm PDT
LOGANQUERY
CEDAR 10.1 — FOR INTERNAL XEROX USE ONLY
LoganQuery
Doug Terry
© Copyright 1985-1992 Xerox Corporation. All rights reserved.
Abstract: LoganQuery allows complex queries to be formulated and run against a LoganBerry database. These queries may involve boolean predicates and a variety of pattern matching operations. LoganQuery classes, such as those implementing filters and mergers, provide the basic building blocks for supporting such queries. New LoganQuery classes can be added.
Created by: Doug Terry
Maintained by: Doug Terry <Terry.pa>
Keywords: database, queries, browsing, pattern-matching
See also: LoganBerryDoc.tioga, LoganBerryToolsDoc.tioga
XEROX  Xerox Corporation
   Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, California 94304

For Internal Xerox Use Only
Introduction
LoganQuery allows more complicated queries to be performed on a LoganBerry database than the simple retrievals provided by the LoganBerry interface. Database queries of various classes can be registered; retrieval operations are class-specific. For instance, the $Filter class returns database entries that match a particular pattern, while the $Merger class merges the entries returned by two independent queries. Filters can be cascaded to perform multiple-attribute queries. The predefined $Simple class permits operations identical to those provided by LoganBerry. Clients are free to create their own class of queries.
For more information on the LoganBerry data management system, see LoganBerryDoc.tioga. For information on some of the commands and tools that make use of the LoganQuery facilities, see LoganBerryToolsDoc.tioga.
Boolean Queries
The Query Language
The LoganBerry Query Language can be used to express boolean queries. The basic component of such queries is an "attribute pattern". The format of an attribute pattern is as follows:
attribute-type(pattern-type): pattern
The attribute-type is a string indicating a LoganBerry attribute type; it is case sensitive. The pattern is a string that will be compared against the value of the given attribute in LoganBerry database entries; if it contains blanks or special characters then it should be enclosed in double quotes. The pattern-type is a string that specifies a particular pattern-matcher; this governs how the comparison between values and patterns is performed.
The pattern-type (and surrounding parentheses) can be omitted from an attribute pattern. In this case, a default pattern-matcher is used (currently the default is "DWIM").
A LoganBerry entry matches an attribute pattern if (1) the entry contains an attribute of the type given in the attribute pattern, and (2) the value of this attribute matches the pattern given in the attribute pattern (see "Pattern Matching" below).
Attribute patterns can be combined using AND, OR, and NOT as well as parentheses. The AND operator between attribute patterns is optional (for backward compatibility). The syntax for boolean queries is defined by the following formal grammar:
query ::= boolexpr | boolexpr query
boolexpr ::= term | term "OR" boolexpr
term ::= factor | factor "AND" term
factor ::= attribute-pattern | "NOT" factor | "(" boolexpr ")"
attribute-pattern ::= attribute-type ":" pattern |
attribute-type "(" pattern-type ")" ":" pattern
Here are some sample boolean queries over a mythical directory database consisting of names and phone numbers:
Name: Ben*
Name(wildcard): Ben*
Name(exact): "Ben Dover" Phone(exact): 123-4567
Name(exact): "Ben Dover" AND Phone(exact): 123-4567
Name: Sally OR Name: Saly OR Name: Salley OR Name: Silly
Name(soundex): Sally
(Name: "Moe D. Lawn" OR Name: "Frank N. Stein") AND Phone(prefix): "(415)"
Name: "Moe D. Lawn" OR Name: "Frank N. Stein" AND Phone(prefix): "(415)"
Phone: *
Phone: * OR NOT Phone: *
NOT (Name: "Patty O'Furniture" OR Name: "Mary Mae Bill" OR Name: "Justin Tyme")
NOT Name: "Patty O'Furniture" AND NOT Name: "Mary Mae Bill" AND NOT Name: "Justin Tyme"
The first two queries above produce identical results (assuming DWIM as the default pattern-matcher). Queries three and four are also identical. Queries five and six are not identical, but have similar intent. Queries seven and eight are different (because operators group right-to-left); use parenthesis to explicitly control the order of execution. Queries nine and ten produce identical results only if each database entry contains a Phone attribute, for instance if Phone is the primary key. The last two queries given above are identical, of course.
Pattern Matching
Each attribute pattern in a boolean query evaluates to either TRUE or FALSE based on whether or not the attribute value present in a LoganBerry entry matches the pattern given in the query. A variety of pattern matchers are available for use in queries including:
exact: Compares the value and pattern for equality.
prefix: Checks if the pattern is a prefix of the value. Prefix[v, "p"] is the same as Wildcard[v, "p*"], though faster.
wildcard: The pattern may contain zero or more wildcards (the character "*") that match anything.
re: The pattern is taken to be a regular expression as defined in RegularExpressionDoc.tioga.
soundex: Compares the value and pattern based on their Soundex codes. The Soundex encoding tends to group together variants of the same name; for instance, Johnson, Jansen, and Johansen have identical Soundex codes.
subrange: Checks if the value is in the range specified by the pattern. The pattern is taken to be two strings (or rope literals) separated by a "-"; a match occurs if pattern.beginningdvaluedpattern.end in lexicographic ordering or prefix(value)=pattern.end.
numrange: Checks if the value is in the numerical range specified by the pattern. The pattern should consist of two positive integers separated by a "-", e.g. "18-21".
daterange: Checks if the value is in the chronological range specified by the pattern. The pattern should consist of two date and times (as can be parsed by Tempus) separated by a "-", e.g. "today at noon-doomsday". The starting time is taken to be the earliest time that meets the specification whereas the ending time in the latest, e.g. "Yesterday-Today" means all of yesterday and all of today, not the beginning of yesterday to the beginning of today.
date: Checks if the value matches the pattern within the time precision of the pattern. The pattern, as well as the value, should be a date and time (as can be parsed by Tempus). A match can not occur unless the pattern is no more precise than the value. For example, a pattern of "Wednesday" will match a value of "Wednesday at 2 pm", but not vice versa.
DWIM: Tries to deduce an appropriate pattern matcher from the given pattern.
This list is almost certainly incomplete since new pattern-matching routines can be dynamically registered (via PatternMatch.mesa).
Programmers' Interface
The LoganQuery package is intended for programmers who want to build LoganBerry-based applications. Users who simply want to run queries over LoganBerry databases should use the LoganBerry Browser or similar tools (see LoganBerryToolsDoc.tioga). The following sections explain the query class abstraction in more detail and describe some existing query classes.
Basic Operations
Two operations are generic to all query classes. These operations are syntactically identical and semantically similar to those provided by LoganBerry, except that they operate on "complex cursors" instead of LoganBerry's simple cursors:
NextEntry: Retrieves the next entry relative to the given cursor. The cursor is automatically updated so that NextEntry may be repeatedly called to enumerate entries. NIL is returned if the cursor is at the end of the sequence and dir=increasing or at the start of the sequence and dir=decreasing.
EndGenerate: Releases the cursor; no further operations may be performed using the given cursor. This must be called once for every cursor created.
Predefined Query Classes
The creation of complex cursors is a class-specific operation. Clients are free to introduce new query classes and operations for generating cursors of those classes. LoganQueryClass.mesa defines operations to generate cursors of various classes, including $Simple, $Filter, $Merger, $Query, and $Aborter:
GenerateEntries: Identical to the GenerateEntries operation provided by LoganBerry but returns a cursor with class=$Simple and data=LoganBerry.Cursor.
FilterEntries: Returns a cursor of class $Filter. Retrievals using this new cursor apply the given pattern-matching filter to the appropriate attribute of entries identified by the input cursor. A retrieval returns NIL if the input is exhausted or stopIfNothingGreater=TRUE and the filter procedure returns nothingGreater=TRUE.
Note: PatternMatch.mesa defines several pattern-matching filters for use with the $Filter (and $AntiFilter) query class.
AntiFilterEntries: Returns a cursor of class $AntiFilter. Retrievals using this new cursor apply the given pattern-matching filter to the appropriate attribute of entries identified by the input cursor and return those entries that do NOT match the pattern. In other words, the resulting cursor returns exactly the opposite of what would be returned by a $Filter cursor. A retrieval returns NIL if the input is exhausted.
MergeEntries: Returns a cursor of class $Merger. Retrievals using this new cursor merge the two input streams. The input streams should be ordered by the given attribute type. A retrieval returns NIL only when both inputs are exhausted.
QueryEntries: Returns a cursor of class $Query. Retrievals using this new cursor match a set of attribute patterns against entries in the database. Unless a particular base index is requested, a query planner attempts to choose an index that optimizes the execution of the query. The returned query plan provides information about the query.
EnableAborts: Returns a cursor of class $Aborter. Operations on cursors of this class behave exactly like those on the input cursor except that they can be aborted using the AbortQuery operation.
All of these operations, except for GenerateEntries, take one or more cursors as input and produce one or more cursors as output. This permits cursors to be combined in various ways to produce interesting queries. For example, multi-attribute queries are accomplished by cascading one or more filters (see "Attribute-based Queries" below). See LoganQueryClass.mesa for a more complete list of available query classes.
Boolean Attribute-based Queries
The following operation takes a list of attribute patterns as a query to run against a LoganBerry database:
QueryEntries: Returns a cursor of class $Query. Retrievals using this new cursor match the attribute patterns against entries in the database. The baseIndex parameter governs which attribute type is used as the base index for the query; for baseIndex=NIL, an index is chosen that attempts to optimize the execution of the query (use this unless you really care about the order in which database entries are retrieved). The returned query plan provides information about the query.
Boolean queries over a LoganBerry database (as defined in "Boolean Queries" above) are also supported. The following operation takes a boolean query and a cursor, and returns a cursor that can be used to retrieve entries matching the query:
BooleanFilterEntries: Returns a cursor of class $BooleanFilter. Retrievals using this new cursor match the boolean query against entries returned by the input cursor. Raises LoganQuery.SyntaxError if presented with an incorrectly formed query.
The output of the QueryEntries operation can be feed into the BooleanFilterEntries. See LoganQuery.mesa for details.