LoganQueryImpl.mesa
Copyright Ó 1985, 1992 by Xerox Corporation. All rights reserved.
Doug Terry, July 21, 1992 10:07 am PDT
Willie-s, April 27, 1992 11:40 am PDT
LoganQuery allows more complicated queries to be performed on a LoganBerry database than the simple retrievals provided by the LoganBerry interface. This module implements the complex queries, i.e. those that involve composing the basic LoganQuery classes. These include multi-attribute queries and boolean queries. See LoganQueryClassImpl.mesa for the implementation of the basic building blocks (LoganQuery classes) used by this module.
DIRECTORY
Ascii USING [Lower, Upper, NUL, SP],
Atom USING [GetPName, MakeAtom],
BasicTime USING [GMT, Period],
IO USING [BreakProc, EndOfStream, GetTokenRope, GetLineRope, TokenProc, GetChar, SkipWhitespace, PeekChar, GetRopeLiteral, Error, STREAM, PutRope, PutF1, RIS, rope],
PatternMatch USING [CheckPattern, DWIM, Lookup, MatchProc],
Rope USING [Cat, Concat, Equal, Fetch, FromChar, Index, Length, Replace, ROPE, Run, SkipTo, Substr, Translate, TranslatorType],
Tempus USING [Adjust, Parse, Precision, MakeRope, Unintelligible],
LoganBerry,
LoganQueryClass,
LoganQuery;
LoganQueryImpl: CEDAR PROGRAM
IMPORTS Ascii, Atom, BasicTime, IO, PatternMatch, Rope, Tempus, LoganBerry, LoganQueryClass
EXPORTS LoganQuery
~ BEGIN OPEN LoganQuery;
ROPE: TYPE ~ Rope.ROPE;
STREAM: TYPE ~ IO.STREAM;
Basic operations
The following operations are syntactically and semantically similar to those provided by LoganBerry. Any errors raised by these operations are generated by LoganBerry.
NextEntry: PUBLIC PROC [cursor: Cursor, dir: CursorDirection ¬ increasing] RETURNS [entry: LoganBerry.Entry] = {
Retrieves the next entry relative to the given cursor. The cursor is automatically updated so that NextEntry may be repeatedly called to enumerate entries. NIL is returned if the cursor is at the end of the sequence and dir=increasing or at the start of the sequence and dir=decreasing.
Errors: IndexNotOpen, BadIndex, LogNotOpen, BadLogEntry
IF cursor.class.retrieve = NIL THEN RETURN[NIL];
RETURN[cursor.class.retrieve[cursor, dir]];
};
EndGenerate: PUBLIC PROC [cursor: Cursor] RETURNS [] = {
Releases the cursor; no further operations may be performed using the given cursor. This must be called once for every cursor created.
IF cursor.class.destroy = NIL THEN RETURN;
cursor.class.destroy[cursor];
};
Query planning and execution
In actuality, there is no $Query class. The cursor returned by QueryEntries is of class $Filter; it is obtained by one or more calls to FilterEntries.
QueryEntries: PUBLIC PROC [db: LoganBerry.OpenDB, patterns: AttributePatterns, baseIndex: LoganBerry.AttributeType ¬ NIL, defaultPattern: PatternType ¬ "DWIM", planOnly: BOOLEAN ¬ FALSE] RETURNS [cursor: Cursor, plan: QueryPlan] ~ {
Returns a cursor; retrievals using this new cursor match the attribute patterns against entries in the database. The baseIndex parameter governs which attribute type is used as the base index for the query; for baseIndex=NIL, an index is chosen that attempts to optimize the execution of the query (use this unless you really care about the order in which database entries are retrieved). The returned query plan provides information about the query. If planOnly is TRUE, then a query plan is formulated and returned, but a cursor for this query is not created.
dbInfo: LoganBerry.SchemaInfo ¬ LoganBerry.Describe[db: db];
base: SubPlan ¬ NIL;
figure out where to start and end in database to save work, and pick a "good" base index if necessary (the index that minimizes expected cost)
FOR p: AttributePatterns ¬ patterns, p.rest WHILE p # NIL DO
possibleBase: BOOLEAN = baseIndex=NIL OR baseIndex=p.first.attr.type;
new: SubPlan ¬ AnalyzeSubPlan[p.first, dbInfo, possibleBase, defaultPattern];
SELECT TRUE FROM
NOT possibleBase, new.itype = $notAnIndex => plan ¬ CONS[new, plan];
baseIndex = p.first.attr.type, base = NIL => base ¬ new;
new.cost <= base.cost => { -- new best base
plan ¬ CONS[base, plan];
base ¬ new;
};
ENDCASE => plan ¬ CONS[new, plan]; -- new plan not best
ENDLOOP;
IF base = NIL THEN { -- no attribute pattern for base
IF baseIndex = NIL THEN baseIndex ¬ dbInfo.indices.first.key; -- pick arbitrary base
base ¬ AnalyzeSubPlan[NEW[AttributePatternRec ¬ [attr: [baseIndex, NIL], ptype: NIL]], dbInfo, TRUE, defaultPattern];
};
IF base.itype = $notAnIndex THEN
ERROR LoganBerry.Error[$NoIndex, "No suitable base index"];
plan ¬ CONS[base, plan];
IF planOnly THEN RETURN;
simple base query
cursor ¬ LoganQueryClass.GenerateEntries[db: db, key: base.attr.type, start: base.start, end: base.end];
build up filters
FOR p: QueryPlan ¬ plan, p.rest WHILE p # NIL DO
IF NOT Rope.Equal[p.first.attr.value, ""] AND p.first.filter#NIL THEN
cursor ¬ LoganQueryClass.FilterEntries[input: cursor, pattern: p.first.attr.value, filter: p.first.filter, atype: p.first.attr.type, stopIfNothingGreater: FALSE];
ENDLOOP;
cursor ¬ LoganQueryClass.EnableAborts[input: cursor];
};
AnalyzeSubPlan: PROC [p: AttributePattern, dbInfo: LoganBerry.SchemaInfo, computeCost: BOOLEAN ¬ TRUE, defaultPattern: PatternType] RETURNS [plan: SubPlan] ~ {
okPattern: BOOLEAN;
plan ¬ NEW[SubPlanRec ¬ [attr: p.attr, ptype: p.ptype]];
IF plan.ptype=NIL THEN plan.ptype ¬ defaultPattern;
IF Rope.Equal[plan.ptype, "DWIM", FALSE] THEN {
plan.wasDWIM ¬ TRUE;
plan.ptype ¬ PatternMatch.DWIM[plan.attr.value];
};
plan.filter ¬ PatternMatch.Lookup[plan.ptype];
IF plan.filter=NIL THEN
plan.errMsg ¬ "INVALID PATTERN MATCHING";
[okPattern, plan.infoMsg] ¬ PatternMatch.CheckPattern[plan.attr.value, plan.filter];
IF NOT okPattern THEN
plan.errMsg ¬ "BAD PATTERN";
IF plan.wasDWIM THEN
plan.infoMsg ¬ Rope.Cat["using ", plan.ptype, "...", plan.infoMsg];
IF NOT computeCost THEN RETURN;
FOR i: LIST OF LoganBerry.IndexInfo ¬ dbInfo.indices, i.rest WHILE i#NIL DO
IF i.first.key = plan.attr.type THEN {
plan.itype ¬ i.first.order;
EXIT;
};
ENDLOOP;
IF plan.itype = $notAnIndex THEN RETURN;
[plan.start, plan.end] ¬ BaseStartEnd[plan.attr.value, plan.ptype, plan.itype];
plan.cost ¬ QueryCost[plan.start, plan.end, plan.itype];
};
QueryCost: PROC [start, end: ROPE, itype: ATOM ¬ $lex] RETURNS [est: REAL ¬ 1.0] ~ {
Estimates the cost of enumerating an index of the given type from start to end. The cost model is quite simple: it assumes that the probability of any given character being the nth character of a string is 0.01 for n in [1..N]. The estimate returned is the estimated fraction of the index that must be traversed. For instance, start="a" and end="b" for itype=$lex yields est=0.01, while start="aaa" and end="aac" for itype=$lex yields est=0.000002.
N: NAT = 5;
run: INT;
sChar, eChar: CHAR;
IF end=NIL THEN RETURN[1.0];
IF start=NIL THEN start ¬ " ";
SELECT itype FROM
$lex, $ascii => {
run ¬ Rope.Run[s1: start, s2: end, case: itype=$ascii];
IF run > N THEN RETURN[0.0]; -- the query has negligible cost
est ¬ 1.0;
THROUGH [1..run] DO est ¬ est*0.01; ENDLOOP; -- est ¬ 1/100run
sChar ¬ IF Rope.Length[start] <= run THEN ' ELSE Rope.Fetch[start, run];
eChar ¬ IF Rope.Length[end] <= run THEN ' ELSE Rope.Fetch[end, run];
IF itype=$lex THEN {
sChar ¬ Ascii.Upper[sChar];
eChar ¬ Ascii.Upper[eChar];
};
est ¬ est*MIN[eChar-sChar,100]*0.01;
};
$gmt => {
startDate, endDate: BasicTime.GMT;
year: REAL ¬ 31536000.0;
startDate ¬ Tempus.Parse[rope: start, search: TRUE ! Tempus.Unintelligible => GOTO BadPattern].time;
endDate ¬ Tempus.Parse[rope: end, search: TRUE ! Tempus.Unintelligible => GOTO BadPattern].time;
est ¬ BasicTime.Period[from: startDate, to: endDate]/year; -- assume one year is complete database
IF est < 0 OR est > 1.0 THEN est ¬ 1.0;
EXITS
BadPattern => RETURN[1.0];
};
ENDCASE => RETURN[0.5]; -- don't know the index layout so assume half the index will be needed on average
};
BaseStartEnd: PROC [pattern: ROPE, ptype: ROPE, itype: ATOM ¬ $lex] RETURNS [start, end: ROPE] ~ {
This routine exists solely for performance reasons. The intent is to identify a range [start..end] of entries in an index that includes all possible values that match the given pattern; clearly this range depends on the ordering of entries in the index ($lex, $ascii, $gmt, etc.). It is always acceptable to return [NIL, NIL], which indicates that the whole index must be enumerated; returning a range that is too small or incorrect, on the other hand, leads to incomplete answers to a filtered query.
ENABLE SyntaxError => {start ¬ end ¬ NIL; CONTINUE;}; -- raised by ParseSubrange
Bump: PROC [old: ROPE] RETURNS [new: ROPE] ~ INLINE {
last: INT = Rope.Length[old] - 1;
new ¬ IF last >= 0 THEN Rope.Replace[base: old, start: last, len: 1, with: Rope.FromChar[SUCC[Ascii.Lower[Rope.Fetch[old, last]]]]] ELSE NIL;
};
Upper: Rope.TranslatorType = {
[old: CHAR] RETURNS [new: CHAR] -- old Cedar7.0 impl
[old: CHAR] RETURNS [CHAR]
new: CHAR ¬ Ascii.Upper[old];
RETURN[new];
};
Lower: Rope.TranslatorType = {
[old: CHAR] RETURNS [new: CHAR] --old Cedar7.0 impl
[old: CHAR] RETURNS [CHAR]
new: CHAR ¬ Ascii.Lower[old];
RETURN[new];
};
SELECT itype FROM
$lex => {
SELECT TRUE FROM
Rope.Equal[ptype, "exact", FALSE] => {
start ¬ pattern;
end ¬ pattern;
};
Rope.Equal[ptype, "prefix", FALSE] => {
start ¬ pattern;
end ¬ Bump[start];
};
Rope.Equal[ptype, "wildcard", FALSE] => {
start ¬ Rope.Substr[base: pattern, len: Rope.Index[s1: pattern, s2: "*"]];
end ¬ Bump[start];
};
Rope.Equal[ptype, "re", FALSE] => {
i: INT ¬ Rope.SkipTo[s: pattern, skip: "\'#[^$*+(\\<{!"]; -- look for special chars
IF NOT i = Rope.Length[pattern] AND Rope.Fetch[pattern, i] = '* THEN -- could be zero of previous char
i ¬ i-1;
start ¬ Rope.Substr[base: pattern, len: i];
end ¬ Bump[start];
};
Rope.Equal[ptype, "soundex", FALSE] => {
start ¬ Rope.FromChar[Rope.Fetch[base: pattern, index: 0]];
end ¬ Bump[start];
};
Rope.Equal[ptype, "subrange", FALSE] => {
[start, end] ¬ ParseSubrange[pattern];
};
Rope.Equal[ptype, "numrange", FALSE] => {
find greatest common prefix for start and end
s, e: ROPE;
[s, e] ¬ ParseSubrange[pattern ! SyntaxError => {start ¬ NIL; CONTINUE}];
IF Rope.Length[s] = Rope.Length[e]
THEN {
start ¬ Rope.Substr[base: s, len: Rope.Run[s1: s, s2: e]];
end ¬ Bump[start];
}
ELSE start ¬ end ¬ NIL; -- can do nothing intelligent, e.g. 300-3000
};
Rope.Equal[ptype, "daterange", FALSE] => {
start ¬ end ¬ NIL; -- can do nothing intelligent
};
Rope.Equal[ptype, "date", FALSE] => {
start ¬ end ¬ NIL; -- can do nothing intelligent
};
ENDCASE => start ¬ end ¬ NIL;
};
$ascii => {
start ¬ BaseStartEnd[Rope.Translate[base: pattern, translator: Upper], ptype, $lex].start;
end ¬ BaseStartEnd[Rope.Translate[base: pattern, translator: Lower], ptype, $lex].end;
};
$gmt => {
SELECT TRUE FROM
Rope.Equal[ptype, "date", FALSE] => {
patternDate: BasicTime.GMT;
patternPrecision: Tempus.Precision;
[patternDate, patternPrecision] ¬ Tempus.Parse[rope: pattern, search: TRUE ! Tempus.Unintelligible => GOTO BadPattern];
start ¬ Tempus.MakeRope[time: patternDate, precision: patternPrecision];
[patternDate, patternPrecision] ¬ SELECT patternPrecision FROM
years => Tempus.Adjust[years: 1, baseTime: patternDate],
months => Tempus.Adjust[months: 1, baseTime: patternDate],
days => Tempus.Adjust[days: 1, baseTime: patternDate],
hours => Tempus.Adjust[hours: 1, baseTime: patternDate],
minutes => Tempus.Adjust[minutes: 1, baseTime: patternDate],
seconds => Tempus.Adjust[seconds: 1, baseTime: patternDate],
ENDCASE => Tempus.Adjust[seconds: 1, baseTime: patternDate];
end ¬ Tempus.MakeRope[time: patternDate, precision: patternPrecision];
EXITS
BadPattern => start ¬ end ¬ NIL;
};
Rope.Equal[ptype, "daterange", FALSE] => {
patternDate: BasicTime.GMT;
patternPrecision: Tempus.Precision;
[start, end] ¬ ParseSubrange[pattern];
[patternDate, patternPrecision] ¬ Tempus.Parse[rope: start, search: TRUE ! Tempus.Unintelligible => GOTO BadPattern];
start ¬ Tempus.MakeRope[time: patternDate, precision: patternPrecision];
[patternDate, patternPrecision] ¬ Tempus.Parse[rope: end, search: TRUE ! Tempus.Unintelligible => GOTO BadPattern];
patternDate ¬ SELECT patternPrecision FROM
years => Tempus.Adjust[baseTime: patternDate, years: 1, seconds: -1].time,
months => Tempus.Adjust[baseTime: patternDate, months: 1, seconds: -1].time,
days => Tempus.Adjust[baseTime: patternDate, days: 1, seconds: -1].time,
hours => Tempus.Adjust[baseTime: patternDate, hours: 1, seconds: -1].time,
minutes => Tempus.Adjust[baseTime: patternDate, minutes: 1, seconds: -1].time,
ENDCASE => Tempus.Adjust[baseTime: patternDate, seconds: 1].time;
end ¬ Tempus.MakeRope[time: patternDate, precision: seconds];
EXITS
BadPattern => start ¬ end ¬ NIL;
};
ENDCASE => start ¬ end ¬ NIL;
};
$int => {
SELECT TRUE FROM
Rope.Equal[ptype, "exact", FALSE] =>
start ¬ end ¬ pattern;
Rope.Equal[ptype, "intrange", FALSE] => [start, end] ¬ ParseSubrange[pattern];
ENDCASE => start ¬ end ¬ NIL;
};
ENDCASE => start ¬ end ¬ NIL; -- search complete database
};
Attribute patterns
SyntaxError: PUBLIC ERROR [explanation: ROPE ¬ NIL] = CODE;
ReadAttributePattern: PROC [s: STREAM, prefetch: ROPE ¬ NIL] RETURNS [a: AttributePattern] ~ {
An attribute pattern is of the form: atype(ptype): avalue. The ptype is optional, and there may be whitespace preceeding the avalue. The avalue can be either a token or rope literal.
ENABLE IO.EndOfStream => ERROR SyntaxError["unexpected end of input"];
ch: CHAR;
token: ROPE ¬ prefetch;
IF token = NIL THEN
token ¬ IO.GetTokenRope[s, IO.TokenProc ! IO.EndOfStream => GOTO none].token;
a ¬ NEW[AttributePatternRec];
a.attr.type ¬ Atom.MakeAtom[token];
ch ¬ IO.GetChar[s]; -- attribute separation char or "("
IF ch = '( THEN { -- read pattern type
a.ptype ¬ IO.GetTokenRope[s, IO.TokenProc].token;
ch ¬ IO.GetChar[s];
IF ch # ') THEN ERROR SyntaxError[Rope.Concat["unmatched ()'s: last read ", a.ptype]];
ch ¬ IO.GetChar[s];
};
IF ch # ': THEN ERROR SyntaxError[Rope.Concat["missing colon: last read ", a.ptype]];
[] ¬ IO.SkipWhitespace[stream: s, flushComments: FALSE];
IF IO.PeekChar[s] = '" THEN {
a.attr.value ¬ IO.GetRopeLiteral[s ! IO.Error => ERROR SyntaxError["Malformed rope literal"]];
}
ELSE
a.attr.value ¬ IO.GetTokenRope[s, TokenProc].token;
EXITS
none => RETURN[NIL];
};
ReadAttributePatterns: PUBLIC PROC [s: IO.STREAM] RETURNS [ap: AttributePatterns] ~ {
Read attribute patterns until there are no more and build up a list of them.
a: AttributePattern;
endOfAp: AttributePatterns ¬ NIL;
a ¬ ReadAttributePattern[s];
WHILE a # NIL DO
IF endOfAp = NIL THEN
ap ¬ endOfAp ¬ LIST[a]
ELSE {
endOfAp.rest ¬ LIST[a];
endOfAp ¬ endOfAp.rest;
};
a ¬ ReadAttributePattern[s];
ENDLOOP;
};
WriteAttributePattern: PROC [s: IO.STREAM, a: AttributePattern] RETURNS [] ~ {
IO.PutRope[s, Atom.GetPName[a.attr.type]];
IF a.ptype # NIL THEN IO.PutF1[s, "(%g)", IO.rope[a.ptype]];
IO.PutF1[s, ": ""%g"" ", IO.rope[a.attr.value]];
};
WriteAttributePatterns: PUBLIC PROC [s: IO.STREAM, ap: AttributePatterns] RETURNS [] ~ {
FOR p: AttributePatterns ¬ ap, p.rest WHILE p # NIL DO
WriteAttributePattern[s, p.first];
ENDLOOP;
};
PatternsToEntry: PUBLIC PROC [ap: AttributePatterns] RETURNS [entry: LoganBerry.Entry] ~ {
endOfEntry: LoganBerry.Entry ¬ NIL;
FOR p: AttributePatterns ¬ ap, p.rest WHILE p # NIL DO
IF endOfEntry = NIL THEN
entry ¬ endOfEntry ¬ LIST[p.first.attr]
ELSE {
endOfEntry.rest ¬ LIST[p.first.attr];
endOfEntry ¬ endOfEntry.rest;
};
ENDLOOP;
};
EntryToPatterns: PUBLIC PROC [entry: LoganBerry.Entry] RETURNS [ap: AttributePatterns] ~ {
pattern: AttributePattern;
endOfPattern: AttributePatterns ¬ NIL;
FOR e: LoganBerry.Entry ¬ entry, e.rest WHILE e # NIL DO
pattern ¬ NEW[AttributePatternRec ¬ [attr: e.first]];
IF endOfPattern = NIL THEN
ap ¬ endOfPattern ¬ LIST[pattern]
ELSE {
endOfPattern.rest ¬ LIST[pattern];
endOfPattern ¬ endOfPattern.rest;
};
ENDLOOP;
};
Boolean attribute-based queries
Several LoganQuery classes are composed in various ways to implement a boolean query filter.
BooleanFilterEntries: PUBLIC PROC [input: Cursor, query: ParseTree, inputOrder: LoganBerry.AttributeType, primaryKey: LoganBerry.AttributeType, defaultPattern: PatternType ¬ "DWIM"] RETURNS [output: Cursor] ~ {
Returns a cursor of class $BooleanFilter. Retrievals using this new cursor match the boolean query against entries returned by the input cursor. The query tree should be the result of ParseBooleanQuery. The inputOrder specifies the attribute by which entries on the input cursor are ordered, while primaryKey indicates the attribute for which these entries have unique values. The defaultPattern parameter indicates which type of pattern matching to use if this is not explictly specified in an attribute pattern. The input cursor could be obtained from QueryEntries (or a variety of operations in LoganQueryClass such as GenerateEntries).
Actually there is no $BooleanFilter class. The output cursor returned is of class $Aborter so that queries using this cursor can be aborted.
output ¬ CursorForSubtree[input, query, inputOrder, primaryKey, defaultPattern];
output ¬ LoganQueryClass.EnableAborts[input: output];
};
CursorForSubtree: PROC [input: Cursor, node: ParseNode, order, key: LoganBerry.AttributeType, dp: PatternType] RETURNS [output: Cursor] ~ {
This builds up the cursor structure that implements the boolean query rooted at the given node. It calls itself recursively.
LookupFilter: PROC [ap: AttributePattern, default: PatternType] RETURNS [proc: PatternMatch.MatchProc] ~ {
IF ap.ptype=NIL THEN ap.ptype ¬ default;
IF Rope.Equal[ap.ptype, "DWIM", FALSE] THEN
ap.ptype ¬ PatternMatch.DWIM[ap.attr.value];
proc ¬ PatternMatch.Lookup[ap.ptype];
};
SELECT node.tag FROM
$and => {
For computing ANDs, we simply pipeline the two subquery cursors. That is, the output of one query feeds into the input of the other.
output1: Cursor ¬ CursorForSubtree[input, node.child1, order, key, dp];
output ¬ CursorForSubtree[output1, node.child2, order, key, dp];
};
$or => {
For computing ORs, we duplicate the input cursor, run one instance through one subquery's cursor pipeline and the other through the other subquery's. Then the two cursors are merged back together and duplicates are eliminated.
input1, input2, output1, output2: Cursor;
[input1, input2] ¬ LoganQueryClass.DuplicateEntries[input];
output1 ¬ CursorForSubtree[input1, node.child1, order, key, dp];
output2 ¬ CursorForSubtree[input2, node.child2, order, key, dp];
output ¬ LoganQueryClass.MergeEntries[output1, output2, order];
IF order = key THEN
output ¬ LoganQueryClass.UnDuplicateEntries[output, order];
Note: UnDuplicateEntries wants its input cursor to be ordered by the primary key. If this is not the case, as it may not be, then we do not call UnDuplicateEntries and hence the client may see duplicate entries in the output cursor. This should be fixed someday, perhaps by modifying UnDuplicateEntries to take both an order and key parameter ala SubtractEntries.
};
$not => {
There are two cases of NOTs: In the simple case, the child is simply an attribute pattern; the entries are run through an antifilter. The complex case handles more general subqueries but is more expensive; we duplicate the input cursor, run one instance through the subquery's cursor pipeline and the other directly into a subtractor, which finds those entries that do not match the subquery.
input1, fullinput, output1: Cursor;
IF node.child1.tag = $ap THEN { -- simple case
output ¬ LoganQueryClass.AntiFilterEntries[input: input, pattern: node.child1.ap.attr.value, filter: LookupFilter[node.child1.ap, dp], atype: node.child1.ap.attr.type];
RETURN [output];
};
[input1, fullinput] ¬ LoganQueryClass.DuplicateEntries[input];
output1 ¬ CursorForSubtree[input1, node.child1, order, key, dp];
output ¬ LoganQueryClass.SubtractEntries[fullinput, output1, key, order];
};
$ap => {
In this case, the entries are simply filtered according to the given attribute pattern.
output ¬ LoganQueryClass.FilterEntries[input: input, pattern: node.ap.attr.value, filter: LookupFilter[node.ap, dp], atype: node.ap.attr.type];
};
ENDCASE;
};
Parsing boolean queries
The grammar for boolean queries is as follows:
query ::= boolexpr | boolexpr query
boolexpr ::= term | term "OR" boolexpr
term ::= factor | factor "AND" term
factor ::= attribute-pattern | "NOT" factor | "(" boolexpr ")"
attribute-pattern ::= attribute-type ":" attribute-value |
attribute-type "(" pattern ")" ":" attribute-value
A recursive decent parser is used to parse queries that comply with this grammar. For each non-terminal in the grammar there is a parse routine defined below that accepts the right-hand-side of the production from the input stream and returns a parse tree for that portion of the query.
TokenStream: TYPE = REF TokenStreamRec;
TokenStreamRec: TYPE = RECORD [
stream: STREAM,
prefetch: ROPENIL
];
ParseBooleanQuery: PUBLIC PROC [s: IO.STREAM] RETURNS [tree: ParseTree] ~ {
Returns the parse tree for the query read from the given stream. Raises SyntaxError if the query is not a proper boolean query.
ts: TokenStream ¬ NEW[TokenStreamRec ¬ [s]];
tree ¬ GetQuery[ts];
};
GetQuery: PROC [ts: TokenStream] RETURNS [tree: ParseTree] ~ {
query ::= boolexpr | boolexpr query
subTree: ParseTree ¬ GetBoolexpr[ts];
IF PeekToken[ts] # NIL THEN {
tree ¬ NEW[ParseNodeRec ¬ [tag: $and]];
tree.child1 ¬ subTree;
tree.child2 ¬ GetQuery[ts];
} ELSE {
tree ¬ subTree;
};
};
GetBoolexpr: PROC [ts: TokenStream] RETURNS [tree: ParseTree] ~ {
boolexpr ::= term | term "OR" boolexpr
subTree: ParseTree ¬ GetTerm[ts];
IF Rope.Equal[PeekToken[ts], "OR", FALSE] THEN {
[] ¬ NextToken[ts];
tree ¬ NEW[ParseNodeRec ¬ [tag: $or]];
tree.child1 ¬ subTree;
tree.child2 ¬ GetBoolexpr[ts];
} ELSE {
tree ¬ subTree;
};
};
GetTerm: PROC [ts: TokenStream] RETURNS [tree: ParseTree] ~ {
term ::= factor | factor "AND" term
subTree: ParseTree ¬ GetFactor[ts];
IF Rope.Equal[PeekToken[ts], "AND", FALSE] THEN {
[] ¬ NextToken[ts];
tree ¬ NEW[ParseNodeRec ¬ [tag: $and]];
tree.child1 ¬ subTree;
tree.child2 ¬ GetTerm[ts];
} ELSE {
tree ¬ subTree;
};
};
GetFactor: PROC [ts: TokenStream] RETURNS [tree: ParseTree] ~ {
factor ::= attribute-pattern | "NOT" factor | "(" boolexpr ")"
token: ROPE ¬ PeekToken[ts];
SELECT TRUE FROM
Rope.Equal[token, "NOT", FALSE] => {
token ¬ NextToken[ts];
tree ¬ NEW[ParseNodeRec ¬ [tag: $not]];
tree.child1 ¬ GetFactor[ts];
};
Rope.Equal[token, "("] => {
token ¬ NextToken[ts];
tree ¬ GetBoolexpr[ts];
token ¬ NextToken[ts];
IF NOT Rope.Equal[token, ")"] THEN
ERROR SyntaxError["Unmatched parenthesis"];
};
ENDCASE => { -- attribute-pattern
tree ¬ GetAttrributePattern[ts];
};
};
GetAttrributePattern: PROC [ts: TokenStream] RETURNS [tree: ParseTree] ~ {
attribute-pattern ::= attribute-type ":" attribute-value |
attribute-type "(" pattern ")" ":" attribute-value
tree ¬ NEW[ParseNodeRec ¬ [tag: $ap]];
tree.ap ¬ ReadAttributePattern[ts.stream, ts.prefetch];
IF tree.ap = NIL THEN
ERROR SyntaxError["Unexpected end of input"];
ts.prefetch ¬ NIL;
};
TokenProc: IO.BreakProc ~ {
RETURN[SELECT char FROM
IN [Ascii.NUL .. Ascii.SP] => sepr,
'(, '), ': => break,
ENDCASE => other];
};
NextToken: PROC [ts: TokenStream] RETURNS [token: ROPE] ~ {
IF ts.prefetch # NIL
THEN token ¬ ts.prefetch
ELSE token ¬ IO.GetTokenRope[ts.stream, TokenProc ! IO.EndOfStream => {token ¬ NIL; CONTINUE}].token;
ts.prefetch ¬ NIL;
};
PeekToken: PROC [ts: TokenStream] RETURNS [token: ROPE] ~ {
IF ts.prefetch = NIL THEN
ts.prefetch ¬ IO.GetTokenRope[ts.stream, TokenProc ! IO.EndOfStream => CONTINUE].token;
token ¬ ts.prefetch;
};
Aborting queries
AbortQuery: PUBLIC PROC [cursor: Cursor] ~ {
If a process is currently executing a NextEntry operation on the given cursor, the process is aborted and the operation returns NIL; any subsequent calls to NextEntry will also return NIL (i.e. the cursor is no longer useable). This can be used on cursors returned by either QueryEntries or BooleanFilterEntries.
LoganQueryClass.AbortQuery[cursor];
};
Miscellaneous
ParseSubrange: PROC [r: ROPE] RETURNS [start, end: ROPE ¬ NIL] ~ {
r should be of the form "start-end".
ENABLE IO.EndOfStream, IO.Error => GOTO Bad;
ToDash: IO.BreakProc = {
[char: CHAR] RETURNS [IO.CharClass]
RETURN[SELECT char FROM
'- => sepr,
ENDCASE => other];
};
s: IO.STREAM ¬ IO.RIS[r];
start ¬ IF IO.PeekChar[s] = '" THEN IO.GetRopeLiteral[s] ELSE IO.GetTokenRope[s, ToDash].token;
IF IO.GetChar[s] # '- THEN GOTO Bad;
end ¬ IF IO.PeekChar[s] = '" THEN IO.GetRopeLiteral[s] ELSE IO.GetLineRope[s];
EXITS
Bad => ERROR SyntaxError["Not a subrange"];
};
END.