[Cedar]<CedarChest6.1>Documentation>OneCasabaDoc.tioga
Sturgis, May 6, 1986 11:06:21 am PDT
OneCasaba
CEDAR 6.1 — FOR INTERNAL XEROX USE ONLY
OneCasaba
Sturgis
© Copyright 1986 Xerox Corporation. All rights reserved.
Abstract:
OneCasaba is an LR parsing package, containing a parser generator (GenOneCasabaParser) and a parser package (OneCasabaParser). Like YACC in Unix, or PGS in Mesa, OneCasabaParser calls client code as it performs each parsing action, allowing the client code to perform such actions as building a parse tree. OneCasaba is intended for use on small as well as large grammars. I intend for OneCasaba to be relatively easy to use, and to play a role in Cedar similar to that played by YACC in Unix.
OneCasaba differs from YACC and PGS in two ways. First, OneCasaba can handle any LR[1] grammar, where YACC and PGS are limited to LALR[1]. The reason that YACC and PGS are limited is that the canonical LR[1] construction leads to enormous table sizes, even for LALR[1] grammars, while there is a reasonable construction for LALR[1] grammars. OneCasaba avoids these large tables by using a new algorithm with the property that if the grammar is only "slightly" non LALR[1], then the resulting tables will be only slightly larger than those that would have been constructed if the grammar had been LALR[1].
Second, if the presented grammar is not LR[1], i.e., there is some canonical LR[1] state which contains conflicting LR[1] items, then OneCasaba not only names the conflicting items (as do YACC and PGS) but also provides derivations of the items which demonstrate that the two items must lie in the same canonical LR[1] state.
Created by: Sturgis.pa
Maintained by: Sturgis.pa
Keywords: Parsing, ParserGenerator, CompilerCompiler
XEROX Xerox Corporation
Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, California 94304
For Internal Xerox Use Only
0. Change Log
May 6, 1986 10:55:43 am PDT
Describe the newly added (dangerous) "AllowConflicts" feature
April 9, 1986 9:48:43 am PST
Add more guidance for coding parser client code.
April 8, 1986 1:16:56 pm PST
Initial version
1. Introduction
OneCasaba is an LR parsing package, containing a parser generator (GenOneCasabaParser) and a parser package (OneCasabaParser). Like YACC in Unix, or PGS in Mesa, OneCasabaParser calls client code as it performs each parsing action, allowing the client code to perform such actions as building a parse tree.
OneCasaba is one member of a (projected) family of packages. TwoCasaba enables the definition of a concrete grammar for parsing together with a related abstract grammar, and parsing results in the construction of an abstract parse tree for the supplied source text. ThreeCasaba4 allows the additional definition of a recursive function to be applied to the constructed abstract parse tree.
OneCasaba differs from YACC and PGS in two ways. First, OneCasaba can handle any LR[1] grammar, where YACC and PGS are limited to LALR[1]. The reason that YACC and PGS are limited is that the canonical LR[1] construction leads to enormous table sizes, even for LALR[1] grammars, while there is a reasonable construction for LALR[1] grammars. OneCasaba avoids these large tables by using a new algorithm with the property that if the grammar is only "slightly" non LALR[1], then the resulting tables will be only slightly larger than those that would have been constructed if the grammar had been LALR[1]. While this is an interesting algorithm, I have yet to see a naturally occurring programming language grammar that is LR[1] but non LALR[1]. (There are of course grammars that are LR[1] but non LALR[1], its just that all of them that I have seen are specially constructed to have this property.)
Second, if the presented grammar is not LR[1], i.e., there is some canonical LR[1] state which contains conflicting LR[1] items, then OneCasaba not only names the conflicting items (as do YACC and PGS) but also provides derivations of the items which demonstrate that the two items must lie in the same canonical LR[1] state.
OneCasaba contains a tool for constructing parser tables, GenOneCasabaParser, and a package for parsing source text (OneCasabaParser). GenOneCasabaParser is described in section 2, OneCasabaParser is described in section 3, and an example is described in section 4.
DF File: [Cedar]<CedarChest6.1>Top>OneCasaba.df
Documentation: [Cedar]<CedarChest6.1>Documentation>OneCasabaDoc.tioga
2. Parser Table Generation
A OneCasaba parser is represented by a file containing a table that describes the parse states, and possible parse actions. This file is constructed from a grammar definition by GenOneCasabaParser.
If Foo.OneCasaba contains a description of a grammar, then the following command performed in a commander window:
GenOneCasabaParser Foo
will construct Foo.KipperedParseTables.
OneCasabaParser deals in numbers to identify the productions, while the client would prefer to deal in names. To assist the user, GenOneCasabaParser prints a message containing an ordered list of the production names, to be copied into client code; e.g., in an enumerated type declaration.
To see the normal messages produced by GenOneCasabaParser, perform the following command in a commander window:
GenOneCasabaParser OneCasabaFormat
The remainder of this section contains a description of the text format for describing a grammar, and of the possible error messages from GenOneCasabaParser, with the exception of those messages related to non LR[1] grammars. Section 5 discusses the messages that occur when the described grammar is not LR[1].
2.1 Format of a OneCasaba source file.
A OneCasaba source file contains a context free definition of a grammar, and its name must have a suffix of "OneCasaba". The general form of the text is and optional "AllowConflicts", followed by "Begin", followed by a sequence of items, separated by semicolons, and terminated with "End.". The items define symbols or productions. (AllowConflicts is dangerous, see section 2.4.)
In general, a context free grammars is specified by giving a set of terminal symbols, a disjoint set of nonterminal symbols, and one or more productions for each nonterminal symbol. In OneCasaba, we distinguish between two kinds of terminal symbols: generic and simple. A generic token defines a class of tokens that will be treated as a single terminal symbol during parsing, but whose individual spelling has semantic content; identifiers are a standard example. A simple token defines a single terminal symbol, and has a single spelling. (OneCasaba requires that each grammar symbol appear in a declaration.)
A simple token is represented by placing its spelling between double quotes. For example, "+" represents the plus sign and "BEGIN" represents the Begin token of Mesa. A simple token declaration consists of a left currly bracket, a sequence of one or more individual representations of simple tokens, a right currly bracket, a colon and "SimpleTokens" (without the quotes). A OneCasaba file may contain more than one simple terminal declaration. For example, the five tokens "+", "-", "*", "Begin", and "End" are declared as follows:
{ "+" "-" "*" }: SimpleTokens;
{ "Begin" "End" }: SimpleTokens
Simple tokens are constrained to either be of the form of an identifier, or to be one or two punctuation marks. (This, and other restrictions, are a consquence of using IO.GetCedarTokenRope as the tokenizer.)
Generic tokens must by one of a small number recognized by IO.GetCedarTokenRope, in particular one of those represented by the following IO.TokenKind: tokenID, tokenDECIMAL, tokenOCTAL, tokenHEX, tokenREAL, tokenROPE, tokenCHAR, and tokenATOM. A generic token by be given any name in a OneCasaba file. A single generic token is declared in one item by giving its name, a colon, the simple token "GenericToken", an equal sign, and the spelling of its TokenKind, placed in double quotes. For example:
Id: GenericToken = "tokenID"
A nonterminal is declared by giving its name, a colon, and "NonTerminal". For example:
Expression: NonTerminal
A production consists of its name, a left arrow, and a sequence of zero or more symbols. A production name is either the left hand side nonterminal of the production, or if there is more than one production with the same left hand side, the left hand side nonterminal, a dot, and a disambiguating identifier. Each production must have a unique name.
For example:
Expression.sum ← Id "+" Id
is a production whose name is Expression.sum, whose left hand side nonterminal is Expression and whose right had side is a sequence of three symbols: Id, "+", and Id. Likewise,
MainGoal ← Expression
is a production whose name is MainGoal, whose left hand side nonterminal is also MainGoal, and whose right hand side is the single symbol: Expression.
OneCasaba treats the first production mentioned as the root production, and thus the nontermial on its left side is the start symbol of the grammar. For technical reasons, the start symbol must not occur in any other production.
Here is a complete OneCasaba grammar for a rather trivial expression language:
Begin
{ "+" "-" }: SimpleTokens;
Id: GenericToken = "tokenID";
MainGoal: NonTerminal;
MainGoal ← Expression;
Expression: NonTerminal;
Expression.sum ← Id "+" Id;
Expression.diff ← Id "-" Id;
End.
There are a few fine points. 1) In constructing names for productions, the names must be unambiguous after the connecting dots have been removed. 2) One can use a name with a dot even if there is only one production with that nonterminal as its left hand side, e.g. we could have used
MainGoal.only ← Expression
in the above example. 3) We require than any nonterminal be capable of generating a phrase containing no nonterminals. 4) We require that any non terminal can be generated by the start symbol.
A complete grammar for this file format may be found in OneCasabaFormat.OneCasaba; although these files are currently parsed by a hand coded parser (written before OneCasaba was running).
2.2 Non LALR[1] grammars
GenOneCasabaParser can handle some non LALR[1] grammars, unlike most LR parser generators. I offer a little background information to help in understanding what GenOneCasabaParser can accomplish. LALR[1] grammars form a subset of the LR[1] grammars. There is a canonical algorithm for constructing a parser for an LR[1] grammar, however this canonical algorithm leads to enormous parser tables. There is a another algorithm which works exactly on LALR[1] grammars, and for these grammars produces parser tables which are much smaller than would be constructed for the same grammar by the canonical LR[1] algorithm. Most LR parser generators use this LALR[1] algorithm.
There have been a few LR[1] algorithms published that claim to work on all LR[1] grammars and produce tables comparable in size to those produced by the LALR[1] algorithm. The papers I have seen have not contained proofs of correctness, and the algorithms tend to be fairly complicated.
GenOneCasabaParser contains an algorithm for which I make the same claims, i.e., it works on all LR[1] grammars and produces tables comparable in size to those produced by the LALR[1] algorithm. Further, the algorithm is relatively easy to understand, and I believe that I can produce a proof of correctness. (I intend to publish this at some time in the future.)
This algorithm depends on "splitting" some states of the (inconsistent) LALR[1] parser. Before this algorithm can be applied, appropriate states must be selected for splitting. There is a trivial collection of states that will suffice, but if the algorithm is applied to this trivial collection the result may again be an enormous parser table.
GenOneCasabaParser contains a second algorithm that determines if the grammar is LR[1], and if so selects a collection of states to split. If the grammar is not LR[1], it produces a proof to that effect. (The error message discussed below for non LR[1] grammars is based on this proof.)
At the moment, I know of no interesting programming language grammars that are both LR[1] and not LALR[1]. (There are of course grammars that are both LR[1] and not LALR[1], but these grammars are specially constructed to have this property.)
Thus, I believe that the interesting algorithm in GenOneCasabaParser is the one that constructs a proof that a grammar is non LR[1]. I have applied this algorithm to an old version of a Mesa language grammar that contains the common If/Then/Else ambiguity. It is able to prove the grammar non LR[1] in a small number of minutes, and print out various conflicting derivations. (I was unable to prove that Mesa.Grammar was not LR[1] with my basic state splitting algorithm, because to do so with that algorithm requires applying it to the "trivial" collection of states mentioned above, and the resulting explosion of states exceeded the VM size of Cedar.)
You can see the results yourself. Perform the following in a Command Tool window (because Mesa.Grammar is not in the OneCasaba format, a different command is required to process it) (since Mesa.Grammar is a large grammar, over 400 productions, it requires about 9 minutes on a Dorado):
Install GenOneCasabaParser
← Commander.Register["TestLR1", GenOneCasabaParserDriver.TestLR1]
TestLR1 Mesa.Grammar
2.3 Easy Error messages from GenOneCasabaParser.
A OneCasaba source file may have four kinds of problems: syntactic, deficiencies in the defined grammar, a non LALR[1] grammar, or a non LR[1] grammar. We consider each of these in turn.
Syntax errors
A syntax error in the OneCasaba source file immediatly stops processing, and produces a message containing the location of the error.
Immediate Grammar deficiencies
GenOneCasabaParser requires that all symbols be declared exactly once. If a grammar violates this requirement, then error messages are generated for each violation, and processing is terminated.
The LR analysis used in GenOneCasabaParser makes some initial assumptions about the grammar: that all nonterminals can be reached from the root nonterminal, and that all nonterminals eventually generated pure terminal symbol sequences, i.e., symbol sequences that contain no nonterminals. In other words, there are no useless nonterminals. If the grammar fails to satisfy these assumptions, an appropriate error message is printed and processing is terminated.
non LALR[1] grammar
If the grammar is not LALR[1], GenOneCasabaParser prints a message to that effect, and then begins a conflict analysis to determine if the grammar is LR[1].
non LR[1] grammar
(See section 5)
2.4 Optional method for handling non LALR[1] grammars and ambiguous grammars.
There is a standard technique for handling non LALR[1] grammars, this is to make an arbitrary choice of parsing action when there is a conflict. On a shift/reduce conflict the usual choice is to perform a shift, and on a reduce/reduce conflict, some choice is made between the productions.
If "AllowConflicts" is placed immediately before the "Begin", then OneCasaba will follow this procedure. No attempt will be made to analyze the conflicts to see if the grammar is actually LR[1]. Rather, shift/reduce conflicts will be replaed by a shift, and reduce/reduce conflicts will be replaced by a reduce of the earlier defined production.
I do not reccommend using this option. It is easy to construct ambiguous grammars for which the resulting parser will not successfully parse legal sentences. The one case where I know that it has been useful in practice is in resolving the IF/THEN/ELSE ambiguity. However, this ambiguity is easily resolved by a non ambiguous LALR[1] grammar. (Cedar uses this non ambiguous grammar.)
3. Client Code for driving the Parser.
OneCasabaParser is a package to be used by client code. It provides a procedure to read a file and construct a parser table, and a procedure to parse input text. A client provides two calls: 1) a call on BuildParserTableFromKipperedStream to read in a parser table, and 2) a call on Parse to perform an actual parse. The call on Parse supplies: 1) the previously read parser table, 2) a call back procedure to read in successive tokens, 3) several call back procedures to respond to individual parse actions, and 4) a catch phrase for the signal GetReportStream used in case of syntax errors. In addition, reduction actions are reported to the client with production numbers, rather than names. These numbers can be converted to names by using the text list of names produced during parser table generation. Finally, several modules are used by OneCasabaParser, as described in LoadOneCasabaParser.cm.
3.1 Reading Parser Tables
OneCasabaParser.BuildParserTableFromKipperedStream
BuildParserTableFromKipperedStream: PROC[STREAM] RETURNS[ParserTable]
reads a stream and produces a parser table (previously writen by GenOneCasabaParser) for subsequent parsing.
3.2 Parsing
OneCasabaParser.Parse is to be used for parsing.
Parse: PROC[
table: ParserTable,
getSourceToken: PROC RETURNS[tokenKind: TokenKind, tokenText: ROPE, position: INT],
showReduce: PROC[rule: CARDINAL, firstCharPosition: INT, length: INT] ← NIL,
showGenericShift: PROC[code: CARDINAL, kind: TokenKind, text: ROPE, firstCharPosition: INT] ← NIL,
showNonGenericShift: PROC[text: ROPE, firstCharPosition: INT] ← NIL]
RETURNS[accepted: BOOLEAN]
It is up to client code to perform tokenization and to perform any semantic actions. OneCasabaParser.Parse calls the client for successive input tokens, and calls the client to report each parse action. It is supplied a ParserTable, and several call-back procedures; one to obtain tokens, and one for each announced parse action. The final return of OneCasabaParser.Parse is used to announce acceptance.
The caller of OneCasabaParser.Parse must also be prepared to catch the SIGNAL OneCasabaParser.GetReportStream, and RESUME with an IO.STREAM to be used for reporting syntactic errors.
OneCasabaParser.Parse identifies the productions by a production number, while the client code would prefer to identify the productions by names. In particular, the client code would like to use names directly related to the names occurring in the grammar definition. To assist in this translation, GenOneCasabaParser produces a message containing the production names in numerical sequence. These names can, for example, be copied into an enumerated type declaration.
DemoOneCasaba.mesa contains a simple example which simply prints out a description of each parse step as it happens. In this case, the list of production names (with added quotes) was used to initialize an array of production names.
Usually, one wishes to write client code that computes some function of the source text, built up recursively on the structure of the text. Here is one way to go about doing this.
Use a LIST OF REF ANY to represent partal values computed on substructures of the text. Write call back procedures for shifting a generic token, and reduction steps, but supply NIL for shifting a simple token.
The call back procedure for shifting a generic token first does a SELECT on the kind of the token, and for each token kind computes a value and constructs a REF to that value. This may involve making symbol table entries for identifiers, and computing numerical values of decimal tokens. Having constructed such a REF, place it on the partial value list by
valList ← CONS[refVal, valList]
The call back procedure for a reduction first does a SELECT on the production number, for each production computes a new value as a function of some previously computed values obtained from the partial value list, and replaces the previously computed values on the partial value list with this new value. In writing the SELECT statement, we can use the enumerated type produced during parser generation.
WITH ProductionNames[VAL[rule]] SELECT FROM
ItemsimpleTkns => ... ;
Each case of the select will pop several items off of the partial value list (i.e., valList.first, valList.rest.first, etc.), and push a new one on (i.e., valList ← CONS[refNewVal, valList.rest.rest]).
DemoOneCasaba.mesa also contains a partial example of such a computation (see DemoOneCasaba.DemoAComputation).
3.3 Installing
OneCasabaParser.PaserImpl requires several additional modules and LoadOneCasabaParser.cm is a cm file that describes what is needed. This file may be Installed by other such files, for example, see GenOneCasabaParser.load or DemoOneCasaba.load.
4. A complete example
OneCasabaFormat.OneCasaba is a description of the grammar used for OneCasaba files (at present GenOneCasabaParser uses a hand coded parser) and DemoOneCasaba is a client program that using this grammar. You may exercise this demonstration by executing the following commands in a Commander window:
GenOneCasabaParser OneCasabaFormat
DemoOneCasaba OneCasabaFormat
Of course, DemoOneCasaba can be applied to any OneCasaba file, for example:
DemoOneCasaba IfThenElseExample
5. Error Messages for Non LR[1] grammars
For most users of an LR parser generator, the most difficult messages to interpret are those that report that a grammar is non LR (of whatever particular variety of LRness). Unfortunately, this is also true for GenOneCasabaParser, so I include a longer discussion of these error messages and how to interpret them, and conclude with a worked example.
OneCasaba discovers that a grammar is not LR[1] by finding a pair of LR items that conflict, and a proof that these two items must occur in some single state of the canonical LR[1] parser. Having found such a pair, OneCasaba prints out the pair, and the proof. The ability to display this proof is an advantage that OneCasaba has over most other LR parser generators; they are usually able only to print the pair of conflicting items. I believe that this printed proof offers some intuitive help to the user in understanding how to fix the grammar.
To see an example of such a message, execute the following in a commander window:
GenOneCasabaParser IfThenElseExample
For each pair of conflicting items, OneCasaba prints out the pair, and for each member of the pair, a sequence of LR items. These sequences constitute the proof that the pair must occur in the same state.
These error messages are of the form:
An unresolvable conflict
between Item1 and Item2
Item1 can be reached by
Item
Item
Item
and
Item2 can be reached by
Item
Item
Item
An LR[1] item is a triple: a production of the grammar, a position in that production, and a terminal symbol that may follow an occurrence of the production. We represent the production and the position in the production simultaneously by italicising the symbol following the position. e.g., in:
[ statement ← If expression Then statement Else statement | <;> ]
The production is
"statement ← If expression Then statement Else statement",
the position immediately follows the "Else", and a semicolon may follow an occurrence of the production. Error messages also contain LR[0] items, which have no following terminal symbols, e.g.:
[ statement ← If expression Then statement Else statement ]
If the position follows the entire right hand side, no symbol is italicised:
[ statement ← If expression Then statement Else statement | <;> ]
Two kinds of LR conflict can occur: reduce/reduce and shift/reduce. In a reduce/reduce conflict, the parser does not know which of two reductions to apply. A reduction is generated when a state contains an LR item with the position at the immediate right hand end. In the OneCasaba printing format, this is indicated by no italicised symbol on the right side of the production. A reduce/reduce conflict is indicated when neither LR item contains an italicised symbol, and both contain the same follow symbol.
A shift/reduce conflict is indicated by an italicised symbol occurring in one of the LR items. In this case, the italicised symbol will be a terminal symbol, and it will be the same terminal symbol occurring as the follow symbol of the other item.
The sequence of LR items in an error message describe four things: a derivation, the lexical phrase generated by the derivation, a position in the phrase, and a terminal symbol that can occur as the first terminal symbol of some expansion of the symbols occurring to the right of the defined position.
Each italicised symbol occurring in one production (marking a position) is the left side symbol of the next production. Thus the sequence of productions and marked positions defines a derivation as follows: begin with the first production, and successively replace the italicised symbols with the right had side of the next production. The final result defines a lexical phrase, and the italicised symbol defines a position. (If there was no italicised symbol in the last production, the defined position is immediately to the right of the right hand symbols of the last production.) The terminal symbol contained in the last item names a terminal symbol that can occur as the first terminal symbol of some expansion of the symbols occurring to the right of the defined position.
Consider a simple example:
[ MainGoal ← a ]
[ a ← b c d ]
[ c ← e f | <.> ]
The phrase generated by the derivation, with position marked by an arrow, is:
b ^ e f d
and a period can occur as the first terminal symbol of some expansion of "e f d".
In general, when the two phrases are constructed, one for each of the conflicting items, it will be seen that the sequence of symbols up to the marked position are identical.
Now, a technical note. The sequence of symbols up to the marked position defines a path in the canonical LR[1] parser. It is a fact that the last LR[1] item of the sequence will lie in the state-set at the end of this path. (In fact, inital sub-sequnces of the items also define shorter paths, and the final LR[1] items of these sub-sequences lie in the similarly defined state-sets.) Since the paths defined for each of the two items are the same, they must both lie in this single state-set. This is an LR[1] conflict.
One can make some easy hand transformations one the sequences that will make them more informative. (Some day I'll add some code to do this automatically.)
1) If an LR[1] item generates a shift step, (i.e., the right hand side of the production contains a symbol in italics), then none of the follow symbols in the derivation are significant. Simply replace all of the items by corresponding LR[0] items obtained removing the follow symbol from each LR[1] item.
2) If an LR[1] item generates a reduction step, (i.e., the right hand side of the production contains no symbol in italics), and different follow symbols occur in the derivation, then some of these LR[1] items can be replaced by LR[0] items. Examine the LR[1] items in the derivation, starting with the final one. As you move backwords, you will encounter an LR[1] item whose follow symbol differs from the follow symbol in the final item. Replace this LR[1] item with the corresponding LR[1] item, and do so for all earlier items.
3) Some of the initial items in the two derivations will be the same. Shorten each derivation by removing all but the last of these initial equal items.
The result of these transformations will be a considerably more readable pair of derivations.
2.4 A worked example of a non LR[1] grammar
The grammar defined in the file
"IfThenElseExample.OneCasaba"
contains the common shift/reduce If/Then/Else ambiguity. Executing
GenOneCasabaParser IfThenElseExample
results in the following error message:
An unresolvable conflict
between [ statement ← If expression Then statement | <Else> ]
and [ statement ← If expression Then statement Else statement | <Else> ]
[ statement ← If expression Then statement | <Else> ]
can be reached by
[ MainGoal ← statement ]
[ statement ← If expression Then statement ]
[ statement ← If expression Then statement Else statement ]
[ statement ← If expression Then statement Else statement ]
[ statement ← If expression Then statement Else statement | <Else> ]
[ statement ← If expression Then statement | <Else> ]
and
[ statement ← If expression Then statement Else statement | <Else> ]
can be reached by along the same path by
[ MainGoal ← statement ]
[ statement ← If expression Then statement ]
[ statement ← If expression Then statement Else statement ]
[ statement ← If expression Then statement Else statement ]
[ statement ← If expression Then statement Else statement | <Else> ]
[ statement ← If expression Then statement Else statement | <Else> ]
In this case, we see that this is a shift/reduce conflict, because one of the productions in the conflict pair contains an italicised "Else". Further, we are presented with the following common path to the conflict state:
If expression Then If expression Then statement Else If expression Then If expression Then statement Else If expression Then statement
(OneCasaba does not always find the shortest path, just some path.) And finally, the two conflicting LR[1] items will occur in the canonical LR[1] state-set at the end of this path.
We can apply the three simplification steps:
1) The second LR[1] item generates a shift step (because "Else" is in italics). Therefore, all LR[1] items occurring its derivation can be replaced by LR[0] items. These results in:
[ MainGoal ← statement ]
[ statement ← If expression Then statement ]
[ statement ← If expression Then statement Else statement ]
[ statement ← If expression Then statement Else statement ]
[ statement ← If expression Then statement Else statement ]
[ statement ← If expression Then statement Else statement ]
2) The first LR[1] item generates a reduction step (no right hand side symbol is italicised). However, in this case, there are no LR[1] items that can be replaced by LR[1] items.
3) We now see that the first four items in each derivation are identical. Thus, we can omit the first three, leaving:
[ statement ← If expression Then statement | <Else> ]
can be reached by
[ statement ← If expression Then statement Else statement ]
[ statement ← If expression Then statement Else statement | <Else> ]
[ statement ← If expression Then statement | <Else> ]
and
[ statement ← If expression Then statement Else statement ]
can be reached by along the same path by
[ statement ← If expression Then statement Else statement ]
[ statement ← If expression Then statement Else statement ]
[ statement ← If expression Then statement Else statement ]
After this simplification, it is easier to see what situations will lead to a conflict during parsing. These are both derivations for "statement", from the first we can generate the following, (with brackets added to show the structure, and an arrow to mark the parsing position):
If expression Then [If expression Then statement Else [If expression Then statement] ^ ] Else statement
and from the second we can generate:
If expression Then [If expression Then statement Else [If expression Then statement ^ Else statement]] Else statement
These sequences are identical up to the pointer and the terminal symbol "Else" can follow the pointer. We see that in the first case we would like to reduce "If expression Then statement" to a statement, and in the second we would like to advance the pointer. These are both expansions of "statement", and hence could occur at any point in the grammar that "statement" can occur.