1.LR Parsing intro
Michael Santacroce
EECE 6083 Compiler Theory
University of Cincinnati
2.My goals for today
Go through and understand:
Basics of LR parsing, with an example
How the LR parsing algorithm can be implemented using LR(0), with an example
What different kinds of LR implementations exist
How LR parsing is used today
How I used LR parsing
3.Helpful links
Shameless borrowing slides from MIT: http://cons.mit.edu/fa17/ocw/L03.pdf
Is it legal for me to do this? https://academia.stackexchange.com/questions/56551/plagiarism-of-lecture-slides
Thanks @ Martin Rinard from MIT for your help,
https://en.wikipedia.org/wiki/LR_parser#Parsing_steps (really like 40 wiki pages)
Thanks WIKIPEDIA for letting me use your stuff -> example comes from here
https://en.wikipedia.org/wiki/GLR_parser
https://stackoverflow.com/questions/2676144/what-is-the-difference-between-lr-slr-and-lalr-parsers?noredirect=1&lq=1
https://stackoverflow.com/questions/7378337/what-is-the-difference-between-lr0-and-slr-parsing
https://stackoverflow.com/questions/19663564/what-is-the-difference-between-lalr-and-lr-parsing
http://www.supereasyfree.com/software/simulators/compilers/principles-techniques-and-tools/parsing-simulator/download.php
4.What did we learn from smart people (MIT)
Basic shift/reduce algorithm
Shift and then reduce (who could have guessed)
Reduce -> match largest possible production on RHS of stack
Possible conflicts
Reduce/reduce -> two possible reductions at once (not a huge deal)
Shift/reduce -> possible to both shift and reduce (huge deal)
Still a lot left unanswered
How to translate grammar into a parser
How to resolve conflicts
LR vs LALR vs GLR vs SLR
5.How to translate grammar into a parser
How does a parser “know” whether to shift or reduce?
Create a Parse Table
Has the intersection of every possible valid stack state to every possible next token
Isn’t that a whole bunch of stuff?
Yes
We will get to that
Hold your horses
6.How to translate grammar into a parser
How does a parser “know” whether to shift/reduce/error?
Maintain state stack
Look at next token
If corresponding action exists, do it
Use GOTO if specified for reduction
Else, ERROR
Table can be procedurally generated
7.How to translate grammar into a parser
Example: 1 + 1
STACK (tokens would not actually be there):
[0]
[0 '1' 2]
[0 B 4]
[0 E 3]
[0 E 3 '+' 6]
[0 E 3 '+' 6 '1' 2]
[0 E 3 '+' 6 B 8]
[0 E 3]
[0]
Questions?
8.Notes and recap
Remember when I said we’ll get to it (the large parse tables)
LR parses have an easier, more explicit, faster algorithm than LL parsers
My algorithm on its own is only 50 lines
LR parsers require much more infrastructure than LL parsers
My parse table is something like 200 lines
Classic case of memory vs complexity
For y=x^2: would you rather have a table of all possible y for any x?
Or a function that calculates y given any x?
LR parsers are guaranteed linear time compared to LL parsers
Unsure about complexity for LL, Wilsey mentioned advancements, idk, it depends
9.sidenote
Why did I choose LR parsing?
Previously spent an inordinate amount of time understanding yacc
Did not like the idea of backtracking
Made the grammar more human readable (to me)
Less coding, I am lazy, mostly just making the parse table
10.The next step: how to resolve conflicts
Shift/reduce and reduce/reduce errors are going to happen
Having a perfectly unambiguous grammar is a lot of work and results in grammars that are huge
The solution is having a lookahead token (solve shift/reduce problem we mentioned)
Lookahead in LR parsers allows ambiguity in a language
Ok, so how do we make a LR parser with lookahead?
Idea -> when reducing, look at a lookahead table
If an entry in lookahead table matches next token(s), then don’t reduce yet
11.LR vs LALR vs GLR vs SLR VS LR(1) Vs LALR(7)…
Gee whiz, there are a lot of types of LR parsers
All of them work the same way (ish), except for GLR
The difference is simply in how the table is generated and by extension what grammars are accepted
Lookahead Tokens
Allow parser to make decisions ahead of time and therefore process more ambiguous grammars
The previous example was considered LR(0)
LR(k) is a LR parser with k lookahead tokens
All of these parsers are variants of LR(1) (no such thing as SLR(0)/LALR(0))
12.Parser types overview
What grammars are accepted?
LR(1) ⊃ LALR(1) ⊃ SLR(1) ⊃ LR(0)
Canonical LR parser = LR(1) parser
Example: SLR(1) vs LR(0): https://en.wikipedia.org/wiki/Simple_LR_parser
Example input: 1 1 1
LR(0) says: S S S ?
SLR(1) says: S
13.Ok, so how does SLR(1) work
Shift/reduce as usual, except
When about to reduce, check lookahead token
If the current pattern + lookahead token matches some pattern, stop reducing
In our grammar: all factors are terms, all terms are relations, all relations are arithOps… but we do not want that to happen
Regular table Lookahead Table
14.SLR Parser notes
What I used (ish) for the compiler project
Didn’t make a true table, because dictionaries
Still able to create by hand
Otherwise I would have used some black magic
Past this point it’s exceedingly difficult to make these parsers by hand
Use a parser generator
Makes LR parsing a “black box”, which can be hard to work with and debug
15.LALR, LR(1)
LALR is an improvement on LR(1)
This is what most parser generators are going to use
It accepts less grammars than LR(1)
However, has a substantially smaller parse table
Works by merging states
Merges rules that have identical kernel sets
Results in reduce/reduce conflicts
SLR(1) merges further and therefore has even more conflicts
16.LALR, LR(1)
17.GLR parser
Created to handle “nondeterministic and ambiguous grammars”
Algorithm
Once again, similar to LR
However, GLR parse allows for multiple transitions (aka shift/reduce, reduce/reduce)
When a conflict is encountered, fork into 2+ parallel stacks
Continue to read input tokens to determine each stack transition
(possibly keep forking)
If at any point, no transition is valid, then discard entire stack
Complexity
O(n^3) in worst case, but it depends on grammar
For a deterministic grammar, runs in O(n) time
Grammars are usually deterministic or close to it, so W(n) is pretty unlikely
18.LR parsers today
C++ is really hard to do with an LR parser
“C++ grammar is ambiguous, context-dependent and potentially requires infinite lookahead to resolve some ambiguities”
Want to read a 400 page PhD thesis? http://www.computing.surrey.ac.uk/research/dsrg/fog/FogThesis.pdf
Read somewhere that a GLR parser can solve this issue
Besides that, pretty much any language can be parsed with LR
“Frank DeRemer invented LALR parsers with his PhD dissertation, called "Practical LR(k) Translators", in 1969, at MIT.” - https://en.wikipedia.org/wiki/LALR_parser_generator
Yacc (yet another compiler compiler), Stephen Johnson in 1975 - LALR
Bison, Robert Corbett, 1985 – LALR, GLR
https://github.com/lalrpop/lalrpop for rust
http://www.antlr.org (GLR parser)
…