LR Parsing intro, Michael Santacroce EECE 6083 Compiler Theory University of Cincinnati

Download this Presentation


Presentation Transcript

  • 1.LR Parsing intro Michael Santacroce EECE 6083 Compiler Theory University of Cincinnati
  • 2.My goals for today Go through and understand: Basics of LR parsing, with an example How the LR parsing algorithm can be implemented using LR(0), with an example What different kinds of LR implementations exist How LR parsing is used today How I used LR parsing
  • 3.Helpful links Shameless borrowing slides from MIT: Is it legal for me to do this? Thanks @ Martin Rinard from MIT for your help, (really like 40 wiki pages) Thanks WIKIPEDIA for letting me use your stuff -> example comes from here
  • 4.What did we learn from smart people (MIT) Basic shift/reduce algorithm Shift and then reduce (who could have guessed) Reduce -> match largest possible production on RHS of stack Possible conflicts Reduce/reduce -> two possible reductions at once (not a huge deal) Shift/reduce -> possible to both shift and reduce (huge deal) Still a lot left unanswered How to translate grammar into a parser How to resolve conflicts LR vs LALR vs GLR vs SLR
  • 5.How to translate grammar into a parser How does a parser “know” whether to shift or reduce? Create a Parse Table Has the intersection of every possible valid stack state to every possible next token Isn’t that a whole bunch of stuff? Yes We will get to that Hold your horses
  • 6.How to translate grammar into a parser How does a parser “know” whether to shift/reduce/error? Maintain state stack Look at next token If corresponding action exists, do it Use GOTO if specified for reduction Else, ERROR Table can be procedurally generated 
  • 7.How to translate grammar into a parser Example: 1 + 1 STACK (tokens would not actually be there): [0] [0 '1' 2] [0 B 4] [0 E 3] [0 E 3 '+' 6] [0 E 3 '+' 6 '1' 2] [0 E 3 '+' 6 B 8] [0 E 3] [0] Questions?
  • 8.Notes and recap Remember when I said we’ll get to it (the large parse tables) LR parses have an easier, more explicit, faster algorithm than LL parsers My algorithm on its own is only 50 lines LR parsers require much more infrastructure than LL parsers My parse table is something like 200 lines Classic case of memory vs complexity For y=x^2: would you rather have a table of all possible y for any x? Or a function that calculates y given any x? LR parsers are guaranteed linear time compared to LL parsers Unsure about complexity for LL, Wilsey mentioned advancements, idk, it depends
  • 9.sidenote Why did I choose LR parsing? Previously spent an inordinate amount of time understanding yacc Did not like the idea of backtracking Made the grammar more human readable (to me) Less coding, I am lazy, mostly just making the parse table
  • 10.The next step: how to resolve conflicts Shift/reduce and reduce/reduce errors are going to happen Having a perfectly unambiguous grammar is a lot of work and results in grammars that are huge The solution is having a lookahead token (solve shift/reduce problem we mentioned) Lookahead in LR parsers allows ambiguity in a language Ok, so how do we make a LR parser with lookahead? Idea -> when reducing, look at a lookahead table If an entry in lookahead table matches next token(s), then don’t reduce yet
  • 11.LR vs LALR vs GLR vs SLR VS LR(1) Vs LALR(7)… Gee whiz, there are a lot of types of LR parsers All of them work the same way (ish), except for GLR The difference is simply in how the table is generated and by extension what grammars are accepted Lookahead Tokens Allow parser to make decisions ahead of time and therefore process more ambiguous grammars The previous example was considered LR(0) LR(k) is a LR parser with k lookahead tokens All of these parsers are variants of LR(1) (no such thing as SLR(0)/LALR(0))
  • 12.Parser types overview What grammars are accepted? LR(1) ⊃ LALR(1) ⊃ SLR(1) ⊃ LR(0) Canonical LR parser = LR(1) parser Example: SLR(1) vs LR(0): Example input: 1 1 1 LR(0) says: S S S ? SLR(1) says: S
  • 13.Ok, so how does SLR(1) work Shift/reduce as usual, except When about to reduce, check lookahead token If the current pattern + lookahead token matches some pattern, stop reducing In our grammar: all factors are terms, all terms are relations, all relations are arithOps… but we do not want that to happen Regular table Lookahead Table
  • 14.SLR Parser notes What I used (ish) for the compiler project Didn’t make a true table, because dictionaries Still able to create by hand Otherwise I would have used some black magic Past this point it’s exceedingly difficult to make these parsers by hand Use a parser generator Makes LR parsing a “black box”, which can be hard to work with and debug
  • 15.LALR, LR(1) LALR is an improvement on LR(1) This is what most parser generators are going to use It accepts less grammars than LR(1) However, has a substantially smaller parse table Works by merging states Merges rules that have identical kernel sets Results in reduce/reduce conflicts SLR(1) merges further and therefore has even more conflicts
  • 16.LALR, LR(1)
  • 17.GLR parser Created to handle “nondeterministic and ambiguous grammars” Algorithm Once again, similar to LR However, GLR parse allows for multiple transitions (aka shift/reduce, reduce/reduce) When a conflict is encountered, fork into 2+ parallel stacks Continue to read input tokens to determine each stack transition (possibly keep forking) If at any point, no transition is valid, then discard entire stack Complexity O(n^3) in worst case, but it depends on grammar For a deterministic grammar, runs in O(n) time Grammars are usually deterministic or close to it, so W(n) is pretty unlikely
  • 18.LR parsers today C++ is really hard to do with an LR parser “C++ grammar is ambiguous, context-dependent and potentially requires infinite lookahead to resolve some ambiguities” Want to read a 400 page PhD thesis? Read somewhere that a GLR parser can solve this issue Besides that, pretty much any language can be parsed with LR “Frank DeRemer invented LALR parsers with his PhD dissertation, called "Practical LR(k) Translators", in 1969, at MIT.” - Yacc (yet another compiler compiler), Stephen Johnson in 1975 - LALR Bison, Robert Corbett, 1985 – LALR, GLR for rust (GLR parser) …