Tag Archives: Non-LR(k) Context-Free Grammar

Non-LR(K) Context-Free Grammar

In my notes for this problem (which I do months in advance), I wrote “Good luck explaining this” to myself, because it’s pretty crazy.  We’ll see how I do.

The problem: Given a Context-Free Grammar G, and a positive integer K (written in unary), is G not an LR(K) grammar?

Informally, an LR(K) grammar can have any string in the grammar parsed without backtracking by looking ahead at most k symbols.

The paper by Hunt, Szymanski, and Ullman that has the reduction has a more formal explanation if you want it.


I found this example in a set of PowerPoint slides from a class at RPI  about LR parsing

S->ADC | aaaddd
C->Cc | c

The language of this grammar is either aaaddd or aaadddci where i >0.

Notice that if k=1, if we are given the string aaaddd, we will not be able to with just 1 character of lookahead be able to tell if the string has c’s at the end.  The slides say that we can do this with k=4 (The parser will need to look past the 3 d’s to see if a c follows).  Though honestly, I’m not sure why we don’t need k to be 7.  Don’t we have to look past all of the a’s and all of the d’s to find the c?

Reduction: Let’s start with Theorem 3.2 from the Hunt, Szymanski, and Ullman paper.  It says: Suppose we have a nondeterministic Turing Machine M with the properties:

  • M has 1 accepting state and one (semi-infinite) tape
  • M never goes beyond the beginning of the tape (it never falls off the left end)
  • Some (not necessarily polynomial) function T(n) bounds the number of steps of M on an input of size n
  • M only enters the accepting state when the head is over the rightmost “utilized” portion of the tape.
  • Once in an accepting state, M never leaves.

Then for some string x, and for some k >= 8T(|x|)2 we can in time O(|x|) find a grammar G that:

  1. Accepts x
  2. Is not LR(k)
  3. Is not a bunch of similar grammar properties.

The grammar they build has a lot of productions.  The two most important non-terminals are B and B’, where B produces strings of the form: {y#zrev}* $, where y and z are configurations of M, and z is derived from y.

B’ produces a language with strings of the form:

  • The initial description of M on x, followed by
  • {yrev#y#}*, where y is a non-accepting configuration of M, followed by
  • {zrev#$}, where z is an accepting configuration of M

The intersection of these 2 languages is the set of productions that produce an accepting configuration, in which alternating productions are reversed.  If this language is non-empty, then M accepts x.  This grammar is not LR(k) for any k.  They also show the proof in the other direction, but it depends on low-level definitions of LR properties.

The NP-Completeness part of the proof is Theorem 3.3.  We start with any nondeterministic Turing Machine M.  For any string x, we build the string x’ to be g(M,x)#10k, where g(M,x) is the encoding of the grammar from Lemma 3.2.  Lemma 3.2 tells us that x is accepted by M if and only if x’ is LR(k).  Thus, we have shown a generic transformation from a generic nondeterministic Turing Machine.

Difficulty: 9.  I think the details of Lemma 3.2 are hard to follow, and the lemma sort of assumes you’re familiar with the various flavors of CFG’s, which I’m not.  So I might be missing something important in the details.