In my notes for this problem (which I do months in advance), I wrote “Good luck explaining this” to myself, because it’s pretty crazy. We’ll see how I do.

**The problem: **Given a Context-Free Grammar G, and a positive integer K (written in unary), is G not an LR(K) grammar?

Informally, an LR(K) grammar can have any string in the grammar parsed without backtracking by looking ahead at most k symbols.

The paper by Hunt, Szymanski, and Ullman that has the reduction has a more formal explanation if you want it.

**Example:**

I found this example in a set of PowerPoint slides from a class at RPI about LR parsing

S->ADC | aaaddd

A->aaa

D->ddd

C->Cc | c

The language of this grammar is either aaaddd or aaadddc^{i} where i >0.

Notice that if k=1, if we are given the string aaaddd, we will not be able to with just 1 character of lookahead be able to tell if the string has c’s at the end. The slides say that we can do this with k=4 (The parser will need to look past the 3 d’s to see if a c follows). Though honestly, I’m not sure why we don’t need k to be 7. Don’t we have to look past all of the a’s *and* all of the d’s to find the c?

**Reduction:** Let’s start with Theorem 3.2 from the Hunt, Szymanski, and Ullman paper. It says: Suppose we have a nondeterministic Turing Machine M with the properties:

- M has 1 accepting state and one (semi-infinite) tape
- M never goes beyond the beginning of the tape (it never falls off the left end)
- Some (not necessarily polynomial) function T(n) bounds the number of steps of M on an input of size n
- M only enters the accepting state when the head is over the rightmost “utilized” portion of the tape.
- Once in an accepting state, M never leaves.

Then for some string x, and for some k >= 8T(|x|)^{2} we can in time O(|x|) find a grammar G that:

- Accepts x
- Is not LR(k)
- Is not a bunch of similar grammar properties.

The grammar they build has a *lot* of productions. The two most important non-terminals are B and B’, where B produces strings of the form: {y#z^{rev}}* $, where y and z are configurations of M, and z is derived from y.

B’ produces a language with strings of the form:

- The initial description of M on x, followed by
- {y
^{rev}#y#}*, where y is a non-accepting configuration of M, followed by - {z
^{rev}#$}, where z is an accepting configuration of M

The intersection of these 2 languages is the set of productions that produce an accepting configuration, in which alternating productions are reversed. If this language is non-empty, then M accepts x. This grammar is not LR(k) for any k. They also show the proof in the other direction, but it depends on low-level definitions of LR properties.

The NP-Completeness part of the proof is Theorem 3.3. We start with any nondeterministic Turing Machine M. For any string x, we build the string x’ to be g(M,x)#10^{k}, where g(M,x) is the encoding of the grammar from Lemma 3.2. Lemma 3.2 tells us that x is accepted by M if and only if x’ is LR(k). Thus, we have shown a generic transformation from a generic nondeterministic Turing Machine.

**Difficulty: **9. I think the details of Lemma 3.2 are hard to follow, and the lemma sort of assumes you’re familiar with the various flavors of CFG’s, which I’m not. So I might be missing something important in the details.