Monthly Archives: March 2021

Non-LR(K) Context-Free Grammar

In my notes for this problem (which I do months in advance), I wrote “Good luck explaining this” to myself, because it’s pretty crazy.  We’ll see how I do.

The problem: Given a Context-Free Grammar G, and a positive integer K (written in unary), is G not an LR(K) grammar?

Informally, an LR(k) grammar is one whose strings can all be parsed left to right, without backtracking, using at most k symbols of lookahead.

The paper by Hunt, Szymanski, and Ullman that has the reduction has a more formal explanation if you want it.

Example:

I found this example in a set of PowerPoint slides from a class at RPI about LR parsing.

S->ADC | aaaddd
A->aaa
D->ddd
C->Cc | c

The language of this grammar is aaaddd together with the strings aaadddc^i, where i > 0.

Notice that if k = 1 and we are given the string aaaddd, we will not be able to tell, with just 1 character of lookahead, whether the string has c’s at the end.  The slides say that we can do this with k = 4 (the parser will need to look past the 3 d’s to see if a c follows).  Though honestly, I’m not sure why we don’t need k to be 7.  Don’t we have to look past all of the a’s and all of the d’s to find the c?
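My best guess at the resolution: lookahead is counted from the parser’s current position, not from the start of the string.  By the time the parser has to decide whether to reduce the three a’s to A, it has already shifted them, so only the 3 d’s plus one more symbol need to be examined.  Here is a toy sketch of that decision (just an illustration of the 4-symbol lookahead window, not a real LR parser; the function name is mine):

```python
def uses_c_production(s):
    """Decide which S-production applies once the a's have been shifted.

    After shifting "aaa", peek at the next 4 symbols: "dddc" means c's
    follow (use S -> ADC); "ddd" followed by end-of-input means use
    S -> aaaddd.
    """
    assert s.startswith("aaa")
    lookahead = s[3:7]   # the 3 d's plus at most one more symbol
    return lookahead == "dddc"
```

The a’s never appear in the lookahead window, which is why k = 4 suffices rather than 7.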

Reduction: Let’s start with Theorem 3.2 from the Hunt, Szymanski, and Ullman paper.  It says: Suppose we have a nondeterministic Turing Machine M with the properties:

  • M has 1 accepting state and one (semi-infinite) tape
  • M never goes beyond the beginning of the tape (it never falls off the left end)
  • Some (not necessarily polynomial) function T(n) bounds the number of steps of M on an input of size n
  • M only enters the accepting state when the head is over the rightmost “utilized” portion of the tape.
  • Once in an accepting state, M never leaves.

Then for some string x, and for some k >= 8T(|x|)^2, we can in time O(|x|) find a grammar G that:

  1. Generates x
  2. Is not LR(k)
  3. Fails to have a bunch of similar grammar properties.

The grammar they build has a lot of productions.  The two most important non-terminals are B and B’, where B produces strings of the form {y#z^rev}*$, where y and z are configurations of M, and z is derived from y.

B’ produces a language with strings of the form:

  • The initial description of M on x, followed by
  • {y^rev#y#}*, where y is a non-accepting configuration of M, followed by
  • {z^rev#$}, where z is an accepting configuration of M

The intersection of these 2 languages is the set of strings that describe an accepting computation of M on x, with alternating configurations reversed.  If this language is non-empty, then M accepts x, and the grammar is not LR(k) for any k.  They also show the proof in the other direction, but it depends on low-level definitions of LR properties.
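The “computation history with alternating reversed configurations” format can be illustrated with a toy example.  In this sketch, the one-step `step` function is a made-up stand-in for M, and `valid_history` just illustrates the kind of string the intersection of B and B’ is describing:

```python
def step(y):
    """One move of a toy 'machine': rewrite the leftmost 'a' to 'b'.
    (A stand-in for one step of M on a configuration string.)"""
    return y.replace("a", "b", 1)

def valid_history(s):
    """Check a candidate history y0#y1^rev#y2#y3^rev#..., where every
    other configuration is written reversed, as in the B/B' languages."""
    parts = s.split("#")
    # Undo the reversal of every other configuration.
    confs = [p[::-1] if i % 2 else p for i, p in enumerate(parts)]
    return all(step(confs[i]) == confs[i + 1]
               for i in range(len(confs) - 1))
```

Reversing every other configuration is what makes this checkable by context-free machinery: each adjacent pair y#z^rev reads like a palindrome with changes, which a CFG can generate, even though the whole history language is not context-free.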

The NP-Completeness part of the proof is Theorem 3.3.  We start with any nondeterministic Turing Machine M.  For any string x, we build the string x’ = g(M,x)#10^k, where g(M,x) is the encoding of the grammar from Theorem 3.2.  Theorem 3.2 tells us that x is accepted by M if and only if g(M,x) is not LR(k), which means x’ is a yes-instance of our problem exactly when M accepts x.  Thus, we have shown a generic transformation from a generic nondeterministic Turing Machine.

Difficulty: 9.  I think the details of Theorem 3.2 are hard to follow, and the theorem sort of assumes you’re familiar with the various flavors of CFG’s, which I’m not.  So I might be missing something important in the details.

Regular Grammar Inequivalence

This one seems almost too simple to be worth a post.  But here it is for completeness.

The problem: Regular Grammar Inequivalence.  This is problem AL14 in the appendix.

The description: Given two regular grammars G1 and G2, where each production has the form A->a or A->aB (where “a” is a terminal symbol and “A” and “B” are non-terminals), do G1 and G2 represent different languages?

Example:

Suppose G1 was:

  • S->aA
  • A->aA | bA | a | b

(That’s a grammar for “all strings over {a,b} starting with a”)

Suppose G2 was:

  • S->aS | bS | a

(That’s a grammar for “all strings over {a,b} ending with a”)

Since G1 can produce “ab” and G2 can’t, the grammars are inequivalent.
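For small examples like this, one brute-force way to spot the inequivalence (fine for toy cases, though of course not relevant to the hardness result) is to enumerate each grammar’s strings up to some length and look for a distinguishing string.  A sketch, with the grammars written as plain dictionaries (uppercase letters are nonterminals):

```python
def language(grammar, start, max_len):
    """All strings of length <= max_len generated by a right-linear
    grammar whose productions are 'a' (a terminal) or 'aB'
    (a terminal followed by a nonterminal)."""
    strings, frontier = set(), {start}
    while frontier:
        new = set()
        for form in frontier:
            if form[-1].isupper():          # sentential form wA
                w, nt = form[:-1], form[-1]
                for prod in grammar[nt]:
                    f = w + prod
                    # length of the terminal prefix derived so far
                    tlen = len(f) - (1 if f[-1].isupper() else 0)
                    if tlen <= max_len:
                        new.add(f)
            else:
                strings.add(form)           # fully terminal string
        frontier = new
    return strings

G1 = {"S": ["aA"], "A": ["aA", "bA", "a", "b"]}
G2 = {"S": ["aS", "bS", "a"]}
```

Here "ab" shows up in `language(G1, "S", 2)` but not in `language(G2, "S", 2)`, witnessing the inequivalence.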

Reduction:

This seems to me to be a very simple restatement of Regular Expression Inequivalence.  Since we can easily turn a regular expression into a regular grammar (see, for example, the first hit I got when I googled “Convert regular expression to regular grammar”, which actually does a pretty good job of showing you how it works), any pair of regular expressions can be turned into a pair of grammars that are inequivalent if and only if the original expressions are.  Am I missing something?
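The expression-to-grammar direction is easy to sketch once you have an NFA for the expression: each state becomes a nonterminal and each transition becomes a production.  A sketch under my own data layout (states and symbols as strings, a transition table mapping (state, symbol) to a set of target states):

```python
def nfa_to_right_linear(transitions, accepting):
    """Build a right-linear grammar from an NFA.

    Each transition A --a--> B yields A -> aB, plus A -> a whenever B
    is accepting, so the grammar generates exactly the NFA's language
    (assuming the language does not contain the empty string).
    """
    grammar = {}
    for (state, sym), targets in transitions.items():
        prods = grammar.setdefault(state, [])
        for t in targets:
            prods.append(sym + t)
            if t in accepting:
                prods.append(sym)
    return grammar

# NFA for "strings over {a,b} ending in a": loop on S, guess the final a.
nfa = {("S", "a"): {"S", "F"}, ("S", "b"): {"S"}}
grammar = nfa_to_right_linear(nfa, accepting={"F"})
```

The result contains the productions S->aS, S->bS, and S->a from the G2 example above (plus a harmless dead-end production S->aF, since F has no productions of its own).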

Difficulty: 3, though I guess it depends on whether your students know how to go between regular expressions and regular grammars.

Structural Inequivalence for Linear Grammars

The problem: Structural Inequivalence for Linear Grammars.  This is problem AL13 in the appendix.

The description: Given two linear context-free grammars G1 = (N1, Σ, Π1, S1) and G2 = (N2, Σ, Π2, S2), where all productions have at most one nonterminal on the right-hand side, are the two grammars “structurally inequivalent”?  That is, if we replace each production A->w with A->(w) (where the parentheses are new terminal symbols), do we get different languages?

Example: Here is an example from the paper by Hunt, Rosenkrantz, and Szymanski that discusses this problem.

G1 is:

  • S->A | B
  • A->0A | 0
  • B->0B | 0

G2 is:

  • S->C0 | 0
  • C->D0 | 0
  • D->E0 | 0
  • E->S0 | 0

These both generate the language of strings of one or more 0’s.  But if we look at the parenthesized grammars:

G1 becomes:

  • S->(A) | (B)
  • A->(0A) | (0)
  • B->(0B) | (0)

G2 becomes:

  • S->(C0) | (0)
  • C->(D0) | (0)
  • D->(E0) | (0)
  • E->(S0) | (0)

Notice that in G2, if we want to generate the string “00000”, the derivation is:

S->(C0)->((D0)0)->(((E0)0)0)->((((S0)0)0)0)->(((((0)0)0)0)0)

In G1 one possibility is:

S->(A)->((0A))->((0(0A)))->((0(0(0A))))->((0(0(0(0A)))))->((0(0(0(0(0))))))

The parentheses don’t (and won’t ever) line up because the derivations expand in different directions (G1 is right-linear and G2 is left-linear).
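The two derivations above follow such a regular pattern that we can build the bracketed strings directly.  A quick sketch (function names are mine) that constructs the parenthesized derivation of n zeros in each grammar, showing the brackets nest in opposite directions:

```python
def paren_g1(n):
    """Bracketed G1 derivation of n zeros: S -> (A), A -> (0A) | (0)."""
    s = "(0)"                # innermost step, A -> (0)
    for _ in range(n - 1):
        s = "(0" + s + ")"   # A -> (0A): brackets nest to the right
    return "(" + s + ")"     # S -> (A)

def paren_g2(n):
    """Bracketed G2 derivation of n zeros: the S -> (C0) -> ... chain."""
    s = "(0)"                # innermost step, X -> (0)
    for _ in range(n - 1):
        s = "(" + s + "0)"   # X -> (Y0): brackets nest to the left
    return s
```

For example, `paren_g1(5)` and `paren_g2(5)` give the two fully bracketed strings for “00000”, and the two functions never agree for any n, so the parenthesized languages differ.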

Reduction: This is Theorem 5.16 from the Hunt, Rosenkrantz and Szymanski paper.  They show:

  • Actually, for “Type 3” grammars (ones that generate regular languages) structural (in)equivalence is the same as language (in)equivalence.  (This is because all Type 3 grammars are right-linear)
  • We can in polynomial time turn a regular expression into an equivalent right-linear Context-Free Grammar.
  • So we can reduce from regular expression inequivalence, take our 2 expressions, turn them into CFG’s, and the grammars we get are structurally inequivalent if and only if the original expressions were.

Notice that since we showed last time that regular expression inequivalence also held on expressions over a single-character alphabet, this means that our problem also holds on grammars over a single-character alphabet.

Difficulty: 6.  The hard part is the proof that structural inequivalence of Type 3 grammars is the same as language inequivalence, and the algorithm to turn a regular expression into a Type 3 grammar.  If your students already know that information, or if you expect them to figure it out, that may change the difficulty of the problem.