I wanted to pause here to do a variant on the old Regular Expression Inequivalence problem that I think is both interesting and also will be used coming up.

**The problem: **Regular Expression Inequivalence on Single Character Alphabets. This problem isn’t in the appendix, though it is mentioned as a comment to the normal Regular Expression Inequivalence problem (AL9).

**The description: **Given two regular expressions E_{1} and E_{2} that use the operators {∪, ⋅, *} and the alphabet ∑, where |∑| = 1. Do E_{1} and E_{2} represent different languages?

**Example: **Suppose E_{1} = (aa)* ∪ (aaa)* and E_{2} = aa(a)*. These languages both generate all strings of 2 or more a’s, and so are equivalent (and so would be a “no” instance of our problem). If we changed E_{2} to a(a)*, then E_{2} could generate the string “a”, and E_{1} can’t, meaning the languages are inequivalent (and so would be a “yes” instance of our problem).

**Reduction:** Stockmeyer and Meyer talk about this problem, reducing from 3SAT.

So we’re given a formula A in CNF form, with m clauses and n variables. The set of literals in clause k is C_{k}. Now list out the first n primes p_{1} through p_{n}.

We are going to consider integers z who have the value 0 or 1 mod p_{i} for each p_{i}. So, for example, if we had 4 variables, the first 4 primes are 2,3,5, and 7. 15 is 1 mod 2, 0 mod 3, 0 mod 5, and 1 mod 7. We’d represent 15 as the string “1001” and interpret it as setting variables 1 and 4 to true, and variables 2 and 3 to false. The integer z “satisfies” A if the truth values assigned above satisfy A.

Now let’s build an expression E_{0} whose language is 0^{z} for all z’s that do not generate legal encodings (they have a value that is not 0 or 1 mod some p_{i}). They say that expression is:

So basically for each p_{k} and each possible remainder j from 2 on up, create a string whose size is j mod p_{k}.

Now, for each clause C_{k} we make an expression E_{k} that accepts strings 0^{z}, where if z is a legal encoding (it’s 0 or 1 mod all p_{k}), then it does not set any of the literals in clause k to true (meaning it is a 0 in the position of a variable that exists positively, and a 1 in the position of a variable that exists negatively). The expression for this is:

.

..where r, s, and t are the variables that occur in C_{k}.

Our actual first expression E_{1} unions E_{0} and all of the E_{k}‘s together. E_{2} will be 0*.

They then say that “it is not hard to show” that A is satisfiable if and only if there is an integer z that encodes an assignment to A and 0^{z} is not in the language of E_{2}. Which I think makes sense because if A was satisfiable, then the z of that assignment would not be in E_{0} (because it is 0 or 1 mod all p_{k}) and it would not be in any E_{k} (because by satisfying clause k, the z would have to make one of the variables true).

**Difficulty**: 8. I know I rated the original “Regular Expression Inequivalence” problem a 9, but that was one of the first problems I did, and since then I’ve seen some things. I’d probably rate it a 7 now. This version is harder, because of all the prime number issues involved. I do think it’s possible to see how and why each expression is made, and what it’s used for, and it all comes together in a neat way.