Monthly Archives: April 2017

Hitting String

This is a good easy problem hidden in the middle of all of the hard ones.

The problem: Hitting String.  This is problem SR12 in the appendix.

The description: Given a set of strings A over the alphabet {0,1,*} all of the same length n, can we find a string x over the alphabet {0,1}, also of length n, where x agrees in at least one position with each string in A?

Example: Let A = {00000,11111,0*001, ***10, 101**, 1****}

Then x can be 10100.  It agrees with 00000 in the last position, 11111 in the first position, 0*001 in the fourth position, ***10 in the last position, and 1***** in the first position.

A pretty easy example of an instance that can’t be solved is A = {1**, 0**}, or even A = {***}

Reduction: We’ll go from 3SAT.  Each clause in the 3SAT instance will become a string in A.  Each string in A will have one position for each variable in the formula.  Each string will have a 1 in each variable’s position if that variable occurs positively in that clause, a 0 if it occurs negatively, and a * for all variables that don’t appear in the clause.  So each string will have just 3 characters that are not *.

Thus, we need to come up with a string x that has a 1 or 0 in each position (corresponding to a setting of a variable) that matches one of the three 1 or 0 characters in each string (satisfying that clause).

Difficulty: 4, maybe 3.  This is about as straightforward a “Turn SAT into a different kind of problem” reduction as you’re likely to see, but I do think crossing genre problems are harder for students than we may anticipate them to be.

Bounded Post Correspondence Problem

I’m sure most people remember the “Post Correspondence Problem” from Theory of Computation classes.  It’s mainly famous for two reasons:

  • It’s the classic example of an undecidable problem that isn’t about a language (like the Halting Problem is)
  • It leads to lots of fun (meaning “bad”) PCP jokes.  (“I need some PCP to be able to handle thinking about the PCP”)

Anyway, here is a restricted version of the problem that at least is decidable, if intractable.

The problem: Bounded Post Correspondence Problem.  This is problem SR11 in the appendix.

The description: Given two sequences of strings A =(a_1, ...a_n) and B = (b_1, ..., b_n) from a finite alphabet, and a positive integer K, can I find a sequence of integers i1..ik, where k ≤ K, and the string A’= a_{i_{1}} a_{i_{2}} ..a_{i_{k}} is identical to the string B’ =b_{i_1}b_{i_2}..b_{i_k}?

Example: The difference between this and “classic” PCP is the introduction of K, which makes the problem decidable because if nothing else you can just try all combinations of strings from A and B that are length K or less, which is finite.

So, any PCP instance that “works” is a good example to use here.  Here’s one that I got from my old Hopcroft and Ullman theory of computation book:

A =  {1, 10111,10}  B = {111,10,0}

We can construct the string 101111110.  From A, it’s built from 10111/1/1/10.  From B, it’s built from 10/111/111/0

Notice that we need to use the same elements from each string in the sequence (The second one, then the first one twice, then the third one).

Reduction: The reference in G&J is to a technical report by Constable, Hunt, and Sahni and says they use a “generic transformation”.   In the paper, that means they show it’s NP-Complete by showing it’s a “negation of freedom for a 2 variable loop free scheme”  Which is, as near as I can tell, a way of defining programs to be equivalent.

I’ll admit that I don’t really understand it.  I really want there to be a simpler reduction out there.  I found this post on stackexchange,  but it reduces from Shortest Common Supersequence, and uses a |R| of 2, and G&J say that Shortest Common Supersequence is polynomial with |R|=2, so something has to be wrong there.

I was tinkering with a reduction from Directed Hamiltonian Cycle, where the strings in A corresponded to edges, and the strings in B correspond to vertices.  So, for a graph like this:

You would get strings like this:

A B
x xa
ab bb
bc cc
ca a
cd dd

The idea is that each string in A corresponds to a directed edge, and each string in B corresponds to 2 copies of the destination of that edge.  The “x” is there to give us a starting point- we arbitrarily choose some starting vertex to be the start.  All edges that have this starting vertex as its destination only have 1 copy (instead of 2) of that vertex in B.

The problem is that a string like xabbcca  (corresponding to a non-Hamiltonian cycle) is a solution to the PCP problem, but is not a solution to the HC problem.  I don’t know how to force the string to be long enough to visit all vertices- I thought about working on a variant of the problem where you force the size of the string you create to be exactly =K instead of ≤ K.  But  since a cycle that doesn’t involve the start vertex can be “pumped” several times to make a string longer, that won’t work either.  There is probably something clever you can do with the strings to force you to visit all vertices, but I can’t see it.

Difficulty: 10, sadly.  I’m sure there is something easier out there.

Protected: Shortest Common Superstring

This content is password protected. To view it please enter your password below:

Protected: Shortest Common Supersequence

This content is password protected. To view it please enter your password below: