String-To-String Correction

This one reminds me of an old puzzle problem.

The problem: String-To-String Correction.  This is problem SR20 in the appendix.

The description: Given two strings x and y over a finite alphabet Σ, and an integer K.  Can we start from x and derive y in K or fewer steps, where each step either deletes a symbol or interchanges two adjacent symbols?

Example: The puzzles I’m thinking of are the ones that say “Can you go from HELLO to OLE in 6 steps?”

  • ELLO
  • ELO
  • EOL
  • OEL
  • OLE

What makes this problem hard is repeated symbols: you need to “choose” which copies to delete.  A slightly harder example is to go from ABABABABBB to BAAB.  Now which A’s and B’s you choose to delete has an effect on the total number of moves.
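Since the instances are tiny, the puzzle version can be checked with a breadth-first search over strings.  This is just a sanity-check sketch (the function name is mine), assuming the interchange operation swaps two adjacent symbols, as in the HELLO example:

```python
from collections import deque

def min_correction_steps(x, y, limit):
    """BFS over strings: each step deletes one symbol or swaps two
    adjacent symbols.  Returns the fewest steps to turn x into y,
    or None if it takes more than `limit` steps."""
    if x == y:
        return 0
    seen = {x}
    frontier = deque([(x, 0)])
    while frontier:
        s, d = frontier.popleft()
        if d == limit:
            continue
        neighbors = [s[:i] + s[i+1:] for i in range(len(s))]          # deletes
        neighbors += [s[:i] + s[i+1] + s[i] + s[i+2:]                  # adjacent swaps
                      for i in range(len(s) - 1)]
        for t in neighbors:
            if t == y:
                return d + 1
            if t not in seen:
                seen.add(t)
                frontier.append((t, d + 1))
    return None
```

Running it on the example confirms that HELLO really does reach OLE in the five steps listed above (two deletions plus three adjacent swaps), and no fewer.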

The paper by Wagner that has the reduction makes the point that if you replace “Delete” with “Insert”, you get basically the same problem but go from y to x instead of from x to y.  The paper gives other variants that have polynomial solutions (allowing insertions and deletions and changes of a character, whether or not you allow interchange).

Reduction: Wagner uses Set Covering.  So we’re given a collection of sets all subsets of some set W, and an integer K.

Define t = |W| and r = t^2.  Pick 3 symbols Q, R, and S that are not in W.  The string x will be:


..concatenated together once for each ci.  (So the copy that has c1 is first, then another copy with c2, and so on.)

The string y will be n copies of the substring RQ^r, followed by w, followed by n copies of the substring S^(r+1).

The K’ for this problem is (K+1)*r – 1 + 2t(r-1) + n(n-1)(r+1)^2/2 + d.

d is r*n + |c1…cn| – |W|.

From here, Wagner explains that the correction works by deciding which copies of the R, Q^r, and S^(r+1) blocks in x match the corresponding ones in y.  By counting all of the operations that “have” to be done (deleting or moving characters), he shows that a correction of weight K’ has to cross over certain ci blocks to make a covering, and that there have to be K or fewer of those sets.

Difficulty: 8.  The construction is pretty crazy.  You can eventually see where the components come from, but I can’t imagine how he came up with these things (especially the x and y strings) in the first place.

Two-Dimensional Consecutive Sets

This problem is harder to explain than the last couple.  Let’s see how it goes.

The problem: Two-Dimensional Consecutive Sets.  This is problem SR19 in the appendix.

The description: Given a finite alphabet Σ, and a collection C = {Σ1, …, Σn} of subsets of Σ, can we partition Σ into some number of sets X1..Xk such that:

  • Each Xi has at most one element in each set in C,
  • For each Σj in C, there is an “index” l(j) (a number ≥ 0) where Σj is contained in Xl(j) ∪ Xl(j)+1 ∪ … ∪ Xl(j)+m-1, where m = |Σj|

Example: The “index” stuff is basically there to say “you can make each set in C by taking one element each from consecutive X sets”.  So, if C = {{a,b,c}, {b,e}, {b,c,e}, {b,c}} over the alphabet {a,b,c,d,e}, then we can make X = {{a,e}, {b}, {c,d}}

Then we can get each set in C by taking one element from consecutive sets in X.

An example of a C that can’t be realized would be: {{a,b}, {b,c}, {b,d}, {a,b,c,d}}

The element b needs to be in a set of X that is adjacent to the sets that contain a, c, and d.  Since the set with b can only be adjacent to at most 2 other sets in X, that means that some of the elements a, c, and d need to be in the same set.  But the set {a,b,c,d} prevents that from happening.
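Both examples can be verified mechanically.  Here is a sketch (function names mine) that checks whether an ordered partition satisfies the two conditions, plus a tiny brute-force search over all assignments of symbols to blocks:

```python
from itertools import product

def is_two_dim_partition(X, C):
    """X is an ordered list of disjoint blocks partitioning the alphabet.
    Condition 1: each block shares at most one element with each set in C.
    Condition 2: each S in C is covered by |S| consecutive blocks,
    one element per block."""
    if any(len(b & S) > 1 for b in X for S in C):
        return False
    for S in C:
        m = len(S)
        if not any(all(len(X[start + t] & S) == 1 for t in range(m))
                   for start in range(len(X) - m + 1)):
            return False
    return True

def exists_partition(sigma, C):
    """Brute force over assignments of symbols to block indices
    (exponential -- only for tiny alphabets)."""
    syms = sorted(sigma)
    n = len(syms)
    for assign in product(range(n), repeat=n):
        blocks = [set() for _ in range(max(assign) + 1)]
        for sym, idx in zip(syms, assign):
            blocks[idx].add(sym)
        if all(blocks) and is_two_dim_partition(blocks, C):
            return True
    return False
```

The checker accepts X = [{a,e}, {b}, {c,d}] for the first example, and the brute-force search confirms that the second C has no valid partition at all, matching the adjacency argument above.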

Reduction: G&J’s reference is to a paper by Lipski, which uses Graph K-Colorability.  So we’re given a graph and an integer K.  We’re going to build an alphabet that has one symbol for each vertex in V, K+2 “b” elements (numbered from 0 to K+1), and K-2 “a” elements (numbered from 1 to K-2) for each edge.  (There is only one copy of the “b” elements.  There is one set of the “a” elements for each edge.)  The sets in C are:

  • One set containing all of the “b” elements.
  • One set for each consecutive pair of “b” elements (so {b0, b1} all the way through {bk, bk+1})
  • For each edge (v,w), a set containing the vertex symbols for v and w, all k-2 “a” symbols for that edge, b0, and bk+1

If G has a K-coloring, we can split Σ into K+2 pieces, numbered X0 to XK+1.  (Notice that because of the set with all of the “b” elements in C, we need to have at least that many sets in X.)  Each Xi gets its corresponding “b” element.  For each edge (v,w), we know that vertices v and w have different colors (numbered from 1 to K).  We put v’s symbol in the X set numbered with v’s color, and we put w’s symbol in the X set numbered with w’s color.  We put the remaining K-2 “a” symbols for that edge in the other K-2 X sets between 1 and K.  This creates a two-dimensional partition of all of the sets in C.

If we have a two-dimensional partition of C, then list them out in order of how they sort the “b” elements (So the set with b0 is X0 and so on).  It’s possible that we have to do a reverse ordering (the set with b0 is Xk+1 and the set with bk+1 is X0), but we can treat that as basically the same.  Since the sets in C that were based on edges contain b0 and bk+1, the other elements in that set need to fit in sets X1 through Xk.  So the partition has to have exactly K+2 sets, and the vertices all fall in sets X1 through Xk with no two vertices sharing an edge in the same set.  This gives a K-coloring of the vertices in the graph.

Difficulty: 7.  It sort of makes sense that the “two-dimensional” problem needs us to create a “two-dimensional” set of elements (the “a” elements), but I don’t think I would expect a student to be able to come up with these elements and these steps without a lot of help.

Consecutive Sets

This one moves slightly away from the matrix stuff we’ve been doing, but results in a pretty cool reduction.

The problem: Consecutive Sets.  This is problem SR18 in the appendix.

The description: Given a collection C of subsets of a finite alphabet Σ, and a positive integer K, can I find a string w from Σ* with K or fewer symbols, such that the characters of each element of C occur consecutively somewhere in w?

Example: Let ∑ = {a,b,c,d,e} and C = {{a,b,c}, {a,d,e}, {b,e}, {c,d}}

Then I can choose w to be dcbadeb, of length 7, which is the best I could find.  Notice how positions 2-4 contain {a,b,c}, positions 4-6 contain {a,d,e}, positions 6-7 contain {b,e}, and positions 1-2 contain {c,d}.
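Checking a candidate w is easy: slide a window of size |S| over w for each set S and compare as sets.  A minimal sketch (the function name is mine):

```python
def covers(w, C):
    """Does every set in C appear as the set of symbols of some
    contiguous window of w?"""
    return all(
        any(set(w[i:i + len(S)]) == S
            for i in range(len(w) - len(S) + 1))
        for S in C
    )
```

For the example, covers("dcbadeb", C) holds, while a plain listing of the alphabet like "abcde" fails (there is no window whose symbols are exactly {a,d,e}).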

Reduction: This is “Problem 4” in the paper by Kou from last week.  He reduces from Hamiltonian Path.  So we start with a graph G=(V,E).  Our alphabet Σ (he uses “R” in the paper) has one symbol for each vertex in V and one symbol for each edge in E.  Our C set (he uses F in the paper) will have one set for each vertex.  The symbols in the set for a vertex i will consist of the edge symbols for the edges incident on i, and the vertex symbol for i itself.  K will be 2*|E|+1.

Notice that no two sets in C are subsets of each other (because they each hold a separate symbol from V), and that the intersection of 2 sets from C is a single edge (the edge connecting the two vertices corresponding to the sets), if that edge exists, and the intersection is empty otherwise.  A string w that we make will be based on a sequence of vertices (and their C sets).  Since each edge appears in two C sets, the “initial” size of w is 2|E| + |V|, if we have no overlap between C sets in w.  The question is how small can it get?

Since each C set can have at most one symbol of overlap with another C set, and that only happens if an edge exists between the vertices that define the C sets, the only way to create an overlap to reduce the size of w is to order the vertices such that there are edges between consecutive pairs of vertices.  G has a Hamiltonian Path if and only if we can order the vertices in such a way that there is an edge between each pair of consecutive vertices.  In w, that means we can overlap 1 symbol from each pair of consecutive C sets.  Since there are |V|-1 pairs of vertices in a path, the minimum size of w is 2|E| + |V| – (|V|-1) = 2|E| + 1 = K.
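The construction, and the overlapping trick in the forward direction, can be sketched in code.  This is my own rendering (names and the frozenset edge symbols are mine, not Kou’s notation): each vertex block is laid out so its shared edge symbol with the next vertex comes last, and the next block skips that symbol, so consecutive blocks overlap by exactly one position.

```python
def kou_instance(vertices, edges):
    """One symbol per vertex, one per edge; the set for vertex v holds
    v's symbol plus the symbols of v's incident edges.  K = 2|E|+1."""
    C = [frozenset({v}) | frozenset(frozenset(e) for e in edges if v in e)
         for v in vertices]
    return C, 2 * len(edges) + 1

def string_from_path(path, edges):
    """Lay the C sets out along a Hamiltonian path, overlapping the one
    shared edge symbol between consecutive vertices."""
    w = []
    prev = None                       # edge symbol shared with previous vertex
    for i, v in enumerate(path):
        S = {v} | {frozenset(e) for e in edges if v in e}
        if prev is not None:
            S.discard(prev)           # already placed at the end of the last block
        nxt = frozenset({v, path[i + 1]}) if i + 1 < len(path) else None
        w += [s for s in S if s != nxt]
        if nxt is not None:
            w.append(nxt)             # shared symbol goes last, to be reused
        prev = nxt
    return w
```

On a toy graph with a Hamiltonian path a-b-c-d, the resulting w has length exactly 2|E|+1 and every vertex set occurs as a contiguous window.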

Difficulty: 5.  I think this is a neat reduction that students can get, though it might take a bit of work to make sure they understand the problem.

Consecutive Block Minimization

These next two problems are very similar to the “consecutive ones” problems we’ve been looking at lately.

The problem: Consecutive Block Minimization.  This is problem SR17 in the appendix.

The description: Given an m x n matrix A consisting of 0’s and 1’s and a positive integer K, can we permute the columns of A so that the permuted matrix (B) has K or fewer blocks of consecutive 1’s?  G&J define a “block” as an entry bij = 1 where bi,j+1 = 0 or j = n.  So a “block” is one-dimensional, going across a row.

Example: Here’s a matrix:

1 0 1 0 1
1 1 1 0 1
0 0 1 1 0
0 1 1 0 0
1 1 0 0 1

Right now, this has 9 blocks (3 on the first row, 2 on the second and fifth, 1 on the third and fourth).

But we can permute the columns to get:

0 1 1 1 0
1 1 1 1 0
0 0 0 1 1
1 0 0 1 0
1 1 1 0 0

..for a total of 6 blocks.
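Counts like these are easy to verify by machine, and for a 5-column matrix we can even brute-force all column orders.  A sketch (function names mine):

```python
from itertools import permutations

def blocks(matrix):
    """A block ends at entry (i,j)=1 when the next entry is 0 or the row ends."""
    n = len(matrix[0])
    return sum(1 for row in matrix
                 for j in range(n)
                 if row[j] == 1 and (j == n - 1 or row[j + 1] == 0))

def min_blocks(matrix):
    """Exhaustive search over column permutations (n! -- toy sizes only)."""
    n = len(matrix[0])
    return min(blocks([[row[j] for j in perm] for row in matrix])
               for perm in permutations(range(n)))
```

For the example matrix this reports 9 blocks as given, and confirms that 6 is in fact the best any column permutation can do.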

Reduction: The paper by Kou has several reductions in it; we’ll be using this paper next week too.  This problem is “Problem 3” in the paper.  The reduction uses Hamiltonian Path, but with costs for edges, so it’s really more like “Traveling Salesman on Paths”.   His “Problem 2” takes an instance of regular Hamiltonian Path and shows that this “Traveling Salesman on Paths” problem is NP-Complete on “cost graphs”.

Kou defines a “cost graph” of a matrix as a complete graph that has one vertex for each row, where the cost of edge (i,j) is the number of columns in A that have a 1 in row i and a 0 in row j.  He proves in the paper that a Hamiltonian Path on the cost graph has cost k iff there are k+c blocks of 1’s in the matrix, where c is the number of 1’s in the last row in the matrix.
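The path-cost/block-count relationship can be checked numerically.  Note that my sketch below states it in the column-permutation orientation of the example above (vertices are columns), which is the transpose of the row-based convention in Kou’s statement; the function names are mine:

```python
def cost(col_a, col_b):
    """cost(a, b) = number of rows with a 1 in column a but a 0 in column b."""
    return sum(1 for x, y in zip(col_a, col_b) if x == 1 and y == 0)

def blocks_from_order(matrix, order):
    """Path cost along the permuted columns, plus the 1's stranded in the
    final column, equals the block count of the permuted matrix."""
    cols = list(zip(*matrix))
    path = sum(cost(cols[i], cols[j]) for i, j in zip(order, order[1:]))
    return path + sum(cols[order[-1]])
```

On the example matrix, the identity order gives path cost 6 plus 3 ones in the last column = 9 blocks, and the 6-block permutation shown above comes out as path cost 5 plus 1.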

So, the full reduction takes an instance of HP, creates an incidence matrix (as we defined in the Consecutive Ones Matrix Augmentation problem),  builds a cost graph of that, and uses that cost graph as the input to the Consecutive Block minimization problem.

Difficulty: 7. The cost graph construction is pretty hard to grasp, and you need to prove some extra lemmas along the way to show that the transformation works.

Consecutive Ones Matrix Partition

This is the problem I skipped over last week to get the two problems from Booth’s dissertation done together.

The problem: Consecutive Ones Matrix Partition.  This is problem SR15 in the appendix.

The description: Given an m x n matrix A consisting of 0’s and 1’s, can we partition the rows of A into two groups m1 and m2 such that the matrices A1 (of size m1 x n) and A2 (of size m2 x n) both have the consecutive ones property?

Example: Here’s the matrix that does not have the consecutive ones property.

0 1 1 1 1
1 1 1 0 1
0 0 1 1 0
0 1 1 0 0
1 1 0 0 0

But if we break it into two matrices, for example:

0 1 1 1 1
1 1 1 0 1
0 1 1 0 0


0 0 1 1 0
1 1 0 0 0

The two matrices have the consecutive ones property (the second matrix already has it; the top matrix has it if you exchange the 4th and 5th columns).
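A brute-force consecutive-ones checker makes this concrete.  This is only a sketch (real implementations use Booth and Lueker’s linear-time PQ-tree algorithm instead of trying all n! orders):

```python
from itertools import permutations

def consecutive(row):
    """True if the 1's in the row form one contiguous run (or there are none)."""
    ones = [j for j, v in enumerate(row) if v == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)

def has_c1p(matrix):
    """Brute-force consecutive ones property check (toy sizes only)."""
    n = len(matrix[0])
    return any(all(consecutive([row[j] for j in perm]) for row in matrix)
               for perm in permutations(range(n)))
```

It confirms that the full 5-row matrix fails, while each of the two row groups above passes.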

Reduction: G&J point to a “private communication”, but I found a technical report by Lipski that explains the reduction.  The reduction is from Hamiltonian Path on cubic graphs (each vertex has degree at most 3).  So we start with a graph G=(V,E) where each vertex is degree 3 or less.  We construct a new graph G’=(V’,E’) where G’ adds new vertices and edges so that each vertex in V has degree 4.  Each new vertex has degree 1 (and so the new edges we add to V always connect to distinct new vertices).  Take the incidence matrix of this graph (as defined in the Consecutive Ones Submatrix problem) and use it as A.

If G has a Hamiltonian Path, then we can split A into two parts: the vertices in the path (thus, all of V) and everything else.  The Hamiltonian Path has the consecutive ones property, just like we saw in the Consecutive Ones Submatrix problem.  The other rows just map to disjoint paths of length 2 (two degree-one edges connected by some vertex in V), so they can be rearranged.

If A has a partition that works, then choose some vertex v in V, and suppose it ends up in A1.  We know that this vertex has degree 4, and that to satisfy the consecutive ones property, v needs to be adjacent to exactly 2 things in A1 and 2 things in A2.  The vertices in A1 need to form a chain of vertices all in V to make the consecutive ones property hold, so they form a Hamiltonian Path.

Difficulty: 6.  The paper glosses over a lot of detail, and I did too.  For example, the paper says “Without loss of generality we may assume that V ∈ M1” (the paper’s version of A1).  I don’t think it’s obvious at all that the entirety of V will end up in one partition or the other.  I think it’s true, but I think it needs more work from a student to explain why it works.

Consecutive Ones Matrix Augmentation

While I had the dissertation I needed for the last problem, and before I had to send it back (without making copies), I skipped ahead to another problem that used it as a reference.

The problem: Consecutive Ones Matrix Augmentation.  This is problem SR16 in the appendix.

The description: Given an m x n matrix of 0’s and 1’s and a positive integer K, can we change K or fewer 0’s into 1’s to give the matrix the consecutive ones property?  See the Consecutive Ones Submatrix problem for more detail on this property.

Example: Here is the matrix that did not have the property from last week:

0 1 1 1 1
1 1 1 0 1
0 0 1 1 0
0 1 1 0 0
1 1 0 0 0

Recall that we’re looking for a way to  permute the columns so that there are no gaps in the 1’s in each row.  The easy solution here is to add a 1 in the second row (making it all 1’s).  In that case, the matrix has the consecutive ones property without needing any column permutations.

Reduction: Again, it is found in the dissertation by Booth, but this time uses Optimal Linear Arrangement.  Recall in that problem we were given a graph and needed to find an ordering of vertices to minimize the “score” of the differences of the ordering values.

The corresponding idea here is to start with the graph and build an incidence matrix, just like in the Consecutive Ones Submatrix problem.  We notice that any permutation of the rows (in the dissertation, you permute the rows to find consecutive ones in each column) corresponds to a permutation of the vertices in the OLA problem.  If we took a permutation of the vertices, each edge (column) of the incidence matrix has exactly two 1’s in it.  There may or may not be any 0’s between those ones, but if there are, we can convert those 0’s into 1’s (this is the augmentation).

The way these problems relate is that if an edge exists between two vertices vi and vj, located at positions i and j in the final permutation, that edge adds j-i to the cost (assuming j > i).  In our matrix, if we permute the row corresponding to vertex vi to row i, and the row corresponding to vertex vj to row j, then there are j-i-1 0’s between the two rows in that edge’s column.  So we could set them all to 1, and that column would have the consecutive ones property.

Thus, a solution of cost K to the OLA problem corresponds to a solution to the Consecutive Ones Matrix Augmentation problem of cost K-|E|, since there are |E| columns in the incidence matrix, and each one differs by 1 from the OLA cost of the edge.
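The bookkeeping in that correspondence is easy to check on a small graph.  A sketch (function names mine):

```python
def ola_cost(edges, order):
    """Optimal Linear Arrangement score of a vertex ordering."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)

def augmentation_cost(edges, order):
    """0's flipped to 1: each edge-column of the incidence matrix needs
    the j-i-1 gap rows between its two 1's filled in."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) - 1 for u, v in edges)
```

For any ordering, the augmentation cost is exactly the OLA cost minus |E|, since each edge-column contributes one less than its OLA term.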

Difficulty: 6, mainly because of the incidence matrix/adjacency matrix difficulty I talked about last time, but also because OLA is not easy to get.  If you’ve already shown them incidence matrices (for example, because you’ve gone over the Consecutive Ones Submatrix problem), then this is probably a 4.

Consecutive Ones Submatrix

This is another problem where the reference is a PhD thesis that they had to ship to the library.  Hopefully I don’t mess the description up, because it’ll be hard to get back.

The problem: Consecutive Ones Submatrix.  This is problem SR14 in the appendix.

The description: Given an m x n matrix A, consisting of only 0’s and 1’s, and a positive integer K, can we find an m x K submatrix B of A such that we can rearrange the columns of B so that in each row, all of the 1’s occur consecutively?  (This is called the “consecutive ones property” and will come up in several problems.)

Example: Suppose this was A:

1 0 0 0
0 1 0 1
0 1 1 1
0 0 1 0

If K=3, we could, among other things, choose the last 3 columns of A for B, and rearrange the second and third columns to get:

0 0 0
1 1 0
1 1 1
0 0 1

..which has the property.

The dissertation by Booth which has the reduction gives this example of a matrix that has no valid permutation when K=n.  (Note that in the dissertation, the consecutive ones property applies to columns, not rows, so I transposed the matrix in the paper.)

0 1 1 1 1
1 1 1 0 1
0 0 1 1 0
0 1 1 0 0
1 1 0 0 0

The easiest way to see why this won’t work is to realize that the fourth column needs to go last (or first) and the first column needs to go first (or last) to keep the sequences of four 1’s from being split.  So our partial solution is:

0 1 1 1 1
1 1 1 1 0
0 x x x 1
0 y y y 0
1 z z z 0

The x’s are some permutation of “010”, which means the 1 has to go last, so that puts column 3 from the original matrix there:

0 1 1 1 1
1 1 1 1 0
0 0 0 1 1
0 y y 1 0
1 z z 0 0

The y’s need to be a permutation of 01 that has a 1 in the third column, so that means that the second column needs to go into the third spot.  But the z’s need a permutation of 01 that has a 1 in the second column, so that means that the second column needs to stay in the second spot.  So no permutation works.

(While I have the dissertation, he also claims that this matrix can’t be permuted either:

0 0 1 1 0
0 0 0 1 1
0 0 0 0 1
1 1 0 0 0
1 1 1 0 0

..but it seems like this (the matrix as-is, with no permutation at all) would work:

0 0 1 1 0
0 0 0 1 1
0 0 0 0 1
1 1 0 0 0
1 1 1 0 0

Maybe I’m doing the transposition wrong.  Remember, in his problem definition he permutes rows to keep consecutive ones in the columns, so he actually starts with:

1 0 0 0 1
1 1 0 0 0
1 1 0 0 0
0 0 1 1 0
0 0 0 1 1

..but I can permute rows to get:

1 1 0 0 0
1 1 0 0 0
1 0 0 0 1
0 0 0 1 1
0 0 1 1 0

I bet there is a missing 1 someplace in that matrix.)
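Both claims in this section can be confirmed by brute force.  A sketch (function names mine) that checks rows directly and, if needed, tries every column order:

```python
from itertools import permutations

def consecutive_ones_rows(matrix):
    """True if every row already has its 1's contiguous."""
    def ok(row):
        ones = [j for j, v in enumerate(row) if v == 1]
        return not ones or ones[-1] - ones[0] + 1 == len(ones)
    return all(ok(row) for row in matrix)

def some_permutation_works(matrix):
    """Try all column orders (toy sizes only)."""
    n = len(matrix[0])
    return any(
        consecutive_ones_rows([[row[j] for j in perm] for row in matrix])
        for perm in permutations(range(n))
    )
```

It agrees with the case analysis that the first 5x5 matrix has no working permutation, and that the disputed second matrix already has every row’s 1’s contiguous with no permutation at all.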

Reduction: Is from Hamiltonian Path.  Keeping in mind that Booth’s thesis defines the problem as permuting rows to have 1’s in consecutive columns, he defines the incidence matrix of a graph G=(V,E) to be a |V|x|E| matrix M where mij is 1 if vertex i is in edge j, and 0 otherwise.  Notice that each column has exactly two 1’s in it.

(To solve G&J’s version of the problem, you’d want M=|E|x|V| and mij = 1 iff vertex j is in edge i)

Now all we have to do is take a graph, build an incidence matrix, and see if we can rearrange the rows so that the consecutive 1’s property holds for n-1 of the columns.  (The columns we leave out correspond to edges the path doesn’t use.)

Difficulty: 5.  I’ll admit to being a bit confused by trying to use an adjacency matrix instead of an incidence matrix.  And to not really understanding the problem definition (the examples in the thesis really helped).  But really, I should have gotten this without having to make them ship the thesis all the way from California.

Sparse Matrix Compression

Here’s the problem I’ve been dreading for months.  I haven’t been able to track down the reference, and I haven’t been able to come up with a full reduction on my own.  I’ve decided that I fundamentally don’t understand the way G&J are explaining the problem and have come up with an alternate problem statement that I think meets the same goals.

The problem: Sparse Matrix Compression.  This is problem SR13 in the appendix.

G&J’s description: Given an m x n matrix A where each entry in the matrix is either 0 or 1, and an integer K ≤ m*n.  Can we find a sequence B of n+K integers, each of which is between 0 and m (inclusive), and a function s mapping the numbers from 1 to m to the numbers 1 to K, such that aij = 1 if and only if b(s(i)+j-1) = i?

Example: So what does this have to do with compression, you may be asking? The idea is that the b sequence will tell us how to find runs of 1’s in various rows, and the s function tells us where each row’s sequence begins.

So, suppose this is our matrix:

1 0 1 0
0 0 1 1
0 0 0 1
1 1 0 0

The first row implies that the B sequence needs to have 1x1x in it (the x’s can be anything that’s not a 1)

The second row implies that the B sequence needs xx22 in it.

The third row implies that the B sequence needs xxx3

The fourth row implies that the B sequence needs 44xx

The shortest way I can make all of that happen is 4413122X.  In this case, s(1) = 3 (the window starting at position 3, 1312, has 1’s exactly where the first row does), s(2) = 4, s(3) = 1, s(4) = 1.  The X at the end can be any number, but needs to be there because the sequence needs to be of size n+K, the s values need to map from 1-K, and we have s(2) = 4.
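G&J’s condition can be checked mechanically, which helps when juggling the 1-indexed windows.  A sketch (the function name is mine, and I write the trailing X as 0):

```python
def valid_compression(A, B, s):
    """G&J's condition, 1-indexed as in the problem statement:
    a[i][j] = 1  iff  B[s(i)+j-1] = i."""
    m, n = len(A), len(A[0])
    return all(
        (A[i-1][j-1] == 1) == (B[s[i] + j - 2] == i)   # -2: Python is 0-indexed
        for i in range(1, m + 1)
        for j in range(1, n + 1)
    )
```

For the example, the checker accepts s(1)=3 (window b3..b6 = 1312) and rejects s(1)=2, since b2 = 4 while a11 = 1.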

(Non-)Reduction: G&J’s reference is to an “unpublished manuscript”, and they say to use Graph 3-Coloring, and to set K=3 (presumably related to the number of colors).  I’ve been thinking about this for months, and I think I just misunderstand the problem.  I have a real problem trying to understand the use of the K values: setting K=3 means that each row can only map to 1, 2, or 3 via the s function.  And if two (or more!) rows have the same s value, that means they can have no 1’s in common.  So since there are only 3 s values, if the graph is colorable, every row in the matrix needs to fall into one of 3 groups of rows, where the rows within a group have no 1’s in common.

Which seems like the start of a good idea (the three groups are the three colors of vertices; two rows go into the same group if their vertices can be colored the same and don’t share an edge).  Except that you then have to come up with a way to get a number in the sequence for each 1 in each row, and you need to worry about the fact that the groups that map to the other colors start right next to where your group starts.  Since every s(i) has to map to 1, 2, or 3, every row has to start at one of the first 3 positions in the sequence, so for example b4 is the 4th thing in a row starting at position 1, the 3rd thing in a row starting at position 2, and the 2nd thing in a row starting at position 3; there can be at most one 1 out of all of those overlapping rows.  I couldn’t make this all work out.

Instead, I started thinking about how this problem looks an awful lot like Shortest Common Superstring: we need a common string that holds all of the strings corresponding to each row.

So, let me rewrite the problem as follows: can we come up with a sequence B of 0’s and 1’s, and a function s such that b(s(i)+j-1) = aij?  Using this formulation, the best solution I could come up with for my example above was 1100011010, with s(1) = 7, s(2) = 4, s(3) = 3, s(4) = 1.  But on more sparse (or more similar) matrices, the string can get much shorter: if the matrix was 4 copies of the same row 0011, then 0011 can be the result string and s(i) = 1 for all i.  The 4x4 identity matrix can have the b sequence 0001000 and s(1) = 4, s(2) = 3, s(3) = 2, s(4) = 1.
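The reformulated condition is just “row i of A appears verbatim in B at position s(i)”, which makes its superstring flavor explicit.  A sketch of the checker (the function name is mine):

```python
def valid_binary_compression(A, B, s):
    """The reformulated condition: row i of A appears verbatim in B
    starting at (1-indexed) position s[i]."""
    return all(B[s[i] - 1 : s[i] - 1 + len(row)] == row
               for i, row in enumerate(A, start=1))
```

It accepts both solutions given above: the length-10 sequence for the running example, and the 7-element sequence for the 4x4 identity matrix.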

This is “equivalent” to G&J’s description in the sense that it captures the same desires of the original problem: take a two-dimensional matrix that consists of 0’s and 1’s, and generate a (preferably short) one-dimensional representation with an index function that tells you where each row starts.  In this new formulation, the s function points to the start of each string, and now aij = 1 if and only if the corresponding element in the sequence is 1 (instead of i).  What isn’t so obvious to make equivalent is the K in the original problem (the number of “starting positions” you need to add to n to make the sequence) and the K here (the size of the resulting sequence).  I think it has to do with the structure of the underlying pattern of 0’s and 1’s in each row.  If there is a way to make those K’s translate in polynomial time, then this formulation is truly equivalent to G&J’s.

So now what we need is an instance of SCS where all strings are over the alphabet {0,1} (the problem is still NP-Complete in this case); make those strings the rows of our matrix, and set the K’ of the Sparse Matrix Compression problem to be the K of the SCS problem, and it should work out.  I’m not 100% convinced this works, but it seems like it should.

Difficulty: 6.  Mainly because it’s so hard for me to understand the problem, but also seeing how it maps to SCS is not trivial.  But the problem as written by G&J is still unsolved.  Hopefully someone out there can explain how to solve that problem.

Hitting String

This is a good easy problem hidden in the middle of all of the hard ones.

The problem: Hitting String.  This is problem SR12 in the appendix.

The description: Given a set of strings A over the alphabet {0,1,*} all of the same length n, can we find a string x over the alphabet {0,1}, also of length n, where x agrees in at least one position with each string in A?

Example: Let A = {00000,11111,0*001, ***10, 101**, 1****}

Then x can be 10100.  It agrees with 00000 in the last position, 11111 in the first position, 0*001 in the fourth position, ***10 in the last position, 101** in the first position, and 1**** in the first position.

A pretty easy example of an instance that can’t be solved is A = {1**, 0**}, or even A = {***}

Reduction: We’ll go from 3SAT.  Each clause in the 3SAT instance will become a string in A.  Each string in A will have one position for each variable in the formula.  Each string will have a 1 in each variable’s position if that variable occurs positively in that clause, a 0 if it occurs negatively, and a * for all variables that don’t appear in the clause.  So each string will have just 3 characters that are not *.

Thus, we need to come up with a string x that has a 1 or 0 in each position (corresponding to a setting of a variable) that matches one of the three 1 or 0 characters in each string (satisfying that clause).
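The transformation is short enough to write out in full.  A sketch (function names are mine; clauses use signed DIMACS-style literals, so 3 means x3 and -3 means not-x3):

```python
def sat_to_hitting(num_vars, clauses):
    """Each 3SAT clause becomes one string over {0,1,*}: a 1 where a
    variable appears positively, a 0 where it appears negatively, and
    a * everywhere else."""
    A = []
    for clause in clauses:
        s = ['*'] * num_vars
        for lit in clause:
            s[abs(lit) - 1] = '1' if lit > 0 else '0'
        A.append(''.join(s))
    return A

def hits(x, A):
    """x hits A if it agrees with every string in at least one position."""
    return all(any(a == b for a, b in zip(x, s)) for s in A)
```

A hitting string is then exactly a satisfying assignment read off position by position: for (x1 or not-x2 or x3) and (not-x1 or x2 or x3), the assignment x1=1, x2=1, x3=0 gives the hitting string 110.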

Difficulty: 4, maybe 3.  This is about as straightforward a “Turn SAT into a different kind of problem” reduction as you’re likely to see, but I do think crossing genre problems are harder for students than we may anticipate them to be.

Bounded Post Correspondence Problem

I’m sure most people remember the “Post Correspondence Problem” from Theory of Computation classes.  It’s mainly famous for two reasons:

  • It’s the classic example of an undecidable problem that isn’t about a language (like the Halting Problem is)
  • It leads to lots of fun (meaning “bad”) PCP jokes.  (“I need some PCP to be able to handle thinking about the PCP”)

Anyway, here is a restricted version of the problem that at least is decidable, if intractable.

The problem: Bounded Post Correspondence Problem.  This is problem SR11 in the appendix.

The description: Given two sequences of strings A = (a1, …, an) and B = (b1, …, bn) from a finite alphabet, and a positive integer K, can I find a sequence of integers i1, …, ik, where k ≤ K, such that the string A’ = ai1ai2…aik is identical to the string B’ = bi1bi2…bik?

Example: The difference between this and “classic” PCP is the introduction of K, which makes the problem decidable because if nothing else you can just try all combinations of strings from A and B that are length K or less, which is finite.

So, any PCP instance that “works” is a good example to use here.  Here’s one that I got from my old Hopcroft and Ullman theory of computation book:

A =  {1, 10111,10}  B = {111,10,0}

We can construct the string 101111110.  From A, it’s built from 10111/1/1/10.  From B, it’s built from 10/111/111/0

Notice that we need to use the same elements from each string in the sequence (The second one, then the first one twice, then the third one).
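The “just try all combinations” decidability argument is one loop.  A sketch (the function name is mine; indices are 0-based):

```python
from itertools import product

def bounded_pcp(A, B, K):
    """Try every index sequence of length at most K.  The bound K makes
    the search finite -- and exponential, which is rather the point."""
    for k in range(1, K + 1):
        for seq in product(range(len(A)), repeat=k):
            if ''.join(A[i] for i in seq) == ''.join(B[i] for i in seq):
                return list(seq)
    return None
```

On the Hopcroft and Ullman example it finds the sequence (second, first, first, third) with K=4, and reports that no sequence of length 3 or less works.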

Reduction: The reference in G&J is to a technical report by Constable, Hunt, and Sahni, and says they use a “generic transformation”.  In the paper, that means they show it’s NP-Complete by showing it’s the “negation of freedom for a 2 variable loop free scheme”, which is, as near as I can tell, a way of defining whether programs are equivalent.

I’ll admit that I don’t really understand it.  I really want there to be a simpler reduction out there.  I found this post on stackexchange,  but it reduces from Shortest Common Supersequence, and uses a |R| of 2, and G&J say that Shortest Common Supersequence is polynomial with |R|=2, so something has to be wrong there.

I was tinkering with a reduction from Directed Hamiltonian Cycle, where the strings in A correspond to edges, and the strings in B correspond to vertices.  So, for a graph with vertices a, b, c, and d and directed edges a→b, b→c, c→a, and c→d, picking a as the starting vertex, you would get strings like this:

A:  B:
x   xa
ab  bb
bc  cc
ca  a
cd  dd

The idea is that each string in A corresponds to a directed edge, and each string in B corresponds to 2 copies of the destination of that edge.  The “x” is there to give us a starting point- we arbitrarily choose some starting vertex to be the start.  All edges that have this starting vertex as its destination only have 1 copy (instead of 2) of that vertex in B.

The problem is that a string like xabbcca  (corresponding to a non-Hamiltonian cycle) is a solution to the PCP problem, but is not a solution to the HC problem.  I don’t know how to force the string to be long enough to visit all vertices- I thought about working on a variant of the problem where you force the size of the string you create to be exactly =K instead of ≤ K.  But  since a cycle that doesn’t involve the start vertex can be “pumped” several times to make a string longer, that won’t work either.  There is probably something clever you can do with the strings to force you to visit all vertices, but I can’t see it.

Difficulty: 10, sadly.  I’m sure there is something easier out there.