Longest Common Subsequence

I’m going out of order here, since the next few problems are all related, and this is the easiest one, and is a good way to get thinking about this kind of problem.

The problem: Longest Common Subsequence.  This is problem SR10 in the appendix.

The description: Given a finite set of strings R over some finite alphabet Σ, and a positive integer K, can I find a string w over the alphabet Σ such that w has length K or more, and w is a subsequence of each string in R?

Example: Suppose R was the strings {aab, aaab, aaaba, aaaa}

Then the longest w we could choose that is a subsequence of everything in R wold be “aa”.  Notice that if we add “ba” to R, then the longest w we can make is length 1 (the string “a” or the string “b”)

Reduction: The paper by Maier uses VC.  (This is the “LCS” problem in the paper).  So we start with a graph G = (V,E)  and a K.  Our alphabet Σ has one character for each vertex in V.  R will consist of |E|+1 strings, based on a “template string” T consisting of the characters for all of the vertices in V, in some order. The first string in R is T itself.  Then, each edge (u,v) gets a string: Two copies of T concatenated.  The first copy has u removed, the second one has v removed.  Set K’ to |V|-K.

Suppose G has a VC, call it V’.  Then take T and remove all of the vertices in V’ from it.  This remaining string will be called T’ and is of size |V|-K.  We will show that T’ is a subsequence of each string in R.  It’s pretty clearly a substring of T.

Each other string in R is built from some edge (u,v) in E.  So either u or v is in V’.  So u will be missing from T’, and u will be missing from the first copy of T in the string in R, so T’ is a substring of that first copy.  Similarly, if v is in V’, then T’ is a substring of the second copy of T.

If we find a string T’ of size K’, then first notice that for each string in R made from an edge (u,v), T’ can’t have both u and v.  The reason is that if (say) u comes before v in T, then in the string in R, u will be missing in the first half of that string, and v will be missing in the second half.  So the occurrence of v will come first in the string and the occurrence of u will come second, so they will be “swapped”.

It’s much easier to see with an example.  Suppose we have 5 vertices, and T = 12345.  If we have the edge (2,4), then the string in R is 13451235.  T’ can’t contain 24 in that order, because the 4 comes before the 2 in the string.

So this means that if we let V’ be the vertices in V that are not in T’, we have K vertices and at least one of the characters corresponding to that vertex is in each string in R, and so at least one vertex from V’ is in each edge in E.  This forms a vertex cover of V.

Difficulty: 5.  I may have downgraded this difficulty after seeing the much harder SR8 and SR9, but I do think that it is pretty straightforward.

Leave a Reply

Your email address will not be published. Required fields are marked *