Tag Archives: Vertex Cover

Shortest Common Supersequence

Now to go back to the problems I skipped over.  This one uses a more complicated version of the “template string” used in the Longest Common Subsequence reduction.

I also think this reduction has a small mistake in the paper, though it’s possible I’m misunderstanding something.  Anyone who’s interested is encouraged to check the paper, and my work to see what’s going on.

The problem: Shortest Common Supersequence.  This is problem SR8 in the appendix.

The description: Given a finite set of strings R over some finite alphabet Σ, and a positive integer K. can I find a string w over the alphabet Σ of size K or less such that each string in R is a subsequence of w?

Example: Here’s the R from last time: {aab, aaab, aaaba, aaaa}

If we choose w to be aaaba, each string in R is a subsequence of w.  (Remember subsequence is different from substring in that subsequences are allowed to have gaps.  So for example aaaa is a subsequence of aaaba.).  If we add the string “bb” to R, then  w needs to have an extra b someplace (some candidates for w are aaabba, aaabab, baaaba, and so on).

Reduction: The same paper by Maier as last week has this reduction too (This is the “SCS” problem in the paper).  The reduction is again from VC, so we’re given a graph G=(V,E) and an integer K.   The alphabet ∑ will have one symbol for each vertex in V, one symbol for each edge in E, and a special * symbol.  Define c = max(|V|,|E|).  Also note that the vertices and edges are “ordered” somehow (it doesn’t matter the exact ordering) so that we can have an order of the vertex and edge characters in ∑.

Just as in the LCS problem, the strings in R will be based on a template string.  The template string T consists of (in order):

  • All of the alphabet symbols corresponding to the vertices in V, in order
  • A section of 4c stars
  • All of the alphabet symbols corresponding to the edges in E, in order, twice in a row (so 2 copies of the first edge symbol, then 2 copies of the second edge symbol, and so on)
  • All of the alphabet symbols corresponding to the edges in E, in order, twice in a row (again)
  • Another section of 4c stars
  • All of the alphabet symbols corresponding to the vertices in V, in order (again)

This template string is length 8c+4|E|+2|V|

T will be in R, and we will have one string in R for each edge in E.  For an edge e=(vi, vj), we create a string with (in order):

  1. Two copies of the alphabet symbol for e
  2. The alphabet symbol for vi
  3. A section of 4c stars
  4. The alphabet symbol for vj
  5. Two copies of the alphabet symbol for e (again)

Set K’ = 8c+6|E| + 2|V| + K

If G has a VC V’ of size K, then each edge has at least one vertex in V’.  Let W be the set of edges whose “first” vertex occurs in the cover (by “first”, we mean the vertex that comes alphabetically first in the ordering we used to make ∑), and let U be the set of all remaining edges (and thus for all edges in U, the “second” vertex of the edge is in V’).

Take our template string T and add:

  • The alphabet symbol for each edge in W, twice, at the beginning of T
  • The alphabet symbols for each vertex in V’ in between the two copies of the edge lists in the middle (between the third and fourth bullets in the above definition of T)
  • The alphabet symbol for each edge in U, twice, at the end of T.

This adds 2|E|+k to T’s size  (each edge is either in W or U, and will appear twice), and so this new T’ is the right size for a SCS solution.

Now we need to show that each string in R is a subsequence of this new T’.  T is obviously a subsequence of T’.  Each string in R that is based on an edge e is a subsequence by figuring whether it’s in W or U.  It may help to break down what T’ looks like:

  1. 2 copies of W
  2. 1 copy of V
  3. 4c stars
  4. 1 copy of E
  5. 1 copy of V’
  6. 1 copy of E
  7. 4c stars
  8. 1 copy of V
  9. 2 copies of U

I personally think this is a mistake in the paper.  I think U needs to go at the beginning and W needs to go at the end, see below.

If e is in W, what we do is:

  • Map section a of the string to section 4 of T’
  • Map section b of the string to section 5 of T’  (We know its in those vertices, because e is in W, so its first vertex is in the cover)
  • Map section c of the string to section 7 of T’
  • Map section d of the string to section 8 of T’
  • Map section e of the string to section 9 of T’

That last part is why I think there is a mistake either in the paper, or in the way I understand it.  You can only map the edge to the last part of T’ if the edge is in that part.  So that part needs to be W, not U.  I can’t find a way to make a mapping if W is first.

If e is in U, what we do is:

  • Map section a of the string to section 1 of T’ (again, assuming the paper has the U’s and W’s backwards)
  • Map section b of the string to section 2 of T’
  • Map section c of the string to section 3 of T’
  • Map section d of the string to section 5 of T’ (the edge being in U means its second vertex is in the cover)
  • Map section e of the string to section 6 of T’

So, T’ is a supersequence of all strings in R.

In the other direction, suppose the strings in R have a supersequence of length K’.  Call that supersequence T” (since it’s not necessarily T’).  To be a supersequence of T,  T” needs to contain two sets of 4c stars someplace, and they will be consecutive (otherwise T” is too long).   Each other string in R will have it’s stars map to either the left set of 4c stars in T or the right set (they won’t split), otherwise T” is too long.

Suppose for some string in R, the stars go to the part of T” that maps the left set of stars in T.  Since there are no edge characters in T before the left set of stars, there needs to be characters added to T” to handle part a of the edge string.  (If the edge string’s stars map to the right end, we have a similar problem at the end of the string). We will also need a place to put the second vertex of the edge (part d of the string) before the copy of the edges in E.

The only way to add these extra symbols to T (forming T”) and not exceed K’ characters in length is to build something like T’ which starts with T, adds a vertex cover in the middle, and adds edges whose first vertex is not in the cover on one side of T, and edges whose second vertex is not in the cover on the other side.

Difficulty: 7.  It’s very hard to see why you need 2 copies of each edge, or why 4c is the number of stars you need.  But the idea of a construction where vertices are on the outside and edges are on the inside (in T) that needs to match up with strings with edges on the outside and vertices on the inside (in the rest of R) is something that I think students can follow, even if I doubt they could come up with it themselves.

Longest Common Subsequence

I’m going out of order here, since the next few problems are all related, and this is the easiest one, and is a good way to get thinking about this kind of problem.

The problem: Longest Common Subsequence.  This is problem SR10 in the appendix.

The description: Given a finite set of strings R over some finite alphabet Σ, and a positive integer K, can I find a string w over the alphabet Σ such that w has length K or more, and w is a subsequence of each string in R?

Example: Suppose R was the strings {aab, aaab, aaaba, aaaa}

Then the longest w we could choose that is a subsequence of everything in R wold be “aa”.  Notice that if we add “ba” to R, then the longest w we can make is length 1 (the string “a” or the string “b”)

Reduction: The paper by Maier uses VC.  (This is the “LCS” problem in the paper).  So we start with a graph G = (V,E)  and a K.  Our alphabet Σ has one character for each vertex in V.  R will consist of |E|+1 strings, based on a “template string” T consisting of the characters for all of the vertices in V, in some order. The first string in R is T itself.  Then, each edge (u,v) gets a string: Two copies of T concatenated.  The first copy has u removed, the second one has v removed.  Set K’ to |V|-K.

Suppose G has a VC, call it V’.  Then take T and remove all of the vertices in V’ from it.  This remaining string will be called T’ and is of size |V|-K.  We will show that T’ is a subsequence of each string in R.  It’s pretty clearly a substring of T.

Each other string in R is built from some edge (u,v) in E.  So either u or v is in V’.  So u will be missing from T’, and u will be missing from the first copy of T in the string in R, so T’ is a substring of that first copy.  Similarly, if v is in V’, then T’ is a substring of the second copy of T.

If we find a string T’ of size K’, then first notice that for each string in R made from an edge (u,v), T’ can’t have both u and v.  The reason is that if (say) u comes before v in T, then in the string in R, u will be missing in the first half of that string, and v will be missing in the second half.  So the occurrence of v will come first in the string and the occurrence of u will come second, so they will be “swapped”.

It’s much easier to see with an example.  Suppose we have 5 vertices, and T = 12345.  If we have the edge (2,4), then the string in R is 13451235.  T’ can’t contain 24 in that order, because the 4 comes before the 2 in the string.

So this means that if we let V’ be the vertices in V that are not in T’, we have K vertices and at least one of the characters corresponding to that vertex is in each string in R, and so at least one vertex from V’ is in each edge in E.  This forms a vertex cover of V.

Difficulty: 5.  I may have downgraded this difficulty after seeing the much harder SR8 and SR9, but I do think that it is pretty straightforward.

Protected: Set Basis

This content is password protected. To view it please enter your password below:

Protected: Minimum Test Set

This content is password protected. To view it please enter your password below:

Protected: Hitting Set

This content is password protected. To view it please enter your password below:

Protected: Min-Max Multicenter

This content is password protected. To view it please enter your password below:

Protected: Maximum Leaf Spanning Tree, Connected Dominating Set

This content is password protected. To view it please enter your password below:

Protected: Path Distinguishers

This content is password protected. To view it please enter your password below:

Protected: Kernel

This content is password protected. To view it please enter your password below:

Protected: Uniconnected Subgraph

This content is password protected. To view it please enter your password below: