Tag Archives: Difficulty 8

External Macro Data Compression

These next problems are related “macro compression” problems.

The problem: External Macro Data Compression.  This is problem SR22 in the appendix.

The description: Given a string s over some alphabet Σ, a “pointer cost” h and a bound B, can we find two strings D and C over the alphabet of Σ augmented with some number (< |s|) of “pointer characters pi such that:

  • |D| + |C| + (h-1)* (# of p characters in D and C) ≤ B
  • We can generate s by replacing pointer characters in C with their “meaning” in D.

Example: The concept they’re trying to get at isn’t that hard, it’s just hard to explain using mathematical language.  Suppose s = “ABCDEFABCGABCDEF”

Then we can define D to be “ABCDEF” and C to be “pqpGpq”.  The total size of this is 6 for D, plus 6 for C.  There are 5 pointers, so our total cost is 6+6+5h.

The idea is that the characters p and q are “pointers” that refer to substrings in D (p refers to “ABC” and q refers to “DEF”).  By replacing those pointers with what they “mean” in C, we can get back s.

A tricky part of this is that you are allowed to have substrings overlap.  So if s was “ABCDBCD” we could define D to be “ABCD” and C to be “pq” with p meaning “ABCD” and q meaning “BCD”.  Now our total cost is 4 for D, 2 for C, and 2 pointers, so 4+2+h.

Reduction:  The reduction (and the one for SR23) comes from a technical report by Storer, which is pretty dense.  I think we’re looking at “Theorem 2” in the paper, which is from VC<=3.

The alphabet that will be built from the VC instance has a lot of parts:

  • A special symbol $
  • 3 symbols vi, ai, and bi for each vertex vi in the graph
  • 4 more symbols fi,1,1 through fi,2,2 for each vertex vi in the graph
  • one symbol di for each edge ej in the graph.
  • 2 symbols c1 and c2 (there is actually one c symbol for each value of h.  We’ll assume h=2 here)
  • 3 symbols g1,1 g1,2 and g2,1 (This is also based on h.  So really you’d go from g1,1 through gh-1,2 and add gh,1)

The string s will also be built from parts (a lot of these are based on h has well, again, I’m fixing h=2 to keep it simpler)

  • Vi,l = ai$vi
  • Vi,2 = vi$bi
  • For each edge ei= (vj, vk), Ei = $vj$vj$
  • We also have Z1 = (c1)3 (so 3 copies of c1) and Z2 = (c2)3

s is:

  • Eidi  concatenated for each edge, followed by
  • Vi,jfi,j,k concatenated over each vertex and each possible f symbol, followed by
  • Z1g1,1Z1g1,2, followed by
  • Z2g2,1Z2

K’ = |s| + K – (7/2)|V|.

The basic idea from here is that if G has a VC, we can compress s by making pointers for (among other things) the combination of Vi,1fi,1,2Vi,2 and the combination of Vi,2fi,2,2Vi+1,1  where vertex vi is in the VC.  This lets us use the pointers both for the part of the string with the V’s and f’s, but also for the $vi$ strings in the E components, saving space.

In the other direction, if we have a compressed string of size K’, he shows that means that you have compress a string like the above and the overlaps show you the vertices in the cover.

Difficulty: 8.  I think to really get what’s happening, you need to actually build an example on a graph.  But I think the idea of building the sets so that that is the overlap you’re looking for is something that can be gotten eventually.

String-To-String Correction

This one reminds me of an old puzzle problem.

The problem: String-To-String Correction.  This is problem SR20 in the appendix.

The description: Given two strings x, and y over a finite alphabet ∑, and an integer K.  Can we start from x and derive y in K or fewer steps, where each step is either deleting a symbol or interchanging a symbol.

Example: The puzzles I’m thinking of are the ones that say “Can you go from HELLO to OLE in 6 steps?”

  • HELLO
  • ELLO
  • ELO
  • EOL
  • OEL
  • OLE

The reason this is a hard problem is when you have repeated symbols, and need to “choose” which ones to delete.  A slightly harder example is to go from ABABABABBB to BAAB.  Now which A’s and B’s you choose to delete have an effect on the total number of moves.

The paper by Wagner that has the reduction makes the point that if you replace “Delete” with “Insert”, you get basically the same problem but go from y to x instead of from x to y.  The paper gives other variants that have polynomial solutions (allowing insertions and deletions and changes of a character, whether or not you allow interchange).

Reduction: Wagner uses Set Covering.  So we’re given a collection of sets c1..cn all subsets of some set W, and an integer K.

Define t = |W| and r = t2.  Pick 3 symbols Q,R, and S that are not in W.  The string x  will be:

QrRciQrSr+1

..concatenated together each time for each ci.  (So the copy that has c1 is first, then another copy with c2, and so on.

The string y will be n copies of the substring RQr, followed by w, followed by n copies of the substring Sr+1

The K’ for this problem is (K+1)*r-1 + 2t(r-1) + n(n-1)(r+1)2/2 + d.

d is r*n+|c1…c2| – |W|.

From here, Wagner explains that the way the correction problem works is by deciding which copies of the R, Qr and Sr+1  sets in x match to the corresponding ones in y.  By counting all of the operations that “have” to be done (by deleting or moving characters), he shows that the a correction problem that has weight of K’ has to cross over certain ci sets to make a covering, and that there have to be K or less of those sets.

Difficulty: 8.  The construction is pretty crazy.  You can eventually see where the components come from, but I can’t imagine how he came up with these things (especially the x and y strings) in the first place.

Protected: Shortest Common Supersequence

This content is password protected. To view it please enter your password below:

Protected: Pruned Trie Space Minimization

This content is password protected. To view it please enter your password below:

Protected: Numerical 3-Dimensional Matching

This content is password protected. To view it please enter your password below:

Protected: 3-Matroid Intersection

This content is password protected. To view it please enter your password below:

Protected: Comparative Divisibility

This content is password protected. To view it please enter your password below:

Protected: Intersection Pattern

This content is password protected. To view it please enter your password below:

Protected: Set Basis

This content is password protected. To view it please enter your password below:

Protected: Planar 3-Satisfiability

This content is password protected. To view it please enter your password below: