Monthly Archives: August 2017

Rectilinear Picture Compression

This problem is apparently one of the more famous “unpublished manuscript” problems.  The reference in G&J is an unpublished manuscript by William Masek.  David Johnson mentioned in one of his NP-Completeness columns that Masek had “disappeared from the theory community”, but Johnson had the manuscript, and put out a call to Masek for permission to publish it as an article.  I guess that he never resurfaced, since I can’t find anything. I did find a different paper that explains the reduction though, so we’ll use that.

The problem: Rectilinear Picture Compression.  This is problem SR25 in the appendix.

Description: Given an NxN matrix M, consisting of 0’s and 1’s, and a positive integer K, can we find K or fewer rectangles in M that cover precisely all of the 1’s in M?

Example: Suppose M was:

1 0 0 1 1
1 0 1 1 1
0 0 1 1 0
0 0 1 1 1
1 0 0 0 1

Here is what I think the best way to draw rectangles is:

Notice that 1×1 rectangles are allowed, and by the way I read the problem definition you can have a cell that is in multiple rectangles.  (What you can’t have is a 0 inside a rectangle, or a 1 outside of a rectangle).

Reduction: The good news about Johnson’s NP-Competeness column is that it led me to a technical report by Conn and O’Rourke that explains most of the details of Masek’s reduction.  Which is a good thing, because it’s very complicated.  He basically defines shapes of 1’s as “wires” running through the matrix.  Here are the examples in the Conn and O’Rourke paper:

These wires will be used to represent boolean expressions (truth values are expressed as rectangles of 1’s: True is a 2×1 rectangle, false is a 1×2)  Masek then proves several lemmas about how many rectangles it takes to cover each figure, and gives an algorithm to show how to build a figure with the wire components for any boolean expression.  From the “how many rectangles does it take to cover each figure” lemmas, we know how many rectangles it will take to cover the figure if it is satisfiable, which becomes the K for the problem.

Difficulty: 9, maybe 10.  The Conn and O’Rourke paper doesn’t actually show the proofs of any of these lemmas, but I believe I could follow the reduction if I had access to the lemmas and the construction algorithm.

Regular Expression Substitution

This week’s problem is one of those “private communication” problems, so I had to create a reduction myself. I’m not confident it’s right, so let’s find out!

The problem: Regular Expression Substitution.  This is problem SR24 in the appendix.

The description:  Given two finite alphabets X and Y (possibly with different numbers of symbols, possibly with some overlap between the symbols), and a regular expression R over X∪ Y.  Also, we have one regular expression Ri over Y for each symbol in X, and a string in Y*.  Can we find a string z in the language of R, and a string wi in each of the Ri languages such that if we start with z, and replace each symbol xi with the string wi, the resulting string is w?

Example:  Here’s a pretty simple example.  Obviously you can use much more complicated regular expressions to make much more complicated things.

X = {a,b,c}.   Y = {1,2,3,4} w = 1231122334

R = X*Y (0 or more symbols from X, ending with a symbol from Y)

R1 = 1 + 11

R2 = 2+22

R3 = 3+33

z can be abcabc4

Each occurrence of a will be replaced by a string from R1 (1 the first time, 11 the second time), each occurrence of b will be replaced by a string from R2, and each occurrence of c will be replaced by a string from R3, getting us w back.

Reduction: G&J say to use X3C.  Here’s what I think I want to do:

We’re given an instance of X3C, so a set S (normally it’s X, but I don’t want to confuse it with the X in this problem) with 3q elements, and a collection C of 3-element subsets of X.  Our alphabet X will have one symbol for each element of each set in C (So an element will be something like ci,j for element i of set j.).  Our alphabet Y will have one symbol for each element in S.

z will be the string s0s1..s3q  (all the elements of S in some arbitrary order).

R will be the regular expression “Strings of length 3q over X that for each set in C either does not use that set at all, or uses all three elements in it exactly once”.  This is the part I’m not confident in.  I’m sure this is a description of a regular expression.  What I’m not sure of is whether this expression can be expressed in a form that is polynomial in the size of S and C.  I don’t know enough about how to minimize regular expressions to be able to answer that.

Anyway, from there, each expression Ri generates the single symbol in Y that is the corresponding element of the set in X (so the expression for the element in X c1,5 would be whatever the first element of set 5 in C is).

The idea is that you generate w by finding the string z  that has the correct elements of the correct sets. The rules for how you can make z give you constraints that your solution is a legal X3C solution.

Difficulty: 9.  Maybe it should be a 10, since I’m not really confident this is correct.  The real trouble I had coming up with a reduction was that strings in regular expressions have a fixed order, but the X3C problem just wants sets, which have no orderings.  So you need a way to either for the single string w to stand for any arrangement of symbols in S.  And while “uses each symbol in Y exactly once” is a regular expression, I’m not sure that can be written in polynomial time relative to the size of S and C either.  The only way I can think to do it offhand is by choosing between all (3q)! permutations of the symbols.

Internal Macro Data Compression

I’m finally back from my trip.  Apologies for the delay.

The problem: Internal Macro Data Compression.  This is problem SR23 in the appendix.

The description: Given an alphabet Σ, a string s in Σ*, a “pointer cost” h and a bound B, can we find a string C in the alphabet of Σ augmented with at most s “pointer characters” pi such that:

  • |C|+ (h-1)* (# of pointer characters in C) is ≤ B, and
  • We can replace pointer characters with substrings of C to regain s?

Example: This is similar to the last macro compression problem, but instead of having 2 strings C and D, C serves both purposes.  In our example from last time, we had s = “ABCDEFABCGABCDEF”, and came up with C= “pqpGpq” and D= “ABCDEF”.  The difference here is that there is no D string, we have to do it all in C.  We can do that by letting C = “ABCDEFqpGpq”.  Now if we let p=”ABC” (or, “the substring of C of length 3 starting at position 1” and q = “DEF” (or, “the substring of C of length 3 starting at position 4”, we get a string of cost 11+3h that gives us back s if we replace all pointers with the strings they represent.

Reduction: The paper by Storer that we’ve been using for the last few problems also has this reduction.  I think this is his “CPM” problem.  He uses the “K-Vertex Cover” problem, in the case where K=1.   Recall in that case we are looking for a set of edges E’ such that each edge in E shares a vertex with an edge in E’. We start with a graph, which is an instance of this problem.  The alphabet we build will have:

  • A special symbol $
  • One symbol for each vertex and edge in G

For each edge ei =  (vj, vk), define the substring Ei = $vj$vk$

Our s = the concatenation of eiEi for all edges i (So the symbol for the edge followed by the substring of the edge).  B = J + |S| – |E|.

The idea (and I’ll admit I have a hard time following this paper) is that if G has a 1-VC, then we have J edges that form a set E’ such that each edge in E is adjacent to something in E’.  This means that we can overlap those edges and make pointers out of the Ei strings to save enough characters.

Difficulty: 9.  I’m sure this can be explained better, but I really have a hard time following it.