I’m finally back from my trip. Apologies for the delay.
The problem: Internal Macro Data Compression. This is problem SR23 in the appendix.
The description: Given an alphabet Σ, a string s in Σ*, a “pointer cost” h and a bound B, can we find a string C in the alphabet of Σ augmented with at most s “pointer characters” pi such that:
- |C|+ (h-1)* (# of pointer characters in C) is ≤ B, and
- We can replace pointer characters with substrings of C to regain s?
Example: This is similar to the last macro compression problem, but instead of having 2 strings C and D, C serves both purposes. In our example from last time, we had s = “ABCDEFABCGABCDEF”, and came up with C= “pqpGpq” and D= “ABCDEF”. The difference here is that there is no D string, we have to do it all in C. We can do that by letting C = “ABCDEFqpGpq”. Now if we let p=”ABC” (or, “the substring of C of length 3 starting at position 1” and q = “DEF” (or, “the substring of C of length 3 starting at position 4”, we get a string of cost 11+3h that gives us back s if we replace all pointers with the strings they represent.
Reduction: The paper by Storer that we’ve been using for the last few problems also has this reduction. I think this is his “CPM” problem. He uses the “K-Vertex Cover” problem, in the case where K=1. Recall in that case we are looking for a set of edges E’ such that each edge in E shares a vertex with an edge in E’. We start with a graph, which is an instance of this problem. The alphabet we build will have:
- A special symbol $
- One symbol for each vertex and edge in G
For each edge ei = (vj, vk), define the substring Ei = $vj$vk$
Our s = the concatenation of eiEi for all edges i (So the symbol for the edge followed by the substring of the edge). B = J + |S| – |E|.
The idea (and I’ll admit I have a hard time following this paper) is that if G has a 1-VC, then we have J edges that form a set E’ such that each edge in E is adjacent to something in E’. This means that we can overlap those edges and make pointers out of the Ei strings to save enough characters.
Difficulty: 9. I’m sure this can be explained better, but I really have a hard time following it.