Tableau Equivalence

This is another problem where I’m going to supplement G&J’s problem description with definitions from the paper by Aho, Sagiv, and Ullman that has the reduction:

The problem: Tableau Equivalence.  This is problem SR32 in the appendix.

Definitions: Given a set of attributes A, and set F of ordered pairs of subsets of A (functional dependencies).  Most database queries ask for certain attributes (these are the “distinguished variables” in the G&J definition) that fill requirements defined by other attributes and values (anything new added are the “undistinguished variables” in the G&J definition).

A Tableau is a matrix that represents these attributes and variables.  The columns correspond to attributes, and the rows correspond to tuples that are returned by the query.   The first row of the tableau is the “summary” of the tableau and holds the distinguished variables we want to return (and possibly some constants)

Tableau example: This example comes from p. 223 of the paper.  For the query:

Find all values for variables a1 and a2 such that we can find values for variables b1 through b4 such that the following strings are all in our database instance (called “I” in the paper):  By convention in the paper, a variables are distinguished variables, and b variables are undistinguished.

  • a1b1b3
  • b2a21  (That’s the constant 1 at the end)
  • b2b1b4

If I was the strings {111,222, 121}, then all assignments of 1’s and 2’s to a1 and a2 work:

  • If a1 and a2 are both 1, then assigning all b variables to 1 generates the string 111 in all 3 cases above, which is in I
  • If a1 = 1 and a2 = 2, then we can assign 2 to b1 and 1’s to all over b variables and get 121 in all cases, which is in I.

Before I get to the 2 harder cases, let me show the tableau:

a1 a2
a1 b1 b3
b2 a2 1
b2 b1 b4

The first row lists the attributes we’re considering (a1 comes from A and only occurs in the first spot in the result string, a2 comes from B and only occurs in the second spot in the result string.  Our query doesn’t want any variables from C.  The summary lists the variables from the attributes that form our distinguished variables.

The rows below the summary show how to build legal strings (like my list above).

So now we can use the tableau to help see how to find legal values to the variables:

  • If a1 = 2 and a2 = 1, then we need to assign a 1 to b2 to make the second(non-summary) row 121.  We also need to assign a 2 to b1 and b3 to make the first row 222.  This means we need to assign a 1 to b4 to make the bottom row 121.
  • If a1 and a2 are both 2, we need to set b2 to 1 to make the middle row 121.  We need to set b1 and b3 to 2 to make the top row 222.  This means we need to set b4 to 4 to make the bottom row 121.

The problem: Given two Tableaux T1 and T2 which share the same A, F, X, and Y sets (as defined above), are they equivalent?  That is, do they generate the same sets of legal values for their distinguished variables for all possible I sets?

Example: This gets tricky, partially because of the need to worry about “all possible I sets”, and partially because adding functional dependencies makes things equivalent in subtle ways.  Here is the example from page 230 of the paper:

a1 a2 a3 a4
a1 b1 b2 b3
b4 b1 a3 b5
a1 a2 b6 b7
a1 b8 b9 a4

If we have the functional dependencies B->A and A->C are true.  Then all strings with b1 in the second character must have the same value (a1) in the first character.  Similarly, A->C implies that the entire third column can be replaced by a3.  This gives us the equivalent tableau:

a1 a2 a3 a4
a1 b1 a3 b3
a1 b1 a3 b5
a1 a2 a3 b7
a1 b8 a3 a4

And also, if the only difference between two rows is that one row has nondistinguished variables that don’t appear anywhere else, that row can be eliminated.  So we can get rid of the first row (with b3) because it is just like the second (with b5) But then we realize that the a1b1a3b5 row only differs from the row below it in b1 and b5 which now only appear in that row.  So we can remove that row too, to get the equivalent:

a1 a2 a3 a4
a1 a2 a3 b7
a1 b8 a3 a4


Reduction: From 3SAT.  Given a formula, we build 2 tableaux.  The set A will have 1 attribute for each clause and variable in the formula.  The clause variables will be distinguished variables.  Our T1 tableau will be set up as follows:

For each clause, we will set a common undistinguished variable for the 3 variables in the row (so each clause that has that variable will have the same undistinguished variable in that column), and separate (only used once) undistinguished variable in the other columns.

The T2 tableau will have 7 rows for each row in T1.  In T2 we replace the common undistinguished variables with 7 sets constants(0 or 1)  that are the ways to set the variables to make the clause true.

They prove a bunch of lemmas after this, but it boils down to: If we have a truth assignment for the formula, we can map that to the tableau by setting the common variables in T1 to the truth values in the assignment.  All clauses will be true in both T1 and T2.  If the tableaux are equivalent, then we must have found a way to set those common variables, and that gives us a truth assignment for the formula.

Difficulty: 7.  This isn’t that hard of a reduction.  Even the lemmas aren’t too hard, though they do depend on a paper’s worth of previous results in equivalences (like the functional dependency thing I did in the example).  But there is a ton of definitions to get through before you can start.

Conjunctive Boolean Query

This is another problem from the same paper as last week’s.  The paper this time though, doesn’t really give a reduction.  Hopefully, I think it’s easy enough for us to build it ourselves.

The problem: Conjunctive Boolean Query.  This is problem SR31 in the appendix.

The description: Given a conjunctive query in the format described last week, is the query true?  (That is, can we find any tuples to satisfy the query?)

Examples: Here is the “List all departments that sell pens and pencils” query from last week:

(x). Sales(x, pen) and Sales(x, pencil).

This problem would return true if there was an actual tuple in the database that could bind to x, and false otherwise.

Reduction: The paper by Chandra and Merlin that we used last week has the definition of this problem, but just says the NP-Completeness “follows from, say, the completeness of the clique problem for graphs”

But I think the reduction is pretty easy to spell out.  If we’re given an instance of Clique (a graph G and an integer K), we can just build the query:

“Does there exist k vertices x1 through xk such that all edges between any two vertices in the set exist?”

We can implement this as a database by creating a relation for all edges, and then the conjunctive query will have at most O(V2) clauses.

Dififculty: 3, but it’s a lot of work to understand the database definitions to get to the actual problem.

Conjunctive Query Foldability

Back on track with more database problems.  This is another one of those “really hard to explain in NP-Complete language” problems.

The problem: Conjunctive Query Foldability.  This is problem SR30 in the appendix.

The description: I find G&J’s definition very confusing, I think because they don’t have room to define what the various variables mean in terms of databases (they actually send you to the paper, as if they know it was confusing).  The definitions in the paper by Chandra and Merlin that has the reduction do a good job of giving names to things and building up slowly.  So:

They define a Relational Data Base as a finite domain D, and a set of relations R1 through Rs.  Each relation is an ordered tuple of elements of D.

A first-order query is of the form (x1, x2, …xk). Φ(x1, x2, ..xk) where Φ is a first-order formula whose only free variables are x1 through xk

The result of a query on a database B with domain D is a set of k-tuples {(y1..yk) ∈ Dk such that Φ(y1..yk) is true in B}.

A conjunctive query is a first-order query of the form (x1, ..xk).∃ xk+1,xk+2, …xm. A1∧A2..Ar, where each Ai is an atomic formula asking about a relation in a database, and each element in the relation is a variable form x1 to xm or a constant.

Examples of conjunctive queries: Here are some examples from the paper, so we can see how this relates to databases:

“List all departments that sell pens and pencils”.  If we have a relation Sales(Department, Item)  (“Department” and “Item” are placeholders for variables), the conjunctive query is:

(x). Sales(x, pen) and Sales(x, pencil).

In this case, there are no new variables (or ∃) added in the query, and “pen” and “pencil” are constants.

“List all second-level or higher managers in department K10” (People who are in X10 who manage someone who manages someone else).  If we have a relation Emp(Name, Salary, Manager, Dept), the conjunctive query is:

(x1).∃(x2, …, x9) .Emp(x2, x3, x4, x5) ∧Emp(x4, x6, x1, x7) ∧ Emp(x1, x8, x9, K10)

In this query, x1 is the “answer”, but depends on the existence of the other x variables.  In particular, x4, who is the employee x1 manages, and x2, who is the employee x4 manages.

Ok, now the actual foldability definition (this is written less formally than either the paper or G&J):

Given two conjunctive queries Q1 and Q2, can we create a function σ mapping all constants and variables x1 through xk to themselves (and possibly mapping other variables to constants or variables from x1 to xk as well), where if we replace each variable x  in Q1 with σ(Q1), we get Q2?

Example: Here is a “worse” version of the first example above:

(x).Sales(x, xpen) ∧ Sales(x,xpencil) ∧ xpen=pen ∧ xpencil = pencil

This can be “folded” into:

(x).Sales(x, pen) ∧ Sales(x,pencil) ∧ pen=pen ∧ pencil = pencil still has those redundant equalities at the end, though, but I don’t see a way to use the definitions of foldability to get rid of them.  I think that the definition given in Chandra and Merlin’s paper might let you do it because their function maps the “natural model” of a query, which I think includes relations.

Reduction: From Graph Coloring.  We start with a graph and an integer K.  We build one variable for each vertex, and K additional (separate) vertices, in a set called C. The relational model R relates vertices connected by an edge.

Q1 is: (). \exists V.\exists C. (\bigwedge\limits_{(u,v) \in E} R(u,v) \wedge \bigwedge\limits_{(u,v)\in C, u\ne v} R(u,v))

Q2 is (). \exists C. \bigwedge\limits_{(u,v)\in C, u \ne v} R(u,v))

So what I think we’re trying to do is “fold” the variables in V into the variables in C, so that there is still a relation between each edge.  That only works if the vertices on each side of the edge are different colors because otherwise.the relation won’t show up (because of the u \ne v constraint).  But I’m not sure, and the proof in the paper stops after building this construction.

Difficulty: 9.  I’m pretty sure a lot of this is going over my head.

Additional Key

After skipping around it for a while, we’re finally ready to return to this problem.

The problem: Additional Key.  This is problem SR27 in the appendix.

The description: Given a set A of attributes, and a set F of functional dependencies, a subset R of A, and a set K of keys on R,  can we find an additional key on R for F?  (Which is an additional subset R’ of R, that is not in K, that serves as a key for F)

Example: This example is from the Beeri and Bernstein paper that had last week’s reduction, and has this week’s as well:

  • ({A,B), {C}
  • ({C}, {B}

If R = A = {A,B,C}, and K = the single key {A,B}, then {A,C} is an additional key.

Reduction: The Beeri and Bernstein paper uses a similar construction from last week’s.  Again, we reduce from Hitting Set.

They build the same construction as in the BCNF violation reduction.  Recall that the set was set up so that there was a BCNF violation if there was a hitting set, and that violation happened because some but not all attributes were functionally dependent on the hitting set.  So if we add those extra attributes to the set of attributes in the hitting set, we get an additional key.

That’s the basic idea, but the paper has some parts I have trouble following (there are some notation issues I can’t parse, I think.  For example, it’s not immediately obvious to me what K is for this problem.  I think it’s the set of new attributes, but I’m not sure.

Difficulty: 9 One step harder than the last problem, since it builds on it.

Boyce-Codd Normal Form Violation

This week’s and next week’s problems use basically the same construction in their reduction.  We’re going to do them in reverse order, because that’s the order they’re presented in the paper.

The problem: Boyce-Codd Normal Form Violation.  This is problem SR29 in the appendix.

The description: Given a set of attributes A, a collection F of functional dependencies, and a subset A’ of A, does A’ violate Boyce-Codd normal form?

G&J’s definition of BCNF is: A’ violates BCNF if we can find a subset X of A’ and 2 attributes y and z in A’-X such that the pair (X,y) is in the closure of F, and (X,z) is not in the closure of F.

The paper by Beeri and Bernstein define BCNF as: for all disjoint nonempty sets of attributes X and Y in A, If Y is functionally dependent on X, then all attributes in A depend on X.

Example: Here is an example from the Beeri and Bernstein paper that has the reduction:

Let A = {Apartment_Type, Address, Size, Landlord, Apartment_Num, Rent, Tenant_Name}

(I’m pretty sure “Apartment_Type” means something like “Two Bedroom”, “Size” is in square feet, “Address” is the street address of the apartment building, at which there are potentially many apartments each with a different “Apt_Num”)

With these functional dependencies:

  • ({Apartment_Type, Address}, {Size})
  • ({Address}, {Landlord})
  • ({Address, Apt_Num},{Rent})
  • ({Name}, {Address, Apt_Num})

If A’ is {Name, Address, Apt_Num, Rent} then BCNF is violated because Rent is functionally dependent on Name (through the pair of attributes {Address, Apartment_Num}, but other attributes (for example, Size) is not.

Reduction: Beeri and Bernstein reduce from Hitting Set.  So we start with a set S, and a collection C of subsets of S, and an integer K.  We need to create a set of attributes and functional dependencies.

Each element in S will be an attribute in A.  If we have a hitting set S’ of the attributes, then we could create a BCNF violation by making some of the attributes (but not all) of A functionally dependent on S’.

So we create a new attribute for each subset in C.  We create dependencies between the element attribute of each element in each C and its “set” attribute.  Now if we have a hitting set S’, then that set is functionally dependent on each element attribute.  So then if we make some but not all attributes in A dependent on the set of all element attributes, we get a BCNF violation.

That’s the basic idea, but the actual proof has a lot of special cases to worry about and resolve.

Difficulty: 8.  The general idea isn’t too hard (though you have to learn the definition of BCNF to even get started), but there is a ton of details and special cases to worry about to do it right.

Prime Attribute Name

We’re skipping ahead a bit because this reduction is from the same paper as last week’s (and reduces from last week’s problem).

The problem: Prime Attribute Name.  This is problem SR28 in the appendix.

The description: Given a set A of attribute names, and a collection F of functional dependencies like in the last problem, and a specific attribute x in A, is x in some key K for the relational system <A,F>?  (Recall that keys need to be of minimal size).

Example: Using the example from last week- the list of attributes was {Name, ID, Course, Professor, Time}.  The functional dependencies were:

  • ({Name},{ID})  (If you know a student’s name, you can figure out their ID)
  • ({ID},{Name})  (If you know a student’s ID, you can figure out their name)
  • ({Name, Course}, {Professor, Time}) (If you know a student’s name and a course they are in, you can figure out the professor teaching it and what time the course is)
  • ({Professor, Time), {Course}) (If you know the professor of the course and the time it is being taught, you can figure out the course being taught.  Note that you can’t figure out the student because more than one student will be in the course)

Recall that we said the pair of attributes {Name, Course} was a key for this system.  (So, if x was “Key” or “Course”, a solver for this problem should answer “yes”.)  But since Name and ID are related, ID is also a prime attribute.  I think  Professor and Time are not prime, because any subset of A that includes them and has the required characteristic is not minimal (If you take Professor, you’d have to add two other things- for example Course and ID).

Reduction: This reduction is in the same paper that has the Minimum Cardinality Key reduction, and in fact reduces from Minimum Cardinality Key.  Their terminology differs from what I typically use, but I’ll try to do it their way to be close to the paper.

The MCK instance starts with a set of attributes A’, a set of relations D[0]’ (what they call F), and a key size m (what we call k).  We need to build an instance of the PAN problem: a set of attributes A, a new set of functional dependencies D[0], and a specific attribute we want to check: b (what our definition calls x).

First, they define a set A”, which is “a set whose cardinality if the smaller of |A’| and m”. (I guess of all new elements).  They define A to be A”xA’ unioned with A’ unioned with a new attribute b.  They define D[0] with the following characteristics:

  • For all subsets E and F of A’, if E relates to F in D[0]’, E unioned with B relates to F in D[0]
  • A’ unioned with b relates to A”xA’
  • for each ordered pair (i,e) in A”xA’, b unioned with (i,e) relates to e.
  • for each element i in A’ and all distinct e,f in A, the set {(i,e), (i,f)} relates to b.
  • Each element e in A’ relates to b.

The idea now is that if D[0] has a minimal key that contains b, you need a bunch (N) of the (i,e) sets as well in the key to distinguish which ones relate to b.  Then the A’ part of the ordered pair gives you n elements in A’ that are a key for A’.

If A’ has a key of cardinality n <= m, then we know that A” has at least n elements, and adding b to the pairs of items in A’ and A” gets you a key for A.

Also, b has to be in the key for A, since removing it gets you a set that is not D[0] expandible,

Difficulty: 9.  This is really a lot to digest.  I’m not entirely sure I get what’s happening myself.

Minimum Cardinality Key

Now we enter the “Database Problems” subsection of the chapter.  I’ll admit that I’ve had trouble with a lot of these, partially because databases isn’t an area I know a ton about, but also because the definitions of the problems often obscure what’s really happening.

The problem: Minimum Cardinality Key.  This is problem SR26 in the appendix.

The description: Given a set A of attributes, a collection F of ordered pairs of subsets of A (“functional dependencies”), and a positive integer M, can we find a set K of A attributes or less that form a key for the relational system <A,F>.

G&J defines “form a key” as the ordered pair (K,A) belongs to the closure F* of F, with the following properties:

  • F ⊆ F*
  • B⊆C⊆A implies (C,B) ⊆ F*  (the “projective” closure)
  • (B,C) and (C,D) ⊆ F* implies (B,D) ⊆ F*  (the “transitive” closure)
  • (B.C) and (B,D) ⊆ F* implies (B,C∪D) ⊆ F* (the “additive” closure)

Example: The paper by Lucchesi and Osborn that has the reduction has a good example:

Suppose we have a database table for student records.  The attribute set A could be {Name, ID, Course, Professor, Time}.  Some functional dependencies, written as ordered pairs, could be:

  • ({Name},{ID})  (If you know a student’s name, you can figure out their ID)
  • ({ID},{Name})  (If you know a student’s ID, you can figure out their name)
  • ({Name, Course}, {Professor, Time}) (If you know a student’s name and a course they are in, you can figure out the professor teaching it and what time the course is)
  • ({Professor, Time), {Course}) (If you know the professor of the course and the time it is being taught, you can figure out the course being taught.  Note that you can’t figure out the student because more than one student will be in the course)

If these relations are F, then we can derive all of the attributes just from knowing the name of the student and the course they are taking.  In other words, F* contains the pair ({name, course}, (name, ID, course, professor, time).

Reduction:  The Lucchesi and Osborn paper use different terminology than G&J’s.  They define a relation D, defined in a series of inductive steps, where D[0] is F, and each D[i] is created from D[i-1] by doing one step of one of the closures on the elements in D[i-1].  The final closure (what G&J call F*) they call D.

They also define an expansion of a relation D: Given a subset B of our attribute set A, if a relation (L,R) is in D, where L is in B, but R isn’t, the set B∪R is a D-expansion of B.

They first show that if B is a subset of A, and B’ is a D[0]-expansion of B, then (B,B’) and (B’,B) are in D.  They then show that a subset B of A is D[i]-expansible  (and thus D-expansible) if and only if it is D[0]-expansible.

Then, they do the reduction from VC. A will have one attribute for each vertex in G.  D[0] (G&J’s F) will, for each vertex v, relate the set of v’s neighbors to v.  They show that a subset K of A is a key if and only if it’s a vertex cover of V.  They do this by induction: if K=A, it’s obviously a cover.  If K is smaller than A, then K is a key if and only if there is a D[0] expansion that is also a key. K has a D[0] expansion if and only if there is some vertex v not in K such that all of the vertices adjacent to v are reachable by K.  So, K is a key (and also a cover of A).

Difficulty: 8. There must be an easier way to explain this problem.

Rectilinear Picture Compression

This problem is apparently one of the more famous “unpublished manuscript” problems.  The reference in G&J is an unpublished manuscript by William Masek.  David Johnson mentioned in one of his NP-Completeness columns that Masek had “disappeared from the theory community”, but Johnson had the manuscript, and put out a call to Masek for permission to publish it as an article.  I guess that he never resurfaced, since I can’t find anything. I did find a different paper that explains the reduction though, so we’ll use that.

The problem: Rectilinear Picture Compression.  This is problem SR25 in the appendix.

Description: Given an NxN matrix M, consisting of 0’s and 1’s, and a positive integer K, can we find K or fewer rectangles in M that cover precisely all of the 1’s in M?

Example: Suppose M was:

1 0 0 1 1
1 0 1 1 1
0 0 1 1 0
0 0 1 1 1
1 0 0 0 1

Here is what I think the best way to draw rectangles is:

Notice that 1×1 rectangles are allowed, and by the way I read the problem definition you can have a cell that is in multiple rectangles.  (What you can’t have is a 0 inside a rectangle, or a 1 outside of a rectangle).

Reduction: The good news about Johnson’s NP-Competeness column is that it led me to a technical report by Conn and O’Rourke that explains most of the details of Masek’s reduction.  Which is a good thing, because it’s very complicated.  He basically defines shapes of 1’s as “wires” running through the matrix.  Here are the examples in the Conn and O’Rourke paper:

These wires will be used to represent boolean expressions (truth values are expressed as rectangles of 1’s: True is a 2×1 rectangle, false is a 1×2)  Masek then proves several lemmas about how many rectangles it takes to cover each figure, and gives an algorithm to show how to build a figure with the wire components for any boolean expression.  From the “how many rectangles does it take to cover each figure” lemmas, we know how many rectangles it will take to cover the figure if it is satisfiable, which becomes the K for the problem.

Difficulty: 9, maybe 10.  The Conn and O’Rourke paper doesn’t actually show the proofs of any of these lemmas, but I believe I could follow the reduction if I had access to the lemmas and the construction algorithm.

Regular Expression Substitution

This week’s problem is one of those “private communication” problems, so I had to create a reduction myself. I’m not confident it’s right, so let’s find out!

The problem: Regular Expression Substitution.  This is problem SR24 in the appendix.

The description:  Given two finite alphabets X and Y (possibly with different numbers of symbols, possibly with some overlap between the symbols), and a regular expression R over X∪ Y.  Also, we have one regular expression Ri over Y for each symbol in X, and a string in Y*.  Can we find a string z in the language of R, and a string wi in each of the Ri languages such that if we start with z, and replace each symbol xi with the string wi, the resulting string is w?

Example:  Here’s a pretty simple example.  Obviously you can use much more complicated regular expressions to make much more complicated things.

X = {a,b,c}.   Y = {1,2,3,4} w = 1231122334

R = X*Y (0 or more symbols from X, ending with a symbol from Y)

R1 = 1 + 11

R2 = 2+22

R3 = 3+33

z can be abcabc4

Each occurrence of a will be replaced by a string from R1 (1 the first time, 11 the second time), each occurrence of b will be replaced by a string from R2, and each occurrence of c will be replaced by a string from R3, getting us w back.

Reduction: G&J say to use X3C.  Here’s what I think I want to do:

We’re given an instance of X3C, so a set S (normally it’s X, but I don’t want to confuse it with the X in this problem) with 3q elements, and a collection C of 3-element subsets of X.  Our alphabet X will have one symbol for each element of each set in C (So an element will be something like ci,j for element i of set j.).  Our alphabet Y will have one symbol for each element in S.

z will be the string s0s1..s3q  (all the elements of S in some arbitrary order).

R will be the regular expression “Strings of length 3q over X that for each set in C either does not use that set at all, or uses all three elements in it exactly once”.  This is the part I’m not confident in.  I’m sure this is a description of a regular expression.  What I’m not sure of is whether this expression can be expressed in a form that is polynomial in the size of S and C.  I don’t know enough about how to minimize regular expressions to be able to answer that.

Anyway, from there, each expression Ri generates the single symbol in Y that is the corresponding element of the set in X (so the expression for the element in X c1,5 would be whatever the first element of set 5 in C is).

The idea is that you generate w by finding the string z  that has the correct elements of the correct sets. The rules for how you can make z give you constraints that your solution is a legal X3C solution.

Difficulty: 9.  Maybe it should be a 10, since I’m not really confident this is correct.  The real trouble I had coming up with a reduction was that strings in regular expressions have a fixed order, but the X3C problem just wants sets, which have no orderings.  So you need a way to either for the single string w to stand for any arrangement of symbols in S.  And while “uses each symbol in Y exactly once” is a regular expression, I’m not sure that can be written in polynomial time relative to the size of S and C either.  The only way I can think to do it offhand is by choosing between all (3q)! permutations of the symbols.

Internal Macro Data Compression

I’m finally back from my trip.  Apologies for the delay.

The problem: Internal Macro Data Compression.  This is problem SR23 in the appendix.

The description: Given an alphabet Σ, a string s in Σ*, a “pointer cost” h and a bound B, can we find a string C in the alphabet of Σ augmented with at most s “pointer characters” pi such that:

  • |C|+ (h-1)* (# of pointer characters in C) is ≤ B, and
  • We can replace pointer characters with substrings of C to regain s?

Example: This is similar to the last macro compression problem, but instead of having 2 strings C and D, C serves both purposes.  In our example from last time, we had s = “ABCDEFABCGABCDEF”, and came up with C= “pqpGpq” and D= “ABCDEF”.  The difference here is that there is no D string, we have to do it all in C.  We can do that by letting C = “ABCDEFqpGpq”.  Now if we let p=”ABC” (or, “the substring of C of length 3 starting at position 1” and q = “DEF” (or, “the substring of C of length 3 starting at position 4”, we get a string of cost 11+3h that gives us back s if we replace all pointers with the strings they represent.

Reduction: The paper by Storer that we’ve been using for the last few problems also has this reduction.  I think this is his “CPM” problem.  He uses the “K-Vertex Cover” problem, in the case where K=1.   Recall in that case we are looking for a set of edges E’ such that each edge in E shares a vertex with an edge in E’. We start with a graph, which is an instance of this problem.  The alphabet we build will have:

  • A special symbol $
  • One symbol for each vertex and edge in G

For each edge ei =  (vj, vk), define the substring Ei = $vj$vk$

Our s = the concatenation of eiEi for all edges i (So the symbol for the edge followed by the substring of the edge).  B = J + |S| – |E|.

The idea (and I’ll admit I have a hard time following this paper) is that if G has a 1-VC, then we have J edges that form a set E’ such that each edge in E is adjacent to something in E’.  This means that we can overlap those edges and make pointers out of the Ei strings to save enough characters.

Difficulty: 9.  I’m sure this can be explained better, but I really have a hard time following it.