Monthly Archives: October 2017

Tableau Equivalence

This is another problem where I’m going to supplement G&J’s problem description with definitions from the paper by Aho, Sagiv, and Ullman that has the reduction:

The problem: Tableau Equivalence.  This is problem SR32 in the appendix.

Definitions: Given a set of attributes A, and set F of ordered pairs of subsets of A (functional dependencies).  Most database queries ask for certain attributes (these are the “distinguished variables” in the G&J definition) that fill requirements defined by other attributes and values (anything new added are the “undistinguished variables” in the G&J definition).

A Tableau is a matrix that represents these attributes and variables.  The columns correspond to attributes, and the rows correspond to tuples that are returned by the query.   The first row of the tableau is the “summary” of the tableau and holds the distinguished variables we want to return (and possibly some constants)

Tableau example: This example comes from p. 223 of the paper.  For the query:

Find all values for variables a1 and a2 such that we can find values for variables b1 through b4 such that the following strings are all in our database instance (called “I” in the paper):  By convention in the paper, a variables are distinguished variables, and b variables are undistinguished.

  • a1b1b3
  • b2a21  (That’s the constant 1 at the end)
  • b2b1b4

If I was the strings {111,222, 121}, then all assignments of 1’s and 2’s to a1 and a2 work:

  • If a1 and a2 are both 1, then assigning all b variables to 1 generates the string 111 in all 3 cases above, which is in I
  • If a1 = 1 and a2 = 2, then we can assign 2 to b1 and 1’s to all over b variables and get 121 in all cases, which is in I.

Before I get to the 2 harder cases, let me show the tableau:

A B C
a1 a2
a1 b1 b3
b2 a2 1
b2 b1 b4

The first row lists the attributes we’re considering (a1 comes from A and only occurs in the first spot in the result string, a2 comes from B and only occurs in the second spot in the result string.  Our query doesn’t want any variables from C.  The summary lists the variables from the attributes that form our distinguished variables.

The rows below the summary show how to build legal strings (like my list above).

So now we can use the tableau to help see how to find legal values to the variables:

  • If a1 = 2 and a2 = 1, then we need to assign a 1 to b2 to make the second(non-summary) row 121.  We also need to assign a 2 to b1 and b3 to make the first row 222.  This means we need to assign a 1 to b4 to make the bottom row 121.
  • If a1 and a2 are both 2, we need to set b2 to 1 to make the middle row 121.  We need to set b1 and b3 to 2 to make the top row 222.  This means we need to set b4 to 4 to make the bottom row 121.

The problem: Given two Tableaux T1 and T2 which share the same A, F, X, and Y sets (as defined above), are they equivalent?  That is, do they generate the same sets of legal values for their distinguished variables for all possible I sets?

Example: This gets tricky, partially because of the need to worry about “all possible I sets”, and partially because adding functional dependencies makes things equivalent in subtle ways.  Here is the example from page 230 of the paper:

A B C D
a1 a2 a3 a4
a1 b1 b2 b3
b4 b1 a3 b5
a1 a2 b6 b7
a1 b8 b9 a4

If we have the functional dependencies B->A and A->C are true.  Then all strings with b1 in the second character must have the same value (a1) in the first character.  Similarly, A->C implies that the entire third column can be replaced by a3.  This gives us the equivalent tableau:

A B C D
a1 a2 a3 a4
a1 b1 a3 b3
a1 b1 a3 b5
a1 a2 a3 b7
a1 b8 a3 a4

And also, if the only difference between two rows is that one row has nondistinguished variables that don’t appear anywhere else, that row can be eliminated.  So we can get rid of the first row (with b3) because it is just like the second (with b5) But then we realize that the a1b1a3b5 row only differs from the row below it in b1 and b5 which now only appear in that row.  So we can remove that row too, to get the equivalent:

A B C D
a1 a2 a3 a4
a1 a2 a3 b7
a1 b8 a3 a4

 

Reduction: From 3SAT.  Given a formula, we build 2 tableaux.  The set A will have 1 attribute for each clause and variable in the formula.  The clause variables will be distinguished variables.  Our T1 tableau will be set up as follows:

For each clause, we will set a common undistinguished variable for the 3 variables in the row (so each clause that has that variable will have the same undistinguished variable in that column), and separate (only used once) undistinguished variable in the other columns.

The T2 tableau will have 7 rows for each row in T1.  In T2 we replace the common undistinguished variables with 7 sets constants(0 or 1)  that are the ways to set the variables to make the clause true.

They prove a bunch of lemmas after this, but it boils down to: If we have a truth assignment for the formula, we can map that to the tableau by setting the common variables in T1 to the truth values in the assignment.  All clauses will be true in both T1 and T2.  If the tableaux are equivalent, then we must have found a way to set those common variables, and that gives us a truth assignment for the formula.

Difficulty: 7.  This isn’t that hard of a reduction.  Even the lemmas aren’t too hard, though they do depend on a paper’s worth of previous results in equivalences (like the functional dependency thing I did in the example).  But there is a ton of definitions to get through before you can start.

Conjunctive Boolean Query

This is another problem from the same paper as last week’s.  The paper this time though, doesn’t really give a reduction.  Hopefully, I think it’s easy enough for us to build it ourselves.

The problem: Conjunctive Boolean Query.  This is problem SR31 in the appendix.

The description: Given a conjunctive query in the format described last week, is the query true?  (That is, can we find any tuples to satisfy the query?)

Examples: Here is the “List all departments that sell pens and pencils” query from last week:

(x). Sales(x, pen) and Sales(x, pencil).

This problem would return true if there was an actual tuple in the database that could bind to x, and false otherwise.

Reduction: The paper by Chandra and Merlin that we used last week has the definition of this problem, but just says the NP-Completeness “follows from, say, the completeness of the clique problem for graphs”

But I think the reduction is pretty easy to spell out.  If we’re given an instance of Clique (a graph G and an integer K), we can just build the query:

“Does there exist k vertices x1 through xk such that all edges between any two vertices in the set exist?”

We can implement this as a database by creating a relation for all edges, and then the conjunctive query will have at most O(V2) clauses.

Dififculty: 3, but it’s a lot of work to understand the database definitions to get to the actual problem.

Conjunctive Query Foldability

Back on track with more database problems.  This is another one of those “really hard to explain in NP-Complete language” problems.

The problem: Conjunctive Query Foldability.  This is problem SR30 in the appendix.

The description: I find G&J’s definition very confusing, I think because they don’t have room to define what the various variables mean in terms of databases (they actually send you to the paper, as if they know it was confusing).  The definitions in the paper by Chandra and Merlin that has the reduction do a good job of giving names to things and building up slowly.  So:

They define a Relational Data Base as a finite domain D, and a set of relations R1 through Rs.  Each relation is an ordered tuple of elements of D.

A first-order query is of the form (x1, x2, …xk). Φ(x1, x2, ..xk) where Φ is a first-order formula whose only free variables are x1 through xk

The result of a query on a database B with domain D is a set of k-tuples {(y1..yk) ∈ Dk such that Φ(y1..yk) is true in B}.

A conjunctive query is a first-order query of the form (x1, ..xk).∃ xk+1,xk+2, …xm. A1∧A2..Ar, where each Ai is an atomic formula asking about a relation in a database, and each element in the relation is a variable form x1 to xm or a constant.

Examples of conjunctive queries: Here are some examples from the paper, so we can see how this relates to databases:

“List all departments that sell pens and pencils”.  If we have a relation Sales(Department, Item)  (“Department” and “Item” are placeholders for variables), the conjunctive query is:

(x). Sales(x, pen) and Sales(x, pencil).

In this case, there are no new variables (or ∃) added in the query, and “pen” and “pencil” are constants.

“List all second-level or higher managers in department K10” (People who are in X10 who manage someone who manages someone else).  If we have a relation Emp(Name, Salary, Manager, Dept), the conjunctive query is:

(x1).∃(x2, …, x9) .Emp(x2, x3, x4, x5) ∧Emp(x4, x6, x1, x7) ∧ Emp(x1, x8, x9, K10)

In this query, x1 is the “answer”, but depends on the existence of the other x variables.  In particular, x4, who is the employee x1 manages, and x2, who is the employee x4 manages.

Ok, now the actual foldability definition (this is written less formally than either the paper or G&J):

Given two conjunctive queries Q1 and Q2, can we create a function σ mapping all constants and variables x1 through xk to themselves (and possibly mapping other variables to constants or variables from x1 to xk as well), where if we replace each variable x  in Q1 with σ(Q1), we get Q2?

Example: Here is a “worse” version of the first example above:

(x).Sales(x, xpen) ∧ Sales(x,xpencil) ∧ xpen=pen ∧ xpencil = pencil

This can be “folded” into:

(x).Sales(x, pen) ∧ Sales(x,pencil) ∧ pen=pen ∧ pencil = pencil

..it still has those redundant equalities at the end, though, but I don’t see a way to use the definitions of foldability to get rid of them.  I think that the definition given in Chandra and Merlin’s paper might let you do it because their function maps the “natural model” of a query, which I think includes relations.

Reduction: From Graph Coloring.  We start with a graph and an integer K.  We build one variable for each vertex, and K additional (separate) vertices, in a set called C. The relational model R relates vertices connected by an edge.

Q1 is: (). \exists V.\exists C. (\bigwedge\limits_{(u,v) \in E} R(u,v) \wedge \bigwedge\limits_{(u,v)\in C, u\ne v} R(u,v))

Q2 is (). \exists C. \bigwedge\limits_{(u,v)\in C, u \ne v} R(u,v))

So what I think we’re trying to do is “fold” the variables in V into the variables in C, so that there is still a relation between each edge.  That only works if the vertices on each side of the edge are different colors because otherwise.the relation won’t show up (because of the u \ne v constraint).  But I’m not sure, and the proof in the paper stops after building this construction.

Difficulty: 9.  I’m pretty sure a lot of this is going over my head.