This is another problem where I’m going to supplement G&J’s problem description with definitions from the paper by Aho, Sagiv, and Ullman that has the reduction:
The problem: Tableau Equivalence. This is problem SR32 in the appendix.
Definitions: We are given a set of attributes A, and a set F of ordered pairs of subsets of A (the functional dependencies). Most database queries ask for certain attributes (these are the “distinguished variables” in the G&J definition) that meet requirements defined by other attributes and values (anything new that gets added is an “undistinguished variable” in the G&J definition).
A tableau is a matrix that represents these attributes and variables. The columns correspond to attributes, and the rows correspond to tuples that are returned by the query. The first row of the tableau is the “summary” of the tableau and holds the distinguished variables we want to return (and possibly some constants).
Tableau example: This example comes from p. 223 of the paper. By convention in the paper, a variables are distinguished variables, and b variables are undistinguished. For the query:
Find all values for variables a_{1} and a_{2} such that we can find values for variables b_{1} through b_{4} such that the following strings are all in our database instance (called “I” in the paper):
- a_{1}b_{1}b_{3}
- b_{2}a_{2}1 (That’s the constant 1 at the end)
- b_{2}b_{1}b_{4}
If I were the set of strings {111, 222, 121}, then all assignments of 1’s and 2’s to a_{1} and a_{2} work:
- If a_{1} and a_{2} are both 1, then assigning all b variables to 1 generates the string 111 in all 3 cases above, which is in I.
- If a_{1} = 1 and a_{2} = 2, then we can assign 2 to b_{1} and 1’s to all other b variables and get 121 in all cases, which is in I.
Before I get to the 2 harder cases, let me show the tableau:
A | B | C |
a_{1} | a_{2} | |
a_{1} | b_{1} | b_{3} |
b_{2} | a_{2} | 1 |
b_{2} | b_{1} | b_{4} |
The header row lists the attributes we’re considering: a_{1} comes from A and only occurs in the first spot in the result string, and a_{2} comes from B and only occurs in the second spot in the result string. Our query doesn’t want any variables from C, so that spot in the summary is blank. The summary (the row right under the header) lists the variables from the attributes that form our distinguished variables.
The rows below the summary show how to build legal strings (like my list above).
So now we can use the tableau to help see how to find legal values for the variables:
- If a_{1} = 2 and a_{2} = 1, then we need to assign a 1 to b_{2} to make the second (non-summary) row 111, since that is the only string in I ending in 11. We also need to assign a 2 to b_{1} and b_{3} to make the first row 222. This means we need to assign a 1 to b_{4} to make the bottom row 121.
- If a_{1} and a_{2} are both 2, we need to set b_{2} to 1 to make the middle row 121. We need to set b_{1} and b_{3} to 2 to make the top row 222. This means we need to set b_{4} to 1 to make the bottom row 121.
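The case analysis above can be double-checked by brute force. This is just a sketch of my own, not anything from the paper: the names `evaluate`, `rows`, and `summary` are made up for illustration, and I'm assuming variables only take values from the symbols that appear in I.

```python
from itertools import product

# The database instance and the tableau rows from the example above.
I = {"111", "222", "121"}
rows = [("a1", "b1", "b3"), ("b2", "a2", "1"), ("b2", "b1", "b4")]
summary = ("a1", "a2")

def evaluate(rows, summary, instance):
    """Return every assignment to the summary variables that the tableau allows."""
    symbols = sorted({ch for s in instance for ch in s})
    # Anything starting with 'a' or 'b' is a variable; everything else is a constant.
    variables = sorted({c for row in rows for c in row if c[0] in "ab"})
    results = set()
    for values in product(symbols, repeat=len(variables)):
        env = dict(zip(variables, values))
        # Constants are not in env, so env.get(c, c) leaves them alone.
        if all("".join(env.get(c, c) for c in row) in instance for row in rows):
            results.add(tuple(env[v] for v in summary))
    return results

answers = evaluate(rows, summary, I)
print(sorted(answers))  # [('1', '1'), ('1', '2'), ('2', '1'), ('2', '2')]
```

All four combinations of 1’s and 2’s show up, which matches the four cases worked out by hand.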
The problem: Given two Tableaux T_{1} and T_{2} which share the same A, F, X, and Y sets (A and F as defined above; in G&J’s definition X is the set of distinguished variables and Y is the set of undistinguished variables), are they equivalent? That is, do they generate the same sets of legal values for their distinguished variables for all possible I sets?
Example: This gets tricky, partially because of the need to worry about “all possible I sets”, and partially because adding functional dependencies makes things equivalent in subtle ways. Here is the example from page 230 of the paper:
A | B | C | D |
a_{1} | a_{2} | a_{3} | a_{4} |
a_{1} | b_{1} | b_{2} | b_{3} |
b_{4} | b_{1} | a_{3} | b_{5} |
a_{1} | a_{2} | b_{6} | b_{7} |
a_{1} | b_{8} | b_{9} | a_{4} |
Suppose the functional dependencies B->A and A->C hold. Then all strings with b_{1} as the second character must have the same value (a_{1}) as the first character, so b_{4} becomes a_{1}. Now every row has a_{1} in the first column, so A->C implies that the entire third column can be replaced by a_{3}. This gives us the equivalent tableau:
A | B | C | D |
a_{1} | a_{2} | a_{3} | a_{4} |
a_{1} | b_{1} | a_{3} | b_{3} |
a_{1} | b_{1} | a_{3} | b_{5} |
a_{1} | a_{2} | a_{3} | b_{7} |
a_{1} | b_{8} | a_{3} | a_{4} |
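Here is a rough sketch of that replacement step in code. It is my own illustration, not the paper's algorithm: the `chase` name is made up, I only handle single-attribute dependencies, and I assume the only symbols that ever get merged are b variables into a variables or into lower-sorting b variables (so the summary never changes and constants never clash).

```python
def chase(rows, fds, attrs):
    """Merge symbols that the functional dependencies force to be equal."""
    rows = [list(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            i, j = attrs.index(lhs), attrs.index(rhs)
            for r in rows:
                for s in rows:
                    # Rows that agree on lhs must agree on rhs.
                    if r[i] == s[i] and r[j] != s[j]:
                        # "a..." sorts before "b...", so a variables win.
                        keep, drop = sorted((r[j], s[j]))
                        for row in rows:
                            for k, sym in enumerate(row):
                                if sym == drop:
                                    row[k] = keep
                        changed = True
    return [tuple(r) for r in rows]

attrs = ["A", "B", "C", "D"]
before = [
    ("a1", "b1", "b2", "b3"),
    ("b4", "b1", "a3", "b5"),
    ("a1", "a2", "b6", "b7"),
    ("a1", "b8", "b9", "a4"),
]
after = chase(before, [("B", "A"), ("A", "C")], attrs)
for row in after:
    print(" | ".join(row))
```

Running this on the page 230 tableau gives back the four non-summary rows of the tableau just shown.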
Also, if the only difference between two rows is that one row has nondistinguished variables that don’t appear anywhere else, that row can be eliminated. So we can get rid of the first row (with b_{3}) because it is just like the second (with b_{5}). But then we realize that the a_{1}b_{1}a_{3}b_{5} row only differs from the row below it in b_{1} and b_{5}, which now only appear in that row. So we can remove that row too, to get the equivalent:
A | B | C | D |
a_{1} | a_{2} | a_{3} | a_{4} |
a_{1} | a_{2} | a_{3} | b_{7} |
a_{1} | b_{8} | a_{3} | a_{4} |
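The row elimination rule can be sketched the same way. Again this is my own illustration (the `find_redundant` and `minimize` names are hypothetical), and it only implements the rule as I stated it: a row can go if, wherever it disagrees with some other row, it holds a b variable that appears nowhere else in the tableau.

```python
from collections import Counter

def find_redundant(rows):
    """Return the index of a row that the rule lets us drop, or None."""
    counts = Counter(sym for row in rows for sym in row)
    for i, r in enumerate(rows):
        for j, s in enumerate(rows):
            if i != j and all(
                x == y or (x.startswith("b") and counts[x] == 1)
                for x, y in zip(r, s)
            ):
                return i
    return None

def minimize(rows):
    """Keep dropping redundant rows until none are left."""
    rows = list(rows)
    while True:
        i = find_redundant(rows)
        if i is None:
            return rows
        del rows[i]

chased = [
    ("a1", "b1", "a3", "b3"),
    ("a1", "b1", "a3", "b5"),
    ("a1", "a2", "a3", "b7"),
    ("a1", "b8", "a3", "a4"),
]
reduced = minimize(chased)
for row in reduced:
    print(" | ".join(row))
```

The counts are recomputed after every removal, which is what lets the second row go once the first is gone and b_{1} has become a one-off variable.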
Reduction: From 3SAT. Given a formula, we build 2 tableaux. The set A will have 1 attribute for each clause and variable in the formula. The clause variables will be distinguished variables. Our T_{1} tableau will be set up as follows:
Each clause gets a row. In the columns for the 3 variables in that clause, the row holds a common undistinguished variable (so each clause that has that variable will have the same undistinguished variable in that column), and in each of the other columns it holds a separate (only used once) undistinguished variable.
The T_{2} tableau will have 7 rows for each row in T_{1}. In T_{2} we replace the common undistinguished variables with the 7 sets of constants (0 or 1) that are the ways to set the variables to make the clause true.
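To see where the 7 comes from, here is a small sketch that lists the ways to make one clause true. It only covers that one piece of the construction, not the full tableaux from the paper, and the `satisfying_rows` name and the (variable, is_positive) encoding of literals are my own assumptions.

```python
from itertools import product

def satisfying_rows(clause):
    """List the 0/1 settings of a clause's variables that make the clause true.

    clause is a list of (variable_name, is_positive) literals.
    """
    variables = [name for name, _ in clause]
    rows = []
    for bits in product((0, 1), repeat=len(clause)):
        # A positive literal is true on 1, a negated literal is true on 0.
        if any(bit == int(positive) for bit, (_, positive) in zip(bits, clause)):
            rows.append(dict(zip(variables, bits)))
    return rows

# The clause (x1 or not x2 or x3).
clause = [("x1", True), ("x2", False), ("x3", True)]
t2_rows = satisfying_rows(clause)
print(len(t2_rows))  # 7
```

Of the 8 possible settings, only the one that makes all three literals false (here x1 = 0, x2 = 1, x3 = 0) is left out.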
They prove a bunch of lemmas after this, but it boils down to: If we have a truth assignment for the formula, we can map that to the tableau by setting the common variables in T_{1} to the truth values in the assignment. All clauses will be true in both T_{1} and T_{2}. If the tableaux are equivalent, then we must have found a way to set those common variables, and that gives us a truth assignment for the formula.
Difficulty: 7. This isn’t that hard of a reduction. Even the lemmas aren’t too hard, though they do depend on a paper’s worth of previous results in equivalences (like the functional dependency thing I did in the example). But there is a ton of definitions to get through before you can start.