Tag Archives: Difficulty 7

Serializability of Database Histories

This is another problem with a cool elegant reduction once you get past the baggage you need to know to understand the problem.  This database section seems to be full of problems like these.

The problem: Serializability of Database Histories.  This is problem SR33 in the appendix.

The description: We have a set V of variables in our database, and a set T of transactions, where each transaction i has a read operation (Ri) that reads some subset of V, and a write operation Wi that writes some (possibly different) subset of V.  We’re also given a “history” H of T, which permutes the order of all reads and writes maintaining the property that for all i, Ri comes before Wi in the history.  Think of this as a set of parallel transactions that reach a central database.  H is the order the database processes these operations.

Can we find a serial history H’ of T, with the following properties:

  • Each Ri occurs immediately before its corresponding Wi.
  • A “live transaction” is a transaction (Ri, Wi) where either Wi is the last time a variable is written before the Rj of some other live transaction or the last time the variable is written at all.  The set of live transactions in H and H’ needs to be the same.
  • For each pair of live transactions (Ri, Wi) and (Rj, Wj), for any variable v in Wi∩Rj, Wi is the last write set to contain v before Rj in H if and only if Wi is the last write set to contain v before Rj in H’.  The paper says that this means transaction j “reads from” (or “reads v from”) transaction i.

Example: The paper by Papadimitriou, Bernstein, and Rothnie that has the reduction has a good simple example of a non-serializable history:

H= <R1, R2, W2, W1>, where R1 and W2 access a variable x, and R2 and W1 access a variable y.  Both transactions are live since they both write their variables for the last time.  Notice that neither transaction reads any variable.  But the two possible candidates for H’ are: <R1, W1, R2, W2> (where R2 reads the y written by W1) and <R2, W2, R1, W1> (where R1 reads the x written by W2), so neither H’ candidate has the same set of transactions reading variables from each other.

Reduction: Is from Non-Circular Satisfiability.  Given a formula, they generate a “polygraph” of a database history.  A polygraph (N,A,B) is a directed graph (N,A) along with a set B of “bipaths” (paths that are 2 edges long).  If a bipath{(v,u), (u,w)} is in B, then the edge (w,v) is in A.  So, if a bipath exists in B from v to w, then an edge in A exists from w back to v.  This means that we can view a polygraph (N,A,B) as a family of directed graphs.  Each directed graph in the family has the same vertices and an edge set A’ that is a superset of A and contains at least one edge in each bipath in B. They define an acyclic polygraph as a polygraph (represented as a family of directed graphs) where at least one directed graph in the family is acyclic.

In the paper, they relate database histories to polygraphs by letting the vertex set N bet a set of live transactions.  We build edges in A (u,v) from transactions that write a variable (vertex u)  to transactions that read the same variable (vertex v).  If some other vertex w also has that variable in their read set then the bipath {(v,w), (w,u)} exists in B.  So edges (u,v) in A mean that u “happens before” v since u writes a variable that v reads.  A bipath {(v,w), (w,u)} means that w also reads the same variable, so must happen before u or after v.  They show that a history is serializable if and only if the polygraph for the history is acyclic.

So, given a formula, they build a polygraph that is acyclic if and only if the formula is satisfiable.  The polygraph will have 3 vertices (aj, bj, and cj) for each variable xj in the formula.  Each a vertex connects by an edge in A to its corresponding B vertex.  We also have a bipath in B from the b vertex through the corresponding c vertex back to the a vertex.

Each literal Cik  (literal #k of clause i) generates two vertices yik and zik.  We add edges in A from each yik to zi(k+1)mod 3  (in other words, the y vertex of each clause connects to the “next” z vertex, wrapping around if necessary).  If literal Cik is a positive occurrence of variable Xj, we add edges (cj, yik) and (bj, zik) to A, and the bipath {(zik, yik), (yik, bj)} to B.  If the literal is negative, we instead add (zik, cj) to A and {(aj, zik), (zik, yik)} to B.

If the polygraph is acyclic (and thus the history it represents is serializable), then there is some acyclic digraph in the family of directed graphs related to the polygraph.  So the bipath {(bj, cj), (cj, aj)} will have either the first edge from b-c (which we will represent as “false”) or will have the second edge from c-a (which we will represent as “true”).  (The directed graph can’t have both because its edges are a superset of A, which means it has the edge (aj, bj) and taking both halves of he bipath will cause a cycle).

If our acyclic directed graph has the “false” (b-c) version of the edge for a literal, then it also has to have the z-y edge of the bipath associated with the literal (otherwise there is a cycle).  If all of the literals in a clause were set to false, this would cause a cycle between these bipath edges and the y-z edges we added in A for each clause.  So at least one literal per clause must be true, which gives us a way to satisfy the formula.

If the formula is satisfiable, then build the acyclic digraph that starts with all of A, and takes the bipath edges corresponding to the truth value of each variable, as defined above.  This implies ways you need to take the edges from the bipaths for the literals, to avoid cycles.  The only way now for the graph to be acyclic is for there to be a cycle of x’s and y’s in the edges and bipath edges.  But that would imply that we’ve set all of the literals in a clause to false. Since we know that the clause can be made true (since the original formula is satisfiable), we know that a way exists to make the directed graph acyclic.

Difficulty: 7.  It takes a lot of baggage to get to the actual reduction here, but once you do, I think it’s pretty easy and cool to see how the cycles arise from the definitions of the graphs and from the formula.

Tableau Equivalence

This is another problem where I’m going to supplement G&J’s problem description with definitions from the paper by Aho, Sagiv, and Ullman that has the reduction:

The problem: Tableau Equivalence.  This is problem SR32 in the appendix.

Definitions: Given a set of attributes A, and set F of ordered pairs of subsets of A (functional dependencies).  Most database queries ask for certain attributes (these are the “distinguished variables” in the G&J definition) that fill requirements defined by other attributes and values (anything new added are the “undistinguished variables” in the G&J definition).

A Tableau is a matrix that represents these attributes and variables.  The columns correspond to attributes, and the rows correspond to tuples that are returned by the query.   The first row of the tableau is the “summary” of the tableau and holds the distinguished variables we want to return (and possibly some constants)

Tableau example: This example comes from p. 223 of the paper.  For the query:

Find all values for variables a1 and a2 such that we can find values for variables b1 through b4 such that the following strings are all in our database instance (called “I” in the paper):  By convention in the paper, a variables are distinguished variables, and b variables are undistinguished.

  • a1b1b3
  • b2a21  (That’s the constant 1 at the end)
  • b2b1b4

If I was the strings {111,222, 121}, then all assignments of 1’s and 2’s to a1 and a2 work:

  • If a1 and a2 are both 1, then assigning all b variables to 1 generates the string 111 in all 3 cases above, which is in I
  • If a1 = 1 and a2 = 2, then we can assign 2 to b1 and 1’s to all over b variables and get 121 in all cases, which is in I.

Before I get to the 2 harder cases, let me show the tableau:

A B C
a1 a2
a1 b1 b3
b2 a2 1
b2 b1 b4

The first row lists the attributes we’re considering (a1 comes from A and only occurs in the first spot in the result string, a2 comes from B and only occurs in the second spot in the result string.  Our query doesn’t want any variables from C.  The summary lists the variables from the attributes that form our distinguished variables.

The rows below the summary show how to build legal strings (like my list above).

So now we can use the tableau to help see how to find legal values to the variables:

  • If a1 = 2 and a2 = 1, then we need to assign a 1 to b2 to make the second(non-summary) row 121.  We also need to assign a 2 to b1 and b3 to make the first row 222.  This means we need to assign a 1 to b4 to make the bottom row 121.
  • If a1 and a2 are both 2, we need to set b2 to 1 to make the middle row 121.  We need to set b1 and b3 to 2 to make the top row 222.  This means we need to set b4 to 4 to make the bottom row 121.

The problem: Given two Tableaux T1 and T2 which share the same A, F, X, and Y sets (as defined above), are they equivalent?  That is, do they generate the same sets of legal values for their distinguished variables for all possible I sets?

Example: This gets tricky, partially because of the need to worry about “all possible I sets”, and partially because adding functional dependencies makes things equivalent in subtle ways.  Here is the example from page 230 of the paper:

A B C D
a1 a2 a3 a4
a1 b1 b2 b3
b4 b1 a3 b5
a1 a2 b6 b7
a1 b8 b9 a4

If we have the functional dependencies B->A and A->C are true.  Then all strings with b1 in the second character must have the same value (a1) in the first character.  Similarly, A->C implies that the entire third column can be replaced by a3.  This gives us the equivalent tableau:

A B C D
a1 a2 a3 a4
a1 b1 a3 b3
a1 b1 a3 b5
a1 a2 a3 b7
a1 b8 a3 a4

And also, if the only difference between two rows is that one row has nondistinguished variables that don’t appear anywhere else, that row can be eliminated.  So we can get rid of the first row (with b3) because it is just like the second (with b5) But then we realize that the a1b1a3b5 row only differs from the row below it in b1 and b5 which now only appear in that row.  So we can remove that row too, to get the equivalent:

A B C D
a1 a2 a3 a4
a1 a2 a3 b7
a1 b8 a3 a4

 

Reduction: From 3SAT.  Given a formula, we build 2 tableaux.  The set A will have 1 attribute for each clause and variable in the formula.  The clause variables will be distinguished variables.  Our T1 tableau will be set up as follows:

For each clause, we will set a common undistinguished variable for the 3 variables in the row (so each clause that has that variable will have the same undistinguished variable in that column), and separate (only used once) undistinguished variable in the other columns.

The T2 tableau will have 7 rows for each row in T1.  In T2 we replace the common undistinguished variables with 7 sets constants(0 or 1)  that are the ways to set the variables to make the clause true.

They prove a bunch of lemmas after this, but it boils down to: If we have a truth assignment for the formula, we can map that to the tableau by setting the common variables in T1 to the truth values in the assignment.  All clauses will be true in both T1 and T2.  If the tableaux are equivalent, then we must have found a way to set those common variables, and that gives us a truth assignment for the formula.

Difficulty: 7.  This isn’t that hard of a reduction.  Even the lemmas aren’t too hard, though they do depend on a paper’s worth of previous results in equivalences (like the functional dependency thing I did in the example).  But there is a ton of definitions to get through before you can start.

Protected: Expected Retrieval Cost

This content is password protected. To view it please enter your password below:

Protected: Dynamic Storage Allocation

This content is password protected. To view it please enter your password below:

Protected: Minimum Separating Set

This content is password protected. To view it please enter your password below:

Protected: Minimum Test Set

This content is password protected. To view it please enter your password below:

Protected: Intersection Graph For Segments On A Grid

This content is password protected. To view it please enter your password below:

Protected: Minimizing Dummy Activities in PERT Networks

This content is password protected. To view it please enter your password below:

Protected: Undirected Two-Commodity Integral Flow

This content is password protected. To view it please enter your password below:

Protected: Directed Two-Commodity Integral Flow

This content is password protected. To view it please enter your password below: