Tag Archives: Serializability of Database Histories

Safety of Database Transaction Systems

This problem is related to last week’s but turns out to be much harder.

The problem: Safety of Database Transaction Systems.  This is problem SR34 in the appendix.

The description: Given a set of variables V and transactions T as defined in the Serializability of Database Histories problem, is every history H for T equivalent to some serial history?

Example: This is an extension of last week’s problem. Last week’s example produced a history that is not serializable, so that set of transactions is not safe.

The easiest way to produce transactions that are safe is to make them access different variables: Suppose I have 2 transactions (R1, W1) and (R2, W2).  If the first transaction reads and writes a variable x and the second transaction reads and writes a variable y, then any ordering of those transactions will be serializable.
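To make the two examples above concrete, here is a minimal brute-force sketch in Python. It assumes each transaction is given as a (read set, write set) pair and uses the equivalence conditions (same live transactions, same "reads from" relation among them) as this post states them. The names `signature`, `all_histories`, and `is_safe` are made up for illustration, and the enumeration is exponential, so it is only good for toy inputs.

```python
from itertools import permutations

def signature(txns, history):
    """Return (live transactions, reads-from map restricted to live readers).

    txns maps a transaction id to (read_set, write_set); history is a list of
    ("R", id) / ("W", id) steps. A writer of None means the initial value.
    """
    last_writer, rf = {}, {}
    for op, i in history:
        reads, writes = txns[i]
        if op == "R":
            for v in reads:
                rf[(i, v)] = last_writer.get(v)
        else:
            for v in writes:
                last_writer[v] = i
    live = set(last_writer.values())      # final writers are live
    changed = True
    while changed:                        # whoever a live reader reads from is live
        changed = False
        for (j, _v), i in rf.items():
            if j in live and i is not None and i not in live:
                live.add(i)
                changed = True
    return frozenset(live), {k: w for k, w in rf.items() if k[0] in live}

def all_histories(txns):
    """Every ordering of the R/W steps in which each Ri precedes its Wi."""
    ops = [(op, i) for i in txns for op in ("R", "W")]
    for perm in permutations(ops):
        if all(perm.index(("R", i)) < perm.index(("W", i)) for i in txns):
            yield list(perm)

def is_safe(txns):
    """Safe means every history matches the signature of some serial history."""
    serial = [signature(txns, [(op, i) for i in order for op in ("R", "W")])
              for order in permutations(txns)]
    return all(signature(txns, h) in serial for h in all_histories(txns))

# Disjoint variables: transaction 1 uses only x, transaction 2 uses only y.
print(is_safe({1: ({"x"}, {"x"}), 2: ({"y"}, {"y"})}))   # True
# Last week's example: T1 reads x and writes y, T2 reads y and writes x.
print(is_safe({1: ({"x"}, {"y"}), 2: ({"y"}, {"x"})}))   # False
```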

Reduction: It’s in the same paper by Papadimitriou, Bernstein, and Rothnie as the Serializability of Database Histories problem. It’s interesting that they couldn’t show that the problem is in NP.

They reduce from Hitting Set. They show in the paper how to take a transaction system and build a graph where there is one vertex for each read and write operation in each transaction, and an edge between the two operations in each transaction. There are also edges between operations Ri and Wj, or Wi and Wj, if those operations share a variable. These edges show places where changing the order changes the meaning of a transaction history. They show that a transaction system is safe if and only if the graph has no cycles containing a (Rj, Wj) edge. (Note that this means cycles can exist as long as they contain only W-W edges.)

So, given an instance of Hitting Set (a set S and a collection C of subsets of S), they build a transaction graph: one read vertex for each set in C, plus one write vertex at the end. Between read vertices Ri and Ri+1 we add |Ci| edges (or, so we still have a simple graph, |Ci| paths containing vertices that don’t appear anywhere else). At the end of this chain of paths is a single W vertex, with an edge back to R1. The only unsafe cycle now starts at R1, goes through one of the paths connecting each pair of consecutive R vertices, goes to the final W vertex, and then back to R1.

So far, so good. But then they lose me when they say “We can embed the hitting set problem –among others– in safety by forcing (by the use of sets of singletons) each such choice to correspond to a hitting set”. I think what they’re saying is that they will create a set of variables corresponding to sets in C such that an unsafe cycle exists if and only if S has a hitting set. But I’m not sure how they get there, especially in polynomial time. I’m sure there’s a way, but it reads like “we set it such that it all works”, which isn’t convincing to me.

Difficulty: 9, because I don’t see how they do that last step. I’m sure a good explanation exists that would make this less difficult. I’ll also note that the reduction says the transaction system is unsafe “if and only if there exists an unsafe path–and therefore a hitting set”, which sounds like a co-NP proof to me. I’m probably missing something.

Serializability of Database Histories

This is another problem with a cool elegant reduction once you get past the baggage you need to know to understand the problem.  This database section seems to be full of problems like these.

The problem: Serializability of Database Histories.  This is problem SR33 in the appendix.

The description: We have a set V of variables in our database, and a set T of transactions, where each transaction i has a read operation (Ri) that reads some subset of V, and a write operation (Wi) that writes some (possibly different) subset of V. We’re also given a “history” H of T, which is an ordering of all of the reads and writes in which, for all i, Ri comes before Wi. Think of this as a set of parallel transactions that reach a central database. H is the order in which the database processes these operations.

Can we find a serial history H’ of T, with the following properties:

  • Each Ri occurs immediately before its corresponding Wi.
  • A “live transaction” is a transaction (Ri, Wi) where either Wi is the last time a variable is written before the Rj of some other live transaction or the last time the variable is written at all.  The set of live transactions in H and H’ needs to be the same.
  • For each pair of live transactions (Ri, Wi) and (Rj, Wj), for any variable v in Wi∩Rj, Wi is the last write set to contain v before Rj in H if and only if Wi is the last write set to contain v before Rj in H’.  The paper says that this means transaction j “reads from” (or “reads v from”) transaction i.

Example: The paper by Papadimitriou, Bernstein, and Rothnie that has the reduction has a good simple example of a non-serializable history:

H = <R1, R2, W2, W1>, where R1 and W2 access a variable x, and R2 and W1 access a variable y. Both transactions are live since they both write their variables for the last time. Notice that in H neither transaction reads a value written by the other. But the two possible candidates for H’ are: <R1, W1, R2, W2> (where R2 reads the y written by W1) and <R2, W2, R1, W1> (where R1 reads the x written by W2), so neither H’ candidate has the same set of transactions reading variables from each other.
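The example can be checked mechanically. Below is a small Python sketch, under the assumption that a transaction is a (read set, write set) pair and a history is a list of ("R", i) / ("W", i) steps. The helper names are hypothetical, and the check simply tries every serial order, so it says nothing about how hard the problem is in general.

```python
from itertools import permutations

def reads_from(txns, history):
    """Map (reader j, variable v) to the transaction whose write last set v
    before Rj (None for the initial value); also return the final writers."""
    last_writer, rf = {}, {}
    for op, i in history:
        reads, writes = txns[i]
        if op == "R":
            for v in reads:
                rf[(i, v)] = last_writer.get(v)
        else:
            for v in writes:
                last_writer[v] = i
    return rf, set(last_writer.values())

def live_view(txns, history):
    """Live transactions plus the reads-from map restricted to live readers."""
    rf, live = reads_from(txns, history)   # final writers start out live
    changed = True
    while changed:
        changed = False
        for (j, _v), i in rf.items():
            if j in live and i is not None and i not in live:
                live.add(i)
                changed = True
    return frozenset(live), {k: w for k, w in rf.items() if k[0] in live}

def is_serializable(txns, history):
    """Try every serial order and compare it against the given history."""
    target = live_view(txns, history)
    for order in permutations(txns):
        serial = [(op, i) for i in order for op in ("R", "W")]
        if live_view(txns, serial) == target:
            return True
    return False

# T1 reads x and writes y; T2 reads y and writes x.
txns = {1: ({"x"}, {"y"}), 2: ({"y"}, {"x"})}
H = [("R", 1), ("R", 2), ("W", 2), ("W", 1)]
print(is_serializable(txns, H))   # False
```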

Reduction: From Non-Circular Satisfiability. Given a formula, they generate a “polygraph” of a database history. A polygraph (N,A,B) is a directed graph (N,A) along with a set B of “bipaths” (paths that are 2 edges long). If a bipath {(v,u), (u,w)} is in B, then the edge (w,v) is in A. So, if a bipath exists in B from v to w, then an edge in A exists from w back to v. This means that we can view a polygraph (N,A,B) as a family of directed graphs. Each directed graph in the family has the same vertices and an edge set A’ that is a superset of A and contains at least one edge from each bipath in B. They define an acyclic polygraph as a polygraph (represented as a family of directed graphs) where at least one directed graph in the family is acyclic.
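Here is a minimal Python sketch of that definition, assuming a bipath is stored as a pair of edges ((v,u), (u,w)). Since adding extra edges can only create more cycles, it is enough to try exactly one edge from each bipath. The function names are made up for illustration, and the search is exponential in the number of bipaths, so this is a definition check rather than an efficient algorithm.

```python
from itertools import product

def is_acyclic(nodes, edges):
    """Kahn's algorithm: a digraph is acyclic iff every node can be removed
    in topological order."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in set(edges):
        successors[u].append(v)
        indegree[v] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    return removed == len(nodes)

def polygraph_is_acyclic(nodes, A, B):
    """True if A plus some choice of one edge per bipath in B is acyclic."""
    for choice in product(*B):            # one edge picked from every bipath
        if is_acyclic(nodes, list(A) + list(choice)):
            return True
    return False

# A triangle-shaped toy polygraph: edge (a, b) in A, bipath b -> c -> a in B.
nodes = ["a", "b", "c"]
A = [("a", "b")]
B = [(("b", "c"), ("c", "a"))]
print(polygraph_is_acyclic(nodes, A, B))                 # True
print(is_acyclic(nodes, A + [("b", "c"), ("c", "a")]))   # False: both halves close a cycle
```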

In the paper, they relate database histories to polygraphs by letting the vertex set N be the set of live transactions. We build edges (u,v) in A from transactions that write a variable (vertex u) to transactions that read that variable from them (vertex v). If some other vertex w also has that variable in its write set, then the bipath {(v,w), (w,u)} exists in B. So edges (u,v) in A mean that u “happens before” v since u writes a variable that v reads. A bipath {(v,w), (w,u)} means that w also writes the same variable, so w must happen before u or after v (otherwise v would read w’s value instead of u’s). They show that a history is serializable if and only if the polygraph for the history is acyclic.

So, given a formula, they build a polygraph that is acyclic if and only if the formula is satisfiable. The polygraph will have 3 vertices (aj, bj, and cj) for each variable xj in the formula. Each a vertex connects by an edge in A to its corresponding b vertex. We also have a bipath in B from the b vertex through the corresponding c vertex back to the a vertex.

Each literal Cik (literal #k of clause i) generates two vertices yik and zik. We add edges in A from each yik to zi((k+1) mod 3) (in other words, the y vertex of each literal connects to the “next” z vertex in its clause, wrapping around if necessary). If literal Cik is a positive occurrence of variable xj, we add edges (cj, yik) and (bj, zik) to A, and the bipath {(zik, yik), (yik, bj)} to B. If the literal is negative, we instead add (zik, cj) to A and {(aj, zik), (zik, yik)} to B.

If the polygraph is acyclic (and thus the history it represents is serializable), then there is some acyclic digraph in the family of directed graphs related to the polygraph. So the bipath {(bj, cj), (cj, aj)} will have either the first edge from b-c (which we will represent as “false”) or will have the second edge from c-a (which we will represent as “true”). (The directed graph can’t have both because its edges are a superset of A, which means it has the edge (aj, bj), and taking both halves of the bipath will cause a cycle.)

If our acyclic directed graph has the “false” (b-c) version of the edge for a literal, then it also has to have the z-y edge of the bipath associated with the literal (otherwise there is a cycle).  If all of the literals in a clause were set to false, this would cause a cycle between these bipath edges and the y-z edges we added in A for each clause.  So at least one literal per clause must be true, which gives us a way to satisfy the formula.

If the formula is satisfiable, then build the digraph that starts with all of A and takes the bipath edges corresponding to the truth value of each variable, as defined above. This forces which edges you need to take from the bipaths for the literals in order to avoid cycles. The only way now for the graph to have a cycle is for there to be a cycle of y’s and z’s in the A edges and bipath edges. But that would imply that we’ve set all of the literals in a clause to false. Since the assignment satisfies every clause, no such cycle exists, so the directed graph is acyclic.

Difficulty: 7.  It takes a lot of baggage to get to the actual reduction here, but once you do, I think it’s pretty easy and cool to see how the cycles arise from the definitions of the graphs and from the formula.