Category Archives: Appendix: Storage and Retrieval

Safety of File Protection Systems

This is the last problem in the “Storage and Retrieval” section of the appendix, and it uses the Linear Bounded Automaton Acceptance problem from last week.  Since LBA Acceptance is “not known to be in NP”, neither is this problem.

The problem: Safety of File Protection Systems.  This is problem SR36 in the appendix.

The description: We’re given:

  • A set R of “generic rights” (intuitively, these are things like “read access” or “write access”)
  • A set O of “objects”
  • A set S ⊆ O of “subjects”
  • A matrix P defining the set of rights for each combination of subject and object.  That is: for all s∈S and o∈O, P[s][o] defines the rights (a subset of R) that subject s has on object o.
  • A set C of “commands”.  Each command takes a list of parameters X1 through Xm, where each Xi is an object (and may be a subject as well).
  • The commands begin with a series of if statements: “if ri ∈ P[Xsi][Xoi]”, where si and oi are indices from 1 to m.  So each if statement asks whether a specific right exists for a specific pair of the command’s parameters.  A command can have several of these clauses connected by “and”.
  • If all of the clauses in the if statement are true, then a list of operations happens.  The two operations are “enter a right into P” and “delete a right from P”.  I think you can also delete objects and subjects, but you can’t create them (in the version of the problem we’re talking about), so I don’t know if it matters.

Can we find a sequence of commands in C where at some point in the sequence after executing a command, some set P[s][o] has a right r’ that it previously did not have?
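The model above is concrete enough to execute.  Below is a minimal sketch, assuming a dictionary-based matrix P and made-up names throughout; the `leaks` brute-force search is my own, not the paper’s algorithm:

```python
from itertools import product

# A configuration is just the matrix P, stored as (subject, object) -> frozenset
# of rights.  A command is a pair (conditions, operations): conditions are
# (right, si, oi) triples indexing into the parameter list, and operations are
# ('enter' or 'delete', right, si, oi).

def apply_command(P, command, params):
    conditions, operations = command
    if not all(r in P.get((params[s], params[o]), frozenset())
               for r, s, o in conditions):
        return None                               # some "if" clause failed
    P = dict(P)
    for op, r, s, o in operations:
        cell = set(P.get((params[s], params[o]), frozenset()))
        (cell.add if op == 'enter' else cell.discard)(r)
        P[(params[s], params[o])] = frozenset(cell)
    return P

def leaks(P0, commands, names, r, s, o, depth=4):
    """Is there a sequence of at most `depth` commands after which P[s][o]
    gains right r?  With no create operations the set of configurations is
    finite, so an exhaustive search like this decides the question."""
    frontier, seen = [P0], {frozenset(P0.items())}
    for _ in range(depth):
        nxt = []
        for P in frontier:
            for cmd, arity in commands:
                for params in product(names, repeat=arity):
                    P2 = apply_command(P, cmd, params)
                    if P2 is None:
                        continue
                    if r in P2.get((s, o), frozenset()):
                        return True
                    key = frozenset(P2.items())
                    if key not in seen:
                        seen.add(key)
                        nxt.append(P2)
        frontier = nxt
    return False

# An obviously unsafe command: no "if" clauses, just grants "read" to (X1, X2).
GRANT = (([], [('enter', 'read', 0, 1)]), 2)
P0 = {('alice', 'file'): frozenset()}
print(leaks(P0, [GRANT], ['alice', 'file'], 'read', 'alice', 'file'))  # True
```

The exhaustive search is exactly the kind of thing that stops working once commands may create new subjects and objects, which is where the undecidable version comes from.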

Example: It’s pretty easy to come up with an example that is unsafe: just create a command that grants a right to someone.  I think where it gets harder is when you deal with commands whose if clauses guard a set of operations.  Here is an example from the paper by Harrison, Ruzzo, and Ullman that discusses this problem:

command IREAD(s1, s2, o) {
  if "read" ∈ P[s2][o] and
     "iread" ∈ P[s1][s2]
  then
     enter "read" into P[s1][o]
     delete "read" from P[s1][o]
}

The command “iread” is meant to stand for “indirect read”.   So what this is saying is that s2 can read o, and s1 has the right to read what s2 reads.  So s1 gets (temporary) permission to read o.  Presumably, some time passes between the granting of the read permission to s1 and the removal of that right, during which s1 reads the data in o.  Notice that this is a safe procedure because, by the end, no new rights have been granted.
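The same idea in executable form, as a sketch.  The dictionary-based P and the names alice, bob, and doc are made up:

```python
# P maps (subject, object) pairs to the set of rights the subject holds.
P = {
    ('bob', 'doc'):   {'read'},    # s2 can read o
    ('alice', 'bob'): {'iread'},   # s1 may read indirectly through s2
    ('alice', 'doc'): set(),
}

def iread(P, s1, s2, o):
    # Both "if" clauses of the IREAD command must hold before anything happens.
    if 'read' in P[(s2, o)] and 'iread' in P[(s1, s2)]:
        P[(s1, o)].add('read')       # enter "read" into P[s1][o]
        # ... s1 reads o here, while the temporary right exists ...
        P[(s1, o)].discard('read')   # delete "read" from P[s1][o]

before = {k: set(v) for k, v in P.items()}
iread(P, 'alice', 'bob', 'doc')
print(P == before)   # True: no new right survives the command, so it is safe
```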

Reduction: The paper referenced above shows that a system in which we also have operations that can create objects and subjects is undecidable.  They do this by showing how this system can simulate an arbitrary Turing Machine.  The reduction turns each transition into a command in the system: The current state, the current cell the head is over, and the contents of that cell are implemented as rights, and then state transitions can be seen as commands: The “if” parts of the command check that the machine is in the correct configuration, and the “then” parts change rights in a way that simulates the moving of the head, the output to the tape, and the transition to a new state.   Importantly for our purposes, the only time these commands use the “create” commands (that are in the undecidable version of the problem, but not in ours) is when the Turing Machine moves the head into a previously unexplored area of the tape.

They then say that in our problem, which doesn’t have “create” commands, a simulation “similar to that used” in the undecidability reduction can also be used to convert an LBA into a set of commands, since an LBA won’t be moving its head arbitrarily into new space.  I think the LBA is still allowed to use some new space, but since that space is bounded by a polynomial in the input size, we can simulate it by giving each cell its own access rights, which keeps the reduction polynomial-sized.

Difficulty: 8.  This is a hard problem to understand, and the reduction is done in a non-standard way (and depends on the LBA acceptance reduction which is also done in a non-standard way), so it may throw some people for a loop.  It is a cool example of showing non-standard ways to reduce things though if that’s what you’re looking for.

Consistency of Database Frequency Tables

The end of the Storage and Retrieval section is in sight!

The problem: Consistency of Database Frequency Tables.  This is problem SR35 in the appendix.

The description: I find G&J’s definition confusing, so this definition borrows a lot from the paper by Reiss that has the reduction. 

We have a set of “characteristics” (attributes) A.  Each attribute a in A has a domain Da.  A database is a set of “objects” (tuples) O1..On, where each object defines a value for each attribute.  (The result is a two-dimensional table where rows are objects and columns are attributes.)  We can define a frequency table for each pair of attributes a and b in A.  The table has |Da| rows and |Db| columns, and each entry (x,y) is “supposed” to represent the number of tuples in O that have value x for attribute a and value y for attribute b.

What the problem is asking is: Given a database V and a set of frequency tables, can we find a way to map the attributes in A to the tables such that the tables actually represent the frequencies in the database?

Since you need a frequency table for each pair of attributes, here is an example with 3 attributes, each taking 2 possible values.  Attribute a’s domain is {0,1}, b’s is {a,b}, and c’s is {>, <}.  Our set of objects is:

  • (0,a,>)
  • (0,b,>)
  • (1,b,>)
  • (0,b,<)

If we are handed a set of 3 frequency tables (rows indexed by the first attribute’s domain and columns by the second’s, in the orders listed above):

C1 vs C2:

1 2
0 1

C1 vs C3:

2 1
1 0

C2 vs C3:

1 0
2 1

These frequency tables are accurate if C1 is attribute a, C2 is attribute b, and C3 is attribute c.
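The tables for this example can be recomputed directly.  A sketch; the row/column convention (rows indexed by the first attribute’s domain, in the order listed) is one assumption I’m making here — part of the problem is that the tables arrive without labels saying which attribute pair they count:

```python
from collections import Counter
from itertools import combinations

# The four objects from the example; columns are attributes a, b, c in order.
objects = [('0', 'a', '>'), ('0', 'b', '>'), ('1', 'b', '>'), ('0', 'b', '<')]
domains = {'a': ['0', '1'], 'b': ['a', 'b'], 'c': ['>', '<']}
column = {'a': 0, 'b': 1, 'c': 2}

def freq_table(attr1, attr2):
    """Count, for each (x, y) in Da x Db, the objects with those values."""
    counts = Counter((obj[column[attr1]], obj[column[attr2]]) for obj in objects)
    return [[counts[(x, y)] for y in domains[attr2]] for x in domains[attr1]]

for a1, a2 in combinations('abc', 2):
    print(a1, 'vs', a2, freq_table(a1, a2))
```

The decision problem is then whether some assignment of attributes to the unlabeled tables makes every table come out equal to the corresponding `freq_table` result.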

The reduction: From 3SAT.  (Actually, this might be from regular CNF-SAT, but there is no reason not to restrict the input to 3 literals per clause.)  We take a SAT instance with p clauses and q variables and build a database with 2 objects per variable (one for positive, one for negative) and p*q extra objects.  We have one attribute per clause (“CLi“), one attribute per variable (“VRi“), and “VALUE” (which holds the truth value of the object in the final setting).  The frequency tables are set up to ensure that each variable gets exactly one value setting (true or false) and that each clause is made true.  The way to make the database consistent with the tables is to find a mapping from the variables in the database to the tables that makes the formula satisfiable.

Difficulty: 5, because the reduction isn’t that hard to follow, or come up with, once you get how the frequency tables work.  But unlike most 5’s, I don’t think I’d assign it as a homework problem, because of how much extra work it would take to explain the frequency tables in the first place.

Protected: Safety of Database Transaction Systems

This content is password protected. To view it please enter your password below:

Serializability of Database Histories

This is another problem with a cool elegant reduction once you get past the baggage you need to know to understand the problem.  This database section seems to be full of problems like these.

The problem: Serializability of Database Histories.  This is problem SR33 in the appendix.

The description: We have a set V of variables in our database and a set T of transactions, where each transaction i has a read operation (Ri) that reads some subset of V and a write operation (Wi) that writes some (possibly different) subset of V.  We’re also given a “history” H of T, which orders all of the reads and writes while maintaining the property that, for all i, Ri comes before Wi in the history.  Think of this as a set of parallel transactions arriving at a central database; H is the order in which the database processes their operations.

Can we find a serial history H’ of T, with the following properties:

  • Each Ri occurs immediately before its corresponding Wi.
  • A “live transaction” is a transaction (Ri, Wi) where Wi is either the last write of some variable before it is read by the Rj of some other live transaction, or the last write of that variable in the whole history.  The set of live transactions in H and H’ needs to be the same.
  • For each pair of live transactions (Ri, Wi) and (Rj, Wj), for any variable v in Wi∩Rj, Wi is the last write set to contain v before Rj in H if and only if Wi is the last write set to contain v before Rj in H’.  The paper says that this means transaction j “reads from” (or “reads v from”) transaction i.

Example: The paper by Papadimitriou, Bernstein, and Rothnie that has the reduction has a good simple example of a non-serializable history:

H = <R1, R2, W2, W1>, where R1 and W2 access a variable x, and R2 and W1 access a variable y.  Both transactions are live since they both write their variables for the last time.  Notice that in H neither transaction reads a value written by the other.  But the two possible candidates for H’ are <R1, W1, R2, W2> (where R2 reads the y written by W1) and <R2, W2, R1, W1> (where R1 reads the x written by W2), so neither candidate has the same reads-from relation as H.
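This example is small enough to check mechanically.  A brute-force sketch, under the simplification that we only compare reads-from relations (the live-set condition is automatic here, since both transactions are live in every ordering):

```python
from itertools import permutations

# Each transaction i maps to (read_set, write_set), per the paper's example:
# R1 reads x, W1 writes y; R2 reads y, W2 writes x.
transactions = {
    1: ({'x'}, {'y'}),
    2: ({'y'}, {'x'}),
}

def reads_from(history):
    """history: list of ('R', i) / ('W', i) operations.  Return the set of
    triples (j, v, i): transaction j reads variable v from transaction i
    (i = 0 means j reads the initial value of v)."""
    last_writer = {}          # variable -> transaction that last wrote it
    result = set()
    for op, i in history:
        reads, writes = transactions[i]
        if op == 'R':
            for v in reads:
                result.add((i, v, last_writer.get(v, 0)))
        else:
            for v in writes:
                last_writer[v] = i
    return result

H = [('R', 1), ('R', 2), ('W', 2), ('W', 1)]
target = reads_from(H)

# A serial history runs each transaction's R immediately before its W.
serializable = any(
    reads_from([(op, i) for i in order for op in ('R', 'W')]) == target
    for order in permutations(transactions)
)
print(serializable)   # False: both serial orders change who reads from whom
```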

Reduction: Is from Non-Circular Satisfiability.  Given a formula, they generate a “polygraph” of a database history.  A polygraph (N,A,B) is a directed graph (N,A) along with a set B of “bipaths” (paths that are 2 edges long).  If a bipath {(v,u), (u,w)} is in B, then the edge (w,v) is in A.  So, if a bipath exists in B from v to w, then an edge in A exists from w back to v.  This means that we can view a polygraph (N,A,B) as a family of directed graphs.  Each directed graph in the family has the same vertices and an edge set A’ that is a superset of A and contains at least one edge in each bipath in B. They define an acyclic polygraph as a polygraph (represented as a family of directed graphs) where at least one directed graph in the family is acyclic.
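The “family of directed graphs” definition can be checked by brute force over the choices of one edge per bipath.  A sketch with a tiny made-up polygraph (the vertex names u, v, w are mine):

```python
from itertools import product

def acyclic(nodes, edges):
    """Standard DFS cycle check on a digraph given as a set of (u, v) edges."""
    adj = {u: [] for u in nodes}
    for u, v in edges:
        adj[u].append(v)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {u: WHITE for u in nodes}
    def dfs(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY or (color[v] == WHITE and not dfs(v)):
                return False          # back edge found: a cycle
        color[u] = BLACK
        return True
    return all(color[u] != WHITE or dfs(u) for u in nodes)

def polygraph_acyclic(nodes, A, B):
    """A polygraph is acyclic iff some choice of one edge from each bipath
    in B, added on top of all of A, gives an acyclic digraph."""
    return any(acyclic(nodes, set(A) | set(choice))
               for choice in product(*B))

# Bipath {(v,u), (u,w)} in B, so the edge (w,v) is in A.  Picking either half
# of the bipath alone leaves the graph acyclic, so this polygraph is acyclic.
nodes = {'u', 'v', 'w'}
A = {('w', 'v')}
B = [(('v', 'u'), ('u', 'w'))]
print(polygraph_acyclic(nodes, A, B))   # True
```

Note that picking *both* halves of the bipath would close a cycle with (w,v), which is the tension the reduction below exploits.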

In the paper, they relate database histories to polygraphs by letting the vertex set N be the set of live transactions.  We build edges (u,v) in A from transactions that write a variable (vertex u) to transactions that read the same variable (vertex v).  If some other vertex w also has that variable in its read set, then the bipath {(v,w), (w,u)} exists in B.  So edges (u,v) in A mean that u “happens before” v, since u writes a variable that v reads.  A bipath {(v,w), (w,u)} means that w also reads the same variable, so it must happen before u or after v.  They show that a history is serializable if and only if the polygraph for the history is acyclic.

So, given a formula, they build a polygraph that is acyclic if and only if the formula is satisfiable.  The polygraph will have 3 vertices (aj, bj, and cj) for each variable xj in the formula.  Each a vertex connects by an edge in A to its corresponding b vertex.  We also have a bipath in B from the b vertex through the corresponding c vertex back to the a vertex.

Each literal Cik  (literal #k of clause i) generates two vertices yik and zik.  We add edges in A from each yik to zi(k+1)mod 3  (in other words, the y vertex of each literal connects to the “next” z vertex, wrapping around if necessary).  If literal Cik is a positive occurrence of variable Xj, we add edges (cj, yik) and (bj, zik) to A, and the bipath {(zik, yik), (yik, bj)} to B.  If the literal is negative, we instead add (zik, cj) to A and {(aj, zik), (zik, yik)} to B.

If the polygraph is acyclic (and thus the history it represents is serializable), then there is some acyclic digraph in the family of directed graphs related to the polygraph.  So the bipath {(bj, cj), (cj, aj)} will contribute either the first edge from b-c (which we will represent as “false”) or the second edge from c-a (which we will represent as “true”).  (The directed graph can’t have both, because its edges are a superset of A, which means it has the edge (aj, bj), and taking both halves of the bipath would cause a cycle.)

If our acyclic directed graph has the “false” (b-c) version of the edge for a literal, then it also has to have the z-y edge of the bipath associated with the literal (otherwise there is a cycle).  If all of the literals in a clause were set to false, this would cause a cycle between these bipath edges and the y-z edges we added in A for each clause.  So at least one literal per clause must be true, which gives us a way to satisfy the formula.

If the formula is satisfiable, then build the directed graph that starts with all of A and takes the bipath edge corresponding to the truth value of each variable, as defined above.  This forces which edges must be taken from the bipaths for the literals, to avoid cycles.  The only remaining way for the graph to have a cycle is a cycle among the y and z vertices, using those bipath edges and the y-z edges we added in A for each clause.  But such a cycle would imply that we’ve set all of the literals in some clause to false.  Since each clause is made true by the satisfying assignment, no such cycle exists, so the directed graph is acyclic.

Difficulty: 7.  It takes a lot of baggage to get to the actual reduction here, but once you do, I think it’s pretty easy and cool to see how the cycles arise from the definitions of the graphs and from the formula.

Protected: Tableau Equivalence


Protected: Conjunctive Boolean Query


Protected: Conjunctive Query Foldability


Protected: Additional Key


Protected: Boyce-Codd Normal Form Violation


Protected: Prime Attribute Name
