Consistency of Database Frequency Tables

The end of the Storage and Retrieval section is in sight!

The problem: Consistency of Database Frequency Tables.  This is problem SR35 in the appendix.

The description: I find G&J’s definition confusing, so this definition borrows a lot from the paper by Reiss that has the reduction. 

We have a set of “characteristics” (attributes) A.  Each attribute a in A has a domain Da.  A database is a set of “objects” (tuples) O1..On, where each object defines a value for each characteristic. (The result is a two-dimensional table where rows are objects and columns are attributes.) We can define a frequency table for each pair of attributes a and b in A.  The table has |Da| rows and |Db| columns, and each entry (x,y) is “supposed” to represent the number of objects that have x for attribute a and y for attribute b.
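As a rough illustration of that definition, here is a minimal Python sketch that counts the entries of one frequency table. The function name `frequency_table`, the tuple encoding of objects, and the tiny color/number database are all assumptions made for this example; none of them come from G&J or from Reiss's paper.

```python
from collections import Counter

def frequency_table(objects, i, j, dom_i, dom_j):
    """Count how many objects pair value x in position i with value y in position j."""
    counts = Counter((obj[i], obj[j]) for obj in objects)
    # One row per value of the first attribute, one column per value of the second.
    return [[counts[(x, y)] for y in dom_j] for x in dom_i]

# Hypothetical three-object database with two attributes (a color and a number).
objects = [("red", 1), ("red", 2), ("blue", 1)]
table = frequency_table(objects, 0, 1, ["red", "blue"], [1, 2])
print(table)
```

Each row of the result corresponds to one value of the first attribute, and each column to one value of the second, matching the |Da| by |Db| shape described above.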

What the problem is asking is: Given a set of tables and a database table V, can we find a way to map the attributes in A to the tables such that the tables actually represent the frequencies in the database?

Example:
Since you need a frequency table for each pair of attributes, here is an example with 3 attributes, each taking 2 possible values.  Attribute a’s domain is {0,1}, b’s is {a,b}, and c’s is {>, <}.  Our set of objects is:

  • (0,a,>)
  • (0,b,>)
  • (1,b,>)
  • (0,b,<)

If we are handed a set of 3 frequency tables:

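C1 vs C2 (rows are the C1 values 0 and 1, columns are the C2 values a and b):

     a  b
0    1  2
1    0  1

C1 vs C3 (rows are the C1 values 0 and 1, columns are the C3 values > and <):

     >  <
0    2  1
1    1  0

C2 vs C3 (rows are the C2 values a and b, columns are the C3 values > and <):

     >  <
a    1  0
b    2  1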

These frequency tables are accurate if C1 is attribute a, C2 is attribute b, and C3 is attribute c.
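To double-check that claim, here is a small brute-force sketch in Python that tries every way of assigning the table columns C1, C2, C3 to tuple positions and keeps the assignments under which every table matches the real counts. Encoding each table as a dictionary keyed by value pairs (with zero entries left out) is my own assumption, as is the helper name `consistent_mappings`.

```python
from collections import Counter
from itertools import permutations

# The four objects from the example; positions 0, 1, 2 are attributes a, b, c.
objects = [(0, "a", ">"), (0, "b", ">"), (1, "b", ">"), (0, "b", "<")]
attributes = ["a", "b", "c"]

# Each table maps a value pair (x, y) to its count; zero entries are omitted.
tables = {
    ("C1", "C2"): {(0, "a"): 1, (0, "b"): 2, (1, "b"): 1},
    ("C1", "C3"): {(0, ">"): 2, (0, "<"): 1, (1, ">"): 1},
    ("C2", "C3"): {("a", ">"): 1, ("b", ">"): 2, ("b", "<"): 1},
}

def consistent_mappings(objects, tables, columns):
    """Return every column-to-position assignment that makes all tables accurate."""
    found = []
    for perm in permutations(range(len(columns))):
        position = dict(zip(columns, perm))
        ok = all(
            Counter((o[position[c1]], o[position[c2]]) for o in objects) == Counter(t)
            for (c1, c2), t in tables.items()
        )
        if ok:
            found.append(position)
    return found

result = consistent_mappings(objects, tables, ["C1", "C2", "C3"])
for mapping in result:
    print({col: attributes[pos] for col, pos in mapping.items()})
```

Trying every assignment takes factorial time in the number of attributes, which is fine for three attributes but is not meant as a practical algorithm.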

The reduction: From 3SAT.  (Actually, this might be from regular CNF-SAT, but there is no reason not to restrict the input to 3 literals per clause.)  We take a SAT instance with p clauses and q variables and build a database with 2 objects per variable (one for the positive literal, one for the negative) and p*q extra objects. We have one attribute per clause (“CLi”), one attribute per variable (“VRi”), and “VALUE” (which holds the truth value of the object in the final setting).  The frequency tables are set up to ensure that each variable gets one truth value (true or false) and that each clause is made true.  The database can be made consistent with the tables when there is a way to map the variables in the database to the tables that satisfies the formula.

Difficulty: 5, because the reduction isn’t that hard to follow, or come up with, once you get how the frequency tables work.  But unlike most 5’s, I don’t think I’d assign it as a homework problem, because of how much extra work it would take to explain the frequency tables in the first place.