Tag Archives: Difficulty 8

Boyce-Codd Normal Form Violation

This week’s and next week’s problems use basically the same construction in their reduction.  We’re going to do them in reverse order, because that’s the order they’re presented in the paper.

The problem: Boyce-Codd Normal Form Violation.  This is problem SR29 in the appendix.

The description: Given a set of attributes A, a collection F of functional dependencies, and a subset A’ of A, does A’ violate Boyce-Codd normal form?

G&J’s definition of BCNF is: A’ violates BCNF if we can find a subset X of A’ and 2 attributes y and z in A’-X such that the pair (X,y) is in the closure of F, and (X,z) is not in the closure of F.

The paper by Beeri and Bernstein define BCNF as: for all disjoint nonempty sets of attributes X and Y in A, If Y is functionally dependent on X, then all attributes in A depend on X.

Example: Here is an example from the Beeri and Bernstein paper that has the reduction:

Let A = {Apartment_Type, Address, Size, Landlord, Apartment_Num, Rent, Tenant_Name}

(I’m pretty sure “Apartment_Type” means something like “Two Bedroom”, “Size” is in square feet, “Address” is the street address of the apartment building, at which there are potentially many apartments each with a different “Apt_Num”)

With these functional dependencies:

  • ({Apartment_Type, Address}, {Size})
  • ({Address}, {Landlord})
  • ({Address, Apt_Num},{Rent})
  • ({Name}, {Address, Apt_Num})

If A’ is {Name, Address, Apt_Num, Rent} then BCNF is violated because Rent is functionally dependent on Name (through the pair of attributes {Address, Apartment_Num}, but other attributes (for example, Size) is not.

Reduction: Beeri and Bernstein reduce from Hitting Set.  So we start with a set S, and a collection C of subsets of S, and an integer K.  We need to create a set of attributes and functional dependencies.

Each element in S will be an attribute in A.  If we have a hitting set S’ of the attributes, then we could create a BCNF violation by making some of the attributes (but not all) of A functionally dependent on S’.

So we create a new attribute for each subset in C.  We create dependencies between the element attribute of each element in each C and its “set” attribute.  Now if we have a hitting set S’, then that set is functionally dependent on each element attribute.  So then if we make some but not all attributes in A dependent on the set of all element attributes, we get a BCNF violation.

That’s the basic idea, but the actual proof has a lot of special cases to worry about and resolve.

Difficulty: 8.  The general idea isn’t too hard (though you have to learn the definition of BCNF to even get started), but there is a ton of details and special cases to worry about to do it right.

Minimum Cardinality Key

Now we enter the “Database Problems” subsection of the chapter.  I’ll admit that I’ve had trouble with a lot of these, partially because databases isn’t an area I know a ton about, but also because the definitions of the problems often obscure what’s really happening.

The problem: Minimum Cardinality Key.  This is problem SR26 in the appendix.

The description: Given a set A of attributes, a collection F of ordered pairs of subsets of A (“functional dependencies”), and a positive integer M, can we find a set K of A attributes or less that form a key for the relational system <A,F>.

G&J defines “form a key” as the ordered pair (K,A) belongs to the closure F* of F, with the following properties:

  • F ⊆ F*
  • B⊆C⊆A implies (C,B) ⊆ F*  (the “projective” closure)
  • (B,C) and (C,D) ⊆ F* implies (B,D) ⊆ F*  (the “transitive” closure)
  • (B.C) and (B,D) ⊆ F* implies (B,C∪D) ⊆ F* (the “additive” closure)

Example: The paper by Lucchesi and Osborn that has the reduction has a good example:

Suppose we have a database table for student records.  The attribute set A could be {Name, ID, Course, Professor, Time}.  Some functional dependencies, written as ordered pairs, could be:

  • ({Name},{ID})  (If you know a student’s name, you can figure out their ID)
  • ({ID},{Name})  (If you know a student’s ID, you can figure out their name)
  • ({Name, Course}, {Professor, Time}) (If you know a student’s name and a course they are in, you can figure out the professor teaching it and what time the course is)
  • ({Professor, Time), {Course}) (If you know the professor of the course and the time it is being taught, you can figure out the course being taught.  Note that you can’t figure out the student because more than one student will be in the course)

If these relations are F, then we can derive all of the attributes just from knowing the name of the student and the course they are taking.  In other words, F* contains the pair ({name, course}, (name, ID, course, professor, time).

Reduction:  The Lucchesi and Osborn paper use different terminology than G&J’s.  They define a relation D, defined in a series of inductive steps, where D[0] is F, and each D[i] is created from D[i-1] by doing one step of one of the closures on the elements in D[i-1].  The final closure (what G&J call F*) they call D.

They also define an expansion of a relation D: Given a subset B of our attribute set A, if a relation (L,R) is in D, where L is in B, but R isn’t, the set B∪R is a D-expansion of B.

They first show that if B is a subset of A, and B’ is a D[0]-expansion of B, then (B,B’) and (B’,B) are in D.  They then show that a subset B of A is D[i]-expansible  (and thus D-expansible) if and only if it is D[0]-expansible.

Then, they do the reduction from VC. A will have one attribute for each vertex in G.  D[0] (G&J’s F) will, for each vertex v, relate the set of v’s neighbors to v.  They show that a subset K of A is a key if and only if it’s a vertex cover of V.  They do this by induction: if K=A, it’s obviously a cover.  If K is smaller than A, then K is a key if and only if there is a D[0] expansion that is also a key. K has a D[0] expansion if and only if there is some vertex v not in K such that all of the vertices adjacent to v are reachable by K.  So, K is a key (and also a cover of A).

Difficulty: 8. There must be an easier way to explain this problem.

External Macro Data Compression

These next problems are related “macro compression” problems.

The problem: External Macro Data Compression.  This is problem SR22 in the appendix.

The description: Given a string s over some alphabet Σ, a “pointer cost” h and a bound B, can we find two strings D and C over the alphabet of Σ augmented with some number (< |s|) of “pointer characters” pi such that:

  • |D| + |C| + (h-1)* (# of p characters in D and C) ≤ B
  • We can generate s by replacing pointer characters in C with their “meaning” in D.

Example: The concept they’re trying to get at isn’t that hard, it’s just hard to explain using mathematical language.  Suppose s = “ABCDEFABCGABCDEF”

Then we can define D to be “ABCDEF” and C to be “pqpGpq”.  The total size of this is 6 for D, plus 6 for C.  There are 5 pointers, so our total cost is 6+6+5h.

The idea is that the characters p and q are “pointers” that refer to substrings in D (p refers to “ABC” and q refers to “DEF”).  By replacing those pointers with what they “mean” in C, we can get back s.

A tricky part of this is that you are allowed to have substrings overlap.  So if s was “ABCDBCD” we could define D to be “ABCD” and C to be “pq” with p meaning “ABCD” and q meaning “BCD”.  Now our total cost is 4 for D, 2 for C, and 2 pointers, so 4+2+h.

Reduction:  The reduction (and the one for SR23) comes from a technical report by Storer, which is pretty dense.  I think we’re looking at “Theorem 2” in the paper, which is from VC<=3.

The alphabet that will be built from the VC instance has a lot of parts:

  • A special symbol $
  • 3 symbols vi, ai, and bi for each vertex vi in the graph
  • 4 more symbols fi,1,1 through fi,2,2 for each vertex vi in the graph
  • one symbol di for each edge ej in the graph.
  • 2 symbols c1 and c2 (there is actually one c symbol for each value of h.  We’ll assume h=2 here)
  • 3 symbols g1,1 g1,2 and g2,1 (This is also based on h.  So really you’d go from g1,1 through gh-1,2 and add gh,1)

The string s will also be built from parts (a lot of these are based on h has well, again, I’m fixing h=2 to keep it simpler)

  • Vi,l = ai$vi
  • Vi,2 = vi$bi
  • For each edge ei= (vj, vk), Ei = $vj$vj$
  • We also have Z1 = (c1)3 (so 3 copies of c1) and Z2 = (c2)3

s is:

  • Eidi  concatenated for each edge, followed by
  • Vi,jfi,j,k concatenated over each vertex and each possible f symbol, followed by
  • Z1g1,1Z1g1,2, followed by
  • Z2g2,1Z2

K’ = |s| + K – (7/2)|V|.

The basic idea from here is that if G has a VC, we can compress s by making pointers for (among other things) the combination of Vi,1fi,1,2Vi,2 and the combination of Vi,2fi,2,2Vi+1,1  where vertex vi is in the VC.  This lets us use the pointers both for the part of the string with the V’s and f’s, but also for the $vi$ strings in the E components, saving space.

In the other direction, if we have a compressed string of size K’, he shows that means that you have compress a string like the above and the overlaps show you the vertices in the cover.

Difficulty: 8.  I think to really get what’s happening, you need to actually build an example on a graph.  But I think the idea of building the sets so that that is the overlap you’re looking for is something that can be gotten eventually.

Protected: String-To-String Correction

This content is password protected. To view it please enter your password below:

Protected: Shortest Common Supersequence

This content is password protected. To view it please enter your password below:

Protected: Pruned Trie Space Minimization

This content is password protected. To view it please enter your password below:

Protected: Numerical 3-Dimensional Matching

This content is password protected. To view it please enter your password below:

Protected: 3-Matroid Intersection

This content is password protected. To view it please enter your password below:

Protected: Comparative Divisibility

This content is password protected. To view it please enter your password below:

Protected: Intersection Pattern

This content is password protected. To view it please enter your password below: