Back on track with more database problems. This is another one of those “really hard to explain in NP-Complete language” problems.

**The problem: **Conjunctive Query Foldability. This is problem SR30 in the appendix.

**The description: **I find G&J’s definition very confusing, I think because they don’t have room to define what the various variables mean in terms of databases (they actually send you to the paper, as if they know it was confusing). The definitions in the paper by Chandra and Merlin that has the reduction do a good job of giving names to things and building up slowly. So:

They define a Relational Data Base as a finite domain D, and a set of relations R_{1} through R_{s}. Each relation is an ordered tuple of elements of D.

A first-order query is of the form (x_{1}, x_{2}, …x_{k}). Φ(x_{1}, x_{2}, ..x_{k}) where Φ is a first-order formula whose only free variables are x_{1} through x_{k}

The result of a query on a database B with domain D is a set of k-tuples {(y_{1}..y_{k}) ∈ D^{k }such that Φ(y_{1}..y_{k}) is true in B}.

A conjunctive query is a first-order query of the form (x_{1}, ..x_{k}).∃ x_{k+1},x_{k+2}, …x_{m}. A_{1}∧A_{2}..A_{r}, where each A_{i} is an atomic formula asking about a relation in a database, and each element in the relation is a variable form x_{1} to x_{m} or a constant.

**Examples of conjunctive queries: **Here are some examples from the paper, so we can see how this relates to databases:

“List all departments that sell pens and pencils”. If we have a relation Sales(Department, Item) (“Department” and “Item” are placeholders for variables), the conjunctive query is:

(x). Sales(x, pen) and Sales(x, pencil).

In this case, there are no new variables (or ∃) added in the query, and “pen” and “pencil” are constants.

“List all second-level or higher managers in department K10” (People who are in X10 who manage someone who manages someone else). If we have a relation Emp(Name, Salary, Manager, Dept), the conjunctive query is:

(x_{1}).∃(x_{2}, …, x_{9}) .Emp(x_{2}, x_{3}, x_{4}, x_{5}) ∧Emp(x_{4}, x_{6}, x_{1}, x_{7}) ∧ Emp(x_{1}, x_{8}, x_{9}, K10)

In this query, x_{1} is the “answer”, but depends on the existence of the other x variables. In particular, x_{4}, who is the employee x_{1} manages, and x_{2}, who is the employee x_{4} manages.

Ok, now the actual foldability definition (this is written less formally than either the paper or G&J):

Given two conjunctive queries Q_{1} and Q_{2}, can we create a function σ mapping all constants and variables x_{1} through x_{k} to themselves (and possibly mapping other variables to constants or variables from x_{1} to x_{k} as well), where if we replace each variable x in Q_{1} with σ(Q_{1}), we get Q_{2}?

**Example: **Here is a “worse” version of the first example above:

(x).Sales(x, x_{pen}) ∧ Sales(x,x_{pencil}) ∧ x_{pen}=pen ∧ x_{pencil} = pencil

This can be “folded” into:

(x).Sales(x, pen) ∧ Sales(x,pencil) ∧ pen=pen ∧ pencil = pencil

..it still has those redundant equalities at the end, though, but I don’t see a way to use the definitions of foldability to get rid of them. I *think* that the definition given in Chandra and Merlin’s paper might let you do it because their function maps the “natural model” of a query, which I think includes relations.

**Reduction: **From Graph Coloring. We start with a graph and an integer K. We build one variable for each vertex, and K additional (separate) vertices, in a set called C. The relational model R relates vertices connected by an edge.

Q_{1} is:

Q_{2} is

So what I think we’re trying to do is “fold” the variables in V into the variables in C, so that there is still a relation between each edge. That only works if the vertices on each side of the edge are different colors because otherwise.the relation won’t show up (because of the constraint). But I’m not sure, and the proof in the paper stops after building this construction.

**Difficulty**: 9. I’m pretty sure a lot of this is going over my head.