Tag Archives: Difficulty 9

Conjunctive Query Foldability

Back on track with more database problems.  This is another one of those “really hard to explain in NP-Complete language” problems.

The problem: Conjunctive Query Foldability.  This is problem SR30 in the appendix.

The description: I find G&J’s definition very confusing, I think because they don’t have room to define what the various variables mean in terms of databases (they actually send you to the paper, as if they knew it was confusing).  The definitions in the paper by Chandra and Merlin that has the reduction do a good job of giving names to things and building up slowly.  So:

They define a Relational Data Base as a finite domain D, and a set of relations R1 through Rs.  Each relation is a set of ordered tuples of elements of D.

A first-order query is of the form (x1, x2, …, xk). Φ(x1, x2, …, xk), where Φ is a first-order formula whose only free variables are x1 through xk.

The result of a query on a database B with domain D is the set of k-tuples {(y1, …, yk) ∈ D^k such that Φ(y1, …, yk) is true in B}.

A conjunctive query is a first-order query of the form (x1, …, xk). ∃xk+1, xk+2, …, xm. A1∧A2∧…∧Ar, where each Ai is an atomic formula asking about a relation in a database, and each element in the relation is a variable from x1 to xm or a constant.

Examples of conjunctive queries: Here are some examples from the paper, so we can see how this relates to databases:

“List all departments that sell pens and pencils”.  If we have a relation Sales(Department, Item)  (“Department” and “Item” are placeholders for variables), the conjunctive query is:

(x). Sales(x, pen) ∧ Sales(x, pencil).

In this case, there are no new variables (or ∃) added in the query, and “pen” and “pencil” are constants.
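To make the semantics concrete, here is a tiny sketch (my own toy example; the Sales rows are invented) of evaluating this query:

```python
# Toy evaluation of the conjunctive query (x). Sales(x, pen) AND Sales(x, pencil)
# against a made-up Sales(Department, Item) relation. The rows are invented.
sales = {
    ("stationery", "pen"),
    ("stationery", "pencil"),
    ("toys", "pen"),
}

# The query asks for every department x such that both (x, "pen") and
# (x, "pencil") are tuples of the relation.
departments = {dept for (dept, _) in sales}
answer = {x for x in departments
          if (x, "pen") in sales and (x, "pencil") in sales}

print(answer)  # {'stationery'} -- "toys" sells pens but not pencils
```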

“List all second-level or higher managers in department K10” (People who are in K10 who manage someone who manages someone else).  If we have a relation Emp(Name, Salary, Manager, Dept), the conjunctive query is:

(x1).∃(x2, …, x9) .Emp(x2, x3, x4, x5) ∧Emp(x4, x6, x1, x7) ∧ Emp(x1, x8, x9, K10)

In this query, x1 is the “answer”, but depends on the existence of the other x variables.  In particular, x4, who is the employee x1 manages, and x2, who is the employee x4 manages.

Ok, now the actual foldability definition (this is written less formally than either the paper or G&J):

Given two conjunctive queries Q1 and Q2, can we create a function σ mapping all constants and the variables x1 through xk to themselves (and possibly mapping the other variables to constants or to variables from x1 to xk as well), such that if we replace each variable x in Q1 with σ(x), we get Q2?

Example: Here is a “worse” version of the first example above:

(x).Sales(x, xpen) ∧ Sales(x,xpencil) ∧ xpen=pen ∧ xpencil = pencil

This can be “folded” into:

(x).Sales(x, pen) ∧ Sales(x,pencil) ∧ pen=pen ∧ pencil = pencil

It still has those redundant equalities at the end, though, and I don’t see a way to use the definitions of foldability to get rid of them.  I think that the definition given in Chandra and Merlin’s paper might let you do it, because their function maps the “natural model” of a query, which I think includes relations.
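Here is a rough sketch of what applying a folding function looks like on the “worse” query above, using a representation of my own (a query as a set of atoms) rather than Chandra and Merlin’s formalism:

```python
# A conjunctive query as a set of atoms: (relation_name, tuple_of_terms).
# Constants are plain strings; variables are strings starting with "?".
q1 = {("Sales", ("?x", "?xpen")),
      ("Sales", ("?x", "?xpencil")),
      ("=", ("?xpen", "pen")),
      ("=", ("?xpencil", "pencil"))}

q2 = {("Sales", ("?x", "pen")),
      ("Sales", ("?x", "pencil")),
      ("=", ("pen", "pen")),
      ("=", ("pencil", "pencil"))}

# The folding function: it fixes the distinguished variable ?x and all constants,
# and is allowed to send the existential variables to constants or to x1..xk.
sigma = {"?x": "?x", "?xpen": "pen", "?xpencil": "pencil"}

def fold(query, sigma):
    """Substitute sigma through every atom of the query."""
    return {(rel, tuple(sigma.get(t, t) for t in terms)) for rel, terms in query}

print(fold(q1, sigma) == q2)   # True: q1 folds onto q2
```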

Reduction: From Graph Coloring.  We start with a graph and an integer K.  We build one variable for each vertex, and K additional (separate) variables in a set called C. The relation R relates vertices connected by an edge.

Q1 is: (). \exists V.\exists C. (\bigwedge\limits_{(u,v) \in E} R(u,v) \wedge \bigwedge\limits_{(u,v)\in C, u\ne v} R(u,v))

Q2 is: (). \exists C. \bigwedge\limits_{(u,v)\in C, u \ne v} R(u,v)

So what I think we’re trying to do is “fold” the variables in V into the variables in C, so that there is still a relation between each edge.  That only works if the vertices on each side of the edge are different colors, because otherwise the relation won’t show up (because of the u \ne v constraint).  But I’m not sure, and the proof in the paper stops after building this construction.
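Here is how I picture the construction, as a sketch that just builds the two atom sets from a graph and K (the representation and names are mine):

```python
# Sketch of the foldability construction from a Graph Coloring instance.
# Q1's atoms: R(u, v) for every edge of G, plus R(c, c') for every ordered
# pair of distinct "color" variables; Q2 has only the color atoms.
def build_queries(vertices, edges, K):
    colors = [f"c{i}" for i in range(K)]          # the K extra variables in C
    color_atoms = {("R", (c, d)) for c in colors for d in colors if c != d}
    edge_atoms = {("R", (u, v)) for (u, v) in edges}
    q1 = edge_atoms | color_atoms                 # existential over V and C
    q2 = set(color_atoms)                         # existential over C only
    return q1, q2

# A triangle needs 3 colors: any fold of Q1 onto Q2 must send the two
# endpoints of every edge to *different* color variables.
q1, q2 = build_queries(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")], 3)
```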

Difficulty: 9.  I’m pretty sure a lot of this is going over my head.

Additional Key

After skipping around it for a while, we’re finally ready to return to this problem.

The problem: Additional Key.  This is problem SR27 in the appendix.

The description: Given a set A of attributes, a set F of functional dependencies, a subset R of A, and a set K of keys on R, can we find an additional key on R for F?  (That is, a subset R’ of R that is not in K and that serves as a key for F.)

Example: This example is from the Beeri and Bernstein paper that had last week’s reduction, and has this week’s as well.  The functional dependencies are:

  • ({A,B}, {C})
  • ({C}, {B})

If R = A = {A,B,C}, and K = the single key {A,B}, then {A,C} is an additional key.
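A quick way to sanity-check the example is to compute attribute closures: a set is a key if its closure is all of R and no proper subset’s closure is.  This brute-force sketch (mine, not from the paper) confirms that {A,C} is an additional key:

```python
from itertools import combinations

# Closure of an attribute set under functional dependencies (left, right).
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            if left <= result and not right <= result:
                result |= right
                changed = True
    return result

def is_key(cand, fds, R):
    """A key determines all of R, and no proper subset of it does."""
    if closure(cand, fds) != R:
        return False
    return all(closure(set(sub), fds) != R
               for sub in combinations(cand, len(cand) - 1))

fds = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
R = {"A", "B", "C"}
K = [{"A", "B"}]                                  # the key we already know about

candidate = {"A", "C"}
print(is_key(candidate, fds, R) and candidate not in K)   # True: {A,C} is an additional key
```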

Reduction: The Beeri and Bernstein paper uses a construction similar to last week’s.  Again, we reduce from Hitting Set.

They build the same construction as in the BCNF violation reduction.  Recall that the set was set up so that there was a BCNF violation if there was a hitting set, and that violation happened because some but not all attributes were functionally dependent on the hitting set.  So if we add those extra attributes to the set of attributes in the hitting set, we get an additional key.

That’s the basic idea, but the paper has some parts I have trouble following (there are some notation issues I can’t parse, I think).  For example, it’s not immediately obvious to me what K is for this problem.  I think it’s the set of new attributes, but I’m not sure.

Difficulty: 9.  One step harder than the last problem, since it builds on it.

Prime Attribute Name

We’re skipping ahead a bit because this reduction is from the same paper as last week’s (and reduces from last week’s problem).

The problem: Prime Attribute Name.  This is problem SR28 in the appendix.

The description: Given a set A of attribute names, and a collection F of functional dependencies like in the last problem, and a specific attribute x in A, is x in some key K for the relational system <A,F>?  (Recall that keys need to be of minimal size).

Example: Using the example from last week: the list of attributes was {Name, ID, Course, Professor, Time}.  The functional dependencies were:

  • ({Name},{ID})  (If you know a student’s name, you can figure out their ID)
  • ({ID},{Name})  (If you know a student’s ID, you can figure out their name)
  • ({Name, Course}, {Professor, Time}) (If you know a student’s name and a course they are in, you can figure out the professor teaching it and what time the course is)
  • ({Professor, Time}, {Course}) (If you know the professor of the course and the time it is being taught, you can figure out the course being taught.  Note that you can’t figure out the student because more than one student will be in the course)

Recall that we said the pair of attributes {Name, Course} was a key for this system.  (So, if x was “Name” or “Course”, a solver for this problem should answer “yes”.)  But since Name and ID are related, ID is also a prime attribute.  I think Professor and Time are not prime, because any subset of A that includes them and has the required characteristic is not minimal (if you take Professor, you’d have to add two other things, for example Course and ID).
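To check my intuition about which attributes are prime, here is a brute-force sketch (mine, and certainly not the paper’s method) that enumerates the smallest keys, reading “key” as a key of minimum cardinality as above:

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of an attribute set under functional dependencies (left, right)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            if left <= result and not right <= result:
                result |= right
                changed = True
    return result

def smallest_keys(A, fds):
    """All keys of minimum cardinality (the reading of 'key' used in this post)."""
    for size in range(1, len(A) + 1):
        found = [set(c) for c in combinations(sorted(A), size)
                 if closure(set(c), fds) == A]
        if found:
            return found
    return []

A = {"Name", "ID", "Course", "Professor", "Time"}
fds = [({"Name"}, {"ID"}),
       ({"ID"}, {"Name"}),
       ({"Name", "Course"}, {"Professor", "Time"}),
       ({"Professor", "Time"}, {"Course"})]

keys = smallest_keys(A, fds)
print(keys)                                  # two keys of size 2: {Course, ID} and {Course, Name}
print(any("ID" in k for k in keys))          # True: ID is prime
print(any("Professor" in k for k in keys))   # False under this reading
```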

Reduction: This reduction is in the same paper that has the Minimum Cardinality Key reduction, and in fact reduces from Minimum Cardinality Key.  Their terminology differs from what I typically use, but I’ll try to do it their way to be close to the paper.

The MCK instance starts with a set of attributes A’, a set of functional dependencies D[0]’ (what we’ve been calling F), and a key size m (what we call k).  We need to build an instance of the PAN problem: a set of attributes A, a new set of functional dependencies D[0], and a specific attribute we want to check: b (what our definition calls x).

First, they define a set A”, which is “a set whose cardinality is the smaller of |A’| and m” (I guess consisting of all new elements).  They define A to be A”xA’ unioned with A’ unioned with a new attribute b.  They define D[0] with the following characteristics:

  • For all subsets E and F of A’, if E relates to F in D[0]’, then E unioned with b relates to F in D[0]
  • A’ unioned with b relates to A”xA’
  • For each ordered pair (i,e) in A”xA’, b unioned with (i,e) relates to e.
  • For each element i in A” and all distinct e,f in A’, the set {(i,e), (i,f)} relates to b.
  • Each element e in A’ relates to b.

The idea now is that if D[0] has a minimal key that contains b, you need a bunch (say n) of the (i,e) pairs in the key as well, to distinguish which ones relate to b.  Then the A’ part of those ordered pairs gives you n elements of A’ that are a key for A’.

If A’ has a key of cardinality n <= m, then we know that A” has at least n elements, and adding b to the pairs of items in A’ and A” gets you a key for A.

Also, b has to be in the key for A, since removing it gets you a set that is not D[0]-expandable.

Difficulty: 9.  This is really a lot to digest.  I’m not entirely sure I get what’s happening myself.


Pokemon

The other major reduction my student, Dan Thornton, was working on was a reduction for the old Game Boy Pokemon games.

This is Dan’s last reduction for his independent study class.  He’s going to be applying to grad schools this year- if you’re running a grad program, you should accept him!

(Over to Dan)

The Problem:
Here we prove that a generalized version of the classic Pokemon Red/Blue video game is NP-complete.  Generalized Pokemon (hereafter referred to as GP) asks the following question: Given a party of Pokemon and a map of trainers, can you get from the start position to the end position without having all of your Pokemon defeated in battle, and continue your journey of catching them all?

This is based on a paper by Aloupis, Demaine, Guo, and Viglietta on classic Nintendo games.

Background:
The goal of any Pokemon game is to progress through the world collecting all Pokemon and become the very best Pokemon trainer there ever was. We will focus on a subproblem where there is a path filled with enemy trainers from a start point to some end point. Solving GP, determining whether we can get to the finish from the start, turns out to be NP-complete.

In our construction of GP, there are two types of enemy trainers we will encounter: hard trainers, which we will always lose against, and easy trainers, which we will always win against. The details of this construction are described below.

The description:
Here we reduce GP from 3SAT by building certain constructs or gadgets in GP that can be used to model clauses, variables, variable assignment, and satisfiability. Then, using these component pieces, we may build an instance of GP that is logically equivalent to an instance of 3SAT. Below is the framework that we will map components of GP onto.

[Figure 1: the reduction framework graph]

In the above figure all solid lines are single-use paths; all dashed lines have no traversal limit.

The idea is that our trainer starts at the Start location and proceeds to a variable gadget, where they have an exclusive choice between two paths. One path reflects the assignment of true, the other of false. This choice takes us to a series of paths that let us proceed to each clause the variable assignment satisfies and unlock it. Then we move on to the next variable. Eventually, after proceeding through all of the variable gadgets, we arrive at the check-in phase. Here we try to pass through all of the clauses; only the clauses we previously satisfied with the correct variable assignment may be passed through. If all clauses are satisfied, then we are able to pass through every one and arrive at the Finish state.

As an example, in the above figure the colored edges give us a path that satisfies all the clauses and passes to the Finish. The red edges show us assigning the value of true to x, then proceeding to the first and second clauses, which are satisfied by this assignment of x. The blue edges show an assignment for y and the green ones an assignment for z. It is important to note that we are unable to move backwards except on dashed edges; this prevents us from going back and attempting to correct our variable assignment.

Introduction
To show equivalence we break any 3SAT instance up into the following: F = \wedge_{i=1}^{n} C_{i}. Here each clause is of the form C_{i} = x_{i1} \vee ... \vee x_{ik_i}, where k_{i} < 4 and each x_{ik_i} is a literal of the form \neg z_{l} or z_{l}.

Components

The construction of the GP instance is built out of several component pieces,  each described below.

Trainer Construction

Our trainer has a single Pokemon, a Ghastly that knows only a single move, Self-Destruct, which causes the user to faint but deals massive damage to a single enemy Pokemon. If it is ever our trainer’s turn during a battle, we are forced to use a move.

By the above, if we ever have to use a move, we lose, since we have only a single suicidal move and a single Pokemon. When we lose, we are unable to move and will be unable to get to the Finish.

Hard trainers have two Snorlax, both of which are slower than our Ghastly, so when we go into battle against them we are forced to use Self-Destruct and lose. Even if we cause one of the opponent’s Snorlax to faint, we still lose the battle, as they have another.

Easy trainers all have a single Electrode that only knows Self-Destruct and has more speed than our Ghastly, so they will always go first and destroy themselves. Normally Self-Destruct would also deal massive damage to our Ghastly, but because Ghastly is a ghost-type Pokemon and Self-Destruct is a normal-type move, it will not affect our Ghastly. Thus the opposing trainer will run out of Pokemon before us, and we win.

So, we have a situation where we may encounter any number of easy trainers and win, but a battle with a hard trainer guarantees a loss.
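As a toy model of these battle rules (heavily simplified, and not part of the paper’s construction), the whole mechanic comes down to speed order and Ghastly’s ghost-type immunity to Self-Destruct:

```python
# Toy model of the battle rules above (heavily simplified).
# Our trainer has one ghost-type Pokemon whose only move is Self-Destruct.
def battle(opponent_pokemon):
    """opponent_pokemon: list of (speed, move) pairs, one per enemy Pokemon."""
    OUR_SPEED = 50
    for speed, move in opponent_pokemon:
        if speed > OUR_SPEED and move == "self-destruct":
            # The enemy moves first, faints itself, and (being a normal-type
            # move used against a ghost type) does nothing to us.
            continue
        # Otherwise we eventually get a turn, are forced to use Self-Destruct,
        # and our only Pokemon faints: we lose.
        return "lose"
    return "win"

easy_trainer = [(90, "self-destruct")]        # one fast Electrode
hard_trainer = [(10, "rest"), (10, "rest")]   # two slow Snorlax
print(battle(easy_trainer))   # win
print(battle(hard_trainer))   # lose
```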

Field of Vision

FieldOfView

Enemy trainers have a “field of vision” of some fixed length. In the above figure, trainer A’s field of vision is given by the blue squares. Once the player enters a trainer’s field of vision, they will be unable to move; the enemy trainer who can see them will approach, and the player will be forced to battle. Other enemy trainers can block a trainer’s field of vision: for example, the striped blue box above is no longer in trainer A’s field of view due to trainer B.

Alternatively we may choose to force an enemy trainer to battle our trainer if we stand in any of the orange squares.

Finally once an enemy trainer has been defeated they will remain on the square they were defeated on indefinitely, so if we challenged and beat trainer A with our trainer standing on any of the orange squares, then trainer A would stay on the exact square he is on even if our trainer moved into his field of vision (any of the solid blue squares).

To distinguish the two types of trainers, hard trainers will have a blue field of vision, whereas easy trainers will have a red one.

Variable Construction
Variable assignment may be modeled using the construction in Figure 3. Here the player enters through a. The player may choose to battle the enemy trainer while standing on either square x or square y. If our trainer battles the enemy trainer on square x, then the enemy trainer will block off the c exit, and once defeated will stay there indefinitely. Alternatively, if our trainer battles on square y, then the defeated enemy trainer will remain where he is indefinitely and we must exit through c. This construction therefore presents the player with an exclusive choice between two possible paths, b or c.
[Figure 3: the variable gadget]

Single Use path
In order to prevent a player from retracing their steps to access previous paths, we may use the construction below. Here we assume a player enters from a and exits through b.

Notice that to pass from a to b our player must pass through both squares marked x and y. Once we pass through y, trainer C will have moved forward to battle our trainer at y. This means that trainer B now has line of sight to x. Because we must go through square x to get from a to b or from b to a, any further passage through this gadget will be impossible.

Also notice that if we enter from b then we must pass through y before x and so by the same argument as above this is impossible.

Trainer A simply serves as a barrier that prevents us from battling trainer C from an adjacent square.

So this gadget is a single use unidirectional path.

[Figure: the single-use path gadget]

Clauses
Below we illustrate a 3SAT clause; each of the easy trainers is a literal.
If the player at any time enters through any of the three literal paths from above, then they may battle an easy trainer. If a player battles any of the easy trainers, then later on, when the clause gadget is entered through check-in, the easy trainer the player already battled will not approach the player, and will instead continue to block hard trainer B’s field of vision so that we may pass through to check-out. Hard trainer A is there just to prevent re-entering the literal paths after entering this gadget through check-in.

[Figure: the clause gadget]

Crossover
The crossover gadget shown below is present to deal with the issue of planarity, as there is no device in the original Pokemon games for passing “over” other portions of the map. If we view the paths between gadgets as edges of a graph, similar to our framework in the first figure, then without this gadget we are restricted to building planar graphs.

This gadget lets two paths cross without letting a player switch the path they are on. So a player who enters from x is forced to leave through x', and similarly for y and y'. Each path is single use, but using the x path has no effect on our use of the y path, and vice versa.  As a result we may assume we can build a single-use path between any two points.

[Figure: the crossover gadget]

Reduction
Here we reduce from an instance of 3SAT. We are given a formula F of the form defined in the Introduction above. Now we build F into an instance of GP using the reduction framework in Figure 1 as well as our components. We also impose the restriction that in every variable gadget, path b corresponds to assigning the variable the value True and path c corresponds to False.

True_{3SAT} \Rightarrow True_{GP}
If we have a variable assignment \phi_{F} that satisfies F then by definition there is a literal z_{j} in every clause C_{i} such that z_{j} satisfies C_{i}. Then in our constructed instance of GP we must be able to visit the literal section of every clause gadget at least once prior to arriving at the check-in phase. So it follows that we are able to battle at least 1 of the easy trainers in each clause gadget before check-in. Therefore it must be possible for us to reach the finish.

True_{GP} \Rightarrow True_{3SAT}
If it is possible for our trainer to get from start to finish, then again by the framework in Figure 1 we must be able to battle an easy trainer in each of our clause gadgets. To reach these clause gadgets the trainer had to make an exclusive decision in each variable gadget. This choice corresponds to a truth assignment. So the path our trainer takes through the variable gadgets, and then through the clause gadgets to the Finish, gives us an assignment of the variables such that every clause is satisfied. This corresponds to an assignment that satisfies F.
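Putting the two directions together, the framework can be checked abstractly: choosing a path in each variable gadget unlocks some clause gadgets, and the check-in phase can be passed only if every clause gadget was unlocked. Here is a small sketch of that abstraction (ours, not anything from the paper):

```python
# Abstract simulation of the reduction framework in Figure 1.
# A formula is a list of clauses; each clause is a list of signed variable
# indices (positive = un-negated, negative = negated).
def reach_finish(clauses, num_vars, choices):
    """choices[v] is True or False: the exclusive path taken in v's variable
    gadget. A clause gadget is unlocked if some literal path into it was used."""
    unlocked = set()
    for v in range(1, num_vars + 1):
        for i, clause in enumerate(clauses):
            if (v if choices[v] else -v) in clause:
                unlocked.add(i)                 # walk the literal path into clause i
    # Check-in phase: we can pass through a clause gadget only if it was unlocked.
    return all(i in unlocked for i in range(len(clauses)))

# (x or y or not z) and (not x or z): satisfied by x=True, z=True.
clauses = [[1, 2, -3], [-1, 3]]
print(reach_finish(clauses, 3, {1: True, 2: False, 3: True}))   # True
print(reach_finish(clauses, 3, {1: True, 2: True, 3: False}))   # False (second clause stays locked)
```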

(Back to me)

Difficulty: 9.  This isn’t that hard to understand, but there are lots of details that are needed, like having to place the trainers in exactly the right positions for it all to work.

Minimum Inferred Finite State Automaton

Here’s the actual problem my student Dan Thornton spent most of his semester trying to understand and prove.  It’s pretty crazy.

The problem: Automaton Identification from Given Data.  We abbreviate it “AutID”.  G&J call this problem “Minimum Inferred Finite State Automaton” and it is problem AL8 in the appendix.

The description: Given a finite set of data, is it possible to build a deterministic finite automaton of n or fewer states that agrees with the data?
Prior to giving a description of this problem we must introduce a few concepts and definitions. We want to construct a finite automaton; here this means a Mealy-model deterministic finite automaton. Formally, it is a 6-tuple of the form

    \[FA = \langle U,S,Y,f_{tr},f_{out},s_{0} \rangle\]

  • U is the input alphabet, where u \in U is an individual character, and \overline{u} \in U^{*} is a string.
  • S the set of states in the model. s \in S refers to an individual state. S may equivalently be referred to as the set S_{[s]} of all equivalence classes [s]_{\overline{u}}, with representative element \overline{u}. Each equivalence class is a set of all input strings \overline{u} \in U^{*} such that f_{tr}(\overline{u}) = s.
  • Y the output alphabet, where y \in Y is an individual character and \overline{y} \in Y^{*} is a string.
  • f_{tr}: S \times U \rightarrow S is the transition function.
  • f_{out}: S \times U \rightarrow Y is the output function.
    • Both of the above functions may be extended in a natural way so that
      f_{tr}: S \times U^{*} \rightarrow S and f_{out}: S \times U^{*} \rightarrow Y^{*} respectively. Usage should be clear from context.
  • s_{0} \in S is the start state.

We use the standard convention that \phi refers to the empty string of no characters.

A black box is similar to an ‘unknown’ Mealy model with the restriction that for any input string \overline{u} we only get to see the last output character f_{out} returns. Consider the following black box, with input alphabet U = \lbrace 1 , 0 \rbrace and output alphabet Y = \lbrace A,B,C,D \rbrace.

Here the edges are labeled ( input / output )
[Figure: the example black box automaton]

So if the above black box were passed the string 0001, the complete Mealy model’s output function would give us back the string ADDC, whereas the black box would only give us back C.

Now, our finite automaton is constructed using data we are given; here we assume that data is a finite set D of pairs of the form (input \ string, black \ box \ output):
D = \lbrace( \overline{u}_{1},y_{1} ) ,...,(\overline{u}_{k},y_{k}) \rbrace such that \overline{u}_{i} \in U^{*} is not empty and \overline{u}_{i} \neq \overline{u}_{j} for i \neq j.

An automaton A agrees with data D if for every datum d = (input string, output character) \in D, the final output character of f_{out}(input string) is the output character.

Formal Problem Statement:
Given an input alphabet U, an output alphabet Y, and data (a finite set D of pairs), determine if an automaton of n or fewer states exists that agrees with D.
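Since the definitions are dense, here is a small sketch of a Mealy machine, the “black box” view (last output character only), and the agreement check. The machine in the sketch is made up; it is not the automaton in the figure above.

```python
# Minimal Mealy machine: transitions and outputs are dicts keyed by (state, input).
def run(machine, input_string):
    """Return the full output string f_out on input_string from the start state."""
    state, out = machine["start"], []
    for ch in input_string:
        out.append(machine["output"][(state, ch)])
        state = machine["transition"][(state, ch)]
    return "".join(out)

def black_box(machine, input_string):
    """A black box only reveals the last output character."""
    return run(machine, input_string)[-1]

def agrees(machine, data):
    """data: a set of (input_string, output_character) pairs."""
    return all(black_box(machine, u) == y for u, y in data)

machine = {
    "start": 0,
    "transition": {(0, "1"): 1, (0, "0"): 0, (1, "1"): 0, (1, "0"): 1},
    "output":     {(0, "1"): "A", (0, "0"): "B", (1, "1"): "C", (1, "0"): "D"},
}
print(run(machine, "0001"))          # "BBBA"
print(black_box(machine, "0001"))    # "A"
print(agrees(machine, {("0001", "A"), ("11", "C")}))   # True
```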

The Reduction:

The reduction by Gold uses Monotone EQ SAT: we are given a conjunction of clauses F = \wedge_{i=1}^{l} C_{i}, where each clause in F contains only negated or only non-negated variables z_{j} and the number of clauses equals the number of variables, and we ask whether there is an assignment of the variables so that F is satisfied. Our instance will have l clauses and l variables. Clauses will be numbered C_{0} ... C_{l-1} and variables will be numbered z_{1} ... z_{l}.

In the above instance of Monotone EQ SAT we do not have a variable z_{0}. Indices in our reduction will sometimes force us to reference z_{0}. We may think of this variable as simply a place holder for z_{l}.

The idea behind this reduction is that we may encode our instance of Monotone EQ SAT into data D in such a way that if we can create a finite automata from our encoding, then we can use it to obtain a solution to the instance of Monotone EQ SAT if one exists.

Aut ID construction:
Here we transform our instance of Monotone EQ SAT into data D; to do so we need the following functions:

    \[I_{F}(i,k) = \left\{ \begin{array}{ll} "no-value" & \quad \text{ if }\ z_{l - k} \in C_{i} \\ 0 & \quad otherwise \end{array} \right. \\ \text{where } \\ 0 \leq i \leq l-1 \text{ and } \\ 1 \leq k \leq l\]

    \[\pi_{F}(i) = \left\{ \begin{array}{ll} 1 & \quad \text{if}\ C_{i} \text{ contains uncomplemented variables}\ \\ 0 & \quad \text{if}\ C_{i} \text{ contains complemented variables}\ \end{array} \right. \\ \text{where } \\ 0 \leq i \leq l-1\]

The bounds on the above functions are a result of our numbering of the clauses C_{i} and variables z_{j}. Note that z_{0} is really a placeholder for z_{l}.
Given a propositional formula F which is an instance of Monotone EQ SAT, we encode it as follows:

  • The variables of F are encoded by the set
    D_{L} =\lbrace (1,0),(11,0),....,(1^{l-1},0),(1^{l},1) \rbrace. Here each z_{j} corresponds to the element (1^{j},y) where y is an element of the output alphabet, Y.
  • We encode the clauses C_{i} of F by D_{C} = \lbrace (1^{i}01^{j}, I_{F}(i,j)) \rbrace for 0 \leq i \leq l-1 , 1 \leq j \leq l. Here each clause has an identifying tag, 1^{i}0, and the 1^{j} term is used to specify a particular variable, z_{l - j}. Importantly, data in this set for which I_{F}(i,j) returns "no-value" are thrown out.
    • Each clause encoding 1^{i}01^{j} will have output 0 only if clause C_{i} does not contain variable z_{l - j}; otherwise I_{F}(i,j) will be "no-value" and the pair is thrown out.
  • We then encode whether each clause is a set of all negated or all non-negated variables: D_{P} = \lbrace (1^{i}00, \pi_{F}(i)) \rbrace for 0 \leq i \leq l-1.

As a simple example we convert to data the following formula

    \[F = (z_{1} \vee z_{2} \vee z_{4}) \wedge (\neg z_{2} \vee \neg z_{3} \vee \neg z_{4})\]

So after our conversion we get:

  • D_{L} = \lbrace (1,0) , (11,0) , (111,0) , (1111,1) \rbrace
  • D_{C} = \lbrace (01,0),(10111,0) \rbrace
  • D_{P} = \lbrace (00,1),(100,0) \rbrace
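The encoding is mechanical enough to write down as a short sketch. This is a restatement of D_{L}, D_{C}, and D_{P} (not code from the Gold paper), and it reproduces the three sets in the small example above:

```python
# Sketch of the encoding above. Variables are integers 1..l; a clause is a
# list of signed integers (negative = complemented).
def encode(clauses, l):
    def var(k):          # I_F(i,k) talks about z_{l-k}; z_0 is a placeholder for z_l
        return (l - k) if (l - k) != 0 else l
    D_L = [("1" * j, 0) for j in range(1, l)] + [("1" * l, 1)]
    D_C, D_P = [], []
    for i, clause in enumerate(clauses):            # clauses numbered C_0 .. C_{l-1}
        vars_in_clause = {abs(x) for x in clause}
        for k in range(1, l + 1):
            if var(k) not in vars_in_clause:        # I_F(i,k) = 0; "no-value" pairs are dropped
                D_C.append(("1" * i + "0" + "1" * k, 0))
        D_P.append(("1" * i + "00", 1 if all(x > 0 for x in clause) else 0))  # pi_F(i)
    return D_L, D_C, D_P

# The example formula F = (z1 v z2 v z4) ^ (~z2 v ~z3 v ~z4):
D_L, D_C, D_P = encode([[1, 2, 4], [-2, -3, -4]], 4)
print(D_L)   # [('1', 0), ('11', 0), ('111', 0), ('1111', 1)]
print(D_C)   # [('01', 0), ('10111', 0)]
print(D_P)   # [('00', 1), ('100', 0)]
```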

Aut ID instance
Now we create an instance of Aut \ ID by letting n be the number of variables l and D = D_{L} \cup D_{C} \cup D_{P}.

True_{Aut ID} \Rightarrow True_{Monotone EQ SAT}
Now we may assume that we have a finite automaton A that agrees with the data in D. We may use the transition function f_{tr} and the output function f_{out} to deduce properties of A.

Let s_{\overline{u}} be f_{tr}(\overline{u}); this is the state that A is in after receiving the input string \overline{u}.

A splitting string of \overline{u}_{1} and \overline{u}_{2} is a string \overline{u}_{s} \in U^{*} so that f_{out}(\overline{u}_{1} \overline{u}_{s}) differs from f_{out}(\overline{u}_{2} \overline{u}_{s}). If such a string exists then obviously s_{\overline{u}_{1}} \neq s_{\overline{u}_{2}}. If two states do not have a splitting string then they are equivalent.
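A splitting string can also be searched for directly. Here is a brute-force sketch (again, just an illustration) that looks for a suffix, up to a fixed length, whose final output character distinguishes the states reached by two prefixes:

```python
from itertools import product

def last_output(transition, output, start, s):
    """Final output character of a Mealy machine on string s (None on the empty string)."""
    state, last = start, None
    for ch in s:
        last = output[(state, ch)]
        state = transition[(state, ch)]
    return last

def splitting_string(transition, output, start, u1, u2, alphabet="01", max_len=4):
    """Brute-force search for a suffix w whose final output distinguishes u1 from u2."""
    for n in range(1, max_len + 1):
        for w in map("".join, product(alphabet, repeat=n)):
            if last_output(transition, output, start, u1 + w) != \
               last_output(transition, output, start, u2 + w):
                return w
    return None   # none found up to max_len; the two reached states look equivalent

# Example with a tiny made-up machine: two states, flipping on input "1".
transition = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
output     = {(0, "0"): "0", (0, "1"): "0", (1, "0"): "0", (1, "1"): "1"}
print(splitting_string(transition, output, 0, "1", "11"))   # "1": the two prefixes reach different states
```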

We define variable states as follows: a variable z_{j} has the encoding 1^{j}, so s_{1^{j}} is z_{j}‘s variable state; we will sometimes refer to this as s_{z_j}.

Clause states are defined in a manner similar to variable states. A clause C_{i} corresponds to the identifying tag in the encoding of C_{i}, namely 1^{i}0, so s_{1^{i}0} is C_{i}‘s clause state; we will sometimes refer to this as s_{C_i}.

In the following theorems we shall talk of assigning clause state s_{C_i} to variable state s_{z_j}; this means that these two states are equivalent, or alternatively that there does not exist a splitting string for s_{C_i} and s_{z_j}.

Theorem 1: All variable states are different.

Proof. We assume two variable states {s_z}_j and {s_z}_{j'} are equal, where j > j'. Then in the encoding of D we get z_{j} = 1^{j} and z_{j'} = 1^{j'} respectively. But now we may define the splitting string 1^{l-j}. The final element of f_{out}(1^{j} 1^{l-j}) = f_{out}(1^{l}) is defined in our data D by the pair (1^{l},1). Appending the same splitting string to 1^{j'} gives f_{out}(1^{j'} 1^{l-j}) = f_{out}(1^{l'}) where l' = l - (j - j') < l; in our data this corresponds to one of the entries (1^{l'},0). Our automaton A must agree with the data, so these states {s_z}_j and {s_z}_{j'} cannot be equal. Thus we have reached a contradiction. \square

Theorem 2: Each state in our finite automaton A corresponds to exactly one variable.

Proof. The above follows by the pigeonhole principle: A has at most l states, there are l variable states, and by Theorem 1 no two variable states are equal, so each state of A is exactly one variable state. \square

By the above we may talk about the states of A in terms of the variable each state corresponds to.

Theorem 3: If clause state {s_C}_i and variable state {s_z}_j are equal, then clause C_{i} contains variable z_{l-j}.

Proof. Assume {s_C}_i = {s_z}_j and z_{l-j} is not contained in C_{i}. Once again we have the following encodings C_{i} = 1^{i}0 and z_{l-j} = 1^{l-j} that were defined in our construction of D. Now we may define a splitting string u_{s}= 1^{j}. When we append the splitting string to each of the above encodings we get C_{i} + u_{s} = 1^{i}01^{j} and z_{l-j} + u_{s} = 1^{l-j} 1^{j}. Both of these strings are defined in our data D as (1^{i}01^{j},I_{F}(i,j)) and (1^{l},1) respectively, and by our assumption that z_{l-j} is not contained in C_{i}, I_{F}(i,j) = 0. So by a conclusion identical to that of Theorem 1, {s_C}_i \neq {s_z}_j, and our theorem follows. \square

Theorem 4: No two clause states {s_C}_i and {s_C}_k may be equal if they are of opposite composition, meaning one has all negated variables and the other all un-negated variables.

Proof. Once again we proceed by contradiction: assume {s_C}_{i} = {s_C}_{k} and that C_{i}, C_{k} are of opposite composition. We have the clause encodings 1^{i}0 and 1^{k}0 respectively. Now we may define the splitting string 0. Looking at the final character output by f_{out}, we see that f_{out}(1^{i}00) = \pi_{F}(i) and f_{out}(1^{k}00) = \pi_{F}(k). It follows that \pi_{F}(i) \neq \pi_{F}(k), as C_{i} and C_{k} are of opposite composition. Thus we arrive at a contradiction. \square

Theorem 5: A solution to Aut \ ID gives us a solution to F, our instance of Monotone EQ SAT.

Proof. By Theorem 2 every clause state must be assigned to a single variable state; note that this does not mean only a single assignment is possible, just that the given automaton A will give a single assignment. By Theorem 1 no clause may be assigned more than one variable state. By Theorem 3 a clause may be assigned to a variable state only if the clause contains the variable. So we see that our automaton A assigns clauses to the variables that could potentially satisfy them. Furthermore, by Theorem 4, each variable state is given clauses of only one composition, so there is an obvious variable assignment that satisfies all of the clause states equal to a given variable state. \square

True_{Monotone EQ SAT} \Rightarrow True_{Aut ID}
Now we assume there exists a mapping \theta_{F}: z_{j} \rightarrow \lbrace True(or \ 1), False(or \ 0) \rbrace that satisfies F, our instance of Monotone EQ SAT with l clauses and variables. From this mapping we build a finite automaton A_{F} that agrees with all of the data D.

We let A_{F}‘s 0-transitions refer to transitions on input 0. Similarly, 1-transitions refer to transitions on input 1.

So first we define all 1-transitions as given in the following example automaton; such transitions will obviously satisfy all data in D_{L}.

[Figure 1: example automaton A_{F}, showing the 1-transitions]

Now, for A_{F} to satisfy all of the data in D_{C}, we notice by Theorems 3 and 4 above that mapping each clause state to the variable state that satisfies it seems like a natural thing to do. We may determine this mapping of clauses to the variables that satisfy them from \theta_{F}, our truth assignment that satisfies F.

Such mappings may be added to A_{F} by noticing the following: as soon as we encounter a 0 in the input, we must transition to a clause state. So we may define a 0-transition for every state in A_{F} that maps the clause state {s_C}_{i} (which, recall, corresponds to the string 1^{i}0) to the variable state {s_z}_j that satisfies it. When adding 0-transitions we must be careful: if clause C_{i} is satisfied by variable z_{j}, then in order to agree with the data in D_{C} we really should map C_{i} to variable z_{l - j}, due to the definition of I_{F}.

Now we must assign an output to each of these 0-transitions. To do so we iterate through every clause state {s_C}_{i} by sequentially considering s_{1^{i}0}. At each clause state we assign the output of the 0-transition starting at this state to be \pi_{F}(i). Notice that if two clause states {s_C}_{i} and {s_C}_{k} are identical, then they must both be satisfied by the same variable, and therefore \pi_{F}(i) = \pi_{F}(k).

Below is an example of adding such a 0-transition and assigning its output:

[Figure: adding a 0-transition (blue) and assigning its output (red)]

The blue edge is the added 0-transition that maps the clause state of C_{2}, which corresponds to the string 110, to the variable state of z_{l - 2}. Semantically this means that the variable z_{l - (l-2)} = z_{2} satisfies this clause.
The red edge shows us adding the output value of a 0-transition: from the above blue edge we know {s_z}_{l-2} corresponds to C_{2}‘s clause state, so the 0-transition from this clause state will have output \pi_{F}(2).

Theorem 6: Our constructed automaton A_{F} agrees with all of D_{L}.

Proof. This is quite obvious from our example of A_{F} in Figure 1. We have defined all the 1-transitions, and notice that the final output character of f_{out}(1^{j}) is 0 for j < l while the final output character of f_{out}(1^{l}) is 1; this exactly matches our data D_{L}. \square

Theorem 7: Our constructed automaton A_{F} agrees with all of D_{C}.
Proof: Here we consider the final character output by f_{out} for all encodings of a fixed clause C_{i}; these encodings are precisely the strings 1^{i}01^{j} for 1 \leq j \leq l. Now, with the addition of the 0-transitions, we transition into s_{1^{i}0}, and from this state we only worry about 1-transitions. Recall that in our definition of D_{C}, all data are of the form (input \ string, 0). In A_{F} there is a single 1-transition that outputs the non-zero value 1. To agree with D_{C} it is sufficient to make sure that this non-zero output occurs for an input string not in D_{C}.

From our assignment of 0-transitions defined above, this single 1-transition that outputs 1 will occur for the variable z_{l-j} that satisfies C_{i}, so z_{l-j} must be in C_{i}. So A_{F} will give the datum (1^{i}01^{l-j},1). Now notice that the corresponding entry in D_{C}, as defined by I_{F}, would be (1^{i}01^{l-j},"no-value"), but all such data were removed from D_{C}. So the sufficiency condition above is satisfied. \square

Theorem 8: Our constructed automaton A_{F} agrees with all of D_{P}.

Proof: The manner in which we defined our 0-transitions explicitly involved a \pi_{F}(i) term, so for any clause state {s_C}_{i} = s_{1^{i}0}, the 0-transition as defined above will output \pi_{F}(i), which will obviously agree with D_{P}. \square

By Theorems 6, 7, and 8, we conclude that A_{F} agrees with the data D. \square

(Back to me)

Difficulty: 9.  This is basically the reduction in the Gold paper, though Dan reorganized it a bunch to make it fit our format and be more understandable.  But the formatting of the various variables, the fact that you need to define the I_F function on variable k to talk about z_{l-k} instead of z_{k}, and the three different kinds of test data all make this super hard to understand.  In some ways, I wonder if this should be a 10, given the work we did on it.  But I’m saving the 10’s for things that are truly impenetrable, and we did manage to figure this one out.