**The problem: **Prime Attribute Name. This is problem SR28 in the appendix.

**The description: **Given a set A of attribute names, and a collection F of functional dependencies like in the last problem, and a specific attribute x in A, is x in some key K for the relational system <A,F>? (Recall that keys need to be of minimal size).

**Example: **Using the example from last week- the list of attributes was {Name, ID, Course, Professor, Time}. The functional dependencies were:

- ({Name},{ID}) (If you know a student’s name, you can figure out their ID)
- ({ID},{Name}) (If you know a student’s ID, you can figure out their name)
- ({Name, Course}, {Professor, Time}) (If you know a student’s name and a course they are in, you can figure out the professor teaching it and what time the course is)
- ({Professor, Time}, {Course}) (If you know the professor of the course and the time it is being taught, you can figure out the course being taught. Note that you can’t figure out the student because more than one student will be in the course)

Recall that we said the pair of attributes {Name, Course} was a key for this system. (So, if x was “Name” or “Course”, a solver for this problem should answer “yes”.) But since Name and ID are related, ID is also a prime attribute. I think Professor and Time are *not* prime, because any subset of A that includes them and has the required characteristic is not of minimal size (if you take Professor, you’d have to add two other things- for example Course and ID).
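
Since the definitions are small, a brute-force sketch (my own, not from the paper) can check the example: compute the closure of a candidate set under F, take the keys of minimum cardinality (the way this post reads “minimal size”), and collect the attributes that appear in any of them:

```python
from itertools import combinations

def closure(attrs, fds):
    """Repeatedly fire functional dependencies until nothing new is added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def smallest_keys(A, fds):
    """All keys of minimum cardinality for the relational system <A, fds>."""
    for r in range(len(A) + 1):
        found = [set(K) for K in combinations(sorted(A), r)
                 if closure(K, fds) == set(A)]
        if found:
            return found
    return []

A = {"Name", "ID", "Course", "Professor", "Time"}
F = [(frozenset({"Name"}), frozenset({"ID"})),
     (frozenset({"ID"}), frozenset({"Name"})),
     (frozenset({"Name", "Course"}), frozenset({"Professor", "Time"})),
     (frozenset({"Professor", "Time"}), frozenset({"Course"}))]

prime = set().union(*smallest_keys(A, F))   # attributes in some smallest key
```

This finds the two smallest keys {Name, Course} and {ID, Course}, so Name, ID, and Course come out prime while Professor and Time do not.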

**Reduction: **This reduction is in the same paper that has the Minimum Cardinality Key reduction, and in fact reduces from Minimum Cardinality Key. Their terminology differs from what I typically use, but I’ll try to do it their way to be close to the paper.

The MCK instance starts with a set of attributes A’, a set of relations D[0]’ (what they call F), and a key size m (what we call k). We need to build an instance of the PAN problem: a set of attributes A, a new set of functional dependencies D[0], and a specific attribute we want to check: b (what our definition calls x).

First, they define a set A”, which is “a set whose cardinality is the smaller of |A’| and m” (I guess consisting of all new elements). They define A to be A”xA’ unioned with A’ unioned with a new attribute b. They define D[0] with the following characteristics:

- For all subsets E and F of A’, if E relates to F in D[0]’, E unioned with b relates to F in D[0]
- A’ unioned with b relates to A”xA’
- for each ordered pair (i,e) in A”xA’, b unioned with (i,e) relates to e.
- for each element i in A” and all distinct e,f in A’, the set {(i,e), (i,f)} relates to b.
- Each element e in A’ relates to b.

The idea now is that if D[0] has a minimal key that contains b, you need a bunch (n) of the (i,e) pairs in the key as well to distinguish which ones relate to b. Then the A’ part of each ordered pair gives you n elements in A’ that form a key for A’.

If A’ has a key of cardinality n <= m, then we know that A” has at least n elements, and adding b to the pairs of items in A’ and A” gets you a key for A.

Also, b has to be in the key for A, since removing it gets you a set that is not D[0]-expansible.

**Difficulty: **9. This is really a lot to digest. I’m not entirely sure I get what’s happening myself.

**The problem: **Minimum Cardinality Key. This is problem SR26 in the appendix.

**The description: **Given a set A of attributes, a collection F of ordered pairs of subsets of A (“functional dependencies”), and a positive integer M, can we find a set K of M or fewer attributes that forms a key for the relational system <A,F>?

G&J defines “form a key” as the ordered pair (K,A) belongs to the closure F* of F, with the following properties:

- F ⊆ F*
- B⊆C⊆A implies (C,B) ∈ F* (the “projective” closure)
- (B,C) and (C,D) ∈ F* implies (B,D) ∈ F* (the “transitive” closure)
- (B,C) and (B,D) ∈ F* implies (B,C∪D) ∈ F* (the “additive” closure)

**Example: **The paper by Lucchesi and Osborn that has the reduction has a good example:

Suppose we have a database table for student records. The attribute set A could be {Name, ID, Course, Professor, Time}. Some functional dependencies, written as ordered pairs, could be:

- ({Name},{ID}) (If you know a student’s name, you can figure out their ID)
- ({ID},{Name}) (If you know a student’s ID, you can figure out their name)
- ({Name, Course}, {Professor, Time}) (If you know a student’s name and a course they are in, you can figure out the professor teaching it and what time the course is)
- ({Professor, Time), {Course}) (If you know the professor of the course and the time it is being taught, you can figure out the course being taught. Note that you can’t figure out the student because more than one student will be in the course)

If these relations are F, then we can derive all of the attributes just from knowing the name of the student and the course they are taking. In other words, F* contains the pair ({Name, Course}, {Name, ID, Course, Professor, Time}).

**Reduction: **The Lucchesi and Osborn paper uses different terminology than G&J’s. They define a relation D in a series of inductive steps, where D[0] is F, and each D[i] is created from D[i-1] by applying one step of one of the closure rules to the elements of D[i-1]. The final closure (what G&J call F*) they call D.

They also define an expansion of a relation D: given a subset B of our attribute set A, if a relation (L,R) is in D, where L is contained in B but R isn’t, the set B∪R is a D-expansion of B.

They first show that if B is a subset of A, and B’ is a D[0]-expansion of B, then (B,B’) and (B’,B) are in D. They then show that a subset B of A is D[i]-expansible (and thus D-expansible) if and only if it is D[0]-expansible.

Then, they do the reduction from VC. A will have one attribute for each vertex in G. D[0] (G&J’s F) will, for each vertex v, relate the set of v’s neighbors to v. They show that a subset K of A is a key if and only if it’s a vertex cover of G. They do this by induction: if K=A, it’s obviously a cover. If K is smaller than A, then K is a key if and only if there is a D[0]-expansion of K that is also a key. K has a D[0]-expansion if and only if there is some vertex v not in K such that all of the vertices adjacent to v are in K. So K is a key if and only if it is also a cover of G.
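
To see the construction concretely, here’s a quick sketch (the graph, names, and brute-force check are my own) that builds the functional dependencies from a small graph and verifies that “key” and “vertex cover” coincide on every subset:

```python
from itertools import combinations

def closure(attrs, fds):
    """Repeatedly fire functional dependencies until nothing new is added."""
    result, changed = set(attrs), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def mck_from_vc(V, E):
    """One attribute per vertex; the set of v's neighbors determines v."""
    return [(frozenset(u for a, b in E for u in (a, b)
                       if v in (a, b) and u != v), frozenset({v}))
            for v in V]

# A triangle {a,b,c} plus a pendant edge (c,d).
V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
fds = mck_from_vc(V, E)

def is_key(K):
    return closure(K, fds) == set(V)

def is_cover(K):
    return all(a in K or b in K for a, b in E)

# Every subset is a key exactly when it is a vertex cover.
agree = all(is_key(set(S)) == is_cover(set(S))
            for r in range(len(V) + 1) for S in combinations(V, r))
```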

**Difficulty: **8. There must be an easier way to explain this problem.

**The problem: **Rectilinear Picture Compression. This is problem SR25 in the appendix.

**Description: **Given an NxN matrix M, consisting of 0’s and 1’s, and a positive integer K, can we find K or fewer rectangles in M that cover precisely all of the 1’s in M?

**Example: **Suppose M was:

1 | 0 | 0 | 1 | 1

1 | 0 | 1 | 1 | 1

0 | 0 | 1 | 1 | 0

0 | 0 | 1 | 1 | 1

1 | 0 | 0 | 0 | 1

Here is what I think the best way to draw rectangles is:

Notice that 1×1 rectangles are allowed, and by the way I read the problem definition you can have a cell that is in multiple rectangles. (What you can’t have is a 0 inside a rectangle, or a 1 outside of a rectangle).
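
That definition is easy to check mechanically. Here’s a small verifier (my own sketch; rectangles are 0-indexed inclusive corner coordinates) run against the matrix above with one possible set of five rectangles:

```python
def covers_exactly(M, rects):
    """A valid cover touches every 1, touches no 0; overlaps are allowed."""
    n = len(M)
    covered = [[False] * n for _ in range(n)]
    for top, left, bottom, right in rects:
        for r in range(top, bottom + 1):
            for c in range(left, right + 1):
                if M[r][c] == 0:
                    return False            # a 0 inside a rectangle
                covered[r][c] = True
    # every 1 must be covered (and we already know no 0 is)
    return all(covered[r][c] == (M[r][c] == 1)
               for r in range(n) for c in range(n))

M = [[1, 0, 0, 1, 1],
     [1, 0, 1, 1, 1],
     [0, 0, 1, 1, 0],
     [0, 0, 1, 1, 1],
     [1, 0, 0, 0, 1]]

good = [(0, 0, 1, 0),   # two 1s at the top of column 0
        (4, 0, 4, 0),   # the lone 1 at the bottom of column 0
        (1, 2, 3, 3),   # the 3x2 block of 1s in the middle
        (0, 3, 1, 4),   # the 2x2 block at the top right
        (3, 4, 4, 4)]   # two 1s at the bottom of column 4
# Note (0,3,1,4) and (1,2,3,3) overlap on M[1][3], which is allowed.
```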

**Reduction: **The good news about Johnson’s NP-Completeness column is that it led me to a technical report by Conn and O’Rourke that explains most of the details of Masek’s reduction, which is a good thing, because it’s very complicated. He basically defines shapes of 1’s as “wires” running through the matrix. Here are the examples in the Conn and O’Rourke paper:

These wires will be used to represent boolean expressions (truth values are expressed as rectangles of 1’s: true is a 2×1 rectangle, false is a 1×2). Masek then proves several lemmas about how many rectangles it takes to cover each figure, and gives an algorithm to show how to build a figure with the wire components for any boolean expression. From the “how many rectangles does it take to cover each figure” lemmas, we know how many rectangles it will take to cover the figure if the expression is satisfiable, which becomes the K for the problem.

**Difficulty: 9**, maybe 10. The Conn and O’Rourke paper doesn’t actually show the proofs of any of these lemmas, but I believe I could follow the reduction if I had access to the lemmas and the construction algorithm.

**The problem: **Regular Expression Substitution. This is problem SR24 in the appendix.

**The description: **Given two finite alphabets X and Y (possibly with different numbers of symbols, possibly with some overlap between the symbols), a regular expression R over X∪Y, one regular expression R_{i} over Y for each symbol x_{i} in X, and a string w in Y*. Can we find a string z in the language of R, and for each occurrence of each symbol x_{i} in z a string w_{i} from the language of R_{i}, such that replacing each occurrence of x_{i} in z with its chosen string yields w?

**Example:** Here’s a pretty simple example. Obviously you can use much more complicated regular expressions to make much more complicated things.

X = {a,b,c}. Y = {1,2,3,4}. w = 1231122334

R = X*Y (0 or more symbols from X, ending with a symbol from Y)

R_{1} = 1 + 11

R_{2} = 2 + 22

R_{3} = 3 + 33

z can be abcabc4

Each occurrence of a will be replaced by a string from R_{1} (1 the first time, 11 the second time), each occurrence of b will be replaced by a string from R_{2}, and each occurrence of c will be replaced by a string from R_{3}, getting us w back.
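
A quick way to check this example (my own sketch, using Python’s re module and brute-force enumeration of the choices) is to expand every combination of per-occurrence replacements of z and look for w:

```python
import re
from itertools import product

w = "1231122334"
R = re.compile(r"[abc]*[1234]")   # R = X*Y from the example
langs = {"a": ["1", "11"], "b": ["2", "22"], "c": ["3", "33"]}

z = "abcabc4"
assert R.fullmatch(z)             # z is in the language of R

# Each occurrence of a symbol of X may pick any string from its R_i
# language; symbols of Y stand for themselves.
choices = [langs.get(ch, [ch]) for ch in z]
found = any("".join(picks) == w for picks in product(*choices))
```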

**Reduction: **G&J say to use X3C. Here’s what I think I want to do:

We’re given an instance of X3C, so a set S (normally it’s X, but I don’t want to confuse it with the X in this problem) with 3q elements, and a collection C of 3-element subsets of S. Our alphabet X will have one symbol for each element of each set in C (so an element will be something like c_{i,j} for element i of set j). Our alphabet Y will have one symbol for each element in S.

w will be the string s_{1}s_{2}..s_{3q} (all the elements of S in some arbitrary order).

R will be the regular expression describing “strings of length 3q over X that, for each set in C, either do not use that set at all, or use all three of its elements exactly once”. This is the part I’m not confident in. I’m sure this describes a regular language. What I’m *not* sure of is whether the expression can be written in a form that is polynomial in the size of S and C. I don’t know enough about how to minimize regular expressions to be able to answer that.

Anyway, from there, each expression R_{i} generates the single symbol in Y that corresponds to that element of its set (so the expression for the X symbol c_{1,5} would be whatever the first element of set 5 in C is).

The idea is that you generate w by finding the string z that has the correct elements of the correct sets. The rules for how you can make z give you constraints that your solution is a legal X3C solution.

**Difficulty: **9. Maybe it should be a 10, since I’m not really confident this is correct. The real trouble I had coming up with a reduction was that strings in regular expressions have a fixed order, but the X3C problem just wants sets, which have no orderings. So you need a way for the single string w to stand for any arrangement of the symbols in S. And while “uses each symbol in Y exactly once” describes a regular language, I’m not sure that expression can be written down in polynomial time relative to the size of S and C either. The only way I can think to do it offhand is by choosing between all (3q)! permutations of the symbols.

**The problem: **Internal Macro Data Compression. This is problem SR23 in the appendix.

**The description: **Given an alphabet Σ, a string s in Σ*, a “pointer cost” h and a bound B, can we find a string C over the alphabet of Σ augmented with at most |s| “pointer characters” p_{i} such that:

- |C|+ (h-1)* (# of pointer characters in C) is ≤ B, and
- We can replace pointer characters with substrings of C to regain s?

**Example: **This is similar to the last macro compression problem, but instead of having 2 strings C and D, C serves both purposes. In our example from last time, we had s = “ABCDEFABCGABCDEF”, and came up with C = “pqpGpq” and D = “ABCDEF”. The difference here is that there is no D string; we have to do it all in C. We can do that by letting C = “ABCDEFpGpq”. Now if we let p = ”ABC” (or, “the substring of C of length 3 starting at position 1”) and q = “DEF” (or, “the substring of C of length 3 starting at position 4”), we get a string of cost 10+3(h-1) that gives us back s if we replace all pointers with the strings they represent.
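
As a sanity check, here’s a small sketch (my own; it treats each pointer as a 1-indexed (start, length) reference into C itself) that expands C = “ABCDEFpGpq” back into s:

```python
def expand(C, refs):
    """Replace each pointer character with the substring of C it names."""
    out = []
    for ch in C:
        if ch in refs:
            start, length = refs[ch]              # 1-indexed into C
            out.append(C[start - 1:start - 1 + length])
        else:
            out.append(ch)
    return "".join(out)

C = "ABCDEFpGpq"
s = expand(C, {"p": (1, 3), "q": (4, 3)})         # p -> "ABC", q -> "DEF"
```

Here s comes out as “ABCDEFABCGABCDEF”, and C has 10 characters of which 3 are pointers, giving cost 10 + 3(h-1).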

**Reduction: **The paper by Storer that we’ve been using for the last few problems also has this reduction. I think this is his “CPM” problem. He uses the “K-Vertex Cover” problem, in the case where K=1. Recall in that case we are looking for a set of edges E’ such that each edge in E shares a vertex with an edge in E’. We start with a graph, which is an instance of this problem. The alphabet we build will have:

- A special symbol $
- One symbol for each vertex and edge in G

For each edge e_{i} = (v_{j}, v_{k}), define the substring E_{i } = $v_{j}$v_{k}$

Our s = the concatenation of e_{i}E_{i} for all edges i (so the symbol for the edge followed by the substring of the edge). B = J + |s| – |E|.

The idea (and I’ll admit I have a hard time following this paper) is that if G has a 1-VC, then we have J edges that form a set E’ such that each edge in E is adjacent to something in E’. This means that we can overlap those edges and make pointers out of the E_{i} strings to save enough characters.

**Difficulty: **9. I’m sure this can be explained better, but I really have a hard time following it.

After this week, I’m going to be traveling for three weeks. So the next post will be the week of August 14.

**The problem: **K-Vertex Cover. This problem is not in the appendix.

**The description: **Given a graph G=(V,E), and integers K and J. Does G contain a set of J or fewer paths, where each path contains K or fewer edges, and each edge in E is incident on at least one of these paths?

**Example: **Here is a graph:

A K-Vertex Cover where K = 0 is just a regular Vertex Cover:

With K=1, this is “Find a subset E’ of E where each edge in E shares an endpoint with something in E'”:

With K=2, we are finding paths of 2 (or less) edges where each edge in E is adjacent to something in a path (I colored each path a different color to make them easier to tell apart):

Notice that K is set as a parameter of the problem.

**Reduction: **The report by Storer that had the previous reduction (and which will have the next one) introduces this problem. The reduction is from Vertex Cover. I guess we could just take the initial VC instance, set K=0, and be done, but I like this construction because it shows how to make 1-VC, 2-VC, and so on NP-Complete. (In other words, if K is fixed outside of the problem, you need this reduction.) Interestingly, a |V|-vertex cover is trivial (it’s asking if the graph has J or fewer connected components).

Given our VC instance G=(V,E), with bound J (instead of K, just because K here means something different and the J for our problem will be the same J for the VC instance), build a new graph G’=(V’, E’):

V’ starts with V, and adds two new vertices for each pair (j,k) with 1 ≤ j ≤ J and 1 ≤ k ≤ K (so 2*J*K new vertices), labeled x_{1,1} through x_{J,K} and y_{1,1} through y_{J,K}. Notice that since J and K should be ≤ |V| (or the answer is trivial), this only adds a polynomial number of vertices.

E’ starts with E, and adds an edge from each vertex in V to each vertex x_{j,1}, an edge from each x_{j,k} to its corresponding y_{j,k}, and an edge from each x_{j,k} to the next x_{j,k+1}.

The idea is that since each x vertex needs to connect to a y vertex, we will need to include a path that goes through each x vertex in our K-cover. Since there are J paths of length K, and each vertex in V is connected to all of the x_{j,1} vertices, what happens is that each of the J vertices chosen in the VC of G “chooses” a different path of x vertices. So we can cover all of the edges in G’ with J paths if and only if we can cover all of the edges in G with J vertices.
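
A sketch of the construction (my own encoding; the tuple labels for the new vertices are just a convenient naming scheme) makes the sizes easy to check:

```python
def kvc_from_vc(V, E, J, K):
    """Build G' = (V', E') from G = (V, E) as described above (K >= 1)."""
    Vp = set(V)
    Ep = {frozenset(e) for e in E}
    for j in range(1, J + 1):
        for k in range(1, K + 1):
            Vp |= {("x", j, k), ("y", j, k)}
            Ep.add(frozenset({("x", j, k), ("y", j, k)}))        # x to its y
            if k < K:
                Ep.add(frozenset({("x", j, k), ("x", j, k + 1)}))  # x chain
        for v in V:
            Ep.add(frozenset({v, ("x", j, 1)}))   # every old vertex to x_{j,1}
    return Vp, Ep

V = ["a", "b", "c"]
E = [("a", "b"), ("b", "c")]
Vp, Ep = kvc_from_vc(V, E, J=2, K=3)
```

For this toy instance we get |V’| = |V| + 2*J*K = 15 vertices and |E’| = |E| + J*K + J*(K-1) + J*|V| = 18 edges.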

**Difficulty: **5. This is a pretty cool reduction. I like the idea of adding K *J copies of the vertices, because while you can do that for a graph problem (since K and J need to be < |V| realistically), you can’t do that for other kinds of problems. For example, trying to create an element for each value from 1 to K in the Sum of Subsets problem won’t give you a polynomial reduction.

**The problem: **External Macro Data Compression. This is problem SR22 in the appendix.

**The description: **Given a string s over some alphabet Σ, a “pointer cost” h and a bound B, can we find two strings D and C over the alphabet of Σ augmented with some number (< |s|) of “pointer characters” p_{i} such that:

- |D| + |C| + (h-1)* (# of p characters in D and C) ≤ B
- We can generate s by replacing pointer characters in C with their “meaning” in D.

**Example: ** The concept they’re trying to get at isn’t that hard, it’s just hard to explain using mathematical language. Suppose s = “ABCDEFABCGABCDEF”

Then we can define D to be “ABCDEF” and C to be “pqpGpq”. The total size of this is 6 for D, plus 6 for C. There are 5 pointer characters, so our total cost is 6+6+5(h-1).

The idea is that the characters p and q are “pointers” that refer to substrings in D (p refers to “ABC” and q refers to “DEF”). By replacing those pointers with what they “mean” in C, we can get back s.

A tricky part of this is that you are allowed to have substrings overlap. So if s was “ABCDBCD” we could define D to be “ABCD” and C to be “pq” with p meaning “ABCD” and q meaning “BCD”. Now our total cost is 4 for D, 2 for C, and 2 pointers, so 4+2+2(h-1).
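
The decompression step is mechanical. Here’s a short sketch (my own, with pointers as 1-indexed (start, length) references into D) covering both examples, including the overlapping one:

```python
def expand(C, D, refs):
    """Replace each pointer character in C with the substring of D it names."""
    out = []
    for ch in C:
        if ch in refs:
            start, length = refs[ch]              # 1-indexed into D
            out.append(D[start - 1:start - 1 + length])
        else:
            out.append(ch)
    return "".join(out)

s1 = expand("pqpGpq", "ABCDEF", {"p": (1, 3), "q": (4, 3)})
s2 = expand("pq", "ABCD", {"p": (1, 4), "q": (2, 3)})   # overlapping targets
```

s1 comes out as “ABCDEFABCGABCDEF” and s2 as “ABCDBCD”, matching the two examples.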

**Reduction**: The reduction (and the one for SR23) comes from a technical report by Storer, which is pretty dense. I think we’re looking at “Theorem 2” in the paper, which reduces from VC≤3 (Vertex Cover restricted to graphs of degree at most 3).

The alphabet that will be built from the VC instance has a lot of parts:

- A special symbol $
- 3 symbols v_{i}, a_{i}, and b_{i} for each vertex v_{i} in the graph
- 4 more symbols f_{i,1,1} through f_{i,2,2} for each vertex v_{i} in the graph
- one symbol d_{j} for each edge e_{j} in the graph
- 2 symbols c_{1} and c_{2} (there is actually one c symbol for each value of h. We’ll assume h=2 here)
- 3 symbols g_{1,1}, g_{1,2}, and g_{2,1} (this is also based on h. So really you’d go from g_{1,1} through g_{h-1,2} and add g_{h,1})

The string s will also be built from parts (a lot of these are based on h as well; again, I’m fixing h=2 to keep it simpler):

- V_{i,1} = a_{i}$v_{i}
- V_{i,2} = v_{i}$b_{i}
- For each edge e_{i} = (v_{j}, v_{k}), E_{i} = $v_{j}$v_{k}$
- We also have Z_{1} = (c_{1})^{3} (so 3 copies of c_{1}) and Z_{2} = (c_{2})^{3}

s is:

- E_{i}d_{i} concatenated for each edge, followed by
- V_{i,j}f_{i,j,k} concatenated over each vertex and each possible f symbol, followed by
- Z_{1}g_{1,1}Z_{1}g_{1,2}, followed by
- Z_{2}g_{2,1}Z_{2}

K’ = |s| + K – (7/2)|V|.

The basic idea from here is that if G has a VC, we can compress s by making pointers for (among other things) the combination of V_{i,1}f_{i,1,2}V_{i,2} and the combination of V_{i,2}f_{i,2,2}V_{i+1,1}, where vertex v_{i} is in the VC. This lets us use the pointers both for the part of the string with the V’s and f’s, and also for the $v_{i}$ strings in the E components, saving space.

In the other direction, if we have a compressed string of size K’, he shows that you must have compressed a string like the above, and the overlaps show you the vertices in the cover.

**Difficulty: **8. I think to really get what’s happening, you need to actually build an example on a graph. But I think the idea of building the strings so that that overlap is the thing you’re looking for is something that can be grasped eventually.

**The problem: **Grouping By Swapping. This is problem SR21 in the appendix.

**The description: **Given a string x over some finite alphabet, and an integer K, can we convert x into a string that groups all identical characters together in adjacent positions by using K or fewer swaps of adjacent symbols?

**Example: **Suppose x was abcabccba

Our goal is to create a string where all of the a’s are consecutive, all of the b’s are consecutive, and all of the c’s are consecutive. So for example, we could try to gather the b’s together in the middle.

- Start with abcabccba
- 2 swaps get us acabbccba
- 2 swaps get us acabbbcca
- Now we need to bring the c’s together. 4 swaps get us aabbbccca
- Now we need to bring the a’s together. Swapping from the right we can get aaabbbccc in 6 more swaps. This is a total of 14 swaps

A better method would be to try to bring the c’s together first:

- Start with abcabccba
- 2 swaps get us ababcccba
- 3 swaps get us to ababbccca
- 1 swap gets us to aabbbccca
- 6 swaps get us to aaabbbccc. This is a total of 12 swaps.

Note that there is no rule that the a’s have to come before the b’s or anything. If we could get a final string of, say, bbbaaaccc in fewer total swaps, that would be fine.
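
To check that 12 is actually optimal for the example, here’s a brute-force sketch (my own): for each ordering of the letter blocks, send the k-th copy of each letter to the k-th slot of its block (a stable assignment minimizes swaps) and count inversions, which equals the number of adjacent swaps needed:

```python
from itertools import permutations

def inversions(seq):
    """Number of out-of-order pairs; equals the adjacent swaps to sort."""
    return sum(seq[i] > seq[j]
               for i in range(len(seq)) for j in range(i + 1, len(seq)))

def min_grouping_swaps(x):
    """Fewest adjacent swaps to group all identical characters together."""
    letters = sorted(set(x))
    best = None
    for order in permutations(letters):
        # starting position of each letter's block under this ordering
        offset, pos = {}, 0
        for ch in order:
            offset[ch] = pos
            pos += x.count(ch)
        # k-th occurrence of a letter goes to the k-th slot of its block
        seen = dict.fromkeys(letters, 0)
        targets = []
        for ch in x:
            targets.append(offset[ch] + seen[ch])
            seen[ch] += 1
        cand = inversions(targets)
        best = cand if best is None else min(best, cand)
    return best
```

For “abcabccba” this reports 12, so the second sequence above is in fact best.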

**Reduction**: I should start by noting how similar this problem is to String-To-String Correction, especially since that problem is also NP-Complete in the case where the only operation allowed is swapping (instead of swapping or deleting). I was hoping to find a good simple reduction between these two problems. But I couldn’t figure out a way to relate the fact that String-To-String correction gives you a destination string to use, but Grouping By Swapping allows you to come up with any string, as long as the characters are grouped together. Doing all possible permutations of the alphabet as possible destination strings in a kind of Turing Reduction is too many possibilities, and I couldn’t think of a way to change x to force the best solution to this problem to relate to the needed y in the String-To-String Correction problem.

So, instead, we’ll use the paper by Paterson and Razborov that we used last week for the Non-Minimal Feedback Arc Set reduction; they reduce from that problem here. They actually show a stronger result: given a string x, can we find a permutation of the alphabet that has fewer inversions (out-of-order pairs of characters) on x than the current one?

(Since an inversion is “fixed” by swapping two adjacent characters to put them in order, we can reduce from this problem to our Grouping by Swapping problem by setting K to be the number of inversions in x.)

We’re given an instance of Non-Minimal Feedback Arc Set- A directed graph G=(V,A) and a set S of edges that forms a feedback arc set in G. Give each vertex a number so that if we have an edge (v_{i}, v_{j}) in A-S, then i < j .

The string that will be built is based on a palindrome. Palindromes have the property that no matter what permutation of the alphabet you pick, there are always the same number of inversions. (The way this relates to the original Grouping by Swapping problem is that *all* permutations of the characters in the final grouped string will have the same number of swaps.) Note that if we swap 2 characters in a palindrome in the “wrong way” (away from where they will eventually end up) we will add 2 to the number of inversions.
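
The palindrome property is easy to demo (my own sketch, on a made-up palindrome): count the inversions of the same string under every ordering of the alphabet:

```python
from itertools import permutations

def inversions(w, order):
    """Number of out-of-order pairs in w under the given alphabet ordering."""
    rank = {ch: i for i, ch in enumerate(order)}
    return sum(rank[w[i]] > rank[w[j]]
               for i in range(len(w)) for j in range(i + 1, len(w)))

# A palindrome has the same inversion count under every alphabet ordering...
counts_pal = {inversions("abccba", o) for o in permutations("abc")}
# ...while a non-palindrome generally does not.
counts_not = {inversions("abcc", o) for o in permutations("abc")}
```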

Suppose |A| was p, and edge a_{j} was (u,v). Define e_{j} to be “uv” and E_{j} to be “vu”. The string w = e_{1}e_{2}..e_{p}e_{p}..e_{2}e_{1} is not quite a palindrome because we’re using “uv” in both the first half and the second half. The palindrome we want is: e_{1}e_{2}..e_{p}E_{p}..E_{2}E_{1}.

That palindrome has the same number of swaps to get to any legal grouping of characters. Call that number of swaps q. Given a permutation π of the alphabet, we can use the ordering of the vertices to find out how many edges in the graph are out of order in this permutation. Call this number back(π).

So the total number of inversions of w given a permutation π is q-|A|+2*back(π). (The inversions of the palindrome, minus all of the edges in A that were e instead of E, plus the penalty of 2 for each edge that is “wrong” in the permutation we’re using.)

So, the original graph has a feedback arc set smaller than S if and only if we can find a permutation π where back(π) < |S|. This happens if and only if we can find a permutation with fewer inversions than the identity permutation (the one that we got from the ordering of the vertices based on S), which happens if and only if w is a solution to the modified Grouping by Swapping problem.

**Difficulty: **9. I’ll be honest, I don’t really buy the end of the reduction. I don’t really see how they’re using the feedback set S anywhere except for the initial ordering, so I’m not following those last couple of “if and only if” steps. Maybe I’m missing something.

I do like the idea that a palindrome has the same number of inversions, no matter what your final order is. I wish I could think of a way to use that in a reduction from String-To-String Correction that was easier.

**The problem: **Non-Minimal Feedback Arc Set. This problem is not in the appendix, but is related to the regular Feedback Arc Set problem (GT8).

**The description: **Given a directed graph G, and a subset S of the edges in G such that each cycle in G contains at least one edge in S (so S is a feedback arc set), can we find another feedback arc set S’ with fewer edges than S?

**Example: **The difference between this problem and the “regular” Feedback Arc Set problem is that in this problem we’re given a set and asking “can we do better”, and in the regular Feedback Arc set problem, we’re given a K and asking “can we do it in K or less”? Here is an example graph we used for the Feedback Arc Set reduction:

Suppose we had the set S of {(f,d), (d,e), (e,c), (b,c)}. This is a feedback arc set, I’ve drawn it in blue:

In this case, we can build an S’ that is smaller: {(c,a), (e,a), (d,e)}:

Notice that S’ does *not* have to be a subset of S. It just needs to have fewer edges.

**Reduction: **The reduction for this problem and next week’s is from a paper by Paterson and Razborov from 1991. (As an aside, while I know that 1991 is over 25 years ago, I spend so much time with papers from the 1970’s working on these problems, that 1991 feels “modern”).

It feels like there should be an easy reduction from regular Feedback Arc set to this problem, but I couldn’t think of one. To do it, you’d need to take the K from the feedback arc set and create an *actual* feedback arc set (to make your S). Even for something like a Turing reduction, the requirement that you actually need to create a feedback arc set (which itself is hard to do) makes it hard for me to see an easy reduction in that direction.

So instead, the paper uses 3SAT. Given a formula, for each variable u and clause i, add 4 vertices: A_{u,i}, B_{u,i}, a_{u,i}, and b_{u,i}. Within each quartet of vertices, add edges (a_{u,i}, b_{u,i}), (A_{u,i}, B_{u,i}), (b_{u,i}, A_{u,i}), and (B_{u,i}, a_{u,i}):

Notice that this forms a cycle (and remember, there is one of these cycles for each pair of variables and clauses- even if the variable doesn’t appear in that clause. Though to be honest, I’m not sure you need a copy if the variable doesn’t appear in the clause). The edges (a,b) or (A,B) will be chosen as the feedback edges, with the lowercase letters meaning setting the literal positively in the clause, and the capital letters meaning setting the literal negatively.

We still need to add edges that tie variables that occur in multiple clauses together. Each clause has 3 edges that make a cycle out of the 3 literals in the clause. The edges go from b->a, using the lowercase version of the letter if the literal is positive and the uppercase version if the literal is negative. The example in the paper is that if clause i = {x_{3}, ~x_{4}, x_{6}}, we’d add the three edges (b_{3,i}, A_{4,i}), (B_{4,i}, a_{6,i}), and (b_{6,i}, a_{3,i}). Here are the three variable cycles for clause i with the new edges added:
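
Here’s a small sketch of the gadget for that example clause (my own encoding; vertices are tuples of label, variable index, and clause index) that builds the three quartets plus the clause cycle:

```python
def quartet_edges(u, i):
    """The 4-cycle a -> b -> A -> B -> a for variable u in clause i."""
    a, b, A, B = ("a", u, i), ("b", u, i), ("A", u, i), ("B", u, i)
    return [(a, b), (A, B), (b, A), (B, a)]

# clause i = {x3, ~x4, x6}: b -> a edges, uppercase for negated literals
i = 0
clause_edges = [(("b", 3, i), ("A", 4, i)),
                (("B", 4, i), ("a", 6, i)),
                (("b", 6, i), ("a", 3, i))]

edges = [e for u in (3, 4, 6) for e in quartet_edges(u, i)] + clause_edges
```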

Notice that the way we set a variable to make a literal true (for example, making x_{3} positive by adding the edge (a_{3}, b_{3}) to the feedback arc set) creates a feedback arc set both for the 4-vertex cycles, but also for the cycle for the clause.

The authors say it is “straightforward” that 3SAT is NP-Complete even when you are given an assignment of the variables that satisfies all but one of the clauses. Mapping this assignment to the feedback edges in the graph gives us a feedback set for each cycle of 4 vertices, and for all clause cycles except for one. Thus, we need an edge to “satisfy” this last clause. Call this set of edges (the edges corresponding to the all-but-one-clause assignment, plus one edge in the last remaining cycle) S.

If we can find an S’ with *fewer* edges than S, it must have a way to set each variable to satisfy each clause, and thus the original formula is satisfiable. If we can’t, then the formula is unsatisfiable.

**Difficulty: **8. I don’t think the “satisfy all but one clause” thing is straightforward. I guess if you give that version of 3SAT to the students, you can sort of see what you need to do with the edges in the graph. But this way of looking at the problem took me a while to wrap my head around.

**The problem: **String-To-String Correction. This is problem SR20 in the appendix.

**The description: **Given two strings x and y over a finite alphabet ∑, and an integer K, can we start from x and derive y in K or fewer steps, where each step either deletes a symbol or interchanges two adjacent symbols?

**Example: **The puzzles I’m thinking of are the ones that say “Can you go from HELLO to OLE in 6 steps?”

- HELLO
- ELLO
- ELO
- EOL
- OEL
- OLE

The reason this is a hard problem is when you have repeated symbols, and need to “choose” which ones to delete. A slightly harder example is to go from ABABABABBB to BAAB. Now which A’s and B’s you choose to delete has an effect on the total number of moves.
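
Since these examples are tiny, a breadth-first search over strings (my own sketch, not from the paper) can confirm the minimum number of moves:

```python
from collections import deque

def min_moves(x, y):
    """Fewest delete-one-symbol or swap-adjacent-symbols steps from x to y."""
    seen = {x}
    frontier = deque([(x, 0)])
    while frontier:
        s, d = frontier.popleft()
        if s == y:
            return d
        nexts = [s[:i] + s[i + 1:] for i in range(len(s))]        # deletes
        nexts += [s[:i] + s[i + 1] + s[i] + s[i + 2:]
                  for i in range(len(s) - 1)]                     # swaps
        for t in nexts:
            if t not in seen:
                seen.add(t)
                frontier.append((t, d + 1))
    return None

moves = min_moves("HELLO", "OLE")
```

This reports 5 moves for HELLO to OLE, matching the sequence above (two deletions plus three swaps).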

The paper by Wagner that has the reduction makes the point that if you replace “Delete” with “Insert”, you get basically the same problem but go from y to x instead of from x to y. The paper gives other variants that have polynomial solutions (allowing insertions and deletions and changes of a character, whether or not you allow interchange).

**Reduction: **Wagner uses Set Covering. So we’re given a collection of sets c_{1}..c_{n} all subsets of some set W, and an integer K.

Define t = |W| and r = t^{2}. Pick 3 symbols Q,R, and S that are not in W. The string x will be:

Q^{r}Rc_{i}Q^{r}S^{r+1}

..concatenated together, once for each c_{i}. (So the copy that has c_{1} is first, then another copy with c_{2}, and so on.)

The string y will be n copies of the substring RQ^{r}, followed by the elements of W (written out as a string), followed by n copies of the substring S^{r+1}.

The K’ for this problem is (K+1)*r-1 + 2t(r-1) + n(n-1)(r+1)^{2}/2 + d.

d is r*n + (|c_{1}| + … + |c_{n}|) – |W|.
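
To make the shapes concrete, here’s a sketch (my own, on a toy instance with single-character set elements; “Q”, “R”, and “S” stand for the three new symbols) that just builds x and y as described:

```python
def build_strings(W, Cs):
    """Build Wagner's x and y from a set-cover instance, per my reading
    of the construction above."""
    t = len(W)
    r = t * t
    n = len(Cs)
    # x: one block Q^r R c_i Q^r S^(r+1) per set c_i
    x = "".join("Q" * r + "R" + "".join(sorted(c)) + "Q" * r + "S" * (r + 1)
                for c in Cs)
    # y: n copies of R Q^r, then the elements of W, then n copies of S^(r+1)
    y = ("R" + "Q" * r) * n + "".join(sorted(W)) + ("S" * (r + 1)) * n
    return x, y

x, y = build_strings({"a", "b", "c"}, [{"a", "b"}, {"b", "c"}])
```

With t=3 (so r=9) and n=2 sets of size 2, each block of x has length r+1+|c_i|+r+(r+1) = 31, so |x| = 62 and |y| = 2*(r+1) + t + 2*(r+1) = 43.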

From here, Wagner explains that the way the correction problem works is by deciding which copies of the R, Q^{r}, and S^{r+1} substrings in x match to the corresponding ones in y. By counting all of the operations that “have” to be done (by deleting or moving characters), he shows that a correction of weight K’ has to cross over certain c_{i} sets to make a covering, and that there have to be K or fewer of those sets.

**Difficulty: 8**. The construction is pretty crazy. You can eventually see where the components come from, but I can’t imagine how he came up with these things (especially the x and y strings) in the first place.