Partition

First, an administrative note.  I wanted to call this site “Annotated NP-Complete Problems”, because the idea is that I’m going through the Garey&Johnson book and adding notes to each problem talking about how to do the reduction and how applicable it is for student assignments.  But that name is sort of taken already, and I don’t want to step on any toes or cause any confusion.  So I figured that I’d change the title now, before anyone finds out about the site.

And as I’ve been writing, these notes feel more like “discussions” than “short annotations” anyway, so I think this is a better title.

This is the last of the “core six” problems in the G&J book, as defined in Chapter 3.  There are several other problems presented in that chapter, with proofs, but since the point of this exercise is to get to the problems without proofs, I’m going to skip over them, and come back to them only if I need to (because they’re the basis for a future reduction, for example).

The problem: Partition (PART)

The definition: Given a set of integers A, can I fid a subset A’⊆A such that the sum of all of the elements in A’ is exactly half the sum of all of the elements in A?

(Alternately, given a set of integers A, can I split all of the elements of A into two subsets B and C, such that the sum of all of the elements in B is equal to the sum of all of the elements in C?)

(Alternately (this is the G&J definition), given any set A, where each element a∈A has a size s(z) that is a positive integer, can we find a subset A’ of A where the sum of the sizes of everything in A’ is exactly equal to the sum of the sizes of everything in A-A’?)

Example: Suppose A was the set {1,2,3,4,6}.  Then A’ could be {1,3,4}, forming a partition (A’ and A-A’ both sum to 8).  If instead, A was the set {1,2,3,4,5}, then no possible partition exists.  This may seem like a silly case (any set A where the sum of the elements is odd has no partition), but even if the sum of the elements of A is even, it’s possible that no partition exists- for example if A is {2,4,100}.

Note: Partition is one of my favorite NP-Complete problems, because the description is so easy, and it seems so simple.  It’s probably my go-to example if I want to explain this stuff to non-technical people in under a minute- pretend that the elements of A are weights, and the value of each element is the weight in ounces.  The partition problem asks “given this group of weights, and a “scales of justice” style scale, can you make the scale balance using all of the given weights?”.  It’s pretty surprising to most people that the only known way to answer that question basically boils down to “try all possible combinations of arranging things on the scale”

The reduction: From 3DM.  G&J provide the reduction on pages 61-62.  The basic idea is to have one element in A for each element in M, and to represent the elements in A as binary numbers that have 1’s in positions corresponding to which element from W, X, and Y we get the triple from.  The number is set up so that we have a maximum possible value of the sum of all of these elements in A.  They then add two “extra” elements to A so that each side of the partition (elements of M that make a matching, and everything else) will add up to the same value

Difficulty: 7.  The idea of using binary numbers to keep track of “positions” in a list is tricky, but comes up in lots of places.  If students have seen that idea before, this becomes a hard but doable problem.  If students haven’t seen that idea before, I’d make this a 9.

 

 

Hamiltonian Circuit

This is one of my favorite problems, but one of the worst reductions to understand.  I usually skip over it when I teach it, showing them the picture of the widget and saying “See? It’s Nuts!” and moving on

The problem: Hamiltonian Circuit (HC)

The definition: Given a graph G=(V,E), where |V| = N.  Can I create an ordering <v1, v2…vn>of the vertices where an edge exists between each (vi, vi+1) in the sequence, and the edge (vn, v1) also exists?

(In other words, does a sequence of edges exist that lets me start at some vertex, visit each other vertex exactly once, and return to the start?)

Example:

HC example

The sequence <a,d,h,k,g,c> forms a Hamiltonian Circuit

The reduction: Is from VC, and it’s nuts.  G&J do it on pages 56-60.  Starting with an instance of VC (A graph G, and an integer K), build a new graph G’ with:

  • K “selector” vertices (a1-ak)
  • 12 vertices and 14 edges (see page 57) for each edge in G (a “cover-testing” component
  • Edges to connect the selector vertices to the start and end of the cover-testing component.

The Cormen algorithms book has an example of this, and it takes a graph with 4 vertices and 4 edges and builds a HC instance with 50 vertices and 72 edges.

Difficulty: 9, only because I’m sure there exist reductions that have like 25 page proofs, and I want to save my 10 for that.  But really, I think just making the students trace this reduction on an example is a really hard problem.

 

Independent Set

The last of the three similar graph problems, it’s also probably the easiest reduction.  When I teach NP-Completeness, I’ve had a lot of success starting with this reduction (instead of building a stable of NP-Complete problems from zero like the textbooks do).  You kind of have to say “Trust me, Vertex Cover is NP-Complete, I’ll show you why later”, but starting with a simple reduction means that they can follow the idea and the process right from the beginning without having to be bogged down in ugly conversions involving SAT.

The problem: Independent Set (IS)

The definition Given a graph G=(V,E) and an integer J.  Does G have a collection of at least J vertices where no edges connect any two vertices in J?

Example:  Using the same graph:

VC exampple

{a,h,k} is an independent set of size 3.

The reduction: From VC.   Use the exact same graph, and set J=|V|-K.  (G has an independent set of J vertices exactly when it has a vertex cover of K vertices.  Vertices not in the cover can’t have edges between them).

Difficulty: 1.  There’s a reason why I do this first (though sometimes I go from Clique instead).  In general, though, I try to avoid making them do problems that are too easy for homeworks and tests.  If the reduction is trivial, it makes showing that the conversion solves the original problem difficult.  You get a lot of proofs that basically say “C’mon!  They’re the same thing!”.  Which is not what I’m looking for 🙂

 

Clique

G&J don’t even bother to prove Clique is NP-Complete, just stating that it (and Independent Set) is a “different version” of Vertex Cover.  But the problem comes up often enough that it’s worth seeing in its own right.

The problem: Clique (I’ve also seen “Max Clique” or “Clique Decision Problem” (CDP))

The definition: Given a graph G=(V,E), and a positive integer J (G&J say J needs to be ≤ |V|, but I think you can allow large values and make your algorithm just say “no”).  Does J contain a subset V’ of V, or size at least J, where every two vertices in V’ are joined by an edge in E?

Example:  Using the graph we had for VC:

VC exampple

{a,c,d} form a clique of size 3.  There are no cliques of size 4, but if we add the edge (d,g) and the edge (c,h), then the vertices {c,d,g,h} would be one.

The reduction: From Vertex Cover.  Given a graph G and an integer K that is an instance of Vertex Cover, take the complement of G (the complement of a graph has the same vertices, but the set of edges is “inverted”:  (u,v) is an edge in the complement if and only if it was not an edge in G), and use it as your instance of CDP.  Set J= |V|-K

Difficulty: 3.  It’s not a 2 because it takes a little thought to see why the complement works.

Note: As stated perviously, I teach this course using the Cormen book, so in class I show Clique is NP-Hard by reducing from 3SAT.  It’s a good example of a hard reduction, probably about as hard as the VC reduction G&J do.

Vertex Cover

Back to the “core 6” with a classic problem from graph theory.  There are lots of similar graph problems, so it’s important to keep them all straight.

The Problem: Vertex Cover (VC)

The Definition: Given a graph G=(V,E) and a positive integer K (G&J say that K ≤ |V|, but really, if it’s bigger than that, you can have the algorithm just return “yes”).  Can I find a subset V’ of V, |V’| ≤ K, where every edge in E has at least one endpoint in V’?

Example: Hmm, I wonder if I can insert an image..

VC exampple

Ok, looks like that worked.  In the graph above, {c,d,g} is a vertex cover, because all edges have at least one endpoint in the set {c,d,g}.  As we’ll see later, a graph containing a clique (complete subgraph) of N nodes will need at have at least N-1 of those vertices in the cover.

The reduction: From 3SAT.  G&J on pages 54-56 show the construction.  We build a graph with vertices for all variables and their negation, with an edge between them.  We have three variables for each clause, connected in a triangle.  (The three variables correspond to “positions” in the clause).  Then there is an edge from each vertex corresponding to a literal (or its negation) to its corresponding “clause vertex”- the vertex that holds the position of that variable in the original formula.  K= # of variables + 2* number of clauses.

Difficulty: 7 , assuming it’s the first time a student has seen this sort of reduction.  If they have seen something similar (for example, a reduction from 3SAT to Clique), then maybe a 5.  The ideas are pretty similar.

Note: I teach my algorithms class using the Cormen book.  In it, they reduce 3SAT to Clique, proving Clique is NP-Complete, and then reduce Clique to VC. This works in the exact same way as the reduction from VC to Clique that I’ll be doing here next.

Exact Cover by 3-Sets

This is not one of the “core six”, but it is used a lot in reductions, so it’s worth including since it builds right off of 3DM

Also, I think I will include examples for lots of these problems.  Lots of times I have trouble parsing the problem description, so creating a concrete example is helpful.

The problem: Exact Cover by 3-Sets (X3C)

The definition: Given a set X, with |X| = 3q (so, the size of X is a multiple of 3), and a collection C of 3-element subsets of X.  Can we find a subset C’ of C where every element of X occurs in exactly one member of C’?  (So, C’ is an “exact cover” of X).

Example:  Suppose X was {1,2,3,4,5,6}

If C was {{1,2,3},{2,3,4},{1,2,5},{2,5,6}, {1,5,6}}  then we could choose C’ to be {{2,3,4},{1,5,6}} as an exact cover because each element in X appears exactly once.

If instead, C was {{1,2,3},{2,4,5},{2,5,6}}, then any C’ we choose will not be an exact cover (we need all 3 subsets to cover all elements in X at least once, but then the element 2 appears three times).

Note that if we do have an exact cover, C’ will contain exactly q elements.

The proof:G&J prove this “by restriction” which basically means that they show how X3C is a more general version of 3DM .  If you view an instance of 3DM as a special case of X3C by letting XX3C = W∪X3DM∪Y and C = all q3 triples taking one element from W, one element from X3DM, and one element from Y), then the C’ you get as a solution to X3C is also a matching for 3DM.

(Note that lots of these reductions will be between problems that use the same symbols in both problems.  I’ll do my best to disambiguate by using subscripts to mark where the common letter comes from.  So, in this case X3DM is the set X that we get from the 3DM problem (one of the 3 input sets), and XX3C is the set we build for the X3C problem (the set we need to cover).  Hopefully that doesn’t make things more confusing)

(Also note that like many (most?) Computer Science people, I’m a big fan of nested parentheses.  I’m sure all of you can follow along with that. (Right?))

Personally, I don’t like proofs by restriction as a way to teach this stuff to students.  It’s very easy to mess up the “this is a special case” argument- you get incorrect arguments like “this is just a special case of SAT where we return true wherever there’s a cover and false when we don’t!”.  Also it feels like you’re going backwards from a real reduction, and since getting the direction wrong is probably the #1 most common issue students have in doing reductions, anything that makes their job harder isn’t a great idea.

If I taught this in a class, I’d make them do a proper reduction out of it- start with an instance of 3DM (W, X3DM, and Y), and build an instance of X3C (creating XX3C and C), and going from there.

Difficulty: 3.  It’s only not a 2 because I, at least, have trouble understanding what the X3C and 3DM problems are asking.  It’s not as straightforward to explain as many other problems

 

3-Dimensional Matching

Last one for today.  I’m not sure how exactly I’ll space these out. We’ll see.

The Problem: 3-Dimensional Matching (3DM)

The Definition: Given disjoint sets W,X,Y all with the same number (q) of elements, and a set M of triples from WxXxY.  Is there a subset M’ of M of size q where no two elements of M’ agree in any coordinate?

I always have trouble picturing what this means, so here’s an example:

Let W = {1,2,3,4}, X = {10,20,30,40}, Y = {100,200,300,400}. q is 4 in this case.

M is a set of triples like (1,30,400), (2,20,100), and so on, with one element from each of W,X, and Y.

We’re asking for q elements of M that have no duplicates in any coordinate.  In other words, each of {1,2,3,4} appears in exactly one element of M’, each of {10,20,30,40} appears in exactly one element of M’, and so on.  So, a legal M’ could be:

{(1,20,300), (2,30,400), (3,40,100), (4,10,200)}

..assuming all 4 of those triples happened to be in the M we were given.

The Reduction: From 3SAT.  Pages 50-52 explain how we take the clauses and build W, X, Y, and M from them.  It’s pretty crazy-M contains 3 kinds of triples for 3 different kinds of jobs, build according to 3 different sets of rules.

Difficulty: 8.  I couldn’t imagine a student coming up with this without a lot of hand-holding.

3-Satisfiability

The first of the “core six” problems, and the first NP-Complete problem that’s actually a reduction from a known NP-Complete problem.

The Problem: 3-Satisfiability (3SAT)

The Definition: (p.46) Given a set of variables U, and a set C of clauses, where each clause has exactly 3 elements, can we assign true/false values to the variables in U that satisfies all of the clauses in C?

(Note that just as the Satisfiability problem was really “CNF-SAT”, most people would call this problem “3-CNF-SAT”)

The Reduction: From SAT.  Pages 48-49 give a simple algorithm to convert arbitrarily-sized clauses into clauses of size 3, possibly by adding some extra variables.

Difficulty: 6.  It’s pretty fiddly to come up with the conversions in all cases.  But this really depends on the students’  familiarity with Boolean logic.

As an aside, I find myself compelled to note that while 3-CNF-SAT is NP-Complete, the similar problem of 2-CNF-SAT (where all clauses are of size 2) can be solved in polynomial time.  To me, one of the coolest things in Computer Science is how such small changes to a problem (like 3-CNF to 2-CNF, or  Fractional Knapsack to 0/1 Knapsack-where the solution space shrinks!) can drastically affect the complexity of the problem.

Satisfiability

On page 46-47, Garey&Johnson list out 6 “basic core” NP-Complete problems.  I’ll go over all of them quickly, because they will be used in the reductions for many other problems.  But they all start with the “first” NP-Complete problem, Satisfiabiity:

The Problem: Satisfiability (SAT)

The Description: (G&J, p. 39)  Given a set U of variables, and a set C of Clauses over U, is there a way to set the variables in U such that every clause in C is true?

(Note that this is not general satisfiability, where the input is any boolean formula.  This is really “CNF-Satisfiability”, where the formula is in conjunctive normal form.  But you can use logical rules to turn any general logical formula into a CNF formula)

The proof: (p. 39-44) G&J provide a proof of Cook’s Theorem, which provides a “first” NP-Complete problem by  representing the formula in non-deterministic Turing Machines.  It’s a crazy hard proof, we spent a week (at least) of my graduate school Theory of Computation class going over it, and I still have trouble following it.  I think it’s way too hard for undergraduates to follow.

Difficulty: 10

Annotated NP-Complete Problems: The overview

In 1979, Garey&Johnson published  the classic work: Computers and Intractability: A Guide to the Theory of NP-Completeness.  It’s a fantastic book, that every Computer Scientist should own.  The first half of the book describes the theory of NP-Completeness, and shows methods to prove problems NP-Complete.  The last 100 pages list out NP-Complete problems, with a short description of why it’s NP-Complete, along with some other facts.

I teach Computer Science at Ohio Wesleyan University, and one of the courses I regularly teach is our upper-level Analysis of Algorithms class.  One of the subjects in this class is NP-Completeness, and I have students do some reductions on homeworks and tests.  I use the Garey&Johnson book to help me find problems, but when I do, I run across several issues:

1) The book was published in 1979, and many more problems have been discovered since then.

2) For most problems, the authors do not provide a proof of NP-Completeness, just a description of the problem they are reducing from, with a reference to the paper the reduction was made in.  Sometimes, even the reference is missing.

Fixing issue #1 is beyond my ability, though I would dearly love an updated version of this book.  (Though in truth, a modern list of NP-Complete problems would probably be larger than any physical book could contain).  As a teacher, though, issue #2 was an even greater challenge.  When I’m looking for a problem to assign on a homework or test, I’m usually time-constrained (I teach this topic at the end of the semester, when lots of demands are on my time), and I need to determine relatively quickly how hard the reduction will be for the students.  This has proven difficult for me, and I wondered if maybe it was difficult for other faculty as well.

Thus,  I have decided to spend some time building this resource.  The goal is for this to be an ongoing project, where I (eventually, probably over the course of years) go through this book, problem by problem, and for each proof that is not done in the book, provide:

  • A description of the problem
  • A quick sketch of how the reduction will go (this is basically the first step of the proof- where we convert an instance of a known NP-Complete problem to an instance of our problem.)  I find this to be the most difficult, “flash of insight” driven part of the proof.  Hopefully by providing this part, the rest of the proof will fall into place.
  • A ranking of difficulty from 1 (trivial) to 10 (basically impossible to be done by a student just learning this for the first time).  In my classes, I tend to find problems in the 3-6 range to be good choices

Hopefully, this will become a resource for faculty.  I plan on password-protecting older entries so that students can’t find solutions by searching the web.

I fully expect to make mistakes, to have ratings the people disagree with, for other people to find more elegant and simpler reductions, and so on.  Please let me know if any of that happens!

This is not meant to be a place to teach basic NP-Completeness (what a reduction is, why we care about it, anything like that).  That stuff is important, but Garey&Johnson teach it well 🙂

I’ll start with a quick overview of the problems proved in the early part of the book, because they form the “core” from which most of the reductions are based.