# Tag Archives: Difficulty 8

## Pruned Trie Space Minimization

This problem is hard to explain, partially because the definition given by G&J doesn’t really map to the structure they are talking about easily.

The problem: Pruned Trie Space Minimization.  This is problem SR3 in the appendix.

The description in G&J: Given a finite set S, a collection F of functions mapping elements of S to positive integers, and a positive integer K.  Can we find a sequence of m distinct functions from F <f1 .. fm> such that:

• For each pair of elements a and b in S, there is some function fi in the sequence where fi(a) ≠ fi(b)
• For each i from 1 to m, define N(i) to be the number of distinct tuples X= (x1..xi) where more than one a in S has the tuple (f1(a), …, fi(a)) = X, the sum of all of the N(i) values is at most K?

A better description: G&J’s definition removes all knowledge of the “tries” from the problem.  The Comer and Sethi paper that is referred to in the appendix I think does a better job.

First, a trie is a tree that separates a sequence of strings by letters. The idea is that each string has a unique path through the tree.  Here is the tree used in the paper:

This trie shows the path for the set of strings: {back, bane, bank, bare, barn, band, bang, barb, bark, been} by building the tree by considering letters in the string from left to right.  By using different orders of considering letters, we will get differently shaped tries, with different numbers of internal nodes.

A pruned trie recognizes that long paths of nodes with 1 child doesn’t actually need to be represented.  For example, once you go down the “b-e” side, the only place you can end up is at “been”.  So the trie is pruned by removing all such chains (we would consider the “e” node a leaf).

What we are interested in doing is finding an ordering on the letters in the string (or, more generally, the “attributes” of an element we are trying to distinguish) in order to minimize the number of nonleaf nodes in the pruned trie.

The actual question we want to solve is: Given a set of strings S and an integer K, can we construct a trie that differentiates the S strings with K or less internal nodes?

I think the way this maps to the G&J definition is:

S is the set of strings.  F is the set of attributes that map strings to an order of choosing attributes.  The sequence of functions <f1, …, fn> are the orders in which we choose attributes.  So f1(a) is the first node in the trie that we go to on the string a, f2(a) is the second node we go to and so on.  The fi(a) ≠ fi(b) requirement says that we need to eventually differentiate each string from each other, and the N(i) number is counting the number of internal nodes at each height of the tree:

Example: For the picture shown above, we get the following pruned trie (also from the paper):

This trie has 5 internal nodes.

Reduction: G&J say that the reduction goes from 3DM, but in the paper it goes from 3SAT. So we’ll start with a formula in 3CNF form with n variables and m clauses.  The strings we’ll build will have 3n+3m attributes (you can think of this as strings of length 3n+3m).    The first 2n attributes will correspond to literals (one attribute for the positive setting of a variable, one attribute for the negative setting).  The next 3m attributes will correspond to clauses (3 attributes for the 3 possible positions a variable can appear in a clause), and the last 3 attributes correspond to literals (to combine the positive and negative setting of that variable’s literals).

We will have one string for each literal (a 1 in the attribute matching that literal’s positive or negative setting, a 1 in the attributes matching that literal’s position in clauses, and a 1 in the attribute matching that variable, 0’s everywhere else).  We will have one string for each clause (a 1 in the three positions in each clause, 0’s everywhere else).  Then we will have a sequence of “hard to distinguish” strings made of decreasing numbers of 2’s (with 0’s everywhere else).

Here’s the example construction from the paper (blank spaces are zero’s).  It’s a little confusing because they chose n=m=3, but you can see where the various pieces are:

K=2n+m.

If the formula is satisfiable, then the ordering of attributes where we put all of the literals that form the satisfying arrangement first, then all of the clauses, then the W attributes (for the variables) distinguishes the strings in L with 2n+m internal nodes.

In fact, all tries must have at least K internal nodes to distinguish the strings in L- that can be seen from the table, since we have K strings made up of decreasing numbers of 2’s.  We also have to distinguish the strings in order (the strings with the most 2’s first, then the ones with less 2’s, all the way down to the last one with just one 2).  We need to choose one attribute for each literal (positive or negative).  Suppose we choose an attribute Ui (or its negation).  That node in the trie has 3 children:

• A 2, which distinguishes the string in L.
• A 1, which distinguishes the string corresponding to that literal in J.
• A 0, for everything else.

What this means is that we have “distinguished off” the literal string (in J) from the rest (on a 1), which means that the 1 it has in the clause position will not interfere with the 1 in that position of the clause string (in K).  So each clause string will be able to be distinguished by the clause position that satisfies the string.

So, if we have a trie with “only” K internal nodes, the attributes must line up to allow us to have a setting of a variable to satisfy each clause.

Difficulty: 8, with the Comer and Sethi trie definition.  If you are going straight from G&J’s definitions, it’s at least a 9.

## Numerical 3-Dimensional Matching

SP15 is 3-Parttion.

The next problem is one of G&J’s “unpublished results” problems.  I tried figuring out an elegant way to doing it, but couldn’t make it happen.

The problem: Numerical 3-Dimensional Matching.  This is problem SP16 in the appendix.

The description: Given three sets W, X, and Y, each containing the same amount (m) of elements with positive “sizes” and a positive bound B.  Can we create m sets A1 through Am (containing 3 elements each), such that:

• Each Ai has exactly one element from W, X, and Y
• The sum of the sizes of the elements in each Ai is exactly B
• Each element in W, X and Y is in some Ai

Example: Suppose we have the following sets:

• W has elements with sizes {1,2,3,4}
• X has elements with sizes {12,11,7,5}
• Y has elements with sizes {1,1,4,5}  (the ability to allow repeat numbers is why we define the sets as elements with sizes rather than sets of integers)

If B=14, then the partition where A1 is the first element in W, X, and Y, A2 is the second element in W, X, and Y, and so on gives each Ai set a sum of 14.  Obviously, we don’t need to choose corresponding elements from W, X, and Y to form the sets (for example, rearranging the elements in X to be in increasing order doesn’t change whether the problem can be solved, just the exact composition of the Ai sets)

Reduction: I tried doing a reduction using 3-partition, but got stuck (I’ll show it below, in case you want to try to fix it).  G&J refer you to Theorem 4.4 in the book, which is the proof that 3-partition itself is NP-Complete.

We can follow that along and do similar steps to our problem:

• Theorem 4.3 shows how to turn 3DM into 4-partition (a problem like 3-partition but each set in the solution has 4 elements instead of 3).  Since the sets that are created in the 4-partition solution come from 4 different places (page 98 calls then a “ui, a wi[·], an xj[·], and a yk[·]).  Since these partitioned sets all add to the same total (B) and come from 4 disjoint parent sets, we can see how we could do basically the same reduction and show that the “numerical 4-dimensional matching problem” is NP-Complete.
• Theorem 4,4 shows how to turn a 4-parition problem into a 3-partition problem.  The idea is to add enough “pairing” and “filler” elements to the 3-partition instance to make any 4-partition set be split into two 3-partition sets, each consisting of 2 elements from the 4-parittion, plus the “pairing” element of one of the 2 elements chosen.  We can do something similar converting numerical 4-dimensional matching to numerical 3-dimentional matching.  (The difference is that we are given specifically which sets the elements are coming from)  So, if we’re given W, X, Y, and Z in our numerical 4DM instance, we construct W’ to be elements from W and Y, X’ to be elements from X and Z, and Y’ to be the pairing elements of pairs from W’ and X’.  We then need to add enough filler elements to our 3 sets in a similar way to the 3-partition proof (again, the difference is that we have to specifically assign them to W’, X’, or Y’.  But that can be determined by how the 3-partition proof allocates the items)

Difficulty: If you have gone over the 3-partition reduction, this is probably a 6.  Lots of tricky math but doable in a (hard) homework.    But keep in mind you’re tacking it on to the difficulty 8 of understanding the 3-partition reduction in the first place.

My reduction I couldn’t get to work: I really want there to be an easier way to do this.  I tried reducing from 3-partition directly because the problems are so similar.  Here is where I got to:

We’re given a 3-parititon instance S, and an integer B.  Our goal is to split S into sets of size 3 that all add up to B.

So, let’s use S to create 3 sets in our numerical 3DM instance:

• W has all of the elements in S
• X has all of the elements in S, but the sizes are each increased by 10B.
• Y has all of the elements in S, but the sizes are each increased by 100B.

This would make B’ be 111B.

S has a 3-parititon, then for each set {si, sj, sk}, we take the three sets {wi, xj, yk}, {wj, xk, yi}, and {wk, xi, yj}  This will solve the numerical 3DM instance.

My problem comes showing this in the other direction.  If we have a numerical 3DM solution, we can only construct the 3-partition instance if the sets in the 3DM solution arrange themselves nicely like they do above.  I need to show that if the 3DM solution has {wi, xj, yk}, then the set in the 3DM solution that contains wj also contains xi (or xk) and yk (or yi).  I think you can get there by using the rules about how the bounds of the elements in the 3-partition instance work, but the work you need to do to show that it’s true makes this way of doing things no longer “easier” than the Theorem 4.4 proof I sketched above, so I gave up on it.

I still wish there was a more elegant way to transform this problem, though.

## 3-Matroid Intersection

This is a problem that was really hard for me to understand and explain.  It may be less so or other people, though.

The problem: 3-Matroid Intersection.  This is problem SP11 in the appendix.

The description: Given three matroids (E,F1), (E, F2), (E,F3), and a positive integer K.  Can we find a subset E’ of E with K elements that is in the intersection of F1, F2, and F3?

A matroid (E,F) is a set of elements E, and a family F of subsets of E with the following properties:

• Some set S in the family F implies that all subsets of S are also in F.
• If I have two sets S and S’ in F, and |S| = |S’|+1, then there is some element e that is in S but not S’, and adding e to S’ gets us a set also in F.

A common example of matroids comes from graph theory: Given a graph G=(V,E), the matroid based on this graph is (E,F), where E is the set of edges in the graph, and F is the collection of edges that have no cycles (i.e, forests)  This follows the two rules above:

• If S is a collection of edges that do not form a cycle, then any subset of S is also a collection of edges that do not form a cycle.
• If I have two collections of edges S and S’, and S is larger than S’ by one element, we need to find an edge in S that connects two connected components  (trees in the forest) in S’.  Since each tree on K vertices has exactly K-1 edges, S has at least one edge that is not part of any tree in S’ (adding an edge to a tree causes a cycle).  That edge must either connect two trees together, or connect a vertex in some tree to a vertex not used in S’, or connect two vertices not used in S’.  Either way, adding this edge to S’ gives us another forest.

Example:  Since the reduction we’re going to use involves building three matroids from a basic graph, let’s do that.  Here is a graph:

We’re going to use this graph to induce 3 matroids.  The first will be the graph matroid (using the rules defined above) on the undirected version of this graph:

So F1 is the set of all forests in the above graph.

F2 is collections of sets of edges in the directed graph where each vertex has at most one incoming edge.  So, sets like {(s,a), (a,t)} or {(a,b), (b,c), (c,t)}.  The exception is that vertex s must have zero incoming edges in any set in the family.

F3 is built similarly, except we count outgoing edges, and t (instead of s) is the vertex restricted to have 0 outgoing edges.

If K=4, then the following set of edges is in all three families: {(s,a}, (a,b), (b,c), (c,t)}.  It’s acyclic, so it’s in F1.  No vertex in that set has indegree or outdegree of more than 1 (and s has indegree 0 and t has outdegree o), so it’s in F2 and F3.

Just to clarify, because it’s confusing to me.  The sets E that are the “elements” of the matroid are edges in the graph, and the families F are collections of edges.  We just found a collection of 4 edges that will be in all three families.

The reduction: G&J don’t give a reference to this problem, but suggest using 3DM. In my searches, I found that Wikipedia and this set of lecture notes from MIT both want to use Hamiltonian path between two vertices.   (Notice that the “corrected” reduction for HP actually specifies the two vertices to connect between, so the HP reduction also proves this variant).  Here is my explanation of the reduction in the MIT lecture notes.  The class was taught by Michael Goemans, and I got the notes from his web site.

So, we start with an instance of HP- a possibly directed graph G=(V,E), and two special vertices s and t that we want to see if there is a Hamiltonian Path between.  We build the three matroids described in the examples based on using the edge set E: One based on forests in E, one based on collections of edges that have indegree at most 1 on each vertex, and one based on collections of edges that have outdegree at most 1 on each vertex.  Set K = |V|-1.

Notice that an intersection of these three matroids has to be a collection of paths that do not encounter the same vertex twice.  If we can create a collection that has K edges in it, the path will start at s (because F2 ensures that no edges going into s will be part of the intersection), encounter each vertex once (because F1 ensures that we have no cycles, and thus hit each vertex at most once), and end at t (because F3 ensures that no edges leaving t will be part of the intersection).  Thus, we have an intersection of K edges if and only if V has a Hamiltonian cycle.

Difficulty: It’s an 8 for me.  Since this is taught in a relatively small part of one lecture of a class (yes, a class at MIT, but still just a class), presumably it may be less for other people.  But I have a lot of trouble thinking in terms of matroids, and even now, I’m not really convinced that the three families of edges we create are polynomial in size.