# Monthly Archives: February 2017

## Pruned Trie Space Minimization

This problem is hard to explain, partially because the definition given by G&J doesn’t really map to the structure they are talking about easily.

The problem: Pruned Trie Space Minimization.  This is problem SR3 in the appendix.

The description in G&J: Given a finite set S, a collection F of functions mapping elements of S to positive integers, and a positive integer K.  Can we find a sequence of m distinct functions from F <f1 .. fm> such that:

• For each pair of elements a and b in S, there is some function fi in the sequence where fi(a) ≠ fi(b)
• For each i from 1 to m, define N(i) to be the number of distinct tuples X = (x1, …, xi) for which more than one a in S has (f1(a), …, fi(a)) = X.  Then the sum of all of the N(i) values is at most K?

A better description: G&J’s definition removes all knowledge of the “tries” from the problem.  The Comer and Sethi paper that is referred to in the appendix I think does a better job.

First, a trie is a tree that separates a sequence of strings by letters. The idea is that each string has a unique path through the tree.  Here is the tree used in the paper:

This trie shows the path for the set of strings: {back, bane, bank, bare, barn, band, bang, barb, bark, been} by building the tree by considering letters in the string from left to right.  By using different orders of considering letters, we will get differently shaped tries, with different numbers of internal nodes.

A pruned trie recognizes that long paths of nodes with 1 child don’t actually need to be represented.  For example, once you go down the “b-e” side, the only place you can end up is at “been”.  So the trie is pruned by removing all such chains (we would consider the “e” node a leaf).

What we are interested in doing is finding an ordering on the letters in the string (or, more generally, the “attributes” of an element we are trying to distinguish) in order to minimize the number of nonleaf nodes in the pruned trie.

The actual question we want to solve is: Given a set of strings S and an integer K, can we construct a trie that differentiates the strings in S with K or fewer internal nodes?

I think the way this maps to the G&J definition is:

S is the set of strings.  F is the set of attributes: each function in F maps a string to its value for one attribute (for strings, the letter in one position).  The sequence of functions <f1, …, fm> is the order in which we choose attributes.  So f1(a) is the branch we take at the first node of the trie on the string a, f2(a) is the branch we take at the second node, and so on.  The fi(a) ≠ fi(b) requirement says that we need to eventually differentiate each string from each other, and the N(i) number is counting the number of internal nodes at each height of the tree.

Example: For the picture shown above, we get the following pruned trie (also from the paper):

This trie has 5 internal nodes.
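To make the counting concrete, here is a minimal Python sketch (mine, not from the paper). The function name `shared_prefix_counts` and the idea of writing an attribute ordering as a list of character positions are assumptions for illustration. For each prefix length i it counts the attribute-value tuples shared by more than one string; each such tuple corresponds to one internal node of the pruned trie, with the empty tuple (i = 0) playing the role of the root.

```python
from collections import Counter

STRINGS = ["back", "bane", "bank", "bare", "barn",
           "band", "bang", "barb", "bark", "been"]

def shared_prefix_counts(items, order):
    """For i = 0..len(order), count the distinct length-i attribute tuples
    (attributes taken in the given order) shared by more than one item."""
    counts = []
    for i in range(len(order) + 1):
        prefixes = Counter(tuple(item[j] for j in order[:i]) for item in items)
        counts.append(sum(1 for c in prefixes.values() if c > 1))
    return counts

# Letters considered left to right, as in the example trie.
left_to_right = shared_prefix_counts(STRINGS, [0, 1, 2, 3])
print(left_to_right, sum(left_to_right))  # [1, 1, 1, 2, 0] 5
```

With the left-to-right order the counts sum to 5, matching the pruned trie above. Read literally, G&J’s sum runs over i = 1..m and so skips the empty tuple (the root), which only shifts the total by one.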

Reduction: G&J say that the reduction goes from 3DM, but in the paper it goes from 3SAT. So we’ll start with a formula in 3CNF with n variables and m clauses.  The strings we’ll build will have 3n+3m attributes (you can think of this as strings of length 3n+3m).  The first 2n attributes will correspond to literals (one attribute for the positive setting of a variable, one attribute for the negative setting).  The next 3m attributes will correspond to clauses (3 attributes for the 3 possible positions a literal can appear in a clause), and the last n attributes correspond to variables (to combine the positive and negative settings of that variable’s literals).

We will have one string for each literal (a 1 in the attribute matching that literal’s positive or negative setting, a 1 in the attributes matching that literal’s position in clauses, and a 1 in the attribute matching that variable, 0’s everywhere else).  We will have one string for each clause (a 1 in the three positions in each clause, 0’s everywhere else).  Then we will have a sequence of “hard to distinguish” strings made of decreasing numbers of 2’s (with 0’s everywhere else).

Here’s the example construction from the paper (blank spaces are zeros).  It’s a little confusing because they chose n=m=3, but you can see where the various pieces are:

K=2n+m.

If the formula is satisfiable, then the ordering of attributes where we put all of the literals that form the satisfying arrangement first, then all of the clauses, then the W attributes (for the variables) distinguishes the strings in L with 2n+m internal nodes.

In fact, all tries must have at least K internal nodes to distinguish the strings in L; that can be seen from the table, since we have K strings made up of decreasing numbers of 2’s.  We also have to distinguish the strings in order (the strings with the most 2’s first, then the ones with fewer 2’s, all the way down to the last one with just one 2).  We need to choose one attribute for each literal (positive or negative).  Suppose we choose an attribute Ui (or its negation).  That node in the trie has 3 children:

• A 2, which distinguishes the string in L.
• A 1, which distinguishes the string corresponding to that literal in J.
• A 0, for everything else.

What this means is that we have “distinguished off” the literal string (in J) from the rest (on a 1), which means that the 1 it has in the clause position will not interfere with the 1 in that position of the clause string (in K).  So each clause string will be able to be distinguished by the clause position that satisfies the string.

So, if we have a trie with “only” K internal nodes, the attributes must line up to allow us to have a setting of a variable to satisfy each clause.

Difficulty: 8, with the Comer and Sethi trie definition.  If you are going straight from G&J’s definitions, it’s at least a 9.

## Dynamic Storage Allocation

Since Bin Packing was a redo, here is the first real problem in the Storage and Retrieval section.

The problem: Dynamic Storage Allocation.  This is problem SR2 in the appendix.

The description: Given a set A of items.  Each item a in A has size s(a), arrival time r(a) and departure time d(a) (all positive integers).  We’re also given a storage size D.  Can we allocate the items to D “slots” of storage such that:

• Each item is stored in consecutive slots.  So an element a has to be contained in s(a) adjacent locations from 1 to D.
• No two items overlap the same slot during the time they are in storage. In other words, if two items a and a’ are mapped to the same slot in D, they must not have any overlap between their arrival and departure times.

Example: Here’s a simple set of items:

| Item Number | Arrival | Departure | Size |
|---|---|---|---|
| 1 | 1 | 2 | 4 |
| 2 | 2 | 3 | 4 |
| 3 | 1 | 3 | 2 |

If D=6, we can store these items by using slots 1-4 to hold both items 1 and 2 (notice that they don’t overlap in time, and having one item arrive right as the other departs is ok), and slots 5-6 to hold item 3.
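As a sanity check on the two conditions, here is a small Python sketch that verifies a proposed allocation for the three example items. The function name `is_valid_allocation` and the representation (a dict from item number to its first slot) are my own assumptions, and I treat an item as occupying storage from its arrival up to, but not including, its departure, which is what makes “one item arrives right as the other departs” legal.

```python
from itertools import combinations

# (arrival, departure, size) for each item, keyed by item number
ITEMS = {1: (1, 2, 4), 2: (2, 3, 4), 3: (1, 3, 2)}

def is_valid_allocation(items, start_slot, D):
    """Check that every item fits in slots 1..D and that no two items
    share a slot while both are in storage."""
    for a, (_, _, size) in items.items():
        first = start_slot[a]
        if first < 1 or first + size - 1 > D:
            return False
    for a, b in combinations(items, 2):
        ra, da, sa = items[a]
        rb, db, sb = items[b]
        # Half-open time intervals [arrival, departure)
        overlap_time = ra < db and rb < da
        overlap_slots = (start_slot[a] < start_slot[b] + sb
                         and start_slot[b] < start_slot[a] + sa)
        if overlap_time and overlap_slots:
            return False
    return True

print(is_valid_allocation(ITEMS, {1: 1, 2: 1, 3: 5}, D=6))  # True
print(is_valid_allocation(ITEMS, {1: 1, 2: 1, 3: 3}, D=6))  # False
```

The second call fails because item 3 would sit in slots 3-4 while item 1 is still using them.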

Reduction: The reference to Stockmeyer in G&J is to a private communication.  I tried working out my own reduction from 3-Partition, but couldn’t make it work.  My approach was to make the sizes of the elements in the 3-Partition instance map to times in this problem, since G&J give the hint that you can make all sizes 1 or 2.  But I couldn’t figure out how to make it work.  I sort of expect there to be 3 possible sizes for a 3-Partition problem, instead of 2.

Eventually, I found a paper by Lim that uses regular Partition, using the storage allocation problem as a special case of a problem involving berthing ships.  (The ship problem adds extra complications like each ship needing a specified clearance between it and other ships.)  He starts with a set A of elements, and defines T to be the sum of all of the element sizes.  He then creates one item in the storage allocation problem for each element in A.  For a given element a in A, the new item has size s(a), arrival time 2, and departure time 3 (so it exists for just one unit of time).  He also adds 9 new items that have the effect of creating only two sequences of storage slots that can hold the items from A, each of size T/2.  We can place the items in these slots if and only if there is a partition of A.

Difficulty: 7.  I don’t think the idea is too hard to understand, but the 9 sets that are created are hard to come up with (even if you can understand what their purpose is, coming up with the sets that actually get that purpose accomplished is pretty hard).

## Ratio Clique

Last week it was pointed out to me that my reduction for Balanced Complete Bipartite Subgraph was wrong, and in my searches to fix it, I found that the real reduction (by Johnson) used a variant of Clique that said (without proof) that Clique is NP-Complete even if K was fixed to be |V|/2.  I looked up the Clique problem in G&J, and they say in the comments that it is NP-Complete for K = any fixed ratio of |V|.

I thought this was a neat easy problem that fit in the 3-6 difficulty range I mentioned last week and decided it was worth a post.  But thinking about this brings up some subtle issues relating to ratios and constants that are common sources of errors among students.  I’ll talk about that at the end.

The problem: I don’t know if there is an official name, so I’m calling it “Ratio Clique”.  It is mentioned in the comments to GT19 (Clique).

The description: For any fixed number r, 0< r < 1, does G have a clique of size r*|V| or more?

Example:  Here’s a graph we’ve used for a previous problem:

If r = .5, then r*|V| = 3.5.  So we’re asking if a clique of 3.5 or more vertices exists (which really means a clique of 4 or more vertices).  It does not exist in this graph.  If r ≤ 3/7 (but more than 2/7), then we would be looking for a clique of size 3, which does exist in this graph (vertices b, c, and t).

The reduction: We will be reducing from the regular Clique problem.  Since we want to show this “for any fixed value of r”, we can’t change r inside our reduction.

So we’re given a graph G=(V, E) and a K as our instance of Clique. We need to build a graph G’=(V’, E’) that has a fixed K’ = ⌈r*|V’|⌉.

G’ will start with G, and will add new vertices to the graph.  The vertices we add depend on the ratio s of K to |V|    (K = ⌈s*|V|⌉).  K’ is initially K, but may change as vertices are added to the graph.

If r > s, then we need to add vertices to V’ that will connect to every other vertex in V’, and each one will increase K’ by 1.  This increases the ratio K’/|V’|, and we keep adding vertices until that ratio is at least r.

If G has a clique of size K, then the extra vertices in K’ can be added to the clique to form a larger clique (since these new vertices connect to every other vertex)

If G’ has a clique of size K’, notice that it must contain at least K vertices that were initially in G. (We only added K’-K new vertices).  These vertices that exist in G are all connected to each other and so will form a clique in G.

If r < s, then we will add vertices to V’ that are isolated (have no edges connecting to them).  K’ will stay equal to K.  Each vertex we add will reduce the ratio K’/|V’|, and we keep adding vertices until K=⌈r*|V’|⌉.

Since these new vertices can not be part of any clique in G’, any clique in G’ must consist only of vertices from G.  Since K=K’, this gives us a clique of size K in both graphs.
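Both cases of the construction can be sketched as one padding loop. This is my own illustration under some assumptions: vertices are numbered 0..n-1, edges are pairs of vertex numbers, r is passed as a `Fraction` to avoid floating-point trouble with the ceiling, and 1 ≤ K ≤ |V|. The name `pad_for_ratio` is made up for this sketch.

```python
from fractions import Fraction
from math import ceil

def pad_for_ratio(num_vertices, edges, K, r):
    """Pad a Clique instance (G, K) until K' == ceil(r * |V'|).
    Returns (|V'|, E', K')."""
    n, new_edges, k = num_vertices, set(edges), K
    while k != ceil(r * n):
        if k < ceil(r * n):
            # r > s case: the new vertex n is adjacent to every existing
            # vertex, so it can join any clique and K' grows by 1.
            new_edges.update((v, n) for v in range(n))
            k += 1
        # Otherwise (r < s case) the new vertex is isolated and K' is unchanged.
        n += 1
    return n, new_edges, k

# A triangle 0-1-2 with a pendant vertex 3, asking for a clique of size 3.
EDGES = {(0, 1), (1, 2), (0, 2), (2, 3)}
n_hi, _, k_hi = pad_for_ratio(4, EDGES, 3, Fraction(9, 10))
n_lo, _, k_lo = pad_for_ratio(4, EDGES, 3, Fraction(1, 2))
print(n_hi, k_hi)  # 10 9
print(n_lo, k_lo)  # 5 3
```

Because r < 1, each step changes ⌈r*|V’|⌉ by at most 1, so the loop closes the gap one vertex at a time and never overshoots.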

It is probably also worth mentioning just how many vertices need to get added to the graph in each case, to make sure that we are adding a polynomial number.  If r>s, we will be adding w vertices to satisfy the equation: ⌈s*|V|⌉ + w = ⌈r*(|V|+w)⌉

(These are both ways of expressing K’)

Dropping the ceiling function (since it only leads to a difference of at most one vertex) and solving for w gets us w = (s|V| − r|V|)/(r − 1).  Since r > s and r < 1, both sides of that division are negative, so w ends up being positive, and polynomial in |V|.

If r < s, we will be adding w vertices to satisfy the equation:

⌈s*|V|⌉ = ⌈r(|V|+w)⌉

(These are both ways of expressing K)

This can similarly be solved to w = (s|V| − r|V|)/r.  Since s > r, this is also a positive (and polynomial) number of new vertices.

A possible source of mistakes: I’m pretty sure this reduction works, but we need to be careful that there is a difference between “for any fixed ratio r of |V|” and “for any fixed K”.  For a fixed K (say, K=7), the “Does this graph have a 7-Clique?” problem can be solved in polynomial time (by enumerating all subsets of 7 vertices, for example.  There are C(|V|, 7) such subsets, which is O(|V|^7)).  By choosing a ratio instead of a constant K, we gain the ability to scale the size of K’ along with the size of the graph and avoid this issue.  But it is worth mentioning this to students as a possible pitfall.  It’s very easy to do things in a way that effectively is treating r|V| as a constant K, which won’t work.
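For students who want to see why a fixed K is easy, here is a minimal brute-force sketch (the function name `has_k_clique` and the example graph are my own, not from G&J). It tries every k-subset of vertices, so for a constant k the number of subsets is polynomial in |V|; once k grows with |V|, as it does with a fixed ratio, that bound is no longer polynomial.

```python
from itertools import combinations

def has_k_clique(vertices, edges, k):
    """Brute force: test every k-subset of vertices for pairwise adjacency."""
    edge_set = {frozenset(e) for e in edges}
    return any(
        all(frozenset(pair) in edge_set for pair in combinations(subset, 2))
        for subset in combinations(vertices, k)
    )

# Hypothetical example graph: a triangle a-b-c plus a pendant vertex d.
V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(has_k_clique(V, E, 3))  # True
print(has_k_clique(V, E, 4))  # False
```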

Difficulty: 3, but if you’re going to make students do the algebra to show the number of vertices that are added, bump it up to a 4.

## Bin Packing Take 2

[So WordPress’s search function has failed me.  A search for posts on Bin Packing didn’t turn up this post, so I went ahead and wrote a whole second post for this problem.  Since this time my reduction uses 3-Partition instead of Partition (and so is a little less trivial for use as a homework problem), I figured I’d leave it for you as an alternate reduction.

I have been thinking off and on about whether it would be useful when I’m done with this project (years from now) to go back and try to find reductions that can be done easier (or harder) than what I’ve shown here, to give more options that are in the 3-6 difficulty range that I think is best for homework problems.  I’m not sure how feasible that task would be, but it’s something I’ll try to keep in mind as I go forward.

Anyway, here’s my post that talks about Bin Packing again:]

On to a new chapter! A4- “Storage and Retrieval”

This first one is a classic problem that I guess I haven’t done yet.

The problem: Bin Packing.  This is problem SR1 in the appendix.

The description: Given a finite set U of items, each with a positive integer size, and positive integers B and K.  Can we split U into K disjoint sets such that the sum of the elements in each set is B or less?

Example: Suppose U was {1,2,3,4,5,6}, K=4, and B= 6.  We want 4 disjoint sets that each sum to 6 or less.  For example:

• {1,5}
• {2,4}
• {3}
• {6}

Note that if K = 3, we would need 3 sets instead of 4, and this wouldn’t be solvable.
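For small instances like this one, the question can be checked by exhaustive search. The following is only a backtracking sketch for illustration (the name `can_pack` is mine, and it takes exponential time in general, as you would expect for an NP-Complete problem). It allows bins to stay empty, which does not change the answer.

```python
def can_pack(sizes, K, B):
    """Decide by backtracking whether the items fit into K bins of capacity B."""
    bins = [0] * K
    items = sorted(sizes, reverse=True)  # placing large items first prunes sooner

    def place(i):
        if i == len(items):
            return True
        tried = set()  # bins with equal load are interchangeable; try one of each
        for b in range(K):
            if bins[b] + items[i] <= B and bins[b] not in tried:
                tried.add(bins[b])
                bins[b] += items[i]
                if place(i + 1):
                    return True
                bins[b] -= items[i]
        return False

    return place(0)

print(can_pack([1, 2, 3, 4, 5, 6], K=4, B=6))  # True
print(can_pack([1, 2, 3, 4, 5, 6], K=3, B=6))  # False
```

The K = 3 case fails for a simple reason: the items sum to 21, but three bins of size 6 hold at most 18.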

The simple reduction: G&J on page 124 say that Bin Packing contains 3-Partition as a special case.  So let’s try reducing from there. Recall the definition of 3-Partition:

Given a set A of 3m elements and an integer B such that the size of each element is strictly between B/4 and B/2 and the sum of all of the elements is m*B, can we split A into m disjoint sets that each sum to exactly B?

Keeping in mind that the bounds on the elements in A mean that there are exactly 3 elements in each set in the partition, we can see how this maps easily to the Bin Packing problem:

• U = A
• K = m
• Use the same B

While it is true that the Bin Packing problem allows the sums to be B or less, and the 3-Partition problem forces the sets to sum to exactly B, the fact that all of the sets have to contain 3 elements and the fact that the sum of all of the elements in U is m*B means that if any set in the Bin Packing answer is < B, some other set will necessarily be more than B.
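The mapping is short enough to write down directly. Here is a sketch, assuming the 3-Partition instance is given as a list of sizes plus B; the function name `three_partition_to_bin_packing` and the sample instance are made up for illustration. The slack computation at the end shows the argument above on one packing: m bins of capacity B holding a total of m*B leave no room for any bin to be under-full.

```python
def three_partition_to_bin_packing(A, B):
    """Map a 3-Partition instance (A, B) to a Bin Packing instance (U, K, B)."""
    m, remainder = divmod(len(A), 3)
    if remainder or sum(A) != m * B:
        raise ValueError("need 3m elements whose sizes sum to m*B")
    # Integer form of B/4 < a < B/2, avoiding floating point.
    if any(not (4 * a > B and 2 * a < B) for a in A):
        raise ValueError("each size must be strictly between B/4 and B/2")
    return list(A), m, B

# Hypothetical instance with m = 2 and B = 100.
U, K, B = three_partition_to_bin_packing([26, 33, 41, 30, 30, 40], 100)
packing = [[26, 33, 41], [30, 30, 40]]
slack = [B - sum(bin_items) for bin_items in packing]
print(K, slack)  # 2 [0, 0]
```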

Difficulty: 3.  It is basically the same problem, but I think there is enough work needed to justify the reduction that it makes sense as a good easy homework problem.