Tag Archives: SR3

Pruned Trie Space Minimization

This problem is hard to explain, partially because the definition given by G&J doesn’t really map to the structure they are talking about easily.

The problem: Pruned Trie Space Minimization.  This is problem SR3 in the appendix.

The description in G&J: Given a finite set S, a collection F of functions mapping elements of S to positive integers, and a positive integer K.  Can we find a sequence of m distinct functions from F <f1 .. fm> such that:

  • For each pair of elements a and b in S, there is some function fi in the sequence where fi(a) ≠ fi(b)
  • For each i from 1 to m, define N(i) to be the number of distinct tuples X= (x1..xi) where more than one a in S has the tuple (f1(a), …, fi(a)) = X, the sum of all of the N(i) values is at most K?

A better description: G&J’s definition removes all knowledge of the “tries” from the problem.  The Comer and Sethi paper that is referred to in the appendix I think does a better job.

First, a trie is a tree that separates a sequence of strings by letters. The idea is that each string has a unique path through the tree.  Here is the tree used in the paper:

pruned-trie-space-minimization

This trie shows the path for the set of strings: {back, bane, bank, bare, barn, band, bang, barb, bark, been} by building the tree by considering letters in the string from left to right.  By using different orders of considering letters, we will get differently shaped tries, with different numbers of internal nodes.

A pruned trie recognizes that long paths of nodes with 1 child doesn’t actually need to be represented.  For example, once you go down the “b-e” side, the only place you can end up is at “been”.  So the trie is pruned by removing all such chains (we would consider the “e” node a leaf).

What we are interested in doing is finding an ordering on the letters in the string (or, more generally, the “attributes” of an element we are trying to distinguish) in order to minimize the number of nonleaf nodes in the pruned trie.

The actual question we want to solve is: Given a set of strings S and an integer K, can we construct a trie that differentiates the S strings with K or less internal nodes?

I think the way this maps to the G&J definition is:

S is the set of strings.  F is the set of attributes that map strings to an order of choosing attributes.  The sequence of functions <f1, …, fn> are the orders in which we choose attributes.  So f1(a) is the first node in the trie that we go to on the string a, f2(a) is the second node we go to and so on.  The fi(a) ≠ fi(b) requirement says that we need to eventually differentiate each string from each other, and the N(i) number is counting the number of internal nodes at each height of the tree:

Example: For the picture shown above, we get the following pruned trie (also from the paper):

pruned-trie-space-minimization2

This trie has 5 internal nodes.

Reduction: G&J say that the reduction goes from 3DM, but in the paper it goes from 3SAT. So we’ll start with a formula in 3CNF form with n variables and m clauses.  The strings we’ll build will have 3n+3m attributes (you can think of this as strings of length 3n+3m).    The first 2n attributes will correspond to literals (one attribute for the positive setting of a variable, one attribute for the negative setting).  The next 3m attributes will correspond to clauses (3 attributes for the 3 possible positions a variable can appear in a clause), and the last 3 attributes correspond to literals (to combine the positive and negative setting of that variable’s literals).

We will have one string for each literal (a 1 in the attribute matching that literal’s positive or negative setting, a 1 in the attributes matching that literal’s position in clauses, and a 1 in the attribute matching that variable, 0’s everywhere else).  We will have one string for each clause (a 1 in the three positions in each clause, 0’s everywhere else).  Then we will have a sequence of “hard to distinguish” strings made of decreasing numbers of 2’s (with 0’s everywhere else).

Here’s the example construction from the paper (blank spaces are zero’s).  It’s a little confusing because they chose n=m=3, but you can see where the various pieces are:pruned-trie-space-minimization3

K=2n+m.

If the formula is satisfiable, then the ordering of attributes where we put all of the literals that form the satisfying arrangement first, then all of the clauses, then the W attributes (for the variables) distinguishes the strings in L with 2n+m internal nodes.

In fact, all tries must have at least K internal nodes to distinguish the strings in L- that can be seen from the table, since we have K strings made up of decreasing numbers of 2’s.  We also have to distinguish the strings in order (the strings with the most 2’s first, then the ones with less 2’s, all the way down to the last one with just one 2).  We need to choose one attribute for each literal (positive or negative).  Suppose we choose an attribute Ui (or its negation).  That node in the trie has 3 children:

  • A 2, which distinguishes the string in L.
  • A 1, which distinguishes the string corresponding to that literal in J.
  • A 0, for everything else.

What this means is that we have “distinguished off” the literal string (in J) from the rest (on a 1), which means that the 1 it has in the clause position will not interfere with the 1 in that position of the clause string (in K).  So each clause string will be able to be distinguished by the clause position that satisfies the string.

So, if we have a trie with “only” K internal nodes, the attributes must line up to allow us to have a setting of a variable to satisfy each clause.

Difficulty: 8, with the Comer and Sethi trie definition.  If you are going straight from G&J’s definitions, it’s at least a 9.