Tag Archives: uncited reduction

Hitting String

This is a good easy problem hidden in the middle of all of the hard ones.

The problem: Hitting String.  This is problem SR12 in the appendix.

The description: Given a set of strings A over the alphabet {0,1,*} all of the same length n, can we find a string x over the alphabet {0,1}, also of length n, where x agrees in at least one position with each string in A?

Example: Let A = {00000,11111,0*001, ***10, 101**, 1****}

Then x can be 10100.  It agrees with 00000 in the last position, 11111 in the first position, 0*001 in the fourth position, ***10 in the last position, and 1***** in the first position.

A pretty easy example of an instance that can’t be solved is A = {1**, 0**}, or even A = {***}

Reduction: We’ll go from 3SAT.  Each clause in the 3SAT instance will become a string in A.  Each string in A will have one position for each variable in the formula.  Each string will have a 1 in each variable’s position if that variable occurs positively in that clause, a 0 if it occurs negatively, and a * for all variables that don’t appear in the clause.  So each string will have just 3 characters that are not *.

Thus, we need to come up with a string x that has a 1 or 0 in each position (corresponding to a setting of a variable) that matches one of the three 1 or 0 characters in each string (satisfying that clause).

Difficulty: 4, maybe 3.  This is about as straightforward a “Turn SAT into a different kind of problem” reduction as you’re likely to see, but I do think crossing genre problems are harder for students than we may anticipate them to be.

Capacity Assignment

This problem is from the same “unpublished manuscript” as last week’s.

The problem: Capacity Assignment.  This is problem SR7 in the appendix.

The description: Given a set C of “communication links”, and set M of positive capacities.  Each pair of a link c and a capacity m also has a cost function g(c,m) and delay penalty d(c,m) that has the following properties:

  • If i < j ∈ M, then g(c,i) ≤ g(c,j)
  • If i < j ∈ M, then d(c,i) ≥ d(c,j)

We’re also given positive integers K and J.  The problem is: Can we assign a capacity to each link such that the total g cost of all of our assignments is ≤ K and the total d cost of all of our assignments is ≤ J?

Example: There’s a lot to parse in that problem description.  The first thing to notice is that the set of links C doesn’t necessarily have to link anything together (it’s not like it has to apply to an underlying graph).  So we can just give them names:

C={a,b,c,d,e}

Next, there is no reason why the set of capacities has to be assigned as a bijection to C- the set M could be a different size entirely than the size of C:

M={1,2}

The cost function has to have the property that if we assign a 2 to a link, it has to cost as least as much as assigning 1 to the link:

g(c,1) = 3 for all c

g(c,2) = 4 for all c

The delay function has to have the property that if we assign a higher capacity to a link, the delay can’t be larger than assigning a lower capacity:

d(c,1) = 6 for all c

d(c,2) = 5 for all c

In this case, if we assign the capacity of 1 to all links, we get a total cost of 15 and a total delay of 30.  If we assign the capacity of 2 to all links, we get a total cost of 20 and a total delay of 25.     If we have K = 18, and J = 27, we can achieve that by setting 2 links to have capacity 1 and 3 links to have capacity 2.

The reduction: The example above is pretty close to how the reduction will work.  We will reduce from Sum of Subsets, so we start with a set S of integers and a target B.   Our set C will have one element for each element in S.  Our set M will be {1,2}.  Assigning a capacity of 1 will imply we don’t want to take this element in S’, and assigning a capacity of 2 will imply that we do.  (This makes more sense if I can use the set {0,1} for M, but the problem description says the elements of M have to be positive)

We will define our g function so that g(c,1) = 1 for all c, and g(c,2) will be s(c)+1 (where s(c) is the size of the element in S that corresponds to c).

Our d function will work similarly:  d(c,1) = s(c)+1 for all c, and d(c,2) = 1 for all c.  These functions both follow the restrictions for how g and d work.

Set K = |S| + B.  Since each cost is either s(c)+1 or 1, this is saying that there needs to be enough elements assigned a 1 (such that its cost is 1, instead of s(c)+1) to that the sizes of those elements does not exceed K.

Let T = The sum of all of the sizes of all of the elements in S.  Then let J = |S| + T – B.  Again, each d value always includes 1, and may include s(c) as well.  So this is saying that there needs to be enough values assigned a 2 (so that its delay is 1) so that the sizes of those elements does not exceed J.

If S has a SOS solution S’, then assigning a capacity of 2 to all elements in S’ and a 1 to all elements in S’ gives us a cost value of exactly K, and a delay value of exactly J.

If we have a Capacity Assignment solution, then notice that K+J = 2|S|  + T, and so is the sum of all delays and capacities no matter what assignment is chosen.  (g(c,m) + d(c,m) = s(c)+2, for all c, no matter what m we use).  So if the sum of the delays (or costs) were strictly less than K, the sum of the costs (or delays) would have to be strictly more than J.  The only way to satisfy both the K and J constraints is to make the sums exactly equal, which gives us a SOS solution.

Difficulty: 4.  I think the algebra for this problem is a little easier than last week’s, but it does take some work to understand what the problem is asking.  Changing the problem slightly to allow assignments and costs and delays to be 0 instead of making them all be positive integers makes the reduction easier too.

Multiple Copy File Allocation

These next two problems are from an “unpublished manuscript” that I can’t find.  Assuming I did them correctly, I think they are also good problems for students to work on.

The problem: Multiple Copy File Allocation.  This is problem SR6 in the appendix.

The description: Given a graph G=(V,E), where each vertex v ∈ V has a “usage” u(v), and a “storage cost” s(v) (both positive integers), and a positive integer K.  Can we find a subset V’ of V where for each v ∈ V we define d(v) to be the length of the shortest path from v to some member of V’, that the sum of:

  • The s(v) values for all vertices in V’ plus
  • The d(v)*u(v) values for all vertices not in V’ is ≤ K?

Example: The idea is that each vertex in V’ pays its s(v) cost, and each vertex not in V’ pays its u(v) cost for each edge is has to go through to get to something in V’.  Here’s a simple example:

sr6

Each node lists its s value first and u value second.  If we take just the “1,10” node into V’, our total cost is:

  • 1 from the “1,10” node (its s value)
  • 2 + 3 + 4 from each of its neighbors (their u values * a distance of 1)
  • 2 from the “90,1” vertex.  (u cost of 1, times a distance of 2 edges from the vertex in V’)

..for a total of 12.  I think that’s minimal.

Reduction: G&J say to use  Vertex Cover, and mention that we can make all s values equal to each other and all u values equal to each other.  The phrasing implies that we shouldn’t make the s value and u values the same altogether, which is a good hint.

So we start with out VC instance: a graph G=(V,E) and an integer K.  We will use the same graph.  Let s = the s(v) for each vertex = |V|.  Let u = the u(v) for each vertex =  \lceil|V|/2 \rceil + 1 (The smallest integer larger than |V|/2).

Set K’ = K*s + (|V|-K)* u.  Hopefully it’s clear how this number will allow a solution to the VC problem take that cover and create a solution to the MCFA problem.

If we have a MFCA solution, first notice that any vertices that are 2 (or more) edges away from something in V’ are contributing 2*u to the total cost.  Since 2*u > s, we can add a vertex to V’ at a distance 1 away from these distant vertices and keep our total cost ≤ K’.  This means we can transform any MCFA solution into one that is also a solution but where V’ is a cover of V.

Also notice that even after making these changes that if we were to have K+1 (or more) vertices in V’, our total cost would be (K+1)*s + (|V|-K-1)* u.  This is more than K’ (it adds an extra s of cost |V| and removes a u of cost |V|/2+1).

So, we can construct a MCFA solution where there are K or less vertices in V’ that form a cover.  This gives us a solution to the VC instance.

Difficulty: 5.  It takes a little playing around with the algebra to get the correct values, but the idea that we can make the s and u values the same for all vertices is helpful, and gives a clear path to how you’d reduce this problem from VC.

Expected Retrieval Cost

Here’s another problem where the G&J definition confused me for a bit.

The problem: Expected Retrieval Cost.  This is problem SR4 in the appendix.

The description: Given a set R of records, each with a probability of being accessed between 0-1 (and the sum of all probabilities = 1), some number m of sectors to place records on, and a positive integer K.  Can we partition R into m disjoint subsets R1..Rm  such that:

  • The “latency” cost of 2 sectors i and j, called d(i,j) is j-i-1  (if i < j) or m-i+j-1 (if i >=j)
  • The probability of a sector, called  p(Ri), is the sum of the probabilities of the records on that sector
  • The sum over all pairs of sectors i and j is p(Ri) * p(Rj) * d(i,j) is K or less

Example: The thing that was the hardest for me to understand was the definition of d.  The way it’s written, the distance between 2 adjacent sectors (for example d(2,3)) is 0.  The distance between a sector and itself (for example d(2,2)) is m-1.  The paper by Cody and Coffman do a better job of explaining the motivation: What we’re looking at is the time (in sectors traversed) for a disk to read sector j after finishing reading sector i.  So If we read sector 2 right before reading sector 3, the disk has no traversal time to go from the end of sector 2 to the beginning of sector 3.  But if we read sector 2 twice in a row, the disk reader (in this model) needs to scan to the end of all of the sectors, then return to the beginning, then scan all the way to the beginning of sector 2 to read again.

So, suppose we have m=2, and 4 records, each with .25 probability.  If we put them all in the same sector, we have d(i,j) = 1 for all pairs of sectors.  Since all pairs of sectors are in (say) R1, then p(R1) = 1, and p(R2) = 0.  So our sum is:

  • p(R1)*p(R1)* d(1,1) = 1*1*1 = 1, plus
  • p(R1) * p(R2) * d(1,2) = 1*0*0 = 0, plus
  • p(R2) * p(R1)* d(2,1) = 0*1*0 = 0, plus
  • p(R2)* p(R2) * d(2,2) = 0*0*1

..for a total of 1.

If we put 2 records in sector 1, and 2 records in sector 2, then p(R1) = p(R2) = .5.  So our sum is:

  • p(R1)*p(R1)* d(1,1) = .5*.5*1 = .25, plus
  • p(R1) * p(R2) * d(1,2) = .5*.5*0 = 0, plus
  • p(R2) * p(R1)* d(2,1) = .5*1.5*0 = 0, plus
  • p(R2)* p(R2) * d(2,2) = .5*.5*1 = .25

..for a total of .5.

The reduction: Hopefully the example using m=2 helps to show why using Partition is a good choice.  So we start with a set S of elements.  We will turn each element S into a value between 0 and 1 reflecting its proportion of the sum of all of the elements.  For example, if S={1,2,3,4,5}, then we would create a set R of values {1/15, 2/15, 3/15, 4/15, 5/15}.  These probabilities will all be between 0 and 1 and will all sum to 1.

We will set m=2, K = 1/2. Notice that d(1,2) = d(2,1) = 0.  So the only d values that will count for our sum is d(1,1) and d(2,2) (which are both 1)  So by our formula we need p(R1) * p(R2) + p(R2) * p(R1) = .5.

Some algebra tells us that this means that p(R1)*p(R2) = ..25, and we know that p(R1) + p(R2) = 1.  Solving that system of equations gets us p(R1) = p(R2) = .5.  Or, we have an Expected Retrieval Cost solution for R exactly when we have  a partition of S.

Difficulty: 4. Cody and Coffman say the details of the above reduction are “routine” after defining k = 1/2.  It is pretty straightforward, but there are some tricky parts to worry about.

I will say though that the definition in G&J, where it’s not clear how distances to adjacent things can be 0, struck me as much harder, and is the reason I dug up the Cody and Coffman paper in the first place.  I’d say that definition makes the problem a 6 or 7.

Protected: Ratio Clique

This content is password protected. To view it please enter your password below:

Protected: Bin Packing Take 2

This content is password protected. To view it please enter your password below:

Protected: Numerical Matching With Target Sums

This content is password protected. To view it please enter your password below:

Protected: Minimum Separating Set

This content is password protected. To view it please enter your password below:

Protected: Minimum Test Set

This content is password protected. To view it please enter your password below:

Protected: Hitting Set

This content is password protected. To view it please enter your password below: