Expected Retrieval Cost

Here’s another problem where the G&J definition confused me for a bit.

The problem: Expected Retrieval Cost.  This is problem SR4 in the appendix.

The description: Given a set R of records, each with a probability of being accessed between 0 and 1 (where the probabilities of all of the records sum to 1), some number m of sectors to place records on, and a positive integer K.  Can we partition R into m disjoint subsets R1..Rm such that:

  • The “latency” cost of two sectors i and j, called d(i,j), is j-i-1 (if i < j) or m-i+j-1 (if i >= j)
  • The probability of a sector, called p(Ri), is the sum of the probabilities of the records on that sector
  • The sum over all pairs of sectors i and j of p(Ri) * p(Rj) * d(i,j) is K or less

Example: The thing that was the hardest for me to understand was the definition of d.  The way it’s written, the distance between 2 adjacent sectors (for example d(2,3)) is 0.  The distance between a sector and itself (for example d(2,2)) is m-1.  The paper by Cody and Coffman does a better job of explaining the motivation: what we’re looking at is the time (in sectors traversed) for a disk to read sector j after finishing reading sector i.  So if we read sector 2 right before reading sector 3, the disk has no traversal time to go from the end of sector 2 to the beginning of sector 3.  But if we read sector 2 twice in a row, the disk reader (in this model) needs to scan to the end of all of the sectors, then return to the beginning, then scan all the way to the beginning of sector 2 to read it again.
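Here’s a minimal sketch of that distance function in Python (the name d follows G&J; everything else is mine):

```python
def d(i, j, m):
    """Sectors traversed between finishing sector i and starting sector j
    on a disk with m sectors, numbered 1 through m."""
    return j - i - 1 if i < j else m - i + j - 1

# Adjacent sectors cost nothing; re-reading the same sector costs a full loop.
assert d(2, 3, m=4) == 0      # no gap between the end of 2 and the start of 3
assert d(2, 2, m=4) == 4 - 1  # wrap around all m sectors: m - 1
```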

So, suppose we have m=2, and 4 records, each with .25 probability.  If we put them all in the same sector, then d(1,1) = d(2,2) = 1 and d(1,2) = d(2,1) = 0.  Since all of the records are in (say) R1, then p(R1) = 1, and p(R2) = 0.  So our sum is:

  • p(R1)*p(R1)* d(1,1) = 1*1*1 = 1, plus
  • p(R1) * p(R2) * d(1,2) = 1*0*0 = 0, plus
  • p(R2) * p(R1)* d(2,1) = 0*1*0 = 0, plus
  • p(R2)* p(R2) * d(2,2) = 0*0*1 = 0

..for a total of 1.

If we put 2 records in sector 1, and 2 records in sector 2, then p(R1) = p(R2) = .5.  So our sum is:

  • p(R1)*p(R1)* d(1,1) = .5*.5*1 = .25, plus
  • p(R1) * p(R2) * d(1,2) = .5*.5*0 = 0, plus
  • p(R2) * p(R1)* d(2,1) = .5*.5*0 = 0, plus
  • p(R2)* p(R2) * d(2,2) = .5*.5*1 = .25

..for a total of .5.
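As a sanity check, here’s the same computation in Python (the helper names are mine):

```python
from itertools import product

def d(i, j, m):
    return j - i - 1 if i < j else m - i + j - 1

def expected_cost(sector_probs):
    """Sum p(Ri) * p(Rj) * d(i,j) over all ordered pairs of sectors."""
    m = len(sector_probs)
    return sum(sector_probs[i - 1] * sector_probs[j - 1] * d(i, j, m)
               for i, j in product(range(1, m + 1), repeat=2))

print(expected_cost([1.0, 0.0]))  # all records on sector 1: total is 1.0
print(expected_cost([0.5, 0.5]))  # two records per sector: total is 0.5
```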

The reduction: Hopefully the example using m=2 helps to show why using Partition is a good choice.  So we start with a set S of elements.  We will turn each element of S into a value between 0 and 1 reflecting its proportion of the sum of all of the elements.  For example, if S={1,2,3,4,5}, then we would create a set R of values {1/15, 2/15, 3/15, 4/15, 5/15}.  These probabilities will all be between 0 and 1 and will sum to 1.

We will set m=2, K = 1/2.  Notice that d(1,2) = d(2,1) = 0.  So the only d values that will count for our sum are d(1,1) and d(2,2) (which are both 1).  So by our formula we need p(R1)*p(R1) + p(R2)*p(R2) to be .5 or less.

Some algebra tells us what this means: since p(R1) + p(R2) = 1, the sum p(R1)*p(R1) + p(R2)*p(R2) equals 1 - 2*p(R1)*p(R2), so it is .5 or less exactly when p(R1)*p(R2) is at least .25.  But the product of two numbers that sum to 1 is at most .25, with equality only when both are .5.  So the cost is K or less exactly when p(R1) = p(R2) = .5.  Or, we have an Expected Retrieval Cost solution for R exactly when we have a partition of S.
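Here is a minimal sketch of that transformation in Python (the function name is mine), using exact fractions so the probabilities sum to exactly 1:

```python
from fractions import Fraction

def erc_instance(s):
    """Build an Expected Retrieval Cost instance from a Partition instance s.

    Returns the record probabilities, the number of sectors m, and the
    bound K, using exact arithmetic to avoid floating-point error."""
    total = sum(s)
    probs = [Fraction(x, total) for x in s]  # each element's share of the total
    return probs, 2, Fraction(1, 2)          # m = 2 sectors, K = 1/2

probs, m, K = erc_instance([1, 2, 3, 4, 5])
print(probs)       # [Fraction(1, 15), Fraction(2, 15), Fraction(1, 5), ...]
print(sum(probs))  # 1
```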

Difficulty: 4. Cody and Coffman say the details of the above reduction are “routine” after defining K = 1/2.  It is pretty straightforward, but there are some tricky parts to worry about.

I will say though that the definition in G&J, where it’s not clear how distances to adjacent things can be 0, struck me as much harder, and is the reason I dug up the Cody and Coffman paper in the first place.  I’d say that definition makes the problem a 6 or 7.

Kth Largest m-Tuple

I think I’m going to move the posts to Wednesday this semester since I teach two 2-hour classes on Tuesday/Thursday.

SP19 is Minimum Sum of Squares

SP20 is Kth Largest Subset and is very similar to our next problem.

The problem: Kth Largest m-Tuple.  This is problem SP21 in the appendix.

The description: Given sets X1 through Xm that all contain positive integers, and integers K and B, are there at least K m-tuples (x1, .., xm) from X1 x X2 x … x Xm for which the sum of the elements in each tuple is at least B?

Example: Let X1 = {1,2,3,4}, X2 = {5,10}, X3 = {1,2}. Notice that the X’s can be different sizes, and that different sets can contain the same elements.

If B = 15, then the only tuples that sum to at least 15 are (3,10,2), (4,10,1), and (4,10,2).
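A quick brute-force check of the example (just a sketch, not part of the reduction):

```python
from itertools import product

X1, X2, X3 = [1, 2, 3, 4], [5, 10], [1, 2]
B = 15

hits = [t for t in product(X1, X2, X3) if sum(t) >= B]
print(hits)        # [(3, 10, 2), (4, 10, 1), (4, 10, 2)]
print(len(hits))   # 3, so the answer is "yes" whenever K <= 3
```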

Reduction: The paper by Johnson and Mizoguchi presents the reduction pretty densely, but here is what I think they are saying:  We’re going to use a Turing Reduction to solve the Partition problem, which means we’re given an instance of Partition and assume we can call a subroutine that solves the Kth Largest m-Tuple problem multiple times.  Recall that we can use binary search over multiple calls of the subroutine to determine (for example) how many tuples sum to some B or more.  (We need multiple calls because the subroutine is a boolean one, just saying yes or no to a given instance.)
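Here’s a sketch of that binary-search trick in Python; `oracle` is a stand-in for the hypothetical yes/no solver for Kth Largest m-Tuple:

```python
def count_tuples_at_least(sets, bound, oracle):
    """Count the tuples with sum >= bound, using only a boolean oracle.

    oracle(sets, k, bound) answers: "are there at least k m-tuples whose
    elements sum to bound or more?"  We binary search for the largest k
    that gets a yes, which is exactly the count."""
    lo, hi = 0, 1
    for x in sets:
        hi *= len(x)   # there are at most prod(|Xi|) tuples in all
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if oracle(sets, mid, bound):
            lo = mid   # at least mid tuples qualify; look higher
        else:
            hi = mid - 1
    return lo
```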

Updated reduction:

This idea came from Said D. in the comments, and I wanted to make sure it got posted here because it’s so simple and elegant.  If we are given a partition instance S = {s1..sn} and need to know if a subset sums to B (= half the sum of all of the elements in S), then the sets we create for the Kth Largest m-tuple instance are:

{s1+1, 1}, {s2+1, 1}, …, {sn+1, 1}

And the bound that the Kth Largest m-tuple instance is looking for is B+n.  Each set will contribute 1 to the tuple sum no matter which element is chosen, and it contributes si as well exactly when we “take” that element in the partition instance.  (As with the Kth Largest Subset reduction, comparing the number of tuples that reach B+n with the number that reach B+n+1 tells us whether some subset sums to exactly B.)

That’s way better than what I was trying to do.  Thanks, Said!
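Here’s a minimal sketch of Said’s construction in Python (the names are mine), with a brute-force count standing in for the oracle just to check the correspondence:

```python
from itertools import product

def said_instance(s, B):
    """One two-element set per Partition element: pick si+1 to "take" si,
    or 1 to skip it.  Every set contributes 1 either way, so a tuple sums
    to B + n exactly when the taken elements sum to B."""
    return [[x + 1, 1] for x in s], B + len(s)

s = [1, 2, 3]                  # tiny partition instance; B = half the total = 3
sets, bound = said_instance(s, sum(s) // 2)
count = sum(1 for t in product(*sets) if sum(t) >= bound)
print(count)   # 5 tuples reach B + n, matching the 5 subsets with sum >= 3
```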

The rest of my not very good reduction:

I’ll leave the rest up here for posterity, but this is way worse than Said’s idea above, and it glosses over the fact that you can’t just use any elements you want as the sets: you need to set them up in a way that makes sure you don’t repeat elements when you choose the tuples.

So, in a manner similar to that used in the Kth Largest Subset reduction, we can find the number of tuples that sum to any B we want (including half the sum of all of the elements in the partition instance).  The only tricky part is that our “subsets” are tuples of different sizes.  So we need to run this algorithm multiple times:

  • Are there any 1-tuples that solve the partition problem?  (There are O(n) 1-tuples, so a binary search requires O(log n) calls.)
  • Are there any 2-tuples that solve the partition problem?  (There are O(n^2) 2-tuples, so a binary search requires O(2 log n) calls.)
  • …and so on, until: are there any m-tuples that solve the partition problem?  (There are O(n!) m-tuples, so a binary search requires O(n log n) calls.)

Thus, we can do it in a polynomial number of calls to the K-th Largest m-tuple solver.

Difficulty: This is a little harder than the Kth largest subset reduction, which I gave a 5, so I’ll make this a 6.
