# Tag Archives: Difficulty 4

## Hitting String

This is a good easy problem hidden in the middle of all of the hard ones.

The problem: Hitting String.  This is problem SR12 in the appendix.

The description: Given a set of strings A over the alphabet {0,1,*} all of the same length n, can we find a string x over the alphabet {0,1}, also of length n, where x agrees in at least one position with each string in A?

Example: Let A = {00000,11111,0*001, ***10, 101**, 1****}

Then x can be 10100.  It agrees with 00000 in the last position, 11111 in the first position, 0*001 in the fourth position, ***10 in the last position, 101** in the first position, and 1**** in the first position.

A pretty easy example of an instance that can’t be solved is A = {1**, 0**}, or even A = {***}, since a * never agrees with a 0 or a 1.
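
The examples above can be checked mechanically. Here is a minimal sketch, assuming strings are plain Python `str` values and that a `*` never matches anything; the names `agrees` and `find_hitting_string` are made up for illustration, and the search is brute force (exponential in n), which is all we should expect for an NP-complete problem.

```python
from itertools import product

def agrees(x, s):
    """True if x and s share the same symbol in at least one position.

    x only contains 0s and 1s, so a '*' in s can never match.
    """
    return any(a == b for a, b in zip(x, s))

def find_hitting_string(A):
    """Try every binary string of length n; return the first that hits all of A."""
    n = len(A[0])
    for bits in product("01", repeat=n):
        x = "".join(bits)
        if all(agrees(x, s) for s in A):
            return x
    return None  # no hitting string exists

A = ["00000", "11111", "0*001", "***10", "101**", "1****"]
print(all(agrees("10100", s) for s in A))      # True
print(find_hitting_string(["1**", "0**"]))     # None
print(find_hitting_string(["***"]))            # None
```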

Reduction: We’ll go from 3SAT.  Each clause in the 3SAT instance will become a string in A.  Each string in A will have one position for each variable in the formula.  Each string will have a 1 in each variable’s position if that variable occurs positively in that clause, a 0 if it occurs negatively, and a * for all variables that don’t appear in the clause.  So each string will have just 3 characters that are not *.

Thus, we need to come up with a string x that has a 1 or 0 in each position (corresponding to a setting of a variable) that matches one of the three 1 or 0 characters in each string (satisfying that clause).
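
A short sketch of that construction, assuming clauses are given as lists of nonzero integers (k for variable k, -k for its negation, in the style of DIMACS files) and that no clause contains a variable and its negation. The function name `clauses_to_strings` is hypothetical.

```python
def clauses_to_strings(clauses, num_vars):
    """Build one Hitting String per clause, with one position per variable."""
    strings = []
    for clause in clauses:
        s = ["*"] * num_vars                 # variables not in the clause stay '*'
        for lit in clause:
            s[abs(lit) - 1] = "1" if lit > 0 else "0"
        strings.append("".join(s))
    return strings

# (x1 or not x2 or x3) and (not x1 or x2 or x4)
print(clauses_to_strings([[1, -2, 3], [-1, 2, 4]], 4))  # ['101*', '01*1']
```

A hitting string x for the result is then read directly as a truth assignment: position i of x is the value of variable i.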

Difficulty: 4, maybe 3.  This is about as straightforward a “Turn SAT into a different kind of problem” reduction as you’re likely to see, but I do think cross-genre reductions are harder for students than we might expect.

## Capacity Assignment

This problem is from the same “unpublished manuscript” as last week’s.

The problem: Capacity Assignment.  This is problem SR7 in the appendix.

The description: Given a set C of “communication links”, and a set M of positive capacities.  Each pair of a link c and a capacity m also has a cost function g(c,m) and delay penalty d(c,m) that have the following properties:

• If i < j ∈ M, then g(c,i) ≤ g(c,j)
• If i < j ∈ M, then d(c,i) ≥ d(c,j)

We’re also given positive integers K and J.  The problem is: Can we assign a capacity to each link such that the total g cost of all of our assignments is ≤ K and the total d cost of all of our assignments is ≤ J?

Example: There’s a lot to parse in that problem description.  The first thing to notice is that the set of links C doesn’t necessarily have to link anything together (it’s not like it has to apply to an underlying graph).  So we can just give them names:

C={a,b,c,d,e}

Next, there is no reason why the set of capacities has to be assigned as a bijection to C: the set M could be a different size entirely than the size of C:

M={1,2}

The cost function has to have the property that if we assign a 2 to a link, it has to cost at least as much as assigning 1 to the link:

g(c,1) = 3 for all c

g(c,2) = 4 for all c

The delay function has to have the property that if we assign a higher capacity to a link, the delay can’t be larger than assigning a lower capacity:

d(c,1) = 6 for all c

d(c,2) = 5 for all c

In this case, if we assign the capacity of 1 to all links, we get a total cost of 15 and a total delay of 30.  If we assign the capacity of 2 to all links, we get a total cost of 20 and a total delay of 25.     If we have K = 18, and J = 27, we can achieve that by setting 2 links to have capacity 1 and 3 links to have capacity 2.
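
To double-check the arithmetic, here is a small brute-force sketch over all 2^5 assignments for this example. The function name `feasible_assignments` and the way g and d are passed in as Python functions are just illustrative choices.

```python
from itertools import product

def feasible_assignments(links, capacities, g, d, K, J):
    """Yield every assignment (link -> capacity) with total cost <= K and total delay <= J."""
    for choice in product(capacities, repeat=len(links)):
        cost = sum(g(c, m) for c, m in zip(links, choice))
        delay = sum(d(c, m) for c, m in zip(links, choice))
        if cost <= K and delay <= J:
            yield dict(zip(links, choice))

links = ["a", "b", "c", "d", "e"]
g = lambda c, m: {1: 3, 2: 4}[m]   # cost: same for every link in this example
d = lambda c, m: {1: 6, 2: 5}[m]   # delay: same for every link in this example

solutions = list(feasible_assignments(links, [1, 2], g, d, K=18, J=27))
print(len(solutions))  # 10
```

Every solution found gives capacity 2 to exactly 3 of the 5 links, which is why there are 10 of them.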

The reduction: The example above is pretty close to how the reduction will work.  We will reduce from Sum of Subsets, so we start with a set S of integers and a target B.   Our set C will have one element for each element in S.  Our set M will be {1,2}.  Assigning a capacity of 1 will imply we don’t want to take this element in S’, and assigning a capacity of 2 will imply that we do.  (This makes more sense if I can use the set {0,1} for M, but the problem description says the elements of M have to be positive)

We will define our g function so that g(c,1) = 1 for all c, and g(c,2) will be s(c)+1 (where s(c) is the size of the element in S that corresponds to c).

Our d function will work similarly:  d(c,1) = s(c)+1 for all c, and d(c,2) = 1 for all c.  These functions both follow the restrictions for how g and d work.

Set K = |S| + B.  Since each cost is either s(c)+1 or 1, the total cost is |S| plus the sizes of the elements assigned a 2.  So this is saying that enough elements need to be assigned a 1 (so that their cost is 1, instead of s(c)+1) that the sizes of the elements assigned a 2 total at most B.

Let T = the sum of the sizes of all of the elements in S.  Then let J = |S| + T – B.  Again, each d value always includes 1, and may include s(c) as well.  So this is saying that enough elements need to be assigned a 2 (so that their delay is 1) that the sizes of the elements assigned a 1 total at most T – B.

If S has a SOS solution S’, then assigning a capacity of 2 to all elements in S’ and a 1 to all elements not in S’ gives us a cost value of exactly K, and a delay value of exactly J.

If we have a Capacity Assignment solution, then notice that K+J = 2|S|  + T, and that this is also the sum of all costs and delays no matter what assignment is chosen.  (g(c,m) + d(c,m) = s(c)+2, for all c, no matter what m we use).  So if the sum of the costs were strictly less than K, the sum of the delays would have to be strictly more than J (and vice versa).  The only way to satisfy both the K and J constraints is to make the sums exactly equal to K and J, which gives us a SOS solution: the elements assigned a 2 have sizes that sum to exactly B.
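
Here is a sketch of the construction in code, assuming S is a list of positive integers and using list indices as link names. The function names are hypothetical, and the brute-force check is only there to try the reduction on tiny instances.

```python
from itertools import product

def sos_to_capacity_assignment(S, B):
    """Build the Capacity Assignment instance for Sum of Subsets instance (S, B)."""
    links = list(range(len(S)))                    # one link per element of S
    M = [1, 2]                                     # 1 = leave out, 2 = take
    g = lambda c, m: 1 if m == 1 else S[c] + 1     # cost
    d = lambda c, m: S[c] + 1 if m == 1 else 1     # delay
    K = len(S) + B
    J = len(S) + sum(S) - B
    return links, M, g, d, K, J

def has_assignment(links, M, g, d, K, J):
    """Brute force: is there an assignment with total cost <= K and total delay <= J?"""
    return any(
        sum(g(c, m) for c, m in zip(links, choice)) <= K
        and sum(d(c, m) for c, m in zip(links, choice)) <= J
        for choice in product(M, repeat=len(links))
    )

print(has_assignment(*sos_to_capacity_assignment([3, 5, 8], 11)))  # True  (3 + 8 = 11)
print(has_assignment(*sos_to_capacity_assignment([3, 5, 8], 7)))   # False (no subset sums to 7)
```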

Difficulty: 4.  I think the algebra for this problem is a little easier than last week’s, but it does take some work to understand what the problem is asking.  Changing the problem slightly to allow assignments and costs and delays to be 0 instead of making them all be positive integers makes the reduction easier too.

## Expected Retrieval Cost

Here’s another problem where the G&J definition confused me for a bit.

The problem: Expected Retrieval Cost.  This is problem SR4 in the appendix.

The description: Given a set R of records, each with a probability of being accessed between 0 and 1 (and the sum of all probabilities = 1), some number m of sectors to place records on, and a positive integer K.  Can we partition R into m disjoint subsets R1..Rm  such that:

• The “latency” cost of 2 sectors i and j, called d(i,j) is j-i-1  (if i < j) or m-i+j-1 (if i >=j)
• The probability of a sector, called  p(Ri), is the sum of the probabilities of the records on that sector
• The sum over all pairs of sectors i and j of p(Ri) * p(Rj) * d(i,j) is K or less

Example: The thing that was the hardest for me to understand was the definition of d.  The way it’s written, the distance between 2 adjacent sectors (for example d(2,3)) is 0.  The distance between a sector and itself (for example d(2,2)) is m-1.  The paper by Cody and Coffman does a better job of explaining the motivation: What we’re looking at is the time (in sectors traversed) for a disk to read sector j after finishing reading sector i.  So if we read sector 2 right before reading sector 3, the disk has no traversal time to go from the end of sector 2 to the beginning of sector 3.  But if we read sector 2 twice in a row, the disk reader (in this model) needs to scan to the end of all of the sectors, then return to the beginning, then scan all the way to the beginning of sector 2 to read again.

So, suppose we have m=2, and 4 records, each with .25 probability.  If we put them all in the same sector, every pair of records has a latency of 1, since d(1,1) = 1.  Since all of the records are in (say) R1, then p(R1) = 1, and p(R2) = 0.  So our sum is:

• p(R1)*p(R1)* d(1,1) = 1*1*1 = 1, plus
• p(R1) * p(R2) * d(1,2) = 1*0*0 = 0, plus
• p(R2) * p(R1)* d(2,1) = 0*1*0 = 0, plus
• p(R2)* p(R2) * d(2,2) = 0*0*1 = 0

..for a total of 1.

If we put 2 records in sector 1, and 2 records in sector 2, then p(R1) = p(R2) = .5.  So our sum is:

• p(R1)*p(R1)* d(1,1) = .5*.5*1 = .25, plus
• p(R1) * p(R2) * d(1,2) = .5*.5*0 = 0, plus
• p(R2) * p(R1)* d(2,1) = .5*.5*0 = 0, plus
• p(R2)* p(R2) * d(2,2) = .5*.5*1 = .25

..for a total of .5.
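
Both sums can be reproduced with a few lines of code. This is a minimal sketch, assuming sectors are numbered 1..m and using exact fractions to avoid rounding; `latency` and `expected_cost` are names chosen for illustration.

```python
from fractions import Fraction

def latency(i, j, m):
    """d(i,j): sectors traversed to reach sector j after finishing sector i."""
    return j - i - 1 if i < j else m - i + j - 1

def expected_cost(sector_probs):
    """Sum of p(Ri) * p(Rj) * d(i,j) over all ordered pairs of sectors."""
    m = len(sector_probs)
    return sum(
        sector_probs[i - 1] * sector_probs[j - 1] * latency(i, j, m)
        for i in range(1, m + 1)
        for j in range(1, m + 1)
    )

print(expected_cost([Fraction(1), Fraction(0)]))        # 1
print(expected_cost([Fraction(1, 2), Fraction(1, 2)]))  # 1/2
```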

The reduction: Hopefully the example using m=2 helps to show why using Partition is a good choice.  So we start with a set S of elements.  We will turn each element of S into a value between 0 and 1 reflecting its proportion of the sum of all of the elements.  For example, if S={1,2,3,4,5}, then we would create a set R of values {1/15, 2/15, 3/15, 4/15, 5/15}.  These probabilities will all be between 0 and 1 and will all sum to 1.

We will set m=2, K = 1/2. Notice that d(1,2) = d(2,1) = 0.  So the only d values that will count for our sum are d(1,1) and d(2,2) (which are both 1).  So by our formula we need p(R1) * p(R1) + p(R2) * p(R2) ≤ .5.

Since p(R1) + p(R2) = 1, the left side equals 1 - 2*p(R1)*p(R2), so some algebra tells us that this means p(R1)*p(R2) ≥ .25.  But two values that sum to 1 have a product of at most .25, with equality only when they are equal, so p(R1) = p(R2) = .5.  So we have an Expected Retrieval Cost solution for R exactly when we have a partition of S.
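
Here is a sketch of the reduction, using the fact that with m=2 the only nonzero latencies are d(1,1) = d(2,2) = 1, so the cost is p(R1)*p(R1) + p(R2)*p(R2). The function names are hypothetical, and the solver is a brute-force check over all subsets, meant only for small examples.

```python
from fractions import Fraction
from itertools import combinations

def partition_to_erc(S):
    """Turn a Partition instance S into (record probabilities, m, K)."""
    total = sum(S)
    R = [Fraction(s, total) for s in S]   # each element's share of the total
    return R, 2, Fraction(1, 2)

def erc_solvable_two_sectors(R, K):
    """Brute force for m = 2: try every subset of records as sector 1."""
    for size in range(len(R) + 1):
        for idx in combinations(range(len(R)), size):
            p1 = sum(R[i] for i in idx)
            p2 = 1 - p1
            if p1 * p1 + p2 * p2 <= K:
                return True
    return False

R, m, K = partition_to_erc([1, 2, 3, 4])
print(erc_solvable_two_sectors(R, K))   # True: {1,4} and {2,3}

R, m, K = partition_to_erc([1, 2, 3, 4, 5])
print(erc_solvable_two_sectors(R, K))   # False: the total, 15, is odd
```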

Difficulty: 4. Cody and Coffman say the details of the above reduction are “routine” after defining k = 1/2.  It is pretty straightforward, but there are some tricky parts to worry about.

I will say though that the definition in G&J, where it’s not clear how distances to adjacent things can be 0, struck me as much harder, and is the reason I dug up the Cody and Coffman paper in the first place.  I’d say that definition makes the problem a 6 or 7.

## Monotone 3-Satisfiability

I told Daniel when he gave me his Monotone Satisfiability reduction that the actual problem mentioned in G&J was Monotone 3-Satisfiability.  So he went off and did that reduction too.
The Problem:
Monotone 3 SAT. This is a more restrictive case of Monotone SAT.

The Description:
Given a formula of clauses where each clause contains either all negated or all non-negated variables, and each clause contains at most 3 variables. Does there exist an assignment of the variables so that the formula is satisfied?

Example:

the following assignment satisfies :

However:

And the following is in Monotone  3SAT form:

are both unsatisfiable.

The reduction:
In the following reduction we are given an instance of 3SAT,
. Here each clause is of the form:
where

and each is a literal of the form .
We use the following construction to build an instance of Monotone  3 SAT out of the above instance of 3SAT :
In each clause we have at most one literal that is not of the same parity as the rest of the literals in the clause. For every such literal, we may perform the following substitution:
this yields a modified clause .
Now we must be able to guarantee that and are mapped to opposite truth values, so we introduce the new clause:
and conjunct it onto our old formula producing a new formula .

For example:
so we perform the substitution

so and

Now repeating this procedure will result in a new formula: .
We claim logical equivalence between the and This is semantically intuitive as the clause requires all substituted literal in to take the value opposite of this was the stipulation for the substitution initially. It is also verifiable by truth table construction for:

:
If there exists a truth assignment that satisfies , then we may extend this truth assignment to produce which will satisfy
by letting for all and letting for all .
Obviously if is satisfiable must be by the above construction of . So by the above claim we have that will satisfy .
:
Continuing from the above, if we have a truth assignment that satisfies , then by the claim above it also must satisfy . And is a sub-formula of so any truth assignment that satisfies must also satisfy .
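
A minimal sketch of a substitution of this kind, under assumptions of my own: clauses are lists of nonzero integers (k for variable k, -k for its negation), each odd-parity literal on variable v gets a fresh variable y, and y is forced to take the opposite value of v by the two monotone clauses (v or y) and (not v or not y). That is one standard way to write the "opposite truth values" condition, and the function names are hypothetical. The brute-force `satisfiable` helper is only for checking small examples.

```python
from itertools import product

def to_monotone(clauses, num_vars):
    """Rewrite each mixed clause so that all of its literals share one sign."""
    out, next_var = [], num_vars
    for clause in clauses:
        pos = [l for l in clause if l > 0]
        neg = [l for l in clause if l < 0]
        if not pos or not neg:
            out.append(list(clause))           # already monotone
            continue
        keep, swap = (pos, neg) if len(pos) >= len(neg) else (neg, pos)
        new_clause = list(keep)
        for lit in swap:
            next_var += 1
            v, y = abs(lit), next_var
            out.append([v, y])                 # v or y
            out.append([-v, -y])               # not v or not y  (together: y = not v)
            # "not v" becomes y; a positive v becomes "not y"
            new_clause.append(y if lit < 0 else -y)
        out.append(new_clause)
    return out, next_var

def satisfiable(clauses, num_vars):
    """Brute-force satisfiability check over all assignments."""
    return any(
        all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
        for bits in product([False, True], repeat=num_vars)
    )

mono, n = to_monotone([[1, -2, 3], [-1, -3, 2]], 3)
print(mono)
print(satisfiable(mono, n))
```

For this input the two mixed clauses become [1, 3, 4] and [-1, -3, -5], plus the four two-literal clauses tying variables 4 and 5 to the negation of variable 2.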

(Back to me)

Difficulty: 4, since it’s a little harder than the regular Monotone Sat one.