Author Archives: Sean T. McCulloch

Three-Way Matching of Integers

This is an interesting variant on the Three-Dimensional Matching problem that will be used in the next reduction.  Since I think it would make a good easy homework problem, I’m posting it mainly as a reminder to myself to use it.

The problem: Three-Way Matching of Integers.  This problem isn’t in the appendix, but the paper by Papadimitriou and Kanellakis that describes it refers to G&J for the reduction.  I think that it’s because it’s seen as a “trivial” extension of Numerical Three-Dimensional Matching.

The description: Given a set A with n integers {a1, …, an}, and a set B with 2n integers {b1, …, b2n}, can we split B into n pairs of elements, where each pair Pi = {bj, bk} satisfies ai + bj + bk = c for the same constant c for all i?  (Notice that c will be 1/n of the sum of all of the elements in A and B.)

Example: I think the problem is simpler than the description.  Suppose A = {1,2,3}, and B = {4,5,6,7,8,9}.  Then c will be 15, and P1 = {6,8}, P2 = {4,9}, P3 = {5,7}.
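If you want to check examples like this by hand (or generate easy homework instances), a brute-force matcher is only a few lines.  This is my own throwaway code, not anything from G&J:

```python
from itertools import permutations

def three_way_match(A, B, c):
    """Brute force: try every ordering of B, read off consecutive pairs,
    and check that each pair plus its element of A sums to c."""
    n = len(A)
    for perm in permutations(B):
        pairs = [perm[2 * i: 2 * i + 2] for i in range(n)]
        if all(a + p[0] + p[1] == c for a, p in zip(A, pairs)):
            return pairs
    return None

A = [1, 2, 3]
B = [4, 5, 6, 7, 8, 9]
c = (sum(A) + sum(B)) // len(A)   # c is 1/n of the total sum: 15 here
print(three_way_match(A, B, c))
```

Factorial-time, of course, but fine for classroom-sized instances.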

Reduction:  This is very close to Numerical Three-Dimensional Matching, so we’ll start there.  We’re given 3 sets X, Y, and Z, each with n elements, and a bound B.  We can assume that B = 1/n of the sum of all of the elements in X, Y, and Z.

We can’t just set A=X and B=Y∪Z, because we need to make sure that a solution to our problem doesn’t build a Pi that takes 2 elements from Y (or 2 from Z).  The fix is to add the bound B to each element of Y before putting it into our set B.  So now c will be 2B (the old bound that a triple adds to, plus the extra B we added to the element that came from Y).  This guarantees that each Pi will take exactly one element that came from Y and one element that came from Z.
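In code, the whole transformation is tiny (a sketch, with names of my own choosing; `bound` is the N3DM target B):

```python
def n3dm_to_three_way(X, Y, Z, bound):
    """Numerical 3DM -> Three-Way Matching of Integers.

    Shift every element of Y up by the bound: a pair holding two
    shifted-Y elements would sum past the new target c = 2*bound all
    by itself, so no pair can take two elements that came from Y."""
    A = list(X)
    B = [y + bound for y in Y] + list(Z)
    return A, B, 2 * bound
```

A counting argument finishes the job: no pair can contain two shifted-Y elements, and there are n pairs and n shifted-Y elements, so every pair holds exactly one element from Y and one from Z.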

Difficulty: 3.  I don’t think this is trivial since we have to change the original problem a bit in our reduction, but I think it is a change that’s not hard for people to figure out.  Though you might convince me this should be a 2 instead.

No-Wait Flow-Shop Scheduling

This problem is very similar to the last problem, but the reduction is much harder to understand.

The problem: No-Wait Flow-Shop Scheduling.  This is problem SS16 in the appendix.

The description: Given an instance of Flow-Shop Scheduling, can we create a schedule that meets the deadline with the additional property that no job waits between tasks?  (So, once a job completes a task, it immediately starts another task).

Example: Suppose I have 3 processors and 2 jobs:

Job P1 time P2 time P3 time
1 2 4 3
2 2 2 2
3 2 3 1


A flow-shop schedule that allowed waiting could be:

P1 1 1 2 2 3 3
P2 X X 1 1 1 1 2 2 3 3 3
P3 X X X X X X 1 1 1 2 2 3

This works because Job 2 waits for one time unit between finishing on P2 and starting on P3.   In the no-wait instance, we’d have to start job 2 much later, so that there is no interference.  But it turns out that we could also have a no-wait schedule putting a different order of jobs on P1 (Job 2, then Job 1, then Job 3) that is just as fast:

P1 2 2 1 1 X X 3 3
P2 X X 2 2 1 1 1 1 3 3 3
P3 X X X X 2 2 X X 1 1 1 3

Since jobs can’t be interrupted, the finish time of a job is always its start time plus the sum of the lengths of all of its subtasks.  What we are trying to do is find a permutation of the jobs that allows each job to start at a time where all of the jobs will be finished by the deadline.

Reduction: I think this is Theorem 6 in the Lenstra, Rinnooy Kan, and Brucker paper.  They reduce from directed Hamiltonian Cycle, but I think the discussion beforehand talking about (directed) TSP is also useful.

The key insight is based on the permutation discussion in the example above.  The TSP problem is also looking for a permutation of the vertices.  We can make the problems equivalent by making the cost of each edge (i,j) the value cij: The length of the time interval between when we can start job i and job j, assuming we start j as soon as possible after i.
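Here’s one way to compute those cij values (my own derivation of the no-wait condition, not code from the paper): job j may not reach any processor before job i has left it.

```python
from itertools import accumulate

def no_wait_delay(p_i, p_j):
    """Minimum gap between the starts of consecutive no-wait jobs i and j.

    p_i, p_j: per-processor task lengths.  Job j reaches processor k
    at offset sum(p_j[:k]) after its own start, and may not get there
    before job i has left, i.e. sum(p_i[:k+1]) after job i's start."""
    leave_i = list(accumulate(p_i))               # when i leaves each processor
    arrive_j = [0] + list(accumulate(p_j))[:-1]   # j's arrival offsets
    return max(l - a for l, a in zip(leave_i, arrive_j))

jobs = {1: [2, 4, 3], 2: [2, 2, 2], 3: [2, 3, 1]}
print(no_wait_delay(jobs[2], jobs[1]))  # 2: job 1 starts 2 units after job 2
print(no_wait_delay(jobs[1], jobs[3]))  # 4: job 3 starts 4 units after job 1
```

The makespan of an order is then the sum of the consecutive delays plus the full length of the last job: 2 + 4 + 6 = 12 for the order (2, 1, 3), matching the schedule above.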

The problem is that this is a reduction in the wrong direction (it reduces from our problem to TSP).  But since TSP is proven NP-Complete by reduction from directed HC, we can use a similar construction as we do in that proof to take an HC instance and build the correct jobs in the same fashion.  The paper actually does show you how to make that construction but it’s pretty complicated.

Difficulty: 8.  Not only is the conversion hard, I think the “oh, wait, this is based on a backward reduction” will confuse beginning students. For more advanced students, showing the details of this reduction may help add to a bag of tricks that can be used in similar “I have a reduction in the wrong direction” situations.

Flow-Shop Scheduling

The first couple of problems in this section, at least, are pretty tractable for students.

The problem: Flow-Shop Scheduling.  This is problem SS15 in the appendix.

The description: Exactly the same as Open-Shop scheduling, with the additional restriction that we have to schedule each job on the processors in strictly increasing order of processor number.

(Note that the description in G&J doesn’t specify that subtasks of a job are assigned to specific processors.  But it’s clear that’s how it works in the paper.)

Example:  Here is the set of jobs from last week:

Job P1 time P2 time P3 time
1 1 2 3
2 3 1 2
3 2 3 1

If we have to schedule things in order, we can only schedule one job at time 0, and it has to go on P1.  The greedy choice would be to schedule Job 1 on P1 first, and then schedule it on P2 at time 1, once it’s finished.  Then we can schedule the beginning of Job 3 on P1, and so on.  The overall schedule looks like:

P1 1 3 3 2 2 2
P2 X 1 1 3 3 3 2
P3 X X X 1 1 1 3 2 2

For total time 9.

But, if we schedule Job 2 instead of Job 3 on P1, we get:

P1 1 2 2 2 3 3
P2 X 1 1 X 2 X 3 3 3
P3 X X X 1 1 1 2 2 X 3

For total time 10.
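Both makespans can be checked with the standard permutation flow-shop recurrence: a job’s task on processor k starts as soon as both the processor is free and the job’s previous task is done.  A quick sketch (my own code):

```python
def flow_shop_makespan(order, times):
    """Makespan of a permutation flow-shop schedule.  Each job's task
    on processor k starts as soon as the processor is free AND the
    job's previous task has finished (waiting between tasks is fine)."""
    m = len(next(iter(times.values())))
    free = [0] * m                  # when each processor next frees up
    for job in order:
        done = 0                    # when this job's previous task ended
        for k in range(m):
            done = max(free[k], done) + times[job][k]
            free[k] = done
    return free[-1]

times = {1: [1, 2, 3], 2: [3, 1, 2], 3: [2, 3, 1]}
print(flow_shop_makespan([1, 3, 2], times))  # 9
print(flow_shop_makespan([1, 2, 3], times))  # 10
```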

Reduction: The paper by Garey, Johnson, and Sethi uses 3-Partition.  So we’re given a set A with 3N elements, and a bound B.  We’ll build a Flow-Shop problem with 3 processors.  The jobs will be a set of  N+1 “C” jobs and 3N “E” jobs.

Job C0 has tasks of length <0, B, 2B>, job CN has tasks of length <2B, B, 0>, and the intervening Ci jobs have tasks of length <2B, B, 2B>.

Notice that scheduling the C jobs leaves “holes” of length B on processor P2: a Ci job takes 2B time on P1, and when it moves to P2, it finishes there in B time, so P2 waits idly for another B time units while the next job finishes on P1.

Job Ej (where j runs from 1 to 3N) will take <0, aj, 0> time, where aj is the corresponding element in A.  The deadline is (2N+1)B.

If a 3-partition exists, then we can split A into sets of 3 elements, each of which sum exactly to B.  Each of these sets of 3 elements can fit exactly into one of the length B “holes” left by the C jobs.

If a feasible schedule exists, notice that scheduling all of the C tasks as efficiently as possible gets us exactly to the deadline.  So the only thing we can do with the E tasks is put them into the holes on processor 2.  Which gives us a partition.
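Here’s the construction spelled out as code (a sketch; the tuple positions are the P1/P2/P3 task lengths):

```python
def three_partition_to_flow_shop(A, B):
    """Build the flow-shop instance from the reduction above.

    A: list of 3N positive integers summing to N*B; B: the bound.
    Returns (jobs, deadline)."""
    N = len(A) // 3
    jobs = [(0, B, 2 * B)]                  # C0
    jobs += [(2 * B, B, 2 * B)] * (N - 1)   # C1 .. C(N-1)
    jobs += [(2 * B, B, 0)]                 # CN
    jobs += [(0, a, 0) for a in A]          # one E job per element
    return jobs, (2 * N + 1) * B            # jobs, deadline
```

Note that the P2 column sums to exactly the deadline: the C jobs contribute (N+1)B and the E jobs contribute the sum of A, which is NB, for (2N+1)B total.  That’s the “no slack on P2” property the proof leans on.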

Difficulty: 5.  This is a pretty easy and straightforward application of 3-partition.  But I think 3-partition is hard to understand, so it’s still not an easy reduction.

Open-Shop Scheduling

On to the next section with a very nice reduction.

The problem: Open-Shop Scheduling.  This is problem SS14 in the appendix.

The description: Given a set of m processors and a set of jobs J.  Each job j has m tasks, t1[j] through tm[j].  Each task ti[j] has a length l(ti[j]) and needs to be run on processor i.  We are not allowed to schedule the same job at the same time on two processors, but the order in which a job visits the processors doesn’t matter.  Given an overall deadline D, can we finish all tasks by the deadline?

Example: The main difference between this problem and the scheduling problem is that each task gets broken into multiple sub-tasks.  I think of it like building a car- there is one “processor” that puts in the engine, one “processor” that does the body welding, one “processor” that puts on the wheels, and so on.  Since it’s all happening to the same car, we can’t be working on the same car in two places at the same time.

Suppose I have 3 jobs and 3 processors

Job P1 time P2 time P3 time
1 1 2 3
2 3 1 2
3 2 3 1

Obviously, the “right” way to do things is to schedule all of the tasks that have the same length on different processors at the same time.   Everything finishes by time 6.

If we get the first allocation of jobs wrong, though, perhaps by putting Job 1 on P1, Job 2 on P3, and Job 3 on P2, then at time 2 we have nothing to run on P1.  Trying to keep things as compact as possible, we can put Job1 on P3 and Job2 on P1 at time 3 (right after Job 2 finishes).  They both finish at time 5, and Job 3 has no place to go after it finishes at time 3.

Finishing the schedule, you get something like:

P1 J1 X J2 J2 J2 X J3 J3
P2 J3 J3 J3 X X J2 J1 J1
P3 J2 J2 J1 J1 J1 J3 X X

…finishing all of the jobs at time 8.

Reduction: The reduction by Gonzalez and Sahni uses Partition.  This threw me for a bit because G&J say that if you have only 2 processors, the problem is polynomial, and 2 processors seemed like the natural way to represent the two halves of the partition.  Suppose we have a partition set S, whose elements sum to T.  Each element in S is represented by 3 jobs, each with 3 tasks, one for each processor.  If the partition element is ai, then the first of its 3 jobs will take time ai on P1, and time 0 on P2 and P3.  The second job will take time ai on P2, and time 0 on P1 and P3.  The third job will take time ai on P3, and time 0 on P1 and P2.   We add one extra job with 3 tasks, each taking time T/2 (the target size of each half of the partition).

Our deadline D will be 3T/2.  D also happens to be the total length of the final job, as well as the sum of the lengths of all of the tasks assigned to each processor.  So the only way to meet the deadline is to have no idle time on any processor.

If a partition exists, then we can split S into two halves of equal sum, and schedule each processor with both halves of S and one of the final job’s 3 tasks, as 3 discrete chunks, giving us a feasible schedule.

If a partition does not exist, then we can prove by contradiction that no feasible schedule exists:  Suppose we did have a schedule that finished by the deadline.  As noted above, there cannot be any idle time in the schedule for any processor, so each processor is running continuously from time 0 to time D.  Since processors can’t be working on the same job at the same time, that means that some processor (WLOG P1) starts the big final job’s first task at time 0, and finishes at time T/2.  Then some other processor (WLOG P2) continues the big final job’s second task from time T/2 until time T.  Since P2 needs to finish all of its jobs by time 3T/2 and has T total work to do in other tasks, this must mean that the elements of S were split into a chunk of size T/2 done before the big job shows up, and a chunk of size T/2 done after the big job finishes.  This means that we did have a partition of S, a contradiction.
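The construction itself is tiny.  Here’s a sketch (tuples are P1/P2/P3 task lengths), including a check of the key invariant that every processor has exactly D worth of work:

```python
def partition_to_open_shop(S):
    """Build the open-shop instance from the reduction above.

    S: positive integers with even sum T.  Each element spawns three
    jobs, one per processor."""
    T = sum(S)
    jobs = []
    for a in S:
        jobs.extend([(a, 0, 0), (0, a, 0), (0, 0, a)])
    jobs.append((T // 2, T // 2, T // 2))   # the big final job
    return jobs, 3 * T // 2                 # jobs, deadline D

jobs, D = partition_to_open_shop([1, 2, 3, 4])
# every processor has exactly D's worth of work, so no idle time is allowed
print([sum(j[k] for j in jobs) for k in range(3)], D)  # [15, 15, 15] 15
```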

Difficulty: 5.  I really like this reduction, and think it’s very elegant. It may not be obvious to students starting from scratch why you need 3 processors, but it’s pretty obvious why this reduction wouldn’t work with just 2 processors.  Maybe if you give the students a hint that the 2-processor case is polynomial, then they’ll be able to get it.

Scheduling to Minimize Weighted Completion Time

This reduction took me a while to follow because there is a lot of subtle algebra you have to do for yourself.

The problem: Scheduling to minimize weighted completion time.  This is problem SS13 in the appendix.

The description: Given a set T of tasks where each task t has a length l(t) and weight w(t), a set of m processors, and a bound K.  Can we find an m-processor schedule for all tasks where each task’s “completion penalty” is its weight multiplied by its completion time, and the total completion penalty for all tasks is K or less?

Example: Suppose I had 2 processors and 4 tasks:

Task Length Weight
1 5 4
2 1 6
3 3 3
4 3 3

Scheduling greedily by weight would put task 1 on one processor and task 2 on the other, then tasks 3 and 4 on the second processor after task 2 finishes.  This has a total penalty of 20 for task 1 + 6 for task 2 + 12 for task 3 + 21 for task 4 = 59.

Scheduling task 2 then 1 on processor 1 and tasks 3 and 4 on processor 2 gives a total penalty of 57.
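Those totals are easy to verify with a few lines (my own helper, not from the paper):

```python
def total_penalty(schedules, tasks):
    """Total weighted completion time.  schedules: one run-order list of
    task names per processor; tasks: name -> (length, weight)."""
    penalty = 0
    for proc in schedules:
        t = 0
        for name in proc:
            length, weight = tasks[name]
            t += length             # this task's completion time
            penalty += weight * t
    return penalty

tasks = {1: (5, 4), 2: (1, 6), 3: (3, 3), 4: (3, 3)}
print(total_penalty([[1], [2, 3, 4]], tasks))  # 59
print(total_penalty([[2, 1], [3, 4]], tasks))  # 57
```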

Reduction: The paper by Lenstra, Rinnooy Kan, and Brucker was hard for me to follow because they discuss lots of different scheduling variants in the same paper, and do it by parameterizing the kinds of inputs. I’m pretty sure our problem is Theorem 3b in the paper, which reduces from Partition.  So we start with a set S of elements and build 1 task for each element in S.  The processing time and weight of each task are the size of its element.  K will be the sum of the products of all pairs of elements (counting each element paired with itself), minus 1/4 of the square of the sum of all of the elements.  We’ll have 2 processors.

They claim that since the processing time and weight are equal for each task, if you are given a set of these tasks on one processor, the order doesn’t matter, and we get the same total penalty no matter what.  It took me a while to buy that, but it comes down to the fact that each task is costing one penalty unit per time step no matter where it is scheduled.

Given a subset T of S, define c to be the amount the sum of the elements in T is “off” from an exact partition (the difference between the sum of the elements in T and half of the sum of everything in S).  When c=0, T defines a partition of S.

The completion penalty of a processor is the completion penalty of all of its tasks (remember, the order doesn’t matter).  Summed over both processors, this works out to the sum of the products of all pairs of elements (again counting an element paired with itself) minus the sum of the weights on one processor times the sum of the weights on the other.  Which is K+c².  Which means we have a solution to the scheduling problem if and only if c is 0, which is exactly when we have a partition.
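You can verify the K + c² claim by brute force over a small instance.  Here `pairs` counts each element multiplied by itself as well as by the others, which is the reading of “all pairs” that makes the numbers come out (an assumption on my part):

```python
from itertools import combinations

def proc_penalty(elements):
    """Penalty of one processor when each task's length equals its
    weight (any run order gives the same total)."""
    t = penalty = 0
    for a in elements:
        t += a                      # completion time of this task
        penalty += a * t            # weight * completion time
    return penalty

S = [3, 1, 1, 2, 3, 2]              # a small Partition instance (sum 12)
total = sum(S)
pairs = sum(a * b for a, b in combinations(S, 2)) + sum(a * a for a in S)
K = pairs - total * total // 4      # the reduction's bound

for r in range(len(S) + 1):         # every way to split S across 2 processors
    for idx in combinations(range(len(S)), r):
        side1 = [S[i] for i in idx]
        side2 = [S[i] for i in range(len(S)) if i not in idx]
        c = sum(side1) - total // 2
        assert proc_penalty(side1) + proc_penalty(side2) == K + c * c
```

Since c² is 0 exactly when the split is a partition, the bound K is achievable if and only if a partition exists.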

Difficulty: 6.  I’m not sure how much I’d trust students to be able to come up with the algebra themselves.  Maybe if you tell them to set up the problem with lengths and weights equal, and then make them prove that in that case the order doesn’t matter, it becomes more tractable for them.

Preemptive Scheduling

This is one of those reductions that is easy because you remove everything that makes the problem hard.  It always feels like cheating to do this, but I guess it shows that adding extra complexity to a hard problem will never make it easier.

The problem: Preemptive Scheduling.  This is problem SS12 in the appendix.

The description: Given a set T of tasks arranged in a partial order,   each task t has a positive length l(t).  We have a set of m processors and an overall deadline D.  Can we create a preemptive schedule for T that obeys the precedence constraints and meets the overall deadline?  A preemptive schedule allows tasks to be partially run on a processor and then removed, and we can finish it at some later time.

Example: Here is a simple partial order:

The number of the task is its length.  So task a takes 3 time units, for example.

A two-processor preemptive schedule could be something like: P1: (1,1,1,2,2,4) and P2: (4,4,4,3,3,3), getting all tasks done at time 6.  (Task 4 is preempted here: it runs for three time units on P2, then finishes its last unit on P1.)

If we did not allow preemption, we would not be able to fit task 4 along with task 1, so we couldn’t do anything on P2 while P1 was running task 1.

Reduction: The paper by Ullman which we used for Precedence Constrained Scheduling has the reduction.  He basically notes that Precedence Constrained Scheduling is the special instance of Preemptive Scheduling where all job lengths are 1.  In that case, preemption can’t help.  So the reduction is basically “use the exact same problem”.

Difficulty: 2.  I usually reserve Difficulty 1 for “this is the same problem”, but here, there is a little bit of insight to convince yourself that you can interpret the problem this way.

Scheduling with Individual Deadlines

I feel like this problem is an easy extension of a problem we’ve already done.  This worries me, because why would it show up as a separate problem with a separate reference in G&J if that is the case?

The problem: Scheduling with individual deadlines.  This is problem SS11 in the appendix.

Description: Given a set T of tasks, each of length 1, arranged in a partial order, a set of m processors, and a deadline d(t) for each task t, can we schedule all tasks to meet all deadlines?

Example: Here’s a precedence graph:

Suppose the deadlines are:

Task Deadline
1 1
2 3
3 2
4 1
5 2
6 3

If we have 2 processors, then we can schedule all tasks by doing each task exactly as its deadline is coming up.  If we add one more requirement in the partial order from 3 to 5, then these tasks cannot all be finished by the deadline.

Reduction (easy version): When I read this problem, it seemed very similar to Precedence Constrained Scheduling.  The difference is that in that problem, we’re given a single deadline D that all tasks must be completed by, but in this problem, each task can have its own deadline.

But this still leads to a pretty easy reduction from PCS: We take our PCS instance, use the exact same tasks, use the same number of processors, and make each task’s individual deadline be D.  So our Individual Deadline instance is feasible exactly when all tasks are finished by time D.
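As code, the easy version really is a one-liner (a sketch with my own argument names):

```python
def pcs_to_individual_deadlines(tasks, precedes, m, D):
    """Precedence Constrained Scheduling -> individual deadlines:
    same tasks, same partial order, same processor count, and every
    task just gets the shared deadline D as its personal deadline."""
    return tasks, precedes, m, {t: D for t in tasks}
```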

This feels too easy.  Maybe I’m misreading one of the problems.  So, to be safe..

Reduction (Brucker, Garey, and Johnson version):

The reduction in the paper goes from Vertex Cover.  So we’re given an undirected graph G=(V,E), and a number K.  Define v = |V|, e = |E|, and a = v + e(e+1).  Define m = a + e + v.  This will be the number of processors.  Define D = 3e+3.  This is the largest deadline for any task.

There will be m*(D-1)+1 tasks. Task T0 has deadline 1.  There will be many other “dummy” tasks Ti,j that are set up so that they have to start at time j and end at time j+1.  This is enforced by having each Ti,j come before Ti,j+1 in the partial order.  The number of dummy tasks at each time j changes:

  • Ti,1, where i goes from 1 to m-K
  • Ti,2, where i goes from 1 to m-e
  • Ti,2+j for each j from 1 to e, where i goes from 1 to v+j
  • Ti,e+2+j for each j from 1 to e, where i goes from 1 to a+v+j-1
  • Ti,2e+2+j for each j from 1 to e, where i goes from 1 to e+v

T0 is the root of all of these and comes before each Ti,1.

Then we add the “graph-dependent” tasks.  Each vertex vi has a task that has to come after T0 and has deadline e+3.  Each edge ej spawns two chains of j tasks (E’j(1) through E’j(j) and E”j(1) through E”j(j))  each, all of which have deadline 2e+3.  We’ll also have two other groups of tasks for each edge: L’j(1) through L’j(a) and L”j(1) through L”j(a). The last E’ task for each j comes before all of the corresponding L’ tasks, and the last E” task for each j comes before all of the corresponding L” tasks.  Each of these L tasks has a deadline D-j+1.

Edges in G also create precedence constraints.  The edge ej= (vq, vr), with q < r, creates the ordering vq->E’j(1) and vr->E”j(1).

If G has a Vertex Cover, then that means that each edge in E has at least one endpoint in the cover.  The jobs for these vertices will be scheduled at time 1.  Since the vertex jobs have to be run before the corresponding edge jobs, we can run the E’ or E” job that was set by the last precedence constraints above at time 2.  Once that chain finishes, we can run the L jobs.

If we have a valid schedule that meets all deadlines, we must have scheduled T0 first, and each Ti,j at time j. This means we only have so many slots to schedule the E chains.  Since vertex tasks have to happen before these chains, we need to schedule K vertex tasks at time 1.  If the schedule is feasible, that must be a cover.  (Because if it wasn’t, then there is an edge that does not have its E’ or E” chain start at time 2, which is too slow.)

Difficulty: The easy way is a 2.  The way from the paper is an 8, because it’s hard to see where the specific numbers of dummy tasks come from, why you need 2 chains (E’ and E”: you need them because you don’t know which endpoint of the edge will be in the cover), why you need the L tasks, and so on.

Resource Constrained Scheduling

I spent a bit of time trying to see if there was an easy reduction between this problem and one of the other scheduling ones, but I couldn’t find anything.  So we’ll do it the hard way.

The problem: Resource Constrained Scheduling.  This is problem SS10 in the appendix.

The description: Given a set T of tasks, each with a length of 1, a set of m processors, and a set of r resources.  Each resource i has a “bound” bi of how many units of the resource are available at any time, and each task t has a requirement ri(t) denoting how many units of resource i it needs.  We’re also given a deadline D.

Can we find an m-processor schedule for all tasks that:

1) Meets the deadline, and

2) For all time units, ensures that all of the processors running in that time unit do not exceed the resource bounds for any resource?

Example: Suppose I have 4 tasks and 1 resource:

  • Task 1 uses 2 units of the resource
  • Task 2 uses 3 units of the resource
  • Task 3 uses 4 units of the resource
  • Task 4 uses 5 units of the resource

If there are 5 units of the resource available, we can get these tasks done in 3 time steps (Task 4, then 1 and 2, then 3).

If there are 7 units available, we can get them done in 2 time steps (1 and 4, then 2 and 3).

If there are 14 units available, we can do everything in one time step.
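Since all of the tasks have length 1, finding the minimum number of time steps for a single resource is a small search.  A brute-force sketch (mine; assume at least as many processors as tasks so only the resource binds), which reproduces the three answers above:

```python
from itertools import product

def min_time_steps(needs, bound, m):
    """Fewest unit time steps to run all length-1 tasks when each step
    can use at most m processors and `bound` resource units.  Brute
    force over every assignment of tasks to steps (tiny inputs only)."""
    n = len(needs)
    for k in range(1, n + 1):
        for assign in product(range(k), repeat=n):
            groups = [[needs[i] for i in range(n) if assign[i] == step]
                      for step in range(k)]
            if all(len(g) <= m and sum(g) <= bound for g in groups):
                return k
    return None

needs = [2, 3, 4, 5]                 # resource demands of the four tasks
print(min_time_steps(needs, 5, 4))   # 3
print(min_time_steps(needs, 7, 4))   # 2
print(min_time_steps(needs, 14, 4))  # 1
```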

Reduction: The paper by Garey and Johnson proves several variants of the problem.  The one I’ll do is in Lemma 3.4, which has 5 processors and 8 resources.  They go on from there to show the variant with 3 processors and 1 resource is also NP-Complete, but we only care about the general problem.

The reduction is from 3DM.  In this case, we’re assuming that W, X, and Y (the 3 sets in the 3DM problem) are all the same set T, and S is our set of triples from T.  Define a value mk(i) to be the number of triples in S that have i in their kth position.  So m2(4) is the number of triples with a 4 in position 2.  Notice that for a legal matching to be possible, mk(i) always has to be at least 1.

So we build an instance of the resource constrained scheduling problem with 5 processors and 8 types of resources.   The tasks are:

  • One “S” task for each triple in S.
  • X0, Y0, and Z0 tasks for each of the q elements in T (q is the number of elements in T).
  • Xij tasks where i runs from 1 to q and j runs from 1 to m1(i).
  • Similar Yij (using m2(i)) and Zij (using m3(i)) tasks.
  • “F” tasks from 1 to q.
  • |S|-q “G” tasks.

The deadline is |S|.

The resource requirements are set up so that during each time unit, we have 5 processors running:

  1. An S task
  2. An X or X0
  3. A Y or Y0
  4. A Z or Z0
  5. An F or G

The X0/Y0/Z0 tasks all correspond to an element in T, and can only be run along with an F and the S task that is the triple of the X, Y, and Z elements in S.  The F task makes sure that we have chosen a legal matching.  The matching consists of the S triples chosen along with an F task.  (The G tasks are there to fill the rest of the timesteps.)

Difficulty: 7.  More if you keep going and refine the problem down to 3 processors and 1 resource.

Precedence Constrained Scheduling

SS8 is “Multiprocessor Scheduling”, which is done in Section 3.2.1

Here’s SS9, which uses the non-book problem from last week.

The problem: Precedence Constrained Scheduling.  This is problem SS9 in the appendix.

The description: Given a set T of tasks arranged in a partial order, each of length 1, a set of m processors, and a positive deadline D.  Can we schedule the tasks in T in a way that obeys the precedence constraints and finishes by deadline D?


Example: Suppose we had the following simple precedence graph:

With 2 processors, the earliest feasible schedule will finish at time 3 (timestep 1 schedules 2 of a,c, and d, timestep 2 schedules the remaining one, and then we can finally schedule b after that).  With 3 processors, we can finish at time 2 (schedule all of a,c, and d in timestep 1, then b right afterward).

Reduction: This is problem P2 in the paper by Ullman and the reduction uses the “Single Execution Time with Variable Number of Processors” problem from last week.  The main difference between that problem and this one is that instead of having a varying number of slots at each time, now we have K processors for all time steps.

So we start with an instance of the variable processor problem and add a bunch of new jobs Iij where i runs through each time step, and j runs from 0 to n – ci. (n is the number of jobs, ci is the number of processors available at time i).  The precedence graph keeps all of the old precedences and also has Iij come before Ii+1,k  for all j and k.  (So all I jobs at time i come before all I jobs at time i+1).  We have n+1 processors and the same deadline.  The idea is that the extra I jobs all have to run at their specific timestep, and they “fill” the empty processor slots between the ci processors available in the original and the n+1 processor slots available in this version of the problem.
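Here’s the padding step as code (a sketch with my own names; `slots[i]` is ci):

```python
def variable_to_fixed(n, slots, precedes):
    """Ullman's padding trick as described above.

    n: number of original unit jobs; slots[i]: processors available at
    time i in the variable instance; precedes: (before, after) pairs.
    Returns (jobs, precedences, processor count, deadline)."""
    D = len(slots)
    i_jobs = [("I", i, j) for i in range(D) for j in range(n + 1 - slots[i])]
    prec = list(precedes)
    for i in range(D - 1):                    # all I jobs at time i come
        for j in range(n + 1 - slots[i]):     # before all I jobs at i+1
            for k in range(n + 1 - slots[i + 1]):
                prec.append((("I", i, j), ("I", i + 1, k)))
    return list(range(n)) + i_jobs, prec, n + 1, D

jobs, prec, m, D = variable_to_fixed(4, [1, 2, 1], [])
print(m, len(jobs))  # 5 15: the I jobs top every time step up to n+1 slots
```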

Difficulty: 4, mainly because the variable # of processors problem we’re reducing from is hard to understand.  But this is a classic “go from a variable number of things to a larger fixed number of things” technique that can be used for lots of problems.


Single Execution Time Scheduling With Variable Number of Processors

Sorry for skipping last week.  I got about halfway through the post for the next problem, but forgot that the reduction went through this other problem instead, and then ran out of time to fix it.

The problem: Single Execution Time Scheduling.  This problem is not in the appendix and is problem “P4” in the paper by Ullman that has this reduction and the problem next week.

The description: Given a set S of N jobs, arranged in a partial order, each taking 1 time unit, a deadline D, and a sequence of D “processor slots” arranged in a sequence from 0 to D-1, where the sum of all of the processor slots is exactly N, can we create a feasible schedule for all tasks that respects the partial order?

Example: Here’s a simple example of a partial order:

If D=3 and the processor slot sequence was {1,2,1}, then this can be solved: schedule a at time 0, b and c at time 1, and d at time 2.   (So d is done by time 3).

If the processor slot sequence was {1,1,2}, then at time 1, we can only schedule 1 of b and c, so we won’t be able to schedule d at time 2.
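A brute-force feasibility checker makes it easy to play with slot sequences.  I’m assuming the pictured partial order is the diamond a before b and c, both before d, which matches the two outcomes described above:

```python
from itertools import combinations

def feasible(preds, slots):
    """Can all unit-length jobs be scheduled in len(slots) time steps,
    at most slots[t] jobs at step t, respecting the partial order?
    Brute force: try every subset of available jobs at each step."""
    def go(done, t):
        if len(done) == len(preds):
            return True
        if t == len(slots):
            return False
        avail = [j for j in preds if j not in done and preds[j] <= done]
        for r in range(min(slots[t], len(avail)), -1, -1):
            for chosen in combinations(avail, r):
                if go(done | set(chosen), t + 1):
                    return True
        return False
    return go(frozenset(), 0)

# an assumption about the missing figure: the diamond a -> {b, c} -> d
preds = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(feasible(preds, [1, 2, 1]))  # True
print(feasible(preds, [1, 1, 2]))  # False
```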

Reduction: The Ullman paper goes from 3SAT.  The original formula will have m variables and n clauses.  We will create jobs:

  • xij and ~xij where i goes from 1 to m and j goes from 0 to m.
  • yi and ~yi where i goes from 1 to m
  • Dij where i goes from 1 to n and j goes from 1 to 7.

The partial order on these jobs is:

  • Each xij comes before xi,j+1 and each ~xij comes before ~xi,j+1
  • Each xi,i-1 comes before yi and each ~xi,i-1 comes before ~yi
  • For each Dij, represent the index j as a binary number (from 1 to 7).  The three literals in clause i also have 7 combinations of truth settings that make the clause true, which can also be represented as a binary number.  For the combination corresponding to j, look at the positive or negative settings of the literals in that combination, take the last of the x (or ~x) jobs of the variables corresponding to those literals, and make it come before the Dij job.  So if we are considering variable xk set positively, we’d make xk,m come before the Dij job we’re looking at.

That last thing is just a way of saying “there are 7 ways of making the clause true, you need to execute the job of the literals that makes the clause true before you do the clause job.”

The deadline is m+3.  The processor slots have:

  • m slots at time 0
  • 2m+1 slots at time 1
  • 2m+2 slots at times 2 through m
  • n+m+1 slots at time m+1
  • 6n slots at time m+2

The idea is that at time 0 we need to run one of either xi0 or ~xi0 for each i.  (The other is run at time 1.)  These will correspond to whether variable i is set to true or false.  We need to do that because we need to run the y jobs as soon as they become available (y1 or ~y1, whichever matches the version of x1,0 we ran at time 0, needs to be run at time 1, and so on down).  At time 1, we run either xi1 or ~xi1, depending on what we did at time 0.  So at time m+1, we have one y job left over (the last of the y’s in the chains we started late), m x jobs left over (the xim or ~xim from the chains we started at time 1), and hopefully have enough x jobs finished to be able to run n D jobs (one for each clause).  This is the way you’ll satisfy each clause.  Then at time m+2, everything is done except for the other 6n D jobs.
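A sanity check I found helpful (my own arithmetic): the slot counts exactly equal the number of jobs, so no slot can ever sit idle, and the slots run from time 0 through m+2 (hence a deadline of m+3):

```python
def job_and_slot_counts(m, n):
    """My tally of Ullman's construction for m variables and n clauses:
    total jobs vs. total processor slots over times 0 .. m+2."""
    jobs = 2 * m * (m + 1) + 2 * m + 7 * n   # x/~x chains, y/~y, D jobs
    slots = (m                               # time 0
             + (2 * m + 1)                   # time 1
             + (2 * m + 2) * (m - 1)         # times 2 through m
             + (n + m + 1)                   # time m+1
             + 6 * n)                        # time m+2
    return jobs, slots

for m in range(1, 10):
    for n in range(1, 10):
        jobs, slots = job_and_slot_counts(m, n)
        assert jobs == slots                 # every slot must hold a job
```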

Difficulty: 7.  I think Ullman does a very good job of explaining his method, which actually obscures a bit how complex this reduction is, and all of the moving parts and algebra going on.