
Open-Shop Scheduling

On to the next section with a very nice reduction.

The problem: Open-Shop Scheduling.  This is problem SS14 in the appendix.

The description: Given a set of m processors and a set J of jobs.  Each job j has m tasks, t1[j] through tm[j].  Each task ti[j] has a length l(ti[j]) and needs to be run on processor i.   We are not allowed to schedule the same job at the same time on two processors, but it doesn't matter in which order a job's tasks visit the processors.  Given an overall deadline D, can we finish all tasks by the deadline?

Example: The main difference between this problem and the scheduling problem is that each task gets broken into multiple sub-tasks.  I think of it like building a car- there is one “processor” that puts in the engine, one “processor” that does the body welding, one “processor” that puts on the wheels, and so on.  Since it’s all happening to the same car, we can’t be working on the same car in two places at the same time.

Suppose I have 3 jobs and 3 processors

Job   P1 time   P2 time   P3 time
 1       1         2         3
 2       3         1         2
 3       2         3         1

Obviously, the “right” way to do things is to schedule all of the tasks that have the same length on different processors at the same time.   Everything finishes by time 6.

If we get the first allocation of jobs wrong, though, perhaps by putting Job 1 on P1, Job 2 on P3, and Job 3 on P2, then at time 2 we have nothing to run on P1.  Trying to keep things as compact as possible, we can put Job 1 on P3 and Job 2 on P1 at time 3 (right after Job 2 finishes).  They both finish at time 5, and Job 3 has no place to go after it finishes at time 3.

Finishing the schedule, you get something like:

Time   1   2   3   4   5   6   7   8
P1     J1  X   J2  J2  J2  X   J3  J3
P2     J3  J3  J3  X   X   J2  J1  J1
P3     J2  J2  J1  J1  J1  J3  X   X

…finishing all of the jobs at time 8.
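
To make the feasibility conditions concrete, here's a small Python sketch (my own illustration, using a made-up slot-by-slot representation, not anything from G&J or the paper) that checks the three rules at work in the tables above: a job can't be on two processors in the same time slot, each job must get its required time on each processor, and everything must finish by the deadline.

```python
# schedule[p] lists what processor p does in each unit time slot (a job number,
# or None for idle); need[j][p] is how long job j must run on processor p.

def feasible(schedule, need, deadline):
    slots = max(len(row) for row in schedule)
    # Everything must be done by the deadline.
    finish = max((t + 1 for row in schedule
                  for t, job in enumerate(row) if job is not None), default=0)
    if finish > deadline:
        return False
    # A job may never be on two processors in the same time slot.
    for t in range(slots):
        busy = [row[t] for row in schedule if t < len(row) and row[t] is not None]
        if len(busy) != len(set(busy)):
            return False
    # Each job must get exactly its required time on each processor.
    return all(row.count(j) == need[j][p]
               for j in need for p, row in enumerate(schedule))

need = {1: [1, 2, 3], 2: [3, 1, 2], 3: [2, 3, 1]}   # the 3-job example above
sloppy = [[1, None, 2, 2, 2, None, 3, 3],
          [3, 3, 3, None, None, 2, 1, 1],
          [2, 2, 1, 1, 1, 3, None, None]]
print(feasible(sloppy, need, 6))   # False -- this schedule needs 8 time units
print(feasible(sloppy, need, 8))   # True  -- but it is a legal open-shop schedule
```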

Reduction: The reduction by Gonzalez and Sahni uses Partition.  This threw me for a bit because G&J say that if you have only 2 processors, the problem is polynomial, and 2 processors seemed like the natural way to represent the two halves of the partition.  Suppose we have a partition set S, whose elements sum to T.  Each element in S is represented by 3 jobs, each with 3 tasks, one for each processor.  If the partition element is ai, then the first of the 3 jobs for that element takes time ai on P1, and time 0 on P2 and P3.  The second job takes time ai on P2, and time 0 on P1 and P3.  The third job takes time ai on P3, and time 0 on P1 and P2.   We add one extra job with 3 tasks, each taking time T/2 (the target size of each half of the partition).

Our deadline D will be 3T/2.  D also happens to be the total length of the final job (its three tasks of length T/2 sum to 3T/2), as well as the total length of the tasks assigned to each processor (T from the element jobs plus T/2 from the final job).  So the only way to meet the deadline is to have no idle time on any processor.
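
Here's what that construction looks like as a quick Python sketch (my own transcription of the description above; the tuple-per-job representation is just for illustration):

```python
# Build the Open-Shop Scheduling instance from a Partition instance, as described
# above: 3 one-processor jobs per element, plus one "big" job of length T/2 everywhere.
def partition_to_open_shop(elements):
    total = sum(elements)                 # T, the sum of the Partition elements
    assert total % 2 == 0, "odd sums are trivially 'no' instances of Partition"
    jobs = []
    for a in elements:
        jobs.append((a, 0, 0))            # a job that only needs P1
        jobs.append((0, a, 0))            # a job that only needs P2
        jobs.append((0, 0, a))            # a job that only needs P3
    jobs.append((total // 2,) * 3)        # the extra job: T/2 on every processor
    deadline = 3 * total // 2             # D = 3T/2, so no processor can ever idle
    return jobs, deadline

jobs, D = partition_to_open_shop([1, 2, 3, 4])   # elements sum to 10
print(D)         # 15
print(jobs[-1])  # (5, 5, 5)
```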

If a partition exists, then we can split S into two halves S1 and S2, each summing to T/2, and cut each processor's timeline into 3 chunks of length T/2.  Each processor runs one of the final job's three tasks in a different chunk (so the final job is never on two processors at once) and fills its other two chunks with its tasks for S1 and S2, giving us a feasible schedule.

If a partition does not exist, then we can prove by contradiction that no feasible schedule exists:  Suppose we did have a schedule that finished by the deadline.  As noted above, there cannot be any idle time in the schedule for any processor, so each processor is running continuously from time 0 to time D.  Since processors can’t be working on the same job at the same time, that means that some processor (WOLOG P1) starts the big final job’s first task at time 0, and finishes at time T/2.  Then some other processor (WOLOG P2) continues the big final job’s second task from time T/2 till time T.  Since P2 needs to finish all of its jobs at time 3T/2 and has T total work done in other tasks, this must mean that the elements of S were split into a chunk of size T/2 done before the big job shows up, and a chunk of size T/2 done after the big job finishes.  This means that we did have a partition of S, a contradiction.

Difficulty: 5.  I really like this reduction, and think it’s very elegant. It may not be obvious to students starting from scratch why you need 3 processors, but it’s pretty obvious why this reduction wouldn’t work with just 2 processors.  Maybe if you give the students a hint that the 2-processor case is polynomial, then they’ll be able to get it.

Scheduling to Minimize Weighted Completion Time

This reduction took me a while to follow because there is a lot of subtle algebra you have to do for yourself.

The problem: Scheduling to minimize weighted completion time.  This is problem SS13 in the appendix.

The description: Given a set T of tasks, where each task t has a length l(t) and a weight w(t), a set of m processors, and a bound K.  Can we find an m-processor schedule for all tasks where each task's "completion penalty" is its weight multiplied by its completion time, and the total completion penalty for all tasks is K or less?

Example: Suppose I had 2 processors and 4 tasks:

Task   Length   Weight
 1        5        4
 2        1        6
 3        3        3
 4        3        3

Scheduling greedily by weight (highest weight first, always onto the processor that frees up soonest) puts task 2 on one processor and task 1 on the other, then runs tasks 3 and 4 after task 2.  This has a total penalty of 6 for task 2 + 20 for task 1 + 12 for task 3 + 21 for task 4 = 59.

Scheduling task 2 then 1 on processor 1 and tasks 3 and 4 on processor 2 gives a total penalty of 57.
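
Totals like these are easy to get wrong by hand, so here's a tiny helper (mine, not from the paper) that computes the total weighted completion time for a given assignment and ordering:

```python
# tasks maps a task name to (length, weight); assignment lists, per processor,
# the tasks it runs in order.  Returns the total weighted completion time.
def total_weighted_completion(tasks, assignment):
    penalty = 0
    for processor in assignment:
        finish = 0
        for name in processor:
            length, weight = tasks[name]
            finish += length            # completion time of this task
            penalty += weight * finish  # its completion penalty
    return penalty

tasks = {1: (5, 4), 2: (1, 6), 3: (3, 3), 4: (3, 3)}
print(total_weighted_completion(tasks, [[1], [2, 3, 4]]))  # 59 (greedy by weight)
print(total_weighted_completion(tasks, [[2, 1], [3, 4]]))  # 57 (the better schedule)
```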

Reduction: The paper by Lenstra, Rinnooy Kan, and Brucker was hard for me to follow because they discuss lots of different scheduling variants in the same paper, parameterized by the kinds of inputs. I'm pretty sure our problem is Theorem 3b in the paper, which reduces from Partition.  So we start with a set S of elements and build 1 task for each element in S.  The processing time and the weight of the task are both the size of the element.  K will be the sum of the products of all pairs of elements (counting each element paired with itself) minus 1/4 of the square of the sum of all of the elements.  We'll have 2 processors.

They claim that since the processing time and weight are the same for each task, if you are given a set of these tasks on one processor, the order doesn't matter, and we get the same total penalty no matter what.  It took me a while to buy that, but it comes down to the fact that each task is costing one penalty unit per time step no matter where it is scheduled.
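
Here's the one line of algebra that convinced me (my own filling-in, not the paper's wording): if every task has wj = pj = aj and a single processor runs them in the order a1, a2, …, an, then

```latex
\sum_{j} w_j C_j \;=\; \sum_{j} a_j \Bigl( \sum_{i \le j} a_i \Bigr) \;=\; \sum_{i \le j} a_i a_j ,
```

and that last sum is symmetric in the tasks, so reordering them on the processor can't change it.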

Given the subset T' of S placed on one of the processors, define c to be the amount that subset is "off" from a perfect split: the difference between the sum of the elements in T' and half of the sum of everything in S.  When c = 0, T' and S - T' form a partition of S.

The total completion penalty over both processors is the completion penalty of all of the tasks as if they sat on a single processor (remember, the order doesn't matter) minus (the sum of the weights of the tasks on one processor) times (the sum of the weights of the tasks on the other).  That product is (sum/2 + c)(sum/2 - c) = (sum)²/4 - c², so the whole penalty works out to K + c².  Which means we have a solution to the scheduling problem (total penalty at most K) if and only if c is 0, which is exactly when we have a partition.
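
Here's a quick brute-force sanity check of that identity (my own check on a made-up element set, not from the paper): every way of splitting the elements between the two processors should give a total penalty of exactly K + c².

```python
from itertools import combinations

def one_processor_penalty(elems):
    # Total weighted completion time on one processor when w = p = elems
    # (the order doesn't matter, so any order will do).
    finish = penalty = 0
    for a in elems:
        finish += a
        penalty += a * finish
    return penalty

S = [3, 1, 4, 2, 6]                      # made-up Partition elements
total = sum(S)
# K = (sum of products of all pairs, counting each element with itself) - (sum)^2 / 4
K = sum(S[i] * S[j] for i in range(len(S)) for j in range(i, len(S))) - total * total / 4

for r in range(len(S) + 1):
    for picked in combinations(range(len(S)), r):
        left = [S[i] for i in picked]
        right = [S[i] for i in range(len(S)) if i not in picked]
        c = sum(left) - total / 2        # how far this split is from a perfect partition
        got = one_processor_penalty(left) + one_processor_penalty(right)
        assert abs(got - (K + c * c)) < 1e-9
print("every split came out to K + c^2")
```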

Difficulty: 6.  I'm not sure how much I'd trust students to be able to come up with the algebra themselves.  Maybe if you tell them to set up the problem with lengths and weights equal, and then make them prove that in that case the order doesn't matter, then it becomes more tractable for them.

Preemptive Scheduling

This is one of those reductions that is easy because you remove everything that makes the problem hard.  It always feels like cheating to do this, but I guess it shows that adding extra complexity to a hard problem will never make it easier.

The problem: Preemptive Scheduling.  This is problem SS12 in the appendix.

The description: Given a set T of tasks arranged in a partial order, where each task t has a positive length l(t), a set of m processors, and an overall deadline D.  Can we create a preemptive schedule for T that obeys the precedence constraints and meets the overall deadline?  A preemptive schedule allows a task to be partially run on a processor and then removed, so that we can finish it at some later time.

Example: Here is a simple partial order:

The number of the task is its length.  So task a takes 3 time units, for example.

A two-processor preemptive schedule could be something like: P1: (1,1,1,2,2,4)  and P2: (4,4,4,3,3,3), getting everything done by time 6.

If we did not allow preemption, we would not be able to fit task 4 along with task 1, so we couldn’t do anything on P2 while P1 was running task 1.

Reduction: The paper by Ullman which we used for Precedence Constrained Scheduling has the reduction.   He basically notes that Precedence Constrained Scheduling is a special instance of Preemptive Scheduling where all lengths of all jobs are 1.  In that case, no preemption is possible.  So the reduction is basically “use the exact same problem”.

Difficulty: 2.  I usually reserve Difficulty 1 for “this is the same problem”, but here, there is a little bit of insight to convince yourself that you can interpret the problem this way.

Scheduling with Individual Deadlines

I feel like this problem is an easy extension of a problem we’ve already done.  This worries me, because why would it show up as a separate problem with a separate reference in G&J if that is the case?

The problem: Scheduling with individual deadlines.  This is problem SS11 in the appendix.

Description: Given a set T of tasks, each of length 1, arranged in a partial order, a set of m processors, and a deadline d(t) for each task t, can we schedule all tasks to meet all deadlines?

Example: Here’s a precedence graph:

Suppose the deadlines are:

Task   Deadline
 1         1
 2         3
 3         2
 4         1
 5         2
 6         3

If we have 2 processors, then we can schedule all tasks by doing each task exactly as its deadline is coming up.  If we add one more requirement in the partial order from 3 to 5, then these tasks cannot all be finished by the deadline.

Reduction (easy version): When I read this problem, it seemed very similar to Precedence Constrained Scheduling.  The difference is that in that problem, we’re given a single deadline D that all tasks must be completed by, but in this problem, each task can have its own deadline.

But this still leads to a pretty easy reduction from PCS: We take our PCS instance, use the exact same tasks, use the same number of processors, and make each task’s individual deadline be D.  So our Individual Deadline instance is feasible exactly when all tasks are finished by time D.
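
In code, the easy reduction is just copying the instance and fanning the single deadline out to every task (a minimal sketch, assuming some simple made-up representation of a PCS instance):

```python
# A Precedence Constrained Scheduling instance: unit-length tasks, a partial order
# given as (before, after) pairs, m processors, and one global deadline D.
def pcs_to_individual_deadlines(tasks, precedence, m, D):
    deadlines = {t: D for t in tasks}   # every task's individual deadline is just D
    return tasks, precedence, m, deadlines

tasks = ["a", "b", "c"]
precedence = [("a", "b"), ("a", "c")]   # a must finish before b and c start
print(pcs_to_individual_deadlines(tasks, precedence, 2, 2)[3])
# {'a': 2, 'b': 2, 'c': 2}
```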

This feels too easy.  Maybe I’m misreading one of the problems.  So, to be safe..

Reduction (Brucker, Garey, and Johnson version):

The reduction in the paper goes from Vertex Cover.  So we're given an undirected graph G=(V,E) and a bound k.  Define v = |V|, e = |E|, and a = v + e(e+1).  Define m = a + e + v; this will be the number of processors.  Define D = 3e+3; this is the largest deadline for any task.

There will be m*(D-1)+1 tasks. Task T0 has deadline 1.  There will be many other "dummy" tasks Ti,j that are set up so that they have to start at time j and end at time j+1.  This is enforced by having each Ti,j come before Ti,j+1 in the partial order.  The number of dummy tasks at each time j varies (the sketch after this list tallies the counts):

  • Ti,1, where i goes from 1 to m-k
  • Ti,2, where i goes from 1 to m-e
  • Ti,2+j for each j from 1 to e, where i goes from 1 to v+j
  • Ti,e+2+j for each j from 1 to e, where i goes from 1 to a+v+j-1
  • Ti,2e+2+j for each j from 1 to e, where i goes from 1 to e+v

T0 is the root of all of these and comes before each Ti,1.
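
Because it's easy to lose track of those counts, here's a little bookkeeping sketch (mine, just transcribing the list above) that tallies the dummy tasks in each time slot:

```python
# Tally the dummy tasks T_{i,j} per time slot j, following the list above.
def dummy_task_counts(v, e, k):
    a = v + e * (e + 1)
    m = a + e + v                        # number of processors
    D = 3 * e + 3                        # largest deadline
    counts = {1: m - k, 2: m - e}
    for j in range(1, e + 1):
        counts[2 + j] = v + j
        counts[e + 2 + j] = a + v + j - 1
        counts[2 * e + 2 + j] = e + v
    return m, D, counts

m, D, counts = dummy_task_counts(v=4, e=5, k=2)   # a made-up little graph
print(m, D)            # 43 18
print(sorted(counts))  # slots 1 through D-1 = 17 each get a block of dummy tasks
```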

Then we add the "graph-dependent" tasks.  Each vertex vi has a task that has to come after T0 and has deadline e+3.  Each edge ej spawns two chains of j tasks each, E'j(1) through E'j(j) and E''j(1) through E''j(j), all of which have deadline 2e+3.  We'll also have two other groups of tasks for each edge: L'j(1) through L'j(a) and L''j(1) through L''j(a). The last E' task for each j comes before all of the corresponding L' tasks, and the last E'' task for each j comes before all of the corresponding L'' tasks.  Each of these L tasks has deadline D-j+1.

Edges in G also create precedence constraints: the edge ej = (vq, vr), with q < r, creates the orderings vq -> E'j(1) and vr -> E''j(1).

If G has a vertex cover of size k, then each edge in E has at least one endpoint in the cover.  The jobs for the cover vertices will be scheduled at time 1.  Since a vertex job has to run before the corresponding edge chain, each edge has its E' or E'' chain (whichever one starts at a cover vertex, by the precedence constraints above) begin at time 2.  Once that chain finishes, we can run the L jobs.

If we have a valid schedule that meets all deadlines, we must have scheduled T0 first and each Ti,j at time j.  This means we only have a limited number of slots left to schedule the E chains.  Since vertex tasks have to happen before these chains, we need to schedule k vertex tasks at time 1, and if the schedule is feasible, those k vertices must form a cover (because if they didn't, some edge would have neither its E' nor its E'' chain start at time 2, which is too slow).

Difficulty: The easy way is a 2.  The way from the paper is an 8, because it's hard to see where the specific numbers of dummy tasks come from, why you need 2 chains (E' and E''; you need them because you don't know which endpoint of the edge will be in the cover), why you need the L tasks, and so on.