Single Execution Time Scheduling With Variable Number of Processors

Sorry for skipping last week.  I got about halfway through the post for the next problem, but forgot that the reduction went through this other problem instead, and then ran out of time to fix it.

The problem: Single Execution Time Scheduling.  This problem is not in the appendix and is problem “P4” in the paper by Ullman that has this reduction and the problem next week.

The description: Given a set S of N jobs, arranged in a partial order, each taking 1 time unit, a deadline D, and a sequence of D “processor slots” arranged in a sequence from 0 to D-1, where the sum of all of the processor slots is exactly N, can we create a feasible schedule for all tasks that respects the partial order?

Example: Here’s a simple example of a partial order:

If D=3 and the processor slot sequence was {1,2,1}, then this can be solved: schedule a at time 0, b and c at time 1, and d at time 2.   (So d is done by time 3).

If the processor slot sequence was {1.1.2}, then at time 1, we can only schedule 1 of b and c, so we won’t be able to schedule d at time 2.

Reduction: The Ullman paper goes from 3SAT.  The original formula will have m variables and n clauses.  We will create jobs:

  • xij and ~xij where i goes from 1 to m and j goes from 0 to m.
  • yi and ~yi where i goes from 1 to m
  • Dij where i goes from 1 to n and j goes from 1 to 7.

The partial order on these jobs is:

  • Each xij comes before xi,j+1 and each ~xij comes before ~xi,j+1
  • Each xi,i-1 comes before yi and each ~xi,i-1 comes before ~yi
  • For each Dij represent the j index as a binary number (from 1 to 7).  The three literals in clause i also have 7 combinations of ways to set their literals to make the clause true, which can also be represented as a binary number.  Then for each binary combination, look at the positive or negative settings of the literals that make that combination true.  Then take the last of the x (or ~x) jobs of the variables corresponding to those literals and make it come before the Di job.  So if we are considering job xk, we’d make xkm come before the Di job we’re looking at.

That last thing is just a way of saying “there are 7 ways of making the clause true, you need to execute the job of the literals that makes the clause true before you do the clause job.”

The deadline is n+3.  The processor slots have:

  • m slots at time 0
  • 2m+1 slots at time 1
  • 2m+2 slots at times 2-m
  • n+m+1 slots at time m+1
  • 6n slots at timem+2

The idea is that at time 0 we need to run one of either xi0 or ~xi0  for each i.  (The other is run at time 1).  These will correspond to whether variable i is set t true or false.  We need to do that because we need to run the y jobs as soon as they become available (y0 or ~y0 – whichever is the same parity as the x variable we ran in time 1- needs to be run at time 1, and so on down).  At time 1, we run either xi1 of ~xi1, depending on what we do at time 0.  So at time m+1, we have one y job left over (the last of the y’s in the sequence we started late), m x jobs left over (the xim or ~xim corresponding to the variable we started at time 1), and hopefully have enough x jobs finished to be able to run n D jobs (one for each clause).  This is the way you’ll satisfy each clause.  Then at time m+2, everything is done except for the other 6n D jobs.

Difficulty: 7.  I think Ullman does a very good job of explaining his method, which actually obscures a bit how complex this reduction is, and all of the moving parts and algebra going on.


Sequencing to Minimize Maximum Cumulative Cost

I want to give credit to our Interlibrary Loan people who tracked this reference down- it’s a Ph.D. thesis that only existed on microfilm.   So I got to use the microfilm reader for the first time since elementary school.  Those things are very different now.

The problem: Sequencing to Minimize Maximum Cumulative Cost.  This is problem SS7 in the appendix.

The description: Given a set of tasks with a partial order defined on the tasks, and a (possibly negative) integer cost c(t) for each task t, and a positive value Z.  Can we create a one-processor schedule that obeys the partial order constraints such that for each task t, the cost of each task scheduled before t is K or less?

Example: Suppose we have the following precedence graph of tasks (the number in the vertex is the cost of the task):

The schedule (d,e,a,c,b), has the last cost (task b) with a total cost of 12 (the sum of the costs of all tasks before it).  Notice that if some task costs are negative, it’s possible that the last task that is done does not have the maximum total cost.  For example, if task c’s cost was -4, then the schedule (d,e,a,c,b) has its maximum cost of 10 after task a is scheduled.

Reduction: I got this from a Ph.D. thesis by Abdel-Wahab, and in it, he reduces from Register Sufficiency.  So we’re given an instance of register sufficiency: a directed graph G=(V,E), and a K.  The precedence graph G’=(V’, E’) that we’ll build has 2 copies of every vertex in G.  The edge set of G’ will have all of the edges in E, and additionally, if (i,j) is an edge in E, then (j’, i) is an edge in E’.  (That’s an edge from the second copy of j, back to i.)  Each vertex costs 1 if it was in V, and -1 if it is a second copy.  Set K’ = K+1.

If the register sufficiency problem has a solution, then we build a sequence based on it.  Thinking in terms of Sethi’s “stone game”, each time we place a new stone on a vertex, we add that vertex to the end of the sequence.  If we pick up a stone from vertex x, add x’ (the second copy) to the end of the sequence.  If we’re moving a stone from y to x, then that’s like removing it from y and adding it to x.  You can prove by induction that the max cost of any task in this sequence is K’.

If the scheduling task has a solution, then we build a solution to the stone game.  For each task in the schedule, if it is in V (the original set of vertices), then place a stone on that vertex.   If it is not in V then it is a new vertex, so remove a stone from the copy in V.  Since K’ = K+1, it’s possible that this sequence of steps uses K+1 stones, so we need to modify it to use K.   So, find a node in the sequence whose cost is exactly K’.  We know this node is in V (it was a “place a stone” move since it took us to K’ stones).  It can’t be the last task since we have to end with one that costs -1 (since all of those vertices have no outdegree, and the ones that cost 1 all have an outdegree of at least 1).    So look at the task after it.  It can’t be a “place a stone” move, otherwise, we’ll use K’+1 stones.  So we know that the next task to be scheduled has cost -1.  Since that move will be to pick up a stone, just move the stone that’s about to be removed to our peak cost task instead of allocating a new stone, and we will solve the problem using 1 less stone at this point. If multiple places in the schedule cost K’ exactly, we can do this modification to all of them, creating a program sequence that uses just K stones at most.

Difficulty: 6.  The reduction is not hard, but the stone game takes some work to understand, and the turning a solution that costs K’ into a solution that costs K is a little tricky.  I wonder if there’s an alternate reduction that has K’=K.

Register Sufficiency

We’re jumping ahead again since this problem is used in the next reduction.

The problem: Register Sufficiency.  This is problem PO1 (“Program Optimization”) in the appendix.

The description: Given a directed acyclic graph G=(V,A), and a positive integer K,  can we find a “computation” for G that uses K or fewer registers?  The idea behind a “computation” is that vertices represent values that need to be kept in registers, and edges show dependencies between these values.  So can we represent the computation keeping K or fewer values in memory at all times?

Example: Here is the example graph from the paper by Sethi with the reduction (p.227):

So, for example, the “x” at the bottom is used twice: Once in t1 for the c*x calculation, and one at t3 for the (b+c*x)*x calculation.  We’d like to keep the x in the same register for both calculations.  The numbers of the vertices show the registers holding the values and lead to the following assembly-like computation:

  1. Load c into register 2
  2. Load x into register 3
  3. Multiply registers 2 and 3, putting the result in register 2.
  4. Load b into register 1.
  5. Add registers  1 and 2, putting the result in register 2.
  6. Multiply registers 2 and 3, putting the result in register 2.
  7. Load a into register 3.
  8. Add registers 2 and 3, putting the result into register 1.

A related problem: It’s worth mentioning that Sethi represents this computation problem as a “game” problem of placing stones on a graph.  The stones are like registers.  The possible moves are:

  1. Place a stone on a leaf (allocating that leaf’s value to a register)
  2. Pick up a stone from a node (reclaiming that register for another use)
  3. Place  a stone on a non-leaf node if there is a stone on all of its children (generating a value and putting the result in a new register)
  4. Move a stone from a node to a parent node if all children of the parent have a stone (generating a value and putting the result in a register held by one of the children).

The problem then becomes: Can we get stones on all “root” nodes of G (nodes without parents) using K or fewer stones at all times?

Reduction: Sethi’s reduction is from 3SAT.  The SAT instance has m clauses and n variables.  K will be 5n+3m+1.  The DAG that gets built has n+m “stages”.  The first n stages “assign” truth values to variables, and the rest check that each clause is satisfied.

The variable states are a pretty crazy structure based on a vertex zi, which can only be computed after all of its ancestors are computed, and when computed, there will be n-i+1 stones in hand.  This is the number of stones needed to compute the node to xi and ~xi.  So the last stone in that computation is placed on either xi or ~xi, “setting” its truth value (and not leaving any stones to compute the opposite value).  The clause stages are set up so that we only have enough stones to compute the clause if the clause is satisfiable (using the computed stone sitting on the xi or ~xi vertex).

The actual reduction goes on for pages and pages and has a lot of details to worry about.

Difficulty: 9.  It’s a shame that the cool idea of the stones game still led to such a hard reduction.

Sequencing With Deadlines and Setup Times

Is it weird that G&J call these problems “sequencing” and not “scheduling”?  They use “scheduling” for multiprocessor problems, and shop scheduling problems.  I guess the thinking is that single-processor problems are just “sequences” of what you put on the processor.

The problem: Sequencing with Deadlines and Setup Times.  This is problem SS6 in the appendix.

The description: We’re given a set of T tasks, each task with a length l(t) and deadline d(t).  We also have a set of “compilers”, C, and each task has a specific compiler k(t).  Each compiler has a setup time l(c).  Can we find a one-processor schedule for T that meets the deadlines of all tasks with the additional constraint that if two tasks are scheduled with different compilers, we must pay the setup cost of the second task’s compiler before starting the second task?

Example: I’m not sure “compiler” is the best word for C.  I think of it as more of a preprocessor (and in fact, the paper by Bruno and Downey that has the reduction calls it a “setup task”).  Each task needs some preprocessing to be done before it can run, and if you run two tasks that need the same preprocessing in a row, you can do them both one after the other. Otherwise, you need the preprocessing to happen (immediately) before you start the task.

So, suppose I have 2 compilers:

Compiler Setup time
1 5
2 7

And 4 tasks:

Task Length Deadline Compiler
1 5 10 1
2 3 20 2
3 3 23 2
4 10 100 1

..Then if we schedule task 1 first, its setup + length gets it done at time 10.  Then we do the setup for task 2, so it finishes at time 20.  Task 3 uses the same “compiler”, so does not need to do setup time, so will finish at time 23.  Task 4 uses a different compiler, so needs to re-run compiler 1’s setup time of 5, and will be done at time 38.

Notice that the most “efficient” schedule that minimizes the number of setups we have to do will not schedule all tasks by their deadlines.

Reduction: I’m pretty sure this problem is Theorem 1 in the paper by Bruno and Downey.  They say they reduce from Knapsack, but since there aren’t any weights, they’re really using Sum of Subsets.  The original set S has N elements, a target B, and the sum of all elements in the set is called t0 (which will, of course, be > B).  There will be N+2 “classes” of 3 tasks each that share the same setup task.  The setup for all classes Ci takes time 1 and has 3 tasks: one that takes time 1, one that takes time si (the corresponding element in S), and a third task that takes time H*si, where H is the larger of t0 and N+2.  The 2 “extra” classes Co and Cn+1 work similarly, but use t0 instead of si.  The deadline for the first task in all classes is d1 = 2(N+2) + B.  The deadline for the second task in all classes is d2 = 3N+5+3t0+2Ht0-HB.  The deadline for the third task in all classes is d3=3N+5+3t0+3Ht0.

They go on to show the following facts:

  • In a feasible schedule, you can’t have a d3 task finish before the d1 deadline (there is no time for it).
  • In a feasible schedule, you can’t finish the second task of C0 or Cn+1 before d1.  (Since the second task takes t0 time, feasibly scheduling everyone’s first tasks with their setup times does not leave enough time left).
  • In a feasible schedule, we can only setup 2N+3 setup tasks at most. (More leaves too much time for setup and processing and someone will miss their d3 deadline).
  • In a feasible schedule, there is exactly one task that gets setup once (that is, it gets set up, and does its three tasks in sequence).  This is because of the first bullet and the fact that we have a limit of setup tasks.
  • In a feasible schedule, you never do the first task then the third task without doing the second task  (You can rearrange other feasible schedules to have this property and remain feasible).

What results from these facts is that a feasible schedule has: a bunch of first tasks (preceded by the setup task), a bunch of first and second tasks in sequence (preceded by the setup task), and a single class that does its setup then all three tasks.  This last sequence crosses over the d1 deadline.  Then we need to set up and schedule the second and third tasks of all of the classes we only did the first task for before the d2 deadline.  Then we will set up and schedule the last task of all classes that have a third task remaining.

It turns out that the classes we chose to do the first 2 tasks before the d1 deadline for have costs (and second task length) of exactly B.

Difficulty: 7.  This is a really good paper that does a really good job of proving each of the above facts along the way and showing how it all works (many other papers would resort to saying some of these proofs were “obvious”, which I don’t think they are).  Having said that, I don’t know how anyone comes up with these classes- especially the deadlines- without weeks of trial and error, and that is too much for a student to manage on their own.


It’s kind of amazing that I’ve gotten this far without needing such a classic problem.  But I was writing up the next post and needed to link to my Knapsack article, and it wasn’t there, so..

The problem: Knapsack.  (More technically, “0/1 Knapsack”).  This is problem MP9 in the appendix.

The description: Given a set U of items, each item u in U has a profit p(u), and a weight w(u).  (G&J call these “value” and “size”), and positive integers B and K.  Can we create a subset U’ of U that has total profit at least K, but total weight at most B?

Example: Knapsack is one of my all-time favorite problems, which probably says something about me.  But I inflict it on my students at all levels of the curriculum – it’s a good motivation to introduce parallel arrays, and building simple classes at the low levels, the fractional version is a good example of a greedy algorithm, and the 0/1 version is a good example of where greedy fails.  It also fascinates me how it’s an example of a problem where a problem that has infinite solutions (Fractional Knapsack, when you can take any proportion of an item) is easily solvable, but a problem that has fewer solutions to consider (0/1 Knapsack, where there are “just” 2 choices for each item) is intractable.

Anyway, here is my classic “Your Greedy Approach Won’t Work” example:

Item Profit Weight
1 5 4
2 3 3
3 3 3

If B=6, the best option is to choose items 2 and 3.  But greedily picking items by profit/weight ratio (which works for the Fractional Knapsack problem) will choose item 1, and lose profit.

Reduction: G&J reference Karp’s paper and this is a classic reduction you see in most books that do this problem.  You go from Subset Sum.  We’re given a set S, and an integer K.  We create a set U with the same number of elements, make the profit and weight of each element the same, and equal to the size of the corresponding element in S.  We set K’=B=K.

If the original set had a subset summing to K, then taking those elements will make us a profit of K’ and a weight of B.

If we have a Knapsack solution with profit K’ or more, then since the profits of all items are equal to their weights, the only for the total weight to not exceed B is for the profit to be exactly K’.  So taking the corresponding items in S will get a sum equal to exactly K.

Difficulty: 3.  I use this as an easy homework or test question all the time.

Sequencing to Minimize Weighted Tardiness

Happy New Year, everyone!  I just checked ahead in the appendix, and at my current pace of about 1 problem a week, we should get through appendices A5 (Sequencing and Scheduling), A6 (Mathematical Programming), and possibly A7 (Algebra and Number Theory) this year.  The appendix goes through A12, so it looks like I have plenty of work ahead of me.

The problem: Sequencing to Minimize Weighted Tardiness.  This is problem SS5 in the appendix.

The description: Given a set T of tasks, each task t  has a length l(t), a weight w(t) (which is really a penalty for finishing a task late) and a deadline d(t), and a positive integer K.  Is there a one-processor schedule for all tasks in T where each task is penalized by w(t) for each time unit it goes beyond its deadline, such that the total penalty is K or less?

Example: Suppose I have 3 tasks:

Task Length Deadine Weight
1 3 3 100
2 1 4 3
3 2 4 1

If task 1 is going to not miss its deadline, it needs to start right away.  Since the weight penalty for missing the deadline is pretty disastrous, we should do that.

At time 3, when task 1 is done, we have a choice: Schedule task 2, and have task 2 finish 2 time units late, or schedule task 3, have it finish just 1 time unit late and schedule task 2 afterward.  With these weights, scheduling task 2 first gives a penalty of 2 (task 3 finishes 2 time units late, for a penalty of 1, and 1×2=2).  The other plan would have a penalty of 7 (Task 3 finishes 1 unit late, and task 2 finishes 2 units late).

If the weight of task 2 was 1, and the weight of task 3 was 5, then scheduling task 2 first gives a penalty of 10, but scheduling task 3 first gives a penalty of 6.

Reduction: G&J reference a paper by Lawler, which itself references G&J.  Weird.   Anyway, they reduce from 3-Partition.  Given a 3-Partition instance (a set of 3n elements, and a bound B.  Each element is between B/4 and B/2, and the sum of all elements is m*B), we create 4n jobs.  3n of them will be “A” jobs, each one corresponding to an element in the 3-Partition instance.  We will also have n “X” jobs.

Each “A” job Ai corresponds to an element in the 3-partition instance ai.  These jobs have weight and length B+ai and deadline 0.

The X jobs all have processing time 16B2+n(n+1)/2 + 1.  Call this number L.  The X jobs have weight (L+4B)*(4B)*n(n+1)/2 + 1.  Call this number W. Each X job Xi has deadline iL+(i-1)*4B.    Notice that these X jobs don’t depend on the values in the partition instance at all (they do depend on the size).

Our K value is W-1.

If the original set has a partition, then we can make a schedule of: X1 then the A jobs corresponding to a 3 element partition, then X2, then another set of 3 A jobs corresponding to another 3 element partition, and so on down.  Since we know the  3 elements in a partition sum to exactly B, we know that the processing time of each set of 3 A jobs is exactly 4B.  So each X job will finish at time i*L+(i-1)*4B, and not be tardy.  The A jobs all have deadline 0 and will all be tardy and will pay penalties related to their completion times, but that will sum to less than K.

In the other direction, suppose that we have a legal schedule.  Since all of the X jobs have weight > K, they all have to be scheduled before their deadlines.  Then we can talk about the weights and completion times that the A jobs need to have to fit around these X jobs.  It’s a lot of algebra to make it all work out, but the math isn’t too hard.

Difficulty: 7.  While the algebra is followable, the idea that you’d need to come up with these numbers in the first place is a hard one- I know they’re picked because they “work”, but I feel like there must be an easier way.

Scheduling to Minimize Weighted Completion Time

There probably will not be a post next week, since I won’t be at work over the holidays.  See you when I get back!

The problem: Sequencing to Minimize Weighted Completion Time.  This is problem SS4 in the appendix.

The description: Given a set T of tasks, and a partial order  \lessdot  on each task.  Each task t \in T has a length l(t) and a weight w(t).  We’re also given a positive integer K.  Can we find a one-processor schedule for T that obeys the precedence rules (so if a \lessdot b, then a is scheduled before b), and the sum of the completion time of each task * the weight of that task is K or less?

Example: Suppose we had 3 tasks:

Task Length Weight
1 2 4
2 3 3
3 4 1

Suppose the only ordering constraint was that 1 had to be scheduled before 2.  Then scheudling 1 first, then 2 seconds, then 3 third gives the following weights:

  • 1 is finished at time 2, so gives weight 8
  • 2 is finished at time 5, so gives  weight 15
  • 3 is finished at time 14, so gives weight 14.

..for total weight 37

If we also add a constraint that 3 had to be scheduled before 2, then there are only 2 feasible schedules (1,3,2) and (3,1,2).  The (1,3,2) schedule has a total weight of  41, and the (3,1,2) schedule gives a total weight of 55.

Reduction: The paper by Lawler reduces from Optimal Linear Arrangement.  So we start with an undirected graph.  Each vertex becomes a task with processing time 1 and weight 0.  Each edge gets an  “in” job with processing time 0 and weight -1, and an “out” job with processing time 0 and weight 1.   The “in” job has to come before the vertex it is going in to and the “out” job has to come afterward.

Recall OLA asks: Can we find an ordering of each vertex where  f(v) is the position of vertex v, such that:

The construction of the tasks gives us the same weighting for the “in” and “out” tasks for each edge (the “out” task contributes f(v) to the total, and the “in” task subtracts f(u) from it).  The only problem is that we should have positive lengths of jobs.

So let the in and out tasks take time 1, and vertices take time |V^4|.  Next, we can replace each vertex task with |V^4| tasks each of time 1, each linked in a precedence chain.  Once that happens, we can increase the weights of all tasks by the same amount and not change the problem.

Difficulty: I think the first part is difficulty 4.  Once you do the tricks with the numberings, it probably becomes a 6.

Sequencing to Minimize Tardy Tasks, Sequencing to Minimize Tardy Task Weight

I don’t want to constantly restate reductions that are done in the G&J book, so I’ll just state that problem SS2, “Sequencing to Minimize Tardy Tasks” is done in G&J on page 73.

The next problem has a very similar title, but is a pretty different problem:

The problem: Sequencing to Minimize Tardy Task Weight.  This is problem SS3 in the appendix.

The description: Given a set  T of tasks, each with a length, a deadline, and a weight (think “penalty”), and an integer K, can we find a one-processor schedule for T such that the sum of the weights of the tasks that are not completed before their deadline is K or less?

Example: Here’s a pretty simple example to show what the hard decision is:

Task Length Deadline Weight
1 1 1 3
2 2 3 4
3 3 3 5

Obviously, we can only do either task 3, or both of task 1 and 2.  The current weights favor doing tasks 1 and 2 (because we want to make the weights of missed tasks small), but changing the weight values can change the decision.

Reduction: This is done in Karp’s paper.  We reduce from Partition. Let S be the set we are trying to Partition.   We’ll have a task for each element in the S.  The length of each task and its weight will be equal to the size of the element in S.  K = half of the sum of all of the elements in S.   The deadline for all tasks is also K.

Notice that the only way to get total weight K (or less) after the deadline is to get total length K (or more) done before the deadline.  We can’t do more than K length before the deadline (since the deadline is K). So the only way to have a feasible schedule is to have a set of tasks whose lengths are exactly K, which is a partition.

Difficulty: 3.  I think this is a little easier than last week’s since you don’t have to make up that boundary task in the middle.  Still, going from one set of things (the elements in S) to 3 values (length, deadline, and weight) gives opportunities for confusion.

Sequencing With Release Times and Deadlines

On to a new section!  This is the “Sequencing and Scheduling” section of the appendix.  The first problem is a bit weird because I think it appears with a differnt name elsewhere in G&J.

The problem: Sequencing with Release Times and Deadlines.  This is problem SS1 in the appendix.

Description: Given a set of tasks, where each task t has a “release time” r(t), a “deadline” d(t), and a length(t), all positive integers, can we create a one-processor feasible schedule for these tasks that meets all deadlines?

The definition of “feasible” is pretty straightforward:

  • No task can start before its release time.
  • No two tasks can be running at the same time (and there is no preemption, so once a task starts running, it must complete)
  • All tasks finish before (or equal to) their deadline.

Example: Here’s an example that will relate to the reduction:

Suppose I have 5 tasks: The first 4 tasks are similar: All of them start at time 1, all of them have a deadline at time 11, and the lengths are 1,2,3, and 4.  Our fifth task starts at time 5, has a length of 1, and a deadline of 6.  (So every feasible schedule must have this task own the processor between times 5 and 6)

A feasible schedule would be: {1,4,5,2,3}.  5 minutes of tasks before the 5th task, then 5 minutes of tasks afterward.

Reduction: Like I said above, I think this equivalent to the G&J “Sequencing Within Intervals” problem- at least I can’t see a difference.  The reduction for that problem is on page 70 of the G&J book, and it’s pretty elegant.

We reduce from Partition.  Let B = the sum of all elements in the Partition instance.  Each element in the set will become a task with starting time 1, deadline B+1, and a length equal to the value of the set element.  We have one extra task that is released at time B/2, has a length of 1, and a deadline of B/2 + 1 (this is like our 5th task in the example above).

The idea is that the only way to fit all of the tasks feasibly is to have some subset of the tasks start at time 1 and take (collectively) exactly B/2 time, then we have our 1 extra task, then we fill the time from B/2+1 to B+1 with the rest of the tasks (which also take collectively exactly B/2+1 time).  This forms a partition of B.

Difficulty: 3.  I think this is a good reduction students can come up with on their own.  I like the slickness of the extra element to force the partition.  If the problem we’re talking about is actually different than the one done in G&J, we’d have to see just where the differences lie.

Safety of File Protection Systems

This is the last problem in the “Storage and Retrieval” section of the appendix and uses the Linear Bounded Automaton Acceptance problem from last week. So, since LBA acceptance was “not known to be in NP”, this problem is as well.

The problem: Safety of File Protection Systems.  This is problem SR36 in the appendix.

The description: We’re given:

  • A set R of “generic rights” (intuitively, these are things like “read access” or “write access”)
  • A set O of “objects”
  • A set S ⊆ O of “subjects”
  • A matrix P defining the set of rights for each combination of set and objects.  That is: for all s∈S and o∈O, P[s][o] defines the rights (a subset of R)  that subject s has on object o.
  • A set C of “commands”  Each command takes a set of parameters X1 through Xm, where each Xi is an object (and many are subjects as well).
  • The commands begin with a series of if statements: “if ri ∈ P[Xsi][Xoi]”, where “si” and “oi” are numbers from 1 to m.  So the if statement is asking if a specific right exists for a specific pair of parameters to the command.  The command will have several of these if statements connected by “and” statements.
  • If all of the clauses in the if statement is true then a list of several operations can happen.  The two operations that can happen are “enter a right into P” or “delete a right from P”.   I think you can also delete objects and subjects, but you can’t create them (in the version of the problem we’re talking about), so I don’t know if it matters

Can we find a sequence of commands in C where at some point in the sequence after executing a command, some set P[s][o] has a right r’ that it previously did not have?

Example: It’s pretty easy to come up with an example that is unsafe- just create a function that gives a right to someone.  I think where it gets harder is when you deal with operations that have sets of commands.  Here is an example from the paper by Harrison, Ruzzo, and Ullman that discusses this problem:

command IREAD(s1, s2, o){  
  if "read" ∈(s2, o) and
     "iread" ∈ (s1, s2)
     enter "read" into (s1, o)
     delete "read" from (s1, o)

The command “iread” is meant to stand for “indirect read”.   So what this is saying is that s2 can read 0, and s1 has the rights to read what s2 reads.  So s1 gets (temporary) permission to read 0.  Presumably, some time passes in between the granting of the read permission to s1 and the removal of that right, during which the data in o gets read by s1.  Notice that this is a safe procedure because, by the end, no new rights have been granted.

Reduction: The paper referenced above shows that a system in which we also have operations that can create objects and subjects is undecidable.  They do this by showing how this system can simulate an arbitrary Turing Machine.  The reduction turns each transition into a command in the system: The current state, the current cell the head is over, and the contents of that cell are implemented as rights, and then state transitions can be seen as commands: The “if” parts of the command check that the machine is in the correct configuration, and the “then” parts change rights in a way that simulates the moving of the head, the output to the tape, and the transition to a new state.   Importantly for our purposes, the only time these commands use the “create” commands (that are in the undecidable version of the problem, but not in ours) is when the Turing Machine moves the head into a previously unexplored area of the tape.

They then say that in our problem, which doesn’t have “create” commands, a simulation “similar to that used” in the undecidability reduction can also be used to convert an LBA into a set of commands, since an LBA won’t be moving its head arbitrarily into new space.  I think the LBA still is allowed to use some new space though, but I guess that since it is bounded by a polynomial in the input size, we can simulate that by having each cell get it’s own access rights, and that keeps us with a polynomial-sized reduction.

Difficulty: 8.  This is a hard problem to understand, and the reduction is done in a non-standard way (and depends on the LBA acceptance reduction which is also done in a non-standard way), so it may throw some people for a loop.  It is a cool example of showing non-standard ways to reduce things though if that’s what you’re looking for.