This is lecture 8 of Computer Science 162. The topic today is thread scheduling. Our goals are first to talk about scheduling policies and what their goals are, then to talk about some of the options that we have for scheduling policies and some of the implementation considerations for the different scheduling policies.

So earlier in the course we talked about the life cycle of a thread. If you remember, when threads are created they're added to the ready queue. At some point they get scheduled and get to run on the CPU. And then they can do a number of different things. They can compute, or they can do some IO, like reading from a file; that will cause them to be put onto an IO queue, because IO operations take some time and we don't want the CPU to be idle while we're waiting for that IO to complete. Now a thread that's running on the CPU could be preempted, if we have preemptive scheduling, because a timer interrupt goes off, and then it'll be put back into the ready queue. It can also be interrupted by, for example, a network packet coming in, and again it'll be put on the ready queue. It can also yield and again be put back into the ready queue. So the question for today is how the operating system decides which of the threads it should take off of the queue. Now there are many different queues in the system. For example, we have IO queues for different devices, and we could pick which one of the requests in that queue to service first. But today we're going to focus on the ready queue and giving the processor to a particular thread. And that's what scheduling is about: deciding which threads are given access to resources.

Now scheduling is not a new area of research. It's been an ongoing area of research since the early 1970s, and we're going to make some assumptions that are drawn from that era. They're not quite true today, but they give us a basis for talking about scheduling, and then we can go in and add all sorts of different extensions that change the assumptions. So the implicit assumptions we have are the following: there's one program per user, there's one thread per program, and programs are independent. Again, this is unrealistic, but it's a way of simplifying the problem. If we have multiple jobs per user, then what does it mean if one user runs five jobs and another user runs one job? In most operating systems it means the user running five jobs will get five times as much CPU as the user who's running one job. So what does fairness mean in that context? Should fairness be defined at the job level or should it be defined at the user level? So to simplify everything we're going to make these assumptions: one program per user, one thread per program, and the programs are independent.

So then our high level goal is to dole out CPU time in a way that optimizes some particular parameters of the system. And here we have a picture of the CPU being divided up between user one, then user two, then user three as we go through time. The assumption that we're going to make is that we have CPU bursts. So if we look at a program, as we have here on the left, we can see that the program does a number of operations: a load and an add, and then it does a read from a file. Now when it does a read from a file, that's going to cause it to wait for IO. So it did a bunch of CPU work, and now it waits for IO.
And then it's going to store a value, do some index operation, and then write to the file. So again it does a little bit of compute and then it does some IO and has to wait for that IO. And then it does a load and an add and again reads from the file, so there's additional IO that occurs. Now if you actually measure a program, what you'll find is that most of the time you have very short bursts of computation. So this is our execution model: programs alternate between bursts of CPU and IO. Use the CPU for some short period of time, then do some IO, then use the CPU again. So before each CPU burst, when an IO burst occurs, we have to make a scheduling decision: we have to decide what thread to schedule next. The CPU burst occurs because we've scheduled that thread to run. Now if the CPU burst is very long and we're using time slicing, then we may force the thread to give up the CPU before it's actually finished its current burst. For example, we can see here that most of the bursts are less than 8 milliseconds, so maybe we set our time slice to be 8 milliseconds. And anybody who runs longer than that might get preempted, so they'll start their burst, get preempted, and then sometime later they'll get scheduled again and be able to finish that burst.

So now we can look at some ways of measuring our scheduling policy. The first metric is the waiting time. This is the time that a job is waiting in the ready queue. So it's the time between when a job arrives in the ready queue and when we actually launch the job, that is, give it the CPU. The second metric is the service or execution time. This is the time that the job is actually running. The third metric is the response or completion time, and this is the time between when the job first arrives in the ready queue and when the job actually finishes running. So response time is what a user sees. When you type a key in your editor and it gets echoed on the screen, that's response time. Or it could be something of larger granularity, like when you compile something: how long does that take? So response time is equal to the waiting time plus the service time: how long the job spends waiting to run plus how long the job is actually running. Now another important metric is throughput, and this is the number of jobs that are completed per unit of time. There's a relationship between throughput and response time, but they're not the same thing. In fact, minimizing the response time is going to cause you to have more context switching than if you were only focused on trying to maximize the throughput. And we'll see an example of that in a little bit.

So now we can talk about some of the potential goals that we might have for a scheduling policy. One goal might be to minimize response time: we want to minimize the elapsed time to do some operation. Another goal could be to maximize throughput. Now there are two parts to maximizing throughput. The first is to minimize the overhead; for example, eliminate as much context switching as possible, because context switching is wasted work: you're spending CPU time saving registers and loading registers. The second part is making efficient use of the resources: the CPU, the disk, the memory, and so on. Now you might ask the question, what does efficient use of a resource like the disk have to do with the CPU?
Well, you can't do anything with the disk if you don't have the CPU. You can't schedule an IO operation, like a read or a write, if you don't have the CPU. So if we want to maximize utilization of all these resources, we have to make the most efficient use of the CPU. Another policy goal might be fairness: we want to share the CPU between users in some equitable way. Now, fairness is not the same as minimizing the average response time. In fact, as we're going to see in an example shortly, we can get better average response time by making the system less fair.

So the first scheduling policy we're going to look at is called first come first served, or FIFO, or run until done. Now in early systems, like batch systems or mainframe systems, first come first served meant one program is scheduled until it's done, and that includes if it does any IO: you just block and wait while the IO occurs. That's obviously very inefficient. For programs doing a lot of IO, we'd like to do something else, like run another program, so we're not holding the CPU idle during that time. So today, first come first served or FIFO means that you keep the CPU until the thread blocks. You run for your CPU burst, and when you do IO, you give up the CPU.

So here's an example. You have three processes. Process one has a burst time of 24 (units don't matter here), process two has a burst time of three, and process three also has a burst time of three. So let's look at the Gantt chart of how the processes would get scheduled if they arrive in the order P1, then P2, then P3. P1 is going to run from time zero to 24. Then P2 runs from time 24 to 27. And then P3 runs from time 27 to time 30. So if we look at our waiting times, P1 doesn't wait at all, so its waiting time is zero. P2 waits for 24 before it gets to run, and P3 waits for 27. So our average waiting time is 17. Our average completion time, with P1 finishing at 24, P2 at 27, and P3 at 30, is going to be 27. So what happens here is pretty bad, right? These short jobs P2 and P3 end up stuck behind P1. This is called the convoy effect, where a short process gets stuck behind a long process. We want to avoid this. This is not good because we're making P2 wait 24 when it only runs for three. So it has to wait a really long time, and P3, even worse, has to wait 27 when it only runs for three. So it waits nine times longer than it actually runs, and it finishes much, much later.

So consider a different order. Suppose the order of arrival was process two, then process three, and then process one. Now the Gantt chart for the schedule is: process two starts at zero and finishes at three, process three starts at three and finishes at six, and process one starts at six and finishes at 30. So now our waiting time for P1 is six, our waiting time for P2 is zero, and our waiting time for P3 is three. So our average waiting time has dropped to three, and our average completion time drops from 27 to 13. So much better average waiting time, three versus 17, and much better average completion time, 13 versus 27, all dependent on the order of arrival.

So the advantage of first come first served is that it's very simple. The scheduling decision is: whoever arrived first, we run first; whoever arrived second, we run second; whoever arrives third, we run third, and so on. But the disadvantage, and this is a really huge negative, is that short jobs can get stuck behind long ones.
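To make the arithmetic concrete, here's a minimal sketch (an illustration, not code from the lecture) that computes waiting and completion times under first come first served, assuming every job arrives at time 0 and runs in list order; the two orders are the ones from the example above.

```python
# Minimal sketch: waiting and completion times under first come first served,
# assuming all jobs arrive at time 0 in the given order (as in the example above).

def fcfs(bursts):
    """Return (waiting_times, completion_times) for jobs run in list order."""
    waiting, completion = [], []
    clock = 0
    for burst in bursts:
        waiting.append(clock)         # time spent sitting in the ready queue
        clock += burst                # run the whole burst to completion
        completion.append(clock)      # completion time = waiting + service
    return waiting, completion

for order in ([24, 3, 3], [3, 3, 24]):          # P1,P2,P3 versus P2,P3,P1
    w, c = fcfs(order)
    print(order, "avg wait:", sum(w) / len(w), "avg completion:", sum(c) / len(c))
```

Running it prints the averages from the slides: 17 and 27 for the P1-first order, 3 and 13 for the order with the short jobs first.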
So I like to use real world analogies, and this is just like that too much milk lecture. You get home from school, there's no milk, you go to the supermarket with just that one gallon of milk you want to buy, and you get stuck behind the person with two carts' worth of groceries because they're shopping for the month. And so it's going to take you half an hour to get out of there. The good news is this solves the synchronization problem, because your roommate will see you waiting in line and go put their milk back on the shelf, so you'll only bring one gallon of milk home. But it'll take a long time to get that milk home. So that's the big negative of first come first served.

Okay. So the second scheme that we're going to look at is round robin. And this is motivated by the fact that first come first served is potentially bad for short jobs, because it's highly dependent on the arrival order. If you're the first one in line with just the milk, you really don't care, right? On the other hand, if you're at the back of the line behind multiple people with big carts, you're going to care quite a lot. So round robin works as follows. Each process gets a small unit of CPU time called a time quantum, usually somewhere between 10 and 100 milliseconds. After the quantum expires, if the process is still running, it's preempted and added to the end of the ready queue. So if we have n processes in the ready queue and the time quantum is q, then each process will end up getting one nth of the CPU time, in chunks of at most q time units, the length of our time quantum. This means that no process will have to wait more than (n minus one) times q time units: it'll basically have to wait for everyone else to do their q time units in the worst case, at least as long as n stays the same.

Okay, so performance is as follows. If we make q really large, then this approximates first come first served. If we make q be a week, then it's pretty much first come first served. If we make q really small, then we're going to be heavily interleaved, and this will reduce the amount of time that a process has to wait. But we have to make sure that q is large relative to the time it takes to do a context switch, because otherwise the overhead from context switching would be too high. If we made q be one instruction, we'd spend all of our time context switching and effectively no time doing any actual useful work. So there's a balance here, and we'll talk about that more in just a moment.

So let's look at an example of round robin with a time quantum of 20. We have four processes here. Process one has a burst time of 53, process two has a burst time of eight, process three has a burst time of 68, and process four has a burst time of 24. And we'll assume they arrive in the order P1, P2, P3, and then P4. So the Gantt chart is going to look as follows. First we run process one, and it wants to run for 53. Now at time 20, bing, the timer interrupt goes off and we take the CPU away. So now we give the CPU to process two, and process two is only going to run for eight, so it finishes and exits the system. Now we run process three. Again, process three wants to run for 68, but after 20 we're going to stop it, so at time 48. So the next process we choose to run is process four. And again, process four wants to run for 24, which is larger than 20.
So at time 68, bing, the timer interrupt goes off and we preempt it. It has four left to go. So who do we run next? We now loop all the way back and run process one again. Process one will run for 20 out of its remaining 33, and that leaves it with 13. The timer interrupt goes off again, it gets preempted, and the next thread we run is process three, which has 48 left. After 20 we preempt it, at time 108, and it's left with 28. We now run process four and it finishes. Then we run process one again and it finishes. Then we run process three. The timer interrupt goes off, but we don't have to preempt it, because it's the only thing there to run, so we run it to completion.

Okay, so our waiting times. Process one waits for zero, then it finishes its slice at 20 and has to wait until 68, and then it finishes at 88 and doesn't get to run again until 112. So it waits for a total of 72. Process two only has to wait for 20. Process three waits three times for a total of 85, and process four waits twice for a total of 88. So our average waiting time is 66 and a quarter. For completion times, process one finishes at 125, process two finishes at 28, process three finishes at 153, and process four finishes at 112, so that's an average of 104 and a half.

So the advantage of round robin is that it's better for short jobs: they get through the system a lot faster. And it's fair: every job gets its own shot at the processor for an equal amount of time. But the downside is that we can have a lot of context switching, and for very long running jobs that can add up if we don't make our quantum large enough or make our context switch overhead low enough.

So how do we choose this quantum? If we choose the quantum to be too big, then we're going to affect response time. If we set the quantum to be 10 minutes, it could take a long time for a job to get a shot at the processor. What if we set it to infinity? Well, then we're just back at first come first served. What if we make it really small? Well, throughput is going to suffer, because if we make it very small, then we're going to spend a lot of time on context switching, and that's wasted work. That's going to reduce the amount of effective work, or goodput, in the system. The CPU will be busy, but it'll be busy swapping registers out and in if the time slice is too small.

So, actual choices of time slices. Initially, Unix used a time slice of one second, which was fine when Unix was only being used by one or two people and they were each only doing one thing. But if you've got, you know, three compilations going on, then it could take you three seconds to echo a keystroke in the editor. Modern systems typically try to balance short job performance and long job throughput, so the typical time slice today is somewhere between 10 and 100 milliseconds, and the context switch overhead is somewhere between 0.1 milliseconds and a millisecond. So roughly 1% of the time is overhead due to context switching.

So let's compare the two policies that we've looked at so far, first come first served and round robin. Assuming zero-cost context switching, we can ask the question: is round robin always better than first come first served? Let's look at a simple example to answer this question. Let's say we have 10 jobs, and each of the jobs takes 100 seconds of CPU time.
And we have a round robin scheduler with a quantum of one second, and we start all the jobs at the same time. So with first come first served, the first job P1 runs for 100 seconds. Then job P2 runs for 100 seconds, finishing at 200. Job P3 starts at 200 and finishes at 300, and so on, all the way up to job 10 finishing at 1000. With round robin, we run one second of P1, one second of P2, one second of P3, one second of P4, one second of P5, and so on, until we finally get to time 990. At time 991, our first job finishes; at 992, our second job finishes; at 993, our third; all the way up to 1000, when P10 finishes. So in both cases, because we're assuming zero-cost context switching, everything finishes at the same time.

So here are our completion times. And what's our average finishing time? The average finishing time is going to be 550 for FIFO, and for round robin it's going to be 995.5, because basically nobody finishes until the last 10 seconds. So this is not good at all. From a response time standpoint this is really bad: it's almost twice the response time of FIFO. They're both finishing everything at the same time, but the average response time is much, much worse under round robin. And what this really demonstrates is that when you have jobs that are all the same length, round robin is not a good choice.

But there are other issues. In the real world, context switching is not zero cost, so all that context switching that we're doing with round robin, we're going to pay a price for it. It's actually going to take longer for everybody to finish under round robin than with first come, first served. The other real world issue is that we have a finite-size cache. We're going to talk about caches and the memory hierarchy in more detail in the coming weeks, but that cache is small and has to be shared between all of the jobs that are running at the same time with round robin. Whereas with first come, first served, you get exclusive access to the cache, because you're the only thing that's running. So the total time is going to be much, much longer for round robin, because we're going to be thrashing in the cache and we're also going to be incurring context switch overhead. So when all the jobs are the same length, FIFO is the best choice.

So let's look at our earlier example with different time quanta. Remember that our best case performance for this was when we ran the shortest job first, P2, then we ran the next longer job, P4, then P1, then P3. In this best case situation, our average wait time was 31.25 and our average completion time was 69.5. The worst case that we had was basically the reverse of that schedule: run the big job first. Running the big job first yielded an average wait time of 83.5 and an average completion time of 121.75. So both are significantly worse than our best case. So let's pick round robin with a quantum of 8. With a quantum of 8, we end up with an average wait time of 57.25 and an average completion time of 95.5. This results from assuming the arrival order is P1, P2, P3, P4, and you can see here that first we run P1 and it gets preempted at time 8, then we run P2 and it finishes at time 16 and exits the system. Then we run P3, and it only runs for 8; then we run P4, and it only runs for 8; and we just keep cycling.
So P4 finishes at 80, then P1 finishes at 133, and P3 finishes at the very end at 153. So what happens if we try different quanta? Let's try a much smaller and a much bigger quantum. If we make our quantum be 1, which in reality we'd never want to do because of the overheads of context switching, our average wait time is almost double the best-case first come, first serve average wait time, and our completion time is 100.5 versus 69.5 for the best case. But even more interestingly, we end up with worse performance than when our quantum was 8: 62 for our wait time versus 57.25, and 100.5 for our completion time versus 95.5. Let's look at the other end. If we make our quantum big, like 20, again we end up with a higher average wait time than with a quantum of 8, or even our quantum of 1, and we end up with a higher completion time, at 104.5. What happens if we kind of split the difference and look at a quantum of 5 and a quantum of 10? Well again, we find that we're actually higher than our sweet spot of a quantum of 8, or the best case we can do with first come, first serve. Similarly, the completion time for a quantum of 5 is 99.5, and for 10 it's actually the same, 99.5.

So the thing to really notice is the finishing times for the different jobs in this table, the finishing times for P1, P2, P3, P4. If we look at these various columns, we immediately start to notice something. In particular, if we look at the column for P2, what do we see? We see that the wait time varies between 0 and 145. So it varies pretty dramatically. The completion time varies between 8 and 153. Again, a pretty dramatic variance for something that only runs for 8, remember, P2 only runs for 8. Now let's look at another column, column P3. If we look at column P3, there's actually very little variance: other than the 0 in the worst case ordering, the wait time is the same everywhere. The completion time also doesn't vary much except for that worst case situation; it's essentially constant at 153. So this is something to think about, right? P2, our short job, is very sensitive to the ordering. P3, our very long job, is insensitive to the ordering. So that tells us that there's probably a much better algorithm that we could come up with.

So before we do that, let's do some administrative details. Project one code is going to be due next Tuesday before 11:59 p.m. Try not to use slip days on your first project; it's best to save those slip days for later in the semester. And we have a midterm on the 21st. It's going to be during class time, and it will be split between this classroom and 2060 Valley Life Sciences Building. If your last name starts with A through L, you'll be in 145 Dwinelle. If your last name starts with M through Z, you'll be in 2060 Valley Life Sciences Building. It's closed book, and you get one handwritten, not machine generated, one handwritten page of notes. No calculators are allowed, and it will cover lectures one through 13, all the readings, the handouts, and projects one and two. The TAs are going to be doing a review session, and we're awaiting room scheduling from central campus. Once we get that, we'll let everybody know when it's going to be.
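Coming back to the scheduling example for a moment: here's a minimal round-robin sketch (again an illustration, not course code) that reproduces the kind of numbers in that table. It assumes all jobs arrive at time 0 in list order and that context switches are free; the burst times are the P1 through P4 values from the example, and the quanta are the ones we just compared.

```python
from collections import deque

def round_robin(bursts, quantum):
    """Return (waiting_times, completion_times) under round robin, assuming
    all jobs arrive at time 0 in list order and context switches are free."""
    n = len(bursts)
    remaining = list(bursts)
    completion = [0] * n
    ready = deque(range(n))
    clock = 0
    while ready:
        job = ready.popleft()
        run = min(quantum, remaining[job])   # run until the quantum expires or the burst ends
        clock += run
        remaining[job] -= run
        if remaining[job] > 0:
            ready.append(job)                # preempted: back to the end of the ready queue
        else:
            completion[job] = clock          # finished: record completion time
    waiting = [completion[i] - bursts[i] for i in range(n)]
    return waiting, completion

bursts = [53, 8, 68, 24]                     # P1..P4 from the example
for q in (1, 5, 8, 10, 20):
    w, c = round_robin(bursts, q)
    print("q =", q, "avg wait:", sum(w) / len(w), "avg completion:", sum(c) / len(c))
```

For a quantum of 8 this gives an average wait of 57.25 and an average completion of 95.5, and for a quantum of 20 it gives 66.25 and 104.5, matching the numbers above.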
Okay, so looking at that example we just had, it's pretty apparent that short jobs are very sensitive to the order in which they're run, whereas longer running jobs are much less sensitive, actually insensitive, to the order in which they're run. So this tells us that maybe if we knew which of the jobs were short, we could do much better at scheduling those jobs. What we'd like to do is mirror the best case first come, first serve schedule. That was the best that we were able to get; round robin, no matter what quantum we picked, didn't do as well as that best-case first come, first serve.

So this is an algorithm we're going to call shortest job first: we run whatever job has the least amount of remaining computation. There's a preemptive variant of this that's called shortest remaining time first. What that means is that if a new job arrives with less remaining time than what's left on the currently running job, then we immediately preempt the CPU and run that new job. Now, we can apply these algorithms to either the whole program or just to the current CPU burst of each program. But the basic idea behind both of these algorithms is that we want to get short jobs through the system first. That will have a huge effect on those short jobs, as we saw in the example: when the short jobs got to run first, they had great response time; when they were delayed, they had really bad response time. By getting these short jobs out, we'll drive down our average response time. Now, this is going to have a negative effect on the long running jobs, but it's going to be relatively minimal. In fact, as we saw in the round robin example, it didn't really matter what order we ended up running things in: the long jobs are going to take a long time anyway.

So it turns out that shortest job first and shortest remaining time first are the best that you can do at minimizing the average response time. They're provably optimal: shortest job first among non-preemptive algorithms, and shortest remaining time first among preemptive algorithms. And since shortest remaining time first is always at least as good as shortest job first, we're going to focus on shortest remaining time first.

Now, let's compare shortest remaining time first with what first come first serve would do and what round robin would do. We're going to look at what happens if the jobs are all the same length. Well, if they're all the same length, shortest remaining time first becomes the same as first come first serve. It's going to pick one of the jobs and start running it, and immediately that's the job that now has the least amount of time remaining, so it's going to keep the CPU until it finishes. And then we'll pick another job, and so on and so on. So what this actually says is that first come first serve is the best that you can do if all the jobs are the same length, since shortest remaining time first is provably optimal. Now what happens if the jobs have varying lengths instead? This is perhaps the most common case, right? Because it's uncommon for all the jobs to be the same length. Well, with the shortest remaining time first and round robin algorithms, short jobs don't get stuck behind long ones. With round robin, the long ones are forced to give up the CPU and then the shorter ones get to run.
And with shortest remaining time first, we're not even going to schedule the long running jobs as long as we have short jobs that want to use the CPU. So let's look at an example that shows how shortest remaining time first works. Let's say we have three jobs. The first two, A and B, are both CPU bound; each of them, if it gets the CPU, will run for a week. And C is an IO bound application: it does one millisecond of CPU, just a little CPU burst, and then nine milliseconds of disk IO, then one millisecond of CPU, nine milliseconds of disk IO, and so on. So if C is the only thing running, it'll use 90% of the disk. If A or B is running, it's going to use 100% of the CPU.

So what happens with FIFO? With FIFO, if A or B is ever scheduled, it's going to keep the CPU for a week. So if our arrival order is A, B, and then C, then A will run for a week, B will run for a week, and then finally C will get to run. So our short little job, short in terms of CPU burst, is not going to get to run for a long time: it'll have to wait two weeks.

What about round robin or shortest remaining time first? Well, this is a case where it's probably a lot easier to see what happens with a timeline than to try to describe it. Here's what would happen with round robin and a 100 millisecond time slice. C runs for 1 millisecond and then does 9 milliseconds of IO. While it's doing that 9 milliseconds of IO, A runs for 100 milliseconds, and then B runs for another 100 milliseconds. So 200 milliseconds later, we get to schedule C, and it runs for a millisecond and then does 9 milliseconds of IO. So in the time period from the start of C's time slice to when C next gets scheduled, that's 201 milliseconds, and out of that 201 milliseconds we only did 9 milliseconds of disk IO. So we went from having 90% utilization of the disk to only having about 4.5% utilization of the disk.

So maybe the solution is a shorter time slice. Here's what we get with a 1 millisecond time slice. C runs for a millisecond, then we run A for a millisecond, and B for a millisecond, and A for a millisecond, B for a millisecond, until we get to 9 milliseconds, at which point C is ready to run again. So C runs for a millisecond, and then while it's doing its 9 milliseconds of IO, again we do A, B, A, B, A, B. So now we're able to get the disk utilization back to 90%, but we're doing a tremendous amount of context switching, and that's going to have a negative impact on the cache as well. We're going to waste a lot of time moving the contents of the registers to memory and reloading them as we context switch, and we're also going to be thrashing our cache, especially if A and B need a lot of cache space.

So what will happen with shortest remaining time first? Well, what's the job that has the least amount of remaining time to start with? It's C. So we run C for a millisecond, and then we pick A or B. And we'll run A for 9 milliseconds, because after 9 milliseconds C is going to be ready to run again, and it's the shortest job, and because SRTF is preemptive, we're going to preempt A. Now we run C for a millisecond. Now what do we run next? Well, we have two choices, A or B. Which has the least amount of remaining time?
A does, by 9 milliseconds, because it's already done 9 milliseconds of its week's worth of work, so it has one week minus 9 milliseconds of remaining time. So A will run again. And so we'll alternate between A and C until C finishes or A finishes, and then B will run. The key thing, though, is we're getting maximal utilization of the CPU, we're doing minimal amounts of context switching, we're delivering the best average response time, and we've got 90% disk utilization. So this truly is an optimal algorithm.

So let's talk about this, though, because this seems like what all of our systems should be running. But there's a bit of a problem. It's not good if we have lots of small jobs, because it will always run the small jobs, and large jobs might never get to run: they get starved. The other big problem is that we have to predict the future. How can we do this? One approach would be to ask the user: how long is your job going to take when you submit it? But users lie, and so they'll say, oh, it'll take a minute, and in reality it runs for a week. One way to prevent cheating is, if the job runs much longer than a minute, you simply kill it. That causes users not to lie, because if you lie, you're not going to get your work done, so it encourages you to be truthful. But the problem is, even if you're trying to be truthful, it's very hard to know in advance how long your job is going to take to run. I think even programmers probably could not answer that question, let alone ordinary users. So the bottom line is that we really can't know how long a job is going to take, but we can use shortest remaining time first as a way of measuring how good other policies are. Since it's optimal and there's no way to do better, it makes a great yardstick for saying how close to optimal some new scheduling algorithm is.

So the advantages of shortest remaining time first are that it's optimal in terms of average response time, and that's something you can prove. The disadvantages: it's very hard to predict the future, and it's also unfair, because it's biased towards short jobs, and long jobs may not get to run.

So how can we predict how long a job is going to run? Well, one approach is to look at past behavior. This is an adaptive approach, and it's something that's actually used in CPU scheduling; we'll see it used in virtual memory, we'll see it used in file systems, we'll see it used in many places. And it works because the behavior of programs is very predictable. If a program was CPU bound in the past, it's likely to be CPU bound in the future. If it was IO bound in the past, it's likely to be IO bound in the future. Now, if behavior were completely random, this wouldn't work. But then a lot of other things wouldn't work in computers if behavior were random: caching wouldn't work, virtual memory wouldn't work, file systems wouldn't work very well. Now, programs may change and go through phases of operation. They might have a CPU bound phase, then an IO bound phase, and we want to make sure we're able to track those kinds of phase changes. But within a phase, we'll usually see very consistent behavior.

So here's an example of how we could use shortest remaining time first if we estimated what the next burst length was going to be. The way we can do that is to use an estimator function on the previous bursts that we've seen. So if t(n-1), t(n-2), and t(n-3) represent the last, second to last, and third to last CPU bursts, then we apply some function to them to estimate what the next burst is going to be. And there are lots of different things we could use as the function: it could be an exponentially weighted moving average, a Kalman filter, many different choices.
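Here's a minimal sketch of that idea (an illustration, not lecture code), using an exponentially weighted moving average as the estimator; the weight alpha, the initial guess, and the made-up burst histories are assumptions for the example.

```python
# Minimal sketch of burst-length prediction with an exponentially weighted
# moving average, one of the estimator functions mentioned above. The weight
# alpha and the initial guess are assumptions, not values from the lecture.

class BurstPredictor:
    def __init__(self, initial_guess=10.0, alpha=0.5):
        self.estimate = initial_guess
        self.alpha = alpha

    def update(self, observed_burst):
        # tau_next = alpha * observed_burst + (1 - alpha) * previous_estimate
        self.estimate = self.alpha * observed_burst + (1 - self.alpha) * self.estimate
        return self.estimate

# An approximate-SRTF scheduler would keep one predictor per thread and, at each
# scheduling decision, run the thread with the smallest predicted next burst.
predictors = {"A": BurstPredictor(), "B": BurstPredictor(), "C": BurstPredictor()}
observed = {"A": [100, 100, 100], "B": [100, 100, 100], "C": [1, 1, 1]}  # made-up history
for name, bursts in observed.items():
    for b in bursts:
        predictors[name].update(b)

next_thread = min(predictors, key=lambda t: predictors[t].estimate)
print("schedule:", next_thread)   # C, the thread with the shortest predicted burst
```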
So here's an example. The actual CPU bursts are shown in the black line, and we're using an exponential averaging function, which generates this blue curve. And you can see some things. Maybe the burst was much higher back here, and so there's a little bit of lag when the actual burst drops before the estimator catches up with that. And similarly here at time six, where the burst increases significantly, there's a lag before the estimator captures that increase. That's something we can tune the parameters for, to make the estimator more accurate.

Okay, another approach we could use is what's used in multi-level feedback scheduling. Here we have multiple queues, and this is an approach that was first used in CTSS, the Compatible Time-Sharing System, which was an early time-sharing operating system. We have three queues in this case, each of which has a different priority associated with it. The top queues are the high priority queues; these often hold the foreground tasks. And the lowest priority queue is typically a background queue. We use different scheduling algorithms on each queue. Here we're using round robin with a quantum of eight at the top; we double it to 16 for the next round robin queue; and at the bottom we have a first-come-first-serve queue for our background types of tasks. Now we're going to adjust each job's priority as follows, and there are many details that vary depending on the particular implementation of multi-level feedback scheduling, but the basic idea is that the job starts out in the highest priority queue, the one running round robin with a quantum of eight. If the timeout expires, we drop the job one level. So if it tried to run for 10, the timer will expire at 8, and we'll drop it into the queue with quantum 16. And if its timeout doesn't expire, so let's say it's running in this queue and it's doing bursts of length 10, we're going to raise it up a level; or if it's in the top queue, it'll stay in the top queue.

So what's going to happen with an approach like this? Well, long-running tasks are very quickly going to drop from the top queue to the bottom queue, because long-running tasks constantly exceed their quantum until they end up in the first-come-first-serve queue. So, some more details. This approach is going to approximate shortest remaining time first. If you have a very CPU-bound job, it's going to drop like a rock to the bottom queue. But if you have a short-running IO-bound job, it's going to stay near the top. Think about C, A, and B from our example: C was the short-running IO-bound job, so it's going to stay up in the top queue; it's only doing a millisecond of work at a time. Whereas A and B are going to end up in the bottom queues.
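Here's a minimal sketch of that promotion and demotion rule (an illustration, not lecture code). The quanta of 8 and 16 and the bottom first-come-first-serve queue come from the example above; the exact rule, one level down when the quantum expires and one level up when the job blocks first, is a simplification of what a real implementation would do.

```python
# Minimal sketch of multi-level feedback promotion/demotion with three queues.
QUANTA = [8, 16, None]   # queue 0 is highest priority; None = run to completion (FCFS)

def run_once(level, remaining_burst):
    """Run one scheduling quantum at `level`; return (new_level, remaining_burst)."""
    q = QUANTA[level]
    if q is not None and remaining_burst > q:
        # Timer expired before the burst finished: demote one level (if possible).
        return min(level + 1, len(QUANTA) - 1), remaining_burst - q
    # Burst finished (the job blocks for IO) before the timeout: promote one level.
    return max(level - 1, 0), 0

def trajectory(burst):
    """Which queue the job occupies each time it is scheduled, until the burst completes."""
    level, remaining, levels = 0, burst, []
    while remaining > 0:
        levels.append(level)
        level, remaining = run_once(level, remaining)
    return levels

print("CPU-bound burst of 100:", trajectory(100))  # [0, 1, 2]: sinks to the FCFS queue
print("IO-bound burst of 1:   ", trajectory(1))    # [0]: stays in the top queue
```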
So now we have to think about how we're going to allocate the CPU between these various queues, because now we have multiple queues that we're drawing from. One approach is fixed priority scheduling: we serve all of the jobs in the highest priority queue, then we serve all that are in the next priority queue, then all that are in the next, and so on. So we drain each queue to empty before we go down a level. Another approach is to time slice across the queues: we allocate a certain amount of CPU time to each of the queues, so maybe 70% goes to that top foreground queue, then 20% to the next, and 10% to the first come first serve queue.

So we should talk a little bit about fairness. If we use a strict priority scheduling approach between the queues, it's going to be unfair. This is where we run everything in the top queue, then everything in the next queue, then everything in the next queue, and so on. Because long running jobs may never get access to the CPU: if there are lots of short running jobs in the system, the long running jobs may never have an opportunity to get scheduled onto the CPU. So in the Multics operating system, which was one of these very high availability operating systems, you could pull CPU boards, you could pull memory boards, you could pull all sorts of cards out of the machine, and it would keep running. It was really a very highly available system; a lot of the fault tolerance was actually done in software. They shut down one of these machines that had been running for over a decade at an academic institution, and they actually found a 10-year-old job sitting in the bottom queue of the multi-level feedback scheduler. Not good. So we need to give these long running jobs some fraction of the CPU even when we have shorter jobs to run. And this is a trade-off we're going to make: we're going to gain fairness by hurting the average response time. We get the best average response time when we run those short jobs first, but that's not fair. So we're going to trade worse average response time for better fairness.

So how do we actually implement fairness? Well, we could give each of these queues a portion of the CPU. But then what happens if, say, we have one long running job and 100 short running jobs? That doesn't seem like it works, right? Having this kind of fixed allocation causes problems when we have an imbalance in the number of jobs in each queue. This is sort of like when you get to the supermarket, typically on a Friday night, and you've got your gallon of milk, and there are 50 people in the express lane. It's not a good choice to go to the express lane; it's a better choice to go to one of the regular lanes and get behind that one person with two shopping carts' worth of groceries, because you're probably going to get out of the supermarket faster. So that's the problem we could have here: if there are lots and lots of short jobs and we're only allocating some fraction to them, even if it's a large fraction, it's going to be disproportionate compared to the one job sitting in the long running queue. Another way we could implement fairness would be to increase the priority of jobs that don't get service. This is actually what's done in some variants of Unix. But it's very ad hoc: what's the rate at which we should increase or decrease the priorities? All of these approaches are very ad hoc and typically don't work well when you have a flood of jobs in the system.

So there's an alternative scheduling algorithm we could use, called lottery scheduling. This was developed by Carl Waldspurger while he was a grad student at MIT, and Carl and I were in the same research group together while he was working on lottery scheduling.
So lottery scheduling is pretty cool. The idea is that you give each job some number of lottery tickets, and at the start of each time slice, the scheduler randomly picks a winning ticket. So what's going to happen is that in the long term, on average, the amount of CPU time that each job gets is going to be proportional to the number of tickets that it holds. You have to have a good random number generator to make this work, and Carl actually spent quite a lot of time coming up with a random number generator that was very fast and efficient but also very uniformly distributed.

So now, how do we assign tickets? Well, if we want to approximate shortest remaining time first, we want to make sure that short jobs get more tickets and longer running jobs get fewer tickets. But to avoid starvation and make sure everybody makes progress, we're going to give every job at least one ticket. That guarantees it will get some fraction of the CPU, where that fraction is its share of all the tickets out in the system, over a complete cycle through all of the tickets. And a good random number generator will effectively cycle through all of the tickets; that's one of the challenges of designing a good random number generator. Now, the advantage this has over strict priority scheduling is that it behaves very gracefully as the load changes. If you add more jobs, or jobs finish and leave the system, it affects all jobs proportionally, no matter how many tickets each job possesses.

So let's look at an example. We're going to assume short jobs get 10 tickets and long jobs each get one ticket. So if we have one short job and one long job, how many tickets do we have in the system? We have 11 tickets, right? 10 for the short job and one for the long job. So the one short job is going to get 91% of the CPU, because it holds 10 out of the 11 tickets. This is on average. The one long job is going to get one out of 11, or 9%, of the CPU on average. Now what if we just have two long jobs in the system? Well, then there are two total tickets, and each of the long jobs holds one of those two tickets, so each will get 50% of the CPU. What if we have two short jobs? Each of the two short jobs has 10 tickets, so that's 20 tickets total, and each holds 10 out of 20 tickets, which is 50% of the CPU. What if we have 10 short jobs and one long job? How many tickets do we have in the system now? We have 101 tickets: 100 for the 10 short jobs and one for the long job. So each of the short jobs is going to get 10 out of 101 of the CPU, or 9.9%, and the one long job is going to get 1 out of 101, or 0.99%, of the CPU. So it will still make progress, slowly, but it will make progress. And what if we have one short job and 10 long jobs? Well, how many tickets do we have outstanding? 20. So the one short job will get 10 out of 20 of the CPU, or 50%, and each of the long jobs will get 1 out of 20, or 5%, of the CPU. So this is really nice: as we adjust the number of jobs, short or long, the system automatically rebalances the fraction of the CPU that each job ends up getting.
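Here's a minimal sketch of the mechanism (an illustration, not Carl's implementation): draw a random winning ticket at each time slice and run the job that holds it. The ticket counts follow the example above, 10 for a short job and 1 for a long job, and Python's built-in random generator stands in for the carefully engineered one described in the lecture.

```python
import random

def pick_winner(tickets):
    """tickets: dict of job -> ticket count. Return the job holding the winning ticket."""
    total = sum(tickets.values())
    winner = random.randrange(total)          # winning ticket number in [0, total)
    for job, count in tickets.items():
        if winner < count:
            return job
        winner -= count

tickets = {"short-1": 10, "long-1": 1}        # one short job, one long job: 11 tickets
wins = {job: 0 for job in tickets}
for _ in range(100_000):                      # simulate many time slices
    wins[pick_winner(tickets)] += 1
for job, count in wins.items():
    print(job, "got about", 100 * count / 100_000, "% of the CPU")   # ~91% vs ~9%
```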
Now, what happens if we have too many short jobs to give reasonable response time? On a time-sharing system like Unix, if the load average is 100, nobody is really going to make progress. So one approach when you run out of resources is to forcibly log someone out. I like real world analogies, so the analogy here would be: we get to 11 o'clock on the night the project is due, and everybody is logged into the instructional machines to make sure their code runs there, but it's clear that some groups are not going to finish, and we don't have enough CPU capacity to compile and run everybody's test cases. Then maybe the solution is to take the project group that's furthest behind and log them out. That way the other project groups can make progress. So that's not fair at all, but it is one approach to trying to make sure that everyone else in the system can make progress. And so that's one of the challenges we face with a scheduler.

So how do we evaluate a scheduling algorithm? There are a few different ways. One is to use a deterministic model. Here we take a predetermined workload, and we compute algorithmically what the performance of each algorithm would be for that workload. The second way to evaluate a scheduling algorithm is to use a mathematical approach and build a queuing model, and this works well for stochastic workloads. You assume some inter-arrival time between jobs over some distribution, you assume that job lengths have some distribution, that CPU bursts have a distribution and an average, and that IO bursts have an average and a distribution. And then you can build a mathematical model of what your expected wait times, response times, and completion times are going to be. And the final approach is to actually build your scheduling algorithm, or to simulate it. Here what we do is run a set of jobs and collect their CPU and IO bursts; that's the trace tape here, which has CPU bursts and IO bursts. Then we build a simulator that allows us to replay this trace tape against a plugged-in scheduling algorithm. So here we've plugged in first come first serve, here we've plugged in our oracle algorithm, and here is Round Robin with a quantum of 14. The nice thing about doing this in simulation is that our oracle algorithm, shortest job first, can actually look into the future, look at the trace tape, and pick the best job to run. The output from each one of these simulations is a set of performance statistics. This is the most flexible and the most general of the approaches, and what's nice about it is that you can run it against your actual executions and compare how well or poorly you do relative to other schedulers. The challenge with the deterministic model and the queuing model is capturing all the parameters in both cases. And the advantage of the deterministic models and the queuing models is that they allow you to say something much more formal about the behavior of a particular scheduling algorithm.
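Here's a tiny sketch of that trace-driven idea (an illustration, not the simulator from the slide): the same "trace tape" of burst lengths is replayed through pluggable run-to-completion policies, including a shortest-job-first oracle that is allowed to look at the whole tape. The trace values are the burst times from the earlier example; arrivals at time 0 and free context switches are simplifying assumptions.

```python
# Minimal sketch of trace-driven evaluation: replay one trace of CPU burst lengths
# through pluggable, run-to-completion policies and compare the resulting statistics.

def fcfs_order(trace):
    return list(trace)                      # run in arrival order

def sjf_order(trace):                       # the "oracle": peeks at the whole tape
    return sorted(trace)

def average_completion(order):
    clock, total = 0, 0
    for burst in order:
        clock += burst
        total += clock
    return total / len(order)

trace_tape = [53, 8, 68, 24]                # CPU bursts collected from earlier runs
for name, policy in [("FCFS", fcfs_order), ("SJF oracle", sjf_order)]:
    print(name, "avg completion:", average_completion(policy(trace_tape)))
    # The SJF oracle gives 69.5 here, the best-case number from the earlier example.
```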
So the last thing I want to say about scheduling is: when is it the case that the details of a scheduling policy and fairness really matter? The answer is when we don't have enough resources to go around. Another way of restating this question is: when should you buy a faster computer or a faster network link, or expand a highway, or build a new bridge? Well, one approach is that you buy when it pays for itself by improving response time, because the assumption is that you pay a cost for worse response time. That cost could be in terms of reduced productivity for you, or, if you have customers, say a web business, it's customers who complain that your website is too slow, or who just go away and never come back. Microsoft and Google have done studies showing that if you exceed a threshold for returning search results, some fraction of your users will go away and never come back to your site. Similarly, your time is worth something, and so you ask, when do I buy a faster computer? Well, when I can get my work done faster. So for example, about five years ago I bought a solid state drive. It was very expensive, but I put it into my laptop because it made me more productive: I was able to run more virtual machines at the same time, and I was able to do more compilations per unit time.

So here is a graph that shows, on the x-axis, the utilization of your system, which could be processor utilization, and on the y-axis, response time. Now you might think that the time to buy resources is when you hit 100% utilization, when you've maxed out the utilization of your system. That's actually the worst time to do it, because at that point you end up with a response time that's approaching infinity. Really, our scheduling algorithms work best when we're in the linear portion of the curve; that's when they perform at their best. So this argues that the time to buy is not when we hit 100%, but when we're at the knee of the curve. Because if we buy at the knee of the curve, we're buying about as late as we can before we hit the region where the curve goes exponential and we very quickly hit 100%.

So in summary, today the topic was scheduling: how to select a process from the ready queue and give the CPU to it. We looked at a bunch of different algorithms. First come first serve scheduling runs threads to completion in the order that they were submitted. The advantage is that it's very simple; the disadvantage is that short jobs can get stuck behind long ones. We looked at round robin scheduling. Here we give each thread a small amount of the CPU and cycle between all the ready threads. The advantage is that it's better for short jobs; the disadvantage is that when all the jobs are the same length, it performs very poorly, with very bad average response time. Then three other algorithms. Shortest job first and shortest remaining time first: we run whatever job has the least amount of remaining computation to do. The advantage is that it's optimal in terms of average response time; the disadvantages are that it's hard to predict the future, and it's unfair, because it's biased in favor of short jobs and could starve long jobs. Multi-level feedback scheduling uses multiple queues with different priorities and automatically promotes or demotes processes in a way that approximates shortest job first or shortest remaining time first: short jobs run at the highest priority, and jobs that run for long amounts of time end up running at the lowest priority. And last, we looked at lottery scheduling, which gives each thread a number of tickets, with short tasks ending up with more tickets. We reserve a minimum number of tickets for every thread so we can ensure forward progress and fairness.