I'm giving Fung a chance to fiddle with the new camera. Good morning. Good morning. Good morning, all right. Today's Wednesday; I always have to think about it. OK, so today we're roughly halfway through our week discussing thread scheduling. On Monday, we looked at some of the principles that underlie thread scheduling, and today we're going to walk through a couple of classic pedagogical thread scheduling algorithms. There are two points to presenting these. One is that it helps you think a little bit about how scheduling works and some of the features involved. These kinds of algorithms are also fodder for midterm and final exam questions. I think some of those questions are a little dumb, so I might not put you through them, but maybe I will. But the real point of today is to walk you through some of these building blocks, most of which aren't really used in practice in any sort of pure form. Then on Friday, we're going to talk about an actual, real, implemented, deployed scheduler: the Linux scheduler. So today we're going to go through a series of evolutions in scheduling algorithms, in terms of how those algorithms use information and the kind of information that's exposed to them. I've arranged it this way, and you'll understand the labels as we go through. Now, Friday is a little unusual. I normally like you guys to come to lecture, and maybe Friday is a day where it might seem a little more difficult to get up in the morning and hike in here. I know a lot of you do that, and I appreciate it. I don't have a choice. But Friday's lecture, I think, is going to be really cool. I think all my lectures are pretty cool, so this is a high bar.
So this is going to be a really fun lecture. What happened was, yesterday I was writing these lectures, and I started to look at some of the materials I've been using as reference guides. There was this whole thing my former advisor had done about the Linux O(1) scheduler. And I thought, well, this is kind of interesting; there are some nice design principles embodied there. But then I looked up and realized that that was Linux 2.6, we're on Linux 3.0 now, and the O(1) scheduler is history. Then what I found out is that there's a new scheduler in the Linux kernel, and there's actually this fantastic story behind it. It's the completely fair scheduler, the new scheduling algorithm. And the whole story of how this scheduler came about is really, really cool. I think it highlights things that are really fun and kind of interesting about the Linux community: the hierarchy within this supposedly very, very egalitarian community, and contributions made by people who aren't paid to do it. Some of the main Linux guys actually work at Linux companies, and that's one of the reasons they can do this. But the guy who developed the algorithms and some of the ideas behind the scheduler is actually an Australian anaesthetist. He's a part-time hacker who made this massive contribution to the Linux kernel and then kind of didn't. So this is an awesome, awesome story. We're actually not going to talk specifically about how the completely fair scheduler works, but we will talk about the basis for the completely fair scheduler, which is something called the rotating staircase deadline scheduler. And when you see the rotating staircase deadline scheduler, you'll recognize pieces of a lot of what we're going to talk about today. So this will be a fun lecture.
And I promise that I will include all the necessary profanity in the slides to describe the situation, because there's profanity in the names of some of the schedulers, as Isaac pointed out. And there's also profanity, I think, that gets used by the people involved in this situation, for a variety of reasons. OK, so I'm planning on posting the slides. Somebody asked about this, and yes, the slides will be online. That is going to happen. I have not given up; I've been working on writing the slides rather than posting them, but they will be up there. And then at this point, my suggestion to you on assignment one is that if you're not done writing your primitives and testing your primitives, you're kind of behind the eight ball. The scheduling problems are not designed to be that hard, but they can be tricky. So I think that's where you guys should be, and if you're not there, I would suggest that you get in gear. Robert came by the other day and was talking to me a little bit, and one of the things I want to remind you guys is that you are a great resource for each other when it comes to help on the assignment. I know that maybe it feels like I'm walking a bit of a thin line here, but I don't see the line as that thin. You guys are free to talk about the assignment. If you're struggling with how to do this stupid whale mating problem that we're forcing you to do, then sit down with a classmate and say, hey, can I talk over my solution with you? Here's how I'm going to do it. And they may say, oh, something similar occurred to me. That conversation is completely awesome, as long as you sit down afterwards and separately implement your solutions. As long as there's no code swapping, the conversations are fine. And we are going to patrol this line in the sand. So there's a tool from Stanford, and this is not meant to frighten anybody, just in the interest of full disclosure, called MOSS, the Measure Of Software Similarity.
We will be running that tool over all your assignments, and we will be investigating any unusual similarities that the tool flags. So if you're thinking about cheating in this class, don't bother. If you feel like maybe you could find some assignment online to copy from, I don't know, maybe that will work out for you. But you just have to cross your fingers and pray that nobody else in this class finds that same assignment, because if they do, we will match the two of you against each other, and we'll nail both of you. I don't want to have to do this, but it's something I think we have to do, because I want you guys to feel free to collaborate, but I want to know that people haven't crossed the line into actual code swapping, which is not OK. Any questions about this stuff? All right. So let's talk about last time. We had a lecture where we introduced the goals of scheduling, some principles of scheduling, and some of the things you guys expect from your systems. Any questions about that material now that you've slept on it and thought about it? All Tuesday went by. I'm sure you spent all Tuesday just dreaming about this stuff and imagining different ways you could improve the scheduling and all that. Questions? Nobody had any questions; it was all crystal clear. I find that unlikely. All right, so let's do some review. What is scheduling? Fundamentally, what is scheduling? Anybody? What's that? Multiplexing the CPU. So scheduling is required so that we can multiplex the CPU. That's a good part of the answer. Anybody else? What else is scheduling? The process of determining the next thread to run. Fundamentally, at the end of the day, I'm going to have to schedule a thread on the CPU, and scheduling is the process of choosing the thread to run. If I have more cores than I have threads, I'm not scheduling. And I guess that's the only case in which you wouldn't be scheduling.
So that's what scheduling is. Now, OK, so we already got this from Minimus Sidugia. Why do we schedule threads? Did anyone hear Sidugia's answer? Did I repeat it out loud? Why do we need to do this? Because we need to multiplex the CPU. That's not that complicated. And also, why us? Because we're the kernel, and multiplexing is our job. You gave the kernel the special privileges it needs to do this, and the kernel has to use them. So that's why the kernel ends up doing the scheduling. When does scheduling happen? When are the times when the kernel is required to make a scheduling decision? There were four. When a thread makes a system call, that's one. Yeah? Yeah, a timer interrupt goes off, so that's preemptive, right? I'm going to stop a thread. Two other ones. Exit and yield, OK? A thread voluntarily gives up the CPU by calling yield, and I've got to give it to somebody else. A thread makes a system call and has to sleep until the call completes; if it's a blocking system call, I have to give the CPU to somebody else. When a thread exits, clearly I have to give the CPU to somebody else. And if the kernel decides that a thread has run for long enough that it's time to let somebody else have the CPU, we think of that as preemptive scheduling. That fourth bullet, remember, is what makes a scheduling policy preemptive: I'm actually going to grab the resource, the CPU, away from the thread, OK? So we talked about these human-computer expectations, right? And I broke these down. Again, this is my taxonomy; I didn't copy or borrow it from anybody else, which means that it's fresh and also potentially completely bogus. But I think it makes some sense. So anyway, we expect our computers to respond, right? When we interact with them, we expect them to react, to indicate that they've noticed us, right?
We expect them to continue to do things that require continuity, especially video and audio-type things. And we also expect them to finish tasks, the long-running things that need to happen, right? There's actually something we'll talk about on Friday. The guy who, what is it, Con Kolivas, maybe I should look up how to, is that how you pronounce it? Yeah, so this is the guy who developed some of the algorithms that are not a part of the Linux scheduler. And he has a really, really good description, I think, of the difference between responsiveness and interactivity, which we will cover on Friday, partly because I read it and I'm not sure I completely understand it yet, so I'm going to go back and look at it again. But there is a difference there, as he sees it as a scheduler designer, and so we'll talk about that a little bit. All right, and then finally, how do we evaluate a scheduler? Today we're actually going to talk about real scheduling algorithms. So how do we evaluate them? How do we decide if they work well or not? Anybody? Right, so one of the goals of the system is how well it meets deadlines, both unpredictable and predictable ones. The predictable deadlines are important for continuity, when I need to do something like continue to fill a buffer on a video card. And the unpredictable deadlines are important for responsiveness. That's when you suddenly sit down and start typing and the system needs to respond in order to satisfy your every whim and desire, OK? And also, how well does it do at fully allocating resources? There's no point packing your tablet or your laptop or your phone with all sorts of memory and a fast CPU and a really good disk if the CPU scheduling algorithm doesn't make use of those resources. So to some degree, trying to put all of those resources to continuous use is another goal of scheduling.
And we talked a little bit about how that can be opposed to meeting deadlines, which sometimes means I have to drop everything and just try to get one thing done, OK? And again, we talked about how, on human-facing systems, interactivity wins. Why is that? Do you guys remember why? Right, fundamentally, whose time is more important? Yours. I hope you think it's yours. If you think it's the computer's, then you might end up with some really interesting scheduling algorithms. I don't know if you would come to class, because you might be sitting in front of the computer all the time waiting for it. But anyway, your time is more valuable than your computer's. And again, part of the fun thing on Friday is that Con's work on scheduling is really, really focused on desktop user systems, and we'll talk about why that makes it really difficult to evaluate. This is one of the aspects of the story we'll get into on Friday: how do you quantify the responsiveness of your system? It's very easy to measure things like server throughput; I just run a benchmark that jams a bunch of requests into my web server and see how fast they pop out. But sluggishness, what does that mean? It's this weird thing where you can tell your computer is not right, but you don't know what's wrong with it. How many people get this general impression that their computer slows down over time? Yeah, me too. Is that weird? I wonder if that's psychological or an actual real thing. I had, and this is just a hopefully funny aside, a friend once who worked for a company where one of the IT staff was really convinced that the way to solve any problem with a laptop was to run the defragmenter. The Windows defragger, which we'll talk about later when we talk a little bit about disks, tries to rearrange things on disk to make the layout more contiguous.
So anytime anyone had a problem with their computer, he would run the defragger as part of the solution. And so we came up with this idea of building a psychological defragger: a defragger that would sit there for hours, drawing, you know, how many people ever ran the old Windows defragger? Maybe it still does this, but it pops up this GUI and you see things moving around, and you can kind of see the disk getting cleaner because all the blue blocks are together and all the red blocks are together. And so we thought, we could just write this thing that spins the disk for hours and draws that display but actually doesn't do anything, right? And then I wonder if you'd find people who were like, wow, my computer is so fast now, thank you! I can really tell that my hard disk is healthy, right? So evaluating this stuff can be really hard, and when Con was working on his scheduler patches, there'd be people writing in to say, yeah, this is great, this is great for my desktop, it seems to run way better, and the kernel developers are like, well, show us a benchmark, and they're like, I don't know. So this gets into tough stuff. All right, and then finally, we didn't cover this last time, but I think it's important to point out that scheduler performance is also really critical. Why is that? Why would I care about the performance of my scheduler? Yeah, exactly. My scheduler may make a fantastic decision; it may sit there for 10 seconds thinking about what the next thread to run is, but those 10 seconds are gone, right? The scheduler is, in some sense, not a useful part of the system to the user. It's only useful to the degree that it can make a decent decision and get the heck out of the way as fast as possible. And some of the work that went into the earlier Linux schedulers was to fix scaling problems.
So early Linux schedulers, for example, would take time to schedule a thread proportional to the number of tasks or threads running on the system. That's not good. Newer schedulers have done better in terms of having constant overhead. But in general, there's this interesting tension between the goodness of the decision I'm able to make and the speed with which I can make that decision. All right, any other questions about scheduling principles before we start diving into some algorithms? Questions, questions, going once, going twice? Okay, cool. So when we start breaking down schedulers, I think it makes sense to think about them in terms of information: what kind of information does the scheduler use in order to make a decision? What things about the thread do we observe, or what additional information do we take in from outside, that help us make the scheduling decision? And so we'll talk a little bit about three types of schedulers. The first is the oracular scheduler. You guys know what an oracle is, right? An oracle knows the future, so an oracular scheduler is a scheduler that knows the future. Now, these schedulers are difficult to implement; in fact, they're impossible to implement. But they're useful to think about as comparisons to other schedulers, because oracular schedulers can frequently do the best thing. If I knew everything a thread was going to do going forward in time, clearly I could use that information to make better scheduling decisions. Now, typical schedulers don't have a crystal ball, so they can't see the future. What they do to compensate is use a principle that actually comes up over and over again when we talk about operating system algorithms, both here and in memory management and other places: they use the past to predict the future.
And particularly the recent past. So I say, what did the thread just do? And I assume that it's going to keep doing something like that. And finally, another piece of information we can incorporate is actual user input: giving the user, or sometimes the system on behalf of the user, a chance to add information to the scheduling decision. To say, you know what? I know that this process is a backup process that doesn't need to run at a high priority, so I'm going to assign it to a lower level; I'm going to tell the scheduler this is a less important or more important thread. And schedulers normally have ways to do this. Now, here's the question: how many people use those scheduling mechanisms? Okay, Isaac, you're weird. How many people have ever used nice or a tool that would allow you to boost or change your priorities? So, yeah, you guys, right? But the thing is, people don't like to do this. One of the things we'll talk about, especially on Friday, is that in general, especially on desktop systems, you're talking about people who don't want to mess around with the internals of the scheduler. These are people who want the automatic-transmission computer: they just want to step on the gas and have it go. And so, accepting this kind of input, it's unclear whether we should want schedulers to incorporate it, whether we should require schedulers to incorporate it, or how we should evaluate schedulers that do incorporate it. It's one thing to say, oh, the scheduler works great if you have this nice input from the user about what's important, but most of the time schedulers have to guess, because the user's not helping them out, okay? Questions about this little taxonomy?
All right, but let me first talk about a group of scheduling algorithms that I'll call the know-nothings. It's also possible that I incorporate no additional information about the thread: no user input, no past, no future, and I just try to do something simple. So let me give you an example of a know-nothing scheduler. This is an example we'll come back to several times in this class. I've got a series of threads; this is my ready queue, this is my waiting queue, okay? And one way to schedule threads, I mean, what do you think the simplest, dumbest, most basic way to schedule threads is? Choose a random thread, exactly. So let's just choose a thread at random. Let's say we choose thread T3, right? What I'm showing up here is the start, when T3 starts to run, and then the end of the time quantum that I've set up. That's the longest I'm gonna let T3 run, okay? And let's say T3 runs, finishes its quantum, and goes back on the ready queue. Now what do I do? Choose another random thread, okay? I choose T5, and now maybe T5 blocks, so I'm gonna throw it over here on the waiting queue. Now what do I do? Choose another random thread, right? And this is how this goes. If threads exhaust their quantum, they get tossed back on the ready, and now, I keep saying queue, and queues are kind of ingrained, but this isn't really a ready queue. There's no ordering here, so on the next slide I call this a ready pile: the pile of ready threads. There's no ordering here, so I think using the word queue is kind of misleading. And essentially this is how it continues: threads that block get thrown on the waiting pile, I choose a random thread, and it's possible for threads to move back and forth. So here the waiting thread finished, and now it's ready to be scheduled, et cetera, right?
And this is called random scheduling. It's just what we described, it's not that complicated, and here's how it works. The first thing I have to do, for both random and most simple types of scheduling, is choose a scheduling quantum. The scheduling quantum is the longest I'm gonna let any thread run. And remember, I have no other information about the threads here, so I choose one quantum for everybody, and that's just it. If you hit the end of your quantum, I'm gonna stop you and run somebody else. Why is that important? What's that? It's fair, but what could happen if I don't have a scheduling quantum? What's that? Right, if a thread is in a while(1) loop, there will never be another scheduling decision, unless I have a quantum. So preemption is something that we start to assume as we start to talk about these algorithms, okay? Right, so I choose my scheduling quantum, and then I just choose a thread at random from the ready pile and run it until it blocks or its quantum expires. Now what happens when a thread leaves the waiting state? Let's say a thread blocks and it's waiting on some IO, and the IO finishes. Then what do I do? Just throw it back in the ready pile, right? So I think of this as the most stateless, basic, simple scheduler you could possibly have. Any questions about the random scheduler? You guys are gonna have to implement several scheduling algorithms for assignment two, and my suggestion is to pick ones that are easy; my suggestion is to pick random as one of them, because it's really, really easy, okay? All right, so round robin scheduling, as Ben pointed out, is very similar to random. The mechanism is the same. It took me like an hour to animate that slide, so I decided not to do a separate one for round robin. But the only difference is that I'm gonna establish an order in my ready queue, right?
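The random policy just described can be sketched in a few lines. This is only an illustrative simulation, not kernel code; the `run` callback and its 'blocked' / 'expired' / 'exited' outcomes are hypothetical stand-ins for what the thread actually does during its quantum:

```python
import random

def random_scheduler(ready, quantum, run):
    """Pick a thread at random from the ready pile and run it for at
    most one quantum.  `run(t, quantum)` is a caller-supplied stand-in
    that returns 'blocked', 'expired', or 'exited'."""
    waiting = []
    while ready:
        t = random.choice(ready)   # no ordering: a pile, not a queue
        ready.remove(t)
        outcome = run(t, quantum)
        if outcome == 'blocked':
            waiting.append(t)      # goes on the waiting pile
        elif outcome == 'expired':
            ready.append(t)        # quantum used up: back in the pile
        # 'exited' threads are simply dropped
    return waiting

# Threads 1 and 2 exit; thread 3 blocks and lands in the waiting pile.
left = random_scheduler([1, 2, 3], 10,
                        lambda t, q: 'blocked' if t == 3 else 'exited')
```

Note that the only state the scheduler keeps is which pile each thread is in, which is exactly the "know-nothing" property.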
And I'm gonna continue to follow that order each time. To make the scheduling decision, I just take the thread that's at the head of the ready queue, and now I really do have a queue, because I have an ordering. I run that thread till it blocks or its quantum expires, and if its quantum expires, I just pop it on the tail of the ready queue. So I put it back at the end, and this establishes an ordering. When threads start, I could put them on the end of the ready queue or on the front of the ready queue; it doesn't matter. So that's how the ordering is established. Now, what happens in this model when a thread blocks, but then the blocking operation completes and it's ready to be run again? What do I do? What's that? It could go on the tail or on the head. It just has to go back in the ready queue somewhere, and depending on how you want to do it, you could put it anywhere. The point is it just needs to get back on the ready queue. All right, questions about round robin? Round robin is also very simple, also a candidate for the assignment two implementation. Actually, I don't think that's true, because I think we give you a round robin scheduler, so that one is off the table, okay? So again, these know-nothing scheduling algorithms are the most basic, simple building blocks for thinking about scheduling. And using round robin and random, you can very easily develop an intuition about, for example, how long it will take a certain number of threads to complete their tasks. In both these cases, if I have a bunch of threads that all need to do 500 milliseconds of work and I schedule them this way, the completion times are gonna be very similar, because I'm just switching between them in little pieces and everybody's gonna finish around the same time, right?
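Round robin is the same sketch with a real queue in place of the pile. Again this is a toy simulation; the `run` callback and its outcome strings are invented for illustration:

```python
from collections import deque

def round_robin(ready, quantum, run):
    """Round robin: same mechanism as random, but the ready list is an
    ordered queue.  `run(t, quantum)` is a caller-supplied stand-in that
    returns 'blocked', 'expired', or 'exited'."""
    ready = deque(ready)
    waiting = []
    while ready:
        t = ready.popleft()        # the head of the queue runs next
        outcome = run(t, quantum)
        if outcome == 'expired':
            ready.append(t)        # back on the tail: ordering preserved
        elif outcome == 'blocked':
            waiting.append(t)      # would rejoin the ready queue on wakeup
    return waiting

# Thread 1 expires once (so it runs again after 2 and 3), then exits.
calls = {}
def run_once(t, q):
    calls[t] = calls.get(t, 0) + 1
    return 'expired' if (t == 1 and calls[t] == 1) else 'exited'
round_robin([1, 2, 3], 10, run_once)
```

The only difference from random is `popleft` and `append` on a `deque`: an expired thread goes to the tail, so everyone else gets a turn before it runs again.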
And again, these are not really useful algorithms, but we can use them sometimes to think about how other algorithms work, and they are very, very simple. So one feature of these algorithms that I think it's important to point out is that they don't reward, and in some sense they even penalize, threads that yield or block before their quantum expires. So let's say you wrote a piece of code that needs to run every once in a while; it wakes up and does a certain amount of work. And let's say you said, okay, I'm gonna try to be a good system citizen; I'm gonna rewrite my code so it uses half the amount of time. In this scheduling algorithm, nothing changes. There's no reward given to you as the programmer for your effort to better utilize the CPU. You're just gonna run for less time. Now, you may be able to do the same amount of work, but there are times when we might wanna say, okay, can we find a way to reward threads that give up their CPU quantum early? Because on some level, I said you can use this much CPU and they used less. That's a nice thing, a good thing, and I might wanna remember that in the future, okay? We'll talk later about algorithms that do have memory when it comes to this. Now, I just wanna point out one exception, because this will come up on Friday. In some ways, round robin can be thought of as a primitive that you can use in other situations. So for example, once I've sorted out all the other information I have about the threads on the system, I may be left with a set of threads that are essentially equivalent: all the same priority, all in the same queue, whatever. And in that case, I may just choose to round-robin through them.
So this is the only case where you would see something like this in a real system. All right, any questions about the know-nothing algorithms? Okay, so let's go to the opposite extreme. Let's talk about the know-it-all algorithms, okay? The know-it-all algorithms can predict the future. And what might we wanna know about a thread's future? What might we wanna know about the thread's future that would allow us to make a better scheduling decision? Anybody? Okay, sorry, I keep doing this. Sonali, you're a TA, you can't answer questions. Whether it's gonna wait, right? In the next quantum, is it gonna wait? What else? What's that? Okay, that's interesting, actually. I didn't think of that: deadlock. So I might wanna know if it's gonna deadlock. I'm not sure what I would do, other than not schedule it for a long period of time to postpone the problem. How long it's gonna take until, what? Until it blocks, or, yeah, how long is it gonna run? Like, how long is the next series of uninterrupted CPU instructions it's about to execute? What else? I think I heard something up in the front of the room that sounded right. Yeah, so for example, how long is the IO it's gonna do going to take? If it's gonna block, how long is it gonna remain in the waiting state? So I might wanna know: how long is it gonna use the CPU? How long is this next CPU burst, until it either blocks or yields? Will it block or yield, particularly in the next time quantum? I mean, most threads eventually will block or yield, okay? Because if you have a thread that constantly performs computation and never communicates with anything, never does IO, then on some level you can imagine that the utility of that kind of thing is limited, right?
So I would argue that while some processes go through long periods of time where they don't do any IO, eventually they will do IO and they will block. And then if it blocks or yields, particularly if it blocks, how long is it gonna wait? How long is the thing it's waiting for gonna take, okay? So let's look at a scheduling algorithm where we actually have some intuition about what's about to happen. And let's say what we know here is how long each thread is going to run before it blocks or yields. In this case, none of these threads are doing massive amounts of computation, so they all block pretty soon. And one way to stack these up is what's called shortest job first. This is another scheduling algorithm you guys will hear about. What I did here is I just ran the threads in order of the amount of time it's gonna take them to block. Now, can anybody see what property this system minimizes? Why would I do something like this? What's that? Well, the context switch overhead is gonna be the same, because I have to do the same number of context switches. Potentially, but let's assume for now that I'm gonna run these guys until they block; I'm not gonna split them across scheduling quanta. If you do that, there's a variant of this called shortest remaining time first: I can run a thread for the scheduling quantum, then basically lop off that portion, put it back on the ready queue, and put it back in order, okay? So throughput, I think, is the same. What turns out to be minimized here is the response time: the time that each thread waits to run, I think, is minimized by this approach. So let's look at why this is true. I'll come back to this. I'll give you a graphical depiction in a minute, and we'll include some additional information, right?
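You can convince yourself of the shortest-job-first claim with a little arithmetic: if jobs run back-to-back without preemption, each job waits for the sum of all the bursts ahead of it, and sorting the bursts in ascending order minimizes that total. A sketch, with made-up burst lengths:

```python
def total_wait(bursts):
    """Total time spent waiting for the CPU when the given bursts run
    back-to-back, in this order, with no preemption."""
    wait, elapsed = 0, 0
    for b in bursts:
        wait += elapsed    # this job waited for every burst before it
        elapsed += b
    return wait

bursts = [8, 1, 3]                                # made-up burst lengths
best = total_wait(sorted(bursts))                 # shortest job first: 0 + 1 + 4 = 5
worst = total_wait(sorted(bursts, reverse=True))  # longest job first: 0 + 8 + 11 = 19
```

The long burst contributes to everyone else's wait when it runs first, which is exactly why putting it last helps.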
But again, let's go back to this question: why would we prefer threads that give up the CPU before their time quantum ends? Why would we wanna run those threads or give them some preference? Right, but okay, that's true, but why is that? What are those threads doing? So let's say I have a bunch of threads that are gonna run for different amounts of time and block. Well, they're still gonna take up the same amount of CPU time, but remember, we talked on Monday about how the CPU gates access to every other resource on the system. So say I have two threads. One thread is going to run for its entire quantum; it's computing digits of pi. The other thread just needs to use the disk, just needs to start another disk IO. If I run the thread that needs to start the disk IO first, that IO can be going on while the other guy is computing digits of pi. If I run the guy who's computing digits of pi first, that disk IO is gonna complete much later, because I've stalled the guy who just needs to do one little thing. So in general, remember that threads that use the CPU for a brief burst, especially threads that block, go into a state where they're waiting for something else. And to the degree that I can sometimes prioritize those threads, I can allow those things to happen in parallel; I can do a better job of sharing all the resources on the system. I can say, okay, you just need the CPU for a couple cycles because you wanna redraw something on the screen; you do that very quickly, and then I run the guy who's computing digits of pi. So this kind of interleaving can be very helpful. So let me go through an example of this, right?
So let's say I have my five threads, and let's say I not only know how long they're going to run for, but I also know how long whatever they do next is going to take. So these threads are gonna run, and they're gonna block, and then whatever they blocked for is gonna complete at this time, right? Does this picture make sense to everybody? So thread five is gonna run until here, then it's gonna block, and whatever it blocked for is gonna finish here, okay? So let's schedule these longest job first, right? So let's start with T3, okay? So I schedule T3, and now I schedule T1, I schedule T4, I schedule T5, I schedule T2, right? Let's say I wanna run all these threads before I make another scheduling decision, right? So here is the total waiting time, right? And what I mean by waiting time is the time after whatever the thread was waiting for completes, right? And this guy's not waiting for anything, he's just waiting for the CPU again, right? This guy is some sort of long-running computation, right? All these other guys did something that blocked, and now they're waiting for the CPU, right? To respond to whatever the input was, or to continue their processing, or whatever. So look at this line and get some idea for the area under that red curve, right? That's the total waiting time over all the threads, okay? Now let's assume that I put them in this order, right? Now look at what's happened, right? Smaller, right? Smaller total waiting time over all threads. And the reason for that is that I've managed to kind of parallelize a lot of these waits, right? A lot of these waits are happening while somebody else is using the CPU, right? And especially T3, right? So T3 is helping out a lot here, because it's running while this guy's waiting, right? So in general, this is hopefully designed to give you some intuition for why I might want to start with the shortest job, right?
And we'll see in general that when we schedule threads, whether a thread blocks within its quantum or completes the quantum and has to be preempted is frequently used to guess something about what that thread is doing, right? Is it a long-running computation that constantly hits the end of the quantum? Or is it an interactive task that wakes up, runs a couple of instructions, and goes back to blocking again, right? And that input is used in a variety of ways in a large number of scheduling algorithms. So let me talk about, so, right, yeah. So again, normally we can't predict the future, and here are the reasons why. These probably seem obvious to people, right? So the choice of quantum is largely a function of the trade-off between the ability to make more frequent scheduling decisions and the overhead of making the scheduling decisions themselves. So on most systems, if I choose a quantum that's too large, what happens, right? Well, somebody who's waiting on IO, that IO finishes, but somebody else is running, and their quantum is so big that your system starts to seem sluggish. If I choose a quantum that's too small, then I have to make scheduling decisions over and over and over again, and the overhead of that starts to add up, right? So in general, if the scheduling quantum is too long, then interactivity starts to suffer. If the scheduling quantum is too short, then the overhead of scheduling starts to dominate, right? So that's the trade-off, right? Yeah? No, no, no. The quantum in many cases is adjusted based on other things, right? So on Friday especially, we'll talk about how the quantum assigned to threads in more mature schedulers is varied, right? So high-priority threads will get larger quanta, right? Low-priority threads will get shorter ones. So the quantum does not have to be fixed, right? It just has to be some multiple of the timer interrupt, right?
So this also means that on a lot of systems, the timer interrupt will fire more often than the scheduler runs, right? So it's possible that on a Linux system, the timer interrupt fires and the scheduler actually doesn't run, right? The system just handles the trap, says, I don't need to make a scheduling decision, and gets out as fast as possible, right? But you need to be woken up periodically, because sometimes you do need to reschedule things, right? So every time the timer interrupt fires, you may not make a scheduling decision. You may just get out of there as fast as possible, right? All right, so again, what we're gonna try to do is use the past to predict the future, right? We can't predict the future. Control flow is unpredictable, and users are unpredictable. A lot of the unpredictability of systems comes from the fact that the system is not able to read your mind and figure out what you're about to do to it, right? So let me present, so this is kind of a classic chestnut of a scheduling algorithm that we have to talk about, but it's actually pretty important, right? So let me present it first and then we'll walk through an example, okay? So multi-level feedback queues are essentially designed to prioritize threads that block before they finish their quantum and penalize threads that complete their quantum. And the way I do this is I add a new abstraction to my scheduling model, and that abstraction is levels, right? So I imagine that I establish some number of levels, and each level is represented as a queue, okay? Or even a pile, maybe. My scheduling algorithm will always choose threads from the highest level first, right? If there's a thread in level zero and that's my highest level, that thread will run before a thread from level one, okay? And so how do I move threads between levels?
If a thread blocks or yields, and this doesn't always happen immediately, but if a thread blocks or yields, the simplest thing is to say, okay, you know, you're a good citizen, you didn't use very much CPU, you used less than I allocated for you. So I'm gonna move you up a level, okay? If a thread completes its quantum, I'm gonna toss you down a level. Now in general, there's usually more hysteresis than this, right? So usually I might say, you know, the thread completed its quantum 10 times and only blocked once, so I'm gonna move it down, right? After a certain number of scheduling decisions. You can imagine that the immediate version produces a huge amount of dynamism, right? Threads can shoot up and down very fast, and generally I may wanna smooth that out, and I do that by not moving them immediately, but kind of watching the behavior over a number of quanta, right? And then making a decision about how to move them between queues, right? But this is the simplest form of this, right? If you block, you go up. If you hit the end of your quantum, you go down, right? So let's go through an example of how this works, okay? Yeah? How would you do that? Yeah, but the system may say, you know, my decision is based on the proportion of your quantum that you use, right? Yeah, so you're right. This, as I've described it, has a lot of very hard edges, right? You know, if I hit the end of the quantum, I immediately go down, right? But I might instead make a decision about, again, what proportion of your quantum you've used over the last 10 quanta, right? And if you're using small amounts of it, which indicates that maybe you're interactive, I'll move you up. And if you're using the whole thing every time, I'll move you down, right? So yeah, this is usually done in a smoother way, right? But I think this version is the easiest to understand, right? All right, so let's go through an example of this, right?
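A minimal sketch of those two rules in Python (the class name `MLFQ` and the three-level setup are my own; as noted above, real schedulers smooth these transitions over many quanta rather than reacting to a single event):

```python
from collections import deque

NUM_LEVELS = 3  # level 0 is the highest-priority level

class MLFQ:
    """Simplest-form multi-level feedback queues: block/yield moves a
    thread up a level, expiring the quantum moves it down a level."""

    def __init__(self, threads):
        self.level = {t: 0 for t in threads}   # everyone starts on top
        self.queues = [deque(threads)] + [deque() for _ in range(NUM_LEVELS - 1)]

    def pick(self):
        """Choose the next thread from the highest non-empty level."""
        for q in self.queues:
            if q:
                return q.popleft()
        return None

    def blocked_or_yielded(self, t):
        """Good citizen: used less CPU than allocated, so move up a level
        (a blocked thread would sit in the waiting queue first; this sketch
        re-enqueues it as if its wait has finished)."""
        self.level[t] = max(0, self.level[t] - 1)
        self.queues[self.level[t]].append(t)

    def quantum_expired(self, t):
        """Used its whole slice: move down a level."""
        self.level[t] = min(NUM_LEVELS - 1, self.level[t] + 1)
        self.queues[self.level[t]].append(t)
```

For example, a thread that keeps expiring its quantum sinks one level per call to `quantum_expired` until it hits the bottom, while a thread that blocks climbs back toward level 0.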
So now I don't just have a ready queue, I have levels: top, middle, and bottom. And I still have my waiting queue, okay? So I start by running T3, you know, just pick something out of the top queue. Everyone starts in the top queue. And T3 blocks. So T3 starts to wait. Now when T3 is finished waiting, where will it go? Top, right? I can't move it up, it's already at the top, so it goes back into the queue it's in, right? Okay, now T5 ran and hit the end of its quantum. So what do I do with T5? It stays in the ready area, but it's moved down to the middle level, right? Okay, now T4 runs and blocks, what do I do with T4? It goes into the waiting queue, and when it's done waiting, where will it go? Back to the top, right? Now I run T1. T1 also, now, okay, trick question: where does T1 go? It depends, right? Maybe T1 yielded, so T1 goes back to the ready queue, back to the top, right? I only go to the waiting queue if I blocked, right? If I yield, I'm still runnable. So I'm back on the top level, okay? Now I run T2. T2, sorry, didn't block, it hit the end of its quantum, so it goes down to the middle, right? Now I run T1 again, and I've started to fast forward a little bit here, so T1 blocks, it goes into the waiting queue. Now what happens? What thread do I run? T5 or T2, right? The top level's empty, okay? So now I have to go down a level, and, you know, maybe I run T5. T5, sorry, does not block, ah! T5 hits the end of its quantum again, so I toss it in the bottom level, okay? And now who do I run? T2, except, oh wait, T3 finished waiting. Now who do I run? T3, okay? So this is multi-level feedback queues, okay? Questions? The IO that it initiated completed, right? No, no, no, I mean T3 didn't decide, right?
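If you want to check the bookkeeping in that walkthrough, you can replay just the level transitions. This little sketch (thread names match the example; everything else is my own scaffolding) applies the two rules to the sequence of events above:

```python
# Replaying the example: 0 = top level, 1 = middle, 2 = bottom.
# All five threads start on top.
level = {t: 0 for t in ["T1", "T2", "T3", "T4", "T5"]}

def expired(t): level[t] = min(2, level[t] + 1)   # used the whole quantum: down
def blocked(t): pass   # rejoins its current level once the wait finishes

blocked("T3")   # T3 blocks, later returns to the top level
expired("T5")   # T5 hits the end of its quantum -> middle
blocked("T4")
blocked("T1")   # (or it yielded -- either way it stays on top)
expired("T2")   # T2 hits the end of its quantum -> middle
blocked("T1")
expired("T5")   # T5 expires again -> bottom
print(level)    # {'T1': 0, 'T2': 1, 'T3': 0, 'T4': 0, 'T5': 2}
```

So at the point where T3's wait completes, T5 is at the bottom, T2 is in the middle, and everyone else is back on top, matching the picture in the example.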
T3, so maybe what T3 was doing was waiting for you to hit a key, and right in that moment you did, and so now it's done waiting, and the kernel will move it back into the ready queue, right? Right, exactly. If T3 had not finished waiting, then T2 would have been run next, exactly, right? Could you potentially starve one of the threads? Ah, yes, could you potentially starve one of the threads? Yeah, so let me get through this quickly, and we'll come back to your question, Isaac. So what happens to CPU-bound threads in multi-level feedback queues? Where do they go? Down. They descend to the depths of the levels, whatever, anyway. I could keep going, but you can tell where I'm going with it. IO-bound threads, where do they go? They rise to the heights, whatever. Okay, they go up, right? Now, can anyone spot a problem with this approach? I'll preempt this question because Isaac already asked it. Yes, right, so in general, one of the things that we need to worry about when we write a scheduling algorithm is starvation. Can a situation arise where a thread or a set of threads never runs? In general, this is not good, right? I mean, remember, I want the illusion of concurrency, okay? I'm not willing to say some threads don't exist so I can provide a better illusion of concurrency to some other threads, right? If the thread is on the system, it should run, right? And actually, again, we'll come back to this on Friday, because some of Kolivas's work focuses on this, right? You know, fair scheduling: every thread should be able to run sometime, once in a while, right? Certainly not never, okay? And how does starvation happen here? Well, if threads in the top queue keep doing little short bursts of IO that complete, they're just gonna keep bouncing back and forth between the waiting queue and the top level.
And if there's enough of them, that can go on ad infinitum, and the threads down in the basement never run. Have you guys seen Office Space? Yeah, so you know the whole scene with Milton where they keep moving his office. That's kind of what happens to the threads in this scheme, right? We're moving you to storage B. Yeah, so you don't want to be down there, especially if there's a bunch of threads in the top level that are kind of bouncing back and forth, right? One solution to this that people have used, and some people who implement MLF queues for assignment two do this, is to periodically just kind of start over, right? To periodically reboot your scheduler and be like, okay, that guy at the bottom hasn't run for a long time, so everybody back to the top level, right? And this is kind of a hack, right? It's not really clear that this is an elegant solution to the problem, shall we say. There are other solutions that maybe are more elegant, okay? So before we finish, let's talk a little bit about ways of incorporating input from outside the scheduler. So up until now, we've talked about schedulers that incorporate no input, schedulers that can predict the future, and then the MLF queue approach, which uses the recent past as a predictor of the thread's future, right? And one way that we can incorporate some external information into this system is by using priorities, right? So priorities are a common way that we as users think about scheduling. And the reason for that is that's one of the interfaces that's exposed to us as users to adjust the scheduling algorithm, right? And you can imagine, you know, either you or potentially the system itself can dole out these priorities, right? So maybe you are a hardcore Linux hacker and you realize that your video is choppy, so you go and you nice it and you give it a boost, right?
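That periodic "reboot" can be sketched in a few lines (this is the hack as described, not the scheme any particular kernel uses): every so often, dump every lower level back into the top one so nobody stays stranded.

```python
from collections import deque

def boost(queues):
    """Periodic anti-starvation 'reboot': move every thread back to the
    top level so threads stuck at the bottom get a chance to run again.
    This is the blunt-instrument fix; gentler schemes age threads upward
    gradually instead."""
    top = queues[0]
    for q in queues[1:]:
        while q:
            top.append(q.popleft())

# T5 has been stranded at the bottom while T1 bounces around the top.
levels = [deque(["T1"]), deque(["T2"]), deque(["T5"])]
boost(levels)
print([list(q) for q in levels])   # [['T1', 'T2', 'T5'], [], []]
```

You'd call `boost` on some coarse timer, say every few hundred quanta; the cost is that, right after a boost, CPU-bound threads briefly compete at the same level as interactive ones.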
What's probably more likely is that your Ubuntu system is set up to run different tasks with different priorities, right? So the indexing task runs at a very, very low priority to keep it out of the way of stuff that you might be doing. So you may not see these priorities, but they're still in action, right? One of the problems with priorities is that priorities are relative, right? They're always relative, on some arbitrary scale. You know, the old Linux scheduler, I guess, had 140 different scheduling priorities. I have no idea where they came up with that number. I was looking at how it was implemented, and it was like, you know, they search for a bit in five words, five 32-bit words. And I was like, does that work out to 140? No, they could have had 160 with no problem, right? But somehow they decided on 140. I don't know where that came from. So anyway, because priorities are relative, you can imagine that that creates these issues, right? Where, you know, what you really want is kind of an ordering, but that ordering could be established at a variety of different places within the overall priority scheme. So this creates some issues. One thing that strict priorities can do is starve threads very easily, right? So if I use strict priorities and I'm always running high-priority threads, right, and those high-priority threads are always ready to be run, again, I have a starvation condition where low-priority threads may never have access to the CPU. So one sort of clever solution to this is something called lottery scheduling. Lottery scheduling was proposed, I think, maybe 15 years ago, and got a little bit of attention at the time because it's kind of a slick way to think about scheduling, right? So rather than doing strict priorities, what I do is I give out lottery tickets, right?
So imagine I hold a lottery in this room to decide who's gonna answer the question, and rather than giving you guys one ticket each, I give each one of you a different number of tickets, and then I collect all the tickets and choose one at random. So what that allows me to do is establish priorities, right? If I give you a large number of tickets, it means that I really want you to be chosen to answer the question, right? But at the same time, everybody has a chance, right? So you never have a zero chance of being scheduled, right? Even if you only have one ticket, it might be your lucky day. Lottery scheduling also provides a nice solution to the priority inversion problem, which we didn't discuss, and maybe I'll put something up about that because it's kind of cool. And the other thing I might do with a priority, rather than using it to figure out which thread to run, is, as somebody pointed out before, I might use those priorities to determine how long the quantum is for the thread, right? So all the threads may still be run, but the higher-priority threads may be run for longer quanta, right? I may not stop them for a longer period, okay? So just to finish up today, priorities and lottery tickets are, I think, kind of cool examples of scheduling abstractions, right? We've talked in this class about abstractions; we talked about the thread abstraction, and we're going to talk more about address spaces and other things in the future. But again, both priorities and tickets are these fictional things that we've decided to map onto what is really just a stream of instructions that is competing with other streams of instructions. So it's kind of a neat way to think about it. All right, I'm done on time for a change. So again, on Friday we're going to talk about the completely fair scheduler. If you missed my pitch at the beginning, this is a really, really cool story, right?
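The lottery draw itself is just a weighted random pick. Here's a sketch (the ticket counts and thread names are invented for illustration): each thread's chance of winning is proportional to its tickets, but nobody with a ticket has zero chance.

```python
import random

def hold_lottery(tickets, rng=random):
    """Pick the next thread to run: draw a ticket number uniformly at
    random, then find which thread's block of tickets it falls in."""
    total = sum(tickets.values())
    winner = rng.randrange(total)
    for thread, count in tickets.items():
        if winner < count:
            return thread
        winner -= count

# Hypothetical allotment: the interactive editor gets most of the tickets,
# the background indexer gets a few -- it still gets scheduled sometimes.
tickets = {"editor": 80, "indexer": 5, "pi_digits": 15}
wins = {t: 0 for t in tickets}
for _ in range(10_000):
    wins[hold_lottery(tickets)] += 1
print(wins)   # roughly proportional: ~8000 editor, ~500 indexer, ~1500 pi_digits
```

Note how this sidesteps strict-priority starvation: the low-ticket indexer wins a small but nonzero fraction of the draws instead of never running.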
So this Australian anaesthetist did all this Linux hacking and maintained all these really neat pieces of code, and then one day he was just like, I'm done, I'm tired of this. So we'll talk a little bit about Linux development, we'll talk a little bit about his contribution, and we'll talk about the Brain Fuck Scheduler, which is his new contribution to interactive scheduling on Linux, all right? I will see you on Friday.