All right, good morning, everybody. Welcome back for another week of class. Looks like it's starting to thin out a little bit. OK, so today we're going to talk about a real Linux scheduler. So this is kind of intended to be a little bit of a fun lecture, not that all my lectures aren't fun, but maybe this one is funner than some of the other ones. So I'm going to introduce you to something called the Rotating Staircase Deadline Scheduler, which is a neat attempt at improving interactivity and intelligibility within the Linux scheduling subsystem. We'll talk a little bit about Linux development, give you guys a little bit of an introduction to how things work, or don't work sometimes, it seems, in the Linux community. And yeah, so that's the plan for today. So yeah, I mean, at this point, my suggestion would be that you guys be wrapping up assignment one. If that's not where you are, then you're behind. And if you guys are still getting started with assignment zero, then you're really behind at this point. So I can't say this enough. You guys are seniors, graduate students, so this is your responsibility. If you leave this stuff to the last week in the class, you will do badly. And that's really just up to you as far as this goes. So this week, recitations are going to start covering assignment two. And that's kind of where you should be at. Again, please, I think there's been some nice responses to things in Piazza where people have been able to answer some of each other's questions. That's cool. Please feel free to help each other with the assignments, subject to the constraints that are set by the collaboration guidelines. So talking about code is great. Talking in code, do people do that? That is not so good. Various ways of doing that. And we will start to roll out the similarity checker once we start to see some of the bigger assignments come in, assignment two and assignment three. So please don't try it, because it's a sad thing to have to go through that process.
All right, so last week we finished up talking about scheduling algorithms. So we looked at some fairly simple scheduling algorithms. We looked at some scheduling algorithms that rely on no information. We looked at some scheduling algorithms that rely on more information than a realistic system would ever have. And then we introduced multi-level feedback queues, which is kind of a classic example of a realistic scheduling algorithm that uses the past to predict the future. So any questions on this stuff before we do a little bit of review? OK. So what kinds of information could schedulers use to choose the next thread to run? Satish. OK. Yeah, OK, if I knew the amount of time that the task was going to take to complete, that would be some information I could use. Dan, what's something else? Yeah, so maybe some information about what it did last time, how long it ran until it blocked. What are other pieces of information? Swatha, give me one. What's that? Priorities, right? So I have a way of layering on information from outside the system, potentially. Yeah, maybe what it needs, right? So what will happen next, right? So there's this whole family of things I might want to know that fall into this category, right? And these are the things that are available to oracular schedulers. Typical schedulers try to use the past to predict the future, right? So some information about what the thread just did. And then a variety of ways of incorporating some user input, or sometimes just some predefined system input, into what's important and what's not, right? So sometimes the operating system will run some of its threads at a very high priority because it knows that these are very important, right? Or some of your systems, without your knowledge, may run things at different priorities.
So when you boot up Ubuntu, it's configured to run different daemons with different priorities because it has some information about how important those are to the overall functioning of your system, right? So this may not be you producing this input, but it's an extra input coming into the system. All right, two examples of schedulers that don't use any information about threads or anything else, really. Richard, give me one. About nothing, no information. Round-robin. OK, round-robin. Just run things in some predefined order over and over again. Bart, what's another one? Yeah, just random, right? So these are canonical examples of schedulers that don't use any outside information, right? So OK, so we just started to get into this, but let's say that we knew something about the future. Somebody said we would want to know how long that thread would run until it blocks. What else might we want to know about that thread? Yeah, give me an example of something. OK, OK. I said block, but that's true, right? In general, how long it'll run before it stops for some reason, right, before it calls yield or blocks, right? Thornton, what's something else I might want to know? Thomas? If it blocks. Yeah, so that's a good point. How long, I mean, assuming it's going to finish at some point, right? But will it finish before the next time quantum ends? Simon? Yeah, maybe what kind of hardware resources is it going to require? Is it going to do a network read, right, or something? Wembley, what else? How long it might block for? How long it might block for, right? These are all great answers, right? How long is it going to use the CPU? Will it block? How long will it wait if it starts to wait, right? OK, those are great. All right, so instead of predicting the future, we... you're saying it wrong. You guys have got to know the chant by now, right? Use the past to predict the future, right? You could also say "look at," but I think "use" is just nicer.
You don't have to look at it. Sometimes you could not look at it, but still process it in some way, right? We use the past to predict the future, right? This is classic operating system design, and we'll come back to this idea over and over, OK? Maybe not at 9 or 10 AM on a Monday morning. OK, so who remembers multi-level feedback queues? Who remembers that they exist? OK, wow. It wasn't just me that doesn't remember Friday that well. OK, so multi-level feedback queues. What's the first step? That's the first thing I have to do, Navia. I'm going to set up a multi-level feedback queue system. She doesn't know. What's that? OK, well, that's on my list of things to do. I don't know if it's the first thing. Alyssa, what about a first thing? OK, I have to set up some queues. That's also good, right? But maybe even before that, Sean. OK, yeah, I might assign. Well, I don't even know if feedback queues need priorities. Feedback queues kind of assign priorities naturally, right? What's something else that I'm forgetting here? What's a parameter of this system that's a fairly basic parameter? Manny? Choose a scheduling quantum, right? I'm going to let threads run. Remember, this is what makes the decisions for me, right? Whether or not threads hit this barrier, right? I let the thread go on the processor. If it maxes out the scheduling quantum, then what do I do to it? Where does it go? Up or down? Down, right? But this is pretty important, right? So I need to pick this. So then I establish some number of queues, as Alyssa said, right? Each represents a priority level, essentially. So priorities kind of emerge from the system more naturally, right? I could layer priorities on top of this, and there's some fairly obvious ways to do that you guys can think about. I'm choosing threads from the highest-level queues first, right? And then what do I do? What's my algorithm? How do I run this thing? Andrew, tell me the first thing I do.
OK, so I find the highest occupied level, and I pick a thread from that. How would I pick a thread out of the level? Yeah, Jeremy. Would you pop it out of the queue? I could pop it out of the queue, but let's say I have a bunch of threads at that level. What would I do at that point to pick one? Yeah, I just do something dumb. Do random or round-robin or something. All the threads are essentially equivalent, right? They're all sitting there in the same queue, right? So I pick a thread from the highest occupied queue. Sam, what do I do now? Right, so I choose the thread. If the thread blocks or yields before its scheduling quantum expires, I move it up. If it doesn't, I move it down, right? And again, there's usually some more hysteresis that we sort of impose on this process, right? So we probably won't move it up immediately, but we might start to think about moving it up if it yields, right? And we might just start to think about moving it down, right? OK, so this is pretty much the algorithm, right? So again, CPU-bound threads. Where do CPU-bound threads go, Nathan? Yeah, so CPU-bound threads are going to keep hitting the end of their quantum, right? And they're going to go down, right? Peng, what about IO-bound threads? Right, where else would they go? Someone's got to go up, huh? Except in Wembley's system, where everyone is dungeoned. OK, so this is MLFQ. So, and again, why would we prefer threads that give up the CPU before their time quantum ends? Why would certain types of schedulers, including some we're going to talk about today, do this? Yeah. What's that? Well, OK, they're fully executed, so they might have exited. That's true. But what else might they have done, right? Yeah, it does do that, but why does it minimize the wait time, yeah? Yeah, OK, that's kind of the same thing. Might do it again, but what, I'm assuming that they're going to run a little bit. They're going to block, and then what else is going to be happening? They blocked because why?
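The MLFQ loop we just walked through, pick from the highest occupied queue, demote threads that exhaust their quantum, promote threads that block or yield early, can be sketched in a few lines. This is a simplified model, not any real kernel's code; the class name, the three-level setup, and the `ran_full_quantum` flag are all illustrative assumptions:

```python
from collections import deque

# A sketch of multi-level feedback queues (MLFQ) as described above.
# Simplified model: three levels, one-step moves, FIFO within a level.
NUM_LEVELS = 3  # queues[0] is the highest-priority level

class MLFQ:
    def __init__(self, num_levels=NUM_LEVELS):
        self.queues = [deque() for _ in range(num_levels)]

    def add(self, thread, level=0):
        self.queues[level].append(thread)

    def pick(self):
        # Pick from the highest occupied level; within a level, threads
        # are equivalent, so plain FIFO round-robin is fine.
        for level, q in enumerate(self.queues):
            if q:
                return level, q.popleft()
        return None, None  # nothing runnable

    def report(self, thread, level, ran_full_quantum):
        # Threads that max out their quantum (CPU-bound) sink; threads
        # that block or yield early (IO-bound) rise. Real schedulers add
        # hysteresis; here we just move one level at a time.
        if ran_full_quantum:
            level = min(level + 1, len(self.queues) - 1)
        else:
            level = max(level - 1, 0)
        self.queues[level].append(thread)
```

In this sketch, a CPU-bound thread that keeps hitting its quantum sinks to the bottom queue and stays there, while a thread that keeps blocking climbs back toward the top, which is exactly the sorting behavior described above.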
Spencer, I'm trying to get that, Spencer. What's that? Who are you? Sean. Sean, why, if they blocked, what are they doing? What's likely about the system? What just happened? Yeah, maybe. Jeremy, OK, but they're waiting for something to happen. I like that answer, but let's keep going with that. What does that mean, yeah? Yeah, they've engaged some other part of the machine, right? Remember, I have all these different resources, right? Some of them don't require the CPU to be active to use, right? So I have network sockets. I have disks or whatever. And if a thread runs for a little while and blocks, hopefully what it's done is it started up some other part of the machine doing something, right? And now, if I have a thread that really needs CPU time, I can run those two threads on top of each other. And the thread that needs the CPU can be busy while the thread that needs the disk has the disk doing whatever it needs to be doing, right? So the idea is that they're probably waiting for something else, which can be parallelized with the CPU, right? All right, any questions on this stuff before we go on? OK, good. So again, today is kind of like story time. So when I was working on this lecture last year, I was like, oh, OK, I'm going to talk about this scheduler, because that's still the scheduler that's the most up-to-date. And then I started doing some research, and I discovered that it wasn't up-to-date. You know, it's not necessarily a part of a system that you would expect to turn over often, right? You would think maybe after, I don't know, 50 years, right? That we'd have figured out thread scheduling, right? Given that it's kind of important, and a lot of people have worked on Linux over the years, maybe we would have figured this out. But it turns out there's still a lot of ongoing development. And the story that emerges is kind of interesting, right?
So the first thing that it provides some insight into is the fact that Linux as a free and open operating system is now supporting massive numbers of devices, right? So Linux runs on phones, it runs on servers, it runs on desktops and laptops and embedded devices and all sorts of things, right? And so there's this big community that's emerged. And there's not always a lot of, you know, there's not necessarily one voice, frequently, when it comes to how to do some of these things, right? And there is space within the Linux world for a fair amount of diversity, which is part of what makes it really resilient, right? But at the same time, you might think to yourself, you just might conjecture, for example, that maybe it's unlikely that one scheduler would work well on all of these types of devices, right? Given their different needs and different resources and different expectations that people have of their performance, right? Some of this, right? You have a big community. It's kind of a decentralized community. People have strong opinions. People in charge are known for having strong opinions. So sometimes what comes out of this is a little bit of tension. And then, you know, the fundamental difficulties in just running a project like this, right? You have a project that has, you know, thousands of volunteers, right? Those people are doing this because they think it's fun, but they also do want to have some impact, right? They want to see their stuff, you know, become part of, you know, the mainline tree or become used by other people. And so there's some of this as well, right? And then also, again, we talked a little bit about this before, right? A lot of scheduling comes down to performance. A lot of performance comes down to trying to measure stuff, right? But then what you find, at least according to one of our protagonists today, is that there weren't a lot of good tools out there for interactive benchmarking, right?
There's plenty of tools for, you know, running server performance and throughput tests and all sorts of things like that, right? And that community, for reasons that we'll talk about in a second, may be overrepresented when it comes to thinking about Linux design and performance evaluation, right? But we'll talk a little bit about some new tools for interactive benchmarking. And then also, like, this guy, you know? Like, he's an anesthetist, right? Like he, I guess, spends his days hovering over operating tables where somebody's, you know, and then at night he goes home and hacks Linux, right? So that's kind of cool, right? I mean, even if you guys have a day job, you know? So, you know, if you ever wonder if you can get involved with Linux, I think this is kind of a neat example of the fact, a neat existence proof, right, that you can, right? And still have a life and have kids and have a family, you know, make some money doing something else. So this was from a year ago. So this has probably changed, right? But, you know, this is Linux, right? So, you know, 9.2 million lines of code, right? And most of it, I should say the majority of it, is in device drivers, right? Code that's supporting very, very specific hardware components, but just a huge code base, right? I mean, this is pretty impressive. I wish I had a line count for your OS/161 kernel, but it's nowhere close to this, right? It might be over 10,000, but it's not 9.2 million, right? A development community, I mean, it's difficult to, you know, estimate how many people are working on Linux at any given time, but it's several thousand, right, sort of active developers that are actively contributing code into the mainline kernel. And then, you know, putting out new kernels every three months, right? Roughly, yeah, Jeremy. Yeah, almost all of it, I think. But, yeah, so C is still in many cases, and I think this is even true at Microsoft.
I shouldn't speak for that because I haven't seen their code base in a long time, but, yeah, C is still the language of choice for low-level systems development. Yeah, and so this is a really prolific project, right? And really, I think in many ways, we're used to it, right? As computer scientists, we're used to Linux being there, right? But the fact that Linux exists is pretty wild, right? And it's pretty neat. So, yes, big caveat. I don't actually know much about Linux. I've never contributed to Linux, never submitted a patch to the Linux source tree. Thanks to Guru, I'm actually getting more involved in Linux now, and it's kind of fun. But I don't know much about this. I'm an outside observer, right? I don't know any of these people. But let's talk a little bit about the development process, right? So, the idea in Linux is, you know, what you want is somebody to be responsible for something, right? You have a big project, lots of volunteers. The idea is that every file, you know, every part of the tree should have a maintainer, right? And that maintainer is the person who gets bug reports, right? Something goes wrong. You're the maintainer. You get an email from someone saying, here's what went wrong. I think this is in your piece of code, right? Go fix it, right? Each subsystem of the kernel, right? So, the scheduling subsystem, right? Which is built up of a number of different components, right? The memory management subsystem, right? The, you know, different parts of the IO subsystem. They all have their own maintainers, right? And that person is kind of supervising maybe a small group of people or a large group of people that are working on the things that go into that subsystem, right? And then at the very top of the tree, you have these guys, right? You know, Linus and Andrew, who are really, I think, and I haven't updated this in a year, but at least at the time that I wrote these slides, they were the gatekeepers, right?
So, they're the people who, you know, if you want a patch in Linux, right? If you want to change something in Linux, one of these guys is gonna look at it, right? With very, very high probability, right? So, to some degree, there's a lot of people working, right? But there's still a pretty small handful of people with, you know, the keys to the kingdom, right? The people who really can get stuff onto the mainline Linux tree, right? So, and essentially what Linux does, and I think this is a pretty neat and sort of healthy thing, right? Is that Linus, Andrew and a few other people maintain this mainline Linux kernel tree, right? So the mainline kernel tree is used to produce official releases, right? Then all sorts of people, right? From companies to individual developers, maintain their own forks and clones of the mainline repositories, right? So for example, one of the guys we're gonna talk about today, Con Kolivas, maintains his own Linux kernel tree, right? And if you clone that, you get a bunch of his patches and added features that aren't in the mainline tree, right? And sometimes they're not in the mainline tree because, you know, they're still working on getting those features into the mainline tree. And other times they're not in the mainline tree because they're never gonna be in the mainline tree, right? Because they're things that that person thinks are useful or important and the mainline people don't. So to some degree, and Git, which you guys are using, was built entirely for this purpose, right? I mean, Git was a tool that Linus put together based on the needs of the Linux community and Linux development, right? So to some degree, allowing for this diversity is a big part of what makes these sort of distributed source code maintenance projects work, right? So here's one example that's sort of apropos for today's material, right? So imagine, you know, Linux runs on desktops, right? And maybe not that many desktops, right?
How many people have a desktop that runs Linux, or a laptop or something like that, right? Yeah, it's okay. So you guys are nerds, right? Like, you guys aren't normal people. That's important to keep in mind, right? But I don't know what, you know, Linux desktop numbers are now, they're probably still only a couple percent, right? But Linux runs on a fair number of servers, right? And you think about these communities, right? The server Linux guys, right? I mean, these are companies that frequently are, you know, deploying Linux in production environments, right? They have teams of engineers; frequently, some of these guys get paid to work on Linux development, right? But to some degree, you know, this is a sort of well-funded, well-established community with fairly concrete goals, right? So these guys have lots of benchmarking tools that they use to test things and to identify regressions in various versions of Linux. They've got companies behind them that pay them to improve performance on top of Linux. And these guys are, again, frequently some of what their day job ends up being is, you know, at least communicating with or actively participating in Linux development, right? So server Linux, right? In corner one, right? Or, I don't know, I've never been to a boxing match. How do I identify the corners? In this corner, server Linux, right? And then in this corner, right? You got desktop Linux, okay? You know, poorly defined goals, right? Interactivity, how do you measure interactive performance? Right? It's a fairly slippery concept, right? You got users that were too cheap, right? To buy, like, a real operating system, right? You know, so they're like, oh, yeah, or whatever, or too weird, okay, whatever. But to some degree, you know, this is, and then, you know, development, it's like these people frequently just want their computers to work, right?
Again, I mean, some of the people who put Linux on their machines are doing it because they want to hack the machine and stuff like that. But to some degree, you know, it's like, I don't know. So how many people who use Linux have tried to install Linux on the machine of somebody else you knew, who you thought it would be a good experience for, and it turned out to be terrible, right? Yeah, me too, right? You know, like, grandma's not ready for Linux, okay? Grandma wants tech support, grandma wants things to work, okay? All right, you know, David Holland, who did a lot of the OS/161 stuff, I think, has been trying to, you know, get his mother to use, like, NetBSD or something, right? So that's even a tougher sell, right? So anyway, so these are, you know, again, you can kind of see, like, this is not necessarily going to be a very fair fight, right? The Linux kernel mailing list, how many people have ever subscribed to or tried reading the Linux kernel mailing list, right? Yeah, I did this once and it was terrible, right? I mean, it's like, there's so much going on, there's so many different arguments about some little tiny detail of some device driver that you've never heard of and never read and don't care about, for some device that you never even knew existed, or whatever. So this is like, you know, and, you know, if you were, you know, let's say, you're, you know, your grandma, right? And somebody put Linux on your machine and you're like, I'm going to email these guys for help, right? Like, this is the Linux kernel mailing list, right? I need help with one of, you know, my machine doesn't seem very fast or whatever, or I can't find a web browser, right? Yeah, yeah, this is, you know, grandma will get yelled at for posting on the Linux kernel mailing list, and most people will too, even if you've achieved a pretty high level of technical sophistication, right? All right, so let's start talking about Linux scheduling, right?
So that's kind of like the Linux community to some degree. So before kernel version 2.6, Linux systems used a scheduler that did not scale well, right? We've talked a little bit about scheduling overhead, right? And we've said to some degree, if I make the time quantum very small, then the scheduler has to run a lot and a lot of my cycles are being used in the scheduler, right? If I make the time quantum larger, then the scheduler runs less often. This scheduler was even worse, right? This scheduler's runtime actually scaled with the number of threads, right? So as my system gets more threads, as it's getting busier, as there's more to do and more threads to choose from, this scheduler is getting in the way more and more, right? So this was a bad thing, okay? And at some point, the Linux 2.6 scheduler was designed to try to address this problem, right? The goal was to produce an O(1) scheduler, right? And Ingo Molnar, who's the Linux scheduling subsystem maintainer, produced this O(1) scheduler in Linux 2.6, right? And it's important, we're just gonna spend a slide or two on this because it sets up kind of what follows, right? So the O(1) scheduler in Linux 2.6 combines two priorities, right? There is a static priority and there's a dynamic priority, okay? So based on what we've talked about, right? What do you think the static priority is? Where does that come from? Yeah, so the static priority is set by the user or system, right? Using nice. Like, this is what, you know, nice as a tool has not a fantastic name in terms of what it actually does, but you get used to that. So nice is a tool that allows you to adjust the priority of a task on Linux, okay? And the static priority was set by the user or system using nice, right? And there's a default value, et cetera. What about the dynamic priority? Where do you think that came from? Trust me, none of this is new. You guys are familiar with both of these ideas. Paul, dynamic priority.
What have we talked about as a way of kind of assigning dynamic priority? We're gonna ignore you, Jeremy. Kevin, we just talked about this, a system that assigns, to some degree, assigns priorities dynamically. Okay, and what idea do multi-level feedback queues use to assign dynamic priorities? Yeah, so the idea with the dynamic priority was that it's this boost to the static priority that was intended to identify and reward interactive threads, right? So this is where, to some degree, interactive threads on the system started to get some preference, right? Now what happened with the O(1) scheduler, at least according to some people, was that the code necessary to compute this dynamic priority, right? Or the interactivity boost, got really gross, right? It got really complex, and there were all these constants in it that people didn't understand where they came from and stuff like that. And what happened was two things. I mean, A, this code got very difficult to maintain because you looked at it and you said, I don't even know what it's doing, right? Like, what is this random number that I'm using as a multiplier in the middle of this function, right? And then the other thing was, when you started trying to think about how the scheduler would work in certain cases or under certain workloads, it became very difficult to model, right? And that's something that you might say would be a desirable feature of a scheduler, right? Would be to say, if I give you a certain number of threads, like maybe this would happen on an exam, for example, I give you some threads and I say, how would this scheduler schedule these threads, right? Like, go walk through a couple of rounds of the scheduler. When you start having these really, really weird, complex, gross functions that are calculating these constants that go into making these scheduling decisions, that becomes very hard to do, right? Because these schedulers became very difficult to reason about, okay?
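As a rough illustration of the two-priority idea, here's a toy model. This is emphatically not the kernel's code: the real O(1) scheduler computed the bonus from a sleep-average with many tuned constants (that's exactly the "black magic" being complained about), while here the bonus is just a clamped function of how much of its time a task spent sleeping. `MAX_BONUS` and `sleep_fraction` are illustrative assumptions:

```python
# Toy model of the 2.6 O(1) scheduler's static + dynamic priorities.
# Convention follows nice(1): lower numbers mean higher priority.
NICE_MIN, NICE_MAX = -20, 19   # static priority range set via nice
MAX_BONUS = 10                 # hypothetical cap on the dynamic boost

def dynamic_priority(static_nice, sleep_fraction):
    # sleep_fraction: 0.0 = pure CPU hog, 1.0 = mostly-sleeping
    # interactive task. A sleepy task gets a boost (priority number
    # goes down), a CPU hog gets a penalty; 0.5 is neutral.
    assert NICE_MIN <= static_nice <= NICE_MAX
    assert 0.0 <= sleep_fraction <= 1.0
    bonus = round((sleep_fraction - 0.5) * 2 * MAX_BONUS)
    return static_nice - bonus
```

The point of the toy is the structure, a user-set static value plus a behavior-derived boost, not the particular formula; it's the opaqueness of the real formula that motivated what comes next.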
All right, so enter the picture: Con Kolivas, right? I have no idea if I'm saying his name wrong. Maybe at some point I'll get an email from him saying I am. But, so this is an Australian anesthetist, and he became interested in Linux. So he started doing some Linux programming, and his interests started to turn towards scheduling, right? And in particular to interactive scheduling, right? So Con started to say, can we do a better job of scheduling workloads on interactive systems, right? So again, and here, when we thought all was lost, right? When we had like the big Goliath-type creature over in the server Linux corner, right? And this little weakling over here in the desktop Linux corner, right? With like, you know, white hair and glasses or whatever. You know, here's our hero, right? Who's gonna ride to the rescue of desktop Linux and actually, like, devote some time and energy to this, right? And this is a smart guy, so this is a good thing, right? Okay, so, you know, Con started to try to formulate some ideas about what some of these concepts were, right? I won't read all of this, but you know, this is, I think, a nice example of how to approach different types of problems in computer systems, right? So one of the first things he struggled with was, what am I trying to improve, right? I say I want to improve interactivity, okay? But in order to improve something, you need to know what it is, right? In order to improve something, you need to be able to measure something. You can't improve something if you can't measure it. Or you can improve it, but you're just pretending that you improved it, right? You can say to grandma, hey, this is a better scheduler, right? But if there's no numbers behind it, right? If you can't measure it, you can't actually determine how much better your system is, right?
So this in particular, and I can post some of this on Piazza, you know, you guys can look at it online, but in particular, there's a nice, I think, distinction he draws here between interactivity and responsiveness, right? We talk about those things sometimes like they're the same thing, but he found it very important to make a distinction between those two things, right? So you know, interactivity is, so let's see here, interactivity would allow you to play audio or video without any dropouts, or drag a GUI window across the screen and have it render smoothly across the screen without jerks, right? So it's a nice thing, I mean, it's a sense of, like, intuitively, what, you know, what does that mean? Responsiveness, right? Would allow you to continue using the machine without too much interruption to your work, right? The rate at which your workloads can proceed under different load conditions, right? So this is a nice, you know, he's trying to draw some distinctions here, right? He's trying to make things more concrete. And the other thing he did was he wrote two new benchmarks, right? The first one is called contest, and the second one is interbench, right? Contest is designed to measure responsiveness, and interbench was designed to measure interactivity, right? So, you know, when you start working on something, define what it is you're trying to improve, write some tools that measure that thing, right? Now you're in a position where you can actually do some work, right? Because you can measure the changes that you're making and see if you're making improvements, okay? So in 2004, Kolivas released what he called the Rotating Staircase Deadline Scheduler, okay? And one of the goals of the Rotating Staircase Scheduler was to get rid of all of this black magic that was in the original 2.6 O(1) scheduler, right? So this is, you know, partly from his, you know, email releasing it, right?
So, you know, he took out this big, gooey mass of difficult-to-understand code and replaced it with stuff that's fairly clean and simple, right? And we'll walk through an example of how it works, right? And there are some similarities here to MLFQ, right? The multi-level feedback queues. But it's a little bit different, and I think a little bit more elegant, okay? So here's the description, right? This is a starvation-free, strict-fairness, O(1) scalable design with interactivity as good as the above restrictions, which I didn't list, can provide. There is no interactivity estimator, right? No sleep/run measurements, and only simple fixed accounting, right? The design is strict enough in its design and accounting that task behavior can be modeled, right? So, and the other thing that's important here, too, is that I can make guarantees about latency, right? So I can say, given a certain set of threads, what is the maximum amount of time that will pass before a certain thread will run, right? And that's actually a pretty important thing when you start thinking about interactivity and responsiveness, right? All right, so we have one parameter, right? So we're gonna walk through an example of this, right? Which is the round-robin interval. Actually, I think there's two interlocking intervals. This is a somewhat simplified example, right? And then there's one input, which is a thread priority. Okay, the thread priority determines how many times during each epoch, or round-robin interval, that thread gets to run, okay? And then this is, like, my mother of all animations for the semester. Okay, so here's my threads initially, right? And I've color-coded them based on their priorities. So these are my top-priority threads. These are my priority-one threads, or middle-priority threads, and these are my lowest-priority threads. The first thing I'm gonna do is I'm gonna sort them into these ranks, right, or levels, okay?
The next thing I'm gonna do, at the beginning of each iteration of the scheduler, and we're gonna walk through part of a full iteration, is assign each queue a maximum amount of time that threads inside that level will be able to run. And that's done using very simple accounting, right? So here, for example, I assign each thread in every queue a quota of five time units. What that means is that the top queue has a total of how many? 15, okay, this is just basic math, so I won't call on people to figure these things out. So here's where I am at the beginning of time: my top priority queue has a total quota of 15, and this becomes important, because as soon as threads in this queue have run for 15 time units, no other threads inside that level are able to run, right? This is part of where my guarantee about the maximum amount of time it can take for any thread to run is established. So for example, if I said, given this setup, what is the maximum amount of time before a thread, let's say either thread nine or thread four, runs? Again, part of the rule is that when a priority level exhausts its quota, no threads can run at that priority level anymore. So what's the maximum amount of time that thread nine or thread four will have to wait before they can run? 35, right? It's the total quota in priority two plus the total quota in priority one. It can be less than that, there are cases where it can be less, but that is the most it can be. And that's a very, very nice thing to be able to say about a scheduler: the longest it's gonna be before thread four or thread nine runs. And actually, if you wanted to be stricter about it, you could say the longest it's gonna be before either one of these runs is actually 40, right?
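That bound can be computed mechanically from the per-level quotas. Here's a rough sketch of the arithmetic; the function name and data layout are my own illustration, not anything from the RSDL source:

```python
# Illustrative sketch of the RSDL worst-case latency bound: a thread
# waits at most the total quota of every higher priority level, plus
# the quotas of any threads ahead of it in its own queue.

def worst_case_wait(level_quotas, my_level, quotas_ahead):
    """level_quotas: dict mapping priority -> list of per-thread quotas,
    where a larger number means higher priority (runs first)."""
    above = sum(sum(qs) for prio, qs in level_quotas.items() if prio > my_level)
    return above + sum(quotas_ahead)

# The slide example: a total quota of 15 at priority two and 20 at
# priority one, with each thread holding a quota of five time units.
quotas = {2: [5, 5, 5], 1: [5, 5, 5, 5]}
print(worst_case_wait(quotas, my_level=0, quotas_ahead=[]))   # 15 + 20 = 35
print(worst_case_wait(quotas, my_level=0, quotas_ahead=[5]))  # 40, if the other
                                                              # priority-0 thread goes first
```

The second call is the stricter "40" case: one priority-zero thread may also have to wait out the other's five-unit quota.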
Because depending on which one runs first, the other one will have to wait for a maximum of an additional five time units. All right, so let's see how this works. On some level, this is pretty similar to MLFQ, okay? So given that, what do we think we do first, Jen? Who's gonna run first here? Thread five, right? Thread five is at the top priority and it's first in line for whatever reason. This is another case where I can just round-robin between threads at the same priority level. So I'm gonna run thread five, and thread five's gonna run for some period of time. As thread five is running, I'm draining both its quota and the quota of the priority level where it came from, okay? When a thread exhausts its quota at a particular level, well, this is why this is called the staircase scheduler. In MLFQ, what would happen is that this thread would be moved down, and actually that is what happens here too: thread five is now moved to priority one, okay? But it's given a new quota in priority one. And priority one's overall quota is also adjusted, though I might be wrong about that, actually. So this is the important thing: at the beginning of time, when I start the scheduler, every thread starts at the priority level that it's assigned. As the scheduler runs, if I use up my quota in priority two, then I get another chance to run. I'm not finished, but now I have to compete with all of these threads at priority one. So essentially what happens is this guy ran, he exhausted his quota at priority two, now I'm gonna push him down to priority one and give him a new quota in priority one. Yeah, Jeremy. That's a good question.
I don't know, but I'm betting that what happens is this guy gets put on the end of the queue for priority one, right? Okay, that's a good question. Any other questions about this? Alyssa? Yeah, see, that's why I think I might be wrong about this. I think there's actually a bug here, which is that I don't think the quota of priority one is increased in this case. But again, yes, that's a very good point. Essentially what Alyssa pointed out is that if I do this, I've already used up five, but now I still have 35 to go before I can run things in the bottom queue. So I think it actually turns out that I don't increase the overall quota for priority one. What'll happen is this thread will be able to run at priority one, but only if these threads run and don't exhaust the time that was there before, right? So I think that these values, and again, I wish these slides didn't have this bug, are assigned at the beginning and they're never increased. I could be wrong, right? Okay, that's a great catch. Any other questions? No, no, so yeah, this is an important distinction: thread five's priority is not changed. The way you would change thread five's priority is by nicing it to a different level; you would have to adjust it yourself. The reason is that threads that start at priority two can run at priority two, priority one, and priority zero, but when I get through all the levels, which I will (and again, this scheduler is starvation free, so I will always process all the way to the bottom), then what happens is everybody starts back at the level they started in the first round, right? So thread five, and I actually think in this example thread five does run again at priority one and then again at priority zero, but when I restart things, he's back at priority two, right?
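The descent just described can be sketched in a few lines. This is my own minimal illustration, not the kernel code, and it bakes in the guess above that an exhausted thread goes to the tail of the lower queue:

```python
# Sketch of the "staircase" descent: a thread that exhausts its quota
# drops one level and gets a fresh quota there. Its base priority (set
# only via nice) is never touched, so a later major rotation can lift
# it back to where it started.

from collections import deque

QUOTA = 5  # per-thread, per-level quota from the example

class Thread:
    def __init__(self, tid, base_prio):
        self.tid = tid
        self.base_prio = base_prio  # only nice() changes this
        self.level = base_prio      # current stair; descends during an epoch
        self.quota = QUOTA

def on_quota_exhausted(t, runqueues):
    """Called when thread t uses up its quota at its current level."""
    if t.level > 0:
        t.level -= 1
        t.quota = QUOTA                  # fresh quota one stair down
        runqueues[t.level].append(t)     # tail of the lower queue (a guess)
    # at level 0 with no quota left, t just waits for the major rotation

runqueues = {2: deque(), 1: deque(), 0: deque()}
t5 = Thread(5, base_prio=2)
on_quota_exhausted(t5, runqueues)
print(t5.level, t5.base_prio)  # 1 2: dropped a stair, base priority intact
```

The key design point is the split between `level` (transient, falls during an epoch) and `base_prio` (permanent, restored each epoch).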
So yes, the higher priority you have, the more run levels you have access to, right? And that makes a certain amount of sense: I'm boosting your priority and giving you more access to the CPU. That's a good question; any other questions? All right, so let's keep going through the example, and again, I think that's wrong on the slide, I wish I could fix it. So thread one now runs. In this case, what might happen is he might block, well, did he block? Okay, so let's say that he yields. So he now goes back on the end of the priority two run queue, but I've docked his quota and the overall quota here. Now I run thread two, and thread two blocks, okay? When a thread blocks, it's not runnable, and so technically I should have had some sort of separate area over here for it, right? But the idea is that when it's restarted, it goes back to the same level that it came from, with the same quota. So thread two within this iteration still has three time units that it can run at priority two, but right now it's not runnable. So now I'm gonna run thread one again. Let's say that thread one also blocks. So now what am I going to do? I've finished with priority two, to some degree; I think this is called a minor rotation of the scheduler. So what's the next thing I need to do at this point? Which thread should I run next? Yeah, so now I'm gonna move down my priority levels. I'm done with priority two; there's nothing available to run at priority two. And there are a couple of reasons I can move down. One is that priority two's quota may expire, so priority two has no quota left and I don't run anything at priority two. The other reason, which is what happened here, is that even though priority two has quota left, there are no runnable threads at priority two, right?
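The selection rule we just walked through, scan from the top, skip a level for either of those two reasons, can be sketched like this. Names and the data layout are illustrative, not the actual kernel structures:

```python
# Hedged sketch of next-thread selection: scan priority levels from the
# top; skip a level when its quota is spent or nothing in it is
# runnable; within a level, take the first runnable thread with quota.

def pick_next(levels):
    """levels: list of (level_quota, threads), ordered highest priority
    first. Each thread is a dict with 'runnable' and 'quota' keys."""
    for level_quota, threads in levels:
        if level_quota <= 0:
            continue  # reason 1: the level's quota has expired
        for t in threads:
            if t["runnable"] and t["quota"] > 0:
                return t
        # reason 2: quota remains, but no thread here is runnable
    return None  # nothing anywhere: time for a major rotation

levels = [
    (0,  [{"tid": 1, "runnable": True,  "quota": 2}]),   # priority 2, quota spent
    (11, [{"tid": 3, "runnable": False, "quota": 2},     # blocked, keeps its quota
          {"tid": 6, "runnable": True,  "quota": 5}]),   # priority 1
]
print(pick_next(levels)["tid"])  # 6: first runnable thread at a live level
```

Note that a blocked thread keeps its level and remaining quota, matching the point above that a woken thread resumes where it left off.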
So now I'm gonna move down and run something from priority one: thread three. Let's say thread three blocks, so I mark it as being blocked and I keep going, and I run the next thread from priority one. That also runs and also blocks. And let's say, let's see here, let's say thread six exhausts its quota. So where is thread six going to go? It's gonna go to priority zero, and that's wrong on the slide, but it would receive a new quota in priority zero, right? So what's right here is this; what's wrong is this, which should still be 10. Yeah, what? Well, if you exited, who cares, right? If you exited, then your quota is undefined, because you're never gonna run again. If you block, I think you keep the quota that you had. So this scheduler does not necessarily descend monotonically through the levels. For example, what could happen here that would cause me to go back to priority level two? Dan. Yeah, so if either thread one or thread two finishes its blocking operation and becomes ready to run again, then I'm gonna go back to priority two, because I'm always running things from the top priority queue going down. Yeah, Jeremy. Thread nine or five would be moved up. So once I've finished moving through all the priority levels and there's nothing left to run at any priority level, or all the quotas are exhausted, I think that's what's called a major rotation in the scheduler. At that point, all the threads will reset to their original levels, everybody's given a new quota, and then I start over. So I'm not there yet, but yes, eventually thread five, which started at priority two, will be able to run at priority two again, right? So again, this is an important distinction, and this is the big difference from the multi-level feedback queues: these threads are allowed to run at different levels, but they're not being pushed down permanently, right?
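The major rotation just described is simple enough to sketch directly. Again, this is my own illustration under the assumptions above, not the real implementation:

```python
# Sketch of a "major rotation": when every level is out of quota or
# runnable threads, each thread is lifted back to its base priority,
# given a fresh quota, and a new epoch begins.

from collections import deque

QUOTA = 5  # per-thread, per-level quota from the example

def major_rotation(threads, num_levels=3):
    """Rebuild the run queues for a new epoch."""
    runqueues = {lvl: deque() for lvl in range(num_levels)}
    for t in threads:
        t["level"] = t["base_prio"]   # back to the starting stair
        t["quota"] = QUOTA            # fresh per-level quota
        runqueues[t["level"]].append(t)
    return runqueues

# Two threads that fell all the way to the bottom during the epoch:
threads = [{"tid": 5, "base_prio": 2, "level": 0, "quota": 0},
           {"tid": 9, "base_prio": 0, "level": 0, "quota": 0}]
rq = major_rotation(threads)
print([t["tid"] for t in rq[2]], [t["tid"] for t in rq[0]])  # [5] [9]
```

This is where the starvation-freedom argument lives: no matter how far a thread falls, the next major rotation puts it back on its starting stair with a full quota.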
They will still be lifted back up when the scheduler starts a new epoch, right? It can, yeah, and you're right, that's kind of similar to what we're doing here: we're almost rebalancing, but the difference is that threads here don't ever move up. There's no way to move up during a rotation; I can only move down. I fall down the staircase, that's the idea, and then I get pulled up at the very beginning. There's a great New Yorker cartoon this week about slinkies going upstairs to spawn, you know? If anyone gets that, good for you, but it was pretty cute. Yeah, so slinkies don't go upstairs, they go downstairs. So threads go downstairs during the epoch, and then at the end, the scheduler picks them up, puts them on the stair where they started, and we get to do that again. Let's see, okay, so this guy runs and he also blocks. And now, in this case, even if I had set this right, so let's say I hadn't added five and this should be 11 now, it turns out that thread five still gets a chance to run. And I'll try to correct these slides and get them up online somehow, but it would turn out in this case that thread five, which started with its quota at priority two, would actually have a chance to run at priority one. And that's just because the threads in front of it blocked. So priority one would still have 11 left of its quota, this guy has five, and so now he can run. And it turns out he might exhaust his quota, so where is he going? He's gonna go to zero, and again, I shouldn't have adjusted the quota, but the idea is he gets another chance to run at level zero with a quota of five, right? I think that's what happens now. Yeah, and then he runs, and now at this point, I don't know what happened to those threads.
Oh, I think I just took them out because they exhausted their quota, right? So at this point, there's nothing left to run, so this is one of the cases where the scheduler will reset and we'll start a new epoch. In this case, there's nothing left to run anywhere, and so what would happen is that the scheduler would take all the threads that have fallen off the bottom, put them back where they started, give everybody a new quota, and we start again. In this case, where did this guy come from? Oh, this is what they would look like after I reset the scheduler. So these were the threads that were still able to run; they had all fallen off the bottom, I pulled them back up and put them where they started. We'll go over this online, because this is kind of a complicated example, and it's still going, I can't believe it. So let me just point out a few things before we finish today. Again, there are some really nice features of this scheduling approach. One is that it's very easy to model, and because of this simple fixed accounting, there's no black magic, there are no floating point numbers or anything; everything can be done in O(1). And there are newer versions that use interleaving between levels to further minimize latency; I won't talk about how that works. Again, here we come back to this feature of interactive tasks. The design relies on the fact that interactive tasks, by their nature, sleep often. It says most fair scheduling designs end up penalizing these tasks by giving them less than their possible share because of the sleep, and have to use the mechanism of bonusing their priority to offset this, based on the duration they sleep. So here we have none of that black-magic bonusing, and it's something that's based around interactivity from the start. So I'll finish with this, right?
So this is a cool new scheduler. It's very easy to prove some things about. There were some initial reports that said it worked well. But again, in this sort of desktop community, it's very difficult to establish, based on the benchmarking and the tools that they had, that this was a real big win. And so there were a certain number of people who were interested in using this, but at the end of the day, this scheduler never left Con Kolivas' private git tree. I mean, people were using it, they were cloning it to get access to it, but it never made it into the mainline. And in parallel, maybe spurred on by this, the Linux scheduling maintainer developed his own version of this called the Completely Fair Scheduler, right? And that version, of course, ends up in the mainline. And to some degree, part of this, I don't know, I mean, these are both smart guys, so I don't want to say anyone copied off the other. CFS actually has some fairly different ideas in it, but it's very much based on some of this: a scheduler that's easier to understand and easier to make guarantees about. And at some point, Con just threw up his hands and said, I'm done, you know? I'm gonna spend more time with my family. Let me go through this, because this is one of my favorite slides. So this I'll leave you guys to think about, because it's kind of a fun story of someone who got very passionate about doing something that wasn't necessarily his job and got really involved in something, and I think that was neat. So in 2009, Kolivas, I guess, got bored and decided to come back into the Linux scheduling community. He returned with the Brain Fuck Scheduler, and yes, that's the name of it, so I'm gonna say it in class.
And this is something that's aimed at low-power, sort of embedded devices; it's a desktop-oriented scheduler. This was how he explained the name, and I think the most important part is the last bit. But this is a nice feature. The other thing I wanna point out here is that another very useful thing Con Kolivas contributed, which as far as I know, and I think Guru confirmed this, still hasn't been integrated into the mainline tree, is this idea of pluggable kernel schedulers. So maybe one scheduler doesn't fit every workload, given the types of challenges that the Linux community is facing on all these different devices, right? Unfortunately, that pluggable kernel scheduler architecture isn't part of the mainline either. But that's sort of the direction that they're going. All right, so Wednesday we're gonna start virtual memory. We're done with CPUs; I'll see you Wednesday.