Okay, welcome back everybody to CS162. We're going to pick up where we left off basically just before the midterm and we're gonna talk about scheduling. And so today we're gonna talk about a couple of things here, continuing in our vein of scheduling case studies. We're gonna actually talk about some real schedulers. We'll talk a bit about real time and forward progress. So if you remember from last time, we basically talked about what the descriptions look like inside the kernel when you open a file. And we had pointed at this file structure before. And what I wanted to point out last time was basically this FOP pointer, which points to a set of operations. And what's interesting about these operations is the set includes things like how to read, how to write, how to open, et cetera. And as a result, what it does is allow you to have that uniform interface of open, close, read, and write for everything from files to pipes, et cetera. Okay, and so that FOP structure we talked about is basically why you're allowed to do that. The second thing we talked about was device drivers. And we looked at a typical life cycle for an IO request. And what you see here is a request such as a read or write coming in from a user program executing a system call, and then invoking potentially the operations that are in that FOP structure I mentioned previously. And so for that request, the first thing that happens, for instance for a file read or write, is we ask whether it can be satisfied already. And as we talked about, the reason it might already be satisfied is because the data is in the cache. And when we get into file systems a little bit later in the term, we will talk about that in great depth. But assuming that it can't be satisfied, then it goes on to get ready to talk to the device driver. Okay, and we talked about device drivers as that piece of code that talks to the actual device and knows how to do things that are unique to the device. The reason I have this sending of the request to the device driver highlighted in red here is because if it turns out that the device is gonna take a long time, this is the point at which you put the process to sleep, or the thread, depending on how things are set up. And then you're gonna trigger something else, and you have to schedule at that point. So then we talked about how the top half of the device driver is that part that runs kind of on behalf of the process, sets up the commands, et cetera. And things are put to sleep. At that point it sends a command to the hardware, and then the hardware takes over, but the original thread that called the device driver is sleeping now. Okay, and so the hardware comes along, eventually does the access, and causes an interrupt to happen. And we end up in the bottom half of the device driver, which is interrupt driven. So that top half we talked about a moment ago is the part that's basically running on behalf of processes coming from above. The bottom half is invoked by an interrupt, at which point it figures out which process is waiting, potentially wakes it up, transfers the data, and the IO is completed. So again, one of the reasons that I particularly talked about this again this time is that somewhere between the IO subsystem and the top half of the device driver, the process actually gets put to sleep and we'll invoke scheduling again. So that's basically the topic that we've been working on.
This is a figure from either the first or second lecture where we kind of show the idea of the CPU executing some thread and eventually something happens, like an IO request or a time slice expiring. Okay, so we talked a lot about that last time when we talked about round robin. Or perhaps we execute fork, or we have to wait for an interrupt because maybe we did a signal operation and we're waiting for somebody to respond to us. And that's the point at which we have to ask this question: how is the OS gonna pick the next thing to run? Okay, so we've got this ready queue and it's got a bunch of threads that are ready to go — which one? And that's the topic of scheduling. And so last time we talked a lot about classic scheduling algorithms, and the basic idea there is deciding which threads are given access to resources from moment to moment. In the last lecture, this one, and the next one we're really talking about CPU resources, but I will let you know that scheduling can be applied to things like disk drives, to who gets the most bandwidth, et cetera. And when we start talking about scheduling disks, et cetera, then we'll move into IO, but for now we're talking about CPU. And how does the scheduling get triggered? Well, it can get triggered by timer interrupts or by other IO interrupts. It can be triggered whenever a thread voluntarily goes to sleep, like when it's trying to do IO and gets stuck in the top half of the device driver. And at the point it's gonna read the disk, it triggers scheduling to figure out what runs next. So this scheduler can get run in all those circumstances in which it's time to take the current thread, put it to sleep, and pick another one. All right, and then we talked last time about policies for scheduling, and we talked about three of them: minimizing response time, maximizing throughput, and fairness. So the thing about minimizing response time is that it can be very important if you're talking about the response to user input, like keyboards, et cetera. So things like time to echo a keystroke in the editor. We also talked about maximizing throughput. So maybe in the cloud, where you have these really big compute jobs, what's important there is to make sure that the hardware is used as efficiently as possible. And so that's a case where you don't wanna context switch very often. You wanna run the machine at full speed, maximum cache utilization, et cetera. And there's two parts to maximizing throughput. One of them is minimizing the overhead, for example not context switching too much. And the other is using resources efficiently: CPU, disk, memory. And as you can imagine, as we discussed, minimizing response time for users and maximizing throughput for compute are sometimes at odds with each other. And today we're gonna talk a bit about how to deal with the contrast there. The other thing that's always in the background here is fairness. And that's the question of how do you share CPU among users in some equitable way? And fairness here is not about minimizing average response time necessarily, because better average response time actually makes the system less fair. Can anybody tell me why that is? Do you remember why getting better average response time makes the system less fair? Or anybody wanna take a stab at that? Okay, priorities has something to do with it. Yeah, so to minimize response time we're basically looking for those tasks that run very quickly and have a short burst time.
And so if we're minimizing response time, we're doing a lot of context switching, which means we're working a little bit to the detriment of throughput. So here was an example that we showed of round robin. That's the simplest thing we can do, where we have a timer that goes off every so often; that interval is called the quantum. And I showed you an example here of processes one, two, three, four on the ready queue. The burst time is the time from when a process starts running to when it does some IO. And so P1 has a burst time of 53, P2 of eight, P3 of 68, P4 of 24. And we talked a lot about the fact that if we put this in a FIFO queue and ran P1 to completion to its next IO operation, then P2 to the end, then P3, P4, this is gonna be very bad for response time, because you could by accident end up with the longest tasks first. And then the short ones, which are what the users are waiting for, don't get to run. And so what round robin does is shown by this Gantt chart, where if we have P1, 2, 3, 4 on the ready queue and we have a quantum of 20 units of time, then P1 runs for 20 units. And then we stop and put it at the end of the queue, and then P2 runs — well, it can only run for eight because it's only eight long. And then P3 runs for 20 and P4 runs for 20, et cetera. And so you can just sort of simulate this on your own given the quantum of 20 and the burst times 53, eight, 68, 24, and you see what the results are. And then we can talk about the various waiting times the threads had to experience, okay? So P1 here had a total wait time of 72 — those are all the times where it's not running, before it's done, that it has to wait — and P2, P3, P4, et cetera we can compute the same way. We talked about the average waiting time here being 66 and a quarter and the average completion time being 104 and a half. And so round robin is simple, right? We cut the tasks off so they don't run for too long. And the pro of that is it's much better for short jobs than if we just ran every job to completion, but the context switching can start adding up. And so we did talk last time — I encourage you to watch that lecture if you were perhaps studying for the midterm — about how we have to balance this rapid switching with the overhead, okay? In a typical system, we talked about how the quantum is somewhere between 10 and 100 milliseconds, and the switching time is like 0.1 millisecond. And so we're trying to keep things at under 1% overhead in that instance, okay? So that was round robin.
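By the way, if you wanna check those numbers yourself, here's a minimal round robin simulation in C. It assumes all four processes arrive at time zero, a quantum of 20, and the burst times 53, 8, 68, 24 from the example; it reproduces the Gantt chart above and prints the same average wait of 66.25 and average completion of 104.5:

```c
#include <stdio.h>

int main(void) {
    int burst[]     = {53, 8, 68, 24};  /* P1..P4 burst times */
    int remaining[] = {53, 8, 68, 24};  /* time left for each process */
    int completion[4] = {0};
    int n = 4, quantum = 20, time = 0, done = 0;

    /* Cycling through the array and skipping finished processes gives
       the same order as a FIFO requeue when everybody arrives at 0. */
    while (done < n) {
        for (int i = 0; i < n; i++) {
            if (remaining[i] == 0)
                continue;
            int run = remaining[i] < quantum ? remaining[i] : quantum;
            time += run;
            remaining[i] -= run;
            if (remaining[i] == 0) {    /* finished: record completion */
                completion[i] = time;
                done++;
            }
        }
    }

    double wait_sum = 0, comp_sum = 0;
    for (int i = 0; i < n; i++) {
        int wait = completion[i] - burst[i]; /* wait = completion - burst */
        printf("P%d: completion %d, wait %d\n", i + 1, completion[i], wait);
        wait_sum += wait;
        comp_sum += completion[i];
    }
    printf("average wait %.2f, average completion %.2f\n",
           wait_sum / n, comp_sum / n);
    return 0;
}
```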
We also talked about an idealized thing: what if we knew the future? So the problem with round robin that we noticed is it still isn't the most responsive, because for instance P2, which is the shortest job, ideally would run first — perhaps that's a user that only needs a few cycles every keystroke, okay? But of course the problem with that is, what? What's the biggest issue with putting the shortest job first, okay? So you guys are thinking too hard here. The biggest problem is we don't know which is the short job, right? So the biggest issue here is we don't know the future, okay? But if we did — and you're right, it does cause starvation: if we always managed to run the shortest job first over and over again, we could starve out the long ones. But the biggest issue is the future. And so we talked about something where we mimic the best possible first come first serve — the best FIFO ordering — by always putting the short job first, okay? And that's called shortest job first, okay? Which is: run whatever job has the least amount of computation to do, or shortest time to completion first, STCF. There's a preemptive version of this called shortest remaining time first, SRTF, where if a job arrives with a shorter time to completion than what's left of the running job, then it gets to run. And we talked about this, and basically you can apply this idea to the whole program or to the CPU burst. And the big effect is on short jobs, so that the short jobs really get to run quickly and the long jobs mostly don't notice, unless there's so many short jobs that you get starvation. And so this is a great idealized scheduler, if we could only do it. And the biggest problem was of course: how do you know the future? So the pros and cons of SRTF are, one, it's optimal from a response time standpoint. So if you're gonna measure any real scheduler against an optimal one, SRTF is a good one. Two, it's very hard to predict the future. We talked about some options last time, things like moving averages and Kalman filters. Somebody brought up the idea of some sort of machine learning to try to predict how fast the jobs were. And the obvious other thing is it's unfair, because the short jobs get to run in preference to the long ones, and if you do that too much, you start starving the long jobs. Okay, good. Were there any questions on this before we now move into some new material? I thought I'd make sure that I got everybody up to speed on what we did last time. Okay, so now how do we handle a simultaneous mix of different applications? So today's systems have a mix of user interaction and long running things. So your cell phone is busy dealing with your swipes and taps, while at the same time it might be actually computing on the data in the background from your latest exercise session, figuring out kind of what sort of machines you were on. Okay, that's sort of machine learning kind of stuff. So that might be trying to run with full CPU while the other quick things need to respond to you quickly. And so this is an interesting thing, right? Because of the different schedulers we talked about last time, some are ideal for throughput. So FIFO, where you just run everything to completion or till the next time it does IO, is great for throughput. Not so good for responsiveness. So if we want a mix of interactive and high throughput apps, we have to figure out how to best schedule them. We have to figure out how to recognize one from the other. And, you know, in this you start asking the question: do you trust an application that always says it's interactive, and use that to give it priority? Okay, that seems like it's gonna get abused, right? You're gonna end up with these apps coming off of the app store that always tell the cell phone that they're the most important app in the world. And of course, nobody's ever gonna get any other work done, right? And buried in this, of course, is the question of: should you schedule the set of apps identically on servers, workstations, iPads, cell phones? Is every platform the same? And you could imagine probably not. So here's this burst time graph, which I showed you last time. And if you remember, this measures frequency of tasks with a given burst time.
And burst time is the time from when the thing starts running to when it does its next IO. And the reason we typically have a peak toward the low end is because there's a lot of user interactivity. And so as a result, you tend to have a lot of really short tasks, and then you have a long tail full of long tasks. And so maybe we might imagine that short bursts reflect interactivity, which reflects high priority somehow. And this in fact is the assumption encoded into many schedulers. Many of them decide that apps that sleep a lot and have short bursts must be interactive. Okay, and so they give them high priority. Things that compute a lot and don't have short bursts should get lower priority, with the notion that somehow they're gonna notice it least, because the short bursts are gonna get out of the way quickly. And that simple heuristic has been used a lot. It turns out it works pretty well, but it's really hard to characterize apps for sure, because you have these exceptions that prove the rule. What about apps that sleep for a long time and then compute for a long time? Or what about apps that have to run under all circumstances, like real time apps? We'll talk about that later in the lecture, okay? But let's look at a common structure that was used a lot in schedulers; it still is used in a number of them. This is called the multi-level feedback scheduler. And what we do is, rather than having a single ready queue, we have many, okay? This particular diagram shows you three. The top queue is the highest priority, the bottom queue is the lowest, and things go in between. And we also vary things like the quantum, okay? The quantum here is how often we do round robin. So we do round robin quickly at the top; we don't break things up quite as quickly in the middle; and then we have FIFO, or first come first serve, at the bottom. Now, there was a question in the chat here about, you know, is there a scheduler based on inference from machine learning models? Sure, people have tried all sorts of things. The trick with machine learning is you have to make sure that the time it takes to classify an application doesn't become overhead that completely swamps the advantages of your scheduler. Okay, so you have to make sure that whatever you do is fast. And machine learning isn't always fast. Machine learning is something that's done over time. And so now you start talking about some interesting trade-offs between how accurate you are versus how much time it takes. So this multi-level feedback scheduler is another method for exploiting past behavior. Okay, it was first used in the CTSS system, so this goes back a long time. And as I said, there are multiple queues, each with a different priority. The higher priority queues are often considered the foreground tasks. The lower priority ones at the bottom here are background, and every queue has its own scheduling algorithm. And here's the trick. We start everybody out at the top, and, you know, they're running with round robin, quantum of eight, and if they run so long that they get interrupted before they do IO, then we decide that maybe they have more computation and we move them to the next queue. And down in the next queue they get run with quantum 16 at slightly lower priority. And if they also exceed that, then we move them down into the FIFO queue. And the minute they do some IO, we move them back up to the top. Okay, all right, everybody with me on that.
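To make that concrete, here's a minimal sketch of the mechanism in C, assuming three levels like the diagram: quantum 8 on top, quantum 16 in the middle, and FIFO at the bottom. The task struct and the run_for stand-in are made up for illustration — a real kernel preempts via the timer interrupt rather than calling a function:

```c
#include <stdbool.h>
#include <stddef.h>
#include <limits.h>

#define NLEVELS 3
static const int quantum[NLEVELS] = {8, 16, INT_MAX}; /* INT_MAX ~ FIFO */

struct task {
    int level;               /* which queue this task currently lives in */
    struct task *next;
};

static struct task *queues[NLEVELS];   /* one FIFO ready queue per level */

static struct task *pop(struct task **q) {
    struct task *t = *q;
    if (t != NULL)
        *q = t->next;
    return t;
}

static void push(struct task **q, struct task *t) {
    t->next = NULL;
    while (*q != NULL)
        q = &(*q)->next;               /* walk to the tail: FIFO order */
    *q = t;
}

/* Stand-in for actually running the task; returns true if the task
   blocked for IO before its quantum expired. */
static bool run_for(struct task *t, int ticks) {
    (void)t; (void)ticks;
    return false;
}

void schedule_once(void) {
    for (int lvl = 0; lvl < NLEVELS; lvl++) {
        struct task *t = pop(&queues[lvl]);
        if (t == NULL)
            continue;                  /* this level is empty, go lower */
        if (run_for(t, quantum[lvl])) {
            t->level = 0;              /* short burst, did IO: back to top
                                          (really it would sit on a wait
                                          queue until the IO completes) */
        } else if (t->level < NLEVELS - 1) {
            t->level++;                /* burned its whole quantum: demote */
        }
        push(&queues[t->level], t);
        return;
    }
}
```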
And so long running compute tasks start at the top and get demoted to low priority automatically, and things that have a lot of short bursts tend to float to the top. Now, one thing to note is that this kind of approximates SRTF, because it's predicting the future of each task: the tasks with short bursts tend to float to the top, and the ones with longer bursts tend to sink to the bottom. And so this is a way of getting at SRTF when we can't perfectly predict the future, okay? Now, there also has to be some scheduling done across the queues. So if we did fixed priority scheduling, where the top one's the highest priority and then the next and the next, then this is fine, except you could imagine starvation happening pretty easily here, because if we keep having short tasks, you might never get the long ones to run, okay? Now, there was a question of: does this mean that the long tasks down at the bottom have less context switching, because they end up in a FIFO queue? So yes, they have less context switching amongst themselves, but there's still the context switching from the queues above, okay? So the long running tasks are still gonna get somewhat interrupted when the short running ones run. Now, the problem with fixed priority scheduling is you can imagine the starvation issue. Another idea is that each queue gets a certain amount of CPU time, starting from the top with a large amount and going down to the bottom, which gets the lowest. So you could maybe give 70% of the CPU to the top queue, 20 to the next, 10 to the next, okay? Now, if you're starting to get a little nervous about all of the heuristics here — heuristics being how many queues, what are the quanta, what fractions of CPU go to each queue — you're right to be a little skeptical about that. In fact, there were schedulers, and I'll talk about one a little bit later in the lecture, that were set up along these lines, and it turns out that the heuristics start getting so complicated that nobody really knows how they work or why. So that's a danger, right? One other thing that's interesting here is this particular scheduling scheme is subject to countermeasures by users, okay? A countermeasure would be something that a user could do that's going to foil the intent of the OS designer. So for instance, in a multi-level feedback scenario, you put in a whole bunch of meaningless IO just to keep the job's priority high. And of course, if everybody did this, then the scheme doesn't work, right? And there's a famous example of this back in the early days of game-playing computers: there was an Othello contest where everybody brought their Othello-playing programs. Othello is a board game, for those of you not familiar with it; it's not just a Shakespearean character. And you play against the competitor. And so the key was you wanted as much CPU as you could get. And at one point, the winning team found out that if they just put a whole bunch of printfs in a tight loop, they could get scheduled more often and have a lot more CPU time. All right. So there's an example of a malicious program exploiting the underlying scheduler. All right. Now, there is a real case of a scheduler like this. By the way, I will say that there are many schedulers like this in the world. SunOS was notorious for having a very complex one of these. Linux had something called the O(1) scheduler, okay? And it actually had 140 priorities. If you look here, the first 100 of them, from zero to 99, were considered real-time priorities.
And those are the highest; zero, by the way, is the highest, and the lowest is on the right. And then the user tasks had another 40 priorities, which were changed by the nice command. So 40 for user tasks, 100 for real-time or kernel tasks. A lower priority value here meant higher priority, and a higher priority value meant lower priority. So I realize that's confusing, but zero is high priority. And the key thing that made this O(1) was that it didn't matter how many tasks there were in the system, the computing that the scheduler did was always O(1). So that seems like it ought to be a good thing. You could imagine — we were talking about machine learning earlier — you could imagine that the more tasks you got, the more machine learning you were doing, and maybe things wouldn't scale as constant time, but rather might scale as the number of threads or something like that. And so you'd get very bad behavior as you added threads. So the great thing about the O(1) scheduler was that all of the internal scheduling data structures and so on were O(1). So that seems like a good thing, okay? And time slices (that means quanta), priorities, and interactivity credits are all computed when a job finishes its time slice. I'll say a little bit about what that means in a moment, but you could imagine, if I've got 40 possible user task priorities — we'll ignore the real-time ones for a moment — then I might want to try to deal with interactivity by moving things that had short bursts to higher priority temporarily, as long as they had short bursts. So that's where the heuristics start coming into play, okay? And the way that this ended up being O(1) was that there were two completely separate priority queues for the ready queue, one called active and one called expired. And all tasks in the active queue would run until their time slice expired and then get placed on the expired queue, and you'd go through and everybody would get to run, and then you'd swap the two, okay? And so it ended up being O(1) as a result, and the time slice depends on priority, linearly mapped: things with higher priority got to run longer than things with lower priority, okay? So this is very similar to a multi-level queue. In fact, it is a multi-level queue kind of in disguise, because we have 140 levels here, okay? And the decision about how you move something back and forth between queues is where the heuristics come into play, okay? Now, here's another look at the O(1) scheduler. Basically you have the expired and the active queues, each with a bunch of priorities. You run each task at the highest priority, and when its quantum expires, you flip it over to the expired queue; you keep going until there's nothing left, and then you swap the two. And the thing that made this complicated was not what I just described to you; what made it complicated was all of the heuristics to boost the priority of IO bound tasks up and down, or to boost the priority of starved tasks up from the low priorities, in order to make sure that somehow all users of the scheduler were happy, okay? So heuristics would take every process or thread and make a decision about moving it down in priority or up in priority based on its past behavior. And these heuristics were very complicated. So the heuristics are interesting to at least talk about.
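Before we get into the heuristics, here's roughly the shape of that active/expired trick, sketched in C. This is not the actual kernel code — in the real scheduler the per-level occupancy is a bitmap scanned with a find-first-set instruction, which is what makes picking the next task constant time; a loop over 140 fixed levels stands in for that here, and the task struct and list handling are simplified for illustration:

```c
#include <stddef.h>

#define NPRIO 140   /* 0..99 real-time, 100..139 user; 0 is highest */

struct task {
    int priority;
    struct task *next;
};

struct prio_array {
    int count[NPRIO];             /* stand-in for the kernel's bitmap */
    struct task *queue[NPRIO];    /* one run list per priority level */
};

static struct prio_array arrays[2];
static struct prio_array *active  = &arrays[0];
static struct prio_array *expired = &arrays[1];

static void enqueue(struct prio_array *a, struct task *t) {
    struct task **q = &a->queue[t->priority];
    while (*q != NULL)
        q = &(*q)->next;          /* FIFO within a priority level */
    t->next = NULL;
    *q = t;
    a->count[t->priority]++;
}

static struct task *dequeue(struct prio_array *a, int prio) {
    struct task *t = a->queue[prio];
    a->queue[prio] = t->next;
    a->count[prio]--;
    return t;
}

/* Constant time regardless of how many tasks exist: the work depends
   only on the fixed number of levels, never on the task count. */
static int highest_nonempty(struct prio_array *a) {
    for (int p = 0; p < NPRIO; p++)
        if (a->count[p] > 0)
            return p;             /* lowest index = highest priority */
    return -1;
}

struct task *pick_next(void) {
    int p = highest_nonempty(active);
    if (p < 0) {
        /* Active queue drained: swap active and expired -- also O(1). */
        struct prio_array *tmp = active;
        active = expired;
        expired = tmp;
        p = highest_nonempty(active);
    }
    return (p >= 0) ? dequeue(active, p) : NULL;
}

/* On quantum expiry a task moves to the expired array, so it won't run
   again until everybody else has had a turn. */
void quantum_expired(struct task *t) {
    enqueue(expired, t);
}
```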
So the user task priority got adjusted plus or minus five, based on heuristics involving how long the task had been sleeping versus how long it had been running. And a higher sleep average meant it was a more IO bound task: you got more reward, you got to raise your priority. There was something called an interactive credit, which was earned when the task slept for a very long time and spent when the task ran for a very long time. And the interactive credit provided some hysteresis, to avoid changing the priorities too frequently. And things that were interactive got some special dispensations. So if the scheduler really figured out that something was interactive, then when its quantum ran out it wouldn't even get switched over to the expired queue; it would get a chance to run again for a little bit first. Hopefully you're starting to see that this is complicated, right? The clean thing was the real-time tasks. Those 100 priorities on the high end were always run at their priorities. They always preempted the non real-time tasks. There's no dynamic adjustment, and some very well-defined schemes: either FIFO, where you ran to completion, or round robin, where you ran with a fixed quantum. So the real-time priorities were nice and clean and predictable, but it was a strict priority scheduler. The heuristics were complicated, okay? So I will tell you the end of the story here: basically this got so complicated that a bunch of maintainers of Linux decided that they were tired of it, because the heuristics got too complicated for anybody to understand their exact behavior, and eventually Linus and a few others basically threw out O(1) and came in with CFS, which we'll talk about in the later part of the lecture. But it's interesting to note the dilemma that a scheduler designer is in. Suppose you're the core developer of some operating system that's used by a whole bunch of people, and they have relied on the behavior of your scheduler and its heuristics. However, somebody isn't quite happy, so you need to change something. You don't wanna change the heuristics too much, because now everybody else is gonna be unhappy. And so you start making little tweaks, and you get this complicated decision tree — if this and that and that, change this by a little bit and then make this decision — and things rapidly get out of hand. And at one point in the 2.6 era they just gave up, threw up their hands, and tossed out the O(1) scheduler. Even though the scheduler itself is extremely efficient as the number of tasks grows, it's just too complicated to understand, and it starts doing weird things that nobody knows why, and it's not easy to make it work well. Okay, questions. So the end of this story, by the way, is that O(1) doesn't exist anymore. Well, it exists, but nobody uses it. So, administrivia. We're still grading midterm one. I think it was a pretty reasonable difficulty. It might have been a little bit on the hard side, but we'll know more when we get things up. We had some people that had some issues with the Zoom recordings. So we'll probably look extra carefully at people that missed recordings, but we may give a pass for not having them this time. But you might wanna practice getting the Zoom portion set up, just so that it works smoothly for midterm two. It seemed like when we finally settled on the actual Zoom proctoring that we did, people mostly were okay with it and it mostly worked. So that was a good thing.
There was a little bit of a discussion, and I just wanted to say something to make sure everybody knows this: yes, we are allowed to Zoom proctor midterms as well as finals. The CS half of the department was actually given permission to proctor midterms for select courses in addition to finals. And so CS 162 is authorized; we requested it and we were authorized to do that. So if you have other folks that are still wondering about that, we do have that authorization. And I think it worked pretty well. I know that people got a little nervous about it. Don't be nervous; if everything worked out, you'll be fine. But I'm hoping it gave people a little bit of a sense that they could just do the exam normally without worrying that cheating was going on around them. You should let us know — we're actually gonna put out a survey soon just to know how people think the class is going, and the midterms and projects and everything, because this is a very hard term, obviously, being all virtual. So the bins are gonna pop up before the grading is done. I haven't looked at any of the grades yet. The bins are essentially slight tweaks off of what was in the summer, but I hadn't gotten them up yet. So I apologize for that, but we really are setting the bins independent of the grading. So the problem with noise canceling headphones is really the issue of not knowing who's listening to what. So there's a little bit of a challenge on that. We will try to figure out things about that as we go. All right. And there may be some way that we could handle that, but for now, no headphones, but maybe we can work something out. Let's put that on the list to figure out, okay? All right, let's see. Yeah, the question about when we'll be done grading: we hope to be done certainly later this week. As you can imagine, things get a little trickier in this format for grading and so on. So we're working on it, and as soon as we can, we'll get them out. We won't make you wait too long, I promise. So now back to non-midterm stuff. Group evaluations are coming out for project one soon, because project one is almost done. And the way this works is you get to evaluate your partners for how well they're interacting in your group, okay? And so every one of you gets 20 points for every other partner, not yourself, and you get to distribute them to your partners in any way you want. So if there's a four person group, that means you get 60 points, because there's three other partners, and you get to distribute those points to the other partners, okay? No points to yourself. And this is one of many evaluation techniques — not the only one — that we use to understand kind of how partners are working with each other. The other one, of course, is what your TAs understand about your project dynamics, or what you've talked to any of us about, okay? But in principle, if a partner really isn't participating at all — in the extreme case, which almost never happens, but it can — all of the missing partner's points could be redistributed to their partners. You could think of this as almost a zero sum game at that point. And the reason we do this, and we've done this in 162 forever, is that really this is a project course, and you're supposed to be working with your partners and relying on each other, okay? And so this is a way of us understanding how you're doing.
And one of the reasons I'm bringing this up is there are a couple of folks in the class that have essentially dropped off the earth. I think they were kidnapped by aliens; I'm not entirely sure. But if you're one of those people and you're hearing this broadcast out on Mars, please come back and start working with your group again, okay? Respond to email, respond to your TAs, respond to your other partners, okay? All right, come back from Mars. Now, you might wanna make sure that your TA understands any group issues you might be having. I'm happy to meet with groups that wanna do a bit of fine tuning on their interactions, but let's figure out now, while project one is essentially done, how to get a fine-tuned, happy group. And to that end, we're going to start with the group coffee hours I promised at the beginning of the term, and one of the TAs, Akshott, is gonna be posting how to do this a little bit later in the week. But the idea is you can get extra credit points for screenshots of you and your team with cameras turned on, interacting and holding up your favorite beverage of choice. And this is just a gimmick, but on the other hand, it's a reminder that you ought to be interacting with your group with your cameras turned on, just to get things working. If you're dealing with extreme kinds of group issues, it starts with actually seeing the other members and talking to them, okay? Texting, tweeting, Slack — pick your favorite communication technology that doesn't involve video — these are all fine and they have their place, but they can't be the exclusive way that you interact, because things are just not gonna go well, all right? And look, if we were in real life instead of virtual, you would be meeting with your team all the time. So let's see if we can get the groups working well again, okay? You don't have to be holding a beverage. You could be pretending to hold a beverage if you like; glass of water works, cup of coffee, whatever. All right? Okay, and don't forget to turn the camera on for discussion sections. Okay, all righty. Now, I think that's all I wanted to say administrivia-wise. So we'll get the final grades of the exam out. I think things went fairly smoothly, so that's good. Okay, so does the OS schedule processes or threads? Well, many textbooks use the old model, which is one thread per process. Oh, and by the way, look, if you really can't use a camera for some reason, talk to us. I think you can okay that with us, but I would really like you guys to try to interact in whatever way works. So, does the OS schedule processes or threads? Many textbooks, as I said, use the old model of one thread per process, all right? And this was the case for decades. And then the advantages of threads started becoming obvious. So you want a single protection domain with lots of concurrency in it; the only way to do that is many threads per process. And the way that this started was with user level threads being scheduled on top of a single kernel thread. And then that got moved into the kernel to some extent, which is where we are now with things like Linux, okay? So usually the scheduling is on a per thread basis, not a per process basis. The only reason we might think about processes is really if we were interested in enforcing some sort of fairness which said that each process gets a fraction of the CPU, and then we divide it up per thread. That's a policy.
But the way that would actually be implemented today is you divide the CPU up per thread based on that policy, and then the threads are the things that get scheduled, because the threads are the things that are being switched inside the kernel, okay? One point to notice is that switching threads versus switching processes does incur slightly different costs. If you can really know that you're switching from thread A to thread B in the same process, the overhead is lower, because in switching threads you really only have to save and restore registers, whereas in switching processes you're actually changing the active address space as well, which can get a little expensive and certainly disrupts caching, okay? And I think I showed you that there can be a factor of 40 difference in Linux for these two things, okay? Now I will toss out there, just to tie together the beginning of the class, that simultaneous multi-threading, or hyperthreading, is available on some CPUs. And remember that the idea there is that different threads are interleaved on a cycle by cycle basis on the same CPU, okay? And that's got some magic that you'd talk about if you took 152, for instance. But in those instances, the different threads can each have different pointers to their page tables, which means they can each be in different processes, and the hardware would still switch them on a cycle by cycle basis. So if you have hyperthreading, you might get really fast switching. But in general, if you're switching from one thread to another on a CPU and you have to switch the address space, that's more expensive, okay? Now, what about multi-core, or even multi-processor, where you have a bunch of multi-core chips that are tied together into a big shared memory machine? Okay, so algorithmically there's not a huge difference from single-core scheduling, except that there's a bunch of simultaneous things that can be running, okay? And so now you have the choice: if I have a big pot of potential threads on my ready queue, which group of them do I have running at the same time? All right, so it's helpful in some sense to have a per-core scheduling data structure, among other things because of caches. Each core typically has a first and a second level cache in today's processors, and so if you have a thread that ran on core one and then was put to sleep and then went back to core one, you're gonna have some cache state that it can use, whereas if you always schedule the thread on a different core, you don't have the advantage of the cache, okay? And so there's something called affinity scheduling, which most good operating systems have, which basically says that once a thread is scheduled on a CPU, the OS tries to reschedule it on the same CPU, to reuse cache and other CPU local storage and resources, like branch predictor state — another good one. But of course, if there's 20 idle cores and one busy core, there is gonna be a point at which affinity scheduling is traded off against parallelism, and probably the choice will be made to migrate the thread at some point. But we have to start thinking about these issues, okay? And here's an interesting thing that we kind of brought up when I showed you test and set, but I wanna re-emphasize it: remember the idea of a spin lock. This was the thing not to do with test and set, I told you, okay? The way you do an acquire of a lock is you run test and set on the address of the value until you eventually get back zero.
And the reason for that is that test and set, if you remember, grabs the value, stores a one, and returns the old value. And if the value was zero, it means the lock was free. And even if you have a thousand threads that all simultaneously do test and set, because it's an atomic operation, only one of them ever gets the zero and all the others get one. And so the one that gets the zero gets to exit the while, and now it's in the critical section, and the way you release the lock is you set the value back to zero, okay? So a spin lock doesn't put the calling thread to sleep, it just busy waits, which Kubi said is a bad thing, right? Busy waiting is bad. Well, okay, I'm gonna tell you one instance where it may not be bad — don't do this at home, folks. When is this preferable? Well, it might be preferable if you've got a set of threads running simultaneously on the same task and they're waiting at a barrier for each of the threads to finish, okay? So let's give an explicit example. There are 20 threads, and what they're gonna do is run in parallel for a while, and then they're all gonna wait until they're all done before they continue, just like a join. And in that instance, you wanna have something like a test and set that's gonna wait, spinning, until the last of the threads is done, and then it releases quickly. And the reason it can release quickly is because basically we don't have to reload the thread off of some ready queue or some wait queue, reload all the registers out of the TCB, and so on. We don't even have to dive into the kernel necessarily. So if we know that the set of threads that are spinning are all part of the same task, then this could be okay, okay? Because everybody would wake up very quickly. Unfortunately, every test and set is a write, and I'll come back to that in a second. So this could be preferable if you've got a multi-processor program with some simultaneously scheduled threads that are all spinning and waiting for each other, because in that instance it's okay to spin, since you're all part of the same task. Now, you gotta be careful not to do this for too long, because you'll end up wasting cycles if you do it incorrectly. Now, how would you know they're waiting for each other? So this is a question, good. The reason you'd know is because you've written a multi-processor program that you know has a barrier — you know that every thread comes to a single point and waits for all the others to get to that point, and then it continues — and you ask the operating system to schedule you all simultaneously. So if all of the cores are all working on the same thing, then this might be okay. So if you're trying to optimize a parallel program, for instance, you might use spin locks, okay? And the limit of how many threads you could have would be the number of cores, exactly. You have to be very careful about doing this. Now, back when I was building multi-processors a while ago, we actually had a variant of this which was called two-competitive. What that meant was you'd spin until the time that you've wasted spinning is exactly equal to the time it would take to put you to sleep. And at that point, you'd go to sleep. And so in the best case, where you're only waiting very briefly, threads would spin a bit and then they'd exit. If something screws up, like interrupts happen or you don't have enough things scheduled, then you would go to sleep after spinning for a while.
And this is called two-competitive because, in the worst case, you'd never waste more than twice what you'd have wasted by going to sleep right away, all right? Now, of course, the problem with this spin lock is that test and set is actually a write, if you think about it. Why is that a write operation? Well, it's a read followed by a write. And in cache coherence, a write is a bad thing, because it invalidates all the other copies and then gets a copy into your cache before it does the write. And so if every core is doing while-test-and-set, then that poor lock is bouncing all over the place and you're using up a whole bunch of memory bandwidth. So what you really want is test and test and set, which we showed you in lecture seven, where what you do is you spin on "while value is set" — and that's a read. And so everybody gets the ones into their caches and they're just spinning locally. And then as soon as the value goes to zero, you do test and set to grab it. And so you're vastly speeding things up as a result. Okay, so when multiple threads are working together, like I just said, the only way that this works well is if they're all scheduled at the same time; that's called gang scheduling. And so there are gang scheduling operations that kernels offer, which makes spin waiting more efficient. Because it's really inefficient to spin wait for a thread that's suspended and sleeping on the wait queue — now you're really wasting time. Okay, and there's an alternative where the OS informs a parallel program how many processors its threads are scheduled on, called scheduler activations. And there the application adapts to the number of cores it's scheduled on. And so you get kind of the best of both worlds: you only have as many threads as you have cores currently scheduled.
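Before we dive into real time, here's a little sketch in C pulling those spin-waiting ideas together: a test-and-test-and-set acquire, plus the two-competitive spin-then-sleep fallback. I'm using C11 atomics for the test and set, sched_yield as a stand-in for actually blocking, and SLEEP_COST is a made-up constant for what blocking would cost:

```c
#include <stdatomic.h>
#include <sched.h>        /* sched_yield() stands in for real blocking */

static atomic_int value = 0;   /* 0 = lock free, 1 = lock held */

/* Atomically store a one and return the old value; out of many
   simultaneous callers, only one can ever see a zero come back. */
static int test_and_set(atomic_int *v) {
    return atomic_exchange(v, 1);
}

#define SLEEP_COST 1000   /* assumed cost (in spin iterations) of sleeping */

void acquire(void) {
    int spins = 0;
    for (;;) {
        /* Test-and-test-and-set: spin on a plain read first, so the
           lock word sits shared in everyone's cache; only when it looks
           free do we issue the expensive, write-causing test and set. */
        while (atomic_load(&value) != 0) {
            if (++spins >= SLEEP_COST) {
                /* Two-competitive: we've now spun about as long as
                   blocking would have cost, so give up the CPU. */
                sched_yield();
                spins = 0;
            }
        }
        if (test_and_set(&value) == 0)
            return;               /* we got the zero: lock is ours */
    }
}

void release(void) {
    atomic_store(&value, 0);      /* setting it back to zero frees it */
}
```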
Now, let's talk about real time scheduling. What we've been talking about up to now is scheduling that either optimizes response time or optimizes throughput, right? Or maybe some sort of fairness, which is some combination of them. Real time scheduling has a different goal: in real time scheduling, predictability of performance is far more important. So a typical real time task might be something like the brakes on a car, where there's a limit on the time from when you slam on the brake to when the brake pads start slowing you down. We wanna make sure it happens predictably and quickly; otherwise, maybe I've slammed on the brakes and I end up hitting something, okay? So in real time scheduling, our goals are different. It's about predictability and meeting deadlines. And here we need to predict with confidence, for instance, what the worst case response time of the system is — not how to optimize response time. See, that's a different thing, okay? And so in a real time system, performance guarantees are often task or class centric and are figured out in advance. And I'll show you how we do that. But the simple example would be the time between when I slam my brakes on and when the brakes start working: there's a deadline there that, no matter what the scheduling of the system is, we hope is never exceeded, okay? So in contrast, in a conventional system, performance is system or throughput oriented. It's kind of wait and see: we'll try to run everything we can at the best speed we can, whereas real time is about enforcing predictability. So hard real time, which is for time critical, safety oriented systems like brakes: the idea there is you're gonna meet all the deadlines, and you determine in advance whether this is possible. And there are some good schedulers. We'll talk about one called earliest deadline first, EDF, but there are also things like least laxity first, rate monotonic scheduling, deadline monotonic scheduling, et cetera. Soft real time is like hard real time, but softer. It's used for things like multimedia, where we're gonna try to meet all the deadlines. So in the case of a video, you're gonna try to make sure that every frame comes up at the right time, but if you miss a video frame, it's not the end of the world, okay? Whereas if you miss something in hard real time, your car runs into a wall, okay? And so in soft real time we're gonna try to meet the deadlines with high probability; something like a constant bandwidth server is a good example there, okay? So we're gonna take a really brief break and we'll be right back here. So let's see if we can define this real time scheduling a little bit more succinctly. In a typical real time scenario, tasks are preemptible, they're independent, and they have arbitrary arrival or release times. Tasks typically have deadlines and known computation times. And here's an example setup, okay? If you take a look here, we have threads one, two, three and four. And in this instance, let's look at thread one. There's a release or arrival time, that's the up arrow. There's some computation, which is represented in gray here, all right? And then there's a deadline, which is the point by which the real time scheduler has to have completed the computation. So although I show you here that all of the computation happens right at the beginning, it could be spread anywhere between when the task arrives and when the deadline is, and that would be fine, as long as it's completed by the deadline, okay? And T2 here has an arrival at a point that's earlier than T1's and a deadline that's later, et cetera. So those are the key parameters: when does it arrive, what's the computation, what's the deadline? Notice here that since we have overlapped computation, this is not a possible scheduler result if we only have one core, okay? Because notice that we would have to have multiple things executing at the same time in order for this to happen this way, okay? Is everybody with me on this model? Okay, questions? So if this doesn't work, what could we do? Well, we could try running a round robin scheduler, okay? Notice, by the way, that T4 arrives first, and then T3, and then T2, and then T1. And so if you look here, T4 runs with some quantum that we come up with, and so T4 gets the first quantum, and then there's nothing else to run, so it gets to run again, but now T3 is there. And so this might be a round robin schedule of that previous set of threads. And what happens is we hit a point at which we haven't finished all the computation, but the deadline shows up. So in this scheduler instance, round robin doesn't work and your car runs into the wall. So this seems unfortunate, okay? And what's the problem here? Well, the problem is that round robin has no notion of deadlines. I mean, it wasn't designed for deadlines, it was designed for multiplexing, okay? And the requirements for a scheduler for deadlines are fundamentally different from the requirements for multiplexing, okay?
Now, one of the most common — and, you know, I'll call it famous — schedulers is called earliest deadline first. And in this instance, our threads are typically periodic: they have a period P and a computation C in each period. Okay, and what we mean by periodic is that the thread will have an arrival that happens over and over again. The computation will always be the same, and the next arrival will be right at the deadline spot. So you can imagine that there's another arrival, and another, and another, and they keep getting reintroduced regularly, and the parameters are the period — how long the time between arrival and deadline is — and the computation. And the trick is: can we schedule this in a way such that we don't miss any deadlines, okay? And so every task has a priority based on how close its absolute deadline is; kind of makes sense, right? So you take the set of threads that are currently ready to schedule on the ready queue, and you ask which one of these is closest to its deadline, and that's the one I'm gonna let run, okay? Whoever is closest to its deadline is the one that gets to run. So this is a type of priority scheduling where the priority is based basically on proximity to the deadline. That's why it's called earliest deadline first, okay? And so here's an instance where thread one has a period of four and a computation of one. Thread two has a period of five and a computation of two. Thread three has a period of seven and a computation of two. And so if you notice, thread one obviously is arriving most frequently, thread two is arriving a little less frequently, and thread three is the least frequent. And now let's run this, okay? And let's assume that everybody arrives at time zero, okay? If they all arrive at time zero, then, for instance, four time units later we know that thread one has its deadline and arrives again, et cetera. And so if we look here from time zero, which one has the closest deadline? Well, clearly thread one's deadline is closest. So we let thread one run, okay? And it runs its one unit of computation, okay? And at that point it's done for this period. Thread two now has the next closest deadline, so it gets to run with its two units of computation. And then, last but not least, thread three gets to run, and it runs its two units of computation. Now, as I told you, this is periodic. So after this, thread one will have a new arrival, thread two will have a new arrival, thread three will have a new arrival, and we can keep going with the scheduling. I'm not gonna go over this in detail, but what we're doing here is saying that at any point in time, the thread whose deadline is closest is the one that gets to run, okay? Questions? Now, what we would find here is that, assuming we have been careful not to overload the system, we will always meet deadlines, okay? And notice the requirement, by the way, is that preemption has to be a possibility. So if you were to run these for long enough, you would find that eventually some of them get interrupted: they compute for a little while, then something higher priority runs, and then they compute afterwards. As long as your tasks can be preempted, EDF is the best way of handling this particular scheduling requirement, okay? And how do we know this? Well, even EDF won't work if you put too many tasks here, right?
If you fill this up with so much computation that you're using more than a hundred percent of one CPU, then you're not gonna be able to schedule it, okay? Now, the question about how tasks submit their periodicity to the scheduler: they would actually say, here's my thread and here's my period and here's my computation. They would actually input that, okay? So this is not just an idealistic scenario, this is a real scenario. The thing that you're probably wondering — which is a very good thing to wonder — is: how do I know what C is? And if you were to go into the real-time literature, there's a lot of work that's been done on how you compute the worst case time for a computation, and that's what we're calling C here. And so there's a lot of work both in having the compiler compute what C is and in building processors that are more predictable than regular ones. You might imagine, for instance, that the cache actually gets in the way of predictability, and so some people who are designing real-time processors actually completely disable the cache, okay? And yes, that's right: this only cares about the deadline, not deadline minus computation time, because by my problem statement, as long as we get all the computation in before the deadline, we're good, okay? Now, EDF won't always work, but it turns out EDF is optimal in the following sense. If you take each task's computation C divided by its period P, and you sum those up over all the tasks, the task set is schedulable exactly when that sum is at most one: C1/P1 + C2/P2 + ... <= 1, okay? And let me just give you a very simple intuition of why that makes sense. The idea here is that if there's one unit of computation every four units of time, one divided by four, I'm using up one quarter of the CPU; two out of five is another 40%; two out of seven is about 29%. And if I were to add up all those percentages and they came out to more than one, then I would realize there's absolutely no way to schedule this, okay? And EDF basically is optimal in that you can use 100% of the CPU here, if you ignore the switching overheads.
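And that utilization test is simple enough to write down. Here's a minimal sketch in C using the example task set from a moment ago — periods 4, 5, 7 and computations 1, 2, 2:

```c
#include <stdio.h>
#include <stdbool.h>

/* EDF admission test: a periodic task set is schedulable on one core
   (ignoring switching overhead) exactly when sum(C_i / P_i) <= 1. */
static bool edf_admissible(const int C[], const int P[], int n) {
    double utilization = 0.0;
    for (int i = 0; i < n; i++)
        utilization += (double)C[i] / P[i];
    return utilization <= 1.0;
}

int main(void) {
    /* the lecture example: T1=(C=1,P=4), T2=(C=2,P=5), T3=(C=2,P=7) */
    int C[] = {1, 2, 2};
    int P[] = {4, 5, 7};
    /* 1/4 + 2/5 + 2/7 = 0.25 + 0.40 + 0.286 ~= 0.936, so this fits */
    printf("schedulable: %s\n", edf_admissible(C, P, 3) ? "yes" : "no");
    return 0;
}
```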
Now, how do we ensure progress? So starvation is a situation where a thread fails to make progress, and starvation is not deadlock. Next time we're gonna talk about deadlock, because starvation is something that could resolve under the right circumstances, whereas deadlocks are unresolvable; but starvation can still be bad, okay? And there can be causes of starvation like the scheduling policy never running a particular thread, or threads waiting for each other, or spinning in a way that'll never be resolved — but that isn't a cyclic deadlock, okay? Now, by the way, deadlock is a type of starvation, but not all starvations are deadlocks, okay? So let's see a little bit about what kinds of starvation we could have. Here's a straw man, which is a non-work-conserving scheduler. So you have to know what work-conserving means: a work-conserving scheduler is one that does not leave the CPU idle when there's work to do, okay? And so a non-work-conserving scheduler could trivially lead to starvation if, for instance, it just doesn't schedule something, right? Maybe there's a bug in your scheduler. But let's assume that everything's work-conserving. So here's a different one that is work-conserving but still could lead to starvation, which is last come first serve. This is a LIFO stack: the late arrivals are put on the top of the stack and they get served first. The early ones end up waiting. So this is extremely unfair, and in the worst case, if tasks keep arriving, the original ones never run, all right? That's when the arrival rate exceeds the service rate. We'll talk more about queuing later in the term, but this is a queue where, if it builds up faster than it drains, the things that arrived early will never get to run. Now, if we had FIFO instead of LIFO, the queue can also build up, but at least there we're servicing the oldest things first; you can still have a queue where things arrive too fast and you're not servicing them all, okay? So what does it mean for the CPU to be idle? What it means for the CPU to be idle would be a situation in which it's not actually doing any useful user work; instead it's spinning, or it's basically in the idle thread and not running things that are ready to run. That would be idle, okay? So if things are schedulable — they can run — then we wanna make sure we always run them, okay? Now, what about first come first serve? We showed you this idea last lecture, where things are arriving — that's these colored threads — and then they get scheduled in the same order they came. But notice that this red one is very long, and so while it's running, all these other ones are building up, and then when it finishes, the other ones get to go in order, FIFO order. And this leads to starvation, because if a thread never yields — it goes into an infinite loop or something — then other tasks never run. So this is the problem with all of the non-preemptive schedulers: if you have a buggy task, or a non-social task — let's say one that's being antisocial — then you basically get starvation, and all of the early operating systems on personal computers had this problem. I mentioned in the first lecture that things like macOS and Windows 3.1, et cetera, had this problem, okay? So what about round robin? Well, the nice thing about round robin is that you always go through every task. So each of the n processes gets one nth of the CPU in the worst case, and so with a quantum of length q milliseconds, a process waits at most (n minus one) times q to run again. And so a process can't be kept waiting indefinitely, so this doesn't lead to starvation. So it's fair in terms of waiting time, not necessarily in terms of throughput, because tasks vary in size based on their requirements, and so we don't necessarily guarantee everybody gets the same throughput. Okay, but what about priority scheduling? We also talked about that. If you recall, a priority scheduler always runs the thread with the highest priority. So in this case, priority three has jobs one, two and three, and it's gonna run all of those, maybe round robin — jobs one, two and three — and then finally, when that's done, it'll go down to job four, which is priority two, and if one, two, three and four are gone, then it'll get around to five, six and seven. So here's a case where we're clearly gonna starve: if we keep putting high priority tasks in there faster than they can finish, then the low priority ones never get to run. But there's an even more serious problem than starvation here, okay, called priority inversion, where high priority threads might become starved by low priority threads under the wrong circumstances. And you're about to start the next lab, and project number two is basically gonna start looking at scheduling, so you're gonna need to address the following problem.
Okay, but what about priority scheduling? We also talked about that. If you recall, a priority scheduler always runs the thread with the highest priority. So in this case, priority 3 has jobs 1, 2, and 3, and it's gonna run all of those, maybe round robin among them, and then when they're done it'll go down to job 4, which is priority 2, and once 1, 2, 3, and 4 are gone, it'll finally get around to jobs 5, 6, and 7. So here's a case where we're clearly gonna starve: if we keep putting high-priority tasks in there faster than they can finish, the low-priority ones never get to run. But there's an even more serious problem than starvation here, okay, called priority inversion, where high-priority threads can effectively be starved by low-priority threads under the wrong circumstances. And you're about to start the next lab, project two, which is basically gonna start looking at scheduling, and you're gonna need to address the following problem.

So let's talk about priority inversion. Here's a priority inversion situation where the low-priority task, job 1, acquires a lock, okay? Now let's suppose that it acquired that lock and then jobs 2 and 3 showed up, or suddenly became runnable, or whatever the case may be; there could be many reasons why job 1 was running while 2 and 3 were suspended. But whatever the reason, job 1 now has the lock while 2 and 3 are higher priority, and so maybe job 3 starts running, okay. Now what happens if job 3 tries to acquire the lock? Job 3 tries to acquire the lock held by job 1, it can't, and so job 3 has to go to sleep, blocked on the acquire. So just take a look at this picture: we have a scenario where the highest-priority task in the system, job 3, can't run, and it's being blocked by job 1, at least, okay. That's an inversion of priority, which is problematic at best.

Now, if job 2 weren't in the picture, the fact that job 3 is blocked means that job 1 would get to keep running until it released the lock, and then job 3 would wake up right away and we'd be good to go. But the mere presence of job 2 is a problem, because if job 2 is busy running, it could run for a very long time, and while job 2 runs, job 1 doesn't run, and as a result job 3 doesn't run. So you could say that job 2 is actually holding up job 3, okay. And this is a priority inversion that may not resolve quickly, because job 2 may run for a long time. Whatever the reason was that the designer of your tasks put job 3 at the highest priority and job 1 at the lowest, it's not happening right now, because job 2, which is supposed to be in the middle of them, is basically screwing this all up.

So what can we do? Clearly we need to somehow get job 1 to run long enough to release the lock so that job 3 can run, okay. The medium-priority task is busy starving the high-priority one; anybody think of what we do? Okay, a signal; well, maybe you could use a signal here, but that would require more programming, and that might not be what we want. Yeah, give task one more priority, okay, good, or priority donation, all right, and we'll show you how to deal with that. And where else might priority lead to starvation or livelock? There are lots of cases where you might have a high-priority task spinning, waiting on a lock that a low-priority one needs to release. The high-priority one is running, but it's not running successfully, because it's just spinning and waiting. That's another type of inversion, where the thing looks like it's running but it's not doing any real work.

So yes, priority donation. The trick here is that job 3 temporarily grants job 1 its priority, so job 1 gets this temporary boost, long enough to run at high priority and release the lock, okay. And how did that happen? Job 3 donated its priority to job 1. Sometimes this is called priority inheritance; that's another term for the same thing, okay.
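Here's a minimal sketch in C of what donation might look like. This is not the real code from Pintos or any kernel: the names are made up, the actual blocking on the lock is elided, and a real implementation, like the one you'll build in project two, has to track multiple donations per lock and handle nested and chained donation.

```c
#include <stdio.h>

struct thread {
    int base_priority;       /* the priority the designer assigned */
    int effective_priority;  /* base priority plus any donations   */
};

struct lock {
    struct thread *holder;   /* NULL when the lock is free         */
};

void lock_acquire(struct lock *l, struct thread *me) {
    if (l->holder && l->holder->effective_priority < me->effective_priority)
        /* Donate: boost the holder so it can run long enough to
           release the lock instead of being starved by job 2. */
        l->holder->effective_priority = me->effective_priority;
    /* ...block here until the holder releases; elided... */
    l->holder = me;
}

void lock_release(struct lock *l) {
    /* The donation ends with the lock: revert to base priority. */
    l->holder->effective_priority = l->holder->base_priority;
    l->holder = NULL;
    /* ...wake the highest-priority waiter; elided... */
}

int main(void) {
    struct thread job1 = {1, 1}, job3 = {3, 3};
    struct lock lk = { &job1 };       /* job 1 already holds the lock */
    lock_acquire(&lk, &job3);         /* job 3 donates its priority   */
    printf("job 1 now runs at priority %d\n", job1.effective_priority);
    return 0;
}
```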
All right, and at some point job 1 releases the lock, and at that point job 1's priority goes back to low, but the lock's been released, so job 3 can run, okay. And this is the point at which we make forward progress. Now, the question might be: how does the scheduler know? The scheduler knows because it's keeping track of the donation that's going on. And another question: why was job 2 running before job 3 if job 3 has higher priority? Well, if you go back to this scenario, the problem is that job 3 can't run because it's sleeping on the acquire for the lock. So job 3 is not running, it's sleeping; job 2 gets to run because it's runnable; and job 1 is runnable but doesn't get to run because job 2 has higher priority, okay. Hopefully that answers the question. So this is a scenario where the reason job 3 isn't running is that it actually tried to do an acquire and went to sleep, okay. And you get to actually implement priority donation in project two.

Now, this is not a theoretical problem, okay. You may have all heard of the Mars Pathfinder rover. On July 4th, 1997, the Pathfinder rover landed on Mars: the first US Mars landing since Viking in 1976, and the first rover. What's very cool, and you should all check this out, is the way they delivered this rover to the surface: as the Pathfinder spacecraft descended, it inflated a bunch of airbags wrapped around the rover, so the rover was inside this multi-airbag bubble, and it fell to the surface and bounced until it stopped bouncing, and then they deflated the airbags, and the spacecraft and the rover made it there safely. That's pretty amusing, but it's not part of our story today.

The story is that once this thing started working, it was great: it was sending back pictures, everything was great. And then a few days into the mission, multiple system resets occurred over and over again. The system would reboot seemingly at random, losing valuable time and progress, and the problem was priority inversion. There was a low-priority task collecting data, and it grabbed a lock as part of inter-process communication. A high-priority task then blocked trying to acquire that same lock, while medium-priority work kept running, so the low-priority task holding the lock never got to run and release it. The lock had to do with the buses and communication, and since forward progress wasn't being made, a watchdog timer went off and kept rebooting the machine, which was a good thing, because it rebooted into a safe state that could then be examined and patched. They were able to reproduce the problem on Earth after a number of weeks, and then they sent up a patch and fixed it.

The funny thing, perhaps, is that the solution was priority donation. That's easy, right? That's your project two. The thing that is perhaps even more amusing, or not for them, is that they had turned priority donation off, okay? VxWorks, the operating system they were using, actually had priority donation, but they had turned it off because they wanted things to be fast and were worried about the performance implications of priority donation. And as a result, they ended up with a priority inversion that basically broke stuff. So there's your story for the night, okay?
Now, I think up on the resources page I actually have an analysis by one of the engineers that talks about this particular priority inversion problem. It's a real thing.

Now, are SRTF or multi-level feedback queues prone to starvation? Yeah, well, in SRTF, long jobs are obviously starved in favor of short ones, and the multi-level feedback queue is an approximation to SRTF, so it suffers from the same problem. We can get starvation out of it just by having a lot of short, bursty tasks running. Priorities seem like they're at the root of all these problems, because even in this instance we have queues that are higher priority than others, okay? We're always preferring to give the CPU to a prioritized job, and non-prioritized jobs may never get to run. But priorities were really a means to an end here. Our end goal was to serve a mix of CPU-bound, IO-bound, and interactive jobs: give the IO-bound ones enough CPU to issue their next operation and wait, give the interactive ones enough CPU to respond to input and wait, and let the long-running ones grind away on all the rest of the CPU. So priorities were really a means to get the kind of scheduling we wanted.

And if you remember, we're living in a changing landscape here, right? This is the Bell's Law curve of computers per person: back in the 60s there might be one computer per million people, and now we might have thousands of computers per person. So we're in a very different landscape, and the question might even be whether yesterday's schedulers are the right thing. Priority-based scheduling was rooted in time-sharing: allocating precious, limited resources to a diverse workload. The 80s brought personal computers, workstations, servers, et cetera, different machines of different types for different purposes, and a shift toward fairness and avoiding extremes like starvation, rather than maximal use of precious resources. Instead, we want to use resources in a way that meets our requirements, okay? And with the emergence of the web, the data center is the computer, and as for personal computers, you guys are all walking around with a cell phone that's extremely powerful. It's all about predictability now, okay? So does prioritizing some jobs starve those that aren't prioritized? That's the question, all right?

And if you give me a few more minutes before we end here, I realize I'm running a tiny bit late, but proportional-share scheduling is an idea where we hand out proportions of the CPU, okay? The policies we've studied so far always prefer to give the CPU to a prioritized job, and non-prioritized ones may never get to run. Instead, we could share the CPU proportionally: give each job a share of the CPU according to its priority, so that low-priority jobs get a little less of the CPU than high-priority jobs, but everybody gets to run. And if you recall from last time, we talked about lottery scheduling, where every job gets some number of lottery tickets, and whenever we want to schedule the next task, we draw a lottery ticket, and the job holding the winning ticket is the one that gets to run, okay? So for instance, in this scenario with yellow, red, and blue jobs, the red job gets 50% of the CPU, the blue one 30%, and the yellow one 20%. And this is a way of providing a fair-queuing style of CPU distribution.
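As a sketch, the lottery draw is just a weighted random pick. Here's one way it might look in C, using the 50/30/20 split from the slide; the job struct and names are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct job { const char *name; int tickets; };

/* Draw one ticket uniformly at random from all outstanding tickets;
   the job holding it runs next. Over many draws, each job's share
   of the CPU approaches tickets / total. */
const struct job *lottery_pick(const struct job *jobs, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) total += jobs[i].tickets;
    int winner = rand() % total;         /* the winning ticket number */
    for (int i = 0; i < n; i++) {
        if (winner < jobs[i].tickets) return &jobs[i];
        winner -= jobs[i].tickets;       /* skip past this job's tickets */
    }
    return NULL;                         /* unreachable */
}

int main(void) {
    struct job jobs[] = { {"red", 50}, {"blue", 30}, {"yellow", 20} };
    srand((unsigned)time(NULL));
    for (int i = 0; i < 10; i++)         /* ten scheduling decisions */
        printf("%s ", lottery_pick(jobs, 3)->name);
    printf("\n");
    return 0;
}
```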
Now, we talked about lottery scheduling last time, so I'm not gonna go through it in great detail, but there is a certain unfairness that comes from the randomness here, okay? The problem is that because we're drawing tickets at random, it takes a long time before two tasks with an equal number of tickets actually receive an equal share of the CPU. So as cool as the lottery ticket idea is, it's still got this unfairness problem, okay?

So we could do something similar but different, which is to achieve proportional-share scheduling without resorting to randomness, overcoming this law-of-small-numbers problem, by using something called stride scheduling. The stride of each job is some big number W divided by that job's number of tickets. So for instance here, if W is 10,000, and A has 100 tickets, B has 50, and C has 250, then the strides are 100, 200, and 40. Every job also has a pass value, and the scheduler picks the job with the lowest pass, runs it, and then adds its stride to its pass. And what you see is that because we're picking the job with the lowest pass number, the jobs with small strides get to run more, because they advance less with each run, and those are the jobs with lots of tickets. So this is called stride scheduling because you're adjusting the stride of how far each job walks, and low-stride jobs, which have lots of tickets, get to run more often and get a bigger proportion of the CPU. Now, it gets a little messy when you worry about wraparound and all that sort of stuff.
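Ignoring that wraparound issue, here's a minimal stride-scheduling sketch in C using the lecture's numbers; again, the struct and function names are just for illustration. Notice that C, which has the most tickets and therefore the smallest stride, gets picked most often.

```c
#include <stdio.h>

#define W 10000L   /* the big constant from the example */

struct job { const char *name; long tickets, stride, pass; };

/* Pick the job with the lowest pass, then advance its pass by its
   stride (W / tickets). Jobs with more tickets have smaller strides,
   so they get picked more often: proportional share with no
   randomness. (A real version must handle pass wraparound.) */
struct job *stride_pick(struct job *jobs, int n) {
    struct job *best = &jobs[0];
    for (int i = 1; i < n; i++)
        if (jobs[i].pass < best->pass) best = &jobs[i];
    best->pass += best->stride;
    return best;
}

int main(void) {
    /* A: 100 tickets, stride 100; B: 50, stride 200; C: 250, stride 40. */
    struct job jobs[] = {
        {"A", 100, W / 100, 0},
        {"B",  50, W /  50, 0},
        {"C", 250, W / 250, 0},
    };
    for (int i = 0; i < 8; i++)          /* eight scheduling decisions */
        printf("%s ", stride_pick(jobs, 3)->name);
    printf("\n");                        /* prints: A B C C C A C C */
    return 0;
}
```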
So the Linux Completely Fair Scheduler, CFS, is an example of this kind of fair queuing that's in common use today. As a simple first example, imagine N threads each simultaneously executing on 1/N of the CPU. If we had one CPU that we could somehow divide into N pieces and give evenly to each of the threads, then each thread would get exactly 1/N of the CPU. But we can't do that in real hardware, so the OS has to give out the CPU in time slices. What happens is we track how much CPU time each thread has received so far, and we try to preserve the illusion that we have a perfectly divided CPU. So in this instance, T1 got to run a little longer than its share, T3 ran for exactly its share, and T2 is now short. So we're gonna run T2 for a while until it catches up. If we keep making scheduling decisions that let the thread that's run the least go next, then we get the illusion of a completely fair share of the CPU. And this is very closely related to the stride scheduling I just mentioned.

Now, in addition to fairness, we want low response time, and so there's this idea of the target latency, which is a period of time over which every process gets to run once. If the target latency is 20 milliseconds and you've got four processes, then every process gets a five-millisecond time slice. The problem, of course, is that if you have a 20-millisecond target latency and 200 processes, you've got a tiny, tenth-of-a-millisecond time slice. So in fact, there's also a throughput goal in the form of a minimum granularity, a minimum time slice. For instance, if our target latency is 20 milliseconds, our minimum granularity is one millisecond, and we have 200 processes, then we give up on the target latency and fall back to one-millisecond time slices. By the way, CFS is my last topic for tonight; I just wanted to give you this.

The other thing that you've probably all learned about is the nice command. Operating systems back in the 60s and 70s gave you the ability to take a running task and give it a nice value, where a nice value of zero means you run like everybody else, higher nice values mean you run a little less, you're being nicer, and lower nice values mean you get more CPU. And I'll go over this again next time, but if you want to get proportional share out of CFS, there's a way of basically turning the nice value into a weight. We're running short enough on time that I don't want to go into that in detail now; I will next time. But what I want to show you before we leave is this idea of virtual time. Here, one task has a weight that's four times that of the other one. What that means is that every thread tracks the virtual time for how much it's run, and thread B, when it runs, doesn't accumulate as much virtual time per unit of physical time as A does. So if we just keep picking the thread with the lowest virtual time, we'll basically give B four times as much CPU as A, okay. And this is a real scheduler that's actually used in Linux, and you're probably using it right now.

So let's finish up. The way you choose the right scheduler: if you care about throughput, you might do first come, first served. If you care about average response time, you might use some SRTF approximation. If you care about IO throughput, again some SRTF approximation. For fairness in terms of CPU time, you might use the Linux CFS I just told you about; for fairness in terms of wait time to get the CPU, you might do round robin. If you're worried about meeting deadlines, you might do EDF, and if you're worried about favoring important tasks, like on the Mars rover, you might use priority, okay.

So how do the Linux real-time kernels affect the scheduler? What happens there is the real-time kernels basically give you the ability to schedule something in real time. They might give you EDF, or they might give you other policies where a deadline is an option, okay. And the real-time priorities I showed you earlier form a strict priority scheduler, which you can use to do real-time scheduling as well. All right. So when do the details of the policy matter? When there aren't enough resources. When should you buy a faster computer? When your response time's getting too high, okay. You might think you should buy a faster X when X is utilized at a hundred percent; perhaps we'll talk more about that next time.

So I'm gonna end now since we're way over time, but I hope you guys have a great rest of your night, and we will see you on Wednesday. We'll pick up where we left off, and I'll say a little bit more about the CFS scheduler since we were rushed a little on that. But I hope you have a great evening, and we'll get the graded exams back to you as soon as we can. Good night.