Whoops. All right, all right. Stream is live now. Sorry about that. So yeah, one approach, if we have multiple CPUs and we're just using the same scheduling algorithm, is to say, screw it: we'll just assign processes to CPU cores as long as there are still idle cores. That's it. That's our Band-Aid fix. In this case we'd have good CPU utilization, because we schedule onto any idle CPU, so no CPU sits idle as long as there are runnable processes. And we'd also be fair to all processes, as long as our scheduling algorithm is itself fair. The disadvantage is that it's not scalable: everything blocks on the global scheduler. What does that mean? It means all the CPU cores have to coordinate with one another. The scheduler can't make independent decisions per CPU core; it has to make sure that process one goes to only one core and process two goes to another core, so that we never have two cores trying to execute the exact same process at the exact same time. So a newly idle core has to wait for the scheduler to finish making a decision for another core before it can get a new process.

It also has poor cache locality. What does that mean? Cache locality just means the caches are still valid and still around. If a process is scheduled on a CPU core, gets context switched out, and then gets context switched back onto that same core, some of its cached data might still be valid, in which case it can reuse it and run nice and fast. Poor cache locality means that same process might get scheduled onto a different CPU core, where none of its caches are valid; it has to recache everything, and it runs slowly. So it just loses some performance. This was the approach back in Linux 2.4, released in early 2001. But back in that day, most machines only had a single CPU core, so people didn't concern themselves with this too much, and having multiple CPUs was something only billionaires or large universities had. So they kind of just ignored it.

A better thing we could do with multiple CPUs is create an independent scheduler on each CPU core. The way it works is that each core, if we're doing round robin, does its own round-robin scheduling. Then, whenever a new process starts, the kernel just has to make one global decision: which CPU am I going to assign that process to? The advantages: it's really easy to implement (if I have one queue, I can just copy-paste the code and have one round robin per CPU); it's scalable, because one CPU's scheduling doesn't affect another's, so they don't really have to coordinate; and there's good cache locality, since the same process stays on the same CPU core over and over again, so it won't keep invalidating its caches. The disadvantage is that there might be a big load imbalance. If you have four CPU cores and eight processes launching, this would probably assign two processes to each core, which seems like a perfectly valid decision, whatever order the user launches the processes in. There's a minimal sketch of this per-CPU setup below.
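Here's a small sketch of the per-CPU idea, with made-up names like runq and assign_new_process (this is illustrative, not how any real kernel is written): one global decision assigns a new process to a core, and after that each core round-robins over its own queue without coordinating with anyone.

```c
#include <stdio.h>

#define NCPUS    4
#define MAXPROCS 16

struct queue {
    int pids[MAXPROCS];
    int head, tail, len;
};

static struct queue runq[NCPUS];  /* one independent run queue per core */
static int next_cpu = 0;          /* state for the single global decision */

/* New process arrives: the one global decision picks a core. After this,
   that core's round-robin scheduler never coordinates with the others. */
static void assign_new_process(int pid) {
    struct queue *q = &runq[next_cpu];
    q->pids[q->tail] = pid;
    q->tail = (q->tail + 1) % MAXPROCS;
    q->len++;
    next_cpu = (next_cpu + 1) % NCPUS;  /* naive rotation: can imbalance */
}

/* Each core independently round-robins over its own queue. */
static int pick_next(int cpu) {
    struct queue *q = &runq[cpu];
    if (q->len == 0)
        return -1;                      /* nothing queued: core goes idle */
    int pid = q->pids[q->head];
    q->head = (q->head + 1) % MAXPROCS;
    q->pids[q->tail] = pid;             /* rotate the process to the back */
    q->tail = (q->tail + 1) % MAXPROCS;
    return pid;
}

int main(void) {
    for (int pid = 1; pid <= 8; pid++)
        assign_new_process(pid);        /* 8 processes over 4 cores */
    for (int cpu = 0; cpu < NCPUS; cpu++)
        printf("cpu %d runs pid %d next\n", cpu, pick_next(cpu));
    return 0;
}
```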
But in this case, we might get unlucky and schedule two really short processes on core one and two really long processes on core two. Then core one is going to finish its processes and go idle, while core two keeps ping-ponging back and forth between its two long processes. Oh, it gave two thumbs up, so it did something. Yikes. All right, cool. What was I saying? Yeah, so once the two short-running processes are done on that CPU core, that core goes idle, and one core is just sitting there doing nothing.

The what? Yeah, so the question is: is there any way for the scheduler to know anything about a process so it can make a better decision? And the answer is that, generally, processes are super unpredictable. You could guess; a lot of this is guessing, and we'll see that in a little bit. But generally, when you make guesses and you're wrong, really bad things happen. There's always a trade-off. Like everything with scheduling, the answer is: it depends.

We got another comment: hey, can we at least load balance? And that's exactly our fix. If we have per-CPU scheduling, we might still want some global decisions for load balancing. You could have a global scheduler rebalance the per-CPU cores: check whether a CPU is idle, and if so, take one process from a busy CPU and move it over. That's called work stealing: you're just moving a process between CPU cores. The problem is that the process you move between cores might be really sensitive to caches, so you might make it perform really poorly if you pick the wrong process to move. There's actually a way to tell the kernel that a process really wants to stay on a single CPU core and shouldn't be moved, and that's called processor affinity: the preference of a process to be scheduled on the same core. You can even do this in Windows: if you have a process you really care about, like a game, you can set its processor affinity so it stays on the same CPU core, its caches don't get invalidated, and you'll probably get a bit better performance. This work-stealing scheme is basically a simplified version of what the Linux scheduler actually looked like in 2.6, so a little more modern. On Linux you can set affinity programmatically too; there's a sketch of that below.
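On Linux, a process can pin itself to a set of cores with the sched_setaffinity system call (the call itself is real; this is just a minimal sketch that pins to core 0 and prints any error):

```c
#define _GNU_SOURCE   /* needed for cpu_set_t macros and sched_setaffinity */
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);  /* allow core 0 only: maximum cache locality */

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to CPU 0; the kernel won't migrate this process away\n");
    return 0;
}
```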
As an aside, for high-performance computing there's another strategy called gang scheduling. I don't know why we have all these horrible names, but this is one of them. There might be a situation where you want to schedule a group of processes all together at the same time, and the scheduler can't be completely independent in that case, because you essentially want to context switch all of those processes at the exact same time. That's called gang scheduling, or co-scheduling. It lets you schedule multiple processes as one big unit, and it requires a big global context switch across all CPUs. Yeah, so this would be like four processes that need to get scheduled together on four CPU cores at the same time: if one stops, they all stop; if they get scheduled, they all get scheduled at the same time. In high-performance computing, a lot of processes have dependencies on each other and can't run unless every other process is running, so they all have to be scheduled together. Most scheduling doesn't have to deal with this, but once you get into high-performance computing, this is another consideration that makes your life even more difficult.

Another big problem is real-time scheduling. Real-time scheduling means there's a time constraint, either a deadline or a specified rate. This is especially important for audio. For audio, as you've hopefully learned in your signals classes, there are sampling rates, and you have to keep up with a certain rate; otherwise your audio gets all garbled and sounds like complete garbage. So audio is an example of real time, because you have to sustain a certain rate and can only tolerate so much delay. Otherwise you will hear it, you will notice it, you'll say "hey, my audio is not synchronized," and you'll throw your computer in the garbage. Another example is something like autopilot, which has hard time constraints: if your vehicle is heading towards a small child, it should react with the brake within a few milliseconds or so in order not to hit the child (a real child, not the child processes we've been talking about so far). You don't want it to react a second after it plows through someone.

Some systems are called hard real time, which means they are absolutely guaranteed to complete a task within a certain amount of time. Generally these systems are very, very simple, and because they are simple, they are predictable. This would be something like an embedded system, where you can compile your program, read the assembly, count how many clock cycles it takes in the worst case (taking every if branch), and then say: I can guarantee this task completes within a thousand clock cycles, which at a known clock speed is some fixed number of nanoseconds or microseconds, and you can actually guarantee it. With complicated systems, making that guarantee is nigh impossible. So there's something called soft real time, which means you do not provide an actual guarantee, because getting one is impossible; all you do is say "I will try my best to run this," and in practice the deadlines are always met. For example, you cannot make hard timing guarantees about the Linux kernel, because it is insanely complicated and depends on what's going on with your system. If you're running a lot of processes all at the same time, you might not be able to make any timing guarantees, while if there are only a few processes running, maybe you can. To make real-time guarantees you'd also have to understand exactly how the scheduler works, make guarantees about that, and so on; it just gets really complicated. So Linux doesn't give you hard real time; it gives you soft real time.
The scheduling algorithms we talked about earlier, first come first serve and round robin, are actual scheduling algorithms used by the Linux kernel. The Linux kernel is open source, so you can search the source tree: the scheduling policies are named SCHED_ followed by something. SCHED_FIFO is just a FIFO queue, i.e. first come first serve, and round robin is just called SCHED_RR. So you can see how they're actually implemented in the Linux kernel if you really want to. The kernel uses multi-level queue scheduling for processes with the same priority, and it dynamically adjusts priorities; again, it has to deal with priority inversion and all that fun stuff. If there are any soft real-time processes, it will always schedule the highest-priority one first, and it will keep scheduling it until it goes to sleep or is done. Normal processes on Linux are where the normal scheduling algorithm comes into play, and the general idea is that the kernel adjusts the priority based on how old the process is.

So, in the Linux kernel, real-time processes, like I said, are always prioritized over normal ones. Soft real-time processes are scheduled either first come first serve or round robin, and those are essentially the only options, because for real time you want things nice, simple, and predictable, and first come first serve and round robin are nothing if not simple and predictable. They're given 100 priority levels, where 0 is low priority and 99 is high priority. Normal scheduling policies apply to anything scheduled SCHED_NORMAL, and the default priority for these processes is 0. This number is called niceness. Niceness ranges from -20 to 19, where a lower number means higher priority: a lower number means the process is less nice, meaning greedier, meaning higher priority. A niceness of 19 is really nice; it means you'll let everyone else go ahead of you.

Processes can change their priorities with system calls. There's a system call called nice that lets you change your niceness, so you can be less nice. And there's sched_setscheduler, which lets you choose, if you're real time, between FIFO and round robin. Somewhat confusingly, Linux unifies real-time and normal processes on a single Linux priority scale. The idea was to simplify things: instead of putting soft real time in one category and normal processes in another, there's just one flat priority line. The unified Linux priority goes from -100 to 39, where lower numbers mean higher priority (to the left on this figure) and higher numbers mean lower priority (to the right). For normal processes, niceness by itself is -20 to 19, where -20 is on the high-priority side and 19 is on the low-priority side. On the unified Linux priority scale, they essentially just shift it by 20: if your niceness is 19, your Linux priority is 39; if your niceness is 0, your Linux priority is 20; and if it's -20, your Linux priority is 0. Why pick that? Because on this unified scale you can tell you're a normal process if your priority is 0 or above. Below is a small sketch of using these system calls.
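A minimal sketch of those calls (nice, getpriority, and sched_setscheduler are real system calls; the specific values here, like the +5 niceness bump and rt priority 10, are arbitrary examples, and the niceness + 20 arithmetic is just the lecture's unified-scale mapping):

```c
#include <sched.h>
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

int main(void) {
    /* Become nicer, i.e. lower priority. Going the other way
       (a negative increment) requires privileges. */
    nice(5);

    /* Read our niceness back and compute the unified Linux priority
       that htop would show for a normal process: niceness + 20. */
    int ni = getpriority(PRIO_PROCESS, 0);  /* 0 = the calling process */
    printf("niceness %d -> unified priority %d\n", ni, ni + 20);

    /* Ask to become soft real time: round robin at rt priority 10.
       This needs privileges, so expect it to fail as a normal user. */
    struct sched_param sp = { .sched_priority = 10 };
    if (sched_setscheduler(0, SCHED_RR, &sp) != 0)
        perror("sched_setscheduler");
    else
        printf("now SCHED_RR at rt priority 10\n");
    return 0;
}
```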
So, again: soft real time means you're scheduled either first come first serve or round robin, and you always run before any normal processes. Yeah, it's a set of higher-priority processes that use a simpler scheduler because they need to be more predictable. And we can see on our system what they are, now that we know how to read the numbers. The idea is that if the Linux priority is negative on your system, it's a soft real-time process, and this is how the priorities map: before, for soft real time, a higher number meant higher priority, so on the unified scale it's essentially flipped around.

So now we can explore our system and figure out what the numbers mean. If we run htop, we can now figure out what two more columns mean. The PRI column is that unified Linux priority: if the number is 0 or above, it's a regular process, and if it's negative, it's a soft real-time process. The next column, NI, stands for niceness. This number only applies to normal processes; if the Linux priority is negative, the niceness will just be 0. Otherwise the priority is essentially the niceness shifted by 20, so a niceness of 0 corresponds to a unified Linux priority of 20, which is just the default priority. We can see that init has the default priority; most of the processes running on this machine do. If I look through some more, I can find some fun ones. There's a process right here with a priority of 9, so it's still a normal process, and its niceness is -11, which means it's less nice, which means it's greedier, which means it has a higher priority. These are called wireplumber, pipewire, and pulseaudio, and, spoiler alert, they're processes that deal with audio. They're higher priority because they have a deadline, but they don't quite go to the extreme of asking to be scheduled as real time. Going through some more, there's this rtkit daemon that is slightly nice but not that nice. There's a low-memory monitor with a priority of -2, so it's a soft real-time process, and you can probably guess from its name: it checks if your system is low on memory and will give you a warning or something like that, or start killing some processes for fun. And if the priority column says RT, that means -100, the most real time you can get: you'll see processes called migration/0, migration/1, migration/2, migration/3. I believe those are the kernel threads that handle migrating processes between CPU cores; they're the highest priority of anything, so when one needs to run, it does whatever it needs to do, and then another process can run. There's one of those per CPU core. Nothing else here looks terribly exciting, so you can go explore, but now we know what those two other columns mean.

So here is the Linux scheduler evolution. We saw the big global scheduler, and then we saw that they divided it out per CPU core. The actual scheduling algorithm used now is something called the completely fair scheduler, and as the name might imply, it is fair, which allows for good response time, which allows for good interactivity. Also, a question I probably missed before: does the scheduler manage loading from physical storage into physical memory at all, and try to minimize page faults?
Yeah, so dealing with page faults and all that stuff, we'll get into later in the course, but that's another thing the kernel has to be concerned about.

All right, so the reason we didn't stick with the earlier scheduler is that it has some issues. Dividing processes into foreground and background is a reasonable split; it's easy with a terminal, less so with GUI processes. You might think that if you're interacting with a window, it should get better response time, but that's not generally true, depending on how you use your computer. Especially if you have multiple monitors, you might have windows up that you're not directly interacting with but that are playing a video or something like that, which you'd actually want good response time for, and you might have a window you are interacting with that's just showing a static webpage that doesn't need to update so often. So if the kernel tries to divide foreground and background tasks on a modern system, it has a lot of trouble, because it basically has to guess which processes are interactive using heuristics. If you haven't heard the word heuristic, it just means an educated guess: I'm guessing, based on some data, that this is a foreground task. You might guess that processes that sleep a lot, waiting on keyboard input or something like that, are interactive, but this is generally ad hoc, which means just made up, and it might be unfair because you might guess completely wrong; and if you make a bad guess, a user will complain, because suddenly whatever process they're trying to use is very laggy, and they don't like that. One way to introduce fairness for different-priority processes is to use different-sized time slices depending on priority: higher priority gets a larger time slice. But there are situations where this is unfair as well, because again, you're not guaranteed your guesses are always right, and the penalty for being wrong is generally pretty severe.

So let's talk about ideal fair scheduling in an ideal world. Say we have an infinitely small time slice and context switching is instant. Then if we had n processes, the fairest thing to do would be to run each of them at 1/n of the CPU's rate. If there's one process, it gets the full, undivided attention of a CPU core (we'll just consider one CPU core here). If there are three processes, that core switches between them really, really fast, giving you the illusion of three processes each running at a third of the speed of the CPU. It's just equally dividing itself among every process currently running. For example, take four processes, where P1 wants to execute for 8 time units, P2 for 4, P3 for 16, and P4 for 4. We'll assume we can switch infinitely fast, and we'll draw the schedule in blocks of four, so each vertical line here represents 4 time units. If they all arrive at the same instant and we're being completely fair, in the first block we have 4 time units of CPU and four processes, so each process gets a fourth of that, which is one time unit.
So we execute each process for one time unit within the first four time units if we're being completely fair; in the next block of four we do the same, then the next, then the next. At time 16, each process has executed for 4 time units, so P2 and P4 terminate. Now only two processes are left, so being fair, they each get half the CPU time: two time units each in the next block of four, and two time units again in the block after that. At time 24, P1 is done, and all four time units of each block can go to P3, so it executes for 4 time units, bringing its total execution time to 12, and then to 16, and all the processes are done at time 32. We're just being fair, scaling each process's share of the CPU by how many processes are currently running. You can't get more fair than that, but it's super impractical. Oh, I never turned that back on. Yikes. Okay. So: really fair, really impractical. It would be very, very good for response time and interactivity; the response time would essentially be the ideal, zero. But we can't context switch in zero time: there would be way too many context switches, wasting a lot of time, and the scheduler would have to constantly scan all processes, which is O(n), which is really, really slow. Context switching is overhead, because it's not doing what we want, which is actually running processes. And guess what: running the scheduler is also overhead. Even if you had the most perfect scheduler that could make the best decisions, if it ran for too long it would be completely useless. Your scheduler also has to be very, very quick, which is another reason this ideal is impractical.

So the completely fair scheduler assigns each runnable process something called a virtual runtime, where "virtual" just means "not real." At each scheduling point, some process runs for some amount of time t, which the scheduler picks, and that real running time increases the process's virtual runtime by t times a weight, a fudge factor based on the priority. That's why a lower number means a higher priority: it scales this number differently. If the weight were 0.5, that means "I actually ran for two seconds, but based on the weight it only counts as one second of virtual runtime." A lower number causes your virtual runtime to increase slower, which essentially gives you more CPU time. This virtual runtime increases monotonically. What does that fancy math word mean? It means it only ever increases; it never decreases. The scheduler then simply selects the next process to run as whichever process currently has the lowest virtual runtime: whatever has run the least of its fair share. Then it computes the size of t based on how many processes are running, and the idea is to scale it so we get close to ideal fair scheduling. If one process hasn't had much CPU time while other processes have, it gets a bigger t and runs for longer in order to catch up. So the time slice is picked by the kernel to try to make things fair. There's a tiny simulation of that selection rule below.
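A tiny, self-contained simulation of that rule (this is a sketch, not kernel code: the fixed one-unit slice, the weight values, and the task set are made-up assumptions; the real kernel derives weights from niceness and picks slices dynamically). The task with the lowest virtual runtime always runs next, and a smaller weight makes virtual runtime grow slower, so that task ends up with more CPU:

```c
#include <stdio.h>

struct task {
    const char *name;
    double vruntime;   /* virtual runtime, in made-up units        */
    double weight;     /* smaller weight -> slower vruntime growth
                          -> effectively higher priority           */
    double remaining;  /* real work the task still has to do       */
};

int main(void) {
    struct task tasks[] = {
        { "P1", 0, 1.0, 8 },
        { "P2", 0, 1.0, 4 },
        { "P3", 0, 0.5, 16 },  /* higher priority via smaller weight */
        { "P4", 0, 1.0, 4 },
    };
    int n = 4, left = n;

    while (left > 0) {
        /* Pick the runnable task with the lowest virtual runtime.
           The kernel uses a red-black tree for this; a linear scan
           is fine for a four-task sketch. */
        struct task *t = NULL;
        for (int i = 0; i < n; i++)
            if (tasks[i].remaining > 0 &&
                (t == NULL || tasks[i].vruntime < t->vruntime))
                t = &tasks[i];

        double slice = 1.0;  /* fixed slice, purely for simplicity */
        if (slice > t->remaining)
            slice = t->remaining;
        t->remaining -= slice;
        t->vruntime += slice * t->weight;  /* weighted accounting */

        printf("ran %s for %.1f, vruntime now %4.1f\n",
               t->name, slice, t->vruntime);
        if (t->remaining == 0)
            left--;
    }
    return 0;
}
```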
If a process is new, it just gets some hard-coded starting value for t, so there's a starting point. The process runs; when its time slice ends, we increase its virtual runtime, pick the next process with the lowest virtual runtime, and keep doing this over and over again. How is it implemented? Red-black trees. That thing you learned in your algorithms class: this is probably the first time you've actually heard of it being used for something practical. Red-black trees are self-balancing binary search trees, and in this case the key is the virtual runtime. Why use that? It makes inserting, deleting, and updating pretty fast, O(log n), but the part the scheduler really cares about, picking the minimum virtual runtime, can be done in constant time, which is really, really fast, which is why it uses that data structure. The tree keeps track of virtual runtime at nanosecond granularity, and the nice thing is that the scheduler doesn't need to guess at the interactivity of a process. Yep, sorry? So, for the red-black tree: it's self-balancing, and the minimum is always the leftmost node, which the kernel keeps a cached pointer to. It's still a binary search tree, but it has properties that make finding the minimum constant time. You can look up red-black trees on Wikipedia: finding the minimum is constant, but inserting is where you have to do the rebalancing, and that's where it gets kind of ugly.

The nice thing about the completely fair scheduler is what happens with an I/O-bound process, meaning one that constantly gets blocked over and over again. It gets its time slice and runs, but it probably blocks on something before its time slice is done, so its virtual runtime only increases by however long it actually ran. While it's blocked, the other processes run and increase their virtual runtimes. So as soon as it gets unblocked, it's very likely to have the lowest virtual runtime and get scheduled immediately, and you get good interactivity with that: as soon as it receives some input, it's likely to run right away, and you get good response time. It would also get a larger time slice if it was blocked for a long time, in order to catch up to that ideal fair schedule.

So, as we saw, scheduling keeps getting more complicated: more solutions, more issues. We added priority and got priority inversion for our troubles. We saw that some processes need good interactivity, others not so much, and deciding which is which is hard. Then we saw that once we have multiple cores, we might have to do something about that and schedule per CPU core. We also have real time, which requires predictability and a whole different set of constraints. And then we saw the completely fair scheduler, which is what the Linux kernel actually uses to this day, and the idea behind it is to model ideal fairness.

And also, remember: Friday, which is tomorrow, there's no lecture. I'm not here; I'm in Silicon Valley. I'm gone. So you'll be behind the other section, but this section gets Monday off for Thanksgiving, so we'll be tied with the other one eventually; we'll be unsynced for a little bit and then catch back up. So no lecture tomorrow. And we also have a midterm room, finally.
It will be November 15th; it's scheduled. We have the room from six till eight, so unless there's any good reason not to, I'll probably start it at like 6:15 or 6:30 and end at like 7:45 or something like that. If you have any other suggestions for the midterm, just let me know on Discord, or email me, or whatever. So just remember. We're in this together.