All right, welcome back to operating systems. Today we're talking about advanced scheduling. We alluded last time that we have different processes that we might care about differently when we schedule them, so what can we do about it? Well, one thing we could do is be explicit about it and add priorities to our scheduling algorithm. Since we may favor some processes, we'll just assign each one a number. Who gets to decide the number? Well, we do, or we can let the processes decide for themselves. So we assign each process a priority, and the idea behind it is quite simple: if a process has a higher priority, we should run it first, and we might round-robin processes of equal priority. This can be preemptive or non-preemptive; there are lots of choices we could make. We could even round-robin between different priorities if they're close enough; we could do all sorts of things.

If you're dealing with priorities, you need to assign a priority, and the easiest thing to do is just assign a number and give that number some meaning. You can make a lower number mean higher priority or lower priority; it just depends on your taste. One of Linux's tastes is to make a lower number mean higher priority, so in Linux, -20 means the highest priority and 19 the lowest priority for normal processes.

But if you just implement a plain priority system, we're back to our old friend, starvation. You no longer have any guarantee that starvation won't happen, because there could be a lot of high-priority processes, and if there are enough of them, you might never actually run a low-priority process, and that way you starve it. One solution is to apply a band-aid immediately and say: if a low-priority process hasn't run in a very long time, I'll temporarily make it a higher priority until it runs, and then
after it runs, I'll bump it back down to its original priority. Sure, you could do something like that, but now we have a new, fun issue. If you have priorities, a problem that might come up is something called priority inversion: you accidentally end up with a high-priority process stuck behind a low-priority one, so their priorities are effectively inverted. We saw before that processes can depend on one another. For example, our child process was doing a read and waiting for the parent to do a write, and it couldn't progress until the parent wrote. In a situation like that, if the writer is a low-priority process and the reader is a high-priority process, you've inverted the priorities: to free that bottleneck, you should at least temporarily make the low-priority process high priority, because a high-priority process depends on it.

One solution to this is something called priority inheritance. If the kernel figures out that dependency, it does exactly that: it temporarily makes the low-priority process match the high-priority one, and after the dependency clears, it reverts it back to its original low priority. Of course, you may get chains, like process A depends on B, which depends on C, which depends on D. In that case the kernel still has to detect it and apply priority inheritance to every process in the chain, and as soon as the chain is freed up, it reverts them all to their normal priorities. So that's one thing you can do if you have priorities.

We also talked before about there being two types of processes with different needs. A foreground process can receive user input, and you probably want good response time for it, because someone is using a keyboard and mouse and paying attention to it. Background processes probably don't need good response time.
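The priority-inheritance chain described a moment ago can be sketched in a few lines. This is a toy model, not how any real kernel structures it: the `Process` class, the field names, and the convention that a *higher* number means higher priority are all made up for illustration.

```python
# Toy sketch of priority inheritance. Here a HIGHER number means HIGHER
# priority (the opposite of Linux niceness) just to keep the max() simple.
# All names and structures are invented for illustration.

class Process:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority       # what the process was assigned
        self.effective_priority = priority  # what the scheduler actually uses
        self.waiting_on = None              # process we depend on, if any

def inherit_priority(waiter):
    """Walk the dependency chain (A waits on B waits on C ...) and boost
    every process in it to at least the waiter's effective priority."""
    current = waiter.waiting_on
    while current is not None:
        current.effective_priority = max(current.effective_priority,
                                         waiter.effective_priority)
        current = current.waiting_on

def clear_dependency(process):
    """Once the dependency resolves, revert to the original priority."""
    process.effective_priority = process.base_priority

# High-priority A waits on B, which waits on low-priority C.
a = Process("A", priority=90)
b = Process("B", priority=50)
c = Process("C", priority=10)
a.waiting_on = b
b.waiting_on = c

inherit_priority(a)
print(b.effective_priority, c.effective_priority)  # both boosted to 90

clear_dependency(c)
print(c.effective_priority)  # back to 10
```

The one real decision here is the `max()`: a process in the chain is boosted only if the waiter outranks it, and the boost is temporary, which matches the revert-after-the-dependency-clears behavior described above.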
They're just running; for those, you're probably concerned about throughput. You want each one to finish as quickly as possible so you can get on to other things. Back in the day, when terminals were real, physical devices, an integrated keyboard and monitor, Unix was designed so it could tell whether a process was connected to a terminal, via its process group ID and its terminal ID. We don't actually have to know that for this course, but basically, back when things were much simpler, you could tell whether a process was a foreground task because it had a terminal directly connected to it. If that was the case, you could separate your processes: foreground processes need good response time, so use round robin; background processes just need throughput, so do something simple like first come, first served.

If we had something like that, we now have the problem of scheduling between the foreground and background processes, so you have to pick a scheduling algorithm to choose between the two groups. You could round-robin between foreground and background, or favor foreground a bit more. Again, it's scheduling, so there is no single correct answer. But that means you're round-robinning around round-robinning, which is lots of fun. You could also make this more complicated: you could assign foreground processes a higher priority than background ones. You can do whatever you want; it's scheduling, there is no one complete answer, just a series of trade-offs.

And we haven't even talked about having multiple cores yet; so far we've only talked about a single core. We're going to assume an architecture called symmetric multiprocessing (SMP). What that means is that all CPU cores are connected to the same memory, with no performance difference between them, no difference in addressing, or anything like that.
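Going back to the foreground/background split for a second, that two-queue idea can be sketched as below. The pick rule (always favor foreground) is just one arbitrary choice, as the lecture says; the queue contents are made up.

```python
from collections import deque

# Sketch of a two-level queue: foreground gets round robin (good response
# time), background gets first come, first served (just throughput).
# Always favoring foreground is one arbitrary policy choice; you could
# also round-robin between the two queues.

foreground = deque()   # round-robin queue
background = deque()   # FCFS queue

def pick_next():
    if foreground:
        p = foreground.popleft()
        foreground.append(p)   # round robin: goes to the back for next time
        return p
    if background:
        return background[0]   # FCFS: stays at the front until it finishes
    return None

foreground.extend(["editor", "browser"])
background.extend(["backup", "indexer"])

print([pick_next() for _ in range(4)])  # alternates the two foreground tasks
```

Note the trade-off baked into this policy: as long as any foreground task exists, the background queue starves, which is exactly the kind of problem the priority discussion above warned about.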
They're all connected to the same memory and are otherwise identical. What is independent on each CPU core is its lowest level of cache, like its L1, while multiple cores might share an L2 or L3 cache.

So if we have multiple CPUs, we can just band-aid our existing solution, right? Use the same scheduling algorithm, but have it schedule across all CPUs. It's the same idea, it just makes more decisions. While there is still an available CPU, it makes the same scheduling decision and puts the process it picks on that CPU, until all CPUs are full, and then it just keeps doing that. The advantages: it definitely keeps all the CPUs busy, because whenever one becomes idle, the scheduler runs and puts a process on it. It's also fair to all processes, as long as the underlying scheduling algorithm is fair.

The disadvantage is that it's not scalable. There is one scheduler and many CPU cores, and it has to coordinate and make a consistent decision for every core. It has to do it all; it can't make multiple scheduling decisions at once. It has to run the algorithm serially: process one goes on this core, then process two goes on that core, then process three on the next, because you wouldn't want a situation where you're trying to run the same process on two cores at the same time.
It's just not going to work. That's also a prelude to when we get into threads and actually have to deal with issues like that.

The last disadvantage is poor cache locality. What does that mean? Cache locality just means things stick around in the cache. If a process gets scheduled on the same CPU core it ran on before, its cache entries might still be on that CPU. So if you get context switched out and then context switched back onto the same core, there might still be some valid cache entries for your process, and you'd hopefully get better performance. If instead your process runs on core one, gets context switched out, and then gets context switched back in on core two, none of the cached entries are going to be valid for you, so you'll probably be a bit slower, and that's something you generally want to avoid.

But this scheduling approach is what was used back in ye olden days, in Linux 2.4. You can figure out when that was released: around 2001, back when almost no one had multiple cores. It just wasn't a thing, so they didn't concern themselves with it too much. When multiple cores started becoming more common, someone had the bright idea of saying: hey, we could create schedulers that are independent on each CPU core.
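Before we leave it, that single global run queue feeding every CPU can be sketched like this. The process names and structures are invented; the point is that there is exactly one scheduler making decisions one at a time.

```python
from collections import deque

# Toy sketch of one global run queue feeding every CPU (the Linux 2.4-era
# approach described above). One scheduler hands out processes serially,
# so the same process can never land on two cores at once, but every
# core's context switch funnels through this single decision point.

run_queue = deque(["p1", "p2", "p3", "p4", "p5"])
cpus = {0: None, 1: None, 2: None, 3: None}  # core id -> running process

def fill_idle_cpus():
    """The single global scheduler: while a core is idle and work remains,
    make one assignment at a time."""
    for core, running in cpus.items():
        if running is None and run_queue:
            cpus[core] = run_queue.popleft()

fill_idle_cpus()
print(cpus)       # all four cores busy: p1..p4
print(run_queue)  # p5 still waiting

cpus[2] = None    # core 2's process finishes
fill_idle_cpus()
print(cpus[2])    # p5 takes its place
```

Keeping every core busy falls out naturally, which is the advantage; the serial loop is the bottleneck, which is the scalability disadvantage.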
So then we've reduced our global decisions. We only have one big global decision to make, and that's when a new process gets created: which CPU core do we assign it to? You assign it to the CPU core with the lowest number of processes currently on it, and that process stays on that core and never moves again.

Some advantages of this: it's easy to implement. If I have one round-robin queue, I can just make a round-robin queue per CPU without really changing anything, and picking which CPU to put a new process on isn't that complicated. It's also scalable: we're not blocking on a global decision. Our only global decision happens at process creation; we're not coordinating whenever a CPU needs to context switch a process in and out. Each CPU core can be otherwise independent, and it can be really fast. We also get good cache locality, because processes never move between CPUs, so if a process's cache entries are still valid, they'll be valid whenever it gets rerun there.

The disadvantage is that there might be some load imbalance. Since we only make the decision at creation time, imagine we have four cores and eight processes, so we assign two processes to each core. We might get super unlucky: core one gets two really short processes and core two gets two really long ones. In that case core one finishes, becomes idle, and we can't do anything about it; core one just sits idle while core two ping-pongs back and forth between its two processes. We can do something better.

A compromise: that seems like a ridiculous situation, and the solution should be fairly simple. I can have a global scheduler that rebalances the CPU cores. If it notices that a CPU is idle, it can just take a process from one CPU and move it to another, and that is called work stealing. Whenever this happens, that process that's switching CPU
cores is going to have all its caches invalidated. If it's super performance sensitive, you probably don't want it to switch cores, so you might want some control over that, and that's what the term processor affinity means: the preference of a process to stay scheduled on the same core. This is configurable, and you can even do it in Windows: if you're playing a game and want it to perform a little better, you can go into Task Manager, right-click the process, and set its processor affinity so it sticks to the same cores and won't switch, and you may get a bit better performance. You can do this for any process on your machine whose performance you care about.

This work-stealing version is basically the O(1) scheduler from Linux 2.6. Fairly old, but that's basically how the scheduler worked in Linux for a while.

Another strategy, which is also related to high-performance computing, is something called gang scheduling. Yes, I don't know why we have all these terrible terms. What it means is that multiple processes are scheduled as a group, together, and the scheduler on each CPU can't be completely independent anymore, because I need to schedule and context switch all of these processes at the same time. It's also called co-scheduling, and the idea behind it is: if a bunch of processes are really dependent on each other,
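Backing up to work stealing and affinity for a moment, here is a toy sketch of per-CPU queues where an idle core steals from the busiest one, except for processes pinned by an affinity flag. The structures and names are invented for illustration.

```python
from collections import deque

# Sketch of per-CPU run queues with work stealing. New processes go to the
# least-loaded core; an idle core may steal one unpinned process from the
# busiest other core. A process pinned with an affinity flag is never
# stolen, so it keeps its warm cache. All structures here are made up.

queues = {0: deque(), 1: deque()}

def assign(process, pinned=False):
    core = min(queues, key=lambda c: len(queues[c]))  # least-loaded core
    queues[core].append((process, pinned))

def steal(idle_core):
    """If idle_core has no work, migrate one unpinned process from the
    most loaded other core."""
    if queues[idle_core]:
        return
    victim = max((c for c in queues if c != idle_core),
                 key=lambda c: len(queues[c]))
    for task in list(queues[victim]):
        process, pinned = task
        if not pinned:
            queues[victim].remove(task)
            queues[idle_core].append(task)
            return

assign("long_job")                  # goes to core 0 (both empty, ties to 0)
assign("pinned_job", pinned=True)   # goes to core 1
assign("other_job")                 # back to core 0 (tie again)

queues[1].clear()   # core 1 finishes all its work and sits idle
steal(1)
print(queues[1])    # long_job migrated over from core 0
```

The cost the lecture mentions is implicit here: the migrated `long_job` arrives at core 1 with a cold cache, which is exactly why the pinned flag exists.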
there's no use running just one of them; it's just going to block immediately. So I'll schedule them all to run at the same time, and that way, if there are any dependencies between them, they're all running and can resolve them. This is more of a thing in high-performance computing, and it's just another wrinkle in scheduling that you might have to consider depending on your application.

Another problem is something called real-time scheduling. Real-time scheduling means there are time constraints, either a deadline or some kind of rate. This is especially important in audio. It's been a while since I took a signals class, but audio runs at a certain sample rate, and there's a latency to processing it. You want that latency to be as small as possible so your brain doesn't notice any delay or the audio getting out of sync, and the processing has to keep up with the sampling rate. Otherwise you'll get weird crackling or other strange sounds.
It'll just sound like garbage, or, if it's too slow, it'll be completely unsynchronized. An extreme version of this is something like an autopilot, or a missile. If you're driving, you should probably react within a few milliseconds: if there's a small child in the road, a real child, not a child process, you want to react quickly enough not to hit and kill them. Not our kill(2) kind of kill; the real, bad kind. If your autopilot takes a second to make a decision, that's probably not a good thing.

There are two types of real-time. One is hard real-time, where you, as a developer, need to guarantee that something happens within a certain time limit. These systems are typically embedded systems that are dead simple. Why are they dead simple? Because they have to be predictable. If you're designing a system like this, then when you compile your code, if you're not programming directly in assembly, you're going to have to read the assembly, count how many clock cycles each instruction takes, figure out the maximum number of instructions needed to accomplish whatever you're trying to time, and then guarantee that it happens within a certain number of cycles, which corresponds to a certain number of seconds, or microseconds, or nanoseconds. The system has to be dead simple because you have to count it all.

The other kind is a soft real-time system, where you basically throw up your hands and say it's too complicated to actually figure this out, but in practice I always meet the deadline, because my computer is just fast enough. That's like your desktop system: it runs at gigahertz scales and does lots of things very, very quickly.
So audio would be soft real-time: in practice it always hits the deadline because your computer is such overkill. But if you wanted to analyze the Linux kernel, there's no way you could make any hard guarantees whatsoever, because, one, the actual scheduler is very complicated, and two, you don't know what's going on in the operating system at any given time. How long something takes to react depends on how many processes are running, on whether your CPU is thermally throttling, on thousands and thousands of things, and you could never make any hard guarantee. But you can just test it and say: hey, it works, my users don't complain at me, it's good enough. So Linux can only do soft real-time, just because it's insanely complicated.

On the non-complicated side, for soft real-time Linux uses very simple scheduling algorithms, and they're the real scheduling algorithms we saw before. If you look at the Linux kernel source tree, you'll find first come, first served: it's called SCHED_FIFO (anything to do with the scheduler in Linux is just prefixed "sched"). You'll also find round robin: it's called SCHED_RR. These are actually used in the Linux kernel for soft real-time, because you want predictability; you want it to be really, really simple. What it does is keep a multi-level queue of processes with the same priority, and the kernel will dynamically adjust the priority if it finds that a real-time process hasn't run in a long time. Among the soft real-time processes, it always schedules the highest-priority one first, before running any normal processes. Normal processes are what 99% of the processes on Linux are going to be, and they default to the normal Linux scheduler, which, as we saw previously, could be that per-core one. So like I said, real-time processes are always prioritized.
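A toy version of those real-time queues might look like the following: static priority levels, always drain the highest non-empty level first, and within one level either keep the front task (the SCHED_FIFO idea) or rotate (the SCHED_RR idea). This ignores everything else the kernel actually does.

```python
from collections import deque

# Simplified sketch of Linux's soft real-time classes: static priority
# levels, the highest non-empty level always runs first; within a level,
# SCHED_FIFO keeps running the front task until it yields, while SCHED_RR
# rotates tasks. This is an illustration, not the kernel's actual code.

levels = {p: deque() for p in range(100)}  # priority 0 (low) .. 99 (high)

def add(task, priority):
    levels[priority].append(task)

def pick_next(policy="SCHED_RR"):
    for priority in range(99, -1, -1):      # highest priority level first
        queue = levels[priority]
        if queue:
            task = queue.popleft()
            if policy == "SCHED_RR":
                queue.append(task)          # rotate within the level
            else:                           # SCHED_FIFO: keep it in front
                queue.appendleft(task)
            return task
    return None

add("audio", 50)
add("video", 50)
add("logger", 10)

print(pick_next())  # audio: highest non-empty level, front of its queue
print(pick_next())  # video: round robin rotated within level 50
```

Notice that `logger` at priority 10 never runs while level 50 has work: real-time levels give you predictability precisely because they make no fairness promises.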
They're soft real-time, scheduled either first come, first served or round robin, and there are 100 static priority levels for real-time, going from 0 to 99, where 0 means low priority and 99 means high priority. That's the opposite of what I said at the beginning, which is really annoying.

Then there's normal scheduling. Normal processes on Linux just use the normal scheduler, which shows up as SCHED_NORMAL. By default their priority is 0, and the priority ranges from -20 to 19, where a lower number means higher priority. For some reason they call this number niceness, so as not to overload the term priority. The idea behind it: if you're less nice, you have a lower number; you're greedier, you take more of the CPU. That's one way to think about it.

There are actually system calls for this. You can change your niceness using a system call called nice, so you could decide that your process is going to be less nice and therefore have a higher priority. You can just set your own priority, and there's also a sched_setscheduler system call, so there are ways to interact with the kernel here.

So what does it look like with that mess? Linux has a priority that tries to unify the real-time priority scale and niceness into a single unified Linux priority, which makes things real fun, because in trying to make it simpler, they just came up with a third priority scale. Here's our niceness, which is fairly easy to figure out: niceness varies from -20 to 19, and on this scale, anything to the left means high priority and anything to the right means low priority.
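As for the nice system call just mentioned, Python happens to expose it directly as `os.nice` on Unix systems, so you can watch your own niceness change. Note the asymmetry: raising your niceness (lowering your own priority) needs no privileges, but lowering it requires root.

```python
import os

# The nice(2) system call, via Python's os.nice (Unix only). The argument
# is an increment added to the process's current niceness. Being *nicer*
# (a higher number, lower priority) is always allowed; being less nice
# (a lower number, higher priority) requires root.

current = os.nice(0)   # an increment of 0 just reports the current value
print("niceness before:", current)

new = os.nice(1)       # be one step nicer: lower our own priority
print("niceness after:", new)
```

Running this a second time in the same process would bump it again; niceness is clamped at 19, the nicest (lowest-priority) a normal process can get.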
So for normal processes, -20 is toward the high-priority side and 19 is toward the low-priority side, and they can range from -20 to 19.

The Linux priority tries to unify these two scales, like I said. Its first goal is to make any normal process have a priority of zero or above, so it scales the niceness by essentially adding 20 to it: -20 becomes 0, 0 becomes 20, and 19 becomes 39. Then, for real-time processes (and this is mutually exclusive: your process is either normal or real-time, never both), the Linux priority assigns the numbers -1 to -100, with -100 being the highest priority and -1 the lowest. So it takes the soft real-time priority and essentially inverts it: a 0 becomes -1, and a 99 becomes -100. Why they do it that way, I don't know, but they do.

So now we can actually read some more columns in htop. We already understood PID, the process ID, and USER; now we can read the next two columns. PRI is short for priority, and it is the Linux priority, those numbers on the unified scale: if it's negative, the process is real-time; if it's zero or above, it's a normal process. Here I can see init has a priority of 20, and since it's a normal process, with a PRI of zero or above, it has a niceness, and that niceness is valid. Otherwise, if it's real-time and the PRI is negative, the niceness has absolutely no meaning. Why? I don't know, don't ask me, I just work here.

We can see that init has a niceness of zero: it's just a normal process with the default priority. So we can look through these and see if we find anything interesting. Scroll down, scroll down... hey, there's something interesting. There are some processes with a priority of 9, which means they're still normal processes, and if we look at their niceness,
it's -11, so they have a higher-than-normal priority. And what are they? WirePlumber, PipeWire, and PipeWire-Pulse. All of those processes have to do with audio, and audio stuff generally gets a higher priority, because you want to hit those deadlines, right? It doesn't ask for real-time; it just asks for a bit of a higher priority.

We can scroll down further and see this oddly named process, tracker-miner-fs-3. It has a priority of 39 and a niceness of 19, so it is a very low-priority process: whatever it does, we don't care about it very much.

We can explore some more. There's a -2; that's an interesting one. It's negative, so it has a real-time priority. It will be scheduled ahead of any normal process, and its niceness doesn't mean anything. So it's a fairly low-priority real-time task, and it's called low-memory-monitor. It's a process that just watches your memory usage, and if that gets scarily high, it starts killing processes. In that situation the kernel kills effectively random processes; this one is trying to be smarter than the kernel and kill a process that it thinks should go.

What else have we got? Anything where the command shows up in green here is part of the kernel, and you can see the kernel has stuff that is less nice. All these things whose purpose we don't know, well, they seem to be fairly important. We also see entries with a priority of RT: if the priority column says RT, it means -100, the absolute highest real-time priority. One of them is called migration, and guess what: if we look at them all, there are eight of them, one for each of the CPU cores on my machine. It's probably moving stuff around between cores, and it is very important. You can just explore and see what some tasks are; there are some other ones that seem to be less important. Hey, there's a watchdog, and that's real-time.
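Putting the PRI arithmetic we've been reading off the screen into one place: the mapping from niceness and real-time priority onto htop's unified scale, exactly as described above, is just two one-liners. The function name and signature here are made up.

```python
# The unified priority shown in htop's PRI column, as described above:
# a normal process's niceness (-20..19) maps to 0..39 by adding 20, and a
# real-time static priority (0..99) maps to -1..-100 by inverting it.
# A process is either normal or real-time, never both.

def unified_priority(nice=None, rt_priority=None):
    if rt_priority is not None:
        return -1 - rt_priority       # rt 0 -> -1, rt 99 -> -100
    return nice + 20                  # nice -20 -> 0, nice 19 -> 39

print(unified_priority(nice=0))          # 20: a default normal process (init)
print(unified_priority(nice=-11))        # 9: the PipeWire audio processes
print(unified_priority(nice=19))         # 39: tracker-miner-fs-3
print(unified_priority(rt_priority=99))  # -100: shown as RT in htop
```

Every PRI we spotted in htop falls out of these two rules, including the -2 for low-memory-monitor, which comes from a real-time priority of 1.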
What the hell is a watchdog? I don't know. Well, actually I do, but what it is isn't terribly important to you; it just seems to be fairly important to the system. You can have fun and explore. At least we know what those two columns mean now.

All right, the Linux scheduler evolution: we saw the first two, and now we get to the present, which is called the Completely Fair Scheduler. The idea, as the name would imply, is that it's fair, and if it's fair, it should also be responsive and have fairly good interactivity. So why don't we use the O(1) scheduler anymore? Well, it has a lot of issues with modern processes and desktop environments and how we use our machines now. Back in the day, foreground versus background was a good division, and it was easy to figure out via the attached terminal. Now it's much less obvious. We have multiple monitors and multiple windows open, and just because we're not paying attention to a window or interacting with it doesn't mean it's not in the foreground, and doesn't mean it's not important. Like, if you have a video playing in a window and you can see it, it's probably important.

So if you say, okay, I'll just figure out what's a foreground task: a video that's playing where you can see it is probably a foreground task. Well, if you do that, you've just invented a bunch of heuristics. What's a heuristic? It just means a rule that's an educated guess. And you could make other guesses, like: if a process sleeps a lot waiting on the keyboard, maybe I'm interacting with it.
Maybe that means it's a foreground process. But it's ad hoc. What does ad hoc mean? It means I just made up a bunch of rules, and they don't have to be true. If I guess wrong, and I decide a foreground process is actually a background process, then suddenly your video starts lagging all the time, and it's just a really bad experience. Generally the cost of being wrong is quite high, and you really don't want that.

If I want to introduce fairness between processes of different priorities, I also have some options. I could use different time slices and still be roughly fair with round robin: assign a higher-priority task a bigger time slice and a lower-priority task a smaller one. I could do something like that, but again, it could be ad hoc and unfair if I guess wrong.

So first, let's talk about what fair even is. If we had infinitely small time slices, then at any given point, with n processes, we would want to divide the CPU up so that each process gets a 1/n share of the CPU time. That means if there's one process, we give it our full, undivided attention; if there are three processes running, we give each a third of our time. It's just divided equally among all current processes.

What does that look like? Say we have process one, which takes 8 seconds to run, process two, which takes 4, process three, which takes 16, and process four, which takes 4. We assume we have infinitely small time slices and can context switch in zero time.
We can context switch in zero time So we can divide our Each column into four time slices just to make the numbers a bit better So at the beginning once we have all Four processes in if we have four time units if we were completely fair We would give each process a single time unit So each process would run for a single time unit and each number here just represents How long that process has currently been running for so after four time units they all run for one time unit then For the next four time units they all run for one time unit again again. We're being completely fair then this happens until a Total of 16 time units pass where each of our four process was given a fourth of the CPU time So each of them ran for four time units So in that case process two and process four are now complete and if we were being completely fair Well now there's two processes left and if we have four time units to divide up We would give each process two time units because there's only two left So each of them would execute for two time units, so they'd go from four to six then We would do it again for the next four time units and then at time 24 Process one is finally done and we only have one process left So we can dedicate all four time units so that one process and get it done So process three would run for four then again for four and then at time 32 All the processes would be done and throughout that we were fair to them at all times So Its name is ideal fair Really fair can't get more fair but we made a very big assumption which is not true and that is that we have infinitely small time slices context switches do take time and You can't just switch infinitely so even though this is fair and each process Gets the best response time So the response time for this would be near zero all the time because we have infinitely small time slices in reality doesn't work like this and also We have to constantly scan all processes, which is on which is just really really slow because 
context switching takes time, and running the scheduling algorithm itself takes time, which is time not spent running processes, so you want the scheduling algorithm itself to be as quick as possible.

That is where the Completely Fair Scheduler comes into play. What it does is assign each runnable process a virtual runtime; "virtual" here just means it's kind of fake. Whenever a process runs for some time t, then when it's done running, the scheduler records that runtime scaled by a weight based on the priority. What does that mean? If a process runs for, say, two seconds, and its weight scales that by 0.5, then it only accrues one second of virtual runtime, so the time counts against it less. That's the intuition for why Linux uses a lower number to mean higher priority: the priority determines the weight, and a lower weight means your virtual runtime increases more slowly and counts against you less.

This virtual runtime will monotonically increase. What does that god-forsaken math word mean? Monotonic just means it only ever increases, never decreases: we only add to the virtual runtime, we never subtract from it. The scheduler then has a very easy decision: its only job is to pick the process with the lowest virtual runtime, because that's the one that needs to run more to get up to its fair share. And the time slice t the process gets is computed by the kernel based on what would be fair: how many processes are currently running, and how far behind this process is relative to the next process's virtual runtime.
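That bookkeeping can be sketched as a toy. To be clear about the assumptions: the real kernel uses a specific integer weight table and a red-black tree, not a dict; the weight formula here, `1.25 ** -nice`, is just the commonly cited roughly-1.25x-per-nice-level ratio, used to make the example concrete.

```python
# Toy sketch of the Completely Fair Scheduler's bookkeeping: always run
# the process with the lowest virtual runtime, and charge it its actual
# runtime divided by a weight derived from its niceness. The real kernel
# uses an integer weight table and a red-black tree; the 1.25**-nice
# weight here is an illustrative stand-in.

vruntime = {}   # process name -> virtual runtime, only ever increases
weight = {}     # process name -> weight (higher weight = charged less)

def add_process(name, nice=0):
    vruntime[name] = 0.0
    weight[name] = 1.25 ** -nice

def pick_next():
    return min(vruntime, key=vruntime.get)  # lowest vruntime runs next

def account(name, runtime):
    vruntime[name] += runtime / weight[name]  # monotonic: only adds

add_process("compiler", nice=0)
add_process("audio", nice=-5)   # less nice -> higher weight, charged less

account("compiler", runtime=10)
account("audio", runtime=10)    # same real time, smaller vruntime charge

print(pick_next())  # audio: its 10 units counted for less vruntime
```

This is also where the I/O-bound bonus comes from for free: a process that blocks early is charged only for what it actually ran, so it keeps a low virtual runtime and wins the next `pick_next`.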
So the scheduler picks t based on the ideal fair share, and the scaling of the virtual runtime by the weight comes purely from the priority. The process runs; when its time slice ends, the scheduler just runs again. If it goes to sleep early or blocks for some reason, then only the time it actually ran counts against it: it wouldn't use its full time slice, so its virtual runtime only increases by a little bit.

The Completely Fair Scheduler is implemented with red-black trees. Yes, that thing you learned in algorithms that didn't seem to have any practical application has a practical application now. A red-black tree is a self-balancing binary search tree, and in this case the keys it's paying attention to are the virtual runtimes. What does that mean? It means all of our operations, inserting, deleting, updating, and finding the minimum, are O(log n), which is quite fast. So whenever you create a process, or whenever you recompute its virtual runtime and have to re-add it to the tree, or whenever the scheduler needs to see its next decision, all of that is nice and fast.

In the implementation, the virtual runtime in the red-black tree is accounted for at nanosecond granularity. If you haven't seen a nanosecond before: light travels about 30 centimeters in a nanosecond, so a nanosecond is pretty short and the speed of light is pretty quick. Your CPUs operate on that scale, which is why it actually matters how long your wires are in some cases.

A nice thing about the Completely Fair Scheduler, too, is that it favors I/O-bound processes by default. What does that mean?
Well, if a process is I/O-bound, you're probably interacting with it, because it's requesting some information from the kernel; it's waiting for a keyboard press or something like that. So it won't use all of its time slice, and its virtual runtime only goes up by a little bit. Then the kernel runs other processes, which increase their virtual runtimes more, and when your process finally gets the information it needs and unblocks, it's much more likely to run, because it will probably have the lowest virtual runtime. So it gets scheduled pretty much immediately, gets good response time, goes back to sleep, and probably still has the lowest virtual runtime next time. And its time slices would also get bigger and bigger, so if it needed to do a lot of computation after a bunch of input, it could.

So scheduling just keeps getting more complicated; it's like that saying, more money, more problems. Same with scheduling: the more features you add, the more problems you get. If you introduce priorities, guess what, your problem is now priority inversion, and you have to handle it; one way is priority inheritance. You might have some processes that need good interactivity and others that don't, and you might try to discern which is which; if you make a bad decision there, you get really bad scheduling and probably make your users very unhappy. If you have multiple cores, they probably need something like per-CPU queues. If I have real-time requirements, I need things to be predictable, so I can't make the scheduler very complicated or very smart; if I tried to throw AI at it, that just wouldn't work. And then we saw what is actually used in the Linux kernel today, the Completely Fair Scheduler, which tries to model ideal fairness.

So we get to end a bit early. This class will be ahead of the other classes:
This class will become ahead of the other classes So they're not having a lecture tomorrow, so you will be a lecture ahead of them until Thanksgiving And then everyone will be synchronized again And also we have a room for the midterm. Joy, November 15th. It's the Wednesday So we have the room from 6 to 8. I was thinking of starting at like 6.30 Because it will be an hour and 15 minutes It is in the exam center, wherever that is. I have no idea where that is One of you probably will. I will figure it out before then If anyone has problems with it, let me know. Hopefully it's okay It has to be in the evening because your guys schedule is a complete mess Which puts me home at like 2am, so thanks for that But yeah, so that'll be fun. Let me know about it And otherwise we get to end early. Yay! So just remember, pulling for you, we're all in this together