Hello everyone, welcome to the 9th lecture in the course Design and Engineering of Computer Systems. In this lecture, we are going to study a little more about the CPU scheduler and scheduling policies. Let us get started.

We have seen the concept of the OS scheduler before. The scheduler decides which process to run on a given CPU core. Every CPU core can run one process at a time, but there could be multiple processes that are ready to run. From among these, the CPU scheduler picks one process to run on a CPU core at a given point of time. It might run P1 for some time, then context switch to P2, and so on; it runs these processes one after the other in some order. The scheduler schedules not just processes but also kernel-level threads. We studied threading in the previous lecture, and these threads are scheduled independently, like processes, by the OS scheduler.

There are two parts to the scheduler. One is the policy, which decides, given all the processes that are ready to run, which one should run next on a given CPU core. Once the policy decision is made, you have the mechanism of the context switch itself: if I decide to stop running P1 and run P2 next, the context of P1 is saved in its PCB and the context of P2 is restored from its PCB. We have seen this mechanism before. In this lecture, what we are going to study is the policy, that is, the decision of which process to run. We will start with some very simple scheduling policies that are easy to understand and have good theoretical properties. But if you look at the schedulers in real operating systems, they are actually quite complex.
They are not the simple scheduling policies we will study in this lecture. So, I will start with the simple policies, but I will also try to give you a flavor of what real-life schedulers look like and what some of these complex policies are. Of course, understanding a full-fledged real-life scheduler is beyond the scope of this course.

Before we go to the policies, let us understand the types of schedulers. A scheduler is of two types: it can be either a preemptive scheduler or a non-preemptive scheduler. What is the difference? Consider when the OS scheduler is invoked to perform a context switch. First of all, for the scheduler to run, the process has to be in kernel mode. Once a process has entered kernel mode through a trap, non-preemptive schedulers perform only what are called voluntary context switches. That is, if a process P1 has trapped into the kernel and has made a blocking system call, or has terminated for some reason, then P1 cannot run any more. Only in that case will a context switch be done, and you will return to the user mode of another process P2. As long as P1 is able to run, you will run P1. Only when P1 gives up the CPU voluntarily will the scheduler switch to another process. Such schedulers are called non-preemptive schedulers. Non-preemptive means the scheduler will not interrupt a process that wants to run.

The other kind of schedulers are preemptive schedulers. They also perform involuntary context switches. That is, the process has not blocked and has not terminated; it can still continue to run for some more time.
Even then, some schedulers will stop this process and context switch to another process. Such schedulers are called preemptive schedulers, and such context switches are called involuntary context switches because the process has no control over them. It is still running, it is in the middle of doing something, but it can still be context switched out. Modern operating systems need these involuntary context switches because you do not want any process to run for too long on the CPU and starve other processes of their run time. You do not want any process to hog the CPU. Therefore, most modern operating systems use preemptive schedulers, and a lot of involuntary context switches are performed.

Now the question might come up: how is such a context switch even triggered? If process P1 is not making any blocking system call, is not terminating, and is not giving up the CPU, how will it go into kernel mode to trigger the context switch? For that purpose, modern CPUs have timers, a special piece of hardware that generates interrupts periodically. These timer interrupts go off periodically, so every so often the running process will trap into the operating system. This gives the operating system a chance to run the scheduler and do an involuntary context switch. So, timer interrupts are critical for preemptive schedulers, and most modern systems use preemptive schedulers because you want to share the CPU across multiple processes and not let any one process run for too long. Because of the use of preemptive schedulers, a process can be context switched out at any time. The unfortunate context switches that lead to race conditions, which we saw in the previous lecture, can happen with modern CPU schedulers.
So, what are the goals of a scheduling policy? Before we study various policies, what do we want a good policy to do?

First, we want a good policy to use the CPU effectively. A scheduling policy should not leave the CPU idle when there are processes to run; that is inefficient. We want high CPU utilization.

Second, we want processes to complete as fast as possible. We want to minimize the completion time of a process, that is, the time from when a process is created to the time it ends. We want this to be as small as possible for as many processes as possible.

Third, we want to minimize the response time of a process. Note that response time is different from completion time. The response time is the time from process creation to the first time the process is executed on the CPU. When a process is created, when it is forked by the parent, it is added to some list of processes, and at a later point of time the CPU will run it. For example, if you click on a program or perform some action, the response time indicates when the effect of the action will start to show up on the screen. Even if you do not fully finish the process, if you at least schedule it once and give it some time to run, the process will be responsive. This is very important for interactive processes. If you are playing a computer game and you click on something, you want the process to run and handle the event, to run the code corresponding to your click; otherwise you are going to see a lag when you interact with the process. So, response time is the time until the process gets the CPU for the first time, even for a short time, while completion time is the time for the entire process to complete. These are two different things.

The other property we want is fairness. If there are multiple processes in the system, you want all of them to get some fair share of the CPU.
You can also set priorities for processes, saying, for example, that this process should get twice as much CPU as some other process. But whatever it is, you should be able to control how much time each process gets on the CPU.

And finally, whatever the scheduling policy is, it should have low overhead. That is, it should not take too long to decide. If you have a large number of processes, the scheduler itself should not run for a very long time trying to decide which process to run next; it should make the decision quickly. It should also not cause too many context switches, because a context switch, that is, saving context and restoring context, itself takes time. It can take on the order of a microsecond, a few thousand CPU cycles. You do not want a scheduling policy that runs one process for a short time, then context switches, then context switches again. You want your overall scheduling policy to not add much overhead to the system.

With this in mind, let us start understanding some simple scheduling policies. The simplest policy that you can think of is first in, first out, or the FIFO policy. What is this policy? Basically, all the processes that arrive in your system are put in a queue. Whenever a process comes, it is added to the end of the queue, and the scheduler picks processes one by one from the front of the queue. It takes the first process, runs it till it terminates or blocks, then moves on to the next process, and so on. It is a simple queue: you run the processes in the order in which they arrive. If a process runs for some time and blocks, then the next time it becomes ready it is treated like a separate job. It is added back to the queue and once again waits for its turn.
So, this is a simple policy. It is non-preemptive: it will let a process run for as long as it wants to, and it is very easy to implement. But the problem is that short processes can get stuck behind big processes. Suppose P1 is a very large process that takes a long time to run; then the other processes that come later will have to wait a very long time for their turn. This is not ideal, especially if you have interactive processes. That is the inefficiency of the FIFO policy.

Let us see a small example of FIFO. Suppose you have three processes. Process P1 runs for five time units and arrives at the end of time interval 0, that is, just at the beginning of time slot 1. Shortly after, another process P2 arrives, and at the end of time unit 3 we have process P3 arriving. This is our timeline. Now, what is the schedule that the FIFO scheduler will generate? When P1 comes, the FIFO scheduler of course starts to run P1, and P1 runs for its entire duration of 5 time units. Even though P2 and P3 have arrived and they are shorter processes, it does not matter; this is a FIFO scheduler. P1 came first, therefore it runs first. After P1 ends, P2 runs for the next three units, and then at the end P3 runs. This is the schedule. So, given the arrival times of processes and how long they will run, you can create a schedule like this. This is an example schedule of the FIFO scheduler, and it is fairly easy to understand.

Now, let us move on to slightly more complex scheduling policies. Another popular, though very theoretical, scheduling policy is what is called shortest job first. In this scheduling policy you assume that you know how long a process runs. In FIFO you need not know how long a process will run; you just let it run as long as it wants to.
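The FIFO example above can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the `Job` record and the function names are made up for illustration, the arrival times 0, 1 and 3 and the bursts 5, 3 and 2 are read off the walk-through of this example in the lecture, and blocking for I/O is ignored. It also computes the two metrics discussed earlier, average completion time and average response time, both measured from arrival.

```python
from collections import namedtuple

# Hypothetical record for this sketch: name, arrival time, CPU burst.
Job = namedtuple("Job", ["name", "arrival", "burst"])

def fifo_schedule(jobs):
    """Return a list of (name, start, end) slices in FIFO order."""
    schedule = []
    clock = 0
    # Jobs are served strictly in order of arrival, each run to completion.
    for job in sorted(jobs, key=lambda j: j.arrival):
        start = max(clock, job.arrival)  # the CPU idles if nothing has arrived yet
        end = start + job.burst
        schedule.append((job.name, start, end))
        clock = end
    return schedule

def average_times(jobs, schedule):
    """Average completion time and response time, measured from arrival."""
    arrival = {j.name: j.arrival for j in jobs}
    first_run = {}
    last_end = {}
    for name, start, end in schedule:
        first_run.setdefault(name, start)  # first time the job gets the CPU
        last_end[name] = end               # last slice marks completion
    n = len(jobs)
    avg_completion = sum(last_end[k] - arrival[k] for k in arrival) / n
    avg_response = sum(first_run[k] - arrival[k] for k in arrival) / n
    return avg_completion, avg_response

jobs = [Job("P1", 0, 5), Job("P2", 1, 3), Job("P3", 3, 2)]
schedule = fifo_schedule(jobs)
print(schedule)   # [('P1', 0, 5), ('P2', 5, 8), ('P3', 8, 10)]
print(average_times(jobs, schedule))
```

With these numbers P2 and P3 wait 4 and 5 time units before they first run, which is the "short process stuck behind a long process" effect in concrete form.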
In shortest job first, on the other hand, you assume that you know what is called the CPU burst of a process. The CPU burst is the amount of time a process runs in one instance when it is given the CPU, until it terminates or blocks, and you assume that this CPU burst is known. The scheduler picks the process with the smallest CPU burst to run. Amongst all the processes that are in the system right now, you pick the one with the smallest CPU burst. Maybe you store all the processes in a heap-like data structure and extract the process with the minimum CPU burst; you can decide what data structure to use.

So, you always pick the process with the smallest CPU burst, but this policy is non-preemptive. When a process is running and another process with a shorter CPU burst comes in, it will not preempt the running process. If you take a theory course, you can actually prove that this policy is optimal: it minimizes the average completion time of all processes when all the processes arrive at the same time. So, under certain conditions, this policy works very well. But you still have the problem that a short process can get stuck behind a long process if it arrives slightly late. Ideally you want to let the shorter processes run first. But if the short process comes slightly late and you have already started a long process, then, since this is a non-preemptive policy, it will not preempt the running process.

Once again, let us take the same example of P1, P2, P3 that we saw on the previous slide and work out what the schedule will be with shortest job first. Process P1 has arrived, it starts to run, and it runs for the entire 5 units.
Even though P2 and P3 arrive and are shorter than P1, when P1 started it was the shortest job available, and we do not preempt P1 just because these other processes arrive. Therefore, P1 continues to run for its entire duration of 5 time units. Now, after P1 finishes, what do you do? At this point you have both P2 and P3, but P3 is the shorter process. Here is the difference from FIFO: FIFO would have run P2, but shortest job first will now run P3, and then it will run P2 at the end. This is the schedule for shortest job first. As you can see, we wanted to prioritize short processes and get them done quickly, but that may not always happen, especially if the short processes arrive a little late.

Therefore, the improvement on this is a preemptive version of shortest job first, called shortest remaining time first. When a process arrives and its CPU burst is shorter than the remaining time of the currently running process, it preempts the current process. If a shorter process comes, it can preempt a currently running longer process, thereby giving priority to shorter processes. This avoids the problem of short interactive processes getting stuck behind long processes.

Once again, let us take the same example and see what happens with preemptive shortest remaining time first. Process P1 has arrived and has run for one time unit. After one time unit, process P2 arrives. At this point P2 has a CPU burst of three units, and P1 has run for one unit and has four more units left. P2's three units are shorter than P1's remaining four units, therefore P1 is preempted and you run P2. After P2 has run for two units, P3 arrives. But when P3 arrives, P2 has only one unit of work left and P3 has two units of work.
Two units of P2 are done, so only one unit is left, while P3 has two units of work. Therefore, P3 will not preempt P2; we let P2 continue. After P2 finishes, P1 has four units left and P3 has two units left. P3 is the shorter one, so we run P3, and finally, at the end, the long process P1 completes. This is the schedule that you get with shortest remaining time first.

Now, policies like shortest job first and shortest remaining time first assume that you know the run time of a process. But in real life this may not be possible. When you fork a process, when you run your a.out program, you do not know how long it is going to run for. Therefore, real-life operating systems cannot make this assumption. So, now we will discuss a few scheduling policies that real schedulers can use.

A simple policy that does not assume knowledge of the CPU burst is what is called round robin or fair queuing, and there is also a variant of it called weighted fair queuing. What is round robin? It is simple. You run the processes in a round robin fashion, one after the other, for a time slice each. You run process P1 for a few time units, then you run P2, then P3; you go through all your processes in turn, and once you reach the end of your list you come back and run P1 again for another time slice. Maybe you run P1 for 10 milliseconds, then P2 for 10 milliseconds, then P3 for 10 milliseconds, and then come back again to P1. This is the simple round robin or fair queuing policy, which shares the CPU fairly across all the processes in the system. You can also have a weighted version, weighted round robin or weighted fair queuing, where you assign weights or priorities to the processes. You can say process P1 is twice as important as process P2.
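As a brief aside, the two shortest-job schedules worked out above for P1, P2 and P3 can be checked with a small simulation. This is a sketch, not how an OS implements these policies: the function names are made up, jobs are plain `(name, arrival, burst)` tuples, arrival times 0, 1 and 3 and bursts 5, 3 and 2 are taken from the walk-through, and time is assumed to advance in whole units. The non-preemptive version uses a min-heap keyed on the CPU burst, which is one of the data structures suggested in the lecture.

```python
import heapq

def sjf_schedule(jobs):
    """Non-preemptive shortest job first; jobs are (name, arrival, burst)."""
    pending = sorted(jobs, key=lambda j: j[1])
    ready = []      # min-heap of (burst, arrival, name)
    schedule = []
    clock = 0
    i = 0
    while i < len(pending) or ready:
        # Admit every job that has arrived by now.
        while i < len(pending) and pending[i][1] <= clock:
            name, arrival, burst = pending[i]
            heapq.heappush(ready, (burst, arrival, name))
            i += 1
        if not ready:
            clock = pending[i][1]   # idle until the next arrival
            continue
        burst, _, name = heapq.heappop(ready)
        schedule.append((name, clock, clock + burst))  # runs to completion
        clock += burst
    return schedule

def srtf_schedule(jobs):
    """Preemptive shortest remaining time first, one time unit per step."""
    remaining = {name: burst for name, _, burst in jobs}
    arrival = {name: arr for name, arr, _ in jobs}
    schedule = []
    clock = 0
    while any(remaining.values()):
        ready = [n for n in remaining
                 if arrival[n] <= clock and remaining[n] > 0]
        if not ready:
            clock += 1
            continue
        # Re-evaluate every time unit; ties go to the earlier arrival.
        current = min(ready, key=lambda n: (remaining[n], arrival[n]))
        if schedule and schedule[-1][0] == current and schedule[-1][2] == clock:
            schedule[-1] = (current, schedule[-1][1], clock + 1)  # extend slice
        else:
            schedule.append((current, clock, clock + 1))
        remaining[current] -= 1
        clock += 1
    return schedule

jobs = [("P1", 0, 5), ("P2", 1, 3), ("P3", 3, 2)]
print(sjf_schedule(jobs))   # [('P1', 0, 5), ('P3', 5, 7), ('P2', 7, 10)]
print(srtf_schedule(jobs))  # [('P1', 0, 1), ('P2', 1, 4), ('P3', 4, 6), ('P1', 6, 10)]
```

The only difference between the two outputs is where P1's long burst ends up: at the front when it cannot be preempted, and at the back when shorter arrivals are allowed to cut in.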
With such weights, I might run P1 for 20 milliseconds, P2 for 10 milliseconds, P3 for 10 milliseconds, and so on. You can do this weighted fair sharing across different processes, where the time slice is in proportion to the weight or the priority.

This is a preemptive policy. At the end of a time slice you preempt the running process: you can set the timer interrupt to go off at the end of the time slice, stop this process, and move on to the next one, even if this process is still ready to run. This policy is good for fairness and response time. Every process gets its turn pretty soon. You are not waiting for one process to finish while keeping other processes waiting for a long time. Especially if your time slice is small enough, this scheduling policy has a very good response time: every process gets to run, at least for a short period of time, very quickly.

In a real-life scheduler you may not be able to enforce this time slice exactly. If you say 10 milliseconds, maybe by the time the timer interrupt goes off 11 milliseconds have already passed, or maybe at 7 milliseconds, before the time slice ends, the process makes a blocking system call. So, what real-life schedulers do is adjust for this excess or shortfall of running time in future time slices. You try to enforce the time slice approximately, you keep track of how long each process has run, and you schedule the process that has used the least fraction of its fair share. Suppose a process has run over and used slightly more than its fair share in this round; in the next round you will deprioritize it slightly.
On the other hand, if a process did not use its fair share in this round, it might get priority in the next round. So, you pick the process that has used the smallest fraction of its fair share so far, and in this way you compensate for slight overshooting or undershooting of the time slice. This is a practical modification, given that in real life you may not be able to enforce the time slice exactly, and the scheduling policy in the latest Linux kernel is a variant of this weighted fair queuing. Of course, it is very complex and has many other features, but at the very basic level this is the simple idea behind the Linux scheduler.

One other policy that is suited for real-life implementation is what is called the multi-level feedback queue. In round robin you had just one list of processes; in the multi-level feedback queue you maintain multiple queues of processes. You can have one queue for the high priority processes, another queue for the medium priority processes, and another queue for the low priority processes. In this way you have multiple priority levels, at each priority level you keep a separate queue of processes, and you always schedule processes starting from the highest priority level. Among processes at the same priority level you can use something like round robin. So, the scheduler always starts at the highest priority level, runs all those processes in a round robin fashion, then moves on to the next priority level, then to the next, and so on. This is a different way of running processes from plain round robin. And what is this priority? It can be set by the user, or it can be adjusted by the scheduler itself.
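The rule "pick the process that has used the smallest fraction of its fair share" from the weighted fair queuing discussion above can be written down in a few lines. This is a simplified sketch of the idea only, not the Linux scheduler's actual algorithm; the function names, the weights, and the slice lengths are all assumptions chosen for illustration. Each process's usage is normalized by its weight, so a process with weight 2 is allowed twice the run time before it counts as having used the same share.

```python
def pick_next(weights, ran):
    """Pick the process with the smallest run time relative to its weight."""
    return min(weights, key=lambda name: ran[name] / weights[name])

def run_slices(weights, slice_lengths):
    """Run one scheduling decision per entry in slice_lengths.

    slice_lengths holds how long the chosen process actually ran each
    time (in ms), which need not equal the nominal time slice.
    """
    ran = {name: 0 for name in weights}
    order = []
    for length in slice_lengths:
        name = pick_next(weights, ran)
        ran[name] += length      # account for the time actually used
        order.append(name)
    return order, ran

# P1 is twice as important as P2 and P3 (hypothetical weights).
weights = {"P1": 2, "P2": 1, "P3": 1}
order, ran = run_slices(weights, [10] * 8)
print(order)  # ['P1', 'P2', 'P3', 'P1', 'P1', 'P2', 'P3', 'P1']
print(ran)    # {'P1': 40, 'P2': 20, 'P3': 20}
```

Because the accounting uses the time actually consumed, passing uneven lengths such as 11 ms or 7 ms instead of exactly 10 ms is handled without any special case: a process that overshot simply looks less attractive at the next decision, and one that ran short looks more attractive.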
For example, the scheduler can decay the priority with age. That is, if a process has run at a high priority level for some time and has fully used its time slice at this level, then in the next round you can push it to a lower priority level. Why do you want to do that? This is to favor short I/O-bound processes, that is, processes that run for a short time, wait for some I/O or disk operation to happen, run again for a short time, and go away again. You want to prioritize such processes. Such a process comes to you only for a very short time, so you might as well run it quickly, and then it will go back and wait for some I/O operation. You want to prioritize such processes over long CPU-bound processes, which always run on the CPU for a very long time. Whenever a short I/O-bound process comes in, you want to run it quickly, because it has to wait for I/O later on anyway.

So, how do you prioritize such processes? If a process did not use up its full time slice, that is, it ran only for a short period of time, you let it remain at its priority level. On the other hand, if it used up its entire time slice, you move it to a lower priority level. With time, processes that are hogging the CPU, taking up a large share of the CPU all the time, move down the priority levels, whereas I/O-bound processes that run for very short periods of time stay at a high priority level. This is a small heuristic that is used to ensure that I/O-bound processes, which run only for short periods of time, are given preference. Of course, you do not want to always give preference to these I/O-bound processes. You do not want a CPU-bound process to be stuck at the lowest level forever and never complete.
If a CPU-bound process is at the lowest priority level and I/O-bound processes are always coming in, the scheduler will always be running one of the high priority processes and never get to the lowest level. You do not want that. So, in such algorithms, which maintain strict priority levels, you periodically reset all the processes to the highest priority level so that everybody gets a fair chance to compete once in a while. This is to avoid starvation of low priority processes. So, this is another example of a slightly more realistic, complex scheduling algorithm.

Finally, the last thing I want to talk about with schedulers is multi-core scheduling. All of these scheduling policies pick a process to run on one CPU core. What if you have multiple CPU cores? You have to schedule processes on the multiple CPU cores independently, and there are different ways of doing it.

One way is to maintain a common queue of processes, P1, P2 and so on, and whenever a CPU core becomes free you schedule a process from this common queue. This is like waiting in a single file in front of several counters and going to whichever counter becomes free. This core is free, so P1 runs here; then another core becomes free, so the next process goes there, and so on. Basically, any process can run on any CPU core that is free.

The other way is to assign processes to cores, with a separate queue for each core. These processes will run on this core, those processes will run on that core, and so on. Each core has its own queue of waiting processes: the first process runs on the core, and when its turn is done the next process in that queue runs, and so on.
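The multi-level feedback queue rules described above (start at the top level, demote a process that uses its full time slice, leave it in place otherwise, and periodically reset everyone to the top) can be sketched as a small class. This is an illustrative sketch under assumptions: the class name `MLFQ`, the three levels, the time slice of 10 units and the burst lengths in the demo are all made up, and real implementations differ in many details.

```python
from collections import deque

class MLFQ:
    """Toy multi-level feedback queue; level 0 is the highest priority."""

    def __init__(self, levels=3, time_slice=10):
        self.queues = [deque() for _ in range(levels)]
        self.time_slice = time_slice

    def add(self, name):
        self.queues[0].append(name)   # new processes start at the top level

    def pick(self):
        """Pop the next process from the highest non-empty level."""
        for level, queue in enumerate(self.queues):
            if queue:
                return queue.popleft(), level
        return None, None

    def requeue(self, name, level, ran_for):
        """Put a process back after it ran for ran_for time units."""
        if ran_for >= self.time_slice:
            # Used the whole slice: treat as CPU bound and demote one level.
            level = min(level + 1, len(self.queues) - 1)
        self.queues[level].append(name)  # round robin within a level

    def boost(self):
        """Reset every process to the top level to avoid starvation."""
        for queue in self.queues[1:]:
            while queue:
                self.queues[0].append(queue.popleft())

# Hypothetical workload: two CPU hogs use the full slice, "io" runs briefly.
bursts = {"hog_a": 10, "hog_b": 10, "io": 2}
mlfq = MLFQ()
for name in bursts:
    mlfq.add(name)
for _ in range(3):
    name, level = mlfq.pick()
    mlfq.requeue(name, level, bursts[name])

before = [list(q) for q in mlfq.queues]
mlfq.boost()
after = [list(q) for q in mlfq.queues]
print(before)  # [['io'], ['hog_a', 'hog_b'], []]
print(after)   # [['io', 'hog_a', 'hog_b'], [], []]
```

After one pass, the two CPU-bound processes have dropped a level while the I/O-bound one stays on top, and the periodic `boost` puts everyone back at the highest level so that the demoted processes are not starved.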
So, you can bind a process to a particular CPU core and always run it on that core, or you can let any process run on any CPU core. Accordingly, you maintain the queue of ready processes either as a common queue or as a per-core queue. What are the pros and cons of these approaches? Both are possible, but they have their own advantages and disadvantages.

Ensuring that a process runs on the same core as far as possible is better for the following reasons. First, you get cache locality. Recollect that a CPU core has multiple layers of cache that are private to itself, and only the last-level cache is common across different CPU cores. Therefore, if a process has been running on a CPU core, some of its data could be in the private caches of that core. So, it is simply more efficient and faster to let the process resume on the same core in the future, because then you get good cache locality and a good hit rate in the cache.

The other reason is NUMA systems. We discussed this a long time back: in some systems, some memory is closer to some cores. You have a large number of cores and different banks of RAM; for one set of cores it is faster to access one RAM, and for another set of cores it is faster to access another RAM. In such systems, if the memory image of a process is in a particular RAM, it is better to schedule the process on the cores close to that RAM. This is called NUMA-aware scheduling. In such cases, for a process whose memory is in one area of the RAM, even if a distant CPU core is free you may not want to run the process there, because accessing its memory from that core is very slow.
So, this is the other reason why you might want to pin processes to one CPU core or a set of CPU cores. Another reason is that per-core queues avoid synchronization. Any time different CPU cores have to access a common data structure, there will be overheads like acquiring locks, and there will be cache coherence traffic. We have studied all of these things. In general, if a CPU core is accessing one item of data, it is better if other CPU cores do not access the same item of data. To avoid this synchronization across CPU cores also, having per-core queues is better.

For all of these reasons, it might seem like a good idea to just allocate processes to CPU cores and have each core, whenever it is free, pick from its own queue. But the disadvantage is that this is not flexible. If one core is overloaded, with many processes that take a lot of time, while some other core is idle, that does not make sense. In such cases you want the idle core to take some of the work from the overloaded core. Therefore, if you have per-core queues of ready processes, you must have some way of load balancing across the cores to ensure a reasonably uniform distribution of the workload.

These are all the things to keep in mind when you are thinking of a multi-core scheduling algorithm. Schedulers in real operating systems have to consider all of these points, because most systems today are multi-core systems and all scheduling algorithms have to solve these issues around multi-core scheduling.

So, that is all we have for today's lecture. To summarize, in this lecture we have studied what a scheduling policy is and what its goals are. We have seen some example scheduling policies, starting from very simple ones and moving on to more complex, realistic ones, and we have also seen what a scheduler should do in the case of multi-core systems.
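To make the load-balancing point for per-core queues concrete, here is a minimal sketch. It assumes each core's ready queue is a plain `deque` of process names and uses a made-up `balance` function that moves processes from the longest queue to the shortest until their lengths differ by at most one. It deliberately ignores cache locality and NUMA placement, which, as discussed above, are reasons a real scheduler would migrate processes far more sparingly.

```python
from collections import deque

def balance(run_queues):
    """Even out per-core ready queues; returns how many processes moved."""
    moved = 0
    while True:
        busiest = max(run_queues, key=len)
        idlest = min(run_queues, key=len)
        if len(busiest) - len(idlest) <= 1:
            return moved            # queues are as even as they can be
        # Migrate one ready process from the overloaded core's queue.
        idlest.append(busiest.pop())
        moved += 1

# Hypothetical starting point: core 0 is overloaded, core 2 is idle.
run_queues = [
    deque(["P1", "P2", "P3", "P4", "P5"]),
    deque(["P6"]),
    deque(),
]
moved = balance(run_queues)
print(moved)                          # 3
print([len(q) for q in run_queues])   # [2, 2, 2]
```

In this sketch the balancer touches all the queues, so it would need the same kind of locking as a common queue; the benefit of per-core queues is that this shared access happens only occasionally rather than on every scheduling decision.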
So, a small exercise for you: think of examples of real-life schedulers. Wherever in real life you have a counter and some queues, any system where some scheduling is going on, think about what kind of scheduling policy is being used. For example, if you submit some applications to an office in your college, do they look at the applications and first process the easiest ones, those that take the smallest amount of time? Or do they process them in the order in which they came in? So, just look around in real life and try to find examples of the scheduling policies we have studied today. Thank you all, that is all I have for this lecture, and we will meet again in the next lecture. Thanks.