All right, let's get started. Today we're going to build on what we looked at last time, processes and threads, and get into more of the details of how they're implemented and what their implications are. We'll review some of the material from last time so it all looks familiar, look at how threads get dispatched and how they can cooperate with each other, give some examples of concurrency, and also clear up some of the terminology we didn't quite get to last time.

So multiprogramming is the fundamental idea associated with processes: running multiple application programs at the same time on a processor. It's actually an old idea, predating interactive time-sharing systems. It was originally used in mainframes that did batch processing. Basically, the system would pull several jobs into memory and switch between them simply to utilize the CPU, so the CPU wouldn't sit idle while one of the jobs was waiting for an IO operation. So the term doesn't imply the real-time, interactive systems we have today. On the other hand, there aren't many multiprogramming systems that aren't interactive anymore, because there aren't really computers of that type left. So multiprogramming has become a bit of a synonym for multitasking, which means running those jobs with fast enough switching that not only are there multiple virtual machines resident on one machine, but a user of any one of those virtual machines perceives it as actually running, not suspended for some noticeable amount of time.

All right, so multiprogramming is simply the model where there are multiple resident processes in one system's memory, with some regimen for switching between them and eventually finishing all of them. It typically involves encapsulating each process as a virtual machine with its own copy of memory and some kind of protected handle on the IO devices, so there's no conflict between the different programs running on those different virtual machines. The process is the abstraction that captures the program plus its running state, and it's typically associated with this virtual machine perspective. The process is the thing running on the virtual machine, while the virtual machine is the thing that provides protection and an environment that simplifies the programmer's task of writing the process's code, because from their perspective, they're effectively coding on a machine of their own.

Now, process creation is certainly expensive, and switching between single-threaded processes is a little bit expensive, although we'll talk about that more; it's a lot less expensive than it used to be. Primarily, we want concurrency within complex applications, and that's why we use threads. Within a large, complex process we typically want to be doing many things at the same time, perhaps dealing with a network, dealing with the user interface, and so on. Those different threads of execution can most easily perform efficiently if they have shared memory, and threads are the way to get that. With a thread, we separate the execution part of the process, the runnable state, from the allocation of the virtual machine.

OK, so here's a Unix process. It'll have the code over on the left; it's a virtual machine, so it has memory, including a stack and a heap; some IO state; and then the state of the CPU.
These parts here are the resources that get replicated for each process, the virtual machine part. The instruction stream is a conventional sequence of instructions. And these resources have to be managed somehow: memory, as we saw last time, is actually partitioned, so the physical memory is divided between the processes and each one has its own apparent copy of memory. IO state is similar; handles are presented to each virtual machine so that the processes feel like they have access to everything. For CPU state, there's only one CPU, so the operating system becomes responsible for making and maintaining copies of it that get swapped into the physical machine as you switch between processes. That's the stuff that needs to get managed in order to do context switches between threads or processes.

OK. When you switch between different processes, the switching overhead certainly used to be quite high. We'll look at that in more detail now, because it's actually a much more complicated question what you mean by a context switch for a process when it's multi-threaded. But in any case, it's more work than a thread switch: there's a small amount of CPU state to manage and a lot of memory state, and that state has to be created and initialized when you start the process. For that reason, creating a process takes a lot of time, in fact an arbitrarily large amount of time, because it can involve initializing a large block of memory, and also, typically, a mapping table between the process's view of memory and the physical memory. But it does provide very strong protection of the CPU and of other resources. We also saw last time that it's kind of expensive to do shared memory between processes: you have to somehow allocate blocks in different processes that map to the same physical block, and then you still have to do a context switch between them so that something written by one process can be read by another.

What's in the switching overhead? It's all the instructions that run between executing the code of one process and the code of the other: operating system calls, physical machine interrupts, often some very expensive instructions that save lots of registers somewhere. There's quite a lot of hardware support for context switching now. All of that happens in between when one thread or process stops and another one starts, and that's the overhead. And on a single-CPU architecture, remember, we're only running one process at a time.

Now in contrast, threads, because they don't involve switching memory or IO resources, have somewhat lower switching overhead. They also have lower creation overhead, because a thread runs in the existing memory environment of its containing process, so there's no memory to initialize. All that needs to happen is that the registers get initialized for that thread and one or two stacks get created; that's a very small amount of overhead. Threads do protect the CPU state: whenever you do a thread context switch, the old state gets saved and new state gets read into the CPU. But memory management is entirely up to the programmer; all the threads have access to the same memory. And the cost of sharing is low, meaning you can share data quickly between the threads, because the context switching is fast.
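To make that creation-cost difference concrete, here's a minimal sketch you could run on a POSIX system. It times fork() against pthread_create(); the exact numbers depend entirely on your machine and OS, and this is just an illustration, not a benchmark from the course. Note that each measurement also includes the cost of waiting for the child or joining the thread.

    #include <pthread.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <time.h>
    #include <unistd.h>

    #define N 1000  /* number of creations to average over */

    static void *noop(void *arg) { (void)arg; return NULL; }  /* thread does nothing */

    static double elapsed_us(struct timespec a, struct timespec b) {
        return (b.tv_sec - a.tv_sec) * 1e6 + (b.tv_nsec - a.tv_nsec) / 1e3;
    }

    int main(void) {
        struct timespec t0, t1;

        /* Time N process creations; each child exits immediately. */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++) {
            pid_t pid = fork();
            if (pid == 0) _exit(0);    /* child: leave right away */
            waitpid(pid, NULL, 0);     /* parent: reap it */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("fork:           %.1f us each\n", elapsed_us(t0, t1) / N);

        /* Time N thread creations inside the same process. */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++) {
            pthread_t t;
            pthread_create(&t, NULL, noop, NULL);
            pthread_join(t, NULL);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("pthread_create: %.1f us each\n", elapsed_us(t0, t1) / N);
        return 0;
    }

On a typical Linux box you should see the thread creation come out noticeably cheaper, since there's no new address space to set up.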
All right, and when you run threads on multi-core processors, you reduce the switching overhead even further, because there are actual register sets on each of the cores, which can be running distinct threads. So you avoid the overhead of saving and restoring registers as long as each core stays within one thread. With four threads and no context switches, there's no overhead at all. Typically you'll still want to switch some of those threads with other threads, and then the usual overhead comes back, but if you think about it, the switching overhead is only being paid about one quarter as often.

And lastly, hyperthreading is a kind of special support for threading where each core, which by definition contains one copy of most of the CPU resources, the arithmetic logic unit, memory management, and so on, nevertheless has multiple copies of the core CPU state, namely the registers. Because a hyperthreaded core has multiple register sets, there's no need to save the state and read it back in when switching between them. So the context switch is often like a single instruction, on the order of nanoseconds, which is much faster than conventional context switches. The downside is that when you switch to the other set of registers, you may not be able to do anything, because you may be waiting for other operations in the core to finish. So hyperthreading gives you somewhere between a factor of one and two improvement most of the time.

All right, last time we introduced the process control block. That's the current state of a process, including things like the state the process is in (whether it's ready, running, et cetera), a number to identify it, its program counter, the register state, memory limits and perhaps a memory map location, and then a list of the IO resources. Yeah, question? Yes, so there are two different structures. This one is for a classical process with one thread. There's typically a different structure for multi-threaded processes; I'm trying to remember the acronym, I think it's the PEB, the process environment block. It doesn't hold the thread state itself; instead you have separate thread control blocks for the threads.

OK, so last time we looked at this switch from process to process. These process control blocks are bits of memory that save the running state of a process; when you want to switch, you have to read that running state back into the CPU. So say we're executing a process P0. When there's an interrupt, or when it yields, the objects on the previous slide get saved into a memory location in the form of that PCB struct. When that's done, the OS can reload the state of some other ready process, put it back into the registers and so on of the CPU, and then transfer control to the PC address in that PCB, which causes the second process to start executing. And that iterates: it runs for a while until something makes it stop, then comes back and saves its state, and so on. So that's canonically what the context switch is. And as we mentioned, the part in the middle, that code is typically running in the operating system, so along with those horizontal arrows there's often a change in running mode, from user mode to kernel mode and back again.
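Just as a mental model, you can picture the process control block as a struct roughly like this. The field names here are invented for illustration; a real kernel's version (Linux's task_struct, for example) has hundreds of fields.

    /* Toy PCB; field names are made up for this sketch. */
    enum proc_state { READY, RUNNING, WAITING, TERMINATED };

    struct pcb {
        int             pid;                  /* number that identifies the process */
        enum proc_state state;                /* ready, running, et cetera */
        void           *pc;                   /* saved program counter */
        unsigned long   regs[32];             /* saved register state */
        void           *page_table;           /* memory map: virtual -> physical */
        void           *mem_base, *mem_limit; /* memory limits */
        struct file   **open_files;           /* handles on the IO resources */
        struct pcb     *next;                 /* link for whatever queue it's on */
    };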
That mode change means the code in the middle can typically access many more critical resources than the user code can, while the user code, when it runs, has access only to its own resources.

OK, so last time someone asked about the actual practical times for doing this, so I went and read some papers, and it's actually quite interesting. A context switch on current Linux takes around three to four microseconds on current Intel CPUs. But there are some surprising numbers around this. Right now there isn't much of a difference between a context switch for a single-threaded process and a switch between threads. The process switch is definitely doing more work, but for a variety of reasons, that extra work, saving the extra process state to do with the IO resources and so on into memory, is very fast. Anyone want to guess why that would be? Yeah, there is some specialized hardware for context switching, and that helps. Although, funnily enough, it seems like a lot of the common operating systems aren't using the hardware instructions for context switching, because they're a little too aggressive in saving register state; the operating system typically knows exactly what it needs to save, so Linux and Windows actually still do that in software. But there's another piece of hardware that helps with the saving and loading. Yeah, that's the key thing: although you've got to save this extra bit of state, it's not very big, on the order of tens of words. Even if there's a memory map, you don't save the whole map; that just stays in memory. The point is it's not much, and it fits easily in the innermost cache. When you have threads switching back and forth, even dozens of them, you're really not consuming much memory, so the state that's flipping back and forth sits in the inner cache and is accessible very, very fast. So caching has accelerated context switching to the point where it's hardly different between processes and threads, assuming, again, that the process is single-threaded.

But there are some big differences in performance depending on other factors. With context switching in a multi-core system, there's no particular reason a given process or thread will run on the same core the next time; it will normally run on whatever core is available, which will often be a different one. And that has a bad cache consequence. Why is that bad from a caching point of view? Yeah? Right, so the Intel machines have level one and level two caches in each core, and only level three is shared. So those really fast per-core caches will only have the state of a process if you ran it there before, and if you ran it a long time before, that state will be stale. So in fact there are big overheads for switching across cores. Another thing is a related memory penalty to do with the working set: the part of memory that the process has been using recently. This isn't something that's explicitly managed, but when a process runs, it has a working set, a set of memory locations that it's regularly hitting over an interval of time. When you stop the process, it's been ranging over this set of memory locations, and when you restart it, it's going to range over the same set. In the meantime, other processes may have been ranging somewhere else and pushed that working set out of the cache.
So when you restart your process and it wants to use that memory again, that memory has to get pulled back into the cache. If those working sets are large, even in the tens of kilobytes, suddenly the context switch becomes expensive, and this is the net overhead: it doesn't happen instantly, it happens after your process runs for a little while and starts moving over its working set again.

Now, a process also has all the stuff associated with an entirely different memory view, a virtual memory copy. But the thing is, the state describing that different memory copy is itself just sitting in memory, so you're not moving it around. Basically, different processes are just dealing with different memory. I really wish I could show that picture from last time, but if you recall, I showed physical memory divided into blocks, with some of the blocks associated with process one and some with process two. The processes are simply using different memory, and there's a small amount of state needed to tell a process to use, say, the blank locations versus the white locations. Threads don't even have that: all the threads running in the right-hand process have exactly the same view of memory. So there is additional state associated with a process, but it turns out that moving it around and changing it is very fast, yeah?

Well, you can think of a process context as being a thread context with a little bit extra, right? The extra is roughly the memory map and the IO map that need changing. So yes, that's a valid way to look at it. But the idea of a process and a thread is still distinct, because one has this extra stuff that makes the virtual machine appear the way it does. Correct, yeah: if you have a big process with lots of threads, switching between those threads is going to take about the same amount of time as switching between two very simple processes that each have a single thread. And both kinds are pretty common.

All right, so the working set, I just explained that: it's the set of memory locations used over a small interval of time. The lesson from all this is that context switching cost these days seems to depend mostly on memory access patterns, caching, and working sets, and a whole lot less on the intrinsic differences between processes and threads, yeah? No, that's the difference; it's slightly faster, by about 100 nanoseconds. Sorry if that wasn't clear: the basic context switching time is three to four microseconds, so the difference is about 3%.

All right, but there's a big caveat, and we got into this already: many processes are multi-threaded, and we have to be careful what we mean by a context switch for one of those. That's more typically called suspending the process: if you actually take all the threads out of the runnable state, you're really doing 30 or 100 thread context switches, which is often orders of magnitude slower than switching a single thread. So here's a snapshot of my Windows box with the top processes and their thread counts; you can see that a lot of the processes really are running lots of threads.
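If you're curious how numbers like three to four microseconds get measured, one common trick (not necessarily what the papers I read used) is to bounce a byte between two processes over a pair of pipes; each round trip forces two context switches. A rough sketch, with error handling omitted:

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define ROUNDS 100000

    int main(void) {
        int ab[2], ba[2];               /* pipe a->b and pipe b->a */
        char byte = 'x';
        pipe(ab);
        pipe(ba);

        if (fork() == 0) {              /* child: echo everything back */
            for (int i = 0; i < ROUNDS; i++) {
                read(ab[0], &byte, 1);
                write(ba[1], &byte, 1);
            }
            _exit(0);
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ROUNDS; i++) {   /* parent: ping, wait for pong */
            write(ab[1], &byte, 1);
            read(ba[0], &byte, 1);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e3;
        /* Each round trip is roughly two switches, plus pipe overhead. */
        printf("~%.2f us per context switch\n", us / ROUNDS / 2);
        return 0;
    }

One caveat, which connects to the multi-core point above: on a multi-core machine the two processes may land on different cores, so for a clean number you'd want to pin both to one core (with taskset or sched_setaffinity) to force actual switches.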
Coming back to that snapshot, what it doesn't show is that if you scroll down, there are a lot of single-threaded processes too. Those context switch really fast. The big multi-threaded ones are rarely switched as whole processes; the process itself isn't really being switched in and out much, it's the threads inside that are being managed. The operation at the process level is called suspending, and that doesn't happen nearly as often; usually there's a special reason for doing it. Is that clear? Yeah. No, pretty much all of their process context and maps would be sitting in memory, and their threads would be in a runnable or ready state in memory. So really all of this stuff is sitting in memory most of the time; only a small number of these are actually active, though. And again, anything listed as running is normally sitting in memory, taking up some memory. So processes that are active are consuming memory even if their threads aren't running; a process not being suspended doesn't imply it's actually running, its threads are what run, but while it's active, yes, it's consuming memory. And a lot of the memory pressure these days comes from the huge number of processes people have, very few of which are doing real work most of the time. Yeah? Well, on a multi-core processor, K of them will be running if you have K cores, or 2K with hyperthreading. And no, you can mix: because each of these is in effect running in its own virtual machine, your OS can be running a thread from one process and a thread from another process at the same time. You can't run all of these at once anyway, so you've got to pick and choose.

Okay, so we did a classification last time of operating systems based on the number of address spaces, which is basically the number of processes, and then whether or not they support multiple threads per address space, in other words multiple threads per process. Early operating systems had only a single thread of control, really, so not even real processes. A lot of embedded systems emphasize threads over processes: they run essentially one process with many threads in order to minimize memory consumption. As you can see, the Windows box I showed you is consuming tons of memory for processes that all have disjoint memory blocks; when you only have a little memory, that's not a good idea, so embedded systems tend to have one address space with lots of threads in it. Older Unixes had single-threaded processes. But by now, almost every operating system has processes that support many threads.

All right, so let's look at thread state. Threads running in a process share all of the IO and memory; in particular, global variables, constants, and the heap are shared across all of the threads, and the IO state is as well. It's designed that way. On the other hand, we already talked about the thread control block and the registers, which get stored in the thread control block when you save the thread. But what about the stack? You know what a stack is, but where does it live? Go ahead. Yes, it does. No, if a process has multiple threads, it has multiple stacks; that's on the next slide. And there's no simple answer to the question of where to put them.
You just have to figure out some scheme for doing that. So, a stack contains the local variables, the arguments to functions, and also program counters, and typically frame pointers too. It's essentially the complete state of each function call. Let's quickly review that. We have some code up here with a few functions calling each other, and we'll work through the instructions in that code while watching the stack on the right-hand side. Initially, when we call function A, it pushes the argument into the parameter temp, and typically there would also be stack memory allocated for a return value; you haven't generated the return value yet, but you need to reserve that space so it can be written later. Then we make another call, so the stack grows. Oh, and there's a return address too, which is a program counter value. Actually, excuse me, I mixed things up: the ret slot is just a return address, since these functions aren't returning anything; if they were returning something, you'd need the extra location as well. Then we call C, and because these functions take no arguments and have no local variables, all we're putting on the stack is the return program counter addresses. Actually, wait, A did have an argument: when we push the frame for that call, A gets the argument temp. This is a recursive function, so you can see we're back to calling A again in a nested call. This time temp doesn't meet the test, so there's no further recursive call; we just print something and start returning. As we return from A, we pop that frame of A off the stack; as we return from C, that frame comes off; return from B, that one disappears; and then the original A disappears. Hopefully that's all familiar.

So imagine we wrote a program that did a thorough calculation of the digits of pi and saved all the digits in the file pi.x. What's the problem with this program as written? What's that? Yes. So what's the downside of that? Right: the first computation won't finish, and if we then wanted to print the class list, we'd never get to it. The behavior you described is correct: it will just keep running. In other words, we have a massive, in fact unbounded, task running before a simple task. We would rather run those two calculations concurrently. That's shown schematically here: we'd like to create a thread to run the first calculation. When you create, or fork, a thread, the call that forks it returns immediately, and then you execute the second piece of code while the first thread is still executing. We'll get into thread creation more next time, but it's going to take the current process, add a new execution context to it, registers and a stack, and run that. So that starts an independent thread, and you can figure out what happens here, right? Both calculations run, and our second calculation completes quickly, long before the first one does. Whether or not we have multiple physical CPUs, it behaves as though we had two different CPUs executing the code at the same time, okay?

All right, so here we're getting into the details of these stacks. Here's the memory space for the process.
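Backing up to the compute-pi example for a moment, here's roughly what that fork-a-thread version looks like with POSIX threads. This is a sketch, not the slide's exact code: compute_pi and print_class_list are stand-ins for whatever the slide called them, and the digit computation is stubbed out.

    #include <pthread.h>
    #include <stdio.h>

    /* Unbounded task: keep appending digits of pi to pi.x (stubbed here). */
    static void *compute_pi(void *arg) {
        const char *fname = arg;   /* "pi.x" from the slide */
        for (;;) {
            /* ... compute the next digit and append it to fname ... */
            (void)fname;
        }
        return NULL;               /* never reached */
    }

    static void print_class_list(void) {
        printf("class list...\n"); /* the quick second task */
    }

    int main(void) {
        pthread_t t;
        /* Fork a thread for the unbounded job; this call returns immediately... */
        pthread_create(&t, NULL, compute_pi, (void *)"pi.x");
        /* ...so the quick job runs concurrently and finishes long before pi does. */
        print_class_list();
        pthread_join(t, NULL);     /* never returns, since compute_pi loops forever */
        return 0;
    }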
So back to that memory picture: we have two register sets, in the two virtual CPUs; we don't see them here, but what's in memory is everything we said is shared between the two threads, so all of this is common. But the stacks are not shared. In fact, we have to somehow pick two blocks of memory to allocate to those two stacks. That's a complication of multi-threaded programming, and it's the most significant one. The two stacks can't both just grow in the opposite direction from the heap the way the single stack would in a single-threaded process. With a single-threaded process, you don't worry about how to divide memory between the heap and the stack, because they grow toward each other and you basically use everything until you run out. Here, though, you've got to decide where to put the second stack, and that makes you more vulnerable to running out of memory sooner. And in general, for complex processes, Firefox and so on, you might have about 50 stacks to situate in your memory space. So that's an open question; there's no simple answer, but you have to strike some balance on the sizes of those stacks. In general, it's good to avoid heavily recursive functions, because those chew up a lot of stack space, and it's also good to avoid very large local variable arrays, because those use a massive amount of stack too.

All right, that's a good question. Should we use virtual memory protection for this? Well, it doesn't quite solve the problem; it's a protection step more than a resource step, in the sense that you'd have to make a commitment for each virtual memory view. There are a few complexities to doing that. You could make a commitment that only a certain thread can see and write one of these stack areas, but that implies that thread has a different memory map from the other threads, and it also implies you have to commit in your memory map to which blocks are the stack blocks. So it helps in terms of protection, in that you can prevent thread one's stack from crashing into thread two's; though that would probably throw an exception anyway, even with the simpler scheme where they're in one address space. The big penalty is that you've turned your threads into something almost like processes, in that they have to carry this additional state recording which stacks belong to which thread. So there's a complexity penalty to doing that, yeah.

I didn't quite get all of that question. Well, no, that's by definition not happening: each thread has the same address space. If a process has multiple threads, they all see the same address space. But you still need to give them distinct stacks somehow; they can see each other's stacks, but each one only uses its own region for its own stack. That's a necessity. And you'd probably want a protection layer for each thread to prevent it from writing outside its own area, or into the heap. But anyway, it is more complicated than what we've talked about. The good side is that because the stack state is just sitting in memory, you don't have to do much of anything with it when you context switch; it just sits there. Thread number two doesn't do anything with it. Well, that's not quite true.
It has to save its stack pointer in its thread control block; that's all it has to do.

Okay, so here's the thread control block. It only needs to contain the CPU registers, the program counter, and the stack pointer for that thread. There's other stuff to do with its thread priority, its share of CPU time, and so on: information the operating system uses to decide how to schedule it. There's also a little bit of extra stuff, and we'll see why we need it very shortly: when the thread's not actually running, it has to go into a ready queue somewhere, and this stuff records which queue it's in and where, typically as a linked list of threads. There's also a pointer to the enclosing process control block, or PEB, among other things. The TCBs live in a protected area of memory that only the kernel can access, to make sure they don't get damaged; if they were corrupted, the whole running system would be severely corrupted.

All right. So last time we looked at the running lifetime of threads. Quickly: they start off by being created, and before they can actually run, they go into a queue somewhere; that's when they're in the ready state. They can then actually run until some interrupt, or some kind of yield or halt, happens. If they're waiting for something, they go into a waiting state; if they were simply interrupted by the OS, they go back into the ready state. And finally they'll be terminated at some point.

So the mechanism for implementing this is the ready queue. If you look at that picture I showed you earlier from my task manager and actually count all the threads, there are hundreds, probably thousands of them. Clearly most of them are not running at any given time; maybe eight of them are running, and the others are sitting in the ready state. So we have these hundreds or thousands of threads to manage, and we use ready queues to do it. We want to manage them in a way that lets them swap back in basically instantly, independent of how many there are.

By the way, the idea of a context switch for a single-threaded process or a thread doesn't really apply to a multi-threaded process; again, that's the suspending operation, which is quite different. Suspending a multi-threaded process typically means unloading all of the thread information from memory. That's a much more massive step, but it also doesn't happen much. Most of the time the whole process stays resident in memory and its threads are just moved between ready queues. It's really the threads that get switched, not the process, for a multi-threaded process, and the consequence is that the multi-threaded process just sits there residing in memory.

All right, so most threads are in the ready state, but there are a lot of them, and we want to be able to swap them back in, in basically constant time. So we use these scheduling queues. You can think of the threads as a bunch of workers: they're in a ready state, but they're also arranged in a queue so they can conveniently spring into action when there's a task. And there are typically several ready queues, in fact, organized very often by the particular resource that's causing a wait or whose availability releases a wait.
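Putting the last two slides together, a toy TCB with its queue link might look like the sketch below, along with the constant-time enqueue and dequeue that the ready queues need. As before, the names are invented for illustration, and struct pcb is the toy one from earlier.

    #include <stddef.h>

    /* Toy thread control block; field names invented for this sketch. */
    struct tcb {
        void          *pc;          /* saved program counter */
        void          *sp;          /* saved stack pointer */
        unsigned long  regs[32];    /* saved CPU registers */
        int            priority;    /* scheduling info the OS uses */
        struct pcb    *process;     /* enclosing process's control block */
        struct tcb    *next;        /* link to the next thread in the queue */
    };

    /* A ready queue: pull from the head, add at the tail, both O(1). */
    struct queue {
        struct tcb *head, *tail;
    };

    void enqueue(struct queue *q, struct tcb *t) {
        t->next = NULL;
        if (q->tail) q->tail->next = t;   /* append after the current tail */
        else         q->head = t;         /* queue was empty */
        q->tail = t;
    }

    struct tcb *dequeue(struct queue *q) {
        struct tcb *t = q->head;
        if (t) {
            q->head = t->next;
            if (!q->head) q->tail = NULL; /* queue became empty */
        }
        return t;                         /* NULL if nothing was waiting */
    }

The tail pointer is exactly what makes the fairness scheme cheap: adding at the back and removing at the front never requires walking the list.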
So there could be a basic ready queue of threads that are just ready, as far as the operating system is concerned: not waiting on anything, just runnable, looking for a chunk of CPU to run on. That's simply a linked list, and you typically take the first element of the list, swap it in, take the currently running thread, swap it out, and put it at the end of the list. You use a linked list with a tail pointer so you can pull from the front and add to the back; I shouldn't say linked list, it's a queue. Similarly, some resources may have no threads waiting on them, in which case that queue is empty. The disk very often has stuff waiting on it, and again we have a queue of processes or threads: we pull from the head and push onto the tail when we swap something out. And similarly for the ethernet. Is that making sense? The reason it's done this way is that if you had a single queue and, for instance, one of these events freed up a task, you'd have to search through the single list to find the next task waiting on that resource. This way, you never have to traverse a list to find the next runnable thing. Yeah, I'm sure those kinds of things can happen. Right, if you're running; yeah, if the operating system determines that you're currently running. Well, I should probably say that once control goes into the operating system, the running thread is no longer really associated with any of these queues. Basically, something stops the thread that the OS is currently running, and when it stops, the OS knows which queue to put it in. So this is just for the purpose of maintaining queues such that the appropriate event will trigger the appropriate next thread. And the idea of a queue is fairness: I'd like to give every one of these things a turn, which means I want to pull from one end and add to the other end, so that if I keep doing that, everything gets about the same amount of running time.

Okay, just to remind you once again, project signup is due by tomorrow, and make sure all of your class account names are in the application form. Please select three section times; here are our section times again. You should attend a section this week, and there are only two left now, so make sure you attend one of the ones tomorrow if you haven't already been to one.

Was there a question? Yeah. No, the queues are just a utility that the scheduler uses; they're not directly processing or dealing with interrupts or states. The operating system will have a trap sent to it, either from a disk event or from a running program that says yield because it's waiting for something. Then the operating system can say: all right, here's the thread, I'm going to suspend it, but where do I put it? Well, it's waiting on the ethernet, so I'll stick it at the end of this queue. That's one kind of trap, a yield trap. The other kind of event it'll get is that the ethernet has just cleared, that the buffer is empty.
When it gets that kind of trap, it says: oh look, I can run another thing. Similarly with the disk: it'll get a trap that says the buffer is free now. When it gets that kind of trap, and if in addition it has a slot to schedule something, it can take an appropriate task from the queue associated with that kind of resource-freeing event and schedule it. And if there isn't a free slot, it may decide to preempt, or wait a little while and then preempt, a running process so that the newly available resource can be used. Yeah. No, the operating system doesn't know anything about what's going to happen; it knows nothing in advance. But whenever something does happen, it's inevitably made known to the operating system through a system call or an interrupt, and those are always specific to a device. So it's always processing a set of events, which may or may not be associated with threads. Some events are not associated with any thread; they're just "this device's buffer is empty," and then the OS can typically use these queues to find threads that would care about that. Sometimes, though, the events come from system calls made by particular threads, and in that case, the type of the event determines where to put that thread if it's suspended. So there are really a few different cases, but there should always be plenty of information for the operating system to decide what to do.

All right, we're at the break time, so let's take a five-minute break and finish up after that.

Okay, let's continue. Before we get going again, I just want to point out that I misread this date; I'm sorry, this is not Thursday. From this distance, maybe the font's not big enough. Anyway, the sections are all Tuesdays and Wednesdays, so you need to have already attended one this week. Sorry about that.

All right, so let's now make the operating system's scheduling more concrete. We'll give a very simple picture this time, because later on we're going to explain the different strategies operating systems use to provide various guarantees; for now, let's just look at the basic function. The operating system's basic loop is: run a thread (say you start the very first thread); then choose the next thread to run; then save the state of the current thread and load the new state into the CPU. That last part is one traversal of the PCB-saving diagram we had earlier, going from one side to the other. And it's an ongoing loop. That's arguably the main function the OS performs: it switches from one thread to another, or from one single-threaded process to another.

So if you want to start running a thread, we covered this a little: assuming it's got a control block, you load that into the CPU. Where does the control block come from the very first time? Well, a lot of the information is often encoded in the executable binary. Binaries typically contain program and data, but they also contain initial values for registers. So there's a precursor of the process control block and thread control block encoded right in the executable, and you use that information to initialize the registers. If it's a process, you'll also have to initialize all of the other virtual resources, the memory and the IO devices.
And then from the initial data: the executable, among other things, will always contain a start location, a start address, which you load into the program counter and then run. That's the entry point of the program.

So by now you have lots of ideas about this: how does the dispatcher get control back? Quickly, what are some ways? How does the OS get control back, yeah? Right, and specifically, what is the IO doing there? That could happen a couple of different ways. What does "IO" mean here; what physically happens in the machine? For control to get to the OS, there has to be either a trap, an exception, or a physical interrupt; some wire has to change state. Or a yield, exactly, right. The yield is a software instruction in the program that typically causes a system call as well. We'll get more into that, but it does a special kind of function call, which also has the effect of doing things like priority switching.

Okay, so we have some of these events: the yield, and other kinds of interrupt events that can preempt threads. Internal events: waiting for some IO, and also the IO becoming free triggers an event. Signalling is a low-level mechanism for communication between threads. And the yield, which is an important one, is a friendly way of giving resources back to the operating system when your program realizes it's in a waiting state. And here's a nice version of compute pi that computes a digit and then returns control to the OS. Early operating systems, including Windows 3.1 and even DOS, had a facility for something like multitasking by having lots of yield calls in things like spreadsheet code: since there were no real threads, the application code always had to call yield so that the operating system could get a chance to run.

Okay, so in our simplified view of the world of processes and threads, we said there was one stack per thread. In practice there are usually at least two, at least in Linux. The reason is that we want very different protection on user code and user state than on kernel state. So user code runs with a separate stack from kernel code. When a trap is issued, say from a call to yield, the trap triggers an interrupt, and the interrupt handler actually switches the stack pointer as well and moves to a kernel stack. That way you can protect the critical resources, namely the kernel's stack state, from bad user code.

All right, so now we're back in the kernel and we'd like to get to the other thread. We've worked through some of how this happens with the queues: depending on what event brought us in here, we can choose an appropriate thread, perhaps one that was waiting on a resource we've just learned is free. We pick that new thread and do the switch, which means pushing out the old process or thread control block, reading in the new one, and then flipping the program counter. And there's a bit of subtlety here that I don't really want to get into right now, but there's usually some cleanup that happens later on: even if the current thread is ready to be killed, we don't usually kill it right away. I'll get to that later.
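Continuing the toy tcb and queue structs from earlier, here's a C-flavored sketch of that switch step. save_registers and load_registers are hypothetical helpers standing in for the assembly that actually spills and reloads CPU state, and a real switch would also handle the kernel-stack handoff discussed above.

    /* Hypothetical helpers: the real versions are architecture-specific assembly. */
    void save_registers(struct tcb *t);   /* spill regs, PC, SP into t */
    void load_registers(struct tcb *t);   /* reload CPU state from t */

    struct tcb *current;                  /* thread now on the CPU */

    /* A yield-style dispatch: the old thread is still runnable,
       so it goes to the back of the ready queue. */
    void dispatch(struct queue *ready) {
        struct tcb *old = current;
        struct tcb *new = dequeue(ready); /* pick the next ready thread */
        if (!new || new == old) return;   /* nothing better to run */

        save_registers(old);              /* old thread's state into its TCB */
        enqueue(ready, old);              /* and to the back of the line */

        current = new;
        load_registers(new);              /* restore new's registers and SP... */
        /* ...and loading its saved PC is what actually resumes its code. */
    }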
Yeah, there's a special thread that does the cleanup, because if we do the context switch without the cleanup, we give better performance, lower latency, to the next thread.

Okay. So as part of switching, we save the state of the old thread, which goes into the TCB or PCB, and then we read in the new one. So again, here's the picture. The first thread goes along and calls yield; that causes a trap, which starts the kernel code, which starts with its own stack. As part of the switch we reload the CPU registers, and when we load the program counter, that's when the other thread's code actually starts executing. All right, and suppose that other piece of code calls A first, which calls B, which comes down and calls yield; that causes a trap back into kernel code, which does the preamble, sets things up to do the switch, and then goes back to the first user code. I know this isn't very clear on the slide: it's going back all the way to the top, to the user stack and the user code. Okay, is that clear? Yeah.

Oh, I see, what happens to the stack? Let me think. Where does that get cleaned up? All right, so we're not actually calling A again; we're entering back here. So the stack pointer is going to point at the top of A's frame, but the program counter is going to point back here again. Does that make sense? Once these stacks are set up, we're not making recursive calls; the program counter doesn't go back to the start of A after the first call. Maybe I should have made this clearer: I was assuming that the context switch here starts off at the beginning of the second thread, and by the time we come back to the first thread, we're coming back with the program counter right here. So it's not allocating anything more on the stack; we're just shuttling back and forth between those two states. The kernel stack, though, does have to get cleared each time. Part of the switch is to execute something like a pop or a return; it's like a call, but it's also like a return. It clears the kernel stack frame and then starts running the other piece of user code, so that the next time we come down to yield, the call creates a fresh stack frame for the kernel's yield handler on a clear stack. So the switch code itself is implemented as a special kind of return-slash-call that clears that red stack frame. I probably should have shown that in the animation. Okay, anyway, is that making sense?

All right, so let's quickly examine some of the hardware details behind interrupts. Interrupts are physical events, electrical signals in the hardware, triggered by things like data becoming available, data having been completely written, or mouse and keyboard events. Those trigger transitions in the hardware; the IO hardware on the chip recognizes them as interrupt events, and associated with those events are special addresses, interrupt vector addresses. These days the vectors often live in standard memory, and each vector is associated with a particular program counter value. In other words, the mechanism allows an interrupt to turn into something like a system call. An important part of keeping everything running smoothly, though, is the priority system, so that high-priority interrupts can preempt the handling of lower-priority ones, but not vice versa.
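Conceptually, the interrupt vector is just a table mapping interrupt numbers to handler entry points. Here's a sketch of the idea; real vector tables (the x86 IDT, the ARM vector table) have architecture-specific formats, and the interrupt numbers below are made up.

    /* Toy interrupt vector: interrupt number -> handler entry point. */
    #define NUM_IRQS 64

    typedef void (*handler_t)(void);

    handler_t interrupt_vector[NUM_IRQS];

    void timer_handler(void) { /* scheduling, periodic housekeeping, ... */ }
    void disk_handler(void)  { /* buffer done: wake a thread waiting on the disk */ }

    void vector_init(void) {
        interrupt_vector[0]  = timer_handler;  /* IRQ numbers invented for this sketch */
        interrupt_vector[14] = disk_handler;
    }

    /* Hardware effectively does this: take the highest-priority enabled
       interrupt and jump through its vector entry. */
    void interrupt_dispatch(int irq) {
        if (irq >= 0 && irq < NUM_IRQS && interrupt_vector[irq])
            interrupt_vector[irq]();
    }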
Because an interrupt triggers something like a system call, it may take some time to return from that call. So interrupts have priorities, and they also have masks: you have the ability to turn off interrupts while you're processing other interrupts. Sometimes you might have a low-priority interrupt handler with a critical section that can't be interrupted, and you implement that by masking out the other interrupts, which turns them all off. Normally, though, you'd like to leave the higher-priority interrupt lines enabled so that they can still interrupt you. All right, and the priority encoder picks the highest-priority enabled interrupt. Since interrupts usually arrive at different times, the hardware will execute an incoming interrupt of higher priority than the one currently active; on the other hand, if an interrupt comes in that's lower in priority than the currently running one, it won't execute until that one finishes. There's usually a global interrupt enable, a GIE flag, and there are also usually non-maskable interrupts, which make sure that, for instance, critical timers keep running and watchdog code keeps going, especially in real-time systems, so that some part of the kernel is always alive.

In a preemptive operating system there's one particular type of interrupt that's really important: timer-driven interrupts. They're guaranteed to happen at regular intervals, which guarantees that you periodically go into the kernel to at least do scheduling and keep your runnable processes running. The periodic housekeeping also includes things like cleaning up processes that are no longer running, and then going back to run runnable threads. It's called preemptive multithreading because the operating system, via the timer interrupt, is preempting and interrupting running code. Non-preemptive multitasking is the kind we talked about earlier, where, for instance, in MS-DOS, the spreadsheet programs periodically called yield and gave control back at appropriate times. Obviously, that assumes the code is extremely well behaved, which no one expects these days. Preemptive multitasking guarantees that even bad code gets interrupted, allowing you to run good code periodically, yeah.

No, compilers almost never insert yield statements. Among other things, it wouldn't work unless those yield statements executed close together in time, and it's undecidable how long it will take to get from one call to another. People do hand-place yields in real-time systems sometimes, though: putting yields in appropriate places makes it less likely that an interrupt lands in an inappropriate place, so it can sometimes improve quality, and it's often a good strategy for real-time work. Okay, was there another question? Yeah? Well, this kind of timer is probably firing every 10 or 50 milliseconds, something like that, and it just time-slices between the different runnable processes. Because it's preemptive, if you have a program that wants to run for 30 seconds, the timer just dices it up into little chunks.
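Just to connect this to something you can run: a user-level analogy to the kernel's timer interrupt is a POSIX interval timer. This is only an analogy, not how the kernel itself is implemented, and the 10 ms quantum below is an arbitrary choice.

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>

    static volatile sig_atomic_t ticks = 0;

    /* Stand-in for the kernel's timer interrupt handler: in a real OS,
       this is where the scheduler gets its chance to preempt running code. */
    static void on_tick(int sig) {
        (void)sig;
        ticks++;
    }

    int main(void) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_tick;
        sigaction(SIGALRM, &sa, NULL);

        /* Fire SIGALRM every 10 ms, like a 10 ms scheduling quantum. */
        struct itimerval quantum = {
            .it_interval = { .tv_sec = 0, .tv_usec = 10000 },
            .it_value    = { .tv_sec = 0, .tv_usec = 10000 },
        };
        setitimer(ITIMER_REAL, &quantum, NULL);

        /* "Bad" code that never yields still gets interrupted regularly. */
        while (ticks < 100)
            ;                              /* spin; the timer preempts us anyway */
        printf("interrupted %d times\n", (int)ticks);
        return 0;
    }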
Okay, so the idea of threads was to allow better cooperation within a process; that assumes the threads cooperate with each other and don't compete too much. It does allow you to share resources, and, for instance, it sometimes simplifies programming substantially by not requiring you to replicate data that's used to service different clients or different user events. So threads help you share resources without replicating them for each client being serviced. Sometimes they also speed things up. And they push some of the resource management into the application code, which can be good for performance.

All right, so imagine we wanted to build a threaded web server. It would have a loop like this, which basically accepts connections. In a TCP system, you normally create a listener first on the server, which can accept client connections. The accept call blocks until it gets a connection, and then returns a connection object. Once you receive that, you can start handling that connection with the user, and thread create, because it spawns a thread, returns immediately and lets you go back into the loop and listen for another connection. So the main thread here is always running and always listening; whenever it gets a client connection that needs significant service time, it spawns a separate thread to do that work.

An advantage of doing things this way, versus a process-per-connection version, is that here we don't know ahead of time how many handlers we might need, and we'd be creating processes each time. Even though processes and threads have similar switching times, creation is very different: remember that to create a process, you've got to create the whole virtual machine, initialize a big chunk of memory, and so on. That would be really expensive, and you can't easily anticipate how much you'll need; even if you could, you wouldn't know how many resources the particular processes would need. Very difficult. The threaded version doesn't have that problem, and it can adaptively decide how much memory it needs for each client as it goes along. So it's cheaper and it uses memory more efficiently.

All right, the last idea is thread pools, which are a way to create a better match between the number of threads you'd like and the number that can physically run on the machine: they separate the number of running, active threads from the number of tasks you'd like to run. Well, we're right at the time limit here; this is the last example, and we'll just have to look at it online, I guess. It's similar to what we looked at before: you receive a request, and then, to process the request for an update, each of these service requests might do IO, so to speed things up we add a thread instead of doing the operations sequentially. Oh, and we're getting into concurrency now. Threads create an additional problem, asynchrony, because they can execute in different orders. Sorry, I'll finish that next time.
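Since we ran out of time, here's a minimal sketch of that threaded server's accept-and-spawn loop with POSIX sockets and threads, for reference. handle_client is a placeholder for whatever per-connection work a real server does, and the port number is an arbitrary choice.

    #include <netinet/in.h>
    #include <pthread.h>
    #include <stdlib.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Placeholder per-connection work; a real server would parse a request
       and send a response here. */
    static void *handle_client(void *arg) {
        int conn = *(int *)arg;
        free(arg);
        /* ... read request, write response ... */
        close(conn);
        return NULL;
    }

    int main(void) {
        int listener = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);          /* port: arbitrary for this sketch */
        bind(listener, (struct sockaddr *)&addr, sizeof addr);
        listen(listener, 16);

        for (;;) {
            /* accept() blocks until a client connects... */
            int *conn = malloc(sizeof *conn);
            *conn = accept(listener, NULL, NULL);
            /* ...then we spawn a thread for it; pthread_create returns
               immediately, so the main thread loops right back to listen. */
            pthread_t t;
            pthread_create(&t, NULL, handle_client, conn);
            pthread_detach(t);                /* no need to join it later */
        }
    }

The connection descriptor is heap-allocated and freed by the handler so each thread gets its own copy; passing the address of a single stack variable would race with the next accept.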