Okay everybody, welcome back to 162. Today we're going to dive into some actual implementation details and start talking about how threads are implemented in the kernel, and some things you need to worry about with synchronization. So welcome back. If you remember, last time we were talking about high-level APIs; today we're going to be talking about synchronization. In particular, we're going to start by understanding how the operating system gives you concurrency through threads, with a brief discussion of process and thread states and scheduling, and some high-level discussion of how stacks contribute to concurrency. On Monday we'll talk more about what Pintos does to give you threads and dive even deeper, but today we're going to start diving into the kernel, then talk about why we need synchronization, and then explore locks and semaphores in a little more detail.

So if you recall from last time, we talked about inter-process communication, or IPC, and that was a mechanism to create a communication channel between distinct processes. The reason we wanted to do that was, well, we started with all of this work to make sure processes were isolated from each other, but then we need to figure out how to selectively punch holes in that protection so those processes can communicate when they want to. And we're going to need to start thinking about protocols, so maybe there's a serialization format, especially if you go across the network. One good thing about having separate processes rather than combining everything into one process is failure isolation — that can get interesting, and we'll talk more about it later in the term. And there are many uses and interaction patterns here: once you have processes possibly spread across the network and you combine them together, you can do all sorts of interesting things, and toward the end of the term we're even going to talk about peer-to-peer style communication and cloud communication as well.

The other thing we talked about is types of IPC. For instance, we talked about UNIX pipes, and the idea here is very simple: a UNIX pipe is a queue data structure inside the kernel; one process can write to the input end of the pipe, and the other process can read from it. Notice that we're using the read and write system calls — the low-level raw interfaces — just like we would if this were a file, except that it's an in-memory queue of limited size, and as a result it's more efficient since we're not trying to make things persistent. So the memory buffer is finite, which basically means that if the producer tries to write when the buffer is full, it blocks, which means it goes to sleep; and if the consumer tries to read when the buffer is empty, it blocks, which means it goes to sleep. Today we're going to start understanding what it means to get put to sleep and how that actually works, okay? We also talked briefly about the system call pipe, which takes a two-entry array of file descriptors and fills it with the read and write ends of the pipe, and we talked about how to use fork to set up communication between two processes — so you should take a look at the last part of last time's lecture.
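As a compressed refresher, a minimal sketch of that pipe-plus-fork pattern might look like this (error handling is abbreviated; this is illustrative, not the exact code from last lecture):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) < 0) { perror("pipe"); exit(1); }

    pid_t pid = fork();
    if (pid == 0) {                   /* child: the consumer */
        close(fds[1]);                /* close the unused write end */
        char buf[64];
        ssize_t n = read(fds[0], buf, sizeof(buf)); /* blocks if pipe is empty */
        if (n > 0) write(STDOUT_FILENO, buf, n);
        exit(0);
    }
    close(fds[0]);                    /* parent: the producer; close unused read end */
    const char *msg = "hello through the kernel queue\n";
    write(fds[1], msg, strlen(msg));  /* would block if the finite buffer were full */
    close(fds[1]);
    wait(NULL);
    return 0;
}
```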
The other thing is we talked about sockets, and the key idea here was that, well, pipes are communication on the same machine, but we could have communication across the world that also looks like file I/O. So we had this notion of a socket, which is a bidirectional communication channel. Now pipes, of course, were single-direction — kind of like half duplex, if you will — while a socket is an endpoint for bidirectional communication, so you have a socket on either side, and there's a process for connecting, which we talked about. Notice that the queues — the green things here inside the sockets — are not pipes; there was some discussion of that on Piazza. The queues are just places to temporarily hold information. So when you take two sockets and connect them together, you actually have a communication channel with two directions that are independent of each other, from process to process, and that could be on the same machine, in the local area network, or spanning the globe.

As part of that discussion, we talked briefly about how sockets get set up for TCP/IP. The two green sockets here are the final communication, but we talked about how you set up a server socket that is bound to a certain port; the first socket, on the client side, requests a connection; the server socket then produces a new socket just for that connection; and now this yellow channel between the two sockets is a unique channel, defined uniquely by the five numbers you see at the left: the source IP address, the destination IP address, the source port number, the destination port number, and the protocol — where we're talking TCP/IP in this particular instance. The client side of this connection often has a random port, and that's in fact why you can have multiple tabs in a browser all connected to the same website, acting independently. On the server side you have well-known ports, like 80 for the web, 443 for secure web, 25 for mail, etc., and the well-known ports are all from 0 to 1023, okay?

Then, last but not least, we went through several different versions of a web-server-like protocol. This was the very first one we looked at, and we talked about how you generate the socket — that's the first little red thing in this code; that creation takes a family of addresses, a protocol, and so on. Then you bind the address to that socket, and that's where what port it's interested in serving and what its local address is come into play. Then you listen, and that listen is exactly what you see here when you see the ear, right? That's listening for incoming connections. Then, in a loop, you accept the next connection, and what comes out of accept is a brand-new file descriptor, which is the server's side of that new connection — that's the green socket that comes out of accept — and then you can do anything you want with it. As you might remember, this particular instance had no parallelism, so it takes connections one at a time; if you want to see our discussion of several variants of that, take a look at the lecture from last time.

All right, that's all I wanted to do as a quick summary. Did we have any questions on that before I move on to some new material? Everybody good? Yeah, I see our numbers are a little lower today; hopefully others are just a little delayed — you guys are all the most gung-ho of the students here. Okay, yeah, homework one is due today; that might have something to do with it.
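To make that socket/bind/listen/accept sequence concrete, here's a minimal sketch of that serial server pattern (the port number and backlog are illustrative choices, the request handling is a placeholder, and error checks are omitted for brevity):

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int server = socket(AF_INET, SOCK_STREAM, 0);   /* address family + protocol */

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);       /* the local address to serve */
    addr.sin_port = htons(8080);                    /* the port we're interested in */
    bind(server, (struct sockaddr *)&addr, sizeof(addr));

    listen(server, 16);                             /* the "ear": await connections */
    for (;;) {
        int conn = accept(server, NULL, NULL);      /* brand-new fd per connection */
        /* ... serve this one connection, then: */
        close(conn);                                /* one at a time: no parallelism */
    }
}
```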
Okay, so today let's talk about implementation. So, multiplexing processes: we have a process control block, which we've been discussing kind of indirectly throughout the first couple of lectures, and it's basically a chunk of memory in the kernel that describes the process. It has things like: what's its state, what's its process ID, what are its current registers and so on (if it only has one thread in it), a list of open files — we've been talking a lot about file descriptors — and so on. These are the descriptors inside the kernel describing a process. The scheduler is going to maintain a data structure containing all of these process control blocks, and it's going to decide, for each process and each thread within each process, who gets the CPU. We're going to have a whole lecture and more on different schedulers — the question of who gets the next little slice of CPU is an extremely interesting policy decision, but that's for another day. The scheduler is also potentially going to be giving out non-CPU resources, like memory, I/O, etc. The program counter, of course, is pointing at where in the code that particular thread is currently running.

Okay, so what does it mean to switch from one process to another? What it really means is: here, process zero is running — and by the way, right now we're talking about a single-threaded process, so there's one thread, and that process's main thread is running — and at some point an interrupt happens. That saves all of the state, such as the registers and the program counter and the stack pointer and all that sort of stuff, into the process control block for zero; then it loads everything from the process control block for one, and then it returns to user level. What's in the middle here is kernel level, what's on the outside is user level, and what's in blue is actual code execution. So what's in the middle is all running at kernel level, at high privilege — this is privilege level zero for system, privilege level three for user. As you may be aware, for x86 there are actually four privilege levels, but you typically only use zero and three.

The other thing I wanted to show you that's interesting here: if we do this switching too rapidly, then what we're going to get is all overhead and no execution, and this part of the blue, where we're actually executing user instructions, will become a vanishing fraction of the total execution. That's a form of thrashing — if you go back and forth too fast, you end up making no actual forward progress. There's a question about what the other two privilege levels are: they're called rings, and in certain military specs you have different things that are somewhere between system level and user level; sometimes they're utilized for the hypervisor, where in some early versions of things level zero was actually the hypervisor and one was the kernel level, and so on. For now, though, just imagine there are two, because we've only been talking about kernel and user. Now, the question about more time being used for switching than executing: this diagram is clearly not to scale; it would be a very bad design if a vanishingly small fraction of the time were spent actually executing real stuff. What we want is to get the overhead to something under 10% of the time, hopefully better, so that we're using most of our cycles for something useful — even though we all know the operating system is the most interesting part of this, it's probably good to actually execute some of your real programs too.
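As a rough sketch, the process control block described above might collect fields like these (names and sizes are illustrative — this is not Linux's or Pintos's actual layout):

```c
struct regs { unsigned long pc, sp, gpr[31]; }; /* illustrative saved register set */
struct file;                                    /* kernel per-open-file state      */

enum proc_state { NEW, READY, RUNNING, WAITING, TERMINATED };

struct pcb {
    enum proc_state state;        /* where it is in the lifecycle described below */
    int pid;                      /* process ID                                   */
    struct regs saved_regs;       /* registers, PC, SP (if single-threaded)       */
    struct file *open_files[16];  /* what the file descriptors index into         */
    struct pcb *next;             /* link for whatever queue this PCB sits on     */
    /* ... address-space state, accounting, credentials, ...                      */
};
```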
So the other thing I will point out is that there are a bunch of transitions. If you notice, we go from executing process zero, we transition into the kernel — that's the little yellow dot — we exit from the kernel back to the process, and so on. These transitions, which are transitions in privilege level, represent potentially expensive saving and restoring of registers. In this case the entry into the kernel is coming from an interrupt, but it could be a yield system call; we'll talk about both of those as we go on.

Now, a process goes through a bunch of stages, as does a thread. For now I'm not even going to say which one this diagram represents, because it represents both — processes and threads both have their thread component — but let's just talk processes for a moment. The process starts in a new state, right after we execute fork and set the process up. Then we put it on some scheduling queue, which is the point at which we admit it to, say, the ready queue, and at that point it's ready. What that means is not that this process's thread is actually running, but rather that it's ready to run. If you think about it, if you only have one CPU or one core, there can only be one thing actually running at a time; everything else is ready. Then at some point the scheduler pulls it off the ready queue and it becomes running — and if there's only one core, there can only be one thing running at a time. Later, an interrupt might happen, which brings that original thread back onto the ready queue; some other thread will have been brought into the running state as a result, but we're only tracking one process's thread at this time. And this will go on for a while — la-dee-da, isn't this animation great — we go back and forth.

At some point the thread or process will try to do some I/O, or do something that's going to require a wait, like a disk access. How many instructions does it take to do a typical disk access? What was that order of magnitude everybody's supposed to remember? A million — yep. And a million cycles, at least, means that while we're in the waiting state here, on a queue waiting to get serviced with our I/O, there had better be something else running. So part of what we're doing is attempting to figure out how to put something that's executing to sleep long enough that we can run other things in its place, and overlap the I/O with the computation. There was a question here about SSDs: for SSDs it's smaller, okay — it's not a million, but it's probably ten thousand or a hundred thousand — and it's still going to be big enough that you're going to want to be put on a wait queue. If we have more than one core — that's a good question — there can be more than one thing running, and the scheduler now has multiple run queues as well as multiple ready queues to worry about. But you'll never have a single thread run on more than one processor at a time, because a thread only has one stack; if you tried to run it on multiple processors at once, you'd get chaos. Ultimately the I/O completes, we get back to the ready state, and we continue our running. Then finally we execute exit, if you remember, and that puts us in a terminated state, which is the point at which the process is no longer available to run under any circumstances — it's terminated.
Now, can anybody think why we might not just free the process right away — why we might keep it laying around in a terminated state? Can anybody guess? Yeah — great: because the parent needs to get the result. And when it's in this state, terminated but not deallocated yet, that's typically called a zombie state; that's a zombie process.

All right. Now, if you look inside the kernel at the queues, we have the ready queue — and the CPU itself is effectively the run queue — but there are many other queues. Typically what happens is the process control blocks work their way from the ready queue to the CPU, and potentially back again: if the time slice expires, meaning the amount of time it's supposed to run expires, it gets put back on the ready queue; if it does an I/O request, it's put on an I/O queue until the I/O is done; etc. Scheduling is the act of deciding which thing off the ready queue gets the CPU next. There's also a type of scheduling in something like a disk device driver, which we'll talk about later in the term, that decides which request gets to go next — that's usually there to optimize things like the disk head not moving as much. So there are lots of different types of scheduling; for now, the type we're talking about is really, when you have a bunch of things on the ready queue, which one gets the CPU next. The ready queue and all the I/O device queues really represent non-running processes.

And there's a good question here: when you have a fork operation, what happens? Well, you only have one CPU, therefore the child process needs to be put on a queue somewhere, which gets it into the ready queue, because you can only have one thing running at a time. This diagram is a little bit confusing, but really what you want to think about is: when you go through fork, potentially the child gets the CPU and the parent goes on the ready queue, or vice versa, depending on the policy. And you should never assume one or the other gets to run first, because they're completely independent once fork executes. So you can imagine a bunch of these queues, and they all represent temporarily suspended threads; these queues hold PCBs, and the queues, you know, are linked lists of things — and just because I'm an electrical engineer, I'm using a little ground symbol here for null, but you guys can give me a little bit of slack there. We have lots of different queues in the system, all with different suspended processes in them. The scheduler is potentially only interacting with the ready queue; the rest of the queues are actually interacted with through the device drivers, and we'll get into that in a couple of lectures. The device driver for the disk, for instance, when a request comes back, will potentially remove a process control block from its wait queue and put it back on the ready queue, so it's runnable again. And the scheduler is this simple loop — there are many options here.
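As a sketch, that simple loop and the driver-side requeue might look roughly like this, reusing the struct pcb sketched earlier (the queue helpers, policy_pick, and run are all hypothetical names, standing in for whatever a real kernel provides):

```c
struct queue;                                    /* opaque FIFO of PCBs           */
extern struct queue *ready_queue;
extern struct queue *disk_wait_queue;            /* one such queue per device     */

int  queue_empty(struct queue *q);
void queue_push(struct queue *q, struct pcb *p);
void queue_remove(struct queue *q, struct pcb *p);
struct pcb *policy_pick(struct queue *q);        /* THE scheduling policy choice  */
void run(struct pcb *p);                         /* load state; runs until it
                                                    yields, blocks, or is preempted */

void scheduler(void) {
    for (;;) {
        if (!queue_empty(ready_queue)) {
            struct pcb *p = policy_pick(ready_queue);
            p->state = RUNNING;
            run(p);
        }
    }
}

/* Called from the disk's device driver when a request completes: */
void disk_io_done(struct pcb *p) {
    queue_remove(disk_wait_queue, p);  /* no longer waiting on the device */
    p->state = READY;
    queue_push(ready_queue, p);        /* runnable again                  */
}
```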
We have at least a lecture and a half on scheduling because, surprisingly enough, that simple question of who on the ready queue gets to go next vastly impacts whether the thing is responsive, if you've got a person typing; or efficient, if you've got a long-running task; or fair, if you've got multiple things running at the same time; or, if it's a real-time scheduler — because you've got a car, and between pushing the brake and the brake engaging, you know, there's a question of whether it's timely. These are all scheduling policy questions, which are going to be quite interesting for us, but that's for another day. The loop I wanted to show you here, which I've shown you before, is this mechanism where, if there are any ready processes, it picks one, and it picks one according to a policy. So again, each device has a queue of processes that are waiting on it, and when a request comes back from the disk, it finds the process that's been waiting there and reactivates it. Actually, in something like Linux, where the threads are kernel threads — and you'll see a little of that distinction in a bit — the queues are actually holding thread control blocks, not necessarily processes. That granularity, of threads being able to be put to sleep on these queues, is really what happens in systems where there's a one-to-one mapping between a user thread and a kernel stack or thread. All right — many different scheduling policies.

So let's dive in a little further. When we were talking about processes originally, we mentioned the fact that there can be many threads inside a process, and each of these threads has a stack and some registers. For now we're going to talk about threads and how they're implemented, and when it matters whether I need to talk about processes, I'll bring that back. But just to remind you guys: what's a process? A process is a protected environment — the memory space and file descriptors and all that stuff we've been talking about — plus one or more threads, and each of those threads has a thread control block, with registers and a stack. So when we need to talk about switching the protection environment from process one to process two, I'll make sure you know that's what I'm talking about; but for now we're going to dive into just the concurrency portion, the threads. Threads encapsulate concurrency — they're the active component; the address spaces, etc., are the passive part, and that's the shell of the process. Why have multiple threads per address space? For sharing. Now, if you remember, this is the shared state: within a single process, all of the threads share the heap, they share global variables, they share code — and as we'll mention, some of the important global variables those threads are going to share are locks, etc.; we'll get to that in the second half of the lecture. Then each thread has a thread control block, which has information about where its stack is, what its registers are, metadata of various sorts — plus a stack in memory. That's per thread, so every one of the threads has that information, and if you have too many threads you can run out of space in your process.

Now, there was a question here: what about virtual address translation? The reason we haven't talked about it is that I wanted you to get a general idea first; it's going to require a few lectures to really get into, and that's why I'm trying not to muddy the waters too much.
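Sketching that per-thread state (field names are illustrative — Pintos's actual thread struct differs):

```c
#include <stddef.h>

struct tcb {
    unsigned long pc, sp;      /* saved program counter and stack pointer    */
    unsigned long regs[31];    /* the rest of the saved registers            */
    char  *stack_base;         /* where this thread's stack lives in memory  */
    size_t stack_size;
    int    tid;                /* thread id, plus other metadata             */
    struct tcb *next;          /* link for ready/wait queues                 */
};
/* Note what is NOT here: the heap, globals, code, and open files live in
   the enclosing process and are shared by all of its threads. */
```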
We're working on the concurrency part today; don't worry, we'll get there — you guys are going to be really deep operating system designers by the end of this class. So the core of concurrency, as we've kind of mentioned, is this scheduler loop, or I'm going to call it the dispatch loop here. Conceptually, the operating system itself is an infinite loop: we run a thread, we choose the next thread, we save the current thread's state, we load the new thread's state, and we just keep looping forever. And I suppose, from a certain point of view, this is all the operating system does: it just keeps looping, letting threads run until they either yield the processor or are interrupted, and then we pick another one and go. Pretty simplistic — so now we're done, we'll have our final next week and we'll be good, right? There's the whole operating system. But perhaps we'll do a few more details, just because we can.

One question might be: should we ever exit this loop? What are some good reasons to exit this loop — anybody? Okay, well, interrupts don't necessarily — interrupts might be kind of like a bubble, but they don't break the loop, because the interrupt happens and then it comes back. Yeah: shutting down the machine. PG&E, yes — power outages; hopefully we're not going to get too many of those this season, but I'm thinking we might have power shutdowns. So basically, when the machine exits, or it panics, or any other sort of crash happens, you exit the loop, but by and large we just keep it going.

Okay, so what we're going to do is briefly talk administration, and then we're going to look more at how this all works. Homework one is due today, as many of you are aware. I appreciate very much that you guys are here for class — thank you, it's great to actually have people to ask questions. Project one is in full swing, and I saw an interesting query on Piazza that was kind of like: well, how can I do my design document if it wants code but I don't know how to do the project yet? It seems almost like some catch-22. And the answer is that what we're looking for in your design document is a sign that you have read through enough of the code to have an idea of roughly what you're going to need to do. You're not going to have it all done — that's what the code deadlines are for — but try to give us some intuitions. It could be pseudocode; you could pick out a couple of function calls you know are going to be important; you could say, "here's a data structure, we're going to add these fields to it" — whatever. Those are not the same as "we wrote a bunch of code and it works." What we're looking for in your design document is a high-level idea of what you're planning to do and why, supplemented with some code or pseudocode, if you like, that tells us some details of where you're going and helps your TA understand what you're thinking. So that's the paradox: you don't need fully working code to write a design document — that would be pretty strange, right? (The #ifdef USERPROG question: that basically controls whether user programs are supported or not; you can have a kernel-only version.) You should be attending your permanent discussion section — remember to turn your camera on in Zoom — and discussion sections are mandatory, so we're taking attendance.
The question is: will the design document be graded? The answer is yes. You're trying to give us an understanding of your thinking in the design document, and we will be grading the ideas that are there. Then, with your TA in your design review session, you'll talk it over, and they may give you a few suggestions of other things to think about — so the design will possibly evolve over the course of the project; that's certainly expected. The problem with example design docs, of course, is that they sort of have answers in them; I'll see if I can find one for you. But just think of it as giving a high-level viewpoint to your manager — who is your TA — and trying to convince them that you've thought through enough of what you need to do that you're on a good path. All right, the other thing, of course, is that midterm one is coming up — two weeks from tomorrow, not a week from tomorrow — and it's going to be video proctored. I understand there was a little concern about how 61C's video proctoring went; believe me, we're well aware of everything that's been going on in the department, so we will try to avoid the mistakes of the past, or at least learn from them. Let's see, I think that's all the administrative stuff. Yeah — I'm not entirely sure what happened there; I think they were requiring people to record things locally, and there were some issues with that under some circumstances. That's not our current plan; we'll get the details out to you. All right, good — any questions on administration? Yeah? All right.

So let's talk about running a thread. What do we get when we run a thread — how do you run a thread? Well, you load its state into the actual CPU: registers, program counter, stack pointer. If you're changing the process, you need to load its environment, so that means getting the page table set up — that's that mysterious virtual memory we haven't talked a lot about yet — getting anything else loaded up, and then you just jump to the PC and start running. One thing that's going to be interesting for you guys is that both the OS, which is managing threads, and the threads themselves run on the same CPU: when the OS is running, the thread isn't, and when the thread's running, the OS isn't, and we need to make sure we can transition properly between those. So this idea that the OS loads up a bunch of stuff and then jumps to the PC means, essentially, that the OS gives up control of the CPU. And we're going to have to deal with that, right? If you give up control to a user program that then proceeds to go into an infinite loop, clearly we're going to need to get the CPU back somehow. So that's a question: how do you get it back?

Now, I've been playing with computers long enough that I got to play with some of the early versions of Microsoft Windows, like 3.1, some of the early Macintoshes, and other PC environments. In those PC environments, the multiple things that were running were fully cooperative. So suppose you had three applications running on your screen, with three windows, and one of the applications crashed. What would happen is the system would freeze: nothing would move, and you would have no control of the windows in the other applications either. The reason is that the one application that crashed — and maybe went into an infinite loop — kept control of the processor. Fortunately, modern operating systems are not like that, because we have memory protection, which is an important thing, but we also have things like preemption, through interrupts, which is going to be an important thing to talk about here.
But even back in the day, you could have the illusion that multiple things were working: you could have many windows all drawing stuff simultaneously, representing different applications, and the way that worked is that each of those threads would run for a while and then voluntarily give up the CPU by calling a yield function back into the kernel. Assuming all of the applications cooperated, this worked fine; it was when they didn't cooperate — or, forget not cooperating, when they had a bug — that there was a problem. The Mac was also this way — back in the dark ages, in the early times, in the original Macintoshes.

So let's talk about internal events first. Internal events are times when everybody's cooperating and voluntarily giving up the CPU. A good example of this is blocking on I/O: when you make a system call and ask the operating system to do a read, you're giving up the CPU — you're implicitly yielding — and while the operating system is working on your task by, say, talking to a disk for a million instructions, it can schedule somebody else. So, surprisingly, blocking on I/O is a great example of yielding. Saying that you want to wait for a different thread or a different process — say, with a signal operation — is another example of voluntarily giving up the CPU, because you're saying: well, I have to wait, so go ahead, run that other thing for a while, since it doesn't do me any good to sit in a loop waiting. The third thing, which is sort of the example to follow for all of these, is what I'll call a yield operation — it's actually a system-call-type thing. Suppose I wanted to run an application that computed pi to the last digit. What I would do is have a while loop that's never going to exit — because pi is very long, right — and I compute the next digit, and then I yield, and I go over and over and over again. (I also see in the chat: mining Bitcoin, potentially.) So these are very long-running things where what I've done is decide to execute a yield system call regularly enough that multiplexing works, and the system acts properly, like it's got multiple threads running at the same time.

Now, of course, this particular application I'm showing on the screen is flawed, for a pretty important reason. Does anybody know why this is not a great example of yielding regularly? Well, maybe it yields too often initially — does anybody know anything about computing pi? The point here is that each digit you compute takes longer and longer and longer, so while this particular thing seems to be yielding properly at the beginning, the yield operations are going to come at longer and longer intervals, and eventually it'll effectively just act like it runs forever. So this particular use of yield is probably not a great example, but assuming we yield regularly, then we properly multiplex things and we actually get multiprocessing. So, if you remember, I gave you the POSIX API for threads in an earlier lecture — pthread_create, pthread_exit, pthread_join — and there's actually a pthread_yield, although if you do a man on it you'll see it's considered not supported on all operating systems. There's also sched_yield, which is a similar thing, and what it does is say: I yield the CPU so that another thread can run.
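Roughly, the slide's flawed-but-illustrative loop looks like this in POSIX C (compute_next_digit is a hypothetical stand-in; sched_yield is the portable spelling of yield):

```c
#include <sched.h>

extern void compute_next_digit(void);  /* assume: each call takes longer than the last */

void compute_pi(void) {
    while (1) {                /* pi is long; this never exits */
        compute_next_digit();
        sched_yield();         /* voluntarily give up the CPU; note the gaps
                                  between yields grow with each digit */
    }
}
```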
So this is a real interface, right? And what we're going to do is take a look at what yield buys us here, and once we've got yield figured out, we'll graduate to a few other interesting ways to get threads to give up the processor. So let's look at this compute_pi function I showed you earlier: we enter compute_pi, it computes a digit, comes back, and executes yield. All right, now this is the stack — remember how stacks kind of grow down and come back up. (Yep, sleep would be a type of yield as well.) The compute_pi stack frame starts at the top; we execute yield. Let me just show you back here: if you take a look, notice what's happening in this while(true). We enter the compute_pi function — that's the first stack frame — then we run compute-next-digit and come back, and then we run yield, so yield is going to have a stack frame just below compute_pi's, the way we've set this up. That's what we're showing here, and in this picture blue is going to be the user code. So we have the compute_pi stack frame, we have yield, and yield is going to execute a system call, which means we transition into the kernel, and at that point we actually change stacks: while we have the user stack in the blue area, we end up on a kernel stack in the red area, and there's a one-to-one correspondence between a user-level stack and a kernel-level stack. So we execute the kernel's yield, which is going to execute run_new_thread, which is going to execute a switch operation — we go through several levels here, where yield calls run_new_thread, saying "I've got to pick a new thread," which calls switch, and we're going to find out what switch is about.

But let's start for a moment with understanding: why do I have blue and red here? This is not a political statement. Why is there a difference between the user-level stack and the kernel stack? One-to-one means that for every user-level thread and stack there is a kernel-level stack — and I'll show you this next time, when we really dive into real code — but for now, there's a kernel stack specially allocated for this thread. Can anybody tell me why, when I change modes by going into the kernel, I use the kernel stack rather than the user's stack? Safeguard — great, because we don't trust the user, ever, if we're the kernel. The most important thing you need to do when you're the kernel, when a system call comes in from the user, is check what the user gave you to make sure it's okay. And the second most important thing is: you check what the user gave you to make sure it's okay. Can you imagine what the third thing is? You check what the user gave you to make sure it's okay — and then you actually execute things. So this is important state here, because if the user code were to, say, put a null or something in its stack pointer and then execute a system call, the kernel would panic or do something bad, because it wouldn't have a valid stack to work with. So part of this transition from user mode to kernel mode has to change the stack.

Okay, so here's what's going to happen — this is running the new thread. We hit the kernel; yield calls run_new_thread. And notice what run_new_thread is: it picks the next thread to run — that's a scheduling-type operation — then it executes switch, and then we do some housekeeping, which might be cleaning things up, seeing how much CPU time we're using, etc.
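In pseudo-C, that kernel-side path might look like this (names are illustrative, not Pintos's exact symbols; struct tcb is the per-thread sketch from earlier, and switch_threads is the mysterious part, shown further below):

```c
struct tcb;                                      /* per-thread state, as before */
extern struct tcb *current;                      /* the currently running thread */

struct tcb *choose_next_thread(void);            /* the scheduling decision      */
void switch_threads(struct tcb *cur, struct tcb *next);
void do_housekeeping(void);                      /* cleanup, CPU-time accounting */

void run_new_thread(void) {
    struct tcb *next = choose_next_thread();
    switch_threads(current, next);  /* we "stop" here, possibly for a long time, */
                                    /* and resume here when switched back to     */
    do_housekeeping();
}
```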
So how does the dispatcher switch to a new thread? Well, we've kind of gotten an idea about that a little earlier: we're going to save anything that the next thread may trash. We've got to save the program counter and the registers and the stack pointer of this blue thing, because we need to restore it later so we can keep computing pi — which is very important, right? Pi is the important number in this class. And then we want to make sure we maintain isolation between threads. Now, before you say, "wait a minute, I thought threads within a process were sharing" — remember, right now we're intentionally not distinguishing our threads from our processes. We want to make sure that when we switch to another thread, we don't trash the current thread's stack, and if it turns out we're going to a different process, we have to make sure we change the memory protection as well.

Okay, so how does that switch look? Let's look at the stacks for a moment. What switch is going to do — I'll show you some actual assembly-like code in a moment — is save out everything from thread A and load everything from thread B back in. So how to think about this: I'm going to show you a really silly piece of code here, but it's going to help us. This code starts with function A, which calls B, and once we get into B it just goes into an infinite loop that does yield, yield, yield, yield. If you can imagine what that means: yield is going to give the CPU up to somebody, and then when we come back, we execute long enough to go around the loop, and then we yield again. And suppose we've got two threads, S and T, both running exactly this same code. So what happens? For thread S, A is at the top of the stack; it enters B, which is executing the while loop, which calls yield, which calls run_new_thread, which calls switch, and switch switches to the other thread. Then that switch returns to run_new_thread, which returns from the system call to yield, which returns to the while loop, which calls yield, which goes into the kernel and calls run_new_thread and calls switch, which switches back the other way, and then we come back. So in this particular example, where there are only two threads in the system and they're both running exactly the same code, what's going to happen is we go down the stack for S, we switch over to T and come up its stack, then we go down the stack for T, switch, and come back up for S.

And what's interesting about this is: look at what this switch routine is. This switch routine is really simple. This is MIPS code, but it's going to be very similar to what you'd get for RISC-V, which you guys are all familiar with. We save all of the registers of the CPU into the thread control block — we save the stack pointer, we save the return PC, all of that stuff. This green thread control block is the one we were running, and now we're done with it; now we're going to load back the red one, and when we're done, we return.
So, although this is written in assembly language — sort of assembly-language pseudocode — notice that switch is a routine: we call the function switch, and it returns down here, back to wherever we came from. So here's the thing that I think should be interesting. First, let me answer the question that's kind of on the group chat here, which is: when you switch to a new thread, why are we reading the stack bottom-up and not top-down? The answer is: we're returning, okay? So forget somehow getting from S to T — let's suspend that complexity in our minds. If we had just one thread in the whole system, we would call A, which would call B, which would call yield, which would go into the kernel to run_new_thread, which would call switch. And what does switch do when it's done? Return — see the return down here. And what does return do? Well, return pops something off the stack, right? Then run_new_thread is a function which will pop something off the stack, which will return back toward user code, and yield will thereby return, and then we'll go back to the loop and call yield again, and go up and down. If there were only one thread in the system, the stack grows as we call forward, and as we do returns, the stack shrinks. Yeah, okay. Good — I'm glad you guys got that.

Now, the question is: how does this work going back and forth? Why does that happen? And the answer is: when we get into switch on the left — let me go back this way — we save out all of thread S's registers, and then we load in all of thread T's registers, including its stack pointer. Which means that after we've gotten to the bottom of the switch routine, before we hit return, we're actually over here, because we're on a different stack. So when we return at the end of switch, it takes us back up over here, and then when we come down and switch again, we return back up over there. So take a second to understand that — see this back-and-forth. (It'll never hit A again — that's correct, because there's an infinite loop; B just stays in the loop forever.) But if you notice what's going on here: when we change the stack pointer to thread T's stack, then when we do this return — even though we started with thread S's stack, by the time we get down here we're on thread T's stack, and we have thread T's return PC — we're actually returning back into thread T, not into thread S, and vice versa. And that's why we go back and forth. Now, I'm going to let that marinate for you guys a little bit, and we're going to explore this a little more.

Oh, good — another question: after you switch, does the kernel stack's thread not match the user stack's thread? The answer is they still match, because the way the user stack and the kernel stack are associated with each other is that the state in this thread — the red thread for T — remembers which stack it came from. So when we're on thread T's kernel stack, it's associated with thread T's user stack; the matching up happens all the way from the kernel back up through thread T as well.
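To make the mechanism concrete, here's the shape of switch in commented, MIPS-flavored pseudo-assembly (the real routine has to be assembly, since C can't portably save and restore its own registers; the offsets and register list here are illustrative, not the slide's exact code):

```c
/* Called as switch(cur, next); in the MIPS convention the two TCB
   pointers arrive in registers a0 and a1. */
void switch_threads(struct tcb *cur, struct tcb *next);
/*
 *   sw  ra, TCB_PC(a0)      # save cur's return PC into its TCB
 *   sw  sp, TCB_SP(a0)      # save cur's stack pointer
 *   sw  s0, TCB_S0(a0)      # ... save the remaining registers ...
 *
 *   lw  s0, TCB_S0(a1)      # ... load next's registers ...
 *   lw  sp, TCB_SP(a1)      # from this point on, we are on NEXT's stack
 *   lw  ra, TCB_PC(a1)      # next's saved return PC
 *   jr  ra                  # "return" -- but into the other thread
 */
```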
So, in some sense, you could say: if I were to take this S while it's suspended — because I'm running in T — and disconnect its stack and thread control block and put them on some wait queue, so that S is not on the ready queue and the scheduler never picks it, then T will never go back to S; it'll just go to other things — maybe it goes to U, V, W, whatever's running — while S is happily suspended on some wait queue. And the moment I put S back on the ready queue, this behavior starts happening again and we can run S again. So the thread is a complete, self-contained snapshot of a running state — a thread control block and two stacks — and you can put it away, and you can come back later and make it runnable. That's kind of the key idea we've got so far.

Okay, so some details about that switch routine. By the way, the PC is saved — in all of this, it's one of the registers that gets saved; I'm not actually showing it here, but the PC is certainly saved. So what we just said is: the TCB plus the stacks contain a complete restartable state of the thread; you can put it anywhere for revival. So here's a question for you: what if you screw up switch? This is at the core of the core of the core of the scheduler inside the kernel. Well, let's say you forgot to restore register 32 or something. What's really bad about this is that you get intermittent failures, depending on whether the user code was actually using register 32 or not, and the system gets the wrong results without warning. (Let's hold off on "starting" for a moment — I know people are wondering how that all got started. Let's just say for now the system has S and T running and this is happening; we haven't started anything yet, we've popped into a running state. So suspend your question on that for just a second.) So switch is extremely important, and the question might be: is there an exhaustive test you could run on the switch code? The answer is no. You're going to have to look at that code, and get other people to look at that code, and then look at it again, over and over. You know, it's not very long, it's not going to change much, and it's not too complicated, but you've got to be careful, because if it's wrong, the whole operating system is going to behave weirdly and you're not going to understand why.

There's a cautionary tale here I like to tell sometimes. For speed, there was a kernel from one of Digital Equipment Corporation's research labs, called Topaz, and this was back in the days when memory was very scarce. Some very clever programmer decided to save an instruction in switch, in a way that worked fine as long as the kernel wasn't bigger than a megabyte. Now, I realize those numbers seem ridiculous to you today, but let's assume for a moment that a megabyte was a lot of memory at one point. As long as the kernel size was less than a megabyte — 20 bits of address — this was fine, and it was carefully documented, and it saved an instruction, so it was faster. What was their motivation? Well, the core of switch is used by every switch, so it's pure overhead, and it made sense to make it smaller. The problem is — and they documented it, it was great — time passed, and people forgot, and the clever person maybe retired, and later people started adding features to the kernel, because they were getting excited about putting stuff in the kernel, and it got bigger than a megabyte.
And once it got bigger than a megabyte, suddenly very weird behavior started. And yeah, I suppose one moral of the story could be "don't document" — I don't want that to be what you take out of this lecture. The moral of the story is: be sure you design for simplicity, and if you're going to make some micro-optimization, you'd better make sure it's really worth it. All right — hashtag read-the-docs. (The instruction in question saved loading the high part of the bits of a kernel address.) And to the question, "aren't we switching contexts here, with the threads we've been talking about?" — well, assuming we're not changing the address space; and you're asking about build scripts — things weren't quite so sophisticated back then.

So, if we're switching just threads, this is very simple and very fast — what I've shown you is the thread-switching portion. If we need to switch between processes, we're going to have to start switching address spaces as well, and I wanted to give you a little bit of an idea of the cost. The frequency of a context switch in a typical operating system like Linux is somewhere in the 10-to-100-millisecond range, and the overhead is about three or four microseconds, so you can kind of see where this goes — the overhead is in the small range here. Now, switching between threads is much faster, in the 100-nanosecond range. (There are a thousand microseconds in a millisecond, and a thousand nanoseconds in a microsecond, so you can see how these numbers come into play.) So the key here is keeping the overheads low: switching between threads within a process is fast, while switching between processes takes longer, and that extra 30-or-40-times cost is really about things like saving the process state and so on.

Now, even cheaper than switching threads by going into the kernel and coming back would be to run threads in user space. I know there were some questions about this at one point, but let's be a little clear here for a moment. What we've been talking about — and the default in Linux these days — is a one-to-one threading model, where every user thread has what's called a kernel thread. I'm going to use this terminology, and it's going to take a little time to get used to, but a kernel thread is really a kernel stack that's matched one-to-one with a user thread, such that the user's stack gets switched out and the kernel stack is used while we're in the kernel; then, when we return to the user, we use the user's stack, but the kernel stack is always there, suspended. So if I have four threads, I have four kernel stacks inside the kernel, matched up with user threads. This is exactly what we've been talking about, it's what Pintos does for you, and it's the basic Linux model. But we can be faster. For instance, we could do this: each kernel thread — where there's a kernel stack — has more than one user thread associated with it, and when a user thread executes yield, it's a user-level yield, where the user-level library knows how to do that same stack switching I just showed you, saving and restoring registers between threads without ever going into the kernel. So we can make this user-level multiplexing very fast. If you google "green threads," for instance — this was done a lot in the early days, when going into the kernel was more expensive.
And you can do this with a threading library — a lot of early versions of Java were like this, where the threads all operated up at user level, not in the kernel. Now, the good thing about the left model, the one-to-one model, is that if a particular user thread does I/O, which puts it to sleep, its kernel thread gets put on the sleep queue for that I/O device, but the rest of them are still running — they're still getting CPU time. That's good. Here in the many-to-one model, we have multiple user threads, and if any one of them goes into the kernel and goes to sleep on I/O, all of the threads are suspended, because nothing can run. So while the user-level thread model is very fast, it doesn't interact well with sleeping in the kernel, and that's why there's also a many-to-many model, where you have a small number of kernel threads and many more user threads. That's got special library support — and don't worry about it: you, as a programmer, would just see a bunch of threads; the library would hide this. But today, for this lecture, we're talking about the thing on the left.

All right. Now, just to show you a little bit: our model has potentially one CPU, each process may have multiple threads, and there might be multiple processes. So basically, the switch overhead within the same process is low, while between different processes it's higher — we saw that factor of 30 or 40. The protection between threads in a process is low; that's by design, since they can share memory with each other. Between different processes it's high; that's also by design, to protect processes from one another. The overhead of sharing is low inside a process, because threads can just share memory, while between processes you've got to do IPC to figure that out. And there's no parallelism, only concurrency: in this instance, there really is only one thing actually running at a time. Now, of course, we all know about multiple cores, so we can actually introduce parallelism here, and what happens is the top part of this model doesn't look much different, but now we have three, four, however many cores executing — there can be 28, in some instances 54, whatever — and now we start having some questions. The switching overhead might be similar, but if we have different processes whose threads are running on cores at the same time, that's medium overhead to communicate, as opposed to trying to communicate with a process that's completely asleep, not running on any core — that's higher. And yes, there's parallelism here. So this is an instance where concurrency — which is really the thing we worry about — is translated into parallelism.

And I did want to say one quick thing about simultaneous multithreading, or what Intel calls hyper-threading (because they never want to take somebody else's name for something). We could imagine that, with a lot of transistors on a chip, we could put them together and allow multiple operations to run simultaneously. So think of time going down in these figures, with each line representing a cycle. What you see here are three functional units, in the case of the superscalar; the slots that are solid yellow are actually doing something, so we're getting some parallelism here — for instance, kind of in the middle, where there are three yellows in a row, three things are happening at once. We could get a multicore by putting several of these together.
So in fact, in this middle figure we now have a multicore — two copies of the core on the left. And then hyper-threading is a little different: you can have two threads that get interleaved on the same core, so rather than the empty gray slots, we actually fill in green and yellow, and we use much closer to 100% of the pipeline. That's called simultaneous multithreading, or hyper-threading, and this thing on the right is a much more efficient use of the hardware. A lot of Intel and AMD processors have hyper-threading, and you get a definite speedup because you're using more slots. This original technique was called simultaneous multithreading — you guys can take a look — but in this instance you'd actually have multiple threads running simultaneously on the single core, whereas in the middle one you could have multiple threads on two cores. Do GPUs have hyper-threading? GPUs don't really have hyper-threading in the way you're thinking; GPUs are usually designed so that a single task takes over the whole GPU. And hyper-threading shouldn't affect locking, because good code will work under all circumstances of concurrency and parallelism — it shouldn't matter. Hyper-threading is parallel, though, because there are two actual threads and they are running simultaneously. (Oops, I just lost my place here — hold on a second, my bad. Sorry about that, guys; we'll put the screen back.)

So now let's move forward — I want to try to catch a couple of things before we get into some synchronization. So what happens when a thread blocks on I/O? Here's a different process that's actually copying from one file descriptor to another — you open one for reading and the other for writing; we showed you that code a couple of lectures ago — and now it executes a read system call. What happens? Well, we trap into the kernel — that's the read system call — and the read operation is initiated: we go into the kernel, we switch to the kernel stack, and we'll initiate, say, the device driver on the disk to go off and read. And what happens then? Well, run_new_thread and switch. So notice that we can set this up so that the little bouncing back and forth between S and T works perfectly well if the thread, instead of executing yield, does a read operation. Thread communication — waiting on signals, joins, networking over sockets — all of that stuff has similar behavior, and that's why this particular paradigm of the two stacks, which you can put on any sort of suspend queue and then put back on the ready queue, works so well for scheduling.

But what happens if the thread never does I/O? Now we want to somehow progress beyond the early days of Windows 3.1 and Macintosh, where, you know, the compute-pi program could grab all the resources — if it never printed to the console, never did I/O, never ran yield, we would effectively crash the system. So there's got to be some way to come back, and the answer here is external events. A couple of them in particular: interrupts — signals from hardware or software that stop the running code — and the timer, like an alarm clock that goes off every so often.
Both of these are interrupts from the hardware that force the user code to enter the kernel, even if it wasn't going to, and if we make sure these external events occur frequently enough, then we get fair sharing of the CPU as well. So if you take a look here, I just wanted to say a little bit about interrupts. A typical CPU has a bunch of devices that are all connected, via interrupt lines, to an interrupt controller; that interrupt controller goes through an interrupt mask, which lets us disable interrupts, and then through an encoder that tells the CPU to stop what it's doing and handle an interrupt. So, for instance, if something comes off the network, that'll generate an interrupt, which will interrupt the CPU, and the CPU will go off and handle the network interrupt. So interrupts are invoked via interrupt lines from devices; the controller chooses which interrupt request to honor; the operating system can mask out ones that it's currently dealing with; and there's a priority encoder that lets us pick the highest-priority ones. We'll get into that whole interrupt core of the operating system in a little more detail when we get to devices, but I'll point out a couple of things. The CPU can disable all interrupts, typically with a single bit, when it's processing one interrupt, and it can change the interrupt mask to change which devices it's willing to listen to. There's also typically a non-maskable interrupt, which is something that might get triggered when, say, power is about to go out, and there's no way for the CPU to disable it — that's kind of the "oh my gosh, hurry up and do something quickly" interrupt. (Each CPU has its own interrupt controller — that's correct. And the question about what we do to prevent threads from getting interrupted by other CPUs is an interesting one; we'll get into disabling interrupts in the next lecture. The kernel stack is in kernel memory — that's correct — and when you're at user level, you can't access that kernel stack; otherwise that would defeat the whole purpose. I'll show you that next time too.)

So, an example of a network interrupt: we're running some code here — you know, in assembly, whatever — and the interrupt happens. Typically the pipeline gets flushed, the program counter is saved, the interrupt state is saved, and we go into kernel mode, which does some manipulation of masks and so on. We'll re-enable interrupts — we'll talk more about that — for everything except what I'm currently handling; then we actually handle the interrupt itself, like grabbing the network packet; and then we restore and return from the interrupt, and at that point the user code can pick up where it left off. So this thing on the left, which we flushed, gets interrupted and restarted as user code, and the interrupt is able to stop the user code just long enough to service the request and come back. I realize there are a lot of pieces to this, and we'll talk more about them later, but an interrupt is a hardware-invoked context switch. So when we had our worry that user code could hold on to the processor: well, if we have an interrupt that occurs regularly enough, we can switch, and that will do the trick. And that trick is: typical PCs have a timer — many timers, in some instances — which are sources of interrupts, and we just program the timer to go off every ten to a hundred milliseconds, and that makes sure we're able to context switch.
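As a sketch, the timer trick might look like this in pseudo-C — treating the timer interrupt as an involuntary yield. All the names here are hypothetical, and a real handler sits on top of the hardware steps just described:

```c
#define TICKS_PER_SLICE 10       /* quantum: roughly 10-100 ms worth of ticks */

extern int ticks_left;           /* ticks remaining in the current quantum   */

void acknowledge_timer(void);    /* tell the interrupt controller we saw it  */
void run_new_thread(void);       /* the same path a voluntary yield takes    */

void timer_interrupt(void) {
    /* Hardware has already flushed the pipeline, saved the user PC, and
       switched us to kernel mode on this thread's kernel stack. */
    acknowledge_timer();
    if (--ticks_left <= 0) {
        ticks_left = TICKS_PER_SLICE;
        run_new_thread();        /* preempt: pick someone else and switch */
    }
    /* return-from-interrupt resumes whichever thread is now current */
}
```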
And that instance looks just like this: we're busy running code and the interrupt takes us into the kernel. This is not a yield arc, and it's not a system call arc; what took us into the kernel was the interrupt itself. But that interrupt stack frame can be made to look identical, and then we just run a new thread and switch. All right. And to the question of whether there's protection against a malicious device constantly generating interrupts: it depends on the circumstances. If you have a malicious device attached to the hardware, then under bad circumstances that can be very bad, so hopefully that doesn't happen.

So how do we initialize the TCB and stack? Well, we initialize the register fields of the thread control block, and the stack pointer is made to point at the stack. Then we set things up with what we'll call a thread root stub: we don't have to fully initialize the stack, but we set it up to look as though the thread has been running and has just executed switch. Then we put it on the ready queue, so that if we switch to it, it returns from switch by loading the return address and a couple of registers, and as a result it starts executing exactly as if it had been running for a long time and had called switch itself. That's the idea: the thread root stub has been given an environment with a new stack, and we've set up the right registers to fake it out to look exactly like something that was running and called switch. So what does setting up the new thread do? It sets up the stack pointer, a pointer to the code that needs to run, and some function pointers; then we switch to it and it runs. Now, this depends heavily on the calling convention, so I'm showing you something that looks like MIPS or RISC-V; on x86 you have to do a little more with the stack to set it up. But the bottom line is that we set this up to look like we've switched to it, so that when we do switch to it, it just starts running. And what does that look like? The thread root does some housekeeping, switches into user mode, and calls a function pointer. If you look here, thread root calls the thread code, the stack starts growing, and all of a sudden we've got a thread that's running; this is exactly the way that S and T were started on the previous slide. Now, to the question of what happens if a user thread goes into an infinite loop: because we have a timer going off, the answer is that it will waste its own CPU time, but others will get to run. In particular, somebody could come in and kill it off, since they have enough CPU to run, say, the shell.
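If you want to play with this idea at user level, the POSIX ucontext calls do essentially what our thread-root-stub setup does: makecontext fakes up a fresh stack so it looks like something that can be switched to, and swapcontext plays the role of switch. This is just a sketch of the concept, not how Pintos implements it:

    #include <stdio.h>
    #include <stdlib.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, thread_ctx;

    /* Our "thread root stub": the first thing the new context runs. */
    static void thread_root(void)
    {
        printf("new thread running on its own stack\n");
        /* falling off the end resumes uc_link, i.e., main_ctx */
    }

    int main(void)
    {
        /* Build a context with a fresh stack, set up so that switching
         * to it starts executing thread_root: the fake-out described
         * above. */
        getcontext(&thread_ctx);
        thread_ctx.uc_stack.ss_sp   = malloc(64 * 1024);
        thread_ctx.uc_stack.ss_size = 64 * 1024;
        thread_ctx.uc_link          = &main_ctx;
        makecontext(&thread_ctx, thread_root, 0);

        /* The "switch": save our registers into main_ctx, load thread_ctx. */
        swapcontext(&main_ctx, &thread_ctx);
        printf("back in main\n");
        return 0;
    }

The new context never "called" swapcontext, but it has been arranged to look as though it did, which is exactly the trick the kernel plays with the stub.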
All right, now let's talk about correctness, and hopefully you can bear with me a little bit. Now that we have concurrent threads, the beginnings of an inkling about how to make sure they all get to run, and the idea that disabling interrupts might prevent switching (which will be very important next lecture), we can start talking about how to make multi-threaded or multi-process code work. The problem is non-determinism: the scheduler can run threads in any order and can switch at any time. If the threads are independent, that's okay, but if they're cooperating on shared data, we've got a mess, and multiple threads inside a single process are likely to be collaborating, so we may well have a mess. The goal is to design things so they work correctly by design, regardless of what the scheduler does to us. I like to think of the scheduler as a malicious Murphy's-law device whose sole job is to run your code in exactly the order that exposes your worst concurrency bug, and to do it at the worst possible time. All schedulers are Murphy's-law schedulers, and our only defense is to design our code correctly so that it isn't subject to the Murphy's-law scheduler.

Now, when a user thread switches, there's a question in the chat: is the kernel stack preserved? The one objection I have to that question is that there isn't one kernel stack; I hope you see that there are many kernel stacks, one for each thread. And where they're preserved is in kernel memory, in state associated with the currently running thread.

So, there are many possible executions under the Murphy's-law scheduler. Here's the bank server example, which I think I've mentioned before but want to go into. We have many ATMs and a central bank, and suppose we want to implement a server process to handle the requests. We might do something like this: the bank server grabs the next request, processes it, grabs the next request, processes it, serially, one at a time. And what does processing a request do? It figures out what you want to do; if you want to deposit, it gets your account information, maybe using some disk I/O, adds to the balance, and then stores the result, possibly also using disk I/O, and continues. Processing more than one request at once would seem like a good idea here. Why would we want to do that? At a minimum, we'd like to overlap our disk I/O with computation. One option, which I'm not going to go into deeply because we're a little low on time, is an event-driven version; I'll give you a very brief idea. We take the original task and split it into pieces that are each guaranteed to run to completion without ever stopping. The first piece would be everything up to getting the account ID and starting the disk I/O; the next piece, after the disk I/O is done, would add to the balance and start the next disk I/O; when that returns, the next piece runs, and so on. You pick these pieces between the disk I/Os so that each one is known to run quickly, and you build a dispatch loop: when the next event arrives (like the completion of a disk I/O), you figure out which request you were working on, you run the next piece, which quickly ends, and you keep doing this in a loop. This event-driven way of doing things looks really crazy unless you've done programming for windowing systems, in which case it will look very familiar. But I will tell you that while you can program this way, it's very hard to get right: you could forget an I/O step, or one of those start or continue requests might actually do an I/O you weren't ready for.
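Here's a rough sketch of that event-driven shape for the deposit path. Everything in it is hypothetical (the event structure, wait_next_event, start_disk_write, reply_to_atm are all made-up names just to show the structure): each piece runs quickly to completion and re-arms the next disk I/O instead of blocking.

    enum stage { ADD_BALANCE, STORE_DONE };

    struct request {
        enum stage next;      /* which piece runs when our I/O completes */
        int acct_id;
        long amount;
        long balance;
    };

    struct event { struct request *req; long data; };

    /* Hypothetical helpers, shown as prototypes only. */
    struct event wait_next_event(void);        /* block for the next I/O completion */
    void start_disk_write(struct request *r);  /* non-blocking store of the record */
    void reply_to_atm(struct request *r);      /* answer the ATM */

    void dispatch_loop(void)
    {
        for (;;) {
            struct event e = wait_next_event(); /* e.g., a disk read or write finished */
            struct request *r = e.req;          /* which request was that for? */
            switch (r->next) {
            case ADD_BALANCE:                   /* the account record just arrived */
                r->balance = e.data + r->amount;
                r->next = STORE_DONE;
                start_disk_write(r);            /* kick off the store; we'll be back */
                break;
            case STORE_DONE:                    /* the store finished */
                reply_to_atm(r);
                break;
            }
        }
    }

Notice the bug surface: if start_disk_write secretly performed another I/O you didn't plan a stage for, the state machine falls apart, which is exactly the fragility being described.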
Which is why we like to have many threads. Threads can make this easier: let's have one thread for every user in the system doing a request. What's great about this is that we can have many folks all running deposits, and a disk I/O might stall one of those threads, but another one gets to run, because remember, every thread has a kernel half and can be put to sleep. So that's great: now we've got parallelism and performance. But here's what's not good. Suppose you're depositing ten dollars and your parents are depositing a hundred dollars into your account at the same time. I don't know how often that happens to you, but let's suppose it happens frequently. Here you are as thread 1 and here are your parents as thread 2. You load your balance; then your parents' thread gets to run, loads the balance, adds a hundred bucks, and stores the balance back; then you get to run, add your ten bucks, and store the balance back. If you look carefully at this, how much did your account go up? $110? No: ten dollars. I can tell you you're not going to be happy about that, and your parents aren't either. So we have a problem, and this problem starts showing up the moment we have threads working on the same data: concurrency. This is one of the lowest-level problems there is. If thread A is storing to x and thread B is storing to y, normally we don't have a problem, since they're touching different data. And I see somebody in the chat claiming this might be a Robin Hood thing; no, the problem is that the hundred dollars just went poof, nobody got it, so that's just bad.
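You can see this lost-update problem in about twenty lines of runnable C. Two threads each deposit 1, a million times, with an unprotected load-add-store; on most machines the final balance comes up short of the expected 2,000,000, for exactly the interleaving reason in the $10/$100 story. (This is an illustration, not the lecture's slide code; balance is declared volatile only so the compiler doesn't collapse the loop.)

    /* build: cc -pthread -o race race.c */
    #include <pthread.h>
    #include <stdio.h>

    static volatile long balance = 0;

    static void *depositor(void *arg)
    {
        long amount = (long)arg;
        for (int i = 0; i < 1000000; i++) {
            long b = balance;   /* load balance */
            b += amount;        /* add amount   */
            balance = b;        /* store balance: another thread's store
                                   in between is silently overwritten */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, depositor, (void *)1L);
        pthread_create(&t2, NULL, depositor, (void *)1L);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("balance = %ld (expected 2000000)\n", balance);
        return 0;
    }

Run it a few times: the answer changes from run to run, which is the non-determinism we keep talking about.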
Here's an instance that's a little crazier: thread A is operating on some data including y, and thread B is operating on y, and suddenly we have a race condition. The question might be: what are the possible values of x? They can vary quite widely; you could have x equal to 1, you could have x equal to 3, and so on, many options depending on how the threads are interleaved. So that's not good. Or what about this: thread A stores x = 1 and thread B stores x = 2. If we assume that loads and stores are atomic, then x can be either 1 or 2, non-deterministically. I suppose if you had some sort of weird serial processor you might even get 3 out of this, where A is writing 0001 and B is writing 0010 bit by bit and they get interleaved into 0011; that's one you don't have to worry about. But we need atomic operations. To understand a concurrent program, you need to know what the underlying indivisible operations are. An atomic operation is an operation that always runs to completion, or not at all: it's indivisible, it can't be stopped in the middle, and its state can't be modified by somebody else in the middle. It's the fundamental building block; if there were no atomic operations, there would be no way for threads to work together. So notice what we really wanted to happen back in the bank case: we wanted the sequence of getting the account, adding to the balance, and storing the account back to be atomic, so that it couldn't be interleaved. That's the atomic operation we really want. On most machines, memory loads and stores are atomic (the weird example that gave 3, I have never actually seen that; it's just interesting to think about). But things like double-precision loads and stores aren't always atomic: if you have a floating-point double and you're loading and storing it, under some circumstances you can actually get the top half of one value and the bottom half of another. So you've got to know what your atomic operations are. Next time we're going to talk a lot about the native atomic operations over and above loads and stores, which is going to be important because we'll also show that atomic loads and stores alone are just not enough. But let's hold that discussion off.

So, if you remember what a lock is: a lock prevents somebody from doing something. You lock before entering a critical section, you unlock when you're done, and you wait if the lock is already held. The key idea is that all synchronization, in order to make something correct, involves waiting: rather than running right away, you wait, so that the atomic sections don't get interleaved. Waiting is actually a good thing here, as long as you don't do it excessively. As we mentioned several lectures ago, locks typically need to be allocated, so it might be something like "struct lock mylock" which you then init, or "pthread_mutex_t mylock" which you initialize; all the different systems have their own ways of initializing the lock. Then you typically have acquire, which grabs the lock, and release, and they often take a pointer to the particular lock. So how do we fix the banking problem? We put locks around our atomic section: we acquire the lock before it and release the lock after. The thing in the middle is what we call a critical section: the atomic operation we've chosen, which we only want one thread inside at a time, and the gatekeepers are the acquire and the release. Here's an example, with some animation: threads A, B, and C all reach the acquire. If we let more than one of them into the critical section at a time, we get chaos, but the lock picks one to let through, so A gets to run, and when it exits and calls release, the next one gets to go: now B, then C, and so on. And to make this all work properly in a banking operation, we must use the same lock in all the methods, withdraw and so on, that operate on the same data.
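Here's what the fix looks like with pthreads. The mutex calls are the real POSIX API; the account structure is made up for illustration. The point is that the load-add-store on the balance is now a critical section that only one thread can be inside at a time, and every operation on the same account (withdraw, getBalance, and so on) must take this same lock:

    #include <pthread.h>

    struct account {
        pthread_mutex_t lock;   /* guards balance */
        long balance;
    };

    /* Initialize once, e.g.:
     *   struct account a = { PTHREAD_MUTEX_INITIALIZER, 0 };
     * or call pthread_mutex_init(&a.lock, NULL) at startup. */

    void deposit(struct account *a, long amount)
    {
        pthread_mutex_lock(&a->lock);     /* acquire: wait here if someone's inside */
        a->balance += amount;             /* the critical section */
        pthread_mutex_unlock(&a->lock);   /* release: let the next thread in */
    }

Drop this deposit into the racy example above and the balance comes out to exactly 2,000,000 every time.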
Okay, so some definitions, if you remember. Synchronization is using atomic operations to get cooperation between threads; for now, loads and stores are the only atomic operations we have. Mutual exclusion is the idea of preventing more than one thread from being in an area at a time: we mutually exclude, so that only one thread gets to run there, and the thing we're excluding threads from is the critical section. So at the simplest level, figuring out how to fix a synchronization issue is an analysis: where do I need my critical sections, what is my shared data, and where do my locks go? Now, we're going to get a lot more sophisticated in a bit, but here's another concurrent program example: two threads A and B are competing with each other, A gets to run, B gets to run. What do we see here? Assume that memory loads and stores are atomic, but that incrementing and decrementing are not. By the way, i = i + 1 and i++ are the same as far as this is concerned, because they compile to the same thing. So what happens here, and who wins? Well, it could be either. And is it guaranteed that somebody wins? Maybe not, because they may keep overwriting each other: i is a shared variable, and if both threads have their own CPU running at the same speed, it might go on forever with nobody finishing, because i never reaches the exit condition of either loop. The inner loop looks like this: we load, we load, we add, we add, we store, we store, and notice what just happened: thread B overwrote the result of thread A. The hand simulation goes like this: A gets off to an early start, B says "gotta go fast" and tries really hard, A gets ahead and writes a 1, B then writes a -1, and A says "what?". And to answer the question in the chat: we're not talking about two processes, we're talking about two threads inside the same process, so they really are sharing i. For the person worrying about coherence and sequential consistency: let's assume we're sequentially consistent and not worry about that question for now. Each thread has its own stack, yes, but i is a global variable, and the issue we're seeing here is exactly because the global variable i is shared. Now, the threads may not run simultaneously under all circumstances, but if we have multiple cores, or multi-threading of some sort like simultaneous multi-threading, they might run at the same time, or the scheduler might switch at exactly the wrong moment. So the answer is that you have to think about this as if the scheduler will pick the worst possible interleaving, because it will happen, once in a thousand times or once in a million times, and it'll happen at three in the morning when an airplane crashes because of the bug. The Murphy's-law scheduler is the right thing to have in mind. This particular example is about the worst you can come up with: an uncontrolled race condition, where two threads are attempting to access the same data simultaneously and one of them is performing a write. And "simultaneous" here is defined from a concurrency standpoint: even if there's only one CPU, the Murphy's-law scheduler could, under weird circumstances, flip back and forth at just the wrong points. So does this fix it: we put acquire and release around the i = i + 1 and the i = i - 1? Well, it's better, because now we always atomically increment or decrement, and technically there's no race anymore. A race is a situation where two threads are accessing the same data and at least one of them is performing a write; if you ever have that circumstance without synchronization, you've got a race, and that's really bad. This is no longer a race, because the acquire and release prevent two threads from being in the middle of updating i at the same time. But it's probably still broken, because you've got this uncontrolled incrementing and decrementing going on, and that's not likely to be what you wanted. When might something like this make sense? Bear with me for just a second, we're getting close to done. If each thread is supposed to get one unique value of i, then you might do something like this.
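Here's a minimal sketch of that unique-value case, assuming pthread mutexes: the lock makes the load-increment-store atomic, and since nobody ever decrements, every caller walks away with a distinct value.

    #include <pthread.h>

    static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;
    static int next_id = 0;

    int get_unique_id(void)
    {
        pthread_mutex_lock(&id_lock);
        int id = next_id++;           /* critical section: load, add, store */
        pthread_mutex_unlock(&id_lock);
        return id;
    }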
But you're not going to write that while loop where one thread counts up and the other counts down. In fact, you've already seen a version of this example with a red-black tree. What you might do there is have a single lock at the root. When thread A wants to insert, it grabs the lock at the root, does the insert, and releases. B might insert by grabbing the lock, inserting, and releasing, and then do a get by grabbing the lock, looking up, and releasing. Here both threads are modifying and reading the tree, but the reason we have locking is to make sure the tree itself is always correct. The lock is associated with the root of the tree, so there are no races at the operational level inside the tree: the threads are exchanging information through a consistent data structure, and that's probably okay. Can you make it faster? You're going to be tempted, when we get you working on the file system, to say: the problem is that when thread A acquires the lock, it locks the whole tree, and we don't really need to do that. There are ways, for certain tree operations, to go down with a lock per node and lock only the subtrees you're actually going to change, but you have to be really careful about that.
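The coarse-grained version looks something like this sketch. The lock and the public operations are the real pattern; insert_unlocked and get_unlocked are hypothetical names standing in for the ordinary single-threaded red-black tree code, which isn't shown:

    #include <pthread.h>

    struct node;   /* the usual red-black tree node, not shown */

    struct tree {
        pthread_mutex_t lock;   /* the single lock "at the root" */
        struct node *root;
    };

    /* Hypothetical internals: ordinary unsynchronized tree code. */
    void insert_unlocked(struct tree *t, int key, void *val);
    void *get_unlocked(struct tree *t, int key);

    void tree_insert(struct tree *t, int key, void *val)
    {
        pthread_mutex_lock(&t->lock);    /* one thread in the tree at a time */
        insert_unlocked(t, key, val);
        pthread_mutex_unlock(&t->lock);
    }

    void *tree_get(struct tree *t, int key)
    {
        pthread_mutex_lock(&t->lock);
        void *v = get_unlocked(t, key);
        pthread_mutex_unlock(&t->lock);
        return v;
    }

The design trade-off is exactly the one just mentioned: this is trivially correct but serializes everything, while per-node locking is faster and much easier to get wrong.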
So concurrency is very hard, and unfortunately, I was hoping to get to semaphores today, but even for practicing engineers it's hard. This analysis of what you need to lock and so on is something that people don't always get right, and I just wanted to give you a couple of examples. The Therac-25 radiation machine (there's a reading up on the resources page for us) is a great example of what happens when there's a concurrency bug. This was a radiation therapy machine that could deliver either electrons or very high-energy X-ray photons, and the way it did that was with a target that was either in place or not: with the target in, it fired a beam of electrons at the target and X-rays came out; otherwise it used the electrons directly. The problem was a bug such that when the operator typed too fast, it screwed up the positioning that selected the target and the dosage, and they fried a number of people, literally; patients died from radiation poisoning. It was awful. There's also an interesting priority-inversion case in today's reading, which we'll talk about when we get to priority inversion, and there's a discussion of the Toyota unintended-acceleration problem, which was also a synchronization problem. So what I want you to do is take your synchronization very seriously. All right, now unfortunately I'm not going to be able to get to the semaphore discussion today. If you take a look, there are some pretty good slides on semaphores, and maybe I'll see if I can put up a little more audio on that later, but I want to let you go. Today we really talked about concurrency. We showed how to multiplex CPUs by unloading the current thread, loading the next thread, and context switching, either voluntarily or through interrupts. We talked about how the thread control block plus the stacks give the complete state of a thread and let you put it aside when it needs to go to sleep. And then we started the discussion of atomic operations, synchronization, mutual exclusion, and critical sections; those four things together are part of the discussion and the design involved in understanding how to make a correct-by-design multi-threaded application. We also did a lot of discussion of locks, which are a synchronization mechanism for enforcing mutual exclusion on critical sections, and I gave you some good examples. Semaphores are a different type of synchronization, more powerful than locks; take a look at the slides, and I know they talked about this in section last week as well. So have a great weekend, we will see you on Monday, and have a good night. And get outside a little bit if you're in the local area, because we can actually breathe for a change, and that's good. All right, ciao.