All righty, welcome back to operating systems. Today we get to talk about locks: the solution to, and the cause of, all of our problems. Where we left off in lecture 17, we had code where eight threads each incremented a global variable 10,000 times, and we figured out that every time we ran it, we got a different answer. The threads were essentially all fighting over the variable, because the increment is really three operations: a load, an increment, and a store.

If we want to solve this problem, the first step to any solution is to identify and classify it. Our problem is called a data race, and data races occur when two or more threads share some data. Specifically, a data race is when two concurrent actions access the same variable and at least one of them is a write. Note we don't need two threads on two CPUs; this can happen on a single CPU. All we need is concurrency (remember the difference between concurrency and parallelism). If two or more concurrent actions are all just reading a variable, that's fine: the value doesn't change, so there's no worry about reading an invalid or about-to-change value. You need at least one write to have the potential for a data race, meaning unexpected results.

Next, we have to talk about what an atomic operation is. Atomic operations are indivisible; we reused the word from chemistry: you have atoms, and you can't go any lower than that, unless you get into your semiconductor physics courses and quantum mechanics and all the stuff I've since forgotten. An atomic instruction either happens or it doesn't. You may assume it happens all at once; there are no sub-steps that can be interrupted. You can't be preempted in the middle of it, because there is no middle of it. You are only allowed to be preempted between two atomic instructions: after an instruction finishes, you can be preempted, switch to another thread, start executing something different, and then come back, but each thread executes its own atomic instructions in order.

Now some detail, as a preview for a compiler course if you take one. Compilers don't just take your C code, do some magic, and hand you machine code. They represent your C code in an intermediate language that's a bit simpler to reason about and more generic: essentially a generic assembly that doesn't depend on which architecture you're actually targeting. Most compilers that do any kind of optimization use this representation for analysis. They turn all the C code you write into very simple statements that do one thing at a time, and you assume each of these statements is atomic. It's useful to represent your program like this because you can reason about data races, and it's a bit easier to read than actual assembly. All of the statements have the same form: each may set one result, applying an operator to at most two operands, and that's as complicated as it gets. For example, you couldn't represent 1 + 2 + 3 in one statement; you'd have to break it into several of these statements, as sketched below.
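To make that concrete, here's 1 + 2 + 3 in three-address form (D.1 and D.2 are placeholder temporaries):

    D.1 = 1 + 2;    /* at most one operator and two operands per statement */
    D.2 = D.1 + 3;  /* the result feeds into the next statement */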
So you save the result of 1 + 2 into some register or temporary, and then add 3 to that. This is what compilers use and reason about, and you assume each of these statements is atomic. GIMPLE is the three-address code used by GCC. If you really want to see what's going on (again, not required for this course, but in case you're interested), you can use the flag -fdump-tree-gimple. Don't ask me why they call it GIMPLE; they like naming things poorly. Then again, I think we all name things poorly (look at what I called lab four), so I guess we can't really blame them. If you want to see all the three-address code generated by GCC, you can use -fdump-tree-all, and you'll see a bunch of compiler-internal information. It will probably make no sense to you, but if you've read assembly before, it's actually pretty readable if you dump it out, and it's easier to reason about than low-level assembly.

So let's look at the GIMPLE for that pthread data race we argued about before. We're modifying a global variable that lives in memory somewhere, so instead of count we'll talk about it through a pointer to the count, which I'll call pcount; that's just the global's address. The GIMPLE would be the following, which is basically what I wrote out in lecture 17. Each statement is only allowed to assign one thing, with up to two operands. The first statement here is just a load into a temporary, which represents a register internally in the compiler; the compiler assumes an infinite supply of registers, so it just calls them D.1, D.2, D.3, and so on, as high as it wants. So the first statement loads whatever value is at that address, the second is the increment (a new register value from the old register value plus one), and the third writes out the increment's result.

So assuming I have two threads each executing this once, and the global variable is initially zero, what are all the possible final values? Ideally, one thread increments it and then the other increments it, so we get two. Is it possible to get one? Yes, because we have a data race: both threads also write to that location. One thread could read the value, we context switch over to the other thread, it reads the value (which is still zero), and then, as in lecture 17, both increment their registers and both write out the result.

To analyze data races more carefully, you just have to consider every possible preemption point around accesses to a single location. If we call the read and the write from thread one R1 and W1, and the read and the write from thread two R2 and W2, we'll assume instructions can't be reordered within a thread: you'll always see R1 before W1 and R2 before W2, never W2 before R2 or anything like that.
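Written out, each thread runs this sequence, and below it is the interleaving that loses an update (a sketch; D.1 and D.2 are again placeholder temporaries):

    /* Each thread runs this once; *pcount is initially 0. */
    D.1 = *pcount;   /* R: load the global counter */
    D.2 = D.1 + 1;   /*    increment in a temporary */
    *pcount = D.2;   /* W: store it back */

    /* The bad interleaving:
       thread 1: R1 reads 0
       thread 2: R2 reads 0   (thread 1 was preempted before W1)
       thread 1: W1 writes 1
       thread 2: W2 writes 1  -> final value 1, not 2 */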
Although your compiler technically can reorder code, and your CPU may reorder instructions too, which makes your life even more difficult. We will assume no reordering in this course, but in the real world, yes, things might get reordered, and those bugs will really suck to debug; for the most part, though, the compilers do a good job. In our case we couldn't have W1 before R1 because we're accessing the same variable, but if you have multiple variables you're changing, the compiler could reorder the accesses to different variables if it wanted to, which would probably make your life a bit more difficult. Usually it's not a big deal.

So let's argue about all the possible orderings. The first thing that can happen is either thread one runs first or thread two runs first. If thread one runs first, it does its read; if we then context switch over to thread two, it reads as well, and at that point the writes can land in either order and our final result is one. If instead we don't context switch, thread one continues and writes, updating the global variable; then we context switch to thread two, it reads and then writes, and our final result is two. Any questions about that? Those are all of our possible orderings, and you can tell how quickly this gets out of hand for a real problem. This is two instructions per thread and only two threads; imagine all the possible orderings with eight threads on the same value, or with multiple variables in play. Arguing about every single interleaving is going to be really, really hard. You want to structure your program so the situation for data races simply can't arise, and we will explore that today.

Our solution for preventing data races is essentially to prevent any concurrency in the protected code. There's something called a mutex, which is a type of lock, and that's what we'll be talking about today. You can create a mutex either statically or dynamically. To create one statically, say as a global variable, there's a define, PTHREAD_MUTEX_INITIALIZER, that initializes the mutex for you. To create one dynamically, you call pthread_mutex_init and give it a pointer to the mutex to initialize, just like with the threads themselves, and after you're done using it, you call pthread_mutex_destroy on it. If you want to set attributes, like with threads, you have to use the dynamic version; that's what the attributes pointer is for, but we'll just assume the default attributes. We won't need anything more complicated in this course, thankfully.
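In code, the two flavors look like this; these are the standard pthreads calls, with the setup and teardown functions here just for illustration:

    #include <pthread.h>

    /* Statically, with default attributes: */
    pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;

    /* Dynamically, somewhere in your own setup code
       (NULL means default attributes): */
    pthread_mutex_t m2;

    void setup(void)    { pthread_mutex_init(&m2, NULL); }
    void teardown(void) { pthread_mutex_destroy(&m2); }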
So how do we use a lock? With a mutex, you make a lock call and then an unlock call, and this kind of works like a real-life lock. Assume the protected code is, I don't know, a bathroom. You don't want multiple people in the bathroom at the same time, just like you don't want multiple threads writing a variable at the same time. A little bit ridiculous, but hey, go with me. The lock call is essentially: there's only one key. You grab the key, go in, and lock the door, keeping the key with you. That means if any other thread tries to do this lock, well, you've locked yourself in the bathroom with the key, so it can't get in; you're the only one that can execute this code at any given time. Unlock means you unlock the door and give back the key. Any code before the lock or after the unlock can be switched between threads, concurrency all over the place, and between the lock and the unlock you might still get context switched out. But because another thread cannot acquire the lock, it cannot execute the protected code: it can run concurrently right up until it hits its own lock call, and since your thread has the key, it can't get past that function call. It has to wait for you to unlock before it can proceed.

We call this protected code a critical section, and that means only one thread can execute it at a given time. This prevents a data race: if our update to the variable is in here, only one thread touches it at a time, so there's no data race even though it writes to the variable; we got rid of the condition where two or more threads concurrently access it. The downside is a new situation called deadlock, which we'll get into later when we have multiple mutexes; for now, let's just assume a single mutex. There's also pthread_mutex_trylock, which tries to get the lock and tells you whether or not it succeeded; the plain lock call blocks, waiting until it can get the key and proceed.

So let's see how we use it. To fix our data race code, all we do is create a mutex, and then in the run function, remember the data race is just on the counter variable: multiple threads were reading and writing it, which is what ++counter does. So we lock right before the increment, which means only one thread at a time can be executing it, and after we're done incrementing, we unlock. If we compile that, we get 80,000 every single time. We got rid of the data race, and everything works properly. Any questions about that?

Yeah: so if I hold the lock and another thread calls lock, which is exactly what's happening here, it just blocks there and waits until it can actually acquire the lock. It's stuck in a queue. And yes, a mutex is just a name for a lock that ensures this; mutex is short for mutual exclusion, something that ensures only one thread at a time executes whatever is between lock and unlock. You might need multiple locks if, say, I had another variable I was incrementing. I could reuse the same lock, but then everyone's fighting over that one key when I could make them two separate ones. Locks essentially make the protected part serial, so you want to make it as small as possible for performance and make sure threads aren't constantly fighting over it.
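Putting that together, here's a minimal sketch of the fixed program (the variable names are mine; the point is the lock/unlock placement around the increment). Compile with -pthread:

    #include <pthread.h>
    #include <stdio.h>

    static int counter = 0;
    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

    static void *run(void *arg) {
        for (int i = 0; i < 10000; ++i) {
            pthread_mutex_lock(&mutex);   /* enter the critical section */
            ++counter;                    /* only one thread here at a time */
            pthread_mutex_unlock(&mutex); /* leave the critical section */
        }
        return NULL;
    }

    int main(void) {
        pthread_t threads[8];
        for (int i = 0; i < 8; ++i) {
            pthread_create(&threads[i], NULL, run, NULL);
        }
        for (int i = 0; i < 8; ++i) {
            pthread_join(threads[i], NULL);
        }
        printf("%d\n", counter); /* 80000, every time */
        return 0;
    }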
So in this case, this would be real bad for performance, because I'm essentially executing this one line in serial and all the other threads are just fighting over the mutex over and over again, since only one can execute it at any given time. One thread increments, seven are waiting, fighting over the lock; whenever that one's done, one of the seven gets it, the rest probably re-queue, and it's lots of fun. All right, any other questions about that? Using mutexes is going to be the bane of our existence, because we can screw them up real nice. So. "Something went wrong. Please try again." Yes, something did go wrong: we used mutexes.

All right. Here is all we did to prevent the data race: we created a mutex at the top and added a lock and an unlock call, turning the code between them essentially serial. Only one thread can execute it at any given time, which prevents data races, so we won't see a counter that isn't 80,000; we probably just made things a lot slower. Again, the code between lock and unlock is called a critical section, and only one thread can execute it at a given time.

Critical sections need to be safe: they have to ensure mutual exclusion, which again means only one thread executing the critical section at a given time. With the lock we should also have liveness, aka progress: if multiple threads reach the lock call, one and only one must proceed at a given time, and the critical section shouldn't depend on any outside threads at all. You can mess this up and deadlock; a deadlock just means no threads can make progress, so your program just sits there, halted, unable to do anything. We should also have something like scheduling: bounded waiting, aka starvation-freedom, which means that if a thread makes it to the lock call, it should eventually proceed, not get stuck there forever. Just like in scheduling, if a process or thread wants to execute, it should execute at some point in the future; it shouldn't go into limbo forever.

If we want to implement locks, we want them to have the least overhead possible: they should be efficient, and we shouldn't consume resources while a thread is waiting for another thread to finish its critical section. You want locks to be fair, so each thread waits approximately the same amount of time, and they should be simple: easy to use and hard to misuse. Similar to libraries, you want layers of synchronization in your program. Your hardware has low-level atomic operations that either happen or they don't; we'll get into what they are later, when we build these locking primitives on top of them, but for now we don't need to know them. On top of the atomic operations supported by your hardware, you have the high-level synchronization primitives; that's where mutexes live. And building on mutexes, you have your properly synchronized application, which has no data races: it behaves the same way every time and doesn't screw up, essentially.
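One quick aside before we try building a lock ourselves: that pthread_mutex_trylock from a few minutes ago is used like this. It returns 0 if it grabbed the lock and an error code (EBUSY) if another thread holds it, instead of blocking. A minimal sketch (maybe_do_work is a made-up name):

    #include <pthread.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    void maybe_do_work(void) {
        if (pthread_mutex_trylock(&m) == 0) {
            /* We got the lock: do the critical section, then release it. */
            pthread_mutex_unlock(&m);
        } else {
            /* Lock was busy: do something else instead of blocking here. */
        }
    }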
So suppose you want to use a lock to implement a critical section on a system with only one processor, where the only source of concurrency is interrupts. Here's an implementation of a mutex lock: your lock function just disables interrupts. If you disable interrupts and that's your only source of concurrency, you don't have concurrency anymore; after the lock, nothing can interrupt me, and I'm the only thing that can execute. Then my unlock just re-enables interrupts. The idea being: if I disable concurrency, I'm all good. But this isn't going to work if you have multiple cores, and it's also not going to work for your application, because no operating system is actually going to let you disable hardware interrupts. So it seemed like a good idea, but it's not going to work on a real system.

So let's try to implement a lock ourselves in software. We'll represent the lock as just an int, using the values zero and one. Zero means no thread has the lock: it's unlocked, the door is unlocked. One signifies that some thread has locked it. In the lock function, I'll have a while loop with just a semicolon for a body, which means that while the condition is true, it keeps checking the condition over and over again without exiting the while. So while the value of the lock is one, meaning some other thread has the lock, I just keep rereading it over and over, waiting for it to become zero, which means some other thread unlocked it. If the value is zero, I change it to one to signify that I have locked it. If I make it past lock, that should mean I'm the only one with it and the only one allowed to continue. To unlock, we just change the value back to zero to signify that we're done with it. Written out, it looks like the code below.
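A sketch of exactly the idea just described (do not use this; the walkthrough that follows explains why):

    struct lock {
        int locked; /* 0 = unlocked, 1 = some thread has it */
    };

    void bad_lock(struct lock *l) {
        while (l->locked == 1); /* spin: reread until it becomes 0 */
        l->locked = 1;          /* claim the lock */
    }

    void bad_unlock(struct lock *l) {
        l->locked = 0;          /* release the lock */
    }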
So, is this going to work? With this implementation, is it always the case that only one thread can call lock and only one thread proceeds? Yeah, so his thought is: if multiple threads call lock, there's a situation in which all of them pass it and all of them continue, which would mean I implemented a very crappy lock. Anyone want to explain why that's the case, specifically? Does everyone see why it's possible? Isn't it just an edge case? Well, we have to argue about all the edge cases here, because we're trying to implement a lock, right? If two threads call lock, it should be the case that, every single time, only one of the two makes it past the lock call at a given time; that's the whole idea of our lock.

So say the value of the lock is initially zero and we have thread one and thread two. Ideally, how this should work is: thread one reads the value of l, which is zero, breaks out of the while loop, and sets the value of l to one. If thread two calls lock at this point, it reads the value of l, sees one, and just keeps retrying over and over again until eventually we switch back to thread one, which calls unlock, which just writes zero to l. Eventually we switch back to thread two, it reads zero, breaks out of the while, and goes ahead and writes one. In that case, only one thread makes it past lock at a given time, right?

So is there a situation where this is not true, where both make progress through it? Remember, this lock code has a data race itself, so we have to argue about every context switch that could happen. Is there a bad time to switch between the threads that causes us to behave unexpectedly? Yeah: if we context switch right after thread one reads zero, then thread two also reads zero, and at that point both of them write one and both make it through the lock call, which means we're screwed. Two threads made it through, two threads would be executing that critical section, with switching between them that we can't argue about, so this is a very poor lock. Any questions about that being a bad idea? And of course this could happen even with eight threads: all eight could read zero, context switch, read zero, context switch, read zero, and so on, and they could all make it through if we got super unlucky. And "unlucky" isn't good enough: once we deal with concurrency, you have to argue about every case, so you have to prevent all of this.

So we can all see that this implementation is bad, right? No arguments: this does not work, because multiple threads can call lock and make it through before another one even calls unlock. Other issues: it's not safe, since with two threads there's a possible way for both of them to be in the critical section, and it's also not very efficient, since the thread that doesn't have the lock is stuck in that while loop until the other one calls unlock. Why is it reading the variable over and over and over again when we know it's not going to change until the other thread calls unlock? There's no point in letting it execute. That is what's called a busy wait, because it's just burning CPU cycles on something you know will not change. Any questions about that at all?

All right, I left some additional time, so let's talk about lab four. Oh, lab four's fun. Oh, how would we fix that problem? We'll go into more of it next lecture. All we're up to today is that crappy lock; as long as we know it doesn't work, that's good enough, and we'll figure out how to fix it later. So, hey, all right, I assume people have not read this yet. Oh geez, all right. You are implementing your own user threads, and the interface will look a lot like pthreads, but the threads are not implemented as kernel threads.
You're going to be sharing the one kernel thread that starts executing your process, and creating a bunch of other threads like we did in lecture 18, when I showed you how to use ucontext to essentially create a thread. For this lab, I added a new section called progression: all the test cases are ordered, and you can actually pass the first test case by writing about two lines of code, so I suggest you do that.

Here's how the test cases work. Each test case essentially runs a test function, and that test function runs in a new process so the infrastructure can figure out, you know, whether your process terminated early or what happened to it. It runs in a new process and uses your library. You write an init function where you can do whatever setup you need, and then the test has a sequence of other calls that should hopefully work. As part of initializing your library, you should represent the currently running main thread as thread zero, so wut_id, which returns the ID of the current running thread, should return zero here.

All the test cases use shared memory: the test writes a value to shared memory, and the test infrastructure reads it as soon as your process is done. The test writes to each of these shared memory locations only once, and that's it; it checks all the values in a check function. If you get any errors, they look like this: that string gets printed. So if your wut_id returns negative one or something like that, the check that the integer at that memory location is zero fails, and it gives you the message that says the main thread is wrong. There's an include for your library and an include for the test infrastructure, which you don't really have to worry about. The only things you have to know about the test infrastructure are that there's a test function that runs until it completes, and then a check function that runs afterwards and checks all the values. Any questions about that? All right.

So let's try to explain a more complicated test case, and we'll explain what each function does along the way. You initialize your library, then you can get the ID of the main thread, and then you create new threads. wut_create essentially needs to initialize some structure (all the structures live in your library), and the only argument you give wut_create is the function to run when that thread executes. In this case, I create a thread that will run t1_run and then create another thread that runs t2_run. If you read the description of wut_create, it says you should assign IDs sequentially, with the lowest available number first. So if the main thread is thread zero, the first created thread should get ID one and the second should get ID two. You can see that in the test case where the return values go: the ID of the second thread should be one, and the ID of the third thread should be two. That's how you read them.
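To keep the walkthrough readable, here's roughly the shape of the test being described. This is a reconstruction from the lecture, not the actual lab code: record stands in for however the test infrastructure writes values to shared memory, and t1_run and t2_run are the test's thread functions.

    void test(void) {
        wut_init();
        record(wut_id());           /* expected 0: the main thread */
        record(wut_create(t1_run)); /* expected 1: lowest available ID */
        record(wut_create(t2_run)); /* expected 2 */
        wut_join(2);                /* thread 0 blocks; thread 1 runs next */
    }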
So in this one, how it works is you maintain essentially a ready queue of all the tasks. wut_create just creates a new thread and sets it up to run using makecontext, all that fun stuff we showed earlier, and you also have a queue that represents your waiting, or ready, threads (I showed you how to use a queue). So your library should have a queue: thread zero is still running after the create calls, and our ready queue holds one and two. In this case, we're just executing in FIFO order.

So if we do something like join: wut_join does what pthread_join does, so here it means thread zero is going to wait for thread two to finish. That means thread zero cannot execute anymore; it essentially blocks. Thread zero blocks, and because thread one is at the front of the ready queue, it's the next thread that starts executing after this join call. So thread one starts running its t1_run function.

There's also this fun function we have called wut_cancel. wut_cancel is essentially a kill -9 for threads: the thread with that ID is immediately terminated. This gets into how cancel interacts with join. Thread one cancels thread zero, which means thread zero is now terminated, it is dead, and nothing is attempting to join on thread two anymore, because what thread zero was doing when it got mercilessly killed was joining on thread two. So now, because it got killed, nothing is waiting on thread two anymore.

Next, thread one joins on thread zero, which is terminated at this point. What join should do here is immediately return, not switch to a different thread or do anything crazy like that, and what it returns is the status of that thread. The status rules are pretty easy: if the thread got canceled, its status is 128; if it terminated normally using exit, its status is zero. So here we write out the result of join, and it should be 128.

Then the next step is thread one joins on thread two. That means thread one can no longer execute, because it's waiting for thread two to terminate, and we need to figure out something else to run. At this point our ready queue has only thread two in it, so that's the only thread we can now execute; we start executing thread two, and there's nothing else in the ready queue because thread one is now blocked.

Whoever wrote this test made a global variable called x up here, and because it's a global, it'll be affected by everything. Thread two increments it, reads it to make sure we incremented it from zero to one, and then creates a new thread. That new thread should get ID zero, because that's now the lowest available ID: thread zero has been joined. The new thread is set up to execute null_run, which doesn't do anything; it just returns. After that, we write the ID we got, to make sure its value is zero. At that point t2_run is done, so you need to implement an implicit exit. That's not done for you like it is in pthreads; you're implementing the threads, so you have to do it yourself.
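Stepping back, here's one plausible way to organize the library internals, a sketch assuming the sys/queue.h TAILQ macros from lecture; the struct and field names are mine, not the handout's:

    #include <sys/queue.h>
    #include <ucontext.h>

    /* One entry per thread: its ID, its saved context, its exit status,
       and the link used to chain it onto the ready queue. */
    struct thread {
        int id;
        ucontext_t context;
        int status;              /* 0 = exited normally, 128 = canceled */
        TAILQ_ENTRY(thread) pointers;
    };

    TAILQ_HEAD(thread_queue, thread);
    static struct thread_queue ready_queue =
        TAILQ_HEAD_INITIALIZER(ready_queue);

    /* FIFO scheduling: wut_create appends with
       TAILQ_INSERT_TAIL(&ready_queue, t, pointers), and whenever the
       running thread blocks or exits, the next runnable thread comes
       from TAILQ_FIRST(&ready_queue). */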
There are essentially two ways to set up that implicit exit: you can use uc_link and make sure it goes to some other context you control that does some cleanup, or you can essentially pass an integer argument to the run function, use it to figure out which function you actually need to execute in there, and make sure that ends with a wut_exit.

So in this case, when thread two implicitly exits, thread one should unblock and get added to the ready queue. The ready queue is now: thread zero (the one we created to run that null_run function), then thread one. Since thread two is done, we just execute whatever is next on our queue, which is thread zero. Thread zero starts executing and immediately exits, so the only other thread left to run is thread one, which should resume from its join call. It returns from that join with the status of thread two; thread two exited normally, so it should get a status of zero. Then, to make sure we're really executing thread one, the test increments that global variable, so it should go from one to two, then we read it, then thread one exits here, and the process terminates because it's the last thread that can execute. Any questions about that? This is a hard test case.

Yep, so t2_run just increments this global variable and then creates a new thread that executes null_run, which doesn't do anything. And yeah, wut_create returns the ID of the newly created thread, and it's always supposed to be the lowest available number. You can only reuse a number after another thread joins on it, and then you can release all of its resources and everything like that. If we didn't join on thread zero, that number wouldn't be available, and we probably would have created thread three or something like that. So why did we create thread zero? Let's see, at what point... yeah, we went here in t1_run: we canceled thread zero and then we joined it. Because we joined it, our library can reuse ID zero, since ID zero doesn't represent a thread that exists anymore. So now we've essentially freed up ID zero; if we do wut_create, we get ID zero because it's now available. If we didn't join, we'd essentially have a zombie thread, right? It keeps its ID until something else joins on it, so we could still join on that same ID if we want, and if we created a new thread, we'd get ID three.

All right, any questions about that walkthrough? Yeah, you should either use uc_link to have something clean up at the end and call wut_exit, or just set up your makecontext so that it calls a function that runs whatever the requested function is and then does a wut_exit at the end. There's no one way to implement it; it's one of the only tricky things to do in this lab. Other than that, it's just maintaining a queue.
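As a sketch of that second, wrapper-based approach, under the assumption that wut_exit and the thread struct exist as described; MAX_THREADS and the other names here are made up. The reason for passing an ID instead of a function pointer is that makecontext only portably forwards int arguments:

    #include <ucontext.h>
    #include "wut.h" /* assumed to declare wut_exit */

    #define MAX_THREADS 128 /* assumed limit, not from the handout */

    static void (*entry[MAX_THREADS])(void); /* filled in by wut_create */

    /* Every thread actually starts here: run the requested function,
       then guarantee the implicit exit if it just returns. */
    static void wrapper(int id) {
        entry[id]();
        wut_exit(0);
    }

    /* Inside wut_create, roughly:
         getcontext(&t->context);
         t->context.uc_stack.ss_sp   = stack;     // a malloc'd stack
         t->context.uc_stack.ss_size = SIGSTKSZ;
         t->context.uc_link          = NULL;      // or a cleanup context
         makecontext(&t->context, (void (*)(void)) wrapper, 1, t->id);
    */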
All right, so any other questions? We've seen the code, so make sure you start with that. You have early testing: it just has a main function, you write whatever you want, write the checks, and you will get the results. So if you have any questions about the behavior, like what happens if you cancel or join a canceled thread, or you join a thread before it gets canceled, or any weird interaction that you think is an edge case, go ahead and throw that in a test. All you have to do is call check on any return value, and then I will run the solution on the code you submit, and you will know the result and know what the solution should do. Other than that, read the function descriptions. Hopefully they're clear; if they're not, let me know. And with that, just remember: lab four is fun for you.