All righty, welcome back to Operating Systems. So today we're talking about locks. What are locks for? Glad you asked. So remember lecture 17, where we had this example with eight threads, each trying to increment a global variable 10,000 times. Whenever we executed this thing, we expected the final result to be 80,000, but every time we executed it, we got a different number. It was never quite 80,000. And it basically told us how unlucky we were. So what is this problem we've encountered? We have to give it a name and identify it. The name for the problem is a data race, and it occurs whenever you have concurrency where you're modifying the same memory location. So data races occur when you're sharing data, and they can even happen on a single-core machine. A definition of a data race: it's when two or more concurrent actions access the same variable, and at least one of them is a write operation. If that's true, then you have a data race. And a data race basically just means one of the threads might get some old stale value, and you might have some weird unexpected results. So before we can argue more about data races, we have to talk about atomic operations. Atomic operations are operations on your CPU that either happen or they don't happen. There's no in between. They either happen all at once or they don't happen at all, no pausing, kind of like atomic particles or whatever: you can't divide them up any further, they either exist or they don't. So that means you can't preempt an atomic operation, because it either happened or it didn't. But you can have preemptions between two atomic instructions. So you might get preempted at any time in between any two atomic operations. So if we want to argue about that — and this will be a preview for your compiler course if you want to take it — compilers don't just take C code and spit out assembly code.
They do a bunch of intermediate steps. And one of the ways they handle all the different computer architectures is they generate something called intermediate code. Typically the intermediate code will be three-address code. This is what compilers use internally to do analysis and then optimizations on; they won't do optimizations directly on the instructions for your machine. In three-address code, each statement represents one fundamental operation, and you assume each of them is atomic. We can use these to reason about data races because they are the atomic operations, and it's also easier to read than assembly. All of the statements will look like this, and some of the arguments are optional, so this is as complicated as a statement will get: it assigns some value to the result, which can be up to operand one, some operator, and then operand two. So it can only deal with two things at a time and do some operation on them. That means it can't represent something like one plus two plus three, because that's too much for a single statement. If you wanted to do that in terms of these statements, you would need two: you would need a temporary location to save one plus two into, and then a new location for the final result, where you add the old temporary to three. So you'd have to do it over two steps. The specific three-address code used by GCC is called GIMPLE. Don't ask me why it's called that; we kind of suck at names. Look at the names for all your labs. So yeah, I can't really blame them. But if you want to see this GIMPLE, which looks like those statements we saw before, you can add the -fdump-tree-gimple flag and you'll see all of the three-address code for your functions. If you want to see everything, you can use -fdump-tree-all and you'll see a whole bunch of compiler-internal information.
It's a bit easier to reason about your code this way than with low-level assembly, since not everyone likes to read assembly — probably most of the class. So it's easier to reason about even if you don't know the specifics of GIMPLE; it's a lot easier to read. For example, in our data race example, if we take that ++count we went over last time and look at the three-address code for it, the GIMPLE would look like the following. Anything with D dot some number is essentially a register, and in three-address code you have an infinite number of registers, so every statement gets a new one. The first operation is the load from memory. Just to make things explicit, we change the count variable to a pointer to count, because the global variable lives at some virtual address. So this is a load from memory into a register. The next operation: D2, our temporary register, is set equal to D1 plus one. That's our increment. And our final operation is our store, or our write: we write whatever the result of the increment is back to that global variable. So now, like we had before, assuming two threads execute this and initially our global variable equals zero, what are all the possible values of that global variable? Can it equal two if two threads try to increment it? Yes, hopefully. Can it equal one? Yeah, it can equal one, because we can have this situation. If we want to analyze all of the data races, we essentially have to analyze every single possible preemption that we might have. The two things we care about are the read and the write to that global location. For thread one, we'll call the read R1 and the write W1, and the read always has to happen before the write, because within a thread it won't do things in a willy-nilly order.
It will execute from beginning to end within a thread. We'll call the operations in thread two R2 and W2. So we assume that within a thread there won't be any reordering of instructions; each thread again only does a read and then a write. You can see how this gets complicated really quickly, because your actual hardware might reorder instructions. It probably wouldn't reorder this read and this write, but if you have more than one variable, it could reorder operations between the variables, and your compiler can reorder stuff too — you can quickly see how this gets out of hand. So even with just a read and a write between two threads, we have six possible orderings. Either thread one or thread two executes the first atomic operation. In the first case up front here, thread one executes first: it reads the global variable, so it loads a zero. If we continue executing thread one in the top case, it does its write and updates it to one, and then thread two executes: it reads a one, increments it, and writes out a two, which is probably what we expect. The other possibility is that after thread one reads, we might preempt to thread two, in which case thread two does its read, so it also reads the value zero. At this point, we could either preempt back to thread one or continue executing thread two. If we preempt back to thread one, thread one writes the value one, then we preempt back to thread two, and thread two also writes the value one. The other ordering is that thread two continues to execute, writes one, and then context switches back to thread one, which also writes one. And it's the same thing if the other thread goes first. So you can quickly see how this gets out of hand — and this is only two threads with just a read and a write.
So imagine all the possible orderings on your machine if you have eight cores updating one variable. You can kind of see why it was actually a miracle that we got 80,000 at all. Any questions about that? This gets real fun real quick. So in order to prevent this, we can create something called a mutex, which stands for mutual exclusion. That should mean only one thread can be doing a given thing at a given time. I'll show you how to use them on the next slide, but first you need to create them. You can create a mutex either statically or dynamically. I could create a global variable called mutex m1, and if I want to initialize it, I set it to this define, and that creates you a default mutex. If I want something more dynamic, in the style of pthread_create, or like a malloc and a free, I can declare a mutex called m2 anywhere, and to initialize it I call pthread_mutex_init and give it the address. Then, similar to pthreads where you can set attributes, the second argument is a pointer to attributes — but for this course we'll just use NULL, we'll get a default mutex, and that's it. In either case, whenever you're done using the mutex, like with memory, you have to free it: for mutexes, you have to destroy them when you're done with them. And that's how you create them — two options, either as a global variable with the static initializer, or anywhere you want followed by mutex init.
So how you use them: a mutex has essentially two calls, lock and unlock. For any code you want only one thread to execute at once, you call mutex lock. One way you can think of this protected code, where you only want one thread executing at a time, is, I don't know, a bathroom or something like that. Ideally only one person is in the bathroom; no one else follows you in. So this lock call should behave like the equivalent of locking the bathroom door and keeping the key with you: if someone else comes and tries to use it, they can't. You'll only have one person in the bathroom at a time, just like you'll only have one thread executing this code at a given time. And if only one thread executes this code at a given time, we don't have data races, because we can't have any concurrency within that protected code. Then whenever that thread is done with the lock, it calls unlock, which essentially unlocks the door and puts the key back, and then we can have all the concurrency we want in the rest of the code — but only one thread at a time can execute the protected code. Now, with locks we get more problems as well. Ideally you want that locked region to be as small as possible, because all the protected code runs serially, which is slow. So sometimes you might want multiple mutexes that protect different variables instead of reusing the same one for everything. In that case, there's a problem called deadlock you can run into if you have multiple mutexes. For now we'll only use one single lock, and we'll talk about deadlocks later.
There's also this trylock that will not block and will tell you whether or not you got the key. So again, how it works is the first thread that makes it to lock grabs the key and gets to execute the protected code. If another thread tries to come in while the first thread is executing that protected code, it hits the lock, and that call is blocking. So the other thread blocks and waits until the thread currently executing that code is done with the lock, and then it gets in. Any questions about how that works? All right — yeah. So the question is: if there's a bunch of threads waiting at this lock call, say seven threads waiting, and the thread holding the lock eventually calls unlock, what happens with the other threads? Is it random? Do they all just try? Is it just a big fight, or what? That will be a topic we'll get into. But ideally, you kind of want a queue, right? Because if we're doing operating systems, we want things to be nice and fair, so they should probably get it in more or less the same order they asked for it. We want to make sure we don't starve anything, and we'll get into that later. But yes, there are considerations once multiple threads are trying to lock at the same time: who should get the lock next? All right, so let's just try to use it. With our code, let's go back, and all we will do is create a mutex and then do a lock and an unlock. If we do a lock and an unlock around ++counter, that means only one thread can be incrementing it at any given time, in which case we should not have a data race, and the counter should be 80,000 every single time we execute it. So here we go: 80,000, 80,000, 80,000. We finally fixed our code, and this is the proper fix for it.
But this fix is also kind of lame, because this is essentially running in serial, and that's pretty much the only thing the threads are doing. As you might imagine, if we have eight threads and only one can be doing this increment at any given time, most of the time seven threads will be sitting at the lock, all waiting for it. And if there's a queue, you can imagine that's probably a lot more complicated than just incrementing a variable. So this probably actually makes our program much slower than the serial version. That's another thing: if you need to fix these issues, mutexes are a way to do it, but keep in mind that mutexes are not free — they take time to execute too. And if your goal is to make things run faster, this might not be a great idea. Oh, sorry, was that a question? No? Okay. So the code between the lock and the unlock is called a critical section, which again just means only one thread executes it at any given time, and it should have a few properties. The first is safety, a.k.a. it's really mutually exclusive, which means only one thread executes that critical section at once. There are other properties as well. We should have liveness, a.k.a. progress: if multiple threads reach that critical section, one and only one should proceed; they shouldn't all just sit there and go, I don't know what to do. And the critical section shouldn't depend on any outside threads at all, or you can mess it up and deadlock — we'll get into deadlock later; it basically means no threads can make any progress: two things are waiting for each other to do something, you can't resolve it, and none can make progress. The next property is bounded waiting. Kind of like scheduling, we don't want to starve any threads or processes. What bounded waiting means is: if a thread makes it to lock, it must eventually proceed.
So it can't be the case that a thread just gets stuck there forever because other threads keep butting in front of it. That would also suggest a queue if we want bounded waiting. Other goals for our critical section: we want it to be efficient, so ideally we don't consume resources while we're waiting. We want it to be fair, so each thread waits approximately the same amount of time. And it should be simple — easy to use, hard to misuse. So, similar to libraries, you have layers of synchronization in your programs. The lowest level is the atomic operations provided by your CPU, which we'll get into next lecture. Even without knowing them yet, those are the operations used to build high-level synchronization primitives, and an example of one of those is the mutex we saw. That's just a high-level tool that allows you to create critical sections. We'll see other synchronization primitives as we go on in the course. And then, using things like mutexes, you should be able to create a properly synchronized application, which means it does not have any data races and behaves the same way every time — you don't get any weird unexpected results or bugs. Now, if you wanted to implement a lock and your system only had a single core, and the only source of concurrency was interrupts, you could implement a mutex by doing this: your lock function could just disable interrupts completely, and then you get rid of concurrency. There's no way to switch to any other running thread or do anything else, so you'll be the only thread executing that critical section. And then, as your unlock, all you do is re-enable interrupts — you re-enable concurrency and let the system switch between threads as much as it wants.
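As a conceptual sketch (pseudocode only — the actual interrupt instructions are privileged, and the names here are just illustrative):

```
lock():
    disable_interrupts()    ; nothing can preempt us now; we are the
                            ; only thread that can run on this core
    ; ... the critical section runs with concurrency turned off ...

unlock():
    enable_interrupts()     ; preemption, and thus concurrency, is back on
```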
But this disables concurrency completely, and it's not going to work if you have multiple cores on your machine, because you're not preventing interrupts on any other core — or any other core from just executing at the same time. Also, that would disable hardware interrupts on your machine, and no operating system is going to give you a system call that lets you change the hardware like that. So it's an idea, but it's clearly not going to work in general. For the simplest systems, though — if you just have an embedded system with a single core and your only source of concurrency is interrupts — you don't have to implement something complicated; you can just do this. So, let us try to implement a lock, a mutex, ourselves in software. We're just going to use a number to represent whether or not a thread has the key. We'll use zero to indicate that the key is available, and one to indicate that some thread has the key. In the init, all we do is initialize that int to zero. Then in the lock, the idea is: while the value of the lock is one — which should indicate that some other thread has acquired the lock and is in its critical section — and there's a semicolon at the end of the while, which means this just retries over and over again. So if the value of the lock is one, it goes back and checks the condition again, and again, and only when it reads a zero does it break out of this while loop — and then it should be the thread that acquires the lock.
So it writes the value one to indicate that it has the lock, and the idea is that if any other thread calls lock, it will just be stuck in this while loop until the thread that acquired the lock eventually calls unlock, setting the value from one back to zero; then, if another thread was waiting, it can break out of the while and grab the lock. So if we look at this, are there any issues with this implementation? The biggest question would be: is it possible that two threads call lock and both make it through? Because it should be the case that, no matter what, if two threads call lock, only one makes it through and the other has to sit there and wait in that while loop. All right, any ideas? Yeah — two threads call lock, one reads a zero, then we switch to the other one, which also reads a zero, and then they both pass, right? We understand how that might occur. So yeah, that's a possibility, in which case this doesn't do anything; it's useless as a locking function. Anyone want me to go over that in detail, or are we all good with that? We're all good? Okay. So, for the same reason as before, this won't work — and it fails for two reasons. One, it's not safe: both threads could be in the critical section at once, because they could both pass the check. And two, it's not efficient: even when it does work — say thread one executes lock before thread two does anything, so it acquires the lock and sets it to one — if we context switch over to thread two, all it does is endlessly execute this while loop for as long as it runs. It just spins over and over, and that's a waste, because it can't make any progress until thread one unlocks, so why bother trying again? Those are the problems we will tackle next lecture.
So, as announced, I left lots of extra time so we can quickly go over lab four, because I imagine none of you have actually read it yet. Oh, one person has. All right. If you want to leave, you're free to, because we'll be talking about this the whole time. For this lab, I added a new section that should help walk you through how to do it. First, you should read what each function does. You're essentially implementing user threads, so it should have the same form as the pthread library, just that you're creating user threads. So there's wut_init, where you can set up any data structures you need. wut_id should return the ID of the currently running thread. And if you want to create a new thread, you use wut_create. Unlike pthread_create, which initializes a structure and takes a function and a bunch of attributes and stuff, you're just telling the thread what function to run when you create it, and wut_create essentially involves all that ucontext stuff we did before: you'd have to make a context and tell it to execute this function. wut_yield stops the current thread from running and switches to the next one in the ready queue. Let's just go through the progression steps too, and look at an easy test case. I ordered the tests in the order you should be able to pass them as you start implementing, so you don't even have to do any context switching to pass the first case. So let's go look at the first test. The first test just says that the main thread is zero. How the test infrastructure works is: there's a test function that runs in a new process. The test infrastructure runs this until completion, and this contains the actual library calls that test whatever we're testing.
So in this case, they always start with wut_init to initialize your library, and then wut_id, like I said before, should return the ID of the currently running thread. The idea here is that the original kernel thread running your process should be thread zero, so in this case wut_id should return zero. It writes the result to some shared memory location — the test infrastructure uses shared memory so you can write integers there — then runs your test until it completes, and the parent process checks all the values to see if they are what you expect. The check function always runs after the test runs to completion, and it's just a bunch of expect calls: it compares each value to the expected value, and if they don't match, it prints off a message. So in this case, wut_id should return zero, and it checks that the return value it got from your wut_id is equal to zero; otherwise it gives you the message that says, hey, the main thread should be zero. Any questions about how the test infrastructure works? It runs test, then afterwards it runs check. And it does this in a separate process, so some of the tests can check that you actually terminated the process and all that fun stuff. So there are a bunch of other test cases; I'll step through them and explain what the functions should do. This is a much more involved test case. Again, it starts off by executing test, which always starts with initializing your library. Here it does the same check: it wants to make sure the ID of the currently running thread is zero, so this should return zero. Then, as part of creating a new thread, the rule is you should assign IDs sequentially, and you should always use the lowest available ID. IDs start at zero and go up sequentially.
If a lower one becomes available, you should reuse the lowest number, kind of like file descriptors. So here, if we create a new thread and tell it to execute t1_run, well, ID zero has already been used, so the next available ID should be one, and this should return one. And when you create a new thread, you shouldn't immediately switch to it and start executing it — it should just do the makecontext and set up everything you need. Here it creates another thread that wants to run t2_run, and it should get ID two, because that's the next available one. So currently thread zero is running, and our ready queue — which should be a queue; I showed you how to use one in the last lecture — should be thread one and then thread two, because we created thread one first. Then all it does is join on two. So you're implementing a join, which is kind of like a wait. This means thread zero, the currently executing thread, joins on thread two, so thread zero is blocked: it cannot continue executing until thread two has terminated. As part of this, I can't run thread zero anymore, so I should run whatever the next thread in the queue is: I start executing thread one, and thread one starts at t1_run, so it starts over here. There's this fun function I threw into the lab to make things more entertaining: wut_cancel is essentially like kill -9, but for your threads. So if I cancel thread zero, that means thread zero has now been terminated. It cannot run anymore, because it's terminated — it's dead — and it's not attempting to join thread two anymore, so nothing is waiting on thread two to finish. Then thread one also joins on thread zero. So it's exactly like processes and pthreads.
Right now, right after the cancel, thread zero is like a zombie thread, because I can't clean up all of its resources — or at the least, I can't reuse its ID for anything — until I join it. After I join thread zero, I can clean up all the resources that have to do with thread zero: ID zero is no longer in use, and if I create a new thread, it would get ID zero. The next step here is that thread one joins on thread two. So thread one blocks, and the only other thread to execute is thread two. Thread two wants to execute t2_run, which is here. This person made a global variable initialized to zero, so when thread two starts running, it increments it from zero to one, writes it to this location so they can check that its value is actually one, and then creates a new thread. If they create a new thread here — oh, I need to delete that comment — okay, if they create a new thread here, the lowest available ID is now zero, because we joined on it. So this should return ID zero, and the new thread will want to execute this null_run function, which doesn't do anything. Here it just checks the value it got from wut_create, and then t2_run is done. What you need to implement is that the thread implicitly calls wut_exit whenever its function finishes, if it doesn't call it already. There are two ways to implement that. You can use uc_link, like we saw before — although obviously, when you create the thread, you don't know which thread you'll want to switch to when it finishes, so you'd have to think a little bit about that. You'd need to create a separate ucontext that does some cleanup for you, essentially your own internal thread that isn't visible as part of the library. Or, if you want, you can pass an argument, and in your makecontext you don't call t2_run directly.
You can call a different function that calls that run function and then calls exit. Again, that's up to you. But at this point, thread two terminates, because it calls exit. So thread one is now unblocked, because the thread it was joining on is done, and it goes to the back of the queue. Our queue is now thread zero and then thread one. So we execute thread zero, which just executes this: it calls exit and terminates. Then thread one continues executing from here. Thread two died normally, so its status should be zero. And there are only going to be two statuses: if a thread exits normally, its status is zero; if it got canceled by something else, its status is 128. Then here it increments the global from one to two, checks it, and then thread one implicitly exits, and at that point there are no other threads to execute, so the process should end. Any questions about this, or anything about the lab while we're here? So this is your first step. I've walked through a bit of it; read the other functions, and give me feedback if you want them described more — I've already changed them a little. Then there's the early testing part. All you have to do is modify this test main.c file and do whatever library calls you want. So we'll start with wut_init, and then, if you want to know, I don't know, other ways that cancel interacts with join or anything like that, just write your own test case and use that check function in the code, which basically just checks an integer. The idea behind it is that you don't have to implement any of the lab: just write a test case, and I will run the solution on that main.c file, and you'll get the actual result from the actual solution, so you'll know what it should do. Any questions on this? Yeah.
Yeah, so whenever a thread joins and gets blocked, then whenever the thread it's joining on terminates, it gets sent to the back of the queue. Yeah — because it's also the equivalent of creating a new thread, which would also go to the back. Yeah. So I suggest, to make it real easy, you just have one variable that says which thread is currently running, and then a ready queue. That's all you really need. And let's see, progression: for "main thread is zero", if you just have a global variable with the ID of the currently running thread, in wut_init you set running equal to zero, and wut_id just returns the running thread — boom, you've passed the first test case already. So that should probably be your first step. If you want to feel good, just write like two lines of code — boom, you'll pass one test case and be that much closer to finishing the lab. All right, any other questions? Yeah. So with the early testing part, let's see if I can switch the window. You can't see it? Cool. All right, so with the early testing part — I should probably add some more stuff for this — this is the test main.c file. You just write whatever you want here, and I will run the solution on it. It should always start with wut_init, and then literally whatever you want. You could check what the ID should be, something like that, then, I don't know, write your own function and create a thread that runs it. So you could do something like that — that's a test case. If you try to run it, well, the default implementation returns negative one for everything, so if you run it on your code, it'll just say check negative one, check negative one, check negative one. But I'll run it on the solution and then tell you whatever the solution outputted here.
So in order to get the 5%, you just make a test case that calls check one or more times, I run the solution on it, and if I get output, you get the 5%. It's mostly for you: if you want to know, oh, how does this interact with that, just write a test case here and I will run the solution on it. All right, any other questions? All right, then we are free. Go pass the first test case — you have enough time to do that. Eight minutes is lots of time to write like two lines of code. All right, so just remember: we're in this together.