All righty, welcome back to Operating Systems. Last lecture, we implemented the world's crappiest lock. It didn't really work, and it wasted a lot of time and resources. Today, we're going to create a better lock that does not suck. You can actually implement good locks yourself in software with minimal hardware support if you really want. The bare minimum hardware requirements are that loads and stores are atomic and that your CPU executes instructions in order. Using just those primitives, there are two main algorithms: Peterson's algorithm, and Lamport's bakery algorithm, which basically simulates walking into a bakery, grabbing a number, and having numbers called in order to implement a lock. The problem with these is that they're more computer-sciencey: they don't really scale well, and no real hardware or real software actually uses them. So the atomic primitive we usually have on our CPUs for implementing locks is this magical atomic function called compare and swap. It takes a pointer to some location holding an integer, an expected old value for that integer, and a new value. Atomically, it tries to change the value pointed to by p from old to new, and it always returns the value that was in memory before the call. So if you asked it to change a 0 to a 1, you know the swap happened if it returned 0. So here is our compare and swap. The pointer is our lock, which is just a pointer to an integer, and the other arguments say: if the current value is 0, please change it to 1. This function returns 0 if and only if it successfully made that swap.
Otherwise it returns whatever the value of that memory location was before the call. So if compare and swap returns 0 here, it successfully changed the 0 to a 1. If it returns 1, the value was already 1 and it didn't do the swap. All of that is atomic: it checks the value and swaps it if it matches, as one indivisible step. Any questions about compare and swap? With that, we can actually implement a lock. If we change our lock implementation to this: compare and swap returns 0 if it successfully, atomically changed the value from 0 to 1, indicating that this thread now holds the lock, in which case we break out of the while loop, because this is the thread that changed it from 0 to 1. Only one thread can get past this lock call. While the value is 1, any other thread that calls lock will do a compare and swap on that memory location, see that the current value is 1, get back a 1, and loop here infinitely, like we saw before. So in this case we don't have any data races, because the compare and swap is atomic. This is a correct but not efficient implementation of a lock. Everyone agree with that? It's not efficient because if one thread acquires the lock, changing the value from 0 to 1, another thread will just continuously read it over and over, retrying even though we know the value isn't going to change. But it works. This specific implementation is actually used in practice, and it's called a spin lock because of that while loop: it will just try again and again and again. Any questions before we move on? This is a perfectly valid implementation of a lock; it just has the little deficiency that it continuously retries over and over again.
All right, we're good. So that's why I said before, it's essentially a spin lock. And that compare and swap is available on any modern CPU; it's the atomic primitive that everyone depends on. On x86, it's called cmpxchg. Weird name, right? But we're bad at naming things; it's basically short for "compare and exchange." It might be called compare-and-exchange, might be called compare-and-swap; they're all essentially doing the same thing, just how you use it might differ a bit. So we still have that busy-wait problem for our lock, where one thread will just retry over and over again while another thread holds the lock. Again, if we have a single-core CPU and we don't acquire the lock, if we were smart, we would just yield. Hopefully you've started implementing that in lab four, or at least thinking about it. So we can make this a bit better: if we don't acquire the lock because the current value is one, we just yield, and hopefully another thread can execute that will actually unlock that lock. If you have a multiprocessor machine, it might be worthwhile to keep retrying, depending on how long that other thread holds the lock. The longer it holds the lock, the more time you waste retrying; but in some cases it might only hold it briefly, and trying again and again might actually not be that bad. So here is our implementation that just adds a yield: while compare and swap returns one, meaning another thread has the lock and I wasn't able to change it from a zero to a one, I will just yield. But now we have another problem. Isn't that funny? Every time we try to make something better, we introduce another problem. Our new problem is something called the thundering herd problem. What's that?
Say we have eight threads, and one thread successfully acquires the lock, leaving seven threads trying to get it. When that first thread unlocks, all seven are going to try to acquire the lock, when really only one of them can get it. Why have all seven fighting for it at once? That's the thundering herd: as soon as it's unlocked, seven threads immediately pile on trying to acquire it. And in this case, we don't have any fairness either. As soon as that thread unlocks the lock, any of the waiters could acquire it. We have no idea which, and ideally we would like to control it so that a thread only waits for the lock for a set amount of time. You want to be able to reason about it. So this says: we should probably introduce a queue, so that whenever a thread tries to acquire the lock, it actually gets it in the order it stood in line. I'll do this in pseudocode. In the lock, if I fail the compare and swap, I add myself to the queue to wait for the lock and then put myself to sleep. Sleeping just blocks the thread until someone actually wakes it up. Then in the unlock, whenever the one thread that has the lock calls unlock, it sets the value back to zero and then checks: if there are any threads in the wait queue, it wakes up whichever one is at the front. So it wakes up exactly one thread. There are two issues with this implementation. One is called a lost wakeup, and the other is very well named: it's just called "the wrong thread gets the lock." I don't have much of an imagination. So let's look at that code for a little bit and see if we can argue what conditions would cause a lost wakeup, or what conditions could cause the wrong thread to get the lock.
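To have something concrete to point at, here's that broken queue-based lock in pseudocode, roughly as described above. `wait_queue`, `thread_sleep()`, and `thread_wakeup()` are hypothetical kernel primitives, not real APIs, and this version deliberately keeps both bugs:

```
lock():
    while compare_and_swap(&lock_value, 0, 1) != 0:
        wait_queue.add(current_thread)   // (a) gap: CAS and enqueue are not atomic
        thread_sleep()                   // block until someone wakes us

unlock():
    lock_value = 0                       // (b) gap: lock is free before we check the queue
    if not wait_queue.empty():
        thread_wakeup(wait_queue.pop_front())   // wake exactly one waiter
```

The gap at (a) is where a lost wakeup can hide, and the gap at (b) is where another thread can swoop in and take the lock out of turn, which is exactly what the next two walkthroughs show.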
So again, this is essentially the same problem as a data race. A lost wakeup means that one thread puts itself to sleep and then never wakes up. It's essentially in a coma, which is not good. So let's have a look and try to devise a situation where one thread is in a coma forever. Yep: it would check the queue, find it empty, while the other thread is here. Everyone else get that? All right, let's walk through that example. Say thread one has the lock; it was the thread that set it from zero to one, so it should be the one that calls unlock. So thread one is about to call unlock, and another thread, thread two, is trying to acquire the lock. Let's start executing thread two. Thread two does its compare and swap and sees that the current value of the lock is one, because thread one has the lock. It's about to execute the next line, but before it does, we context switch to thread one. Thread one sets the value of the lock to zero and checks whether any threads are in the wait queue. Currently there are none, so the condition is false, it skips the wakeup, and it finishes unlock. The lock is now unlocked. Now we context switch back to thread two, which adds itself to the wait queue and puts itself to sleep. And now it will sleep forever: thread one, the thread that should have woken it up, never will. Everyone get that? Yeah, if another thread, say thread three, comes along, it might acquire the lock, unlock it, and wake thread two up. But if you have a giant queue, one of the wakeups is still lost: there would always be one thread left sleeping.
So in this particular case, yes, a new thread just doing a lock and an unlock would clear it up. But with a bigger queue, we would still lose a thread. All right, any other questions about that? So that's the lost wakeup. Now, what about a situation where the wrong thread gets the lock? As a preamble, let's assume thread one has the lock and thread two is in the wait queue, so thread two should be the next one to acquire the lock. Say we also have a new thread called thread three. Is there a situation where thread three acquires the lock before thread two? Yep, everyone agree? Got thumbs up, so let's just go over it real quick. Thread one has the lock, which means the value of the lock is one. What could happen is thread one starts executing unlock: it changes the value from one to zero, and then it gets context switched right there. The value of the lock has now changed to zero, and we context switch directly to thread three, which starts executing its lock call. It does the compare and swap, the current value is zero, so it changes it from zero to one, and now it has the lock. It just came right in there and stole the lock from thread two. Everyone see how that happened? All right, not good. So now we have to fix those problems. Here's the lost wakeup example again, and also the wrong thread getting the lock, although there I just named the threads differently; I swapped thread one and thread two, but same difference, as long as you're consistent it doesn't matter. So here's how we fix it. This looks a bit ugly, but essentially, let's swap back to this: what we're doing here is implementing our mutex. So let's see.
Our mutex here holds a bunch of things: the queue of threads that are waiting for the lock; the lock value itself, which is the external one all the threads use to make sure only one of them is in the critical section; and a new field called the guard. The guard is a spin lock that we use purely internally, to prevent data races inside the lock and unlock calls themselves. So where the guard variable is used, an easier way to read that whole line I highlighted is that it's equivalent to just locking the guard. And whenever we set the guard back to zero after doing a compare and swap on it, that's the equivalent of unlocking the guard. It's an internal spin lock we're using to implement lock and unlock without data races. So: we lock the guard, and therefore no other thread can interrupt us, because we're in a critical section. Then we check the current value of the lock. We don't need compare and swap here, because we already hold the guard, so we have mutual exclusion: only one thread can be executing this at a time, so there's no data race in just reading the value. If the lock is currently zero, we don't need compare and swap; we just set it ourselves, changing it from zero to one. We now hold the mutex, so we unlock the guard so that another thread can call this, and we return from the function. Otherwise, if we can't acquire the lock, we put ourselves onto the waiting queue, with no data races on the queue because we hold the guard, then unlock the guard and put ourselves to sleep, waiting for someone else to wake us up. The unlock works the same way: we use the guard again to prevent data races.
So here it's the equivalent of locking the guard, and at the end, setting it to zero is the equivalent of unlocking the guard. First, a thread that wants to unlock acquires the guard, so there are no data races on the queue. Then it checks whether the queue is currently empty. If it's empty, I'm good: I just set the lock from one to zero, and the lock is now free for other threads. Otherwise, if there is a thread in the queue, I can just transfer the lock to it. I don't change the value; the lock stays one. All I do is wake up that thread. When it wakes up, it resumes from the sleep, exits the loop, makes it past its lock call, and now it has the lock. Any questions about that, or any potential issues we see here? Yeah, so this solves our problem. Remember the case where another thread came in and swooped the lock? Let's see what happens now. Thread one has the lock, thread two is in the queue, and thread three starts; that was our sequence from before. And the problem before was that thread one changed the value to zero and some other thread swooped in. Now, when thread one starts executing unlock, the value of the guard is currently zero because no one has it, so thread one acquires the guard spin lock and starts executing this. As soon as it makes it past that point, it holds the guard. Now say thread three starts to execute: it's stuck at the lock of the guard, so it's not going to get past. There's no way for it to get in and swoop the lock. Because of this, thread one has to continue in order for anyone to make progress.
Here thread one checks the queue: oh, there's something in it, so it transfers the lock by waking up thread two, and thread two gets the lock because it was waiting in line. There's no possible way for thread three to swoop in and take it. All right. Now, does everyone see there's still a very slight issue? There is a situation, let's use the same setup: thread one gets here and wakes up the next thread, which is thread two, and thread two acquires the lock. Let's see what happens if we get super unlucky. Thread three is trying to acquire the lock, and say some other thread currently holds it. Thread three can get past the guard, checks the current value of the lock, and since another thread has it the value is one, so it puts itself in the queue, unlocks the guard, and then gets interrupted right there, just before it puts itself to sleep. Say it's the only thing in the queue right now: so the queue contains thread three, which has added itself but is not yet asleep. Now thread one comes in and calls unlock. It can acquire the guard, because thread three released it after adding itself to the queue, so thread one can make progress. And then, oh no: it checks whether the queue is empty, it's not, so it tries to wake up a thread, but that thread is not yet asleep. That is a slight issue. Luckily, this is the only kind of data race left, and because the thread is already on the queue but not yet asleep, you know the next thing it's going to do is put itself to sleep. So the easy solution is just to retry.
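Putting the whole thing together, here's a pseudocode sketch of the guard-based mutex, with that wakeup retry included. As before, `thread_sleep()`, `thread_wakeup()`, and the queue are hypothetical kernel primitives, and the exact names are mine, not the slide's:

```
struct mutex:
    int lock       // 0 = free, 1 = held
    int guard      // internal spin lock protecting lock + waiters
    queue waiters

lock(m):
    while compare_and_swap(&m.guard, 0, 1) != 0: ;   // "lock the guard"
    if m.lock == 0:
        m.lock = 1           // guard gives us exclusion, so no CAS needed
        m.guard = 0          // "unlock the guard"
    else:
        m.waiters.add(current_thread)
        m.guard = 0          // release the guard BEFORE sleeping
        thread_sleep()       // when we wake up, we own the lock

unlock(m):
    while compare_and_swap(&m.guard, 0, 1) != 0: ;
    if m.waiters.empty():
        m.lock = 0           // no one waiting: actually release the lock
    else:
        t = m.waiters.pop_front()
        while not thread_wakeup(t):   // t may not be asleep yet...
            yield()                   // ...but it's about to be, so retry
    m.guard = 0
```

Note that on the transfer path `m.lock` never goes back to zero; the lock passes directly from the unlocking thread to the woken thread, which is what keeps thread three from stealing it.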
So you could just yield and retry, with a bit more code here, until you successfully wake up that thread, because you know it's about to go to sleep; or you could devise a way to cancel its sleep, or do something else. Either way, it's a situation we can handle. Any questions about that? Because that is the trickiest part. So that's a mutex: a correct implementation of a mutex that works well. It doesn't have the thundering herd problem, it doesn't have any lost wakeups, and it doesn't have the wrong thread stealing the lock. Everyone agree we have solved all our problems? Life is good? All right, life is pretty good, great. So here's what I was saying before about there still being that one data race: the one we saw failing because of a badly timed context switch after the thread was placed in the queue. In the full implementation we saw previously, here it is again, there was still a data race where a thread unlocks the guard and gets interrupted right before it puts itself to sleep, after adding itself to the queue; but we know that thread is about to put itself to sleep, so we can just retry. Through all this, remember what causes a data race, because you will have to identify and fix data races in any program with multiple threads. I'll say it again because it's really important: a data race is two concurrent accesses where at least one of them is a write. So if we wanted to optimize for performance, we might notice that we can have as many readers as we want: as long as every thread is only reading, we can have all the readers in the world. It may be the case that your program does a lot of reads and writes are infrequent, in which case you'd want to maximize parallelism.
So there is another type of lock with different lock modes depending on whether you're reading or writing the variable. They're very creatively called read-write locks. Mutexes and spin locks don't care: they just limit the critical section to one thread executing at a time. But we want reads to happen in parallel. So there's the pthread read-write lock, and you can lock it for reading or lock it for writing. It allows as many readers as you want, and it guarantees that when some thread acquires the write lock, no thread holds the read lock and only one thread holds the write lock. So you're either in the situation where a bunch of threads hold the read lock, which is perfectly safe, or exactly one thread holds the write lock, in which case you have no data race because you've essentially just disabled concurrency. The implementation looks pretty similar: it uses that guard again, and the two lock calls just do something different depending on whether you lock for reading or writing. Locking for writing functions just like the normal mutex we saw previously: it tries to acquire the lock, and at the end it unlocks it. The only difference is for the readers. A reader first locks the guard, because internally we keep track of the number of readers sharing the lock; there's a readers++ here for that count. As we saw before, if we don't have a lock around it, we get data races, like that counter example from the very beginning of lecture 16 or whenever it was. So if we increment the number of readers and see that we are the first reader, that means we have to acquire the mutex that we are sharing with the writers.
If we acquire it, then we hold the lock and it's not possible for a writer to get it, and then we unlock the guard, because the guard is only protecting this number-of-readers variable. The nice thing about this is that if another reader comes in, it can take the guard, increase the number of readers from one to two, and just keep going. It allows as many readers as you want to share the lock: hundreds, thousands, all sharing it. The difference is in the unlock: a reader acquires the guard, decrements the number of readers, and then there's an if. If you are the last reader, you are responsible for unlocking the mutex. It's like the weird bathroom example: the first person in has to get the key and go in, as many people as you want can pile in after, and the only rule is that the last person to leave has to give the key back. Any questions about this? Sorry, what was that? No: say some thread has the lock for writing, so it holds this internal lock here. If a reader comes along, it gets the guard no problem, increases the number of readers from zero to one, and then reaches this lock call, where it will block until the writer unlocks, because at that point the writer is the only one that can unlock it. So you're either in the case where there's exactly one writer, or there's a bunch of readers. You can't have a mix of both; otherwise you would have a data race, because a data race is concurrent accesses with at least one of them being a write. All right, we're good. We're all experts now. Wow, that was faster than I expected, so we have additional time for lab four questions if you want. So let's wrap up.
Okay, so we want critical sections to protect against data races, and a critical section is anything between the lock and the unlock. We should know what a data race is: two concurrent accesses, at least one of them a write. Also, spoiler alert: you have data races in your lab four even if you don't know it yet. I wrote up some additional common issues, because I guarantee you will hit one of them; I had that issue myself and thought my implementation was working when it wasn't. If I had that issue, you likely have that issue. For the purposes of locking: mutexes and spin locks are the most straightforward locks; they just ensure mutual exclusion. Mutexes have a queue associated with them, which is a bit of extra work. Sometimes, if the critical section is small enough, you might want something like a spin lock for performance reasons, because spin locks are really, really simple and they essentially poll really, really fast: as soon as the lock gets released, you acquire it. If you want to implement locks in the real world, you need some hardware support: that atomic compare and swap instruction. Remember, atomic is the important word there; otherwise you would have a data race. It needs to either happen all at once or not at all. And if you implement mutexes, you need some kernel support for putting threads to sleep and waking them up. If you really want to, you can implement thread sleep and wakeup in lab four; I took that out, but putting it back in is up to you. There is also a special type of lock called a read-write lock that shares the lock between as many readers as you want but only allows a single writer. For some programs, if you know you have a lot of readers, you should probably use that lock for performance reasons. So with that, just remember: we're polling for you, we're all in this together.