Welcome back to 353. Where we left off before taking the brief detour for the threads lab is we implemented a lock, and our lock sucked. It had that busy wait problem where it was just an infinite loop reading a value over and over again. One, it didn't work: it let two threads actually acquire the lock at the same time. It also had a busy wait problem where even if another thread has the lock and it knows it can't acquire it, it keeps on trying and trying again. And also, it was not fair. We get to solve all of those problems today. So you can implement locks in software with minimal hardware requirements. The lowest hardware requirement is that memory loads and stores are atomic. Again, remember, atomic means it either all happens or it doesn't; there's no in between. The other requirement is that instructions execute in order. With just those two hardware requirements, there are two main algorithms: Peterson's algorithm and Lamport's bakery algorithm, in which processes essentially take a number and then order themselves so they acquire the lock in that same order. However, they're slow, and they don't scale that well: as more processes are waiting for the lock, it gets slower and slower and slower. And processors actually also execute out of order because, well, it turns out that is a faster thing to do. That's all covered in your hardware course, or at least some of it. So we're going to assume the real-life situation where we have hardware support for atomicity. We will assume a magical atomic function, and this is what we will use to implement our mutex. The generic name is compare and swap. If it was represented as a C function, it would look like this: it takes a pointer to an integer, which is the memory location I am trying to modify, so I'm trying to change a value there or read a value there, and then it takes two other arguments, an old value and then a new value.
What it returns is the original value at that memory address, and it will also swap the value at that memory address if it is equal to old. So if the original value is old, then it changes it to new. Say we do a compare and swap and we give it a pointer to an integer; remember, we just had a lock that is represented by an integer. If we give it the pointer to our lock value and we say the old value is 0 and the new value is 1, then atomically, what that will do is: if the current value is 0, it will read a 0, and 0 is equal to old, so it would atomically change that 0 to a 1. And we don't have to worry about concurrency or anything like that, because compare and swap is an atomic function itself. So it will change the value from 0 to a 1 atomically, or, since that value only goes between 0 and 1, in the case that the original value was 1, compare and swap would just return 1. It would read the original value, and since the original value of 1 is not equal to old, it won't modify anything, and it'll just return 1. So now we can implement our lock like this: our lock can just be compare and swap in a while loop. If the current value of the lock is 0, then it would change it from a 0 to a 1 to indicate that this thread has the lock, and then it would break out of the while loop and return from the lock call. Otherwise, if another thread tries to acquire it and another thread has already set it equal to 1, compare and swap returns 1, and it's just while(1), so it'll loop over and over again. Any questions about that? This doesn't have the problem where two threads can call lock and both make it past lock, because compare and swap is atomic. If the current value is 0, whichever thread happens to call compare and swap first will change the 0 to a 1, break out of the while loop, get past lock, and the other one could not.
And yeah, there's another question: how does this improve performance compared to the infinite loop in the last lecture? This still has a busy wait, but this implementation actually works as a lock and actually ensures mutual exclusion. So it still has the problems of busy waiting and not being fair, yeah. Next question: how do we make sure that compare and swap is atomic? Compare and swap is a literal CPU instruction that is atomic. It's a hardware implementation; you don't have to worry about it, someone else did the hard work for you. And pretty much every single processor will have an instruction like this, or an instruction named similarly. It might be called compare and swap, or it might be called something else that we'll see in a sec. Any questions about this? So this works. It has a busy wait, it's not fair, but it at least ensures mutual exclusion, which means if two threads call lock, only one is going to make it through at a time. The only way the other one passes is when the first thread unlocks, setting it equal to zero, and then the other thread can go ahead and swap it from a zero to a one to indicate that now it has the lock. All good? All right, so this is our first crack at a mutex. It's not exactly a mutex, because it busy waits and is not fair. Oh, and there's a question: how does the logic work again? If currently the value at l is equal to zero, what happens in compare and swap? Compare and swap will read the value, so it will read a zero, and it returns zero; and because we set old to zero and new to one, if we get a zero back as the original value, we know that it has actually changed the value from zero to a one. So in the case that we change zero to a one, compare and swap returns zero, and while(0) means we break out of the while loop. Okay, so that implementation is something called a spin lock.
Compare and swap is the actual hardware atomic instruction; on x86 it has a different name, cmpxchg, short for compare and exchange. Compare and swap and compare and exchange mean the same thing; they solve the exact same issue, and hardware people like making names shorter, so it has a very terse name. So we need to solve some other problems. Sometimes having a spin lock is okay: if our critical section is quite small, maybe just trying a few times isn't that big of a deal. It's more of a big deal on a uniprocessor system, something with a single core, because there we know that if we can't get the lock, we should just yield to a different thread and let the kernel schedule another thread or another process, and then maybe that frees the lock. On a multiprocessor machine, spin locks may be fine if our critical section is really, really small, because to yield you have to do a system call and all of that, and then the scheduler has to run, so sometimes it's better to just try again. So we're going to solve some issues. The first issue we are going to solve is that busy wait problem. To solve the busy wait we can simply add a yield. Oh yeah, the reason it's called a spin lock is it's named after that while loop. The while loop constantly tries again and again and again, kind of like spinning your tires aimlessly; it's called a spin lock because it's an infinite loop while some other thread has it locked. Okay, so in order to solve that busy wait problem, maybe we do something like this: if we get a one back from compare and swap, which means another thread has the lock, well, we write a body in the while loop and just yield.
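The yielding version might look like this sketch. It is the same spin loop, but each failed attempt gives up the CPU with the POSIX sched_yield call so the scheduler can run someone else, possibly the lock holder:

```c
#include <stdatomic.h>
#include <sched.h>  // sched_yield (POSIX)

// Same loop as the spin lock, but we yield instead of burning CPU.
void yield_lock(atomic_int *l) {
    int expected = 0;
    while (!atomic_compare_exchange_strong(l, &expected, 1)) {
        expected = 0;   // failure overwrote expected; reset it
        sched_yield();  // let the scheduler run another thread
    }
}

void yield_unlock(atomic_int *l) {
    atomic_store(l, 0);  // release the lock
}
```

Note that each yield is a system call, which is exactly the cost weighed above against just spinning a few more times.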
So we just say: go ahead, scheduler, please schedule something else to run. But now we have a new problem. Consider we have eight threads; one got the lock and seven did not. If we also want to be fair, all seven threads would yield, and then as soon as we unlock, only one of those seven threads is going to successfully get the lock, but all seven of them are going to try again, when we know that six of them are doing it in vain. We also have no control over which of those threads gets the lock first. Say thread two was the first one to try and then yielded; if we would like to be fair, and thread two was the first one that tried, it should be the first one that gets the lock when it gets unlocked. So we need to be able to reason about that too and be fair, and the easiest way is to have a queue of threads: first in, first out, pretty much the fair thing to do. So just yielding is not good enough. We can go ahead and add a wait queue to the lock. We'll just add a queue of threads, and each time we don't get the lock, we'll add the current thread to the wait queue and sleep. And then in the unlock we add a bit more code: if you are unlocking, so you are done with the lock, you check the threads that are currently in the wait queue and wake up one thread. So if there's seven threads sitting there, you only wake up one. This assumes that we have a thread sleep, which puts the thread to sleep; it essentially blocks the thread until another thread wakes it up. And the wakeup would unblock just a single thread in that wait queue. It turns out there are two problems with this code as written. One is called a lost wakeup, which means there may be a bad interleaving, and this fun concurrency stuff is what you will have to do on exams, and when you are programming you have to reason about these things.
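In pseudocode, the wait-queue version just described might look like the following sketch. Here thread_sleep, thread_wakeup, current_thread, and the queue operations are assumed primitives reconstructed from the narration, not real library calls, and as the next part shows, this version is still broken:

```c
void lock(int *l) {
    while (compare_and_swap(l, 0, 1)) {  // couldn't get the lock
        wait_queue_push(current_thread); // remember that we're waiting
        thread_sleep();                  // block until someone wakes us
    }
}

void unlock(int *l) {
    *l = 0;                              // release the lock
    if (!wait_queue_empty())
        thread_wakeup(wait_queue_pop()); // wake exactly one waiter
}
```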
It turns out that with different threads executing this concurrently, we might have a situation where a lost wakeup occurs, and it's going to be our job to illustrate that this is possible. A lost wakeup just means a thread will go to sleep and then never wake up. It'll essentially go into a coma, I guess; it'll be blocked forever. It'll be the case that it puts itself on the wait queue and goes to sleep, but it never wakes up. All we have to do is demonstrate a situation where that might happen using one or more threads. And then there's a second issue where the wrong thread gets the lock: even though you are first in the wait queue, there is a situation where another thread acquires the lock instead of you. So we're going to have to reason about these. First is the lost wakeup example. Looking at this code, we have to come up with a situation ourselves, and in this case we only need two threads, where two threads are running and one of them gets put to sleep forever. As a hint here, I will switch this so you can consider we have two threads, and assume that initially thread one has the lock, so the current value of the lock in memory is equal to one. Is there a sequence where these two threads can run, and I'll give you a hint what functions they are running: thread one is running unlock, thread two is running lock. And in this case, for the sake of argument, we just argue concurrency: assume we have a single processor running and we just concurrently switch back and forth between the threads. Is there any order of switching between these threads where thread two puts itself to sleep and then never wakes up? I'll give you a second to look at the code and think about that a little bit. And yeah, we have one potential answer; I'll wait a little bit more. We're good. Someone want to take a stab at it? Yeah.
For this we have to pick a thread to run first; thread one is running unlock and thread two is running lock. So which one runs first? Anyone? Say thread one runs first. If thread one runs first, then what would happen is we change the value of the lock from a one to a zero. And then if we switch over to the other thread, it doesn't really matter exactly when: we execute this line and then we switch to thread two. Thread two is calling lock, so it will do the compare and swap first and see, hey, the current value of l is zero, so atomically it would change that zero to a one. Compare and swap would return zero, and while(0) is false, so it would break out here, and then it would be done the lock call. So that's fine; that works as expected. Thread one gave up the lock that it had, and then thread two acquired the lock after it gave it up. Totally fine. Yeah, so let's go back. The original situation was: the value of the lock was one, thread one has the lock and is calling unlock, thread two is calling lock. This time let's assume that thread two runs first. It's calling lock, and the first thing it's going to do is compare and swap. The current value is one, so it's going to return one, and it's not going to swap, because the current value is not zero. So compare and swap returns one, and now the next instruction it would execute is right here. So: do we keep executing thread two, or do we switch to thread one? Yeah.
Yeah, so we switch to thread one, and it starts running the unlock code. Thread one would change the value of the lock from a one to a zero, and then it would check: hey, are there any threads in the wait queue? If I put the wait queue up here (I'll just use Python syntax), currently it's empty. So: are there any threads in the wait queue? No. So it would skip this, make it to the end, and it is now done unlock. And okay, now we have no choice, we have to switch back to thread two. Thread two is going to add itself to the wait queue, so now our wait queue has thread two in it, and then it puts itself to sleep. So now thread two is asleep, it's blocked, and it's never waking up. Thread one should have been the one responsible for waking it up, but it wasn't; thread one is already done. So that's the lost wakeup: thread two is now gone forever, well, blocked forever. Questions about that? Clear? Good. With most of these, again, just assume that we have some threads and they're doing some function calls; it'll take a bit of practice to figure out which situations are possibly bad. For some more practice, let us see if we can do the other situation, where the wrong thread gets the lock. As a bit of a hint, I'll say that thread one has the lock again, and then we have some other threads: thread two and thread three. Currently we will assume that our wait queue has thread two in it. So thread two is sitting right here, wanting to be woken up from its sleep; it is in the wait queue, so it should be the next thread that acquires the lock as soon as thread one unlocks. So thread two is asleep, thread one is starting the unlock call, and thread three is starting the lock call. Is there a sequence of executing thread one and thread three, and thread two I guess, but it's
currently blocked, such that as soon as thread one unlocks the lock, it should be thread two that gets it, because it is next in the queue, but is there a situation where thread three can come in and swoop it? Yeah, you're going to say essentially the same thing. The situation we could have in this case is: thread one executes the line where we change the value of the lock from a one to a zero. Right after it executes this line, we immediately switch to thread three. Thread three is going to do the compare and swap. This all happens atomically, so we can't switch in the middle of it: it reads the value at that memory location, currently zero, and because it read zero, and the old value in the arguments is zero, it swaps that zero to a one, all atomically. Then compare and swap returns a zero, which means we finish, and thread three is done the lock call and now it has the lock. Thread three just kind of swooped in. What would happen if we ran this to completion? Thread one would check if there are threads in the wait queue; there is, and maybe it removes it from the wait queue after it wakes it up. So thread two is now woken up and can run. Let's say thread one just finishes, so thread one is done unlock, and now thread two wakes back up. Thread two would go and do the compare and swap again, and then it checks: oh, the current value is one, some other thread has the lock. So it would add itself back on the wait queue and put itself to sleep again, because some jerk, aka thread three, came in and swooped the lock; it stole it out from under it. So that makes sense. Now we get to solve this issue. Just so you have it here, here's the lost
wakeup example. Here's the order of threads; here I ordered them differently, so it's thread one, thread two, with thread two holding the lock (the example I did had thread one here), just so you have it. It shows the lost wakeup example, where thread one in this case never gets woken up. And similarly, for the wrong thread getting the lock, here's the sequence of calls at the bottom, just so we can go ahead and argue about it. So, to fix this, it looks a bit ugly. It looks like that. Yeah, yikes. There is a fair amount going on here; does that look fairly readable to everyone? Everyone looking at it? All right, let's make it a bit more readable. This is essentially just integrating that wait queue into the lock. We're defining a struct. The struct has this int lock, which is the l I was using before: the integer that represents the state of the lock, where zero means unlocked and one means locked. Then I'm making that queue explicit: this is just the wait queue, all the threads that are currently waiting for the lock. And then I now have another int, guard. This int guard is going to be a spin lock, but the rule is that I'm going to use guard as a spin lock only inside my lock and unlock functions. No thread will hold guard after the lock or the unlock function; the rule is I will only use the guard while I'm executing these functions. So it looks a bit ugly, but essentially, assuming that guard is just a spin lock, which it is, this while loop is just a lock of guard. That was what I had way before: this was my proper implementation of a spin lock, where it worked; it had some issues, but it turned out it actually worked fairly well. So I'm just using that, and I'm only using it internally here, to implement my lock and unlock. So this whole while loop here is essentially a
lock of the guard, and setting the guard back to zero is basically an unlock of the guard. So it might be a bit easier to read if we write it that way, as lock and unlock of the guard. If I call this lock function, the mutex lock, then the rule is I lock my guard, so only one thread can execute this at a time, because remember what data races are: I don't want concurrent accesses to the same memory location, in this case the queue, with a thread adding itself to the queue. So I lock the guard, and then I check whether we currently have the lock or not. I don't need compare and swap for that lock variable, because I have locked the spin lock, so I'm the only thread that can be executing this at the time; we don't have to worry about concurrency within the lock function now. So I check the current value: if it's zero, it means it's unlocked, which means I don't have to wait on anything. All I do is change that value from a zero to a one, which means I now have the mutex, and then I make sure to unlock the guard before I'm done this function. That's all I have to do in the case that no other thread holds the lock. If another thread holds it, well, this basically just adds myself to the queue, throwing this thread to the back of the queue, and then I unlock the guard, because if I still held it, no other thread could call lock, and it turns out no other thread could call unlock either. So I have to make sure I unlock it before I go to sleep, and then when I wake up, it just continues back here. Now let's look at the unlock call. Same thing: this while compare and swap is the same as just locking the guard. Then it checks if the queue is empty. If the queue is empty, it knows that the current value of the lock is one, so it just changes it to zero, and then it's done, and it can unlock the guard before finishing the
whole function. Otherwise, if there is another thread in the queue, well, the current value of the lock is one, and I don't have to change it from a one to a zero and then have the other thread change it back from a zero to a one. I'm just going to transfer the mutex: I keep the value of the lock as one, but I wake up the next thread. Only one thread will wake up, and the current value of the lock will still be one, but the way it works internally is that we're transferring the lock to the other thread. It's still one, but that thread passes its lock call, and it should be the case that, well, now I'm done with unlock, I don't have the lock anymore; the other thread has the lock. Questions about this? A bit ugly, but if you mentally replace the big compare and swap pieces and realize that guard is a spin lock, then it kind of makes sense. Are there any issues with this? Yeah. So this actually still has a very, very slight issue; this is a much, much subtler bug. Say you are trying to acquire the lock: we have two threads, and currently lock is equal to one because, I don't know, thread two has the lock. What could happen is: if thread one calls lock, it could acquire the guard, no problem, then check the current value of the lock; okay, it's one, so it goes here. It would add itself to the queue, so let's say we have our queue here and thread one gets added to it. Then it unlocks the guard, and then it can get context switched right before it goes to sleep. Okay, well, let's say thread two calls unlock now. It would go ahead and grab the guard lock, then check the queue; the queue is not empty, so it would go here and try to wake up a thread. So we try to wake up thread one, but thread one is not asleep. Shit. So it turns out that this is a very slight problem, but it's actually one we can detect, so we know that
essentially thread one is about to go to sleep, even though we're not sure; it might not be asleep yet. So thread wakeup might just give us an error: hey, that thread's not asleep yet. But because we know for a fact that it is about to go to sleep, we can just retry waking up the thread until it's actually successful, and then finally we don't have any more issues. Even this gets like ninety-five percent of the way there, but with threading libraries, if it's not a hundred percent, well, it's still broken. Yeah. Yep, so if there's some other thread ahead of it that we need to wake up, that means it was in line first and it should get woken up, so that's okay, right? Yeah, if there's another thread, it's okay. It's a very subtle thing: if I'm about to go to sleep and another thread is also trying to wake me up, there might be a case where I'm not asleep yet, so it retries. We still had a bit of a data race; I could have fixed it on the slide, but the slide already scared you as soon as I showed it. Anyways, it got most of the way there. Any questions about this? Is this slide less scary now that we went through it? Good. So, minus that little fix, this is the implementation of a mutex. You can just use it, but under the hood, this is essentially what the implementation looks like. We still had a bit of a data race in the case that a thread called lock and got interrupted before it actually went to sleep; but because it was safely added to that queue, thread wakeup can just try to wake it up until it's successful, because we know it's about to fall asleep, so we just retry it over and over again until it finally wakes up. With all of these, our core issue was remembering what causes a data race. I've said it a second time; there was a whole slide on it. Very, very important, because this is all concurrency: a data race is two concurrent accesses to the same memory location where at least one of them is a
write. Knowing that, there might be cases where, for performance reasons, you might think: okay, well, that means if a bunch of threads are only reading the same memory location, there are no issues. So what about if, in my application, I rarely write to a memory location, and most of the time all my threads are reading from it? If I just use a plain old mutex, that means only one thread can read it at a time, and I might have bad performance, because only one thread can run at a time, nothing can run in parallel, and it's slow. If I want, I can have as many readers as I want without a mutex, as long as nothing writes to it at the same time; as soon as we have a write, then we have to make sure only one thread is executing at a time. There is a special kind of lock that has different lock modes depending on whether you are reading or writing a memory location. They have a very, very creative name: they're called read-write locks. With mutexes or spin locks, you have to lock the data even for a read, because you don't know if a write can happen in another thread; but really, like I said before, reads can happen in parallel as long as there's no write. So the rule is: there is something called a read-write lock, and there are different lock calls depending on whether you are reading or writing. Multiple threads can hold a read lock simultaneously; it's called a read-write lock, so there is a lock specifically for reading. And if you lie, not to the compiler, but to your program, and you lock it for reading when you're actually writing, well, then you have a data race and your program is going to be wrong. The other mode is a write lock, and that makes sure only one thread holds the lock. If there are currently other threads reading, it will wait for them all to finish reading so that it can have exclusive access. Essentially the write lock just treats it as a mutex. So we can implement
that just using mutexes if we want. If we really wanted to, this would be the implementation of a read-write lock. Same idea as before: we have this lock that actually represents mutual exclusion. For the write side of our read-write lock, we just lock that lock and unlock that lock, because it's supposed to ensure that only one thread is doing this at a time; locks give mutual exclusion. So write lock essentially just delegates to it and says: lock that lock; and write unlock says: unlock that lock. Then, to implement the read lock, it's going to access that lock too, and it will either go ahead and try to lock it, or, if it is already acquired, it will just keep track of the number of readers. It keeps track of the number of threads that have the lock acquired for reading: we have an int called number of readers, and we create a guard so that we don't have any data races on this variable; if we had a data race on it, we might not know how many readers currently have the lock. So in the read lock, our implementation is: we lock the guard, and then we increment the number of writers, sorry, the number of readers; we're keeping track of the number of readers. Initially the number of readers would be zero; if we increment it and change it from a zero to a one, that indicates that I am the first reader, which means no reader had the lock yet, so as the first reader I have to acquire the lock. Then I hold that lock that is meant for mutual exclusion, and I can unlock the guard. Now if another thread were to call read lock, it would get the guard, increment the number of readers from one to two, and then check: well, it's not the first reader, so it knows that another thread already has the lock for reading, and it doesn't need to acquire this lock. It can just unlock the guard, and we can have as many readers as we
want: two to three, three to four, four to five, all good. And then for the read unlock, well, we have to acquire the guard because we're going to change this int readers. If two threads have the lock for reading, we would decrement the number of readers from two to one and then check: okay, well, there's another thread that still has the lock, so we wouldn't unlock it, and we can just unlock the guard. And if we were the last thread, so we changed the number of readers from a one to a zero, we know that we're the last reader, so we should unlock that mutex. Yeah, so that's a good point. The question is: wouldn't this starve, or can it lead to starvation of, the write thread, if we have a lot of readers and they just keep on reading and reading and reading? And the answer is yes, in this implementation; the real one would not have that. It would just cut it off: if a write thread tried to acquire the lock, it wouldn't allow any new readers in until the write thread gets it, so it would be fair. This implementation works, but it's not fair. Yeah. Yeah, so this is just preventing that data race. A data race is two concurrent accesses where at least one is a write; that's still true for this, but it lets in as many readers as possible. Before, you just had one thread doing it at a time, but now I can have either eight threads all reading at a time or a single thread writing; I can't have both. All right, any other questions about this one? Whether or not you want to use this instead of a mutex pretty much comes down to performance. For this course we won't ever actually use a read-write lock, but it's good to know, because many of you will be writing programs, and I assume you want them to go fast. So if you go ahead and look at your program and you know reads happen more than writes, and you add threads to it and it's slower, well then guess what: you probably need to use a read-write lock, or it's just not worth it to use
threads. Yep. Yeah, so in this case the lock named lock is just a mutex, and this guard could be a spin lock or it could be a real mutex if we wanted; it doesn't really matter, and whether guard is a spin lock or a mutex just comes down to performance. All right, any other questions? So we actually kind of know how mutexes are implemented now. Joy. The whole reason to do this is to know how they're implemented and make sure that they're implemented correctly, and, well, guess what: to implement them we also had to argue about concurrency, just like you will have to argue about concurrency in your own programs. If you want to prevent data races, you need critical sections. Mutexes and spin locks are the most straightforward locks; to implement them, basically all you need is hardware support for that compare and swap instruction, which every CPU on the planet has now, plus some kernel support for waking up threads and putting them to sleep. Spoiler alert: the sleep and the wakeup were also part of the thread lab, the 80-hour thread lab, so I took out that part; you're welcome. Before, you had to implement sleep and wakeup on top of everything else, and how they interact has little subtle things, but we won't worry about that for now; the kernel just has to implement them. Also, for performance reasons, because we know what a data race is now, we know we can have a lot of readers, and if you want a lot of readers to all execute in parallel, then you should use a read-write lock. So with that, remember, we're all in this together.