All right, hello, it's definitely midterm day because, like, no one's here. So we'll blast through this, and I guess discuss midterm stuff and lab three stuff, because some interesting things happened. But let's go through this quick. So, locks implementation. Oh God, there's more questions. Oh, and there are two forks. All right, people are still asking midterm questions even though it's definitely a lecture, so we'll get to them if we have time, which we probably will. So, locks implementation. We were talking about locks, and we tried to implement one, and that didn't quite work, because we still had a data race when we implemented a lock. Well, there are some baseline things you need in hardware to actually implement locks. The very basic requirements are that loads and stores are atomic (they either happen or they don't) and that instructions execute in order. That's all you need to implement locking. There are two main algorithms if you don't have any special hardware support: Peterson's algorithm and Lamport's bakery algorithm, which kind of mimics going into a bakery, picking a number, and then round-robinning. We had kind of an idea of that before: every thread has its own storage, then they all kind of vote. But the problem is they don't really scale that well. You need more space depending on how many threads you have, and it's pretty awful. So you actually want some hardware support, and we got to this yesterday: some magical instruction. Someone already knew of this, but it's okay if it's your first time seeing it. There is a magical atomic hardware instruction called compare and swap. It takes a pointer to some value you want to change. The second argument here is old, so it's what you expect the value to be if it's to change it, and then new is what you want to change it to. What will happen, atomically, is it will try to change the value pointed to, and it will return what the original value was. And it does this all atomically.
So if our arguments here are L, the value I want to change, and then zero and one, it will return zero only if it atomically changed the value from a zero to a one. Otherwise it's going to return one, to say it didn't actually change the value. So now if we have this function, we can go and try our lock again, and because the whole compare and swap is atomic, this is our entire implementation. Before, when we had the while loop, we checked: oh, is it one? We just kept on looping until it hit zero, and then we would break out of the loop and change it to a one. Here we can just replace all of that with a compare and swap. It will go through this loop as long as that value is one, because compare and swap returns the value that's pointed to, so it would return one as long as that value is one. And if it returned zero, it changed the value from a zero to a one, and it did it atomically, so no one else was able to interrupt it in the middle. So if it returned zero, you know you changed it from a zero to a one, and only you changed it. So after the while loop, when you break out of it, it's a one. Yep, yeah, so not only is this a real function you can call, it's a real hardware instruction. There's a CPU instruction called compare and swap. If we see it later, sometimes it's called compare and exchange; they shorten it, because of course, why not. But every architecture has something that's essentially like this. So it is a built-in, you can use it as a function if you want, but it's actually a hardware instruction. So this is actually a valid implementation of a lock, although it kind of sucks. Any ideas as to why this kind of sucks even though it works? Yep, the caches shouldn't really matter, because it's just one value we're changing, so it has to be somewhere.
So our main problem here is that while loop. Especially if it's a single core machine, it'll just try over and over again and essentially waste a bunch of time when it knows it can't actually change the value. It will consume cycles, consume instructions: compare, go back, compare, go back, compare, go back, not really doing any useful work. So this does work, but it has a bit of a problem where it might waste some time. But that lock is a valid lock in some cases. If you know that value might change quickly, or the lock is protecting something that happens very, very quickly, maybe you don't want to do anything special. Maybe you just want to try again, and that's the best thing. That is what people typically refer to as a spin lock, and they call it a spin lock because it just spins: it just tries over and over again. So yeah, compare and swap is that atomic hardware instruction. Here's its name on x86, cmpxchg, which I tried to pronounce. It stands for compare and exchange. Depending on who made the hardware, they might call it something different, but it essentially does the same thing. So yeah, like I was saying, if you have one core on your CPU and you can't get the lock, you may as well not try again, because you're the only thread running, so nothing's going to change it for you. On a multi-processing machine with lots of cores, well, maybe you want to try again. Again, if it's protecting something very small, if you try maybe a few times, you might get it the next time. So we kind of did this, you implement this, or hopefully have started thinking about it at least, in lab three. Well, you could just add a yield if you know you can't do anything useful.
So I have that same while loop, but instead of just trying over and over again, if I don't get the lock, I will just yield and say, yeah, run something else, hopefully it unlocks it because I didn't get it. But that has its own problems. One is something called a thundering herd: if you have multiple threads all waiting on the same lock, they'll all yield, and it'll kind of just stack up, and there'll be 10, maybe hundreds of threads waiting on that lock, all in the queue, all being re-queued over and over again. Realistically, if a hundred threads are trying for the same lock, you already know ahead of time that only one is going to be able to grab it anyway, so why am I bothering just cycling through, like, the 99 of them that won't get it in the next round? And also with that, everyone's just running this loop over and over again and going through the round robin (assuming we're doing round robin), and you can't really reason about who gets the lock next; it's just whoever happens to get it next. So if you want to be fair and be able to reason about it, you need some type of ordering, and this wouldn't have it. So we can implement a queue. You can add a wait queue for a lock, just like your ready queue or a blocking queue or whatever, so I could say: hey, for this lock, I will keep a queue of everyone that's waiting for it. So here I'll do compare and swap, and then if I don't get the lock I go into the while loop, and instead of just yielding I will put myself on a queue and then use something called thread sleep, which you didn't have to implement, but it's not too hard to reason about. Thread sleep will put this thread to sleep and put it in a special blocked queue, where another thread has to explicitly wake it up and tell it: okay, you can go in the ready queue again.
So you essentially block yourself: you put yourself in a waiting queue that says, hey, don't execute me, I'm actually waiting for something, then you put yourself to sleep. The idea here is that whatever thread has the lock will eventually call unlock. It would set the value to zero to indicate that the lock is available again, and then it would check: hey, if there are any threads in the wait queue, I'll wake up one thread, whatever is at the head of the list, so you can reason about it; it's first in, first out order. There are two issues with this. One is something called a lost wakeup, and two is, creatively named, the wrong thread gets the lock. So does anyone want to point out to me how bad situations could actually happen with this code, or guess what a lost wakeup is? Yep, were you gonna say the same thing? Yep, so that is correct. What could happen is, let's assume we have a thread running that's calling lock that doesn't get the lock, and another thread has the lock. Well, what might happen is: I check the condition, I don't have the lock, so I go into the while loop, and I want to add myself to the queue, that's the next thing I'm gonna do, but I get context switched, essentially I get yielded without consenting to it, and it switches to another thread, and it could switch to the thread that's going to call unlock. Oh, yep. So, getting the lock means one thread made it through this lock call, so it changed the value from a zero to a one and it's the sole owner of that lock. Yeah, so in unlock, if you have the lock, and remember when we reasoned about this before, only one thread should pass lock, and it would set the value from a zero to a one, then unlock means, hey, I'm done with it, and it just has to set it to zero. Yeah, so if it's locked right now, in the global state somewhere, that value of L is equal to one, because another thread would have acquired the lock; it's doing some critical section, it's doing something.
So yeah, I would have a thread that is trying to acquire the lock. It calls lock, it would go into the while loop, it would say, hey, it's already locked, I cannot get it, so it would go here, and the next thing it wants to do is add itself to the queue. But I could get preempted right before I add myself to the queue. And now this is the thread that has the lock, so this is the thread that set L equal to one, and if it's going to unlock it, well, it would set it to zero here. Because it's the only one with the lock, there are no data races: if it acquired the lock, it's the only one with it, so it can just set the lock to zero, and now it's available for other threads again. If it continues executing, it would check, hey, are there any threads in the wait queue? And right now there are not, because the other one hasn't added itself yet. So it wouldn't go in here, it wouldn't wake up anything, because there's nothing there, and then it would just return from this function and carry on doing whatever it's going to do. Now when we come back to this thread, well, it is now unlocked, so the value of the lock is zero, but it would still add itself to the wait queue and then put itself to sleep. And now nothing, it's just going to sit there; nothing is ever going to wake it back up again, unless another thread goes and acquires the lock and then unlocks it, but it missed its single wakeup. So does that make sense to everyone?
Bad things happen. So the other one is the wrong thread gets the lock, and the idea of that is you're supposed to be fair: if you're waiting for a lock and you're the first one waiting for it, you should be the first one to get it. So there might be a scenario where, let's say, we have three threads. T1 is already in the wait queue; it tried to get the lock and couldn't, because thread two has the lock. So this is our scenario: T2 has the lock and it's going to unlock it at some point, and T1 is first in line; it tried to get the lock, it executed, it put itself in the wait queue and put itself to sleep. So there might be a scenario where T2, which has the lock, calls unlock. Right now the value of the lock is equal to one, so T2 calls unlock and changes that one to a zero, and at that point it might get preempted. It might get preempted and switch to another thread; let's call this tricky thread, thread three. So it switches to thread three, and thread three calls lock. Well, it would check the condition; right now the lock is unlocked, so it would change it from a zero to a one, essentially get the lock, and pass lock. And now thread three just kind of swooped in there. Thread one was patiently waiting; thread three just came in, swooped it up, and now it has the lock instead of thread one, which, as we would imagine, was nicely waiting. So does that make sense to everyone?
Cool, so here is the lost wakeup example, what we just demonstrated. We have two threads, thread one and thread two, and thread two has the lock. Thread one gets into the loop and swaps before it adds itself to the wait queue. Then T2 unlocks; it would check if there's anything in the queue, there's not, so it wouldn't wake up anything. And then that thread just puts itself to sleep, and now potentially nothing will ever wake it up again. And then this was the wrong thread getting the lock: there's already something in the queue, but the thread two that holds the lock unlocks it. In this one I swapped thread one and thread three from what I was talking about, but the idea is still the same: some new thread came in and swooped up the lock. Yep. Uh, sorry, what do you mean? So it's in the queue, but if it got preempted here, go back, so if whatever was executing got preempted before it checked the queue, then it doesn't even look at the queue, and the other thread would go acquire the lock before the thread that was unlocking actually checked the queue. So it would just grab the lock, and then the unlocking thread would wake up that thread that was patiently waiting for the lock, and that one would just try again in the while loop and be like, oh crap, well, it's still locked, so I put myself to sleep again. So essentially nothing bad will happen; it'll just be in a different order than you expect, because one thread was waiting around a long time. Okay, and here's how you fix it. The fix for this looks really, really ugly, but it's actually not that bad if you translate some things. Basically, this uses a spin lock internally, and the idea behind that is these things aren't very big, they don't take very long, so you can use a spin lock, which is a perfectly valid lock even though it wastes some time. So you kind of need it. There's going to be a spin lock called guard that I only use within these functions, so if you see this, this is
essentially a guard mutex that I am locking, and if you ever see the guard set to zero, it means I'm unlocking that spin lock. If you translate it, it's a little bit easier to read; this is what the code would look like, though of course you could refactor it. So in lock I would acquire the guard lock, and in both these functions the first thing they do is acquire the guard lock, so I have mutual exclusion, no data races; only one thread has this guard lock at a single time. So I have the guard when I pass through this while loop, and whatever I do is fine. In fact, I don't need compare and swap for the actual lock value anymore, because I don't have any data races: there's mutual exclusion, there's only one thread executing this, so it makes it a bit easier to read. So I check if the lock is unlocked; if so, I change the value from a zero to a one, and then I can unlock the guard, because I'm done with this function: I just immediately acquired the lock, I don't have to do anything special. And then in the other case, where I don't acquire the lock, I would add myself to the queue (this is just however you implement the queue), then I would unlock the guard, and then finally put myself to sleep. Whenever something calls wake up, the thread would resume as if it just returned from sleep, so the thread would just wake up here. And the idea is we transfer the lock to it there, because we're only going to wake it up from the unlock function, and we'll see that. So in unlock, you have a call to lock the guard lock, so after that you have mutual exclusion. Then it checks if the queue is empty. If the queue is empty, well, then I just set the lock to zero to indicate I unlocked it, so I change it from one to zero, and that's it. Otherwise, I don't set the value to zero, I just keep it as one, because I'm only going to wake up one thread, and I know one thread was waiting on that lock, so I'll just keep it locked and I'll say
okay, it's your issue now, you deal with it. So I would wake it up and take it out of the queue, whatever was first in the queue, and then I can unlock the guard. So does this kind of make sense? Yeah, so the idea is I'm taking the previous code and fixing the lost wakeup thing and the wrong thread getting the lock. I prevented both of those this time, because with this guard lock only one thread can do this at a time. If I'm patiently waiting, nothing can swoop in. The problem with the wrong thread getting the lock was that the thread with the lock just changed the value and then context switched over to another one that was calling lock; now that can't happen anymore, because if the thread that is unlocking acquires the guard and then changes the lock to a zero, well, another thread can't call lock, because it would have to acquire the guard, and this thread has the guard, so it can't come in and swoop it. Right, so are there any problems with this code? Because there's still a subtle data race here, so this isn't quite correct, but it's actually not too bad. Yeah? No idea? Okay, so if I unlock the guard here, I could context switch, and another thread could call unlock or something like that. So there's a subtle little bug, a subtle data race here, that you could actually detect, so it's not that bad. What could happen is, say one thread has the lock already and another thread is trying to acquire it. It would grab the guard, and the current value is one, so it would come into here, add itself to the queue, and then unlock the guard. If it context switches after unlocking the guard and before calling sleep, well, it's currently not asleep, so it's not blocked, it's still technically active. And then the other thread with the lock can actually call unlock now, because the guard is unlocked, so it could pass through this while loop, and it would check, well, the queue is not empty, because the first thread added itself to
the queue at least, so it would get here and then try to wake up that thread. But that thread is not currently asleep. So yeah, it added itself to the queue, the unlocker knows it needs to wake something up, but that thread might not actually be asleep yet. This isn't that bad of a problem, because you don't have a data race involving the queue; you know it's about to go to sleep, so I can just try this a few times. But you have to be careful, because it's almost right, and you might not switch right before the sleep, so it's a subtle data race, but it's not too bad, and you can actually detect it. So any questions about that? We are steaming through this. Cool. All right, so that was basically that explanation: there's still kind of a data race where you get interrupted right before the thread sleep, and then another thread does the unlock and tries to wake you up, but you haven't fallen asleep yet. But again, you know it's about to sleep, so you can just try it over and over again until it actually goes to sleep. So yeah, this is very important, even in lab three, where you essentially have very, very well-defined concurrency (you have to yield, otherwise the CPU won't be taken away from you); you still might need to worry about this in part of your lab three in very special circumstances, and again, it's kind of tricky. A data race is when two concurrent accesses touch the same variable and at least one of them is a write, and you have to be very, very careful about this. But that also means that you can have as many readers as you want, and maybe you want to implement something like that. You don't need a mutex or any protection as long as everyone's reading; as long as nothing's writing, you can go ahead and have as many threads accessing it as you want. So you might want to have a lock that allows a lot of readers and only one writer, so there aren't any data races, and that is what a read write lock is. Actually, this is a good segue, so
this, in lab three, remember I was kind of talking crap, like, hey, no one can cause my thing to segfault? Yeah, that was a bad idea. Someone caused it to segfault, and as a reward I will tell you what happened, because it will probably happen to you. It was in my solution and it was in the TA's solution, so both solutions had this problem, and I'm going to imagine yours would too. So here we go: you broke it. This is likely what happened; well, this is what happened in mine, and I can almost guarantee you have the exact same issue. So in your join function, when you implement that, you'll have all kinds of checks for errors, to make sure that the thread's actually running, blah blah blah, all that stuff. In my code, at least, when I was keeping track of everything, I essentially had a pointer to a dynamically allocated array. So I read it, it was, you know, whatever, and then I had to put this thread to sleep because it was waiting on something else, so I would swap contexts away from it, and then eventually I would swap contexts back to it whenever that thread actually wakes up. And in between those, I dynamically changed that pointer, because I had to resize the array. So the value of this pointer variable was still the old value, even though I changed it, because it context switched back here with the old value. So only if I resized something between when I swapped away from it and when I swapped back, then I accessed it and I segfaulted. Yeah, yeah. So in this case I had a global variable, basically a global variable that pointed at my dynamically allocated thing, and it moved between these. So yeah, a resizable pointer just means I was pointing at some dynamically allocated memory, and I was using reallocarray. Sometimes reallocarray, when you say, hey, I want more memory, will actually move everything, because there's not enough room to place it where it is, so in
some cases, for me, it would move everything. So if it needed to move it, that value changed, and that's a data race, because I was essentially changing that global variable and I had two concurrent accesses: one thread was changing it while another thread was reading it. So it read it here, and a write happened in between those, and I shot myself in the foot, because C is so fun, right? Everyone loves C. Yep, yeah, so I had in mine, and you're going to have to have it too, the information that keeps track of all the threads, and you don't know how many you're going to have, so it's a dynamically allocated array. So I just had that, and I changed it between the two calls, right? This thread was joining on something, it went to sleep, and then while everything was executing I had to make the array bigger, and then when it came back it was like, okay, yeah, I know where it is, and it was the old value, which is now no longer valid. Yeah, sorry, so if you use a linked list, that's okay, as long as, well, it's the same data race thing, right? If you saved some value that could change by the time you come back, you'll encounter this problem, because I had to make it bigger. Yeah, but it depends what you're changing. If I save an element of the linked list, by the time I come back that element could no longer exist; it depends on your implementation. Oh God, yeah, it depends, but if you were keeping track of some other element, and by the time you came back it doesn't exist anymore, then you're going to have a segfault, so it depends on your implementation. Yeah, so that still doesn't help if I just save the value of that pointer, right? There's an easy fix to this. What's my easy fix? That's never the right solution. Yeah, I saved it to a local variable, so I should just read it again. So yeah, that was my fix: you just take this and you go yee yee and just set it again, right? So you get the most
up-to-date one by the time you get back, and that's the fix. Yeah? Nope, well, there's no one test case; this really depends on your solution to break. Yep, yeah, the basic crux is it's a data race: I read a value, and by the time I used it again it got modified, and I had read the old one. Yeah. Oh yeah, you guys added lots; there are going to be some hidden test cases, because some are really big and pretty good, so yeah, there'll probably be some. And this bug only comes up depending on where you resize. I resized after, like, eight or something; it just happened that one of the test cases did this with thread eight, when I had to resize. You're not going to be safe if you just make a huge array, because it's really easy to tweak how many threads we make, and it's a lot harder for you to allocate way more memory. Yeah, no, I have to manually convert them. Yeah, so there will probably be hidden test cases that you don't know about. I might try to add some more test cases before it's due, just for more sanity checking, but yeah, the point is that you made your own test case too, and hopefully you got an answer from the solution, so you kind of know if you're on the right track or not. Yeah, yeah, I have to make it, so I don't know. Yeah, sure, if someone wants to volunteer for that, just do it, it saves me a lot of time. Oh, but I guess I gave you the solution, so you know what it should be. We can all crowdsource it and I will oversee it, so I can convert some test cases, sure. I'll just add test cases, and people can tell me to add their test cases or not, and I'll add them for everyone, so you don't have to form little groups and have, like, a group that does good and a group that does crappy. So yeah, I guess I'll add more test cases, but, looking at some of you, you are not very good at writing test cases or converting them, so I will do it. So yeah, I will add more test cases, but unfortunately I don't have an infinite amount of time. Okay, any other
questions, midterm related or anything like that? Because it's, I guess, later tonight, which kind of sucks. Yeah, do you need a calculator? No. A basic dumb calculator, sure, if a calculator makes you feel better, go for it. Uh, it'll be like two to the power of something, and I think on the last one you can leave it in powers of two if you want. So the hardest thing you have to do is add or subtract, up to, like, the 30s, so if you can't do that faster than a calculator, then sure, if it makes you feel better. But generally, if you ask a prof that question and the prof doesn't care, it means it's not going to help you. All right, any other questions or concerns or anything? I imagine it's later tonight. Yeah, but that segfault will definitely be in your code; even if you allocate a huge array, eventually it's going to pop up, and I guarantee you did not think of it. So yeah, depending on your implementation. And make sure you read the new join description: the rule that was implied, that I didn't spell out, is that you can only wait on an active thread. So you can't have the situation where one waits on two and two waits on one, and you also can't create any cycles, and you don't have to detect that or anything like that. All right, there are no questions, I guess, so for the record, we can clean up. So yeah, read write locks, whoo, going back. If you want to allow however many readers you want and only one writer at a time, well, there's a special read write lock, and the idea of it is kind of the same. I will reuse the mutexes or spin locks; I'll just call them locks, because it doesn't really matter which one they are. So I'll use a guard lock that I only use within all of these four functions, and then a lock that I use to represent the write lock, like the actual lock, and then I also have a variable to keep track of how many
threads are currently reading the value. In terms of write lock, I only want one thread in there at a time, so my write lock and my write unlock just simply lock and unlock that mutex, or whatever that lock variable happens to be. Now, in the read lock, I would try to grab the guard, so I'm the only one modifying this number of readers and I don't have any data races. This is essentially like that plus-plus counter thing; I don't want any data races with it, because I'm keeping track of the number of readers. So in here I increment the number of readers, and then I check: if I incremented the number of readers from a zero to a one, that means I'm the first reader, and I have to acquire the underlying lock. Then I know I now have the lock, we're the only ones with it at this point, and I can unlock the guard, and other threads can come in and call read lock or read unlock. If another thread comes in and calls read lock again, well, I can have two readers, that's okay, that's not a data race, because they're both just reading. So it would acquire the guard, increment the number of readers from one to two, and it doesn't have to acquire the lock, because a reader already got it, so it can just unlock the guard. And this could happen as many times as you want: you could have 10 readers, a thousand readers, 10,000 readers, whatever you want. Then eventually some of them are going to call read unlock. It would again acquire that guard, decrement the number of readers, and only the last reader would unlock that mutex. So if, say, we had two readers, and this one decremented it from a two to a one, it wouldn't have to unlock that underlying mutex, because there's still a reader reading; I don't do anything, I just unlock the guard. The only special case is when I decrement the number of readers from one to zero, which means I was the only reader left; then I can unlock the lock, because now nothing is reading it
anymore, and now a writer could come in, acquire it, and be the only one. So with this implementation I could have either a single writer or as many readers as I want, but not both. If writes are very, very seldom, you might want to do something like this, because if you just used a plain lock, well, anything reading or writing would have to be protected and only one thread could do it, which would be really, really slow. But if writes are really infrequent, you do this, and then you can have as much parallelism as you want, because they would all be reading, and lots of them can read at the same time. So any questions about that? Cool. All right, that's pretty much it. The whole crux of this is you need critical sections to protect against data races, and you also need to reason about them, because even in lab three you will still get data races, now that we have that dreaded thing called concurrency, even if in lab three it's actually not too bad. So you need locks: mutexes or spin locks are the most straightforward locks, they only allow one thing to pass lock at a time, and that's it. Spin locks just try over and over again; mutexes are usually a bit more involved and will put themselves to sleep if they don't acquire the lock. To implement locks you typically need some hardware support, so that's the compare and swap instruction, and you need some kernel support for wakeup notifications. If you really wanted to, in lab three, if you were really bored, you could implement a sleep and a wakeup; that's up to you, but it's not there, I intentionally took that out just to save you a bit of time. And then, lastly, there are special types of locks: if we know we have lots of readers, we should use a read write lock, because it's not a data race if everything's only reading. So just remember: we're all in this together.