All right everybody, welcome back to CS162. We're gonna pick up where we left off on synchronization; we were just starting to discuss atomic instructions last time. But we'll start by reminding you a little bit about what we've been talking about. We've been trying to figure out how to implement locks, and we started by asking ourselves: if we only had atomic loads and stores, what could we do? The best we came up with in the Too Much Milk problem was this, where we had two threads both synchronizing on a critical section of "if no milk, buy milk." After working through several different varieties, we finally came up with this, and it works. It's related to Dijkstra's solution from way back when, and it was solved in general by Lamport. As I mentioned, it works, but it's very unsatisfying, because essentially every thread in the system has to have a different synchronization protocol and a different set of instructions. If you look carefully, you see thread A is different from thread B. At the marked point here, if there's no note from B, it's safe for A to buy; otherwise A waits to find out what's going on. And B basically just says that if there's no note from A, then it goes forward. So this was an interesting exercise, but we wanted to move on. Then we reminded ourselves why locks were appealing, because really what we wanted is the simple milk-problem solution where we acquire a lock, do the critical section, and release the lock. If we could somehow figure out how to build a lock that gave us this sort of uniform API, we would be in much better shape. In the hope of doing that, we first started, as you might recall, talking about disabling interrupts, okay? And here was an example from last time. What I did here was augment the example I gave in class by showing that you can have as many locks as you want: you just name an integer, and then acquire and release take a pointer to it, just like the high-level locking we've been talking about. This particular solution disabled interrupts around a critical section that modifies that lock variable, and then re-enables interrupts. We got to this after we decided that disabling interrupts to acquire the lock and enabling interrupts to release the lock was way too risky and not something we would actually want to do. So this was our solution. At a high level, notice that the lock is either free or busy, which could be zero or one. When we go to acquire, we first disable interrupts, and by doing that we've prevented the scheduler from switching threads. So now we've got one thread that's currently active, and that's the one we're running on. We check whether the lock is busy; if it's not busy, we set the lock to busy and re-enable. If it is busy, then we have to go through this trick of putting ourselves to sleep, which means putting ourselves on the right queues and so on while still running, right? That's a little bit of a paradox, but we talked about how it isn't really a paradox once you realize that the way the actual scheduler works, going through switch and back up again, already deals with disabled interrupts, okay? And on the release side, again, we disable interrupts, because we have a critical section.
And in that case, we see if there is anything on the wait queue; if so, we wake it up; otherwise we free the lock, and then we re-enable. And the question of whether the idle thread re-enables interrupts like every other thread: yes. We haven't really talked a lot about the idle thread; in some sense the idle thread is what runs when nothing else is running. Clearly if nothing is running and you're going to be in that state for a while, you have to re-enable interrupts. So yes. Now, the other thing I did for you last time, and I just want to do it again quickly, is to see exactly how this works. I made an animation, and notice that I've changed it a little to reflect the fact that we're actually passing the particular lock we're interested in as an address into acquire and release. This particular simulation is in the kernel, right? Why? Because we're disabling and re-enabling interrupts. If you're really interested in doing this at user level, it means you have to make a system call before you run this acquire and release. If you notice, we have thread A running and thread B on the ready queue. If you remember what that means: you're either running, which means you have the CPU, or you're on the ready queue, which means at the next timer tick you could potentially get switched in by the scheduler. So both A and B are runnable. The value of mylock is zero, which means that nobody has the lock, okay? So if we never got to acquire and release here, A and B would just alternate back and forth, just like S and T did in that example I gave you last time. The other fields we have up top, in addition to the actual integer memory location mylock: we have a list of waiters, which is a wait queue associated with the lock. Every lock has a wait queue, and those are the threads that aren't runnable but instead are waiting for the lock to be released; obviously it's empty right now, nobody's waiting on the lock. And then finally this owner field, which is going to point to the current thread that has ownership, but that's not a requirement. There are some variants of locks you will see that explicitly remember who their owner is, but there's nothing about a lock that really needs to remember an owner. Think about the key analogy: you lock your door; does that mean you own that lock? Well, you could hand the key to somebody else and they could unlock it. So the notion of a lock doesn't by itself require an owner. Putting an owner in is really more about understanding, for instance, whether somebody is violating something by trying to unlock when they don't own the lock. Okay, so here we go. We're running the acquire and release code in the middle here. This thread runs and it hits the acquire, so the first thing it does is run acquire. Acquire of course disables interrupts; that's what that little red circle was. It now says: if the lock variable being passed in is equal to one, because somebody's got it, we're going to do something; but that's clearly not true because it's zero. So we go to the else clause. We set the lock to one, meaning we've got it. For our own edification we're going to consider it owned by A now, but we haven't actually changed anything anywhere; this is just for us. We re-enable interrupts, we go back, and now thread A is happily running along and it's in the critical section. Why?
Because it's got the lock, excuse me; the lock is locked and A has come back from acquire, right? That means it's in the critical section, it's doing fine. Now at some point, because thread B is on the ready queue, the timer goes off and we're going to let thread B run. If you think about what has to happen: we have to unload thread A's registers, put them into the thread control block, and execute a switch. When we're done, A is going to be on the ready queue and B is going to be running. Okay, so here we go: we get a timer interrupt that takes us into the kernel, dot, dot, dot, that's what these dots are. Interrupts are disabled during that period of time, and we've entered the switch routine. At some point it takes thread A and puts it on the ready queue, it pulls thread B off and loads its registers into the CPU, then we re-enable interrupts and now B is running. Notice that B is happily running, that's what this blue line means, without running into anything, because it hasn't tried to acquire the lock that A has already acquired. However, the moment we hit acquire, what happens? We go to run the code, we disable interrupts, and at this point somebody's got the lock because mylock is one, so it enters the portion of the code that puts itself on the wait queue, so it's now waiting, and going to sleep really means that we take ourselves off the CPU and run switch to get back to thread A, okay? And of course, just before A runs again, interrupts get re-enabled, so A is now running happily in the critical section, B is put to sleep, and if you ask yourself where B's PC is: its program counter is right here at the end of the blue arrow. So when it finally wakes up, it's going to come out of that part of the blue arrow, finish up the acquire, and then return back to the thread. So at some point A is going to execute release, okay? And this is going to be important. As you see here, we execute a release and we disable interrupts. Is there anybody on the wait queue? Yes, there is. So that waiter is now going to be woken up and made ready to run, okay? And the mere act of putting it on the ready queue means we're going to let it continue to run and come back from acquire. So just by putting B on the ready queue, it's now going to wake up and have the lock. Now notice I haven't changed the lock from one to zero, right? Why? Because in some sense A has handed the lock to B, and things are just going to stay locked, okay? And so then A re-enables interrupts and continues to run. Notice B hasn't started running; just because it's been unblocked doesn't mean B immediately starts running. All it means is that B has been taken off the wait queue and put on the ready queue. Sometime later, the timer goes off, the scheduler comes into play, interrupts get disabled as the timer interrupt happens, and we go into the scheduler, which puts thread A back on the ready queue and restores the registers for thread B to get it running. At that point B emerges from sleeping, re-enables interrupts, and emerges from the acquire call. So from the standpoint of B, it tried to acquire, it was in that acquire call all this time, and eventually it came back out of acquire and now it's in the critical section running, okay? That's my simulation.
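For reference, here is roughly the acquire/release that the animation just walked through, written as kernel pseudocode. The struct layout and the helper names (queue_t, thread_t, current_thread, disable_interrupts, sleep, wake_one, and so on) are stand-ins for illustration, not any particular kernel's API:

    enum { FREE = 0, BUSY = 1 };

    typedef struct {
        int value;              // FREE or BUSY (the integer shown in the animation)
        queue_t waiters;        // threads sleeping until this lock is released
        thread_t *owner;        // optional: only useful for debugging/ownership checks
    } lock_t;

    void acquire(lock_t *lock) {
        disable_interrupts();                  // scheduler can't switch us out now
        if (lock->value == BUSY) {
            enqueue(&lock->waiters, current_thread);
            sleep();                           // switch away; the switch machinery handles
                                               // interrupts for the thread that runs next
        } else {
            lock->value = BUSY;                // got it
        }
        enable_interrupts();
    }

    void release(lock_t *lock) {
        disable_interrupts();
        if (!queue_empty(&lock->waiters)) {
            wake_one(&lock->waiters);          // move a waiter to the ready queue;
                                               // value stays BUSY, the lock is handed over
        } else {
            lock->value = FREE;
        }
        enable_interrupts();
    }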
So I'm gonna let that go since we did it already last time. Are there any questions on that? I just wanted to do it again to make sure we're good. Good. So did B actually get the lock? That's a good question. Why do I know B got the lock? Because it emerged from acquire, okay? When you return from acquire, that means you've got the lock. That's true of all of the locking protocols: when you do an acquire, you sleep there, and the moment you return, you have the lock, okay? And notice also that since B is running in a critical section and the lock is set, you know that somebody's got the lock, and it's B that's running in the critical section, all right? So interrupts, as I've mentioned here: there are a couple of places where interrupts are what schedule thread A to run and thread B to run. These dotted lines are all about the timer interrupt coming in and rescheduling the next thread, okay? Now, where we were last time: as we said, there are some problems with that solution. First of all, you can't give this lock implementation to users, because you can't allow them to enable and disable interrupts; that's way too dangerous. So we could have system calls; what I've got here could be acquire and release system calls, okay? But of course the downside is that just to grab a lock, we're doing a system call, which is expensive. So the number of lock operations per unit time is going to be seriously limited by having system calls in the way. We'd like something that runs at user level rather than in the kernel. Of course, just to see where we're going with this: if we actually have to put somebody to sleep, we've got to go into the kernel. But that's already a long operation, so the moment we decide we have to put them to sleep, doing a system call at that point is probably the right thing. We'll get there maybe two thirds of the way through the lecture. The other thing, which is a little more subtle, is that this doesn't work on a multiprocessor, or even a multicore, because when you disable interrupts, you're only disabling them for one of the processors. So yes, it might be the case that when I disable interrupts and re-enable them, I'm preventing the timer on that particular processor from going off and other interrupts from disturbing me, and so I have a nice atomic section to make a nice lock. But the moment I have more than one processor, this doesn't work, okay? So that's a downside of this particular implementation. The alternative is going to be something that runs in the memory system, that doesn't have to go into the kernel, and that will work across multiple processors: atomic instruction sequences, okay? Now, when we started talking about atomic actions, remember we said: here's a set of instructions, load the account, deposit money, store the account back, that we wanted to put together into a single atomic sequence, and so we acquired a lock and released a lock around it. What we would like to do is mimic that idea but have an instruction that is itself atomic, okay? In all of the cases I'm going to tell you about, these instructions read a value and write a value to memory atomically, such that no other thread can get between the read and the write, okay?
So hardware is responsible for implementing this correctly, and it's going to work on both uniprocessors and multiprocessors, which in some cases requires some work from the cache coherence protocol. Unlike disabling interrupts, these atomic sequences actually work fine across multicore or multiprocessor machines, okay? And you can get Intel boxes that have multiple chips all tied together in a server system, so they're not just multicore but multiprocessor as well, and this would work for that. So here are several read-modify-write style instructions. The most common one, which you're going to find on pretty much every architecture (it says most here, but I'm going to say every), looks like this. The way you interpret this code is that test and set is actually an instruction, so everything inside here happens atomically, all at once, in a way that can't be interrupted by any other thread. What actually happens is you pass in an address, you get the value at that address (this is like pseudocode I've got here), and at the same time you store a one there. So whatever was there, you grab it, you store a one, and you return the result, okay? And if you think about this, it's going to help us with synchronization, because if we start out with a zero there and 12,000 threads all do a test and set at once on the same address, only one of them will see the zero that was there before they put the one; all the other ones will just see a one, okay? So that's going to be our first primitive. It's an atomic primitive because all of these things happen atomically, and 12,000 threads all doing test and set at once on the same address won't interleave, okay? Only one of them will turn a zero into a one, and the rest of them will just try to turn a one into a one. We'll see how that helps us. Now, we can get much more interesting than this. Here's a swap, which takes not just a memory address but also a register, on say the x86 or SPARC processors, or a value, depending on what particular system you're working with. The idea is: grab the value in the memory location and store the value of the register there. So this is like a generalized test and set. If there was a five down there and I do a swap with a six, when I'm done there's a six in the memory location and I get a five out of it, okay? Even more powerful is the so-called compare-and-swap. This is a very popular one on the x86, as it was on the 68000 originally. And it's a little more complicated, so bear with me for a moment. It has a memory address and two registers. What we say is: if the value at the address is equal to what's in register one, then store register two there and return success; otherwise return failure, okay? So look carefully at this. We have a memory address, somewhere in memory, and if what's in memory is equal to register one, we store register two there, atomically, and return success; otherwise we return failure, okay? This is an instruction that I'm going to show you has some pretty interesting properties, okay? The last one is called load-linked/store-conditional, and this is something that showed up originally on the MIPS R4000 and on the Alpha processors.
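In pseudocode, the first three read-modify-write primitives look roughly like this; each body executes as one indivisible instruction, and the exact operand forms vary by architecture:

    // Pseudocode for the instruction semantics: each body runs as a single
    // indivisible instruction, so no other access to that address can land
    // between the read and the write.
    int test_and_set(int *address) {
        int result = *address;    // read whatever was there
        *address = 1;             // unconditionally store a 1
        return result;            // hand back the old value
    }

    int swap(int *address, int reg) {        // generalized test and set
        int result = *address;
        *address = reg;                      // exchange register value with memory
        return result;
    }

    int compare_and_swap(int *address, int reg1, int reg2) {
        if (*address == reg1) {              // only store if memory still holds the expected value
            *address = reg2;
            return 1;                        // success
        }
        return 0;                            // failure
    }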
And what load-linked/store-conditional does is let you load a value, look at it, do something with it in a register, and then store something back to the location. So I'm loading from the location and storing back to the location, but that store is conditional: if anybody else stored anything to that address between the point when I loaded and when I stored, the store fails and I loop, okay? I'm not going to go into this in any great detail right now, but the idea is that with this pair you can construct test and set, swap, and compare-and-swap in a more RISC fashion, a little simpler than having a single instruction that does all of those operations, okay? So, are there any questions on this? Keep in mind that everything I show between braces here is not like a normal procedure call; all of this stuff together happens atomically as a single instruction in the processor, okay? Questions, all right? Can I repeat? For any given instruction, everything between the two braces happens all together, at once, atomically, in a way that two threads can't be interleaved, okay? The way it gets implemented as just one instruction, for instance, think of test and set, is that you lock the memory bus, and the load and the store happen together, okay? Why is it set to one? Well, one is a good value for doing synchronization; we'll show you how this works in a moment, okay? If the value's already one, then all you did is read a one and store a one. If it already had a one, what was there? Well, you load what was there, you store a one, you return what was there. So what you get back, if it was already one, is a one; if it was a zero, you get back a zero, and you always leave a one there, okay? And we'll start seeing how this works. The point is that great synchronization will result. Yes, you can build a lock with this, a much better lock than any of the ones we've seen so far. Okay, now, sorry about the weird animation, what I want to show you before we get to that is the following. Here is a non-locking, that is, lock-free, version of a linked list. It's pretty fun, okay? I have this simple singly linked list with a single head, where there's a root that points to the first item, which points to the second, and so on. And I want this list to be such that I can have thousands of threads simultaneously trying to add things to it, and I want to make sure it doesn't get screwed up. I can do that with a compare-and-swap, without any locks, okay? So unlike what we've been indicating up till now, where you have to put a lock around shared data, here's a situation where, because of the atomicity of the compare-and-swap, we don't actually have to have a lock at all. So this is going to be faster, okay? Let's take a look at this code. How do you add a new object to this list? I'm going to work in a loop. First I grab the root value into a register, load R1, right? Then I store that root value into the next field of the object I'm trying to add. And then I try to swap a pointer to my object into the root, assuming that the root hasn't changed. Notice it says: as long as the root memory location is still equal to R1, swap me in as the new root.
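Here is a sketch of that insertion loop in C, with GCC's __sync_bool_compare_and_swap standing in for the compare-and-swap instruction; the object type and field names are just illustrative:

    #include <stddef.h>

    typedef struct obj {
        struct obj *next;
        // ... payload ...
    } obj_t;

    obj_t *root = NULL;                       // shared head of the list

    // Push new_obj onto the front of the list with no lock.
    void push(obj_t *new_obj) {
        while (1) {
            obj_t *old_root = root;           // load the current head ("R1")
            new_obj->next = old_root;         // point my new object at the old head
            // Atomically: if root still equals old_root, make root point at new_obj.
            // If another thread won the race, this fails and we simply retry.
            if (__sync_bool_compare_and_swap(&root, old_root, new_obj))
                break;
        }
    }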
Otherwise I fail and do this all over again, and I keep looping until I succeed. Of course it's not going to loop very long, because everybody's item is going to get added to the list. So if you look here, here's what happens when I add to the linked list. I take my object, I take the current root, and I store it in my object's next pointer, okay? This is not busy waiting, because it resolves very quickly, okay? So that does not constitute busy waiting. But notice: I take the root and store it into my new item's next pointer. So now I've got a link from the new object to the next one. What I want to do is take a pointer to my new object and put it in root, but I only want to do that if somebody else didn't beat me to it. Because if somebody else beat me to it by adding their item, and then I store a pointer to my item into root, what did I just do? I just threw out their item, okay? That's the danger here. The way this works is what I've got here. First I load the current root into register R1. I store the value of R1 into the next field of my new object. What does that mean? It means that the next field of my new object now points to the old head of the list, okay? That's been done right here. Now, even if I fail and do this over and over again, nobody is harmed, because I'm just storing different successors into my new item. The only thing that matters is that I want to find a point at which I can change root to point at me, but I can only do that if somebody else hasn't changed it, between my load and my compare-and-swap, to point at something other than this next item. Assuming they still match, I can put it there, all right? Pretty cool, huh? Questions? So this is what's often called a lock-free implementation. Once you've got these more powerful atomic instructions, there are oftentimes situations where you can build things like this that don't even require locks to work. Now, up till now, what you would have done with the 162 knowledge we've given you is build this code by grabbing a lock that locked the root, storing the root into the next pointer, storing the pointer to your object into the root, and then unlocking; and, well, that would be consistent all the time and nothing would get lost. Instead, we have this very quick code where, under the good circumstances with zero contention, you take one pass through the loop: load, store, compare-and-swap, done. So it's a load, a store, and a compare-and-swap and you're good, okay? Can you make these atomic operations yourself? No. The question is, can you make atomic operations like these with disabling interrupts? And the answer is, it wouldn't be the same thing, okay? Because there's no disabling of interrupts here. This is just an instruction, like add or multiply, except it's an atomic one. So this is something much more powerful than disabling interrupts; disabling interrupts is like bringing a hammer to tap on a window, not a good idea. And the thing we were doing with disabling interrupts wouldn't work on a multiprocessor, as was just pointed out, whereas this works fine on a multiprocessor. Okay. All right, now, what can we do with test and set? We already had a couple of folks on the chat starting to figure this out, but here's what we do.
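Here is the test-and-set spinlock about to be built, as a sketch in C with GCC's __sync_lock_test_and_set builtin standing in for the test and set instruction; this is the busy-waiting version that gets criticized below, not something you'd want to ship:

    int mylock = 0;                         // 0 = free, 1 = busy

    void acquire(int *thelock) {
        // test and set atomically writes a 1 and returns the old value; we only
        // exit the loop if *we* were the thread that changed the 0 into a 1.
        while (__sync_lock_test_and_set(thelock, 1) == 1)
            ;                               // spin: busy waiting
    }

    void release(int *thelock) {
        *thelock = 0;                       // the next test and set to see this 0 wins the lock
    }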
Okay, a question: why do you need the store if you've already swapped? The point is that this store instruction, the one that stores the old root into the next field of my new object, has to happen, okay? Good. Now, let's use test and set to make a lock. We're trying to get out of disabling interrupts and do something better. All right, here we go. Here's my lock, it's in memory again, okay? You can have any memory location you want; I'm going to start it at zero. The interface, as usual, is acquire passing a pointer to my lock and release passing a pointer to my lock, so that's standard. And acquire looks like this: acquire takes a pointer to the lock and, in a while loop, keeps looping over test and set. The test and set operation here is atomic. Now, why does this work? It works because I start with zero meaning free. When I execute test and set, there are two possibilities. Either there's still a zero there, in which case I store a one but get back a zero, the while loop exits, and I've just exited acquire, meaning I got the lock; or somebody did that before me, and I just keep spinning, okay? And then release is very simple: I just store a zero there, and the very next thread that manages to execute test and set gets back a zero from it, stores a one there, and gets to exit acquire, okay? So the simple explanation is: if the lock's free, test and set reads a zero, sets the lock to one so the lock's now busy, and returns a zero, so the while exits. If the lock is busy, test and set reads a one and sets the lock to one, which doesn't change anything; grab a one, store a one, it may be atomic, but it doesn't change anything, okay? And it returns a one, so the while loop keeps trying. Then when we set the lock to zero, somebody gets to go, okay? Now, question: is this busy waiting? Yes, and that's awful, right? You wouldn't want a lock that worked like this, but we're getting there, okay? The first thing to understand, though, is that even though this is busy waiting, and it's bad for that reason, it will work perfectly well on a multiprocessor, and it'll also work perfectly well without going into the kernel, because notice there are no system calls here; we're just doing accesses to memory. So while this is busy waiting and not great from that standpoint, we're starting with something that maybe we can build on, okay? And yes, the problem of busy waiting is even on the slide here: the thread is busily consuming cycles while waiting. What'll happen here is that all the threads that are waiting will spin until their quantum runs out; they might spin for the next 100 milliseconds, give up the processor, the next one will spin for 100 milliseconds, give up the processor, and eventually we'll get to the thread that actually has the lock, it'll get to run the critical section and release, and then somebody else will finally get to run. That's why busy waiting is so bad: all the threads that are waiting are basically wasting cycles. Okay, now the one time this might be okay, and I'm going to tell you this, is if you have a multiprocessor with, let's say, 10 cores and you have only 10 threads, and you know for a fact there are 10 threads; then busy waiting on one core doesn't impact the other ones.
Running dedicated threads on dedicated cores like that might be a situation, okay, don't try this at home folks, where it makes sense to synchronize that way, if those 10 threads are trying to respond to locking as quickly as possible. Okay, but let's see if we can do better. And one more thing: this is actually not great for a multiprocessor either; we'll make a better one in a second. Every time we go through the while loop, this test and set is not a read, it's a write, right? Because we read and write. So it's a write operation, which means that if you have cache coherence, the cache line is bouncing back and forth between every core that is running this code. If you know anything about cache coherence, this is awful, because you're burning up all of your bus cycles or your network cycles moving this lock around, and ironically you're not even changing it; you're setting it to one over and over again, okay? All right, the comment on the chat, which is interesting, is that atomic instructions on a 64-core processor sound hard. They're not, and the reason is that if you have a working cache coherence protocol, you just pull the line into your cache and hold it so it can't be removed while you do the atomic operation, and then you release it in your cache, and it works fine. So it doesn't matter how many cores there are: the cache coherence protocol, if you've got a working one, lets you build arbitrary atomic instructions like this. Now, busy waiting is bad, so the positives of what we just gave you are: the machine can receive interrupts, because we didn't do any interrupt disabling; user code can use the lock, so that's great; and it works on a multiprocessor, sort of. Some negatives: it's very inefficient because the thread is consuming cycles; the waiting thread takes cycles away from the thread holding the lock, so ironically the thread that's waiting is actually preventing the thread that would give up the lock from making progress toward giving it up. And there could be priority inversion: if the busy-waiting thread has higher priority than the thread holding the lock, you might actually make no progress. Now, you don't know anything about priority scheduling yet, you will in a couple of lectures, but that's a priority inversion: a lower-priority thread holds the lock, but the higher-priority thread is forced to spin waiting for it, so the lower-priority thread is effectively preventing the high-priority thread from running. That's called priority inversion, and it's exactly what happened to the Mars Pathfinder rover; we'll have an interesting story about that in a couple of lectures, okay? And for semaphores and monitors, where you start getting more sophisticated styles of synchronization, a thread may wait arbitrarily long, so you may end up spinning arbitrarily long. So we need to do something else. And any solution you give on an exam or homework should avoid busy waiting unless we explicitly tell you it's okay, which I don't think we will in most cases, okay? So let me give you one other thing, called test-and-test-and-set, just so you know it. This is a much better solution for multiprocessors where busy waiting is not a concern, because you know you're consuming every core anyway. What it looks like is this: the release is the same, but if you look at what we do in acquire, while the lock is held we just spin on it with reads.
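Roughly, the acquire looks like this; in real code the inner read would need to be a volatile or atomic load so the compiler keeps re-reading it:

    // Test-and-test-and-set acquire; release is the same store of 0 as before.
    void acquire(int *thelock) {
        while (1) {
            while (*thelock == 1)
                ;   // spin on ordinary reads: hits our cached copy, no bus traffic
            // the lock looked free; try once with the (write-generating) test and set
            if (__sync_lock_test_and_set(thelock, 1) == 0)
                return;   // we turned the 0 into a 1, so we hold the lock
            // somebody beat us to it; go back to read-only spinning
        }
    }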
So while it's equal to one, we're just reading, reading, reading; notice that this doesn't really take any bus traffic, because you get a cached copy in your cache and you're just spinning on it, you're not doing a write. Then the moment it becomes zero, you exit that inner loop, quickly try one test and set, and if that fails you go back to spinning. What this does is prevent the ping-ponging effect, where all of the cores that aren't actually succeeding in getting the lock keep writing and causing the cache line to bounce back and forth. So that's called test-and-test-and-set, okay? It fixes the ping-ponging in the cache coherence protocol, but it still has a busy-waiting problem. All right, so what can we do? Well, remember what we did with disable and enable interrupts to get rid of busy waiting: rather than the disabling and enabling themselves representing acquire and release, we used them to very quickly protect the implementation of a lock, okay? So let's do that. Let's build test-and-set locks without busy waiting, okay? Or rather, we can mostly get rid of busy waiting, and it's a "mostly" that's okay. If you notice here, we introduce something in red called the guard variable, and it's global across all the locks in our system. And then of course we've got mylock, which is our actual lock. So if we had 20 locks, we'd have 20 of the integers shown in blue and one guard for all of them, okay? That guard is the thing we're going to test and set on. So acquire looks like this: while test and set, okay? That looks like spin waiting, except we're going to make sure that what's in the critical section is really fast, okay? So we're not going to spin very long. We spin until we've got the guard. Now guard is one, so we know no other thread is in this critical section of the lock implementation. Then we do what we've just seen: if the lock is busy, we put ourselves on a wait queue and go to sleep, and somehow simultaneously set guard back to zero. Hopefully that sounds familiar; it's just like having to somehow put ourselves to sleep and re-enable interrupts at the same time. There's some similarity there, right? Otherwise, if the lock wasn't busy, we go ahead and make it busy, set the guard back to zero, and exit acquire. That's the case where we managed to get the lock, okay? This is much better than the kernel interrupt version because it doesn't make a system call, okay? You got it. Now, for sleep you're still going to have to make a system call, because right now the only threads you know about are kernel threads. But the hope is that if you go to sleep, you're going to be sleeping for a while, so it's okay that it takes a little time to get into the kernel; you'll then be put on a wait queue, okay? For release, we again grab the guard and check whether anybody's on the wait queue. If there is, we have to do something to wake them up; otherwise, we go ahead and set the lock to free. Now, depending on your circumstances, you might still have a priority inversion issue, but let's hold off on that for now; I want to get to an idea about this particular implementation, okay? And notice the sleep: when we go to sleep, we have to somehow reset the guard variable.
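Putting those pieces together as a sketch: the guard spin uses test and set, and sleep_and_release_guard is a hypothetical primitive that atomically puts the caller on the lock's wait queue, clears the guard, and blocks (analogous to how the interrupt version had to sleep and re-enable interrupts as one step); the wait-queue type and helpers are also assumptions for illustration:

    int guard = 0;                            // one global guard protecting all lock implementations

    typedef struct {
        int busy;                             // the actual lock: 0 = free, 1 = busy
        wait_queue_t waiters;                 // threads sleeping on this lock
    } lock_t;

    void acquire(lock_t *l) {
        while (__sync_lock_test_and_set(&guard, 1))
            ;                                 // short busy-wait: the guard is held only briefly
        if (l->busy) {
            sleep_and_release_guard(&l->waiters, &guard);  // must clear the guard as part of sleeping
        } else {
            l->busy = 1;                      // got the lock
            guard = 0;
        }
    }

    void release(lock_t *l) {
        while (__sync_lock_test_and_set(&guard, 1))
            ;
        if (!queue_empty(&l->waiters))
            wake_one(&l->waiters);            // hand the lock straight to a waiter; busy stays 1
        else
            l->busy = 0;
        guard = 0;
    }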
If the guard isn't reset, this is not going to work, because if we went to sleep with guard equal to one, nobody else could ever release the lock, and we'd be in trouble, okay? Now, on the priority inversion issue: if you're worried about that, you could have a different guard variable for each priority, for instance, and that would take care of it. Now, let's compare this to the disable-interrupt solution. This was how we disabled interrupts; notice that we built acquire and release around it. And when is guard set to one? Guard is set to one right here, when we do the while test and set; that'll set guard to one, right? Okay. Remember, in our disable-interrupt solution we had a quick critical section and then re-enabled interrupts. Notice that we've essentially done the same style of redesign here for acquire and release: we've essentially turned "disable interrupts" into the while test and set on the guard, and "enable interrupts" into setting guard back to zero. So this is essentially the same code. Looking at it another way: here's how we first used interrupts to build acquire. We had mylock, and we did acquire and release. In the first case, it's so crude that we don't even have separate locks, right? In that version, maybe we pass in an integer pointer, but it doesn't help us because there's only one disable and enable in the system, okay? We decided that was a really bad idea, so we turned it into this code, where we use disable and enable as a lock around a simple critical section that's very fast, okay? Same idea here for test and set. The basic spin-waiting test-and-set lock looks like this, and what we did was take that acquire and release and use that style of locking to build a lock that we can afford to have held for long periods of time, okay? And the test and set on the guard itself is going to be very fast. All right, questions? This is the prior example, by the way, so it's nothing new. But notice: with test and set we do busy wait, but only for a very short period of time, because all the holder of the guard is doing is some really quick critical section before releasing it. The problem with the version I've got in the middle is really that when you acquire a lock, you have no idea how long the critical section is, and as we start getting more sophisticated in our locking, we may really have no idea how long that critical section is. We don't want the system to be locked up because our critical section is long, okay? What we want is to go to sleep as quickly as we can if we're waiting on a lock, okay? And the reason you'd use the same guard for all the locks is so that you don't have to pass a unique guard into acquire and release; it would get messy as an API if you did, okay? But if you felt like you wanted several different guards, you could do that too; there's no reason you couldn't have multiple guards with this particular implementation. Now, let's see if we can tease this out. This is all still in user mode; everything here is in user mode, and that's why this is particularly helpful. So what can I do to help this discussion? Let's do the middle one for a moment. If you notice, it's entirely at user level: we're just saying while test and set on the lock.
We're basically spinning until we get a zero back, and here we set the lock back to zero in order to release it. This is all running at user level. Is everybody good on this? Okay. There are no system calls involved here, because we're just using test and set instructions, which, just like adds and subtracts and multiplies, run at user level. The version on the right takes this original acquire and release and instead uses them, here's acquire and here's release, around an implementation of a lock where, when we discover that the lock we're actually after (you could say the blue one is our actual lock) is busy, we can put ourselves to sleep on a sleep queue. So there's potentially a system call in the middle to put us to sleep, but that only happens if we actually have to go to sleep. If we have an uncontested lock, we can grab it and release it really quickly without any system calls involved. Okay. That's right; what was said on the chat is exactly correct. What's good about this acquire implementation is that we grab the guard, which just protects the lock implementation, we quickly check, and if the lock is taken, that's the thing in blue, then we put ourselves to sleep on a sleep queue, and we release the guard as part of being put to sleep. So this thing on the far right is a way to very quickly take threads that are trying to acquire the lock but failing and put them to sleep on the actual sleep queue by diving into the kernel. Okay. The test and set part is busy waiting, but the guard is only held for a very short time, so the busy waiting doesn't have a major impact. Yes. Okay. That is exactly the way to look at the thing on the right. All right. Now, however, let's go a little further with this. Okay. I'm going to introduce you to the futex here. The idea is, yes, this is good, but there's something in the middle where we have to put things to sleep, and we don't have a good interface for that. Okay. If you look at the so-called futex system call that Linux has, this is for "fast userspace mutex", it basically has three arguments: a pointer to an integer in memory, which ought to sound familiar from what we were just doing; an operation, which can be, for instance, WAIT or WAKE, those are the only two we're going to look at, though there are a bunch of other, more interesting ones (you can do a man on futex); and a value. And then there's a timeout, which we can optionally add to signal a timeout if it waits too long. The value is just an integer. Okay. Futex stands for, as you see at the top of the slide, fast userspace mutex. Now, what this gives you is an interface to kernel sleep functionality that lets threads put themselves to sleep: if they call futex with FUTEX_WAIT, and the value they pass in is equal to the value in memory, then they go to sleep on a sleep queue. And the only way they wake up is if somebody calls futex with FUTEX_WAKE and wakes them up. Okay. Futex is not typically exposed in libc; it's used in the implementation of pthreads. So you can implement locks and semaphores and monitors, which we'll get to in a second. So here's our first try: for acquire, we'll say while test and set, and if we fail, rather than looping in a tight loop, we'll just call futex.
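Here is a sketch of that first attempt in C on Linux; the futex call goes through syscall() since glibc doesn't wrap it directly:

    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int mylock = 0;                        // 0 = free, 1 = busy

    // Thin wrapper: glibc does not expose futex() as a library function.
    static long futex(int *uaddr, int op, int val) {
        return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
    }

    void acquire(int *thelock) {
        while (__sync_lock_test_and_set(thelock, 1)) {
            // Lock was busy.  Sleep in the kernel, but only if it still reads 1,
            // so we cannot miss a release that slipped in between.
            futex(thelock, FUTEX_WAIT, 1);
        }
    }

    void release(int *thelock) {
        *thelock = 0;                      // free the lock
        futex(thelock, FUTEX_WAKE, 1);     // wake at most one sleeper: always a syscall
    }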
And the futex call takes the lock pointer, and we know the lock is equal to one right now, because we just failed the test and set. We say we want to wait, and what we're saying is: put me to sleep, but only if the lock is still equal to one. If you think about it, what we want is to avoid a race condition where, between the test and set noticing this was still a one and my calling futex, somebody released the lock, and I go to sleep in the kernel but they never wake me up. Okay. That's exactly why futex has this extra argument. So notice: test and set, and if it's a one we call futex with FUTEX_WAIT, we say here's the lock address, and as long as its value is still a one, put me to sleep; if it's not still a one, the futex just comes right back out and you go around the while again. Okay. So what this does is: if we're lucky enough to catch a zero on the test and set, we immediately exit and we've got the lock; otherwise this puts us to sleep until somebody releases, and at the point that they release, they set the lock to zero and then they say wake up one. Okay. Now, if you think about this sleep interface using futex, there's no busy waiting whatsoever in here. Okay. If you look at it, there are no wasted cycles, and the overhead for acquiring is potentially as low as one atomic instruction with no system call. Okay. Unfortunately, every unlock has a system call. Okay. So this is not quite clever enough: we can grab the lock quickly, but we can't release it quickly. Okay. Now, why a while instead of an if? Well, we have to keep looping on this until we're woken up and there's a zero there; and keep in mind that even after we get woken up, between us returning from the futex and trying to grab the lock again, somebody may grab it out from under us. So we have to keep looping just to make sure that we actually get the lock, which means we actually were the ones who turned a zero into a one. If we were the ones that turned a zero into a one, we have the lock; otherwise we don't have the lock. Okay. And that's acquire. Now we could do this next version. Okay. If you think about it, the only objection you might have to the previous one is what I say at the bottom here: to unlock, you always have to do a system call. What we'd like is that in the uncontested case, where there aren't two threads fighting over the lock but in general just one thread grabs the lock and releases it, we would like that to be completely at user level, as fast as possible, and only when threads actually have to go to sleep do we want to use system calls. Okay. So here's another attempt. If you notice, what I did here is add a new variable associated with the lock, called maybe-there-are-waiters, and I'll start it as false. What happens here is: I do while test and set, and assume for a moment I go from zero to one, so that means I've got the lock and I exit. That's great. When I release, I set the value back to zero. That's great. Oh, by the way, this should be *thelock = 0; I'll fix that. But then I say: well, is the maybe-waiters flag set? It's false, so I skip this arm and return right away. Let me fix the slide right now. Sorry about that, because I know this will be confusing enough without having a bug in there. So I'm going to say, all right. Everybody see it now? Okay.
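A sketch of this second attempt with the maybe-there-are-waiters flag, using the same futex() wrapper as before and with the release storing zero as just corrected; getting every corner case of this scheme exactly right is what the Drepper paper mentioned below is about:

    int mylock = 0;
    int maybe_waiters = 0;                    // set when somebody may be asleep in the kernel

    void acquire(int *thelock, int *maybe) {
        while (__sync_lock_test_and_set(thelock, 1)) {
            *maybe = 1;                       // there may now be a sleeper (me)
            futex(thelock, FUTEX_WAIT, 1);    // sleep only if the lock still reads 1
            *maybe = 1;                       // on wake-up, re-mark for any other sleepers, then retry
        }
    }

    void release(int *thelock, int *maybe) {
        *thelock = 0;                         // the corrected line from the slide
        if (*maybe) {                         // only pay for a syscall if somebody may be waiting
            *maybe = 0;
            futex(thelock, FUTEX_WAKE, 1);    // wake at most one sleeper
        }
    }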
So if you notice here, as long as there's only one thread grabbing the lock and releasing it, we're good. Now, does futex wake all the threads? No; that last argument, which I didn't talk about, tells you how many threads to wake up, so this FUTEX_WAKE wakes up at most one. And if you look here, you could use futexes for the actual locking, but that would kind of defeat the whole purpose, right? Because you'd be diving into the kernel. Now let's look at the situation where we fail: we try to grab the lock and we get a one back. So now somebody else has the lock, and we want to go to sleep. What we do is set this maybe variable to true and go to sleep, and assuming the lock is still one, this will actually put us to sleep. Okay, forget this extra detail for a moment. Later, when the release happens and the lock gets set to zero, we ask: is the maybe variable equal to true? It is, so we know somebody may be sleeping in the futex, and at that point we set maybe to false and we wake them up. Then they'll emerge over here, set maybe back to true, which handles the subtle case where there are multiple sleepers on the wait queue in the kernel, and try the while test and set again; assuming they succeed, they'll exit and we're good to go. Now, I don't want to go into this in great detail, but you should search for "Futexes Are Tricky" by Ulrich Drepper and see a little bit about how to optimize this. However, I'm going to blow your mind a little more, because test and set is just the wrong thing to use here. Much better is richer atomics, okay? And that is: the lock here is not going to have two states, it's going to have three. If you think about what I just showed you, it's already kind of like three states, right? There's not-locked with maybe-waiters false, and locked with maybe-waiters false or true; those are three or four combinations in there. What we really want is three states, which you'll see in that paper if you look at it. Unlocked, meaning nobody has the lock. Locked, meaning one thread's got the lock and nobody's in the kernel. And contested, which says somebody might be in the kernel. If we do the right thing with this, we'd like to only make the wake-up call when we know somebody might be in the kernel. So what this code does, and I'm going to leave the details to the reader for later, is first try a compare-and-swap: if the lock was unlocked, the compare-and-swap puts locked there, we immediately return, and we win. Otherwise we swap in the second state, contested, and as long as the old value wasn't unlocked we go to sleep; every time we wake up, we again try to swap in contested and look for unlocked, otherwise we just keep sleeping. And when we go to unlock, only if the value there was contested do we wake anybody up. So I don't want to go through this in great detail, but the interface here is really clean, because there's only one integer with three enum values, and the lock is grabbed cleanly by either the compare-and-swap or the first swap. So where do the atomic operations come into play? They basically turn unlocked into either locked or contested, and there's no overhead if the lock is uncontested. So as long as you've got a thread that grabs the lock and releases it, grabs it and releases it, it can do that entirely at user level at high speed with no kernel calls. And you can build semaphores in a similar way.
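A sketch of the three-state version along the lines just described (and of the lock in the blog post mentioned below); the enum names are illustrative, the futex() wrapper is the one from the earlier sketch, and __atomic_exchange_n plays the role of the swap instruction:

    enum { UNLOCKED = 0, LOCKED = 1, CONTESTED = 2 };

    void acquire(int *thelock) {
        // Fast path: uncontended lock, one atomic instruction, no system call.
        if (__sync_bool_compare_and_swap(thelock, UNLOCKED, LOCKED))
            return;
        // Slow path: mark the lock contested; sleep until our swap observes UNLOCKED.
        while (__atomic_exchange_n(thelock, CONTESTED, __ATOMIC_SEQ_CST) != UNLOCKED)
            futex(thelock, FUTEX_WAIT, CONTESTED);   // sleep only while it still reads CONTESTED
        // Our swap saw UNLOCKED, so we now hold the lock (conservatively marked CONTESTED).
    }

    void release(int *thelock) {
        // If the value was only LOCKED, nobody can be asleep in the kernel: no syscall needed.
        if (__atomic_exchange_n(thelock, UNLOCKED, __ATOMIC_SEQ_CST) == CONTESTED)
            futex(thelock, FUTEX_WAKE, 1);
    }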
Building semaphores that way is an exercise for the class reader. And that other web description I told you about, the blog, basically talks about the three states. All right, now, on the question of whether this will be on the midterm: there might be something on atomics on the midterm; whether you'll get something this complicated is hard to say. Okay, so where are we going with synchronization? We've now got, I think, a really good understanding of why loads and stores by themselves aren't enough. We talked about disabling interrupts as a locking mechanism: it really only works on one processor, but it works great in the kernel for certain situations, and we'll be using it a lot as we go on. We talked about test and set, and I hope you're all starting to get a flavor for how it works. And we need to provide primitives at user level that allow us to do better synchronization. So we've already built a bunch of locks, and you could imagine semaphores built very similarly using locks, but I want to move on to a better primitive than locks and semaphores now. Remember the thing about semaphores, which you've used, so you should be very familiar with them by now: they're a kind of generalized lock, a non-negative integer that supports the following operations. One, initializing it at the very beginning with a value. Two, a down or P operation, which atomically waits for the semaphore to become positive and then decrements it by one, so it will never go below zero, all right? So this is an atomic operation: if the semaphore is bigger than zero, it decrements it by one; otherwise it waits, and that wait is not a busy wait, it's a sleeping wait. And three, up or V, which increments the semaphore, and if somebody was waiting, they get woken up, okay? So that's the semaphore that everybody has been using and is familiar with from project one. And technically, examining the value after initialization is not allowed, okay? That's not part of the official interface, though if you were to Google the POSIX semaphores you'd find they actually provide that as an option; it's kind of outside the semaphore abstraction, okay? Now, we then came up with a bounded-buffer solution using semaphores, okay? And basically what we said was that we really want one semaphore per constraint. If you remember, it's simple to make a lock, or mutex, out of a semaphore by setting it initially to one, and then the fullSlots and emptySlots semaphores basically represent how many Cokes, in the Coke machine example, are there to be taken and how many more could still be added, all right? That led us to this code last time, which I don't want to spend any more time on, but basically: we start fullSlots equal to zero because there's no Coke in the machine, and we set emptySlots equal to the number of Cokes the machine can take, okay? The mutex is equal to one because it's a lock, and if you remember, the mutex serves as a critical section to make sure that the enqueue and dequeue don't get messed up. And then we use a semaphore V to increment fullSlots, thereby waking up the consumer if it's been waiting for a Coke to get into the machine, and we increment emptySlots to wake up the producer if it's got an extra Coke it's been trying to put in the machine. Now, this code works; hopefully you've digested it from last time, and I had it in my extra lecture as well. It's good; it's a huge step up from having just locks.
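For reference, here is the bounded-buffer solution just recapped, written with POSIX semaphores; the item type and the plain enqueue/dequeue helpers are assumed, unsynchronized buffer operations:

    #include <semaphore.h>

    typedef int item_t;                      // payload type, just for illustration
    extern void enqueue(item_t item);        // assumed: plain, unsynchronized buffer ops
    extern item_t dequeue(void);

    sem_t full_slots, empty_slots, mutex;

    void buffer_init(int buf_size) {
        sem_init(&full_slots, 0, 0);          // no Cokes in the machine yet
        sem_init(&empty_slots, 0, buf_size);  // buf_size free slots
        sem_init(&mutex, 0, 1);               // a semaphore initialized to 1 acts as a lock
    }

    void producer(item_t item) {
        sem_wait(&empty_slots);   // P: wait for a free slot
        sem_wait(&mutex);         // protect the queue manipulation
        enqueue(item);
        sem_post(&mutex);
        sem_post(&full_slots);    // V: wake a consumer waiting for a Coke
    }

    item_t consumer(void) {
        sem_wait(&full_slots);    // P: wait for a Coke to exist
        sem_wait(&mutex);
        item_t item = dequeue();
        sem_post(&mutex);
        sem_post(&empty_slots);   // V: wake a producer waiting for space
        return item;
    }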
So if you go back to last lecture, you'll see that if you try to build a bounded buffer and you only have locks, it's a mess, okay? So this is a little better, but the problem is that the semaphores here serve two purposes: one is a mutex and one is a scheduling constraint. And if you remember, if we swapped the P operations in the producer, we could get deadlock. So yes, you can build this kind of code, but how do you know it's correct? All right, so we'd like something better, and something better is: we use locks for mutual exclusion, because that's what we want them for, and something called a condition variable for scheduling constraints. So a monitor is a lock and zero or more, usually one or more, condition variables, okay? It's for managing concurrent access to shared data. Some languages, like Java, actually have this natively, okay? In other languages, like C, you use a library; pthreads has condition variables as a library option, okay? And a monitor is really a paradigm for concurrent programming. If you get a handle on how to do the monitor pattern, you'll find you can do some very complicated synchronization pretty easily, once you get the hang of it, okay? So what's a condition variable? A condition variable is a queue that a thread can sleep on when the conditions aren't right to proceed, and it's something you sleep on with the lock held. You're only going to use a condition variable to sleep inside of a critical section. So I want to stop for a moment, okay? Because that is weird, I hope, for everybody. The only way you're supposed to use a condition variable is by going to sleep inside a critical section when you've determined that the conditions aren't right for proceeding. I will give you examples of this, but I wanted to highlight that up till now, sleeping while holding a lock was just a very bad idea, because you'd deadlock the system. Condition variables are made for exactly that: they're supposed to be used that way, and in fact that's how you make sure you have the right constraints and that your synchronization works, okay? There are some standard operations: wait on a condition variable, which is how you go to sleep waiting; signal, which is how somebody wakes you up; and broadcast, which says take everybody who's sleeping and wake them all up. You could think of condition variables as generalizations of the wait queue that's normally inside the kernel, but brought out to user level for you to use. Okay, so this is an API, and the rule is you have to hold the lock when doing any condition variable operation. I'm going to say that again: you have to hold the lock. Okay, so when you think about a monitor, a monitor being a pattern or a way of programming, it controls access to some shared data. There is a queue of waiting threads, okay, those are the ones that are sleeping just like we had in the kernel, and a lock that controls entry. And then there are potentially a bunch of condition variables with threads waiting on conditions. That entry lock is just a regular lock, so anybody who's trying to acquire it might be put to sleep waiting for the lock. The condition variables are the more general thing: threads that have already entered the monitor but are now waiting, okay? And I think the best way to get going on this is a simple example.
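Here is a sketch of the simple example about to be walked through: an unbounded producer/consumer queue guarded by one lock and one condition variable, in pthreads style. The names buf_lock and buf_CV follow the lecture's slide, and the queue helpers are assumed, unsynchronized operations:

    #include <pthread.h>

    typedef int item_t;                      // payload type, just for illustration
    extern void enqueue(item_t item);        // assumed: plain, unsynchronized queue ops
    extern item_t dequeue(void);
    extern int queue_empty(void);

    pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  buf_CV   = PTHREAD_COND_INITIALIZER;

    void producer(item_t item) {
        pthread_mutex_lock(&buf_lock);        // enter the monitor
        enqueue(item);                        // safe: we hold the lock
        pthread_cond_signal(&buf_CV);         // wake one sleeping consumer, if any
        pthread_mutex_unlock(&buf_lock);
    }

    item_t consumer(void) {
        pthread_mutex_lock(&buf_lock);
        while (queue_empty())                        // re-check on wake-up: Mesa semantics
            pthread_cond_wait(&buf_CV, &buf_lock);   // sleeps AND releases the lock; holds it again on return
        item_t item = dequeue();
        pthread_mutex_unlock(&buf_lock);
        return item;
    }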
We've just been looking at the bounded-buffer example with the Coke machine, where there's a constraint on the size of the buffer. We'll get there in just a second, but let's start with a synchronized buffer that is an infinite buffer, okay? What an infinite buffer means is that it never gets too big; we don't worry about the size. But if a consumer ever comes along and there's nothing to take out of the buffer, we want the consumer to go to sleep, okay? So everybody with me: this is like half of the Coke machine example, the consumer half. And if you notice, what do I have here? I have a lock, which I'm going to call buf_lock. I've got a condition variable, which I'm going to call buf_CV. And then I've got a queue, which is some sort of linked list or doubly linked list or whatever we're using, okay? The producer, since we don't have to worry about overflowing the queue (remember, this is half of the Coke machine example), all it does is acquire the lock and enqueue the item, okay? So we put it on the queue. Why can we do that? We have the lock, so we don't have to worry about different threads trying to mess with the queue at the same time; we've grabbed the lock. And then, because we've acquired the lock, we can now do condition variable operations, and the only operation we're doing is a signal, to say: hey, I just put something on the queue, so if you happen to be sleeping there, you might want to look, okay? That's a signal. And then I release the lock. So the producer here is pretty simple, right? Acquire the lock, enqueue stuff, signal anybody who happens to be waiting for a Coke because I just put one there, and then release it, okay? Yes, this is exactly notify in Java. Okay, great, hold that thought, okay? The consumer is the more interesting part now, okay? I want you to look here. We acquire the lock, and now I'm doing something very strange, right? I'm saying: while the queue is empty, go to sleep, condition wait. I have to give it the condition variable and the lock, okay? I have to say: put me to sleep on this condition variable, and here's the associated lock. Can anybody figure out why I have to say what the associated lock is when I go to sleep? Yeah, great. That's exactly right: because when I go to sleep, somebody had better release it for me, okay? Now, we're going to understand that. So at one level, I want your brains to appreciate that the reason we hand in a lock is so that when we go to sleep, things get unlocked. What I'm proposing doesn't violate the laws of physics or programming in any way, but I want you to push that knowledge aside and get into the paradigm now. What happens differently here in the paradigm is that we're not checking for a full queue; remember, this is only half of the bounded buffer, without the bounded part. So we acquire the lock, and because we have the lock, we can check things like the size of the queue. If we check the queue and it's empty and we go to do something, we don't have to worry that it's going to change on us. Why? Because the lock is held, okay? So we have the lock, so we can happily check conditions; we can do anything we want, and they won't change on us until we release the lock, and that includes going to sleep. So from your standpoint as the programmer, you have to think: I have the lock, I went to sleep, okay?
When I get woken up, I still have the lock. That's the way you gotta think about this, okay? I'll get to the while loop in a second. Why do we have the while loop? Well, because even when we get woken up we may not still have the condition satisfied, so we have to check it again, and I'll say why in a moment, okay? But the idea is: I grab the lock, I can check conditions, I can go to sleep on conditions, I can wake up, I can recheck the conditions, but I always have the lock between acquiring it and releasing it. That's the way you wanna get to thinking about monitors, even though we all know the laws of physics aren't violated, because the lock is released underneath the covers for us. But for now: I check the queue, I go to sleep if there's nothing there, and if somebody signals me then I wake up and I check the queue again, okay? And if I find that it's no longer empty, I know, because I have the lock, that I can go down and dequeue something. It's not empty, because I just checked it, right? And then I release the lock, okay? And yes, the reason why we have to do the while loop is exactly because, between us getting woken up and the operating system reacquiring the lock for us, somebody else could get in there and grab the item on the queue. So we always have to check, okay? Now, part of this while loop thing I wanna talk about, because this is important. So there are two different types of monitors: a Mesa monitor and a Hoare monitor, okay? The Mesa monitor was named after the Mesa operating system from Xerox PARC. The Hoare monitor was named after Tony Hoare, who developed it. And if you look, we've all been asking ourselves, or I've seen several questions: why the while loop? Why did we say while is-empty, wait, then dequeue? And the question, which I'm sure was in your mind, is why not say if is-empty, wait, and then come out? And the answer is that between waking up and actually getting a chance to dequeue, it's possible for somebody to get in there. Now, remember, the way to think about this code is that I already have the lock. So if I emerge from condition wait, I know for a fact that I have the lock. So the only point when somebody could get in there is between somebody signaling me, me being put on the ready queue, and me actually starting to execute; somebody might have gotten in there and grabbed the item before we did, okay? But that's actually a distinction between Mesa and Hoare scheduling, okay? So if you look, Xerox PARC's Mesa operating system, which I think I even have a paper on, if you're curious, I think I put it up on the reading list, is what most operating systems use. And that's the situation where, between being put on the ready queue and us finally starting to run, somebody could grab the item ahead of us, versus the Hoare style, named after a British computer scientist, which is much more complicated, okay? So let me start with the second case. In the second case, the signaler actually gives up the lock and the CPU to the waiter, and the waiter runs immediately. So the way you look at this is: here I am, the signaler, I acquire the lock, I signal, the signal immediately gives the lock and the CPU to the one who's waiting, and they can now do anything they want because they know that no conditions have changed between the signaling and them running. And then when they finally release, they give the lock back to the original signaler and that thread gets to go, okay? It seems like great semantics.
It's easy to think about, maybe, but the problem is that it's messy from an implementation standpoint and it's actually really bad from a cache standpoint, because you've got this guy over here on the left who's happily running, doing all sorts of stuff, and just because he decided to signal somebody, he loses the CPU and all of his cache state, okay? So this seems like maybe it's not great from a performance standpoint, okay? It forces a whole bunch of context switching. Mesa, on the other hand, says the signaler keeps the lock and the waiter is placed on the ready queue. So here what happens is that when we signal, all we do is put the waiter on the ready queue and we keep going, and we release the lock, and sometime later the timer goes off, the scheduler runs, we wake up having been on the ready queue all this time, and we go back and check our condition, okay? And so practically we have to check the condition again just to make sure nothing's changed from the point at which we were signaled to now, okay? So most real operating systems have this Mesa scheduling: more efficient, easier to implement, better for the cache state, all right? Questions? Now let's do our fully bounded circular buffer, since it's been asked about a couple of times. So the only real downside of this is that it's non-deterministic, that's correct. But in fact the performance advantages here far outweigh the non-determinism, because you don't wanna have a situation where non-determinism gives you an incorrect result; you're gonna design things so it can't, correct? And there are too many other sources of non-determinism, so that one's not worth removing. So if you look, for the circular buffer we're gonna have one lock and two condition variables, one for the buffer being too full and one for the buffer being too empty. And of course these condition variables start out with nobody waiting on them. And so now the producer is gonna acquire the lock and it's gonna say: while the buffer is full, wait on the producer condition variable. And when it's no longer full, then we enqueue the item and we signal the consumer, okay? And in the case of the consumer, we say: while the buffer is empty, wait on the consumer condition variable, and when that's done, we dequeue and we signal the producer. So if you look at this, it essentially mirrors what we did for the semaphore version, but it's much cleaner, because we're waiting on condition variables and we're doing so inside of a lock, so we don't have to worry about any of the things we look at changing because somebody else is messing with us. Whenever we run all of that code, the while loop, the buffer-full check, the condition wait, all of that stuff is running with us holding the lock. Okay, now, what the thread does when it's waiting is sleeping, not busy waiting, okay? Condition variables are interfaced properly with the operating system; they'll put you to sleep. And by the way, you could imagine building condition variables using futexes. Now, why the while loop? Mesa semantics: in most operating systems, when the thread is woken up by a signal, it's simply put on the ready queue and may or may not reacquire the lock immediately. What about reacquiring the lock? Well, the way to think about this is that the semantics are: I never run code without holding the lock, if I had the lock first. I grab the lock, I go to sleep. Yes, I'll release the lock temporarily under the covers, but I'm not running.
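Here, as a quick aside, is a sketch of that bounded circular buffer in the same style, with one lock and two condition variables. The buffer size and the names not_full and not_empty are my own choices for illustration, not the slide's.

    #include <pthread.h>

    #define BUF_SIZE 8   /* arbitrary size, just for illustration */

    static pthread_mutex_t buf_lock  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;  /* producers sleep here when full  */
    static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;  /* consumers sleep here when empty */
    static int buffer[BUF_SIZE];
    static int count = 0, in = 0, out = 0;

    void produce(int item) {
        pthread_mutex_lock(&buf_lock);
        while (count == BUF_SIZE)                     /* buffer full: wait on the producer CV */
            pthread_cond_wait(&not_full, &buf_lock);
        buffer[in] = item;
        in = (in + 1) % BUF_SIZE;
        count++;
        pthread_cond_signal(&not_empty);              /* wake a consumer, if any */
        pthread_mutex_unlock(&buf_lock);
    }

    int consume(void) {
        pthread_mutex_lock(&buf_lock);
        while (count == 0)                            /* buffer empty: wait on the consumer CV */
            pthread_cond_wait(&not_empty, &buf_lock);
        int item = buffer[out];
        out = (out + 1) % BUF_SIZE;
        count--;
        pthread_cond_signal(&not_full);               /* wake a producer, if any */
        pthread_mutex_unlock(&buf_lock);
        return item;
    }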
And to finish that thought: before I get to run again, the lock will be reacquired by the system before I return from condition wait, okay? So I always have the lock. Ah, how are they both in the critical section if they're sharing the lock? So the answer is they're not both running in the critical section: one of them is sleeping, the other one's running, okay? So what actually happens: let's say the producer is sleeping in condition wait here. The consumer now says, well, while the buffer's empty... but it's not empty, there's something there. So, because the buffer's not empty, I go ahead and dequeue and then I signal the producer. So notice that I'm running and I have the lock; this guy's not running, he's sleeping. So the lock is only really owned by me, okay? Eventually I'm gonna signal, and I still own the lock. Then I'm gonna release the lock, now I don't have the lock, and I'm gonna keep going. Later the scheduler is gonna kick in, and the first thing that's gonna happen when the producer gets pulled off the ready queue is that the implementation of condition wait will reacquire the lock before it emerges from condition wait. So you could almost think of it as sleeping on the condition variable and then sleeping on the lock and then emerging. Okay, but I don't want you to think about it that way, and that's because condition wait gives up the lock under the covers without telling you, okay? And then it grabs the lock again without telling you. And it's this that makes it much simpler, and yes, it actually sleeps. Okay, and it may or may not reacquire the lock immediately, but any code that you run has the lock. Now, I wanna start on this a little bit because I wanna give you guys an idea that monitors are much more powerful than what I just showed you, and that's why they're so cool. Okay, so let's talk about the readers-writers problem with a database. So with a database, you wanna have many readers and one writer, and you can't have those two mixed. So when you're writing, there can't be any readers. And when you're reading, there may be many readers, but there can't be a writer, okay? And yes, it does actually sleep in condition wait. So there are two classes of users here, readers and writers. And using a single lock on the database: is that sufficient to get us the semantics we want? If we lock the database before we do our read or our write, does that give us the best behavior? Anybody? And why does that not give us the best behavior? Homework flashbacks. Yeah, because you can't have more than one reader that way. So what we're gonna do is come up with a solution using monitors that lets us have multiple readers and one writer, okay? And so the correctness constraints for this problem are: readers can access the database when there are no writers; writers can access the database when there are no readers or writers, because we can only have one writer at a time; and only one thread can manipulate the state variables at a time. Now, hold your breath for a second, don't do it too long so you turn purple, but I'm gonna show you the state variables that are gonna let us do this, okay? So the reader is gonna wait until there are no writers, it's gonna access the database, and then check out, which is gonna wake up a waiting writer if there is one. And the writer is gonna wait until there are no active readers or writers, access the database, and then wake up waiting readers or writers, okay? And we have four state variables and two condition variables. Now, this sounds bad because it's complicated, but it's not, okay?
Very simply, the four variables are as follows. How many active readers are there? That's how many readers are actually reading the database. How many readers are waiting, okay? That would be the number of readers that are not allowed into the database because of a writer. Similarly, the number of active writers is the number of writers currently in there, and if you think about it, our constraints say that AW can only be zero or one. And the number of waiting writers is the number of writers trying to get in. And then we're gonna have two condition variables to sleep on. And if you look at the code for a reader, what you're gonna do is first check yourself into the system. So in all monitors, you start by acquiring the lock, and now we're gonna check conditions. What's cool about this is that because we've acquired the lock, no other thread can get in there, which means I can do arbitrarily interesting checks. And my check as a reader is: if there are any active writers or any waiting writers, I have to go to sleep, because as a reader I'm not allowed in the system. If there's an active writer, clearly I can't read. And if there's a waiting writer, that writer is gonna get to go ahead of me, so I have to wait behind it. And so as long as there are any writers in the system at all, this solution is gonna increment the number of waiting readers, WR plus plus, and go to sleep on okToRead. And then when I wake up from that, I decrement waiting readers and try again. Okay, and I'm gonna keep looping in here until there are no writers of any sort in the system. And when I've exited that loop, I know that there are no active writers or waiting writers. So now there's one more active reader, namely me, okay, and I release the lock. Okay, and I do the actual database access. Okay, and yes, WR is a waiting reader, okay. And then when I'm done accessing the database, I grab the lock again. I check out by decrementing the number of active readers, because I'm no longer active; I'm not in the database anymore, okay. And if at that point there are no active readers but there is a waiting writer, then I'm gonna signal the waiting writer to wake up, okay, and I'm gonna release the lock. Now notice what's interesting here: if active readers is not zero, then I know that some other reader is still in there, and whoever is the last one out will do the wakeup. And if active readers is zero and there are no waiting writers, then there isn't anybody waiting who needs to be woken up. So we'll go through this more; I'm gonna actually, if you'll bear with me, talk this through a little more so I don't leave you in a totally confused state, but there are a couple of things I want you to think about. So first and foremost, writes are prioritized over reads in the way this has been done, okay. And is that a good idea? Well, standard data shows that there are far more readers than writers, that writes tend to happen quickly, and that you would like all of those writes to be reflected in the readers as quickly as possible. So A, writers are rare, and B, we would like their state to update quickly. So we give priority to writers over readers, for this example; that's not to say it's the right thing for every version of this. Okay. The other thing to notice is that this is a pattern, and you can see it spelled out in the sketch below.
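Here's the reader code spelled out in C. The state variable names follow the slide, AR, WR, AW, WW, plus the two condition variables okToRead and okToWrite, but the sketch itself is mine, and read_database() is just a placeholder for the actual access.

    #include <pthread.h>

    static pthread_mutex_t rw_lock   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  okToRead  = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t  okToWrite = PTHREAD_COND_INITIALIZER;
    static int AR = 0;   /* active readers           */
    static int WR = 0;   /* waiting readers          */
    static int AW = 0;   /* active writers (0 or 1)  */
    static int WW = 0;   /* waiting writers          */

    void read_database(void);   /* placeholder for the actual database access */

    void reader(void) {
        /* entry: check in */
        pthread_mutex_lock(&rw_lock);
        while (AW > 0 || WW > 0) {                   /* any writer, active or waiting, keeps us out */
            WR++;                                    /* register as a waiting reader */
            pthread_cond_wait(&okToRead, &rw_lock);
            WR--;                                    /* awake again; loop around and recheck */
        }
        AR++;                                        /* now an active reader */
        pthread_mutex_unlock(&rw_lock);

        read_database();                             /* outside the lock: many readers can be here at once */

        /* exit: check out */
        pthread_mutex_lock(&rw_lock);
        AR--;
        if (AR == 0 && WW > 0)                       /* last reader out wakes one waiting writer */
            pthread_cond_signal(&okToWrite);
        pthread_mutex_unlock(&rw_lock);
    }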
Grab the lock, check conditions: are the conditions right for me to go? If there are writers in the system, the conditions aren't right right now. At that point, I update information to let other people know that I'm in the system waiting, and then I go to sleep and try again. And then when the conditions are right, I record the fact that I'm an active reader, I release the lock, and notice that I've actually released the lock before I go into the database, okay. And you might ask, why release the lock here? Can anybody tell me? Great, exactly right: so another reader can come along. So notice this is like meta-locking, okay. This is not locking the database; this is meta-locking, because the lock is protecting my invariants on entry into the system. It's checking whether my constraints are right to allow me into the database. That's a different thing. That's why I release the lock before I even touch the database: my entry code is checking conditions, okay. And my exit code will wake up anybody who needs to be woken up, okay. And this is exactly what allows multiple readers, okay, great, because the next reader will come along, go through the same thing, notice that there are no writers in the system, and get to go forward. Now, what we know about the write side is that it better be the case that if there's a reader in the system, there's no possibility for a writer to start, right? That would be bad. So the writer, using the same pattern, is gonna acquire the lock. It's gonna say, well, if there are any active writers or there's an active reader, either of those situations kills it for a new writer, okay. A new writer can't write if there's either an active writer or an active reader, so it's gotta immediately go to sleep. So our condition for entry is different in this case: if there's either an active writer or an active reader, then what we're gonna do is say, oh, conditions aren't right, I'm gonna increment the number of waiting writers and go to sleep. When I get woken up, I can decrement the number of waiting writers because I'm not waiting anymore, and go check my condition again, okay. And yes, writers must wait until all of the running readers are done, okay. And at the point when I finally see no active writer and no active reader, then I become an active writer, I release the lock just like I did with reads, and I do the database access. Now, why is it an issue for a writer to go forward if there are active readers? Well, the readers are doing something with the database that may look at many different fields, and a writer is gonna go in there and start messing that up by changing the consistency. So the assumption here has to be that the writer is gonna touch things that will break the reader. It's not the case that the reader is just looking at one thing; the reader is looking at a full database record or what have you, okay. It's kind of like cache coherence, but bigger, okay. You could imagine that records have many fields in them, and if the fields aren't consistent, then the reader is gonna have problems. And so a writer shouldn't come along and change anything until there are no readers left, okay. And so the exit on the write side is gonna be similar: we're gonna grab the lock, we're gonna decrement because we're no longer an active writer, and then we have an interesting exit here, which you can see laid out in the sketch below.
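And pulling the writer's entry and exit together, here's the matching sketch, using the same lock, state variables, and condition variables as the reader sketch above; write_database() is again a placeholder. The exit code, with its signal-then-broadcast choice, is the part we pick apart next.

    void write_database(void);   /* placeholder for the actual database access */

    void writer(void) {
        /* entry: check in */
        pthread_mutex_lock(&rw_lock);
        while (AW > 0 || AR > 0) {                   /* an active writer OR any active reader keeps us out */
            WW++;                                    /* register as a waiting writer */
            pthread_cond_wait(&okToWrite, &rw_lock);
            WW--;                                    /* awake again; loop around and recheck */
        }
        AW++;                                        /* now the one active writer */
        pthread_mutex_unlock(&rw_lock);

        write_database();                            /* exclusive access: no readers, no other writers */

        /* exit: check out */
        pthread_mutex_lock(&rw_lock);
        AW--;
        if (WW > 0)
            pthread_cond_signal(&okToWrite);         /* prefer a waiting writer... */
        else if (WR > 0)
            pthread_cond_broadcast(&okToRead);       /* ...otherwise wake ALL waiting readers */
        pthread_mutex_unlock(&rw_lock);
    }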
If there's a waiting writer, then we're gonna wake somebody up, okay. Otherwise, if there are no waiting writers but there are waiting readers, we're gonna broadcast. So notice the difference: I wake up one writer, but I potentially wake up all the readers, and then I release the lock, okay. Now, interestingly enough, normally I don't do this at this point in the lecture, but what would happen if there were many waiting writers and we broadcast to all of them to wake up? Do we get bad behavior by having more than one writer suddenly using the database? Well, let's think this through. Notice what happens when a writer wakes up from condition wait: the first thing that happens before it comes out is that it grabs the lock. So only one of those many writers who woke up gets the lock and gets to emerge from condition wait; that's the one that decrements the number of waiting writers, goes through the loop, notices that there are no longer any active readers or active writers, and gets to release the lock after incrementing active writers. All the other ones then wake up in turn, but at that point they're gonna look and say, oh, active writers is now one, greater than zero, so they'll go immediately back to sleep. So even if we mistakenly broadcast and wake up all of the writers, only one of them would get through. Now, that's a great question: what if they're interleaved? And the answer is, why can't they possibly be interleaved? What's the paradigm here? Because of the lock, right? Forget for a moment that we go to sleep and let go of the lock under the covers, okay? Put that aside in your brain. What you need to think now is that all of the code between acquire and release has the lock. There can be no interleaving between acquire and release. Therefore, when I'm checking conditions and changing waiting writers and waiting readers and all of that stuff, there's only one thread doing that at a time, because it has the lock, and therefore it's consistent; there's no interleaving in that entry code. As soon as I release the lock, then there could be interleaving, but that interleaving is set up to only allow multiple readers or a single writer. Okay, and with the broadcast down here, the reason only one of them proceeds is not that only one of them wakes up. They all wake up, but only one of them finds that the conditions are right for it to run; it gets to run, and the rest of them put themselves back to sleep again. Okay, and there are no priorities here; just the one that got the lock first gets to notice that it's ready to go, okay? Okay, why broadcast instead of signal? Well, in that case we want multiple readers, and if you look at the reader case, if there are no writers then all the readers get to go. All right, why give priority to writers? Well, we talked about that earlier. Now, what I'm not gonna do now, because we've run out of time, but we'll do next time, is look at a simulation where you can actually see these variables go up; it might help you a little bit, but I'm gonna let you guys go for now. Sorry, we've gone over a little bit, but we've done a lot today. This was a big lecture, so I apologize for all the topics here. We've been talking about atomic operations as things that run to completion or not at all, and we've now moved to looking at instructions. So we talked about hardware atomicity primitives, okay? Disabling interrupts, test-and-set, swap, compare-and-swap. These are all atomicity primitives, okay?
We showed several constructions of locks. We looked at ways of using disabling of interrupts, and at ways of avoiding busy waiting and not tying up resources. And ultimately what we did is separate the lock variable from the hardware mechanisms that protect the implementation of the lock, okay? The other thing is that we talked about semaphores again as being good, but maybe too complicated. And so we started introducing monitors here, which are a lock plus one or more condition variables. And monitors really represent the logic of your program. And next time we'll talk about readers-writers a little bit more and walk you through an actual simulation so you can see all those different variables changing, okay? Now, the question on the chat here, which I'd be happy to answer, and people are welcome to go if they like, is: if one thread signals or broadcasts, does the scheduler have to wait until the thread releases the lock before it wakes up? No. Because we have Mesa scheduling, what happens is that the broadcast or signal merely takes threads and puts them back on the ready queue. That's all it does. It doesn't have to wait for anything, okay? It just puts them on the ready queue and then keeps running, okay? And then when a thread wakes up off the ready queue, at that point it tries to reacquire the lock inside the implementation, and it might get put immediately to sleep again because the lock's not available. And all of those threads that have been woken up will go through one at a time, all right? And yes, we'll talk a bit about the implementation of monitors if you like, but I wanna let you guys go. I hope you have a great weekend and we'll see you on Monday. Ciao.