Welcome, everybody. This is Professor Kuby with a special installment of CS162. I'm going to record lecture 7 here because I don't want us to get too far behind, and we weren't able to give the lecture on Thursday due to a projector malfunction. So what I'm going to do today is continue on with our discussion of synchronization. We're going to finish up discussing locks and lock implementation, and then we're going to move on to some higher-level primitives such as semaphores and monitors, and show you how something more powerful than a lock actually makes it much easier to construct interesting synchronization solutions. And we'll finish up, in fact, with the readers/writers solution using monitors. If you remember, on Tuesday we were discussing synchronization in the context of the too-much-milk problem. And we had constrained ourselves to only atomic load and store operations; we didn't have any other atomic operations. And that led, after a little bit of work, to this particular two-note solution, where each thread has a note, which you can think of as being a memory location. And each has the same critical section, which is the "if no milk, buy milk" that's in red here. And it's crucial that we maintain the invariant that only one thread is allowed in that critical section at once, in order to ensure the correctness of our algorithm. And notice that this entry and exit code here with the notes was designed to try to make sure that invariant holds. And yes, it works. If you look carefully through this, and we did this last time, what you find is that, for instance, in the thread A case at point x, we can be sure that either B hasn't left a note at all, in which case we fall through and do our work — and while we're doing the if-no-milk-buy-milk, B is going to get hung up on the fact that A has a note, and so it's not even going to enter the critical section. So we only have thread A in the critical section. On the flip side, if B has left a note when A starts running, A will leave its note, but then it'll spin here at x until B sees A's note and exits, in which case A goes ahead and buys the milk as before. Or, if B had already gotten a little further along before A started running, B will do the critical section and exit, and A will spin until B is done, then check the condition after B is gone, and the right thing will happen. So this works, I guess.
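To make the two-note scheme concrete, here's a rough C sketch of what's on the slide; the variable names are illustrative, and we're assuming that loads and stores of these shared flags are atomic, since that's all this solution is allowed to use.

    /* Two-note "too much milk" sketch: only atomic loads/stores assumed. */
    volatile int note_a = 0, note_b = 0, milk = 0;

    void thread_a(void) {
        note_a = 1;                /* leave note A */
        while (note_b)             /* point "x": spin while B has a note */
            ;                      /* busy waiting -- the problem below */
        if (!milk)
            milk = 1;              /* critical section: buy milk */
        note_a = 0;                /* remove note A */
    }

    void thread_b(void) {
        note_b = 1;                /* leave note B */
        if (!note_a) {             /* only proceed if A has no note */
            if (!milk)
                milk = 1;          /* critical section: buy milk */
        }
        note_b = 0;                /* remove note B */
    }

Notice the asymmetry: B checks A's note once and backs off, while A spins — which is exactly the structure being criticized next.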
But it has the downside of being very complicated to reason about. And in particular, the fact that A and B have different code is very worrisome, because, as you can imagine, if you had 100 threads, every one of them would be different. And then how do you make sure you've got correct code? I will guarantee you that that's a bug just waiting to happen. Also, thread A's solution — and pretty much any thread other than the one that looks like B's — is going to have spinning like this. While A is waiting for B to do its thing, it's busy in this loop. And let me just point out that B could be off buying milk while A spins in this loop, so the time you're in the spin loop could be minutes or an hour, depending on how long it takes B to go get milk. We call this busy waiting, and it's bad. This is just not a good synchronization solution, because we're wasting time and CPU cycles on a thread that's making no forward progress. And B potentially isn't even getting cycles as a result. So that busy-waiting solution is not a good one. And if you come up with a solution that has that kind of long-term busy waiting on an exam or in homeworks or whatever, it will be considered incorrect. So looking back for a moment, what do we really want to put on either side of our critical section? We were talking about locks on Tuesday, and a lock really prevents somebody from doing something. The essential idea is that you lock before entering the critical section, before accessing shared data, and then you unlock when leaving. And if multiple threads attempt to lock at the same time, one of them gets to go forward, but all the others end up waiting. And this is an essential idea, which I also emphasized multiple times in the previous class: all synchronization involves waiting of some sort. The way we fix bad situations that arise from too many threads entering the critical section at once is to make sure the threads wait in an organized fashion; that controlled waiting is our solution. For example, you can fix the milk problem by putting a lock and key on the refrigerator: you lock it, and you take the key with you if you're going to go buy milk. Of course, as I pointed out, this fixes too much, because the roommate can't get into the fridge for anything, milk or not. And of course, we don't really have a good idea how to make a lock yet. So looking forward to why we want locks, here was our solution number four, which we postulated, in some sense, if we just had a lock. What would that lock do? We'd say lock.acquire(), which waits until the lock is free and then grabs it; lock.release() unlocks and wakes up somebody who might be waiting on the lock. And these are atomic: it doesn't matter how many acquires and releases are happening simultaneously, we still have the invariant that only one thread is allowed through acquire at a time. However we build these locks, we have to make sure the implementation provides those atomic semantics, and that's where the complexity of the implementation comes into play. If we had a lock, then our milk problem is easy: we acquire the lock, we do the critical section, we release the lock, and every thread looks exactly like this. So we could have 100 threads or 1,000 threads; they could all look like this, and we would keep the invariant that only one thread is in the critical section at once. I want to point out something here briefly. I've been using an object-oriented syntax in the early part of our discussion just to make it easier, so the lock has acquire and release methods. A little later in the lecture, I'm going to switch over to something that's a little less object-oriented in its syntax but still works, and it's something you might use with pthreads. So once again, just to re-emphasize: what's in red here is called the critical section, and it's the thing being protected by the lock.
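Here's a minimal sketch of that solution #4 in the lecture's lock style; milk_lock and buy_milk are illustrative names. The point is that every thread runs this same code:

    lock_acquire(&milk_lock);   /* wait until the lock is free, then grab it */
    if (!milk)
        buy_milk();             /* critical section */
    lock_release(&milk_lock);   /* unlock; wake up one waiter, if any */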
Now, as you might remember from the last lecture, we were talking about how to implement locks by disabling interrupts. And the key idea there was that we don't want to use disable and enable directly as our acquire and release. Because again, you could acquire the lock, spend half an hour getting milk, and come back — and that would disable interrupts for half an hour, which would just be bad. So instead, what we did was say: let's make the lock be a variable in memory, and then use the disabling and enabling of interrupts to help implement the lock. Notice the difference there. We're not using disable and enable as the lock; we're using it to implement the lock. So here was our acquire method. We had the variable in memory, which we called value, which could be either free or busy — 0 or 1, for instance. And the acquire method basically disables interrupts, checks, and asks: does somebody have the lock or not? If they don't, we grab the lock by setting value to busy and enabling interrupts again. The thread that has made it successfully through will exit acquire and now work on the critical section. Alternatively, if the lock is busy, we put ourselves on a wait queue and go to sleep. Notice the important part here: go to sleep means we're not going to be busy waiting. We're going to be sleeping on a queue, which means somebody else can use the CPU. And there was this little question about when we re-enable interrupts, since they're disabled here when we put ourselves on the queue; we talked about the different places we could do it and decided that needed a little more discussion. I'll remind you of that in a second. The release is also very simple. We disable interrupts. Since we're giving up the lock, we can hand it to somebody who might be waiting. So if there's anybody waiting, we take the thread off the wait queue, put it on the ready queue, and exit by re-enabling interrupts. And notice that we've gone through release, so as far as the programmer is concerned, the lock is released — but the lock value will still be busy at that point, so that when the sleeping thread wakes up and starts running, it will wake up over here inside acquire and exit, with value still busy, having gone through acquire and acquired the lock. So notice, incidentally, that the idea of who actually owns the lock is not explicitly recorded in the lock. The lock is either locked or not locked. You could say, as I just did, that we sort of handed the key from the first thread to the second. And note, by the way, that this style of implementing — where we implement the lock with disable and enable of interrupts inside — could give us many locks. We could have a whole array of these and choose which one by saying, in our acquire method, which lock we're interested in. As a result, we could have very fine-grained locking, where we have a lock for every element in an array, or a lock for every item on a linked list, or whatever. So this is extremely flexible, and we're just using the disable and enable of interrupts as our implementation methodology.
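Here's a sketch of that interrupt-based lock, roughly as it appears on the slides; disable_interrupts, queue_t, current_thread, and friends are illustrative stand-ins for kernel facilities.

    /* Sketch: value lives in memory; interrupts only bracket the
     * lock's implementation, not the user's critical section. */
    typedef struct {
        int value;            /* FREE (0) or BUSY (1) */
        queue_t wait_queue;   /* threads sleeping on this lock */
    } lock_t;

    void acquire(lock_t *l) {
        disable_interrupts();
        if (l->value == BUSY) {
            enqueue(&l->wait_queue, current_thread);
            sleep();          /* the *next* thread re-enables interrupts;
                                 they're disabled again when we wake here */
        } else {
            l->value = BUSY;  /* grab the lock */
        }
        enable_interrupts();
    }

    void release(lock_t *l) {
        disable_interrupts();
        if (!queue_empty(&l->wait_queue))
            wake(dequeue(&l->wait_queue));  /* hand lock to a waiter;
                                               value stays BUSY */
        else
            l->value = FREE;
        enable_interrupts();
    }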
Now, this question about how to re-enable after sleep. If you notice here, if we decide we need to go to sleep, we put ourselves on the wait queue. If we re-enable interrupts there, then there's a possibility that somebody else will run, release the lock, and hand it to us before we've actually gone to sleep — and then we go to sleep holding the lock. That would be bad. And we can't enable interrupts after the sleep, because we're sleeping. So the question was really: what do we do there? And what I pointed out on Tuesday was that the problem with this view is it assumes we can talk about one lock and one thread in isolation. In fact, what really matters is the context — the context of the actual scheduler operating here. When one thread goes to sleep, we know we've got to run another one, pulled off of the ready queue. So if you look more carefully, what we're really going to do is have interrupts disabled, go to sleep, then wake up another thread — and that other thread is going to re-enable interrupts. That's what we showed you here. We had two threads. We were running along in thread A until eventually we get to where we disable interrupts and go to sleep. And this could be either because of the lock implementation I just showed you, or because an interrupt is causing a rescheduling to give the CPU to some other thread. At that point we context switch, and thread B's context, including its stack, is put into the processor. Then on exit, after the context switch, we re-enable interrupts, and thread B runs for a while. So notice it wasn't thread A that was re-enabling interrupts — that was our sort of false way of thinking. What matters instead is that thread B re-enables interrupts as part of the normal business of being context switched. Eventually it enters the scheduler again with interrupts disabled, context switches, and thread A might re-enable interrupts later. Okay. Now, just to make all of this clear, I put together a little simulation for you. What I want to show you here is that we have two threads, A and B, and they're both going to try to get into the critical section: they're going to try to acquire the lock, do the critical section, release the lock, and they're going to be running at the same time. As we start, thread A is actually in the processor running, but B is on the ready queue. The regular scheduler will switch back and forth between A and B, so they're running concurrently, and we could have a problem with multiple threads inside the critical section at once if things aren't working. A couple of other things to note up top. One, we have this value box, which represents the actual lock, and it can be either zero or one. Then I have a couple of other items here, and one of them is owner. As I indicated on a couple of previous slides, the owner is never explicitly tracked by the lock — the lock just knows whether it's locked or not. Who the owner is has to do with who's allowed through acquire into the critical section. What I'm going to do, though, is help you by pointing at the thread that actually has the lock, when somebody has it, with this little owner box, just to help us understand the simulation. But keep in mind there's no actual owner link. The other thing is a wait queue — and a lock does have a wait queue. This wait queue is going to point at all the threads that have tried to acquire and been put to sleep because somebody else has the lock. Okay, let's do it. So we start off with thread A running. The first thing it does is try lock acquire, which means it makes a system call into the kernel — we have to be in the kernel in order to disable interrupts. So then we disable interrupts; I've got a little red dot here just to show that interrupts are disabled. And we're now going to check the value. And of course the value is not one, it's zero.
So we go to the else clause. We set value to one because we grabbed the lock. And at that point, notice I'm just showing that the lock is owned by the owner and that value is one — but this isn't a real link, it's just for us. Then we re-enable interrupts — notice the green — and we return from lock acquire, and now we're running back out of the kernel. The lock is locked and we're running. We've emerged from lock acquire and we're working in the critical section. Okay, so all is fine and dandy, except that thread B is on the ready queue, and the scheduler could take over at any time, having nothing to do with synchronization. At that point the scheduler takes over: we enter the kernel because of a timer interrupt, and we disable interrupts. And now what? Well, the scheduler is going to pick the next thread to run, and that means B. So what's going to happen, if you think this through, is of course that we put thread A back on the ready queue and have thread B run. Notice we put thread A on the ready queue, so now it's ready; we take B off of the ready queue and set up the CPU's registers. We re-enable interrupts and, voila, B starts running. So now notice that A is on the ready queue, so it's technically still runnable. It's got the lock — we know that. But B is now about to call lock acquire, which means it's going to go into the kernel, disable interrupts, and now what happens? Well, we check this value to see if somebody owns the lock, and of course this time the lock is owned, because the value is one, so we enter this code. And this code puts the thread on the wait queue, then puts the thread to sleep — taking it off the ready queue — at which point we schedule another thread to run. Well, what threads are runnable? Here, thread A is on the ready queue, notice. So the scheduler can run it. It re-enables interrupts, and now thread A is running, and we keep going. So I want you to look here for a moment. What happened was both tried to lock acquire. A got there first; A got the lock. B tried to acquire, and it got put to sleep on the wait queue. From the standpoint of this user code, it's still in the middle of acquire — it has not even emerged from acquire yet. It's in the kernel, sleeping. Now A is running along, and eventually it releases, and the release makes a system call which disables interrupts. If we look carefully, what we do is say: okay, I'm releasing the lock — is anybody waiting for the lock? And if you look up here at the wait queue, yes indeed, somebody's waiting. So we take that thread off the wait queue and put it back on the ready queue. And again, this owner pointer is not a real thing, but it's telling us, just for our simulation, that A has now transferred the lock to B: A is executing release, so it's releasing the lock, and this code has handed it to B. Now, B isn't actually running — A is — but B now owns the lock because of the way this code is structured, and that's why I have a little pointer here for us. So then eventually we exit, we re-enable interrupts and go back, and A is still running. B is on the ready queue, so it can run at any time, but it still hasn't emerged from acquire. So A's running along.
Eventually the scheduler takes over because there's an interrupt. Poof, we come back into the scheduler. The scheduler says: who can run now? It picks a thread off the ready queue, which is B. So notice we put A back on the ready queue, and B's running. We re-enable interrupts, and now we can emerge from lock acquire. At that point we emerge from lock acquire, we return from the system call, and now thread B is running — it's emerged from lock acquire, which means it gets to run through the critical section. And we could go back and forth now, whatever; the scheduler can take over. We still have only one thread actually in the critical section at once, and that has taken care of our invariant. So I hope the simulation was useful; it gives you an idea of why this works. I did also want to point out: notice, by the way, that every acquire and release requires us to go into the kernel and come out of the kernel, into the kernel and out of the kernel, and that's expensive. It requires not just transitions from user mode to kernel mode, but saving and restoring of context, and scheduling. So there are a lot of scheduling operations going on in here. And this works — it's a successful implementation of a lock: when you're waiting to acquire the lock, you're put on a wait queue, so there's no busy waiting. It's got the right properties, but it's very expensive, and as I'll mention in a moment, it doesn't work well across multiple processors; this works mostly on a single processor, a single core. So if you remember — just to reinforce that point for a second — when we were talking about how to make a good web server that can handle many requests at once, we had this thread pool idea, where we had a limited amount of parallelism with a pre-allocated number of threads. I'm showing four here; it could be whatever you decide. Every incoming request is queued first, and then a thread that's free takes something off the queue, processes the request, and sends the response. This particular topology for our application is good because it makes sure we only have a limited number of things running simultaneously, so the overhead is taken care of from the thread pool standpoint. But this queue is shared. Why is it shared? Well, it's shared between the master thread and all of the threads in the thread pool, so as a result we need synchronization. And if we were to put a lock around it, this means that pretty much every request that gets put on the queue and taken off the queue requires multiple system calls, entry and exit into the kernel, and context switching in order to work. So the time it takes to get in and out of the kernel is fundamentally going to limit how high-performing this can be.
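As a rough sketch of that structure — request_queue, handle_request, and the lock API are illustrative names, not from the slides — each worker in the pool does something like this, with the master doing the mirror-image enqueue under the same lock:

    void *worker(void *arg) {
        for (;;) {
            lock_acquire(&queue_lock);
            request_t *req = dequeue(&request_queue); /* shared with master */
            lock_release(&queue_lock);
            if (req)
                handle_request(req);  /* parse the request, send response */
            /* if the queue was empty, the worker should sleep rather than
             * loop -- exactly the problem the rest of this lecture solves */
        }
        return NULL;
    }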
So we could come up with a very simple performance model. We could say that the overhead of that critical section is X, where we context switch into the kernel, acquire the lock, context switch out of the kernel, perform the work, context switch back into the kernel, release the lock, and context switch out. Even if everything else, including the exclusive work itself, is infinitely fast, with multiple threads and cores we still have this entry and exit to the kernel that's going to limit our overall performance. This sounds like an Amdahl's Law problem to me, right? And so, for instance, in the highly contended case, we might have P different threads all trying to synchronize on that queue. They all try to grab the lock, and they end up serialized by the cost of the lock — call that X. For all of these threads to get in and out of the lock, the time is going to be basically P times X seconds, or, put another way, there's a rate of one over X operations per second, and that's going to limit our performance no matter how many threads we've got and how fast we can operate once we've gotten the lock.
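To put rough numbers on that model: we'll see in a moment that X is on the order of a millisecond, so with, say, eight contending threads (an illustrative number), it takes about 8 ms for all of them just to serialize through the lock, and the queue can never sustain more than roughly 1/X = 1,000 locked operations per second — no matter how many cores we add or how fast the enqueue and dequeue themselves are.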
So this seems like a bit of a problem, right? And if we go back to system performance, X is probably — and I'll show you in a second — in the one-millisecond range. If you look back at Jeff Dean's "numbers everybody should know," we're actually talking down in the range of a disk seek, which seems very unfortunate, since we're just trying to lock data structures in memory, right? So somehow this implementation, in and out of the kernel, is really limiting what we can get. And just to put another point on this: even in the uncontended, many-lock case — where we have lots of locks, remember I mentioned an array of them earlier — if every one of these lock acquisitions, which are uncontended, meaning there's nobody trying to get them other than one thread, incurs a minimum system overhead, that's going to limit how fast all of these threads can go regardless. For instance, you could ask: what if the operating system can only handle one lock operation at a time? If we actually measure across a bunch of different processors in a couple of different scenarios, we find that the minimum system call time is about 25 times the cost of a function call. And once you start scheduling — where you enter the system call and there are scheduling operations and queueing involved (we'll talk about queueing later in the term) — that can increase the cost even more. So obviously we want to streamline the amount of system processing — how much the OS has to get involved in a lock operation — as much as possible. And remember, in this uncontended scenario, these locks are never busy: we grab them, we're the only one grabbing them, and we release them. So even in the time in which we grab an unlocked lock, we're still being limited by the system time. There's something wrong with our implementation. Now, there are lots of optimizations people have done with kernels to try to limit the system time — they try to do as much in user space as possible, like the Linux vDSO — and that's a situation where we can get closer to regular function call time. But we'd really like to limit the system time as much as possible. So how do we do better? An interrupt-based solution works for a single core, but it's very costly. The kernel crossings and system calls are required because users are not allowed to disable or enable interrupts, so we have to enter the kernel. Meanwhile, since we're busy disabling and enabling interrupts, we're disrupting the basic interrupt handling process, which we'd like to devote to the things that really need it, like I/O. The other thing I mentioned before is that disabling and enabling interrupts doesn't work on multicore machines. First of all, it's very hard to disable interrupts across all the cores. That's typically a much more complicated operation than a single interrupt disable. And the other thing is that, since it's hard to disable across all of them, if you only disable interrupts on one core, threads on the other cores can still violate the critical section — so this implementation isn't going to work at all in that case, unless we disable across all cores, and that's hard. So our solution — and we started talking about this last time — is to utilize hardware support for atomic operations. These operations work on memory, which is shared between the cores and doesn't require system calls to get at. Ideally, we'd like a situation where our basic lock operations are very fast and don't require the kernel, especially when we don't have to go to sleep. So if the lock is uncontended, we'd like to just grab it very rapidly and go forward, to reduce that overall system overhead. We mentioned several read-modify-write operations. This one is test&set. The way to interpret these, again, is that the test&set instruction itself has pseudocode inside where — and remember, we did this together — you grab the value at the address and you store a one. So you grab the value, store a one. Test&set basically lets you find out what was there, but always puts a one there — and that is powerful enough to build a lock operation that doesn't have to enter the kernel, which is great. I'll tell you more about that in a second. We talked about some others. Swap grabs the value from the address, puts it in register A, and simultaneously puts the value that was in that register back into memory — so you're swapping the values of a register and a memory location. Compare&swap is a little more sophisticated. That one says: grab the value from memory, and if it still matches what's in register one, then store register two out to memory. And if you look back at Tuesday, I showed you how this is a pretty sophisticated operation that basically allows you to do things like build a linked list that has no explicit locks at all, just compare&swap operations — that's what's called a lock-free synchronization structure. We'll talk about those more. And finally, we talked about load-linked/store-conditional. The idea is that you grab the value from the address — that's the load-linked — and then you do something; in this case we're just going to put a one into that address and store it back. The store-conditional says: only store back to the address if no other processor has touched the address. If another processor has touched it, the store fails, and we loop back and try again. And if you think this through, we end up with an effectively atomic operation: either we make it all the way through and exit — so this actually implements a test&set — or we don't alter the state at all, and we keep looping until it succeeds. So this lets us construct arbitrary read-modify-write style atomic operations using load-linked and store-conditional.
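Here's pseudocode for the semantics of those primitives; each function body executes atomically in hardware, and load_linked/store_conditional are illustrative stand-ins for the architecture-specific instructions.

    /* Atomic read-modify-write primitives: each runs atomically. */
    int test_and_set(int *addr) {      /* read old value, always store 1 */
        int old = *addr;
        *addr = 1;
        return old;                    /* tells you what was there */
    }

    void swap(int *addr, int *reg_a) { /* exchange register with memory */
        int tmp = *addr;
        *addr = *reg_a;
        *reg_a = tmp;
    }

    int compare_and_swap(int *addr, int reg1, int reg2) {
        if (*addr == reg1) {           /* memory still matches reg1? */
            *addr = reg2;              /* then store reg2 */
            return 1;                  /* success */
        }
        return 0;                      /* failure: someone got there first */
    }

    /* test&set built from load-linked / store-conditional: */
    int test_and_set_llsc(int *addr) {
        int old;
        do {
            old = load_linked(addr);           /* read and "watch" the address */
        } while (!store_conditional(addr, 1)); /* fails if anyone else touched it */
        return old;
    }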
So how did we build a lock with test&set? Here was our first simple cut. We still have value, which is our lock itself, but now it starts at zero, and acquire just says while(test&set(value)), and release sets value to zero. Remember, test&set grabs the value and stores a one. So assume for a moment that a thousand threads all try acquire at once. They're all going to try to execute test&set. One of them will see the zero, grab the zero, and store a one; all the rest of them will see a one and store a one. What that means is that only the first one — the one that sees the zero — gets to exit acquire and enter the critical section; the rest keep spinning. Eventually, when that single thread that succeeded executes release, all release does is store a zero. That zero gets noticed by one of the remaining threads, which gets to come out of acquire, and the rest stay spinning. And notice that this implementation assumes we have cache-coherent shared memory across all the processors, which is an interesting requirement we can talk about later — let's assume that just works for now. So the simple explanation I just gave you: if the lock's free, the test&set gets a zero, sets a one, and returns the zero, so we get out of acquire; otherwise everybody else just keeps spinning, and when we finally set the value to zero, somebody can get the lock. So this has the nice property that we don't have to enter the kernel — there's no system call involved here — but it's basically 100% busy waiting. All of the threads that are waiting are busy spinning, consuming cycles, and this is really not a good implementation, because remember, we said busy waiting is not good. And it's even worse on a multiprocessor, because test&set — remember, it grabs the value from memory and stores a one — is a write operation from the standpoint of the cache coherence protocol. So when we have many cores, every one of them is doing lots of writes, which means that value is ping-ponging back and forth and causing a huge amount of bus traffic. So not only are we wasting cycles, we're wasting bus traffic on the memory bus and tying everybody up so they can't make progress. This is undesirable because of busy waiting, and it's even worse on a multiprocessor.
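Here's a sketch of that naive spinlock, using the C11 atomic exchange as a stand-in for the test&set instruction — shown only to illustrate the problem, since it's 100% busy waiting.

    #include <stdatomic.h>

    atomic_int value = 0;                   /* 0 = free, 1 = busy */

    void acquire(void) {
        while (atomic_exchange(&value, 1))  /* test&set: read old, store 1 */
            ;                               /* spin until we saw a 0 */
    }

    void release(void) {
        atomic_store(&value, 0);            /* one spinner will now see 0 */
    }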
So what we can do, at least in the case of the multiprocessor, is the following — actually, I'll show you that in a second; I want to talk about busy waiting a bit more first. The positives for this particular test&set solution are: one, the machine can receive interrupts, since we're not disabling interrupts in any way. User code can use this lock without entering the kernel. So from the standpoint of the minimum system overhead we mentioned in the previous couple of slides, this is very low overhead — it's just a simple test&set instruction. And it works on a multiprocessor: we can use memory shared across the multiprocessor, and it will work. So we have a lock implementation. The negative is that it's very inefficient, because threads are consuming cycles. And of course, when you're busy waiting, you're potentially taking cycles away from the very thread that's got to eventually free the lock — your thread is spinning while the thread that needs the cycles isn't running. And, as I mentioned, for multiprocessors this has an even worse problem. And the other thing that's kind of a problem here is that priority inversion can get involved; we're going to talk about that as we go a little further. If the busy-waiting thread has a higher priority than the thread holding the lock, nobody ever makes progress, because the higher-priority thread is busy spinning. As far as the scheduler is concerned, it's making progress on something, but it never gives the processor up to the lower-priority thread that has the lock, and that's a problem. This is actually one of the interesting problems that happened with the Mars rover, and we'll talk about it when we get to scheduling. So looking forward, we're going to want to make sure that, among other things, the APIs for our higher-level primitives, such as semaphores and monitors, don't have busy waiting built into the API. Right now, the way we've been building locks, the user code itself has busy waiting in it, and that seems very unfortunate; we'd like an API without busy waiting, so the underlying implementation can do a better job. And I will point out — I've said this before and I'm going to say it again — that homework and exam solutions should not have busy waiting in them. If you think about it: if it's possible for a user thread to hold a lock and cause somebody else to spin for an arbitrary amount of time, then you've got busy waiting and you have a problem. So let me very briefly — I promised you this — fix the multiprocessor spinlock issue a little bit, with something called test&test&set. This is not a new instruction; it's just a slightly more clever use of test&set. So here we have the lock again, and notice what acquire does. In a loop, it first says: as long as the lock is one — that's the while(mylock), saying it's not zero — we spin. Then, when we're done spinning because there's a zero there, we try the test&set to grab it. If we're successful, we just plain exit the acquire and we have the lock. If we're not successful — because the test&set grabbed the value, stored a one, and we got back a one — we go back to the loop and keep spinning until the next time we see a zero. Release, of course, sets the value to zero. And what this does is ensure that most of the processors that are waiting — yes, they're still busy waiting, so that's still bad — are spinning doing reads in their caches. If one thread acquires the lock and holds onto it a while, eventually all the other processor cores are spinning read-only in their own caches, using no bus traffic for this. So this is how you would do it on a multiprocessor if you were willing to burn cycles busy waiting.
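Here's a sketch of test&test&set in the same style as before, reusing the atomic value from the previous sketch:

    #include <stdatomic.h>

    atomic_int value = 0;   /* 0 = free, 1 = busy, as before */

    void acquire(void) {
        do {
            while (atomic_load(&value))        /* read-only spin: stays in cache,
                                                  no bus traffic while held */
                ;
        } while (atomic_exchange(&value, 1));  /* lock looked free: try to grab
                                                  it; on failure, spin again */
    }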
However, this is still consuming cycles while waiting — though a given processor wasting cycles is no longer impacting the other processors through the cache-coherent memory. Now, should you ever spin wait? Not yet — you're still too early in your understanding of synchronization. I will point out that eventually, when you get more sophisticated, it may make sense to spin wait in some isolated scenarios on a multiprocessor, where you know you're not going to wait very long: you briefly spin while the other thread finishes what it's doing, it releases the lock, and you immediately get to go. But that's for a more sophisticated locking scenario. So for now, however: no spin waiting, no busy waiting in your implementations. So how can we get the advantages of test&set — running at user level, without system calls — without the spin waiting? Well, we can take a page from what we did with interrupts earlier. Once again, we'll use the spin waiting, just like we used disable and enable interrupts, as part of the implementation of the lock, where the lock is a variable in memory. So here we go — dun-dun-dum — this is our lock value, which can be either free or busy, and now we're going to introduce a different variable, in red here, which is part of the implementation of the lock. We're going to call it guard. And notice how acquire works. It says: while the test&set on the guard returns busy, wait. Otherwise we go in, and then we do our check: if the lock is currently taken, we put the thread on a wait queue and go to sleep, simultaneously setting guard to zero — because remember, it got set to one when we came through here. Otherwise, we grab the lock and set guard to zero. Again, the way to think about this is that since the guard is part of the implementation of the lock variable, the time inside here while the guard is set to one is very brief. It's really just about acquiring the lock, which is then given to the user code; they can keep the lock as long as they want — value will be busy in that case — and we immediately free up the guard. So there are no circumstances where we expect the test&set spin to last very long, which means this isn't really a busy-wait scenario. And the release: again, we grab the guard, and we ask, is anybody on the wait queue? If so, we take them off the wait queue and get them ready to run; otherwise, we set value to free. Either way, we finish by setting the guard to zero. So as a result, just like before with the interrupt disable, we're using this test&set as part of the implementation of the lock itself. And note that the sleep has to reset the guard variable here, and — the same issue we had with re-enabling interrupts — we have to do this atomically: go to sleep and set guard to zero. Otherwise, we can get a situation where the release gets in here and misses waking up the thread. Finally, I'll point out — and I'll probably show you this in another lecture — that the acquire is entirely at user level, using test&set and so on. When we get to the sleep, that's the point at which we might let the operating system take over and put the thread to actual sleep. There are various mechanisms, like futexes, that are part of modern synchronization and use exactly this combination: test&set for the very quick grab of an uncontended lock, and the kernel, potentially, to put things to sleep. But that's a topic for another day.
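Here's a sketch of that guarded lock; sleep_and_clear is a hypothetical kernel hook standing in for the atomic "set guard to zero and go to sleep" operation just described, and the queue helpers are illustrative.

    #include <stdatomic.h>

    typedef struct {
        atomic_int guard;     /* protects this structure; held very briefly */
        int value;            /* FREE or BUSY: the actual lock */
        queue_t wait_queue;
    } lock_t;

    void acquire(lock_t *l) {
        while (atomic_exchange(&l->guard, 1))
            ;                              /* brief spin on the guard only */
        if (l->value == BUSY) {
            enqueue(&l->wait_queue, current_thread);
            sleep_and_clear(&l->guard);    /* atomically: guard = 0, sleep */
        } else {
            l->value = BUSY;               /* grab the lock */
            atomic_store(&l->guard, 0);
        }
    }

    void release(lock_t *l) {
        while (atomic_exchange(&l->guard, 1))
            ;
        if (!queue_empty(&l->wait_queue))
            wake(dequeue(&l->wait_queue)); /* pass the lock; value stays BUSY */
        else
            l->value = FREE;
        atomic_store(&l->guard, 0);
    }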
So let's step through this, and remember what our disable-interrupt case was. We basically had our lock itself, and: disable interrupts, grab the lock or go to sleep, enable interrupts — and release was similar. Notice that what we did for this test&set example is replace the disable interrupts with while(test&set(guard)) and the enable interrupts with guard = 0. A very parallel kind of operation. In fact, let's recap very quickly. We started out by saying: what if lock acquire were just disable interrupts, and release were enable interrupts? We said, boy, that's right out, very bad: if one thread's in the critical section, no other activity happens — interrupts are all ignored. Instead, we used the disable and enable of interrupts to put a critical section inside the implementation of the lock, the lock itself being a separate variable. In this other instance, with test&set, notice how similar it is. We started with while(test&set(value)) being acquire and value = 0 being release, and we said, well, that's bad, because that's busy waiting. So instead we used the test&set as the guard on the implementation, and once again, the lock we give to the user is a variable in memory, just like in the previous example. Notice also, as I said earlier in the lecture, that this value could be an array of values, thereby giving us many locks. We can build a whole array of locks — one for every array location, or one for every item on a linked list, or one for every file descriptor in the file system, et cetera. Okay, so what I'd like to do next is talk a bit about how powerful locks are. Are they enough? To do that, we're going to use a bounded buffer, which is a very common programming paradigm: we have a bunch of producers producing things, a bunch of consumers consuming things, and a buffer in the middle that's bounded in size — we don't want too many items in the buffer, because we only have a finite amount of memory, for instance. So our problem definition: producers put things in the shared buffer, consumers take them out, and we need some synchronization to coordinate the producer-consumer scenario. And of course this buffer is going to be implemented in some way — it could be a circular buffer, I'll show you that in a moment; it could be a linked list, whatever. But clearly we need to make sure we don't have more than one thread in the middle of that implementation simultaneously, because then the buffer itself could get all screwed up. Now, we don't want the producer and consumer to have to work in lockstep — that's why we put the buffer there in the first place. Producers can produce at some point; consumers consume later. If a consumer goes to the buffer and there's nothing in there, the consumer needs to sleep. If a producer tries to put something in the buffer and the buffer is full, it needs to go to sleep. So we have several synchronization requirements here: we need to synchronize access to the buffer, the producer needs to wait if the buffer is full, and the consumer needs to wait if the buffer is empty. So this is getting kind of interesting, right? This is a more interesting synchronization scenario, and there are many examples. For instance, the GCC compiler uses pipes, which you're becoming increasingly familiar with as you do projects and homeworks. Here is a situation where, instead of there being one compiler, there are actually many phases, each of which is a process. The C preprocessor pipes its results into the first phase of the compiler, which pipes its results into the second phase of the compiler, which goes into the assembler, which goes into the loader, and data flows its way through as it goes. Each one of these vertical lines represents a buffer, with a producer on the left and a consumer on the right. So this is a common scenario.
Another good example might be a Coke machine, where the producer — the Coke delivery guy — comes and tries to put Cokes in, and can only do it if the machine is not entirely full. And then, of course, students who might be waiting outside can only take a Coke out if the machine is not empty. So that's a synchronization scenario we might need to solve. And there are many others — web servers, routers, you name it; this bounded buffer is a common paradigm. So let's see if we can do it. Let's start by asking how we might actually build the buffer portion of this. One option is a circular buffer data structure. This is the sequential case, where we're not worried about multiple threads at once. We'd have an array with some indices: the current write position and the current read position, as you see. Each time we write, for instance, we put an item in and increment the write pointer, and if we ever get to the end, we wrap around — and the same with the read. And we can ask some questions, for instance: how do we know if we're full? Well, we're full when our write pointer is jammed up against the read pointer — or, depending on how you want to do it, when they're overlapping — in which case you can't write anything more without overwriting something unread. That would be a full buffer. The empty buffer is the case where you keep reading until eventually your read pointer catches up to the write pointer, and at that point there's nothing left to read out of the buffer. So there are many ways to implement this, but you can imagine that if the reads and the writes were somehow to get interleaved badly, with multiple threads accessing them, we could have a problem. We need to make sure our operations are consistent and atomic when we go from this sequential case to the thread-safe case. So moving onward: how might we build a fully synchronized circular buffer? Well, the simple idea might be: there must be a lock involved somewhere here. So let's have our lock, and then what? Our producer code might look like this: we acquire the lock and we check, and if the buffer is full, we just spin and wait — that's the "while buffer is full, wait." Then we enqueue the item and release. The consumer would look similar: we acquire the lock, we spin while the buffer is empty, we dequeue the item once the buffer isn't empty, release the lock, and return the item to the consumer. This might look like it works — certainly we have a lock around the enqueue and dequeue, so no problems with multiple threads touching the buffer at the same time are going to occur. But you can see this is pretty bad, for the following reason. What happens if we ever spin here? If the buffer is full and we spin, how are we ever going to get out of that scenario? It would be because some consumer came in and took an item out of the buffer. Except — notice we've acquired the lock, and then we spun. And the consumer, in order to get in, dequeue an item, and free up space in the buffer, would have to acquire the lock. So all consumers are going to be sleeping, waiting on acquire, while we're busy spinning. Essentially, this is a deadlock that we'd never come out of.
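Here's a sketch of that broken first attempt, with illustrative names; is_full and is_empty are the circular-buffer checks just described (write pointer jammed against the read pointer; read pointer caught up to the write pointer).

    /* BROKEN: spinning on the condition while holding the lock. */
    void producer(int item) {
        lock_acquire(&buf_lock);
        while (is_full(&buffer))      /* spins forever: no consumer can */
            ;                         /* ever acquire the lock to dequeue */
        enqueue(&buffer, item);
        lock_release(&buf_lock);
    }

    int consumer(void) {
        lock_acquire(&buf_lock);
        while (is_empty(&buffer))     /* same deadlock, mirrored */
            ;
        int item = dequeue(&buffer);
        lock_release(&buf_lock);
        return item;
    }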
And a similar problem happens with the consumer spinning, waiting for the buffer to be non-empty: while the consumer has acquired the lock and is spinning, no producer can get past the acquire. So this solution is broken. Now, you could say the issue is that while we're spinning, we have no way to let somebody else acquire the lock and get in there. So maybe in our loop we'll release the lock and reacquire it, and perhaps that's a way around this problem. So to fix it, it might look something like this. We still have the same lock, but now notice what we're doing in this loop. We're essentially saying: acquire, and then, while the buffer is full, do a release and acquire again. We do similarly with the consumer: while the buffer is empty, do a release and acquire again. So now, assuming the lock implementation is somehow fair — which is a question in and of itself — we might assume that if multiple consumers are waiting in the acquire, then the moment the producer, for instance, executes a release, one of them gets to acquire the lock and go through the process of dequeuing. Meanwhile, the producer gets put to sleep on acquire until, eventually, the consumer's done, the acquire happens again, we check the condition, and we enqueue. So on first blush, it looks like we've fixed the deadlock problem we had earlier. But in fact, what do we still have? We should ask ourselves what's happening in this wait loop: we're still, as fast as we possibly can, spin waiting, right? Because we're releasing, acquiring, releasing, acquiring, over and over again. So whoever's waiting is likely in a busy-wait scenario. And if, for some reason, it's a bad lock that goes into the kernel all the time or has high overhead, we're busy releasing and acquiring over and over and just wasting time. So the simple message we can take from this is: perhaps we can build this circular buffer with locking, but it seems like we'd like something else, because this is messy. So what else might we do? Well, let's talk about some higher-level primitives. The goal of the last couple of lectures is really asking, first of all, how do you synchronize at all? But then: what's the right abstraction for synchronizing threads? We want it to be as high-level as possible, because the higher the level of the primitive, the more likely we are to use it correctly and end up with bug-free code. That's an overall goal of ours, and good primitives and practices are very important. Part of what we're doing in these last couple of lectures is trying to give you a foundation for building things that are likely to work. UNIX, for instance, is pretty good now in all its different variations — BSD and Linux and so on — but up until about the mid-80s it would crash on a regular basis, and that was pretty much because synchronization primitives were being used in ways that lent themselves to bugs. So just to reiterate: synchronization is a way of coordinating multiple concurrent activities that are using shared state, and we're starting to talk about ways of structuring that sharing. To do that, we're going to introduce a new primitive called a semaphore. Now, a semaphore is a kind of generalized lock.
It was first defined by Dijkstra in the 1960s, and by the way, it's the main synchronization primitive used in the original UNIX, and in a bunch of different UNIXes to date as well. The definition is basically that a semaphore has a non-negative integer value and supports the following two operations, P and V. P is an atomic operation that waits for the semaphore to become positive, then decrements it by one — think of this as the wait operation. V is an atomic operation that increments the semaphore by one, waking up a waiting P, if any — think of this as signal. Now, let me point out a couple of things. You can think of a semaphore as a very specialized non-negative integer: it has a value inside it that runs from zero up — it can't be negative — and it supports these operations P and V. P says: try to decrement, unless that would take the value below zero; if it would, we sleep until we get the chance to decrement without going below zero. V does an increment, and if somebody is sleeping waiting, that increment triggers them to wake up, do their decrement, and leave their P. Note that P stands for "proberen," to test, and V stands for "verhogen," to increment, in Dutch. This is a good example of history being what it is — these names P and V come from the original semaphores. So let's explore this a little more. Semaphores are like integers, except there are no negative values and the only operations allowed are P and V. In fact, once you've initialized a semaphore, you can't even read or write its value — you have no idea what the value is at any given time, except at the very beginning, at initialization. And the operations are clearly atomic: multiple P's together can't get the value below zero, and if you have an interleaving of P's and V's, you'll never be in a situation where, say, a thread is sleeping on a P operation while the internal value is above zero. Things are atomically built; that would be an incorrect implementation. The name semaphore comes from a railway analogy. For instance, here's a semaphore initialized to two, being used for resource control. What's going to happen in this particular animation is that a train comes by and tries to do a P operation. If it succeeds, it gets to pass; otherwise it goes to sleep. When the first train comes by, notice it executed a P, which took the value from two down to one, and then it's on the side track here, which is kind of like being in a critical section, but a little more generalized. A second train might come by, and it's also in the critical section — because we originally set the value to two, we're allowing two entities to pass through P. Then the third one that comes along is now sleeping: it tries to execute P, the value's zero, so it sleeps. Now, as these trains exit the critical section, they execute a V operation, incrementing the semaphore and thereby waking up anybody sleeping. So here we go — ready, watch, click. That train left, executed its V, woke this train up, and it got to go forward. So unlike locks, which are basically locked or unlocked, semaphores have many more degrees of freedom: the value can be anywhere from zero up to the maximum integer of its particular type. And that means we can do more powerful things with them.
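Here's pseudocode for those semantics; sleep_on and wake_one are illustrative helpers, and each operation must execute atomically — a real implementation would use a guard/sleep scheme like the lock above. (In POSIX, P and V show up as sem_wait and sem_post.)

    typedef struct {
        int value;                /* non-negative only */
        queue_t wait_queue;
    } semaphore_t;

    void P(semaphore_t *s) {      /* "proberen": wait */
        while (s->value == 0)     /* wait for it to become positive... */
            sleep_on(&s->wait_queue);
        s->value--;               /* ...then decrement */
    }

    void V(semaphore_t *s) {      /* "verhogen": signal */
        s->value++;
        wake_one(&s->wait_queue); /* wake a waiting P, if any */
    }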
So, for instance, there are at least two different uses of semaphores. If we set the initial value to one, then suddenly we have mutual exclusion — we have locking again. If you think about it, this is called a binary semaphore: setting it to one is like things being unlocked, and when you execute a P on it, it's like taking the lock — the value goes to zero, and any subsequent threads that try to execute P in that state are put to sleep. Eventually the thread that executed the original P executes V, incrementing the value back up and waking one of the sleeping threads, which now gets to exit its P operation and go forward. So a semaphore with an initial value of one behaves exactly like the locks we talked about earlier. Here's a mutual exclusion example: you do semaphore.P(), critical section here, semaphore.V(). Another thing you can do is use it for scheduling constraints. You can set the initial value to zero, for instance, and use it to say: until some resource becomes available, we can sleep on this semaphore. So it's a scheduling constraint. For instance, we set the initial value to zero; thread one waits for a signal from thread two; thread two schedules thread one when the given event occurs. The simplest example might be a thread join. Imagine you have a parent and a child thread. The parent wants to wait for the child, so it does a thread join, which is just a semaphore P — and because the semaphore was initialized to zero, the parent immediately goes to sleep. The child eventually finishes and executes a semaphore V, and that wakes the parent up. A very simple scheduling constraint. And we can do other scheduling constraints: you saw earlier, with the train, we set the value to two, which said that up to two entities are allowed in the equivalent of the critical section. So basically all the values from zero on up to the max let us do some very interesting things. So let's revisit the bounded buffer and see whether we can make it work with semaphores, rather than that imperfect locking scenario. Our correctness constraints here are: the consumer has to wait for the producer to fill buffers — if none are full, the consumer goes to sleep; that's a scheduling constraint. The producer, similarly, must wait for a consumer to empty buffers if everything's full — that's another scheduling constraint. So the Coke man can't put Coke in the machine if it's full, and a student can't take Coke out of the machine if it's empty. And last, we need mutual exclusion on the implementation of the buffer itself, to make sure that multiple threads don't try to manipulate the buffer queue at the same time and thereby corrupt it. Remember again why we need that mutual exclusion: computers are essentially stupid, and we can't assume the implementation of the queue itself is safe for multiple threads at the same time, so we use mutual exclusion to prevent concurrency bugs in the buffer. So: use a separate semaphore for every constraint. If I count them up here, I see two scheduling constraints and a mutual exclusion constraint, and that leads to our three semaphores: the fullSlots semaphore, the emptySlots semaphore, and the mutex. Here's how we're going to use them. We start by saying the number of full slots is zero.
So initially there's no Coke, and then emptySlots is the total size of the machine — initially there's a full machine's worth of empty slots. Notice how these two are going to work in conjunction with each other. And finally, we have something we'll just call mutex, initialized to one, and that's going to be the lock. So what does our code look like? The producer says emptySlots.P(), which basically says: wait until there's space. If there are no empty slots, we sleep. Then it grabs the mutex — the lock — enqueues the item, and releases. This little pattern of mutex.P(), enqueue, mutex.V() is exactly a locking sequence. And last but not least, once we've put the item on the queue, we execute fullSlots.V() to wake up any consumer that's waiting. Now, the consumer is like a mirror image. First, the consumer checks whether there are any full slots at all — is there a Coke? If not, we go to sleep on the P here. Then we grab the lock, dequeue an item, and release the lock — again, our locking paradigm. And last, the consumer wakes up any thread that's waiting to produce, with emptySlots.V(). So if there was somebody sleeping on emptySlots.P(), this emptySlots.V() wakes them up. You can see that fullSlots.V() wakes up the consumer, and emptySlots.V() wakes up the producer. Okay, so what about this solution? Why is there an asymmetry here? The producer does emptySlots.P() and fullSlots.V(); the consumer does fullSlots.P() and emptySlots.V(). Well, the producer decreases the number of empty slots and increases the number of occupied slots, while the consumer decreases the number of occupied slots and increases the number of empty slots. So this is directly related to the way we've chosen our semaphores. Now, is the order of the P's important? Here, notice what I've done in red: I've swapped emptySlots.P() and mutex.P(). In fact, if we do this, we've got a problem. Think about the case in which there are no empty slots, so the producer has to sleep until a consumer comes along and takes a Coke out of the machine. What happens in this version of the code is that it grabs the mutex, and then tries to do emptySlots.P(), which puts it to sleep, because there are no empty slots. And now it waits. What is it waiting for? It's waiting for a consumer to take Coke out of the machine. But come over to the consumer. The consumer first does fullSlots.P() — that's fine, because there is a full slot. Then it tries to grab the mutex, mutex.P(), at which point it's put to sleep. Why? Because the producer already did mutex.P() and then went to sleep. So now the consumer is stuck at mutex.P() and will never get past it to do anything further, and the producer is stuck at emptySlots.P() and can't make forward progress. This unfortunate scenario says that if we swap the two P's, we end up with deadlock. And that's tricky, right? Now, you could say: anytime we're doing a locking pattern like mutex.P(), mutex.V(), we clearly want to have that as an inner bit of code — it's like we've got a hierarchical set of locks. This inner lock, the P followed closely by the V, needs to be kept tight, to make sure we're really just using it for locking, for mutual exclusion, and the outer ones are taking care of the resource constraints.
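Here's the correct version (the P's in the right order) written out with POSIX semaphores, as a sketch; BUFSIZE and the queue helpers are illustrative.

    #include <semaphore.h>

    sem_t fullSlots, emptySlots, mutex;
    /* setup, once:
     *   sem_init(&fullSlots,  0, 0);        // no Coke yet
     *   sem_init(&emptySlots, 0, BUFSIZE);  // machine starts all empty
     *   sem_init(&mutex,      0, 1);        // binary semaphore as a lock
     */

    void producer(int item) {
        sem_wait(&emptySlots);   /* P: wait for space */
        sem_wait(&mutex);        /* P: grab the lock */
        enqueue(&buffer, item);
        sem_post(&mutex);        /* V: release the lock */
        sem_post(&fullSlots);    /* V: wake a waiting consumer */
    }

    int consumer(void) {
        int item;
        sem_wait(&fullSlots);    /* P: wait for a Coke */
        sem_wait(&mutex);
        item = dequeue(&buffer);
        sem_post(&mutex);
        sem_post(&emptySlots);   /* V: wake a waiting producer */
        return item;
    }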
But you might start to get a little nervous about how you get that ordering to be correct. Is the order of the V's important? No. If you do mutex.V followed by fullslots.V, which is what we've got here, that's fine, and if we swap them, we're not gonna cause any deadlock, because all we're doing is incrementing the two of them in a different order. It doesn't change the result; it still wakes up everybody that needs waking up. There may be some scheduling-efficiency differences depending on the ordering, and if you start swapping things like the V's, it gets harder to visually inspect the correctness of the code. Here I can visually inspect it and see that mutex.P grabs the lock, we do something, we release the lock; that looks okay. When I start swapping things, it's harder to see. Finally, what happens with this code, not the bad code but the previous correct code, if I have two producers or consumers, or five producers and five consumers? The answer is it just works. The reason is that nothing in the way we've structured this code limits how many consumers can be simultaneously sleeping on fullslots.P, or how many producers on emptyslots.P; that's just part of the code. So do we need to change anything to handle multiple producers or consumers? No. So semaphores are clearly more powerful than locks. If you look back at our good solution, notice that nothing in this code involves any busy waiting. This is a nice clean solution. The implementation of P and V, especially P, can be properly built so that if you have to go to sleep, you go into the kernel and get put on some sleep queue. There's no big problem getting that right. So we've chosen the level of the semaphore API to support a non-busy-waiting implementation. It's a huge step up, but the problem is that semaphores are really dual purpose: we're using them both for locking and for scheduling constraints, and that's where we start to get a little bit of a problem, because that's exactly how we accidentally swapped a mutex semaphore and a scheduling-constraint semaphore and suddenly got deadlock. That starts to give you a worry that your code might be misordered and thereby cause deadlock occasionally. So a cleaner idea is to use locks for mutual exclusion and condition variables for scheduling constraints. We're gonna go back to using locks for what we know locks are good for, but we're gonna introduce this new idea of a condition variable as an item that lets us express our scheduling constraints in a much cleaner fashion. And we're gonna package this all up, the idea of a lock plus condition variables together, and call it a monitor. A monitor is really a programming paradigm: a way of dealing with concurrency that includes a lock and, technically speaking, zero or more condition variables for managing concurrent access to shared data. Some languages like Java give you native monitors. In others, like C, the pthreads implementation gives you locks and condition variables as separate items, and when you put them together with the monitor pattern, you get the monitor way of programming.
So just to emphasize: a monitor is a paradigm for concurrent programming. And what's a condition variable? Well, we want to change our consumer routine to wait until something's on the queue, and we could do that, as we saw, with semaphores, but it's a little error prone. So what do we do instead? We use what's called a condition variable. Now, listen up here. A condition variable is a queue of threads waiting inside a critical section. That's already a little weird and not what you're used to. The idea is that you grab the lock and then potentially go to sleep on a condition variable with the lock held. In the previous semaphore implementation, going to sleep with a lock taken caused us to deadlock. But condition variables are very special items whose explicit use pattern is to sleep inside of a critical section, and I'll show you how that looks in a moment. The operations on condition variables are wait, signal, and broadcast; they have different names depending on the language. Wait has to know which lock you're using, but it atomically releases the lock and goes to sleep, and when the thread wakes up again, it re-grabs the lock. So from the standpoint of you, the programmer, the lock is never released: you enter wait and you exit wait, and the lock is held the whole time, but you've gone to sleep in between. Signal wakes up one waiter on a condition variable, and broadcast wakes up all the waiters. And the rule, and this is a rule, is that you have to hold the lock anytime you do condition variable operations. If you try to say, "well, I'm gonna be good, whatever that means, and never go to sleep inside a critical section," then you are not programming in the monitor style and your condition variables are not gonna work properly. So don't do that. We can think of a monitor as providing controlled access to shared data. It's got a lock, with an entry queue of threads waiting to enter the monitor, and a set of condition variables, each of which is a queue of threads waiting inside the critical section. And I want you to notice that neither the lock nor the condition variables necessarily hold the shared data itself; what they provide is a way to implement a sophisticated policy about who gets access to the shared data. I'll show you how that works in a moment. But let's look at the simplest example we can, and this is not the finite buffer yet. This is just a buffer where producers can produce as much as they want, and consumers try to take things off the buffer, but if there's nothing there, they go to sleep. I'll call it an infinite synchronized queue. We have a lock, a condition variable, and a queue. The producer, who doesn't care whether things are full because we're doing an unbounded buffer here, acquires the lock and enqueues the item. Because we have the lock, we can use the enqueue operation without worrying about it. Then it does a signal, in case somebody was sleeping waiting for an item on the queue, and then it releases the lock. Now, the consumer does something parallel. First it acquires the lock, and then we have this loop where it says: as long as the queue is empty, I'm gonna go to sleep with a condition wait on this condition variable, passing in the buffer lock. Now notice we have acquired the lock.
Since we have the lock, we're in the critical section from the point where we acquire to the point where we release, so you as a programmer should think of everything in here as operating with the lock taken. Therefore I can do whatever I want to check whether the queue is empty. It could be something complicated using many memory operations; it doesn't matter, because I, the consumer, am the only thread operating in this red section right now. Once the queue is no longer empty, I exit the while, dequeue the item, and release, and the dequeue can operate perfectly well without worrying about more than one thread in the queue operations, because again, the way to think about this is: I acquired the lock, and then I released the lock. This is a slightly different way of thinking, but you can start to see the pattern: grab the lock; check your condition; if the condition's not right, go to sleep on the condition variable; when you wake up, keep looping and checking; and when the condition's finally okay, go ahead and do the operation. In this case we were waiting for the queue to be non-empty, at which point we dequeue. And think of the lock as held from the point you acquire to the point you release; don't think about there ever being a point at which your code runs without the lock. Now, you can occasionally peel back the covers of the implementation in your brain, though I don't want you to do it too often, and say: what I know is that condition wait releases the lock, puts me to sleep, and then after somebody signals, it reacquires the lock. You can think of that, because obviously somebody has to release the lock or the producer couldn't work. But in terms of reasoning about how this works, it's better to say: everywhere between the acquire and the release, the lock is taken, nobody else can be inside the critical section while I am, and so I don't have to worry about any critical operations in here interacting with more than one thread. Now, you might have wondered a little bit about this while loop. It seems to say: if the buffer is empty, wait, and then when I wake up, check it again, and again, and again. The question might be: why? Why not just say, "well, they signaled me, so I should just dequeue"? The answer comes down to the type of implementation of the monitor itself, of the condition variables. There are two different styles of implementation, one called Mesa and one called Hoare, and the difference really gets at this notion of why we want the while loop as opposed to the if. The if version is perhaps what you'd think more natural: if it's empty, we sleep, and then we emerge from sleep and dequeue. The while version says: if it's empty, we sleep, and then we check again, and only once it's not empty do we continue. Those two patterns represent the difference between Hoare style, named after the British computer scientist Tony Hoare, where the if suffices, and Mesa style, named after Xerox PARC's Mesa operating system, where you need the while. Most operating systems have Mesa scheduling, and therefore you need to use the while loop, okay?
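Here's a minimal pthreads-style sketch of that infinite synchronized queue, with the Mesa-style while loop. The queue helpers are hypothetical and assumed not thread safe on their own:

    #include <pthread.h>

    void enqueue(int item);       /* hypothetical, not thread safe */
    int  dequeue(void);
    int  queue_empty(void);

    pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  buf_cv   = PTHREAD_COND_INITIALIZER;

    void producer(int item) {
        pthread_mutex_lock(&buf_lock);      /* acquire                  */
        enqueue(item);                      /* safe: we hold the lock   */
        pthread_cond_signal(&buf_cv);       /* wake a sleeping consumer */
        pthread_mutex_unlock(&buf_lock);    /* release                  */
    }

    int consumer(void) {
        pthread_mutex_lock(&buf_lock);      /* acquire                  */
        while (queue_empty())               /* Mesa: re-check on wakeup */
            pthread_cond_wait(&buf_cv, &buf_lock);
            /* wait sleeps; the lock is released while asleep and
               reacquired before the call returns                       */
        int item = dequeue();
        pthread_mutex_unlock(&buf_lock);    /* release                  */
        return item;
    }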
And unless we say otherwise, and we'll tell you explicitly if we're asking about Hoare-style scheduling, you should assume you're operating with Mesa style, and so you should be checking the condition in a while loop, over and over again. Let's look at this a little further. Let me tell you why the Hoare-style monitor might be appealing, say, in a textbook discussion of monitors. The idea is that the signaler gives up the lock and the CPU to the waiter, and the waiter runs immediately; then the waiter gives the lock and the processor back to the signaler. So look at it this way. I have a thread on the right which, in the past, acquired the lock, saw that the queue was empty, and went to sleep. Now the signaling thread, the producer, is gonna produce something and wake that thread up. With a Hoare-style monitor, we acquire the lock, we get to the signal, and notice what happens: the Hoare-style scheduler immediately hands the lock and the CPU to the other thread, which emerges from condition wait and does whatever it wants. It knows that when it runs, it runs immediately after the signaler has signaled, so whatever condition was signaled, in this case that the queue is not empty, is absolutely still true, and we can go ahead and operate as if it's true. At the point we finally hit release, we hand the lock and the CPU back to the original signaler, who continues from there. The positive of this is that when you emerge from condition wait, you know the condition is correct and you can run immediately. The negatives: you can see this is kind of complicated. We're handing a lock and the CPU back and forth between multiple threads, and on release we have to reschedule, so the acquire and release operations are now tied into scheduling, which we really didn't like when we were talking about locks earlier and trying to keep overall system overhead low. So this gives clean semantics, but it's complicated: we're context switching, we're moving data back and forth. And it also messes up the physical resources. If the signaling thread has been running for a long time and has some really good cache state going, and we signal, and suddenly the other thread runs for a bit, and then we come back, the cache may have been polluted. Neither of these threads gets to run very efficiently, and it's quite possible that running the waiter immediately after the signal really isn't useful, because most of the time we're not competing with other threads in that way. So the Mesa monitor idea is targeted at a simpler scheduler and better use of hardware resources. The signaler keeps the lock and the processor, and the waiter goes on the ready queue with no special priority. Here's our scenario from before, but notice I've replaced the if with a while, and we'll assume this first thread is busy sleeping in condition wait. What happens is that the producer thread gets to the signal, and all that happens is the signal takes the sleeping thread and puts it on the ready queue, period. We don't force it to run, so we're not context switching, et cetera, and then we just keep going. Okay, so the positives about this are many. First of all, we just plain switch queues for the waiting thread, so there's no complicated context switch, and we get to keep using our cache state.
We get to keep running efficiently, and notice that we can pass through release without rescheduling, so the acquire and release operations are potentially still very fast and don't involve the system. Then sometime later, and I haven't even indicated when, the scheduler takes over and the waiter gets to run. It emerges from condition wait, checks whether the queue is empty, and if it is, it goes back to sleep; if not, it continues. And remembering the pattern for monitors, you always think of condition wait as holding the lock the whole time. When you enter it and when you exit it, the lock is held, so all of this code, whatever it is, you can think of as being inside a critical section, and you don't have to worry about multiple threads messing with the queue while you're trying to see if it's empty. Most real operating systems do it this way: more efficient, easier to implement, better use of cache state, et cetera. Okay. So let's look at our third cut at a circular buffer, which is really very close to what you could do with pthreads. Using monitors, we have two condition variables and our lock. The producer looks like this: we acquire the lock, and while the buffer is full, we wait on a condition variable. Which one? The producer's condition variable. And notice we also supply the lock. That is of course because inside the condition-wait implementation, we go to sleep and release the lock, and then reacquire the lock before emerging. Keep in mind, I'm doing this in a typical C style now: we provide a pointer to both the condition variable we're sleeping on and the lock itself. Then, once the buffer is not full, we enqueue the item, we signal the consumer to tell them there's a new item, and we release. Now, the consumer is a mirror image: we grab the lock; while the buffer is empty, we go to sleep on the consumer's condition variable; when things are finally not empty, we dequeue an item, we signal the producer, telling it, okay, wake up and put a new item on the queue if you're sleeping, and then we release the lock. And notice that these signals wake somebody up only if they're there. If no consumer is waiting, the condition signal does nothing, because it's signaling an empty condition-variable queue. That's important to know, and it's a little different, by the way, from semaphore V, which always increments, so that somebody could come along later with a P, decrement, and go forward. Here, the signal only does something when there's actually somebody there to wake; otherwise it's essentially a no-op. So what does a thread do when it's waiting? Condition wait is a sleep operation; there's no busy waiting in here. Once again, as I said earlier, that's because we've raised the level of the API for monitors with condition variables such that there's no busy waiting built into the code we write; rather, that's part of the implementation of the condition variable, and we're assuming that implementation is correct. And notice that this while loop is not busy waiting. Why do I know that? Well, we check something, and if we need to, we go to sleep. So if the signaling doesn't happen for an hour or whatever, that's okay, because we're sleeping and not wasting cycles.
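Here's a sketch of that third cut in the same pthreads style: one lock, two condition variables. Again, the circular-buffer helpers (buf_full, buf_empty, enqueue, dequeue) are hypothetical:

    #include <pthread.h>

    int  buf_full(void);          /* hypothetical circular-buffer */
    int  buf_empty(void);         /* helpers, not thread safe     */
    void enqueue(int item);
    int  dequeue(void);

    pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  prod_cv  = PTHREAD_COND_INITIALIZER;  /* producers sleep here */
    pthread_cond_t  cons_cv  = PTHREAD_COND_INITIALIZER;  /* consumers sleep here */

    void producer(int item) {
        pthread_mutex_lock(&buf_lock);
        while (buf_full())
            pthread_cond_wait(&prod_cv, &buf_lock);  /* wait for space   */
        enqueue(item);
        pthread_cond_signal(&cons_cv);               /* an item is ready */
        pthread_mutex_unlock(&buf_lock);
    }

    int consumer(void) {
        pthread_mutex_lock(&buf_lock);
        while (buf_empty())
            pthread_cond_wait(&cons_cv, &buf_lock);  /* wait for an item */
        int item = dequeue();
        pthread_cond_signal(&prod_cv);               /* space is free    */
        pthread_mutex_unlock(&buf_lock);
        return item;
    }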
Again, it's only when a signaler comes along that we wake up, check the condition, and potentially go right back to sleep, so there's no busy waiting in the way this is expressed. That's different from our second cut at the circular buffer, which did release, acquire, release, acquire over and over again and had plenty of busy waiting going on. So why the while loop? Mesa semantics. In most operating systems, when a thread is woken up by signal, it's just put on the ready queue; it does not get to run immediately. So it's perfectly possible that, say, the queue was empty, somebody put something on it, and we woke up a thread with signal, but by the time that thread finally reacquires the lock and gets to run, some other thread has come in and grabbed the item off the queue, and it's empty again. That's why we have to re-check in the while loop. All right. The last major thing I wanna do in today's lecture is the readers-writers problem, and the motivation here is really a shared database. The reason we wanna do this problem is that it shows the real power of the monitor programming paradigm. So think about a shared database here in the middle. The readers never modify the database, so potentially we can have many readers at once, because they're not gonna interfere with each other. A writer, on the other hand, is modifying the database, and if a writer modifies the database at the same time that readers are reading, we're likely to get chaos. Or if two writers modify the database at the same time, we're likely to get chaos. So we need to keep the writers and readers separate, and we need to keep multiple writers separate. And the question is: what's the best way to do that? Well, we could use a single lock, where writers or readers just lock the database, do their thing, and unlock. Is this good? No. Why? Because once the first reader grabs the lock, all the other readers have to wait until the first reader is done, so we don't get the multiple-readers behavior at all. And if you say, "well, readers just won't grab any locks," then a reader could be busy in the database and nobody would know when a writer comes along. So we clearly need some synchronization, but a single lock is clearly not it. So what I wanna do is show you how sophisticated you can get with a monitor. Let me explain the variables here, and just wait until I finish before you panic about the complexity; it's not really that bad. The correctness constraint is as follows: readers can access the database as long as there aren't any writers; writers can access the database one at a time, when there are no readers or writers; and only one thread manipulates the state variables at a time. These state variables are policy variables, and they're the things the monitor lock is protecting. The structure of the solution is this: a reader waits until there are no writers, accesses the database, and then, when it checks out, wakes up any writers that might be sleeping. Why would they be sleeping? Well, while the reader is accessing, if a writer comes along, it can't work right away, so it's gotta wait.
The writer has a similar pattern: it waits until there aren't any readers or writers, accesses the database, and then when it's done, it wakes up either readers or writers who might be sleeping. Here are our state variables. Yes, there are four variables and two condition variables, but this example is powerful without being complicated, so let me get you through these variables. AR is the number of active readers: the readers in the database at this instant. WR is the number of readers waiting to access the database. And how do we wait in a monitor? We wait on a condition variable; the waiting readers are gonna wait on the okay-to-read condition variable. Similarly, AW is however many active writers are currently in the database. Now, you might ask: how big can AW be? It can't be bigger than one, because we don't allow more than one active writer at a time, whereas AR can be as big as we want. WW, the number of waiting writers, could also be large; waiting writers sit on the okay-to-write condition variable. Okay, so here's how a reader works, and this is the basic pattern for monitors: we check in to see whether we're allowed to read, do the reading, and then check out. To check in, we acquire the lock. We check: if there's either an active writer or a waiting writer, we're going to put ourselves to sleep. This is that while loop for a Mesa-style monitor. We check our conditions, and as long as there are any writers in the system, we increment the number of waiting readers, because we are now a waiting reader, and go to sleep on the condition variable okay-to-read, giving it our lock as well. Then when we wake up, we decrement the number of waiting readers, because at the point we've come out of cond-wait, we're not waiting anymore. And then we go back and check again. Notice something we're allowed to do: WR++, which is not an atomic operation. It's basically WR = WR + 1. The reason we can do that without worrying about multiple threads is that we have the lock. Remember the pattern: we grab the lock, and we think of all of the code after grabbing it as being inside a critical section, so we don't have to worry about synchronizing here. Then, if we pass the while loop, we come out, we increment the number of active readers, because that's us, we're an active reader now, and then we release the lock. Notice again, and I gotta keep re-emphasizing this: because we're inside the critical section, we know that if we check AW and WW and their sum is zero, then we can come down here and add one to the number of active readers without worrying about somebody incrementing AW or WW in the meantime, because we have the lock. And now notice that we've released the lock. All we've done at the beginning is use this lock, which protects the monitor, to evaluate the entry conditions. Once we get down to accessing the database, what we know for a fact is that there are no writers in the system, and therefore we can go ahead and read. And while we're reading, the lock is not held. How do we finish? Well, when we're done reading, we acquire the lock again.
We decrement the number of active readers, because we're no longer an active reader. And if there are no other active readers and there's a waiting writer, we go ahead and signal that waiting writer: we wake one of them up on the okay-to-write condition variable, and then we release the lock. Now, why did we release the lock back there, before accessing the database? I already stated it, but I'm gonna say it again: we release the lock because the lock is protecting the entry conditions, not the database. The reason we wanna release it is so that subsequent readers, and we could have a hundred of them, can all make it through this entry-condition code, incrementing AR each time, so we could have a hundred readers inside the database. If we didn't release the lock, only the first reader would get into the database and all the other ones would be stuck at the entry. Now let's look at a writer. A writer grabs the lock and checks the entry conditions. What are they for a writer? Well, if there's an active writer, this writer can't do anything, and if there's an active reader, this writer can't do anything either. Remember, an incoming writer can't work alongside any readers or writers. So if there are any active readers or writers, we increment the number of waiting writers and go to sleep on okay-to-write. Then later, when somebody signals us and we wake up, we decrement the number of waiting writers, because we're not currently waiting, and check again. If it turns out that in the time since we were woken, some new writer or reader has started, we'll just increment waiting writers and go back to sleep again. That's the right pattern: we decrement waiting writers here, go around the loop, and might increment it again if we're gonna wait again, and because we have the lock, no other thread can observe the fact that we decremented and re-incremented. Assuming, however, that there are no active writers or readers, we're in good shape and can actually write. What happens? We increment the number of active writers, and here's a little exercise for the listener: this AW++ can clearly only make AW equal to one, right? We know AW was zero up in the while loop, so this increment takes it to one, and any writers that would push AW above one are busy sleeping up there. We release the lock, and voila, we're now writing. And because AW is now one, if you look at both the writer entry code and the reader entry code, no other readers or writers will come and bother us while we're busy writing in the database. The way we exit, of course, is we acquire the monitor lock, we decrement AW since we're no longer an active writer, and then: if there are any other waiting writers, we wake one of them up; otherwise, if there's a waiting reader, we broadcast to all of them. And the reason we broadcast, of course, is that potentially all of those readers could access the database simultaneously. That's why we broadcast instead of signal: we might wanna wake up more than one reader. The other interesting thing here is to notice how I checked the write condition first and then the read condition. You could view that, if you like, as giving priority to writers over readers.
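Putting the reader and writer together, here's a sketch of the whole monitor in the same pthreads style. The database access routines are hypothetical stand-ins; everything else follows the walkthrough above:

    #include <pthread.h>

    void read_database(void);     /* hypothetical database access */
    void write_database(void);

    int AR = 0, WR = 0, AW = 0, WW = 0;    /* the policy (state) variables */
    pthread_mutex_t rw_lock   = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  okToRead  = PTHREAD_COND_INITIALIZER;
    pthread_cond_t  okToWrite = PTHREAD_COND_INITIALIZER;

    void reader(void) {
        pthread_mutex_lock(&rw_lock);          /* check in                 */
        while (AW + WW > 0) {                  /* any writers? then wait   */
            WR++;
            pthread_cond_wait(&okToRead, &rw_lock);
            WR--;
        }
        AR++;
        pthread_mutex_unlock(&rw_lock);

        read_database();                       /* lock NOT held here       */

        pthread_mutex_lock(&rw_lock);          /* check out                */
        AR--;
        if (AR == 0 && WW > 0)                 /* last reader out wakes    */
            pthread_cond_signal(&okToWrite);   /* one waiting writer       */
        pthread_mutex_unlock(&rw_lock);
    }

    void writer(void) {
        pthread_mutex_lock(&rw_lock);          /* check in                 */
        while (AW + AR > 0) {                  /* anybody active? wait     */
            WW++;
            pthread_cond_wait(&okToWrite, &rw_lock);
            WW--;
        }
        AW++;                                  /* can only ever reach 1    */
        pthread_mutex_unlock(&rw_lock);

        write_database();                      /* lock NOT held here       */

        pthread_mutex_lock(&rw_lock);          /* check out                */
        AW--;
        if (WW > 0)                            /* writers get priority     */
            pthread_cond_signal(&okToWrite);
        else if (WR > 0)
            pthread_cond_broadcast(&okToRead); /* all readers may proceed  */
        pthread_mutex_unlock(&rw_lock);
    }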
Now, why might we wanna do that? Well, for instance, readers typically wanna see the latest data, so by letting the writers go forward first, we make sure the readers get the latest data. The other thing to point out is that a lot of evaluation of how systems use data shows that writes are the much rarer case and reads are the common case, so what we're trying to do is get the writers out of the system quickly so we can get back to efficient reading. So that seems pretty complicated; let's simulate it just to see. We're gonna have four threads: R1 wants to read, R2 wants to read, W1 wants to write, R3 wants to read, and they come in in that order. And of course, initially our state variables AR, WR, AW, and WW are all zero, because we're starting with nobody in the system. So let's simulate our way through. At the beginning, R1 tries to do a read. Notice there are no threads anywhere, so it runs the reader code and acquires the lock. It asks: is AW + WW greater than zero? Clearly not, because there's nobody in the system. So we immediately do AR++, and there we go, we're now an active reader. Release the lock. Why release here? So that R2 can come along if we want. And now R1 is busy accessing the database, read only. Similarly, R2 comes along. Remember, R1 released the lock and is busy in the database. R2 acquires the lock. Are there any writers in the system? No. It immediately increments the number of active readers, so now there are two of us. We release the lock, and poof, we're now running. So notice that at this stage there are two readers in the database, and the monitor lock is not taken; it's free. And again, the reason we free that lock up is so that all the threads that come along can go through the entry code and be classified: either able to run right away, or put to sleep on the right condition variable. We need to leave that open to let us implement the policy of sleeping that ultimately gives us the readers-writers solution we want. Now, just for the sake of argument, let's assume R1 and R2 are gonna take a while reading. They're busy doing their thing, the lock is free, and only AR is nonzero. Because they're busy reading, when the writer W1 comes along, what happens? Well, we acquire the lock. Notice we're in the writer code now. Is AW + AR greater than zero? Yes, because we have active readers. At that point, we know this writer can't run, so it's gonna go to sleep: it increments WW and goes to sleep on the condition variable. And that's it. It's sleeping on okay-to-write, and of course the condition wait releases the lock, as we know. Then R3 comes along, after the writer, and grabs the lock. Are there any writers in the system? Notice the answer is now yes: there's a waiting writer. And because we wanna give priority to writers, we're gonna make R3 go to sleep, since it came after the writer, rather than letting it go through to the database. Okay, that's our policy, and it's a policy we've chosen to implement here. So that reader increments the number of waiting readers and goes to sleep.
So at this point in time, we have one reader sleeping, one writer sleeping, and two readers busy using the database: R1 and R2 are still reading, and W1 and R3 are waiting on okay-to-write and okay-to-read respectively. Notice how this monitor paradigm has let us implement a very interesting locking policy, one that lets us decide how many threads of each type, readers or writers, are allowed in the database at once. We do that by keeping the invariants on the state variables correct, so that between an acquire and a release they're always entirely consistent: the variables tell us exactly how many threads are waiting and how many are currently running in the database, and let us decide at the entry point what to do with a new thread. That's what the monitor is protecting. So let's finish this example up. Let's assume R2 finishes first. Remember, R1 and R2 are in the database. R2 decrements the number of active readers, so we're down to one, as you see here. Then we check: is the number of active readers zero? No, so we know there's still an active reader in the database, and all we're gonna do is release the lock and go forward. So we still have a waiting reader and a waiting writer, but the database still has a reader in it. Finally, R1 is done; it grabs the lock, decrements, and now AR is zero. So when we check the condition, we can say: AR is zero and there's a waiting writer, so voila, we signal okay-to-write. Now over in the writer code, the writer wakes up, because the signal brings it out of cond-wait. It decrements waiting writers, because this writer isn't waiting anymore, goes back around the loop, and checks: it sees that AW + AR is zero, at which point it increments the active writers and runs. So notice the situation now: the first two readers came and went, this active writer is in the database with AW equal to one, and we still have a waiting reader. Sometime later, the writer's done. It grabs the lock and decrements the number of active writers. Okay, so now we're in an interesting scenario where we have no active writers and no active readers, but we have a waiting reader. So what happens? You could say: well, what if another reader comes along right now? Well, we've got the lock, so any new readers are gonna get stuck at the entry code until we're done. But let's follow this through. Is the number of waiting writers greater than zero? No. Is the number of waiting readers greater than zero? Yes, so we broadcast on okay-to-read, waking up just this one reader, though in some scenarios there could be more. And what happens then? The reader comes out of cond-wait, decrements waiting readers, goes back to check, finds no writers in the system, and increments the number of active readers. We're in the database, and now the only thread currently in, or waiting for, the database is just that one reader. It finishes, decrements the active readers, releases the lock, et cetera, and voila: we have successfully controlled the entrance to the database so that we can have multiple readers or a single writer at a time, with writers given priority over readers when both are waiting. Okay, so some questions you might ask. Can the reader starve?
If you look at the reader entry code, it basically says that as long as there's either an active writer or a waiting writer, the reader is forced to go to sleep. So the answer is yes, readers could actually starve, and we've designed the code this way on purpose. Here's how the starvation happens: you have a writer busy in the database, and before it finishes, another writer comes along. It doesn't matter how many readers are in the system; we keep giving preference to the writers, so you could have a stream of writers that come through and basically never allow readers to go forward. You could decide whether that's good or not. Presumably, if writers are very low density, it doesn't happen very often; it becomes a temporary livelock and the readers get to move forward. If you're more worried about it, you could build something more sophisticated, with queues of readers and writers related by arrival order to give a FIFO ordering on readers and writers. We've even asked for that on the midterm sometimes. But this particular simple solution gives precedence to writers over readers. You might also ask something interesting here. Look at the reader exit code: when the reader is done, it decrements active readers and then says, gee, if there are no active readers and the waiting writers are greater than zero, signal a writer. What if we delete that condition? Well, then we're always gonna signal waiting writers regardless of the situation, and you could say that might be bad, because a writer could wake up while there are still readers there. What's it gonna do? And the answer is: it doesn't matter, because the writer code, remember that while loop, will check the condition again, notice there are still active readers, and go immediately back to sleep. So from the standpoint of correctness, the while loop for Mesa scheduling lets us be lazy about our signaling, to the point that as long as we make sure to at least signal whenever it's time to be signaled, over-signaling is okay, because the entry conditions will get it right. Further, what if we turn signal into broadcast? Say there are a hundred writers all sleeping, and at the end of the reader we just wake them all up. Again, the entry code fixes it for us, because each woken thread goes back to its while loop one at a time: each writer asks, are there any active readers or writers? If so, it goes back to sleep, and they all go to sleep in turn. Or, if there are no active readers, one of them gets to run, and all the remaining ones notice in their while loops that, gee, there's now an active writer, and they all go to sleep. So again, because of the while loop for Mesa semantics, we can be a bit lazy, and broadcasting instead of signaling is definitely an example of being lazy while still getting correct results. That's an advantage of the Mesa-style while loop. I will point out, by the way, that in cases where you're not sure whether you need signal or broadcast, because things are maybe complicated or confusing, you can broadcast, as long as your entry conditions work it out.
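As a sketch of that laziness, reusing the variables from the readers-writers code above, the reader check-out could even drop the guard entirely and broadcast. This is correct, just less efficient:

    void reader_checkout_lazy(void) {
        pthread_mutex_lock(&rw_lock);
        AR--;
        /* No "if (AR == 0 && WW > 0)" guard: we may wake writers that
           can't run yet, but each one re-checks AW + AR in its while
           loop and goes right back to sleep, so correctness holds.    */
        pthread_cond_broadcast(&okToWrite);
        pthread_mutex_unlock(&rw_lock);
    }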
Now, normally on an exam or on homework we're hoping you have a better idea of whether you should signal or broadcast. But in real life, if there's a situation where a single condition queue actually has multiple kinds of waiters on it, and I'll show you one of those in just a second, and you're not sure which one needs to be signaled, then you can broadcast and let the entry conditions work it out. So finally, just to really push this laziness idea across: what if we have only one condition variable and we let all the readers and writers sleep on it? Our code for readers and writers might look like this, with the same entry conditions and so on, except everybody sleeps on a single okay-to-continue condition variable rather than two separate ones. (By the way, in the waits I don't show the address of the lock, just because I didn't have space.) Does this work? Well, almost. It almost works, but consider a scenario where R1 arrives and is using the database, and then we get W1 followed by R2, and both get put to sleep on okay-to-continue. In the proper schedule, when R1 finishes, W1 ought to run, because it's the writer. However, if you look at the way this exit code goes, it basically says that if there are no active readers anymore and there is a waiting writer, then we signal one thread. But if the one thread that gets signaled is R2, then all that's gonna happen is R2 wakes up, notices there's a waiting writer, and goes back to sleep, and now the system is essentially deadlocked: that waiting writer will never be woken. So by switching to fewer condition variables, we have to be sloppy about our signaling: we broadcast here instead of signaling. For instance, we change this signal to a broadcast and that one to a broadcast, so that if there's any chance there's somebody that needs to run, we wake everybody up, and the entry conditions up top fix it for us. Okay, so now let's finish up the lecture, because it's getting a little long. Can we construct monitors from semaphores? Well, the locking part is easy: we just use a semaphore mutex to make the lock. But what about wait and signal? First try: wait is semaphore P, and signal is semaphore V. This seems like it might work, except it doesn't. One problem, for instance, is that wait can then sleep with the lock held, and if you remember, with semaphores that leads to deadlock: if you grab the lock and then do a semaphore P to go to sleep, you're gonna deadlock the system. The other thing, a little more subtle, is: what happens if you signal a bunch of times with nobody waiting? In the monitor condition-variable case, nothing is supposed to happen. Here, semaphore V is gonna increment multiple times, which says: if I increment twelve times, then twelve subsequent waits are just gonna be ignored, because each P just decrements back down toward zero. So this is clearly not the right implementation of condition variables. What about this? Wait might say: release the lock, do a semaphore P, reacquire the lock. All right, that takes care of the issue of holding our lock while going to sleep.
However, it doesn't fix the situation with signal, because you could still do multiple signals and sort of bank up wakeups for later, so it doesn't quite work. Condition variables have no history, and semaphores do have history. If a thread signals and nobody's waiting, it's a no-op in the monitor case, and if a thread later waits, it definitely waits: multiple signals followed by a wait is a guaranteed wait in the monitor pattern. Whereas with semaphores, multiple V's with nobody waiting keep incrementing, and later P's just decrement and continue. So these are not parallel to each other. The problem with the previous try is that P and V are commutative: the result is the same no matter what order they happen in, whereas signal and wait on a condition variable are not commutative. Does this fix the problem? Here, wait is as we had before: release the lock, do a P, reacquire the lock. And signal says: if the semaphore queue is not empty, do a V. What's wrong with this? Well, if you remember from when I introduced semaphores, this is not a valid use of the API: you're not allowed to ask whether the semaphore queue is empty, so you couldn't even write this if you wanted to. And there's also a race condition, with a signaler slipping in between the release and the P, so there are still some issues with this. However, it's actually possible to do this correctly. There's a complex solution for Hoare scheduling in a number of different books, and the question might be whether you can come up with a simpler Mesa-scheduled solution using semaphores to implement a monitor. The answer is yes. All right, so the conclusions on monitors are really the following: monitors represent the logic of the program. You wait if necessary, and you signal when you change something, so that any waiting threads can proceed to check their entry conditions again. The basic structure of a monitor-based program looks exactly like this: lock, check conditions and sleep in a while loop, unlock; do something; then lock, signal, unlock. A lot of your monitor solutions are gonna have this pattern, and if you look back at the readers-writers example, it had exactly this pattern: an entry condition and exit signaling. Okay, so I'm basically done, but bear with me for a second, because I wanna mention a couple of things. The C language has somewhat limited support for synchronization. One issue, for instance, is that if you acquire a lock and there's some exception, what you need to do is catch the exception and release the lock before you return; otherwise, if you acquire a lock and then return on an exception path, the lock stays acquired and potentially never gets released, and that's a bug. So in C you really have to think carefully about how to use acquire and release in a way that never holds the lock across exceptional exits. And where this gets really tricky is with setjmp/longjmp.
Setjmp and longjmp are very similar to the way exceptions are handled in, say, Java. Here's a call stack: we run procedure A, and then B calls setjmp, which is really kind of like a catch in other languages, in that I'm saying, if there's a problem, I'm gonna come back to where the setjmp was. Then we run a bunch of code: procedure C acquires the lock, C calls D, D calls E, and at this point there's an exception. The way you handle the exception with setjmp/longjmp, kind of like a throw in Java, is you do a longjmp, which brings you back up to procedure B. But notice that we've bypassed C, and presumably the release is inside C, so this just breaks. You've gotta be really careful if you've got non-local exits going on in C. C++ is a little better. C++ actually has native exceptions, but that cuts both ways: you could write acquire, do_foo, release, and do_foo might throw an exception, in which case you exit this code without ever doing the release. So once again we have to be careful in C++ as well, but there's a way to handle it. We do the equivalent of try/catch, where we do the acquire, and in the catch we do the release and re-throw the exception, which makes sure we never take a non-local exit with the lock still acquired. But even better is the RAII-style guard facility, in the spirit of auto_ptr, which you should look up: we don't even release the lock explicitly. We put an object on the stack whose construction acquires the lock, and if for any reason we ever exit the enclosing scope, its destruction releases the lock automatically, without all this try-and-catch mess.
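Backing up to the setjmp/longjmp hazard for a moment, here's a minimal C sketch of the scenario just described; all the names here are hypothetical:

    #include <setjmp.h>
    #include <pthread.h>

    jmp_buf recovery;                       /* set by B, the "catch" site */
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    void E(void) { longjmp(recovery, 1); }  /* the "throw"                */
    void D(void) { E(); }

    void C(void) {
        pthread_mutex_lock(&lock);          /* acquire...                 */
        D();                                /* ...longjmp unwinds past us */
        pthread_mutex_unlock(&lock);        /* so this release never runs */
    }

    void B(void) {
        if (setjmp(recovery) == 0) {
            C();
        } else {
            /* Back here after the longjmp: the lock is still held,
               and nothing remains on the stack to release it.       */
        }
    }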
Java support for synchronization is also very interesting. Take a bank-account example: the class Account is an object, and every object in Java has its own lock. With that lock you can do things like declare a method public synchronized, which basically says: when I run the getBalance operation, I grab the lock first, do whatever I'm gonna do, and release the lock. Or synchronized deposit says: grab the lock, do balance += amount, release the lock. So every object in Java has a lock in it that you can get at with things like synchronized. Furthermore, you can have a synchronized statement, which basically says: go ahead and grab the lock, and do this thing in the middle with the lock acquired. You can do that anywhere inside the code for your object, and notice that the synchronized statement names an object, so it knows which lock to use. The good thing about Java is that it's intended to support this directly: if you have a doFoo function that throws an exception and thereby exits a synchronized method or block, the lock is automatically freed. So that's kinda clean. In addition to the lock, every Java object has a single condition variable, which you use with wait. How do you signal in a synchronized method or block? Instead of signal, it's notify or notifyAll. So you can write something like: while some condition is not yet true, do a wait, and keep looping, and somebody can wake you up from wait with a notify. All right, so in conclusion. Sorry I went a little long, but I wanted to make sure you had some extra words in there, since you can't ask questions during this recording. We have the very important concept of atomic operations, which are operations that run to completion or not at all; these are the primitives on which we construct various synchronization abstractions. We talked about hardware atomicity primitives like disabling of interrupts, test-and-set, swap, compare-and-swap, and load-locked/store-conditional, and showed a number of different constructions of locks. In those constructions, we showed you have to be very careful not to waste or tie up machine resources: you shouldn't disable interrupts for a long time, and you should never spin-wait for long periods. The key idea in fixing that was separating the lock variable, which is a memory location, from the hardware mechanisms used to implement locking with that variable. Then we introduced two higher-level primitives: semaphores and monitors. Semaphores are exactly like integers with a restricted interface. They have a P operation, which waits while the value is zero and decrements when it becomes non-zero, and a V operation, which increments and wakes a sleeping task if one exists. You can initialize the value to any non-negative integer, but after that you're not allowed to look at it, and you use a separate semaphore for each constraint. Monitors are a lock plus one or more condition variables: always acquire the lock before accessing shared data, and then use the condition variables to wait inside the critical section. Again, this is a little different from all the other things like semaphores and bare locks, where if you grab a lock and go to sleep for some reason, things deadlock; in the monitor, you absolutely grab the lock and then potentially sleep on condition variables inside the critical section. And I hope you appreciate, given our readers-writers example, that monitors
represent the logic of the program: wait if necessary, signal when you change something, so that any waiting threads can proceed. It's really a pattern, a way of constructing synchronized code. All right, well, thank you very much. Hopefully we'll pick up on Tuesday with a real lecture, assuming the projector is working now. Thank you.