Let's get started. So this is lecture number six for computer science 162. So what are we going to talk about today? A lot of material to get through. I think I have 86 slides. That's like a slide a minute. Here we go. Lots of animations. So I'm going to do a quick recap of locks. And there were a lot of questions about Mesa monitors at the end of last lecture, and why we need to use a while instead of an if. So I'm going to go through a nice little animation that will hopefully explain the difference between Hoare-style monitors and Mesa-style monitors. Then we're actually going to use monitors in a simple application, the readers/writers problem. And then I'm going to talk about something probably very near and dear to everybody. Or if it's not, it will be by the end of the semester, which is how you successfully program in a large project. OK. So remember that a thread has to wait if another thread is in a critical section. So we have a critical section here. The way we make a thread wait is by using locks. That's one of the ways. So you acquire the lock before you enter the critical section. You release the lock when you exit the section. If another thread's in the critical section, you have to wait. And because these critical sections are user level code and could be arbitrarily long, we don't want to sit there and spin. Instead, we want to sleep. So one way of implementing this is with code like this. We started by looking at this as an example. So here we have a lock that starts with a value of free. Acquire checks to see if it's busy. If it's busy, then you put the thread on a wait queue and you go to sleep. Otherwise, you set the value to busy. And release says if there's anybody waiting on the queue, you take a thread off the wait queue, place it on the ready queue. Otherwise, you release the lock, mark it as free. Now, of course, this won't work as written. Why? 
Because there's lots of race conditions because multiple threads could come in here at the same time and think they end up having the lock. So both of these need to be critical sections. So the way we did that was by basically creating a, so this is kind of meta-level. This is like when you learn Lisp or something. We've got a critical section within here which is used to control access to this critical section. But there's some very fundamental differences between these two critical sections. The fundamental difference is this critical section is user-level code. It could run for an arbitrary amount of time. You could compute the first 200 million digits of pi. This code we know is gonna run for a really short time. So as a result, we're gonna implement this using a spin lock. So we'll do test and set on a guard variable. So guard initially starts out at zero. The guard is protecting access to the critical section in acquire and in release. Which in turn, these two critical sections are used to control access to this critical section. Again, busy waiting is acceptable here because this is a short amount of code that we're running here. This is like maybe a thousand instructions or several hundred. Same thing here, several hundred to a thousand instructions. So the only case where this would not be acceptable is if we had thousands of threads that would come into the system at the same time. Because each of those threads would end up spinning while it tried to get access to check the state of the lock variable. But in most cases where we're only gonna have a small number of threads, the amount of spinning that each thread is gonna do is gonna be relatively small. So it's acceptable in this case. Not in the general case, not to protect access to user level code. Yeah, question. Yeah, so it's a very good question. 
So the question is, if we're relying on the notion that using this spin lock is okay because the number of threads is gonna be small, why couldn't we just say, well, the critical section for the user code is also gonna be small? This is a design decision. So in the case of the user code, we really don't have any control. The user is using our locks and they can do whatever they want within that critical section. So it really is unbounded. Whereas here, this will work in cases where we have a moderate number of threads. So that's gonna be the caveat on this: use this lock if you only have a moderate number of threads. If you have more threads, then we're gonna have to use an approach where you go to sleep. And it would be more general purpose in most cases to just simply say, all right, any time we're gonna wait, we're always going to go to sleep. So you could sleep waiting to get access to the lock. You could sleep then waiting to get access to the critical section. The real solution is we use higher level primitives like in Java and all, which implement that for us and will do the sleeps for us. We'll see that in the next lecture. Okay, so again, a spin lock in this case is okay because it's a very small critical section and we're making the assumption, which can get us into trouble, that there's a small number of threads, not a large number of threads. Yes. Okay, sure. The question is, can I explain the alternative of sleeping? So rather than doing a test and set, we could do the same sort of thing, which is we could go to sleep if we find that it's busy. So we could, for example, spin for a short time and then decide to go to sleep. I think it's very complicated to implement something like that, because now we could be sleeping, waiting to get into the lock, or we could be sleeping, waiting to get into the critical section. So again, a much easier solution is just to use a higher level primitive, like a language level primitive. 
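To make the guard idea concrete, here's a minimal sketch in C11 using an atomic flag as the test-and-set guard. The names (`spin_acquire`, `run_spin_demo`) and the four-thread counter demo are mine for illustration, not course code; the short guarded section stands in for the bodies of acquire and release.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stddef.h>

static atomic_flag guard = ATOMIC_FLAG_INIT;  // the guard variable, initially clear
static int counter = 0;

static void spin_acquire(void) {
    // test&set: returns the previous value, so spin while it was already set.
    // Busy-waiting is acceptable here because the guarded section is tiny.
    while (atomic_flag_test_and_set(&guard))
        ;
}

static void spin_release(void) {
    atomic_flag_clear(&guard);  // guard = 0
}

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_acquire();
        counter++;              // a short critical section, like acquire()/release() bodies
        spin_release();
    }
    return NULL;
}

// Run four threads; if mutual exclusion held, the count is exact.
int run_spin_demo(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    return counter;
}
```

With a small number of threads the spinning per thread stays short, which is exactly the assumption the lecture calls out.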
That will handle all of that for you efficiently. Okay, so there was also a lot of confusion about why we need to use a while instead of using an if when we use Mesa-style monitors. So the key question is: is this queue empty? So when we initially come through to remove something from the queue, we acquire the lock and then we do this test to see is the queue empty. If the queue is empty, then we wanna wait on the condition variable. In this case, it's data ready, and we're going to give up the lock when we go to sleep. All that has to happen atomically: wait on the queue, go to sleep and give up the lock. With Hoare-style monitors, when someone adds something to the queue, they're gonna signal the condition variable. And what's gonna happen is the condition variable, if there's someone waiting, is immediately going to transfer control over. So now this guy has woken up, has the lock, and there's an item in the queue for them to dequeue. With Mesa style, all the signal does is just schedule the thread. So it puts it on the ready queue. And then at some point later, the OS scheduler is going to run that thread. But the ground rules could have changed, as we'll see in a little animation in just a moment. So with Mesa, we have to replace this if with a while. It would be safe to also use a while in a Hoare-style monitor, because when you return, the condition is gonna be true and you'll immediately exit. But it might not always be true with a Mesa-style monitor. And again, Hoare-style monitors are very easy to reason about and very easy to examine algorithmically, so that's what you typically see in a textbook. But from an implementation standpoint, we wanna give the operating system scheduler the ability to decide what thread it's gonna run next, because it might have to meet some deadline scheduling or real time constraints or priority constraints or fairness constraints, as we'll see in a couple of weeks. 
Okay, so we start out with the queue is empty. This is our monitor. So our monitor, the lock is free. Nothing is waiting on the monitor's queue. And here's our CPU. And running right now is thread one. There's nothing in the ready queue. So the only thing running on our system right now is thread one. Okay, so thread one starts remove from queue. It's going to acquire the lock. So now the lock is busy. And it's gonna check if the queue is empty. It is. So we're gonna have to wait. So we've given up the CPU. Nothing is in the ready queue. And we are waiting on the queue for the monitor, which also releases the lock. So now another thread comes along and runs add to queue. So it's gonna put something into the queue. So it acquires the lock. So now the lock is held by T2. It's the running thread. And it adds an item to the queue. Now it's going to signal. Now signal picks one of the threads that's waiting and puts it onto the ready queue. So there's only one here. So it's going to put thread one onto the ready queue. And then it releases the lock. But this isn't an ideal world. And so what's actually gonna happen is thread three is gonna come along. So thread three is now added to the ready queue. And maybe this is a scheduler that prioritizes by whoever's had the least amount of CPU time. So since thread three is new, it gets to run first. So first T2 exits and releases the lock, and then thread three starts running. So it gets the CPU first. So what happens? It's going to acquire the lock. And now it's gonna check to see is the queue empty. Is the queue empty? No. So it's going to dequeue an item and now the queue is empty. It's gonna release the lock and exit the system. So now we only have one thread ready to run. So that's the thread we're gonna run next. And thread one now starts running. And it dequeues an item from the queue. 
Oh, but there's nothing in the queue. So that won't work. So this is the reason why we have to replace the if with a while. So the while guarantees that even if we don't get scheduled immediately, all that will happen is we wake up, we run, we see the condition doesn't hold, we go back to sleep. So another challenge with using Mesa-style monitors is there is the risk of starvation. That is, it could be the case that thread one just is always unlucky. And as soon as something gets added to the queue, someone else comes in and grabs it from the queue, thread one gets woken up and goes right back to sleep. So that is a potential downside. Yes, question? Yeah, so the question is, couldn't we get something more Hoare-style if we were able to transfer the lock instead of the CPU? Yes, you could do that. So you could implement a version of Mesa-style monitors where you're not gonna control the scheduling, but you're gonna control the transfer of the lock to a particular thread. The downside with that is now you're interfering with what the scheduler might be doing. And you might end up transferring the lock to a low priority thread when there was another higher priority thread that came into the system and should get the lock instead. You could end up with a form of priority inversion. So that's the main reason why. Hoare-style monitors give you a lot of control. The signaler gets to control who gets access to the monitor next. But the downside is, if you have a scheduler that's trying to do something, the operating system scheduler loses control. But you can implement whatever you want. The last thing is, if you're interested in how Mesa actually does make all of this work, Mesa-style monitors, you can read a paper by Butler Lampson, who's a Turing Award winner, that tells you all about the innards of how monitors work in Mesa. It's not too long and it's a pretty easy read. Okay, any more questions before I go into using monitors? 
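POSIX condition variables have exactly these Mesa semantics, so the pattern from the animation looks like the following in pthreads. This is a sketch with an illustrative fixed-size queue, not the lecture's exact code; the key line is the `while`, which rechecks the condition every time the waiter wakes up.

```c
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t data_ready = PTHREAD_COND_INITIALIZER;
static int queue[16];
static int head = 0, tail = 0, count = 0;

void enqueue(int item) {
    pthread_mutex_lock(&m);
    queue[tail] = item;
    tail = (tail + 1) % 16;
    count++;
    pthread_cond_signal(&data_ready);  // Mesa: only makes one waiter runnable
    pthread_mutex_unlock(&m);
}

int dequeue(void) {
    pthread_mutex_lock(&m);
    while (count == 0)                         // while, not if: another thread may
        pthread_cond_wait(&data_ready, &m);    //   have emptied the queue before we ran.
                                               // wait atomically releases m and sleeps,
                                               //   then reacquires m before returning.
    int item = queue[head];
    head = (head + 1) % 16;
    count--;
    pthread_mutex_unlock(&m);
    return item;
}
```

If the `while` were an `if`, a thread three scenario like the one in the animation would let the waiter dequeue from an empty queue.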
Okay, so let's look at an example of using a monitor. So we're gonna have a shared database and we have two types of users that wanna access this database. One type of user is a reader. Readers, as the name implies, never modify the database. The only thing they're ever gonna do is read things that are in the database. We also have writers. And writers are both going to read the database and they're going to modify the contents of the database. So first question I can ask is, what is the right granularity of locking and control on the database? So is it sufficient to just simply put a single lock on the database? No, maybe? Yeah, it is sufficient. It will work, but it's like that too much milk lecture. What happens if you want orange juice? The single lock is going to be irritating. In particular, we'd like to be able to have many readers in the database at the same time because they don't conflict, right? Readers are just reading. So we could put a single lock on the database but we're gonna eliminate any concurrency and that's not good. On the other hand, we can only ever have one writer at a time. So from a writer point of view, a single lock would be sufficient. But from a reader standpoint, it's gonna be incredibly inefficient. So we don't wanna do that. So a monitor instead is gonna be a better choice. All right, so now we're ready to write our solution. Right, not quite. First thing we always do before we write a solution whenever we have a concurrency control problem is we figure out what our correctness constraints are. So our correctness constraints here are that readers are allowed to access the database as long as there are no writers in the database. And writers can access the database as long as no one else is in the database, a reader or a writer. So the database has to be completely quiesced before a writer is allowed to enter. 
And we wanna make sure that only one thread manipulates the multiple state variables that we're gonna need to keep track of how many readers and writers and what they're doing. So basic structure of our solution. So a reader is going to wait until there are no writers and then it's allowed to access the database. And when it's done, it's gonna check and see are there any writers waiting to get into the database? If there is, they're gonna wake up one of those writers. A writer is gonna have slightly different code. So the writer is gonna wait until the database is completely quiesced. So no active readers or writers in the database. Then they can go and access the database to read, modify, write. And then when they're done, they're gonna wake up either waiting readers or one of the waiting writers. Now, to keep track of all of this, you can see we're gonna need to keep track of who's in the database or waiting to get into the database. And so we'll have four state variables. So active number of readers, active number of writers, the number of waiting readers and the number of waiting writers. And then we're gonna have two condition variables. One is gonna be okay to read and the other is gonna be okay to write. So you can kind of see where I'm going with the solution. So the code for the reader is gonna look like the following. The first thing we need to do is check ourselves into the system. To do that, we need to figure out is anybody else in the system? Now remember, one of our correctness constraints was only one thread is allowed to manipulate the state variables at a time. So we have to acquire a lock to manipulate the state variables. This is not a lock to get into the database. This is just to look at the state variables. Okay, so we look at the state variables. And our condition is, we wanna make sure there are no writers in the database. And we're gonna add another condition, which is to say that there are no waiting writers either. 
And I'll come back to this in just a moment. But we're gonna see, is there either a writer in the database or a writer waiting to get into the database? Now, as long as that is true, that there's either a writer in the database or one waiting to get in, we're gonna wait. So the number of waiting writers gets incremented by one and we wait on the okay to read condition. Releasing the lock, yes. I guess I don't have to repeat that question for the podcast. Okay, so if there are no active writers or waiting writers, then we get to go into the database. So we become an active reader. Now we're done manipulating the state variables, so we get to release the lock. Now we can read the database to our heart's content, whatever we wanna do. Anything except for writing in the database. When we're done, what are we gonna do? Check out. So again, we're gonna manipulate state variables, so we have to acquire the lock. We're now gonna decrement the number of active readers. And now we're gonna check to see if there are any active readers left in the database. If there are no active readers in the database and there's a waiting writer, then we're gonna wake up one of those writers by doing a signal on okay to write. So why are we waiting if there are any waiting writers? Because we're making a decision here to bias our algorithm for admission control. So we're biasing it in favor of writers, because the argument is that writers have to wait until the database is completely empty. So if we have a continual stream of readers coming in trying to use the database, then a writer will get stuck indefinitely and starve. They won't be able to get into the database. So what we're gonna do is, if there's a writer waiting to get in, we're going to make subsequent readers wait until that writer gets to go through the database, and then those readers will be allowed to enter the database. Now this could also cause starvation for the readers. 
If we have a constant stream of writers coming in, the readers are just gonna have to sit and wait. So again, this is sort of like that spin lock earlier. This is a design decision. We're making an assumption about how our database is being used. If we get it wrong, we could end up with things like starvation occurring. All right, okay. So again, we release the lock there because the lock is only protecting the state variables, not access to the database. Access to the database is protected by the state variables and the condition variables, by the monitor. Okay, so what's the code for a writer gonna look like? It's gonna look somewhat similar, but with some very important differences. So again, we have to check into the system. We're gonna modify the state variables, so we have to grab the lock. Then we're going to check the number of active writers and the number of active readers. If there are any, then we become a waiting writer and we wait on okay to write and release the lock to go to sleep. Otherwise, we become an active writer and we release the lock. Now we're free to modify the database, read it and write it, and then eventually we're gonna be done. So again, we need to check ourselves out. So to do that, we have to grab the lock before we manipulate the state variables. We're gonna decrement the number of active writers. If there are any waiting writers, so again, we're prioritizing writers. If there are any waiting writers, we know there are no active readers, because we were just in the system, so we don't have to check that. If there are any waiting writers, we're gonna signal one of them to start running. So they're gonna get access next to the database. No readers can come in, because there was a waiting writer, and so they're not gonna be able to proceed. They'll just go to sleep. Yes? There are waiting readers. So the question is, how are we favoring writers on this slide? 
So the thing is, if we find that anybody is waiting... if there is an active writer in the database, we obviously can't go in the database. If there's a writer waiting to get into the database, then we're going to wait, right? Oh, that's the one thing that favors writers. That's one thing that favors writers. The other thing that's gonna favor writers is down here. Well, actually this doesn't favor writers. It's on the other slide that it favors writers. So here on the reader slide, we favor writers by waiting to enter the system as a reader if we see that there are other writers waiting to get in. On the writer slide, we favor writers at the exit when we check out. We wake up the writers first, and then if there are no waiting writers, then we'll go and do a broadcast to wake up all of the readers, if there are any waiting readers. So now, why do we do a broadcast here where everywhere else we've always done a signal? Why is it all right to broadcast? Yes, precisely. It's okay for us to wake up all the readers. In fact, that's exactly what we wanna do, right? We want all the readers... now that the writers are gone from the house, the readers can wake up and enter the database. Okay, yeah, that's absolutely correct. So if we wake all of the readers here... yes, that's the downside of animations. They're all going to have to wait. You know, the first one will acquire the lock and check into the system, become an active reader and release the lock, then the next one will come through, then the next one will come through, and so on. But some of them are gonna have to wait at the lock acquire because someone else got there ahead of them. So if we have 20 readers, each one will be serialized one at a time entering the system. And then they can all concurrently access the database. That's a very important distinction. 
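Putting the reader and writer sides together, the check-in and check-out logic just described might be sketched in pthreads like this. The variable names `AR`/`AW`/`WR`/`WW` abbreviate active/waiting readers/writers; this mirrors the algorithm on the slides but is my sketch, not the course's official code.

```c
#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;  // protects the state variables ONLY
pthread_cond_t ok_to_read  = PTHREAD_COND_INITIALIZER;
pthread_cond_t ok_to_write = PTHREAD_COND_INITIALIZER;
int AR = 0, AW = 0, WR = 0, WW = 0;  // active/waiting readers and writers

void reader_enter(void) {
    pthread_mutex_lock(&lock);
    while (AW + WW > 0) {                 // wait if a writer is active OR waiting
        WR++;                             //   (waiting writers too: favors writers)
        pthread_cond_wait(&ok_to_read, &lock);
        WR--;
    }
    AR++;                                 // check in as an active reader
    pthread_mutex_unlock(&lock);
}

void reader_exit(void) {
    pthread_mutex_lock(&lock);
    AR--;
    if (AR == 0 && WW > 0)                // last reader out wakes one waiting writer
        pthread_cond_signal(&ok_to_write);
    pthread_mutex_unlock(&lock);
}

void writer_enter(void) {
    pthread_mutex_lock(&lock);
    while (AW + AR > 0) {                 // wait until the database is fully quiesced
        WW++;
        pthread_cond_wait(&ok_to_write, &lock);
        WW--;
    }
    AW++;
    pthread_mutex_unlock(&lock);
}

void writer_exit(void) {
    pthread_mutex_lock(&lock);
    AW--;
    if (WW > 0)                           // favor writers on the way out
        pthread_cond_signal(&ok_to_write);
    else if (WR > 0)                      // no writers waiting: release ALL readers,
        pthread_cond_broadcast(&ok_to_read);  // safe because readers can share
    pthread_mutex_unlock(&lock);
}
```

Reading or writing the database itself happens between enter and exit, with no lock held; the monitor lock only makes manipulation of the four state variables atomic.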
Again, the lock is only protecting access to our state variables and serializing access to those state variables, so that access to those state variables becomes an atomic operation. Was there a question up here? Okay, so I think I already answered my own question, which is why do we give priority to the writers? We're trying to avoid starvation of the writers. But if there are too many writers, the writers will starve the readers. So we're making an assumption here. Okay, so now we can actually run a simulation. So we're gonna assume we have the following sequence. We have reader one that's gonna come in and read the database, and reader two that's gonna come in and read the database. And then while those two are running, writer one's gonna come along and then reader three is gonna come along. Now initially our system is quiesced, so there are no active readers, no active writers, no waiting readers, no waiting writers. All right, so first, reader one comes along. So it acquires the lock and it does the check. So active writers is zero plus waiting writers is zero, and so we don't have to wait. It's not greater than zero. So we become an active reader, we increment the number of active readers. Release the lock, we can now access the database. Okay, now along comes reader two. We acquire the lock, we do our check, zero plus zero is not greater than zero. So we become an active reader, active readers now is two. All right, release the lock and access the database. Now we're gonna assume it takes a while to access the database. So we're gonna assume the lock for the state variables has been released, and only active readers is non-zero. Yeah, is there a question? No, we didn't have to increment waiting readers, because we didn't have to wait, right? 
The only time we're gonna increment waiting readers, in fact we'll see in just a couple more animations, is when we have an active writer or a waiting writer. So we've got two threads now that are in the system. Reader one is reading away, reader two comes in and is reading away, right? So we're assuming both of these readers take a long time. So they're running in the database. But they're only doing read operations, so we know there can be no conflict. Yeah, question in the back? All right, so no locks are being held, because none of the threads are manipulating the state variables, and two active readers. Okay, so writer one comes along. And remember, reader one and reader two are still in the database, so active readers is two. So it acquires the lock. Then we do this check, okay? So active writers, that's zero, that's good. Active readers, that's two. Zero plus two is two, which is greater than zero. All right, so in this case we are gonna have to wait. So we become a waiting writer and we wait on okay to write. Release the lock, go to sleep. All occurs atomically. So we have to sleep. So now along comes reader three. We have reader one and reader two in the database still and writer one waiting to access the database. So one waiting writer, two active readers. So we acquire the lock and now we do our check. So now we end up with zero plus one, which is greater than zero. So we're gonna become a waiting reader. So waiting readers now gets incremented to one. So two active readers, one waiting reader, one waiting writer. All right, so we wait and we go to sleep and we release the lock, all three atomically. So where are we now? We've got two readers in the database. We've got two threads that are waiting. Writer one is waiting on okay to write. Reader three is waiting on okay to read. So eventually, I guess reader two is gonna finish first. So it'll acquire the lock. 
It'll then decrement the number of active readers and it's gonna check the condition. So there is one active reader. So the condition can't hold, since it's an AND. So we release the lock. Eventually reader one is going to finish. So again, it's gonna acquire the lock to manipulate the state variables, decrement the number of active readers. And now when we perform this check, what are we gonna find? Well, it is true now that active readers, zero, equals zero, so that's true. And waiting writers is one, and one is greater than zero. So this condition holds true. So we're now going to signal someone waiting on okay to write. There's only one thread waiting on okay to write, and that's writer one. So writer one will get woken up. So reader three is still waiting though. We gave priority to the writer to wake up first. So now the writer gets the signal. It's going to check to see are there any active writers or active readers and find that that's zero. So it's gonna exit the while loop and we now become an active writer. So we now have zero active readers, one waiting reader and one active writer. Yes, question. Yes, so when you come back from wait, you automatically compete to acquire the lock. You're not given the lock automatically, you have to compete to get the lock. That's part of coming back from wait. When you leave wait, yes, you hold the lock. When you get woken up, you're going to compete and it may take you a while to get the lock, but when you return from wait, you'll have the lock. Question over here first. Oh, nope. The question is, is there a difference between the lock release down here and the other release? No, they're the same. I thought I caught... there was another typo that I had fixed already. Yes, that's correct. So after... or mostly correct. So after wait returns, when you get signaled... okay, so when it gets the signal, we're now gonna put that waiter thread onto the ready queue. 
So it now is ready to be scheduled by the CPU, but there may be many other threads, so it may take a while before we finally get scheduled. Meanwhile, other readers may be coming through and trying to manipulate the state variables, so they're gonna be grabbing the same lock that we're contesting for. So when we eventually get scheduled, we may find, oh, the lock is busy, and so we have to wait. But eventually, we're gonna get the lock and then we're gonna be allowed to proceed. Sure, so the question is, what does it mean to have two readers in the database? So because the readers are only reading the database, not modifying it, they're always gonna get the same answer. There is no race condition that can occur between readers, because as long as nobody else is modifying the data, and that was our condition, writers can only enter when there are no readers in the database, we're guaranteed that the readers are always gonna get the same values regardless of what ordering or interleaving or when context switching occurs. Now if we were to allow a writer into the database at the same time, all bets would be off, because the reader might accidentally read something that was a partial write from the writer. So that's the reason why writers get exclusive access to our database: because we don't want anybody else to see what they're doing. So it makes all of their actions be atomic. Either the reader sees the data before the writer gets into the database or the reader sees the data afterwards. But multiple readers, perfectly fine. Everybody will see the same result regardless of the interleaving, yes. So the question is, if the writer gets the lock and then goes to sleep, would that block everyone else? Absolutely, and so that's why when you wait, it atomically does three things. It puts you on a queue for the condition variable. 
It releases the lock and you go to sleep, and all of that has to happen atomically, because otherwise a context switch could happen at a bad time and weird things would happen. That releases the lock. So when we come back... again, when we come back from wait, it's just as if we had tried to enter here. So we're gonna have to acquire the lock, and wait will do that automatically for us, so it's gonna call lock acquire. So the way to think about it is, what does wait do? It puts you on a queue for the monitor. It releases the lock, it puts you to sleep. When you come back from sleep, it does a lock acquire and then it returns. So conceptually that's the way to think about it. Lots of ways to implement it internally, but conceptually that's what it's doing. So when we come back from wait, we know we'll have the lock. It just may take a long time to return back from wait, because we've also been sleeping potentially. Was there another question? That's absolutely correct. When we call a broadcast here, we're taking all of the readers and we're putting them on the ready queue, and then they'll all compete to acquire the lock in okay to read's wait, and one at a time they'll get the lock, they'll check the state variables in the while loop and decide whether they can go to sleep or continue. And if a writer came along in the middle of that, they're all gonna go back to sleep. So they'll all wake up and then they'll all go back to sleep, yes, that's right. When you come back from wait, you're gonna hold the lock, because you're gonna have competed inside wait. You're gonna have called lock acquire to compete and get the lock. So again, this is a difference between Mesa and Hoare style. With Hoare style, the signaler passes you the lock. So there's no competition to get the lock. They pass you the CPU. So you're scheduled, you have the lock, you can proceed immediately. 
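The conceptual model of wait just described can be written down as a few lines of pseudocode. The `cond_t`/`lock_t` primitives here are hypothetical, not a real API; real implementations differ internally, as noted above.

```c
// Pseudocode only: the conceptual steps of a Mesa-style wait.
void wait(cond_t *cv, lock_t *monitor_lock) {
    enqueue(cv->waiters, current_thread);  // 1. put this thread on the condvar's queue
    release(monitor_lock);                 // 2. give up the monitor lock...
    sleep();                               // 3. ...and go to sleep (2 and 3 are atomic)
    acquire(monitor_lock);                 // 4. on wakeup, compete to reacquire the lock
}                                          // so when wait returns, you hold the lock
```

Step 4 is where the waking thread may spend extra time competing with other threads for the lock, which is why returning from wait can take a while under Mesa semantics.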
But with Mesa style, all you do is you just drop it on the ready queue, and so it may take a long time before you actually get to run, and when you run, you may have to compete to get the lock, and that may take a long time also. But it simplifies the design of the operating system and it gives control over scheduling to the operating system, okay? All right, so we get the lock. We decrement the number of waiting writers. So that goes to zero, and we check our state variables. No one has come along and entered the database. So we can become an active writer and we release the lock, and now we can write in the database. Eventually we're gonna check out, and so we'll acquire the lock, decrement the number of active writers. We will check if there are any waiting writers. There are none. So now, there is a waiting reader. So we're gonna do a broadcast and wake up that reader. So we signal reader three, which now gets the signal. Again, it's gonna contest to get the lock. Once it gets the lock, it's going to decrement the number of waiting readers. It's gonna loop back to do the check. There are no active writers, no waiting writers. So we can become an active reader. We release the lock, because we're done with the state variables. We read the database. No modification allowed, okay? Then eventually we're going to check out, decrement the number of active readers. There are no active readers, but there are also no waiting writers. So there's nobody to wake up. So we release the lock and we're done. Yes. So the question is, when we do this okay to write signal, which of the waiting writers do we wake up? It's gonna be dependent on the implementation of the monitors. If you have something like, I don't know, priority inheritance, that might cause you to sort that queue by priority. Ah, the question is why do we signal before we release the lock? So that's a good question. 
You don't technically, if you read the Birrell paper, you'll see he says you don't have to hold the lock when you're manipulating state variables and doing signals and things like that, but for making sure your programs are correct, it's probably better to do that. You can end up with contention, right? Because you can signal, then immediately get context switched. They come along, they try to acquire the lock, they can't acquire the lock, they go to sleep, and then you release the lock and then they acquire the lock. So it's slightly less efficient to have the signal inside, but it's correct. You know it's always gonna give you correct behavior. Okay, so we can ask some questions now. So let's see how much everybody is really understanding this. So I'm gonna take my big marker pad and I'm just gonna start deleting things. So what happens if, for example, I remove this line? Okay, so this was the line that was checking to see if there were any active readers in the database and waiting writers. And if there were no active readers and there was a waiting writer, we woke up one of those writers. So now I'm just gonna remove that. So when a reader checks out, it's gonna call okay-to-write signal. Is this gonna work? Okay, so I hear a no. Why is this not gonna work? Yeah. Ah, okay, so the noes say that even though the reader is checking out, there might be other readers, and so the writer might be allowed to proceed. Does someone from the yes side have a counter-argument? Yes. So this is, again, this is the very important distinction between if and while. Seems like such simple language constructs. If, oh, bad choice of words. Had we used an if here, we would get the wrong behavior, right? Because the writer would wake up and think, oh, it's fine for me to proceed. They'd enter the database, and they'd be writing in the database while there could be active readers in the database.
But because this is a while, what's gonna happen is we're gonna wake up, decrement the number of waiting writers, and then immediately check, right? And if there are any active readers, we're just gonna go back to sleep. We're gonna increment the number of waiting writers and go back to sleep. So the answer is this will work, but it's inefficient, right? Because we could have unnecessary context switches, and unnecessary wakeups and back-to-sleeps of writers, if there are other readers in the database, yes. Would it potentially, what? Ah, yes, so yes and no. The question is, wouldn't this just flood the kernel and lead to lots of unnecessary context switches? So it will generate a lot of context switches. It may not flood the kernel, but if we do have a lot of readers in the system and just a couple of writers waiting to get in, then yes, every time one of those readers leaves, it's gonna wake up the writers, they're all gonna look to see whether they can get in, they're not gonna be able to get in, and they're gonna go back to sleep. So yes, there will be unnecessary context switches, and that will be a drain on performance, because the CPU is gonna be doing wasted work. Waking up a writer when they can't proceed is not going to help you from a performance standpoint. From a correctness standpoint, this is still a correct implementation. It's just not gonna perform as well as one where we do have that check, yeah. If waiting writers is zero, what happens if we call okay-to-write signal? It's a no-op, so nothing will happen. The queue's empty, so there's nothing to actually signal. Okay, let's try another one. So what happens if I change that signal to a broadcast? Is this gonna work? I hear some yeses, I hear some noes. Let's hear this time from a yes. Why will this work?
Exactly, so this will work because the first thread that comes through, the first writer that gets scheduled, is gonna come through and find that active writers and active readers are zero, and they're gonna then proceed. The other ones are gonna go back through the while loop and find, hey, somebody beat me here, and they're gonna go back to sleep. So again, this will work from a correctness standpoint, but from a performance standpoint, we're gonna have a lot of writers potentially that wake up and discover they can't proceed and go right back to sleep. And that's time that the CPU could be busy running the writer that's actually in the database and getting them out of the database. So the question is, do the additional writers realize that there's a problem because they can't acquire the lock, or because of the while loop? So they'll eventually acquire the lock, that's transparent to them. It may take a little bit of time, but what'll happen is, again, they'll decrement the number of waiting writers and then they're gonna loop back, because we're in the while, and they're gonna realize that there's an active writer in the database, and then they'll immediately increment the number of waiting writers and go back to sleep. Yes, in the back? That's an interesting question. So what happens if we add a check here to see if there are any active writers? Do we need to do that? Yeah, so if we're in that situation, we have a critical failure, because we were the ones in the system. We were an active reader, so there should not have been an active writer in the database. So there should be no need to check to see if there are any active writers. There can only be other readers in the database with a reader, right? Because our admission control here for writers is gonna prevent a writer from entering the database if anybody was active in the database.
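As a rough sketch of the admission control being described, here is a writers-priority readers/writers monitor in Python, with threading.Condition standing in for the lecture's lock plus condition variables. The state-variable names (AR, AW, WR, WW for active/waiting readers/writers) are my labels, not necessarily the ones on the slide:

```python
import threading

class Database:
    def __init__(self):
        self.lock = threading.Lock()
        # both condition variables share the one monitor lock
        self.ok_to_read = threading.Condition(self.lock)
        self.ok_to_write = threading.Condition(self.lock)
        self.AR = self.AW = self.WR = self.WW = 0  # active/waiting readers/writers

    def reader_checkin(self):
        with self.lock:
            while self.AW + self.WW > 0:      # writers active OR waiting? readers wait
                self.WR += 1
                self.ok_to_read.wait()        # release lock, sleep, reacquire
                self.WR -= 1                  # then loop back and re-check (while!)
            self.AR += 1                      # now an active reader

    def reader_checkout(self):
        with self.lock:
            self.AR -= 1
            if self.AR == 0 and self.WW > 0:  # last reader out wakes one writer
                self.ok_to_write.notify()

    def writer_checkin(self):
        with self.lock:
            while self.AW + self.AR > 0:      # anybody active? writer waits
                self.WW += 1
                self.ok_to_write.wait()
                self.WW -= 1
            self.AW += 1                      # now the single active writer

    def writer_checkout(self):
        with self.lock:
            self.AW -= 1
            if self.WW > 0:
                self.ok_to_write.notify()     # prefer waiting writers
            elif self.WR > 0:
                self.ok_to_read.notify_all()  # broadcast: all readers may enter
```

Note the while loops: a woken thread always re-checks the state variables before proceeding, which is what makes the Mesa-style signal safe.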
And because we're manipulating the state variables within a critical section, it can't be the case that somehow or other a writer snuck into the database. Okay, yeah. Yeah, so the question is, signal or broadcast, is it just a design problem? Well, so what happens if we replace okay-to-read broadcast with okay-to-read signal? Is that correct? I hear a no. So it wouldn't be deadlocked, but it is the case that if we changed this broadcast to a signal, we'd only ever wake up one reader at a time as a writer left the system. So if no writers came into the system after that, all those readers would just be starved. They would just be asleep, waiting, never to be woken up. So they're not completely interchangeable; you can't just replace one with the other. If you have a signal, you can replace that signal with a broadcast; you'll have lots of unnecessary context switches. But if you replace a broadcast with a signal, you may have a form of starvation. Okay, what if we just say we only want one condition variable? So we're gonna have an okay-to-continue condition variable instead of okay-to-read and okay-to-write. Will this work? Yes, no, maybe, yeah. It'll work, but it'll be really slow. So one problem is if we only use signals. If we only have readers, this'll work, but if we have readers and writers, this won't work, because the writers might not get woken up. You might wake up a reader instead, and then the reader's just gonna turn around and go back to sleep. So do I have an example of this? I think I have an example. So reader one arrives, then writer one and reader two arrive while reader one is still in the database. So writer one and reader two are gonna wait for reader one to finish. But then when reader one checks out, it ends up signaling reader two instead of writer one. So writer one doesn't get to run. So we've lost our priority here. So a solution: change them all to broadcasts. If we change it to broadcasts, then it'll work correctly.
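Here's a small Python demonstration, again using threading.Condition as a stand-in, of why the all-broadcast version works: every sleeping thread is moved to the ready queue and re-checks the condition when it runs. A single notify in the same spot would wake only one of them and leave the rest asleep, the starvation just described:

```python
import threading

cond = threading.Condition()
writer_done = False
woke = []

def reader(i):
    with cond:
        while not writer_done:   # Mesa: re-check after every wakeup
            cond.wait()
        woke.append(i)           # lock is held here, so the append is safe

readers = [threading.Thread(target=reader, args=(i,)) for i in range(3)]
for t in readers:
    t.start()

with cond:
    writer_done = True
    cond.notify_all()            # broadcast: all three readers get to proceed
    # cond.notify() here would wake only ONE reader; the other two would
    # sleep forever unless something else came along to signal again

for t in readers:
    t.join()
print(sorted(woke))              # [0, 1, 2]
```

The cost of the broadcast is exactly the inefficiency discussed above: every waiter gets context switched in just to re-check the condition, even when only one of them can actually proceed.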
But again, we're gonna have a lot of context switches that are gonna be unnecessary, because we're gonna wake up everybody who's sleeping. Readers, writers, they'll all wake up. The writers will compete and one of them will win. The readers will all go back to sleep because they'll see that there's a waiting writer. Again, it'll work correctly, but it's not gonna be an efficient solution. Yeah, question in the back. It's wrong because reader two goes before writer one, and we're trying to prioritize the other way around. It's also not good because, yeah, that's the biggest thing. We're trying to prioritize the writers with our signals, and we could wake up a reader instead. Could you use a priority queue, strictly set the priority for writers to a thousand and the priority for readers to one? Yeah. No, actually not, right? So it'll go back to sleep, but there's no signal to wake someone else up. So all threads will just be sleeping until the next thread comes into the system, and then it'll randomly wake someone up. If it wakes up a writer, that writer will proceed and then wake up all the readers. If it wakes up a reader instead, then again, all bets are off; it'll go back to sleep and everybody will sleep. So the easiest thing is just to change it to a broadcast. But again, this'll work, just with lots and lots of inefficiencies. So not the ideal solution. Yeah, they will. So the first half of that was absolutely correct: when we do a broadcast, we take all the waiting threads, in this case on okay-to-continue, and we move them all over to the ready queue. So they go from waiting on the condition variable inside the monitor to being on the ready queue. And each one of them eventually gets scheduled, in whatever ordering the scheduler imposes, and will compete to reacquire the lock from wait. If it's a writer, it'll compete over here. If it's a reader, it's gonna compete here.
And then it's gonna decrement the number of waiting readers and check. And if the state variables say I can proceed, it'll proceed. If the state variables say no, I can't, like there are waiting writers or there's already an active writer, then I go back to sleep. So that's why I said it's unnecessary context switches, because all the threads that I wake up are eventually gonna get context switched to, and then they're gonna decide, can I proceed? If they can't proceed, they're gonna go back to sleep. And so that was wasted. Ideally I only wanna wake up a thread when I know it can proceed. So if there are no waiting writers, I can wake up all the readers and know the readers can all proceed, as long as a writer doesn't sneak in in the meantime. Yeah, yeah, that's a very good question. So the question is, is broadcast an atomic operation? Yes. Broadcast, signal, wait, these are all atomic operations implemented by the monitor. So they'll atomically take a thread and move it over to the ready queue. And they have to, because there is interaction here, right? Wait is putting a thread onto the wait queue, and broadcast (there's no signal anymore in this version) is pulling threads off. So they absolutely have to do concurrency control. So there's concurrency control going on at many levels here: within each of the monitor primitives, in the lock we're using to gain access, and then within the database itself, where we're only allowing multiple readers in at a time, or a single writer. So this is really an example that kind of pulls it all together, with concurrency control operating at all levels of the system. Okay, any other questions? All right, so we have a deadline this week. Project one, design doc is due Thursday at midnight. Don't be late, there are no slip days for design docs. The TAs have already put up the design reviews.
So sign up for your times, and that's the reason why there are no slip days: the TAs have to read your design doc so they have good questions for you during the review. There'll be more on the reviews in the sections tomorrow and on Wednesday. We have heard back from campus, and they have given us 2060 VLSB for our midterms. We'll split the class in two, so you'll have plenty of space. We'll have two big rooms, this room and 2060 VLSB. Once the add/drop date passes, we'll figure out what the split is gonna be, whose last names will end up in this room and whose will end up in the other room. But both will be during class time, so you'll have to go to the right place. The first one's gonna be in a month, and the second one will be on the last day of classes for the semester. Any questions? All right, we're running a little behind, so we'll take like a four-minute break. Okay, so let's continue. So this is the fun part. That was the super technical, test-your-brain part. For the last part of the class, I wanna give you some tips and tools that you can use for working on a large project. And it's all about synchronization. It kind of fits perfectly with this lecture, and with the fact that you're just starting a large programming project. So a big project is something that requires more than one person, or a long, long time. You could have one person work on an operating system, but unless it's a really small operating system, it's gonna take them many, many years. Big operating systems typically take thousands of person-years. Why is it that Microsoft only releases a new operating system every couple of years, or Apple releases an operating system once a year? Because it's a massive undertaking to implement all that code and get it correct. The other problem is it's very hard to get software teams to work together.
You're gonna see that in this class, although hopefully not in the extreme form of some of the examples that I'm gonna use. So I find this always interesting, that you don't see this in most large construction projects. The Empire State Building was built in the early 1930s, and they did it in a year. Tallest building in the world, built in a year, and they were forging the steel girders for it down in, I think it was like South Carolina or Georgia or someplace like that, sending them up by barge and doing just-in-time assembly of the building. Literally the barges would arrive in Lower Manhattan, they would truck the girders to the site and erect them the same day, because they had no storage space at the port and no storage space at the site. And they had a constant, steady stream of these things arriving. We can't do that with software. Even the New Bay Bridge is kind of an exception. The Old Bay Bridge was again built in the 1930s, and they built it in just, I think it was three years. Two teams working from opposite directions met in the middle on both the suspension bridge and on the truss bridge. If that was a software project, one of the spans would have ended up down by the San Mateo Bridge and the other span would have ended up at Angel Island or something. You laugh, but wait till you hear some of my examples. You won't be laughing; it's your money being spent. Or your kid's money being spent. The Hoover Dam. This was a project that required, I think it was 30 or 40,000 workers. They built an entire town, and again, delivered the dam on budget, on time. It doesn't work that way for software. So with big projects, you can ask, okay, so what is a big project? Well, I would say it's a project where estimating how long it's gonna take is hard. And this is hard in part because as programmers, we're eternal optimists. It'll only take me a couple of days to do that.
Doesn't matter whether I'm writing four lines of code or I'm writing 5,000 lines of code. I can do it, give me 48 hours, I'll have it done, right? Everybody is laughing, but I'll bet you're thinking this for the project. I can start on Tuesday and it'll be done by Thursday, all the code ready to run and getting a perfect score on the final auto grader. But this is why we bug you early, because we know that you're underestimating how long it's gonna take to code correctly. So one question we can ask about a project is, can we break it up into pieces and partition it efficiently? If it's partitionable, then as we add more people, the total time will decrease, right? We add a person, give them a new partition. Add a person, split a partition, one person working on one part of it, the other person working on the other part. This works as long as the partitions are completely isolated, that is, no communication is required. If you require communication, the time will very quickly reach a minimum bound. And if there are complex interactions between those partitions, then the time is actually going to increase. So Brooks coined the mythical person-month problem. There's a book that he wrote about the IBM OS/360 design and how he watched the amazing debacle that it was, taking almost a decade with like 1,000 programmers to deliver this operating system, with like 1,000 bugs when it shipped. And that was known bugs. And the problem is, you estimate how long a project's gonna take, and immediately you start to fall behind. So you add more people. But instead of taking less time, it ends up taking even more time. So the Joint Strike Fighter, the F-35. It's gonna be our premier fighter that's gonna replace all the fighter jets for all the services for like the next, I don't know, 30, 40 years. And the software, of course, for this fighter is falling behind. So what do they do?
They just announced they're gonna hire another 100 programmers to write the 8.5 million lines of code it takes to make this plane fly. And it takes another 10 million lines of code to do maintenance on the plane. I don't know what you do in 10 million lines of code, but it's something important. So one of the first questions that you should ask is, how do you partition a big project? And there are two different ways you can partition it. So in a functional partitioning, you have the first person, say, like in your project one, implement threads, the second person implement semaphores, and the third person implement the locks. The problem, though, with this kind of an approach is there can be lots and lots of communication across these API boundaries, right? So if person B changes the API for semaphores, then the person implementing the threads, person A, may also have to make changes. So first example: a large airline company, and I'll leave it unnamed in this case, spent $200 million on developing a new system for doing reservations and cargo manifests and weights and balances and crew scheduling and all of that great stuff. And they partitioned it into two teams that were working together, and gave each team the functional specification for their part. Each team now goes off and starts implementing. As you're implementing, you realize that you have to do things differently. So you write changes down, right? And they communicated those change orders over to the other side. At the end of the two years, they came together, put the two halves together, and nothing worked. So what went wrong? Well, they didn't look at those change orders that were coming by, because there were so many of them. So they just set up a filter and it goes into a folder and, you know, you don't worry about it. Until the two years are up and suddenly nothing works. And so they figured out, well, how much would it cost to fix it? It was another $200 million. So they scrapped the system.
$200 million, and you end up with nothing. Except a lot of bits that don't work. Joint Strike Fighter, right? So again, this is a project that's going to cost us well in excess of a trillion dollars. The current version is being flown by the Marines down at Eglin Air Force Base, but because they can't get the software all functional, the plane can only fly during daylight in regular weather, so no inclement weather. It doesn't yet have the capability to drop bombs, and it can't engage other fighters. This is not funny. It's a trillion dollars, right? You would think if you were spending that kind of money, you'd actually end up with a plane that could shoot down other planes or something, or like drop bombs, but no. Okay, so the other way you can divide it up is by task. So you can have one person who does the design, another person who writes the code, and a third person who does all the testing. Now, it may be hard to find the right work balance, like how much work is there in testing. Testing can be a lot of work. But you can also match each person's skills. So if you're very theoretically oriented, then maybe you should be doing the design, coming up with the cool algorithms. If you're the uber systems hacker, you're probably the person who should be doing the implementation. If you're the person who has an unusual attention to detail, probably the tester, so you can make sure every code path, every possible input, works correctly. So what companies have found is debugging is very hard. So Microsoft puts two testers, two QA, quality assurance, people on each programmer. Because it's very easy to write code. It's very hard to write correct code, as you'll probably learn in this class. So most of the projects in this class, the teams will end up being functional, but teams have had very good success with a task-based division.
So first poll: how many people are using a functional decomposition for their teams? Okay, and how many people are using a task-based one? So there was a huge gap of people who didn't raise their hands. You might want to make that decision before Thursday. So the question is, debugging is really hard, especially when you're debugging somebody else's code, so wouldn't it make sense to debug your own code rather than have someone go through the pain of debugging your code? It is easier to debug your own code, but it's also gonna be the case that since you wrote it, you're gonna make assumptions. You already made assumptions about the implementation, so you're gonna make assumptions when you're testing it. Someone who comes at it with a fresh set of eyes and a fresh look and a different viewpoint possibly won't make those same assumptions. And so they may uncover flaws in your code. Absolutely, it's much harder to debug someone else's code. It's much harder to extend somebody else's code. If you haven't looked at the nachos code base yet, I suggest you do. You'll find it's really hard to make changes to somebody else's massive piece of code. But we've tried to help you. We've given you lots of documentation. There are lots of comments within the code. There's a walkthrough. There's a bunch of design material about the code. So when you have really good things like that, it helps. If we gave you the, whatever, 40,000 lines of nachos code without any comments or documentation, this would probably be a very different class. There'd probably be a lot fewer people sitting here. But seriously, what we find is that typically people who come from industry and are taking this class tend to use the task-based approach, because that's exactly what industry uses. They play to people's strengths. So think about it. It is a different way to do the project, but it can be a very successful way to do the project. Okay, I gotta move a little faster. So, communication.
Having more people on a project means more communication. Every time you make a change, you have to propagate it to additional people. So think about the code being written by the person who's responsible for the most core part of a system. When they make a change, it's like a broadcast: everybody has to be notified about that change. What happens if you don't? Well, you get miscommunication. One person thinks the index for the data structure starts at zero, and another person thinks it starts at one. And then your rocket crashes. That happens. So you wanna make sure you're communicating. Now, next poll: who makes decisions in a project? In your project, in this class. So you can have individual decisions. Those are very fast. I'm gonna do it this way. You probably don't have an argument with yourself, you just do it. So it's a very fast decision, but it can lead you down the wrong road and cause problems. You can have group decisions. Now you have to reach consensus. In a group of five with a binary decision, you can have two and three, majority rules. What happens if you have a group of four, and two people wanna do it one way and two people wanna do it the other way? Which way do you do it? I see lots of people laughing. I've seen groups do it both ways and then fight, and they come into my office, and it's not a good situation. Or you can delegate and have a leader who makes a centralized decision. This is your system architect. So, I'm gonna take another poll. How many people are using the individual decision model? How many people are using the group decision model? Well, I hope you're not in groups of four. And how many people are using the centralized model? You know, if you have a group of four, there are four project phases. You could delegate someone to be the system architect for each project and make decisions. But they better be very clueful. They better know what's going on, because they're making decisions. Don't laugh.
You'd be amazed at the number of companies that end up with system architects who aren't very clueful. They better have good people skills, because some people are gonna have their feelings hurt when they're told you can't do it that way, you have to do it a different way. And finally, they have to be able to delegate properly. All right, coordination. So, more people means not all people can make the meetings. And if you don't make the meetings, you're gonna miss decisions. More importantly, you're gonna miss the discussion associated with those decisions. And that's very important. I've had project groups many times where this has happened: someone misses the meeting where the group made some critical decision to go a different way. And because they missed the meeting, they went and implemented it their way, the way the group had originally decided on. And then they get really upset when they're asked to do it the other way, because they already spent a lot of time doing it that way. So, communication is really critical. This is the reason why we limit groups to five people. It's hard enough getting a five-person group in the same section for a one-hour meeting once a week, and because you're working on a project, you have to be meeting on a regular basis, multiple times during the week and on the weekend. Why do we set four as a floor? Because we want you to actually have the experience of having to work in a situation where communication is gonna be a challenge and coordination is gonna be a challenge. So, I think it's actually an interesting trade-off, because if you're in a five-person group, you're doing one fifth of the work. If you're in a four-person group, you're doing 25% of the work rather than 20% of the work. But the trade-off is, in the four-person group, there's less communication overhead. So, even though you're doing 25% of the work, you're gonna spend a lot less time communicating.
In the five-person group, you're doing less work, but a lot more on the communication side. Yeah, question? So, the question is, do we have some law of large numbers that says industry groups have to be odd numbers so you can't have split decisions? Typically, in industry, you have a system architect. So, you delegate somebody, maybe they have a master's or something like that, who has the big picture, and they're responsible for making those hard decisions. Okay, so one of the challenges you're gonna find in your group here, and in industry, is people have different work styles. Some people have kids, and so they're up really early in the morning. Some people don't, and so they're up all night. When do you pick to meet? I had an undergraduate from this class who was doing some research with me, and I think he woke up sometime around 5 p.m. He was actually the department's citation winner. He had like 15 A-pluses. But our meetings always had to start at like 5:30, 6 p.m., because that's when he was awake. He was asleep by the time I came in in the morning. So how you decide when and where to meet is important. What about your project? When it slips, what do you do? I mean, I can guarantee that your project is going to slip. You wanna make sure you're prepared for it. This is again where communication comes into play. If you're communicating, then you're gonna know when someone in your project is having problems and falling behind, and you're going to be able to help them. But if you wait until 11:50 to find out, then it's probably gonna be too late. Finally, it's very hard to add people to an existing group. The group has already figured out how they're gonna work together and all of the details, so adding someone in just leads to chaos. But unfortunately, in this class, even we sometimes have to do it mid-semester. Okay, so how do you make it work? First, you realize people make mistakes. They're human, so you just deal with it. So adapt.
Again, anticipate problems rather than trying to deal with the aftermath. So one way to do this is by documenting. Document at the beginning, document in the middle, document at the end. Why do you document? Because you wanna expose these decisions and make sure everybody is aware of what's going on. You wanna make it easy to spot mistakes early, and you wanna make it easy to estimate progress. What do you document? Well, document everything. But don't document too much. So it's kind of a finely balanced trade-off here. If you document too much, people just set up filters and never read it. Document too little, and people miss important decisions. Standardize: have a single programming format and structure. All the major companies do this. It simplifies creating tools, and it makes it easier for everybody to pick up somebody else's code and understand the structure and organization of that code. So there are lots of documents to maintain. In this class, we'll give you the objectives, but in an ordinary industry setting, you have to come up with your own set of specifications and goals. Your specifications also have to include performance specifications, especially in the real world, not so much in this class. This document is the first one you create. It's also the last one you create, because it's a living document. This is why we have you turn in an initial design doc, then we help you fix all your mistakes, and you turn in a final design doc that hopefully has fixed those mistakes. Document your meetings. In group meetings that I have, we all have a Google Docs window open and we document what's going on in the meeting. So if someone misses the meeting, they can quickly come up to speed on what we talked about and the decisions. And oftentimes those decisions, and the thought process behind them, can go into your design document. Create a schedule. How many people have created a schedule for project one? Those are the ones who are gonna get the A's in this class.
You wanna have a schedule so you know when you're behind. How do you know if you're behind if you don't have a schedule? Oh, because the deadline hasn't passed yet? That's probably not a good way to create your schedule. And then finally, again, in industry you'll have an organization chart, but you don't need that for this project. Okay, lots of software tools. Everybody now uses software revision control tools. It's a way of communicating what's going on. It's a way of being able to go back to a known good state when it's three o'clock in the morning and you don't remember what you did for the last hour because you had too much caffeine and it's not working. Roll back, and then you can see what the difference is. Use an IDE, so hopefully everybody is using Eclipse. And use automated testing tools. So JUnit is a great testing tool. We're gonna have everyone use this this semester. We use it for the last two projects for our auto grader. It makes life a lot easier because it makes it very easy to test continuously, not just at the last moment. And finally, make sure you keep a history trail of communication. I mean, instant messaging is great, but if there's no history trail, you can forget what was decided. All right, so, last slide. Test continuously. In industry you'll write dummy stubs so that you can do your integration tests on a continual basis. In this class, schedule periodic times before the deadline, not just at 11 o'clock the night it's due, to have people come together and run an integration test and see what works and what doesn't work. It'll also tell you how far behind people are. There are lots of different types of testing, and the TAs are gonna talk about this, but you should be doing random testing, fuzz testing, black box testing, gray box testing, white box testing. Our auto grader just does black box testing to the spec, and remarkably, we catch most of the bugs that are in your code.
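JUnit is the Java tool you'll use with nachos; as a language-neutral illustration of the same continuous-testing idea, here's a minimal sketch using Python's built-in unittest. The function under test, acquire_order, is a made-up example, not part of any project code:

```python
import unittest

def acquire_order(*lock_names):
    # Hypothetical helper: return lock names in a canonical acquisition
    # order (always grabbing locks in sorted order is one simple
    # deadlock-avoidance discipline).
    return sorted(lock_names)

class TestAcquireOrder(unittest.TestCase):
    def test_order_is_consistent(self):
        # the same set of locks must always come back in the same order
        self.assertEqual(acquire_order("b", "a"), acquire_order("a", "b"))

    def test_sorted_order(self):
        self.assertEqual(acquire_order("c", "a", "b"), ["a", "b", "c"])

# run the suite every time the file executes, e.g. from a commit hook
if __name__ == "__main__":
    unittest.main(exit=False, argv=["tests"])
```

Wired into a commit hook or a nightly job, a suite like this runs after every change instead of the night the project is due, which is exactly the continuous testing being advocated here.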
So you don't necessarily have to do white box testing, but it can help catch lots of bugs. Finally, again, if you automate the testing, you can do it on a continual basis. It's very easy to introduce a bug even if you just change one line of code, as we saw with some of the examples. So if you have continuous testing, you'll be able to catch those kinds of situations. So, conclusion slide. We talked today about monitors and condition variables. We're moving up the stack in terms of having higher-level constructs that are much easier to use correctly. It's harder to make mistakes, and that's a good thing. And for the project, it's very important to follow a very good process. So document everything, use as many software tools as you can, and, I can't say it enough, run tests continuously. All right, see you on Wednesday. Thank you.