Okay, welcome back everybody to 162. Today we're gonna pick up where we left off. We were talking about synchronization and I didn't quite get to semaphores last lecture. I did record a little supplemental lecture for those of you that wanted something for your project spec. But anyway, we're gonna pick up where we left off and then dive into some actual details about synchronization implementation. So if you remember, last time we started talking about how it is that multiple threads of control get implemented inside of a kernel, and we used this abstract stack model. We said, suppose that we've got two threads S and T and they're both running this code where they start with A, and then A calls B, and then B just keeps yielding over and over again. What we saw was that's gonna have a stack that looks somewhat like this, where thread S calls A, which then calls B, which then calls yield. The blue is the user code, and then it dives into the kernel because yield is a system call. At that point we dive in to execute run_new_thread and switch. And switch, as we talked about, saves out all of thread S's registers and then switches S's stack to T's stack, at which point switch returns. But now we have a situation where, even though we called switch on S's stack, because we changed the stacks we're returning on T's stack. And so at that point the return from switch actually takes us to the instance of run_new_thread that T had done originally, which will then return across to user space, restoring the user stack, giving us yield, which will then go back to the while loop, which will then call yield, which will call run_new_thread, which will call switch, which will save out T's registers, switch the stacks, and then we'll return back up again, and we'll go back and forth. And as a result, we'll end up multiplexing S and T forever here.
But the key interesting thing about this was this idea that the switch routine really is just saving all the registers, including the stack pointer; returning from switch then returns on a different stack, which basically keeps executing T at that point. The other thing that this particular code was intended to show is this notion that a user stack and a kernel stack are associated together. This kernel stack is typically called a kernel thread, oftentimes, because when we're running inside the kernel we're running on that stack, and that's a thread uniquely associated with the user thread. Were there any questions on this? This particular diagram takes a little getting used to, because it's this idea that just by changing the stacks and returning from switch, we're back executing a different thread. What's interesting also about this is that the combination of the user stack, the kernel stack, and the associated registers basically defines everything about a thread. So you can put it in the background; you can put it on whatever queues you want. I noticed somebody was asking about my video background. I forgot to turn that back on, sorry. So the program counter is shared between threads because there is only one hardware program counter; we're assuming that there's only one processor. Threads S and T, however, each have their own saved program counter, so I'm not sure which you're asking about. There's only one hardware program counter because there's only one processor. But what we do when we save thread S is we save out all its registers, which also includes the program counter inside switch. And then when we restore on the other side and return from switch, we're now running in thread T. So what does restore mean? Restore means swap in the program counter for thread T.
So there's only one physical program counter, but we have two virtual program counters because we have two threads, okay? And all the registers are saved, yes, okay. All right. Now, the other thing I showed you here is this idea of using a timer interrupt to return control. This is a solution to our dispatcher problem, which is: what happens if one of these threads goes into an infinite loop and never calls yield? That's a problem. So in that instance, we need to have something happen, and as we talked about, there are many options, one of which is an interrupt. So we showed how, even if this blue routine is busy in an infinite loop, the interrupt comes along, the interrupt takes us onto the stack inside the kernel, and at that point we can just run_new_thread and switch, and we'll get exactly the same switching as we did back here. We could have a situation where T is doing an infinite loop and not yielding, but the interrupt will force us into run_new_thread and switch, and then we'll switch over to S. So we'll at least get fairness, in that thread S and the now broken thread T both get a fair share of the processor. Okay, good. So the timer interrupt routine looks something like this: it maybe does various things involved in servicing the interrupt and then calls run_new_thread. Okay. So we talked about that last time. I wanted to give you a little bit more interesting information about this. So the interesting question here is: does a kernel thread also receive timer interrupts? The kernel thread is probably not gonna receive another timer interrupt, because if we got a timer interrupt in the middle of a timer interrupt, we'd get a recursive problem that would mess all the registers up and so on. So when you take an interrupt, as I showed last time, you can take a look at last lecture.
I talked about the interrupt controller, and what the interrupt controller does is, as soon as you take an interrupt, it disables everything, and then the kernel, as part of entering the interrupt routine, is going to disable timer interrupts before any new interrupts are re-enabled, so you won't get a recursive timer interrupt inside of a timer interrupt. Now, it could be that there are other interrupts that are very important that we leave enabled, so it's possible while you're servicing one interrupt to get a higher priority one. That does happen, but you're not gonna get a timer interrupt inside of a timer interrupt. Okay, now what I wanted to show you is a little bit of the instantiation of this. So the x86, which is what you're gonna be running Pintos on, has a couple of things that make it work. You can decide whether you think this is easier or harder; it's a different processor than, say, the RISC-V that you dealt with in 61C. Among other things, there is this task state segment (TSS) format, and a lot of operating systems like Pintos and Linux only have one TSS at any given time. The way the x86 was designed, every task or thread would actually have its own TSS. That turns out to get too messy and it's not very portable, so instead there's typically only one TSS. But what's important in the TSS, if you look here, is the fact that there are stacks for privilege levels zero, one and two. And if you remember, there are four privilege levels on the x86, but we only use zero for the kernel and three for the user. What's important here is this privilege level zero stack.
And so whenever you make a system call or take an interrupt, which takes you from u to k, that means user to kernel, what happens among other things is, yes, the privilege level goes from three to zero, but in addition to saving out things like our current instruction pointer (that's our program counter), the flags, and the current stack pointer we're running with, we also pull in the kernel stack. So if you notice, in hardware, the act of either getting an interrupt or a system call actually takes the privilege level zero stack from the TSS and loads it into the processor. So just the mere act in the x86 of going from user to kernel will actually switch that stack out for you. So this boundary, which I'm showing you here with blue bounding into red, actually has hardware support in the x86. In other processors, the first thing this interrupt routine would have to do on entering the kernel is switch those stacks, but as you'll see if you take a look in Pintos, do I show you down here, tss.c and intr-stubs.S, you'll see that the TSS is being supported and that the stack is switched automatically in hardware. And the other thing is, once these things are saved, then the handler saves all the other registers and so on, then the kernel goes ahead and does its work, and on return all of that's undone. So among other things, the user stack pointer is actually pushed on the kernel stack, and that gets popped off as part of returning to user level. So if you take a look here, for instance, this is just showing you a diagram, we'll do some more details about this later, but this is roughly what's happening not just in Pintos but also in Linux and some of the others. When you're busy running user code here, what you see is that the code segment and instruction pointer, this is the program counter we've been calling it, is pointing somewhere in the user code space, and the stack pointer is pointing somewhere in the stack space of the user, okay?
And as soon as that interrupt occurs, what happens is, as I mentioned, automatically in hardware, you see that the stack pointer here got pointed into the kernel, that's what SS:ESP is, and the instruction pointer, or program counter, gets pointed into kernel code. So that all happens in the transition into the kernel. Now, the question here on the chat about whether we'll learn how to write interrupt handlers in assembly or whether that's something we need to learn by ourselves: it's gonna be some combination of learning it by yourselves. We're gonna tell you about this at a high level in class, but you're gonna have to read that .S file, for instance, to see a little bit more about what's going on. I will have more to say about entry into the interrupt handlers in a couple of lectures, maybe even next time. But if you notice, just the mere act of the system call or interrupt on the x86 changes the stacks. So these registers here that you see are actually processor registers, and changing into kernel mode has automatically switched those for us. What I'm showing you here in blue is that there are some additional registers that are part of the user code, and we need to save those too before the kernel starts running, because otherwise we'll mess them up. So even though the hardware automatically pushes the old CS:EIP and SS:ESP onto the kernel stack, the kernel itself, the code that we've now entered, is gonna be responsible for pushing the rest of those user registers onto the kernel stack before it starts doing computation and messing those registers up, and that's what's being shown in red here. Now, this page table pointer, notice, isn't changing, because right now we've just had a system call or interrupt; we're not changing anything about the current process.
And so that's just gonna stay the same throughout, but now we're in the kernel, we've pushed all of the user's information onto the kernel stack, and then we can just be executing kernel code and calling functions and so on, because we have a kernel stack which is safe. Okay, this PTBR, again, is the page table base register, and that's basically pointing at the address space. Now, once we're done inside the kernel, let's suppose this was just a system call, or it was an actual interrupt that wasn't gonna switch processes, then we're ready to resume. At that point what we do is we just reverse the process: we restore the registers that were saved there in software, we pop them off the stack and put them back where they were. And then the last thing is gonna be this interrupt return instruction, which is gonna automatically restore the user's program counter and stack pointer and then continue executing from where we left off. So what I've shown you here is the simple example of how either a system call or an interrupt that doesn't change the process would happen, and you can see that we switch automatically to the kernel stack, that's what this is down here, and then we switch back to the user's stack. Just to give you a little bit of a difference, if you're interested in the scheduling portion, you can take a look at switch.S, and by the way, .S with a capital S tends to mean assembly. If you take a look at that code in Pintos, what you'll see is, once we get to this point here where we've saved everything of the user code on the kernel stack, we can now do something interesting, okay? If we schedule a new task, because we're gonna switch from S to T as in our previous example, what happens now is we swap in a new page table base register, because it's a completely new process, and let's say we've got new user registers.
Okay, I'm showing you this just before the end: when we do the interrupt return instruction, we're now gonna end up returning to a different process. And so we had the blue process and the green process, and we're switching back and forth. Okay, the question about whether this is analogous to how a call instruction automatically pushes things onto the stack: yes, there are certain things that are automatically pushed and certain things that have to be done manually, and if you think about it a little bit, the ones that are pushed automatically are kind of the minimum required to maintain the correctness of the kernel stack and save those user registers that are gonna get lost if they're not saved right away, and then it's up to the software to decide what else to save and restore. Okay, all right. Now, for now, until we get into something else, the question here about how the page table base register knows where to look in the kernel code: I'm going to basically take a page from pre-2018 and say that the user's memory space actually has the kernel in it, as the upper part of the memory space, and therefore it's just a matter of, when you switch into the kernel, you now have permission to use those page table entries. As you're well aware, after we had Meltdown, kernels had to get a lot more careful about that. Okay, so let's just say for now that the kernel has access to its space just by switching into kernel mode.
Okay, all right. Now, the other thing that we did last time was we talked about using locks to fix this banking problem. If you notice, we had a problem that account modifications weren't atomic, and so we put locks around them, giving us a critical section. I had a little animation that was a little bit rushed toward the end of lecture six, so I wanna make sure we get it in here; I also cover it in the supplemental. If you look at the critical section, what these locks mean is, even when we have a bunch of threads all contending for that critical section, only one of them is allowed in at a time, and that's because the lock acquire lets the first one through and the rest are put to sleep, and when you release, that'll wake up one of the waiting threads. So when A finishes and goes through release, then B is allowed in, then C, et cetera, okay. All right, and you gotta use the same lock with all the methods. So we have to do lock acquire and release for all of the things dealing with an account, because the account is the thing we're protecting with the critical section; that includes deposit, withdrawal, and all the other things you might do with an account, and they all have to be protected with the same lock. And that leads to this example I also gave you last time: if we had a red-black tree, we could have a single lock at the root, and then all the operations that threads A and B might do would acquire the root lock, do modifications to the tree, and release it. As long as we do it this way, we know we're correct, because the tree maintains its consistency: we lock it before we do any modifications, and therefore only one thread's allowed to be in it, okay.
Now the kernel and threads can go back and forth exchanging information through this tree without worrying about the tree becoming incorrect because of race conditions, and there are ways of making it faster by putting more locks in the middle of the tree, but that gets complicated. So, a question here about whether kernel thread registers are saved somewhere when the program is running user code, and the answer is they don't need to be, okay? The reason is, if you think about this example, we use the kernel kind of like it's a procedure call, right? We call into the kernel with a syscall, or an interrupt calls into the interrupt handler, and so the registers that are needed are created on the fly by entering, and they're done with when you exit. Any state that needs to be maintained longer than that is gonna be kept in global kernel state, an example being a red-black tree for the scheduler or what have you. But we don't save the kernel's registers when we go back to user mode, because that lower half of the kernel, the part which we call the kernel thread, is really only there for when the user is not running, and we sort of generate all of the things we need there on entry into the kernel. All right, now, the thing that I didn't get to at the end of last time, which I really wanna make sure we talk about, I realize I'm taking a long time on synchronization, but it's the hardest thing, I would say, that you learn in this class. So the definition of what we're talking about here is a bounded buffer, where we have multiple producers and multiple consumers; the producers put stuff in the buffer and the consumers take things out, and it's a finite buffer. And so we need some synchronization, first of all, to coordinate the buffer, because we don't want the buffer to get screwed up.
All right, and the second thing is we have to somehow handle the situation where the buffer is full and a producer comes along: we need to be able to put that producer to sleep and wake it up later, and when a consumer comes along and the buffer is empty, we also need to put it to sleep. So in addition to keeping the buffer consistent, which is similar to the question about the red-black tree on the previous slide, we gotta do something else with the producers and consumers to put them to sleep. Okay, and that's essentially what I said here, and we don't want the producers and the consumers to have to run in lockstep. We'd like fully asynchronous behavior: producers can arrive at any time, consumers can arrive at any time. All right, and I gave an example of GCC here, where the pipes, which you guys are all familiar with now that you're working on the shell homework, are one example: each one of these pipe symbols represents a finite buffer, okay? And another is a Coke machine, which is my favorite example, because it has a finite number of slots in it, and when the Coke delivery guy shows up, if the machine's full, he can't put any more Coke in there. So what happens? Well, we put him to sleep. Maybe that isn't quite the way it happens in real life, but that's the analogy. And then students come along to buy Coke, and if there isn't any Coke, what do you do? You fall asleep in front of the machine, because I know you all desperately want your caffeine. So in this example, multiple producers might come along, there's a finite number of slots in the machine, and multiple consumers might try to pull things out. There could certainly be multiple Cokes in there at any given time, but perhaps it's empty, okay? And obviously there are lots of examples of finite buffers, like web servers and routers and everything. Okay. So yeah, busy waiting is exactly right: that's the equivalent of the guy shaking the machine until the delivery guy shows up, okay?
So that's gonna be considered bad programming style. So here's our basic buffer, which is a structure that can hold, I'm gonna say, any type here, whatever you wanna put into the buffer; these are Coke bottles, or they're structures of type X. And then there's a read index and a write index, and the read and write indices are just integers that wrap around. Clearly you gotta be careful that you don't try to put too much in the buffer, because you'll start overwriting items in the queue, and you don't wanna read too much, because then you'll get the read index in front of the write index and you won't be able to tell when the queue is full anymore. So we need to make sure that the write and read indices are kept consistent, okay? And I'm sure that you've learned about circular buffers in 61B or what have you. What's tricky about what we're gonna do here is we need the ability to have many producers and many consumers and have things just work, okay? And we need to figure out what needs to be atomic. So this was our first cut at this: we're gonna acquire a lock on the buffer for the producer, and as long as the buffer is full, we're gonna spin, and then we're gonna enqueue an item, and then we're gonna release the buffer lock. In this particular implementation, what's good is that we don't have to worry about the queue getting screwed up, because the enqueue is inside the lock: we've acquired the lock, we've done something, we've released it. So that part seems okay, maybe. And for the consumer, similarly, we acquire the buffer lock, and while the buffer is empty, we wait, and then we dequeue an item and release, and once again, because we've acquired the lock before we dequeue, we know that we're not gonna mess up the implementation of the queue, okay? But that's the only good thing about what I've got here, okay?
This is just bad, right? And hopefully you can all see why; let's think about this for a moment. If the producer comes along and acquires the lock and then decides the buffer is full and spins, it's effectively tying up the processor while holding the lock, which means that it's busy waiting for the buffer-full condition to go false. But that can never go false, because the consumer comes along, tries to acquire the lock, and goes to sleep because the lock's taken, okay? So this is an unresolvable situation, okay? So this is a bad implementation. And so the second cut at this was simply: maybe, if the buffer is full, we release the lock and then reacquire it, and just keep doing that over and over again until the buffer is not full, and then we enqueue and release. And we know, for instance, because of the way this is laid out, that we reacquire at the bottom of the loop, so when we check whether the buffer is full, we hold the lock, and when we enqueue the item, we hold the lock, okay? Yeah, that first version is the case where the delivery man's blocking the machine and none of the students get their caffeine, right? And the consumer is the flip side of this. Believe it or not, this kind of works. I mean, it does work, but it's horrible, right? This is not a good use of time either; it's only a little better than the previous one, in that it doesn't deadlock and will eventually make progress. But if you have a producer that shows up and the buffer is full, and there are no consumers, the producer is just gonna go unlock, lock, unlock, lock, over and over and over again, wasting processor cycles, okay? So this is what we typically call busy waiting, and busy waiting is a way to lose points on an exam, or certainly in an implementation, because you're wasting cycles doing nothing, okay? So we wanna do better than that. And that led us basically to higher level primitives.
So what's the right abstraction for synchronizing? Lock and unlock are good, but they aren't quite enough. Now, there's an interesting question here: couldn't you just poll? And the answer is, polling isn't really helping you here, because the assumption is the producer can't do anything but deliver bottles, okay? If the producer could somehow go away and check again later, maybe that would be okay, all right? But in this particular situation, the producer is trying to produce and it doesn't have anything else to do. So good primitives and practices are important. What we're gonna do in this class is start talking about other primitives beyond locking, okay? But first, today, we're actually gonna implement locks to get us there. Synchronization is a way of coordinating multiple concurrent activities, and as you're hopefully getting the flavor already, if you do it incorrectly, weird things happen, and the weird things happen at the least expected moment, okay? That's the Murphy's Law scheduler, which you have to remember is always present, or the malicious scheduler: it will mess up your synchronization at the worst possible time and find the most obscure synchronization condition, all right? So, semaphores, which I mentioned a while back, were first defined by Dijkstra in the late 60s, and they're the main synchronization primitive in the original Unix. The definition: a semaphore has a non-negative integer value and supports two operations. P, or sometimes down, is an atomic operation that waits for the semaphore to become positive and then decrements it by one, okay? And what this says is twofold. One, the semaphore can never be less than zero. And furthermore, if anybody tries to execute a P operation on a semaphore that's zero, it waits, okay? But this isn't a polling wait; this is, we hope, a good wait, right?
Just like with lock acquire, we've been implying that acquire puts you to sleep or does something better than wasting cycles; this particular wait is gonna be similar, okay? The opposite is V, or up, which is an atomic operation that increments the semaphore by one and wakes up any thread that happens to be sleeping on P. You can think of it as waking only one of the sleepers, because if it wakes them all up, only one of them will actually be able to decrement it from one back to zero. And so basically, atomicity is maintained by both P and V. That atomicity means that when you try to execute a P or down operation, you'll never go below zero, and you'll have to wait if the value is zero. And when you execute an up or V operation, if you went from zero to one while somebody was sleeping, you'll always wake somebody up, okay? And P basically comes from proberen, to test, and V from verhogen, to increment, in Dutch. That's because Dijkstra created them. Okay, so semaphores are just like integers, except, first of all, they're whole numbers, because you can't go below zero, okay? And secondly, the only operations allowed are P and V, or down and up, depending on your implementation, okay? And the question about which of two sleeping threads will be woken by V: it's unspecified. So if your application fundamentally depends on a particular thread, like the first sleeper, being woken up, that's not part of the spec unless it says so, okay? You should always assume it's a non-deterministic choice unless you're told otherwise, okay? So the only operations are P and V, and the operations must be atomic, as we just described, okay? And notice, by the way, that you can't read or write the value except initially, okay? So notice that's part of the interface: you set the integer at the beginning, but you can't read it later. You can only do P and V.
Now, there are those out there who'll say, well, I looked at the POSIX version of semaphores, which I encourage you to do, and what you'll find is that they do give you a way to read the value, but it's technically not part of the interface, okay? So keep that in mind. And here's a railway analogy, okay? We set the semaphore to two, and when the first train comes down here to the track, before it's able to pass the semaphore, it does a P operation, which doesn't put it to sleep because the value was greater than zero. So the value goes to one, that's fine. Here the value goes to zero, that's fine. These trains, by the way, are hanging out and having coffee with their cameras on. And then another train comes along. Now at this point it tries to execute P, and it is put to sleep, because the value is zero, okay? All right, now as soon as one of these trains leaves the yard and executes a V, it's gonna wake up our guy here. So the value will increment to one briefly and then go back to zero, okay? So that's the behavior of semaphores, which you're now well aware of because of your design review. If you look at the two uses of semaphores that I talked about in my supplemental, one is mutual exclusion, otherwise known as a binary semaphore or a mutex. This is just like a lock. This is the case where you do a semaphore P to grab the lock and a semaphore V to release it, and notice that the initial value is one. So if you think about this, exactly one thread is gonna be able to be in the critical section at a time, which is exactly like a lock, okay? Now, you could technically initialize it to one but increase it as much as you want; that's a good point. However, in that case it won't behave properly like a mutex. You might say, well, wait a minute, that's bad. Well, the answer is, you violated your own spec. Synchronization is a contract between you and all the users, and if you're the only user, you've just violated your spec and broken your own code.
So what I'm gonna show you is how to do good synchronization; if you put bugs in your code that break your synchronization, then all bets are off, okay? Now, another use is a scheduling constraint, where we set the initial value to something other than one. Here I'm gonna set it to zero, and what's interesting about this is that it lets you do a join operation on threads, for instance. So if you notice, for thread join, if we set the semaphore to zero, the joining thread, which might be a parent or something, is gonna put itself to sleep, because it's gonna try to do a semaphore P on a zero. It'll go to sleep, but then when the thread finish shows up, it'll do a semaphore V, which will increment the semaphore and let the join go forward, okay? Now, the bounded buffer: we're gonna revisit it, so we need some correctness constraints, okay? Which are, for instance: the consumer must wait for the producer to fill buffers if there aren't any full ones, the producer has to wait for the consumer to empty buffers if all of them are full, and only one thread can manipulate the buffer at a time; that's mutual exclusion. So this last one is basically saying we need a lock in order to keep the queue itself consistent, okay? The other two are correctness constraints about scheduling: one is about the entry to the queue and the other is about the exit from the queue, and we need a constraint on either end, okay? If you think about it, that's gonna be true because these constraints are like half constraints, right? They're about something going below zero, so we need to arrange it so that we can put things to sleep either if the buffer is full or if the buffer is empty. Now, the mutual exclusion is really just about making sure that we keep the queue valid, okay? And a general rule of thumb is to use a separate semaphore for each constraint, so we have a semaphore for how many full slots there are, a semaphore for empty slots, and a semaphore for the mutual exclusion, and that works out this way.
We're gonna start the number of full slots on an empty machine at zero, the number of empty slots on an empty machine at all of them, buf_size, and the mutex at one, because it's like a lock. And so the producer is going to look like this. The first thing we do is say: wait until there's space. Notice that if there are no empty slots, then empty_slots will be zero, and this semaphore P will put us to sleep, okay? Otherwise, if we get through there, we'll have decremented the number of empty slots, because we're about to add a Coke, and then we'll grab the lock, enqueue an item, release the lock, and the final thing we'll do is increment the number of full slots, okay? And then the consumer is exactly the reverse: again, we ask, are there any full slots at all, and if there are no full slots, that means there's no Coke, so you as a student have to go to sleep. Otherwise, if we pass, we've decremented the number of full slots, we grab the lock, we dequeue an item safely because we have the lock, we release the lock, and we increment the number of empty slots because we removed an item, all right? And there are three different things to look at. The critical sections in the middle here are locking, and they're about keeping the queue consistent, so we could even put red-black trees or anything we like in here if we wanted, right? The full-slot increment with semaphore V is about waking up the consumer, and the empty-slot semaphore V is about waking up the producer, okay? Questions? Good. So, why is there this symmetry, where the producer does semaphore P on empty slots and semaphore V on full slots and the consumer does the reverse? The answer is because they're doing symmetrical but opposite things, okay? And is the order of the Ps important? The answer is yes: if we reverse them, like I've shown here, we actually get deadlock, okay?
And if you think that through, that's pretty simple: the producer grabs the lock but then goes to sleep on empty slots, and that means the consumer could never grab the lock to remove something from the queue, and so we're basically stuck, okay? Could you have only one semaphore for both consumers and producers? So that is not gonna work easily, okay? You can think it through; there might be a solution that would do it, but it's gonna be more complicated than this one, okay? Is the order of the V's important? No, it's just gonna affect scheduling a little bit, okay? What if we have two producers or two consumers? The solution will just work, all right? Now, administration: as you know there's a midterm coming up on October 1st, so that's a week from Thursday, so it's getting a little closer. We've talked a lot about this. It's gonna have synchronization on it; there's scheduling on the schedule, but that's not gonna be part of the exam. There's a midterm review next Tuesday that is from seven to nine p.m. apparently. We don't have any more details on it yet, but I'm sure they'll be announced when we have them. Okay, so I wanna just dive ahead now to say, now where are we going with synchronization? We're gonna implement various higher-level synchronization primitives using atomic operations to try to get us toward writing correct code, okay? But we're gonna start with hardware. What can the hardware do to help us build locks, okay? And we're gonna start with talking about loads and stores and then move forward from there. And once we've figured out kind of how to get synchronization out of hardware, then we're gonna build interesting locks and semaphores and monitors and so on, okay? And then finally we'll be able to write good shared programs, okay? So we need to start with this hardware question because right now we've been talking about synchronization and it's been floating in space, literally. I mean, because we have lock acquire and release. Well, how do you do that?
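The deadlock from reversing the P's can even be demonstrated. This is a sketch under my own assumptions (a one-slot buffer and short timeouts to avoid actually hanging the demo); it reuses the same semaphore scheme as above but takes the mutex before waiting for space:

```python
import threading
from collections import deque

BUF_SIZE = 1
buffer = deque()
empty_slots = threading.Semaphore(BUF_SIZE)
mutex = threading.Semaphore(1)

def bad_producer(item):
    mutex.acquire()          # WRONG order: grab the lock first...
    empty_slots.acquire()    # ...then sleep here, still holding the lock
    buffer.append(item)
    mutex.release()

bad_producer("coke 1")       # fills the single slot; no problem yet

# A second producer now sleeps on empty_slots while HOLDING the mutex:
t = threading.Thread(target=bad_producer, args=("coke 2",), daemon=True)
t.start()
t.join(timeout=0.2)

# A consumer would need the mutex to drain the buffer, but can never get it:
deadlocked = not mutex.acquire(timeout=0.2)
print(deadlocked)
```

The sleeping producer holds the lock the consumer needs, and the consumer holds the V the producer needs: a classic circular wait.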
We've got semaphore V and P, okay, how do you do that? You know, yeah, you could use a library, but let's be a little more sophisticated and dive into how this is actually implemented. So our motivating example here is gonna be the too much milk example, which is kind of fun. So a great thing about operating systems is the analogy between problems in the OS and real life is often very good, and it'll help you understand things a little bit. The downside is that people are much smarter than computers, or computers are much stupider than people, and so you need to be careful, okay? So the example here is you're living together with other students and you have a shared refrigerator, okay? And the first person gets home and you look in the fridge and you're out of milk, okay? And so what happens? Well, because you have a good contract with your roommates, you leave for the store to get milk, okay? And you arrive at the store at 3:10, but meanwhile your other roommate comes home and they look in the fridge and they're out of milk. While you're buying milk at 3:15, they're leaving for the store, and we'll assume that you guys are going in opposite directions, so you won't run into each other. The first person gets home at 3:20 and puts the milk away; person B gets to the store at 3:20, they buy milk, they arrive home, put the milk away, and now you have too much milk, okay? So this is a disaster of epic proportions, of course, and so the question is what can we do to make this work? Now this is a pretty simple problem, okay? And the idea of leaving notes sounds like it might be a good idea; putting your roommate to sleep, perhaps that's a good idea, you know? I don't know about you guys, but some of you might actually have roommates that are 180 degrees out of phase with you as far as sleeping schedule, but the question is can you have too much milk, I guess, okay? And so to start thinking about this, remember we've been talking about locks, right?
A lock is basically preventing somebody from doing something, okay? And you lock before entering the critical section, you unlock after leaving, and you wait if locked, okay? And remember the most important idea behind synchronization is that all synchronization problems are solved by waiting in one form or another. The trick is to wait as little as possible, or if you're forced to wait for a longer period of time, don't waste cycles, basically let somebody else run, okay? But it's all about waiting cleverly, okay? And so for example, we could fix this milk problem by putting a key on the refrigerator: you lock it, you take the key and you go buy milk, okay? Now I don't know about you, but I suspect this fixes too much, right? Because if your roommate only wants orange juice, then that's a problem, okay? So of course we don't know how to make a lock yet, so let's see if we can start answering this question. And what are our correctness properties here? We need to be very careful about the correctness of concurrent programs since they're non-deterministic, okay? And so the impulse is to start coding first, and then when it doesn't work, you either pull your hair out, and you can see how well that worked for me, or you can try to come up with an actual set of correctness constraints first, right? And I highly encourage you guys to do that, okay? Think first, code later; always write down the behavior. So what are the correctness properties for the too much milk problem? Never more than one person buys, and somebody buys if needed, okay? All right, and I will say by the way that hair is far overrated. So the first attempt is gonna be restricting ourselves to only using atomic load and store operations as building blocks. So let's assume the only things that we've got that are atomic to start with are loads and stores. And just remember what that means: when you go to do a load, all of the bits load from memory at once.
You don't get some of the bits, okay? And store, all of your bits get stored at once, okay? All right, so can we do something with that? So here's our first solution to the too much milk problem, and yes, indeed, all of those who said let's use a note, that sounds like a good idea. So we're gonna leave a note before buying. It's kind of like a lock, right? And we're gonna remove the note after buying. It's kind of like unlocking. And if there's a note, you don't buy, you wait, okay? So this sounds great. The problem is that if a computer tries this, perhaps this is not gonna work so well. So here's our code, right? If no milk: if no note: leave the note, buy milk, remove the note, okay? So this looks like a first solution, okay? Let's look a little more carefully at this, unfortunately. So we have threads A and B. So thread A says "if no milk", but then thread B gets to run, because remember the Murphy's Law scheduler. B says "if no milk", "if no note", and then the scheduler comes back and A says "if no note", at which point thread A leaves the note, goes to buy milk, and removes the note. And meanwhile, thread B has already gotten past the "if no note", and now it leaves a note, buys milk, and removes the note. And if we were to just pretend to be computers, then we didn't solve any problem here, okay? So the key thing here is that you gotta think like computers rather than like people, okay? Yeah, TLDR: don't be a computer. Unfortunately, you're gonna be designing code that is running on a computer. So let's see if we can figure this out. And the result is really that there's too much milk, but only occasionally. So what we've done is we have taken what was almost guaranteed to be broken and we've made it less broken, okay? But less broken is kind of like, you know, less disaster from, you know, a nuclear explosion. You know, maybe it doesn't happen as frequently, but when it does, it's bad, okay?
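The bad interleaving just walked through can be replayed deterministically. This is a sketch in Python standing in for the lecture's pseudocode, with the context switches written out by hand as sequential steps (the variable names are mine):

```python
milk = 0
note = False

# Each roommate runs: if no milk: if no note: leave note; buy; remove note.
# Replay the Murphy's-Law interleaving step by step:
a_no_milk = (milk == 0)   # A: "if no milk" ... context switch!
b_no_milk = (milk == 0)   # B: "if no milk"
b_no_note = (not note)    # B: "if no note" ... switch back!
a_no_note = (not note)    # A: "if no note" -- B's note isn't up yet

if a_no_milk and a_no_note:
    note = True; milk += 1; note = False   # A buys
if b_no_milk and b_no_note:
    note = True; milk += 1; note = False   # B buys: it already passed its checks

print(milk)
```

Both threads passed their checks before either left a note, so both buy: too much milk.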
And so, you know, synchronization problems that happen less frequently are far worse than ones that happen frequently, because with the frequent ones, at least you might have a chance of finding out what's going on, okay? So does everybody see why this only happens occasionally? Because this does mostly work, okay? It mostly takes care of our problem, because you mostly won't get a switch right at the wrong point here. And so the note will mostly do the right thing, okay? Yeah, too much milk is the nuclear option here, right? So the thread gets switched after checking the milk and the note, but before buying the milk; that's unlikely, okay? But it's still not good, okay? So the problem is worse now because it's failing intermittently. And you're at the beginnings of the joys of multi-threaded computation. And this is gonna be great, but you gotta learn how to synchronize. And by the way, okay, maybe you can never have too much milk. Maybe I'm wrong and you're the person who loves two gallons of milk. But if you ended up with four gallons instead of two, that might be a problem. So what can we do? So I saw somebody in the chat maybe suggest two notes, right? One for person A, one for person B. But before we try that, what if we try something else? What if we set the note first? Okay, so leave the note. Then we say: if no milk: if no note: buy milk; then remove the note. Does this work? What do you think? Does this work? Well, there's only one note. Yeah. So what happens here? Well, with a human, probably nothing bad. With a computer, nobody ever buys milk, right? Because what happened is we left a note, we checked to see if there's no milk, and then we say, well, if there isn't any note, go buy milk, but there is a note, right? We left it ourselves. And then we remove the note. So this solution, one and a half, is not any better. In fact, now there's no milk, okay? So that's worse, I would say. So let's try our second solution, which is two notes.
So thread A leaves note A, and thread B leaves note B, and thread A says if there's no note from B, then we go off and buy milk and remove our note. And thread B says if there's no note from A, we go off and buy milk and remove our note. Now what? Does this work? Okay, yeah, good. They could each leave their note just before checking for the notes, right? So it's possible for neither thread to buy milk, right? And so context switches at exactly the wrong time, remember the Murphy's Law scheduler, lead each thread to think the other one's doing it, okay? So this is really insidious, right? 'Cause this would happen, but at the worst possible time, and there was a time in the early days of UNIX where there were various problems that could only be solved by rebooting, or that would occasionally cause a crash maybe once a week, okay? And that's an issue, okay? So I'm experiencing one of those with a new network switch that I just purchased. It's got a memory leak and sort of eventually crashes every eight days, which is a little bit annoying. That's a very rare synchronization problem of some sort, right? Yes, and so you could say this is also similar to what happens with humans, but this is the "I'm not getting milk because you're getting milk" situation. And this is actually a type of starvation, amusingly enough. So this isn't helping us. How about this one? Now I'm gonna leave this up for a second for you guys to really digest, right? So we still have two notes, okay? And milk is better for you than water, by the way, unless you're lactose intolerant. So thread A leaves a note, its note. And then it says, while note B is there, do nothing. Notice this is a spin. And then if there's no milk, buy milk, and then remove note A. And then thread B does not do a parallel thing. It does something slightly different, right? It leaves a note with its name on it.
And then it says, if there's no note A, go buy milk, and then remove its note. So A and B have different code, okay? So does this work? So I'm gonna tell you yes, but what do you think? So both thread A and thread B can guarantee that either it's safe to buy or the other will buy, and it's okay to quit looking, okay? So for instance, at X here, and that's what this slash-slash X means, if there's no note B, we know for a fact that it's safe for A to buy, because A's already left a note. So if there is no note B, then there's no way for thread B to leave a note and not notice A's note, okay? As far as Y is concerned, if there's no note A, okay, then we know for a fact that, because we've left note B, A will either have not been in this code at all or it will be spinning while we're off buying milk. And so it will not try to look for milk until after we come back and remove the note. So it works. How many of you feel fulfilled by this solution? Hopefully not too many of you. Maybe this will convince you to give up milk, I don't know. Let's take a look at this, though. For instance, say leave note A happens before "if no note A", in which case we can busy wait here, okay? We wait for note B to be removed, at which point we can now check the milk, and we'll know that there's no way for B to be in the "if no milk, buy milk" at the same time A is. And vice versa, if we leave note B and we say "if no note A", as long as the "if no note A" happens before note A is left, then we know for a fact that we can go in and buy milk, and A will be caught up in this while loop while we go buy the milk, and then when A finally gets around to looking, there's already milk, okay? So what do you think? I mean, you could write code like this, okay? And in fact, this generalizes to N threads. So for those of you that are living in sororities or fraternities, you're okay, because we can handle N people. There's even a paper on this, a solution to Dijkstra's concurrent programming problem.
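Solution three can be sketched with two real threads. A caveat up front: this sketch leans on CPython's global interpreter lock to make the individual loads and stores of the flags effectively atomic and sequentially consistent, which stands in for the "atomic load/store" assumption in the lecture; it is an illustration, not production code. The `X` and `Y` comments mark the points discussed above:

```python
import threading

milk = 0
note_a = False
note_b = False

def thread_a():
    global milk, note_a
    note_a = True
    while note_b:            # X: busy wait (spin) while B is active
        pass
    if milk == 0:            # safe: B is either finished or never started
        milk += 1
    note_a = False

def thread_b():
    global milk, note_b
    note_b = True
    if not note_a:           # Y: only buy if A hasn't even left its note
        if milk == 0:
            milk += 1
    note_b = False

ta = threading.Thread(target=thread_a)
tb = threading.Thread(target=thread_b)
ta.start(); tb.start()
ta.join(); tb.join()
print(milk)
```

Whichever way the scheduler interleaves these, exactly one thread buys: B only buys when A hasn't started, and in that case A spins until B is done and then sees the milk.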
By the way, Leslie Lamport has written some of the most interesting theory papers that you'll run into. We'll talk about a couple of them at the end of the term, and if you take 262 with me when I'm teaching it you'll learn about several of them. But yes, this generalizes, okay? So our solution protects a single critical section of a piece of code, which is the "if no milk, buy milk", great. But isn't that the way we were thinking about locks before, when we were thinking higher level? So we had to go to a lot of work to get the locks working. And so the question might be, well, wait a minute. So does that mean, Kubi, you're saying, Professor Kubi, that somehow all of this stuff is an acquire and this is a release, and that's an acquire and that's a release, except it's not the same for threads A and B? So that doesn't sound like this is a good solution, okay? Why don't you hold off on implementing this at your sorority until we give a better solution? So it looks like a lock, but it's not very easy, right? And solution three works, but it's very unsatisfactory. It's really complicated. A's code is different from B's, which would be different from C's, D's, E's, F's and G's. And even worse, or not, maybe it depends on what you think is worse: A is waiting by spinning, right? So we've got that thing I told you you're not allowed to do. I'll show you right here: busy waiting, okay? So A doesn't go to sleep when it's waiting. It's busy waiting. So this is not a good solution either, okay? It's wasting time. Spinning is another word that's used for that, okay? So that's not good either. So there's gotta be a better way, okay? And first of all, we have to expand our set of primitives from just loads and stores to something else, okay? And it is interesting that the original MIPS processor, designed by Hennessy down at Stanford, didn't actually have anything other than atomic load and store.
And it turned out that that ended up being way too complicated for designing operating systems and user code. And so subsequent versions of MIPS actually had some atomic instructions of the form that we're gonna talk about here. So we need something other than loads and stores, and then we're gonna use those to build higher-level primitives, okay? And so what we want, let me just refresh your memory, is something like acquire and release, where it's fully symmetrical no matter how many threads there are, okay? And we would like something, by the way, that would also allow us to have multiple locks, so we could have a milk lock and an OJ lock and whatever else, a yogurt lock, okay? And then our milk problem is very easy, right? Acquire the milk lock; if no milk, buy milk; release the milk lock, okay? So, all right, everybody with me on this? Okay, so the difference between busy waiting and what a semaphore down does is the following. So a semaphore down is unspecified, whereas busy waiting is guaranteed to be a bad idea. A semaphore down, you should assume, puts the thread to sleep and lets a different thread run, okay? So what I gave you when we talked about locks and semaphores, before we dove into the implementation here, was the assumption that when you're waiting, you're actually put to sleep and not wasting cycles, okay? So the opposite of busy waiting is sleeping, okay? So they're not both looping until something happens. All right, looping still uses cycles, sleeping doesn't.
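With a symmetric lock, the milk problem collapses to a few lines. This is a minimal sketch using Python's `threading.Lock` as the milk lock (the function name is mine); every roommate runs the same code, unlike solution three:

```python
import threading

milk_lock = threading.Lock()
milk = 0

def roommate():
    global milk
    milk_lock.acquire()      # acquire the milk lock
    if milk == 0:            # if no milk...
        milk += 1            # ...buy milk
    milk_lock.release()      # release the milk lock

threads = [threading.Thread(target=roommate) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(milk)
```

No matter how many roommates run this, or how the scheduler interleaves them, exactly one buys: the check and the buy happen together inside the critical section.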
And so going back to the analogy from the beginning of the lecture, if you remember, we talked about those multiple threads and switching back and forth between S and T. The way you put something to sleep is you take it off the ready queue and you put it on a wait queue, so that the scheduler doesn't give it CPU cycles at all; it yields control of the CPU, okay? And that means when somebody releases the lock, it's gonna wake them up and bring them off of the wait queue, okay? So that's where we're going, okay? That's where we're going. So everybody's got the difference now between spin waiting and sleeping. The interfaces that we gave you for both locks and semaphores could spin wait when they're waiting, but that would be a bad implementation. Now, how do we implement locks? So a lock prevents somebody from doing something: you lock before entering, unlock after leaving, and wait if locked. Atomic load and store gets us a solution like milk solution three; that's not good. So what about a lock instruction? What if we had an instruction such that when you execute the lock instruction, it does a lock, and then there'd be a hardware unlock instruction? Is this a good idea? It certainly would prevent us from having to build these complicated Dijkstra-style things out of loads and stores, okay? So I have somebody that says it's slow. It turns out, not necessarily. I mean, it probably is complicated enough to be slow, and that would be a good 152 answer. But there's something fundamentally more complicated about this. What part of locking doesn't seem like it corresponds well to a lock instruction? Can anybody think? Not so sure? So, by the way, yeah, exactly: putting the thread to sleep. And those of you that maybe were looking ahead, that's a good answer, okay? So what about putting it to sleep? So the problem is putting a thread to sleep is complicated, okay? It requires knowledge of the current operating system. It requires you to know how the threads look on the stack.
It would require you to know where to put stuff. And so trying to have a hardware instruction that handles the sleeping part is really complicated, okay? And in fact, you really don't want a hardware instruction that does that, because it would then force you to use a particular version of sleep, and that makes no sense. That would prevent you from using different operating systems. And by the way, the complexity or slowness that was brought up by another person in the chat is also correct. The Intel iAPX 432, you can look it up, had all sorts of interesting things. It had Hamming coded, or excuse me, Huffman coded instructions, so that they were only as long as necessary. It had all sorts of really complicated stuff. It also had a bunch of different hardware lock instructions. You don't find them other than in computer museums, because it was just too complicated and there was really no point. So we want to do something better, something simpler, okay? So let's try interrupt enable and disable. So we know we can do that, right? That's where we set a bit in the processor that says ignore interrupts. And if we turn off interrupts, then because the timer interrupt isn't going to happen, we won't switch from thread A to thread B, and potentially we could get enough atomicity to do something in a critical section, okay? So on a uniprocessor, perhaps we could avoid context switching this way, with no internal events. So the thread that's in the middle of a critical section doesn't do I/O or anything; it disables interrupts, does some operations, and then re-enables interrupts. And as a result, we could actually end up with some sort of critical section, okay? So here's a naive implementation of locks. The acquire says disable interrupts; the release says enable interrupts. Anybody think, is this a good idea? Okay, so somebody asked what happens if you hit an error, and said this seems like too much power.
Well, this isn't going to take a lot of power, but here are some problems. You can't let the user do this, right? If the user could run our lock acquire operation, which disables interrupts, they could crash the machine, right? While true, okay? The other thing is that, as mentioned, you can only have one lock, right? There's only one lock in the system this way; that's not good. And the other is if it's a real-time system and you're busy in a very long critical section, this could be bad, right? What happens if you're in a critical section and you get "the nuclear reactor is about to melt down, hurry up, help, help, help" and it's being ignored, okay? So that could be a problem. So this seems like it's not good, all right? What could we do that's better here, okay? Let's use disabling of interrupts, but instead of using it as the lock, let's use it to implement a lock, okay? So this is a little different. And here's what we're going to do. We're going to have a value in memory, okay? So this is just a memory location. I've called it value, and we're going to set it to FREE. You could think of this as a binary zero or one. And that's going to be our lock. So assuming this all works out, we could have as many locks as we have memory locations. So that sounds good. And the way acquire is going to work is we're going to disable interrupts first, and then we're going to say, well, if the value's BUSY, we're going to put the thread on a wait queue, go to sleep, and re-enable interrupts somehow. Otherwise, we're going to set the value to BUSY and then re-enable interrupts, okay? So notice that acquire only disables and re-enables interrupts for a very short time. And that very short time is just long enough to see what the state of the lock is, possibly alter the state of the lock, or go to sleep if we can't acquire it, okay? And the flip side, release, again disables interrupts just long enough to see whether somebody's waiting on the wait queue.
If they are, we go ahead, pull them off the wait queue and let them run. Otherwise, we say the lock is free and we re-enable interrupts. So the difference here between using interrupt disable and enable as our acquire and release is that now we're using interrupt enable and disable to implement acquire and release. And fundamentally, what's different here is the fact that we have a very short critical section from the standpoint of interrupts. So we disable interrupts, we do something really quickly, and then we re-enable them, okay? So the interrupts are never disabled for a long period of time, but the user of this acquire and release could take as long as they want. Now, why do we need to disable interrupts at all? Well, this is to avoid an interruption between checking and setting the lock value. If we got a synchronization problem in our implementation of a lock, we would have a bad result. And so this disabling and enabling helps us to make a good implementation, and then we can give the acquire and release to our users. Okay, all right. Now, this still has some problems. That's okay, we'll fix some of them, but I wanna understand this solution first. So we need to disable interrupts for the actual implementation, and the critical section with respect to the interrupts is inside here, but that critical section is for implementing acquire and release. Now, if you look here, there are some funninesses here. By the way, unlike the previous solution, the critical section inside the acquire is very short. So a person using this lock can take as long as they want with the lock acquired, because they're not gonna impact the state of the nuclear reactor. So we're probably okay there, okay? Now, but there's a problem here that's a little funny. So what about re-enabling interrupts when going to sleep?
If you look here, what we've got is a situation where, if you disable interrupts and then you say, well, if the value is BUSY, we have to put the thread on the wait queue, which is somehow putting it to sleep, and then actually go to sleep. The question is, when do we re-enable interrupts, okay? If you look here, if the value is BUSY, what you see is that we're gonna do something funny in here, because we're actually gonna go to sleep with interrupts disabled, and that's gonna be bad, right? So that's an issue. We can't go to sleep with interrupts disabled, because that will invalidate our whole solution. So could we re-enable interrupts at this point, before putting the thread on the wait queue? Well, we can't, because if we re-enable interrupts at this point, it's possible that just before we put the thread on the wait queue, the malicious scheduler runs the other thread, which releases, and then we come back here and we put the thread on the wait queue and go to sleep even though the lock is free, okay? So we can't re-enable interrupts there. Could we re-enable them here? Well, same problem: we put ourselves on the wait queue, interrupts get re-enabled, and we go to sleep, okay? So we need to somehow wait until we're actually on the wait queue and asleep before we re-enable interrupts, so that if the other thread then releases, it will be able to wake us up, okay? So, but what does that mean? That means we have to re-enable interrupts after going to sleep, okay? So that seems like a problem, right? That seems like it doesn't make any sense, but it seems like it's required for correctness. Now, how can this possibly be correct? Well, the answer is, if you look in the scheduler, and you're gonna become very familiar with the scheduler once you get to project two, I'm gonna give you a little preview: thread A is executing that acquire and making the decision right here that it's gonna have to put itself on the wait queue and re-enable interrupts.
What does that really mean? Well, typically in the scheduler, what happens is you disable interrupts and you go to sleep, okay? And at that point, you context switch by switching from thread A to thread B, which then re-enables interrupts, executes for a while, disables interrupts, context switches, returns, et cetera. So think back to that S and T, right? S runs and then it hits switch, goes over to T, and then returns up. When you're in the kernel in the middle of the scheduler, you do so with interrupts off. Why? Because if the switch routine is interrupted in the middle of saving registers and you go off and do something else, it's gonna completely screw up all the register state. So interrupts have to be disabled in the deep parts of the scheduler already. And so what we're actually seeing is, the way this thing works is we put the thread on the wait queue and go to sleep; interrupts are disabled in that part of the kernel. And so when we go to pull somebody else off the ready queue and run them, interrupts are disabled. And when they start running, that's when that other thread re-enables interrupts. So the way we solve this little conundrum is exactly that: it's the other thread, the one that gets to run after we go to sleep, that will re-enable the interrupts. So this is your little mental puzzle for the night, to figure out why this works. So again, we have the interrupts already disabled, we've made the decision that, in trying to acquire the lock, we're gonna have to be put to sleep. That really means that inside the scheduler, we put ourselves on a wait queue and then we go to do the switch, but that switch is already running with interrupts disabled; we restore the registers, we return from the context switch to the kernel, and we work our way back up to user level, which will re-enable the interrupts, and then we run thread B at user level. Now this is challenging the first time you see this.
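The acquire/release logic being described can be modeled in miniature. This is a toy, single-threaded sketch, not real kernel code: "disabling interrupts" is just a flag, "sleeping" parks a thread's name on the lock's wait queue, and the class and method names are mine. The point it illustrates is the value/wait-queue bookkeeping, including the handoff where the value never goes back to FREE:

```python
from collections import deque

class KernelLock:
    def __init__(self):
        self.value = "FREE"
        self.wait_queue = deque()
        self.interrupts_enabled = True

    def acquire(self, thread):
        self.interrupts_enabled = False      # short critical section begins
        if self.value == "BUSY":
            self.wait_queue.append(thread)   # sleep; in the real kernel, the
            return "SLEEPING"                # NEXT thread re-enables interrupts
        self.value = "BUSY"
        self.interrupts_enabled = True       # short critical section ends
        return "ACQUIRED"

    def release(self):
        self.interrupts_enabled = False
        if self.wait_queue:
            woken = self.wait_queue.popleft()  # hand off: value stays BUSY,
            self.interrupts_enabled = True     # the waiter implicitly owns it
            return woken
        self.value = "FREE"
        self.interrupts_enabled = True
        return None

lock = KernelLock()
print(lock.acquire("A"))   # A gets the lock
print(lock.acquire("B"))   # B parks on the wait queue
print(lock.release())      # A's release wakes B without freeing the lock
print(lock.value)          # still BUSY: B now implicitly has the lock
```

Note how `release` moves a waiter straight to "runnable" and leaves `value` at BUSY, which matches the simulation walked through below: the wakeup is the handoff.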
Now what I wanna do, though, is actually show you this with a simulation, because everything's better with a good simulation, right? So here is an example of an in-kernel lock simulation. Now can anybody say why I'm calling this an in-kernel lock that we're building right now? So first of all, in answer to the question in the chat, yes, we don't have to actively re-enable interrupts in that sleep portion of the acquire, because the other thread will re-enable them, good. But in answer to my question of why I'm calling this an in-kernel lock: we cannot give interrupt disable and enable to the user, we already know that. So whatever we're coming up with only works inside the kernel for now, okay? We'll deal with that later, but here we have threads A and B, and they're gonna synchronize with each other, okay? And what I'm showing you at the top here is some state. The value of the lock itself is either zero or one, depending on whether it's free or busy. We have some number of people that are waiting on the lock, and we have the current owner of the lock, okay? And we have the current state of thread B. So thread B is on the ready queue, thread A is running. So remember, we alternate between ready and running for threads that are active, okay? And if you notice, the other thing is this owner, who owns the lock, is gonna be just for our own edification, okay? In this view of the lock, there never actually is an owner that's tracked. Now there are some versions of locks you'll run into where the lock keeps track of which thread owns it, but it's not required for this, okay? So this owner is gonna be purely for our simulation here. So here we have thread A running, dun-dun-dum, thread B is ready. So totally ignoring any acquire or release, we're gonna alternate between A and B because we've got our scheduler working, okay? But then thread A runs and hits a lock acquire, okay?
And it's gonna go to the acquire code, which, as you saw from earlier, says disable interrupts, okay? That's what that little red dot means. So interrupts are now disabled. Notice that the value is zero, okay? So we say, is value equal to one? Nope, because the lock is free. At which point we set value equal to one, and now I'm gonna say that the owner is A, but in fact, as I told you, this is only for our simulation, because we don't actually have to record who the owner is. All right, so now that we've got value equal to one, the lock is busy; that's because it's one. We turn interrupts back on, that's the little green dot, and then we emerge from acquire. So notice the key interface with lock acquire is that all the threads that are waiting are sleeping inside the acquire, and they only emerge from acquire after they've acquired the lock, okay? And so the fact that we return from lock acquire means that we have the lock, okay? So how do we know we have the lock? We have emerged from lock acquire, we returned, okay? And now we're busy executing the critical section. Okay, la-da-dum-da-dum-da-dee-dee-dee-dee. Pretty soon, what happens is the timer interrupt goes off and we're about to switch from thread A to thread B, okay? The timer interrupt goes off, that's what this dotted line is, the timer code, and that timer code is gonna disable interrupts, that's why there's a little red dot, and then the scheduler is going to look at which thread is on the ready queue. Well, thread B's on the ready queue. So we are now going to put thread A on the ready queue. Notice how it says ready, and it's on the ready queue. We're gonna take thread B off of the ready queue, and it's gonna start running. So here's a situation where thread A has the lock, okay, the lock is acquired. Thread A's on the ready queue, so it's not actually getting CPU cycles, but it's got the lock; thread B's running, okay?
And notice we've re-enabled interrupts and now thread B is the one getting the CPU. So right now there's no stopping, no blocking, because A's in the critical section with the lock and B's not trying to get to the critical section yet. So we're good. Now of course this wouldn't be fun if we didn't start getting some conflict going on here with the two threads, and so all of a sudden thread B hits lock acquire and now what? Well, we know from what I've told you earlier that thread B needs to go to sleep, okay? Because it can't acquire the lock, because A's got the lock. So let's see what happens here. So it calls, okay, lock acquire. So the question here, let me see, is why don't we set the value to one after waking up from sleep in acquire? Well, we set the value to one right away because we have to indicate the lock is busy. Okay, you're gonna see in a moment why that's important, because when B tries to acquire the lock, the fact that the value is one and not zero means that the lock is taken, okay? So we have to set the value to one because that is the lock. Okay, so when we try to do lock acquire, we disable interrupts. We're gonna run this lock acquire code. It's gonna see, is value equal to one? Yes. So because value is equal to one, the lock is taken. So we gotta do something. Okay, what happens here? Well, we gotta put ourselves on the wait queue and go to sleep. So notice at that point we're on the wait queue. So this lock has a whole set of waiters potentially. Right now it's just us. What does it mean that there's a yellow "wait"? It means that thread B is no longer gonna get CPU cycles and it's no longer gonna even be on the ready queue because it's waiting. So putting it on the wait queue and taking it off the ready queue means that it's not gonna get cycles because it's actually sleeping. Okay, so B is now sleeping on this wait queue. And now what happens is we go through the go-to-sleep, which is gonna go wake up A, okay?
Which, in the process of running A, now taking A off of the ready queue and putting it on the CPU, re-enables interrupts and we start running again in the critical section. So notice that the scheduler took us over to run thread B, but thread B tried to acquire the lock, which put it to sleep on the wait queue, and now A gets to run again. And in this very simple simulation there's only A and B, okay? All right, and now we run and we're about to release the lock. Okay, so when we release the lock, thread A is now done with the critical section and it's gotta wake up B and tell it, well, you can go now, right? Because if you look at the way release runs, let's run this code here. The release code is gonna disable interrupts, okay? That's because we're messing with the implementation of the lock. We're gonna say, is there anyone on the wait queue? And the answer to that is yep, there's somebody on the wait queue. So what we're gonna do is we're gonna put them on the ready queue, okay? And the act of putting them on the ready queue so that they can now return from the lock acquire means that we've implicitly given them the lock. Notice we didn't change the value from one to zero and back to one again. We left it equal to one, but the fact that we're now allowing B to run means that B now has the lock. That's why, if you notice, I switched this little owner pointer from pointing to A to pointing to B; the owner isn't a real thing, it's just for us to keep track of what's going on here. And so thread A basically puts B on the ready queue, re-enables interrupts, and starts running again. Okay, now notice that B doesn't run immediately, because B is on the ready queue; A gets to run for a little longer, okay? And eventually the timer goes off and it's time to schedule B to run again. And notice we'll pick up B where it left off, okay? It's gonna come out of sleep, we're gonna put A on the ready queue, and we're gonna run B. It's gonna re-enable interrupts.
It's gonna emerge from the lock acquire and voila, we get to run the critical section. Okay, and so this shows you hopefully how this particular implementation can work if we have the ability to enable and disable interrupts, okay? So, can there be threads on the wait queue for a different reason than trying to acquire the lock? The answer to that question is no, but it's not for the reason you think. There are many wait queues, okay? Many wait queues, and you're on the wait queue for the particular thing you're waiting for. So in this case, you're on the wait queue for this lock; if there's 12,000 locks, there's gonna be 12,000 wait queues, one for each lock. Because otherwise, when you go to wake somebody up, you won't know who to wake. Now the other question is, why didn't we set value to zero? And the answer is we didn't set value to zero because A woke up B and handed it the lock, which means the lock is still busy, which means it's still equal to one, okay? If you look at this arm of release here down at the bottom: only if there's nobody on the wait queue, and we skip this first arm, do we set value to zero, okay? Now the question is, what if the timer went off right after B was placed on the ready queue but before A enabled interrupts? So the answer is the timer can't go off. Look at what you just said there: the timer interrupt can't fire while interrupts are disabled; that's the whole point of disabling them. Good. And by the way, in case you're worried a lot about that, if you're thinking this through further, you might say, well, what if the timer went off while interrupts were disabled and I missed it? You know, I'm very sad I missed the timer. So the answer is that's not how it works. Interrupts that arrive while disabled are merely deferred until you re-enable, and then the interrupt will go off. All right. Good. Now, let's think about this for a second. This lock acquisition that we're looking at here, we can't actually put this implementation at user level.
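The walkthrough above can be condensed into code. Here's a minimal single-threaded sketch of that in-kernel lock, in the spirit of the simulation: the names (`my_lock`, `disable_interrupts`, and so on) are mine, not from any real kernel, interrupt disable/enable is stubbed with a flag, and "sleeping" is modeled by parking a thread id on the lock's wait queue until release moves it along.

```c
#include <stdio.h>

/* Sketch only: interrupt enable/disable is faked with a global flag,
 * and a "sleeping" thread is just an id sitting on the wait queue. */
enum { MAX_WAITERS = 8 };

static int interrupts_enabled = 1;
static void disable_interrupts(void) { interrupts_enabled = 0; }
static void enable_interrupts(void)  { interrupts_enabled = 1; }

struct my_lock {
    int value;                 /* 0 = free, 1 = busy */
    int waiters[MAX_WAITERS];  /* the lock's own wait queue (thread ids) */
    int nwaiters;
};

/* Returns 1 if the caller got the lock, 0 if it went to sleep
 * on the wait queue instead. */
int my_lock_acquire(struct my_lock *l, int tid) {
    disable_interrupts();
    if (l->value == 1) {                  /* busy: sleep on the wait queue */
        l->waiters[l->nwaiters++] = tid;
        enable_interrupts();              /* in the real code, the *other*
                                             thread re-enables for us */
        return 0;
    }
    l->value = 1;                         /* free: take it */
    enable_interrupts();
    return 1;
}

/* Returns the thread id the lock was handed to, or -1 if it went free. */
int my_lock_release(struct my_lock *l) {
    int next = -1;
    disable_interrupts();
    if (l->nwaiters > 0) {
        /* Hand the lock to a waiter: note value STAYS 1, just like
         * in the simulation. Waking them is implicitly giving them
         * the lock. */
        next = l->waiters[0];
        for (int i = 1; i < l->nwaiters; i++)
            l->waiters[i - 1] = l->waiters[i];
        l->nwaiters--;
    } else {
        l->value = 0;                     /* nobody waiting: lock goes free */
    }
    enable_interrupts();
    return next;
}
```

Notice the two properties from the walkthrough show up directly: a blocked acquirer lands on the wait queue with value still one, and release either hands off (value stays one) or, only when the wait queue is empty, sets value back to zero.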
We'd have to run this in the kernel because we have disable and enable of interrupts. So that's a problem with this. Now what you could imagine pretty easily is we could make system calls, acquire and release system calls, that basically take a lock identity of some sort and do lock acquire and release. Okay, so that is gonna be our first thought of how to do this properly. So clearly, by going into the kernel, we can actually put the thread that is waiting to sleep, because the kernel, according to what you've learned so far, is the thing that puts threads to sleep, okay? So the interesting question is, doesn't B put itself to sleep? Well, sort of, except that what happens here is B is running and, when it's in the kernel, it calls the right part of the kernel to put it to sleep, but it can only do that because it's running on the kernel thread part of B. So it's in the kernel, and you can choose if you like to think of it as B putting itself to sleep and getting woken up by somebody else. I think that's a deep philosophical question if you like, but in fact, it's the fact that B, if it's user code, made a system call into the kernel that means it could even run this code, okay, in order to make this work. So in order to put things to sleep right now, we're gonna need to enter and exit the kernel, okay? And if you hold off for one second here, I wanted to finish up this thought. So if you remember, we talked about multithreaded servers, where we might have a master thread that queues up a bunch of requests and a thread pool that services that queue of pending requests. This idea we talked about briefly with web servers and so on. If these threads are running at user level, the way they have to lock and unlock shared resources is they have to go through that common system call so that they're in the kernel and able to run that code.
And so we can have a very simple performance model here. Given that the overhead of a critical section is X, we can talk about the time to context switch, acquire the lock, do some work, and then context switch again and release the lock. There's a couple of system calls involved in this, okay? And so even if we have a thousand threads and everything else is infinitely fast, the fact that our lock implementation has to go into the kernel means that things are fundamentally slow, okay? So what's the maximum rate of operations we could have? Well, if every thread has to go into the kernel, and that costs X, then the maximum rate of synchronization operations is one over X, okay? And that's gonna take a really long time to do that synchronization. So if you remember, we talked about Jeff Dean's numbers: if X is a millisecond to go into the kernel and come back, that's only a thousand synchronization ops per second. And if we have a lot of threads, that may not be enough, okay? And so we gotta do something better than going into the kernel. And for instance, we might want the uncontended case, where lots of threads are all grabbing and releasing locks but the locks are unrelated to each other; we'd like them to be able to go as fast as possible, all right? And so that's gonna be our goal, and it's clearly gonna require something other than going into the kernel, okay? And we talked briefly about this, I showed you this diagram in a different context, but it shows you that a system call is about 25 times the cost of a function call. So whatever we do to synchronize ought to be something that doesn't require us to go into the kernel to disable interrupts and potentially put us to sleep, okay? All right, so to do that, next time we're gonna talk about atomic read-modify-write instructions that can run at user level, okay?
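That one-over-X bound is worth making concrete. A back-of-the-envelope helper (the function name is mine, and the one-millisecond figure is just the example used above):

```c
/* If every lock operation costs X seconds of kernel crossing,
 * then even with infinitely many threads and free everything else,
 * the best possible synchronization rate is 1/X ops per second. */
double max_sync_rate(double x_seconds) {
    return 1.0 / x_seconds;
}
```

Plugging in X = 1 millisecond gives max_sync_rate(1e-3) = about 1000 ops/sec, matching the number in the lecture; dropping X to a microsecond would raise the ceiling to a million, which is why staying out of the kernel matters so much.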
And so the problems with the previous solution: we can't give the lock implementation to users. It also doesn't work well on a multiprocessor. I don't know if you've thought this through for a moment, but if I have a bunch of cores and I disable interrupts on one, that doesn't disable interrupts on another, okay? You can do a cross-core disabling of interrupts, but that's very expensive. And so you don't wanna do that, okay? And so we need something that would actually work on a multicore. And the alternative is gonna be these atomic instruction sequences: these instructions read a value and write a value atomically, and the hardware gives you this atomicity. So we gave you loads and stores. We said that that was messy, right? Because we got the Dijkstra solution, which was kind of a mess, and Lamport gave us the generalization. We talked about interrupt disabling and enabling, but it's not general enough and you can't give it to users. We need some other atomic sequence, okay? And that's gonna be for next time. And the good things, don't worry about this now, we'll talk about that first thing next lecture on Wednesday. But for instance, test and set is a good example of one that's particularly useful. And so here, what you do is you give it an address, and what it does is it grabs the value in that memory location and stores a one. And it does that atomically, in a way that can't be interrupted. And it turns out if you do that, then a test and set on a memory location becomes a synchronization op that you can use to make a very simple lock, okay? And that will be for next time. All right, so in conclusion, we've been talking about atomic operations. We talked about the difficulty of having multiple instructions that we need to treat together, and so we need locks around them at minimum to make a multi-instruction atomic operation. We started talking about atomicity primitives like disabling interrupts and so on.
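As a preview of that test-and-set lock, here's a sketch using C11's `atomic_exchange` to play the role of the hardware read-modify-write instruction (this is my illustration, not the lecture's code; a bare spin loop like this is the simplest possible version, not what you'd ship):

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int lock_word = 0;   /* 0 = free, 1 = busy */

static void spin_acquire(void) {
    /* test&set: atomically read the old value AND store a one.
     * If the old value was zero, the lock was free and we just
     * grabbed it; if it was one, somebody else has it, so retry. */
    while (atomic_exchange(&lock_word, 1) == 1)
        ;  /* spin */
}

static void spin_release(void) {
    atomic_store(&lock_word, 0);   /* mark the lock free again */
}

static long counter = 0;           /* shared state guarded by the lock */

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_acquire();
        counter++;                 /* critical section */
        spin_release();
    }
    return NULL;
}
```

Note this runs entirely at user level, with no system call and no interrupt disabling, and it works across cores because the atomicity comes from the hardware, which is exactly the property the interrupt-based lock was missing.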
And we showed you several constructs for locks. We haven't gotten to some interesting other atomicity primitives. That's for next time: test and set, and swap, and compare and swap. We'll get to those. We've started our implementations of locks and we're gonna continue with that next time, okay? And so let me briefly see here: are timer interrupts allowed in these disabled blocks? Okay, so, I will say, for the questions that are on the chat, when interrupts are disabled, you don't get timer interrupts in there. That's the point. And when you re-enable them, the timer interrupt fires. So next time we're gonna keep talking about synchronization; that's gonna be these other atomicity primitives that are gonna allow us to construct locks at user level. All right, you guys, I've held you for too long. I hope you have a good night and we'll see you on Wednesday.