OK, so let's get started. So this is lecture four of Computer Science 162. And what are we going to talk about today? Today we're going to talk about synchronization and how hard it is to get synchronization right. We're going to talk about atomic operations and how they can help us, or still make our life difficult, in trying to get synchronization. And then finally, we're going to talk about locks and how we can use them, and some potential ways to implement them. So remember the ATM bank server example. The challenge is we have this bank server and we have a bunch of ATMs. And they're sending requests. People want money. And we need to check their balances and then, if appropriate, dispense money. Now, we have these concurrent operations occurring because users could simultaneously be using any of these ATMs. And so we need to make sure that those concurrent operations do not corrupt the database. The key thing is we don't want to hand out too much money. So the challenge is any time we have threads that are cooperating, that is, acting on shared state, there's a risk of corruption. And we want to have threads act together, because if we have threads running together, we can speed up a server. We can do key things like use a multiprocessor or multi-core machine where we have one thread or multiple threads per core. And we can overlap computation and IO. Remember, IO takes time. And we don't want to sit there with our machine idle while we're waiting for something, say, to be read in from the disk or written to the disk. Instead, we'd like to do some computation at the same time. So we have requests like deposit some money in an account. We want this to run to completion. And occasionally, it might block. So for example, in this case, we have to read some account information. And so that'll cause us to do some blocking IO, because that might involve reading it from disk. We then increment the balance by the amount of the deposit.
And then we're going to store it back out to disk. And again, that might involve waiting. Wait for the disk to complete our operation. We'll see later in the semester exactly why disks are so slow. The problem here is, if we have threads that are operating on the same account at the same time, the state could get corrupted. So if thread one is the monthly process that adds interest to my checking account, and thread two is doing the electronic funds transfer of my payroll from Berkeley, then what can happen? Thread one loads up the account balance. And then it gets interrupted and context switched. And now thread two starts running. So it loads up the account balance, adds in my salary, and then saves that back out to disk. Then we context switch back to thread one. We add the interest, which is a lot smaller than my salary, unfortunately, and then we store it back. And so I didn't get paid for the month. I did get my interest. But this is not good. Our state is corrupted. The key thing we have to remember is that a context switch can happen any time: either because we voluntarily yield, which we can control; or because we end up blocking, because we do something like disk IO; or because we get preempted. A timer interrupt goes off, or some other interrupt goes off, and we get preempted and now another thread runs. So the problem here is really at the lowest level. We can't make any assumptions about the interleavings. So here, in this case, it doesn't matter what the interleavings are, because we're dealing with two different variables. As long as x and y aren't aliased to the same location, it doesn't matter whether we run thread A first or thread B first. We're going to end up with the same result, yes? Why is it 13? I'm going to go through that example again in just a second. So the question is, what are the possible values? How can we get 13?
OK, so if instead we do have threads that are operating on shared state, even if it is just to read that shared state, we can have a problem. So here we have two threads. Initially y is set to 12. Thread A sets x to 1, and then sets x equal to y plus 1. So it's reading y, adding one to that, and storing it into x. Thread B sets y to 2, and then sets y equal to y times 2. So what are the possible values that we can have? We have to look at all possible interleavings. So one interleaving is we run thread A to completion, and then we run thread B to completion, in which case we'll have x equal to 1, and then x will equal y, which was set to 12, plus 1, so x equals 13. So that's where 13 came from. And then y gets set to 2, and y gets doubled, set to 4. But we can have other interleavings, because preemption can occur at any time. So we could have the alternate, which is we run thread B to completion. So we set y equal to 4, then we set x equal to 1, read the value of 4, add 1, and x equals 5. But we can also interrupt in the middle. If we do that, then y gets set to 2, now x gets set to 1, x gets set to 2 plus 1, or 3, and then y gets set to 4. Any of these interleavings can occur. If you depend on a particular interleaving always occurring, bad things will happen, because the other interleavings may occasionally occur. So, goals for today: we're going to go through the two examples that were in the handouts. And then we're going to talk about a classic synchronization problem called Too Much Milk. And then we're going to talk about some hardware support for synchronization. So let's look at, first, our correctness requirement. Any time we have threaded programs, we want to guarantee that they work for all interleavings of any thread instruction sequence.
Now, because we have cooperating threads, that inherently means we're going to have non-determinism, because we can't control the order in which those threads are going to be interleaved, or when those interleavings are going to occur. So this makes it both non-deterministic and non-reproducible. So it makes it very hard for you to debug the behavior of a multi-threaded program unless you carefully design it. Now, for the first two projects, you're going to be using Nachos. Nachos is really nice, because it's a simulated environment. So if you want to make your life a little bit more interesting, you can set the seed that it uses to control the timings of its operations on the command line. And you can change that seed value every time you run Nachos. If you do that, then you will add non-reproducibility. And so when you have a bug, it'll make it a lot harder to find that bug, because the bug may show up, then you run it again, and the bug doesn't show up. Then you run it again, and the bug does show up. You change some lines, the bug doesn't show up. You run it again a little while later, and the bug shows up again. So that's what non-determinism, and especially non-reproducibility, gives you. And you can play around with it. OK, the first example I want to talk about is the Therac-25. So this is a machine that was designed for radiation therapy. It had two different therapies it could use. It had an electron accelerator, and it could use that to generate X-rays or to generate an electron beam. And in the Therac-20, which was the predecessor to this machine, the dosage was controlled and limited by hardware interlocks. But hardware is expensive. And as we all know, as computer scientists, anything you can do in hardware, well, you can do it in software too. So they decided to remove all the hardware interlocks and controls and replace them with software controls. Ordinarily, that would have been a good idea. In this case, unfortunately, it was a very bad idea.
Because of multiple software errors, the machine ended up causing several overdoses and also several patient fatalities. So this is a machine that's intended to cure people, and instead it was injuring and killing them because of software errors. If you haven't read the paper, I encourage you to read the paper. Might be on the midterm. And it's a really very critical look at what went wrong and also the process. How did these software bugs get introduced? And it was really around very poor software design and practices. Now, what happened? There were a bunch of race conditions. So exactly what we're going to talk about in terms of synchronization. And also poor software design and poor user interface design. They determined that the data entry speed during editing was the key factor in producing the error condition. If the prescription data was edited at a fast pace, the overdose occurred. So this is the ultimate irony. Operators who were unfamiliar with the Therac-25 would go slowly in entering the numbers, and the machine would work as intended. Operators who were very familiar with the Therac-25 would punch in the numbers very quickly. The race condition would cause the values to be set incorrectly. And the patient would be overdosed. So usually, when you're going in for some kind of procedure, you want someone who's the most experienced sitting there operating the machine. In this case, you actually wanted the person who was completely inexperienced with operating the machine. But again, it's a long paper, but it's a very interesting read. We're going to try and find a shorter version of that for you to read. It's very important for you as software programmers to understand, especially when your code's being used in life-safety or critical infrastructure situations, that it be done correctly. So the second example I want to talk about is the Space Shuttle. So, first Space Shuttle launch. This is America's return to space after Apollo.
The entire world is watching it. It's on every TV channel. It's preempted all the local programming. And it's about to launch. Everybody's excited. And 20 minutes before the launch, the computers say, abort. And the whole world's like, wow, America, the great space power, can't launch this rocket. So what happened? So the Space Shuttle was designed for extreme fault tolerance, because this is a vehicle that's carrying more than half a dozen people up to orbit and has to keep them safe for two weeks while they're on orbit and then safely return them back to Earth. So, five computers, four of them running the Primary Avionics Software System, or PASS. So these four systems are running, dealing with asynchronous events, so all of the monitoring information coming in from the gyroscopes and the fuel flow meters and so on. And it's real time, because you have to read a value, calculate what to do, and then do something, like move the actuators for the gimbaled engines or something like that. These computers are tightly synchronized. So 440 times a second, these computers are comparing their computations. And if they disagree, well, we have four. So if one disagrees, we turn it off. If a second one disagrees, we turn off that one also. Now, they were very worried that there might be a logic bug that gets introduced by the programmers. So they implemented a fifth computer, the Backup Flight System, that was similar hardware, but running entirely different software, written by a completely different software team. It's called software fault tolerance. And this computer is also comparing results with the other computers 440 times per second. So we've got hardware fault tolerance through these four computers that are talking together very tightly. And we've got a fifth computer that's doing software fault tolerance. So the idea was to provide a high degree of reliability and availability for the entire duration of the mission. So what happened?
The countdown was aborted because the Backup Flight System disagreed with the PASS. And this was because there was a 1 in 67 chance that the PASS would be out of sequence with the Backup Flight System by one cycle. And it occurred, so they were both running correctly. They were just off by one cycle, one of these 440-times-a-second synchronization points. What happened? Well, they made a change to the initialization code of the PASS. The way it worked was, when the timer queue went empty, you look at the hardware clock and you synchronize off of that. But because someone added some code that put, in some cases, a delayed initialization request into the queue, the queue wasn't empty. And so you ended up with them being slightly out of sync, by one cycle. So two things went wrong here. Even though they had extensive simulation, the bug never showed up. That's just the way it works. Bugs only ever show up when the entire world is watching. They don't show up when it's a bunch of engineers sitting there watching the screen scroll by. And they did months of extensive testing of this software. In fact, to this date, the Space Shuttle software is the gold standard for software quality in terms of number of defects per line of code. It's only a few million lines of code, but it's perhaps the most tested, verified, analyzed piece of code. Every change required multiple people to agree to that particular change. But these kinds of Heisenbugs are very hard to find. A Heisenbug is a bug where you run something, you get a bug; you run it again, the bug goes away. So it's sort of like the Heisenberg principle. If you look at it, it disappears. They're good and they're bad. The good thing about a Heisenbug is, if you have the ability to reboot, it's going to go away, unlike a Bohrbug. That's a deterministic bug, named after Niels Bohr. But a flight control system isn't exactly something you want to reboot while you're mid-flight. Your rocket goes tumbling out of orbit.
The other problem that happened here was that this change was made late in the development process. And that makes it hard to handle and hard to test. So again, coming back to this class: make a major design change at, say, 11 PM on the night that the project is due, and see what the chances are of you correctly submitting your code on time. That's kind of what these guys were up against. A late-stage change gets introduced, you can only do so much testing, and you think the impact of the change is very clearly scoped. And that is another problem here, which is that a small change can have a rippling effect in a system. And again, that's something you'll probably see on the projects. OK, so yes, question? Yes, so the question was, isn't it a good thing that the computers aborted the launch because they were in disagreement? They thought that there was a problem. Absolutely, the computers did the right thing, just at the wrong time, a rather embarrassing time for America. They actually debugged this; if you read the paper, they actually figured out what happened, what went wrong, fairly quickly, and were able to meet the next launch window for the Shuttle. So the computers behaved correctly. It's just that the very public outcome wasn't necessarily what we wanted. So to try and make our lives easier, we're going to talk about synchronization and how we can try to avoid having corrupted data structures. So to understand a concurrent program, we need to understand what is atomic on our system and what is not atomic. What is an atomic operation? An atomic operation is an operation that always runs to completion, or not at all. It's indivisible. So once you start it, something can't come along and modify the state in the middle or look at the state in the middle. It either looks at the state and modifies it before, or looks at the state and modifies it afterwards. So this is really a fundamental building block.
If we do not have atomic operations on our system, then we're never going to be able to build concurrent programs that operate correctly. And we're going to start today with very low-level atomic operations. So in particular, we're going to start with just memory references and assignments. So loads and stores being atomic. And then we'll look at higher-level primitives, because we're going to see how painful it is when this is all we have to deal with. Now, it's very important to recognize that this is loads and stores of words: 32-bit words, not 64-bit words, not double-precision floating-point stores or loads, only 32-bit words. It's tied into the size of the computer's bus. These computers have a bus that's 32 bits wide, and so that's how much we can do in a bus transaction. We can write 32 bits or we can read 32 bits. If you have a computer with a wider bus, like a mainframe might have a 256-bit bus, then you might be able to do floating-point operations as atomic loads and stores. It all depends. But you need to know, for your particular architecture, which instructions are atomic and which are not. So for example, in a CISC instruction world like the VAX or the IBM 360, they had instructions that would copy an entire array. That was not atomic. You could end up with very inconsistent, who-knows-what-happens results if you're copying an entire array and at the same time you're trying to modify the values in that array. Okay, so some of the challenges that we're gonna face. We have multiple threads operating in parallel, and we're doing this because we wanna share resources or share data. We can do either fine-grained sharing or coarse-grained sharing. So the advantage of fine-grained sharing is we increase concurrency, and increased concurrency means better performance. We get an answer faster, our programs run faster, lower latency, better throughput. But it's gonna make things much more complex, as we'll see. The alternative is coarse-grained sharing.
Much simpler to implement, but we're gonna have reduced concurrency, which means lower performance. So think about sharing the CPU in chunks of 10 milliseconds. There's a lot of work we have to do to make that work, versus if we shared on a one-minute granularity. Sharing on a one-minute granularity will be easier to implement, but it's gonna have much lower performance. You're gonna have to wait a minute to get a hold of the CPU to run again, even if you just have two threads. Add more threads, and it could take a long time before you got access to the CPU again. Also, think about sharing a database. You could share it at the row level, or you could share it at the table level. Imagine Telebears ran with table-level locking granularity. So all the students on campus want to register for classes. We lock the entire table, and we do it one at a time. So hopefully before the semester started, you'd actually have a chance to sign up. Actually, hopefully before that semester's final exams, you'd have a chance to sign up for classes. Instead, we lock at the row level. Much more complex for the database, but it means that everybody can be signing up for classes at the same time. The example we're gonna use as a motivating example is an analogy that is drawn from the real world. And this is the cool thing, I think, about operating systems. You can take lots of things that we have as problems in the real world and abstract them into the operating system, or vice versa. The key difference here, however, is that computers are not intelligent. People are. And maybe along the way, we'll learn some life lessons. Okay, so the example we're gonna use here is: we're in lecture. It's a nice warm classroom on a nice hot day outside. And so you're just thinking, as soon as class ends, I'm gonna go home and have a big tall glass of milk, because milk does a body good. So you go home at three. It's not this class, because we get out a lot later. And you look in the fridge, and there's no milk. Bummer.
So what are you gonna do? You're gonna go to the store and get some milk, because you're really thinking milk does a body good, I need to cool down. So you have a roommate. Your roommate has a professor who's a bit long-winded, and so their class runs late. So your roommate doesn't get home until you're actually arriving at the store. And they have the same exact idea of milk does a body good. Calcium's good for the bones. And they open the fridge and find there's no milk. You can see where this is going. So just as you're buying the milk, they leave for the store. Now, because this wouldn't work otherwise, you're gonna take different routes, you coming home and your roommate going to the store, because obviously, as a human being, if you saw your roommate coming by with a gallon of milk, you're not gonna go to the store and buy some milk. But this is not a human. This is a computer. Like I said, computers are not as smart as people. So just as you're arriving home and putting away the milk, after you've poured yourself a nice tall glass of milk, your roommate is arriving at the store. And of course they buy some milk. And then they come home, pour themselves a glass of milk, and now you've got two gallons of milk in the fridge. So that's why this is called Too Much Milk. So let's more formally define this problem. We need some ground rules here. So synchronization is when we use atomic operations to make sure that we have cooperation between threads. Now, for the example today, the only atomic primitives we're gonna have are load a word and store a word. Nothing else is atomic, which means we could be interrupted during any other operation. And you'll see it's gonna be really hard to build a system that operates correctly if the only atomic operations we have are just reads and writes. So the second definition is a critical section. A critical section is a piece of code that only one thread is allowed to execute at a time.
So if we have a critical section, only one thread can enter it and start running. No other thread is allowed to enter. Finally, mutual exclusion is a way that we ensure that only one thread at a time gets to execute a critical section. So one thread being in the critical section excludes the other thread from being able to execute in that critical section. Now, these are basically two ways of saying the same thing. Mutual exclusion gives you a critical section; a critical section gives you mutual exclusion. Some more definitions. So a lock is something that prevents someone from doing something. So one model would be: you lock before you enter the critical section and before you access the shared data. And then after you're done accessing the shared data, you unlock. So the last part we need is, if you encounter a lock, you wait. And this is one of the core ideas of synchronization. All synchronization is gonna involve some form of waiting. If someone else is in the critical section, you have to wait. All right, so we could fix the Too Much Milk problem trivially. We put a lock on the refrigerator. You then lock it and take the key before you go to buy milk. Problem solved. Your roommate's gonna come home, the fridge is locked, they can't buy too much milk. But there's a bigger problem here. What if your roommate wanted orange juice, because they don't like milk? And there's a gallon of orange juice sitting in the fridge. So this is the difference between fine-grained and coarse-grained. Coarse-grained definitely works. It's much simpler, but you're probably gonna be looking for a new place to live. So not a good solution. There's another problem. I haven't told you how to make a lock. So that'll come at the end of the lecture. We'll figure out how to make a lock. So now we have some terminology. We can define the problem and design a solution. And this is the first step that you wanna take. There's always this impulse to just start coding.
So how many people here have started coding before they did their design doc on project one? Yeah, you don't wanna raise your hand for that one. That was a trick question. And yet, last semester, I had a student come to my office hours, and he was like, I'm going crazy. My project mates, they just raced ahead and wrote some code. And I'm like, we need to do a design doc first. And they wrote it, and now we can't get it running, and we don't know what's going on. And so don't go down that path, right? Start with a design, understand it. You'll have a much less stressful experience in this class. I've had other students come back to me and say, wow, it was so great that we did the design, because we found a major bug in our algorithm. It was much easier finding it in the design. If we had implemented it, it probably would have taken us a long time to find the bug. Okay, so think first, then code, and retain your sanity. Okay, so thinking first: what are our correctness properties for the Too Much Milk problem? Well, obviously the first one is, we don't wanna have too much milk. So never more than one person buys milk. But there's another correctness property that we wanna have. Anybody have an idea? Yeah. Yeah, somebody actually buys milk, right? Because an easy way to solve the problem would be, you know, I'm not gonna buy the milk and you're not gonna buy the milk, and, you know, we'll just kind of have a standoff here. So we wanna make sure someone buys the milk. Yes, question? Yes, we are going to exclude the case where you buy milk and then you return milk. We don't wanna buy too much milk. You know, you guys are students. Limited budget, right? No point in buying extra milk that you don't need or that's gonna go bad. So: never more than one person buys milk. And someone buys milk, if necessary. And we're gonna, again, only use atomic loads and stores. We don't get to put a lock on the fridge, at least yet. So let's try our first attempt at a solution.
So we're gonna use a note, all right? And before you buy milk, you're gonna leave a note. So that's kind of like a lock. And when you come back from buying milk, you're gonna remove the note. And that's kind of like our unlock. And if there's a note, you don't buy milk. And so that's gonna be our waiting. So if you see there's a note on the fridge, you're gonna wait until the other person comes and fills the fridge with milk. All right, so we're gonna write a solution like this. Now remember, only reads and writes are atomic. So if there's no milk and if there's no note, then we leave a note, we buy milk, come back, remove the note. So what's the result? Is this gonna work? No? Maybe? Yeah, in the back. Exactly, we can end up with both buying milk if we interrupt at the wrong time. So let's look at that. So the problem is, in most cases, this works fine, right? I come home, I see there's a note and there's no milk, and I don't do anything. But occasionally, it will end up with too much milk. How does that happen? So thread A runs and checks: no milk, no note. Now we get interrupted. Thread B runs: no milk, no note. Then it gets interrupted. Thread A leaves a note, buys some milk, and removes the note. And then thread B leaves a note and buys more milk. So the problem here is we get context switched after checking for the milk and the note, but before we get to leave a note. Now, clearly, if you are standing in front of the fridge, looking in the fridge, your roommate's not gonna come in and look in the fridge, but this is a computer, right? And the computer is not that intelligent. So this solution is really bad. I'd argue it's made the problem worse, because we think we have a solution, but occasionally it'll fail. So this is gonna be really hard to debug, because you could run it 100 times, you could run it 1,000 times, and it works perfectly.
And then that 1,001st time, you end up with too much milk. So we need to make sure, again, that our code works independent of the thread schedule. Whatever is a legal interleaving has to give us the correct behavior. But maybe we have an idea here. So I'm gonna call this solution one and a half. So the problem was we checked for milk, then we checked for a note, then we left a note. So maybe the solution is just to leave the note first. So we're gonna place a note, then do all of our checks. So that's kind of like putting the lock at a higher level. So here's our new code. So we leave a note, and then if there's no milk and no note, we buy milk, and then we remove our note. So what happens in this case? You find your own note, and so what does that cause to happen? Of course, with you, right, you wrote the note, and you're gonna see your handwriting. Plus, you know you just left the note. So with people, nothing bad. With computers, no one buys milk, right? Because the computer can't differentiate between the note you wrote and the note that it wrote. So even worse, right? We'll never have any milk. We failed our correctness property. All right, let's try again. So the problem here was, while you can recognize your own handwriting and know that that was the note you left, the computer can't, because it's just a note. So we'll have labeled notes. That's gonna be our solution. Yeah, question? So, labeled notes. Now we can leave the note before we check. So now the algorithm's gonna look like this. Thread A leaves note A. Thread A then checks: if there's no note B and no milk, then we buy it, okay? Then when we come back, we remove note A. Thread B does something slightly different. It leaves note B. And then if there's no note A and there's no milk, we buy milk. And then we remove note B. Does this work? Yes, in the back? Yeah. So that's it: it's possible for neither to buy milk.
So you can have thread A run and leave note A. Then you can have thread B run, leave note B, and see that there is a note A, so it's going to skip buying milk. But before it removes its note B, thread A runs and sees that there's a note B. So it skips buying too. Then we context switch back, remove note B, and then remove note A. So this is really, really insidious. We've made the problem worse and worse, right? Because it's really unlikely that this will happen, but at the worst possible time, on the hottest day of the year, it'll happen. So this is called starvation, ironically, because neither of you ends up getting milk. So we're still not there yet. But we're getting closer. More and more of the time this works, and unfortunately, more and more insidious errors creep into it, but we're getting much, much closer. Any questions? That's a very interesting question. So the question is, if we come back to our code, what's wrong with just simply disabling interrupts? Not allowing any context switches to happen. Anybody have any ideas what might happen then? Yes. Yeah, so what happens if, you know, this is a rather short piece of code here, but what if this is some really long piece, like, you know, I decide to buy milk and then I decide to go watch a movie and then I go work on my 162 project and so on. It could be a really long time that we end up disabling interrupts, and that would be bad if there's, like, you know, some critical things going on and we need to service those interrupts, like lots of network packets coming in. Those packets would get dropped eventually if we don't service the interrupts. We're gonna come back to this later and look at maybe how we could use interrupt disabling, but in this kind of an environment, we wouldn't wanna disable interrupts, yeah. Ah, so that's very interesting, okay?
So we could have them notice, to repeat the question, we could have them notice that they're both trying to do something at the same time, and have them have some sort of default or fallback behavior. That's getting really close to our solution, as we'll see in a moment. So good insight, yeah. Ah, that's another good question. Given two pieces of code with some number of lines each, could we determine all possible interleavings? Yes, but it's gonna be exponential, and testing for all of that is gonna be very difficult. And especially if we add multiple threads and we're talking about hundreds of thousands of lines of code, it's not a tractable problem. So we're really gonna want to get our design correct, and we're gonna wanna have the correct, ideally language-level, synchronization primitives that mean we don't have to worry about all possible interleavings. If we can make our atomic operations effectively large enough, then, you know, we're looking at large blocks that have to be interleaved, and that's gonna be a lot easier to deal with than individual lines of code that might be interleaved. So the question is, is there a formal way of deciding when the operating system does an interrupt? So an interrupt, by its nature, is asynchronous. The operating system has no control over when an interrupt occurs. It can control when you service that interrupt. But the challenge is, in real-time environments or in environments where you have interrupts arriving at a high rate, you have to deal with them quickly. So if I have a 10 gigabit per second link connected to my server and I have traffic coming in, the network adapter card has a limited buffer space, and if I'm not servicing the interrupts fast enough, that buffer space is gonna fill up and packets are gonna get dropped. So you have to service interrupts as quickly as possible. You are allowed to disable them, but it has to be for very, very short amounts of time, and bounded amounts of time.
Other questions? Okay, so a couple of administrative things. Section assignments, updated section assignments I should say, have been posted to Piazza. We had a little bit of a challenge reaching a fixed point, because groups told us times they could meet that they couldn't meet, fancy that. But we did get everything worked out and balanced across the sections and balanced across the TAs. So attend your assigned sections, beginning with tomorrow's sections and Wednesday's sections. The first project starts tomorrow. So if you haven't already, take a look at the nachos walkthrough. Nachos is a lot of code. It's huge. The walkthroughs will help you understand the structure of nachos, how nachos does a number of different operations, and the programming conventions for nachos. Programming conventions are going to be critical, as we'll see in the later part of this lecture. If you don't follow the programming conventions, you will run into cases where sometimes your project works and sometimes it doesn't. And usually it's when we're running the test cases that it won't work. So download the nachos tar file. Start setting up your Java environment and Eclipse and Subversion or Git, depending on what you'd like to do. The TAs are gonna be talking about that in the sections. So it's very important you attend sections. The other very important reason why you should attend sections is that we're gonna have weekly quizzes in the sections. These are really more for your benefit than anything else, to give you continuous, immediate feedback as to how you're doing in this class in terms of understanding the concepts. We're also gonna have worksheets for you to work on in class, also to give you immediate feedback on how you're doing. As a result, we're gonna be slightly adjusting the grade breakdown. So 50% will still be from the projects. The exams will go from 45% to 40%, 20 for each of the two exams, and 5% will be participation. So that's coming to lecture.
That's asking questions in lecture. So don't just sit here in lecture, but ask questions. It's also attending and asking questions in section. And then also asking questions on Piazza, especially those that get marked as good questions, and answering questions, especially those that get marked as good answers. So you can't just simply post empty questions. Piazza gives us lots of statistics, so we can see for each person what your participation is. And then 5%, as I said, from the exams now goes to the quizzes. Any questions, yeah? No, you will not need computers for sections. These quizzes are gonna be short little one-page quizzes. And I think we have like 13 sections remaining, and so we're gonna drop like three of them. So we'll only count 10, so each one will be worth half a point. Yeah, yes, this week. Weekly quizzes start this week. And we may still have quizzes in lecture, so do remember that, yeah. How are we gonna quantify the participation grade? Asking questions and participating in class. We have a nice little thing from Bear Facts that now lets us see everybody's picture. So I'm gonna be, and John and the TAs are gonna be, trying to learn everybody's name and face. So we'll know if you're here and participating. If you're sleeping in the back, we'll probably also notice that. The same thing for sections. Piazza makes it a lot easier, but we're gonna try our best to quantify it for sections and lectures. Yes, question? Oh, that's a good question. What do you guys think? Okay, so quizzes will have material potentially from Monday's lecture and before. Again, these are gonna be very simple, mostly true/false. It's really, are you understanding the material? And the reason why we're only making it five points total is because it's more feedback for you than anything else. So using the readings, and doing the questions at the end of the readings in the textbook, is a good way to be prepared for the quizzes, and also the exams.
Any other questions? Okay, so with that we will take a five minute break. Okay, so let's get started again. So we had a suggestion before the break that maybe we need to have these two threads realize they're trying to update the same state and maybe do something different. Let's try that as a solution. Okay, so here is a possible solution that tries to follow that idea. So in thread A, we leave a note A, a labeled note, and while there's a note B, we do nothing. We just sit there and spin. And then if we find that there's no milk, we're gonna go buy milk, and then come back and remove note A. What does B do? So B says leave note B, and then if there's no note A and there's no milk, then we're gonna go and buy milk. We come back, we remove note B. Does this solution work? That's actually correct. So if B leaves a note, and then A comes along and leaves a note, then B will check and see there's a note A, right? And so then if we context switch back, we'll see there's a note B. And so what will A do? It'll sit there and it'll spin, right? And then B will eventually get context switched and run, and it'll remove note B. And then what happens? A notices there's not a note B, right? So then A checks to see, is there milk? And there's no milk. So then A will go and buy milk, right? And then A removes note A. So that interleaving does work. Any other thoughts? Why would A see that there's no milk? Because after B leaves, A is then gonna check to see, is there milk? And it'll find there's no milk, because B didn't go buy milk, because B saw note A. There's no note B now, right? So then there's no milk, and so A will go and buy milk. Yes, correct. Always someone will buy the milk. So in fact, this does work, right? But sometimes it won't be B. This is exactly what we heard proposed as a solution before the lecture, right?
We're gonna realize that we're both, before the break rather, we're gonna realize that we're both trying to access shared state, and then we're gonna have a default behavior, right? So the default behavior is that if B finds A in the system, it exits. And if A finds B in the system, it waits to see what B does. If B buys milk, we're all good. If B doesn't buy milk, A will then go and buy milk. So we can guarantee now that either it's safe to go buy milk, or if it's not, the other thread will buy it, and so it's okay to quit. So at point X, what do we know? We know that if there's no note B, B's not in the system, so it's safe for A to buy. If B is in the system, A is gonna sit and wait to see what B does. And again, if B buys milk, we're all set. If B doesn't buy milk, then we will buy milk. At point Y, what's the case? If we see no note A, then we know it's okay for B to buy. If we see A in the system, then it's okay for B to quit. Ah, so that's a good question. Are these not threads anymore, because they're not sharing the same data and program? So it's different code that's running in different threads, absolutely, but they could be threads spawned by the same program. And they are accessing the same shared state, right? The shared state here is milk. Who's gonna buy it? They're accessing the same shared variable, milk, and testing to see, is there milk or is there no milk? So they still do have shared state, but now they also have some private operations that they're gonna do. Yes, in the middle, absolutely. Yes, this is a big problem, right? So here it's very simple, and look at all the discussion that we've had, trying to think, does this really work? And I encourage you to go home tonight and think about all the possible interleavings and convince yourself that for all possible interleavings it is gonna behave correctly.
And then think about what happens when you get a third roommate, and how the code has to change. You thought having another roommate was trouble enough. Now think about what happens when you try to make sure you don't have too much milk. And you'll find the code just multiplies and gets messy. Yeah, in the back. So the question is, how do we guarantee we're not gonna get stuck in this while loop here? So eventually the timer interrupt is gonna go off, we're gonna lose our time slice, and we're gonna context switch back to B. And then B will run to completion. If B was already buying milk, it'll buy milk and return. Or if it's just now entering and noticing that there's a note A, then it'll do nothing and it'll exit, and then we'll context switch back. Yes, that's correct. The safety is already there in the OS; we are assuming preemptive time slicing here. And so after our time slice has expired, we're gonna context switch back to thread B. Yes. So the question is, does this help us buy milk? Should we have an alternate scheme where one roommate always buys milk? So this isn't helping us buy milk faster. This is just making sure we don't end up with too much milk. We could dedicate a thread and say thread A is always gonna go and buy milk, but what happens when that thread goes away for spring break, right? Then there's no milk for that week, which might be okay, drink something else. Other questions, yes. Ooh, yes, so this is a very good point. So the question is, this while loop is sitting here doing nothing. So it's just wasting CPU time. And CPU time is very valuable. You don't wanna just waste it. So is there something better we could do? Absolutely, there are a lot of better things we could do. In fact, in this class, this is prohibited. You're not allowed to do this. It's called busy waiting. So we're gonna talk about this in just a moment, unless there are other questions.
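The two-note protocol can be sketched as a small Python simulation, which makes it easier to trace the interleavings. This is my own rendering, not nachos code, and note that A's spin loop is exactly the busy waiting that's prohibited in this class; it's there only to mirror the example:

```python
import threading

# Shared state: the two notes and the milk supply.
note_a = note_b = False
milk = 0

def roommate_a():
    global note_a, milk
    note_a = True          # leave note A
    while note_b:          # busy-wait while B is in the system (bad practice!)
        pass
    if milk == 0:          # if there's no milk...
        milk += 1          # ...buy milk
    note_a = False         # remove note A

def roommate_b():
    global note_b, milk
    note_b = True          # leave note B
    if not note_a and milk == 0:   # only buy if A isn't around and no milk
        milk += 1
    note_b = False         # remove note B

ta = threading.Thread(target=roommate_a)
tb = threading.Thread(target=roommate_b)
ta.start(); tb.start()
ta.join(); tb.join()
print(milk)  # always exactly 1: someone buys, and never both
```

Whatever the interleaving, exactly one roommate buys: B only buys when A hasn't entered yet, and A only buys after B has left and only if B didn't buy.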
Okay, so our solution protects a critical section. This critical section is: if there's no milk, buy milk. We wanna make sure only one thread at a time is checking for milk and buying milk if necessary. Because if we have two threads trying to do this, we end up with too much milk. So this solution works. Like I said, I encourage you to go home and take a look at all possible interleavings. It's a small piece of code, so you can do that. But it's way too complicated. It's really hard for us to reason that this is correct. And this was just a few lines of code. Multiply it by 100,000 and then try to make sure it's correct. As was pointed out, we have different code running in A from what runs in B. So try to think about what it would be with three roommates, four roommates, five roommates. This now gets to be insanely messy. You don't want a system where adding a new thread means rewriting all of the code. It's not gonna scale. It's not gonna be easy to design. It's not gonna be easy to implement. It's not gonna be easy to test. Also, as was pointed out in a comment, thread A is consuming CPU time while it's waiting. That's called busy waiting, and it's really bad. All right, so yes, there is a better way. We could have the hardware give us better, higher-level primitives than atomic load and store. We demonstrated in this case that we could do it if all you gave us was atomic load and store, but the solution we end up with is completely undesirable. Even better, and we'll see this in a couple of lectures, is if we have higher-level programming abstractions. We could put this in Java and add a keyword like synchronized, which will make it much, much easier for us to implement solutions like this. So the high-level picture here is having threads as an abstraction, yeah, question. So now we're leaving a more complicated note. The question is, can that note fit into one word? And it effectively just turns into trying to implement a lock again.
Because we need to be able to check if there's a note and leave a note atomically, basically. So we have this nice abstraction of threads. It gives us a sequential stream of instructions as an execution model, which is a very nice and easy model. It allows us to overlap IO and computation and to have parallelism. So those are all the positives. But it's still very complicated to access this shared state. This is just one example; that was one solution, and I'm sure there are other solutions we could come up with that are equally complicated and difficult to reason about in terms of correctness. And so this is the wrong level to be thinking about things. We need to think about things at a higher level. We don't want a solution that's very tricky to implement or understand and thus is likely to be error-prone. Worse yet, think about trying to modify that code. Any modification is likely to introduce errors and not be correct. So we wanna develop a synchronization toolbox and have some common programming paradigms that we're gonna use. So imagine this now as our fourth solution. Let's say we had a lock, and that lock had a couple of different operations. The first operation is acquire: you wait until the lock is free and then you grab the lock. The second is release: that unlocks the lock, releasing anyone who happens to be waiting. And we're gonna say that these are atomic operations. So if two threads come along and both simultaneously try to grab the lock, one will succeed and grab the lock, and the other will wait. Again, waiting is part of synchronization. So now our solution is easy. We simply acquire the milk lock, if there's no milk we buy milk, and then we release the lock. And once again, the critical section of code is between the acquire and the release. So again, we have our critical section of: if no milk, buy milk.
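With a lock, the sketch collapses to a few symmetric lines that work for any number of roommates. Here Python's `threading.Lock` stands in for the acquire/release lock just described (a simulation of the idea, not the nachos API):

```python
import threading

milk = 0
milk_lock = threading.Lock()   # the "milk lock" from the example

def roommate():
    global milk
    milk_lock.acquire()        # wait until the lock is free, then grab it
    if milk == 0:              # ---- critical section:
        milk += 1              #      if no milk, buy milk
    milk_lock.release()        # unlock, waking anyone who's waiting

# Identical code for every roommate, and it scales to any number of them:
threads = [threading.Thread(target=roommate) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()
print(milk)  # always exactly 1
```

Unlike the two-note version, every thread runs the same code, and adding a sixth roommate changes nothing.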
So now we've just done a bit of sleight of hand and swept the problem under the rug. The next thing we need to do is implement a lock. So how are we gonna do that? So remember, a lock is something that prevents someone from accessing something. We lock before we enter the critical section, before we access this data that's shared. When we're done accessing that shared data, we simply unlock. And if you encounter a locked lock, then you wait. So again, if we're gonna wait a long time, we should sleep. We should not be busy waiting. So our implementation of lock should not simply say, oh, if the lock is busy, then spin in a while loop. Because that's wasting CPU. And ironically, if we only have one single CPU, that's also keeping the other thread from running, because we're holding the CPU and just sitting there spinning. So one solution, and if it can be done in hardware, some hardware designer has done it, is to implement a hardware-based lock instruction. So we can ask the question, is that a good idea? Just because we have the transistors doesn't mean we need to use them for something. And the problem is, think about how this would interact with the operating system. If we want this hardware lock primitive to put a waiting task to sleep, then it's gotta have some way of talking to the scheduler to say, put this task on the wait queue. That's a pretty complicated API for an instruction to implement. Worse yet, it's tied to a particular scheduler or a particular scheduler API. So what might work really well for OS X might not work nearly as well for Windows. So now we've got a processor that works with Macs, but doesn't work with Windows. Probably not a good strategy if you're trying to make a general-purpose processor and you're trying to stay in business. The other side of it is the argument that you've heard over and over in the lower-division courses about complexity.
You make the hardware more complex, it's gonna run slower, it's gonna be more difficult to implement, and it's gonna take up more area on the die and so be less energy efficient. So for all those reasons, even though people have in the past implemented hardware lock instructions on processors, it's not a good idea. Not the way we wanna solve this problem. But we do have something that allows us to prevent another thread from running, and someone raised it in a comment, which is enabling and disabling interrupts. So let's implement our lock doing that. All right, so remember, the way that the dispatcher gets control is either internal, because you voluntarily relinquish the CPU, so you yield or you call some operation that can block, like reading from the disk or writing to the disk, or external. So an interrupt occurs, a network packet arrives, a timer interrupt goes off; all of those can cause us to have to relinquish control of the CPU. So we can avoid doing the things that would cause us to internally give up control, and we can disable interrupts to prevent external events. Now, we're gonna see later on in the semester, when we talk about virtual memory, that avoiding internal events gets really complicated when you have virtual memory, but don't worry about that for now. Let's assume we don't have virtual memory. So we can disable interrupts, and that'll cause us to no longer be preempted by an interrupt coming in, either a timer interrupt or some other asynchronous event. So a naive implementation of locks is simply: when we want to acquire a lock, we disable interrupts; when we want to release the lock, we enable interrupts. So now how can we use this? The problem is that we can't allow this, because a user could do something like this: acquire a lock and then while(true).
Now, most users aren't going to implement code that does this, but as you're gonna learn, some of your project partners might implement code that has this behavior of just going off in an infinite loop and not terminating until the autograder kills it. You laugh, but I promise you it's going to happen. So if we allowed this kind of code, what happens, right? A user program goes off chasing wild geese and our system effectively hangs. Even if your project partners wrote good code and it didn't do this, there's no guarantee on how long that critical section might be. And if we have a real-time system, we lose all ability to have guarantees on timing. So think about the space shuttle, right? The space shuttle gets a gyroscope update that says it's tilting over too far the wrong way. And so it's gotta gimbal the rockets and adjust the flight control surfaces within a really short amount of time to correct that, or the problem's gonna get worse, right? If you have arbitrarily long critical sections, there's no guarantee you're gonna get those interrupts that say something is going wrong in time to be able to correct it. And there are lots of other kinds of important events that might occur. Like if this is a reactor safety-critical system, it might be saying the reactor's melting down, might be time to push in the control rods. And if your code's busy doing the too-much-milk thing, well, you won't need milk anymore. So there's gotta be a better implementation here, right? Instead of trying to disable interrupts for the entire critical section, let's just make acquiring and releasing the lock a critical section. So sort of a meta critical section. So we're gonna have a lock variable, and we're gonna impose mutual exclusion on accessing and updating that lock variable. So our lock code is now gonna look like this. Our lock initially starts out as free. In our acquire, we're gonna disable interrupts.
Now, we're gonna check to see if the lock is already acquired. If someone already has the lock, then we're gonna put the thread on the wait queue, go to sleep, and enable interrupts, and we're gonna do all of that at the same time. Otherwise, we're gonna set the lock to busy, acquiring the lock, and then enable interrupts. Our release code is going to do something similar. It's gonna disable interrupts. Then it's gonna check to see, is anybody on the wait queue? If there is anyone on the wait queue, then it's gonna take a thread off of the wait queue, put it on the front of the ready queue, and enable interrupts. So we're basically transferring the lock to that thread that we just put on the ready queue. Otherwise, if there's nobody waiting, we're gonna set the value to free. So let's discuss this. By disabling interrupts, we're able to avoid anyone interrupting us between the check and the setting of the value, or between the check of whether anybody's waiting and the release. No one else can come in here to look at the value of the lock or to manipulate the queue waiting on the lock. Only one thread at a time is gonna be able to do that. Yes, question. So the question is, each time we get run after being taken off the wait queue, are we gonna have to run acquire again? No, because what'll happen is we go to sleep here, and when we get woken up, we're the one holding the lock. So all we have to do is just enable interrupts and we're done. I'm sorry, I don't follow your question. So the question is, does this work for two threads? It'll work for n threads. We could have as many threads as we want. And we're gonna guarantee that only one thread at a time is going to be able to come in and look at the lock variable, manipulate the lock variable, or manipulate the wait queue.
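We obviously can't disable real interrupts from user-level code, but the structure of this acquire/release can be sketched in Python. Here a single global mutex stands in for disabling interrupts (only one thread at a time may examine the lock value or the wait queue), and an `Event` stands in for going to sleep and being woken. All the names here are my own, not the lecture's pseudocode:

```python
import threading
from collections import deque

class SleepingLock:
    """Sketch of the lecture's lock: 'disable interrupts' is modeled
    by a meta-mutex guarding the lock value and the wait queue."""
    def __init__(self):
        self._interrupts = threading.Lock()   # stands in for disable/enable interrupts
        self._value = "FREE"
        self._wait_queue = deque()

    def acquire(self):
        self._interrupts.acquire()            # "disable interrupts"
        if self._value == "BUSY":
            wakeup = threading.Event()        # our handle for sleeping
            self._wait_queue.append(wakeup)
            self._interrupts.release()        # "enable interrupts" and sleep together;
            wakeup.wait()                     # Event semantics avoid a lost wakeup,
                                              # and when woken the lock is already ours
        else:
            self._value = "BUSY"              # grab the lock
            self._interrupts.release()        # "enable interrupts"

    def release(self):
        self._interrupts.acquire()            # "disable interrupts"
        if self._wait_queue:                  # anyone sleeping?
            self._wait_queue.popleft().set()  # hand the lock straight to that thread
        else:
            self._value = "FREE"              # nobody waiting: mark it free
        self._interrupts.release()            # "enable interrupts"

# Exercise it: several threads each bump a shared counter under the lock.
lock, counter = SleepingLock(), 0
def worker():
    global counter
    for _ in range(1000):
        lock.acquire()
        counter += 1
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 4000
```

Note the handoff in `release`: when a waiter exists, the value stays BUSY and ownership transfers directly, which is why a woken thread doesn't have to run acquire again.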
Because in acquire, if two threads try to acquire, one is gonna disable interrupts first, and then we can't context switch to the other one, because there are no timer interrupts and we're not gonna do anything to cause a voluntary yield unless we choose to. Now, if we get into acquire and find that the lock is already held, then we're gonna put ourselves on the wait queue, go to sleep, and re-enable interrupts, and then the other thread can now run and enter this procedure. And it's gonna enter and find again that the lock is acquired, and so it'll put itself on the wait queue in the second position, and then it'll go to sleep and re-enable interrupts. Same thing over here: only one thread's ever gonna be releasing the lock, because it's gonna be the one holding the lock. We wanna make sure that there isn't a race condition between someone trying to acquire the lock and put themselves on the wait queue, and us checking the wait queue to make sure we can wake someone up and move them over. Yes. The ready queue is the queue of threads that are ready to run. So if you're on the ready queue, you can be scheduled to run at any time. If you're waiting, you can't be scheduled, because you're waiting on something to happen. So when you go on the ready queue, and especially if you get put at the front of the ready queue, you'll be the next thing that gets to run. So when this thread gives up control, you'll then run. Yes, you next? Ah, good question. So the question here is, what if we have multi-core? Are we disabling interrupts for everybody? So in this case, we're gonna assume we have a single core, a single CPU, so just one thing can run at a time. It gets more complicated when you wanna think about disabling interrupts across multiple cores or across multiple sockets in a multiprocessor system. Yes. So the question is, why did I put it at the front of the ready queue? Because I wanna make sure it's the next thing that gets scheduled to run.
We'll talk a little bit later, when we talk about monitors, about how you transfer control and the different styles. The textbook will use this style where you get put at the front of the queue and you immediately transfer control; that's Hoare style. In a modern operating system, you don't have that control, and that's Mesa style. Okay. So, well, we're running out of time. That's good, actually; it means we're getting lots of questions. Okay, so we disable interrupts so that we can make this a critical section and prevent two threads from acquiring the lock at the same time. This is our critical section in acquire. And unlike the previous solution, the critical section is really short, because the time it takes us to acquire or release is really small. So once we've acquired the lock in that critical section, interrupts are enabled and we can run for as long as we want. Short time, long time, doesn't matter; interrupts are enabled. So the key thing is this allows critical interrupts to be taken quickly. Now, where do we re-enable interrupts in acquire? This is a hard question. We could choose to re-enable interrupts before we put the thread on the wait queue. But the problem with that is that release could then check the queue and see we're not on the queue, and so we wouldn't get woken up, right? Because release comes along and says, oh, nobody's waiting, I'll set it to free, and then returns. And then we get to run again, and we put ourselves on the wait queue and go to sleep. Now, if no one ever acquires the lock again, we'll sleep forever. If someone else acquires the lock, then they will see that we're on the wait queue. But you really don't wanna write your programs assuming that; it would not necessarily work. Another choice would be after we put the thread on the wait queue, but before the sleep. Now we have a problem that's even worse, right? Because release can put the thread on the ready queue, and then what do we do? We immediately go to sleep while holding the lock.
This is deadlock, right? Because we're holding the lock and we're never gonna run again, because no one is ever gonna wake us up, and no one else can now acquire that lock. So we really wanna re-enable interrupts after the sleep. But how do we do that? We know that interrupts are disabled when we call into sleep. So we can make it the responsibility of the next thread that runs to restore interrupts when it comes back from sleep. So that looks like this. Thread A disables interrupts and then goes to sleep. We context switch to someone else, and they return from their sleep routine and re-enable interrupts. Now they run for a while, and then they acquire some lock and have to sleep, and so they disable interrupts. And when we get context switched back, because we're now able to run because the lock is now ours, we're going to re-enable interrupts also. Now, this is just context switching between two threads that are sleeping. But there may be other things that cause threads to give up control, like yield. And so we're gonna have to make sure that the programming discipline for our kernel, our operating system, is that before you context switch, you disable interrupts, and when you return from a context switch, you re-enable interrupts. Now, if someone forgets to re-enable interrupts because they add some new function to the operating system, then sometimes the code will work and sometimes the code won't work. You're gonna encounter something very similar to this when you look at the nachos code: you have to make sure that you enable and disable interrupts appropriately. Nachos provides some functions to make it easy to figure out whether you're supposed to enable or disable interrupts. If you forget to do that, your program will run sometimes, and then when we run the autograder, it won't. Okay, so in summary, we talked about a very important concept today, which is atomic operations.
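The discipline described here, interrupts off before the context switch and restored by whichever thread resumes afterwards, can be shown as a toy trace. This is purely illustrative; the function names and the boolean flag are my own stand-ins for real interrupt hardware:

```python
# Toy model of the kernel discipline: disable interrupts before a
# context switch; the *resuming* thread re-enables them on return.
interrupts_enabled = True
trace = []

def disable_interrupts():
    global interrupts_enabled
    interrupts_enabled = False

def enable_interrupts():
    global interrupts_enabled
    interrupts_enabled = True

def go_to_sleep(me, next_thread):
    # Discipline: the caller has already disabled interrupts.
    assert not interrupts_enabled
    trace.append(me + " sleeps")
    resume(next_thread)

def resume(thread):
    # The thread that comes back from the switch restores interrupts,
    # on behalf of whoever disabled them before switching away.
    trace.append(thread + " resumes")
    enable_interrupts()

disable_interrupts()    # thread A disables interrupts...
go_to_sleep("A", "B")   # ...and sleeps; B resumes and restores interrupts
print(interrupts_enabled)  # True
```

The point of the model is the asymmetry: A turns interrupts off, but it is B, the next thread to run, that turns them back on, which is exactly the convention nachos asks you to follow.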
So again, these are operations that run to completion or not at all; they're indivisible. This is the primitive on top of which we're gonna construct various types of synchronization primitives. We showed how we could construct locks using interrupts. It required very careful use of disabling and re-enabling interrupts, and careful programming paradigms for our operating system. We have to be very careful that we don't waste or tie up machine resources by spinning. We also wanna make sure we don't do things like disabling interrupts for very long periods of time; that's only gonna cause us problems. So the key idea here is that we're gonna use a separate lock variable, and we'll use some kind of hardware mechanism to protect against concurrent modifications of that variable. From now on, we're gonna keep going up the stack, adding more and more sophisticated primitives, not just atomic loads and stores. So any questions? All right, see you on Wednesday. Thank you.