All righty, welcome back to operating systems. Today we're talking about semaphores. The name comes from using flags to signal ships; unfortunately, ours will be less fun than that. They're software semaphores. So let's briefly review what we already know. Locks ensure mutual exclusion. Remember, anything between the lock and the unlock is called the critical section, and only one thread is allowed to execute it at a given time. A lock doesn't help you ensure ordering between threads. So how could we ensure some ordering between threads? Let's look at today's problem. Our first thread prints "This is first," and the second thread prints "I'm going second." If we set this up, let's go to our example: we create two threads, then wait for them to finish. In this case they are kernel threads, and we're not controlling the scheduling at all, so we have no idea which one is going to run first. If we go ahead and run this, well, we might see "This is first" and then "I'm going second," which is what we want. One time. The second time we run it, not so lucky. Run it again, not so lucky. Run it again, not so lucky. So knowing what we know already, how would we fix this so that one executes and then the other? And the solution can be as lame as you want. Yeah. Sleep? Okay, not that lame. Slightly less lame. We could use a global flag variable, but that sounds complicated, because then we'd probably have data races and would need locks on top of it. Could we make this work just by switching around some lines of code? That's lame, but yeah: just move the join before the second create. It's a bit lame, but it will work. It creates the first thread, that thread prints, the join in the main thread waits for it to terminate, and only then does it create the other thread, which prints "I'm going second."
But now we're in the case where we're essentially only running one thread at a time, which is kind of crappy, especially if the code after the print statements takes a thousand seconds. Even on a machine with two cores, with this solution the first thread will take a full thousand seconds, then the second will take another thousand, so the whole thing takes two thousand seconds. A bit slow. So without using anything special, can I fix this with just a simple lock? This will take some creativity. Let's see what happens if I create a mutex and initialize it, and then do something crazy: in the second thread, I'll do a lock, and then I'll do another lock. Two locks in a row. Why? I want this line to always print first and this line to always print second. If the second thread executes first, its first lock call succeeds because the mutex is currently unlocked, so it essentially gets the key. Then it tries to lock again and can't make any progress, because, well, it already holds the lock. Then in the first thread, I unlock it. So the second thread locks, tries to lock again, and blocks, because the mutex is currently in the locked state. Then we switch over to the first thread, print "This is first," and unlock the mutex, and then the second thread can proceed and print "I'm going second." Let's see if my crazy idea worked. Okay, apparently not, because... what the heck did I do? How did I get pages of errors? Okay, don't know where that line came from; that's a mystery to me. If we run now: hey, look, it works. "This is first," "I'm going second." "This is first," "I'm going second." Kind of works. I saw the first message first. But it looks a bit weird; it kind of hanged. Kind of hanged, you know.
User's problem: they used it wrong. And then it works again. Yeah, at the end of the program the process finishes and everything's gone. So what's happening in the case where it just hangs? We have to consider the situation where thread one runs first. Initially the mutex is unlocked. If for some reason the first thread executes first, it prints its line and then unlocks the mutex, which is already in the unlocked state, so that does nothing. Unlocking an unlocked mutex does nothing. Then when the second thread executes, it gets the lock, tries to get the lock again, and nothing is ever going to unlock it for it. So it sits there and waits forever; it essentially deadlocks itself and can't make progress. We'll learn more about deadlocks in the next lecture. So this doesn't always work. It gets us a bit closer, but it's fairly lame. Also, if you look at the documentation for mutexes, we're not actually allowed to do what we did. By the spec, you are only allowed to unlock a mutex from the thread that locked it. So one thread locking it and another thread unlocking it is undefined behavior. It appears to work with this implementation of pthread_mutex, but if you read the documentation, you're not guaranteed that. So let's clean that up. You had a good idea earlier: make some flag or something like that. In computing we want something fairly generic we can reuse, not something we re-solve every single time. So we'll use something called a semaphore. A semaphore is basically just a value; you can think of it as an unsigned int, so it always has a value of zero or greater. And it has only two operations, both atomic. Remember, atomic means they either happen or they don't, with no in-between, so we don't have to worry about data races on the value itself. The first operation is called wait: it decrements the value.
So wait would change the value from one to zero, or two to one, and so on. The other operation is post: it atomically increments the value, increasing it by one. The reason this works and is useful is that wait has another special rule: wait will not return until the value is greater than zero. If the value is zero, whoever called sem_wait (which is different from the process wait) gets blocked until it can actually successfully decrement the value, because the value is not allowed to go negative. If the current value is zero, it has to wait until the value is at least one, and then it decrements it from one to zero and passes. So when you use a semaphore, there are two operations: post increments, and wait decrements but blocks until the value is greater than zero. The other part of using one is setting its initial value. The API is fairly similar to pthread locks. You include semaphore.h, and you create one with sem_init: you give it a pointer to the semaphore type, and then there are two additional arguments. The pshared argument specifies whether this semaphore should be shared when you fork. If this value is one, indicating you want it shared across processes, it goes in shared memory; you can use this to handle ordering between processes if you really want. If it's zero, then after a fork each process has its own independent semaphore and they won't affect each other, like we know and love. The third argument is the initial value for the semaphore. Then you can destroy it with sem_destroy. Then there's sem_wait, which again decrements the value, or, if the current value is zero, waits until it's positive. And sem_trywait is a non-blocking version of wait.
So instead of waiting until it's allowed to decrement, sem_trywait just tries to decrement and tells you whether or not it did. It's simply the non-blocking version, in case that's a useful thing for your implementation. The other one is sem_post. There's no try-post, because remember, post just atomically increments the value; it never needs to block. And you can check all the return values: if they're successful, they return zero. So now, with the semaphore, which is basically a fancy version of your flag idea, let's see if we can make this always print "This is first" and then "I'm going second." If we want to use one at all, we should probably include semaphore.h. Then we'll just make one. I'm feeling creative today, so I'll call it sem. How would we use it to make sure this thread prints first and this thread prints second? Usually the first and easiest thing to do with semaphores is to think about where to place the wait call. What do we want to prevent from executing first? That's probably where our first wait call should go. So what do we want to prevent from running first? Yeah: we want to prevent the second print from running first. So we should probably have a sem_wait here. That alone will just check the current value, and if the current value is zero, it will block until someone does a post on it. Right now, well, we didn't initialize it, so we should probably initialize it: sem_init, with our sem, and we don't care if it's shared. What should the initial value be? Zero. With an initial value of zero, if thread two executes first, it hits the sem_wait, the current value is zero, so it cannot successfully decrement it, and it just sits there and blocks until someone else increments it. So who increments it? Well, let's see.
If we just did a sem_post immediately at the top of the first thread, that wouldn't really prevent anything, because it would immediately change the value from zero to one. Then we could context-switch over to the second thread, it would be unblocked from the wait because the semaphore's value is now one, it could decrement it from one to zero, and it could print before "This is first." So what should we do? We should move the post until after the printf. Now we can argue about it both ways. If thread two executes first, it hits the sem_wait; the current value is the initial value, which is zero, so it can't progress at all and gets blocked. If we execute thread one, it prints "This is first," then increments the value from zero to one, and the second thread gets unblocked and prints "I'm going second." And the rest of the code, the part that takes a thousand seconds, can execute in parallel. The only thing we're preventing is this printf message going ahead of the other one; otherwise the threads run in parallel for the rest of the time. So let's go ahead and make sure we're sane. "This is first," "This is first." We can run it as many times as we want, and it always prints "This is first" first. But better than trying it a thousand times and seeing it always work, we can just look at it and argue about it. That's the fun part. You're going to have to start getting used to this for concurrent programs: just because a run doesn't exhibit a bug doesn't mean you have thoughtfully designed your program such that it doesn't have any data races or other issues. Any questions about that? All right. What if I come along and say: I don't like that initial value being zero, I want it to be one, because that makes me feel better. Is that good? No. Essentially I'm just wasting time now; I didn't prevent anything.
So if the initial value is one, and the second thread goes first, its wait can immediately pass: it changes the value from one to zero, prints, and we didn't prevent anything. "I'm going second" happened first. Otherwise, if the first thread happens to go first, it doesn't really prevent anything either: it just posts the value from one to two, and then the wait decreases it from two to one. We're not changing anything. So the initial value is very important. Ninety? That wouldn't do anything either. All right, any questions about semaphores? Are we experts now? Yep: if pshared is one and you fork, both processes see the exact same value, so you can use it to synchronize across processes. Mutexes cannot, because they have to be owned by a thread. So you can use semaphores across processes. But be careful if you're on macOS: it doesn't support these unnamed semaphores, and its sem_init implementation does literally nothing. So if you're depending on this on a Mac, you're out of luck. Yep? Oh, that's a good question: what can a semaphore do that a mutex can't, and the other way around? In fact, we can use a semaphore as a mutex; a mutex is basically a special case of a semaphore. How would I replace a mutex with a semaphore? What should my initial value be? Right: one. If I set the initial value of my semaphore to one and then replace lock with wait and unlock with post, it behaves exactly like a mutex. With that initial value of one, and always a wait followed by a post: one thread waits, decrementing the value from one to zero, and goes ahead into the critical section. Another thread comes in and tries to do the lock, which is just a wait, and it can't progress until we're done and post the value back from zero to one. Then the other thread, the only one there, can proceed, and it does its own post as part of the unlock.
So a mutex is literally just a special case of a semaphore. But with synchronization primitives, it's really important to use whichever one matches your intention. If you just want mutual exclusion, use a mutex, because that's all a mutex can do, so it's really clear what your intention is. Even though you could use a semaphore, generally you only use semaphores for ordering; you do not use them to ensure mutual exclusion. If you want mutual exclusion, use a mutex. If you need ordering, use a semaphore. Don't use a semaphore for mutual exclusion, because most of you, by the end of this course, will see mutexes everywhere and may or may not remember what the hell a semaphore even does. If you read some code later, you might think: what the hell is a semaphore? I forget. Whereas mutual exclusion with a mutex is easy; you don't have to think. And semaphore behavior also depends on the initial value: if you change the initial value from one to two, for example, suddenly it doesn't ensure mutual exclusion at all, even though your "lock" and "unlock" calls look correct. That initial value is really important. As another example, since you're now going to be using other people's code, you have to ensure that code is what's called thread safe. If I were to write my own definition of printf, it could do something silly like this: three separate write system calls, breaking one line into pieces, assuming your program just runs sequentially. That's fine sequentially. And in print_second, I make three more system calls for "I'm going second." If I only have a single thread on my machine, this looks the same as using printf normally. But if I run it with two threads now, sometimes I get a nice line: "This is first," "I'm going second." We got lucky. Got lucky. Got lucky. Got lucky. No, wait: we got unlucky.
We got "I'm this is going first, second." Yeah, that makes sense. Oops, I was just compiling. "I'm first. I'm second." Lucky, lucky, lucky. "I'm this is going first, second." Yikes. Yeah, that one happened to interleave the same way. So how would I fix this to make sure it's thread safe, meaning it does what I intended? If I start printing a line, I want the entire line printed all at once, with no weird intermixing. How would I fix this? Yeah, we just want a lock here, because we essentially want to make sure that, hey, this sequence runs to completion without being interrupted by the other one. The easiest thing to do is use a mutex. I'll give it a creative name. pthread_mutex_lock. If I use the same mutex in both functions, I have mutual exclusion between the two critical sections. If this thread starts executing and gets the lock, we know for sure it's going to finish its three write system calls before the second thread can do anything. Similarly, if the second thread gets the mutex first, it does its three calls before thread one can do anything. So with this fix, I don't get any ordering between the lines, but I make sure I definitely get one whole line at a time. This is going to be a problem when you use libraries, because this is technically a valid implementation of printf from what I showed you, but you wouldn't want to use it with multiple threads, because it's not safe. So when you look at documentation for functions in a threaded application, you want to make sure they are thread safe: safe to call from multiple threads without the calls affecting each other and without messing up whatever you intend. Luckily printf is thread safe. Luckily malloc is thread safe. You can call them from multiple threads and you're guaranteed not to have issues.
But in the real world, if you're using threads and you call someone else's function, and you ask them, hey, is the function you wrote thread safe, and they reply "what the hell is a thread?", you do not use their code, because it will probably not work. That's why this is a required course: you're going to want things to go fast using threads, you're going to have these issues, and you need to look for at least thread safety. Luckily, all the C standard library stuff is thread safe; but again, other people's code is not guaranteed to be. Okay, so here's that example on the slide: using the semaphore as a mutex. Note the value: we set the initial value to one, and instead of lock we do wait, instead of unlock we do post. It's a silly thing, but we could use it as a mutex this way for that count example we saw before. All right, we are experts now. So you've probably seen this before: the producer-consumer problem. We're going to make it go fast. Assume we have a circular buffer. What's a circular buffer? It's just a fancy way of saying an array that wraps back on itself. If the array has n elements, the indices are zero to n minus one, and when we're at n minus one and go to the next element, instead of saying the list is done, we just go back to the beginning, back to index zero. With this, we're going to try to make it go fast: we'll have multiple producer threads and multiple consumer threads. What the producers do is write data into this array. They always start at index zero, then write index one, two, three, four, five, six, then wrap back to zero, one, two, and so on. It just keeps going like that. The consumers read data from the buffer in the same order.
So first they try to read the data in slot zero, then one, then two, then three, then four, and so on; same order. Since we have multiple threads producing data and multiple threads consuming data, we want to make sure we don't do anything silly. One silly thing would be reading from a slot that doesn't have any valid data in it. We should only read slots with filled-in data; reading empty data just wastes our time, since we can't do anything with it. The other thing to prevent is producers overwriting data. Say we had seven producers: they could produce an element in each slot, fill the buffer up, and then they should not overwrite whatever is in slot zero until a consumer has read it. Some consumer needs to have read it; otherwise we'd overwrite the data in position zero and lose it forever. So let's make this work. Let's go to our code example. There are a few things going on. My buffer size is 10, and I've written this program, ignore most of it for now, with quite a few arguments. We'll just use the first two: the first is the number of producer threads, each of which will be a thread, and the next is the number of consumer threads. All the producer threads call this function. Don't worry about the while condition; it's just there so we have a bounded number of entries to produce. If you were really writing this program, it would probably just run forever, something like while (true), or until you run out of data. In this case I'm just using a semaphore to keep track of the number of items left to produce. So ignore the while line and assume it says while (true). All we care about is that every producer thread executes the body of the while loop.
So it sleeps for a little bit of time to simulate doing some work: producing data, grabbing something from, I don't know, somewhere on the internet. Then, after it has produced some data, it fills the slot. We want the producing part to happen in parallel with all the other producers and consumers. The fill_slot function I've written for you: it increments an index that's shared between all the producers. I wrote it, so you know it's thread safe; of course it is. It fills slots in order. And in the consumer, again, ignore the while loop: each consumer tries to empty a slot and then simulates doing some work on that data. So let's run this, with, say, 10 producer threads and only 3 consumer threads. I have it print a red line whenever something bad happens. So my consumers try to empty slot zero, but no data has been produced in slot zero yet, so I print an error that the slot is already empty. Why did you read it? Then my other threads empty slot zero again, another error, and empty slot two, another error. Then my producers run: they fill slots zero, one, two, three, all the way up to nine. So zero through nine are all full; the whole buffer has valid entries. Now my consumer threads empty slot three, which has data in it, then slots four and five, which also have data. Then my producer threads keep going: they fill slot zero, but no one has read slot zero yet, it was already filled, so that just overwrote some data. Then they fill slot one, also overwriting data, and slot two, also overwriting data. Then they fill slots three and four, which is fine, because those were already read.
And then we're out of elements to produce, and the consumers just empty the rest of the slots. Any questions about that? It seems pretty bad. All right, how would we fix it? As usual with semaphores, the easiest first step is to place a wait to prevent something bad from happening. So what is something bad that's happening that I should prevent with a wait? Yeah: I should probably have a wait here, in the consumer, because I want to make sure I don't empty a slot that has no data in it. That's a prime candidate for a wait. What should I call my semaphore? Yeah, filled slots. Let's call it filled_slots: we'll wait for at least one filled slot. So let's go ahead and create it, and now we have to initialize it. I have code that calls an initialize function so we can have everything nice and together. We don't care if it's shared, so pshared is zero. What initial value should we use? Zero, probably. With zero, if a consumer thread runs immediately, it hits the wait, the current value is zero, and it gets blocked immediately: it cannot empty an already-empty slot. So that's good. Now if I run this, will it work? Not yet: I need a post. Where do I need a post? After I fill a slot, okay: sem_post(&filled_slots). All right, am I good now? Let's see. We feel pretty good, but what did I miss? What else do I have to do? Another semaphore, yeah. So look at the first problem in this run. We were filling slots, that was good; we didn't run the consumer first. We filled three slots, emptied two, filled another five, emptied another one. Then we fill, fill, fill, fill: fill slot zero, fill slot one, two, three... uh oh, we filled a slot that was already filled. We went too fast.
So essentially the producers caught up to the consumers and shot past them. How do I prevent that? A second semaphore, right. What should the second semaphore keep track of? The number of empty slots. My problem was in the producer, so the first thing to do is put my wait there. I'll call it something like empty_slots, and then let's go ahead and create it. Let's hide this. There, am I done? No: in the consumer I need a post, right after I empty a slot: sem_post(&empty_slots). There, am I done now? What about the initial value of empty_slots? Where I define it here, it's just BUFFER_SIZE, because initially every slot is empty. Are we good now? All right, let's try. Magic. And even if we go the other way, it works; no one's that lucky. Any time I ran it without the fix, this 10-producer, 3-consumer configuration always misbehaved. But now watch. We know there are more producers than consumers in this case. We fill slot zero and one, then empty slot zero, fill slots two and three, empty slots one and two, fill slots three, four, five, six, seven, eight, nine, wrap around, and fill slots zero, one, two. And then it waited, even though there are way more producers that could have filled more slots immediately. Thankfully, it waited: it waited for slot three to empty, and immediately filled slot three; slot five emptied, and it immediately filled slot four; and then the producers hit their last fill, so the consumers just emptied the rest of the buffer, and we were all good. And if we go the other way, with more consumers than producers, we'll see that as soon as we fill a slot, it pretty much immediately gets emptied: fill a slot, it gets emptied, fill two slots, they get emptied, fill a slot, it gets emptied, and so on.
So this will always work; none of our bad conditions can happen. We're good. And we just solved the hardest problem we'll see, but it wasn't too bad, because we broke it into little steps: place a wait, set an initial value, place the post, then do it all again for the second semaphore. So here's our skill-testing question: what would happen if I set both initial values to zero? Something would block forever? Yeah. If both initial values are zero and a producer runs first, it hits its wait and can't progress, because the current value is zero. It doesn't matter how many producers there are; none can progress. And if a consumer calls wait, the current value of filled_slots is also zero, so none of them progress either. They're all waiting for something to post, and nothing is ever going to post. This is what's called a deadlock, which we'll see more of next time: no thread can make any progress whatsoever, all because we set the initial values wrong. So it's very important to set our initial values correctly. It's all in the slides: if we initialize both values to zero, everything just blocks, and we get a deadlock, which we'll go into in more detail tomorrow, because, spoiler alert, you can deadlock using multiple mutexes as well. It's basically the same thing: each thread holding another thread hostage. So what we did today: we used semaphores to ensure some proper ordering between threads. We saw mutual exclusion before using mutexes, but now we can do more synchronization between threads; they can actually run in an order we determine. Semaphores are pretty easy to use. They're just an unsigned value; you can set the initial value to whatever you want, and there are two operations, both atomic. Post increments the value by one; in the literature it's sometimes called V. Math people like shortening things.
Wait decrements the value if it is greater than zero; if it is zero, it blocks until it can successfully decrement without the value going negative. In the literature, that one is called P. Don't ask me why; the letters come from Dijkstra's Dutch. With all this, just remember: you still need to prevent data races. Ordering does not prevent data races at all; that's what mutual exclusion is for. Don't forget that. But we have something else in our toolbox now. Oh, also: lab four. For the early testing, like half of you didn't submit anything, so you have another day to do that, in case you want the easiest five percent you will get. Submit something, even a lame test, if you really just want the five percent. All right, remember: I'm pulling for you. We're all in this together.