Well, all right, welcome back. I guess some of you came back. So the midterm, hopefully, wasn't too bad. At least everyone I talked to in the hallway said it was reasonable. I don't know if they were just lying to me, but it seemed OK-ish, so I guess we'll see. Everything's being scanned now, so hopefully it'll be back to you sooner rather than later. Also, for the Lab 3 stuff, if you want to add tests, go for it. I've added two more tests, so thanks to the people that did that. There are going to be hidden test cases, so the more you add to the global pool, and I'll just curate them quick, the better everyone's solutions will probably be. And today, let's get into semaphores for some more fun synchronization.
So previously, we looked at locks, and they ensure mutual exclusion, so only one thread can get by a lock at any time. Between lock and unlock, there's only one thread there at a time, so you don't have any data races on whatever that lock protects. And if you don't have any data races, you're all good; you won't get any weird unexpected results. But the problem with locks being your only synchronization is, well, maybe I actually want to ensure some type of ordering between threads. So how could I do that? We already know a little bit about how to ensure some ordering between threads based off what you've hopefully implemented in Lab 3. Like if I want one thread to definitely finish before another thread, yeah, I can join on a thread. If I join on a thread, I should be guaranteed that the thread I'm joining on will terminate before I wake up again. So there's a little bit of a way to ensure order there, but that's probably not good enough. So let's have a little example.
So we'll have two threads. One thread just prints "This is first", and the other thread wants to print "I'm going second". Your task today is to make sure that the one that wants to print first always prints first, and the one that wants to print second always prints second. And since we now have multiple threads, you also have to look at the documentation for any functions you use to see if they're thread safe, which means you can call them from multiple threads and nothing bad will happen. Thankfully, most of the standard functions are thread safe, so they accounted for that, but you may have to look it up in the general case. If you don't, you'll have some really weird bugs.
So let's go ahead and switch to that and start executing it. Here's my ordered print code. In main, we're going to create two threads. This essentially looks like your Lab 3, but more complicated, because your Lab 3 has a definite order since there's only one actual kernel thread. With pthreads, we create one thread and tell it to start running print first, then we create another thread and say it should run print second, and now we're up to the scheduler. It could schedule one thread or the other, or both of them at the same time. Who knows. Then the main thread just joins both of them to make sure they both finish before main eventually returns zero, which is an exit, and then the process is done. Print first is just a print line, and print second is just a print line.
So let's go ahead and execute it. In this case, hey, yay, it worked. Good, I'm done, everything works perfectly. But if I run it a few more times, I get, oh crap, now it says "I'm going second" and then "This is first". It's in the wrong order. So what's one easy way I can make sure that I always see "This is first" printed before "I'm going second"? Sorry? Without making it sleep.
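The setup just described can be sketched roughly as below. This is not the lecture's actual file: the names print_first, print_second, and run_unordered_print are assumptions made for illustration, and the code is wrapped in a function rather than main so it can be dropped into a test program.

```c
#include <pthread.h>
#include <stdio.h>

/* Thread body: prints the line that is supposed to come first. */
static void *print_first(void *arg) {
    (void)arg;
    printf("This is first\n");
    return NULL;
}

/* Thread body: prints the line that is supposed to come second. */
static void *print_second(void *arg) {
    (void)arg;
    printf("I'm going second\n");
    return NULL;
}

/* Hypothetical driver: creates both threads and joins them.
 * Nothing here orders the two prints, so either line may appear first.
 * Returns 0 if both threads were created and joined, -1 otherwise. */
int run_unordered_print(void) {
    pthread_t first, second;
    if (pthread_create(&first, NULL, print_first, NULL) != 0) {
        return -1;
    }
    if (pthread_create(&second, NULL, print_second, NULL) != 0) {
        pthread_join(first, NULL);
        return -1;
    }
    pthread_join(first, NULL);
    pthread_join(second, NULL);
    return 0;
}
```

Calling run_unordered_print() from main and running the program repeatedly should occasionally show the two lines swapped, since the scheduler alone decides which thread prints first.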
I guess, yeah, but I'd have to mess with the variables. I could move the join so that print second joins on the first thread instead of the main thread joining on both of them, but I'd have to reorder a bunch of my code, and that sounds complicated and I'm lazy. Yeah, okay, so you want me to create a lock and then have one thread lock something and the other thread unlock it? Yeah, so another suggestion is basically that I make another variable as a flag: one thread checks it, the other writes to it, and then something else happens. That would kind of work in this case, because only one thread ever writes and the other just reads over and over until it sees the update. But in a more general case, that's gonna be a disaster because we're gonna have data races, right, if we want multiple threads doing that. So that's not gonna quite scale, but it's a good thought. Someone also asked something about atomic. An atomic operation is just something that either happens all at once or doesn't happen at all; there's no in between. And yeah, we don't know what a semaphore is yet.
So what about if I just do something slightly silly? What about if I just do that? I'm a certified genius by doing that. But in general this is also a terrible thing to do, because I create a thread and then just wait for it to finish, and then I create another thread and wait for it to finish. So it's like, what the hell am I doing here? That's also pretty terrible. A little bit too smart.
All right, we also had a comment of, what about if I use a lock? So I thought about that. What about that? Yeah. Oh, wait, okay. There, yeah, that sounds better. Okay, that better? No, now it's back to normal. Hello, hello. Okay, sure, that was fun. All right, so what about this? Here, I'll do a lock. Does that work?
I get some noes. So the idea was to have one thread lock and then the other unlock. Well, that means I'd need like two locks here, because if I have something that looks like this, say we only have one actual physical thread, and print second goes first and calls lock, it can pass right by the lock because it's the first one to take it, and then it would print the second line. So that's not gonna quite work. But now what if I do this: lock, lock. It would get the lock and then try to get it again, which it can't because it's now locked. So it would just sit there and wait, and the only time it would get through is if the other one prints "This is first" and then hits unlock, right? Yep, so this kind of works. Here, let's even try it. So, ordered print mutex. Perfect, perfect, perfect, yep, we're on a roll. This is great. So good. What happened?
Yeah, so in this scenario, "This is first" went first, before the other thread even got scheduled. It unlocked a lock that was already unlocked, so it essentially didn't do anything. Then the second thread came in, locked it, tried to lock it again, and now no one's around to unlock it, ever. So it just sits there forever. But hey, it worked more often than not when I ran it. It was like nine times, so that's pretty good.
So this is a decent idea, but it's also technically undefined behavior, because if you look at the rules for pthread mutexes, the thread that acquired the lock has to be the one that unlocks it, you can't unlock an already unlocked lock, and so on. I also can't lock the same lock twice from one thread. So this is full of undefined behavior, and full of terrible ideas. Well, not super terrible ideas, because it kind of worked, but we can do a bit better. All right, but we don't really know how to yet, and there's also something else we can solve first.
So sometimes you have to look up whether a function is thread safe or not. printf is thread safe, so it either prints the whole line or nothing; you never get something in the middle. But if printf wasn't thread safe, it might behave something like this, where it makes multiple system calls and might be halfway through outputting the string before it gets interrupted. So if we run that, we're gonna get all sorts of really weird output, because not only are we not ensuring any ordering, nothing is thread safe either. If one starts printing, the other one won't stop printing. All right, so every time I run this, it's gonna be slightly different. That one actually worked.
Is something else broken? Hello? So it's either nothing or, like, no mic is fine? Okay, screw it, I'll just yell. Okay, good. I hate this room. This room is so cursed.
Okay, so yeah, that one magically worked by us getting lucky, but sometimes it prints halfway in between. So could I fix this so it always prints a whole line at a time? This is something you should be able to fix without using printf. Yeah, so this is exactly what a mutex is for. Here, I'll actually just go to the solution. If I wanted to fix this, whoops, go back, I would just create a mutex, lock it, do all the prints, and then unlock it when I'm done. So now we have mutual exclusion. If a thread gets the lock, it'll go through all three writes in a row without giving up the lock, then it unlocks it, and then the other one is able to go. This doesn't ensure ordering, because we still have that same problem where it prints either second then first or first then second and you don't know which, but at least it'll print a whole line at a time. So if a function wasn't thread safe, you'd probably need a mutex to make sure just one thread calls it at a time. So that makes sense. Mutex, good.
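As a rough sketch of that fix, assuming the non-thread-safe print is simulated with three separate write calls: the helper write_piece and the names print_line and run_locked_print are made up for this example and are not from the lecture's code.

```c
#include <pthread.h>
#include <string.h>
#include <unistd.h>

static pthread_mutex_t print_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: writes a whole string with raw write() calls. */
static void write_piece(const char *s) {
    size_t left = strlen(s);
    while (left > 0) {
        ssize_t n = write(STDOUT_FILENO, s, left);
        if (n <= 0) {
            return; /* give up on error; good enough for a demo */
        }
        s += n;
        left -= (size_t)n;
    }
}

/* Prints one line as three separate writes. Holding the mutex across
 * all three keeps another thread's pieces from landing in the middle. */
static void *print_line(void *arg) {
    const char *name = arg;
    pthread_mutex_lock(&print_mutex);
    write_piece("This is ");
    write_piece(name);
    write_piece("\n");
    pthread_mutex_unlock(&print_mutex);
    return NULL;
}

/* Hypothetical driver. Lines stay whole, but their order is still
 * up to the scheduler. Returns 0 on success, -1 otherwise. */
int run_locked_print(void) {
    static char first_name[] = "first";
    static char second_name[] = "second";
    pthread_t a, b;
    if (pthread_create(&a, NULL, print_line, first_name) != 0) {
        return -1;
    }
    if (pthread_create(&b, NULL, print_line, second_name) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```

Removing the lock and unlock calls from print_line is a quick way to see the interleaved output again, although on a lightly loaded machine it may take many runs to show up.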
Okay, so we still want to ensure some type of ordering. I guess that's the title of the lecture: semaphores. They're used for exactly this purpose, and they sound more complicated than they are. The name's, like, from the Navy or some weird thing where it's a bunch of flags and stuff. Thankfully our version of semaphores is dead simple, and you can think of one as just an integer. Even easier, you can think of it as a natural number: zero, one, two, three, four, that's it. So it's just a number, and then you have two fundamental operations that behave atomically, so you don't have to worry about data races on it. Sometimes they're called different things, but I find these two names the easiest, and they match what you'll actually be using: wait and post. All wait does is decrement the value atomically, so if it's a one, it changes to zero. Post just increments it atomically, so a one goes to two, or a zero goes to one.
Now, the only thing that makes semaphores special as a synchronization primitive is that if you call wait and the current value is zero, it puts that thread to sleep and wakes it up whenever the value goes above zero. So semaphores cannot go negative. If decrementing would take it negative, the thread just sits there and waits until it can actually decrement without going negative, and that's it.
Yeah? So the question is, does the thread that calls wait first get woken up first? I think so, but I would have to check the documentation. If it's currently zero and a thread tries to call wait, it'd be like, okay, same as with the mutex if you can't get it: go on the queue, and I'll wake you up in the order you tried. So if we have this now... oh yeah, it waits until the value of the semaphore is greater than zero.
Yeah, it'll sit and wait until it can actually decrement it. Okay, so here's its API. And if you have a Mac, don't use this, because they deprecated it and it doesn't work at all. You can only do this on Linux; these functions do exist on a Mac, but they essentially do nothing, and it's kind of terrible. Init is the same C thing where it wants to initialize something, so you give it a pointer to the semaphore, and you can allocate memory for that wherever you please: on the stack, heap, global, whatever you want. The second argument is pshared, which is just a zero or a one, a Boolean, for whether or not you want the semaphore to be shared with any of your child processes if you happen to fork. If you set it to one and put the semaphore in shared memory so they can both access the same one, you can actually use it to ensure ordering between processes if you really want. But for this course, it'll probably always be zero, because we're just learning how to use semaphores straight up. The third argument is the value. It's an unsigned int, so zero and above, and that's the initial value of the semaphore.
And then of course you can destroy it. Then we have wait, which will decrement it, and if it's zero, it will sleep. And there was a question: if I call wait, will it always block? If you don't want that behavior, there is something called trywait. You can do a trywait and it will tell you whether or not it successfully decremented. So if you don't want to sit there and get blocked, you can do trywait, and it will say yes or no on whether you actually decremented the value. And then post. There's no try-post, because it will always increment the value and there's no problem.
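A small single-threaded sketch of those calls, assuming Linux and the POSIX header semaphore.h. The function name semaphore_api_demo is made up for this example; the sem_* calls themselves are the standard ones.

```c
#include <errno.h>
#include <semaphore.h>

/* Walks one semaphore through init, wait, trywait, post, and destroy.
 * Returns 0 if every step behaved as described, -1 otherwise. */
int semaphore_api_demo(void) {
    sem_t sem;

    /* pshared = 0 (threads of this process only), initial value = 1. */
    if (sem_init(&sem, 0, 1) != 0) {
        return -1;
    }

    /* Value is 1, so this decrements to 0 without blocking. */
    if (sem_wait(&sem) != 0) {
        return -1;
    }

    /* Value is 0: sem_wait would sleep here, but sem_trywait returns
     * -1 with errno set to EAGAIN instead of blocking. */
    int would_block = (sem_trywait(&sem) == -1 && errno == EAGAIN);

    /* Post always succeeds in incrementing: 0 -> 1. */
    if (sem_post(&sem) != 0) {
        return -1;
    }

    /* Value is 1 again, so trywait now succeeds: 1 -> 0. */
    int decremented = (sem_trywait(&sem) == 0);

    sem_destroy(&sem);
    return (would_block && decremented) ? 0 : -1;
}
```

Because only one thread is involved, every step here is deterministic, which makes it a convenient place to check what each call returns before adding real concurrency.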
All the functions return zero if they're successful. And again, like I said, the second argument is for if you want to share it between processes, like with your children; you can set it to one, but we'll probably just have it zero. So let's go back to the problem. Now, using a semaphore, is there any way we could make sure that one prints before the other? Let's go back to the code. Okay, everyone got that? All right, one sec, let me find this. So we'll have the same code as before, oops, except this time we'll create a semaphore. At the very top we just have essentially a global semaphore; static just means that it's only accessible within this C file. The first thing I do in main is initialize it: I give it the pointer to the semaphore, I set pshared to zero, and I set the initial value to zero.
And why would I do that? Well, whenever you use a semaphore, it's generally easiest to first figure out where to put a wait, where something needs to wait. I want print second to go after print first. So if I put a wait there, and the initial value is zero, then if print second happens to get scheduled first, well, the value of the semaphore is zero, it hits the wait, and it can't decrement it, so it puts itself to sleep. And then print first prints its line, and after it's done that, it increments the semaphore, and then the second thread can go ahead, actually decrement it, and finally do its print. So now if we execute that, oops, fixed, now if we execute that, it works. It always does first and then second.
So any questions about how that works? Or are we good? Yep. Yeah, so the question is, is this any better than doing the flags thing? With the flags thing, you would have to make sure that there are no data races involving the flags.
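The semaphore version of the ordered print might look something like the sketch below. The names and the order array are assumptions added for illustration: the array only records which thread printed when, so the result can be checked programmatically, and it is not part of the lecture's code.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t sem;        /* starts at 0: "first has not printed yet" */
static int order[2];     /* illustration only: records who printed when */
static int next_slot;

static void *print_first(void *arg) {
    (void)arg;
    printf("This is first\n");
    order[next_slot++] = 1;
    sem_post(&sem);      /* 0 -> 1: lets print_second past its wait */
    return NULL;
}

static void *print_second(void *arg) {
    (void)arg;
    sem_wait(&sem);      /* sleeps while the value is 0 */
    printf("I'm going second\n");
    order[next_slot++] = 2;
    return NULL;
}

/* Hypothetical driver. Returns 0 if the recorded order was first then
 * second, -1 otherwise. */
int run_ordered_print(void) {
    pthread_t first, second;
    next_slot = 0;
    if (sem_init(&sem, 0, 0) != 0) {
        return -1;
    }
    /* Start print_second first on purpose to invite the bad ordering. */
    if (pthread_create(&second, NULL, print_second, NULL) != 0) {
        sem_destroy(&sem);
        return -1;
    }
    if (pthread_create(&first, NULL, print_first, NULL) != 0) {
        sem_post(&sem);  /* release print_second so it can be joined */
        pthread_join(second, NULL);
        sem_destroy(&sem);
        return -1;
    }
    pthread_join(first, NULL);
    pthread_join(second, NULL);
    sem_destroy(&sem);
    return (order[0] == 1 && order[1] == 2) ? 0 : -1;
}
```

The writes to order and next_slot are not protected by a mutex here because the semaphore already forces print_second's write to happen after print_first's post; that shortcut only holds for this two-thread case.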
So if I did this with like 10 threads or something like that, at the end of the day, as soon as you fix all the data races with that, you've essentially made a semaphore anyways. So this is like the general solution. Yep. It's basically the same idea, but after you try your idea and give it to students and they make your code segfault a few times, you figure out that, hey, a semaphore is what you come up with.
So the kind of bad thing about semaphores is telling whether this is correct. I can't really tell if it's correct just by looking at print first and print second, because what if I do something like that? Some of you may have even missed it. I just set the initial value so that instead of being zero, now it's one. So now does this do anything? Yeah, now it literally does nothing. In the case where print first executes first, it would print its line and then post, so the value goes from one to two, cool. Then print second hits the wait and goes from two to one, so it wouldn't get blocked or anything, and then it printfs "I'm going second". In the other case, if print second happened to go first, well, the initial value is one, so wait doesn't block; it just decrements one to zero and then printfs, that's it. Then print first would come along, print "This is first", and post it back up from zero to one. Yeah, and you're not really meant to access the actual value. You just set an initial value and then do post and wait, and that's it.
Okay, any questions about semaphores before we go on to more difficult tasks with them? Okay. So the next question is, well, it kind of looks similar to a mutex now too. Can I use a semaphore as a mutex? And if I can, how would I do that? Because they look oddly familiar-ish.
Yeah, so if I set the initial value to one, and instead of locking I wait, well, that will let one thread through. If another thread tries to call wait, it will sit there, because the value was decremented from one to zero. And then if I replace unlock with post, well, that changes the zero back to a one, and now another thread can make it past the wait, but only one thread. So it behaves just like a mutex. In fact, it's pretty much exactly what our spin lock was doing, right? Our spin lock implementation was just a number that went between one and zero. Same idea here. So if you wanted to, you could go ahead and fix that counter example by using a semaphore instead of a mutex. Again, I initialize it to one. Instead of lock, I do a wait, which lets only one thread through at a time, and after I'm done, I post. So the semaphore just goes from one to zero and then from zero to one, and that's it.
So what would happen if I made an oopsie and initialized the semaphore to zero? Yeah, I'd get stuck. Say I had like eight threads executing run. The initial value is zero, all of them would try to wait, and they can't decrement it because it's already zero. So all eight threads would hit the wait, get blocked, and then I can't make any progress. I'm essentially screwed at that point. So everyone good with that? Okay, good.
Okay, so we can solve a better problem now: producers and consumers. This is a common thing you might have as soon as you learn about threads: some threads that produce data and some threads that consume data, and you want them to run in parallel as much as possible.
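Going back to the semaphore-as-a-mutex pattern from a moment ago, here is a minimal sketch of the shared counter protected that way. The thread count matches the eight threads mentioned above, but the increment count and all the names are assumptions for illustration.

```c
#include <pthread.h>
#include <semaphore.h>

#define NUM_THREADS 8
#define INCREMENTS_PER_THREAD 10000

static sem_t sem;     /* initial value 1, so it acts like a mutex */
static int counter;

static void *run(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; ++i) {
        sem_wait(&sem);   /* 1 -> 0: takes the place of lock */
        ++counter;        /* only one thread is ever in here */
        sem_post(&sem);   /* 0 -> 1: takes the place of unlock */
    }
    return NULL;
}

/* Hypothetical driver. Returns the final counter value, or -1 if
 * setup failed. */
int run_counter(void) {
    pthread_t threads[NUM_THREADS];
    int created = 0;
    counter = 0;
    if (sem_init(&sem, 0, 1) != 0) {
        return -1;
    }
    for (int i = 0; i < NUM_THREADS; ++i) {
        if (pthread_create(&threads[i], NULL, run, NULL) != 0) {
            break;
        }
        ++created;
    }
    for (int i = 0; i < created; ++i) {
        pthread_join(threads[i], NULL);
    }
    sem_destroy(&sem);
    return (created == NUM_THREADS) ? counter : -1;
}
```

With the initial value at 1, run_counter() should return NUM_THREADS * INCREMENTS_PER_THREAD. Changing the initial value to 0 reproduces the stuck case: every thread blocks in sem_wait and the joins never return.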
So one way you might do that is with a buffer, or just an array if you want to think of it that way, that's circular. It keeps getting reused over and over again, and each slot is either empty or filled with a value that has to be consumed. The rules are: producers should be the ones that write data to the buffer, and they shouldn't overwrite any data. So the slot shouldn't already be full, because if I overwrite something, well, it never got consumed and I just lost the value. Consumers are the other end of that: a consumer should read a valid value from the buffer and then do something with it. It shouldn't read an empty slot, because then it doesn't read any data, which is kind of useless.
One way you can do this is to make it so that all consumers share an index and all producers share an index. I'd have to make sure there are no data races on that index between all the producers, and no data races on the other one between all the consumers. In both cases, the index starts at zero and increases sequentially, and if it hits the end it goes back to the beginning. So here is the base loop. All the producers do is have a while loop, which may go on forever, though for this I actually have a set number of things, and all a producer wants to do is fill a slot. And all the consumer wants to do is empty a slot.
So we'll go ahead and look at the code. There's a lot of code here just to make it work. I use a semaphore to keep track of the number of things I want to produce and the number of things I want to consume, but you don't have to worry about those semaphores. This is the crux of the program. In the producer, it simulates doing some work, so it needs to compute a value, run some machine learning model, whatever, take some amount of time, and then it fills a slot.
So it writes to that buffer shared between all the threads. Then the consumer grabs some data from the buffer, and the sleep is just to simulate it actually doing some work with that. For this example, I made it so you can specify the number of producers and consumers you want to run, and a few other arguments. The first two arguments are the number of consumers and the number of producers, and each one of those will be a separate thread. So if we do 10 threads of each and let it execute, well, we get some bad behavior. Anytime we break one of those rules, so overwriting a slot or reading an empty slot, I have it print a red line. And here we can see that, hey, all of the consumers executed first, before any producers, which sounds like an ordering problem. Good thing we learned about semaphores. You can see that without anything, it just starts trying to empty slots that are already empty. That's bad; they all read a whole bunch of nothing. Then all the producers started filling slots and filled all the slots, and luckily enough the consumers emptied like five slots, and then it filled another five, and those just stayed there forever and never got processed.
All right, any other fun things you want me to try with these arguments? Like, what if we have more producers than consumers? Well, we can see we have less of an issue. We only empty three empty slots, because we empty slots immediately but we only have three consumers. Then all of our producers come in and fill all the slots, we empty another three slots, we fill a slot, and then we actually wrap around back to the beginning before the consumers have used that data. So now we fill a slot that has already been filled and not processed yet, so we screwed up again.
And then the other bit of it actually works well. So what should I do to fix this so that none of these issues happen? Hopefully use a semaphore. We have two problems, right? One is that we're overwriting slots that are already filled; the other is that we're emptying slots that are already empty. Yep, let's see here. So you want a semaphore. What do you want to keep track of, sorry? The number of filled slots, yeah? So we'll make a semaphore for the number of filled slots, go ahead and initialize it, and what do we want to initialize it with? Zero, okay. So everyone like the track we're on so far? We're keeping track of the number of filled slots, yep. Sorry? Yeah, so if we're keeping track of the filled slots, we should also keep track of the emptied slots. Sure, but let's first just do the posts and waits for the number of filled slots.
Initially nothing's filled, so what do we want to prevent by keeping track of the number of filled slots? It's generally easier to place our semaphore waits by thinking about what we want to prevent. Yeah, so everyone agree with that? We want to put our wait in the consumer, so it actually waits for some data. So post or wait? Wait. So I wait on filled slots. We can actually argue about this a little bit. Filled slots is initially zero, and my problem was that if a consumer ran first, it would try to empty a slot that doesn't have anything in it. So if the initial value is zero and a consumer happens to execute first, it hits the wait and actually gets blocked. So it would not try to empty an already empty slot. That sounds pretty good. Should I probably put a post anywhere to go with the number of filled slots? Where should I put it? Yeah, after we produce some data. So now we post.
So as soon as we fill a slot, we post, which just increments the value, and it's like, hey, we filled a slot. If we fill a single slot, one of the consumers wakes up. Say there's a backlog of 10 of them: only one would wake up and actually process that data, and that's it, we're all good. All right, sound reasonable to everyone? So now if we go ahead and compile that, we see, hey, we don't have that big red message anymore. We know the consumers will probably want to run first, but now they get blocked. So we don't see any emptying of an empty slot, because a consumer waits for exactly one slot to get filled and then empties it pretty much immediately, because it was just waiting around, chomping at the bit. You can see this happens for these first slots, then it fills the rest of them and keeps on going, which is pretty good. That works.
So did we solve all of our problems? That kind of worked. Oh, so we didn't solve all of our problems. If we have more producers than consumers, we still have the problem where we're filling a slot that's already filled. If we had more consumers, there'd be more things to empty slots, but we don't have anything to protect against this other case. So what should I do to protect against it? We already kind of started by creating a semaphore to keep track of the number of empty slots, so that sounds pretty good. Let's place our wait and our post. Wait is generally the easiest thing to place. So what should wait on an empty slot? Sorry? Yeah, fill slot, the producer. I'll go up a line. So we should probably put emptied there; we want to make sure it waits on an empty slot. That seems fairly reasonable. Am I missing anything else? Yeah. So I should probably post after I actually consume a slot. All right, everyone agree with that? Yeah.
Yeah, this one is to ensure that you don't overwrite a slot that's already filled before it's been processed, and the other one ensures you don't empty a slot that's already empty. All right. So we're good, perfect solution, this will work. Let's go ahead and compile it. Someone broke something, help. So you said you want to initialize the emptied slots to one? Okay, or the number of slots. All right, one, or the number of slots? Anyone say the number of slots? Okay, well, let's try one first. So what happens with one? If I put one for my number of empty slots, am I gonna see the same thing where it hangs? Yes? Yes? So confident, everyone. Okay. Not all good. Yeah, so it works, but it's ping-ponging a lot. It takes way longer to fill a slot than it does to empty a slot, but it's just ping-ponging back and forth because we set the value to one, which we argued before is essentially like a mutex, although the wait and post come from different threads, which you're allowed to do with a semaphore. It's essentially ensuring an ordering where you go, then I go: you, me, you, me, you, me. Yeah, so the value just keeps going from one to zero, zero to one, one to zero, zero to one, and they ping-pong back and forth. I don't have any correctness issue, but this kind of sucks because only one thread is doing something at a time. One thread fills a slot, waits, then empty, fill, and at that point, why do I have multiple threads? I'm just kind of wasting time.
So then we have a bunch of people who said I should just use the number of slots, which I call buffer size. So now if the semaphore starts at buffer size, well, that's the number of empty slots: initially the number of empty slots is the size of the buffer. So the producers could actually fill the whole thing up before any consumer runs.
So if there are 10 slots and you initialize it to 10, well, 10 producer threads could go. They'd decrement it from 10 to nine, nine to eight, da-da-da, and they all run in parallel. You could have even more producer threads; there could be 20, but in this case that's not really gonna help anything, because my buffer is only 10. So if I go ahead and do that, yay, it works now. Any number of threads I want, 15 and 15, no matter what I do, I don't get any red messages, because everything is now properly synchronized and everything works. Cool. All right, any questions about that?
So that's ensuring ordering. Doing producer-consumer with semaphores is actually fairly hard, but you can break it down and just do one condition at a time. The hard part is generally figuring out what's going wrong and where the data races are. As soon as you figure out that, hey, filling already filled slots is bad and emptying already emptied slots is bad, actually fixing the problem isn't too bad. You come up with a semaphore, you figure out where to put the waits, and then based off the wait, you'll probably know what value you should initially set it to: zero if you want to stop something, and the number of things if you want that many things to run in parallel. In this case, I can only have buffer size threads filling at a time, so I initialize it to buffer size. So pretty good, all right. The slides have the same things, just so you have them, yep.
Yeah, so that's a good point, that you're thinking about data races, right? A bunch of consumers are using one shared index and all the producers are using another. So you can go and look at the code for fill slot and empty slot, because you'd see there that I had to prevent data races.
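Putting the pieces together, a compact sketch of the two-semaphore producer-consumer might look like this. It is not the lecture's program: the names, the thread and item counts, the single buffer_mutex, and the violations counter (standing in for the red lines) are all assumptions made for illustration, and error handling for pthread_create is left out to keep it short.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

#define BUFFER_SIZE 10
#define NUM_PRODUCERS 4
#define NUM_CONSUMERS 4
#define ITEMS_PER_THREAD 1000

static bool slot_full[BUFFER_SIZE];  /* true while a slot holds data */
static int fill_index;               /* shared by all producers */
static int empty_index;              /* shared by all consumers */
static int violations;               /* stands in for the red lines */
static sem_t emptied_slots;          /* starts at BUFFER_SIZE */
static sem_t filled_slots;           /* starts at 0 */
static pthread_mutex_t buffer_mutex = PTHREAD_MUTEX_INITIALIZER;

static void fill_slot(void) {
    pthread_mutex_lock(&buffer_mutex);
    if (slot_full[fill_index]) {
        ++violations;                /* overwrote unconsumed data */
    }
    slot_full[fill_index] = true;
    fill_index = (fill_index + 1) % BUFFER_SIZE;
    pthread_mutex_unlock(&buffer_mutex);
}

static void empty_slot(void) {
    pthread_mutex_lock(&buffer_mutex);
    if (!slot_full[empty_index]) {
        ++violations;                /* read a slot with nothing in it */
    }
    slot_full[empty_index] = false;
    empty_index = (empty_index + 1) % BUFFER_SIZE;
    pthread_mutex_unlock(&buffer_mutex);
}

static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < ITEMS_PER_THREAD; ++i) {
        sem_wait(&emptied_slots);    /* block until a slot is free */
        fill_slot();
        sem_post(&filled_slots);     /* announce one more filled slot */
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (int i = 0; i < ITEMS_PER_THREAD; ++i) {
        sem_wait(&filled_slots);     /* block until there is data */
        empty_slot();
        sem_post(&emptied_slots);    /* announce one more free slot */
    }
    return NULL;
}

/* Hypothetical driver: returns the number of rule violations seen. */
int run_producer_consumer(void) {
    pthread_t producers[NUM_PRODUCERS], consumers[NUM_CONSUMERS];
    fill_index = empty_index = violations = 0;
    for (int i = 0; i < BUFFER_SIZE; ++i) {
        slot_full[i] = false;
    }
    sem_init(&emptied_slots, 0, BUFFER_SIZE);
    sem_init(&filled_slots, 0, 0);
    for (int i = 0; i < NUM_CONSUMERS; ++i) {
        pthread_create(&consumers[i], NULL, consumer, NULL);
    }
    for (int i = 0; i < NUM_PRODUCERS; ++i) {
        pthread_create(&producers[i], NULL, producer, NULL);
    }
    for (int i = 0; i < NUM_PRODUCERS; ++i) {
        pthread_join(producers[i], NULL);
    }
    for (int i = 0; i < NUM_CONSUMERS; ++i) {
        pthread_join(consumers[i], NULL);
    }
    sem_destroy(&emptied_slots);
    sem_destroy(&filled_slots);
    return violations;
}
```

The two semaphores handle the ordering (no filling a full slot, no emptying an empty one), while the mutex handles the data races on the indices and the buffer. A single mutex is used here for simplicity; the producer and consumer counts are equal so that every produced item is eventually consumed and all threads terminate.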
Of course, you can see that I used a mutex, which is why bad things didn't happen when I executed it. But if you're not given any of this, you have to start off by identifying data races. This was a fairly obvious one, because you have a bunch of producers all modifying the same value. So I need a mutex for all of them, and then I need a mutex for the slots too, because I have to protect the buffer itself against data races since they're all sharing it. Whenever you have multiple threads, you have to think about what data is shared between them, and you have to think about, hey, are any of them writing to it? And then you have to prevent the data races. You can look through the rest of the code to see if you understand it, but it's pretty good. All right, any other questions? So yeah, that was a good insight. Because if I gave that to you without a mutex or anything, you'd also have to protect against that, or bad things would happen. Oops, yeah. And we saw what would happen if we initialized both semaphores to zero, too. It just hung, because all the producers got there, it was zero, and they couldn't produce anything. They all got stuck.
So we used some semaphores, and we ensured some proper ordering. You can just think of them as an integer. They're zero or above, and they have two operations, increment and decrement. If the value is zero, it can't be decremented because it's never allowed to go negative, so the thread just blocks and waits until something else increments it, and then it can go ahead and decrement. And based off that last point: well, that ensures some ordering, but it's still not gonna save you from potential data races. You still have to prevent those, and generally for data races you would use mutexes, just because it makes things a bit clearer than using semaphores for everything. So there are different approaches you can use, but if you want mutual exclusion, it's generally clearer to use a mutex instead of that special case of a semaphore.
And if you want to ensure some type of ordering, use a semaphore. So that's it. Just remember, pulling for you, we're all in this together.