All righty, welcome back to Operating Systems. Today will be a fairly chill lecture, very similar to Lab 5: we're running a bank. We're going to simulate 10 million transfers between accounts, and the program I wrote lets you vary the number of accounts we're managing. The rules: each account has a unique ID and a current balance starting at $1,000, and we generate transfers between random accounts, where the from account transfers 10% of its current balance to the to account. Since we're the bank, we shouldn't be losing money — accounts are just transferring between each other. If every account starts with $1,000, the total across all accounts should be exactly the same at the end, whatever the number of accounts. Also, whenever you initiate a transfer, you have to call a function named securely_connect_to_bank. It doesn't actually do anything; it just emulates what would really happen as part of initiating a transfer. Our goal today is just to make this go fast. Everything is given to you — the default template without any threads — and we'll speed it up. So let's run through the program real quick. We have a few defines: the starting balance for every account is $1,000, the number of transfers is 10 million, and if we want to go faster, well, we should probably create threads — if we do, we'll create eight of them. Here's a struct to represent an account: a unique ID and a balance for how much money it actually has. Then I keep track of the total number of accounts, initialized to zero because it gets set from a command line argument, and a pointer to struct account that will be our array of all the accounts, initially set to NULL.
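The setup just described might look something like this minimal sketch. The exact names (`STARTING_BALANCE`, `NUM_TRANSFERS`, `NUM_THREADS`, the globals) are my guesses at what's in the template, based on the description:

```c
#include <stdint.h>
#include <stdlib.h>

#define STARTING_BALANCE 1000
#define NUM_TRANSFERS 10000000
#define NUM_THREADS 8

/* One bank account: a unique id and a current balance. */
struct account {
    uint32_t id;
    uint32_t balance;
};

/* Set from the command line argument, then used to size the array. */
static uint32_t num_accounts = 0;
static struct account *accounts = NULL;
```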
securely_connect_to_bank — you don't have to read it; just assume it securely connects to the bank. As written, it just wastes a bit of time to make things more interesting. Then here's the meat and potatoes: our transfer function calculates the amount to take out of the from account — again, 10% of its current balance — then decrements the from account and increments the to account. So we shouldn't lose any money if we're the bank. There's a run function we don't use yet. Then check_error: for pthread functions, a return value of zero means success, so we don't have to do anything; otherwise they directly return an error code, which is a bit different from before. After that we have main. We check that we have at least one command line argument, and do the setlocale call so we get sane output from printf. Then we take the first argument, convert it to a number, check for errors — you don't have to care about that — and whatever we get from the command line is the number of accounts. We calculate how much space in bytes the accounts need — the number of accounts times the size of the account struct — malloc that, and print off how much memory we're using for accounts. Then in a for loop we initialize every account: each one gets a unique ID (the first is 1, then 2, then 3, and so on) and the starting balance of $1,000. From that we can calculate how much money the bank currently manages: $1,000 times the number of accounts. Then I've commented out a bunch of stuff just to make my life easier when we get to threads.
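A sketch of that transfer function as described. The connect stub here is a placeholder (the real one wastes time on purpose), and the 10% is integer division, matching the behavior described:

```c
#include <stdint.h>

struct account {
    uint32_t id;
    uint32_t balance;
};

/* Placeholder for the template's time-wasting connection step. */
static void securely_connect_to_bank(void) { /* pretend handshake */ }

/* Move 10% of the from account's current balance to the to account. */
void transfer(struct account *from, struct account *to) {
    securely_connect_to_bank();
    uint32_t amount = from->balance / 10;  /* 10% of current balance */
    from->balance -= amount;
    to->balance += amount;
}
```

Because the same `amount` is subtracted and added, the bank's total is conserved exactly — in the serial version, anyway.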
But here is the serial code that simulates the 10 million transfers we want to make go real fast. It just iterates once per transfer: it generates a random number for the from index, a random number for the to index, and calls that transfer function, which connects to the bank, takes 10%, subtracts it from the from account, and adds it to the to account. This loop is what we want to parallelize, because this is where most of our execution time goes. Afterwards I added a sanity check: I calculate the bank's total balance across all the accounts. This runs serially, so there are no data races or anything like that — I just iterate through every account and add each one's balance to the total. Whenever I run it, I should have the same amount of money before and after. So let's run our bank and see how long it takes, because that's our benchmark. If I have 1,000 accounts and each has $1,000, I'm managing $1,000,000 — and not using that much memory. Let it run... when it's done, across all my accounts after 10 million transfers, I still have $1,000,000, and it took about 10.6 seconds. So if I want to speed this up — well, I'm sharing memory a lot. I could try to make eight different processes and set up shared memory and all that, but that's hard, precisely because I'm mostly sharing memory anyway. So I should probably use threads. Let's start speeding this up. I'll just uncomment this: it creates NUM_THREADS threads — eight in this case, because I have eight CPU cores — and for every thread it mallocs space for an integer, a pointer to an int, because I'm going to pass that to the thread.
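The serial loop and the sanity check might be sketched like this (`do_transfers` and `total_balance` are my names for the code described above, with the time-wasting connect omitted):

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_TRANSFERS 10000000

struct account { uint32_t id; uint32_t balance; };

static void transfer(struct account *from, struct account *to) {
    uint32_t amount = from->balance / 10;
    from->balance -= amount;
    to->balance += amount;
}

/* Serial version: pick two random accounts per iteration and move
   10% of the from balance between them. */
void do_transfers(struct account *accts, uint32_t n) {
    for (uint64_t i = 0; i < NUM_TRANSFERS; ++i) {
        uint32_t from = (uint32_t) rand() % n;
        uint32_t to = (uint32_t) rand() % n;
        transfer(&accts[from], &accts[to]);
    }
}

/* Sanity check run serially afterwards: sum every balance. */
uint64_t total_balance(struct account *accts, uint32_t n) {
    uint64_t total = 0;
    for (uint32_t i = 0; i < n; ++i)
        total += accts[i].balance;
    return total;
}
```

Run serially, the total after 10 million transfers should equal the starting total exactly.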
And I can't just give each thread the address of a variable on main's stack, because that address would be shared across all the threads, and we'd have data races or they'd all see the same value — something bad. I have to make sure each thread gets its own unique memory, so each one gets its own malloc'd space on the heap. I set that thread ID equal to my loop iteration, then create a new thread, tell it to run the run function, and pass the thread ID as the argument. Now my idea is to parallelize this loop, so the loop should probably be in my run function. Let's comment this out and look at run. Right now, run just casts the argument pointer — C doesn't really care; we know we passed a uint32_t pointer, so we cast the argument to that type and dereference it to get the value. Now we've read that value onto this thread's stack, so it's independent per thread, and I don't need the heap space anymore — I can just free that memory. So to make it go faster, I can do the old copy-paste of the loop into run, and just have every thread do this loop. Boom. All right, we're done, we made it go faster... right? Well, there is a data race, and there's also another problem that's probably more fundamental. If I have eight threads working in parallel on those 10 million transfers, it should be something like eight times as fast, right? Let's compile and run it... that seems like more than 10 seconds. Why is this so much slower? And yes, I do join afterwards — I set off all the threads, create them, then join them all at the end. Oh no, we're not done yet. Okay, so our first problem: how many threads are executing this run function? I'm just going to stop it. Eight, right?
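The heap-allocated-argument pattern can be sketched like this. `spawn_threads` and the `seen` array are mine, added so the sketch is self-contained and observable; the real template's run function does transfers instead:

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_THREADS 8

static int seen[NUM_THREADS]; /* each thread writes only its own slot */

/* The thread copies its id out of the heap allocation onto its own
   stack and frees the heap memory -- after that it is independent. */
static void *run(void *arg) {
    uint32_t thread_id = *(uint32_t *) arg;
    free(arg);
    seen[thread_id] = 1;
    return NULL;
}

/* Heap-allocate each argument so every thread gets unique memory,
   rather than all sharing one address on the creator's stack. */
int spawn_threads(void) {
    pthread_t threads[NUM_THREADS];
    for (uint32_t i = 0; i < NUM_THREADS; ++i) {
        uint32_t *id = malloc(sizeof(uint32_t));
        if (id == NULL)
            return -1;
        *id = i;
        if (pthread_create(&threads[i], NULL, run, id) != 0)
            return -1;
    }
    for (uint32_t i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);
    return 0;
}
```

The join at the end gives us a happens-before edge, so reading `seen` after `spawn_threads` returns is safe.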
And how many transfers is each thread doing? Sorry? Yes — each thread is doing NUM_TRANSFERS. So if NUM_TRANSFERS is 10 million, I'm now doing 80 million transfers total. I want to do the same amount of work as before. So how should I change this to actually divide the work up? NUM_TRANSFERS divided by 8 — or, to be a bit more general, NUM_TRANSFERS divided by NUM_THREADS. If I do that, each thread does an eighth of the work, they all run in parallel, and it should be pretty fast. Let's try it. In total it was a bit faster — but we screwed up: our funds are a bit off. Let's dig in. Say we're a big bank. This is kind of an odd situation, because generally when you make your problem bigger, programming gets harder, but let's see what happens with 10 million accounts. No problem: too big to fail. Why would that be? Why is this case less likely to show anything? That total looks right. Any thoughts? Yes — there are eight threads running and each transfer only touches two accounts. The likelihood of a data race across 10 million accounts is fairly low. So if I do the reverse and have fewer accounts, there's more chance of a collision — more chance of a data race. Say I only have four accounts, like a little mom-and-pop bank, your family bank. And then — oops, $36 at the end. We are not a successful bank. Let's see if anything changes across a few runs: $36 again, not doing too well; a terrible, terrible bank, $36. Now let's go to seven accounts... what happened there? Do you think a little bank with $7,000 should be able to finance — what is this — 98 billion dollars? Probably a bad mistake, right? What likely happened there?
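Here's a hint as a tiny demo (`buggy_withdraw` is a hypothetical helper, not from the template): the balances are unsigned 32-bit integers, and unsigned arithmetic in C wraps modulo 2^32 instead of going negative.

```c
#include <stdint.h>

/* Hypothetical helper showing what the racy subtraction effectively
   does when a stale read lets us withdraw more than the account
   holds: unsigned arithmetic wraps around instead of going negative. */
uint32_t buggy_withdraw(uint32_t balance, uint32_t amount) {
    return balance - amount; /* wraps to a huge value if amount > balance */
}
```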
Yeah — specifically, in this case it was an underflow, because the balance is a uint32_t, so it's unsigned. If I subtract from it and it should become negative, it can't — it rolls over to a giant positive number. One of my accounts, due to the data race, went negative; that can't be represented, so it wrapped around to a huge number, and wow, that's wrong. So obviously I have a data race. But before we even fix it: this run function runs on multiple threads, and I'm calling a function in it named rand. I have to make sure rand is safe to call from multiple threads, so I look at the documentation. If I scroll down a little and ignore some stuff, it has an attributes section that tells you the thread safety attribute — in this case it says MT-Safe, meaning multi-thread safe, so the function is safe to use from multiple threads and I don't have to worry about it. But one thing I might be concerned about: rand uses internal state that's shared between all the threads, so internally it probably uses a mutex or something like that to prevent data races. Ideally, if I want things to go a bit faster, I can make the random state independent per thread. If you look at the documentation, there's a function called rand_r, and the state it uses to generate random numbers is just an unsigned 32-bit value. So each thread can create its own seed variable, initialize it from its thread ID, and pass its address to rand_r — the state won't be shared across threads, so hopefully I get a little speedup. Let's test it. Last time we ran in 7.1 seconds — and this time we failed a lot quicker: 1.3 seconds.
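A sketch of the per-thread randomness. `pick_account` is my name for the helper, and seeding from the thread ID is the choice described above:

```c
#include <stdint.h>
#include <stdlib.h>

/* rand_r keeps its state in a caller-owned unsigned int, so each
   thread can hold its own seed on its own stack and nothing is
   shared -- no internal lock to contend on. */
uint32_t pick_account(unsigned int *seed, uint32_t num_accounts) {
    return (uint32_t) rand_r(seed) % num_accounts;
}

/* Inside run(), each thread would do something like:
     unsigned int seed = thread_id;
     uint32_t from = pick_account(&seed, num_accounts);
     uint32_t to   = pick_account(&seed, num_accounts);  */
```

Note rand_r is deterministic per seed: two threads seeded identically would draw the same sequence, which is why seeding from the thread ID matters.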
So way faster — just from looking up how rand works. But we still have a data race. How should I fix it? Well, what do I actually have a data race on right now? Each thread has its own independent seed; the for loop counter is independent; the from index and the to index are local variables. So my data race must be coming from the transfer function. Each thread should be able to securely connect to the bank without interfering with any other thread, so the data race is likely from changing the balances: the accounts are shared between all the threads, and multiple threads update them. That minus-equals and plus-equals — each of those is a read and then a write, so we have a data race on the balance. How do I fix this? Yeah — I want a mutex, something like that. One mutex? Or as many mutexes as I have accounts? Let's start with the easy one and just fix the data race with a single mutex. I have creative names; I'll call it mutex. Now, where should I place my lock and unlock calls? Right — call pthread_mutex_lock around the updates. Okay, but what if I put the lock around everything, including the connect step — is that still okay? Yes, but I'd just be wasting time, because only one thread can execute the critical section at a time, and if that connection step is independent per thread, I shouldn't be doing it while holding the lock. Let's make sure we see some effect and run it again with a thousand accounts. Remember, it took 10 seconds before we introduced threads. Now we've introduced threads, made each do an eighth of the transfers, and... we're still waiting. This took 18.2 seconds — but we didn't have a data race, so that's pretty good. So, like you said: if I want it to go faster, I want to make my critical section as small as possible while still being safe and race-free.
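The coarse-grained version might look like this sketch — one global mutex guarding only the balance updates, with the connect step (assumed independent per thread) left outside the critical section. The connect stub here is a placeholder:

```c
#include <pthread.h>
#include <stdint.h>

struct account { uint32_t id; uint32_t balance; };

static void securely_connect_to_bank(void) { /* placeholder stub */ }

/* One global mutex: fixes the race, but serializes every transfer. */
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void transfer(struct account *from, struct account *to) {
    securely_connect_to_bank(); /* independent per thread: keep it outside */
    pthread_mutex_lock(&mutex);
    uint32_t amount = from->balance / 10;
    from->balance -= amount;
    to->balance += amount;
    pthread_mutex_unlock(&mutex);
}
```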
So, assuming securely_connect_to_bank is independent for each thread, I should move it outside the critical section, something like this. If I run it now, hopefully it's much faster than 18 seconds... 5.9. Before we started with threads we were at about 10.6, so this is an improvement — almost two times better. But I have eight cores on my machine; hopefully I want it to go eight times faster, not two. Any improvements we could make? More threads? I only have eight cores, so more threads shouldn't make it faster, but it's easy to try — let's change it to 16. How long did it take before? 5.9 seconds; now 5.0. A little faster, not significantly. Run it again — about the same. So that doesn't really change much. All right — you want lots of mutexes, right? Exactly. Think back to when we were too big to fail and saw no data races: that happened because one thread was transferring between, say, accounts 1,000 and 2,000 while another was transferring between accounts 3,000 and 3,033 — all different accounts. If the pairs are different, the transfers can run in parallel without affecting each other. But I can't know that a pair isn't being used by another thread unless, well, I have a lock on it. So instead of one giant mutex, I can have a mutex per account. To do that, I add a pthread_mutex_t to the account struct — I'll just call it mutex — and make sure I initialize them: in my initialization code, I call pthread_mutex_init with the address of that mutex and no attributes. So now I initialize all my mutexes.
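The per-account mutex setup might be sketched like this. `init_accounts` is my name for the initialization code described above:

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* One mutex per account, so transfers touching different pairs of
   accounts can proceed in parallel. */
struct account {
    uint32_t id;
    uint32_t balance;
    pthread_mutex_t mutex;
};

struct account *init_accounts(uint32_t num_accounts) {
    struct account *accts = malloc(num_accounts * sizeof(*accts));
    if (accts == NULL)
        return NULL;
    for (uint32_t i = 0; i < num_accounts; ++i) {
        accts[i].id = i + 1; /* ids start at 1, per the setup above */
        accts[i].balance = 1000;
        pthread_mutex_init(&accts[i].mutex, NULL);
    }
    return accts;
}
```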
So now, with all the mutexes initialized, what should my lock and unlock calls look like? What am I locking — the from account or the to account? From first, then unlock it... is that it, fixed? No — I have to lock both? Okay, so just like that? Well, technically you can lock and unlock the from account, then separately lock and unlock the to account, and that won't screw over the bank's total — but it makes each transaction no longer atomic, which introduces a very subtle race. So let's be safe for now and hold both locks at once. Is this good — no data races? Let's run it... and we've got ourselves a deadlock. Well, let's see. After about 10 seconds, we should start getting concerned. All right, I'm concerned. If you want to check for a deadlock, you can run htop, which shows how active your processor is — every core as a little bar chart with its utilization percentage. All of my cores are doing pretty much nothing, so that probably means I do have a deadlock: the threads aren't doing anything and my program doesn't look like it's making progress. So how do I fix this deadlock — or why does it even happen? Yes — you could have me lock and unlock one account at a time so I never hold two mutexes at once. That avoids the deadlock, but it has that very subtle bug if you care about the accounts being consistent. According to the bank's total, it won't matter, but if I transfer A to B while someone transfers B to A, and then B to A again, the overall net amount works out, yet the intermediate states aren't consistent — account A, or account B, will probably get mad, depending. So let's change it back to the version that deadlocks. Why does this deadlock? The hint is those two transfer calls happening at once. Say this one is thread one and this one is thread two. Which lock does thread one try to acquire first? Right — lock A.
So what happens if thread one — assuming we just have a single core — acquires lock A and then we context switch to thread two, which acquires lock B first? Now we're in a deadlock, right? Thread one wants lock B, but thread two has it; thread two wants lock A, but thread one has it. That's the hold-and-wait condition, so this is a deadlock. How do we fix it? A condition variable won't solve this. Remember, there are four conditions that all need to be true for a deadlock to happen, and the two easiest to break are these: circular wait — if we acquire the locks in the exact same order everywhere, there's no deadlock — and hold and wait — never hold one lock while trying to acquire another. Yes — we could use a trylock, and if we don't get the second lock, give up the first. That's eliminating hold and wait. So: I lock from, then use a while loop with pthread_mutex_trylock on the to mutex. If you look at the documentation for trylock, it returns zero when it successfully acquires the lock, and something non-zero otherwise. So with that condition, we enter the body of the while loop whenever we did not get the lock. At that point we hold the from lock but couldn't acquire the to lock, so to eliminate hold and wait, we give up the from mutex. And that's where we'd yield, or sleep, or do nothing and hope we get lucky. In this case it doesn't matter much, because we only have eight threads and they run in parallel anyway — but if you were in Lab 4, you'd do something like yield and try to be nice. There is an equivalent of that as a system call, and it's called sched_yield.
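A quick check of the trylock convention used here — zero on success, non-zero (EBUSY) when the lock is already held. The thin wrapper is mine, just to make the return values visible:

```c
#include <pthread.h>

/* 0 means we got the lock; non-zero means it was already held. */
int try_acquire(pthread_mutex_t *m) {
    return pthread_mutex_trylock(m);
}
```

Note that for a default (non-recursive) mutex, trylock returns non-zero even when the *current* thread is the one holding it — which matters later.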
sched_yield essentially gives up your time slice — it tells the Linux scheduler you're done for now and it can run something else if there's anything else to run. At the point we call it, we hold no locks, so that's good: hopefully the deadlock resolves because we get lucky and something else runs. So — is this done? I should remove this. Are we done? Remember, after the loop we should hold both locks. Let's run it... looks like we deadlocked. What did we miss? What could happen here? Well, one bad thing: we acquire the from mutex, try to get the to mutex, fail, unlock from, and yield while holding no locks — but then we loop around and only try to get the to mutex. So at the line after the loop, we might hold only the to mutex. That's one bad thing. I need to re-acquire from: if I lock from again at the bottom of the loop body, I don't pass the trylock line unless I hold the from mutex, and if the trylock then succeeds, at the line after the loop I hold both locks. So we should be good now. But if I run this — still a deadlock. Why do I have a deadlock now? This is exactly the code from lecture that I told you works. We don't have hold and wait anymore; at the yield we hold no locks; after the loop we hold both locks. Should be good. So here's a hint: what happens if I do something like lock followed by trylock on the same lock A? Even with one thread — assume I'm the only thread — I'll get the lock the first time, right? And then the trylock on lock A again? I'm not going to get it. Am I ever going to get it? No — I'll essentially spin in this loop forever. And here's the sign that this is what's happening: it looks like a deadlock, but if I look at my CPU usage, every core is at 100%, because it's just trying again and again and again. And this is why I plugged in my laptop.
So my battery doesn't die — because this is killing it. This is just wasting lots of time. So how should I fix this issue, which is a bit more subtle than before? I don't have hold and wait anymore, but I'm essentially deadlocking myself because I'm acquiring the same lock twice. Right — I should check: does it even make sense to transfer to and from the same account? No, it doesn't do anything. I could make sure that when they're the same account I grab just one mutex instead of two, but the transfer is meaningless in this case: if from == to — same memory address means same account — I don't have to do anything, and since I'm running a bank, the net result can just be that nothing happens. Whoops, what did I do? Oh, I was at the wrong terminal. Let's compile it again. So now I should be good, right? No data races, a minimal amount of locking, nice and fast... boom, look at that. It was 10.6 seconds before; now it's 1.7 — almost an eight-times speedup on eight cores. Can't really get much better than that. Pretty good. Will I get the same speedup if, say, I only have eight accounts? Your guess is that it's slower — and yes, it's slower, but not that much slower. Why is it slower? Because the likelihood of being able to do things in parallel is a lot lower with only eight accounts. That's our limit to parallelization: the fewer accounts we have, the less we can do in parallel, and therefore the slower it goes. If we only have two accounts — well, interestingly, that's a bit faster than eight. And that's because the likelihood of picking the same account for both ends is pretty high, and in that case we do nothing. Without that little optimization it would be slower, but we do have it.
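Putting the pieces from the last few paragraphs together, a sketch of the trylock version of transfer — the from == to early return, plus the back-off that never holds one lock while blocking on the other:

```c
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

struct account {
    uint32_t id;
    uint32_t balance;
    pthread_mutex_t mutex;
};

/* Deadlock-free transfer via trylock back-off. Bail out early when
   from == to, since a transfer to yourself is a no-op and locking the
   same mutex twice would spin forever. */
void transfer(struct account *from, struct account *to) {
    if (from == to)
        return; /* same account: nothing happens */
    pthread_mutex_lock(&from->mutex);
    while (pthread_mutex_trylock(&to->mutex) != 0) {
        /* Couldn't get the to lock: release from so we hold nothing
           (no hold and wait), yield, then start over. */
        pthread_mutex_unlock(&from->mutex);
        sched_yield();
        pthread_mutex_lock(&from->mutex);
    }
    /* Past the loop we hold both locks. */
    uint32_t amount = from->balance / 10;
    from->balance -= amount;
    to->balance += amount;
    pthread_mutex_unlock(&to->mutex);
    pthread_mutex_unlock(&from->mutex);
}
```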
With, I don't know, 10 accounts — a bit worse. With a thousand — really good. With 10,000 — about the same, because the likelihood of not being able to do things in parallel was already pretty low. So, with that: is this the only way to prevent the deadlock? Remember, the other way is to acquire the locks in the same order every time. I can order them easily here because every account has an ID, which I told you is unique. So another way to break the deadlock is to always acquire the mutexes in the same absolute order. Here I create two pointers to mutexes, then compare the from account's ID to the to account's. I'm going to always acquire the mutex of the account with the lower ID first: if the from ID is less than the to ID, I set m1 to the from mutex and m2 to the to mutex; otherwise — the to account's ID is lower — I reverse them, so I acquire the to mutex first. Then I replace that whole trylock while loop with this and just stay consistent: I always lock m1 and then m2, so I always acquire the lowest mutex first and the highest one second. Because there's an absolute order on the mutexes, there's no circular wait, so I should not have a deadlock. And if I run this: same performance, same everything — I just changed the strategy a little. Any questions about that one? All right — that was hard; that was a lot of reasoning.
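The lock-ordering alternative just described, as a sketch — always lock the lower-ID account's mutex first, so every thread agrees on one global order:

```c
#include <pthread.h>
#include <stdint.h>

struct account {
    uint32_t id;
    uint32_t balance;
    pthread_mutex_t mutex;
};

/* Break circular wait with an absolute lock order: lowest unique
   account id first, always. No trylock loop needed. */
void transfer(struct account *from, struct account *to) {
    if (from == to)
        return; /* still skip self-transfers */
    pthread_mutex_t *m1, *m2;
    if (from->id < to->id) {
        m1 = &from->mutex;
        m2 = &to->mutex;
    } else {
        m1 = &to->mutex;
        m2 = &from->mutex;
    }
    pthread_mutex_lock(m1);
    pthread_mutex_lock(m2);
    uint32_t amount = from->balance / 10;
    from->balance -= amount;
    to->balance += amount;
    pthread_mutex_unlock(m2);
    pthread_mutex_unlock(m1);
}
```

Either fix works; this one avoids the spinning and yielding of the back-off version at the cost of a tiny bit of branching.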
There are some nice new tools here that might save you a lot of time — they're not ironclad, but they can help. These are the sanitizer tools, and they are very, very useful. One is called ThreadSanitizer: it instruments your code, monitors it while it runs, and tells you if you have any data races or potential deadlock conditions, printing them out for you. The caveat with this tool is exactly that it watches a particular run. If it reports an error, it is definitely an error — but it's not guaranteed to catch every error, because it only sees that run. You might get super unlucky: a rare data race or deadlock might not show up, in which case the tool won't tell you about it, but the bug still exists — you just won't know. If the tool tells you you have an error, there's a hundred percent chance you have an error. So let's do something silly: say I go back to having no absolute order on my locks whatsoever. This will deadlock again, right, because there's no order between the threads. I compile with ThreadSanitizer included and run it. It makes my run really, really slow, but it also helps me a lot: it throws up at me, because I've done bad things. It tells you which threads were created and what the problem is: mutex M0 was acquired while holding M5 — that's hold and wait — and it checks for circular wait: M0 acquired while holding M5, M5 acquired while holding M4, M4 acquired while holding M3... you can see the circular pattern, and it shows you the entire cycle. That's the big circular wait.
So it says: you had M0 and were trying to acquire M1, the thread that had M1 was trying to acquire M2, and so on — it shows you the whole circular wait. It detects that for you. It is slow — if I let this run, it's really, really slow — but it told me my error pretty much immediately, so you can just stop it and know you have a problem. That was the deadlock. It's also useful for races: let's go ahead and delete the mutex calls entirely. Now I'm not locking at all — no deadlocks, but I have a data race again. The nice thing is, if I run it, it tells me: data race on line 61. What's line 61? It points you at the line — you have to do a little reasoning, but it at least points you in the right direction. There's a data race on this balance, which is exactly the race we were reasoning about. And it gives you a lot of detail: it even shows line 43, so you get the stack trace — we called transfer, line 61 was our data race — and it also reports line 62, so that line contains a data race too. Helpful little tool. Again, it won't catch 100% of your errors, but if you have a data race and you're seeing its ill effects, this tool will yell at you quite quickly. Any questions about that? Sorry, what was that? Yes — the tool is called ThreadSanitizer. If you want to use it, you can just use that command — I thought it was in the slides; apparently it isn't, but it's a useful thing to know in this class. It reconfigures your build directory to compile with the tool — you can see here under user-defined options that it enables ThreadSanitizer — so when I compile, it gets compiled in.
If I want to get rid of it, I do the same thing: I could delete my build directory and set it up again, or, if I don't want to do that, I set the sanitize option to none, which removes all the sanitizers, and then compile and run again. And there's a whole bunch of useful ones: instead of Valgrind you could use AddressSanitizer, and there's an UndefinedBehaviorSanitizer — lots of fun ones. They're good tools to have. All right, any other questions? All right, cool, we're free. So just remember, pulling for you. We're on the...