All righty, welcome back to operating systems. Today is a fun lecture, fairly chill: we're going to run a bank and make it go fast. The current lab is due on Sunday, the fifth. Most people have not shown up today because of the midterm, but at least we'll be doing it.

So, we're running a bank. Our bank is doing 10 million transfers, just to simulate doing something. We can create a varying number of accounts, so the number of accounts we are managing is a command-line argument. The rules are: each account has its own unique identifier and its own balance, and every account starts with a thousand dollars. We're going to generate 10 million transfers between random accounts, where each transfer moves 10% of the sender's current balance to another account. Since we are the bank and the customers are just transferring money between themselves, we should not be losing money.

The other little thing here is a function called securely connect to the bank. Before you start any transfer you need to call that first; it simulates connecting to the bank's mainframe or database or whatever. The rest of the day is a coding example, and our only goal is to make it go fast. So let us switch to it and see what it does.

Here we go. There are a few defines. There's one define for the starting balance, which again is a thousand dollars, and one for the number of transfers we're going to generate, which is 10 million. If we want to make this go fast, we're probably going to want threads.
So I just define num threads equal to the number of cores I have on my machine, which is eight, just in case we want to use them.

Then here is the struct that represents an account. It has an ID, which will be unique, and a balance. Then I have the number of accounts, which I set to zero, and an array for all the accounts, which I set to null because I'm going to allocate it based on the command-line argument.

Next is the securely connect to the bank function. It does some nonsense right now where it just wastes some time. Ideally this would actually connect to a server or something like that; it's just a placeholder so we have something else to argue about. We will skip over that.

Look at the transfer function. We are transferring between two accounts, both passed by pointer: an account that will be used as the from, and an account that will be used as the to. For the transfer, we securely connect to the bank, then calculate the amount we need to transfer. We're taking 10 percent of the from balance, which is the same as just dividing by 10. That is the amount. Then we subtract it from the from account's balance and add it to the to account's balance. From the point of view of the bank this should be zero sum. We're just moving money between accounts, so if we have a great bank we shouldn't be losing any money; the total should be the same before and after. The only thing is that the customers' accounts might be different, but we're a big bank. We don't care about our customers.
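The account struct and transfer function described above might look roughly like the following. This is a minimal single-threaded sketch, not the lecture's actual file: the field widths, the name securely_connect_to_bank, and its empty body are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: the lecture only says an account has an ID and a balance. */
struct account {
    uint32_t id;
    uint32_t balance;
};

/* Stand-in for the lecture's placeholder; the real one wastes some time. */
static void securely_connect_to_bank(void) {}

/* Move 10% of from's balance into to. Not safe to call from several threads. */
static void transfer(struct account *from, struct account *to) {
    assert(from != NULL && to != NULL);
    securely_connect_to_bank();
    uint32_t amount = from->balance / 10; /* 10% of the sender's balance */
    from->balance -= amount;
    to->balance += amount;
}
```

With only one thread calling transfer, the sum of all balances is unchanged by every call, which is the zero-sum property the bank checks before and after the run.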
We just care about our bottom line.

We have a check error function. All the pthread functions return zero when they are successful and non-zero when they are not, so in case we want to check errors, this can print an error message. That's just a little helper function.

In our main, we check that we have two command-line arguments, the first being the name of the program and the other being the number of accounts to create. If we don't have two, we exit. We set the locale just so we get some commas whenever we use printf. Generally you don't have to care about this; it's just so the numbers are easier to read. Then we get the number of accounts by converting the string to a number, checking for errors, all that fun stuff.

After that we can see how many bytes our accounts will use. Each account takes the number of bytes needed to represent an account, and the total number of bytes for our array is just that number multiplied by how many accounts we need. We calculate that, we malloc it, and we check that the result is not null to make sure we're not out of memory. We print how many megabytes we use, and after that we initialize the accounts, making sure they all have a unique ID and all start with the thousand dollars. Then we print all of the funds our bank is managing initially, which should be the starting balance times the number of accounts. That's all common code, so I don't have to type it later.

This is the thing we need to make faster: simulating the 10 million transfers. Right now we loop from zero up to 10 million, so the body executes 10 million times. We randomly pick a from index to pick an account, then we randomly pick a to index, and then we initiate the transfer.
So we figure out what account each index represents, and then we initiate the transfer.

After that we calculate the total amount of money our bank has, because it should be the same before and after. We initialize the total balance to zero, iterate over every single account, and keep summing up the balances. Ideally the final funds of the bank are equal to our initial funds, so we're not losing any money. As we've written it right now, we don't use any threads; we just have the one main thread. So we don't have any of the problems we were talking about before, and everything should work properly.

Let's compile that and time it, because our goal today is to run faster. Let's have a thousand accounts, which means we are managing a million dollars, and see how long it takes to do 10 million transfers. We're waiting, we're waiting, we're waiting. It took 10.6 seconds, and our amounts before and after are the same. That's good.

Now our goal today is to make it go faster. Well, I could make a new process for each chunk of work and then share memory between them. But mostly we're just sharing a bunch of memory anyway, so we should probably use threads, where sharing is the default.

Here is the loop we are trying to parallelize. I will uncomment this: we want to make eight threads to make our bank go faster. Here I create num threads pthreads, and I create them in this loop. I'm going to pass a thread ID to each thread so that they all have a unique thread ID. Because I want to pass things to a thread, I have to allocate some space on the heap and give each thread a pointer to it, because we can only pass a pointer as the argument to our run function. So here I malloc an int.
I assume I have some memory, then I write a value to it, and then I create a new thread that executes this run function. After I create all my threads, I just wait for them all to finish, and then I check the final balance of my bank.

The run function takes the pointer argument and casts it back to the type we know we passed to it, which is a pointer to an int. We dereference it and load that value into a local variable. Now we don't need that heap memory anymore, so we can just free it. We saw this before as the way to pass arguments to a thread. Then each of our threads will run here.

The code we were trying to parallelize, again, was all of this transfer loop. So if I want to make it go fast, well, I just do the old copy-paste and move it into each thread. Is this fast? Probably. Will there be data races? That's a good question. Will this also be slow? Let's just see.

Let's compile and run it. With just the main thread, what we had before, it was 10.6 seconds. In this case, we're waiting, we're waiting, we're waiting. This isn't looking good for speeding things up. Are there any deadlocks? That's a good question. Well, let's argue about deadlocks. Would there be any deadlocks here? Deadlocks are between mutexes, and currently we have none. There aren't any locks, so how can we have a deadlock if we have no locks?

This is still going. We can check htop and see what my processor is doing. You can see it's doing some work. Each of those bar charts is the CPU usage of one of my cores. It was doing stuff, and now it's not doing anything, which means it probably finished. So this took a minute and two seconds: way slower. Why would that be way slower? And also, our total is wrong here, so that's not good.

How many transfers are we doing in total in this case? Num transfers, remember, was 10 million.
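The argument-passing pattern described above (malloc an int, hand the pointer to the thread, copy the value out and free it inside run) can be sketched as follows. The seen array and the helper name start_and_join_threads are hypothetical, added only so the result can be observed; the lecture's run function goes on to do transfers instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NUM_THREADS 8

/* Hypothetical bookkeeping: each thread writes only its own slot, so no race. */
static int seen[NUM_THREADS];

static void *run(void *arg) {
    assert(arg != NULL);
    int id = *(int *)arg; /* copy the ID out of the heap allocation */
    free(arg);            /* the heap copy is no longer needed */
    seen[id] = 1;
    return NULL;
}

/* Create NUM_THREADS threads, each with its own ID, then wait for all of them.
   Returns 0 on success and -1 if any allocation or pthread call failed. */
static int start_and_join_threads(void) {
    pthread_t threads[NUM_THREADS];
    int created = 0;
    int status = 0;
    for (int i = 0; i < NUM_THREADS; ++i) {
        int *id = malloc(sizeof *id);
        if (id == NULL) { status = -1; break; }
        *id = i;
        if (pthread_create(&threads[i], NULL, run, id) != 0) {
            free(id); /* the thread never started, so free the ID here */
            status = -1;
            break;
        }
        ++created;
    }
    /* Join only the threads that were actually created. */
    for (int i = 0; i < created; ++i) {
        if (pthread_join(threads[i], NULL) != 0) status = -1;
    }
    return status;
}
```

Giving every thread its own heap allocation avoids passing a pointer to the loop variable, which would be changing while the new threads read it.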
So if I have eight threads, what is my first issue here? Yeah, I'm doing 80 million transfers. If I want to do the 10 million faster, the idea is to divide the work up equally between the threads and have each thread do 1,250,000. In this case I have each thread doing all of the transfers, so they're doing a total of 80 million. My first problem is that I'm doing way more work now than I did before. If I want each thread to do only an eighth of the work, I should do something like this: divide the loop bound by num threads, so each thread only does an eighth of the work.

Yeah, so instead of incrementing this by one, I could increment it by num threads. In this case it's a nicely divisible number, so it doesn't matter, and doing one more transfer is going to be very fast, so it doesn't really matter either way. But you're right: if you really cared about dividing it up exactly equally and the number was not divisible, then I should do something like that. In this case, I will not worry about it.

Now I run it. I have eight threads, each doing an eighth of the work. Hey, it's a bit faster than one thread. But our total bank funds are not correct, so that's not good.

Well, let's pretend we are Scotiabank or TD Bank or something like that. We have a lot of customers, so let's say we're managing, what's that, 10 billion dollars. We're using 152 megabytes of memory just for the accounts. That's a lot of memory. We run this, and the total is the same before and after. Hmm, why would that be? Yeah, now a data race is super unlikely, because our data races come from changing an account's balance. If two threads are changing the same account's balance at the same time, then we might have a data race. If we have 10 million accounts and only eight threads, the likelihood of two threads actually hitting a data race is quite low. So at this point, I'm too big to fail.
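For the case mentioned above where the number of transfers does not divide evenly by the number of threads, one possible way to hand out the work is sketched below. The function name share_of_work is hypothetical; the lecture skips this because 10 million divides evenly by eight.

```c
#include <assert.h>

/* Number of items thread `id` should process when `total` items are split
   across `num_threads` threads. The first (total % num_threads) threads take
   one extra item, so the shares always add up to `total`. */
static int share_of_work(int total, int num_threads, int id) {
    assert(num_threads > 0 && id >= 0 && id < num_threads && total >= 0);
    int base = total / num_threads;
    int extra = total % num_threads;
    return base + (id < extra ? 1 : 0);
}
```

With 10,000,000 transfers and 8 threads, every thread gets 1,250,000; with 10 items and 3 threads, the shares would be 4, 3, and 3.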
So I'm good.

Unlike most problems in computing, where things typically get more difficult as they get bigger, for data races it's the opposite. The bigger and more complicated the thing I'm doing, maybe the less likely I am to see a data race.

But if I do something like this, with seven accounts, a data race should be much more likely. I start with seven thousand dollars and I end with, yeah, 95 billion. Imagine you are operating a bank that only holds seven thousand dollars in total, and at the end of the day it says you owe 95 billion. Yeah, this is a bank error in your favor. What likely happened here? There was a data race, and there was an overflow, or in this case specifically an underflow. The balance is an unsigned integer. What happened is that when we subtracted, it was too much and the result should have gone negative. But because an unsigned type can't represent a negative value, it underflows and wraps around to nearly the maximum value, which is gigantic. So at the end of the day our bank is super wrong, and that's not great.

If we have three accounts, we're also more likely to have a data race, but less likely to underflow. In this case we end up with twenty-seven dollars, so also not a good situation. Yeah, so now we're at your bank balance, which isn't that great.

Assuming we want to fix it, how would we fix this so we don't have a data race? Yeah, a lock. Let's just make a lock. I can create a good old static mutex, so here's my lock. Actually, before even dealing with the lock, we have to argue about another thing.
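The underflow described above can be reproduced without any threads by replaying one bad interleaving by hand: a thread computes its amount from an old balance, other threads drain the account, and then the stale amount is subtracted. This sketch assumes a 32-bit unsigned balance purely for illustration (the lecture only says the balance is unsigned), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Subtract an amount that was computed from an out-of-date balance.
   Unsigned arithmetic wraps modulo 2^32 instead of going negative. */
static uint32_t withdraw_stale(uint32_t current_balance, uint32_t stale_amount) {
    return current_balance - stale_amount;
}

/* Replay: the amount 100 was computed when the balance was 1000, but by the
   time the subtraction happens the balance has dropped to 50. */
static void demo_underflow(void) {
    uint32_t after = withdraw_stale(50, 100);
    assert(after == UINT32_MAX - 49); /* 4294967246, not -50 */
}
```

A single wrapped balance like this is enough to make the bank's final total wildly larger than the funds it started with.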
This run function is running from multiple threads, and it calls the rand function, which we did not write. So we should probably check whether it is even safe to call it from multiple threads at the same time. If we look at the documentation (whoops, not read) for rand, we can pull it up, and if we go down we can see some attributes. There's this fun attribute table with an attribute called thread safety. If the value is MT-Safe, it means multi-threading safe. So we're actually safe to use this across multiple threads; we don't have any data races or anything like that.

But you might imagine that rand has some internal state, so it might be using a mutex shared between all of our calls, which may not be good. If we want, there is a version of rand called rand_r that lets you manage your own state, so that you can have completely independent state with no interaction between multiple threads. It wouldn't need a mutex because it only modifies your own state. Here its state is just a number. I can initialize it to, I don't know, the thread ID plus one, and then give that seed to both calls. In that case I own all my state, and hopefully these rand_r calls are completely independent, so they don't need a lock or anything. So this is just changing the function I'm using to be faster and more independent. Hopefully we'll get some type of speedup.

If I run it now, it went from seven seconds to just 1.7. It got a lot faster just by knowing that if we use rand and it's thread safe, it probably has a mutex in it, and that probably makes it really slow if we have multiple threads all calling it at the same time. So we can use rand_r and make it a lot faster. But in this case we still have our data race, so we just failed faster, which, yeah, is not always a bad thing.
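A per-thread seed with rand_r, as described above, might be used like this. The helper name pick_account is hypothetical; rand_r itself is the POSIX function that takes a pointer to caller-owned state. The modulo introduces a slight bias, which does not matter for this simulation.

```c
/* Ask for POSIX declarations in case the compiler is in strict ISO C mode. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Pick a random account index using state owned by the calling thread.
   No other thread touches *seed, so no lock is involved. */
static size_t pick_account(unsigned int *seed, size_t num_accounts) {
    assert(seed != NULL && num_accounts > 0);
    return (size_t)rand_r(seed) % num_accounts;
}
```

Inside run, each thread would declare something like `unsigned int seed = id + 1;` once and pass `&seed` for both the from index and the to index. Two threads given the same starting seed would generate the same sequence, which is why the seed is derived from the thread ID.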
All right, so we created a mutex. Where should I place my lock and unlock to make this completely safe? Before and after the for loop? Well, in run, the seed is a local variable, so it's independent. In this loop, those are all independent variables, independent per thread. So we don't have any data races there. In transfer, okay, we're sharing the accounts between threads, so this is probably where we have data races.

We could do the Java thing, right? It's safe if we just put a lock and an unlock around the entire function. So let's do that. Does this have a data race now? Let's see. Remember our big bank: we had a data race, but we were too big to fail. So just because you run it and don't see a problem doesn't mean it's safe. But if we look at this, only one thread can do this at a time. We're sharing the accounts, but I don't have any concurrent accesses, and I only modify the accounts inside the lock. So I shouldn't have a data race, and hopefully when I run this I have the same balance before and after.

I have a million dollars. It seems a lot slower, because before it was only about one second, and remember our baseline without using threads or anything was 10.6. In this case, ooh, 18.3. But hey, it looks like we don't have a data race, and in fact we don't, because this whole thing happens one thread at a time.

Can I make this any faster? Right now I'm doing a very poor job: I have eight threads and it's now nearly twice as slow. Can I move this lock and hopefully make it go faster? Remember, we're limited by this critical section, because only one thread can be in it at a time. We just turned the whole thing serial.

So you want this, the lock only around the updates? In this case we would still have a data race, because we're not protecting this read. The read is still part of the data race. But what about if we do this? Is this safe?
Ideally this securely connect to the bank function is independent for each thread, in which case we don't need a mutex around it, right? If we want things to go faster, we want the critical section to be as small as possible. So we should be able to take this securely connect to the bank call out of it. Let's run it again. It should be a lot faster, because we made our critical section a bunch smaller. Now we're at 5.9 seconds. Ideally it would go eight times faster, but now, instead of being slower, we're about two times faster. So that's not so bad. At this point, this is about the best I can do with a single lock.

What should I do to make it go even faster? If you want things to go faster, you want as many things as possible to go in parallel. If I am transferring between accounts, could I transfer between you two guys and between you two guys at the same time? So I can run transfers in different threads at the same time as long as the to and from accounts are all different. Right now there's no way, without using a mutex or something, to check what other threads are doing at any given time. So I need a mutex.

Yeah, that's an idea: one mutex per account. That seems like a terribly large number of mutexes, but that's not too bad. Let's try it. Here we'll just add a field to the account called mutex, so now each account gets its own individual mutex. Of course, we're going to have to initialize it, so let's go down. Here we have to call pthread_mutex_init (whoops, not "mintex") on the account's mutex, then null for the default attributes. So now every account has a mutex.

What should my transfer look like now? Lock both from and to, so do something like that. Can I switch between those two locks? Deadlocks? So yeah, can this deadlock as I have written it? Do I have to switch them? What is the possibility of deadlocking?
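The single-lock version with the connection moved out of the critical section, as described above, could look like the sketch below. The names bank_mutex and securely_connect_to_bank and the field widths are assumptions; the lecture's code is not shown in the transcript.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct account {
    uint32_t id;
    uint32_t balance;
};

/* One global lock protecting every account's balance. */
static pthread_mutex_t bank_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the lecture's placeholder; assumed independent per thread. */
static void securely_connect_to_bank(void) {}

static void transfer(struct account *from, struct account *to) {
    /* Independent per-thread work stays outside the critical section. */
    securely_connect_to_bank();

    int err = pthread_mutex_lock(&bank_mutex);
    assert(err == 0);
    /* The read of from->balance must be inside the lock as well. */
    uint32_t amount = from->balance / 10;
    from->balance -= amount;
    to->balance += amount;
    err = pthread_mutex_unlock(&bank_mutex);
    assert(err == 0);
    (void)err; /* silence the unused warning when asserts are compiled out */
}
```

Only the three lines that touch shared balances are serialized, which is why this runs faster than locking the whole function while still allowing just one transfer at a time.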
Two threads try to transfer in the reverse direction. So say I have a transfer of, I don't know, ID 1 to ID 2, and then I also have a transfer of ID 2 to ID 1, and the first is in thread 1 and the second is in thread 2. What may happen is that thread 1 executes first. It would get its first lock, mutex 1, and then we could context switch over to thread 2. To argue about deadlocks, just assume you have only one physical core and you're context switching around. So in this case I would context switch to thread 2, it would acquire mutex 2, and now we're at the deadlock condition. We're at hold and wait: thread 2 has lock 2, thread 1 has lock 1, and they're each waiting for the other's lock.

So how would I resolve this? Acquire them in the same order? Right: if I have a transfer between ID 1 and ID 2, I should always acquire the lock for ID 1 and then the lock for ID 2. Yeah, smallest ID first. Each ID is unique, so each lock has a unique ID, and I want to make sure I always acquire the lowest ID first. It doesn't matter which rule you pick as long as you're consistent; I could pick the highest ID first. But you said lowest ID first, right?

In that case, let's create two pointers to mutexes. We're going to lock this one first and lock this one second. We always want m1 to be the mutex with the lowest ID and m2 to be the mutex with the highest ID. So I can just check their IDs, which are supposed to be unique. If the from ID is less than the to ID, then m1 should be the from mutex and m2 should be the to mutex. What is this complaining about? Yeah, whatever, don't care. The other case is that the to ID is smaller than the from ID, so I should acquire the to mutex first. Just do the old copy-paste and change these around. Now I always acquire the lowest ID first.

Do I have a deadlock now? Shouldn't, right? Let's see how clever we are. What, so after about 10?
We know we have an issue. Well, you gave up on it at seven. Okay, now we're concerned.

All right, one way we can check: it might be that we're just trying really hard, right? So we can check htop and see what our CPU is doing. If we look at it, all of our cores are doing nothing. So yeah, we probably deadlocked. Not good. This isn't making any progress.

This error is very subtle. I do acquire my locks in the same order every time, which is good. But there is a very subtle case where I can deadlock with just one thread, because I do something a bit silly. So what if... yeah, what if they're both the same? We're generating random numbers, so from and to can be the same account.

We can think about this a bit harder. In this context, if we're transferring to and from the same account, we could just return immediately. We don't have to argue about, okay, they're the same, so we only need to lock once, or something like that. Transferring to and from the same account does nothing, so why even bother? We'll just do nothing: if from equals to, just return.

So now, is there a deadlock? Well, hopefully not. We acquire everything in the same order, so that breaks the circular wait, and we're not accidentally deadlocking ourselves by trying to acquire the same mutex twice. Let's run it and see how we do. Boom, almost eight times as fast. Which is good: we're actually using all eight cores. We can even watch this, which is fairly satisfying. Let's see how fast I can go. Wait, I didn't put... boom. See, it ramps up and finishes. Isn't that satisfying? Our machine is being used to its full capacity.

So am I done? Probably. Eight cores, almost eight times as fast. I can't get any better than that.

What about if I do something like this? Will that be eight times as fast? Should be. Let's run it. That was slow. We don't have any data races, but it's slow. Why? The number of accounts. There are only five accounts, so there are fewer locks. There are only five locks.
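Putting the two fixes above together (always lock the lower ID first, and return early when from and to are the same account), a per-account-mutex transfer might look like this. It is a sketch under assumptions: the field widths and the placeholder securely_connect_to_bank are not taken from the lecture's file, and error checking on the pthread calls is omitted for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct account {
    uint32_t id;            /* unique per account */
    uint32_t balance;
    pthread_mutex_t mutex;  /* protects balance; set up with pthread_mutex_init */
};

/* Stand-in for the lecture's placeholder. */
static void securely_connect_to_bank(void) {}

static void transfer(struct account *from, struct account *to) {
    assert(from != NULL && to != NULL);
    /* A self-transfer changes nothing, and locking the same default mutex
       twice from one thread would deadlock that thread by itself. */
    if (from == to) {
        return;
    }
    securely_connect_to_bank();

    /* Consistent global order: lowest ID first. This breaks circular wait. */
    pthread_mutex_t *m1;
    pthread_mutex_t *m2;
    if (from->id < to->id) {
        m1 = &from->mutex;
        m2 = &to->mutex;
    } else {
        m1 = &to->mutex;
        m2 = &from->mutex;
    }
    pthread_mutex_lock(m1);
    pthread_mutex_lock(m2);

    uint32_t amount = from->balance / 10;
    from->balance -= amount;
    to->balance += amount;

    pthread_mutex_unlock(m2);
    pthread_mutex_unlock(m1);
}
```

Transfers between disjoint pairs of accounts never touch the same mutex, so they can proceed in parallel; only transfers that share an account wait on each other.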
There's one per account. Yeah, so there's more contention, right? We're fighting over the locks, and there are fewer things we can do in parallel, so the slower we go.

In this case, if we have something like three accounts, in actuality it would probably be slightly faster, which seems a bit wrong. But it turns out that if we only have three accounts, the likelihood of transferring to the exact same account is pretty high, and in that case we do nothing. So it goes a bit faster. In general, if we didn't have that little optimization, it would be a lot slower. But here we just bail out if it's the same account, so we have some weird trade-off where it's more likely that we just do nothing. So, we successfully created a good bank.

Is this the only way I could have prevented a deadlock? Let's say you don't have an ID anymore. You could compare the pointers, yes. Let's say you are scared of pointers. Yeah, we use trylock, so we eliminate the hold and wait condition, right?

So let's rewrite this again. We can lock the from mutex, and then in a while loop we can try to lock the to mutex. Remember, trylock is a non-blocking version of the lock call, and it tells you whether or not you have acquired the lock. If trylock returns zero, it was successful, which means you have the to mutex. If it returns anything other than zero, you don't have it. So if I don't have it, I go into this while loop. What should I do? Sorry? Yield. So I could yield, sure. If you were doing lab 4 you would call a yield function or whatever. Well, the system call for yield is called sched_yield. So here we go. Let me delete the other code we have and restore this to its former glory.

All right, so is that all we need? Yeah, we need to unlock from, and we should probably do that before the yield, right? Because we want to yield while we're holding no locks. So in this case, I would unlock. Which lock am I unlocking?
From. All right, is this good? I can try it. Let's do our old hundred accounts. Oops.

Yeah, so right here at this line, we need both locks to be safe from a data race, right? Is there a scenario where I reach this line without both locks? Yeah, I'm pretty much guaranteed to. Here I lock the from mutex, so if I make it past this function call, I have the from mutex. Here I try to get the to mutex. If I go into the body of the while, it means I did not successfully get the to mutex. Then I unlock from, so at this yield I currently hold no locks. Then I just try to get the to mutex again, and if I acquire it, I fall out of the loop and I'm executing this line with only the to mutex acquired. I don't have the from mutex.

So I should probably try to reacquire from, right? Where would I do that? Here, after the loop? All right, give it a try. Oops, we deadlocked ourselves. So generally, just trying something is not a good approach; we should argue about it. In this case, what could happen is that we make it through this while loop, so at this point we hold the to mutex, and now we're trying to acquire from. So now we have hold and wait again: we're trying to acquire a mutex while we hold another one. That's why we deadlocked there.

So I should probably put my lock here, right after I yield. I reacquire from, and then, while I hold from, I try to acquire to again. I just retry, and if I don't get it, I unlock and I yield. No harm, no foul. So do I have a deadlock anymore? Hopefully not.

So here is solution number two. In this case: no deadlock, and about the same performance. So it doesn't matter which one we use.

All right, any questions about this? We have successfully run the greatest bank in the world. All right, so I will show you a little tool.
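The trylock-based alternative worked out above can be sketched as follows. As before, the field widths and securely_connect_to_bank are assumptions. The early return for a self-transfer is kept in this sketch because, with a default mutex, pthread_mutex_trylock on a mutex the same thread already holds keeps failing, so the loop would never end.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

struct account {
    uint32_t id;
    uint32_t balance;
    pthread_mutex_t mutex;
};

/* Stand-in for the lecture's placeholder. */
static void securely_connect_to_bank(void) {}

static void transfer(struct account *from, struct account *to) {
    assert(from != NULL && to != NULL);
    if (from == to) {
        return; /* nothing to move, and trylock below could never succeed */
    }
    securely_connect_to_bank();

    pthread_mutex_lock(&from->mutex);
    /* trylock returns 0 only if the to mutex was acquired. */
    while (pthread_mutex_trylock(&to->mutex) != 0) {
        /* Back off completely so this thread never holds one lock while
           blocking on another (no hold and wait). */
        pthread_mutex_unlock(&from->mutex);
        sched_yield();
        /* Reacquire from before retrying, so leaving the loop means both
           locks are held. */
        pthread_mutex_lock(&from->mutex);
    }

    uint32_t amount = from->balance / 10;
    from->balance -= amount;
    to->balance += amount;

    pthread_mutex_unlock(&to->mutex);
    pthread_mutex_unlock(&from->mutex);
}
```

The only blocking lock call happens while the thread holds nothing, which is the property that rules out the deadlock; the cost is that a thread may spin through several retries under heavy contention.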
It does not excuse you from arguing about things, but it will make your life a little easier. Just like Valgrind helps you detect whether you screwed up memory, there's this fancy tool that helps you detect whether you have screwed up your threads. They are called the sanitizer tools, and they are fantastic. One of them is called ThreadSanitizer.

If you want to set up your entire build directory again such that your code is compiled with the sanitizer in it, this is how you do it. You redo the setup command and give it this flag that says to use the thread sanitizer when you build. This wipe option just re-initializes everything. After this we'll see, under user defined options, sanitize: thread. So now if we compile our code, it has the thread sanitizer built into it, which will instrument our code and tell us if we have any errors.

The caveat is that it will make your program way slower, because it's adding a bunch of checks for you. Another caveat is that if it doesn't show you any errors, that doesn't mean you are a hundred percent safe. If it shows you an error, you have that error with a hundred percent probability. But it's just monitoring whatever your program does during that execution, so even if you have a data race, it might not present itself, because it might be super unlikely. Just because this tool comes back clean does not mean you do not have a data race; you might still have a rare one. But if it says you have a problem, you definitely have a problem.

So let's uncomment that line, because we caused a deadlock here. If we compile our code again with ThreadSanitizer, hopefully it helps us out. If we run our bank sim again, it should be really angry, and it is really angry. It gives us a bunch of output. What does it say? Well, here, let's go.
It gives us a bunch of information about the threads, and it also says mutex M0 acquired while holding M1, and then M1 acquired while holding M0. Sounds like a hold and wait to me. And it says, oh, here's your lock-order inversion. So it shows you the circular dependency of locks: one thread has M0 and is trying to get M1, while another thread has M1 and is trying to get M0. It shows you that circular deadlock, and it shows you a whole bunch of cases where that's true. So we definitely have a deadlock in this case. It's a helpful little tool.

If we really screwed it up, we can see that our deadlocks can get really complicated. So let us rewind a little bit, to where we just had two calls to lock and didn't try anything else. We get a lot of output, and it also shows that our cycles are gigantic. M0 depends on M1, and so on: this is a series of ten locks that makes it all the way back to the first one. So some deadlocks are more deadlocky than others. This one pretty much takes every thread out with one gigantic cycle. So it will show you that, and it is really slow.

Let's rewind even further, to where we had a data race, right? Let's see what this tool tells us about that. We compile, we run it, and it yells at us. It says, oh, there's a potential data race. That's very helpful of you. It also tells you what line number it happens on. So on line 58, that's the to. It's telling us it has something to do with this balance, that the balance is shared between different threads. It's giving us some helpful tips about where the data race is. If we look up a bit more, it gives us more information: it also tells us there's a data race on line 57, which is this line here. So it tells us both of these lines have data races in them. So it gives you some more.
I guess, guidance to actually solve your problem. So, any questions about all this fun stuff? This ThreadSanitizer tool: A++, would use again.

Yeah? Sure. Well, yeah, I'll probably put it in the lab five description, because you're going to be using mutexes, and it's there as well. You guys came to this lecture, so you know to go look at it, but I'll also include it in lab five. This was essentially lab five, except you're doing hash tables instead of accounts. Same principle, and you'll have to argue about it. But yeah, this lecture is basically lab five. So what's that, 54 minutes? You should be able to finish it in 54 minutes. Probably not.

All right, any other questions before we leave for today? Yeah, that's a good question. Let's back up until we don't have an issue. This was not good. This code worked, right? Let's get rid of the sanitizer, and let's time it. Okay, so this was a good version of our code, right?

You're asking, what if we have more threads? Say, I don't know, 16. With 16 in this case, it's the same thing: I'm limited by the number of things I can do in parallel. This is technically a little slower because there's a bit of overhead context switching between the different threads. And if instead I have, I don't know, a thousand threads, they'll likely just waste a bit more time, because I'm context switching between a bunch of threads.
So if I run that now, it's a bit slower. My speedup is limited by the number of things I can do in parallel.

More threads might speed things up if not all of those threads can run at once. If one is waiting on a file or something and I could run another one in the meantime, my program might actually go faster with nine threads or something like that. But in this case I'm not waiting on I/O and I'm not doing anything special. If I'm scheduled and I can get the mutexes, I can just go. So in some cases it might be faster if you add more threads, but if I'm not waiting on I/O or anything, I'm limited by the number of actual physical cores I have on my machine. In this case I can only do eight things at once, and beyond that we're just context switching between threads. We're just wasting some time, because I can only execute eight at a time anyway.

On some processors there's something called hyper-threading. On Intel processors, you might have eight cores and it says you have 16. Some cores share some resources and try to make two threads run on one physical core, depending on the operation. So you might increase your threads to 16 and it might go faster. Maybe eight is faster. It depends on what your application is actually doing. That's more of a performance thing, which ideally you deal with in 454, because there's no one true answer. It depends on the application and on the architecture, and then you throw GPUs into the equation and things change too, right?

All right, any other questions? Yeah.
Yeah. One of the things I did in grad school was look at Linux commits that had the word fix in them, then see where the code being fixed was introduced and how much time had passed. And yeah, those bugs can be around for something like seven years before someone actually fixes them, because they're that hard to debug. These are the hardest things to debug, period, because they don't always happen, right? Sometimes you might just never see the bug, and then you get new hardware and now you see it. And now what do you do? This is as hard as programming gets; this is the hardest class of bugs to fix. These are the things that last for years and years and years.

So imagine you have a bug like this in an actual program, where it's more complicated. Our example is fairly straightforward because every account is independent. But you can imagine kernel code can get quite complicated, or some other code might get more complicated, in which case, yeah, you have to start arguing about it very, very thoroughly, and it gets really complicated.

Yeah? No. There is no software that can detect this a hundred percent of the time. A tool will tell you that you have an error if it sees it; if it doesn't see it, that doesn't mean you're safe. That's a whole stream of research. If you want to get into doing research, there are endless possibilities there, if you want to try to build things that will catch that. Fun stuff.

All right, so we are out of time. Just remember: pulling for you, we're all in this together, and this is about as hard as the course gets.