Alrighty, welcome back to Operating Systems. Sorry, my computer decided to reboot itself as soon as I got here, which is fun with my complicated setup. Well, good thing today's lecture does not last the entire time. So today we're talking about locks. What we saw before was this problem where we had eight threads incrementing the same global variable a bunch of times, and whenever we ran it, we got a different answer every time. I can make that bigger. Different answers every time. So, in order to figure out what our problem is, we need to identify it and give it a name. Specifically, what we saw there was something called a data race, and data races occur whenever you share data between threads, or between anything that runs concurrently. A data race is two concurrent accesses to the same variable where at least one of them is a write. You just need one write plus any number of other accesses, and that can lead to a data race. Essentially, it's what we saw before: you're reading stale data and bad things might happen, and in this case it's caused by concurrent accesses. You might actually run into this issue when you implement Lab 4, too. I almost guarantee you will: one test case will be very hard to debug, and I will post something about it because I'm guessing all of you will mess it up. Basically, you have to be very careful when you come back from a swapcontext, because if you dynamically reallocate the array of thread control blocks, the array might grow before you come back and all of that memory might move, in which case an old pointer to the old location is no longer valid, and as soon as you come back from your swapcontext, it segfaults. That's because you're reading stale data. You need to make sure that when you come back from swapcontext, you read the newest data.
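The broken demo from the start of class might look something like this minimal sketch. The thread count (8) and per-thread iteration count (10,000) are assumptions reconstructed from the 80,000 figure mentioned later; the exact original code may differ.

```c
#include <pthread.h>

/* Hypothetical reconstruction of the racy counter demo:
   8 threads each increment a shared global 10,000 times
   with no synchronization at all. */
static int count = 0;

static void *run(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; ++i)
        ++count;  /* unsynchronized read-modify-write: the data race */
    return NULL;
}

/* Spawn the threads, wait for them, and report the final count. */
int run_demo(void) {
    pthread_t threads[8];
    for (int i = 0; i < 8; ++i)
        pthread_create(&threads[i], NULL, run, NULL);
    for (int i = 0; i < 8; ++i)
        pthread_join(threads[i], NULL);
    return count;  /* anything up to 80000; rarely exactly 80000 */
}
```

Calling `run_demo()` from a `main` and printing the result reproduces the different-answer-every-run behavior, because increments from different threads can overwrite each other.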
So, in order to argue about data races, and about what interleavings between threads can happen, we have to talk about atomic operations. These are things your computer does that are indivisible: they either happen or they don't, there's no in-between, they never get halfway done. Because an atomic operation either happens or it doesn't, you cannot be preempted in the middle of one. What you have to argue about instead is the preemptions between atomic instructions. After you do something atomic, you could context switch over to another thread, do some other operations, and context switch back, but you just can't context switch in the middle of an atomic instruction. That's why it's called atomic. Now, here's a preview for your compilers course, if you want to take one. Compilers don't translate your C code into machine code in one step. They have some kind of intermediate language that is not quite assembly but is a lot simpler than C, and they use it to analyze your code and do optimizations. Pretty much no compiler does optimizations at the level of C; they do it at the level of what's called three-address code. Three-address code divides your program into very simple statements. Each statement can assign a value, and it can only do one operation at a time, with up to two operands. So you can't write something like one plus two plus three directly; you have to break it up into two statements: assign a register to one plus two, then assign a new register to the old register plus three. It's like doing addition back in elementary school: you can only add two numbers at a time, and if you want to add a third, you just do it again.
So the most complicated statement you'll see in three-address code is result equals operand one, operator, operand two; that is as complicated as they get, and they can have fewer operands. On the topic of naming things well, GIMPLE is the three-address code used by GCC. Don't ask me why they called it GIMPLE; probably for some silly reason, and it probably stands for GNU something-intermediate-something-something. We're not that creative with naming things. If you want to see the GIMPLE representation of any of your C code with GCC, you can use the flag `-fdump-tree-gimple` and you'll see all of it. It's actually fairly readable, even if you've never seen it before. (There's another flag if you want to see all of the intermediate representations.) Even without knowing it, it's a lot easier to read and reason about than raw assembly, or than trying to reason at the level of C. So here is the data race we saw and argued about before. If you looked at the GIMPLE for it: GIMPLE works through pointers, and count is a global variable, so it would be represented as a pointer to count, since we're modifying a virtual address. If we converted ++count to GIMPLE, it would look like this. Another nice thing about GIMPLE is that you're not restricted by any physical limitations, so it just assumes an unlimited number of registers. A D.number is its own unique register, each statement writes to a new one, and when the compiler targets an actual physical machine, it does register allocation and all that fun stuff you'd learn in a compilers course. So this first operation is our load from memory: we load the value pointed to by pcount into a register called D.1. And then this step is our increment step.
It takes the value we loaded, adds one to it, and assigns it to a new register, D.2. And then this third step is the memory write: we store the result back, updating the global variable. So, assuming those are the steps for ++count, and count initially equals zero, as we argued last lecture, what are the possible values of count at the end, after the two threads are done? Yeah? One or two. Yeah, one or two. Do we all remember that it's either one or two, depending on how unlucky we get? Depending on how unlucky we get, we could end up in the situation where thread one does the load, count is equal to zero, then we context switch over to the other thread, and it also does the load of count, which is still zero. Now it doesn't really matter what order the threads run in; they will both write out the value one. This is what you have to do to analyze data races: you have to show that no matter what preemptions happen, in whatever order, you always get the correct value. In this case, even with a single thread doing a read followed by a write, we can label the operations from thread one as R1 (read from thread one) and W1 (write from thread one), and they'll always be in that order within the thread, because we'll assume threads execute sequentially. Realistically, your compiler can reorder instructions, and your CPU can also reorder them as it executes, but for the purposes of this course we'll assume nothing gets reordered; that's where the real fun comes in. So if that's thread one, we can represent the operations from thread two as R2 (read from thread two) and W2 (write from thread two).
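The three GIMPLE steps above can be written out as a C sketch; the temporaries D.1 and D.2 become ordinary locals here, and pcount is the pointer to the global that GIMPLE operates through.

```c
/* ++count broken into its three-address-code steps. */
static int count = 0;
static int *pcount = &count;

void increment(void) {
    int d1 = *pcount;   /* step 1 (the R): load from memory into a register */
    int d2 = d1 + 1;    /* step 2: the increment, purely in registers */
    *pcount = d2;       /* step 3 (the W): store back to memory */
}

int get_count(void) { return count; }
```

A preemption can land between any two of these steps, which is exactly why two threads can both load zero and both store one.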
So, assuming, again, no reordering of instructions, our possible orderings are as follows. The first row starts by reading from thread one; these two columns represent what happens if we context switch over and do the read from thread two, in which case it also reads a zero, and at that point it doesn't matter whether we context switch back or not, we still get the result one. We only get the result two in the top case, where thread one executes to completion and then we context switch over and thread two executes to completion. So for data races, you have to argue that all the interleavings have the same outcome in order for the code to be safe, and you can quickly see how this gets out of hand: this is just two threads updating one value. You can imagine writing all of this out for the eight threads we had before. It kind of seems like a miracle that we ever got 80,000 when we ran it. Any questions about that? Those are all our possible fun orderings, and we would have to argue about every one of them. Okay. So, to fix this problem, we can use something called a mutex, short for mutual exclusion, and this is how to make one. They're part of the pthread library and they're called `pthread_mutex_t`. If you want to create one as a global variable, this first line is how you'd do it: you just assign it `PTHREAD_MUTEX_INITIALIZER`, which is just a fancy define. Otherwise, if you want the malloc-equivalent style where you initialize it at runtime, you call `pthread_mutex_init`, and the second argument is a list of attributes, like the attributes we saw for threads, although in this course we will just assume the defaults. And then whenever you are done with the mutex, it's just like memory: you should destroy it as soon as you're finished using it.
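Put together, the two creation styles and the destroy call might look like this sketch; the names m1 and m2 are placeholders, not from the slides.

```c
#include <pthread.h>

/* Style 1: compile-time initialization for a global mutex. */
pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;

/* Style 2: run-time initialization, the "malloc equivalent". */
int demo(void) {
    pthread_mutex_t m2;
    if (pthread_mutex_init(&m2, NULL) != 0)  /* NULL = default attributes */
        return -1;
    pthread_mutex_lock(&m2);
    /* ... protected code would go here ... */
    pthread_mutex_unlock(&m2);
    pthread_mutex_destroy(&m2);  /* like free: destroy it when you're done */
    return 0;
}
```

Both styles produce an equivalent mutex; the run-time style is what you'd use for a mutex that lives inside a dynamically allocated structure.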
So, that's how you create a mutex; how you use one is the important part. A mutex is also called a lock, and you can think of it as one: its purpose is to ensure that only one thread is executing some protected code at a time. A crude way to think about it: assume this protected code is a bathroom, a single-room bathroom, where you probably only want one person in the room at a time. How do you guarantee there's only one person in the room at a time? You put a lock on the door. Whoever goes in locks the door, so only one person is allowed in at a time, and whenever they're done, they unlock the door and someone else can come in. That's basically how this works. This protected code here is guaranteed to be executed by only one thread at any given time. Even if you preempt to another thread, that thread would reach this lock call, and if another thread already holds the lock, it should not be able to acquire it until the first thread has unlocked it. So if one thread is executing the protected code and another thread reaches the lock, that thread just blocks until the first one is done and makes it to the unlock. The idea is that mutual exclusion gets rid of data races: if only one thread is executing that protected code at any given time, we have no concurrent accesses, so we have no data races. Nothing can interleave with a thread while it's in the protected code. Yeah, so in this case, M1 could just be a global variable, so it's just a mutex. If it's local, then each thread has its own local one and it doesn't do anything. No, pthread_mutex_lock takes only a single mutex at a time; if you want to lock M2 as well, that's a separate call on another line. And we'll see why that matters.
Ideally, if you have multiple mutexes, which is a complication we'll get to later, you have to avoid deadlocks, and that's something we're not going to talk about yet. Basically, a deadlock is a situation where neither thread can make any progress anymore: if you hold one mutex and you're waiting on a thread that holds another mutex, and that thread is waiting on you, then neither of you can do anything. There's also a trylock operation that we'll get into later. But any questions about the protected code part? It should be the case that only a single thread is executing this protected code at a time; nothing else can concurrently execute it. Yep. If two threads make it to the lock call, the lock should guarantee that only one of them makes it through. Otherwise, if both of them made it through, it wouldn't be a very good lock. Like, if both of you went into the bathroom at the same time, probably not a good experience. Okay, any other questions about that? Yeah. So if the first thread makes it to the lock and no other thread holds it, it passes by; then if it's executing the protected code and we context switch to another thread that tries the lock, that thread shouldn't be able to progress. There are different options for what it does: it could just retry and waste some time, or it could go to sleep, or let another thread execute in hopes that the lock holder gives it up, or something like that. Yep. So basically, at this line, does lock have to wait? Yes: if one thread holds the lock and another thread reaches the lock call, it has to wait until the thread holding the lock is done. Alright, so we'll see an example of it. Let's fix our problem.
So, if we want to fix our data race, which is a bit of a silly example, we can create a new mutex as a global variable using that initializer. Our problem was with the counter, the global variable being shared, so if we want to make sure only one thread can increment it at a given time, we make it protected: we lock, we do whatever was the source of the data race, and then we unlock. And of course, after we join, we destroy the mutex, because we are responsible people who like to free things. Now if I execute this, every single time, because the increment is now protected, I get 80,000 over and over again, and I will not have a data race, because there is essentially only one lock and every thread is requesting the same key, so only one of them can pass at any given time. Yeah, yes. So this only works in the scenario where every single thread that accesses that counter uses the lock. If one doesn't, you're back in the situation where data races can occur, because you can have two concurrent accesses to it. That will become the bane of your existence, because everyone has to agree. Luckily for threads, everything lives in the same process and you control all of it, so you only have to agree with yourself, which is much easier than agreeing with other people. Alright, any other questions? Yep: you can think of the increment as happening atomically with respect to all the other threads, as long as all the threads always take the mutex when they increment. And yep, we could put multiple instructions between the lock and unlock. Another thread can still preempt us in the middle, but it can't enter the protected code, because if it tries to take the lock while we hold it, it can't pass. So yeah, that protects the whole thing.
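Putting the fix together, here is a sketch of the protected version of the counter demo, again assuming eight threads and 10,000 increments each (those counts are reconstructed from the 80,000 result, not taken from the original source):

```c
#include <pthread.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static int count = 0;

static void *run(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; ++i) {
        pthread_mutex_lock(&mutex);    /* only one thread past this line */
        ++count;                       /* the protected code */
        pthread_mutex_unlock(&mutex);  /* let the next thread through */
    }
    return NULL;
}

int run_demo(void) {
    pthread_t threads[8];
    for (int i = 0; i < 8; ++i)
        pthread_create(&threads[i], NULL, run, NULL);
    for (int i = 0; i < 8; ++i)
        pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&mutex);     /* clean up after the joins */
    return count;                      /* now always exactly 80000 */
}
```

Every thread takes the same mutex around the increment, so the read-increment-write sequence behaves atomically with respect to the other threads.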
There are such things as atomic integers, which would make ++count itself atomic, so we wouldn't get into this situation here; but for more complex data structures, or when you want to make sure several things happen together in the same order, you need to use a mutex lock and unlock, because atomic operations only exist for integers and other very simple values. Alright, any other questions on that? Cool. So here's our code, just so you have it. The code between the lock and the unlock is called the critical section, and again, that means only one thread executes it at a given time. If you were to implement your own mutual exclusion, you'd need a few properties. Safety: only a single thread executes the critical section at once. Liveness, which harkens back to scheduling: if multiple threads reach the critical section, one must proceed; we can't just sit there and do nothing. You won't get a data race, but you also won't get any progress, which is no good. Progress through the critical section should not depend on outside threads; you can mess that up and deadlock, which we'll see later. And the lock should have a bounded wait, a.k.a. be starvation-free: if a thread makes it to the lock call and can't acquire it, it must be guaranteed to eventually make progress. So yep. The question is: can I use the same mutex in multiple places in my code? Absolutely. I could do something like this: since we have lock and unlock around two different blocks using the same mutex, only one of those blocks can be executing at a given time, by only one thread, because it's the same lock. Yep. At line 12? Yeah.
Will the second thread proceed to the counter, or will it be blocked, waiting until the first thread finishes? So say thread one is here and thread two is here. At that point, you don't know which is going to get scheduled first, because that's up to the operating system, but you're guaranteed that only one of them will pass. Either T1 goes ahead and gets the lock, in which case, going back to thread two, it's stuck at its lock call because thread one holds the lock. Yeah: if you make it to a lock call and some other thread holds the lock, you just have to wait for it, because again, only one thread is allowed to proceed at any given time. Alrighty. Are pthread mutex locks re-entrant? It depends; you have to look at the documentation. The default ones are not re-entrant, and if you don't know what that means, that's okay; we may or may not get into it later. Alright, so if you were to implement your own lock: well, this first one is really just a general programming rule, but your critical sections should be as small as possible. Your lock implementation should be efficient, so you don't consume a bunch of resources while waiting. It should be fair, kind of like in scheduling: you want each thread to wait approximately the same amount of time, which does suggest there should probably be a queue. If one thread acquires the lock and seven threads are waiting for it, they should form an orderly line so everything is fair. And it should be simple: the lock should be easy to use and hard to misuse, although that's hard to achieve.
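A quick aside on the atomic integers mentioned earlier: in C11, the counter demo can also be fixed with `<stdatomic.h>` instead of a mutex, since a single increment is exactly the simple case atomics handle. A sketch, with the same assumed thread and iteration counts:

```c
#include <stdatomic.h>
#include <pthread.h>

/* With an atomic integer, the increment itself is indivisible,
   so this simple case needs no mutex at all. */
static _Atomic int count = 0;

static void *run(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; ++i)
        atomic_fetch_add(&count, 1);  /* one indivisible read-modify-write */
    return NULL;
}

int run_demo(void) {
    pthread_t threads[8];
    for (int i = 0; i < 8; ++i)
        pthread_create(&threads[i], NULL, run, NULL);
    for (int i = 0; i < 8; ++i)
        pthread_join(threads[i], NULL);
    return count;
}
```

This only works because the critical section is a single integer operation; anything involving multiple values or a data structure still needs a lock.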
So, similar to libraries, there are different layers of synchronization. Atomic operations need to be provided by your physical hardware in order to build something like mutexes on top of them; we'll get into one of the most important such primitives in the next lecture. On top of those low-level building blocks, you implement high-level synchronization primitives like mutexes and the other things we'll see in this course, and using those primitives, you should be able to build a properly synchronized application. What does properly synchronized mean? It means your application does not have any data races in it. Now, if I wanted to implement my own mutex for whatever reason, and I have a system with only a single core where my only source of concurrency is interrupts, I can implement a lock pretty easily. In my lock function, I just disable interrupts. If interrupts are my only source of concurrency and I disable them, I won't get interrupted while that core runs the critical section, and then as part of my unlock, all I have to do is re-enable interrupts. I just get rid of concurrency, so I don't have any problems. This, of course, doesn't work on a machine with multiple cores, and it doesn't work in practice on a general-purpose operating system, because a general-purpose OS will not let user code disable hardware interrupts. But it is applicable if you do really low-level embedded programming, where you have a simple, dumb, single-core processor and you're writing your own operating system. Well, guess what: the only source of concurrency is interrupts, so just disable them when you don't want concurrency. So, let's try to implement our own lock in software, since we have a multi-core machine.
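The single-core, interrupts-only lock described above can be sketched as follows. `disable_interrupts()` and `enable_interrupts()` are hypothetical stand-ins for privileged instructions (e.g. cli/sti on x86); here they just flip a flag so the sketch is self-contained and testable.

```c
/* Single-core lock sketch: the only concurrency is interrupts,
   so lock/unlock simply toggle them. The two helpers below are
   hypothetical; a real kernel would use privileged instructions. */
static int interrupts_enabled = 1;

static void disable_interrupts(void) { interrupts_enabled = 0; }
static void enable_interrupts(void)  { interrupts_enabled = 1; }

void lock(void)   { disable_interrupts(); }  /* no interrupts, no preemption */
void unlock(void) { enable_interrupts(); }

int interrupts_are_enabled(void) { return interrupts_enabled; }
```

Note this gives mutual exclusion only on one core: another core can run the critical section concurrently regardless of this core's interrupt state, which is exactly why a software lock is needed next.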
So, we'll represent a lock as an integer, and we'll have a pointer to it and just flip the value between zero and one. Initially, the value of that integer should be zero, to indicate that no thread holds the lock. Then in my lock function, this looks a bit weird: while the value of the lock is equal to one, there's a semicolon, and that semicolon means it will keep running that while loop as long as the condition is true. What that's supposed to do is: if the value of the lock is one, that should mean some other thread holds the lock, so I need to sit here and wait for it. Otherwise, if the value of the lock is zero, the while loop breaks out immediately, and this thread sets the value of the lock to one to indicate that it now holds the lock. If it writes the one, that should block any other thread from acquiring the lock, leaving it spinning in that while loop. And then the thread that unlocks can just change the value back to zero, and the next thread in line can hopefully get it. So, have a good think about this for a minute or two, and then let me know what issues you see with this implementation. Remember how locks are supposed to work: if two or more threads make it to the lock, only one should be able to proceed. Yeah. Specifically, what are the details of the race; what ordering of the threads causes a bad outcome? How would they both pass the while loop? Yeah, assuming one core. Is everyone on the same page with this, or do I need to explain a bit more? This is not correct, so we need to be very specific about what can happen here. Let's say the value of my lock is initially zero, and we have two threads. You should be able to explain a very bad outcome with just a single core, using concurrency.
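For reference, the implementation under discussion, as a C sketch (the names spin_lock/spin_unlock are mine, not from the slides):

```c
/* The (broken) software lock: 0 = free, 1 = held. */
void spin_lock(int *l) {
    while (*l == 1)
        ;       /* busy-wait while some other thread holds the lock */
    *l = 1;     /* claim it -- but a preemption can land between the
                   read in the while condition and this write */
}

void spin_unlock(int *l) {
    *l = 0;     /* hand it to the next thread spinning in the loop */
}
```

In a single-threaded setting this behaves as intended; the problems only show up once a context switch can separate the read from the write.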
So, if this is our lock implementation, what ideally should happen is: thread one goes ahead, and, let's see, what's a fun color, it reads the value of L, which is currently zero. Great: it breaks out of the while loop and writes one to L to indicate that thread one now holds the lock. Then, if we have thread two and we context switch to it, it reads the value of L, sees one, so the while condition is true, and because of that semicolon, it just tries again: it reads L, gets one, and reads it again and again and again, until finally we context switch back to thread one and, hopefully, it writes zero to the lock to indicate it's done. That seems reasonable, right? Seems like it works. So, is there a bad context switch that could happen here that would cause this not to work? Yeah: what happens if, immediately after thread one's read, we do a context switch to the other thread? What value does that thread's read see? It reads zero, so it can write one to L; we context switch back over, and thread one also writes one to L, and both threads make it through the lock call. Two threads arrive at the lock call, and two threads pass. That does not sound like mutual exclusion. We should be able to write this in a way that only one thread passes when that situation happens. Any other issue with this implementation? The race is a pretty big issue, but there's another one. Yeah: even if it worked properly, and that is a big if in this case, suppose one thread did the write, and then we context switch over to the other; it just does read L, whoops, one, and keeps doing that whenever we execute it. That takes CPU time.
Is that doing any useful work? If we have a single core and a thread reads one, the next thing it's going to do is read one again. Should it even bother reading again? It should know it's futile: it will just read the same value, because no other thread can run and change it in the meantime. So this would be a good place, if you did something like this as part of your Lab 4, to just yield when you see you can't acquire the lock, and let another thread execute, because you can't do anything, and you should know you can't do anything. So that's the second problem. The first: it's not safe, as we said before; both threads can make it through the lock call. The second: it wastes time, because it's a busy wait and the CPU is just burning cycles. Alright, any other fun questions? No? Okay, let's see if this button works. Okay, that doesn't exist anymore. Alright, cool. So, hopefully some of you have seen Lab 4; if not, we can read through it to make sure. I intentionally made sure there was extra time in this class so we could talk about Lab 4, because you should be starting. So, in Lab 4, you're creating user threads: you're writing your own user thread library, relying heavily on the ucontext library I showed you how to use in lecture 18, and you're using queues. So, let's go over a progression. There are 13 test cases that you should be able to pass in order, and some of them are very easy; with two lines of code, you should be able to pass three test cases. The first test is called main-thread-is-zero, and based on your feedback, I added a bunch of documentation for it to explain how the testing infrastructure works and what to expect.
So, how the testing infrastructure works is that it runs this test function in a new process, so it can check that it exits properly and all that. When the test calls wut_init, that should initialize your user thread library, and based on the documentation for wut_init, the initial thread executing the test function, which is a kernel thread, should have the special ID of zero, and you shouldn't have to allocate a stack for that thread; it just uses the initial thread's stack. This test case makes sure that wut_id always returns the ID of the currently running thread; in this case, it should return zero. The test writes that to a shared memory location, and then this check function executes your test function, waits for the process to finish, and then checks the values against their expected values. In this case, the return value of wut_id, written to shared memory slot zero, should be zero, because the ID of the main thread should be zero. So, any questions about the test setup or anything like that? Okay, so let me help you and go over a more complicated example. Here is a way more complicated example. It always starts by initializing the library. Here again it calls wut_id, and the initial thread should still have ID zero. Then it creates a new thread, whose ID should be the lowest ID available; the next ID after zero is one, so this call should return one, and the thread should be set up using makecontext and everything like that, such that whenever we switch to thread one, it starts executing the t1_run function. Then it also creates another thread that should execute t2_run, with ID two. Those are these values here; they should just be sequential.
So, thread zero is running, and we have a ready queue: wut_create should not automatically switch to the new thread, it should just set it up, set up that ucontext, and stick the thread on the ready queue of threads that are able to execute. (I showed you how to use a queue last lecture as well.) At this point, thread zero is running, and threads are added to the queue in order, so the next one to execute is thread one, then after that, thread two. Here the test does a join on thread two, which means thread zero is blocked, waiting for thread two to finish executing. Because of this join, thread zero can't run anymore; whenever thread two terminates, that should allow thread zero to continue. Since thread zero can't run, we run the next thread in the ready queue, which is thread one. Thread one starts executing here, and then there's this fun function called wut_cancel, which is basically your version of kill -9: it terminates a thread, and that thread is now dead. What thread one does is cancel thread zero. Thread zero is now terminated, and nothing is waiting on thread two anymore, because thread zero effectively doesn't exist; well, it's terminated, but it still has a thread control block until something joins on it. Luckily, in this test case, the next thing thread one does is join on thread zero, which is terminated, and in the case where join is called on an already terminated thread, it should return immediately. If you look at the documentation, a thread that gets the kill -9 treatment has its status set to 128, so the test should read 128 into shared memory slot five. The next thing thread one does is join on thread two. Nothing else is waiting on thread two now, since only thread zero was before, and it no longer exists; so thread one blocks, waiting for thread two to finish.
So, the only thread we can execute at this point is thread two: we killed thread zero, and thread one is waiting on thread two. In t2_run, it increments this global variable, reads its value into a shared memory location (it should be one), then it creates a new thread that executes a function that does nothing. This new thread should get the ID zero, because that's the lowest unused number: thread one joined on the terminated thread zero, which should have cleaned it up and made that ID available again. So this ID should be zero, and here the test writes that to that memory location. Then thread two exits. Now thread one should be unblocked, but it should be put at the back of the ready queue, so we run thread zero first. We execute thread zero, it doesn't do anything, so it immediately exits and terminates, and we go back to thread one, which was sitting at this join. It comes back from the join with a status of zero, because the thread implicitly exited, increments the global variable (from one to two), writes it, checks it, and at that point, thread one should implicitly exit too. It was the only thread left; there are no other threads to execute, so your process should end, and that's the whole test case. Alright, any questions about the fun stuff? So, the status is something you control and keep track of; it should be in your thread control block.
Yeah, you design the thread control block. The ucontext should be part of it, because that holds all the registers and everything, and that does most of the work; you're just adding a little bit more on top of it. You're going to want to keep track of the status, maybe as an int, I guess. When a thread terminates, the status is in the range zero to 255, so if you make it an int, you can use special values, like negative one or negative two, to mean other things that aren't used for normal statuses. But don't let me tell you what to do; you can do whatever you want with it. The status is only valid once the thread has terminated; initially the thread hasn't exited, so you can set it to whatever you want. And yeah, you don't have to handle that case; you can assume the thread implicitly exits, because that's what the tests do. Alright, any other questions? I will be on the Discord, because we are out of time. Don't forget: I'm pulling for you, we're all in this together. Start Lab 4.