All right, good morning. So I guess the plan of the day is let's blast through this lecture and then talk about quiz one, so more people come. So let's just jump right into it and then get to talking about the important stuff, which is the quiz, which is not that bad. Whoops. OK, mute myself, yikes. All right, so last lecture, we at least attempted to implement a lock and we failed miserably, because two threads could make it into the lock code and then two threads could return from that function, which does not guarantee mutual exclusion. So it turns out you can implement proper mutual exclusion with minimal hardware support: as long as loads and stores are atomic and instructions execute in order, you can actually implement mutual exclusion. There's two main algorithms you can use: Peterson's algorithm and Lamport's bakery algorithm. They don't scale that well, though, and processors nowadays execute out of order, so we actually need some hardware support nowadays. I might go into the details of one if Ashwin does it later, but I'll show you how locks are actually implemented in hardware nowadays. So the easiest way to think about how you would actually implement locks in hardware nowadays is to assume that there is a magical atomic function called compare and swap. This translates directly to a CPU instruction, except reading assembly sucks, so let's just pretend it's a C function. So the compare and swap function takes three arguments.
It takes a pointer that points to a number, then what you expect that number to be, and then the third argument is what you want to change that number to. Whenever you call compare and swap, it will atomically return the value that the pointer is pointing to, and it will only swap if it returns the original old value. So if I call compare and swap with the lock from before and say the old value is zero and the new value is one, it will only return zero if it changed that zero to a one, and that's the only time it will transition from zero to one, which would indicate acquiring the lock. Otherwise, if you called this with zero and one and it returned a one, it means the value was one and it did not swap. So it will only swap if it returns the old value. So if we give it another shot with our lock implementation, this actually makes it a lot shorter, since compare and swap actually sets the lock for us: we can just have a while loop that continuously does compare and swap. If compare and swap returns zero, it means we transitioned from a zero to a one, so the condition is while zero, which is while false, so we break out of the while loop and return from lock. Otherwise, if it was already locked, it would return one and keep looping again and again and again. So now this actually works as a mutex, and it's called a spin lock because of that while loop. It just continuously tries and tries and tries again if it can't get the lock, and it doesn't do anything smart; it's just wasting CPU cycles. So this is a valid implementation, but it kind of wastes time, and this is called a spin lock. So any questions about compare and swap?
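To make that concrete, here's a sketch of that spin lock in C. This is my code, not the slide's: I'm assuming GCC/Clang's `__sync_val_compare_and_swap` builtin as the stand-in for the magical compare and swap, and the demo counter and thread counts are made up for illustration.

```c
#include <pthread.h>

/* compare_and_swap(p, old, new): atomically reads *p, writes new
 * only if *p == old, and returns what *p held. The GCC/Clang
 * builtin below has exactly that behavior. */
static int compare_and_swap(int *p, int old_val, int new_val) {
    return __sync_val_compare_and_swap(p, old_val, new_val);
}

void spin_lock(int *lock) {
    /* Returns 0 only for the one thread that flipped 0 -> 1,
     * i.e. the thread that actually acquired the lock. */
    while (compare_and_swap(lock, 0, 1) != 0)
        ;  /* spin: already locked, try again (putting sched_yield()
              here gives the yielding variant discussed next) */
}

void spin_unlock(int *lock) {
    __sync_lock_release(lock);  /* atomically write 0 to release */
}

/* Demo: several threads hammer a shared counter under the lock. */
enum { NTHREADS = 4, ITERS = 50000 };
static int demo_lock;
static long demo_counter;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        spin_lock(&demo_lock);
        demo_counter++;            /* critical section */
        spin_unlock(&demo_lock);
    }
    return NULL;
}

long run_demo(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return demo_counter;           /* NTHREADS * ITERS if the lock works */
}
```

With a correct lock the demo always totals NTHREADS times ITERS; delete the lock calls and you'll see lost updates from the data race.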
It's just a magical atomic function, so it either happens or it doesn't, there's no in-between, and it will only swap the value if it returns the old value, right? Otherwise, it just returns whatever it's pointing to. Okay, so that's a spin lock, right? That's what we implemented in the previous slide. It's a valid implementation. So this compare and swap is the common hardware instruction that you use to implement all your synchronization primitives. On x86, it's called cmpxchg, because that's a great name, right? It's just a dumb way of saying compare and exchange. Don't ask me why they shortened it like that. They thought they were clever. But this has a busy wait problem. So if we have a uniprocessor system and we can't get the lock, the better thing we could do is just yield our CPU time. If we're the only thread active and it's locked, no one is magically going to unlock it, right? So why are we even bothering trying when we know we can't get the lock? The better thing would be to just yield your CPU time, and then hopefully the thread that has the lock gets scheduled, and by the time it's done running, it's done with the lock, so now I can run, right? But on a multiprocessor machine it might depend on how long another thread holds the lock. Maybe you don't want to put yourself to sleep. Maybe you would just get the lock in like two or three spins anyway and not waste that much time, and you'd be much, much more responsive in that case. But as with everything in computers, it's going to be a trade-off that you have to make. But let's go ahead and add a yield. So we'll just do the compare and swap, and if we fail, which means the value is one, meaning it's locked, we'll just yield the thread. But now we have another problem: a thundering herd problem. So there might be eight threads trying to get this lock.
Then they all yield themselves, and as soon as it's unlocked, you have eight threads all fighting for that lock, and seven of them are going to be yielded again. Ideally, you just want to wake up one thread, and that thread gets the lock, and you don't have to bother monkeying with seven other threads. Also, remember what we said about properties of our locks: we want them to be fair. So if you think about it, you'd rather there be a line for the lock than just a giant free-for-all, which is essentially what this is, right? You have no control over who gets the lock. Whoever gets woken up first or whoever just tries it first gets it. There's no sense of ordering, no sense of fairness or anything like that. So we can do better. So you can add a wait queue to the lock. So just pretend we have some kind of queue structure. In the lock, if I don't get the lock, I will add myself to some lock wait queue where I'm just going to politely stand in line, and then I'm going to put myself to sleep. And remember, putting myself to sleep just blocks me until someone else wakes me up. Then in the unlock code, it would unlock the lock, and then check if there are any threads in the wait queue, and if there is a thread in the wait queue that wants the lock I just had, I just wake it up, right? So I wake up a thread I know is waiting for the lock. So there's two issues with this. One's called a lost wakeup, and the other is the wrong thread gets the lock. So can anyone guess what might happen to cause, let's say, the first one, a lost wakeup? Is there a scenario where you have two threads that want the lock where one thread just gets put to sleep forever? So let's switch to this. Whoops, didn't see that. I'll say that later. So anyone see any issues if we have two threads? I'll put the whole thing there. It wakes up multiple threads before? So it should only wake up a single thread, right?
How would it wake up multiple threads? So the only thread that's going to be able to unlock is one that has the lock already. So we know that this will just get executed by a single thread at a time, unless we don't pair our locks and unlocks, in which case you used it incorrectly and that's your fault. Yep, yep. And that's if a thread already has the lock, right? So here, I'll try and clarify that. So assume thread one has the lock. Well, what can happen to jolly old thread two? Thread two can execute this compare and swap, which is going to fail, right? Because some other thread has the lock. And then it will go into the while statement and be right here. And the next thing it wants to do is add itself to the queue. Now we get context switched. So now thread one calls unlock here. Thread one, let's label that thread one, would go ahead and unlock it. Then the next thing it would do is check if there are any threads in the wait queue, and there are no threads in the wait queue yet, because the thread that requested the lock hasn't added itself to the wait queue yet, right? It got context switched. So it wouldn't go into the if statement, so it would just exit unlock. And at the end of that, thread one no longer has the lock. So thread one doesn't have the lock anymore, and now it would context switch back to thread two, which would add itself to the wait queue. Now there's one thing in the wait queue, and then it would call sleep on itself. And now it's just asleep. No one has the lock, so there's not going to be an unlock paired with it. So it's just going to essentially be in a coma, right? It's not going to get woken up, because you're assuming that you add yourself to the wait queue and then if there are threads in the queue you just wake one up. So that's an instance of a lost wakeup. Does that make sense? So we have to do a bit better here.
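To spell that out, here's the lost wakeup written as a timeline (this is just the scenario above, using the broken wait-queue version of the lock):

```text
thread 1 (holds the lock)          thread 2 (wants the lock)
                                   compare_and_swap fails (lock == 1)
                                   about to add itself to the queue...
        <-- context switch --
unlock: set lock = 0
check wait queue: empty, so
nothing to wake up; return
        -- context switch -->
                                   adds itself to the wait queue
                                   sleep()   <- no one holds the lock,
                                                so no unlock will ever
                                                wake it: lost wakeup
```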
What about the situation where the wrong thread gets the lock? Yeah. So compare and swap is atomic, so it will make sure that only one thread will change the value from a zero to a one. That way they both can't acquire the lock, which is what happened before, right? Where they were reading and then writing. So it combines the read and the write in kind of a useful way, where we know whether or not we were the thread that transitioned it, and there can only be one. Yeah. So the other thread, after the first thread does the compare and swap and swaps it, is going to get one returned from compare and swap, because that's the current value of it. Value, yeah, yeah. So the other problem is the wrong thread getting the lock. So assume T1 has the lock and thread two is in the wait queue. If you have that situation where it's in the wait queue, what can happen is thread one goes and unlocks, and thread one is here. So it just unlocked the lock, and then you get a context switch to a third thread, thread three, that hasn't even requested the lock yet. Thread three could go and execute this instruction, and it just came in, swooped in, and took the lock, when it should have been thread two, right? Thread two is already in line, but if I have a context switch right after the unlock, another thread's not going to know that there's anyone in line. It's just going to do the compare and swap and be like, hey, I changed the zero to a one, I got the lock, yay for me. So another thread came in and swooped the lock, and that was not super fair, right? They cut the line at Disney World, they got the fast pass or whatever the hell it's called now, yep. So the question is, how do I know when the context switch will happen? And the answer is you don't have any control over when it happens.
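And the wrong-thread case as the same kind of timeline:

```text
thread 1 (held lock)   thread 2 (asleep in wait queue)   thread 3
unlock: set lock = 0
     <-- context switch to thread 3 --
                                                compare_and_swap:
                                                flips 0 -> 1 and takes
                                                the lock immediately
                       still asleep; thread 3 cut the line
```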
So when you need to argue about concurrency, you have to argue that, hey, there's a lock here, I know only one thread's in here, nothing can context switch in and run this code. But any code that's not between the locks, you have to consider that at any time it can get context switched out. And usually you just do it by, unfortunately, reading and understanding the code and then finding a situation in which the context switches really screw you over. Yep, so that's you changing the lock back to a zero. So at the time of unlock it's going to be a one, right? So we just write the value zero to it to indicate it's unlocked. Yeah, well, they'll just keep on spinning over and over again. This is just to save time. Yeah, when it wakes up, it would wake up here, right? And then try again, and then it would acquire the lock. Okay, so that's the example of the lost wakeup, just so we have it, and the wrong thread getting the lock. Except I called the threads something different, but when you create these examples you can say whatever. All right, so this is how you fix the problem. To fix these problems you actually use two locks. So you can decompose this into two locks, where I'll use my spin lock implementation because I know that's okay. So we can assume that int guard represents a spin lock, and the spin lock is only used within these lock and unlock calls. So if it makes it easier to read, let's hide myself real quick. Get out of here, me. So to make it a bit easier to read: at the top of lock here, this while compare and swap on guard, you can just substitute in lock guard, and at the end of both of these scenarios where we set guard to zero, that's just unlock guard. And again, at the top of the unlock code you can think of that as lock the guard, and at the bottom it's unlock the guard. So where am I? So within the unlock, we know this block of code only runs in one thread at a time, right?
It's the same as having a mutex lock and unlock around it. So all of this will just execute in a single thread, and even if it gets context switched out, nothing else can run this code. And then same thing here in the lock. So this is again like a lock on the guard, and then we have an if condition that checks the value of the actual lock that we want to represent with this lock and unlock. If it's zero, we just acquire it, and we don't need a compare and swap for that, because we know we have the guard, so this is all mutually exclusive, all happening in a single thread at a time. So we don't even have to worry about the compare and swap to transition the lock from zero to one. So in this case we'd acquire the lock, then unlock the guard, and then it would return. But in the case that it does not acquire the lock, that's when we would add ourselves to the queue, and because that's within that guard lock, we know the bad interleaving from before can't happen, because it's the same guard lock. So we'll add ourselves to the queue, and no one else will try and access the queue at all. Then we unlock the guard, so that some other thread could execute an unlock or something, and then we put ourselves to sleep. Then in the unlock, now that we're not fighting over adding elements to the wait queue, right, there are no data races involved with it anymore, we can just check if it's empty. If it is, we just release the lock, and we don't have to wake up anything. Otherwise, we transfer the mutex to the next thread: we check the front of our queue and then wake up that thread.
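Here's one way the two-lock scheme could look in C. This is my sketch, not the slide's code: the names (guarded_mutex, waiter, gm_lock, and so on) are made up, and I'm using a POSIX semaphore per waiting thread as a stand-in for the lab's thread_sleep/thread_wakeup, since a post that arrives before the corresponding wait is not lost.

```c
#include <pthread.h>
#include <semaphore.h>

/* Sketch of the lecture's two-lock mutex. guard is a spin lock
 * protecting the wait queue and the locked flag; each waiter parks
 * on its own semaphore (standing in for thread_sleep/wakeup). */

struct waiter {
    sem_t sem;                  /* the thread sleeps here */
    struct waiter *next;
};

struct guarded_mutex {
    int guard;                  /* spin lock around queue + flag */
    int locked;                 /* the actual lock state */
    struct waiter *head, *tail; /* FIFO wait queue */
};

static void guard_lock(struct guarded_mutex *m) {
    while (__sync_val_compare_and_swap(&m->guard, 0, 1) != 0)
        ;                       /* spin: guard protects tiny sections */
}

static void guard_unlock(struct guarded_mutex *m) {
    __sync_lock_release(&m->guard);
}

void gm_lock(struct guarded_mutex *m) {
    guard_lock(m);
    if (!m->locked) {           /* free: take it, no CAS needed,
                                   since we hold the guard */
        m->locked = 1;
        guard_unlock(m);
        return;
    }
    struct waiter w;            /* queue node lives on our own stack */
    sem_init(&w.sem, 0, 0);
    w.next = NULL;
    if (m->tail) m->tail->next = &w; else m->head = &w;
    m->tail = &w;
    guard_unlock(m);            /* release guard, THEN go to sleep */
    sem_wait(&w.sem);           /* wake up owning the lock (handoff) */
}

void gm_unlock(struct guarded_mutex *m) {
    guard_lock(m);
    if (!m->head) {
        m->locked = 0;          /* no one waiting: just release */
    } else {                    /* hand the lock to the next in line */
        struct waiter *w = m->head;
        m->head = w->next;
        if (!m->head) m->tail = NULL;
        sem_post(&w->sem);      /* locked stays 1: no thread can steal */
    }
    guard_unlock(m);
}

/* Demo: threads increment a shared counter under the mutex. */
enum { GM_THREADS = 4, GM_ITERS = 50000 };
static struct guarded_mutex gm;
static long gm_counter;

static void *gm_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < GM_ITERS; i++) {
        gm_lock(&gm);
        gm_counter++;
        gm_unlock(&gm);
    }
    return NULL;
}

long gm_run_demo(void) {
    pthread_t t[GM_THREADS];
    for (int i = 0; i < GM_THREADS; i++)
        pthread_create(&t[i], NULL, gm_worker, NULL);
    for (int i = 0; i < GM_THREADS; i++)
        pthread_join(t[i], NULL);
    return gm_counter;
}
```

Note the handoff in gm_unlock: locked is left at 1 while ownership transfers to the waiter, which is exactly what stops a third thread from swooping in and cutting the line.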
So does this make sense as a better mutex implementation? So now we're fair, right? There are no data races involved with our wait queue, it's just a first-in-first-out queue, so we don't have any lost wakeups, nothing else can come in and swoop the lock because it needs this guard, and we're all good. We solved all of our problems. We don't have a busy spin lock at all except for the guard, but that's okay as a spin lock because it's protecting a very small bit of code. All right, any questions on that? Yeah, so the spin lock was just that, so that was our spin lock. It was a valid lock implementation, it just wasted CPU time because it just kept on trying and trying again. So spin locks are valid, they just kind of waste time. So we essentially used a spin lock to implement a better lock. All right, cool, so that makes sense. So again, remember what causes a data race: two concurrent actions access the same variable, and at least one of them is a write. So if you don't have any concurrency between them, you don't have any data races, and your life is good. That's why, if we consider the queue as a whole, we don't have to worry about any one particular element of it being involved in a data race; if we just lock all the accesses to it, that makes our life much, much easier. But if we look at this definition, there might be some things we want to do as kind of an optimization. So at least one of them has to be a write, which means we can have as many readers as we want. And you might be in a situation where your memory location has very infrequent writes and a lot of reads, and you still don't want any data races with it. So you can use a different lock for optimizing this case, which is called a read-write lock. With mutexes and spin locks, you either have the lock or you don't. Even if it's a read, you have to have it protected, even though you might be able to have multiple threads in. So a mutex is just all or
nothing, it doesn't really care, but ideally you want reads to happen in parallel. So the way that works is there's this pthread read-write lock. There are two lock calls: a call for locking it for read, and a call for locking it for write. The idea is that many threads can make it to a read lock call and pass through it, because you're essentially saying, oh, this is just a read, so that's fine, we can have as many threads do that as we want. But only one thread should be able to go through the write lock at a single time. So this is how you could implement a read-write lock. It's the same idea as before, where we just use a guard lock, and this time I'll just not show the implementation of it, because it could be a spin lock or it could be a mutex at this point. So we have a variable for lock that represents our normal mutex, and then a guard that's only used within these read-write lock calls. The write lock can just straight up use the lock, because we know that guarantees mutual exclusion. So the write lock and the write unlock just correspond directly to our mutex lock and unlock. The only difference is our read lock. The read lock uses the guard, because there'd be a data race on the number of readers: we're going to keep track of the number of threads that currently have the lock open for reading. So we have a guard lock that protects the number of readers. If we acquire the read lock and we get the guard, we increment the number of readers, and if this is the first reader, so if it transitioned from a zero to a one, we know it's the first reader, and then we grab the normal mutex lock, right? That way you can't pass a write lock, because now the read lock essentially has the mutex we care about. And then it would just unlock the guard again, which is mostly just there to protect the nreaders variable. And then in the read unlock, it would get the guard again, because we're going to modify the number of
readers. So you decrement the number of readers, and if there are no more readers, then you can just unlock the lock, right, the lock that represents the entire thing, and then you unlock the guard. So does this make sense as an implementation? If you have eight threads all calling read lock, they would just increment the number of readers. The first one through would acquire the actual lock inside this read-write lock struct, right, it would acquire this lock. Then another thread can come in, and it doesn't need to reacquire the lock, because we already acquired that lock for reading, so we can just allow more and more readers to come in. Eventually all the readers will unlock at some point, and when the last reader leaves, we can unlock the lock completely. So does that make sense to everyone? Cool. So the summary of what we talked about today and yesterday: we want critical sections to protect against data races; you should know what they are and how to prevent them. Mutexes and spin locks are the most straightforward and useful locks that you will grab, and we know how to implement them with some hardware support. You're going to need some kernel support for wakeup notifications, or in lab two you're actually doing this too: you could implement a spin lock, or a good mutex, on top of your threading implementation, because you'll have thread sleep and you'll have thread wakeup, right? So you could implement this now. And if you have a lot of readers, you should use a read-write lock. All right, any questions about locking? All right, cool, well then we can talk about the quiz. All right, so quiz one details. It is open book. This is posted by Ashwin, so you can consult your notes, lecture slides, textbook. You're not supposed to search the internet, and you can't talk to your friends, digitally, in person, whatever. Also, optionally, there is a room booked that I will be in, or Ashwin will be in,
likely me, I hope, where you can go write the quiz if you think finding a room on campus will be difficult. Just as a curiosity, who actually wants to do that? Any commuters here? Yeah, so I'll be there. I have a longer commute than most of you, probably. So if you want, I'll be in that room. I guess there's not that much interest, but I think it has room for like 50 people or something like that, so there's that option if you want to use it. I mean, it won't be that fun, but it'll be something. The quiz format: there are going to be 15 true-false questions, five multiple choice, five multiple answer, which is like pick the things that apply, and then two short answer, where you write at most three sentences. So on Quercus there was a micro quiz posted either last night or this morning, I'm not actually sure, but we can do that, because honestly this quiz isn't that bad, and this micro quiz is actually probably worse than the actual quiz. So let's go through it together. So the first question is invalid, and I'm not sure why that's on there. On a context switch between two processes, the operating system has to modify state in the MMU. Yeah, that's true. So this isn't super valid for us, but we know that the operating system has to change address spaces, so you could answer that, yeah, true. Yeah, I literally went through, and I guess this is pulled from an old question bank, because I saw this question in the bank and I changed it to just say address space instead of MMU. Well, it's posted here, so it's probably not in the actual bank. Yeah, so that's just state in the MMU, like the hardware part of the MMU. Yeah, the state of the process is a different thing. So to switch between two processes, one would, depending on what it did, either be put into blocked or put into ready, and then the other process would transition to running, right? But state is a super overloaded word in computer science.
All right, next question. Cool. When a divide by zero error occurs in a program, a trap instruction is invoked so the kernel can kill the program. Yeah, sorry, a trap instruction is something like a system call, where you invoke it deliberately. So false, it's an exception. And also, if it's an exception, you'd get a chance to handle it too; there'd probably be a signal sent to you. So false. Cool. And that was actually a pretty hard question. All right, question three: on a computer system with just one processor core, there are no benefits to having multiple threads within a process. I mean, that's pretty easy. If that was true, then why the hell would we take this course, right? That would just be mean. All right, this is a weirdly worded question: which of the following is responsible for translating a process's virtual memory address into a physical memory address? Which also is not super valid, because the answer actually should be the MMU. So which of the following is responsible? Technically it is hardware, because it's the MMU, which is why I don't like asking these questions now, because we don't know how virtual memory actually works yet. The kernel will set up the mappings, and then there's some hardware that actually does the translation. So it's essentially like you set up a lookup table and then there's hardware that uses it, which is not that hard of a concept, which is why I'm waiting on it. But yeah, so hardware. All right, this is question five, a multiple answer one, and this is tricky. Assume you have a computation, such as matrix multiply, that you'd like to parallelize so the computation can be completed more quickly. You can do this by either using multiple threads within a process or multiple processes. To decide which of these two approaches to use, you consider various trade-offs. Which of the following considerations are valid? Select all that apply. Using multiple processes will result in less physical memory needed. Yeah, that sounds not true,
because there'd have to be some more accounting for processes, so they probably use slightly more, or it's a wash, or the difference is insignificant at best. Using multiple processes will result in slower, more costly synchronization: true, right, because they're in different address spaces, so you have to communicate between them, and you have to go through the kernel, and that's slow. Using multiple processes will result in faster sharing: well, that's kind of the opposite of what we just said, so that can't be true. Using multiple processes will result in higher context switching overhead: true, though this one's also kind of qualified. It's true if you're comparing against switching between threads within the same process, without switching to another process in between. For context switching threads within the same process, you essentially just have to swap the registers, while if it's a process, you have to swap the registers, you have to swap the address spaces, and you have to swap open files and other stuff we don't really know about yet. So that sounds true. Using multiple processes will result in fewer, harder to detect bugs. Okay, so for the sake of argument: do you have data races between processes? And are data races a pain in the ass to debug? So probably true, because most of the time all your problems are due to two things running at once and communicating with each other. If they're in the same address space, you can't really detect if something really bad happens, while if everything is super explicit, then you actually have a better idea of what's going to happen. And two, if a thread crashes, your process is dead, but if a process crashes, the other one will probably give you a better error message and will still be alive, so you'll at least be able to figure out in which process something bad happened, and you're probably more likely to narrow it down. Cool, let's submit our quiz. Well, yeah, see, so now you can see why I said don't be that worried
about the quiz. It's pulling from a question bank, so it'll pull, what did I say, 15 true-false questions out of like 30. There are some that are slightly trickier than others, but I don't know, at least in my opinion, this micro quiz is probably the hardest part, or not, I don't know, depending on you guys. Yeah, any other questions? Or, yep, no, no code writing. Short answer is going to be either explain this, or what happens if. Which, remember, we kind of talked about: the only real valid thing to ask a what-would-happen-if question about is threads stuff, because I can't ask anything about processes, because you don't know anything about them yet, right? All right, well, cool. So I had a request to clarify, to make it super crystal clear, what is kernel mode and what is not kernel mode. So let's write some kernel code. So, unfortunately, you don't get to write any in the labs or have a flavor of it, but here's your flavor now. So if you want to write kernel code, it looks something like this. Apparently the editor doesn't recognize it, but kernel code is just C code. It's not special in that way. The only thing that's special about it is what CPU mode it runs in. So anything I'm writing here, if it's running in the kernel, it will be running in kernel mode, and I can do any random stuff I want. I can kill all the processes if I want to, I can do whatever, I could play with hardware if I knew how to do that. I don't know the calls off the top of my head, but you could do that. So it has some macros just to make things easier to debug and stuff, but it doesn't exactly look like a C program, because it's missing something called main, right? So you can think of the kernel, if you want, kind of as a long-running process. It doesn't have any main because it's already running. So how the kernel works is it separates code into modules, and then you insert the module into the
kernel, and then it would execute the code as part of the kernel, so it's in kernel mode. So as part of their functions, you can register a function to run whenever the module gets loaded into the kernel and starts executing, and then you can have some code that executes whenever you take it out of the kernel and it stops running. In this case, you'll also see that, because I'm in kernel mode, libc does not exist. I don't have a C library, so I don't have printf. So what they did, just to make it easier to develop in the kernel, is there's an internal buffer that they use to print messages to. So instead of printf, you use printk, and it's essentially like a logger, if anyone's used a logger before, where there are different levels that you can print at. So one is called KERN_INFO, and then you can just print whatever message you want. So I'll just show that this code's actually running as part of the kernel: I'll print something into the internal kernel buffer, and then when it exits, I'll print that the exit happened. So if we go ahead and make that, we're all good, and you'll see that it generates a lot of stuff, but it does not generate an executable at all. What it does generate is this .ko file. If you compile a C file, you'll generally see .o files that you can link into an executable, right? But since this is the kernel, there's not going to be an executable. That .ko file represents a kernel object file, and you're allowed to load that directly into the kernel. So let's go ahead and do that. The command to insert modules into the kernel is called insmod, as in install module, and we can say insmod hello.ko. And now at this point all the code in the init function ran, because that code is now in the kernel, right? It's running in kernel mode, which is pretty sweet. Let's go ahead and see that message. So you use dmesg to look at any messages in the kernel's buffer. Because this is all user space, we don't
have access to the kernel's buffer directly. I have to do it as a superuser and then use dmesg, which makes system calls to the kernel that say, hey, I want to read your buffer, right? So I put dash l info, because I only care about seeing info messages. If I do that, I see a bunch of crap that happened when I booted, and then my message right there. So that is kernel mode, right? I could have written a bunch of stuff there, but the problem with writing kernel code is it's kind of a pain in the ass, right? Even to figure out how to print stuff: there's no terminal, there's no anything, so I have to use printk, and you have to look up documentation and get used to it. Yeah, so the question is, could I kill init here? Finally, finally kill init? People want to kill init. Uh, yeah, you probably could. Can I? I don't know what the call to use is, but you could. And then to remove the module, rmmod hello. If I look at that, see, now my code exited, and that's it. And we can even, you know, when we install our module, we could even strace it to see what's actually going on. We expect it's not going to print to standard out or anything, but we can see the system call that actually inserted it into the kernel. So if we look at that, there's just a bunch of crap, and uh...
So this is the kernel module here: it opens it up, file descriptor three, and we see the normal ELF file, so it's a good thing we learned what that is, so it's just some code. And then it's this call right here, finit_module, that loads it into the kernel, and then the kernel takes over from there and starts executing it. So that shows a clear separation between kernel mode and user mode. So any questions about that fun stuff? So this is how you actually write kernel code. It's not special; the only thing that's special about it is what mode it runs in. So cool, we all know how to write kernel code now. But this super illustrates that kernel mode and user mode don't even really look that different. It's just a CPU mode on your hardware, and there's a clear boundary, right? So printk writes to a kernel-internal buffer; if I try and run this in user mode, it doesn't even compile, right? It's a completely different thing. The kernel doesn't exist in user mode, it only exists in kernel mode, so any of these calls, any code you write, is not going to work, which is also why debugging it is a pain in the ass. So I don't have gdb anymore, I don't have, uh...
Yeah, I don't even have a standard C library. Like, you don't have malloc, right? The kernel needs its own malloc, of course, because you don't have libc, and who has to implement the mechanics behind malloc? The kernel, right. Yeah, so, questions. What's the underscore t in all the definitions? It's just a shorthand for type. Typically, whatever is before the _t is the name of the thing, and you'd actually want to use that name, and if the type was just called that without the _t, you couldn't use that name anymore, because the compiler would get confused. So they just throw _t on to say that it's a type and not an actual variable or something. Yeah. So sometimes, too, there are different conventions. Sometimes you can call a struct something else that doesn't have struct in front of it, sometimes it'll have struct in front of it. That's just random C things you can do, and it kind of depends on you; you can get rid of the struct keyword if you want. In the kernel, they want the struct keyword always there, so if you try and contribute code to the kernel and it doesn't say struct on it, if you try and hide your struct, they'll yell at you. The Linux kernel developers will yell at you if you do something wrong, but in a kind of nice way. All right, any other questions about the quiz or anything? Yeah. Mm-hmm, the read-write lock, the thread woke up before it went to sleep? True, yeah, that's still a problem, because the guard's unlocked here, so I could get a wakeup before the sleep. A second guard, yeah, yeah. I have to look at how they, I think that came up before, yeah, but yes, there's still an issue, it'll just happen less frequently, yay. Yeah, I have to double check that. All right, any other questions for quiz one, the simple quiz? Yeah, lab two is harder than my OS labs were, by a fairly long shot. I don't even remember what my OS labs were in undergrad. There's one where we, uh...
did a buffer overflow attack on the kernel. That was cool, but other than that I don't remember any of the other labs. I'm going to look at what the other course did, because they did some setcontext stuff, so I might do that Monday just to help with lab two. And plus, they're going to take at least two or three lectures to catch up to us on fork and pthread stuff, so we've got time. Oh, no, I mean, I guess we'll see. We'll see after quiz one if you guys do better than... yeah. So, I guess, on that, yeah, hopefully we do good on quiz one, but if there's forks and stuff, we can do pretty good, I think. Pulling for you. We're all in this together.