All right. Hello, everyone. Game plan for today: we will quickly blast through this lecture, then we'll talk about the quiz, since that's probably all you care about, and we'll write some kernel code and do some fun stuff just to make things crystal clear. So today we'll talk about lock implementation. Last lecture we tried a lock implementation and it didn't quite work, right? We showed an example where two threads can call lock and two threads can return from lock, and that is not a mutex. Two threads came, two threads went through, when in a mutex it should only be one at a time. You can implement locks in software with minimal hardware support. The only two hardware requirements you actually need are atomic loads and stores, and instructions that execute in order, which sounds like a no-brainer, but processors today execute out of order, so that is not necessarily true nowadays, which makes your life painful. If those two things hold and you want to implement locks with no other hardware support, you can use something called Peterson's algorithm or Lamport's bakery algorithm, which is essentially, if anyone's ever been to an old-school bakery where you pick a number and then you're served in that order, that's pretty much what it is. We'll leave it to the other class to talk about doing this without hardware support; for now we'll talk about the real hardware support you would need to implement locks, because those algorithms don't scale well and processors do execute out of order. Within your processor, instructions get split into micro-instructions, and the hardware will reorder them if it can as an optimization and do weird stuff like that. It reorders them such that you still get the same result, but now that we care about the details in terms of data races, the steps in between might be different, and that can cause us issues. In terms of hardware support, there is this kind of magical atomic function. You can think of it as atomic, so it either fully completes or it doesn't happen at all, and we'll call it compare-and-swap and give it a C API just to make it a bit clearer to read than assembly. Compare-and-swap takes a pointer to an integer and then two integer values, an old value and a new value, and it returns whatever value that pointer points to. Why this is a useful function: it will only swap if the original value was the old value, and then it will change it to the new value. So if we call compare-and-swap with the lock L, a zero, and a one, there are only two outcomes, because this is atomic. Either compare-and-swap returns zero, which means it pointed at a zero, and you're guaranteed that the zero got swapped with a one. So only in the case where compare-and-swap returns zero will it have changed whatever it pointed to from a zero to a one, and that's the only way to transition from zero to one. Otherwise, if it was already one, compare-and-swap returns one, which makes this loop go on and on again: it just keeps doing the compare-and-swap, the value stays one, and it keeps looping until eventually the unlock happens, a zero is written to it, and then another thread that was waiting at the lock can pass.
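Here's a minimal sketch of that spin lock in C11. The compare_and_swap wrapper matches the C API just described; the atomic types and the spin_lock/spin_unlock names are my additions, not anything shown in lecture.

#include <stdatomic.h>

/* Wrapper matching the lecture's compare-and-swap API: atomically set *ptr
 * to new_val only if it currently holds old_val, and return whatever value
 * *ptr held when we looked at it. */
static int compare_and_swap(atomic_int *ptr, int old_val, int new_val) {
    atomic_compare_exchange_strong(ptr, &old_val, new_val);
    return old_val; /* on failure, old_val was updated to the observed value */
}

typedef struct {
    atomic_int locked; /* 0 = free, 1 = held */
} spinlock_t;

void spin_lock(spinlock_t *l) {
    /* Keep retrying until we are the thread that flips 0 -> 1. */
    while (compare_and_swap(&l->locked, 0, 1) != 0) { /* spin */ }
}

void spin_unlock(spinlock_t *l) {
    atomic_store(&l->locked, 0); /* release: the next CAS can now succeed */
}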
So this is actually a valid implementation of a mutex. It's technically called a spin lock because, as you can see, this while loop just spins: it tries over and over again and doesn't do anything very smart. It also violates another property we wanted for mutexes, fairness, because if everything is just fighting over the lock you have no control over which thread actually gets it. By virtue of context switching and whatever schedule the kernel decides on, the same thread could get the lock over and over again when you should really be sharing it. So that was essentially a spin lock, and compare-and-swap is actually a very common atomic hardware instruction. On x86 it's called cmpxchg, which is a sick shortening, right? All that means is it's called compare-and-exchange instead of compare-and-swap. Don't ask me why they cut so many letters; it saves them some typing, I guess. So that's a valid implementation of a lock, but it still has a busy-wait problem and it's not the greatest implementation. Yeah, there's a C builtin for it too, but its name is also based on compare-and-exchange, which is less readable, so I just called it compare-and-swap. But if you consider a uniprocessor system, something with only a single CPU core, we know we're definitely wasting our time. If I can't get the lock and I'm the only thread that's allowed to run, nothing else could have changed it to zero, so I'm just endlessly spinning for no good reason even though I should know better at that point that I cannot acquire the lock. A better thing to do would be to let the kernel schedule another process or another thread, in the hope that that frees the lock. You are going to implement this in lab two; that's your yield. So instead, if you see that you don't get the lock, you could simply yield and then hope another thread frees it up for you. On a multiprocessor machine it depends: you might want a spin lock if you need to be super responsive and you know your critical section is very, very small, because it might actually be more efficient to just spin a few times rather than yield the thread and wait for it to come back. So this is what it looks like if you add a yield; it doesn't complicate the code much. If we see that we didn't successfully compare-and-swap, which means the lock is still locked, we just yield. Now we've kind of eliminated the busy-wait problem, but we have what's called a thundering herd problem: if eight threads came here and they all yielded, then as soon as we unlock, only one of those eight threads can acquire the lock, but what's actually going to happen is the kernel is going to wake up all eight threads at some point and they're all going to try to get the lock, even though we know seven of them aren't going to succeed and only one is. And we still have the problem that we have no control over which thread gets the lock next.
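A sketch of that yielding variant, building on the spin lock sketch above; sched_yield() is the POSIX call standing in for the yield you will write yourself in lab two.

#include <sched.h>

/* Same idea, but instead of burning CPU we ask the scheduler to run someone
 * else. Still unfair, and still has the thundering-herd problem described
 * above. */
void yield_lock(spinlock_t *l) {
    while (compare_and_swap(&l->locked, 0, 1) != 0) {
        sched_yield(); /* give up the CPU and hope the holder unlocks */
    }
}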
So ideally we want to be able to reason about it, and the easiest thing to do is to have a first-in, first-out queue: the order in which you try to acquire the lock is the order in which you actually get it. So we can change the implementation and add a wait queue. Now when we do the compare-and-swap and see that the lock is held by another thread, we add the current thread to the lock's wait queue and then put ourselves to sleep, which means we need another thread to wake us up. In the unlock, we would unlock the lock and then check if there are any threads in the wait queue, because we need a thread to wake up. If there's a thread in the wait queue, we wake up one thread, the one at the front of the list, so hopefully we're nice and fair. But because we're now in the land of concurrency and difficult programming, there are actually two issues with this code even though it looks kind of correct: one is called a lost wakeup, and the other is just called the wrong thread gets the lock. So let's have a look and see if anyone can tell me where my problems are. Looking at this code, the first thing we'll try to identify is a lost wakeup. A lost wakeup means that a thread gets put to sleep and never gets woken up; if you want the real-life analogy, the thread essentially gets put in a coma. The comment above is where it puts itself into the queue. So thread one is here: it adds itself to the queue but hasn't slept yet. Then thread two comes along, goes from here to here, and unlocks. Yeah, so that's one problem: it comes in here, there's a thread in the wait queue that's not asleep yet, and it tries to wake it up. That is something you could check, though: if you wanted, you could check an error code from the wakeup and say, oh, that thread's not asleep yet, but it's definitely a problem we don't handle now. What about the opposite, where I am asleep and I never get woken up again? Yeah, well, that wouldn't happen if we had another condition here like "if not asleep, try again"; that would fix that. Yep, so you mean if thread one, whoops, what the hell did I just do, so if thread one, instead of being context switched there, got context switched here. This also assumes thread two has the lock and is calling unlock. So thread two already has the lock, and then thread one comes into the lock function. It tries the while loop; compare-and-swap returns one, which means the lock is already acquired, and we just said thread two has it. So it goes into the while loop, and it might get context switched out immediately, before it adds itself to the queue. Then thread two could go ahead and unlock: it sets the lock to zero and checks if there are any threads in the queue. There currently is no thread in the queue, so it doesn't bother waking anyone up and just returns. Then we get context switched back to thread one, which adds itself to the queue and puts itself to sleep, and now no one is ever going to wake it up, because no one is going to call unlock, right? You need to pair lock with unlock.
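For reference, a sketch of this flawed wait-queue lock. The queue and thread helpers are hypothetical stand-ins for whatever the kernel provides, not a real API, and the comments mark where the two bugs live.

/* Hypothetical helper declarations (assumptions for illustration only). */
typedef struct queue queue_t;
struct thread;
void queue_push(queue_t *q, struct thread *t);
struct thread *queue_pop(queue_t *q);
int queue_empty(const queue_t *q);
struct thread *current_thread(void);
void thread_sleep(void);              /* put the calling thread to sleep */
void thread_wakeup(struct thread *t); /* wake a sleeping thread */

typedef struct {
    atomic_int locked;   /* 0 = free, 1 = held */
    queue_t *wait_queue; /* FIFO of waiting threads */
} naive_lock_t;

void naive_lock(naive_lock_t *l) {
    while (compare_and_swap(&l->locked, 0, 1) != 0) {
        queue_push(l->wait_queue, current_thread());
        /* BUG (lost wakeup): a context switch anywhere between the failed
         * CAS and thread_sleep() lets unlock run, see an empty or
         * not-yet-sleeping queue, and skip the wakeup we then wait on forever. */
        thread_sleep();
    }
}

void naive_unlock(naive_lock_t *l) {
    atomic_store(&l->locked, 0);
    /* BUG (wrong thread gets the lock): a brand-new thread calling
     * naive_lock() right here sees 0, CASes it to 1, and jumps the queue. */
    if (!queue_empty(l->wait_queue))
        thread_wakeup(queue_pop(l->wait_queue));
}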
It could be woken up if someone else calls lock and unlock, but that might also add another thread to the queue, in which case you wouldn't clear the queue, so you're still missing a wakeup. So does that make sense? All right, how would we fix this? Oh wait, no, we still have another problem to go over; that was the lost wakeup. Can anyone think of a case, say we have three threads, here I'll make it a bit easier: three threads, T1 has the lock and T2 is in the wait queue. If that's our starting assumption, can anyone tell me a case where some rogue thread could come in that is not in line and essentially bypass the line? Right now, if T2 is in the wait queue, the next thread that should get the lock is T2. So assume we have another thread called T3: is there any way for thread three to get the lock before thread two? Thread one has the lock already, so it's the one responsible for calling unlock. So say thread one is here: it calls unlock, it goes ahead and changes that value from a one to a zero, and then it gets context switched out right before the if statement. If, right before the if statement, thread three comes in, which is the new thread, and calls lock, it sees that the value of the lock is zero. The compare-and-swap returns zero, which means it just grabbed the lock; it swooped right in and took it when it should have been thread two's turn, right? It essentially got the fast pass at Disneyland without paying for it, and Mickey Mouse does not like that. So does that make sense, that a thread can still bypass the queue? All right, cool. So let's fix that. Here's an example, just so you have it, of the lost wakeup, which is slightly different from what we just discussed, and of the wrong thread getting the lock, just so you have it. So here's how you would fix the problem. It's kind of big, it's ugly, and we will go through it quickly. I'll get myself out of the way. Okay, so to do this we are going to use essentially two locks. There is the lock itself, which represents the entire mutex, and then we're going to have a second lock called guard that we only use internally. For guard we'll use our spin lock implementation, because we know it works, and this is the only place we'll use that guard lock, so we know the code it protects is nice and short, and having a spin lock won't be that big of a detriment. Yeah, so the lost wakeup is only going to happen if that queue is empty, because unlock essentially skips that if statement even though another thread is about to add itself. If there were already something in the queue ahead of it, there'd be a thread waiting anyway, and that thread would get woken up even if I haven't added myself to the queue yet. So you'd likely never see this bug, right? It's kind of a rare one; it only happens if the list is empty. Okay, so let's break this down quickly, because it kind of looks ugly. Whenever you see this compare-and-swap with the guard, you can mentally replace it with a spin lock: so that's a lock of the guard, and then here is an unlock of the guard, and another unlock of the guard. So we lock it at the beginning of the function, and then there are two paths, and we unlock it in both paths.
So we grab the guard lock, and everything until we unlock it is protected: only one thread can be there at a time, so we won't have any data races, which is what happened before, when we got context switched and bad things happened. In the case where the lock isn't acquired, we don't need compare-and-swap anymore, because now we have this guard and we're under mutual exclusion: we're the only thread there, so we don't need compare-and-swap to actually acquire the lock. We check if the lock is zero; if it's zero we just change it to one, because we're the only thread actually accessing it, and that means we've acquired the mutex, we can unlock the guard, and we return. That's the case where we don't have to do anything; we just immediately get the lock and it's all good. In the other case, where it is locked, we add ourselves to the queue, throw ourselves at the back of the list, unlock the guard, and then put ourselves to sleep. And then in the unlock, here is essentially our lock of the guard, and then we can check the queue, because if another thread was going to add itself to the queue it would have already done that: it needs the guard lock to execute that code, it's in a critical section, so it can't jump from here to there. After taking the guard, we check if the queue is empty; if it's empty, we can just release the lock and we don't have to do anything else. Otherwise, instead of even unlocking it, we keep it locked and just transfer it to the next thread, because the wakeup is going to transfer the lock: we wake up a single thread, the value of that lock stays one, indicating it's locked, and that thread just continues and all is good. Does that make sense to everyone? Yep. Yeah, so the guard is like a mutex, right? It is a mutex, but it's a spin lock, so it behaves the exact same way. This is locked, this is unlocked, this is unlocked, so anything between the lock and unlock, only one thread can be there at a time, so we don't have any concurrency issues. Yeah, so there is a slight bug here, because we unlock the guard here, and it could be the case that before we put ourselves to sleep we get context switched right at this point, and this code can actually run now, because it can acquire the guard, and then it would do a wakeup on a thread that's not asleep yet. But that's actually a condition you can check: your wakeup can say, oh, that thread's not asleep yet. So in this case there's a slight race, but you can detect it: if the wakeup fails, I know something is in the queue and it's about to put itself to sleep and I got super, super unlucky, so if I just retry a few times, like with the spin lock, it'll probably be fine. But yeah, there's still a slight race there; even catching that, I didn't see it the first time. All right, does that make sense to everyone?
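Putting that together, a sketch of the corrected mutex, reusing the spin lock and the hypothetical queue and thread helpers from the sketches above; the guard spin lock protects both the lock state and the queue.

typedef struct {
    spinlock_t guard;    /* short-lived internal spin lock */
    int locked;          /* the mutex state; only touched while guard is held */
    queue_t *wait_queue; /* FIFO of sleeping waiters */
} mutex_t;

void mutex_lock(mutex_t *m) {
    spin_lock(&m->guard);
    if (m->locked == 0) {
        m->locked = 1;                 /* uncontended: take it and go */
        spin_unlock(&m->guard);
    } else {
        queue_push(m->wait_queue, current_thread());
        spin_unlock(&m->guard);
        thread_sleep();                /* a wakeup may race us right here, so
                                          wakeup should tolerate (or retry on)
                                          a not-yet-sleeping thread */
    }
}

void mutex_unlock(mutex_t *m) {
    spin_lock(&m->guard);
    if (queue_empty(m->wait_queue))
        m->locked = 0;                 /* nobody waiting: really release it */
    else
        thread_wakeup(queue_pop(m->wait_queue)); /* hand the lock over;
                                                    locked stays 1 */
    spin_unlock(&m->guard);
}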
Yep, sorry, so the guard is exactly a mutex like what we saw yesterday, so only one thread is allowed there at a single time, because this is essentially lock and then unlock. If you have a thread executing this line here, say you're executing this line here and you get context switched out, you've acquired the guard, right? So no one else can be in this code at all; they would just be essentially here or there, trying to acquire the guard and failing, so they're just going to wait until we get context switched back to the original thread and it continues executing. Yeah, so it's a mutex: this is a valid implementation of a mutex, right? It only allows one thread between lock and unlock, and we're essentially using the spin lock as a guard to implement a smarter lock that actually puts the thread to sleep and does all that good stuff. So we eliminated the busy-wait problem. We're trying to solve all these problems: the lost wakeup, the wrong thread getting the lock, the busy wait, and we were trying to be fair too. So now we have a queue that works: you can't bypass the queue, you can't have a lost wakeup (a thread might be about to go to sleep, but the wakeup can retry), and we put ourselves to sleep instead of spinning. Okay, so that's an actual implementation of a mutex. I don't think you have to implement it yourselves, but that's essentially what a mutex is, which is kind of cool. The next thing I'll say is: remember what causes a data race. A data race is two concurrent accesses to the same memory address where at least one of them is a write, and the write is the important part there. If the only source of the data race is the write, that means we could have as many readers as we want; you don't even need a mutex as long as nothing writes at the same time. And it depends on your program: you might have a program that writes very, very infrequently and has a lot of readers, and ideally you'd like to be as parallel as possible, so if you can have as many readers as you want, you want to permit that, right? If I only have a mutex in my arsenal, I would have to put a mutex around the reads and the writes just so I don't have any concurrent accesses where one could be a write. So what you can do is implement a lock that has different behaviors based on whether you're acquiring it for reading or for writing: something called a read-write lock. With mutexes or spin locks you just have to lock the data even if you don't know whether a write could happen, which is what I just talked about, but you want reads to happen in parallel as long as there are no writes. So for a read-write lock you separate out the lock: there's a lock operation specific to reading and one specific to writing, and the property you want is that multiple threads can hold the read lock, but only one thread at a time can hold the write lock. That way you don't have any data races; you eliminate the two-concurrent-accesses problem. The read-write lock essentially waits for all the readers to be done before a writer can get exclusive access. We use the same idea of having a guard lock here, and this time I'm just treating the guard as a plain mutex; it protects an integer, the reader count, so we don't have any concurrent accesses to it.
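Here's roughly what that read-write lock looks like, built on the mutex sketch above (the field names are my own); the walkthrough that follows goes through each piece.

typedef struct {
    mutex_t guard;  /* protects n_readers */
    mutex_t lock;   /* held by the single writer, or shared by all readers */
    int n_readers;  /* how many readers currently hold the lock */
} rwlock_t;

void write_lock(rwlock_t *rw)   { mutex_lock(&rw->lock); }
void write_unlock(rwlock_t *rw) { mutex_unlock(&rw->lock); }

void read_lock(rwlock_t *rw) {
    mutex_lock(&rw->guard);
    if (++rw->n_readers == 1)   /* first reader acquires the real lock */
        mutex_lock(&rw->lock);
    mutex_unlock(&rw->guard);
}

void read_unlock(rwlock_t *rw) {
    mutex_lock(&rw->guard);
    if (--rw->n_readers == 0)   /* last reader releases it */
        mutex_unlock(&rw->lock);
    mutex_unlock(&rw->guard);
}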
We're going to use that counter to keep track of the number of readers that are currently active, and this ++readers looks a lot like the ++count from before, and we don't want the same issue where we don't know the correct number, especially since we're actually managing a lock and we also have a decrement. In the case of the write version of the lock, we want it to be exactly our original mutex: write lock just calls lock, and write unlock calls unlock, on the lock in that structure. The only difference comes in the read lock. It acquires the guard and later releases the guard, so all of this code and all of this code are in critical sections protected by the same guard lock; they can't run concurrently, because we're changing the number of readers. All the read lock does is keep track of that count: it increments it, starting from zero, and if it went from zero readers to one reader, so this is the first reader, it acquires our lock. After it's done, it unlocks the guard. Now if another thread comes in and calls read lock again, it just increments the count from one to two; it doesn't call lock again, because we're reusing that lock. It's still a mutex, and if we called lock on an already-locked lock we'd just block, which would be a problem. Then in our read unlock we decrement the number of readers, and if we see that we're the last reader, we do our normal unlock call. Yep, everyone good? Yep, so you need the read lock because it acquires that underlying lock; if you didn't have a read lock, all the reads could still happen, and a write could happen as well, and that's still a data race. Yeah, so if there's one reader it gets the lock, and then they can all share it, essentially. There are two modes: either the writer has it, like a mutex, and it's the only one with it, or one or more readers have the lock and they're all sharing it. Okay, this makes sense? Cool. All right, let's wrap up and talk about quiz stuff and have more fun. You want critical sections, which we talked about today and yesterday, to protect against data races; you should know what they are, very, very important, and even more important, how to prevent them. There are lots of fun bugs and lots of fun and interesting questions you can ask about them, because preventing them is very hard. Mutexes and spin locks are the most straightforward tools, and we actually know how to implement them now. Ideally you need some hardware support to implement locks, and you need some kernel support for the sleep and wakeup notifications; you're actually writing those yourselves in lab two. And then, if you have a lot of readers, use a read-write lock. All right, cool, let's talk about the quiz. Did anyone go on Quercus? Everyone saw the micro quiz? Is anyone worried about it? Yeah? What? Okay, so let's quickly go over the micro quiz, because this will take us like two seconds. Question one: again, this one wouldn't be valid for you because it has the word MMU in it, I don't know why I included it, but: on a context switch between two processes, the operating system has to modify the state in the MMU. To make it valid for you guys it would be: the operating system has to swap out address spaces. So is this true or false? True. Beauty. Question two, whoops, which is probably on the harder end of things: when a divide
by zero in a program occurs, a trap instruction is invoked so the kernel can kill the program. False. Why? Yeah, trap instructions are for system calls; what this is actually called is an exception, which will generate a signal. So close, but that's false. On a computer system with just one process, there are no benefits to having multiple threads within that one process. That had better be false, otherwise you're all taking this course for no reason. All right, the next one might not be super valid, but it's general enough, and it's probably the hardest question here in terms of knowing stuff from the lecture: which of the following is responsible for translating a process's virtual memory address to a physical memory address? Whoa. All right, the compiler? Probably not. The operating system kernel? Not technically, which is why this question is kind of odd. Hardware? Technically yes, because the MMU is hardware, and the MMU is on the CPU, which is why the question is kind of weird, because it just classifies that as hardware. Technically your kernel sets up lookup tables for the MMU, which is why I don't want to talk about it yet, because just knowing that something translates addresses doesn't really help anyone. So technically this answer is hardware. It's one of the sillier questions, which is why this little micro quiz is probably actually harder than the actual quiz. You don't know that yet; we haven't talked about it. Question five: assume you have a computation, such as matrix multiply, that you would like to parallelize so the computation can be completed more quickly. You can do this either by using multiple threads within a process or by using multiple processes. To decide which of these two approaches to use, you consider various tradeoffs; which of the following considerations are valid? Select all that apply. So this is your multiple-answer question. Using multiple processes will result in less physical memory needed? No, at best it would be a wash or negligible, but it's not lower. All right, using multiple processes will result in slower, more costly synchronization? Yes, because if you have separate processes, the only way they can talk to each other is through the kernel, which means system calls, which is slower. Using multiple processes will result in faster data sharing? False, because of what I just said for the first one. Using multiple processes will result in higher context switching overhead? True, right? We don't know everything involved in context switching processes, but it's at least going to be the same as switching threads plus swapping the address space, so that's more, that's slower. Using multiple processes will result in harder-to-detect bugs? We got true, true. Yeah, so that's actually kind of true, because remember, if a thread dies with a segfault or something, the whole process is going to die and you're not going to have a single clue where that came from, while if one of your processes dies, it won't kill the other one, and you'll at least have an idea where it came from. Another explanation, linking back to the first point, is that processes talking to each other have to go through the kernel, so you can monitor that pretty easily, while if you share the same address space it's a lot harder to see whether you're walking over other things. Yep, so for threads it's essentially just the opposite: synchronization would be much faster, but it would be harder to debug, and context switching would be faster; it's just the opposite of
this. Yeah, synchronization: any communication between the two processes, or synchronization generally, involves locks and things like that, making sure you do things in the right order, which we don't really know how to do yet; we kind of know how to prevent data races, but locks and such aren't even on the quiz. Yeah, so any communication has to be super explicit because it has to go through the kernel. Yeah, context switching overhead is just the amount of time it takes to context switch. Switching processes versus threads: it's going to be much faster to switch between threads than between processes, because processes have to change the address space as well. Yeah, it takes more time to switch between two processes than between two threads within a process. Yeah, well, if you use processes, in this case each process is just going to have one thread, right? If everything's a separate process, it depends whether they're kernel threads or user threads: if they're kernel threads, the kernel decides; if they're user threads, you get to decide. All right, let's submit our quiz and see how we did. Sick. Yeah, what do you mean, did we get it right? You think I'm just going to go through and have them all wrong? Yeah, 1, 1, 1, 1, 1, 1. Beauty, hundo. More details about the quiz, just because I guess people like hearing about it: it's open book, you can use any of your notes, lecture slides, and textbooks; you are not allowed to search the internet, and you're not allowed to communicate with each other in any form or fashion. There's a room booked if you're worried about the internet and want a witness; it will be GB 304. You're not required to be there, but the benefit is I will be there and I can answer questions; it would be like a normal exam where you put your hand up and I come over to you. I think Ashwin will also be on Zoom if you're not in that room. The other benefit is that if the internet in that room goes down, you have a valid witness, so if the internet goes down and everyone in that room can't do the quiz, you at least have a witness. So that might be good, or it might be somewhat fun, probably not. Quiz format: 15 true/false, 5 multiple choice, and 5 multiple answer, plus 2 short answer questions that are at most about 3 sentences each. If you write more than 3 sentences, you're probably BSing and probably shouldn't do that. PS, that's a pet peeve of mine from when I was a TA marking: if you just spew knowledge that you think is right but isn't relevant to the question, it's a real crap shoot whether your TA doesn't like that or shows pity on you. I didn't show pity for that; if you dump knowledge for part marks that isn't even relevant, you're just wasting my time and your time. Yeah, when you make a process, it just has a single thread in it. Yeah, it depends what your threads do: if they're kernel threads and you have multiple CPUs, they can run in parallel. All right, cool. To make things crystal clear, let's write some kernel code. Wait, it's actually not that hard. All right, no system calls; I don't have to communicate with the kernel, we are now the kernel. Kernel code is not super different, aside from a few things that are very different, which makes sense, but kernel code is just C code; there's nothing special about it. It has include files, so you include stuff from the kernel, and there are a few macros. You'll notice in this code there's no such thing as a main function; this is my entire file. There's not
going to be a main, because this isn't going to be an executable, because this isn't going to be a process, because this isn't going to run in user space: this is the kernel. Yeah, processes are just user space things; I am the kernel now, that's it. There's not going to be a main because it's not an executable; the kernel's already running at this point, right? I'm typing into my system, the kernel's already here. So the way kernel code works, at least for fairly modular things you can throw into the kernel, is you make something called a module, and you can add code that executes whenever it's loaded and whenever it's unloaded. This says: whenever I'm loaded into the kernel, run a function called hello_init, and whenever I'm unloaded from the kernel, run a function called hello_exit. In hello_init you'll see a few differences from the C you've written: this is not printf, this is printk. Anyone hazard a guess as to why? Yeah, do I have the C standard library in the kernel? No, because I am the kernel. So in the kernel it's a bit different: printk writes to an internal buffer, just for logging purposes. Everyone's probably used logs before in some respect, or at least seen them; there are different levels of messages depending on how severe they are, so you can have error messages, warning messages, debug messages, and so on. I said, hey, I'll use an info message because it's nice to read, and I'll just print "hello init" to the internal kernel buffer, and that's it. And when I exit, I'm going to print "hello exit". If we go ahead and compile this, it is basically a C file, but it uses a bunch of kernel headers, and when you compile it, you'll see that it generates some normal-looking stuff: there's a .c file, this .mod stuff that is very kernel-specific, but generally, if you compile it in little chunks, there's going to be something like a .o file. We've seen .o files, but because this is for the kernel, it's a .ko file, which just stands for kernel object. That's just indicating you probably shouldn't use this unless you're using it with the kernel. So that's all you get: you get this kernel module, it is a normal ELF file, and the kernel knows what to do with it. So you might ask yourself, well, I have some code, it's clearly not running because nothing is executing it; how do I even use it? To insert it into the kernel, you use this command. I have to run it as a superuser, and insmod is short for insert module, or install module, however you want to think of it. So I'm going to insert hello.ko into the kernel, and as soon as I insert it, it will run that hello_init, and that's now running in kernel mode, so I can do whatever I want. So if I insert that, oops, it says "file exists" because I tried it out before; okay, we're going to see a bunch of messages. But if I insert it into the kernel, it runs, and now I can see my messages if I use this command called dmesg. That lets you read, and you'll never be tested on this, we're just exploring kernel mode, you can use dmesg to see all the internal kernel messages, and then -l info to see just the messages at the info level. If I do that, I've called it a few times, and this is the latest one: I can see that at this time, since everything is timestamped in the kernel, my hello module's init ran when I loaded it. All right.
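For reference, a sketch of the kind of hello module being demonstrated; the exact code on screen may differ slightly, but this is the standard Linux module boilerplate that builds into hello.ko.

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

/* Runs when the module is inserted into the kernel. */
static int __init hello_init(void) {
    pr_info("hello init\n");  /* printk at the info level */
    return 0;
}

/* Runs when the module is removed from the kernel. */
static void __exit hello_exit(void) {
    pr_info("hello exit\n");
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");  /* standard boilerplate for out-of-tree modules */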
And then I could go ahead and remove the module from the kernel and run dmesg again, and I see "hello exit", so now my code is no longer running in the kernel. Yeah. Yeah, so this was this morning's lecture, this was at the end of the lecture because I, I'll strace it for you to see what happens, and this was me just fixing it, and then this is right now. Yeah. Yeah, so how does the kernel know how to run it? The kernel does a lot of hacky things to make it work, but essentially, if you include that linux/module.h and compile with all the header files the Linux kernel understands, it defines these things called module_init and module_exit, which register functions to run whenever the module is loaded into or unloaded from the kernel. And anything that executes because we inserted it into the kernel has to be executing in kernel mode. So for example, if right here I started typing code, it runs in kernel mode and I can do whatever the hell I want. If I knew the right functions to call, we could probably finally kill init if we really wanted to, because we are now the kernel and we can do whatever we want, right? And we can even strace, oops, sudo strace, we can see what insmod actually does, because it shouldn't be too much of a mystery: if it's inserting something into the kernel, the only way it can do that is by making a system call that says, hey kernel, please insert this into yourself and start executing it. If we strace it, we can actually see what's going on, because we now know how to read it. At some point it opens that hello.ko file, gets file descriptor three, and we can see here that it is definitely an ELF file, because we know what those begin with now. Then it does some other stuff where it fstats it, and the important call is right here: finit_module, which gets file descriptor three, our kernel module, and after that system call completes, the kernel is free to run the module as it sees fit. Yep. No, so in a normal build, your C file gets compiled to an object file, and then to make an executable you combine a bunch of object files, but object files are just machine code; they're already compiled. So a .ko is kind of like a little piece of a library, right? I just defined hello_init and hello_exit, that's it, and said please run them. I'm only able to insert it into the kernel because the kernel does some user checks on that system call to make sure you're actually allowed to. All right, cool, we wrote some kernel code. Right now, if I wrote anything here, I can do whatever I want: I can screw up my kernel completely, better than a fork bomb, I can loop over all of my processes and just kill everything. You'd have to reboot; your kernel is going to die, so your system's going to die, and if you power it back on again it just reboots, right? No, no, because I only inserted it manually. Yeah, you have to insert the module. Yeah, you could, you could, yeah, you could screw up your whole firmware if you wanted to.
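And just as a sketch of what insmod boils down to under the hood, the raw system call looks roughly like this; it's an assumption about the minimal path, not insmod's actual source, and all error handling is omitted.

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    /* Open the compiled kernel object (an ELF file) and hand its file
     * descriptor to the kernel, just like the strace output showed. */
    int fd = open("hello.ko", O_RDONLY);
    syscall(SYS_finit_module, fd, "", 0); /* ask the kernel to load and run it */
    close(fd);
    return 0;
}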