Good morning, everyone. I still haven't seen the quiz yet, so don't worry about it. Who makes the quiz, you or the other one? We both make it, but there's a big question bank that I need to see. He's going to post the question bank, and I'm going to go through it and make sure we covered everything. Is there good practice for it? He has questions at the end of his slides, but they may or may not be relevant to us. Depending on what the quiz content is like and where the other classes are, I might just do preparation for the next lecture, and this will be the last one we see. But if we're behind, or if there are lots of weird questions I didn't anticipate, we'll do something about it. Hopefully it's not too bad, because it's mostly multiple choice, and then maybe 20% is "write a sentence" or "explain something." So hopefully the quizzes aren't too bad, and lab one was far, far worse than the quiz will ever be.

All right, so let's talk about threading implementation, since that is what you have to do for lab two, and it's good to understand the pros and cons of the different approaches. There are a few multi-threading models, and they pretty much just answer the question: where do we implement threads? Do we implement them in the kernel, or do we implement them in a library? For example, pthreads are implemented in the kernel; your lab is implemented in a library that you get to write.

So user threads, which is your lab two, are completely in user space. The kernel doesn't care about your user threads at all. As far as the kernel is concerned, you are just a process running on a single thread, and whatever you do within that process is up to you. If you want to implement threads and switch between them, good for you. Kernel threads, on the other hand, are implemented completely in kernel space. The kernel manages everything for you, and it can treat threads specially, which we'll see has some issues. We talked about it a little before: some things can only be handled by the kernel. But of course there's going to be a drawback with kernel threads, because we have to go through the system call interface, which is slower than just doing C function calls and keeping everything in user space.

Any thread support requires a thread table. It's similar to the process table we discussed previously: you have to keep track of all the information associated with your running program, and a thread you can think of as just the virtual CPU part of it. So it's going to be the registers, and now we know that each thread has its own private stack to facilitate actual execution of a program while sharing the same address space and all that fun stuff. The thread table can live in the kernel or in a library in user space, depending on where threads are implemented. Pthreads live in the kernel; your lab two code is going to live in a library that you will become very familiar with. For user threads, you're also going to have a runtime system to determine scheduling. So you are going to have to have your own scheduler, but I'll talk about that at the end; your scheduling for the purposes of lab two is just going to be a simple queue. The benefit of having threads in kernel space is that you don't have to write a separate scheduler for them at all.
The kernel can just reuse the process scheduler and schedule threads as if they were processes, and then you get all the nice benefits of the kernel scheduling things for you, so things can run in parallel or concurrently, and that's up to the kernel. In both models, a process can contain multiple threads. Remember that by default a process has just one thread when it's created, and then all the threads live within the process and represent the virtual CPU part of it.

So with pure user-level threads, again with no kernel support, if one thread blocks in a system call, it blocks literally everything; that's essentially what user-level threads mean. They're very fast to create and destroy: there are no system calls, you just have to allocate a stack and that's it, and there are no context switches in the kernel that might switch more than just registers and a stack. But the drawback is that if one thread blocks, the entire process blocks, right? The kernel just schedules it as a process, and if that process is blocked, then even if you implemented multiple threads in your program, none of them are going to make progress, because the process isn't going to be scheduled and therefore none of your threads are going to be scheduled. So if a thread calls something like write, the whole process waits for that operation to complete before another thread can run; the kernel can't just schedule another thread while that one is waiting to write to or read from a file.

Which system calls block — does write? It depends what the call is. The thread will wait until the system call returns, but the system call might itself be waiting on something before it can return. So any system call waits for the system call to return, and that can take an unspecified amount of time depending on which call it is. So all system calls block the process? Yeah, you can think of them as all blocking it at least temporarily, just to do the call. Until the return. Yeah, until the return.

So, kernel-level threads again: you're going to have to go through the kernel, you're going to have to make a system call. But if one thread blocks, the scheduler can just schedule another one, because the kernel actually knows about them. If there's nothing else that stops another thread from running, the kernel is free to schedule it.

All the threading libraries you use run in user mode, and they either make system calls or you manage everything yourself. The thread library just maps user threads to kernel threads, and it's up to you what that mapping is. In the many-to-one model, all the threads are completely implemented in user space, so the kernel only sees one process and therefore only one thread. That's why it's called many-to-one: there's only one thread the kernel knows about living within the process, and all of your user-level threads are implemented on top of it, essentially sharing that one stream of execution. Therefore you do not get parallelism or anything like that. One-to-one means one user-level thread maps directly to one kernel thread, provided your kernel has support, which they pretty much all do now, and then the kernel handles everything. That is how pthreads work.
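For a sense of what that looks like under the hood on Linux: the system call underneath pthread_create is clone. Here's a stripped-down sketch based on the flags documented in clone(2) — glibc's real pthread_create passes several more flags and does a lot more bookkeeping, and the names spawn_kernel_thread, thread_main, and STACK_SIZE here are made up for illustration:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>

#define STACK_SIZE (64 * 1024)

/* Entry point for the new kernel thread. */
static int thread_main(void *arg) {
    return 0;
}

int spawn_kernel_thread(void) {
    /* The library's main job: allocate a stack for the new thread.
     * On x86 stacks grow down, so we pass the *top* of the allocation.
     * (A sketch: we also leak the stack; a real library cleans up.) */
    char *stack = malloc(STACK_SIZE);

    /* Share the address space, filesystem info, file descriptors, and
     * signal handlers with the caller -- i.e. be a thread, not a fork. */
    return clone(thread_main, stack + STACK_SIZE,
                 CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_THREAD,
                 NULL);
}
```

Which is exactly the kind of ugliness the library is saving you from.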
So we can even strace something to see the exact system calls that pthreads use. Basically, all the threading library does in the pthreads case is make a stack for each thread for you and save you from having to write a fairly nasty system call yourself. And in the many-to-many model, it's kind of a free-for-all: many user threads can map to as many kernel-level threads as you want, and we'll see why you might, maybe, want to do something like that.

Why don't user-level threads do context switches? So, they do context switches, just not in the kernel; they do them within the user process. It's again a question of what happens in kernel space versus what happens in user space. So they context switch in user space? Yeah. For user-level threads, all the context switching is in user space, which is what you'll be doing in lab two, and it's the tricky part: swapping the registers and the stack pointer. How is that different from a kernel-mode context switch? The kernel does the exact same thing — swap the registers and the stack — except it has to interrupt itself, it has to transfer control from user space to kernel space, which is typically a bit slower. But that's it; the mechanics are exactly the same. But doesn't it use different hardware to do that? So, the question is whether it uses different hardware, and no: the mechanics are going to be exactly the same. It's just whether or not it has to pass through that user-to-kernel interface, which is typically just slow. Okay, so it also has context switches then? Yes. Just to clarify: switching execution between two threads is a context switch no matter where it happens, it has to be done all the time, and a context switch means swapping the registers, including the stack pointer.

What's the difference between this and one-to-one? Yeah, so we'll see cases for many-to-many; it's a bit weird, and typically only, like, insane people use that one. So let's first talk through the models, and then I'll circle back and relate this back to the lab.

Many-to-one is the pure user-space implementation. It's fast, as we said before, and it's fairly portable because it doesn't depend on any specific kernel, right? You write it in C, which is essentially portable assembly, and it doesn't really matter what kernel you run on; you don't have to rewrite any threading code, nothing is specific to any one kernel. The drawbacks: again, the big one everyone should know is that one thread blocking causes all threads to block, because the kernel doesn't know about them. And because there's only one thread within that process that the kernel knows about, your threads will never, ever execute in parallel. Remember the difference between parallel and concurrent: with user-level threads, they can all execute concurrently, but because the kernel doesn't know about them, they will never execute in parallel — which, you can imagine, if you have an eight-core CPU, is going to be quite crappy.

Okay, so one-to-one just uses the kernel thread implementation, which shares a lot with processes, right? The library is just a thin wrapper around the system calls to make them easier to use. We can strace some applications just to see how ugly that is, but you get to deal with that ugliness in lab two, unfortunately.
But this exploits the full parallelism of your machine: since the kernel knows about the threads, if one thread blocks, or if you just have eight cores and eight threads, it can schedule them all to run in parallel and use the full potential of your machine. You do go through the somewhat slower system call interface, and you lose some control, because this just reuses all the kernel's process-scheduling machinery for threads. If you want to make sure threads are scheduled in some very particular way — if you think you're smarter than the kernel developers — you lose that control, because you're subject to the same scheduler that processes are subject to. And this is the implementation that's actually used for pthreads.

Then the next one is the many-to-many approach. The idea here, as a rule of thumb, is that you have more user-level threads than kernel threads. For example, if you think creating kernel-level threads is very, very expensive, you might cap the number of kernel-level threads at the number of actual cores on your machine, so you get the maximum parallelism if you want it, and beyond that you implement user-level threads on top of them. Then your threading library has to decide: okay, if I make 50,000 threads, which of these eight kernel threads should each one run on? So you can get the most out of multiple CPUs while reducing the number of system calls.

So in terms of threading libraries, instead of doing this, I'll show some techniques you'd use instead, because generally you just want to make as many kernel threads as there are cores on your machine and then multiplex on top of that. We'll see different techniques for doing that which are slightly better than this. Many-to-many leads to a more complicated threading library: if you start multiplexing odd numbers of user threads onto odd numbers of kernel threads, it gets really, really weird, and depending on your mapping luck you might block other threads. For example, say you have eight kernel threads and, like, 100 user threads, and you spread them evenly, and then four threads that all want to run in parallel happen to map to the same kernel-level thread. If one of those threads blocks, it blocks all the others mapped to it, and you couldn't do anything about it.

Could you explain again why many-to-one can't have parallelism, only concurrency? Yeah, so the question is: why does many-to-one have no parallelism, just concurrency? Because many-to-one maps a bunch of user threads onto one kernel thread, the kernel can only schedule that one thread, which is just the whole process in this case. And if a thread makes a system call that blocks, it blocks that process, and then no other threads can run, because the process is blocked. Oh, and then there's still concurrency within the process because of the threads? Yeah, within the process you have concurrency, because your threading library can switch between multiple threads, which you can think of as, you know, trying to execute like eight functions at the same time.

For one-to-one: I was just wondering what you meant by a thin wrapper around a system call.
So, because one-to-one is the kernel implementation, you don't have to do any management: no context switching, no scheduling, nothing like that. Your library pretty much just uses a system call to create a thread, and that's about it. That's the benefit of using kernel threads: the kernel knows about them, and it's quite simple to write the threading library — while, unfortunately, in lab two you get to do the hard-mode version of it. Okay, so that's many-to-many.

As you can imagine, threads definitely complicate the kernel, so let's talk about kernel-level thread complications. If you have a process with eight threads and one of them calls fork, what should happen? Should it copy all the threads? And then, how do you know what state all the threads are in? As soon as one thread calls fork, the other seven might be in the middle of executing something, and when you fork, your program will be in some weird, unknown state that's probably completely inconsistent. So you might not want to do that; it can get out of hand very quickly. What Linux does, to solve at least one of the inconsistency issues, is copy only the thread that made the fork call and create a new process as we're used to. That new process is created with a single thread that is a copy of the thread that called fork. But since it's a complete copy at that point in time, all the memory is a complete clone of when the fork happened, even if other threads were modifying it right before the fork call, so you can still get into that inconsistent-memory-state problem. If you really want to go into it, there's a man page for pthread_atfork, which will not be covered in this course, but just so you know it exists: it lets you control what happens, so you can put in callbacks that say, hey, if any thread calls fork, all threads should run this function first, so I know the memory is in some consistent state and after the fork I don't have weird invalid memory. That goes way beyond the scope of this course. You should just know that if you call fork in a process with multiple threads, it creates a new process with a single thread that is a copy of whichever thread called fork.

The other complication we touched on before is signals. If you have a process with multiple kernel threads, which thread should receive a signal? Should it go to all of them? Well, Linux just says: if a signal comes in, it should only be received once, and it picks a random thread to receive it. So you have to program it so that each thread is able to run that signal handler code without running into issues, and this makes concurrency really, really hard, because you don't even know which thread is going to be interrupted when a signal comes in. Your life might be a little easier if it were only the main thread, but unfortunately signals just go to whichever one random thread the kernel deems it wants.

Okay, so here is the sane version of many-to-many: there's something called a thread pool. Remember, the goal of many-to-many was essentially to avoid the creation cost of making a bunch of kernel threads. So instead, you create a thread pool with a certain number of threads. Again, most of the time you just want to exploit parallelism, so you would make it match the number of cores, and
then you would have a basic scheduler within the thread pool that essentially keeps a list of tasks to do. It assigns a task to run on one of the available threads until it's done executing, and then that thread starts handling a new task. So it's kind of like a mini scheduler that gets rid of the whole complicated question of how to map user-level threads to kernel-level threads: you just divide the work into tasks and send the tasks to the pool of threads, so you get the most parallelism you can. In this case you would create, for example, your eight kernel-level threads and just keep on reusing them. Each thread essentially runs a loop that says: hey, is there any work to do in the queue? If there is, execute it; if not, sleep until a new request comes in. You'll see this technique all the time.

So all the tasks should be independent of each other? The question is whether all the tasks should be independent of each other, and no. In threading, right, all the tasks can use the same memory; you're in the same address space. It's easier if all the tasks are independent, because then you don't have to worry about any issues, but they don't have to be. What if one task has to run before another, but they run at the same time on two cores? Yeah, so the question is: what if two tasks actually need to run in some order? And that is, like, the next topic. That's one of the hard problems — this is where bugs start existing in the kernel for, like, seven years.

Okay, so you are going to implement many-to-one, and remember the process life cycle: for threads it's exactly the same thing, and these are the names of the functions you are going to create. You're going to have thread create, which will essentially allocate a stack that only one thread should use. Remember, threads are all in the same address space, especially user-level threads, because there's only one process, right? So you have to allocate the stacks on the heap using malloc, and then you have to make sure each thread only touches its own stack, because there's going to be no protection — no segfaults to stop you from accidentally using the wrong stack — and you're going to have a bad time if that happens. Then you're going to have to create a structure that saves all the relevant registers as well.
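To make that concrete, here's a compilable sketch of roughly where lab two is headed: a thread structure, a malloc'd stack per thread, and a round-robin ready queue, using the POSIX ucontext calls to do the register swap. The lab will almost certainly hand you its own context-switch primitives and function names; everything here (tcb, ready_q, thread_create, thread_yield) is just illustrative, and sleep/wakeup/exit are left out:

```c
/* threads.c -- compile with: cc threads.c */
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE  (64 * 1024)
#define MAX_THREADS 64

enum tstate { READY, RUNNING };   /* blocked/zombie states elided */

struct tcb {                      /* the "structure that saves the registers" */
    ucontext_t  ctx;              /* saved registers, stack pointer included */
    void       *stack;            /* this thread's private stack, from malloc */
    enum tstate state;
};

/* The most basic scheduler: a round-robin ready queue. */
static struct tcb *ready_q[MAX_THREADS];
static int head, tail, count;
static struct tcb *current;       /* whoever owns the CPU right now */

static void enqueue(struct tcb *t) {
    ready_q[tail] = t; tail = (tail + 1) % MAX_THREADS; count++;
}
static struct tcb *dequeue(void) {
    if (count == 0) return NULL;
    struct tcb *t = ready_q[head]; head = (head + 1) % MAX_THREADS; count--;
    return t;
}

static struct tcb *thread_create(void (*func)(void)) {
    struct tcb *t = malloc(sizeof *t);
    t->stack = malloc(STACK_SIZE);        /* each thread gets its own stack */
    getcontext(&t->ctx);                  /* start from a valid register set */
    t->ctx.uc_stack.ss_sp   = t->stack;
    t->ctx.uc_stack.ss_size = STACK_SIZE;
    t->ctx.uc_link = NULL;                /* a real library's thread_exit would
                                             handle the function returning */
    makecontext(&t->ctx, func, 0);        /* point the saved PC at func */
    t->state = READY;
    enqueue(t);
    return t;
}

/* Cooperative: the running thread volunteers to go to the back of the line. */
static void thread_yield(void) {
    struct tcb *next = dequeue();
    if (!next) return;                    /* nobody else wants the CPU */
    struct tcb *prev = current;
    prev->state = READY;
    enqueue(prev);
    current = next;
    current->state = RUNNING;
    swapcontext(&prev->ctx, &current->ctx);   /* the actual context switch */
}

static void work(void) {
    for (int i = 0; i < 3; i++) {
        printf("thread %p running\n", (void *)current);
        thread_yield();                   /* be nice: give someone else a turn */
    }
}

int main(void) {
    struct tcb main_thread = { .state = RUNNING };
    current = &main_thread;               /* the process starts as one thread */
    thread_create(work);
    thread_create(work);
    for (int i = 0; i < 3; i++)
        thread_yield();
    return 0;                             /* exits with the workers parked */
}
```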
So after your thread create, you should have a valid thread, and you're essentially going to have a queue of threads to run, which acts as your scheduler: you add threads to the back of the queue and take the next one to run from the front. That's the most basic scheduler there is. And this is going to be cooperative scheduling. Remember when we talked about cooperative processes? It's the exact same thing for threads: a thread has to give up the CPU voluntarily; you're not going to interrupt it or try to steal the CPU away from it. In lab three you'll do the preemptive version, where you do get to steal the CPU away from a thread, but in this one you don't have to worry about that. You depend on threads calling thread yield, which transitions them from running to waiting, and then your threading library decides which threads in the waiting queue get promoted to running: you just take whatever thread is at the front of the queue. And the only way a thread gets added back to the queue is if the thread itself calls thread yield again, which puts it at the back of the line.

Then there are two more calls, for blocking threads. You can call thread sleep on a thread, which transitions it from running into a blocked state, where it isn't in the ready queue and can't be scheduled at all, and you decide manually when to wake it up. There's an explicit function call, thread wakeup, which takes the thread out of the blocked state and adds it back to the waiting queue. And of course you'll have thread exit, which transitions a thread from running to terminated — that thread-zombie state — and then after it's reaped (I don't even think there's a join equivalent) you can delete the resources; I think it's just called thread destroy. So, any questions about the lab two task?

Can you explain what thread yield is? Yeah. This is going to be cooperative multitasking. Normally the kernel just decides what to run, and if something's running, it'll interrupt it and take the CPU away. Here, the only way to stop a thread from running is if the thread itself calls yield and gives up the CPU. Yield tells your threading library: okay, I'm done with the CPU, put me at the back of the queue, so if some other thread wants to run, it can run. Yeah.

And the transition from waiting to running happens through thread yield? So: when the current thread yields, it gets added to the back of the waiting queue — by your threading library, right? This is all your threading library's code. Then, to transition a thread from waiting to running, the library just picks whatever thread is at the front of the queue and context switches that one in. Also, you can see that if there's only one thread running and it yields, it adds itself to a queue where it's the only member, and then it just runs again. But if another thread were ahead of it, that thread would run instead when you yielded. You mentioned there's no join.
Yeah, I don't think they make you implement a join. And we don't implement adding that exited thread back to the queue? Yeah — because they'd just be sitting in that terminated state; and as for a return value, I'm not sure your threads even have return values.

So that's the picture for lab two. Just keep this in your mind during lab two so you can relate it to process state, because it's essentially the same thing: a thread is like a virtual CPU, and these are all the states it can be in. You're probably going to want to keep track, for every thread, of which of these states it's currently in, if that's not provided for you. So that's one of the things you're going to have to do.

And again, your scheduler can just be round-robin; it doesn't have to be super intense. You just create a queue, or a list that's double-ended: have a head and a tail, run whatever's at the front, and if something yields, add it to the back. That way everything's fair and runs in the order it came in. Then you'll have to write the context-switching code, which will be a bit tricky, because you'll have to save all the registers, and whenever you restore them they have to be pointing at the right thing. For example, you'll have some problems with the program counter: you'll have to fudge it a little bit, because if you restore it exactly where it left off, it's going to jump back to the code that restores it and reload it again, and you're probably going to get into an infinite loop at some point. So you'll have to really think about the program counter.

These are, again, called cooperative threads, so they have to be nice. You're relying on the provided user-level code to actually yield the thread; if a thread never calls yield, it just runs until it exits and hogs the entire CPU. But thankfully, since it's your threading library, you get to write your test programs, and I believe the testing scripts are fairly nice. All right, any questions about that? It's essentially the same thing we've been talking about for processes the whole time; now we're just replacing the word process with thread.

Okay. Well then, I think that's pretty much it for threads, so we can talk about our fun synchronization issues. Here's our next complication, which we kind of saw a bit of before. We'll create a program that spawns eight threads, and each thread increments the same variable 10,000 times. So if I have eight threads, each incrementing a variable 10,000 times, at the end of it I should get 80,000, right? That would make sense as the final value, especially if you initially set it to zero. So: set it to zero, have eight threads each increment it 10,000 times, and it should be 80,000. Let's see that, and see if we can fix it.

All right, so here's just a review of our pthread code, and here's what you would do if you want to join multiple threads. In main I create an array of the pthread_t type. Again, we can map threads to what we know about processes: you can think of this pthread_t as just a thread ID instead of a process ID, to make things easier. It's a slightly more complicated structure, but at the end of the day it's essentially just a thread ID.
So we'll allocate the array for the number of threads we want, which in this case is eight, then go through this for loop num-threads times and call pthread_create, giving it a pointer to the appropriate index in the array. We give it NULL for the attributes, so it's going to be a joinable thread, meaning we have to join it to clean up all of its resources; we tell it to run a function called run, and we don't give it an argument.

Up here I create a static int counter, so it starts at zero. Does anyone know what static means in this context? Anyone want to tell me? "It gets put on the stack." All right, stack, yep. "It affects whether other files can see the variable." Yeah. So — I don't think anyone went over what static means — in this context, static on a variable at the top of a file is essentially like this. So what's counter now, without the static? It's just a global variable, right? The only difference is that if you put static in front of something that looks like a global variable, it means you can only access it within this C file. So it acts like a global variable, but you're confident someone in another file can't grab it and change its value at all. That's the only thing static does in this case.

And another fun thing: if you put static here, on a local variable — anyone want to guess what happens? It's actually the same story. By default, if I just have int i, it's a local variable on the stack, right? But if I put static in front of it, it essentially becomes a global variable, except that global variable can only be accessed within this function; other than that, it behaves as a global variable. Will that be destroyed after the function exits? So the question is: does a static int i get destroyed after the function exits? And the answer is no; it's exactly like a global variable, and global variables exist as long as your process exists. And only the function can access that global variable? Yes, only the function can access it. How is it different from a local variable, other than not being destroyed? Yeah, the question is how it's different from a local variable, and the answer is: the only difference is that it's treated as a global variable. It lives until the program ends, and if you call the function a bunch of times, you can save a value to it and read it back later, and so on. So that's your random static aside — not part of the course, but hey, we learned something.

So in run, I go through this loop 10,000 times, increment counter, and just return NULL, which is the default "I don't care" value — like returning zero from main. Then, in the main thread, here's one of our synchronization issues: I only want to print the counter after I know all the threads have completed their work. I can't just print counter without joining them, because threads could still be running, or no threads might have run yet. So if I want to print counter after I know all the threads are finished, I have to call join on all of them. In this second loop I go around num-threads times, using i as my index, and join each thread. So the first loop: the main thread creates eight threads. Then the main thread just waits for all eight threads to finish running.
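Putting that description together, the demo program looks something like this — a reconstruction from what was said in class, not the exact source, so small details may differ:

```c
/* counter.c -- compile with: cc -pthread counter.c */
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 8

static int counter = 0;      /* static: a global only this C file can see */

static void *run(void *arg) {
    for (int i = 0; i < 10000; ++i) {
        ++counter;            /* the race: not one operation, as we'll see */
    }
    return NULL;              /* the default "I don't care" value */
}

int main(void) {
    pthread_t threads[NUM_THREADS];

    /* create all eight first, so they can all be active at the same time */
    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_create(&threads[i], NULL, run, NULL);

    /* only after joining them all do we know every thread is finished */
    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);

    printf("counter = %d\n", counter);   /* "should" be 80000 */
    return 0;
}
```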
So does that make sense to everyone? Any questions about that? Yep. Could we also just join immediately after the create? Oh, that's a good question: why can't I join immediately after the thread's created? Anyone hazard a guess as to what would happen if I did that instead? "We would wait until the first thread ends before creating the second thread?" Yeah, yeah. If I write it that way, it's going to create one thread and wait for that thread to exit, then create another thread, wait for it to exit, create another thread, wait for it to exit. They're never going to be running in parallel, right? Because I'm saying: create a thread, then wait for that one thread to finish. So I wouldn't want to do that. That's why I separate it into two loops like this: I want to create all eight threads first, have them all active at the same time, and let the kernel do its magic.

Any other questions? Yep. Just curious, could you also do this with detached threads? So the question is: can I do this with detached threads? If I make the threads detached, I don't know when they die, right? So I could do it, but I wouldn't be able to print meaningfully: I'd print the counter and not know what it reflects. So yeah, I could make them detached if I didn't care about printing the counter after everything was done. Can anyone think of a way to do this with detached threads using the fun atexit interface we saw before? Yeah: if we detach all the threads and just let the process kind of die, we could register an atexit handler that would run once all the threads died, and it could print the counter. But that's very specific to the case where all the threads run and produce some result when it's done.

Yeah? Why is it guaranteed that we create all eight threads first and only then join like this? After we create the first thread, the scheduler might decide to run thread one instead of the main thread. Yeah, so the question is: how do I know all my eight threads are going to run concurrently or in parallel? And the answer is, you don't. You just want them all active at the same time, and you hope the kernel is good. So thread one might run the whole function before the other threads even get created? Yep, that's possible. It is possible that one thread executes that whole thing and is done before I create another thread. But you just want to program in a way that gives your kernel at least a chance to schedule them in parallel. Yeah. But the main function will still wait for all the threads to finish before it returns, right? Yeah, yeah: in either case, whether it schedules them all at the same time or one after the other, I'm still joining on all of them, so I should get an answer at the end.

All right. So I said we want 80,000 as our answer. Anyone want to guess what the answer is actually going to be? "Five?" "10,000?" "At least one." At least one's good. I would say at least eight. ...So, 63,000. So we're missing some. Yeah, so let's run it again and, I guess, illustrate again why threading is so awful. Yeah? Is it because sometimes two threads increment the same number by one, so some increments get wasted? Yeah, yeah. So let's run this a few times and see how bad it really is. Oh, 48,000. Super unlucky. This is not a bug, it's a feature: it's a random number generator, which, hey, bug-free, right? So what a great program.
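Before the explanation, it's worth seeing what that one line of C actually is. A sketch of the expansion — the exact instructions depend on the compiler, but for a global it's roughly these three steps:

```c
static int counter;

void increment(void) {
    /* ++counter is really three separate steps: */
    int tmp = counter;   /* 1. load the current value from memory        */
    tmp = tmp + 1;       /* 2. increment the loaded copy (in a register) */
    counter = tmp;       /* 3. store the new value back to memory        */
}

/* One losing interleaving, starting from counter == 0:
 *
 *   thread A: load  -> 0
 *   thread B: load  -> 0        (context switch right after A's load)
 *   thread A: add, store 1
 *   thread B: add, store 1      (overwrites with the same value)
 *
 * Two increments ran, but counter only went from 0 to 1. */
```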
So this is the problem; this is why concurrent programming gets very, very difficult — things like this. If you had just run it the one time it got 80,000, you'd say: yeah, perfect, no bugs, I'm all good. But that's realistically not the case. Even if you run your program ten times and it works ten times, it doesn't really mean there's not a bug in it, unfortunately. Yeah? Is there a way to fix the bug? At least for this one — so, the reason this is happening, and it would happen even if I only had a single CPU on my machine: we're using a global variable, so it's just a memory location somewhere, and because we're in the same address space, each thread is going to do ++counter. And ++counter is not just "incrementing a global variable"; it's actually three operations. What ++counter does is: first, because it's a global variable, it loads whatever that number is in from memory; then, locally, it does an increment instruction, adding one to that number; and then it writes it back out to memory.

So here's how the count goes wrong. Say the initial value is zero. If one thread executes and loads the zero, and then it context switches to another thread, that thread loads zero again. At that point, either thread could execute. Thread one would increment its copy to one and write that value out to the memory location. Then it would context switch back over to the other thread, which would increment its copy to one and write that back out to the same memory location. So the counter gets the value one written to it two times, instead of having one written and then two. The amount we're missing is essentially counting how many times a thread got context-switched right after its load. And since threads get scheduled for a while at a time, you can kind of guess how long threads are scheduled for by how awful this number is. If it were a 50-50 shot every time a thread loaded a value whether or not it gets context-switched away, this number would be insanely bad; I would essentially never get anywhere near 80,000. But in this case you can see that when a thread gets the CPU, it gets it for quite a while, so I probably won't get a number under 40,000.

"So you just want static there, like that?" Okay, well, let's do that. Hey — yeah, there's a comment that we got higher numbers, but again, that doesn't actually relate to the code change we made, because it's just completely random; it just depends on when things get scheduled. Putting static in front of that doesn't change how the program works at all, even though it might look slightly different: static just determines which C file you can access that global variable from. Would static make it more expensive to modify? No — in this context, static means exactly the same as a global variable; it's just about which files can access it. That's it, yeah. Oh, and what if the counter is a local variable? So, if the counter were a local variable, it would just be on each thread's stack, and then you wouldn't have this issue, because everything's independent, right?
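Following that thought, here's a sketch of the race-free variant where each thread counts on its own stack and main combines the per-thread results at join time — illustrative, not the course's official fix:

```c
/* counter_local.c -- compile with: cc -pthread counter_local.c */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_THREADS 8

static void *run(void *arg) {
    int local = 0;                       /* on this thread's stack: no sharing */
    for (int i = 0; i < 10000; ++i)
        ++local;
    return (void *)(intptr_t)local;      /* smuggle the int out via the void* */
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    int total = 0;

    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_create(&threads[i], NULL, run, NULL);

    for (int i = 0; i < NUM_THREADS; ++i) {
        void *ret;
        pthread_join(threads[i], &ret);  /* join hands back the return value */
        total += (int)(intptr_t)ret;
    }
    printf("total = %d\n", total);       /* 80000, every single time */
    return 0;
}
```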
Yeah, so that's another way you could do this — it's kind of silly to use a global here at all. If you wanted, I mean, you could just return 80,000 each time, right? You know what the answer's supposed to be. But you could also have each thread do everything locally and then combine the results at the end. So that's a good point: I could completely get rid of all this data-race stuff if, at the end of run, I just returned the value, and then at the join — because the main thread is the one joining — it could add all the values together, and I would get 80,000 every single time. Though that's essentially just adding 10,000 eight times, which makes the code really weird.

Yeah? So a context switch can happen in between — within that single line of code? Yeah: that single line of code is actually three operations, and a context switch can happen between any two of them. You can think of each individual operation as happening atomically — it either happens or it doesn't — but this line is actually three atomic operations: load from memory, increment (which is local), and write to memory. And there's something that basically tries to combat this issue...? Yeah, yeah — what you just described is called an atomic integer, and it exists exactly for this issue. It's a hardware-supported instruction that performs those three operations as one atomic operation, so it can't be interrupted in the middle and you get a consistent count every time. So that's one of the tools, and we're going to see another tool for this in a later lecture.

All right, any other questions? Yeah. Regarding the thread implementation of this program: is it one-to-one, many-to-one...? Oh, that's a good point. So the question is: what is this thread implementation — one-to-one, many-to-one, what is it? This is one-to-one: pthreads map exactly to kernel threads. And I can actually see the system call that it uses, because we have our old friend strace. If I strace this, it's going to have a bunch of crap in it, so I'll just say what it is: this here is actually your pthread_create. And this is why it's nice to have a threading library, because this call looks really ugly. Pthread create is always one-to-one? Yeah, yeah. So essentially, the way fork works is it clones a process, and in Linux, the clone system call lets you specify exactly what you want to clone. Essentially this says: clone the entire process, but keep the same address space, the same everything else — just give me my own virtual CPU, and that's it. But if you wanted to make this behave like a fork, you would say: hey, give me a new address space, give me a new everything. Yeah? I'm wondering: if you have, say, 10,000 threads, would that make it many-to-many, because you don't have that many cores? Oh — no. If I make 10,000 threads here, that would not be many-to-many, because pthreads are exactly kernel threads; it would just create 10,000 kernel threads, which would probably be a waste. At that point you'd probably want to reach for something like a thread pool — you could do a thread pool on top of pthreads. All right, any other questions? Yeah. Basically, what is an atomic operation? Yeah — so the question is: can you explain what an atomic operation is?
So we'll see different types, but you can think of an atomic operation as one instruction on your CPU that either hasn't executed or has executed. You don't get some weird in-between state, where it could context switch in the middle of one CPU instruction. It's like a status? Sorry? More like: for the operation, you either know it has completed or it hasn't. There's no in-between. So — I'll take a look and post on Discord and Piazza whenever I know about the quiz. Hopefully it won't be too bad. I'm pulling for you. We're all in this together.
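One last sketch, to close the loop on the counter bug: the atomic-integer fix that came up in the questions, written with C11's stdatomic. This is one option (and assumes a C11 compiler); the other synchronization tools come in a later lecture:

```c
#include <stdatomic.h>   /* C11 atomics */
#include <stddef.h>

static atomic_int counter;   /* zero-initialized, like any global */

void *run(void *arg) {
    for (int i = 0; i < 10000; ++i) {
        /* load + increment + store as one indivisible, hardware-backed
         * operation: no context switch can land in the middle of it */
        atomic_fetch_add(&counter, 1);
    }
    return NULL;
}
```

Drop this run in place of the original and the program prints 80,000 every time, at the cost of a slightly more expensive increment.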