There we go. Now I have sound. Whoops. All right. So if I have a process with, say, eight threads and one of those threads calls fork, what should happen? Should it create a new process that's an exact copy, where the new process also has eight threads? Well, that would get out of control really quickly, because those other seven threads were just doing some random stuff whenever one thread called fork. You don't even know what they were doing; they'd be in some weird, inconsistent state. And especially if you were fork bombing yourself, well, now suddenly you can fork bomb yourself much more effectively. If it copied all those threads as well, that would be kind of silly.

So, in order for this not to get out of hand, the rule in Linux is: whenever a process with multiple threads forks, exactly one thread called fork, and the new process is a copy of the original, except that the new process has only one thread running in it, and that thread is a copy of whichever thread called fork. So if my process has threads one, two, and three, and thread two calls fork, it creates a new copy of the process. All the virtual memory looks identical, all the file descriptors, everything like that, except that the new process has a single thread in it, and it's a copy of thread two, so it looks like execution just continues from there. That new process: one thread, that's it.

So Linux only copies the calling thread into the new process. And if that thread hits pthread_exit, all the same rules apply: it would be the only thread running, so it would end the process and just call exit. If you get into more advanced uses, calling fork with multiple threads and then assuming things are in a consistent state is really, really hard, and generally you don't do it.
If for some reason you do do it, there's this function called pthread_atfork, which will not be covered in this course. It lets you register handlers around the fork: the prepare handler runs to completion before the fork happens, so you can try to guarantee that the memory touched by all the other threads is in some consistent state. It's really, really hard to get that right, but it gives you some control over what happens at the fork in terms of the other threads. But we're not going to cover that in this course. I have never used it in my life, which probably means you will never use it in your life either. So typically, you either create threads or you create processes; forking with multiple threads generally doesn't really happen.

Oh, yeah. So that rule is just for kernel threads. If you just have user threads, they just exist inside that process. So if you fork, the same rules apply as last time: you create an exact copy, it keeps on executing, and they look exactly the same except for the return value of fork. So yeah, if you have user threads, they'll still exist.

All right, other fun thing: if you have a process with eight threads in it and you send it a signal, what happens? Does one thread get the signal? Does the main thread get the signal? Do all of them get the signal? Turns out there's really no good answer to that. If the main thread gets the signal, well, the main thread isn't guaranteed to still exist, so that's probably a bad idea because you might lose signals. If all of them get the signal, that's probably bad because all eight threads, or however many threads you created, would all be trying to do the same thing at the same time. Generally that's inadvisable, so it doesn't do that. What Linux does is just pick one of the eight threads, effectively at random, and deliver the signal to that thread, and then that thread gets to handle it. This makes concurrency hard, because you have no idea which thread will even receive a signal; any thread could get interrupted at any time.
It would have to go run the signal handler and then switch back. Yeah.

[Student: can we designate a thread to not receive a signal?] Yeah, so you can have threads block signals if you want. And if every thread except one blocks the signal, then that one thread would get it. So you could do that if you want. But by default, if you just set up your signal handler and every thread is eligible for that signal, it just randomly picks one, and one thread gets it. That makes your life really challenging, because you don't know which thread will get interrupted at any given time.

So that hybrid approach, like I said, seems like a good idea but comes and goes. If you want the benefits of the hybrid approach, you can use something called a thread pool. Some of you may have heard of this before, some of you may not have. The goal is to get all the benefits of the hybrid approach with none of the drawbacks. A thread pool creates a certain number of kernel threads, up to however many cores you have on your system, and also creates a queue of tasks. This typically only gives you an advantage if you have a lot of really, really short tasks. So, say I want to create 10,000 threads, 100,000 threads, millions of threads: you don't want to use kernel threads for a number that high, because typically each one doesn't have much to do. So instead, I divide my work up into a queue of tasks. Then, say I have eight cores: I create eight kernel threads, and all each kernel thread does is ask for more tasks. If there's a task to do, it goes ahead and executes it, so I can run everything in parallel if there are enough tasks. And if there are no tasks, it just goes to sleep and waits for another task to come in. If another task comes in, it wakes up, executes that task, and repeats.
And then the nice benefit of this is I only ever create eight threads, and I just keep reusing them over and over again. So if you want to make your program go fast, there are thread pools built into Python. Let's look up Python thread pool. Boom, that's what it does: it creates the queue for you, and you just give the threads tasks. Things go fast. Really, really good. Like I said, this course is useful outside of C.

So if you want to implement many-to-one, well, guess what: that's what you're doing for Lab 4. You are going to create a threading library called Wacky User Threads. Why did I name it that? Because it gives me joy to see wut at the beginning of all your calls. I'm easily entertained.

So, this is the state diagram for a process. Well, guess what: the state diagram for a thread, which is what actually executes, looks exactly the same. Initially, you create a new thread through this wut_create library call. That creates a new thread for you and throws it into this ready state, or waiting state, which is the queue of all the threads that want to execute. So creating a new thread doesn't mean you immediately switch to it and start executing it; you just create it and put it on this waiting queue, or ready queue. And again, I will show you how to use a built-in queue so you don't have to implement it yourself.

Now, switching between threads is going to be explicit: these will be cooperative user threads, so the only way to context switch between threads is to do it explicitly. If a thread is running, it needs to call wut_yield, which puts it on the back of the ready queue, so it goes to the back of the line and the next thread in line starts running. Only one thread will be executing at a given time. So if you want to say "go ahead, let something else run," you call yield, this thread goes on the ready or waiting queue, and then the next thread executes.
Concurrency. So we have concurrency, right? We can make progress on multiple things, not at once, but in little small chunks, and if we switch between them fast enough, it looks like parallelism. And sometimes it's just easier to program concurrently.

All right, whoops. So if a thread is running, another thing that might happen is it calls exit, in which case that thread is terminated. It cannot run anymore. It cannot pass go, it cannot collect $200, so on and so forth. And these are all going to be joinable threads by default. One more option, if a thread is running, is that it can block: it can try to join on another thread, which blocks this thread from running, so it won't go on the waiting or ready queue, and it waits for that other thread to terminate. Then, whenever the thread it's waiting on terminates, this thread can go ahead and execute again; it just goes to the back of the ready queue, or waiting queue. And this is, minus a function or so, everything you need to implement.

Yeah, okay. Any questions about this? Lots of fun. Again, it's set up so that, as I think I said before, you will have a week, to encourage you to start early and really understand this: you will create your own test case for this lab, and you'll have a week to do that. It's only worth 5% of the lab, just to get you to at least start thinking about it early. And you don't have to write any code for it beyond using these functions; you can create as many threads as you want. What will happen is you submit that test case, I run the solution on it, and then I tell you the result. So if there's anything tricky you want to ask about, you can get the definitive answer from the solution: whatever you write, I will run the solution on it, and I will tell you, and only you, the result of running the solution on that code.
So you are encouraged to write good test cases: the better the test case you write, the more information you will get out of the solution. You will not be allowed to just, you know, dump the source code of the solution or anything like that. That is not valid; it just checks for numbers.

Any questions about that? No, we're good. Yeah, so you just add the test case to your Git repository; it'll be due a week after the lab is released. It will just be one of the source code files in your repository, nothing special.

[Student: in this lab, if we do concurrency fast enough, can we achieve parallelism?] The goal of this lab is just implementing user threads. It's a bit silly because the threads have to yield to each other, so you always know when the context switch happens. It's just practice implementing what threads actually do, in the simplest way possible, where you can actually figure out how they work.

Yeah, so "blocked" in this context means that a thread can't run anymore. If a thread calls join on another thread, say thread one joins thread two, then thread one can't execute until thread two is done. Not being able to execute is what we call blocked; it's blocked because it's waiting on thread two to terminate. Thread one can't continue executing code past that join until thread two is done, and when thread two terminates, thread one can go ahead and resume. So you're essentially implementing pthread_join.

[Student: does the joining thread go on the ready queue?] No. If thread one joins thread two, you don't put thread one in the ready queue and let it start executing again. And thread two can yield: it's possible thread one waits on thread two, thread two is the next thing to run, so it runs, but it could yield to thread three or something like that. Then thread three maybe terminates, we go back to thread two, thread two creates thread four, thread four terminates, thread two creates thread five, right?
It could do whatever. Yeah. [Student: what happens when a thread yields normally?] When a thread yields normally, it goes back to the waiting queue; the usual thing happens and you just switch to another thread. [And when you join, do both threads yield together?] No. You just wait for the other thread to finish, and when that thread finishes, you're able to run again: you go to the back of the waiting queue.

When you exit a thread, it will just be terminated. In this case it will be like a zombie thread: it will still have some resources, so you can't clean it up until it gets joined. There will be return values from threads; we're going to emulate something like a status. We're not doing detached threads, by the way. No detached. They'll all be joinable, just to make your life a bit easier.

Yeah, you can call yield whenever; yield is just a function call. Calling yield means: if there's another thread to run, run that instead of me. So in this case things are cooperative: if you want to switch between threads, it has to be really explicit. That gets into my next point. The scheduler between them can just be round robin or whatever. Create a queue (I'll tell you how to do that, easy), run the thread at the front, and if it yields, it goes to the back and you do a context switch. Again, I'll show you how to do that.

So these are called cooperative threads. They're a bit useless because if you want to switch, it has to be really explicit; threads have to essentially give up their CPU time voluntarily. They have to be nice. But if we wanted to make them preemptive threads, which look more like real kernel threads, what we could do is set up a signal and tell the kernel to interrupt our process after a given time unit. And then, after that given time unit, that signal handler can call yield.
So then it's automatically switching between threads, and it does this as often as the timer interrupt fires, which would be like a time slice. So you could implement round robin and switch between threads really, really quickly without them having a say about it. The lab is cooperative just to make it a lot easier to debug: if I put in a timer interrupt and you try to debug your program, it's suddenly non-deterministic and you have no idea what it's doing. If it's cooperative, you get to control all the context switches, so you actually have a hope of debugging it. You are welcome.

All right. So here, really quick, is our next complication. The midterm material cutoff is now: the midterm only covers lectures one through this one, which is 17. And this is the next problem we have to solve. I'm not going to tell you how to solve it completely today, because if I told you how to solve it completely today, I could test you on it. So we'll just demonstrate the problem.

So let us see this fun little program I have created. Whoops. No, I have way too many windows open. Yeah, you cannot see what I see. Oh, I have to fix the other one. Okay. So here is a silly program I created, and by a silly program, I mean an awesome program. This creates a new process, and the main thread starts executing the main function. The main thread creates an array of pthread_t's of size num_threads; in this case, num_threads is eight. It has a for loop, so it creates eight threads, and every array index gets its own thread. It creates eight threads that all want to run this run function, otherwise with default arguments, so these are all joinable threads; they're not detached. After creating all eight threads, I have another for loop that joins them all. So I create eight threads and then I wait for them all to finish. Then, after they are all finished, I print off the value of this counter. What is counter? I'm glad you asked.
It is a global variable. I put static in front of it; remember, static here just means this global variable can only be accessed within this source file, and otherwise it doesn't do anything special. So this creates a global variable called counter. Then, in the run function, each thread declares an i and has a for loop that iterates 10,000 times, and each iteration should increase the value of counter by one. Then they're all done. So if counter is initially zero, and each thread increments it 10,000 times, and I have eight threads, what should the value of counter be? Eight threads, each of them does 10,000, so it should be 80,000.

So this program will compute your grade: 80,000 means you guys get 100. So let's compute your grade. You got 55. Do it again: you got 52. It's even worse now. You got 62, a bit better. Let's see. Da-da-da, 79. Oh, that was kind of close. Boom. No bugs, my program works just fine. So this is why this course is hard, because now this is a possibility. Anyone want to guess why it happens? And if you know what the term data race is, don't tell me.

So, essentially, this global variable is just in virtual memory somewhere; the compiler just picks an address for it. And since it's in virtual memory, it's shared between every single thread, so they're all trying to update the same thing. So, if we remember 243, does anyone want to tell me what this line actually does? A move, an add, and a move. Yeah. So we know what loading from memory is, right? And we know what storing is. So that line of code, that ++counter: what does it actually do?

Oh, yeah. [Student: how did I get 80,000 that one time?] That's just luck, literally just luck. Just because it works a few times doesn't mean it actually works, which is why debugging this stuff is close to impossible.
So from now on, you're essentially going to have to reason about your programs and actually know the low-level details. For example, in this case, ++counter does several things. Say we execute it on one thread, and remember registers are local to a thread. The first thing it does is read the value from memory into a register: it loads from whatever the address of counter is, reads the value at that address, and populates the register with whatever value it read, right? The next thing it does is increment that register. And finally, it writes that register back to that address. So that ++counter is actually three steps.

So let's say initially counter is zero. What's going to be the value in this register? Zero. Then we increment it, so the value of the register, hopefully, should be one. I mean, zero plus one is one, at least in this universe, hopefully. Then I write that register to counter, which updates counter to one, right?

So say another thread comes along and does the same thing. Let's say we have thread two, and we switch to it after thread one updates the counter. What value would it read into its register? Remember, that global variable is shared between threads, and thread one updated it to one, so thread two would read one. Now that thread increments its register, so the register is now equal to two, and in the last step it writes that back and updates the global variable. So now counter equals two. No problem, right? Makes sense.
One thread increments, the other thread increments. Well, the bad thing is that we don't have control over when any concurrency happens. So here's what can happen if we're unlucky, which is in fact likely. Let me move this up. Thread one executes first, and let's assume counter is equal to zero. What would it read into its register when it loads that value? Zero. Now we could get unlucky, and the kernel context switches over to thread two. If thread two starts executing, what would it load into its register? Zero. And at this point, we are absolutely screwed. If thread two continues running, its register goes to one, and it updates counter to be equal to one. Then we could context switch back over to thread one. Registers are unique to a thread, so whenever we switch back to it, its register is still zero. It increments that register to one, and it also updates counter to one. So both threads set counter to one: one of the increments is lost. Which is why, guess what, the number this program prints is lower than 80,000, never above it. And the lower the number, the more unlucky we got; we just switched between threads at the wrong moments more often. So that's fun, isn't it?

So how would we fix that? Without knowing anything else, how could I fix this program so that it always prints 80,000? Well, because of time, let's just do this: join each thread right after creating it. Would that work? Yes, that would work; this prints 80,000 every time. Would that be a good thing to do? It would be very pointless. One thread gets created and executes, then another thread gets created and executes. Essentially, I'm making this serial with additional steps, and I'm making my program even slower. So threads are not always...
There's always a trade-off with going as fast as possible: the faster you go, the more you might encounter this problem. The safest thing to do is to be serial, but then things are really, really slow. And if you just add threads on top of a serial program, it becomes even slower, and you will get fired from your job. So that's time. Just remember, I'm pulling for you. We're all in this together.