All righty, welcome back to Operating Systems. Today's a light one. So first, lab 4 is coming up, which means hopefully you're almost done with lab 3. Please give me some feedback, because I don't even know if the video I posted about old lab 5 / new lab 3 was useful. If it was useful, you should probably let me know, and then you'll get more of it. I have no idea; no one's given me any feedback so far. Yeah? Very useful. All right, good, so I assumed it was useful. But again, please share anything you like or don't like.

We will do essentially a lab 4 help video next lecture. I'll teach you how to use the tools, because you are going to implement threads, so this lecture is a prelude to that lab. Then we'll get into implementation details to help you along, and you should be able to go do lab 4 fairly quickly. There's some encouragement in lab 4 to get you thinking about it early: you get to submit your own test case, and the test case you submit will be run against the solution, so you'll be able to see what the answer is supposed to be. You get a week to do that, and it's only worth five percent of the total grade, so not much. If you want to skip it, go ahead, but I'd encourage you not to. Just write your own test case: anything that seems tricky at all, write it as a test case, and you get to run it against the solution and see what the actual answer should be.

So, if we're implementing threads, there are a bunch of different models we can use. We essentially have two choices: we can implement either user threads or kernel threads. User threads means they are completely implemented in user space; your process has just one thread that the kernel knows about, and the kernel doesn't treat it any differently. So if one thread blocks, all the others block, because the kernel only knows about one thing. If, on the other hand, you have kernel threads, they're implemented mostly in kernel space, and the kernel can actually schedule them, so it can treat the threads specially. That way, if you have eight threads and one blocks, say it goes to sleep waiting for a file or something of that nature, the other seven threads can go ahead and continue executing. They could even execute in parallel, if the kernel knows about them and you have enough hardware for it.

Thread support requires a thread table. It's similar to a process table; in case I didn't say it before, a process table is just the data structure that contains all of the process control blocks, which hold all the information about the processes. The thread table, which holds all the information about threads specifically, can be in either user space or kernel space, depending on who is keeping track of them. For user threads, which is what you'll be doing in lab 4, you get to keep track of all of them yourself. Thankfully, calling it a table is a bit of a stretch: it's basically just a big array of structures that you use to track information about your threads.
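Just to make that concrete, a user-level thread table entry might look something like this. This is purely an illustration with made-up field names, not the actual lab 4 interface:

```c
/* Hypothetical user-level thread table entry -- illustration only,
 * not the actual lab 4 structures. */
enum thread_state { READY, RUNNING, BLOCKED, TERMINATED };

struct thread_entry {
    int id;                  /* thread id handed back by create */
    enum thread_state state; /* where the thread is in its lifecycle */
    void *stack;             /* base of the stack we allocated for it */
    int exit_status;         /* saved so a later join can read it */
    /* ...plus somewhere to save registers for context switches */
};

/* "Table" is generous -- it is really just a big array of these. */
static struct thread_entry thread_table[128];
```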
The drawback is that with user threads you also need to do your own scheduling between threads, because the kernel does not know about them. So you will have to do your own scheduling in lab 4. Thankfully, you're just going to do first come, first served and throw threads to the back of the queue, so it's a dead simple scheduling algorithm. If you want to extend the lab later for some reason, you could implement whatever scheduler you want on top of it. In both models it's still the same picture, a process containing multiple threads; it's just a question of whether the kernel actually knows about the rest of the threads.

If we do user threads, we get to avoid system calls. For pure user threads, with no kernel support, they're really fast to create and destroy: there's no system call or anything to do, you essentially just create a stack and some space for local variables, and you don't have to do any expensive context switches. The drawback is that because the kernel only knows about the main thread executing your process, if you create eight user threads and one of them blocks, because it needs to go to sleep or something like that, the whole process blocks, and no other threads can execute, because the kernel doesn't actually know about them.

Like I said, on the other hand, if you have kernel-level threads, they're a bit slower to create because we have to do system calls, which are slower than plain function calls. But we get the nice benefit that if one thread blocks, the kernel can go ahead and schedule another one, even if we only have a single CPU core. And the main benefit is that if we actually have more physical CPU cores on our machine, the kernel can schedule the threads in parallel. Things go vroom vroom; things go really fast if we can run them in parallel. And we all remember the difference between parallelism and concurrency? Hopefully: parallel is all at the same time; otherwise we just have concurrency, switching back and forth really, really fast.

All the threading libraries you use have some component that runs in user mode, and that thread library will map your user threads to kernel threads. There are a few ways to set this up. You could have many-to-one, which is the pure user threads approach: you can have as many user threads as you want, and they all map to one kernel thread that the kernel knows about and can schedule. The kernel just sees one process with one thread; that's all it knows about. If you create eight threads, it's simply not aware of them. If you do a one-to-one mapping, that means each of your user threads maps directly to a kernel thread, and the kernel handles everything: the scheduling, allocating a stack, all of that. You just use it; under the hood it has to do some system calls, which might be a bit slower. And then there's many-to-many. Typically you have more user threads than kernel threads, so this maps a bunch of user threads onto each kernel thread, with some division between them. For example, you could have a thousand user threads and eight kernel threads, so you might map a hundred and twenty-five user threads to each kernel thread, or something more reasonable, because a thousand threads is quite a lot.
Many-to-one is the pure user space implementation; this is you in lab 4. The benefits, like I said before, are that it's really fast and it's portable: it doesn't rely on any specific kernel or anything like that, it's just a library, just code like any other code. The drawbacks are that, as before, if one thread blocks, everything blocks, because the whole process blocks and the kernel doesn't know about your other threads. You also cannot execute anything in parallel, because again the kernel only knows about that single thread. You can only switch between your threads concurrently; nothing can run in parallel, because only the kernel gets to control your hardware.

If you have one-to-one, that means we just use the kernel thread implementation; it's a thin wrapper around the system calls to make them easier to use. For example, pthreads, which we saw before, are one-to-one: they map directly to kernel threads, and the library essentially just gives you a number you can use. This lets you exploit the full parallelism of your machine, so even if you have 32 physical cores, the kernel can actually schedule 32 things to run in parallel, if there are 32 things to run. The drawback is that it uses the slower system call interface, and we also lose some control, because in this case the kernel is scheduling between the threads. Just like before, when we created two processes and didn't know which one would run next, it's the same situation for multiple kernel threads: we don't know which thread will actually execute at any given time. Sometimes you might want absolute control, beyond things like setting your niceness. So that could be a plus or a minus, because now you don't have to rely on your own scheduler. This is the pthread implementation, and if we use threads on Linux or macOS or anything like that, we assume this one-to-one implementation.

Many-to-many is a hybrid approach. The idea is that there are more user-level threads than kernel-level threads, and you cap the number of kernel-level threads to the number you can actually run in parallel. If I only have eight cores on my machine, I can't run more than eight things at once, so I may as well just make eight expensive kernel threads and then implement user threads on top of them. Then I get my parallelism, I get the most out of my CPU, and I get that parallelism with the minimal number of system calls used to create the threads. However, this gives you a very complicated threading library, and typically it goes like this: it sounds like a good idea, then someone tries to implement it, and then it's not such a good idea, because it leads to a lot of strange bugs. You're kind of at the mercy of that user-thread-to-kernel-thread mapping. Say you have that same setup, a thousand user threads and eight kernel threads, and you map a hundred and twenty-five user threads to each kernel thread. You're still in the situation where, if one of those threads blocks, the kernel only knows about the one kernel thread, so that kernel thread blocks and the other hundred and twenty-four user threads can't execute. So you have the same blocking situation, and you might not have control over that mapping, or you may just get unlucky.
Sometimes you run your program and it goes really fast, because you got lucky with that mapping. Sometimes you run your program and it goes really, really slow, because you got really unlucky with the mapping. It's typically a giant mess to try to diagnose any performance issues, and at the end of the day most of the performance issues come from the fact that you got unlucky. This is apparently coming back, though. If you've seen Java virtual threads, hopefully you haven't, but that's a new feature they're touting, and it's this approach. It will seem like a good idea, people will say "wow, it's so fast to create threads," they'll start using it, they'll run into lots and lots of issues, and then they'll probably go back to kernel threads. That is my caveat for the day; this idea typically comes and goes. Usually, if you want this hybrid approach, there is an easier way to do it, which we'll talk about later.

Now, threads complicate the kernel. For instance, if we have a process that has eight threads in it and it calls fork, what should happen? One thing that could happen is that it creates a new process that's an exact copy of the original, so if the original has eight threads, the new one also has eight threads. One of those threads would have called fork, but the other seven would have been doing whatever they were doing, so you just get a copy of whatever they were up to at the moment of the fork. That would get out of hand very, very quickly, because you couldn't guarantee anything about those threads, and you can imagine it also makes your process bigger and consumes more resources. If I did a fork bomb and I had a thousand threads, I could fork bomb myself even faster, which would be lots of fun. So that would get really out of hand, really fast.

The rule is: if you have a process with multiple threads and one of those threads calls fork, the kernel will create a new process for you, and that new process will only have a single thread in it, which is a copy of whichever thread called fork. The other threads will not exist. So if you have eight threads and one of them calls fork, you get a new process that is an exact copy of the original, all the virtual memory is the same, but this new process has a single thread, which is a clone of whoever called fork, and it looks like it resumes from that fork call. Yep. So what if main calls fork? The main thread is just a thread like any other thread, so you get a new process, and it has one thread that is a clone of the main thread that called fork. And yes, if you want multiple threads after the fork, you have to create them again after you fork. Even if you have eight threads and then you call fork, the new process just has one, so you'd have to recreate them if you want them again. So if you have, say, threads one, two, and three all living in a process, and thread two forks, you create a new process, all the memory looks the same, but this new process is only running a single thread that looks exactly like thread two. It resumes from that fork call, and no other threads exist. So that makes life fun, right?
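Here's a small sketch of that rule in pthreads terms (my own illustration, not code from the slides): the parent has an extra thread running, the main thread calls fork, and the child comes up with only the thread that called fork.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void *worker(void *arg) {
    (void)arg;
    sleep(10);               /* pretend to be busy doing something else */
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);  /* parent now has 2 threads */

    pid_t pid = fork();      /* called from the main thread */
    if (pid == 0) {
        /* Child: only the thread that called fork exists here.
         * There is no copy of "worker" running in the child. */
        printf("child pid %d has exactly one thread\n", getpid());
        exit(0);
    }
    waitpid(pid, NULL, 0);
    pthread_join(t, NULL);
    return 0;
}
```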
That's what Linux does. The same rules as before apply to that new process: it only has a single thread, so if that thread calls pthread_exit, the process exits and no longer exists.

Here's something that's not covered in this course: you can imagine that if I fork in one thread while seven other threads are doing stuff, I have no idea what state they're in at the moment of the fork, so you might end up with inconsistent memory or something like that. If you wanted to make sure things are consistent, there is a pthread_atfork function. You register a function with it, and it's guaranteed to run that function to completion before the fork happens, so you can ensure your memory is in some kind of consistent state. That's just in case you ever need it; I have never had to use it in my life, because forking from something that has multiple threads is generally a bad idea. If I'm already using threads for something, I don't really need to fork, or if I do need to fork, it's because I'm creating a new subprocess or something.

The other fun thing: if I have a process with eight threads and a signal gets sent to it, what happens? Should one thread receive the signal? Should all threads receive it? Should a specific thread, like the main thread, receive it? Well, it can't really be the main thread, because the main thread might not exist anymore, although it could. It would also be weird if every thread got the signal, because then, with say eight threads, all eight would be trying to do the same thing and might be fighting with each other. What Linux does to get rid of this ambiguity is make it more ambiguous: it picks one, and only one, thread at random, and that thread handles the signal. This makes concurrency really, really hard, because that signal handler is an example of concurrency: it will stop that thread from executing, switch to your signal handler, run it, and then switch back to whatever the thread was doing. And you have no idea which thread is even going to get the signal, so you have to write your program such that any thread could receive the signal and handle it safely. That's generally why people hate signals, and why we already looked at signals on their own. I will not make you do signals with multiple threads, because it is a giant pain; generally people just avoid it completely.
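If you're curious, one common way people sidestep the "a random thread gets the signal" problem, and you won't need this for the labs, is to block the signal in every thread and dedicate one thread to waiting for it with sigwait. A rough sketch:

```c
#include <pthread.h>
#include <signal.h>
#include <stdio.h>

static void *signal_thread(void *arg) {
    sigset_t *set = arg;
    int sig;
    for (;;) {
        sigwait(set, &sig);            /* only this thread ever sees SIGINT */
        printf("got signal %d\n", sig);
    }
    return NULL;
}

int main(void) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    /* Block SIGINT before creating any threads; new threads inherit the
     * mask, so the kernel never picks one of them at random to handle it. */
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    pthread_t t;
    pthread_create(&t, NULL, signal_thread, &set);
    /* ... create worker threads and do the real work here ... */
    pthread_join(t, NULL);
    return 0;
}
```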
So here's what you can do instead of that many-to-many, hybrid approach: use something called a thread pool. The goal of many-to-many was just to avoid thread creation cost. Instead of depending on the library to do it, you use plain kernel threads and create a thread pool: a certain number of threads, equal to the number of cores on your machine, plus a queue of tasks. Say I have eight cores: I create eight kernel threads and a queue of tasks, and I never destroy any of the threads. In the run function for every thread, it tries to get a request from the queue, processes that request, and keeps reading from the queue while there's work to do. If a thread becomes idle because there's no task in the queue, it goes to sleep; when I add a new task and a thread is sleeping, I can wake it up and have it do some work, and when there's no work it just goes back to sleep. So I only create my threads once, and I just leave them in a loop accepting tasks, waiting for a new task to come in. That way it's more explicit, and it has the same benefits as the hybrid approach. You can even do this in Python: Python has a thread pool, and now you know what a thread pool is. So if you want things to run in parallel really fast and you don't want to create lots and lots of threads, for example if your tasks are really small and you have thousands, or millions, or billions of them, just create a queue of the tasks and a thread pool to handle it, and your program will go really, really fast.

Now, here's what you will implement in lab 4. You're going to implement many-to-one, so they're all user-level threads, all sharing the same kernel thread. Here's a state diagram for a process; it also describes the state diagram for a thread. You will create a new thread using a function that we'll name wut_create. Why "wut"? Because I thought it was entertaining; it stands for wacky user threads, and I don't know, it brings me joy. I'm easily entertained; I thought it was a funny name. So wut_create should create a new thread, and we'll throw it on this waiting, or ready, queue. When you create a new thread, it does not initially start executing; remember, you are just sharing that same kernel thread, so it goes on a ready queue. That's one of the things you'll need to implement, and by "implement" I mean I will show you how to use a library queue, so you don't have to implement your own linked list. I'll show that to you next lecture; you can just use an off-the-shelf one.

While threads are sitting in the ready state, there will be one thread running, and in this lab these are what we'll call cooperative threads, so any context switch between threads is explicit. Whatever thread is currently running can call wut_yield, which essentially puts it on the back of the ready queue and lets the next thread in line run; it's just a FIFO queue. So if a thread calls wut_yield, it puts itself on the ready queue, the next thread gets put in the running state and starts executing, and that's a context switch. Again, that's what we'll go over next lecture. You will not have to do the context switching yourself; there's a nice helper library that does all the hard parts for you, like setting and saving all the registers. I'm not going to make you individually save every register, because that would just be you writing lots of lines for no educational purpose; you should just know that, hey, it saves the registers for you.
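About that off-the-shelf queue: just to give you a rough picture ahead of time, here's what a FIFO ready queue could look like using the BSD sys/queue.h macros. I'm assuming that style of queue purely for illustration; it may not be exactly what the lab hands you:

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Illustration only -- not necessarily the lab's types or names. */
struct ready_node {
    int thread_id;
    TAILQ_ENTRY(ready_node) entries;     /* links used by the queue macros */
};

TAILQ_HEAD(ready_queue, ready_node);
static struct ready_queue ready = TAILQ_HEAD_INITIALIZER(ready);

/* Yield: the current thread goes to the back of the line... */
static void enqueue_ready(int id) {
    struct ready_node *n = malloc(sizeof *n);
    n->thread_id = id;
    TAILQ_INSERT_TAIL(&ready, n, entries);
}

/* ...and whoever is at the front runs next (first come, first served). */
static int dequeue_ready(void) {
    struct ready_node *n = TAILQ_FIRST(&ready);
    if (n == NULL) return -1;            /* nothing else to run */
    int id = n->thread_id;
    TAILQ_REMOVE(&ready, n, entries);
    free(n);
    return id;
}
```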
That context switch will all be done through those helper functions, and we'll go over them next lecture; they do the context switch for you. Eventually, while some thread is running, there will be an exit function, and that should terminate the thread, so that thread can no longer run. If there are other threads to run, it should go ahead and execute the next thread from the ready queue, and then some other thread has to essentially acknowledge the exit to clean it up. We're not going to have any detached threads in this lab; all our threads are going to be joinable. So a thread exits, and if it's the last thread, it should exit the process; otherwise, some other running thread may join it. Joining is a blocking call, which means whoever called wut_join is waiting on some thread to terminate and can no longer execute. It gets put in this blocked state; you don't have to keep an explicit queue for it, though you can if you want. It cannot execute, because it's waiting for another thread to terminate. Eventually, when the thread it's waiting for terminates, by doing something like exiting, it gets put back on the ready queue, because whatever it was waiting on, or joining on, has now terminated. Any questions about this? This is mostly here to help you and give you guidance, because I haven't released the lab 4 rotation yet, and again, please provide feedback. I'm adding more stuff to lab 4, because I'm getting the sense that it should have more, but I don't know, you need to say something; there's a whole feedback channel, so please let me know. This one, together with the next lecture, should be pretty good to get you started. I even ordered the test cases in a way that makes sense, so you can implement a little bit, make some progress, feel good about ourselves, and carry on.

All right, so the scheduler for this. It could be round robin, it could be whatever. We're going to use something that's kind of like round robin, except we're not going to kick threads out; they have to re-queue themselves explicitly. You're just going to create a queue, or a list; again, you don't have to build this yourself, I will show you how to use an off-the-shelf one. The rules are simple: it's first in, first out. Whatever is at the front of the queue is the next thread to run, and whenever a thread yields, it gets thrown to the back. Obviously, there are some caveats here. If it is the only thread and it yields, it should just continue running, because there's nothing else to run. If you want, you can implement that as going to the back and then the back is also the front, so it executes itself again; you can do whatever you want there. You'll also have to do the context switching, saving the registers; again, I will show you how to do that next lecture.

These are going to be cooperative threads, so they have to be nice. A simple thing you could do to make this more realistic, more like the actual kernel, is to make it preemptive, and that's actually not that difficult. You can set up a timer interrupt to come in through a signal, and have that call yield, so the threads don't get to control when they call yield. Every so often a timer signal fires and you call yield for them, so it automatically switches which thread is executing without the threads having any say in the matter.
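To give you the flavour of that, here's a rough sketch using a POSIX interval timer. This is not part of lab 4, the wut_yield name is just the lab function we've been talking about, and real code would have to be careful about what's safe to call from inside a signal handler:

```c
#include <signal.h>
#include <sys/time.h>

/* Assume this is the cooperative yield from the threading library. */
extern void wut_yield(void);

static void on_tick(int sig) {
    (void)sig;
    wut_yield();   /* force a switch; the running thread gets no say */
}

static void start_preemption(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_tick;
    sigaction(SIGALRM, &sa, NULL);       /* deliver the timer as SIGALRM */

    struct itimerval it = {0};
    it.it_interval.tv_usec = 10000;      /* tick every 10 ms ... */
    it.it_value.tv_usec = 10000;         /* ... starting 10 ms from now */
    setitimer(ITIMER_REAL, &it, NULL);
}
```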
So, here is our next complication. At this point, what lecture are we at? 17. This is officially the end of anything midterm-testable, so from this point on, the material will not be on the midterm; the midterm only covers lectures 1 to 17. This is going to be our next problem with threads. We will figure out how to solve it, but you do not need to know how to solve it for the midterm, which is still about a month away anyway, because reading week falls at a very weird time.

Let's go through this problem. At this point you should be able to create threads and reason about their execution, at least normal execution where they're otherwise independent. This case is a bit different. If we execute this program, we have a main thread that starts executing main. It creates an array of pthread_t values of size NUM_THREADS, in this case eight, and then it does a for loop: for every index, it creates a new thread, so this should create eight threads, each of which wants to run this run function, which I will show you in a bit. After creating all eight threads, the main thread joins all eight of them, so it blocks and waits for all eight to finish running. After that, it prints this counter variable. What is this counter variable? I'm glad you asked. The counter is a global variable that you can only access in this .c file, and that is what static means. Other than that, it is just a global variable that I initialize to zero. Now, in my run function, every thread loops 10,000 times and increments that counter by one each iteration. So my question to you: after all eight threads are done running, what should the value of this counter be? I have eight threads, and each of them increments the counter 10,000 times. Around 80,000? Yeah, it should be 80,000.

So this is my program that's going to compute your marks really, really quickly: if you get 80,000, you get 100. Let's see what marks we get today. 70-something; not bad, 78. Run it a few more times: 80,000, no bugs, no problems. Run again: 80,000. Ooh. Ouch, 40,000. Almost failed on that one. So this is different every time. Anyone have any guesses as to why? Because this is the hardest part of the course. They probably didn't cover this in other courses, but does anyone know what that increment actually does, given that counter is a global variable? Where does this global variable even live? Below the stack; yeah, somewhere below the stack, at some virtual address that the compiler decides the variable lives at. Because of that, it's shared between every single thread: it just lives at some virtual address, and they're all modifying it.
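For reference, the program on the slide is roughly this; it's my reconstruction, so the details may not match the slide exactly:

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 8
#define NUM_LOOPS   10000

static int counter = 0;              /* shared by every thread */

static void *run(void *arg) {
    (void)arg;
    for (int i = 0; i < NUM_LOOPS; ++i) {
        ++counter;                   /* the problematic line */
    }
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; ++i) {
        pthread_create(&threads[i], NULL, run, NULL);
    }
    for (int i = 0; i < NUM_THREADS; ++i) {
        pthread_join(threads[i], NULL);
    }
    printf("counter = %d\n", counter);   /* "should" be 80000 */
    return 0;
}
```

Compile it with something like gcc -pthread, run it a few times, and you should see the same kind of scatter as in the demo.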
So, ++counter: that one line of code actually does three things, and we'll go over this a few times, so don't worry if it doesn't completely click yet; it should mostly make sense, and we'll see it again. If counter is a global variable, it lives at some virtual address, and in a thread that line actually does three things. Let's say we're thread one. What your computer actually does first is read that value from memory and put it into a register, and I'll just call it "register": it reads the value at the address of that global variable. If that address is, I don't know, 8000, something like that, it reads the memory at address 8000, and whatever that value is, it sticks it in a register, which is specific to the thread. The next operation is to increment that register; it just adds one to it. That's the CPU instruction you probably did in, is it 243 or 244? 243, okay, so 243; that's what it would do. The last thing it does is store the register at whatever the address of the counter is, so it writes that value out to that memory location.

So if, initially, the value of the counter is zero, what would this thread load into its register? Zero, right; counter is zero, so it reads zero, and in this case the register is equal to zero. If I increment that register, what is its value now? One, right; it just adds one to the register. And then it stores the value of the register at that memory location, which means that counter is now equal to one. Which is fine. Any questions about that? That's what happens at a low level.
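Written out in C, that's the run function from the sketch above with the increment split into its three separate steps (a sketch of the idea, not literal compiler output):

```c
static void *run(void *arg) {
    (void)arg;
    for (int i = 0; i < NUM_LOOPS; ++i) {
        /* ++counter is effectively these three steps, and a context
         * switch can land between any two of them: */
        int reg = counter;   /* 1. load counter's value into a register */
        reg = reg + 1;       /* 2. increment the register (private to this thread) */
        counter = reg;       /* 3. store the register back to counter's address */
    }
    return NULL;
}
```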
So the problem we're encountering is that we have multiple threads. Say we have another thread, thread two, and I'll get rid of these. If thread one did all of this, it updated the value of the counter from zero to one, and then we context switch over to thread two; this is concurrency. What would thread two load into its register? One, right: thread one updated the value to one, it's a global, so it's visible to the other thread. So when the other thread goes ahead and reads that memory location, it reads the value one into its register, and then it increments it. What should the value of its register be now? Two, hopefully. And then it stores it, so it updates that counter variable to two. Yep, right.

All right, so the bad thing that can happen is, remember, we don't control when context switches happen, and they can happen in very bad orders. It's a race. Instead of thread one running those three instructions and then thread two running those three instructions, we might get into a situation where thread one only loads the variable, and then we context switch over to thread two, and thread two starts executing. In this case, again, if counter is initially zero, what would thread one read into its register? Zero, right, so it reads zero into its register. Then we context switch over to thread two. What does thread two read into its register? Zero again, so it also reads zero. And from this point on, it doesn't really matter what executes in what order; we're kind of screwed. Thread two might increment its register, so its register is now one, and then it stores that to the address of the counter, so counter gets updated to one. Now we context switch back to thread one. It increments its register; remember, registers and the stack are private to a thread, so thread two executing has no effect on thread one. Its register was zero, and when we go back to it, it's still zero, so its register value is now one, and then it stores that value at the address of the counter and updates it. So now counter equals one. Both threads tried to increment it, but instead of them running sequentially and the value being two, it's one, because they both write one to it, and we lose some of our count.

That's why, in that example, whenever I run it, the result basically just tells me how unlucky I got: the number will always be at most 80,000, because I'm essentially missing some of the counts. Sometimes I get really lucky; in fact, I got it right twice in a row, so I should buy a lottery ticket today. Other times when I run it, not so lucky: 76 thousand, 65 thousand, 80,000, "my program works." So yeah, this is where it gets really hard, because you have no control over that. You have to write it in such a way that it always works; otherwise you get bug reports, and if you thought debugging was hard before, this is a lot harder, because it's not the same every single time.

So, with this, do we have any ideas for how to fix it? Can you make this print 80,000 every single time? Yeah, there are ways to manage this; we'll get into them later, and doing that is the hard part of the course. But without knowing anything new, can I make this print 80,000 all the time? You're on the right track with what we'll see. Close. Essentially, skipping some steps, you just want something like this, right? You might just be less likely to hit the problem, because there are only eight threads fighting over one update. In fact, let's see; it'll probably complain at me. 80,000, 80,000. Yeah, it kind of works, but it still has the same fundamental problem. It's the same situation as before, except instead of incrementing 10,000 times, I just add 10,000 to the counter once. I'm still updating the shared counter; it's just that the update only happens eight times, so a bad interleaving is really unlikely, but it could still happen. If I get unlucky, I'll be off by a factor of 10,000: I'll see 80,000, or 70,000, or 60,000, or 50,000, depending on whether one of those writes gets missed. This makes it less likely to happen, which actually makes it harder for you to debug, because one out of a million users will just say "yeah, it broke," and you'll be thinking, how do I recreate that? I have no idea.

All right, any other ideas for how to make this always print 80,000? Yeah: one idea is that I could have each thread count on its own, join them all and get a return value from each, and then the main thread can go ahead and add them all up.
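A rough sketch of that return-value idea, again using the pthread version of the program: each thread counts in a local variable, hands its count back through its return value, and only main ever touches the total.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_THREADS 8
#define NUM_LOOPS   10000

static void *run(void *arg) {
    (void)arg;
    intptr_t local = 0;
    for (int i = 0; i < NUM_LOOPS; ++i) {
        ++local;                     /* private to this thread: no race */
    }
    return (void *)local;            /* hand the count back through join */
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; ++i) {
        pthread_create(&threads[i], NULL, run, NULL);
    }
    intptr_t total = 0;
    for (int i = 0; i < NUM_THREADS; ++i) {
        void *ret;
        pthread_join(threads[i], &ret);
        total += (intptr_t)ret;      /* only main ever adds to the total */
    }
    printf("counter = %ld\n", (long)total);   /* always 80000 */
    return 0;
}
```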
Instead of that, I could also keep it local another way: I could do something like this, create an array with one element per thread, and have each thread only change its own index. There's no problem with that, because it's just one thread updating each element, and then the main thread can go ahead and sum them all up at the end. And hey, guess what: you just invented something that's essentially MapReduce. You invented Google's big thing; that's high-performance computing, big data. You did the big data thing, so you don't need to take the big data course anymore; you just came up with the answer yourself. That's how you make things run really, really fast: do each piece individually and then add them all up at the end, really quickly, and then it pretty much runs as fast as it can.

All right, any other way, without changing any other code? What if I did something silly like this? Would that always be 80,000? It should be, right? So if I go ahead and run it: hey, it's 80,000 enough times that, well, I'm not super confident, but in this case you have to argue about why it's right. With these types of programs, you always have to argue about it. This should always be 80,000, because I create one thread, it modifies the counter, it's the only thing running, and then I wait for it to finish. Then I create another thread, let it run, let it finish, create another thread, and so on. Does that sound like a good idea? Well, only one thing is running at any particular time, so I might as well have just written two for loops instead, and that probably would have been faster than this version, because this version does the serial work plus creating threads, so it just does extra work.

After this course, I guarantee, especially your employers will say "please make this program faster," and you'll say "I know what to do, I will use threads," and then you'll have some problems with it, and eventually, to make it safe, you essentially make the whole thing serial, and guess what: you have made it slower and you have lost money. That's something you always have to be wary of: threads are not a one-size-fits-all "boom, program goes faster." You have to make sure it's safe, and there's a trade-off: the less safe you make it, the harder it is to get right. Serial versions are easy to get right, because this isn't a problem there; the real work is trying to make the parallel version safe, which is the hard part of the course. We'll figure out how to do that later; we'll have about a week on it, it'll be great, there'll be a lab on it, and we'll make things go vroom vroom, much faster. All right, any other questions or anything like that?
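And for the record, that "silly" serialized version is basically just the creation loop from the earlier sketch with the join moved inside it. It always prints 80,000, but only one thread ever exists at a time, so you pay for thread creation and get none of the parallelism:

```c
/* Correct, but serial: only one thread exists at a time. */
for (int i = 0; i < NUM_THREADS; ++i) {
    pthread_create(&threads[i], NULL, run, NULL);
    pthread_join(threads[i], NULL);   /* wait before creating the next one */
}
```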
All right, cool. Let's hit the summary slide and skedaddle. Both processes and kernel threads enable parallelization. Each process can have multiple kernel threads or multiple user threads. Most implementations are either one-to-one, where every user thread is a kernel thread, or many-to-many; pthreads are just one-to-one. The operating system has to decide what happens with forking and signals when a process has multiple threads. And now we have what are called synchronization issues, which is the problem I outlined and which you will figure out how to fix; we'll come back to that after I show you the stuff you need for lab 4.

So, next lecture: lab 4. You should probably go to that one if you want to start lab 4, because you should start lab 4. Then we'll go back to synchronization issues, which will be the hardest part of the course, and which thankfully will not be on the midterm. Then we'll ride the rest of the course to the end, and it will be lots of fun. Just remember, I'm pulling for you; we're in this together.