Alrighty, welcome back to Operating Systems. Today we're going to talk about threads implementation, and how you might implement threads if you were to, cough cough, wink wink, implement them yourself come tomorrow. Let's get into this, because it will be useful. There are a few multi-threading models, so you might ask yourself: where do I actually implement the threads? There are two main options (well, three, but two main ones): I could implement threads as kernel threads or as user threads. What's the difference? Kernel threads the kernel knows about: it knows they are separate execution streams. User threads mean you are creating your own threads; you're faking it. As far as the kernel is concerned, your process is only executing a single thread, and you are concurrently switching between your threads on top of that one thread. User threads live completely in user space, which means they're just part of your code; the kernel doesn't treat your process any differently. So if you do user threads, your process only has the one main thread the kernel creates, and as far as the kernel is concerned there is no notion of any other threads existing in your process. You just create them yourself, and that is what you'll be doing in Lab 4. The alternative is kernel threads, implemented in kernel space. The kernel actually knows about the threads, so you have to do a system call to create them, and after that the kernel manages all the resources for you. It can schedule threads separately because it knows about them, so it could run two threads in parallel. With user threads you can't do anything in parallel, because your process only has a single main thread as far as the kernel is concerned; it will only execute one thing at a time.
If you need thread support, you need something like a thread table, analogous to the process control block and the process table. It could live in kernel space or user space, depending on whether you're implementing kernel threads or user threads. If you're doing user threads (which you will be), you also need your own scheduling on top. For our purposes in Lab 4, you'll just be doing first-in, first-out (first come, first served), the easiest scheduling we could do. You could implement your own scheduling and take your user threads way further, just not for the test cases. In both models a process contains multiple threads; the only question is whether the kernel is aware of them.

With user-level threads and no kernel support, we avoid system calls, so threads are really fast to create and really quick to destroy. There are no context switches at the kernel level; you do all the context switching yourself while your program runs. The big drawback: because the kernel only knows about one thread in your process, if one of your threads blocks, by doing a read, a write, any other blocking system call, or a sleep, then no other thread in your process will execute. As far as the kernel is concerned, that process is now asleep and can't execute anything. Kernel-level threads are slower to create because they require a system call, but the nice thing is the kernel knows about them: if you have eight threads in your process and one of them sleeps or blocks, that doesn't affect the others. The other seven can still execute, and the kernel could even run them in parallel if you had enough resources.

All the threading libraries you use have some part that runs in user mode, and the library is responsible for mapping user threads to kernel threads in whatever fashion it wants. Pure user-space threads are what's called many-to-one, as in how many user threads map to how many kernel threads. Many-to-one means all your user threads map to a single kernel thread: it's implemented purely in user space, and again the kernel is not aware your process has multiple threads, so if that process blocks, it just can't be scheduled anymore. The other scheme, mapping each user thread to one kernel thread, is called one-to-one: each thread in the library actually maps to a kernel thread, the kernel handles everything, and the library is just responsible for some light accounting, assigning numbers to threads and things like that. Pthreads are an example of one-to-one; they really are kernel threads. That's why, if I have eight threads and eight cores on my CPU, it can schedule them all in parallel, all running at the same time. There is another option, many-to-many: map a bunch of user threads in the library onto one or more kernel threads. Generally there are more user-level threads than kernel threads, and you cap the number of kernel threads at what your hardware can actually run in parallel, because anything beyond that can't run in parallel anyway. So, like I said: many-to-one is a pure user-space implementation where the kernel is only aware of a single thread and everything sits on top of that. This is what you'll be doing in lab four. It's fast and portable and doesn't depend on anything; it's just a library. But the drawback is that if one thread blocks, all
the other threads block, because all the operating system cares about is scheduling the one thread it knows about in the process. Also, like I said before, nothing can be done in parallel, because at the end of the day your kernel... is that a gas leak? Okay, it stopped, we are saved. All right, whoo. So in this case we can't execute any threads in parallel, because again the kernel only knows about that single thread; if one blocks, the whole process blocks.

The other one, and I keep harping on this: one-to-one just uses a kernel thread implementation. There's a thin wrapper around some system calls to make them easier to use, which is essentially what the pthread library does. That allows you to exploit the full parallelism of your machine: the kernel can schedule each thread on a different core and run them all in parallel (we learned the difference between concurrency and parallelism last time). But it has to use the slower system call interface, and you lose some control. With a pure user thread library you actually have some say over the scheduling; with the kernel version you only have as much control as we saw with processes, so you might be able to adjust your priority a little, but ultimately you have no control over the schedule. Typically this is the implementation that's actually used, and it's what we'll assume for Linux using pthreads.

Many-to-many is a hybrid approach. The idea: we have more user-level threads than kernel-level threads, so cap the number of expensive-to-create kernel threads at the number we can actually run in parallel on the machine. You get the most out of your CPU and your system with the fewest system calls. However, this really leads to a complicated threading library. It tends to be popular for a while, then die out, then come back, and recently it has come back: if you're using Java, Java virtual threads use this idea. It's really complicated to get right, because any of these user threads could block, and if one blocks, whatever else is mapped to that one kernel thread also blocks. You might get really weird performance; one execution you get lucky, another you don't. Typically it's unpredictable and hard to get right, and people generally avoid it.

[Question] Yeah, the minimum number of system calls. Say I wanted to create a thousand threads. With one-to-one I'd have to do a thousand system calls, one per kernel thread. With many-to-many I might make a thousand user threads, and if I only have eight cores on my machine, I make just eight kernel threads and figure out a mapping: split the thousand up into, what, 125 each, something like that. The drawback: if one of those 125 running on a kernel thread blocks, the other 124 mapped to that kernel thread also block. This comes and goes with time, and apparently it's new again, so if you see Java virtual threads, their new feature, that's what this is. And I would avoid using it.

Now, we haven't even talked about how threads complicate the kernel. For example, if I have multiple kernel threads running, what should happen when I fork? If there are eight threads running and I fork, is the new process an exact copy? Does it also have eight threads, in whatever state they were in when the copy was made? How would that get out of hand really, really quickly? Well, remember that I could essentially accelerate a fork bomb: if I had a thousand threads and each of them did something like a fork bomb, I'd multiply even quicker than before. The rule is actually fairly simple: whenever you call fork in a process with multiple kernel threads, Linux only copies the thread that initiated the fork call. Your new process has a single thread, and that single thread is a copy of the thread that called fork. So if I have eight threads and one of them calls fork, I create a new process with only one thread, a copy of the caller, and the new process continues on with only that single thread. Because of this, if it hits pthread_exit and it's the only thread in the new process, it just exits the process, because it is the only thread available. There's also pthread_atfork, which we will not cover in this course. As you might imagine, if one thread is calling fork, you don't know what the other, say, seven threads are doing, so you might want to get your program into some consistent state so the new process is consistent when it gets forked. pthread_atfork lets you control what happens around fork to make the state consistent. That's too complicated for this course, but you should know it exists, and that it's probably a giant headache.

[Question] Yeah, so if I have eight threads and one of them calls fork, it creates a new process that is a copy of the old process, like we had before, but the new process only has a single thread in it, and it's the thread that called fork. If that doesn't make sense, we'll come up with an example.
We also have to think: if I have a process with eight threads and a signal gets sent to the process, what should happen? Which thread should receive the signal? The main thread? Okay, but what if the main thread doesn't exist anymore? All threads? Well, then each thread tries to do the same thing, so you might do the same thing eight times. The Linux solution to a signal sent to a multithreaded process is to just send it to one of the threads. Which one? It just picks one at random. It will randomly deliver the signal to any old thread, and you don't get control. This makes concurrency hard, because any thread can be interrupted at any given time due to signals, and it's another reason people really do not like signals. So that's fun.

Another thing: you might think many-to-many is actually a good idea, but instead of implementing a whole threading library on top of it, you can use something called a thread pool. Remember, the goal of many-to-many is only to avoid thread creation costs, to avoid the system calls. So if I have eight cores on my machine, I can create a thread pool of eight kernel threads and a queue of tasks, and each thread just grabs a task off the queue and does the work, as long as there are tasks to do. You can run up to eight things in parallel while only ever creating eight kernel threads. As requests come in, you wake up a thread and tell it there's work to do; when there's no work, the threads sleep and don't have to run. You'll see this once you start caring about performance. In fact, you can look up thread pools in Python and use them there. It's a good way to make your programs run in parallel.
So, like I said: in lab four you're going to implement many-to-one, and this state diagram may help you, because your threads are going to follow the same states. The library you're going to create is called WUT, or "wacky user threads." I'm easily amused, so I made a three-letter acronym that amuses me somewhat. Whenever you create a thread, you'll do it through a function called wut_create, which should create a new thread and put it in the waiting (ready) state. Since you're doing user threads, only one thread can be running at any given time, so eventually, in FIFO order, it will be running. We're doing cooperative threads, so they can yield, which means they put themselves on the back of the queue and let another thread run. You'll have an explicit yield call that puts the current thread back into the ready queue and makes sure the next thread runs; it just context switches to the next thread. That will be done through a nice helper library we'll explore in the next lecture, so you don't have to do any of the hard context switching; most of the heavy lifting is done for you. While a thread is running, it may become terminated by calling exit. In that case you would not clean it up; you'd set its status code and so on, and then start running the next thread, if there is one. If there isn't another thread to run, you just exit the process, like we saw with pthreads. You'll also get into situations where threads block, because you're going to implement join. A thread is allowed to join another thread, and whatever thread calls join will block until the thread it's trying to join actually terminates. When that thread terminates, we have to take the blocked thread and put it back into the ready queue.
So this is what you'll be juggling in lab four. It's a vastly simplified version of the old lab that took like 80 hours, so this one should not take that long. If it looks super daunting, please say so: I haven't seen any feedback for the labs, or at least the last lab, so please use the feedback channel if you have any comments. This lab shouldn't be too bad, though.

[Question: can we get another help video for that lab?] That's the next lecture. Next lecture will be a real video: I'll show you how to do all the context switching, and I'll show you a queue. There'll be a queue you can use, so you don't have to implement your own. From last year I updated the documentation and added some more help. There's also an incentive to start early: you get to write a test case, and you have a week to do it, so you get to see how it works, to encourage you to actually try it. You can submit a test case, you'll have a week, and it's worth something like five percent of the lab, a very minimal mark; if you want, forget it. It's mostly so you know what the lab is. You write a test case that does whatever you want, it runs against the solution, and you get to see the proper output. So if you're confused about anything, write a test case for it; I'll run the solution on it and tell you what the output was, and you'll be able to tell exactly what it should do without implementing anything. We'll see examples of it, and I'll update the documentation so it has some code examples too.

All right, so the scheduling could be round robin or first come, first served, in that you just create a queue or a list and it's a fairly simple algorithm: run whatever thread is at the front; if it yields, throw it to the back of the queue. In this lab you have to do the context switch, and it has to save all the registers and everything, but there is a handy-dandy library that I will teach you to use in the next lecture that will do all the heavy lifting for you. The threads you're implementing are cooperative threads, so they have to be nice and yield in order for another thread to run. But if you want to extend your implementation to preemptive threads, it's actually not that difficult: you get something like a timer signal, and whenever that timer interrupt comes into your process, you just call yield for the current thread, so you constantly switch over and over again.

Here is our next complication; then we can ask questions, explore, do whatever. Let's explore this fun little program, because this is when things get real. Here we go. We have a main, so once we start executing we create a single process whose main thread starts executing main. It's going to create an array of pthread_t's of size NUM_THREADS, which in this case is eight, then a for loop creates eight threads and sets them all to execute this run function. After we create them all, we join them all from the main thread (they're all joinable), so the main thread waits for all eight of them to finish, and then I print a counter. What is counter? I'm glad you asked: counter is a global variable. The only thing static does here, if we forget what static means, is make it a global I can only access in this .c file; otherwise static doesn't do anything. All my run function does is loop 10,000 times in each thread, incrementing the counter by one on every iteration. So let's do some quick math: if I have eight
threads and each increments that counter 10,000 times, what should the counter be when I'm done executing? 80,000, right: each thread increments it 10,000 times, I have eight threads, and eight times 10,000 is 80,000. We would all agree it should be 80,000. Cool. All right, so let's compile and run: 73,373. Let's try again: 62,000, yikes. 76-thousand-something. There, our program works, no bugs, right? It's 80,000, that's what I expected... no. Now it works, I fixed it; I just had to run it a few times. If I run it a few more times, you know, I get... oh, 80,000 again. 68,000. Imagine if this program was calculating your grades: oops, sorry, you got a 62 instead of an 80.

Anyone want to try to explain what is happening here? [Guess: it's not waiting for everything to finish.] That's a good guess, but is it waiting for everything to finish? It must be, because unless I screwed up the loop bounds (which I didn't), this loop executes the same number of times as that one: for each thread I create, I join it. [Another answer: they're sharing memory.] Yeah, they're sharing memory. The only thing local to a thread is essentially its stack, so this i is unique to each thread, and each thread really does loop 10,000 times. But counter is a global, so it is shared between all threads. This is why they harped on the difference between the heap and the stack, and all the low-level pointer stuff; it was not an accident, because the details actually matter here. We'll go over this more later, but so you know: if I do ++counter and counter is a global variable, this operation doesn't just happen in one step. Each thread first has to do a memory read from the address of counter and put the value into a register.
Then, the next thing it does to implement ++counter is increment that register, whatever it happens to be. The final step, since it's a global variable, is to write the register's value back to counter's address. So there are actually three steps for that ++counter: load the value from memory into a register, increment the register, and write the updated value back to the global.

What can happen, even with two threads each trying to do this? Don't worry if this is confusing right now; this is the heart of the hardest part of the course, and we will go over it many times. Assume counter is initialized to zero. In the good case: thread one reads counter into a register, so its register is now 0; it increments the register, so the register equals 1; then it writes the register to counter, so the global counter is now 1. Then we switch over to thread two, which reads counter and sees the new value: instead of reading 0, it reads 1. It increments its register to 2 and writes it back, so counter equals 2. Life is good: two threads each increment once, so it goes from 0 to 2. That's how it would normally work.

[Question: do you have any say when it switches between threads?] Did you have any say as to when the kernel context switches between processes? No. Same here: you have no say in whether or when the kernel context switches between threads. That was the ideal case. In the not-ideal case, thread one starts running, reads counter into its register, and then we context switch to thread two, which reads from the same global variable, and it has not been updated, so thread two also reads 0. At this point we are screwed, no matter what happens. If we switch back to thread one, well, the register is local to the thread, so it's already loaded with 0 and won't change across the context switch: it increments 0 to 1 and writes counter = 1. Then we context switch back over to thread two. Its register is 0, so it increments 0 to 1, and it also updates counter, writing 1 again. So both threads updated the counter, but instead of one writing 1 and the other writing 2, if I get unlucky, both write 1, and one just overwrites the other.

This is the crux of our problem, and it is called a data race. You will learn how to fix these. Data races are among the most notorious, hardest bugs to fix, and we'll need several lectures to cover how to defend against them. But does everyone understand that this is fairly bad? It turned out I sometimes got really lucky and hit 80,000, but I'm not always lucky: 60,000 one run, 70,000 another.
Ooh, 50,000; yikes, real bad. Then 80,000: "fixed." So this is the reason why programming is actually hard. All right, any questions about that stuff? Are we fairly good? I didn't break you too badly yet?

Okay, someone tell me how to fix this so it always prints 80,000 but we still create eight threads. Anyone have a solution? [Suggestion: add a sleep.] Sure, let's try a sleep here. If we've learned anything so far, it should be that "I added a sleep and it just works" is not a solution: it might work for some things and not for others. In this case... oh, this is going to be real slow... all right, this is already not satisfying. Okay, that one got lucky. This is very unsatisfying. [Suggestion: mutexes.] We don't know about mutexes yet. [Suggestion: per-thread counters.] Yeah, I could create an array of counters and have each thread update its own, so that I essentially make thread-local storage for each one, and then I'd add them all together at the end to sum everything back up. You actually just came up with map-reduce. Map-reduce is a big-data Google thing you won't learn until like fourth year. That is one thing we could do, but I don't want to write that many lines. [Suggestion: join each thread right after creating it.] All right, let's do that and see: 80,000, seems pretty good. This kind of fixes the issue, but what am I losing by doing it? Right: this essentially just runs everything in serial. I create a thread, it executes, I wait for it to finish, create a new thread, wait for it to finish, create another thread, wait for it to finish. At any given time there is only one thread doing the work. So if I have eight cores on my machine, can I use all eight cores in parallel? No. If you want things to go vroom-vroom fast and run in parallel, we're going to have this issue. This fix gives you the correct outcome you want, but at this point, is it a smart idea?
Probably not. It's probably going to be a lot faster just to have one thread that doesn't create anything else and loops 80,000 times instead. You will encounter this whenever you try to add threads to your applications: you get into a situation where you fix all these problems by essentially running in serial, and suddenly, in your desire to make your program a lot faster, you have made it slower. It's still serial, but now it also creates and manages threads on top of what it was already doing. You can actually make things slower by using threads. Any other questions? That's fun.

All right, we'll be doing threads for a while now, so we can end early. Take a break, have a nice little breather, finish your lab three if you haven't done so already, and be sure to come to the next lecture, because we'll cover how you should start lab four and all the libraries you should be using. Just remember: we're all in this together, and again, this is the hardest thing in the course.