Thank you for joining me today. Today we start our journey talking about threads, the actual hardest topic of the course, so let's just dive into it. First, as I said at the beginning of the course, concurrency and parallelism usually mean the same thing in English, like they're synonyms. Not so in computing. So this is our first aside: concurrency and parallelism aren't the same if we're talking about computing.

Concurrency is just switching between two or more things: you can get interrupted, you can get preempted, you can make progress on one thing for a bit, then switch to another and make progress on that. The goal with concurrency is just to make progress on multiple things. You can have concurrency with just a single CPU core by switching back and forth really fast. Parallelism is actually running two or more things at the same instant. To do that they have to be independent, and the goal there is just to run as fast as possible.

As a note here, if you are not observing that fast, concurrency can kind of look like parallelism. Think of someone at a table with a glass of red drink and a glass of blue drink in front of them. If you closed your eyes for 10 seconds and then both drinks were halfway done, well, you don't know what they did. They could have downed both of them at the same time, or they could have switched back and forth, and you don't even know how many times. So depending on the period over which you're observing, concurrency may look like parallelism even though it's not the same. That's how, back in the day, we could create the illusion of parallelism even though we only had a single CPU core.

All right, so a real-life analogy to get the idea of concurrency and parallelism. Assume you're sitting at a table for dinner and you have four things you can do.
You can eat, drink, talk, and gesture with your hands. We're assuming we're slightly polite people, but there's this weird caveat: you're so hungry that if you start eating, you won't stop until you're finished. With this, for any two tasks, we can argue: can I do those two tasks in parallel? Can I do them concurrently? Can I do one but not the other, or maybe the other way around?

So let's say drinking and talking, and you know, we're assuming we're not savages. I can't do them in parallel, right, because I'm just using my mouth: I can either talk or drink. I can switch back and forth between them, so I can do them concurrently, but I can't do them in parallel. Again, polite dinner things. Then there are pairs like eating and drinking: once I start eating I can't switch back and forth, so I can't do them concurrently, and I can't do them in parallel either. What about gesturing and talking? Well, I can do that right now. I can do them in parallel, and I can do them concurrently if I want: I can switch back and forth between them, I could stop doing one if I wanted to, or I could just do them both at the same time. And then gesturing and eating: I can do those in parallel, but I can't do them concurrently because of that stupid caveat where I'm so hungry I can't stop until I'm done.

So that gives all the situations. Any questions about that? That's just to try and relate it to a more real-life situation: concurrency and parallelism are not the same. So that's there just so you have it, just in case.

Now we get to dive into threads. Threads are like processes with shared memory, which makes things very, very difficult for you. Same principle as processes, except by default they share memory. They have their own registers, including the program counter, and their own stack; those are the only things that are independent between threads. Otherwise, they share memory within that process.
So they have the same address space as the process, so any change you make appears in all threads, which makes things complicated but also makes things fast. If you want memory specific to a thread, you have to explicitly state it. That's something called thread-local storage, or TLS. Or you can just malloc memory yourself and make sure only one thread accesses it. So there are some options if you do want some independence.

The relationship between processes and threads is that one process can have multiple threads within it. By default a process just executes code in its own address space, in other words its own virtual memory. Threads allow multiple executions in that same address space, so they all have the same virtual memory. They're lighter weight and less expensive to create than processes, and they share code, file descriptors, and everything like that. When we created a process, it had to have its own address space; we now know a bit more from working on Lab 3 that it would have to get its own page tables and everything like that. A new process would also have to get a new copy of all the open files and what they point to, on top of the virtual registers and everything like that. Threads just need a stack and a new set of virtual registers, and that's it. Everything else is in the process's address space, so a thread uses the same page tables, the same open files, everything like that.

Like I said, even assuming we have a single CPU core, threads let us express concurrency, so even if you have a super, super low-power device they might be worthwhile. A process can appear like it's executing in multiple locations at once; however, the operating system is just context switching within the process really, really quickly, and that is accomplished through threads. So it might be easier to program
concurrently: in case something blocks, you can still make progress on something else. For instance, what web servers typically look like is they'll just have a while (true), and they'll sit there and block on a request from the user, waiting for someone to connect. Once someone connects, that call returns, it unblocks, and they create a thread to process the request. So it could do hundreds of requests concurrently, maybe in parallel, but at least concurrently. Then the main thread immediately goes back to waiting for the next request, so it can react to it as soon as it comes in, make progress on it, and go back and forth. We'll see how to do network IPC in the next lecture, but this is the basis for every web server; they kind of all look like this.

So threads are lighter weight than processes. Processes have completely independent address spaces, so independent code, data, and heap between processes. Threads in the same process all share code, data, and heap, because, well, they live within a process. A process is a completely independent execution, and threads must live within an executing process: if we don't have a process, we don't have any threads running. Both have their own stack and registers. You might find it easier to think of it this way: a process by default has a single thread in it, and a thread is what does the actual execution.

For processes, it's expensive to create them and to context switch between them, mostly to do with virtual memory. Creating one has to create copies of all the page tables, and now we know that when you context switch it has to flush the TLB and all that stuff, which is fairly expensive, especially because you want to keep your caches as valid as possible. But threads are fairly cheap to create: just a stack, some registers,
and that's it. They're easy to context switch to: if you're within the same process you don't have to change address spaces or anything like that, the TLB is still valid, all of that fun stuff. When a process exits and someone waits on it, it's completely removed, everything to do with it: all of its memory, all of its open files, everything like that. If a thread ends, the only thing removed is the resources it consumed, which is just a stack, maybe some virtual registers, and some other bookkeeping information. And when a process dies or terminates, all the threads within it die as well.

In this course we'll be using POSIX threads. For Windows there's a Win32 thread API, but we're going to use Unix threads, so this is available on macOS, Linux, FreeBSD, all those fun things. All you have to do to use it is include pthread.h, and when you compile and link you pass -pthread, which sets up what the compiler needs and does the linking with the library. All the pthread functions have documentation in the man pages, but they work a bit differently than our C wrappers for system calls.

Unlike fork, where after fork returns two processes are active at the same time and you don't know which is going to execute, with threads it's a bit more clear cut because you kind of direct a thread. You create threads with pthread_create, probably a more obvious name than fork. It takes a pointer to a pthread_t, whatever that type is, and it will go ahead and initialize that for you, which is why it's a pointer argument. There are some attributes you can set if you want. Then there's a start routine. It's C, so it's kind of hard to read what that means; if you don't know, that is a pointer to a function, called start_routine, that takes a void * as input and returns a void *. So it takes a pointer as input and
returns a pointer. The final argument to pthread_create is just a pointer argument, and that gets passed to the start routine whenever the thread starts executing. After pthread_create successfully returns, you have another thread, and you don't know whether you're going to continue executing on the main thread or the new thread is going to execute. But whenever the new thread executes, it starts execution at the start routine, and we can give it a value to pass to that function.

The pthread functions, pthread_create included, return zero if they're successful, not negative one, unlike the C wrappers. Otherwise they directly return an error number if something bad has happened. And if something bad happens in pthread_create, the contents of the thread are undefined.

So here is what that looks like. Here I declare a pthread_t, I'll call it thread, and then I call pthread_create. I give it the address of thread, I pass NULL for the attributes so I just take all the defaults, and I tell it I want to start running this function called run. Here's the run function: it takes a pointer, which I don't use, and I've marked it as unused so the compiler leaves me alone, and it returns a pointer; eventually I just return NULL. The value of the argument I give it is just NULL because I don't care about it. After pthread_create, the main thread might do its printf of "In main", or, if the kernel decides to schedule the other thread, it would print "In run" and I would see that printed, and when it returns NULL it's done.

So if this was a process, as review, am I missing anything? For processes, we created them with fork, and then what's the question we always ask ourselves once we create a new process? Yeah? Not quite. Something we need to do? Yeah: check the exit status after it dies, or, more
high level, you need to be a non-neglectful parent, right? In this case everything's living within a process, and I don't know when the other thread is actually done executing. So maybe there's an equivalent of waiting on a process, but the thread version of it. Because if I try and run this, something interesting happens: it'll just print "In main" and not "In run", even though there's another thread that could actually run. If I execute it a few times, let's see how lucky I get. Oh, hell yeah. All right, so this weird thing's happening again: I saw "In run" twice. That happens because there's some parallelism problem with printf, so that's real fun. I can explain that, but it's actually two printf calls, and it's something to do with the buffer being weird and not flushing properly. If I run it a bunch of times, all right, I think I got unlucky, but I also wasn't a good parent to it: I didn't wait on it or anything like that.

So there is a wait equivalent for threads, and they call it join. Why do they call it join? I don't know. Why can't they be consistent, so you wait on a process or wait on a thread? Again, don't ask me. So the wait equivalent for threads is pthread_join. You tell it what thread you want to wait on, and you give it a pointer to a pointer for the return value: it will update that with the pointer value the function returned. That's basically how you get its exit status. With processes, main returned an int, so we passed a pointer to an int to get the value back out once we did wait. Same thing for threads, except they return a pointer, so to get the value we need a pointer to a pointer, because that's just how C works. Same thing here: it returns zero on success and an error number otherwise, and we should only call this one time per thread. If you call it
multiple times on the same thread, that doesn't give you an error like with the wait system call; it's just undefined behavior and you are now on your own. So if I want to see both lines print, all I have to do is call pthread_join, tell it the thread, and I don't care about the return value. Now if I run this I'll see "In main" and "In run" every single time, because now the main thread, after it prints "In main", if it gets scheduled first, will join on the thread I created, so it waits for it to finish before it actually returns zero, which just exits the process.

Again, if I did not have this join, what was happening before was that the main thread created a new thread, which was going to execute run if it got scheduled. But the main thread kept running, printed "In main", and hit return 0. Returning zero from main is the same as calling exit, which ends the process, and because it ended the process, all the threads within the process are also gone. So the thread that was about to print "In run" just never got to run before the process died. Now that I have a join, if the main thread executes first, it'll wait for the other one to finish before it hits return 0 and exits the process. So we'll see both things printed every time.

Questions about that? Yeah. So this pthread_t object is basically just record keeping for the thread. It's kind of like a process control block but for a thread, so you can think of it as a thread control block. It could have the address of the stack that thread uses and a place to store all of its virtual registers and things like that, if it was implemented that way; maybe that's in the kernel, maybe it's just keeping track of the stack. It just represents the thread. You don't really have to worry about it: it's implementation details you don't have to concern yourself with, because it'll change depending on what kind of thread it is, which we'll
get into later, and you'll also implement this in Lab 4. The thread lab is probably more fun than the virtual memory one, and you'll get to learn about this, but there are helper functions so you don't have to deal with registers and all that fun stuff.

All right, any other questions? Yeah. So the question is why it always prints "In main" before "In run". That just has to do with the scheduler. Creating a thread is fast, and we talked about the Completely Fair Scheduler and how things kind of get a time slice, so it's very likely that the main thread still has time remaining and just continues executing until the process is done. But if I change the scheduler, this might not be true anymore.

All right, any other questions before we break it even more? So the main thing here is that returning zero from main is the same as exit for the process, and then all the threads get removed. So here's the code: all we did was add a join, which is like a wait, to make sure we're good parents, even though there's no parent-child relationship for threads because they all live within a process. If you wanted to, you can join on any thread you want, even one you didn't create. You still have to follow the rule that you can only join on a thread once, otherwise you have undefined behavior. And similarly to waiting on a process, after join returns, all the thread's resources are cleaned up, like its stack and everything like that.

There's also an equivalent for threads of exit. Exit terminated a process; well, there's a way to terminate a thread. If I want to end a thread early, the corresponding call is called, thankfully in this case, pthread_exit. Thread functions return a pointer, so where before we had exit that took an int, now we have pthread_exit that takes a pointer. So the return value is just the
value passed to pthread_exit, and that's what is handed to whoever calls pthread_join. The start routine is kind of the equivalent of main, so it has all the same rules: if you return from the start routine, that automatically calls pthread_exit for you, just like returning from main automatically calls exit for you. So they have a one-to-one relation: main is to exit as your start routine is to pthread_exit. Like I said, it's implicitly called at the end of the start routine whenever the thread returns.

Unlike processes, we are allowed to kind of ignore being a responsible parent in this case; again, there's no parent-child relationship. There is a thing called detached threads. Joinable threads, which are the default, wait until someone calls pthread_join to clean up all their resources, because we need to keep the return value around; you may actually want to check what their output is or something like that. If you don't care about the return value at all and aren't going to read it anyway, there is an option to detach a thread. A detached thread releases its resources immediately when it terminates: whenever it exits, its resources get cleaned up right away. You can't call join on detached threads because they're not joinable. So the drawback is you can't get a return value out of them, but you also don't have to remember to join them to clean up their resources.

Just like we have zombie processes, we can have zombie threads if we have a joinable thread and we don't join it. Same concept: it just wastes some resources. A detached thread cannot be a zombie thread, because as soon as it exits it gets cleaned up. And for threads there's no concept of an orphan thread, because they live within a process: if the process is dead they don't get
reparented or anything like that. They live within a process; if the process is dead, they're dead. For detaching, there's just a function, pthread_detach: you give it a thread and it marks the thread as detached. And it says that if you call pthread_detach on a thread that's already detached, it's undefined behavior, because of course it is. It's C, so why would they make your life easy?

Here I can do that same example. I create a thread that executes this run function, and again I don't care about the argument and I don't care about the return value. Then I detach it, and then there's just the print "In main" and the print "In run". If I go ahead and run that, whoops, it's called detached error, I get the same thing as I had before: I just got "In main". It so happened that the main thread created a new thread that was able to execute the run function, but the scheduler decided to keep going with main, which returned zero, and the process is now dead, so the other thread is dead. If I run this a whole bunch of times it's likely just going to print "In main". I got lucky once and got "In main" and "In run".

Now if I wanted to always have both of them print, before, my solution was to put a join here and wait for the other thread to terminate before I stopped. But now I can't join on the thread, because it's detached. So the question might be: how do I actually wait and not exit my process until every single thread is done? Yep? If you add a sleep to solve concurrency issues, that is a band-aid, and it may or may not work.

The problem here is basically this: this return 0 in the main thread exits the process, terminates the process, and everything goes away. It turns out that the thing calling main, which you think of as the main thread, is a thread like any other thread. So there are rules where if you have multiple threads going
and they all call pthread_exit, well, your process will actually stay alive until the last thread calls pthread_exit, and the last thread will go ahead and call exit for you. So if instead I want to make sure all of my threads are done before my program is done, I can just do a pthread_exit here. When this pthread_exit executes, my main thread is done: it has terminated, and it would be impossible for it to get to the line after it, because the main thread no longer exists. Since there's only one thread eligible to run in this process now, my scheduler is going to have to pick it at some point. It's going to do its printf of "In run" and then return NULL, which does the implicit pthread_exit, and then the library is going to go: okay, this is the last thread in the process, so I should just terminate the process now because there's nothing else to run.

So if I run this now, every single time I should see "In main" and "In run". I see them both. I don't know what order they're going to be in; very likely "In main" is going to come first because of the scheduler, but again I can't guarantee it. Come on, flip. All right, we're not going to get lucky today. Whatever.

Yeah? So the question is what I can do if I want one thread to definitely run before the other one. We'll need until after reading week to get into that, but we will learn how to do that. That's called synchronization. So yeah, you might want to orchestrate things between threads, but that's after reading week.

Yeah? Is it valid to return something on the stack of a thread, you mean? Whenever you return from the function, it gets freed, so you shouldn't do it. So the question was: what if I have some local variable, let's say int x equals something, and I just return its address, like this? For the same reason you don't
want to do just that in any function. Is that a good idea? No, because this local variable only exists as long as the function is executing; once you hit return it doesn't exist anymore, so you're pointing at memory that could be freed. It's undefined behavior. We'll have an example of this later too, but the compiler also just yells at me: it says the function returns the address of a local variable. We should not do that, for lots of reasons. If you want to return a pointer to something, you'd probably have to malloc it or something like that. So I could malloc something, get a pointer back, and return that; then whoever joins gets the pointer, can use it all they want, and can free it when they're done. That's usually how it goes.

All right, any other questions? So here is our solution. Like I said, all we did was put the pthread_exit here in main, so we terminated just the main thread and prevented it from terminating the whole process. Now the code works more like we expect.

You can use these attributes too if you want to get or set thread variables. It's kind of ugly to use, but it's C, so it is what it is. We don't have to use attributes in this course; we'll either use the defaults or at most detach something. But if you want to use them, you create an attributes object, you have to initialize it with this init call, and then there's a whole bunch of attributes. One of them is the stack size, so we can get the stack size to see whatever the default is, print it, and then destroy the attributes. Running this on most Linux systems will show you that the stack size is eight megabytes. One of the other attributes is whether the thread is joinable. So you could set a thread to be joinable, which is the default, or you could create
a thread that is detached; that is also an attribute you can set. You can set the detach state to PTHREAD_CREATE_DETACHED, so you don't have to do a separate detach call. But setting that attribute is so common that they just made a pthread_detach function for you, so you don't have to deal with this attribute thing.

All right, so we can compare creating threads to creating processes and show that threads are actually faster. I didn't show that before. I have some other programs here we can play with: there's one that does 50,000 forks, so it creates 50,000 processes, and this one creates 50,000 threads, so we can see how expensive they are. So I can time the fork one. Oh, I don't have time? Okay, cool. So if I time that, and wait, and wait, see, it's like six seconds, real time 6.4. Well, if I do the same thing with pthreads, oops, what is wrong with my keyboard, it seems to be a lot faster. But you know, they share memory by default.

Other examples we can go into: multiple threads. Here we have a loop with i going from one all the way to four, and each time we create a new thread. I wrote this new_thread function that creates a thread for us: it takes an id and checks if there's any error. And here in main, before we return, we do a pthread_exit, so we get rid of just the main thread. In new_thread, this is where I show you how to pass an argument to a thread. I can't just give it a pointer to this id, because id only exists as long as this function is executing, and you don't know when the thread will actually execute. So to give it an integer, the right way to do it is to malloc some space for it. I malloc four bytes, the space for an int, I get back an int pointer, and I write the value to whatever that pointer is pointing to. Then I create a thread: here I set its attributes to NULL, I tell it to execute the run function, whenever we
start the thread with that function, and then I give it the argument, which is just the pointer that malloc returned. Then in run, I know, based on what I passed to this function, what arg is. Again, void * is C's way of saying it's a pointer and it's up to you to figure out what the hell it's pointing to. I know that arg is a pointer to an int, so I can cast it to a pointer to an int and dereference it to get its value. Now I have the id in a local variable called id, and since threads have their own stacks, this is local to the thread, so nothing else can interfere with it. Then, well, we should free things: I don't need that malloc'd space anymore, it has done its duty, so I can just free it. Then I have a for loop that goes from zero all the way to nine, and each iteration prints the thread id and then sleeps for a little bit.

If I execute this, I won't really know the order between the threads; in fact it seems to switch every single iteration. I have threads three, one, two, and four all printing zero, then threads one, three, four, two printing one, then one, three, four, two again, then one, four, three, two. I don't know the order between them, but they all march on like this. Questions about that?

All right, well, just to prove that weird things will happen, let's say malloc, yeah, that's hard, screw malloc. So: don't need you, don't need you. Let's just give it the address of id, right? And then let's say I don't need the free either. What's going to happen if I do something like this? A brave soul here? Let's stare at it for a minute and think about what in the blue hell is going to happen with this. Yeah? So each thread is going to print different garbage, or the same garbage? So each one will have its own unique garbage. All right, we've got unique garbage for each one. Any other guesses? Oh wait, you're amending: after some
point. Okay, so maybe it's okay for a little bit and then all hell breaks loose. Okay. Yeah? Segfault, because the threads aren't supposed to share their stacks. But again, everything's in the same address space, right? So they could, maybe. So what's your vote: are things just going to be all random, or are they going to agree? All right, well, let's just execute and see. This will not be good. Whoops, I opened a new shell. Whoops, is it still running? No, it's not. All right, wait, I actually did not save. There we go. So I got thread three, four, four, zero, three, four, four, zero, four, four, three, zero, four, four, three. So I got some random stuff. Yeah? Yeah, this one's zero, four, four, three. That's a good idea, let's see if we can get any pattern. If we run it again we get zero, zero, two, three.

Oh, so you want to print off their addresses? In run I'll print arg, something like that. Oh, we screwed it up. Oh no, we've introduced a new bug. So they all got the exact same address. Why is that? Don't we do four different function calls to this? Well, it turns out the main thread is the only one that makes function calls to it. So it gets some stack space for this argument, creates the thread, returns, and then does another function call. Well, it's calling the same function, so it gets the same amount of stack space and the pointers always line up: it's always going to be the same pointer in this case. So all the threads get the same address, which is supposed to be local to this new_thread function call. And since they all get the same pointer, what they do is just read whatever value is there whenever they start executing. That's why we got some that were the same number, because it just so happened that before the main thread could
change that value (it changes that value with another function call), the other thread had already read it, and whatever value it read is what it set id to. So main could go ahead and create two threads, and maybe that value is zero, and then main doesn't run anymore. Then one thread runs and reads the value zero, the other thread runs and reads the value zero, and now they're duplicates. After that it's not going to change, because I only read through that pointer once.

If I wanted to make this real bad, instead of reading it into this variable once, let's say I just dereference it every single time. So every time I do a print, I read that value again. Now if I run this, okay, if I get rid of the printf, because it's messing with the stack, oh, now it's just printing that whole big value all the time. Cool. All right, so moral of the story: don't do this. I want to make sure that my memory is actually independent. If I create a local variable in new_thread, and the main thread is the only thing that calls it, that local variable only exists for that thread. I should not share it between threads, otherwise real bad things happen, and this is the kind of source of bugs that last for like seven years in the kernel, because they're really hard to figure out.

Yeah? So now we're just getting that value, likely because we got unlucky: this thread exited and some crap happened to its memory. So in this case we always get that value now. This thread finishes, its memory gets cleaned up, and I read it every time, and it probably all gets cleaned up whenever main hits pthread_exit. Before I had this change, it might have been the case that the value was still valid because the main thread hadn't called exit yet, and the other ones just read it while it still happened to be valid. And yeah, that seems to be the case. So it seems that pthread_exit will go ahead and
just kind of delete the stack and set its value to some weird numbers, because I'm pretty sure if I put a sleep here and waited for a second before I read the value, I'm probably going to see the weird big value every single time. But the moral of the story is you don't want to do that. This is why they taught you how to use malloc and free and all that stuff, so now I have to actually argue about it between threads. Yeah? So the question is: can threads run in parallel or not? That is up to the kernel. Threads will always be concurrent, so it can switch between them, but the goal of the operating system is to run things as fast as possible too. So if I have, you know, eight cores on my machine, I want eight threads to all run at the same time, and I want them to run in parallel, so the kernel will hopefully run all these things in parallel. To argue about their safety, you can argue just about concurrency; you just want parallelism for it to go fast. Yeah? So the question is: for pthread_join, does it have a no-hanging version? And the answer to that is no, it does not. Yeah? So they share memory, and it's up to you to figure that out. Because they share memory, you essentially don't have to do IPC, right? Everything's nice and free because everything's in the same address space; it just reads, hopefully, the correct memory value so that it actually works. All right, any other quick questions before we wrap up? So just to wrap up: threads enable at least concurrency, hopefully parallelism. We related them to something we already know, processes; they're just lighter weight, they share memory by default, and each process can have multiple threads but just one to start. So the way to think about it is that a process by default will just have a single thread that runs main, and we call it the main thread. Otherwise, we can create more threads after that.
They all live within a process. So just remember: pulling for you, we're all in this together.