All right, welcome back to operating systems. I forgot my camera, so I'll save you from having to look at me on the recording today, so that'll be nice. Today is the first actual content where you'll need to understand everything; the rest of it was all setup. We know what a process is now, so today we'll figure out how to create one.

Remember, when we were talking about processes, they were an instance of a running program. I could run the same program more than once, even at the same time. We know that within our process we have virtual registers, because every process thinks it has control over the CPU, so it has its own set of registers that don't conflict with any other process that's running. We had an example where both processes were counting down, and one was not affecting the other. Because of that, we also argued that they have virtual memory. Each process thinks it has an independent view of memory, so its stack and its heap would also be in that memory. Then we went on and figured out that, hey, global variables are actually in virtual memory too. That makes sense: they exist as long as the process exists. We also talked about those special file descriptors, and the file descriptors are part of the process too. By default, remember, we have three open file descriptors, 0, 1, and 2: standard in, standard out, and standard error, in that order.

So when we get into technical terms, each process has a process control block, or PCB (not printed circuit board, process control block), that contains all the information about a process. Specifically on Linux, this is called a task_struct, and that link is browsable on GitHub; it will just take you to the Linux kernel source code files, and you can read it if you really want. That's the great thing about Linux being open source: nothing's hidden from you.
In it, you'll probably find things like the process state (what state it's currently in) and the CPU registers, so that's where it saves the registers that represent the virtual registers for that process. It has all the information needed to do scheduling, which we will get into later; memory management information in order to maintain that virtual memory; some I/O status information, basically information about the file descriptors; and any other information it needs to account for your process. Each process gets its own unique process ID. You might see it called a PID; you can spell it out as P-I-D, and sometimes we just say "pid", so whenever I say either one (the second of which I will probably never say), that means a process ID. Each process gets its own unique process ID when you create it, and it doesn't change as long as that process exists.

Here is a state diagram. We kind of like state diagrams, maybe we don't; well, this is the state diagram for the lifetime of a process. Whenever you start a new process, it goes into this created state, where the kernel initializes it, sets up all that virtual memory stuff, which we will figure out how to do later, and gets it ready to execute. Whenever it's done being set up, it gets put in this waiting state. You could also rename that to ready, and the textbook might call it ready; if you want to call it ready, that's fine. It's just able to execute but it hasn't started executing yet. Then you'll notice that there are two lines here between waiting and running. At some point your kernel is going to decide, hey, I want to run this process; it will assign it to a CPU, and it will start executing and keep running until the kernel decides that, hey, it's had enough time, and puts it back in this waiting state, where it doesn't have to execute anymore.
Eventually your program might go through this cycle a few times, and then, at some point while it's running, it will transition to this terminated state, in which case that process is done. Does anyone remember the system call we had to use that would represent going from running to dead? Yep, close, one thing is kill. Yep: exit, or exit_group. That's the way you exit a process, and that's the only way you can exit a process, so going from running to terminated is that process actually calling exit.

Then you might notice this weird thing where it goes from running to blocked, and we got a question about that. Running to blocked means that the process made a request and it has to wait; I can't execute it because it's waiting on some information. You might make a request to the network or to a file or something like that, which is really, really slow. If I try to write some information to a file through a system call, it's not going to complete immediately, so your kernel is going to put the process in this blocked state, and that basically just means: okay, this process is waiting for something to happen, and I can't execute it yet because it's waiting for some operation to complete. If it's writing to a file, well, that's slow, so it would probably have to wait here for a while. Then whenever that operation is done and the kernel knows it's done, the process will transition to this waiting, or ready, state, in which case it can get executed again.

Another comment: can I exit a process by unplugging my PC? That won't exit it, it will just kind of end. Yep, so if you deny a process from being created, it would just never come up here, because it would never get created, right? If you don't let things get created, it would never make it to this state. So they're created and then they're running. Yep, so the running and waiting cycle: running just means it's actually executing, and
then at some point the kernel might decide that, hey, that's executed for long enough, I'm going to start executing another program. So it'll essentially stop executing it, put it in this waiting state, and then run something else, so something else will be running during that time. As for how it gets blocked: blocked just means it's waiting for some I/O operation or something else that's slow. It made some request, and that request can't be done immediately, so the kernel has to say, okay, I can't run you anymore because I'm waiting for something to happen. That's basically what blocked means. There's a question about how I said it goes from running to waiting after some time: we'll get into that when we get into scheduling, which is a whole other topic that will take us a week.

Another question: what's the difference between blocked and waiting? Waiting means it can execute, it just hasn't had the opportunity yet. Blocked means that even if the CPU's not doing anything, I can't run this, because it's waiting for something. That's a good question: if it's stuck on blocked for a long enough period of time, will it be auto-terminated? The answer to that is no. We'll see this when we get into more process management stuff, but today that whole created state is where we're going to be focusing. We're just going to learn how to create a process, and things will get a bit weird.

All right, this is your primer for lab one. Accessing process state is doable on Linux through something called the proc file system. There will be a special directory at the root of your drive called /proc, and it just looks like a normal directory that has files and other folders in it. On Linux, this /proc directory represents some internal kernel state, because the kernel doesn't want to create system calls for literally
everything; it just presents some information as files that you can read from, and write to if you have permission. They're not real files, they just look like it, which is kind of fun. Every directory in /proc whose name is a number represents a process, so each process gets its own folder in that directory, and within that folder there is something called a status file that contains the state of the process. That's what you'll be using for lab one. In lab one you're essentially going to make a process monitor: if you've ever used Task Manager or Activity Monitor or something like that, which lists all the running processes, well, you're writing something that will list all the running processes on Linux, which is kind of cool. For instance, there's also /proc/cpuinfo, which will tell you information about the CPU that you're currently running on. You can poke around in that directory; there's all kinds of information in it. That's a good question: if I kill a process and it no longer exists, does it get removed from the proc directory? Yeah, the kernel will remove it from the proc directory, so it won't exist anymore.

Okay, so about creating processes; this is where it gets fun. You might think it would be pretty easy: I just say what program I want to execute and say, go create a process for that. Well, that's something you could do. You could make that a single call: tell it to load that program into memory, create a process control block for the new process, and start executing it. That's what Windows does; there's a call named CreateProcess that does probably what you expect. On Unix-like systems such as macOS or Linux, that is not how it works. Unix decomposes this into two more flexible abstractions, so it's a bit harder to create a process, but we'll see why that might be advantageous. So on Unix,
instead of creating a new process from scratch, what it does, which will start off being kind of weird, is clone the currently running process. It will pause the currently running process and copy its process control block to a new one, which reuses all the information from the process, including variables. The new process also gets its own virtual memory, so it's actually completely independent, but it looks exactly like the process it got cloned from. The only way to distinguish between the two is with a parent-child relationship, which is maintained throughout. We'll get into this relationship more in the next lecture, so you don't have to worry about it too much for now, but what it basically means is that whoever initiated the call is the parent, and the new process is called the child, because the parent essentially gave birth to it, kind of. Oh, and there's a question about processes being blocked on something like a mutex: we will get into mutexes much, much later.

All right, since we create a clone, that new cloned process could load a new program, and we'll see how to do that, or it could just continue executing the same program, so you've split it off into two. Maybe you make the new process do half the work and the parent do the other half, and hopefully it's two times as fast, something like that. We'll get into reasons why you might want to do that, but first let's figure out how to create it. You might think the system call is called clone; well, it is, but for reasons, the API you will actually use to clone a process is called fork. It creates a new process which, again, is an exact copy of the current one. You can see it returns an int and takes zero arguments, and its API is as follows: it returns the process ID of the newly created child process, and negative one means it failed and didn't create a new process. Otherwise this will look like it returns twice, because after
it returns successfully, you get two different return values depending on whether you're the old process or the new process, and this is where your life gets complicated. Fork in the new process returns zero, to represent that, hey, it's the newly created one, and the parent gets the process ID of the child, that is, the process ID of the newly created process. We'll see that it's also called the parent because it will have to take care of the child in some respects, so the naming kind of makes sense, although this is where your Google searches get weird, because you'll be asking how the parent kills the child, and then you'll be on some list.

So the question is: if I'm not passing anything, how does it know which process to duplicate? The answer is that it duplicates whatever called it. If the first thing you did in main was call fork, it would duplicate that process. We'll see an example; most of this class is dedicated to code examples, because it will probably hurt, and you'll tell me to do odd things, and we'll see what happens. So after this point, two processes are running. They can both access the same variables, but each has its own copy, so they're completely independent, and the operating system, or the kernel, will do some optimizations to make this really, really fast, which you will understand later. And it's okay if you don't understand the zero in the child and the PID in the parent yet; we'll get to it. I'll just show you an example and we'll go through it, and it'll probably be clearer than me trying to explain it. Looking at code is always better.

On POSIX systems you can always find documentation using man, which is short for manual. Today we're going to use fork, and if we have time, hopefully we have time, we'll use execve, and then next lecture we'll use wait. You should be able to do man fork, man execve, and man wait, and you can get some documentation
for it, because it's full of fun little caveats that will make your life more difficult. But let's just dive right into an example. So here's the fork example. At the beginning it has a main; whenever we execute this program, it will start executing at main, so our lives haven't changed yet. The first thing it's going to do is call fork. Now, if fork returns successfully, there's a new process that is an exact copy of whatever started executing main. Luckily it didn't really do anything beforehand, but now there are two processes running this program. First we have an error check here that sees if it failed. If it failed, there wouldn't be two processes; we would just have an error code and exit immediately. Otherwise, one process is going to go into this branch, because its return value from fork is zero, to indicate the child. There I printf the return value from fork, so hopefully it's zero, because that would make sense. Then there is a system call here called getpid, to get the process ID of the currently running process, and there's another one, getppid, to get the parent's process ID, so you can see that this relationship is enforced. Now in the parent, the return value from fork is not going to be zero; it should be the child's process ID, because that's the new process that got created. And of course the parent is going to have a parent as well, so we can also print that with getppid, just to make it fun. Oh, and we have good questions, and we have lots of forks. Okay, we can try breaking stuff; I'll probably wait till the end of the lecture for that.

So here we go. If I execute this program, I should hopefully see all of these lines. I execute it, and I can see that in the parent, fork returned PID 2816, which lines up with the child's process ID, and the parent's process ID is 2815, and its parent is, well,
1726. And you can see here that the child's parent ID is 2815, which is the parent, so that makes sense. But you can see that lots of fun things are happening. If you don't know that fork creates a new process, it looks like we're going into both of the if branches at the same time, which is not true. Your world still makes sense: only one process goes into each branch; there are just two processes.

So here, I will try to break it down a bit more. Let's go to the iPad; first time breaking this out. Okay, so let's say we begin execution at main with process ID, what was it, 2815. Something created a new process that started running this program, so it would start executing here. We know that's a little bit of a lie, since it has to load the standard C library and all that, but for our purposes it starts executing at main. So that process is created, and it hasn't called fork yet. Then it calls fork. At the time it calls fork, it kind of gets paused before it returns from fork, and we create a new process; in this case it was, here I'll use a different color, process ID 2816. It will be an exact copy of the original at the time of the fork, so it will also be right here, and it hasn't returned from fork yet either. Neither of these has returned from fork yet, and now I have two processes that look exactly the same, at this point and this point only. Right after this they're going to differentiate, because fork in the original one will return 2816 and fork in the new one is going to return zero. Right after the fork, we don't know which one will execute first; that's a whole other can of worms. But they will both be right after the fork call, about to get different values for that pid variable, and from now on they're independent of each other. Then they assign that pid variable: zero in one, 2816 in the other. So now there's two things
executing. Yeah, so that's a good question: if I had instructions before the fork, would the child execute them? The thing is, it's a copy at the time of the fork, so if the parent executed them before that, their effects will already be there in the other one, because it just copies everything at that point. We'll see an example. And yeah, if the fork fails, there wouldn't be a new process created, so the one in green wouldn't exist; there'd still only be one process, and fork would return negative one in it. No, so it copies the process during the call and then returns from it, and then each one just continues executing. At the time of the fork it gets cloned, and no, the clone will not run the fork call again; the fork is what makes the clone.

Then some more questions. What will cause fork to fail? Lack of resources. Someone asked what happens if a fork forks a fork, and yeah, we'll see that that's bad. The question is, what is pid_t? pid_t is just supposed to mean the type of a PID; it's an int. So right after this fork, the processes go their separate ways once fork returns. One of them will get its return value from fork and assign it to its pid variable, and the other will get its return value and assign it to its pid variable. So right after the fork they go their different ways, and the next thing both of them will do is assign pid, because the call happens first and then the assignment. That's a good question: if they're exact copies, do they also have the same if statements and everything? The answer is yes, they have the same instructions. So let's see here. This original one, which is currently here, would assign 2816 to pid, and then, whoops, what did I do, and then it would continue execution as normal, so same
thing, from top to bottom. Whoops, I erased some of that; I'll fix it. It would check if pid is equal to negative one; it isn't, so it would hop over that. Then it would check if pid is equal to zero; it's not, so it would go into this else, print, print, print, and eventually return, and let's say that's all before the child executes. Whoops, I stole another piece of it. Then when the child executes, same thing: it would assign zero to its pid, then go through the if checks. Is it negative one? No. Is it zero? Yes, so it would print, print, print, then fall out, return zero, and end.

So the question is, how do I know which one actually runs first? The answer is you don't. It's up to the kernel, so from your point of view it's effectively random; it will likely be the parent, but not always. Right now I have this call to sleep here to make sure that, actually, that doesn't make sure of anything. Oh right, that's why I have it: it doesn't guarantee any ordering, it just keeps my console from looking weird. But if I run it a bunch of times, I can get different orders; I don't know which one's going to execute when. That time when I ran it, the parent's line that printed the return value happened first, then the child ran for two of its calls (that one's a good one), and then the parent went, and then the child went, and then the parent went.

There's a question: is this technically multi-threading? The answer is technically no. We'll figure out what this actually is; it's very much related to multi-threading, but not quite the same thing. The next question was: does the return value from fork become the process's new PID? The answer is no; your PID never changes. So here, as long as the parent is executing, its PID is going to be 2807. It could call fork over and over again, and it's
never going to change, so it stays constant for its life. The return value in the parent tells you the process ID of the process you just created, so it tells you the process ID of your child, and for the child it tells you that you're the child.

We have some questions about previous instructions, so let's go through this, because we'll see some more examples. What if I add some more stuff, x equals 42, something like that? So if I have that, and I print, I don't know, let's say "x equals" and then x, what should I expect to see? Yeah: 42 and then all the parent stuff, and then 42 and all the child stuff. Well, let's see, because that might not be true. In this case it was true: we got x, then all the parent stuff, and then x, and then all the child stuff. Because the fork happened before the print, and the new process is an exact copy, the new process also has an x and it's also 42. After that fork, as long as it's successful, there are two processes, so both of them will print that line.

So what if I just did this? Anyone want to guess what happens here? Yeah: 42, parent, child. And how many 42s? Just one; it just prints once. So that print happens before we fork, when there's only one process running, whatever was just created to run this fork example. It creates a variable, prints, and then it splits off into two.

There's another question: why is the print order not deterministic? The answer is that there are two processes running, and the kernel gets to decide which one runs; it can make that decision however it wants. It will not make the same decision every time; if it did, that would be kind of limiting. It tries to just run things as fast as possible. Because I have a multi-core machine, they could have been on separate cores and running at the same time. There's lots of
things that can happen. No, so this delay happens every time. It's there because my terminal returns back to me whenever the parent's done, so I might be in the case where the parent finishes first and then my terminal gets all messed up. So that's just there to not mess up my terminal. There's a question: is there a way to ensure one runs after the other? The answer is yes, but we will not learn how to do it yet; you will learn kind of how to do it after next lecture.

Yeah, so in this case the child process didn't print the value of x, because the original process started executing main, created a local variable called x, printed it, and then forked after the print. There are two processes only after that point, and the print had already happened, so it wouldn't happen in the child. So the question is where the copy stops: it copies the current process at the fork. The instructions, literally everything; they look exactly the same. The instructions are part of what's loaded in memory; it's the whole program, and the whole program will still be in memory. So the whole thing will be cloned; the earlier part just won't be executed again, because it was already executed. The instructions are just some binary data, right? But at the fork it's a copy including the program counter and everything, so it wouldn't go backwards and start executing stuff again. The copy is literally everything. The fork is the place where a new one gets created and they start diverging.

There's a question: if it copies everything, why does it copy the instructions that already ran? The answer is, well, we'll get into it, but essentially it doesn't cost anything; it's free. And you also don't know that they're done with: imagine if the fork happened in a loop and execution went backwards, so they would have to be there anyway, right? Otherwise you get into this weird program analysis thing, and it doesn't
make sense. So in order to see that they are exactly the same, here I will create x equal to 42, whoops, caps lock was on, and then after this, here, I will print the address of x, to show you that they are indeed the same. Oops, wait, did I not compile? Oh, I didn't save; that would help. All right, there. You can see the address of x is the same for both of them; it's an exact clone. Why is this possible? Because it's virtual memory. You can even see that: in this one, let's say x equals one, and in the other we don't care, and then at the end let's go ahead and print x. So what should I expect to see here? No, not really; I just have the address and then one print. Yeah, that was a good guess. Hey, let's just execute it. In this case I got the address from both of them, and I don't know what order they'll come in; this time it looks like the parent happened completely before the child. Did I forget to save again? Yes.

Okay, this one's a bit weirder. The parent happened first: it printed the address, then printed its three lines. Then the child printed the address, which is the same thing, and printed its three lines. And then 42 got printed, which has to be the child, because the parent changed x to one; so this is from the child, and then this is finally from the parent. And they're independent. So no matter what order I get them in (how unlucky will I get? apparently not that unlucky), they'll print different values. Even if that x equals one happens first, it doesn't affect the other process, because they're completely independent, even though the address looks the same for both of them: each process has its own virtual memory.

All right, any more fun questions about that? Was that your question too? What's your question? So the question is: why do I get the same
address for both of them, even though it's the same physical place? So these are not physical addresses; these are virtual addresses. Because I can change one without changing the other, no matter what order they happen in, it means that those virtual addresses, which look the same in both processes, actually correspond to different physical addresses. We'll get into how that mapping is done later, but they're independent.

Oh yeah, another question: what process is 1726? Let's answer that first. If I want to know what process that is, what can I do? Yeah, let's just check it out: /proc, what was it, 1726, then state, I think it's status. You get all sorts of cool information about it. Some of it you won't be able to read yet, and that's fine. We can read its name, though, and there's its name: zsh. That's our shell. That makes sense, because I typed the name of the program to execute in my shell, so my shell created it. Huh, well, if fork is the only way to create new processes, what did my shell have to do? It had to fork. Whoa, that's crazy, right? It's true.

Let's see what we have time to do. There's another question: what happens if I make something even more fun, like fork, fork? Anyone want to guess what happens there? Yeah, I create a lot more copies. All right, so let's just run it, because it will look weird. Compile, and I get a bunch of stuff. At the end of the day we know there's at least four processes there. Yeah, so the question is how do I create a process for a new program, and we'll get into that hopefully at the end; it's another call. This one would be more like, hey, an example question, so you can figure this out. We won't yet; we can go through it later, but we still have some more stuff to get through.

Yeah, so basically, what will happen here if I put another fork here (whoa, my writing is terrible)? If I put another fork here, you can say that some process,
let's say process one to make it easy, is going to start executing main, and then it's going to fork and create some process two. I can draw an arrow that says what the parent-child relationship is, so P1 created P2. Now they would both be exact copies at the time of the fork; not much is going on. This one would get pid equal to two, and this one would get pid equal to zero. Then they would both fork again, and you don't know what order that's going to happen in. So this one would create, say, process three, and then process two would create process four, and it would look something like that.

Oh god. All right, so that will probably ruin my next lecture. Well, whatever. So let's say we fork in a loop; what does that do? First time through the loop, how many copies am I going to have? Two. Next time through the loop? Four. Next time? Eight, then 16, 32, 64, 128. Things probably get real bad real quick, and that is when fork starts to fail, because that'll grow pretty fast. The PID is an int, so eventually the kernel is going to run out of process IDs to give me, because it'll run out of numbers, if I don't run out of memory first. We'll run it, but before that I'll set up another thing, and then we'll run it and probably ruin the next lecture, so they'll thank you.

So let's quickly go through this; we'll have more time next lecture to play with these. There's a question: how do I create a process for a new program? Well, there's another system call for that, and it's called execve. It replaces the program in the currently running process with another program, resets most of the process control block, and starts executing that program. It has a bit of a different API: it needs a pathname, which is the full path of the program to load and become; its next argument is an array of strings that will eventually get passed to main as argv; and then some environment variables. And then it will return an
error on failure, and it doesn't return if it's successful, because this process becomes another program. So execve replaces whatever process called it. If I go back to this example: if I put execve here, the parent would call it; if I put it here, the child would call it; if I put it somewhere else, they would probably both call it. So let's quickly get into the example, so I can say we did it, and then break stuff.

Here is how that works. Again, if I start executing this, it becomes a process that starts executing main; your life is the same as it was at this point. It should do this printf, and then here is the argv. I'm just going to have it run ls, and I'm not going to set any environment variables, because I don't care. Then for execve, I'll say this is the full path of the ls program, and I'll give it the arguments and the environment, and then I'll check its return value. If it returns negative one, it means it failed for some reason, and if it doesn't fail, well, this line should never get printed, because the process just becomes ls: the program it was running gets replaced and doesn't exist anymore. So let's go ahead and see that. You can see here, when I run it, it says I'm going to become another process, and then it does the same thing ls does, so it just shows me everything.

All right, so why could it fail? Well, I could just give it an invalid file name, lss or something like that, in which case it's going to fail; it's going to say no such file or directory. So the question is: is there a way to get the process ID of the new process I created? In the execve example I'm not creating a new process, I'm just becoming a new program, so the PID would stay the same.

All right, we've still got three minutes, so let's see how this works. Go to the code, fork example. All right, so I have this loop. We kind of said that it should go one, and two, then three, then four, and then probably break. Well, let's go ahead and execute it, and hopefully
I'll have time to fix it. I didn't save it again; I keep doing that. Okay, compile. All right. Oh, I'm scared. Someone else want to hit enter for me? Oh, I'll do it. Uh oh. Oh, I opened a new terminal. Oh, it's not opening. Oh, I can't even type anything in it. That's the letter. All right, just remember, I'm pulling for you, we're all in this together, and someone can come fix my laptop. Oh yeah, see, look: terminal process failed to launch, native exception, fork failed. Lots of bad things.