 All right, welcome back to another fun day. So this will be the start of actual operating systems when things start to get difficult. So I guess no one likes using the Discord either. I've posted a little brain teaser in there from last lecture, who, if execve is the only way to start an application, then if I type LS, who actually is calling execve on LS? No guesses after like two days or one day? Oh, okay, we have some new section, okay. Well, I guess we will find out and kind of dispel some mysteries, but yeah, I guess everyone's time's kind of sucked up by the labs instead of actually having fun and learning stuff on the Discord. So let's get started. So if execve is the only way to start executing a new program and it doesn't correspond with creating a new process, which keeps track of the execution of a process, that seems kind of weird. You could just start a new process that executes a new program. And if you are a kernel developer, you are allowed to make that interface. In fact, that's exactly what Windows does. So Windows has a system call called create process and that's exactly what they do. But in Unix, it decomposes it into two more flexible abstractions, one being execve that we saw last time and the second being the fun one we will encounter today. So the other option is instead of creating a new process and then loading a new program and starting to execute it, well, instead of creating a new process, we could just clone the already running one. So you can pause the currently running process, make a complete clone of it, so copy its PCB into a new one. So that's exactly where it's executing and saw the files it has open and it's all of its memory and everything like that. And then you now have two copies of the same program executing and then you'd have to distinguish between them somehow, otherwise you would just have exact clones. And Unix does this through like a parent-child relationship. 
So whichever one made the copy call will be the parent, and it will get the process ID of the newly created process. The other one will just get zero, which we'll see, and that's your new process. That's how you can differentiate one process from the other. We will see that in code examples towards the end, but believe me when I say it clones literally everything. So there's a strict parent-child relationship between all the processes on your Unix machine. You could go through top and follow the relationships or, as I put on the next slide, you could use something called htop, which lets you look at the process tree and see what is the parent of what. Then you'll see something special at the top: there's always something called init, which is given the special process ID 1. It is the only thing your operating system executes when it starts, and it's responsible for doing the rest of the stuff. So you can see the parent-child relationship here. If I run htop, which is, oops, one of the command-line monitoring tools, its parent is zsh, or bash for you, so it'd be your shell. So the shell would have had to copy itself and then become htop, and we know how a process becomes another one. So the answer to "who calls execve?" is bash, or your shell, whatever it is. Firefox would have something similar. The way Firefox is architected, each tab runs in a new process, because all processes, as we learned before, are isolated, so if something happens in one it doesn't affect the others, which is a good thing, especially if you write very unstable software. So this is how you see your process tree: you can use htop, and if you use that you can press F5 to switch between the views. We can see that quickly.
So in our terminal here I can execute H-top and then over here I can see all of the command so it starts with S-bin init and then shows all the children of it on your actual machine there's going to be a lot of different processes so this is just some of them. It gives you some accounting information up here like right now I have 91 of them running so because this machine only has a certain number of cores which is, I forget how many it has like eight or something, 12. It can't be running all those in parallel so it has to run some of those concurrently but again it's the operating system that decides what gets to run and when. So remember we had the kind of process life cycle before that had the waiting state created so it was waiting or ready and then running and then this block state well unfortunately for you Linux terminology is slightly different it's kind of a pain. The R state combines both running and waiting or ready if you want to call it that so if anything can run even if it's not actually running Linux just says hey it's running and then there are two block states there's interruptible sleep which is the program just saying hey I'm going to put myself to sleep because I know I don't have to do any work right now which is the case for a lot of programs like for instance watching a video you only have to render a frame every 60th of a second and your CPU is very very fast so it would just render the frame and be like okay I have almost a full 60th of a second left I can put myself to sleep for that long and then I'll draw the next frame when it's time. 
And then there are two other weird states here. The stopped state is kind of a weird thing: you can start and stop processes yourself if you want to, so you can act as your own manual scheduler. I've never seen anyone use that; it's kind of just for your information. And the last one is a weird one: it is a zombie process, which sounds really weird. Trust me, next lecture we're going to get into even weirder terms than that, and it's going to sound like we're trying to kill people. Yeah? Yeah, so interruptible means it can go to that running state, so it could wake up, but it's probably just waiting to time out. Uninterruptible means it's waiting for something like file I/O, something it can't be woken up from; even if you wanted to, you couldn't run it. Yep? Yep, uninterruptible sleep is if you're reading a file or something like that and you have to wait until you've read all the bytes before you can continue. So your process is waiting for the kernel to do something, access some hardware, and it needs to wait for that to be done before your program can start running again. With interruptible sleep, it could wake you up early if it wanted to, right? You're not actually hard blocked on anything, you're not waiting for any hardware to do something; you just decided to put yourself to sleep. So uninterruptible sleep, like, if you wanted to, you could... okay, the real distinction: you can kill anything in the S state, and you can't kill anything in the D state. Kill means you force the process to end; kill is one of our fun terms, but we'll see that next lecture. Yeah? So the stopped state is just like you moving it to blocked yourself. You just say it can't run anymore; it's in a special queue where it's just waiting, and then you have to remove it from that manually. Yeah, it's kind of users managing the processes themselves, but I've literally never seen anyone use it. Yeah, so runnable means it wants to run; sleep is like,
yeah, I don't want to run; even if you wake me up right now, I don't want to run. Yeah, sorry? Yes. Yeah, the only way to get into the T state is if you manually do it, and then you have to manually take it out. You can do it in bash, I forget what the key is, I have no idea what it is; you can look it up though. Yeah, so the way this all happens, just to give us a bit of a feel for it before we get into the actual code examples: as soon as the kernel boots up and initializes itself, someone has to create a process, right? It can't just materialize out of the air. So in your entire boot process, all the kernel does when it's done initializing is create one process, and only one process, and it essentially runs execve on it and just runs this hard-coded binary. It just looks in the path; if it's not there, you're not going to boot. That is the only thing that happens once it's booted, and then that program is responsible for creating every single other process on your machine. We'll see the other things it has to do; one of them is creating every single process. On Linux it's usually something called systemd nowadays, but there are tons of other options for it. You could even write your own init system if you wanted to, it's lots of fun, and we'll see kind of how to do that. And yeah, some operating systems will just create an idle process too that the scheduler can run if there's nothing else available to run, mostly just for accounting. That's just a weird detail that you don't actually need to know. Okay, and then to explain why our hello world example works: it's because there are standard file descriptors on Linux. Has anyone seen these numbers before? Okay, so no one's seen the numbers, but people know, like, standard in, standard out? Yes? Hands up? Okay. So on Unix, standard in, standard out, and standard error are hard-coded numbers. They're file descriptors, and a file descriptor is just a number, so
this is just by convention. It only works because everyone on the planet has agreed to do this. So the numbers everyone has agreed to are: 0 is standard in, 1 is standard out, and 2 is standard error. That's why, with our write system call, we wrote hello world to file descriptor 1 and it appeared on our terminal, because that's exactly standard out. So you can tell that a terminal emulator's job is to translate key presses from actual keyboard input, convert them to ASCII, and put them on that file descriptor. And anything that gets written out, which again is just ASCII bytes, has to get converted to actual pixels on your display. That's the job of the terminal emulator. And the fun thing about this too: everyone has opened a file before and had to write somewhere, right? The fun thing about Unix, and why it's this way, is that you can have any program that uses these file descriptors write to a file without it ever having to open a file or contain any file-handling code. What you can do in Unix is open a file, and it just gets a file descriptor. So you could open a file as file descriptor 1 and then run that program, and it will output to that file instead of going to your terminal, and the program never has to open the file or do anything. So it's actually kind of great, and you will get some practice doing that later, but we need to get through today's lecture. But if you want to check open file descriptors, and this is how we're going to do some sanity checks... yeah? Yeah, so it's just another place to write bytes to. Well, you don't know what file descriptor 1 or 2 or anything is connected to. By default they're all connected to your terminal, but you could change them to files if you want. No, you're just writing to a number, and that file descriptor can represent literally anything, and we'll see how to play around with that
later in the course. Yep? So it doesn't have to be connected to a file. Your terminal emulator will connect it, and you can connect it to a file if you want, and that's through doing some terminal stuff that we'll see later, but it just gives you the flexibility. All it says is that, hey, you can only read from file descriptor 0, and you can write to file descriptor 1 or 2, and that's how standard terminal applications work. So if you print something out, there's printf, and then there's fprintf with standard error, which just goes to 2. Yeah? Yeah, so here, I'll even execute hello world. I guess we haven't really seen fun terminal stuff yet. Okay, so let's go all the way back to lecture one, right? So we know our hello world... oh, whoops, there. All right, so we know our hello world just prints hello world, right? And that just does a write to file descriptor 1. So if we haven't seen it, one of the things you can do in bash is called a redirect. You just put a little arrow and then you could say, I don't know, output.txt. What that'll do is, the shell is nice enough that it has the file-handling code for you: your shell will go ahead and open a file called output.txt and then connect that to file descriptor 1, so that when your program executes, which just writes to file descriptor 1, it goes to that file. So now when we do that, we don't see any output here, and we can look, and yeah, there's no output. And if we look at the file, it says hello world, so that's where our output went. Yep? Well, so after that application dies, right? Yeah, so we'll see why; it'll make more sense once we see how to create processes. Yeah? So there's a terminal emulator and a shell. The terminal emulator is going to handle all your keyboard input and stuff. Yeah, so this part that you see is the terminal emulator, but what I'm typing into is the shell. Yeah, so you can write a shell; shells aren't actually that hard to write, but terminal emulators are
a pain. Yeah, sorry? Oh, so cat just reads that file. Yeah, so cat will read that file and then output the contents of that file to file descriptor 1, and by default, again, that just goes to the terminal. But if you want, I mean, I guess you could do, hey, let's do it again, right? cat that file, so then cat will read that file and output its contents to file descriptor 1, and now we're going to have our shell send that to o2.txt. So it doesn't give any output, and we see it again in the new file, which is kind of weird, catting to an output file, but yeah. Yeah? Yeah, it just writes to 1; you don't care what it is. Yeah? Yeah, so if you write a C application and then, you know, you interact with it, go ahead and strace it; you'll see a bunch of read calls from 0. In this course you can always have fun stracing things, and then we should be able to explain it in class. In the back first? Yeah, I'll show that in just a second. Yeah? Yeah, so I mean, that's what's happening in our shell: it's all being outputted by the terminal emulator, and you're actually sharing stuff... well, it'll make sense in a little bit. What was I going to do? Oh yeah, the list of open file descriptors. So, let's see this slide. You can actually see what file descriptors are open, to answer your questions. There's this proc file system: it's all folders, and the folders are named by process IDs, which are just numbers, and inside each one there's an fd directory that has entries named 0, 1, 2, and whatever else the process has open, and you can see them. So if we look at our terminal emulator, which is going to have a lot of stuff, here, I'll make this bigger. There is a special symlink called self that resolves to the process ID of whatever process is looking at it, and we can go ahead and see all the file descriptors it has open and what they point to. So this has like 50-something file descriptors open, but if we scroll all the way to the beginning, we can see 0, 1, and 2. So the standard
ones all point to this weird thing called /dev/pts/1. That's something we probably won't learn about; it's a kind of terminal that is supported by the kernel, and it's called a pseudo-terminal. It's some weird hand-wavy thing, but that's what's represented, so that's exactly where all your bytes are going. Yep? Yeah, so these are all symlinks, so it's just the name of the file. So I could access, you know, /proc/self/fd/0, and then this is what that is actually pointing to. So there are pointers in files too. Great, eh? So you can see the other stuff it has open, like file descriptor 103 goes to this Downloads thing. I don't know why; it's downloading something, VS Code is doing something. So yeah, you can see what your system is doing. It's got some fonts open; you can see all the files it's actually using. Oh yeah, that's because I'm using the weird VS Code one. Okay. Yeah? Yeah, so they're related to whatever that process was. So here, right, it'll just list everything open. And I guess in the interest of time: you can do lsof and then put a file name, and it'll do the reverse lookup, so you can see any processes that have a certain file open, especially if you want to know, hey, I have a secret file, what's writing to this file? You can even do it on the standard C library, right? It's just a file like anything else, so you can actually see everything that uses the standard C library if you run lsof on the libc file, and you'll pretty much get back every single process on your machine, because everything uses C at some point. Okay, so now we're going to get into the fun stuff. On POSIX systems, you can find documentation using man, which is usually your friend. We saw execve last lecture, today we'll see a system call called fork, and then we'll see wait in the next lecture. Usually you just do man and the function name to look up documentation for it, but unfortunately for fork, if we do man fork we get this weird awk
extension modules thing that doesn't help us whatsoever, unless you're writing awk, I guess, which thankfully you don't have to. So if you want the real documentation, section number 2 is for system calls. If you do man 2 fork, you'll get the Linux Programmer's Manual, which actually tells you where the header file is and what the signature is. So fork doesn't take any arguments and returns a process ID type, which, as we'll see shortly, is just a number. Okay, so here is the fork API. It takes no arguments and it just returns something, and there are only three kinds of value it can return. It returns -1 and sets errno, because it's a C function, if something bad has happened. This is the only way to create new processes, so if this fails your computer is probably dying; you probably don't need to check for errors that often, but you can if you want. And then if it is actually successful and creates a new process, we now have two things running, and the only way to differentiate them is the return value. In the child process, the newly created process, the value fork returns will be equal to zero, exactly zero. And in the parent, whoever created that new process, it will be some value greater than zero, and the number it gets back will be the process ID of the child process, so it can actually keep track of it. So this is where things get fun, because so far our execution has started at main and gone until exit happened somehow. That's no longer true. Yeah? Is it just any number greater than zero? Yeah? Does it matter what it is, like, does it give different information about it?
No, so it's just a number to identify the process. Yeah, they just all have to be unique so you can uniquely identify a process. Yeah? So it's an exact copy: after the fork, it copies the process completely and now there are two things running, and the only difference is that value. So this one's running with PID equal to zero, this one has PID greater than zero, and now there are two things running. Yeah? Yeah, we'll see in a second, I'll show a code example. Yeah, so fork just returns a number, right? If you read the documentation, it says there are essentially only three options: if the number returned is -1, it's an error; if it's zero, it means this is the newly created process, which we'll see in a second; and if it's greater than zero, it means this is the original process. So we'll see it in a second. Yeah, let's use fork, let's use a system call instead. Okay. Yeah? Yeah, so wait one second, let's just run the example and see what happens, because, you know, code doesn't lie to us, right? Code's perfect. All right, so before the fork happens, our world is how you've learned it. Execution starts at main, although we know that's not really true, because it has to load the C standard library, blah blah blah, but let's assume that execution starts at main. So here we are at main, and the first thing we do is call fork. Right before it, there's just one process running main. The first thing it does is call fork, and boom, now there are two copies of the exact same program, and they are both executing at this point right now. So there are two things executing here, and one will have PID greater than zero and the other will have PID equal to zero, assuming that my computer's not dying, which hopefully it's not. At this point there are two things to run, and the kernel gets to decide what to run. You hope it runs things in parallel to make things as fast as possible, but this is why we have an operating system, because you don't have
to make the decision if who gets to run next so there's going to be two processes we don't know who's going to execute when and where but for any one process it will follow you know normal execution so if we consider the first one the original one the process ID will be greater than zero so this if statement will be true and then it will print off parent return PID so it will just print off the return value it got from the fork call and then it will print off its own process ID through this system call get PID and then it will also print out its parent's parent ID so whatever created it which is the extra P here so parent process ID and then we just sleep here for a little bit just you'll know why I do this in the next lecture and then right so it would be in that if block and then it would go ahead fall out of it it wouldn't go into else if or else because it went through the first if and then it would return zero and then that process dies right it returned zero it's dead so that's what would happen in the original one and then the other case is when process ID equals zero for the newly created child so in the child remember at the point of the fork it was exactly the same and the only thing that's different is the process ID so it would continue to execute now this if statement is not true because process ID is equal to zero so it would go into else if and then this one's true because it's equal to zero then the second process would print child return process ID whatever got from fork which if this isn't zero then nothing makes sense anymore and then it returns then it prints its process ID which should probably be related to what the parent got and then the child's parent ID which should be whatever created it so let's go ahead and execute that and then we can take some questions so fork example all right so here we go and the order of this is actually not deterministic we just kind of got lucky so in the parent right so the parent returned or child returned is 
what they got back from fork so the parent printed first it got back 3,554 which that is the process ID of the child and thankfully if we looked at the child's process ID it matches right so that makes sense and then the parent's process ID itself is 3,553 which you know that's itself which corresponds to this right that's the child's parent ID it's what created it and then so that makes sense and then we get to see whoever created the original process which is 1,449 and if we want to figure out what that is well there's that fun proc file system right we can go ahead and look it up so if we look up process you know 1,449 we can actually see in state or status there's something called name so we can see who created it so the shell created the original process and then that process created another process yep so the return PID is the the return value of the fork call so the fork created two processes and one has the process ID of the newly created one and one is the newly created one so it has two okay so at the time of the fork it completely copies the running process complete copy exactly the same the only difference is going to be the return value of fork so now there's two things executing one will have the process ID of the newly created process whatever just got created from the fork and the other will have a process ID of zero and that means it's a newly created process so originally that's no so the original process ID was the running process which was the parent so this is the original running thing right because its parent ID is this which is the shell so I executed this and then the shell ran it so this process has yeah so so the original process is 3553 and it created 3554 so the child's ID is 3554 yes so where does zero so zero is the return value in the child itself yeah so at the point of the fork right right here right here there's two processes running right and the only difference is the value of PID otherwise they're exactly the same and they're 
two processes they're running well they could be running they're going to run yeah oh so this was I just looked up this process ID to figure out what its name was so in proc right you can look up stuff by PID there's a status file that'll tell you all sorts of stuff about the process one of the things is the name so I just print out the name yeah okay so so fork just returns one value but now there's two processes no so there's two there's two so there's our running process which is you know three was it three five five three right this is executing it hits the fork call fork and then it creates a new process called this and they're both executing at the next line but in this process PID equals three five five four and this one PID whenever it continues execution so now there's two things executing at this point right after the fork one's going to execute with PID equal to this which was the original process one's going to execute with PID equal to zero which is the new one yeah yeah you created this it's just the return value from fork you're not changing your own PID yeah it's just a return value yeah yeah I'll go no so at this point all right this is important right after the fork there are now two exact clones of each other right there are two exact clones one in one so you can you have to think about two different executions so in one process ID is like is this one right so this one will execute until main and then this one will execute until it exits from main so there's going to be two they it's not you know the same value there's now two copies of the exact same thing and the only difference is the is the value of this variable to your back first yeah so so after the fork right there's now two running processes the original one and the new one and the way I can identify what the new one is is the value of PID is zero so I'm only printing off one PID here so it's only this returned here I'll execute again so only this returned represents the value of that 
got only this returned is the value from fork so the other ones are just I did system calls right there's get process ID system call and then the parents process ID system call so there's three numbers for both of them they're all in the same order it's what was returned from fork what's my process ID and then what's my parents process ID the process ID and PID mean the same thing so it just it's just a number so the return value from fork in the original process that created one is going to it's process ID is going to be equal to the process ID of the newly created process so the return value in this if we call this the original process it's the return value will be equal to the child and then in the child by definition right the it's process ID return from fork is going to be zero right so yeah so yeah so PID don't get confused with the term process ID process ID is just a number every process has its own unique number yeah yeah no exact copy from when the fork happens so okay holy crap all right yeah so that's the process ID so a process ID is just a number to identify it every process has a unique ID yeah so I called it again so the numbers I used here were used before so it just generates new unique numbers so this one 3553 is the parent and then 3554 is the newly created one which is the child so so the three thousands and stuff that's the process ID it's just a number there's no difference between like the name could be the name of the command you've had but process ID is just a number that identifies uniquely what the process is no they stay the same right and then in one I get yeah okay so sorry so this is the running process right this is the actual executing process the child right yeah would yeah so this is a running whoops so there's only before fork there's only this and it is executing it has process ID 3553 the process ID will never change this is one process and then it goes it calls fork and then the return value of fork is 3554 and this is a new 
a newly created process, and at the time of the fork they're exact clones of each other, and then the return value it sees from fork is zero. No, so it's right after the fork. So here, let's see, let's do something more fun, because we're running out of time. All right, so here, before questions, I will create a variable. This variable is called x, and x equals 1, right? So, if what I'm saying is true and everything is an exact copy... and here I'll even do a print statement. Okay, so here it says "I set x". Right up to the point of the fork, we all know what's going to happen, right? It's going to declare an x, set it to 1, do a print, and then some magic will happen after fork, but before fork we're all good, right? So if I execute this just by itself, yeah, it's unused, if I execute this by itself, how many times am I going to see "I set x" printed out? Once, right? Because it's before the fork, so we're still in our normal world. So if I printf after the fork, how many times will that get printed? Two, right? One for each process. Now let's do the fun part: parent. Okay, now if I do that, we should see "after the fork" printed twice, because there are two different executions. One will go into the parent branch, one will go into the child branch, and they'll both print off x equals 1, right? Hopefully. Okay, so they both print off x equals 1. Yeah? Yeah, so you'd call exit or return from main; that's the only way to exit a process. Yeah, so like here, right, in the child the first thing I could do is be like, ah, this is too confusing, I'll just exit immediately, and in that case that process will die immediately and nothing else gets printed, just the parent stuff. Is the process ID what? Yeah, process IDs are just integers. Yeah, if you want. Okay, all right, let's settle down for two seconds, because it's about to get even worse. So here I'll say x equals 2. Now what happens? In the parent I will set x equal to 2, so in the parent, hopefully, x equals 2. What does someone
think this will print one right because it's an exact copy at the fork but then one of the our principles is that each process should be isolated from each other so if I go ahead and run this which is fun I'll see that in the parent x equals two and then in the child x equals one then you might be saying oh what about if it's just you know updating that variable first well we'll make sure it writes so that so we can just put the child asleep for a second so we're guaranteed that the parent writes that value that updates x to one or sorry updates x to two and then prints out its value and then we'll see if it changes so who thinks in the child that x is now equal to two all right so a process should they should be complete copies at the time of the fork and then other than that they should be isolated from one another so if the value changes if you can affect the value of one program from another one that's bad so if we do that we can see that hey in the parent x equals two and after hey x equals still one in the child because it never changed it right and to even illustrate this even further that you might be like oh well you know what about the address of it right what about if it's like pointing to a different address so let's see what the address is of x and both of them so if it's an exact copy like I'm saying the addresses should even be the same which should be quite remarkable and then if I do that whoops I get rid of the sleep whoops ah so if I do that and I look oh Jesus if I do that and look at all the addresses the address of x in the parent is blah blah blah blah a 24 hey in the child it's blah blah blah a 24 they're exactly the same yeah so they're both the same address right but they're not changing the same variable because we just tested that so these are not these are obviously not physical memory addresses then these are all virtual memory addresses so when a fork happens I said exact copy and that's down to the pointers down to what they actually 
point to so the operating system right the kernel has to magically fool you about this because they look exactly the same but clearly they can't be pointing to the same physical address because that makes no sense right so the kernel which is what you have to do in this course you have to make this illusion true so they're both mapped to different addresses but because they're virtual addresses they look exactly the same no so because they're virtual addresses there's right they're like a layer of indirection they can point to whatever memory so in one process that address points to some block of memory and then the other process that same address gets mapped to a different block of memory so within a process it will stay in the same spot after the operating system decides where it is so there's only two x's here so the one at the very beginning gets copied so then after it's copied there's two of them all right any other questions about this because we have one minute and it's going to get way worse so I don't even have enough time to go through this so here's another one that in main all it does is execute a loop four times calls this function called create process with a number and then it immediately forks so right so far there's only one program running so it immediately forks and then process ID is greater than zero it's the original process so thankfully for you guys it just returns and then the new child process goes ahead here prints off what process it is whoops prints out what process it is and then thankfully calls exit so nothing weird happens and if I execute that I should see you know all the different processes printing out all sorts of different values I guess I'll office hours that after this but two if I really want to break your brains if that's not bad enough which I guess we'll carry in the next lecture anyone want to hazard a guess what would happen if I did well one so let's see what happens when I do that which is a great way to end the 
lecture, because I'm about to break my computer. So nothing will print out. My thing just went blank; it's now unresponsive. What that does is, as soon as it executes, it calls fork. We don't even check the return value, because we don't care anymore. It creates a process, and then both of those, right, exact copies, would go back up to the loop, and then they both fork. Yeah, so now they both fork, so now those two processes each create another process. So you have two, then you have four, then you have eight, then you have 16. And you can do that to your friends: tell them to write that and execute it. No, so it's here. Yeah, see: "Resource temporarily unavailable". Fork failed because I ran out of room for more processes. All right, so we'll continue these examples later. I guess I have office hours soon-ish too, at like four. All right.