All right, welcome back to Operating Systems. So today we are going to talk about process management. But before we get into that, let's go to our execve example that we didn't have time to do last lecture. So last lecture, we figured out how to create processes. You guys figured out how to create too many processes. My laptop was very hot by the time I ran to the train, and then it didn't work when I opened it. So that was great, thank you for that.

So we learned how to create processes with fork. Now we need the other side: how to start running a new program. That's what execve does. So let's start just reading this example. We start at main, and we have a printf. At this point, we still have one process running this. It's going to print once because we haven't encountered fork or anything new yet, so nothing surprising happens. Then we encounter this execve system call, which is a bit special. Its first argument is a path, a fully qualified directory and file name, which contains the actual program to run. So that will be an ELF file. We all know about that all the way from lecture two.

The second argument is argv. And that looks a bit different from the argv you get in main in C. This is the system call version, and C does a bit of massaging to hand you an argc alongside argv, so you don't have to search for the end yourself. The way they define this argv array for the system call is that it's an array of C strings. And again, a C string is just a pointer to a char. C strings use a null character to signify the end of the string, while this array should have a null pointer at the end to signify the end of the array. So here I will just give it one argument, ls, and then say I'm done. The third argument to this function is the environment variables. They're kind of like global variables for processes.
We won't deal with how to massage them or pass them because there are wrappers that just do it for you. So we don't really have to bother with this. For this example, we'll just set it to zero. And then this execve system call, if it is successful, will transform this process and start running another program. So the process ID of this process will not change. If execve is successful, the kernel is going to load the new program into the process's process control block and then start executing it. execve never returns to whoever called it, because the caller just got replaced by something else. So if this is successful, it starts running ls, and whenever ls quits, that's when the process quits.

Otherwise, if it returns negative one, it means there's an error of some sort. In this case, it would fall into this if branch, and it would also set errno to indicate what error actually happened, as part of the C standard library. Here I save it because I want to return it as my exit code, and perror might actually change it. So I just save it to a variable. And then I'll call perror, which will print out what that errno actually referred to along with some message that I give it, like execve failed, and then I'll just return from main, which will terminate this process. So I have two options: either this is successful and my process starts running ls, or there's an error and I return from here. There is no possible way for me to ever hit this printf line, because either there's an error or this process is reloaded and starts executing another program.

So if I go ahead and run that, I see I'm going to become another process, and then I see the output of ls. Any questions about what's up there? Yep. So the question is, why do I bother checking for negative one? Because this can have an error. So what's one obvious error you could put in to make this fail? Yeah, like a path that doesn't exist, right? That would be an error. In fact, yeah, sure, we'll compile it.
So it says error, no such file or directory. So that's the case where we would have an error. Oh yeah, that's a good question. Why do I even check that it's negative one if it's going to reach there anyway? It was just so I could point to a line and tell you that it will never execute. So in reality, right, I didn't have to check that it's negative one, because it has to be negative one to reach this point. So that is a good point. All right, any other questions here?

Yeah, yeah, that's a good question. So is this argv even necessary? Like if I just did this, would it work? Well, ain't nothing to it but to do it. It works fine. So that is a convention. The convention is that the first argument in your program, if you've ever read it, is supposed to represent what the user typed to run your program. And it might not necessarily be related to the file name. Some programs change their behavior based on what you called them as. Some use it to print things like help messages, so it's supposed to give you better context. So for example, if you did something like ls dash dash help, oops. So the first thing it tells you is its usage, and it gives you this nice little help message. But that's just a convention. So I could pass some other name if I really wanted. If I do something like this, let's compile, and then I will just scroll up to the top, and I'll see that its help message uses that program name. But it's just a convention.

In fact, this was actually the source of a major security flaw. So if you've ever used a tool like sudo, it escalates you to a superuser. Well, it read that first argument assuming that argv[0] will always be defined, but it doesn't have to be. You could just pass null, and then suddenly it's referencing invalid memory, and you can do odd things like become the superuser without having access. So they had to patch that.
So you actually found a security flaw: if you had played with this stuff, you could have gotten root and, like, hacked the NSA or something like that. So, all right, any other questions? All right, we found an exploit, cool.

So now to get into this lecture. If you have looked around in your lab one files, you might have realized that the whole state diagram I told you about isn't exactly representative of Linux. If you look at the /proc status file for a process and look at the state, it's not quite what I told you about process states in the last lecture. Last lecture was the generic terms that you use to teach students, because in the Linux terminology there are weird corner cases and things that they consider important that theoretically aren't that important, and we won't really get into the minor details. Just know that on Linux, if you see running or runnable, it represents both the running state and the waiting, or ready, state. Linux doesn't really make a distinction, because most of the time a process isn't actually in that state. Most of the time it's in one of the sleeping states.

So there are two blocked states. And remember, blocked means it's waiting for something to happen. The first is interruptible sleep. That means the process is not currently executing, but if we poke and prod it, it would be able to execute if the kernel really wants it to. And we'll see what that actually means a bit in the next lecture. Uninterruptible sleep is the other side of the coin. Even if the kernel wants to poke and prod it to run, that process cannot execute because it's actually waiting for something. So if your process is in this D state, it is not possible to get it to run. And then there is a stopped state you might see. There's actually a way for you as a user to just tell a process, hey, stop, you're not allowed to execute anymore. And we'll see the mechanism to do that in the next lecture.
And then you'll see this weird zombie state. I did not make up that term. That is a real term. And we will learn what that means today.

So you might also ask, well, if there's this parent-child relationship between every process, how do you start the first process? What kicks things off? So the kernel has just one job after it initializes all the hardware and it's ready to run user programs, which is what it's really meant to do. All it does is create a single special user process. And then as far as the kernel is concerned, its job is done, and it's up to that process to bring up the rest of the system. This process is called init. Usually it's at /sbin/init. And it is the ancestor of every single process on your machine. So it's responsible for launching every other process on your machine, either directly or indirectly. And it must always be active. If it exits, the kernel thinks user space is done. It thinks you're shutting down, and it will just shut down your computer and you'd have to reboot. On Linux, the specific project that does init is mostly something called systemd, but there are all sorts of different options, and some people treat the choice of init as a religious war. And it doesn't really have to be that complicated. And on some operating systems, like on Windows, you might find an idle process that the scheduler can run. That's a fake process that does nothing. It's basically just there to keep track of how idle your CPU is.

So this is what a typical process tree looks like on the virtual machine. It will always have init as the ancestor of every other process. Whoops. So here, let's say I have something like journald and udev that init launches. They're system daemons that just do system-y things that we don't really have to care about.
The one that you'll probably see is a systemd instance for your user that's responsible for launching and keeping track of processes for your user account. And within that, you might have something like gnome-shell, which, if you have installed a desktop environment with a UI and everything, is the process that represents your UI. That is what is drawing your UI. And then that process might have children, which would be something like Firefox if you have a web browser open. And then Firefox might have a bunch of different child processes. So in Firefox, and in how all the other browsers are architected, each tab runs in its own process. Because, well, it would be annoying if one of your tabs crashes because it's running JavaScript and does something bad, and that one tab brought down your entire browser. You might get upset. And so they were like, well, we can take advantage of processes being independent and make every tab in Firefox its own independent process. That way, if a tab crashes, it's fine, that process dies, but Firefox carries on and you don't get angry.

Yep. Yeah, so there's a comment that, well, if they all fork from init or something like that, can't they access init's resources or whatever? But remember that init would have to fork, and then, to run a different program, it would have to call execve, which reinitializes everything. So the new program couldn't access init's memory or anything, because execve would completely reinitialize everything. Yeah. And here in Firefox, well, the processes would share a bunch of things, but in order to communicate between them you would have to do some form of inter-process communication, because they're independent right after the fork. All right. And then on the other side, if you have a terminal open that you're typing into, you might have this process, a terminal server, open.
And then the subprocess of that would be something like CSH, which is your shell, which is what you actually type into, and which was the parent we saw when we ran our fork example. And then here, this would represent whatever process you're currently running. So CSH would have to fork to create a process, and then that process would have to execve the program. And hey, guess what? When we strace'd our little example in lecture two, the first system call was execve, and we didn't make that call, but it represents the start of our newly running program. So that's where that came from.

So on your virtual machine, you can use a tool called htop to see your process tree. You can press F5 to switch between the tree view, which shows you all the parents and children, and the list view. So you can see all the processes on your machine. And if you're running Visual Studio Code, there are a lot of processes running, which is why lab one is sometimes flaky if you're using VS Code.

So these processes are going to be assigned a process ID on creation, and it does not change. This process ID is just a number, and it is unique among all active processes. On most Linux systems, the maximum PID is like 32,768, and zero is reserved as being invalid. And eventually, after a process dies, the kernel will recycle its process ID for a new process. On some systems this is a configurable limit, so this isn't always true. This is just the default. So on your machine, it might actually be higher than this. And then remember, each process has its own address space. You might come across this term. The term address space just means that each process has its own virtual memory, or independent view of memory. So some people just call it an address space. So we have to maintain this parent and child relationship.
So previously I cheated a little bit and made sure that the parent exited last, because I had a little sleep at the end. I didn't want to show you some weird things that might happen. But today weird things will happen. So you might ask yourself, well, if there's this parent and child relationship and the parent exits first, so the parent process is no longer there, what happens to the child process? Anyone with a guess? It already is independent. Yeah, it might become a child of another running process. That would make sense, maybe. Yeah, so the comment is, if I run my process in my shell and then I kill it, the subprocesses are also killed? That is not true. They die for a different reason, which we'll get into later. But in general, when your parent process dies and the child process is still running, well, the child process still carries on. It just no longer has a parent.

The Linux terminology, or the operating system terminology, is very literal. This child process that no longer has a parent is called an orphan process. Very literal, it doesn't have a parent anymore. And if it's an orphan, it needs a new parent. Again, very literal. And one thing to always remember is that the parent process is responsible for the child. That's why it's not called a sibling relationship or anything like that. They specifically said parent and child because the way to think of it is that the parent process is ultimately responsible for the child.

The operating system is going to set the exit status when a process terminates. And again, remember, the way to terminate a process is by calling exit. But even though this process is no longer running, the operating system cannot remove its process control block and clean up all its resources yet, because that process needs to be acknowledged as actually being terminated. Yep, we haven't got to subreapers yet. Yeah, I'll hold that for a second. So there is, at minimum, an acknowledgement.
And there are two situations here. So if the child exits first, well, the parent was responsible for the child, the child has now exited, and it needs to be acknowledged. Until it is acknowledged, it is called a zombie process, because it's not quite alive and it's not quite dead, which I guess is why they called it a zombie process. It's terminated and it can no longer run, but the kernel cannot remove its process control block yet or free its process ID. Its process ID would be in the process control block.

One way you might think about this is, well, suppose there were no acknowledgement and I'm just polling over and over again to ask, hey, are you terminated yet? Are you terminated yet? Are you terminated yet? It might be the case that I poll, and I see that, let's say, process ID 20 is still running. Then between that poll and my next one, that process terminates, and a new one launches and somehow gets process ID 20. Now I would think that it's still running, that it never died. That is why there needs to be some acknowledgement: the parent needs to see that the child died, and only its direct parent can see that. The second situation is what we said before, which is the orphan process.

So the way to acknowledge a process is to use a system call called wait, and it has the following API. It has a parameter called status, which is an address to store the wait status of a process. So the kernel will write a value to this address, and wait will return the process ID of the child process, which has now terminated and been cleaned up. Like all these C wrappers for system calls, it will return negative one if it fails. It can return zero in the case where you don't want to actually wait for something (that's waitpid with the WNOHANG option), which is called a non-blocking call.
So you can just ask the kernel, hey, is the process terminated yet? And it will instantly return back yes or no. Otherwise, if you do a blocking call, it will wait until the first child process terminates, and then it will return a number greater than zero that represents the process ID of the now dead child. That wait status contains a bunch of information, including the exit code. You have to read the man page for wait to find all the macros to query that status. And you can actually wait on a specific process if you use the waitpid system call. That says wait for this specific process to terminate. Otherwise, wait waits for the first one.

So let's just get into the code example. In this case, we have main, and we immediately fork, so we create another process. So now we have a parent and a child. Here, we check for errors, hopefully no errors. And then we check if the returned PID is zero. If fork returns zero to me, am I the parent or the child? Child. Everyone should know that, like that. So here in the child, I sleep for two seconds, and then it goes outside of this if and returns zero. So the child process sleeps for two seconds and then finishes.

Now, in the parent, we're going to printf calling wait. We create a local variable called wstatus. That is a location that the system call can write to. And then we call wait and give it the address of that wstatus. So the kernel is going to write to that int. And when we call wait, we're going to wait for our child to terminate. So this will take two seconds to return. And when it returns, we'll get a wait_pid, which will hopefully be greater than zero, and it will be the process ID of our child. If you look at the macros, there's this WIFEXITED macro, because they pack a lot of information into this int. So you have to use these macros to first ask how it terminated.
And then, depending on how it terminated, different fields might be valid. For now, we can just assume that if a process terminates, it exits normally. That means it returned from main, so it called exit eventually. So we have: if it exited, then we print the return value of wait and the exit status, which we can read through this WEXITSTATUS macro. You're only allowed to use this macro if WIFEXITED is true. So if we go here and run this, it says calling wait, and we wait two seconds for our child to die. And then when our child's dead, it says wait returned for an exited process. Here's its process ID and here's its status.

Yep. Yeah, wait waits for your first child process to terminate. Yep, yep. So the question is, can I just call wait without passing any arguments to it? You have to at least pass this argument. I guess you can pass null and it won't write anything, but you have to pass an argument. The if statement here? So we do need that if statement, because there are cases where a process might terminate without exiting normally, which we'll get into next time. If I didn't have the check and I just read the exit status all the time, it might not be valid. So I just have a case where, if that's not true, I return some error code to signify I don't know what the hell's going on. But when we run it, it seems to work.

All right, any other questions with this code? Really nothing? The question is, how is it packed into the wstatus? And the answer to that is you'd have to read the kernel. Yeah, you can decode it yourself if you want. It might be in the documentation. No other questions, like what might happen if I do, you know, two waits? Anyone guess what happens then? Sorry, close, let's just run it. So it says calling wait, the first wait still runs, and then we overwrite its return value with the second one. And then, hey, guess what? We get a negative one, which means wait returned an error.
And if we did perror there to see what the error was, well, it's going to tell you, hey, you don't have any children, why are you calling wait? So you're only allowed to wait on your direct children, and I don't have any more children. I just had one, so I'm only allowed to call wait once. So technically that wstatus didn't get updated, because the second call returned an error. It just never changed. So technically that's the wstatus from the successful one. Any other questions, no fun things to do? All right, let's make some weirder processes then.

So a zombie process, then. Again, I told you that a zombie process is waiting for its parent to read its exit status. It means this process has terminated but hasn't been acknowledged. Again, it's in that limbo where it can't run anymore, but the kernel also can't clean it up and recycle its PID. You might also think of it as, the process may have had an error or something like that. The parent should read its exit status and do something about it if it had an error. The operating system can also trigger an interrupt to the parent process and poke it to acknowledge its child, and we'll get into how you write your own handlers for that in the next lecture, but this is just a suggestion from the kernel. It's an interrupt handler. You've written your own interrupt handlers. In your interrupt handler, you can just be like, nah, I'm good, I'll ignore it. I don't really care that a child is crying. Not a good parent. But again, remember that the operating system has to keep the zombie process until it's acknowledged. If the parent ignores it, the zombie process needs to wait to be reparented to another process, and that's the only way that something else can go ahead and clean up its resources.

So now we can talk about an orphan process. It needs a new parent. So the child process lost its parent process.
It still needs to be acknowledged by something, and by default, your operating system is going to reparent the child process to init. And then init is now responsible for acknowledging the child. So we've learned that init has two jobs, and now you could write your own init if you really wanted to. The first job of init is to launch every other process, either directly or indirectly, and its next job is essentially to wait on any process that might get reparented to it. So at the end of init, it could just have a while true, wait, and that would clean up resources.

Yeah, there's a question. Can you say that init adopts orphan zombies? And yeah, that is true. You could have a process that is a zombie because its parent is ignoring it. Then its parent dies, so now it becomes an orphan zombie process. It would get reparented to init, and then init would wait on it, acknowledge it, and then it can finally get deleted. Yeah. So by acknowledge, I mean someone calls wait on it and reads its exit status, or ignores it, but at least they have it.

There is some terminology here that's a bit more gruesome. So init gets the orphans by default. You can think of it as the orphanage if you want, but the name that the Linux developers use is a bit more gruesome than that. It is called the reaper, like the grim reaper. Yep. Yeah, but. So the question is, won't init have a bunch of children? And init will have a bunch of children if you create a bunch of orphans, but it would just have a while true wait, and it would clean them up as they come in. It would slow things down very minimally, because the process just has to get reparented, and then, if init is written properly, it would wait immediately. The kernel reparents orphans to init. init just calls wait to get rid of them. Yeah. Yeah, so I called init the reaper because it inherits all of the orphans to eventually reap them. Yeah, not gruesome at all.
This is where your Google searches get really weird. You can also designate a process to essentially become the orphanage, and the term for that is that you become a subreaper. That means any of your descendants that get orphaned will be reparented to you instead of init.

So we can go ahead and see that this is true through an example. Here is an orphan example. It will look almost exactly the same, and yes, the code is posted. So in main we fork, and then we check if there's an error. Hopefully there's no error. And then if we are the child process, we print our parent's process ID through this getppid call. Then we sleep for two seconds, then call it again and see who our parent is now. And then the child is going to return zero. And then in the parent, we are only going to sleep for one second. So initially, when the child calls getppid, the parent should still exist, and I should get the process ID of the parent. Then after one second the parent exits. So I lose my parent and I have to get reparented. I wake up a second after that, and then I ask who my parent is now. So if I run that, I can see that, hey, my parent's process ID is some really large number. And then when I wake up again, well, my parent is now process ID 1. So I got reparented to init and nothing killed me. It just waits until I finish executing, and then init cleans me up. So in this case, I just had an orphan process, not an orphan zombie. It got reparented to init and it was still executing. Then it exited, then init cleaned it up.

Any questions about that one? Yep. So it doesn't clog up init. Remember, wait just waits until the first process terminates. And the kernel knows when a process terminates. So it's not like init is busy polling or anything. It's very event-based, so it doesn't waste any resources. Yeah, so you just described a zombie process, right? So the child dies and the parent hasn't acknowledged it.
So a zombie is just gonna waste resources. At minimum, it's going to waste its process ID, because we can't recycle it. So eventually we'll run out of process IDs. And another piece of information the kernel has to keep is the exit status, which might seem like a small bit of memory to you, like two ints, but even two ints gets really big if you have a few million zombies. So yeah, this would become a problem.

Yep. So here it will work 99.99% of the time, because sleeping for a second is like an eternity to a processor. It doesn't matter what order things get scheduled in. When this sleeps for a second, the other one is definitely going to print before then, and sleeping for two seconds is a minimum, so it would definitely sleep for longer than the other process. So maybe if you were on a one megahertz machine this wouldn't work, but on a modern machine this will work more or less fine. All right. Any questions on that? So let's go to the other one then.

So, yeah, let's just go to the zombie example. In the zombie example we have, again, main, and we fork immediately. Hopefully this is getting boring, which one's which. We check for an error, and hopefully it doesn't have an error. If it's the child, we do the same thing as before and sleep for two seconds. Now in the parent, we sleep for one second, and when we wake up, well, hopefully our child process is still running. So we print child process state, and then I wrote this print state function, which goes into the child's /proc status file and reads its state, so we can see what its actual state is as reported by the kernel. Then we go ahead and sleep for an additional two seconds. So the parent process will wake up after a total of three seconds, one plus two, and then print child process state and call print state again. By this time, the child process should have exited already, and we should be able to see its fun state.
So if we run that, we get child process state sleeping, because it's asleep, and then it reports a zombie process, because it has now terminated. It is an actual term in the kernel. We haven't called wait on it, so we're just wasting resources now, because we can't even recycle the process ID or anything like that.

Yep. So the question is, where in the else branch does the child process exit? So the parent slept for a second, then did a printf and read the state, which is really, really fast, and then slept for another two seconds, and the child slept for two seconds only. So halfway through this second sleep is when the child process likely exited. The child process likely exited in the middle of this sleep, and then we printed its state and said, hey, now it's a zombie. Duh, yeah.

So you want me to wait, like, here, after the zombie print. So, I mean, we probably... Would that not work? I mean, it would work, but we probably won't see anything, right? If we print something and then wait properly, then it's going to be cleaned up at that point. So it's just going to no longer exist after the wait. We could move it, if we want to see that it no longer exists, by putting the wait here before we read the child process state. In this case, I think it's just complaining at me because I don't use its return value. Oh, it's an implicit declaration, and whatever, it's probably defined somewhere. You can probably find it. So in this case, if I run it, it should go sleeping and then unknown, because that process no longer exists, because I called wait on it, and nothing else reused that process ID. I got kind of lucky, although I'm not launching other processes, so that's why. And you can see how I wrote it. If it can't find that entry, I just guess that you ran it on macOS. So that's where that error message came from. Yep. No, I wrote the print state function. Yeah. Yeah, the print state one is something I wrote. All right.
Any other questions, now that you've learned about zombies and orphans and reapers and things that make your Google searches real fun? All right, so if I ask, can a parent fork an orphan, does everyone know the answer to that? Yeah, so it's called an orphan when it needs to get reparented. As soon as it gets reparented to something else, it's no longer an orphan. So hopefully it's not an orphan for very long. All right. No other questions or fun things to do with the code examples? Yeah. Yeah, so sorry, a process can't fork an orphan. Basically it's not an orphan, it's a normal child of that parent process. Yeah, so it would start off as a normal child, and then whenever the parent dies, it becomes an orphan process, and then it has to get reparented. No, so only a process can call fork, and then it creates a new process that's a clone of itself. All right.

All right, no other questions off this? Yeah, they could have used adopt, but they used reparent. Yep, so when the kernel reparents it, it just gets reparented to a new process. It doesn't really affect it that much. It can still execute. It might not even be aware it got reparented. It might not care. Sorry, can you speak up? So you're asking, does a child only get killed by something? So we don't know how to kill children yet. So we'll kill some children next lecture, and especially if, no, I won't say that.

All right, fine, fine, question to you then. So, let's see, how do I know how many times I need to call wait to be a good parent? Yeah, equal to the number of times I fork, right? That makes sense. Anyone disagree with that? So is there a situation, other than being init, where you won't know how many children you have? Nope, yep, yeah, other than reapers and subreapers. Or basically, in other words, is there any reason to excuse you from being a good parent and calling wait? Yep, if your child was killed by outside forces, you should still call wait on it. Yeah, so that's the question.
If I call execve, how does that play into things? So yeah, that would probably be a situation where you don't know how many times to call wait. The program that ran in the process before you spawned a bunch of children and then called execve, and you still keep the same process ID and everything, but suddenly you inherit a bunch of children. So that would be one case, yeah. But in general, you should know how many children you have, because you can only wait on direct children. They have to be processes you created by calling fork. So you should know how many children you have, right? It's kind of the same thing as being a parent in real life. If they're your children and you're responsible for them, hopefully you know how many you have.

All right, any other questions? Wow, the other lecture had lots of questions. I guess we covered everything. No, cool, wanna go home? Sweet, let's go home. Just remember, I'm pulling for you, we're all in this together.