All righty, welcome back to 353, and thank you for joining me today. Today we get to talk about subprocesses. This basically ties together everything we've learned so far, and it will prepare you for exactly what you will see in Lab 2, including some mistakes you will make. We're basically implementing the subprocess module in Python, which lets you create and manage processes, because guess what? Python has to use all the same system calls we do. No getting around that. So this is the equivalent of what the Python module does, which will be somewhat useful.
What we want to do today is slightly similar to what we had before: we want to send data to and receive data from a process. Before, we just sent data to a process and it was able to read it. Now we want to do both at the same time. So for today, these are our three goals. We want to create a new process that launches whatever program is specified as the command line argument, send the string testing with a newline to that process, and then receive any data that the new process writes to standard output, so that if this was a more involved program we could use that data for further processing or something like that.
We saw that execve is kind of a pain to use, so there are C wrappers for execve that are a bit more convenient. They all start with exec, followed by some different letters that signify what each one does. One of them is execlp, which is slightly different from execve. The l means it takes a list instead of a vector or an array, because that was kind of annoying. The p means we don't have to give it the entire absolute path to the executable, starting from the root directory. We can just give it a name and it will use the PATH environment variable to go find where that program is for us. So it saves us a bit of typing.
So with execlp, the first argument is just the name of the program to run, and it does all the work of searching for it. Then, instead of an array of C strings, we give it a variable number of C strings until we give it a NULL, which signifies we're done. We don't have to bother with an array, so it saves us a bit of typing. After all that magic of looking for things and converting things into an array of C strings, it's eventually going to do execve, because that's the system call. So it has all the same rules: it doesn't return if it's successful, it returns negative one if it fails, it sets errno, all that fun stuff. It just saves us a bit of work, and that bit of convenience is basically what the C wrappers for these calls give us.
The final APIs to go over are dup and dup2. We saw dup before: it duplicated a file descriptor and gave us a new one. It creates a new file descriptor, and whatever the argument we gave it referred to, the new file descriptor also points to that thing. So if it was the read end of a pipe and we dup'd it, we'd get a new file descriptor back that also points to the read end of the pipe. There's also a dup2 call that is a bit more convenient, because before, in one of the examples, we just closed file descriptor zero and then reopened one. That was a bit of a pain, because when we get into more complicated processes, something can actually happen between you closing it and you opening it, so you're not guaranteed to always get file descriptor zero. With dup2, the old file descriptor is the one you want to make a copy of, whatever it is pointing to, and the new file descriptor is the new number you want to point at it. So you directly control which file descriptor number gets returned.
And the rule for this system call is that it happens atomically, so it can't be interrupted in the middle. We'll get into why that is important later when we have multiple threads and things like that. For now, just know that it's atomic and it ensures that both of these file descriptors point to the exact same thing. And if the new file descriptor was open before, like it was pointing to something valid, it's going to close it for us so we don't have to remember to close it.
Yep. So for dup, you'll just get back a new number. It should be the lowest available number. But when we get into it later, when you have multiple threads going on, you might not know what dup returns when. If we only have a main, like before, where we just closed zero and then dup'd, we're confident we'll get file descriptor zero back because it's the lowest free one. But sometimes we can't make that guarantee.
Yep, so both of these return the new one. With dup, you don't get to pick what the new one is. It just gives you the lowest available one. With dup2, it returns the new file descriptor you asked for, so you control which file descriptor gets replaced, which we'll find useful.
Yep. No, in both of these, old is left alone. That's the one you're copying from. The new one, if it was already open, gets closed, and then you're making the new file descriptor point to the same thing as the old file descriptor. Yeah, if that number you pick, because it's just an integer, already represented a valid file descriptor in that process, it gets closed first.
Yep. So the question is, why won't I know what file descriptor gets returned from dup? That will essentially come up when we have threads and multiple things can be executing in a process at the same time.
Just like you don't know the order between processes, you also wouldn't know the order between threads. Yep. So threads are like different parts of it? Yes. So we have this confusion with multiple processes going on at once where you don't know which one will execute, but it's not too bad because they're independent. You just don't know the order. With threads it's the same problem, except threads live within a process, and bad things might happen because you don't know the order of things, and they all share virtual memory, and things get really messy, which is why threads come much later. So don't worry about threads for now. We're still in the nice land of a process that only does one thing at a time.
All right, so now we get to tie everything together. This will be fun. Here is our program again, and this will essentially be a small version of Lab 2. Here I check that I have two command line arguments, because I want to run this as something like subprocess and then the name of a program to run in a different process. For example, if you haven't seen the uname program before, if I run it, it just prints the name of the operating system it's running on. In this case, I'm running Linux. If you're on your virtual machine, you're running Linux. If you're on macOS, I think it might print Darwin, and you might be like, what the hell is Darwin? Darwin is the name of the kernel in macOS. So this command also runs on macOS.
So this is what I want to do, and I already have some of the code written. In main, I check that I have two command line arguments. I created two arrays for pipes, because that's how I'm going to send data to and receive data from the process. It's about the only form of inter-process communication we know, so we're going to use it. And then here I fork.
So I just create a new process, and to make things a bit more readable, if my process ID is greater than zero, I call this parent function. Only the parent calls it, and it gets both of the pipe FD arrays and the process ID of the child. The child function also gets those arrays, plus the first command line argument, which is the program I actually want to execute. Right now my parent does absolutely nothing, so it just returns from here and main returns zero. That's exit zero, and the parent's done.
In the child, it does execlp, and here the first argument is the name of the program. We just tell it the name. We don't have to try and find where it is, execlp will do that for us. And by convention, the next argument, which becomes argument zero of the new program, should also be the name of the program. We saw that before, whenever I changed it and got a help message that was slightly different. So I'll just follow convention here, and then I give it no other arguments.
So if I compile and run this, it's called subprocess, build, whoops, too many L's, and I give it uname. What should happen when I do this? Sorry? Prints Linux, right? So if I run this, it prints Linux, but it's not the process I ran that prints Linux. It creates a child, and that child shares all the file descriptors of its parent, including standard out, standard error, standard in, all of that stuff. Then the child process does execlp of uname, so it prints directly to the same standard out, because it's shared by both processes. So if I run this, all I see is Linux, same terminal, same everything. So we're all good so far.
Yep. Sorry? So the question is, don't I need two arguments in the main function? Yeah, I have two arguments right here. This is the first argument, this is the second. The convention is that the program name is always argument zero. Yep.
Oh, why did I do this to initialize the pipefd arrays? That's just a way in C to zero-initialize the entire array. I zero-initialized it because I'm just used to initializing things.
Yep. So right now I've just started executing main, so all I know at this point is that I should have zero, which is standard in, one, which is standard out, and two, which is standard error. That's it. I didn't actually create new file descriptors, because I didn't make any system calls that gave me a new file descriptor. I just set aside some arrays because I'm going to use them. Okay.
All right, so the next step. Whenever I run this with uname, my child process is the one actually printing, and I don't want to see that in my terminal. I want to capture it, because maybe I want to do some processing on it. So I need a pipe, because I want to make sure that whenever that uname process writes to standard out, I can capture that output and use it in my process. For that I need some inter-process communication, and we only really know what a pipe is so far. So here I'll do a pipe system call: pipe, with out_pipefd, and then check for an error.
Now, I did this before the fork, right? So what I'd assume is that after the pipe system call, as long as it's successful, I have two new file descriptors, one for the read end of the pipe and one for the write end. So now in this process, file descriptor three, which is stored in out_pipefd at index zero, would that be the read or the write end of the pipe? Quick quiz. Read. The easiest way to remember which end is which is that they kind of look like the standard file descriptors. Standard in is always input, so you're always reading it, and it's zero. Same for a pipe: index zero is the read end. They kind of look the same.
So file descriptor three is the read end of out_pipefd, or I'll just call it the out pipe, and then we'd have file descriptor four, which is stored at index one, and that would be the write end of the out pipe. I did that before I fork. So after I fork, since the two processes are exact clones, they both have all those file descriptors open, and after the fork they're completely independent.
So if I run this again, what do I expect to happen? I created a pipe and that was pretty much it. So nothing changes, right? It does the same thing as before. I just created a pipe, and both of my processes could use it, but they never did.
And then, for the question of what execlp does: execlp does the same thing as execve. If it's successful, this child process just turns into, in this case, uname, or whatever argument I give it, and it starts executing that program with that same process ID. So in this case, this program forks, makes a new child, and then in the child we do execlp. So it just runs this uname program, which eventually prints Linux to standard out.
So if I want to capture its output, I should force it to output to the write end of the pipe, right? I want to take the output the child process is generating and capture it so I can hopefully do something with it. That is where my fun dup2 comes in. With dup2, the first argument is the thing you want to copy. So I choose the file descriptor that's pointing to the write end of the pipe, because I want this process to eventually write to the write end of the pipe. Here the write end of the pipe is out_pipefd at index one, and what I want to change is file descriptor one. Oops, wrong argument. So what this will do is take whatever this file descriptor is pointing to.
So that is the write end of the pipe, and it will also make file descriptor one point to the write end of the pipe. So in the child process, if that process tries to write to file descriptor one, instead of going to the terminal it's going to go to the write end of the pipe. I'm just replacing what it points to.
So now if I run that, I see no output. I'm still running that process, and I didn't change what that process is doing. Whenever I do execlp and start running that program, it's just going to write Linux to standard out. But because I did a dup2 right before the execlp, I changed what standard out was pointing to. It's now pointing to the write end of the pipe. So whenever that process does a write system call to standard out, it goes to the write end of the pipe instead of the terminal, and I can no longer see it. Questions about that?
All right. Before, my advice was to close any file descriptors as soon as I don't need them, as long as they're not the standard ones. So I should probably heed my own advice here. What file descriptors am I done with? Well, I definitely don't need the read end of the pipe anymore in the child process, because my intention is that the parent reads from the read end of the pipe, and that's how we get data back out. And technically I'm also now done with the write end of the pipe through its original file descriptor.
So at this point, oops, actually yeah, that's fine. When I made this child call, these were all my file descriptors and what they were pointing to. Right after the dup2, I essentially just made file descriptor one also point to the write end of the pipe. So my file descriptors, oops, look like this right after that dup2 call.
Yep, I'm not reading anything right now, but if we were to read from that file descriptor, we would read whatever information gets written to it, right? Because the pipe is basically just a buffer that's managed by the kernel. I want to read from it in the parent process to get information out of the child, but in the child I don't want to read, right? Because pipes are just one-way communication channels. So my intention is to have the child write to it and the parent read from it.
All right, so this is what it looks like after dup2. I'd still have five file descriptors, and all the dup2 did is change what file descriptor one refers to. After that, I don't actually need file descriptors three and four, which are stored in out_pipefd at index zero and index one. So after the closes, my file descriptors look like this right before the execlp. Yep, and right now I haven't touched in_pipefd. I'm only concerned with getting information out of that process, so we haven't touched in_pipefd yet.
Yep. So when we close number three, does that close the read end itself, or just that communication channel? And the related question is, after the close, why can I still use the write end of the pipe? Why can I still use one? Well, file descriptors are independent. A file descriptor just refers to something. Just because two file descriptors refer to the same thing, if I close one, that only means I can't refer to that thing through that number anymore. It's essentially like deleting a pointer. The easiest way to think of file descriptors is that they're pointing to something. Each time I get a new file descriptor, that's a new pointer, and if two of them point to the same thing, deleting one doesn't affect the other. Okay, perfect. Yeah, the easiest way to think of file descriptors is as pointers.
No, so right before we do execlp, I deleted it for some reason, my file descriptors look like that. Which is exactly what I wanted, because I can't change whatever that program does. All I know is that whatever it outputs, it writes to file descriptor one, and whatever it inputs, it reads from file descriptor zero. So I set up my file descriptors so that, to capture its output, instead of going to my terminal, whenever it writes to standard out it is forced to write to the write end of the pipe, and that way I can read it in the parent, which we'll do next.
Yep. Yeah, so technically the kernel manages what the pointers are pointing to, and each file descriptor is just a pointer. So the question is, how do I free the block of memory associated with the pipe, or something like that? The kernel's smart: when you close, it's essentially like deleting a pointer, and the kernel knows that if nothing is pointing to that thing anymore, it can delete it. That's done by the kernel, and it keeps track of it. And that's also how it knows whether it's possible for a pipe to get more data: if there's at least one pointer to the write end of the pipe, then something could still use it and fill it up with data. So that's one of the things it uses to determine that a pipe can no longer get data.
All right, let's do the other side. In the parent, we want to get that information out. So we'll create a buffer of our magical number and do a good old read system call. I can now read from the read end of that pipe, and I should be able to capture whatever the output was from, in this case, uname, if I give it the uname argument. So let's just do a read with our fun buffer. Whoops, I hit the wrong button. All right. So now we read, and let's check for an error to make sure I didn't screw anything up.
So after I do that, I should have read the data that came from my child process, which, if I give it the uname argument, should be Linux. Right now I don't do anything with it, so let's just print what I actually read. So printf, read, with our special string formatter, and then the int bytes_read and the buffer. All right, so now if I run that, I actually read Linux. It wrote to the write end of the pipe, and then in the parent process I was able to read what it wrote. In this case it just read Linux, but now it's kind of extensible. I could build on this and use the actual output for something useful. I could run it with ls or something like that, and it would say, hey, read, followed by the output of ls. You could actually process the output and do something more useful with it than what I'm doing.
So questions about that? Yep. So the question is, does printf always go to the terminal? The answer is no. printf will basically always eventually call write on file descriptor one, so it goes to whatever file descriptor one is.
Yes. So the question is, how did we get a message if we screwed up file descriptor one? Remember, right after the fork, both those processes are independent. They looked the same, so they had all the same file descriptors pointing to the exact same things, but after the fork they're independent, because I have a parent process and a child process, right? In my child process, yeah, I made file descriptor one point to the write end of the pipe, but in the parent process I didn't do anything. Whatever the child does to its file descriptors does not affect me. So when I started this parent function, these were all my file descriptors, and standard out was untouched. By default it's just the terminal, and I didn't touch it.
If I went here, like in the child process, which is going to be a fun way to debug things as soon as you start screwing with file descriptor one: let's say I did a printf because I started debugging and tried to throw in a debugging line. Well, guess what? If I run this, oh, in this case it's even worse. I don't even see the debugging line. It won't go to my terminal, because I changed what file descriptor one is. And printf also isn't guaranteed to do a write system call unless I flush it and really mean it. So in this case it didn't even do a write system call before I started executing a different program, which is even worse.
If you want to force a write system call, you can flush, something like this. I believe if I do this and run it, yeah. So now I got debugging, because I did a printf, it wrote to file descriptor one, and file descriptor one is now the write end of the pipe, so I just filled the pipe with that. So if you're used to printf debugging, printf debugging doesn't work if you change what file descriptor one is. So good luck debugging.
Yep. In that case it would just write to standard error. Yeah, the reason there's a standard error is that it's generally nice to have them separated, so if you don't want to see the errors, you don't have to. By default, standard out and standard error are both the terminal, so you see them both and it doesn't really matter most of the time, but you can split them if you want.
Yep. So when I flushed it, I forced that write system call, and in the parent I just do a single read call. I don't keep reading until there's no information left. So yeah, that flushed write is what the read picked up.
Yep. So when I just had my printf here and I didn't see anything, well, printf isn't guaranteed to do a write system call until it wants to. Did it ever do it?
Yeah, in this case it never did, because printf can buffer output and wait until it's optimal to do a system call. When I just had the printf here, it was waiting around, and then I did an execlp, so I immediately started executing a different program and it's never going to come back. printf was like, okay, yeah, I'll get to it, I'll do the system call in a bit, and then I replaced the program, so it's never going to happen.
Yep. Yeah, so that's a good question. If I did not close these, and say I had file descriptors three and four, would uname still have those open? The answer is yes. It's the exact same process, so it would still have the file descriptors open. It's generally considered rude to have more file descriptors open than necessary. And no, uname wouldn't have any code that closes those file descriptors, so the only way they get closed is when the process terminates, and then all its file descriptors get closed.
Yep. Why not just use the write system call instead of printf? That would have done the same thing as the flush. After you take this course and you learn that printf doesn't always execute a system call right away, which is kind of a pain for debugging, you might start debugging by just writing to file descriptor one, because then you're sure the system call actually happens, instead of printf, which kind of sits on it.
So the question, and I don't know if we can cover it in this lecture, is why printf doesn't just do a write system call immediately. Remember, system calls are slow. So, slightly off topic, I could have something like a for loop from one to a thousand that just does a printf of a star or something like that. So if it didn't buffer, and that's what this is called.
Just buffering, collecting a bunch of output. Without it, that would be a thousand system calls, and that would be slow. So what it's going to do is build up the data and wait until, essentially, a newline, or until all thousand characters are already in that big array, and then do a single system call. It does that for performance reasons. Think about all the times in your first-year course you've called printf with just a character or something like that. Your program would be very, very slow if it immediately did a system call each time. But it becomes a pain in the butt for debugging. So if you ever think you've gone crazy and you're like, no, my program has definitely hit that line and I don't see anything, it might be because printf hasn't done a system call yet. So either flush your output or, since you've taken this course, you can do your own write system call if you want. And if you're worried about something weird happening to standard out, you can create your own file descriptor with a file or something and always write your debug messages to that, and then you know nothing else can screw with it.
All right, whoo, that was a side journey. So where are we? Now we have the information in our parent, so we've successfully gone through one cycle of this: we are getting information out. I should also close the file descriptors I don't need anymore in my parent. The parent never needs the write end of the pipe, because that's only for the child, so I should close that pretty much immediately. And after I'm done reading from the pipe, and in this case I only read from it once, I should close the read end right away. Remember that tip, because it will help you through Lab 2 and keep you from going crazy. So now if I run that, I get its output. All right, so now we have to do the other side of it.
Quick. We want to be able to send, and in this case I said send the string testing with a newline, to that process. So we need another pipe, and we'll call this one in_pipefd. Now I have a lot of file descriptors open. We'd have all the way up to six or something like that. But the same rules apply. In order to send information to the process, right before I do the execlp, I want its standard in, so its file descriptor zero, to be the read end of my in pipe. And that should be it. So let's do the good old computer engineering special: copy, paste, and replace out with in. Aside from that, all I want to do is replace file descriptor zero with the read end of the pipe. So here I have my dup2. I take whatever the read end of the pipe is pointing to and make file descriptor zero point to that too. And after that, same principle as before, I can close the extra file descriptors.
Yep. So what's the point of using two, can I just use one pipe for this? We'll go over quickly why you cannot do that, but basically if you don't have two separate pipes, you don't know which side is supposed to be writing and which side is supposed to be reading. So here, quick, say I did something silly like run cat. If I type into it in the terminal, this is fine, because whatever I type goes to standard in, it reads it, it outputs it, that's fine. But cat reads in a loop, over and over again, and if I only used one pipe, it would be reading from the same pipe it's writing to. So it would create an infinite loop.
Let's see if I can draw that. Say I had a pipe, like a Super Mario Brothers pipe, that's cool. So this is the write end of the pipe, right?
So I'll have an arrow there, where I'm filling it with data, and what poops out of the pipe comes out the read end, right? Say my process was something like cat, with its file descriptors zero, one, and two. Say its file descriptor one was putting data into the pipe and its file descriptor zero was reading data from that same pipe. And then in my other process, my original parent process, what did I call it, subprocess, say it just wrote one character in here, like an X. It would write into the write end of the pipe, and now cat is stuck in an infinite loop. It reads the X that I wrote in, then writes out the X, which goes back into the pipe, so it reads the X, writes the X, reads the X, writes the X, and it stays in this loop forever.
Yeah, two pipes do not share information. That's why I'm creating two pipes, because I want them separate. So with two pipes, this process throws its data into one pipe, and any data that comes out of that pipe I read in this process, and then I create a separate pipe, essentially like this. Whoa, that's getting funky. So I create a separate pipe, and because they're independent, subprocess can write data to this pipe that can be read by cat, and they won't interfere with each other. And because I forked and all my file descriptors were pointing to the same things, I can manipulate them however I want. Oh, okay.
Yeah, so it would get into an infinite loop because the implementation of cat just keeps reading and reading until there's no more input. Okay. And because it would also constantly write into the same pipe it's reading from, the pipe would never close, and it would just get stuck in its own infinite loop. Okay, yep.
If subprocess was also a cat, then yeah, that'd be the same thing as when we had our one process and forgot to close the file descriptors and it just hung there forever. They'd both just hang there forever, waiting for input.
All right, we have to speed run this now. So I have in_pipefd, and I want to give data to it. So let's create a string, testing, and a size_t for its length. I want to write to in_pipefd at index one, the write end: the string and the length. All right, screw checking for errors. I'll close it after I'm done with it, and then I'll close the read end. Whew, okay. So this whole mess of code should just write data to the in pipe, and my child process, because I've set it up so its standard in is the read end of that pipe, should be able to read whatever I give it right away.
In the case where I run uname, well, I just read Linux. I sent it a string, but uname is a little dumb program that doesn't read anything from standard in. So I sent some information and it just ignored it. If I run it with cat, cat reads from standard in and then outputs to standard out. So it reads from the read end of the in pipe, gets the data I sent it, that string testing with the newline, and then I read whatever it sent out. So I can actually verify that it works.
So real quick, what we should ask to tie everything together is: we created a child process. Was I a good responsible parent? No. How do I become a responsible parent? At least in this course. Yeah, I should wait on the child. Does it matter what line I put the wait at in this case? Yep. So probably at the end, that would be good. All right, I know its process ID, so I'm just going to waitpid on child_pid. I don't care about the status and I don't care about the options.
All right, so I'll just wait on it and then print wait okay, sure. So now if I run this, let's see if I'm a responsible parent and cleaned it up, because before I was potentially creating a zombie orphan, or at the very least an orphan. So now in this case, oh, there, I successfully waited on it, so my child process was done.
In fact, if I wanted to, I could also move this waitpid up before the read. I can let the child die first and just let it fill up that buffer. I don't have to read that data before it's dead. So I could move the waitpid here and nothing bad is going to happen, as long as its output fits in the pipe's buffer. It doesn't really matter, because the pipe is managed by the kernel, so it outlasts that process.
But if I put it, let's see, up here, good things will probably not happen, because it will just get stuck. So why does this get stuck? Yep. In this case the child doesn't exit, because remember our child process is cat, so it constantly reads from file descriptor zero until it can't get any more input, and the way to signal through a pipe that there's no more input is that no process has the write end of the pipe open. So if I put my wait right at the beginning here, before I close the write end of the pipe that I use, well, the child could still get data. It is never going to detect end of file, no more input. So I'll just sit here forever waiting for my child process to die, while it's just waiting for input.
Yeah, I guess we're out of time, crap. All right, so with that, just remember, I'm pulling for you, we're all in this together.