All right, welcome back to 353. Thank you for joining me today in this cold lecture room. Today we're going to get a bit more practice using processes and see some more forms of IPC that we can actually go ahead and use.

So I looked at the Lab Zero survey. A lot of you want to implement your own operating system, which is great, but a little beyond this course. There is a teaching operating system where, by the end of this course, you should be able to read it and figure out everything that's going on in it. It's called xv6, and it specifically targets RISC-V. It's used in the MIT graduate operating systems course. If you want, you can go ahead, download it, and run it on your VM, which means, yes, you'd be running a VM inside your VM. So you'd have VM inception. It's basically a re-implementation of Unix V6, which is quite small and manageable, for RISC-V, in C.

One of the parts of it that we could now probably read (not that one) is their init process. Remember, the kernel starts up and it starts one process, one process only, called init. It's special: it gets process ID 1. For xv6, this is the entire implementation of init. It has a main; it's just a normal user process. It opens a special file called console and checks for errors. If that file doesn't exist, it creates it and then opens it. That special console file just represents something like a serial connection, some connection I can write bytes to and read bytes from.

Then there's this new system call we'll go into today called dup. It basically just creates a new file descriptor that points to the same thing as its argument. In here, it dups file descriptor 0, which is just this console, and the new file descriptor for this process would be file descriptor 1. Then it dups again, so we get file descriptor 2. And guess what? Those are standard in, standard out, and standard error.
So they're just created immediately. Then they have a for with two semicolons, for (;;). They could just write while true. All it does is print "starting sh", and then it forks and checks if there's an error. If it's the parent, or sorry, if it's the child process, so the new one, all it does is exec sh, which is your shell. If that exec comes back, it failed, so it just prints that it failed and then quits. And then the whole kernel is done, because this is init.

Otherwise, in the parent, it just calls wait over and over again, infinitely. Why does it do that? Well, because it needs to clean up all the zombie orphans it gets. So it's constantly calling wait, and wait will return whenever one of your children has terminated. Here it checks whether the process ID that terminated is equal to the PID it saved, so equal to the shell. If it was the shell, all it does is restart it so that we always have a shell to type in. Otherwise, it checks for an error. Otherwise, it was just an orphan process, so we do nothing. We just cleaned it up. We were the responsible orphanage, I guess. And that is the entire implementation of init, so we can see what init does. This operating system is kind of cool if you want to look at it. You should be able to read it. And if there are any questions, well, that's one avenue to get some answers.

So for today, we can talk about really old operating systems too, and something called uniprogramming. That's for old batch operating systems that only ran a single process at a time, which probably none of you have ever used. Uniprogramming just means only one process is running at a time. Two processes cannot run in parallel. They cannot run concurrently, no matter what. And those two words, parallel and concurrent, probably mean the same thing to you.
They are actually slightly different things when we talk about them in computer science, which we will get into when we get to our threading topic later. But for now, you don't have to worry about it. They kind of mean the same thing: two things can't actually happen at once. An example of a uniprogramming operating system is DOS, if you've ever heard of DOS before. Probably not. It only ran one process at a time.

Everything today, including your watch or literally anything, is multiprogramming. It allows multiple processes to run in parallel or concurrently. We have multiple cores now, even different types of cores, sometimes efficiency cores, sometimes performance cores, and modern operating systems want to utilize the hardware as best they can. So if we have multiple cores, we want to run everything in parallel that we can, and concurrently if we have a massive number of processes. We probably have more processes than physical cores on our machine, so we have to swap between them, usually pretty fast, so it kind of looks like you have hundreds of processes all running at the same time. The scheduler, which we will get into at the end of the week, decides when to do the switching.

To create a process, the operating system has to at least load it into memory. While it's waiting there, the scheduler (again, we'll see it later) decides when to actually transition it from that waiting state to the running state. But first, we'll just focus on the actual mechanics of switching the currently running process on a CPU. The core scheduling loop that changes the running process usually has only four steps. First, it pauses the currently running process so it is no longer executing.
Second, you save its state so you can restore it later: all the values of its registers, and the state of its virtual memory, which you probably wouldn't have to change. So basically you're just saving all of its registers. Third, the scheduler figures out what the next process to run is. Fourth, we restore that process's state, so restore the values of its registers, because they should be independent, and then we just let it run directly on the CPU.

For deciding which process runs, one option is to have processes pause themselves and give up the CPU voluntarily. The other is to have the operating system maintain control, which is usually what's done. Letting the user processes decide when they want to run and when to give up time is called cooperative multitasking, because in order for multiple processes to be running, they all have to voluntarily give up the CPU and they all have to cooperate. With one rogue process that decides to hog the CPU, well, suddenly it becomes like a uniprogramming system where only one process runs at any given time. In this model there would be a special system call to tell the operating system, hey, I don't need to run anymore, you can go ahead and change processes. You'll actually be using that in lab three whenever it comes up.

What actually happens with modern operating systems is called true multitasking, where the operating system maintains control and pauses processes, and you don't have any say about it. When you execute your process and it runs, you don't get to decide when it's physically running or not; the kernel maintains control of that. As for the mechanisms the kernel uses to maintain control: sometimes it will only let processes run for a set amount of time. Sometimes it will wake up periodically using something like timer interrupts and then do the scheduling.
So it would say, hey, wake me up every thousandth of a second or something like that. Then when it wakes up in that interrupt handler, it's going to do the scheduling there and decide whether or not to give the CPU to a different process.

Yep. Yeah, so specifically here, I use the terms interchangeably because, I don't know, I'm just used to it, but specifically this would be the kernel doing it, because the kernel has to maintain that hardware control. The kernel is definitely part of the operating system, but the operating system has more parts. This part would definitely be done in the kernel because it has to maintain control. If you gave user processes the choice to switch what's actually running, they could take control over the whole system and just run themselves, or, you know, if they really didn't like you, they could run something irrelevant and bring down your system, things like that. So yeah, I have a bad habit of using them interchangeably. Most of the things I talk about in this course are things that are done in the kernel.

Yep. So the question is, is sleep an example of true multitasking? In this example, sleep would just be true multitasking, but you can also voluntarily give up the CPU if you want. So you can have aspects of both, but ultimately, if my process calls sleep, the kernel gets to decide whatever happens with it. There are mechanisms by which you can voluntarily give it up, but ultimately you do not have a say over whether or not you run.

Yep. So the question is, do any modern operating systems use cooperative multitasking? The answer to that is, any general-purpose one, no.
Maybe for embedded systems where you want really fine control over it, you might do something like that. On embedded systems, typically all the processes running are under your control and you have written them, so sometimes you might want to do cooperative just so you can actually reason about what gets scheduled when. But for any general-purpose operating system it would all be true multitasking, because for security you cannot trust the user to do anything. And especially with even beginning programming, right? You don't want one infinite loop to hog your entire system so that you can't do anything about it.

All right, so swapping between processes is called context switching. We said that the minimum we have to save is all the values of the registers. So we have to save all the values using the same CPU we're trying to save them on, which is a bit weird. Sometimes there's hardware support for saving state, so it will save a group of registers. But sometimes you may not want to save everything, and depending on the CPU there might be more registers than you expect, like special floating-point registers, special vector registers, and so on.

Context switching, so saving the state of one process and restoring another, is something called overhead. Overhead just means it's not doing any useful work. The useful work I want to do is actually running processes. Any time I spend just switching between processes is, at least from the user's point of view, completely wasted, because there's no actual work being done that they care about. So we want that to be as fast as possible. Usually there'll be a combination of hardware and software support to save as little state as possible. One example would be: hey, if your process doesn't use any floating-point numbers, then I won't save any of the special floating-point registers, which could be humongous. So usually there's kind of a combination of hardware and
software just to make saving the state and restoring it as fast as possible.

All right, now we get to go into our new API. We talked about it a bit before: we put a bar between processes, we called it a pipe, and we said they could communicate through it. Now we can figure out what the shell is actually doing, because, well, pipe is an actual system call. So this is the new API we have today. It's a C wrapper for a system call, and that system call is pipe. It takes an array of two ints, and part of the system call will set those two integers in that array, and they'll be new file descriptors. It returns an int: zero if it's successful, -1 if it's a failure, and then it sets errno.

What this pipe system call does is create a one-way communication channel using two file descriptors. In this pipefd array, assuming pipe is successful, the file descriptor at index zero represents the read end of the pipe, so I'm allowed to do read system calls using that file descriptor. The file descriptor at index one is the write end of the pipe, so I can write bytes to that end of the pipe.

You can think of a pipe as just a kernel-managed buffer. Anything I write to it, the kernel stores internally, so no other process can access it directly, and then some other process can read from that buffer, which empties it out. You'll see data in the same order that it was written. So whenever you write data to it, it fills up that buffer in the kernel, and whenever you read from that file descriptor, you empty data from that buffer, and you don't have to worry about managing it. You could push megabytes through it if you want. A write may block at some point if the buffer is full and nobody is reading, but it should keep going for quite a while, and we'll get an example of that later.

Yep. So success in this context means it has created two file descriptors for us, and internally the kernel has a buffer that any bytes we write to it will go
into. No, so you just need to create it. Yeah, so success in this context is that it successfully created the file descriptors for you, and internally there's a buffer in the kernel that you can now access.

Yep. So in this case, one pipe system call is one buffer, and you get two file descriptors, one to fill it and one to empty it. If I did another pipe call, that would be another separate buffer that's independent.

Yep. So it creates them so that those file descriptors are valid, and it writes their values to that array. Within a process right now we have file descriptors zero, one, and two that are valid. If pipe is successful, it will create two more file descriptors that are valid, so likely we'll have file descriptors three and four.

Yep. Yeah, so the usage is that I can create a pipe and then fork, and because of the fork, both processes have access to the read end and the write end of the pipe. What we'll do later today is have one process write to the pipe and the other process read from it, and then we're communicating between two processes. We just want to have some type of inter-process communication; we want to be able to send bytes between two processes. Normally, if we didn't have pipes or files or anything like that, each process would be completely independent and we couldn't do anything special with them.

Yep. Yeah, in this case the only way to share these pipe file descriptors is through forking, so by being a complete clone of whatever process created the pipe. These pipes aren't named. There are other ways: you can do named pipes and things like that, so the processes don't have to be related. But for this, they have to be related, and we can only get those file descriptors through forking. Sounds weird.

All right, so before we go into the example, I'll have an aside. I think we went over this before, and I had a question about it, like using that ampersand in your
shell: what does it do? If you have an ampersand at the end of your command, that means your shell just starts that process and then returns immediately. It forks it, throws it off into the background, and lets you go ahead and type more commands into your shell. It will output the process ID for you, and it lets you know when it finishes, so we know our shell has to be calling wait on it. And this vertical bar character creates a pipe between two processes.

So if you want to sneakily fork bomb someone (we saw the fork bomb, which was basically while true, fork), well, this is an actual valid bash command that does the fork bomb and is just really impossible to read, so you can trick, like, first years into actually running it. What this stupid thing does is: in bash, for some reason, you are allowed to name a function colon. So this creates a function called colon, and inside colon it calls colon and creates a pipe to another colon, so that creates a new process, and then it throws it off into the background. Here the semicolon just lets me run a new statement, and then I just call colon to start off my fork bomb. So if you've seen this, do not actually run it. But if you want a quick and dirty fork bomb, there you go. I don't really know why I showed you that. Don't run it, or at least don't run it on yourself, and if you make someone else run it, be nice and fix it for them.

So let's do some inter-process communication, because with fork it'll tie together everything we learned about inter-process communication before. We're going to create a process that creates a child process, and then we're going to write some information to it via a pipe, and it is going to read from it. So let us get into the example.

A little setup: I wrote some code ahead of time just to save myself some writing. I just have this check function. All of the C system call wrappers return -1 if there's an error, so I just check:
if it's not -1, I just return. Otherwise I save errno, print out the message, and then exit immediately, in case I have screwed up beyond all recognition.

In my main, I create an array of two elements called fds, and then I call pipe and check that it has returned successfully. At this point, if I write out all of the file descriptors in this process, likely what I have is standard in, which is zero, standard out, which is one, and standard error, which is two. Because pipe just creates new file descriptors and they're numbered sequentially, I'm likely going to get file descriptor three, which will be the same as fds at index zero, which is the read end of the pipe. You just have to look up the documentation to know which end is which. I've already looked it up and memorized it, so I know that in that array the zeroth element is the read end of the pipe. I'd also get file descriptor four in fds at index one, and that would be the write (oops) end of the pipe. Those are all my file descriptors.

So now let's just fork, and I'll check the PID to make sure I don't have an error. Why do I have a red line? Compile. All right, don't know why I have a red line.

At this point I would have two processes, and both of them have all these file descriptors. At the time of the fork they are exact copies. The only difference is going to be the return value of fork: in the child it is zero, and in the parent it is greater than zero. Both processes, since they are exact clones, have the same set of file descriptors pointing to the same things. Afterwards, if I close a file descriptor, it only closes it in the process that actually called close.

So let's start off with if PID equals zero. Now I want to actually have some communication between these two processes, because after the fork
they're completely independent. What I would like to do in the child is just read some data. So let's create a buffer of some magical size and then a bytes_read. I'll do a read system call, reading from the read end of the pipe into the buffer, with sizeof the buffer. Then I'll check bytes_read to make sure I don't have any errors or anything like that.

After this, I want to print off what I read: "Child read", and, oh, what's the format? I think it's this, if you've seen it before. Yeah, okay, so that's how you specify a size to the format specifier. If you haven't seen dot star (as in %.*s), that's what it does. It's because this isn't going to be a C string; it's just going to be some array of bytes that I've read. If I make this an int, it should be happy. All right, so we're happy.

If I run this right now, then likely some bad things will happen. So if I run it, it returns immediately and it doesn't look like anything's happening. Why would that happen? Does it make sense to everyone why it just returns immediately?

Yep. Yeah, so I didn't write anything to it. In fact, what I ran was essentially the parent process, and the parent process in this example just creates a pipe, forks, and then nothing special happens in the parent. It just returns zero and goes away; it terminates immediately. So likely, if I check and I do pidof pipes (whoops), there's a process with this name, and that in fact is the child process. I accidentally orphaned it, it likely got re-parented to init, and now it's just sitting at this read system call. Remember, read is a blocking system call that will just sit there and wait until there's actually some data to read, and in this case, well, there's actually
nothing to read. You might also be thinking, well, this is stupid: that process is just waiting for another one to write into that pipe, and it doesn't really look like anything can write into that pipe. So let's kill it. Let's kill it violently, because that's fun.

It turns out that when you are using pipe file descriptors (remember what I said before), this read system call will sit there and wait until there's data, unless it reaches something like end of file and there's no input possible. In this scenario we think there's no input possible, because nothing can write to the write end of the pipe. But in fact the kernel disagrees with us, because this child process has the write end of the pipe open. It is a valid file descriptor, and it is theoretically possible that the child itself could write into the write end of the pipe and then read the data back, which is a bit weird.

So what we want to get in the habit of, whenever we're creating pipes and things of that nature, is closing file descriptors as soon as we no longer need them. After I fork, I should immediately close the write end of the pipe here in the child, because the way I'm setting this up right now, the child is only going to read from the read end of the pipe. So I should immediately close the file descriptor whenever I know I won't need it anymore.

If I go ahead and run this now, I should not create a horrible... well, that sounds bad. In this case I get some output from the child, so it probably terminated. If I look to see whether I created an orphan, I didn't, because the child actually terminated. In this case the original parent process immediately exits, so it frees all of its file descriptors. It doesn't have the write end of the pipe open, it doesn't have the read end of the pipe open; in fact it doesn't exist anymore. And in the child process, since we closed the write end of the pipe, no process has the write end of the pipe
open, and the kernel is smart enough to realize that, hey, if no process has the write end of the pipe open, it's not possible for any new data to appear in that buffer. Since read will return zero if it's not possible to have any new data, read in this case returns zero, and that's why we just have "Child read" with nothing after it. So we should get into the habit of closing file descriptors as soon as we don't need them. Otherwise weird things will happen, like your process will just sit there and be blocked forever in a read. So let's be nice and make sure we also close the read end of the pipe as soon as we're done with it.

Yep. For the pipe call? Yep. So in this pipe system call we're giving it the address of this array, right, this fds. As part of the system call it will create that pipe for us, so create that buffer in the kernel, and then set the elements of that array to be the file descriptor representing the read end and the file descriptor representing the write end. Okay, yeah, so does pipe overwrite it?
Yeah. Right now they're random values, so even if I'd set the fds to, you know, 23 or something like that, this system call will overwrite them.

Yep. So what we want to do is actually communicate between two processes, and I've only written half of it. I want to show how, using a pipe, we can actually have two processes communicate. Ideally, when I'm done, I want one process to write data to the pipe and the other process to read data from the pipe, so we get two processes communicating with each other. So I'll finish it up.

Yeah, so the question is where pipes are stored in memory. The pipes are stored in the kernel's address space, which the kernel gets to manage. You don't even see the data for it; you can only access it through the read and write system calls. So it's somewhere in the kernel and we don't have to worry about it.

Yep. So IPC can be slow, because in this case we're doing system calls, and system calls are typically slow. There are ways to get around that which we'll see way later in the course, but generally any IPC is going to have to use a system call. There are some where you set it up using one system call and then, spoiler, you can just share a bunch of memory between processes if you want. That's one thing people do to speed it up.

Yep. Yeah, so previously, before I had this close, I created a child process and it was in this blocking read system call, and read never returned. So it outlasted the parent, and I created an orphan. But in this case, since I closed the file descriptor here, as soon as the parent terminates, and that is the only process that has the write end of the pipe open, the kernel is smart enough to know that no data can go into the pipe and that you're essentially at end of file.

Yep. So whenever the parent process terminates, all of its file descriptors get cleaned up. Yes. Yeah, after the fork they're independent, so the child has a set of
file descriptors and the parent does too. At the time of the fork they're the same, but after that they're independent.

Yep. Yeah, if there's already data there, the read would return. In this case I don't have a "while not end of file, keep on reading" loop, so it would likely just do one read and be fine. Yeah, read won't return as long as there's no data there and it's possible for more data to come in.

So, yeah, right now this is silly. I've only set up the child to read data; I need to actually write data to it from the parent. So let's create a string. I'll call it s, because I am hella creative. Let's say "howdy" and then a newline. Let's get its length, and now we can do some inter-process communication. This is going to be a completely independent process, and I can actually write data to the pipe, and I should be able to read it in the child process.

Yep. That's just to match the return values of the calls. ssize_t is signed and size_t is unsigned, but I'm just following whatever the function prototypes are so that my compiler doesn't yell at me. As someone who has developed compiler stuff, I don't like it when my compiler is angry at me. So here, this is fairly bad form, but let's actually get the return value, so bytes_written, and then let's make sure that we didn't fail horribly: check bytes_written, write.

All right, so now if I compile and run this... oh, I actually didn't need the newline because I put one there. Okay, let's just get rid of the newline to pretty it up a little bit. Make it bigger. So now when I run it, all my child does is read data, and it read whatever the parent sent to it.

Yep. Oh yeah, so that's a good question. Was that your question too? Okay, so I'll answer that one first. The question is, how did I know that the parent ran first? In this case, I didn't. After the fork, any process could run at that point; I don't get to decide. So let's say the child process runs first. What's going to happen is, well, in the child
process it gets a zero from fork, so it would create a variable called pid and set it to zero. We would take this if branch, close the write end of the pipe, create a buffer, and then do this read system call. Remember, read is a blocking system call, so it will just sit here and block because there's no data in the pipe yet. The parent process still has the write end of the pipe open, so it's possible that data will come in, and read won't immediately return zero. So read will just sit here and block, and if I'm the kernel, assuming these are the only two processes I'm managing, I have no other choice: I have to let the parent run at this point.

So the child sits here blocked on read, and then in the parent we get some actual process ID from fork. It would create the string, calculate its length, and then write to the write end of the pipe. At this point I don't know the order. The child can now unblock from the read because there's data it can read from the pipe, so I don't know what's going to run next, but my parent process doesn't print anything. So in this case my child would just print out whatever it saw.

Yep. Yeah, so the question is, what happens if I write more data than can fit in the buffer? In this case I only call read once, so it will just fill in as many bytes as it can and then that's it. If I wanted to read all the data, I'd have to call read over and over again until it returns zero, like we did in that open example last lecture.

Yep. Yeah, so if my pipe fills up or something like that, by default write would just block until a reader makes some room. If the file descriptor were set to non-blocking, write would return -1 with an error code instead, and you'd have to read the error codes. Or, you know, it might only write part of it. In this case I don't actually check the length that was written. However many bytes we have there, like 12, yeah, 12 bytes, so I don't actually
check that I actually wrote 12 bytes. Maybe the kernel was a jerk and said I only wrote two, but here I didn't check. Usually you can assume it has. If I wrote this with error checking in mind, I'd check that I actually wrote exactly 12 bytes and didn't have any weird partial writes or anything like that, because if you read the documentation, the kernel can just do whatever.

Yep. Yeah, so all of this is an example of true multitasking. The kernel gets to decide what process runs. You'll have some situations, like that blocking read system call, where a process can't resume until something else happens. In the case where the child reads first, it's going to get stuck here: it's going to call this read and get blocked waiting for the parent to actually write some information.

All right, so any other questions for this? In this case I also technically missed something. If I wanted to heed my own advice and close all the pipe file descriptors whenever I don't need them anymore, well, in the parent I don't use the read end of the pipe, so I should close it immediately. And after I do this write, I should also close the write end of the pipe. In this instance it doesn't matter, because the parent process terminates, and whenever a process terminates, all of its file descriptors are closed. So I technically didn't need it, but for good practice I should probably write something like this.

All right, any other questions from that? Yep. Yeah, so the question is, what happens if I have multiple child processes and they're all calling read? In that case you will not know which one returns from read, and only one copy of the data will be read across all of them. Say I just wrote the letter a and I have three children: one of those processes is going to read a, and the others are not going to read anything. If I wrote a b c, well, then there are a lot of possibilities. One process could read a, another could read b, another could read c, or they could read them in
different orders. You're guaranteed that across all of the processes calling read, the data comes out in the order it was written, but you don't know which one's going to return from read first.

Yep. No? No question, okay. All right, any other questions or comments with this?

Yep. Yeah, so when you do a read system call, internally it gets rid of whatever has been read. It essentially copies the data into that array, and then internally the kernel just gets rid of it. So at this point, as soon as I read "howdy class", if I call read again, it's already gone. I've already read it, I can't get it back, and it's kind of up to me what I do with it now.

Yep. So the question is, would it matter if I just closed them all at the end, inside the if and the else? In this case, these two in the parent don't really matter, because I'm not waiting on them anyway and the process ends immediately, so I don't really have to do that. But remember, in the child, without this one here, without closing the write end, that's where I had the situation where I created an orphan that just stayed around forever. So typically with pipes you want to be very careful about the write ends of the pipes, and the read ends don't really matter that much. But me saying "doesn't really matter that much", don't take that as advice. Do what I say, not what I do, sometimes.

All right, any other questions? So we set up two processes and had them communicate with each other through pipes. That's pretty fun. All right, any other questions from today?

So, oops. All right, there are these process questions that I guess we ran out of time for. Basically they give you a program with a fork and then ask, hey, is there a defined order that these things happen in? In this case there wouldn't be any defined order, because it just has a fork and an exit, so nothing is blocking or waiting on anything else. But for question two, all it did is throw in a waitpid, and, well, that
probably ensures some type of order. We can take this up later, but we're not going to do it in three minutes. So I'll take a question.

Yep. So this exit will terminate the process. That's the same as returning from main, but I could call exit anywhere. exit is essentially the exit system call, plus that special C stuff we kind of saw where you can register atexit handlers and things like that. But for the purposes of this, exit is like the exit system call: process done.

All right, any other quick things? All right, we can enjoy our cold extra two minutes in the winter. Yay. So just remember, I'm pulling for you. We're all in this together.