Alright, welcome back to operating systems. Today we're going to cover some odds and ends and do some more stuff with processes because, you know, that's lots of fun.

So, odds and ends. I looked at your feedback for lab zero, and some of you are very ambitious and want to create your own operating system. That's not where this course is aimed, but if you want, I will guide you to a resource, and we should be able to read it and pretty much understand all of it by the time we get through this course. From the MIT graduate operating systems course there is this kernel, which also has some user utilities, that comprises an entire operating system in not that many lines of code for a RISC-V processor, and again, you should be able to read it at the end of this course. There are instructions I put in there to get this repository and run it. You run it as a virtual machine within your virtual machine, so that's really cool; there's a bit of inception there. If you want to play with it, go ahead and play with it. We should be able to understand it. It's a re-implementation of a very early version of Unix that had fewer features.

So, for example, one thing from it that we can read now is probably their init program. We talked yesterday, or sorry, we talked even in lecture five, about what init is supposed to do. So here's what their init does. It opens a console and checks for errors and everything. It just opens a console if it's not already open; that would get file descriptor zero, and that represents your terminal. In this case it's just a serial console, because this is a very simple operating system. You probably dealt with memory-mapped serial devices in computer organization, hopefully. If not, it's just a simple device where, if you write to a specific memory address, it sends some information there. So then what it does is just copy that device for standard out and standard error.
So it just duplicates that file descriptor and gets a new one, so file descriptors zero, one, and two will all refer to the same thing. And then it's just a while-true loop. So init starts up, it forks, and it gets a process ID. If fork fails, well, something bad has happened, because all it's trying to do is launch a shell. So if that fails, it fails and your operating system just doesn't work. Otherwise, if you get a process ID and you are the child, it does an exec, which is basically execve, that will just start running the shell, which is just a program called sh. And that's all it does. If exec fails, it just prints that it failed, because otherwise exec should transform this process into another program and start running it. So this should only print if it fails, and then we get an exit.

And then there's a big loop, because remember, init has to spawn all the processes, and then it gets to adopt everything and call wait on it. So we saw the spawn part, and this is just a big for loop that does wait over and over again. It has a check to see whether the child we just waited on was the shell we created. If it was the shell we created, we just restart it, so it creates another shell process for you to run, and that's all it does. Otherwise it is something init is not really responsible for, so it just gets the process ID back from wait and doesn't do anything with the status or anything like that. It adopted a process, it doesn't do anything with it, and then it just goes through the loop again to wait again and again.

So this thing has two jobs: launch a shell, and if you close it, relaunch the shell. Otherwise it just waits for everything, and that's literally all it does. So if you want to experiment with your own operating system, you should be able to read that by the end of this course. But again, that's an MIT grad course, so they hack on it a bit. You could hack on it a bit too, but unfortunately we do not have all the time in the world in this course.
So, a few other things. When we talk about process scheduling, we can go into really old historical systems that only had a single CPU. Oh, yep? Yeah, so the question is: why does init have an infinite for loop for wait? Because you don't know how many processes you need to adopt, it's just an infinite for loop that waits and waits and waits again. You have no idea how many children you have to wait for, right? They just get adopted.

So, to go on with the historical thing: back in the day, when you only had a simple CPU and the CPU didn't even have a kernel mode and a user mode, well, sometimes your operating system didn't need to be that complicated. So they had uniprogramming operating systems, and they only ran one process at a time. Because of this they were super simple: two processes can't run in parallel and they can't run concurrently, no matter what. You just launch one, wait for it to be done, and then launch another one. That is like DOS back in the day, if you've ever heard of that term. Everyone here is too young for that; it existed, you know, when I was a kid.

Multiprogramming is what we have now, where two processes can run in parallel or concurrently. Those mean the same thing to you right now, but as we go on through the course they actually mean slightly different things. Modern operating systems just want to run everything in parallel and concurrently, as fast as possible, and use as many cores as you have on your machine as efficiently as possible, and that's the job of the operating system.

So there's a specific part of the operating system for this. If you have multiple processes, and even on modern systems they could share a single CPU core, well, what do you have to do with that? We already know that to create a process the operating system has to load it into memory, has to know when to execute it, and has to set up virtual registers and all that stuff. When it's in that ready state, the scheduler is the
thing that decides what to run and when, which we'll see in the next lectures. But first we can talk about the mechanics of actually switching between processes, which we kind of talked about earlier. So, to go back: the core scheduling loop just changes whatever process is currently running on a specific CPU core. The steps it has to follow are: pause the currently running process, so stop it from executing; save its state so you can restore it later, which would be all its registers, for example; then ask the scheduler what the next process to run is; then load that process's state, so load all the registers that were saved and set the current CPU registers equal to those; and then just tell it to run.

There are two ways of doing this: we can let processes schedule themselves and kind of opt in to pausing, or we can have the operating system maintain control. In something called cooperative multitasking, the processes have to use a system call to tell the operating system, hey, it's okay to pause me now. Otherwise you don't get paused; you just keep on running until you exit or you eventually tell the operating system it can pause you. In true multitasking, the operating system maintains control and pauses processes, and you have no say about it.

So for true multitasking, which is what every modern kernel does, the operating system can essentially give a process a set time slice, and whenever it's done its time slice you say, okay, you're done, now I'll see if something else needs to run. That decision is made by the scheduler, which again we'll see later. The other option, if you don't do that, is to use interrupts on the system. On your CPUs you can program timer interrupts, so just before you start running a program, your operating system can tell the CPU to generate a timer interrupt after some given amount of time. Then your kernel would get that interrupt, and it can go ahead and pause the running process and
switch it out and do whatever it needs to do.

So the process of swapping processes is something we'll call context switching. We already said that, at minimum, we'll have to save all the current registers. We have to save all the values using the same CPU whose state we're trying to save, and as you can imagine there's no good way to do that other than using memory, using a stack or something like that. Sometimes it gets really tedious. You could look at the MIT code to see what they do: it's essentially a push for every single register value, and then for a load it's, you know, a pop for every single register value in the opposite order. And you can imagine there are a lot of registers. On, like, x86 machines there are so many registers that it will make your head spin.

So instead of that, there's some hardware support for saving state, but you might not want to save everything. There's a set of registers for special floating-point operations, and so on and so forth, which don't make sense to save if your program doesn't use floats at all, for example. That's because context switching just takes time where you're not actually executing anything worthwhile, so it's pure overhead. Overhead just means it is work we're doing that is not part of our goal. Our goal is to run applications; any time I spend saving registers to switch between applications is time I'm not actually spending running applications. So usually there's a combination of hardware instructions to save sets of registers at a time, and then software to save as little as possible: check if this program has ever used floats, or ever used any of the big vector operations, or something like that. So that's context switching.

So we can go into more IPC, because you will see this in lab two, which is now posted. Oh yeah, and lab one grades should be back. Pretty much everyone got a hundred, which means it was probably too easy. So for lab two you actually have two weeks, and it will be much more difficult than lab one
but it will still be completely doable, so don't worry about it.

So the new API you might see is something called pipe. We looked at it a bit before when we saw that little line operator, the vertical bar, between processes. The pipe system call creates two new file descriptors for us. Like all the C wrappers, it returns negative one on failure and sets errno; otherwise it returns zero on success. It takes as an argument an integer array of two ints, and it will set those if it is successful. So pipe forms a one-way communication channel using two file descriptors, which it writes into that argument. In pipefd, whatever file descriptor is at index zero is the read end of the pipe, and whatever is at index one is the write end of the pipe. How that works is: whatever data you write to the write end of the pipe, if you call read on the read end of the pipe, you'll read that data. So it's basically just writing into a kernel-controlled buffer that you don't get to see, and any data you write into it you can read out from the other end.

And just as an aside first: you might see an ampersand used in your shell. If you use an ampersand at the end of a command, your shell starts that process and returns immediately. It just forks and lets that process run; it outputs the process ID, and it lets you know when the process has finished. Just the line character, the vertical bar, creates a pipe between two processes. And I think I posted this on Discord, but just so you have it: this sneaky bash command is actually a fork bomb. It defines a function called colon, which takes no arguments, whose body is a call to itself piped into another call to itself; the ampersand creates that in the background; then there is a semicolon, and then it calls the function. So it's basically going to do while-true fork. So if you really don't like your friends and they're using Linux, you can tell them to execute that and their computer will die. Yep? Yeah, so some of them will
know that, hey, there's a big tree of processes caused by a common ancestor, and you could figure it out and kill them all. You could even have written a process to kill all descendants of something, too; maybe that's a good thing to write. But yeah, I gave up; I just restart my computer after you guys broke it. So, warning: do not run this command on the actual machines. You can probably only launch like 30 processes max or something like that, so it won't damage anything too badly, but it'll be annoying to clean up.

Alright, so, whoops. Alright, let's just see an example; me talking is boring. So here is a pipes example to see how to use a pipe. I will create an array of two ints to represent my file descriptors, which I give as an argument to pipe. I will call pipe with the file descriptors, and I have a check wrapper which checks if it returns negative one, and if so does something with errno and prints an error and all that stuff. I just use this to make my life easier and keep checking for errors. So pipe creates two file descriptors; I have a write end and a read end.

So what am I going to do? Well, I'm going to fork, and then check if fork failed; otherwise I will do two things. So remember what fork will do: it will copy the complete process, everything, no matter what, and that also includes file descriptors. So here pipe would open two new file descriptors. I'll probably get something like file descriptor three in fds[0], which would be the read end of the pipe, and then for the other one I would probably get four in fds[1], which would be the write end of the pipe. So I would have those file descriptors open in my process, and then whenever I fork, my child process also has these file descriptors open.

Yep? Oh, so check. Here, we can just read it. We do the call, and check will check if the return value is negative one. If it is, it uses errno, prints a message, and then exits my process. So it's just my error
checking, just so I don't have to write "if it equals negative one" over and over again; I just write check over and over again.

Okay, so here we have a fork, and at the time of the fork they're both copies. Well, yep? Yeah, so the question is why I exit with a saved copy of the error instead of just errno itself. The answer to that is that I want to exit and set my status equal to the errno that caused my issue initially. perror, if you read its documentation, can actually set errno as well, so I just save it before I call perror and then exit with that same value, because that's the first one; I don't want to see whatever perror did.

Yep? No, so pipe just takes a pointer to an array and writes two values to it. No, I just know that I'm going to get three and four because, by convention, 0, 1, and 2 are open and it will use the lowest number available. So if it's successful, I know I'll get fd 3 and fd 4; fds is the variable I can refer to them by, and that's what they are.

Okay, so in the parent I will create a C string that just says "Howdy child", and I will take the length of it, and then I do a write system call. I do a write system call to the write end of the pipe with however many bytes I want to write and the string, and then I just check the number of bytes written. Then the parent would just close its file descriptors, so the parent would close three and four, close both ends of the pipe, whatever you want to say, and then finish. In the child process, the child declares a buffer of that magic size, does a read system call on the read end of the pipe, and then prints off whatever the child just read. So when I execute this, whoops, I see "Howdy child".

Yep? Yeah, so the child would also close it. After the child prints off "Child read", it would also close its file descriptors. The child didn't open the file descriptors; we opened the file descriptors before the fork. Yeah, so the file
descriptors are independent for each process. So if, you know, I immediately closed fds[1] here, that doesn't affect the other process; it would just close it there, and that's it.

No. So yeah, the question is: what might happen is the parent writes to it and then closes them, so how would the child read from them? Remember that the file descriptors are independent for each process. Before we fork we create the pipe, and then we fork, and then both of those processes have file descriptors three and four open. If I close file descriptor three in one process, it's still open in the other. They don't affect each other.

Yep? So here I don't assume anything about the order. Remember, read will block for data. So if for some reason the child executed first, it would just call read, and there's nothing to read yet; it's still waiting for data, so it would get put to sleep. Then eventually the parent would run and write to that file descriptor, and then the child would return from read. Here, I'll do this first.

Yeah? Yeah, so the file descriptors are independent, but you can think of file descriptors as like pointers. Right before the fork, file descriptor three points to the read end of the pipe and file descriptor four points to the write end of the pipe. Then when you fork, in both processes three is pointing to the read end of the pipe and four is pointing to the write end of the pipe. So yeah, they refer to the same thing. The parent can write to that write end of the pipe, and then the child can read from it because it refers to the same thing. Yeah? Yeah, the pipe call creates the file descriptors. And fork, remember, makes an exact copy at the time of the fork, so if you have file descriptors open, your child has those file descriptors open.

Yep? So what happens if you call read on the wrong file descriptor, like if I call read on fds[1]? So the way that the pipe is implemented, there's a read end and a write end, so if you try and
read from the write end, it'll just return an error from read. So you just get an error.

All right, so we have another question somewhere. Yep? Yep, so their file descriptors are pointing to the same thing. So say in both processes file descriptor three is pointing to, was it the read end of the pipe? Right. So if in the parent I close file descriptor three, then in the parent it is now no longer pointing to the read end of the pipe, but in this other process it's still pointing to the read end of the pipe.

So these do not represent files. You can think of a pipe as a kernel-managed buffer where whatever data I write to it, I can read from the other end. It would be stored in memory by the kernel, and yeah, it would be virtual memory, but the kernel makes sure that you have to have a file descriptor to access this memory, because it's part of the pipe, and so on. So if you want to write to an actual file, you just have to open a different file descriptor, one that actually represents a file. Because this just uses file descriptors, it's reading and writing, but this doesn't go to a file; it goes to some buffer managed by the kernel. If I want it to go to a file, well, my file descriptor needs to represent a file.

And then, where does the parent write to exactly? Like fd 3? Yeah, so they're not saved to files. Remember, the pipe system call just creates a buffer managed by the kernel that you can access through two file descriptors, a read side and a write side. The way it works is that whatever I write, I can read. So here in the parent I wrote "Howdy child", and then in the child I read it. Any information that the other process wrote to it, I can read, and that's it. So they don't represent files or anything, even though they're called file descriptors. A file descriptor is just a number that represents some resource that I can read bytes from and write bytes to. So yeah. And then, what's the question? So, any more questions about that? All
right, I have another thing, so let's see how much we do understand it. Whoops. So what happens if I do this? Now the parent doesn't, yeah, I'm not writing to the buffer, so the buffer's empty. So what's going to happen in the parent? The parent just allocates a string and then closes immediately, right? What's going to happen to the child? Yeah, it would get to the read system call; is that what you're going to say? Yeah, that's true too: also an orphan.

So let's assume that the child runs first, before the parent does anything, right after the fork. Then the child is going to reach this read system call, and it gets put to sleep waiting on some data. Then the parent runs. It doesn't use write, so that read will never return. Then the parent goes ahead, closes the file descriptors, and returns. The parent is now dead, and the child is still waiting on this read, so it should be an orphan, right? So if it's an orphan, if I do pidof pipes, and I think I screwed it up, yeah, so if I do pidof pipes, guess what? It still exists, because that is my orphan process. It just got reparented to init, and it's still doing that read system call. I never waited on it, I didn't do anything; I just created a poor little orphan which is now trying to read from a pipe that nothing will ever write to again.

Yep? Yeah, the question is: is there a timeout on the read? The answer to that is, by default, no. But the kernel's really smart about pipes. The kernel will know that if no process on the system has a file descriptor that represents the write end of the pipe, then a process can no longer get data from the read end of the pipe. So whenever you call read and it's not possible to get any more data, read would return zero, which represents that you can't get any more data, and then the child would close, no problem. The problem in this case is that the child itself still has a file descriptor open to the write end, even though it doesn't use it. But because it's still open, even though it's
the only process with it and it's waiting on that pipe, it's still possible that that process could write to its own pipe. Therefore it is possible it might get data, and so read will never return.

So, okay, first: who remembers, if I want to get rid of this stupid process, how do I get rid of it? Yeah, kill, or kill -9. Let's try and be nice first. The normal way is kill, which sends a SIGTERM, which is the nice way of saying it. And then I look, and it's gone. Yay, I cleaned it up; I didn't have to resort to violence. Although, I mean, I could kill -9 and resort to violence immediately.

Yeah, so here, if I wanted not to create an orphan process, well, I could probably do things properly. Let's just move this. In the parent I'll close both file descriptors whenever I'm done, and in the child, before I do the read, I will close the write end of the pipe. This is why it's good practice to close file descriptors whenever you're done using them: the child will not use the write end of the pipe, so you may as well close it immediately. So here it will do a read, and then let's go ahead and clean up fds[0]. If I go ahead and run that, yeah, "unused", well, I get "Child read" and nothing, and that was the output of my child, because read returned zero. And if I check, I didn't create an orphan. Cool, so we fixed it.

Yep? So yeah, the question is: if I move this here, basically, why do I get an orphan? So the kernel is smart. The only way that you can read data from a pipe is if someone writes data to the pipe, right? So the kernel knows, across all processes, how many file descriptors point to the write end of the pipe. The rule is that as soon as there are no more references to the write end of the pipe, the kernel knows you can't add anything to it, so it sends something like an end-of-file notification, where read returns zero. So in the case where I had the orphan and I had the close at the end, well, I did a read and I still had one process with the write end open
and it was me, right? So it's super unlikely, but I could have had a signal handler that wrote to that pipe if I wanted to, or something odd like that, right? Even if the machine only had one core or something like that, it's still possible through signals, so the kernel has to be conservative about it.

Yeah? Yeah, so there's a question: what does killall do? So, kill processes by name. kill takes a process ID, so killall, I believe, will take a process name and then send the signal to all processes that match that name. You could write this by now, right? It just opens /proc, looks to see if there's a match, and then it can look up the process ID and actually do the system call that does the kill. Yeah, so killall, pkill: they let you use just a process name, since no one really knows process IDs.

Yep? So the file descriptors are independent in the processes. Even if the parent closed them immediately and executed first, the child would still have file descriptors 3 and 4 open, so they don't affect each other. In the parent, yeah, I close both of them because it doesn't really matter. In fact, I could just not close anything in the parent, because the parent's going to exit anyway and they'll get cleaned up then.

Yep? Yeah, the processes are literally exact clones of each other at the time of the fork, so all the file descriptor numbers are the same for both of them, and what they point to and what they represent are also the same.

Yep? So the question is: can I write to those file descriptors from outside, from the terminal? For this, there's no way to, aside from that system call that returned these file descriptors. Like we said, this doesn't actually represent a file or anything like that. There is something called a named pipe, which adds a path that you can access it through, and then other processes could use that name to open these file descriptors. But for this, the only way to share these file descriptors is
through just getting inherited by fork.

Yep? So the file descriptors are all managed by the kernel, so you don't get to pick the numbers. And then the file descriptors, yeah, ideally you would have to close them in both processes if you're being nice and closing them as soon as possible. So in this case, where I did the correct thing: well, at the time of the fork, file descriptors three and four are open. But if I go back here and look at my code as it was before, well, at no point in the parent do I ever use fds[0]. So what I should do is close it immediately in the parent, because I don't use it, and then I'll write to fds[1] and close it when I'm done with it. In the child it should be the other way around, right? It never uses fds[1], so I should close it immediately there, then do the read, and then close fds[0] whenever the child's done with the read. So I should close my file descriptors as soon as I can.

Yep? Like the variables? Yeah, so there's a question about what happens if I assigned it to something else. If I did something like that, remember, the kernel is just taking the file descriptor number, and it represents something to the kernel. Right now I have file descriptors zero through four open. If I just make up five and try to read from five, well, five represents nothing, so what would you expect? To get an error, right? It'd be like: file descriptor five isn't open, it doesn't represent anything, I don't know what the hell that means.

Okay, cool. So let us get into what you can expect to see on some exams, and the kind of problems they like to ask you. Fork is really confusing, real quick. So this was one of the questions: for each program shown below, state whether it will produce the same output each time it is run or whether it may produce different outputs when run multiple times, and explain why it behaves like this. And it does a bunch of forks, which is of course fun. Oops, iPad. So, wow, okay. In this case, anyone want to tell me what this does immediately? Yep? How many
times will I see, how many processes will I get, what output will I see? Okay, so you've got 16? 64? Wow. So remember, this is an exam question. So, reasoning as a student: do I want you to write out 16 things? Actually 16 is not too bad; 64 is bad; 16 might not be too bad. Okay, if there are 16, what would each of the 16 processes print? Yep, the i value. But it starts at four and goes down to zero, so what am I going to see? Like four, four, three, two, one for 16 processes? Oh, 16 times four? So four fours, four threes, four twos? Yep? The child becomes... okay, let's just go. Let's start at main.

So if we start at main, let's say we have process ID 100 that starts executing main, just because I like the number 100. So this process starts executing main. What does it do first? It declares a variable called i and initializes it, so I'll just write what it has i equal to here: i equals four. Okay, it will check this condition: does i equal zero? No, so we'll go into this loop. So this process will go into the loop and then call fork. What happens when I fork? I create a child, process ID 101 probably. Is it going to have an i? What's the value of i? And I'll write in the corner here the parent-child relationship: process 100 had a child called process 101.

So we created two processes. Which one's going to run first? Don't know. For the sake of argument, let's say the parent runs first, so the original one runs first. What would fork return for it? 100? 101. Whoops, that was a giant eraser. So its fork would return 101, and then it assigns that to a new variable called pid. Okay, just so we're here: in the child, what would fork return? Zero, and it also assigns that to its variable pid. Yep? Let's get through it. Yes, we got a factorial response. Let's just see what happens so we can argue about it after.

Let's say the parent starts executing next. So the parent starts executing: is pid equal to zero? No, so it would hopefully fall into this else
branch, and then what would it do? Print four and then exit. So it would be done, right; it doesn't exist anymore. Okay, so we don't have to argue about it anymore. Great.

So what would process 101 do? 101 was at the fork, so it would check this if statement: is pid equal to zero? Yes, so it would update its i from four to three, and then it would fall out of the if, so it would go from here all the way back up to check the loop condition. Is i equal to zero? No, so we have to go into the loop. Now it forks, so process 101 is forking. What process would it create? 102. This is where we're glad we're engineers; we don't have to name children all the time, because I'd have run out of names by now. What's its value of i? And pid is a local variable that doesn't exist yet in either of them, so it gets refreshed for both of them as soon as we exit the loop body. So now in process 101, what does fork return? 102, so pid equals 102 for this process. What does fork return for process 102? Zero. You can see this kind of gets, well, not tedious, but it's not terribly fun.

All right, so at this point which one runs first? No idea. Let's say 102 runs first, for argument's sake. So it would check this if condition right here: is pid equal to zero? Yes indeed, so it would decrement i from three to two. Would it also update this one? No. Why wouldn't it do that? Yeah, they're separate processes now; they're independent of each other, so one updating its i doesn't update the other's i. So this is now two. Imagine telling yourself back in 105 that updating i doesn't update i. So, how much we've grown. So in here, yeah, we just decremented i, and then it would go back up to this line: is i equal to zero?

Yep? Because they're independent, right? So the reason they can share file descriptors is that the kernel is managing that. It's not represented by the process's memory or anything; it's a thing represented by the kernel, so that's why I can do it. Yeah. All right, so where were we? Yeah, so
we're at the loop: does i equal zero? No. Okay, so what does process 102 do then? It would call fork. What would we create? Yep? 101 is not dead yet, because we said it didn't execute first; we said 102 did this time, because we didn't know which one was going first, so I just said, yeah, let's say 102 goes. 101, whenever it starts running, will just print three and die, but we haven't said it's running yet.

So okay, 102 just called fork and it created 103. pid got re-initialized, so what does fork return for 102? 103. What's the value of i in 103? Two. What did fork return for that process? Zero. And here is our parent-child relationship: 102 was the parent of 103. So we just have a big linear line of processes.

So now what's going to happen? Well, we can see that if 102 executes, it would check this statement: is pid equal to zero? Nope, so it would hit the else and print off its i, so this would print two and then exit, and it's dead. All right, well, let's say 103 goes first; let's say we get the fun order. So 103 goes: is pid equal to zero? Yes it is, so we would update i, whoops, from two to one, and then it would go around the loop. I'll do this in fast mode: it would eventually create a new process whose i is equal to one. That would be process 104, and fork would return zero for it. In that process, i goes from one to zero, then it goes back to the top of the while loop and checks whether i is not equal to zero. Now it is equal to zero, so this process would just quit immediately. And the other processes, we kind of know: this one would print three and this one would print one. So we would get four, three, two, one, in any order; we don't know what order they would be in.

So yeah, the one you should do, that's actually easier, just puts a waitpid in the else statement. Then you should be able to answer: what happens now? Is there a specified order? Is it still the same thing? So that's it. Just remember: pulling for
you, we're all in this together.