All right, apparently people like review less, but I guess this should be quicker, and it should at least give you a good idea of what the midterm will look like. Speaking of the midterm: they didn't fulfill my request, joy, so the midterm is March 1st at the original time in that crappy, crappy room. Yep, sorry. The original time was 6 p.m. on March 1st, but the midterm will actually start at 6:30, because some of you might have a midterm that ends at 6. That gives you half an hour to make it over, go to the bathroom, do whatever. So it will be 6:30 to 7:45. Sorry about the room, not my fault. I asked, they ignored me, so I guess we have to live with it.

So, speaking of the midterm, we'll do a quick recap of everything we've learned, go through a sample midterm, and then hopefully feel better about the midterm, because it probably won't be too bad. Again, there are three major concepts in this course: virtualization, concurrency, and persistence. So far we've only really gotten into virtualization, but we've learned a fair bit about it. Next week we'll start virtual memory, which will also be on the midterm, and that's generally where things start going even more downhill. Processes typically trip people up.
So hopefully we've got enough practice with that, and in lab 2 especially you'll get lots more practice with processes. Don't try to overcomplicate lab 2: you don't have to create pipes or anything like that. That's all done for you. If a test case uses pipes, it creates its own pipes; you just have to set things up nicely, and you'll kind of see the weird thing VS Code does.

The first concept we saw was what the kernel is: essentially any code that runs in kernel mode. Our CPU has different modes of operation that give you access to different instructions. There's user mode, where all your normal processes run; it's everything we know and love. Then our CPU also has a kernel mode, with special instructions that can interact directly with the hardware and do some fun memory stuff that we'll see next week. There's a nice interface between them: the system call interface, which transitions you from user mode to kernel mode. Every program has to use this interface, and the nice thing about that is we have this nice little tool called strace, which will tell us every system call a process makes. It doesn't matter what language a program is written in.
We can see what it's actually going to do on your machine. For the midterm, we don't care about the ELF file format or any of its details; that was just so we understood every single byte, so you don't have to worry about it. But you should know the difference between an API and an ABI and how we broke an ABI, and the different kernel architectures: a monolithic kernel, which has all the responsibilities, versus a microkernel, which has the minimum set of responsibilities. That minimum set is basic IPC, scheduling, and managing virtual memory, which again we'll see next week.

Then we went through processes. Unix systems just clone processes, and every process has a parent-child relationship. You can only create a new process with fork, and after the fork both processes are exactly the same. The only way to tell them apart is the return value of fork: in the child it will be zero, and in the parent it will be greater than zero, namely the process ID of the child. At that point you're up to the scheduler, which decides when to run either process. In lab 3, coming much further in the future, you'll also be doing a little bit of scheduling, and it will be lots and lots of fun.

On top of creating processes, you have to properly manage them, which is lab 2: you're basically just managing a bunch of processes. The operating system has that strict parent-child relationship, and you should be able to identify and prevent zombie processes and orphan processes. In lab 2 the idea is you won't have any zombie processes, because that library is a very good and responsible parent, so it will definitely wait on all of its child processes. For orphan processes, that's the extra 20% or so of the lab where you become the awfully named subreaper, and then you can actually adopt a bunch of different processes too.
So that's like 20% of lab 2.

Then we saw basic IPC. The read and write system calls just move some bytes to or from somewhere, and file descriptors are just numbers. If it's easier, as in the last lecture, think of a file descriptor as kind of a pointer: it points to something. Whenever you fork, you have all the same file descriptors in both processes. After the fork they're all independent, but they're essentially pointing to the same things. In other words, the pointers are independent, but what they point to is shared between the processes. We saw redirecting file descriptors for communication by opening files: we can replace file descriptor 1 with a file, and suddenly our program just outputs to a file instead, which is kind of cool, and you can do all sorts of shell tricks with that. Then we saw the other example, where we replaced the file descriptors with pipes.

Then we saw signals too. They're like interrupts for user processes. The kernel has to handle three different kinds of interrupts, because they'll also come from hardware. In user mode all you deal with is signals; they can come from hardware, but they don't have to. We saw that you get a signal if one of your children perishes, you get a signal if someone presses a key on the keyboard, you get a signal if someone wants you to exit, and you also get a SIGKILL if someone really doesn't like you, and then you have no choice: your process is now done.

Then we saw scheduling. At least the first lecture was pretty basic; it just had some trade-offs.
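Circling back to file descriptor redirection for a second, here is a minimal sketch of replacing file descriptor 1 with a file. This is not the code from the lecture example: the function name write_line_via_stdout and whatever path the caller passes in are made up for illustration, and the work is done in a forked child so the caller's own standard out is left alone.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the lecture): a child process points its
 * file descriptor 1 at `path`, then writes `line` to "standard out".
 * Returns 0 if the child succeeded, -1 otherwise. */
int write_line_via_stdout(const char *path, const char *line, size_t len) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            _exit(1);
        }
        /* Make file descriptor 1 point to the same thing as fd. */
        if (dup2(fd, STDOUT_FILENO) < 0) {
            perror("dup2");
            _exit(1);
        }
        close(fd); /* fd 1 still refers to the file; the extra copy can go */
        ssize_t written = write(STDOUT_FILENO, line, len);
        _exit(written == (ssize_t)len ? 0 : 1);
    }
    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return (WIFEXITED(wstatus) && WEXITSTATUS(wstatus) == 0) ? 0 : -1;
}
```

The child never needs to know it is writing to a file: it just writes to file descriptor 1, which is the same idea a shell relies on for output redirection.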
So we saw first come, first served, the most basic scheduling algorithm: think of standing in line at any store when you go to cash out. Then we saw shortest job first, which runs the processes with the shortest jobs first in order to minimize waiting time, that is, how long all the processes wait combined. That was for non-preemptible processes, where we can't switch between them. Then we asked: what if we can preempt processes and actually switch between them? That's when we tweaked it into shortest remaining time first: whichever process has the least amount of time left to complete, we switch that in and start executing it. Most of these are just analyzed after the fact, because in practice we don't know how long a process is going to live, it may request resources that it needs to wait for, and it's way more complex in real life.

Then we saw round robin, which optimizes fairness and response time. But we saw there were some engineering trade-offs with the quantum length, basically the length of the time slice. If it's short, you get better response time, since everything cycles through quicker, but you also context switch more. A context switch is the act of switching between processes, which is essentially just wasted CPU time. If that overhead is too high, you're spending too much time switching between processes, and that's a bad thing, so you don't want the quantum too small. And you don't want it too large either, because if it's infinitely large it's essentially first come, first served, which is bad and has all those problems, starvation being a major one.

Then we saw more complex scheduling that just added more features.
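As a quick aside on the two basic algorithms, here is a small sketch that compares average waiting time under first come, first served and shortest job first. It is only an illustration under simplifying assumptions: every job arrives at time 0, nothing is preempted, burst times are known in advance, and the function names are made up.

```c
#include <stdlib.h>

/* Average waiting time when jobs run back to back in the given order,
 * assuming all jobs arrive at time 0 and are never preempted. */
double average_wait(const int *burst, int n) {
    long elapsed = 0;
    long total_wait = 0;
    for (int i = 0; i < n; i++) {
        total_wait += elapsed; /* job i waited for everything before it */
        elapsed += burst[i];
    }
    return n > 0 ? (double)total_wait / n : 0.0;
}

/* qsort comparator: ascending burst time. */
static int by_burst(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Shortest job first: the same computation after sorting by burst time.
 * Note that this sorts the caller's array in place. */
double average_wait_sjf(int *burst, int n) {
    qsort(burst, (size_t)n, sizeof burst[0], by_burst);
    return average_wait(burst, n);
}
```

With made-up burst times of 7, 4, and 1 in arrival order, first come, first served gives waits of 0, 7, and 11, an average of 6. Shortest job first runs them as 1, 4, 7, giving waits of 0, 1, and 5, an average of 2.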
So we saw there's a priority feature. We could screw ourselves over if processes depend on each other: if a higher priority process depends on a lower priority process, that's what we call priority inversion, because we should at least temporarily treat that low priority process as a high priority process because of that dependency, and that can go on forever. There are also different trade-offs depending on the processes. If you're interacting with a process, you want it to be interactive and get a quick response out of it, while for others you might just care about throughput, like how many you can complete in a day. So you may have different goals between processes, and the operating system wouldn't necessarily know about them.

That was all the complication of having a single CPU core to schedule things on, and then we introduced multiple CPU cores. We could just do the same thing, but that's bad. Or we could have a scheduler per CPU core, which is a bit better, but you might get some imbalances. We saw a way to deal with that by changing which core a process runs on depending on how busy things are. Then we saw the Completely Fair Scheduler, which tries to model ideal fairness by giving every process an equal amount of CPU time. That's the actual scheduler used in Linux, and most other systems are going to have something similar.

Then we saw libraries, which was good because, hey, in lab 2 you're making a library. We learned what a dynamic library is and how it compares to a static library. Static libraries are basically just a collection of code, and when you compile your program everything gets copied into your executable. You can't change anything from then on; you'd have to recompile if there were any changes. With a dynamic library, we just look it up at runtime, and if you update the library and run your program again, you use the updated library, which is both good and bad depending on who wrote the library. We saw that if you break the ABI of your library, really bad things are going to happen and you can't expect anything to work anymore. If you depend on a library and they break the ABI, you will have a very difficult time debugging.

There's kind of an ebb and flow in whether programmers prefer static or dynamic libraries, and it seems to cycle every few decades or so. Static just works until everything gets super, super out of date, and then people start complaining. Then they all switch to dynamic, where they get all these updates, but then everyone starts doing that, people write really crappy code, and they break everything. So everyone is like, oh yeah, I want things to stay the same again, and that cycle just kind of continues forever.

All right, so let's get into the midterm and something more fun. That's the link; that's the latest midterm I have written, so this is the style of things I do. That one was closed book. Yours is essentially open book: you get one piece of paper that you can do whatever you want with. The only reason it's not fully open book is that I don't want you to kill like 10,000 trees. One piece of paper is essentially open book, and it actually gets you to study a bit too.
The only thing we can't answer already is the virtual memory question, which is last, so we'll skip that; we should be able to answer it next week.

All right, so here it is. It starts with a few quick short answers of one or two sentences each. First: what is the concept of virtualization as applied to operating systems? That's just making something believe it has access to all the resources when it might not. We don't know how virtual memory is actually implemented yet, but you can still talk about it: with virtual memory, all the processes think they can access the entire range of memory when they're actually sharing physical memory. You could apply the same thing to a virtual CPU: a process just thinks it's the only thing running on the CPU, when realistically they're all sharing the same CPU cores.

The next one was kind of an easy one: what service would you find in a monolithic kernel that's not in a microkernel? The basics a microkernel has to do are scheduling, which is switching which process runs, because it needs to maintain control and be in that more privileged mode; then virtual memory; and basic IPC. If you take the opposite of that, things that would be in a monolithic kernel but not a microkernel are file systems, device drivers, or advanced IPC.

Then this one: what should you use to monitor all the system calls a process makes? Hey, strace.

And the next one, which we saw yesterday: if you include a C struct in your library's header, why shouldn't you ever change it?
And the answer to that is, you're essentially changing the ABI of your library, because the way C lays out that struct in memory depends on the order of the fields. If you change the order of the fields, or anything else about the struct, it will change how memory is laid out. If you put the struct in a header file and someone uses it, they compile that version of the struct into their code, and if it doesn't match what's in your library's code, really bad things are going to happen. So you have to hide your struct and just hand out a pointer. Remember from the beginning: the solution to everything in computer science is to add another layer of indirection. That's essentially what you're doing by hiding things behind a pointer.

And then five: what are the two responsibilities of init? We kind of know this now. Its first job: when you boot up your kernel, all the kernel cares about is launching a single process, and then as far as the kernel is concerned its job is done. So it creates process ID 1, which is generally something called init, and init has to create all the rest of the processes, directly or through grandchildren or something like that. Its second job is to be kind of the default orphanage, or reaper, however you want to say it: it has to acknowledge all the orphan processes it may receive. So init creates some processes and then calls wait in an infinite loop.

Oh, the next one was page replacement, which we haven't gotten into yet.

All right, so this one should be more fun.
So here's some code. Can we still see that? Because that midterm was closed book, I said what fork does in the problem, but since yours is open book and you've been doing lab 2 and a bunch of stuff, I don't have to tell you what fork does: it makes a copy of the currently running process, everything's the same, a return value greater than zero is the process ID of the child, and if it's zero you're the child process. And then I said what wait does, but we kind of know what wait does. In this code I didn't check return values or anything like that; we just ignored them.

So what were our questions? The first question was how many processes get created. Does anyone have a guess at how many processes get created here, not including the original one that starts executing? Yep, three. Anyone disagree with three? So yeah, the answer is three. Why is it three?

In the question I said that one way to make it easier to argue about is to say the first process, the one that starts executing main, is process ID 100. So we have process 100 here, it starts executing main, and we don't count it towards the newly created processes. The first thing it does is call fork, which creates a new process, likely process 101. The only difference between them is the return value of fork; otherwise they're copies, and that doesn't matter much at this point because essentially nothing has executed yet, main just started running. In process 100 the return value of fork is 101, because that's the process ID of the new child, so pid1 is 101. In the child, pid1 is zero, because that's the return value of fork there. So now we have two processes, 100 and 101, after the first fork, with pid1 set in each.

At this point you don't actually know what's going to run first, so we can just pick one at random; it doesn't matter. Let's say process 100 runs first. It calls the second fork and creates a new process, probably process 102, and sets its pid2 to 102. Process 102 is a copy of 100 at the time of the fork, so it has pid1 equal to 101, because that value already existed and just got copied. pid2 gets set after the fork, so it's an independent value: it's the return value of fork, which in the child is zero. At that point process 100 and process 102 are both past the second fork. Then process 101 would probably execute, call fork, and create a new process, 103. Again it's a copy at the time of the fork, so pid1 equals zero in both, and after the fork we set pid2: in process 101 pid2 is 103, and in process 103 pid2 is zero.

So now we have process 100, process 101, process 102, and process 103, they're all at this point, and we've made it through all the forks, so that's all of our processes. Process 100 was our original, and it led to three other processes: 101, 102, and 103. Any questions about that?

If we draw the parent-child relationships: process 100 was the first process. First it created 101, which is its direct child, and afterwards it created 102, which is also its direct child. Process 101 is the one that created 103, so 103 is a direct child of 101. That's what all our parent-child relationships look like. Any questions about that?

Okay, so we have four processes in total. Let's see the other questions. So here we go: how many new processes get created? Three.
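The counting argument can be checked with a small experiment. The sketch below is a reconstruction from the walkthrough, not the exam code itself: the variable names pid1 and pid2 follow the lecture, but the function count_new_processes and the pipe used for reporting are additions for illustration. Every new process writes one byte into a pipe, and the original counts the bytes.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Counts how many new processes two back-to-back fork calls create.
 * Returns the count, or -1 on error. */
int count_new_processes(void) {
    int fds[2];
    if (pipe(fds) < 0) {
        return -1;
    }
    pid_t original = getpid();

    pid_t pid1 = fork(); /* run by the original only */
    if (pid1 < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    pid_t pid2 = fork(); /* run by the original AND by its first child */

    if (getpid() != original) {
        /* One of the new processes: report in with a single byte. */
        char byte = 'x';
        close(fds[0]);
        int ok = (pid2 >= 0) && (write(fds[1], &byte, 1) == 1);
        if (pid2 > 0) {
            waitpid(pid2, NULL, 0); /* the first child reaps its own child */
        }
        _exit(ok ? 0 : 1);
    }

    /* Original process: pid1 and pid2 are the IDs of its two children. */
    close(fds[1]);
    int count = 0;
    char byte;
    while (read(fds[0], &byte, 1) == 1) {
        count++; /* read returns 0 once every write end is closed */
    }
    close(fds[0]);
    waitpid(pid1, NULL, 0);
    if (pid2 < 0) {
        return -1;
    }
    waitpid(pid2, NULL, 0);
    return count;
}
```

Assuming no fork fails, count_new_processes() returns 3: two direct children of the original plus one grandchild, matching the 101, 102, 103 picture.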
Does process ID 100 or any of its children create any orphan processes, and why or why not? The answer is no. The first process creates two children, and we can see that process 100 would call wait once here, because its pid1 is 101, which is greater than zero. So it waits once, for a child to die. The next thing it does is go into this branch, because pid2 is greater than zero, so it waits a second time. Two children, two waits. That matches up, so it wouldn't create any orphans.

Then process 101: it has one child, 103. Would process 101 go into the first if branch? It would not, because its pid1 equals zero, so it skips that wait. But its pid2 is 103, so it goes into the second branch and calls wait. One child, one wait, all good. That's actually all the waits we need, so nothing creates any orphans.

But then there's the last part, which says: there's an issue with this program. When you run it, it seems fine, and that's because we don't check for any errors. What's the issue, and how could we fix it? You can just describe what you need to do to fix it instead of writing the code. Can anyone guess what the issue is? What error would I actually see if I exhaustively checked all the system calls for errors? I'll give you a hint: it's related to that orphan thing. We checked these two processes, so the hint is, what would this other process do that is kind of weird?

Yeah, so process 102: its value of pid1 is 101, so it would go into this branch and call wait. Does it have any children? No. So it probably shouldn't have called wait. If you actually checked for errors, wait would have returned an error that said, hey, you don't have any children, so why the hell are you calling wait?
So that's essentially our error. To fix it, we just have to make sure each process calls wait only as many times as it has children. The easiest fix is that if we detect we're the new child process, we reset pid1 to zero, because it just means we're the child. Or you can do any number of other things to make sure you don't call wait more times than you need to. Fairly easy fix. Any questions about that?

Okay, cool, we'll look at more then. Here's some basic IPC code. So this is... oh, yep?

So if you just look at what process 100 does: it starts executing main and calls fork, which creates a new process, and fork returns the process ID of the newly created child, which we store in pid1. So if we just follow process 100, the first fork returns 101, so we set pid1 to 101. Then it calls fork again, creating another child, and this fork returns 102, so pid2 equals 102, and then it just keeps going.

Yep? No, because if you set your variable you're independent, right, because of the fork. So you wouldn't affect the original process. Anything after the fork, remember, you're independent. The child only has the value of pid1 because it already existed by the time of the fork, in every case except the first fork, where pid1 doesn't exist yet. But like we saw, any changes you make after the fork are independent between the processes. Remember we had that example where we even printed the address of some variable, and it was the same in both processes, and we changed it in one and it didn't change in the other.
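The stray wait and the suggested fix can be sketched as below. This is a reconstruction of the pattern described above, not the exam code: the function count_failed_waits, the reporting pipe, and the apply_fix switch are all added for illustration, and it assumes that the error wait reports when there are no children is ECHILD, as on POSIX systems.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Writes one byte to fd; used to report a wait() that failed with ECHILD. */
static void report_stray_wait(int fd) {
    char byte = '!';
    if (write(fd, &byte, 1) != 1) {
        _exit(1);
    }
}

/* Runs the two-fork pattern in a fresh process and returns how many wait()
 * calls failed with ECHILD across all four processes (-1 on setup error).
 * With apply_fix set, a child of the second fork forgets the pid1 it
 * inherited, which is the fix suggested in the lecture. */
int count_failed_waits(int apply_fix) {
    int fds[2];
    if (pipe(fds) < 0) {
        return -1;
    }
    pid_t root = fork(); /* plays the role of "process 100" */
    if (root < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (root == 0) {
        close(fds[0]);
        pid_t pid1 = fork();
        pid_t pid2 = fork();
        if (apply_fix && pid2 == 0) {
            pid1 = 0; /* brand-new child: it has no children of its own */
        }
        if (pid1 > 0 && wait(NULL) < 0 && errno == ECHILD) {
            report_stray_wait(fds[1]);
        }
        if (pid2 > 0 && wait(NULL) < 0 && errno == ECHILD) {
            report_stray_wait(fds[1]);
        }
        _exit(0);
    }
    close(fds[1]);
    int failed = 0;
    char byte;
    while (read(fds[0], &byte, 1) == 1) {
        failed++;
    }
    close(fds[0]);
    waitpid(root, NULL, 0);
    return failed;
}
```

Assuming the forks succeed, count_failed_waits(0) reports one failed wait (the "process 102" case), and count_failed_waits(1) reports none. Because the reset happens after the fork, it only changes the child's copy of pid1.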
Yep? No, it just gets an error from the wait system call that says you don't have any children, so it boots you out immediately. If you check for errors, you might encounter that in lab 2 if you've screwed up your code. That's why it's good to check for errors, because having too many wait calls is not good. Depending on how you do lab 2, you might actually expect to hit the point where you have no children left, and there's no other way to know than reading the error and confirming you got the expected one. There are at least two major ways to do lab 2, so you get to pick your poison. That should make lab 2 more fun.

All right, any other questions? Yep? So wait is just a system call. The kernel will not return from the usual wait until a child is dead. wstatus is just a way to get all the information out of a dead process back to the parent. In this case it would write a value to it, but I never use it. In lab 2, one of the things you have to do when one of the child processes dies is tell me its exit status, so you would actually need to go through and read all that, like we showed in the examples. In fact, I think I could have called wait with just NULL here.

A good thing to know is that wait by itself waits for any of the children to die, not a specific one. If you want to wait for a specific child to terminate, you use waitpid and give it the pid you specifically want to wait for. Or, if you want to turn waitpid into wait, you give it negative one as the pid argument and it'll wait for any child.

Okay, any other questions about this? Okay, the next one. So we've got more code: more code, more forking, more fun. At least reading it real quick, thankfully it's written in such a way that it doesn't hurt your brain with fork, fork, fork, fork.
So the main process essentially creates a pipe and then forks. The first child messes with some file descriptors and then execs, so we don't have to argue about that child anymore. Then the main process continues along and creates another process, which eventually also calls exec. The parent just closes its pipe file descriptors and then waits for two processes, which makes sense because our original main process creates two processes, so it waits two times, and we don't care in which order they die. That's assuming no functions fail.

The question says we compile the program on the previous page and execute it as a new process. Again, assume the system calls don't fail for whatever reason. One process becomes ls and one becomes wc, which is just a word count program, but we don't even have to know about that. Explain how the two processes communicate. You should explain in terms of each process and the read and write system calls, since they both operate on file descriptors. So how do these processes communicate, and what is the communication?

The way this works is we create a pipe. Here, I can draw a better pipe than that. Wow, look, a pipe. So we create a pipe, and it has a write end, which is fd[1], and anything we write to it pops out of fd[0], the read end. The pipe is managed by the kernel. Then we create two processes. In the ls process, the first child, we make its file descriptor 1 point to the same thing as fd[1], and the rest of its file descriptors are all kind of default. That is, we dup2 fd[1] onto file descriptor 1, so we're replacing whatever file descriptor 1 was with the write end of our pipe. Then we close fd[1] so we don't have any extra file descriptors open.
Hopefully. Then we execlp, so whatever that process tries to write to standard out gets written to the write end of the pipe, and that just starts filling up the pipe with a bunch of stuff.

In the second child, it has the three standard file descriptors, which so far we haven't touched. The dup2 call replaces file descriptor 0 so that it points to the same thing as fd[0], which is the read end of the pipe. Wow, that's a big eraser. We set it up so that it has the read end of the pipe before we exec, so that any time it tries to read from standard in, it does a read system call on that and reads from the read end of the pipe, getting anything the other process writes to the other end. In the parent process, it closes both ends of the pipe, because it doesn't actually need the pipe and doesn't use it, and then it calls wait twice.

Okay, so that's the first part of the question: explaining that we set up the pipe so that when one process writes to standard out it fills up the pipe, and when the other one reads from standard in it reads from the pipe and empties it. Any questions about the first part? So we're good with pipes.

Okay, then the worst part is that when you run this program, it looks like it hangs. So I'll give you a hint. Oh, yep? Oh, sorry, my bad, there's a question: will the second if statement be executed twice? Which one, this one? The answer to that is no, this one will not be executed twice. So what's going to happen? We can give things numbers again. Process ID 100 starts executing our program, normal C execution, so it gets some space. Wait, just out of curiosity: where does it allocate space for those file descriptor ints, that array?
Yeah, the stack. Is everyone good with the stack when I say it like that? So they get allocated on the stack. Them telling you that in first year may not seem to matter right now, but trust me, it's going to matter, because when we get into more advanced topics that distinction will definitely matter.

So those get allocated on the stack, and then our main process creates a pipe, and the pipe system call fills in those values. Then process 100 calls fork and creates process 101, probably. They're both exact copies of each other, including all the file descriptors that were open and everything they point to. So process 101 also has the pipe open, and it has the same values for fd[0] and fd[1], which would probably be 3 and 4. In process 100, pid is equal to 101, and in process 101, pid is equal to zero. So only 101 goes into that if branch, and when it goes into that if branch, it's not coming out.

It goes into that if branch, messes with the file descriptors, and then calls exec. Remember what execve does: it transforms this process so that it starts executing another program. So process 101 hits execlp, and I said to assume nothing fails, so it execs ls. Process 101 just becomes ls and starts executing that program, and it never returns from execlp, because it's executing something different now. It's not executing your code at all. So process 101 would not hit the second if branch, because as far as it's concerned that code doesn't exist anymore; it's executing something else. And when ls eventually calls exit, which terminates the process, that process is done. So how could it ever get back?
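The point that execlp never returns on success can be sketched as follows. This is an illustrative helper, not the exam code: the name run_program is made up, and the exit code 127 for a failed exec is just a convention borrowed from shells.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that tries to become `name` (looked up via PATH, with no
 * extra arguments). Returns the child's exit status, or -1 on error. */
int run_program(const char *name) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execlp(name, name, (char *)NULL);
        /* Only reached if execlp FAILED: on success this process is now
         * running a different program and none of this code exists in it. */
        perror("execlp");
        _exit(127);
    }
    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

For example, assuming a program called true is on the PATH, run_program("true") gives back 0 and the lines after execlp never run in that child. With a name that does not exist, execlp returns, and the child exits with 127.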
Then the other question: say we have a parent with two children. If one child writes to the pipe, will the other child be able to read from it? In this case we do have two children: process 101, which is ls, and process 102, which is wc. We set things up in such a way that when process 101 writes to the pipe, process 102 sees whatever it wrote on the read end of the pipe. So we created two child processes and set them up so that they can communicate; the communication is between our two child processes. What we saw last lecture, when we had a subprocess, is that we communicated with our child directly. That was parent-to-child communication, which was fewer steps. In this case we're making the children communicate with each other. So hopefully that answers that. Any more questions about that? Yep?

Yeah, if execlp succeeds, this process starts executing a different program. And yeah, the process ID stays the same, but that process will end whenever that program ends. It starts executing something completely different, so it never returns.

Yep? Yeah, that is the second part of the question, so let's go to the second part. It just says: when you run this program, it looks like it hangs. Why, and what would you do to fix it? So we've got the fix: I didn't properly close my file descriptors. Here, in the first child, I didn't close fd[0], and in here, in the second child, I didn't close fd[1]. But the major question is: which one of those missing calls caused my program to look like it hangs? Because one of these doesn't really matter, and the other one's really important.
Yeah, so the answer is this one is the important one: the write end of the pipe. Remember, if you're consuming things and you're just reading from standard in over and over again, the only way you know you're done is if you get a zero back from that read call. In the context of pipes, that means no one is left who could write any data to the pipe, so it's not possible to get more data. Once you empty the pipe, nothing else is coming in. It's closed for business. So if you don't close fd[1] in that same process, the write end of the pipe is still open, held by the very same process that's reading, which is really annoying. It's never going to finish, because someone has the write end of the pipe open. That read will just sit there and wait and be like, yeah, just wait a bit, you could have more data coming in later. So I need to close fd[1] in the second child in order for it to know that no more input can possibly come in from the write end. And it would be good practice to also close this one, because I don't need it anymore.

Yeah? So the question is: after I call dup2 and then close fd[0], am I not affecting file descriptor 0? It's a bit easier to visualize if you consider the file descriptors as pointers. Right before I do any dup2 calls, it would probably look something like this: file descriptor 0 points to the original 0 from whenever the process got made, and I don't know what it actually refers to; 1 is the original 1; 2 is the original 2; 3 is the read end of the pipe; and 4 is the write end. What dup2 does is take this second argument and make it point to the same thing as fd[0].
So in this case, let me put the numbers in: fd[0] would be 3. If you substitute the values, the call says "make 0 point to the same thing as 3." So I take 0 and point it at the same thing as 3, and dup2 also closes the old 0 before it does that, so the original is gone and there's no way to get it back. That makes 0 point to the read end of the pipe, and 3 is also the read end of the pipe, so I have two file descriptors that point to the same thing. If I read from 3 or read from 0, I'm actually reading from the same thing.

Then the close of 3 essentially just removes that pointer. I still have a way to access the read end of the pipe through file descriptor 0, but if I try to use file descriptor 3, it's no longer valid because it's closed. It's just gone now. And if I close file descriptor 4, I'm removing this process's only reference to the write end of the pipe. Now nothing here references the write end of the pipe, and you can't get it back anymore. Any more questions about that?

It's probably easier to think of file descriptors as kind of like pointers. When you fork, you have the same set of file descriptors and they all point to the same things. But afterwards, if you do dup2s or closes or whatever, you do it on your own set of file descriptors.
Whichever process makes those calls, they only touch its own set. So process ID 102 here closing its file descriptors doesn't affect the main original process. Its file descriptors would be untouched.

And it's good form to close any other open file descriptors, so your process only has file descriptors 0, 1, and 2 open and there's no other garbage open. You'll see that VS Code does not follow that, because one of the things you have to do in lab 2 is close any file descriptors that are already open, and you'll quickly discover some garbage that's open. And yeah, you can discuss it on Discord. Spoiler alert: I think you'll have file descriptors 19 and 20 open because of VS Code. I don't know, it's written by Microsoft, so they're not very good at Linux.

All right, I guess that's it. We're out of time. So just remember: pulling for you, we're all in this together.