Let's get going. So today we're going to try to finish talking about fork. And then we will start talking about synchronization, which will probably consume us for the next week. Last year I reordered the lectures somewhat to try to talk about synchronization a bit earlier in the semester so that it overlapped a bit more with assignment one. This year people are off to such a great start on assignment one that this didn't work out perfectly. But this will still hopefully help you as you finish up assignment one next week, particularly as you work on some of the synchronization problems. We're not going to talk as much about the primitives; we'll talk more about how you would use those primitives to solve problems, so that dovetails kind of nicely with the second half of assignment one, which is solving the stoplight problem and the whale mating problem, things like that. OK. So as promised, the assignment one checkpoint. At this point, assignment one is due a week from today at 5 PM. If you have not started, you're way behind at this point. (I should make sure that I updated these.) If you don't understand semaphores, you're behind. If all you have are working locks, you're a bit behind at this point as well. Remember, you really need to start these assignments when we release them, not when you start to wonder about the deadline next week at some point. If you've finished CVs, you're a bit behind. Yeah, I see people laughing. If this is provoking peals of laughter, then you're way behind. Either that or your laughter is like, yeah, we already did all that stuff. If you've solved one of the synchronization problems, you're probably in OK shape. And if you're working on reader-writer locks, you're definitely in good shape at this point. People in office hours today were at this point. So I think people are off to a good start.
But if this is making you concerned, or if you're sitting there like, "we're totally lost," then maybe start coming to office hours. So, one more reminder: the parts of this assignment that really have to be rock solid are locks and condition variables. The rest, well, if you don't finish the rest, you'll lose points, but you won't struggle down the road. And I have personally signed off on several people's locks and CVs in office hours, which probably means that they have some subtle bug that they're going to find in a week, and then they're going to flame me on Rate My Professors or whatever. But anyway, I'm pretty sure that they're right, actually. I hope so. All right, any questions? Yeah, Rachel? Nope, assignment two, you need all of it. Yeah. Sorry. Yeah, I wish. All 150 points of assignment two are required for assignment three. So you can really get yourself into a sad place with assignment two, where for some students in the class, that's the last part of the course they ever do. Everyone else has gone on to assignment three and they're doing cool stuff, and they're still working on assignment two because they started the night before or something like that. So yeah, don't do that. Assignment two: you need all of it to work for assignment three. Good question. All right, any other questions at this point? Yeah. When will test161 be up? OK, good question. So I was just writing an announcement on the forum, so I didn't want to do these in class. But we have an updated version of test161 that we're going to roll out today. We also have some changes that you need to merge into your OS/161 source tree. Now, I will apologize for this in advance. I know that the course ratings are declining right as I'm saying this. So how many people have ever done a Git merge before? OK. How many people have handled a Git merge conflict before? OK, good. I'm really happy about that.
So if your hand wasn't up, go find one of those people, because you're probably going to have to handle one now. So I apologize for this in advance. Yeah, stop muttering. I'm going to explain it. Yeah, what's that? OK, good. All right, muttering: stop. Let me defend myself. So here's what happened. David Holland released OS/161 2.0.3. This is a minor version increase. We decided it's early enough in the semester, let's get that out there. Most of the changes are not going to touch anything you're working on. So 95% of the changes will not conflict with anything you've done, unless you're off modifying parts of the system that you should not be. If you've been rewriting the assembly code for the trap handlers, you're in trouble. If you haven't been, you're fine. There is one place that is probably going to cause a conflict for a lot of people. David has added a new feature to OS/161 that I think you guys will actually find useful. It's deadlock detection. We will talk about this next week, but your kernel can get into a state, because of how you've acquired locks, where nothing is able to run: all the threads are asleep and nothing's making forward progress. David has implemented a deadlock detector as part of OS/161. So how many people want that deadlock detector as part of their code base? They think they might be OK. So in order to get that, there are some small changes to your lock_acquire and lock_release that you have to make. There's a new function that you call at the beginning of lock_acquire and at the beginning of lock_release. That change is in the patch that we're going to give you, but it will probably cause a merge conflict, because I suspect that Git won't be able to figure out where to put it. It is not going to be hard to fix. You just have to take that one line of code, put it at the right place in your lock_acquire and lock_release, and then deal with the rest of the conflict. Does that sound OK? We can still be friends? OK.
Now, I'm serious. This is useful. And this is not the last time that we will push changes this semester; I'm certainly not guaranteeing that. This is part of life as a software developer: dealing with things like this. But we made a calculated decision here. We said, OK, this is probably going to cause a merge conflict. And you know what? You guys are big people. You can handle a merge conflict. It's a small one, and I think it's worth it because this feature is worth it. In the future, we may distribute more changes to the base system, but they will be to parts of the system that we really don't expect to cause merge conflicts. So for example, Scott has some changes he wants to make to the test161 directory. As long as you're not poking around there and modifying things, those changes should merge in beautifully without any problems. Any questions about this? And of course, we're doing this on Friday afternoon. Happy weekend. Yeah, I will be on the forum over the weekend to help. We're going to provide directions. If you're having trouble: the worst thing you can do with a merge conflict is just to "git add ." and make it all go away. Because now you have files that you've committed that have Git conflict markers in them, and nothing will build. I've certainly had people come in saying, "my kernel stopped building." And you're like, yeah, it's because that huge line of equal signs is not valid C code. So that's not going to compile. Whatever, this is all fixable. It's not that big of a problem. Any questions about this? Oh gosh, I can tell I'm in trouble. We're still working on weekend and remote office hours. Sorry about that. Blame Ali, he got busy with other things this semester. But we don't have any weekend office hours this weekend.
Although maybe this afternoon I'll ping some of the people that are doing remote office hours for us and see if they can do forum office hours over the weekend, particularly to help people with Git and things like that. But yeah, when there are changes to the office hours schedule, we'll certainly let you know. We're hoping to add some of those. Yeah, oh, that's not a question. OK, so remember last time we made this subtle change to the process model where we introduced a new level of indirection in how file descriptors (the identifiers that processes use to identify the files that they're using) map down to the actual lower-level identifiers that the kernel uses to identify files. So this is the mapping: the file descriptor maps to a file handle. The file descriptors are private to each process. The file handles are usually private to each process, except in a couple of cases that we're going to talk about today, where they can potentially be shared by two, or actually more than two, processes. And then the file handle contains a reference to this lower-level file object that's maintained by the file system itself. And that's what actually allows the file system to find the content on disk or over the network or in memory or whatever. OK, yeah, good question. Yes, it does. And the information it holds is important, and we'll come back to that. So part of the reason that we did this (great question): the file handles have information in them. When we added a level of indirection, we split some of the information into new places. If you only have two pointers, then there's certain information that has to be in one place or another. Once I have three levels of indirection, I have another place to put things. And so we have some things in the file handle that are important because they're used in certain sharing scenarios. Specifically, file handles store the current file position.
Remember that we talked about how, in the system call interface for dealing with files on Unix, read and write, the file position is implicit. The file position is the result of where the file pointer ended up based on past operations. There is a way to position the... why is that there? Who is using that for anything? I don't understand. See, I'm totally blanking. There are ways to position the file pointer manually, and you guys will work on this for assignment two. People have started to look at assignment two. There's a system call called lseek. And lseek is designed to allow you to tell the kernel exactly where in the file you want your next file operation to go. So if I want to read from a particular byte of the file or write to a particular byte of the file, I can use lseek to position the file pointer where I want it and then perform the operation. But normally, the file pointer is updated at the end of reads and writes to point to the end of the operation. If I read 256 bytes starting from position 256 in the file, where does the file pointer end up? 512, right? I've read the 256 bytes and the pointer points to the end of the operation. Does that make sense? OK, yeah. All right. The file objects you don't really have to worry about. Those are maintained by the file system, and they can be transparently shared across multiple processes in ways that you just don't have to bother with. OK, so we talked about fork. Right at the end of class we were talking about the fact that fork creates a copy of the parent process. It copies all the memory that the parent process is using. It copies the parent process's file table; we'll talk about the consequences of that in a minute. But there's one thing that it doesn't copy. What does fork typically not copy over from the parent? The child is a carbon copy of the parent, except for what?
No, the file table is copied. Yeah? Any threads other than the calling thread? Any threads other than the calling thread. And there's a reason for this, right? The reason is somewhat obvious. Well, I don't know why I would say that. Of course it's not obvious to you; you're here in the class trying to learn this stuff. It's obvious to me because I've talked about it for years and years. There's a good reason for this, let's put it that way. The thread that called fork... So remember, fork is this weird beast that returns twice to the same point in the program. And we'll show some example code later that shows you exactly how fork works. It's kind of weird. You call fork, and suddenly there are two processes executing the same code at exactly the same place. That's kind of cool. The problem is, what happens if there are a bunch of other threads out there that are doing stuff? How do I copy them? And so for certain reasons, it is much easier to copy the thread that called fork than it is to copy any of the other threads in the process. In particular, it's hard to stop all the threads in the process at some well-defined place so that they can be copied appropriately. And this is why, typically, Linux will only copy state for the thread that called fork. So people were asking after class: let's say that I want to create threads in the child process after I call fork. What do I have to do? Let's say that the parent has eight threads, and the child wants to have eight threads. The child starts off as one thread, so what does the child have to do? Call clone, which is the fork equivalent that creates new threads, and create a bunch of new threads. So it's just sort of up to the child to do this itself. I mean, I talked about this. This is the reason: the thread could be blocked in the middle of doing something, which is really hard to get right. Or the thread could be doing something in the program itself.
And so for a variety of reasons, copying the threads that are not calling fork is hard, so we don't do it. So fork copies one thread (the caller), it copies the address space, and it copies the process file table. So here's my little diagram. I copy one thread; let's pretend thread one called fork. I copy the address space. We'll talk more about address spaces in a month. Now here's an interesting question. The file table contains pointers to these file handle objects. If I copy the file table for the child, where are its file descriptors going to be pointing? So I copy the file table. This is the one place where file handles can be shared. Does this make sense? I'm literally copying the file table. And the file table contains pointers to these file handle objects. So if I copy the file table, the child now has pointers to the parent's file handles. The child starts off with the same files open and the same file descriptors. So if the parent had 0, 2, 4, and 6 open, the child has 0, 2, 4, and 6 open when it starts. And the child has pointers to the same file handles that the parent is using. We'll come back and talk about that in a second. But this is the reason that we split that information into two pieces: because it allows the child and the parent to communicate. There is one difference between the environment that the kernel creates for the child and the environment that the kernel creates for the parent, and it's somewhat important. So remember, the goal is: if I only have one thread in my process when I call fork, the process that's created is a carbon copy of the parent, except for one difference. Has anyone ever seen code like this or written code like this before? This is the canonical way of calling fork, typically, particularly when you want the child to do something else. So remember, I call fork. And I should do error handling here, I know. Sorry. Let's pretend this works. I call fork. What happens right here?
How many processes execute this line of code? I just called fork. Let's say fork succeeded. How many processes? Two. Start to wrap your mind around this idea of multi-threading, multi-programming. There are now two processes that are executing the identical code, except there's one difference. What's the difference? Can you tell from the slide? Yeah, so fork returns twice, but it doesn't return the same value twice. One process gets zero, and the other process gets a number. The process that gets zero is the child. So if I check the error code... sorry, if I check the result of fork and I get a zero, I know that I'm the child process. That's the only way for me to know: by looking at the return value of fork. If I get a number back, I'm the parent process. Now, this code doesn't use that yet, but what would be a useful number to pass to the parent process? The process ID of the child, so that's what fork returns. Fork returns twice. The child gets return value zero. The parent gets the PID of the child as its return value. This allows parents to keep track. You don't want your child just wandering off, or you're gonna get a call from social services. First of all, they're gonna be like, your child's just wandering off here and nobody knows about this child, and then they're also gonna be like, how did you have a child on your own without help? So there's definitely gonna be some sort of investigation there. Yeah, so the return value allows the parent to keep track of the children it's created. It doesn't have to use the return value... well, I shouldn't say it does, it can, and there are some system calls where you need the child's PID. All right. Okay, all contents of memory. Now here's the interesting thing. I know this is small text. Both the parent and the child run after the fork completes, so right at this line of code, there are now two programs running this line of code. Two processes.
Those two processes have the same file descriptors open, and the interesting thing is, because the file handles are shared, the position in the file is also shared between the parent and the child. So let's say the file position is zero and the parent writes 256 bytes. The next time the child writes, where will the child write in the file? 256, because the offsets are shared. Now, there is no requirement for the child... well, we'll talk about this in a second. There's no requirement for the child to continue this arrangement. And this is what certain types of children do, right? The child can just go through and close all of its file handles and reopen them and point them at anything it wants. And when it reopens the file handles, will they be shared with the parent? No, right? This arrangement is only set up right after fork, not at any other time. All right, so somebody... yeah, sorry. Great question. So this is something you guys will have to figure out for assignment two, and we'll come back to this when we talk about synchronization. Let's say that the parent and child, at the same time, try to write. Say the file pointer is zero, and the parent and child both try to write 256 bytes to the file. What are okay things to happen? What's one okay thing to happen? Parent and child both write 256 bytes. Yeah. Well, I'm not gonna tell them to wait, I'm gonna complete the operation, right? So both of those writes are gonna complete. There are two valid outcomes in terms of what ends up on disk. What's one valid outcome? The parent's content first, the child's content second. What's the second valid outcome? Child first, parent second. Anything else is not okay, right? So the writes to shared file handles, this is a good point, are serialized by the kernel.
So if the parent and child go back and forth writing 32 bytes, even if those writes happen at the same time, somebody goes first and somebody goes second. What should not happen is that they clobber each other; the file position should always go up by 256. So if two writes happen at the same time, when those two writes are done, the file position had better be 512. In terms of what's on disk, that's not up to the kernel. This is a classic instance of what's called a race condition, which we'll start talking about today or Monday, where something about the timing of events ends up influencing what happens in the system. Typically you try to get rid of race conditions. If you're a programmer, you may actually care who goes first and who goes second, and in that case, you have to do some extra work to make sure that somebody goes first and somebody goes second, right? Does that answer your question? Okay. All right, fork bomb. Has anyone... I mean, you guys can write this code. Write this code in your VM and just try it, right? I mean, what does this code do? Yeah. Yeah, this might make Ken mad. Hopefully, again, most of the shared systems have reasonable limits about this, but last time I checked, Timberlake was running an ancient version of Linux, so who knows? Yeah. Okay, this is... okay, anyway. Seems like there have been some flaws in the autograding system. There are no such problems with test161, right. Yet, as of now. Yeah, so this creates a geometrically increasing number of processes. The first time I have two, then I have four, then I have eight, right? So this is kind of like, you know, those movies where things keep multiplying; that's my favorite. Those are great movies, by the way. It's probably time to queue those movies up again and watch them, because I haven't seen them for a while. Okay, any questions about fork? So let's talk... we're about to get to the point where we talk a little bit about why we've done some of this.
Where did some of the semantics of these calls come from? You might think, like, what's the point? I don't see the reason for the parent and the child to share these file handles after fork. It just seems complicated. And again, in a lot of cases, the first thing that the child does after fork is use a different system call, exec, to become a completely different program. That's what the shell does every time you type a command. Every time you type a command, the shell forks a new copy of itself and then it runs the command that you gave it. That way the shell is still waiting there for that command to return. But if I'm running a new command, all that copying work is wasted. So in many cases the parent and child never use these shared file descriptors, but I can use them in a clever way by using the pipe system call. So pipe creates something that's called an anonymous pipe object. It returns two file descriptors, one for each end. And I said read-only and write-only; I actually think that's not true anymore. I should fix the slides. The point is that pipes have two ends. It's like a socket. If I write something to one end, it becomes readable on the other end. So I can send data from one end of the pipe to the other end of the pipe. If I'm writing on one end of the pipe and somebody else is reading, they'll get the contents that I'm writing. Now, this is another one of those cases, remember we talked about file-like objects, and we'll come back to this when we talk more about file systems. This is another example of a file-like object. Pipes are not implemented using disk blocks. There's no file. You can create something called a named pipe in Linux, and I'm not gonna talk about that, but the pipe system call returns you this thing that acts like a file, in that I can read and write to it, but there are no disk blocks being used.
There's no permanent storage for this. It's all done in memory, just using an in-memory buffer that the kernel maintains. So yeah, anything I write to one end becomes available at the other end. How many people have done socket programming before? Okay, so you guys get this, all right. So why is this useful? Why are pipes cool? Because pipes allow you... I mean, how many people have used a pipe in a shell command? Hopefully more of you now. Okay, this is how that works, right? If you've ever wondered how the shell executes this complex set of pipelines that you've created, here's how it's done. Before it calls fork, the parent uses pipe to open up a new pipe. After it calls fork, the parent and child have an agreement: one closes one end, the other closes the other end. And now what I've created is sort of a passageway, so that any data that's written by the parent to one end of the pipe is received by the child. And I can do this ad infinitum: I can use this to set up shell pipelines between an arbitrary number of processes, but we'll just talk about how to do it with two. So here's how this works visually. The parent calls pipe. Remember, pipe returns two file descriptors, right? One for one end of the pipe and the other for the other end. After I call fork, the child has a copy of the parent's file table, and so it has these file descriptors. And then, because each process doesn't need both ends open, it just needs one, as long as we close the right ends, this is what I end up with. So this is a way to set up IPC between the parent and the child. This allows the parent to send the child data. Does that make sense? If you wanna make sure that you understand how the file table has changed after fork, this is the example to go through. Yeah. They are initially, right? So remember, when I start, this is how things look exactly after fork. Does that make sense?
And then the parent closes one end and the child closes the other end. They just have an agreement, like, okay, the parent's gonna close end six and the child's gonna close end seven, right? And I can do that because the parent and child can figure out who they are. Yeah. So the pipe is like a file, except it doesn't obey the precise semantics that a file obeys, right? If I write to one end of the pipe, the data becomes available at the other end of the pipe to someone who reads. So again, it's very much like a socket. When I send on a socket, someone who's listening on the other end gets the data, right? If I listen on a socket and somebody sends on it, I get the data. Does that make sense? Yeah, I mean, there's some memory in the kernel that usually has to be used for this, right? Because, for example, let's say the parent writes some data to the pipe but the child hasn't run yet. The kernel will save that data in a memory buffer, and then when the child calls read, it gets the data out. So yeah, there's definitely memory that's used by this. I don't think there has to be; I think you can set it up so that, as another way to do it, when the parent calls write, it blocks until the child calls read, right? And then I can move the data directly from one to the other. But I think normally there's a little bit of a buffer in between, yeah. I think so, I think, yeah, I think so. That sort of violates our typical understanding of how shell pipelines work, which is that the data doesn't travel uphill, but I think that is possible, yeah. I mean, exactly who's the parent and who's the child, I would suspect that it's set up that way. I think the shell may actually be the parent of both, right? But this approach is being used to set up that pipe between the two of them. Yeah, yeah, it's created before it calls fork. Yeah, good questions. Has anyone used a named pipe before?
There's a way actually to create a file that acts like this. So there's a way to create a file, and again, it's not a real file, it's a fake file, such that two processes can communicate with it like a pipe. Normally, if you find yourself using named pipes, something has seriously gone wrong and you're trying to do something that is not a good thing to try to do, but anyway. Okay, so here's an example using C-like pseudocode. Pseudocode-ish, right, yeah. How do I identify the pipe's two ends? How are those ends identified? What are those ends? When I call pipe, what do I get back? I get back two what? File descriptors, right? One file descriptor identifies one end, the other identifies the other. Yeah, yeah. And actually, we can sort of finish this example, yeah. Yeah, it's called sys_pipe. You are free to implement it if you want for assignment two. It's actually not hard to implement. We don't ask you to do it; maybe this year I'll just add it to the assignment for fun. But yeah, once you guys get through assignment two, it'll be pretty obvious to you how you would implement a pipe. Yeah, I don't know, right? That may be something that you can set up when you call pipe. Yeah, I mean, I don't think it would be good defensive programming for the kernel to allow me to create an arbitrary memory buffer of infinite size, right? At some point, I think what happens is, if the buffer that the pipe uses to hold information between the two gets full, stuff just starts to block. And you may have seen this. I remember trying once to set up a shell pipeline that had six or seven components to it. And at some point, what happened is that one of those programs wasn't reading data properly. And what will happen is the ones at the front will start to run, but at some point, if there's a blockage... I don't know, I mean, we're talking about pipes, right? So it's like, if a pipe gets blocked, stuff will just start to stop.
And actually, I wish that real sewers worked this way, right? Because, and this happened to me last summer, your sewer will just kind of burst and explode, and that's not good. Pipes will just stop, right? The process will not be able to continue to run, because it'll try to write data to the pipe and the kernel will stop it and wait until there's room in the pipe. Yeah. The code that we're using is set up to allow that to happen, right? This is a particular design pattern that uses properties of fork and the pipe system call. There's no requirement that I do this. Again, if you're the child and you're just created, you can write code where the first thing the child does is close all the file descriptors, right? In fact, there's actually a variant of fork that allows the child to not inherit file descriptors from the parent. So I can call that version of fork, and then the child starts with its own file descriptors. So there's no requirement to do this. Again, all versions of IPC require consent from both parties. There's no way for me to force another program to accept data from me. Yeah, good questions. Okay, so let me go through the example. I call fork at this point. Who am I right here? Return value is zero: I'm the child. So I close one end of the pipe. So this call, and again, ignore the parts about read and write ends, I think they're bidirectional, this call gets two file descriptors, right? And this would be an array of file descriptors, whatever. So pipe will put the two file descriptors into this array. The child closes one end and then tries to read data from the other end. Then the parent, this is the parent now, closes the other end and writes data to the child: "Hello, sweet child." I'm tempted to change it to say "sweet child o' mine" or something like that. Questions about this example? Okay, any questions about fork? Now let me see where we are.
I think we're gonna land at a good spot today. Okay, yeah. Yeah, I don't wanna... I'm writing to this end, so I don't wanna read data from the other end of the pipe. I don't need to read the data that I wrote, right? Yeah, yeah, it's probably a typo. Yeah, all right, pipes. Okay, so one of the core problems with fork, and one of the reasons that there have been so many different fork variations spawned, is that fork has to take the parent and create this new process that is a carbon copy of the parent, and there is a fair amount of state to copy over. Again, we haven't talked about these specific abstractions, but what would you suspect is the most time-consuming part of the parent's state to copy? Well, okay, you're getting warmer, right? The stacks are usually pretty small. Yeah. Okay, also warmer. I mean, there's a general category here that you guys are circling. Yeah: the memory. All the memory. Remember, when fork starts up the child, all the memory contents have to be identical. The parent could have been running for days and could have filled its heap with all sorts of stuff, and the child has to get all of it. So I have to take all the memory from the parent and I have to copy it, you know, byte by byte. And one of the frustrating things about this from the kernel's perspective is that, frequently... well, there are times when I call fork because I actually want more concurrency from the program that I'm running. Remember, go back to Apache: we saw that Apache calls fork to create multiple copies of itself. That's so that it can handle multiple requests concurrently. On some level, and this is not a perfect mental model, when you get a web page from a website, there is a thread that's part of a process that handles your request from start to finish. And again, that may not be perfectly true, but that thread is listening on the socket. It receives data from you.
It finds the page that you want. It does any sort of server-side rendering that might be required, running PHP or whatever it is. It collects the results and then it sends them back to you. And so the way that I achieve concurrency, in the simplest possible way on a web server, is I have a bunch of threads and processes that are available to handle those requests. Yeah. Oh, that's interesting. Are there ways to keep the parent and child in sync? No, not that I've heard of. That would be wild. Yeah, so that's a great point. I just want to make sure this is clear. After I call fork, the parent and child are free to go their separate ways. There is no additional synchronization that goes on. Thanks for that clarification. Once I call fork, the child starts off with a copy of the parent's address space and file handles, but at that point, all the actions that the parent and child take are separate. If the child goes through its file table and closes all the open file descriptors, the parent doesn't see those closes. If the child allocates a bunch of new objects on its heap, the parent doesn't have access to those objects. These are two separate processes, and so all the normal rules about process isolation come into play at that point. Now, so that is one use for fork, where I'm creating more copies of myself, but it's not the typical use. If you look at all the fork calls out there in the universe, I would suspect that 98% of them or more are cases where the parent wants to run a new program. So again, think about how the shell works. Simple model for a shell: you type a command, the shell takes the command, calls fork, tries to run that command, waits for the command to finish, and then redraws the prompt. That's all it is.
So every time you hit return in the shell, well, most times, since there are some shell built-in commands that don't use fork, but every time you run sys161, every time you run test161, every time you run make, compilers, whatever it is, every one of those commands involves the shell calling fork, waiting for the child process to finish, and then redrawing the prompt. So that happens over and over and over and over. And in that case, the child that gets created may do a few things to set up pipelines or other stuff, but then what it does is use this other system call, exec, and exec is like: transform me into a completely new process. And when I call exec, all the work that fork did to set up the address space, copy all that memory, is gone, because exec blows away the entire address space and starts from scratch. And so the early people who were developing Unix-like systems saw this pattern. It was obvious to them, they were programmers themselves, and they were like, this is dumb. Every time we call fork, we do all this work to copy all this stuff, and 98% of the time, the first thing the child does is call exec, and exec wastes all that work that we just did. So there's a cool approach to optimizing this, called copy-on-write, that we'll talk about when we cover memory management in about a month. The other thing you can do is just change the semantics. So there's now something called vfork. And vfork is an optimized version of fork that simply will fail if the child does anything other than immediately call exec. So I think the way this works is it sets up a read-only mapping to the parent's address space. So the child can keep running, it has access to the instructions, but if the child tries to write or modify a variable, it just fails immediately, right? And this allows me to not copy the address space.
And then, as I've sort of been hinting at for a long time, fork has essentially been replaced by a new system call called clone, which is much more powerful. It allows me to control at a very fine granularity exactly what the child shares with the parent and what gets copied. It also allows me to create threads, right? So there are variants of clone that allow me to create a thread within the same process rather than a new process, all right? And yeah, people have gotten clone to work on OS 161; it's certainly doable, it's not hard. So fork establishes this parent-child relationship between two processes. Every process on the system, even if all the process does is call exec or close all the file descriptors, even if the process wants no further relationship with its parent, there still is this relationship. And so you can visualize all the processes that are running on a particular system as a tree. And there's a tool called pstree; I don't know if it's installed, but it's pretty easy to install. So here's an example run on the old 4.21 web server. This is a pretty simple LAMP server that we had set up, and this is the process tree. So one interesting question is, where does the first process come from? What is the name of the first process here? init. Has anyone ever set up init scripts for a machine, to figure out what runs at boot? Yeah, so init is the first program. It gets set up kind of by hand, because init has no parent, right? init was just born from the head of a god or whatever. So init has no parent; the kernel sets it up by hand and launches it when the kernel boots. And then at that point, every process on the system is somehow created by some other program. So you can see here that I have an Apache web server, and you can see Apache created a bunch of other processes; like I said, Apache creates both threads and processes.
So Apache created four copies of itself; two of those copies have 17 threads each, and the other two have 26 threads each. So the total is, I'm not good at math, like 104 or something like that. Is that true? No, I lied, I'm really bad at math. 36, 42, 84, who knows? I don't know why, 84. Wow, that's a good number. I don't know how they came up with that number. And this is all the other stuff. So everything on this list was started by init, right? init started these, a bunch of sort of classic system services. exim4, anyone know what exim4 is? David, really? It's a mail server. What else is here? There's sshd, which you can see has started a couple of children; I have no idea why sshd creates several children. But here's the shell that I was using when I launched the command, and here's the command I used. So the command that's running, the one that generates all these relationships, fits into this hierarchy like every other process on the system. This makes sense, and it's kind of cool. You can run this on your own machine and check it out. You may or may not be able to get this to run, I don't know. Try it. Okay, any questions about, yeah, good question. How does pstree harvest this information? What's your guess? /proc, yeah. It's just reading through /proc. And, well, it looks like I didn't run this as sudo, so I guess this information is considered by the kernel to be not sensitive enough to actually keep private. All right, I think we're gonna stop here today, and then we will have a full week of synchronization next week. Again, keep your eye on the forum tonight for the announcement about how to do the merge. And we will see you guys on Monday. We have office hours now, first floor, Davis.