All right, everybody, welcome back to CS162. As those of you who are local have noticed, today it's like we're on Mars or something, because the sun is red and there's smoke in the sky. It's pretty strange. But let's see if we can get a good lecture out of it anyway. So today, we're going to continue our short discussion of some abstractions at user level, both to help you get going in the class and to see what it is that we're going to be doing in the kernel when we try to support these abstractions. We're going to talk about the file abstraction, which is really also the I/O abstraction, which is an interesting thing about UNIX. And we're going to finish discussing process management, which we didn't quite get finished with last time. We'll talk about both the high-level and the low-level file I/O APIs, and a bit about why we have the different ones. And then we'll look at some interesting gotchas that come about when you mix processes and file descriptors in I/O. And yes, to the comment in the chat, we are definitely in the upside down today.
So if you remember from last time, among other things, we talked about threads and processes, and we introduced just briefly this notion of synchronization. I'm going to talk a lot about that in a couple of lectures, but just remember some ideas here. One was mutual exclusion, which is ensuring that only one thread does a particular thing at a particular time. One thread excludes the others. The piece of code that the others are excluded from is called the critical section. It's typically operating on something where, if you have more than one thread in there, you're probably going to get some bad behavior. That's why we call it a critical section and why we need mutual exclusion. And the way we did that last time is we talked briefly about locks. Only one thread can hold a lock at a time.
And it gives you that mutual exclusion. So we talked about two atomic operations, lock acquire and release. Acquire waits until the lock is free and then grabs it, and release unlocks and wakes up any of the waiters. Again, that was just a brief, quick discussion. We will get there in much more detail when we start diving into synchronization in a few lectures. But one thing I did want to briefly do is tell you that there are some other tools that we might use instead of just locks. There's a really rich set of synchronization primitives that we'll start talking about. One of them that I wanted to mention, since you might encounter it fairly quickly, is semaphores. The semaphore is basically a generalized lock that was first defined by Dijkstra in the 60s. It's been around since then, and everybody uses it inside of various operating systems. It's really kind of like a generalized number. A semaphore has a non-negative value associated with it and has two operations, P and V. P is an atomic operation that waits for the semaphore to become positive and then decrements it by 1. Some implementations call this the down operation. And V is an atomic operation that increments the semaphore by 1, and if somebody's waiting on it, it'll wake one of them up. Some implementations call this the up operation. P, by the way, stands for proberen, to test, in Dutch. And V stands for verhogen, to increment, which is Dijkstra's influence on this. What I wanted to give you was a couple of patterns. One pattern for a semaphore is very much like a lock. We call it a binary semaphore or a mutex. The initial value of the semaphore is equal to 1. Then the first thread that does a semaphore down decrements the semaphore from 1 to 0 and gets into the critical section. If any other thread tries to do that, it immediately gets put to sleep, because that would decrement the semaphore below 0, which is not allowed.
So all subsequent threads that try the semaphore down are put to sleep. Eventually, when it finishes the critical section, that first thread calls up, which increments the semaphore from 0 to 1, which immediately wakes up one of the sleeping threads, which then decrements it again. So this acts exactly like a lock: it's the mutual exclusion pattern using semaphores. And the lock we used earlier was actually called a mutex, so that terminology gets intertwined between locks and this particular use of semaphores. Another pattern, and this is why semaphores are so interesting, because they can have many patterns, is what happens if we start a semaphore off at 0 instead of 1. Well, if somebody executes semaphore down, they're immediately put to sleep, because they would be trying to decrement it below 0. That can't happen, so they go to sleep. I'm going to call that thread join for a moment, because if another thread then executes semaphore up, you immediately wake up the one that did down. So this is like the thread finish and thread join pattern we talked about, and it's yet another use of semaphores. In a couple of lectures, we're going to go through a number of different synchronization patterns, and you can see that just by setting the initial value of the semaphore to different values, you get some pretty interesting patterns. There's a question in the chat here, so let me clarify. The initial value of the semaphore is 0. That means semaphore down doesn't actually decrement. It can't, because you can never go below 0. What happens instead is that the thread that executes the join goes to sleep right away without decrementing. The thread that executes thread finish increments the semaphore to 1, which immediately wakes the sleeper up, and then the sleeper decrements it back down to 0 again. All right. Now, these names are actually from the Dutch, if you look at Dijkstra.
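The two semaphore patterns described above can be sketched with POSIX unnamed semaphores, where `sem_wait` plays the role of P (down) and `sem_post` plays the role of V (up). This is only an illustrative sketch, not code from the lecture: the names `mutex`, `finished`, `worker`, and `run_semaphore_demo` are made up here, and it assumes a Linux-style system where `sem_init` is supported.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define ITERATIONS 100000

static sem_t mutex;      /* initial value 1: used like a lock */
static sem_t finished;   /* initial value 0: used like "thread join" */
static long counter;     /* shared state protected by mutex */

/* Hypothetical worker: increments the shared counter inside a critical section. */
static void *worker(void *arg) {
    (void)arg;
    for (int k = 0; k < ITERATIONS; k++) {
        sem_wait(&mutex);    /* P / down: 1 -> 0, or sleep if already 0 */
        counter++;           /* critical section */
        sem_post(&mutex);    /* V / up: 0 -> 1, wakes one waiter if any */
    }
    sem_post(&finished);     /* "thread finish": signal whoever is waiting */
    return NULL;
}

/* Runs two workers and returns the final counter value. */
long run_semaphore_demo(void) {
    pthread_t t1, t2;
    counter = 0;
    int rc = sem_init(&mutex, 0, 1);
    assert(rc == 0);
    rc = sem_init(&finished, 0, 0);
    assert(rc == 0);

    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);

    /* "Thread join" pattern: each down sleeps until a worker does an up. */
    sem_wait(&finished);
    sem_wait(&finished);

    /* Still call pthread_join so the thread resources are reclaimed. */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&mutex);
    sem_destroy(&finished);
    return counter;
}
```

Calling `run_semaphore_demo()` from a `main` (built with something like `gcc -pthread`) should return twice `ITERATIONS`, since the binary semaphore keeps the two workers from interleaving inside the increment.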
But anyway, all of those languages are related in one way or another. OK, and on the question about the non-negative value: the fact that a semaphore never goes below zero is part of the semaphore pattern itself, not necessarily something that comes from the hardware. We will talk a lot more about how you implement these things later. If you notice, we're talking about abstractions now, so you don't have to worry about how they're implemented. You just have to worry about the API. We'll get to implementing them all in due time. All right, so try to get the pattern and the API, not how it's done.
The other thing we talked about, of course, in some detail, was processes. We noticed that there are multiple versions of processes, one which only has a single thread and another which has multiple ones. The key idea is that a process has a protected address space and state, such as open file descriptors, which we'll talk about today, and then one or more threads. Each thread has a stack and a thread control block for saving its registers. OK, and pretty much anything that runs outside of the kernel these days runs in a process of some sort. The other thing we talked about last time is how to create processes, and to do that, we introduced fork. I'm going to briefly say again what fork does, because the first time you see it, it's a little weird. Basically, fork takes an existing process and absolutely duplicates it. So there's a new process that is a duplicate, and that new process has an exact copy of all of the data in the address space, plus copies of things like file descriptors, and we'll go into that in more depth. To the question in the chat about whether threads share the heap: yes, they do. Threads in the same process share the same heap. They each have their own stack, because if they shared a stack, you wouldn't have a clean execution of any sort. So they don't share stacks, but they do share the heap.
So this thing about duplicating is a little weird. Fork is a system call, so what you get back is a return value. If that value is greater than zero, then you know you're running in the parent, and in the parent process that value is the process ID of the child. On the other hand, if you get zero back, then you know you're the child, and you have to call getpid to find out what your own process ID is. And if you get less than zero, then everything failed and you didn't actually create a child process. So just to repeat this, and we're going to see it again later in the lecture: the state of the original process gets duplicated in both the parent and the child, completely duplicated. The address space, the file descriptors, et cetera. So we looked at this brief bit of code here. Here we execute fork. Before we execute fork, there's just the one parent process. After we execute fork, we have two processes. And I'm going to say this again because it's just weird. Those two processes are running at exactly the same spot and have exactly the same state until they return from fork. One of them gets back a non-zero number. The other one gets back zero. That's the point at which they diverge and are no longer exactly equal. So the process that calls fork is always the parent, but after the fork it doesn't automatically know that it's the parent. The way it knows is that it gets back a non-zero number. And yes, to the comment, the slide should say parent process and child process. There's also a thread running there too, but that should be parent process, child process. In fact, let's just fix that. OK, so there we go. We are fixed. Now, we'll talk about what happens when you fork inside a multi-threaded process. It's not pretty. We'll get to that a little bit later in the lecture. But the bottom line is that only the thread that happened to have called fork is the one that survives.
And all the remaining threads just go poof. Their state is still around in the child, but there's no thread that's actually running them. So let's look at this example we gave. Can everybody see the screen again now, since I went out and came back? I think we're good, right? Yep, all right. So if you notice here, we call fork, and now there are two processes. The one that got a value greater than 0 we know is the parent. The one that got 0 is the child. This kind of if, else if, else structure is how we typically write a fork pattern. Here, the parent has i go from 0 to 9 and prints parent 0, parent 1, parent 2, and so on. The child has i go from 0 to minus 9 and prints child 0, child minus 1, and so on. And the thing we talked about last time is that this does not get screwed up. The parent counts up and the child counts down, and they don't prevent each other from doing their task. Anybody remember why? Yep, they each have their own i. This int i starts out as a global variable in the parent process, but as soon as we fork, there are two different i's, one in the parent and one in the child. So the going up and the going down don't interfere with each other, because they're in completely separate address spaces. You've got to keep that in mind as well. The only thing that's going to be a little weird here is that, since parent and child share the file descriptor for standard out to the screen, the parent and child statements are going to get interleaved in a non-deterministic fashion. We won't know how they're interleaved with each other, but we do know that the parent will print 10 values and the child will print 10 values, okay? All right, and the heaps will be separate from the point at which the fork happens, because the entire address space is copied. So it doesn't really matter whether this variable is a global in the static data area or it's on the heap.
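The example being described is on a slide that the transcript does not reproduce, so here is a hedged reconstruction of the pattern rather than the exact slide code. The function name `run_fork_demo` and the decision to have the parent wait for the child are assumptions added for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int i = 0;   /* global: after fork, parent and child each have their own copy */

/* Forks once; the parent counts up, the child counts down.
 * Returns the parent's final value of i, or -1 if fork failed. */
int run_fork_demo(void) {
    fflush(stdout);                 /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid > 0) {                  /* parent: pid is the child's process ID */
        for (i = 0; i < 10; i++) {
            printf("parent: %d\n", i);
        }
        waitpid(pid, NULL, 0);      /* reap the child so it does not linger */
        return i;                   /* unaffected by the child's decrements */
    } else if (pid == 0) {          /* child: runs on its own copy of i */
        for (i = 0; i > -10; i--) {
            printf("child: %d\n", i);
        }
        exit(0);
    } else {                        /* fork failed, no child was created */
        perror("fork");
        return -1;
    }
}
```

The order in which the "parent" and "child" lines appear can differ from run to run, but each process prints its own ten values, and the parent's `i` ends at 10 regardless of what the child did to its copy.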
All right, completely new process. Now, here's a question: would adding sleep matter here? If I put a sleep in there, would it change the outcome and the answers? No. It might change the interleaving a little bit, but again, it's not going to prevent the two processes from running to completion. Okay, are we good? Any questions on that? The reason it matters whether a process is the parent versus the child is that the parent typically has control over the child in terms of signals, and the parent can also wait for the child to exit and get its return value, which I'll show you on my next slide. So the child process really is a subordinate of the parent. The other thing we talked about at the very end of the lecture was starting a new program with exec. Notice, here's the fork pattern. We do our fork, and if we're the parent, we're gonna wait, and I'll talk about this wait in a moment. We're gonna wait for the child. But when we go to the child process, what it does immediately is an exec. There are many flavors of exec, so you should do a man on exec to find out. This particular one takes a path and some arguments. It takes the completely copied address space from the parent, throws out all of the copy, and starts a new program in that address space. Okay? All right, so this seems a little strange. This pattern where we fork a new child, which is a copy of the address space, and then throw out the address space does seem like a waste. But in fact, to get the fork semantics, as I briefly mentioned last time, we're actually gonna pull tricks with copying the page tables rather than copying the data, so this is not as wasteful as it seems. Okay, so just to look at this idea of starting a new process, here's a typical shell pattern. Let's just look at this in a different way.
Again, notice we fork. If the PID equals zero, we're the child, so we exec the new program. Otherwise we wait. And if you notice what happens as a result of the fork, the child up here says, oh, I'm the child, I'm gonna exec, and the parent goes to wait. Now the parent is waiting for the child to exit, and the child goes off and starts the new program. Okay, so this is a typical pattern in a shell. Now, I haven't quite shown you how to wait yet, that's my very next slide, but you get the idea. In a shell, when you type a command, it forks a separate process for the child, the child runs the program, and later, when that program exits, which means the child exits, the parent comes out of wait and goes on to give you the next prompt. Okay, now for those of you who have been typing commands in your version of Pintos, you're typing them at the command prompt, and that's the shell. That's the process that lets you type commands and have them run, and that's homework number two, where you're actually gonna get to design a shell. Okay, all right, the command line. So bash, tcsh, sh, all of those things are shells. Now, let's look at a couple of other things. Wait, for instance, is waiting for a child process to finish, and here's a very simple example. There are many versions of wait, and you should also do a man on that one. This particularly simple one takes a pointer to an integer, as you see here, and that integer will get filled in with a return code. This particular version of wait doesn't care which child process it waits for. It just says wait for the next one. Okay, and in this instance of this program, there is only one, and it'll wait till it finishes. When it finishes, we'll get back the PID of the child that just finished, of which there's only one, and its status. Well, where does the status come from? Well, the exit code here.
So as you all remember, 42 is the meaning of life. In this case, we exit with 42, and what'll happen is that the child finishing will wake up the parent, who's been doing a join type of operation by waiting. That 42 will get recorded in the status variable, we'll get back the PID of that child, and now we'll get to move forward. Okay, and of course that PID is gonna be the same as the PID in cpid, because we only created one child in this instance. So this was about how to interact with child processes, and if you have many child processes, then you can actually wait for specific ones, et cetera. Okay, and wait works because the kernel keeps track of parent-child relationships, and that's something you're gonna get a chance to do some implementing with, and we'll talk about it more later, okay? And to the question: the argument we're passing has nothing to do with which child we're waiting for. We're passing a container for wait to put the status in, but this particular wait says wait for the next child to finish, okay? Now, if the child seg faults or something else causes it to fail, that will also wake up the wait, because the child terminates abnormally and that gets reported in the status kind of automatically. And if the child calls exec, then it's still the exit code of the actual child process, not of the particular code it was running at the time, okay? We wait until the process finishes, not a particular piece of code, because you're really waiting for the process and not for whatever's running in it. Hopefully that's clear. Now, the last thing I wanna show you here is how to use the signaling facilities.
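Putting the fork, exec, and wait pieces together, a shell-like "run this program and collect its exit code" step might look roughly like the sketch below. It is not the slide code: `run_and_wait` is a hypothetical helper, `execv` is just one of the exec flavors, and the example command `/bin/sh -c "exit 42"` is an assumption chosen to reproduce the 42 from the lecture. One detail worth knowing is that the integer `wait` fills in is an encoded status, so the exit code is extracted with `WEXITSTATUS` after checking `WIFEXITED`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child, execs argv[0] in it, and waits for it to finish.
 * Returns the child's exit code, or -1 on failure or abnormal termination. */
int run_and_wait(char *const argv[]) {
    pid_t cpid = fork();
    if (cpid < 0) {
        perror("fork");
        return -1;
    }
    if (cpid == 0) {
        /* Child: replace the copied address space with the new program. */
        execv(argv[0], argv);
        perror("execv");       /* only reached if exec itself failed */
        _exit(127);
    }

    /* Parent: wait for the next child to finish (there is only one here). */
    int status;
    pid_t tcpid = wait(&status);
    if (tcpid != cpid || !WIFEXITED(status)) {
        return -1;             /* wrong child, or child was killed by a signal */
    }
    return WEXITSTATUS(status);
}
```

For example, with `char *const argv[] = {"/bin/sh", "-c", "exit 42", NULL};`, calling `run_and_wait(argv)` should return 42, even though the code that called `exit` was the shell started by exec and not the code the child was running right after the fork.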
So now let's look at signaling, last but not least. If you have two processes and you're interested in signaling from one to another, remember that processes don't share memory unless we do some work, which we haven't told you how to do yet. So they have to have some way of communicating, and one way is the signaling facility, which is kind of like a user-level interrupt handler. The way we do that is we declare a special structure called a sigaction, and inside that sigaction we can set some flags and some masks for what's enabled, and you can look that up. But here's the simple thing to do: in the sigaction structure, we set the handler to this signal callback handler, okay? That's this function we've declared here. Then we call sigaction to say that whenever we see a SIGINT signal, use this handler, okay? And notice that this code is not particularly great because it goes into an infinite loop, right? While one, do nothing. So on the face of it, this code looks like it loops forever, except if you send it a SIGINT, which by the way is what you get when you do a control C. That control C will cause the signal to be delivered, which causes the callback handler to be called, and it'll say caught signal and then exit at that point, okay? All right, and there's a question here about whether we need to write struct sigaction sa or just sigaction sa. It depends on whether it's typedef'd or not, so you should take a look in the actual header file, okay? Now good question, great question. I'm thinking sigaction isn't necessarily typedef'd, but it could be in the version of the headers that one has, because they change. But, you know, struct sigaction sa would work. Now the other question in the chat, which is a good one, is what happens if you didn't redirect the signal? So there's a whole bunch of default actions.
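The handler setup just described can be sketched as follows. The slide itself is not in the transcript, so treat this as an approximation under stated assumptions: the function name `install_and_trigger` is hypothetical, and instead of looping forever and exiting from the handler, this version uses `raise(SIGINT)` as a stand-in for pressing control C and has the handler set a flag, so the sketch terminates on its own.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t caught = 0;   /* set by the handler */

/* Called when SIGINT is delivered. Uses write() because printf is not
 * safe to call from inside a signal handler. */
static void signal_callback_handler(int signum) {
    (void)signum;
    static const char msg[] = "Caught signal!\n";
    ssize_t n = write(STDOUT_FILENO, msg, sizeof msg - 1);
    (void)n;
    caught = 1;
}

/* Installs the handler for SIGINT, sends this process a SIGINT,
 * and returns 1 if the handler ran, 0 if not, -1 on error. */
int install_and_trigger(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = signal_callback_handler;  /* function to call */
    sigemptyset(&sa.sa_mask);                 /* block no extra signals in the handler */
    sa.sa_flags = 0;

    if (sigaction(SIGINT, &sa, NULL) != 0) {
        return -1;
    }
    raise(SIGINT);     /* stands in for the user pressing control C */
    return caught;
}
```

In a program closer to the lecture's version, the call to `raise` would be replaced by a `while (1)` loop (ideally calling `pause()` instead of spinning), and the handler would call `_exit` after printing its message.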
So the default action for SIGINT, which is what happens when you hit control C, is that it kills the process. What you can do, if you don't want control C to kill the process but rather to do something else, is make your own signal handler, okay? And so there are plenty of default actions. Now, there are some signals where you can't set anything at all, okay? SIGKILL is a good example. If you do kill minus nine and send that to a process, there's no way for the process to catch that signal, and it will immediately die. But simple things like control C have default actions that you can replace with handlers of your own, okay? And so there's a whole bunch of POSIX signals. SIGINT is control C, SIGTERM is what the kill shell command sends, SIGTSTP is control Z, et cetera. And SIGKILL and SIGSTOP are ones that you can't actually change with sigaction. All right, we'll get to what POSIX stands for in just a little bit, but it's the standard for the system calls we're gonna be talking about, okay? It is the Portable Operating System Interface, with the X coming from UNIX, all right?
So just to remind you of where we're at, we've been talking about the levels of the operating system. In the last lecture and this one we're kind of floating up here in user mode, but you've got to remember that there's a bunch of things down here in the kernel that are providing functionality for us. And we need to talk about how we get from up here to down here. This interface is the system call interface, and we briefly talked about it last time. You're gonna get to learn a lot more about it as you design system calls of your own. But basically, the things that you're used to at the user level all kind of float in the standard libraries, and they're pretty much above the system call interface.
So we showed you this last time. This was the narrow waist of the system call interface, okay? It's kind of like an hourglass. User code runs above, system code runs below, and then there's the hardware. The system call interface is basically a set of standardized functions that you can call that go across the user/kernel boundary. And again, we're mostly focusing on the OS library and above, and what you do with that, okay? I pointed out last time as well, I think, that there's this libc, which is the standard thing that gets linked in when you use GCC to link a program. That libc has a whole bunch of standardized functions that you typically call. When you think of C, you're often thinking of the functions that libc has got, and those functions end up calling the system calls, which call into the OS. That's why many of you have not quite seen system calls yet, but you will, okay?
So, administrivia. We are now in full game mode in this class. Project zero was due today. Remember, this is to be done on your own. It's just getting you used to everything about the projects, compiling them, and so on. I also mentioned briefly that we upped the slip days a little bit because of the weirdness of the pandemic, and maybe because of the weirdness of living on Mars today. But I'm recommending that you bank these for later rather than use them right away. Group assignments should be mostly done. Plan on attending your permanent discussion section this Friday, assuming that we've assigned them by then, and remember these discussion sections are mandatory, so we're gonna start taking attendance as soon as people get used to them. And remember to turn your camera on so that your TA can get to know you, because they are gonna be your advocate throughout the term, so it's important to get to know them. As for the question about when the assignments are gonna be out, the answer is soon.
I'm not entirely sure of the exact timing on that, but it'll definitely be before you need to attend, and attendance will be taken through Zoom, so just make sure to log in. The other thing that we've chosen now is the midterm. Midterm one is gonna be October 1st, as we said on the schedule. It's gonna be five to seven, and it's three weeks from tomorrow, so it's coming up on us. We understand this conflicts with CS 170, but the 170 staff said basically that you can start the 170 exam after 7 p.m. rather than starting it at six, and they'll give you some details about that, all right? Our exam is gonna be video proctored. There's gonna be no curve. This will be a non-curved exam, so that will reduce a little of the pressure there, and it's video proctored, which will reduce a little additional pressure. And also, so you know, you're gonna be using the computer to answer questions, so we'll put out more details as we get closer to the exam. We haven't put the bins out yet, but we'll get those for you semi-soon. Just so you know, the bins are gonna be based on previous terms, okay? And there are no alternative exam times during our pandemic, so there's one exam. If you have a conflict, you should send mail to cs162 and make sure you fill out the conflict form. And as for the fact that the discussions are on Thursday, we'll take care of that, okay?
All right, the other thing is to start planning how your groups are gonna collaborate, okay? You should talk to everybody in your group. We'll talk more about video proctoring, where we're also gonna want microphone and video and so on, but basically start thinking about how you're gonna collaborate, and plan on meeting multiple times a week. I would suggest with a camera, right? This is how to humanize things enough that you can actually have interactions. We may even give some extra credit for pictures of you all on Zoom together. We'll see how that works.
Make sure to fill out the conflict form on Piazza if you have other conflicts, okay? I think that's been out for a while, so hopefully people know about it. For the regular brainstorming meetings, try to meet multiple times a week. I'm gonna give part of a lecture that I used to give a while ago, and that I think I'm gonna start giving again, on strategies for collaborating with teammates. Again, it's very hard to deal with this in today's virtual environment, so we'll see what we can do. Okay, I think that's all the administrivia that I had for today, unless there are any questions. Okay, homework one, I don't know. I haven't looked at the schedule. Everything is on the schedule, so it's wherever it is. Definitely take a look. I don't think it's due quite so soon. All right, now let's move on.
So there was a question earlier: what does pthreads stand for, or what does POSIX stand for? POSIX is the Portable Operating System Interface for UNIX. And there's a chat right now about deadlines. We will make sure that every deadline you need to worry about is on that schedule, okay? We'll try to keep that as up to date as possible, so just look at the schedule. All right, I'm glad we cleared that up. I was pretty sure homework one wasn't due tomorrow. So anyway, POSIX is the Portable Operating System Interface for UNIX, and it's loosely based on versions of the system calls that were appearing in different variants of UNIX. You should know there are many variants of UNIX, starting with the early AT&T days, and then there was Berkeley Software Distribution UNIX, yay Berkeley, and a bunch of other ones, including the one you're working with, Pintos. So just among the UNIX variants there were variations, and then there were other operating systems that didn't have the UNIX versions of the system calls.
And so there was a standardization effort to come up with a set of standard system calls that operating systems could support even if they had their own unique ones. In fact, if you go look at the Windows system call interfaces, there's actually a partial version of POSIX for some of the system calls, so you can take a look. And pthreads is POSIX threads, okay? That's what the P in pthread stands for.
So let's now talk about this UNIX or POSIX idea that's kind of the linchpin of this lecture, which is that everything is a file, okay? This was actually a bit of a strange idea when it first came out, and now pretty much everybody's used to it. There's an identical interface for files, for devices like terminals and printers, for networking sockets, for inter-process communication like pipes, et cetera. They all use the same interface with the kernel, okay? And what is that interface? Well, that interface has open, read, write, and close. And to the question of whether Linux is a version of UNIX: yes. So open, read, write, and close are standard calls, and you use those on everything from files on disk to devices, et cetera, okay? And there is an additional call, ioctl, for those things that don't quite fit into the standardized open, read, write, close. Some people pronounce it "I-O-cuddle". I've always heard it spelled out, I-O-C-T-L. It's really I/O control. There are a lot of ioctl calls that you can make once you've opened a device to configure it. It might be things like what's the resolution of a screen, are you blocking or non-blocking, et cetera. Those are all typically ioctls, okay? So when you make a new device and you're developing your device driver interface with the kernel, you typically have an ioctl interface for those specialized things that don't quite fit. They're square pegs in a round hole as far as open, read, write, and close go.
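As a small illustration of the "same interface for everything" idea, the sketch below uses one read loop on two different kinds of object: a pipe and a regular file. The helper names `read_all` and `same_calls_on_file_and_pipe` and the path `/tmp/cs162_fd_demo.txt` are hypothetical choices for this example, not anything from the lecture.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads from any file descriptor until end-of-file or the buffer is full.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t read_all(int fd, char *buf, size_t cap) {
    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0) return -1;
        if (n == 0) break;               /* end of file / write end closed */
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Writes the same message to a pipe and to a file, reads both back with
 * the same helper, and returns 1 if both match, 0 if not, -1 on error. */
int same_calls_on_file_and_pipe(void) {
    const char *msg = "hello";
    char from_pipe[16] = {0};
    char from_file[16] = {0};

    /* A pipe: inter-process communication channel. */
    int p[2];
    if (pipe(p) != 0) return -1;
    if (write(p[1], msg, strlen(msg)) < 0) return -1;
    close(p[1]);                         /* so the reader sees end-of-file */
    if (read_all(p[0], from_pipe, sizeof from_pipe - 1) < 0) return -1;
    close(p[0]);

    /* A regular file on disk (hypothetical scratch path). */
    const char *path = "/tmp/cs162_fd_demo.txt";
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    if (write(fd, msg, strlen(msg)) < 0) return -1;
    lseek(fd, 0, SEEK_SET);              /* rewind before reading back */
    if (read_all(fd, from_file, sizeof from_file - 1) < 0) return -1;
    close(fd);
    unlink(path);

    return strcmp(from_pipe, msg) == 0 && strcmp(from_file, msg) == 0;
}
```

Notice that `read_all` never needs to know which kind of object sits behind the descriptor. Only the way the descriptor is obtained (`pipe` versus `open`) differs.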
Now, to the question about sockets and the operations on them: we'll actually start talking a bit about sockets next time as well. So this idea that everything's a file was a bit radical when it was proposed. There's a seminal paper from Dennis Ritchie and Ken Thompson, from back in 1974, that described this idea. I usually teach this paper when I teach 262, because it's an interesting first paper for that class. But since I'm not teaching that this term, and I'm teaching you guys instead, I figured I'd pop it up there as an optional reading. If you go to the resources page, you can take a look at that paper and see how they talk about this idea, and how they talk about things that are still well-used ideas in UNIX operating systems to this day. And that's from 1974, so it's pretty impressive how some of their very clean interfaces and ideas have lasted so long. It's a little bit weird from a research paper standpoint, if you've done any reading of research papers. We'll read some more normal ones later in the term. This one doesn't really have a lot of evaluation, but it does describe some ideas. So give it a shot.
So the file system abstraction, which is what goes across devices and files and sockets, et cetera, is pretty much the simple idea that a file is a named collection of information in the file system. POSIX file data is a sequence of bytes. As you can imagine, the input from a keyboard is a sequence of bytes. The input from a disk is kind of a sequence of bytes. It's really blocks that get brought into the kernel and then eked out to the user as a sequence of bytes. For files themselves, there's also metadata, which is information about the file, such as how big it is, what the last modification time was, who the owner is, what the security info is, what the access control on it is, et cetera. Does it have a setuid bit or a setgid bit on it? We'll talk a little bit more about that later, not today.
And then a file is like a bag of bits, okay? A directory is, as you well know, a hierarchical structure for naming bags of bits, okay? And as you're all very well aware, a folder is something that contains files and directories. What you're gonna learn as you get inside the kernel is that a folder is really just a file that happens to map names to actual file contents, okay? And if you look, the hierarchical naming is really a path through a graph, okay? You start at the root directory, which is a file that contains the root names. So /home means that the root directory, slash, has a home entry in it, which points to a different file, which has an ff entry in it, which points to a file that has a cs162 entry, et cetera. Opening a file is a path through all of these different directories. You can imagine we're gonna wanna talk about caching and so on to make that fast, but we don't need to worry about that until later. And then there's a bunch of other interesting things about links and volumes that we can talk about as we get into more detail, but we're trying to keep things at the user level for the moment. Graph or tree? That's a good question, and it depends on what you're talking about. The directory structure you see described in the original UNIX is, strictly speaking, a tree. We've got the ability to make something much more graph-like with modern operating systems, and especially when you get soft links, it gets much more like a graph, okay? Soft links and symlinks, as was mentioned in the chat there, are the same thing. So, tying this all together, every process has a current working directory. It can be set with a system call, which you can look up. You can do a man on chdir, change directory. It takes a path and changes the current working directory of that process, okay?
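To make the "of that process" part concrete, here is a small sketch in which a child process calls `chdir` and the parent checks that its own working directory did not move. The function name `child_chdir_leaves_parent_alone` is hypothetical, the 4096-byte buffers are an arbitrary assumption about path length, and the choice of `/` as the target directory is only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the parent's working directory is unchanged after a child
 * changes its own, 0 if it changed, and -1 on error. */
int child_chdir_leaves_parent_alone(void) {
    char before[4096];
    char after[4096];

    if (getcwd(before, sizeof before) == NULL) return -1;

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        /* Child: change only this process's current working directory. */
        _exit(chdir("/") == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;

    /* Parent: ask the kernel for our own working directory again. */
    if (getcwd(after, sizeof after) == NULL) return -1;
    return strcmp(before, after) == 0;
}
```

The child inherits a copy of the parent's working directory at fork time, and from then on the two are tracked separately, which is why the comparison in the parent should succeed.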
So, on the face of it, that is nothing more than a path, like this one here: /home/ff/cs162/public_html, and so on. But that path is associated uniquely with the particular process that called chdir, and then it can be used to resolve names. Now, we can still use absolute paths like /home/oski/cs162. This is an example of a path that's absolute because it starts with a slash at the very beginning, and it therefore ignores the current working directory. But all these other things you're used to, you know, index.html or ./index.html or ../index.html or ~/index.html, these are all listed as relative to the current working directory, okay? And so that's why you might set the current working directory, and then you can use file names that look like this. So if you say index.html, what happens is it takes the current working directory, appends a slash and index.html to it, and that's the real file we're talking about. That's why you don't need an absolute path for everything you use, okay? And dot dot is the standard notation for the parent of a directory. So if you use ../index.html, it takes the current working directory, goes up a level, and then down to index.html, okay? And tilde is actually a form of absolute path. It's listed under relative on my slide, so this is a little misleading: it's not relative to the current working directory. It's under my notion of relative here only because it's relative to whatever your home directory happens to be. So that's a good catch. I'll fix that. So ~/index.html says my home directory slash index.html, and ~cs162 means the home directory of the cs162 account. All right, so those are two different usages of tilde. So did everybody catch that? Tilde slash and tilde name slash are two different usages, for different users. Okay?
Either the "you" user, whoever you are, or the cs162 user. Okay, so we're gonna be working our way through a lot of different things here, okay? By the way, the tilde is actually a function of your shell. It's not necessarily a function of the operating system. So if you think it's too much of a hack, you could use a different shell that doesn't have it, for instance. So today we're gonna work our way through parts of this upper level here, okay? For instance, we'll talk about high-level IO with streams, then we'll get into file descriptors and the system calls, and we'll go a little bit below the system call interface, okay? But we're not gonna get too far down there, because we're trying to keep ourselves in user-level mode here. Okay, so quickly, high-level file IO with streams. A stream is really an unformatted sequence of bytes. It could be text or binary data. UNIX is notorious for being agnostic as to what the format of files is. That was also a really big innovation at the time that UNIX paper came out, and you can take a look. So an unformatted sequence of bytes with a pointer, that's a stream. And here are some operations. You wanna include stdio.h for these. For instance, fopen is an example of a high-level streaming interface. Most of them have an f in front of them, not all of them, okay? And there's fclose. Notice that fopen, which opens a stream, returns a pointer to a FILE structure. And over here we have a mode, and that mode is a string which tells you how you wanna open that file. So you can open it for reading or writing or appending, et cetera, okay? And some of these options truncate the file to zero length, so there's nothing in it once you open it, et cetera.
So for an open stream, if we succeeded because the file existed and we have permission, then what comes back is this FILE star. So fopen returns a pointer to a FILE data structure, and that structure is what we're gonna use from that point on to read and write and interact with the data, okay? If we had an error, we would get back a null, or zero, from this: a null FILE star. So ideally you would check whether what came back from fopen is null or not. Null indicates an error, and then you'd have to go look at the error variable, errno, to find out why. So stdio.h is the file you wanna include, okay? Including stdio.h gives you all of the things you're gonna need to interact with IO. So if you try to use some of the things I'm talking about in lecture and the compiler tells you it doesn't know some of the constants I'm using, it's probably because you've forgotten to include that .h file. You're gonna wanna get used to figuring out what .h files you need to include, because that's an important part of getting your compiles to work. Okay, so let's try to keep the chatter in the chat down a little bit so that we're not distracting people in the lecture and they can ask questions. There are some special streams, okay? stdin, stdout and stderr, which are defined for you. So standard in is the normal source of input, like the keyboard; standard out is the normal destination for output, like the screen; and standard error is the place where errors go. Usually standard out and standard error both go to the same place, which is your screen, okay? But these are all defined without you opening them. So when your process first starts up, you have a standard in, standard out, and standard error. And by the way, you'll also have the low-level IO versions of these as well, okay? So standard in and standard out basically give you composition in Unix, okay?
The reason FILE is capitalized is because it's a structure, and they've chosen to capitalize the names of a lot of important structures. The other answer to why FILE is capitalized is, I guess, because it is. Anyway, on the question about what happens if you open a file but don't close it and then exit the process: typically what happens is everything gets flushed out for you and then closed in the kernel. So it's not possible for you to cause a major problem by opening something and then killing off the process without closing it; it gets cleaned up automatically. So standard in and standard out, as you're gonna see when you start working with your shells, especially in homework two, basically allow communication between processes. If you have a whole chain of processes and you connect standard out of one to standard in of the next, then you can communicate between those processes in a chain, and that will be one of the patterns you're gonna get very used to as you get more comfortable with Unix, okay? So this is an example here. The cat command says just take a file and send its contents to the console. So if you had a hello.txt file and you were to say cat hello.txt, you'd see the whole file just streaming onto your screen. On the other hand, when you put a pipe symbol, this little vertical bar, and pipe it to grep, then what happens is cat takes the file and sends it to standard out, but by putting in this little bar, I've redirected cat's standard out to be the standard in of the grep command. And so now grep will take the input that came from hello.txt and will grep for the word world, and it will only output to the screen, its standard out, the lines that actually have the word world, exclamation point, in them, okay? So this composition with bars, which you will implement on your own in homework two, is really a connecting of standard out to standard in, okay? Good.
Now, let's look a little bit more at some of the high-level API. For instance, there are character-oriented versions. Notice that all of these calls take a FILE star pointer. So we have to have opened the file first, and then we pass the pointer we got back to something like fputc or fputs or fgetc or fgets. We pass in that file handle, the FILE structure, and then we can put characters, which is writing a single character at a time or a string at a time, or get characters, okay? So here's a simple example where I open the file input.txt. Notice that this is a relative reference, so the current working directory is gonna matter here. I'm opening input.txt for reading and output.txt for writing. What comes back from the first fopen is the input FILE structure pointer, and what comes back from the second fopen is the output FILE structure pointer, okay? And we also have an integer, c, which we're gonna use for getting characters. So we do an fgetc on input, and that gives us the first character, or end of file, EOF, if there's no character there, okay? Now, if we know that characters, let's talk about ASCII characters for a moment, are eight bits, why did I use an int for c? Can anybody think of why? Okay, EOF is something that does not fit in eight bits, right? It's typically minus one, which in a 32-bit int is represented as all ones, so 32 ones. And so we can check for end of file by looking at that value; otherwise we can use it as its eight-bit character representation, okay? Good. And then notice we check whether the character is EOF. If it's not, we put it on the output with fputc, and then we call fgetc again for the next one, and so on, okay? Yep, exactly like a 61C project. So hopefully this is reminding you guys what this is like. Now, let's look briefly at the block-oriented version.
So those were character-oriented. The block-oriented ones are fread and fwrite. Here again we're opening the same files, but now we have a buffer, okay? So fread is going to take a buffer pointer from us. We pass the buffer here, and we say what the size of each item in the buffer is and how many items the buffer can hold. So notice this buffer is an array of char, and it's BUFFER_SIZE in size, okay? So what we're saying here is that our buffer can take BUFFER_SIZE characters. That's what those two arguments are. And here's our input file descriptor, or excuse me, input FILE structure pointer. And fread will read data into the buffer. Now, can anybody tell me how many characters this fread call will read from the file? Anybody have any idea? Okay, so everybody's looking at the buffer size of 1024 and they're all saying 1024. However, what happens if the input file only has 20 characters in it? How much will this fread return? It's gonna return 20, right? Because we're only gonna get 20 characters. So this will read the whole file in that instance. Just because you give it a buffer that has 1024 characters' worth of space doesn't mean you'll get 1024, okay? fread tells you how many it got. Then we say, while the number of characters we got is greater than zero. Why would we get zero? Well, if we're at the end of file because we've read all the characters, we're gonna get zero back. So this says, while we got some characters, let's write those characters out. So notice the pattern here for fwrite: here's the buffer, this is its length in characters, and that's our output file, and that will write the characters we just read. Then we read the next grouping, and we keep looping until we're done, and then we close.
So this really just copies input.txt to output.txt. Okay, so if there are only 20 characters in the file, of course we'll read one grouping, we'll write it out, and then we'll get zero the next time and we won't go through the while loop a second time. You have to do a man on the commands to see the exact organization of the arguments. Okay, so we have a question here about why we get 20. The reason we get 20 is that the file only had 20 in it. If the file had, you know, 1,025 characters, what would happen is we'd get 1,024 in this first read, length is definitely bigger than zero in that case, and we would write 1,024 characters out. The second read would only get one character even though it could take 1,024. We'd go through the loop one more time and write that one character out, the next read would get zero characters, and then we'd close the two files. All good. So the question also is, will this block? That depends a lot on what you're reading from. If you're reading from a file, there won't necessarily be any blocking there; it'll just read till the end of the file, okay? If you're reading from standard in, like a keyboard, then end of file comes when a special character is typed; control-D is often end of file. And no, it doesn't have to be 1,024 either, and these could be something other than characters. They could be integers, in which case you would say sizeof(int) and this thing would pull things in in quanta of four bytes at a time, okay? How many bits a character is depends on whether we're talking about Unicode or not. For now, since that's not an issue we wanna deal with, we're gonna say that characters are ASCII characters and they're eight bits, okay? You guys will get to learn more about that later.
Okay, so you as system programmers, that's what you are now, need to be paranoid, which means you want to always check for errors. So for instance, we ought to always write code like this. I mean, you guys ought to always write code like this. We fopen input.txt, and if input is null, you gotta deal with the fact that there was a failure, okay? Always check for null. Whatever the return code is, make sure you check it. In this case, the fact that there is an error is returned as a null, and then you have to do something else, like call perror, to find out what the error is. This will actually print failed to open input file and then tell you what the error was. Every one of the calls has a way of giving you an error back if there's a possibility of an error. So be paranoid, okay? Check return values. It's very easy to be bad as a system programmer and not check your return values, and then you're gonna get code that behaves very badly at the worst possible time. There's a Murphy's law for bad code, okay? And yes, a language with a Result type, such as Rust, I'm assuming that's what you're talking about, which is totally an awesome language that maybe we'll talk a little bit about later in the term, would give you a better way to check. But we're talking about C right now, okay? And perror knows the interface to interact with, which is the errno interface. It knows where to look for the error, okay? All right, I may be a little loose with error checking. Don't take my looseness with error checking as anything more than trying to make sure the code examples on the screen don't get ridiculously long, okay? So this is literally do as I say, not as I show you in class, when it comes to error checking, all right? All right, so I do wanna talk a little bit about positioning the pointer inside of a file. There's fseek, which lets you set where that pointer is, so the next read comes from there.
So what I've been relying on, without really saying a lot about it, is this: I said, well, maybe this fread reads the first 1024, and then when we do it again, we start at that 1024 point for the next read. Why is that? Well, because there's an internal pointer, okay? There's an internal pointer in the buffering system that keeps track of where you are. And so you potentially need a way to change that position. fseek lets you change where you're gonna read from next, ftell tells you where the pointer is, and rewind goes back to the beginning, okay? And notice that this fseek call has a whence argument, which can be one of three constants, SEEK_SET, SEEK_END or SEEK_CUR, and it tells you what happens when you say go to a given offset. If you say SEEK_CUR, it takes the current position and adds the offset to it. If you say SEEK_SET, it just takes your offset and sets the pointer to that absolute value. And if you say SEEK_END, it measures the offset from the end of the file, okay? You can look this up, but it's preserving this high-level abstraction of a stream. Now let's contrast what we've been talking about with low-level IO, okay? So Unix kernels, the Unixes that have POSIX IO, have the following design concepts behind them, okay? On the question here about whether you need whence: there are other calls besides fseek that don't need whence. You can just do a man on fseek and see them, okay? So some concepts that went into this, which I've already hinted at. One is uniformity: everything's a file. We already talked about that. Open before use: clearly we've talked about that, but for instance, that gives the kernel an opportunity to check access control and do arbitration, and not return an open file handle that you can use unless you have permission to use it.
Everything's byte-oriented, okay? Even if blocks are what get transferred, everything is addressed in bytes. So this is the fact that the kernel is completely agnostic about the structure and format of any files or data in the system. It has no requirements except for one particular type of item, and that's the directory. The directory has a special format that the operating system knows how to interpret, okay? The kernel is gonna buffer reads and writes internally. Part of the reason for that is caching and performance; we'll talk about that. But another reason is that things like disks are block-oriented: you can only pull in a block at a time. Whereas again, this is a byte-oriented interface to the user, and so we need buffering inside reads and writes to give us both performance and the ability to match the block structure of the devices against the bytes of the user. Okay, and then explicit close. So let's look at this raw interface. Notice there's no f in front of open here, and no f in front of creat or close. There are some flags that say what access modes you want and what permission bits, okay? And what comes back from open is not a FILE star, it's an integer, okay? It's a file descriptor; it's just a number. And if the return value is less than zero, that's an error, and then you have to look at the errno variable to know what the error was, okay? All right, on mutexes: there is no explicit locking of the form that's being asked about in there. You can take a look at that philosophy in the UNIX paper. We'll talk about locking a lot more as we get further. So when you get back from an open, you get a number, which is a file descriptor. And open is essentially isomorphic to a system call. In fact, what's inside of open in the libc library is a little bit of wrapper around a system call, okay? And so the operations on the file descriptor are as follows.
When you do open and it succeeds, you get an open file description entry in a system-wide table in the kernel, okay? The open file description object in the kernel is an instance of an open file. And the question I might ask you is, why did we return a number, which is really an index into a table that points at file descriptions, rather than a pointer to the file description itself? Can anybody figure this out? Yes, security. What sort of security? Anybody guess? Yeah, so there's lots of good answers in the chat there. One, this description entry is in the kernel, so the user couldn't access it even if they wanted to. More interesting, there's a philosophy here: by returning only a number, and only allowing you to pass a number to the calls, there's no way for you to access things you're not supposed to, because the kernel immediately checks your number against the internal table, and if it doesn't match up, it just doesn't let you do anything. There's a little bit of an advantage in avoiding information leakage as well, but this is mostly about the security of not being able to address file descriptions you're not supposed to. So if we look at the parallel to the ones we talked about before, there are standard in, standard out, and standard error, which are the system call descriptor equivalents: STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO, and their values are zero, one, and two. Okay, and they're in unistd.h. And then there's a way to say, for a FILE star, give me the file number inside of it, which is fileno. That's because if you did fopen, you're actually running a library call that internally calls open, and so every FILE star you've got has a file descriptor saved inside of a user-level data structure. And you can go back the other way as well, with fdopen. So for the low-level file API, we have things like read instead of fread.
So read takes the file descriptor integer, a buffer, and the maximum size of that buffer, in bytes in this case; it doesn't quite have the flexibility of fread. And it tells you how many bytes came back. If you get zero bytes, that's end of file. And if you get minus one, you have an error. Writing is similar, and lseek is the equivalent of the fseek we talked about earlier. So here's a simple example where we do open. Here's the name of the file. We have flags saying that we want read only, and various permissions that we want on that file. Okay, we open it and we get a file descriptor back. We read from it, okay, and we close it. Notice that read and close have to use that same file descriptor. And then we might try to write something to that file descriptor. But notice that by the time we get around to writing, we've already closed the file descriptor, so that would be an error, right? There are lots of errors that can come back. The file being bigger than the max size is not gonna come back as an open error; that's gonna come back when we try to write to it, of course. Okay, so how many bytes does this program read? Well, we look at what came back in rd, and that tells us how much we read. So design patterns, again, just to tell you, this is at the system call interface. Always open before you use. It's byte-oriented, and you have to close when you're done. Reads are buffered inside the kernel. Writes are buffered inside the kernel, for lots of the reasons we talked about. This buffering is all part of global buffer management, which we'll also talk about when we get to the internals. And you'll see why the demands of things like the file system and the buffer manager require that caching, and also how it gives us good performance as a result. So some other operations in low-level IO. We talked about ioctls.
Okay, this is an example of something you use when you open something that's not a file in the file system, but rather is a device or whatever: you might call some ioctls on it. You can also use ioctl on open files for certain issues about blocking and non-blocking and so on. We can duplicate descriptors with dup. I'm gonna show you that, where you have an old descriptor and you get a new one out of it. And we can also make pipes, where we create a brand new pipe, which is two file descriptors, two integers in an array. And then if you do fork, you have two ends of a pipe that the two processes can use to communicate with each other. That pipe call is exactly what you're gonna use to set up pipes when you do your shell. Okay. And there are ways to do file locking, but it's not a mutex per se; it's locking that's specific to the actual file system. And there are ways of memory mapping files. That'll be another interesting thing we'll talk about once we get a little further along in how things like page tables work. We'll talk about how to take a file and map it directly into memory, so that you can do reads and writes to memory instead of reads and writes to the file system. So you'll be looking at structures in memory rather than executing read or fread and write or fwrite calls, okay? And we'll talk about asynchronous IO a little later as well. So why do we have high-level file IO? Well, first of all, take something like fread. What happens when you execute fread is that a bunch of work is done, just like in a normal library function, and some of that work is checking whether the thing you're trying to read is already buffered in a local user-level buffer.
Okay, and if not, then it goes ahead and does the pattern we talked about last time or the time before for doing a system call, where you set up some special registers with a system call ID and the arguments, et cetera, and then you do a special trap that goes into the kernel, does the system call, and comes out, okay? All right, low-level IO is the case where read really just does the system call. So read is essentially just a C-level wrapper for the system call; fread is something more sophisticated. Now, there was a question in the chat about what I mean by buffering. What I mean is, you may read 13 bytes at a time, but the underlying system may be optimized for 4K bytes at a time. What fread will do is ask the kernel for 4K bytes, put them into a local in-memory data structure, and then all the subsequent freads you do for a while just look in that buffer and grab the next 13 bytes without having to go into the kernel, which is much faster, okay? Because kernel crossings take some time. And so streams, as I mentioned, are buffered in memory, and one way you can see this is the following. If you do printf of beginning of line, printf goes to the buffered version of standard out. Then you do a sleep, and then you printf end of line. What happens is that when that finally gets flushed to the console, possibly because of the newline there, everything gets printed at once: it says beginning of line and end of line as a single item. Whereas with the low-level direct system call, you do write to STDOUT_FILENO, the standard out file number, of beginning of line, you wait a little bit, and then you do the same thing with end of line. What you'll see is beginning of line on your console, you'll wait 10 seconds, and then end of line. So there's no buffering in this path at the bottom, but there is buffering in the path up top, okay?
So yes, now you're starting to ask some interesting questions here, okay? The 18 and 16 are the number of characters we're writing there, by the way. The question that was asked is, is there buffering in the kernel if there's buffering at user level? Yes, there are two different buffers going on, okay? The buffering in the kernel is completely transparent to you; other than timing, or a failure of your system, there's no way for you to really know that buffering's going on in the kernel. Buffering at user level can make things much faster, but you can mix things up quite a bit if you're not aware that you're using, for instance, the stream version of a file and the raw version of the file together, and that's usually a problem, okay? So what's in a FILE star? Well, as we mentioned, a FILE star has user-level buffering, and inside of it it's gotta do the raw calls, so it's clearly gonna have a file descriptor inside of the FILE structure. That structure is gonna be in your program, okay? When you do fopen, what happens is fopen allocates a new FILE structure, calls the raw open, sets up some buffering inside the FILE, and then returns the pointer to that structure from the library to you, okay? So buffering inside of a FILE is done at user level. When you call fwrite, the data is put into the FILE's buffer until it's flushed. The C standard library may choose when to flush out to the kernel. If you really care that something is visible in the file system, then you're gonna need to do fflush on your own, okay? So you wanna make sure you're not making assumptions about things you just wrote: if you do fopen, then fwrite, and then you're doing something else, you don't necessarily know that the data has gone to the file system unless you do fflush, okay? So make the weakest possible assumptions about whether things got from user level into the kernel or not.
So here's an example where we do fopen of file.txt and we write something, okay? We write a b to that file. And then we do fopen of file.txt again, so notice we have the file open twice, in two different FILE star structures. And if we go to read from the second one, we're not necessarily gonna see what we wrote to the first one, okay? This fwrite here may or may not have gotten into the kernel, depending on whether it got flushed or not. All right, because we've opened this file twice, with two different buffers at user level, we've written to one, and we haven't flushed it out, we don't really know what's gonna happen. So if you're gonna write code like this, be aware. Now notice what I changed here: I wrote the data, then I did an fflush. At that point all the data that's buffered gets put into the kernel, and now this fopen and read will get the data, okay? So just be aware that when buffering is going on and you start doing weird communications, you gotta be careful, okay? And if you close the first file, then yes, it'll get flushed out. So your code should behave correctly regardless of when the library flushes, so make the minimum calls to fflush that correctness requires. With the low-level API you don't have this problem. If you only do open, reads and writes, you're not gonna have the problem of different users of the file not seeing the data, because the kernel hides all of its buffering from the users, okay? But you don't get the performance advantage of all the buffering at user level. And why do you wanna buffer at user level? I just wanted to show you that system calls are 25 times more expensive than a regular function call. If you look here, the blue is the time for regular user-level function calls. The green is system calls, doing getpid in this case, and the red is a version of getpid that doesn't have to do a system call, okay? So notice that it's much better not to make system calls if you can avoid them, okay?
So if you read or write a file byte by byte with system calls, the max throughput might be, for instance, 10 megabytes per second, whereas if you do fgetc, which is a buffered single byte at a time, you could actually keep up with the speed of your SSD. Why is that? Well, fgetc is a buffered call. You're giving it a FILE star, and what happens is that the first character you read goes into the kernel and brings a big block of data into user level, and then the subsequent fgetc calls just quickly return you another character until you use up that buffer, and then you make another system call. This is exactly a form of caching. Okay, exactly. And that's part of the reason you can run into trouble if you use it incorrectly. So, system call operations: why else buffer in user space? In addition to performance, we wanna keep the kernel interface really clean, okay? The operating system doesn't know anything about formatting. For instance, there's no way to read until newline from the kernel, because again, the kernel doesn't know what a newline is. That's a feature, okay? So the solution is you use the buffered calls like fgets or getline that take FILE stars, and what they do is read a chunk of data out of the kernel and then very quickly walk through it until they find the next newline and give the whole line to you. So now let's talk a little bit about process state, okay? If you notice, we're kinda moving our way down toward the bottom a little bit. On a successful call to open, a file descriptor is returned to the user and an open file description is created in the kernel, okay? For each process, the kernel maintains a mapping from file descriptor to open file description, and then on all future calls the kernel looks up the file descriptor it's given to find the actual description structure, okay?
So here, if you notice, we have two buffers. We open food.txt, and then we read from that file descriptor into buffer one, a hundred characters. Why does this work? Well, because you opened it, the kernel remembers that the FD number is talking about the file food.txt; that's all cached, okay? Therefore just calling read knows what file to work with, and furthermore it knows to pick up where it left off. So this read gives you a hundred characters, and this read gives you the next hundred characters. Why? Because the position is stored in the file description in the kernel. So what's in the file description? Well, you could look it up, right? You guys have Pintos, you can check it out. The things that are important for today are the inode structure, which is an internal file system thing we'll get to soon enough and which tells us where all the blocks are on the disk for your file, and the offset, which tells you where you are in the stream. Okay, so what's the abstract representation of a process? If you guys bear with me a little bit, there are a couple of other things I wanted to say before we're done. Remember, a process has threads, registers, et cetera; it has memory for the address space; and then in kernel space we've got this file descriptor table, which maps numbers that are file descriptors to actual descriptions of files. So if we execute open of food.txt and it gives us back descriptor three, this is what happens: descriptor three in your process points to an entry in the open file description table in the kernel that says the file is food.txt and it's at position zero, okay? Not shown are descriptors zero, one, and two. I started at three, and hopefully we'll get to zero, one, and two in just a second. But now suppose that after we open the file, we say read from descriptor three, which is this file, the next hundred characters into the buffer.
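The two back-to-back reads can be sketched in C as below. This is an illustrative version under a few assumptions: the function name `offset_after_two_reads` is invented, the function creates its own 300-byte scratch file instead of relying on an existing one, and it asks the kernel for the current position with `lseek(fd, 0, SEEK_CUR)`.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates a 300-byte scratch file at `path`, reads 100 bytes twice through the
 * same descriptor, and returns the offset stored in the open file description. */
long offset_after_two_reads(const char *path) {
    assert(path != NULL);
    char fill[300];
    memset(fill, 'x', sizeof fill);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    if (write(fd, fill, sizeof fill) != (ssize_t)sizeof fill ||
        lseek(fd, 0, SEEK_SET) != 0) {       /* rewind so the offset starts at 0 */
        close(fd);
        unlink(path);
        return -1;
    }

    char buffer1[100], buffer2[100];
    ssize_t n1 = read(fd, buffer1, sizeof buffer1);  /* bytes 0..99 */
    ssize_t n2 = read(fd, buffer2, sizeof buffer2);  /* bytes 100..199 */
    off_t pos = lseek(fd, 0, SEEK_CUR);              /* ask the kernel where we are */

    close(fd);
    unlink(path);
    return (n1 == 100 && n2 == 100) ? (long)pos : -1;
}
```

Neither read call names the file or a position; both come from the open file description that the descriptor refers to, which is why the function returns 200.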
Well, what happened was we read the next hundred characters into that buffer and we're at position 100, and notice the kernel knows what position that file description is at. It's at position 100, okay? Finally, if we close, what happens is the file descriptor table entry is cleared and the file description is cleared, and voila, we've just finished that off. But let's do something more interesting. So let's not close; let's fork, okay? Here's process one, and here's a child process we just created. Notice that the address space is duplicated. We've got the thread control block; I'm assuming there's only one thread for the moment. And the file descriptor table is duplicated. So now both processes, the parent and the child, point to the same file description. That means either of them can read from that file. If this process reads 100 bytes from file descriptor three, then it's going to read 100 bytes and we'll now be at position 200. And now this guy does the same thing, and voila, we're at position 300, because we forked the process and they're sharing the file description, all right? The file descriptor table is a copy, but the description it points to is shared. So now we start to see what fork is doing that's more than just the address space. And if process one closes the file, notice that all that does is remove process one's file descriptor pointer to the file description, because that file description is still in use by another process. There's a reference count on there, and the fact that process one closed it doesn't take it away from process two; process two still has access to it, okay? So if you're asking whether we can copy this open file description for process two: if you fork process two, you'll get another pointer to it again. The only way to get a new file description that's unrelated to the old one, if that was your question, is by doing another open of the same file. So why do we allow this?
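Here is a hedged C sketch of that fork scenario: read 100 bytes, fork, let the child read 100 bytes, then let the parent read 100 more and report the position. The function name `offset_after_fork_reads` and the 400-byte scratch file are assumptions made for the example; the parent waits for the child so the order of the two reads is deterministic.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the offset the parent observes after it and its child have each
 * read through a descriptor that refers to one shared open file description. */
long offset_after_fork_reads(const char *path) {
    assert(path != NULL);
    char fill[400], buf[100];
    memset(fill, 'x', sizeof fill);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    if (write(fd, fill, sizeof fill) != (ssize_t)sizeof fill ||
        lseek(fd, 0, SEEK_SET) != 0 ||
        read(fd, buf, sizeof buf) != 100) {   /* before fork: offset becomes 100 */
        close(fd);
        unlink(path);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) { close(fd); unlink(path); return -1; }
    if (pid == 0) {
        /* Child: its descriptor table is a copy, the description is shared. */
        ssize_t n = read(fd, buf, sizeof buf);        /* offset becomes 200 */
        _exit(n == 100 ? 0 : 1);
    }

    int status = 0;
    waitpid(pid, &status, 0);                         /* let the child go first */
    ssize_t n = read(fd, buf, sizeof buf);            /* offset becomes 300 */
    off_t pos = lseek(fd, 0, SEEK_CUR);

    close(fd);
    unlink(path);
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0 || n != 100) return -1;
    return (long)pos;
}
```

If the child had instead called open on the same file, it would have received a fresh description starting at position zero, and the parent's final position would have been 200 rather than 300.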
Well, aliasing the open file description is a good idea for sharing resources like files between parent and child processes if they're working on the same thing together, okay? And remember, in POSIX everything's a file. So this really means that the parent and the child both have access to the same resources. The question was, why is this 300 and not 200? If you notice, at the point at which the first read after the fork happened, we went to 200. Then this process goes to read another 100 bytes from file descriptor three. If we look up three, we see that yes indeed, here's the file description, the pointer is at 200, and so when we read the next 100 bytes we've just advanced it to 300, okay? So when you fork a process, the parent's and child's printfs go to the same terminal. This is one of the last ideas I want to finish up, but let's take a look, and this is going to be very important for homework two, so hold on for a second here. There is a set of three standard file descriptors that are always allocated. We already talked about them. Zero is standard in, one is standard out, and two is standard error. So zero is the input, say from the keyboard; one is the normal output, without errors; and two is the output for errors. If a process that happens to be, say, a shell forks a child process, the child gets copies of all the same file descriptors. This is why, if we have a parent that forks a child and the two of them are both printing output, notice that descriptor one is shared, and so the outputs go to the same terminal, interleaved. Okay? And that's the standard way that a command you type at the shell prompt works, which is why when you type a command and it prints, the output goes to the same terminal your shell was running from. And if we close standard out in process one, we don't close it in process two. Same with standard in. The only thing that will change a process's standard out or standard in is if that process changes them.
Now, the question here is: if you have two processes both reading standard in, wouldn't they each get a copy of the input? And the answer is no. Whichever one reads first gets the next character, and vice versa. There's only one copy of things coming in. Other examples are sharing network connections after fork and sharing access to pipes. These are all things that are going to be there when we start getting into more interesting patterns. Okay? The final thing I wanted to show you here is dup and dup2. For instance, suppose we've got file descriptor three pointing at this description, and now we execute a dup of three. What dup of three is going to do is make a new file descriptor, four, which points at the same open file description that three did. So after dup, we have both three and four pointing to the same file, and if we wanted, we could close three and still use four. dup2 allows us to do something a little different: it allows us to take file descriptor three, duplicate it, and call the copy file descriptor 162. So now we've chosen which file descriptor we want to use, whereas dup chooses one for us. Okay? And when you start getting into the shell with homework two, rearranging what descriptors zero, one, and two refer to is how you will make pipes from one command to the next and so on, if you remember cat piping into grep. All right. And I think we've run out of time. There are some fun things. I guess we did have enough questions. I wanted to just give you this one, which is fork in a multi-threaded process. Everybody's asked me about this. Don't do it unless you really know what you're doing and aren't going to be surprised. So here's an example of a process that not only has some file descriptors, but also has multiple threads, a red one and a black one.
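The dup and dup2 behavior described above can be sketched in C as follows. Both function names and the scratch file paths are invented for this example, and the second function only approximates what a shell does for output redirection: it points descriptor 1 of a child at a file before the child writes.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* dup(): the copy shares the original's open file description, so a read
 * through `fd` moves the offset that `copy` sees, even after `fd` is closed. */
long offset_seen_through_dup(const char *path) {
    assert(path != NULL);
    char fill[200], buf[100];
    memset(fill, 'x', sizeof fill);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    if (write(fd, fill, sizeof fill) != (ssize_t)sizeof fill ||
        lseek(fd, 0, SEEK_SET) != 0) { close(fd); unlink(path); return -1; }

    int copy = dup(fd);                     /* kernel picks the new number */
    if (copy < 0) { close(fd); unlink(path); return -1; }
    ssize_t n = read(fd, buf, sizeof buf);  /* advances the shared offset */
    close(fd);                              /* `copy` keeps the description alive */
    off_t pos = lseek(copy, 0, SEEK_CUR);
    close(copy);
    unlink(path);
    return n == 100 ? (long)pos : -1;
}

/* dup2(): we pick the number. The child makes descriptor 1 refer to a file,
 * writes `msg` to "standard out", and the parent counts the bytes in the file. */
long redirect_child_stdout(const char *path, const char *msg) {
    assert(path != NULL && msg != NULL);
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0) _exit(1);
        if (fd != STDOUT_FILENO) close(fd); /* descriptor 1 is enough now */
        ssize_t n = write(STDOUT_FILENO, msg, strlen(msg));
        _exit(n == (ssize_t)strlen(msg) ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;

    char buf[256];
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    ssize_t n = read(fd, buf, sizeof buf);
    close(fd);
    unlink(path);
    return (long)n;
}
```

Because the redirection happens only in the child, the parent's own descriptor 1 still refers to the terminal afterwards, which matches the earlier point that changing a descriptor in one process does not change it in the other.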
If you fork, and suppose the black thread is the one that runs the fork call, then when you're done you've got duplicates of all the file descriptors and the address space, but in the child only the thread that called fork is still running. So this is unlikely to do what you want unless you really know what you're doing, okay? All of the memory that the threads had will still be around, but the threads themselves won't be running. If on the other hand you exec, that's exactly right. That was a good question. Then you throw everything out and you get a brand new process image, and that probably will do what you expect. Okay, it's safe if you call exec. The other question was, does dup always assign the next int? I wouldn't count on that; if you had anything that depends on it, I wouldn't count on it. It gives you the lowest-numbered descriptor that isn't in use, which is probably the next one, but you never know for sure. And what does exec do? Exec erases the process's address space and loads it up with the new program, okay? All right, I think we're over time. So I'm just going to say, in conclusion, we've been talking about user-level access to file I/O and some of the user interfaces that you're going to become really familiar with. The POSIX idea that everything's a file is a pretty interesting one, and I encourage you to take a look at that original UNIX paper. All sorts of I/O is managed by open, read, write, and close; an amazing amount. We also added some new elements to the process control block, like the mapping from file descriptors to open file descriptions, the current working directory, et cetera. So I want to wish you all a great weekend, and we'll see you on Monday. Sorry for going a little bit over. Have a great weekend, everybody.