All right, well, why don't we get started? Again, my name is Carl Nestle. I'm going to be covering recitation today. What we're going to be doing today is covering the balance of the file system calls, going over how the file system calls relate to the process calls, introducing some synchronization issues with the file calls that, up until now, you really didn't have to worry about in your project, and then giving a little introduction as to how to begin the bulk of assignment two, which is going to be the process calls. So in terms of what we have coming up, questions about what you've done so far? You had checkpoint 2.1 a week ago, which was consoletest. So I'm assuming at this point everyone has, at the very least, sys_write working, as well as your data structures for your files and your file tables. Correct so far? Questions I can answer in general about the assignment, where to go from here, before we actually begin? Going once, going twice, all righty.
Well, jumping right in then, let's actually talk about some of the file calls that are coming up here. Again, I should say read and write are pretty much mirror images of each other. So if you already have write written, read is a case of taking write and tweaking it a little bit. My suggested order of preparation for this is do open first, if you have not done so already. That'll take care of initialization of your data structures, specifically the file descriptors, and then move on to close, and then lseek. The next two, change directory and get current working directory, are two that, well, you need to do them for the assignment, but they're somewhat smaller, and the bulk of the work is again open, close, read, write, and then lseek. So let's actually take a look at what we can do in terms of time coming up. You've got about two and a half weeks to go with this assignment.
That's somewhat deceptive, because you probably have heard, and I can certainly confirm, the process control calls are much harder and take longer to implement than the file calls. So again, I don't mean to rain on anyone's parade here, but ideally you really do want to have the file stuff done, or at the very least have fileonlytest passing, within the next few days or so. So if you can allow yourselves this weekend to get caught up with the rest of your classes and jump into process calls as soon as possible, that would really be great. So again, file calls, you want to do them before the process calls, but frankly, the process calls are going to take a little bit longer. They're going to be more lengthy, and so you just want to be aware of that.
OK, open is essentially what, if you will, creates a file descriptor. And we're going to go through some suggestions that you need for the implementation. You may have already done a lot of this, but if you have not, some points to think about here. In all of the calls, Jeff is going to be testing a lot of, if you will, security and sanity, especially if you haven't looked already in testbin. There is one particular test, badcall, that he's going to be throwing at your calls. And essentially what it does is it throws a whole bunch of garbage inputs at them, things like null pointers, numbers that are too big or too small, some other fun stuff. So one of the things we're going to be talking about a lot in the next few minutes here is do the basic things. Remember, the open syscall takes in, roughly speaking, two parameters: a pointer to the file name, as well as a flag for the type of file that you want to create. So to begin with, we want to make sure that the file name itself is valid. In other words, that the user did not pass us a null pointer or a bad pointer or what have you.
Other things, too: flags. Again, we're going to be able to support certain types of flags on open, but there are other flags that the user may try to pass in that are not supported. And again, you want to return an error on that. You probably are sick of me saying this by now, but the man pages are your friends. There's a lot of detail in them. But especially when you take a look at the return value and the error codes, if there's an error code that's listed in the OS/161 man page, it's probably there for a reason, and it probably will show up on one or more of the tests. So in other words, make sure that you return the proper error code for the proper error situation.
Beyond, if you will, initial input sanitization, the bulk of open is to create the file descriptor. Well, how are we going to do this? First off, remember, what is a file descriptor? Roughly speaking, it's an internal data structure that contains the state of a file session. So that contains things like the file position and other fun stuff that you cram in there. And where do you actually put the file descriptor, if you will? Well, we have this thing called a process. That's what you're running in, with things like an address space. And then you have a file table. And the file table is a list of, if you will, open file descriptors. So what you're going to need to do here is find an available file descriptor. Well, what do we mean by that? You've got to look at your file table and see if there is a free slot somewhere. And there may not be. If there's not, again, you need to return the appropriate error. So once you've actually found an available slot in your file table, you need to populate that with a reference to your file descriptor and then create the file descriptor with appropriate initial contents. And when I say appropriate initial contents, things like, hey, take a look at this here.
When initializing the file table, well, where do we want to actually begin reading or writing a file? Where do you expect to begin? At what offset? Zero, exactly. So in other words, just always initialize your data. And I should say this. This may sound obvious, but from hard, hard experience, one of the most common mistakes that students make, and it can be very tough to debug, is not initializing your data structures so that you know what they are. And I'll get to using this later on. But it's something like just assuming that, well, of course, all data is zero. Well, maybe, maybe not. In other words, whenever you create any structure, immediately initialize it to its default values. Because if you don't do that, I guarantee you will have problems. Questions on open? So again, basically, we're creating the initial state of a file session. Moving right along. Yes?
No, good question. In other words, how do you test these things? And excellent. You should be testing all of this stuff as you go along. Definitely, you want to make sure that at some point you pass the official tests, because that's how the grading scripts are run. The grading tests are actually very thorough in many cases. And sometimes, as you're beginning to implement, let's say, open or something like this, if you run something like fileonlytest, it'll just blow up at you. And you really will not have an idea about where the problem is. So, if you noticed my post on Discourse from a few days ago, we strongly suggest you write your own test code that exercises your code as you're going along. Because what you can do is just write a simple hello world program that opens up foo.txt. And you can use that to check, are you even importing the parameters correctly? And then after that, you can maybe check to see, are you actually populating the file table.
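A test helper of that kind might look roughly like the sketch below. It is written against the standard POSIX calls so that it can also be compiled and tried on an ordinary Unix machine first. The function names write_message and read_message are made up for this illustration, and the exact headers, flags, and mode argument available in OS/161 userland are an assumption to check against the man pages.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create (or truncate) the file at path and write
 * the string msg into it. Returns 0 on success, -1 on any failure.
 */
int write_message(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return -1;              /* open failed: check errno */
    }

    size_t len = strlen(msg);
    ssize_t written = write(fd, msg, len);
    int rc = close(fd);

    if (written != (ssize_t)len || rc < 0) {
        return -1;              /* short write or close failure */
    }
    return 0;
}

/*
 * Hypothetical helper: read up to cap - 1 bytes from the file at path
 * into buf and NUL-terminate the result. Returns the number of bytes
 * read, or -1 on failure.
 */
ssize_t read_message(const char *path, char *buf, size_t cap)
{
    if (cap == 0) {
        return -1;
    }

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }

    ssize_t n = read(fd, buf, cap - 1);
    close(fd);

    if (n < 0) {
        return -1;
    }
    buf[n] = '\0';
    return n;
}
```

Calling write_message("foo.txt", "hello world\n") from one small main and read_message from another gives two separate programs, one that exercises only open and write and one that exercises only open and read.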
So in other words, what would be a good jumping off point here is write a program in user code that, again, this is all in the Discourse post, literally opens up the file foo.txt and writes the string hello world to it. And then you can see whether or not it actually got written, and then create another program that reads that text and prints it to the screen. That'll allow you to test the basic functionality of, let's say, just open and just write, and just open and just read. And then you can move on to things like readwritetest, which is kind of an amalgam of the two. But my point is, in writing your own code, number one, you have a better understanding of this. Trust me on this. There's a reason why I'm suggesting you write user code. But the other thing is it allows you to target small subsets of what you actually want to test. Does that make sense? OK, other questions?
OK, moving right along after open: close, kind of the dual of this. And if you will, we're talking a bit about a file as it exists at the process level of abstraction. In other words, we're talking about a file session, something that gets created by open. And as you can imagine, it gets trashed by close. As before, again, you want to do a little bit of input sanitization here. Now, in terms of recycling file descriptors and file handles, let's actually talk about that. We've kind of glossed over this, but possibly pass this along to your friends here: the underlying file descriptor and, let's say, the file handle, OK, they're related, but they're not quite exactly the same. In other words, just because you remove a reference to a file descriptor, you're not necessarily trashing the underlying file descriptor itself. This is a case of reference counting. And this is why we're going to be talking a little bit about process calls. Up until now, you've only had the init process running.
So one process, it gets the entire system to itself. In other words, if a process opens the file, that process is the only process that has a reference to it. So when the init process closes it, you know what? You can go right ahead and destroy the underlying file handle or file descriptor. No problem, because who else is going to complain? Once you start implementing the process calls, and in particular fork, you're going to have multiple processes in play. And we've got a problem here. Because, in other words, let's say we've opened file foo.txt, and the file descriptor we got back in userland was five, what have you. Now I'm done with foo.txt. Then I call close on file descriptor five. Well, that means I'm done with it as far as I'm concerned. The problem is that there may be some other process out there that is still using it in the meantime. So in other words, now you have to keep track of references and locking. We'll explain that in just a minute or so. So what we're trying to say with close here is, just because you trash, let's say, one of those FD integers, it may or may not mean that you have to trash the underlying internal data structure. Make sense? And I can tell you what's going to begin to test a lot of this stuff is forktest and the tests following it.
Read and write, not a whole lot of time on that. Suffice to say, presumably you already have a functioning write. Read is the dual of that. Do more sanity checks on this. And oh, yes, the UIO. Presumably you're already up to speed on that. That's the data structure that's used by both read and write. Questions on read, write?
All righty, lseek. I was asked to spend a little bit of time on this because, of the remaining file calls, almost assuredly this is the most difficult and time consuming. It's not that it's conceptually difficult.
Simply put, you use lseek to move the file position. The problem with lseek is it is mechanically intricate. And so you're going to have to spend a lot of time. I know when I remediated 250, I thought I had a good understanding of C pointers. Oh, no, I did not. When I got to lseek, I got schooled by that. And then when I wrote execv, I got schooled again. So it's actually, I mean, it's a good learning experience. But I'm just saying allow enough time in your schedules to go through this. Problems with lseek, and again, this goes to the mechanical implementation here: up until now, all the file calls that you've been dealing with, actually all the syscalls in general, have had nice, easy-peasy 32-bit parameters. This one now has a 64-bit parameter. And not just that, it has a 64-bit return value. So again, got to be cautious about that. Plus, as an additional kicker here, one of the arguments ain't even on, if you will, the trap frame. Beyond that, the usual sanitization cautions apply, too. So, general questions about lseek, before we dive into some more mechanics of it? How many people are working on this or done with lseek? OK, sweet. Anyone passed fileonlytest? Sweet.
So if you're not up there at this point, this is what you actually need to be cautious of here: arguments and returns. Again, in general, and this applies to everyone in this room, take a look at the syscall interface API. Essentially, it's documented very well in the syscall.c file. And this is one batch of comments that you really do need to read. I know there was a question on Discourse a few days ago. Hey, can I change the way, let's say, I pass data to or from? Can I use, let's say, a return value pointer or something like that for my syscalls?
And our answer back was, essentially, the connection between the syscall dispatch in syscall.c and your underlying implementations of the functions themselves, you control both ends. That's not an official API, so you can do with that whatever you want. What you need to be careful of is the official API as things face userland. And that's actually what is a bit of a problem with lseek. That's the stuff that concerns which piece of data goes in which register. So again, take a look at the comments.
Now, let's see how we can actually handle some of the arguments for lseek here. First off, we have a 64-bit argument for lseek. That's going to be the position argument. And the kicker of that is it has to be stored in a pair of consecutive 32-bit registers that are 64-bit aligned. Roughly speaking, the base address of the parameter has to be divisible by 64 bits. So what we mean here is take a look down here. The format for lseek, you can see the second parameter is off_t pos. The first parameter, we can very nicely put in the a0 register on the trap frame. So that's a piece of cake, no problem there. Now, normally, we would put the position in the second argument register, which would be a1. The problem is that position is a 64-bit parameter, and a1 and a2 are not an aligned pair. So what we're actually going to have to do there is skip over and leave a1 blank and then pop the position parameter into a2 and a3. So not too bad of a problem here, except that at this point we've used up all four of the argument registers: a0, a1, which is skipped over, a2, and a3. And now we still have to take care of this whence parameter. So how do we handle that? Again, the official documentation tells us, per MIPS convention, that once you've run out of argument registers, you put the parameters on the stack.
So you're going to have to actually pull that off of the stack, beginning, again as the documentation says, wherever the stack pointer is pointing, plus 16 bytes. Now, cautionary note for those of you who have not implemented lseek yet: remember, when we say the stack at this point, we're talking about the user stack. Remember I was talking about this when I was covering lecture last week, that there are two stacks. There is a user stack and there's a kernel stack. When, let's say, you call lseek and you implement an lseek syscall, you're trapping into the kernel. And remember, as part of the whole interrupt handler, you're cramming all of the registers onto the trap frame. And where does the trap frame get stored, on which stack? Which one? Kernel stack. Exactly right. So the registers get stored on the kernel stack, in the trap frame. The problem is that any arguments that don't get put in registers are still left on the user stack. So just a cautionary note about that. When we're talking about stack pointer plus 16, we're talking about some place in user space. And again, whenever we're talking about pulling stuff from user space, be really careful, because you're going to have to deal with the copyin and copyout routines.
So to recapitulate here, lseek is not conceptually difficult, but we have to deal with things like importing 64-bit values. We're going to have to deal with endianness, in other words, combining the two halves in the right fashion. Similarly, in terms of pushing stuff back to user code, again, endianness questions come into play. And we're also going to have to deal with pulling this last parameter directly out of user space. So questions, comments about lseek so far? Going once, going twice. Yes. OK, roughly speaking, the lowest address has to be divisible by whatever it is that we're trying to align it by. In other words, let's take an analogy.
Let's say that I'm just trying to align things in multiples of two. So I start with, let's say, 0, 1, 2, 3, what have you. 0 and 1, that is a, if you will, modulus 2 aligned block of data. So blocks starting at 0, 2, 4, 6, because those are all divisible by 2. In other words, 1 and 2, 3 and 4, those are not block 2 aligned. Does that make sense? So in this sense here, each register, say a0, is 32 bits in length. So a1 is 32-bit aligned, but it's not 64-bit aligned. I would have to go up to a2 in order to get the next 64-bit aligned block of data. Answer your question? Other questions? Mm-hmm. Endianness, big-endian.
Oh, roughly speaking, which one of these? Because remember, we're talking about a 64-bit block of data here, and it's going to have to fit into two 32-bit thingamabobs. Well, which one is it? Let's say we want to put the position variable in a2 and a3. Well, does a2 contain the highest bits, or does a3 contain the highest bits? And this is one of these religious or holy wars in terms of how the chip designers do things. And we're just saying, for the purposes of this, honestly, every time I have rewritten lseek, I can't keep it straight in my own brain. You know what I do? I write a piece of test code, and before I have even implemented the guts of lseek, I just call lseek from userland with the parameters 1, 1, 1. And then in kernel space, I just check, have I actually combined them properly? Because think about this. If I reverse the two, rather than having 63 zeros and a 1 as expected, I have, in effect, 31 zeros, a 1, and then 32 zeros. That's the number 4 billion. And I'm going to see right away that I've messed things up. Does that make sense? So again, very often, rather than trying to think about this conceptually, just write test code until it actually makes sense.
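The arithmetic behind that 1, 1, 1 check can be tried in isolation. The sketch below is plain C with two hypothetical helpers, join64 and split64, that are not part of OS/161. It assumes the first register of the pair carries the high 32 bits, which is the usual convention on a big-endian machine, and that is exactly the assumption the userland test is meant to confirm.

```c
#include <stdint.h>

/*
 * Hypothetical helper: combine two 32-bit register values into one
 * signed 64-bit position. ASSUMPTION: hi comes from the first register
 * of the aligned pair and lo from the second; verify with a test.
 */
int64_t join64(uint32_t hi, uint32_t lo)
{
    return (int64_t)(((uint64_t)hi << 32) | (uint64_t)lo);
}

/*
 * Hypothetical helper for the return path: split a signed 64-bit
 * result into the two 32-bit halves that go back in two registers.
 */
void split64(int64_t value, uint32_t *hi, uint32_t *lo)
{
    *hi = (uint32_t)((uint64_t)value >> 32);
    *lo = (uint32_t)((uint64_t)value & 0xffffffffu);
}
```

With the halves in the right order, a position of 1 arrives as join64(0, 1), which is 1. With the halves swapped, it arrives as join64(1, 0), which is 4294967296, the "4 billion" that shows up immediately in a debug print.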
And the same thing is true in terms of porting data back out to user space.
All righty. Couple of short syscalls here. Change directory. Again, just make sure that the pointer is valid. I should also say, remember, since we are importing a string from user space, you want to do that safely. And get current working directory, again, make sure the result is output to user space correctly. These are pretty quick.
dup2. This is the last significant file syscall. And roughly speaking, what we're doing with dup2 is, remember, we've got this idea of an underlying file descriptor. That's where we open up foo.txt, and it holds, among other things, the current read-write position of the file. And that's an internal data structure. But we also have a bunch of references to this file descriptor that the user holds. In other words, let's say I opened up foo.txt, and remember the user got back the integer 5. Well, what I can do with dup2, roughly speaking, is, you know what, I want to clone that. So I can call dup2 with, let's say, 5 comma 6. And what's going to happen is the kernel is going to make both refer to the same internal data structure. So why might we want to do that? We'll look at an example in just a minute. But are you with me on what's going on here? In other words, take a look at this graphical elucidation here. Let's just say that this is the old situation. Let's say these are 1, 2, 3, 4. OK, so file descriptor 1 pointed to this file handle. File descriptor 3 pointed to this file handle. Then I call dup2 on this.
And in effect, what's going on here is, you see, I want to take the underlying file descriptor that's pointed to by the old file descriptor and basically say, clone me a new reference to this file descriptor, so that now I have two references to that old underlying structure. And as you can see, sometimes what happens is, if this one was already pointing to another existing structure, that gets cut off. Questions on this?
Where this is often useful is redirection of console input and output. And Jing Hao actually coded up a little bit of a blob that illustrates this. To make a long story short, let's say we have a small program that, in essence, just prints to the screen what it was called with in terms of a parameter. And we have a minimal little bit of a shell here that, in essence, just calls this program out here. So we just want to exec this program. And so if we type, let's say, echo hello world at a user prompt, what's going to happen is the parameter hello world is going to be fed to exec. And I think Jeff maybe, has he talked about exec yet? OK. To make a long story short, what it does is it runs a new program off of disk, and whatever parameters you have, that string hello world, get passed to the next program that gets run. So it just gets printed to the screen here. Now let's actually see how this might help us. Why might we want to use dup2 here? Well, let's take a look at this. What we have here is, before, what we were doing is we were simply execing the file. In other words, just run this file on disk, which is to run this thingamabob right there. And what does this thingamabob do? It does printf. And what does printf do? As we know, it prints stuff to the screen. So that's why, if we type echo hello world, it'll print hello world to the screen as we expect.
But if we add some extra goo in right here, let's take a look at the next slide. Before we call exec here, what we want to do is open a disk file, hello.txt. And we call dup2 on this. And what's going on here is we're saying, well, remember, this fd is a reference to the file structure for, in this case, hello.txt. And what I'm saying is clone me a second reference to this file. Namely, whatever number this was, I also want the standard out file number, which is what? 1, remember? That's just a constant here. So I want 1 also to point to this disk file. So in other words, from now on, every time I call read or write on file descriptor 1, rather than it referring to the console output, it's going to be referring to this disk file. Does that make sense so far? So now, if I, in effect, type echo hello world, what's going to happen is instead of it appearing on screen, it's going to be sent to some disk file. So my point is it's useful if we can do that, because I didn't have to change this original program here, because printf thinks it's writing to the screen. But again, remember what we were saying earlier. All the init process gets is previously opened console output, or input and output. What happens after that is up to subsequent processes. The init process could just close out all of the console output. Or, as in this case, what could happen is we're going to change things so that whenever you write to, let's say, file number 1, instead of it being printed to the screen, it goes to a disk file. We can do whatever we want. Questions on dup2? Was it this one? Yeah. What would happen to standard out? Oh, good point. Because we now have two references to this new file here, what happened to standard out? Well, let's take a look here. This is exactly what's going on here. Pretend this was standard out here.
What I'm saying is, 1 pointed to that; well, I now want 1 to point to something else. That means I lose this reference to standard out. And remember what we were just talking about a couple of minutes ago, right here, with close. In effect, we're removing a reference to a file descriptor. So short answer is, we definitely lose a reference to standard out. Now, what happens after that? This is just true in general. We want to see, is there someone else using it? In the case of standard out, almost certainly yes, because what probably happened is init has forked many, many times. And we essentially have hundreds of processes with references to standard out. So the underlying file descriptor, which in the case of the console is the stuff that prints on the screen, is still going to be there, but we've lost a reference to it. Answer your question? I can always reopen it manually, but I've just lost my pre-canned reference to it. Other questions? And by the way, you can do the same thing with input too. You can clone input so that instead of reading from the keyboard, you read data from a disk file. So if you want to know how Jeff scripts these tests, it's exactly this stuff. In other words, he's pulling a set of known inputs and using dup2 to send it to your kernel, which thinks that a user is typing at a keyboard. Well, it's not, it's coming from a disk, and then the output is being sent to another file instead of appearing on the screen. And then the grading scripts compare to make sure that the output matches the expected output, yada yada. That's essentially scripting. All right.
OK, moving right on to the process calls here. It's good some people have already finished fileonlytest. If you have, I'd say you're probably on par with where you should be. Don't rest on your laurels.
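Going back to the redirection example for a moment, the open-then-dup2 step that a shell would perform before exec can be sketched in userland C as below. This is a sketch using the standard POSIX calls rather than the code from the slide; redirect_stdout is a hypothetical helper name, and the file mode is an assumption.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: make file descriptor 1 (standard out) refer to
 * the disk file at path instead of the console.
 * Returns 0 on success, -1 on failure.
 */
int redirect_stdout(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return -1;
    }

    /* Clone a second reference: 1 now points at the same open file. */
    if (dup2(fd, STDOUT_FILENO) < 0) {
        close(fd);
        return -1;
    }

    /*
     * Drop the original reference. The open file survives because
     * descriptor 1 still refers to it. Skip the close in the corner
     * case where open already returned descriptor 1.
     */
    if (fd != STDOUT_FILENO) {
        close(fd);
    }
    return 0;
}
```

After redirect_stdout("hello.txt") succeeds, anything the process, or a program it execs, writes to descriptor 1 lands in hello.txt, and the program doing the writing does not need to change.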
If you're not finishing fileonlytest right now, you probably want to push the accelerator a little bit harder. Again, I don't want to scare you, but remember part of this is, I'm sure you people, as I mentioned, have these things called other classes. And this ball of wax is due, remember, the day of spring break. And I'm sure your other classes have other projects and midterms and whatnot. So in other words, that last week, it's a case of, well, I'll finish assignment two, but then you've got everything else piling in there. So the sooner you get started with this, the better. And again, use us. That's what we're here for. Post to the forum. Ask us at office hours. That's what we're here for too.
So again, timeline. Make sure that you pass fileonlytest before you go on to writing the process calls. I'm not saying don't start looking at them, but before you really get cranking, a good jumping off point is that you have fileonlytest under your belt. It is really, I would say, the first very significant milestone in your OS/161 journey. You got one syscall out of the way before, but now you have a pool of syscalls. And fileonlytest is really nifty. It really does throw the kitchen sink at these. Now, I should say this. Despite passing fileonlytest, which means you have base functionality for open, close, read, write, and lseek, until you have the process calls at least somewhat written, you're not going to be able to test the complete functionality of your file syscalls, because some of the functionality involves synchronization. So this is what I was talking about. Whoops, wait a minute. There it is. OK, until now, we've only had one process. Oh, I should say this. kproc, I know it's come up in the meantime. It's not really a process. It's simply what OS/161 uses to refer to the threads that are not really processes. So let me be clear.
When you hear the staff talking about the first process, we're not typically referring to kproc. We're referring to the thing that, when you're sitting at the OS/161 console and you type, let's say, p bin/true or p bin/foo or what have you, is the first thread that obtains a user-level context with things like a file table and an address space. And that is a true process in the traditional Unix sense of the word. So again, up until now, we've only had one of those things. And that's the thing that you have launched to pass your consoletest or what have you.
Now, once you pass fileonlytest, again, for those of you who have already done so, frankly, just go right on and start cranking on fork. Fork is going to be a ball of wax. And the sooner you can get started with that, the better. There are going to be, roughly speaking, four process calls. getpid is literally one line. So that's something you can do when you're feeling blue and you need an ego boost. But in terms of dependencies, you want to start with fork, because you need more than one process, essentially, to do the waitpid call. Because remember waitpid, we talked about this in lecture. In other words, one process waits on another. Well, it's kind of pointless if you only have one process, because there's nothing else to wait on. exec, actually, you can do exec before you do fork, but really you're putting the cart before the horse. Don't. And it's also arguably the hardest of all the syscalls. So make sure that you get the troika of fork, waitpid, and exit all written. And of course, getpid. So again, pass fileonlytest. Then at this point, start cranking on fork. And again, once you do that, be careful, because now you're going to have to worry about synchronization of a bunch of things.
And in particular, you're going to have to synchronize accesses by multiple processes to, among other things, your file data structures. So again, remember this whole thing. You probably glossed over it, but now it's important. Operations on file descriptors need to be atomic. And this one here, forktest, is definitely going to be testing exactly that. It's a really good test. I remember when I initially took this class myself as a wee little tacker back in the Cretaceous era, I didn't bother putting locks on my file descriptors because the tests were not good enough. And then when this thing came down from Harvard, my old code promptly blew up. So it's there for a reason. You do need to make sure that your operations on your file descriptors are atomic. In particular, there's a lot of janking around that involves the position field, but there's some other stuff that's going on. So again, pass fileonlytest, get started on fork. And then the next milestone is passing forktest. That is a really thorough test.
One other thing: please do not simply run forktest. Before you do that, drum roll, write your own simple user code to, for example, test fork. You know, fork; if the return code is 0, hi, I'm child; if the return code is greater than 0, hi, I'm parent. Something simple like that. And again, there's another post on Discourse. And if you've never used these syscalls before, and again, I'm not trying to point fingers, I was one of them myself. I had never used these syscalls before I took Jeff's operating systems class. If you haven't used them, don't try to write them. Go on to Timberlake, boot up your system, and write code so that you can actually look yourself in the mirror and say you've used them before you write them. Does this make sense? In other words, don't write fork when you've never used fork. And it's going to make the pill go down a lot easier.
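A first experiment along those lines, meant to be run on an ordinary Unix machine before touching the kernel, might look like the sketch below. The helper name fork_hello and the choice to pass an exit code through are illustrative assumptions. It uses write rather than printf so that buffered output is not duplicated across the fork.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Print a message with a single write call (no stdio buffering). */
static void say(const char *msg)
{
    ssize_t n = write(STDOUT_FILENO, msg, strlen(msg));
    (void)n;                    /* best effort; ignore short writes */
}

/*
 * Hypothetical helper: fork once, have each side announce itself, and
 * have the parent wait for the child. Returns the child's exit code as
 * seen by the parent, or -1 if fork or waitpid fails.
 */
int fork_hello(int child_code)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;              /* fork failed */
    }

    if (pid == 0) {
        /* Return value 0: this is the child. */
        say("hi, I'm child\n");
        _exit(child_code);
    }

    /* Return value greater than 0: this is the parent. */
    say("hi, I'm parent\n");

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The order in which the two messages appear is not guaranteed, which is a small preview of the intermeshed output problem that shows up once two processes share the console.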
So once you've actually used it, then go on and write a simple test case in OS/161 to, if you will, exercise the fork that you are writing. Because again, if you simply start running fork test, and this is one of the most common student mistakes, again, if you're listening live or if you're watching this later on, hi, everyone out in camera land, please do not simply run fork test. It's an extremely thorough test, and your code will blow up. So first, get your code passing on something very simple: hi, I'm parent, hi, I'm child. You're going to have plenty of problems debugging that, and then move on to something more thorough like fork test. If for no other reason, remember, once you get, if you will, two processes going, you're going to have problems with intermeshed output. And it'll be just a little bit difficult trying to read the output that gets put to the screen. Questions about the general order of operations? Again, pass file-only tests, start thinking about, if you will, synchronization, write fork. And once you have not just fork, I would also say waitpid and exit, at least reasonably working, then go on to pass fork test, and at that point you can make sure that the stuff with your file calls is actually properly synced. If you're passing fork test, that is a good sign that your process-related synchronization for your file calls is in pretty good shape. Questions, comments? Do we need waitpid and exit for the fork test? Yes, you do. You need fork, you need waitpid, you need exit, and you need getpid. Again, getpid, one line. You do not need exec. There's actually a bunch of tests that do not require exec. So I mean, at the very least, fork test, like I say, is a good milestone before you go on to exec. 
If you've got problems with fork test, don't, again, jump over the, if you will, the fence before you clear that. Okay. Files, let's talk about what they are, because, wait a minute, what's a file? Isn't it kind of easy? Well, yes and no. Remember, Jeff has been hammering home levels of abstraction, and let's actually talk about this. Essentially, a file, if you will, as it appears to the user, is an integer. That's it, a simple number. Often referred to as kind of a handle. In other words, a generic reference to something that is inside the kernel, this file descriptor. But before we get to that, where are these file descriptors actually held? In this file table. Both the file table and the file descriptor are, if you will, in-kernel data structures. So again, the file table is simply where, if you will, references to the open file descriptors are held. So back to the levels of abstraction here. The user just sees an integer. But the kernel takes this integer and finds whatever reference it is in the file table to the underlying structure. And that's something that only the kernel itself gets to muck around with, and it's designed this way deliberately so that the user can't, if you will, change the context of the file descriptor without properly going through the syscalls. So this is the state of the file as it appears at the process level. So a file descriptor, it's an open file. It's the level of abstraction that you see at the process level. And it holds things like, again, the file position, a reference to the vnode, what have you. Because have you noticed this already? When you were writing, let's say, your code to pass the console test, did the VOP_WRITE thing take in a position field? It took a vnode reference, and it took in, if you will... I'm sorry? Yeah, it took a vnode and a uio. Okay, and it took in a uio. And what did you have to do? 
You had to manually spell out what position you wanted to read at every time. This is the key thing. In other words, at the VFS layer, you have to tell it where in the file you actually have to begin reading or writing. Now at the syscall level, do you have to do that? Well, you don't. Remember, you can open a file, and let's say you read 10 bytes, and then later on you come back and call sys_read again. It's gonna remember where you are. The reason is that it has the state of the file position. That's something that's maintained at the process level of abstraction. It is not maintained at the vnode layer of abstraction. That's what we're talking about. It's all part of the same file, if you will, levels of abstraction, but some things are maintained at one level and are irrelevant to other levels. They all represent files, but that's what we're talking about. Questions on the distinction between the, I'm sorry, the process level of a file and the vnode level of a file, which is the next one here? Does this make sense? And again, this is not something that you have to implement for this assignment, but I will say it's something that Jeff is gonna be talking about later on in the class, and it will definitely be relevant in terms of exam material. So real quick, bouncy thing on this. Okay, we have the file descriptor level, and again the file descriptor has, if you will, a reference to the vnode level, and essentially a vnode is a virtual node. In other words, this is the generic reference to the stuff at the VFS layer, and that in turn holds information about what this virtual file is. You've probably heard that in UNIX, quote unquote, everything is a file. Well, that's one of the reasons why we have the concept of a virtual file, because sometimes a virtual file actually is a file. 
In other words, it refers to an inode, which is an index node, holding metadata about where on disk the data actually is, and that makes sense here. But you know what, we could also have a vnode point to something else, let's say a network socket, or let's say a device driver; this is how we can redirect to the console. So again, if you will, we have the level of abstraction of an integer that the user sees, which refers to a file descriptor, which is a process level of abstraction. That in turn contains a reference to the vnode layer, which contains a reference to, well, it could be a file, but it could be something else. And essentially, this is all, quote unquote, a file. Make sense? And again, you don't have to implement this part for assignment two. What you're doing is implementing a process subsystem, which is again this kind of level right in here. Okay, let's go through this very briefly. It's a bunch of slides, but it goes through and illustrates some of the relations in terms of references and files and processes. Let's say that I have a simple process, a parent process, and remember, each process has a file table. So right now, let's say I have a spot for three files. And this is kind of where things begin. And let's just say here that the parent decides to open a file. So what happens is we have to create a reference. This is what open does: find a slot in the file table. We create this, which is our file descriptor structure. And then there is a reference to some vnode that actually makes the magic of the reading and writing happen. Now, let's say that this parent decides to call fork. What's actually going to happen here? A second process. And remember, each process has its own file table. But per the man page on fork and these other things here, at the point at which you call fork, the parent and the child share all currently open files. Key word is share. 
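The picture on the slides can be sketched in C roughly as follows. This is only an illustration under assumptions, not the assignment's required design: the names open_file, file_table, ft_open, ft_fork, and ft_close are invented here, the table has three slots to match the slide, and the lock that a real kernel needs around the offset and the reference count appears only as comments. It also shows what a reference-counted close might look like.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_FILES 3                 /* three slots, as on the slide */

struct vnode;                       /* opaque here: supplied by the VFS layer */

/* Hypothetical open-file object (the "file descriptor" structure). */
struct open_file {
    struct vnode *vn;               /* what actually does the reading/writing */
    long offset;                    /* file position, kept at the process level */
    int refcount;                   /* how many table slots point at this object */
    /* In a kernel, a lock would live here to guard offset and refcount. */
};

/* Per-process table: the integer the user sees is an index into slots. */
struct file_table {
    struct open_file *slots[MAX_FILES];
};

/* open: find a free slot and create a fresh open-file object. */
int ft_open(struct file_table *ft, struct vnode *vn)
{
    for (int fd = 0; fd < MAX_FILES; fd++) {
        if (ft->slots[fd] == NULL) {
            struct open_file *of = calloc(1, sizeof *of);
            if (of == NULL)
                return -1;
            of->vn = vn;
            of->refcount = 1;
            ft->slots[fd] = of;
            return fd;
        }
    }
    return -1;                      /* table full */
}

/* fork: the child's table points at the SAME objects; nothing is copied. */
void ft_fork(const struct file_table *parent, struct file_table *child)
{
    for (int fd = 0; fd < MAX_FILES; fd++) {
        child->slots[fd] = parent->slots[fd];
        if (child->slots[fd] != NULL)
            child->slots[fd]->refcount++;   /* kernel: do this under the lock */
    }
}

/* close: drop this table's reference; free only when it was the last one. */
int ft_close(struct file_table *ft, int fd)
{
    if (fd < 0 || fd >= MAX_FILES || ft->slots[fd] == NULL)
        return -1;                  /* bad descriptor */
    struct open_file *of = ft->slots[fd];
    ft->slots[fd] = NULL;
    assert(of->refcount > 0);
    if (--of->refcount == 0)        /* kernel: decrement under the lock */
        free(of);                   /* a real kernel would also release the vnode */
    return 0;
}
```

The design point is that the table slots hold pointers, so sharing after fork costs one pointer copy and one increment, and a file opened after fork simply gets its own object with a reference count of one.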
In other words, the child does not get, if you will, a new file descriptor; it has to share, if you will, it gets a reference to that. So again, right away, here we have a case of two different processes sharing one thingamabob underneath. And this is why you're gonna have to start syncing and locking your underlying file descriptors. Make sense? So again, you've probably already heard me saying this: okay, if the parent at this point decides to read 10 bytes from foo, and then the child after that decides to read 10 bytes from foo, at what position is the child gonna start reading? Well, yeah, 11, or like zero to 10. Well, yeah, exactly, 11 slash 10: the eleventh byte, which is offset 10. Yes. It wouldn't reflect the other, so that's not true. If the child does something, it's gonna reflect on something that the parent tries to do. Excellent question. That's exactly what we're trying to talk about. Yes. In other words, this is kind of a type of inter-process communication. So in other words, the state that's maintained at the process level of a file, if you will, in this file descriptor, right? That is very much shared among processes. In other words, any file that's open before you call fork, it's shared. Yes. And again, since there are two references, we're gonna have to have locks to make this atomic. And again, fork test will make sure that you do have locks on these things. So the point is we opened a file and then we called fork. Let's see what happens if we open a file after we call fork. The parent opens up a file bar and again creates another file descriptor and whatnot. Now let's say the child wants to do the same thing. Now remember, the child does not get a copy of this, because this is after fork was already called. Instead, this is what happens here. Yes, we have the same underlying file, but take a look here. We have two separate file descriptors. So the key point here is foo was opened up before you called fork, bar is opened up after you called fork. 
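You can watch this foo-versus-bar behavior on a regular Unix system before implementing it. The following is a rough sketch, with a made-up function name and a scratch file created under /tmp as an assumption. It has the child read 10 bytes and then reports where the parent's own descriptor is positioned, once with the file opened before fork and once with each process opening it separately after fork.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the child reads 10 bytes from a scratch file, then the
 * parent reports its own file position. If open_before_fork is nonzero the
 * descriptor is opened once and shared through fork; otherwise parent and
 * child each call open after fork. Returns the parent's offset, or -1. */
long parent_offset_after_child_read(int open_before_fork)
{
    char path[] = "/tmp/fork_offset_XXXXXX";
    int tmp = mkstemp(path);                    /* create the scratch file */
    if (tmp < 0)
        return -1;
    ssize_t written = write(tmp, "abcdefghijklmnopqrstuvwxyz", 26);
    close(tmp);
    if (written != 26) {
        unlink(path);
        return -1;
    }

    int fd = -1;
    if (open_before_fork)
        fd = open(path, O_RDONLY);              /* like foo: will be shared */

    pid_t pid = fork();
    if (pid < 0) {
        if (fd >= 0)
            close(fd);
        unlink(path);
        return -1;
    }

    if (pid == 0) {
        /* Child: use the inherited descriptor, or open its own. */
        char buf[10];
        int cfd = open_before_fork ? fd : open(path, O_RDONLY);
        _exit(cfd >= 0 && read(cfd, buf, sizeof buf) == 10 ? 0 : 1);
    }

    if (!open_before_fork)
        fd = open(path, O_RDONLY);              /* like bar: parent's own */

    int status = 0;
    long pos = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0 && fd >= 0)
        pos = (long)lseek(fd, 0, SEEK_CUR);     /* where is the parent now? */

    if (fd >= 0)
        close(fd);
    unlink(path);
    return pos;
}
```

With the file opened before fork, the parent's position has moved to 10 even though the parent never read anything. With separate opens after fork, the parent is still at 0.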
So in other words, fork duplicates currently existing open file descriptors. It obviously doesn't even know what you're gonna do after that. So in other words, if the parent decides to read 10 bytes from bar, and then the child decides to read 10 bytes from bar, where does the child begin reading? Zero, okay? Because the state that is in this thing is separate, okay? Notice too, I've got these numbers here. Essentially, these are ref counts. So again, here we have two. So this thing has to be synced. As of right now, these only have one reference, but who knows? These things can fork and fork and fork. And especially when it comes to something like console I/O, that could have literally hundreds of references to it by the time a real operating system gets going. Let's go the other way. So we created this stuff. Let's talk about when you start destroying things. And let's just say here that the child closes foo, okay? See, what's happened here is the child has a reference to foo. So we get rid of it. But we didn't get rid of the underlying file descriptor, because why? Right, there's already an existing reference to that. In other words, the point is we've gotta check our ref count. And since there's an existing reference, we don't ash can the underlying file descriptor. And again, this too will also have to be something that's synced, all right? Because there's gonna be a bit of a race condition in terms of close. Now, let's say that the parent closes foo. The parent has the only reference, so yeah, now at this point, we can ash can that file descriptor. Pretty much makes sense. And so let's say that the parent closes bar. Take a look: since the parent had the only reference to this bar, we can get rid of it. But again, you still have this other underlying bar that was opened up by the child. Oh, by the way, I should say one thing here. The vnodes. Remember, everything at the vnode layer, right on that is, ah, right here. 
Everything at the vnode layer here is given to you. And again, for everyone out there in video camera land too, please note this. In terms of things like synchronization and ref counting, from here and below, that's taken care of for you by the working VFS file system. So if you're thinking about how to worry about that, well, when you do things like VFS open and close and whatnot, it performs the magic for you below. But again, from here on up, this is what you're doing as part of implementing assignment two, which is the process control subsystem. This all makes sense? And again, you know what, I've kind of already done this before, but the suggested order: again, just make sure that open, close, read, write, and lseek are working. Make sure that you are passing file-only tests. Then start on the process calls; again, write fork, waitpid, and exit. Waitpid and exit kind of can be done together as an amalgam. And make sure fork test is passing before you go on to exec. At some point, you want to finish up the rest of the file calls, then tackle the big one, exec, and then the big test, if you will, is add and argtest. You know, the ultimate test of assignment two, I kid you not, it was probably about 10 minutes to midnight of the due date. We launched our shell, and literally I typed add, space, two, space, three, hit return, and I got five back. That means that you have a working fork, wait, exit, and exec, and all the stuff is synchronized. And getting that simple thing working, in essence, means you have completed the guts of assignment two. But for something that simple, you really have to have all this heavy lifting. That's what you're doing. And my partner proceeded to high five me. So I remember that to this day. So, questions on this? All right, thanks for coming. Same time, same channel next week. Or at least, I forget how long we're gonna be in this room. 
We may have to make an announcement on that. Okay, thanks, people.