Talking about disks and files: today we're going to start talking about the abstraction that we use to manage and deal with disks. With the CPU, we had threads. With memory, we had address spaces. And with disks, we have files. We'll talk a little bit about file metadata, and then we'll get into Unix file system semantics and start talking about hierarchical file systems. At some point it'll become Wednesday, but we'll see how far we get today.
So I'm not quite finished with the assignment 3 autograding. That's coming, though. One of the TAs was telling me, oh, well, people have been bringing their design docs in, and they're not sure how to get feedback. And I was wondering if maybe there was something broken about the assignment 2 design doc submission. And then I realized that, no, nothing is broken. It's just that no one has submitted one yet. So I've started to track a little bit where people are in the class as far as the assignments go, and I have to say that I'm concerned for many of you. Many of you guys have not even submitted an assignment 1 patch at all. How many people have not submitted an assignment 1 patch? Some of you guys are lying, because I know who hasn't submitted an assignment 1 patch. I have the list of names right here. So if you're a little embarrassed, then that's probably good, because you should be. You're behind and you're in danger of doing poorly in this class. If you come in here and nail the exams and fail the assignments, you're not going to do well in the course. That's how the grading works.
So if you guys haven't even started assignment 1, you don't really know what's in store for you. If you haven't written any code, if you haven't gotten started with that, then you're flying blind here. You're like the guy in the cockpit with his hands over his eyes. This isn't going to end well. At least open your eyes and see what the problem is. That'll help you guys out.
So that was my version of a pep talk. It wasn't a very good one. But you guys still have time. If you really sit down and get going on this stuff, you still have time to complete these assignments. But you're definitely not going to complete them at the rate that you're going at this point. If it's taking you two months to do assignment 0, then I'm worried about where you are.
So on Friday, we talked about disks. We talked about the mechanics of how disks work and some of the implications of the details of disk operation for how we design file systems and files. Any questions about disks before we do a little bit of review?
All right, so what's the disk platter? Remo. Yeah, it's the platter. It's like a plate that the data is on: a circular flat disk on which magnetic data is stored. All right, what about the spindle? Spencer? Incorrect. Satish? Yeah, the drive shaft, the thing that the platters are attached to. And do you want to come back and try that again, Spencer? Yeah, there you go. So the head is the part that actually floats over the surface of the disk and reads and writes the magnetic data on the substrate below, right? And technically, these are mounted on something. Does anyone remember what these are mounted on? Yeah, Jimmy? Yeah, so the disk arm is sometimes what we call that. The arm will have multiple heads on it. Sometimes disks will have multiple arms, right? The one we talked about had a single arm.
All right, what about disk locations? When we talk about a track on the disk, what are we talking about, Dan? Think track. Yeah, it's like a lane on a track, right? So it's a circular path around the disk. Again, think of a lane on a race track. Dan, do you want to try that answer again? Yeah, there we go. OK, so the sector is this pie-shaped area on the disk. It resembles a slice of pie cut out of a single platter. And what about a cylinder, Jen?
She doesn't know. Jen too. Yeah, so if you stacked all the tracks vertically on all the platters on the disk, you'd have a cylinder, right? Or sometimes we talk about all the data that's stored there as being in the same cylinder group. So if you imagine you took a cylinder and sliced it down through the disk platters, the place where it would intersect them is what's called the cylinder, right?
All right, so what's different about spinning disks from the things we've talked about, Sean? Oh, OK, so how many tracks are on a cylinder? That's a good question. Who can answer that using another piece of disk terminology, Manish? What's that? That's also true, but who can answer this question using disk terminology? Huikiyang, I have a disk composed of multiple platters. How many tracks would be on a cylinder? Does it depend on the sector? OK, yes, that is true, but why does it depend on the height? Right, so it depends on the number of platters. There's at least one track per platter, right? Potentially, how many tracks per platter could you actually have, Sarah? Well, remember, a cylinder is these vertically organized tracks. I'm going to have at least one track per platter, but I could also have how many? Dan? So that's if I had one per platter, but I could also have what, Sean? Two. How would I have two? One on both sides, right?
It's like a pizza with toppings on the top and the bottom. Of all the pizza innovation out there, apparently nobody has tried that one yet. Toppings on the crust, they've done that. The box designed for that pizza would be difficult, right? Anyway, so we have magnetic substrate on both sides of the platter. Why not, right? I mean, that's an easy way to just double the capacity of our disk, right?
So I have two tracks, potentially, per platter, and then I have some number n of platters on the disk. How many? That's a good question. I don't know. I wish somebody had a disk they'd let us rip out of their laptop and pull open so you could see how many platters there are. I wish I knew. It's probably like six or eight. It's not that many, right? So it's the number of platters times two, probably. That would be the number of tracks per cylinder.
Usually, we talk about cylinders in groups, meaning that we take adjacent cylinders. So it's less of a discrete grouping of tracks and more a group of tracks that are next to each other. Think about a bunch of lanes on the racetrack that are next to each other: that defines a cylinder group. That's all the data that I can get to without moving the heads too far, right? All right, good.
OK, so who's going to tell me how disks are different in kind? No? AJ. Yeah, they move. Nothing else in your computer moves, right? None of the other components we've talked about so far move, physically. What else? They're slow? They're slow, right? And they're slow partly because they move. And then, also, how are they different in terms of how they're integrated into the operating system? They are slow, but I'm going to ignore that for now. Actually, yeah, so we usually think of disks as devices, right? What we're going to try to do here, to some degree, is build interfaces that allow us to put multiple devices below them. That's kind of what we did with CPUs and address spaces, but you don't usually think about swapping those. You can do this. I mean, how many people have ever replaced the CPU in a machine with a different CPU? So this is possible, right?
People do put in new memory and things like that, but for memory and CPUs there has to be a lot of standardization in order for that to happen, right? Whereas with disks, we have interfaces like SATA, and you can buy hundreds of different disks that all support the SATA protocol. They all present the same interface so that the operating system can use them, but they may work very differently. So I can take a machine that has a spinning disk in it, take that disk out, put in a solid-state drive, and everything just works. So that's kind of nice.
All right, so how do we read and write data from the disk? Let's talk about how to do this. What's the first step, Sam? Yeah, so I have a block of data. When we talked about disks, you usually think of blocks of data, right? So like 256 or 512 bytes of data. And the operating system wants to write this to a disk. What's the first thing I need to do, Jeremy? Yeah, even before that. Let's see here, who can I pick on? Yeah, I have to issue the command, right? The disk doesn't know what to do. The disk is just sitting there at the end of some interface to the operating system. So the first thing you need to do is actually issue the command. I have to send a command, over the SATA bus potentially, so the disk knows what to do, OK?
Now the command arrives at the disk. What happens next, Nick? Yeah, so now I have to move the heads to the appropriate track. The disk needs to be able to map from whatever block ID I gave it to where that is given its own internal geometry, and it needs to move the heads there. So that's the first thing that has to happen. Andrew, what happens next? Yeah, so I have to settle, right? The heads actually have to stabilize on top of that potentially very narrow track. Then what happens? Yeah, so now there's what I call the rotation latency.
So I have to wait until the platters rotate around to where the data on disk is stored, right? What's next? Coppina. I've got my head on the right track. I've got the platter rotated to the right sector. So now what do I need to do? I need to actually read the data off the disk and transmit it back over the bus. Usually what probably happens is that the data is read into some buffer on the disk until the block, or some portion of it, is read completely, and then the disk streams that buffer back to the operating system. It says, I have data that's ready, and then the operating system reads the data off the disk, right?
So we talked a little bit about disk trends, and we call this the I/O crisis. What two factors are driving this crisis, Dan? Yeah, so disks are getting bigger, but they're not getting faster. There was definitely a period of several decades where disks got really, really big, especially in personal devices, but they didn't get much faster during that period. They got a little bit faster, but they certainly didn't get faster according to Moore's law, the way that everything else on the system was getting faster. And that meant that the difference in performance between disks and other parts of the system was getting worse.
All right, before we go on to files, any more questions about disks? Going once, going twice? OK, so when we start talking about files, what we're going to try to do is be precise, as opposed to other parts of this class where we just try to be vague and unclear. Most of us are familiar with files, but a lot of us haven't thought very specifically about what the file interface looks like and what the semantics of a file are, right?
So we'll talk about the minimum interface for being a file. What do I need to do to be a file? What do I expect files to be able to do? We'll talk about other information that's associated with files, and this has actually become more and more interesting over time. There's other useful information that file systems typically store about files, and we sometimes call this file metadata. It's different from the contents of the file. It may be stored with the file, or it may not be.
Then we'll talk a little bit about the relationship between files and processes. Think about things like open and close, which you guys are hopefully working on implementing. Why do these calls exist? Why do Unix and other operating systems allow you, as a process, to establish a relationship with a file? Why would I do that? Why not just have read and write? Read from this path, this number of bytes, easy, right? And then we'll talk about what happens when we start organizing multiple files together: how file systems do that, and how we organize files in useful ways. Some of this is going to seem very familiar, and some of it will not.
So we're starting to talk about the abstraction of a file. What do I think a file has to do to be useful? What are the two things that, at minimum, I have to do? Say I was building the simplest possible file system. Jeremy. Yeah, it has to store data, right? Reliably store data. This is important. I put things in a file, and the processes on your system expect that data to persist essentially forever if it's not modified or deleted. And then what's the other thing I have to do?
This is a little less obvious. I store data somewhere, and then what else do I need to be able to do if that data is going to be useful? Organize it, but specifically, what do I need to be able to do? Yeah, Tim. Well, reading and writing is, I think, included in this reliable store and retrieve. Sarah. Yeah, I need to have naming, right? I need to be able to locate the file. The process, the operating system, and the disk have to work together to create this namespace. So I need to have some way of naming things.
Naming things in computer science turns out to be a huge problem in general. You guys are aware of this on some levels and not aware of it on other levels. Computers aren't good with names. Computers are good with numbers and arrays and things like that. And humans? How many people think they could go to Google by typing its IP address? Yeah, good, me neither. So we're good with names, computers aren't, and getting computers to be good with names is quite a big challenge. So usually we give files a name, and we need some semantics associated with these names. Again, you guys have internalized a lot of this, but it didn't always used to be this way, so we'll talk a little bit about how this happened.
Okay, so at minimum, file expectations. We expect that file contents shouldn't change on their own. That's the reliably stored and retrieved data. And we also expect that they should change when we want them to. When we start talking about shared access to files, there are some questions about how exactly files should change when multiple processes are using them. But in general, we can think about this in terms of just one process using the file. And when you think about these simple requirements for file systems, right?
This stuff seems like a real no-brainer, right? This is basic stuff. We haven't talked about optimizing for performance. We haven't talked about namespaces and hierarchical file systems and all sorts of other fancy features, on-the-fly compression and all sorts of cool things that you can do. But it turns out that getting this stuff right is not easy. This was a year ago, but Windows 8 has had file system issues, NTFS has had file system issues, ReiserFS has had them. There are still problems with file systems in terms of these very basic expectations. Part of this is bugs, just simple problems with file system design.
But keep in mind that, to some degree, we expect more from files than we do from CPUs and memory. If you take your desktop computer and suddenly yank the power from it, you don't expect the memory to still be holding its contents when you turn it back on, or the CPU to have any idea what just happened. But with the file system, you might say, hey, what happened to all that data I was storing a millisecond before the power went out? So file system expectations make the design of file systems a little bit more complicated, particularly when we start talking about dealing with failures: parts of the disk fail, power goes out, head crashes, things like this.
The person I took this course from at Harvard was a file system engineer. She had done a lot of file system research and design, and she used to say that problems with data corruption in file systems are the worst kind of bug to have, because if somebody experiences that bug, they have both the inclination, because they're angry, right?
And the time, because you just deleted all of their data, to come after you. Back when she was designing these, maybe that meant calling you up on the phone; now it'd be flaming you on 4chan or something. But they are mad, right? You have destroyed their data, and while they're busy recovering it and running all their file system cleaning operations, that's when they can send you the angry emails and post in the forums about you and things like this. So these are not good bugs to have. They're very difficult to recover from. People usually like to have their data. They like their data to be there. And if we all got an email from Google tomorrow saying, yeah, sorry, we lost half of your email, I think some people would be sad, and they would be angry.
All right, so there's other information that we frequently might want to associate with a file. We think about files as having contents. That's the information that's inside the file, the data that the file is storing. That's information that processes can read or write. So what else might we want to know about files, for a variety of reasons? Sam. Yeah, so we might want to enforce some access permissions on files. We might want files to be private to certain processes, with mechanisms by which processes can share those files in limited ways with other processes. So that's one thing. What else, Jeremy? Yeah, so maybe temporal information about the file. When was it created? When were the contents changed? Things like this. What else? Yeah, things like size, right? That could be useful in certain cases. Yeah, okay, so now we're getting into some more interesting things. File types, right? What is a file type? Does anybody know? Yeah, maybe. Okay.
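Setting the file type question aside for a moment, the attributes listed so far (permissions, timestamps, size) are the kind of thing a Unix-like system reports through `stat`. The following is a minimal Python sketch rather than anything from the course assignments: the file name `notes.txt` and its contents are made up for illustration, and the permission values assume a Unix-like system.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Hypothetical file, created only for this illustration.
    path = os.path.join(d, "notes.txt")
    with open(path, "wb") as f:
        f.write(b"twelve bytes")
    os.chmod(path, 0o600)          # private to the owner: read/write only

    info = os.stat(path)           # asks the file system, never reads the contents
    size = info.st_size            # size in bytes
    mode = stat.S_IMODE(info.st_mode)     # permission bits only
    perms = stat.filemode(info.st_mode)   # same bits, in ls-style text
    modified = info.st_mtime       # last content change, seconds since the epoch

    print(size, oct(mode), perms, modified)
```

Note that none of these values comes from the bytes inside the file: the file system keeps them alongside the contents.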
Right, yeah, I mean, the reason I'm hesitating is that there's not really a rigorous definition of what this means. On some computers, the file type is used to determine what program should try to open the file. Sometimes, as Rimo pointed out, there might be differences in encoding between different files, but that's not required. I can take a text file and rename it to be a CSV file. The encoding's the same, right? What's different, potentially, is how something tries to interpret it. So file types are an interesting case of this.
And then on a per-file basis, and I want to go down this path just briefly. Jeremy, yeah. Oh, you're answering my question already. Hold on, sorry, I thought you had a question. Let me state the question first, and then you can answer it. So for other, more specific types of file, you might want there to be more file-specific metadata. For example, an MP3 fundamentally contains audio data, audio data that's been compressed in a particular way. But there's other information that we might want to know about the MP3, like what, Jeremy? Things like this, right?
And there's an interesting question in file system design about where this information should be stored. Where is one place that we could store it? For MP3s, actually, where does it usually get stored? Yeah, so one way of doing this is to jam this stuff into the file itself. I could just stick it in there, and the nice thing about this is that as the file moves around, that data stays with it. So when you buy your MP3s from iTunes, they come with all this metadata set properly on them, so that when you load them into the music player of your choice, it can find this information and use it in a variety of ways. Another way to do this in some cases is to store it in another file, right?
And then you kind of have to keep those two files together. Let's stay with our music example. What's one piece of metadata about MP3s that frequently gets stored in a different file? Yeah, album artwork, right? Incredibly important stuff. Maybe it should go in the file, I don't know, but it probably doesn't go in the file partly because there's a bit of redundancy there: you have the same piece of artwork for multiple files.
And then there have been file systems in the past, and I think some modern file systems still support this, that allow you to associate attributes, almost a kind of database-like information, with the file records themselves. So I can set an attribute on a file, and that information is not stored in the file, it's stored by the file system, but it's queryable. And there are some file systems that have actually provided some pretty nice features along these lines. I don't want to go through this at length. I don't know why I did this last year. So anyway, there are pros and cons to each approach.
All right, so most file systems provide some interface for establishing a relationship between a process and a file. Think about open and close. We talked about this a little bit earlier. Why do open and close exist? What is the point? Clearly they expose some kind of information to the operating system. What do they really tell the operating system about a process's use of files? Yeah, well, no, it doesn't, actually. Open has no memory-related semantics. Correct, maybe, but again, what is open? First of all, is open required for correctness? Do I really need to call open on a file in order to use it? Or could I design a file system interface that did not use open and close? How many people think I could? Yeah, me too, right?
I mean, how would it work? What would you pass to read? What's that? Muhta. Well, okay, no, hold on. What do I pass to open right now? What does open receive? Wrong. Open gets a what? A path name, right? When I call open, I give it the name, the name that the file system and I have agreed on, that names those contents on disk. And then what does open return? A file descriptor. So if I don't use open, what do I have to pass every time to read and write? Sam? Well, but if I don't have open, where do I get the file descriptor? I have to pass what? The path, right? So I can read and write without open. I just pass the path every time. And maybe I pass the flags too. Okay, I might need to do something a little bit different with the flags. Jeremy, yeah. Yeah, I might have to pass the offset too. But that might be nicer, right? I kind of hate this implicit offset stuff. It makes programming with files very confusing. I have to remember where the offset was last and things like that. So yeah, I might have to pass the offset every time.
But what I'm trying to point out is that open and close are not really required for file use. So what do they provide to the system? They can, but they don't have to. Especially if I'm not creating a file, right? If I'm opening an existing file, the permissions may be set on the file already. But I already have a call to do that. It's called flush. Indirection? Well, I set up levels of indirection to do this, but that's not quite what I'm looking for. Yeah, it does do this. So on some level, what I think of open and close as doing is providing hints. They provide the operating system with a hint about who's using the file and for what, right?
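To make the "no open, pass the path and the offset every time" idea concrete, here is a minimal Python sketch. The function `read_at` is hypothetical and not part of any real file API. Because a Unix-like kernel only offers descriptor-based calls, the sketch has to open and close internally on every call, which is exactly the repeated work that a long-lived open lets the system avoid. It assumes a Unix-like system, since `os.pread` is not available everywhere.

```python
import os
import tempfile

def read_at(path, offset, nbytes):
    """Hypothetical open-free read: the caller names the file and the
    offset on every call, so no per-process file position is kept."""
    fd = os.open(path, os.O_RDONLY)   # name lookup happens on every call
    try:
        return os.pread(fd, nbytes, offset)   # explicit offset, no implicit state
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    # Made-up file for the demonstration.
    path = os.path.join(d, "demo.txt")
    with open(path, "wb") as f:
        f.write(b"hello, file systems")

    first = read_at(path, 0, 5)
    second = read_at(path, 7, 4)
    print(first, second)
```

Each call stands alone, which is simple to reason about, but the system never learns that the same caller is about to come back for more of the same file.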
Particularly when I close the file, that's a nice hint: I'm done using it for now. And as somebody pointed out, I might have buffers that I can flush at that point. The file system might be doing some caching on this file, which I can now stop, which is good. Some of these things I can start when a process calls open. A process calls open, and I could say, okay, I want to start caching the file contents, et cetera. So there are some nice hints with open and close. And they also, of course, give me some data about the fact that somebody's using the file, or in particular how many processes are using it, and things like this. But they're not necessarily required for correctness.
And there are some file systems that you guys probably use on a regular basis, whether you know it or not, that really don't even support the idea of open and close. How many people log into Timberlake to do something periodically? Yeah, for me, it's like once a year, if I can manage it that infrequently. But Timberlake and other systems support these network-mounted file systems that frequently don't support these calls at all.
So one of the reasons for open and close is to improve performance, and, as someone pointed out, I can also provide guarantees. For example, it's difficult to provide exclusive access to a file, which I can do using open, if I don't have an open and close call. I could try to do that with read and write, but it'd be a little weird. And as I said, some systems like network file systems don't bother to establish these relationships. In many ways, on NFS, or at least on earlier versions of NFS, this was done for failure reasons, right?
So for example, let's say I have a client that opens a file for exclusive access, meaning that while I have it open for that type of access, nobody else will be able to open it. The network file system updates its state and says this file is locked by this process, nobody else can use it. And then that process dies, or the person's laptop shuts down, or whatever. So open and close, at least with NFS in its earlier versions, produced a lot of headaches, and so these were usually essentially ignored. Was there a question in the back?
And again, we just talked about this. The Unix semantics for read and write store the file position, and this is also a convenience. What I'm trying to do is get you guys to think about some of these conventions. These are not necessarily ironclad things, but they've come about for a variety of reasons. So I've got this way of establishing relationships with the file, and then I have reading and writing. You guys, I think, are familiar with these at this point, hopefully, because you're working on assignment 2. And then I have some positioning as well. lseek is how I can move the file position without actually doing a read or write first. So read and write move the file position implicitly; lseek moves it explicitly.
All right, what's missing from the interface that we've described so far? I went through that quickly because I'm hoping you guys have seen this as you've been going to recitation and things like this. So, okay, of the parts of assignment 2, the file system interface, what have we not really covered? Rima, dup2, wow, that's good, right? I'm going to take a moment on that.
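First, a quick illustration of the positioning behavior just described: reads advance the file position implicitly, and `lseek` moves it explicitly. This is a minimal sketch using Python's thin wrappers over the Unix calls (`os.open`, `os.read`, `os.lseek`); the file name and its ten-byte contents are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Made-up ten-byte file so the positions are easy to follow.
    path = os.path.join(d, "demo.txt")
    with open(path, "wb") as f:
        f.write(b"0123456789")

    fd = os.open(path, os.O_RDONLY)
    try:
        a = os.read(fd, 4)                             # implicit move: position 0 -> 4
        pos_after_read = os.lseek(fd, 0, os.SEEK_CUR)  # query the position without moving it
        b = os.read(fd, 4)                             # continues where the last read stopped
        os.lseek(fd, 2, os.SEEK_SET)                   # explicit move back to byte 2
        c = os.read(fd, 3)
    finally:
        os.close(fd)

    print(a, pos_after_read, b, c)
```

The position belongs to the open file rather than to the file on disk, which is why the caller never passes an offset to `read`.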
And at some level, the reason we don't talk about dup2 much when we talk about file systems is that dup2 is really about maintaining the data that's used to map between a process and the operating system. dup2 only manipulates the process file table. It doesn't actually touch the file system at all.
All right, so let's start talking a little bit about how we organize files. It seems fairly basic, for example, that every file needs to have its own name. And on early file systems, this was accomplished in a fairly straightforward way. So let's see here, it's the first week of school and there's my file. And then next week, here's the letter to my wife, and then there's one to my dog, and then I'm going to write my wife another note, and then another one. You guys may not notice exactly what's weird about this, but some early file systems actually exposed flat namespaces. So no directories. Or, if you want to think about it differently, imagine if your file system had one massive directory, and all of your files were in there.
Maybe some of you guys actually do this. This is kind of how some of you guys organize Gmail: no filters, no folders, one huge inbox with 10,000 unread messages in there. And maybe, as we talked about, this doesn't matter anymore. Maybe if I have search, I don't even care. I never have to go into that folder, run ls, and wait for 10 minutes while all of my files scroll by. But if I write another letter to my mom, I need to find another unique file name. And actually, I'm going to date myself here: how many people remember the early days of email, like CompuServe? Did anybody have a CompuServe email address? I don't know, this is very funny, right?
So you guys are still living under the plague of this a little bit, because many of you have these completely nonsensical Buffalo email addresses, and it's because they're still limited to eight characters. That's a limitation that Unix introduced, I think probably in the 1970s at some point, and that you guys are still living with. I think it's pretty hilarious.
CompuServe had this brilliant idea of assigning numerical email addresses. Your email address was like a 10-digit number at compuserve.com, or whatever it was, compuserve.net. And why would they have done that? What were they thinking? People like names, right? Letters. How many people have an entirely numeric Gmail address? Nope. So what was their mental model here? What were they thinking that email addresses were going to be like? Tom? Yeah, I'm thinking of something else. How many people can remember their Social Security number? Yeah, okay, about half the class. But again, what were they thinking, Greg? Yeah, but why did they think that, given the choice, anybody would ever be okay with these numerical addresses? Those are other things, but what else is a 10-digit number that many people know? Phone numbers, right? So they were like, yeah, everyone already has a phone number, and people like numbers. So we're going to say, hey, here's your numerical address, it's like your phone number at whatever. And it was kind of dumb, right?
So this is a variant of the same problem. You can imagine, again, if I forced you to come up with a unique name for every file. Most of your systems probably have tens of thousands of files on them. If I forced you to come up with a unique name for every one, I think that would be pretty gross. And so what file systems started to adopt was this idea of folders or directories or filing cabinets, right?
The idea is, you know, find somewhere to put things and begin to allow users to store and examine related files together, right? This was the real idea behind this abstraction, right? And our goal here is that every file should now be stored in one place, right? Remember, before I had this goal of having unique names for my files, right? Which was important. Why do I need a unique name for the file? What gets kind of wiggy if I don't have a unique name? Sean? Yeah, I mean, how do I call open, right? I call open with a path to a file name that has 10 copies of it, right? Is it probabilistic? You know, do I get 10 file descriptors back and I have to figure out which one is the one I want? Yeah, so that gets weird. So, all right, we're trying to build this up from first principles, right? So now I have this idea of location. I can put things into folders, right? Now I need some idea of navigation as well, right? Because the files that are visible to me, if I think about browsing the file system or locating things, are now dependent on where I am. In the flat namespace, they weren't, right? I just saw everything all at once all the time, right? But now I have to have this idea of location, right? And I might have locations that point to other locations, and so I end up with this whole hierarchical tree, right? So, okay, this is kind of obvious. So most hierarchical file systems, there are probably some exceptions to this, but the ones you guys are used to, organize files essentially into an acyclic graph with a single root, which we call a tree, right? You guys can think about the directory tree for your system, right? Why this requirement? Who can think of some weird things that might happen if this isn't true? Dan? Yeah, so that's one good reason. Jeremy, what else? Yeah, but it also has to do with naming. Oh, I like this example, right?
So first of all, this is not an acyclic graph, right? And let's say these are my directories and this is my file, right? What is the name of this file? What is the unique name of this file? I mean, you're in this directory, you can get here and here, you're in this directory, you can get here and here. This would be kind of a fun thing to play around with, but what's the name of this file? Okay, so now I've picked a root, right? Now I have a root, which is gonna help me name the file. Before I didn't even have a root, right? So I wasn't even sure where to start. Now I have a root. What's the name of the file now? Is this an acyclic graph? No, there's a cycle right here, right? So what is the name of this file? How many names do people think there are? Do you think there are two names? I think there are several more than two, right? Yeah, so I have one name, "you used to love well", right? Another name, "you me love well". I have another name, "you used to love me, you used to love me, love well", right? So, I mean, again, this is something that you guys probably haven't really had to think about before, right? But this is why cycles are bad, right? So let's see here. Okay, so now I have a canonical name, right? And when you guys have used Unix systems, you're familiar with these canonical names, right? This is a name that's rooted at the root, right? So the canonical name for this file is "used to love well", right? That is the way that I locate the file: by starting at the root and traversing downwards until I find the file that I'm looking for, right? There are also a bunch of relative names, right? So if I'm in this directory, I can have relative names that aren't rooted at the root, right?
So this is a relative name, right? Where I got here and then I backtracked and then I went here, right? And if you guys play around on your virtual machines, you can, again, produce essentially almost, is it actually infinite? No, it has to be infinite, yeah. You can produce an infinite number of names that your system will accept for any file on the system, right? This is like the Buffalo, Buffalo, Buffalo thing, right? You can produce an infinite number of names there, I mean, usually using relative names, yeah. Is there a limit? I mean, yeah, at some point, Unix will probably barf and die, right? It'll just be like, this path name is too long, what are you doing, right? But it'll take it a while, right? You should figure out where that point is, right? But essentially, if I just oscillate between these two states using dot-dot and moving forward, then I can do it, right? Here's another relative name. This just starts in a different spot, right? So, "love me, love me", yeah. So this is kind of my example of how to make it. I like this, I'll just put this up online, you guys can read it. This is a very nice poem that I was using for my example, right? All right, any questions at this point? Remo? File-like objects? Yes. Yeah, okay, so this is a great question. So we talked about this a little bit earlier in the class, and I think this is something that I should point out again before we go too much farther. So what we're talking about now are files that are files, right? Files that map down to disk blocks, right? And some of the things we're talking about are taking a name and mapping it down to blocks on disk, right? That are on stable storage. So what Remo's pointed out, which is a great point, is that operating systems have abused the file system interface in a variety of ways to expose other types of things to processes that are actually not files, right?
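Going back to the dot-dot trick for a moment, here is a minimal sketch of how stepping into a directory and back out produces ever longer names for the same file. The path `/letters/mom/note.txt` and the helper `oscillating_name` are made up for illustration, and `posixpath.normpath` only collapses `..` lexically (it does not consult the disk or follow symbolic links), so treat this as a picture of the idea rather than of what the kernel does during lookup.

```python
import posixpath


def oscillating_name(canonical, bounces):
    """Build a longer name for the same file by entering the last
    directory and backing out of it again `bounces` times."""
    parent, leaf = posixpath.split(canonical)         # "/letters/mom", "note.txt"
    grandparent, last_dir = posixpath.split(parent)   # "/letters", "mom"
    detour = (last_dir + "/../") * bounces            # "mom/../" repeated
    return posixpath.join(grandparent, detour + last_dir, leaf)


if __name__ == "__main__":
    canonical = "/letters/mom/note.txt"   # hypothetical canonical name
    for bounces in range(4):
        name = oscillating_name(canonical, bounces)
        # Every generated name collapses back to the canonical one.
        print(name, "->", posixpath.normpath(name))
```

Each extra bounce gives a distinct string, so the only thing that stops the list of names is whatever maximum path length the system enforces.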
And sometimes we call these file-like objects, right? And I think we talked about these a little bit, maybe a month or two ago. So does anyone remember one file-like object? AJ? Yeah, network sockets, right? So a network socket, I can read, I can write, right? I can open it, I can close it, right? So it's got some things that are file-like. What would be a feature of files that I probably wouldn't be able to use on a network socket? What's one part of the file system interface that doesn't make a lot of sense there, yeah? Yeah, lseek. What does that mean, you know? I can't seek on a socket, right? I mean, maybe that means like, skip the next, yeah, but it doesn't really make any sense, right? What's something else that's a file-like object? You guys remember from our earlier days where we were exploring our Linux system using some standard utilities, learning about processes? Spencer? Yeah, so the proc file system, right? If you guys go in your virtual machine and run mount, you'll notice that there's a file system called proc, right? The proc file system is not actually a file system at all, right? It looks like a file system, right? It smells like a file system. It's not an actual file system. There are no actual files, right? The contents of /proc are not stored anywhere on disk, right? But it's kind of fun. You can go into /proc, you can browse around, right? But why do that, right? Why does /proc exist? What's the purpose of /proc, and how did it end up looking like a file system? What's in /proc? What do I find if I go start poking around in /proc? Spencer? Yeah, okay, so now I think we're talking about sysfs, which is similar but different, right? There's also something called sysfs, right? Which, as Spencer was pointing out, is frequently used to actually communicate information to the running kernel, right?
So if you want to change something about the system, you can write it to something that looks like a file, and that information ends up potentially being communicated to some part of the kernel, right? So that's kind of a nice way of exposing that. But let's go back to /proc, right? Because /proc is maybe a little bit simpler. So what's in /proc? Yeah, so /proc is essentially used to export information, and you're probably right that there are parts that you can write to, right? I know I'm right. He knows he's right, he's done it, okay? Well, if he knows he's right, then I know I'm wrong. So /proc is there to expose information about processes that are running on the system, right? So if you go in /proc, for example, the way it looks is that there's a directory for every process that's running, right? The name of the directory is the process ID, right? Again, there are no contents on disk anywhere, right? What's happening is that the operating system is generating those things on the fly, right? It's making it look like there's a file system there, but there actually is no file system, right? But what we're talking about here primarily, going back to Remo's point, is files that map down to disk blocks, right? But again, the file system interface is frequently used to expose other types of information. That's something that's useful to keep in mind. Does that answer your question? Sorry, if it's what? Oh yeah, yeah, yeah, absolutely. I mean, the system needs to know what kind of file it is, right? Because it needs to figure out what part of the system is responsible for reacting to the calls to open and close, right? So one of the things that happens when you guys run open and close, read or write, is that those calls start out associated with the file descriptor, but eventually they're going to be handled by some specific part of the system, right?
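To make the file-like object point concrete, here is a small sketch, assuming a POSIX system such as the Linux virtual machine. It runs the same descriptor-level calls against a regular temporary file and against one end of a socket pair. The helper names `supports_lseek` and `compare_file_and_socket` are invented for this example. Reads and writes work on the socket, but `lseek` is refused with `ESPIPE`.

```python
import errno
import os
import socket
import tempfile


def supports_lseek(fd):
    """Return True if lseek works on this descriptor, False if the
    kernel refuses it with ESPIPE (as POSIX specifies for sockets)."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError as err:
        if err.errno == errno.ESPIPE:
            return False
        raise


def compare_file_and_socket():
    """Try the same descriptor calls on a regular file and a socket."""
    left, right = socket.socketpair()
    with tempfile.TemporaryFile() as regular:
        try:
            # read and write behave the way they would on a file...
            os.write(left.fileno(), b"hi")
            echoed = os.read(right.fileno(), 2)
            return {
                "file_seekable": supports_lseek(regular.fileno()),
                # ...but seeking has no meaning on a socket.
                "socket_seekable": supports_lseek(left.fileno()),
                "socket_read": echoed,
            }
        finally:
            left.close()
            right.close()


if __name__ == "__main__":
    print(compare_file_and_socket())
```

The socket supports only part of the interface, which is why the system has to know what kind of object sits behind each file descriptor.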
So for example, if the system determines that the read operation was on a file that was on an NFS-mounted file system, right? It goes down into the NFS code, right? If it determines it's on a socket, it goes into the networking layer, right? So yeah, absolutely, these have to be routed properly. And if you guys have started looking at your kernel for assignment two, you've seen that there's this thing called the VFS layer, right? And the VFS layer is essentially there to allow multiple different clients to implement a file-system-like interface underneath, right? They don't have to support all the operations, right? But that interface is there essentially to take those calls and vector them in the right direction, right? That's a good question. Any other questions? I think we're almost out of time today. All right, so on Wednesday we're gonna keep talking about file systems, right? So we've talked a little bit about what we expect files to be able to do. And on Wednesday we'll talk about how file systems accomplish some of these things, right? And here are some of our design goals for you guys to think about for Wednesday, right? So first of all, I need to translate names, right? What I'm gonna expose to users and to the system and to processes are names, right? And from names I need to find contents, right? And usually finding file contents in a file system means locating the disk blocks that store that information, right? I want files to be able to do the things that users expect files to be able to do, right? So they should be able to change, right, clearly. But I also wanna be able to grow and shrink them, right? So when somebody goes back and continues writing their long letter to their mom, that file's gonna get bigger, right? And so as the file gets bigger, I need to find more space for it. And as it shrinks, I need to be able to free up that space.
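From a process's point of view, growing and shrinking looks roughly like the sketch below, which assumes a POSIX system and uses a throwaway temporary file. The function name `grow_then_shrink` and the letter text are made up for illustration. It shows only the sizes the process observes. How the file system finds or frees the blocks underneath is the part left for Wednesday.

```python
import os
import tempfile


def grow_then_shrink():
    """Write to a file twice, then truncate it, recording the size
    that the file system reports after each step."""
    sizes = []
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"Dear Mom,\n")       # the letter starts: 10 bytes
        sizes.append(os.fstat(fd).st_size)
        os.write(fd, b"More news.\n")      # the letter continues: 11 more bytes
        sizes.append(os.fstat(fd).st_size)
        os.ftruncate(fd, 4)                # the file shrinks to 4 bytes
        sizes.append(os.fstat(fd).st_size)
    finally:
        os.close(fd)
        os.remove(path)
    return sizes


if __name__ == "__main__":
    print(grow_then_shrink())
```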
Also, as I move files from place to place in the file system, I potentially need to make some adjustments to my file system as well, right? Hopefully not as bad as that terrible copy and paste we saw on Friday, but usually there is something that the file system needs to do, because the file system also needs to remember where things are, because it has to translate names, right? And then file systems also do some work optimizing access to single files, right? Which we'll talk about. And then there's also this related challenge of optimizing access to groups of files, right? So if I can start to understand relationships between files, it's possible that I can use that understanding, and my knowledge of the layout of the disk and properties of the disk geometry, to improve performance, right? And again, we care about performance here a lot, right? Because in the worst case, the disk is really, really slow, right? So the file system is kind of in charge of trying to do some clever things to make the disk seem faster, right? And finally, we wanna survive failures, right? And surviving failures requires two things. One, to the degree possible, we want the contents on disk to reflect changes that were made to the file system, right? If I pull out the plug, what I'd like is for the file system to look like it was supposed to look right before I pulled the plug out. The second part of this, though, is maintaining consistency. So what I don't want is, for example, to be caught off guard in the middle of trying to do something, power gets pulled, and when I boot up, the file system is in such an inconsistent state that I can't recover from it, right? Or recovering from it takes a huge amount of time, right? So these are two related things that actually, to some degree, are at odds with each other, right?
This is another design trade-off that we're gonna talk about, right? Okay, so we will start talking about these things on Wednesday. See you then.