...to continue to listen to Grace Hopper rather than me, but we're out of time. So how many people have seen that interview before? That's cute. That's a good one. All right. Today, we're talking about file systems, more about file systems. So we'll finish up talking about our basic expectations about a file. We'll talk a little bit about how files are named and some of the implications and sort of requirements of a hierarchical namespace. And then we'll get into actual on-disk data structures and talk a little bit about how file systems actually work. So once we've covered what files are and what you do with them and what you expect from them, then we'll have a starting point for a conversation about how file systems actually work. All right. Assignment 3 is due Friday at 5 PM. So finish it up. Based on the submissions that we have seen, it looks like this part is not that difficult. But you do need to do it. So keep going on that. If you've been working on assignment 2 as well, maybe it's time to sort of slow that down a little bit until you get at least the first part of assignment 3 done, since these points are there for the taking. OK. So let's finish up talking about the file interface. So last time, we left off at the point where we were discussing how UNIX system calls establish relationships between processes and files and how I might exploit these relationships. So we talked a little bit about the fact that by opening and closing a file, I'm not only enabling things like exclusive access to the file by a particular process, which I can obtain by passing flags to open. But I'm also giving signals to the file system about the use of files that might be helpful when trying to optimize performance and do other things. So this is one of the reasons why we use some of these system calls that don't necessarily seem required. Remember, we can come up with passable versions of read and write, which is really what I want. 
I want to be able to modify and retrieve the contents of files. Everything else is just sort of dressing. So I can get rid of open and close. I can get rid of lseek. And I can still have a passable file system interface. But those calls have a helpful purpose. Except for lseek, really. I really don't understand lseek. All right, so I'm sure you guys remember how this works in terms of how the file system interface works, establishing relationships, accessing files, moving the location around in the file. This is the relationship that processes have with files. And if you think about the file system interface, obviously we haven't talked about dup2. But dup2 doesn't really matter. dup2 is all about the relationship between the process and the operating system. And it really has nothing to do with the file system. dup2 is just sort of creating a new alias for the same file. So it's not that interesting. All right, so let's talk about how we organize files, how we name them. And we'll spend 10, 15 minutes talking about sort of hierarchical namespaces. So in the early days of file systems, remember, one of the requirements that we had for a file was the ability to retrieve the file's contents. And again, it's sort of interesting. I mean, you guys as computer users don't directly interact with pages of memory or with CPU threads. But you do directly interact with the file system. So this is another difference between the file system and some of the other parts of the operating system we've talked about. The file system actually has a human-facing interface. Open up a shell, and there you are, and you can poke around. You can see things, and there's a namespace. And so it actually matters that that's something that a human can use. Maybe this is, again, sort of a little antiquated now, because you guys can use search to find things. But to some degree, file names matter. 
So what's one way to do this? Well, there were early file systems that really just gave you a flat namespace. So you can imagine that this starts out OK. Clearly, I write letters to everybody as text files, right? Letter to my wife, letter to the dog, and then at some point, this starts to become kind of a pain. I have to establish some sort of really ugly naming convention. Maybe I start putting dates in things, because that's useful. But this single namespace fills up potentially pretty quickly. You can imagine it's even worse if, like some early file systems, you have these restrictions on how long the file name can be, right? I mean, most of you guys are still living underneath this terrible, terrible and probably the most bizarre restriction in all of computing history. How many people know what I'm talking about? How many people have been abused by the limitations of UNIX in some way, shape, or form? Oh, yeah, that's sad. That, I think, we sort of fixed. But I'm thinking about something else. All of you, yeah. That's OK, though. That's a lot, right? All of you have this problem. All of you have this problem. What's that? No, you all have the same name? That's interesting. It's going to make my grade book very complicated. I don't think so. No, I'm not even thinking about file names here. I'm thinking about some other limitation. A naming limitation that may drive you nuts every day. Some name that you have here at this particular university that has a bizarre and completely outdated restriction placed on it. Yeah, there you go. UBIT names, right? Eight characters long. If anyone out there is SSM5321 or whatever, then you know the problems associated with this. How many people have some sort of ugly number in their UBIT name? How many people have some weird part of their name that's been chopped off or otherwise mangled? Yeah, every international student raises their hand. 
And my favorites are the ones in the department. So like Demetrio at Buffalo. So close, right? Just one letter missing. Steve's is like steveco with a missing e in there, right? I just can't imagine how many emails get bounced because people actually wrote Steveco at Buffalo and then were like, that's nine characters. How many not send that email? Anyway, so imagine if that applied to your file system, right? Maybe you'd feel more sympathy. But anyway, these restrictions are stupid, right? And this gets awful, particularly when I don't have some sort of hierarchical namespace. But a hierarchical namespace, which you guys are used to now, was actually a new feature at some point in early file system design. All right, the other thing about this, again, remember, this is a human-facing interface. So there are human interface considerations here, allowing people to organize things and focus on something. So if you do an ls, let's say you had to put all of your files in one directory. And let's say that was even possible, OK? You found a way to name them all uniquely. And then what happens? You run ls, and you sit there watching your terminal scroll for like 10 minutes as it shows you all this stuff. So the other nice thing about hierarchical namespaces is that they give me a way to kind of group things together so that the contents of each directory are smaller and easier to handle. It's great when ls will actually fit on one screen. So now I can start organizing things. I can create namespaces that have some structure to them. I asked this before, does anyone still do this? You guys, like, I don't know, on your own machine, you just put stuff wherever. There's no organization at all. Someone has some organization. OK, we have some organizers here. I like that. What do the rest of you guys do? I'm just kind of curious. Like, wherever the program happened to put it, like, hmm. 
I mean, how many of you, if I told you you had to open a Word document from the terminal, could actually locate it? OK. So you guys must know where things are, kind of. How many people don't care and they just store everything in Google Drive? Yes. Thank you. The future is here. All right, anyway. But even Google Drive still has this feature, right? I mean, that's what's sort of interesting, right? Google Drive still maintains this hierarchical namespace despite the fact that it's completely fictional, right? It's just there from an organizational perspective. But you know, folders and directories. I mean, there's some sort of root here, I would argue, in how people actually want to organize things. File folders. I mean, these names have physical meanings for a reason, right? I don't know what a directory is. Oh, I guess a directory was like those things that you looked someone up in, like a telephone book. So quaint. All right, so this gives me a way to organize things a little bit better. That's nice. Now, in these hierarchical namespaces, one of the things that's important is that there is some canonical name for a file. Modern hierarchical namespaces relax this assumption, but there is still usually one canonical name. Even if you provide a bunch of symlinks and relative paths and things like that, usually a system will have a way to resolve a name to: this is the name. This is the best name for it. There may be other names. This is the most sort of fundamental name. So what are the implications of this approach? So first of all, it requires that I actually provide a way to navigate around. Notice that I've created something new here. So before, if I just had a flat namespace, there was no actual concept of a directory, right? I mean, everything was just in one place. Now I've created a new entity within the file system. 
That directory, that container for other files that you guys are so used to, itself has to be represented by the file system. And to some degree, it's entirely an abstraction. It's an illusion. It's just there to help you organize things and help make the names that you use to access content on the file system more meaningful. And the ability to resolve names is critical here, right? I mean, I have to be able to actually take the name and find the file. That was one of our goals. And we'll talk in a little bit more detail later today or next time about how we actually accomplish this. OK. So most file systems that you use have this requirement that the namespace is a tree. Why do you think that is? This is a directed acyclic graph with a single root. Why is this important? What does this allow me to do? Yeah. Yeah, so OK, so that's fair. I mean, I can always walk. So if I start anywhere on the file system, I can always walk back to the root. And on Unix, there's a very easy way to do this. And I can use that traversal, which is always in the upward direction, to create that canonical name for a file. So one way to create a canonical name for the file is to start from the file and walk upwards until I get to the root directory. And because I'm using a DAG, I know that there's always a backwards pointer from every directory, and I can keep going up. And I will hit a root. If I don't have one, I might not hit a root. So here's a graph that's not a tree. What's the name of this file? So these are directories. Here's my file. What's its name? Does it have a name? Who can give me a name for this file? Yeah. That's one name. Yeah, why not? That works. Or love me, you used to love well. Or used to love well. So that's one problem. So what am I missing here? What don't I have that the normal hierarchical file system has? I don't have a root, OK? So now let's pick a root. OK, so now we solve the problem. Now what's the name of the file? 
Well, it could be you, me, love well, or you used to love well, that's sad. It could be you used to love me, you used to love well, even sadder. If you go around that loop in that direction, it just gets worse and worse. So I still have a problem here, right? Because this is not acyclic, right? This is a graph with a root, but it is not an acyclic graph. There's this clear cycle here. It's the circle part. All right, so now here we go. So now what is the name? What's the name of the file? You used to love well. And the reason for this is how I form the canonical name. I start here, and I work my way up towards the root. I wouldn't go. So this is actually a subdirectory of love. It's down here. I could order this so that the top part was straight and that there was a little bifurcation at the bottom. But now I have a unique name for the file. So that's why I have a DAG. That's why it's blue, Sean. Come on. Doesn't blue scream root directory to you? Come on. No, I know. Yeah, I know. I'm teasing you. Yeah, the blue is the root, the blue thing, right? It's the color of a root, like a beet. Beets are red anyway. I mean, a very dark beet could technically be blue, OK? All right, so the nice thing about trees is that they produce this completely canonical name for every file. And again, the way I find that name is I start at the file and I work my way up towards the root directory. And the path that that defines is the canonical name of the file. We usually name the file in the other direction, starting from the root downwards, but it really doesn't matter which way you do it. OK, now notice that there are all sorts of relative names, because if you remember, on Unix-like systems there's dot dot. What is dot dot? It's the parent. So how would I produce the canonical name on a Unix system? Give me an algorithm that does that. I give you a file. How do you get to the root directory? Follow dot dot. I use dot dot. If I'm not at the root, I keep going. I use dot dot, dot dot, dot dot. 
At some point, good trivia question. Where does dot dot in the root directory point? Probably to itself, right? So once I get to the root directory, I know, because dot dot points to me, and I'm done. So that's the algorithm I can use to find the root directory. But it also means that, despite the fact that there is one canonical name for a file on a Unix-like system, there are also many relative names, because I can essentially create all sorts of fun names that just sit here and kind of go back and forth using dot and dot dot, and go up and down. So this was love me, let's see. Love me, I went dot dot. Love, wait, and I went dot dot. And then love me, dot dot. Well, so that still works. And there I also started somewhere else in the tree. So that's also a valid name for the file. But that depends on some notion of where I'm starting from. This is the poem, if you guys want to read it later. It's a cute poem. All right, any questions about file properties, file naming, before we go on? So again, this is something you guys, I think, are pretty familiar with, but hopefully you understand a little bit more of some of the implications. All right, so how do we actually do this? This is the fun part now that we've explained what a file is. So here are some design goals we have for our file system. We're going to start talking about how we actually design and implement a file system. Clearly, I need to be able to perform this translation, and I need to be able to do so relatively efficiently. This is something that file systems do all the time. It turns out that name translation is a very, very frequent operation, because anytime anything uses a file, which is frequently, I have to translate the file name to some sort of disk information, information about blocks on disk. The names are a fiction provided by the file system. At some point, when I modify things, when I read things, it all has to come down to disk blocks. 
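The dot-dot walk described above can be sketched in a few lines. This is only an illustration under stated assumptions: `Dir` is a hypothetical in-memory stand-in for a directory, with its `parent` field playing the role of dot dot, and the directory names are borrowed from the poem on the slide.

```python
class Dir:
    """Hypothetical in-memory directory; `parent` plays the role of '..'."""

    def __init__(self, name, parent=None):
        self.name = name
        # In the root directory, '..' points back at the root itself.
        self.parent = parent if parent is not None else self


def canonical_name(node):
    """Follow '..' until a directory is its own parent, then reverse the path."""
    parts = []
    while node.parent is not node:  # the root is the only self-parent
        parts.append(node.name)
        node = node.parent
    return "/" + "/".join(reversed(parts))


# Build the chain you -> used -> to -> love -> well under a single root.
root = Dir("")
node = root
for name in ["you", "used", "to", "love", "well"]:
    node = Dir(name, node)
well = node

print(canonical_name(well))
```

Because every node has exactly one parent, the walk produces exactly one path. In the graph with a cycle, a node could be reached from more than one parent, and this loop would have no single answer to return.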
So anytime I'm presented with a name that consists of a string, the file system has to be able to map that name to disk blocks or disk contents. I'm translating things from the file system namespace to the disk namespace, which is a set of numbers. I need to allow these file properties that we've been talking about. And if you go back and think a little bit about fragmentation, which is something that we met when we were talking about memory, you can see why this is tough. If I have put a file somewhere on disk, and suddenly you decide that it needs to double in size, I can potentially have a problem in terms of finding more space for it and making sure that I'm doing a good job of efficiently using the space on disk. That's another design goal that may or may not be listed. Then I have a couple of different things. Remember, the disk is slow. So the file system, to some degree, I'd say to a large degree, part of its job is to optimize access. And this is all about performance. Speed, making sure that when you do modifications and accesses to files, it's as fast as possible. And I want to do this on a couple of levels. And the first level is a single file. Just within a single file, what can I do to improve access to single files? Frequently, I have groups of files that are accessed together. What's an example? Give me an example of a group of related files. Yeah, contents of a directory, right? So maybe. Give me a stronger example. Give me a directory that probably contains a bunch of files that are definitely related, that are definitely accessed kind of usually all at the same time. Yeah. Ooh, oh, OK, yeah. So /proc, it turns out, is fake. /proc is not a real file system, right? There are no disk blocks harmed in either /proc or /sys. Actually, I think we talked about this at the beginning of the class, but this is a fun thing to point out. /proc and /sys are entirely fake. 
And just in case you needed a reminder that the names and the directory space of file systems are an illusion, /proc and /sys are entirely an illusion, right? There are no disk blocks involved in creating /proc and /sys. It's just a way for the operating system to expose information to user space. But you are right that they would be accessed at the same time. Give me another example. A directory that contains a bunch of files that are probably all accessed at once. Yeah. Oh, OK, fair enough. Yeah, like if I have a web page, there's probably a couple of different files that are involved in serving one web page, like the page itself, any CSS files it uses. So that's a great example. I was thinking of something weirder, like .vimrc, right? Vim's configuration directory has a bunch of plug-ins and scripts, and those are loaded every time Vim starts. So not only does it read its configuration file, which is one file, but there's that whole .vim directory in your home directory where you can put other stuff. And all of that stuff is probably always accessed and read at the same time. But maybe that's a weird example. And maybe you use Emacs, you don't care. All right, so there's related files. And then, like we said, this is all super important as well. Actually saving file contents reliably. When you save a file, you expect the contents to be there. If you had a probabilistic file system that from time to time just lost the contents of things, that would be interesting, but probably not something that you would enjoy. Has anyone ever had that experience before? Like, you're pretty sure that the file system didn't save the contents of the file? Really? Nice. Timberlake for the win. So maybe Timberlake is running a new probabilistic file system that randomly drops things. I think that would be very interesting, but not usually what you want. The fact that you guys haven't had this problem, it's kind of a good thing. 
It means that we're winning here, and this doesn't happen very often. So that's good, but it doesn't happen very often because people worked on the problem for 50 years. OK, so we're going to talk about a series of file systems in this class, going back to early file system designs that are fun to talk about because they're really tied to the underlying characteristics of hardware. We'll talk about some file systems that introduce some novel features, particularly for crash recovery. There's still active work in this area. Google had a paper maybe five, six years ago on the Google File System that they use internally to store data. I mean, Google does store data, right? And so it has some sort of file system, and it turns out it has a file system that is designed specifically for its needs, which is kind of cool. But here are the features that these support. So, duh, files, including some common mixture of things like file permissions, which are a very, very common attribute, and maybe other mixes of file attributes, but timestamps and permissions are very, very common at this point. Hierarchical namespaces with pretty unrestricted names. So this is what we're used to. So the differences here, remember from last time, really come down to how file systems use the underlying disk blocks to accomplish this. And there, it turns out, there are all sorts of interesting differences and interesting design choices. So this is what you see as a user. Under the hood, I could give you two file systems, and unless you had access to some sort of special diagnostic commands, you probably could not tell them apart. But if you looked on disk at how they were using the disk blocks to store content, to store names, to do all these things, the file systems would be very different. So that's what's kind of interesting about it. OK. 
So you may have noticed at this point that file systems essentially consist of an on-disk data structure, and that data structure is somewhere on disk. If you have a file and that file has some string in it, that string is somewhere on disk. It has to be. But all the other stuff about the file as well, the file name, the file permissions, all that information, that has to be on disk as well. And that's not strictly file contents. So broadly speaking, we can divide the disk blocks into two categories. There are data blocks, which actually store content. A data block has a one-to-one mapping with some place in an actual file that you could find and access. And then there are what are called index nodes, or what they're broadly called now, inodes. And inodes are everything else. So inodes contain something other than file data. And we'll talk about a variety of different things that inodes can contain. But broadly speaking, when you look at the disk, this is what you will see. Now from a file system design perspective, how do I wanna allocate things between data blocks and inodes? What do I want more of? Yeah, remember, it's sorta like scheduling, or sorta like VM, in that inodes are overhead. You don't care. All you care is that the file had a name. How the file system accomplishes that is not your problem. But if the file system consumes an enormous amount of the disk, let's say you got a disk and it was two gigabytes and you formatted it and then you had a gigabyte left. Wouldn't you be sad? You'd be like, what is this crappy file system? I'm gonna go get something else. Now, you have probably noticed that when you format a disk, you do lose some space. And the space you lose is because the file system has taken some of those disk blocks and turned them into internal blocks, inodes, that it uses to store its own data structures. 
But hopefully not half the disk, that would be bad. That would be a pretty bad file system. And so what makes file systems different is the on-disk layout in terms of where things get put and the type of data structures they use to store those things. And then this is a big one we're gonna talk about, which is crash recovery. How do file systems prepare for and survive failures? Both failures of individual parts of the disk, due to some of the problems we talked about with spinning disks, and also failures like power outages and sudden disconnections and things like that. This is what really distinguishes file systems from each other today. Now, to some degree, I just wanna point out there are newer file systems that have features that don't necessarily fit into these categories. So I don't wanna argue that there's no innovation going on in file system feature design. There certainly is. If you've used newer file systems like XFS or ZFS, I mean in some sense the way that these are marketed is that they can do these cool things. If I get a new disk, I can just stick it in the computer and I can push a button on ZFS and suddenly my whole file system just got bigger. Or I've got a bunch of different ways to stripe data across the disk to improve performance or reliability or whatever. But I'm not gonna talk about those as much. We'll talk about these features and how they're provided. And so the hard part about this is that I'm trying to maintain this data structure, but I'm doing it on top of a pretty unreliable medium. So let's take an example. Let's say I want to write some data to a file. Pretty common operation. Wanna write some new data to the end of the file. So what are some of the things that the file system needs to do to accomplish this? I wanna write, I don't know, a kilobyte to the end of a file. What do I need to do? What's that? 
Okay, I've gotta find the file, but think about modifications that are going to result. What is going to change? List all the different things that need to change on disk for this to happen. Yeah. Yeah, so the size of the file needs to be updated. That size is probably stored somewhere in some sort of file-specific data structure, so that needs to change. What else? What's that? Yeah, okay, the size and the timestamps, a bunch of metadata on the file itself, but what else? Yeah, I actually need to store the data somewhere. Remember, I need to find some disk blocks that are unused and store the data there. What else does that require? I mean, you guys are writing an allocator right now, right? So yeah, I probably need to update some internal data structures so that I know that the blocks that I just used to write this data are no longer free, because if they're marked as free, then that data is not gonna be there for very long, right? What else? So I need to know how much, and I also probably need to know, how do I know that these blocks are part of the file that's being written to? So there's another one. Yeah. Yeah, I need to somehow link these new blocks that I found with the existing file. So that's an issue. Let's see, I think I had five of them here, right? I need to find some empty disk blocks and I need to mark them as in use, okay? I need to associate those blocks with the file, adjust the size and the timestamps and other metadata on the file itself, and then actually write the data out. So you can kind of go through here. Now, this is something that in databases you would maybe call a transaction. All of these things really need to kind of either happen or not happen. Because, let's say I fail halfway through and I do some of them but not others. So you can go through this list and you can kind of think, huh. 
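The five steps just listed can be laid out as a toy sketch. Everything here is hypothetical: `ToyFS`, its free-block list, and its inode dictionary are illustrative stand-ins, not any real file system's structures, and the sketch assumes for simplicity that the file currently ends on a block boundary. Each numbered step touches a different structure, which on a real disk means a different location.

```python
class ToyFS:
    """Hypothetical in-memory model: free map, data blocks, per-file inodes."""

    def __init__(self, num_blocks, block_size=4096):
        self.block_size = block_size
        self.free = [True] * num_blocks    # free-block map
        self.blocks = [b""] * num_blocks   # data blocks
        self.inodes = {}                   # inode number -> file metadata

    def create(self, ino):
        self.inodes[ino] = {"size": 0, "blocks": [], "mtime": 0}

    def append(self, ino, data, now):
        inode = self.inodes[ino]
        bs = self.block_size
        needed = -(-len(data) // bs)       # ceiling division
        # 1. Find some empty disk blocks.
        found = [i for i, is_free in enumerate(self.free) if is_free][:needed]
        if len(found) < needed:
            raise OSError("no free blocks")
        # 2. Mark them as in use.
        for b in found:
            self.free[b] = False
        # 3. Associate the blocks with the file.
        inode["blocks"].extend(found)
        # 4. Adjust the size and timestamps.
        inode["size"] += len(data)
        inode["mtime"] = now
        # 5. Actually write the data out.
        for n, b in enumerate(found):
            self.blocks[b] = data[n * bs:(n + 1) * bs]


fs = ToyFS(num_blocks=8)
fs.create(ino=1)
fs.append(1, b"x" * 1024, now=100)   # write a kilobyte to the end of the file
```

Stopping after any one of the numbered steps leaves the three structures disagreeing with each other, which is the inconsistency the transaction framing is meant to rule out.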
So what happens, for example, if I do all the things but I don't mark the data blocks as being in use? Yeah. Someone else is going to overwrite that data right away. So that's bad. What happens if I mark them in use but I don't associate them with the file? Yeah. They're gone, right? I've just leaked a bunch of data blocks. Whoops, not so good. So that actually happens, that's funny. Has anyone ever run the file system cleaner and found things in lost+found? Does anyone know what that directory is? That directory is essentially for stuff that the file system discovered, like kind of hanging around. It's like, I have no idea where this is supposed to be. So I'll just stick it in this random directory and let you figure it out. So you can imagine what happens if I adjust the size of the file but I actually haven't completed the write. Well, now the size of the file is wrong, and there are things that depend on that information. So again, all this stuff has to happen all at once, but I've got to do a bunch of different things to the disk here. Every one of these operations probably involves a change to some different spot on the disk. The data blocks are in one spot, the data structure that stores whether the data blocks are free is in another spot. The structure that stores information about the file itself is in a third spot. So I've got a bunch of different things I need to touch here, and if I get things halfway done or a quarter of the way done or 90% of the way done, it's not good. Things are bad. And so this creates two problems. I mean, the one we've been talking about is consistency, but the other one is performance, because what does this require? So let's say I've got to touch, I don't know, three or four different spots on the disk. Go back to our fun video of the disk doing stuff. What is this going to look like? No, you're done for today. 
Yeah, yeah, I've got to like race over here to do this thing and then race back over there to do the other thing, and it's like slow, right? It looks fast to you, I know. I'm reading a book now and I'm discovering how slow humans actually are, right? Turns out if you like poke somebody in their foot, it takes like a long time for the signal to get to their brain, right? You guys have probably made fun of brontosauruses about that or whatever because it took them like five minutes, but it actually takes you a fairly long time too. Anyway, the computer can do a lot of work during that time, but the disk is way, way, way slower than other parts of the system. So other things are idle waiting for this to happen. Okay, so let's talk a little bit about the on-disk data structures, right? So we're gonna continue by talking about how we translate names to numbers. That was one of the things that was important. And how we find the data blocks associated with a given file. And these are sort of universal things that all the file systems that we'll talk about have to do. And then how we allocate and free these internal data structures, because this is also something that's important. I'm gonna use examples from ext4, but this type of stuff is pretty common and is done in some way, shape, or form by the file systems that we'll talk about and really all file systems. These are pretty universal challenges. Okay, so let's go back and introduce some terminology. So a sector is the smallest unit that the disk will let me write or read. So on older disks, this was like 256 bytes. I think on more modern disks, it might be up to like 512 or maybe even 1K. Disks have gotten bigger, file sizes have gotten bigger, the unit of granularity that I can write to on the disk has gotten bigger. So to write a byte, I have to read that whole thing in, modify the byte in memory, and write the whole thing out again. 
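That read-modify-write cycle for a single byte can be sketched as follows. `ToyDisk` is a hypothetical in-memory stand-in for a device that only transfers whole sectors, and the 512-byte `SECTOR_SIZE` is an assumed value chosen for the example.

```python
SECTOR_SIZE = 512  # assumed sector size for this sketch


class ToyDisk:
    """Hypothetical disk that can only read or write whole sectors."""

    def __init__(self, num_sectors):
        self.sectors = [bytes(SECTOR_SIZE) for _ in range(num_sectors)]

    def read_sector(self, n):
        return self.sectors[n]

    def write_sector(self, n, data):
        if len(data) != SECTOR_SIZE:
            raise ValueError("the disk only accepts whole sectors")
        self.sectors[n] = bytes(data)


def write_byte(disk, offset, value):
    """Change one byte by reading, patching, and rewriting its sector."""
    sector, within = divmod(offset, SECTOR_SIZE)
    buf = bytearray(disk.read_sector(sector))  # read the whole sector in
    buf[within] = value                        # modify the byte in memory
    disk.write_sector(sector, buf)             # write the whole sector out


disk = ToyDisk(num_sectors=4)
write_byte(disk, 513, 0xAB)  # byte 513 lives in sector 1, at offset 1
```

A one-byte change costs a full sector read plus a full sector write, which is part of why the file system prefers to work in larger units.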
Now the file system usually does not want to write only one sector. So the file system frequently will choose a block size like 4K, which might be a number that you recognize and decide that all of its operations to the disk are actually going to be done in this bigger granularity. Why does that help performance? Yeah. Yeah, think about it this way. I mean, navigating on the disk is what's slow. By the time I get to a particular spot on the disk, I've invested a lot of time and energy in getting there. And so I want to take advantage of it. If I can do a bunch of writes or a bunch of reads all from the same spot, that's great. That is the ideal scenario. The worst case scenario for the disk is that I'm running halfway across the disk every time and I'm picking up like one byte. Obviously, I can't do that, but one sector. The best case scenario is I navigate once and then I write like the whole track. That would be awesome. In fact, the whole cylinder group, every track on every platter. So the best way to amortize seek times is to actually do a fair amount of IO once I arrive at the spot on the disk that I was trying to get to. Now, the modern file systems have actually taken this to an even greater degree. So now we even talk about something called an extent. So some file systems now will actually even allocate larger chunks. And this is all for the same reason. I do not want to run halfway across the disk to pick up one tiny bit of a file and then have to run way across the disk and pick up another part of the file. There's a trade-off here in terms of, if you think about the chunks that I build files out of, the data chunks, I mean you can think of these as a bunch of data blocks or as a single data block. As these chunks get bigger, what gets better? Let's say I made an extent like a megabyte. So extents are what I used to hold the data for the file. So a megabyte extent sounds great for what? What is that, what type of file would that really help me out with? 
Yeah? Well, database. Database is like the catch-all solution for every question in this class. Database, well maybe, this is so complicated, right? Probably, yeah, databases have big, big storage, but what else? Give me another type of file where huge extents would really matter. One megabyte. Yeah? Videos. Videos, audio, frequently. Now there's two things that matter there, right? The first is that those files are big, and so if I have to create a one gigabyte file out of one megabyte chunks, the largest amount of space that I'm gonna waste is not very big compared with the size of the file. The other thing is access patterns. So databases may or may not work for this, but media files are frequently accessed sequentially, and so big extents are good because when you get to that next frame in the movie, I can pretty much predict that you're gonna watch the next three seconds of the movie and so I might as well get the whole thing all at once. I'll be wrong sometimes, but I'm right more often than I'm wrong. Huge extents, one megabyte, terrible for what kind of file? Why wouldn't I use an extent this big in real life? What type of file would this just really be bad for? Text, what's that? Text, well, okay, most text files. I don't know about your text files, maybe you're writing all of your OS/161 code in one file, which I don't recommend. Does make your conf.kern simpler. But so yeah, if you have text files or small files, configuration files, a lot of the things that are on disks are pretty small. Why is having a huge extent kind of not a great idea for a small file? I'm wasting like most of it, right? So if you have a 1K file with a one megabyte extent, I don't get that space back. So this is really another fragmentation issue. It's very similar to the trade-off we talked about when we talked about pages and other things, right? All right, so we talked about this, yeah.
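The waste in both cases can be worked out directly. This is a small sketch under the assumption that a file is built only from whole, fixed-size extents; `wasted_bytes` is a hypothetical helper, not anything a real file system exposes.

```python
def wasted_bytes(file_size, extent_size):
    """Space allocated but unused when a file is built from whole extents."""
    if file_size == 0:
        return 0
    extents = -(-file_size // extent_size)  # ceiling division
    return extents * extent_size - file_size


MB = 1024 * 1024
examples = (
    ("1K config file", 1024),
    ("1G video plus one byte", 1024 * MB + 1),  # worst case for a big file
)
for name, size in examples:
    waste = wasted_bytes(size, MB)
    share = waste / (waste + size)
    print(f"{name}: wastes {waste} bytes, {share:.1%} of what was allocated")
```

The small file throws away nearly its whole extent, while the big file wastes at most one extent minus a byte, which is a tiny fraction of its size.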
But from the perspective of disk scheduling, these bigger extents, reading more data from the same part of the disk, is good. And so modern file systems have made changes to the original file system design that I think are kind of driven by the fact that a lot of people are storing much larger files in the file system and accessing those larger files. And the other thing is disks have gotten so big that, if I have to trade off a little bit of waste in order to get better performance for things like video and audio, it's worth it because you guys have like terabytes of storage. How many people are out of storage on their machine? Like they don't have any more. Yeah, I told you. How many people are like at over 50%? Okay, just a couple. Anyway, you guys have enough space. All right, so I was talking about ext4. In ext4, every file on the system has an i-node. That i-node is where things like the timestamps and the other metadata for the file are stored. The i-nodes are 256 bytes. And i-nodes are stored in groups on the disk. I'll show you this in a second. The i-node contains the location of the file's data blocks, or at least some information that allows me to find them. Now, you might ask, in 256 bytes, how am I gonna store the location of all the data for a one terabyte file? We'll come back to that, right? But I need some information to get me started. The permissions are stored in the i-node, and all the timestamps associated with the file. i-nodes are named and located by number. So when you format your file system, ext4 creates a certain number of i-nodes. And from henceforth, I used that word in a sentence. I don't think I used it correctly. I don't say henceforth very often. From that point forward, when ext4 tries to translate a name to a disk block, the first thing it does is it translates the name to an i-node number. And then it uses the i-node number to find the i-node on disk.
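You can see the name to i-node number translation, and some of the metadata the i-node holds, from user space. This is a minimal sketch using Python's standard `os.stat` on a throwaway temporary file; it assumes a Unix-like system, and the exact values printed will differ from machine to machine.

```python
import os
import stat
import tempfile
import time

with tempfile.NamedTemporaryFile() as f:
    f.write(b"hello")
    f.flush()
    info = os.stat(f.name)  # look the name up and fetch the i-node metadata

    print("i-node number:", info.st_ino)
    print("permissions:  ", stat.filemode(info.st_mode))
    print("size in bytes:", info.st_size)
    print("modified:     ", time.ctime(info.st_mtime))
```

Note that the file name itself is not one of the fields returned: the name is only the way to reach the number.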
So there's a couple of implications of this strategy. What's one interesting implication? When I format the file system, ext4 creates a certain number of i-nodes. So you guys may have run across this problem before. You're running along, you're a happy ext4 user, and then one day, your file system says that it's full. So it's interesting. What do you do? You run df and it says that you've got 30% free space. File system says it's full. Why? See, somebody else has had this weird problem other than me. You ran out of i-nodes. So ext4 creates a fixed number of i-nodes when it formats your disk. If you create too many small files, it is possible that you can run out of i-nodes before you run out of space. And in that case, essentially, the file system is full, despite the fact that you've got a bunch of blocks sitting there that you could use. The reason for this, why do things this way? This seems like a stupid restriction. Why, at format time, would it be useful to create all the i-nodes that I'm ever going to need, or try to? This also seems wasteful. I'm pre-allocating all these i-nodes that I may never use, yeah. Well, so I will accept that answer and I will modify it slightly. It means that all the i-nodes are at well-known locations. From the minute the file system is created at format time, ext4 will always know where every i-node is. It doesn't have to do any sort of fancy allocation within some internal data structure. It's like an array. It just looks it up and it can find it, yeah. That is true. You can do that, but you know what people normally do? What they normally do is they don't realize that they're going to need more i-nodes until they run out. And then they're like, my life sucks. I have to. Oh, yeah, yeah. It's a sad day, though, when your system runs out of i-nodes. Makes you very angry. But yeah, with ext4 there's nothing else to do. The file system is full.
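Both limits, blocks and i-nodes, can be checked from a program. The sketch below uses Python's standard `os.statvfs`, which is available on Unix-like systems; `inode_usage` and `has_room_for_new_file` are hypothetical helper names, and the "room" check is a deliberate simplification of what a real file system does.

```python
import os


def inode_usage(path="/"):
    """Report block and i-node counts for the file system holding `path`."""
    vfs = os.statvfs(path)
    return {
        "blocks_total": vfs.f_blocks,
        "blocks_free": vfs.f_bfree,
        "inodes_total": vfs.f_files,
        "inodes_free": vfs.f_ffree,
    }


def has_room_for_new_file(inodes_free, blocks_free):
    """Simplified rule: a new file needs a free i-node AND free space."""
    return inodes_free > 0 and blocks_free > 0


usage = inode_usage()
print(usage)
# The scenario from the lecture: plenty of blocks, zero i-nodes.
print(has_room_for_new_file(inodes_free=0, blocks_free=1_000_000))
```

Running `df -i` shows the same i-node counts from the shell. File systems that do not pre-allocate i-nodes may report 0 for the totals, so treat the numbers as specific to the file system you are looking at.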
What you end up having to do is find some huge disk that has the same amount of space. Copy everything onto it. Reformat the file system, telling it you want more i-nodes this time. And of course, this time you're going to have like 1,000 times more i-nodes. So you have way more i-nodes than you'll ever need. You have more i-nodes than the number of atoms in the universe. But you will never have that problem again. So just to prove to you that this is true, let's poke around. This is a fun tool. I think if it's not installed in your VM, you can install it easily. This prints off some information about the file system itself. So oh, actually, sorry, this is about a particular i-node. OK, I'm sorry, I'm ahead of myself. So what this does is it gives me information about a particular i-node for a particular file system. So this is the device that the file system is using. And this command is stat 2. So 2 is the i-node number. And here's what it prints out. So it says i-node 2. Remember, every file on the system in ext4 is named with a number. It's a directory. It gives me information about the permissions, the size. So the smallest file size on this particular file system looks like 4K, because there's really nothing in this file. So this is essentially all the information that's in the i-node. These are all the timestamps. These are in hex. And then here's a translation of them to human time. And then here's blocks. So remember, I need to be able to find the data blocks for this file in the i-node itself. And here's a number that identifies the data block on disk. And it tells me that there's one block associated with this file. What is 2? Anyone know? 2 is kind of a special file for ext4. Any guesses? What special file would something like this have? What's that? Devices? Yeah. Root. Root. Yeah. So 2 turns out to be root. I don't know why. I think it was probably 0 at some point. And then maybe somebody broke something, and then it became 1.
Maybe this is like a very, very slowly moving version. We'll probably get to 3 at some point, because something will break and they'll need to move it. So 2. Yeah, 2 is the root i-node. As you would expect, there's not a lot here. This is a directory. So we'll come back to how ext4 represents directories in a minute. But directories are essentially just special types of files. All right. So how do we translate an i-node number? How do we find this i-node structure given an i-node number? I've proved to you we can do it. Well, remember, because all the i-nodes are created at format time, it means that the system knows where all of them are. What ext4 does is it stores some information in a header on the file system about where all the i-nodes are. If you guys are doing your core map data structure, this is not very dissimilar. When you format your disk, you're essentially telling the file system, all those data blocks are yours. Do with them what you want. And what file systems do, including ext4, frequently, is they use some of that space, just like you're doing, for metadata about how they're going to allocate the rest of the space. So there's probably a couple of blocks at the beginning of the file system that have some magic numbers and other information. And one of the things that they would contain is the location of the i-nodes on disk. So what ext4 does is it puts i-nodes in these chunks. So you can imagine that these are all the data blocks. ext4 puts all the i-nodes that are going to use these data blocks in one spot, sort of at the beginning. We'll come back a little bit later, and we'll talk about how we modify this general data structure to accommodate larger disks. So one of the consequences of the way that ext4 does i-node layout, and this is not uncommon, comes from the fact that the i-nodes are all in this one array.
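Because the i-nodes sit in fixed-size arrays created at format time, finding one from its number is just arithmetic. Here is a sketch of that lookup. The 256-byte i-node size comes from the lecture; the block size, the i-nodes per group, and the table start blocks are made-up values for illustration, where a real file system would read them from its on-disk header.

```python
INODE_SIZE = 256          # bytes per i-node, as stated in the lecture
BLOCK_SIZE = 4096         # assumed block size
INODES_PER_GROUP = 8192   # hypothetical value; fixed at format time
# Hypothetical header contents: group number -> first block of its i-node array.
INODE_TABLE_START = {0: 1000, 1: 33768}


def locate_inode(ino):
    """Return (group, disk block, byte offset within block) for an i-node number."""
    index = ino - 1  # i-node numbers start at 1, array slots start at 0
    group, slot = divmod(index, INODES_PER_GROUP)
    byte_offset = slot * INODE_SIZE
    block = INODE_TABLE_START[group] + byte_offset // BLOCK_SIZE
    return group, block, byte_offset % BLOCK_SIZE


print(locate_inode(2))     # the root directory's i-node
print(locate_inode(8193))  # first i-node of the second group
```

No search and no allocation structure is involved: one division, one table lookup, and the file system knows which block to read.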
And here are all the unallocated data blocks that the files that are in this group are going to use. The i-nodes may not be located close to the data blocks that they're associated with. Why is this a problem? Well, I guess it says up there. Now, one thing I do is that I do try to create chunks of i-nodes throughout the disk so that the distance isn't too big. But even if I have a chunk, I might have a data block over here that's associated with a file that has an i-node over here. Why is this a problem? Go back to my write example. In general, any time I read or write from a file, what are a couple of things that I'm probably going to have to change on the disk, or use, or access? Reading or writing from a file probably always involves accessing the file's what? Data blocks, OK. What else? The i-node. The i-node is the key to everything. It has all of the other information. So any time I access or modify a file, I probably have to touch the i-node and a data block. So if the distance between the i-nodes and the data blocks is large, I'm doing this again, running back and forth across the disk. We'll come back next time, or maybe on Friday, and talk about why that's not as big of a problem. And we'll pick up here on Wednesday. So good luck with assignment three.