Good morning, everybody. Welcome to Wednesday. Today we're going to continue where we left off. On Monday, we established some of our expectations about how files work and what sort of file operations our file systems are going to need to support. We also talked a little bit about hierarchical naming, a strategy for organizing the contents of a file system in a way that's useful and, hopefully, intuitive for users. Today we'll start talking about how file systems accomplish their design goals. We'll present the design goals again, and then we'll start looking in more detail at a specific file system and what happens on disk. On some level, you can think of a file system as one big data structure. If you've taken a course on data structures, think about what data structures do: a file system is a large data structure that's implemented by storing information on disk. So today we'll talk a little bit about how that works. So I'm still working on, well, actually, I'm lying. I haven't started working on this yet, which is a requirement for it to actually happen. But once I do, hopefully it'll happen quickly, and I'll let you know when it does. Oh, I'd also like to thank the anonymous student who reported that the assignment one grading was broken. It's nice when somebody tells you that something's broken in a way that's giving them more points than they think they deserve. Anyway, that bug has been fixed. All right, any questions about files before we keep going? On Monday, we talked about file expectations, file naming, and things like that. So what does a file have to do to be useful? What do we want out of this abstraction? What are some of the requirements? These are going to lead to some of the design goals we look at today.
Capino, what's one thing a file needs to do in order to be useful? Yeah, we need some concept of naming, so that I can find the file. There has to be a way for processes and users of the system, and the file system itself, to agree on how to describe the name of a file, because that's how I refer to it at the file system level. What else does a file need to do? Sarah? Yeah, store data. It would be kind of sad if we just had a bunch of names that we had agreed on, like, hey, I know about that name, and so do you. That's kind of fun, but what the names are supposed to do is refer to content. So files need to reliably store content, which is a challenge for file systems, and they need to be locatable. Those are the minimum requirements. We also talked about pieces of file metadata that file systems might want to store because users might want to know about them. So what's one example of file metadata that we might want to know about, Kevin? Yeah, maybe the size of the file. That would be useful to know. It's something I could actually calculate if I can find the contents of the file, but a lot of file systems store it so it can be accessed easily. Yeah, Josh? Who can do stuff to the file? Yeah, permissions. Who's allowed to do different things to it? Who can read it? Who can write it? Who can append to it? You can imagine lots of different types of file system permissions. What else? Spencer? Yeah, when was the file last used? A lot of file systems support several different timestamps associated with a file. I might care when the file was created, or when it was last written to or read from. What else, Jeremy? File types. Yeah, and this is where we started to notice that some of these are things I can define for any file.
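The metadata listed above (size, permissions, timestamps) is exactly the kind of information a POSIX `stat()` call hands back. As a small C sketch, assuming a POSIX system, the helper below (`get_metadata` is a hypothetical name, not a standard function) reads three of those fields for a path:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Fetch a few pieces of metadata the file system keeps for `path`:
 * size in bytes, permission bits, and last-modification time.
 * Returns 0 on success, or -1 if stat() fails (e.g. no such file). */
int get_metadata(const char *path, long long *size, unsigned *perms,
                 time_t *mtime)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    *size  = (long long)st.st_size;          /* length of the contents */
    *perms = (unsigned)(st.st_mode & 0777);  /* rwx bits for user/group/other */
    *mtime = st.st_mtime;                    /* seconds since the epoch */
    return 0;
}
```

Calling `get_metadata` on a file you just wrote `hello` into should report a size of 5. Note that `stat()` never reads the file's contents; the answer comes entirely from metadata the file system stores separately from the data.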
But then we started talking about potentially storing file types, and I might use that type information in a variety of ways. I can use it to associate files with given applications, which is how operating systems normally use it. When you open a file on Windows, or in any operating system really, the operating system has to decide which program should be used to open it. You'd probably be disappointed if you double-clicked on your MP3 and it opened in a text editor, or Paint or something. So, the type of file. Anything else? Nothing? What else might I want to know about a file? We're getting down to the dregs here. Anybody have any new feature requests for file systems? Yeah? Whether it's accessible by other files? What does that mean? OK, that feature request has been withdrawn. OK, that's fair. That might not be stored in the file itself, but when we're talking about file-like objects, we certainly need to know what they actually are. Here we're mainly talking about files themselves. Josh, what else? Yeah, OK, that's fair, we might want to know that. We'll talk more about location when we get to specific file systems, probably Friday or next week, particularly one of the canonical early file systems, the Berkeley Fast File System. What do you mean by location? Where it's stored? Remember, we're talking about files that are on stable storage, not file-like objects that could point to other things. So where is the data in a file? Eventually, that data is stored in a what? Tim? Yeah, some disk block. Disks have these 512-byte sectors, and any information in a file that's stored on disk is stored in some disk block. On spinning disks in particular, that disk block is located on some platter, on some track, in some sector. There is a little piece of magnetic substrate that is storing those bits.
And where those bits are actually matters quite a bit, especially on spinning disks. On flash drives locality matters as well, it turns out, just in a different way. That's a good list. We also talked about the fact that a lot of file systems have calls like open and close that are used to establish a relationship between a process and a file. What are some of the things I can do by forcing processes to use open and close and declare their intentions with respect to a particular file? Yeah, I get a hint that a process is done with the file, because it called close. It could turn around and call open again immediately, which would be kind of dumb. Why would you do that? Not that it doesn't happen, but I have some clue that the process is done using the file. What else do I potentially know? Yeah, I can provide exclusive access to the file, which in certain cases I want to be able to do. If all I have is reads and writes, that's a lot harder. I might be able to come up with something that provides similar semantics, but it would be kind of weird. It turns out I can also potentially improve performance. If I know which files are being used and in what way, I can do some caching. Certain file systems will also, in some cases, try to do predictive fetching of file data, because certain files are frequently read and written in certain ways. What's an example of this? What's a case where I could use read-ahead? What's a particular type of file that you think would usually be accessed in a very specific way? Paul, what's that? I don't think I understand that answer. Let's try something different. Heather? What did he say? He said address translation? OK. And how would that be accessed? Well, this is a good question: what else would be true about configuration files?
When I look at patterns of access, what's more common for configuration files? This is a good exercise. What's one thing I expect about configuration files? If I looked at all your configuration files and compared them with other files on the system, what would be true about them in general? Think about the difference between a configuration file and that high-def movie stored on your hard drive. They're small, right? In fact, you can expect configuration files to be quite small. They're text, and they're just storing a little bit of information. What else is true about configuration files, compared to, let's say, Word documents or the source code for your OS/161 tree? I'm thinking more about something the file system would be able to observe. Jeremy? Yeah, they're probably usually read in one piece. But what about patterns of reads and writes? What do I expect, Jen? Yeah, they're not written very often. Configuration files aren't changed that frequently, so it's potentially a very read-heavy workload. Let's think about a different type of file, in terms of an access pattern that the OS could use its knowledge to optimize. Give me another example. Well, that's interesting; I hadn't thought about that. It's possible. But I'm thinking about something quite specific, although fairly common. What's something a lot of you do on a daily basis with your computers? Not in this class. Yeah, and what about MP3 files? How are MP3 files accessed? What's the canonical way of accessing an MP3 file? iTunes loads up an MP3 file, and what does it do? Spencer? It reads it from start to finish, in order. Unless you're seeking around in the file, in which case it doesn't. But that's not the normal use for MP3 files.
The normal use for MP3 files, and many other kinds of media files, is that they're played from beginning to end at a fairly consistent rate. This is a case where the OS might be able to improve performance with caching. If it sees that this pattern has been established within a file, particularly if it knows the file is a media file, then when you issue a request for one part of the file, it might go out and start fetching the parts after it, because it expects those parts to be used soon. So that's an example. Moving up the stack a bit, we also talked about how some old network file systems don't support these open and close relationships at all, because open and close become problematic when you have unreliable clients. All right, any questions about this before we go on? Great, yeah. Well, this is a good question. Let's go back to the ELF example, because it'll give us a little memory management review. The ELF file has to describe all of the contents of the program's address space, except for regions that start out empty or zero-filled, like the stack and the heap. In particular, it describes all of the code that's part of the program. Of course, it doesn't include dynamically loaded libraries, et cetera. But the point is that all the code and libraries used by the program have to be on disk somewhere. There has to be a file, whether it's the program's executable or a shared library somewhere. So what does the operating system not do when the program is run? Robert, what's that? Well, it probably needs to open the ELF file and do a little bit of work.
But what does it not do in particular? Yeah, it probably won't write to it. But in terms of accessing the file itself... I'm ignoring that, Jeremy. Andrew? No. Again, here's my ELF file. It's got all this code in it that I might want to use. What does the operating system not do when the process is loaded? Remember, there was a specific trick we play here to avoid doing work we might not have to do. Alyssa, do you remember what it is? It doesn't load everything. In fact, it probably doesn't load much at all. It waits for the process to fault on pages that are missing from its address space, and when that happens, it goes out, gets them from the file, and brings them into memory. So, going back to your example: if the operating system actually loaded all of the contents of the ELF file into the address space, I would see it read the file potentially from start to finish. But because it doesn't, the pattern of access to ELF binaries is different. In fact, it probably ends up looking fairly random, because it depends on which pages in the address space the process needs. That's a good example. Any other questions on files? Yeah, it's called demand paging. We talked about this when we talked about virtual memory; it's a good reminder. Demand paging means that even if you tell me you might need a page, I won't get it for you until you actually demand it by touching it. For a code page, that means the code stays in the ELF binary until you need it, and then I'll go get it for you. For things like the stack and heap, it means I won't find you a page of memory for that big section of heap you just asked me to allocate until you fault on it. Then I'll go find you one. Yeah, sure.
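Here is a minimal C sketch of how a file system might detect the sequential MP3-style pattern and decide to read ahead, and why a demand-paged ELF binary usually won't trigger it. Everything here is hypothetical: the struct, the trigger of two consecutive sequential reads (`RA_TRIGGER`), and the four-block window (`RA_WINDOW`) are illustrative choices, not any real kernel's policy.

```c
/* Hypothetical per-open-file read-ahead state. */
struct readahead_state {
    long last_block;   /* last block the process asked for, -1 if none */
    int  seq_run;      /* consecutive sequential reads seen so far */
};

#define RA_TRIGGER 2   /* sequential reads needed before prefetching (assumed) */
#define RA_WINDOW  4   /* blocks to fetch ahead once triggered (assumed) */

void ra_init(struct readahead_state *ra)
{
    ra->last_block = -1;
    ra->seq_run = 0;
}

/* Record a read of `block`. Returns how many blocks after `block` the
 * file system should prefetch: RA_WINDOW if recent reads look
 * sequential, otherwise 0. */
int ra_on_read(struct readahead_state *ra, long block)
{
    if (ra->last_block >= 0 && block == ra->last_block + 1)
        ra->seq_run++;          /* continues the sequential pattern */
    else
        ra->seq_run = 0;        /* any jump breaks the pattern */
    ra->last_block = block;
    return ra->seq_run >= RA_TRIGGER ? RA_WINDOW : 0;
}
```

Reading blocks 0, 1, 2, 3 in order turns on prefetching from the third read onward, while a scattered sequence like 7, 42, 3, 19 (roughly what demand paging of a binary looks like) never does. Real read-ahead logic is more elaborate, but the idea of inferring intent from the access pattern is the same.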
Yeah, you can use lseek to move the file pointer around. The Unix file system interface allows you to, what's that? Yeah, yeah. lseek can adjust the file pointer in an arbitrary way. So let me ask you another question. When we start talking about file systems, we talk about performance. Why do we worry so much about performance in file systems, especially when it comes to things like read-ahead? Why would we try to predict anything about access to files? Why does it matter? Yeah, because disks are really slow. And the more information you can give the disk about upcoming requests, the better disk schedulers can do. For example, think about spinning disks. You've got heads on them, and as we saw, the heads are bouncing all over the disk. Moving the heads back and forth creates a great deal of latency. So the best thing I could do is know all the disk accesses for the next five minutes. I clearly don't, but if I did, I could tell the disk about them all at once. And if the disk knew about all the blocks it needed to get over the next five minutes, what would the heads do? Yeah, well, OK, but what would the disk do? I give the disk this massive list of blocks it needs to get, and the disk knows where every one of those is. So how does it optimize access? Remember that video we saw of the heads flopping around all over the place? If I told the disk about, say, a million different blocks I need retrieved, how do you think the disk would schedule those?
Yeah, the disk sorts that entire list from one edge of the disk to the other. Then what you would see is the disk heads making one pass across the surface of the disk, picking up everything they needed from each track along the way. They certainly wouldn't be bouncing back and forth, because bouncing the heads around like that wastes a lot of time and increases latency. So the more I can tell the disk, the more it can schedule its heads so that they pick up everything in one nice pass. We'll talk about this more when we talk about FFS, but it's important to keep in mind, because a lot of the file system optimizations we talk about are rooted in these properties of spinning disks. Yeah? No, no, the data can be all over the disk. The disk scheduling algorithm tries to move the heads as little as possible, so if I can do one pass with the heads, that's great. Imagine the heads slowly crawling across the disk in one direction, and along the way they're grabbing data, grabbing data, grabbing data, and throwing it back to the OS. And the platter's still spinning? Oh yeah, the platter's spinning away. So the idea is that my seek times are minimized, because I never bounce around. It turns out that Windows XP had a really clever optimization based on exploiting this technique and prefetching page loads. I won't talk about it here, because I think it was on the exam last year. Anyway, any other questions about files before we go on? All right, let's review our file system design goals. Given what we understand about files, the file system has to do a couple of things. The first thing we have to do is name translation.
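To make the one-pass idea concrete, here is a small C sketch, assuming track numbers stand in for head position and the head starts at track 0. It compares the total head travel for servicing requests in arrival order against sorting them and sweeping once across the disk (essentially the elevator idea). `head_travel` and `one_sweep` are hypothetical names.

```c
#include <stdlib.h>

/* Total distance the head travels, starting at `start`, when it visits
 * the requested tracks in the given order. */
long head_travel(long start, const long *tracks, size_t n)
{
    long total = 0, pos = start;
    for (size_t i = 0; i < n; i++) {
        total += labs(tracks[i] - pos);
        pos = tracks[i];
    }
    return total;
}

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* One-pass schedule: sort pending requests so the head sweeps from one
 * edge toward the other. Assumes the head starts at or below the lowest
 * requested track. */
void one_sweep(long *tracks, size_t n)
{
    qsort(tracks, n, sizeof *tracks, cmp_long);
}
```

For requests at tracks 90, 10, 80, 20, 70, arrival order costs 350 tracks of head movement, while a single sorted sweep costs 90. The sketch ignores rotational delay and requests that arrive mid-sweep, which a real scheduler has to handle.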
I'm going to give you a series of characters, and you and I are going to agree on the semantics of certain special characters, like the separators in paths. What the file system has to do is find the blocks on disk that store the data for that file. That's one of the jobs. I also want to allow files to grow and shrink. This is different from changing the contents within a file, because changing contents in place makes less of a difference on disk. But when a file grows, for example, I may need to find a bunch of new disk blocks to associate with that file, and I need to make sure I can still find those disk blocks when that name is translated. The same goes for files moving. By moving, I mean changing their location within the file system: I move a file from one directory to another. The contents on disk are the same, and hopefully the contents don't have to move. What does have to move is the file system's idea of where that name is. So that's an example of changing the name without changing the contents. We've talked a little about ways to optimize access to single files, and we'll talk more about this when we get to specific file systems. File systems also do some clever things to identify relationships between files and then arrange things on disk so that those files are efficient to access together. For example, if I can detect a group of libraries that's frequently used by one program, I may take that into account when I decide where to put those files on disk. And the other job, which potentially should be job number one, because if you can't do it none of this other stuff really matters, is surviving failures.
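Name translation can be sketched as walking a path one component at a time, starting from the root directory. The C sketch below assumes a hypothetical flat table of directory entries mapping (directory inode, name) to inode number; `namei` borrows the name of the traditional Unix routine, but the code is illustrative only.

```c
#include <string.h>

/* Hypothetical directory entry: name `name` inside directory `dir`
 * refers to inode `ino`. */
struct dirent_toy { int dir; const char *name; int ino; };

#define ROOT_INO 1

static const struct dirent_toy table[] = {
    { 1, "home", 2 }, { 2, "alice", 3 }, { 3, "notes.txt", 4 },
    { 1, "etc", 5 },  { 5, "hosts", 6 },
};

/* Look up one path component (not NUL-terminated, `len` bytes) in `dir`. */
static int lookup(int dir, const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].dir == dir && strlen(table[i].name) == len &&
            strncmp(table[i].name, name, len) == 0)
            return table[i].ino;
    return -1;
}

/* Translate an absolute path such as "/home/alice/notes.txt" into an
 * inode number. Returns -1 if any component is missing. */
int namei(const char *path)
{
    if (path[0] != '/')
        return -1;                   /* only absolute paths in this sketch */
    int ino = ROOT_INO;
    const char *p = path;
    while (*p) {
        while (*p == '/')
            p++;                     /* skip separators */
        if (*p == '\0')
            break;
        const char *end = strchr(p, '/');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        ino = lookup(ino, p, len);
        if (ino < 0)
            return -1;
        p += len;
    }
    return ino;
}
```

With this table, `namei("/home/alice/notes.txt")` returns 4. Moving `notes.txt` into `/etc` would only mean changing its table entry's `dir` from 3 to 5; the inode number, and therefore the data blocks, stay put, which is the "change the name without changing the contents" case above.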
That means both trying to preserve as much of the content as possible and making sure I keep my data structures in a consistent state. If I don't do this carefully, all the data may be there, but my data structures are corrupted. When I was in college, I splurged on this really big hard drive. It was 20 gigabytes. I thought, wow, this is huge. And I filled it up with MP3s; I had this great MP3 collection I was really proud of. Then at some point, I think when I was starting to fool around with Linux, I somehow corrupted the file system data structures on this drive, and I was really sad, because I had all these MP3s I really liked and they were gone. So I bought a program that would repair the file system structures, and it ran over the disk for a while. At the end it said, here are your files; out of your 1,000 MP3s, I couldn't find 10 of them. And I thought, that's not too bad. I won't miss those too much. Thank you for the 990 files. Then I started listening to the MP3s it had found. Where do you think those 10 MP3s ended up? In teeny-weeny little bits inside all of the other MP3s. You'd be listening to Madonna and then suddenly Britney Spears would pop up for a quarter second. So in the end I just had to throw out everything, because that's terrible. This is a case where you survive a failure, and yes, at some point the file system was repaired, but the repair was so bad that it didn't really matter. All right, the file systems we're going to discuss all support some superset of these features. They all support files, and they all support hierarchical namespaces, which is a very, very common feature.
So to some degree, the interface looks the same to you. If I took your computer up to my lab today, spent an hour moving all of your data over to a partition with a different file system on it, and stuck it back in, you wouldn't notice. They're still files, they can still grow, shrink, and change, and they're still organized in a hierarchical namespace, but suddenly I've replaced your NTFS with ext4 or whatever. What's different is how these things work and how they're implemented. The overall file system interface and the look and feel of file systems are pretty simple. So when we start thinking about how to implement hierarchical file systems, here's what we notice about the disk blocks, the data that's actually stored on disk. We can divide them into two categories: some of them store data, and some of them store other stuff. The data blocks contain file data. That's what you would expect; at some point, the file data has to be on disk. Then there are index nodes, which we'll frequently refer to as inodes. So what's in an inode? In other words, what other types of stuff would I need to store in disk blocks that aren't file contents? Yeah, Bart. Yeah, attributes related to files. What else? Yeah. What's that? Metadata. Yeah, other types of metadata, OK, that's fair, but I'm still missing something pretty important. Spencer? File names. OK, yeah, file names, but what else? Yeah. OK, we're sneaking closer to the answer I want. Remember, files have to do two things: store contents, and what's the other big goal? Yeah. What's that? OK, that answer's vague, but hopefully it's moving us in the right direction. Anything not data.
Anything not data. OK, that is strictly correct, but not as specific as I want. Yeah. Location. Location, what do we mean by that, Satish? What am I missing here, Paul? Well, that's also true. What about: how do I implement directories? Are directories files? It turns out that on Unix they kind of are, but we don't think about them as files. You don't think, oh, I'm going to take the contents of my OS/161 assignment and store it into this directory. You might put it inside the directory, but you wouldn't store it in the directory. So the index nodes have to contain everything else: everything people have mentioned, metadata about files of various types; the name of the file, because we can think of the name as separate from the contents (the name is how we find the contents, but it is not the contents); and all sorts of other information about the file system's data structures, like directories and parts of the disk used for failure recovery. So, anything that's not file contents. When we start looking at different file systems, here's what I want you to notice about how they differ, because all the file systems we're going to talk about accomplish these goals. One of the biggest differences is the data structures they use to solve these problems: how they do name translation, what they do when files need to grow, where they find those disk blocks, and where they put files on disk. A lot of this comes down to on-disk layout. Think about the experiment from before, where I moved all of your files onto a different file system.
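As a rough picture of what a non-data block holds, here is a hypothetical C inode layout with a helper that maps a byte offset in the file to the disk block holding it. The field set, the 4K block size, and the 12 direct pointers are assumptions for illustration, not ext4's actual format. One refinement worth knowing: in Unix-style file systems, including ext4, the file's name lives in a directory entry rather than in the inode itself, so it is left out here.

```c
#include <stdint.h>

#define BLOCK_SIZE 4096   /* assumed file system block size */
#define NDIRECT    12     /* assumed number of direct block pointers */

/* Hypothetical on-disk inode: everything about a file except its contents. */
struct toy_inode {
    uint16_t mode;              /* file type + permission bits */
    uint32_t uid;               /* owner */
    uint64_t size;              /* length of the file in bytes */
    int64_t  atime, mtime, ctime;  /* access, modify, change timestamps */
    uint32_t direct[NDIRECT];   /* disk blocks holding the data, 0 = none */
};

/* Which disk block holds byte `offset` of the file? Returns 0 if the
 * offset is past the end of the file or beyond the direct pointers. */
uint32_t block_for_offset(const struct toy_inode *ino, uint64_t offset)
{
    if (offset >= ino->size)
        return 0;
    uint64_t idx = offset / BLOCK_SIZE;   /* index into the file's blocks */
    if (idx >= NDIRECT)
        return 0;
    return ino->direct[idx];
}
```

Note that the file's blocks (500, 501, 873 in a test case, say) need not be contiguous; the inode is what stitches them back into one ordered file.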
The contents of the files are all the same. The disk could even be the same; I could use the same disk. What's different between the file systems is where the contents are on disk and what types of information are stored in which disk blocks. If you could look down at the disk level and see what's in the disk blocks, you'd see major changes, despite the fact that the file system looks identical to you at a high level. So: the data structures used, the on-disk layout, and crash recovery. What crash recovery semantics are supported, and how does the file system prepare for a crash and then recover from it? As we said before, what I'm really doing is maintaining a large and complex data structure. This is difficult because, with a data structure like this one, I have a number of different problems I'm trying to solve and a number of different things I'm trying to optimize for, and making changes requires updating a lot of different data structures. Let's talk about an example. Say I want to write data to the end of a file: I've opened a file and I want to append some data to it. What are the things the file system would need to do to accomplish this operation? Somebody I haven't picked on today. Tim. OK, right, I need to locate the file, so I need to locate the contents of the file. What else do I need to do, Jeremy? Yeah, I've got some extra data now, so I need to find some empty disk blocks. I skipped over the locate step, but that's a good one; it should be in here. So I certainly need to locate some empty disk blocks, and I also need to mark that they're in use, because I have some data I'm about to put in them. What else do I need to do? Nothing?
Yeah, I need to associate these disk blocks with the file, because they're now part of the file. I probably have some way of retrieving, for a file, the ordered list of disk blocks that store its contents, so whatever data structure I'm using to associate blocks with the file has to be updated. What else? Greg? OK, this is a good question: do I need to reorganize the blocks so they're contiguous? Is this required? Who thinks it's required? Who thinks it might be a good idea if I could? Yeah. When we talk about ext4, we'll see that ext4 plays some games to try to allocate disk blocks close to the rest of the file. But if it can't, then too bad: the new blocks might be way over on the other side of the disk, and you'd see the heads jumping around when you actually read the file back. But yes, I definitely want to store things close to each other on disk, because that minimizes my seek time. That's a great point. What else do I need to do here? Sarah? Yeah, yeah, eventually. No, it's OK, you're getting ahead of us. What else do I need to do? Yeah, remember I was storing the size metadata? Well, that needs to be updated, because the size of the file just changed. This is important for correctness, because the file system uses the end of the file to decide things like where a read has to stop. And how do I know where the end of the file is? I probably use the size. So this is important to do. And then, as Sarah said, at some point I actually have to send the request to the disk to write the contents. But think about what each of these steps actually requires. Does marking the blocks as in use probably involve a disk operation? Who thinks it does?
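The append steps above can be sketched against a hypothetical toy file system kept entirely in memory: a free-block map, a file record with an ordered list of block numbers, and an array standing in for the disk. In memory each step is trivial; the comments mark which on-disk structure a real file system would have to write for each one. The block size, disk size, and all names are assumptions.

```c
#include <stdint.h>
#include <string.h>

#define NBLOCKS 64    /* size of the toy disk, in blocks */
#define BSIZE   512   /* toy block size, in bytes */
#define NDIRECT 8     /* max blocks per toy file */

/* Hypothetical toy file system kept entirely in memory. */
static uint8_t disk[NBLOCKS][BSIZE];  /* the data blocks */
static uint8_t in_use[NBLOCKS];       /* free-block map: 1 = allocated */

struct toy_file {
    uint32_t size;                    /* file size in bytes */
    int      blocks[NDIRECT];         /* block numbers, in file order */
};

/* Find an empty block and mark it in use. Returns -1 if the disk is full. */
static int alloc_block(void)
{
    for (int i = 0; i < NBLOCKS; i++)
        if (!in_use[i]) {
            in_use[i] = 1;            /* real FS: write the free-block map */
            return i;
        }
    return -1;
}

/* Append `len` bytes to `f`. Returns 0 on success, -1 if out of space. */
int toy_append(struct toy_file *f, const void *data, uint32_t len)
{
    const uint8_t *src = data;
    while (len > 0) {
        uint32_t idx = f->size / BSIZE;   /* which block of the file */
        uint32_t off = f->size % BSIZE;   /* where inside that block */
        int b;
        if (off == 0) {                   /* last block full: need a new one */
            if (idx >= NDIRECT)
                return -1;
            b = alloc_block();            /* find an empty block, mark in use */
            if (b < 0)
                return -1;
            f->blocks[idx] = b;           /* associate it with the file (inode write) */
        } else {
            b = f->blocks[idx];           /* room left in the current last block */
        }
        uint32_t room = BSIZE - off;
        uint32_t n = len < room ? len : room;
        f->size += n;                     /* update the size (inode write) */
        memcpy(&disk[b][off], src, n);    /* write the contents (data block write) */
        src += n;
        len -= n;
    }
    return 0;
}
```

Appending 612 bytes to an empty file touches the free-block map twice, the file's block list twice, the size field, and two data blocks; every one of those would be a separate disk write in a real file system, which is what makes crashes in the middle interesting.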
Yeah. I mean, where do I store which blocks are in use? On disk. So here's at least one disk operation. What about associating the blocks with the file? Does that involve a disk operation? Probably, because where do I store the data structure that associates blocks with files? On disk. What about updating the size? There's a pattern emerging here. Are you guys good with patterns? Yeah: where do I store the size of the file? On disk. And then I have to write the data itself. So there are all these different disk I/Os that potentially have to happen for this operation to complete. Remember when we talked about synchronization? From the perspective of a process, and for correctness, all of these things need to look like they happen together, atomically, despite the fact that they involve a bunch of separate disk operations. You can also think about what happens at different points here if I fail, if the disk crashes, or if the power gets cut. For example, what happens if I complete step one and then a failure happens or my system turns off? Yeah. Next time I boot the system, my hard drive just looks a little bit smaller. And if I don't do something, those disk blocks will never be freed, because they're not actually associated with the file; they're just marked as in use. If you've ever run programs to check and correct errors on your disk, this is one of the things they may look for, depending on the file system and its format: are there any orphaned disk blocks, blocks that are marked as in use but aren't actually associated with any file? This can happen in certain cases.
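A checker's orphan pass can be sketched as comparing the free-block map against every block that some inode actually references. The C function below is a hypothetical illustration of that idea (real checkers such as fsck do far more, and the flat `refs` array is an assumed input format), and it reclaims the orphans it finds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fsck-style pass: find blocks the free-block map marks as
 * in use but that no file refers to (orphans), and mark them free again.
 * `refs` lists every block number referenced by any inode.
 * Returns the number of blocks reclaimed, or (size_t)-1 if memory for
 * the scratch map cannot be allocated. */
size_t reclaim_orphans(bool *in_use, size_t nblocks,
                       const int *refs, size_t nrefs)
{
    if (nblocks == 0)
        return 0;
    bool *referenced = calloc(nblocks, sizeof *referenced);
    if (referenced == NULL)
        return (size_t)-1;
    for (size_t i = 0; i < nrefs; i++)      /* mark blocks some file owns */
        if (refs[i] >= 0 && (size_t)refs[i] < nblocks)
            referenced[refs[i]] = true;

    size_t reclaimed = 0;
    for (size_t b = 0; b < nblocks; b++)
        if (in_use[b] && !referenced[b]) {  /* allocated but unowned */
            in_use[b] = false;
            reclaimed++;
        }
    free(referenced);
    return reclaimed;
}
```

If a crash happened right after step one, the blocks marked in use would show up here as allocated but unreferenced, and the pass would give back the space that otherwise makes the drive "look a little bit smaller".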
Again, I won't go through each case, because I think I'll do this again later, but you can imagine what happens in various cases if things fail. All right, so we talked about this. For the examples I'm going to show you over the next 10 minutes, I'll draw primarily from ext4, because I think it's nice to make this concrete. When we start talking about ext4, we have to talk about some specifics of disks. A sector is the smallest unit that the disk allows to be written. Traditionally this is 512 bytes, and many newer drives use 4K sectors. The block, on the other hand, is the smallest unit that the file system will actually write to the disk, and it's normally a multiple of the sector size. So here's another question: why not write in sector-sized chunks? Why would I write these larger 4K blocks? The disk allows me to write at a smaller granularity, so why write at a larger one? Yeah, but also because of contiguous writes. Remember, if I write to locations that are close together on the disk, I do one seek. So every time I write a block, I do one seek: I seek to where all the sectors associated with that block are located, and then I do a bunch of writes. Those sectors are probably all on the same track, meaning I don't even have to move the heads; and if I do have to move them, it's not far, because they'll be on neighboring tracks. And the other thing: where have we heard about 4K before? Anyone remember our friend 4K? What else is 4K, Sarah? It's the page size, the virtual memory page size. When we get back to talking about file system caching, we'll talk about why this matters.
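The sector/block relationship is simple arithmetic. This C sketch assumes 512-byte sectors, 4K blocks, and a file system that starts at sector 0 with blocks laid out back to back; the helper names are hypothetical.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u   /* bytes per sector (assumed; newer drives may use 4096) */
#define BLOCK_SIZE  4096u  /* bytes per file system block, same as a 4K page */

/* Number of sectors that make up one file system block. */
uint32_t sectors_per_block(void)
{
    return BLOCK_SIZE / SECTOR_SIZE;
}

/* First sector of file system block `block`, assuming the file system
 * starts at sector 0 and blocks are laid out back to back. */
uint64_t block_to_first_sector(uint64_t block)
{
    return block * sectors_per_block();
}

/* Which file system block a byte offset within the partition falls in. */
uint64_t byte_to_block(uint64_t byte)
{
    return byte / BLOCK_SIZE;
}
```

With these numbers, block 10 occupies sectors 80 through 87, so writing that one block is a single seek followed by eight adjacent sector writes.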
And it turns out there's a good reason for a match between the page size and the block size the file system is going to write, because, lo and behold, your system uses memory for something other than just being memory: it uses memory as a cache to make the disk look faster. And then on ext4, you also have the concept of an extent. Extents go back to what Greg was pointing out before: trying to find contiguous blocks. What the ext4 designers said is, okay, even if I take eight 512-byte sectors and turn them into a 4K block, that's still not big enough. So I'm going to create even bigger chunks of disk, called extents. An extent is described by where it starts and how many contiguous blocks it covers. The idea is that I associate extents with the file, and each extent holds a portion of the file. So you can think of an extent as a big chunk of a file that maps to a contiguous set of disk blocks. Jeremy, do you have a question? Yeah, kind of. You could just think of extents as a series of contiguous blocks. And again, why would I want to write data to the disk in even bigger chunks if I can get away with it? Jen? Well, yes, I might have a big file, but why do I want to write in bigger chunks? Yeah, for the same reason I wanted to write in blocks up here: contiguous writes are good for disk scheduling. And as Jen pointed out, a lot of files are bigger than 4K. Things like MP3s or movie files can be megabytes or gigabytes. So the more contiguous they are on disk, the better it is for disk performance.
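Here's a simplified model of extent lookup, assuming each extent records the first file (logical) block it covers, how many blocks it covers, and the disk (physical) block where it starts, which is the information an ext4 extent entry carries. `Extent` and `logical_to_physical` are hypothetical names, and real ext4 keeps extents in an on-disk tree rooted in the inode, which this sketch ignores.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    logical_start: int   # first file block this extent covers
    length: int          # number of contiguous blocks
    physical_start: int  # first disk block holding those file blocks


def logical_to_physical(extents: List[Extent], logical_block: int) -> Optional[int]:
    """Map a file block to a disk block using extents sorted by logical_start.

    Returns None for a hole (a file block that no extent covers).
    """
    starts = [e.logical_start for e in extents]
    i = bisect_right(starts, logical_block) - 1  # last extent starting at or before it
    if i < 0:
        return None
    e = extents[i]
    if logical_block < e.logical_start + e.length:
        return e.physical_start + (logical_block - e.logical_start)
    return None


# Hypothetical file: blocks 0-99 at disk block 5000, blocks 100-149 at disk block 9000.
file_extents = [Extent(0, 100, 5000), Extent(100, 50, 9000)]
print(logical_to_physical(file_extents, 42))   # 5042
print(logical_to_physical(file_extents, 120))  # 9020
print(logical_to_physical(file_extents, 200))  # None
```

Two entries describe 150 blocks here, where block-by-block bookkeeping would need 150 entries.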
And extents are essentially a way that file systems like ext4 have started to address that, by saying: I'm not even going to bother with block-by-block allocation; I'm going to give big chunks to a file. So what's the danger here? Essentially, what we're talking about are pieces of an allocation problem. I'm allocating disk sectors, I've organized them into blocks as my smallest allocation unit, and ext4 has said, I'm going to use even bigger allocation units and call them extents. What's the trade-off here? As the allocation unit gets bigger, what do I have more of? Yeah, internal fragmentation. Say I give a file a chunk of, I don't know, 32K or 64K or 128K. ext4 lets you choose a larger allocation unit when you format the disk, and if you have a lot of big files, you might want it to be very big. However, for small files, as soon as you give one a big chunk, if the file doesn't use most of it, that part of the disk is potentially wasted. This is our old friend internal fragmentation. All right, let's see here. ext4 inodes: I'm going to have at least one inode per file, and inodes in ext4 are actually allocated when you format the disk. When you format the disk using ext4, it creates areas on disk to hold inodes; it turns out there are several of these, but you can think of it as just one for now. The point is that formatting creates all the inodes the disk will ever have. So, depending on how you format your disk, you could run out of files on ext4 before you run out of space: if ext4 runs out of inodes, it will stop allowing you to create new files. So think about a file system with lots of tiny, tiny little files, right?
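The trade-off can be put in numbers with a minimal sketch that models a fixed-size allocation unit: each file is rounded up to whole units, and the leftover is internal fragmentation. The file sizes are made up for illustration.

```python
def wasted_bytes(file_size, unit):
    """Bytes allocated but unused when a file is rounded up to whole units."""
    units = -(-file_size // unit)  # ceiling division without floats
    return units * unit - file_size


# Hypothetical mix of file sizes in bytes: three small files and one ~1 MB file.
sizes = [1_000, 3_000, 10_000, 1_000_000]

for unit in (4 * 1024, 64 * 1024):
    total = sum(wasted_bytes(s, unit) for s in sizes)
    print(f"{unit // 1024}K units waste {total} bytes")
# 4K units waste 10000 bytes
# 64K units waste 231184 bytes
```

Going from 4K to 64K units multiplies the waste about 23 times for this mix, almost all of it from the three small files.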
You might tweak your parameters when you format the disk to tell ext4 to reserve more space for inodes. Each inode is 256 bytes, so I can pack 16 of them into a 4K disk block. The inode contains the location of the file's data blocks, and we can talk a little bit about how that location works. It includes permissions for the file, and it includes the timestamps we've talked about. Again, this is ext4, so it includes the creation time, access time, content modification time, attribute modification time, and deletion time. This is kind of interesting. And inodes are named and located by number; again, numbers are what we like. Okay, so let's start using one of these fun tools, which you can use on your own machines. It's called debugfs, and it will print out information about a particular inode or a particular file system. So let's see here. I've asked it to print out information about inode number two. Again, inodes are located by number. So what does that mean? The file system knows about inode numbers; you know about paths. So what does the file system have to do to find file contents? Yeah, it has to translate the path to an inode. We'll talk about how to do that, probably on Friday. All right, so this tells me that this inode's type is a directory. As I hinted at before, Unix file systems normally store directories as files, just a special kind of file with a very specific format for its contents. This is the mode; if you're familiar with Linux, these are the permissions. This tells me the user and group. Anybody know who user and group zero are? That'd be root. And the size of the file: again, this is a directory, and it turns out that its size is 4K, meaning that it takes up one block.
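To see where fields like the ones debugfs prints come from, here's a sketch that decodes the first 36 bytes of an on-disk ext4 inode with Python's `struct` module, following the field order of `struct ext4_inode` in the Linux kernel (all little-endian). The inode bytes are synthetic, built to resemble a root directory like this one rather than read from a real disk, and the creation time and block map, which sit further into the 256-byte inode, aren't decoded.

```python
import stat
import struct

# i_mode, i_uid, i_size_lo, i_atime, i_ctime, i_mtime, i_dtime,
# i_gid, i_links_count, i_blocks_lo, i_flags
INODE_HEAD = struct.Struct("<HHIIIIIHHII")


def decode_inode_head(raw: bytes) -> dict:
    (mode, uid, size_lo, atime, ctime, mtime, dtime,
     gid, links, blocks_lo, flags) = INODE_HEAD.unpack_from(raw, 0)
    return {
        "is_dir": stat.S_ISDIR(mode),
        "permissions": oct(stat.S_IMODE(mode)),
        "uid": uid,       # low 16 bits only; the high bits live elsewhere
        "gid": gid,       # likewise
        "size": size_lo,  # low 32 bits of the size in bytes
        "atime": atime, "ctime": ctime, "mtime": mtime, "dtime": dtime,
        "links": links,
    }


# Synthetic 256-byte inode: a 4K directory, mode 0755, uid/gid 0.
# Timestamps and link count are made up; i_blocks_lo counts 512-byte units.
fake = bytearray(256)
INODE_HEAD.pack_into(fake, 0, stat.S_IFDIR | 0o755, 0, 4096,
                     1_000, 1_000, 1_000, 0, 0, 3, 8, 0)
info = decode_inode_head(bytes(fake))
print(info["is_dir"], info["permissions"], info["uid"], info["size"])  # True 0o755 0 4096
```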
Let's see here, here are all my timestamps: the inode change time, the access time, the modification time, and the creation time. The naming is always confusing. From the creation time, you can tell when I started setting up the web server for last year's class: Sunday, January 8th, apparently at four in the morning, which I don't think is actually true. I think that might be GMT or something; it was probably more like nine a.m. Jeremy? Ooh, who thinks that because there's a fixed number of inodes, they'd be stored in an array? I do. Yeah, they are, and we'll talk about where they are in a sec. That makes indexing very efficient, so finding inodes is quite easy: once I have an inode number, finding the contents of that inode on disk is very simple. Mapping paths to inode numbers is still hard, though. Yeah? So why do you think I would store this delete time? Don't these contents get destroyed when the file is deleted? Well, first of all, is the inode itself ever destroyed? No, right? ext4 creates a fixed number of inodes when you format the disk, and they immediately take up space. If you've ever bought a brand new hard drive, like my huge 20-gigabyte one, and sat down and formatted it, what did you notice? Immediately the capacity goes down. That's because file system data structures take up space on disk. So as soon as you format an ext4 file system, depending on how many inodes you allocate, those inodes take up space on disk, and that space is always used by the file system. It will never be yours again; you can never use it for your burgeoning collection of MP3s.
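A quick back-of-the-envelope sketch of that lost capacity, assuming the ext4 defaults from this lecture (one 256-byte inode per 16K of disk) and ignoring all other metadata; `inode_table_bytes` is a hypothetical helper.

```python
def inode_table_bytes(disk_bytes, bytes_per_inode=16 * 1024, inode_size=256):
    """Inode count and bytes reserved for inode tables at format time.

    Assumes one inode per `bytes_per_inode` bytes of disk and ignores
    rounding and all other file system metadata.
    """
    num_inodes = disk_bytes // bytes_per_inode
    return num_inodes, num_inodes * inode_size


for label, size in (("20 GB", 20 * 10**9), ("2 TB", 2 * 10**12)):
    inodes, overhead = inode_table_bytes(size)
    print(f"{label}: {inodes:,} inodes, {overhead / 10**9:.2f} GB of inode tables")
# 20 GB: 1,220,703 inodes, 0.31 GB of inode tables
# 2 TB: 122,070,312 inodes, 31.25 GB of inode tables
```

Whatever the drive size, these defaults set aside 256/16384 = 1/64 of the disk, about 1.6%, for inodes.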
So if a file is deleted, I store the delete time. But how long would this delete time be useful? When a file is deleted, I can update the inode and record the delete time. When would the inode contents potentially be reinitialized? Does somebody want to answer the question? Okay, so I've deleted the file, and maybe I've deallocated the blocks that hold the file's contents. But again, has the inode gone away? No, the inode's just sitting there. When would I potentially reuse this inode? Yeah, at some point when I need a new inode, say because I'm creating a new file and this is the free inode that gets picked. At that point I'd reinitialize it, and I'd probably reset the delete time to zero. So I think these delete times probably persist until the inode is reused. But honestly, I don't know exactly why ext4 stores the delete time; it does, though. All right, also remember: I'm going to use a path to find the inode, and the inode has to allow me to find the rest of the file. So down here, what's also stored with the inode is the blocks that correspond to this file. Remember, this file is 4K, so how many blocks does it have? One. And this gives me the block index of the one data block in this file. This is a directory, so that one data block holds the contents I'm going to use to map path names to other inodes, and we'll talk more about that later. So it turns out that on ext4 file systems, inode number two is special. Inode number one is reserved for tracking bad blocks. Can anyone guess what's special about inode number two? Yeah, it's the root of the file system, right?
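The delete-time lifecycle described here can be sketched with a hypothetical fixed-size inode table: deleting only marks an inode free and stamps its delete time, and the stamp survives until allocation hands that inode out again. The lowest-numbered-free allocation policy is an assumption for illustration; ext4's real allocator is more involved. The reserved range mirrors ext4's default of starting ordinary inodes at 11.

```python
import errno
import time


class ToyInodeTable:
    """Hypothetical fixed-size inode table. Inodes are never removed, only
    marked free; a freed inode keeps its delete time until it is reused."""

    FIRST_USABLE = 11  # ext4 reserves inodes 1-10 by default (2 is the root)

    def __init__(self, count):
        # Inode numbers start at 1, so slot 0 is unused.
        self.in_use = [False] * (count + 1)
        self.dtime = [0] * (count + 1)

    def allocate(self):
        # Simplest possible policy (an assumption): lowest-numbered free inode.
        for ino in range(self.FIRST_USABLE, len(self.in_use)):
            if not self.in_use[ino]:
                self.in_use[ino] = True
                self.dtime[ino] = 0  # reinitialize: clear any old delete time
                return ino
        raise OSError(errno.ENOSPC, "no free inodes")

    def delete(self, ino, now=None):
        self.in_use[ino] = False
        self.dtime[ino] = int(time.time()) if now is None else now


table = ToyInodeTable(count=12)
a = table.allocate()               # 11
b = table.allocate()               # 12
table.delete(a, now=1_700_000_000)
print(table.dtime[a])              # 1700000000: still there after deletion
c = table.allocate()               # inode 11 is handed out again
print(c, table.dtime[c])           # 11 0
```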
So this is the inode for root. It's a directory; it doesn't have very many entries in it, so it's only one block. It's owned by root. It was created when I set up this file system. So yes, this is forward slash, the root of the file system. So, as we said before, all inodes are created at format time: when I format the disk, all these inodes are allocated by ext4, and there are two consequences. First of all, inodes may not be located near the contents of their files, which is not the greatest thing. Let's say this is my disk and I put all the inodes here. Then a lot of the time I'd be seeking back and forth between my inodes and my data blocks, and I could potentially have really long seek times. So what ext4 actually does (and again, these are configurable parameters when you format the disk) is divide the disk into block groups, each with its own table of inodes. So if this was my big, huge, two-terabyte hard drive, I wouldn't have all the inodes right here. I'd have a bunch here, a bunch here, a bunch here, and a bunch here. Then, when I start allocating files, I try to find an inode and then try to find data blocks that are close to that inode. Does that make sense? Yeah, Jeremy, sorry? No, no, I think there are some constants stored on the disk that let you do this. Essentially what I'm doing is taking an index and mapping it into this distributed array, and I can still do that if I know how many inodes are in each group and where each group is. That's still a pretty simple operation. So what Jeremy is saying is that it's a little trickier to map an inode number to the disk block that holds that inode, but it's still fairly easy.
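Jeremy's point can be sketched as a small function. ext4 calls these groups block groups, and the computation below ((inode number minus one) divided by inodes per group gives the group, the remainder gives the slot) is the standard one for ext2, ext3, and ext4. The inode-table start blocks are made up; on a real disk they come from the group descriptors.

```python
def locate_inode(ino, inodes_per_group, inode_size, block_size, table_starts):
    """Return (disk block, byte offset in that block) where inode `ino` lives.

    `table_starts[g]` is the first block of block group g's inode table.
    """
    index = ino - 1                    # inode numbers start at 1
    group = index // inodes_per_group  # which block group
    slot = index % inodes_per_group    # position within that group's table
    byte_offset = slot * inode_size
    block = table_starts[group] + byte_offset // block_size
    return block, byte_offset % block_size


# Hypothetical layout: 8192 inodes per group, 256-byte inodes, 4K blocks.
starts = [100, 40_000, 80_000]
print(locate_inode(2, 8192, 256, 4096, starts))       # (100, 256): the root directory
print(locate_inode(16_404, 8192, 256, 4096, starts))  # (80001, 768)
```

The only extra constants compared with a flat array are the inodes per group and where each group's table starts.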
There are a few more constants I need to know, right? So, as I said before, the second consequence is that you could run out of inodes before you run out of data blocks. It's possible that I might be unable to create files on an ext4 system despite the fact that there are data blocks available. By default, ext4 creates one inode per 16K of disk space. What does that mean? What assumption is it making here? If I create one inode per 16K of data blocks, what do I hope about the files on my system? Yeah, that the average file size is about 16K. Who thinks the average file on your system is about 16K? Well, the ext4 defaults do. So it's a really great question. We talked before about how some files are huge. Probably most files you deal with on a daily basis, whether they're videos, pictures, or any sort of media format, are going to be bigger than 16K. But remember what Bethany pointed out before about configuration files? They're tiny. So you've got all these teeny weeny little 1K, 2K, 4K files, and since a block is 4K, even the smallest of them still takes up a full block. And you have a huge number of them, and then you probably have a heavy tail of bigger files. You interact with those bigger files more frequently, but that doesn't mean there aren't a lot of small files as well. Yeah, Bethany, do you have a question? Yeah, that's true too, and that's one of the reasons ext4 allows you to change this. If you were setting up a video server or something, you would probably change this to be quite a bit bigger. You would tell ext4 to create one inode per 128K, or maybe even per meg. And actually, when you format ext4, you can tell it what kind of usage to expect, and it'll adjust a number of parameters based on that.
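Here's a back-of-the-envelope sketch of running out of inodes before blocks, assuming one inode per `bytes_per_inode` bytes (the knob mke2fs exposes as `-i bytes-per-inode`), 4K blocks, equal-sized files, and no other metadata. `files_that_fit` is a hypothetical helper.

```python
def files_that_fit(disk_bytes, bytes_per_inode, file_size, block_size=4096):
    """How many equal-sized files fit, and which resource runs out first.

    Simplifications: ignores directories and metadata, and charges every
    file at least one block (real zero-length files use no data blocks).
    """
    inodes = disk_bytes // bytes_per_inode
    blocks_per_file = max(1, -(-file_size // block_size))  # ceiling division
    by_blocks = (disk_bytes // block_size) // blocks_per_file
    if inodes < by_blocks:
        return inodes, "inodes"
    return by_blocks, "blocks"


GiB = 2**30
print(files_that_fit(GiB, 16 * 1024, 1024))    # (65536, 'inodes'): many 1K files
print(files_that_fit(GiB, 128 * 1024, 2**20))  # (1024, 'blocks'): 1 MiB files
```

In the first case the disk runs out of inodes with three quarters of its blocks still free; in the second, a larger bytes-per-inode setting still leaves inodes to spare because the files are big.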
What it means, however, is that you can have fewer files on your system. But there's some amount of space you can recover by not creating a bunch of extra inodes you're never going to use. Yeah, Jeremy? That's a good question, and I'm not sure why that is. All right, so on Friday, we'll keep talking about on-disk layout, and in particular about how we translate file names to inode numbers.