Yo, what's going on? How many people are done with the first part of assignment three? How many people are done with assignment three? Okay, well, then more time. Yeah, assignment three is due on Friday. The scores look pretty good. Looks like this part is not challenging people too deeply. So today we're going to keep talking about on-disk data structures, and we'll do a little design exercise where we talk about caching, which is one of the obvious ways that we use memory to make the file system seem faster. So remember, we just got done talking about swapping, where we use the file system to make memory seem larger, but we also use memory to make the file system seem faster. Sort of an obvious thing. Announcements for today. Do I have anything up there? Grading update. So Ali has promised me that on Friday, grades will be done for the midterm. However, the midterms will not be ready to return on Friday. So here's what I'm going to do. Friday is also the deadline for the midterm grade evaluations. I will get as much of that done as I can. I think it's unfortunately also at 5 p.m. So probably at some point on Friday, once I have the midterm grades incorporated and your scores for assignment one and assignment two, I will put those together and produce an estimate of a midterm grade. I don't even know what my options are. Whatever. So I will assign midterm grades on Friday that will include the midterm, and then starting next week, Monday, Tuesday, Wednesday during office hours, you can come in and pick up your exam. The only reason for the delay is we need to take them to the scanners to get them scanned in so that we can return them. Any questions on that? Is that okay? Everybody's excited to get their midterm back? Okay. So last time we were using some tools to poke around on a real file system. This is ext4. I think we left off right around here, where we had used this particular tool to start printing off some information about inodes.
So ext4 identifies every file with a number, referred to as its inode number. That inode number identifies an inode, which is sort of the top of the data structure that holds all the other information about the file, including the file contents. Some of the information is stored directly in the inode. That includes the stuff that you can see here, things like timestamps and permissions. Anyone remember what this particular inode is? Inode number two, kind of a special case. Yeah. Root. This is the root of the file system hierarchy. It's important to have a well-defined root. We'll see why we need this when we talk about path translation. We need to be able to have a starting point. We're going to get to that in about 10 minutes, where we talk about translating an actual path name to an inode number, which is something that the file system has to do every time you access a file by its name. It needs to find the inode number that corresponds to it so it can locate the rest of the information about the file. Anyone remember? In ext4, all files have a number. Given an inode number, how does ext4 find the inode that contains the information about the file? How does it find this data structure? Anybody remember how this is done? It's a very common operation, where I know I need to look up some information about inode X. How do I find inode X on the disk? Yeah. Yeah, so remember, all the inodes are put in these well-defined locations at format time. So one of the things that ext4 does when you format the disk is create all the inodes that it will ever use, that will ever be available. And it puts those in specific spots on the disk, so that when it needs to look up a particular inode number there's a very fast and simple translation that allows it to figure out what disk block it needs to load in order to find this information. So this is an example of this.
I have a couple of blocks up here that store my inodes, and then I have the data blocks I'm actually going to allocate. What does this remind you of? Something that you might be working on right now? Yeah. This is like the coremap, right? I have a bunch of space on disk. I take a little bit of it somewhere, and this is a normal place for it, and use it to create some data structures that I need in order to allocate the rest. And that's just overhead here. The actual stuff, where it's going to store all your precious contents, is over here. These are the data blocks that I'm actually going to use. But of course I need some information about how they're allocated and what files are associated with this stuff. Okay. And we said there were a couple of consequences of this. One was that we could run out of inodes. The other is that the inode may not be located close to the data blocks that are linked to it. And actually it's kind of interesting here. So go back: what data block is associated with this inode? Inode 2 is a directory, which is a special type of file. 4K in size. It has one block. What block is that? Yeah. 8. You started off good and then you stalled out there. Yeah. 8737. Yeah, right here. This is the block. That's the only block associated with this file. It's a directory. The contents aren't very large, and so it fits inside of 4K. And ext4 here is using a 4K block. Who knows where 8737 is? But you can imagine, if I'm numbering the blocks from 0 on upward, I have inode 2, which is probably way over here, and then that data block is somewhere out there. Given the number, you wouldn't assume that it's particularly close to the inodes themselves. The inodes probably take up blocks 0, 1, 2, 3, or 4. Okay. So let's talk about directories. Directories are kind of the foundation of how we do path mapping.
Remember this really nice idea that I'm going to be able to create this hierarchical name space. It allows me to organize things, allows me not to have to look at every file on the system at once, et cetera, et cetera. So how is this actually done? On ext4 and other file systems, a directory is a special type of file. Think of it as in many ways like other files. It has an inode. It takes up space on disk. The amount of space it takes up on disk can vary. But the format of the directory is fixed. And the directory contents are only manipulated by the file system itself. So it's just kind of a weird mix of something that is partially a file system data structure but is also visible to you as a user. You can manipulate it in sort of well-defined ways. So let's look at this. I don't know if you knew this, but you can actually get ls to print off the inode numbers of the files and directories that it lists. Has anyone done this before? Good, because there's no real reason to. I don't know why this exists. So, ls -i. I'm listing off the root directory. So here's root, and the inode number, as we promised, is 2. Now let me list the contents of the root directory. And here's everything that's in here. If you've set up a Linux system, this is pretty familiar to you. You've got bin and etc, home, sbin, all these familiar directories. But they've got numbers. Every file or directory has its identifier. Over here are the inode numbers that are assigned to them. There's something weird here. Anybody notice? Well, weird things, yeah. proc and sys have the same inode number. Oh my gosh. This is terrible. The file system is broken. What's going on here? This is not good. This would normally be a bug in your file system. Quick, call the ext4 people. Something is wrong with ext4. We found a bug. Or, anyone remember proc and sys? Yeah, they're fake. proc and sys are fake file systems. They do not exist on disk.
And so the inode number is just sort of made up. I don't know why there's an inode number there at all. So proc and sys, do you guys understand what that means? proc and sys do not live on the disk. If you unplug that disk, you can still get to proc and sys. Not a problem. The system may crash for other reasons, right? But proc and sys are pseudo file systems, in that they look like file systems, and you can manipulate them like file systems, kind of. But they don't actually live on disk. So has anyone ever messed around with sys before? Has anyone ever used sys for anything? Oh, interesting. Has anyone ever used proc, poked around in proc? Has anyone used proc indirectly? How many people have used proc indirectly? Everybody raise your hands, because I think you all have. How? What tool uses proc? top. And ps. Yes. Good answers, right? So all top does is, every second or so, it just rereads a bunch of stuff from proc and redraws the display. You can look at the code yourself. There's nothing fancy. So what proc is, is a clever way for the OS to share information about things that are going on on the system. I could do this in another way. I could have a special system call interface that was all devoted to being able to retrieve information about running processes on the system. That would be another way of exposing information. But what Linux does, which is pretty clever, is it reuses this file abstraction. It says, OK, I'm going to create a directory called proc. proc has one subdirectory for each what? Process, or task, actually. I think it's both processes and threads. And inside that subdirectory, you can find all sorts of really cool stuff. You can figure out what the command was that was used to start it, whatever. But this is all fake. When you're cd'ing around and manipulating that directory, all the contents are being provided in real time by the operating system. There are no disk blocks being used. Same thing with sys.
sys is more regularly used to control the system. So there are lots of places in sys where, for example, if you want to disable your Wi-Fi card, and this is a cool trick you can use to impress your friends, instead of hitting the icon, which a normal person would do to turn off your Wi-Fi, you can probably write zero to some value in sys somewhere, and it'll have the same effect. What happens when you write that value? That write ends up in the kernel, and some kernel code runs and says, oh, this person wants to disable Wi-Fi, and so it does. So this is another kind of interesting way that the OS can provide a flexible interface to a bunch of different capabilities that it has without having to use special system calls. These are fake files. OK. What else is interesting about this? There are a couple of other interesting inode numbers here. What is this file? Anybody know? vmlinuz. I think that's either the kernel or the directory the kernel lives in. The kernel image is actually loaded and uncompressed at boot. I think it's a compressed version of the kernel. So it has a small inode number. That's probably one of the first files that was created on the system. All right. And now let's do this inside a subdirectory. So home has two subdirectories, and again these have inode numbers associated with them. Any questions about this? For the file system, names are numbers. Pretty common for a computer, right? OK. So file system names are inode numbers, and directories are files, and the job of a directory then is to map names to inode numbers. That's what the directory does. So going back and looking at this, this is just a file, right? This is the directory we just looked at. Just one last example using debugfs. So this shows us some information about what's known as the file system superblock. This is even more metadata that's created during format.
And this is kind of interesting. It shows you where the file system is mounted, what directory is sort of the root directory of this file system. There's a magic number here that probably corresponds to ext4, I think. It shows you the state, whether the file system is dirty, whether it needs to be cleaned, which we'll talk about in a few lectures. Inode and block count. So here's how many inodes have been allocated on the system, and here's how many inodes are available. So at this point, we've allocated about half of the available inodes on the system. For blocks, you can see that we've allocated about 40% of the blocks on the system. So this is good. This is kind of what you want to see. The two are not too far off. If I was allocating inodes a lot faster than I was allocating space, I'd be worried I would run out of inodes. If I was allocating space a lot faster than I was allocating inodes, I'd be worried I'd created way too many inodes and they were taking up space on the disk that I could use for something else. And then here's a bunch of other stuff. Block size, and information about when the file system was created. Any questions about this? It's simple metadata. Also in the superblock are going to be the locations of the inodes on disk, because ext4 actually stores them in a couple of different places. I'll show you that in a second. Ah, here we go. More information. I'm going to show you it right now. Inode size. This is the size in bytes of the inode data structure, 256 bytes. The first inode is 11. I don't know why. That's very interesting. Now, down here, what this starts to do is tell you about the layout. So on a large disk, what would be the problem? Let's say I did the following, sort of like what you guys do for your coremap. I put all of the data structures at the beginning of the disk, and then I put all the data blocks on the rest of the disk.
This sounds like a great idea. What's the problem on a really large drive? Or even on a medium-sized drive? What problem does this create on a spinning drive? Yeah, Sean? Yeah, remember, every time I access a file, I usually need to touch the inode and a data block. And if I put all the inodes in one spot on disk, think about it physically: probably where those inodes are going to be is on one of the edges of the platter. And then the data blocks are all the rest of the way. So as the disk gets full and I allocate more and more data blocks, the data blocks get farther and farther and farther away from the inodes that I've created. What's a simple way to solve this problem? Do you want to answer that question? Do you want to say something else? Yeah, that's a great point too. So for mission-critical data structures, frequently things like inodes and superblocks, a lot of file systems actually make a couple of copies of them on the disk. Because remember, parts of the disk can go bad. So if I have a sector that fails suddenly, and that sector had my superblock on it, and that's the only copy of the superblock I have, the whole disk is completely toast, right? Not good. What's a simple solution to this inode location problem? Well, that's even more complicated than I was thinking, yeah. I could put the inodes in the middle. Okay, so now the seek time's gone down by a factor of two, so that's a little bit better. What else can I do? Well, I'll accept that answer. I can put groups of inodes throughout the disk. I can put a group here, and then have some space, and put a group here. So I put groups of inodes at different places, and this is what ext4 does. And so this is the beginning, and this goes on for a while. I think on this drive there were, I don't know, a couple dozen groups. Each group is like its own little disk. Every group has some inodes. It has a block bitmap.
What do you think a block bitmap is for? What do I do with my block bitmap? It's a bitmap, though. It's really small. What do I do with the block bitmap? There's an inode bitmap too. What are those for? Nope. Nothing to do with addressing. That's an interesting direction to go in, though. Yeah, this is my used-block bitmap. This is how I know whether stuff's allocated. That's the most efficient way to store an array of allocated or deallocated flags. Each one is just one bit, so I pack them into a bitmap to save as much space as possible. I've got a block bitmap here that tells me what blocks are free. I've got my inode bitmap here that tells me what inodes are free. The inode table starts at, these are probably disk sector numbers, so it starts at sector 545. And this tells me some information that I can pull from those bitmaps. I've got some free blocks. How many free inodes? Zero. So this group is actually full. This group has all of the files in it that I can create. There are still some data blocks left. And to be honest, I don't know if ext4 allows me to allocate data blocks from a different group. Maybe it does. But the point is, in this group, I've used up all my inodes, and I still have some space left. In this next group, you can see I've also used up all my inodes, and I have fewer free blocks. I'm going to engage in some wild speculation and just point out that when you install a system like Ubuntu, a lot of the files that get created at the beginning are probably small. They're like little configuration files and other things. So those probably cause the early files to take up less space on average, right? You can see that clearly the average file size in group zero is smaller than in group one, right? Because they both have the same number of files. In both cases, they've allocated all the files that they can hold. But in group zero, I have more free space still left over. Any questions about this before we go on? Okay.
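A minimal sketch of the two ideas above: turning an inode number into a location inside a group's inode table, and scanning a bitmap for a free entry. The 256-byte inode size and 4K block size come from the superblock output discussed in the lecture. The inodes-per-group value, the table start for group 48, the bit order within each byte, and treating 545 as a block number rather than a sector number are all assumptions made for illustration. The function names are hypothetical and are not ext4's actual code.

```python
BLOCK_SIZE = 4096         # bytes per block, as on the lecture's file system
INODE_SIZE = 256          # bytes per on-disk inode, as reported by the superblock
INODES_PER_GROUP = 8192   # assumed for illustration; the real value is in the superblock


def locate_inode(ino, inode_table_start):
    """Return (group, block, byte_offset) for inode number `ino`.

    `inode_table_start` maps a group number to the first block of that
    group's inode table. The values passed in below are hypothetical.
    """
    index = ino - 1                         # inode numbers start at 1
    group = index // INODES_PER_GROUP       # which group holds this inode
    slot = index % INODES_PER_GROUP         # position inside that group's table
    byte_in_table = slot * INODE_SIZE
    block = inode_table_start[group] + byte_in_table // BLOCK_SIZE
    return group, block, byte_in_table % BLOCK_SIZE


def first_free(bitmap):
    """Return the index of the first clear bit in `bitmap`, or None if full.

    One bit per block (or inode): 1 means allocated, 0 means free.
    Bits are assumed to be numbered from the low bit of each byte.
    """
    for byte_index, byte in enumerate(bitmap):
        if byte != 0xFF:                    # at least one free bit in this byte
            for bit in range(8):
                if not byte & (1 << bit):
                    return byte_index * 8 + bit
    return None


if __name__ == "__main__":
    tables = {0: 545, 48: 1_000_000}        # group 48's start is made up
    print(locate_inode(2, tables))          # the root inode
    print(locate_inode(393218, tables))     # the inode number used for etc
    print(first_free(bytes([0xFF, 0b00000111])))
```

Because the translation is plain arithmetic, no search is needed to find an inode once its number is known, which is the point of laying the tables out at format time.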
So now you guys have all the information that you need to understand how we do path name translation. Let's talk about how we do this. So when you open a file, you pass in a path. What does the file system need to do? There's a translation step that's involved here. You need to translate, in this case, /etc/default/keyboard into what? Into an inode what? Number. I need to find the inode, right? So I need an inode number from this thing. How do I do that? Okay. Now, in certain cases you might pass in a relative file name. What does the system do if I pass a relative file name? How does it get the absolute file name? Yeah. It uses the current working directory. That's why your process has a current working directory. That's what allows you to use relative names. The point is that by combining the current working directory and a relative path name, I can get an absolute path name that the system can then consume. Okay. So I have to translate this to an inode number. Here's how I do this. Essentially, I break this up by the path name delimiter, which in this case is a forward slash, and I handle each part incrementally. So the first thing I have to do is bootstrap the process. I need to find the root inode, and this is why the root inode number is hard-coded, because this allows me to bootstrap this process without having to know anything else. So I don't have to look up root. I know what root is. So I go and I open the file with inode number two. What kind of file is that? It had better be a directory. It is a directory, right? It's certainly possible in the process of translating a path, if you give the system a bad path, that it might get to the point where something that needs to be a directory is a file, at which point it's going to fail. But in this case, we know that the root directory is a directory. So we're good. So now I open the directory with inode number two.
Now, remember what I said before, which is that directories essentially are data structures that map path name components to inode numbers. If you guys are familiar with JSON or something, you can just pretend that the directory has a JSON string in it that maps path name components to inode numbers. In reality, it's probably a more efficient data structure than that, but that's basically what it is. The directory contents are this mapping. And what I'm going to do is look for an entry matching etc. And in this case, let's say, using our previous example, that etc has inode number 393218. So what's the next step? I've opened root. I've looked up etc in essentially the hash table that represents a directory. Okay, well, I don't open etc. What do I open? The file system doesn't understand those names, those fancy names of yours. It has no idea what etc is. And of course, there could be a thousand different etcs in different directories. What do I open? I open 393218. That uniquely identifies an inode. Now, what could happen? Again, you guys are all pre-programmed to know this stuff. etc is a directory. But what would happen if etc was not a directory? Let's say that I got to this point and etc wasn't a directory. First of all, how would I know? Well, remember, the distinction between files and directories is made at the file system level. So when the file system loads the inode, you remember, in the inode information it said directory. So the inode allows me to determine whether or not this is a file or a directory. If it didn't, I could make files and manipulate their contents to look like directories. The file system doesn't want me to do that. That's weird. Now, there are a couple of things that could go wrong here. What else could go wrong? I'm looking up 393218. What else might happen at this point that would cause this to fail? Yeah. The inode 393218 does not point to a valid file. It's an unallocated inode. It's empty space.
Now, assuming that the inode exists and that it's a directory, I do the same thing. And I just repeat this process until I locate the inode number that I'm looking for. And this is how path name translation works. I think that's it. And then eventually, assuming this all works, assuming that every step succeeds, I open the file. Now I've managed to translate a path to an inode number, and now I can do whatever else I want. So this is one of the translation steps that the file system needs to be able to perform. It has to be able to translate path names to inode numbers. And this is the way this is done. Any questions about this before we go on? Pretty simple. I wish I had an example just showing the internal contents of a directory, but again, you can imagine what's in here. So, because I know in your free time you just kind of cd around your system and type ls in random places, right? That's what I do. What's the file size of most directories on a Unix system? When you list directories, it'll list a file size. What is it? 4K. Why is that? It's the block size. That's the minimum directory size. Is that the size of all directories on your system? Let's say I wanted to create a directory that was larger than 4K. What do I do? No, remember, I can't manipulate those files directly. I have to do things to the file system that cause the directory to get bigger. What would I do? Yeah. I could add a bunch of files to it. If I was lazy and I wanted to add as few files as possible, what would I do? What's that? Nope. A bunch of files? No, I'm even lazier, right? How do I add as few files as possible? What's that? No, a directory is just another name. It doesn't matter. This is a great question. I should have put this on an exam. How do I add as few files as possible? What does the directory have to store? It has to store a mapping between what and what?
The name, the path name component, and the inode number. Which part of that can I control? Yeah. Make a bunch of files that have really, really, really long names. Right? Because, look, that name's got to be in there somewhere. I have no idea what the maximum file name size is on Ubuntu. You guys can try this right now if you want to. Just hold down a key for the rest of the lecture. You know, just type touch and then hold down the A key and drift off like you normally do. Right? And when you come back to, hit return and see if it works. I'm sure there's a maximum at some point, but whatever the maximum is, that's what you want to use, because that's going to make the directory as big as possible. But if you have a lot of files in certain directories, you'll notice that the directory size will start to increase. And that's simply because I need more space to map all of the file names to inode numbers. What else does this mean about file system performance? There are interesting performance implications, given all that you guys know about data structures and algorithms. Right? How can I make this process very slow? Yeah. Okay, so I can make the path longer just by adding folders, that's certainly true. But what else will make this slow? There's another step here that could potentially be slow. I have no idea what fancy algorithm is used here, but if the directory has a lot of entries in it, then the lookup takes a while. Has anyone ever run ls and it just sits there looking at you for a couple of minutes? And then it starts printing off gobs and gobs and gobs of output. Usually it's because you've been core dumping in there or something stupid. But whatever. In that case, the reason it's taking so long is because it actually has to read the whole contents, and then maybe ls is doing some sorting or something stupid. The more entries there are, the longer those lookups take.
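The path name translation walked through above can be sketched roughly as follows. This is a toy model and not how ext4 stores directories: the "disk" is a Python dictionary, each directory is a plain name-to-inode-number map, and every inode number other than 2 (root) and 393218 (the number used for etc in the lecture) is made up. The names `INODES` and `resolve` are hypothetical, and special entries such as `.` and `..` are not handled.

```python
ROOT_INO = 2  # the hard-coded root inode number

# Toy "disk": inode number -> inode. A directory's contents map
# path name components to inode numbers.
INODES = {
    2: {"type": "dir", "entries": {"etc": 393218, "home": 131073, "broken": 999999}},
    393218: {"type": "dir", "entries": {"default": 393300}},
    393300: {"type": "dir", "entries": {"keyboard": 393301}},
    393301: {"type": "file"},
    131073: {"type": "dir", "entries": {}},
    # 999999 is deliberately missing: a name that points at an unallocated inode.
}


def resolve(path, cwd="/"):
    """Translate a path name to an inode number, one component at a time."""
    if not path.startswith("/"):
        # Relative path: combine it with the current working directory.
        path = cwd.rstrip("/") + "/" + path
    ino = ROOT_INO  # bootstrap from the well-known root
    for name in path.split("/"):
        if not name:
            continue  # skip the empty pieces around slashes
        inode = INODES.get(ino)
        if inode is None:
            raise FileNotFoundError(f"inode {ino} is not allocated")
        if inode["type"] != "dir":
            raise NotADirectoryError(f"inode {ino} is not a directory")
        if name not in inode["entries"]:
            raise FileNotFoundError(f"no entry named {name!r}")
        ino = inode["entries"][name]  # follow the number, not the name
    if ino not in INODES:
        raise FileNotFoundError(f"inode {ino} is not allocated")
    return ino


if __name__ == "__main__":
    print(resolve("/etc/default/keyboard"))
    print(resolve("default/keyboard", cwd="/etc"))
```

Each loop iteration is one directory lookup, so both deeper paths and directories with many entries add work, which matches the performance discussion above. The dictionary lookup here is fast regardless of size; an on-disk directory that has to be scanned would not be.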
These are just some interesting consequences of this process. Okay, so now we've talked about how I map names to numbers. And that's sort of half of the story here. The next thing that we're going to talk about is something else I have to do. So once I get to the file, I've taken the path name that you gave me and I've found the inode number. I'm in good shape, right? Unfortunately, you probably also want me to do something a bit more interesting. I mean, file systems get no points for just finding inodes. You would not be very happy with a file system that just did that. So now I actually have to be able to retrieve and modify data blocks, right? The actual contents of the file. Now in this case, there's a different translation problem. So now I have an inode number. Say I'm calling read or write. What does this call have to do? It has to translate something into something. So the file handle at this point, if the file system is sane, has information about the inode number in it, and this has all been done, right? Because remember, open did that translation. This is another reason why it's helpful to have open. Because open means that I can do that path name to inode translation once. Then I can read and write and just reuse the results. So this is another reason that it's kind of helpful to have open and close: I don't have to do those translations every time. So I've got an inode number. I know the file I'm working with. What am I translating now? I want to write a byte of data or read a byte of data. Second translation step. What am I being provided? I've got the inode. That's done. A plus on the inode. What don't I have? The data block. I've got to find the data block. What am I being given? Or what do I know? I like that. The logical location. Yes. What is that usually called? The offset. I know the location in the file. Now, the file is an abstraction.
I mean, the file goes from zero to the length of the file. But it's probably not laid out that way on disk. I'm figuring out how to map the offset to a data block or data blocks. Because if I'm writing a large chunk of data, I actually need to find all the data blocks from the beginning of the write to the end. Same thing with a read. Okay, so now I'm translating this offset to a data block. I need to figure out what data block or blocks to modify. This is a little trickier than what I did with the path names, because the file size can vary. I want to be able to support very small files and I want to be able to support very large files. There are multiple ways of doing this. Let's talk about a couple. There are similar trade-offs here, in certain ways, to what we talked about with page tables. I can store the blocks in a linked-list type of data structure. You can imagine, if I did this, the inode has to be the head of the list. It's got other information. It doesn't store any contents, but it has to point to the first data block. Then every data block contains at least a pointer to the next data block. Maybe, if I want to get fancy and go backwards more easily without having to start over, it also contains a pointer to the previous data block. What's nice about this? Is that nice? This is kind of simple. The amount of information, and sometimes me and the slides are on the same wavelength, the amount of information in the inode is very small, which is good. That's something I always want. Remember, inodes are fixed size, so I just can't put very much stuff in there. What else? What's a problem with this? There's one kind of disgusting programming grossness problem here. Can anyone identify what it is? How is this going to make you sad? That's an algorithmic bit of grossness. I have O(n) lookup time to get to a random location. That's sad. There's another bit of sadness here that, trust me, you would hate to have to deal with. Okay.
You're on to something with that answer. It's not quite correct. Who maintains the pointers between the blocks? The file system has to maintain those. That's file system state. What does that mean about the data blocks themselves? How large are they? They were 4K, and then I had to do what? No, I have to link them together. In order to link them together, what do I have to do? What has to be in the block? No. The pointers. How big are the blocks now? How much usable space is there in each block for the person to read and write from? 4K minus 8 bytes. You're going to just hate that as a programmer, trust me. You're like, how many blocks is the file? Okay. That's the size divided by 4,088. It's not a good computer science number. You just don't want that number. And the offset lookups are slow. That's the other thing. I think I would be more bothered by the other thing. Okay. Here's something else I can do. I can store all the data blocks in one sort of flat array and just index the array. When I create the file, I create this array of data blocks, and then I use the offset as an index. What's nice about this, compared with the other one? Yeah? O(1) lookups. We like that. What's terrible about this? This will also make you super sad. Yeah? I need a contiguous run of blocks, which is sad. What else? I have a very small and fixed-size file. Any time I create a file, I've got to set aside all the space right away, which is also sad. And then that's the maximum size a file can get. And it's not going to be very big. Say goodbye to all your MP3s and your pirated movies and whatever else you have that's big. And a large portion of this is probably empty for most files. Here's what we do. Here's the actual way we solve this problem, or the way a lot of file systems solve this problem. We use something like a tree, but it's a little more clever. And the observation here is that... And this is an interesting trend. So, for a while... I actually...
I would love to see the distribution. If anyone wants extra credit, a few points of extra credit: compute the distribution of file sizes on your machine and send me a graph or post it on Piazza. Because I would be fascinated to find out. But anecdotally, a lot of files are small. Those small files on Unix-like systems, where you don't have a registry, are things like configuration files. Small bits of information here and there. They're scattered all over the system. But we also want files to be able to grow to basically almost infinite size. And some files do get big. Now, I suspect that over time the tail of the distribution of file sizes is getting bigger and bigger and bigger, because you guys have more movies and more media and more photos and more stuff like that that is bigger. So I think it's probably shifting to the right a little bit. But those little small files are still there. So here's what we do. The inode stores pointers, basically. It has some pointers to blocks that are part of the file. We refer to these as direct blocks. The inode can also store pointers to blocks that contain pointers to blocks. These are referred to as indirect blocks. What else do you think we can store in the inode? Just extending this idea. Adding an additional layer of indirection. Of course, we can store pointers to blocks that have pointers to blocks that have pointers to blocks. These are called doubly indirect blocks, and you can keep going. You can have triply indirect, you can have quadruply indirect, whatever. So how does this look? It kind of looks like this. I should really have a better diagram for this. So the inode in this case has two direct pointers. This is the beginning of the file. This is the first data block. This is the second data block. Now, once the file gets larger than this, this is my indirect block. The indirect block itself has two pointers to data blocks.
If the file got even bigger, I could allocate a pointer to a block that itself had pointers to other blocks of pointers to data blocks, and you can just imagine this keeps going and going. On modern file systems you can use this technique to address essentially, I don't know, I can't remember for ZFS, I think the maximum file size is in the petabytes or something. So what is in a petabyte file? I don't think I would try to open a petabyte file. I think I would be frightened. Vim would be very slow, right? Trying to load the whole thing. Sitting there like, oh, I just want to change one byte. Because clearly what you want to do in a petabyte file is change one byte. That's a common operation.
So what's nice about this, other than the fact that by manipulating the number of blocks I can store pretty much as much data as I want? From a data structures perspective, what's nice about it? These doubly indirect blocks, what do they represent? Do they store file data? No. They're overhead. To link all these blocks together and to allow the file to be either small or big, there's going to be some overhead. That's inescapable. I'm going to have blocks that are gone, that can't be used to store data. But from a data structures perspective, what do I want? What's the goal? Yeah, I want a quick lookup. That's one thing. Fighting against that: the fastest lookup was the array, which didn't allow the file to be resized. The other problem with the array was that the amount of overhead was fixed. What I want is for the overhead, the data structure size, to scale with the size of the file. In this case, if I have a file that only has one data block, I don't have any overhead. The inode just has a pointer directly to that block. Remember when we looked at the root inode? That's all it had in it. There's no extra overhead. It's 256 bytes.
It had a pointer to block whatever it was, 8767 or something. That's the contents of the directory. I'm done. No extra data blocks have been harmed in the process of storing a small file. And that's really good, because a lot of the files on the system are small. Remember, every directory is itself a file, and most of those directories are 4K. So in this case it's really nice: I need no extra blocks to store the directories. But as the file gets bigger, I have to start allocating space to store these doubly and triply indirect blocks. The nice thing is that this space scales with the size of the file, and the number of levels of indirection grows only logarithmically, which is good. The lookup times grow in the same way, so offset lookups are still fairly fast. If you want to impress me, do the big-O analysis. I don't really care. I mean, it's better than having to walk a linked list. So I'm sold. And of course it allows large files to get really, really big. Really as big as I want. Certainly big enough to be pirated.
Okay. Any questions at this point? Let me see where we are on time. I might... All right. I'm just going to stop here before we launch into a discussion of caching. We'll come back and start here on Friday. Good luck with assignment 3 part 1. We'll see you guys on Friday.
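As a rough check on the overhead argument from this lecture, the Python sketch below counts how many pointer-only blocks a file of a given size would need. It assumes 12 direct pointers in the inode and 1,024 pointers per indirect block, neither of which comes from the lecture, and it stops at the doubly indirect level. The name `overhead_blocks` is hypothetical.

```python
PTRS_PER_BLOCK = 1024   # assumption: 4-byte pointers in a 4K block
NUM_DIRECT = 12         # assumption: 12 direct pointers in the inode


def overhead_blocks(data_blocks):
    """Count the blocks that hold only pointers for a file of `data_blocks` blocks."""
    remaining = data_blocks - NUM_DIRECT
    if remaining <= 0:
        return 0  # small file: the inode's direct pointers are enough

    overhead = 1  # the single indirect block
    remaining -= PTRS_PER_BLOCK
    if remaining <= 0:
        return overhead

    if remaining > PTRS_PER_BLOCK ** 2:
        raise ValueError("file is beyond the doubly indirect range in this sketch")

    # One doubly indirect block, plus one second-level pointer block
    # for every 1,024 data blocks (rounded up).
    second_level = -(-remaining // PTRS_PER_BLOCK)
    return overhead + 1 + second_level


if __name__ == "__main__":
    for n in (1, 12, 13, 1036, 1037, 100_000):
        print(n, "data blocks ->", overhead_blocks(n), "overhead blocks")
```

A one-block file, such as a typical directory, costs nothing extra, and a larger file pays for indirect blocks only once it actually grows into them.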