 Well, I'm going to ruin your day, sorry, in two ways. First of all, we're going to talk about file systems, which I don't know, maybe that's what you like to do on Friday, but I wouldn't necessarily guess that to be true. So we'll talk about file systems today, and talking about caching, how we make file systems fast, and about consistency as well. So just a couple of announcements before we get going, the regular office hours, including that one hour that we've canceled a few times, and the TAs will return next week, since the TAs are done creating the midterm. And you have five weeks left to complete assignment three. Currently, the max score is a 30 out of 100. So you guys have earned, as a class, four points in the max score in a couple of days. So actually, if you keep improving at that rate, you'll probably get to 100 pretty quickly. So anyway, any questions about assignment logistics? Five weeks. I know it feels like a long time. But again, if you think that you can complete the assignment in under five weeks, go ahead and do it. Just start now. Don't start two days before the deadline. So unfortunately, the first thing we need to do today is talk about the midterm. So we're done grading the midterm. The grades are going to be available on Monday. So I'm not going to bring the midterms to class, because that just wastes time in class. But if you want to pick up your exam, please come to office hours where you can get your exam, see the grades, and talk to the TAs about the scores. And at some point, probably on Monday or Tuesday, I'll get around to moving the statistics from the grading into the solution set so you guys can see some of the statistics I'm about to show you, as well as per question breakdowns and things like that. So that'll probably, when that's done, I'll let you know on Piazza. The overall results were pretty grim. So out of 50 points this year, the mean was a 19.5. And the median was 19. The maximum score was a 39. 
So this would have been, I guess, a little bit more appropriate if it had been a 40-point exam rather than a 50-point exam. In contrast, I was actually really proud of myself, because for three years straight, I gave midterms that pretty much looked like this. So last year, the mean and median were about 35, and somebody came within a point of getting a perfect score. So all to say, when you pick up your midterm next week, please keep these numbers in mind. If you got a 20, you did better than half the class. OK, so we put these statistics together a few days ago, and I spent a couple of days taking long walks outside and trying to figure out what happened. And so I have three possible explanations. The first one is you guys are just dumber than last year's class. So that's possible. It's not my favorite explanation. The second one is potentially more likely, which is I'm just doing a terrible job this year teaching this class. Just terrible. And again, that's more likely than number one, given statistical variance. And the third one is that the midterm was just too difficult. So let's see here. So I don't think that it's you guys. I don't think you guys are dumber than last year. So let me show you why I don't think that. So these were last year's assignment scores on the first three assignments. And keep in mind that last year they had a long time to do assignment two, twice as long as you did. Here are yours. So assignment zero looks pretty similar. Assignment one is better. Assignment two is much better. So you guys are doing significantly better on programming assignments than last year's class. And to some degree, I would just say thank you, because that's really what I would prefer you guys focus your time and energy on, because I think you'll learn more that way. And that's the heart of the class.
So there are people that come in here and they're like, oh, I'm just going to come to class and try to do well in the lecture, and so maybe I'll be able to pass the class without really doing any programming. And I really dislike that approach. And thankfully, you guys aren't doing that. You're doing really well in the assignments. So clearly, hypothesis one of you guys just being a terrible class is not true. In fact, you're stronger than last year. OK, so the second explanation is it could be me. And again, I'm totally willing to admit that this is possible, but I'm pretty much doing the same thing I did last year, except in a different room. So maybe we need to move back into Cook, a room that everybody complained about, but maybe that would have boosted your midterm scores by 15 points. Somehow I don't think that's the case. So again, I'm not saying that I can factor myself out of the equation, but I just don't think I'm doing something totally different this year. Maybe it's the periodic tables distracting you during lecture, I have no idea. So that's led me to sort of conclusion number three. And this is what I think is actually the case. I think the midterm was just too hard. And that's what I'm going to go with. I found this memo online. I think it's awesome. Sorry. I mean, clearly, you can't just blame the piece of paper, right? It's not like this midterm fell out of the sky and somebody forced me to give it. Like I wrote this thing, right? So anyway, clearly. So hopefully it's not quite this bad. And hopefully I don't look like that guy either. So neither one of those things make me happy. OK, so anyway, I did write the midterm. I'm sorry the midterm was too hard. I hope that you guys, you know, so when I was an undergraduate, I took a midterm in one of my courses freshman year, and it was terrible. In fact, I think it might have been equally bad, if not worse than mine. 
I think the average is about the same, actually, except it was like a two-hour midterm. And I remember I got there and I answered. I kind of answered one of the questions. I didn't really know. It was a math test. It was terrible. And then I just sat there for an hour and a half, sort of not knowing what to do. It was open book, too. So I had the book there, and I was kind of like just reading the book just in case I would be inspired. And it was a massacre. And happily the professor, who was a wonderful person, sent out an email afterwards. And I'll never forget. He said, you know, I'm really sorry. The TAs, I think, kind of knew what was happening, because there were these last minute recitations where they were sort of desperately trying to get us ready for what was clearly going to be a bloodbath. Like they told us the answer to some of the questions. One of my friends just wrote down on the board what they wrote down. He didn't understand any of it, and he just copied it into his book. And he got full credit on that problem. He's now a professor at MIT in economics, in game theory. So that's not a dumb person, right? Anyway, so I'll just say that I hope that the midterm didn't discourage you. I don't think that I, you know, as far as it being hard, I don't think it was hard in a weird way that would have affected some of you more than others. But I just really hope you guys don't feel discouraged or downed or whatever about how the midterm went. It's my fault. It was clearly too difficult. I like the midterm to give people a chance to sort of stretch out and really show me what they know. But this was way off in terms of being poorly calibrated. So again, I apologize about that. So let me try to address some of the concerns here. So is this going to affect your grade? No. So the way that we put grades together in this course is that we don't, I don't give letter grades on individual assignments. 
My wife has been teaching at RIT, and she's really worried about how her students are going to feel if they get a C on a 10-point paper. So she essentially can't give out a grade lower than a six, right? I clearly don't feel that way. But what we do at the end of the class is we take all the points, we throw them all together, we put everybody up in a spreadsheet, and we find what look like reasonable distributions for where grades should be. And that's how I assign grades. So it turns out over the last couple of years, the grading distributions and the grading breakpoints have been pretty similar each year. There are just enough of you now that this is a statistically significant sample. But anyway, this year, given how well you guys are doing on the assignments, I actually expect to hand out a lot more A's this year than I did last year. And it will be duly noted during this process that this year's midterm was quite a bit more difficult. So when comparing you guys with last year, I know that. And yeah, again, if you guys do as well on assignment three as you did on assignment two, there's going to be a very large number of A's in this class, which will make me extremely pleased. And again, there is no quota on the number of A's. I don't have to purchase them from anywhere, they're free. I'll give out as many as I want. Is the final going to be this difficult? Why not? Yeah, so no. OK, I promise that the final will be better calibrated. People usually feel like they don't have enough time in the midterm. I think a lot of people felt that way. We saw a lot of answers for the long answer questions that were not particularly good. And I think that's because people got bogged down in the short answer questions and just didn't spend enough time really thinking about how to address some of the long answer questions. And that's because I think some of the short answer questions really should have been long answer questions now that I think about it.
So the final, usually, people feel like it's more relaxed. It's a longer exam. It will not be three times as long. So that's one thing. And then some of you guys may have also heard that, as an incentive to get you guys to fill out the course evaluation feedback forms, I usually distribute some of the final exam questions beforehand, which will hopefully make things even calmer still. The funny thing about that, though, I'll point out just to warn you, is that the grade distributions on the questions that I distribute beforehand are usually very similar to the ones that I don't. So anyway, I don't know what that means. OK, so any other questions about the midterm? Yeah. I mean, I thought about reweighting the exams, but I don't want to put any more pressure on the final than there already is. Remember, the final exam, the midterm is, I wish I knew this, is it 15 and 25? Yeah, I think it's 10% participation. Or is it 5, 10, 20? Yeah, there is. Or maybe I got rid of it this year. That's right. Yeah, OK. Again, if you guys, I'm open to suggestions, right? I mean, if you guys think that that would be a reasonable move, I'm OK with that. But keep in mind, I mean, that just makes the final into a little bit more of a high-stakes affair. But yeah, I'm open to suggestions here. I haven't really decided what to do. I was just going to sort of plow ahead as normal. But I did want to talk about it. Any other questions? Yeah. I liked the old exams. I'm usually going for about a 35. And again, I'm not worrying about where this falls on the letter grade scale. I like to have an exam that's hard enough that the really good students in the class can do well, really well, right? Can sort of show me who they are and don't end up running up against the maximum. And in the past, we've had a pretty good thing where I don't think I've ever given out a perfect score on the midterm. The last two people got very close.
But yeah, this is not, again, this is not what I wanted. I probably would have been happy with a 30. But a 20 was off. And again, I understand it's a frustrating experience taking an exam that's just too long. So I apologize for that. Any other questions? OK, so now let's talk about file systems again. So just a little bit of review from last time before we go forward. We had sort of got to the point where we were talking about how to locate data blocks associated with a file. But let's just back up and review for a minute, right? So in the EXT4 file system, the inodes are the internal data structures that describe the files. There's one per file, and they're 256 bytes. What sort of information is stored inside the inode itself? Once I find the inode, what do I know about the file? I've got 256 bytes. What do I put in there? Yeah. What's that? Yeah, it may have some permission information. What else? Yeah. There's a bunch of timestamps in there. But what's the most important thing that I need to have in the inode? Yeah, Isaac. I need to be able to find the data blocks. Remember, the inode is just the first step in actually accessing the contents. So the inode had better lead me to the data associated with the file. And today we'll talk about the data structures that are used to do that. All right, good. And remember, the inodes are named and located by number. How do I find an inode in EXT4? I have an inode number. This might be during path resolution. I'm halfway through resolving a path. I found out a particular inode number that corresponds to the relative name that I peeled off. How do I use that to find the inode data structure? Well, that's how I bootstrap the path resolution process itself, because I always know what the inode number is for root. But now what I'm asking is, I have another inode number, like 10,000. And I need to find the data structure on disk that corresponds to that inode. How do I do that in EXT4?
Well, OK, the superblock contains some of this information, but what does EXT4 fundamentally do to make this process simple? Yeah. Yeah, they're pre-allocated. And where are they pre-allocated? Well, someplace where I can find them. So there's a very simple way in EXT4 to take an inode number and figure out what block on disk it corresponds to. And I could put all the inodes at the beginning. I don't do that for performance reasons. So in EXT4, remember, we put chunks of them at fixed locations throughout the disk. All right. OK. So what is a directory? It's a mysterious thing that I use to sort of navigate, create names within the file system. But what is it fundamentally, in EXT4? Yeah, it's just a special file. So there's an inode for it. And it's named the same way as files are, except for the fact that the contents of the directory file are maintained by the file system. And the contents contain a data structure that maps relative path names to other inode numbers. And again, this is nice because this allows directories to grow and shrink just like regular files. So if I have a directory that has a lot of files in it, I allocate more space for it just the same way I would allocate more space for an actual file. But in this case, the contents, again, are maintained by the file system because the file system needs to make sure that the contents reflect the data structure that the file system is using to map relative names to inode numbers. So remember, when I do an open, what needs to happen? I'm given this string. And so what does the file system have to do in order to finish the open? Or at any point when I need to resolve a path name to an inode number? I just answered the question. So there are many times when the file system needs to take a hierarchical name that's familiar to you or to an application and resolve it to a number, specifically an inode number, which allows me to find everything else about the file.
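To make the pre-allocation trick concrete, here's a minimal sketch of the arithmetic an EXT4-style file system can use to go from an inode number to the inode's location. The constants here are illustrative stand-ins; in a real file system these values come from the superblock, not from `#define`s.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative geometry -- real values come from the superblock. */
#define INODES_PER_GROUP 8192u   /* inodes pre-allocated per block group */
#define INODE_SIZE       256u    /* bytes per on-disk inode */

/* Inode numbers start at 1, hence the "- 1". */
uint32_t inode_group(uint32_t ino)
{
    return (ino - 1) / INODES_PER_GROUP;   /* which block group holds it */
}

uint32_t inode_offset(uint32_t ino)
{
    /* byte offset into that group's pre-allocated inode table */
    return ((ino - 1) % INODES_PER_GROUP) * INODE_SIZE;
}
```

With these made-up numbers, inode 10,000 lives in block group 1, at byte offset 462,592 into that group's inode table: a couple of divisions, no searching.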
And I do this by bootstrapping the process from the root. I start with the root directory, which is a fixed inode number. And then I cut the path name into individual components. And I look up each individual component one by one. And assuming this is a file, root is always a directory. etc would be a directory here. default is a directory, and keyboard is a file. And I just repeat the process of opening the file. In this case, it's a directory. And using the directory contents to resolve the next component of the path name, either until I fail because I don't have a directory when I was expecting a directory, or until I've resolved the full path. So this is how it works. Any questions about this? So this is one of the things that we talked about that file systems need to do: take names, paths, and convert them into information, into a data structure about the file that describes the file contents and other properties of the file. Any questions about this stuff before we go on? So the next thing we need to be able to do is, given that I found an inode, once I find an inode, there's stuff I know about the file. I know when it was created. I know who's allowed to access it. And it makes a lot of sense to store this information in the inode because it allows me to do these checks before I do anything else. So for example, if some user tries to open a file, all I need to do is find the inode for the file, and I can verify or confirm that they are allowed to open it or what other permissions they have on that particular file. I don't have to do anything else. However, at some point, I actually need to get the contents of the file itself. So imagine that I'm trying to perform a read or write from a previously opened file handle. And of course, implicit in the read and write operation is some offset within the file. So what does this call need to do? What does the file system need to do in order to process a read or write operation? What do I need to be able to do?
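The component-by-component walk described above can be sketched like this. The directory "table" is a toy in-memory stand-in — a real file system reads these entries out of the directory's data blocks — and every inode number except root's conventional 2 is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy directory entries: (directory inode, name) -> inode number.
 * A real file system reads these out of the directory's data blocks;
 * these numbers are invented for illustration. */
struct dent { int dir; const char *name; int ino; };

static const struct dent table[] = {
    { 2,  "etc",      11 },   /* 2 is the conventional root inode */
    { 11, "default",  12 },
    { 12, "keyboard", 13 },
};

static int dir_lookup(int dir, const char *name)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].dir == dir && strcmp(table[i].name, name) == 0)
            return table[i].ino;
    return -1;                    /* no such component */
}

/* Resolve an absolute path one component at a time, starting at root. */
int resolve(const char *path)
{
    char buf[256];
    strncpy(buf, path, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    int ino = 2;                  /* bootstrap from the root inode */
    for (char *c = strtok(buf, "/"); c != NULL; c = strtok(NULL, "/")) {
        ino = dir_lookup(ino, c);
        if (ino < 0)
            return -1;            /* resolution failed partway down */
    }
    return ino;
}
```

A real implementation would also fail when an intermediate component turns out not to be a directory; this sketch elides that check.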
So you guys have already implemented the system call. So you know how this works. The system calls don't do much. All they do is they take a little bit of information that they maintain about the file, and they pass it to the file system. So once you guys have called VOP_READ or VOP_WRITE, the file system has a vnode. The file system has an offset. So this, although it looks a little bit unfamiliar, this is actually sort of the lower-level interface to files. The offset is maintained by the operating system in the file handle. Once you get down to the file system layer, it expects to know what byte of the file you want. What do I need to do here? Yeah, together. Yeah, so essentially I have to translate the offset within the file to a data block on disk, or to an address on disk, a disk address. If 345 is a valid offset in the file, then there exists somewhere on the disk a block of 512 bytes or 4k, whatever I'm using, that contains that offset. And the goal is to find it. So I need to translate the number to a data block. And that data block contains the contents either that I want to retrieve, if it's a read, or the contents that I'm going to alter if it's a write. And there are several different ways of doing this. So one way is to organize all the data blocks in the file into a linked list where the root is stored in the inode. So the inode contains a pointer to the first data block, and each data block contains a pointer to the previous and next data blocks, allowing me to walk the list in either direction. So what is nice about this? We're thinking about design implications. Yeah, it's pretty compact. And it's simple. Hopefully you guys have programmed a linked list. How many people have programmed a linked list before? How many people have programmed a linked list on a whiteboard? Those of you that did not raise your hands, I would suggest you start practicing that, because that is a very common interview question.
And I will admit that I have seen extremely talented programmers just break down into tears when they were asked to code a linked list on a whiteboard. These are people that could have done it if I'd given them a term. So it's a good thing to know how to do. But what's bad about the linked list approach? What is this going to make very difficult or very slow to do? Yeah, what's that? Yeah, so offset lookups are slow. Every time I need to convert an offset to a data block, I have to walk this list. And the time it takes to do this grows with the size of the file. So this is unfortunate. A second way to do this, maybe even simpler, is just to store an array of the data blocks corresponding to the file. And store that array in the inode itself. What's nice about this? Yeah, it's simple. And offset lookups are constant time. I have an array. Why would this probably make you very sad if you had to use a file system that was implemented this way? I have a max file size. And it's not very big. Keep in mind, these are maybe 4k blocks, maybe with newer file systems that use extents they're a little bit bigger. But still, I've got a 256-byte data structure. And now I've got to start storing four-byte pointers to data blocks in it. And of course, some of that space is already in use, so it's a small array. So you'd better be prepared to listen to very, very short MP3s. Max song length of only a couple seconds. I don't know. Maybe there's super, super hardcore punk that's like 10-second max song length or something like that. But I've never heard about that before. OK, so here's what we do. And this design is based on an old observation about files. And it's interesting to revisit this. People have done studies in the past where they've looked at file systems in the wild. So if you took all the files on Timberlake and you plotted the distribution of their sizes, what would you see? And the observation is that it turns out that there are a lot of very small files.
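Here's the back-of-the-envelope math behind that max-file-size complaint. The 64 bytes reserved for the rest of the metadata is an assumed figure, not a real layout, but it shows the shape of the problem.

```c
#include <assert.h>

/* If the 256-byte inode were nothing but metadata plus a flat array of
 * block pointers, the maximum file size would be fixed and tiny.
 * The 64-byte metadata figure (timestamps, permissions, ...) is an
 * assumption for illustration. */
#define INODE_SIZE 256u
#define METADATA   64u
#define PTR_SIZE   4u      /* 4-byte block pointers */
#define BLOCK_SIZE 4096u

unsigned max_file_bytes(void)
{
    unsigned nptrs = (INODE_SIZE - METADATA) / PTR_SIZE;  /* 48 pointers */
    return nptrs * BLOCK_SIZE;                            /* 192 KiB max */
}
```

48 pointers times 4 KiB is 192 KiB total — hence the very short MP3s.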
What are these files? I mean, the files you guys use, photos, video, a lot of media-type files are typically not small. So what's creating all these small files on a typical system? What do they correspond to? They're definitely not MP3s or video files. Well, what could they be? What's that? OK, but remember, people don't use text files. You use text files because you're weird. You're a computer scientist. I use them too. But normal people don't use a text file. Normal people would open up Microsoft Word and create a several-megabyte file to contain a couple of sentences. Most people don't use text files. But what are these files? They might be text formatted, but what are they doing there? Who's using them? Yeah? Yeah, all sorts of little configuration files, settings for various little services that are running. If you guys use a Mac, Mac seems to have 10 million little configuration files all over the place with two lines in them. So yeah, there's a lot of small files in the system. You don't necessarily interact with them, but they're there. And then, on the other hand, we also want to allow files to get really large, because there are these huge files that you guys throw on your computer, probably typically media and movies, stuff like that. But it could be other things. Huge data sets, whatever. And so what we do is we have the inode store a mixture of different types of pointers to block structures. So the inode has some number of pointers that point directly to data blocks. So there's some space in the inode that I have available for pointers. It's 256 bytes, and the timestamps don't take up the whole structure. So I have some number of pointers to blocks. We refer to these as direct blocks, because they're directly linked from the inode. So that's where the beginning of the file is stored. And then as the file gets bigger, what I do is I start allocating what are called indirect blocks.
So an indirect block is a pointer from the inode to a block, 4k, that contains pointers to blocks. So now I've got a pointer to an array of pointers to blocks. So now it takes two lookups to get through that. 4k is probably the block size I would use here. Remember, 4k is the file system block size; 512 is the disk block size, but I don't really use it. Well, it wouldn't have to be 4k. It could be bigger or smaller. And then so what do you think comes next? Now what if I run out of space for the file at this point, what can I do? Can you just extend this idea one step further? I started with pointers to blocks, and then I went to pointers to arrays of pointers to blocks. Yeah, here we go, doubly indirect blocks. So now I have some pointers to blocks containing pointers to blocks containing pointers to blocks, which we refer to as doubly indirect blocks. And you can also have pointers to blocks containing pointers to blocks containing pointers to blocks containing pointers to blocks. This is like buffalo, buffalo, buffalo. Those would be triply indirect blocks. You can just extend this idea, and you can essentially support an arbitrarily large file. So here's kind of how it looks. So here's an example of a direct block pointing to one data block. Actually, sorry, there are two direct blocks here. So here are my two direct blocks. And then here's an indirect block, and that indirect block itself points to two more blocks. Questions about this? This is a pretty common data structure used by modern file systems to convert offsets into data blocks. And keep in mind, I can still directly convert the offset. I don't have to search through any of these arrays. I can figure out, given what the offset is, which slot it corresponds to, whether in a singly or doubly or triply indirect block or in the inode itself. The overhead comes from having to walk through multiple levels. So to get to the end of this file, I now have to read two blocks off the disk.
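Here's a sketch of that offset computation, using classic ext2-style parameters (12 direct pointers, 4 KiB blocks, 4-byte pointers, one singly/doubly/triply indirect pointer each). These numbers are illustrative assumptions; the point is that it's pure arithmetic, no searching — the arithmetic just tells you how many levels of indirection you have to walk.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative, classic-ext2-style geometry. */
#define BLOCK_SIZE     4096u
#define NDIRECT        12u                    /* direct pointers in the inode */
#define PTRS_PER_BLOCK (BLOCK_SIZE / 4u)      /* 1024 4-byte pointers per block */

/* How many extra block reads (indirection levels) it takes to reach
 * the data block holding a byte offset: 0 = direct, 1 = singly
 * indirect, 2 = doubly, 3 = triply. */
int indirection_levels(uint64_t offset)
{
    uint64_t block = offset / BLOCK_SIZE;     /* logical block index in file */
    if (block < NDIRECT)
        return 0;
    block -= NDIRECT;
    if (block < PTRS_PER_BLOCK)
        return 1;
    block -= PTRS_PER_BLOCK;
    if (block < (uint64_t)PTRS_PER_BLOCK * PTRS_PER_BLOCK)
        return 2;
    return 3;
}

/* Total addressable blocks: direct + single + double + triple indirect. */
uint64_t max_file_blocks(void)
{
    uint64_t p = PTRS_PER_BLOCK;
    return NDIRECT + p + p * p + p * p * p;
}
```

Byte 0 is reachable straight from the inode; byte 1 MiB (logical block 256) costs one extra read; byte 1 GiB costs two. And with these parameters the maximum file is 1,074,791,436 blocks, about 4 TiB — the triply indirect term dominates completely.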
And if the file got bigger, I might have to read three blocks off the disk. And of course, there's some overhead here too, because I'm taking part of the disk and reserving it for these data structures and such. So compared with the linked list, this has higher space overhead. There are always trade-offs with data structures. So what's nice about this is that the index scales sort of logarithmically with the size of the file, and offset lookups are still pretty fast. And I can also support very large files, which is nice. And yeah, my files can get really, really big. Last time I checked, we installed sort of the latest version of the Solaris file system. It's actually pretty awesome. ZFS, that is. And I think it supports now multi-exabyte files. I don't know what you put in a multi-exabyte file. The world's longest movie, it's been playing since the beginning of time. But anyway, it is ready for multi-exabyte files. OK. And there aren't really cons of this other than it takes up a little bit more space in some of the other data structures. Questions about this? So now we've done two things that were on our file system to-do list. We've talked about how to translate names into inode numbers, and we've talked about how to translate offsets into data blocks. So questions about this before we talk about the next challenge? OK. All right, so got those two things knocked off. The next thing that file systems try to do is not be super slow. And the way that we do that, how do we make a big, slow thing look faster? Put a cache in front of it, a smaller, faster thing. So in the case of the file system, which is using the disk as its primary storage mechanism, which is big. And back in the day, with spinning disks, very slow and with weird geometry properties. And now in the days of flash, still slow, but with less weird geometry issues, although there are still geometry issues with flash, just different ones.
So what's the cache that we put in front of the file system? We call that memory. Remember this thing called memory we were talking about a little while ago? Yeah. The memory that's used to cache file system contents, the traditional name for this is the buffer cache. I don't know why. It's both a buffer and a cache. So maybe that's why they use both words. So the buffer cache is memory that's reserved for the file system to cache file system contents. And we'll talk about how that happens for the next few slides. However, you might recall that operating systems use memory for something else. So it would be awesome if there was just this thing called memory, and I would just use it to cache the file system contents. That would be great. What else does the operating system use memory for? What else do apps use memory for? Allocating runtime data structures, the things that might happen on my stack and in the heap and stuff like that. So what happens is there are these two competing uses of memory. I've got a fixed amount of memory on the system, and now there's this other interesting runtime trade-off that the operating system has to make, which is how much memory should I give to processes for address spaces? How much of the system memory should I allocate as the file system buffer cache? So we can think about what happens at the extremes. So what's the simplest way to make this decision, first of all? Just pick a fixed-size buffer cache. At boot time, I say, OK, I've got 10% of the system memory for the buffer cache, and the rest I'm going to use for paging and other purposes. And there were early systems that did this. I don't think modern systems do this anymore, mainly because this balance is important.
So in general, what happens when we allocate a lot of memory for the file system buffer cache, leaving not much left for address spaces and process pages? What happens? What can happen? What's really fast, hopefully? File operations, right? Reads and writes and things like that might go really fast. But if I do this at the wrong time, what can I cause? What's that? Well, my stack is going to overflow, right? But remember, what happens to systems when they're under lots of memory pressure? They might start to thrash, right? So essentially what I'm doing is I'm sort of artificially limiting the size of the memory on the system. And if I do that, the processes can thrash. And of course, if this wasn't bad enough, what's the problem with thrashing? What sort of activity does thrashing create? It creates all this disk activity to the swap file. So now the thrashing is not only going to kill the performance of the paging operations, but it's going to start to interfere with the file system operations as well. So this is not good. OK, the other option here is I can make the file system buffer cache very small and allow as much memory as possible to be used for process pages. And of course, here it's just the opposite trade-off. I might do very well at keeping address space usage very fast, but now accesses to files are very slow. And clearly the right balance is really dependent on your workload, so modern systems try to tune this dynamically. So on Linux, there's this, I think new versions of Linux still have this parameter. At some point they had a parameter that was called swappiness, which is like a fantastic parameter to tune for your system, because the name of it just indicates that it's this weird thing. How swappy is your system?
It's kind of arbitrary, but essentially what swappiness controls is how eager the kernel is to evict address space pages from memory and turn that memory over to the file system for caching purposes. So Linux is always maintaining this division in memory where any unused memory is given over to the file system for buffer cache purposes. And so there's always this dividing line in memory between what's being used for each purpose. And sometimes you can experience this on your system where, for example, if something ran overnight like an indexing process that used the file system really heavily and nothing else was running, it's possible that the operating system pulled more and more pages into the buffer cache to improve file system performance. And then the next time you sit down and actually start to type or do something, it takes the machine a little while to warm up, because now there's interactive use going on, and so it starts to move the balance in the other direction. OK. So the next question with respect to the buffer cache is, where should the buffer cache be? So there are two choices here, two places. And like any other cache, the TLB for example, the buffer cache has to conform to the interface that it's used as a pass-through for. And I have no idea what that even meant as I said it. So let's just make it a little more concrete. So the buffer cache has to have the same interface as the thing that it's caching. So if I put it above, at the virtual file system level where it can see all these file operations, then it has to support those file operations in the cache. The other alternative is to put it below the file systems and allow it to cache disk blocks. So if I put it up here, it can actually cache entire files, and I see things like open and close, and I see all the reads and writes. If I put it down here, all I'm doing is trying to guess at what's happening in the actual files, but I actually get to see disk blocks.
Does this make sense to people? This diagram is supposed to be an example of how your OS/161 system is configured. And this is pretty typical. There's an interface here. Now, because it's C, you end up with these really gross ways of implementing interfaces, like function pointers, which are scary and kind of nasty. But that's what it is. It's really an interface, something you guys might be familiar with from other languages like C++ or Java. And file systems are required to implement that interface in order to be mounted as part of the system. Currently, your system uses emufs exclusively. If we had asked you guys to do assignment four, which given how well you guys are doing, maybe next year we will, and you're going to ruin things for people in the future, then you would actually start to use SFS as well, which is a file system that's implemented as part of your kernel. But the point is that both of these eventually, so if you think about how you configure your system, you can actually configure multiple file systems to coexist on the same disk. And those file systems have different data structures. They're setting things up differently. They interpret things differently. But at the end of the day, they're both using the same low-level disk interface of just reading and writing blocks of data based on the address on the disk. So if I put the buffer cache down here, then that's what it sees. OK, so if I put the buffer cache above the file system, then I can actually cache entire files and directories, because that's what's delivered back to the buffer cache at runtime. And the buffer cache interface now essentially has to be the same as the file system interface, because these are the operations that are going to pass through the cache. And let's think about how some of these operations would work. So let's say I'm trying to run a cache above the file system. So what does this mean in practice? Let me just slow down a little bit, and we'll talk about this.
So what this means is that every time the virtual file system receives any operation, before passing it to the client file system, it passes it to the cache first. This is how caches work. So going back to the TLB: when the processor is trying to translate virtual addresses, the first thing it does is try to look the translation up in the TLB. If that works, I'm golden. I don't have to ask the OS for help. If it doesn't work, I trap into the operating system. So a cache always involves a first step of trying to use the cache, and if that doesn't work, the cache propagates the request to the bigger, slower thing. So in this case, how does my cache handle an open? I try to open a file. What happens? Yeah, but what am I going to look for in the cache for an open? So that actually works. Let's come back to that. I'm not sure that actually works, because, oh, here's the problem. OK, so sorry. Here's the explanation. Remember, the buffer cache here doesn't know anything about the internals of how these underlying file systems work. It can't, because it has to support a bunch of different underlying file systems. So in order to cache an open, I actually need internal information about how the file system operates that the cache doesn't have. So in this case, I just have to pass that open down to the underlying file system and let the file system implementation use its understanding of how its on-disk data structures work to figure out what to do. OK, what about a read? So I want to read some byte of data from the file. You want to try again, Yusuf? So your first answer was right here, right? So now I can look in the cache. So if this is a file that I'm caching, then I can return contents from the cache without passing the call down to the underlying file system. If not, sorry, that's the second option: if the file is not in the buffer cache, then I have to propagate the read through the file system.
But what the cache does then is it may load the contents of the file that are retrieved into the cache, so that in the future I hit in the cache rather than having to make the call. This is the same thing that I do with a write. And on close, I'm going to remove the file contents from the cache. And of course, that needs to also propagate to the file system. So the nice thing about being above the file system is that the buffer cache sees file operations. It understands file operations. So when we go to the next design alternative, what you're going to see is that all I see are blocks. All I see is a read or write to some block on disk. I don't know what it means. Here, I know what the process is trying to do. I can distinguish between, for example, an open and a close and a read and a write and a bunch of other things. Now, an open or close might result in a bunch of reads to underlying data blocks. So I can't necessarily guess, if I'm caching at the block level, what a read means. But here, I just have a lot more semantic information about the file system operations that are taking place. And that can be a good thing. The problem here, and these are big problems, and this is the reason this technique really isn't used, is this can hide a lot of information from the underlying file system that's needed for consistency. So for example, if I have a write, the file system may need to do something about that, and we'll talk about some things that file systems do to ensure consistency. So it's rarely safe to not notify a file system implementation that a write has taken place. I don't want those writes to hit in the cache. They really need to go to the file system. The other problem here, of course, is that this prevents me from caching any file system metadata.
So the superblock on the disk, which is a very, very active piece of data, inodes, the internals of directories, none of that can be cached, because none of that data is actually passed up through the system calls. So for example, if I do an open, we talked about the fact that I have to read a bunch of data blocks from the disk. Ideally, those data blocks would now be in the cache. But if I do the cache at this level, I can't do it. And this is the other big problem: file system metadata tends to be very, very active. And so it's exactly the type of information that I want in the cache. So these two things together pretty much make this approach unworkable. So now if we go below the file system, what are the objects in the cache? Above the file system, the objects in the cache were files. Below the file system, the objects in the cache are what? Remember, I'm caching now at the disk interface. So what are the requests going to be? What operations am I going to see? Yeah, I don't even know what it is. It's just a block, right? What the file system does to accomplish whatever it's trying to accomplish is read and write blocks of data from the disk. And so that's what my cache sees if I put it below the file system implementation. Read block 10, write block 20, read block 15. I don't know what those mean. Those reads could be part of an open. The writes could be part of a close. I don't know. All I know is reads and writes to blocks. OK, so the interface is now the same as the disk interface. Let's just get through this, and then we'll come back to consistency on Monday. So what's nice about this is now I can cache everything. I can cache superblocks. I can cache inodes. I can cache directory contents, whatever. Because everything the file system thinks about, everything the file system uses, has to be expressed at some point as an operation on a block on disk. And so now I can cache anything and everything, which is great.
Now, another thing that was really important for consistency, and we'll see why on Monday, is that this also makes sure that the file system can see all the operations, even the ones that hit the cache. So even if a write is eventually going to hit the block cache, the file system sees it. And in certain cases, the file system can force things through this cache as well. So that's pretty important. The con here, as you can imagine, is that all my buffer cache now sees is this basically meaningless stream of reads and writes to blocks. And so it's very difficult for me to infer what's going on. I can't necessarily tell what file was just opened, or what file is being read from and written to, because all I see are these streams of block I/Os. But this is actually what modern operating systems do, because this is the right set of trade-offs. So on Monday, we'll talk about how caches affect consistency and a consistency technique called journaling. So I'll see you guys then.