Good morning, everybody. Hope you all had a nice weekend. A resounding good morning from this side of the mess. Let's try that again. Good morning. Yeah, maybe some of you feel like you're in prison right now with these assignments. We're going to play some more Johnny Cash prison music over the next few weeks, just to inspire you.

So today we're going to keep talking about file systems. In particular, once we start talking about caching and consistency, I think we get to some of the parts of file systems that are pretty interesting, and some parts where there's been continued work and research over the last 10 or 20 years. We're going to start today by talking about how we make file systems fast. So we'll talk about how we use memory on the machine not only as memory, but also as a cache for the file system.

A week ago, somebody came up to me and said, well, I wonder if I have too much memory in my system. I bought a bunch of memory, I have 32 gigabytes in my system, so that makes me pretty happy. But, they said, looking at my system monitor, my system wasn't using most of that memory. So is it going to waste? And I said, well, probably not, because one big thing that operating systems do with memory is cache file system data and make the file system faster. On your system there's this constant interplay going on between the system deciding how much memory to use as memory, for process address spaces and paging and things like that, and how much memory to use as part of the file cache. So even if a large portion of your memory doesn't look like it's used by processes, it's probably in use making your big, slow disk look faster. We'll talk about that.

And once we start talking about caching, then we have to start talking about consistency, because up to now, what we've been talking about is file system operations that go to disk immediately. But if they're going to stop in memory instead, then it's possible that the disk is going to be in some sort of interesting state if and when the system crashes or the power goes out or whatever. So we'll talk about that too.

At this point, if you're on my schedule, my recommended daily dose of CS41, you should be finishing up assignment two. That will give you about a month for assignment three. Assignment three, I think, is the most fun assignment in the class, so I hope that you'll get there. If you don't get there, you're going to miss out on a huge chunk of points, and you're probably not going to do very well in the class. But the real reason to get to assignment three is that it's fun. Recitations this week are going to start on assignment three; that's our signal to you that that's where you should be. So if you're still working on assignment two, or, shudder, assignment one, you're behind, especially if you're on assignment one. That's worrisome. That's what recitations are going to cover for the next couple of weeks. And I'll talk to the TAs; we'll probably try to continue holding office hours after lecture ends until the assignments are due. There are a couple of weeks there, and I'll make sure the course staff are OK with it, but I'd like to keep holding office hours. Some of you will be wrapping up just the last few bits of assignment three during that time.
So we want to make sure that you have the support you need to do that. All right, any questions on file systems up to this point? We're going to do a little bit of review, talking about the design goals, the file system operations, how we translate path names, how we find data blocks. Any questions before we do that review?

All right, so who remembers what the design goals for our file systems were? We're going to start coming back to these today when we talk about caching. Who can give me one? Jeremy? Yeah, I mean, I want performance, right? Underlying all of this, I want performance, and today we're going to start talking about how to make some of this fast. Yeah? Yeah, so efficiency defined as what? Yeah, and in particular, people have observed that when I think about efficiency, frequently what I'm trying to optimize for is time on the disk, right? Those back-and-forth head seeks. We're going to do a lecture on the UNIX Fast File System, which is getting older and cruftier every year, but it's still a fun example of how a file system was really, really carefully designed around the specifics of disks. That'll give you some sense of the tricks that file systems have played in the past to try to, again, reduce seek times.

Yeah, what's that? Yeah, so not losing data, right? Some of these other things, you might argue, are kind of second-order concerns. If you had a file system that regularly lost data, you probably wouldn't care how fast it was, because you'd spend a lot of extra time regenerating the data that it lost. I mean, think about it: you had this great file system, and you'd done all this great work on assignment two, you'd finally gotten exec to work, and then the file system lost all your data. Well, yeah, sweet, it was fast, right? But human time is more valuable than file system time, so all your lost work would probably make you reconsider your choice of file system at that point.

What else? Yeah, we want files to be able to grow and shrink efficiently, things like that. I think we have everything now. So: translating names to contents. We talked about how that happens, but we really haven't talked about trying to improve that process. It happens through mapping path names to inodes and following that trail of inodes, potentially all over the disk, which is potentially quite inefficient. We want files to act like files: we have this file abstraction, and it has some properties file systems need to support. And we want to optimize access to single files. We talked about at least one trick for doing that; actually, we talked about a couple.

What is one optimization that file systems like ext4 use to try to make accesses to single files fast? Robert, anybody remember? Sean? Okay, we're not talking about caching yet; what about at the layout level? An index table? Okay, that's really just standard path name resolution. What about trying to make access to individual files fast? What does ext4 try to do when it creates and allocates individual files? Yeah. Okay, right: one thing we talked about ext4 doing was allocating file data in big pieces. So we're getting closer to the answer. Why would I do that? Why does that make file system operations fast?
Yeah, so if I allocate blocks in big chunks, it means that all the blocks in that big extent that I've just handed to a file are close to each other. So when I'm reading data from a single file, I'm not moving around the disk. But what was the other thing ext4 did? Yeah, remember, we broke the disk up into these little mini-disks, these block groups, and then we put our inodes, the metadata associated with files, close to the data blocks. We allocate data blocks from the same part of the disk where the inode is. So that's another thing ext4 did.

We haven't really talked about optimizing access to multiple files. This is another file system design goal. We'll talk about it a little later when we get to some of the more advanced file systems; even FFS started to do this. But let's just throw it out there: if I knew that there was a set of files that were related to each other, frequently accessed together, meaning at about the same time, what's one trick I could use to optimize access to that group of files? Jen? You could put them close to each other on the disk. Yeah, I could do the same thing I do with data blocks for single files: just put them close to each other on the disk. FFS has this concept of cylinder groups, right? So I try to put them in a place where the disk can get at them without having to seek the heads too far.

And then finally, somebody mentioned consistency, and that's exactly right. We'll talk a little bit about how we do this today. We want to survive failures and maintain a consistent view of file names and contents. And there were two parts to that. One is trying to keep the state of the disk as close as possible to what the state of the disk should be, given the changes that have gone on. Today is the first time we're going to talk about something that directly interferes with that, which is caching. The other part was that when a file system crashes, we'd like to be able to recover it to a known good state as fast as possible. We'll talk about some ways to do that today too.

Those two design goals are not quite the same. One goal says: try to keep the disk as consistent as possible. The second goal says: if there is a failure, have a process so that when I reboot, I can restore the file system to some known good state. There might be some data loss; especially once we start caching, that might be inevitable. But what I want is to not have problems in the file system's metadata, like circular pointers, things that would make the disk expensive to repair. Old file systems had these programs you could run to check and repair them, but they would frequently take, you know, five minutes to run, and maybe even longer; on a really badly fragmented disk, or in certain other cases, they could run for really, really long periods of time. So imagine you're running a server, you're trying to maintain uptime, you have some sort of crash, the system reboots, and you're like, okay, great, I get my website back up.
People can go back to looking at pictures of cute puppies. And then you wait 20 or 30 minutes while the file system sits there re-consistifying itself. That's not what we want.

All right, so let's talk about what happens when I actually write data to a file. What are the file system operations that have to take place in order for this write to complete? Alyssa, give me one. Yeah, a new block of, a new block of what? A disk block, yeah. So I need to find disk blocks for this write. I'm going to append to the file, the file's about to get bigger, so I actually need data blocks on disk to associate with this file. What else do I need to do? Okay, I need to actually write the data; you took the easy one. There are about three left. Simon? Okay, yeah, so when I started this write, or when I called open, I would have had to translate the path name, but let's say I already know what the inode number is for this file. So what else? Josh? Yeah, so what do I have to update in order to change the size of the file? Andrew? I need to update the inode, right? So I need to update the inode. What else do I probably need to update in the inode? I probably need to change the size, and then, Tim, what else do I have to do? The last access time, okay, but what else? I had these new data blocks, right? What else do I need to change? Sarah? Yeah, I need to associate those data blocks with this file. So I need to somehow link them from the inode, so that the next time I find that inode, the next time I translate a path, get to that inode, and try to figure out what data blocks are associated with it, I find the ones I'm about to allocate. What else? I think we're almost there. Paul? Did I miss anything? Oh, okay. Anybody? Nick? Shikanya? Okay, let's see here. So I need to find the empty disk blocks to use and mark them in use, right?

So what data structure did ext4 use to track which blocks were in use on disk? Remember, this was something we were able to look at using the file system's debugfs tool. Wembley, do you remember? Mukta? No, no, no, I'm talking about marking disk blocks as in use; that's how I reserve them for the file. Yeah, I used a bitmap, right? So in ext4 I would mark the bits in the bitmap for these data blocks that I've found, marking them as allocated. I need to associate those blocks with the file; we talked about the data structure that allows files to grow efficiently, which uses a combination of direct blocks, indirect blocks, doubly indirect blocks, triply indirect blocks, et cetera, to allow files to get very, very large. I need to adjust the size of the file, which means making a change to the inode, where the file size is stored. And then I actually need to do the write.

And again, this is particularly important when we start talking about caching today: in order for all of this to take place and for the file system to end up in a consistent state, all these things need to happen together, atomically. All of these operations need to take place. You can go through here and say, well, let's say I forgot to do this or forgot to do that, and you can think about the kinds of things that would go wrong in those cases. All right.
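To make that list concrete, here's a minimal sketch, in C, of the bookkeeping behind appending one block to a file, assuming ext2/ext4-style structures. All the names here, find_free_block, append_block, the toy inode, are hypothetical illustrations, not kernel APIs:

    #include <stdint.h>

    #define N_BLOCKS 1024               /* blocks in our toy "disk"  */
    #define N_DIRECT 12                 /* direct pointers per inode */

    static uint8_t block_bitmap[N_BLOCKS / 8];  /* 1 bit per block */

    struct inode {
        uint32_t size;                  /* file size in bytes        */
        uint32_t mtime;                 /* last-modified time        */
        uint32_t direct[N_DIRECT];      /* direct block pointers     */
    };

    /* Step 1: find a free data block and mark it in use in the bitmap. */
    static int find_free_block(void) {
        for (int b = 0; b < N_BLOCKS; b++) {
            if (!(block_bitmap[b / 8] & (1 << (b % 8)))) {
                block_bitmap[b / 8] |= (1 << (b % 8));
                return b;
            }
        }
        return -1;                      /* disk full */
    }

    /* Steps 2-4: link the block into the inode, grow the size, touch
     * mtime. Step 5, writing the data and the modified inode back to
     * disk, is exactly what the buffer cache will let us delay. */
    static int append_block(struct inode *ino, uint32_t now, uint32_t bsize) {
        uint32_t idx = ino->size / bsize;
        if (idx >= N_DIRECT) return -1; /* indirect blocks elided here */
        int b = find_free_block();
        if (b < 0) return -1;
        ino->direct[idx] = (uint32_t)b; /* associate block with file  */
        ino->size += bsize;             /* adjust the size            */
        ino->mtime = now;               /* update modify time         */
        return b;
    }

If a crash hits between any two of those steps, you get exactly the inconsistencies mentioned above: a block marked in use that no file points to, or an inode whose size disagrees with its block pointers.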
So let's talk about how we map offsets to data blocks. We talked about three ways of doing this. Who can name one? Yeah, so I could have a linked list, and we talked about the complexity of that. What's another approach, Tau? What's that? Yeah, so I could have a flat array, which is how I'm going to interpret that answer, and we talked about the problems with that. Jen, what's another approach? Yeah, so I had this multi-level index, which is what we finally came to, and which is kind of the modern way of doing this. The idea is that I try to have the number of accesses required to get to a data block grow slowly with the size of the file. I don't want it to grow linearly with the size of the file, which is what happens with a linked list, but I also don't want to allocate a huge amount of space, as in our flat array, to make it constant time. So I had this structure that's maybe similar to what we did with virtual memory, but not quite the same.

All right, any other questions about this before we talk about caches? Now that we've warmed up your own mental caches: what's our standard operating system trick for making a big, slow thing look faster? What's that? We throw a cache at it, right?

So this is kind of interesting, because, and I enjoyed the fact that someone decided to use the Piazza forum for product advice, somebody posted on Piazza about hybrid disks. I didn't even know these existed, but I guess it makes sense that they do: disks that combine some sort of spinning medium with flash. Okay, so why would I use a disk like that? What is that disk essentially doing, or what's one way we could design such a disk? I have this big disk, and I'm trying to get the capacity of the big spinning disk, because we talked about how capacity on big spinning magnetic disks is still an order of magnitude cheaper than it is on flash. But what am I actually trying to do? Yeah, one way of designing a hybrid disk, and I don't know if that's actually how they work, is to use the flash as a cache. And the nice thing in this case about flash as a cache, as opposed to memory as a cache, what's potentially preferable about it from a file system perspective? Simon? It's non-volatile, right? With the stuff we're going to talk about today, if I pull the plug, the memory loses its contents immediately, so any caching I've done in memory has potential consistency implications for the disk itself. With flash, it's like, oh, okay, it's still there. So that's good. And again, I don't know how those drives actually work; I'd be interested in finding out.

But this is, again, the hierarchy of storage on your computer: registers at the top, which take a couple of cycles to get at, down to big, slow spinning disks at the bottom. A lot of what we do in operating systems is find ways to deal with this hierarchy and figure out where to put things to optimize performance. So, again, we're going to use a cache, and what we're going to do is use memory as a cache for the file system. Until recently, before flash, memory was the smaller, faster thing that sat between the higher-level processor caches and the disk.
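Back to the multi-level index for a second: here's a minimal sketch of the arithmetic, assuming ext2-style constants, 4K blocks, 12 direct pointers, 4-byte block pointers. The function and the constants are illustrative, not taken from any real kernel:

    #include <stdio.h>
    #include <stdint.h>

    #define BLOCK_SIZE 4096
    #define N_DIRECT   12
    #define PTRS_PER_BLOCK (BLOCK_SIZE / sizeof(uint32_t))  /* 1024 */

    /* How many block reads to reach the data block holding `offset`,
     * counting the data block itself. */
    static int accesses_for(uint64_t offset) {
        uint64_t lbn = offset / BLOCK_SIZE;        /* logical block no. */
        if (lbn < N_DIRECT)                             return 1;
        lbn -= N_DIRECT;
        if (lbn < PTRS_PER_BLOCK)                       return 2;
        lbn -= PTRS_PER_BLOCK;
        if (lbn < PTRS_PER_BLOCK * PTRS_PER_BLOCK)      return 3;
        return 4;                                  /* triply indirect  */
    }

    int main(void) {
        /* Cost grows slowly with file size, not linearly. */
        printf("%d\n", accesses_for(4096));        /* direct: 1        */
        printf("%d\n", accesses_for(1 << 20));     /* singly indirect  */
        printf("%d\n", accesses_for(1ULL << 32));  /* doubly indirect  */
        return 0;
    }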
Normally, when we talk about the memory that's used to cache file system data, we call that a buffer cache. On modern systems, for example on Linux, when the kernel boots, it has a certain amount of memory to manage, and now we're going to start talking about a second use for that memory. Up until this point, the only use of memory we had discussed was for process address spaces. What modern systems do, systems with what's called an integrated memory management and buffer cache, is allow memory to be used flexibly, either as part of the file system cache or as part of process address space.

The other way you could imagine doing this is that at boot time, you could statically divide your memory into two pieces. You could say, okay, I've got four gigs on this system: one gig is for the buffer cache, three gigs are for process address spaces. The problem with that, of course, is that you've done a static allocation. If you're running a database server, for example, which might do a lot of file I/O and be really, really file system heavy, you might have a large portion of memory, set aside for process address spaces, going unused, that you would love to use for the file cache. Linux actually has a parameter you can feed the kernel to determine how it strikes this balance: how much the kernel should prefer to use memory for process address spaces as opposed to the file system cache. But at runtime, as your system is running, most systems are making this trade-off dynamically. So if you start running big file system workloads, the system will essentially tell the virtual memory manager: trim. I want you to trim process pages, swap them out to disk, and I'm going to use that memory not for the memory manager, but as part of the file system cache.

All right, so again: I use memory as memory, and I also use it as a file cache. As you can imagine, these two uses are competitive. The more I use for the cache, the less I have for process address spaces. If I don't make this trade-off carefully, say I over-provision my file system buffer cache, I may start thrashing in my memory system, because to the memory system it looks like I'm running on a machine that doesn't have much memory. The opposite problem is that, if I'm file system bound, I can make file system operations very slow by giving the file system a very small cache.

And on Linux, the parameter I mentioned is called swappiness. Swappiness, it's a cute name, tells Linux how swappy to be. What does it mean to be more swappy? Spencer? Yeah, essentially it tells the memory manager: swap things out harder. Swap more. Be more swappy. Run your page replacement algorithms, find pages to evict, and get them out of there. And when we talk about this, you can think about the system as it's running: processes are trying to use more memory, and sometimes we call that memory pressure.
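Since swappiness came up: on Linux that knob is literally a file. Here's a trivial sketch that reads it; the /proc path is real, and the default is 60 on many distributions. Writing a new value to it as root, or running sysctl vm.swappiness, changes how aggressively the kernel reclaims process pages in favor of the cache:

    #include <stdio.h>

    int main(void) {
        FILE *f = fopen("/proc/sys/vm/swappiness", "r");
        if (!f) { perror("fopen"); return 1; }
        int swappiness;
        if (fscanf(f, "%d", &swappiness) == 1)
            printf("vm.swappiness = %d\n", swappiness);  /* often 60 */
        fclose(f);
        return 0;
    }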
So, memory pressure: how much memory is the system trying to use? As the system runs, it's touching new pieces of code, it's allocating data, so it's trying to use more memory; and then how much downward force is the memory manager placing on that, by aggressively finding pages that haven't been used and swapping them out? That's swappiness. Be more swappy.

Now, when we start talking about the design of the buffer cache, there are some interesting design choices about where the buffer cache should go with respect to the file system itself. Remember, when we talked about file systems, we talked about the fact that file systems are decoupled from the rest of the kernel in most cases. This is typical on Unix systems, but even on Windows you might run multiple file systems: you might have NTFS on your main drive, some network file system on a different partition, a flash drive that for some reason is still using FAT32, whatever. So most systems provide what's called a virtual file system interface: VFS, which is what it's called in OS/161. If you're doing assignment two, you're getting quite familiar with this.

The idea behind VFS, and some of you have probably scratched your heads and maybe torn a little hair out trying to figure out what happens when you call VOP_READ, is that this is one of those cases where we're forcing C to do something C doesn't like to do very much, which is pretend it has object-oriented features. It doesn't, but we can fake them using ugly function pointers. If you've tried to trace these calls, you've probably gotten very frustrated. But what's actually happening is that the virtual file system allows these calls, VFS open, VFS close, and then VOP read and VOP write, to map down flexibly to multiple different implementations. Your current system, for example, has a file system implemented inside OS/161 that we used to use for an assignment called assignment four, which doesn't exist in this particular course, so you don't really get any experience using it. And if you've wondered how you can read and write the files that are actually on your host file system, it's because there's a thin layer called emufs, an emulated file system, that passes the read and write calls generated by your kernel down to the Unix file system your kernel runs on top of.

Essentially, what happens with both of these is that they issue, well, you can imagine if these were sharing the same disk; that wouldn't happen on your system, because emufs uses the underlying Unix file system directly, but on a real system you might have multiple file systems mounted on the same disk, using different partitions. So these calls start out fanning apart, but what they eventually end up doing is just calling read block and write block. That's the low-level disk interface. If I have a file system mounted on top of a disk, that's what it's eventually doing.
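Here's a stripped-down sketch of that objects-in-C trick. It mirrors the idea behind vnodes and VOP dispatch; these are not OS/161's actual declarations, and emufs_read here is just a stand-in:

    #include <stdio.h>

    struct vnode;                       /* one open file/object       */

    struct vnode_ops {                  /* per-file-system "vtable"   */
        int (*vop_read)(struct vnode *v, void *buf, unsigned len);
        int (*vop_write)(struct vnode *v, const void *buf, unsigned len);
    };

    struct vnode {
        const struct vnode_ops *ops;    /* which implementation to call */
        void *fs_private;               /* file-system-specific state */
    };

    /* The generic layer never knows which file system it's talking to. */
    #define VOP_READ(v, b, l) ((v)->ops->vop_read((v), (b), (l)))

    /* One concrete implementation, e.g. an emulated file system: */
    static int emufs_read(struct vnode *v, void *buf, unsigned len) {
        (void)v; (void)buf;
        printf("emufs_read: %u bytes\n", len);  /* would call the host OS */
        return 0;
    }

    static const struct vnode_ops emufs_ops = { emufs_read, 0 };

    int main(void) {
        struct vnode file = { &emufs_ops, 0 };
        char buf[64];
        return VOP_READ(&file, buf, sizeof buf);  /* dispatch via pointer */
    }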
So you can imagine sitting there at the file system level and watching these calls fan out to different file system implementations. But at the end of the day, what are they going to do? They're going to read and write disk blocks. They'll read and write those blocks differently depending on which file system they are, the blocks will have different structure, et cetera, et cetera, but that's what they're doing at the end of the day. Does this make sense? Low-level disk interface, high-level file system interface, multiple file system implementations in between.

So one approach is to put the buffer cache in between, below the virtual file system. When we talk about the buffer cache, what we're going to start talking about is a piece of code that intercepts these operations and, instead of allowing them to continue to the underlying file systems, reads and writes data from memory. So what would happen here is: open and close typically don't affect the buffer cache directly, but reads and writes do. When I start reading and writing data, rather than actually sending the read call to a file system implementation, the buffer cache would present the contents that were at that point in the file. And you might wonder, well, how do the contents get there in the first place? I'll talk about that in a second. But one design approach is to say: let's put my buffer cache below the virtual file system, and handle file system operations at that line.

A second approach, which I think is more common, is to have the buffer cache below the file system implementation itself and above the disk. Now the buffer cache is storing disk blocks, whereas the first one has to store information about actual files. Does this design choice make sense? I feel like I've gotten ahead of you, Jeremy. Yeah, okay, so let me be careful here. When we talk about disk caching, and we talk about the buffer cache, we're talking about operating-system-level caching of file operations. As Jeremy has hinted, disks themselves also do a lot of caching. Your big slow disk, whether it's a flash drive or a spinning medium, has buffers on it itself, and the disk will do things like this: if you ask it to read a block, it'll read a huge chunk of the disk into a memory buffer on the disk, and then, hopefully, the next time you ask for a block, it's already there. So there are all sorts of things disks are doing internally to try to improve I/O.

But what we're talking about is something the operating system does that prevents the disk from ever being touched. What I want is that when I issue a read or a write, it doesn't go down to the disk at all; the disk never knows. If I've read that block recently, it's in the cache, and I serve the result from the cache rather than sending the request to the disk. But disks do a fair amount of caching, and that can really affect disk performance; one of the things you can look for when you buy a disk is how much internal caching it does, and how good it is. And actually, I have no idea how disks do that.
I could make some guesses, and by the time you're done with this class, so could you. But that's not what we're talking about; we're talking about the operating-system-level cache. Any other questions? This is a good question, yeah: on the disk? Yeah, I'm guessing that disk caches, as you increase their size, probably start to lose effectiveness, because part of disk caching is trying to predict which blocks are going to be used next. And what we'll see, as you'd imagine, is that the higher I get in the system, the closer to the application, the better the chances of knowing which disk block will be used next. The hardware doesn't necessarily have much visibility into what's going on. The hardware sees: read from block 512, read from block 513. It doesn't know that block 512 contains some inodes and block 513 contains some data. All it knows is, oh, okay, I've got to read that block, and maybe I'll prefetch some data that's close by.

So as you get down closer to the disk, and this is actually one of the challenges with this design decision, you lose information. Up here, above the file system, I have more visibility into the semantics of file operations: if I get a read at a certain offset, I know which file the read came from and what's actually happening. Down here, below the file system, all I see are blocks: read from block 513, read from block 514. Some of that information is lost. And this is essentially what drives the design decision here.

So, above the file system, what I would cache in my buffer cache is entire files and directories, and the interface to the buffer cache is the same as the interface to the virtual file system: the file system calls you're pretty familiar with by now, because you've been exposing them to applications through the system call layer. So let's talk about how these calls would work with a buffer cache that sits above the file system. When a process calls open, the buffer cache has no information about the file at that point, so I need to pass the open call down to the file system implementation, as we normally would. What about when a read occurs? How do I handle a read? This is a cache; I get a request to read some data from a particular file; what do I want to happen? Well, I want it to be in the cache. I want to say: hey, I've got that data from this file lying around in memory; I've allocated some memory for it, and I've already read that data. So how does data get into the cache? Navi? Yeah, it's got to come from the disk at some point. There's no way for data to get into the cache unless I actually allow the file system to perform the operation. So what happens the first time I do a read from a file? Yeah, basically, I need to allow the read call to proceed.
I pass it down to the underlying implementation, and then what comes back up? I tell the file system: do a read. What's returned by the file system? Not a trick question. What's that? The data, right. And remember, I'm sitting on both paths, so as the data comes back up, what do I do with it? I put it in my cache. I say, okay, there was a read from this file; now I load the data into my cache. Now, let's say there's another read from this file. What do I do? Greg, I know, too bad. After I've done one read from the file, where would I like to serve future reads from? Yeah, from the cache. If the file's in the cache, I just return the cached contents.

What about writes? I get a write to a file; what do I need to do? Yeah, so first, if the file's not in my cache and I'm not going to cache that data, I need to pass the write down to the underlying file system. And what if the file is in my cache? Yeah, I just modify it in the cache.

So let's go back to our observation: caching is directly opposed to consistency. Why? We've gotten to the point where this should become apparent. Sam, what problem does this create? Well, no, let's just say the writes and reads proceed completely naturally. Jen? Right, that's exactly right. If what I'm doing is only ever updating contents in the cache during writes, then, as Jen pointed out, the disk contents are stale. The disk hasn't seen the updates to the file. You've been writing all these changes to exec.c, and the changes aren't on disk; the changes are stuck in the cache. So if the machine dies, or my operating system crashes, the cached data is lost, and the disk is now in an inconsistent state. Actually, the disk may not even be inconsistent; the disk is just wrong. The disk has the version of the file you were editing an hour ago. We'll talk about how to address this.

What do I do when I close a file? This goes back to our observation about why I might want open and close in the first place. Yeah, I need to pass the close down to the file system. And what else should I probably do? I should probably take the contents of my cache, say I haven't told the file system about any of the writes that have taken place, and now would be a good time, right? The file is closed. How many people have ever called flush on a file handle in C, or sync, or any of these operations? One of the things flush does is tell the cache: send this content to disk. If you've been holding this content in memory somewhere, please actually write it to the disk now, because I would like that. Jeremy? It's a good question. I suppose it might block the process that called close, but it will probably try not to block other things, and the whole system isn't going to stop just because I have some I/Os to finish. Yeah, sure. So we'll talk about this; it's particularly problematic with writes. Remember: do reads create any of this staleness problem? No, right?
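Putting that open/read/write/close walkthrough together, here's a minimal sketch of the cache-above-the-file-system flavor, at whole-file granularity: fill on the first read, buffer writes in memory, flush on close. fs_read and fs_write are stand-in stubs for the file system underneath; none of this is a real API:

    #include <string.h>

    #define FILE_MAX 4096                  /* toy: the whole file fits here */

    struct cached_file {
        char data[FILE_MAX];
        int  valid;                        /* contents loaded from disk? */
        int  dirty;                        /* writes not yet on disk?    */
    };

    /* Stand-ins for the real file system below the cache. */
    static int fs_read(char *buf, int len)        { memset(buf, 0, len); return len; }
    static int fs_write(const char *buf, int len) { (void)buf; return len; }

    int cache_read(struct cached_file *cf, char *buf, int off, int len) {
        if (!cf->valid) {                  /* first read: let it through   */
            fs_read(cf->data, FILE_MAX);   /* ...and keep a copy in memory */
            cf->valid = 1;
        }
        memcpy(buf, cf->data + off, len);  /* later reads never touch FS   */
        return len;
    }

    int cache_write(struct cached_file *cf, const char *buf, int off, int len) {
        if (!cf->valid) {                  /* simplification: fill first   */
            fs_read(cf->data, FILE_MAX);
            cf->valid = 1;
        }
        memcpy(cf->data + off, buf, len);  /* the FS never sees this write */
        cf->dirty = 1;                     /* the disk is now stale        */
        return len;
    }

    void cache_close(struct cached_file *cf) {
        if (cf->dirty) {                   /* close/flush is when buffered */
            fs_write(cf->data, FILE_MAX);  /* writes finally reach disk    */
            cf->dirty = 0;
        }
    }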
On some level, reads are completely free, because they don't change the contents of the disk. I can cache reads as aggressively as I want and never worry about consistency. Writes, on the other hand, I need to be careful with, and we'll talk about some different approaches to caching writes, or not caching them, depending on the situation.

So, if I put my cache above the file system, what are some advantages and disadvantages of this approach? Yeah. Well, okay, I'm assuming it's going to be fast regardless. Either one of these approaches, I mean, it's too bad you don't have to do assignment four, and I'm sure you don't feel that way, because one of the parts of assignment four used to be implementing a buffer cache. It was one of those things where, when you do it, it's actually pretty shocking how much of a performance improvement the buffer cache provides. You aren't used to this, because your systems already do caching nicely, but when you ran SFS on a bare disk and then added a buffer cache, it was like, whoa. So either placement is going to make a meaningful performance impact.

But what's the nice thing about doing it up here, above the file system? Yeah, Joe? Yeah, I have more understanding of what's happening: I see files and offsets. Which is nice, because I might say, hey, if I've seen a read at a certain offset into a file, maybe I should go get some other disk blocks from that file as well and pull them into the cache preemptively. We had this idea of doing read-ahead for certain types of files.

So this is a good question: I see information about files, I see the semantic information. What do I not see? What can this type of cache never cache? Think about it: all I see when I do a read at this level is the read operation going down to the file system and the contents coming back up. What do I miss? What happens at the disk level that I'm never going to be able to cache? Paul? Yeah, I never see any operations on disk structures, because those aren't part of the contents. The inode doesn't come back with the contents. All the inodes and all the on-disk data structures, the bitmaps, the superblocks, all that stuff is only used to perform these file system operations. It's used below my visibility, so I never see it. And we talked before about how those parts of the disk get used a lot: the superblock is used all the time, the inodes get used very, very regularly, and many, many different file operations have to touch those metadata structures. So this is potentially a big problem.

And then the other thing that happens is that I hide a lot of file operations from the file system. These two things together essentially make this approach kind of a loser. Despite the fact that I do see file operations and their semantics, the problem is that if I'm caching a file operation, it's never actually passed to the file system; the file system has no idea it ever happened.
So if I cache a write, the file system doesn't even see it. The file system may want to update some of its own internal structures in some of these cases, and I'm preventing it from doing that, because it never sees the operation. Sean, did you have a question? Yeah. Yeah, so I certainly have to do some careful synchronization. The buffer cache, regardless of where it's placed, is a shared data structure that's going to be accessed by multiple threads running in the kernel, on behalf of a bunch of different processes, concurrently.

But, at least on Unix, and hopefully you've been thinking about this as part of assignment two: what sort of semantics, what sort of guarantees, does Unix provide to multiple applications that have opened the same file independently? What's that? If two processes open a file, and these processes are not related, there is a special case here that you're working on, but if two unrelated processes open a file independently and start reading and writing it, are there any consistency guarantees provided by the file system? Well, I'm not talking about the VFS structures. Here's the scenario: two processes open the file, and then, at the same time, or overlapping, one of them calls read for 256 bytes and the other one calls write. Are there any guarantees as to how those are going to work out? Will the read see the data that was written by the write call, or not? Yeah, in this particular case, there aren't really any guarantees provided by the system, and it really depends on who wins: did the read happen first, or did the write happen first? If processes don't coordinate this properly, they're going to end up reading garbage. In general, trying to use files to do IPC is difficult, and that's why we provide other IPC mechanisms. But that's a good question.

So again, those two things together essentially make this approach a loser. All right, so let's talk about the other alternative, which is putting the cache below the file system. If I position the cache below the file system, I wish I had another copy of that nice diagram, what is the cache actually caching? Above the file system, I was caching files and directories, because that's what I saw operations on. Below the file system, what do I see operations on? What are the file systems operating on? The file systems see these calls, and eventually what do they do? What's the low-level interface they're using? What do I need to cache? Yeah, I cache disk blocks. Whatever the disk block size is, and on ext4 we said it's 4K, which is nice: one page per disk block. In general, the disk block size is probably going to be some multiple or divisor of a 4K page, so this is nice; I don't have any fragmentation. So I have a system that caches disk blocks. What's the interface to this buffer cache? Yeah, read and write what? Yeah: read block and write block.
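A minimal sketch of that block-level interface: the cache is keyed by block number, so anything the file system reads or writes, data blocks, inodes, bitmaps, the superblock, can land in it. disk_read and disk_write stand in for the driver, the placement policy is deliberately dumb, and eviction policy and locking are elided:

    #include <string.h>

    #define BLOCK_SIZE  4096
    #define CACHE_SLOTS 64

    struct buf {
        long blockno;            /* which disk block this slot holds; -1 = empty */
        int  dirty;              /* modified since it was read from disk?        */
        char data[BLOCK_SIZE];
    };

    static struct buf cache[CACHE_SLOTS];

    /* Stand-ins for the disk driver. */
    static void disk_read(long b, char *d)        { (void)b; memset(d, 0, BLOCK_SIZE); }
    static void disk_write(long b, const char *d) { (void)b; (void)d; }

    void cache_init(void) {
        for (int i = 0; i < CACHE_SLOTS; i++) cache[i].blockno = -1;
    }

    struct buf *read_block(long blockno) {
        struct buf *b = &cache[blockno % CACHE_SLOTS];  /* trivial placement  */
        if (b->blockno != blockno) {                    /* miss               */
            if (b->dirty)
                disk_write(b->blockno, b->data);        /* write back evictee */
            b->blockno = blockno;
            b->dirty = 0;
            disk_read(blockno, b->data);                /* fetch from disk    */
        }
        return b;                                       /* hit: no disk I/O   */
    }

    void write_block(long blockno, const char *data) {
        struct buf *b = read_block(blockno);
        memcpy(b->data, data, BLOCK_SIZE);
        b->dirty = 1;                                   /* defer the disk write */
    }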
So what I'm doing is intercepting the low-level calls that are issued by the file systems themselves. Remember, a file system gets a call, read, write, open, close, and the file system implementation translates that call into a series of low-level disk operations: read block, write block. That's essentially how file systems work. Those are the two operations they have, maybe one or two more they can issue to the disk, but essentially the way they keep their data, and their on-disk data structures, up to date is by calling read block and write block.

So let's talk about this approach. And this is kind of a remember-what-was-on-the-slide-five-minutes-ago question, because the pros and cons mirror the ones for the other approach. Lee Kyung, what's a pro of caching file, sorry, disk blocks? What's that? Okay, I'm not sure I like that answer. I think you're getting close. Yeah, Paul? Yeah, remember: anything the file system uses is a disk block. I can cache anything that goes to disk. I can cache inodes, I can cache the superblock, I can cache on-disk data structures like the block bitmaps, I can cache all of it. Which is awesome: there is nothing the file system uses on disk that I can't put in the cache. That's a big plus.

What's another pro of this approach? Well, what was another one of the problems with the previous approach? Okay, that might be a con if you're putting it in the con category, but why? Before, what would happen is that sometimes the blank would not see the blank. So who will see all the file operations now? Well, okay, the cache probably will, but who else? Yeah, okay, I'll accept that answer: the file systems are going to see all the file operations. Any time you call read or write, the file system will know. So if the file system wants to implement consistency semantics of its own, and we're going to talk about how file systems keep data consistent, part of the requirement for keeping data consistent is that the file system sees all the file operations, even the ones that hit the cache. Even if a write or a read hits the cache, there are many cases where, in order to keep its state up to date, the file system still needs to know that the write happened. With my cache above the file system, sometimes the file system didn't even know a write happened, because it hit the cache and the call never made it down. In this design, all the file operations are seen by the file system, and this is pretty important; we'll come back to it when we talk about journaling and the other things file systems do to maintain consistency.

Now, somebody pointed out one con, which is that it's more difficult to see the semantics of files and the relationships between them. The cache itself may not know that two disk blocks are related; if I hand the cache a disk block to put in the cache, well.
The cache can't necessarily say, okay, you asked for offset zero in the file, so I'm going to pull everything from offset zero to offset 1024 into my cache, because all it sees is the disk block; it has no idea what that block is. So, while I can cache all these on-disk data structures, I might not know what they are. I see a block: is it an inode? Part of the superblock? Part of some other on-disk data structure? Is it a data block? And even if it is a data block, is it close to anything else? Are there other related data blocks nearby?

Somebody mentioned ext4 extents before. What's one way that ext4 extents, or other strategies file systems use to put blocks close to each other, might help here? When ext4 creates space for files, we talked about how it doesn't allocate one block at a time; it takes a big chunk of blocks and associates them with the file. That causes some internal fragmentation, because it's not guaranteed that all of the extent will be used. But what impact might it have on caching? No clue? I just said that the cache can't necessarily make assumptions about the locality of disk blocks: I see a read for block 510, and I don't necessarily know what that is. Yeah, so with extents, the cache might be able to make those assumptions after all. If most of my data blocks are located close to other data blocks from the same file, then the cache can do more aggressive caching: if it sees a read from block 512, it might say, give me all the blocks close to 512, and load them all into the cache, because the likelihood is that they're related to each other in some way. Extents make this more likely, because extents mean there are big chunks of blocks on disk that are related to each other. And this is what modern operating systems do: we have a disk block buffer cache. Yeah? No, I think it's really because caching interferes with consistency, and there are ways to work around that; and the other big win here is that I can cache metadata, which is really important.

Now, when I start talking about metadata like the inodes and superblocks, the caching semantics of that metadata might be different. For example, a file system might say: whenever there's a write to an inode, or to the superblock, or to an on-disk data structure, flush that write to disk immediately. It will never allow those writes to sit in the cache. Because it's one thing, as Jen pointed out before, if the file contents haven't been updated for a few minutes while I've been editing the file. That might be okay: if the system dies, I get a slightly older version of the file. But when I start talking about metadata operations that affect these critical file system data structures, if those don't get to disk immediately, I might boot up and be unable to parse the file system at all. My file system data structures are broken, and I might have to spend a lot of time repairing things.
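That policy split, as a sketch: metadata blocks get written through to disk immediately, while data blocks can sit dirty and be flushed later. This is a standalone variant of the write_block sketch above; the enum and the block-kind tag are purely illustrative, and disk_write is again a stand-in for the driver:

    #include <string.h>

    #define BLOCK_SIZE 4096
    enum block_kind { BK_DATA, BK_METADATA };   /* hypothetical tag */
    struct buf { long blockno; int dirty; char data[BLOCK_SIZE]; };

    void disk_write(long blockno, const char *data);  /* driver, elsewhere */

    /* Metadata is written through so a crash can't leave the on-disk
     * structures unparseable; data writes are deferred for speed. */
    void cache_write_block(struct buf *b, const char *data, enum block_kind kind)
    {
        memcpy(b->data, data, BLOCK_SIZE);
        if (kind == BK_METADATA) {
            disk_write(b->blockno, b->data);    /* hits disk right now     */
            b->dirty = 0;
        } else {
            b->dirty = 1;                       /* waits for a later flush */
        }
    }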
All right, I'll just end here today, and on Wednesday we'll talk about consistency. We talked before about the fact that objects in the cache, dirty cache blocks, anything that hasn't been synced to disk, are gone when the system fails. When the system fails, what you have left is what's on disk; what's gone is the cache. And remember that every file system operation involves modifying multiple disk blocks: updating the inode, changing various on-disk data structures like the bitmaps, actually writing the data blocks, associating the data blocks with the file. All of these things have to happen, and I'll leave you with this to think about: what if some of them get stuck in the cache and never make it to disk? This could already happen even if you flushed everything immediately, if the system died between, say, steps two and three. What caching makes worse is the window: the longer things stay in the cache, the longer you have to unplug the machine and create some sort of problem on the disk. The shorter things stay in the cache, the smaller your window for creating some ugly problem. And this trades off directly against performance, because for performance, I want things to stay in the cache as long as possible.

So why do I cache writes at all? Say there's a write to a disk block; why not just write it to disk right away? Well, writing is slow, but so what? I'm going to have to write it at some point. What am I hoping will happen in the cache before I have to write it to disk? Yeah, a read, but what else could happen that would be even better? Paul, just think about standard cache semantics. I have a write, I haven't written it to disk, why not? Yeah: another write. Remember, I've got several different updates to make to the inode. If I write the inode now, I've done one write; then I associate the data blocks, and I've done another write; then maybe later I update the modify time, and now I've done three writes. If those writes stay in the cache, then hopefully I can take those three writes and amortize them into one. But again: the longer I wait, the more writes I can combine; the shorter I wait, the more likely it is that the file system is consistent when there's a failure. We'll talk about ways to work around this on Wednesday. See you then.
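One parting sketch of that amortization argument: with write-back caching, three separate logical updates to the same cached inode block cost one disk write when the block is finally flushed. Everything here is toy code; disk_writes just counts how often we would actually touch the disk:

    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE 4096

    static int disk_writes;                    /* how often we touch disk */

    struct buf { int dirty; char data[BLOCK_SIZE]; };

    static void update(struct buf *b, int off, const char *bytes, int len) {
        memcpy(b->data + off, bytes, len);     /* logical write: memory only */
        b->dirty = 1;
    }

    static void flush(struct buf *b) {
        if (b->dirty) { disk_writes++; b->dirty = 0; }  /* one real write */
    }

    int main(void) {
        struct buf inode_block = {0};
        update(&inode_block, 0,  "size",  4);  /* adjust file size     */
        update(&inode_block, 8,  "ptrs",  4);  /* link new data blocks */
        update(&inode_block, 16, "mtime", 5);  /* update modify time   */
        flush(&inode_block);
        printf("3 logical writes -> %d disk write(s)\n", disk_writes);  /* 1 */
        return 0;
    }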