 All right. Good morning, everybody. Let me do our stand-up thing since it's Wednesday. Yeah, that's right. You might think you're sick, but I think you can stand up. Ladies, participation points are at stake? OK. For the slowest stand-uppers? All right, let me sit down. All right, so, on Monday we talked about file system caching. We talked about a couple of design decisions related to caching, and we talked about, in general, why we would cache file system information: because memory is the small-fast thing that makes the big slow thing, namely the disk, look faster. It's been a long day. So today we're going to talk a little bit about the implications that that caching has on file system consistency. So specifically, because we're caching information in memory, what happens if there's a failure? We'll talk about some things that can go wrong once we start caching information, and we'll talk about one approach, which is called journaling, that many modern file systems use to be able to recover completely and rapidly after a failure. It's pretty similar to Monday. I'm really going to work on the assignment three auto-grading today. I promise. This keeps being up there. I'm like, ah, it should be done today or tomorrow, or today or tomorrow or the next day or the next day. But yeah, so I'm going to do that right after class. And it's not a big deal. But I've also started up an extra grading machine. So assignment three grading takes even longer than assignment two, potentially. And potentially even longer if your system is very slow and messed up. So yeah, that can take a little while. So we have a little bit more resources that are now being devoted to grading, and we have a little bit more that we can devote to it as well if we need to. So hopefully not too many machines will have to run the 421 tests. 
All right, so any questions on file system caching before we go on? Any questions on this? So what was our trick to make a big, slow thing look fast, Greg? Yeah, so we used the cache, right? Or putting a smaller, faster thing in front of it. We talked about that being memory. Today maybe that could be like an SSD or something like that. But anyway, the point is we make our big, slow disk look faster by putting a smaller, faster thing in front of it. For consistency, what becomes interesting here is that these two things have different failure semantics. Stuff that's on disk is permanent. It's persistent. You write it, it stays there. Stuff that's in memory does not. But it is faster. And so that's why we use it for caching, right? And remember, we call the file system cache the buffer cache. So now we're talking about two different uses of memory, right? So on modern systems, how does the operating system make a decision about how to allocate memory at runtime? First of all, what are the two things that I could be doing with memory? Sirach. One of them's up on the slide, and the other one is, yeah, so I could be using it to hold what? Yeah, address spaces, right? Remember, I could be having pages that are assigned to process address spaces, or I could have pages that are in use in the buffer cache, right? And we talked a little bit about how modern systems have to make this trade-off at runtime, and they try to make it based on the needs of the system. So I use memory as memory, and I also use it to cache file data. These two types of memory usage are competitive. And depending on how I allocate things, I can ruin performance, right? So if I have a very file-intensive workload, but I use a lot of memory for address spaces, then the system's going to be slow. If I do, actually, I think I just said the opposite. So if I have a lot of memory in use as memory, but I actually have a lot of file IO that I need to cache, then I can make the system quite slow. 
On the other hand, if I reduce the amount of memory I use for address spaces, I can create thrashing, right? So either one of these problems. So it's kind of interesting, right? Because what do both of these sort of bad cases have in common? Why is the system slow in either case? Sam? Yeah, so in both cases, the system is slow because of the disk, right? So it's kind of funny. I mean, we talked about thrashing, right, a case where all the swapping activity makes my system look slow. And then you can also have cases where improper file system caching, or not enough file system caching, makes the system look slow. The disk is really slow, right? When your system runs at the speed of your disk, it's really slow, right? So in both these cases, essentially, what's happening is the disk is now starting to bottleneck the rest of your system. And the system will very, very rapidly start to feel very slow once the disk becomes the bottleneck, either because of too much swapping or because of too little caching in the file system itself, right? So in general, again, the operating system is really doing its best in both cases to avoid using the slowest part of the system, right? Don't be bottlenecked by the slowest thing on the system, right? And we talked about, on Linux, this cute little parameter called swappiness that controls how aggressively the system tries to reduce the amount of memory that's in use for address spaces, right? And that memory gets added to the buffer cache, right? So swappiness determines sort of how much downward memory pressure the system applies to try to increase the amount of memory that can be used for the buffer cache, right? So we talked about two different design choices in terms of where we could locate a buffer cache. Where were those two places? What's one of them, Sammy? Yeah, so I could put it down here, right? And at that point, what is my file system caching? 
Richard, is your name Robert? OK, I get confused. Is it Richard or Robert? OK, Robert. Disk blocks, right? So if I put the cache down here, then what I'm caching are disk blocks, right? And all I see are disk blocks, right? What's the other place I could put the cache? Alyssa. Yeah, so I could put it in between the virtual file system and the file system implementations. And at that level, what would I cache, Dan? Yeah, my cache interface is essentially defined by the file system interface, right? So you have to think about where I put the cache. The cache has to expose the same interface as the thing that it's caching, right? So if I cache down here, the cache has to expose the disk interface, which is reading and writing blocks. If I put it here, the cache has to expose the file system interface, right? So what was kind of a serious problem with putting the cache up here, actually? Yeah, so the file system potentially doesn't see all the reads and writes, right? Which could have some problems for consistency. And when we talk about the approach to consistency we're going to cover today, you guys should think about the implications that that style of caching would have on it, right? What was another problem? Right. Yeah, so this is that same problem, right? It's essentially that because the file system hasn't seen all the writes or reads, it may not have maintained metadata that's important. But what else about metadata was I concerned about here? Jen? Yeah, remember that none of these operations directly access or return all of this internal file system structure, right? Inodes, superblocks, free bitmaps, et cetera, all that good stuff that lives on the disk. It's things that are necessary for the file system. It's frequently information the file system has to touch a lot, right? We talked about writing to a file and how that might have to make several modifications to the inode. 
So this might be stuff that I'd be really interested in caching, but if I do it up there, I never see it, right? The nice thing about putting the cache down here is I can cache everything. Anything that's stored in the disk block I can cache, right? What was one potential drawback, though, of caching above the disk? Yeah. Well, I see, I mean, on some level, I see all the actions, right? Because I see them after they've been translated to reads and writes to disk box. Harish, what kind of information don't I see? Varun, you want to help them? What kind of information do I miss down here? Spencer? Yeah, so for example, if I see up here, right, what am I seeing? I see read from this file at this offset, right? And so if I wanted to add some other blocks to the cache that were nearby, right, and maybe some blocks that were a few offsets away in case I was doing some sort of, if I was reading the file from one end to another, I could do that more easily, where down at the disk interface, that's more difficult to do, right, Jeremy? Probably, yeah. And I mean, this example is a little bit simplistic, right? So for example, I could, so for example, if I wanted to put a cache here, let's say I wanted to put a cache here, but I also wanted the underlying file systems to still know about every read and write that occurred. What could I do? You're building this system, right? You've decided I want to do a cache below the file system layer. What could I do, Spencer? Yeah, yeah. So for example, I might change the file system interface. I might widen it a little, but I might add some calls. I might have a call that says, hey, I'm just letting you know that I satisfied this write from the cache, right? That I'm not, like, I'm going to tell you that these, I might have a call that says, hey, these are disk blocks that were modified in the cache, right? 
You don't have to write them to disk because I've got them in the cache, but I'm just letting you know that they were modified, right? So the file system can do things, same thing with reads, right? So, hey, I satisfied this read from the cache, right? Writes turn out to be more important, right? Because why, Andrew? Maybe, Bart, why am I more worried about writes when it comes to caching? What do writes do that reads don't? This is not a trick question, Robert. Yeah, writes change the disk, right? Reads aren't going to modify the disk, right? Writes are going to modify the disk. So when we start talking in a second about caching and consistency, we think much more about writes, right? Because if I cache a read, that's great. I mean, to some degree, there's no impact on correctness, right? Whereas when writes get stuck in the cache, that's when problems can happen. Yeah. Sure. No, I mean, why would the memory manager do that? No, I'm saying that what Spencer was proposing before, I think, right, is this idea that what I might want, if I put the cache here, was a way to still inform the file system that there had been operations on files in the cache, right? Even if I don't push the contents down to disk, right? We'll see in a minute, for example, that file systems actually record some information about these operations as they're going, as a way to survive failures, right? If I put my cache in here and, for example, I just stop write operations from going down to the file system at all, then the file system is potentially now missing information that it might need in order to survive failures, right? But instead of saying, hey, you have to change these blocks on disk, I might say, these blocks were changed in the cache, right? So if you want to make a note of that somewhere or something, then that's fine, right? But I've cached these, I've sort of allowed this write to lodge itself in the cache, right? 
And I haven't pushed it to disk yet. You also might need an interface so that the file system can actually pull dirty blocks down from the cache, right? Because, I mean, when we start to think about policies regarding when I actually write things to the disk, what's another challenge with having the file system buffer cache up here, right? What do I end up kind of doing? Paul, want to take a guess? The question is, when I start thinking about policies for, for example, when I write things to disk, right? I want to let some writes sit in the cache, but from time to time, for consistency, I might actually need to flush stuff to disk, right? So let's say I put the cache right here. What am I kind of doing in a certain way? What am I forcing all the file systems below me to do? Or what am I making it more difficult for them to do? That's part of it. I'll answer this question. So what I'm doing is essentially, whatever policy the cache implements about when to write out information is now kind of imposed on all the file systems below me, right? So file systems now can't make their own choices about when to write information out to disk, right? I'm kind of making it for them, right? I might add an interface so that they could tell me how to do it, but then it starts to get a little bit ugly, right? So down here, the idea is that if I'm reading and writing disk blocks, then the file systems above me essentially have a lot of control over when things get written, right? And usually, what we'll talk about is caches have ways to allow the file system above them to say, please write this block to disk immediately, right? You can cache this block, but I want this disk write to go to disk immediately, right? All right, so we talked about these two locations. All right, any other questions about caching before we talk about consistency? 
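To make the trade-off concrete, here is a minimal sketch of the idea just discussed: a buffer cache that sits below the file system layer but still notifies the file system about writes it absorbs, and lets the file system force individual blocks straight to disk. All the class and method names here are invented for illustration; this is not any real kernel's API.

```python
# Hypothetical sketch: a buffer cache below the file system layer with a
# "widened" interface. It can absorb writes (notifying the file system that
# a block was dirtied in the cache) or write a block through immediately
# when the file system asks for that.

class Disk:
    def __init__(self):
        self.blocks = {}                      # block number -> bytes actually "on disk"

    def write_block(self, bn, data):
        self.blocks[bn] = data

    def read_block(self, bn):
        return self.blocks.get(bn, b"\x00")

class BufferCache:
    def __init__(self, disk, notify_dirty=None):
        self.disk = disk
        self.cache = {}                       # block number -> bytes
        self.dirty = set()
        self.notify_dirty = notify_dirty      # callback: "this block changed in the cache"

    def read(self, bn):
        if bn not in self.cache:              # miss: fetch from disk
            self.cache[bn] = self.disk.read_block(bn)
        return self.cache[bn]

    def write(self, bn, data, write_through=False):
        self.cache[bn] = data
        if write_through:                     # file system asked for immediate commit
            self.disk.write_block(bn, data)
            self.dirty.discard(bn)
        else:
            self.dirty.add(bn)                # let the write lodge in the cache...
            if self.notify_dirty:
                self.notify_dirty(bn)         # ...but tell the file system it happened

    def flush(self):                          # push all dirty blocks down to disk
        for bn in list(self.dirty):
            self.disk.write_block(bn, self.cache[bn])
        self.dirty.clear()

dirtied = []
bc = BufferCache(Disk(), notify_dirty=dirtied.append)
bc.write(3, b"metadata", write_through=True)  # hits the disk immediately
bc.write(7, b"data")                          # lodges in the cache; file system is notified
```

The `write_through` flag is exactly the "please write this block to disk immediately" escape hatch mentioned above, and `notify_dirty` is the kind of extra call Spencer's idea would add so the file system still sees every write.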
All right, so why, okay, so the first question is, and we talked about this a little bit at the end of class: why does caching exacerbate consistency problems? A couple of different ways to answer this question, Tim. Right, so that's a great observation, right? The cache is not persistent, right? So information that is in the cache when the system crashes is potentially lost, right? So okay, keep going in that direction, yeah. Yeah, so I have some sort of failure. So what could potentially be in the cache that would cause a problem when the system failed? Nothing? All right, okay, so the idea is that there's some change to the disk that hasn't actually been written to disk yet, right? So in databases, you might talk about committing a transaction. It means the transaction is actually on the disk, right? So we can talk about that in a similar way here. You could say there are changes to the disk that haven't been committed. They're not on the disk yet, right? The disk platters don't reflect those changes, right? But why does caching cause this problem? I mean, this can happen anyway, right? We talked about the fact that even just a simple operation like appending to a file has to make multiple changes to disk blocks, right? So, you know, I yanked the power in the middle of a write and there's some set of changes that haven't been made. So who cares, right? I mean, why does caching make this worse, Alyssa? Okay, right, so if I cache metadata, this is particularly problematic, right? Because the metadata is pretty important for file system structure. But what else does caching do? Yeah, okay, we're getting closer here. Greg, wanna add to that answer? Right, so you guys are identifying nice things that can go wrong, right? So why does caching make this worse? What does caching definitely do? Yeah, right? The write returns a success value, but there's no guarantee it's on disk; it's up to the user application to manage that. Okay, that's part of what I'm looking for. 
What does caching definitely do, Nick? Okay, I will accept that answer, right? I mean, think about it this way, right? You have a disk that's running, right? You have a file system that's running in your operating system and it's making changes to that disk over time, and there are programs running and the contents of things on disk are changing. You can think about it as if there's like a target, right? There's a target on the disk. There are these brief periods of time, right? Where if you manage to, like if you were sitting there with scissors, don't do this with scissors, right, I would advise, or have big gloves or something, right? Anyway, or maybe you weren't even gonna cut the power cord. You weren't gonna be that dramatic, because why ruin a good power cord, right? You were just gonna flip the machine off, right? Suddenly, without any warning, right? You're just sitting there and there are these little brief periods of time, right? Brief, tiny little periods of time where if you flip that switch, you're gonna cause some sort of problem, right? There's gonna be data loss. There's gonna be some kind of inconsistency, right? And there are these teeny little moments of time, right? What caching does is caching makes that target bigger, right? Because stuff sits in the cache, right? So for example, a write to some critical piece of file system metadata, right? Let's say I've allocated a disk block and I'm working on marking the bit in the block bitmap as allocated so that I know that that disk block is allocated on disk, right? So now I've done the allocation, I've got my disk block, I've issued this write, and the write is kind of, you know, making its way down, through the disk controller, out onto the bus, over the bus, onto the disk; the disk has to move the heads, find the sector, write that actual bit, right? So there's my window of time, okay? 
And actually, I don't know, I mean, it's probably pretty short in certain cases. Now you're gonna let it sit in the cache, right? So now I wrote it and, like Nick said, you know, it's in the cache; it hasn't actually been committed to disk yet, right? Because I'm caching it, because I'm hoping that there'll be some other changes to it. I'm trying to, you know, reduce the number of writes I'm doing, and so the longer it sits in the cache, right, the longer that window of time is where, if you, you know, cut the power, now you have an inconsistency on disk, right? So this is the problem, right? Caching is gonna create more opportunities for you to have failures if you're not clever, right? And a lot of what we're gonna talk about today is cleverness, right? The ways that file systems have found to work around, you know, what seems to be kind of like an irreconcilable trade-off, Jeremy. Yeah, so we'll talk in a second about writing policies, right? And we particularly are interested in policies related to disk writes, right? Because disk writes are the thing that we're worried about, right? Again, reads we could just serve from the cache until we're blue in the face, and we're happy and there are no consistency problems, right? Writes, on the other hand, if they get stuck in the cache, right, are lost, right? And we talked before about how any file system operation has all of these different steps to it, right? So, let's go through this, right? So let's say that after step one, my file system fails. What's wrong with the file system after step one if it fails? What would be something that I'd have to fix when I start up again, Simon? Yeah, so now I come back up and I've got an inode that's just gone, right? It's not in use for anything, right? It's just gone, right? So in this example, I'm creating a new file, right? So I need a new inode. I've allocated the inode, whoops, gone, right? 
That inode is probably just full of garbage, right? But it's allocated. And so if I just let the file system keep running, there's one fewer file that you'll be able to allocate, right? Probably not the end of the world, but still kind of annoying, right? Okay, so what happens if I fail after two? Thor, what's wrong with the file system? I fail after two, I come back up. I still have problem one, right? I still have an inode that's wandered off, right, that I'll never find again. But what else, what other problem do I have here? Sir? Yeah, they're not linked to anything, right? Where are those data blocks, right? Those data blocks have been marked as allocated, but they're not in use for anything, right? So now I've lost some of my capacity on the disk, right? All right, okay, so what happens if I fail after three, right? So now I'm doing better, right? These data blocks are associated with the file, right? But does the file exist anywhere? Ping. This file is just, like, gone, right? It's like the ultimate hidden file, right? There is no path name on the system that will actually translate to this file. I don't know how many of you have ever noticed that in certain Unix file systems there's a directory called lost+found. Has anybody ever noticed this? Have you ever wondered what's in lost+found? This kind of stuff, right? So if your file system checker runs and it finds a file that's there, that looks consistent, that has disk blocks, but it's, like, nowhere in the file system, it's not linked, right? What does it do with it? Where does it go? Nobody knows, right? So it throws it in this random directory and it's your job to kind of figure it out, right? So you go into lost+found and you find a file, and then you're like, oh, I remember where this goes, right? And then you can move it back to the directory, but that's what that's for, right? 
That's one of the things that ends up in lost+found: unlinked files, right? Files that somehow exist on disk, but there's no path name that translates to them, so otherwise you would never be able to find that file, right? All right, what happens if I fail after four? Okay, I'm getting closer here, so I've actually changed the directory, yeah. Yeah, okay, so at this point I'm getting closer to something that might actually be okay, right? I've got the file, the size is correct, maybe the inode's got the right information in it. The problem is that your data is not there, you know, like, that's the last thing that happens, right? So at this point, again, I can find this file, I can potentially delete it, so I'm in much better shape; the data is just not there, right? So on some level, if you think about it, one through four are all things that are associated with file system metadata, right? This last thing is actually writing the data into the file, and in many cases file systems actually care more about their own metadata than they care about your data, right? And that's a good thing in general, right? Because their metadata is actually more important; you know, if their metadata gets corrupted enough, they won't be able to find any of your data, right? So it's kind of like, I'll trade off, you know, that last little write you were doing right before you, you know, cut the cord, for having a file system that still has a tree, right, and having some idea of being able to navigate things, right? All right, so we talked about the fact that caching can increase this time span, right? So essentially, you know, the caching can make this whole thing take longer, right? So if I think about it here, if I start caching at any point here, right, what it does is it increases the period of time where I can have a failure of the kind that we've talked about, right? 
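The walk-through above can be turned into a toy simulation. This is a sketch, and the exact step list is my reading of the example on the slide (1: allocate an inode, 2: allocate data blocks, 3: link the blocks into the inode, 4: add a directory entry, 5: write the data); the file name and block numbers are made up.

```python
# Toy model: create a file in five steps, crash after step `crash_after`,
# and then look for the inconsistencies discussed in lecture.

def create_file(crash_after):
    fs = {"inode_bitmap": set(), "block_bitmap": set(),
          "inodes": {}, "root_dir": {}, "data": {}}
    steps = [
        lambda: fs["inode_bitmap"].add(2),                  # 1: allocate inode 2
        lambda: fs["block_bitmap"].update({10, 11}),        # 2: allocate data blocks
        lambda: fs["inodes"].__setitem__(2, [10, 11]),      # 3: link blocks into the inode
        lambda: fs["root_dir"].__setitem__("notes.txt", 2), # 4: directory entry
        lambda: fs["data"].update({10: b"hi", 11: b"!"}),   # 5: write the actual data
    ]
    for i, step in enumerate(steps, start=1):
        if i > crash_after:
            break                       # power cut: remaining steps never reach the disk
        step()
    return fs

def problems(fs):
    out = []
    if fs["inode_bitmap"] - set(fs["root_dir"].values()):
        out.append("inode allocated but unreachable")       # the lost inode
    linked = {b for blocks in fs["inodes"].values() for b in blocks}
    if fs["block_bitmap"] - linked:
        out.append("blocks allocated but not linked")       # lost disk capacity
    if fs["inodes"] and not fs["root_dir"]:
        out.append("file exists but has no path name")      # a lost+found candidate
    return out
```

Crashing after step 1 leaves only the lost inode; after step 2 you also leak data blocks; after step 3 the file is complete but unlinked (the lost+found case); after step 4 the metadata is consistent and only the data is missing, matching the discussion above.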
So, okay, so let's talk about the safest possible approach to maintaining file system consistency, right? And doing caching, right? So what's the safest approach, Jen? Yeah, so the safest approach is just don't buffer writes at all, right? Remember, the cache is still really helpful for reads, okay? You know, even if I don't allow writes to ever sit in the cache, I'm still gonna get a performance boost from reads, right? And that's gonna make me happy. So this isn't as crazy a policy as we might think, right? And what we call this kind of cache is a write-through cache, right? Because writes go just straight through the cache, right? They do still update the information in the cache. Why would I update the information in the cache if writes don't sit there? Why would I change the cache and then do the operation at disk? Okay, that's fine. Yeah, so I want the cached copy to be up to date, right? Otherwise, the cached copy of that block is stale, right? And I might have to fetch it again from disk, so exactly. So in order for reads to work, I still modify the cache on writes. I just don't actually allow writes to sit in the cache, right? I immediately flush writes to disk, okay? Now, what would be the opposite approach, right? Let's say that you are, like, super certain that the power will never fail or that you will never crash your system, yeah. Yeah, or, okay, how about: don't write until the file's closed? I think I can be even more daring, right? Great. No, no, no, remember, this is for the daring, right? These are the risk takers, right? I'm not gonna write until when, Spencer? Yeah. Yeah, yeah, how about that, right? Because look, I mean, writing when the file's closed sounds like it's pretty daring, but what if the file is opened again, right? Then I could just keep it in there, right? Like, hey, that's too early, man. 
That file might be reopened, like, an hour later, right? So I might as well wait and let it sit in the cache and hope that I can amortize a few writes. So, yeah, on the other hand, wait until the file system shuts down or is unmounted or something, right? Or, here in the cache, when we talk about this, we actually will say until blocks are evicted, right? Because the cache isn't an infinite size. So at some point, I'm gonna have to remove a block from the cache, right? This is kind of like virtual memory: I have to evict a page, or here I have to evict a block from the cache. At that point, I have to actually write it to disk, right? Otherwise, the changes aren't permanent. But in the best case, if I had an infinite-size buffer cache, this moment would be shutdown, right? And that would be great. And we call this approach a write-back cache. So a write-back cache is kind of the opposite, right? So a write-through cache says everything hits the disk as soon as possible. A write-back cache says I wait as long as possible before I make a write to the disk. I essentially only write when I have to write for correctness, right? And I think you guys can imagine, right, that the write-back cache will amortize as many disk operations as possible, right? The whole point of buffering writes in the cache is that I hope that I can combine a couple of writes to the same disk block and move them into one write to disk, right? The longer I hold blocks in the cache, the more of those operations I can amortize, right? With the write-back, sorry, with the write-through, I never do this. With the write-through, there's a one-to-one mapping between changes to the cache and operations to disk, right? With the write-back cache, there's an n-to-one mapping, where n is as large as possible, right? That's one way of thinking about it, right? Of course, for safety, write-through is much better, right? 
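The one-to-one versus n-to-one mapping just described can be counted directly. A minimal sketch, with invented names, comparing disk-write counts for the two policies when the same block is written 100 times:

```python
# Sketch: count disk writes under write-through vs write-back. Write-through
# pushes every cache update straight to disk (1-to-1). Write-back absorbs
# repeated writes to the same block and only writes the final contents on
# eviction/shutdown (n-to-1).

class Cache:
    def __init__(self, write_back):
        self.write_back = write_back
        self.cache, self.dirty, self.disk_writes = {}, set(), 0

    def write(self, bn, data):
        self.cache[bn] = data                 # always update the cached copy
        if self.write_back:
            self.dirty.add(bn)                # absorb the write for now
        else:
            self.disk_writes += 1             # write-through: hit disk immediately

    def shutdown(self):                       # flush dirty blocks on unmount/shutdown
        self.disk_writes += len(self.dirty)
        self.dirty.clear()

for wb in (False, True):
    c = Cache(write_back=wb)
    for i in range(100):                      # 100 appends all touching block 5
        c.write(5, b"v%d" % i)
    c.shutdown()
    # prints: write-through 100, then write-back 1
    print("write-back" if wb else "write-through", c.disk_writes)
```

One hundred writes to the same block cost one hundred disk operations under write-through and exactly one under write-back: the amortization is the performance argument, and the 99 uncommitted versions sitting in the cache are the safety argument against it.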
Because things are on disk as soon as possible, especially changes, and so if I crash, the number of uncommitted changes is the smallest possible, right? What can I do as kind of a middle ground? I think, Jeremy, you've hinted at this before, right? What's one thing I could do if I didn't want to wait as long as possible, but I also didn't want to write things as soon as possible? Varun? Okay, okay, yeah, so actually that's a great point, right? So one thing I could do, right, is I might say, hey, you know what? Operations to file system metadata are pretty important, right? Inodes, superblock operations, on-disk data structures, those I might write immediately, right? What would be left? What would I actually be caching at that point? I'd be caching writes to what? If I let the metadata go straight to disk, Tim, what would I be caching? Yeah, user-level stuff; let's be more file-system-friendly here, yeah? Yeah, data blocks, right? Remember, blocks on disk are divided into two categories: metadata and actual data, right? So data blocks, blocks that are part of a file, and everything else, right? Everything else that the file system uses to find files, to organize information, et cetera, et cetera, right? So when you format a disk, some of that disk goes immediately into these on-disk data structures; the rest is reserved for data blocks. So I can still buffer data blocks in the cache, but I allow metadata blocks to be written through immediately, right? What's another approach? Another way to balance these two things, Bethany? Yeah, oh, yeah, okay. So that's actually a really interesting observation, and it's not one that I put up here, but there was actually some research work maybe five years ago looking at user perception of when things should be in sync, right? And it was actually using that to do this sort of thing. So that's certainly a potential approach, right? 
But remember, to some degree, when you hit save, like, let's say you're using Word, right? Word actually will auto-save a lot, right? A lot of those auto-saves never hit the disk, right? They're just sitting in the cache, right? And that's okay, because Word is gonna auto-save, like, another 60 seconds later, right? And a lot of those blocks will be the same, right? But I could, and Bethany's right that most file systems have ways to force things to disk, right? So on Unix, there's a system call called sync, right? The idea behind sync is that sync is supposed to tell the file system: synchronize the on-disk file system with whatever changes I've made up to this point, right? So sync should essentially flush everything from the cache. And I can do this on a whole-system basis, or I can do it on a per-file-system basis. But what's a different middle ground here? Yeah, AJ. Yeah, so periodically, right? So somebody pointed out I can write metadata immediately but delay data writes. I also can use sync. And a third approach is that I can essentially periodically write things back, right? So periodically, at a regular interval, like maybe every minute or something, and this can be configurable for some file systems, I flush all the dirty cache blocks to disk, right? And what does that do, right? Well, it takes that window of time that I was worried about for failures and it shortens it, right? It means I'm not gonna give you this huge target of, like, eight hours where I've got a dirty cache block that's really important and I'm just kind of like, hey, shut me down, let's see what happens. I'm gonna write that out periodically, so that the period of time in which a failure can cause a problem is reduced, right? Where'd that go? Okay, so one of the things that we talked about is the fact that making changes to any part of the file system requires multiple changes to disk blocks, actually, yeah. 
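The periodic write-back idea bounds the window of vulnerability. Here is a small sketch of that claim; the function name and the simulated timestamps are made up, and times are just numbers, not real clock reads:

```python
# Sketch: with a flush daemon that runs every `interval` seconds, a dirty
# block written at time t sits unflushed only until the next multiple of
# `interval`. So the crash-vulnerability window per write is at most
# `interval`, instead of "until eviction or shutdown".

def exposure_windows(write_times, interval):
    """For each write, how long it sat dirty before the next periodic flush."""
    windows = []
    for t in write_times:
        next_flush = ((t // interval) + 1) * interval  # flushes at interval, 2*interval, ...
        windows.append(next_flush - t)
    return windows

# Writes at t=5s, 61s, and 119s with a 30-second flush daemon:
print(exposure_windows([5, 61, 119], 30))   # prints [25, 29, 1]
```

No matter when the write lands, the window never exceeds the flush interval, which is exactly the "shrink the target" effect described above.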
Yeah, I mean, you can imagine having, yeah, okay. So this is a great question, right? So one of the things we haven't talked about, and we're not going to talk about, is buffer cache block allocation policies, right? So we've been talking about this buffer cache thing. You have some vague idea about how it's organized. Sadly, there used to be a fourth assignment in this class that you guys don't have to do, part of which was implementing that. It turns out that implementing a buffer cache and choosing what blocks to buffer in the cache ends up feeling very similar to an assignment you guys are going to do, right? At least to me, right? So it's actually asking, like, if I thought about it, how might I decide what blocks to keep in the cache? My buffer cache is not infinite size, right? It's a cache, right? It's smaller than the disk, right? It's much smaller than the disk. So how would I decide, you know, actually, answering your own question, I mean, how would I decide what blocks to keep in the buffer cache? Well, yeah, I mean, in general, I want to keep sort of hot blocks in there, right? I want blocks that are accessed regularly, right? So the idea is that my cache works better if I have a lot of hits in the cache, right? It's like any other cache. Think about the TLB, right? When did the TLB work well? When most of the addresses were being translated by the TLB and the kernel didn't have to help, right? It's the same thing here. Caches work well when my operations hit the cache, right? If I hit the disk, then I'm as slow as the disk, right? If I hit the cache, I'm as fast as memory. The more often I can hit the cache, the better. So file systems and buffer caches do have this block allocation challenge, right? Which is to figure out, given this small amount of memory, which blocks from the file system should be in there, right? And so certainly one thing that I want to look at is activity, right? 
If I have a bunch of processes that are reading and writing to the same file, that's a great file to cache, right? Because a lot of my file system operations will hit in the cache, right? What are other things that would be pretty good to cache? What are other pieces of the, just generic, right? Generic pieces of the file system that would be good to put in the cache, that are accessed often? Okay. What's that? Yeah, Sean. Well, you're getting closer. Where do the timestamps live? Yeah. Yeah, the file system metadata, right? So even things like, for example, the superblock and all those on-disk data structures. I still might write them to disk immediately, but I still want them in the cache, right? Because they're used all the time, right? What's another, like, what's an inode on the file system that's probably used quite a bit? Yeah. The root inode, right? Like any absolute path name translation starts with the root inode, right? So there are certain parts of the file system that I probably want in the cache, right? That's the metadata, and then the data blocks I keep are probably gonna be driven by usage, right? So yeah, if a bunch of processes are using a file together and there's a lot of reads and writes to that file, I would want as much of that file in the cache as possible. Yeah, okay. So back to thinking about file system atomicity, right? So we talked about how we want these operations on files to look atomic, right? So if I append some data to a file, I want, on some level, if I'm interrupted in the middle of doing this, right, by a failure, I want it to look like it either did happen or it didn't happen, right? I don't want to be stuck in some inconsistent state where I've got an inode that's dangling or I've got data blocks that are allocated but aren't associated with any file, right?
But we also talked about the fact that this always involves writing a bunch of disk blocks, right? So if I think about the disk, right, from the perspective of the operating system, we know that writing multiple disk blocks is not atomic, right, it requires multiple operations, right? What is something that is atomic? When I think about the disk, if writing multiple disk blocks can't be thought of as atomic, what can I think of as atomic? Paul. Yeah, how about writing one disk block, right? I mean, on some level, I tell the disk, here's the amount of data I want to write, you know, 4K or whatever, and the disk is going to seek and it's going to write that data, right? And, you know, to some degree, yes, is there a window of time where, if I shut things down, the disk might be in the middle of doing the write? But actually, yeah, it might actually finish the write. Some disks have an on-disk battery and other things that allow them to complete operations, right? But that period of time is very, very, very small, right? So if I write one disk block, then it either happened or it didn't, right? And so, this is so embarrassing. I don't know why I have this quote up here. I used to be a big Tom Clancy fan when I was a kid. I find that to be embarrassing to admit in public, but one of the Tom Clancy characters, right, had this, what I considered when I was 12, this really brilliant insight about the world, which was, you know, if you don't write it down, it never happened, right? Did anyone else read these books? Oh yeah, I don't know, kind of. Anyway, it's a dark period of my life. I do read better books now, I promise. So anyway, one of the tricks file systems use is we exploit this fact, right? We exploit the fact that writing one disk block is atomic, right? And what we do is we keep changes to the file system in a data structure that's called a journal, okay? And a journal is a lot like what it sounds like, right?
A journal is a data structure that the file system uses to, you know, say, hey, it's April 8th and I'm a file system and, you know, I was asked to write this disk block and I was gonna go do it, so I wrote it down in my journal. Like, it's probably the most boring journal you would ever read, right? Except for the ones you guys might keep yourselves. But the best thing about these journals, right, is the entries are very small and they can be used very easily to bring the file system into a consistent state after a failure, right? So again, there's a special area on the file system for this, or sometimes it's really a file. And there's a structure for these entries, right? So the idea is that the file system keeps track of the things it's doing, right? The things it's done to the file system, and this is one way that I could figure out, when I fail, what had been finished, what was done, what wasn't done, right? And again, so here's an example of this, right? So it's like, okay, let's say I'm creating a file, right? So what do I have to write in the journal? Where I say, okay, I'm gonna allocate this inode, right? And I'm also going to associate these data blocks with it, right? And then I'm gonna add it to this directory, right? This is our example from before: I'm creating a new file in an existing directory, right? So I need the inode for the file, right? So here's my inode number. I found some data blocks in a certain part of the disk. I'm gonna associate those, and then I'm going to put it in this directory, right? And then there's usually a message in the journal that says that's an operation, right? So these operations themselves, right, are not atomic, right? This is gonna be one disk write. This is gonna be a couple other disk writes. This is gonna be another disk write. So this is gonna require multiple writes.
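The file-create example above can be sketched as a journal record that packs the whole operation into one block, so appending it to the journal is a single atomic disk write. This is illustrative only: the field names, the JSON encoding, and the 4K block size are assumptions; real journals use compact binary formats.

```python
import json

BLOCK_SIZE = 4096   # assumed disk block size for this sketch

def make_journal_record(inode_num, data_blocks, dir_inode):
    """Pack one file-create operation into a single block-sized record,
    so appending it to the journal is one atomic write (a sketch)."""
    record = {
        "op": "create",
        "inode": inode_num,     # the inode being allocated
        "blocks": data_blocks,  # data blocks associated with the inode
        "dir": dir_inode,       # directory gaining the new entry
        "commit": True,         # marks the record as a complete operation
    }
    raw = json.dumps(record).encode()
    assert len(raw) <= BLOCK_SIZE, "record must fit in one block"
    return raw.ljust(BLOCK_SIZE, b"\0")   # pad to exactly one block
```

The key property is the contrast the lecture draws: applying the operation takes several non-atomic disk writes, but recording the *intent* takes exactly one atomic write.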
However, writing this entry in the journal, I can potentially do with a single write, right? So again, this journal data structure is pretty compact, and so I can write this out, right? So now let's say, okay, I'm going along and I've written this down in my journal, and now at some point later, all of these changes are on disk, right? Some of these changes I cached, right? Maybe I had some changes to these data blocks, so I didn't write them out immediately, but at some point, all of these operations have actually made it to the disk, okay? So what do I do at that point with my journal? Yeah, I update the journal, I create what's called a checkpoint, right? So essentially, again, as I'm going along, I'm keeping track in my journal of things that I'm going to do, right? I cannot checkpoint this journal until all of the operations before it are on disk, right? However, once they are on disk, what I do is I go through and I check off everything, right? So what it means is, every time I perform a checkpoint, it means that all of the changes above it in the journal are on disk, right? However, what does it mean about changes below the checkpoint? So let's say I fail and I come back up, so I go back and I find the last checkpoint in my journal, right? And then I have a bunch of changes after that checkpoint. What do I have to assume about those changes? Did they happen? Nothing. They might have, but I can't tell, right, I don't know. That's the tricky thing, right? Because these changes are not gonna happen in the same order, right? Depending on the caching strategy that my buffer cache is using, you know, I might actually update this inode first and then later I go back and do this or whatever. So just because I don't have a checkpoint, it doesn't mean that none of the operations have made it to disk, some of them may have, right?
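The checkpoint rule can be sketched in a few lines. This is a toy model, not a real file system's implementation: the journal is just a list of records, and `all_flushed` stands in for however the real system knows its dirty blocks have reached the disk.

```python
def checkpoint(journal, all_flushed):
    """Append a checkpoint record, but only once every prior
    operation has actually reached the disk (a sketch)."""
    if not all_flushed():
        return False   # can't checkpoint: dirty blocks still cached
    journal.append({"op": "checkpoint"})
    return True

def entries_in_doubt(journal):
    """After a crash, everything before the last checkpoint is known
    to be on disk; entries after it may or may not have made it."""
    for i in range(len(journal) - 1, -1, -1):
        if journal[i]["op"] == "checkpoint":
            return journal[i + 1:]
    return journal[:]   # no checkpoint found: everything is in doubt
```

This captures both points from the lecture: a checkpoint is a promise that everything above it is on disk, and the entries after the last checkpoint are exactly the ones whose fate is unknown after a failure.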
But I can't assume that any of them have, because I haven't created the checkpoint, right? So I think we are, yeah, we're basically out of time. So I'll leave you guys to think about this for Friday, which is, what do I do when I recover, right? So when I recover, I've got this journal, I've written down the things I'm going to do, I've taken very, very good notes, and I've also kept track of the things I've already done, right? So when I start up the system again after some sort of failure, how do I use the journal to recover things? That's what we'll talk about on Friday. I'll see you guys on Friday.