All right, good morning, everybody. Today we're going to finish talking about file systems. There are basically two weeks of class left, which is kind of sad, but we're almost done with lecture. Today we'll wrap up file systems by looking at one more interesting file system design point, what we call log-structured file systems. For the rest of the week, we'll see what happens. On Wednesday I'd like to talk about operating system structure, which we haven't really covered yet but which brings up some interesting design points. Then we might do some performance analysis, and hopefully get to virtualization, which I might move up a little this year because it's fun stuff. So this is the end of the last big piece, and then there are some scattered little things that should be fun.

Let's see. If you're on track, you're hopefully in the final throes of assignment two: wrapping up the last things that are breaking, submitting patches, looking at the autograder output. We've decided to continue recitations through the week that classes end. The last lecture is two weeks from today, on the 29th; we'll do some exam review then, and for the rest of that week we'll do exam review and recitation. We've also decided to continue office hours until the assignments are due, so we'll have office hours through the 10th, which I believe is a Friday. Thank the TAs for being willing to do that; I don't know whether it's part of their job description, but they've agreed, so they'll be here to help. I've seen more people in office hours recently, which I assume means the end of the semester is approaching and people are actually working on stuff. That's good. Everything is due on the 10th, 11:59 PM Eastern Standard Time, modulo when I get up on Saturday morning. At some point after that, your assignments will mysteriously stop being submittable; 11:59 is the latest I will guarantee that you can submit.

All right. On Friday we talked about a fairly old, crufty file system design that was nevertheless very nicely tailored to the characteristics of disks and how disks operate. Any questions about the Unix Fast File System, FFS, before I do a little review? So: what does close mean on disk? If I want to allocate things close together on disk, what does that mean, Robert? Yeah, the same track would be the closest I could get them, but what do I really measure closeness by? By how much I have to do what? Alyssa? Move the heads, because that's the slow part. So closeness is really measured laterally. The rotational position helps too, especially once the heads are in the right place, but what I really care about on a spinning disk is head movement, because that's what's slow. Seek time is the major component; there's a rotational component as well, but it doesn't dominate.

So what are some of the things FFS did to try to exploit disk geometry? Tim, what was one of them? What's a cylinder group? Yeah, it's essentially one region of the disk that I can get to without moving the heads very much.
It's all the tracks on all the platters, extending down through the disk, that are close to each other, so that I don't have to move the heads far. And what did FFS do with cylinder groups? How does FFS use them to improve disk performance? Essentially, each cylinder group is kind of like what? I've got one big file system, but each cylinder group has its own copy of the superblock, it has inodes, it has data blocks. What does it start to look like? Yeah, like its own little mini file system, right? And we saw legacies of this in EXT4 as well. Each cylinder group spans multiple platters, and on FFS each one has essentially all the components of a full file system; it really is like its own little mini file system.

Here's another interesting piece of disk layout errata that we haven't talked about. If you've ever taken physics, you might be able to answer this. Say I have a platter spinning at a fixed rate, 15,000 RPM. Where would I put the heads so that the underlying magnetic substrate is moving the fastest beneath them? On what part of the disk can the heads read data fastest, Jeremy? Towards the outside, right? Think about it: on the outside edge the track is longer, but it's rotating at the same angular rate. So on modern disks, file systems will actually start by laying data out toward the outer edge of the disk, because it's faster to read there. And what's the other consequence? Because the track is longer, what else follows? The data is moving faster under the heads, yes, but what else? That's not what I'm getting at, Jeremy. I can store more data, right? The tracks on the outer edge of the disk store more data than the tracks inside, simply because they're longer. If you've ever seen a race on a track, that's why some runners start farther ahead: the outer lanes are longer. So what else does that mean? I can read data faster and I can store more data per track. What does that help me cut down on, Sean? Seek times, right? At some level, the amount of seeking I do is determined by how the data is spread across the disk. If I can fit more data on each track, I don't have to seek as far between tracks to pick up a file of a given size. So the outside of the disk is optimal in a variety of ways, and modern file systems really do try to pack as much data as they can toward the outer edge for these reasons. Just another little piece of bizarro disk geometry.

And we saw examples of this kind of thing going back to FFS. FFS had this incredibly gnarly scheme to compensate for the fact that the rest of the system couldn't consume data at the disk's full rate: it skipped a few blocks along the same track, so that the effective disk read bandwidth matched the bus bandwidth and it didn't have to wait a full rotation to come back around to the next block.
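To put rough numbers on that outer-track effect (the radii below are made-up illustrative values; only the proportionality matters):

\[
15{,}000\ \text{RPM} = 250\ \text{rev/s}, \qquad v = 2\pi r \times 250\ \text{rev/s}
\]
\[
\frac{v_{\text{outer}}}{v_{\text{inner}}} = \frac{r_{\text{outer}}}{r_{\text{inner}}} \approx \frac{4.5\ \text{cm}}{2.0\ \text{cm}} \approx 2.25
\]

So at a fixed recording density, an outer track both streams and stores roughly twice what an inner track does, which is why zoned disks put more sectors on outer tracks and why file systems grab the outer edge first.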
And again, to some degree, these really low-level layout customizations are gone; they've been lost, and some of them aren't that useful anymore. But we've seen plenty of FFS legacies in EXT4: we saw essentially the idea of cylinder groups, and we still see some accommodation to disk geometry today when laying out data on big spinning disks. All right, any other questions about FFS before we go on? OK.

So, FFS circa 1982. Now it's 1991. How many people were born by then? OK, good, we're doing better: at least this file system is one of your contemporaries. You were probably one year old or something. So who remembers 1991? What's different about 1991 compared to 1982? Let's see. Eye of the Tiger, classic song, 1982. But we've moved on in rock and roll history, and that's a terrible song, so in terms of music you could argue things were going downhill. Gandhi, then The Silence of the Lambs: maybe it was moral decay, right? In 1982 we were interested in heroes like Gandhi; in 1991 we were interested in serial killers. Cannibals, actually. So that's great.

What's different about disks in 1991, Jim? Yeah, they're getting bigger. What else are they getting, or not getting? They're getting a little faster, but it's the I/O crisis we've talked about: disks have gotten bigger without getting much faster. Disk bandwidth is improving, meaning the operating system can stream writes to the disk faster. And the other big change during this period is that computers are starting to have more memory. A computer in 1991 might have had 128 megabytes of RAM, which felt like a huge amount. Now one tab in Firefox consumes that much memory. I don't know what that says; anyway, I use Firefox, you know I'm embarrassed about it, so I'll stop talking about it.

So computers have more memory and disk bandwidth is improving. But what about seek times? Who thinks seek times improved along with Moore's law during this period? No, seek times are still terrible. These are physical objects. The bus speed improved; remember, with FFS we were just talking about skipping blocks on disk because the bus bandwidth wasn't fast enough, and that got fixed. The bus runs faster, there's more memory in the machine, but seek times are still terrible. So at this point in time we're still focused on reducing seeks. What LFS, a log-structured file system, does is take a fun and fairly counterintuitive approach to that problem.

So again, it really comes down to: how can we do fewer seeks? Are there clever tricks to avoid seeks at all costs? Seeks make the disk feel slow, and that makes the whole system feel slow. So are there ways to avoid doing seeks? Jeremy, what's that? Yeah, exactly: defragmenting. Back then, that might actually have made your disk feel faster, as opposed to now, when it's just one of those compulsive things that weird people do. All right, so now let's take these two observations together.
We have all this memory lying around, like 128 megabytes of it, and maybe 64 megabytes is sitting free when you're not off playing Winter Olympics on your Apple II. Could all that memory be used to improve disk performance? Well, we just talked about this last week, so this is good review. What was one way we put spare memory to use to improve file system performance? Manish. I'm ignoring you, Jeremy. Yeah, we used it as a buffer cache. We put a buffer cache in front of the file system; we talked about this a week ago, so this is review.

But remember, that cache was more effective at one thing than at another. Say I've got a big cache, I let the system warm up, and a lot of the files in use end up in the cache. What can I now potentially avoid doing entirely? Fill in the sentence: with a large cache, we should be able to avoid doing almost any blank. Nick? Reads, right? And this is really the observation that I think led to log-structured file systems: caches are great for reads. Once a cache is warm and the data has been brought in, it sits there and absorbs essentially all the reads I do. Even if I write through the cache, as long as I update the cached contents when I write, the cache still absorbs future reads to hot blocks that are being updated. So caches are great for reads, but I still have to do writes.

One of the observations behind LFS, though, was that the cache can help with writes too, because I can use it to gather a bunch of writes together. I let part of the cache get dirty, and then after a period of time, or when blocks are evicted, or at some point for consistency, I flush it all out. Now I'm doing a bunch of writes at once. Remember, we talked about this maybe a week ago: the more operations I can hand the disk at once, the better for the disk and the better for me, because the disk can do better scheduling. If I give the disk 10,000 different blocks to write, that's fantastic: it sorts them into some near-optimal order and makes one nice pass across the disk writing out everything that needs writing. So the cache helps with writes a little, mostly by letting me optimize how I use the disk. I'll soak up some writes in the cache, and reads I'm simply going to assume hit in the cache; I'll assume the cache is my panacea for reads.
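Here's that division of labor as a toy C sketch: a warm cache absorbing reads and writes, and a sync that hands the disk one sorted batch of dirty blocks. Every name in it (cache_write, cache_sync, the direct-mapped table) is invented for illustration; no real buffer cache is this simple.

```c
/* Toy write-back buffer cache: reads are absorbed once the cache is
 * warm, and dirty blocks are batched so the disk can service them in
 * one sorted sweep. A sketch of the idea only, not real kernel code. */
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE 512
#define CACHE_SLOTS 64

struct cbuf { long blockno; bool valid, dirty; char data[BLOCK_SIZE]; };
static struct cbuf cache[CACHE_SLOTS];

static struct cbuf *lookup(long blockno) {
    struct cbuf *b = &cache[blockno % CACHE_SLOTS];    /* direct-mapped */
    if (!b->valid || b->blockno != blockno) {
        if (b->valid && b->dirty)
            printf("evict: forced write of block %ld\n", b->blockno);
        b->blockno = blockno; b->valid = true; b->dirty = false;
        /* a real cache would read the block from disk here */
    }
    return b;
}

void cache_write(long blockno, const char *src, size_t n) {
    struct cbuf *b = lookup(blockno);
    memcpy(b->data, src, n < BLOCK_SIZE ? n : BLOCK_SIZE);
    b->dirty = true;                                   /* defer the I/O */
}

static int cmp(const void *a, const void *b) {
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* On sync, hand the disk the whole dirty set in sorted order: one
 * nice pass across the platter instead of scattered seeks. */
void cache_sync(void) {
    long batch[CACHE_SLOTS]; int n = 0;
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].valid && cache[i].dirty) {
            batch[n++] = cache[i].blockno;
            cache[i].dirty = false;
        }
    qsort(batch, n, sizeof batch[0], cmp);
    for (int i = 0; i < n; i++) printf("write block %ld\n", batch[i]);
}

int main(void) {
    cache_write(900, "a", 1); cache_write(17, "b", 1); cache_write(430, "c", 1);
    cache_sync();                       /* writes 17, 430, 900 in order */
    return 0;
}
```

The qsort is the point of the whole exercise: one scheduled sweep instead of three scattered seeks.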
So now forget everything you've learned about file system design. Forget EXT4, forget FFS. What is the best way to avoid doing seeks when I write to the disk? Write everything on one track? OK, that's not a terrible answer. How about: write everything in the same place? Now, it can't be exactly the same place, because if you literally wrote everything to one spot, you wouldn't have much data left on disk after a while. But I want to write everything to essentially the same place. And what these guys did, this is John Ousterhout and Mendel Rosenblum, who invented log-structured file systems, is exactly that. They said: let's just write everything to the same place. Not exactly the same place, but essentially the same place. I keep a log on disk. How many of you have seen a log from an application? You know what logs are: a log is a big, long list of things that happened. And all I'm going to do when I write is keep appending to that log. I never seek anywhere else on the disk to do a write; I always write to the same place. The write position creeps outward a little as the log grows, but essentially I do no seeks on writes. I just keep appending to the log, appending to the log.

So how does this actually work? It sounds like a great idea if you can pull it off. Let's go back and review. Say that in a normal file system I want to modify one byte in a file. What do I need to do? What's the first thing, Josh? Yeah, I need to read the inode map: I have to find where this file's blocks live on disk. So the first thing I do is read the inode map, and that costs one seek. What's the next thing? Harish. Read the inode, right? The inode map told me where the inode is; now I read the inode to find where the data blocks are. Then what do I do? Yeah, I find the data block I need and seek over to modify it. And what's the last bit of housekeeping, Sam? Update the inode, right? Maybe the file's modification time changed, so I have to write the inode back. So in this little toy example there are two reads and two writes.

And here's how that looks: I seek somewhere on disk to read the inode map; having found the inode, I seek over here to read it; the data block is somewhere else, so I seek there and write it; and then I seek back to update the inode. I wish I'd had this diagram earlier in the semester. Now, you can imagine that on EXT4 this might all be inside one block group, so it might not be this bad: the inodes might be right here and the data blocks right there. But who cares; the general point is that I do one seek, two seeks, three seeks, potentially jumping all over the disk.
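To see what that path costs, here's a hedged back-of-the-envelope with plausible numbers for a circa-1991 drive (the 10 ms average seek and 5,400 RPM spindle are assumptions for illustration, not measurements):

\[
t_{\text{I/O}} \approx t_{\text{seek}} + \tfrac{1}{2}\, t_{\text{rotation}} \approx 10\ \text{ms} + \tfrac{1}{2}(11.1\ \text{ms}) \approx 15.5\ \text{ms}
\]
\[
t_{\text{one-byte update}} \approx 4 \times 15.5\ \text{ms} \approx 62\ \text{ms} \quad (\text{2 reads} + \text{2 writes, all random})
\]

Tens of milliseconds of pure mechanical delay to change one byte. Even with the two reads absorbed by the cache, half that cost remains, and that remaining half is exactly what LFS goes after.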
So now let's remember our big cache, our big 64-megabyte cache. And just to make sure we're all in 1991 together: I could have played "(Everything I Do) I Do It for You," but that would have made me throw up in my mouth a little, so I didn't. Why would a 64-megabyte cache be so effective in 1991? It's really small? That's part of the answer I want. Yeah: the disks weren't that big. I don't even know what size disk you'd have had; but I was in college in 1998 and I thought a 20 GB disk was huge. The disks just weren't that big. A 64-megabyte cache in front of a one-terabyte drive would be terrible; a terabyte drive today probably has a bigger cache than that built into the device. So: fairly small disks, but a large-ish cache.

Let's assume our big friendly cache soaks up the reads. Now what do I need to do? The nice thing is, I no longer have to do those two reads: I'll assume the inode map and the inode for this file were brought into the cache earlier, so the reads hit in the cache and cost no disk operations. But I still end up doing the seeks for the writes. Even with the reads soaked up by the cache, I still have to seek in order to write. And this is where log-structured file systems come in.

Here is, conceptually, how a log-structured file system looks on disk. This is my disk: this is the start, this is the end, and these are disk blocks. Now, clearly the ordering of disk blocks, if it even maps onto physical disk locations, which we talked a little about last time, may not be this clean. But imagine that if this is block zero and this is the largest block on disk, then this sequence of blocks I've drawn as a big rectangle actually spirals around the disk somehow, jumping between tracks and so on; on some level it starts at one edge of the disk and works its way to the other. That 3D picture is hard to draw on a slide, so it's a rectangle.

The idea is that when I started running my log-structured file system, I started a log at the beginning of the disk, and over time I've just kept appending to it. We'll talk about what those appends actually look like and how I can get away with this. Whenever I write to a file in LFS, the current inode and the current data block are already somewhere in the log. Remember, everything in LFS is written to the log, so somewhere in it is the last copy I wrote of the inode and the last copy of that data block. When I modify them, where do I put the new copies? This write changes both the inode, because I'm updating the modification time or whatever, and the data block, because I'm changing a byte of the file. So where do the updated copies of these structures go? The front of the log, which is right here. This is what actually happens when I make this modification in LFS: it writes the updated data block and the updated inode at the front of the log, and the new inode links to the new data block rather than to the old one back in the log.
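Here's that write path as a self-contained toy in C. Everything below (the flat in-memory imap, the byte-addressed log, the inode layout) is a simplification invented for illustration, not the real Sprite LFS format; the point is that one logical overwrite becomes two appends plus a map update.

```c
/* Toy LFS write path: never overwrite in place. Append the new data
 * block and the new inode at the log head, then point the in-memory
 * inode map at the inode's new home. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512
#define NDIRECT 12
#define MAX_INODES 1024

struct inode {
    uint32_t ino;                 /* which file                    */
    uint32_t size;                /* file length in bytes          */
    uint32_t direct[NDIRECT];     /* log addresses of data blocks  */
};

static char     the_log[1 << 20]; /* the whole "disk" is one log   */
static uint32_t log_head;         /* next free byte in the log     */
static uint32_t imap[MAX_INODES]; /* ino -> log address of inode   */

static uint32_t log_append(const void *buf, uint32_t len) {
    uint32_t addr = log_head;
    assert(log_head + len <= sizeof the_log);
    memcpy(the_log + addr, buf, len);
    log_head += len;              /* append-only: no seeking around */
    return addr;
}

/* Overwrite logical block bn of file ino: two appends, one map poke. */
void lfs_write_block(uint32_t ino, int bn, const char data[BLOCK_SIZE]) {
    struct inode ind;
    assert(bn >= 0 && bn < NDIRECT);
    memcpy(&ind, the_log + imap[ino], sizeof ind);   /* latest inode */
    ind.direct[bn] = log_append(data, BLOCK_SIZE);   /* new data blk */
    imap[ino] = log_append(&ind, sizeof ind);        /* new inode    */
    /* the old data block and old inode are now stale space in the log */
}

int main(void) {
    /* "creating" a file is just appending a fresh inode: there is no
     * preallocated inode array to search */
    struct inode f = { .ino = 1, .size = 0, .direct = {0} };
    imap[1] = log_append(&f, sizeof f);
    char buf[BLOCK_SIZE] = "new contents";
    lfs_write_block(1, 0, buf);
    return 0;
}
```

Note the ordering: the data block goes in first, then the inode that points back at it; pointers always point backward into the log.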
So what does this mean? Jeremy, do you have a question? So that could be true. But remember, you have to drink the LFS Kool-Aid for a minute here: why don't I care about read seek times? They're going to hit the cache, right? Just pretend I never have to do a read again. You are correct, though, and one of the things people worried about with LFS, and we'll get to that argument later in the lecture, is whether reads still matter, and how much. LFS is giving up a lot of layout quality for reads in order to do nice writes, because now the data blocks of my file can be essentially anywhere on the disk, at effectively random locations. If I have a read-bound workload, this could work very poorly. But remember, I have a big cache and I never have to do any reads; let's stay in that mode for a minute.

OK. Now that I've written these updated inodes and data blocks, what's true about the two old copies, Jen? Are they valid anymore? No, right? Essentially, they're now free space in my log. I need to keep track of this somewhere, but the idea is that LFS marks them as stale copies. A normal file system would simply have overwritten them with the new data, but that's not how LFS works, because I want to do all my writes to the same place.

So what happens over time? Imagine I run this all the way to the end of the disk. My log is going to be a mixture of what and what, AJ? Well, I know it's inodes and data and other disk structures, but let's split those into two bigger groups. Yeah, Tim? Right: some of my log is essentially free space, stale stuff: old copies of inodes that aren't current anymore, old data blocks that aren't current anymore, and other data structures we'll talk about. And some of it is still good. Maybe this is the last update I made to this file before reaching the end of the log, so these blocks are still live. Everything else is stale.

So there are really two things the LFS salesman has to convince you of. One is that reads don't matter: they're handled by the cache. What's the other potential problem you can see coming? Yeah. Well, maybe; but the real issue is that when I get to the end of the disk, I have a problem. What's my problem, Sean? Yeah: the disk is not actually full. I can't tell the user the disk is full when most of the log is dead space. So I have to do something about this, and LFS calls it log cleaning; we'll get to it in a minute.

But there are some thorny problems along the way. The first is: when do I actually do writes? Remember, I still want to stream as many writes together as possible. I've done a better job with seeks, but it's still better to gather writes in the cache. And this starts to interact with consistency, which we talked about last time.
That's a big issue in itself, but essentially what I do is gather as many writes in the cache as possible and push them out together. I think the original LFS wrote things out when the user called sync explicitly or when blocks were evicted from the buffer cache, but you can imagine doing it periodically as well. The point is to gather writes, because the more writes I batch, the more cost I amortize. And the nice thing is that because I'm writing everything to the same place, batching writes is even more profitable: there are no big seeks in between them.

The other thorny question is this. Who remembers how FFS translated inode numbers to disk blocks? Remember, the file system does not understand names; it knows about numbers. At some level, translating a path name means translating a sequence of characters into a number. But once I have the number, how do I find the inode itself? On FFS or EXT4, given inode number 632, how do I find where that inode is located on disk? Anybody remember? Jeremy? Yeah, there were well-known locations holding arrays of inodes. EXT4 and FFS had these areas on disk, which they tried to co-locate with data blocks, containing inode arrays; I could map the inode number to the right array, and then index into the array to find the inode itself.

What about LFS? What do you think LFS does with the inode map that maps inode numbers to the inodes themselves? It could put it in well-known locations, but then what would it have to do? Remember, my goal is to do all my writes in the same place. If I put the inode map at fixed locations on disk, can I still do all my writes in one place? Nick? Right: if I kept it at well-known locations, I'd still have to seek to do writes, because updating the inode map, and updating inodes, would mean writing to some other part of the disk. So on LFS, inodes are simply appended to the log.

And that creates the other problem with LFS. Let me go back to the picture and make this clearer. On EXT4, the inode map is always in the same place, and all the inodes are always in the same place. An inode with a given number never moves: once you format the file system, inode number three will always be one specific 256 or 512 bytes, or whatever it is, on disk. On LFS, on the other hand, inodes move all the time. Here was the old inode; then I appended a new copy to the front of the log. Every time I modify a file, that file's inode moves, and so I've broken the nice mapping I was using to find inodes. Before, I had a number and used it to find the inode, and since inodes never moved, that was easy. Now my inode jumps around every time I modify the file.
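The contrast is easy to see side by side. A sketch with invented constants: on the FFS/EXT4 side the inode's address is pure arithmetic over a format-time layout, while on the LFS side it's a lookup in a table whose entries churn on every write; that table is exactly the data structure whose fate we're about to discuss.

```c
/* Inode lookup, sketched both ways. Constants and names are made up
 * for illustration; real layouts are more involved. */
#include <stdint.h>
#include <stdio.h>

#define INODE_SIZE       256u
#define INODES_PER_GROUP 2048u
#define GROUP_BYTES      (8192u * 4096u)  /* blocks per group * block size */
#define ITABLE_OFFSET    65536u           /* inode table inside a group    */

/* FFS/EXT4 style: fixed at format time; inode 632 never moves. */
uint64_t fixed_inode_addr(uint32_t ino) {
    uint64_t group = ino / INODES_PER_GROUP;
    uint64_t slot  = ino % INODES_PER_GROUP;
    return group * GROUP_BYTES + ITABLE_OFFSET + slot * INODE_SIZE;
}

/* LFS style: one more indirection. Every write to a file updates its
 * entry here, and the map itself must also reach disk so the mapping
 * survives a reboot. */
static uint64_t imap[65536];
uint64_t lfs_inode_addr(uint32_t ino) {
    return imap[ino];
}

int main(void) {
    imap[632] = 987654;        /* set by the most recent write of file 632 */
    printf("fixed: %llu  lfs: %llu\n",
           (unsigned long long)fixed_inode_addr(632),
           (unsigned long long)lfs_inode_addr(632));
    return 0;
}
```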
So what do you think LFS does about this? I still need some way to map inode numbers to the locations of inodes on disk. That's some data structure. Where does that data structure get written every time I modify an inode and the inode's location moves? Manish. Yeah, I definitely keep it in memory, but I need it on disk too; it has to make its way to disk, because otherwise, if I shut the system down and boot it again, where are the inodes? I don't know. So where does this data structure end up? Right: it gets logged too. Maybe with some well-known pointer to the latest copy; I don't remember exactly how it works. But the idea is that all on-disk data structures in LFS are likewise just appended to the log. Any time I modify inodes and move them around, I log the updated inode map somewhere LFS knows how to find it. And all the other metadata too. Remember the bitmaps that told me which inodes were in use? Maybe I don't need those anymore in LFS, because I don't have fixed arrays of inodes. But essentially all the file system metadata we've been talking about gets logged. There was a really nice consistency to their approach: everything goes at the front of the log. Anything that gets changed, anything that gets modified, is logged.

Frank, do you have a question? Yeah, that's a good question: does LFS have this problem? Remember, EXT4 and other file systems that allocate fixed arrays of inodes are limited: at format time they create all the inodes the file system will ever have, and if they run out, they run out. Does LFS have that problem? Does LFS have fixed-size arrays of inodes anywhere? Did I hear a yes? No: it's just logging stuff. On LFS I just have this log. Say I create a new file. On FFS I had to find a free inode in one of those arrays. On LFS, what do I do? What do I need? All I need in order to allocate an inode on LFS is what? Just space on disk. I put the new inode right at the end of the log. So LFS, because it doesn't have these data structures created at format time, doesn't have that particular limitation. It has other problems, but not that one.

Another question: say I have a big nested directory tree, fifteen directories deep; if I update a file, do all fifteen inodes up the tree get updated? Well, why would you have updated all those directories? But if I update the file, doesn't the parent directory have to change, since the file's inode location changed? And then writing the parent directory moves the parent's inode, and so on up the tree? No, no. Remember: what do directories map? Directories map path names to what? Inode numbers. So if the location of an inode changes, I don't have to change the directory at all. I only have to change whatever data structure I use to map the numbers to locations on disk.
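One way to see why nothing cascades up the tree, as a small sketch (sizes and names invented): the directory entry records a stable number, and only the separate number-to-location map ever changes.

```c
/* A directory entry stores the inode *number*; only the separate
 * number-to-location map knows where the inode currently lives. */
#include <stdint.h>
#include <string.h>

struct dirent {
    uint32_t ino;       /* stable: "etc" stays 635 forever */
    char     name[28];  /* one path component              */
};

static uint64_t imap[4096];   /* ino -> log address; churns on writes */

/* Resolving a name touches only stable numbers... */
uint32_t dir_lookup(const struct dirent *d, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(d[i].name, name) == 0)
            return d[i].ino;
    return 0;                 /* 0 = not found in this toy */
}

/* ...so when inode 635 moves in the log, one map entry changes and
 * no directory block anywhere needs rewriting. */
void inode_moved(uint32_t ino, uint64_t new_log_addr) {
    imap[ino] = new_log_addr;
}

int main(void) {
    struct dirent root[2] = { {635, "etc"}, {820, "home"} };
    uint32_t ino = dir_lookup(root, 2, "etc");   /* -> 635        */
    inode_moved(ino, 0xBEEF00);                  /* dir untouched */
    return 0;
}
```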
So it's a good question: if I update this inode and the inode moves, do I have to update the directory it's listed in? And the answer is no, because all the directory says is: this is still inode 635, or whatever. It's just in a different place now. The directory will still say etc maps to 635; 635 has simply moved, and the file system has to be able to find it. Good question. Any other questions about this? It's a wild, weird approach. It's one of those things that's conceptually very nice until you start to think about it, and then it develops problem after problem.

All right, so we've handled the problem of locating inodes: we can log all that stuff. Now we hit one of the big issues: the log reaches the end of the disk. Most of the space in the log will be what? My log has grown to the size of the disk, but most of the disk is what? Still free, right? It's stale stuff, stuff the file system has marked as free because it's old copies of things that have since been updated. So there's usually a lot of unused space earlier in the log.

How much unused space depends on the pattern of file system operations. For example, suppose I do a lot of updates to files, especially timed so they don't get gathered together by the cache. Imagine I update one byte of a file every minute: that data block gets written over and over and over in my log, and by the time I hit the end of the disk, I've got 80,000 outdated copies of that stupid file. This is one of the reasons LFS tries to do aggressive write caching: the more spurious updates it writes to disk, the more stale space piles up in the log.

So LFS has this great stretch of time where it's just appending to the front of the log, doing hardly any seeks, and life is beautiful. Then it hits the end of the disk. (LFS actually did this in segments, breaking the disk into smaller pieces, but you can think of it as one big chunk.) LFS gets to the end of the disk, and suddenly there's a big chore to do. I don't know what the best analogy is. It's like if you never did laundry for long periods of time, and some of you do this, I think: you throw dirty clothes into one corner of the room, and pretty soon the pile is expanding and expanding and you're trapped in the opposite corner, and it's, OK, I have to dig myself out of this, I have this huge cleanup job to do. That's kind of what this is. If you really enjoy spending several days in a row doing laundry all at once, you can live that way; I don't have that much laundry, so I couldn't get away with it. So at some point I need to clean.
Essentially, here's how LFS cleaning works. Here's my segment; it's some mixture, where the white is free space, the red, I think, is metadata, and the green is data blocks. So I have this segment, and large parts of it are unused. One reason LFS works in segments is that it's easier to clean one segment into another. Imagine the disk as two halves: when the log fills the first half, I compact it. I gather everything in the log that's still live and write it at the beginning of the next segment. Now the first segment is clean and can be reused, and I still have part of the second segment left to keep logging into. Nick, do you have a question?

Yeah, at some point your disk can genuinely fill up. But the problem on LFS is that you might reach the end of the log with only 10% of it actually in active use holding live data. You can't just let the disk fill up with junk in the log. Sean? No, that stuff in the log really can be thought of as free: it means I have a more up-to-date copy of the same piece of information somewhere later in the log. If it's an inode, I have a more up-to-date copy of that file's inode; if it's a data block, I have a more up-to-date copy of the block.

Actually, here's an interesting observation: at some level that space isn't free, it's storing old copies of certain pieces of data associated with a file. So if you were clever, what might be fairly straightforward to add to LFS? Yeah: some form of versioning, because that's kind of what LFS is doing already. It never overwrites old data; it just keeps writing new versions of things. And I believe there are versions of LFS that integrate versioning, simply because it almost comes for free. It means using more of the disk at some point, but versioning always means using more disk, because I have to keep the old copies. Still, the simplest way to think about it is: stuff in the log that's marked free is stale, data I no longer need. That's what's happened here. Maybe this is a data block, and there's a copy here, a copy here, a copy here; all of those copies are now irrelevant, because they're not up to date. All right, good questions. Any others before we go on?

And it turns out that cleaning is pretty terrible. Think about what cleaning involves. I've done all this beautiful work to avoid doing what? Seeks. And now I've got this potentially incredibly seek-heavy job. LFS came up with really nice, elegantly engineered ways to make cleaning fast and efficient, but it's still painful, partly because there are two segments involved. Remember that cut-and-paste we watched before? That's kind of what's going on here: I've got one big chunk of the disk with some junk in it, and I need to move the live stuff into this other chunk of the disk.
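Here's that copy-live-forward step as a runnable toy, with liveness reduced to a plain lookup table; the real system decides liveness from per-segment summary blocks checked against the inode map, which this sketch elides.

```c
/* Toy segment cleaner: copy the live blocks to the log head, and then
 * the whole segment is clean and reusable. Liveness here is a plain
 * table standing in for the summary-block/inode-map check. */
#include <stdbool.h>
#include <stdio.h>

#define SEG_BLOCKS 8
#define MAX_BLOCKS 64

static bool live[MAX_BLOCKS];           /* stand-in liveness oracle */

static void log_append_block(int blk) { /* rewrite at the log head  */
    printf("copied live block %d forward\n", blk);
}

/* Returns the number of blocks copied; 0 is the free-win case where
 * the segment can simply be marked clean without reading anything. */
int clean_segment(const int blocks[SEG_BLOCKS]) {
    int copied = 0;
    for (int i = 0; i < SEG_BLOCKS; i++)
        if (live[blocks[i]]) {          /* stale copies are skipped */
            log_append_block(blocks[i]);
            copied++;
        }
    return copied;                      /* caller marks segment clean */
}

int main(void) {
    int seg[SEG_BLOCKS] = {3, 7, 9, 12, 20, 21, 40, 41};
    live[7] = live[40] = true;          /* everything else is stale */
    printf("copied %d of %d blocks\n", clean_segment(seg), SEG_BLOCKS);
    return 0;
}
```

The zero-copies return value is the happy corner case that comes up again just below: a segment whose blocks are all dead can simply be marked clean.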
So yeah, I had this beautiful period when things were great, and now I have to clean. Maybe LFS is like the brand-new disk that feels really fast for five minutes and then never feels that fast again, because past a certain point I'm always doing this. Once my disk gets busy, there's always some cleaning to do: some segment full of junk that I've been trying to ignore, and now I have to go in there and clean it up.

LFS tried all sorts of ways to make this problem go away. They said: we can run the cleaner only when the system is idle. But think about a high-demand web server; it doesn't idle. Idle means something else crashed in front of your high-performance database server or web server. Idle is not something to count on. Then there are all these questions about segment size. If I clean big pieces of the disk at once, I can amortize some of the cost of cleaning. But small segments have one fairly nice case, and it really does happen: depending on the segment size, it's possible that all the blocks in a segment are dead. Especially if I keep logging and clean a couple of segments behind the head of the log, it's possible that every block in a segment has been updated since the segment was written. In that case, cleaning is trivial: I mark the segment clean and move on. So here's another design trade-off: small segments make that corner case more common and cleaning potentially very easy; big segments amortize the overheads of cleaning better.

It also turns out that cleaner overhead is incredibly workload-dependent, which is one reason log-structured file systems seem to have been so popular to argue about. On certain workloads the cleaner says: I'm done, man, no problem. On other workloads, well, I had some roommates in college who played rugby, and their cleaning overhead was incredibly high, because they came home with most of the mud in Cambridge on their clothing. Depending on what you do, cleaner overhead can be very light or very heavy. So when people started debating these systems, there were lots of ways to make log-structured file systems look good or bad, depending on the point of view being argued.

And then, as somebody pointed out before, what if the cache doesn't soak up the reads? Say the cache isn't as effective as we want it to be. Now I've ended up with really discontiguous block allocation. I made a trade-off: I heavily optimized for writes, and if writes don't turn out to be my problem, I'm in trouble, because I've given up some of the tricks that FFS, and even modern file systems today, use to get better block layout, all to focus on this one issue of writes and the seeks associated with writes.
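The original Rosenblum and Ousterhout paper quantifies this workload dependence with a simple write-cost model. If a segment being cleaned has live fraction u, the cleaner reads the whole segment, rewrites the u live fraction, and recovers 1 - u of the space for new data, so the I/O per byte of new data written is roughly:

\[
\text{write cost} \;=\; \frac{1 + u + (1-u)}{1-u} \;=\; \frac{2}{1-u}
\]

At u = 0 the segment needn't even be read, so cleaning is nearly free; at u = 0.5 every byte of new data costs four bytes of I/O; at u = 0.8 it costs ten. Same cleaner, wildly different bills: the rugby-roommate end of the spectrum is just high u.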
So there's a fun, long back-and-forth about LFS. You can probably find long discussions in old newsgroups and the like; this is 4chan, 1991-style. But people thought hard about this. There was the original paper by Rosenblum and Ousterhout in 1991. Then Margo Seltzer, who's now at Harvard but was one of Ousterhout's students, re-implemented LFS and did some additional performance testing. At this time FFS was still around and considered more or less state-of-the-art, so a big argument played out, FFS versus LFS, across multiple papers with different workloads. She and her co-authors made some improvements to FFS, and now FFS beats LFS. And of course Ousterhout came back and said: you did a terrible job of implementing LFS. In this community, that's about the worst thing you can say: hey, don't insult my code, man. And they insulted everything: you didn't implement it well, you chose bad benchmarks, and you did poor analysis. Maybe all those flaws gathered together and negated each other into good analysis, but I don't think that's what he meant. Then in 1995 there was a second paper, by Margo again, questioning the LFS performance claims, and here the debate gets more nuanced: it becomes more about what kinds of workloads are good for LFS as opposed to FFS, and how much tuning we're willing to do to produce good performance for a particular file system.

Nick, do you have a question, or? Yep. Well, it's really interesting. File sizes now? I don't know, maybe a gigabyte or so for movie files, high-definition video; a couple of gigs for your VM image for this course. Sorry about that; I'll try to do better next year. But one of the other things that's happened is that file access patterns have changed a lot too, and I'd argue that some of those media files have very, very different access patterns. There was a study done on NFS at Harvard, I think maybe five years after this, using NFS traces from the campus general-purpose login servers. Back in the old days, you logged into a machine and ran a terminal program to read your email; it was called pine, and there were others, elm and so on. Pretty crufty now, obviously. And what they found, which was really interesting, was that there were these really weird workload characteristics that were the result of email. People had huge inboxes; this was before Gmail, but people still had inboxes with tens of thousands of messages, and the mail clients stored each inbox as one file, with all the activity on that file happening at the end.
So there were these really odd workloads. Anyway, Ousterhout of course comes back, and he's still not happy, because the paper doesn't say that LFS is better than Wonder Bread or whatever. So this went on. And this is Margo, by the way. Margo initiated the creation of the wonderful educational operating system you're using for this course. She taught me operating systems; she's a wonderful person, and she was in the middle of this debate.

Next time, on Wednesday, we'll talk about OS structure. We're done with file systems. Good luck with your assignments.