So today we're going to talk about two file systems that are fairly different. First, a file system design from the late 1970s and early 1980s called the Berkeley Fast File System, which of course is a great name for a file system, because what else would you want your file system to be? And then we'll talk about a very different approach to file systems called log-structured file systems. And we'll see how far we get today. I'm going to briefly remind you to do assignment three; obviously, you know that. OK, so no review today. Let's go straight to the material, since we have a fair amount to try to get through. So, the Berkeley Fast File System. This was first released in 1982 as part of one of the Berkeley Software Distributions of UNIX. Berkeley was one of the schools involved in early versions of the UNIX operating system, along with Bell Labs and some other places. And Berkeley released a distribution that's still quite popular, called BSD. How many people have ever used BSD? Yeah. If you go on and take Ken Smith's summer course on operating system stuff, he does a lot of BSD stuff; I know he's a big BSD person. BSD people are kind of their own breed. They don't always mix very well with other people, particularly with Linux people. There's a lot of hate in the BSD community for Linux, some of it slightly irrational, I think. David Holland, for example, is a BSD person, so he grumbles whenever you try to get him to use something that smells like Linux. OK, so the Fast File System was developed, and continues to be developed, by Kirk McKusick. So essentially one person wrote this file system, which is pretty cool. And there is an ongoing effort to continue to develop FFS; it's now called UFS, the UNIX File System. As of a couple of years ago, Kirk was still working on this project, and there are still releases of UFS that you can get. It's still used on certain systems. We talk about the original Berkeley Fast File System because it made a lot of contributions to file system design. Some of those contributions were very lasting: certain types of features, certain ways of thinking about the file system. And then there were some that were very, very much tied to the specifics of spinning disks, and to certain quirks and properties of particular spinning disks. We'll talk about both today, just to give you a sense of what these were. Essentially, if you divide the features into those two groups, the features that depended on disk geometry have not aged very well, partly because determining things about disk geometry has become very complex. And of course, when you start to think about things like flash, flash doesn't even have the same geometric properties that spinning disks have. So, and this is a continuation of our earlier discussion of file systems: one of the things that file systems want to do to improve performance, and you can really think about this spatially, is to think about where to put stuff. Data ends up in a physical location on disk, on some platter; you could point to it if you could open up the disk. So what types of considerations might the file system want to make when it thinks about where to put things on a spinning disk? What are some of the aspects of the disk location that I'd like to consider? Yeah. Yeah, so genuine spatial locality. So where do I put stuff?
Where do I put inodes? Where do I put data blocks? Where, in general, would I like my data blocks to be? Randomly located all over the disk? Where would I like the data blocks to be? There are a couple of answers to this question. For good performance on a spinning disk, yeah. What's that? OK, but the head moves around; you've just given me an answer that's essentially "anywhere on disk." Together. I want the data blocks to be together. Why? Because files are frequently accessed as a whole: I open a file and I read the whole file, maybe read it into memory and process it from start to finish. So the data blocks should be close to each other. And close to what else? I'd love to have the data blocks close together, but I'd also like to have them close to what? To the inode for that file. Remember, the inode is the data structure that has the metadata about the file. So if I can put the metadata and the file contents as close together as possible, then I can reduce the seeks required when I open and read a file, because to open the file, remember, I have to get to the inode first. Hi, how are you doing? I thought he was going to start dancing. So yeah, where to put the data blocks. How about where to put files that are related to each other? Those first two things are single-file optimizations, but then I have even more fun games I can play, where maybe I want to put all the files in a particular directory close to each other, because they're all accessed around the same time, or something like that. So I can look at temporal patterns of access to multiple files. Of course, that forces me to answer the question of which files are likely to be related to each other, and there are a bunch of different ways to do that. So in many ways you can think of FFS as the first modern file system, with a lot of features that you guys are familiar with. These are some of the more lasting contributions of FFS; we'll come back to the rotational planning stuff in a minute. FFS introduced larger block sizes, and we talked before about why this was nice. FFS allowed me to allocate data blocks close together on disk. This is good because, remember, even if I can't do anything else, I certainly want the data blocks of a particular file to be close together. Modern file systems have taken this further, playing a game where they trade off storage efficiency for locality by allocating big contiguous chunks of a file, which ext4 calls extents.
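To make the extent idea concrete, here's a small sketch in C. This is a simplified, hypothetical structure, not ext4's actual on-disk format (the real thing has more fields and lives in a tree), but the core idea is the same: one small record maps a whole contiguous run of file blocks to a contiguous run of disk blocks, so locality comes for free within each extent.

```c
#include <stdint.h>
#include <stdio.h>

/* Simplified sketch of an extent-style mapping, loosely inspired by
 * ext4's extents. Field names and widths here are hypothetical. */
struct extent {
    uint32_t logical_block;   /* first file block this extent covers */
    uint32_t physical_block;  /* first disk block of the contiguous run */
    uint16_t length;          /* number of contiguous blocks */
};

/* Map a file-relative block number to a disk block by scanning a small
 * extent list. Returns 0 to mean "not mapped" (a hole in the file). */
static uint64_t lookup_block(const struct extent *ext, int n, uint32_t fblock)
{
    for (int i = 0; i < n; i++) {
        if (fblock >= ext[i].logical_block &&
            fblock < ext[i].logical_block + ext[i].length)
            return ext[i].physical_block + (fblock - ext[i].logical_block);
    }
    return 0;
}

int main(void)
{
    /* Two extents cover the file's first 12 blocks in just two records. */
    struct extent map[] = { { 0, 1000, 8 }, { 8, 5000, 4 } };
    printf("file block 9 -> disk block %llu\n",
           (unsigned long long)lookup_block(map, 2, 9));  /* 5001 */
    return 0;
}
```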
And earlier file systems lacked all these features that you guys are used to: symbolic links, file locking (which a lot of programs use for IPC), user quotas, file names that can be as long as you want. That's an awesome feature; you guys should be really happy that it exists. So yeah, stuff that you're used to. How many people have used a machine recently that had a limit on the length of a file name? Good; that's a good answer. If you've used a Mac, you have a machine with an even weirder property. Does anyone know what the Mac does? It's something I've never been able to understand. Yeah: it has case-insensitive file names. It's like the weirdest thing I've ever heard. And the thing that's so weird about it, sorry, this is a little bit of an aside, is that there has to be extra code in there somewhere to make the file names case-insensitive, right? Because if I just compare the characters in the two file names, they're going to be different. So you guys understand what this means? Essentially, on a Mac, if you create a file called capital FOO and then you open lowercase foo, those are the same file to the Mac. I have no idea why this is true. I once tried to install a Mac OS machine, and I decided I hated this feature, so I formatted the drive with a case-sensitive file system. It turns out it will not install, which is totally bizarre; somehow the installation process has a dependency on case-insensitive file names. Just blows my mind, right? Anyway, that's a feature that came along later, I guess, thanks to the Mac people. OK, so go back to thinking about disks. Based on your knowledge of disk geometry, what stuff is close on disk? Particularly given the enemies of closeness that you guys understand. If I wanted to say this particular area of the disk is all, quote-unquote, close, what would that be? Yeah? Stuff on the same track? OK, it's a good start, but I think actually there's more. You guys remember the cylinder? Yeah. So seek times are the major problem; rotational delay is something we're not even going to bother with here. So FFS was the first to introduce the idea of a cylinder group. A cylinder group consists of all the blocks on a set of neighboring cylinders, so it's a bunch of cylinders that are close to each other. And FFS used cylinder groups to try to do better spatial layout when allocating blocks for a file, or placing inodes, and things like this. And in FFS, every cylinder group has its own superblock. This turns out to be a backup copy of the main disk superblock, which is nice, and it also acts as a header, almost like a superblock for that part of the disk. The cylinder group has inodes, it has data blocks. So one of the things FFS did, in effect, is create these little mini file systems located throughout the disk. Rather than having one big file system spread all the way across the disk, I break the disk up into a certain number of cylinder groups, and I spread the file metadata across those cylinder groups. So it's almost like a bunch of little mini file systems. At some point, people are going to stop laughing at this, and I'll know that I'm really old, but I'm not there yet. You guys remember this from ext4? When we looked at the ext4 metadata, you saw this stuff: group 0, group 1, and so on. This is the same thing; it's a legacy of FFS. So here's where it gets really sort of wild and crazy: the rotational planning that FFS would do. Before we go into this, because I don't think this has come up yet this semester, let me ask you a basic rotational planning question. You are, for some reason, forced to write a new file system for spinning disks. The file system is initialized to use the whole drive, and you're starting to allocate data blocks. Where should you put your first data blocks? What's that? The outside. So someone is claiming that I should start allocating at the outer edge of the disk. Why is that? OK, so the outside of the disk is going to store more data than the inside, but that's actually not really why I like the outside so much. What do I like about the outside? It doesn't really matter that the heads are going to move. Yeah, it turns out I can actually read and write faster at the outside of the disk than I can at the inside. Because think about it: the outside track is longer, but in one rotation, all of that area comes underneath the heads. If I go to the inside, the track is shorter, and so I can read and write less data per rotation. So that's terrible, right?
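You can put rough numbers on this with a back-of-the-envelope calculation. The drive parameters below are made up for illustration, not from any real drive, but the arithmetic is the point: sequential bandwidth is bytes per track times rotations per second, and the outer tracks simply hold more bytes.

```c
#include <stdio.h>

/* Back-of-the-envelope: sequential bandwidth is (bytes per track) x
 * (rotations per second), so the longer outer tracks transfer more per
 * rotation. All of the numbers below are made up for illustration. */
int main(void)
{
    double rpm = 7200.0;
    double rot_per_sec = rpm / 60.0;     /* 120 rotations per second */
    double outer_track_bytes = 1.0e6;    /* ~1 MB on an outer track */
    double inner_track_bytes = 0.5e6;    /* ~0.5 MB on an inner track */

    printf("outer: %.0f MB/s\n", outer_track_bytes * rot_per_sec / 1e6);
    printf("inner: %.0f MB/s\n", inner_track_bytes * rot_per_sec / 1e6);
    return 0;   /* outer: 120 MB/s, inner: 60 MB/s */
}
```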
This is, you know, no wonder people like doing file systems so much: there's all this gooey stuff to think about. FFS went even further, because you have to remember this is the 80s, so these are even older drives with even worse properties. FFS just went totally nuts on the rotational planning. They really got deep down into the guts of these drives and understood things about them. So for example, imagine, and this was true for some early drives, that the speed at which the heads can read from the platter is faster than the speed at which the disk can send data back to the OS. The heads read faster than the bus that carries the data back to the rest of the system can drain. So here's the problem. If I have data located in consecutive blocks, the disk heads read block 0 and start transmitting it back, but by the time they get to block 1, there's no room in the buffer for another block. So it turns out I have to go all the way around the disk again to read block 1, then all the way around the disk again to read block 2. This was bad; I don't want to have to do this. So what do you think they did to fix this problem? How do you work around it? You certainly don't want to do another full rotation every time. I mean, rotational latency is smaller than seek time, so I'm already doing OK; I'm on the same track. But I still don't want to pay an extra rotation every time I pick up a block. So what do you do? They actually incorporated this delay into their layout. Rather than putting the blocks of a file right next to each other, I interleave them a little bit to allow the buffer to clear. This is one of those things where you're like, I'm totally impressed, and yet I'm horrified. I'm horrified that we had to do this. This was clearly a dark time in human history, when we had to do things like this. These problems have been fixed, thank God.
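Just to see what that interleaved layout looked like, here's a sketch. The track size and interleave factor are made up, and this is just one simple interleaving scheme; the real FFS derived its parameters from the measured characteristics of the particular drive. With 2:1 interleaving, consecutive logical blocks sit one slot apart, so the controller gets one block-time to drain its buffer.

```c
#include <stdio.h>

/* Sketch of block interleaving: instead of placing a file's logical
 * blocks in consecutive slots around a track, leave a gap between them
 * so the controller's buffer can drain before the next block passes
 * under the head. Track size and interleave factor are made up. */
#define SLOTS_PER_TRACK 8
#define INTERLEAVE      2   /* 2:1 interleave: one slot-time between blocks */

int main(void)
{
    int track[SLOTS_PER_TRACK];

    /* Place logical blocks 0..7; when the layout wraps around the
     * track, shift by one slot so nothing collides. */
    for (int lb = 0; lb < SLOTS_PER_TRACK; lb++) {
        int s = lb * INTERLEAVE;
        track[(s + s / SLOTS_PER_TRACK) % SLOTS_PER_TRACK] = lb;
    }

    for (int i = 0; i < SLOTS_PER_TRACK; i++)
        printf("slot %d holds block %d\n", i, track[i]);
    return 0;   /* consecutive blocks end up two slots apart */
}
```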
All right, so here's the question with FFS, which is kind of the big hundred-million-dollar question: does this stuff matter anymore? I mean, we talked about the FFS features that do still matter, the ones you guys are used to. And if the geometry stuff doesn't matter anymore, that's a good thing; it's a good thing that these things are gone. And it turns out that how this evolved is an instance of a battle that you guys, whether you want to or not, are now a part of: the battle between hardware people and software people. Although maybe there are some computer engineers in here too. So you guys are on the other team, okay? You guys are the hardware people. And the thing is, the software people always want more flexibility. They want more control; they want the hardware to do what they tell it to. And the hardware people are always like, no way. You guys and your software never get it right; I'm super fast; I'm going to make my hardware do these clever things. So actually, I had a friend in graduate school, and, keep in mind, these old disks, when you told an old disk where to put something, it put it where you wanted it. And because the disks had these weird rotational properties, the file systems started adapting to them. But 20, 30 years later, it turns out the disks were doing a lot of internal remapping. So you would say to the disk, store this data in three blocks that you think are right next to each other, and it turns out one of them is way over on some other side of the disk. They're consecutively numbered, but the disk is playing these games with where things actually go. This turns out to be even funnier on RAID, which we're going to talk about on Friday, where there's really not necessarily any relationship at all between the IDs of the disk blocks and where they are on disk. And so my friend did this experiment where he took some file systems that still had legacies of these old rotational planning features in them, and he ran them on drives that had no locality: there was no relationship between block IDs and physical locality. And it turns out that those features actually start getting in the way; it would be better to just get rid of them. So he wrote this paper, I think it was called Stupid File Systems Are Better. And the argument was: let's get rid of all of this block layout planning and just pick random places to put things, because the disk is going to move them around anyway. So why even try to plan? All right, so this comes down to the question of who's responsible for making the slow drive fast. The hardware wants to do this for you, because the hardware thinks, I'm good at it. And the benefit you always get with hardware is that the hardware is fast. Remember our TLB example: the hardware is fast. So hardware is typically faster than software; however, it's a lot less flexible. And the software has the opposite attributes. So for a while, this was how things worked: the software told the hardware where to put things, and the hardware did what the software wanted. And the nice thing about this is that the OS has a lot of visibility into the way the hardware operates, which it can potentially use to improve performance. The bad thing about it, if you talk to hardware people, is that software is slow, and, because it's written by you idiots who don't test your code properly, it's buggy. And this is actually true, right? You have to think about the mentality you've acquired because you write computer software. For example, when people build a hardware chip, it costs like a million dollars to make. And that's not a fictional number; that's a real number that a friend of mine had to spend. It turned out they won some contest, but that's how much it took to fabricate a copy of his computer chip. So imagine that every time you compiled your OS 161 kernel, you had to pay a million dollars. Your workflow would be different. Forget a million: imagine every time you compiled, you had to pay one dollar. So hardware people think differently. To them, this is really high-stakes stuff, because if a chip comes back and it has a bug, you're done. In fact, the million-dollar chip I'm describing came back and it had a bunch of bugs. It didn't work very well. And that was very sad for the people who spent the money on it. And they actually ended up sending it to some guy who tried to fire lasers into it from the side to fix various parts of it. I don't know how it worked, but he was the guy you go to when you have no other options. It's like: I bought this million-dollar computer chip and it doesn't work; can you fire some lasers at it and see if that helps? I mean, clearly, this is an act of the purest desperation, right?
There's very little chance that's going to work, and it turns out it didn't, right? So they had this million-dollar chip that didn't completely work, which is sad. So anyway, this is why the hardware people want to do this: the hardware thinks, I know the most about me. In the case of the disk, the nice thing too is that the hardware buffers and the hardware caches are all closer to the disk; you don't have to cross the bus. The problem is that this can frustrate things the operating system is trying to achieve. We talked about that last time with consistency: if I told the disk to write something, I really want to know it's on the magnetic medium and not lodged in a cache somewhere. All right, so FFS is still under development. There are trade-offs between block sizes that they're exploring. They've started to do things like co-locating inodes and directories. Remember, reading a directory typically involves seeking to the inode and then also grabbing the contents. Well, for a small directory, why not just take the directory contents and jam them into the inode? I've got a little bit of space there; maybe that works better. And then they have this very elegant solution to consistency called soft updates, which I can't explain at all, actually, but it would be a fun topic to look at. It's an alternative to journaling that's a lot more sophisticated. Okay, so we're done with FFS. Questions? This is 1982. Yeah? Yeah, so most modern hard drives will remap sectors, right? We talked about this earlier: the disk comes from the factory and some sectors don't work. And what the disk does, because it wants to present a consistent namespace to the OS, is set aside some spare sectors for this purpose. So if it tests itself and realizes, oh, sector five doesn't work, it says: I'll take spare sector 10 million and call it sector five. And you see the same thing elsewhere. There was actually a very interesting study about three or four years ago, a fascinating experiment, where they ordered, I don't know, 20 hard drives: identical model, same batch from the manufacturer. They literally might have come off the assembly line within a couple of days of each other. And they ran performance tests on them. And what did they find? Huge differences between identical drives. And this has to do with things like remapping and other features. So this is a pretty common feature. And then of course when you get to things like RAID or network file systems, you really have no idea. There's no point trying to do rotational planning on top of a system that doesn't have any rotational geometry you can see. What you're relying on is spatial locality between blocks whose numbers are close together: the assumption is that block 21 and block 22 are going to be close to each other on disk. If that assumption breaks down, don't bother.
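Here's a little sketch of the kind of remapping I'm describing, with made-up sizes; real drives keep this table in firmware and never show it to you. The point is that logical sector 5 and logical sector 6 can end up physically very far apart.

```c
#include <stdio.h>

/* Sketch of drive-internal sector remapping: the drive exposes a flat
 * logical sector namespace, but a bad sector can be silently redirected
 * to a spare. The sizes and lookup scheme here are illustrative only. */
#define NSECTORS 1000   /* sectors advertised to the OS */
#define NSPARES  10     /* hidden spares past the advertised end */

static int remap[NSECTORS];   /* -1 means "not remapped" */
static int next_spare = 0;

static int physical_sector(int logical)
{
    return remap[logical] != -1 ? remap[logical] : logical;
}

int main(void)
{
    for (int i = 0; i < NSECTORS; i++)
        remap[i] = -1;

    /* Self-test finds that sector 5 is bad: quietly point it at the
     * first spare, which lives past the advertised end of the disk. */
    remap[5] = NSECTORS + next_spare++;

    printf("logical 4 -> physical %d\n", physical_sector(4));  /* 4 */
    printf("logical 5 -> physical %d\n", physical_sector(5));  /* 1000 */
    return 0;   /* sectors 4 and 5 are no longer neighbors */
}
```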
That's a good question. Any other questions about FFS? Okay, now we storm through log-structured file systems. So FFS is 1982; now it's 1991. And what has changed about the world? The top song that year was "(Everything I Do) I Do It for You," which replaced "Eye of the Tiger." I think "Eye of the Tiger" is a way better song, so in that case, '82 wins. What's that? Yeah, this was a very dark time. "(Everything I Do) I Do It for You." Yikes. I should get rid of this slide, because every year it makes me remember that song exists. The Silence of the Lambs was the Oscar-winning movie in 1991. Okay, but anyway, what's different about disks since that time? What would you guys predict? If I said, what do you think has changed from 1982 to 1991? Yeah. What's that? Yeah, well, okay, so the FFS guys were pretty prescient: they're talking about PCs, they're thinking about PCs. PCs are more prevalent; there are more PCs out there. But how have the disks changed? Compare the '82 disks to the '91 disks; you would probably say that one thing had improved a lot. Yeah. Capacity, right? I don't know why this doesn't work. Yeah, so disk bandwidth is improving, and disk capacity is improving. By bandwidth, I mean that when I get to a part of the disk and start reading stuff, I can get it back to the OS quite quickly. That's partly a function of density: there's more data per unit area on the disk. And it's also a function of the bus technologies. So once I get to the place where I want to read or write, I can read and write quite quickly. Computers also have a lot more memory now in 1991: 128 megabytes of memory was a lot. Pretty awesome. I mean, you guys have to keep in mind, it really is pretty incredible how much this stuff has changed in the past couple of decades. If you extrapolate from here and think about what computers are going to be like when you're in your 40s, it'd be pretty interesting to find out. Okay, but on the other hand, what do you think has happened to seek times? Are seek times following Moore's Law? No, seek times are following, like, dog's law. Which is that, look, let's be honest, dogs are improving over time, right? The dogs we have now are better than the dogs we had a hundred years ago. They're just not that much better. They don't improve exponentially. Okay, so now here's the thing. We have this problem: we've got all this bandwidth on the disk, the bandwidth has gone up quite a bit, if only we could fix this problem with seeks. And then there was another thing on that last slide, which is that all of a sudden we have more memory. So why does more memory help us? What's good about the fact that computers now have more memory? Maybe in that 128 megabytes of memory, there's some spare memory lying around, right? I think Chrome probably uses 128 megabytes of memory per tab now, but back in the day, that was a lot, and so you had some extra. So what can I do with this memory to make the file system faster? Yeah? Yeah, I can use it for the cache. So I'm going to use a cache; that's my usual trick to make a big, slow thing look faster. And what's happening now is the cache is improving. I think you guys know this by now, right? The theory here is that the cache is now going to be very effective at soaking up reads, reads from the disk. Writes still need to get out to disk eventually, but once I read something once, the cache is going to help me make sure that I don't have to read that thing again. So the cache is going to soak up a lot of my reads. So now let's focus on writes. The cache can help me here too, because it can allow me to collect writes in the cache and then write them out to disk in a larger chunk. So if I have a bunch of modifications to the same disk block, those modifications can be amortized in the cache, and then I can write them out later. As we remember, there's a consistency trade-off here, but I'm assuming I don't have a write-through cache. I'm not going to send every write straight to disk; I'm going to use the cache to soak up some of the writes and allow me to write in bigger chunks.
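Here's a toy sketch of that write coalescing, under the assumption of a write-back cache; the "disk writes" are just prints. Three user writes turn into two disk writes, and in a real cache the ratio is much better.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy sketch of write coalescing in a write-back cache: repeated writes
 * to the same cached block are absorbed in memory and flushed to disk
 * later, in one batch. The cache here is a trivial illustrative array. */
#define NBLOCKS    16
#define BLOCK_SIZE 512

struct cached_block {
    char data[BLOCK_SIZE];
    bool dirty;
};

static struct cached_block cache[NBLOCKS];

static void write_byte(int block, int off, char c)
{
    cache[block].data[off] = c;    /* absorbed in memory, no disk I/O */
    cache[block].dirty = true;
}

static void flush(void)
{
    /* One pass writes out every dirty block: many small user writes
     * have been amortized into a few larger disk writes. */
    for (int b = 0; b < NBLOCKS; b++) {
        if (cache[b].dirty) {
            printf("disk write: block %d\n", b);
            cache[b].dirty = false;
        }
    }
}

int main(void)
{
    write_byte(3, 0, 'a');
    write_byte(3, 1, 'b');   /* same block: still only one disk write */
    write_byte(7, 0, 'c');
    flush();                 /* three user writes, two disk writes */
    return 0;
}
```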
But here's the problem. If I don't change anything else about the file system design, those writes are still going to end up all over the disk. So what's the goal here? Forget everything you know about file system design; you're starting over. Where would you like all those writes to go? At some point I have writes that have to go out to disk. For best performance, those writes should go where? The disk would be oh so fast if I could just write everything where? What's the guess? Write everything to the same place; never move the heads. Now, taken literally, that doesn't make any sense: if I write everything to the same place, my disk is only one block big, and that would be sad. But if I write everything sequentially, then I'm moving the heads as little as possible. So ideally, every write goes to the next block on disk. This sounds fantastic, right? So how are we actually going to accomplish it? Log-structured file systems were developed by these guys out at Stanford: John Ousterhout, who has since gone on to work on memory-only storage servers and other things, and one of his graduate students at the time, a guy named Mendel Rosenblum, who is now a faculty member at Stanford himself. And the main idea of a log-structured file system is quite simple; it's a very simple, elegant idea, which is that all of the writes go to an append-only log. Now, the reads can still come from anywhere on disk, but why don't I worry about the reads? Why am I so focused on writes? What's going to help me with the read problem? The cache. The assumption here is that the cache is going to soak up a lot of the read bandwidth, and all I need to worry about are the writes. Okay, so now the question is how we build a system like this. Conceptually, it's very simple: I want to write everything into an append-only log. The reads can come from wherever they need to come from; I'm going to assume I don't have to do too many seeks for reads, because the cache is my friend. So let's talk about what happens on a normal file system when I change a byte in a file. What would I normally have to do to write one byte to a file? This could be a reasonable short-answer question for the exam. What are the steps I need to take? What's step one? You said read the block. Okay, you're right that I'd need to read the block first, but let's say I already have the modified block in the cache, ready to be written out. What do I need to do? What are the steps here? Okay. Yeah, I need to seek to read the inode map; the inode map is going to tell me where the inode is. Then I need to find the inode and read that. Then what's next? Once I'm at the inode, I can find what? Yeah, I should be able to figure out where the data block is, so I need to seek there to write it. And then I probably need to update the inode when I'm done, to mark that the modification time has changed. So what does this look like? Here's the inode map, here's my inode, here's my data block, and then I go back to the inode. You can imagine that the head is potentially zooming all over the platter to accomplish this write. And that's what I want to stop: I want to get rid of as many seeks as possible.
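Just to spell out the sequence, here's a little trace of the update-in-place write path. The block numbers are invented; what matters is that each one can be a seek to a different part of the disk.

```c
#include <stdio.h>

/* Trace of the update-in-place write path: modifying one byte of a file
 * touches several widely separated disk locations. The block numbers
 * are invented; the point is that each step can be a long seek. */
static void seek_to(const char *what, long block)
{
    printf("seek to %-10s at block %ld\n", what, block);
}

int main(void)
{
    seek_to("inode map", 8);        /* where does the inode live? */
    seek_to("inode", 90210);        /* read it to find the data block */
    seek_to("data block", 551234);  /* write the modified block */
    seek_to("inode", 90210);        /* write back the updated mtime */
    return 0;                       /* four visits, three distinct places */
}
```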
So now let's assume that the big buffer cache is going to soak up the reads. I'm not going to have to read the inode map, because that's going to be in the cache, and I'm not going to have to read the inode, because that's going to be in the cache too. But I still have to write the modified data block and update the inode. So I still have two locations on disk that I need to visit to complete the write. So here's what happens in LFS. In LFS, the disk is a big append-only log, so there's only one place on disk where I write data: here, at the end of the log. Here's my current inode; it's somewhere in the log. Here's the current data block; that's also somewhere in the log. These were written earlier. All I do is write the new data block, and then I write a new copy of the inode. So now this data block and the new copy of the inode have replaced the old ones, and I need to mark that the old ones are no longer valid. And now I have free space in my log. So what I've done, as you see, is that I never seek in order to write. All the writes go at the end of the log. And this means that writes don't overwrite the old copy of whatever it is I'm writing; writes just invalidate that old copy. Does this make sense? Do we have any questions about this? Yeah. Yeah, I'll come back to that. Yeah, that's the whole crux of the matter here. But it's a good point: I am not seeking back here to mark these old copies as free. I either need some in-memory data structure that lets me do this, or, it turns out, I don't necessarily have to do it at all; I can handle it another way, which we'll come back to. Any other questions about this? So here were my current inode and current data block, which had been written previously by these same operations. In this case, I'm writing them both together because of the operation that's taking place, and it would probably be true that they were together in the previous part of the log as well, because, well, I think most modifications to a data block will also require modifying the inode. So these things will tend to be close to each other in the log. Yeah, great question; okay, we'll get there. So this is what's nice about this: the cache soaks up reads, and now writes can be streamed to the disk at full bandwidth. As fast as I can write to the disk, I can write, because there is no seeking at all. Or I shouldn't say that: I'm just seeking to the next block. From time to time, I need to scoot over a track, but in general, I'm just writing, writing, writing my way across the disk. But right, there are some issues here. First, ideally I want to write as many things as possible together; I want to write in large chunks. So in LFS, I want to wait as long as possible to amortize writes, and so I write to the log when the user calls sync or fsync, or when blocks are evicted from the buffer cache. I could write things out earlier as well, but I want to collect as many writes in the cache as possible, so that I can utilize all that disk bandwidth that I have.
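Here's a sketch of the LFS write path, with made-up structures; this is not the real LFS on-disk format. Notice that every append goes to log_tail, which only ever moves forward, and that the inode map is just updated in memory and occasionally appended to the log itself.

```c
#include <stdio.h>

/* Sketch of the LFS write path. Everything is appended at log_tail,
 * which only moves forward, so there are no seeks between writes. */
#define NINODES 128

static long log_tail = 0;      /* next free block at the end of the log */
static long imap[NINODES];     /* inode number -> log address of its inode */

static long log_append(const char *what)
{
    printf("append %-10s at log block %ld\n", what, log_tail);
    return log_tail++;         /* sequential: no seek */
}

/* Modify one block of a file: append the new data block, then a new
 * inode that points at it, then update the in-memory inode map. The
 * old data block and old inode are now dead space in the log. */
static void write_block(int inum)
{
    log_append("data");
    imap[inum] = log_append("inode");
}

int main(void)
{
    write_block(7);
    write_block(7);             /* supersedes the first two entries */
    log_append("imap chunk");   /* the imap itself gets logged, too,
                                   so it survives a crash or reboot */
    printf("inode 7 now lives at log block %ld\n", imap[7]);
    return 0;
}
```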
The other problem is the following: how does LFS know where inodes are? So how did ext4 solve this problem? And FFS, and other earlier file systems? Frequently, I have to map an inode number to an inode data structure; I have to find the inode on disk given a number. How did earlier systems do this? Yeah. Yeah, they put them all in well-known locations. There were groups of them, and the locations of those groups are recorded in the superblock, so it's very easy to find things. In LFS, what happens is that inodes end up all over the place. I keep modifying them, and they could be anywhere in the log. So how does LFS handle this? The well-known locations are what FFS did; in LFS, inodes can move, so what does LFS do? Yeah. It keeps an inode map, and it logs the inode map. Now, the inode map will be in the cache, right? The inode map is one of those data structures that's just going to be in the cache pretty much all the time, because it gets used all the time. But when I modify it, I write new copies of the inode map to the log, along with everything else. Because remember, I need the inode map to be persistent: when I restart the system after a power-down, I still need to know where everything is; I need to be able to find the inodes. So the most recent copy of the inode map is always in the log. And the inode map is just a data structure that maps inode numbers to locations on disk, pretty similar to what the other systems use, except those locations can now be anywhere, and so I have to have a special data structure to do the mapping. So all the metadata about the files, inodes, data block allocation bitmaps, is all logged. Essentially, LFS takes the approach that all of the file system metadata and data blocks, everything that's written, is logged. I never want there to be special locations for anything on disk; everything just ends up in the log. Now, here's the issue, though. At some point, I'm going to run out of space; I'm going to hit the end of the disk. Now, when the log reaches the end of the disk, is the disk actually full or not? I see people shaking their heads. Why is the disk not full? I mean, I'm at the end of the disk, right? I've logged all there is to log. Shall I now report that the disk is full? Yeah, there's all sorts of dead stuff in the log. And this is where the battles over LFS really began, because there's, probably, potentially, definitely, a lot of unused, dead items in the log: old inode bitmaps, old inodes, old data blocks, all this stuff that's been superseded by later log entries. And in order to reclaim this space, we need to perform a process called cleaning. Cleaning has to use the file system's current data structures to figure out what is dead in the log. And you can imagine how to do this: I go through the log, block by block, and figure out what each thing is. If there's a newer copy of this data block linked to the file later in the log, I say this data block is no longer needed, and I mark it as free. So conceptually, you can think about LFS getting to the end of the disk and doing this process where it cleans the whole disk. In reality, this isn't what happened. Instead, LFS divides the disk into segments and cleans them individually. Why is that a better idea than waiting until I get to the end of the disk and doing it all at once? Yeah, it's not fault tolerance; that's another property that people like, but it's not the reason here.
So how would the file system behave if I cleaned the whole disk at once? Yeah, it would be like a little pop-up shows up saying, please come back 20 minutes later, right? I don't want to do this. Essentially, while I'm cleaning, the disk is out of space. So I don't want to wait and then have to halt all activity on the system for five minutes while I clean the whole disk and restart. That would be bad. So I amortize that by cleaning parts of the disk. You can imagine I have little segments: once I move on to the next segment, I clean the previous segment, and I repeat. So here's an example of how cleaning works. Here, the colored parts are parts of my log that are in use; the white stuff is not. You can imagine I take all the metadata and put it up at the front, I take all the data blocks, and now I have a clean segment. Notice also that this frequently requires two segments: I need an empty segment to clean into. I think you can do cleaning in place, but it's less efficient; it's more efficient to have a whole segment free. And this is another reason that LFS broke the disk up into multiple segments and cleaned them one at a time: this way, I can have an empty segment that I've already moved stuff out of, clean into that, and then I'm done. So, the performance. LFS sounds like this fantastic, very elegant idea. The problem is the cleaning. This is where all these battles over LFS took place, because, you know, I think there's a certain genre of computer systems that do something like a magic trick. They say: watch me wave my magic wand and make all your performance problems disappear. But of course the problems didn't really disappear; they're just hidden somewhere else, and your job is to figure out what got made really slow in order to make the other stuff fast. With LFS, I think there were some real performance improvements, but some of it was this idea that I'm deferring work into cleaning. So there are all these issues related to cleaning. The first issue is: when should I clean? I'm going to want to clean when the system is idle, but if I'm on a heavily loaded server, that doesn't happen a lot. And cleaning, as you've probably noticed, creates an enormous amount of I/O. So while I'm cleaning, I'm creating disk activity that ruins exactly the property of the disk that I want. Remember, I want all the write activity on the disk to go to a single place; I want to do as few seeks as possible. But by definition, when I'm cleaning one segment, I've got all this disk activity over there, and meanwhile I'm still trying to use another segment, so I have real disk activity associated with new files and things like that happening in another place. So while I'm cleaning, to some degree I've lost some of the benefit that I wanted from LFS in terms of seeks. The second question is how large the segments should be. Large segments are nice because they amortize the cost of reading and writing all the data: some percentage of the segment is going to be dead, and everything else needs to be copied. So I need to touch everything once, and anything that's still alive I actually have to touch twice, because I have to write it back out as well. However, small segments increase the probability that I have this miraculous moment where I realize that the entire segment is dead. Yes, because then I can just say, forget it: I don't have to copy anything, I don't even need another segment to copy things into; this segment is clean and I'm finished. So that's a nice corner case, and the smaller the segment is, the more likely it is that there's no valid data left in it; it's all been superseded by later entries in the log.
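Here's a sketch of cleaning one segment. The liveness check is stubbed out; the real LFS consults per-segment summary information and the current inode map to decide what is actually live. Live blocks get copied to the log tail; if nothing is live, the segment is freed without copying anything.

```c
#include <stdbool.h>
#include <stdio.h>

/* Sketch of segment cleaning: copy the still-live blocks to the log
 * tail, then declare the whole segment free. Sizes are illustrative. */
#define SEG_BLOCKS 8

static long log_tail = 1000;   /* the tail is off in the current segment */

static bool is_live(int seg, int blk)
{
    (void)seg;
    /* Stub: pretend the even-numbered blocks were superseded by later
     * log entries and only the odd-numbered ones are still live. */
    return blk % 2 == 1;
}

static void clean_segment(int seg)
{
    int copied = 0;
    for (int blk = 0; blk < SEG_BLOCKS; blk++) {
        if (is_live(seg, blk)) {
            printf("copy seg %d blk %d -> log block %ld\n",
                   seg, blk, log_tail++);
            copied++;
        }
    }
    if (copied == 0)
        printf("seg %d was entirely dead: freed with no copying\n", seg);
    else
        printf("seg %d clean: %d live blocks moved to the tail\n",
               seg, copied);
}

int main(void)
{
    clean_segment(3);   /* live blocks get touched twice: read, rewritten */
    return 0;
}
```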
So the other thing, which caused what we'll talk about next, is the fact that how bad cleaning is depends on a bunch of these parameters, but it also really depends on the workload that you run. This is something we'll come back to when we talk about performance next week, but for right now, just keep in mind that I can run one test and get one answer. It's like back when Macs were still using those PowerPC processors. They always had these really glossy ads in magazines showing how much faster their PowerPC processor was than the Intel processors. And of course it was funny when they finally switched over to Intel processors, because they were kind of like, well, all that stuff we said before? Not true anymore. Now the Intel processors are the best, because we happen to be using them. Anyway, you can run the same benchmark on two machines and get two different results, then run a different benchmark and get the opposite results. So the benchmarks matter, and we'll come back to this. The other problem is with reads. LFS is so focused on writes that it doesn't bother with all this other stuff that these other file systems think about, like trying to make sure that a file's blocks are close together on disk, because who cares, blah, blah, blah. But if the cache doesn't work as well as I wanted it to, the blocks on disk associated with one file are all over the place, just scattered throughout the log, because LFS pays no attention to where things are; it just wants to put things at the end of the log. So that can lead to very, very discontiguous allocation for single files. And if the cache doesn't help me out the way I thought it would, I could be in trouble. All right, so there's this long debate about this; you see the small font, right? So '91 was the original LFS paper. Then in '93, Margo Seltzer re-implemented LFS and compared it with the FFS of the time, and found that LFS did not perform as well as people had claimed. And of course, when you build something and tell people how awesome it is, and someone else comes along and says it's not as awesome as you said, usually what you say next is: no, it actually is as awesome as I said, and you're wrong. So then there's this long back-and-forth between the two of them. Ousterhout responds, and I love this, right? You know: poor benchmark choice, poor analysis, poor BSD LFS implementation. There's really nothing he liked about this next attempt to measure LFS performance. And so Margo Seltzer comes back in 1995 and talks about the fact that, you know, imagine you're the person who wrote the paper about LFS. This is your new awesome file system. Maybe when you were doing your performance comparisons, you didn't put quite as much energy into tuning the other system you were comparing against as you did into tuning your own system. This is a very natural thing for people to do. And, you know, that made him a little bit happier, but he was still complaining that it was misleading in several ways, right?
So anyway, there's this long back-and-forth in the research community about how much LFS had improved things. But of course, in this situation, everybody won, because part of what LFS did was cause people to re-examine how file systems were designed, and it pushed people to improve the performance of FFS, a file system design that I think has contributed in a lot of ways to the file systems you guys use today, okay? So does anyone know anything else interesting about Margo Seltzer? Anyone remember what else Margo Seltzer did? It's intimately related to you guys. She's the person who started the OS 161 project that created this fantastic simulator slash torture chamber that you've been using this semester, right? So you can blame her for that. David did most of the programming, but it was Margo's idea. She taught me operating systems, so I have a soft spot for Margo; I wish I could give her the last word in that debate. I'll have to find another quote. All right, so on Friday we're going to talk about RAID. Please try to read the paper, okay? There are equations in the paper; don't worry about the equations. The tables you might want to look at; they're kind of interesting from a historical perspective. But please try to skim the paper and get a high-level overview. We'll talk on Friday about how to approach reading research papers like this, and also about RAID. And I will see you guys on Friday.