Let's get going. Good morning. Good morning. It's Friday, end of the week. And I'm sure everybody's tired, tired of working on assignment two, tired of this stupid class, tired of talking about file systems. So let's everybody get up, stand up. Actually standing, Jason, I'm looking at you. You're always the last person to stand up. How long will everybody have to wait for Jason to stand up? Hold up. Dachi, too. Oh, I don't know, man. Put me in a tough spot. Dachi? Your leg hurts. See, try that excuse next time. That's more effective. All right, everybody sit down. I should hold class standing one day, just to see how people do.

OK, so today we're going to talk about log-structured file systems. This is the last file system that we're going to talk about. And that's like a sign that everything that happened in file systems that was interesting happened before or around 1990. Because here we are, coming ever closer to the present. We spent Wednesday in the 80s talking about FFS, and now we're on to a new decade of innovation in computer file systems. So we're going to talk about LFS. LFS is a fun design for file systems, so it should be kind of a fun lecture. There's some fun history here as well that plays into this story.

So assignment 3 is on its way out. We're just working out the last kinks with it. If you've taken all your late days, I think, if I calculated correctly, assignment 2 must be turned in by tomorrow at midnight. So that's it. And we're working on the assignment 2 solution. I kind of hacked it together yesterday using the old code, and it seems to work, but I'm going to bang on it a little bit more in case you guys would like to use it going forward on assignment 3. Look, I think it's a really baller move to keep using your own code base. Maybe we'll have some awards at the end of class for people who have not chosen to use the solution set. But at the same time, if you don't want to spend your time on assignment 3 debugging all of your assignment 2 problems, and your assignment 2 isn't really in a strong working state, then I would suggest that you grab the solution and just move on. Lick your wounds, take your losses, and kind of move forward.

So we talked about the Berkeley Fast File System on Wednesday. Anybody have any questions about the Berkeley Fast File System? Really kind of a beautiful file system design, very intimate with the properties of disks. Any questions about FFS? Going once. Going twice. Can I ask a question about the Berkeley Fast File System? I guess. In Windows' FAT32 file system, you can't have files bigger than 4 gigabytes, right? Why do we need that limitation? We don't. And it would be great if it didn't exist. Is there any similar limitation in Linux? No, actually. There probably was at some time in earlier versions of Linux file systems. Look, early file system designs made assumptions about the size of files, and those assumptions seemed reasonable at the time. So if it's 1980 or 1990 and your state-of-the-art machine has 128 megabytes of RAM, then maybe you're thinking, ah, a 4 gigabyte file, that sounds like plenty. So no, there's no good reason for that limitation. It's what some people would consider to be a bug, or a misfeature, or something like that. And it frustrates and irritates people who have to work with those file systems, right? I'm trying to remember, I was reading about UFS, because FFS has kind of evolved into UFS now, the Unix file system.
And I think new versions of the Unix file system can support file sizes up to a word I hadn't even heard of before, like zettabytes or something. So I think that's like lots of bytes, right? It's not quite googol bytes. But it's good to move in that direction, right? So now I think people will start to assume that files might get really big. We used to think, oh yeah, a 4 gigabyte file is never going to happen, right? And now that's a serious limitation on those file systems. So I think new file system designs are kind of like, OK, look, it's kind of like the longer IP addresses, IPv6, right? It's like, look, we ran out of IP addresses once. So now let's create enough IP addresses so that we can assign every atom in the universe its own IP address, and then we'll have enough, we think. Unless some atoms start to want two, and then you're going to have an issue, right? So anyway, yeah, that's just a limitation of FAT32 that's kind of a pain in the butt.

All right, any questions on FFS before we move on? Yeah, what's that? Yeah, but I mean, that's yes and no, right? No, I think that's where that comes from, but I don't know why, right? Because on an indirect-block file system, you could use 8 bytes on the file system to store the file size. That's not the end of the world, right? You know, ooh, I'm wasting 4 bytes so that I can have a file bigger than that. And remember, we talked about the indirect block mechanism that allows us to associate many data blocks with a single file. So depending on how I set that up, I can potentially support really, really massive files, right? If I have these quadruply indirect blocks, then every time I add a layer of indirection, I'm adding almost an exponent to my file size, right? So they can get big pretty quickly.

Any other good questions? Any other questions about file systems in general, FFS in particular? You know what, I have enough to cover today, so I'm not really gonna go through the FFS review. I'll just let you guys look over this yourselves. These are just copies of the old slides. So remember, FFS did both seek planning, with this idea of cylinder groups, trying to keep things that are related together, almost like a whole little mini file system on each cylinder group, right? So without leaving the cylinder group, without moving the head very far, I can allocate inodes, I can allocate data blocks, and I can hopefully put things in directories, because hopefully the directory that I'm operating on is in my cylinder group itself, right? So it's almost like breaking the disk up into little mini file systems that are each trying to live on a single cylinder group so I don't have to move the heads. And then FFS also did this funky rotational planning stuff, right? Disk geometry information was stored in the superblock and used to try to figure out, when I was done reading a certain sector, a certain block on a particular track, where the head was gonna be next, right? So there were a lot of really delicate, intricate optimizations that FFS did to try to improve the performance of slow disks, okay? And at least I was kind of simultaneously amazed and bothered by this. All right, so, any more questions about FFS? I wanna get to LFS because LFS is fun, all right? Okay, so again, we talked about FFS, circa 1982.
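As an aside, to make that earlier "almost an exponent per level" point concrete, here's a quick back-of-the-envelope sketch. The 4 KB blocks and 4-byte block pointers are illustrative assumptions, not the exact parameters of FFS or any particular file system:

```c
/* Back-of-the-envelope check of the indirection point above: with
 * 4 KB blocks and 4-byte block pointers (assumed numbers, chosen
 * for illustration), each extra level of indirection multiplies the
 * reachable file size by 1024. */
#include <stdio.h>

int main(void)
{
    const unsigned long long block = 4096;
    const unsigned long long ptrs_per_block = block / 4;  /* 1024 */

    unsigned long long reach = block;   /* one data block: 4 KB */
    for (int level = 1; level <= 3; level++) {
        reach *= ptrs_per_block;        /* each level: x1024 */
        printf("%d-level indirect reaches %llu bytes\n", level, reach);
    }
    /* prints 4 MB, 4 GB, 4 TB: nearly an exponent per level */
    return 0;
}
```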
So, 1982: most of you guys weren't even born yet. I was three years old. You guys just don't have memories of computers from that era at all, right? Okay, so now fast forward. It's now 1991, so now we're getting into at least the distant future. Some of you may have been born by that point, although probably some of you haven't been. So what's different about 1991 from 1982? Well, I collected some information from the internet. A big hit in 1982: "Eye of the Tiger." Great song, actually. 1991: "(Everything I Do) I Do It for You." I don't know, maybe not my favorite song. I was gonna play it in class, but then I was like, no, I don't really wanna play Bryan Adams at this hour. It's just too early for that kind of stuff. All right, a hit movie in 1982: Gandhi. 1991: The Silence of the Lambs. So we're getting darker, right? I think, as a society. 1982, a big hit about a man who single-handedly helped overcome an entire empire, and then we're talking about some guy who likes to eat people. So this is different.

But okay, what about disks? What about disks and computers? What's changed about disks and computers, do you think, from 1982 to 1991? I mean, again, you guys weren't alive, but you know about Moore's Law. You know about the personal computer revolution. So what's happened during this time period? Disk bandwidth is improving a lot. Remember, we talked last time about how FFS was actually built for systems that couldn't stream data across the bus fast enough to keep up with the rotational movement of the disk, right? So now this problem is being addressed, and we actually can stream reads or writes to the disk much, much faster, assuming something, okay? Computers have more memory, right? The big fancy machines that these guys were hacking on at that time had like 128 megabytes of memory. Wow, that was like a new feature. And actually, it was like 64 megabytes on one bus and then 64 megabytes on another; there was something hacky you had to do to even jam 128 megabytes of memory in there. Now I think your phone has like eight times that much memory, right? So these were the state-of-the-art servers at the time.

What about disk seek times? Who thinks they know what happened to disk seek times from 1982 to 1991? Maybe slightly better, but still really slow, okay? So you've got this improvement in disk bandwidth, but that assumes that you don't have to bounce the heads all over the disk, right? This is new, I'm gonna walk down an aisle. Not doing it very well, I gotta practice. Okay, so disk bandwidth is going up, but this is assuming that you don't have to move the heads. If you've gotta move the heads, then you have this great improvement in bandwidth, but it's being wasted, because I'm bouncing all over the disk, okay?

All right, so here's the question that computer scientists at Berkeley were asking themselves. Look, I still have seeks, and seeks are terrible, right? Seeks are what make the disk slow, okay? But I've got a lot of growing bandwidth, and I wanna get at it. That's frustrating me, because I've got these seeks that are preventing me from really utilizing all this new bandwidth that's being added to these devices, okay? And of course, the best way to improve performance is to take advantage of that bandwidth, right? So I've gotta solve this pesky seek issue, okay? And I've got a bunch of spare memory, right?
Systems are being built with more and more memory, okay? So what do you guys think I'm gonna do here? Cache, all right? I can use a cache, right? Again, this is one of our system design principles: I'm gonna make a big slow thing look faster by using a cache. We've talked about this over and over. So now I've got this bigger cache on my system, more memory, and I can cache the heck out of the file system. And this is gonna fix everything, right? I'm gonna use the cache. It's gonna fix everything. It's gonna be awesome. I've got this buffer cache thing. It's gonna be huge. And it's just gonna soak up all the traffic to the disk. Problem solved, right? This is a short lecture today, you know? Everybody can go home, we're ready for the weekend, okay?

All right, so I've got this huge cache. And what is that cache gonna soak up? What is that cache gonna be great for soaking up? It should mean that I barely have to do any... what? Reading and writing? Ooh, what's the difference between reads and writes? Reads don't modify data, right? So with the cache, once I put a block into the cache, that block should just sit there soaking up all the reads, and I'm golden, okay? So caches should improve the performance of reads dramatically, right? Why doesn't this apply to writes? Why can't I just have a cache sit there and suck up all the writes? Because if they don't get to the disk, then the data's actually not on the disk, right? And if the system fails or whatever, then I'm gonna have an issue, right? So this starts to become this identification of the fact that caches are great for reads. Caches are fantastic for soaking up all this read traffic, but they are not as effective with writes, because the writes have to go to disk. It's just that frustrating part about disks, where we actually have to send data to them if we expect it to be preserved, right? I wish we could eliminate that feature.

However, the cache is also gonna help us with writes a little. What the cache is gonna allow us to do is coalesce writes in memory, okay? Until we get a bunch of writes together, and then we can do them all at once, right? So the cache does help with writes a little bit. Like writing one byte, right? Doing ten one-byte writes to one block shouldn't cause ten writes to the disk; it should eventually cause one write to the disk, okay? So the cache helps with writes, but not to the degree that it helps with reads, okay?

So we've been talking about sort of standard file system designs for a couple of weeks, and I want you guys, just for a minute, and I know that, at least with me, leisure and, you know, reading magazines and imbibing certain kinds of beverages helps me forget things anyway, but just for a minute, please try to forget everything that you've learned about file system design, right? And answer this simple question. I wanna avoid doing seeks, right? The cache is gonna help me with reads. The cache is gonna soak up reads, but I've still gotta do writes. So what's the best way to avoid seeks when I'm writing data? What's the best way? I've got a bunch of writes to do and I don't wanna do seeks. So what do I do? What do I do?
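Before we get to the answer, here's a minimal sketch of that write-coalescing point, assuming an invented cached_block structure rather than any real buffer cache API: ten one-byte writes dirty a single in-memory block, and only the eventual flush touches the disk.

```c
/* Minimal sketch (invented structure, not a real buffer cache API)
 * of how a cache coalesces small writes: ten one-byte writes dirty
 * the same in-memory block, and only the flush does disk I/O. */
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 4096

struct cached_block {
    char data[BLOCK_SIZE];
    int  dirty;          /* set on write, cleared on flush */
    int  disk_writes;    /* counts actual I/Os, for illustration */
};

static void write_byte(struct cached_block *b, int off, char c)
{
    b->data[off] = c;    /* modify only the in-memory copy */
    b->dirty = 1;        /* no disk I/O here */
}

static void flush(struct cached_block *b)
{
    if (b->dirty) {
        b->disk_writes++;  /* one real disk write would go here */
        b->dirty = 0;
    }
}

int main(void)
{
    struct cached_block b;
    memset(&b, 0, sizeof b);

    for (int i = 0; i < 10; i++)
        write_byte(&b, i, 'x');   /* ten logical writes... */
    flush(&b);                    /* ...one physical write */

    printf("disk writes: %d\n", b.disk_writes);  /* prints 1 */
    return 0;
}
```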
What's the most obvious thing to do? I don't wanna do seeks. So what do I do? I just don't do seeks, right? I write everything to basically one place on the disk. Now, not literally the exact same spot, because then I'd be overwriting things and losing data, right? But I try to keep the heads in basically the same place, and I just keep adding stuff. I keep adding stuff to the same location, and I do the smallest seek possible. So I write 4K, and then I get to the next block, and I write the next block, and I just lay them out sequentially on disk, okay? I'm just gonna write everything to the same place. And this is the key insight behind log-structured file systems. We'll talk about how they work.

So there were two guys at Berkeley. Mendel Rosenblum, on the bottom, was, we believe, a PhD student at the time; he's now a faculty member at Stanford. And this is John Ousterhout, who's an extremely well-known faculty member, also now at Stanford. So they had this insight, which is: if we want to avoid doing seeks, we should just stop doing them, right? We should write everything to one place on the disk, and then we'll deal with the consequences of that decision. We'll spend the rest of today talking about what the consequences of that decision are, right?

So, okay, again, the main idea: I treat the disk as a single append-only log, right? Log-structured file systems. All the writes are appended to the end of that log, okay? Anytime I write anything, it goes to the end of the log, okay? It sounds like a great idea, right? Easy. Now I've just got this big log with a bunch of junk in it, right? So how do we actually do this?

Okay, so let's look at what happens. Let me refresh your memory a little bit about what happens with a normal write, okay? This is a normal write on a standard file system that has data structures spread all over the disk. It's not this log-structured file system thing; it's a normal file system. So, I wanna do a write, and I wanna change one byte in my file, right? What do I normally have to do? Well, I have to do a series of seeks, reads, and writes, okay? The first thing I might have to do is a seek to read the inode map, right? Figure out where the inode corresponding to this inode number is. Now I've gotta do another seek to read the inode, right? So I've gotta move to some other place to actually read the inode to figure out where the data blocks are. Now I've gotta seek to the right data block that I need to modify. This actually might involve a couple of seeks if I'm using a multi-level index, but let's just pretend it's one more seek to get to the data block, and I do a write, right? And now I might need to update the metadata stored in the inode, like the modification time. So I'm gonna have to seek back and do another write to the inode, okay? So this is four seeks, two reads, two writes, okay?

And let's look at how it looks on disk, right? So I seek to somewhere, and let's say my inode map is way at one end of my disk, where I know where to find it, so I do the read there, okay? Now I do a seek, I find the inode that I wanted, I read the inode. Now I seek to the data block I wanted, that's another seek, and I do my write. And now I'm gonna seek back and modify that inode, right? So now the disk has done one, two, three, four seeks, in just sort of some random pattern across the disk.
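Here's a toy tally of that conventional update path, with hypothetical disk_seek, disk_read, and disk_write helpers that just count operations; the point is only the sequence just walked through, not any real disk interface.

```c
/* Toy tally, under the lecture's assumptions, of the conventional
 * update path: seek+read the inode map, seek+read the inode,
 * seek+write the data block, seek back and write the inode.
 * The disk_* helpers are hypothetical counters, not a real API. */
#include <stdio.h>

static int seeks, reads, writes;

static void disk_seek(const char *where)  { seeks++;  printf("seek  -> %s\n", where); }
static void disk_read(const char *what)   { reads++;  printf("read     %s\n", what);  }
static void disk_write(const char *what)  { writes++; printf("write    %s\n", what);  }

int main(void)
{
    disk_seek("inode map");  disk_read("inode map entry");   /* find the inode */
    disk_seek("inode");      disk_read("inode");             /* find the data block */
    disk_seek("data block"); disk_write("modified block");   /* the actual change */
    disk_seek("inode");      disk_write("updated inode");    /* new mtime, etc. */

    printf("total: %d seeks, %d reads, %d writes\n", seeks, reads, writes);
    return 0;   /* prints: 4 seeks, 2 reads, 2 writes */
}
```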
We've talked about ways to try to optimize where this stuff ends up, right? But still, the optimization strategies that we've looked at aren't perfect, and they're still gonna lead to seeks, right?

Okay, so now let's say that we have this nice cache, and the cache is essentially gonna soak up reads, meaning that reads are gonna hit the cache. Let's just assume the best-case scenario here: the reads that we need to do are all in the cache already, so I don't need to do them again. That means the inode map is cached, so I don't need to read it off the disk, I can find it in memory. The inode itself is also cached, right? So I can avoid doing those first two seeks. But I still have two operations to do on disk, and some potentially large seek between those two structures, right? And again, a lot of file systems do a lot of work to try to put these things close together on disk, right? But the log-structured file system does an even better job, because again, all the writes go to the end of the log, okay?

So this is my LFS file system, and this is the log. I'm gonna tell you exactly what's in the log in a second, right? But what I maintain in an LFS system is a pointer to the end of the log, right? So this part of the disk is considered to be completely clean, free of any useful data. Any useful data, or, as we'll talk about, no-longer-useful data, is stored in the log, right? So I've got my log here, and I'm just appending here, and I'm just gonna append, append, append, append. Look at that, look at those appends by me jiggling the laser pointer, that's pretty cool. Anyway, I'm gonna append, append, append here to the end of the log, and that's how my log-structured file system is gonna work. But how do I actually do this, okay?

So let's say that at some point previously, and remember, everything on the file system is in the log somewhere, and it all got there because I appended it to the log the last time that I modified it or wrote it, okay? So the current inode is in the log somewhere, right? But let's say it's cached in memory, and the current data block that I'm modifying is also in the log, right? So what happens when I actually do the modification? Remember, there are two things I need to change. I need to change the data block, because I need to change the data that's in the data block, and then I need to change the inode, because I need to update some metadata about the file, all right? So where do I put the new data block? Anybody? Where do I put the new data block? At the end of the log, right? So there's my new data block, okay? And then I'm modifying something else. So where do I put the new inode? At the end of the log, okay? And I've got to update the inode to point to my new data block. This is a file that only has one data block associated with it, right? So when I write the new inode, I can just write a pointer to this one data block, and then I'm good, okay?

Now what's the problem here? What's the last thing I potentially need to do? Right, so now this is my most up-to-date copy of the inode, and this is my new copy of the data block. So those two old copies are now stale, right? They're no longer needed. And so I need to free them, okay? And when I free things from the log, it creates these holes in the log that I'm gonna have to deal with later, okay? But at least conceptually, this is very nice, right?
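Here's a minimal sketch of that append-only update, with invented names and a one-slot-per-block log: the new data block and then the new inode both go at the tail, and the old copies become dead space for the cleaner. This is an illustration of the idea, not Sprite LFS's actual layout.

```c
/* Minimal sketch of the append-only update just described: the new
 * data block and then the new inode both go at the tail of the log,
 * and the old copies become dead space to be reclaimed later.
 * Names and sizes are invented for illustration. */
#include <stdio.h>

#define LOG_BLOCKS 1024

enum kind { FREE, DATA, INODE, DEAD };

static enum kind log_space[LOG_BLOCKS];
static int tail;                       /* next free block in the log */

static int log_append(enum kind k)
{
    int addr = tail++;
    log_space[addr] = k;
    return addr;                       /* where the new copy lives */
}

int main(void)
{
    /* the file's current copies, somewhere earlier in the log */
    int old_data  = log_append(DATA);
    int old_inode = log_append(INODE);

    /* modify one byte of the file: append new copies at the tail... */
    int new_data  = log_append(DATA);
    int new_inode = log_append(INODE);  /* points at new_data */

    /* ...and the old copies become holes for the cleaner */
    log_space[old_data]  = DEAD;
    log_space[old_inode] = DEAD;

    printf("new data @%d, new inode @%d (old copies now dead)\n",
           new_data, new_inode);
    return 0;
}
```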
And this is essentially how it works. If I had done a more complicated example, it would have taken me another hour. But you can imagine, if I had an inode with pointers to multiple other data blocks in the log that I wasn't modifying, I would just write the new inode and keep those pointers to those data blocks, right?

Now, if I'm doing reads from the log, I would have to find the inode, which we'll talk about in a sec, and then I would have to locate other things in the log. So it's possible that reads from a log-structured file system can bounce around the disk, right? Because there's no way to know where I'm gonna read, and so the reads can happen anywhere in the log. But remember our assumption here. Why don't I care about reads skipping all over the disk? Because they're normally gonna be cached; my big fat memory cache is gonna soak up most of my read traffic, right? And so if most of the traffic that's going to disk is writes, and all my writes go to the append-only log, then I'm minimizing my seeks, right? So most operations that go to disk are writes, and most writes don't seek very far at all, right? They're essentially just exploiting the natural geometry of the disk and continuing to append things to wherever the head already is, okay? Questions about this? Yeah.

Well, so let's say this inode was in a directory somewhere. Do I need to change the directory file? No, I don't need to change the directory file, right? Because the directory file contains the inode number, right? But the location... Yes, yes, you're getting ahead of me, right? I'm coming to that, it's the next thing.

So, reads are handled by the cache, and writes I can stream to disk at full bandwidth, right? Because I'm limiting my seeks, I'm able to use the full bandwidth of the disk, and those bandwidths are going up, and this is nice, right? So at first blush, when you start to look at log-structured file systems, you think, this is brilliant. Like, wow, this is such a cool idea, okay? Oh, it goes downhill from there, it's so sad.

All right, so let's talk about some things we need to fix. You brought this up, right? It's this question from before. In FFS and other types of file systems, how do I translate an inode number to a disk block, right? Remember, directories contain inode numbers. The inode number needs to be translated to the location of the inode on disk. So how did I do this in FFS, and in the generic file systems we've talked about? How do they translate an inode number to a disk block? I'm gonna pick on this part of the room today. How did I do this? I have a number, right? I have the number four, and you need to tell me: where is that inode, where are the contents of that inode located? Anybody remember? I have an inode map, right? How do I find the inode map? It's stored, yeah, it's stored in a well-known location, right? I've got somewhere on disk, and when I formatted the disk, remember, I created all of the inode maps, and actually all of the inodes, right? So I knew exactly where those inodes are. There are just flat arrays of inodes located at specific points on the disk, okay? And this is a nice idea, because it allows me to find the inode easily. But what problem have I just caused with my log-structured file system? Right, what happened to the inode? It moved, right?
Inodes weren't supposed to move, okay? FFS stored the inode map in a fixed location, right? In LFS, inodes are just appended to the log like anything else, and so the inodes can jump around. And this means that I potentially need to... what do you think LFS does about this? What do you think? I've got this really elegant solution, I'm just gonna write everything to the end of my append-only log, so where does the inode map go? At the end of the append-only log, right? When an inode's location changes, I log a new piece of the inode map, and I just put it right there. Again, most of the operations on the inode map are reads, and so they're gonna hit the cache. But when I do what I just did, when I update an inode and log it, then I just log the inode map at the end too, right? So this is kind of like when you give somebody a hammer and they decide to take out some screws with it, right? Like: this is a great tool, and we'll just keep writing everything to the end of the log and figure out how to sort out the details, right?

Okay, so let me go back to this other thing. Remember, the other goal of LFS is not just to do writes in one place, but to stream as many writes together as possible, so I can maximize bandwidth, right? So when do writes actually happen? When do writes actually happen? Again, I can buffer writes in the cache, right? Can anyone guess when we actually write to the log? There's one case where we have to write to the log, right? What case is that? Well, okay, so that's a good one. Sync: if I sync the entire file system, then I have to write out any dirty data in the cache to the end of the log. If I sync a file, then I need to write out that file's dirty data. Or when I evict something from the buffer cache: like, let's say my super big fat humongous buffer cache doesn't soak up everything, and I need to evict something, then I have to write it out at that point, right? But again, I try to coalesce as many writes together as I can, so that I can write large portions of data to the disk all at once, okay?

So, we went through locating inodes. Then again, what about other file system metadata? I've got these inode and data block bitmaps and things like that. Where do you guys think these go? There's a theme emerging: at the end of the log, right? You just log this stuff too. Everything just goes to the end of the log. Anything that's written goes to the end of the log, okay.

All right, so now I've got this great thing. I've got this brand-new disk. I start logging from byte zero. I just start writing stuff. And then at some point, my log has traveled all the way across the disk. And at that point, I just say the file system's full, right? Like, no more, there are no more bytes. Is that what happens? There's probably a lot of junk in the log that's dead, right? There are a lot of data blocks that I've written that are no longer valid, right? Remember the free space we created before, when we updated the inode and the data block? So there are a lot of holes in the log, right? There might be some valid data in the log, but there's a lot of invalid data in the early parts of the log, okay? And so this starts to become the big question mark about this approach. Because now I've got essentially what turns into a kind of garbage collection, memory management type of problem, right? I've got this log, and it's got a bunch of valid stuff and a bunch of invalid stuff.
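Backing up to the inode map for a second, here's a hedged sketch of how that lookup can work: an in-memory map takes an inode number to the log address of its latest copy, and a fresh piece of the map is itself appended to the log whenever an inode moves. The structures and names here are invented for illustration, not Sprite LFS's actual on-disk format.

```c
/* Hedged sketch of LFS inode lookup: an in-memory inode map takes an
 * inode number to the log address of that inode's latest copy, and a
 * fresh piece of the map is itself appended to the log whenever an
 * inode moves. Illustrative structures, not a real implementation. */
#include <stdio.h>

#define NINODES 128

static long imap[NINODES];   /* inode number -> log address of inode */
static long tail;            /* end of the append-only log */

static long log_append(void) { return tail++; }

/* writing an inode: append it, then append the updated map piece */
static void write_inode(int ino)
{
    imap[ino] = log_append();          /* new inode goes at the tail */
    long map_addr = log_append();      /* so does the new map piece */
    printf("inode %d now @%ld, imap piece @%ld\n",
           ino, imap[ino], map_addr);
}

static long lookup_inode(int ino)
{
    return imap[ino];   /* normally a pure memory hit, no disk read */
}

int main(void)
{
    write_inode(4);
    write_inode(4);     /* inode 4 moves again; the map follows it */
    printf("inode 4 resolves to log address %ld\n", lookup_inode(4));
    return 0;
}
```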
So I need to somehow sweep through the log, which is usually called log cleaning, to reclaim empty space and create kind of a new log, right? Now, conceptually, you can think of this as happening across the entire disk. The way log-structured file systems actually do this is they break the disk into segments, right? If it happened across the entire disk, what would happen is that you would get to the end of the disk and then your computer would freeze for two minutes while it compacted the log, and then it would start up again, right? If I break things into smaller pieces, I can be cleaning certain segments while I'm using another segment, right? So essentially, again, you can think of this as kind of another mini-me sort of scenario: I break up the entire log into little mini-logs. To start using a mini-log, I want it to be basically clean, and then when I finish writing across the whole thing, I run some sort of task to compact it and store that data somewhere else, so that I can reuse that segment, okay?

So let's look at what happens. Here's my segment, and I've got data from two files, the red file and the green file. And let's say that my log has essentially run off the end of the segment, so this segment is currently not in use; there's some other segment on the disk that I'm currently doing my log writes to. So what do I need to do here? Well, essentially, I need to sweep through this segment, identify all of the live data, and compact it into this new clean segment, right? So I need a clean segment on disk, and I run a process that collects all of the live data and appends it to that segment. So I've created one kind of compacted log, and now this old segment is ready to be used again, right? This segment is completely clean, and I can start logging right here, and I can just log right across this segment. Questions about how this works?

So again, what's the trade-off here? What's the trade-off that I'm making compared with traditional file systems, right? I've made something easier, and then I've created a new problem. What's the thing that I made much, much easier? Writes, right? I've eliminated seeks for writes. Writes are essentially seekless. They all just get appended to the same point on the disk. What's the new problem that I've caused? Fragmentation, and what does that mean I have to do periodically? I have to do this cleaning process, okay? And most of the debate about FFS, sorry, LFS, has been about how well this part works, right? Because it's clear that the write stuff is brilliant, right? The write stuff is a fantastic idea, but the mess it leaves behind is not necessarily an easy thing to deal with, okay?

All right, so again, LFS seems like a great idea, and then it's kind of like not having any sort of organizational strategy for your room and just throwing things everywhere. That works great for a while, and then it's like, oh, I gotta clean things up, I can't find anything anymore, and then you spend a whole weekend wishing that you had some sort of organizational system in the first place. Yeah?

So, I think the cleaning is usually done almost as it's described here, right? I have a new segment that I'm gonna compact things into, and what I want is to produce a completely clean segment, right? Now you can imagine: what else could I do here?
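As an aside, here's a compact sketch of that cleaning step, with an invented segment structure and live[] markers: sweep the dirty segment, copy only the live blocks into a spare clean segment, and hand the old one back completely clean. Segment size and contents are made up for illustration.

```c
/* Compact sketch of segment cleaning as just described: sweep a
 * dirty segment, copy only the live blocks into a clean segment, and
 * hand the old segment back as fully clean. Everything here (segment
 * size, the live[] markers) is invented for illustration. */
#include <stdio.h>
#include <string.h>

#define SEG_BLOCKS 8

struct segment {
    int  used;                /* blocks occupied so far */
    char data[SEG_BLOCKS];    /* block contents, one char each */
    int  live[SEG_BLOCKS];    /* 1 = still referenced, 0 = dead */
};

static void clean(struct segment *dirty, struct segment *spare)
{
    for (int i = 0; i < dirty->used; i++) {
        if (dirty->live[i]) {                       /* keep live data */
            spare->data[spare->used] = dirty->data[i];
            spare->live[spare->used] = 1;
            spare->used++;
        }
    }
    memset(dirty, 0, sizeof *dirty);   /* old segment is clean again */
}

int main(void)
{
    /* 'r' and 'g' stand in for the red and green files' blocks */
    struct segment a = { .used = 4,
                         .data = { 'r', 'g', 'r', 'g' },
                         .live = {  0,   1,   0,   1  } };
    struct segment b;
    memset(&b, 0, sizeof b);

    clean(&a, &b);
    printf("compacted %d live blocks; old segment free\n", b.used);
    return 0;
}
```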
I could take this partially dead segment and just start the log here, seek here and just start logging that way, but I don't think that's what they do, right? So yeah, there's actually a third segment here that's being used by LFS while this is happening, right? This requires kind of a free segment, and then I clean into that, and now I've created a new free segment, right? So if I keep doing this enough times, I can always maintain some group of clean segments, right? Question.

Yeah, yeah, yeah. So that's actually a great point. There were versions of this sort of system, right? Let me go back. So what's happened here, right? Well, the interesting thing about this is that I still have an old version of the file in the log, right? So you can imagine that if I had a lot of disk, and I didn't mind not being able to clean or compact the log, I could store multiple versions of the same file. Because if I have some idea of versions, and some way of storing versions, and the file system understands versions, then what I've done here is: if I want a previous version of the file, I just go look at this old inode, right? So yeah, you can use a log-structured approach to implement a versioning file system. I don't know if that's how versioning file systems actually work, but yeah, it's a great observation.

All right, cleaning, okay. So once you start to bring the cleaner into the picture, this is kind of where the debate over LFS starts to get heated, right? Because again, it's one of those things where you're making a trade-off, right? It's like, I'm giving you this great write performance for a little while, and then you've got to run the cleaner, and so things can get terrible, right? Uh-oh, got a problem with the slides, hold on a second. All right, any other questions on LFS while I'm repairing my slide deck?

So let's talk about cleaning, right? Again, I've got this cleaner, and I know that at some point, if I don't clean, I'm gonna get to the end of the disk, and I'll have this completely dirty disk, and then again, your machine's just gonna stall for like two minutes while the cleaner runs. So this is not good, right? I would also love to run the cleaner when the system is idle, right? So, you know, while you're in class, not paying attention to your laptop because you're just reveling in my stimulating lecture, that's a great time for your cleaner to be running, right? So if you guys all had log-structured file systems, I could tell if you were paying attention or not, because I could listen for the heads to be running and the cleaning to be happening, right? However, if you're sitting in here writing email or whatever, then maybe the cleaner isn't gonna be running as much.

Next question: what about segment size, right? I didn't tell you how large the segments are that I want to clean, and there's a trade-off here, right? If I use a large segment size, it means that I'm amortizing some of the read and write costs, because essentially what I can do is read the whole segment into memory, do the cleaning in memory, and then stream out all the writes for the live data all at once, right? So that can be nice. However, small segments can be nice too, because they increase the probability that all the blocks in the segment are dead, right? Depending on the file system activity, it's possible that you have a segment that has no live data in it at all.
All it contains are old unused inodes, old unused data blocks, old unused metadata, right? And cleaning that segment is really easy: you just toss it, right? So this is a nice corner case, and a place where you can do a really nice optimization of the cleaning process, right? If you can tell that there's no live data in a segment, you can just discard it, right? So there is kind of a runtime trade-off here around segment size.

So what other effect does log cleaning have, particularly on the performance of this system? Anybody wanna make a guess? And we've talked a little bit about this already. I mean, this is one of those fun things that the file system community debated, because there's a clear win, right? But what would you expect the log cleaning process to be really dependent on? What's that? Well, okay, but let's say that I have some idle time. When I go to clean the log, how hard it's going to be depends a lot on what? Your usage patterns, right? Log cleaning is incredibly workload dependent. And so this is one of those cases where, when people started to run tests and experiments on log-structured file systems, group A could be like, ooh, wow, 100% performance improvement, and group B could be like, 100% performance degradation, because they're using slightly different workloads, and those workloads either cause the cleaner to blow up and consume a huge amount of bandwidth, or they allow the cleaner to run extremely efficiently, right? And there's also a huge amount of work in figuring out when and how to run the cleaner, et cetera, et cetera. And then you start to get into sort of standard garbage collection territory here, all right? So again, this made it ripe for a great debate within the file system community, okay?

And then finally, I wanna point out one other thing. Here's another trade-off that we haven't mentioned so far with log-structured file systems: what about reads? So let's say, and forget the cache for a minute, let's say that the cache doesn't soak up as many reads as we might like. Where is file data located in a log-structured file system? It's located all over the place, right? It really depends on when stuff was logged. Reads in general can get really, really terrible, right? On some level, if I keep modifying bits of the file, the reads might be okay, but imagine the following scenario. I have a file that has two data blocks; it's 8K. When I create the file, I write some data into the first data block, and that data block is never changed. And then I continue to make updates to the second data block. So what's gonna happen in a log-structured file system? Well, I'm gonna continue to append that second data block to the log. But what about the seek distance between those two data blocks? Where is that first data block in the log? It's right at the beginning, right? And that second data block keeps getting farther and farther and farther away, right? And if I'm doing reads, and those reads aren't being cached for some reason, then, you know... again.

So one of the things that's tough about LFS is that the file systems we've been talking about until now do a lot of work, partly because they are more structured, to lay out files contiguously on disk, right? We talked about extents, we talked a little bit about block allocation, we talked about FFS. And LFS essentially just ignores all that.
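Here's a hedged little simulation of that two-block scenario, with made-up numbers: block 0 is logged once at the start and never moves, while every update re-appends block 1 at the tail, so the gap a whole-file read has to seek across keeps growing.

```c
/* Hedged simulation of the two-block scenario above: block 0 is
 * written once at the start of the log; block 1 keeps getting
 * re-appended. The gap a read of the whole file must seek across
 * keeps growing. Numbers are made up for illustration. */
#include <stdio.h>

int main(void)
{
    long tail = 0;
    long block0 = tail++;        /* written once, never moves */
    long block1 = tail++;        /* will be rewritten repeatedly */

    for (int update = 1; update <= 5; update++) {
        /* each update appends a new copy of block 1 (plus an inode,
         * an imap piece, etc., elided here) at the tail */
        block1 = tail;
        tail += 2;               /* pretend each update logs 2 blocks */
        printf("after update %d: block0 @%ld, block1 @%ld, gap %ld\n",
               update, block0, block1, block1 - block0);
    }
    return 0;
}
```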
LFS says reads will hit the cache. I don't care about reads, right? I'm gonna optimize for writes, okay?

All right, and this prompted maybe one of the more interesting back-and-forths between luminaries in the file system community, right? So again, 1991 is the original LFS paper from Berkeley, by Mendel Rosenblum and John Ousterhout. And then in 1993, the log-structured file system was reimplemented for BSD by Margo Seltzer, who was a Berkeley graduate student and is now a professor at Harvard. And she, along with some colleagues, wrote a paper that questioned some of the performance improvements, right? So what happened? Well, FFS was evolving too, right? FFS made some changes. And when they benchmarked LFS against an enhanced version of FFS, the enhanced FFS outperformed LFS, okay?

Well, the LFS guys weren't just gonna take this lying down, right? So John Ousterhout basically wrote this complaint in which he claimed that they had done a bad job of implementing LFS; that they had done a bad job of choosing benchmarks, which, again, we just identified as critical because of how workload dependent the cleaner overhead is; and finally, poor analysis. That's my favorite part. Like, you couldn't implement it, you couldn't benchmark it, and even the fact that you couldn't test it didn't prevent you from messing up interpreting the results.

Okay, so now, two years later, there's another paper looking at LFS performance, published by the same Margo Seltzer, again questioning the LFS performance claims, right? And I won't read this whole quote, but essentially it says that when LFS is tuned for writing, its large-file write performance is approximately 15% better than FFS's, but its read performance is 25% worse; and when it's tuned for reading, its large-file read and write performance is comparable to FFS's, right? So again, this stuff gets very nuanced, right? Large file performance, small file performance, and any time somebody made a claim here, the other person would come back and say, no, you know you're wrong. So again, Ousterhout described the '95 analysis as improved, but still thought it was misleading, right? And at some level, this was kind of a fun back-and-forth between people in this community, right?

So this is Margo Seltzer. Does anyone know what else Margo Seltzer did? What's that? She's done a lot of work on databases. She taught this class. What else is she responsible for? Yeah, the OS/161 system that is your own little personal torture chamber. She looks like such a nice person, right? She is a very nice person. She's married to Keith Bostic, yeah, another really well-known BSD developer. And she taught me this class.

So, all right, we're done with file systems, right? We are finished, the file system unit is over. Yeah, right, right, so you're right. When we do cleaning, we do have a chance to adjust the layout of files on disk, right? Because when we start to clean things, we see more at once, right? When we're doing writes, we're just seeing one block at a time. So when we do a clean, we could correct that situation where there's this huge span between two blocks that are next to each other in the file, assuming they're in the same segment, right? If parts of the file are now located in multiple segments, I'm not sure exactly what they did there, right?
But yeah, so when I clean, I have some chances to adjust that. Okay, so next week's gonna be kind of a grab bag week. We're gonna do one lecture on operating system structure, which is something that's normally covered a little bit earlier, but I think it's nice that we've waited, because you guys will have a little more context for it. So we're gonna talk about monolithic kernels, we're gonna talk about multikernels, we're gonna talk about microkernels, we're gonna talk about exokernels, and maybe I'll make up some sort of name for a new kernel and see if you guys notice. And then I think what I'm gonna do next week on Wednesday and Friday is talk about performance. I have to look at assignment three again; there used to be a performance component to assignment three, and I'm not sure it's gonna be part of the assignment this year. But I would like to introduce you to some ideas about how to do performance analysis, how to do benchmarking, and how to make sure that you're working on the right performance problem, because that is the key thing that people miss, and it's something that will be useful for you guys to take with you from this class to whatever you end up doing next. So anyway, have a great weekend. If you're still finishing assignment two, good luck, and I'll see you Monday.