Just want to see how many people have submitted assignment 3.2. So we have 18 submissions so far. That's pretty good. Obviously, people are submitting because they're done, and the median score is a perfect score, so that's good. All right, so today we're just going to review journaling for a minute and then talk about the Berkeley Fast File System, which is admittedly probably the oldest, cruftiest thing that we're going to talk about this semester, but fun. And it's a file system design that set a lot of the standard for later file systems and established a lot of the features that later file systems were expected to provide. All right, so one week left to do assignment 3.2. So please get going. I think people are kind of moving along, but yeah. And then 3.3 comes right on the heels of 3.2. So please keep making progress. Unfortunately, I didn't have time today to make the edits to the solution set that we need before we can hand out the midterm. So we will start returning those on Monday. They are graded, and the grade that you get today, which I'll assign this evening at some point, will reflect your score on the midterm as well as your score on the assignments up to this point. But we're not ready to give the midterm papers back yet. So anyway. Hey guys, I'm talking. What's up? No, UB Learns. Sorry, I don't. Yeah, no, no thanks. Is that bad? Am I a bad person for doing that? Like I said, I'm trying to be nicer, but I have my limits. UB Learns may be outside of those bounds. Well, let's put it this way: I haven't used UB Learns up to this point, and I'm not going to learn it now. There's really no reason to learn UB Learns at this point. The grades are on the paper, so you'll get them. Any questions about the process for midterm return? So again, I understand that you guys are concerned about your grades and stuff like that.
But there is nothing that drives me more insane than people coming in and spending half an hour trying to get one point back on a problem. Let's just be real with each other. Your grade is plus or minus epsilon. Your grade is a distribution. If we graded your paper 10 more times, a couple of those times it would be a little bit higher, and a couple of times it would be a little bit lower. So there's some distribution where your grade is. This is a human process. This is not test161. Sadly, if I could automate this, we would have. There are humans in the loop. They read things. They respond to things a little bit differently. The people that graded these papers are very good at it. But again, there is nothing that will cause me to be more frustrated with students than if you spend a lot of time trying to get one or two points back on the midterm. Because trust me, if I grade your paper again, I'll find some other points that you got that you weren't supposed to get. So this all ends up working out in the wash. Again, if you think that we've made a big, serious mistake on a problem that cost you all the points or a significant amount of credit, fine, let's talk about it. But please don't come and spend lots of staff time trying to get one point back. Again, just find another question and think, well, maybe I got a point there that I didn't deserve, and feel like everything works out. The universe has a plan, and it's all going to be OK. Any questions about that? I'm not even claiming the grade is right. Like I said, there's this distribution. It's like quantum physics. Your grade is somewhere in there. I can put it into a bound and say with high probability it's within this range, but I can't localize it precisely. That said, it's just a question of how we allocate resources among the staff.
It would be much, much better to take that TA and have them spend half an hour helping you with assignment 3.2 or 3.3. You get a lot more out of that than quibbling over one or two points on the midterm. All right, cool. I also just get sort of protective of my course staff. They spend a lot of time grading these exams. And they're good at it. The real trick is to keep Guru away from grading. Because Guru is like the best grader that I have. But there are two problems with that. First of all, he takes forever. And his scores are so low. So every time Guru grades something, I'm like, Guru, just give them a couple of free points. That way people won't be as mad. But he's like, no, no, no. They really didn't get that right. I've read that sentence five times. I'm like, yeah, OK. Anyway, so we don't let Guru grade anything. OK, let's just wrap up journaling again. Because this is a powerful idea, and I want to make sure that it gets embedded in your brains here. So we talked about how we could alter how the cache writes data out to try to improve consistency, to minimize periods of time where the buffer cache in memory and the contents of disk are out of sync. The other approach we went over last time was journaling. And again, I think maybe I didn't do a great job of explaining why this works. So modifications to the disk, even something as simple as just adding a little bit of data to a file, usually require changing multiple data structures on disk. Those data structures are typically stored in separate blocks, and so that modification ends up becoming, sort of by definition, not atomic. It requires a bunch of independent changes to the disk. So these operations are not atomic. Now on some level, when we do journaling, we're actually duplicating information. We're making a trade-off where we're saying, OK, I'm going to take a small amount of space on disk for this journal.
But in many ways, the journal entries duplicate information that's going to be written to other parts of the disk. So for example, if I make some changes to the inode, if I say, OK, I'm going to allocate these data blocks to a particular file, I'm going to link them to the inode, I'm making changes to those inode data structures. And then I'm also writing down the same thing in the journal. So there's duplicated data on disk. The difference with the journal is that the journal is more compact. And so I can write these single journal entries with one modification to the disk. And therefore, I know that they're there. And then the other changes that are required to these on-disk data structures, which end up touching a bunch of different blocks, can make their way to disk as needed. OK. So how did we do this again? These file systems have a file called the journal, or multiple files. If you've ever rebooted your system, you might have seen there's a line during boot from an EXT4 file system saying that it's reading its journal from the last time it ran. If it shut down cleanly, there's probably nothing in there. But if it didn't, there might be stuff in there that it needs to redo. Has anyone ever seen that line before? It's there. This should be a fun thing. Maybe for the final exam, what I'll do is I'll just print off all the Linux boot messages and give them to you. And then ask you to write an essay about them, like what's happening. Because it's kind of interesting stuff. I mean, at some point, you should sit down with those and kind of go through them and be like, OK, can I figure out what's going on? Has anyone ever been on a plane where the entertainment system rebooted? Yeah, there's some airline entertainment system that uses Linux. And sometimes they'll reboot it. And you get to see all the terrible boot messages.
I always love watching other passengers because they're looking at it like, am I in the matrix? Like, I saw that cat twice. I don't know, whatever. OK, so after a failure, I use the journal to get myself quickly back into a consistent state. And what I do is I go through the journal in order and I compare the contents of the journal with the contents of disk. And in places where things are out of sync and there are inconsistencies, I bring the file system's on-disk data structures back up to date using the journal. And once I'm done, I know that the file system is in a consistent state. This avoids having to do these big file-system-wide checks. So here's an example of creating a new file. What are some of the changes I had to make? This is also a good way to review the data structure changes that are required to do something like create a file, because they all have to be encapsulated in the journal. So I need an inode. So I need to allocate an inode, and I need to know which inode it is. I need to allocate some data blocks from the disk, make sure that they are marked as in use, and make sure they're linked to this particular inode. I probably need to keep track of what order they're in; maybe that's implicit in this entry. I need to link the inode into a directory so that it has a name as part of the file system. And that's it. So these are the steps I need to take in order to create a file. So I write that down in my journal. And then over time, what's going to happen is some stuff is going to make its way to disk. I wish I had a better slide for this. So at some point, maybe inode 567 is in my cache, but at some point it gets evicted from the cache and gets written out. And so now the inode table and the contents of inode 567 are in sync on disk. And so I can mark off in my journal that that was done. At some point, I get around to updating the data structures for this particular file to associate those data blocks with it. That gets done. That gets marked off.
So what's happening is over time, depending on the caching policies and cache behavior, parts of my journal entries are making their way to disk. And over time, the entire journal entry makes its way to disk. And once that happens, this journal entry gets removed. And I have this checkpoint that I'm moving forward. And everything before the checkpoint is finished. And everything after the checkpoint is either not on disk, or partially on disk and partially not. So as entries are finished, I move the checkpoint forward. Does that make sense? OK. So now let's say that the system crashed, and I'm bringing the file system back up to date. What do I do? How do I use the journal to do this? Where do I start? I start at the checkpoint, because I know, by construction, that everything before the checkpoint is done. That's all on disk. I'm not assuming there's been any disk corruption here. In the case of disk corruption, who knows? All bets are off. I mean, if something starts modifying stuff on disk, there's really no file system that can recover from that. Stuff might go wrong. Stuff might be bad. But what I'm trying to get away with here is not having to write back stuff from my cache so aggressively. So what I'm assuming is there was maybe some stuff in memory that was lost. So I start at the checkpoint, because I know that everything before that is up to date, and I work forward from the checkpoint. And what I do is I look at my journal entries and I compare the contents of each entry with the contents of disk. Anything that has already been done, I can skip; anything that isn't fully done, I redo. So what would happen here? Essentially, I would say, OK, data blocks 5, 8, and 97 were supposed to be the first three blocks of the file with inode 567. I would go and I would read that file and make sure that indeed these data blocks were linked to it. What else would I check here?
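The checkpoint bookkeeping described above can be sketched in a few lines of Python. This is a toy, not how any real journal is implemented, and all the names here (JournalEntry, advance_checkpoint, the update labels) are invented for illustration; a real journal is a circular on-disk buffer maintained by the kernel.

```python
# Toy sketch of journal checkpointing: each entry records the set of
# on-disk updates one operation requires, and the checkpoint advances
# only past entries whose updates have all been flushed.

class JournalEntry:
    def __init__(self, updates):
        self.pending = set(updates)     # updates not yet flushed to disk

    def mark_flushed(self, update):
        self.pending.discard(update)    # the cache wrote this block back

    def complete(self):
        return not self.pending         # entire entry is now on disk

def advance_checkpoint(journal, checkpoint):
    # Everything before the checkpoint is known to be on disk, so slide
    # it forward past every fully flushed entry at the head.
    while checkpoint < len(journal) and journal[checkpoint].complete():
        checkpoint += 1
    return checkpoint

# Creating a file touches the inode table, the block bitmap, and a directory.
journal = [JournalEntry({"inode_table", "block_bitmap", "dir_entry"})]
cp = advance_checkpoint(journal, 0)     # nothing flushed yet: cp stays at 0
journal[0].mark_flushed("inode_table")
journal[0].mark_flushed("block_bitmap")
journal[0].mark_flushed("dir_entry")
cp = advance_checkpoint(journal, cp)    # whole entry on disk: cp moves to 1
```

The key property the sketch shows is that flushes can arrive in any order, driven by cache eviction, and the checkpoint only moves once an entry is entirely on disk.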
There's another consistency check I can do at this point. What else should be true about blocks 5, 8, and 97? They should be marked as allocated. So I can check my block bitmaps to make sure that those blocks are marked as allocated. Did this? Done. And now I make sure that the file is in the directory that it's supposed to be in. And once I'm done, I can move forward. So I essentially do this from the last checkpoint. And this is nice because it's fast. So on some level, the journaling approach bounds the amount of work. Roughly, if I want to do a full file system check, remember, I can walk the entire file system and do a lot of integrity checking along the way. That said, roughly how does that scale? How much work do I have to do to do a full file system check? O of n, where n is what? Roughly the size of what? Probably the amount of space that the file system has allocated. Because essentially, I'm going to walk the directory tree and use that to check all my entries. I might need to check things both ways. So for example, I can use the directory tree to figure out every inode that's in use. Every inode that is in use is linked into the directory tree somewhere. But then what I want to do is take that set and cross-check it against the set of inodes that are marked in my bitmaps to make sure they're the same. But generally, it's probably proportional to the amount of space that the file system has allocated. For journaling, what's your question? Where does it start? It starts at the checkpoint. Yeah, so remember the checkpoint. On some level, you can think of it as, once I checkpoint, I delete all the entries before the checkpoint. Usually I think what happens is this is a circular buffer, so I'm checkpointing in one direction and adding stuff in the other direction. But in general, once an entry is checkpointed, it's done. I'm finished with it. And I can assume that the on-disk data structures are up to date.
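The recovery pass described above can also be sketched as a toy: replay the journal from the checkpoint, redoing any recorded update the disk does not yet reflect. The dict-based "disk" and the entry format are invented for illustration; real journals record block images or compact metadata deltas.

```python
# Toy sketch of journal replay after a crash. Starting at the checkpoint,
# compare each journaled update against the on-disk state: already-applied
# updates are skipped, missing ones are redone.

def replay(journal, checkpoint, disk):
    for entry in journal[checkpoint:]:
        for block, value in entry.items():
            if disk.get(block) != value:    # out of sync: redo from journal
                disk[block] = value
    return disk

# After the crash: the inode update made it to disk, the bitmap bits
# and the directory entry did not.
disk = {"inode_567": "blocks 5,8,97", "bitmap_5": 0, "bitmap_8": 0}
journal = [{"inode_567": "blocks 5,8,97", "bitmap_5": 1, "bitmap_8": 1,
            "bitmap_97": 1, "dir_home": "file -> inode 567"}]
replay(journal, 0, disk)    # disk now matches every journaled update
```

Note that redoing an already-applied update is harmless here, which is why replay does not need to know exactly how far each operation got before the crash.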
So for journaling, how long does it take to check the file system, or at least to process the journal? Roughly, what is n here? See, you guys didn't think we were going to do big-O notation in this class. Now I can mark this down on my ABET checklist. Yeah, the number of incomplete journal entries, which is on some level a function of the cache size and some other things. It's a little complicated, but it's a lot smaller than the entire file system. I mean, I might have a multi-terabyte disk and only be using a few gigabytes for caching. And even if I'm caching things, remember, there are a lot of things that can cause the cache to be written back. And so the number of entries in my journal that are incomplete is typically a lot smaller. So journaling is an approach that's used by a variety of different systems that have this type of challenge, where they have memory that they're using to cache stuff, but they need to maintain on-disk data structures and keep them consistent. What's an example of another system like that? That might be something you might use one day or are already using. Git? I don't think it does that. That's a great question. Remember, Git doesn't run long-running processes. You can corrupt Git's data structures, but Git usually runs, makes its changes, and then it stops, right? The trick here is I have a long-running in-memory cache whose changes I want to make sure are preserved. So what's an example of a system like this that might have the same issue? Where I'm making changes to things, have stuff in memory, and want to make sure that eventually those changes get written out. Just start yelling out other computer systems. Yeah, so that's the thing. Web servers don't typically push changes to pages, right? A web server kind of assumes that the pages I need are on disk and I'm just serving them to the client. If the web server crashes, it just loses the cache. A web server cache is entirely for performance, right?
But what's a system that both provides access to and also modifies data that's not a file system? How about a database? You've heard of those, right? Yeah, so that's another example of a system with complex on-disk data structures and a large need to use memory to improve performance, and it has the same kind of problem, right? So this is not an idea that's confined to file systems. Okay. Now, there are some corner cases here. So for example, if I didn't finish writing a journal entry, I can't process that journal entry. It's incomplete. So you can think of a journal entry as like a transaction. If I didn't finish writing down what the transaction was supposed to be, I just discard it. So if somehow I'm really unlucky and I crash in the middle of writing a journal entry, maybe the journal entry is too big to fit in one disk block, then I can just ignore it, right? And like I said, the goal here is, it is impossible to completely eliminate data loss. We're just trying to eliminate as much as possible, let the cache do good work for us, and allow ourselves to recover quickly. Okay. So the nice thing about this is that a lot of metadata updates can be represented in journal entries small enough that they fit into a single disk write. So that's pretty cool. So for a lot of our metadata, we don't worry about this. Now, here's an interesting question. So I mentioned before that journaling ends up making us kind of write things twice. We write them once to the journal and then we write them once to disk. You write them to the journal right away and then to disk later. What about data? What's the trade-off here? So I can write data to the journal, right? I can just journal modifications to data blocks on disk and then I can process those updates the same way I do everything else. What's the problem with that? Yeah, most of the writes to the disk are actual data, you hope.
And so the journal can actually fill up with a lot of data, and it makes the journal a lot larger. So I can include data blocks in the journal, and then I have to write every data block twice. This also increases the amount of write traffic to the disk, which is not great. Or I can lose them. I can just say to myself, you know what? If I crash, there might be a little bit of file data loss. Again, this may be weird to you, but the file system sort of thinks of your data as a second-class citizen, right? Its data structures are way more important. And on some level that makes a lot of sense, because if its data structures are corrupted, you can lose all of your data, right? Much better to lose a few writes to a few files here and there and let the file system keep working. Okay. Any questions about this stuff before we start talking about specific file systems? Yeah. So what problem would this help me with? Yes. Right? No, but it's the modifications that I'm talking about, right? So for example, let's say you're in the middle of appending to a file. There are these metadata changes that that requires, but then there's the data that's actually going to hit the file. So if that data is in the cache, which it might be anyway in your system, if it's in the cache on its way to disk with this modified copy, then it's just lost, right? I still have the file, right? Prior to the changes that you were about to make. So the file doesn't go away. I have an older version of the file. I just may not have the latest up-to-date version. Does that make sense? I mean, again, if you think about it, at some level there has to be a place where data loss can occur, right? The app could be just about to write to the file and the system crashed, so who knows, right?
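The double-write trade-off above can be made concrete with some back-of-envelope arithmetic. The block counts here are invented for illustration: journaling data blocks means every data block is written twice (once to the journal, once to its final home), while metadata-only journaling doubles only the small metadata writes.

```python
# Rough write-traffic model for the data-journaling trade-off.
# journal_data=True  -> data and metadata both go through the journal
# journal_data=False -> only metadata goes through the journal

def total_writes(data_blocks, metadata_blocks, journal_data):
    journal_writes = metadata_blocks + (data_blocks if journal_data else 0)
    in_place_writes = metadata_blocks + data_blocks   # final on-disk copies
    return journal_writes + in_place_writes

# Appending 1000 data blocks to a file, touching ~3 metadata blocks:
full = total_writes(1000, 3, journal_data=True)    # nearly 2x write traffic
meta = total_writes(1000, 3, journal_data=False)   # barely above 1x
```

Since data dominates the write stream, full data journaling roughly doubles disk write traffic, which is why metadata-only journaling is the common default.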
I mean, when you save something, particularly on complex systems like Eclipse or whatever, it boggles my mind to think of what the data flow path is from hitting save in some complicated IDE to disk blocks being written, right? There are many, many, many layers there that you have to get through. The file system only cares once you get to the file system. Anything that happens above it, not my problem, right? Okay, so let's talk about FFS, the Berkeley Fast File System. For those of you that don't know your UNIX history, I don't really either, but at some point close to the dawn of time, and I'm sure none of you guys was alive in 1982 other than me, right? Yeah, 1982, UNIX development is going on. A fair amount of that is going on at Berkeley; Berkeley lent, and continues to lend, its name to a flavor of UNIX distributions. So I think now there's FreeBSD, NetBSD, maybe another BSD; these split, and some of them went off in their own ways and stuff like that. Part of the reason for this was that there was a major variant of UNIX under development by AT&T, and AT&T was not freely releasing sources to it, and so it was difficult for people to hack on. So they needed a file system. Very early releases of BSD started to include something called the fast file system. Like I said, there are features of the fast file system that are going to seem completely bizarre and totally lost to the sands of time, and then there are other features that have essentially become part of the de facto standard for how UNIX file systems have to work. And Kirk McKusick is still working on FFS, as far as I know, at least as of a couple of years ago. So this is now a 35-plus-year software development project for Kirk. That must be kind of fun. I don't know, that's a great question.
I should ask him: does it get boring after a while, or is it still cool? I mean, FFS is still in active development. There's still stuff that he and other people are building, but the early variants were called FFS. Oh, sorry, it changed names at some point. It's now called UFS, the UNIX file system. Has anyone ever mounted a UFS file system? It must be used by some of the BSD variants. Linux has its own history of file system development and uses some different file systems. Okay, so like I said, there are lasting contributions that FFS made to file system design. And then there are the less lasting contributions, and they're sort of fun to talk about, right? The fast part of the fast file system, at least in early versions, had a huge amount to do with the hardware. File system design has always been very tied to the features and the characteristics of the underlying hardware. I mean, file systems in many ways are uniquely tied to the features of one specific device, and that is a disk. In 1982, we're talking about spinning disk drives, these old things with sectors and platters and all sorts of things like that. And keep in mind, at this point, there were big performance gains to be made by thinking about disk geometry. By disk geometry, we mean thinking about where data is on disk. I'm sure this is true of a flash drive too; if you had a flash drive, you could point to where the data was, right? It's on one of the flash chips somewhere. But on spinning disks, there's a location on the disk. Like, that is where your data is. So if I take that disk and I wave a magnet over that little part, that part of the disk is gone. Maybe the file system can recover, maybe not, but there is a locality to data. So what are some of the ways that we might want to exploit the rotational features of a disk drive? Disk geometry.
Somebody actually brought up an interesting one the other day, which is still true for spinning disks. Again, disks spin. They're platters. They've got an inside and an outside. They've got multiple platters and one set of heads. What might I want to do? Yeah, so it's kind of random, right? This is one of those things about disks where you're just like, thank God this went away, right? But it hasn't really for newer disks. When the disk spins, the linear speed of the outside edge is higher than the inside edge. The two parts of the disk are doing the same number of revolutions per second, but the outside edge is much longer, right? The inside is shorter. So the outside actually has farther to travel. So that means that I can actually read data faster, assuming I can keep up with the disk, which wasn't always the case in the FFS era. We'll come back to this; this is one of the more awesome FFS features. Awesome, slash. But the outside of the disk moves fast. So if I can keep up with the speed, I can read data faster from the outside of the disk drive, right? It's just physics. So where do a lot of file systems start to put data when you boot them up? At the outside. I mean, if the disk is mostly empty, the best place to put data is the outside. As the disk fills and I have to use these inside parts, maybe this means your disk starts to feel a little slower as time goes on. It might be better to do the opposite, right? Start on the inside, give people realistic expectations, and then things get faster over time, you know? Anyway. How many people feel like their computer gets slower over time just on its own without any help? Yeah, it's weird. It's probably not true, actually. It's probably just totally psychological. Anyway, so the outside moves faster than the inside. What else? That's not it. I mean, we could talk about this for half an hour, right? What other disk geometry features?
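The "it's just physics" point above is easy to put numbers on. The radii below are made up, roughly spanning a 3.5-inch platter's usable band; the point is only that at a fixed RPM the angular rate is identical everywhere, so the linear speed under the head, and hence the raw bit rate at constant bit density, scales with the radius.

```python
# Illustrative arithmetic: linear speed under the head at a fixed RPM.
import math

rpm = 7200
revs_per_sec = rpm / 60                      # 120 revolutions per second

def linear_speed_cm_s(radius_cm):
    # Circumference times revolutions per second = track speed under the head.
    return 2 * math.pi * radius_cm * revs_per_sec

inner = linear_speed_cm_s(2.0)   # hypothetical innermost track
outer = linear_speed_cm_s(4.5)   # hypothetical outermost track
ratio = outer / inner            # outer track streams 2.25x faster here
```

So with these made-up radii, sequential reads from the outermost track could be more than twice as fast as from the innermost one, which is exactly why file systems liked to fill the outside first.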
Remember, we've got multiple platters, one set of heads. What's that? So I've got multiple heads. Remember that the arm has heads on it, and those heads are simultaneously reading from all platters, right? Oh no, no, no, we're not going to modify the disk drive itself, right? We're stuck with disk drives. Now, multiple-head drives, you can't say no one hasn't tried it, right? That's the thing. I'm sure you could look on Google and you could find me a multi-head drive, right? But most of the ones today, I think, have single heads. Yeah. Well, you're warmer. I mean, the size is sort of irrelevant, but the fact that the heads are all in the same place at the same time means what? I mean, what's slow about reading from a spinning disk? Moving the heads. So let's say I have a single file. Where should I put the single file? File contents are very likely to be accessed all at the same time. This isn't necessarily true for every kind of file, but it's true for a lot of files. So where should I put all those file contents? Yeah. Imagine that I start putting the file on the top of one platter. I use one track. And then I use the bottom of that platter. And then I use the next platter down and the next platter down. And I keep doing that, because now I have a file that I can read in its entirety without moving the heads, which is awesome. That's what I want. So this stuff, again, this stuff is wild. What about inode-type data structures? So inodes are an interesting case. The other thing you have to keep in mind about 1982 is that memory sizes were quite small in early computers. So this idea of just building a huge cache in memory so that I could cache the entire file system? Not happening. Yeah. Yeah? So I could put the inodes somewhere in the middle, to minimize the seek distance between every file and every inode. I could also maybe try to put them on the outside, because they're faster to read there. There are trade-offs.
Data blocks, we talked about this: try to put related files close together. So FFS, we're going to come back to some of the geometry stuff, right? But FFS also standardized certain things. FFS included these larger blocks for reading and writing from the file; the sort-of-standard 4K block size comes from FFS. FFS included a way to... So remember, one of the things we want to do frequently is find data blocks, or find blocks that I can use for some sort of file system data structure, that are close to each other on disk, because stuff that's close on disk is going to be easier to read or write at the same time, with smaller seeks. Earlier file systems apparently had no way to do this very well. So for example, if I just build up a free list of blocks on disk over time, that free list, if I'm not careful, can become totally unordered. And so earlier file systems, if you said, hey, I need four free blocks, they would give you a block over there, there, there, and there, right? I mean, they had no way to allocate blocks intelligently, to find blocks that were close to each other. And so FFS included an ordered free list to make this possible. So again, this is someone who studied the features of this device very carefully and built a software system that responded very well to the specific features of disks at that time. Yeah, so FFS added symbolic links, yay. File locking as a feature, unrestricted file name lengths. I mean, it boggles my mind that this is still a problem that we're living with today, right? Like, I have seen some very, very bad UBIT names, right? Does anyone want to reveal that they have a bad one and what it is? Who has one that they don't think is very good? What is it? I thought it was going to be like 666, you know? But it had a little bit of a number. Yeah, I know, but someone else has that one, right? Yeah, okay, who? Yeah, oh yeah. Well, I've seen ones that mean something terrible in my language.
Yeah, anyway. But why? This is so stupid, right? I mean, the fact that we're still dealing with, I don't know, whatever, anyway. Like, eight characters is, I don't know. Someone's responsible for that, by the way. A human being made that decision, right? If we could find them, we could punish them. Anyway, early file systems had these types of restrictions. It was like, oh, sorry, you can't name a file with more than eight characters, which is sort of dumb. So FFS did away with those. So we know these two enemies of things on disks: seek times are bad, and rotational delay is also a problem, but sort of a minor problem. But again, with FFS-era disks, I suspect that rotational delay was actually more of a concern than it is now. Not necessarily on par with seek times, but as we packed more and more data onto a disk, I think the seek times have probably not improved very much, if not gotten a little bit slower, because the target that I'm aiming for is so much smaller. At this point in time, the tracks are so close together on disks that in order to get to one, I actually have to seek all the way across the disk and center that head on a very, very, very narrow little track. So that's gotten tough. So FFS introduced this idea of cylinder groups, and this idea persists to this day. A cylinder group is all the data that can be read without moving the head very far, right? It's not really without moving the head at all, because that's a very small amount of data. But imagine I break the platter up into, like, eight rings that are nested inside each other. Each one of those is a cylinder group. So it's all the data I can read from the disk without having to move the heads very far. And that's on all the platters, top to bottom. Now, what FFS did, which is sort of clever... remember we talked about where to put inodes, to make sure that they're close to data blocks? I could put them in the middle.
What FFS did is that every cylinder group has a backup copy of the file system superblock. That's the data structure that contains the location of all the inode maps and the location of all the free-block bitmaps. This is a data structure that's incredibly important for the file system to function properly. Why would I put backup copies in every cylinder group? Yeah. Yeah, if part of the disk goes bad, it's more likely that I can read one of these copies. And this is still true. File systems still stash backup copies of these really important data structures. The thing is, they don't consume very much space. So you don't lose a lot of space from your disk if I make 16 copies of the superblock and put them in well-known locations, just in case the sector or the block that the superblock lives on goes bad. I don't want to lose the whole disk. So each cylinder group has its own header. That header has superblock-like information in it. It stores, for the cylinder group, how many inodes are available, how many data blocks are available, things like this. It has data structures for maintaining lists of available inodes and data blocks for the cylinder group. So each cylinder group is kind of like what? It's almost like I've broken this file system into little pieces, and each piece is sort of like its own what? I mean, what other thing has these features? Every cylinder group has a superblock, inodes, and data blocks. This starts to sound like a what? Like its own little file system, right? I don't know why people are still laughing at this slide, right? Have you guys even seen this movie? Okay, I haven't actually. Maybe I never thought it was funny, but it always gets a laugh. So yeah, every cylinder group is like its own little file system. Now, do you remember when we printed off the statistics for EXT4? Remember what we saw? We saw all these groups.
So EXT4 calls these groups, but they're essentially the same thing. Every group has its own block bitmap, its own inode bitmap, and data blocks and inodes that it allocates. And the goal is to allocate a file entirely from a specific group, which ensures that all of that file's data structures are relatively close to each other on disk. This is sort of how we solve the problem of trying to keep inodes and data blocks close to each other: break up the disk into these little parts and treat them all sort of independently of each other. Yeah, so the FFS superblock at the time contained information about the disk itself. FFS would measure stuff about the disk, I think when you formatted it or at some other time; it would run tests to ascertain information about the disk, store that information in the superblock, and use it to do wild stuff, okay? So here's an example. And this was true for a period of time. Remember when we talked about where to put things? We were making the assumption that the disk could actually return data over the bus to the rest of the system fast enough to read data at full speed off of the disk. What if this isn't true? I don't know what sort of bus they were using at the time, but what if the actual data bus that connected the disk to the rest of the system wasn't fast enough to keep up with a read? Then I actually can't read every block off the disk in order. At some point my buffer gets full and I have to slow down. Now, what happens if I put everything next to each other? Let's say I start reading a large file and I get four blocks in, and the disk is like, oh no, my memory cache is full and it's taking so long to send this data back to the rest of the system. What do I have to do to read block five? Go all the way around, right? Let the disk do a full rotation.
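Before moving on, the allocate-a-file-from-one-group policy can be sketched like this. The policy and numbers are invented for illustration and are not EXT4's actual allocator:

```python
# Toy allocator: try to place a new file's inode and data blocks in the
# same group, so its metadata and data stay close together on disk.

def allocate_file(groups, blocks_needed):
    """Return (group_index, data_blocks), or None if no group has room."""
    for gi, g in enumerate(groups):
        if g["free_inodes"] > 0 and len(g["free_blocks"]) >= blocks_needed:
            g["free_inodes"] -= 1  # claim an inode from this group
            data = [g["free_blocks"].pop(0) for _ in range(blocks_needed)]
            return gi, data
    return None

groups = [
    {"free_inodes": 0, "free_blocks": list(range(0, 100))},    # no inodes left
    {"free_inodes": 5, "free_blocks": list(range(100, 200))},  # has room
]
print(allocate_file(groups, 3))  # (1, [100, 101, 102])
```

Group 0 is skipped because a file needs both an inode and data blocks from the same group; everything for this file lands in group 1, close together.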
Now at that point, hopefully the cache is drained and I can pick up block five. So how do I get around this problem? Yeah, that's right: I interleave. One block, skip one, one block, skip one. Or one block, skip two. Now I can use those other blocks for other stuff, right? Like I can have another file interleaved there. So I could have files A, B, and C, and lay them out A block, B block, C block, A block, B block, C block, right? Yeah, so that's pretty nasty, but that's what FFS would do, you know? Because it was trying, right? It was like, look, I've got this disk, you told me to make this disk fast. I'm trying to earn that F in my name. That F is for Fast, not just for file system. So yeah, these sorts of features were kind of, like I said, lost to the sands of time. At this point most disks can transfer data back to the rest of the system, in some cases, faster than they can read it off the disk, and that's so that they can have a cache locally. So yeah, this is one of those things where it was pretty cool that they managed to do this. And I suspect when Kirk took out this code, he was like, thank God those days are over. The future has turned out to be kind of okay, because I don't have to do this sort of rotational planning anymore. So this is interesting. I mean, this is a question of, we're studying now something that's 35 years old, almost as old as I am. Does this stuff matter anymore? And if not, is that not a good thing? So one interesting thing that's happened in the file system world sort of mirrors what's happening in other parts of computer systems. Computer systems, particularly operating systems, interact very closely with hardware. That's the same thing for network protocol stacks.
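Going back to the interleaving trick for a second, here's a toy sketch of the layout. The skip factor of 2 is an invented example; real FFS computed it from measured disk characteristics:

```python
# Toy interleave layout: place a file's logical blocks every other
# physical slot so the disk has time to drain its buffer between reads.
# A skip of 2 means "use a slot, skip a slot, use a slot, ...".

def interleaved_layout(start: int, nblocks: int, skip: int = 2) -> list[int]:
    """Physical slot for each of the file's logical blocks."""
    return [start + i * skip for i in range(nblocks)]

# File A in the even slots, file B filling the gaps in between:
print(interleaved_layout(0, 4))  # [0, 2, 4, 6]
print(interleaved_layout(1, 4))  # [1, 3, 5, 7]
```

Two files share the same region of disk, each readable at the rate the bus can actually sustain.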
It's the same thing for databases on some level, because databases have to achieve good performance, so they have to understand things about the file system and how the file system works. There's always this tension between trying to make hardware better and trying to make software more intelligent. And I work closely with people that do hardware stuff, right? So we have this argument all the time, which is kind of, you know, the hardware people are like, we're really clever, we're gonna make hardware better and more clever. And the software people are like, no, no, no, don't do that. Particularly don't do that if it obscures features about the hardware that I might want to know about. So here's an example of this. Modern disk drives present this linear namespace, like I said, like a linear array of blocks. On some level, file systems are allowed to assume, and do assume, that blocks that are contiguous in the array are close to each other on disk, because that's normally kind of true, right? There's one case where it becomes not true, which is that apparently when they make spinning disk drives and test them initially, a bunch of the blocks on the drive just don't work. This is a magnetic disk, so when they made it, there was a little process variation, and there were a couple of blocks on the disk that can't reliably store data. So what do I do? I can mark them as bad, but newer drives are a little more clever than that, okay? So let's say that block five is bad. What do I do? I can mark it as bad, but let's say I want to preserve this linear namespace. What can I do? Yeah, great idea, right? The disk has its own little mapping table where it's like, I know you said you wanted block five, but my block five doesn't work so good, so I'm gonna give you block 72,842 instead, right?
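That remapping table can be sketched like this. The spare-area location and the bad block are made-up numbers, carried over from the example above:

```python
# Toy bad-block remapping: the drive preserves the linear block
# namespace by silently redirecting bad blocks to spare sectors
# stashed far away from the normal data area.

SPARE_AREA_START = 72842  # invented location, far from the data area

class Disk:
    def __init__(self, bad_blocks):
        # Map each bad block to the next spare sector.
        self.remap = {b: SPARE_AREA_START + i
                      for i, b in enumerate(sorted(bad_blocks))}

    def physical(self, logical: int) -> int:
        """The file system asks for `logical`; the drive may redirect it."""
        return self.remap.get(logical, logical)

d = Disk(bad_blocks={5})
print(d.physical(4))  # 4      -- normal block, where you'd expect
print(d.physical(5))  # 72842  -- silently nowhere near block 4
```

The file system never sees the redirection; it just sees block 5 behaving strangely.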
And this works great, except for one problem. Where is block 72,842? Not close to block four, right? And so this feature, which on some level is clever, you know, on some level the hardware people are like, oh, I'm gonna do this cool thing. They're trying to make our lives easier. That's what they keep saying. But, you know, file systems that make these sorts of locality assumptions can have big performance problems, particularly if I put, like, an inode table there, right? So the file system is like, why is this happening? I keep reading this inode table and it takes like 10 times longer than every other inode table on disk. Why? It's because the disk is doing this huge seek out to these reserved sectors that they put way at the edge of the disk, and no amount of planning helps, right? So this kind of stuff drives software people nuts. Yeah, anyway, this is what I just talked about. As disks move things around, this sort of planning becomes really hard. One of the interesting things that happened over time is that, you know, we just saw an example of file system designs that are really, really tied to this idea that file systems know where disk blocks are and that they can make assumptions about them. Like, for example, if I read block N and block M, the smaller the distance between them, the faster that's going to go. And a lot of file system designs are sort of based on this. I had a colleague in graduate school who explored what happens if that assumption totally fails. On certain types of storage, there is no such guarantee, right? The performance of any two blocks is essentially random. And it turns out that some of the locality-based planning that file systems try to do actually makes things slower if you completely undermine those locality assumptions. All right. Any questions about this? We're almost done. I think I'm almost done with my slides. Just gonna go through this.
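The locality assumption just described can be written as a one-line cost model. The units here are made up; the point is only that cost grows with distance:

```python
# Toy cost model for the locality assumption: reading block M right
# after block N costs roughly the distance |N - M|. Invented units.

def seek_cost(n: int, m: int) -> int:
    return abs(n - m)

# Sequential neighbors are cheap; a remapped "neighbor" is not.
print(seek_cost(4, 5))      # 1
print(seek_cost(4, 72842))  # 72838
```

On storage where this model doesn't hold, any layout planning built on it buys nothing, and can actively hurt.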
Yeah, so, like I said, FFS was an iterative development. You know, we're still working on figuring out the right block size. Oh, and this was a cute thing. One of the things that FFS and other file systems have started to do is, for certain types of inodes, if the file is really small, I can actually jam all the file content into the inode itself. So I make the inodes a little bit bigger, and for directories now, assuming the directory is small, reading the inode and reading the directory are the same operation. So that's kind of cool. UFS introduced some really interesting ideas about consistency, including something called soft updates, which I don't fully understand; I would probably have to write a whole lecture to explain it. Okay, so Monday, we're gonna talk about log-structured file systems. FFS is this classic file system design, right down the middle, really well done, really well engineered. Log-structured file systems, on the other hand, are this wacko idea that people spent 20 years deciding whether or not it was a good idea, that people got into arguments over and got angry about. But it's cool, it's a neat observation. And then we'll talk a little bit about how to read a research paper, because on Wednesday next week we'll talk about RAID after you guys read that paper. I will post the paper information early next week, and we will start distributing midterms in office hours on Monday. Have a great weekend, enjoy the weather. I will see you guys on Monday.
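As a footnote on the inline-data trick mentioned above, here's a toy sketch. The capacity threshold is an invented number, not any real file system's limit:

```python
# Toy inline data: if a file is small enough, stash its contents
# directly in the inode instead of allocating a separate data block.
# INLINE_CAPACITY is invented for illustration.

INLINE_CAPACITY = 60

def store(inode: dict, data: bytes) -> dict:
    if len(data) <= INLINE_CAPACITY:
        inode["inline"] = data    # one read fetches inode and data together
        inode["blocks"] = []
    else:
        inode["inline"] = None
        inode["blocks"] = ["<allocated data blocks>"]  # placeholder
    return inode

small = store({}, b"hi")
print(small["inline"])  # b'hi'
```

For a small directory stored this way, reading the inode and reading the directory contents really is one operation.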