It's hard to do anything after listening to Lana Del Rey. I just want to go lie down. All right, but we're going to have class instead. So today we're going to move on to the third unit in the class and start talking about stable storage. This is the part of the class where we'll spend most of our time talking about file systems. But today we're going to talk specifically about disks, the actual devices that file systems are built to operate on. And that's because, unlike the CPU and memory, disks have a lot of properties that we need to think carefully about when we design file systems. And this is one of the classes with some fun videos, very cool YouTube videos. All right, so assignment three: the DesignDoc and code reading are due Friday at 5. I think everybody knows this. Please don't leave DesignDoc points on the table. I mean, you may or may not want to implement assignment three. That's sort of a personal decision. But there are points for just pretending that you are going to implement assignment three and writing up two pages about how you plan to do it. And that's a generous number of points because we're trying to help you guys out. So don't skip that. Don't leave those points on the table. Bring your DesignDocs in tomorrow and Friday and the TAs will be happy to look over them. They know what the rubric is. And you guys have seen the rubric. If you turned in a DesignDoc for assignment two, you've seen the rubric for assignment two. So you know the type of things that we're looking for. Yeah, Kaila? There are five or six questions right before you upload the assignment. I don't know. I think that's probably a subset of the rubric. Because you get points for not having more than two pages, for example. But yeah, that's a good question. Post it on Piazza and we'll look. The rubric's not intended to be a secret. But we don't show it to you until we grade, partly just because we want you guys to think about how to do it.
If we give you the rubric, then what we get are people who have basically just written solutions to the rubric rather than solutions to the overall design. OK. Any questions on virtual memory before we plunge onward? I decided to just skip over the part on copy-on-write that was at the end of last Monday's lecture. You guys are welcome to go check that out on your own time. It's in the notes. Any questions about virtual memory, memory management before we go onward? OK. So today, let's start talking about disks. And the first pretty important piece of terminology that we are going to use when we start talking about disks is, well, let's introduce a couple of different terms. So stable storage is storage that does not lose its contents when the machine is powered off. This is to distinguish it from what kind of storage? Memory, unstable storage. Memory, registers, anything that's going to lose its contents when the machine is powered off. At this point in human history, we have several different types of stable storage available to us, several different types that we'll talk about in this class. And there are other kinds, obviously. But here are the ones that we care about when it comes to file system design. So hard disk drives, HDDs. These are stable storage that typically consists of a spinning magnetic medium. It's a spinning drive. It has moving parts. And HDD stands for hard disk drive. And that was a great acronym until this other kind of drive came along, which we refer to as SSDs, or solid state drives. The big difference between flash drives and hard disk drives, well, you guys are computer scientists. So a couple of things. What's not on this list? What other types of storage media are not on this list? Yeah. Tapes. There we go. Got a really retro answer here. But there are organizations, including UB, that still do backups to tape, believe it or not. Yeah, now we're going back even farther. Floppy disks. Who's ever used a floppy disk? Weird.
You guys are not as young as I thought you were, apparently. Or for whatever reason, you guys all grew up in a computer museum together or something. I don't know. What else? Punch cards? No, OK, whatever. That's for programs anyway, not for storage. There are other types of storage you guys use on a regular basis. What's that? RAM is not stable. RAM loses its contents when it's powered off. Paper. Yeah, I guess you could technically print out the contents of your email inbox. And I don't know how you would get them back into the computer. You'd need some sort of fancy OCR stuff. I'm thinking of something you use with a computer. Well, maybe you guys actually, it's possible. And I'll have to admit, I have not handled one of these for a while. But my computer is still ready. Yeah. CDs. CDs, DVDs. How many people use CDs and DVDs anymore? Oh, yeah. Well, you guys got to get to the future. Streaming is awesome. Yeah, I was thinking the other day I was going to go to Redbox and get a DVD. But then I was like, I don't even have a DVD player on my computer. So that's not going to work out very well. All right, so these are the kinds we're going to focus on. Because these are the things that we build file systems on top of. And do we build file systems on top of some of these other types of stable storage media? Of course. But for a variety of reasons, those file systems are sort of less interesting than these. And these are the dominant forms of media, including in most personal computers today. So what's the big trade-off between these two guys? If you guys have bought, well, that's one of the trade-offs. OK, speed. So we say speed is the trade-off. Which one of these is faster? SSDs are quite a bit faster. But what's another trade-off? Yeah. Yeah, so in general, hard disk drives are bigger, slower, and cheaper. There's also a form factor consideration here. I don't know if you guys have ever taken apart one of those solid-state drives. But it's kind of amazing.
It's this 2.5-inch form factor. But then there's nothing inside. It's mostly air. And there are a few microchips sitting there that are providing all of the storage that you paid an exorbitant amount of money for. So yeah, hard disk drives: bigger, physically bigger, slower, and cheaper. This was, well, projected for 2012. I guess I could have looked up the real number, given that 2012 was three years ago. But there's still this gap between the price per gigabyte for flash and for hard disk drives. And that's why people are still buying hard disk drives. That's why companies are still using hard disk drives. Particularly when you start to really think about bulk storage, that's what people are still doing. And yeah, if we're ever talking about something and you guys are confused about what type of storage we mean, please ask. Because the differences between flash and hard disk drives lead to very different system designs. So I want to sort of return to one of the questions that we posed at the beginning of the semester. So we had this question of why study operating systems at all. And I hope I answered that question at least somewhat. But of all the things to study in operating systems, this is the one where you should be really scratching your head. This feels very retro. Let's study spinning disks. Because look, at some point, there was a prehistoric version of me up here who was saying the same thing about tape. Well, we still need to study tape because tape is still with us. And it will be with us for a few more years. But that's not really true anymore. Nobody studies tape drives. So why are we still talking about hard disk drives? Because flash is the future. Well, admittedly, there are still a lot of hard disk drives around. You guys will probably continue to encounter them. Partly that's because they're cheap. And there are aspects of file system design that are changing.
A lot of you guys, and having watched you guys manipulate your VirtualBox images for this class, I hope that the way you guys find things on your computers is through search. Because hierarchical file systems don't seem to be helping you very much. I mean, how many people regularly use Spotlight or some search feature on their computer to find things? Instead of having a very carefully thought out hierarchical directory structure that starts in your home directory and is incredibly well organized into different directories, how many people still do that? Yeah, OK. Maybe we're at a tipping point here. But when you look at local storage, we still have these hierarchical file systems that underlie this. So that's still how things are actually stored and organized. But there are also, I think, a lot of ideas from this space that we want to talk about. And that's really the reason to study file systems. I mean, look, I am not going to try to claim that a detailed understanding of how the Fast File System was implemented in the 80s is going to help you in the future. But there are some really nice design principles. And there was an enormous amount of time and energy spent by the systems community on designing file systems, and on designing file systems that had robust properties given some of the challenges of the underlying device. So that's really what's fun here. And I hope you guys will enjoy studying this and even learn something from it, despite the fact that no one is going to claim that the future is in spinning disks. You've got researchers who are claiming that the future will have data centers that don't even have stable storage at all. Only memory, because it's super fast. But in the meantime, we still have some stable storage around. We still have some spinning disks around. And there's a lot to learn from studying these file systems. And so I hope you guys will agree once we're done that this was a worthwhile exercise. All right.
So before we can talk about file systems, we have to talk about disks. Because a lot of the things that make file systems fun are based on the things that make disks irritating. Irritating properties of hard disk drives. Old spinning drives. So, parts of the disk: the platter. The platter is the thing where the magnetic medium is mounted. That's where data actually gets written and deposited. I don't know how long this has been true, but at this point, platters have data on both sides. So a hard disk drive consists of a series of platters. The spindle is the drive shaft. That's the thing in the middle that's rotating the platters around at a variety of different speeds, depending on what class of drive this is. If you look at server drives, they can spin quite rapidly. And then the head is the part that's actually doing the heavy lifting here. So the head is the part of the drive that's reading and writing data onto or off of the platters. And where the head is positioned controls what part of the disk you are actually reading or writing. So here's a picture. I've got my platters here. It's hard to see, but there's a stack of them. The spindle is in the middle. The heads are mounted on an arm. So when you guys hear the disk running, you can probably hear a hard drive. You can hear the rotation. There's a certain noise that that makes. But that sort of grinding sound that the disk makes, that's this. That's this moving around. I'll show you a video of that in a minute. And then there's all this other stuff that you don't really care about. So you can imagine the engineering challenge here. In order to increase the amount of data on the disk, what we've done over generations of hard drives, it's not like hard drives got bigger. You don't have a computer with a massive hard drive. In fact, if you're my age, when you were a kid you saw the drives getting smaller.
We went from these 8-inch disks to the 5.25-inch disks that held 10 times as much data to the 3.5-inch disks that held 100 times more data. So essentially what was happening is we were getting better and better at writing smaller and smaller pieces of magnetic data onto these platters. So the denser we can write data onto the platters, the more data we can store. And other performance aspects of the disks improve as well, for reasons that you guys will hopefully understand by the end of class. However, that means in order to read or write a particular byte of data or a particular block of data from the disk, I have to position those heads over a tiny, tiny, tiny area. I have to be really precise in terms of how I position the heads. On the other hand, in order to make performance good, I need to move that arm quickly to get to different parts of the disk. So it's pretty impressive that this stuff all works, period, as well as it does, even if it doesn't work as well as we might like. All right, so when we talk about locations on the disk, there are three different terms. I have a track. So think of a track as one lane, quote unquote, on the platter. It's a circle on the platter at the same distance from the spindle. And so if I don't move the heads, it's the path that a head will follow as the disk spins. A sector: a sector resembles a slice of pie that's come out of the platter. I'll get to the diagram in a minute. And the cylinder: imagine that I took a soup can and just sliced it all the way through the disk. It's essentially all the tracks on every platter that are at the same place. And the reason why cylinders are interesting is that a cylinder is all the data that can be read or written from the disk without moving the heads. Because remember, or don't remember, since I haven't told you this yet, there are heads between every platter, because I need to be able to read or write from every platter.
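To make the track, sector, and cylinder vocabulary concrete, here is a minimal sketch of the classic cylinder-head-sector (CHS) addressing arithmetic. It is an illustration under simplifying assumptions, not something from the lecture slides: the `Geometry` type and the function names are hypothetical, every track is assumed to hold the same number of sectors (real drives vary this across the platter), and in CHS usage "sector" means a numbered slot within one track, counted from 1.

```python
from typing import NamedTuple, Tuple


class Geometry(NamedTuple):
    """Hypothetical, idealized disk geometry (same sector count on every track)."""
    cylinders: int          # number of head positions across the platter
    heads: int              # one head per platter surface
    sectors_per_track: int  # sector slots on each track


def chs_to_lba(geom: Geometry, cylinder: int, head: int, sector: int) -> int:
    """Map (cylinder, head, sector) to a linear block number.

    By CHS convention, cylinder and head count from 0 and sector counts from 1.
    """
    if not 0 <= cylinder < geom.cylinders:
        raise ValueError("cylinder out of range")
    if not 0 <= head < geom.heads:
        raise ValueError("head out of range")
    if not 1 <= sector <= geom.sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * geom.heads + head) * geom.sectors_per_track + (sector - 1)


def lba_to_chs(geom: Geometry, lba: int) -> Tuple[int, int, int]:
    """Inverse mapping: linear block number back to (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, geom.heads * geom.sectors_per_track)
    head, sector0 = divmod(rest, geom.sectors_per_track)
    return cylinder, head, sector0 + 1


if __name__ == "__main__":
    geom = Geometry(cylinders=4, heads=2, sectors_per_track=8)
    # Every block in cylinder 1 sits in one contiguous run of block numbers,
    # and all of them are reachable without moving the arm.
    blocks = [chs_to_lba(geom, 1, h, s) for h in range(2) for s in range(1, 9)]
    print(blocks)
```

The design point this shows is why cylinders matter: with this numbering, consecutive block numbers fill one whole cylinder (every head, every sector) before the arm ever has to move to the next one.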
So that disk arm that you saw right here actually has parts that go in between every platter on the disk. And there are heads mounted everywhere so that it can read or write from every side of all the platters that are part of my disk. So here's the diagram. Probably we're envisioning this track. The cylinder is this, again, the sort of vertical projection of those tracks through all the platters, and there's a sector. All right, so you can find all sorts of cool videos of disks online, right? And here's one of them. So before I show this, what part of the disk is this? It's the arm, right? This is one of the heads. It's the head you can see, right? The other heads are in between the other platters. Here are my platters, and this is the spindle that's driving the whole thing, right? And my favorite part of this video is that it inadvertently shows you how messed up early Windows file systems were, right? So that's a little bit to enjoy, and maybe we'll come back to it later, right? All right, so a little bit of advertising. All right, so I'm gonna fire up the disk. It's not me, by the way, this is some guy on YouTube. Actually, let me put the sound on. Cool, you can hear it. All right, so that's deletion. Copy in between two files. So what's cool here is you can sort of see where those files are located on the disk, right? You can see that the disk head is, so sorry, when I deleted the folder, that's what's so messed up about Windows. Why is it so hard to delete a folder? It's not that hard to delete a folder. Other file systems do that way better. There's no need for so much disk activity in order to delete a folder, but anyway, you saw that, so when I deleted the folder, the arm was sort of flying all over the place in this area. When I do copy and paste, you can sort of see, it's hard to tell, but it's localized over a couple of places, right?
So it's like one of the files is over here, the other file is over here, and in order to copy and paste, I have to pick up data from one part of the disk, move it into memory temporarily, and then write it out into a different part of the disk. So you can sort of see this happen. Quick format, impressive. All right, any questions about this? So again, you can't see it, but there are, this disk maybe has like eight platters, 10 platters, some number, and that arm, if you could rotate it, you'd see that arm looks like this. So it looks like a comb, and it's sort of sliding in and out of the gaps between all of the platters on the disk. Yeah, yeah, so the arms on this disk, and I suspect on most 99.9% of disk, the arms are all synchronized, right? So it's one arm, right? And what that means, of course, is that the heads are always at the same location on every platter, and that's why the cylinder is so interesting, right? Because the cylinder is all the data I can read and write from the disk without moving the heads, right? To every platter, that's a great question, yeah. So the heads, it's one actuator arm. I mean, again, it's hard enough to throw that arm as fast as you can across the disk and then settle it rapidly over a tiny, tiny little track. It's hard enough to do that with just one arm, right? Doing it with eight would be really a nightmare, yeah. Why is it shaped like a triangle? I have no idea, right? I wish there was, do we have any mechanical engineers in here? Answer this question. I don't know, because it's a great question. I have no clue. Yeah. Wait, hold on, did someone answer the triangle question? Yeah, that's probably true. I think he's right, because I think if it was small, it would wiggle too, right? Yeah, because remember, it needs to be rigid, right? So imagine you take that arm and you throw it all the way across to the disk and then you try to stop it. 
If it's too thin, it's gonna sit there going, so it's gonna be a little woozy from that long trip and they just take a while to stabilize. You can't see it, but it already takes a while to stabilize, right? That's one of the slowness involved with the disk is actually waiting for that arm to stop shaking. But it's shaking at a level that you can't even see, yeah. Isaac, I don't know, I have no idea. That's one of the mysteries of this video to me, right? Why does the Windows file system require so much IO to delete a folder? I've never understood this. I mean, you guys have done this before. It's like, oh, I'm gonna throw that folder in the trash, right? Better go have a coffee break, because it takes like 20 minutes. Why? I don't know. There's really no reason for that. Unless it's like cleaning up the contents or like you set up the disk in some secure way that it needs to overwrite the files, but there's really no reason for that to happen. Yeah. Don't act separate from the disk. Oh. So maybe it's actually, yeah, maybe it's actually. Actually, you actually, you don't. No, I know that, but okay. Let's come back to this later. That still does not require that much disk activity, right? I will maintain. It's a great observation though, yeah. I have no idea. I think I read it was NTFS. Yeah. So, yes, in general, and now we're getting to like electrical engineering aspects of this I don't fully understand, the, all the heads can be active at once, right? And in theory, some of them could probably be reading, some of them could be writing. At some point, I think the bandwidth back and forth to that might start to exceed, it might hit, you might hit other bottlenecks, right? You might outstrip the buffer capacity on the disk or you might outstrip the interface capacity, right? But yeah, and if you, and this is something that we'll come back to when we think about how to lay out files for a file system, right? 
Because ideally you wanna use all those heads at the same time, right? Because otherwise I have to do a lot more movement to get the same amount of data. What's that? Yeah, yeah, yeah, yeah, yeah. Again, I think there's a deeper problem at work here. Yeah, so this is a great observation, right? So I don't think I go over those in detail in this class, but there are algorithms that people have designed to essentially determine the order in which the disk head should move, right? So if I have a bunch of things that I need to get from the disk, you know, now you've obviously sort of noticed already, let me sort of go on because you guys have gotten into something that's interesting to talk about. So hopefully this video sort of points out that spinning disks are quite distinct from the other parts of the system that we've discussed, right? So disks move, right? And again, you know, the disks are the last part of your system that will ever move, right? The next system you buy, if it isn't already, may be a completely solid-state system that doesn't move. And I have to say, it's awesome, right? Particularly when you drop it, it's like, whatever, right? I have dropped that laptop so many times and I just feel good every time I do, right? Because I think about how I would feel if it had a spinning disk in there and it was potentially toast, right? Yeah, I know, I have one of those, too. If you drop it enough times, you'll catch it on a bad day, right? Yeah, but that also makes a great point. Newer laptop drives, I shouldn't say newer, like 10-year-old laptop drives, have technology where they would actually detect that the drive was being shaken or falling and they would do things to try to protect it, right? Because I'll show you in a few slides what happens when this goes wrong, right?
But imagine, again, you have these heads that are flying around the disk, like nanometers away from the surface, reading and writing data, and just imagine what might happen if you, like, whack the disk, right? Bad stuff, right? So, and disks are slow, right? And this is sort of a consequence of moving, right? Having moving parts makes you slow by electronic standards. You are not gonna outrace an electronic circuit with anything that moves, right, regardless of how fast it is. The last part, and the part that I think makes this sort of fun to talk about and file systems fun to talk about, is that disks are really programmed and controlled by the operating system in a way that other parts of the system are not. So that allows us to do a lot more software engineering on top of disks, right? For example, and maybe operating systems now have support for some of these things, but, you know, you can unplug a hard drive from your computer. You can plug in a USB hard drive and that USB hard drive just sort of shows up and you can use it and you can unplug it later. You can't do that with a core, right? I can't buy my new i7 core and, like, plug it in, right? And then unplug it later when I wanna move to a new machine. That'd be very cool, right? But yeah, I can't do that, right? Same thing with memory. There are probably systems now where you can hot-plug memory, but still, it's not something you guys normally do, right? You gotta reach in there, grab some sticks, move to another machine, machine gets faster, put it back later. So yeah, because of how disks are integrated into the system, there's a lot more of a software challenge here. And again, I mean, one of the reasons to talk about disks, despite the fact that they're a little bit of an old technology, is the fact that so many smart people spent so many decades working on them. And in a lot of ways, some of the techniques that they came up with are still with us and have been applied in other areas.
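Going back to the point about algorithms that decide the order in which the disk head should visit pending requests: the lecture does not cover them in detail, so the following is only a small sketch of the general idea. It compares serving requests in arrival order against one elevator-style sweep (often called LOOK). The function names and the request list are illustrative assumptions, and track numbers stand in for real head travel, which is not strictly proportional to distance.

```python
from typing import List


def head_travel(start: int, order: List[int]) -> int:
    """Total number of tracks the head crosses when serving requests in this order."""
    position, total = start, 0
    for track in order:
        total += abs(track - position)
        position = track
    return total


def look_order(start: int, requests: List[int]) -> List[int]:
    """Elevator-style order: sweep upward from the current track, then come back down."""
    upward = sorted(t for t in requests if t >= start)
    downward = sorted((t for t in requests if t < start), reverse=True)
    return upward + downward


if __name__ == "__main__":
    # Hypothetical queue of track numbers, with the head currently on track 53.
    pending = [98, 183, 37, 122, 14, 124, 65, 67]
    print("arrival order:", head_travel(53, pending))
    print("elevator sweep:", head_travel(53, look_order(53, pending)))
```

The sweep serves the same requests with far less arm movement than arrival order, which is the whole motivation: since moving the arm is the expensive part, reordering requests in software can buy real performance.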
Okay, so, disks move and therefore disks are slow, right? And this is particularly true when you compare it with sort of electronic time scales. When you think about the amount of time it takes for electrons to flow from the RAM to the cache or into a register, right? Who cares how that happens, right? It's electrons flowing through copper, okay? On a disk, I've got to get this aluminum thing to move, right? There's no way you're gonna win that race. The other really interesting property of, all right, I should have guru do that every day. All right, that was awesome, actually. What's up? I don't know, later today, yeah, this will be on there. Did you pause it? Okay, good, okay. I hope I handled that well. I mean, it's sort of awkward, right? Okay, well, I have my beads now, so it's been a good day. Okay, so, disks fail, right? So, disks like our friend move, and disks like our friend also fail. So, yeah, and this is actually something that file system designs spend a lot of time thinking about, right? So, because your disk moves, it wears out, it breaks down, bad stuff happens. But, of course, the thing that you bought the disk to do is to store things reliably. And so, if that property of disks causes the reason that you're using the disk to fail, that you're gonna be sad about that, right? And disks can fail in two ways, right? So, one interesting thing about hard drives is a lot of hard drives ship with parts of the disk that already don't work, right? You guys probably don't know this, but, all right. A little redux. Shall I just go home, right? That just made, that just made my day. So, the disk in your computer, if you bought a spinning drive, one of the things that might have done at the factory or the first time it powered on is checked itself, right? And discovered, hey, I've got some broken parts, right? There's a couple sectors here and there that don't work. And so, what does the disk do with those? Ignore them, right? Don't use them. 
And this can actually happen over time, right? So, if any of you guys have had sort of a non-catastrophic disk failure, where maybe the OS threw up some sort of scary error message and then you ran some sort of repair tool and it seemed okay, and you just sort of went on with your life, right? That could be caused by just a sector going bad, right? For whatever reason, that part of the disk stops working. Hopefully, you didn't have any super critical data there, maybe a file or two vanished and you just went on with your life, right? So sectors can fail over time, and the disk can also fail catastrophically, right? So, this is what we were talking about before. There are plenty of ways to break a disk. You could take it and just throw it on the floor and smash it to bits. But one of the more interesting ones is something called a head crash, right? So, a head crash takes place when there's some sort of violent event that affects the heads, which again, are designed to just float over the surface of the disk. If you suddenly jolt them, what they're gonna do is crash. They're gonna crash into the surface of the disk and they're going to take off plenty of material on their way, right? So, you've been spinning that disk as fast as possible. Suddenly, you push the head into it and you lose at least the whole track. You've destroyed the platter, and I was hoping it was that guy again, but it's somebody different. And yeah. And so, I don't know if you've had this happen before. The sad thing is the computer continues to work for, like, an arbitrary period of time. So, you can sort of sit there and look at it and be like, bye-bye, right? You were a good computer. I enjoyed you and now your days are over, right? And then the thing freezes and you can never boot again, right? I've done this to machines. I know exactly what it's like. So, okay, so let's look at this. This has got a cool video too.
Yeah, so check this out, right? So, this disk doesn't have a racing stripe, right? Despite the fact that that's how it looks. This is not normal, right? Your disk doesn't have, like, a two-tone color scheme. Oops, sorry. Yeah. Yeah. So, in case you didn't see the thing with the filters. I assume that this disk spins in a counterclockwise direction. And so, the filter here is designed in case there's any sort of foreign material that gets in the drive, like a little piece of dust or something. You know, the force of the disk is going to throw that off into this channel and hopefully it ends up back here behind this little dinky filter where it won't end up back on the disk. And so, his point about the filter being dark was that when this head crash occurred, all that material that was scraped off by the heads got flung out there and is trapped in the filter as well. So, again, a classic example of why you should back up your data. All right, so, and again, I mean, the thing that's cool about disk failures is that the disk was one of the early components that had this property, and it was really mission critical for the system to work. And so, a lot of people spent a lot of time thinking about how to make this work, right? So, maybe this year we'll actually read the RAID paper. It's a pretty cool paper and it spawned essentially an entire industry, which is not normal for research papers. But so, there were a lot of interesting ideas about how to improve fault tolerance with disks. And a lot of them, even if they're not necessary with solid-state drives, have sort of survived, right? They survived, they flourished, they moved on into other areas. They're ideas that were used to protect other systems and do other things, right? So, a lot of ideas that started in this area have lived on, okay? The final problem with disks is that disks are slow, and disks have the potential, because of how slow they are, to bottleneck and slow down many other parts of the system, right?
So, and operating systems play, you know, we play some of our normal games with disks, and we'll see this in the example file systems that we talk about, to try to address the slowness, right? So, what are examples of some of our usual tricks? What are some things that we do to try to improve performance? What are some of the system design principles? Yeah, use a cache, right? That's one thing. What else? We have other tricks. Yeah. Yeah, so layout, right? Figuring out where to put things on the disk. Remember, the disk is a physical device, and so where things are matters because I need to go where the data is, right? I would say that's an example of what design principle that we've tried to apply in the past? Use the past to predict the future. I don't know what the best place to put things is for the next pattern of accesses, but I know how things have been used in the past, right? I use a cache, and then I also have places where I can do some procrastination, and we'll see this come up when we look at log-structured file systems as well. And the other thing, in case I haven't convinced you that file systems are sort of fun yet, despite being sort of antiquated, is that all this stuff's done in software. So with memory, for example, the way that the processor hides memory latency is super cool, right? And if you don't know about out-of-order execution, you should find out about it because it's awesome. How many people know about that? Oh, look it up. It's super cool, right? So believe it or not, the order in which instructions are being executed by your CPU is not the order in which you wrote them. Modern CPUs play all of these games internally to try to minimize memory latencies, and it's super interesting and super off topic, right? So not something I'm going to talk about, but you should learn about it because it's cool. But the point is that that stuff's hidden from the OS, right?
Because memory latencies are pretty short and so getting the operating system involved and trying to address memory latencies is not really very effective. That's something you have to do in hardware. On the other hand, with the disk, the disk is so slow that the operating system is really who gets to try these things out, right? So when it comes to making memory look faster, that's really something that the processor does internally. When it comes to making the disk look faster, that's something that you guys get to play with in software, right? Or if you were a file system designer, that's something that you get to address, okay? So let's talk about reading or writing data from a physical disk. So what are the steps involved to accomplish this? And we'll think about the speed required for each. So what's the first thing that has to happen? Before anything else takes place? Before anything moves? Before anything... What's that? Well, it's actually not an interrupt, right? Interrupt is potentially how this is all going to end, right? But how is it going to start? Yeah. Yeah, I need to issue a command to the disk, right? The disk is a device. So in some way, based on some interface, and again, the details of those interfaces are uninteresting, and not something we're going to cover, but I need to be able to instruct the disk what I want, right? I need to tell it, read this byte, and it's usually not a single byte because it's too small of a unit. So read this block, maybe 256 bytes, maybe 512, at this location, right? So there's an addressing scheme that the operating system has to know about the disk. Like how do I address things on the disk? So I say, you know, read this particular block, and I need to tell the disk what to do. And that goes out over the interface, and then when it reaches the disk, the disk has to figure out, based on its own internal geometry, where is that block? And that allows it to do several things. 
First of all, it has to pick which head it's going to use, right? Because not every head can read every block, and in fact, none of the heads can read every block, unless you have a very old and very small disk. So I have to choose the head, and I also have to figure out where to position the arm in order to capture that data. The next thing that needs to happen is the drive has to move the heads over the appropriate track. After that happens, I have to wait for the heads to actually settle to the point where they are usable. So if the heads are still vibrating a little bit, they're going to be picking up data from multiple tracks nearby, and I can't read data at that point. So I have to wait until the heads are stable. Then I have to wait for the data I want to come underneath the head, because the disk is spinning, and I'm only reading a small part of the data at any time. And then I need to read and transfer the data back to the system. So first of all, what's the slow part? Yeah? The seek and settle. That just dominates everything else. So these two. The only other physical part here is the rotation time. To be frank, that hasn't really improved much. I mean, that's why you don't have disks that spin at 100,000 RPM or something. If you look at disk capacities and if you look at disk speeds, rotational speeds have rarely been a selling point for disks, and they stopped improving at a certain point, because they just don't really matter that much. It takes so long to get the heads to the right track that, once I get there, waiting for the data to come around just doesn't really add much latency. So the seek and settle time, this is my problem. Over time, interconnect speeds have gone up. I mean, I have had a series of faster and happier disk interconnect interfaces. I don't know how many of you guys ever tried to build a machine using those old IDE cables. And it was terrible, because they're this thick.
And you have to wrap them around stuff. And you know, like, SATA, like, thank you. It just made things a lot easier. But certainly, the bus, just getting data back and forth to the disk, that's something that we've thrown some engineering and some hardware at. Seek times, where do we think those have gone? Nowhere, really. They've improved. They've improved a lot. But there is no Moore's law hiding here. These have certainly not improved at the speed at which processor core density and other parts of the system have improved. Rotation speeds, again, these vary, but they're just not really the bottleneck. And of course, again, the power required to spin a disk at higher speeds starts to become an issue. So at some point, there was an interesting moment in system design. And actually, this led to some file system designs that we'll talk about, or some design choices that we'll explore. And I remember I was working at Microsoft at the time. And people were really terrified by the intersection of these two trends. So one trend was that hard drives got bigger, like, quickly. And that was partly due to density, but partly due to just adding platters and making the disk bigger. So disk capacities got bigger. I remember when I was in college, I had a 20 gigabyte hard drive, and I thought that was huge. That was where I was going to store my music collection that I would have for the rest of my life. And now you can get a 20 gigabyte thumb drive. It's just the amount of space you guys have. Now you can buy one and two terabyte drives for probably less than I paid for that 20 gigabyte drive. So really the market has sort of fed consumers steadily larger and larger disks. The problem was that seek times were not keeping up with the increasing capacity, and that was causing the disk to become slower and slower relative to other parts of the system that were getting faster.
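To put rough numbers on the steps of a disk access (command, seek, settle, rotation, transfer) and on the "bigger but not faster" trend, here is a minimal sketch of a latency model. Every figure in it (8 ms seek, 1 ms settle, 7200 RPM, 100 MB/s, 0.1 ms command overhead) is an illustrative assumption rather than a measurement from the lecture, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DiskModel:
    """Toy latency model; all default numbers are illustrative assumptions."""
    command_ms: float = 0.1       # issuing the command over the interconnect
    seek_ms: float = 8.0          # moving the heads to the target track
    settle_ms: float = 1.0        # waiting for the heads to stop vibrating
    rpm: float = 7200.0           # rotational speed
    transfer_mb_s: float = 100.0  # sustained transfer rate

    def avg_rotation_ms(self) -> float:
        # On average the wanted data is half a revolution away.
        return 0.5 * 60_000.0 / self.rpm

    def transfer_ms(self, nbytes: int) -> float:
        return nbytes / (self.transfer_mb_s * 1_000_000) * 1000.0

    def random_read_ms(self, nbytes: int = 512) -> float:
        # Sum of the steps: command, seek, settle, rotation, transfer.
        return (self.command_ms + self.seek_ms + self.settle_ms
                + self.avg_rotation_ms() + self.transfer_ms(nbytes))

    def random_scan_hours(self, capacity_bytes: int, block: int = 512) -> float:
        # Time to touch every block once in random order.
        blocks = capacity_bytes // block
        return blocks * self.random_read_ms(block) / 3_600_000.0


disk = DiskModel()
small = disk.random_scan_hours(20 * 10**9)   # a 20 GB drive
large = disk.random_scan_hours(2 * 10**12)   # a 2 TB drive, same mechanics
print(f"one random 512-byte read: {disk.random_read_ms():.2f} ms")
print(f"2 TB takes {large / small:.0f}x longer to touch than 20 GB")
```

Under these assumptions the mechanical steps account for nearly all of the time per access, and a drive with 100 times the capacity but the same mechanics takes 100 times longer to visit in random order, which is the "big, slow disk" worry in a nutshell.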
I mean, CPUs were getting faster. People were putting more memory in. And so Microsoft was terrified of these big, slow disks, as they called them, and the impact they were having on system performance. And they did some clever things about this. This is just a cute little graph showing you the evolution of capacity. Now this ends in 2006, so again, it's a little outdated. But this is a log scale. This is not linear. So essentially a straight line here is an exponential increase in capacity over time, which is pretty fun. Here's a terabyte. And these are 3.5-inch form factor drives, not really sort of consumer style drives. But now this line has sort of gotten to the terabyte, the multi-terabyte point as well. So it's kind of a neat story about how we gave people so much storage. What they do with it, I don't really understand. And again, going back to the idea of file systems and software, the low level disk interface is really limited. Disks force me to read and write 512 byte blocks using this very limited addressing scheme. All the things you guys think about when you think about disks, file systems, file names, file types, all this stuff is layered on top of that in software. Reliability, checkpointing, all sorts of things. Why can one file system move or delete files quickly while another one takes a lot of time? These are all software properties, because this is all done in software. So the file abstraction, which we'll start talking about next time, is really something that's implemented entirely in software. And so it's kind of a fun thing to think about. And another thing to point out is that this is why there are so many choices when it comes to file systems. Other parts of the system, for example, even the memory management data structures that we just finished talking about, are constrained by properties of the hardware that the operating system has to meet.
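As a minimal sketch of what "layered on top in software" means, the following models a disk that only reads and writes whole 512-byte blocks, then builds byte-granularity reads and writes on top of it. `BlockDevice`, `read_bytes`, and `write_bytes` are hypothetical names for illustration, and the in-memory `bytearray` stands in for the real device.

```python
BLOCK_SIZE = 512  # the fixed unit the (simulated) disk understands


class BlockDevice:
    """Stand-in for a disk: whole blocks in, whole blocks out, nothing else."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self._data = bytearray(num_blocks * BLOCK_SIZE)

    def read_block(self, n: int) -> bytes:
        if not 0 <= n < self.num_blocks:
            raise IndexError("block number out of range")
        return bytes(self._data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE])

    def write_block(self, n: int, data: bytes) -> None:
        if not 0 <= n < self.num_blocks:
            raise IndexError("block number out of range")
        if len(data) != BLOCK_SIZE:
            raise ValueError("must write exactly one full block")
        self._data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE] = data


def read_bytes(dev: BlockDevice, offset: int, length: int) -> bytes:
    """Software layer: turn a byte range into whole-block reads."""
    if length == 0:
        return b""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    buf = b"".join(dev.read_block(n) for n in range(first, last + 1))
    start = offset - first * BLOCK_SIZE
    return buf[start:start + length]


def write_bytes(dev: BlockDevice, offset: int, data: bytes) -> None:
    """Software layer: read-modify-write each block the range touches."""
    pos = 0
    while pos < len(data):
        n, start = divmod(offset + pos, BLOCK_SIZE)
        chunk = data[pos:pos + BLOCK_SIZE - start]
        block = bytearray(dev.read_block(n))
        block[start:start + len(chunk)] = chunk
        dev.write_block(n, bytes(block))
        pos += len(chunk)


dev = BlockDevice(num_blocks=4)
write_bytes(dev, 510, b"hello")   # straddles blocks 0 and 1
print(read_bytes(dev, 510, 5))
```

Names, directories, permissions, and everything else a file system offers are further layers of the same kind, which is why so many different designs can sit on the same hardware interface.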
Here we have this very low level capability of reading and writing 512 byte chunks. And that's created this ability for people to write all these different types of file systems, and this list is out of date. And still you have new file system designs that are shipping and new people who are still working on this problem. There are people that started working on file systems in the 1960s and are still working on file systems today. And still solving cool problems. There are file systems that are still shipping with really cool features, where you're like, ooh, I wish my file system could do that. So this story is not, in any sense, over. And again, this is why a lot of systems people, not necessarily me, but a lot of them, have a soft spot for file systems, because this was a big, hot area in systems research for a long time. And I would argue that's pretty much over, but it persisted for a long time. There are some really nice things to talk about. So just as a coda, and I want to come back this year to talk about Flash a little bit before we move on as part of this unit, but Flash, right? So you might think Flash is going to solve all these problems, like it doesn't move, it doesn't break. Yeah, so except that Flash has some issues, right? So most commercially available Flash chips have this problem where, in order to write a byte of data, I have to first erase this huge piece, like 32K or something, right? So that's interesting. That makes things a little more interesting as far as how it's handled in the system. And here's a fun fact about Flash. Flash wears out. Have you guys ever found this out, or had a smartphone that stopped working after a couple of years? For real, right? I mean, Flash has a limited number of cycles that it can be erased and rewritten. And once you exceed that number, it's done. And so this has created all these interesting opportunities for what they call wear leveling.
So the Flash drives that you use have software on them that tries to make sure that the entire Flash chip fails at once. So it tries to spread out accesses to the chip in an intelligent way to make sure that when one part of the chip starts to fail, the whole chip is essentially used up. Otherwise, your storage would have this interesting property where it would shrink over time, which most people don't really like. My disk is a little smaller today than it was yesterday. So yeah, the biggest problem with memory, of course, is that it's volatile. So when I power the machine down, the contents are lost. The other problem is that it's expensive compared with these. If you look at the price per byte of memory, it's higher than the other two alternatives. But you're right, that's not stopping some people. So there's a project called RAMCloud that you can go look up where people have designed a cloud system that has no storage at all. It's just memory. What's that? Well, you'd use it for backup and stuff like that. But during runtime, it doesn't use storage for anything. And again, you understand why someone would want to try to do something so insane. It's because it's super fast. And with a cloud system, you never shut it down anyway, so who cares about the volatility? So next time, we'll talk about file systems. And we'll talk a little bit about scheduling algorithms for hard disk requests.
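Going back to wear leveling for a moment, here is a minimal sketch of the idea: keep a mapping from logical blocks to physical erase blocks, and send each write to the least-worn physical block that is currently free. This is a toy policy under stated assumptions, not how any particular Flash controller works. The `WearLeveler` class is hypothetical, and it assumes there are more physical blocks than logical blocks in use, so a free block always exists.

```python
class WearLeveler:
    """Toy wear leveling: every write lands on the least-erased free block."""

    def __init__(self, num_physical: int) -> None:
        self.erase_counts = [0] * num_physical  # wear per physical block
        self.data = [None] * num_physical       # contents per physical block
        self.mapping = {}                       # logical block -> physical block

    def write(self, logical: int, payload: bytes) -> int:
        in_use = set(self.mapping.values())
        free = [p for p in range(len(self.erase_counts)) if p not in in_use]
        if not free:
            raise RuntimeError("no free physical block (assumption violated)")
        # Pick the free block with the fewest erases so wear spreads evenly.
        target = min(free, key=lambda p: self.erase_counts[p])
        self.erase_counts[target] += 1          # erase before writing
        self.data[target] = payload
        # Remapping releases the previously used block for later reuse.
        self.mapping[logical] = target
        return target

    def read(self, logical: int) -> bytes:
        return self.data[self.mapping[logical]]


flash = WearLeveler(num_physical=4)
for i in range(40):
    flash.write(0, f"version {i}".encode())  # hammer one logical block
print(flash.erase_counts)
print(flash.read(0))
```

Even though only one logical block is ever rewritten here, the erases end up spread across all four physical blocks, so no single block wears out long before the others.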