All right, everybody, good morning. It's Wednesday. I've got a smaller crowd today, but maybe people are still filtering in. So I just want to address a kind of important situation we have going on here, which is the weather in Buffalo this year. I'm new here, but I don't get the sense that this is normal, the warmth outside. So I don't know about you, but I'm struggling with this kind of very premature spring fever, meaning that it feels like May, and I'm kind of ready to be finished with CS421 and off to the summer and stuff like that. But it's not May, it's March, right? And so you guys still have another month to put up with me and a whole other assignment to do. So my suggestion is that we all just imagine that it's snowing outside, that there's like three feet of snow on the ground. It's freezing, and you need to be inside. And the only thing you can imagine doing is working on your assignment two and assignment three solutions, because you're kind of stuck with that for the next six months. So don't lose focus here, and I'll do my best not to either. But the weather is nice. So when you have a chance to take a break, go outside, take a walk. All right, so today we're going to start talking about stable storage. We'll probably talk about stable storage for a week or two. Stable storage ends up looking a lot like stuff that we've already looked at, which is kind of cool, because we get a chance to see... Oh, that was a part of the dog that he doesn't like to be stepped on. You OK? OK. He's like, I'm going to sit on that side of the room now. All right, so stable storage gives us a chance to see a lot of our system design principles come to life and be applied to this interesting problem. Now, it's kind of weird. I mean, and maybe it is just the fact that we don't usually start with file systems.
And maybe by the time we get to file systems and disks in a typical operating systems class, I'm starting to lose focus myself because it's starting to be spring wherever this class is taught in the spring. But I think not. File systems were never my favorite thing. And you guys have noticed that the assignment that we dropped for this class was the file system assignment. And I don't know, in the past I always kind of thought file systems were boring. But as I started to think about this, I thought that, in some ways, the real glory of operating systems research and development is in these file systems. And it's partly because disks are so complex. It's partly because a lot of file systems, in fact almost all file systems, are implemented in software. So it really gives the systems community a chance to shine. So I think there's some cool stuff here. We're going to talk about spinning disks. Spinning disks are a little bit old school. But again, I think the point here is to look at how these systems were designed and try to learn from them and take some of those lessons so we can apply them in other places. Again, I don't think you guys are going to spend very much of your life worrying about the details of spinning disks. But we will talk about them today. All right, so we're still working on looking over the design documents. When we have feedback for you, we'll release it. And assignment two is due next Monday. That's the deadline we have right now. Who knows? Maybe it'll move. It'll definitely move later if it moves. But for now, consider it to be Monday. And use that as some incentive to actually work on it, which will be required. Working on it will be required regardless of when the deadline actually ends up being. So the sooner you get started, the sooner you realize that you should have started sooner. OK, so let's talk a little bit about some basics here.
Because I think now in our modern world, it's important to be a little more specific about what we mean when we talk about disks. Because there are two kinds of disks and they're very, very different. So in general, when we talk about stable storage, we're talking about a component of the system that stores data persistently or in a stable way, meaning that when the machine is powered off, the data is preserved. Maybe this is kind of obvious, but when we talk about stable storage, that's what we mean: any storage on your system that does not lose its contents when the computer is shut down. Now today, again, we have two main categories of stable storage. And I just want to introduce some terminology that I'll use when I talk about them. So when we talk about HDDs, hard disk drives, we're talking about spinning disks. And we're going to talk today about the parts of the spinning disk. We'll see some videos of spinning disks in action, which are really cool. And it'll give you a little bit of a primer on how these devices work, which is kind of fun. You can't really do that with the CPU and memory. It's not as exciting to be like, here's a YouTube video of a RAM chip in use. It's just kind of boring. Maybe they should make them that way, like they light up or something, some sort of visual cues about what's happening. But disks do things, and so they're fun to watch. So when I talk about HDDs, spinning disks, or hard drives, I'm talking in general about a stable storage device that's constructed from rotating magnetic platters. And again, we'll see plenty of this today. If I talk about an SSD or a flash drive, I'm talking about a stable storage device that's constructed from non-volatile memory. So we're talking about something that's kind of like RAM, usually slower, but non-volatile. It doesn't lose its contents when it's powered down.
And these flash drives, SSDs, are really, in many ways, revolutionizing how storage works on every class of device, from tiny little phones and MP3 players all the way up to big servers. Everybody is trying to figure out, what does flash mean and how do I use it? And how much is it going to cost? So we'll come back to that in a little bit. Yeah, Carl? Yeah, with 64-bit chips, you're putting it into the memory address space of the CPU? Sure, so yeah, that's a great question. So last year, I was at a workshop where people talk about new trends in operating systems. And there is at least this pie-in-the-sky idea that someday we will have something that looks like RAM in terms of being as fast as RAM and addressed like RAM, but that will actually be persistent, like flash, or like stable storage. And that starts to blow your mind a little bit when you think about what the implications of that would be, especially when you start to think about boot. Because on boot, your system is essentially taking information from the hard drive and using it. You could think about one of the things that happens during boot, or maybe the only thing that happens during boot, as the system initializing the volatile components of the system, and particularly RAM. I'm taking some information that I stored on disk, and I'm using it to set up the RAM in a way so that I can operate the system. Well, if I don't have to do that anymore, if nothing that I store in RAM ever goes away, then there's a lot of cool things that can happen. There was a group that thought about this and came up with some really, really fascinating ideas about the types of things that would be possible. And if you want a pointer to that paper, I'll send it to you. It's actually a really, really neat thing to look at. Maybe we will look at it, because it's fun. It's five pages, it's short. All right, so I just looked up this number just 30 seconds ago.
And this is completely unscientific. I just looked up a drive on Newegg, and then I found something from Gartner about projections for SSD prices. But at least right now, there's basically an order of magnitude difference between the price, well, I don't know what GDs are, but the price per unit of storage for SSDs and HDDs. So you can get a 1 gigabyte hard disk drive for 100 bucks. Gartner is forecasting that by 2012, which is now, SSDs should be down to $1 per gigabyte. So you, sorry, OK, hold on. Hard drives just got really expensive for a minute. You can get a 1 terabyte hard disk drive for 100 bucks. Better. And it would cost you about $1,000, based on Gartner's projection, to buy a 1 terabyte SSD. So we're still talking about an order of magnitude difference. Now it's not clear that this really even matters, because one of the things we'll come back to at the end of class is the fact that one of the interesting things that happened over a period of time was that hard disks got really big. And it's not clear they really needed to be that big. And maybe what will happen, especially now that we have all this really cheap, essentially free storage in the cloud, is that we'll see an interesting role reversal where consumer devices like smartphones and laptops and tablets will actually start to have less storage. Because there was a day when you needed every photo that you ever took of your children on your own hard drive. And so you needed this 3 terabyte drive to hold it all. Well, 10 years from now, it may cost $0.10 a month to rent 3 terabytes from Amazon or something. And so you can get away with having a 100 gigabyte flash drive that by that point might cost $50. So we'll see what happens. It's interesting how this is going to play out. I mean, the new lines of laptops frequently are shipping with smaller SSDs. And personally, I don't care. It doesn't bother me. I love this machine.
It's fast as hell and has a small disk drive, but I don't notice. So I don't know, maybe I need to take more pictures. And finally, when we're having discussions, particularly when it comes to file systems and the details of disks, generally we're going to talk, at least for the first couple of days, maybe the first week, about HDDs. Because this is the device that the systems community and the file systems community have spent 20, 30, 40 years figuring out how to use, and they've done a lot of really, really clever things with it. So we're going to study some of that because it's interesting. And this is a mature design solution. I'll get to you guys in a sec. SSDs are newer, and so we'll talk about them last. But if I'm not being clear, please ask. John? Oh, yeah. Right, exactly. So if you think about it, I mean, in the past, we had this hierarchy. And I wish I had a slide for this, but I always like to think about it. You have registers. Do they have one of these around here where they put the planets, and then they put the actual relative distance between the planets? They have something like that in Boston where the sun is downtown, and then Pluto is way out in the suburbs, close to where the Boston Marathon starts or something. And then the other planets are at the relative distances where they'd be. So you could do something like that for the speed of storage on your computer, right? So you've got some tiny, tiny number of registers, and those things are blazingly fast. And then you go down. You have the L1 cache, a little bit bigger, a little bit slower. The L2 cache, a little bit bigger, a little bit slower. Maybe an L3 cache, a little bit bigger, a little bit slower. Memory, a lot bigger, actually a fair amount slower, right? Maybe the next thing now we hit is flash, right? Bigger, slower. Spinning disks, even bigger, slower, right? So you can think of, who is it? Ted Stevens once referred to the internet as a series of tubes, right?
You could think about computer systems on some level as a series of caches, right? Data is what we're doing. We're manipulating data. The CPU is doing the processing, but everything else on the system is really just involved in storing data in one way or another, and they're all trying to make each other look either bigger or faster, right? And if you do it carefully, you can do great things. Definitely there's a lot of work on integrating flash into the existing storage hierarchy, which really developed without that concept. Yeah, Malik, you had a question too? Oh yeah, yeah, absolutely. And a lot of what we'll talk about in this class for the next week is all these tricks that people came up with over years and years and years to figure out how not to do random I/O on spinning disks, because random I/O on spinning disks is terrible, right? Oh, I can't wait to show you guys this video. It's so cool. Maybe you guys have already seen all this stuff, but I'm still kind of new to the interweb, so I found some cool things out there. Like watching disks in action, right? Okay, so anyway, so this is our terminology, right? And then again, I mean, I've hinted at this a little bit, but why should we study spinning disks? I mean, people might say, you know, flash is the future, right? Disks are obsolete. There are actually some people out at Stanford who have either spent too much time in the sun or are smarter than all of us and think that, and I think they're right eventually. But does anyone ever use a tape drive? How many people in here have used a tape drive, all right? So a small, why? Anyway, what's that? Okay, well, okay, anyway, we need to upgrade, what's that? Right, so, but at some level, tape drives used to be really common, right? And tape drives are pretty much, unless you took 10 Smith's class or are running a technology museum, pretty dead, right? They're gone, okay?
They're not really something we think about as part of modern computer systems. There are people who think that disks, period, are gonna go that way, right? And eventually we will have systems, especially server systems, that are RAM only, right? So there's a whole group out of Stanford that's designing this thing called RAMCloud, which is a whole server architecture where there is no disk, there's no stable storage, period. And if you think about it, Google has servers that are online all the time, they never shut down. Why do they need a disk? The only point of having a disk on some level is so that data survives when the machine shuts down. If the machine never turns off, or if the machine only turns off when it finally dies and gives out and you have to kick it out of your data center, who cares? So anyway, there was some thought that not only is flash not the future, but disks aren't even part of the future, period. Hey, can we have one conversation, guys? Thanks. All right. But again, at least for the time being, we're living with these hard disk drives, right? They're still out there, there's still a lot of devices, and so you can still find videos of them on YouTube, right? You know, and when we start talking about hierarchical file systems, there's another camp of people, including the woman who taught me this class the first time, that have started to propose that hierarchical file systems are dead, they're gone, they're finished, right? Who cares? There's nothing to see here anymore. Why, right? What's the dominant paradigm by which people access information now? It's right up on the slide: search. Do any of you guys remember Yahoo? Yahoo still exists, right? So you might remember it because you used it yesterday. But in the early days of the internet, Yahoo had this very hierarchical way of finding information, right?
Like, you know, if you wanted to look up the score of a basketball game, you clicked on sports and then you clicked on basketball and then U.S. basketball and then scores, and then you found what you wanted. And it was in some way this file system hierarchy type of approach. Now what do you do? You go to Google and you Google basketball scores, right? And the first thing that pops up is, you know, a live score for the game that you were interested in, right? And you guys have probably seen this in various magazines, smart people asking whether search is making us stupid, and, you know, people like you, young people, don't understand the world anymore because all you do is look up things on Google and you don't have any deep understanding of anything, blah, blah, blah, right? And this kind of worrying about youth has been going on, I think, as long as the world has existed, so, you know, don't take it personally. But anyway, there's some sense that maybe search is the dominant paradigm. So how does search play out on your own local system? How many people have a Mac? Do you guys use Spotlight? Yeah, I mean, that's it, right? You know, people that are weird like me actually still organize their stuff into directories, right? With directory hierarchies. But I think a lot of people don't bother with that anymore, right? It's in there somewhere, right? And the way I'll find it is I'll look it up on Spotlight. It's very similar to the way people find things on the internet. And so maybe the whole idea of needing to support hierarchical file systems is kind of over, except for the fact that things like Spotlight are still implemented on top of hierarchical file systems today, right? So we're not quite done with these sorts of systems, right?
And finally, again, whenever a new technology comes along, there's always the sense that, oh, the world will change in a fundamental way, nothing will ever be the same, you know, this brave new world that we're entering into without spinning disks. And the reality is that, you know, the new king looks a lot like the old king, right? A lot of the new solutions are, or should be, inspired by, driven by, or at least cognizant of earlier efforts. When there are actual changes, you want to make different design decisions, but there's no need to try to pretend that the past never happened, right? So when we talk about spinning disks and when we talk about hierarchical file systems for the next two weeks, again, I want you guys to focus on the design principles that are at work here, how they play out, and also, again, really, you know, have some admiration for the hard work and effort and elegant design that the systems community put into this problem for many, many years, despite the fact that, you know, maybe it doesn't matter anymore, right? Who knows? I'm not gonna take a position on that question, but you guys may feel that way. And we won't spend a huge amount of time talking about this, all right? All right, so with that said, let me introduce you to the parts of a disk. How many people have ever pulled apart a hard drive before? You know, just to see what was going on in there, right? How many people posted a video on YouTube of yourself doing it? Not me. Thank God people didn't. Okay, so disks have these pieces, right? So if you pulled apart a disk, and we're gonna watch the video in a sec, you'll see that there's a set of platters, right? The platter is the actual circular spinning piece of the disk, which is usually a rigid, non-magnetic material that's coated with a very, very, very thin layer of magnetic material. I think we're talking like 10 to 20 nanometers, right?
So a really, really thin magnetic coating on top of this rigid non-magnetic platter, right? And this is where we actually write and read data, by changing the magnetization of tiny little bits of the material, right? I'm not a materials scientist, I'm just kind of faking this based on things I read on Wikipedia, right? And the other thing to keep in mind is that platters can have data on both sides, right? So one easy way to increase the capacity of the disk is just to use both sides of the platter to store data, right? All right. So the spindle is just the drive shaft, right? That's just the thing that runs down through the center of the platters. It's hooked to a motor, right? And spun at, you know, differing RPM. You might think, hey, you know, 15,000 RPM, I mean, we can make it go faster than that, right? I'll have, you know, 100,000 RPM, right? The disk will be really fast. But the problem, as we'll see, is that on some level, rotational latency isn't necessarily what's killing you when it comes to disk access times, right? What is killing you are these things, right? So how do you actually write and read data on the platters? You have this thing called the disk head, right? The disk head is an actuator and a sensor, right? It's capable of either changing the magnetization of those little pieces of the disk or detecting and reporting the magnetization bit by bit, right? And the thing to keep in mind is that these heads, I've always heard the term float, but they don't really float, right? Nothing floats, right? Gravity exists. But they cruise, you know, maybe a couple of nanometers or 10 nanometers right over the disk surface, right? So they're almost on top of it, right? They're not touching it, but they're very, very close, okay? So here's this beautiful diagram that I did not draw, but that kind of shows you what's going on here, right?
So you see that this is, I guess, supposed to be a set of platters, but here's a platter. This is the spindle going down through the drive. It looks like on this drive, you know, the drive mechanism is underneath. And then the head is mounted on this arm, right? And this arm controls the head position from, you know, all the way out at the perimeter to the inside, right? This is the surface over which data is read and written, right? And by moving the heads back and forth on this actuator arm, I can write and read data at any point, right? Normally, disks have multiple heads, right? Multiple heads per platter, and then heads on every platter, right? So what you can't see here is that if I have multiple platters here, there's heads here, and this actuator arm, you know, it's like a comb, it goes in between all the platters, right? So I can read and write at the same point, but at any level up and down along the disk, based on where the heads are located, okay? Questions about parts of the disk, all right? Okay, so disk locations are a little bit more difficult to talk about in English, but I'll try and then we'll have a diagram. So tracks, right? How many people have run around a running track, right? So think about a lane on a running track, right? That's the idea of a track here. It's a circular slice of the disk: if I left the head in one position and allowed the disk to spin, that's the orbit that the head would see, you know, as the disk rotates. Okay? A sector on the disk resembles a slice of pie that's cut out of a single platter, if you're developing your mental imagery here. And a cylinder on the disk is, imagine if I take my stack of platters and I took, you know, a cup and I just intersected it like this. So a cylinder is actually this vertical cut, a set of tracks that are all vertically aligned on the disk, right? And again, this is difficult to visualize. So let's look at a diagram.
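Alongside the diagram, one way to pin those three terms down is to treat them as coordinates. This is only a minimal sketch, not how any particular drive's firmware works: the `Geometry` class, the function names, and the 1,000 x 4 x 63 geometry are made up for illustration, and it assumes the classic cylinder-head-sector convention where every track holds the same number of sectors and sectors within a track are numbered from 1. Here a "sector" means the piece of one track that falls inside one pie slice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    """Hypothetical fixed disk geometry (an assumption for illustration)."""
    cylinders: int          # number of head positions along the arm's sweep
    heads: int              # one head per platter surface
    sectors_per_track: int  # pie slices per track


def chs_to_lba(g: Geometry, cylinder: int, head: int, sector: int) -> int:
    """Map (cylinder, head, sector) to a linear block number.

    Blocks are numbered so that everything in one cylinder comes before
    anything in the next cylinder, i.e. before the arm has to move.
    """
    if not (0 <= cylinder < g.cylinders
            and 0 <= head < g.heads
            and 1 <= sector <= g.sectors_per_track):
        raise ValueError("address outside disk geometry")
    return (cylinder * g.heads + head) * g.sectors_per_track + (sector - 1)


def lba_to_chs(g: Geometry, lba: int) -> tuple[int, int, int]:
    """Inverse mapping: linear block number back to (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, g.heads * g.sectors_per_track)
    head, sector_index = divmod(rest, g.sectors_per_track)
    return cylinder, head, sector_index + 1


if __name__ == "__main__":
    g = Geometry(cylinders=1000, heads=4, sectors_per_track=63)
    blocks_per_cylinder = g.heads * g.sectors_per_track
    print("blocks reachable without moving the arm:", blocks_per_cylinder)
    print("first block of cylinder 1:", chs_to_lba(g, 1, 0, 1))
```

The point of the numbering is that consecutive block numbers stay inside one cylinder for as long as possible, so a run of neighboring blocks can be read by switching heads rather than by moving the arm.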
So here's a platter viewed from the top down. Here's one track, right, in yellow. The cylinder that is associated with that track goes down through multiple platters, right? And then the sector here is delineated by this particular pie-shaped piece, right? So it's a radial cut of the disk. Now, why would I care about a cylinder? Why would I even talk about a cylinder? What does it matter, right? Ben? So that's absolutely correct, but why, right? Because the head doesn't have to move. And let's just make it as obvious as possible, right? Because we all just started thinking about this stuff. So the head doesn't have to move why? Because the head is where on each platter? It's in the same position. So if I have a head that's positioned over this track on the top platter, the cylinder is composed of all the other tracks that I can read on the disk without moving the head, right? So as Ben said, and we'll see in a minute, what creates latency with disks is the fact that I actually have to move these heads around, right? And that's why you can hear a disk operating, right? Like you hear those clicks, you know, that's actually the heads moving around on the disk, right, seeking to different locations. And if I don't have to do that, things are really fast. And so if I write files on the set of tracks that can be accessed without moving the head, then I can get all that data, sometimes in one pass, because frequently disks can read from all the heads simultaneously, right? So if I have a file, or some group of data, that's located on a single cylinder, I can read that all in one rotation of the disk, potentially, right? All right, so let's look at this cool video. This is pretty awesome, right? Where is my mouse? Okay, here we go, right?
So what you're gonna see here, so this is a disk viewed from above, this is the spindle, this is one platter, this is the actuator arm, and there's, you know, probably a series of heads, maybe even below here, but definitely right there. And what you're gonna see is this actuator arm moving along the disk as things are happening, and this is kind of labeled with what this particular person was doing as they took this video. All right, so you're gonna see the platters start to spin. So now you can see the head traversing across the disk, right, now it's gonna do a couple of things. So deleting a folder, right, so look at this. Pretty cool, right? I have no idea why this would take so many seeks on the disk, this is probably Windows, right? Windows file system is messed up, right? It should be a constant time operation, but not on Windows, Windows has to do something else. So here's a copy paste. Now here's what's interesting here, right? So let me pause this for a sec. All right, so can you guys see what's happening? The head is actually essentially jumping between two different locations on the disk. So what do you think those two locations are? This is a copy paste. It's a source and destination, right? So either the source or destination file is closer to here on the platter, and the destination or source file is over here, right? And so the kernel is grabbing a bit of information, it's storing it in memory, and then it's blasting it out to another location, right? So this is kind of neat to be able to watch. All right, so now we're gonna format the disk. All right, now you also notice this is the first time that the head has actually moved that far in toward the spindle, right? So all the other accesses that we've seen have been out here at the perimeter of the disk. And there's a reason for this. All right, any questions about this? I think this is pretty cool.
I can send out the link to this video. It's kind of fun. And Joshua, Marius, or whoever it is. So here's a question, right? Based on what we've just discussed, and we can watch other videos if we want to. So you notice that the heads seem to spend most of their time at the outer perimeter of the disk. Can anybody guess why that is? Why would I want to read or write files towards the outer edge of the disk? Why is it faster? So you guys are absolutely right. And the reason for this is that there's some density of magnetization, there's some density of bits that I can write to the disk, okay? And the disk is spinning at a constant RPM. So at the outer edge of the disk, a track is longer, so there is more data that I can write on a single track, and therefore more data passes under the head in a single rotation. So the bandwidth of the disk is actually considerably greater at the outside edge. This is the sort of wiggy stuff that makes file system design so much fun, right? Like where are the heads? Where do you put files? How can you put files close to them? It's just, you know, again, this problem is a total mess, and people had fun solving it for decades, right? But yeah, so this is this kind of thing. Okay, so I think it's important, when we're talking about spinning disks, to identify some of the things that make them different from the other parts of the system that we've discussed. And there are three differences. I think the first two are more significant, the last one I kind of threw in there at the end, and we're not really gonna talk about it, but I think it's worth pointing out, right? So what's the difference in kind, right? Why are disks just completely fundamentally different than memory and CPUs? What do disks do, spinning disks do, that nothing that we've talked about yet does? They move, right? They move, it's the only part of the system that moves. And now you have these solid state machines and nothing moves, right?
Which is kind of nice, but in the past, the disks were the only thing that moved. And moving creates a lot of consequences, right? Moving, to some degree, creates some of our other differences, but moving also means that, like other physical kinds of objects, disks wear out, they break, et cetera, et cetera. What's the difference in degree, right? Disks compared with memory and the CPU tend to be really what? Slow, right? Disks are really slow. And there's other slowness in the system, but the disk slowness is a source of slowness that operating systems have been battling consciously, right? Other parts of the system try to hide the slowness from the operating system in certain ways. And a lot of that's done in hardware, but for disks, the operating system is really in charge of trying to work around the fact that disks can be slow, right? And what are disks? I mean, memory and CPU are kind of like fundamental parts of the computer. You really can't have a computer without them. Disks, on the other hand, are usually thought of as what? They're devices, right? It's a device. And on some level, disks present an interface, right? And a disk interface, we'll talk about in a second, but it's usually a very low level, block level interface. And the operating system builds this whole other abstraction on top of it by implementing file systems. And so on some level, as long as the file system presents something that looks like a file and interacts with the disk in terms of blocks, there's a huge amount that you can do, right? And that's why, you know, I mean, that's why there are different file systems, right? Because you can implement a lot of different ways of storing data on the same set of physical devices, right? Because the physical devices all provide pretty similar low level interfaces, right?
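To make that layering concrete, here is a minimal sketch of a byte-addressable "file" sitting on top of a device that only understands whole blocks. Everything in it is invented for illustration: `BlockDevice`, `TinyFile`, the 512-byte block size, and the in-memory list standing in for the disk. A real file system also has to decide which blocks a file gets, remember that decision on disk, and survive crashes, none of which is shown.

```python
BLOCK_SIZE = 512  # assumed block size, for illustration only


class BlockDevice:
    """Stand-in for a disk: it can only read or write one whole block."""

    def __init__(self, num_blocks: int) -> None:
        self._blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, n: int) -> bytes:
        return self._blocks[n]

    def write_block(self, n: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block device only accepts whole blocks")
        self._blocks[n] = bytes(data)


class TinyFile:
    """A file is just an ordered list of block numbers plus a length."""

    def __init__(self, dev: BlockDevice, block_numbers: list[int]) -> None:
        self.dev = dev
        self.blocks = list(block_numbers)
        self.size = 0

    def write(self, offset: int, data: bytes) -> None:
        end = offset + len(data)
        if end > len(self.blocks) * BLOCK_SIZE:
            raise ValueError("write past the blocks allocated to this file")
        pos = offset
        while pos < end:
            index, within = divmod(pos, BLOCK_SIZE)
            n = min(BLOCK_SIZE - within, end - pos)
            # Read-modify-write: a partial update still costs a whole block.
            block = bytearray(self.dev.read_block(self.blocks[index]))
            block[within:within + n] = data[pos - offset:pos - offset + n]
            self.dev.write_block(self.blocks[index], bytes(block))
            pos += n
        self.size = max(self.size, end)

    def read(self, offset: int, length: int) -> bytes:
        end = min(offset + length, self.size)
        out = bytearray()
        pos = offset
        while pos < end:
            index, within = divmod(pos, BLOCK_SIZE)
            n = min(BLOCK_SIZE - within, end - pos)
            out += self.dev.read_block(self.blocks[index])[within:within + n]
            pos += n
        return bytes(out)


if __name__ == "__main__":
    dev = BlockDevice(num_blocks=8)
    f = TinyFile(dev, block_numbers=[5, 2])  # blocks need not be adjacent
    f.write(500, b"hello, blocks!")          # this write straddles two blocks
    print(f.read(500, 14))
```

Notice that the file's blocks are 5 and then 2: nothing in the block interface forces a file to be laid out in order, which is exactly the freedom different file systems use differently.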
There's a huge amount for software to do to take the disk block interface and use it to make something that looks like a file, okay? All right, so the first thing, disks move, right? So what we're doing is we're really introducing a new time scale into our system, right? Electronics time scale is fast, right? It's kind of literally, how long does it take, you know, an electron or a stream of electrons to move from one part of the computer chip to another? And hardware designers actually spend a lot of time thinking about this, right? To some degree, one of the challenges of multi-core computing is that the cores are farther away from each other, right? And by farther away, we mean like a couple more millimeters, right? But actually that produces noticeable latencies and creates difficulties in supporting these types of systems, right? When you're a hardware geek and you're thinking, I don't know what they talk about, like nanoseconds, picoseconds, whatever, that actually matters, right? The amount of time it takes an electron to go, you know, a millimeter through copper or whatever it's passing through actually starts to matter and impacts how we do design, right? But mechanics time scale is totally different, right? Now we're talking about the time necessary to actually move the head from one position to another, and that just has a whole different set of measurements. You know, these two things are not really even comparable. It's very slow, all right? So this is kind of an aside, but I just thought it was interesting. And actually as I was thinking about lecture today, I realized that there is actually an application of this to disk design, but I wanted to just point out that in certain cases, mechanical things are also capable of moving very fast, right? So does anyone know what a table saw is?
A table saw is a device, it's like this side of the room knows things today and this side of the room either doesn't want to raise their hand or doesn't know things, all right? Okay, so you guys, find out what a table saw is. A table saw is a device that's used to cut wood. As you can see here, it's got a blade that's emerging from a surface, and one of the things that table saws are notorious for doing, luckily not in large numbers, is removing fingers from people who use them, right? Because, you know, you're sitting there, you've got your hand on the wood, you're feeding the wood across the table saw and oops, you weren't paying that much attention and, you know, you did a really nice cut, smooth cut through the board and your pinky finger is on the other side of that, right? And so this guy, this very, very clever person, designed this safe table saw, right? Now I'm just gonna let him explain it to you, because I think this is pretty cool. And there is an application to disk design. The safest table saw I've ever built, I think of it like seatbelts or an airbag for table saws. The mechanism is very sophisticated, but the technology behind it's actually quite simple. The blade carries a small electrical charge. This charge is continuously monitored by a digital signal processor. When contact is made, the human body absorbs some of the charge, causing the voltage to drop. The drop in voltage triggers a quick release of an aluminum brake. A heavy duty spring forces the brake into the teeth of the spinning blade. The teeth dig into the aluminum, stopping the blade cold. The blade's momentum forces it to retract below the table and the motor is automatically shut off. Is that awesome? So look at the hot dog. Just a nick. Anyway, it's pretty neat, right? And it does have something to do with file systems, which maybe, if I don't run out of time today, we'll talk about. It's also just cool. Okay, so table saw.
All right, so the other consequence of disks moving is that they fail, right? And disks can fail in a variety of ways. So the interesting thing about disk failures is I don't know what it's due to. Maybe just manufacturing defects or the fact that these things get bumped around as they're transported. But I shouldn't say many disks. Every disk that you buy has some degree of built-in sector failure, right? Hey guys, it's been like 15 minutes. It's kind of getting annoying. So every disk that you buy comes with defects, right? So at some level, some number of sectors on the disk ship to you broken. And what happens is, at the factory or online, the disk itself can detect that these sectors don't work and the disk essentially just hides those sectors from you. So if you get a bunch of disks, they probably each have a slightly different capacity, right? Because there's some small number of sectors that are broken, the disk is just pretending those sectors don't exist and your system just works fine. The operating system never sees those sectors, right? And a lot of times the disk will actually remap those sectors so that the sectors look contiguous. It's kind of an interesting consequence for file system design, because the sectors are usually numbered contiguously, and most of those numbers go around the disk in a certain pattern, but some of them point off to somewhere else, right? Because the disk has remapped a sector that was broken, right? And then over time sectors can also fail. And depending on your file system, of course this can result in data loss, right? If you have a sector on your disk that happens to fail and it holds some really important piece of file system information, then the whole file system might be toast, right? And file systems have learned to not be this brittle, but if it's holding data, I mean, at some point the data might be gone, right?
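The remapping idea described above can be sketched in a few lines. This is only an illustration, not how any particular drive firmware works: the `RemappingDisk` class, the pool of spare sectors at the end of the disk, and the sector counts are all assumptions made up for the example.

```python
class RemappingDisk:
    """Toy model of a drive that hides bad sectors behind a remap table.

    Hypothetical: real firmware is more involved than this.
    """

    def __init__(self, physical_sectors, spare_sectors, factory_bad=()):
        # Assumption: the last `spare_sectors` physical sectors are held
        # in reserve and never shown to the operating system.
        self.visible = physical_sectors - spare_sectors
        self.spares = list(range(self.visible, physical_sectors))
        self.remap = {}  # logical sector number -> spare physical sector
        for bad in factory_bad:
            self.mark_bad(bad)

    def mark_bad(self, logical):
        """Redirect a broken sector to a spare; fail if none are left."""
        if not self.spares:
            raise IOError("no spare sectors left")
        self.remap[logical] = self.spares.pop(0)

    def physical(self, logical):
        """Translate the number the OS uses into a physical location."""
        if not 0 <= logical < self.visible:
            raise IndexError("sector out of range")
        return self.remap.get(logical, logical)


disk = RemappingDisk(physical_sectors=1000, spare_sectors=10, factory_bad=[5])
print([disk.physical(n) for n in range(4, 8)])  # [4, 990, 6, 7]
```

The operating system still sees sectors 4, 5, 6, 7 as neighbors, but sector 5 actually lives far away on the disk, which is exactly the "points off to somewhere else" effect.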
Now, sector failures like this don't normally happen until disks get pretty old, right? But these are all good reasons to take backups, right? So maybe this slide should be inspiring you guys to take backups, right? Disks also fail catastrophically, right? So how many people have ever had a hard drive just die? How many people have ever had a hard drive die and you knew what you did, right? Okay, yeah. So I had a laptop when I was in graduate school that went through six hard drives in about six years. And most of those failures were caused by me just dropping it or bumping it or whatever. And what this causes, so remember, those heads, when the disk is spinning, are hovering right above the disk surface. And so if you send an unexpected jolt into the casing, that can cause those heads to essentially dive into the platter, right? And as soon as they start making contact with the platter, what they do, given the rotational speed of the disk, is they just start tearing off material, right? And as the disk continues to spin, the head, if it's moving laterally, can tear off a whole swath of the disk. And I always remember, I would drop it and I would pick it up and it would be working and it would be working and then like 20 seconds would go by and it would just freeze, right? That was it, right? So those are your 20 seconds. If there's a file that you really care about, you have 20 seconds to say goodbye. I don't know what you can do in 20 seconds on a system that doesn't work, but maybe just, I don't know, say you were a good computer and I'll see you on the other side or something. But anyway, here's a pretty cool demonstration of what a disk looks like after a head crash. So the guy's gonna explain it to you, but see this, this is not normal, right? This is not a stripe on the platter, disks don't come with racing stripes, right? That is missing material that is caused by a head crash.
This is just a short video, if it's gonna load. I'm gonna go look at it, cause it is. Classic example of why you should always make backups. Yeah, so that's kind of gross, and I mean, to some degree, file systems are built to be able to recover. And you know, if this happens, you go see like a licensed professional, maybe he can actually do something for you. I had a drive crash in college once that I had all these MP3s on. It was this collection that I had amassed over years and years and I was pretty sad. So I got some software to recover the data and the software claimed that it recovered something like 98% of the files and that was fine. But then I started to listen to the MP3s, right? And the other 2% of the files that it couldn't find, they were still there. And where they were is that they were little snippets of other songs embedded all over the rest of the songs. So you'd be listening to, you know, whatever, and there'd just be like 2 seconds of another song and then it would go back, right? Cause, I mean, think about it. It's clear what happened. It just lost track of those files and the data was still there, but it kind of assumed that those blocks were part of another file. And so it would read right through and it would just get like a little snippet. So anyway, I ended up trashing the whole thing. It was just too weird. But anyway, there are nice ways to work around hard drive failures, right? And we'll spend one lecture talking about RAID, which I think is a neat idea. It's a very, very clever application of redundancy, which is another systems design technique that we haven't talked that much about, but we will get to this, right? So finally, disks are slow, right? And as I said, the operating system spends a lot of time working to hide this from the rest of the system, right?
And what we're gonna see when we start to look at file system design and different ways of making disks look faster is just, again, I mean, these are our design principles and they apply to this problem as well, right? Using the past to predict the future, figuring out ways to anticipate the behavior on the disk so that I can get out ahead of it and get data in before it's actually accessed. Using a cache, right? Every operating system has a file system cache, right? It caches hot blocks in memory, right? Now, you can imagine that creates some issues with failure and we'll talk a little bit about what those are. And finally, procrastination, again, not having to do things right away, sort of lazy access, which is another sort of design technique, right? And I just wanna point this out again. So here, the system does a lot of work, right? So memory latencies are also a really important part of trying to make systems fast, right? Because to the processor, memory takes a long time to access, right? But what do processors do? I mean, you guys have all, I think, at some level taken a computer architecture class or computer system organization or whatever it's called, right? So what do processors do to try to hide this? Anybody know? So when I issue an instruction that accesses memory, it stalls the processor for some large, maybe not large, maybe 10, 20 cycles or something, right? While the request is being sent out to the memory bus and the memory bus is locating the byte and sending it back. What's that? So it's kind of a form of a cache, but what modern processors do is they do out-of-order execution. So they execute a bunch of instructions, but if one stalls, they try to just keep going, right? And they have all these clever tricks that they play to try to hide the latency of that instruction. Now the thing is that the operating system knows nothing about this, right?
This is all being done in hardware, mainly because the memory stalls we're talking about are slow to the processor, they're not really slow to the system, right? So that's all happening transparently, and if you're interested in that sort of thing, go study hardware, it's still very much an open problem. But here the operating system software is involved, right? This is really something that we do in software and there's not as much hardware support, all right? So the last thing we'll cover today is the process of reading and writing data from a disk, right, what actually happens? So what do you guys think? There's a byte of data I want on the disk, what's the first thing I have to do to access that byte? Let's say I'm gonna perform a read, what is step number one? Anybody, what's that? Find it. Well, let's say I know where it is, so let's say I know the block on the disk I want to read. The file system knows it, and so I have this block I wanna read. What's the first thing I have to do? What's that? Okay, we'll move the head, but how does the disk know to move the head? What's that? Okay, I've got a block, now what do I do? Maybe this is too obvious: I have to tell the disk what to do, right? I issue a command to the disk. It's the first thing the operating system software does, the device driver, right? I tell it I want a block from this disk and it actually has to tell the disk. Now remember, the disk is a device, so the disk is connected over some sort of disk interconnect like IDE or SATA or whatever, and that command has to actually be transmitted to the disk. The disk has to sort of understand it and then choose a head, right? So that's something that the disk is also gonna do. Depending on where the heads are on the disk and which platter the data is on, the drive's gonna select which head it's gonna use, right?
But the first thing is, you know, issuing the command, actually telling the device what to do, right? Now, somebody said something about moving, what was that? Right, so I have to move the head to the appropriate track, okay? This is a problem, okay? The next thing I'm gonna do, and maybe you can combine these two together, is I have to stabilize the heads on the track, right? And keep in mind, these are tiny little tracks, right? I mean, to some degree, one of the ways that we've made disks so big is that we've made tracks really, really narrow, okay? And so getting the head there and finding exactly where the head is to be positioned takes a little bit of time, right? So seek and settle time, okay? Now the head is over the track that I want, now what do I have to do? What's the next thing I have to wait for? What's that? Spinning. I have to wait for the data I want to come under the head, okay? So I have some rotational latency. And then finally, what's the last thing that has to happen? Now the data is under the head, what starts to happen and what needs to finish before the read is completed? Yeah, I actually have to transfer the data back. The data has to be read off the disk and pushed back over the interconnect to the operating system, probably written into memory somewhere, right? Okay, so over time, right? As disks have evolved, okay? Interconnect speeds go up, right? I mean, they've gone up historically, sometimes with leaps, sometimes with bounds. I mean, new systems are now using SATA 6, right? Which is an interconnect that has double the speed of the previous standard, right? What about seek times? How fast do you think seek times are improving? Not much. Seek times are the biggest bane of spinning disk performance, right? Seek times, again, you're moving a physical object, right? There's some limit to how fast we can do this, right?
At least, again, in a controlled manner, right? I mean, I could fire a little gun and send the thing spinning off and whack it into the spindle and destroy the drive, but then I wouldn't get the data. And then rotation speeds. So rotation speeds have gone up historically, but they seem to have kind of plateaued, and part of the reason for that is that they just don't matter that much, right? I mean, the rotational latency is not the largest component of the time it takes, right? The rotational latency is usually dominated by the seek time, right? It usually takes much, much longer to get the heads where I want them than to wait for the data to come under the heads once the heads are in the right place, okay? So some of the file system design that we're going to discuss is motivated by what people refer to as the IO crisis, okay? And the IO crisis is created by two factors, right? One is that hard drive densities, as we mentioned before in this class, have gone up, right? So we've figured out how to pack more and more and more bytes onto a platter and onto a disk, okay? And it's because of this, right? I mean, your system now probably has an order of magnitude more disk space than it did 10 years ago, right? And that system probably had an order of magnitude more than the one 10 years before that, right? So this is a log scale, right? And this only goes to '06, right? But you see, even by '06, I mean, you had, I don't know what these different, okay. So this is the three and a half inch form factor, then two and a half inch, which are laptop drives. And then these are consumer micro drives that you used to find in things like iPods, right? So you can see that with three and a half inch desktop disks, you can buy like three, four terabyte drives, right? I mean, those are big drives and they're actually really cheap.
And the fact that the price per gigabyte has gone down so much encouraged people to start buying these machines with these mega drives in them and store all sorts of crap on them, right? I mean, maybe it's not crap. Valuable family archival information, right? Or an increasingly underutilized collection of full HD movies or whatever it is, okay? But the point is that we gave people all this storage and by and large, as people usually do when you give them more space, they spread out, right? But the problem is, at the same time that capacities were soaring, and at some level the demand on the disk and the number of disk requests were increasing, bandwidth wasn't keeping up, right? So seek times weren't improving fast enough. Interconnect speeds weren't going up fast enough, and so disks were struggling to keep up with this, right? So when I was at Microsoft working on operating system performance, that was what they felt was the big bedeviling factor, and this was in, I don't know, 2002, 2002, no, sorry, I'm not that old, like, no, it was, oh man, I am that old, like 2000, maybe. Back when Windows XP was coming out, they were really bedeviled by what they called big, slow disks, you know? These huge, back then it was probably like 300 gigabyte hard drives that consumers were buying despite the fact that the disk bandwidth wasn't really keeping up, right? Okay, so again, as we're gonna discuss, file systems are the operating system's response to various problems with operating physical disks, right? Physical disks have this very, very low-level interface that's messy and really limited. You read and write, you know, 512-byte blocks. That's the interface that disks give us. And from that, we're gonna build up a whole system of files, right? A whole system of directories, structure to the file system, metadata associated with files, all this stuff. None of this is supported directly by disks. It's all stuff that's done in software.
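To make the gap between the two layers concrete, here is a minimal sketch of a block-only device with a toy file layer on top. The names `BlockDevice` and `TinyFS` are hypothetical and invented for this example, and the file layer keeps its metadata in memory rather than on the disk, which a real file system could not get away with. The only detail taken from the lecture is the 512-byte block.

```python
BLOCK_SIZE = 512  # bytes per block, as mentioned in the lecture


class BlockDevice:
    """All the disk offers: read or write one whole block by number."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, n):
        return self.blocks[n]

    def write_block(self, n, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("must write exactly one block")
        self.blocks[n] = bytes(data)


class TinyFS:
    """Hypothetical file layer: names, sizes, and block lists are software."""

    def __init__(self, device):
        self.device = device
        self.free = list(range(len(device.blocks)))
        self.files = {}  # name -> (size in bytes, list of block numbers)

    def write_file(self, name, data):
        # Simplification: no overwrite, delete, or out-of-space handling.
        blocks = []
        for off in range(0, len(data), BLOCK_SIZE):
            # Pad the final chunk so the device always sees a full block.
            chunk = data[off:off + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            n = self.free.pop(0)
            self.device.write_block(n, chunk)
            blocks.append(n)
        self.files[name] = (len(data), blocks)

    def read_file(self, name):
        size, blocks = self.files[name]
        raw = b"".join(self.device.read_block(n) for n in blocks)
        return raw[:size]  # drop the padding in the last block


fs = TinyFS(BlockDevice(num_blocks=16))
fs.write_file("notes.txt", b"x" * 700)
print(fs.files["notes.txt"])           # (700, [0, 1])
print(len(fs.read_file("notes.txt")))  # 700
```

Everything that makes this feel like a file (the name, the exact length, which blocks belong together) exists only in `TinyFS`. The device never learns any of it.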
And again, the opportunity to do this in software explains the fact that if you have, you know, a Linux system, you can install like eight different file systems on it, right? I mean, it's because all this stuff is just software. So if you wanna write a file system, right? Actually, if you wanted to write something yourself that you could run on your own machine. Hacking the scheduler is hard, right? We talked a little bit about this when we talked about Linux scheduling. Maybe there is now, but for a while, there was not really a pluggable scheduling subsystem in Linux. If you wanna write your own virtual memory manager, good luck, right? That part is really, really tightly integrated with the rest of the system. But if you wanna write a file system, you can do it, right? And modern systems, particularly Unix systems, are built to allow you to extend them and write your own file system. So if you wanna write something for your own system that you can run, you know, MyFS or whatever, you can do this. This is the one area where you can do this. And I think this is kind of why, again, a lot of systems people have a soft spot for file systems, because it's really an area where we can play, okay? Let me briefly cover Flash, and then I'm gonna let you get out of here. So on some level you think Flash, great, no moving parts. You know, I don't have to worry about the fact that the disk is faster on the outside than the inside, that I have to move the heads around, that there are these cylinder groups, and it's all, oh man. And then you start to realize that, well, okay, a lot of Flash drives require that you erase an entire chunk before you can write one byte of it. So essentially, I have to read the whole thing in, modify one byte and write it out again, so that's kind of a pain. And then Flash also has this property where Flash wears out, right?
And certain Flash chips actually wear out kind of fast. And they can wear unevenly, right? And so now you start to think, well, okay, there's some complexity here again, and we'll talk at some point a little bit about supporting file systems on Flash and the different characteristics that Flash drives have. And a little bit toward John's suggestion of, you know, how would you integrate Flash drives into the storage hierarchy that we have today, okay? So on Friday, we're gonna talk a little bit, just a few minutes, about scheduling disk requests and disk request scheduling algorithms. So if I give you a bunch of blocks to get on the disk, how does the disk schedule itself so that it gets those blocks in an efficient way? And then we'll start talking about file systems in terms of what sort of state do file systems store, how are directories created, et cetera, et cetera. So we'll see you on Friday.
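As a recap of the read path from today (issue the command, seek and settle, wait out the rotational latency, transfer the data), here is a rough back-of-the-envelope model. The function name `read_time_ms` and every number plugged into it are illustrative assumptions, not measurements of any real drive. The one modeling choice is that the average rotational wait is taken to be half a revolution.

```python
def read_time_ms(seek_ms, rpm, transfer_mb_per_s, request_kb, command_ms=0.1):
    """Estimate one read: command + seek/settle + rotation + transfer.

    All inputs are assumed values supplied by the caller.
    """
    ms_per_rotation = 60_000 / rpm
    rotational_ms = ms_per_rotation / 2  # on average, wait half a turn
    transfer_ms = request_kb / 1024 / transfer_mb_per_s * 1000
    parts = {
        "command": command_ms,
        "seek_and_settle": seek_ms,
        "rotation": rotational_ms,
        "transfer": transfer_ms,
    }
    parts["total"] = sum(parts.values())
    return parts


# Made-up figures for a hypothetical 7200 RPM desktop drive.
estimate = read_time_ms(seek_ms=9.0, rpm=7200, transfer_mb_per_s=100, request_kb=4)
for name, ms in estimate.items():
    print(f"{name:16s}{ms:8.3f} ms")
```

With these assumed inputs, the seek is the largest term, the rotational wait comes second, and the transfer itself is a small fraction of a millisecond. That is the point made in lecture: getting the head into position dominates, and a faster interconnect barely changes the total for a small read.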