Hello? Down here we're discussing motivation. There's a very fancy-looking theoretical flyer down here about motivation. It has graphs, figures, diagrams, arousal theory; that sounds interesting. If you need some motivation, it's right down here. Free motivation. All right, so today we are launching into the third unit of the class. Is this on? Oh, maybe we're having issues with this again. There we go. Okay, so today we're going to start talking about disks and stable storage. We've talked about the CPU, we've talked about memory, and now we're on to the third big part of the system. Disks are probably the place where I would argue there's been the most radical change within your lifetimes and my lifetime in terms of the storage hierarchy. We have more cores than we used to. Memory technologies are a little bit different, although the fundamental properties of memory are pretty much the same. But with disks, the move that's going on right now from spinning disks to flash really changes a lot about how the operating system interfaces with those devices. We're going to talk about hard disk drives and spinning disks in this class, and that's mainly because there are 40, 50, 60 years of really beautiful, important work on file system design that is useful for you as a software developer to think about and learn from. However, it's worth pointing out that flash drives are making a lot of this work obsolete, or at least allowing us to rethink a lot of it. So let's keep that in mind as we go along. Assignment three is due a week from Friday. There's motivation down here if you need it. Hopefully assignment three is motivation enough. Like I pointed out, the first part of assignment three is not necessarily that difficult. Please don't use that as an excuse to wait to get started. Instead, get it done and then you can move on to the other parts.
So we also opened up the leader boards today so those are now publicly visible. If you, I posted on discourse about this, if you have tried to make a submission visible and it is not visible, it is possible that your partner is the problem. We're going to try to make this a little bit more clear. In order for your submission to show up on the leader board, if you have a perfect score, both you and your partner have to allow it on some level, either anonymously or sort of fully anonymously. So again, Yee Hong is working on making that a little bit more obvious in cases where you don't see your submission and you're wondering why. Okay. Any questions on VM before we go on? This is it. Yeah. Okay. Does anyone remember when I promised that we would have the midterm grades back? Two weeks from last Friday. Yes. So tax day, I guess. Is that true? No, April 8th. So I think it's the same day as the assignment three. What's that? I don't know about you. I have to pay taxes. Yeah, if it's not tax day, it's April 8th. That's our target for the midterm grading. Ali, are we on target for the midterm grading? Ali says we are. Okay, great. Yeah. So that's when we'll have the midterms back. We return, we give you guys your midterm back and you hopefully will come and get it and take it away forever. We're going to scan all the midterms before we return them. So we already have the original. So please come get your midterm, you know, burn it, cut it into little pieces, whatever you want to do, frame it. I don't know what you want, whatever you want, but please come and get it and take it away. You're welcome to wander off. And at that point, the complaint period will officially begin. All right, good question. Any questions about VM? All right, so let's talk about something new. Disks. So when we start talking about disks, I mean, a lot of you are probably familiar with this. 
So stable storage is the general catch-all term that describes a variety of different technologies that we're going to be talking about. That's just storage that doesn't lose its contents when it's powered off. It's a way to hold those contents for a long period of time. There are two primary types of stable storage that we're going to talk about: one is sort of the past and the other is the future. Hard disk drives, and we have some fun videos to watch today, store data on a magnetic medium and move. So that's the big thing. The way that hard drives, spinning disks, magnetic hard drives, retrieve data is by moving. They spin the platters, they move things around. That's how you get data on and off of them. And that makes them, from a computer science perspective, fairly interesting. Flash drives, on the other hand, are what are known as solid state drives. Has anyone ever taken apart a flash drive? Opened up the case? It's really funny because they're in a form factor that's designed around the old spinning disks, right? Has anyone opened up a spinning disk before? If you open up a spinning disk, there's stuff in there, right? If you get a two-and-a-half or three-and-a-half-inch drive, it's full. There's a bunch of stuff inside. If you open up the equivalent flash drive, it's empty. It's mostly air. There are a couple of chips tucked away in one corner on a board, and that's it. So solid state drives store things in what's known as non-volatile memory. There are some caveats and some consequences of this. But still today, and it's actually pretty impressive how the gap has shrunk, so I'll ask you guys in a sec if you can predict this, hard disk drives, spinning disks, are still cheaper than flash drives.
And for that reason they also tend to be larger. They're cheaper in terms of price per gigabyte of capacity. So organizations that are doing large amounts of storage typically do warm storage on spinning disks simply because it's still more cost effective. Now here's the interesting thing. Back when I started teaching this class, that was 2011, the difference was about an order of magnitude. Back then it was about 10 cents per gigabyte for a hard disk drive versus about a dollar for a solid state drive. Anyone know what that number is today? Anyone priced out disks recently? Want to pick a guess? Yeah. A quarter to a dollar, okay. So that would indicate that it's shrunk to a factor of four or so. I guess the question is this: it was projected for 2012 that there was going to be an order of magnitude difference in price per capacity. Anyone want to guess what that is now? Yeah. 15 cents, okay. That's actually pretty close. What is it for the hard disk drive? What's the ratio here? How much more am I paying per unit capacity for a solid state drive? Pretty close. Yeah, so it's about 7 cents to 15 cents. So we're only talking about a factor of two. So go out and buy a solid state drive, because it's worth it. This number has shrunk quite a bit, so this is pretty interesting. I don't know enough about the EE underlying these things to understand what's driving these trends. But it's certainly interesting that you're seeing, in a relatively short amount of time, a pretty significant narrowing of the gap. It is no longer prohibitively expensive to buy large solid state drives. And the performance that you can get from those drives in certain cases is much, much better than you would get from a spinning disk. How many people have a computer with a spinning disk in it still? Okay, how many people have a mobile device with a spinning disk in it? Oh, man, you guys are playing with fire. Get rid of that thing.
It's going to break, trust me. I went through at least seven hard drives on my laptop when I was in graduate school. All right, so some of the file system designs we're going to talk about, the more mature file system designs, were frequently designed around the capabilities and limitations and problems associated with spinning disks that we're going to introduce today. But if you're confused about what we're talking about, please ask, because there are pretty significant differences with the new technology. Okay, so I talked a little bit about this, but why bother? Right? I mean, maybe when the department introduces the modern operating systems course, they'll stop talking about spinning disks. But I think that it's worth it. First of all, there are still a gazillion of these hard disk drives around. They're still out there in the world, they're still in active use. If you are really interested in just capacity per dollar, hard disk drives are still a big win. And if you're a company like, I don't know, Dropbox, for example, that has to provision terabytes upon terabytes of storage, that multiplier of two is actually kind of important. For you, when you buy your laptop, maybe you're willing to pay another hundred bucks to get a flash drive. But for them, that's really an issue of the bottom line. So let's just presume for now that flash drives perform a lot better and they're a little bit more expensive. What would be an application where you might still want to use a spinning disk? Yeah. Oh, that's an interesting question. Yeah. Okay. So here's another question. Do flash drives last forever? No, negative. Flash drives wear out; they wear out in a different way than hard disk drives. But yeah, flash drives are not forever.
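To make the price arithmetic concrete, here is a small sketch using the per-gigabyte figures quoted in lecture (10 cents versus a dollar in 2011, roughly 7 cents versus 15 cents now). The one-petabyte fleet size is a made-up number for illustration, not anything a real company has published, and the function names are hypothetical.

```python
# Per-gigabyte prices quoted in the lecture, in US dollars.
PRICES = {
    "2011": {"hdd": 0.10, "ssd": 1.00},
    "now": {"hdd": 0.07, "ssd": 0.15},
}


def price_ratio(hdd_per_gb, ssd_per_gb):
    """How many times more a gigabyte of flash costs than a gigabyte of spinning disk."""
    return ssd_per_gb / hdd_per_gb


def flash_premium(capacity_gb, hdd_per_gb, ssd_per_gb):
    """Extra dollars spent provisioning capacity_gb on flash instead of spinning disk."""
    return capacity_gb * (ssd_per_gb - hdd_per_gb)


for era, p in PRICES.items():
    print(era, "flash costs", round(price_ratio(p["hdd"], p["ssd"]), 1), "x disk")

# Hypothetical fleet size: one petabyte (10**6 GB), chosen only for illustration.
PETABYTE_GB = 10**6
print("premium per petabyte:", round(flash_premium(PETABYTE_GB, 0.07, 0.15)), "dollars")
```

On a single laptop the factor of two is a rounding error; multiplied across a large fleet it becomes a line item, which is the bottom-line point above.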
But what's an example of a type of data that you might think would be okay to store on a hard disk drive? Yeah. Yeah, generally we think of cold or warm storage, things that don't get touched very often. What's an example of this from your own life? Give me some piece of data that you have stored in the cloud that is probably completely okay to move onto a spinning hard drive. Yeah. Your textbooks. I like that. That's a good answer. Pictures, maybe. I mean, what about the emails that you sent in 2010? You could still find those, right? Google still has those, or whoever you use as a cloud email provider probably still has them, but they're not hot. They're not stuff that you're looking at often. And so, to some degree, the story that we're starting to see emerge with the introduction of flash is that it provides an interesting additional tier in the storage hierarchy. If we have stuff that's really hot and is accessed really often, it might make sense, in certain cases where I have performance constraints, to move that stuff into flash, whereas stuff that's coldish can easily be moved onto spinning disks, where it's still accessible. It's a lot more accessible than something like tape, which is super slow to access. But there's not a performance bottleneck when you're talking about really ancient email. It's kind of impressive to me to think about the fact that emails that I will probably never read again are sitting on some computer in one of Google's data centers. That is just a total waste of space. I'm sorry. But there they are, right? It's kind of interesting. So, hierarchical file systems are the type that we'll talk about. How many people still organize their files on their computer? In, like, a... okay, now we all want to be organized, so I understand that. But how many of you have ever written a computer program to reorganize your files on your computer?
Okay, there we go. See, the other ones, you're not really that organized, right? So yeah, I mean the idea of accessing stuff through folders. When you guys want to play a song on your computer, do you navigate to the folder where the song is located and double-click on the MP3 or whatever it is? How many people do that? Yeah, okay. What do the rest of you do? Just don't listen to music? That's cool. Let's say you do. How do you find it? Okay, let's presume that you have some music somewhere that you can call up. How do you locate it? How do you get at it? You search. You start typing, you misspell the artist's name badly, and Google's smart enough to figure it out. It's like, okay, here's which one. Yeah, so search is really changing the way that people interact with data. And if you listen to certain prognosticators, it's ruining our minds and we can't think clearly anymore, because all we have to do is type a badly worded sentence into Google and it just says, oh, here's what you want. But the fact is that all these search tools, like Spotlight and any search tool that's built into apps that you guys use, are still built on top of hierarchical storage. Hierarchical storage and hierarchical namespaces still form the basis of the file systems that are out there in the world. I don't know how long that's going to be the case. I mean, this stuff changes very slowly. We're talking about data that's going to be around for decades. It's unlikely that Google is going to be really excited about a big project to move all of their completely cold and useless data onto some new file system. Who cares, right? It works, it's there, it doesn't matter. But there are new trends emerging in this space and I think search is one of them.
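The point that search tools sit on top of a hierarchical namespace can be sketched in a few lines. This is a toy word index over path names, not how Spotlight or any real indexer works; the paths and function names are made up for illustration.

```python
import re
from collections import defaultdict


def build_index(paths):
    """Map each lowercase word appearing in a path to the set of paths containing it."""
    index = defaultdict(set)
    for path in paths:
        for word in re.findall(r"[a-z0-9]+", path.lower()):
            index[word].add(path)
    return index


def search(index, query):
    """Return the paths that contain every word of the query, sorted."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical files. The hierarchy is still where everything actually lives;
# the index is just another way to find a path without walking folders.
paths = [
    "/home/me/music/miles_davis/so_what.mp3",
    "/home/me/music/john_coltrane/giant_steps.mp3",
    "/home/me/letters/2010/landlord.txt",
]
index = build_index(paths)
print(search(index, "coltrane steps"))
```

The user never sees the folder structure, but every search result is still a path in it.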
But fundamentally, understanding the future in this case is always helped by studying what we've done and understanding the past, where things came from. Because I think there are probably design quirks about file systems and storage that are inherited from spinning disks that will be with us for a long time, if not forever. And we will talk about flash. I'll probably add a lecture this year entirely on flash storage. And the goal here, just to keep this in mind, is to talk about system design principles, because while hard drives have changed, the principles behind them are what make file system design very exciting. There has been a lot of work on file system design for decades and there's a variety of reasons for that. But one of the reasons is that there are things about hard drives that are difficult to deal with. And those things aren't gone. Maybe hard drives don't fail in the same way that they used to, but other types of systems fail in the same way. And so it's still useful to talk about this. Okay, so let's talk about spinning disks. I should really start doing this: I think when I took this lecture, Margot brought a hard disk to class, right? So this is the one part of the computer, and that's the kind of thing that's sad about flash. If you take apart a computer, it's not very interesting. There's, I don't know, some blob of integrated circuits over there and there's another blob over there, and who knows what anything does, right? And the processor's cool, right? But of course, you're going to touch that and then it's going to bonk out and you'll be out 200 bucks. You can kind of see memory. But disks were the one thing that was kind of fun to take apart because there are moving parts in there and there's stuff that goes on.
So where data is actually stored, these are called platters. Each platter is coated with some sort of magnetic medium. Data is written to and read from these platters. That's where the bits of data are actually located. The spindle: the way that spinning disks work is they have a bunch of platters, they all rotate together, and the rotation is part of how we navigate to a particular spot on disk. We'll talk about this more later. And then most of the brains and most of the complexity here is in what's called the disk head. The head is the sensor and actuator that is actually reading and writing data from the disk. The heads also move, and that allows them to get from one side of the platter to the other. Okay, here's a diagram. Let's check this out. Platters, right, and there's a stack of these. There's not just one. Modern disks would have, I don't know, eight, ten, twelve, a whole stack of these. Each one of them has some amount of capacity on it. They're rotating around the spindle. And the heads are on this arm, right? This arm swings back and forth, which is what allows the heads to reach any spot on disk. So through some combination of moving the head and allowing the disk to spin underneath the heads, I can reach any part of the disk that I would need to reach. Does that make sense? Many people have seen this before. Okay, cool, great. How do you get to other platters? Great question. Anyone know? Yeah. Okay, anyone want to guess which one it is? Yeah, so that arm is actually like a rake, right? Each platter has two sides, so there are actually heads facing every platter surface all the way up and down. But typically, on modern inexpensive disks, I'm pretty sure you have one arm that moves a whole bunch of heads. Now that has consequences, right?
Because all the heads are in the same position on the disk at any given time. So that affects what the heads can read at the same time. Does that make sense? Yeah, I think there's a, let me, okay. So we talk about places on the disk. The disk has actual physical locality, which is kind of interesting, right? When we talk about memory, you think about physical memory as having some sort of address space from the low address to the high address. But you don't think about, I mean, maybe you do. It would be interesting to be able to figure out which stick of actual physical memory this address is on. But we usually don't care. With the disk we do, because it affects a lot of how performance on disks works and how we design file systems. So the track: think of a race track, or rather the track is a lane on a race track that runs all the way around the platter. The track is what I can read without moving the heads. So if all I do is let the disk spin, I can read all the data on one track. A sector resembles something like this. This is like a slice of pie. And the cylinder is one track dropped through all the platters. And here's a diagram. So the track is the yellow part. The sector is this. This is not an arrow. This is an actual slice. So that's one sector on the disk, potentially. And the cylinder is a little hard to think about, but it's almost like a can that's been dropped all the way through the disk. Does that make sense? The cylinder is interesting because that's all the data I can read from the entire disk without moving the heads, just letting the disk spin. Okay, so we have some fun videos to watch today. Because disks move, right? You could make a video of memory in use, but it's not very interesting. Whereas with a disk it's all right. So let's check this out. All right. Free advertising for whoever this guy is. So we've got platters, right? Spindle. This is the head.
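The track, sector, and cylinder vocabulary from a moment ago can be sketched as an address calculation. This is a simplified model with zero-based indices and the same number of sectors on every track; real drives vary that by zone and hide their geometry, and the numbers and names below are invented for illustration.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    cylinders: int          # arm positions across the platter radius
    heads: int              # one head per platter surface
    sectors_per_track: int  # assumed constant; real drives vary this by zone


class Location(NamedTuple):
    cylinder: int
    head: int
    sector: int


def to_block(geo, loc):
    """Flatten a (cylinder, head, sector) location into a single block number."""
    return (loc.cylinder * geo.heads + loc.head) * geo.sectors_per_track + loc.sector


def to_location(geo, block):
    """Inverse of to_block."""
    track, sector = divmod(block, geo.sectors_per_track)
    cylinder, head = divmod(track, geo.heads)
    return Location(cylinder, head, sector)


def same_cylinder(geo, a, b):
    """True if blocks a and b can both be reached without moving the arm."""
    return to_location(geo, a).cylinder == to_location(geo, b).cylinder


geo = Geometry(cylinders=1000, heads=8, sectors_per_track=64)  # made-up numbers
print(to_location(geo, 1221))
print(same_cylinder(geo, 0, 511), same_cylinder(geo, 0, 512))
```

With this numbering, consecutive block numbers fill one track, then the other tracks of the same cylinder, and only then require moving the arm, which is why the cylinder is the unit of "everything I can read just by letting the disk spin."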
And again, if you rotated this you would see that the arm has identical components in between each platter. There we go. Oops. There we go. So that's cool, right? Now you know what made that noise, right? I suspect the noise is actually the heads stopping, or getting to one point and turning around, right? So this is fairly inexplicable. This is clearly a Windows machine, and I'm serious, because deleting a folder should not take this much time. There's really just one thing you have to do and then it's kind of gone, right? But anyway, okay, here we go. Oh, sorry. We're still going. We're still deleting that folder. I'm sure there's an animation going on, like things are flying to the trash. So this is kind of cool, right? This is a copy-paste. Now here you can really see locality. One of the files is over here on the disk, literally located there. The other one is over here, and if you watch carefully, you can see the heads essentially picking up a little bit of data on one side and dropping it on the other. Oops, sorry. All right, so now we're doing a quick format. All right, questions about this? It's kind of cool. Yeah. Oh, good question. Yeah. What fails? All sorts of things can fail. The magnetism, I suspect, is usually not the problem, right? The disk has physical parts. This is why, for a long time, particularly on mobile devices, spinning disks were a serious and problematic cause of failure. This is one of the reasons why flash drives are so great, particularly if you have a laptop. It used to be that if you dropped your laptop, you said a little prayer. Sometimes you were lucky. Sometimes you'd actually pick it up and you'd have about 20 seconds to say goodbye. That was it. And then something would freeze and that would be it.
And we'll talk about why that happens in a second. Now once you start dropping the disk or doing other things to it, all sorts of things can go wrong. But the physical parts of the disk are usually what fail. So when we start talking about file systems, we always have to keep this in mind. Disks move, and that really sets them apart from everything else we've studied. That movement causes failures, it causes things to wear out, and it causes delays that are different in computer time from the other delays that we've talked about. The delay to move a physical object is always way, way, way bigger than the delay to throw some electrons across a bus. And that starts to impact how file systems are designed, caching, all sorts of other things. That's really what caused the old spinning disks to be slow: the fact that there were things moving. If you want to get something done fast, don't move. Don't move physically and you'll go a lot faster. The other thing is how disks are integrated into the rest of the system. The things that we've been talking about before, the CPU and memory, are really tightly integrated into the OS. Disks less so, right? How can you tell? What is the kind of hint that you might have, if you use a modern computer system, that the disk is a little bit more separated from the operating system? Well, I have two different technologies, so that's a good point. It's important that those two technologies expose the same interface. But where do I see differences in software? Yeah. Okay, so disks show up on your computer as devices, which again is sort of a hint at what's going on here. But where do I see the difference? Right? Maybe let me put it another way. So with the CPU, a lot of the fancy stuff that goes on in CPUs is done internally.
So if you think about things like out-of-order execution and pipelining and all sorts of things like that, that's stuff that the CPU does without the operating system's help at all. With memory, the OS is really heavily involved in providing this address space abstraction that's very common. On the disk there's a lot of software involved, and again, how can you tell? How many people have installed a Linux or an Ubuntu type system before? Okay. You have to make a choice at some point. Maybe you didn't notice. Did you choose something about the type of memory management abstractions you wanted to provide when you installed the operating system? You did? What choice did you make? No, okay, sorry. No, the memory. So did you make any choices related to the virtual memory subsystem? Yeah, okay, that's not a big deal, right? It's not like it asked what type of virtual memory system you would like. Did it ask you what type of scheduler you would like? No, you just got the scheduler that was built in. You got the virtual memory system that was built in. But you made a choice, which he pointed out, which is what file system you want, right? File systems are software, and I think that's the other reason why they're fun to talk about: we have a lot of choice. You guys could go out and build your own file system and you could integrate it with Linux pretty easily if you wanted to. In fact, there are tools that allow you to build file systems above the OS interface. There's something called FUSE which allows you to implement a file system entirely, sorry, not in software, to be clear, but in user space. You don't even have to have kernel privileges. You can just take a file, like an area on disk, and build your own file system there and go wild, in case you want to build a file system. That would be cool, yeah? Like, when I open a file it plays a YouTube video, right?
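To give a feel for "file systems are just software," here is a toy flat file system layered on an array of fixed-size blocks. It does not use FUSE or any real file system API; it is an in-memory sketch, and the class name, the 512-byte block size, and the layout (a dictionary from names to block lists) are all assumptions made for illustration.

```python
BLOCK_SIZE = 512  # bytes per block; an assumed size


class ToyFS:
    """A flat, in-memory file system: names map to lists of block numbers."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.free = list(range(num_blocks))  # unallocated block numbers
        self.files = {}                      # name -> (length, [block numbers])

    def write_file(self, name, data):
        needed = -(-len(data) // BLOCK_SIZE)  # ceiling division
        # Blocks held by an existing file of the same name can be reused.
        reclaimable = len(self.files[name][1]) if name in self.files else 0
        if needed > len(self.free) + reclaimable:
            raise OSError("no space left on toy device")
        if name in self.files:
            self.delete(name)
        used = [self.free.pop(0) for _ in range(needed)]
        for i, blk in enumerate(used):
            chunk = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
            self.blocks[blk] = chunk.ljust(BLOCK_SIZE, b"\0")  # pad the last block
        self.files[name] = (len(data), used)

    def read_file(self, name):
        length, used = self.files[name]
        return b"".join(self.blocks[b] for b in used)[:length]

    def delete(self, name):
        _, used = self.files.pop(name)
        self.free.extend(used)


fs = ToyFS(num_blocks=4)
fs.write_file("notes.txt", b"x" * 700)  # spans two blocks
print(len(fs.read_file("notes.txt")), len(fs.free))
```

Everything that makes this a "file system" (names, lengths, which blocks belong to which file) lives in software on top of the blocks, which is why you get to choose one at install time.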
So yeah, the underlying disk interface and abstraction, which we'll talk about: there's not a lot there, it's basically just chunks of data. What you do with that in software and how you build a file system on top of it is really up to you, yeah? So what I mean is the interface that people use to find things. 30 years ago, if you wanted to find something on your computer, you opened up some sort of nasty GUI and started clicking on folders, and you're like, oh, where did I put that letter? I'm going to look, oh, it's in my letters folder and it's in this whatever. Now people don't do that, right? They just open Spotlight or whatever the dog thing on Microsoft is called. I don't think it's the dog anymore, whatever they call it. And you start typing "letter" and then it shows you a bunch of search results. It's a built-in index, basically. To some degree I would argue a lot of computer users just have no idea where things are anymore. You open up your Microsoft Word editor, you click New Document, you start typing something, you click Save, you close it. I bet most people could not find that file on their computer if they tried, right? But they know how to locate it, which is either they open Microsoft Word again and say, oh, recent files or something, or they search or whatever. So just the way that people locate things is different, right? Now again, under the covers there is a hierarchical file system that this is all built on top of. But you can imagine getting rid of that. You can imagine having a file system that you couldn't browse hierarchically in the same way but that you could still search. I mean, to some degree this is almost like that. Anyone ever remember using Yahoo? Ever used Yahoo? So Yahoo had this heroic effort to essentially build a hierarchical, quote-unquote topic-based file system on top of the internet. You know, it was like topic, subtopic.
I mean they tried to classify everything. It was very Java-like. It's like, we must have a type hierarchy. And Google was like, no way. Who cares? Search, right? You just put in your search terms. Google is not trying to build this rigid hierarchical structure. They're just trying to help you find things. So maybe that embodies the difference. It's just how people locate things. Yahoo thought that if somebody was looking for computer help, they would click on the computers section and then they would click on the help section and then they would click on the Microsoft Windows section and they would click on whatever, right? So it's just a different interface, a different way of locating things. And the web is really what's driving this, right? That functionality is really what's changing a lot of how people interact with their computers. And I would argue maybe for the better, unless you're one of the people who thinks that we're all stupider because we can use Google to search for things. I guess. I don't know. To me that seems weird. We can find out a lot more than we used to. So I don't know how that makes us stupider. Okay, so I already pointed this out. Because disks move, we do a lot of work at the file system level to try to hide these latencies. And there are tricks upon tricks upon tricks, some of which feel like hacks now, that are all based around this idea. And that's what's so interesting about flash, because flash doesn't move, so flash doesn't have the same locality constraints. To some degree flash comes in and just says, I don't care about this problem any more. And all the file system people are like, we've been solving this problem for 40 years and we're so good at it. And yet, gone. All this locality stuff is gone. But it's still fun to talk about. And again, clearly moving things around is slow. So accessing a byte of memory: fast. Moving the head across the disk: slow. Now you saw that head.
That looked fast. Right? But keep in mind you're a slow human. In the time that it took that head to move from one part of the disk to the other, I just got like a gigabyte from my RAM chip easily, or something like that. It just doesn't compare. Disks also fail. Some of the fun and, I would argue, more lasting contributions of file system design have to do with failure and figuring out ways to recover from failures, anticipate failures, things like this. Disks can fail. So this is a fun fact. And there's an interesting paper maybe we'll read. We're getting to the part of the class where I start to assign research papers for you guys to look at. Little ones that are fun. About five years later there was a group that did this really fun study. They bought ten, or I don't know what it was, twenty identical hard drives. Okay? Quote-unquote identical. They were by the same maker, same manufacturer, same vendor ID, same lot number. They might have come off the factory line next to each other. What they found is that there were huge performance differences between these disks. And a lot of it has to do with manufacturing defects that disks cover up. In a lot of cases spinning disks come with bad sectors. They come with parts of the disk that don't work. I do not know why this is. I'm sure this is just some sort of manufacturing process problem. But this is kind of clever: instead of trying to make the manufacturing process perfect, what the vendors do is they just say that as long as the disk knows what parts are bad and avoids those, doesn't let you use them, then it's okay. So disks do these checks where they figure out, okay, that sector's gone, I'm not using that anymore. And this can happen over time as well. In certain cases you can have sectors that sort of fail. Again, I have no idea why this is, but it happens, and disks will kind of remap things to make those sectors go away.
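The remapping idea can be sketched as a small table inside the drive. This is a guess at the general shape, not any vendor's firmware: the class name, the spare-sector pool, and the "copy what we can" step are all assumptions for illustration.

```python
class RemappingDisk:
    """Logical sectors map to physical ones; bad physical sectors are swapped for spares."""

    def __init__(self, num_sectors, num_spares):
        self.num_sectors = num_sectors
        self.physical = [None] * (num_sectors + num_spares)
        self.remap = {}  # logical sector -> spare physical sector
        self.spares = list(range(num_sectors, num_sectors + num_spares))

    def _resolve(self, logical):
        if not 0 <= logical < self.num_sectors:
            raise IndexError("sector out of range")
        return self.remap.get(logical, logical)

    def mark_bad(self, logical):
        """Retire the physical sector backing `logical` and point it at a spare."""
        if not self.spares:
            raise RuntimeError("out of spare sectors")
        old = self._resolve(logical)
        new = self.spares.pop(0)
        self.physical[new] = self.physical[old]  # best-effort copy of old contents
        self.remap[logical] = new

    def write(self, logical, data):
        self.physical[self._resolve(logical)] = data

    def read(self, logical):
        return self.physical[self._resolve(logical)]


disk = RemappingDisk(num_sectors=8, num_spares=2)
disk.write(3, b"hello")
disk.mark_bad(3)      # the caller keeps using sector 3
print(disk.read(3))   # but the data now lives in a spare
```

The caller's addresses never change, but a remapped sector physically sits somewhere else on the platter, so two drives with different bad sectors lay out the "same" data differently.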
But this can cause some of those performance differences. All right. And disks can fail all at once. So that was kind of like slow decline. Now when we talk about flash, flash can also fail in this slow-burn sort of sense. It turns out flash has a certain number of times that it can erase, or basically write, a particular part of the flash drive. And once you exceed that number there's a probability that those writes will fail, at which point that chunk of the flash drive is gone. We'll come back to this when we talk about flash. But flash chips do all sorts of funny things to try to make that problem go away. So disks can also fail in these ugly ways. So, a head crash. You saw those heads flying back and forth across the disk. In order to improve density, those heads sit like, I don't know, it's an engineering number. It's like a nanometer or something above the surface of the disk. They are really, really, really close to the platter. And of course it makes sense: the farther away I am, the harder it is to read or write data. And the closer I am, the smaller and smaller the tracks I can write, which makes the overall capacity of the disk larger. Unfortunately, I can't touch the platter. The platter is moving quite quickly. My heads have little sharp edges on them. And if I do touch the platter, this is what's called a head crash. And this will just wipe out a disk. Right? This basically ruins the whole thing. And this is, I'm sure, what I did to disk after disk when I was in college. Just by dropping them. So here we go. Let's watch another video. YouTube is full of these. Can anyone see it? Can anyone see the problem? It's so funny. It almost looks like a racing stripe. You're like, oh, that's a fancy hard drive, right? I didn't know they can. Does that make it go faster?
You know, like someone told me the other day, if I add a racing stripe to my car, 10% faster immediately. Right? Can anyone see the problem? Yeah, like this is not good. Again, that's not a decal. That's where he'll explain it. Oops. Sorry. Notice here, notice that this only occurred in a small area of the platter creating a cylindrical scratch. Yeah, so this is also kind of interesting because of the fact that disks are so kinetic. They do a little bit of work. I mean this is not some sort of advanced filtration system. This is like a little piece of gauze or something. But they try to avoid getting particles in here. Because you can imagine if I have a little piece of something hard and it ends up inside the disk case and starts bouncing around, it could do some damage. Right? There we go. How many people do backups? Okay, thank you. Git, by the way, is a great backup tool. Okay, so we're going to talk about RAID later, which is a cool way to bring multiple disks together to basically improve both performance and reliability. But there's also a fair amount of fault tolerance that's done at the file system level. And there's a bunch of, again, I think enduring contributions from the file system community. There's several that have to do with failure handling. Okay, so now let's talk about the fact that disks are slow. So disks fail. We talked a little bit about that. Because disks are slow, they're often the part of the system that everything else is waiting on, if I have to read or write from a file, if I have to actually do some IO. And this is still true even when I'm using flash. I mean flash is faster, but it's still way slower than memory, way slower than registers or first or second level caches. And so the file system does a lot to try to hide this. And in this case, the operating system is really intimately involved. So go back to the processor example.
The out of order execution that processors do to try to hide memory latencies is totally opaque to the operating system. All the operating system knows is that instructions go in one end and come out the other. And they come out in order. What the processor is doing internally? No clue. It's too fast. It's happening too fast. With the file system on the other hand, the difference in speed between the memory subsystem and the disk is something that the OS is very involved in handling in software. And so there's a bunch of things that we can learn here. All these classic system design techniques have been applied to this problem. You know, ways to do things. And I just said this. So the OS is really much more involved in hiding these latencies and so there's more for us to learn here about some of the classic approaches to doing this. So let's talk about what's slow about reading or writing to disk. So it's a brief primer in terms of what you actually have to do to read or write to disk. So the first thing I have to do is tell the disk what to do. The disk is a device so there's some protocol I have to use to speak to it and tell it, for example, I have this buffer of data and I want you to write it to this address on the disk. And at this point we're not talking about file systems. There's no notion of a file name or anything else. This could be anything. We'll come back and talk about the interface that disks expose but it's quite simple. Essentially an array of chunks of data of a fixed size like 256 or 512 bytes. So I tell it what to do. The drive then has to move the heads. Now this is simplified somewhat. There's buffering that goes on here as well. The drive has to move the heads to that location. Then there's something called settle time, which I love, which is essentially the time it takes for the heads to stabilize.
So the heads have been on a very exciting journey from one side of the disk to the other. And they made that journey as fast as possible, but when they get over to the other side they're kind of like shaking back and forth a little bit. So you have to kind of wait for them to stabilize so they're centered on the track that you're trying to read or write from. And then there's a rotation time where the platters actually have to rotate to the position where the data is located. The disk reads this internally and then transfers the data back to the OS or back to the system over some kind of bus. Yeah. I have no idea. Small. Right. Yeah. Small as possible. So the capacity increases that we've seen in disks, which I have a slide to show you in just a second, are really the function of just being able to make tracks smaller and smaller and smaller and to make the heads smaller and smaller and smaller and to make everything more precise. So in the past you've had much larger disks. The tracks were a lot wider. You know, you couldn't stabilize the heads as well. So I think there's a lot of kind of cool electrical and mechanical engineering that went into being able to make the tracks so small, and also materials engineering. Right. Obviously the problem is at some point if they're too narrow there's not enough magnetic material to really hold the information. When they're too narrow and too close together I also have interference. Let's think about it. And again I'm way out of my league here. But imagine you're the head. You're like holding a magnet over the disk and like magnetizing various parts and un-magnetizing other parts. But there's two other tracks right there. And so if those two tracks are too close to me, I end up actually spreading a little bit of that over into other regions. And this happens naturally. But there's enough sort of redundancy in how the disk is used to prevent it from being a problem.
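The steps of a disk request walked through above (tell the disk what to do, seek, settle, wait for rotation, transfer) can be added up in a back-of-the-envelope model. This is just a sketch: the function names are made up, and the seek time, settle time, and transfer rate in the example are assumed round numbers, not measurements from any real drive. The one piece that is plain arithmetic is the rotational latency, which on average is half of one full rotation.

```python
def rotational_latency_ms(rpm):
    """Average wait for the data to come around: half of one rotation."""
    ms_per_rotation = 60_000.0 / rpm
    return ms_per_rotation / 2


def access_time_ms(seek_ms, settle_ms, rpm, nbytes, transfer_mb_per_s,
                   command_ms=0.0):
    """Sum the components of one disk request, in milliseconds.

    All inputs are illustrative assumptions supplied by the caller.
    """
    transfer_ms = nbytes / (transfer_mb_per_s * 1_000_000) * 1000
    return (command_ms + seek_ms + settle_ms
            + rotational_latency_ms(rpm) + transfer_ms)


# Assumed example: 8 ms seek, 1 ms settle, 7200 RPM, one 4 KB read
# at 100 MB/s. None of these numbers come from the lecture.
total = access_time_ms(seek_ms=8.0, settle_ms=1.0, rpm=7200,
                       nbytes=4096, transfer_mb_per_s=100.0)
print(f"rotation: {rotational_latency_ms(7200):.2f} ms, total: {total:.2f} ms")
```

With these assumed numbers almost all of the time goes to getting the heads and the platter into position, and actually moving the 4 KB is a tiny fraction of the total. Different assumptions will shift the balance between the components.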
But I certainly don't want to erase data on a neighboring track. So over time we looked at improving spinning disks. So one of the things that happened is capacity went up because tracks got smaller. Interconnect speeds got much, much faster. What about the seek time? So the seek is actually moving the head from one part of the disk to the other. How much do you think that improved over the past 20 years? Moore's law, like that? Not so much. So this is just not improving that rapidly. And there's just fundamental physical constraints here that come into play. Rotation speeds, you know, has anyone ever been tempted to buy like a 100,000 RPM disk? I don't think those exist actually. I think those would like, I don't know, they might be unstable and fly out of control and shatter everywhere or something. But it turns out that the rotation latency is just not the problem. By the time I get to the track, waiting for the disk to rotate around just turns out to not take very long compared with the amount of time it's taking me to get there and stabilize the heads. So this just turns out to not be a huge component of overall speed. So over the past 30 years, there's interesting trends going on, right? So on one hand, you had Moore's law that was making processors and other parts of the system a lot faster. And on the other hand, you had disks, and there were two problems with this. First of all, disks were not getting faster very quickly. But people kept buying bigger and bigger and bigger disks. I mean, it's sort of shocking. I mean, I know that processors are faster now. But the amount of disk space that my computer has now compared to what it had when I was in college is really amazing to me, right? I mean, in college, I bought a 20 gigabyte hard drive and I thought that was huge. That was like my big hard drive, you know. Now it's like I have two terabyte drives that are in a RAID array. Why not, you know?
And you know, people are recording videos of their stupid cats and taking pictures at like thousand-P resolutions so that they can be blown up to the size of like the side of a building or whatever. Like I don't know what people are using all the storage for, but clearly they are, because people kept buying these things, right? So let me finish with like two graphs. So here's a fun graph. This is showing from 84 until 06, so it ends relatively recently, sort of the progression of a couple of different types of disk technologies. These are the three and a half inch drives that you typically find in desktop machines. These are the two and a half inch drives that you find more commonly in laptops. These are some of the smaller micro drives that you were seeing in things like iPods. And so this is a log scale. So this is capacity. So capacity is going up exponentially over time. On the other hand, here is the performance. This is performance in terms of the amount of data I can get on or off the disk. Now this looks kind of like this. The only problem is this is not a log scale. So this is linear. So over the same period of time, you had roughly three orders of magnitude improvements in capacity, but only two in speed. So I've got more and more stuff on the disk and yet the ratio between disk performance and disk capacity is changing. The other thing of course that's happening around the same time is processors and memory are getting way, way, way faster. So this ratio really defines a lot of what we're going to talk about over the next couple of weeks. Okay. So I will pick up here on Wednesday. Wait, it is Wednesday. Friday, Friday. I'll see you guys on Friday.
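One way to feel the capacity-versus-speed ratio described above is to ask how long it takes to read an entire disk end to end. Here is a small sketch. The 20 gigabyte and two terabyte capacities are the ones mentioned in the lecture; the throughput figures (30 MB/s and 150 MB/s) are assumptions picked only for illustration, and `full_scan_seconds` is a made-up helper name.

```python
GB = 10**9
TB = 10**12
MB = 10**6


def full_scan_seconds(capacity_bytes, throughput_bytes_per_s):
    """Time to stream the whole disk once at a constant transfer rate."""
    return capacity_bytes / throughput_bytes_per_s


# Capacities from the lecture; throughputs are hypothetical.
old_scan = full_scan_seconds(20 * GB, 30 * MB)
new_scan = full_scan_seconds(2 * TB, 150 * MB)

print(f"old drive: {old_scan / 60:.1f} minutes")
print(f"new drive: {new_scan / 3600:.1f} hours")
```

Under these assumptions the old drive can be read in roughly 11 minutes while the new one takes about 3.7 hours: the capacity grew 100 times but the assumed throughput only 5 times, so touching everything on the disk gets relatively more expensive.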