So I think we'll do Iggy Pop next time. Sorry. All right, so it's a thin crowd today. I guess it's nice outside; people have noticed that. So today — last time, we got to the point where I tried to convince you that there are still exciting things that will probably happen in the future of technology in your lifetime. Today, we'll talk a little bit about how to approach research papers, and then we'll talk about RAID, which is pretty fun. No announcements today. Assignment 3, part 2 is due a week from today. At some point, I think Scott's going to post the backend targets — today, or at some point over the weekend — so keep an eye out for those. If you guys are done and want to submit, go hit the top of the leaderboard. What was I going to say? At some point, we're going to have performance targets up there too, for the people that want to go above and beyond. All right, any questions about LFS or about technology trends? Let me ask some of you guys. So in your lifetime — assuming you're going to live, I don't know, at least another 50 or 60 years — what do you guys expect to happen? What's the thing you're most excited about that's going to change in the world of technology? Yeah. Asteroid mining? Cool, I don't know much about that, but it sounds exciting. Well, I guess you go to an asteroid and you drill into it — OK, got it. Yeah, Steve. OK, I like this: retro computing. Like something that looks like a floppy disk but stores, I don't know, four terabytes of data. That's cool. I like that. What else? What else? Yeah, Steve. Oh, interesting. Quantum computing, interesting. So what does quantum computing promise to make possible? What's exciting — or sort of scary — about quantum computing? I know about this much about it, but yeah. Yeah, you can factor large numbers. Uh-oh. If you can factor, I'm moving to another planet, because there's going to be chaos here. Everyone will know everything about everyone else. All secrets will be revealed. All bank accounts will be available to everybody. Again, I don't know — buy your cabin in the woods right now. Prepare for that. There we go, OK. What else? Future. Let's hope — let's assume we can't factor, so there is a future. What's that? Cloning. OK, yeah. That's more biomedical. What about tech? What's exciting about tech? Yeah. Self-driving cars. I have to say, in like the next 10 years — I am ready, man, just sign me up. I'm ready for my self-driving car, yeah. Time travel, hmm. Interesting. Forward or backward? OK. One of those directions is harder than the other. Yeah — whether something should be done just because we can do it. Oh, we're already having that conversation, I think. There was a woman here like two weeks ago who gave a distinguished lecture — did anyone go? Yeah, so the talk was on deceptive speech. I don't know, a lot of it was lost on me, but basically, it sounds like what she wants to be able to do is listen to you and tell whether you're lying or not. I don't know if I want that to be possible. That would be interesting. Yeah, OK, I don't know what that means. Yeah, so, OK, well, let's put it this way. At some point in the future, you'll just be walking around with some sort of implant or something that records everything about your life continuously. I don't know who has the time to watch all that stuff — that sounds pretty boring. But there'll be some highlights. I don't know. That'll be interesting.
Yeah, imagine being able to relive any moment in your life, at least on some level. OK, this is good stuff. I mean, who knows? So I would just encourage you guys, as technologists: to some degree, what you do on a day-to-day level might be a little bit mundane, but think about the future. It's fun. It's an interesting exercise. OK, so let's talk about research papers. Research papers, to some degree, are about the future. They may not be as wild and crazy as time travel to the asteroids that we're going to mine — or that our clones will mine, because we'll be staying home where we can factor. So they may not be that far out, but the research projects you read about are things that might happen in the future. They're things people are talking about: new ideas, new ways of building systems, new capabilities of existing systems that are fun to think about. One way to think about reading papers is that it's a way to connect you to the future — a way to connect you to the cutting edge of systems design. If you want to read papers, read good papers. There are a lot of crappy papers out there. There's an axiom in academia which says that any paper can be published somewhere. So you can take that and expand on it, and assume that a lot of the papers you find are "any paper." You don't want to read any paper; you want to read good papers. And we'll look at some excellent papers over the next month or so. Don't worry about understanding the details unless you really care. A lot of systems papers include a certain amount of implementation detail, and it doesn't really matter. What's the big idea in the paper, if there is one? What's the main contribution? That's what you're going to remember in 10 minutes, or certainly in 10 days. So unless you're really curious, don't get bogged down in exactly how people did things. Look for the whys. Try to figure out how the paper fits into a bigger picture. We'll talk about this in a second. There's nothing natural about how academics communicate with each other, but there is a system, and you can describe it. Papers tend to fall into typical categories — let me come back to these; we'll come back to all of them. So where do papers get published? If you wanted to read papers about operating systems, or systems generally, I would tell you there's a workshop called Hot Topics in Operating Systems — literally, hot topics. Now, how hot those topics are is debatable, but there are certainly some hot papers at HotOS every couple of years. And here's an example of a workshop paper. Workshop papers tend to be short. They're trying to be provocative — again, hot topics in operating systems. They don't always succeed, but this is a paper that Margo and her group published — when was this? — I think maybe a decade ago. So it's not super hot anymore, but it's interesting. The premise is that hierarchical file systems — hierarchical namespaces, which you guys have just learned about — are dead, and the paper talked about search-based file systems, among other things. So it's making a design argument: we're not going to build these kinds of systems anymore; in the future, we'll build systems differently. Longer versions of papers get published in what are called conferences. These have a more complete evaluation, more results, blah, blah, blah.
I mean, a conference paper is expected to present a full system in a way that's convincing; HotOS or workshop papers are smaller and just intended to give you an idea of something people are working on — they're more to get feedback. So here's an example. This is another paper we may look at, which has to do with fundamental limits on concurrency created by operating system interfaces. It's a pretty cool paper, too — I think there was an exam question about it a few years ago. And then journal papers — just ignore them. They're all the stuff people cut out of the conference paper, the stuff that wasn't interesting enough to make it into the conference paper. So if you want to read that stuff, go for it; the rest of us have other things to do. I would stick to the shorter papers. In a lot of cases, the workshop paper is the most fun to read because it's the most provocative. All right. So when you start reading a paper — and we'll do this today when we look at the original RAID paper — one of the first things is to try to figure out what kind of paper you're reading. And this can be hard without some context. There are idea papers, or sometimes big-idea papers. These are papers that spawn entire industries, change fields, reveal some fundamental fact about systems that people hadn't understood or noticed before. When you read a paper like this, frequently you feel, wow, that's a great idea — and almost immediately afterward, I'll just warn you about this, it will seem totally obvious. So that's a warning about new ideas: ideas that seem obvious in retrospect tend to be really, really good ideas. Don't let that fool you. If you think, "oh, well, of course" — well, why didn't you know it a moment before? Problem papers are intended to get people to care about a certain thing. They claim that something is wrong or broken, or that we're doing things the wrong way. They're trying to open up people's thinking about a new area of systems. Data papers tend to be about analyzing problems. They bring a data set into the picture, analyze it, and try to glean some insights about how systems work based on that. New-technology papers — I like these. These are of the form: there's this new hardware device. Systems and software people write these papers all the time, because we don't tend to create the hardware devices ourselves. Someone comes along and says, here's a new piece of hardware with this interesting feature, and we're like, ooh, that's interesting — how would I integrate this into a system? A good example of this kind of paper is one at HotOS, again maybe a decade ago, that said: imagine you had a storage substrate that was as durable as flash but as fast as memory. And there's research into these types of memory technologies going on right now. Right now, flash has certain trade-offs — it's slower than memory, and disks are slower still. But imagine you had a device that was just a huge chunk of memory that was persistent. How would you design systems around it? It turns out there are some really interesting things to think about. And then wrong-way papers — and this is my taxonomy; who knows, this isn't some official list — are papers that claim we're doing something wrong. The community is doing something wrong; we've built something the wrong way.
The way it used to work doesn't work anymore, or whatever, and we should try something else. We'll look at examples of some of these papers — I won't promise all of them, but certainly a couple. All right, so a quick breakdown. When you sit down with a research paper, what are its parts? Abstract: read it. It's short — that's what an abstract is. It's an overview of the paper's content. Introduction: again, usually a longer version of the abstract, usually worth reading, and usually containing some of the bigger ideas in the paper. So if you're only going to read a couple parts of the paper, this is what you'd read. These are the kinds of things you'll find in an introduction. Sometimes there's a motivation section that continues to try to convince you that the problem is interesting or important. Then they talk about what they did. Some papers have this section and some don't, because in some cases it's just an idea — a workshop paper might be all motivation: this is the wrong way, or this is how to do something, and we haven't actually built anything; we just want to convince you that this is the right way to do it. But if the paper is about a real system, there's usually a design section that says: here's how the system is designed. A lot of times they stay away from describing how the system is actually implemented, because the design section is like the design document for the system — it gives you high-level principles, goals, things they were trying to accomplish. The implementation section will give you more of the gory details about how things actually worked, and a lot of times that's not particularly interesting; the design section has more of the high-level ideas. Then people talk about related work — the goal there is to show that they're doing something new — and then results. And these are kind of important. If you're writing a paper that's about analysis, the results might be the whole paper: all I'm doing is analyzing an interesting data set and drawing some conclusions from it. If I'm building a real system, the results section should have evidence that the system is an improvement on what has gone before. If it doesn't, you should be suspicious. If it does, you should still be suspicious, because clearly when you build something a certain way, a lot of times you're trying to make an argument that this is the right way to do things, and intentionally or unintentionally, you might be blind to certain aspects of the system, or certain things that don't work well, or whatever. You guys heard a little bit about this when we talked about LFS: the LFS paper certainly tried to put its performance in the best light, but when people went back later and looked at it, they found some problems. Okay. And then, I guess, the evaluation — this is sort of: does it work? So, any questions about this? Not particularly, probably. It was good to do this on the same day we do RAID, because this is boring and RAID is not boring. All right, so let's talk about RAID. Any questions right off the bat? How many people have ever used RAID? Set up a RAID device? Okay, okay, so you guys have some idea. So if you think about RAID — this is 1988, I think; maybe earlier, but I think it's '88 — what kind of research paper is this? How would you describe the RAID paper? 1988.
Eh, not exactly — there's no new technology here. No? What's that? Okay — I just asked you to raise your hands, and some of you guys had actually used this technology that was proposed in a paper in 1988. So what kind of paper is this, with the long view of history? I mean, RAID became a new technology, so that's a hint, right? But at the time, there was no RAID; there were just disks. Then RAID comes along. This is a huge idea paper, right? When your paper gives birth to a multi-billion-dollar industry, you can put that tag on it. Now we're all, "ah, RAID, whatever," okay? But there was a point at which RAID didn't exist. Somebody came up with the idea. This is the paper that started a huge industry — and an approach to solving problems that goes far beyond disks. So what's the big idea behind this paper? Can you boil it down? Yeah. Okay, so there are ideas about duplication and redundancy here, but what's the — yeah? No, but take it out of the context of data, right? Try to generalize the idea as much as possible, because once you do that, you see this idea everywhere today. Yeah, exactly. Those are things that RAID accomplishes in the context of storage, but what's radical about what they're doing here? What did they propose? Yeah — so I would say here is the key idea: I can use a bunch of cheap things to build a system that outperforms a single expensive thing. Does that make sense? So where else do you see this idea play out? Give me another example. Yeah — so originally (this is not true anymore) when Google got started, Google bought commodity PCs. Google stocked their data centers with Dell PCs. They would buy like a thousand of them at a time. And there were reasons for this: the PCs were kind of exactly what they wanted — they had big disks, they had certain features — but they didn't buy expensive servers. Now they do, right? Now, if you look at Google's data centers, they've got custom-made servers, because they've got like a million of them or something. But when they started out, they said: look, we're going to build this really reliable, efficient search engine, but we're not going to do it by buying a bunch of expensive servers. Instead, we're going to buy a bunch of cheap commodity PCs. And you can imagine, if you buy Dells 10,000 at a time, you get a pretty good discount, right? Multi-core processors are another example of this. We hit some fundamental limits, and at some point, getting more speed and more computation out of hardware required a different approach: the cores that replaced the Pentium 4 were, at some level, a little simpler — there were just more of them. And actually, as we see the future of computation in silicon, you're going to see this even more, in certain ways. Or crowdsourcing, you know? Trying to gather lots of unreliable inputs from a large group of people and combine them into a reliable result. I could measure a certain thing with one really expensive sensor, or I could combine a bunch of measurements from cheap sensors into an equally reliable data point. Okay? All right, so what's the problem? The RAID paper starts off by pointing out a problem with the disk technologies of the day. Again, 1988.
Many of you guys were probably not born yet. But did anyone look at, like, the first couple pages of the paper? It's always a good test to see if anyone read the paper, because this comes very, very early on — and I think the sentence literally starts with "the problem is," so it's not hard. They didn't hide it behind fancy language. CPUs are getting faster, memory is getting faster, and hard drives aren't keeping up. So they have this section of the paper about the pending I/O crisis, or something like that, and they try to convince you that this is a fundamental problem. And I think they're right. They do a little bit of math, and it gets a little weird for a minute, but you end up agreeing with them. And look, this is 1988 — it's almost 30 years later. We're living in the future; we know that this happened. We know they were right — they were quite prescient. So we need innovation to avoid this I/O crisis. Now — I just gave away the answer — but if you apply the RAID approach to things, there's a fundamental trade-off they're making, which is that the expensive hard drive, you could argue, should be more reliable. I paid more money for it; I get more hours out of it. The cheap things fail more often. So RAID is both an argument about performance — I can get more performance if I merge things together — and, as people pointed out, an argument about fault tolerance. How does this play out in Google's data centers? Clearly Google has been online the whole time we've been sitting here. What goes on there? Yeah — probably at least a couple of hard drives, maybe entire machines, at Google have failed since we started class. You can just do the math: they have a million machines, each with some number of hard drives, and hard drives last on average only so long — so they're going to have failures, maybe a couple of failures an hour. (There's a quick sketch of this arithmetic below.) Once you have enough machines, they're just constantly dying, constantly crashing. Some robot is going into the data center, taking the disk out, sticking another one in. Maybe the whole machine is gone; maybe some other thing failed. So once you have these big computer systems, stuff is failing constantly. And yet you never notice. Has anybody ever noticed a consequence of this, using a big cloud provider? Maybe Gmail goes down for two seconds or something, but I've never noticed it, right? I certainly haven't lost data. So we've gotten to the point where it's well understood and well accepted that it's possible to build reliable systems with great uptime on this very shifting foundation of machines that are constantly crashing and failing and dying. It's kind of interesting. So we need a plan — a way to handle these failures — and RAID was one of the first. So here's the paper. We're not going to look at it in detail. Oh, I think my pointer gets stuck when this happens; let me go over here and free it. All right. So let's talk about the different RAID levels, because this is kind of interesting. A normal disk — I've got a certain amount of data. And one of the things that's really fun about this paper is the way they build up the approach in this very incremental fashion.
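To make that "couple of failures an hour" arithmetic concrete, here's a quick back-of-the-envelope sketch in Python. The fleet size and disk lifetime below are round numbers assumed purely for illustration, not Google's real figures:

```python
# Back-of-the-envelope: how often does *some* disk die in a big fleet?
# Both numbers are assumptions for illustration, not real figures.
fleet_disks = 1_000_000      # disks in the data centers
mttf_hours = 600_000         # mean time to failure of one disk (~68 years)

# With many independently failing disks, failures arrive at roughly
# fleet_disks / mttf_hours per hour on average.
failures_per_hour = fleet_disks / mttf_hours
print(f"~{failures_per_hour:.1f} disk failures per hour")   # ~1.7 per hour
```

The exact numbers don't matter — the point is that at fleet scale, individually rare failures become a routine, hourly event.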
So, what's the first way to create a more reliable and more performant disk array? I'm going to use cheap disks — they have equations, they build models of the disks' failure characteristics — but what's the easiest way to do this? What's the RAID 1 way to do it? How does RAID 1 work? There's a slide up there that explains it; it's not that complicated. Yeah. Yeah — with RAID 1, I duplicate everything on both drives. And to help understand RAID, I brought a prop today. So imagine I have a piece of data — like some question on the exam — and you guys are interested in that piece of data, and you want to make sure I store it reliably. Failures are represented by me shooting one of these guys with the Nerf gun, right? So what I can do is tell both of them the same thing. And then if I happen to shoot one of them — that's not working very well. What's that? I think they're helping me debug my prop; we haven't used these in the lab for a while, it's too bad. Ah, there we go, okay. There we go. So Vicky has failed, right? Do we still know the information? Who are you going to ask at this point? Yeah — so we still know. That's good. This is RAID 1. It's not complicated. But on some level, Vicky and Matt also have a limit to how much I can tell them, right? They're not going to remember an infinite amount of information. So if I'm always telling them the same things, how much data can I store on this pair of disks? Remember, I started with one disk worth of data; now I've bought two of that same disk. What's the capacity of the RAID 1 array? Yeah — the efficiency of the array is essentially half, because every byte is literally written twice. Does that make sense? Now imagine that every time I ask one of them a question, it takes a minute to respond. Because they both know the same information, what's the overall throughput I can get out of both of them? How does that change? Remember, disks are physical devices — they've got seeks and so on. So if I want to ask them a bunch of questions in a row, how do I do that to get the most throughput? Yeah — I split them up. I ask one of them one question, and while they're remembering, I ask the other, and I go back and forth. So the read bandwidth is actually better than a single disk: I have two sets of heads, and I can send requests to either disk. What about the write throughput? Can I do the same thing with writes? Yeah — it's either the same or a little bit worse. Why would writes slow down a little bit? It's not that either disk seeks slower; it's that I get the worse of the two seek times. When I write to a RAID 1 array, I have to wait until both disks store the information. So depending on where their heads are positioned, I'm essentially drawing two random numbers from a distribution and taking the larger one every time, because I have to wait for both writes to complete before the data is safely on the drives. All right, does this make sense? Yeah — not on a RAID 1 array, right? The goal in RAID 1 is that the two copies are identical, and you can imagine that if I have some failure, I can reconstruct the array: if I find a mismatched block, I just choose one of the copies. I keep them in sync, yeah.
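Here's a minimal sketch of that mirrored pair, assuming we model each disk as a simple block map. It's illustrative, not a real driver, but it shows the three properties just described: writes hit both mirrors, reads can alternate between them, and recovery is just copying the survivor:

```python
# A toy RAID 1 (mirrored) pair. Disks are modeled as dicts from block
# number to data; a sketch of the idea, not a real block device.
class ToyMirror:
    def __init__(self):
        self.disks = [{}, {}]
        self.next_read = 0          # used to alternate reads across mirrors

    def write(self, block, data):
        # A write must land on BOTH mirrors before it's durable, so write
        # latency is the max of the two disks' latencies (the slower seek).
        for disk in self.disks:
            disk[block] = data

    def read(self, block):
        # Either mirror can serve any read; alternating between them is a
        # simple way to get roughly 2x the read throughput of one disk.
        disk = self.disks[self.next_read]
        self.next_read ^= 1
        return disk.get(block)

    def recover(self, failed):
        # If one disk dies, rebuild it by copying the surviving mirror.
        survivor = self.disks[1 - failed]
        self.disks[failed] = dict(survivor)
```

And the capacity story is visible right in the sketch: every block is stored twice, so you only get half of what you paid for.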
But with a RAID 1 array, the goal is that I always keep them in sync, so when I write, I write to both of them synchronously and wait for both writes to finish before the write is done. Yeah, good question. All right, so there are a couple of problems here. The main problem is capacity — I've lost quite a bit of space. But there's another problem: okay, so the read performance here is better, but the capacity is the same as one drive, and I bought two. So I'm not super happy about that. Okay, the second thing they think about is RAID 2. Now, only a couple of these RAID levels are really alive anymore — does anyone know which? There are five original RAID levels; how many are still in use? Zero is not true RAID — zero is "I want to lose all my data" RAID; we'll come back to RAID 0 in a minute. RAID 1 and 5. (RAID 10 also came after this paper, right?) So yeah, RAID 1 and 5 are still around; the interstitial levels are pretty much gone — they're just not used. So here's the problem. What happens if I ask both of these two guys a question and they give me different answers? One of the disks is corrupt — the data on it has been broken or something like that. I told it to store certain information, and it stored the wrong information. So what we do in RAID 2 is say: not only am I going to store the data, I'm going to store extra information — a Hamming code alongside the data — that allows me to recover from a certain number of failures. And if you've taken courses on coding theory, you know I can design these codes to survive an essentially arbitrary number of failures. So imagine here I've got seven disks. You can see I'm writing four data bits, and then I have these three extra bits I have to write. Those three extra bits allow me to reconstruct the original values even if one of those disks starts to lie to me — it tells me the wrong information; I store a zero and it returns a one. So this is RAID 2. What's the capacity here compared with RAID 1? How much extra information am I storing? RAID 1 wrote everything twice. Here, when I write four bits, I store how many extra bits? Three. This is better. I'm being a little more clever: I'm using coding theory to reduce the amount of extra information I have to store, so the capacity is a little better — I'm storing a little less redundant information. However, reading or writing any one of these values still requires most of the disks: if I change this value, for example, I have to change a bunch of the Hamming-code values to make sure they still match. So writing one bit potentially requires modifying a bunch of other bits. And when I read, I have to read all of the individual values and then check the extra code bits to make sure everything is consistent. So what's — yeah, sorry. Yeah, it turns out you only need three in this case. To handle one failure, you only need three, right? And maybe this even handles two failures, I can't remember — a code like this can detect, though not correct, a second error. You can convince yourself of this pretty easily if you look up coding theory. When I'm duplicating information, I'm actually storing more information than I need to recover stuff.
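As a concrete illustration of the RAID 2 idea, here's a tiny Hamming(7,4) sketch — four data bits, three check bits, one bit per "disk." It's a toy (real RAID 2 striped wider words across more spindles), but it shows how the array recovers even when a disk returns a wrong value without admitting it:

```python
# Toy Hamming(7,4): 4 data bits + 3 check bits, as in the RAID 2 picture.
# One codeword can correct any single flipped bit -- even when we don't
# know which "disk" (bit position) lied to us.

def encode(d):                      # d = [d1, d2, d3, d4], each 0 or 1
    p1 = d[0] ^ d[1] ^ d[3]         # parity over positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]         # parity over positions 2,3,6,7
    p4 = d[1] ^ d[2] ^ d[3]         # parity over positions 4,5,6,7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]   # positions 1..7

def correct(c):                     # c = 7-bit codeword, possibly corrupted
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # re-check parity group 1
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # re-check parity group 2
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]  # re-check parity group 4
    syndrome = s1 + 2 * s2 + 4 * s4 # 0 = clean; else 1-based error position
    if syndrome:
        c[syndrome - 1] ^= 1        # flip the lying bit back
    return [c[2], c[4], c[5], c[6]] # recover the 4 data bits

word = encode([1, 0, 1, 1])
word[4] ^= 1                        # one "disk" returns the wrong bit
assert correct(word) == [1, 0, 1, 1]
```

Notice what the syndrome buys you: it doesn't just say "something is wrong," it names the position of the lying bit — which is exactly the power you need when disks can silently return bad data.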
Okay, so this model — RAID 2 — assumes that disks have this funny property, which is that I can write one value and then read back a different value from the same disk. Is that how disks tend to fail? I know you guys weren't around in 1988, but based on your understanding, do you think this is how disks fail? No. So, in RAID 3, what they start to assume is that the disk itself can tell when it has failed. If I have six of you guys and I ask you for values, and one of you has failed, you'll say "I don't know" — or you'll just, you know, whatever. So here I'm assuming there's some disk-level controller that can detect failures, so I don't need to worry about a disk telling me the wrong value. The disk will never return an incorrect value. Instead, the disk will lose the value: it will say it's not sure, or it won't return a result. You can imagine this is the equivalent of just yanking a disk out of the array. The disk is gone; the array knows it's gone; the disk isn't even powered on anymore. If I send a command to that disk, I'm not going to get incorrect data — I'm going to get no data. Now, the nice thing is, if I make that assumption, I can reduce the amount of extra information I need to a much smaller amount: one bit. If I want to be able to correct the loss of any one bit in a group of bits, I store one extra parity bit. If any of those bits goes missing, I use the extra bit to determine what it was, and I can correct it. Does this make sense? In RAID 2, the coding has a lot more in common with the coding you'd use for things like wireless signals, where values actually can get silently corrupted; this erasure model is much more akin to how disks actually fail. So if I give you guys five different one-or-zero values, I compute the parity bit by summing them all up and determining whether the sum is odd or even, and that's the extra bit I store. That way, if one of you fails, I use the parity bit to determine whether the missing value was a zero or a one. Does that make sense? That's confusing — hopefully you guys understand it. So here, compared with RAID 2, I've made an even bigger improvement in capacity, because I'm storing less redundant information: I only need one bit of redundant information rather than three for that four-bit sequence. And to some degree, the rest of the system is unchanged. Yeah, Ron. Ah, okay, this is a great question. What happens here if two drives fail? Data loss, exactly. Yeah. And if you look at the paper, they have all these calculations about the probability of that happening. In a lot of cases, if you have a RAID array that can tolerate one disk failure, what happens when a disk fails? What does the array stop doing? Like, I have a RAID array with five disks in it; one of the disks dies, and the array can tell. If I've configured it to handle one failure, what will the array stop doing? Yeah — maybe stop reads too; just stop working, right? Usually, you're right: usually what will happen is the array won't let you write anymore, because if you lose one more disk, this data is gone. So what's your job at that point? Replace the disk. Yeah, exactly. Again, this is the same design principle that everybody uses to build everything now.
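Here's a little sketch of that parity trick, at block rather than bit granularity. The key assumption is the erasure model just described: we know which disk died, so XOR-ing the survivors with the parity recovers it. A toy in Python, just for illustration:

```python
# The RAID 3/4/5 parity trick: under an erasure model (we KNOW which disk
# died), a single XOR parity "disk" can reconstruct any one missing block.
from functools import reduce

def parity(blocks):
    # XOR all data blocks together, byte by byte -- the check disk's contents.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def reconstruct(surviving_blocks, parity_block):
    # XOR the survivors together with the parity to recover the lost block.
    return parity(surviving_blocks + [parity_block])

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # four data disks
p = parity(data)

lost = data[2]                                # disk 2 gets yanked out
rest = data[:2] + data[3:]
assert reconstruct(rest, p) == lost
```

The same XOR serves both directions — computing the check block on writes and regenerating a lost block after a failure — which is why one parity disk is enough to survive any single erasure.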
If you took half of the computers in Google's data centers and could get them all to fail at the same time, you know what would happen? Data loss, right? Who knows what it would be — your old emails, pictures, whatever, YouTube videos are gone — but it would happen, I guarantee it. Nobody can survive that kind of outage. Google probably has plans to survive one or two of their data centers being down, or hit by a nuclear strike or something like that. But if I could take half of their computers and crash them and destroy the drives, they would lose data. Probably, right? I doubt their failure margins are that high. But as individual machines and disks fail, they replace them, the replacement disks are reloaded with the lost information, and the whole system just keeps going. So as long as there are small numbers of failures and the system is built around that, the components can be replaced in time to keep it operational. But if you can wipe out a bunch of things all at once, you're in trouble. This is why people say: if you really want your data to be secure — like, this is super-important data, it has to survive the potential extinction of our current human civilization; aliens need to be able to find it when they make their way to our planet in 100,000 years — why is it not sufficient to put it on one RAID array? It's not safe. Why not? What's that? Well, no — I'm going to argue: if one drive fails, I replace the drive, and I keep maintaining that array. No — why is it not safe? Sorry, what is it? If your building burns down. Yeah, exactly. What if I hit your building with a missile, right? You're in trouble — the data's gone. So you've got to have one copy here, and one copy halfway around the world, and another copy buried at the bottom of a mine, and blah, blah, blah, if you really want to keep things safe. Yeah, you just duplicate stuff, right? And you can certainly move the disks around — just don't have everything physically co-located. It's interesting — you guys may not know this; I found it really fascinating — insurance companies handle claims differently if a missile hits your house during a war versus not during a war. You can file a claim if a missile hits your house, but not if the area you're in is officially part of a war. I guess that makes sense, but I thought it was interesting; there's a lot of fine print in those insurance policies. What's that? No, I just heard this on the radio. It was pretty interesting. But to some degree it's related, because it has to do with correlated risk: insurance companies don't want to insure things where a bunch of things go wrong at the same time. That's why flood insurance is so expensive — a flood destroys a whole bunch of houses all at once, all in the same place. Okay — and that would also destroy your RAID array. So, RAID 4. The difference between RAID 3 and RAID 4 essentially boils down to this: in RAID 3 I'm storing small slices of the information on each disk — you can imagine bit one, bit two, bit three, check bit; bit four, bit five, bit six, check bit. It would probably be better to have five disks in this example, because then I could store nibbles of four bits, but you get the point.
And so the problem here is: if I want to read, like, a byte, how many disks do I have to contact? All of them, okay. It turns out that's kind of dumb, because I can compute the checksums at a much coarser granularity. I can take a whole 256 bytes and produce checksums over big chunks of it. So what we do in RAID 4 is use a much larger stripe size — I store larger blocks of data. In this array, say the block size is 256 bytes or something like that: how many disks do I have to contact to read or write one byte? Just two, right? Whichever disk the byte is on, and disk three, which is the check disk. So on this system, these are my data blocks, and this is the check block, stored on the fourth disk. This makes sense. And it improves my throughput, because I can use more of the disks in parallel. So what's the last problem here? It's kind of alluded to on the slides — there's one thing I can do a little better, one final optimization. We said that reading or writing bytes on this array — particularly writing, but certainly reading — requires two disks. So if the byte is here, which disks do I have to access? Zero and three. What if the byte is here? What if it's here? What do all three of those have in common? Disk three. So the check disk in this RAID 4 scenario becomes a bottleneck: every operation on the array requires the check disk. So what's the natural way to fix this? Yeah — I stripe the parity information across all the disks, and this is RAID 5. This is still around; people still use it. Here you can see that disk one has data blocks for A, B, C, and a check block for D, and the next disk has data blocks plus these two check blocks here. So if I want to modify a byte here in C, I have to talk to disk zero and disk one. If I want to modify a byte here in B, I talk to disk one and disk two. Now I've distributed the load for all the checksums across the disks equally. Does that make sense? Yeah, Steve. Well, remember, I don't want check bits next to their data, because the check bits have to be somewhere that's still alive when the data fails. So it's always required that data and its checksums live on different disks, right? So, for example, let's say disk two fails. I lose this guy, but I can reconstruct it using this check block. I lose this guy, and I can reconstruct it using this check block. I lose A3, which I can reconstruct. What about B? I've lost a check block — what do I do? I reconstruct it using the data blocks, right? I use the data blocks to generate the checksum anyway, so I can definitely reconstruct it. That's what would happen if I took a disk out of this array. And this improves write performance, because when I write, I'm no longer hammering that one disk where all the checksums are located. Okay, so here's RAID 0. I have a RAID 0 array, but that's because it stores things I don't care about, right? Like your midterm exams — the scans of them, which I don't really care about. Does anyone even know what RAID 0 is? Can anyone describe RAID 0? Fake RAID, non-RAID, bogus RAID — what do I do? Yeah: I essentially take a file system and put half the blocks on one drive and half the blocks on the other drive. So how much redundancy do I have? Zero — hence, RAID zero. There's no real RAID here.
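Before going on, here's a quick sketch of where logical blocks land in the two layouts we just saw: RAID 0's bare striping and RAID 5's rotating parity. The mapping functions are one plausible choice, just for illustration — real controllers differ in the exact rotation:

```python
# Where does logical block b live? A sketch of the layouts just described.

# RAID 0: pure striping, no parity anywhere.
def raid0_map(block, ndisks):
    return block % ndisks, block // ndisks      # (disk, offset on that disk)

# RAID 5: parity rotates across the disks so no single check disk becomes
# the bottleneck. This rotation is one illustrative choice; real
# controllers (e.g., left-symmetric layouts) vary in the details.
def raid5_map(block, ndisks):
    stripe = block // (ndisks - 1)              # ndisks-1 data blocks/stripe
    parity_disk = stripe % ndisks               # parity moves every stripe
    disk = block % (ndisks - 1)
    if disk >= parity_disk:                     # skip over the parity slot
        disk += 1
    return disk, stripe, parity_disk

# A write to one data block touches exactly two disks -- its own and the
# stripe's parity disk -- and the parity disk differs stripe to stripe.
print(raid5_map(0, 4))   # (1, 0, 0): stripe 0's parity lives on disk 0
print(raid5_map(3, 4))   # (0, 1, 1): stripe 1's parity lives on disk 1
```

The thing to notice in raid5_map is that the parity slot moves from stripe to stripe, so no single disk sees every write — that's the whole fix over RAID 4. And raid0_map has no parity slot at all.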
The performance is fantastic, though, because half the I/O goes to one drive and half goes to the other, so you have twice the bandwidth over the disk interfaces and twice the performance of a single disk. So the performance is great, and the capacity is also great, because there's no check information anywhere. There's no redundancy whatsoever — I just hope things go right. All right. So the original RAID proposal was that I could tolerate the failure of one drive. Has anyone set up a more modern RAID system? Using what? ZFS. ZFS, yeah, that's a good example. If you want file system features, check out ZFS — ZFS has a crapload of features. One of them is the ability to use multiple disks where you can parameterize the fault tolerance. So on ZFS — and you can imagine generalizing these ideas to other RAID systems — you can say: I want to survive two failures at the same time. Then I just have to store some more check information. It increases the amount of redundant information I store, and I have to think about how things are striped, but I can certainly build a RAID array that survives an arbitrary number of failures. The nice thing about this is that the RAID array continues to work even when one drive is missing. So if one drive goes down, you can imagine somebody gets an email saying, by the way, you need to buy a new drive — and while I'm out at the store doing that, the array is still working. If I lose two drives, then it says: forget it, I can't go on. But one drive gives me time to go replace it without the array being down. On RAID 0? Typically, yeah — that's what's fun about RAID 0: you're living on the edge. Imagine half of your file system's data structures are on one drive. Normally, when you build a RAID 0 array, the file system doesn't really know about it; the file system just thinks, wow, that's a really big disk you have there, and it puts some of its data structures on it. So imagine I took a file system and killed off half its blocks. Would it continue to work? Probably not — that's a huge number of blocks; you'd hit inodes and other structures that matter. It's just very unlikely to survive. But yeah, if you want to live on the edge and get great performance, go RAID 0. It actually turns out that from a reliability perspective, RAID 0 is worse than a single disk. Why? Yeah, exactly. Imagine I've got two disks, and every disk I buy is going to work for a certain period of time, or a certain number of writes. With a normal file system, it works until the disk fails. With RAID 0, it's like before: I pick two random numbers from a distribution, and I get the shorter one. That's when my file system dies. So that's a little sad. Okay. Yeah, so this is what happens. So when you put in a new disk — rebuilding the array seems sort of trivial, right? How long do you think that takes? Big RAID array, a disk goes down, I throw a new disk in, and it's just — sweet, ready to go? This is actually another interesting part of RAID: the story doesn't end when I put the new disk in. Why not? Yeah — it basically has to rebuild that drive. Remember, to get back to a working RAID 5, if I lose this disk, I have to regenerate all the data that was on it. And that takes a long time.
Particularly if you have a big RAID array — multiple terabytes or something like that — you lose a disk, and it can take days to rebuild. Unfortunately, while the array is being rebuilt, it's still vulnerable to data loss: the new drive isn't back up to full redundancy yet, so if another drive fails during the rebuild, I'm in trouble. All right. Any questions about RAID, or redundancy in the real world? Yeah. Yeah, there are actually companies doing this — I'm assuming it's going to take off at some point. My wife is a photographer, so we've talked about backup a bunch of times, and I always find it a really frightening conversation, because the solutions are all really expensive. But there are companies that will send you a storage device to put in your house — say it has four terabytes of storage, and you get some percentage of that, right? It networks itself with all of these other devices all over the place and builds a redundant data system out of them, so the pieces of information required to rebuild your data are stored all over the place. So you can certainly do that. And if you think about companies that run big data centers, one of the reasons they geo-distribute them is performance, obviously — they're closer to the people using them — but it's also fault tolerance, right? Because look, if a hurricane hits Florida, it could take out a few data centers. (I don't think people build data centers in Florida — it's too hot. And there are hurricanes, right?) You don't want a single point of failure, so geo-distributing things is something big companies do, and I think something you'll see individuals start to do as well. That's what I would love — I don't know. When we looked into this, those solutions just weren't very mature yet. Maybe it's time to take another look. Any other questions? All right, so next week we're going to do a couple of lectures on OS design: how do I organize the operating system itself? Where do I put things? What are the implications of those choices? We'll start that up on Monday. I'll see you then.