All righty, welcome back to operating systems. I'm sideways. That's cool. All right, there's me. There we go. So hopefully everyone had at least a restful reading week where you didn't really have to pay attention to this course. Let's get back to the next topic. We'll just carry on, because the other sections already had their lecture on the Friday before reading week, so we get to carry on with some content. It looks like we'll make tomorrow more review time, or at least no-new-lecture time. We can do that because, yeah, our midterm is on Wednesday, and it is all finalized and printed, so there's no going back now.

All right, so, on the topic of disks. Some people, once they get into this course, start talking about magnetic disks, and I assume most of you, because you were born with a year starting with two, probably have no idea what the hell a magnetic spinning disk is and have never seen one at all. Has anyone used the original iPods, those big bricks? No? Well, how many people here have seen a hard disk drive, like a big drive? That's a good portion of us. Not everyone though, because, well, guess what? It's 2023 or whatever the year is. So you probably don't know what spinning magnetic disks are unless you're a data hoarder like me, or you're Google or something like that. So, for those of you who don't know what a hard disk drive is: it's a big magnetic spinning piece of metal, and through the power of magnets it stores zeros and ones. As you might imagine, it's a big spinning disk with a head that reads the data off it. And if you have hardware like that, well, guess what? Not all accesses are equal, because the platter is literally rotating under a head. You have to schedule accesses, and you have to figure out where data is: if it's laid out in a straight line, reads probably go fast; if it's in random places, the head physically has to move, so access is going to be really slow. But we won't talk about that.

We'll talk a bit more about solid state drives, or SSDs, because that's what we all know and enjoy. They use transistors, more like RAM, instead of magnets to store data. The pros: no moving parts, so no physical limitations; high throughput and good random access, because it doesn't matter where the data is physically located, all accesses take the same amount of time; more energy efficient, because spinning a hunk of metal at 7,200 rotations per minute takes real energy; and better space density, since hard drives are physically quite big and SSDs are a lot smaller. The cons: they're more expensive. You can easily buy a 20-terabyte hard disk drive for a few hundred dollars; if you tried to buy an SSD that big, you'd go broke, because it would be a few thousand dollars. And, at least back in the day, solid state drives had a limited number of writes you could do to them before they actually wore out. The technology has gotten a lot better since then, but when they were new, people worried about their drives running out of writes: after a year or two, suddenly your drive is no longer usable because you've used up all its writes. Solid state drives are also more complicated to write drivers for, although that's debatable.
Why are they more complicated to write drivers for, or actually interact with, if you are the operating system? Because the actual layout looks something like this. There's a big chip called a die, and within it, the die is divided into planes, which we can safely ignore. The only parts that matter for programming are the two lower levels: blocks and pages. Pages are typically four kilobytes, which conveniently is the same size as our pages in virtual memory, although it doesn't always have to be; it just makes your life a bit easier if it lines up with the page size of your virtual memory system. And note that several pages live within a block: a block contains many pages, typically 128 or 256.

Typical performance numbers: reading a page is really, really quick, something like 10 microseconds. Writing a page is a bit slower, something like 100 microseconds. And then the other operation is erasing a block. You might notice this is not erasing a page; you erase a whole block, and it is excruciatingly slow: about one millisecond.

So this is how SSDs work, and it's kind of weird, because it makes writing drivers a lot more complicated. The rules are as follows: you can only read complete pages, and you can only write to freshly erased pages. So I can't just write a page and then overwrite it; the page needs to be freshly erased before I'm allowed to write to it again. That's where most of the complications happen if you actually have to interact with these things. Why is it laid out like that? I don't know, I'm not a hardware designer; you'd have to ask them why it works this way. I assume physics.

Erasing is done at a per-block level, a block has 128 or 256 pages, and the entire block gets erased at once. Remember, we can only write to freshly erased pages, and we can only erase whole blocks at a time. So if, for instance, there are 255 pages in a block you want to retain, but you want to overwrite some data on the one remaining page, well, guess what? You're going to have to move those 255 pages to a different block, then erase the whole block, just to rewrite that one page. So these can get kind of annoying to work with, and writing is slow: we might have to use a new block, and we have to keep track of the blocks.

So the OS can help the hardware. The SSD has a controller on it, and it might need to do things like garbage-collect blocks: if a block isn't being used anymore, it can go ahead and erase it, making all those pages usable again. But the disk controller is pretty stupid. It doesn't know what data on the drive is still active, or which pages are still actually being used by the operating system to represent files, so it doesn't know which blocks are still alive and which are dead. A file can be deleted by the operating system without the data actually being erased, so the stale data would still be sitting on the disk wasting space, and the SSD might think the disk is full. So the operating system has an option, specifically for SSDs, called TRIM: a way to tell the SSD, hey, I'm not using this block anymore, so you're free to erase it whenever you're otherwise idle.
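To make those rules concrete, here's a minimal sketch of a flash-style device model. This is not a real driver; all the names, sizes, and in-memory arrays are made up for illustration. It just enforces the three operations and shows why overwriting a single page in place is so painful:

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE        4096  /* typical page size, matches a VM page */
#define PAGES_PER_BLOCK  128
#define NUM_BLOCKS       32

/* One erase block: its pages plus flags saying which are freshly erased. */
struct block {
    uint8_t data[PAGES_PER_BLOCK][PAGE_SIZE];
    int     erased[PAGES_PER_BLOCK];  /* 1 = freshly erased, writable */
};

static struct block flash[NUM_BLOCKS];

/* Reads are cheap (~10 us on real hardware): any page, any time. */
void read_page(int b, int p, uint8_t *out) {
    memcpy(out, flash[b].data[p], PAGE_SIZE);
}

/* Writes (~100 us) only succeed on freshly erased pages. */
int write_page(int b, int p, const uint8_t *in) {
    if (!flash[b].erased[p])
        return -1;  /* no overwriting: the block must be erased first */
    memcpy(flash[b].data[p], in, PAGE_SIZE);
    flash[b].erased[p] = 0;
    return 0;
}

/* Erases (~1 ms) work on whole blocks only, never single pages. */
void erase_block(int b) {
    memset(flash[b].data, 0xFF, sizeof flash[b].data);
    for (int p = 0; p < PAGES_PER_BLOCK; p++)
        flash[b].erased[p] = 1;
}

/* Overwriting one page means relocating every other live page in the
 * block to a spare (already-erased) block, then erasing the old block. */
void overwrite_page(int b, int p, const uint8_t *in, int spare) {
    for (int i = 0; i < PAGES_PER_BLOCK; i++)
        if (i != p)
            write_page(spare, i, flash[b].data[i]);  /* move 127 pages */
    write_page(spare, p, in);  /* new data now lives in the spare block */
    erase_block(b);            /* old block is free again: the 1 ms hit */
}

/* TRIM, roughly: the OS says "this block holds no live data", so the
 * controller can erase it whenever it is otherwise idle. */
void trim_block(int b) {
    erase_block(b);
}
```

Note how overwrite_page is two orders of magnitude more work than write_page; that's exactly why it matters that the OS tracks which blocks are live and tells the SSD about dead ones.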
So because of that, the OS works with the hardware controller, and together they can actually erase blocks that are otherwise unused. That's about all we need to know there; that was just a crash course on some hardware.

So far we've been talking about single devices, single hard drives. How many of us have more than one hard drive, more than one storage disk, in their computer? We've got like one person with two. So what happens if that one drive dies in everyone else's computer? Blue screens, everything, and that data is now gone, right? If that's your lab work or your thesis and your hard drive dies, it's gone. No way to recover it. Not good, right? So if you're Google, or someone who cares about their data: having just one drive is sometimes jokingly called a SLED, a single large expensive disk. It's just one large disk, which is what most of us have, and if it fails, we're otherwise screwed.

What people do when they actually care about their data is use a technique for combining multiple disks called RAID, a redundant array of independent disks. That's a way to take multiple disks and use them together, with two main goals: redundancy, which just means keeping multiple copies of something, to prevent data loss, or making things go faster. There are several different levels of RAID, named RAID followed by a number.

The first one we'll talk about is RAID 0, otherwise called a striped volume. It essentially takes the data and divides it into little stripes, portions of 128 or 256 kilobytes, and distributes that data over the disks. This is done primarily for speed. Say I have one gigantic file called A and I break it up into eight parts. I can store the odd-numbered parts on disk 0, so A1, A3, A5, A7, and the even-numbered parts on disk 1, so A2, A4, A6, and A8. You'd want to do this if you care about performance, right? Now if I want to read file A, I can read half from one disk and half from the other in parallel, so compared to a single disk it goes twice as fast.

But now what happens if one of these hard drives dies? Do I lose file A? Yeah, I lose half the file, and half a file is about as good as no file, because it's probably unrecoverable at that point. I can't just make up the other half. So this is good for speed: with two disks I get twice the read performance and also twice the write performance, because instead of writing the entire file to one disk, I can write half to each disk in parallel. All my operations go two times faster, but I'm at an even bigger risk of data loss, because now if either of my hard drives dies, I lose all my data. No good. Or at least half of it, which is probably indecipherable. And I can extend this to more disks: with three disks I just split everything into thirds, so I can read and write three times as fast, but now if I lose any one of those three disks, I'm screwed, I have lost data. Any questions about RAID 0? So, RAID 0: just divide it up among the drives, go faster, live free, die hard, or something like that.
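Here's a small sketch of the striping arithmetic, assuming a 256-kilobyte stripe unit and two disks (both numbers are just illustrative). Given a logical byte offset into the array, it computes which disk holds it and where on that disk:

```c
#include <stdio.h>
#include <stdint.h>

#define STRIPE_SIZE (256 * 1024)  /* bytes per stripe unit (illustrative) */
#define NUM_DISKS   2

/* RAID 0: logical stripe i lives on disk (i % n), at stripe slot (i / n). */
void raid0_map(uint64_t logical, int *disk, uint64_t *disk_offset) {
    uint64_t stripe = logical / STRIPE_SIZE;   /* which stripe unit */
    uint64_t within = logical % STRIPE_SIZE;   /* offset inside it  */
    *disk        = (int)(stripe % NUM_DISKS);
    *disk_offset = (stripe / NUM_DISKS) * STRIPE_SIZE + within;
}

int main(void) {
    /* The eight parts of file A alternate between disk 0 and disk 1,
     * so a sequential read can pull from both disks in parallel. */
    for (uint64_t i = 0; i < 8; i++) {
        int disk;
        uint64_t off;
        raid0_map(i * STRIPE_SIZE, &disk, &off);
        printf("A%llu -> disk %d, offset %llu\n",
               (unsigned long long)(i + 1), disk, (unsigned long long)off);
    }
    return 0;
}
```

Running this prints A1, A3, A5, A7 landing on disk 0 and A2, A4, A6, A8 on disk 1, which is exactly the odd/even layout from the example above.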
So with RAID 0, data is striped across all the disks, and you get faster access because you can use all the disks in parallel, at roughly n times the speed, where n is the number of disks. If I have three disks, my read and write performance is three times as fast. The bad thing is that any single disk failure means data loss. So if you're using this for games or something like that, sure, do RAID 0: you get lots of performance, and if one of the drives dies, you just replace it and reinstall everything again.

RAID 1 is the polar opposite of RAID 0. Instead of going fast, RAID 1 cares about saving data, about data integrity, making sure you keep all your data. RAID 1 is also called a mirror, and it just makes each disk an exact copy of the others. So if I have my file A, which in this case is only in four parts, each disk has a complete copy of that file. Now, if I lose, say, disk 1, have I lost any data? No, because disk 0 has an exact copy of all the data. And if I did this with three drives that were all exact copies of each other, could I lose two drives and still have my data? Yeah, sure thing. As long as there's one left, I'm all good. With four drives, doesn't matter, I can lose three. As long as one's left, I still have all my data.

What about performance? Is my read performance going to be better? You might think not, because the disks are all the same, but in fact I can still read in parallel: with two disks, I can do the same thing as the striped layout and read the odd-numbered parts from disk 0 and the even-numbered parts from disk 1, so I read twice as fast. But what about writes? Everything is an exact copy of everything else, so I have to write the data to every single disk. I can do that in parallel, but all of our speedups are measured against a single disk, so writing is just the same performance as one disk. So for reading I get an n-times speedup, but for writing I'm still at single-disk speed. The major benefit here is that you will not lose your data unless you've lost every single drive in your computer.

So this is something you do if you really, really care about your data, but hardly anyone does it, because it's simple but kind of wasteful: every disk is an exact copy of another disk. It's essentially like making an automatic backup, so that if one of the drives dies, you're fine. Typically, people only use this with two drives, when they don't have any other options. So we have good reliability: as long as you have one disk left, you have no data loss. We have good read performance, because reads can be split among the disks, since they all hold the same data. But our write performance is the same as a single disk, and it's a high cost for redundancy, because I gain no capacity from using multiple drives. For instance, if both of these drives are two terabytes, how many terabytes can I actually use with RAID 1? Yeah, just two terabytes, because they're exact copies of each other. With RAID 0 and two two-terabyte drives, I could actually use four terabytes, right? Because there the data is just spread out over every drive, whereas in RAID 1 the drives are exact copies of each other.
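And a matching sketch for the mirror (again, sizes and names are invented for the demo): writes fan out to every disk, so they finish at single-disk speed, while reads can be load-balanced across the identical copies:

```c
#include <stdint.h>
#include <string.h>

#define NUM_DISKS       2
#define BLOCK_SIZE      4096
#define BLOCKS_PER_DISK 1024

static uint8_t disk[NUM_DISKS][BLOCKS_PER_DISK][BLOCK_SIZE];

/* RAID 1 write: every disk gets a full copy. Even done in parallel,
 * we only finish as fast as one disk's write, since all must complete. */
void raid1_write(uint64_t block, const uint8_t *in) {
    for (int d = 0; d < NUM_DISKS; d++)
        memcpy(disk[d][block], in, BLOCK_SIZE);
}

/* RAID 1 read: all copies are identical, so any disk will do.
 * Round-robin by block number spreads a big read across all disks,
 * which is where the n-times read speedup comes from. */
void raid1_read(uint64_t block, uint8_t *out) {
    int d = (int)(block % NUM_DISKS);
    memcpy(out, disk[d][block], BLOCK_SIZE);
}
```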
So what can I do to get the best of both worlds, something in between RAID 0 and RAID 1, where I get some more performance and some more redundancy without sacrificing too much? The next thing we can talk about is RAID 4. You might notice we have skipped RAID 2 and RAID 3. RAID 2 and RAID 3 were just terrible ideas, products of their time that had to deal with the hardware that existed back then. RAID 4 is the next number that's actually used.

The idea behind it is to stripe the data like RAID 0 over the disks, and then reserve a single disk for parity. And what is parity? Parity is just some extra information such that, if you lose some data, you can reconstruct it. Typically the parity is just an XOR of everything. Does everyone remember how XOR works? Yeah, XOR, exclusive OR. For XOR across multiple arguments, an intuitive way to think about it: add all the bit values together, and the result is 0 if the sum is even and 1 if it's odd. That's the idea here: there's one parity disk, and AP is just the XOR of A1, A2, and A3, such that if I lose any one of those, I can reconstruct the data.

So let's see how that works. Say A1, A2, and A3 are 1, 0, 1. I can compute the parity, which is just the XOR of all of them: 1 plus 0 plus 1 is 2, which is even, so the XOR of all of them is 0. Now the beauty of this is that if I lose any one drive, I can reconstruct its data. For instance, if I erase A1, then knowing the parity and knowing A2 and A3, I should be able to reconstruct A1: some unknown plus 0 plus 1 needs to come out even, given that the parity says 0. So could A1 have been 0? No, because 0 plus 0 plus 1 is an odd number, which doesn't match the parity. So A1 must have been 1, right? Losing any one of these, I can reconstruct it. (There's a small code sketch of this parity math after the RAID 6 discussion below.) Everyone agree with me on that?

OK, what about if I lost two? Then the parity information is no good, because with two unknowns, more than one combination satisfies the parity: 1 and 0 works, but so does 0 and 1, and if you don't know which it is, you have a 50-50 shot of getting it right. You can sort of recover from this, depending on whether you know how the files are constructed and what should be there and so on, but in general you can't, because it's a coin flip.

So that's the whole idea behind RAID 4: I have as many drives as I want, I reserve one for parity information, and I stripe the data across all the others. In this case, if all four of these drives are 2 terabytes, how many terabytes can I actually use for my files? Six, right? Because one drive is just parity information, and the data is distributed across the other three. OK, what about read performance compared to a single disk? N minus 1 times, right? It's the same as RAID 0, but one disk is reserved for parity, so it's n minus 1 times the read performance. What about writes? So yeah, writes are a bit different: write performance looks like n minus 1 as well, but there's a lot of pressure on disk 3, the parity disk. Imagine that disk 0 is changing A1, disk 1 is changing B2, and disk 2 is changing C3.
Well, guess what? Those three writes all hit the parity disk. Each one writes a single block on its own disk, but the parity disk gets hit three times, and it quickly becomes a bottleneck, because it's involved in every single write no matter what: it holds the parity across every disk, so if you change any of them, you have to update the parity disk too. So with RAID 4 we can use almost all of the space, and read performance is pretty good, but write performance can suffer because we're bottlenecked on that parity disk. Also, these drives have a certain lifetime. In a big data center, it's not a question of if a drive will fail, it's a question of when, and here we're concentrating writes on disk 3, so disk 3 will probably wear out and fail before all the others. So RAID 4 is a good idea, but we can do a bit better.

RAID 5 is the exact same idea as RAID 4, and it's the next level that's actually used. It's just RAID 4, but instead of dedicating one disk to parity, it distributes the parity across all the disks to even things out. So in this case, disk 3 holds the parity information for A, disk 2 holds the parity for B, disk 1 holds the parity for C, and disk 0 holds the parity for D. This way we balance the load more evenly between the disks. There might still be some imbalance depending on how you use the files, but it's a lot better than RAID 4. Otherwise, it's exactly the same idea: we're still spending one disk's worth of space on parity, it's just distributed across all the disks. Any questions about this? So if all these drives are two terabytes, I can still use six terabytes of data, because one disk's worth is reserved for parity, just spread across all the disks so it's shared more equally. And again, how many disks can die without losing data? Just one, right? Because I only have one level of parity information, that single XOR.

So, story time. This is the setup I had; my thesis was on it. A disk drive died while I was working. Did I lose data? No, I was fine, right? But drives bought together tend to die together, and another one failed while I was still working, because I didn't want to go replace the first one. Did I lose data then? Oh yeah. Oh yeah, that was fun. So don't do what I did.

There is another RAID level that is slightly better than RAID 5: RAID 6. RAID 6 is the same idea as RAID 5 except it introduces a second parity calculation. So now, besides AP, which is the XOR of everything, there's also AQ, which is a complicated thing that you can ask computer scientists about. It involves Galois fields and some advanced math; roughly, it's a linear combination of XORs. Beyond that, I can't explain the math behind it. But effectively we're using two disks' worth of space for parity, distributed across all the other disks. In this case, if every one of these five disks is two terabytes, how much can I actually use? Six again, right? I essentially have two disks for parity and I stripe the data across the rest. So now if I lose one disk drive, I haven't lost any data. If I lose two, I still haven't lost any data. And if I lose three, now I am screwed. So this just introduces another level of parity.
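Here's the promised sketch of the single-parity math behind RAID 4 and RAID 5 (the stripe is shrunk to a few bytes to keep the demo small, and the names are made up): parity is the XOR across the data disks, and a single lost disk is rebuilt by XORing the survivors with the parity:

```c
#include <stdio.h>
#include <stdint.h>

#define DATA_DISKS 3
#define STRIPE_LEN 4  /* bytes per stripe unit, tiny for the demo */

/* Parity is the XOR of the data blocks: each bit is 0 if the data
 * bits sum to an even number, 1 if odd. */
void compute_parity(uint8_t data[DATA_DISKS][STRIPE_LEN], uint8_t *parity) {
    for (int i = 0; i < STRIPE_LEN; i++) {
        parity[i] = 0;
        for (int d = 0; d < DATA_DISKS; d++)
            parity[i] ^= data[d][i];
    }
}

/* If exactly one data disk dies, XORing the survivors with the parity
 * recovers the lost block, because x ^ x == 0 cancels everything else. */
void reconstruct(uint8_t data[DATA_DISKS][STRIPE_LEN],
                 const uint8_t *parity, int dead) {
    for (int i = 0; i < STRIPE_LEN; i++) {
        uint8_t x = parity[i];
        for (int d = 0; d < DATA_DISKS; d++)
            if (d != dead)
                x ^= data[d][i];
        data[dead][i] = x;
    }
}

int main(void) {
    /* The 1, 0, 1 example from the lecture: parity = 1 ^ 0 ^ 1 = 0. */
    uint8_t data[DATA_DISKS][STRIPE_LEN] = { {1}, {0}, {1} };
    uint8_t parity[STRIPE_LEN];

    compute_parity(data, parity);
    data[0][0] = 0xFF;            /* "lose" disk 0, i.e. A1 */
    reconstruct(data, parity, 0);
    printf("recovered A1 = %u\n", data[0][0]);  /* prints 1 */
    return 0;
}
```

With two dead disks the same XOR equation has multiple solutions, which is exactly why single parity only survives one failure, and why RAID 6 adds the second, independent Q calculation.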
With RAID 6 you lose some available space, and you need at least four drives to do it. Write performance will be slightly worse than RAID 5 due to the second parity calculation, and you get essentially n minus 2 times the read and write performance of a single disk. But you do have the flexibility of being able to lose two drives without data loss. Which one you use depends on who you are. If you've used the cloud, which everyone has, guess what? This is how they store your data so it doesn't get lost. What's the famous saying? The cloud is just someone else's computer. Someone else with multiple drives is doing this for you, and they have enough people using it that they can afford multiple drives; it's not a big deal. Maybe it's a bit too expensive for you to do yourselves, but if you're Google, well, they have thousands and thousands of drives, and it is literally someone's full-time job at the data center to go along with a cart full of hard drives and replace dead ones, if you're operating at the scale of Google or Amazon.

So, any questions about fun RAID stuff? Yeah, so the Q thing is not a parity of the parity, because it has to retain information across all the data, in this case A1, A2, and A3; I have to be able to recover any of that data, so Q has to involve all of it, not just be a parity of the parity. To get into what Q actually is, that's math courses, and you can look it up. It's basically a linear combination of XORs that works because of math, and I will not go further than that. All right, any other questions?

All right, so the textbook has a weird thing here, because these aren't the only options; you can combine levels. In the textbook, for some reason that I still to this day don't understand, when they say RAID 1, if you look at the fine print, they actually mean RAID 1 plus 0, something called RAID 10. So one thing you can do is combine RAID 1 and RAID 0. The idea behind that is I have a RAID 1 at the top, where I maintain two copies of an array of RAID 0 drives. So say I have six drives: I'd do RAID 0 across three and three. On one side, those three drives are just striped like RAID 0, so my data is A1, A2, A3 with no copies across those drives. Then, using the other three drives, I have a RAID 1 between the two halves, so they're exact copies of each other: A1, A2, and A3 again.

Now if I do something like this and each of these drives is two terabytes, how many terabytes can I use with these six disks for actual storage? Six, right? Because I essentially split the array in half, keep those two halves always equal, and within each half I stripe the data across the drives. So I can only use half the space. And how many drives can fail before I lose data? Yeah, it depends, right? So what's my worst case, the smallest number of drive deaths that loses data? Worst case is two, right? So say this drive dies.
Well, now I'm playing roulette: if I lose the matching drive on the other half, I have now lost data, because I no longer have any copy of A3. But I could have gotten lucky. That's the worst case; best case, I could lose this drive, no data loss, I still have a copy of A2, and then lose this one, no problem, I still have a copy of A1, and at that point, if I lose any other disk, I am now screwed. So this is an arrangement you can use if you'd like to essentially play the lottery. If I added another disk to each half, I'd still be at the limit where, worst case, two failed drives holding the same data means data loss, but now I could withstand up to four drive failures if I got lucky: if an entire mirror went, I could still survive without data loss. Any questions about that? This is pretty much a straight split between RAID 0 and RAID 1: I keep one extra copy of everything, and otherwise I just stripe across the drives.

Yeah, so you can combine the RAID levels in lots of different ways, and it depends on your tolerance for data loss, how many drives you have, and so on and so forth. Some people like RAID 6, where I can lose up to two drives. That might seem like a good idea, and with eight drives it probably is. But what about 40 drives, if you're Google? With 40 drives you probably don't want plain RAID 6, right? Only two failures allowed out of 40 is not a bet you want to make. RAID 1 plus 0 isn't great there either. So you can split it up other ways: I can combine RAID 0 with a bunch of RAID 6s. Take groups of eight drives, do RAID 6 within each group so that two drives per group can die, and then stripe across the groups so I don't waste any more space. That way, with 48 drives in six groups of eight, suddenly I can lose up to 12 drives, two per group, and be fine. Probably a better approach, and you can combine the levels any number of ways you want. It just depends on how much performance you want, how much you want to spend, and how much storage you actually want to get out of them. Most setups with a drawer of 48 hard drives will do some combination of RAID 6 in a bunch of little groups like that.

All right, any more questions about fun disk stuff? All right, so, disks: we've finally gotten to the last topic. They enable persistence. We explored two topics today: SSDs, how that hardware works, and RAID. SSDs are more like RAM, except they have this weird property that you can only access them in pages and blocks, and they need to work with the operating system because they behave quite strangely: you can only write to a freshly erased page, and you can only erase whole blocks at a time. So the operating system, in order to get the full performance out of the SSD, has to keep track of which blocks are in use, and for best performance it should also inform the hardware when it's done using a block, because it's no longer using those files or that data. And then we talked about RAID, which allows you to combine disks to tolerate failures and improve performance in a variety of different ways. So with that, we're done for today.