All righty, welcome back to Operating Systems. So our final, yeah, it's on a crappy day. It's also in crappy rooms: you're writing it in like a church and some other room, and the lecture rooms are like 10 minutes away from each other, which means if I want to be in all the rooms, that's really hard. So I emailed the university to complain and they replied that they're looking into it. If we get super lucky, it might change rooms or day. Hopefully day, because yeah, that day sucks. I will keep you in the loop on that. But yeah, again, not a great day. So today we finally get into the last of the three main topics. We saw a bunch of virtualization: virtual memory, threads, all that fun stuff. Also concurrency and data races. And our final one is persistence, so we finally get to start that. First we talk about disks. I assume most of you are young enough that you do not know what a magnetic disk looks like, like a big spinning disk. Has anyone ever used one of those? What, a few? Hey, we got a few. Okay, so if you have one of those, it's a big mechanical disk that spins around. You would have to do your own scheduling with that, and it's super inconsistent because it's a big hunk of metal that spins around while a head reads the magnetic surface. So it has really bad random access. Ideally everything you read is all in the same order; with random access, the data might be in other locations on the disk and the head has to move all over the place. Random access is terrible, and you also have to schedule which blocks you read when. I removed that material because most of you are young enough that you don't really use one, unless you're a data hoarder like me or you've seen the first ever iPod, the big thick boy. Well, it's thick because it uses a magnetic disk; that's before SSDs were really a thing. So we'll talk about SSDs just quickly, in minimal detail.
They're the more modern alternative. They use transistors, like RAM, to store data rather than spinning magnetic disks. So the pros: there are no moving parts or physical limitations. Hard drives, you can imagine, didn't like being dropped, because it's a spinning magnetic disk. They also didn't like magnets being near them because, again, they're magnetic: you corrupt all your data, or any of that fun stuff. They don't like heat. They have tons of physical limitations. SSDs have higher throughput and actually good random access because, since it's just electricity, electricity doesn't really care about the location of things. SSDs are also more energy efficient and have better space density, so you can have a disk this big that stores terabytes versus a big hunk of metal that has to be like this big. The cons: they're more expensive, and SSDs also have a limited number of writes, although with more modern ones the endurance, meaning how many writes you can do to them, will generally outlast the life of the device, so this is less of an issue now. Like, the first generations only lasted for about two years until you ran out of writes, and then you couldn't write any more data to them or change anything, so you had to buy a new one, and people were very displeased. Another con is the way that they work: they're more complicated to write drivers for, which is what we will get into in a bit of detail. So the internal layout of an SSD kind of looks like this. All of the storage is on something called a die, which is divided into multiple planes; within the planes there are blocks, which are the gray boxes here, and within a block live pages. The only important things we have to care about are blocks and pages. So page sizes: the pages might not necessarily be the same page size we use for virtual memory, but typically they are, so typically they're four kilobytes, and then a bunch of pages live on a block.
Typically it's like 128 or 256 pages that live on a block. So why do we have to talk about blocks and pages? Well, it's a bit weird. We can read a page, and reading a page is a lot faster than writing to a page: reading will be on the order of 10 microseconds, something like that, while writing to a page is going to be a lot slower, like 100 microseconds. And then we can only erase a block. You might notice here that we cannot erase a page, which is why we have to distinguish between a block and a page. So you can only erase a block at a time, and that is much slower, like one millisecond. The other weird rule is: you can read complete pages, but when you write to a page, that page must have been freshly erased. So you write to a page, and you cannot write to it again until it has been erased. And remember, we can only erase whole blocks at a time; we can't erase an individual page. You can imagine that makes your life a lot more difficult if you are the kernel and you have to manage all of the pages. So again, erasing is done per block, a block will have like 128 or 256 pages, you need to erase the entire block, the pages must be freshly erased before you can write to them, and writing is slow. So what might need to happen is, say you have a file that is on a page and you just want to update it, modify it. Well, guess what? Instead of rewriting over it, the SSD has to write to a new page, possibly in a new block that it has to erase first, and essentially copy most of the data, minus your modification, over to that new page. Then the old page just sits unused on the old block, and you can imagine that over time the whole block might fill up with unused pages whose contents were moved somewhere else. So the kernel has to do a lot of keeping track of where the pages actually are. The SSDs themselves can also garbage collect blocks.
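Before we get to garbage collection, the page and block rules above can be sketched as a toy model. This is a made-up illustration, not real firmware, and it uses 4 pages per block for brevity where a real block holds 128 or 256:

```python
# Toy model of one SSD block. Two rules from the lecture:
#  1. a page can only be written if it was freshly erased,
#  2. erasing happens per block, never per page.

class Block:
    def __init__(self, n_pages=4):
        # each page is "erased", "valid", or "invalid" (stale data)
        self.pages = ["erased"] * n_pages

    def write(self, i):
        if self.pages[i] != "erased":
            raise ValueError("page must be freshly erased before a write")
        self.pages[i] = "valid"

    def invalidate(self, i):
        # an in-place "update" really writes elsewhere and leaves
        # the old copy behind as garbage
        self.pages[i] = "invalid"

    def erase(self):
        # ~1 ms, and it wipes every page in the block at once
        self.pages = ["erased"] * len(self.pages)

b = Block()
b.write(0)        # fine: page 0 was erased
b.invalidate(0)   # the "updated" copy now lives on some other block
b.erase()         # only now can page 0 be written again
```

The timing asymmetry, roughly 10 µs to read, 100 µs to write, 1 ms to erase, is why controllers write updated data to already-erased pages elsewhere rather than erasing a block in place on every update.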
So what they will do is: if the SSD sees that, say, only one page is being used in a block and everything else can't be written to anyway, it might just move that page to a new block and then erase the old block so it can be reused. You need to copy any active pages to that new block, which is copying data without doing anything useful, so that would be considered overhead. Also, if you are the disk controller, the hardware on the disk, you might not know which pages are alive within a block or which blocks are still alive. The SSD hardware doesn't actually know what the operating system is using all the pages for. So the SSD might think the disk is full, but the operating system knows that those pages are not used, so it has to communicate with the SSD in order to say, oh, you can actually erase this safely without needing to move anything. That's essentially what the trim command is. It's an option that, whenever you set up an SSD, you want to make sure is always enabled, because trim lets the kernel tell the SSD: I'm not using anything on that entire block, so you can erase it at your leisure. The SSD has a controller on it, and whenever it is otherwise idle it can just erase that block without having to move any data around, so it can go nice and fast. So far we've been talking about single devices, and that's probably what everyone has used in their computer. Anyone have a computer with multiple hard drives in it? One, two, three. All right, so your multiple drives: are they just single drives that you access, or are they anything special? Just drive C, drive D, something like that. Okay, so that's good. That setup is sometimes jokingly called a single large expensive disk: it's just one disk for data, and if that disk dies you're screwed, right?
Any files that are on that are now gone. You have a single point of failure. Well, there's another thing you can do if you have multiple disks, and it's called RAID, which stands for redundant array of independent disks. The idea behind it is we take the data and distribute it across multiple disks, and we use redundancy. Redundancy just means having multiple copies of the data, and if you have multiple copies we can sometimes prevent data loss, so if a disk dies you don't lose anything. Or you can use the multiple disks to increase throughput, to increase performance. So the first RAID level is something called RAID 0, which is also called striping. Stripes are typically 128 kilobytes or 256 kilobytes, and RAID 0 breaks your files up into chunks of that size and just distributes them over multiple disks. So if we have two disks, it would take, let's say, file A here, which is split up into six parts, and put essentially half of it on disk 0 and half on disk 1. Disk 0 would have A1, A3, A5, and disk 1 would have A2, A4, A6. So in this case, what would be the benefit of doing this? What happens if one of those disks dies? Yeah, you lose half your data, but really you lose half of each file, which makes the entire file useless if you're missing half of it. So in this case it's actually worse: if you split your file across two drives, now if either drive dies you lose data, which is bad. So why might I want to do this, given that right now if a single drive dies I'm screwed? Sorry, not quite, what about, oh yeah. If your file doesn't fit on one drive, maybe, yeah. Yeah, faster, right?
So if I split up the data across two disks, then compared to a single disk, if I want to read this file, instead of reading it all from one disk I'm reading half from one and half from the other, which I can do in parallel, so it'll be twice as fast. Similarly, if I have to write the file, I write half of it to one disk and half to the other. So this is purely for performance. However many disks I have: I could add a third disk, split the file into three, and that way I get three times better performance. But with a third disk, if I lose any one of those three disks, I've lost everything. So this is what you will see on super gamer computers or something like that, because this is for performance; you only care about going fast. If you buy the fastest drive available, you can't get any faster than that; the only way to go faster is to get two of them and do a RAID 0 between them. And if you're just gaming on it, who cares if one dies and you lose data, just buy a new one and reinstall, right? So here this is great. All the performance comparisons will be against a single disk. Compared to a single disk, if we have a RAID 0 of n disks, we get an n-times speedup for reading and writing, because we're splitting the work over however many disks we have. The only con is that we essentially trade speed for having no data redundancy at all: if any of those disks fails, we're screwed. Any questions about RAID 0? So with RAID 0 we split things up, go fast, live dangerously. The opposite extreme to RAID 0 is something called RAID 1, and that is a mirror, which means every single disk is an exact copy of the others. So let's say file A only has four parts; instead of putting half on one disk and half on the other, each disk has a complete copy of file A.
So now in this case, if I have two drives and one of them dies, do I lose any data? No: if one of my drives dies, I don't lose any data. What about read performance, compared to a single drive? Yeah, for reads with two disks I don't have to read everything from one drive: I can read half from one and half from the other, so I still get a two-times speedup with two disks. What about writing? Yeah, I have to write all the information to all the disks. Assuming I can do that in parallel, it performs exactly the same as a single disk, because I do the same thing in parallel on all of them. So my read performance is good, but my write performance is the same as a single disk. In exchange, if one drive dies, I'm still good. If I have four disks that are all exact copies of each other, how many disks can die before I lose data? Three, right? Because they're all exactly the same, as long as I have one disk remaining I don't lose anything. The con is that this wastes a lot of space. If these drives are both, I don't know, two terabytes, I can only use two terabytes of disk space, right? I essentially have half the space, and even with three drives you're limited by the smallest one. So if I have three two-terabyte drives, because they're all exactly the same, I can only store two terabytes of information. Whereas with RAID 0, because it's split up, if these were each two terabytes I could use four terabytes, because it doesn't keep multiple copies of anything, it just splits the data up. Any questions about RAID 1 or RAID 0?
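The two layouts can be sketched side by side in a few lines. The function names here are made up for illustration, and chunks are 0-indexed, so chunk 0 is A1:

```python
# Where data lives in RAID 0 vs RAID 1.
# RAID 0: chunk i is dealt round-robin across the disks.
# RAID 1: every chunk goes to every disk.

def raid0_locate(chunk, n_disks):
    # chunk i lands on disk i % n, at stripe row i // n
    return (chunk % n_disks, chunk // n_disks)

def raid1_locate(chunk, n_disks):
    # every disk holds a full copy, same offset everywhere
    return [(disk, chunk) for disk in range(n_disks)]

# File A as chunks 0..5 (A1..A6) on two disks:
layout = [raid0_locate(i, 2) for i in range(6)]
# -> [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
# i.e. disk 0 holds A1, A3, A5 and disk 1 holds A2, A4, A6,
# so a read can pull from both disks in parallel.

raid1_locate(0, 2)  # -> [(0, 0), (1, 0)]: a write must hit both disks
```

This is exactly why RAID 0 speeds up both reads and writes by n while RAID 1 only speeds up reads: in the mirror, every write shows up on every disk.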
Cool, all right. So yeah, RAID 1 is simple: everything's an exact copy of everything else. You can combine RAID 0 and RAID 1, so you could have a mirror of striped disks if you wanted to. For some reason, in the OS book, when they say RAID 1 they actually mean RAID 1+0, and it's super confusing. RAID 1 on its own just means mirror: everything looks exactly the same. So don't get confused by the wording in that book; they have a little sentence saying, when we say RAID 1 we mean RAID 1+0, and it's very confusing. So with RAID 1 we get good reliability: as long as one disk remains, you don't have any data loss. Good read performance, but there's a high cost for redundancy: I can only use the same amount of storage as a single disk, and my write performance is the same as a single disk. And we can do better. So the next one is RAID 4. You might be asking what the hell happened to RAID 2 and RAID 3: they were bad ideas, no one uses them. RAID 4 is also a bad idea that no one uses, but we can use it to set up our next one. RAID 4 introduces something called parity. It stripes the data, basically doing the same thing as RAID 0, over some disks, and then uses a single disk for parity, and the parity is an XOR. Parity just means some extra information we can use to recompute the data if a disk dies. So does everyone know how XOR works? What does XOR essentially mean? Yeah, if the two inputs are the same it returns zero, and if they're different it returns one. What would it mean to XOR three things together? An easy way to think of XOR over lots of inputs is: if I added them all together, the XOR of all of them is zero if the sum is even, or one if it's odd. It's a much easier way to remember it.
So for two inputs, right, if we have A and B and we do our silly logic gates, A XOR B looks like that, which is simple. In this case, if I add these two together it's zero, an even number; if I add these two together it's two, also even; and if it was zero-one or one-zero, the sum is an odd number. And this works across a lot more inputs. For instance, say we have A1, A2, A3 and then a parity, where the parity is just the XOR of everything. Whoops. So if our information is, I don't know, one, zero, one, and we need to calculate our parity bit: one plus zero plus one is two, which is even, so the parity should be zero, right? Now, if I erase any one of these numbers, you should be able to tell me through logic what I erased. Let's say I erase this one: is it possible to recompute what A1 is supposed to be? It should be, and there's only one answer. The parity tells me that if I XOR A1, A2 and A3 together, the result has to match it. If I add the two survivors together, the sum is odd, so I only have two choices for A1, and if A1 were zero, the parity wouldn't make sense anymore, right? XOR-ing everything would come out odd, meaning the parity should have been one, not zero. Since the parity is zero, when I reconstruct this value it has to be one, right? So if I lose a single disk, I don't have a problem, because I can always recalculate whatever was missing. What happens if instead I erase two of them? Can I recover? No. I have a 50-50 chance, because there are two combinations that make the parity come out true. So in this case, with RAID 4, I can lose one disk and be okay, right? But if I lose more than one, I'm screwed.
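The reconstruction argument above can be shown in a few lines of Python. This is a sketch using one-bit values; the same XOR works bit-by-bit across whole stripe-sized chunks:

```python
from functools import reduce

# Parity is the XOR of all the data chunks; XOR-ing the survivors with
# the parity gives back any single missing chunk.

def parity(chunks):
    return reduce(lambda a, b: a ^ b, chunks)

data = [1, 0, 1]          # A1, A2, A3 from the example
p = parity(data)          # 1 ^ 0 ^ 1 = 0 (an even number of ones)

# The disk holding A1 dies; rebuild it from everything that survived:
rebuilt_a1 = data[1] ^ data[2] ^ p
assert rebuilt_a1 == data[0]    # recovered the lost value

# Works the same on whole bytes (and, chunk by chunk, on whole disks):
assert parity([0x5A, 0x3C, 0x5A ^ 0x3C]) == 0
```

Losing two chunks at once leaves two unknowns and only one equation, which is exactly the 50-50 situation from the lecture: a single XOR parity can reconstruct one missing value, never two.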
So let's go back. Everyone understands that? Not too bad? All right, so what about, I don't know, read performance: how does read performance compare to a single disk? In this case, where we have four disks, three times as fast, right? Because these three disks here are exactly like RAID 0, and I'm using one disk for parity information, so that disk isn't helping me at all. What about write performance? Yeah, also three times faster, with a caveat. There is something bad here: say disk 0 was updating A1, disk 1 was updating B2, and disk 2 was updating C3. In that case I have a bit of a bottleneck, because that causes three updates on disk 3, since it holds the parity information for all the disks. So this is a little bit bad: we're bottlenecked by disk 3, which has all of our parity information, so everything gets concentrated on it. If I modify any disk, it causes a write to that parity disk. Everyone good with that? Seems like kind of a bad thing. Also, for available space: if these were all two terabytes, how much space can I actually use for useful things? Six terabytes, right? Because I can use all the space minus the one disk I'm using for parity. So you lose one disk of space, and this requires at least three drives, but you get basically n-minus-one times the performance, because aside from the parity disk it's essentially RAID 0. The nice thing is you get a lot of the speedup and you can afford to lose a single disk: if one of your drives dies, you can just throw a new one in and rebuild it. The con is that write performance can suffer, because we're bottlenecked by that parity disk.
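One more detail on why every small write lands on the parity disk: you don't even need to read the other data disks, because the new parity is just the old parity XOR the old data XOR the new data. A sketch with bit-sized chunks and made-up values:

```python
# RAID 4 small write: updating one data chunk forces a read-modify-write
# of the parity - so every update, on any data disk, hits the parity disk.

def update_parity(old_parity, old_data, new_data):
    # XOR the old value back out of the parity, then fold the new one in
    return old_parity ^ old_data ^ new_data

disks = [1, 0, 1]                   # data chunks A1, A2, A3
p = disks[0] ^ disks[1] ^ disks[2]  # parity disk holds 0

# Overwrite A1: 1 -> 0 (touches data disk 0 AND the parity disk)
p = update_parity(p, disks[0], 0)
disks[0] = 0
assert p == disks[0] ^ disks[1] ^ disks[2]  # parity invariant still holds
```

Concurrent updates to the three data disks can proceed in parallel, but each of their parity updates queues up on the single parity disk, which is the bottleneck the lecture describes.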
So RAID 5 is the exact same idea as RAID 4; the only difference is that instead of putting all the parity information on a single disk, it distributes it across all the disks. In this case, disk 3 has the parity information for A, disk 2 has the parity for B, disk 1 for C, and disk 0 for D. Otherwise it's exactly the same, right? All we change is that instead of one disk holding all the parity, we spread it out across all the disks; all the other characteristics stay the same. Same capacity characteristics, and the write performance is slightly better because we're not bottlenecked on that single drive. So this is just a way better idea than RAID 4, but it behaves essentially the same way. Any questions about this RAID? Yep. Yeah, if we lose the parity information, that's fine, we just recalculate it, right? Because the parity is just an XOR of D1, D2 and D3, and those all still exist, so we just recompute it. That's actually a slightly easier job, because we don't have to figure out what was missing, we just redo the XOR. Also, in the RAID 4 case, we could lose the parity disk too, right? We'd just recompute everything. All right, any other questions? Yeah, so for one disk failing it's the same as RAID 4, but in RAID 4 the parity disk is being hit a lot harder, so it's actually more likely that that one fails. Either way, you can withstand a single drive. So, word of caution: I used this for my PhD thesis; this is where all my information was stored. And we now know that one drive dying means we're all good, right? So one of my drives died: I'm good. But it turns out hard drives usually die together, around the same time, because you usually buy them together. So another drive died before I finished. Guess what?
So if you have this and one of your drives dies, go out and replace it as soon as possible, and don't be like me. That's bad, very bad. Yeah, so there are some things you can do: typically you want the hard drives to be manufactured in different batches, so at least they fail at different times; you can buy them at different times. But also, with this you're limited by the slowest one, so you don't want to buy some crappy one and throw it in, because then you'll be limited by that one; you don't want to be limited in that way either. Typically, if you're Google or something like that, you'll just have a drive sitting by that you can throw in immediately. My problem was that my drive died and I had to go to the store to buy another one, because I was a poor student and just didn't have a spare. I was like, I can save a hundred bucks, I can do this. Yeah, can't do that. Any questions on that word of caution? All right, so the next one: what if I want to survive two disks dying? That's what RAID 6 is. RAID 6 just adds another parity block, another parity disk's worth that gets distributed. So there's P (labeled AP here), which is our normal parity, an XOR of everything, and then Q, another parity, such that if I lose any two drives I can recompute the information I lost. You might ask what the hell the parity calculation for Q is: you'll have to go to the math people for that answer, because it involves Galois fields and all that fun stuff, and I can't read that stuff anymore. If you really want to know how it works, you can ask in those courses, but just know that it is possible: you can have another disk of parity information that lets you reconstruct any two.
So in this case, if I have two disks dying, that's fine, I can recover the information; but if I have three, I'm now screwed. Otherwise this behaves the exact same way as RAID 5. For performance, we're essentially losing two disks to parity, so we only get an n-minus-two speedup for reads and writes, and we also lose two disks of space. If these were all two-terabyte drives, I can only use six terabytes of information, because two disks' worth is for parity. Yep. So yeah, the Q parity: the question is whether that Q calculation is really intensive. It's essentially a linear equation of XORs, so it's actually pretty fast. Again, don't ask me exactly what it calculates, it's a bunch of terms, but it's fast, and all the calculations happen on your CPU, which is way faster than the disk anyway. You could have a co-processor if you really wanted to speed it up, but it turns out CPUs are fast and disks are slow. All right, yeah, okay, we're good. So yeah, same thing: we lose two drives' worth of space, and otherwise we get n-minus-two read and write speed. Our write performance is slightly less than RAID 5 because we have two parity calculations, so we're writing to the disks more often, but for this course we can assume that's fairly insignificant and that parity is pretty fast. All right, any other questions on this?
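Coming back to how RAID 5 (and RAID 6) spread the parity around: one common layout, and this particular formula is an assumption since real controllers use several variants, rotates the parity one disk to the left each stripe. That gives exactly the A-on-disk-3, B-on-disk-2, C-on-disk-1, D-on-disk-0 pattern from the slide:

```python
# Rotating parity placement for RAID 5 with n disks: stripe s keeps its
# parity on disk (n - 1 - s) mod n, so no single disk absorbs all the
# parity writes the way RAID 4's dedicated parity disk does.

def parity_disk(stripe, n_disks):
    return (n_disks - 1 - stripe) % n_disks

# Four disks, stripes A, B, C, D (s = 0..3):
[parity_disk(s, 4) for s in range(4)]  # -> [3, 2, 1, 0]
```

With this rotation, each of the four disks sees roughly a quarter of the parity traffic instead of one disk seeing all of it, which is the whole point of RAID 5 over RAID 4.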
Because that RAID stuff is essentially any data center you've ever used: any cloud service, anything like that, guess what, they just buy trays and trays and trays of hard drives, and it's someone's job to deal with them. With hard drives it's not a question of if they will fail, it's a question of when. So like, someone at Google, their entire job is to have a cart of hard drives, go around, and keep replacing dead ones and rebuilding. That could be your career if you want. So, RAID 4 has no advantage over RAID 5; RAID 5 is just a better version of RAID 4. Yeah, in the RAID 4 case, if any disk changes anything, it has to write to disk 3, right? So disk 3 sees on average three times more writes than the rest of the disks, which is bad because it'll wear out faster, it'll die quicker. The idea behind RAID 5 is to take that load and distribute it across all the disks, so instead of one disk getting hit three times more, each one gets hit like a quarter more. For all the other calculations it's exactly the same. So RAID 4 is not used; we're just using it here to illustrate RAID 5. The only ones that are actually used are like 0, 1, 5, 6, and then the combination of 1 plus 0. You could have a mirror of two RAID 0s, and we can even see that real quick. The idea behind something like RAID 1+0 is: at the top you have a RAID 1, so the disks on the two sides of it look exactly the same, and then, say we have six disks, you distribute the data equally across the disks on each side. So if disk 0, disk 1, disk 2 had A1, A2 and A3, the other side would also have three disks with the same information: disk 3, disk 4, disk 5. Within each side they're exactly like a RAID 0, and then we essentially have two copies. In this case, how many drives can die before I lose data? Yeah, it depends. So what's my minimum? Two, right?
So if I get super unlucky, I can lose this one and this one, and I'm essentially screwed, right? What about if I was lucky: how many could I lose? Four? Which four could you lose? So right now, if all three of these die, I'm still good, because I still have a copy of everything; but when the fourth one dies, I'm screwed, right? So I could lose up to three and still be good; if I lose four, then I start losing data. This one is like leaving your fate up to the gods: no matter how many disks you have, your worst case is always going to be two. If for some reason I had another pair of disks, my worst case is still two: if I'm unlucky, I lose some mirrored pair and I'm screwed. But if I got super lucky, I could now lose up to four, so it depends. So with this, you know, if you like roulette, you can do something like this: no matter how many disks you throw at it, two if you get unlucky, but if you get lucky, hey, you might survive up to 50% of them dying. You'll have to make that trade-off. All right, any other questions?
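The lucky-versus-unlucky argument for RAID 1+0 can be checked mechanically. In the six-disk example, assume disks 0-2 form one RAID 0 side and disks 3-5 mirror them, so disk i is paired with disk i + 3 (that pairing is an assumption about the layout); data is lost only when both halves of some pair die:

```python
# RAID 1+0 survival check: `failed` is the set of dead disk numbers,
# `half` is the number of disks per side (3 in the lecture's example).

def raid10_survives(failed, half=3):
    # we lose data iff some disk AND its mirror partner are both dead
    return not any(d in failed and d + half in failed for d in range(half))

assert raid10_survives({0, 1, 2})    # lucky: an entire side can die
assert not raid10_survives({1, 4})   # unlucky: only two dead, but a pair
assert raid10_survives({0, 4, 5})    # three dead, no mirrored pair lost
```

Worst case is always two failures (one mirrored pair), best case is half the disks, matching the "two if unlucky, up to 50% if lucky" rule from the lecture.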
All right, so now we can get off the cloud and use our own computers: just buy a bunch of disks, and hey, you're good. Just don't be like me: if one dies, replace it. So today we looked briefly at SSDs and RAID. SSDs are more like RAM, except they're accessed in pages and blocks and have that weird rule where you can only write to a freshly erased page and you can only erase a block at a time, which is kind of annoying. The operating system, or the kernel, has to work with the drive to get the best performance: something like trim tells the disk, hey, I'm not using any pages on this block, so you can go ahead and erase them all, and then they'll all be freshly erased so I can write to them. Then we saw RAID: there are a bunch of different RAID levels that all trade off how many drives you can tolerate failing before data loss, how much extra performance you get, and how much of the space you can actually use. The usable ones are RAID 0, 1, 5 and 6, which are the common ones, plus the combination RAID 1+0. So with that we can end. Remember, moving forward, we're all in this together. Have a happy reading week with a much-deserved break.
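The summary's trade-offs can be tabulated for n equal-sized disks. Capacity and speedup are in units of "one disk" and follow the lecture's simplified model (parity math assumed free, perfect parallelism); failures is the worst-case number of drive deaths tolerated:

```python
# Recap of the RAID levels covered, for n equal disks.

def raid_summary(n):
    return {
        "RAID 0": {"capacity": n,     "read": n,     "write": n,     "failures": 0},
        "RAID 1": {"capacity": 1,     "read": n,     "write": 1,     "failures": n - 1},
        "RAID 5": {"capacity": n - 1, "read": n - 1, "write": n - 1, "failures": 1},
        "RAID 6": {"capacity": n - 2, "read": n - 2, "write": n - 2, "failures": 2},
    }

raid_summary(4)["RAID 5"]
# -> 3 disks of usable space, roughly 3x a single disk, survives 1 failure
```

RAID 1+0 doesn't fit a single row: its worst case is two failures but its best case is n/2, which is the roulette aspect from the lecture.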