Welcome back to 353. Today we get to talk about the last topic, persistence, and specifically disks. So, let's do an age check. How many of you know what a hard disk drive is, or have ever seen one? It's what the original iPod had: a big hunk of metal, the reason your parents told you not to bring a magnet near a computer. Those are pretty old. It's a spinning magnetic disk, and it takes a long time. Most of you do not have one, but if you did, well, it's a spinning hunk of metal, and there's a magnetic head that reads and writes bits. It's kind of divided up into pages, and if we were to talk about it, you would have to schedule accesses to it: because it's an actual spinning thing, sequential access is relatively fast, but random access may have to move the magnetic head and wait for the disk to spin around. It's a whole mess, and you can schedule it and all that. We will skip talking about that because it is old.
Instead we will talk about solid state drives, which most of us actually have. Instead of magnetic disks, they are like RAM, except that they persist whether or not they have power. The pros: they don't have any moving parts, so there's no spinning magnetic disk that probably breaks if you drop it, and no physical limitations like waiting for a hunk of metal to spin around. They have higher throughput and, because it's just circuitry, good random access. They are more energy efficient, since you don't have to put in energy to spin a hunk of metal, and they have better space density: an SSD can be a little thing like that, while a hard drive is typically something like this by like that. So they're a lot more space efficient, but more expensive, which matters if you're, I don't know, data hoarding a local Netflix or something like that.
They have lower endurance, so you can only write to an SSD a certain number of times before it wears out and you're not allowed to write to it anymore. And they are more complicated to write drivers for in some respects; we will touch on that a little bit but not go into too much detail.
So how is an SSD actually organized? It's organized into several tiers. There is a die, which is the complete circuit that stores all the information. It is divided into bigger things called planes; you don't really have to know about those. On each plane there are blocks, which are one thing we will reason about a little, and in a block there are pages, which may or may not be the same size as pages in virtual memory. Typically they are the same as in our virtual memory subsystem, so most of the time they're four kilobytes.
Some numbers: SSDs are much faster than hard disk drives, but they have weird properties. Reading a page from an SSD is typically pretty fast, around 10 microseconds. Writing a page is about 10 times slower, so around a hundred microseconds. And then the weird one is erasing. The rule is that you cannot erase a single page at a time; you must erase an entire block. A block contains many pages (it could be 8, 16, 32, something like that), but we can only erase a block at a time, not a page at a time, and erasing a block is way slower, around one millisecond.
If you're the operating system, you have to actually work with these rules whenever you want to erase or rewrite data. The rules are that you can only read complete pages, and you can only write to freshly erased pages. So you can't just write a page and then write over it again; you have to erase it before you're allowed to write to it again. Erasing, like I said, is done per block, and a block could also have 128 pages, 256, something like that. With that rule, the entire block needs to be erased before you can actually write something, and
this is why writes can be really, really slow, on top of being a bit slower by default. We might need to create a new block. If we want to overwrite just one page, and let's say there are 128 pages in the block, we would have to move the other 127 pages to a new block, make the modification there, and then erase the whole old block that we moved everything out of. So we might have to move a lot of things just so we can erase a block at a time.
The operating system can help speed up SSDs. An SSD might need to garbage collect blocks: it moves any pages that are still alive to a new block, and it can do that while it's idle, because otherwise that would just waste time. There are a few other things, too. If you delete a file and you don't use some pages anymore, the disk controller is not going to know that you no longer use them; it will just assume they are still alive. So, just like with memory allocation, you have to tell the disk controller: okay, I'm not actually using this page anymore, so you can erase it in the background, or just stop using it, or whatever; it's marked as unused for now. The OS does this with something called a TRIM command, which informs the SSD that, say, a whole block is unused. Then, instead of waiting to erase a block until you really need it, at the last minute, you can just say: I'm not using any pages on this block, so erase it whenever you're idle and not doing anything else important. The SSD can go ahead and erase it then, and we don't pay any penalty, because it's doing it while it's idle.
Yep? So if I just remove a file or something like that, my OS might just keep track of it, like: okay, I'm not using
it anymore, so I don't have to tell the disk anything, right? I just keep track of it in memory, and that's fine. But the hardware won't know anything about that, which is why you have to actually tell it: hey, I'm not using this page anymore, so you can get rid of it or do whatever you want with it.
And yeah, an SSD is just storage. The only differences from memory are that it is persistent and it's slower. When you turn off your computer and reboot it, your information is still there, unlike RAM, where your information is no longer there. It is a lot slower than RAM, but it has the same idea of dividing things up into pages. Any quick questions about that? It's just this weird thing where we have to erase in blocks, and we have the other rule that we can only write to freshly erased pages. It gets tricky if you actually have to implement something like that.
Okay, so we can talk about the fun topic then. So far we've been talking about single devices. In most of your computers, unless you're slightly insane like me, you probably just have one disk. People who actually care about data integrity sometimes jokingly call that a single large expensive disk, because it's just one large disk for all your data. What happens if the hard drive in your laptop dies? Are you screwed? Absolutely. That's why it's important to have backups of your files and all that. It's not a backup, but you can also try to ensure data integrity using something called RAID, which is short for redundant array of independent disks. You have more than one disk in your system, and you configure it so that all those disks work together to distribute the data. Maybe you keep multiple copies of the data so that you don't have data loss if a single device dies, or maybe you use the redundancy to increase throughput. So if you have two drives, maybe you want
to have your disk performance be twice as fast. The RAID levels all have numbers associated with them, and the first one is called RAID 0, because we're in computing and everything is zero indexed. It is called a striped volume. What it does is distribute your data: instead of all being on one disk, it's spread across two disks. It divides your data into little stripes, or chunks, of something like 128 kilobytes or 256 kilobytes. For the sake of argument, say you have a large file called A that's split up into eight parts. If you have two disks, disk zero and disk one, maybe you put the odd parts of that file on disk zero and the even parts on disk one.
Why would you do this? Well, if these two disks each have the same performance as a single disk, then I can read that same file twice as fast. With a single drive I would have to read eight chunks of data all from the same disk. With two disks I read four from one and four from the other, and assuming I can do that in parallel, which I should be able to, reading is twice as fast. So I get two times the performance here, and this scales up: with three disks I would have three times the performance, with four I would have four times. This is a common thing you'll see with people who have really high performance PCs: they'll put two drives in and do a RAID 0 so that it essentially goes twice as fast.
It's the same idea for writes. The write performance increases by two times in this case, because instead of writing all eight chunks to the same disk, assuming I can do it in parallel, I can write four to one and four to the other, and it's twice as fast as well. So in this case I get two times the read performance and two times the write performance compared to a single disk. But what is the very, very bad
drawback of doing this? I had a single point of failure when I had one disk before; how many points of failure do I have now until I lose data? Yeah. So if I lose one of the disks here, I lose essentially half of every single file I have, and now they're all useless. You can't really boot your computer if you have half of Windows; it's just not going to work, and it's going to be a random half of Windows. So the big drawback is that before I had a single point of failure until I had data loss, and now if either one of my two drives fails, I have data loss. You would only do this if you only care about performance and you pretty much do not care about your data at all, because it's actually worse if you care about your data.
So this is for performance only. The data is striped, or just divided, across all disks in the array. We can have more than two; we could have three, we could have four. Typically people just stop at two, because you get two times the performance, and if one drive dies, whatever, you just buy a new one and replace all your data, because hopefully it's fast. So you have faster parallel access, roughly an N times speedup, but if any disk fails you have data loss, so we have more points of failure. Another pro that's not listed here: if these two disks were, what's a good size, two terabytes each, then you could use up to four terabytes, because it's just two times two.
All right, any questions about RAID 0? Vroom, vroom, go fast. If you want to go fast, you can buy another hard drive. Yep? Yeah, I'd be reading data from the two disks in parallel. Assuming you do it in parallel, that is; if I could only do one at a time, then we wouldn't see any benefit whatsoever. But we assume we can do it in parallel. There might be a bit of overhead with that, but typically they
read into memory, and memory is a lot faster than disk, so it's mostly about two times. All right, so RAID 0 is performance only. There is a flip side, where we actually care about our data and don't care about performance as much. That is RAID 1, and it is a complete mirror. (Okay, hopefully that question got answered on Discord. I think so: with RAID 0 we have multiple points of failure, and if you lose half your files you're still screwed.)
So RAID 1 is called a mirror, and it makes essentially every single disk an exact copy of the others. Say file A now only has four parts. I would store the complete file A on disk zero, and disk one would be a complete copy of disk zero, so it would have the full contents of file A as well. So if I have RAID 1, where my two disks are exactly the same, would I get any benefit in terms of read performance? Everyone's saying no. Why not? Yeah, you'd still have better read performance, right? Just because each disk stores the whole file doesn't mean I can't use the same idea as before. I could read the odd parts of the file from disk zero and the even parts from disk one, so I would get two times my read performance.
In the case of writes, well, writes aren't going to go faster. If I need to update this file, I have to update it on every single drive. Assuming I can do it in parallel, it's the same as just writing to a single drive, because every drive needs the same file. So I can get better read performance out of this, because I can read half from one drive and half from the other, but since they are exact copies of each other, when I'm updating information I have to update it on every drive, so my write performance is the same as a single disk.
Now, assuming each of these disks is two terabytes, how many terabytes can I use to store my files? Two, right? Just two. They're copies of each other. I can't use four, because that would mean I have the files spread out
over each, and then they aren't exact copies of each other. Since they're exact copies, I can only use two terabytes. Even if I throw another disk on here and make it a clone of all the other ones, I can still only use two terabytes.
So it's simple but wasteful. Every disk in the array looks exactly the same as the others. It has good reliability: as long as we have a single disk still alive, we don't have any data loss. If this disk just dies, you throw it out, you throw it in the garbage, you step on it, whatever; it doesn't matter, you still have a copy of your data on disk zero. And if I have three drives that are all copies of each other, two of them can die and I still have a single copy left, so I'm good. This is kind of the extreme for caring about our data: as long as one disk is left standing, we haven't lost any data. We can get good read performance, but it's a fairly high cost for redundancy. You just have N copies, where N is the number of disks you have, and the write performance is the same as a single disk. We can do a lot better than this, so it's typically only used if you only have two disks, because then we're essentially just wasting half of our space, which isn't too bad.
The next one is RAID 4, and you might ask what the hell happened to RAID 2 and 3. They were bad ideas, so no one uses them anymore. And guess what, no one uses RAID 4 either. RAID 4 is a bad idea, but it illustrates the idea behind the next level of RAID. It introduces something called parity, and parity is just some amount of information that we can use to reconstruct lost information. The data is striped, the same idea as RAID 0, where we divide it into little blocks and distribute them, but now we have a dedicated parity disk. If we have four disks, in this case we stripe our data across three of them, and we use the last disk for parity information, so that if any
one of the disk drives dies, we can use the information on disk three to reconstruct the data that was lost. For read and write performance, well, the other three disks look a lot like RAID 0, so we should get about three times the read performance and three times the write performance, ish. We also have to update this parity information so that we can reconstruct data if it gets lost. The parity information is typically an XOR of all of the bits, or bytes, or whatever, across all the other disks.
How many of us are familiar with XOR? Should be everyone, right? All right, does someone want to explain to me in plain English what it means to XOR three things together? Yeah. You were going to say the same thing too? Yeah: an odd or even number of ones. An easy way to think of it is that if I XOR everything together, it tells me zero if the result of adding them all together is even, and one if the result is odd.
With that, let's do a quick example. Let's say I had an A1, an A2 and an A3, all on different disks, and let's say their bits are zero, one, one. My parity information is what I get when I XOR all of these together. I'll just show individual bits, but you can expand it to bytes if you want. One plus one plus zero is two, which is even, so our parity is zero.
Using this parity information, if one of these disks dies and I erase its bit, then because I know the parity here, I can uniquely identify whether what I lost was a one or a zero. Say I lost one of the ones. Then I need to find a missing number, either a zero or a one, such that when I take the one I still have and add the missing number to it, the result is, in this case, even. Well, because I have that bit of parity information, which
is saying that it needs to be even, I know the missing bit has to be a one, because one plus one is two, and that is even. It can't be a zero, because one plus zero would give me an odd number, which would disagree with the parity I had before. This is true for anything I could lose here. If instead I lost the zero, then the bits I still have add up to two, and the missing bit can't be a one, because that would give three, which is odd. So the data I lost had to be a zero. Is everyone okay with that?
All right, so what if I lost two? Yeah, I don't know, right? I have two solutions: it could be this or it could be that, they both agree with the parity, and I have no idea which is which. So in this case I can tolerate a single disk dying; that's fine, I can reconstruct the information it had. But if two disks die, then I'm screwed. You might be able to pay someone a lot of money to recover that data, because this is like 50-50: they could essentially roll the dice on the 50-50 a lot of times, and by knowing the file format and everything they could guess what the bytes had to be in order to make sense. But you're going to have to pay someone a few thousand dollars to do that for you. So yeah, you can only really tolerate one disk failure.
Funny story. You could store some of your important data on something like this, like, I don't know, maybe your PhD thesis, and then maybe a disk dies while you are working on it. Okay, that's fine, right? I could still recover it: I just had to go to the store, buy another hard drive, stick it in, and let it recalculate. Well, it turns out that hard drives typically die together, because if you buy them at the same store at the same time, they were manufactured the same way and they have the same level of wear on them, so they'll probably die close to each other. I was busy and I left it for a week, and while I was working the other one
died. Did I lose data? Yes. So do not do what I did: if one disk drive dies, go to the store and buy another one. Data centers that do something like this will have some drives on hand already, so that when one dies, well, someone's job at the Google data center or AWS or wherever is literally to go around with a cart of hard drives, stick them in as the old ones die, and replace them. That is why you use cloud providers to take care of your data: they hired someone to literally do that all day and make sure your data survives. That is also a fun career path. You could be the guy with a cart and a bunch of hard drives who sticks them in when they die. Does it pay well? Probably, yeah, because you probably have to know all this stuff and the theory behind it, so you probably have to take a course like this to have that job. Good thing we're taking this course, so we can get paid a lot to just shuck hard drives all day. That's pretty good.
This is kind of a bad idea, though. What happens if we have to update, let's say, only A1? Then we have to read all the other disks, recalculate the parity information, and update the parity disk. Then if we write, say, B2 to its disk, we also have to update the parity data on disk 3. And oh no, if we update C3 on disk 2, we have to update the parity information on disk 3 again. So any time you update information on any disk, it causes a write to disk 3, which will make disk 3 die much, much quicker than all the other ones. Basically it gets about three times the writes. It is a bit imbalanced, which is why this system isn't used.
Also, for space it's a bit better than RAID 1. If these were all, I don't know, two terabytes each, or let's go bigger, say these were all 10 terabyte hard drives, then the amount of usable space I have would be 30
terabytes. I can use three disks essentially to store all my data, and one of the disks is reserved for storing all that XOR information. So I would have 40 terabytes' worth of disks, but 10 of it, a single disk, is used for parity information, and all the rest have my data on them. Good?
All right, so with parity, that's basically what this says: we can use all the space minus the space of one disk, and we need at least three drives, otherwise it doesn't make sense. For the pros, we get basically N minus one times the performance; we reason about it more or less as RAID 0 with three disks, ignoring the parity disk. And the nice pro is that, unlike RAID 0, if a single disk dies I don't lose any data. I get my ass to the store, buy another one, and have it recalculate, so I don't actually lose anything. The big con is that write performance suffers, because essentially everything is concentrated on that parity disk, which is why it's no longer used.
It's only used to illustrate the next idea, which is RAID 5. That is exactly the same idea, but instead of having all the parity information on disk three, it distributes it across all the disks. Every disk is responsible, in this case, for holding a fourth of the parity information. The idea is that instead of concentrating all of the parity writes on a single disk, you spread them out so the load is more even. Aside from spreading the parity information across the disks, we reason about it the same way we reason about RAID 4.
Yeah? Yeah, so the parity writes will wear out the drives faster, and if we put all the parity on a single SSD, that one SSD will die way before the other ones. Some data centers actually like that, for some reason, so some have gone back to it, because they'd rather have just one drive die quickly and replace it. So it's kind of come back. Distributing it is
a much better idea if they were spinning disks, so it kind of depends, but mostly, in terms of performance, you'd rather have it distributed. RAID 5 is the first one other than RAID 0 and RAID 1 that's actually used, and it is what I was using to store my data when the unfortunate happened.
So, RAID 5: pretty much all the same things as RAID 4. At a high level it's about the same, N minus one times the read and write performance, but the write performance in the real world is slightly improved, because the bottleneck of the single parity drive is removed. We don't have to wait for every single operation to update that one disk, so in practice it's faster. Well, not a lot, but faster.
All right, the next one that is used is RAID 6, and the idea behind RAID 6 is that it adds some other parity information. Instead of just having P, which is an XOR of everything, it has another piece of parity information, Q, which is a complicated thing that is not a plain XOR; it's like a linear combination of XORs. Basically, to explain whatever the hell Q is, I would have to explain Galois fields and some advanced mathematics, so don't worry about that. Someone else figured it out for you. Just assume that Q is another bit of parity information, so I have two bits of parity information.
So it's the same idea as RAID 5. In this case, where I have five disks, I essentially have two disks' worth of space used for parity, but it's distributed across all the disks, and otherwise it is exactly the same: the rest of the information is just striped across them. With RAID 6, assuming these were all, I don't know, 10 terabyte drives, the size of two drives is used for parity, so of my 50 terabytes' worth of drives I could actually use
30 terabytes to store information. The big pro is that now I can withstand two drives dying. If any two drives die, I can use these two bits of parity information to reconstruct two bits of information instead of just one, so I can tolerate two disk failures. If I had my thesis on this and two disk drives died, I would have been okay, but I would have had to buy another drive in order to set this up, and I was a poor grad student. So yeah, I guess I just got what was coming to me. Yikes.
So it can recover from two simultaneous drive failures. Due to the extra parity, I essentially lose another disk of space, and this requires at least four drives; in practice you probably want at least five. Write performance is going to be slightly less than RAID 5, because every time I write a piece of data I need to write two pieces of parity instead of just one. So in practice it's a bit slower than RAID 5, and I can use less of my space, but the main benefit is that I can tolerate two disk drives dying. Any questions about that?
Okay, so there is a third one. If you read your textbook, your textbook is very strange. RAID 1 is supposed to just be a mirror of everything, an exact copy. For some reason your textbook basically says: whenever I say RAID 1, I actually mean RAID 10, aka RAID 1+0. Why your textbook does this, I don't know, but there is a thing called RAID 10, which is RAID 1 plus 0. The idea behind it is, let's say I have disks D0, D1, D2, and so on, so say I have six disks, up to D5. Let's see what my drawing is like. Wow, they look like trash cans. All right, cool. Yeah, they look like that old Mac Pro, the trash can edition.
So the idea behind RAID 1+0, or RAID 10, is that at the very top we use RAID 1, so we essentially split them in half and
make both halves exactly like each other, so they're exact mirrors of each other, and then each of these two mirrors is itself a RAID 0. That means this disk is exactly the same as this one, this one is exactly the same as this one, and this one is exactly the same as this one, but within each half it's a stripe, so all of your data is striped across those disks.
Assuming each of these disks is again 10 terabytes, how much space can I use to actually hold my data? 30, right? I can use half of it. Because they're exact copies of each other, it's like RAID 1: I have 30 on this side and 30 on this side, so in total I can only use 30.
In terms of read performance, I could read a bit from all six of them at the same time, so I'd get six times. In terms of write performance, is it two times or three times? It would be three, because I get the speedup from the RAID 0: I can write to all three of its disks in parallel, and I can do that in parallel with the other RAID 0. So I would get three times the write performance in this case.
All right, what about the more important one: how many disks can fail until I lose data? Yeah, whenever any matching pair fails. In this case, if any one drive dies, that's okay. Let's say this disk died. Now I'm in the danger zone, but I might be okay. In this scenario I will have data loss if disk three dies, because then I have no more copies of that yellow data, but disk one, two, four or five could die. So we can leave this up to the gods. In the worst case, as soon as one disk has died and then its mirror dies, I'm screwed, so I might only be able to tolerate one disk dying. But if I get lucky, well, maybe
this disk dies. Okay, I still have a copy of the blue data, so that's fine. Now I'm really playing with fire if I don't replace anything, because if disk three dies I'm screwed, and if disk two dies I'm screwed. But I might get lucky and, say, disk one dies. So in the best case I can lose a whole half of my drives, if I have a nice lottery ticket, and I won't lose any data. But again, we typically only reason about the worst case, and in the worst case I can only tolerate a single drive failure. But who knows, I might get lucky. If I had stored my thesis on this I might have gotten lucky, but I would have needed six drives, so who knows.
Also, as a practical thing, this is faster to rebuild. The main reason I didn't go buy a drive right away is that recalculating the parity information, depending on how big your drive is, and also writing it out, can take a day and a bit, or sometimes days. With this system it is a lot faster, because there's no parity involved at all. If I went out to the store and bought a new disk five, I would just stick in disk five and do a straightforward data copy from disk two to disk five. It's much, much faster because there's no parity information involved or anything like that. So for practical reasons, some people like using this because it's a lot faster to repair when something inevitably goes wrong. The big drawback is that I can only use half the space I have.
And this is kind of what your Git repositories are on, something like this. So we have to be unlucky, and a lot of drives have to die at once, for you to lose all your work on the server. But you don't really lose it, because it's Git and you have a copy of your repository on your computer, so it doesn't actually matter if my server blows up. Yay. All right, any other questions or anything? Yeah, read would be
six times faster here, because we could read parts from all the disks. Yeah? Yep. So in practice it just depends on how much you care about the data, how much money you want to spend, what your tolerance for risk is, how much space you need, factors like that. There's no one-size-fits-all solution.
In terms of data centers, they really, really care about data, and they might have hundreds or thousands of disks that they want to work together. So there are special distributed file systems; the most common thing is to keep three copies of the data and distribute them among different computers, so if a whole computer goes down it's fine, your data is still somewhere. If you've heard of AWS S3, that's basically what they do; it's like a distributed file system. What we've covered is just on the same machine. If you're Amazon or Google, you might have this on one machine, but then you'd also have multiple machines connected to each other, and you distribute data that way. All right, any other fun questions?
So, yay, hard drives. Disks enable persistence, and we saw SSDs and RAID. SSDs are more like RAM, except that on top of accessing pages there are also blocks that we have to erase, and they have a few weird rules. For most of the rules, the operating system has to work with the hardware in order to get good performance. So it was especially a thing when SSDs were new to have operating systems that understood TRIM, aka telling the disk drive whenever you're not using pages anymore, so it can erase a block when it's otherwise not doing anything. And then we also use RAID, so we can get a bunch of disks in the same system and tie them together in different ways, so we can tolerate drive failures and improve performance while using multiple disks. So with that, just remember: pulling for you,
we're all in this together.
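As a rough worked example of why overwriting a single SSD page can be so slow, here is a small Python sketch using the ballpark timings from the lecture (about 10 microseconds to read a page, 100 to write one, and 1 millisecond to erase a block). The function name `overwrite_cost_us` and the cost model itself are assumptions made for illustration: the model pretends the drive copies the live pages and erases the old block immediately, whereas a real controller would usually write into an already erased block and put off the erase until it is idle.

```python
# Ballpark timings quoted in the lecture, in microseconds.
READ_PAGE_US = 10
WRITE_PAGE_US = 100
ERASE_BLOCK_US = 1000


def overwrite_cost_us(live_pages: int) -> int:
    """Worst-case cost of changing one page in a block holding `live_pages` live pages.

    Hypothetical model: copy every other live page to a fresh block,
    write the modified page there, then erase the old block.
    """
    if live_pages < 1:
        raise ValueError("the block must contain the page being overwritten")
    pages_to_copy = live_pages - 1
    # Each copied page is read from the old block and written to the new one.
    copy_cost = pages_to_copy * (READ_PAGE_US + WRITE_PAGE_US)
    return copy_cost + WRITE_PAGE_US + ERASE_BLOCK_US


if __name__ == "__main__":
    full = overwrite_cost_us(128)
    print(f"full 128-page block: {full} us")
    print(f"only page in block:  {overwrite_cost_us(1)} us")
```

Under this model, changing one page in a full 128-page block costs roughly 15 milliseconds instead of the 100 microseconds a plain write takes, which is the motivation for letting the drive garbage collect and erase trimmed blocks while it is idle.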
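The XOR parity reconstruction walked through in the RAID 4 part of the lecture can be sketched in a few lines of Python. This is only an illustration: the names `parity` and `rebuild` are made up here, chunks are assumed to be byte strings of equal length, and real RAID implementations work on whole stripes in the kernel or on a controller.

```python
from functools import reduce
from operator import xor


def parity(chunks: list[bytes]) -> bytes:
    """XOR equal-length chunks together, byte by byte, to get the parity chunk."""
    return bytes(reduce(xor, column) for column in zip(*chunks))


def rebuild(surviving: list[bytes], parity_chunk: bytes) -> bytes:
    """Recover the single missing chunk from the survivors plus the parity."""
    # XOR-ing the parity with everything that survived cancels the survivors
    # out and leaves only the chunk that was lost.
    return parity([*surviving, parity_chunk])


if __name__ == "__main__":
    # The lecture's one-bit example: data bits 0, 1, 1 give parity 0.
    a1, a2, a3 = b"\x00", b"\x01", b"\x01"
    p = parity([a1, a2, a3])
    print("parity:", p[0])
    # Lose a2 (a one) and rebuild it from what is left.
    print("rebuilt a2:", rebuild([a1, a3], p)[0])
    # Lose a1 (the zero) instead.
    print("rebuilt a1:", rebuild([a2, a3], p)[0])
```

The same `rebuild` call cannot help when two chunks are missing: with only one parity value there are two equally valid answers, which is exactly the situation RAID 6 adds the second parity, Q, to resolve.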
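To pull the capacity and fault-tolerance numbers from the lecture into one place, here is a small Python sketch. The function `raid_summary` is hypothetical, and it only encodes the simplified rules used in the lecture: worst-case failures, equal-sized disks, and RAID 10 laid out as two mirrored halves. The minimum disk counts for RAID 0, 1 and 10 are assumptions; the lecture only states them for parity RAID (three drives) and RAID 6 (four).

```python
def raid_summary(level: str, disks: int, disk_tb: int) -> tuple[int, int]:
    """Return (usable terabytes, disk failures survived in the worst case)."""
    # Minimums for levels 0, 1 and 10 are assumptions for this sketch.
    minimum = {"0": 2, "1": 2, "4": 3, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unknown RAID level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == "0":
        return disks * disk_tb, 0          # striping only, no redundancy
    if level == "1":
        return disk_tb, disks - 1          # every disk is a full copy
    if level in ("4", "5"):
        return (disks - 1) * disk_tb, 1    # one disk's worth of parity
    if level == "6":
        return (disks - 2) * disk_tb, 2    # two disks' worth of parity
    # RAID 10 as drawn in the lecture: two mirrored halves, each a stripe.
    if disks % 2 != 0:
        raise ValueError("RAID 10 here needs an even number of disks")
    return (disks // 2) * disk_tb, 1


if __name__ == "__main__":
    examples = [("0", 2, 2), ("1", 2, 2), ("4", 4, 10), ("6", 5, 10), ("10", 6, 10)]
    for level, disks, size in examples:
        usable, failures = raid_summary(level, disks, size)
        print(f"RAID {level}: {disks} x {size} TB -> {usable} TB usable, "
              f"survives {failures} failure(s) in the worst case")
```

The example configurations are the ones used in the lecture, so the usable sizes should line up with the figures quoted there. Performance is left out on purpose, since the lecture's speedup estimates depend on how well reads and writes can actually run in parallel.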