All right, back to regularly scheduled programming. Quiz three is set: it has a fairly short multiple-choice scheduling question, 10 general multiple-choice questions on virtual memory, a written response about scheduling, a page replacement question (page faults, so optimal, least recently used, or FIFO are what you have to worry about), and a paging question where you can expect multi-level page tables. With that, we'll switch topic to the last thread of the course and talk about persistence. First we'll talk about hardware, maybe older hardware that some of you might not have seen: good old hard disk drives. This is what the actual hardware looks like if you tore open one of these spinning-rust drives. There are a bunch of platters; each of those big spinning disks is called a platter, and that's where all your information is stored. It's all magnetic, read as ones and zeros. The big arm assembly has multiple arms, and the head at the end does some magic with magnets to actually read or write bits on the disk. Some terminology: everything at the same radius on a surface is called a track, since we have to be able to select some amount of information. The same track position across all the platters is called a cylinder, because it looks like a cylinder in 3D. Within a track there are different sectors, and each of those is of a set size, say a block.
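If it helps to see the geometry as arithmetic, here's a quick sketch of how those terms multiply out to a drive's capacity; all the numbers below are invented for illustration, not from any real drive:

```python
# Hypothetical disk geometry, just to show how the terms fit together.
platters = 4
surfaces = platters * 2          # each platter has two recordable sides, one head per side
tracks_per_surface = 100_000     # concentric rings at different radii
sectors_per_track = 500          # simplified: real disks vary this per zone
bytes_per_sector = 4096          # one block

capacity = surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector
print(capacity // 10**9, "GB (decimal, the way drive manufacturers count)")
```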
So you'll be addressing things in sectors, which have a set size, traditionally 512 bytes and often four kilobytes on newer drives. Multiple sectors live on a track, and there are multiple tracks across all the platters of the disk. As for how it works: the arm assembly can move in and out, and the disk rotates; all the platters spin together, and that's what that loud sound is. Because things physically have to move, your access speed really depends on how close two pieces of data are. If you have sectors on the same track, you can read them continuously: you don't have to move the head at all, you're just reading the subsequent sector, so you essentially get it for free since the disk is rotating anyway; the magnetic head just sits there constantly reading, and it can do that fairly fast. But if your data is in a sector on a different track, the read-write head has to move in and out, and the sector might be on the opposite side of the platter, so you also have to wait for the disk to spin around to the sector you want. Switching between tracks and repositioning the head is actually kind of slow, and repositioning the arm is expensive. So when you're the kernel and you want to access the hard drive, you have to say what sector you want, and historically you did that with an addressing mode called cylinder-head-sector (CHS). You give each of those coordinates, so you can think of it as a 3D coordinate: the head picks which platter surface to read from, like the Z axis; the cylinder is basically the radius, which track you want; and the sector says where on that track, like the angle.
This scheme historically has about an eight gigabyte limit, because it assumes 512 bytes per sector, up to 63 sectors per track, up to 255 heads (how many tracks are in a cylinder), and some maximum number of cylinders. The alternative to giving three coordinates is called logical block addressing (LBA): you just use a single index to address any block, and that index maps straight to a sector. It isn't limited to eight gigabytes; it's just however many index bits you have, and an index reads you off a sector. But at that point you have no idea about the actual geometry of the device. Two sectors whose indexes are right next to each other may actually be on different tracks, because there have to be breaks somewhere. So you lose a bit of control if you want to really, really optimize something with logical block addressing, because you're not guaranteed that two sectors indexed one apart are physically next to each other.
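The usual CHS-to-LBA flattening, and where that roughly-eight-gigabyte ceiling comes from, look like this; I'm assuming the classic BIOS-era limits of 1024 cylinders, 255 heads, and 63 sectors of 512 bytes (note that CHS sector numbering starts at 1, not 0):

```python
SECTOR_BYTES = 512
MAX_CYLINDERS = 1024   # BIOS-era limit on the cylinder coordinate
MAX_HEADS = 255
MAX_SECTORS = 63       # sectors within a track are numbered starting at 1

def chs_to_lba(c, h, s, heads=MAX_HEADS, sectors=MAX_SECTORS):
    """Flatten the 3D (cylinder, head, sector) coordinate into a single index."""
    return (c * heads + h) * sectors + (s - 1)

# Total addressable bytes under these limits:
limit = MAX_CYLINDERS * MAX_HEADS * MAX_SECTORS * SECTOR_BYTES
print(limit)                 # 8422686720 bytes, i.e. about 8.4 decimal GB
print(chs_to_lba(0, 0, 1))   # the very first sector is LBA 0
```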
There's also a thing for hard drives called shingled magnetic recording (SMR), which is relatively new and which you typically want to avoid. Under normal circumstances, the write head only writes in the center of a track, with some unused padding around it, because of magnetic interference: the padding gives the interference somewhere harmless to happen so it doesn't destroy anything on neighboring tracks. SMR takes advantage of that and writes over the padding, but only if you write in a certain way: you have to guarantee all your writes happen in order, say outer tracks before inner tracks. That way the tracks overlap a little, like shingles, and you can pack things closer together, but you have to be really careful to do everything sequentially or that interference can ruin data. Your performance may also suffer if you don't know this is happening and you write in the opposite order from how things are shingled, because the drive has to rewrite whole overlapped regions just to modify a bit at the end, over and over again. So SMR gives you more density and it's cheaper, but used incorrectly your performance suffers a lot. Cheap hard disks now might actually have this technology in them, and you have to check. Like I said, positioning the head and rotating the disk are not free, and the time they take depends on the physical distance involved. The rotational delay is the time it takes to physically rotate the disk to get to the correct sector, typically around four to eight milliseconds if you assume the average delay is half a full rotation (on average it has to rotate halfway for completely random accesses). Seek time is moving the head in and out to get to the correct track; it doesn't have to wait for anything to spin, just move, so for nearby tracks it can be as low as half a millisecond to two milliseconds, though average seeks across the whole disk are more like four to nine milliseconds. Transfer time is how long it takes to actually read bytes from the disk, assuming everything is sequential and in order and we're reading as fast as we can from the magnetic head. The maximum transfer speed for these kinds of disks is somewhere in the neighborhood of 125, maybe 150 megabytes a second, while NVMe drives now are more like 7,000 megabytes a second, so disks are relatively slow but still much, much cheaper. Calculating the transfer rate is fairly simple math: the total time to transfer some data has to account for the rotational delay, the seek time, and the time to actually read the data off the head, and the transfer rate is just the size of the transfer divided by how long it takes, since that is the definition of transfer rate. Then we can answer questions like: what is the transfer rate of large sequential accesses, and what is the transfer rate of small random accesses, for each disk? (Spoiler: you should use disks sequentially whenever possible, because otherwise performance degrades really, really badly.) Say we have two hard drives. Hard drive one is a consumer drive you could buy off the shelf, and hard drive two is more of an enterprise one. The enterprise one spins more than twice as fast, which is directly proportional to its rotational latency, so the consumer drive might be around 4.2
milliseconds while the enterprise drive would be something like two. The average seek time is also about twice as fast on the enterprise drive: around nine milliseconds for the consumer drive versus around four. The maximum transfer speeds wouldn't actually be that different, something like 105 megabytes a second versus 125; they'd have the same number of platters, and their interfaces to the hardware would differ a bit. For the sequential 100 megabyte read, we assume the ideal case where the seek time and rotational time are zero, so the transfer rate is just the given maximum rate; with the rate and the amount of data, we can calculate the time. The more interesting calculation is for a small random read, say four kilobytes: that's where the big difference between the two drives shows up, even though both are really, really slow at it, something like 0.3 megabytes a second versus 0.66 for the enterprise drive. So, working it through. For the sequential access, seek and rotation are zero, so if t is the total time, it comes only from the transfer time. For hard drive one, the rate is 105 megabytes a second and we're transferring 100 megabytes, so t = 100 MB ÷ 105 MB/s, which is about 0.95 seconds. For the four kilobyte random read we assume the opposite: the transfer itself is instant, so the total time t is just the average seek time plus the average rotational delay. For hard drive one, the numbers on the slide were an average seek of 9 milliseconds and an average rotation of 4.2 milliseconds, so the whole time is 13.2 milliseconds, which is 0.0132 seconds. The amount of data is four kilobytes, and we'll use the hard drive manufacturer convention where one megabyte is a thousand kilobytes, which is not really true. If we do powers of two, one kilobyte equals two to the ten, 1024 bytes, but hard drive manufacturers believe one kilobyte equals a thousand bytes. This is why when you buy a one terabyte hard drive and try to use it, it only shows up as 900-something gigabytes: manufacturers don't believe in powers of two, they believe in powers of a thousand, so they can sell you something you think is bigger than it actually is. At one kilobyte the difference is only 24 bytes, so the numbers are similar, but at one megabyte the manufacturer means a million bytes while you probably mean two to the 20, which is 1,048,576, and it gets even worse if you believe one gigabyte is a billion bytes when you think it's two to the 30; the numbers get even more wildly different. That's why a one terabyte hard drive isn't actually a terabyte. That's marketing for you. So, finishing the calculation: the rate is d divided by t, how much data we're transferring divided by how long it takes, which is 0.004 megabytes divided by 0.0132 seconds, giving about 0.30 megabytes a second for hard drive one. Doing the same for hard drive two, we're transferring the same amount of data, but t is 4 + 2 = 6 milliseconds, which is 0.006 seconds, since it both seeks and spins about twice as fast; that gives 0.004 ÷ 0.006, about 0.66 megabytes a second. That's the difference between the drives; fairly simple math. Now, remember that with logical block addressing you just have an index for sectors, and you don't know where they are on the disk. If you're the hard drive, you probably want them numbered so that blocks that are sequential in index order are actually located next to each other, laid out in the direction the disk spins. Here I could sequentially access block zero all the way to 11, all in order, and it would all be sequential and really fast; then there's a break between 11 and 12 where I have to move tracks, but I try to minimize that as much as possible. Lay things out like that and software doesn't need all three coordinates, and still gets pretty good performance without any complicated inspection of the hardware. And also, because it spins, you may want to optimize this a bit better
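Here's that arithmetic as a short script, using the numbers from the slides and the manufacturer convention that a megabyte is 10^6 bytes:

```python
def sequential_time_s(size_mb, max_rate_mb_s):
    """Large sequential read: assume seek and rotation are zero,
    so the time comes entirely from the transfer."""
    return size_mb / max_rate_mb_s

def random_4k_rate_mb_s(seek_ms, rotation_ms):
    """Small random read: assume the transfer itself is instant,
    so total time is just average seek + average rotational delay."""
    t_seconds = (seek_ms + rotation_ms) / 1000
    return 0.004 / t_seconds          # 4 KB = 0.004 MB, manufacturer-style

# Hard drive 1 (consumer):   9 ms avg seek, 4.2 ms avg rotation, 105 MB/s max
# Hard drive 2 (enterprise): 4 ms avg seek, 2 ms avg rotation,   125 MB/s max
print(sequential_time_s(100, 105))    # ~0.95 s for the 100 MB sequential read
print(random_4k_rate_mb_s(9, 4.2))    # ~0.30 MB/s
print(random_4k_rate_mb_s(4, 2))      # ~0.66 MB/s
```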
So when you transfer from reading sector 11 to sector 12, you have to move between tracks. If you're designing the hardware, you'd want to number the sectors such that by the time the head moves to the next track, sector 12 is just arriving under the head and can be read as quickly as possible. Then if you're the kernel scheduling this, you just access everything in order and it's as fast as it could possibly be. That makes things more efficient, and it's really simple to program: you just try to access things in order, and it reduces the seek time for sequential accesses as much as possible. But you may want more flexibility. If you're the kernel controlling the file system, you can't really inspect or optimize anything, because you have no idea about the geometry; maybe you want to be absolutely sure you don't change tracks, but with this default mapping you essentially have one dimension in total. You might try to reverse-engineer the mapping by measuring the transfer rate between different blocks and figuring out where the breaks are, but you probably don't want to do that. The hardware in a disk is also quite complicated: the number of sectors in a track might change across the disk, sectors go bad randomly, and the disk will silently remap them, saying "hey, that sector is bad, I won't use it anymore," and modify all the numbers, so you'll quickly lose track of things because it won't stay consistent. And this is where we get to our fun friend, the cache. We've talked about all sorts of caches, and you can also use caches for disks to significantly speed up transfers. Some disks have internal memory; the Western Digital Red drives, for example, have something like 64 megabytes of cache. What they can do with it is implement a read-ahead track buffer: the drive reads the entire contents of a track while the head is already there and keeps it in the cache, so if you later access information from that track, you don't have to wait for the disk to spin back around; it's served from the cache because the head already passed over it. Another thing you can do is write caching, with just your RAM. Say you make thousands of small changes to a file: the kernel can keep that block or sector in memory, you modify it as if it's memory, and it doesn't actually write the data back out to disk until you're finished with it or you explicitly ask. So there are some strategies here. If you want to be really, really fast, you can say "hey kernel, please write this to disk, I've modified it," and the kernel can say "yep, I did that," lie to you, and not actually do it. That looks really fast, and as long as you keep reading the data you'll see your modifications, but the data is essentially only in memory, and memory is volatile: if the power goes off, all that information is gone. So on a power failure you're actually going to lose data, and that's not going to be good. The other option, if you're serious about things being written, is called write-through: you only get an acknowledgement once the data is physically written to disk, which guarantees that a power failure won't lose that data. So if you're implementing file
systems, you would probably want that real notification if you actually care about your data. Okay, and the fun thing about disks is that we can schedule them too; scheduling is still a thing here. You want to minimize the amount of time the disk moves without reading or writing data, and this mostly applies if you actually have access to all three coordinates. You can do first-come-first-served: if your programs are reading and writing sectors, you just service requests in the order they're received. Remember, your kernel is way faster than the disk, because disks are fairly slow, so requests queue up. First-come-first-served for sector or block accesses is fair, but it's essentially random access, and you can do better. The first improvement is shortest seek time first: if I have a bunch of requests, I service them in order of how close they are to wherever the hell my head currently is, trying to reduce arm movement, that is, seek time. But this is going to be unfair; it's essentially the hard disk version of shortest remaining time first. I'll only do the close accesses, so a request that's really far away may get starved because all the other accesses are close together and get serviced instead. What you can do instead is an algorithm called the elevator, which sweeps across the disk. Say these are requests for sectors 0, 12, and 20. The elevator goes sequentially through the tracks, one after another, and only services requests that are already pending when it passes through; it won't go backwards until it sweeps all the way to the end, and then it starts over. In this case it would start at the outside track, read sector 0, then after that track is done read 12, then after that track is done read 20. And if the elevator is currently moving up, say we've read 0 and moved up toward 12, and a request comes in for sector 1 or something on the outer track, the elevator won't actually read sector 1 until it goes back down to the ground floor and restarts. Does that kind of make sense to everyone? Hopefully. Note that the elevator ignores rotation. Typically shortest positioning time first, which accounts for both seek and rotation, is the best strategy, but the drive and the OS need to work together to implement it, so since the elevator ignores rotation it might not be the most efficient thing; you might want a combination of it and shortest positioning time first, minimizing the physical distance between sectors so you don't have to rotate the disk or move the arm as much. So the elevator works through the tracks in order and doesn't care where requests are rotationally. With this example, say besides 0, 12, and 20 we also had a request for 6, on the same track as 0 but halfway around. A faster schedule might be to read 0, 12, 20 while moving inward and then pick up 6 afterwards, but the elevator does everything on the same track first, so it would read 0, 6, 12, 20 and keep going up. And say we've read 0, then 6, and moved to 12; if a request now comes in for 5, it won't get serviced until the next sweep. So it goes track by track: all the requests on the outside track, move a track in, all the requests there, move a track in, all the requests there, then reset the head back to the beginning. So with the
elevator, the head only moves in one direction and then resets back to the start. So with 0, 12, 20, and 6, the elevator does everything on the outer ring first: 0, 6, 12, 20. The fastest thing might instead be 0, 12, 20 and then 6, because if I do 0 then 6, I have to read 0, wait for the disk to spin 180 degrees, read 6, then wait for it to spin 180 again to get back around to 12. Instead I could have gone 0, 12, 20 without waiting for rotation at all, if they were lined up, just moving the head while the disk rotates, and then picked off 6 afterwards. So the elevator is a bit slower, but it's fair and doesn't have the starvation issue. If I just did shortest positioning time first and had requests next to each other that kept showing up again and again, plus a request for 20, then 20 might never actually get read; the head would just stay on the outer track doing the closer requests. It's the same idea as starving a process: you can starve a block on the disk. Okay, so: disks enable persistence. We explored two hard drives, an enterprise one and a consumer one, saw shingled and non-shingled recording, and saw a couple of addressing options. Basically, magnetic disks have really poor random access, and they need scheduling, which is a good thing we learned scheduling. Shortest positioning time first minimizes actually waiting for any mechanical thing to move, so it's the best scheduling for throughput, but like shortest remaining time first it can have starvation issues. Scheduling is still a trade-off, even here. Okay, now, do we have any final quiz three thoughts, concerns, or anything like that? Yeah. So this one's kind of in between: it has one scheduling multiple choice, which is short-ish but not short-short, then 10 quick multiple choices on virtual memory, then a written one about scheduling, like some benefits and drawbacks, then page replacement, which is a longer one asking what the page faults are at each step, and then a paging question: given this virtual address, tell me what it is. So three longer-ish questions and otherwise multiple choice, somewhere between quiz one and two. It's 40 points total, which isn't super ideal, but it's better than quiz two, which was only 30 and had only a few questions. As for the final, I don't know exactly what the format will look like. It's in a computer lab; I believe it's a Quercus quiz, but a supervised one, so you'll be doing it in the computer labs under supervision, and they have some setup where they're restricting something or other on the machines there, so you have to go on those machines and do it. So yes, it'll be an in-person online quiz. I'll finish this up and then answer questions off-stream. Just remember, I'm pulling for you; we're in this together.
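One more thing on the disk-scheduling discussion from earlier: here's a tiny sketch comparing shortest seek time first with a one-directional elevator sweep. It only models track positions (rotation is ignored, as the elevator itself ignores it), and the track numbers in the example are made up:

```python
def sstf(head, requests):
    """Shortest seek time first: always service the closest pending track.
    Greedy and fast, but distant requests can starve."""
    pending, order = list(requests), []
    while pending:
        nxt = min(pending, key=lambda t: abs(t - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order

def elevator(head, requests):
    """One-directional sweep: service tracks in increasing order from the head,
    then reset to the 'ground floor' and pick up the remaining lower tracks."""
    up = sorted(t for t in requests if t >= head)
    wrapped = sorted(t for t in requests if t < head)
    return up + wrapped

# Head at track 10, one far-away request at 50 among nearby ones.
print(sstf(10, [9, 11, 8, 12, 50]))      # [9, 8, 11, 12, 50] - 50 waits until last
print(elevator(10, [9, 11, 8, 12, 50]))  # [11, 12, 50, 8, 9] - 50 served on the sweep
```

With a continuous stream of nearby arrivals, SSTF would keep deferring track 50 indefinitely, while the elevator guarantees it gets serviced within one sweep; that's the fairness-versus-throughput trade-off from the lecture.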