Okay, let's get started. So this is lecture 13 of Computer Science 162, and we're gonna start first with one of our quick little quizzes, and it looks like I need to restart my PowerPoint again. There's a really weird bug with PowerPoint where it seems to lose all the fonts. All right, let's try that again. There we go. Okay, so a quick quiz. We've got five questions. So first question, with an asynchronous interface, the writer may need to block until the data is written. Second question, interrupts are more efficient than polling for handling very frequent requests. Third question, a segmentation fault is an example of a synchronous exception or trap. Fourth question, direct memory access is more efficient than programmed IO for transferring large volumes of data. And our fifth question is, in an IO subsystem, the queuing time for a request is 10 milliseconds, and the request service time is 40 milliseconds. The total response time of the request is blank milliseconds. All right, so let's start with the first one. With an asynchronous interface, the writer may need to block until the data is written. So how many people think that is true? Okay, how many people think that's false? Good, you know what asynchronous is. So asynchronous means you don't block, so the answer is false. Question number two, interrupts are more efficient than polling for handling very frequent requests. How many people think that is true? Okay, how many people think that's false? Ah, you guys are doing great. All right, so that is indeed false, right? Interrupts are very good if we have something infrequent. Polling is what we wanna use if we have something frequent, because we don't have the overhead of all the context switching that occurs when you have interrupts. Okay, question number three, a segmentation fault is an example of a synchronous exception or trap. How many people think that is true? Okay, how many people think that's false? Good, that's true. When you have a segmentation fault, you are immediately going to trap into the kernel. You cannot block that, stop it, or prevent it from happening. Okay, question number four, direct memory access is more efficient than programmed IO for transferring large volumes of data. How many people think that is true? Awesome. And false? Okay, that is indeed true, right? Direct memory access is what we can use when we wanna transfer, say, a large block of data from the disk into memory or vice versa, or to the network card. It doesn't require any CPU overhead other than setting up the request. After that, the CPU is free to do something else. Yeah, question? What is programmed IO? Programmed IO means the CPU itself does the transfer, with loads and stores or input-output instructions. Okay, last question, in an IO subsystem, the queuing time for a request is 10 milliseconds and the request service time is 40 milliseconds. The total response time of the request is 50 milliseconds, the sum of the two. Great, you guys are gonna do really well on next week's midterm. Okay, so what are we gonna talk about today? We're gonna talk about hard drives and solid state drives, and then we're gonna talk about some of the very important storage policies that we wanna have in a system and the access and usage patterns for a file system. And then we're gonna talk about some of the data structures in a file system, and we're gonna actually go through an example of the world's most commonly used file system. We'll see if you can guess what that is. So first, let's start with what we store our data on.
We store our data on hard drives. So hard drives have a long history. IBM spent 10 years developing the first Winchester technology hard drive, and it took them a billion dollars to do that. So it really was a moonshot kind of research project for IBM. But the result has been a proliferation of drives of all sorts of varying sizes, ranging from drives like this three and a half inch drive all the way down to microdrives. So one of the first commercial drives that was available to small businesses and consumers was in the IBM Personal Computer. Back in 1986, so probably before everyone here was born, it stored a whopping 30 megabytes. That's megabytes, not gigabytes. It cost $500, had a 30 to 40 millisecond seek time, and we'll talk about what that means in a little bit, but that's a really large amount of time, and it could transfer between 700 kilobytes and a megabyte per second, roughly. About a decade later, in the mid 90s, IBM introduced the microdrive. The microdrive is a little CompactFlash-size drive with a platter in it that's the size of a quarter. The original one was 30 megabytes; here's one at 512 megabytes, and they ultimately produced one gigabyte and four gigabyte sizes. Primarily these went into tablets and into cameras. I think the original MacBook Air had one of these. So how many people have taken their computer and opened it up and looked inside at a hard drive and actually taken the hard drive apart? Okay, well, why don't we do that today? So I have a variety of different drives, not all of which I can hand out. Here's a drive, for example, that's in a bag, because the way hard drives work is on the Bernoulli principle. You spin the disk really, really fast; the head's really small and light, and you'll get to see what the heads look like in a moment, and it floats on a little cushion of air. Now if you drop your computer or otherwise subject it to high torque or other things like that, you can cause that head to contact the media. And when that happens, all your bits get scraped off, and then you put your drive in a bag like this so it doesn't make a big mess. Okay, so like I said, I have a whole variety of different sizes, and my only request is that I actually get these back. Kevin's sitting in the back. I have a lot of valuable data on these. And if you really want one of these drives, I have many of them, so I can give you one. But it's good if I get them back, because then I can use them for next semester. Okay, so as you're looking at these, what are some of the properties? So inside each of these drives there's a set of platters. This is where your data lives. The platters are typically aluminum, or in some very high-end enterprise drives they're made out of high-strength ceramics, and they're coated with ferromagnetic material, okay? And then we have a head, a thin-film head, which is the thing that's floating above the platter and actually reads and writes the data. Then on the bottom of the drive, you'll see there's a large motor which actually spins this thing at high speed, and there's another large motor which moves the heads back and forth, okay? So we take our platters, we take each of these surfaces, and we divide it up into concentric rings. Those concentric rings are tracks. We take each track and we divide it up into sectors, okay? Now, to the operating system, the sector is the smallest addressable unit on a drive.
The operating system groups those sectors into blocks, and that's the unit of transfer for the operating system. So we transfer a block at a time, which is one or more sectors. Now, we have random access. So this thing is always spinning. We can go to any position on any of the platters of this drive by moving the arm back and forth and waiting for the drive to rotate around to the appropriate location. So access is either sequential or random, to anywhere on the drive. Now, typical numbers, and these are always horribly out of date because they change literally every few months, are that you have somewhere between 500 and 20,000 tracks per surface. So those little microdrives, they're not gonna have a lot of tracks. A larger drive like this will have a lot more tracks on it. And each track will have anywhere from 32 to 800 sectors on it. Now, you might notice that if you're closer to the center, the track is actually a lot smaller in area than if you're at the outside. So most modern drives do what's called zone bit recording. They keep the density of bits constant across the entire platter. This allows you to cram the maximum number of bits onto the drive. What it means is that if you look at the center of the drive, there are gonna be fewer bits stored there, so fewer sectors in the center, and the outside of the drive will have a lot more sectors. Older drives, like the floppy drives on the Apple IIGS or the original Macs, actually spun the disk at different speeds depending on the track location. That's how they maintained a constant bit density. Modern drives just vary the data rate automatically in the drive electronics; the drive spins at a fixed speed. Okay, so some more characteristics. If we take all of the tracks on all of the surfaces of all of the platters that are under the heads at any given time, that forms a cylinder. So that's the track under the head on every surface. Now, reading and writing is going to be a three-stage process. The first part of the process is we actually have to move the arm. So we're gonna move the arm to the appropriate track. That is our seek time. Now, once we have it on the appropriate track, we have to wait for the drive to spin around until the proper sector is under the head. That's our rotational delay. And then, once that's the case, we just read the bits off the disk or we write the bits to the drive, and that's our transfer time. So those are the three components of the physical side of the drive. Now, there's more time associated with reading and writing from a drive. For the total time, first we actually have to deliver the request to the operating system, and it gets queued up and processed by a device driver. Then it gets sent to the hardware controller, and that's the big chip on the back of this, like this one from Texas Instruments that says DSP, because it's the digital signal processor that does all of the data extraction, the signal processing. And then there's the media time. So again, the time to seek to the appropriate track, wait for the appropriate sector to rotate under the head, and then transfer the contents of the sector. So the highest bandwidth that we're gonna get out of a drive is when we're transferring a group of blocks sequentially from one track, okay? Anything else is gonna be slower. So yeah, question? So a cylinder, the question is what is a cylinder?
So a cylinder is, if I take the head and I move it to a particular track, all the tracks that are under all of the heads. So in this case I have one, two, three, four, five platters, so I've got 10 heads, and the 10 tracks that are under the heads form a cylinder. You're reading from one track at a time, that's correct. But the key thing is you can actually switch reading from one head to another head very quickly. It's just switching an amplifier to connect it to that head or to a different head. So it's an electronic time, as opposed to a physical time like waiting for something to rotate around. Okay, so some typical numbers. Average seek times, and these numbers will vary wildly depending on the size of the drive and how much you pay for the drive. So if you buy a drive that's in a very low-end laptop or a low-end desktop, it's going to be a lot slower than the kinds of drives that we use in the second-floor machine rooms in Soda Hall to store the department's files and to store all your project files. So typical seek times, five to 10 milliseconds. And depending on how far you're actually seeking, it could be a lot lower. This is physics, right? So you have to remember, like F equals MA, you remember all of that stuff. This thing has mass, and so you have to accelerate it up to speed to go from one part of the disk to another part, and you have to decelerate so you don't overshoot the track. So there are a lot of complex algorithms here as to how quickly you can accelerate and how quickly you can decelerate. Okay, average rotational delay. So laptop and desktop drives typically rotate somewhere between 3,600 and 7,200 RPM. I'd say most are probably 4,500. The higher-end drives will be 7,200 RPM. And that gives you a rotational delay of 16 down to eight milliseconds. Of course, for the average, you only have to wait for the thing to come around halfway, because on average when you get to the track, the sector you're looking for is gonna be halfway around. So you can cut those numbers in half. Drives that we use in server environments spin at 15,000 RPM, revolutions per minute. You probably don't wanna stand next to a big chunk of metal spinning at that speed. Those platters, like I said, are typically made out of ceramics, so if they do fail, they're contained by the drive's enclosure. But that's also part of the reason why these things have such massive chassis: if the disk does fail, you don't want it going flying all over the place. And then controller time, this will depend on the hardware. So higher-end drives will have powerful processors; lower-end drives will have weaker processors. Yeah, question? Yes, it does. Yeah, that's a very good question. The question is, are we reading through the disk with these multiple heads? Do we have one giant head, or do we have individual heads? We have individual heads. So if you look at the top of the drive, you can actually see how small, actually this head fell off, but you can see how small the head is. So that's why here there's actually a stack. So here's one head, and there's the cantilever mechanism. There's another head under here, there's another head here, there's another head under here, there's another head here. So each surface has its own head, and each surface has its own set of tracks. So one way we can increase the amount of data that we can store in a drive is by adding more platters.
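To pin down the geometry vocabulary so far, here's a minimal sketch, in C, of how a logical block address could map onto an idealized drive; the sectors-per-track and head counts are made-up numbers, and real drives hide all of this behind logical block addressing and zone bit recording, so treat it as an illustration only.

```c
/* Idealized logical-block-address to (cylinder, head, sector) mapping.
 * Assumes a constant number of sectors per track, which zone bit
 * recording deliberately violates on real drives. */
#include <stdio.h>

#define SECTORS_PER_TRACK 63   /* illustrative, not a real drive's value */
#define HEADS             10   /* five platters, two surfaces each */

struct chs { long cylinder; int head; int sector; };

static struct chs lba_to_chs(long lba) {
    struct chs c;
    c.cylinder = lba / (HEADS * SECTORS_PER_TRACK);
    long rem   = lba % (HEADS * SECTORS_PER_TRACK);
    c.head     = (int)(rem / SECTORS_PER_TRACK);
    c.sector   = (int)(rem % SECTORS_PER_TRACK);
    return c;
}

int main(void) {
    struct chs c = lba_to_chs(123456);
    printf("cylinder %ld, head %d, sector %d\n", c.cylinder, c.head, c.sector);
    return 0;
}
```

Note the ordering this mapping implies: consecutive addresses fill a track, then switch heads within the same cylinder, which is that cheap electronic switch, and only then move the arm to the next cylinder. That's exactly why sequential transfers can avoid seeks.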
So here's a large set of platters, this is from a much larger drive, whereas this is a much thinner drive, it only has three platters, whereas this one has, I don't know, like eight platters. So what happens is those heads are all connected to an amplifier through muxes, actually, and we can switch the amplifier from one head to the other. Yeah, that's correct. So we read from one head at a time. There have been sort of Rube Goldberg kinds of proposals to have multiple arms, but you get into a lot of airflow problems and vibration issues and such sorts of things. So of the heads that you have, you read from one head at a time, and part of the other reason is that you've got one DSP. So you can only read from one track, process the data from that track, then you can switch the amplifier to a different head, read that data, and so on, without having to move the arm. That's exactly what we have. So each of these arms here has a head that faces up and a head that faces down. So both sides of every platter have a head on it. Sometimes the bottom platter doesn't, or the top platter doesn't, or on older drives they used one surface for servo information so you could do fine-grained positioning. But today we try to use every surface. It's critical. We're trying to pack as much data as possible onto these. So here's another way of looking at it. The heads are like a comb that floats above each of the surfaces. Any other questions? Okay. So controller hardware is gonna depend on how much money you pay. Higher-end drives will have a more powerful processor and be able to process data faster. Cheaper drives will have a lower-quality, lower-performance processor. Transfer times of 50 to 100 megabytes per second are typical. The transfer size is usually a sector, which could be anywhere from 512 bytes up; actually, the sector sizes are getting a lot larger. They're now, I think, up to four kilobytes or even sometimes 16 kilobytes on some of the really massive drives. Rotation speed, again, is anywhere from 3,600 to 15,000 RPM. The recording density, the bits per inch on a track, will vary widely also. Again, the higher-end drives will have a much higher recording capability. Originally the magnetic domains were arranged flat, sort of like flat stones. And then they realized they could actually pack more data if they arranged them vertically, like a set of pencils on end. So the cells they're storing your information in are getting smaller and smaller. And diameters range from the size of a quarter up to five and a quarter inches for the largest size. Probably the sweet spot is two and a half or three and a half inch drives. Costs drop at exponential rates. Performance doesn't change that quickly. Performance kind of improves on a decade-by-decade basis, because it's physics. But the cost drops very rapidly. So even this number is kind of off. I think it's a little bit higher than that, because there were floods last year and that took out a lot of capacity. But it's around a dime per gigabyte. Okay, so performance-wise, let's go through some simple examples. We're going to ignore the queuing times and the controller times from now on. We're just going to worry about the seek, the rotational delay, and the transfer time. We're going to assume an average seek time of five milliseconds, a rotational latency of eight milliseconds, and a transfer rate of four megabytes per second.
Sector size of one kilobyte. So let's say we want to read from a random place on the disk. So again, what do we have to do? We have to seek to that location. We're going to move the arm; that's going to take us five milliseconds. We have to wait on average for half of the disk to rotate around, so that's going to take us four milliseconds, half of the eight. And then we have to transfer the sector, which, at a transfer rate of four megabytes per second and a sector size of one kilobyte, is going to take a quarter of a millisecond. So the total time to do that transfer is going to be nine and a quarter milliseconds, or roughly 10 milliseconds. So we have a transfer rate to a random place on the disk of about 100 kilobytes per second. Anybody see something wrong with that? We're paying for a drive that's supposed to give us, it says on the label, a transfer rate of four megabytes per second. And we're only actually getting 100 kilobytes per second out of it. So that's really bad. So we can do a little bit better if we stay on the same cylinder, because if we stay on the same cylinder, we don't have to seek. We just have to wait for the disk to rotate. And so now we just have the rotational delay, which is four milliseconds, plus the transfer time, which is a quarter of a millisecond. So four and a quarter milliseconds, or approximately five milliseconds. So we've doubled our transfer rate from 100 kilobytes to a whopping 200 kilobytes per second. Still a far cry from the four megabytes per second that we were promised. So I'm still not going to be really happy. If we read the next sector, no delays. We just have transfer time. Now we actually see the four megabytes per second that we were promised. So the takeaway here is, if we want to use our disk efficiently, and think file systems here, we want to minimize the amount of seeking and rotational delay that we incur. Doing as many sequential reads of the next sector on the same track or cylinder as we can is going to be really important. That's going to give us the highest performance. OK, so that kind of naturally leads into the question of how we actually schedule our disk. So just as we had CPU scheduling, now we're going to look at a different scheduling problem. The disk can only do one thing at a time, and we have to move the arm in order to get to the location that we want to read from and wait for the disk to rotate. So I'm going to ignore cylinders for now and just assume we have a single arm and we're moving it from one track to another. So we have a bunch of requests that come in from users. So track 2, sector 2, track 5, sector 2, track 7, sector 2, and so on. We have to figure out what order we're going to service those requests in. Now, there are lots of scheduling algorithms, and we're going to go through four of them and look at the trade-offs between them. The four are: first in, first out, that's our favorite, the simplest of all; shortest seek time first; and then two scan algorithms, SCAN and C-SCAN. OK, now to further simplify the problem, we're going to ignore the sector number and just look at the track number. So we're going to start with our disk head on a particular track, and we're just going to move it around following a given scheduling algorithm. So let's start with FIFO. Let's say we have our head sitting here on track 5, and our request queue is 2, 1, 3, 6, 2, and 5. Those are the track numbers. So what's going to be our scheduling order? Yeah, it's really easy. Just 2, 1, 3, 6, 2, 5. So we're on 5.
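Before we trace the schedules, here's a minimal sketch, in C, of the access-time arithmetic from the example above. It treats 4 MB/s as 4,000 KB/s to match the in-lecture rounding; the numbers are the assumed ones, not any particular drive's.

```c
/* The three access scenarios from the lecture example: average seek
 * 5 ms, 8 ms per rotation (so 4 ms average rotational delay),
 * 4 MB/s transfer rate, 1 KB sectors. */
#include <stdio.h>

int main(void) {
    double seek_ms = 5.0;                   /* move the arm to the track */
    double rot_ms  = 8.0 / 2.0;             /* wait half a revolution on average */
    double xfer_ms = 1.0 / 4000.0 * 1000.0; /* 1 KB at 4,000 KB/s = 0.25 ms */

    double random_ms     = seek_ms + rot_ms + xfer_ms;  /* 9.25 ms */
    double same_cyl_ms   = rot_ms + xfer_ms;            /* 4.25 ms */
    double sequential_ms = xfer_ms;                     /* 0.25 ms */

    /* Effective bandwidth: one 1 KB sector per access, ms -> s factor 1000. */
    printf("random place:  %.2f ms -> %4.0f KB/s\n", random_ms,     1000.0 / random_ms);
    printf("same cylinder: %.2f ms -> %4.0f KB/s\n", same_cyl_ms,   1000.0 / same_cyl_ms);
    printf("next sector:   %.2f ms -> %4.0f KB/s\n", sequential_ms, 1000.0 / sequential_ms);
    return 0;
}
```

Back to the FIFO example: the head starts on track 5, and the queue, in arrival order, is 2, 1, 3, 6, 2, 5.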
We will move the head in to service 2. We'll move it in again to service 1. Then we'll move it back out to service track 3. Then we'll move it out to 6 to service request 6. Move it back in to service request 2. Move it back out to service request 5. So advantages: it's really simple to implement, and it's fair amongst the requesters. Whatever order you arrived in is the order your request gets serviced in. But depending on the order of arrival, we may be doing a lot of long seeks. And I just got through telling you, the last thing we want to do is lots of seeks, because that's going to really hurt our performance. We're going to get the lowest speed out of the drive instead of the fastest speed. OK, so let's look at shortest seek time first. Instead of just taking the requests in arrival order, we're going to do a sort. So assume the head's again starting at track 5. Now, this is called shortest seek time first; really it should also take into account the rotational delay, but for today, we're just going to look at the seek time as our metric. So if our request queue is 2, 1, 3, 6, 2, 5, and we start on 5, where are we going to go first? 5, seek distance of 0. And then we'll go 6, and then we'll go 3, then 2, and 2, and 1. So that'll look like this. We'll service the request that's on track 5, then we'll seek to track 6, we'll seek to track 3, seek to track 2, service the second request on track 2, and finally seek to track 1. So the advantage: we've reduced, not the number of seeks, but the distance we're seeking. So a lot shorter seeks. The disadvantage is it could lead to starvation. If we have lots of requests that are out here, we may never service requests that are at the inside of the disk. So this would not be a good algorithm to implement in practice. So an alternative is SCAN. With SCAN, we're going to take the closest request in the direction of travel. So we're going to assume the head is moving towards the center of the disk, and our request queue is 2, 1, 3, 6, 2, 5. So what's going to be our order? We'll first take 5, because we're already on that track, so we don't have to move. Then we'll take 3, then we'll take 2, the other 2, 1, and then we'll seek out to 6. So we'll go 5, 3, 2, 2, 1, out to 6. Yes, yes, because there are two requests for 2, so we visit 2 twice. Yeah, so the question is what happens if another request for 2 and another request for 2 and another request for 2 comes in? Yes, that could cause us to starve the other requests, if we're continually sorting our queue by the closest things. So the advantage here is we're not going to starve areas, as long as we have a distribution of requests. If we have a hotspot, we're going to stay on that track until we service the requests on that track, and we've reduced the amount of seek distance that we have. But the disadvantage is it's not very fair. We're favoring the tracks in the middle. This is why, if you ever live in a high-rise building, you pick a middle floor, because then there are always elevators going up or elevators going down, and so you have a high probability of catching an elevator in a very short amount of time. If you're at the extremes, it takes a lot longer. That's why we put the labs in Soda Hall in the basement. It takes a lot longer for you guys to get elevators. But faculty offices used to be on the fifth and sixth floors, so we had elevators always going by us. OK, so a fairer approach is C-SCAN.
So with C-SCAN, we're only going to service requests in one direction of travel. So, for example, we might only service requests while the head is moving in one direction; coming back the other way, we won't service anything. So if our request queue is 2, 1, 3, 6, 2, 5, our order will then be: service the request on track 5, continue moving out to track 6, then seek all the way back to track 1, then service 2, then service 2, then service 3. So we'll do this: 5, then we seek out to 6. We seek all the way back in, and we service the request on track 1, service the request on track 2, the second request on track 2, service the request on track 3, all right? Yes. Sure, that's a very good question. And that's actually the original way drives worked. So the original drives were actually drums coated with magnetic media, and you had a head on every track. But the heads were actually in physical contact with the drum. So that limited the size and the density of tracks that you could have. And because the heads were in direct contact with the drum, they were wearing the media off over time. With Winchester drives, using the Bernoulli principle, the head floats above the platter. It's less than the width of a human hair that it floats above the platter. So if we put multiple heads on a surface, first of all, we wouldn't be able to put them close enough to match the densities of tracks. The second is the airflow would interfere between them, and we'd have heads crashing all the time. And the third is, if you wanted to make it move, it would be really heavy, and that would require a really big motor. Yeah, so, I mean, people have looked at a variety of different techniques. You know, if you really want to read about drives that are really amazing, look up shingled magnetic recording. With that technology, rather than having discrete tracks, the tracks actually overlap, and they use complex signal processing to actually recover your data, hopefully. Seagate, I think, is gonna be releasing those drives next year, and they're gonna store like five terabytes in one of these small form factor drives. Yeah. Ah, so the question is why shortest seek time first will lead to starvation. That's because it's going to always take the request that's closest. So, if there are a lot of requests, say, to tracks five and six, it's gonna just ping-pong back and forth servicing those requests, and it won't ever go in and service the requests that are inside. Okay, so with C-SCAN, again, we're only servicing in one direction, so we have the advantage that it's fairer than SCAN. But the disadvantage is we have these long seeks where we don't do anything. It seems like kind of wasted work, right? But this is a trade-off to get better fairness. Okay, any other, yeah, sure, any other question? Yes, C-SCAN and SCAN are better in terms of starvation. So, again, here, we're not going to just simply go back and forth. So if we're on track five, and then we go to six, and then we get another request for track five, even though it's very close, we're not gonna service it. We're gonna go all the way in, and then work our way back out. That's correct. If there was a string of requests for a single track, where we're not seeking, we would service those requests.
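Here's a small simulation sketch, in C, of the four policies on this same example: head on track 5, queue 2, 1, 3, 6, 2, 5. It scores only seek distance, not rotation, and the direction choices for SCAN and C-SCAN are picked to match the walkthroughs above; real drive firmware is far more sophisticated than any of these.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NREQ 6

/* Total head movement to service `order` starting from `head`. */
static int total_seek(int head, const int *order, int n) {
    int dist = 0;
    for (int i = 0; i < n; i++) { dist += abs(order[i] - head); head = order[i]; }
    return dist;
}

/* SSTF: repeatedly pick the pending request closest to the head. */
static void sstf(int head, const int *q, int n, int *out) {
    int done[NREQ] = {0};
    for (int i = 0; i < n; i++) {
        int best = -1;
        for (int j = 0; j < n; j++)
            if (!done[j] && (best < 0 || abs(q[j] - head) < abs(q[best] - head)))
                best = j;
        done[best] = 1;
        out[i] = head = q[best];
    }
}

static int asc(const void *a, const void *b) { return *(const int *)a - *(const int *)b; }

/* SCAN: service requests while moving inward (toward lower tracks), then reverse. */
static void scan_in(int head, const int *q, int n, int *out) {
    int s[NREQ], k = 0;
    memcpy(s, q, n * sizeof *s);
    qsort(s, n, sizeof *s, asc);
    for (int i = n - 1; i >= 0; i--) if (s[i] <= head) out[k++] = s[i];
    for (int i = 0; i < n; i++)      if (s[i] >  head) out[k++] = s[i];
}

/* C-SCAN: service only while sweeping outward (toward higher tracks),
 * then seek all the way back and sweep outward again. */
static void cscan_out(int head, const int *q, int n, int *out) {
    int s[NREQ], k = 0;
    memcpy(s, q, n * sizeof *s);
    qsort(s, n, sizeof *s, asc);
    for (int i = 0; i < n; i++) if (s[i] >= head) out[k++] = s[i];
    for (int i = 0; i < n; i++) if (s[i] <  head) out[k++] = s[i];
}

int main(void) {
    int q[NREQ] = {2, 1, 3, 6, 2, 5}, head = 5, order[NREQ];

    printf("FIFO   total seek distance: %d\n", total_seek(head, q, NREQ));
    sstf(head, q, NREQ, order);
    printf("SSTF   total seek distance: %d\n", total_seek(head, order, NREQ));
    scan_in(head, q, NREQ, order);
    printf("SCAN   total seek distance: %d\n", total_seek(head, order, NREQ));
    cscan_out(head, q, NREQ, order);
    printf("C-SCAN total seek distance: %d\n", total_seek(head, order, NREQ));
    return 0;
}
```

On this input it prints 16 for FIFO, 6 for SSTF, 9 for SCAN, and 8 for C-SCAN; C-SCAN's long wrap-around seek is the extra distance it pays over SSTF for the fairness just discussed.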
But we're not gonna starve requests by seeking back and forth between five and six, or four, five, and six, and ignoring, say, track one, yeah. Yeah, so the question is, could we still have starvation, or are there more sophisticated things? Drives today are incredibly sophisticated, so they have a variety of scheduling algorithms. You can do more things like, for example, look at how long something sat in the queue, and if it sat in the queue too long, bump it to the front of the queue and service it anyway, so you don't have starvation. The question is, are these algorithms implemented in software? They're implemented in the firmware of the drive, so yes, they are software. The drives have a microcontroller or a microprocessor that's running all of these algorithms. Again, you know, older drives were really dumb, and all of these algorithms and scheduling of the drive were implemented in the device driver, in software, by the main processor. Now all of this is implemented primarily by the actual device. Okay, any other questions about hard drives? Yeah. Sure, absolutely. The question is, is there any optimization in where we place data? So, you know, one of the optimizations, and we'll get to it, I'm skipping way ahead, involves the inode structures, which are the metadata for where we find files and things like that. Those were originally stored on the outer track of the drive, but the head is rarely near the outer track of the drive, so it was quickly realized it was much better to store them in the middle tracks, because that's where the head is spending a lot of time, and so we're gonna have much shorter seek times to go and read the metadata that tells us where to find our files and directories. So absolutely, we're gonna see a lot in file systems that was shaped by these physical constraints that we have with a hard drive. Any other questions? So now let's switch gears and talk about what I would really consider a very revolutionary change in how we store our information, and that is the rise of the solid state drive. So it turns out solid state drives themselves are not new. They've been around since the mid-90s. Back in the mid-90s, it was basically take a card, put a bunch of DRAM on it, and put a battery on it. That was a solid state drive. It worked. It was very expensive, but it was very fast. If the battery ran down, you lost all your data. So it primarily had applications in very high-end servers and also military applications. In 2009, that's when the real change happened. That was the introduction of NAND-based multi-level cell flash memory. So in multi-level cell flash memory, you have these cells, and each cell you fill full of electrons, a floating transistor gate. And depending on how many electrons you put in it, you can measure four levels of charge, and those four levels of charge translate into two bits per cell. And you store data, your sector size here, as four kilobyte pages, but multiples of those pages, anywhere from four to 64 of them, make up a memory block. And I'll go through what that means in a lot more detail in just a moment. No moving parts. This is what makes it so revolutionary. They're really small. So I asked one of my colleagues who just had an SSD fail on him, could I bring it in, take it apart, and show it to the class? He's like, no, my data's on there.
And I'm like, yeah, but it's all corrupted, so it doesn't matter anyway, but he still didn't want to part with it. But it really doesn't matter, because what does it look like? Well, on the outside, it looks just like a hard drive, and if you open it up, it just looks like a computer. It's just a bunch of identical memory chips, and they're really thin. The amazing thing if you open up one of these is that it's mostly air, because it's designed to match the form factor of a standard laptop drive or standard enterprise drive. But inside, it's just a bunch of chips. And depending on the capacity, it may even only have chips on one side of the board. The higher-capacity drives will have multiple boards or chips on both sides of the board. But no moving parts. No motors. So it's silent. It's very low power. It's incredibly shock resistant. And we've eliminated those physical constraints. So there's no seek delay. There's no rotational delay, which means our access times are on the order of tenths of a millisecond. So really fast access times. So how do we read from one of these things? If we look at it from an architecture standpoint, what we find inside is some software that manages a queue of incoming requests, and also a little bit of DRAM. Now, high-end drives will actually have a supercapacitor or a really small battery in them so that if they lose power, the DRAM remains powered. Consumer drives and most other enterprise drives do not, which means you can actually suffer data corruption if you turn off power to an SSD while it has any state that's currently being stored in the DRAM. There's a flash memory controller, which actually talks to the banks of flash chips. And you actually have parallel access to them, so you can do multiple things at the same time internally with a flash drive. Reading data is very fast, 25 microseconds. So incredibly fast. No seek, no rotational delay. The transfer time is basically how long it takes to transfer a four kilobyte page, and that's gonna be mainly limited by how powerful this controller is and by the disk interface. So if you connect it via SATA, you're gonna be limited to somewhere between 300 and 600 megabytes per second. So latency then is just the queuing time in the operating system, plus the controller time, the queues here, plus the transfer time. So very, very low. The amazing thing here is you now get the highest bandwidth whether you read sequentially or randomly, because there are no moving parts. Everything's equally far apart, and also everything's equally close. So this kind of really changes everything when we look at file systems: there are a lot of decisions that file systems made based on drives being physical things that have to move. Okay, that's the easy part. Everything comes at a price. Writes are way more complicated and way more expensive. Writes can take anywhere from 200 microseconds up to 1.7 milliseconds. That's versus 25 microseconds for a read. So one to a couple orders of magnitude slower. The problem we have with flash is you can only write a blank page. The page has to be empty. So we write data in four kilobyte chunks, and when a block is full of pages we don't want anymore, we erase it. Erasing is really expensive. Erasing takes one and a half milliseconds.
So what's gonna happen is the controller, behind the scenes, is gonna constantly be trying to maintain a pool of empty pages and blocks so that you can write at this 200 microsecond speed. But depending on the drive, if it's a consumer drive, the controller's less powerful and has a lot less flexibility and freedom to find free blocks and free pages. And so it's gonna perform a lot slower than an enterprise drive, which, as I'll talk about in a little while, is going to cheat. Yeah. Yeah, so the question is why does this take so long? So erasing is really expensive because you have to apply a high voltage to each one of the cells to basically reset the cell. And so that makes it expensive. Writing, similarly: you have to apply a high voltage to get those electrons to stick in the cell, because you have to basically force them through an insulator, and that takes a lot of current. And so charging up the rails to do that is time consuming, and that's where your biggest delay comes from. Ah, good question. So what about this is different from DRAM? In a DRAM, you're storing the data in a leaky cell, basically. It's called dynamic RAM because you periodically have to refresh the RAM and recharge those cells. If you remove power, all the charge leaks away. In a non-volatile flash cell, a NAND cell, there's an insulator. And so you have to force electrons into that well, and then once they're in the well, they're trapped. So getting them back out, again, you have to apply a high voltage to force them out of the cell. So that's the big difference between DRAM and non-volatile RAM: because it's insulated, it's like pouring liquid nitrogen into a dewar. It'll keep it nice and cold, unlike if you put it in a cup, where it'll just boil away. I'm sorry, what's the question? Yes, so the question is, is this what people call flash memory? Yes, so NAND flash doesn't just appear in drives, it also appears in USB sticks, SD cards; all of those rely on NAND flash technologies, multi-level cell technologies. The question is, is it true that this becomes unusable after about 10 years? Yeah, so there was a study at CMU that found the average expected lifetime of a drive was around seven to nine years, which is actually higher than a spinning drive, which is typically rated for a lifetime of around five to six years. The difference is, as we'll see in a moment, these drives are much more prone to a catastrophic failure where they just suddenly corrupt all your data. So they're really fast, but they live dangerously, yeah. So the question is, how do we deal with clearing pages and writing? I'm gonna go through an example of that in, I think it's my next slide; I'm gonna talk about how the controller manages free space on pages. You do use the DRAM, you use the DRAM as a temporary holding space. That's why there is this risk that if you turn off the power, you could turn it off while it was in the middle of an operation and lose data that you thought was non-volatile on the drive. I think it'll become much clearer when I go through the example. Was there another question? Yeah. So the question is, do SSDs always require a battery to retain data? No. The thing is, there is this small DRAM which is used, and we'll see how it's used, for housekeeping. Aside from that, everything is non-volatile.
So the only risk is, if your drive is doing this housekeeping and you turn off the power, you hold down the power button on your computer or pull the plug or pull the battery, there is the risk of low-level corruption. But only for that narrow window. It turns out there was a research paper recently that found a significant likelihood that if you pulled power from an SSD, you could get some low-level data corruption. So always shut down your operating system. Don't pull the plug. Okay, so how does this work? So here we have a block that has a bunch of free pages on it, and we write A, B, C, D. Now we're going to do eight more writes. We're gonna write E, F, G, H, and we're gonna write new versions of pages A, B, C, D, because we updated them; we recompiled your project, so we rewrite that data. Now, we can't actually get rid of the old versions in place, so we're just gonna record in our bookkeeping tables that these are the correct values for A, B, C, and D, and these old ones are obsolete. We don't need those anymore. And then periodically what's gonna happen is the controller is going to go and create more free space. So now we have no free pages. What it'll do is it'll find another block and erase that block, and then it'll copy all the live data from the old block to the new block. Now we have four free pages. This is called coalescing, or garbage collection. This coalescing or garbage collection is one of the most complicated algorithms that the drive has. And as a result, they often get it wrong. People are human. And solid state drives are an incredibly competitive marketplace. Everybody's trying to provide the fastest drive with the highest capacity, with the best wear leveling algorithms, as we'll talk about in a moment. And so, as a result, bugs creep into the firmware. So I had one of these drives, probably about three or four years ago, that was in a Windows laptop. And over the course of six months, it managed to corrupt a significant fraction of my data. And it was all sort of silent corruption. All of a sudden, my operating system didn't boot. And when I did boot it, I could see files had been corrupted. I make backups. I make backups on a nightly basis, so I didn't actually lose any data in the process. That's a public safety announcement for making backups. My colleague, who had his drive fail two weeks ago, exact same thing, different manufacturer: it silently corrupted the data and rendered his system non-bootable. He went to his backups, and his backups were corrupted. So he lost like six months' worth of work. The biggest problem actually was the reconstruction, finding all the license keys and files and disks and things like that so he could reinstall his software. So that's a public safety announcement that you should not only make backups, but you should periodically actually test that your backups work. Don't say I didn't warn you. Now, a drive is typically full, because drives are always full, that's just the nature of the beast, we are really bad about deleting things, and then you end up with around one erase every 64 to 128 writes. In enterprise drives, the drive physically has more blocks than it advertises, and it reserves the extras. In fact, as much as 80% of the capacity of the drive can be reserved and unavailable to you. That way, they can always have free blocks, and you can write to them at full speed.
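Here's a minimal sketch, in C, of the page mapping and coalescing just walked through. The block and page counts, the two mapping arrays, and the fixed "spare" block are all made-up illustrations, not any real drive's flash translation layer.

```c
/* Toy flash translation layer: logical pages are never overwritten in
 * place. New versions go to FREE pages, old versions become OBSOLETE,
 * and coalescing copies the live pages into an erased spare block. */
#include <stdio.h>

#define PAGES_PER_BLOCK 4
#define NBLOCKS         3   /* block 2 starts out as the erased spare */
#define NLOGICAL        8

enum pstate { FREE, LIVE, OBSOLETE };

static enum pstate page[NBLOCKS][PAGES_PER_BLOCK];  /* all FREE initially */
static int map_blk[NLOGICAL], map_pg[NLOGICAL];     /* logical -> physical */

/* Write logical page l: grab any FREE page, mark the old copy OBSOLETE. */
static int flash_write(int l) {
    for (int b = 0; b < NBLOCKS - 1; b++)        /* the spare is held back */
        for (int p = 0; p < PAGES_PER_BLOCK; p++)
            if (page[b][p] == FREE) {
                if (map_blk[l] >= 0)
                    page[map_blk[l]][map_pg[l]] = OBSOLETE;
                page[b][p] = LIVE;
                map_blk[l] = b; map_pg[l] = p;
                return 0;
            }
    return -1;   /* no free pages: must coalesce first */
}

/* Coalesce: copy the victim's LIVE pages into the erased spare block,
 * then erase the victim (the expensive 1.5 ms step). */
static void coalesce(int victim, int spare) {
    int k = 0;
    for (int l = 0; l < NLOGICAL; l++)
        if (map_blk[l] == victim) {
            page[spare][k] = LIVE;
            map_blk[l] = spare; map_pg[l] = k++;
        }
    for (int p = 0; p < PAGES_PER_BLOCK; p++)
        page[victim][p] = FREE;
}

int main(void) {
    for (int l = 0; l < NLOGICAL; l++) map_blk[l] = map_pg[l] = -1;

    for (int l = 0; l < 4; l++) flash_write(l);  /* write A, B, C, D     */
    for (int l = 4; l < 8; l++) flash_write(l);  /* write E, F, G, H     */
    for (int l = 0; l < 4; l++)                  /* rewrite A..D: out of */
        if (flash_write(l) < 0) {                /* free pages, so       */
            coalesce(0, 2);                      /* coalesce, then retry */
            flash_write(l);
        }
    printf("A now lives at block %d, page %d\n", map_blk[0], map_pg[0]);
    return 0;
}
```

In a real drive the roles of the blocks rotate, the bookkeeping tables live in that little bit of DRAM, and all this copying is exactly where the wear leveling, write amplification, and firmware bugs discussed below come in.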
In consumer drives, they typically only reserve maybe five to 10% of the capacity of the NAND chips for doing stuff like this, and also for replacing pages that wear out, yeah. What's my opinion on cloud backup? I think it's a great idea. I think you should use more than one cloud provider. Cloud providers do occasionally go out of business, and you don't want them to go out of business and take your backups with them. So I back up locally on a NAS drive at home, and then I also mirror that to a disk in the office. So I back up to multiple places. And then I keep a lot of stuff in the cloud also, yeah. So how do you test your backups, whether they've been corrupted? You can periodically test your process by trying to recover a file and seeing whether or not you can recover it. That often diagnoses a lot of problems. If you really care, you can actually sit there and run a diff that will compare the contents of your disk with the contents of your backup. One of the other things I also do, like for my MacBook, is I clone it periodically. I clone it to an external drive. That's just to speed up the recovery process: I can recover quickly to a cloned image and then play forward the incrementals. Okay, so one of the challenges with NAND is that these writes and erases are destructive, because they require high voltage to change the cells. So we end up damaging the memory cells every time we erase a block. So the drives are constantly trying to balance and level where the writes are going and where the erases are going. What this means as a side effect is that you write some data and it gets written to a location, and then the drive is constantly doing garbage collection and this wear leveling, and so your data actually moves through multiple chips on the drive. This is called write amplification. And this is why it's really important that those coalescing algorithms be correct, because every time you move the data, there is a probability, if the algorithm has an error, of introducing corruption. So no really good solutions, except buy from reputable manufacturers, which I thought I was doing, and make sure you're on the latest version of the firmware; but don't go to the latest version of the firmware until after it's been out there for a while and bugs have been found in it. The other thing is the controller will use error correcting codes to be able to correct errors when they do occur and a memory cell has become degraded. You'll correct the error, and then you'll mark the block as being unusable and use one of your spare blocks. Yeah, sure. Right, so the question is why can't we sort of do a replace in place? And the problem is you have to erase. There is no rewrite. Once you've written a page, you cannot change that page without erasing the entire block and then rewriting it. Yeah, if you have a free page, you can write to that, but if you don't have a free page, you have to erase an entire block and then write the pages. So this is why your drive does a lot, a lot of copying. Yeah, so the question is, for erasing the block, is that effectively writing zeros to all of the pages on the block? Effectively, yes: you're removing all the charge. If a cell holds a one, you need to remove that charge from the cell, and removing it requires high voltage. So resetting all the cells to empty requires a high-voltage erase. So the question is, when we're erasing, are we erasing everything on a chip or just a page?
You're just erasing a block, not the whole chip. The page is our unit of allocation within the SSD. So we write individual pages, and we erase entire blocks at a time. So again, the question is why don't we have to do this with main memory? Because main memory is volatile. We can easily remove or add charge, because the way that a NAND cell works is it's literally this cell that we have an insulator on top of, and a floating junction transistor on top of that. So we have to use high voltage to get charge to go into that cell, because it's fully insulated. That way, when we take power away, those electrons, and it may be a very small number of electrons, like tens of electrons, sitting in that cell, will stay there and not be able to leak out. Yeah, well, DRAM is similar, except that the DRAM cell does not have an insulator. So if we remove power, the charge all just floats away. We can drain the charge out of a DRAM cell. Okay, so the result of all of this complexity is that writes are very workload dependent, because the write time is the queuing time in the operating system, plus the time for the controller to find a free block that we can write to, actually, this should be a free page, plus the transfer time. The highest bandwidth we're gonna get is gonna be with either sequential or random writes. In both cases, it's limited by the number of empty pages. So if the controller's doing a good job in the background of creating empty pages, we're fine. And the highest-end controllers will do that as fast as we can write, so as fast as the bus into the drive. The rule of thumb, though, is that writes are 10 times more expensive than reads, and erases are 10 times more expensive than writes. So in terms of the price, performance, and capacity trade-offs: a hard drive is 50 to 100 megabytes per second for sequential reads and writes. The cost is around five cents to a dime per gigabyte. Size, two to four terabytes, ever growing bigger. SSDs, it varies. On the consumer and low-end enterprise side, you're looking at 250 megabytes per second for reads and writes. Cost is a lot higher, a dollar to a dollar fifty a gigabyte. Capacities, 200 gigabytes to a terabyte. For high-end enterprise drives, you can actually read at six gigabytes per second. Instead of attaching the drive to the SATA bus, they'll attach it to the PCI bus, which lets you do six gigabyte per second transfers. Writes, on the highest-end drives, and these are really expensive, so add a lot of zeros here, are 4.4 gigabytes per second. So they can write almost as fast as they can read. But again, you're getting a really small drive. The vast majority of the drive's physical capacity is being reserved by the controller so that it can produce free pages as fast as you're writing at this 4.4 gigabyte per second rate. And then memory is typically 10 to 16 gigabytes per second, depending on the bus. A little bit more expensive at five to ten dollars per gigabyte, and capacity is anywhere from 64 to 256 gigabytes, which is probably top capacity for servers. So the takeaway: SSDs give you 10 times the bandwidth of a hard drive on the consumer side, and DRAM is 10 times faster than SSDs, again on the consumer side. Price: hard drives are a twentieth the cost of SSDs, and SSDs are a fifth the cost of DRAM. That said, the cheapest way you can make an old computer run faster is to put more memory in it and replace the hard drive with an SSD. So how many people have an SSD? So it's amazing, every semester this fraction grows.
I would bet five years from now it will be 100%, probably even sooner than that, just because the costs of SSDs are plummeting on an exponential curve. It's all about volume. The more volume that's produced, the lower the cost is gonna be. Okay, wow, we're running way behind. So, advantages of SSDs relative to hard drives: low latency, high throughput, we eliminate all the physical delays of seek and rotation, no moving parts, lightweight, low power, silent, shock insensitive. We can read at near-memory speeds, depending on the controller and the bus. Downside: they're a lot smaller and they're much more expensive. An interesting thing is a hybrid, where you combine a small SSD with a really large hard drive. So think like a 64 gigabyte or 128 gigabyte SSD, paired up with a terabyte hard drive. If you manage the storage correctly in the file system, you can store the frequently accessed things on the SSD and store all the rest of your photos and everything on the hard drive. So you get the capacity of a hard drive with the speed of an SSD. Is there a question in the back? Isn't that basically a cache? Exactly, it is basically a cache. We turn the SSD into another tier of storage in our memory hierarchy. Macs do this; you can buy Mac minis with, I think they call it, the Fusion Drive. Windows will do it with what they call ReadyBoost. And then I think Seagate sells some drives that do that. The other con here is the asymmetric performance when we're doing writes, because we have to do this read, erase, and write operation in order to produce free pages. Also, because we're doing destructive writes and destructive erases, there's a limited drive lifespan. Drives fail on average at around six years, and the life expectancy is nine to 11 years. That's based on a CMU study of SSDs. The downside, however, is they tend to fail more spectacularly because of these bugs in the controller algorithms, whereas hard drives are very well understood, and so the failure rates for hard drives are also very well understood. Okay, so some administrative stuff. We have a design doc that's due tomorrow at midnight. And next Monday, we have a midterm. And just a reminder, we don't want everybody to show up here. If your last name is A through L, you'll be in this room. If your last name is M through Z, you'll be in 2060 Valley Life Sciences Building. It'll cover everything in the course up until today. So that's lectures one to 13, all the readings, the handouts, and projects one and two, since your design doc for project two is due tomorrow. The TAs are awesome. They're gonna do a review session on Friday from five to 7 p.m. in 390 Hearst Mining Building. Come with questions. They're gonna spend a small portion of the time reviewing the material in the class; the rest of the time is gonna be open for you guys to ask questions. There's over a decade's worth of midterms on the course homepage. I encourage you to read through those exams. You'll see that there are a lot of commonalities, and a lot of problems that, in one form or another, tend to recur on the exams. So make sure you especially study those problems and come with questions to the review session. Finally, as Professor Canny pointed out on Monday, we have a SurveyMonkey survey up. This is your opportunity to be heard. Your voice matters. We care about your experience in this class. If you're not having a great experience in this class and you have some constructive criticism for us, please provide that feedback. We'll try to make changes to improve the class.
If you're having a great experience in the class, also let us know. We like knowing whether we're doing things right or doing things incorrectly. We get feedback from Eta Kappa Nu, but that doesn't happen until like the middle of the next semester, so it's hard for us to make changes based upon that. Any questions? All right, so I was gonna do a quiz. I'm not gonna do a quiz. I was gonna take a break, but we're running really far behind, so I'm going to skip the break. Hopefully I'm still keeping you awake. Okay, so let's talk about building a file system. So a file system is the layer in the operating system that takes us from the block interface that the drives provide us with and turns it into files and directories. So there's a whole bunch of things that a file system has to do. Not all file systems do everything, but every file system does disk management: it turns a set of blocks into files. Every file system deals with naming: it provides a human-readable interface to translate human-readable names into files and directories. File systems provide layers to keep your data secure. They provide layers for reliability and durability, making sure your files are still there even if you have a disk crash or a media failure or some malicious attack or something like that. Not all file systems provide a high level of reliability and durability. Some do, not all do. Now, we can look at a file from many different levels. We can start by looking at it from the point of view of a user. From the point of view of a user, we have durable data structures. I have a grades file. You have your project code, right? Those are data structures that have semantic meaning to us. From the point of view of the operating system, it's just a collection of bytes. So at the system call interface, it knows nothing about the semantics of what you're trying to store. It could be a movie, could be pictures, could be HTML files. If we look inside the operating system, a file is actually just a collection of blocks. A block is the logical unit of transfer to and from the drive, and a block is larger than or equal to a sector in size. So in Unix, the typical block size is four kilobytes. It's getting bigger now, and you'll see things like 16 kilobyte block sizes. So we need to go from the user's view down to what we actually store on the disk. And so, from the user's point of view, if a user says something like, give me the bytes that are stored at locations two through 12 in my file, what does the operating system do? Well, it's just gonna retrieve the block that contains bytes two through 12 and then return just bytes two through 12. What if we say write bytes two through 12? Well, now it's gonna do a read-modify-write cycle. It's going to fetch the block containing bytes two through 12, it's going to modify that portion of the block, and then it's going to write that block back out. Yes, in this case, we'll assume four kilobytes. Yes, and it is an expensive operation. We're just writing 10 bytes, and we've gotta read an entire block of four kilobytes. And especially as blocks get bigger, like 16 kilobyte blocks, yes, we're wasting some of our bandwidth to do this. But when you have a terabyte-size drive or multi-terabyte-size drive, four kilobyte or even one kilobyte or the original 512 byte blocks don't work very well. We'd need a lot of data structures to keep track of all of them. Yes, yes, write it to the disk.
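Here's a minimal sketch, in C, of that read-modify-write cycle. The in-memory "disk" array and the helpers are stand-ins so the sketch runs; in a real system the block transfers go through the device driver.

```c
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 4096   /* the 4 KB block from the example */
#define NBLOCKS    16

static char disk[NBLOCKS][BLOCK_SIZE];   /* stand-in for the real device */

static int read_block(int blockno, void *buf) {
    memcpy(buf, disk[blockno], BLOCK_SIZE);   /* whole-block transfer only */
    return 0;
}

static int write_block(int blockno, const void *buf) {
    memcpy(disk[blockno], buf, BLOCK_SIZE);
    return 0;
}

/* The read-modify-write cycle: fetch the whole block, patch just the
 * requested bytes, write the whole block back.
 * Assumes off + len <= BLOCK_SIZE (the range fits in one block). */
static int write_bytes(int blockno, int off, const void *data, int len) {
    char buf[BLOCK_SIZE];
    if (read_block(blockno, buf) < 0)
        return -1;
    memcpy(buf + off, data, len);
    return write_block(blockno, buf);
}

int main(void) {
    write_bytes(0, 2, "hello world", 11);   /* bytes 2 through 12 of block 0 */
    printf("%.11s\n", disk[0] + 2);
    return 0;
}
```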
So for now, we're gonna assume there's no cache. In reality, there is a buffer cache, and we would write the block into the buffer cache, and then eventually the file system would write out the block to the actual hard drive. Okay, so everything in a file system is done in these whole blocks. It doesn't matter whether we write an individual byte or we read an individual byte. So if we're doing getc and putc, those are gonna do buffering. If we're doing putc, we're gonna buffer until we get to four kilobytes and then write out four kilobytes. If we're doing getc, we'll read an entire four kilobyte chunk and then return it to the user program a byte at a time. So from now on, everything we're gonna think about here is going to be as a collection of blocks. We're gonna ignore the byte interface that gets built on top of it by the rest of the file system and the operating system. We're just gonna think about a set of blocks, a collection of blocks. So the next question is how we manage those blocks. And we're gonna break them up into two concepts. One is files. These are user-visible groups of blocks that are arranged sequentially in a logical space. Just as with an address space, where we had the notion from a user standpoint that we had memory from zero to two to the n, the same thing is true here. A file starts at zero and goes to n, where n is the size of the file. And then that's turned into a set of blocks scattered out on the disk, even though from our standpoint it's a sequential set of blocks. The directory is just the user-visible mapping where we map from names to files. Now, we access the disk as a linear array of these sectors. Modern drives use what's called logical block addressing, and every sector has an integer address from zero up to the maximum number of sectors. The controller deals with everything else. So the controller deals with the actual physical layout of those sectors on tracks and heads and so on. And it deals with any bad blocks. So there will be defects in the media. At the factory, they analyze the drive and see where the defects are, and then store that in a firmware chip on the drive. And then the drive keeps track of all of its bad blocks and remaps them, so you don't see any bad blocks. That all happens under the covers. And then the controller also is responsible for figuring out where a given sector is stored on the disk. So it's hidden from the operating system. Again, older operating systems had to worry about everything. They had to know the physical layout of the disk. With modern drives, you don't know anything about the layout of the disk. Question? Ah, very good question. So if the hardware, the drive, is hiding these bad sectors, how much of your drive can go bad before something happens? So there's a set of reserved sectors on the drive. Same thing with an SSD: we have reserved blocks on the SSD. And as blocks go bad in an SSD, or as sectors go bad in a drive, it transparently remaps them behind the scenes. Now, you can actually find out exactly how many of these reallocations have occurred if you run a SMART tool against your drive. Some of you will know what a SMART tool is; the rest of you have no idea that your drives could be just about to fail. On a Mac, you can go to the system information, and if you look at your drive, it'll say SMART test: passed. If it says SMART test: failed, time to get a new drive.
Under Windows, you can download SMART tools, either from the drive manufacturer or one of a bunch of freeware ones, and actually see all the statistics for your drive: how many reallocations have been done, how many times the drive's been turned on and off, how many correctable errors there have been, how many uncorrectable errors. And if you look at those statistics and they start growing, it's time to replace your drive. The error statistics, that is, not the power cycle statistics. Yeah, you're skipping ahead. The question is: if the hardware is abstracting the drive and lying about all this stuff, and it just looks like a linear array, how can we build file systems that take advantage of the structure of the drive? The answer is it's becoming really hard, because the drive is not being honest about where it's storing data. You think you're storing stuff sequentially, but if a sector was remapped, it could actually be stored somewhere else. So this is a real complication. The good news is that as we shift to SSDs, all of that goes away. Okay, so we need a way to track which blocks are free on the drive. Originally that was done using a free list, a linked list, but that's inefficient for a four-terabyte drive, so we use a bitmap instead; I'll show you a little sketch of that in a minute. We also need to know where all the blocks of a file are located. That's stored in a file header: each file has a file header that tells you where to find its blocks. And we're gonna optimize the placement of a file's disk blocks to try to match the access and usage patterns for those files. That should really raise the question: what are the access patterns for a file system, and what are the usage patterns? Let's start with access patterns. There are three different access patterns, two of which are very common. The first is sequential: I read n bytes, then the next n bytes, then the next n bytes. Most accesses are of this form. Random access is: give me bytes i through j. This is important even though it's less frequent, because we use it for things like paging: with the swap file on disk, we don't wanna read the whole swap file just to bring one page back into memory when we implement virtual memory with paging. It's not a very frequent case, but when we do use it, we need it to be very, very fast. And then finally there could be content-based access. For an HR database, an employee records database, you could say: find the hundred bytes starting with Joseph, take the salary field, and double it. Don't think that's happening anytime soon; most systems don't provide this form of access. Instead, they build a database, which requires random access, on top of files that are organized sequentially like a set of records. An example would be something like Spotlight search on the Mac, which lets you search by content rather than having to remember the name of the file. Okay, so usage patterns. It turns out most files on your computer are small: your .java files, your .c files, and so on. There are a few files that are really big, like executables and core files and swap files, but most files are really small. On the other hand, those large files can take up most of the disk space. It only takes a few Blu-ray movies to fill up your hard drive. Not that I know that from experience.
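Back to that free-block bitmap for a moment, here's the little sketch I promised: one bit per block, zero meaning free and one meaning allocated. The sizes and names are made up for illustration; a real file system would also scan a word at a time rather than bit by bit.

    /* Minimal sketch of free-block tracking with a bitmap:
       one bit per block, 0 = free, 1 = allocated. */
    #include <stdint.h>

    #define NUM_BLOCKS (1u << 20)            /* hypothetical: 1M blocks */
    static uint8_t bitmap[NUM_BLOCKS / 8];   /* one bit per block       */

    /* Find a free block, mark it allocated, return its number (-1 if full). */
    long alloc_block(void)
    {
        for (uint32_t b = 0; b < NUM_BLOCKS; b++) {
            if (!(bitmap[b / 8] & (1u << (b % 8)))) {  /* bit clear = free */
                bitmap[b / 8] |= (uint8_t)(1u << (b % 8));
                return (long)b;
            }
        }
        return -1;                           /* no free blocks left     */
    }

    void free_block(uint32_t b)
    {
        bitmap[b / 8] &= (uint8_t)~(1u << (b % 8));    /* clear the bit */
    }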
And that seems kind of contradictory, but large files are equivalent to a huge number of small files: a 10-gig movie file equals a lot of Java files. It's also the case that large files use most of the bandwidth going to and from the disk, because they're massive. Now, we're gonna use these observations, but it's very important to be aware that while you can look at usage patterns and beat your competitors by optimizing for the frequent patterns, patterns change over time. And people may be using systems in a certain way because of the way they perform: maybe people have lots of small files in Unix because big files are really inefficient and slow, so they break their big files up into small files. Anytime you try to look at a usage pattern and optimize around it, you wanna be careful that you're not falling down a rat hole, optimizing for what you see today and getting passed by what happens tomorrow. When the Unix file system was designed, most people weren't storing multi-hundred-gigabyte files; today that's very common. Okay, so our goals for our file system: maximize sequential performance, provide efficient random access, and make it easy to grow and shrink files. In the last few minutes, I wanna talk about the most common file system in the world, which is the file allocation table, FAT, originally developed by Microsoft for floppy drives. It's in everything. If you have an Android phone, it supports the FAT file system, because that's what the SD card uses to store data. If you have a digital camera, it uses the FAT file system on the SD card or the compact flash card. At last count, I think it was several billion devices that use the FAT file system. That said, it's insanely simple. So how does it work? It has a directory entry, and it links together the blocks of a file, but rather than the links living in the blocks themselves, the links are in a file allocation table. The file allocation table has one entry for every block on the media, whether it's an SD card, a hard drive, a floppy drive, or something else, and those entries are linked together. So if I wanna read the contents of this file, the directory entry is our file header, and it tells me to read block 217. The FAT entry for 217 tells me the next block of the file is 618, so I read block 618. The entry for 618 says the next block is 339, so I read 339, and that's the last block of the file. Sequential access is going to be very expensive if you don't cache the FAT in memory, because you'd read block 618, seek all the way back to the file allocation table, read it to see what the next block is, which is 339, seek all the way back over, read 339, then seek all the way back to the table for the next one, and so on. So you wanna cache the FAT in memory. Random access is always going to be slow, but it's gonna be really expensive if the file allocation table is not in memory. Now, one problem FAT has is that these blocks are not sequential, so we're gonna be doing lots of seeks for reads and writes, which is not very good. So for floppy drives and hard drives that use the FAT file system, it's not gonna perform very well. If we put it on an SD card, it'll perform fine, because everything is equidistant on an SD card; it's NAND flash.
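Here's a minimal sketch in C of walking a FAT chain like that 217, 618, 339 example, assuming the table is cached in memory. The 32-bit entries, the end-of-chain marker, and the block_read helper are simplifications; real FAT variants use 12-, 16-, or 32-bit entries with several reserved special values.

    /* Minimal sketch of following a FAT chain to read a whole file.
       fat[] has one entry per block on the media; each entry holds the
       number of the next block, or an end-of-chain marker. */
    #include <stdint.h>

    #define FAT_EOF 0xFFFFFFFFu              /* assumed end-of-chain marker */

    extern uint32_t fat[];                   /* the in-memory FAT           */
    void block_read(uint32_t block_no, uint8_t *buf);      /* hypothetical */

    /* Read every block of a file given its starting block number. */
    void read_file(uint32_t first_block, uint8_t *dst, uint32_t block_size)
    {
        for (uint32_t b = first_block; b != FAT_EOF; b = fat[b]) {
            block_read(b, dst);              /* read this block            */
            dst += block_size;               /* then follow the FAT link   */
        }
    }

So reading the file from the example is just read_file(217, buf, 4096): the loop reads 217, follows fat[217] to 618, follows fat[618] to 339, and stops when fat[339] is the end-of-chain marker. Okay, I'm gonna skip the next quiz and go to the summary. All right, so what did we look at today? We looked at magnetic hard drive performance.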
Performance there is a function of queuing time plus controller time plus seek time plus rotational latency plus transfer time. Rotational latency is on average half a rotation; at 7,200 RPM a full rotation takes about 8.3 milliseconds, so that's roughly 4.2 milliseconds on average, and the transfer time depends on the rotation speed and the bit density. We spent a lot of time talking about solid state drives. Read performance is a function of queuing time, controller time, and the transfer rate. Write time is a function of queuing time plus the time to find a free page we can write to, which may require erasing and copying and doing lots of coalescing, plus the transfer time. The amount of time to find a free block depends on how full the SSD is and on how sustained our writes are: if we're really pounding the drive with lots of writes, it's gonna suddenly slow down on us, because it's gonna be doing lots of coalescing. The drives also have a limited lifespan, but it's pretty long; nine to 11 years is longer than most people keep a computer. The downside is they can fail catastrophically, so you do wanna have good backups. Finally, we talked about file systems, where we transform blocks into files and directories. We wanna optimize for the access and usage patterns: maximize sequential access, but also allow very efficient random access, since that's how we access things like our page files. And finally, files and directories are defined by a header; in Unix, we call this an inode. Any questions? Okay, please make sure Kevin gets all the drives back, and good luck on next week's midterm.