Welcome back to 162, everybody. I don't know about you, but I find it pretty hard to turn away from the continuing vote counts in the election, but let's get to some operating systems. We've been talking about how to actually store information on devices and give ourselves the proper abstraction of files and directories and so on, and we're going to get much more into that today. So we started by talking about devices and performance. If you remember, one of the things we talked about is how important it is to keep your overhead down, and we looked at this particular graph of effective bandwidth. It said that even though we have a gigabit-per-second link, our effective bandwidth can be a lot lower, and the reason is that there's a big overhead, for instance a millisecond, that affects our bandwidth. This graph showed the size of a packet along the x axis and how long it takes to send it. That's a linear graph with respect to the size of the packet, but it has an intercept that's not zero. And that means the effective bandwidth, which is how many bytes per unit time you actually get, looks more like this red curve, and you have to get past a certain point before you can even reach half a gigabit per second. So when we look at our file systems coming up, we're going to want to keep the effective bandwidth as close to the real bandwidth as possible, and that's going to mean keeping our overhead low. The other thing we talked about in some detail last time was performance and queuing theory, and I just wanted to remind you of what we came up with. Among other things, we talked about how there can be many examples of queues feeding into servers, and oftentimes you can take a much bigger system with lots of queues and lots of intermediate servers and boil it down to this.
We were basically talking about a system in equilibrium, among other things, so this queue is neither growing nor shrinking without bound, probabilistically. We have an arrival rate of lambda per unit time, and we have a service rate mu, which, by the way, is also one over the average service time of the server. So the different parameters we talked about are: lambda is the average or mean number of arriving customers per second; T_ser is the average time to service a customer, sometimes called the mean or the first moment, M1; and C is the squared coefficient of variation, which is the standard deviation squared over M1 squared, and which is unitless. What's amazing about that is that C basically tells us all we need to know; we don't need to know how complicated the overall probability distribution was. The server is allowed to have an arbitrarily complicated service distribution, and we can compute the average service rate by taking one over that average time. The arrival process, however, is memoryless in our model, so we're keeping things very simple here. Computing from our parameters: if we know the average time to serve, or the average time to get your hamburger at McDonald's, then one over that is mu. The server utilization rho is lambda over mu, or, multiplying this out, lambda times T_ser. And the interesting thing we talked about is that this server utilization can't be bigger than one. Can anybody remember why it can't be bigger than one, what happens if rho is greater than one? The queue grows without bound, yep. That's a problem. Okay, so for the parameters we might wish to compute: if you notice here, these three red ones are enough to compute the green ones, and in general, if you have three out of these five, you can compute the others.
And notice, by the way, that C is talking about the standard deviation of the service time of the server, since this is potentially a general probability distribution. Parameters we might care about are how long you are in the queue, T_q, or the average length of the queue, L_q, and using Little's Law, which we talked about last time, L_q is lambda times T_q. Some results that matter here: if you have a memoryless arrival process and a memoryless server, an M/M/1 queue, then the time in the queue turns out to be the service time times rho over one minus rho. And if you have a general service distribution, where it's not memoryless going out, notice the difference between these two is very small: it's still the service time, times an extra factor of one half times one plus C, times rho over one minus rho. And the interesting question we confronted last time was why the latency blows up. The simple answer is that all of these models come out with this rho over one minus rho factor, so that as rho goes to one, this thing blows up to infinity. So this behavior we're seeing here, potentially going to infinity if there's an infinite queue, is solely due to that rho over one minus rho factor. All right. I gave some examples at the end of the lecture last time, so you might want to review them; this could be a useful thing for midterm three. I would go through the last lecture to understand what these numbers mean, but the most important back-of-the-envelope queuing theory results we've talked about in this class are these two: the M/M/1 queue and the M/G/1 queue.
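Those two results are easy to put into code. Here's a minimal sketch of the back-of-the-envelope calculations above; the function and variable names are mine, not from the lecture, and the disk numbers in the example are made up for illustration:

```python
# Back-of-the-envelope queuing theory from the lecture.
# rho = lambda * T_ser; both formulas blow up as rho -> 1.

def mm1_time_in_queue(service_time, arrival_rate):
    """M/M/1: T_q = T_ser * rho / (1 - rho)."""
    rho = arrival_rate * service_time
    assert rho < 1, "rho >= 1: queue grows without bound"
    return service_time * rho / (1 - rho)

def mg1_time_in_queue(service_time, arrival_rate, c_squared):
    """M/G/1: T_q = T_ser * (1 + C)/2 * rho / (1 - rho)."""
    rho = arrival_rate * service_time
    assert rho < 1, "rho >= 1: queue grows without bound"
    return service_time * ((1 + c_squared) / 2) * rho / (1 - rho)

# Hypothetical disk: 10 ms average service time, 50 requests/sec -> rho = 0.5.
t_ser, lam = 0.010, 50.0
print(mm1_time_in_queue(t_ser, lam))        # 0.01 s in queue
print(mg1_time_in_queue(t_ser, lam, 1.0))   # C = 1 (memoryless): same as M/M/1
print(mg1_time_in_queue(t_ser, lam, 0.0))   # C = 0 (deterministic): 0.005 s
```

Note how a deterministic server (C = 0) halves the queuing time relative to a memoryless one (C = 1), which is exactly the one-half-times-one-plus-C factor at work. Little's Law then gives the average queue length as lambda times T_q.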
And the one at the end, by the way, just means there's a single server. We could have multiple servers, and then the equations would be a little different, but if we were going to do that to you, we'd give you the equation. All right, any questions before I move on? Once you've got these equations, it's just a matter of plug and play: you get some estimates, say of disk service rates, and you might have estimates of how often requests come in from the user processes to the kernel. So this queue might be queuing in the kernel, and this service rate has to do with the disk drive, and we gave you a way of computing how long it takes to do something with the disk. Once you know some of these parameters, you can make an estimate: is my queue going to blow up, or do I need another disk in order to get more service rate here? Good. So, we also talked a couple of lectures ago about a few ways of hiding I/O latency, which I wanted to bring up because, as we start designing file systems, you'll be able to see where we can put some of these different options. The blocking interface, of course, is the one you've learned from pretty much lecture number two, which says: I do a read, I say I want to read so many bytes, and the system call doesn't return until we have that number of bytes or until there's an end of file. When we go to write, I say write this number of bytes, and it won't return until they've all been written, or if it does return early, it at least tells us how many bytes have been written. The other two I gave you a few lectures ago were the non-blocking and the asynchronous interfaces. The non-blocking interface basically says: do what you can immediately and then come back and tell me; don't wait. So the non-blocking interface may require you to process things in a loop, but you'll never be blocked waiting.
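To make the non-blocking idea concrete, here's a small sketch using a pipe as a stand-in for a device. The helper name `try_read` is mine; the point is just that a read on a non-blocking descriptor returns immediately with whatever is available, rather than waiting:

```python
# Non-blocking "do what you can, don't wait" interface, sketched with a pipe.
import os

r, w = os.pipe()
os.set_blocking(r, False)     # reads on r now fail fast instead of blocking

def try_read(fd, n):
    """Return whatever bytes are available right now (possibly none)."""
    try:
        return os.read(fd, n)
    except BlockingIOError:
        return b""            # nothing available yet; caller retries later

print(try_read(r, 100))       # pipe is empty, but we did not block
os.write(w, b"hello")
print(try_read(r, 100))       # now the data is there
```

A blocking read on the empty pipe would have stalled the process at that first call; the non-blocking version returns control immediately, which is why callers typically sit in a loop, doing other work between attempts.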
The asynchronous interface is what I like to call the "tell me later" interface. This is an example where the user code hands a buffer to the kernel and says, do my read of 100 bytes and put it in this buffer. The call immediately returns from the kernel, but you get a signal later that says it's ready. So those are two asynchronous options. And the reason they're interesting to bring up is that in the kernel, the non-blocking and asynchronous interfaces are really what the devices provide. They don't provide blocking; that's something we give to the processes, and it's an abstraction. So yes, the asynchronous interface is exactly like a type of callback. And if you're interested, you can often turn this on for file systems and other devices by using the ioctl interface on the file descriptor after you've opened it. So that's a good question. Okay. If you remember, we've had this kind of diagram almost from day one; back in lecture four, even, we talked about a bunch of different ways of accessing files, like streams and file descriptors, etc. That's fopen versus open. And then we've talked a bunch about devices over the last couple of lectures. So today we're going to talk about what's in the middle. And what's in the middle is interesting because above we have this abstraction of byte streams, where we can ask for 12 bytes or 13 bytes or whatever. Underneath, we know that there are blocks: we talked about disks having sectors, with multiple sectors together giving you a block. So that's not byte-oriented, that's block-oriented, and somehow in the middle the file system has to provide a matching between the blocks underneath and the streams above. That's what the file system is going to help us do. And of course the things you're all used to with files, like looking them up in directories.
And opening them, closing them, writing them — all of that stuff needs this thing in red here to work properly, and that's what our next couple of lectures are about. Okay. So how do we go from storage to file system? Up at the top level here, we have variable-size buffers, and the APIs and system calls we're using are all about: give me this number of bytes, maybe at some offset, or write these bytes at some offset. Underneath is the file system, which is a block-based interface, and a typical block that we might talk about is four kilobytes. That's a pretty common block size, which you should recognize from when we were talking about virtual memory as well. And this is mostly mapped to these blocks underneath. Below that, we have sectors, which are smaller than a block, but typically we put a bunch of sectors together on a track on a disk and call that a block. The physical sector is the minimum chunk of bytes that you can read or write; that could be 512 bytes, which is pretty standard, or on the really big drives we have these days, four kilobytes. So that's the basic chunk of bytes you can read and write. Somehow, again, we're going to have to go from this variable size up top, through the block interface, to the actual physical interface of the disk drive. And one of the things we're not going to talk about today, but next time, is that we want to put some sort of caching in here to make this faster. I've sort of joked at various times this term that pretty much everything in operating systems is a cache, and so obviously there's going to be a cache somewhere here, but we're going to deal with structure first and then we'll cache it later. We also talked about SSDs, or flash-based disk drives.
So one of the things that's different from just raw flash chips is that when you put them into an SSD, this interface between the operating system, through the device driver, and the device has a lot of similarities between a disk drive and an SSD. In fact, there is a layer in there that makes the SSD look like spinning storage, except it doesn't have the seek time and the rotational time to slow you down. And the other thing that's very unique about the SSD, which I mentioned, and which I'll show you again if we get to it at the end of today's lecture, is that its blocks can never be overwritten in place. You have to take an erased block and write to it. You can't take a block you've already written and change its bytes. What you really have to do, if you're going to change a physical block, is find a new physical block, copy everything over except for what you want to change, and then the previous block gets garbage collected. The way the operating system avoids having to deal with that is this translation layer: the logical block numbers that the file system and the device driver think they're using are actually translated inside the SSD to physical blocks. That translation layer in the firmware is also responsible for making sure things don't wear out, so that you're not overusing some particular physical block by writing it, erasing it, writing it, erasing it over and over again. Instead, there's wear-leveling firmware that makes sure no part of the SSD gets overused. And the other thing that needs to be kept track of is erasure: you actually have to work to erase blocks that aren't in use anymore, so that you have them ready to go. So that's a fundamentally different aspect from hard disk drives, where you can actually overwrite the sectors.
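The remapping idea can be sketched in a few lines. This is a toy model, not how real firmware is written — the class name and structure are my invention — but it captures the mechanism: a write always goes to a fresh erased block, the map is updated, and the old block joins an erase queue:

```python
# Toy flash translation layer: logical writes are remapped to fresh
# physical blocks; the old physical block is queued for erasure.

class ToyFTL:
    def __init__(self, num_physical):
        self.mapping = {}                        # logical block -> physical block
        self.free = list(range(num_physical))    # erased blocks ready to write
        self.to_erase = []                       # stale blocks awaiting GC
        self.data = {}                           # physical block -> contents

    def write(self, logical, contents):
        new_phys = self.free.pop(0)              # always write an erased block
        self.data[new_phys] = contents
        old = self.mapping.get(logical)
        if old is not None:
            self.to_erase.append(old)            # the old copy is garbage now
        self.mapping[logical] = new_phys

    def read(self, logical):
        return self.data[self.mapping[logical]]

    def garbage_collect(self):
        while self.to_erase:                     # erase stale blocks in background
            phys = self.to_erase.pop()
            del self.data[phys]
            self.free.append(phys)

ftl = ToyFTL(8)
ftl.write(3, b"v1")
ftl.write(3, b"v2")          # same logical block, new physical block
print(ftl.read(3))           # the new contents
print(len(ftl.to_erase))     # one stale block holding v1 awaits erasure
```

A real FTL also does wear leveling — choosing which erased block to hand out based on erase counts — but the indirection shown here is what lets the OS keep pretending it can overwrite block numbers.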
Okay, but the interface is pretty much the same: it's dealing with four-kilobyte blocks that are read and written. It's just that the underlying physical behavior is a little different. But we're popping up. Okay, so there's a good question here: if you overwrite a block with zeros to erase a file, is there any way to tell the SSD to actually erase it? That's a really good question, and the answer is not always yes. There are some modern SSDs that can encrypt things natively on the drive itself, and then you have a little more control over it. But just because you write a bunch of zeros into block number 536 means absolutely nothing in terms of what actually happened to the data underneath, because you're writing to a completely new physical block. Okay. Now, where am I here? Okay, so how do we build a file system? What's a file system? A file system is a layer of the operating system that transforms the block interface of disks into the files, directories, and other things you're used to. This is a classic operating system situation that you're very familiar with, hopefully, by now — we've been doing this all term — where you take a limited hardware interface, which is an array of blocks, and you provide a new virtualized interface that's much more convenient and, in this instance, provides a whole bunch of features: naming, so we can find files by name rather than block numbers; organizing file names inside directories; and mapping files into blocks, so we can figure out which blocks belong to which files. And then of course things like protection and reliability are important as well: we want to enforce access restrictions to prevent unauthorized users from writing files they're not supposed to, and for reliability we're going to want to put some level of redundancy into the system to make sure we don't lose our data even though we have crashes and hardware failures, etc.
Okay, so this level of abstraction is really what the file system is about, and I'm going to give you a number of actual case studies to show you how people have done that in several file systems that are currently in active use. So, again, as we said a little bit ago, the user's view of files is that they're durable data structures: you put the data in and it doesn't go away. The system's view, of course — the Unix view at the system-call level — is that a file is a collection of bytes. It doesn't really matter to the system what data structures you put on the disk. The interesting thing is that only the user really knows how to interpret the bytes; Unix makes no restrictions on how you structure them. It's entirely up to you. So from the system's point of view it's a bag of bytes. Then, when you get underneath the system-call interface and into the actual file system, the system's view becomes a collection of blocks, because the block is the logical transfer unit. The block size is typically bigger than the sector size, where the sector is the physical transfer unit — the minimum transfer unit on the disk. A sector of 512 bytes is just too small, so we turn a bunch of sectors into a block, and that's what we read and write off the disk. All right. So you can look at it like this: here's the user, they have a file full of bytes, they talk to the file system, and the file system talks to the disk. When all is said and done, the user thinks they have files that are a bunch of bytes. That's our goal. So, just to hammer this home a little bit: what happens if the user says, give me bytes 2 through 12? Well, the file system has to fetch the block that has those bytes in it, and that block might be on disk, in which case it's got to pull it into a cache.
And then, since that block is probably four kilobytes, it has to figure out where bytes 2 through 12 are, package them up into the user's buffer, and return. Now, it's quite possible that the second time we ask — say, when we go to ask for bytes 13 through 36 — that block's already in the cache and we don't actually have to go out to the disk. Now, there's an interesting question here: what if you have multiple files with different permissions in the same block? The answer is that doesn't really happen; that would be a failure of the file system, because the file system provides a one-to-one mapping between files and underlying blocks. The permissions are on the files, not on the individual blocks, because the blocks are assembled into files, and the metadata for permissions is actually in the inode, which I'll show you in a little bit. Good question, though. So what happens if we go to write bytes 2 through 12? This is a little trickier, and I wanted to make sure this is clear. Since you can only deal with whole blocks at the disk level, you have to pull the block in, overwrite bytes 2 through 12, and then write it back out. You can't actually go in and write only a few bytes on the disk; it's just not possible. So you can start to see why having blocks stored in RAM, at least temporarily, is going to be really important, because at minimum we're going to need to bring a block in, overwrite a couple of bytes, and store it back out. We're going to do much better than that when we get to the block cache, but we're not there yet. And of course everything inside the file system itself is in terms of whole blocks; the actual I/O happens in blocks, and any reading or writing of something smaller has to happen across this file system interface. Okay. Now.
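That byte-to-block translation, including the read-modify-write on the write path, looks roughly like the following sketch. The dict-backed "disk" and the helper names are assumptions for illustration; a real file system would go through the inode and block cache:

```python
# Serving byte-granularity reads/writes on top of a block device.
BLOCK_SIZE = 4096
disk = {}   # block number -> bytes (stand-in for the block device)

def read_block(bn):
    return bytearray(disk.get(bn, bytes(BLOCK_SIZE)))

def write_block(bn, data):
    disk[bn] = bytes(data)

def read_bytes(offset, length):
    """e.g. bytes 2 through 12: fetch the containing block(s), slice them out."""
    out = bytearray()
    while length > 0:
        bn, off = divmod(offset, BLOCK_SIZE)   # which block, and where in it
        n = min(length, BLOCK_SIZE - off)
        out += read_block(bn)[off:off + n]
        offset += n; length -= n
    return bytes(out)

def write_bytes(offset, data):
    """Read-modify-write: pull the block in, patch bytes, write it back out."""
    while data:
        bn, off = divmod(offset, BLOCK_SIZE)
        n = min(len(data), BLOCK_SIZE - off)
        block = read_block(bn)                 # must read the whole block first
        block[off:off + n] = data[:n]
        write_block(bn, block)                 # then write the whole block back
        offset += n; data = data[n:]

write_bytes(2, b"hello world")   # writes bytes 2..12 of "the file"
print(read_bytes(2, 11))         # gets them back
```

Notice the write path always reads the whole block before patching it, exactly the point made above: the disk has no sub-block write, so the byte-stream illusion is manufactured in software.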
So, how do we manage a disk? In the next half hour or so we're going to talk specifically about disk drives, but we're going to generalize some ideas about how to manage a disk. The basic entities on a disk that we're going to want are files and directories. A file is a user-visible group of blocks arranged in some logical space — or, as I like to say, a bag of bytes. A directory is a user-visible index mapping names to files. We're going to have to figure out how to do that, so that we can turn a file name into an actual file. That's going to be part of what the file system does. Now, the disk is a linear array of sectors. How do you identify those sectors? There are a couple of ways. One, used on the original disks before they got too big, was that a sector address was really a vector of which cylinder, surface, and sector it's on. If you remember, a cylinder is all of the tracks that are on top of each other, and it really represents the positioning of the head assembly. The surface tells you which side, top or bottom, of which platter, and then the sector number locates you within the track. So the sector address itself is a three-tuple defining where that sector is on the disk. It's not really used anymore, and one of the reasons is that disks got so big that the BIOSes weren't able to keep up. In that scheme the OS and BIOS, which sits at a lower level than the OS, had to deal with bad sectors, and the disks just got so big that it wasn't working anymore. So at some point we switched over to logical block addressing, where every sector has an integer address, starting from zero and working its way up to the size of the disk. The controller does the mapping from the integer number to the physical position and shields the OS from the structure of the disk.
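For the curious, the old three-tuple scheme and flat LBA are related by simple arithmetic. The geometry numbers below are made up (real drives stopped exposing true geometry long ago), but the conversion shape is the classic one, with CHS sectors traditionally numbered from 1:

```python
# Cylinder/head(surface)/sector addressing vs. flat logical block addressing.
HEADS = 4                 # surfaces per cylinder (hypothetical geometry)
SECTORS_PER_TRACK = 63    # CHS sector numbers traditionally start at 1

def chs_to_lba(c, h, s):
    """Flatten (cylinder, head, sector) into a single block number."""
    return (c * HEADS + h) * SECTORS_PER_TRACK + (s - 1)

def lba_to_chs(lba):
    """Invert the mapping: which cylinder, head, and sector holds this block."""
    s = lba % SECTORS_PER_TRACK + 1
    h = (lba // SECTORS_PER_TRACK) % HEADS
    c = lba // (SECTORS_PER_TRACK * HEADS)
    return (c, h, s)

print(chs_to_lba(0, 0, 1))                 # the very first sector on the disk
print(lba_to_chs(chs_to_lba(10, 2, 5)))    # round-trips back to (10, 2, 5)
```

With LBA, the controller performs a mapping like this internally (plus bad-sector remapping), which is exactly how it shields the OS from the disk's real structure.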
So SSDs don't actually expose the cylinder/surface/sector interface either — that was a good question in the chat. Logical block addressing had pretty much taken hold before SSDs became popular, so SSDs also give you this LBA-level interface, which is a logical ordering of blocks from zero on up. Okay. Now, this has some consequences. If you recall from last lecture, we talked a bit about elevator algorithms, which take a bunch of requests and rearrange them so that the disk does a nice clean sweep rather than randomly going all over the place. With logical block addresses, you're only really guessing that blocks with adjacent numbers are physically close to each other and on the same track. You don't quite have the same level of information you had before, but operating systems still try to do a good job of optimizing for locality. It's just not as precise as it was back in the days of physical cylinder/surface/sector positioning. So what does the file system really need in order to work? Well, it has to track which disk blocks are free. In the case of the SSD it's also tracking which "blocks", and I say that in quotes, are free; it just knows the logical block number, not which physical part of the flash chips is storing it, but it still has the notion that there are these blocks and some of them are free and some aren't. So it's the same idea: tracking the disk blocks. You need to know that so that you know where to put your newly written data. You also need to track which blocks contain data for which files, so that when you go to open a file and start reading bytes 2 through 12, you can answer: where is the first block of that file? It's on disk somewhere — how do you know which block it is? Well, that's the file system's problem.
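One common way to track which disk blocks are free — used in many real file systems — is a bitmap with one bit per block. This is a minimal sketch with made-up sizes and a simple first-fit scan:

```python
# Free-block bitmap: one bit per disk block, 0 = free, 1 = in use.
NUM_BLOCKS = 64
bitmap = bytearray(NUM_BLOCKS // 8)

def is_free(bn):
    return not (bitmap[bn // 8] >> (bn % 8)) & 1

def allocate():
    for bn in range(NUM_BLOCKS):          # first-fit scan for a free block
        if is_free(bn):
            bitmap[bn // 8] |= 1 << (bn % 8)
            return bn
    raise OSError("disk full")

def free(bn):
    bitmap[bn // 8] &= ~(1 << (bn % 8))

a = allocate()
b = allocate()
print(a, b)         # two distinct blocks handed out
free(a)
print(allocate())   # the freed block is reused
```

The bitmap itself lives on disk alongside the data, which is part of the bootstrapping point made below: the metadata that tracks blocks is itself stored in blocks.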
Right, it also has to track files in a directory, so that you can look them up by file name. Again, that's the file system's problem. And where do you put all this? Well, since we need to be able to shut the whole system down, come back, and have our data still there, all of this stuff has to be on disk somewhere. Not only does the disk hold all the data, it's got to hold all this metadata in a way that lets us start from scratch when we turn the operating system on and reboot the machine. You could say there's a little bit of a recursive issue here: the information that's tracking the files needs to be put on disk too, possibly in a file. If you follow that thought, you can see that perhaps there need to be standard positions for the root of the file system or something like that, and we'll talk about that in a moment. Right, questions? Okay. You guys with me so far? So what's the story with putting data structures on disk? It's a bit different from data structures in memory. In memory, I can have pointers to arbitrary bytes, and I can do linked lists and stuff. The ideas on disk are the same, except that the data structures have to be made out of these minimum quanta of blocks, and that changes which data structures we use a little bit. And it turns out, once we start worrying about performance, we're also going to be very careful about which blocks are next to each other on the disk, because we're going to want to keep blocks that are adjacent in the file adjacent on disk as well, since that will give us better performance. The other thing is we can only access one block at a time. You can't efficiently read or write a single word; as we already said, you have to read or write the whole block containing it. And ideally you want sequential access patterns, where you write a bunch of stuff along a track on the disk.
All right. And you can imagine that with SSDs, as I've told you, every time you go to write something you actually have to allocate a brand-new block under the covers and use that to do your overwriting, so part of this has to do with being careful about how much erasing and reallocating you're asking the flash translation layer to do. Flash-aware file systems are a little bit careful about when they even decide to read and write blocks, and if we get to it at the very end of the lecture I'll tell you a little bit about F2FS, which is one of the flash file systems in use these days. Now, the other thing to start thinking about is that when you go to write something on disk, it takes a little while to get there. Furthermore, if we have these data structures on disk, and they have to look a certain way, there's a consistency requirement on those data structures. Ideally, when we shut the whole system down and turn it off, the disk is in a completely meaningful, consistent state. I don't know if any of you have ever lost data because your machine crashed at the wrong time — I'm sure there are many of you. Then you'll know that the file system doesn't always shut down in a clean state. So although we won't get to it this time, next time we're definitely going to start talking a bit about journaling and some of the other techniques for making sure that data is never lost, even when we have sudden shutdowns. That's going to be important. Okay. Now, administrative trivia: I don't have a lot. We're almost, almost, almost done grading. I feel almost like I'm talking about counting votes in the current election: we're getting there, it's going to happen, and we'll let you know as soon as we're ready. The other thing — and I think everybody's probably done this — is make sure to fill out the post-midterm survey. Let us know what we're doing and how we can improve.
And the other thing, which I'm not sure we put into the survey, but which you're welcome to forward to me individually, is whether there are any particular topics you'd like to talk about in the last lecture or two. I might throw together an interesting lecture with topics that were requested by people, so feel free to take advantage of that. I've actually had people ask me about things like quantum computing, which is not really 162, but I'm more than happy to talk about things as long as I can say something meaningful about them. Yes, I would say that the results of the midterm grading are going to be far less contentious than the results of the election — we shall see. But as you have seen, for those of you watching the counting of the election, slow and steady is the name of the game. So this is all about taking a breath, which is good; breathing is good, by the way. The other thing I wanted to point out is, if you have any group issues going on, make sure your TA knows about them. Make sure you reflect those issues in your group evaluations and give us some good feedback, because we will take all of these things into account. I don't think there's anything else to say about administrative trivia, so I think we're good to go unless anybody has any questions. The term is kind of winding down; we're down to maybe the last six or seven lectures. Sometimes I do my special lecture at the end — we might do that — but we're getting down to the last few. Okay. So, let's talk about designing file systems. What are some critical factors? Well, clearly performance. Hard disks are a good example of trying to get performance out of a less-than-ideal device, because you have to seek, and then you have to wait a long time for the rotation, and only then can you read.
That's going to take a long time, and I showed you several examples a couple of lectures ago of the difference: if you have to seek, your total bandwidth goes way down, versus if you don't have to seek, your bandwidth is much higher. So we're going to want to do a good job of the same thing in our file system. That's going to be important, and it's hard to get right, by the way. Just to put one more point on this: when you get to SSDs, randomly reading or writing in your logical block space is no longer a performance issue, because pretty much every block takes about the same amount of time to read. So some of the optimizations for disk drives are less necessary on SSDs. Now, other things that feed into the Unix view of the world: we always have to do an open before reading and writing. Just think about that: you do an open system call, you get a file descriptor back, and then you can do reads and writes. What's good about that model is you can perform protection checks on open, figure out where the file resources are in advance, and then from that point on you're really just accessing the blocks directly. And this fits with the question from a little earlier about having different permissions on the same block for different people: these typical file systems just don't do that. All of the permissions are attached to the file as a whole. Now, in the last couple of lectures of the term we're going to expand quite a bit, when we start talking about file systems that might actually span the globe. In that instance, you can't necessarily trust that the permissions that were checked on open are going to be honored when you're talking to data that's being stored somewhere in Antarctica or wherever it happens to be, and so we might have to adopt slightly different behavior once we get there. Okay.
So the other thing is kind of a side effect of Unix, which is that the size of a file is determined as you use it. Think about this for a second: you open a file, then typically you write bytes to it, and then you close the file. The file system really has no idea how big your file is going to be until you actually close it. Or you open it, write some stuff, close it, and then go back later, open it, append some stuff, and close it again — now the file is growing incrementally. So to the extent that the file system is going to optimize the placement of your bytes to make everything fast, it runs into this unfortunate problem that it doesn't really know how big your file is. Okay. The other thing we're going to need to do is organize everything into directories, so we have to figure out what data structure that is. And then finally, we're going to need to very carefully allocate and free blocks so that our access remains efficient: as I said at the outset, minimizing seeks and maximizing sequential access are going to be very important in our design. Okay. So what are some important components of a file system? We have the file path, which is the name; you go through a directory structure, and that gives us something we call an inumber, which is really a pointer into an inode array. I'll get to that in a little bit. And what is an inode? An inode is basically a file header structure that records which blocks belong to the file. Think of it as an index, a big array that translates from a position in the file to which data block holds it. This file header structure is the thing that gets modified as I read and write the file: as I write the file and make it bigger, I'm going to be adding entries here.
When I allocate a brand new file by opening it with create what I'm doing is I'm allocating a brand new I node, just for that new file. Now, the interesting question here also that's in the chat here is does error checking usually depend on the block device or the file system. There's a lot of layers of error checking we'll talk about those next lecture but just as a simple thing to point out the data sectors themselves have a whole bunch of read Solomon bits on them that you actually write more bits than your and that allows it to handle a lot of read errors just off the disk. And then, once we want to really deal with the fact that maybe a whole disk could die then we start doing stuff like raid and so on which we'll talk more. So mostly the error checking at one level is on the disk itself and then at another level we use redundancy by writing to multiple disk drives in order to deal with a drive failure. The I here is just for I node so I node is index is what the I stands for and so it's an index node, and this is the index node number I number or whatever but all right. Now, if you remember by the way, way back when we talked about the abstract representation of a process. It's got some thread registers got some address space and so on the file descriptor table is in the process. Okay, and that basically transforms numbers to open file descriptions, if you remember, and the way we talked about it you can go back and look and I don't know lecture six or something was we said well this file description keeps track of what the file name is and what position you're currently adding that file name so that when you're reading and writing, it can kind of pick up where you left off. 
In reality what's actually being stored in the open file description is the current I number, because if you remember, we open the file first and that's where we trace the name all the way through the directory structure and then eventually we find the I number, which is the actual file, and that I number now is what we use when we read and write. So you can actually get into a situation where you open a file and then it gets moved. And you can continue to read and write it even though it's moved it somewhere other than what your name pointed at and that's because the open has held on to the I number not the name. Now, so we take the file name. And we look that up in a directory structure which give us the file number. So open performs the name resolution we're going to have to figure out how to do that translating path name into a file number, read and write operates on the file number and use the file number as an index to locate the block. The file number goes into the index structure to the storage block and that's on disk and so really you're going to figure out while I'm at offset. Suppose I was to go to 5k 5000 in some file. Well, that's going to be in the second block because the first block is zero to 4095. The second block is going to handle the next set of bytes and so I'm going to look that up in my index structure and find out where the second block is or block number one. That's going to point at the disk somewhere and so I know that when I go to access bite 5000 I'll know which block it is. So we're going to have to look at both how the directory works and how this I know structure works to help us find which block is of interest to us. Okay. So one of the components which we're going to talk about in the next few slides. One is what's a directory look like. What is it exactly. Another is what's that actual index structure. A third is we're going to talk about storage blocks and the free space map. 
A lot of these choices in here of these four pieces at least are things that vary depending on what file system you're using. Okay. So first to ask our question how do we get the file number. Well you look it up in the directory. So a directory is really a file in most file systems containing file name file number mappings. Okay, and so basically a directory is just a file and you go in that directory and you find the file name you're looking for, and that gives you the file number and as a result of that then you can know, get the index structure and know where to look on disk. The file number could be a file or another directory could point out a file or another directory. So really the way you go through slash a slash B slash C slash D is you find slash and in slash you look up a which points to to to directory a and then in directory a you look up on and so it's a chain of look ups through multiple different directory structures. Okay, and so each file name file number mapping is actually called a directory entry. Okay. Now the processes are never allowed to read the raw bytes of a directory so if you try to open a directory. It doesn't really work properly. Okay, and so the I what I said earlier is that by and large Unix doesn't care about the format of the data in files, the one point at which that's not true is the directory format because the directory format something that's directly interpreted, interpreted boy I'm losing it today sorry directly interpreted by the operating system. Okay. This is from watching vote counts for too long. I think I'm going slowly crazy, but anyway, so instead there's actually something called a reader system call you can look it up to a man on it which iterates over this map without revealing the bytes. Okay. So, why shouldn't we let the OS read and write the bytes of the directory well because they might screw it up. 
Okay, and so pretty much the read directory write directory create all of that stuff are operations that cause changes to directories indirectly. Okay. So, just keep that in mind, but by and large, except for the format inside a directory and directories just a file and so keep that in mind because we're going to be building files using our file system and we're just going to use those files to store data or to store directory mappings. And so the basic bag of bits that invites that we end up using for our directory is something we're going to get out of our file mapping. Okay. So, here's directories just in case, you know, this is what you get on a Mac OS, just the idea of these folders are something that kind of came up graphically 20 years ago or whatever, but basically what we're seeing here is this top level directory has a directory in it called static. And that static directory has in it a bunch of other things which have for instance, homework. And inside of that might have homework zero dot PDF. This is a set of directories that we search until we eventually get to the actual file. Okay. So directory abstraction just to say a little bit more so directories that's what these blue things are here are specialized files contents with lists of pairs of file name and file number. So in the slash us our directory, what you see here is a pointer to live for dot three is actually a pointer to this directory, a pointer to live is this one inside the live for dot three directory as a pointer to foo, which is this actual file. So these pointers are really just links, their i numbers which point at the i node structure that describes this file which happens to be a directory in this instance. Okay. So the system calls to create directories open create, read directory traverse the structure. So notice it open and create and things like that actually add things to directories you can do read dear to read your way through all the links. 
There's make directory and remove directory you guys know about that that would be the way that for instance the original live for dot three got put into slash us r. And then there's link and on link which allow you to mess with these actual links. Okay. And there's a bunch of libc support for iterating through the directory so you should take a look but there's like open directory. And then once you get back to directory star then you can read the next entry from it and you can read it in various ways. So there's a whole series of system calls that have been made just to traverse this directory tree, which is something that you end up doing. Almost. For sure if you ever have if you ever write an application that's got to talk to files. Okay. Well, let's take a look here I'm just going to hammer this home I said this earlier. So how many just accesses does it take to resolve say slash my book slash my slash book slash count. Well you first have to read in the file header for the root directory. So that's the slash directory and that turns out is at a fixed spot on the disk somewhere. So one of the things that a file system gives you is the root I know for the root directory. Okay. And then you read in the first data block for the root so remember the root is just a file. So I read in the first block of the file and I start traversing the directory, and eventually hopefully I find, you know it's a table of name index pairs. You can search it linearly to find the word my or the name my in it. And you can search it linearly in most standard unices. Okay. And so that linear search become the really big problem if you have a directory with lots of entries in it which sometimes automatically generated directories are that way. The question here is if the root is at a fixed place. Does that mean it has a maximum size so the answer is no. 
The thing that's at a fixed place is the I know index structure, not the file blocks and the, there is a maximum file size in a typical file system but that's much larger than you'd ever fill up with a directory. Okay so you're the fixed thing is the I know not the data. I know as we get there. So then you read in the header for my. So yeah that's another reference. And then you look through my defined book and then you read in the header for book and header by the way is the same as I know. And then you read in the data block for book you search for count, read the file header for count and at that point, I now have the, I now have the I know for the actual file and that can go in. I'm basically cashed for all my reads and writes at that point on. Okay, in the description. Now, in a file descriptor points at the description which holds on to the, the header for count. All right. Now, the question here might as a good one which is why not just store the full path in a big hash table. So the answer is, there are some file systems that do that where. So what you're basically saying is you take my slash my slash book slash count, and you, you map that to the I know you could do that. Except that then that makes management a little harder because you know typically you link a new directory and if you think about you make a directory and then you add things to it. So the directory structure itself is typically organized the way we're talking about. But it's not impossible to organize it as a hash table. Okay, but let's let's organize it this way for now. Right, because this is this is closer to what most file systems do. It's more simpler than a huge system wide hash table because you're not storing, you're not having to worry where to store the hash table, you know if that answers that question or not. Okay, but it's not it's not a out of the question and there are some file systems that have chosen to do it that way. 
So the other thing that we mentioned kind of way back when was current working directory, which is basically a per address base pointer to a directory that's used for resolving file names. This is an example in which the current working directory could be slash my slash book. And in that case, you could actually cash the I know structure for slash my slash book in the kernel and thereby when you go to get to count it's much faster if your current working directory is slash my slash book. And in keeping with the notion that everything's a cash. In fact, what we cash under some circumstance actually what the operating systems do cash is they cash names. And so slash my slash book is actually kept the book pointers actually kept in an internal name cash, which gets a little pretty close to the question that was just asked about keeping a path in a big hash table. So if you think about the hash table as a cash rather than as the ground truth on the directory then that kind of works the way I think you were thinking there. Now, so our in memory data structures. Here's the the per process file table, which takes a file descriptor number and that looks up the in the file description, and that file description which is typically system wide. You load the I knowed into it and it points at data blocks okay and so once we pull the I knowed into memory then we can read the various blocks in the file pretty quickly, and we don't care where it's the actual file name is. Okay, so the open systems system call basically finds the I knowed on the disk from the path name by traversing all the directories creates an in memory I knowed, and from that point on then access to the file is fast and it's independent of how the path name is one entry in this table, no matter how many instances of the file or open. So this file is opened by many people. There's only one description here with many different file descriptors pointing at it. 
Okay, now if you rename or move a file does it create a new I knowed or modify the existing I knowed, neither. What it does when you move the file is it just changes. The directory structure it's the same I knowed it's unchanged. So the I knowed is the file in some sense, and you can move it around but all you're doing there is you're changing who's pointing at that I knowed. Okay, I hope that answered that question. And in fact, if the same file is in several different directories, then you can have several different directories point at the I knowed, and that just all works out. And so this is part of why the I knowed is the thing that we want to store our permission bits on as well. Okay, now of course the first file system we're going to talk about the fat file system and it violates all bunch of these things but it's probably the most common file system and use today so we're going to start with that one. So read and write system calls look up in memory I knowed using the file handle. And so once we've opened then everything is fast. Okay, so the last thing I want to do before we look at some case studies is, let's see if we can understand what our characteristics of our files are. In order to help us design our file system and so there have been many things studied over the years here's one that was published in fast which is a file system conference 2007. One of the observations was really that most files are small. So what they did was they tracked the size of files. In the file system over the years and starting in 2005 years worth of data. And what you see here is that most of the files are in this small range here even though there are some long tails. Okay, and so most files are small says that I need to optimize for small that's like 2k or less files. But most of the bites are in the large files. Okay, so if you look at how many bites are total in the file versus how much of the space it uses up. 
What you find is that most of the space on the file systems used by the large files, even though there's a lot of small files. So there's a lot of small files. But most of the bites are in the large files so what these two pieces of data show you. And the trends of course are that files keep getting bigger and so on but what these two pieces of data show you is that one we need to be extremely efficient with small files. And two we need to support large files still, because those are very important so we can't really focus on just small files or large files we want to have something that does both well. Okay, and we're going to keep that in mind because that's going to tell us a little bit about why various operating systems design their file systems the way they did. So the first one I want to show you is the most common file system in the world I would say this is the one that you have on your cameras that you when you take and plug in a USB stick it's a fat file system and so on. This was the original MS-DOS file system and it has found its way through many iterations and sizes to ridiculously large flash drives. Okay, and so this is a good one to know, because it kind of lets us see the simplest form of how we could build a file system. And so the simple idea fat stands for file allocation table. And what a file allocation table is is it's just a big table of integers. Okay, and you can think of it as sitting next to the disc blocks. Okay, and that big table of integers is one to one correspondence with all the data blocks so there is a entry zero and the fat corresponds to disc block zero entry one corresponds to disc block zero entry one and so on. So you could almost think of the fat file system as being a one integer worth of metadata per block. Okay, and this fat directory or this fat index is basically going to be stored on the disc in a few disc blocks, and it's actually replicated for reliability reasons. So let's build a file system out of it. 
Okay, so assume for now that we have a way to translate a path. So that means a full name into a file number. Okay, so let's assume we have a directory and I'll show you how that works in a moment. Well then disk storage is just a bunch of disc blocks. So what's a file file is a bunch of disc blocks. How do you figure out which disc blocks they are well, we're going to somehow link them together in a in a linear order so that we've got a file out of them. And you could think that each block holds file data. Okay, so it's, you can think of it as block number x of the file or block B of the file offset x gives you if we have say four cable bite blocks gives you which of the 4k bite bites we're interested in. Okay, and there'll be n blocks and so if we put a bunch blocks together block zero be some disc block then there'll be block one block two, and then we can figure out which block we need and then inside it which index we need to get the bite we want. Okay. So let's suppose that we're talking about a file and I'm going to call it file 31 block zero file 31 block one file 31 block two. So what I've just assumed here is that somehow our, our files are numbered. And each file has a set of blocks 012 and notice that they're spread potentially all over the disk. Okay, so this is a potentially big problem with the fat file system. So suppose now, so what are be so be is the block number and x is the offset. Okay, and so, so in this block here, if I were interested, suppose I were interested in getting bite five of the file. I would know that that's blocked zero because the blocks are say 4k in size. So it'd be blocked zero bite five. And so that would mean I'd go to this and I'd go to block zero and I'd find the fifth block and that and that would give me the bite that I wanted. So if I said help. So be as a file number is a block number here. Okay. Now, let's suppose we want to read from file 31 block to some offset x. What do we do. 
Well we have to index and find block to which is down here so how do we know what block to is a file 31. So what does this extremely simply okay all it does is we start with entry 31 is the file number and so that means that the file number corresponds directly to whatever block. This is block 31 represents block zero of the disk. Okay, so the 31st disk block is block zero of file 31. Okay, and then what is the fat do the fat is a set of pointers. Okay, well, from block 31 or from this potentially spot in the fat file system. The next block is what this link points to so this 31 is going to have a 32 in it because block 32 in the disk is the next block of the file, and then down here I don't know what block this is doesn't really matter is block three of the file. And so basically I can walk through the blocks of the file by starting at the head block which is the file number, and then just following the pointers, and that gives me block zero block one block two. Okay, and so the way I read the block from the disk is I wanted block to do two hops, and then I pull the whole block in and at that point now I can read block bite x out of that block and hand it back to the user. Okay. Questions. Now if you read the literature, what you'll find is there's many versions of the fat file system there was one that was 12 bits, one that was 16 bits, one that's 32 bits. Okay, that talks about the size of the integer in each one of these slots, which has to do with this block, the this blocks on the disk so you can imagine that fat 32 has many more much larger discs, it can handle in the original fat 12. Now very interesting question here that's in the chat is what if you want file 32. The answer is there is no file 32 because file 32 would put you in the middle of file 31. Okay, so not every file number corresponds to the beginning of a file. Okay, so let me say that again so file 32 isn't a file. Okay, file 31 is a file block 32 it turns out is the second block of that file. 
How do I know that. Well, I have to keep track of where my my files start. Okay, so the directory is going to come into play. If I thought 32 was a file and I popped in there. What I'm going to find is that file is going to look funny because it's going to be missing the first block. And if this is say a video, and there's a certain encoding in it I'm going to not be able to properly encode it because I'm jumping into the middle of that file. So you can start to see the ways in which a fat file system can get really screwed up. Like if I lose track of where all the file numbers in our then it's going to be very hard to figure out where the starts of all the files are. Now there are recovery programs that will go through and try to figure out that oh look, here's this block this block and this block, and they look like their block 01 and two of a video file. Therefore I'm going to call this a file and I'm going to generate a new fat for you that will let you read it as a file, but it's a very error prone process. Now, how do we let's look at this so the files a collection of disc blocks, the fat is a linked list one to one with the blocks. Okay, the file number is the index of the root of the block list for the file on the question that's an interesting one is do they always go down no. In fact, that's going to depend a lot on if you read some files and you write some files and you delete some files and you read some files and you write some files and you delete some files and you iterate days months years. It's going to matter what blocks are free and so you could you could link all over the place. So there is no locality in the fat file system, especially after you've used it for a while. In fact, the disc head is going to be going all over the place as you try to read linearly through a file so you can already see this has got a problem here. Okay, why is this used in cameras usb as well. It's, it was the lowest common denominator. 
They wanted something that could work in the original MS DOS slash windows boxes and so on and so pretty much it just historical reasons fat is the thing used. Okay, so I that may be an unsatisfying answer but that's the reason. So the offset and the file is a block number in an offset within the block, you follow the list to get the block number unused blocks are marked free. So what does that mean that means the fat has a special entry that isn't a link to another fat entry that just says I'm free. Okay, and so when you need a new block, you can scan through the fat to find ones that are marked as free and those are ones that you can use. Okay, and so let me give you an example here. So, suppose that I want to, let's see, I guess I had a duplicate here. Okay, suppose that I wanted to do a right. Okay, so actually, before I do that I want to show you something else here. So, let's take a look at two files. Okay, so here's an example with two files, file 31 and file, whatever file number two is I have no idea what that number is doesn't really matter, but it's got two blocks in it file 31 And notice that I've essentially written here another block into file 31 and so you can kind of see how these pointers can get all scrambled. Now the question might be where is this fat stored well it's stored on disk. Okay, at the beginning. And there's a special entry here that marks things as free. And the question might be what's the quickest way to format while you can mark all the fat entries is free. That's a quick format so called, and in that case it doesn't really delete the data what it does is it removes all the indexes. And so, if you do a quick format, and you do, you know, a directory you do a list in the directory, it's really a dear dir and windows or whatever. That things are gone but in fact all the data is underlying because all you've done is erased all the indexes and somebody with a file recovery program might be able to still look at it. 
Okay, so one of the good things about fat is that it's simple. You can basically implement it in device firmware. And so that's one of the reasons that it's also used in cameras and so on because it's really simple to implement it doesn't require a lot of work. So the free list kept as a link list. Technically the free list in the fats spec is really just zero entries here. If you wanted to have a link list, you could do that in memory as a way of avoiding having to scan all the way through. And sometimes, if you have enough memory, what most devices will do is they'll just load the whole fat into memory so it's much quicker to go through. But technically speaking if you took something and removed the USP key and you look at the fat things are indicated as free by being zero. Okay, so let's look at directories for a moment here so a directory in fat is a file containing a file name file number mappings. Okay. So here's an example where we might have the name music. And it has a pointer in it to the file number for that music directory. Notice that there's typically the dot which is the pointing to this guy and then the dot dot which is pointing to the parent. We link these directory entries together. Why are they linked together well just because in the fat things are all linked. Right so this is the first very clear interest instance hopefully for you guys of a directory is just a file that's got special formatting. Okay. Now, the interesting question that was on this is what if the sector of the root directory fails, then you potentially lose your data. There's actually two copies of it so you have a couple of chances to not lose it. But if you really lose the fat then you've just lost all of the indexes and potentially have no idea what files are linked together. So free space for newer deleted entries is kept so when you delete something in a directory you just link over it and there's free space in that directory. 
The file attributes are kept in the directory which means unlike what I was saying earlier that we're not able to put permissions on the file itself but rather on the directory so that's not quite the way we wanted it. So what distinguishes directory files from normal files. You can get to them by starting at the root directory. Okay. So all of this makes sure it depends a lot on the actual format of the metadata not getting screwed up. And so any of you who have ever lost I once lost a whole bunch of pictures in a camera, because a couple of blocks failed in the wrong way, and it's very hard to get them back. So the fat file system is very fragile, as you can see but again it's used a lot in very large USB keys. Okay. And it's a linked list of entries you have to linearly search through and so on so where do you find the root directory just to circle back on that it's at a well defined place on disk. So this is the first block that this is block 32 or excuse me block to there are no block zero or one. Don't ask me why that's just what they did so pretty much the very first block on the disk is the primary fat. And that's where you start your look up. Okay. So discussion suppose you start with a file number tie how much how long does it take to find a block. Well it's linear right you have to linearly search your way through. What's the block layout for a file. Well, the the layout for a file is accidentally whatever happens to be used as your writing and wherever the free blocks are. What about sequential access well sequential access is slow because you have to work your way through pointer pointer pointer pointer. So, you know, I guess if going from pointer pointer is not too bad. And your sequential access is not too bad. Random access is pretty bad right so if I wanted to get to block three from block from file 31 the old my only thing I can do is work my way through all of these links until I get to block three. 
Okay, and so the fat file systems very bad for random access unless you have a driver slash file system that pulls the whole fat in and reindexes it in a way that's fast and you can do that there's nothing no reason not to. Other than it takes a lot of memory and is not simple, which is one of the reasons that people like to use fats and camera because it's such a simple thing. What about fragmentation that's where the file is split across many parts of the desk well as you see just plain happens. And this is why they're all these D frag routines that you can run on old windows boxes and so on to rearrange the blocks, so that you really are linking sequentially and you can get some sequential performance out of this. But if you don't do that then the blocks are potentially all over the place. All files and handles them well enough right big files well it's a lot of links. I mean the biggest problem with a big file is you, you can't get randomly to the end of it without following a bunch of links. So that's that's a bit of an issue. Okay, so let's look at a different case study so I want to talk about the Berkeley file a fast file system. I know it's in new unix including the Berkeley fast file system. So the file number is no longer just a pointer into something like the fat it's actually an index into a set of I know to raise. And so those I know to raise each file, or directory is in a night is an I knowed. Okay, and so the file number is an index into this array each I knowed corresponds to a single file and contains all its metadata. So the things like the reader write permissions are stored with the file not in the directory like they were in the fat system. It allows multiple names, or directory entries for a file. So again the idea there is the I knowed is the file, the directory entries can point at it you can name that file 1212 different ways. 
As long as you get to that through the directory structure you can now use the same file because it's it's identity is defined by the I knowed. So this is, this is a much cleaner approach to to dealing with files. Okay, so the I knowed in unix typically maintains a multi level tree structure I'll show you this in a second to find storage blocks for files. And it's been designed in this asymmetric way which you'll see in a moment to make it great for little and large files. I showed you that there's a huge number of little files, but some really big files and we need to handle both of those well. Okay, so the original I knowed format which I'm going to show you appeared in the Berkeley standard distribution unix four dot one. And, you know, I've said this a couple of times I said this was sockets, you know BSD Berkeley standard distribution was famous for all sorts of innovations and operating systems. It's a, you know, go bears kind of scenario says part of your heritage here. And just as a more recent thing this is very similar structure for what Linux ext two or three ended up ext three is pretty much what you would get if you formatted, you know, a new version of Linux and you weren't trying to make a huge system and for the ext four. Okay, so go bears. And I knowed structure typically it looks like this, where an I knowed has a bunch of metadata, and then it has a bunch of what are called direct pointers which are pointers directly to block numbers. Okay, and so the block numbers remember I talked about the logical block numbers earlier point in a big space from one to end. And so the direct pointers point directly at a set of blocks and then there are double indirect pointers which is this is showing you an indirect pointer here for instance points at a block. And inside that block is a bunch of pointers to blocks, and then doubly indirect pointers pointed a block which points at blocks which point at a bunch of data blocks. 
And a triply indirect pointer goes to a block which points to blocks which point to blocks which point to a bunch of data blocks. So all of the data blocks are over here on the far right, and notice how this index structure is asymmetric: for the first N direct pointers, you have the inode and can directly figure out which data blocks are there. If you go past, let's say, block 10, then you start having to pull in a block which will then let you get N more blocks out of it. Okay, can anybody figure out why we did a bunch of direct pointers and then some indirect, doubly indirect, and triply indirect pointers? Why this crazy structure? Any idea? Something about small versus large files, yeah, what does this do? Good, and somebody else said the head of the file is fast but you can still accommodate large files, that's correct. In fact, for files that are small enough, it's only one hop: once we've got the inode in memory, which we get on open, we can look directly in the inode to find the first N blocks. So this is extremely efficient for small files. But we can accommodate large files too, and for really large files the triply indirect pointers give us a huge number of data blocks. Okay, so this structure was set up precisely to handle small files really well, and still be able to handle big files fairly well. And if you imagine caching, which we haven't talked about yet, all of these intermediate blocks: once you've gone to the trouble of pulling in the first triply indirect block, and then the doubly indirect and indirect blocks, those can be put in the cache and you can get the rest of the data blocks very fast. Now, a question: to clarify, does the file number point to a single inode or to an array of multiple inodes? The answer is the file number points to a single inode. Okay, so the file is defined by its inode.
Each file has only one inode, and when you talk about an inumber, it is an index into this inode array that tells you where the inode is. Does that make sense? All right, are we good, or do we need more clarification? The inumber points into the inode array; every file has one inode. Okay, is the number of direct pointers part of the spec? Yes. File systems typically have a specific inode format, so that's part of the file system, and you don't usually have the option to vary it; in fact, I'm not sure of a commonly used file system offhand that lets you change the number of direct pointers. Are they one-to-one with files? Yes. Each file has an inode, and each inode that is in use belongs to a file. Typically there's a whole bunch of these that are free, because if there weren't any free ones you couldn't create any new files, so there's a bunch of free ones. But for the ones that are in use, each is being used by exactly one file, and each file has one inode. And the file number is unique to the file, yes. I don't know if I can say this any other way, so I hope this makes sense: here is exactly one file, and it has exactly one inumber which represents this spot. Should I pause on this? I'll assume we're good. The inode array does not contain pointers to inodes; the inode array has the inodes themselves in it, and the inumber is an index into that array. Okay, so you could think of this as an array of structs if you want, but it's on disk. So the inodes are actually in the inode array, which is stored on disk. Now, the top of the inode is the file attributes, which are things like what user created it, what group it's in, the typical read/write/execute permissions for user, group, and world.
Things like the setuid and setgid bits, which say that whenever you try to execute this file, if it's an executable, does it get an effective user ID that's the same as the owner, or an effective group ID that's the same as the group. Those bits are all stored in the metadata, along with whether this can be read or written, etc. Okay. And here, for instance, is an example with 12 direct pointers. This wasn't necessarily the original BSD, but certainly Linux has 12 of these direct pointers; the original BSD had 10. That's part of the spec. What this is saying is that in this inode we have, for instance, 12 pointers that point at data blocks. If it's 4 KB blocks, that means the direct pointers are sufficient for files up to 48 kilobytes. Everybody with me? Why? Because we have 12 pointers, and 12 times 4 kilobytes is 48 kilobytes. Okay, so we can do pretty well with lots of small files having only one lookup hop, one indirection to get to the data blocks: once we've loaded the inode into RAM, we can get these data blocks directly. And that's handling the thing we talked about earlier, which is that most files are small. So most of the inodes don't have any indirect, doubly indirect, or triply indirect pointers; those are zeros, and they basically have everything in this small number of direct pointers. Okay. Does the file system not support 512-byte blocks? Okay, so that's an interesting question. And the answer is that originally these blocks were small, they were 512 bytes in the original BSD, because the sector sizes were 512 bytes. When we got to the Fast File System, which I want to finish up here before we end today, the blocks were bigger, and so then there was a special way to deal with fragmentation where data blocks could be partially used, but let's leave that for another conversation.
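As a sanity check on those numbers, here's a quick back-of-the-envelope calculation. I'm assuming 4 KB blocks and 4-byte block pointers (so 1024 pointers fit in one indirect block), and the 12 direct pointers of an ext2/3-style inode:

```python
BLOCK = 4096          # bytes per data block (assumed)
PTR = 4               # bytes per block pointer (assumed)
PTRS = BLOCK // PTR   # 1024 pointers fit in one indirect block
N_DIRECT = 12         # direct pointers in an ext2/3-style inode

direct_cap = N_DIRECT * BLOCK    # reachable with no indirection at all
single_cap = PTRS * BLOCK        # added by the singly indirect pointer
double_cap = PTRS**2 * BLOCK     # added by the doubly indirect pointer
triple_cap = PTRS**3 * BLOCK     # added by the triply indirect pointer

total_cap = direct_cap + single_cap + double_cap + triple_cap
```

The direct pointers alone cover 49,152 bytes, the 48 KB from the lecture, while the triply indirect pointer by itself reaches 4 TiB, which is why this asymmetric structure handles both the many tiny files and the few enormous ones.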
And by the way, just to finish one thing: if the sectors are 512 bytes on disk, when we read 4-kilobyte data blocks, the file system has no idea that the disk is operating in 512-byte units, because the disk device driver only moves 4 KB in and out, so that level of granularity is never exposed to the file system. Now, once we get to the indirect pointers, we can actually get up to terabytes of data. So once we get to that level up here, we're in pretty good shape, and that basically handles the really large files in our original study, so we're good to go with one-, two-, and three-level indirect pointers. Okay. So, to put it all together: we basically have an on-disk index, these inode arrays with a bunch of inodes in them, that indexes files for us. And in the case of the original Unix, it was 10 direct pointers. So let's ask our question: how many accesses for block 23? Well, what you do is you get through the direct pointers, and then you're talking about two reads, because you have to do one for the indirect block. If we have 10 direct pointers in this example, to get to block 23 we have to basically get past the first 10, and then we know that block 23 is going to be under the singly indirect pointer. So we're going to have to read that indirect block, and then we can go down to the 13th entry in that grouping to get block 23 — and actually, since block numbers are zero-indexed, that's entry 12, the one we want. But notice that we can easily figure it out: if we know what block we're interested in, we can figure out where in this structure we have to go to get our block. So the inode layout being well defined by the file system means that we can easily go from a block number to where the data block is, right? So how about block 5? Well, that's just the direct blocks; we just do one read.
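You can capture that block-number arithmetic in a few lines. This is a sketch of the counting argument, not real kernel code; I'm assuming the lecture's 10 direct pointers and, hypothetically, 256 pointers per indirect block (what you'd get with 1 KB blocks and 4-byte pointers):

```python
def reads_for_block(n, n_direct=10, ptrs=256):
    """Disk reads needed to fetch logical block n (zero-indexed),
    assuming the inode itself is already in memory and no blocks
    are cached."""
    if n < n_direct:
        return 1                 # direct pointer: just read the data block
    n -= n_direct
    if n < ptrs:
        return 2                 # singly indirect block + data block
    n -= ptrs
    if n < ptrs**2:
        return 3                 # doubly indirect chain: two index blocks + data
    return 4                     # triply indirect chain
```

So block 5 takes one read, block 23 takes two, and anything past the 10 + 256 blocks covered by the direct and singly indirect pointers needs three.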
How about block 340? Well, it turns out we have to go down to the doubly indirect blocks at that point: read this index block, then that one, and so on. You can figure that out. All right. Now, if you'll give me another few moments here, I want to actually talk about the Fast File System, because so far we've really been talking about Berkeley 4.1 BSD. As you can imagine, if you look at this data structure, there's nothing in it that says these data blocks are laid out in any intelligent way on disk. In fact, the original Berkeley Unix 4.1 BSD file system had this unfortunate property that it would start out really fast. Why is that? Well, because as I allocated new files, I would lay out all my blocks on disk in sequential order, and reading them back would be fast. But over time, as you read and wrote and deleted, the file system would get slower and slower, progressively slower, until it was at half or worse of its original performance. And the reason is that the blocks would start becoming randomly scattered on the disk, because the free list in the original BSD was literally a linked list and had no idea of locality on disk. Okay, so you can imagine that's a problem. So what did they do? Well, basically, we've got to deal with performance. And so what happened is, among other things, we've got to go back to this from last time, or two times ago: if we want to optimize reading on a disk, remember that the seek time plus the rotational latency plus the transfer time all add up to give the total time, and the seek and rotational latency can be long, especially the seek time, which we would like to avoid as much as possible.
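To see why seeks and rotation dominate, here's a rough worked example of that total-time sum, using made-up but typical numbers (5 ms average seek, 7200 RPM, 100 MB/s transfer):

```python
def io_time_ms(seek_ms, rpm, nbytes, mb_per_s):
    """Total time for one disk request:
    seek + average rotational latency (half a revolution) + transfer."""
    rotation_ms = (60_000 / rpm) / 2            # ms per revolution, halved
    transfer_ms = nbytes / (mb_per_s * 1000)    # MB/s -> bytes per ms
    return seek_ms + rotation_ms + transfer_ms
```

For a single 4 KB block with those numbers, the total is about 5 + 4.17 + 0.04 ≈ 9.2 ms, and the transfer itself is under half a percent of that, which is exactly why the file system wants sequential layout: pay the seek and rotation once, then stream.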
So if you're reading sequentially, say through a video or whatever, what I'd like is that as I read successive blocks, they're all on the same track, because then I can read them without moving the head. Well, that can only happen if my file system is conscious of that and tries to lay blocks out in a way that mostly means sequential access either stays on the same track, or, if it has to change tracks or cylinders, goes to an adjoining cylinder with a tiny head movement rather than going all over the place. Okay, so we're going to try to optimize so that we first read from the same track, then from the same cylinder, neither of which requires us to move the head, and then only from tracks that are adjacent. Okay, and so the Fast File System, which is BSD 4.2 from 1984, had the same inode structure, so from the standpoint of what kinds of files are supported, they basically kept the same idea we just showed you: really efficient small files, but the ability to support large ones. One of the things they did do is go from a block size of 512 to 1024, so they doubled the block size, and that immediately gave them a lot more sequential movement, because we can read basically twice as many bytes at a time. Okay, so that was good. The paper on the Fast File System is up there on the resources page for you to take a look at. And again, this was a Berkeley project, well known at the time. It did a bunch of optimizations for performance and reliability, among other things distributing the inodes throughout the disk rather than having a single inode array on the outer tracks, or outer cylinders, of the disk. And it used bitmap allocation in place of a free list. The nice thing about a bitmap is you now have one spot for every block.
And now you can make a decision: you can say, oh look, there's a big range of empty space on the disk, a big range of free blocks that I could allocate sequentially. So the bitmap gives a much better idea than the free list of what's sequentially free and what's not. Another trick they used was keeping 10% of the disk space free, and that probabilistically gave them a lot of runs of empty space, which gave them a much better ability to read sequentially off the disk. So, in the early days, which we were talking about here, early Unix and DOS/Windows with the FAT file system, etc., basically put all the headers on the outermost cylinders. And two problems with that are: one, since the inodes are all in one place, if a head crash destroys that part of the disk, you've just destroyed all your inodes and lost track of where all of your files live. Right. Problem number two: when you create a file, you don't really know how big it'll become, so the question is how to allocate enough sequential space to get good performance. And we'll talk about that next time, since we've just run out of time here, but just to give you a little to think about on the way out: they basically divided the disk itself into a bunch of block groups, distributed the inodes around in the groups, and came up with a way of allocating files sequentially within a group. And given the heuristics for doing that, they actually improved the performance quite a bit. We'll talk about that next time. For now, just in conclusion: we've been talking about file systems, about transforming blocks into files and directories, optimizing for access and usage patterns, maximizing sequential access and allowing very efficient random access. We talked about files and directories being defined by a header called the inode. We talked about naming, which is translating from user-visible names to actual system resources.
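The advantage of a bitmap over a linked free list is exactly that run-finding ability. Here's a minimal sketch (the bitmap contents are made up; 0 means free, 1 means allocated):

```python
def find_free_run(bitmap, length):
    """Return the start index of the first run of `length` free (0)
    blocks in the allocation bitmap, or -1 if no such run exists."""
    run = 0
    for i, bit in enumerate(bitmap):
        run = run + 1 if bit == 0 else 0   # extend or reset the current run
        if run == length:
            return i - length + 1          # run ends at i, so back up
    return -1

# 1 = allocated, 0 = free: blocks 4-6 form a run of three free blocks
bitmap = [1, 1, 0, 1, 0, 0, 0, 1]
```

A linked free list, by contrast, hands you blocks in whatever order they happened to be freed, with no notion of adjacency, which is why allocations end up scattered.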
Directories are used for naming; they form a linked or tree structure stored in files, and that's how we basically define which blocks belong to a file. We talked about the FAT scheme, which is very widely used: it's a linked-list approach, very simple to implement in firmware, but very poor for performance, and it basically has no security, as you can see. We want to look at actual file access patterns: lots of small files, but a few really big ones taking up all the space. And so next time we'll talk about laying out file systems to take advantage of that, including the Fast File System. And then at the beginning of the next lecture we'll very briefly talk about two other file systems: NTFS, which is the Windows file system, and F2FS, which is one that's optimized for flash. All right. So I think that's where we're going to call it a night. I hope everybody has a great weekend, and try not to get too crazy watching the vote counts coming in. Otherwise, I hope you all have a wonderful weekend and we'll see you on Monday. Is the BSD file system still in use? That's a question from the chat, and the answer is yes: it's definitely still in use in BSD Unix, and Linux ext2/3 is also essentially the BSD file system. So, all right, you guys have a great evening.