All right. Hello, everybody. Welcome to Fun Wednesdays. Today we're talking about file systems. Fun.

Everyone's probably familiar with this, at least on Linux or Unix or anything besides Windows. The usual POSIX file system looks something like this. The layout is part of the FHS, the Filesystem Hierarchy Standard, so some directories are actually defined as things that should exist on a normal Unix system. There's slash, the special root directory, which every absolute path has to start with. We'll have a lab so you really understand that the root directory is a bit of a special case. There's really no magic aside from that, aside from a magic number, which we're getting used to.

In root there should be a few directories. There should be a bin directory, which holds all your binaries, or executables; on Linux those should be ELF files. There's a dev directory, which represents all your devices, and etc, which holds configuration files. There's a home directory, with an individual directory for each user. In this example there's one for myself, and that's where you would put all your files. And then there might be a mnt directory, where you mount other devices you plug into your system, like a USB stick.

There's also the concept of a working directory, which is the directory you're currently in. Paths can be absolute or relative. A relative path is relative to the current working directory, and an absolute path starts with root. So if I'm trying to describe this todo.txt and I'm currently in this john directory, I might say the relative path starts with dot. Has anyone seen dot or dot-dot entries before, or used them? I see thumbs up.
So we'll go over those and see what they actually mean. For todo.txt, the relative path would be ./todo.txt: dot means the current directory, which is john, and there's a file called todo.txt in it. The absolute path would be /home/john/todo.txt. Basically, how do I find todo.txt starting at the root directory?

Similarly for usb: for the relative one you have to start at john. If that's your working directory, you'd do dot-dot to get to home, which is up one directory, then dot-dot again to get to root, then mnt/usb. So the relative path would be ../../mnt/usb, and the absolute path would be /mnt/usb, and that's it. So here are the answers. Hopefully that's not terribly surprising.

There are some special symbols here. There is dot, which means the current directory, and dot-dot, which means the parent directory, so whatever directory you live in. Then there's the tilde sign, which I'm guessing you've seen before. It means the user's home directory, where they should store all their files. All relative paths are calculated from the current working directory. If you want the current working directory in your shell, you can print the variable with echo $PWD, and that tells you what directory you're currently in. Or you can just type the command pwd, which essentially reads that for you. Any questions about that? Hopefully straightforward.

Here's a fun little history lesson. You know that on Linux systems, if a file name starts with a dot, it's a hidden file, where you don't really see it. Well, that was actually a big old mistake by someone trying to be way too efficient and too clever for their own good. Within a directory, there's going to be an entry for dot and an entry for dot-dot alongside whatever else ls would list.
Typically you don't want to see those. So you'd want to hide dot and dot-dot, but some tricky developer thought: that's two cases, that's silly, I can cover this with one if statement. If the name starts with a dot, I just won't show it. And then suddenly, if your file name began with a dot, you wouldn't see it when you typed ls. And then they just made that into a feature. That's the history behind it: it works only because someone wanted to cover two cases with a single if statement. A fun little thing.

When we access files, we access them either sequentially or randomly, and this ties into the whole memory thing. Reading a file sequentially is what we've usually been doing when we open a file and read it. Whenever you use the read system call, it advances the position inside the file internally, and we'll see what that looks like a bit today. The next time you call read, it just continues from where it left off. Likewise, if you did writes instead of reads, multiple write calls would each continue where the last one stopped, shifting that position forward and forward. So if I did a write system call that said hello and then another that said there, I would see hello there all in a row, and I wouldn't be overwriting anything.

That's usually how you deal with file descriptors and write to files, but you don't have to. You can randomly access any byte you want in a file. You can read and write bytes in any order. You can overwrite them. You can do whatever. For that, you need a specific position where you want to read a byte from or write a byte to.

We've already seen the open system call. It takes a path name, which can be absolute or relative, and some flags that say how you want to open the file, like read or write.
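The sequential behaviour described above can be sketched in a few lines of C. This is only an illustration, not code from the lecture: the function name sequential_demo and its parameters are made up for the example, and error handling is kept minimal.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes "hello " and then "there" with two
 * separate write calls, reopens the file, and reads it back with two
 * sequential read calls. `out` must have room for at least 12 bytes.
 * Returns the number of bytes read back, or -1 on error. */
ssize_t sequential_demo(const char *path, char *out) {
    static const char *first = "hello ";
    static const char *second = "there";
    assert(path != NULL && out != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    /* The second write continues where the first one stopped. */
    if (write(fd, first, strlen(first)) != (ssize_t)strlen(first) ||
        write(fd, second, strlen(second)) != (ssize_t)strlen(second)) {
        close(fd);
        return -1;
    }
    close(fd);

    fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    /* The second read continues where the first one stopped. */
    ssize_t n1 = read(fd, out, 6);
    ssize_t n2 = (n1 == 6) ? read(fd, out + 6, 5) : -1;
    close(fd);
    if (n1 != 6 || n2 != 5) return -1;
    out[n1 + n2] = '\0';
    return n1 + n2;
}
```

Calling sequential_demo with a scratch path and a 16-byte buffer should leave the text hello there in the buffer, since neither the writes nor the reads ever move the position backwards.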
So those are read-only, write-only, and read-write. There's also a flag called O_APPEND, which makes the position go to the end of the file. If I just want to write to the end of the file, I add O_APPEND to the flags, and then whenever I do a write system call, the data is appended to the end of the file.

What we haven't seen yet is the lseek system call, which changes the internal position for a file descriptor. You give it a file descriptor as the first argument, so that's the one whose position I want to change. The second argument is an offset, which is how many bytes I want to move. The offset is relative to whatever the last argument says. That argument is an int called whence, but it can only be one of three values, so it's just a really crappy enum, because this is C and we can't do anything better than that. whence can be SEEK_SET, SEEK_CUR, or SEEK_END, and that's what the offset is relative to.

If you use SEEK_SET with an offset of, say, 10, that's relative to the beginning of the file. It moves the position to byte 10, so the next read or write happens there. SEEK_CUR is relative to wherever the position currently is. And as you'd probably guess, SEEK_END is relative to the end of the file. So if I want to read the second-to-last byte, I can lseek with SEEK_END and an offset of minus two. That comes two bytes back from the end, and then I can read or write from there. It's good to know about this because it will actually cause us problems with friends we already know, like fork.

We've also already seen how to access a directory. There's opendir, readdir, and closedir, and it works like this.
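A minimal sketch of those three directory calls follows. It is not the code from the slide: list_dir and show_hidden are names invented for this example, and the single leading-dot check mirrors the history lesson about hidden files rather than what any real ls does internally.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: prints and counts the entries in `dir_path`.
 * When `show_hidden` is zero, every name starting with '.' is
 * skipped, which hides ".", "..", and dotfiles with one check.
 * Returns the number of entries printed, or -1 on error. */
int list_dir(const char *dir_path, int show_hidden) {
    assert(dir_path != NULL);
    DIR *dir = opendir(dir_path);
    if (dir == NULL) return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (!show_hidden && entry->d_name[0] == '.') {
            continue;
        }
        printf("%s\n", entry->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

On a typical Linux file system, list_dir("/", 1) reports at least two more entries than list_dir("/", 0), because the dot and dot-dot entries are only counted in the first call.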
This does not translate exactly to a system call, and there's a specific format for this, which we'll see in the last lab of the course.

Here is what open files actually look like in a process. Each process has a file table in its process control block, and a file descriptor is basically just an index into this table. I alluded before that file descriptors are just a pointer, so these are the things they actually point to. This diagram has two processes, process one and process two, with three file descriptors between them, each represented by its index, and each pointing to its own entry. There are three components to an entry. There's a position, which is where the next byte will be read from or written to. There are flags, which hold the permissions: read, write, what you can actually do with the file. And then there's another layer of indirection: a pointer to a vnode. A vnode is supposed to represent anything you can read bytes from and write bytes to. A file would be an example of a vnode, or a pipe.

So in this diagram, process one's file descriptor zero points to an entry with its own position and flags, and that entry points to file A. File descriptor one has its own position and its own flags and points to file B. In process two, file descriptor zero points to its own entry, with a position that's independent of process one and its own flags, and its vnode also goes to that same file. So both of these processes can have the same file open for different purposes.
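To make the idea of separate entries with separate positions concrete, here is a small sketch. As a simplifying assumption it opens the same file twice inside one process instead of in two processes; each open still gets its own entry. It also exercises the three lseek whence values. The function name, the parameters, and the ten-byte test content are all made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: writes "0123456789" to `path`, opens the file
 * twice, and records positions in `pos`:
 *   pos[0] position of `a` after reading 4 bytes
 *   pos[1] position of `b`, untouched by the read on `a`
 *   pos[2] position of `b` after lseek(b, 7, SEEK_SET)
 *   pos[3] position of `b` after lseek(b, -2, SEEK_END)
 * `*byte_out` receives the byte read at pos[3]. Returns 0 or -1. */
int independent_positions(const char *path, off_t pos[4], char *byte_out) {
    assert(pos != NULL && byte_out != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    ssize_t written = write(fd, "0123456789", 10);
    close(fd);
    if (written != 10) return -1;

    int a = open(path, O_RDONLY);
    int b = open(path, O_RDONLY);
    if (a < 0 || b < 0) {
        if (a >= 0) close(a);
        if (b >= 0) close(b);
        return -1;
    }

    char buf[4];
    ssize_t got_a = read(a, buf, sizeof buf);  /* advances only a */
    pos[0] = lseek(a, 0, SEEK_CUR);
    pos[1] = lseek(b, 0, SEEK_CUR);
    pos[2] = lseek(b, 7, SEEK_SET);    /* relative to the start */
    pos[3] = lseek(b, -2, SEEK_END);   /* two bytes back from the end */
    ssize_t got_b = read(b, byte_out, 1);

    close(a);
    close(b);
    return (got_a == 4 && got_b == 1) ? 0 : -1;
}
```

With a ten-byte file the recorded positions should be 4, 0, 7, and 8, and the byte read at the end is the second-to-last one.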
Process one could have it open for writing, process two could have it open for reading, and that is stored in the individual entry. When you're actually reading and writing bytes, you don't really care; it's just a file, and as long as the entry has the correct flags set, the operation can go ahead. Any questions about this? Because this actually makes our lives more difficult.

So each process has this file table in its process control block. Again, the file descriptor is just an index into this table, but it's a bit more complicated than that. Each item actually points into a system-wide global open file table, or a GOF table if you really want, which is a bit of a silly name. It's just one big global table, and that's where the position and flags are stored, along with the pointer to the vnode. And again, a vnode is anything that supports read or write. It could represent a pipe, a network socket (we'll see network sockets), a regular file, whatever you want.

Remember what happens when you fork: the process control block is copied. Specifically for us, the local file descriptor table is in there, so it also gets copied on a fork. Both process control blocks then point to a global open file table entry, and they point to the same entry, so they share information, which can cause you some issues.

So say we again have process control block one and process control block two, and they're pointing to the same global open file table entry. That table is global: it's managed by the kernel and is not separate for each process. This is what it would look like after a fork.

Oh, sorry, a question? Yes, so if you call dup2, it essentially replaces whatever that descriptor's entry points to.
So in the lab we set up a pipe, and file descriptor three pointed to a global open file entry representing the pipe. When you call dup2, you're essentially just making file descriptor one, or whichever one, point to that same entry, and then you can close the original descriptor.

So this is what it looks like after a fork, and you might notice something possibly bad: they're sharing some information. If they share a position and one process calls read, that advances the position for both processes. For example, if both processes try to read that same file after a fork, then because they share the position, one process reads some parts of the file and the other may read nothing, or only a mutually exclusive part of it. One might read the first half and the other the second half, or one might read the entire file and the other nothing, because they share that entry. Sometimes that's not what you expect.
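The shared position after a fork can be observed directly. The sketch below is an illustration under assumptions: position_after_child_read is a made-up name, the file content is arbitrary, and the parent waits for the child so the result is deterministic.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: opens `path` before forking, lets the child
 * read 4 bytes through the inherited descriptor, then returns the
 * position the parent sees on its own copy of that descriptor.
 * Returns -1 on error. */
off_t position_after_child_read(const char *path) {
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    if (write(fd, "0123456789", 10) != 10 || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: same descriptor number, same shared entry. */
        char buf[4];
        ssize_t n = read(fd, buf, sizeof buf);
        _exit(n == 4 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }

    /* The parent never called read, yet its position has moved. */
    off_t pos = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return pos;
}
```

The parent should see position 4 even though only the child read anything, because both descriptors refer to the same open file entry.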
So yeah, that's one of the gotchas. The current position is shared by both processes, because each file descriptor points to that same global entry and that's all it points to. Besides reading and writing, which change the position for you, one process could also make an lseek system call, and that changes the position for both processes. If one process changes the position and the other is still pointing to the same entry, well, guess what, it's now using that new position. So one process can affect the other through file descriptors, which may not be something you expect.

If you don't want this, open the file in both processes after forking, and each open creates its own independent global open file table entry. By default, the only way two processes end up sharing a global open file table entry is by forking. You can play with your file descriptors so they look the same, but sharing across processes comes from fork.

So let's say this is our main. We open todo.txt read-only, then we fork, then we open b.txt read-only. Assume there are no open files beforehand, not even the standard ones. What would I expect to happen, and what would all the relationships be? I'll give you a sec to think, and then we can go over it.

Okay, any guesses as to what will happen here? Yep. So say process ID two is executing this main. When it opens todo.txt, nothing else is open, so open returns a new file descriptor, and with nothing open it gets the lowest number, kind of like your thread lab. So in process two's file table there's an entry for zero, and it points to something in the global open file table. That entry has its own position, its own flags, and its own vnode, which points to the actual file, todo.txt.

Then you do the fork. The new process is an exact clone of the parent at the time of the fork, so it copies that file table. Process three has an entry for zero that points to that exact same global entry, so now they're sharing a position. If the child did a read system call and read the first four bytes, the parent can't read those first four bytes without resetting the position.

After the fork you don't know which process is going to execute first, and they each do an open independently of each other. Whenever you do an open, it creates a new entry in the global table. Say process two runs first. It gets a new file descriptor, one, which points to a new entry in the global open file table with its own position, its own flags, and its own vnode, and that vnode points to b.txt. Then process three does its open, which creates a new entry in its local file table pointing to another new global entry, with its own position and its own flags, and that vnode points to the same b.txt, because it's actually the same file.

Yep? Yeah, that's just to make it easy to share. Whenever you fork, everything is shared by default, because the descriptors are essentially just pointing to something. Sorry? Oh, yes, LOF is the local open file table, so that's just the file descriptors pointing to global entries. Any other questions about this example?

So here's a nicer picture of what it looks like. The parent has two file descriptors, so it has two local entries; the local open file table is basically just its file descriptor table, and the entries just point to global entries. The parent has two and the child also has two. Both of their file descriptor zeros point to the same thing
because that was done before the fork, so it was copied. After the fork they both open the same file, so they each have their own global entries that are independent of each other but point to the same underlying file.

Yep? So this in the middle is the global open file table, and this in each process is the local file table for that process. Yep, an entry in the local open file table is just a number that points into the global file table.

Yep? Yeah, zero, one, and two are usually taken, but I said to assume there are no previously open files, not even the standard ones. That's a bit weird, since usually zero, one, and two would be taken, but it'd be the same thing. I just didn't want to write three extra entries here.

Yep? Yes, lseek changes that same position variable, which is which byte to read in the file.

Yep? Yeah, the standard ones point to the same global open file table entries as in whatever parent created the process. If you look at the proc file system, which we know how to do now, and look at the standard file descriptors, they point to the same thing in pretty much every process. This is also why, if you share standard input over multiple processes, only one process reads anything you type on the keyboard. They all share that entry, so essentially the active one reads the input and the others can't. That's why your standard in only gets read by one process and not by everything in the tree.

Okay, so now let's talk about how we actually store files on your file system. We saw SSDs yesterday, and we saw that they contain pages, which are basically just fixed-size blocks. If we want to store a file, we store it in pages too. One way to do that is contiguous allocation. Say I have a green file that spans three pages. I could just allocate it somewhere on
that SSD, and to describe where the file is, I tell you what page it starts at and how many pages it has. That's contiguous allocation; I don't have to tell you much. For the red file, I tell you where it starts and that it has six pages, and for the blue file, I tell you where it starts and that it has four pages. So is this a good idea for your file system?

Yep? Yeah, so if I get rid of something, it might leave an odd-sized gap that I can't fill. It's a bit better because I'm using pages, so the gaps are page-sized, but there might not be enough contiguous pages in a gap to fit something. Also, files typically get bigger and smaller. What would happen if I wanted to make that red file bigger? What would I have to do? Right, I couldn't just make it bigger, because it would run into the blue file. I can't say it has seven pages now, because that's where blue is. I could move the red file over here, copy all the contents of the file, and then stick the new block at the end, because now there's room for it. Or I could move the blue file down, copying all of it, to make space for the red one. But this is pretty bad and pretty wasteful, and we've kind of already solved this problem with memory, right? Memory was also in pages, exactly like this. What did we do for memory?

Yes, we did multi-level page tables, but even without multi-level page tables, we just made different mappings, right? Or we could use a free list; that's another option. Both are valid, and we'll get to yours later.

So first: contiguous allocation is really fast as long as there are no modifications, and it's really space efficient, because to describe a file all I have to say is what page or block it starts at and how many blocks it takes up. It's really fast to access any
block in a file, because you can compute which block it's supposed to be in. This slide is a long-winded way of saying that. Assume the block size is a four-kilobyte page. If I want to read byte 5,000, I know it's in the second block of that file, so I just go over one block from the start and read it. I don't have to traverse every other block or read anything else. I can look it up directly, because it's essentially an array lookup. But files can't grow easily, and there's internal fragmentation.

Remember we talked about fragmentation, which is essentially just wasted space. Since the file system only cares about blocks, or pages, and they're all the same fixed size, then as long as a file's blocks don't have to be contiguous, we don't have to worry about external fragmentation when files are deleted or truncated: all our blocks are the same size, so one is as good as another. But we will have some internal fragmentation. If I have a file that's only 10 bytes and a block is about 4,000 bytes, most of those bytes are wasted. That's just what file systems do. Every file system makes that trade-off, where it only deals with blocks and doesn't care about individual bytes.

So here's our linked list idea, where we store a file as a linked list of pages. This is called linked allocation. To do this, you have to store the pointer on the block itself. For example, if a file started here at block zero, that block needs a pointer to the next block, which points to the next block, and so on, and eventually the last block has a null pointer to say that's the end of the file. It's essentially just a linked list, distributed over blocks. So you give up a little bit of space in each block to store the pointer. Instead of using 4,096 bytes to store data,
say I use eight bytes for a pointer, so I lose just eight bytes per block, and that's something you can live with. Now files can grow and shrink quite easily, because it's the same idea as memory: to the file system, one page is as good as the next. If I want to make this file bigger by a block, I go to the end, grab a new unused page, point to it as the next block, and my file is bigger. The blocks don't have to be contiguous or anything like that. So now we can grow and shrink files as we want. The fragmentation story is the same: no external fragmentation, some internal fragmentation, and that's a trade-off we'll accept.

But this is a bit slow, right? If I want to access, say, block four, I have to read this block and follow the pointer stored on it. Typically you hope things are nicely cached, but to get the next pointer I have to go to a new block, which won't be cached because it's really far away, and read a pointer there. The next block isn't cached either, it's too far away, so I read a pointer there. That sucks. I can't make use of a cache, so it's going to be really, really slow. Can you think of a way to speed that up and get those pointers closer together?

Yep? So the idea was to keep all the pointers in the first block. Or alternatively, you can keep all the pointers separately from the blocks. Essentially you have a bunch of pointers all in a row, right next to each other, like an array of pointers. Then if you need to walk through the blocks, the pointers aren't far apart anymore. Hopefully they're within one page; maybe it takes more pages depending on how big the file is. Either way you can walk the chain with everything relatively close together, and hopefully it's faster. That's exactly what a file allocation table is. The file allocation table
is just a list of pointers, and it puts all the pointers right next to each other. Instead of storing the pointers on the blocks themselves, the blocks are now free to hold data, and you have this separate table with an entry for every block. So the entry at index zero points to block six, the entry for block six points to two, two points to 13, 13 points to nine, nine points to 18, and we're done.

This is the first file system here that's actually used. How many of you have heard of something called FAT32? That's what this is. FAT stands for file allocation table, and 32 means the pointers are 32 bits in size, so four bytes each. This is a real one that's actually used. If you're using a Windows computer, you are using this, because the boot partition has to be formatted this way. So that's FAT32, and it's the first useful one.

Essentially it's linked allocation, but we put all the pointers together. It's not as wasteful, because we can use the whole block for data, and everything is closer together, so random access is faster. If the file allocation table can be held in memory or cached, well, computers are really good at accessing values that are close together, so hopefully it's really fast.

The bad thing is that the size of the file allocation table is proportional to the size of the disk, because it needs an entry for every single block on the disk. The bigger your disk is, the bigger the table has to be, so it gets out of control pretty fast. That's why people tend not to use it for really big drives: you waste too much space.

So can we think of a better idea than a file allocation table, even though it's something that's actually used? We heard multi-level page tables. So would the multi-level page table idea work
for this, or what about something simpler? Distributed? Okay, so kind of multi-level-ish. At the end of the day, what does a page table do, multi-level or even single level? It takes a virtual address, looks it up, and turns it into a physical one. If you wanted lookups to be really fast for files, that's essentially just a table lookup, and you could do the same thing here. So what about just an array of pointers, where entry zero points to block zero of the file, entry one points to block one, entry two points to block two? You make an array of pointers that isn't this crazy chain where each one points to the next. The first pointer points to the first block, or the zeroth block, and so on. You essentially just turn it into an array.

That's what indexed allocation is: an array of block pointers for an individual file. If I have a file, I give it an array, and if I want to look up block four, it's like looking up element four in an array. That's really easy; I know exactly where to go. Entry four points to this block, so block four of my file is that block on the hard drive. Does that make sense to everyone?

The idea borrowed from multi-level page tables is this red block here. To store all the indexes, you store them on a page, and to describe a file, you say where its array is. The array lives in a block, and that block just holds a bunch of pointers for that file. It's the same idea as page tables, fitting pointers into a block, and this version is like a single-level page table.

Yep? So the question is what happens if I want to grow the file. In this case, say my page size is four kilobytes, so I can fit about a thousand pointers. I can grow my file up to about a thousand blocks by just adding more
pointers, right? The problem is when I run out of space on that block, and that's where the multi-level page table idea comes in. We'll get to that next lecture, but essentially, when we run out, we'll pretty much use the multi-level page table idea.

Yep? Yeah, so those two columns would be stored on the drive as well, but their size depends on how big the hard drive is, so I can't draw them; I have no idea how big they would be. Whenever you format a drive, it figures out how big they need to be and makes them the right size.

Yep? So to describe a file in this case, I would say: this file's index block is the red page. If you want to find any block that makes up the file, start at the red page and use it as an array. And it's for just one file. The red block is essentially that array of pointers for one individual file. So if I were describing the file to you as part of the file system, I'd say: if you want to read todo.txt, all of its indexes are in the red block, so go read the red block.

Yep? Yeah, so this one isn't linear in the disk size, but it only uses one block, so it has some limitations. My file can only be a certain size. It's like the point that was brought up before: as soon as I run out of pointers that fit on the block, I can't make the file any bigger.

So let's talk about its limits. Files can shrink and grow, we have the same fragmentation story, and random access is fast because it's essentially an array. But now the file size is limited by the number of pointers that fit in a block, since our index block lives on just one block. Let's see how big a file it could actually support. In this example, we'll say our index block stores nothing but pointers to data blocks. There's no other information; each entry literally just points to a data block. And then we'll say that the size of a
block on disk is eight kilobytes, so it's a bit bigger, and a pointer to a block is four bytes. The question is: what's the maximum size of a file that can be managed by this index block? I'll give you a little bit to think about that.

All right, how big can my file be? Yep? About two thousand. How big is two thousand pointers in actual megabytes or something like that? Yeah, so you divide, but that gives you how many pointers fit on a block, and each of those pointers points to a block that's eight kilobytes, so you multiply them together.

Okay, so really quickly, it's the same idea as page tables. My block size written as a power of two is 2^13 bytes, and my pointer size is 2^2 bytes. To get how many pointers fit on a block, I just divide: 2^13 / 2^2 = 2^11 pointers. So if I only have one index block to represent my file, I can point to up to 2^11 blocks. For the maximum size of the file, I have 2^11 pointers and each one points to a block of 2^13 bytes, so the maximum is 2^11 × 2^13 = 2^24 bytes. To change that into units we can actually understand, that's 2^4 × 2^20 bytes, which is 2^4 mebibytes, because these are powers of two and we're not trying to swindle you, and 2^4 is just 16. So the maximum size of a file here would be 16 MiB.

So who here would be satisfied with that? We've got one, so someone who doesn't care about movies or animated GIFs or anything like that. So far we've had a good idea, and here's the calculation just so you have it.
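The same arithmetic can be written as a small C sketch. The function names are made up for this example, and it assumes, as in the lecture, that the index block holds nothing but pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Number of block pointers that fit in one index block. */
uint64_t pointers_per_block(uint64_t block_size, uint64_t pointer_size) {
    assert(pointer_size > 0);
    return block_size / pointer_size;
}

/* Largest file, in bytes, that a single index block can describe:
 * every pointer refers to one full data block. */
uint64_t max_indexed_file_size(uint64_t block_size, uint64_t pointer_size) {
    return pointers_per_block(block_size, pointer_size) * block_size;
}
```

For the numbers in the example, pointers_per_block(8192, 4) is 2048 and max_indexed_file_size(8192, 4) is 16777216 bytes, which is 16 MiB. With four-kilobyte blocks the same formula gives 4 MiB.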
So far that was a good idea, but it's not quite practical. It's maybe enough for some text files, but we have to do better than that, and we will tomorrow in the cursed room, unless we want to do lab three stuff.

Yeah? Yeah, so here my file is represented by a block of indexes, and as soon as I run out of entries that fit on that block, that's it. So there's a limit on the size of the file, but to describe my file I just say where its index block is. That's it.

Okay, so file systems are very good for persistence. For SSDs you just get a bunch of blocks, or pages, however you want to think of it, and it's up to the kernel to decide how to use them. The file system is the software layer on top that enables persistence, and it describes how files are stored on disk. The first and only one we've seen today that's actually usable is FAT, the file allocation table. The others are a work in progress, and we'll see the real thing either tomorrow or Wednesday.

API-wise, we can open files and change positions, and the kernel uses the file system under the hood to figure out which physical block on the drive to access. Then we saw that each process has its own local open file table, and that there's a global open file table, which is where open actually creates entries. As part of fork, all those local pointers are copied, so both processes point to the same entries in the global open file table.

Then we saw some allocation strategies for the disk itself: contiguous, which we argued doesn't work; linked, which we argued sucks; and FAT, which is just linked with all the pointers closer together, and people actually use that. And then we saw indexed, which isn't used as-is, and we'll go over what's actually used instead. But you can probably guess it uses the same idea as multi-level page
tables, where we take a few hops and get a much larger space we can actually use. So with that, just remember: I'm pulling for you. We're all in this together.
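As a recap of the allocation strategies summarized above, here is a rough sketch of how each one finds the disk block that holds a given logical block of a file. Everything in it is an assumption made for illustration: the names, the 4096-byte block size, the use of -1 as an end marker, and the in-memory int arrays standing in for on-disk structures.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 4096
#define NO_BLOCK (-1)

/* Contiguous: the file is described by its first block, so the block
 * holding `byte_offset` is a simple addition. */
int contiguous_block(int start_block, long byte_offset) {
    assert(byte_offset >= 0);
    return start_block + (int)(byte_offset / BLOCK_SIZE);
}

/* FAT: fat[b] holds the block that follows block b in the file, or
 * NO_BLOCK after the last one. Reaching logical block n takes n hops,
 * but all hops stay inside one table. */
int fat_block(const int *fat, int first_block, int logical_block) {
    int block = first_block;
    for (int i = 0; i < logical_block && block != NO_BLOCK; i++) {
        block = fat[block];
    }
    return block;
}

/* Indexed: the index block is an array, so the lookup is one step. */
int indexed_block(const int *index_block, size_t entries, size_t logical_block) {
    return logical_block < entries ? index_block[logical_block] : NO_BLOCK;
}
```

With the chain from the FAT example (0, 6, 2, 13, 9, 18), fat_block needs three hops to reach logical block three, which is disk block 13, while indexed_block finds the same answer with a single array access.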