All right, welcome to Tuesday. So we will talk about file systems, and hopefully this lecture will make sense of what you've been doing so far and connect everything back together. It's the last big topic we have, well, aside from virtual machines, so we're almost done. So file systems, you should be somewhat familiar with the layout of this. A usual POSIX file system looks something like this, where at the very top there's a special structure called the root, which is just the forward slash where everything starts from. The root directory, as we'll see when we get to it, is just a directory like any other directory, except it has a known special value that makes it special. So within the root directory there's a bunch of subfolders, like bin, dev, etc, home, mnt, and those are all defined by a standard called the FHS, the Filesystem Hierarchy Standard. Then within home you can have a directory for each and every user, and mnt is special: you can mount devices there, which takes another file system and essentially attaches it at that point. So some quick questions we should all be able to answer, since we've been using Unix machines for a while now: if my working directory is /home/john, then what are the absolute and relative paths to todo.txt and the usb folder? Well, the relative path to todo.txt is just ./todo.txt, since I'm already in the john folder, and the absolute path always starts with the root directory, so it would be /home/john/todo.txt. Then for usb, the relative path uses that special dot dot: ../.. would take us to the root, and then it's mnt/usb, so ../../mnt/usb.
And if that was absolute, it would just be /mnt/usb. So here are the answers to all that. And those are the special symbols we have when dealing with file systems: a single dot means the current directory, a double dot means the parent directory, and the tilde, the little squiggly line, means the home directory of whoever the current user is, which is usually set through an environment variable called HOME. And whenever you execute commands, all of the relative paths are calculated from your current working directory, which might be represented by the environment variable PWD, which is just the working directory. A fun fact about this: if you look at the actual structure of a directory, there's an entry called dot and an entry called dot dot. And if you've been using Unix long enough, you know that files beginning with a dot are hidden from you. Funny fact: that was a bug made into a feature. When someone created ls, they wanted to hide dot and dot dot from you, since they're actual entries in every directory. And whoever it was, in their infinite wisdom, instead of just special-casing dot and dot dot and hiding those, said hey, I can cover this in one case: if it begins with a dot, I should hide it from you. And that little bug just became a feature, where anything that starts with a dot is hidden by default. So funny story, that's where that comes from; someone was trying to be too clever for their own good. Now, how files are accessed and actually laid out ties in with memory allocation and dealing with pages. If the kernel is dealing with pages, it could allocate pages sequentially, or it could just do them randomly and use whatever pages are free.
If you do them sequentially, you can lay out all the blocks sequentially, and then each read you do, so the read system call, would advance some position inside of the file. The kernel keeps track of where it currently is, what byte it should read next in the file, and every time you read, it advances that position. And if you do a write, it goes at that position too; we'll see how to set that position to the end or how to manipulate it. The other way you can access files is just randomly. You can read and write into the file in any order, and you have to actually set that offset every single time, and we'll see how to do that. So we're familiar with the first system call here, open, which takes a path name, some flags, and a mode. The flags are things like: should it be read-only, write-only, read-write? And then there's a special flag here that you might not have used before called O_APPEND, which moves the position to the end of the file. So if you just do an open by default and then write to that file descriptor, it would write to the beginning of the file; if you give it this append flag, your writes instead go to the end of the file. And then if you want to move around a file randomly, there's a system call you probably haven't used yet called lseek. lseek takes a file descriptor, an offset, and a third parameter called whence, and whence is just what to make the offset relative to. If you set whence to SEEK_SET, it sets the position directly to the offset, and you can think of that like an absolute path: it absolutely sets the offset to exactly what you told it. Then there are two more. The next one is SEEK_CUR, which is relative to where the position currently is.
So if you want to skip forward a hundred bytes, you set whence to SEEK_CUR and the offset to one hundred, and it jumps forward a hundred bytes from wherever it currently is. Or if you want to reread the last hundred bytes, you set the offset to negative one hundred. And the final one is SEEK_END, which is relative to the end of the file. So if you just want to go to the end of the file, you set the offset to zero and anchor it to SEEK_END. So yeah, SEEK_SET makes it absolute, and SEEK_CUR and SEEK_END are both relative. Absolute with respect to the beginning of the file: zero would be the first byte of the file, and it doesn't matter where the current position is; if you do SEEK_SET with offset zero, it goes to the beginning of the file. Yep. Okay, so we've seen this before, just as a refresher; this will probably come up next lecture, but I'll bring it up here. Memories of lab one, where we had to go through a directory: there's opendir, readdir, and closedir. We'll actually see what the structures look like in the next lecture, but this was the API for it, and if you want to print a directory, you open the directory and then call readdir over and over again until it finally returns NULL, and each time you can access the structure it gives you and print out the contents of that directory. And you've done this going through it recursively. Now, the process control block. This goes all the way back to lecture four, when we talked about the process control block that is unique per process and keeps track of the process ID and all the scheduling information. We then learned it also keeps track of page tables, and now we finally know it also keeps track of what files are open. What it has is a table of file descriptors, which is just an array, and each entry points to a structure with three attributes.
Each file descriptor has a current position associated with it, flags, which hold things like its permissions, whether it can read and write, and a vnode, where a vnode just represents anything you can read or write to. It could be a regular file, or it could also be a socket, for example. So if we have two processes, they have two process control blocks, one and two here, and each has entries in its file descriptor table. For example, here process one's file descriptor zero points to an entry which then points to file A, and process one's file descriptor one points to another entry, with its own position, which points to file B. Then process two at the bottom: its file descriptor zero also has its own unique position, not shared with process one's, that also points to file B. So they both have a view into file B, but their offsets might be different. Yeah, a pipe. So if you have a pipe, you'll have two file descriptors, one for the read end and one for the write end, and they'll have different positions. The vnode would point to the same thing, which is the buffer; a vnode is just something you can read and write to, and a kernel-controlled buffer is something you can read and write to. But they'd each have their own offset, so one process could be filling up the buffer while the other process hasn't read from it yet, so the read position wouldn't change, but the write position would keep changing as stuff gets appended to the end of the buffer. So each process contains this file table in its process control block, and a file descriptor is simply an index into this table. Each entry there points into a system-wide global open file table, called the GOFT.
The GOFT is what holds the information about the seek position and all the flags, and it also points to the vnode, where a vnode is anything that supports reading and writing. The vnode holds information about the file, and that's internal stuff we won't go into in this course, but it can represent regular files, pipes like we said, which are just buffers, sockets, other things, even directories. Now remember what happens during a fork: the process control block is what gets copied. Specifically for us, the local open file table gets inherited. All the local file table entries do is point to entries in the global open file table, so if you fork and copy that local file table, because of that layer of indirection, both processes point at the same global entry. Yeah. So they're going to use the same offset? Yep, they're going to use the same offset. The local one just points to an entry in the global one, and the global one holds the position and everything. So the local one is just a reference, and the global one is actually global and shared. So if we have a fork, and the local tables are just pointing to entries in the global file table, well, say we take process ID one and fork it: it copies the local file table, so both the old and the new process point to the same entry in the global open file table, and therefore share the same position. This is why, in your pipe lab, if you happened to mess up and shared standard in across multiple processes with your fork, they'd all fight over reading from it, and whoever reads first updates the position based on how many bytes they read. So the same data won't be read by both processes; only one process will be able to read it.
It will update the position, and that's it; you won't see the same information across multiple processes. If you have standard in shared like that, they essentially fight for it, and whoever wins the data race actually reads the data. All right, any questions about that? Hopefully that makes sense now: they're actually sharing the same position, and that's why you never reread the data unless you reopen the file. There are going to be some gotchas to this, because we have to keep track of it, and now we understand a bit better how this all works. When you fork, the current position in the file is shared between both processes. So an obvious bug you might write is: hey, I open a file called, I don't know, readme.txt, then I fork, and then in both processes I try to read the file. Because they're sharing the position, if the file is really small, only one of the processes actually reads anything and the other gets nothing; or if the file is really big and gets returned in multiple chunks, one process might see part of the file and the other process sees the other part, with no overlap between them. So you have to be really careful if you're sharing file descriptors, and make really sure you want to share the same position. Yeah. So what would happen in that example from before? Say we just have one file descriptor we care about. We have the process control block for process one with file descriptor zero, and process control block two is a fork of that, with its own file descriptor zero. They both point to the same entry, which has a position, flags, and a vnode, and say the vnode represents some file called readme or something like that.
Now process one does a read call and process two does a read call, and the kernel decides who goes first. Say the contents of readme were "first" with a newline and then "second" with a newline. What could happen, assuming each read returns small chunks, is this: process one goes ahead and its read returns "first" with the newline; internally, the position started at the beginning of the file, and because of that read call it got updated, so now the position sits just after the first line. Now say the kernel decides to run process two's read call. That reads from wherever the position currently is, and currently they're sharing a position, so this one returns "second", and now the position is at the end of the file. Any further read is just going to return zero, and both processes will think they've read the entire file. So in this case, because they're sharing the position, one thing that could happen is process one reads "first", process two reads "second", and that's it: they each get half of the file. You could also have the scenario where process one reads the entire file and process two just thinks it's an empty file. So you have to be very careful with your forking, and absolutely sure you want to share the position. Otherwise you're going to get some really unexpected results, where you're like, huh, my process just gave me the line "second", and there's definitely more than that in the file. And another gotcha, if that isn't bad enough: because they're sharing the same position, if one of those processes decides to seek after the fork, it effectively seeks in both processes.
So you might have two read calls in the same process that read the same information, because the other process seeked the position back to the beginning of the file on you, and you can't tell the difference. That's another thing: seek calls that change the position change it for every single process pointing at that entry. Yeah, so lseek is called on a file descriptor, but both file descriptors point to the same global entry, so it updates that. The file descriptors are independent, but what they point to is not; they point to the same thing. Yeah, seek affects the position, and the position lives in the global open file table, so seek affects what's in the global open file table. When you fork, the two processes are independent copies of each other, with their own file descriptor tables, but the fork copies what those tables point to. Yeah, no, the position is per open file. So if I wanted both of these processes to read the entire file, I could just close the shared descriptor in one of them and then reopen the file. Reopening the file creates a new entry in the global table, with a new independent position and new independent flags; the only difference is that the vnode points to the same file, but the positions are independent. If I did this, both their positions would be independent, and they'd just happen to point to the same file, which is fine; they won't affect each other's position. And also, because fork is just a copy at the time of the fork, if you open the same file in both processes after the fork, you create multiple separate entries in the global table.
Anything you open after the fork is independent, with its own position, because each open creates its own unique entry in that global open file table. So for this example: the local open file table is just a list of file descriptors that point to entries in the global open file table, and entries in the global open file table get created whenever a process actually calls open. In this case, say a process opens todo.txt, then forks, and then each process opens b.txt. You should be able to answer: for each process, how many local open files does it have, how many global open files are there, and what are their relationships? Any guesses? Let's call the processes parent and child, where the child is the one created by the fork. How many files does the parent have open? Two, right? It has todo.txt and b.txt open. And how many files does the child have open? Also two: the child has the same files open, todo.txt and b.txt. But the relationships differ. Because todo.txt was opened before the fork, that open created one global open file table entry, and because of the fork, both processes point to that same entry, so they share the position of todo.txt. Then, after the fork, both of them open b.txt independently, so each of those makes a unique entry in the global open file table, with a corresponding local entry that points directly to it. So this is what it would look like. In the parent, assuming it didn't have any other file descriptors open, file descriptor zero was todo.txt, so that open call created an entry in the local open file table and an entry in the global open file table.
Entry zero points to this entry right here for todo.txt, so it has its own position, its own flags, and its own vnode that points to todo.txt. After the fork, the child also has file descriptor zero as part of the copy, and it points to that same entry in the global open file table, so the child's file descriptor zero points there too, and they share the position of todo.txt. Then, because both processes open a file again after the fork, they each create a new local entry and a new global entry. The parent process creates a new entry in the global file table with its own position and flags, whose vnode points to b.txt, and the child process creates another new entry in the global open file table, with its own independent position, that also just happens to point to that same b.txt file. Is that clear for everyone? Hopefully. That way, both of those processes can read the entire contents of b.txt. But if they both try to read from todo.txt, they might have issues: they may not see the entire contents of the file, or one may see the entire contents and the other sees nothing, or they see mutually exclusive halves of that same file. If you do it separately like this, they both have their own independent position, so they can both read the entire file. Yeah, sorry. Yeah, so you can open b.txt with write permission and then make write system calls to it. Yep. They just have their own independent positions, and they might fight with each other. So they can all write if they have write permission? Yeah.
Yeah, because writing just modifies the file. For the next lecture we could even write an example of that, because I'm pretty sure that's what happens with writes, but I'm not exactly sure, since normally you wouldn't knowingly write to the same file from two places. Yeah, so different processes can have a file open and both write to it, and it depends what the writes are doing. Sometimes a write goes to the end of the file, and sometimes a write starts at the beginning of the file. So you can have races; it's essentially just a data race. You could have one program that has the file open just for appending to the end, and one that writes to the beginning, and depending on the order they execute in, lots of things could happen. One thing that might happen, and I think this is a bug I encountered in grad school: one process has the file open for appending, and it appends through a buffer doing that write-back thing, and then another process just starts writing to that file, makes a new smaller one, and closes it. But because the first one still has its data in memory, eventually that old content just comes back, even though you overwrote it in the other process. So things like that can happen, and thankfully we're not writing a real kernel here, because if you were writing a real kernel, you'd have to actually handle cases like that. Speaking of that, the complication we'll go into next is how you actually store a file. If you write to it, you have to modify some information somewhere, and you also have to figure out how to store it. So the question is: how do we store files?
Well, the easiest thing to do might be something like contiguous allocation. We know that our file systems, or rather our hard drives, deal in big blocks that look like pages; they might be a bit bigger, maybe four kilobytes, something in that neighborhood. So if we want to store a file, we could do it like we'd do an array: just store everything contiguously, and for every file keep track of where it starts and how many blocks it takes, and that's it. Here I have three files, a green file, a red file, and a blue file. I could say the green file starts at block zero, the red file starts at block six or seven or whatever that is, and the blue file starts there, each with a certain length. So what would be the problem if I tried to store files like this? It's related to memory allocation. Yeah, fragmentation, and it's going to be even worse, because files are bigger than things you malloc, and files can shrink and files can grow. In this case, if I wanted to make the red one bigger, well, I can't. And if you tried to save a file after you did all your work on your essay or your assignment, and your kernel just said no, you'd probably be fairly peeved. Now, you could kind of move the blue one and update things, but that might be slow, especially for a super big file. So ideally we want to avoid fragmentation, and we can do a bit better, but this is the first idea you might have when figuring out how to store a file. The nice things: it's space efficient, since you only record the starting block and how many blocks there are, and you get fast random access, because everything is sequential, so given a byte offset and the block size, you know exactly which block you should be accessing.
Then the drawbacks, of course: files can't grow easily, since you'd have to make sure there's space for them to grow; there's internal fragmentation, since files may not fill an entire block, though we'll see you can't do much about internal fragmentation when the hardware only deals in blocks; and we're going to have really bad external fragmentation, because files can be deleted, grow, and shrink, which is a nightmare if you have to handle the allocations. So what about doing what we did in the kernel and storing a free list of pages? This is called linked allocation, where we record the first block of the allocation, and then on every single block we carve out space for a pointer to the next block, and we follow them until that pointer is finally null or something like that. That's what linked allocation is. This is also space efficient, because you only need to keep track of the starting block, with the caveat that the blocks also need to store a pointer, so the usable block size is slightly smaller: if the block size is 4,096 bytes and the pointer I store on it is four bytes, then I can only use 4,092 bytes of each block for data. Yeah. No, in this case, if you just freed the third block, it's a linked list, so you just update the entry for the second block to point to the fourth one.
It's just a singly linked list; if you really cared, you could make this slightly better with a doubly linked list, since then you could walk the other way too, but your usable block size would get even slightly smaller, and we'll see that no one actually does it that way. The nice thing is that now files can grow and shrink, kind of like how our kernel deals with memory allocation, so there's no external fragmentation; of course we still have internal fragmentation, since we're dealing in blocks. But our random access speed is going to kind of suck, because, like you said, if we want to reach some block we have to walk through all the blocks before it, and each block may be located far away, so it's never going to be cached. Even worse, we know about TLBs now, right? If we actually walk this, every single pointer is on a new block, which you can think of as essentially a TLB miss every single time, so we don't take advantage of caches at all: it's miss, miss, miss, miss. If you're a kernel dealing out memory one page at a time, a linked list actually kind of makes sense, but not so much for files.

A slight improvement is something called a file allocation table, which uses the exact same idea: it's still the linked list, but instead of storing the pointer on the block itself, you store it elsewhere. You create a file allocation table, which is essentially just an array of pointers all together in one place, and each entry points to a block. So instead of each block pointing to the next block, I have a file allocation table that tells me, for each block, which block comes next. It's just a more compact representation, so if you actually have to traverse it, you can load the whole file allocation table into memory and quickly walk the linked list, and it'll be a bit faster. You might think this is a bit weird, but has anyone ever formatted a hard drive and used a file system called FAT or FAT32? That's exactly what this is: FAT stands for file allocation table, and the 32 is just how many bits wide the entries are, so the pointer size is 32 bits. This is the first file system allocation strategy you'll see that's actually used. It's otherwise similar to linked allocation; the only difference is where it stores the pointers, in a table instead of on the block. So it still has all the same benefits: files can grow and shrink at will, there's no external fragmentation, still internal fragmentation, and you get faster random access because the file allocation table can be held in memory and cached. The big drawback is that the file allocation table's size is linear in the disk size, because it has to be able to track every single block, and in this scheme every single file on your system needs its own file allocation table. The bigger the disk, the more room it takes just to keep track of which blocks are allocated, and you run out of space much more quickly. Yep. So in this case we had block zero pointing to block six, so the table has an entry for every single block, and you also need to keep track of where the file starts; in this example the starting block is zero, but it could be anything. Every file has its own file allocation table that tracks all the blocks it uses, and it needs to be big enough to possibly reference every single block, so a lot of it is going to be zero: every file gets its own allocation table, but it's mostly empty, aside from the entries for the blocks it actually uses.

So, anyone want to guess how we could speed this up further? It still uses a linked list; what's faster than a linked list? A hash table. What if we're not clever enough for a hash table, or, well, what is a hash table but a fancy version of an array? We just want an array instead of a linked list. That's what indexed allocation is. Instead of storing a linked list of all the blocks connected through pointers, why don't I just store an array of the blocks I actually have allocated? With indexed allocation, each file only keeps track of the indexes it actually uses, so the size of this array is proportional to how many blocks the file occupies. Sorry, yeah, so if it grows, you'd have to make the array bigger. In this case, say we have the same file that takes six blocks: instead of creating possible pointers for every single block on the disk, I just create an array of size six, and each entry points to the disk block that holds the corresponding block of the file. In the earlier example, the zeroth block of the file was the zeroth block of the disk, which pointed to the sixth block and then the second block; so the zeroth block of the file is the zeroth block of the disk, the first block of the file is the sixth block of the disk, and the second block of the file is the second block of the disk. If it was FAT, it would keep track of them through the chain; with indexed allocation, entry zero of the array points directly to whatever disk block holds file block zero, entry one points directly to its block, same with entry two, and so on. And what it also does is store all these indexes on a block itself.
So, for your question about growing the array: under this scheme we have a maximum file size, and we do the same trick we did with page tables, where we fit the index allocation exactly on a block. In this case, the red block holds all the indexes that represent the file; it's essentially just an array of pointers. Knowing that's how it works, we can figure out how large a file can be. The nice things again: files can grow and shrink, there's no external fragmentation, still internal fragmentation, and our random access is really fast now because it's an array instead of a linked list. But now we have a limitation: a maximum file size. If we just put pointers in that one block, we can calculate the maximum file size, because we know how big a block is and how big our pointers are, so we know how many entries we can store; if each of those points to a block, then it's that many entries times the size of a block. So let's go through that quickly. With indexed allocation, we use one block for all of the pointers, and you can think of it exactly like an array. In this example, the disk block is eight kilobytes in size instead of four kilobytes, so a bit bigger, and we'll assume a pointer to a block is four bytes. So we can answer the question: what's the maximum size of a file managed by this index block, excluding the index block itself? Our block is eight kilobytes; anyone want to tell me what that is in powers of two? Two to the 13. A pointer is four bytes; what's that in powers of two? Two to the two. So how many pointers can I fit on a block? Two to the 11 pointers on one of these index blocks.
Yeah, so the maximum size of a file: that's how many blocks I can point to. Assume every entry is full and pointing to a block; each block is two to the 13 bytes. So what's two to the 13 times two to the 11? Two to the 24, yeah, which is 16 megabytes. So, would anyone here like to use a computer where your maximum file size is 16 megabytes? Maybe you could get away with that in the 70s or something, but not really anymore. And there's a trade-off, because we're using a whole eight-kilobyte block just to keep track of the pointers, yet we still can't handle files that are very big. If we extend this and use as many index blocks as the device could support, we essentially get a file allocation table again, with the only difference being that it's in order. So we can do a bit better than this, and we'll get into what a bit better means in the next lecture. But for now, the maximum file size is only 16 megabytes, which isn't great. So, to recap: like disks enable persistence, file systems also enable persistence, and they describe how data is stored on the disk. API-wise, you can open files and change the position at which to read and write. Each process has its own local open file table, there's a global one, and there are multiple allocation strategies: contiguous, linked, FAT, and indexed. We'll see more next lecture. So just remember: I'm pulling for you. We're all in this together.