 All right. Welcome back everybody to 162. We're going to pick up where we left off last time. We were starting to talk about file systems and we're going to move forward on that. But before then I wanted to make sure to remind you of the queuing theory that we talked about. And what you got was kind of your very first taste of queuing theory. There's a lot more interesting things, many more interesting things you can do and there are whole courses on it. But here we were assuming the simplest assumptions here that the system's in equilibrium and the queue is infinite in size. And the time between successive arrivals is random and memoryless, which essentially means that, you know, we don't have much more information about this other than the rate. And the other good thing is, as I mentioned, memoryless is the shape of a lot of probability curves when you add a bunch of independent things together. So perhaps that assumption's okay. And on the exit the server we assume has some arbitrary possibilities for its probability of service time. And so we talk about its service rate is one over the service time. And we said last time that if we had three, kind of three of the five parameters, we could derive the other one. So for instance, if we know the arrival rate lambda, we know the service time of the server and we know the squared coefficient of variance, which is sigma squared over m1 squared, that's the mean. Then we can derive, for instance, the service rate one over the service time and the server utilization lambda over mu, which is also lambda times T ser. Now what's interesting as we mentioned is the fact that this relatively flexible setup here with an arbitrary server probability distribution is completely described by the three parameters in red here, which is pretty astonishing, basically, that it doesn't matter kind of how complicated things are. If you know the service time and the squared coefficient of variance, then you can compute the queuing. 
So parameters we might wish to compute, as we mentioned, were the time spent in the queue, TQ, and the length of the queue, which is just Little's law. We talked a lot about that last time. And that's just the rate coming in times TQ. I also like to think of Little's law as the McDonald's standing in line law, as I mentioned last time. And then the results we came up with were two possibilities, one where the server is memoryless, that's an M1 queue memory, in memory, memoryless, in memoryless, out in one queue. Or if it's a general service time, that's memoryless in general one queue. And basically the difference between the M and the G is whether C equals one or not. And so what we came up with, we talked about these equations, and I also pointed you at the end of the lecture last time, to places you can derive them from. Take a look at some texts. But what's interesting, again, is that this relatively simple queuing behavior can be used to figure out how long it takes to service, say, a disk request taking into account all the queuing in the system. And the other interesting thing was that both of these behaviors have this U over 1 minus U behavior, which as the utilization goes to 100, the time to respond goes toward infinity because the queue keeps growing. And that is exhibited directly in this simple queue theory. Okay. And I also talked to you through some examples. Captions are not syncing into Zoom. Okay. I guess I'm not supposed to do anything about that. So what I will also, so this U over 1 minus U behavior of this thing growing to infinity is very interesting. Obviously any real queue is going to be a finite length. And so in that instance, that finite length is going to not go to infinity, but it's still going to start blocking the arrival. Okay. Moving forward now, we were starting to talk about file systems in which, by the way, if you're computing an end-to-end service time for a file system, you could use that queuing theory I just showed you. 
But now let's build a file system out of our devices. And we're going to talk mostly about disks today. And what we've got, for instance, is the name of a file. You're all used to that, which then basically goes into a directory to look it up. And what you get out of the directory is a file number or an i number. And that's an index structure pointer. And we'll talk a bit about what that index structure is. That's going to vary from file system to file system. But basically what that structure does is it says, here are all the blocks on disk, for instance, that are part of this file. And so then this is called the iNode and the iNumber names, which iNode is of interest. And then from that file index structure, we can figure out which data block is that's of interest to us because of where we are in the file. Okay. And I did have pointed out a couple of times that usually the blocks that we talk about in the file system, maybe 4k, are multiple sectors. So the sector is the minimal unit that you can read or write from a disk, but we don't really use 512 bytes at a time because the overhead's too high. We combine them into blocks. Some newer disks, the sector size is actually 4k, in which case it's one sector per block. Now, I did want to say something, which is a little unfortunate is we have a conflict in terminology. So in the flash that I showed you last time, the word block has this meaning of a bunch of pages. So a page is a 4 kilobyte chunk that you can read and write. A block is a bunch of those pages that are together on the chip. And you have to erase a whole block at a time. And then you can write pages within that block. And you can never rewrite them. But anyway, so be aware of that slight conflict. And if there's a situation where it could be confusing as to what type of block, I would say assume it's the file system block or ask somebody if it's an exam or whatever. So there are interesting pieces here we got to figure out. 
Like what is this directory structure look like? What's the indexing structure look like? And how do we make sure the blocks hold our data? And so here's a brief version of that figure from the last slide where our directory turns into the file number, which gets us our index structure, which turns into a block. And the first thing is that open is performing what we call name resolution. And that's translating the path name into the file number. And basically the result of that open is a file descriptor in the process control block in the kernel. And as you know, since you've been using files all term that returns an integer handle to the user code for reading and writing from that after it's open. And rewrite, seek, sync, et cetera, operate on that handle. So all of the, in UNIX style operating systems, all of the checking for permissions are all done at the open process. And so once you get past open and then now you have a file handle, you can read and write the data. So directories you're very familiar with. So here's a picture of one from a Mac. But the idea is that directories have inside of them either directories or other files. And so this hierarchical structure, which you've all come to know and love probably since kindergarten, right? Actually wasn't always there. I had to use some machines back at IBM at one point that I had no directory structure. Everything was flat. And I'll tell you it's really hard to organize a flat file system where sort of everything is at one level. So directories are a very good thing. So basically it's the directories a hierarchical structure. Every directory entry is either as a collection of either files or other directories. And so really if you think about this, it kind of suggests that a directory is really just a special type of file. It has name and attributes. Files have data, directories have data. 
The real difference between a directory and a file is that the directory has some prescribed format to it to allow the name resolution to reverse the directory. And one thing we'll talk about a little bit later is this mapping from a name to another file or directory is called a hard link. And that hard link is actually an inode pointer that's bound into the directory with the name. And so it's called a hard link. There's also something called a soft link, which is a special type of file that says this thing you're looking for is really somewhere completely else in the file system. Hard links are really kind of the bread and butter of the file system itself. Soft links are a convenience. And we'll talk more about that a little later. So what about the structure of a directory? So, for instance, I wanted to point this out, perhaps this is clear to everybody, but how do you access slash my slash book slash count? Well, you could sort of ask this question, which we do in exams all the time. How many disk accesses does it take? And well, the first one has to involve reading in the file header for the root that's slash. And that inode for root now lets you read the root directory. And in that root directory, we're going to look up the word my. And so let's read the first data block for the root, assume for a moment that my is at the beginning of that directory. And so we'll search linearly and probably come up with my right away. Yes, there's a question on the chat about whether soft link and sim link are the same thing. Yes, they are. So once we've found my in the top level directory, that's going to point us at another inode, which now we could read in the file header for that directory. And we read the first data block for my search for the word book. And we can read the file header for book search in there for count. And now the file header we get is for the actual file. And at that point, we're good to go and can start reading and writing the file. 
So notice this long process of resolving and opening a bunch of directories until we eventually get to the right one. This is extremely expensive and would tend to indicate that you don't really want to use something that's so deep in your directory structure because it'd be too expensive. And so the answer to that is twofold. One, these lookups are all cached in the operating system. And so there's actually something called a name cache that holds the latest versions of these. And it gets invalidated whenever you change any of the components. The second thing is once you've already opened the, say, the file count, then you don't have to traverse the structure anymore because you have the I number for the file itself. The final thing, by the way, is this notion of the current working directory, which you've had some experience with up till now, which is a per address space, per process pointer to a particular directory. And so then instead of saying slash my slash book slash count, if your current working directory were slash my slash book, then you could talk about count and opening just count without any slashes and the current working directory would find that for you. And so there's a lot of different caching that tries to make this lookup process a lot less expensive. Okay. Now, what is a file? Well, a file is named Permanent Storage. Contains data blocks, which are on disk somewhere. So that's a little tricky for us because in principle, I mean, how does a user know where the blocks are on disk? They don't have any visibility into that. And so we're going to have to do something to basically translate to the user's view of a file as a bag of bytes that are sequentially put together that somehow live somewhere. That's going to be our file systems going to help us with that. 
Files in addition to blocks that have the data also have metadata, which are attributes, things like who's the owner of this file, what its size, when was it last opened or touched or any number of the time based metadata, access rights, read, write, executable in a series of groups like owners, groups, other in Unix systems. Interestingly enough, in more complicated systems like Windows for have access control, which is more arbitrary in that you can give ownership to arbitrary groups and users using the access control system. So that's a more sophisticated style of controlling access to files. So when you do open, what really happens? Well, if you give it a file name through the open system call, and the kernel has to basically go through its in memory directory. Let's assume that it's finally resolved to the top level or the bottom level directory that has the file in it. And in that directory structure, we pull in the parts of the directory that we're searching, and that's going to point us to a file control block. And so we get out of the file control block essentially is an index to that, the file control block, which has all of the names of all the blocks in it that are part of that file. If you look, once you've opened the file, then each process has its own per process file handler table. And so when you read a given file, which you're providing is the file descriptor, which is an index into this table, that index points to a system wide table, which has the file control block in it and can be examined to get data blocks. And so the structure of these in memory data structures are really what we have to figure out when we start talking about a file system. Now, just because we can, let's start with the first file system we're going to talk about here is pretty straightforward. It's called the file allocation table, or FAT. And this is the world's most common file system. It started out on MS-DOS way back when. 
And now it's used pretty much in all of these USB sticks. And, you know, every operating system knows how to read FAT, even though it's not a particularly efficient way to organize blocks. But let's start with this as our simple example. And so assume for now that we have a way to translate a path into a file number. So we have some directory structure that already exists and works. How does the FAT file system organize itself? Well, there's this thing called the file allocation table, which is a large linear array that has words in it from zero up to the number of blocks in the file system. And disk storage is just a collection of blocks on the disk. And so in order to make a file, we somehow have to indicate for every position in the file which disk block is that. And so really what we need to do is have our allocation table hold a mapping from the requested offset into the block and offset within that block. And so, for instance, block zero of a file, we're going to call it file number 31, you'll see why in a moment, has a certain block to it. And block one and block two of the file are now spread across the disk. So what you see here is here's the set of all blocks on the disk. Block zero of file 31 happens to be at this point in that huge array of disk blocks. Block one is right after it. Maybe that's because I was writing that file 31 and I got those two blocks in a row. Block two might be somewhere way down elsewhere on the disk. And so how do we put these together in a way or keep track of the ordering in a way that works? And the way this works, well, let's see this. Suppose we want to read from file 31 block two offset X. What we have to do is we index into the fat with the file number, which is 31. And so we look at file number 31. And really, what that says is file number 31 is actually the 31st block on the disk. 
And so the fact that we say that the file is called 31 is really stating that the very first block in that file is going to be at position 31 on the disk. And so then what the fat has is it has an entry to point to the next block. So file number 31 has a pointer to what ends up being block 31 and the fat has a pointer to block 32, it turns out here, which now has a pointer to this block wherever that is. And so by starting with the file number, you figure out what index the first block is, and then you just start following the links until you get to the block of interest. Okay, and so for instance, because we were interested in block two offset X, we basically follow these links down, we read off of disk into memory, and then the last thing we can do is now that we've got that block, we can use X to give us our offset in the block. Okay, questions? So the disk, the list is not doubly linked. That's a good question. It's just singly linked. And as you can imagine, there's a lot of you don't seek backwards. You start from the beginning and you work your way forward. So now the one thing, so this is already that's a good question because you can see that this has some inefficiency built into it. It's the simplest possible way of indicating which blocks belong to which file. So the one saving grace on this is in really in machines with a lot of memory, you just load the whole fat table in and then this tracing is a lot faster because you're not actually reading blocks from it. So what are some properties of this system while files a collection of disk blocks, the fat is linked is linked one to one with blocks. There was a question on the chat here in this example where the file 32 be stored in the fat. The answer is there is no file 32. So only some of the file numbers are valid in this instance because they point to the first block in a file. 32 is not a valid file. It's a valid pointer into the middle of a file. 
Okay, and the other thing that you can figure out from this right away is boy, this sounds like you could easily screw everything up by jumping in the middle of a file and there really is no control over this. And so you can start to see why fat is a extremely simple for little devices but and therefore desirable but be not very desirable if you have any sort of worry about security. The file number so the difference between fat and X fat. So there's a lot of versions of fat which have increased size so like fat 32 the original one you could only have a 16-bit version of this fat 32 allows much larger pointers X fat has some other properties to it. But let's keep going on the basic fat here for a moment. These are all variants on the theme in case you are asking that question. So file offset is a block and index is a list to get the block number as we mentioned. Unused blocks are marked free and there's no real ordering. So the other thing that's really unfortunate about this is if you need a new block you have to just start scanning at the beginning of the fat and finding the next block. And if you have lots of use of the block and the file system and it's mostly full then what we end up with is something that has a lot of holes in it but takes a huge amount of time to find the holes. And these are free blocks but you actually have to scan to find them. And the other thing to note here is that notice that there's no locality by definition in here as well. So if you've done a lot of reading and deleting and reading and deleting of a fat file which you find the file system is you end up very quickly with problems because there's no locality whatsoever. Okay and so you're seeking all over the place. So this is very undesirable for spinning storage less of a problem for something like flash storage on a flash key. So let's look at an example of writing. Suppose we want to write file 31 block 3. Well notice there is no block 3. 
So we have to get a free block and link into the file. So here's an example we grabbed a free block we linked it into the file. And then we can write. Okay and so we growth the file by allocating free blocks. And if we open a new file here's an example of file 63 it might be linked somewhere else. And so what you're looking at now is a file system with two blocks in it or two files in it. Excuse me one file with four blocks the other with two blocks. Okay so what about this? Well it's simple. You can't argue that. It's used in Windows, USB drives, SD cards, cameras, phones, you name it, it's there. Okay where is the fat itself stored? Well it's stored in an early part of the disk in some of the early blocks but usually if you have enough memory you load as much of it as you can into memory boot time. What happens when you format a disk? What you're really doing is you're marking the fat entries is free and optionally zeroing the block. So Windows 10, so the question is does Windows 10 use fat? So the question depends on how you're asking that question. So Windows 10 can read and write fat as you can plug a USB drive into Windows 10 without any problem. Most people that have Windows 10 and in fact earlier versions of Windows have all formatted a second file system called NTFS which we'll talk about a little later. So the primary file system is absolutely not fat but certainly Windows 10 can use fat and anytime you plug a USB key in or attach your phone or whatever to a Windows 10 box you're going to be able to read that file system. Okay. 
So as I was saying about formatting you can see that we could zero all the blocks and then mark all the entries it's free and that would be one way of making sure that the data is properly destroyed although you really want to write over a bunch of zeros and ones and so on but oftentimes there is a fat there was a fat file system excuse me there's a way to basically format quickly which would just redo the fat here and leave all the data and sometimes that's done accidentally and you can try to have a recovery program that tries to find reconstruct the blocks which is sometimes successful sometimes not. There's a statement on the chat about Windows XP using the fat file system. Yes, there was definitely a more prevalence of fat earlier in time but and you could format your C drive as fat and a lot of consumer products did that but you could also use the Windows NT file system once you get past Windows 10 pretty easily and so if you're worried about reliability most people don't do fat for that. 
So what happens when you quick format a disk basically marking the entries is free and leaving the data in the blocks so pros very simple you can actually implement it in firmware okay issues how long does it take to find a block well you got to linearly scan this linked list so this is really bad at both random and sequential access what's the block layout for the file well it's haphazard it's whatever happened to happen you know whatever blocks were free when you were writing it sequential access requires a lot of pointer chasing random access requires a lot of pointer chasing and this is illustrating what we often call fragmentation so what you see here is that green and yellow blocks are interspersed with each other there are defragmentation tools that can be used to bring all the blocks of a file close to each other and sequential and that will overall speed up the performance quite a bit certainly for reads you can do a lot of sequential access etc what is unfortunate is then as you start using and deleting and modifying files then the holes open up and you need to defrag every now and then so small files are extremely efficient right you can find a file that's one block big files less so now what about the directories in the fat well as I indicated earlier directories are just files in the case of the fat file system what they are is there a series of linked entries which map a name and a file number or block number and so if you were trying to look for home Tom foo then what you would do is I'm assuming that this is home Tom directory and so in that you would look for foo that would give you a pointer which you could now point into the fat and start the first block of the file system so the directory is a file right so is a file containing file name file number mappings free space is in here for new and deleted entries and whenever you delete an entry it just leaves it as free and sort of links over it where do you find the root directory well every file 
system has has to have an answer for this it's always at a well-defined place on disk and for fat this is a block two there are no blocks there or one don't ask me why that's just the way they numbered it remaining directories are accessed via their file number questions now fat has lots of security holes okay nobody in their right mind would claim that fat is secure there's no access rights you can't say this person is not allowed to see it this person's not allowed to write it there's no way even in principle to track ownership of data so that's problem there's no header in the file blocks and so what that means is there's no way to enforce control over the data since kind of all processes have access to all the whole fat table and so there really isn't any way to prevent people from entering the middle of files and so on and you're just following pointers to disk blocks so fat was designed back in the day when when operating systems on personal machines were very simple okay I said that already okay that's going to be what I'm going to say about fat unless we had any final questions so we can hope to do better and that's what we're going to hope to do today so in order to do better we should know what we're trying to achieve and so there have been many interesting studies over time I present this one just because it was so thorough at the time it was over a number of years at Microsoft and it was looking at file system metadata and I'm going to take these two graphs separately so the first one was the characteristics of files and what you saw here is that in 2000 2001 2002 2003 2004 what you see here is these are the number of files in the file system of a given size so this peak here for instance is the peak of 2k byte files how many of them in the system have our two kilobytes in size and this says there's about 3,000 of them in 2000 but by 2004 there were 10,000 of them and what you see two things from this one the number of files was growing without bound 
and ridiculously quickly and secondly that most of the files in the system are small we're not talking about 8 megabyte files or 128 megabyte files we're talking about 2k files so small files are important to optimize for in the kind of system we were looking at here which is kind of a standardized Unix system but although most of the files are small most of the bytes are in the large files so if you look at this this is a different graph and what this is is this is the size of the files and this is the number of bytes in each of those and what you see is these are peaking out at the 16 meg region or two you know 8 meg region so most of the space is occupied by large files which is not entirely surprising you might have a whole bunch of system files which are small but then some video files which are large and they're basically taking up most of of the data or most of the space on the disk okay so there's a small number of really big files and a large number of really small files so it'd be nice if we had some way to build a file system that could handle both of these well enter the Unix file system so the original inode format appeared in BST 4.1 that's pretty prevalent now it was the Berkeley standard distribution Unix part of your heritage and it's very similar to Linux's EXT 2 or 3 the file number itself is an index into node arrays yay Berkeley number one okay the file number is an index into inode arrays and an inode is a multi-level index structure that describes everything you need to know about a file so it's good for little and large files it's an asymmetric tree with thick size blocks that basically helps in being good for little and large files and I'll show you what that means in a second the metadata associated with the file not the directory so what does that mean that means that this inode structure itself describes who's allowed to access it and with what access writes when was it last accessed and so on and that's part of the file not the 
directory and so it doesn't matter which directory you put a file in it has well-defined permissions which are part of its structure so a particularly good instance of this was the Unix fast file system BST 4.2 which was a follow-on to the 4.1 BST file system structure where locality heuristics were put into play better block group placement, reserving space, etc to get performance so the first version at BST 4.1 was getting a lot of functionality in there 4.2 helped make it fast okay and Unix also has a scalable directory structure so we'll talk about that in a second but here's a basic idea of what we've got on disk in the fast file system so we have an inode array which is just a bunch of blocks on disks that hold inodes each of these inodes is typically something like 128 bytes so you could hold a bunch of them in a single disk block okay and an inode we said describes the file and so that inode has metadata which are like who's the owner what are the read write permissions of this file etc it has a set of direct pointers what's a direct pointer a direct pointer is in the inode structure and gives a direct block number on the disk and so a direct pointer directly points to a block an indirect pointer points to a block that has pointers to blocks a doubly indirect pointer points to a block that points to a block that points to blocks and triple and so on The reason we use this structure the way we do is this I node for small files is extremely efficient because in the I node itself we point to all the blocks that are in use directly and so there's only pull the I node into memory and then directly access the blocks for small files So it's extremely efficient for the type of files that we've decided we have a lot of which are small ones How do we handle the big ones? 
Well, we handle big files with these indirect doubly indirect and triply indirect Pointers which let us get a very large number of total data blocks for large files Okay, and the the other thing to think about for a moment here is the efficiency of Going through this data structure to get all of these blocks at the end is less worrisome for really large files because we're pulling in so many files and so many of these indirect blocks Get pulled into memory once and then they're traversed multiple times And so when we're using really large files this particular data structure holds them well and doesn't suffer from those kind of Inefficiencies you might worry about because of all these intermediate blocks So the file attributes up here in the metadata structure user group Which nine basic access control bits Set UID bit which executes that owner's permission. So if you put set UID In and I know that means that if any user tries to access That file the first thing it'll happen is that the operating system will change to the owner's permission In order before it starts executing and so that's how for instance things that require root permission Can be used by users because they have a set UID bit question was Here was why why indirect blocks again? 
So the answer is that for a small amount of inode space the indirect doubly indirect and triply indirect blocks Let's us describe a lot of blocks Okay, but we don't want to leave all of that space in an inode for direct blocks because That would mean that our inodes are huge, but we only use a Small number of the direct pointers most of the time since most of the blocks are small So the the reason we have this structure this asymmetric tree structure is really Since most of our files are small they can all fit all of their blocks directly in the inode for ones that are a little bigger They only have to worry about a single indirect block and for the really large ones We spread the pointers to the data blocks over more disc blocks. I hope that helped So the question is so are they essentially different parts of the file, so I'm wondering Which what you mean by that? Let me try this again So if you are interested suppose that there are 10 of these direct blocks if you try to read something that's at block 9 Okay, you'll start scanning you'll get to block 9 because it's in the direct pointer set still and access the block directly If you want blocks 10 you have to basically load the direct indirect block first and then block 10 is going to be here 11, 12, 13, 14 Then once we get past all of these blocks there might be a thousand of them then then we go to the doubly indirect we load this guy this guy and that's our next block So all of the data blocks in the file Is are along this axis over here and figuring out how to get to block 593 is knowing what this structure is Okay. Now, let me Let me back up here. There was one more question of what's the limit here. 
So the typical limit with the Unix inode structure is that it doesn't go beyond triply indirect, and that's just because of the size of the inodes, which are set to something like 128 bytes; that's an optimization that's been chosen. The inode array itself, as I mentioned earlier, is actually stored on the disk, but when you access a certain file, you pull its inode into memory and then you start accessing through it. If the inode has no more pointers to give because the file is too large, then you get a write error; you are not allowed to write past that point. So this represents a hard limit on the maximum size of a file, and that's why these data blocks, which in the original file system were actually 512 bytes, have become 4 kilobytes over time, or even larger; in Linux you can have 16 KB data blocks, and that's to allow much larger files on really big disks. Okay, now, let's see. Why would a block be separated from the other ones here is another question. So the reason that each of these blocks is separate is basically because of the way the disk is structured. The disk is divided into a series of blocks, really sectors, but we put a bunch of them together into blocks, and so we need to keep track of which blocks are part of which files. All right. Now, hopefully that helps. Now we can start to talk a little bit about this; we're going to go forward in a moment. But you can imagine that these blocks could be spread all over the disk, and then this file would be extremely, painfully slow to access. So part of what we're going to have to talk about is how we make sure these data blocks are mostly sequential on the disk itself, and that's going to be a design point for us. So, metadata: as I mentioned, the most interesting things you probably haven't thought about as much are the setuid and setgid bits, which are basically bits that you can set when you're the owner of a file, that allow things to
be, for instance, accessed as if you were, say, root. And so for things that actually have to manipulate what only the operating system or root can do, you can produce an executable file that can do that, and actually access those things, by setting these setuid and setgid bits to give those permissions, so that anybody can run those files. And as you can imagine, this is an attack point. One way to attack a system is to figure out how to set the setuid bit on something, and then suddenly what should be a user-mode executable has root permissions, and now it can do all sorts of damage because it can act as root. So, the data pointers: I mentioned the direct pointers. Linux actually has 12 of these, so 4 KB blocks give you 48 kilobytes of directly accessible data, and that's basically handling the fact that we have lots of small files. Okay, and the large files we handle with the indirect pointers. With 4 kilobyte blocks you get 1,024 pointers at each level, so you get about 4 gigabytes from the doubly indirect level, about 4 terabytes from the triply indirect level, and so on. And so one differentiation between different file systems is how many direct blocks there are, what the maximum number of indirect levels is, and so on. So: are the data blocks for each file in contiguous locations on disk? No. They don't have to be; these could actually be spread all over the disk. If they were in contiguous locations on the disk, then it would be very efficient to read them quickly. So you can see that there's a huge benefit to arranging these well on disk, but that's the trick. Okay, that's the trick.
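The capacity arithmetic is worth checking. Here is a small sketch (assuming the ext2-style parameters just mentioned: 4 KB blocks, 4-byte pointers, 12 direct pointers) that adds up the reachable blocks at each level:

```c
/* Capacity math for the assumed layout: 4 KB blocks, 4-byte block
 * pointers (so 1024 pointers per indirect block), 12 direct
 * pointers.  Direct gives 48 KB, singly indirect adds ~4 MB,
 * doubly indirect adds ~4 GB, triply indirect adds ~4 TB. */
#define BLOCK   4096UL
#define NPTRS   (BLOCK / 4)    /* 1024 pointers per indirect block */
#define NDIRECT 12UL

unsigned long max_file_bytes(void) {
    unsigned long blocks = NDIRECT            /* direct:   48 KB */
        + NPTRS                               /* single:  + 4 MB */
        + NPTRS * NPTRS                       /* double:  + 4 GB */
        + NPTRS * NPTRS * NPTRS;              /* triple:  + 4 TB */
    return blocks * BLOCK;                    /* ~4.004 TB total */
}
```

The triply indirect level dominates: 1024 cubed blocks of 4 KB is exactly 2 to the 42nd bytes, or 4 TB.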
You have to make sure you do that. The original version of the file system, from 4.1 BSD, didn't make any attempt to keep things sequential, and as a result, when you ran the file system for long enough, it started to get really fragmented and things got very slow. So BSD version 4.2, in 1984, was the same as BSD 4.1, so it had the same file header and triple indirect blocks, etc., except it incorporated ideas from the Cray operating system to get us to allocating things sequentially; that's the key, right? And so it used bitmap allocation in place of a free list: you just line all the blocks up in a linear order, and now you've got bits to say which ones are free and which aren't. And it really worked hard to allocate files contiguously. But if you think about the interface that you're now very familiar with after all term: you open a file and you start writing it, and the operating system really has no idea how big that file is going to be. You could stop after you write 2 KB, or you could stop after you write a terabyte, and there isn't any obvious difference in that interface. Okay, and so what happens in 4.2 BSD is that it spreads everything out and allocates things in chunks in a way that tries to optimize for as much sequential access as possible. And one of the tricks there is reserving 10% of the disk space to always be free, so that you can probabilistically find long runs of blocks to put into files as you're constructing them. There's also something called skip-sector positioning, which made a lot of sense in the old days; I'll show you in a moment what that was. It's not used as much anymore. So the problem here is really that when you create a file, you don't know how big it will become, so how much contiguous space do you allocate for it?
Well, what you do is you find a range of free blocks and you put each new file at the front of a different range, and so you really spread the beginnings of the files out as much as you can, in order to give each file successive blocks that can be allocated together. The flip side of this, though, is that you also store files from the same directory near each other, so that when you do an ls -l (I'm sure you guys do that a lot), you can see all the metadata pretty quickly, because the disk doesn't have to seek a lot to get that metadata. So the fast file system was really this BSD 4.2 file system, and I put a paper up on the readings for you guys on the resources page, which is the fast file system paper; we actually study that sometimes in 262. So let me show you one optimization that was pretty important back in the day when this first came out, which addressed a rotational problem. The idea here is that you read a block, you do some processing, and you go back to read the next block, but by the time you get to reading the second block, the disk has turned past it. So it's actually bad in this scenario to have blocks absolutely right next to each other on a track, because these older systems would read a block, do some processing, come back to read the next block, and miss it. And so you'd end up with one full rotation for every read of a block, which sounds awful.
And so the trick was that they allocated their data with skip sectors. The idea here is that the pink and the blue represent different files, and you'd allocate blocks with sector skips between them, where you compute the amount of skipping based on this processing delay. Okay, that's very dependent on what processor you had, what your interrupt time was, and so on, and that was part of what the BSD 4.2 file system did. Today, of course, you don't have to worry about that. If you were to go to Seagate's website and look at the specs on a disk drive, for instance, what you'd find is that it's got DRAM on the disk. What is that DRAM used for? Well, among other things, it's used for track buffering. And so the way we solve this today is you load a whole track at a time and put it in RAM, and now successive accesses to that track buffer run at full speed and don't require you to worry about rotational delay. These track buffers are hugely important to performance, and if you look, there's a lot of memory on modern disks: megabytes and megabytes of it. So, as an aside, modern disks and controllers do a whole bunch of things under the covers. There are the track buffers I just mentioned; they do the elevator algorithms we talked about last time; and they filter out bad blocks by giving you a linear logical block ordering. So the operating system is pretty much insulated from a lot of the things it used to have to worry about in terms of optimizing for disk access; that stuff is hidden inside the controller on modern disks. And a funny side effect of this is that a lot of operating systems have residual scheduling code in them that tries to optimize for these things the controller is already doing. It's almost like having an appendix, right?
You know, this is like the operating system's appendix with respect to disks: it's busily, non-functionally trying to do elevator algorithms and buffering on single tracks, and at best it's going to get in the way of the controller. And so modern file systems have started stripping some of this stuff out. Okay, and as you can imagine, when you get to flash, a flash file system absolutely filters that kind of stuff out, because it doesn't have to worry about rotation. Okay, let's take a brief pause. We have a good question here in the chat. The question is: is the size of the inode array the limit on the number of files you can have on your disk? It is a limit, yes. By allocating too few inodes, you could actually limit how many files you were allowed to have, and that's actually a parameter at formatting time: how many inodes you have in the system. So in principle you could run out of inodes, and it has happened in the past, to me, but usually the parameters that you get with formatting are pretty good, and you almost never have to worry about that. Okay, I think we can continue. So, where are the inodes stored? Well, in early Unix and the Windows FAT file system, they were basically in a special array on the outermost cylinders. So all the inodes were, say, on the outer part of the disk, and this is kind of silly, because the inodes were not stored anywhere near the data blocks themselves. Just to read a small file, you'd have to seek to get the header and then seek back to the data, and so just the existence of inodes way out on the edge kind of destroyed locality by definition. And they were a fixed size, set when the disk is formatted, and at formatting time a fixed number of inodes are created.
Each is given a unique number, and that number is used to index into this array. Later versions of Unix, including the fast file system, whose paper I said is up there on the website, moved the header information to be closer to the data blocks. In fact, there are many inode arrays spread throughout the disk. The way we do that, both with the BSD fast file system and Linux ext2 and ext3, is that there are things called cylinder groups, and the cylinder groups have the inodes in them, and as a result the inodes are much closer to the data itself. The pro of doing this is that by putting the file headers on many cylinders, you get a performance advantage: for small directories, you could probably fit all the data, file headers, etc. in the same cylinder, with no seeks. And the other important thing about this is that the reliability of the original layout of inodes was horrendous, because if you got a head crash, which literally means that the head on the disk hit the media and started digging holes into it, and that head crash was on the outer part of the disk, you would destroy all the inodes in the system, and as a result essentially destroy all the data, because you no longer knew what was linked to what. So by spreading the inodes throughout the disk, you've made it much more likely that even if part of the disk is damaged, wherever the disk head touched down, you can still find a bunch of files and their associated data close to each other in the other file groups. So that's a good thing. It was part of the fast file system; it's basically part of the optimization to avoid seeks, and it's got its reliability benefits as well. So here, for instance, are the block groups; they're just a series of concentric rings.
You've got free space bitmaps and inodes in a block group, so each block group has its own free space and its own inodes. Okay, and data blocks, metadata, and free space are interleaved within the group, so you're avoiding huge seeks. Basically, it's quite possible that once you got to the directory you're interested in, you could move back and forth only within a block group, and not have to seek far away while you were doing things within a given directory. Okay. And so now a directory and all its files are in a common block group, and things like ls -l, which is what I do all the time on Unix systems (basically, "tell me all the files in this directory and their metadata"), are much more efficient, because they just have to stay within a block group. Okay, now, the other thing is that these block groups use something called first-free allocation. When you expand a file, you first try to find a place in the bitmap that has a bunch of consecutive free blocks. Eventually, when the file gets big enough, you might go to a separate block group to look for long runs of free blocks, because in that case you know that this file keeps growing.
Maybe I need to find a big run of free blocks, because it looks like, with high probability, it's going to get even bigger. But the important part about this allocation is really that you have to keep 10% or more of the disk free. This is just a probabilistic statement: if you have at least 10% of your blocks free, then it's far more likely that there will be larger runs of free blocks. And experimentally, they found that if they kept 10% of the blocks in reserve, they were far more likely to do good allocation. So maybe that's surprising the first time you hear it, but in fact when you're running a Linux file system and it tells you that you're out of space, it's probably the case that you've used up 90% of the blocks and there's still 10% left. If you're the superuser, root, then you often can go ahead and over-allocate, but that's probably not something you want to do, because then your file system will start performing very poorly. So this is the idea: if we have a small file, we might fill in a small chain of blocks, and we can also find bigger runs to help us allocate for larger files. Okay. So the pros of the fast file system were efficient storage for small and large files, locality for small and large files, locality for metadata and data, and no need for defragmentation. The cons: it's very inefficient for tiny files, so for example a one-byte file still needs an inode and a data block. And it's an inefficient encoding when files are mostly contiguous on disk: if a file is mostly contiguous, what you'd like to do is just point at the start of it and then have a series of sectors afterwards that are all part of the same file.
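The run-finding part of this allocation is easy to sketch. Here is a minimal, hypothetical version (not the actual BSD code): scan a free-block bitmap for the first run of the requested length. A real file system would also confine the search to a block group and keep the ~10% reserve described above.

```c
/* First-free, run-based bitmap allocation sketch.  bitmap[i] == 1
 * means block i is free.  Returns the index of the first run of
 * `want` consecutive free blocks, or -1 if no such run exists. */
int find_free_run(const unsigned char *bitmap, int nblocks, int want) {
    int run = 0;
    for (int i = 0; i < nblocks; i++) {
        if (bitmap[i]) {                /* block i is free */
            if (++run == want)
                return i - want + 1;    /* start of the run */
        } else {
            run = 0;                    /* run broken by a used block */
        }
    }
    return -1;                          /* no run long enough */
}
```

Notice how the 10% reserve matters here: the fuller the bitmap, the more often this scan fails for large `want` and the allocator has to fall back to scattered single blocks.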
You know, why bother having a bunch of pointers? And of course we need to reserve this 10 to 20% free space, so that may be a con. Okay, so the ext2 and ext3 disk layout has similarities to exactly what we just told you about. The disk is divided into block groups, providing locality; each group has two block-sized bitmaps, one for free blocks and one for free inodes. Block sizes are settable, so at the time you create the file system you can choose 1 KB, 2 KB, 4 KB, or 8 KB blocks. The inode structure is very similar to 4.2 BSD, with 12 direct pointers where the original BSD had 10. The only difference between ext2 and ext3 is the addition of a journal (you can see the journal contents up here), which helps make sure that when your machine crashes, the file system is more likely to avoid corruption. We'll talk about how journaling works a little later, next time. So, let's talk a little bit more about directories. Basically, directories are stored in files and can be read, but typically you don't read the directories directly. You could in principle open a directory and read through it, if you have access to that directory.
It's system calls that basically manipulate these. Open or create traverses the structure to create new entries; mkdir and rmdir add and remove directories; and link and unlink basically link an existing file to a directory. So for instance, for this file down here, /usr/lib4.3/foo, I could have a link in /usr/lib as well, a hard link if I wanted, so I could have an entry in that directory pointing at the file directly. That's a hard link. Okay, and this hard link is now part of the directory system, and so if I were to remove this file from the original directory, the file would not get deallocated, because I've still got a hard pointer to it. The other option is a soft link, where rather than an actual directory entry, I just have a special file called foo in /usr/lib, and foo is marked as a soft link, or symlink, and has the full path of the target in it. That's a little different from a hard link. With a soft link, I could accidentally delete the underlying file, and the soft link would still think it was pointing at something that no longer exists, whereas hard links are explicitly reference counted: the file itself is only deallocated when the last hard link pointing at it is removed. Okay, all right. So there's a bunch of libc support, opendir, readdir, etc., for directories, and that's basically operating-system-level support for manipulating directories. Yes, they're only files, but rarely do you ever write a directory yourself; in fact, that would be a bad idea, and it's disallowed unless you're root. So a hard link, as I said, sets a directory entry to contain the file number of the file, so it creates a separate name or path for the file.
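The hard link versus soft link behavior just described can be demonstrated with a few syscalls. This is a hedged sketch using made-up paths in /tmp: link() creates a second name for the same inode, symlink() creates a path-valued special file, and unlinking the original name leaves the data reachable through the hard link while the symlink is left dangling.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Demonstrates hard vs soft link semantics with throwaway files.
 * Returns 1 if, after unlinking the original name, the hard link
 * still reaches the data while the symlink dangles. */
int demo_links(void) {
    /* clean up any leftovers from a previous run */
    unlink("/tmp/orig"); unlink("/tmp/hard"); unlink("/tmp/soft");

    int fd = open("/tmp/orig", O_CREAT | O_WRONLY, 0644);
    if (fd < 0) return -1;
    (void)write(fd, "hi", 2);
    close(fd);

    link("/tmp/orig", "/tmp/hard");     /* second name, same inode */
    symlink("/tmp/orig", "/tmp/soft");  /* special file holding a path */

    unlink("/tmp/orig");                /* link count drops 2 -> 1 */

    struct stat st;
    int ok = (stat("/tmp/hard", &st) == 0);        /* data survives */
    int dangling = (stat("/tmp/soft", &st) != 0);  /* symlink dangles */

    unlink("/tmp/hard");
    unlink("/tmp/soft");
    return ok && dangling;
}
```

stat() follows symlinks, which is why it fails on /tmp/soft once the target is gone; lstat() would still succeed, because the symlink file itself still exists.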
You can't make loops with hard links. Soft links, also called symbolic links, or shortcuts in some operating systems, are basically a special type of file that just contains the path and name of another file, so it's really mapping one name into another, and it's completely independent of the actual files. With soft links you can easily create loops if you want, and they're not reference counted, and so on. But you'll see a lot of soft links; that's ln -l, excuse me, ln -s, is how you make a soft link. Many of you have probably done that; you can do man ln to see how that works. So one downside of a lot of the original Unix file systems was that directories were linear, and so to find a file in a huge directory you actually had to linearly search through the whole directory; there was no index structure on it. That's still the case for a lot of Unix operating systems. In some of the variants, FreeBSD, NetBSD, OpenBSD, there's what should have been obvious all along: the ability to put an index on top of a directory, so that when you're looking for some file, there's an efficient search to find its i-number. This is not available by default on all operating systems, even those, but it's the obvious thing, and if you're ever wondering why a directory becomes extremely slow when you put a lot of files in it, it's because this index structure is not always supported. So, NTFS, the New Technology File System, which, you know, "new" is old now, dating from the late '90s and early 2000s, is the default on Microsoft Windows systems for everything except things like USB keys. It uses variable-length extents rather than fixed blocks.
So the interesting thing there is that in the file systems we were just talking about, every block is 4K or 8K or whatever, and "variable length" in NTFS means that the extents are really just runs of blocks: you point at the start, you know how long it is, and now you've got a run of blocks. And so NTFS doesn't have to have a pointer for every block. It does this nice optimization that if you have a really large file and there's a run of free blocks, you just have to point at the beginning of the run, so it's much more efficient in terms of total metadata per file. The interesting thing about it is that pretty much everything is a sequence of attribute-value pairs, which are metadata and data tied together, and I'll show you that in a second. You can mix direct and indirect addressing very freely. Okay, and directories have a B-tree structure, so they're efficient. So basically, the way this works, and you can look it up, I've got a link here to learn more: NTFS has something called the master file table, and rather than there being a root file system, the master file table is a database with very flexible one-kilobyte entries that map metadata and data together as an entry. Okay, and then variable-size attribute records come out of that.
So if you have really small files, the directory entry, the inode structure, and the data can all be together in one of these entries in the master file table, so this is extremely efficient for small files. When you want to get to big files, what you do is you have one of these entries, and then it points to these big attributes that point at runs on the disk. Okay, and so this master file table has a whole bunch of small-file records, which basically have everything in them, the name, the metadata, and the data, all in one one-kilobyte chunk. But when you have something very large, then that entry in the master table points out at a bunch of other extents, and each of those extents can point at further extents, and so you get to very large files only if necessary. Okay, and ext4, which is a follow-on to ext3 for Linux, has a similar kind of structure. And when you create a file, not surprisingly, you can provide a hint as to how big it is. So if you're thinking this is going to be a really big file, you give a hint, and therefore the file system can allocate a large chunk of data, or extent, and it can do that at open/create time. As a result, you're not having to guess on the fly how big this thing is going to be, or figure it out after the fact like the BSD file system basically does. And NTFS has a journaling mechanism, which we'll talk about next time for reliability, or we'll discuss it later. So here's a small-file master record. There's some standard information, like create time, modify time, access time, owner ID, etc., then the file name itself, then the data. Okay, and that's just one record.
Okay, so this is extremely efficient, and notice that this data we talk about is an attribute list, so it can have a list of name:data attributes, and so you can have multiple streams of data. This file system supports a single name having multiple streams. It also allows the things in this data to be pointers to other master file table records, or to extent pairs, and so you can basically build a tree out of this that describes large files. Okay. So here's an example of, for instance, a medium file, where the record, instead of the actual data, has pointers, each a start and a length on disk for an extent, and it has several of them. Okay, and that's a way to get bigger files. Really, the downside of this is that having variable-size extents rather than fixed-size blocks means that now we have fragmentation problems. So yes, NTFS has a potential fragmentation problem, and it tries to do a bunch of dynamic stuff, just like the fast file system did, to prevent that fragmentation and keep long runs. But potentially, if you've been reading, writing, and deleting a lot on an NTFS file system, you can get into a fragmentation situation where you can never allocate these large extents anymore, because there aren't any large runs, and then this degenerates into a lot of really small runs, and it gets really inefficient pretty quickly. Then you can run a defragmentation-style tool on it if you like. Okay, here's a huge, badly fragmented file, and it's funny: you basically have the first master record, which then points at a bunch of other master records in the master file table, which then maybe point at extents as well. I like to think of these as Franken-files; they're pretty big and messy, but the design is extremely flexible. And notice the key thing that we started with: it's extraordinarily efficient for small files, and it can handle really large files, assuming that you haven't gotten
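The extent idea is simple enough to sketch. This is a minimal, hypothetical version of extent-based addressing (the struct and field names are made up, not NTFS's actual on-disk format): each extent records a starting disk block and a run length, so mapping a file-relative block to a disk block is a short walk down the extent list instead of one pointer per block.

```c
/* Extent-based addressing sketch: a file is a list of (start, len)
 * runs of disk blocks.  Field names are illustrative only. */
struct extent { unsigned long start; unsigned long len; };

/* Map logical block `b` of the file to a disk block number, or
 * return (unsigned long)-1 if `b` is past the end of the file. */
unsigned long extent_lookup(const struct extent *ex, int n, unsigned long b) {
    for (int i = 0; i < n; i++) {
        if (b < ex[i].len)
            return ex[i].start + b;   /* falls inside this run */
        b -= ex[i].len;               /* skip past this run */
    }
    return (unsigned long)-1;         /* past end of extent list */
}
```

A well-laid-out file needs only a handful of extents no matter how big it is; the fragmented "Franken-file" case is exactly when this list grows long and the lookup degenerates toward one entry per small run.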
a significant amount of fragmentation. Okay, now let's talk a little bit about interfaces to files. You're very familiar with this idea that you open a file, you read it, you write it, and you close it, and that's pretty much what I call traditional I/O: explicit transfers between buffers in the process address space and in the kernel. You execute a read, you pass a buffer into the read, the read goes into the kernel, it finds your data and brings it back, and this involves multiple copies through caches in memory and so on, plus system calls. You might ask a different question, which is: what if you could, quote, map the file directly into memory, and then just do memory reads and writes? That would seem like it could be a lot more efficient; it would avoid all sorts of copies between buffers and would avoid system call overhead. So, could we do that? Yes, we can. It's called mmap. And by the way, executable files are treated this way when we execute a process. What actually happens when you exec a binary off disk is that the OS maps that disk file into memory and then demand-pages it in as you start running the executable. So really, you've already used this memory mapping, but didn't know you were doing it, when you used exec. So just to remember for a moment: here we have an instruction that's accessing memory through the memory management unit. If the page is in the page table, in a state that allows this access, then we get the page in memory and we access it. Otherwise, if the page table entry is invalid, or we try to do a write and the page is read-only, then we get a page fault. That page fault causes an exception, and the page fault handler goes and pulls the page off the disk and fixes up the page table; sometime later the scheduler returns and re-executes that instruction, and the second time it works, having pulled in the block. So if you look at this, the swap drive or swap space that
we just talked about is on the disk, just like a regular file. So what if we did this with files? What we're going to do is memory map the files, using mmap, into parts of our address space, and now here are page table entries mapping a file. That mapping is, in effect, pointing at a file on the disk, and if I try to access something in that space and get a page fault because there was no page table entry, I do the same thing I did before: the exception handler pulls a block of the file into memory, fixes up the page table, and the next time through I can directly access the file via this memory. So if I'm doing a load, I would be reading from the file just by doing a load. Okay. So what I've done here is memory map the file, and now reads and writes give me access to that file directly; if I write to this memory, the writes get pushed out to the file. Okay. So this is called mmap; you guys should all do a man on this to see what it looks like. mmap allocates memory, or maps files or devices into memory. It's got a lot of arguments. The ones of interest for the moment are the file descriptor, which is the file you're mapping, and the address where you want to put it in your virtual address space. You can leave the address at zero, and then the mmap system call will pick a free spot in your virtual address space, and what gets returned from mmap is the newly allocated pointer, which represents the file.
Okay, and so it may map a specific region or let the system find one for you; that's the difference between saying "try to map me at this address," where it will either succeed or fail, and passing a zero, where it will find one for you. And it's used both for manipulating files and for sharing between processes. So for instance, this file descriptor could be a file that's used to talk between two different processes, and that would be one way to set up mutual communication. Okay, let me show you a simple example here, where we give it the name of a file, that's mfile. We first print out what the different addresses are for data, heap, and stack; we open the file; and once we've got the file open and we've checked our errors, then we're going to mmap it. So notice, by putting a zero here, I'm saying find me some spot in my address space, wherever you can. We want to be able to read and write the file, we're mapping a file, and we're mapping it shared, which basically means other processes may be mapping it too, and it helps to make sure that things get pushed out to disk from memory. And then here's the file descriptor of the open file. We map it, we print where it landed in our virtual space, and then we're just going to put some data in it, right here with the string copy. What do I mean by that? That's taking the string "Let's write over it" and writing it at the address that was returned from mmap, plus 20 bytes. And so by copying into memory, we're actually writing the file. So this is mmap. Okay, and so, for instance, in this example, if we cat the test file, it says "This is line one," "This is line two," line three, line four. But then we run the mmap program on test, and it tells us some addresses, and notice mmap returned something close to our data segment, which makes sense.
It's in the data region, which is free for mapping. And after we're done running the mmap program, what should happen? Well, if we cat test again, look at this: "This is line one, the Let's write over its line three, This is line four." If you count characters: one, two, three, up through sixteen, there's a newline at seventeen, then eighteen, nineteen, twenty; starting from zero, position twenty is where the 'L' is, and so "Let's write over it" landed there, just because I copied into memory and it went onto the disk as a result. Okay, so that's mmap. Now, the point at which write-back happens is something you have to be careful about. There are msync operations you can do to make sure the data goes back to disk; also, when you close the mapping, it will go back; and there are a few other ways of making sure it gets flushed to disk. You can look at the man page for those. The other thing I wanted to point out is that when we ran mmap here, we gave it a file descriptor for a file. We can instead give it a file descriptor from the shared memory system call, which allocates some memory for us and returns a file descriptor to that memory, which we can then map this way, and now we have memory we can share between two different processes without backing it by disk. Okay, so this doesn't have to involve a disk at all. So here's an example of doing that, where both processes are going to map the same file name into their memories.
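Before moving to the two-process case, the single-file demo just walked through might look roughly like this sketch. This is not the exact lecture code; the path and offset are parameters, and the msync call makes the write-back explicit rather than relying on close.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map `path` shared and writable, overwrite bytes at `off` with
 * `msg`, and push the change back to the file.  Caller must ensure
 * off + strlen(msg) fits within the file. */
int overwrite_via_mmap(const char *path, size_t off, const char *msg) {
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }

    /* addr = 0: let the kernel pick a free spot in our address space */
    char *p = mmap(0, st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    memcpy(p + off, msg, strlen(msg)); /* a plain store edits the file */
    msync(p, st.st_size, MS_SYNC);     /* force write-back to disk */

    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```

The key point is the memcpy: there is no write() system call for the data itself; ordinary stores into the mapped region become file contents.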
So this is one process's memory space, and this is the other's. Notice that they're not at the same part of the virtual address space, because in this case we put a zero in mmap and said "find me one." We could, through a separate communication mechanism, agree on the address we want to map at, and then have both of them run mmap with that specific address if we wanted to exchange pointer-based data structures or whatever. And now these files are mapped into memory, and as I read and write to that part of memory, I'm actually communicating. Okay, so this is a way to get interprocess communication working, but it's backed by a file. If we want it to be backed by memory instead, we can do what I said earlier, which is to allocate something to put in here using the shared-memory allocation functions, or we can map MAP_ANONYMOUS rather than a file, and then it will find us some spare memory that's not backed by disk. All right, so let's say a little bit about caching before we end for the day. As you can imagine, or as you realize, disks are very slow, and so what we want to do for the file system is build a cache. I already said everything in operating systems is a cache, and you can quote Kubi on that if you like, but you exploit locality by caching data in memory. Many different things can be cached, even within the file system. One is name translations, which are mappings from paths, you know, /my/book/whatever, to inodes; those are cached as name translations. And disk blocks, mappings from block address to disk content, are also cached, in something called the buffer cache. This is memory used to cache kernel resources, including disk blocks and name translations. It can have dirty blocks, which are blocks that are not yet on disk.
So if you remember, we would call this a write-back cache. The replacement policy is typically LRU, and you can afford the overhead of timestamps or linked lists or whatever you want for LRU, because you only pay it at the granularity of pulling things on and off the disk, not on every load and store. The advantage here is that it works: this kind of caching works extremely well for name translations, because you don't often change the names, and it works well in general for the disk cache too, as long as memory is big enough, or you're not scanning across your whole disk. There are disadvantages, of course. One of them is that it fails when some application scans all the way through the whole disk: you can easily cast everything out of memory, which is a problem. So for instance, if you do "find . -exec grep foo" at the root of the file system, that's basically grepping for the word foo through every file in the file system, and you could do this as a way to completely blow out your file cache. You could imagine replacement policies other than LRU. Some systems actually allow applications to request policies, and a good example of one you might want is "use once": if you ask for a use-once policy because you're going to walk across a lot of things exactly once, then you don't blow out your cache for the other things that are in it. Then there's the cache size: how much memory should the OS allocate to the buffer cache versus virtual memory? You can imagine there's a tug of war here, because you want to have enough pages for your virtual memory so that you're not thrashing, but you want enough buffer cache so that you're getting good behavior out of your file system.
With too much memory in the file system cache, you can't run many applications at once; with too little, many applications run slowly because they're accessing the disk all the time, so there's a tradeoff point to find. The solution these days is to adjust the boundary dynamically, to find the right split between the buffer cache and virtual memory. When I first started building kernels, way back when, you actually had a compile-time constant to say how much you put in each category, and that was kind of unfortunate, because how did you guess? One of the things the caching gives us for free is read-ahead prefetching: when you read a block, the system often reads the next couple of blocks of the file, because it's highly likely they're going to be needed. That's exploiting the fact that the most common file access pattern is sequential. The elevator algorithm can efficiently interleave groups of prefetches from concurrent applications, and whether the elevator algorithm runs in the operating system or in the disk controller doesn't matter. What's true is that if I have enough requests, even ones that are prefetching from different parts of the disk, then whoever is running the elevator algorithm, operating system or disk controller, can rearrange them to do a good job of access. How much do you prefetch? Well, if you prefetch too much, you're going to blow out your buffer cache and pollute it (that's the term), so what you basically do is prefetch a small number of blocks ahead.
That's kind of what happens. Okay, so we'll pick this up next time. Actually, I'm going to finish one more thing if that's okay with you guys. Delayed writes: our writes to files are not immediately sent out to disk. The reason we do that is that if we think about a compile, which generates a whole bunch of temporary files, then if we don't push things out to disk right away, it's quite possible that we can create files, write to them, use them, and delete them, and they never have to go to disk. Okay, so that's one big advantage of delayed writes. Another advantage is that if you have files that are overwritten multiple times, then by waiting before you push out to disk, you're basically getting some efficiency that way. All right. So delayed writes buy a lot of performance, and we're going to have to talk more about the consequences, which are basically that if you crash at the wrong time, you can actually lose data. The original Unix only flushed data every 30 seconds, so that's a problem: you got the good performance out of the delayed writes, but you had that vulnerable window where you might lose the last 30 seconds of your data. So, the advantages of delayed writes: you can efficiently order a lot of requests, so if you have a bunch of dirty blocks in your buffer cache, you can choose how to send them out, or send them out in groups, and the elevator algorithm can rearrange them to do a good job. You can also run disk allocation knowing the correct size: if you have a bunch of writes sitting in your buffer cache, you can say, "look, there are actually 30 blocks there that we haven't pushed to disk."
"Let's find a run of 30 free blocks." Okay. And you can also maybe avoid writing temporary files entirely. The disadvantage is system crashes, because you could lose a bunch of data, and what we're going to do about that, next time or the time after, is talk about how journaling can help save us. Okay, so in conclusion: the file system transforms blocks into files and directories. You optimize for size, access, and usage patterns; we're trying to maximize sequential access while keeping random access as a possibility. We protect the OS with a protection and security regime based on metadata. A file is defined by a header called an inode, and that inode looks different depending on what the file system is. Naming translates from the user-visible names to the system resources, which are the inodes, for instance. Directories map from names to other files or directories; a directory can be a linked or a tree structure. And we talked about multi-level indexing schemes: inodes contain file info, direct pointers to blocks, indirect pointers, etc., which optimizes for many small files while still allowing a few really big files. NTFS gives a variation on that which allows you to have extents. All right, so I think we're done. We talked about 4.2 BSD with a bunch of different optimizations, we talked about layout driven by free-space management, and we talked about how memory mapping is really using mappings between memory and the disk, and how we can start using those on our own to do some interesting types of communication. All right, good luck everybody. We're going to call it a day. As you've seen, we're mailing out info about how the exams are going to happen; we're still on track to do something on Thursday. I'm going to wish you all really good luck. We have a lot of communication over Piazza and with the TAs and so on, so best of luck. I hope you have a great evening, and we'll talk to you soon. Good night. Thank you.