 to the 16th lecture in the course design and engineering of computer systems. So, in this week we are going to study a little bit more about the IO sub system in operating systems. So, let us get started. So, this is a recap of what we have seen in the course so far. So, we have studied the basics of what are computer systems, what are some of the common principles of designing computer systems. If you remember from the first week, we had studied principles like abstraction, virtualization, fixed size allocation, caching and you would have seen these principles being used over and over again in the past few weeks. We have started this course with an overview of the hardware, CPU, memory and IO devices and then we moved on to the system software layer, which is we have seen how the OS runs processes and the similar concept of virtualizing the CPU for a process was used to run virtual machines and then we have studied a little bit about how memory is accessed and managed in an operating system and this week we are going to complete our discussion of operating systems by understanding how the OS manages IO devices. In particular, we are going to focus on two types of IO devices mainly that is the hard disk or secondary storage which stores your files and the network card or the network interface which lets you communicate with other machines. So, these are the two main important subsystems in an operating system when pertaining to dealing with IO devices. So, the part that deals with files is called the file system and the part of the OS that manages the network is called the network stack and in this week we are going to study these two in a little bit more detail. So, let us get started. So, this is a brief recap of what are IO devices and how they work that we have seen in the first week of the course. 
So, an IO device is any device that is other than the CPU and your main memory that is there on your computer like a hard disk that stores data in blocks or other devices like network cards, keyboard which basically generate a stream of bytes or even your monitor which is an output device and so on. All of these are IO devices and you all are familiar with these. So, every IO device is usually managed by a device controller which is like a specialized microcontroller that focuses on managing the IO device. So, there is the IO device and there is the device controller which exposes or which has a bunch of registers and the CPU or the device driver in the operating system will access these registers, read and write these registers of the device controller in order to get some IO done using the IO device. So, we have seen that some common registers are the command register which is used to give a command, the device driver will write a command into the command register, the status register indicates what is the status of the IO device, the data register is used to read and write data from the IO device and so on. And device drivers communicate with the device through these registers. So, we have seen a simple example of how you read the data from a disk. So, the device driver that is running inside the operating system will give a command to the disk asking it to read say some block of data on the disk and once the command is given of course the operating system will you know whichever process gave the command it will block, it will be context switched out and the OS will run some other process simply because the disk takes a really long time in order to finish this operation as compared to the operating systems or the CPU speed. 
So, therefore, while the disk is executing this command the OS will go run some other process and at this point of time the disk is executing the command and when the data is ready to read whatever data has been asked to read when that is ready the device will transfer it into RAM it will perform a DMA a direct memory access and transfer the data into main memory and then it will raise an interrupt the trap instruction will run then the OS the device driver will come handle the interrupt look at the data process it further you know give it to the process unblock it all of that will be done. So, we have seen all of this before and the main reason why we use DMA is that it reduces the overhead instead of the device driver copying data from the you know some register of the device controller the device is directly putting the data in RAM which makes life easy for the interrupt handling part and this is especially important for high speed devices like disks and network cards and so on. So, in the next few lectures whatever we study they are all based on DMA. So, this is a recap of what we have seen before in the first week of the course with respect to IO devices. Now, let us begin our discussion of file systems. So, we have seen the notion of a file before a file is nothing but stream of bytes that are persistently stored on a secondary storage device like a hard disk you all know what a file is you read and write files access files all the time on your computers and traditional hard disks split a file into some fixed size blocks and store it on disk it is not stored continuously but into it is split into fixed size blocks and stored and the operating system what does the operating system have to do with files the OS does many things. 
First, of course, the OS exposes an API of system calls to user programs, and then the OS actually implements these system calls. Your system calls to open a file, read a file, write a file, all of those are implemented by the part of the operating system that is called the file system. And as you implement these system calls, you have to access the underlying disk, and this access to the actual device is done via the device driver. In addition to these pieces, we have also seen that the OS maintains a disk buffer cache of the recently accessed disk blocks: whatever you have read from disk via the device driver, you cache for some more time. So, all of these things, the system call implementations, the device driver, the disk buffer cache, are together used by the OS to provide the user the abstraction of files, and all of this code together is called the file system. And every file system has a specific way of organizing data on disk, how to split the file into different blocks and so on. There is not just one way of doing it, which is why there are many different file systems in use in modern operating systems. So, in this lecture we are not going to go into details about any particular file system, but we are going to study the general principles that hold across file systems. We will begin with understanding the data structures that are used in a simple file system, and in the next lecture we are going to study the implementation of the system calls, that is, how you actually do open, read and write using these data structures. So, let us begin. The first and the most important data structure in modern file systems is what is called the inode or the index node.
So, what is the inode? A variable sized file is split into fixed size blocks and stored on the disk. That is, if you have a large file, you will split it into blocks, and then on your disk you will have one block here, one block here, one block here. We do not store the file contiguously but split it into fixed size blocks, and this is a principle we have seen before with respect to pages in main memory. And why do we do this? Because we want to avoid external fragmentation. You do not want variable sized things being stored on disk with gaps left between two variable sized chunks; we do not want to deal with that headache. So, we split the disk into fixed size blocks and we allocate storage to a file in terms of these fixed size blocks. Once a file is split into fixed size blocks and stored noncontiguously, not in one place but all over the disk, you need somewhere to keep track of where the file is located on disk, just like how the memory image of a process is split across many pages in memory and you need a page table to keep track of it. Similarly, for every file there is an index node or inode, which remembers the location of all the blocks where the file is stored: the first few bytes of the file are stored in this block number, the next few bytes in this block number, the next few bytes in this block number. So, all of these block numbers where the file's data is stored are kept track of in the inode or the index node of the file.
So, where is this index node stored? The inode is also stored on disk. Just like how your file data is split into blocks and stored on disk, the inode is also stored on disk somewhere, and it has the block numbers of all the blocks of a file. These block numbers need not all fit in one disk block; they can be spread over multiple disk blocks in a hierarchical structure that we will see in a little bit. So, conceptually it is similar to a page table, where you do fixed size, noncontiguous allocation and keep track of all that information in an index structure. And the inode stores all metadata about a file, not just the location of the data blocks; the disk block numbers of the file are one type of metadata. So, understand the difference between the data of a file, which is the actual contents of the file, and the metadata of a file, which is information about the file. This includes the disk block numbers and various other things like the file size, the permissions (who can read, who can write), and when the file was last accessed or modified. All of this constitutes the metadata of a file, and the inode stores all of that information, and the inode is stored on the disk along with the data blocks of the file. So, let us understand the structure of the inode in a little bit more detail. Every inode in a file system has a unique number, the inode number. It is like the roll number of a student: it uniquely identifies the inode on disk, and therefore from the inode number you can uniquely identify a file. So, given a file, if you know its inode number you can locate the inode on disk, and once you locate the inode you know where all the data blocks of the file are, and you can get all the information about the file. So, the inode number is a very important piece of information about the file.
Now, inside the inode, how do you store all the block numbers of a file? A file can have many blocks; a typical block size on a traditional hard disk is, say, 512 bytes. So, if your file is larger than that, the file will be stored in multiple data blocks, and how you store the numbers of all of these data blocks in an inode is what we are going to understand now. Note that the numbers of all the data blocks of a file may not fit in the inode itself; you might have to use additional disk blocks. So, the way it is done is again the same idea of hierarchical storage that we have seen in page tables. The typical inode looks like this. The numbers of the first few blocks of a file are directly stored in the inode, that is, the inode has direct pointers to the first few blocks of the file. And those blocks are called direct blocks. Why are they called direct blocks? Because their block numbers are directly there in the inode. You go to the inode, you see that the first block of the file is, say, block number 42, and you can immediately access the first block of the file. So, this way the inode will have the first few direct blocks, and then of course you cannot store a large number of these, because at some point space in the inode will run out. Once all the direct blocks are full, once your file size exceeds these direct blocks, you create an indirect block, which in turn holds block numbers of the file, and the number of this indirect block is stored in the inode. So, there will be a single indirect block which has the numbers of data blocks of the file, and the number of the single indirect block is stored in the inode. So, without occupying much space in the inode, you are able to store many more block numbers using a single indirect block. Then what if this is also full? What if your file is bigger than this?
Then you will have a double indirect block. Your inode will have the block number of a double indirect block, this double indirect block will have the block numbers of several single indirect blocks, and these single indirect blocks will have the actual block numbers of the file. That is, you have a two level hierarchy. The block numbers of the file are stored in the single indirect blocks, the numbers of those single indirect blocks are stored in the double indirect block, and the number of the double indirect block is stored in the inode of the file. Similarly, you can keep doing this; you can have a triple indirect block and so on. Normally file systems stop around a triple indirect block, because you rarely need to store files larger than that. So, note that this is not a symmetric hierarchical structure like a page table. In a two level page table everything was two levels: to access any page you had to go through two levels. But here it is not like that. The first few block numbers of a file you can get directly from the inode. For the next few block numbers you have to do one hop. For the next few you have to do two hops, then three hops and so on. So, why is the structure like this? Because each hop is a disk access, which is very slow, and if most files are small, you want to avoid multiple disk accesses for them. So, small files are made to fit into the direct blocks as much as possible, and only for larger files does the overhead increase. That is the reason why the inode is an asymmetric hierarchical structure, unlike a page table. And as with any multi-level indexing structure, accessing a file now requires multiple disk accesses. So, every time you want to read some data block of a file, first you have to get the inode.
From there you may have to read multiple other blocks, the indirect blocks, before you finally know the block number that you want, and then you read that disk block. So, your overhead has increased due to this inode structure. And this inode structure also imposes a certain limit on your maximum file size. As you can see, there are only so many data block numbers of a file that you can store via the inode, and that limits your file size. For example, suppose your inode has k direct blocks, a single indirect block that can store n block numbers, a double indirect block that can store the numbers of n single indirect blocks, a triple indirect block and so on. Then what is the maximum file size that you can keep track of in your inode? You can remember the first k direct blocks. Then you can remember n more blocks, whose numbers are stored in the single indirect block. Then the double indirect block points to n single indirect blocks, each of which has n block numbers, so from the double indirect block you get n squared block numbers. Similarly, from the triple indirect block you get n cubed block numbers. So, k + n + n^2 + n^3 blocks is the maximum size of the file you can keep track of. Once your file size exceeds these many blocks, there is no more space available in your inode to store the block numbers of the extra data, and therefore you cannot keep track of the file anymore. Therefore, there is a limit on the file size.
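The counting argument above can be sketched in a few lines of Python. This is only an illustration under assumed numbers: the function names are hypothetical, and the values k = 12 and n = 128 (a 512 byte block holding 4 byte block numbers) are assumptions chosen for the example, not values given in the lecture.

```python
def max_file_blocks(k, n):
    """Blocks reachable from an inode with k direct pointers plus one
    single, one double and one triple indirect block, where every
    indirect block holds n block numbers."""
    return k + n + n**2 + n**3


def hops_to_block(index, k, n):
    """Number of indirect blocks that must be read (after the inode itself)
    before the block number of logical block `index` of the file is known."""
    if index < k:
        return 0            # direct block: number is in the inode itself
    index -= k
    if index < n:
        return 1            # via the single indirect block
    index -= n
    if index < n**2:
        return 2            # via the double indirect block
    index -= n**2
    if index < n**3:
        return 3            # via the triple indirect block
    raise ValueError("logical block index exceeds what the inode can track")


if __name__ == "__main__":
    k, n, block_size = 12, 128, 512   # assumed example values
    print(max_file_blocks(k, n) * block_size, "bytes is the largest file")
    print(hops_to_block(5, k, n), hops_to_block(200, k, n))
```

The second function makes the asymmetry visible: small files pay no extra disk reads, and only the blocks far into a large file need two or three hops.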
Similarly, there could be limits on how many files you can have in your system, depending on how many inodes you have, and on the size of the disk that you can manage. All of these depend on how your file system is structured, how these metadata data structures are laid out, and based on that there are certain limits on file size, disk size and so on in any file system. Different file systems have different limits because their data structures are different. So, next we come to directories. So, what is a directory? You all know what a directory is: it is a collection of files. But in Linux-like operating systems, the directory is also treated like a special kind of file. It is not a separate entity. It is also a file, just that it is a file which contains information about other files, like the names of all the files in that directory and their inode numbers. So, if you were to read the directory like a regular file, what you would find inside is a file name and an inode number, file name, inode number, file name, inode number, subdirectory name, inode number. This is what you will find inside a directory. Therefore, the directory is also a file. It will also have an inode and an inode number, the contents of the directory will also be split into blocks spread all over the disk, and the inode of the directory will keep track of all of these data blocks of the directory. So, it is treated like a file, and the file type in the inode indicates whether it is a regular file or a directory, a special kind of file. And how do you store these file name to inode number mappings in the directory's data blocks? You can store them in any format.
You can say I will have fixed size records: there is a fixed size record that has a file name and an inode number, then another fixed size record, then another. You can store the mappings as fixed size records, you can store them as a linked list, you can store them as a binary search tree sorted alphabetically by the file name. There are many different ways in which you can store this file name to inode number mapping in a directory. So, if you want to find a file in a directory, how do you do that? There is the inode of the directory itself, which has information about all the data blocks of the directory. So, you first fetch the inode of the directory, and inside that inode you have the data block numbers of the directory. You locate and read the data blocks of the directory, and inside those data blocks you find all of these records of a file name and its inode number, and then you search through this information for the particular file you want. Once you find your file, you know its inode number, and from the inode of the file you know the data blocks of the file itself, and then you can proceed to access the file. In this way the directory helps you locate the information about a particular file. And note that the search for the file name in a directory depends on how the directory is storing these records, whether as fixed size records, a linked list or a search tree. So depending on that, the logic for searching inside a directory will be different. But the basic concept is as follows: a directory is nothing but a special file that has mappings between file names and their inode numbers, so that you can locate the inode of a file when you need to.
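As a small illustration of the fixed size record option, here is a sketch of a directory stored as a run of bytes and searched linearly. The record layout (a 28 byte name followed by a 4 byte inode number) and the function names are assumptions made up for this example; real file systems each define their own directory entry format.

```python
import struct

# Assumed record layout: 28-byte name (NUL padded) + 4-byte inode number.
RECORD = struct.Struct("<28sI")


def pack_directory(entries):
    """Build the directory's data as consecutive fixed size records."""
    return b"".join(RECORD.pack(name.encode(), ino) for name, ino in entries)


def lookup(dir_data, name):
    """Linear search through the records; returns the inode number or None."""
    for offset in range(0, len(dir_data), RECORD.size):
        raw_name, ino = RECORD.unpack_from(dir_data, offset)
        if raw_name.rstrip(b"\0").decode() == name:
            return ino
    return None


if __name__ == "__main__":
    data = pack_directory([("a.txt", 7), ("notes", 12)])
    print(lookup(data, "notes"), lookup(data, "missing"))
```

With a linked list or a search tree the `lookup` logic would change, but the result is the same: a file name goes in and an inode number comes out.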
So, these are the data blocks of a directory, and every directory also has an inode which has pointers to these data blocks. And the directory's inode number will be there in its parent directory; the parent directory keeps track of the inode number of a subdirectory. So now, next, let us understand path names. You all know what path names are. In any file system there is some path name like /home/foo/a.txt, and in the directory tree of the file system, starting with the root directory, this path name helps you locate a particular file; a file is identified by its path name. Every time you want to open or access a file in any way, you need to know its path name so that you can find out more information about the file. So, if you know the path name, how do you find out the inode number of the file? It is as follows. You first start with the root directory. You know the inode number of the root directory, and from that you get all the data blocks of the root directory. In the data blocks of the root directory you have entries like the home subdirectory and some other subdirectories, along with their inode numbers. From the inode number of the home directory you get its data blocks, and inside the home directory there are many other directories, like the foo directory, so you get the inode number of the foo directory. In the foo directory you have many files like a.txt, and you find its inode number. From this inode number you get the inode of a.txt, in that inode you have the data block numbers of a.txt, and you can access the file. So this is called recursively traversing the path name of a file in order to get to the information in the file.
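The walk from the root down to a.txt can be sketched with a tiny simulated disk. Everything here is hypothetical and only for illustration: the inode numbers, the block numbers, the choice of 2 as the root inode number, and the use of Python dictionaries in place of real on-disk directory records.

```python
ROOT_INO = 2  # assumed inode number of the root directory

# Simulated inodes: each records its type and its data block numbers.
INODES = {
    2: {"kind": "dir", "blocks": [100]},         # /
    5: {"kind": "dir", "blocks": [101]},         # /home
    9: {"kind": "dir", "blocks": [102]},         # /home/foo
    14: {"kind": "file", "blocks": [200, 201]},  # /home/foo/a.txt
}

# Simulated data blocks: directory blocks map names to inode numbers,
# file blocks hold raw bytes.
BLOCKS = {
    100: {"home": 5},
    101: {"foo": 9},
    102: {"a.txt": 14},
    200: b"hello ",
    201: b"world",
}


def dir_lookup(dir_ino, name):
    """Search the data blocks of one directory for a name."""
    inode = INODES[dir_ino]
    if inode["kind"] != "dir":
        raise NotADirectoryError(name)
    for block_no in inode["blocks"]:
        entries = BLOCKS[block_no]
        if name in entries:
            return entries[name]
    raise FileNotFoundError(name)


def resolve(path):
    """Walk the path one component at a time, starting from the root."""
    ino = ROOT_INO
    for component in path.strip("/").split("/"):
        if component:
            ino = dir_lookup(ino, component)
    return ino


def read_file(path):
    """Resolve the path, then read the file's data blocks in order."""
    inode = INODES[resolve(path)]
    return b"".join(BLOCKS[b] for b in inode["blocks"])


if __name__ == "__main__":
    print(resolve("/home/foo/a.txt"), read_file("/home/foo/a.txt"))
```

Each loop iteration in `resolve` corresponds to one level of the path: read a directory's inode, read its data blocks, and pick out the inode number of the next component.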
So you start with the root and for every element you will read the directories data blocks, look up the next element in the path name, retrieve its inode number and again repeat again read the next level its inode, its data blocks and the next level you find out its inode, its data blocks you keep repeating this process until you reach the end of your path name at which point you have the inode number of the file that you want from that inode number you will get information about all the data blocks of a file. So this is how you traverse a path name to find out more information about the file like its inode number and its data blocks. So now any file system has a certain layout on disk there are all of these things you know inodes and directories and everything and all of these are laid out, organized on disk in a certain way. So every hard disk if you look at it it will have you know a large number of data blocks which store you know the data of files as well as directories you know even directories need data blocks to keep track of file name to inode number mapping. So you have a bunch of data blocks and then you have you know a bunch of inode blocks and then you need to have some information about which of these data blocks are free which are occupied which are not all of that information is also maintained you know in the form of say which like a free list you know which blocks are free or some kind of a bitmap. So what is a bitmap? You have one bit indicating 0 or 1 is the block on disk free or not you have that information that is called a bitmap. So you have all of these things stored on the hard disk and then there is one block the first block of the file system is called the super block which basically has information of this layout itself. 
The super block will say okay the first few blocks contain the bitmaps the next few blocks contain the inodes that next few blocks contain the data blocks all of this organization this information is stored in the super block. So anytime you want to understand how a file system is organized on disk you start with the super block read information about all the other organization in the super block and then you can access the rest of the file system and how is free space managed we have seen one way is the free list which is you remember you store the free block numbers as a list and the first free block number you remember in say your super block or somewhere and in this first free block you store the block number of the next free block in this free block you store the block number of the next free block and so on you can maintain like a free list like this or you can store a bitmap as we have discussed above you maintain one bit of information about each disk block that will tell you is that disk block free or is it in use by some file or some directory or something. So in this way you will manage the free space on a disk and this free space information these bitmaps are also stored on the hard disk and when you format a hard disk if you say you must have heard the term formatting a hard disk what does it mean it simply means laying out this information on the disk in a specific format and different file systems will have different layouts different you know data structures for all of these things and therefore the formatting also will be different. So next is the in memory data structures so all of this is on disk for a file system on disk you have data blocks you have inodes you have bitmaps you have a super block all of this is on the hard disk pertaining to a file system. 
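To make the bitmap option for free space concrete, here is a minimal sketch with one bit per disk block. The class name, the convention that a set bit means "in use", and the first-free allocation policy are all assumptions for this example; the lecture does not prescribe them.

```python
class BlockBitmap:
    """One bit per disk block: 1 means in use, 0 means free (assumed convention)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_used(self, block_no):
        return bool(self.bits[block_no // 8] & (1 << (block_no % 8)))

    def allocate(self):
        """Find the first free block, mark it in use, and return its number."""
        for block_no in range(self.num_blocks):
            if not self.is_used(block_no):
                self.bits[block_no // 8] |= 1 << (block_no % 8)
                return block_no
        raise OSError("no free blocks left on disk")

    def free(self, block_no):
        """Clear the bit so the block can be handed out again."""
        self.bits[block_no // 8] &= ~(1 << (block_no % 8)) & 0xFF


if __name__ == "__main__":
    bitmap = BlockBitmap(16)
    first, second = bitmap.allocate(), bitmap.allocate()
    bitmap.free(first)
    print(first, second, bitmap.allocate())
```

A free list would answer the same question, which block is free, by following block numbers stored inside the free blocks themselves instead of scanning bits.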
Now a file system also in memory when the OS is running when the system is running in memory also it has a few data structures to keep track of and what are they when a file is opened and when it is being used to read and write you will basically take the inode from disk and you will keep a copy of it in memory in a cache that is called the in memory inode why is this done because when you are reading writing a file you need to know which is the first block number second block number you need to know a lot of information about the file in every time going to the disk to access the inode is a little bit suboptimal therefore to speed things up you will bring the inode into memory for the duration that the file is opened and in use that is the in memory inode then you have a data structure called the open file table which is used to keep track of all the open files. So, this is a table it has one entry for every file that is opened and this entry contains you know say the pointer to the in memory inode you know the inode number or various other pieces of information the other piece of information is the offset. Remember that in a previous lecture we have discussed when you read a file you read it as a stream of bytes if you first read 64 bytes the next time you read you will get the next 64 bytes. So, you have to remember the offset in a stream of bytes until which point have you read. So, that offset is also kept track of in the open file table. So, this open file table has the pointer to the in memory inode things like offset and a few other things all of this information is stored in the open file table and in addition to this open file table you have what is called the file descriptor array. So, this open file table is a global data structure there is one for all processes in the system whereas this file descriptor array is a per process array of information about the files that are open for this particular process and this is part of the PCB. 
The PCB is where you keep track of all information about a process, and the file descriptor array is also part of the PCB. For every file that is open, a pointer to its open file table entry is stored in this file descriptor array, and the index in the file descriptor array is what is returned to you when you open a file. So, when you open a file, the inode is brought into memory, an open file table entry is created, and the pointer to this open file table entry is stored in the file descriptor array. Whichever index in the file descriptor array, 0, 1, 2 and so on, holds this pointer is returned to you as the file descriptor or file handle. And in the future, for any operation on the file, you provide this handle, and from this handle you follow the pointers and reach the inode of the file, so you have all information about the file readily available. So, in the next lecture we are going to see how all of these data structures are used when we open a file, read a file, write a file and so on. And one final in-memory data structure is the disk buffer cache. All the blocks that you have recently read from disk are cached in a least recently used cache called the disk buffer cache. So, these are all the important data structures. Of course, more complicated real life file systems may have more complex or different data structures than this, but this is the basic idea for a simple file system. And one subtlety I would like to point out about the open file table is that every time you open a file with the open system call, you will have a new entry in the open file table and a new entry in the file descriptor array.
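The relationship between the in-memory inode, the open file table and the per-process file descriptor array can be sketched as follows. This is a simplified simulation, not how any real kernel is written: the names are hypothetical, the inode holds the file's bytes directly instead of block numbers, and the file descriptor is simply the next index in the array.

```python
class OpenFileEntry:
    """One entry in the global open file table."""

    def __init__(self, inode):
        self.inode = inode   # pointer to the in-memory inode
        self.offset = 0      # how far into the byte stream we have read


class Process:
    """Stands in for the PCB; only the file descriptor array is modelled."""

    def __init__(self):
        self.fd_array = []


INODE_CACHE = {}       # in-memory inodes, keyed by inode number
OPEN_FILE_TABLE = []   # global: one entry per open() call


def sys_open(proc, ino, disk_inodes):
    # Bring the inode into memory once; later opens reuse the cached copy.
    inode = INODE_CACHE.setdefault(ino, disk_inodes[ino])
    entry = OpenFileEntry(inode)          # a new entry for every open
    OPEN_FILE_TABLE.append(entry)
    proc.fd_array.append(entry)
    return len(proc.fd_array) - 1         # the index is the file descriptor


def sys_read(proc, fd, nbytes):
    entry = proc.fd_array[fd]
    data = entry.inode["data"][entry.offset:entry.offset + nbytes]
    entry.offset += len(data)             # the next read continues from here
    return data


if __name__ == "__main__":
    disk = {14: {"data": b"abcdefgh"}}    # hypothetical inode 14
    p, q = Process(), Process()
    fd_p, fd_q = sys_open(p, 14, disk), sys_open(q, 14, disk)
    print(sys_read(p, fd_p, 3), sys_read(p, fd_p, 3), sys_read(q, fd_q, 3))
```

In this sketch two opens of the same file share one in-memory inode but get two separate open file table entries, so each one advances its own offset.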
So, suppose there is a file that is opened by two different processes. Then the inode of the file will be common, but if process P opens the file, it will create a separate open file table entry which points to this inode, and if process Q opens the file, it will also create a separate open file table entry that stores a pointer to the same inode. Similarly, process P will have its own file descriptor array that has a pointer to its open file table entry, and process Q will have its own file descriptor array that has a pointer to its open file table entry. Why do we have different open file table entries? This is because when different processes open a file, the offsets at which they are reading are independent. If process P reads 64 bytes, then process Q will start at the beginning; it will not continue where P left off. Therefore you need to keep track of different offsets, and therefore you will have different open file table entries. The exception is when a parent forks a child process. If process P has forked a child C, then the parent and the child share the same offsets of the open files, that is, they both point to the same open file table entry, and when one of them reads or writes, the change in offset is seen by the other. So, this is the only time where different processes use the same open file table entry. Otherwise, even if different processes open the same file, they will have separate open file table entries and they will read the file as independent streams. And in the case of the parent and child, of course, this sharing is a little messy, therefore it is usually better if one of them closes the file and the other continues to use it. So, that is all for this lecture. In this lecture we have introduced what happens in the file system of an operating system, and we have introduced data structures like inodes, open file tables and so on.
In the next lecture we are going to continue our discussion on how the file system calls are implemented using these data structures. And a small exercise for you is as I have said there are many different file systems in use in modern systems. So, take a look at your computer, your desktop or laptop find out more about what file system is it running, what is the name of the file system, what are some of its differentiating characteristics. You might want to just read up a little bit about the different types of file systems available today. Thank you all that is all I have for this lecture. Let us continue this topic in the next lecture. Thank you.