and introduction to distributed file systems. This is Dr. Anita Pooja, professor in the Computer Science and Engineering Department at Walchand Institute of Technology, Sonapur. By the end of this session, students will be familiar with the goals of a distributed file system and its architecture. Students are expected to have knowledge of file system management in the Unix OS.

Now, what is a distributed file system, that is, DFS? Before going to DFS, let us see what a file system is. A file system is a component of the operating system that performs organization, storage, retrieval, sharing, and protection of files. A distributed file system is likewise a component of an operating system; specifically, it is a resource management component of a distributed operating system. It provides storage and retrieval of files, but in a distributed environment: the users of the files and the storage devices holding the files are physically dispersed. It implements a common file system that can be shared by multiple autonomous computers, or we can say multiple clients.

Now, let us see the goals of DFS. DFS has two important goals: network transparency and high availability. Network transparency means users are not aware of the location of files at all. They are only concerned with issuing requests for files, and they always assume that the files they request are present on their local system; they are not aware of where the files actually come from. That is called network transparency. Second is high availability: system failures or regularly scheduled activities (such as maintenance) should not result in unavailability of files. Even in the case of partial failures, users should still get the files they request.
Now, let us see the architecture of DFS, which has three components: file servers, clients (each with a client cache and a local disk), and the communication network through which servers and clients communicate. File servers are dedicated to storing files and performing file access operations. Clients are used solely for computational purposes, that is, they access files stored on the servers and perform some processing on them, such as modifying a file. Client machines can be equipped with local disk storage that can be used for caching remote files, as a swap area, or as a storage area.

The two important services in a distributed file system are the name server, which performs the name resolution service, and the cache manager, which provides caching of data on the client side as well as the server side.

Now, let us go to the name server. It is a process that maps names specified by users to stored objects such as files and directories. This is known as name resolution, and it occurs when a process or client refers to a file or directory for the very first time.

Now, let us move to the cache manager. It is a process that implements file caching. In file caching, a copy of a file stored at the remote file server is brought to the client's machine when the client references it for the first time. Subsequent accesses to the same file by that client can then be served from the client's cache instead of fetching the file all the way from the file server, thus reducing the delay, that is, the network latency. File caching is done on the client side on the assumption that the client may need to access the same file again and again; so why send the request all the way to the remote file server every time? The first time the client accesses the file, it copies that file into its local cache, so that the next time it can fetch the file from its local cache instead of getting it from the file server.
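The name resolution service just described can be sketched in a few lines of Python. This is only an illustrative sketch, not a real DFS implementation; the table, the path, the server id, and the object id used here are all hypothetical names chosen for the example.

```python
# Minimal sketch of a DFS name server: it maps user-visible path names
# to stored objects (files and directories). All names are hypothetical.

class NameServer:
    def __init__(self):
        # path -> (server_id, object_id): where the object actually lives
        self.table = {}

    def register(self, path, server_id, object_id):
        self.table[path] = (server_id, object_id)

    def resolve(self, path):
        """Name resolution: done the first time a client refers to a file."""
        if path not in self.table:
            raise FileNotFoundError(path)
        return self.table[path]

ns = NameServer()
ns.register("/home/alice/report.txt", server_id="fs1", object_id=42)
addr = ns.resolve("/home/alice/report.txt")   # ('fs1', 42)
```

The client would typically cache the result of `resolve` so that later accesses to the same name skip this lookup, which is exactly where the cache manager comes in.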
Cache managers are present both at clients and at servers. At the server side, cache managers cache files in main memory to reduce disk latency. The file system always resides on the hard disk, but when the server fetches a file from the disk for the first time, it caches the file in its main memory, so that the next time it wants to access the file it can get it from main memory instead of from the disk, because every disk operation is associated with latency. If multiple clients cache the same file and try to modify it, the copies become inconsistent. To avoid this inconsistency, the cache managers at clients and servers should coordinate closely with each other during data storage and retrieval operations.

Now let me pause the video for a question; think about it and try to answer correctly. Cache managers are responsible for which of the following? Four options are given: caching data in the client cache; caching data on the server side in main memory; avoiding inconsistency by coordination during data storage and retrieval operations; and mapping user names to stored objects such as files or directories. Choose the appropriate one. Caching data in the client cache is done by cache managers, as we have just seen; caching data on the server side in main memory is also done by cache managers; and cache managers are also responsible for avoiding inconsistency by proper coordination between them during data storage and retrieval operations. But the fourth job is not done by cache managers: mapping user names to stored objects such as files or directories is done by the name server during name resolution. So the first three are applicable to cache managers, and option c is correct.

Now let's see how data access actions are performed in a distributed file system.
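The coordination between cache managers mentioned above can be sketched as a simple invalidation protocol: when one client writes a file, the server's cache manager invalidates the copies held by other clients so the caches do not drift apart. This is a hypothetical, much-simplified protocol for illustration only; real systems use more elaborate consistency schemes.

```python
# Sketch of cache-manager coordination via invalidation (hypothetical protocol).

class Client:
    def __init__(self, name):
        self.name = name
        self.cache = {}          # path -> locally cached data

    def invalidate(self, path):
        self.cache.pop(path, None)

class ServerCacheManager:
    def __init__(self):
        self.cached_by = {}      # path -> set of clients holding a cached copy

    def record_read(self, client, path):
        self.cached_by.setdefault(path, set()).add(client)

    def write(self, client, path):
        # Invalidate every other client's copy before the write proceeds,
        # so no client keeps serving stale data from its cache.
        stale = self.cached_by.get(path, set()) - {client}
        for other in stale:
            other.invalidate(path)
        self.cached_by[path] = {client}

a, b = Client("A"), Client("B")
scm = ServerCacheManager()
for c in (a, b):
    c.cache["/f"] = "v1"
    scm.record_read(c, "/f")
scm.write(a, "/f")   # B's cached copy of /f is now invalidated
```

After the write, client B must go back to the server on its next access, which is exactly the coordination during storage and retrieval operations that the lecture describes.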
So there are two parts here, client and server; the client part is shown on the left side. The client issues a request to access a file (the data here is nothing but a file). The client first checks in its cache whether the data is present; if it is, it is immediately returned to the client. If not, it checks whether the file is present on its local hard disk; if so, the file is first brought into the client cache and from there returned to the client. If it is not present there either, the request is sent to the file server through the communication network. The file server first checks its server cache; if the data is present, it immediately sends it to the client cache, and from the client cache it goes to the client. If the data is not present, the server checks its disk; if the file is present there, it is first copied to the server cache, from the server cache to the client cache, and finally to the client. This is how data access actions are carried out in DFS.

Now let's see some of the mechanisms, starting with mounting. We can bind together different file namespaces to form a single hierarchically structured logical namespace. A namespace is always bound to either an internal node or a leaf node of a namespace tree. In this namespace tree there are three servers: server x, server y, and server z. Server y is mounted at mount point a of server x, and server z is mounted at point i of server x. Server y contributes the combined namespace of d, e, and f, and server z contributes the combined namespace of j and k; similarly for server x itself. The kernel maintains a structure called the mount table, which maps mount points to the appropriate storage devices. In a distributed file system, the file systems maintained by file servers are mounted at the clients.
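The data-access path described above (client cache, then client disk, then server cache, then server disk) can be sketched as a chain of lookups where each hit fills the faster levels on the way back. The dictionaries here stand in for the real storage media, and the function name is a hypothetical choice for this sketch.

```python
# Sketch of the DFS data-access path: client cache -> client local disk
# -> server cache -> server disk. Plain dicts stand in for real storage.

def access(path, client_cache, client_disk, server_cache, server_disk):
    if path in client_cache:
        return client_cache[path]                  # fastest: client memory
    if path in client_disk:
        client_cache[path] = client_disk[path]     # promote to client cache
        return client_cache[path]
    # Request travels over the network to the file server.
    if path in server_cache:
        data = server_cache[path]
    elif path in server_disk:
        data = server_disk[path]
        server_cache[path] = data                  # cache at server: avoids disk latency next time
    else:
        raise FileNotFoundError(path)
    client_cache[path] = data                      # cache at client for future accesses
    return data

client_cache, client_disk, server_cache = {}, {}, {}
server_disk = {"/docs/notes.txt": b"hello"}
access("/docs/notes.txt", client_cache, client_disk, server_cache, server_disk)
# second access is now served from the client cache, no network round-trip
```

Note how a single miss all the way to the server disk leaves copies in both the server cache and the client cache, which is precisely why subsequent accesses are cheap.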
So there are two approaches to maintaining this mount information: we can maintain it at the client side, or at the server side.

First, maintaining mount information at the clients: each client has to individually mount every required file system. This approach is followed in Sun NFS. Since each client can mount a file system at any node in its namespace tree, clients need not see an identical namespace; they may each have a different view of the file namespace.

Second, maintaining mount information at the servers: in this case every client sees an identical file namespace, because the file systems are not mounted at individual clients but at the servers. If files are moved to different servers, the mount information needs to be updated only at the file servers, not at the clients; whereas in the first approach, if files move to different servers, every client needs to update its own mount table.

The next mechanism used in DFS is caching, which reduces delays in accessing data. In file caching, a copy of data stored at a remote file server is brought to the client when the client references it for the very first time. Data can be cached in main memory or on the local disk of the clients. On the server side, data is cached in main memory to reduce disk access operations and thus disk access latency.

Another mechanism is hints. Caching results in the cache consistency problem, which can be avoided only through a great degree of cooperation between file servers and clients, and that is very expensive. The alternative is to treat cached data as hints only, that is, as data that is not expected to be completely accurate. For example, after the name of a file or directory is mapped to a physical object, this address is stored in the cache as a hint.
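The hint idea can be sketched as follows: use the cached address when it still works, and when it turns out to be stale, delete it and fall back to the name server. This is an illustrative sketch only; `resolve` stands in for a real name-server lookup, and `address_valid` stands in for the check that the cached address still maps to the object.

```python
# Sketch of caching a name-resolution result as a *hint*: the cached
# address may be stale, so it is verified on use and discarded on failure.

class HintCache:
    def __init__(self, resolve):
        self.resolve = resolve     # authoritative (slow) name resolution
        self.hints = {}            # path -> cached address, possibly stale

    def lookup(self, path, address_valid):
        hint = self.hints.get(path)
        if hint is not None and address_valid(hint):
            return hint            # hint still accurate: no name-server round-trip
        self.hints.pop(path, None) # stale hint: delete it from the cache
        addr = self.resolve(path)  # fall back to the name server
        self.hints[path] = addr    # update the cache with the fresh address
        return addr
```

Because a stale hint only costs one failed access followed by a fresh lookup, correctness is preserved without the expensive server-client coordination that full cache consistency would require.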
The next time the same file or directory is to be accessed, this hint is taken from the cache and the file is accessed directly. But if the address fails to map to the object, because the object has moved from this location to some other location, then the cached address is no longer useful, so it is deleted from the cache. The file server then consults the name server to obtain the actual location of the file and updates the cache.

The third mechanism is bulk data transfer. In this mechanism, multiple consecutive data blocks are transferred from the server to the client in one operation. This reduces the file access overhead by obtaining multiple blocks within a single disk seek time. This mechanism is based on the observation that most of the time files are accessed in their entirety, not in parts.

The last mechanism is encryption, which is used for security in distributed systems. This method was developed by Needham and Schroeder, and it is used in DFS security. In this scheme, two entities that want to communicate with each other establish a key for the conversation with the help of an authentication server.

These are some of the references used in preparing this video. Thank you.