Students, welcome to a session on mechanisms for building distributed file systems and the design issues of a distributed file system. This is Dr. Anita Pooja, professor in the computer science and engineering department at Vulture Institute of Technology, Solaapur. At the end of this session, students will be familiar with the issues in the design and implementation of a distributed file system. There are many such issues, but in this session we will cover the first one, that is, file naming approaches.

So, we will start with naming and name resolution. Every file or directory in a file system is associated with a name, and that name is expected to be unique system wide. Name resolution is the process of mapping a name to an object. As you know, in distributed systems every important resource is replicated on multiple nodes, so that it remains available even in the case of failures. Files are also replicated on multiple nodes. So, in the case of replicated files, name resolution is the process of mapping a user-defined name to multiple objects, and these objects are the replicas of the file stored on different nodes. A namespace is a collection of names which may or may not share an identical resolution mechanism.

Now, let us see the traditional approaches to naming files, that is, the conventions traditionally used to create file names. The first approach is to concatenate the host name to the name of the file stored on that host. A host here is nothing but a computer, so you are using the physical location of the file as part of its name. What are the advantages of this approach? It is very easy to do. It assures that a file name is unique system wide, because every computer has a unique host name (you can think of it as its IP address).
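The first approach can be sketched in a few lines of Python. This is only an illustration: the `host:path` syntax, the host names, and the helper names `make_name` and `resolve` are assumptions made for the example, since the lecture does not fix any concrete syntax.

```python
from typing import NamedTuple


class HostQualifiedName(NamedTuple):
    host: str  # machine that stores the file
    path: str  # file name local to that machine


def make_name(host: str, local_path: str) -> str:
    """Concatenate the host name and the local file name."""
    # Assumption: ':' separates host and path in this sketch.
    return f"{host}:{local_path}"


def resolve(name: str) -> HostQualifiedName:
    """Split a name back into host and path; no other host is consulted."""
    host, _, path = name.partition(":")
    return HostQualifiedName(host, path)


# The same local file name on two hosts gives two different global names.
name_a = make_name("hostA", "/docs/report.txt")
name_b = make_name("hostB", "/docs/report.txt")
```

Because the host is part of the name, `resolve(name_a)` can tell where the file lives just by looking at the name itself.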
So, obviously, a name created by concatenating the host name and the file name is also unique system wide. Name resolution is simple too, because files can be located without the help of the other hosts in the system: every file name carries its host name, so we do not need any other host to locate the file.

What are the disadvantages of this approach? It fails to achieve the goal of network transparency. I hope you remember that the very first goal of a distributed file system is network transparency, which means the user should not come to know the physical location of the files. That cannot be achieved here, because we are concatenating the physical location of the file with the name of the file. Also, migrating a file from one host to another changes the name of the file, and so changes are required in the applications that access this file. Say we are concatenating the IP address of the machine on which the file resides; if the file is migrated from that machine to any other machine, its name changes, and the applications that access it need to be changed as well. So this approach is location dependent. One of the characteristic features of a distributed system is that all its resources should have location-independent names, because with location-dependent names we cannot achieve network transparency.

Now the second approach: mount remote directories onto local directories. Mounting a remote directory requires that the host of the directory be known, that is, the machine on which this directory is hosted. Once a remote directory is mounted, its files can be referenced in a location-transparent manner.
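The mounting approach can be pictured with a small mount table. This is a minimal sketch under assumptions: the table layout, the paths, the host names, and the function `resolve_mounted` are hypothetical and are not taken from any particular file system.

```python
# Mount table: local mount point -> (host, remote directory).
# The host must be known at mount time, as the lecture points out.
mount_table = {"/mnt/projects": ("hostA", "/export/projects")}


def resolve_mounted(local_path, table):
    """Map a local path to (host, remote path) using the mount table."""
    # Try longer mount points first so nested mounts would win.
    for mount_point in sorted(table, key=len, reverse=True):
        if local_path == mount_point or local_path.startswith(mount_point + "/"):
            host, remote_dir = table[mount_point]
            return host, remote_dir + local_path[len(mount_point):]
    # Not under any mount point: treat it as an ordinary local file.
    return "localhost", local_path


before = resolve_mounted("/mnt/projects/a.txt", mount_table)

# The directory moves to another host: only the mount table entry changes.
mount_table["/mnt/projects"] = ("hostB", "/export/projects")
after = resolve_mounted("/mnt/projects/a.txt", mount_table)
```

In this sketch the application keeps using `/mnt/projects/a.txt` before and after the move; only the mount table entry had to be updated.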
An advantage of the mounting approach is that we can resolve a file name without the help of any other host, without the help of any other computer.

The third approach maintains a single global directory, where all the files in the system belong to a single namespace. It is as if we combine the namespaces of all the file servers and view them as one logical space. The disadvantage is that it is mostly limited to one computing facility or a few cooperating computing facilities, because this approach requires system-wide unique file names to be generated. Thus this scheme is impractical for distributed systems that involve heterogeneous environments and wide-area networks. It is not possible in an environment where we are using different types of computers, wide-area networks and so on, because a naming convention that is suitable for one computing facility may not be suitable for another.

Now, we can also resolve a given name using the concept of a context. A context identifies the part of the namespace in which to resolve a given name. A context can be anything: it can partition the complete namespace based on a geographical boundary, an organizational boundary, a specific host, a file system type, etc. For example, I can say that all doc files belong to one category, one context. So when I want to resolve the name of a file of type doc, I will use its context to resolve it. In a context-based scheme, a file name is composed of a context name followed by the file name that is local to that context. What did we do in the first approach? Host name concatenated with the file name. Here we use the context name followed by the file name local to that context. So, to resolve a name, the name server should interpret the name with respect to the given context.
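Context-based resolution can be sketched as a two-step lookup: first pick the context, then look up the local name inside it. The context names, the `context/local_name` syntax, the stored locations, and the function `resolve_in_context` are all assumptions made for this illustration.

```python
# Each context maps names local to it onto the object they denote.
# Here an object is a hypothetical (host, path) pair.
contexts = {
    "payroll": {"report.doc": ("hostA", "/data/payroll/report.doc")},
    "research": {"report.doc": ("hostB", "/data/research/report.doc")},
}


def resolve_in_context(name, contexts):
    """Interpret 'context/local_name' with respect to the named context."""
    # Assumption: the first '/' separates the context name from the local name.
    context_name, _, local_name = name.partition("/")
    # A KeyError here means the context or the local name is unknown.
    return contexts[context_name][local_name]


payroll_report = resolve_in_context("payroll/report.doc", contexts)
research_report = resolve_in_context("research/report.doc", contexts)
```

The same local name `report.doc` resolves to different objects depending on the context it is interpreted in, which is the point of partitioning the namespace.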
If all files share the same initial context, then system-wide unique global names result. As an example, consider the x-kernel logical file system, which makes use of contexts. In this file system, the user defines his own file space hierarchy. The internal nodes in this hierarchy correspond to contexts, and the leaf nodes correspond to the actual file names local to those contexts.

There is one more file naming scheme which uses the concept of context, that is, the Tilde file naming scheme. The namespace is partitioned, based on the projects the files are associated with, into a set of logically independent trees called tilde trees. Each process running in the system has a set of tilde trees associated with it, which constitute the process's tilde environment. For example, if I am running two projects in my distributed environment, I will have two tilde trees, one for each project, and the file names in a tree belong to that particular project only. So this is a very good type of context to use for resolving a given file name. When a process tries to open a file, the file name is interpreted first with respect to its context, that is, the process's tilde environment, and then with reference to the file name within that context.

Now, pause the video, look at the question that has been given, think about it, and try to answer it. The statement is: having a combination of host name and file name for naming files guarantees system-wide unique file names but has the disadvantage of location dependency. This was the very first approach that we studied, that is, concatenating the host name with the file name. It will obviously generate unique file names, because every host has a unique host name or IP address, but there is the disadvantage of location dependency: if a file moves from one system to another, its name changes completely. So the answer here is true.

These are some of the references used for preparing this video. Thank you.