Dr. Nita Pooja, professor in the Computer Science and Engineering department at Walchand Institute of Technology, Solapur. At the end of this session, students will be familiar with the design and architecture of the Google File System.

The Google File System (GFS) is a scalable distributed file system for large, data-intensive applications. It is built from inexpensive commodity hardware. It is best suited to large files, ranging from 100 megabytes to many gigabytes. It supports large streaming reads and small random reads, and it supports large sequential file appends as well. It supports producer-consumer queues for many-way merging, and file atomicity: that is, it allows multiple clients to access and append to the same file concurrently. It transfers data in bulk in a single operation, so it sustains high bandwidth.

Now, what is the interface between GFS and the user? The interface is just like that of a normal file system, that is, hierarchical directories and path names. The usual operations are the same as those we perform with a normal file system, that is, create, delete, open, close, read, and write, as well as copy (snapshot) and record append, which allows multiple clients to append data to the same file concurrently.

Now, let's see the architecture. The architecture consists of a single GFS master server and multiple GFS chunk servers. On the left side, we can see the application, that is, the GFS client. The GFS client provides the name of the file it wants to access to the GFS master. The GFS master returns the chunk handle and the chunk locations to the GFS client, because it holds the metadata that maps each file to its chunk locations. The GFS client then uses the chunk handle, and specifies the byte range it wants to read or write, when it talks to the GFS chunk servers.
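The first thing a client does with a byte range is translate it into per-chunk requests, since each chunk is a fixed 64 MB. This is a minimal sketch of that translation; the function name and return shape are illustrative, not GFS's real (unpublished) client API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses a fixed 64 MB chunk size


def chunk_range(byte_offset: int, length: int) -> list[tuple[int, int, int]]:
    """Translate a (byte_offset, length) file access into per-chunk requests.

    Returns a list of (chunk_index, offset_within_chunk, byte_count).
    The client sends the file name and chunk index to the master to get
    the chunk handle and replica locations, then sends the within-chunk
    byte range directly to a chunk server.
    """
    requests = []
    pos, remaining = byte_offset, length
    while remaining > 0:
        index = pos // CHUNK_SIZE
        within = pos % CHUNK_SIZE
        n = min(remaining, CHUNK_SIZE - within)  # stop at the chunk boundary
        requests.append((index, within, n))
        pos += n
        remaining -= n
    return requests
```

For example, a 10 MB read starting at offset 60 MB crosses a chunk boundary and becomes two requests: 4 MB from the end of chunk 0 and 6 MB from the start of chunk 1.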
At the chunk servers, if it is a read operation, the chunk server immediately returns the chunk data to the GFS client. If it is a modify or write operation, it is carried out on all the chunk servers where replicas of the file are maintained, in a pipelined fashion.

Now, what is the chunk size in GFS? The chunk size is 64 MB, which is much larger than a typical file system block size. What are the advantages of using such a large chunk size? First, after a single chunk fetch, many read and write operations can be performed on that chunk. This reduces the interaction between client and master, because the client does not have to go back to the master for a new chunk every time; it can perform many operations on the chunk it already has. It reduces network overhead by keeping a persistent TCP connection to the chunk server. It also reduces the size of the metadata stored on the master: obviously, if the chunk size is larger, the number of chunks is smaller, and hence the metadata stored per file on the master is smaller. This metadata resides in main memory.

Now, the metadata is of three types. The first is the namespace, which consists of file and chunk identifiers. The mapping from files to chunks is the second type. The location of each chunk's replicas is the third type. Because the metadata is stored in main memory, scanning the entire metadata is easy and efficient.

Now let's see the chunk locations. Who keeps the record of where the chunks and their replicas reside? The master does not keep a persistent record of chunk locations at all. Instead, it polls the chunk servers at startup, and periodically afterwards at fixed intervals, using what are called heartbeat messages; each chunk server replies with the IDs of the chunks it holds.
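The heartbeat-driven location tracking described above can be sketched as follows. This is an illustrative model, not GFS's actual implementation: the class and method names are invented, and a real master would also handle versions and staleness.

```python
from collections import defaultdict


class MasterChunkDirectory:
    """Sketch of a master that learns chunk locations only from heartbeats.

    Nothing here is persisted: if the master restarts, the map is simply
    rebuilt from the next round of chunk server reports.
    """

    def __init__(self):
        # chunk_id -> set of chunk server addresses holding a replica
        self.locations = defaultdict(set)

    def on_heartbeat(self, server: str, chunk_ids: list[str]) -> None:
        """A heartbeat carries the full list of chunks the server holds."""
        # Forget this server's previous report, then record the fresh one,
        # so chunks the server lost (e.g. to a disk error) disappear.
        for holders in self.locations.values():
            holders.discard(server)
        for cid in chunk_ids:
            self.locations[cid].add(server)

    def replicas(self, chunk_id: str) -> set[str]:
        return set(self.locations.get(chunk_id, set()))
```

This design choice is why a failed chunk server needs no special cleanup: its entries age out as soon as it stops reporting.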
Because chunk servers sometimes fail, it is hard to keep a persistent record of the chunk locations anyway.

Now let's move on to some of the operations performed by the master, as well as some performed by the client. The very first is the operation log. The master maintains a historical record of the critical metadata changes. It maintains the namespace and mapping information, and for reliability and consistency it also maintains information about the replicas of every chunk: how many replicas there are and on which servers they are kept. All of this information forms the log record.

Some chunk server is primary for every chunk. That is, every chunk has a primary replica and secondary replicas. The master grants a lease, or permission, to the primary replica, typically for 60 seconds. Leases are renewed through the periodic heartbeat messages between the master and the chunk servers. Clients ask the master for the primary and secondary replicas of each chunk, and the client sends the data to the replicas in a daisy-chain, that is, pipelined, fashion: all modifications to a file are forwarded to all the replicas along a pipeline. This takes advantage of full-duplex Ethernet links.

Now let's see how write control and data flow work in GFS. There are seven steps in total between the client, the master, and the replicas. The thick lines in the figure show the data flow, and the thin lines show the write control flow. Step one: the client asks the master which chunk server holds the current lease on the chunk, and for the locations of the other replicas. Step two: the master replies with the identity of the primary and the locations of the secondary replicas of the chunk.
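The lease bookkeeping described above can be sketched like this. It is a minimal model under stated assumptions: the class and method names are hypothetical, timestamps are passed in explicitly for clarity, and the real master piggybacks renewals on heartbeat messages rather than exposing a `renew` call.

```python
LEASE_SECONDS = 60  # GFS grants chunk leases for 60 seconds by default


class ChunkLease:
    """Sketch of the master's per-chunk lease state (illustrative names)."""

    def __init__(self):
        self.primary = None      # chunk server currently holding the lease
        self.expires_at = 0.0

    def grant(self, server: str, now: float) -> str:
        """Grant the lease to `server` unless a live lease is outstanding.

        Returns the address of whichever server ends up primary.
        """
        if self.primary is not None and now < self.expires_at:
            return self.primary  # existing lease still valid
        self.primary = server
        self.expires_at = now + LEASE_SECONDS
        return server

    def renew(self, server: str, now: float) -> bool:
        """Extension request, carried on a heartbeat from the primary."""
        if server == self.primary and now < self.expires_at:
            self.expires_at = now + LEASE_SECONDS
            return True
        return False
```

The key property the lease gives GFS is that at most one replica can be primary at a time, so there is a single point that orders all mutations to a chunk.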
Step three: the client pushes the data to all the replicas. Step four: once all replicas have acknowledged receiving the data, the client sends a write request to the primary replica. Step five: the primary assigns consecutive serial numbers to all the mutations it receives, providing serialization, and applies the mutations in serial number order; the secondary replicas apply the mutations in the same serial number order. Step six: the secondary replicas reply to the primary, indicating that all mutations have been completed on their side. Step seven: the primary replies to the client with success or with an error message.

Now let's see the client operations, that is, how the client works with the servers. First, the client issues control requests to the master server; a control request is nothing but a request for metadata. Then it issues data requests directly to the chunk servers. It caches the metadata, but it does no caching of the data on its own side, because streaming reads (read once) and appending writes (write once) do not benefit from caching at the client. Since there is no client-side data caching in GFS, no cache consistency difficulties arise among the clients.

Now the master operations, starting with the log records. The master holds all the metadata; if it loses it, the whole file system is lost. So the master logs all metadata changes sequentially, and also replicates the log entries to remote backup machines, so that even if the master goes down, operation can be continued from the backups.

The next master operation is deciding where to put a chunk, that is, how to choose the location of a new chunk. It is desirable to place a chunk on a chunk server with below-average disk space utilization, that is, one that is lightly loaded. The master also limits the number of recent chunk creations on each chunk server, to avoid concentrating heavy write traffic on it.
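Steps five through seven of the write flow above can be sketched as follows. This is an illustrative model of the serial-number ordering only, with invented class names; it omits the data push, failure handling, and the actual RPCs.

```python
class SecondaryReplica:
    """Applies mutations in the serial number order chosen by the primary."""

    def __init__(self):
        self.applied = []  # (serial, mutation) pairs, in application order

    def apply(self, serial: int, mutation: str) -> None:
        self.applied.append((serial, mutation))


class PrimaryReplica:
    """Step 5: the lease holder assigns consecutive serial numbers,
    applies each mutation locally, and forwards it so every secondary
    applies the same mutations in the same order."""

    def __init__(self, secondaries: list[SecondaryReplica]):
        self.secondaries = secondaries
        self.next_serial = 0
        self.applied = []

    def write(self, mutation: str) -> str:
        serial = self.next_serial
        self.next_serial += 1
        self.applied.append((serial, mutation))
        for s in self.secondaries:      # forward in serial order
            s.apply(serial, mutation)   # step 6: secondaries complete and reply
        return "success"                # step 7: primary replies to the client
```

The point of the sketch is the invariant it maintains: because every replica applies mutations in the primary's serial number order, all replicas of a chunk converge to the same state even when multiple clients write concurrently.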
Okay, chunk replicas are maintained on multiple chunk servers, and the master spreads the replicas of a chunk across multiple racks.

The next master operations are re-replication and rebalancing. Re-replication occurs when the number of available replicas of a chunk falls below a user-specified limit, that is, when there is a shortage of chunk replicas. When can this occur? When a chunk server becomes unavailable because of some problem, that is, it crashes, or there is corrupted data in a chunk, or there is a disk error; and also when the replication limit is increased. Rebalancing is done periodically by the master: the master examines the current replica distribution and moves replicas to even out disk space usage. Its gradual nature avoids swamping any one chunk server.

Now, fault tolerance, which is nothing but recovery from faults. For fast recovery, the master and chunk servers are designed to save their state and restart in seconds when they recover from a failure. For chunk replication, each chunk is replicated on multiple servers on different racks, and the replication factor can be raised on user demand for reliability. The master is replicated as well: its metadata, that is, the operation log with its historical record of critical metadata changes, is also replicated on multiple machines.

Now pause the video, think, and write: re-replication occurs in which of the following cases? Four options are given: (A) when a chunk server becomes unavailable, (B) when there is corrupted data in a chunk, (C) when there is a disk error, and (D) all of these. We have just seen that re-replication occurs in all of the first three cases, A, B, and C. That means option D, all of these, is the correct answer.

These are some of the references used for preparing this session. Thank you.