Now, let's look at the content delivery and dissemination mechanisms. We'll see how the delivery of content from the CDN to the end user can be made more dynamic, scalable and efficient, and we'll consider some replication techniques in this module.

CDNs are global, so we can think of a very large-scale transcontinental network connecting various data centers. In such a complex setting we must expect connectivity failures, intermittent connectivity, or even hardware failures. If the original content hosted by the origin web server is not replicated at the caching or remote sites, then the CDN works well only while connectivity to the origin is intact. By replicating the content to the remote sites, users are able to access the content locally even when the origin is unreachable. Pushing the content towards the users is therefore a desired goal for a CDN. This not only helps the overall quality-of-service parameters that users experience as performance, but it also increases the availability of content in the wake of failures. This necessitates that algorithms and mechanisms be devised which deal with the replication of data and the dissemination of data through some tree-like structure.

We can think about the creation of replicas in a content delivery network at two levels. In the first level or tier, we have a small group of servers which very consistently update the content from the origin server. We can think of this as a Byzantine inner ring (a term borrowed from security and fault-tolerance terminology). The original content from the web server or streaming media provider is replicated onto this ring. Around it we have an outwardly growing large network which forms a second, soft-state-based tier. Soft state basically means a timer-based content management mechanism.
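The timer-based soft state just described can be sketched as a small TTL cache. This is a minimal illustration only; the class name and its API are my own assumptions, not something from the book:

```python
import time

class SoftStateCache:
    """Toy soft-state cache: entries expire unless refreshed (illustrative)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def put(self, key, value):
        # Storing (or re-storing) an entry resets its timer.
        self.store[key] = (value, time.time() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if time.time() > expiry:
            # Timer expired: the replica is silently dropped,
            # forcing a fresh fetch from the inner tier.
            del self.store[key]
            return None
        return value
```

The key property is that nothing ever has to be explicitly invalidated: an outer-tier replica simply ages out unless it is refreshed within its timer window.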
So what happens once we have a timer-based mechanism is that the content in the outer circle or outer tier is a replica which is only temporarily stored there. CDN caches, file system caches and proxy caches all fall into this category.

If that is the view we would like to develop for content distribution through replication and caching, what are the possible challenges we foresee? The first is that whatever the logical abstraction of inner and outer tiers may be, in the physical implementation we must think about some kind of tree-like structure where the replicas are sent out in a multicast manner, because a broadcast storm or per-replica unicast transmission for replication does not make much sense. This multicast tree behaves as the dissemination mechanism, and the tree is essentially limited by the total branching, or fan-out, that can be realized in this arrangement. The second-tier replication is aimed at providing content in close proximity to the users, so the overall quality of service is expected to increase. At the same time, network considerations and cost issues, such as resource consumption of the underlying infrastructure and communication overhead, should also be optimally realized.

The book I am referring to, Content Delivery Networks edited by Rajkumar Buyya, mentions an algorithm proposed by one of the contributing groups, called SCAN, the Scalable Content Access Network. Here the physical nodes are categorized as servers and clients, and the servers are placed within ISP data centers. These servers hold replicas of the data that was stored at the origin server. In order to maintain a consistent view, these servers run an interactive update mechanism that the authors call a distributed routing and location system.
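To make the branching limit concrete, here is a rough sketch of assigning replica sites to a dissemination tree in which no node pushes updates to more than a fixed number of children. The breadth-first placement policy and the helper itself are my own illustration, not SCAN's actual tree-building algorithm:

```python
from collections import deque

def build_dissemination_tree(root, sites, max_fanout):
    """Attach each site to a parent so no node exceeds max_fanout children
    (illustrative breadth-first placement, not the book's algorithm)."""
    children = {root: []}
    # Nodes that can still accept children, in FIFO (breadth-first) order.
    open_slots = deque([root])
    for site in sites:
        parent = open_slots[0]
        children[parent].append(site)
        children[site] = []
        open_slots.append(site)  # the new site can itself forward updates
        if len(children[parent]) == max_fanout:
            open_slots.popleft()  # this parent's branching limit is reached
    return children
```

With `max_fanout=2`, an origin and five sites form a tree of depth two instead of the origin unicasting to all five directly, which is exactly the load-spreading effect the multicast tree is meant to provide.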
So this helps locate nearby replicas for the clients without having to resort to global communication. The architecture of the SCAN system can be thought of at two levels: the data plane and the network or physical plane. At the data plane, which is the more logical or abstract plane, the origin data source is represented by a square with a thick black border, a replica by a square with a thin dotted border, and a cache by an ellipse. A thin arrow denotes an always-update relationship and a thick arrow denotes adaptive coherence; these correspond to rigid updates and need-driven updates respectively. Various such representations and relationships between the replicas and the caches are shown at this plane. On the network plane, these entities are represented by root servers, servers and clients, and their relationship is described through the dynamic location and discovery protocol we have just described, realized over a mesh that the authors call a Tapestry mesh.

With that in view, let's look at how internet content delivery systems vary in terms of performance and certain other properties. One classification is web-based caching, in which either the client or the server initiates the request. Then there is the variant with no coherence and no coordination between the servers, the caches and the replicas; this is called an uncooperative pull-based content delivery network. We also have push-based content delivery networks, which are cooperative. These different options are compared against a couple of properties. For instance, what is the cache/replica sharing mechanism for efficient replication? It is obvious that sometimes it is cooperative and sometimes it is not.
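To give a flavor of what locating a nearby replica without global communication means for a client, here is a toy sketch. The coordinate-based distance metric is purely a stand-in for whatever the real location service (the Tapestry mesh in SCAN) would measure; the function and its inputs are assumptions for illustration:

```python
def nearest_replica(client_coord, replicas):
    """Pick the replica site with the smallest distance to the client.
    Coordinates are hypothetical network coordinates, not real topology."""
    def distance(site):
        (x1, y1), (x2, y2) = client_coord, replicas[site]
        # Placeholder metric: Euclidean distance between coordinates.
        return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return min(replicas, key=distance)
```

The point is only that the decision is made from locally available information about candidate replicas, rather than by querying every server in the system.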
Then, how much support is there for a growing network, and for request redirection, that is, when a request is made, can that request be redirected to another replicated server or a cached server? Different options are available here. For instance, you can see that in the server-initiated scheme, a Bloom-filter-based mechanism is used to exchange the replica locations. Then there is the level of detail at which replication can be achieved, called granularity: on the basis of a URL or URI, or on the basis of an entire website. Further properties include whether there is a provision for load balancing, whether the replicas are coherent with each other, and whether there is a network monitoring mechanism for fault tolerance across these different options. The table in the reference summarizes the options we have, and we can take a blend of any of them. The reference book is again Content Delivery Networks, edited by Rajkumar Buyya, published by Springer Science and Business Media in the Lecture Notes in Electrical Engineering series.
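To illustrate why a Bloom filter suits exchanging replica locations, here is a minimal sketch. A server can advertise a compact, lossy summary of which URLs it holds instead of shipping the full list; the implementation and its parameters below are my own illustrative choices, not the scheme from the book:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a compact summary of set membership that
    may report false positives but never false negatives (illustrative)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the whole filter is just this bit vector

    def _positions(self, item):
        # Derive num_hashes independent bit positions from one hash family.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all(self.bits & (1 << pos) for pos in self._positions(item))
```

A peer receiving the bit vector can cheaply check whether a URL is probably replicated at that server; an occasional false positive only costs a redundant redirect, while the summary stays a fixed, small size regardless of how many URLs the server holds.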