Hello everyone, welcome to the session on the CAP theorem in big data analytics. In this session, we will study in detail the CAP theorem, which is used in big data analytics. As for the learning outcomes, at the end of this session you will be able to analyze the different aspects of the CAP theorem for choosing a database, and you will be able to formulate a basic use case of the CAP theorem: where to use it, how to use it, and when to use it. All of this we will be learning here.

What is CAP? CAP stands for three things: one is consistency, another is availability, and the third is partition tolerance. These are the three properties which are basically involved in the database.

The CAP theorem is also called Brewer's theorem. It states that in a distributed environment it is impossible to provide all three CAP guarantees, that is, consistency, availability and partition tolerance. Put another way, it is impossible for a distributed computer system to simultaneously provide all three CAP guarantees. Another formulation is that a distributed database system has to make a trade-off between consistency, availability and partition tolerance, because all three are not possible at the same time.

Let us see these three things one by one. The first one is consistency. What is consistency talking about? You can see in the diagram here that whenever anybody requests data, that data should be available.
But it should be in the same view for everyone: even if everyone is asking simultaneously, they should all get the same view of the data, irrespective of any update or deletion. No inconsistent data should be there in the network whenever a request comes. So it implies that every read fetches the last write; whatever the last write is, that should be given to every subsequent read. That is consistency.

Next comes availability. What is availability? Availability means that the server should be ready every time and the data should be given, so the data should be available 100% of the time. In other words, each client can always read and write; anytime, anybody can read and anybody can write. It implies that all reads and writes always succeed. It should not happen that a read has to fail just because a write was the last operation in progress. The system should be available all the time, for read operations as well as for write operations. Here you can see the scenario in this diagram: a particular request is made and the response comes back. Where the response comes from does not matter, but the response has to come. The system should not say that the particular data is not available.

Next comes partition tolerance. What is partition tolerance? Usually, whenever we are on a network, the network may be partitioned; the network may break somewhere in between. So your distributed environment may be cut into different parts, where a few systems are in one partition and a few are in the other. This is called a network partition. Even if a network partition occurs, the system still has to work. Such a system is called partition tolerant.
So when the system keeps working despite a network partition, that system is called a partition tolerant system. What it implies is that the system will continue to function across network partitions. The scenario is given in this diagram: here the partition has occurred, but whenever somebody makes a request, the response still has to come. That is partition tolerance.

So we have seen the three things; now let us see all three combined in one figure. Pause the video, observe the following figure, and think about why that cross is there in the center and what it is indicating. It is only showing whatever we have talked about till now, so think about it.

So what does that cross mean? The cross means that we cannot build a general data store that is continually available, sequentially consistent and tolerant to partitions; all three together are not possible. That is what the cross in this diagram is saying. The diagram also summarizes the three properties. Consistency: all the clients see the same view of the data, even right after an update or delete. Availability: all clients can find a replica of the data, even if a few nodes have failed. Partition tolerance: the system continues to work as expected, even in the presence of a network failure. At any one time, availability and partition tolerance can be combined, or consistency and partition tolerance can be combined, or consistency and availability can be provided by the system, but not all three. So the intersection of all three is not provided by any system; we are saying it is impossible.
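To make the point about the cross a little more concrete, here is a minimal sketch in Python. It is not taken from the lecture: the class names `Replica` and `TwoNodeStore`, the `mode` argument and the `partitioned` flag are all hypothetical, and the store is simplified to a single value on two nodes. It only illustrates that once a partition occurs, a store has to either refuse a write (keeping consistency) or accept it and let one replica go stale (keeping availability).

```python
class Replica:
    """One node holding a single value (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.value = None


class TwoNodeStore:
    """Toy two-replica store; mode is "CP" or "AP" (names are assumptions)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = {"a": Replica("a"), "b": Replica("b")}
        self.partitioned = False  # True means a and b cannot reach each other

    def write(self, node, value):
        if self.partitioned and self.mode == "CP":
            # Prefer consistency: refuse the write so replicas cannot diverge.
            raise RuntimeError("write refused during partition")
        self.nodes[node].value = value
        if not self.partitioned:
            # Healthy network: copy the write to every replica immediately.
            for replica in self.nodes.values():
                replica.value = value

    def read(self, node):
        # Reads are always served from the local replica.
        return self.nodes[node].value


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = TwoNodeStore(mode)
        store.write("a", "v1")
        store.partitioned = True
        try:
            store.write("a", "v2")
            outcome = "accepted"
        except RuntimeError:
            outcome = "refused"
        print(mode, outcome, store.read("a"), store.read("b"))
```

In the CP mode the second write is refused, so both nodes keep returning the same value. In the AP mode the write is accepted, and node b keeps returning the older value until the partition is resolved.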
The same CAP theorem is also shown in a triangular view. See what it is saying: there are three points on this triangle, one for consistency, one for partition tolerance and one for availability. Each edge is a combination: this edge is about consistency and availability, this edge is about availability and partition tolerance, and this edge is about consistency and partition tolerance. So two properties are available at a time, and you can see the examples given for the systems that support each pair. The diagram also compares the data models, such as relational and key-value.

Now, which databases support consistency and availability? The normal RDBMSs such as MySQL and PostgreSQL, and on the big data side, AsterData and Greenplum. Similarly, availability and partition tolerance are provided by Dynamo, Cassandra, SimpleDB, CouchDB, Riak and many more. For the third one, consistency and partition tolerance, the databases are BigTable, HyperTable, HBase, MongoDB and others; all of these support the two properties consistency and partition tolerance.

As we have seen that all three are not possible, let us now see one by one how two can be combined. Talking about CA, that is consistency and availability: it describes a single-site cluster, where all the nodes are always in contact. Such a system is not designed to keep working when a partition occurs, so it compromises partition tolerance, but it provides consistency and availability.
A few examples of applications are banking and finance applications, that is, systems with transactions that are connected to an RDBMS; these are provided with consistency and availability.

Next comes availability and partition tolerance. What it is saying is that the system is still available under partitioning, but some of the data returned may be inaccurate. Why? Because consistency cannot be maintained across the partition, even though the system is partition tolerant. So it is compromising consistency. When to choose AP is itself a question, but there is a use case for it. When you read, you get the most recent version of the data that is available to you, and usually that could be stale. The system will also accept writes that can be processed later, when the partition is resolved. Therefore it is not maintaining consistency. Availability is also a compelling option when the system needs to continue to function in spite of external errors. The types of applications are shopping carts, customer-facing systems and news publishing systems. The trade-off to remember is that insisting on consistency usually means compromising availability when a partition occurs.

Next comes consistency and partition tolerance. What CP is saying is that some data may not be accessible, but the rest is still consistent and accurate. So availability may not be 100%, but whatever data we read is consistent and accurate, and the system is handling partitions as well. Even if a partition is there, the system keeps working, but what it compromises is availability, because sometimes the data may not be available. Note that the examples listed here, Cassandra, Amazon's DynamoDB, Voldemort, Riak, SimpleDB etc., belong to the availability and partition tolerance edge of the triangle; the consistency and partition tolerance examples are the ones we saw earlier: BigTable, HyperTable, HBase and MongoDB.
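The AP behaviour described above, where each side keeps accepting writes during a partition and they are processed once the partition is resolved, can be sketched as follows. This is only an illustration under stated assumptions: the `Node` class, the `reconcile` function and the integer timestamps are hypothetical, and "the newest write wins" is just one possible reconciliation policy, not something the lecture prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One side of the partition; stores key -> (timestamp, value)."""

    data: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        # Always accept the write locally, even while partitioned.
        self.data[key] = (timestamp, value)

    def read(self, key):
        # Return the locally known value, which may be stale.
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def reconcile(left, right):
    """After the partition heals, keep the newest write for every key."""
    for key in set(left.data) | set(right.data):
        candidates = [n.data[key] for n in (left, right) if key in n.data]
        winner = max(candidates, key=lambda entry: entry[0])
        left.data[key] = winner
        right.data[key] = winner


if __name__ == "__main__":
    n1, n2 = Node(), Node()
    # During the partition, each side accepts a different shopping cart write.
    n1.write("cart", ["book"], timestamp=1)
    n2.write("cart", ["book", "pen"], timestamp=2)
    print(n1.read("cart"), n2.read("cart"))  # the two sides disagree
    reconcile(n1, n2)
    print(n1.read("cart"), n2.read("cart"))  # both sides agree again
```

A policy like this silently drops the older of two conflicting writes, which is part of the consistency that an AP system gives up; a shopping cart application might prefer to merge the two carts instead.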
These are my references. I have referred to a textbook as well as a web resource. Thank you.