Good evening, guys. We are ready for the next session. The next session is "Python and Riak DB: a perfect couple for huge-scale distributed computing". The speaker for this session is Narend. Narend is a DevOps engineer from Knowlarity Communications, Bangalore. So I request Narend to kindly start the session.

Okay, let us start. Yes, this is the talk I'm going to give: Python and Riak DB. How many of you have heard about Riak DB before? Please raise your hands. Have you used it before, in your company or anywhere else? Just raise your hands if you have used it. Okay. So this is my talk, and the reason I'm concentrating on Riak DB this time is the problems we face in our company. And this is me. I'm a DevOps engineer at Knowlarity, and I have been programming in Python and JavaScript for the past two years. Many of you may also have seen my blog, I Am Pythonist. How many of you have seen it?

So let me ask you a few questions, or rather take a few ideas from you. Have you ever wondered how Amazon handles such huge traffic, and how it records all those transactions every day? Have you seen any other organization handling a comparable number of transactions? There are a few technologies that can handle load equivalent to Amazon's, but Amazon does it because of one technology: Dynamo. It's the paper Amazon released in 2007. So, Riak. Riak is an open source implementation of Dynamo, so it is capable of doing all the things that Amazon S3 can do. See the characteristics: 100% uptime, infinite scale. 100% uptime is not possible in reality, right? Nothing can be up 100% of the time. But we use it in a relative sense: saying 100% uptime means there should be no noticeable downtime. Infinite scale is not possible either, but we mean that the system should be scalable, and capacity should increase horizontally as we add new resources or nodes to it.
And fault recovery: if any node in the cluster goes down, you should be able to recover using the other nodes that still exist. And low latency. Latency is the delay experienced by the user, so low latency means the delay should be small so that the user experience is very good. So why do we need to care about fault tolerance? Because nowadays even a two or three second delay can cause the user to go to another website. Take Snapdeal and Flipkart. If Snapdeal lags, you will instantly open a new tab and go to Flipkart. Yes or no? So Snapdeal has lost one customer. I'm taking it as an example; it may be the case for Flipkart too. That's why you need to take care of fault tolerance: you should never lose a business customer.

Many of the companies in Bangalore right now are startups, and I don't think they have grown very big yet. But eventually every startup reaches the stage where it needs to handle a lot of requests, a lot of users and a lot of traffic, and it has to scale its databases. At that point even one second of delay can cost you millions of dollars. Amazon will lose if it cannot provide a scalable solution for its customers. If there is any latency, then it's finished: the business will go into a huge loss.

So, Riak. I will come to Python in the second half, because Riak is a totally different kind of solution. I think you know about MySQL and about NoSQL database solutions. In NoSQL we have many types of databases, like MongoDB, CouchDB, FoundationDB, Redis and Riak. Since many of you don't know Riak, you may ask: what is its specialty, and why should I use it when there are a lot of options available? So, it's a NoSQL database, and I would say use it for two main reasons. The first is that it's built on Erlang.
How many of you have programmed in Erlang? Please raise your hands. Erlang is a functional programming language. This is not an Erlang conference, so I cannot go deep, but Erlang is what powers WhatsApp. Erlang means actors and concurrent programming in which everything is built for fault recovery. Even if one process dies, it won't crash the system; it is independent. All the processes communicate with each other using messages, and there are no shared resources. So obviously there will be no crash even if one process dies. Riak DB is built on Erlang, so it is robust. MongoDB, by comparison, is written in C++, and Cassandra in Java.

The second point, as I told you in the beginning, is that it's the open source implementation of Dynamo. Dynamo has a specific architecture, and how it handles clustering and how it maintains the different nodes and virtual nodes is the main thing we are going to discuss here. The architecture described in the Dynamo paper is not master-slave like MongoDB. Many of you have heard about sharding in MongoDB; sharding is the process by which you scale it. In Dynamo there is no master-slave architecture. Everything is peer to peer. All the nodes are equal, and each knows about all the other nodes. So there is no single point of failure, unlike a master-slave setup, where if the master fails everything is lost. In Dynamo everybody is a master, so everybody can see everything. So the main advantage you get from Riak as a DevOps engineer is that there will be no 2 a.m. calls, because even if one node fails, another node will come to save your life and process the request. So how many of you have worked with distributed databases? Just raise your hands.
So the two main characteristics of distributed databases are replication and partitioning. You know the meaning of replication, right? You have one set of data and you take a copy of it, an exact clone. That is called replication. Partitioning means you cut the data in half and place one half in one container and the other half in another place. Look at the picture: those are country names from A to Z. We are doing replication there, meaning we store all the country names from A to Z in Node B as well. Nodes are like systems; you can call them virtual nodes or servers. So we are copying everything into Node B. But what's the main disadvantage here? We can copy everything, and we do need backups in case of data failure, but the problem with this replication is capacity. We need a lot of capacity, and memory is always being wasted, because we are taking more backups than we need, right? So we need to think about a solution that still returns the data when a node fails, but is not plain replication.

The other technique, as I told you, is partitioning. We cut the data into two parts and keep half of it in Node A and half of it in Node B. What is the advantage here? Space. We are using all the space to hold distinct data. But what's the disadvantage? If Node B goes down, we are out. We will get a 2 a.m. call.

So when we talk about distributed databases and scalability and all these things, there is one demon that gets in our way. It is called the CAP theorem. How many of you have heard of the CAP theorem? The CAP theorem is about what happens at the time of a network partition. Network partition means this: you have four nodes, all four are connected to the network, and they are clustered. Now suppose the network cable is unplugged from Node C. Then Node C cannot contact the other nodes because its network is down, right?
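As an editorial aside, the trade-off between replication and partitioning described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Riak stores data: the 26 letters stand in for the slide's country names, and the helper names (`replicate`, `partition`, `readable_keys`, `stored_copies`) are made up for this example.

```python
import string

# Placeholder keys standing in for the slide's "country names from A to Z".
keys = list(string.ascii_uppercase)


def replicate(keys, node_names):
    """Full replication: every node holds a complete copy of the data."""
    return {name: set(keys) for name in node_names}


def partition(keys, node_names):
    """Partitioning: each key lives on exactly one node (contiguous slices)."""
    size = -(-len(keys) // len(node_names))  # ceiling division
    return {
        name: set(keys[i * size:(i + 1) * size])
        for i, name in enumerate(node_names)
    }


def readable_keys(cluster, failed):
    """Keys that can still be read after the nodes in `failed` go down."""
    alive = [data for name, data in cluster.items() if name not in failed]
    return set().union(*alive) if alive else set()


def stored_copies(cluster):
    """Total number of stored items, a rough measure of capacity used."""
    return sum(len(data) for data in cluster.values())


nodes = ["A", "B"]
replicated = replicate(keys, nodes)
partitioned = partition(keys, nodes)

# Replication: double the storage, nothing lost when Node B fails.
print(stored_copies(replicated), len(readable_keys(replicated, {"B"})))    # 52 26
# Partitioning: no wasted storage, half the keys lost when Node B fails.
print(stored_copies(partitioned), len(readable_keys(partitioned, {"B"})))  # 26 13
```

Replication keeps every key readable when Node B fails but stores everything twice; partitioning stores each key once but loses half of the keys in the same failure.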
So that is called a network partition. At the time of a network partition you can have only one of two things: availability or consistency. I will tell you why. Let us take an example. We have a database and there is a network partition. Take Node C: you have some data on Node C, and at the same time one guy is writing to Node D while another guy is reading from Node C, which cannot hear from the other nodes. So at the time of the partition it will return the old value, yes or no? If you want to stop that, you need to lock the database, yes or no? You need to lock the database, saying: sorry, one node has failed, let us repair it first and then you can write. But then availability is gone, yes or no? On the other hand, if you let him write, he writes that value to a node that is cut off from the other nodes, so other people will see different values. If you allow that, consistency is lost. So if you have availability there is no consistency, and if you have consistency there is no availability.

So how can you overcome that? Riak has a very good feature that lets you set the level of consistency and availability. The two are not all-or-nothing opposites; you can tune the level of each using Riak. See, this is what I was talking about: the level of availability and the level of consistency can be set using Riak. So how does Riak do that? Riak has a strategy called NRW. N means the number of nodes to replicate the data to. R means how many nodes must return the data for us to treat it as a successful read. W is for write, and it's obvious: how many nodes you need to write to in order to call it a successful write. By looking at this picture you will understand it perfectly. See, there we have Node C, right?
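The N, R and W settings can be made concrete with a small simulation. The sketch below is a toy model, not Riak's implementation: the placement rule in `replicas_for`, the integer versions, and the in-memory `storage` dictionary are all assumptions made for illustration, and real Riak mechanisms such as read repair are left out.

```python
NODES = ["A", "B", "C", "D", "E"]   # five physical nodes
N = 3                                # replicas kept for each key

storage = {name: {} for name in NODES}   # node -> {key: (version, value)}
down = set()                              # nodes currently unreachable


def replicas_for(key):
    """Hypothetical placement rule: N consecutive nodes picked from the key's bytes."""
    start = sum(key.encode("utf-8")) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(N)]


def write(key, value, version, w):
    """Store on every reachable replica; succeed only if at least w acknowledged."""
    acks = 0
    for node in replicas_for(key):
        if node not in down:
            storage[node][key] = (version, value)
            acks += 1
    return acks >= w


def read(key, r):
    """Ask replicas in order, stop after r replies, return the newest value seen."""
    replies = []
    for node in replicas_for(key):
        if node not in down and key in storage[node]:
            replies.append(storage[node][key])
        if len(replies) == r:
            return max(replies)[1]   # highest version wins
    return None                      # fewer than r replicas answered


key = "favorite"
first = replicas_for(key)[0]

write(key, "blue", version=1, w=2)         # all three replicas receive version 1
down.add(first)                             # one replica drops off the network
ok = write(key, "green", version=2, w=2)    # two replicas still acknowledge
down.discard(first)                         # the stale replica comes back

print(ok)               # True
print(read(key, r=1))   # blue: only the stale replica was consulted
print(read(key, r=2))   # green: the second reply carries the newer version
```

With N = 3 and W = 2, a read with R = 1 can land on the one stale replica, while R = 2 must overlap with at least one replica that took the latest write, because R + W > N.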
So there are 5 nodes in our cluster and we are saying N equals 3. It means: replicate this data to 3 nodes out of the 5, so that if Node E fails, I can still get the data from Node C and Node D. W equals 2 means write to at least 2 nodes before confirming a successful write, and R equals 2 means read from at least 2 nodes. You could read from Node C alone and give that to the customer, but in Riak we can set from the program itself how many nodes to read from. If you set R equals 2, the value has been read from 2 servers, so it is likely to be consistent. If you say R equals 3, it will read from all 3 replicas and say yes, this is the correct data, because we read it from all 3 servers. That's how you can set the availability and consistency levels in Riak DB; mainly it's consistency.

How Riak stores things in the database is depicted in this picture. It's a very beautiful one. Like I told you, Riak is masterless, meaning all nodes are masters in Riak. Unlike MongoDB, there is no master here; all are equal. This is actually a virtual ring, not a physical one. All the nodes are on it, and all the nodes know what is happening on the other nodes; that is called the ring state. It is an algorithmic ring. When you give data to Riak, it hashes the key using the SHA-1 algorithm and gets a hash value, and that hash value is used to place the data on this ring. So let us take a key called "favorite" and hash it. You get some value, and that value is used to insert the "favorite" key at position 3 here. These A, B, C, D, E are physical nodes; my laptop is one node, his laptop is another, so there are 5 nodes. And these numbered things are called virtual nodes. Each physical system is divided into virtual nodes, and Riak allocates those virtual nodes to the physical nodes. So 1, 2, 3, 4, 5: it
will allocate all 64 virtual nodes in this manner for a 5-node configuration. By default N equals 3, which means replicate to 3 nodes. So let us say the "favorite" key is stored in virtual node 3; it will be replicated to 4 and 5. But what are the physical nodes behind those? Here those virtual nodes sit on C, D and E. So if D and E fail, you will still get the value from C, yes or no? This is the partitioning we saw before, where we cut the country list in half and put each half on a different node. This is how Riak partitions the virtual nodes across different physical nodes. These are all physical nodes and these are all virtual nodes. Riak does that and it replicates. When we look at the ring, 3, 4 and 5 are adjacent, right? But physically they are on different machines, different physical nodes. So even if D fails, E will serve the request.

People also wonder how to add a node to the cluster and how to remove one. At first you have 400 requests or so; next you get 2000 requests, so you need to add 5 more nodes to satisfy those 2000 requests, right? That is called horizontal scaling. People think that with a database it will be a very tough and complex job to add nodes to a cluster and remove them, but in Riak it's very simple. You have a command called riak-admin. Using riak-admin you can do all of it: add a node, remove a node, replace it, destroy it, anything. So: cluster join riak@127.0.0.1. That is the node name. If you say cluster join with a node name, your node joins that cluster. That's it. And if you say leave, you are removing a node from the cluster, with a single command. And if some node has failed, because of a hardware failure or the network, and we need to replace it, then we can do it using cluster replace
node 1 and node 2; those two are node names, like IP addresses. You can also get the cluster status using these commands. It's very simple; you can do it in one line.

And you have heard of ACID, right? When you are dealing with MySQL or other relational databases you see ACID: atomic, durable and all those things. Riak is BASE. BASE is the common terminology in distributed databases. BA means basically available: it should be available to the customer all the time. Soft state means the state of the data should be preserved, and eventually consistent means the last write should win, similar to what we call transactions in relational databases. Durable means the data should actually be written to the database. We can have durable writes or not; that is our choice in Riak. If the write is not durable, the request goes to the node, but it won't wait for confirmation that the data reached the disk. If you say it is a durable write, then the process waits there, and only after the data has been successfully written to disk does it come back and say yes, it is a durable write. So Riak is durable, and you can set whether a write is durable in the request itself. Riak is fully customizable like that: you can set all the levels you want according to the customer experience you need.

So now I come to Python. I work at Knowlarity, right? We are a cloud telephony company, and we use Python a lot; it is our main language for building all our systems. As you know, coding in Python is fun, it is a very simple language, and there are many reasons to pick Python. Actually, I did not pick Python because we use Riak as the database; rather, we already use Python and we want to scale, because we need to handle lakhs of phone calls every hour. So we are slowly moving in the direction of Riak, and I am
sharing my experience with you. Our main stack is Python, and we use Django as our back end. This is what we all know about Python: its expressive power, where with a few lines of code you can do wonderful things, its simplicity, and the many libraries available to do anything. In a single word, we can say this.

Now you might have a question for me: you are talking about Riak and also about Python, so is there a driver that connects the two? Yes, there is a very good Python client, written 2 to 3 years back, and it is mature. This short snippet shows how to connect to a node and insert data. See, there I am importing the Riak client with import riak, creating the Riak client, and creating the bucket. Have you heard about buckets? How many of you have worked with S3? Many of you, I think, so bucket is not a new term to you. But let me explain. Here data is not stored as tables and rows; it is stored as bucket, key and value. The bucket is the namespace that holds the keys, and the key is the identity you use to fetch the value. So think of it as a bucket: in the bucket you have key and value pairs; using the key you can fetch data, and using the key you can insert data. So let us create a bucket, and in that bucket I am creating a new key called python developer 1 and inserting my data there. When I call store, the data is stored in the database. So this is what I am talking about: RiakClient, bucket, new and store. These are the methods used to insert the data.

But that was the plain version of RiakClient. You might be wondering: where is the clustering, where are the nodes, where are the scalability things you talked about until now? You are showing just one node and saying you insert data into it. Where are the nodes and where is the cluster? So let us see about
it. These are the other two methods, used to fetch a bucket and a value from it. Looking at the code, you can see we are creating a client and connecting to the database. If you give RiakClient no arguments, it connects to localhost; that is the same for other database clients, like MongoDB's. So bucket equals client dot bucket of developers, then user equals bucket dot get of python developer 1, and I get the data out of it. Using get data you get the data from the object stored under a key.

So this is what I was speaking about before. You just called RiakClient and connected to the local machine; how can you say it will work on a cluster, and how do we connect to multiple nodes, multiple systems? See the second form. The second one is for a single node in a remote place; it may be an EC2 instance or something. You can connect to that EC2 instance using its address and port. The third one is the interesting one; it is actually the thing we want. There is an argument in the third form of RiakClient called nodes. It is made up of dictionaries, each holding the information for one machine. Here, this node is just localhost; this is one dictionary. You can pass any number of dictionaries. Let us say we have five nodes called riak1, riak2, riak3, riak4 and riak5. You can add all those nodes in this form of RiakClient, and when you create the client it connects to the entire cluster rather than to a single node. So with one statement you can connect.

Okay. With relational databases, when data is distributed over multiple systems, how will you search across all of them? How many of you have implemented the open source search technology Lucene in your projects? Lucene or Solr, Apache Solr? Okay. Solr is an open source technology you can use to implement your own search in your
website or anywhere you want, and it will search with lightning speed. Riak uses Solr search as the back end to search across the multiple systems spread over the cluster. One node can be in Singapore, another in US West, another in US East, but you can issue a query from one node and get results from all of the nodes. You can search all the nodes using a single command in Riak. In relational databases that is not that easy, even if you add some add-ons.

Riak also has a MapReduce architecture. Normally you do MapReduce in order to aggregate and compute results from the data stored in the database. MapReduce is really about taking the code to the data, rather than taking the data to the code. Riak distributes the mapping part, and also the reducing part, across all the nodes so that they execute independently and send their results to a single node, which aggregates them. So Riak has a very good MapReduce architecture too.

But what are we getting here by using Riak and Python? Python is easy to start with for anyone. Developers coming from different domains can easily adopt Python because it's easy to learn. So it's a developer-friendly stack. Riak had very bad documentation until about a year ago. Exactly one year back Riak 2 was released, and at that time they concentrated more on the documentation, so now you can find very good documentation for Riak on the Basho website. Basho is the company that created Riak, which implemented the open source version of Dynamo. You know MongoDB, right? 10gen is the company that built MongoDB. Similarly, Basho Technologies is the company that built Riak. On their website you can find a lot of documentation for the Python
client. So before that, let me show you a working demonstration of it. I am trying to create a cluster in Docker containers. How many of you have worked with Docker? How many of you know Docker, have created containers and deployed something in them, or created images and pushed them to a cloud repository? Docker simulates a virtual environment for you, so that you can execute things in it, and Docker assigns an IP address to each container so that you can use it as a virtual system. I don't have 5 systems connected through a LAN here to show the demo, right? So I am creating 5 nodes in Docker and I will connect to them. Okay, I think there is some problem with this. I think something happened to my Docker daemon; I experimented with it in the morning.

Okay, and you may ask where we should not use Riak. If you are a startup, you don't have much data to handle, and you are not thinking about scaling for the coming 5 years, then simply don't use it, because you will end up with more Riak haters than people who benefit from it. If you are nostalgic about SQL and you don't want to leave it for the NoSQL world, then you can keep using SQL. And mainly, if you don't have any problem right now, you can skip Riak.

Where should you use Riak? I gave this as the first point: when your data is so critical that your customer should not wait even a single second to read data or complete a write. If your business is like that, then you should definitely use Riak, because even if one node fails, another one will come to the rescue. There is much more to learn. There is a very good book on Riak called the Riak book, and there are client libraries for the majority of programming languages, not only Python. Python and Riak DB are a perfect couple in my opinion, because you can use Python to build websites very quickly and Riak to scale
it horizontally. So thank you. If you have any questions, you can ask.

It's open source; you can fork it and use it as it is. There is also an enterprise edition from Basho. It has many additional features that are not normally required, but they become necessary when things are getting out of hand. Even for a big-scale business you can use the open source version and get what you want.

Yes? So does that company do it, or should I do it manually myself? I gave you the commands to do that, right? Just purchase OpenStack machines or anything else and add those machines to the cluster; it will scale up automatically. Nothing more is required. That's what I told you: if you take databases like Cassandra and others, to scale up you need to follow a boilerplate procedure.

Is Riak really good for transactions? That's why I told you it's eventually consistent. Yes, there is a technique called vector clocks in Riak. It tracks all the write operations, and also read operations, using a vector clock. The vector clock is like a timestamp that is tracked across multiple geographies, multiple locations, multiple nodes; Riak reduces it to a single relative value and compares on that.

Are there any benchmarks comparing Riak with other databases? Yes, there are many benchmarks on the web, and it's almost as good as MongoDB. But one thing that is really awesome about Riak DB is that if you add a node with more cores, Riak will perform like a superman, because it's running on Erlang, and Erlang can use all four cores. I accept that Cassandra and Riak both came from the Dynamo paper; both implement the Dynamo concept given by Amazon. But systems with multiple cores are common now, right? So if you add a four-core system, Riak's power will be doubled, but not so with Cassandra. Does that mean Riak
performs better when more cores are added to Riak, or...? No, no, obviously it performs better. I mean better than, say, MongoDB; it will do much better than MongoDB if more cores are added, because Erlang loves cores. Thanks.

One question here: how are concurrent reads and writes handled in Riak? In any distributed system this concurrent read and write problem will be there. That's what I told you: at the time of a concurrent write, Riak allows you to choose what happens. Either you take the last-write-wins strategy or you create siblings. That is an option in the Riak client; if you set that option, then the last write wins. I told you about the vector clock before, right? Concurrent does not mean the two writes hit the database in the same microsecond. There will be a time lag, so one value can overwrite the other. Say you and I both insert the same thing into the database. For us the time may be 4:45 or 4:50, but Riak goes down to the microsecond level. There is a vector clock, and it checks that vector clock for the exact time each write is going to hit the disk. According to that, it will overwrite, or it will keep two records, depending on the option you specify. There is an option called allow_mult. If you set it to true, it creates siblings; if you set it to false, last write wins. It's up to you. There is no way around that situation, but you can keep siblings and resolve the ambiguity there, so that data is not lost.

How about the reads at that time? It will be a dirty read, right, because the write is happening? That depends on the ratio, the consistency or availability level that we choose. Actually, in Riak there is one more good thing: for each request you can set the read and write
levels, which I showed you before: the number of nodes that must be read to get the data and the number of nodes that must be written. You can set those per request. In other databases this has to be set in some configuration file, and you need to restart to change it, but in Riak it takes effect automatically.

Okay, do we have any more questions? Questions, anyone? You can ask. Okay, I don't think there are any more questions. So I thank you all for coming today and attending PyCon. I hope to see you tomorrow also. And I thank Narend for such a wonderful session. Thank you, Narend. Thank you. Thank you, guys.
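As a closing editorial sketch, the sibling versus last-write-wins choice from the answer on concurrent writes can be illustrated with vector clocks. This is a simplified model under assumptions: each clock is a plain dictionary of per-node counters, the `time` field is a hypothetical wall-clock tiebreaker, and the function names are invented for this example; Riak's own data structures differ.

```python
def descends(a, b):
    """True if clock `a` has seen every update recorded in clock `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def reconcile(versions, allow_mult):
    """Return the values that survive among competing versions of one key.

    Each version is a dict with 'clock' (node -> counter), 'value' and 'time'.
    """
    # A version is obsolete if some other version's clock strictly dominates it.
    survivors = [
        v for v in versions
        if not any(
            o["clock"] != v["clock"] and descends(o["clock"], v["clock"])
            for o in versions
        )
    ]
    # More than one survivor means the writes were concurrent.
    if len(survivors) > 1 and not allow_mult:
        survivors = [max(survivors, key=lambda v: v["time"])]  # last write wins
    return [v["value"] for v in survivors]


base = {"clock": {"A": 1}, "value": "v1", "time": 1}
yours = {"clock": {"A": 1, "B": 1}, "value": "v2", "time": 2}
mine = {"clock": {"A": 1, "C": 1}, "value": "v3", "time": 3}

print(reconcile([base, yours], allow_mult=True))    # ['v2']: yours supersedes base
print(reconcile([yours, mine], allow_mult=True))    # ['v2', 'v3']: kept as siblings
print(reconcile([yours, mine], allow_mult=False))   # ['v3']: last write wins
```

A vector clock orders writes by causality rather than by wall-clock time, which is why two writes that never saw each other show up as siblings instead of one silently replacing the other.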