In today's session, I'll be covering the Raft consensus algorithm and demonstrating its use in the rqlite distributed database. I'm grateful to the EuroPython community for providing me with this opportunity. Let me quickly introduce myself. I'm Tanya Thne, and I currently work as a software engineer at a financial services firm. During my work, I have interacted with a number of database systems like DynamoDB, AuroraDB, and JogCV, and I'm passionate about building high-performance systems around these databases. In the past, I have also worked with computer vision and game theory, and I love to keep reading about these technologies. I also spend my time volunteering for social initiatives and have recently participated in a number of initiatives organized by my firm. In my free time, you can find me painting, listening to an audiobook, or planning my next trip.

Without further ado, let's dive into the topic. I'll be discussing distributed systems, how they can be useful, and the challenges associated with them. Then we'll look at a particular problem, that of establishing consensus, and discuss the Raft consensus algorithm in detail. At the end, we'll look at Raft in action using the rqlite database and see how we can send commands to it using Python. The topic of distributed systems is vast, and this talk is a beginner-level discussion that should enable you to read up more on distributed systems afterwards.

So, what are distributed systems and why are they needed? In today's world, data has become an entity of utmost importance. Whether you are scrolling through social media, ordering a pizza, making a transaction, booking a doctor's appointment, or even using GPS, you are generating and consuming a lot of data. Therefore, it is important that the systems that work with it are designed to handle such large scales and meet the demands of speed with which the application should run.

Let us try to understand the need for distributed systems through an example. Say I own a social media company and I store all of my data and run the application on a single machine. This is clearly a recipe for disaster, right? If the machine crashes for whatever reason, the application becomes unavailable. Moreover, if the disk gets corrupted, we could end up losing all of the information. Even when the system is up and running, all of the requests are being directed to one particular machine, which leads to an immense workload on that system. And if your users are located geographically far apart, those users situated far from the data center would experience a lot of latency.

Distributed systems come to the rescue here. They are designed to distribute the processing load, data, and resources across multiple nodes, so our system becomes more scalable. And since the information is replicated across multiple nodes, we also introduce data redundancy: if one of the hosts crashes, we do not end up losing all of the information. Moreover, users can connect to the data center that is closest to them, and this way they experience minimal latency. We also introduce fault tolerance: say one of the nodes crashes, we can design our system in such a way that the user's connection is redirected to another node. These systems can range from two-node setups to large-scale systems like cloud infrastructure, containing thousands or even millions of interconnected nodes.
While distributed systems offer a host of advantages, do they come with caveats? Well, the devil is in the details, isn't it? Unsurprisingly, maintaining a distributed system comes with its own set of challenges. Let us look at some of the problems that we must try to handle when working with distributed systems.

The first is the issue of consistency and consensus. If the same data is hosted across multiple nodes, we need to ensure that all of it is in the same state. This is made difficult by a number of issues that can arise. Let's say one of your nodes is experiencing network lag and there is a delay in it receiving the changes from other nodes. It will lag behind the other nodes, and the users connected to that node would see data from the past. Or say a particular node crashes and, by the time it is brought back up, a number of instructions have been lost; the data will surely become inconsistent, right? And we wouldn't want users to see inconsistent information. We could easily be connected to the application from our laptop and our phone, potentially hitting different servers, and we do not want them to show different information, right?

Now let us look at the second issue. Hosts are connected over a network, and once network connections come into the picture, there is always the risk of the channel breaking. As developers, we must be prepared to handle such scenarios. There can also be additional issues like security breaches or an unbalanced workload between nodes. While not all of these problems are in our hands, and something unexpected might always happen, it is best that we remain prepared and design our system carefully.

In this session, we'll dive deeper into the problem of establishing consensus. So let us try to understand the curious case of consensus. In simple terms, it means getting several nodes to agree on something. This coordination can be facilitated by a node that acts as a leader. For example, if three people are trying to book a seat at a movie theater at around the same time, their nodes would send the request to book the seat to the leader. The leader will make a decision: it will decide which of these nodes gets the seat. Let's say node three books the seat. This could be decided in a number of ways; one of them could be the fastest, or earliest, request, right? But if the process of achieving consensus is not handled properly, errors like double booking are prone to happen. Another situation that can arise: let's say node one crashes and it does not receive the response that node three successfully booked the seat. In this case, if someone is viewing the page in their browser, what should they see as the status of the seat? This is where consensus comes to the rescue.

Now, let us try to formally define consensus by a set of properties that must be satisfied. There should be uniform agreement, that is, everyone should agree on the same outcome. The second property is integrity: once you have decided, there is no going back; you cannot take a decision twice. Further, a decision must be valid, which means that if a node decides on a value, that value must have been proposed by some node in the cluster. One can ask why this is required. Well, it helps rule out trivial solutions.
For example, if your algorithm continuously decides on the value of an integer X to be five, irrespective of what the nodes propose, it would satisfy the first two properties, because the nodes are all agreeing on the value and they are only taking the decision once. But this is clearly faulty, right? Such algorithms are kept in check by the validity property.

Now, this seems like a good enough set of criteria to reach consensus. But if we think about it, these properties do not cover how the leader is elected. That is a design choice. I could say that only one particular node will ever be the leader of my cluster. But what happens if it goes down? My system cannot respond to requests and will have to wait until the leader node is brought back up. Here comes the property of termination. It says that a consensus algorithm must make progress: even if some nodes fail, the others must reach a decision. But of course, this is not always possible. What happens when all of your nodes crash? If we think about it, a majority of the nodes must be up and running for reaching a consensus to be possible at all.

All right, with that out of the way, let's jump into the Raft consensus algorithm. Raft was developed by Diego Ongaro and John Ousterhout at Stanford University in 2014. Please pardon me if I mispronounce the names. It was designed to be easy to understand and is equivalent to the Paxos algorithm in its performance. For context, Paxos has been around since 1989 and is considered a fundamental consensus algorithm. It has been widely studied and has inspired the design of several other practical implementations of consensus protocols.

Now, Raft works by having each node store the data as well as a log of the operations that have been performed on it. It ensures agreement on the commands in each server's log, so all of the servers have the same set of log entries in the same order. As a result, each machine processes the same sequence of operations and thus arrives at an identical state. The algorithm can be thought of as working in two stages. The first stage is leader election, where all of the nodes decide who the leader node is. Once that is done, all changes pass through the leader node, and the leader is responsible for replicating the changes it was asked to make to all of the other nodes. Let us look at these stages one by one.

So what happens in leader election? A node, according to the Raft algorithm, is in one of three states: it could be a leader, a follower, or a candidate. In this pictorial representation, I am going by colour: green is the leader, blue is a follower, and yellow is a candidate. When the cluster starts, all of the nodes are in the follower state, and if they do not hear from a leader for some time, they can choose to become a candidate. Let's say there are three nodes, A, B and C, all of which are followers, and A decides to become a candidate. The candidate will then ask for votes from the other nodes, and if it receives a majority of the votes, it becomes the leader.

A question that comes to mind is: how long does a follower wait before deciding to become a candidate? This needs to be handled carefully, because if the time is too short, the followers would keep becoming candidates and there would be elections all the time; that way our cluster would not be able to get anything else done. But if the time is too long, then when the leader crashes, you would have to wait a long time before another election happens. This is controlled by the election timeout setting. Each node is assigned an election timeout value: if a node has not heard from the leader for the duration of its election timeout, it can become a candidate. This value is randomized, typically between 150 and 300 milliseconds. So all of the follower nodes keep track of how long they have not heard from the leader, and once that time hits their election timeout value, they become candidates and request votes from the other nodes.
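To make this concrete, here is a minimal, simplified Python sketch of the follower-to-candidate-to-leader transition just described, including the randomized election timeout. The class and method names are invented purely for illustration; real Raft additionally tracks election terms, persists its log, and exchanges votes and heartbeats over the network as RPCs.

```python
import random

# Possible node states in Raft (plain strings for simplicity).
FOLLOWER, CANDIDATE, LEADER = "follower", "candidate", "leader"

def random_election_timeout_ms(low=150, high=300):
    """Each node picks its own randomized timeout so that nodes rarely
    time out at the same moment and split the vote."""
    return random.uniform(low, high)

class Node:
    def __init__(self, name, cluster_size):
        self.name = name
        self.cluster_size = cluster_size
        self.state = FOLLOWER
        self.election_timeout = random_election_timeout_ms()
        self.ms_since_heartbeat = 0.0

    def tick(self, elapsed_ms):
        """Called periodically: become a candidate if the leader has been
        silent for longer than this node's election timeout."""
        self.ms_since_heartbeat += elapsed_ms
        if self.state == FOLLOWER and self.ms_since_heartbeat >= self.election_timeout:
            self.state = CANDIDATE

    def receive_votes(self, votes_granted):
        """A candidate that gathers a majority of the cluster's votes becomes leader."""
        if self.state == CANDIDATE and votes_granted > self.cluster_size // 2:
            self.state = LEADER

# Toy run: node A hears nothing for 300 ms, stands for election and wins 2 of 3 votes.
node_a = Node("A", cluster_size=3)
node_a.tick(300)
node_a.receive_votes(2)
print(node_a.state)  # -> leader
```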
Now, it could happen that we end up having multiple candidates and none of them gets a majority of the votes. In that case, a re-election is triggered.

Okay. Now, what happens in a scenario where there could be multiple leaders? Well, if the leader node gets disconnected from the majority of the cluster, the nodes in that majority no longer receive communication from the leader. One of them becomes a candidate, and there will be another leader. Now, if the network is healed and the older leader comes back, there are two leaders in our cluster, right? We cannot let this situation happen: it could lead to conflicts, because there would be two nodes accepting writes and communicating their values to the rest. This is where the election term comes in. Whenever a candidate decides to request votes, a new election term starts. So when, in our example, the older leader comes back and sees that a new election term has started, it steps down, and there is just one leader again.

Okay. Now, let us say we have established that A is the leader and we store the value of an integer X, which is currently five. A client connects to the leader and sends it a command to set the value of X to 10. What does the leader do in this case? The first thing that happens is that it copies this instruction into its own log. A point to note here is that this entry is not yet committed, so the value of X is still five on the node. Then it instructs its followers to replicate this particular log entry. It waits for a majority of the followers to perform this replication, and only then does it commit the transaction. At this point, the value of X becomes 10 on the leader node. The leader then communicates to the followers that the commit has been performed, and soon enough, all of the nodes in the cluster reach a consensus on the value of the integer X.
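Here is a tiny, simplified Python sketch of that commit-on-majority flow. It is not how Raft or rqlite is actually implemented; the Follower and Leader classes and their method names are invented for illustration, and real Raft sends AppendEntries RPCs over the network, retries failed followers, and tracks terms and log indices.

```python
class Follower:
    def __init__(self, alive=True):
        self.alive = alive
        self.log = []

    def append_entries(self, entry):
        """Toy stand-in for Raft's AppendEntries RPC: True means the follower
        stored the entry in its log."""
        if not self.alive:
            return False
        self.log.append(entry)
        return True


class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.log = []            # replicated log of commands
        self.state = {"x": 5}    # the state machine the log is applied to
        self.commit_index = -1

    def handle_client_command(self, key, value):
        entry = (key, value)
        self.log.append(entry)                       # 1. append to the leader's own log (uncommitted)
        acks = 1 + sum(f.append_entries(entry)       # 2. replicate to followers; the leader
                       for f in self.followers)      #    counts its own copy as one ack
        if acks > (len(self.followers) + 1) // 2:    # 3. commit once a majority holds the entry
            self.commit_index = len(self.log) - 1
            self.state[key] = value                  # 4. apply the committed entry
            return "committed", self.state[key]
        return "pending", self.state[key]


# Toy run: one of three followers is down, but a majority (leader + 2) still commits x = 10.
leader = Leader([Follower(), Follower(), Follower(alive=False)])
print(leader.handle_client_command("x", 10))  # -> ('committed', 10)
```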
Okay. Now, I'll do a quick demo using rqlite, where I will have four nodes, one, two, three and four, up and running, and one of the nodes is currently the leader. This node will perform transactions, and we'll see whether the changes are reflected in the other nodes. Then we'll also see what happens if a node fails: for example, if the leader node fails, a re-election must be triggered, and whoever gets the majority of the votes becomes the new leader. The command used to start the process looks something like this, and it has two important parts: the HTTP address and the Raft address. The Raft address is the port used for communication related to the Raft consensus protocol, and the HTTP address is where the commands are sent by the client. Okay.

So in this case, a node with ID two is being started, whose HTTP address is 4003 and whose Raft address is port 4004, and it is joining an already running host whose HTTP address is 4001. This node is called node two, and I'm storing all of its logs in this particular file. Okay. Let's see what happens over here. We have just five minutes remaining for the talk. Sure. So here I have executed a command that shows me the status of the host at 4001. We can see that this is the leader node, and the other nodes are at 4003, 4005 and 4007; all of those are follower nodes. Okay.

Right. Now I have started an rqlite command line and I'm connected to the port 4001. Here, I will try to create a table. Okay. And let's quickly check that it was created correctly. Yep. And then I try to insert a value. Okay. If I now query the table, I should be able to see that value. Now let's see what happened on the other nodes. I'll quickly run the command for one of the nodes and connect to 4005, where I will select everything from my table. Yep, the data has been replicated on the other nodes. Now, I decide to kill my leader; the leader node is the one at 4001. So, yeah, it has been killed. Let me quickly check: this one is a follower, so one of the others has to become the leader. Well, 4003 is a follower, 4005 is a follower again, and 4007 has now become the leader. So you can see how it works in action.

Now, if you want to do the same thing using Python, you can use pyrqlite, where we establish a connection and use the connection's cursor to execute the commands (a small sketch is included below). Using the same approach, we can see that the table is created and the values are inserted, and the same happens on the other nodes of the cluster. Okay.

There is a lot of information available on this topic, and here are the resources that I found most helpful: the book Designing Data-Intensive Applications, the original Raft paper, the Raft simulation provided by The Secret Lives of Data, and the rqlite documentation. That brings me to the end of my presentation. Thank you all for listening to the talk, and I hope to see you all again. Thank you.
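As mentioned in the demo, here is a minimal sketch of sending the same commands through pyrqlite's DB-API-style interface. It assumes a cluster node is accepting HTTP requests on localhost port 4001, as in the demo; the table and column names are purely illustrative.

```python
import pyrqlite.dbapi2 as dbapi2

# Connect to the HTTP address of a node in the cluster (here, the one at port 4001).
connection = dbapi2.connect(host='localhost', port=4001)

with connection.cursor() as cursor:
    # Create a table and insert a row; the statements are applied through the
    # leader and replicated to the followers via Raft, as in the command-line demo.
    cursor.execute(
        'CREATE TABLE movies (id INTEGER NOT NULL PRIMARY KEY, title TEXT)')
    cursor.execute('INSERT INTO movies(title) VALUES(?)', ('Inception',))

    # Read the data back; querying another node should show the same rows.
    cursor.execute('SELECT id, title FROM movies')
    for row in cursor.fetchall():
        print(row)
```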