Okay, so we're coming to our third session in this block and you've just seen messages from some of our sponsors, but some of our sponsors also do talks, and the next one I'd like to invite is Alexis from Numberly. Alexis, hello. Hello everyone. Can you hear me all right and see me as well? This works perfectly and we can see you too. You are in Paris at the moment, or in France? Yeah, in France, in the south of France actually this time, but I usually live and work in Paris. Are you ready to do your presentation? Sure, I guess so. I will just start sharing my screen and prepare everything, if you'll allow. Yes, please go ahead. Okay. And let's see if that works.

So, hello everyone. I'm really happy to be with you today, remotely as you understood. In this talk, I will detail Cassandra's and Scylla's low-level architecture, explain how it's used by the Python Cassandra driver, and show you how we extended it to write a Python driver for Scylla. There will be diagrams, emojis, Python code and hopefully some amazing performance graphs as well.

Let me introduce myself a bit. As Martin already told you, and as you can judge from my accent, I'm French. I'm Alexis and I'm the CTO at Numberly, which also happens to be a proud sponsor of EuroPython since 2014. We are digital marketing experts and we help brands establish a digital relationship with their customers. In the open source world, I'm a Gentoo Linux developer, where I'm part of the cluster and container teams, which means that I spend some of my time packaging distributed databases such as MongoDB and Scylla, packaging other cluster-related tools, and working with my friends on the Gentoo Linux Docker images. I'm also an open source contributor and enthusiast: I've been contributing to MongoDB, to Scylla, to Apache Airflow, and I'm a Python Software Foundation contributing member, which means that I spend a fair amount of my time working on or contributing to Python-based open source projects.

Before we start, I wanted to share a fact that you may not know already: since EuroPython is using Discord, I found it interesting to mention that Discord is using Scylla as well. If you are interested in understanding what they do, I invite you to check out the link here, where Mark Smith explains how Discord is using Scylla. It's very interesting, so check it out if you want. Also, this is an advanced talk, as advertised, so I will assume that you are familiar with the basics of consistent hashing and Cassandra. If that's not the case, or if you simply want to know more about consistent hashing and even use it in your Python applications, allow me to point you to the talk I gave on this very subject at a previous EuroPython edition. But fear not: I've worked on not making this a prerequisite for this presentation, so you should be able to follow along with no problem even if you don't know the exact details of consistent hashing.

Let's get started now, and let me introduce you to Cassandra's and Scylla's token ring architectures. We'll see what they have in common, but also what Scylla has done that makes it special and worth a dedicated Python driver. The first thing to know is that a Cassandra or Scylla cluster is a collection of nodes, or instances, that can be visualized as a ring. All the nodes in this ring should be homogeneous, using a shared-nothing approach. This means that there's nothing special about a node in this topology.
Any Cassandra node or Scylla node on the ring has no special role; there is no primary or secondary or anything. They all do the same thing. This ring is called a token ring, in which the positions of the nodes on the ring define token ranges and partitions. You can see it here: if you go clockwise on the ring, the range that precedes a node is the token range, or partition, that the node is responsible for. A partition is just a subset of the data that is stored on a node. In CQL, the Cassandra Query Language, a partition appears as a group of sorted rows and is the unit of access for queried data.

This data is usually replicated across nodes thanks to a setting called the replication factor, which defines how data is replicated on nodes. For example, a replication factor of two means that a given token, token range or partition will be stored on two nodes. This is the case here, where partitions 1 and 2 are stored on node X; you can see that partition 2 is also stored on node Y, while partition 1 is also stored on node Z. That means that if we were to lose node X, we could still read the data of partition 1 from node Z. This is how high availability is achieved and how Cassandra and Scylla favor availability and partition tolerance: they are the AP side of the CAP theorem.

This kind of token ring architecture is sensitive to the data distribution among the nodes. Queries should, in theory, be evenly distributed between nodes. We could get an imbalance of data and query load in the above scenario, where we store data in three big partitions and each node holds one range from the previous node and one range from the next one. If one of those partitions were to grow larger than another, we could then have an imbalance of queries and some sort of overload on those nodes. To counter this effect, which we call a hot node or a hot partition, we need to add more variance in the partition-to-node allocation, and this is done using what is called virtual nodes. So instead of placing physical nodes on the ring, we place many virtual instances of them called virtual nodes. A virtual node represents a contiguous range of tokens owned by a single node. So it's just a smaller slice of a partition, but it's more shuffled between nodes. A physical node may be assigned multiple, non-contiguous token ranges; if you remember the preceding slide, assignment there was contiguous. The default is to split a node into 256 virtual nodes on the token ring, and this is true for Cassandra and for Scylla as well. If you look at how the partitions are now distributed among nodes, you see that there is more variance, which ends up distributing the queries better.

This is it for Cassandra's data distribution, but Scylla goes one step further. On each Scylla node, the tokens of a vnode are further distributed among the CPU cores of the node, which are called shards. This means that the data stored on a Scylla cluster is not only bound to a node, but can be traced down to one of its CPU cores. This is a really interesting architecture and low-level design, and this is the feature that we will leverage in the Python Scylla shard-aware driver later on; I will explain to you how. To make the ring and replication mechanics concrete first, here is a toy sketch.
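(A toy illustration, not the driver's actual code: it places virtual nodes for three physical nodes on a ring and walks clockwise to find the replicas of a partition key. A stand-in hash replaces the real MurmurHash3 partitioner, and all the names are made up for the example.)

```python
import bisect
import hashlib

def token_of(key: str) -> int:
    # Stand-in for the murmur3 partitioner: any stable hash works for a toy.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

class ToyTokenRing:
    def __init__(self, nodes, vnodes_per_node=256, replication_factor=2):
        self.rf = replication_factor
        # Place `vnodes_per_node` virtual nodes per physical node on the ring.
        self.ring = sorted(
            (token_of(f"{node}-vnode-{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key: str):
        # Walk clockwise from where the key's token lands until we have
        # `rf` distinct physical nodes holding a copy of the partition.
        idx = bisect.bisect(self.tokens, token_of(partition_key))
        found = []
        for i in range(len(self.ring)):
            node = self.ring[(idx + i) % len(self.ring)][1]
            if node not in found:
                found.append(node)
            if len(found) == self.rf:
                break
        return found

ring = ToyTokenRing(["node-x", "node-y", "node-z"])
print(ring.replicas("user:42"))  # e.g. ['node-z', 'node-x']
```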
Now that we understand how data is stored and distributed on the cluster, let's see how it's queried by clients. On the physical layer, a partition is a unit of data stored on a node and is identified by a partition key. You can relate a partition key to a primary key in the SQL world. A partition key is the primary means of looking up the set of rows that comprise a partition, and it serves both to identify the node in the cluster that stores a given partition and to distribute the data across the nodes of the cluster. The partitioner, the partitioning hash function, applied to the partition key will tell us where the data is stored in the cluster. So you take the id (in this case, the id column: this will be the partition key), you take its value, and you apply a hash function on it, the partitioner hash function, which by default on Cassandra and Scylla is MurmurHash3. This gives you a token. A token is like a number (it actually is a number) that is placed on the token ring, and from where it lands on the token ring, you find out which node is responsible for this data. It's as simple as this.

Okay, so let's recap. On Cassandra, the hash of the partition key gives you a token telling you which node has the data that you are looking for. We can call this a shard-per-node architecture, because from a token you get to a node. On Scylla, the same hash of the same partition key gives you the same token, but the token is not only telling you which node has the data; it also tells you which CPU core on that node is responsible for handling it. So this is a shard-per-core architecture; this is how it's called. Cassandra is shard-per-node while Scylla is shard-per-core.

Now, how does a client driver query a Cassandra or a Scylla cluster? Knowing all this, we could expect client drivers to use this knowledge to optimize their query routing. A naive client would go on like this. When it connects to a Cassandra or Scylla cluster, it opens a connection to every node of the cluster. When it wants to issue a query, a naive client picks one of its connections, randomly let's say, and issues the query to that node. From the client's perspective, the node it issues the query to acts as what is called a coordinator: it coordinates the query, takes the ephemeral responsibility of routing the query internally in the cluster to the right nodes, the replicas responsible for the partition the query belongs to, gathers the responses and responds back to the client. But maybe this coordinator is not itself a replica for the queried data. In that case, it has to issue the query to all the replicas itself, which adds an extra hop inside the cluster to get the responses. This is suboptimal, of course, as it consumes network and processing power on the coordinator node for something the client could have guessed in the first place. Because since the partitioner hash function is known, our client library can use it to predict data location on the cluster and optimize the query routing. This is what the Cassandra driver does using the token aware policy. How does it work? Token aware clients apply the partitioner logic to select the right connection to the right node, making sure that the coordinator node is also a replica of the queried data. This is cool, and this is very efficient.
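(For reference, a minimal sketch of enabling this token aware routing with the Python Cassandra driver; the contact points and keyspace name are placeholders. As we will see, this wrapping is also what the driver defaults to.)

```python
from cassandra.cluster import Cluster
from cassandra.policies import TokenAwarePolicy, DCAwareRoundRobinPolicy

# Token-aware routing wrapped around a datacenter-aware round-robin
# policy: route to a replica of the queried partition, and round-robin
# between the candidate replicas.
cluster = Cluster(
    contact_points=["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy()),
)
session = cluster.connect("my_keyspace")
```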
As a result, we save network hops, lower the cluster's internal load and get reduced query latency, meaning faster queries. Let's see how the Cassandra driver really does it internally. From the point of view of the token aware policy in the Python Cassandra driver, the partition key is seen as a routing key, because it is used to route the query: it determines which nodes are the replicas for the query. And to allow our Python driver to know about the partition key of a query, the query itself must be prepared as a server-side statement.

This is how it looks in Python. Cassandra's and Scylla's prepared statements can be seen a bit like stored procedures in the SQL world. You write statement = session.prepare(...), you express the query that you want, and wherever you have an argument or parameter, you just put a question mark. This is the recommended and most optimal way to query the data, because once you have prepared your query, it is validated and it lives on the server side. So it doesn't have to be parsed again: you just pass a reference to it, along with the arguments, and one of those arguments will be the mandatory partition key, which acts as the routing key. So statement plus routing key gives you the node. A minimal example follows below. Another thing to note is that, just like stored procedures, prepared statements are also the safest way to query, because they prevent query injection. So please, in production, at the bare minimum, only use prepared statements when you issue queries to Cassandra or Scylla clusters.

The Python Cassandra driver defaults to the token aware policy to route the query, combined with a datacenter-aware round-robin load balancing policy. It's a bit long, but what it means is that it will load balance for you in a round-robin fashion, one connection after the other. It's the most basic load balancing algorithm there is, but it's still pretty efficient. And don't worry: even if your cluster is not spread between multiple datacenters, it still works; it just happens to be the default. So it's token aware plus datacenter-aware round-robin. By doing so, the query routing will not only hit the right node holding a copy of the data that you seek (remember, that's called a replica), but also load balance the queries evenly between all the replicas.
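(The minimal prepared-statement sketch promised above; the keyspace and table are placeholders. The `?` marker is where the partition key, our routing key, gets bound at execution time.)

```python
from cassandra.cluster import Cluster

cluster = Cluster(contact_points=["10.0.0.1"])
session = cluster.connect("my_keyspace")

# Prepared once: validated and stored server side, safe against injection.
statement = session.prepare("SELECT * FROM users WHERE id = ?")

# Executed many times: only the statement reference and the bound
# arguments travel. Binding the partition key (id) gives the driver the
# routing key it needs for token-aware routing.
rows = session.execute(statement, [42])
for row in rows:
    print(row)
```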
So one could think: this is awesome and optimal. From a Cassandra cluster's point of view, it is, but with a Scylla one we can do better. Remember, Scylla shards the data one level further, down to the node's CPUs. So having token awareness is cool, but if our client had shard awareness, it would be even cooler, because a token aware client could be extended to become a shard aware client that routes its queries not only to the right nodes, but right to their CPU cores. This is very interesting to do. Such drivers already exist as forks of the DataStax Cassandra drivers; it's true for the Java one and the Go one, and they have been around since last year, actually. But there was no shard aware driver for Python, and it made me sad and pretty angry. So when I attended Scylla Summit last year in San Francisco, I did some lobbying, hard lobbying, and found some Scylla developers who were willing to help make this happen for Python as well. We promised each other to make a Python shard aware driver, and the good news is that we obviously did. I will now explain to you in detail how it has been done and what we found out by doing so; very, very interesting as well, I think.

Let's start by checking out the expected structural differences between the Cassandra driver and the Scylla driver fork. The first thing to see is that when the token aware Cassandra driver connects to the cluster for the first time, it opens a control connection. This control connection allows your Cassandra driver to know about the cluster topology: how many nodes there are, which are up, which are down, what the schemas are, et cetera. It needs to know all this, so it opens a special connection for it, which it refreshes from time to time. Then it opens one connection per node, and this is how the token aware policy is applied: by selecting the right connection based on the query, using the famous token calculation. That's how it's done. From the shard-aware client's perspective, we still need to know about the cluster topology, but instead of opening one connection per node, we will be opening one connection per core per node. The token calculation will still be useful to select the right node, but then we need to add a shard ID calculation, because we need to go down to the shard, the CPU core, to select the connection to the right core to route the queries to.

Let's transform this into a to-do list from the Python code perspective. First, since we will be using the same kind of control connection, we just use it as is; there's nothing to change here. We will need to change the connection class, because now that we open a connection per core per node, we need to be able to detect whether we are talking to a Scylla cluster or to a Cassandra cluster. With the Scylla driver, we wanted to retain maximum compatibility with Cassandra, so you can use the Scylla driver to talk to and query your Cassandra cluster as well. The host connection pool will use those shard aware connections to open one connection to every core of every node. The token calculation that selects the right node will be the same: we just reuse the vanilla, already existing and efficient token aware policy. But then we need to extend the cluster class so that, when you issue a query, it passes the routing key down to the connection pool. And then we apply the shard ID calculation and implement the logic of selecting, based on the shard ID, the right connection to the right node and the right shard. Okay, sounds like a plan; let's do this.

Now we'll get down into the code. Before we do, I wanted to highlight and introduce Israel. Israel is a Scylla developer from Israel; I know it's confusing, that's how it is. And since I know he's in the audience, which puts some kind of pressure on me as you can guess, I wanted to take this opportunity to thank him and give him credit for most of what I'm going to present now, especially the effort he put into CI testing of the Scylla driver. So the first thing that needed to be done was to add the shard information to the connections. This is how it's been done: a connection now has a shard ID assigned to it, plus sharding information. The sharding information comes from the server's response options when the connection is established. That's what is highlighted in red at the bottom; the logic looks roughly like the sketch below.
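(A rough sketch of that logic. The option names below match what Scylla advertises to the best of my knowledge; treat the exact keys and shapes as illustrative rather than the fork's literal code.)

```python
# Sharding info is read from the message options the server returns at
# connection time; each option value arrives as a list of strings.

class ShardingInfo:
    def __init__(self, options):
        self.shard_id = int(options["SCYLLA_SHARD"][0])       # shard given to THIS connection
        self.nr_shards = int(options["SCYLLA_NR_SHARDS"][0])  # CPU cores on the node
        self.ignore_msb = int(options["SCYLLA_SHARDING_IGNORE_MSB"][0])
        self.partitioner = options["SCYLLA_PARTITIONER"][0]
        self.algorithm = options["SCYLLA_SHARDING_ALGORITHM"][0]

    @classmethod
    def parse(cls, message_options):
        # A Cassandra node does not send these keys, so we return None:
        # this is also how the driver detects it is not talking to Scylla.
        if "SCYLLA_SHARD" not in message_options:
            return None
        return cls(message_options)
```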
What's interesting to note is that the Cassandra protocol only allows connection message options to be passed in the server's response. This means that when the client initially connects to Cassandra or Scylla, it has no way of passing any kind of information to the server; we depend on the server's response to learn about shard information or whatever else we need. If we look at the message options that we get back after we have connected to the server, the first one is the most interesting for us: the Scylla shard information tells us which shard ID, which core, was assigned to the connection by the server. And I'm going to say this again: this information tells us which shard ID, or core, was assigned to the connection by the server. Since we have no way of asking for anything when we connect, we depend on the server's shard allocation for the connections that we open. This is a protocol limitation.

So now we change the host connection pool class. We need a connection object for every core of the node, right? The first thing we did was get rid of the single connection that we had before and replace it with a dict, where the keys are the numerical shard IDs and the values are the connections bound to those shards. The first time we connect, as you can see in the first rectangle, the first connection allows us to detect whether we are connecting to a shard aware cluster, a Scylla cluster for instance. This is where we get our first glance at the sharding information, and we store it. Then, and this is the second part here with the for-underscore-in-range loop, the initial implementation used an asynchronous and optimistic way to get a connection to every core. Why is that? We open a new connection to the node and store its shard ID plus connection object in the dict, and we do this twice as many times as there are cores on the remote node, until we have a connection to all shards. Maybe. Because, if you remember, the client cannot specify which shard it wants when it connects: you just connect to the server and you get a shard assigned. That's why the initial implementation was saying, okay, let's try twice as many times as there are cores available on the remote node, and hopefully we'll get a connection for every shard. If we were lucky, fine, keep on moving; if not, we would raise an exception (a condensed sketch of this initial logic follows below).

The first time I saw this, I understood that there was a flaw in the client not being able to request a specific shard ID, because of the protocol limitation: there is no deterministic and safe way to get a connection to all the remote shards of a Scylla node. And connecting synchronously would mean that the startup of our application would be as slow as connecting to, hopefully, all shards of all nodes; not acceptable. The second thing that came to my mind was: hey, since it's a protocol limitation, it's not bound to Python or anything else. It's not a Python problem; it's a flaw, or maybe not a flaw but rather a lack, in the protocol itself. That means that all the current shard aware drivers are lying, since none of them, even today, can guarantee to always have a connection for a given routing key. All of this is opportunistic and optimistic: you will eventually get the connection, but not all your queries will be able to use a direct connection to the right shard.
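(The promised condensed toy version of that initial pool startup, building on the ShardingInfo sketch above. Not the fork's literal code: `open_connection` stands in for the driver's real connection factory.)

```python
class ToyHostConnectionPool:
    def __init__(self, host, open_connection):
        self._open = open_connection
        self.connections = {}  # shard_id -> connection to that core

        first = self._open(host)
        self.sharding_info = first.sharding_info  # None on Cassandra
        if self.sharding_info is None:
            # Plain Cassandra node: keep a single connection, as before.
            self.connections[0] = first
            return

        self.connections[first.shard_id] = first
        # Initial, optimistic approach: try twice as many times as there
        # are shards and hope the server assigns us every shard once.
        for _ in range(2 * self.sharding_info.nr_shards):
            if len(self.connections) == self.sharding_info.nr_shards:
                break  # lucky: one connection per shard
            conn = self._open(host)
            if conn.shard_id in self.connections:
                conn.close()  # duplicate shard, we already have this core
            else:
                self.connections[conn.shard_id] = conn
```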
So I wrote an RFC on the Scylla dev mailing list to discuss this problem, and the good news is that a consensus on a solution was recently found. It will take the form of a new shard allocation algorithm implemented on the server side and made available on a new listening port on Scylla. Since we want Scylla and Cassandra to retain the same behavior on their default port, if we want to change things a bit and add a new kind of shard allocation on the server, we need to do it on a new port; a shard aware port, let's say. It will use a sort of modulo on the client socket's source port to calculate and assign the correct shard to the connection. This means that clients, on their side, will just have to calculate and select the right socket source port to get a connection to the desired shard ID. This is work in progress; it's not done yet.

So in the meantime, I worked on implementing a safer, optimistic and asynchronous way of dealing with this problem. Let's see how it's been done. The first thing is that I wrote two functions. The first one is the optimistic one: it tries to open a connection and only stores it if the reported shard ID was not connected before; otherwise, we close it. We are only interested in keeping connections to missing shard IDs. We just open a connection and, if it fills a gap, good; if it doesn't, we'll try again later. Then I switched the startup logic to schedule as many of those optimistic attempts to get a connection to a missing shard ID as there are shards available on the remote node. So when you start connecting, if you have 50 cores on your remote server, you will asynchronously schedule 50 attempts to get connections to shards. Maybe 25 of them will give you distinct, unique shard IDs; maybe two of them; maybe 50 of them, lucky you. But now we don't care: it's optimistic, it's asynchronous, and it goes on and on like this as you use the driver. The result is an application startup time that is as fast as with the usual Cassandra driver, plus non-blocking, optimistic shard connections.

The cluster object now has to pass the routing key down to the pool as well. So here, when you issue a query in the query function and the cluster looks up a connection, we added the routing key so that we can apply the shard ID calculation. This shard ID calculation is a bit obscure, and lucky me, Israel was there to implement it in the first place. But the first time I tried this pure-Python shard ID calculation, it was very bad for the driver's performance: we were slower than the Cassandra driver. So what Israel did was move this shard ID computation to Cython, because the Cassandra driver actually uses a lot of Cython in the background when you install it, and he managed to cut its latency impact by a factor of almost seven. Kudos again, Israel; it was a very impressive move and it made the difference from the driver's perspective. So now let's wrap it all together in the main shard awareness logic in the host connection pool. This is basically where the connection selection happens and everything is glued together: if, as on line two of the slide, we are on a shard aware connection with the cluster, we calculate the shard ID from the routing key's token, roughly as in the sketch below.
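(A back-of-the-envelope Python version of the two pieces just described: the shard ID computation, which the fork actually implements in Cython and which follows Scylla's biased-token algorithm as I understand it, and the connection selection logic, reusing the toy pool from the previous sketch. The helper for scheduling background attempts is a stand-in.)

```python
import random

def shard_id_from_token(token: int, nr_shards: int, ignore_msb: int) -> int:
    # Shift the signed 64-bit murmur3 token into unsigned space, drop the
    # most significant bits Scylla ignores for sharding, then multiply-shift
    # to spread the 64-bit space evenly over the shards.
    biased = ((token + 2**63) << ignore_msb) % 2**64
    return (biased * nr_shards) >> 64

def schedule_optimistic_connect(pool):
    # Stand-in for scheduling one asynchronous, optimistic connection
    # attempt on the driver's event loop (see the previous sketch).
    pass

def borrow_connection(pool, token):
    """Pick the connection for a query whose routing key hashed to `token`."""
    if pool.sharding_info is None:
        return pool.connections[0]  # plain Cassandra: single connection
    shard = shard_id_from_token(
        token, pool.sharding_info.nr_shards, pool.sharding_info.ignore_msb
    )
    conn = pool.connections.get(shard)
    if conn is not None:
        return conn  # best case: a direct connection to the right core
    # Miss: schedule one more optimistic background attempt to reach a
    # missing shard, and meanwhile fall back to any existing connection,
    # just as a token-aware (but not shard-aware) driver would.
    schedule_optimistic_connect(pool)
    return random.choice(list(pool.connections.values()))

# Example: which core owns a token on a node with 8 shards, with Scylla's
# default of 12 ignored most significant bits.
print(shard_id_from_token(-4069959284402364209, nr_shards=8, ignore_msb=12))
```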
Then we use the routing key and the shard ID to look up in our connection dict whether we happen to have a connection to this exact shard ID, so to this exact core. If we do, triumph: we use this direct connection to route the query right to the core holding the data. That's perfect; that's the best case scenario. If not, we pray and asynchronously issue a new attempt to connect to a missing shard. Maybe it will be the shard we were looking for, maybe it will be another one, but it means that the more queries you issue to your cluster, the more chances you have of getting a connection to all shards and all cores. And if we don't have the right connection yet, we just randomly pick an existing one, so we behave just as if we were using the Cassandra driver.

Now that we have seen the implementation details: does the Scylla driver live up to our expectations? Is it fast, and how did it work in production? Because to us, and to me, the real value and the proof must come from production. So let's see. The first expectation we checked was an increase in the number of open connections from the cluster's perspective. And this is a check: now that we are opening not only one connection to each node but one connection to each core of each node, we expected to see this increase, and you can see from the annotation that when we deployed the Scylla driver, we saw it. The second expectation was higher CPU requirements, because opening more connections means that your driver has to handle more connections, keep-alives, et cetera, to keep those connections alive. We saw that we had to increase the CPU limits a bit on our Kubernetes deployments to avoid CPU saturation and throttling.

But then, what about the major impact we wanted? We want faster queries, lower latencies, right? How did that translate for real? This is what our graphs looked like, and I was like: wow, it's amazing. We gained between 15 and 25% of performance boost. And at Numberly, we like to look at our graphs in the worst-case scenario possible; that means that what you see here is the max of our processing latency, the worst latency that we get from our application's perspective. What's interesting to note as well is that the performance boost is progressive, since we connect to shards in the background in an optimistic fashion. The longer our application runs, the more chances it has to have a connection to all shards, and the lower the latency gets, because we start to always have the right connection to the right core for every query. So you can see that right after the deployment we already get a fair amount of performance boost, but the more time passes, the more shards are connected, and the better the latency. That was very interesting to see. We can also see it from afar: if we apply a moving median to another power-hungry process, you can clearly see the big difference that the Scylla shard aware driver has made in our production applications. And from this we got an unexpected and cool side effect: since the cluster load was reduced and the client latency was lower for the same workload, we managed to cut the number of replicas on our deployment by half, which was very cool as well. We actually saved resources on our Kubernetes cluster.
As for recent additions to the driver, we have added some helpers to allow you to check for shard awareness and to inspect these opportunistic shard aware connections, so you can actually see how fully connected the pool is, or isn't. When it becomes available, we will also change the driver to be able to select the shard ID deterministically this time when it connects; there are already two open pull requests for this. We're going to keep working on improving the documentation, and since it's a Cassandra driver fork, we will keep merging and rebasing the latest improvements as well. Try the Scylla driver: it's working great, it's been working in production for us for almost a month now, with the great impact that you've seen before. Check it out on the repository and come chat with us as well: when EuroPython is over, come over to the ScyllaDB community chat, where we have a Pythonistas channel and you are all welcome. And that's it for me. I want to thank everyone for attending and making this EuroPython a success. There is the Discord talk channel where we can keep in touch and discuss this further, or deeper if you want. And tomorrow we will also have a sponsor talk session that you can join, where we will be talking about a lot of different aspects and have cool guests from Numberly as well. Thank you very much.

Thank you very much for the talk. We have a minute or two for questions. Do you mind some Q&A? No, of course not. Yes, there's one question that has gotten some votes: the code of the driver seems to be Python 2 compliant, but is there an asyncio part, like for Python 3.5 or better? Yes, it's an old code base actually. The Cassandra driver is quite old, and since we forked it to extend it, we inherited this as well. So yes, it still supports Python 2, and there are asyncio and libev connection classes as well. So you can also change your connection class; the default one is the asyncore one. We have a question from Roberto: shards are pinned to CPUs, right? Are there corrections for unbalanced shards, e.g. one CPU being hit more than the others? Is that correct? Sorry, one CPU being hit more than the others? Yes: if you have an imbalance in your data distribution, you will get the same kind of problem that you can get on a node, but instead of impacting the whole node, you will be impacting a core, yeah. So yeah, this problem exists as well in this kind of architecture; it is inherent to a consistent-hashing-based architecture, actually. Okay, thank you very much. There are a few more questions in the Discord talk channel, so you're going to find that you have raised a lot of interest. And there was also an off-topic question before we let you go: what's with those papers hanging there behind you? That's the question. Yeah, this is a way to keep notes above my head, you know? It's perfect. Thank you very much for taking the time to speak to us. Thank you, everyone. Thank you for sponsoring EuroPython. And here's a round of applause for you; you'll find more questions waiting in the chat.