Hello, Alexis. You're on the stage now. Hello. Hello, everyone. Can you hear me clearly? Yeah. Can you hear me as well? Yeah. Great. Your voice is clear. I'm going to add your presentation on the screen as well. Are you able to see that? Yeah. Great. Great. So, the stage is yours, Alexis.

Thank you very much. Well, hello, everyone. I'm really happy to be at PyCon India. In this talk, I will detail Cassandra and Scylla's low-level architecture, explain how it's used by the Python Cassandra driver, and show you how we extended it to write a Python driver for Scylla. There will be diagrams, emojis, Python code, and hopefully amazing performance graphs.

Let's start with a bit about myself. As you may have guessed from my accent, I live in Paris, France, so that's where I'm broadcasting from at the moment, actually. I'm CTO at Numberly, and I have different lives in the open-source world. I'm a Gentoo Linux developer, where I focus mainly on packaging cluster-related stuff and distributed databases such as MongoDB and, obviously, Scylla. I'm also part of the containers team, which means that I spend some of my time making sure that the Gentoo Docker images are on the Docker Hub. Then, on the open-source side, I'm also a contributor. I've been contributing to MongoDB, Scylla obviously, and Apache Airflow, and I'm a PSF contributing member, which means that I spend a fair amount of my time creating or contributing to Python open-source projects.

In this talk, I will assume that you are familiar with the basics of consistent hashing and Cassandra. If that's not the case, or you simply want to know more or even use them in your Python applications, check out the talk that I gave on the subject at a previous EuroPython edition. But fear not, I've still worked on not making this a problem for this presentation, so everything should be all right.

Let's get started now, and let me introduce you to the Cassandra and Scylla token ring architectures. We'll see all they have in common, but also what Scylla has done that makes it special and worth a dedicated Python driver. First of all, since I will be using their logos in diagrams, let me introduce them to you. The eye on the upper left is the logo of Cassandra, and the little sea monster, also with only one eye, is Scylla. When you see both of them on the same box, it means that the presented topic or logic applies to both of them. When you see only one of them on a box, it means that it's specific to either Cassandra or Scylla.

A Cassandra or Scylla cluster is a collection of nodes, or instances, visualized as a ring. All of the nodes should be homogeneous, using a shared-nothing approach. This means that there's nothing special about a node in this topology: a node has no special role or anything; they are all equal on the ring. This ring is called a token ring, in which the positions of the nodes on the ring define token ranges, or partitions. A partition is a subset of data that is stored on a node. This data is usually replicated across nodes thanks to a setting called the replication factor, which defines how data is replicated on nodes. For example, a replication factor of two means a given token or token range will be stored on two nodes. This is how high availability is achieved, and how Cassandra and Scylla favor availability and partition tolerance in the CAP theorem: they are labeled AP.
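To make the ring and the replication factor concrete, here is a minimal toy sketch in Python. This is not the actual Cassandra or Scylla logic: the tiny token space and node names are made up for illustration.

```python
import bisect

# Toy token ring: three nodes, one token each, on a 0..99 token space.
# Real clusters use the signed 64-bit Murmur3 token space.
RING = [(10, "node-X"), (45, "node-Y"), (80, "node-Z")]
TOKENS = [token for token, _ in RING]

def replicas(token: int, replication_factor: int = 2) -> list[str]:
    """Walk the ring clockwise from the token and pick RF distinct nodes."""
    start = bisect.bisect_right(TOKENS, token) % len(RING)
    picked: list[str] = []
    i = start
    while len(picked) < replication_factor:
        node = RING[i][1]
        if node not in picked:
            picked.append(node)
        i = (i + 1) % len(RING)
    return picked

# Token 50 falls in node-Z's range; with a replication factor of two it is
# stored on node-Z and, continuing clockwise, on node-X as well.
print(replicas(50))  # -> ['node-Z', 'node-X']
```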
This kind of token ring architecture is sensitive to data distribution among the nodes. We could get an imbalance of data and query load in the above scenario, where we store data in three big partitions and each node holds one range from the previous node and one range from the next one: you see 1 and 2 on node X, 2 and 3 on node Y, 3 and 1 on node Z. This could lead to something that we call hot nodes, or hot partitions. We need to add more variance in partition-to-node allocation. This is done by introducing the concept of virtual nodes. Instead of placing physical nodes on the ring, we place many virtual instances of them, called virtual nodes. A virtual node represents a contiguous range of tokens owned by a single node, so it's a smaller slice of a partition, and a physical node may be assigned multiple non-contiguous vnodes.

This is it for Cassandra's data distribution, but Scylla goes one step further. On each Scylla node, the tokens of a vnode are further distributed among the CPU cores of the node, which are called shards. This means that the data stored on a Scylla cluster is not only bound to a node, but can be traced down to one of its CPU cores. This is the feature that we will leverage in the Python Scylla shard-aware driver later on, and we'll get into more details about this.

Now that we understand how data is stored and distributed on the cluster, let's see how it's queried by clients. On the physical layer, a partition is a unit of data stored on a node and is identified by a partition key. A partition key serves to identify the node in the cluster that stores a given partition, as well as to distribute data across nodes in the cluster. The partitioner, or partition hash function, using a partition key, determines on which node the data is stored in the cluster. It does this by computing a token, which is a number, for each partition key. So if we look at the diagram, we have a column, ID, which is set up as the partition key. We take its value, we apply the partition hash function, which by default is MurmurHash3, and we get a token, which is a number that we place on the ring. From this position on the ring, we know which nodes it belongs to.

So let's recap. On Cassandra, the hash of the partition key gives you a token telling you which node has the data that you are looking for. We can call this a shard-per-node architecture. On Scylla, the same hash of the partition key gives you a token not only telling you which node has the data, but also which CPU core is responsible for handling it. So we can call Scylla's architecture a shard-per-core architecture.
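Building on the toy ring above, here is a sketch of the two ideas just covered: virtual nodes (several non-contiguous token positions per physical node) and the partitioner (hashing a partition key into a token). The hash below is a stand-in for MurmurHash3, and every name is made up for illustration.

```python
import bisect
import hashlib
import random

random.seed(42)  # deterministic toy layout
NODES = ["node-X", "node-Y", "node-Z"]
VNODES_PER_NODE = 8  # each physical node owns several non-contiguous slices

# Scatter VNODES_PER_NODE token positions per physical node across the ring.
RING = sorted(
    (random.randrange(2**64), node)
    for node in NODES
    for _ in range(VNODES_PER_NODE)
)
TOKENS = [token for token, _ in RING]

def partitioner(partition_key: bytes) -> int:
    """Stand-in for MurmurHash3: map a partition key to a 64-bit token."""
    return int.from_bytes(hashlib.blake2b(partition_key, digest_size=8).digest(), "big")

def owner(partition_key: bytes) -> str:
    """Hash the key to a token, then find the vnode (and node) owning it."""
    token = partitioner(partition_key)
    return RING[bisect.bisect_right(TOKENS, token) % len(RING)][1]

print(owner(b"user:42"))  # the physical node owning the vnode this key lands in
```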
Now, how does a client driver query a Cassandra or Scylla cluster? Well, since the partition hash function is known, our client libraries can use it to predict data location on the cluster and optimize their query routing. But let's first explain what a naive client would do if it were not using the partition hash function. When a client connects to a Cassandra or Scylla cluster, it opens a connection to every node of the cluster. Then, when it wants to issue a query, a naive client would pick one of its connections randomly and issue the query to that node. This node is called the coordinator for the client's query. It takes the ephemeral responsibility of routing the query internally to the right replicas, gathering their responses, and responding to the client. But if the coordinator is not a replica for the query's data, it has to issue the queries to all replicas itself.

This is suboptimal, as it consumes network and processing power on the coordinator node for something that the client could have guessed in the first place. And this is what we call token-aware clients: a token-aware client applies the partitioning logic to select the right connection to the right node, making sure that its coordinator node is also a replica of the query's data. As a result, we save network hops and processing, allowing for a lower cluster-internal load and reduced query latency.

Let's see how the Cassandra driver does it. In the Python Cassandra driver, token awareness is achieved thanks to the TokenAwarePolicy. The Python Cassandra driver defaults to the token-aware, datacenter-aware, round-robin load balancing query routing policy. It's a bit long to say, and it looks complicated and all, but it's not that much. And fear not: it works even if your cluster is not spread across multiple datacenters; it will just pick a default one for you. By doing so, query routing will not only hit the right node holding a copy of the data that you are seeking, but also load balance queries evenly between all its replicas. So if we look at the diagram, if we were to issue a query hitting partition number one, which lives on node X, the first query would maybe hit this one. If we were to issue the same query again, since partition one also lives on node Y, the load balancing strategy would then hit node Y, then node X, then node Y, et cetera. This is how load balancing is achieved.

From the point of view of the Python Cassandra driver, the partition key is seen as a routing key, which is used to determine which nodes are replicas for the query. To allow our Python driver to know about the partition key of a query, the query itself must be prepared as a server-side statement. Cassandra's CQL prepared statements can be seen like stored procedures in the SQL world. They are the recommended and most optimal way to query data, because you declare your query once and then only pass a reference to it and the needed parameters. They are also the safest, as they prevent query injections, so please only use prepared statements, even more so in production. I've put an example of how it's done on the slide: you see session.prepare, where we prepare our statement, and then we just execute the query with the reference to the statement object and the partition key, which will act as the routing key.
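As a sketch, that setup looks roughly like this with the Python cassandra-driver. The contact point, keyspace, and table are made up, and depending on your driver version you may configure the load balancing policy through an execution profile instead; the policy shown here is the default anyway.

```python
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Token-aware routing wrapped around a datacenter-aware round-robin policy:
# this is the driver's default behavior, shown explicitly for clarity.
cluster = Cluster(
    contact_points=["10.0.0.1"],
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy()),
)
session = cluster.connect("my_keyspace")

# Prepare once: the driver learns the partition key and can compute the
# routing key (and thus the token) for every later execution.
prepared = session.prepare("SELECT * FROM users WHERE id = ?")

# Execute with only the statement reference and its parameters: the query
# is routed straight to a replica of the data thanks to token awareness.
row = session.execute(prepared, ("user-42",)).one()
```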
This is awesome and optimal. One could think that we cannot do better than this. That's true with a Cassandra cluster, but not with a Scylla one. Why? Well, if you remember, because Scylla shards the data one level further, down to a node's CPU cores. That means that we can extend a token-aware driver and token-aware client to become a shard-aware client, which is able to route its queries not only to the right nodes, but to their right CPU cores. And you can note that it is achieved with exactly the same code, which is pretty cool: you just have to switch from the Cassandra driver to the Scylla driver and obviously run a Scylla cluster, and then you will have per-CPU-core query routing. Such drivers already existed as forks of the DataStax Cassandra drivers, except for the Python one, and it made me sad and angry at the time. So when I attended Scylla Summit last year in San Francisco, I did some lobbying and found some friends on the Scylla dev team. We promised each other to make a Python shard-aware driver, and, as a matter of fact, we did.

So what are the expected structural differences between the Cassandra driver and its Scylla driver fork? The main one, as you can see, is how we connect to the cluster and to the nodes. Whereas in Cassandra you have one connection per node, because the node is the main and only point that you will be using to route your queries, in Scylla you need to go further, down to the core. That means that the Scylla shard-aware driver has to open one connection per core per node. The token calculation that selects the node is the same, because this is the same logic, but then you go one step further again on Scylla, with a shard-aware calculation that selects the right connection to the right core based on your query.

Now, let's see what needed to be done on the Python code side. Remember that we want to retain compatibility with Cassandra clusters, so that even the Scylla driver can work seamlessly with a Cassandra cluster. Israel Fruchter, who is a Scylla developer from Israel, I know it's a bit confusing, made the first pull request for this Python shard awareness. All the words that I highlighted in green represent a class in the Python code, so all those classes are the ones we had to work on. One of the first things I contributed myself aimed to solve the connect-to-every-core problem; that's the second bullet point here. You need to make sure, somehow, to get a connection to every core, so that for any query that comes in, you have a connection to use that goes straight to the right core.

It's easier said than done, actually. When I first looked at the initial PR from Israel, I found out that it was technically not possible for a client to select the shard it wants to connect to at connection time. After thinking about it, it sounds pretty obvious, because the Cassandra protocol does not route queries to cores, so there was no means in the native protocol itself to do this. And since the Scylla and Cassandra protocols are fully compatible, Scylla did not provide this either, which means that all Scylla shard-aware drivers are affected by this. So I wrote an RFC on the Scylla mailing list to find a solution, and great news: one has been found and is now implemented, and it will be part of Scylla 4.3. It takes the form of a new shard allocation algorithm that will be made available on a new listening port on the Scylla server. On the server side, it uses a modulo of the client's source socket port to assign the correct shard to the connection. That means that clients will then have to calculate the right source socket port to get a connection to their desired shard ID.

Until then, I worked on implementing an asynchronous and optimistic way of dealing with this problem, because when you connect, you don't know for sure which shard ID you will get a connection to. So you have to implement some logic to not slow down the connection and initialization of your Python driver, while still making the most of your chances to get the right connection to the right core. So let's see this logic as it is implemented, sketched below. This is where the connection selection happens and where everything is glued together, resulting in shard-aware query routing.
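Here is a simplified, self-contained sketch of that selection logic. The class and helper names are hypothetical stand-ins for the driver's internals, and the shard computation is a placeholder (the real math is shown a bit later):

```python
import random
from dataclasses import dataclass, field

# Hypothetical, simplified stand-in for the driver's per-node pool state.
@dataclass
class ShardPool:
    nr_shards: int
    connections: dict = field(default_factory=dict)  # shard_id -> connection

    def open_new_connection_async(self) -> None:
        # In the real driver this happens in the background: the server
        # assigns a shard to the new connection, and we keep it if it
        # fills a hole in the pool. Here we just simulate the lottery.
        shard = random.randrange(self.nr_shards)  # non-deterministic today
        self.connections.setdefault(shard, f"conn-to-shard-{shard}")

def pick_connection(pool: ShardPool, token: int) -> str:
    """Shard-aware connection selection, as described in the talk."""
    # The routing key's token maps to a shard, i.e. a CPU core on the node.
    shard_id = token % pool.nr_shards  # placeholder for the real math

    connection = pool.connections.get(shard_id)
    if connection is not None:
        return connection  # victory: direct route to the right core

    # Optimistically try to fill the missing shard for next time...
    pool.open_new_connection_async()
    # ...and meanwhile fall back to any connection, like the Cassandra driver.
    return random.choice(list(pool.connections.values()))

pool = ShardPool(nr_shards=8, connections={3: "conn-to-shard-3"})
print(pick_connection(pool, token=11))  # 11 % 8 == 3 -> 'conn-to-shard-3'
```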
The first thing that we do is get a token and calculate the shard ID from the query routing key's token. So we get the token, and from it the shard ID: we know which core, which is the shard or core ID on the node, is responsible for this slice of data. Then we try to find a connection to the right shard ID and core. If we have one, victory: we use our direct connection to the right core to route the query. This is the ideal scenario. If we don't, we just issue, asynchronously, a new connection attempt, and maybe we will get lucky and this one will get us connected to this shard ID, or to any other missing shard ID. So this is the current version: an optimistic but non-deterministic way. The upcoming version, when we are able to use the source-port-based algorithm, will be deterministic, so we will be sure to always have a connection to every shard. And if we were not lucky this time, we just pick a random connection and are just as good as the Cassandra driver was.

One important step also was to implement a way to know whether we are connected to a Cassandra or a Scylla server. Detecting shard awareness is done by parsing the message options sent back by the server after we connect. In this example, you can see that the Scylla shard information tells us which shard ID was assigned to the connection by the server, and you also get some other interesting things from the Scylla server side.

Another important implementation was the famous token-to-shard-ID calculation. When the cluster class requests a connection from the pool, it passes the routing key. This routing key is then used by the shard ID calculation code to select the connection to the right core. But this calculation, when it was implemented in pure Python, had a severe impact on the driver's performance, and the first production results were not good at all. We were even slower than the Cassandra driver, so it was a bit disappointing at first. So Israel moved the shard ID computation to Cython, cutting down its latency impact by a factor of almost 7, and this made us faster than the Cassandra driver.
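For the curious, the computation goes roughly like this. This is a simplified pure-Python rendition (the driver does it in Cython for speed); the shard parameters come from the options the server sends back after connecting, and the 12-bit ignore_msb default shown here is an assumption matching Scylla's usual configuration.

```python
def shard_id_from_token(token: int, nr_shards: int, ignore_msb: int = 12) -> int:
    """Roughly Scylla's token-to-shard mapping (simplified sketch).

    The signed 64-bit Murmur3 token is biased to an unsigned value, its
    most significant bits are discarded (ignore_msb), and the remainder is
    scaled onto the shard count by keeping the high 64 bits of the product.
    """
    biased = (token + 2**63) & 0xFFFFFFFFFFFFFFFF
    biased = (biased << ignore_msb) & 0xFFFFFFFFFFFFFFFF
    return (biased * nr_shards) >> 64

def source_port_for_shard(shard_id: int, nr_shards: int) -> int:
    """With the new Scylla 4.3 algorithm, the server assigns
    shard = source_port % nr_shards, so a client simply binds its socket
    to a local port with the right remainder (retrying nr_shards ports
    higher if that port is already taken)."""
    base = 49152  # start of the ephemeral port range
    return base + (shard_id - base) % nr_shards

# Example: on an 8-shard node, token 0 sits in the middle of the ring...
print(shard_id_from_token(0, nr_shards=8, ignore_msb=0))  # -> 4
# ...and to reach shard 3, a client could connect from this source port:
print(source_port_for_shard(3, nr_shards=8))  # -> 49155
```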
So now that we have seen the implementation details, does the Scylla driver live up to our expectations? What do we expect when we push this Scylla driver to production? Well, from the one connection per core per node, there are two expectations. The first one is that, from the point of view of the cluster, we expect to see an increase in the number of open connections, since instead of having one connection per node, we will have one connection per core per node. We expect to see it in the monitoring, right? And we effectively did, as you can see on the first graph. The second one is that, since we have more connections to handle, keep alive, et cetera, we expected to be more CPU-hungry as well. And in our case, we saw that indeed we had to adjust the CPU resources to avoid CPU saturation on our Kubernetes cluster. It's a small increase, but it's an expected increase, so this one checked out as well.

But what about the major impact we wanted: faster queries and lower latencies? Because routing queries to the right core is what we expected to make us faster. And this is what happened to our production workload's max query latency: we immediately got a 15 to 25% performance boost. We were like, oh my God, this is amazing, and we are super happy. Please note that the graph that you're seeing is an actual production graph, and it's also the max processing time. At Numberly we like to look at our worst-case scenario, right? So this one is the max, not the average or the minimum: it's the worst of what we do on this specific workload. So in the worst case, we win 15 to 25%, which means that we are 15 to 25% faster.

What is interesting to note is that the performance boost is progressive, since we connect to shards in the background in an optimistic fashion. When we first roll out the new driver, we surely don't have a connection to every core yet. We will get there at some point, but since the first implementation is optimistic and non-deterministic, it takes some time. So the longer our application runs, the more chances it has to hold a connection to all shards, and the lower the latency gets. This is what you can see happening as well. And we expect this performance boost to be instantaneous when the new connection algorithm becomes available.

From another perspective, if we apply a moving median on another deployment, we see the major impact of the driver, which is impressive as well, as you can see, and pretty stable too, which was a very good thing to get. We also got an unexpected side effect of the reduced latency: a lower cluster footprint. We could cut in half the pod replica requirements of a high-throughput application. This one we didn't see coming, but since our workload was faster and easier on the cluster itself, it allowed us to save a lot of pod replicas on the Kubernetes cluster. So if you relate this to the small increase in CPU resources that the driver has been asking for, it's clearly a major win.

We have made some enhancements as well. We have added some helpers to let you programmatically know whether or not you are connected to a Scylla cluster. We have also added some shard-awareness statistics, so that you can know if you have, I don't know, only 50% of the connections to all the nodes and cores that are available. That can be handy to have in your application logs. We are also working on, and have an open PR for, adding support for the new algorithm that will be available in Scylla 4.3, and we will merge and rebase the latest Cassandra driver improvements as well. So check it out; contributions are very welcome. The driver is obviously available on PyPI, and the documentation is there as well. If you see something missing or something that could be enhanced, feel free to tell us.

Thanks for attending and making PyCon India a success. You can catch me online about everywhere with the handle Ultrabug. I've listed a quick link to my conference talks if any of them is of interest to you. And come have a chat with us on the ScyllaDB users Slack: we have a Pythonistas channel where you can hang around and ask questions or give ideas, et cetera. Thank you very much for attending.

Awesome, Alexis. Thanks a lot. We have a question from the audience; I'm just going to wait for that to pop up. So the question is: are shards on Scylla pinned to the CPUs? Exactly. Shards are pinned to CPUs, and to cores of CPUs, actually. Yes. Great. Thanks. So Alexis will be with us on Zulip, so head to the Zulip stream for the daily stage. I'll volunteer to share the link with you on the Hopin chat as well. And thanks a lot, Alexis. Any closing words? Thank you very much to you. Thank you. Bye bye. Bye bye.
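To close, here is a minimal usage sketch of the helpers mentioned in the talk. The method names follow the talk's description of the scylla-driver; treat the exact signatures as assumptions and check the driver's documentation.

```python
from cassandra.cluster import Cluster  # installed by the scylla-driver package

cluster = Cluster(contact_points=["10.0.0.1"])  # made-up contact point
session = cluster.connect()

# Programmatically know whether we are connected to a shard-aware Scylla
# cluster (as opposed to a Cassandra one).
print(cluster.is_shard_aware())

# Shard-awareness statistics: a per-host view of how many shard connections
# are established, handy to log how "warm" the shard-aware pool is.
print(cluster.shard_aware_stats())
```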