Okay, so the title of today's session is really about pre-cloud Cassandra to multi-cloud Cassandra, and I'll talk about the evolution. The reason I mentioned the in-person events is because this is something that has evolved — I built a demo and I'm improving the demo as we go along, right? So initially I set it up for multi-cluster. As you all know, one of the biggest challenges in Kubernetes today is being able to do multi-cluster. It's not easy to do, right? A little bit of it depends on the application that you're building, it depends on the platform that you build on top of Kubernetes, and all that. But you'll see how Cassandra, Kubernetes, and the cloud actually come together in a remarkably cohesive manner, and that's what I'll show in the demo. What I'll do is similar to what many of the cloud operators do: they have a control plane, and then you deploy your data planes that the control plane manages. So likewise, I have AKS as the control plane, and then I have the other two clouds, EKS and GKE, as the data planes — which is very cool to be able to do. It's a pretty opinionated implementation, but it works nevertheless. So the agenda for today: I'm going to do a quick intro, talk about NoSQL for those of you who have never heard of NoSQL — I'm sure you have, but I think it has a lot to do with the cloud as well. It's really about horizontal scaling and being able to provide the durability and resilience that are typical cloud properties. So there was the cloud before there was the cloud — that's kind of how I put Cassandra. Then K8ssandra, pronounced "Kate Sandra," which is the Kubernetes version of Cassandra — Cassandra on top of Kubernetes.
Now again, the biggest challenge that we face today, at least in the Kubernetes world, is multi-cluster and multi-cloud and multi-region and multi-everything, right? And we have actually introduced a new operator called the K8ssandra Operator. Earlier we had something called Cass Operator, and I'm going to briefly touch on all of those. What the K8ssandra Operator does is sit on top of Cass Operator, and what it does is deploy to multiple clouds or multiple regions or multiple clusters and so on, okay? Finally, I'll do a demo, and like I said, it's an evolving demo. I started it at KubeCon NA in Los Angeles last year, where I did a demo on GKE. Then I had to take the same thing and do it at AWS re:Invent. But a demo on GKE is probably not gonna cut it at AWS re:Invent, right? So what I did was adapt it for EKS, and I used a project called EKS KubeFed, which takes care of the networking aspects. And the theme that's common across all these different clouds is that once you enable the networking, everything else kind of falls into place, because Cassandra is just built to form a bigger cluster by gossiping with other nodes — the so-called seed nodes — and then you can build a big cluster, and kumbaya, you have the entire cluster up and running, right? And then I'll talk about what's next, and that's pretty much it. To introduce myself: my name is Raghavan Srinivas, but I go by Rags — as in rags to riches, the easiest way to remember me, since most people can't easily pronounce my name. I'm a developer at heart, but I was also a mechanical engineer from UBC Bangalore a long, long time ago. I specialize in distributed systems, but really I love to teach and communicate. I even teach in the Boston area, which is where I've lived pretty much half my life. And one of my passions is the inner loop, because I think it makes a whole lot of sense.
I know the earlier talk was about Flux, Argo CD and so on — that is really more of a robust CI/CD kind of loop, but here what we're talking about is a lightweight inner loop, and I'll talk about that. Actually, next week there's going to be a talk in Berlin for OpenInfra, where I'm talking about Quarkus, how we can use that and deploy it on Kubernetes, and how we can gain a lot of productivity in the inner loop. All right, moving on. I work for DataStax as a developer advocate, and we run workshops pretty much every week — we used to run more than one. All of them are free; feel free to attend, they're all virtual. We get a lot of people from India, a lot of people from the US, a few people from Africa — every other continent, really. Not many from Australia because the time doesn't work out well, but if you can attend, that'll be great as well. So what is NoSQL? It was just something somebody came up with — I was in San Francisco in June 2009, roughly around that time, when we had to come up with some kind of catchy hashtag for something that is not SQL, right? SQL is the traditional relational data model, where the only way to scale is by going up and up and up, right? And at some point it's gonna get so big that it's gonna fall over — it's just not gonna work to vertically scale. Instead, what you do is use commodity hardware and spread your data across it, keeping multiple copies of the data, because disk is very cheap these days, right? I'm gonna give away my age if I say that my first PC had 20 megabytes of disk space. Think about that — 20 megabytes, not even giga, right? And I was so happy when I moved from the XT to the AT, which went from 20 megabytes to 40 megabytes.
So I doubled my disk space, which was just superb, right? But disk is very cheap these days, and that's why it's better to spread your data across nodes — and that's exactly what NoSQL does in general. There are other things as well, but in general it's about horizontal scaling. Relational and NoSQL in two minutes: one is about scaling up, and the other is really about horizontal scaling, which means you have to distribute the data — take the data and put it on different nodes, right? There's this concept of sharding, and I'll get to it in a second. But the point I'm trying to make is that NoSQL really preceded the cloud, yet it had a lot of properties that are similar to the cloud, and you'll see in my demos and the rest of the talk how all of these come together. You might have heard of the CAP theorem. Basically what it states is that if you think about consistency, availability, and partition tolerance in a distributed system — when there is a failure, and failure is inherent in a distributed system, right? It does happen — you cannot have all three properties at the same time. The system cannot be consistent, available, and partition tolerant; you have to sacrifice at least one of the three. And this is very critical. We could go on at length about this, but sacrificing partition tolerance is actually worse than sacrificing consistency or sacrificing availability. If it's not available for a few seconds, maybe even a few minutes, it's probably okay. If it's not consistent for a few seconds or a few minutes, it's probably okay, because eventually it'll become consistent. If I'm not on the leaderboard immediately, it's not a big deal, right? It will eventually get there.
But if you lose partition tolerance, then even my grandmother can make out that the data is not consistent. A little bit more about Cassandra. It uses a ring-like architecture, as it's referred to. The oversimplified way of looking at it is that it takes data — it uses something called a partition key and hashes the partition key to a particular node in this ring of nodes. Now, what happens when you add and remove nodes and all that? The system takes care of all of that. So again, the oversimplified view is to look at it like a distributed hash table, except that the platform takes care of all these failures: eliminating hot partitions, handling nodes coming up and going down, and so on — all of that happens automatically. The platform takes care of the sharding itself. Sharding turns out to be very hard, so leave it up to the platform. Now, a lot of big companies use Cassandra. In fact, Cassandra originated at Facebook, and Facebook is still one of the contributors to Cassandra. And again, what is the big deal about talking about Cassandra, which is already maybe 13 or 14 years old, right? The point is, the thing I keep hearing repeatedly from our customers is that Cassandra rarely, rarely goes down, if ever. And also in terms of scaling: if you look at, say, 100 nodes versus 1000 nodes, your response latency or throughput should scale linearly. In many systems it does, up to a point, and after that it levels off. With Cassandra, it keeps on going — it's very linearly scalable, without too much effort.
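The partition-key hashing just described can be sketched in a few lines of Python. This is a toy illustration of the idea only — Cassandra's real partitioner uses Murmur3 and virtual nodes — and the node names here are made up:

```python
import bisect
import hashlib

def token(key: str) -> int:
    """Hash a partition key to a position on the ring (0 .. 2**32)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

class Ring:
    """Toy token ring: each node owns the arc of tokens up to its own token."""
    def __init__(self, nodes):
        self.tokens = sorted((token(n), n) for n in nodes)

    def owner(self, partition_key: str) -> str:
        t = token(partition_key)
        # first node whose token is >= t, wrapping around the ring
        i = bisect.bisect_left(self.tokens, (t, ""))
        return self.tokens[i % len(self.tokens)][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("user:42"))  # same key always lands on the same node
```

The point of the sketch is the same one the talk makes: the client (or coordinator) can compute the owning node from the key alone, so the data spreads across the ring without any central lookup, and adding a node only remaps a fraction of the keys.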
Now, having said all that, what is K8ssandra — because we're talking about Kubernetes here, right? Can you really run a database on Kubernetes? This was one of the articles that one of our colleagues, Christopher Bradford — who was on the other side then, he did not work for DataStax at that time — wrote: can you really run a database on Kubernetes? That was about four or five years ago, and obviously a lot of things have evolved; you'll see today that there is something called cass-operator that we rely on. So a lot has changed, and today data and Kubernetes definitely go together. It's not just about stateless apps, it's not just about 12-factor apps, if you remember all that — it really is prime time for putting data on Kubernetes. One of the cool things about Kubernetes, again, is that whether you want to run it on-prem or on different clouds — I think I saw Siebel in the previous session — it really doesn't matter, right? Very straightforward. You want to install it on your laptop, you want to install it on-prem — everything pretty much works the same. To install Cassandra, we install a number of different components, and you can see here that we have all of these installed with something called cass-operator. You can see on the right that this installs Cassandra; we have Reaper for repair, Medusa for backup and restore, and Stargate, which is a unified API that developers love, because it has REST, it has CQL, it has GraphQL — it's basically a unified API. We also have Traefik, if you want to provide ingress, and then there is, of course, Prometheus and Grafana for metrics. So all of these are automatically installed for you.
We leverage MinIO so that you can really do it on any of the different clouds. How do you install it? Very simple: helm repo add, helm repo update, and helm install. Very simple, right? So that's it for a single cluster. Now what happens in the case of a multi-cluster, right? We introduced an operator called the K8ssandra Operator, and we'll see that today. Basically, what happened was we pushed Helm to the limit and then built a new operator called the K8ssandra Operator, okay? And if you want to understand a little of the rationale for why we did this, you can look at a particular article on The New Stack. Why multi-cluster? Cassandra has always been designed for multi-region. Partition tolerance is extremely important for us, so even if one of the nodes goes down, or maybe a set of nodes goes down, you should still be able to service the request from the customer, right? Nodes automatically route traffic to nearby neighbors; they use something called a gossip protocol, and really the theme here is going to be: once you set up the networking, everything just falls into place, okay? Data is automatically and asynchronously replicated, and from a Cassandra perspective the cluster is very homogeneous. Kubernetes, on the other hand, was not designed for multi-region. There were some design decisions that were consciously taken. I recently did a panel on networking for KubeCon, and essentially even there, there were certain design decisions that, in hindsight, the designers would probably have done a little differently.
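For reference, the single-cluster Helm install mentioned a moment ago looks roughly like this — the repo URL and release name follow the K8ssandra docs as I remember them, so treat it as a sketch rather than gospel:

```shell
# Add the K8ssandra Helm repo, refresh the index, and install one release.
helm repo add k8ssandra https://helm.k8ssandra.io/stable
helm repo update
helm install k8ssandra k8ssandra/k8ssandra
```

That single `helm install` is what pulls in the whole stack described above — cass-operator, Reaper, Medusa, Stargate, Prometheus, and Grafana — on one cluster.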
But again, the idea there was really about taking the apps of today and adapting them to the cloud, making it work on all the clouds and so on. That's really the goal for Kubernetes — it was built as a platform for building platforms and was not designed for multi-region. So how do you adapt Kubernetes to do multi-cluster, multi-region, multi-cloud, and so on? Using what we call the K8ssandra Operator. The K8ssandra Operator is still in its early days — it's not completely there yet; for example, we don't have HA available — but essentially the K8ssandra Operator is an operator for K8ssandra, and it supports multi-datacenter, multi-region Cassandra clusters, okay? It consists of a control plane and a data plane, and you'll see all that in action in a little bit, assuming my demo works fine. The control plane can only be installed on a single cluster for now, but a cluster can double as both a control plane and a data plane if you want. So how does this work, for the Kubernetes purists? Essentially what you do is inject the configs of the data-plane clusters into the control plane, so that the control plane is able to manipulate the data planes. That's really how it works. So in my demo, I'll talk about the evolution: I did this manually in my first demo, which I demoed at AWS re:Invent, right? And then, when I did the same demo at KubeCon EU just a few weeks back, I used the K8ssandra Operator to do it. I'll show you the evolution during the demo, okay? But essentially there is a concept of what's referred to as a K8ssandraCluster, and here you specify the context — remember the Kubernetes contexts I was talking about, right?
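The config injection just described is usually done by generating a "client config" for each data-plane cluster and applying it in the control-plane cluster. The script name, path, and flags below are from my recollection of the k8ssandra-operator repo, so treat them as assumptions and check the current docs:

```shell
# For each data-plane context, wrap its kubeconfig into a ClientConfig
# resource plus a secret, and apply both into the control-plane cluster.
# (script path and flag names are assumptions)
./scripts/create-clientconfig.sh \
  --src-context eks-data-plane \
  --dest-context aks-control-plane \
  --namespace k8ssandra-operator
```

Once a ClientConfig exists for each data plane, the operator running in the control plane can reconcile Cassandra datacenters in those remote clusters.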
One is the East context and the other is the West context, and basically you can set up each of these datacenters. You'll see in my example that I set up these different contexts, and each of those will have their own storage provider, because each of these clouds has different storage providers and all that. You can tweak all of that, and then finally install the K8ssandraCluster via the K8ssandra Operator, okay? And this is how you do it: inject the client configs into the control plane. So you have the data planes and the control plane, and you take the client configs from the data planes and inject them into the control plane, and thereafter everything is ready to go. In terms of demos, I wanted to set this up a little bit, because otherwise it's going to be a bit hard to explain the whole thing. Like I said, the critical part of multi-cluster with Cassandra is setting up the networking in a way that all of this happens pretty much automatically, right? For GKE, which is where I started, it's pretty straightforward to do that; on EKS, not so much. Luckily for me, I stumbled upon a project called EKS KubeFed, which is based on KubeFed but built for EKS, and what it does is set all of this up for you. It sets up a Bastion host, which is the one you use to jump into these different clusters, right? And then — you'll see my installation in a little bit — you have one cluster at eu-west-1 on a VPC in the 172.21 range, and then I had a second cluster, RAGS Fed 2.2, at eu-central-1. One of these, I believe, is in Dublin and the other in Frankfurt, and you'll see both of them in action. The nice thing about EKS KubeFed is that it sets up all this networking.
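The K8ssandraCluster resource described a moment ago, spanning two injected contexts, might look roughly like the following. The API version, field names, and storage-class names are from my recollection of the operator docs and each cloud's defaults, so double-check them against the current CRD before using:

```yaml
# Hedged sketch of a two-datacenter, two-context K8ssandraCluster.
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo
spec:
  cassandra:
    serverVersion: "4.0.3"
    datacenters:
      - metadata:
          name: dc1
        k8sContext: east            # injected data-plane context
        size: 3
        storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: gp2          # EKS default; differs per cloud
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 5Gi
      - metadata:
          name: dc2
        k8sContext: west
        size: 3
        storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: standard-rwo # GKE default; differs per cloud
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 5Gi
```

This is where the per-cloud tweaking shows up: each datacenter points at its own context and its own storage class, and the control-plane operator fans the datacenters out to the right clusters.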
I'm not a networking guru myself, but once you set this all up, it's very easy to install Cassandra, because like I said, it looks for the neighboring nodes, builds up the cluster, and forms one big cluster. All of this happens automatically, okay? So with that said, let's see if I can jump to the demo. This multi-cluster setup with the K8ssandra Operator goes into a lot more detail of exactly what it does — it has the different contexts that are injected, the different components that can be adjusted, and so on. But let's not worry too much about that; let's go to the demo and see how it goes. All right, my session has been terminated, so I'm going to try to connect again. This is the KubeFed demo, and basically I'm connecting to the Bastion host, okay? Let me clear this and set a few things up... let me try it once more — I'm just trying to get to the shell where I've done the demo before. Okay, one more try; if it doesn't work, I'll move on to the next demo. So you can see here that I'm going to show the contexts, okay? You can see there are two clusters: one is the Fed 2.1 and the other is the Fed 2.2, right? All I need to do is describe the nodes and get the topology, right? And you'll see here that this is eu-central-1: 1b, 1a, 1c. So Cassandra by itself is rack-aware, but what I'm doing here is showing the larger cluster. And for those of you who are Cassandra admins, you know that you use a utility called nodetool, and that's exactly what I'm doing: I'm providing the admin username and the password, and you'll see that it's basically one big cluster, DC1 and DC2.
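The nodetool check being run here might look roughly like this from the Bastion host. The pod name, container name, and credential names are assumptions for illustration — in practice they come from the release name and the superuser secret:

```shell
# Exec into a Cassandra pod and ask for cluster status; "UN" means Up/Normal.
kubectl exec -it demo-dc1-default-sts-0 -c cassandra -- \
  nodetool -u demo-superuser -pw "$CASSANDRA_PASSWORD" status
```

The status output lists every node in every datacenter, which is how you can see DC1 and DC2 reporting as one cluster even though they live in different Kubernetes clusters.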
And you can see here that all of them are up and normal — UN, up and normal — and so on. And if I run kubectl get pods -o wide, you'll see all these different components: the Reaper, the other components, the DC2 pods which are in different racks. And you'll see somewhere here that it's running cass-operator underneath. Okay, so that's exactly what is going on in this particular cluster. Now if I switch contexts and go to the other cluster, and do the same thing — if I look at the topology — you'll see that this one is smaller, right? So essentially, let me do kubectl get nodes -o wide, okay? And this might give us a little more insight. You can see here it's all 172.21 — 172.21 and so on. So if I go back to the other one, you'll see that those are probably 172.22. So all of these networking aspects are set up by EKS KubeFed, and I really didn't have to do anything, okay? So this is as far as just EKS is concerned. The next part I'm going to show is a really cool demo using a product called Aviatrix. Essentially what it does is enable the networking, very similar to what happened with EKS, right? What this does is set up the multi-cloud: for AKS it sets up 10.2, for EKS it sets up 10.1, and for GKE it sets up 10.3. It does all the routing, the peering, and all that. So to look at this multi-cloud setup, what I'm going to do is go look at Lens, okay? And I've set this up already in the hotbar, right? And this is the AKS AVX cluster, right? And if you look at the pods, you can see there are different pods here. You can take a look at what's going on here, you can take a look at the deployments. And if you dig through this, right?
You will see — for example, let's look at the namespace k8ssandra-operator, right? And if you look through this particular operator — I'm trying to figure out where it is — basically the control plane is set to true for this one, right? So this is for AKS. For EKS, if I do the same thing again — if I look at the pods and look at the operator — you'll see that, again, this is 10.1, which is for EKS, and you can see that the K8ssandra control plane is set to false. So this is one of the data planes, right? Likewise, if I go to the GKE cluster, same thing — you'll see the same thing as well. Hopefully, right? Oh, I had to look at the pods — I mean, look at the k8ssandra-operator — and you'll see here that the control plane is false. And essentially, all of this is being managed by the K8ssandra Operator, okay? So now, if I want to take a look at the unified cluster, what I can do is go to the AVX setup. You'll see here there are three clusters, right? The AKS, the EKS, and the GKE. And again, I'm gonna run the same utility that I ran before, okay? Except that here I'm specifying the username and the password, and then I'm gonna take a look at the status. Okay, do you see here — "Hey, Rags, sorry to interrupt you." "Yeah, I'm almost done, just trying to check on time, yeah." Yes, this is the AKS, the EKS, and the GKE, and what I've done is install this through the AKS cluster, which is really cool, because all of these are joining together and creating one huge cluster, okay? So with that said, let me go back to my presentation and finish up. If you have more questions, feel free to jump on Discord — we have a very active discussion on Discord — but you can also go to k8ssandra.io.
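Checking which role each operator instance is playing, as the demo does through Lens, can be sketched from the CLI like this. The context names, deployment name, and the K8SSANDRA_CONTROL_PLANE env var are assumptions based on what the demo describes:

```shell
# Print the control-plane flag on the operator deployment in each context:
# expected "true" on AKS, "false" on the EKS and GKE data planes.
for ctx in aks eks gke; do
  kubectl --context "$ctx" -n k8ssandra-operator get deploy k8ssandra-operator \
    -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="K8SSANDRA_CONTROL_PLANE")].value}'
  echo " <- $ctx"
done
```

This is just a quick way to confirm, per cluster, the same control-plane/data-plane split that the demo inspects pod by pod.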
We have our YouTube channel, DataStax Devs, with all kinds of different workshops, and we even have badges, which are very, very popular. So feel free to subscribe to the channel and attend the workshops — you have to do some homework, we grade it, and you can get badges as a result. That said, I really want to thank the organizers of KCD Chennai for putting on this event and having me here. Again, hope to see you all in person. Take care, thank you.