Cool. Hey, welcome to the talk. This is about operational testing in Cassandra. Let me start with a brief introduction. I'm Pushkala Patabi Raman. I head performance and quality here at DataStax. We're going to talk about operational testing and what its impact can be. Some of our largest DSE customers have leveraged a systematic approach to operational testing in Cassandra, and it has enabled them to short-circuit the time it takes to onboard applications into their databases. It's been highly effective, and this talk is a sneak preview into what it is all about.

Imagine you're sitting in a room with an application about to go live, and you're having a discussion with your database team, your application team, and your architects. What are the main topics of discussion? Hey, this is my database topology. This is going to be my infrastructure footprint. I really have just this much capacity, and I need a specific version of the driver because this is the language I'm going to write my application in. Finally, the sensitivity of the application drives what kind of business continuity processes and controls you want around it. This conversation has to happen, and it will happen. There are no two ways about it. This is how we build sustainable applications that cater to the needs of the business.

But what does it translate to? It translates to two personas: an operator persona and a developer persona. The operator persona is the quintessential database-centric team that worries constantly about the uptime of your database, the effectiveness of transactions, and whether we are well stocked on infrastructure. The developer persona's concerns are end-user experience, application error rate, latency, and so on and so forth.
So there is a balancing act between the operator persona and the application persona, and an equilibrium needs to be established so that the application launch is successful. Let's distill it even further. What happens behind the scenes? Behind the scenes, the teams again go back and figure out what business need we are trying to solve, and what distribution of infrastructure this particular use case needs. Is it three geos, five geos? Do we need real-time replication across all keyspaces, or just one? Localization rules, data residency rules, you name it, everything kicks in. And then of course the security configuration: what would be the ideal security configuration that meets the needs of the business as well as the security posture of the organization? Again, the goal is not to short-circuit or eliminate these conversations, but to come up with a structured approach so that the time to resolution gets shorter and shorter as we mature.

Hence operational testing. What it means to do operational testing is to leverage the versatility of Cassandra, understand how Cassandra as a database reacts under load that is relevant to that organization and that application, and make metrics-based decisions to accelerate the choices we make. Then there is ease of developer connectivity. The goal as an operator is to keep developers out of a guessing game about what a particular config should be, how they should approach Cassandra as a database, or even how to learn Cassandra as an application developer. How do we simplify that so that adopting Cassandra becomes very simple for our app developers? And finally, left-shift security processes so that there is, again, no cognitive overload of "am I doing it right?"

A typical testing landscape centered around the database involves the following. There is a piece of the pie that is the Cassandra connectivity test.
Am I able to connect? Is my application able to connect? This involves where your application is situated and what driver it is using. Then you have pre-production infrastructure health alerts. Let's keep it real: in most organizations, production and pre-production infrastructure are distinctly different in layout, shape, and size, for various reasons. So you need, again, a systematic approach to whether we are getting the right health metrics and alerts that would carry over and translate into production alerts. Business continuity and DR validations: most organizations do not have the ability to perform the same due diligence on business continuity and DR in lower environments as they would in production. So how, and where, do we build that muscle if we don't have the same level of ability? Finally, the quintessential big elephant in the room: database version upgrades. Whenever there is a database version upgrade, all bets are off. You need to go through more extensive testing and ensure that nothing happens to the existing infrastructure as well as the new one. So in a typical operator's world, this is what the testing landscape looks like.

How are we changing it? What does it mean to take an operational testing approach? It means left-shifting: changing all this point-in-time testing into a more continuous pattern, and developing the muscle for it. For instance, this is what we do at DataStax with respect to DSE. We have nightly basic connectivity and security evaluation tests, then infrastructure readiness evaluation under load that runs weekly, and then an even more exhaustive BCP/DR evaluation that runs quarterly. The complexity increases at each step. When you do basic connectivity and security evaluation, you verify driver validity. Are we connecting right? Are the metrics looking optimal?
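The nightly/weekly/quarterly ladder can be sketched as data. This is an illustrative model only; the tier names and scenario lists below are my own shorthand, not the actual DSE test suite:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TestTier:
    """One rung of the continuous operational-testing ladder."""
    name: str
    cadence: str                      # how often this tier runs
    scenarios: List[str] = field(default_factory=list)

# Complexity and infrastructure footprint grow with each tier.
# Scenario names here are hypothetical placeholders.
TIERS = [
    TestTier("connectivity-and-security", "nightly",
             ["driver validity", "auth and TLS handshake", "baseline metrics"]),
    TestTier("infrastructure-readiness", "weekly",
             ["sustained load", "wider topology", "health alerts"]),
    TestTier("bcp-dr-evaluation", "quarterly",
             ["dc loss", "cross-dc failover", "restore drill"]),
]

def scenarios_for(cadence: str) -> List[str]:
    """All scenarios that should run at the given cadence."""
    return [s for t in TIERS if t.cadence == cadence for s in t.scenarios]
```

The point of modeling it this way is that the schedule, not an engineer's memory, decides what gets exercised and how often.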
When you do the infrastructure readiness evaluation, you up the game a little: the load quotient goes higher and your infrastructure footprint goes wider. And when you move on to the quarterly BCP/DR evaluation, it gets even more complicated, from both an infrastructure and a load perspective.

So we do all this. What does it eventually tell you? It gives you three signals. One, it removes the guessing game from "is the database ready to accept a new application or workload type?" Two, it nudges the teams toward what we call a reference architecture pattern, where the teams have a proven, well-tested set of parameters and configurations that tend to be acceptable to most applications in the organization. When there are outliers, the delta is small and you can have a very focused discussion about it. This creates a common denominator of sorts. And finally, runbooks. This is perhaps the biggest outcome we have seen: because of the rigor of doing all these tests, we have come up with a lot of runbooks, and that has shortened our time to react when an issue happens in production. Operational testing has this very pleasant outcome of generating runbooks for all the weird scenarios that can come up.

So what is the recipe? The recipe has three ingredients: access to infrastructure, access to business context, and automation. All three are vital. Without access to infrastructure, it's just smoke and mirrors; what are you actually testing all of this on? So that is the first buy-in: as an organization, we have to lean in and accept that there is an infrastructure investment in this approach. Two is access to business context. Traditionally, operators have their own workloads that they model, and their testing is based off rather simplistic workloads that they come up with.
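To make the contrast concrete, here is a minimal sketch of a workload carrying business context: an operator-declared op mix whose ratios mirror production traffic rather than a simplistic synthetic pattern. The operation names and the 7:2:1 ratio are invented for illustration, and this is not NoSQL Bench's actual workload format:

```python
import random

# Invented op mix -- each entry is (operation name, relative ratio
# derived from business context, e.g. "reads dominate writes in prod").
OP_MIX = [
    ("read-order-by-id", 7),
    ("write-new-order", 2),
    ("scan-recent-orders", 1),
]

def sample_ops(n: int, seed: int = 42) -> list:
    """Draw n operations according to their business-derived ratios,
    so synthetic load approximates the production traffic shape."""
    rng = random.Random(seed)
    names = [name for name, _ in OP_MIX]
    weights = [w for _, w in OP_MIX]
    return rng.choices(names, weights=weights, k=n)
```

A thousand sampled operations will be roughly 70% reads, matching the declared mix, which is the property a representative workload needs.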
Now, with NoSQL Bench — and there is a session tomorrow peeling the onion, looking under the covers of how NoSQL Bench works and how teams can leverage it — even an operator persona has the ability to create those workloads with business context. When they do that, you don't have to worry about what language the developers are going to code in, or understand it more deeply; there is a certain degree of translation into a no-code format that enables even operator personas to come up with a workload that is representative of, and similar to, the actual production workload that's coming in. And finally, automation is key. You can have access to infrastructure, you can understand your business context to a T, but if there is no automation to help accelerate, it's going to be a slow, long-drawn process with no immediate impact. So this is the recipe we adopted, and we found it very successful.

Further, what does it mean in our world? As I said before, for our nightly tests we have small ephemeral clusters that come up every night, and we do exhaustive testing; thousands of tests get run. Then we have a weekly soak test on about 45 DSE nodes — this is in the context of DataStax Enterprise, again a Cassandra version — across three DCs; it runs for 48 hours with a moderate workload. And the geo clusters are really huge: 90-node DSE clusters, representative of some of the complex workloads and data patterns that are more demanding on the infrastructure footprint. Again three DCs, and we tend to put our newest and greatest features at scale to the test there; it runs for three weeks. As you can imagine, the infrastructure footprint, the complexity, and the time involved to run all of this grow exponentially as we go toward the right. Again, there are three key ingredients in this recipe, starting with infrastructure automation.
Fallout is an open source tool; it was homegrown, but we open sourced it a year ago. It is an orchestration engine that helps bring up infrastructure, set up your clusters, install a specific version on your cluster, and literally go through the whole process. ctool is, again, a component that helps us manage the infrastructure elements better. And again, NoSQL Bench — an amazing tool. I would highly recommend that all of us take a look at NoSQL Bench and Fallout. And finally, scenarios. The evolution of scenarios is pretty key. We start with some table-stakes scenarios, but as time evolves there are going to be more scenarios, and we keep adapting and adopting more of them.

So what is the outcome of all of this? Here is a sample outcome of what all of those operator tests lead to. We generate a large report; I have about 13 scenarios here, and this is a year-old report that I pulled out. You can see how deep it goes: what happens when I remove a DC, add a DC, rebuild an existing DC? And all of this is in the context of 90-node, geo-distributed Cassandra clusters. It's not a small three-node cluster; these are pretty heavy clusters at this point. Another outcome is a datasheet that gets generated out of these tests, which forms an effective conversation piece when we sit down and go back to slide three: your application team is in the room and they're having a conversation about how do I design my database, this is my use case. It's a good conversation piece that helps bring people together. We've seen this happen time and again with some of our largest customers, and the feedback has been very productive: this indeed helped us accelerate and reach a conclusion fairly quickly. It covers our driver settings, our workload configuration, our database configurations, and what the operations are and what things look like during and before an operation.
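One way to picture a datasheet entry is as a structured record per scenario, with metrics captured before, during, and after the operation. The field names and numbers below are invented placeholders to show the shape of the idea, not the real report schema:

```python
# Hypothetical shape of one datasheet entry for one operator test scenario.
# All values here are illustrative, not measured results.
scenario_result = {
    "scenario": "rebuild-existing-dc",
    "cluster": {"nodes": 90, "dcs": 3, "distribution": "geo"},
    "driver": {"name": "java-driver", "version": "4.x"},
    "metrics": {
        "before": {"p99_read_ms": 8.2, "error_rate": 0.0},
        "during": {"p99_read_ms": 14.6, "error_rate": 0.001},
        "after":  {"p99_read_ms": 8.4, "error_rate": 0.0},
    },
}

def within_budget(result: dict, p99_budget_ms: float) -> bool:
    """Did p99 read latency stay within budget in every phase,
    including while the operation was in flight?"""
    return all(phase["p99_read_ms"] <= p99_budget_ms
               for phase in result["metrics"].values())
```

A record like this is what turns the datasheet into a conversation piece: an application team can check its own latency budget against the measured before/during/after numbers instead of guessing.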
This is fairly condensed but effective information; put in context with the business team and the application developers in the room, it helps reach an objective judgment quickly. So what is the key takeaway here? The key takeaway is: structure the database conversation. There is a common denominator to all of this, and that common denominator can be deciphered if you approach database architecture very systematically. The whole notion of reference architecture has been proven to help a lot of organizations scale their database operations efficiently and quickly. That is something we should all take away from this conversation here today. And that's it.