Thanks everyone for turning up, and thank you, obviously, to the millions of people who'll be watching this online. My name is Johnny Miller. I'm from a company called AxonOps, and my talk is about our tool, AxonOps, which gives you one-stop operations for Apache Cassandra. AxonOps is essentially a tool that allows you to manage, monitor and maintain your deployments of Apache Cassandra. As for my background and my team's background, we've been using Cassandra for many, many years. I started using Cassandra at 0.6, when it was pretty bleeding edge, and we've had hands on most of the Cassandra weirdness in the world over the past decade. We've watched the tooling evolve over that time. But what we found was that we were spending a lot of time looking after the tooling that looks after Cassandra, as opposed to looking after Cassandra itself. There are some great tools out there, and I'm a big fan of Grafana, Prometheus, Rundeck, Elastic, all that good stuff. But when we were deploying Cassandra into environments, cloud or on-prem, we would spend far more time doing the dance of getting those tools deployed, secured and kept current than actually looking at Cassandra. It's hard enough getting Cassandra deployed into a restrictive environment. You come in and say: I'm going to drop Cassandra in there, I'm going to drop half a dozen different agents, I'm going to open up a load of ports and start sending stuff around your network. We work with banks, insurance, governments, healthcare, the kinds of places that are really sensitive about what you drop on their servers. So we decided to take a step back and build a specific tool that gives you, as the strapline says, one-stop operations for Apache Cassandra.
And one of the call-outs there is that we also developed our own bidirectional network protocol, which lets us collect all the logs and metrics from Cassandra and also act on the cluster with that information. So we have one network connection, one socket, and that carries all the logs, all the metrics, and all the control traffic for doing things on your Cassandra cluster. If you want to have a little look at AxonOps, there is a website, demo.axonops.cloud, which is a fully working AxonOps deployment monitoring a Cassandra cluster. You can click around and have a little look. I'll show you a demo as part of this talk, and you can see the features we support. We support Cassandra 3.11, 4.0, 4.1, and 5.0. We have 5.0 beta-1 running. I wouldn't suggest anybody puts it into production, but we're kicking the tires pretty heavily on Cassandra 5, and as soon as it goes GA we will literally be at the door supporting it. We've done a lot of work to take in the new features: we have specific dashboards and instrumentation for vector search and all that kind of new stuff as well. It also gives you performance dashboards, as you'd expect, a way of looking at all of your Cassandra logs, and we have service checks, alerting and notifications, backups, repairs, rolling restarts, reporting; basically everything you need to look after your Cassandra cluster. But why don't I show you instead? Let me pop out of the presentation. This is the demo.axonops.cloud cluster I was telling you about. On the monitoring side, when you come in you see a whole bunch of dashboards that come preconfigured and deployed for your Cassandra cluster. These are based on our experience and our recommendations about what you want to be looking at, monitoring and keeping an eye on. But it's completely customizable.
You can go in, create your own dashboards, create your own graphs; it's entirely within your control. It's worth pointing out that the language we use to define these dashboards is PromQL: we have essentially a PromQL interpreter. Everybody's familiar with PromQL, so we figured, why make something else? So we support PromQL-style queries for producing dashboards or defining alerts across your metrics. On the monitoring side we also have your logs, as you would want, so all the logs coming in from your Cassandra cluster, and then we have service checks, and I'll go through each one of these as we progress. When you start looking at the monitoring side, there were really two problems. The first is the amount of data you want to see in a dashboard. With Cassandra there is an enormous number of metrics you need, and the traditional tooling was essentially not capable of keeping up with that volume. So what would you end up doing? You'd end up reducing the metrics you were taking off Cassandra. And the number of metrics really correlates with usage: the number of keyspaces, the number of tables. As the cluster gets more usage, you get more metrics, and you end up in this horrible cycle of blacklisting lots of metrics. Then when something goes wrong, you're like: I really don't know what went wrong, because I didn't capture all those metrics. So we can capture tens of thousands of metrics off your Cassandra node, which is pretty cool, I have to say. And we don't use JMX for this, because what we also found was that when you're capturing those metrics through JMX, you actually hit the JVM quite hard, and you interfere with what you want your JVM to be doing, which is servicing the requests to your cluster. So we put a lot of work into getting those metrics out of Cassandra really efficiently.
You don't notice it happening, despite the volume we can take. The flip side, the second challenge, is the precision of those metrics. The other thing you typically end up doing is reducing the precision. You say: oh, I'll sample every minute, or every five minutes, because you can't cope with the storage. We support five-second granularity on the metrics. We could support even finer, but we figured five seconds is probably okay. So when it comes to inspecting what's happening on your cluster, you can really dig in to a very fine point on whatever you're trying to look at. And that might sound trivial: five seconds versus a minute, what's the issue? But when you've got 300 servers and you have to search for something across a five-minute window of logs and metrics across those 300 servers, that's a lot of stuff to be looking at. The smaller I can make that funnel, the more precise I can get about whatever I'm looking at, and the faster I can find out what's going on. We also developed a graphing engine that's complementary to how we deal with metrics, so we're able to render thousands of data points and stream them through. If you're doing this through something like Grafana, I'm pretty sure everyone's experienced this: it's three in the morning, you're dealing with an issue, and Grafana just starts dying because it can't cope with the number of metrics, the number of servers, et cetera. We can render literally thousands of servers, thousands of tables, all in one dashboard and, touch wood, you will not even see those kinds of problems. That was a big deal for us. When you look at the dashboards, all of these are defined through PromQL: you just write the Prom query for the dashboard and it renders it, with a selection of display choices.
And the nice thing, as I said, is that you can really zoom in to the level you want, and as you zoom in we also zoom in on the associated logs for that time window. Of course you can filter on whatever else you want too. So when it comes to asking, okay, I had a spike, what was happening at that time, it's all there in one place. You don't have to be popping out into an Elastic dashboard or something else, and you're not copying time windows around: I've got this window here, I've got to go to that dashboard there and paste this window in. It's all in one place. The other thing we've got on the monitoring side is what we call service checks. Service checks are the more custom type of check you might want to run on your Cassandra node. Things like: are my SSL certificates still valid? Have they been revoked? Is this port listening? All that type of thing. What we give you there is the ability to create custom service checks with a bash-like syntax. It's not bash; we stop you doing things like rm -rf and the like, but you can express essentially any check you want to happen on your server, and you can configure AxonOps to run it, say, every minute. In your script, you return a non-zero exit code from your check and you get an alert triggered. We have people using this to monitor that some other agent is running, or that a port is listening, et cetera. The nice thing is we also template out a lot of the values. For example, if you wanted to write a service check that says, check my Cassandra port is listening, you don't hard-code port 9042; you just put in a template variable. Same with the address it's listening on. Each check gets pushed to each node and templated with the configured values for that server. That makes these very easy to produce, and it's very rich.
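As a rough illustration of how such a templated check behaves, here is a hypothetical Python sketch. This is not AxonOps's actual syntax: the {{placeholder}} notation, the alerting print, and the nc command are all assumptions for illustration.

```python
import subprocess

def run_service_check(script_template: str, node_config: dict) -> bool:
    """Render per-node values into the check, run it, and treat a
    non-zero exit code as an alert. Returns True if the check passed."""
    script = script_template
    for key, value in node_config.items():
        # hypothetical {{placeholder}} syntax, filled in per node
        script = script.replace("{{" + key + "}}", str(value))
    result = subprocess.run(["bash", "-c", script], capture_output=True)
    if result.returncode != 0:
        print(f"ALERT: check failed on {node_config.get('listen_address')}")
        return False
    return True

# "Is my Cassandra native port listening?" templated rather than hard-coded:
check = "nc -z {{listen_address}} {{native_port}}"
node = {"listen_address": "127.0.0.1", "native_port": 9042}
```

Because each node supplies its own configured values, the same check definition can be pushed to every server without editing addresses or ports by hand.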
We have clients doing really complex checks through this. Then, on the maintenance side, there's a whole raft of things you're going to want to be doing with Cassandra, so we provide all the tools you'd expect from a maintenance perspective. The first one, which is a really interesting space in Cassandra for everybody using it, is repairs. If you don't know what repairs are: repair is an absolutely necessary operational maintenance activity that you have to run across your cluster on a regular basis, and it involves a lot of complex scheduling and management. I've seen every flavor of how you would run repair over the past decade, and we basically said: this is far too complex. So we've developed a feature called adaptive repairs. With adaptive repairs, if you want to enable it, you toggle a switch, and that is it; you do not have to worry about repairs on your cluster anymore. If you add new tables, if you have different GC grace settings, whatever happens on your cluster, it automatically picks this up and repairs that data for you. But what it also does, which is the cool part, is manage intensity. Typically, when you kick off a repair, that horse is out of the stall; it is running. It's competing for resources on your cluster: it's using CPU, it's using IO, it's using network. Then all of a sudden it starts to interfere with the performance of your requests, and you go: oh, what am I doing? I've got to kill the repair because my 99th percentile latencies have gone off a cliff. And then you're continuously going: okay, now I'm going to turn it back on again, et cetera, et cetera.
What we do with adaptive repairs, because we have this fantastic line of sight into what's happening on the cluster, is look at the performance and increase or decrease the intensity of the repairs based on what's happening. You don't need to go in yourself: if all of a sudden you have a big spike in requests coming in, adaptive repairs will slow down or stop until that goes away. Then, once it has the headroom, it comes back in, speeds up, and keeps the repairs going, which is very cool. But the even nicer part is that most clusters have a somewhat seasonal workload: periods of high load, periods of low load. The cool thing with adaptive repairs is that when your cluster is idling, we take advantage of that. We get ahead of the repair, ahead of the GC grace period. What that means is that as the adaptive repairs run through their cycles, you're essentially beating GC grace, and your data ends up in a much more repaired state, because it's being repaired naturally based on what's happening. Overall you get much more consistent answers from your data. We also have scheduled repairs for the old-school repairs you do still want under certain circumstances. The other thing on the maintenance side is rolling restarts. Now, you might think: what's wrong with a rolling restart? I can just run it and away you go. But when you're looking after a 300-node Cassandra cluster in four DCs with multiple racks and multiple AZs, if you do one node at a time, you're going to be waiting a long time for that rolling restart to finish. So what we have is the ability to create really complex restart strategies in AxonOps.
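Going back to adaptive repairs for a second: the feedback loop just described can be sketched as a simple control loop. This is purely an illustration; the thresholds and step sizes are made up, and this is not AxonOps's actual algorithm.

```python
def adjust_repair_intensity(intensity: float, p99_latency_ms: float,
                            target_ms: float = 50.0) -> float:
    """Return a new repair intensity in [0.0, 1.0] based on observed
    request latency: back off under pressure, ramp up when idle."""
    if p99_latency_ms > 2 * target_ms:
        return 0.0                      # heavy load: pause repairs entirely
    if p99_latency_ms > target_ms:
        return max(0.0, intensity / 2)  # under pressure: back off
    return min(1.0, intensity + 0.1)    # headroom: get ahead of GC grace

# As load spikes, repairs pause; when it subsides, they ramp back up.
intensity = 0.5
intensity = adjust_repair_intensity(intensity, p99_latency_ms=120.0)  # -> 0.0
intensity = adjust_repair_intensity(intensity, p99_latency_ms=20.0)   # -> 0.1
```

Run on every sampling interval, a loop like this is also what lets idle periods be exploited: intensity climbs toward the maximum whenever the cluster has headroom.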
So you can do things like restart more than one node in each data center at the same time, or one node in each rack in each data center at the same time, and really build it around what you're doing, so you can get through that rolling restart much faster and be much more structured about the restart strategy you use. But the thing I love about it, because I hate this, is that you often have a change window at, like, three in the morning. With this, you set up the rolling restart, schedule it to run, go to bed, and the rolling restart happens. It also integrates with the alerting. You can set it up so that when the restart starts you get a notification, when it finishes you get a notification, and if it fails you get a notification. So you can schedule a rolling restart such that, if it fails, it triggers PagerDuty, wakes me up, and I come and do what needs to be done. That's quite useful for my sleeping patterns. And then, well, it wouldn't be much of a database management tool if we didn't support backups, okay? You want to be backing up your cluster. This is one of the things I see in the field on a regular basis: people think that because Cassandra is highly resilient and I can have multiple replicas of my data, et cetera, I don't need to back anything up. That is a fallacy. Even if you think you don't need to back up your data, you should be looking at backing up the system data on your nodes. And it's really important, because it isn't necessarily about whether I can rebuild my node if something goes wrong; it's your recovery time. How long does it take to get that node, or that data center, back online when something's gone wrong? So backups are really important. It's a database. You should be backing it up.
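Going back to the restart strategies for a moment: the rack-aware batching described above, at most one node per rack per data center restarting at a time, can be sketched like this. The node names and topology here are hypothetical.

```python
from collections import defaultdict

def restart_batches(nodes):
    """Group (node, dc, rack) tuples into batches so each batch contains
    at most one node per (dc, rack). Batches run sequentially; nodes
    within a batch can restart in parallel."""
    by_rack = defaultdict(list)
    for node, dc, rack in nodes:
        by_rack[(dc, rack)].append(node)
    batches = []
    while any(by_rack.values()):
        # take the next node from every rack that still has one waiting
        batch = [queue.pop(0) for queue in by_rack.values() if queue]
        batches.append(batch)
    return batches

topology = [("n1", "dc1", "r1"), ("n2", "dc1", "r1"),
            ("n3", "dc1", "r2"), ("n4", "dc2", "r1")]
# Batch 1 restarts n1, n3 and n4 in parallel; batch 2 restarts n2.
```

On a 300-node, four-DC cluster this is the difference between hundreds of sequential restarts and a handful of parallel batches, while never taking two replicas in the same rack down together.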
With our backups you obviously have the ability to schedule backups and their retention, but also to take those backups off-server. We support S3, SCP, local mounts, whatever you like, for copying those backups off-server. And remember, this is all happening within your environment: we're not taking your backups and copying them into AxonOps. This is all happening within your own infrastructure, your own Amazon, your own Google. You basically set up a backup, schedule it, it runs, and away you go. But the cool thing about the backups is what we do around deduplication. Traditionally, when you take a backup of Cassandra, you're doing what's called a nodetool snapshot, and what that does is create hard links to your SSTables. Every snapshot you take is another set of hard links on the file system. What most people do is take that snapshot and copy the whole thing off-server. What we do is look at the deltas between each of the snapshots and take only the changed SSTables off-server. The effect is that the off-server storage requirements drop significantly, because you're typically not churning through the data at a pace where you have a completely new set of SSTables every day; there's a lot of overlap between snapshots. We maintain a manifest of those SSTables off-server, so when it comes to the cost of that storage, it comes down. And we've had people adopt this specifically for that reason: if you've got a large Cassandra cluster backing up to S3, even S3 will end up costing you a lot of money. We solve that problem. And, as you'd expect with backups, you want to be able to do restores as well, and we support those too. The other thing to point out, on the integration side, is what we support for triggering those notifications and alerts. We pretty much support everything.
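Coming back to the deduplication for a second: taking only the changed SSTables off-server amounts to a set difference against the previous snapshot's manifest. A minimal sketch, with hypothetical SSTable file names:

```python
def incremental_backup(previous_manifest: set, snapshot_files: set):
    """Return (files to upload, new manifest). Only SSTables not already
    uploaded under an earlier snapshot are copied off-server; the manifest
    records the full file set so a restore can fetch everything it needs."""
    to_upload = snapshot_files - previous_manifest
    return to_upload, snapshot_files

# Day 1: everything is new, so both SSTables get uploaded.
day1 = {"nb-1-big-Data.db", "nb-2-big-Data.db"}
upload, manifest = incremental_backup(set(), day1)

# Day 2: one new SSTable since the last snapshot; only that one is uploaded.
day2 = {"nb-1-big-Data.db", "nb-2-big-Data.db", "nb-3-big-Data.db"}
upload, manifest = incremental_backup(manifest, day2)  # uploads only nb-3
```

Because SSTables are immutable, an unchanged file name in two snapshots is the same data, which is what makes this delta approach safe and cheap.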
If there's something you want supported that isn't there, let us know and we'll add it. We have email, PagerDuty, Slack, Teams, ServiceNow, OpsGenie, webhooks, whatever you like. And the nice thing about the way we set up notifications in AxonOps is that, obviously, you have error-level alerts: you trigger an error alert and it goes to whatever you've configured for the integration. But we also give you the ability to override that with specific domain-level alerts. Take backups as an example. Maybe you have a backup team who needs to deal with backups failing. You can have the error alert for backups go to a completely separate PagerDuty, Slack, whatever you like, for that specific team to handle. That's quite useful, because we find in most enterprises there's a separation of concerns here: a backup failing is a PagerDuty alert, but not one that should wake you up in the middle of the night; you look at it at nine in the morning when you come online. There are often different requirements for what's a wake-me-up-if-something's-gone-wrong alert versus a don't-wake-me-up, I'll-look-in-the-morning alert. And we split this up across global settings, the metrics, backups, the service checks I showed you, individual node events, various commands, repairs. Do you want to be woken up at three in the morning if a repair fails? Maybe, maybe not. So you can deal with that. That, for me, is a killer feature, because you often end up just being blasted with alert noise, and the worst thing you can do with alerting is send people alerts they don't care about. Because what happens is that when you get an alert you should care about, you ignore it. So I'm a big fan of being very, very targeted about when you wake someone up, when you send that error alert, and where you send it. This gives us the way of saying: okay, hold on, the backup alert goes to Slack.
Node down goes to PagerDuty. You can configure that extensively. The other thing you can do is set up personal integrations on metrics as well. You can go into a dashboard and say: I'm personally watching this metric, this table; if this happens, send me a message on Slack, or whatever, specific to me. So you have your own stuff you might want to watch, team-level stuff and domain-level stuff, all in the one place. So that's AxonOps, pretty much, in a nutshell. We have the performance dashboards, which I've shown you, the logs, and the service checks, which I do quite like, because every environment's different and there are often weird little things you need to check for. There's the alerting and notifications: anything you want to alert on, anywhere you want to send it, it'll support it. We give you backups, and we have adaptive repairs. Don't underestimate adaptive repairs. I would say 60% of the time of anyone looking after Cassandra is spent dealing with repairs: turning them on, turning them off, recovering, restarting. And one thing to know about adaptive repairs is that when it stops, it resumes from where it last was, okay? If you go in and turn off adaptive repairs, it remembers the state it had got to, the point it had reached in the ranges being repaired, so when you start it again it just resumes from there, as opposed to: I've killed my repair after being 90% through it, and now I've got to start all over again. It's very good at checkpointing where it's got to. There's the rolling restart, which you will definitely need. And, oh, I didn't actually show you the reporting. We also have the ability to create reports. Where is it? I'll show it to you another time.
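The domain-level alert routing described a moment ago boils down to a lookup with a global default and per-domain overrides. A minimal sketch, with made-up channel names:

```python
def route_alert(domain: str, routes: dict) -> str:
    """Send an alert for the given domain to its override destination,
    falling back to the global default."""
    return routes.get(domain, routes["default"])

routes = {
    "default": "pagerduty:oncall",     # wake-me-up alerts
    "backup": "slack:#backup-team",    # look at it at nine in the morning
    "repair": "slack:#cassandra-ops",
}

route_alert("node_down", routes)  # -> "pagerduty:oncall"
route_alert("backup", routes)     # -> "slack:#backup-team"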
What we do with reports is generate a PDF report based on a dashboard you've pointed it at. What we use this for is, say, different teams who might want to be looking at a particular set of metrics, keyspaces or tables: you can set up AxonOps to generate a PDF report, and you can then distribute that to your manager or your team to say, this is the performance of whatever you care about, these are the incidents that have happened on this cluster, all generated into a PDF to send across. So, there's one more thing. I forgot my turtleneck, but one of the things we're really, really happy to announce today is that we're launching a technical preview of Apache Cassandra cloud provisioning through AxonOps. What this does is enable the provisioning and maintenance of a production-ready Cassandra cluster directly in your own AWS account, your own VPC. Basically, you go into the SaaS, you provide it with the necessary credentials for your cloud account, and you can then click a button, press deploy, and it will deploy a production-grade, secure Apache Cassandra cluster directly under your own control: your own VPC, your own EC2 instances, your own compute, yeah? Not on someone else's servers, so you maintain control of that data, but you get exactly the same quality of service, all running on your own instances. You benefit from your cloud discounts, because it's running on your stuff. You have complete control over access to that data: when it can be accessed, who can access it, et cetera. You don't have to worry about those common challenges, all your PCI and PII kinds of requirements along the lines of, I can't put this data onto a shared service. And it supports automated scaling, upgrades, everything you want to do to look after your Cassandra cluster.
So here's an example of provisioning a new cluster in AWS. You go in and select where you want to deploy to. We give you the necessary things to set up from an IAM perspective; you make sure you've got everything set up and confirm it. You pick the access key you're going to use, the region you're going to deploy to, the VPC you want to deploy under, and the subnets and AZs you want to be placed in, then you click next. Here you pick the number of nodes you want to deploy, with a nice little sliding bar to let you do this. Sixty-four would be a good thing, but we'll go with three for this one. Next, you choose the type of EC2 instance you want to deploy onto. You pick ephemeral or persistent storage, and we also show you the cost as you're doing this: the cost of the storage and the cost of the EC2 instance. Then you pick the instance type, and here we'll pick whatever. Next you give it a name, and you need a bastion key as well, to allow you to connect up and do stuff. You pick the version of Cassandra you want to deploy and click next. Out of the box, we'd like you to set up a backup; you configure that backup, and it'll create the S3 bucket if it isn't already there, or you can give it one. It gives you a summary of what you're going to do, you click create, and, obviously this is sped up because it takes a little longer to start up the instances, but then you go in and, hey presto, there is your deployed Apache Cassandra cluster with three nodes. We set it all up: all of the SSL, the keystores, the truststores; it's provisioned with authentication on; we remove the default superuser and generate a random password. It is now as hardened as you would need, appropriate for a financial institution, anything: the OS, everything. The next thing, which is quite handy, is scaling up a cluster. What we want to do here is increase the number of nodes in our cluster, and it's as simple as that: you literally toggle it up, pick the number of nodes you want to add, click scale, and then, bam, that is it. It goes in, adds the necessary nodes to your Cassandra cluster, scales it up, and you now have six nodes instead of three. Then the other thing we do is upgrading Cassandra as well. Simply put, here we're going to do an upgrade from 4.1.1 to 4.1.3. You literally go in, pick the version you want to upgrade to, click update, and, hey presto, through the magic of speeding up the video, your cluster is now upgraded to 4.1.3. When we look at the details now, it's all been upgraded for you. And then there's another favorite of mine, which is SSL certificates. No one likes managing certificates, particularly not in Cassandra, and here we make it really easy: you go in, pick refresh, bam, away you go, and it's refreshed all the SSL certificates across your Cassandra cluster. So what I've shown you there, through the cloud provisioning tool, is the ability to provision a cluster into AWS right now, with Azure, Google and other things coming down the pipeline, to scale that up and down as you need, and to perform an upgrade of Cassandra. And there's plenty more in there: we support replacing nodes, changing instance types, and OS patching as well. You saw the button to patch the OS; we do all that for you too. When you want to patch the OS for security, you click patch OS and it goes in and patches the OS with everything you need. This is pretty cool, and we're really, really excited about it.
So, basically, and I'll say this because I work for the company: AxonOps is the only solution for one-stop operations for Apache Cassandra. It gives you everything you need to monitor that cluster, maintain that cluster, back up that cluster, and now provision it as well. And the power here is that you can now, at the click of a couple of buttons, deploy the same level of security and resilience that we run on lots of clients' clusters into your own cloud account, on your own instances, in your own VPC, all under your control, without having to spend ages figuring out how to do it or writing all the automation around it. We also give you all the necessary tooling to monitor it out of the box: it has all of the dashboards configured for you, the backups you need, repairs, your logs, et cetera. And we're continuously updating it and adding more and more new features. And, you know, it works. The other thing to point out on the provisioning side is that this isn't just executing a bunch of Ansible or Chef. This is stateful. You need to be running these things in a properly stateful process flow. We don't just blindly go and upgrade; we're checking things as part of the upgrade, and if something's going wrong, we back out of the upgrade. We have specific flows. Say you're going from one major version to a new major version: we have the specific steps to support that upgrade. Perhaps you need to rewrite all the SSTables as part of that upgrade; that will be part of the specific flow for that upgrade path. Our job is to put all those smarts into the tool, so you can just go there, click buttons, not have to worry about it, and focus on Cassandra. So, thank you very much.
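The stateful, checked flow described above can be sketched as a loop that verifies health after every step and stops, where it could then roll back, instead of blindly continuing. This is a hypothetical illustration, not AxonOps's orchestration:

```python
def rolling_upgrade(nodes, upgrade, health_check):
    """Upgrade nodes one at a time; after each node, verify health before
    moving on. On failure, stop and report state so the run can be backed
    out or resumed rather than continuing blindly."""
    upgraded = []
    for node in nodes:
        upgrade(node)
        if not health_check(node):
            return {"status": "failed", "upgraded": upgraded, "failed_on": node}
        upgraded.append(node)
    return {"status": "ok", "upgraded": upgraded}

# Simulated run where the second node fails its post-upgrade health check:
result = rolling_upgrade(["node1", "node2", "node3"],
                         upgrade=lambda n: None,
                         health_check=lambda n: n != "node2")
# -> {'status': 'failed', 'upgraded': ['node1'], 'failed_on': 'node2'}
```

Keeping that state is what distinguishes this from fire-and-forget configuration management: a failed run knows exactly which nodes are on which version.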
We've got time for questions, but just to point out, there are a couple of cool things you can already go and play with. As I said, we have that demo environment; you can go and click around and have a look. We also have the AxonOps free program, which is free for up to six nodes, cloud or self-hosted: off you go, use AxonOps, get your Cassandra working. And we're actually getting quite a bit of demand for the provisioning, so we're operating it in technical preview mode. If you're interested, scan that QR code and register your interest, and then we'll let you know when you can come in and start playing with the provisioning yourself. And if you do register now, we'll also give you a free upgrade to AxonOps Enterprise, which gives you some additional features over what you'd get on the free program. So, any questions? [Audience question about compressing backups.] Well, actually, typically the SSTables are already compressed. When you create your table in Cassandra, by default it's compressed with LZ4. So yes, you could compress them, but you're not going to get a massive drop in volume from compression. [Audience question about certificates.] Yeah, absolutely. Once you've provisioned it, you've got to take those certificates and provide them to your applications too. Once it's created the cluster, you can download the JKS keystores and use them however you want. You can also copy the CA out to use for issuing certificates yourself. Our thinking there is that you might take a copy of that CA certificate and bring it into your own certificate infrastructure, maybe using it to distribute those certs: imagine Vault or something like that generating certs and pushing them to your apps as you need them. So you can download the JKS, you can take the CA yourself, drop it in and use it. Anything else? Stunned silence. Yeah. Cool. So give it a go.
If you have any questions, there's my email address at axonops.com, or you can send an email to community@axonops.com. And do sign up for the technical preview of the provisioning; get on the list and we'll also give you that free upgrade to AxonOps Enterprise. So if there are no other questions, I'll leave it there. Thank you very much for your time. I hope you liked it, and enjoy the rest of the summit.