So in this talk I'm going to go through mission-critical data, the role of Apache Cassandra in a lot of financial institutions, and the importance of evaluating the community in OSS.
A little bit before I go into the introduction about me. I'm a consultant at DataStax. We work with Apache Cassandra, DataStax Enterprise and DataStax Astra customers. The finance industry is one among many different industries that we work in. These are some of the companies that I'm allowed to talk about that we work with. You can see that we have a lot of experience working with all of you.
Oh sorry, one more slide. If you haven't heard of DataStax, DataStax was the company behind Apache Cassandra. Today it offers a database-as-a-service called Astra DB, which is Apache Cassandra compatible, as well as offerings for Apache Pulsar and other solutions. Just a couple of weeks ago DataStax got a new funding round. We got $115 million from Goldman Sachs and others. In the current economic climate, that gives you an idea of how well we are doing and how foundational we are to so many different companies and so many different sectors. We'll touch on some of that later in the slides.
And finally, the introduction about me. I have been passionate about open source for a long time now. I have been involved in Apache Cassandra going back to 2011, and I've been involved in many other open source projects since then. My day job is being a consultant. Along the way I realized that I get a lot more enjoyment out of helping people with their technical solutions and meeting new people. So my full-time job is not to be an engineer on open source, and I think that's important for some of the things that I will talk about later: you don't have to be a full-time dedicated OSS person to play a valuable and meaningful role as a contributor to open source.
I'm going to break this talk down into five sections. I'm going to do a quick introduction to Apache Cassandra and what it is, for those of you that aren't familiar with it. I'm going to do a quick run-through of how Apache Cassandra is used in the finance industry and with the customers that we work with. I'm going to wrap that up by giving you some code examples; you can grab the links or the QR codes and go home and play with those examples, those best practices, and follow the patterns that we work with in the industry. I'm going to look at where Apache Cassandra and its ecosystem are moving in the future. And lastly, and I hope most interestingly, I'm going to look at the importance of looking behind the curtain of these projects and these technologies and into the communities.
Okay, so first up, Apache Cassandra. I'm going to go out on a limb and say that there's a good chance that you use Apache Cassandra more than any other database in your day. A lot of people haven't heard of it. Apple iCloud, Netflix, Spotify, Instagram, TikTok, your bank, your insurance company, your telecom operator: chances are they're using Apache Cassandra.
Why is it that people choose Apache Cassandra? Originally it was built out of necessity, so it is a database which is always on. It scales linearly, and it's a real-time or low-latency database. In the early days of Cassandra it earned itself the reputation as the database of last resort. People would realize their RDBMS systems weren't working. It was too difficult once they were above four terabytes or eight terabytes, and for the operational application stack they needed a NoSQL solution, so they would try different NoSQL solutions. Eventually they figured out Apache Cassandra was the one that actually did the job properly. Over the years it has developed slowly at times, because the engineers were very conscious about implementing things the right way. That's the reason why Apple uses it.
That's the reason why Netflix uses it. For the biggest of the biggest clusters, it's the database you use. Also on this slide: one of the key features that Apache Cassandra offers and does best is global distribution, leading the open source world here. Global replicas.
How do we use this in finance today? There are lots of use cases out there. When we work with our customers, it's used everywhere. This kind of makes sense, because it's not a high-level technology. It's quite a raw level of technology that gives you key functionality and a foundation, so if you've got a use case that needs lots of data or must always be on, it fits.
If I was going to distill it down to a few examples that we see at the moment, let's start off with data modernization and transformation. That's a really obvious one. It's a very simple one, but it is still a large part of our activities. In the finance industry, the technology stacks that you've got are huge. They're massive and they're old. You've got thousands of components in them. Modernizing them is hard work. Then there is Customer 360, or the different 360 views, and once you get more and more of them in real time you can go to the 720, or what we call omnichannel. You've got recommendations, real-time fraud detection, risk analysis, all of that ML, everything where you take data from your warehouses or analytics and need to bring it back into feature stores, that type of stuff. Payment processing at scale. Cryptocurrency, which from a purely engineering point of view is time series data plus metadata around the blockchain. And then of course you've got data governance and democratization.
I'm going to go through a couple of architectural slides. I'm going to skim over them quickly. You can download the slide deck. For me to go into this stuff properly would probably take a couple of hours. There are just a couple of key points that I want to show on each slide.
This one is what a lot of modern architecture should look like, or what we're pushing a lot of people towards, because it's the best practice slide. What I like about this slide is that it shows a lot of open source components, and how you can get your full stack up and running largely with open source. Here you have your microservices (it's not working on that), your microservices stack. This is your application or operational stack, and here you have your data warehouses. It's a little box here; in reality it's a much, much bigger box in this industry.
The second slide that I want to look at is largely the same. Here it's focusing on data modernization, which is what we use it for. What you see here, and what's important, is the role of bringing in microservices and your streaming services. Again, this is really simple stuff, but it's what we're having to do a lot of the time with these legacy platforms to modernize them. This leads on to common tactics like CQRS (command query responsibility segregation) and domain-driven encapsulation. These are key tactics for modernization, which then opens up the possibility of transformation. DataStax has written a really awesome white paper about this that goes through the different tactics we use to bring these legacy platforms up to scratch. I do recommend that you download that white paper and give it a read.
The last architectural slide that I've got here is one for Customer 360. I'm still struggling with the double computers. It's hard work, things you haven't practiced for. This is a slide that we typically use for 360. I don't want to talk so much about the 360 aspect of it today. What I want to talk about is that there's something interesting happening in this slide. Here we're focused in on one app, so we're not really worrying about the microservices stack anymore. It's still there, but in this diagram the streaming service is the central component in the platform.
This is the realization of an event-driven architecture. Once you've got streaming services or event-driven design properly into the platform, there are a number of interesting consequences. One of them is that you no longer need to bring data from your warehouses and your analytics stacks back through into your operational stack or database. The data comes into your streaming system, and from there it is written both into your application or operational database and into your data warehouses at the same time. This is what gives us real-time systems. When you are doing your ML models and so on, instead of everything being delayed in time, you get real-time capabilities.
That was skimming through the easy stuff. I wanted to do that just to create the picture of where you should be going, because you need that platform to take the next step into the more interesting stuff: the machine learning, the real-time, the innovation. It comes back to this idea that if you want to be innovating, if you want that sort of AI in your application, in your business, you've got to have real-time data. This idea of moving data into warehouses, figuring out what's valuable and what you have, and then bringing it back: you can't innovate like that. As our CEO says, you can't innovate at the speed of a batch.
Now a couple of code examples that I've got. The first one is a simple little Python app called DS Bank. It gives you the ability to create a bank account and create different cards. You can then watch transactions happen. You can have live dashboards. This is to illustrate a base architecture for real-time processing and streaming. It's a solution based on Apache Pulsar, Apache Cassandra, GraphQL and Stargate. It shows you how you can put these things together without client-side coupling of dependencies in your microservices stack, which is what your database client drivers essentially are. It also does it with Apache Pulsar.
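To make the fan-out described above concrete, here is a minimal in-memory sketch, not the DS Bank code and not the Apache Pulsar client API. The `Topic` class, the subscription names and the event fields are all hypothetical, chosen only to show one stream of events feeding an operational view and a warehouse independently, so that nothing has to be copied back from the warehouse later.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical in-memory stand-in for a streaming topic."""
    log: list = field(default_factory=list)      # append-only event log
    cursors: dict = field(default_factory=dict)  # subscription name -> next offset

    def subscribe(self, name):
        # Each subscription tracks its own position in the log.
        self.cursors.setdefault(name, 0)

    def publish(self, event):
        self.log.append(event)

    def poll(self, name):
        # Return the events this subscription has not seen yet, then advance it.
        start = self.cursors[name]
        self.cursors[name] = len(self.log)
        return self.log[start:]


def apply_to_operational_store(balances, events):
    # Operational view: current balance per account, served to the application.
    for event in events:
        balances[event["account"]] += event["amount"]


def apply_to_warehouse(rows, events):
    # Analytical view: every event is kept as its own row.
    rows.extend(events)


topic = Topic()
topic.subscribe("operational-db")
topic.subscribe("warehouse")

balances = defaultdict(int)
warehouse_rows = []

topic.publish({"account": "acc-1", "amount": 100})
topic.publish({"account": "acc-1", "amount": -30})
topic.publish({"account": "acc-2", "amount": 50})

# Both consumers read the same events; neither depends on the other.
apply_to_operational_store(balances, topic.poll("operational-db"))
apply_to_warehouse(warehouse_rows, topic.poll("warehouse"))

print(dict(balances))       # {'acc-1': 70, 'acc-2': 50}
print(len(warehouse_rows))  # 3
```

In a real deployment the two `apply_*` functions would be separate consumers writing to the operational database and the warehouse; the point of the sketch is only that each subscription has its own cursor over the same event log.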
What we are seeing with our customer base, the people who need Apache Cassandra, is that they're getting a lot out of using Apache Pulsar instead of Apache Kafka. With those volumes of data, with global deployment and with elasticity needs, we find Apache Pulsar easier to operate, and it has a higher throughput.
The next code example is a simple cryptocurrency commerce app. It works through a MetaMask wallet, which is just a plugin in your browser. Once you've got that wallet in your browser, you can run this app. As a seller, you can just upload an image, call it an NFT and list it, and then people can bid on it and buy it, and you can go through that whole flow. Under the hood in Apache Cassandra, it is essentially just time series data. An older example that we've got is the banking IoT one. Again, this shows transactions as time series, and being able to label and search them.
Moving forward. Cassandra has been asleep for the last five years. In the last year or so, it's starting to wake up. We've got a number of big contributors, Apple, Netflix and DataStax, all coming back in a big way. I wanted to do a quick run-through of what is happening with Apache Cassandra and what is happening around Apache Cassandra.
First up, and this is from a developer's point of view, we have an open source project called Stargate. What Stargate does is create a coordinator layer on top of the cluster, and it provides you with an interface where you can use Apache Cassandra (and it works with other databases as well) through gRPC, GraphQL, REST and JSON. We see more and more enterprises, more and more people, not wanting SQL, not wanting CQL, not wanting any client driver. In their microservices stack they want to minimize the protocols that are used between services. Your database or your search engine should be treated as just another microservice.
From an operations point of view, we have the K8ssandra open source project. It is a Kubernetes operator for a full Cassandra stack.
Not only do you get Apache Cassandra in it, you get the Reaper repair tool, you get the MCAC Prometheus and Grafana dashboards, and you get that Stargate layer. Being opinionated ops, it makes life a bit easier when it comes to scaling and elasticity. It gives you the freedom to operate in different clouds.
Finally, coming to Apache Cassandra itself. 4.0 took us many years to get out. As the Cassandra momentum built up again, our first objective was stability and QA of the code. One of the things the community made a commitment to was that, as an early-maturity technology, we had to take a step away from what's typical in the open source world, which is people saying, "I'm not going to feel safe deploying this into my production until it's at the third or the fifth or the sixth patch version." Leading up to 4.0, we said that had to stop. Apache Cassandra at a .0 version had to be safe for everyone's production. It had to be production ready and safe even for the biggest clusters out there. It took us a long time to make that properly happen. At the same time, we saw 4.0 be 25% faster, and especially faster for the biggest clusters.
Moving on to 4.1, we shifted from QA and stability of the code to pluggability. This is because we saw it as an early-maturity technology that was itself moving into a little bit more of a slow-lane mode of development. A lot of our contributions now are coming from other people's forks of Cassandra, a little bit like the Linux kernel. We're seeing patches being contributed to us which have already been running in other people's production systems. We're seeing fewer commits in source control, but of higher quality. That has led to the need for more pluggability, as those different downstream people are doing different things with Apache Cassandra.
In 4.2, there's something really interesting happening. Apple wrote the Accord consensus paper.
Apple looked at Paxos, EPaxos, RAMP, Janus, all of the different consensus protocols out there, and they tried to put them together and figure out what they needed for globally distributed transactions. They came up with the Accord consensus protocol, and that has been implemented in Cassandra 4.2. This is a game changer. This means you can do transactions not just cross-partition, but cross-table. On the optimal read path, it is a single round trip to do a transaction. That means that we can do cross-continent ACID transactions fast. We're not talking slow like Spanner. Cassandra is a real-time, low-latency database. That's an important characteristic that we want to keep, and this fits into it. Stay tuned for that. 4.2 will also have storage-attached indexes, which allow us to have a lot more secondary indexes in a cluster, and trie memtables, which are very important for even more performance on data that has a very high frequency of updates.
The last section of this presentation is the one that I'm most excited about. This is the one that gets me out of bed in the morning: the community. Looking at the community behind an open source project. This is the section where I get to convince you that you should judge other human beings. It's the paradox. If we want inclusiveness, if we want diversity, we have to judge people. You need to know, when you walk into a room with a group of people: are these people that I want to hang out with? Are these my people? If they're not, get out.
Let's start off with the question of why we choose open source. If you ask the engineers, they'll say transparency and efficiency. I don't think those are good enough reasons for why companies have standardised on open source. I think it's interesting, because the choice to use open source doesn't have anything to do with the choice of tools we make. Let me explain that.
I think that as we move towards microservices, as we move to more compartmentalised platforms, we need to have the freedom to operate. And open source often enables that. It allows us to interchange or change the technologies that we use over time, as our needs change and as the technologies change. That allows us to worry less about the choice of tools and more about our business strategies.
But there's a catch. One of the other things that people sometimes say about open source, and about why we choose it, is that it's free. That's complete nonsense. There is no free lunch. If open source was free, it would be free as in free puppy. If you choose to use open source, at some level you need to take responsibility for it. There are different ways that plays out. You need to show up. It is a problem that we have a lot of employers still not encouraging their employees to contribute back to open source, even forbidding it in some situations. We need to address that. If you need to show up in open source, that leads into having to judge the community.
How do we go about doing that? In open source, there are lots of different types of projects. They come in all shapes and sizes. You have the projects which are in foundations or are sponsored by large companies. A lot of companies today are standardising, saying these are the only open source technologies that you are allowed to use. We also have the bazaar of one-man, even unmaintained, pet projects on GitHub. I am hijacking the Cathedral and the Bazaar analogy there for a completely different purpose. We shouldn't look past those GitHub projects. We shouldn't necessarily think that just because an open source project isn't alive, or has only got one contributor, there is a risk associated with it. We should often look at those types of open source projects just as reference code. It is a completely different approach. It is not something you engage with.
It is simply like finding the perfect function on Stack Overflow and saying, that code does exactly what I want, let me pull it in as a dependency in my project. But at the end of the day, you are treating it like it is your own code. You took a scan through it and you said: I like that code, it does what I need it to, and it is safe. I will take responsibility for it and I will include it.
The foundations are a bit different. The foundations have different companies involved, and there are different things to look at. There are simplifications and generalizations made about the different foundations, just like with open source licenses, and there is no good and bad here. Different foundations work in different contexts and have different pros and cons. The CNCF can be labelled as a pay-to-play environment. The Apache Software Foundation can be labelled as a volunteer-only environment. These things aren't necessarily true. There is history there. There are different contexts.
The CNCF is a foundation where large suppliers, large companies, have come together, especially in the cloud space. With Kubernetes, for example, they have a natural dynamic where they are equal players, and they have wanted to come together and collaborate on something. That is essentially what is at the heart of open source software. If two players in the market want to join together and collaborate, they will move faster than everyone else in the field, and you can't change that. It doesn't matter where open source goes in the future, cloud or whatever, that dynamic of open source will never change, and it is why open source will succeed and be our standard moving forward.
The Apache Software Foundation is different in that it covers a lot more different types of projects, projects which have a lot more varied dynamics between the companies that are involved in them and how they collaborate: big companies, little companies and unpaid contributors.
What the ASF did from the beginning was say: let's take the notion of a company out of the open source community. The only thing that we recognise in the community is the individual human being, and when we associate trust with that human being, it does not expire, because we've got to know that person, we've come to trust them and they're established. That is a way of levelling the playing field so that companies of different sizes and unpaid contributors can all come in.
There's more to look at here as well. You want to look at how many companies are involved in the project. Is this an open source product which is just code thrown over the wall? Is it OSS as marketing? How, in that project, in that community, are passers-by treated? When you look through the commit history of the project, are the commits only from dedicated engineers employed by one company or just a handful of companies, or is it a project with a swarm of small contributions coming in from everywhere? When you look through the ticket system and the community dev list or their channels, do you get an idea of their product management? What is their roadmap? Is it out in the open? Are they including you in where they're going and in what people's needs are? Or do you get a feeling that this is behind closed doors, and that if you turn up with a contribution you're kind of at their mercy, at their whim, as to whether it will be included and whether it aligns with what they need?
Once you understand the different types of projects and the different dynamics in those communities, you can at the same time start looking at the warmth of that community, at the inclusiveness of the community. You can also look at the diversity of the community. That's a tricky one, because diversity is something that follows. So a young community may have good warmth and good inclusiveness, and you would hope that diversity comes soon after. They do all combine.
So that wraps it up for me. This is my last slide.
So I hope I've convinced you that you need to judge the community behind your open source project, and that the diversity and the inclusiveness of the community relate to the longevity, the quality and the security of the project (security was something that was touched on in the keynote this morning). Ultimately it comes back to your costs and the success of your own application. When you look at that community, it's going to be an indicator of how fruitful your own, possibly very limited, contributions to that project will be.
I think it's important that we recognize that this is our work, this is our jobs, this is your 40 hours a week or more. These are the people that you surround yourself with. I think we all know that if you turn up to work and you have a team of people around you, it doesn't need to be the smartest people. If it's a team that works well, you will get great things accomplished, and that doesn't apply only in your own company. It applies in open source as well, and sooner or later you will interact.
That is me. Thank you very much. I think I've gone a little bit over time. A quick thank you to my employer as well, my sponsor. DataStax has a subscription offering called DataStax Luna. That is our support subscription for both Apache Cassandra and Apache Pulsar. We also have an offer here: if you grab the QR code, you can get $250 of credit for Astra, which is our Apache Cassandra database-as-a-service offering. If you want to get started with Apache Cassandra, I really recommend that you jump on to Astra, because you can avoid the whole setup and the operational bootstrap. Just get playing with the database immediately, in a couple of minutes. Thank you very much.