Hopefully, in this talk I'll answer: what is Kaskada? A lot of people have heard about it, and I'll give you a run-through. What is real-time AI, and what do I mean by that? I'm going to give a quick demo of a RAG app (if you don't know what that is, I'll explain it) and show you how we can leverage AI in an app with five minutes of development time. And then I'll go through the importance of open source.

A little bit about myself. I'm a committer and on the PMC for Apache Cassandra; last year I was the PMC chair for the project. I've also been PMC chair for Apache Tiles, and I've been passionate about and involved in open source for a number of decades now. I thought for this presentation I'd go back to my first patch, but then I realized I have no idea what my first open source patch was. It could have been something in Bugzilla; it could have been a Linux driver. Either way, it was around the time that NetBeans was first open sourced and the NetBeans platform came out. In my day job, I'm an engineer by training and at heart, and I still try to code at least a day a week, in my evenings if I can, when the family lets me. I was a consultant for ten years, primarily on Apache Cassandra, which led me to work with some of the largest deployments in the world. And then just this year I moved into product. I'm still trying to figure out what product people do. The position I was put into was VP Product, Open Source, which is kind of odd, because open source (especially what I'll talk about later, true open source) is not something that a company controls, and that is something product people often have trouble getting their heads around. So my role is to help communicate how a company invests in and contributes to open source that it does not control, and how product works with that dynamic. It's really more like open source operations. People will say, oh, is it an OSPO, an open source program office? I don't like to think about it like that. I think of it more like operations: if all of the company's efforts, contributions, and coordination with the open source projects, and with all the other vendors and companies contributing to them, is invisible, it's successful. Keeping it off everyone's radar and running smoothly is part of my job.

A little more about the company I work for, DataStax. DataStax is known for being one of the big contributors behind Cassandra in the early days. We are no longer that; Apple is probably the biggest contributor now, the biggest contributor of individuals, I should say. And we also have companies like Netflix, Intel, Amazon, Microsoft, Instaclustr, and Aiven offering up contributors to our community. What we do at DataStax is offer Cassandra in the cloud. More than that, we offer a modern application-layer data platform, with the key components we see you need in your application stack today: the database, streaming, and machine learning. I'll go into more of that later. This is built on open source products. As a company today, our strategy is that we don't hold IP. We're not interested in proprietary versions of our open source products. Everything we do is open source; everything we code, we open source. We give you the freedom to operate, so you can put these components together yourselves, or you can get us to operate them for you.
Our job, our mission, is to be the best at operating this. Essentially, we are just a utility. Our job is not just to be good at operating it but to run it cheaper and more efficiently than anyone else can, and that incentivises you to take it off your shoulders.

Okay, so I'm going to break the talk up into four sections. Kaskada is what you turned up for; it's in the title of the talk. Then I'll go through generative AI and RAG apps, then I'll talk about real-time AI, bringing those two things together, and finally about putting it into production.

So, Kaskada was a startup a couple of years ago. It came from ML engineers at Google, who saw that the real missing piece, almost half of the business value of ML platforms, was not the batch training of models and putting them onto an ML platform, but the real-time side. This has opened the door to what we call feature engines instead of feature stores, and so on. They left Google, started a startup, and then last year DataStax acquired them, and the first thing we did was open source the project. You can now find it on GitHub. It has lots of features; it's quite a mature product already, considering how young it is, and I won't have time to go through all of them. On the feature list, probably my favourite is native time travel. For ML platforms this is a missing piece in a lot of people's stacks, and with regulation coming up, the need for accountability and auditing on ML platforms is going to hit a lot of people hard. Kaskada does it very nicely and cleanly.

Okay, so jumping in, this is going to take us back to basics, so excuse me. Think of a timeline to begin with: it is just discrete points with some values, grouped by entities. X-axis is time, y-axis is some value, entities grouped by colour here. The first thing we want to do with that is aggregate it. Here we're just asking: what's the sum for each entity over time? We're taking it from a discrete timeline to a continuous timeline. Again, this is a pretty traditional, general way to visualise this type of data. So the question for us is: how much did each user spend over time? We're going to do that in Kaskada. We start somewhere basic: we take the purchases, we're interested in the amount column of that object, and we sum it. That gives you those values over time.

What if we want to window the aggregation? Again, this is not complicated. Say we're interested in: how much has each user spent this month? All we do is pass a window into the sum function and say it's since the beginning of the month. Nothing complex, super simple.

Here comes one of Kaskada's tricks: what if you want to do a window aggregation that is not based on time? For each user, how many page views have occurred since the last time there was a purchase? Here you can see we're looking at the page views and counting them, and then when there was a purchase, what was that value? In Kaskada, we work with the page view objects and count them since there was a purchase. We can do this in SQL too; it's a lot more complicated, and not something I'd want handed to me if I was starting a new job. So we don't want that. The sketch below shows the shape of these three queries.
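To make those three queries concrete, here is a rough sketch in Kaskada's Python library. I'm writing this from memory of the 0.6-era API, so treat the exact names (kd.sources.Parquet, the time/key column parameters, kd.windows.Since and its convenience constructors) as assumptions and check the project docs; it's the shape of the expressions that matters.

```python
import kaskada as kd

# Run the embedded engine (Rust + Apache Arrow) in-process.
kd.init_session()

# Event sources: discrete points on a timeline, grouped by an entity key.
# Parameter names here follow the 0.6-era Python API and may differ today.
purchases = kd.sources.Parquet(
    "purchases.parquet", time_column="time", key_column="user")
page_views = kd.sources.Parquet(
    "page_views.parquet", time_column="time", key_column="user")

# 1. How much did each user spend over time?
#    (turns the discrete timeline into a continuous one)
total_spend = purchases.col("amount").sum()

# 2. How much has each user spent this month?
#    Just pass a window into the aggregation.
monthly_spend = purchases.col("amount").sum(
    window=kd.windows.Since.monthly())

# 3. Non-temporal window: page views per user since their last purchase.
views_since_purchase = page_views.count(
    window=kd.windows.Since(purchases.is_not_null()))

print(total_spend.preview())  # materialise a few rows to inspect
```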
Okay, what about joins? What is the average product review score at the time of the purchase? Here you can see we're interested in average review scores over time, and then the moment an item was purchased by someone. In Kaskada: we take the reviews object, we're interested in its score column, we associate it with the item column, take the average, and then look that value up at the point the purchase happened. Super simple; it's written in Python, one line. We can do it in SQL too, and it gets even more complicated. There's stuff in that query I didn't even know existed in SQL. Yes, SQL can do everything; just don't ever make me do that.

And it's not just SQL. People often say, oh, but Spark Streaming can do this type of stuff. Well, we went and looked at an existing Spark Streaming implementation that did churn prediction over time, and we took 63 pages of Spark code and rewrote it in two pages of Kaskada. So you can see what happens when it comes to declarative reasoning over timelines, because Kaskada is an abstraction layer, a library, built for doing this from the ground up. It used to have a custom DSL on top called Fenl, and just a couple of months ago that was rewritten to native Python, to meet developers, the audience, where they are. The engine of Kaskada is written in Rust, and the data processing is done with Apache Arrow, so it takes advantage of vectorised execution on modern hardware where it can.

Performance: everyone likes a benchmark, for what it's worth. We took a few examples and compared with DuckDB, which is known to be very fast for this type of workload, and you can see that Kaskada was consistently faster, in some cases an order of magnitude faster. So hopefully you can already start to see: this is super simple, it's fast, and it's quite likely going to make things possible or feasible at your work, in your solutions, where they weren't before.

Okay, so jumping into the next section. Kaskada was originally written with predictive ML in mind. With Kaskada on an ML platform, plus an inference model, you could do something like this: within three or four clicks of a user session, you could pinpoint the intent of that session. For example, on a retail website you could ask: is this user actually going to buy something, and in which category, or are they just browsing? You'd be able to answer that quite quickly, whereas that's typically not what ML platforms do; they can give you recommendations, but they're based on yesterday's data.

Then 2023 hit, GPT came out, everyone's product roadmap got wiped, and everyone went: hell, what are we going to do with AI and generative AI? It opened up a whole new world of possibilities. What we're seeing a lot of is these RAG apps: retrieval augmented generation. I know with our customers across the US there is just this mad rush to implement AI chatbots before Thanksgiving, and they're basically following this architecture. You take all of your proprietary or in-house data and you vectorize it, you create embeddings of it, and you put it into a feature store or a database that has vector search. That indexing step is sketched below.
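As a minimal sketch of that indexing step, assuming OpenAI's pre-1.0 Python client and the ada-002 embedding model (the documents here are toy stand-ins for your in-house data):

```python
import openai

openai.api_key = "sk-..."  # your OpenAI API key

EMBEDDING_MODEL = "text-embedding-ada-002"  # produces 1536-dimensional vectors

def embed(texts):
    # One call embeds a whole batch; results come back in input order.
    resp = openai.Embedding.create(input=texts, model=EMBEDDING_MODEL)
    return [item["embedding"] for item in resp["data"]]

# Your proprietary / in-house documents (toy stand-ins here).
documents = [
    "A secure connect bundle is a zip file holding a database's TLS keys.",
    "Full table scans are slow; query by partition key or use an index.",
]

# Pair each document with its embedding, ready to be written into a
# database that supports vector search, as the Astra demo below does.
indexed = list(zip(documents, embed(documents)))
```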
Then when a user comes along and asks a question, you vectorize that question, do an embedding of it, and on that vector you do a vector search. That gets you the closest, nearest-neighbour results (what we call approximate nearest neighbour) for that question from your internal knowledge database, your in-house data, or that user's session and all their history; it can come from multiple places. Essentially, when you go to write the prompt you'll send to GPT or your large language model, you have a prompt template and the original question from the user, but you're enriching that prompt with all of your in-house data that's a close match to the original question. Not only does this reduce hallucinations, it binds your large language model's answers to your business domain, your data domain.

So let's go through a super quick example, and I'm going to try to do this in five minutes. I jump into DataStax; I'm going to use Astra, which is our Cassandra hosting in the cloud, logging in with my Google account. I've already created a database called rag-chatbot-db, and the example I'm going to follow is this one here; you'll find it quite quickly. It takes us to a Jupyter notebook, which I've already got open in a tab. I've created a database ready to go for us, and a keyspace ready to go for us. So let's get going. Is it connected? Yep.

First thing: I need a few Python libraries. Get that going; I've already done it. This is usually where my wifi decides to drop out, but it's looking good. Import some Python libraries. The next step is to connect to the Astra database. I need a token; we get that by going to Connect and then Generate Token. That gives you a small JSON object, and one of its fields starts with AstraCS. I've already got this, so I'm just going to copy it. Then I get my OpenAI key: go to platform.openai.com (I've already logged in), go to your account, and View API keys. I've already got keys, so I'll just copy one. Then I've got a keyspace name; I already showed that. With the token, I'm just going to use the "token" user, which tells the client to use the token-based approach, and then I'm going to need a secure connect bundle. Astra comes with a little zip file that contains the keys to make connecting easier. I've already downloaded it; that's my connection bundle there, good to go.

To create the embeddings, I'm going to use ada-002. Create a session. If I've got that table in the database already, drop it; looks like I did, let's try again. Okay, so I'm going to create the table: an ID, which is just a UUID; the title of the document; the context of the question; the question being asked; and all the possible answers. You'll see here that I'm going to download the Stanford Question Answering Dataset, SQuAD, from Hugging Face, and this table is an okay schema for that dataset. Then, against each question and its different answers, I create this field, which is a vector type of 1,536 dimensions. Okay, done; download the dataset. Again, most of this is one-off setup, so it's not really development time. Roughly, what the notebook has done so far looks like the sketch below.
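Here's roughly that setup, sketched with the standard cassandra-driver; the keyspace and table names are my own stand-ins rather than the notebook's exact ones:

```python
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Token-based auth: the literal username "token" plus the AstraCS:... value,
# and the downloaded secure connect bundle (a zip containing the TLS keys).
cloud_config = {"secure_connect_bundle": "secure-connect-rag-chatbot-db.zip"}
auth = PlainTextAuthProvider("token", "AstraCS:...")  # paste your token here
cluster = Cluster(cloud=cloud_config, auth_provider=auth)
session = cluster.connect()

KEYSPACE = "rag_keyspace"  # stand-in; use the keyspace you created

# SQuAD rows plus a 1536-dimension vector per question (1536 matches the
# output size of OpenAI's text-embedding-ada-002).
session.execute(f"""
    CREATE TABLE IF NOT EXISTS {KEYSPACE}.squad_qa (
        id uuid PRIMARY KEY,
        title text,
        context text,
        question text,
        answers text,
        question_vector vector<float, 1536>
    )
""")

# A storage-attached index (SAI) enables approximate-nearest-neighbour
# search on the vector column.
session.execute(f"""
    CREATE CUSTOM INDEX IF NOT EXISTS ON {KEYSPACE}.squad_qa (question_vector)
    USING 'StorageAttachedIndex'
""")
```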
The actual development time probably comes down closer to a minute. Okay, let's put it into a pandas DataFrame and take a look. You can see the data there: title, context, question, answers. I know this dataset has duplicates, so let's remove them. Now we're going to create an embedding for each question and put it into our Astra database. This will take about a minute, I think. Again, it's a one-off.

The next step: say that's your internal data, and you've got a chatbot, and a user goes into the chatbot and asks a question. We take that question, "When was the College of Engineering at the University of Notre Dame established?", and create an embedding of it. With that embedding, we put it into a CQL statement that selects from the table, ordering results by proximity to that vector, and takes just three. While that's still running: this is our prompt template. We create a prompt where the system role tells GPT, you're a chatbot helping customers with questions; the user is asking this question; and in the assistant content we put in the three closest internal questions we looked up. Then we ask GPT-3.5.

Let's see if it finished. Done, okay. That's the vector we created. I create the select statement; this looks like a vector, but if I scroll to the top, it's actually that statement. Let's find the three closest questions in our in-house data. Here we go, three related questions. Let's create the prompt and run it. And the answer: the College of Engineering at the University of Notre Dame was established in 1920. So, five minutes give or take, and you can see most of it was just setup. End to end, the query side looks roughly like the sketch below.
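A sketch of that query side, under the same assumptions as before (the session and KEYSPACE from the previous sketch, a recent cassandra-driver that understands the vector type, and OpenAI's pre-1.0 client):

```python
import openai

question = ("When was the College of Engineering at the "
            "University of Notre Dame established?")

# 1. Embed the question with the same model used for the stored documents.
resp = openai.Embedding.create(
    input=[question], model="text-embedding-ada-002")
qvector = resp["data"][0]["embedding"]

# 2. Approximate-nearest-neighbour search: the three closest stored rows.
rows = session.execute(
    f"""SELECT question, answers FROM {KEYSPACE}.squad_qa
        ORDER BY question_vector ANN OF %s LIMIT 3""",
    (qvector,),
)

# 3. Enrich the prompt with those matches before asking the model.
context = "\n".join(f"Q: {r.question} A: {r.answers}" for r in rows)
reply = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "You are a chatbot helping customers with questions."},
        {"role": "assistant", "content": context},
        {"role": "user", "content": question},
    ],
)
print(reply.choices[0].message.content)
```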
Okay, so moving on: how do we bring these two things together? We've seen in 2023, with these RAG apps and generative AI, something very simple and very powerful for our industry. We know this. A lot of people are struggling to figure out exactly: how do I use this in my business, and what is the value? If you go to any local AI startup meetups, the level of creativity I'm seeing in the startup community right now is phenomenal; I haven't seen anything like it. I've been to a few of those meetups, and you'll meet all of your local angel investors, and product people desperately looking for their early cash and engineers. So it's all happening.

But what about combining these two things together? Take a few example questions you could ask a bot. When will my package arrive? Or a recommender: what should I watch tonight, and why? Or a financial assistant: is now a good time to buy Bitcoin? When you think about how best to answer those questions, you can see they often touch on real-time information. You can't be working with yesterday's information, and you can't be working with your database data at large. You're interested in what's happening in the streaming processes of your application stack, as it happens.

Let's take a more concrete example. Say, in the Astra portal, we created a chatbot to help operators and users of the Astra database. A user comes along: why are my queries so slow? With our in-house data, all of our support documents and support pages, we could probably come up with a pretty good static answer: "Hey Ben, queries can be slow for many reasons. For example, the servers may be under heavy load, or your query may involve significant computational cost." It's pretty lame, really. That user was obviously doing something then and there, and their problem relates to their current session, to today. If we take real-time context and put it in there, we can say: well, their average query latency is about three milliseconds, we know that's their norm, and at the moment they're seeing queries taking around seven seconds. And if we look at their recent queries, we can see what they're doing wrong, and we can say: "Hey Ben, it looks like your queries are doing full table scans. Try selecting by key, or use an index on that column."

An example I've got to run through that combines both of these technologies is an app we've got on GitHub, under the Kaskada organization, called BeepGPT. The idea is to create a Slack bot that notifies you of messages or threads that you would typically be interacting with. This should resonate with most of you. Personally, I find Slack more and more frustrating: the more accounts I'm in, the more channels there are, and at work there are hundreds of channels. To actually stay on top of it, you're spending half an hour every day just scanning through, trying to get all the threads read. It's impossible. This is a really great way of bringing you to the messages as they happen.

So what are we going to do here? We take an export dump of our Slack account and put it into Parquet files, and then we create timelines of all the messages and all the threads over all the channels. Then we put them, in a predefined format, into prompts, and do fine-tuning against a GPT account. Fine-tuning isn't something you typically need to do; from what I've seen so far, 90% of the time playing with your prompts will get you what you need. But there are a few use cases here and there where fine-tuning is the right thing to do, and here, because the structure of the data we're working with is so specific, it makes sense. That then allows us to create a Slack bot that listens to messages, recreates them in that same predefined format, and puts them in as a prompt. The response from GPT is: these are the people that would typically be responding to this thread or message. And then you can send a notification to those people.

If you go to the GitHub project, it's really only two files. We have an example export dump for you if you just want to play around; not everyone has admin access to a Slack account. The fine-tuning notebook is just the top half of the diagram: transforming the data into Parquet files, putting it into timelines, putting each thread into the predefined format, and feeding it into OpenAI for the fine-tuning. And then there's the BeepGPT Python file, which is the Slack bot that listens. Okay, the readme; I think I've explained that already, I hope. The key part in the notebook is this section here. Again, it's super simple: we're basically keying the messages and joining them on the channels and the threads. The fine-tuning format looks roughly like the sketch below.
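For a feel of that predefined format, here is a hedged sketch of turning one thread into a completion-style fine-tuning example, using toy data. The actual format BeepGPT uses may well differ (check the repo); this is just the shape of the idea:

```python
import json

def format_thread(messages):
    # Flatten a Slack thread into the predefined format: "user: text" lines,
    # oldest first. (A plausible shape; see the BeepGPT repo for the real one.)
    return "\n".join(f"{m['user']}: {m['text']}" for m in messages)

def training_example(messages, responders):
    # The completion is the set of users who actually replied, which is
    # what we want the fine-tuned model to predict for new messages.
    return {
        "prompt": format_thread(messages) + "\n\n###\n\n",
        "completion": " " + ",".join(sorted(responders)),
    }

# Toy thread; in BeepGPT these come from the Kaskada timelines built out
# of the Slack export, keyed and joined on channel and thread.
thread = [
    {"user": "ben", "text": "Why are my queries so slow today?"},
    {"user": "mick", "text": "Are you doing full table scans?"},
]

# One JSONL line per historical thread, then upload it for fine-tuning.
with open("training.jsonl", "w") as f:
    f.write(json.dumps(training_example(thread, {"mick"})) + "\n")
```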
And in the BeepGPT Python file, which again is only one or two hundred lines of Python, it's very small, the main function is the one that handles conversations. A conversation comes in, and it does exactly the same thing: that's our predefined format, that is the prompt that goes in, and we get the answer back.

Okay, so: a whole new world of possibility. We can do lots of cool stuff with this. I think what we've seen this year, and the data scientists are a bit shaken up, and they should be, is the democratization and commoditization of data science and machine learning. You can see now that any old developer can come along and do quite complex machine learning or AI in a few lines of code. This is great for us, but this possibility is open to everyone, and everyone is steaming ahead with it this year.

What about when you put it into production? I think this is where people are going to hit the hurdles. Development has now been made very simple; once you're in production, I think that's where we're going to find our challenges, and it's certainly what we're seeing already with the people who are deploying these apps. First up, how do we see the application stack changing? We're seeing the data tier more and more as the critical foundation of every application stack. If you ask me, the definition of digitalization projects this year has fundamentally changed. Last year, digitalization was about the automation of processes and tasks. This year, digitalization is simply about getting analog data digital, under one data control plane, one data platform. You need your data democratized, accessible to all of its consumers, under one governance plane, one access plane. This is data mesh, in a way; maybe a better term I've heard is data-centric engineering.

Everything on top (and you've seen that I've used different integrations and frameworks to set things up), the business logic and the user experience, is becoming very cheap. Now, we haven't seen many autonomous agents, AI agents, come out yet, nor this notion that you can just go to a GPT and say: here's my product specification, here's an engineering specification, here's how I want you to test, with chaos testing and security testing and privacy testing, hand all of those specifications to a GPT and say, write me that program. But a lot of people are pretty confident that's where we're going, and going quickly. So all the stuff on top is becoming very cheap, very automated, simple and quick to change.

What's not quick to change is the stuff on the bottom, and what a lot of people don't have is a proper data platform in the application stack that will scale. I think we're going to see a repeat of what happened in 2010 with mobile-first and the explosion of data, which blew up anyone's hope of using an RDBMS for the data warehouse, data lakes, or analytics. We're going to see that happen in the application stack, and so you're going to start replacing the legacy databases, the RDBMS systems, with modern data platforms that can serve any consumer, no matter what format or API or traffic shape or SLOs they require. That is why at DataStax we have built this data platform; that is what we're trying to deliver to people. We are active in the open source community, and among the things we're working on, you can find Astra and Cassandra plug-ins for LangChain; a sketch of what that looks like is below.
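As a taste of that integration, a sketch using the 2023-era LangChain Cassandra vector store (backed by CassIO). The constructor arguments are from that period, and LangChain's APIs move quickly, so check the current docs; the session is the cassandra-driver session from the demo earlier, and the keyspace and table names are hypothetical:

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Cassandra

# Wrap an existing Cassandra/Astra session as a LangChain vector store;
# the backing table is created automatically if it's missing.
vstore = Cassandra(
    embedding=OpenAIEmbeddings(),
    session=session,
    keyspace="rag_keyspace",    # hypothetical keyspace name
    table_name="support_docs",  # hypothetical table name
)

# Index a document, then retrieve the closest matches for a question.
vstore.add_texts(["Full table scans are slow; query by partition key."])
docs = vstore.similarity_search("Why is my query slow?", k=3)
```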
There's also the LangStream project that's come out, which allows you, in what I think is a lighter way, to take a streaming approach as an alternative to LangChain. We have Stargate, which is a coordinator layer on top of a Cassandra cluster that gives you REST, gRPC, GraphQL, and document APIs, so again, your data can be accessed by anyone. We've got the CassIO framework, which helps people put LangChain or LlamaIndex onto Cassandra. And last but not least, one of our engineers is working on Jlama, which is Llama model inference rewritten in Java, and he's already getting an order of magnitude better performance in Java, which is kind of blowing him away. I don't have the reasons for that; it could just be a cleaner design, who knows. I know he's now relying on JDK 21.

Okay, bringing it back to Apache Cassandra. I think you've seen how I've tied this into the loop: a data platform needs different approaches. This is just one technology you could be implementing in a data platform, and when I talk about a data platform, I'm not saying you can just take your RDBMS technology and replace it with some other database like Apache Cassandra. A data platform today requires lots of different components put together.

One thing I want to touch on, though, is the NoSQL moniker. It came from one of the committers on the Cassandra project, and it's been misinterpreted along the way. When the moniker was first mentioned, what it meant was: look, we're taking a monolith database, the RDBMS, and we're going to rewrite it as microservices. Now, microservices wasn't a word back then; distributed computing is a better description, but microservices is what people understand. And databases are some of the most complicated technology in our industry, so we knew we wouldn't be able to replicate an RDBMS's feature set in one or two or three years. We had to start somewhere, and we understood that sometimes you can shard the partitioned data in the application layer, if your data domain naturally shards that way; but that's more often than not a dead end, and doing that partitioning for you was the long-term path. The problem was that we had to break relationships, and that's where NoSQL came from. NoSQL was about the journey of rewriting a complex piece of technology into distributed computing. Unfortunately, other databases came along, like MongoDB (it's a great database, I'm not going to diss it), saying: look, we've got a database too, and it's got a different interface than SQL, so we're NoSQL too. And that's led us into the trap of thinking NoSQL is about the user interface, and it's also led us down the wrong path, I believe, where every time you have a need for a different feature in your data, you think you need to go choose a different database.

What we're doing with Cassandra is implementing vector search. We did that in four days: we got feature parity with Pinecone with one developer coding for four days. And we have performance, across latency, relevancy, accuracy, and throughput: half the latency of Pinecone, double the throughput. And we understand that the people who can keep their data at its source of truth, and just create the embeddings, the vectors, right there, who don't need to copy data to a search engine or to an analytics platform, et cetera, are going to win. In practice that looks something like the sketch below.
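In practice, that means adding an embedding column and a vector index to the table you already have, and searching it in place. The table and column names here are hypothetical, session is the driver session from earlier, and query_vector is an embedding produced as in the demo above:

```python
# Add an embedding column to the existing source-of-truth table rather
# than copying rows out to a separate search engine.
session.execute(
    "ALTER TABLE store.products ADD description_vector vector<float, 1536>")

# A storage-attached index makes the column searchable by similarity.
session.execute("""
    CREATE CUSTOM INDEX IF NOT EXISTS ON store.products (description_vector)
    USING 'StorageAttachedIndex'
""")

# Approximate-nearest-neighbour search right where the data already lives.
rows = session.execute(
    "SELECT id, description FROM store.products "
    "ORDER BY description_vector ANN OF %s LIMIT 3",
    (query_vector,),  # a 1536-float embedding of the user's query
)
```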
It's not just the storage cost savings; it's the cost of moving that data and massaging it, or having to work with yesterday's data and different schemas. Organizationally, everything is much simpler. That was that slide; I think I'm good on time.

Lastly, I want to note that in December we have a Cassandra Summit in San Jose; the Linux Foundation is hosting it. Please come and join us there. You can see from this slide that Cassandra 5 is coming out with a ton of new features. We have ACID transactions; we're now strictly serializable, at the Spanner level. We can do leaderless, global strict serializability using commodity clocks, with a single round trip. That is the Accord consensus protocol that came out of Apple. There's vector search, the unified compaction strategy, and tries throughout the storage structures.

Thank you. If you've got questions, find me outside. There are also cards on the poster outside: grab one, they've got a QR code to log in to Astra easily and give it a whirl. It comes with a lot more examples; you saw just the basic ones here. There's a lot more there if you want to check them out.