Okay, everybody, we're going to go ahead and get started. Welcome to Cassandra Summit. I hope you enjoyed the keynotes. I'm going to be talking about something that a lot of people are talking about these days: retrieval augmented generation, AI, and how to use Bedrock with Astra to build a question-and-answer application.

So this is me: I'm a developer advocate at DataStax, Kirsten Hunter. I wrote a book called Irresistible APIs. I'm very much about the developer experience, making sure there are no speed bumps from the moment you get started with a technology until you actually get where you want to go.

So what will this talk cover? I'm going to use act five, scene three of Romeo and Juliet, which is where all the dying happens. We're going to load that into our vector database, and then we're going to ask questions using prompt engineering. And I'm going to do this in a very easy way with Amazon Bedrock. You will notice that I have Romeo and Astra instead of Romeo and Juliet, and that is because when you take one of the LLM models, it will cheat. If you say, "Using only the documents that I gave you, how did Juliet die?", it'll cheat: it'll look at the huge corpus it ingested from the web and give you the answer based on that. So I changed her name to Astra, because Romeo and Astra is not a thing; it has to use the documents that I'm giving it.

I'll talk a little bit about Amazon Bedrock, which is how we're creating the embeddings and the completions. I'll talk a little bit about DataStax Astra, which is a vector database. It is Cassandra: we've actually introduced vector search to the database. That's going to be in Cassandra 5. Right now it's just in Astra, but we're pushing it to the core, so you'll be able to use it going forward with your Cassandra instances. We'll talk a little bit about how a vector database works so that we understand that. I'll go over the process of the demo, and then we'll just do the demo: I'm going to use a Jupyter notebook and walk through the steps necessary to upload that information with the embeddings, create a query embedding, and then run it through the LLM for a nice result.

So I mentioned what the demo is. Again, we have Romeo and Juliet, except we're using Romeo and Astra. What we're building is a sample RAG application to demonstrate the process: upload the documents using embeddings (we're using the Titan embeddings from Amazon), embed the query with the same Titan model, retrieve the similar documents, and then use a prompt to clean up the result and turn it into natural language.

So, Astra. Astra's a real-time vector database, and it's not just a vector database: it is a fully-fledged Cassandra database that also knows how to do vector search. So you can put all of your data in there, not just your vector data, and it can all be maintained in the same database. One thing that's important is that it indexes in real time. With one of our competitors, queries slow down significantly while it's indexing; it has to go sort of offline a little bit to index, and that makes things slow. Astra does everything in real time. The reads and writes are real-time, so you keep that responsiveness even while it's indexing.

So how does a vector database work? How many of you are familiar with how a vector database works?
Okay, so we create a vector store, and the vector store allows you to have documents, and basically the embedding is like an address in space: where does this document live in my document space? It's a very complicated address. If you're using OpenAI, for instance, it's 1,536 dimensions. So it's a very precise address in your space. That's what you do: you create the vector store, and you populate it with those documents and those embeddings. Then when you have a query, you send the query through the same embedding model to find out what its address is, and then it goes into the database and figures out what's near it. So this is much closer to semantic search than the searches we normally do on databases, which have very specific keys and values. So it's pretty sexy. I mean, I've loved semantic search forever.

Okay, so Amazon Bedrock is a managed service for AI foundation models. There are tons of foundation models, with more coming all the time. What Bedrock does is let you use its interface to work with those models, so you don't have to set up each model separately. You set up Bedrock, and then you choose the model that you want to use. In this example, we're using the Amazon Titan embeddings and the Anthropic Claude 2 model as the LLM. Models can be changed and switched out as needed, and that's really important, because when I first did this, I used one of the other completion engines, different from the Claude one we ended up choosing. And I said, "In 20 to 50 words, tell me how Juliet died in Romeo and Juliet." And it said: "Stabbed." Okay, that wasn't 20 to 50 words. Prompt engineering is really tough. But if I use OpenAI, it says, "Oh, well, she took a potion to make her look dead, and then Romeo came and thought she was dead, and so he killed himself, and then she woke up and stabbed herself." That's a good answer, right? But different models behave differently.

Okay, so here are the steps we're gonna follow for the demo. We're just gonna go through a simple Jupyter notebook. I'm gonna set up the Python environment, with the credentials and everything that I need. I'm gonna create a vector store in Astra. I'm gonna create an embedding for the query, retrieve similar documents, and pass them through the LLM. This will all make sense when we look at the notebook. But for now, does anybody have any questions on the vector store stuff and how it works? Nope, okay.

So let me show you this with pictures. This is retrieval augmented generation, and we give it the context with the task, the role, the persona, and the constraints. We have our data on the left-hand side, and we split it into documents. The documents go through the embeddings API, and then they get put into the vector database. The prompt sentence goes through the same embeddings API to create an embedding, so it can tell the vector store, "I want things that are similar to this," and the store sends back the documents for you. And then when you're doing RAG, you take the documents you get back and have the LLM massage them until it gives you a good answer. I'm sorry, what? The dimension is kind of the address, and the space is kind of the whole thing.

All right, so I have a demo, and of course there are demo demons, so if it doesn't work then I'm gonna put this in the chat for the talk so you guys can play with it too.
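[To make that "address in space" idea concrete before we get to the demo, here is a toy sketch. This is not the notebook code: the embeddings are fake and the search is brute force, where a real store like Astra computes real embeddings and indexes them so lookups stay fast at scale.]

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model (Titan returns 1,536 dimensions).
    Here we just hash characters into a small fixed-size vector."""
    vec = np.zeros(64)
    for i, ch in enumerate(text.lower()):
        vec[(i + ord(ch)) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Populate the "vector store": each document gets its address in space.
docs = ["Astra drinks the potion", "Romeo buys the poison", "Tybalt is slain"]
store = [(doc, embed(doc)) for doc in docs]

# A query goes through the SAME embedding, then we look for what's near it.
query = embed("who was slain?")
best = max(store, key=lambda pair: float(pair[1] @ query))
print(best[0])  # the document whose address is closest to the query's
```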
Okay, so I'm gonna set up my Python environment, and I'm installing a few things. The really important one here is CassIO; that's our library for integrating Cassandra with LangChain, LlamaIndex, OpenAI. We've actually made it work for all of those things. And it's just gonna take some time, because I'm standing up here and you all can watch it. I'm gonna say that this is not the fastest internet I've ever encountered. So we have CassIO and LangChain; we have Boto3 and Botocore, which are how you interface with AWS; and then there are a few things that are necessary for the particular example we're showing. This should be done any moment now. That's what you get for doing a live demo. The nice thing about this is that you can ask all sorts of different questions about Romeo and Astra. And you can see here, with the LangChain embeddings, we're importing the Bedrock embeddings; that's how we're gonna access the embeddings we want to use. So we're basically using LangChain as an adapter to Bedrock. LangChain is just a really fantastic way to get the models you wanna use and use them. All right, that's done, okay.

This next one is tricky. That's right, I sent it to myself in Slack. This is the token you need for using Astra, and the other piece of information is the database ID, because you might have different databases that are named the same, which is something we don't want to encourage, but this way you're using the actual ID for the database. Moment of truth; I may just have to talk you through what it's gonna do if it's not happy. Thank you all for your patience... that's better. Okay, so those warnings are known; there are some things in the backend that CassIO is gonna clean up very shortly, but we are successful.

So now I need to do my AWS ID. I never do easy demos. I probably should have set this up ahead of time, but I wanted to be honest about what is required. And this is actually out there; you can play with it yourself as well. The session token is really the reason I can't set this up ahead of time, because it refreshes like every half hour. Let's see if we succeeded at that. So what I'm doing here is setting up the Bedrock runtime, and then I'm using the embeddings, and this is where I'm setting the model. I can use any of the models they have for Bedrock right there; I can switch them out, I can change them, it's all really easy.

Okay, so I've just created a vector store named Shakespeare Act 5. I'm gonna grab the document that has the lines from that particular act and scene, and then we're gonna go ahead, and it's gonna take a couple minutes to add the 321 documents. There are 321 lines: I've broken the play down into lines, so each line is its own document, and then it has its embedding. It's just basic Python that we're doing here to add them. It's gonna say "done" in just a moment, but let's look at the prompt while we're waiting. We're telling it: as a human, use the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know; don't try to make up an answer. That last part is almost always what you want if you're doing retrieval augmented generation, because you do not want it to take the whole internet as of three years ago and give the answer based on that. You want it to give the answer based on the documents that you put in. Okay, so we did the prompt.
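[For anyone following along at home, the setup steps just described look roughly like this. It is a sketch, not the exact notebook: it assumes the LangChain and CassIO APIs from around the time of this talk, and the region, file name, and table name are placeholders.]

```python
import os
import boto3
import cassio
from langchain.embeddings import BedrockEmbeddings
from langchain.vectorstores import Cassandra
from langchain.prompts import PromptTemplate

# Connect CassIO to Astra with the token and database ID mentioned above.
cassio.init(
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    database_id=os.environ["ASTRA_DB_ID"],
)

# Bedrock runtime client; AWS keys and session token come from the environment.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Titan embeddings; any other Bedrock embedding model can be swapped in here.
embeddings = BedrockEmbeddings(
    client=bedrock_runtime, model_id="amazon.titan-embed-text-v1"
)

# The vector store is backed by a table in Astra (table name is a placeholder).
vector_store = Cassandra(
    embedding=embeddings, session=None, keyspace=None, table_name="shakespeare_act5"
)

# One document per line of the scene; embeddings are computed on insert.
lines = open("romeo_and_astra.txt").read().splitlines()
vector_store.add_texts([line for line in lines if line.strip()])

# The prompt: answer concisely from the context, and don't make things up.
prompt = PromptTemplate.from_template(
    """Human: Use the following pieces of context to provide a concise answer
to the question at the end. If you don't know the answer, just say that you
don't know; don't try to make up an answer.

{context}

Question: {question}

Assistant:"""
)
```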
And we're using Anthropic Claude. It actually behaves much better than the Titan completion for our purposes. You may find that Titan works great for what you wanna do, but for what we wanted to do, we ended up liking Claude better. This is how we're gonna answer the question: we're gonna create an embedding for it, use it to retrieve the nearby documents, and then, by invoking the model, we're gonna get the answer. So let's see what we get. Based on the provided context, it seems that Astra, Tybalt, and Romeo all die in the story. True. Specifically, the lines mention that Astra was found dead and bleeding, Tybalt met an untimely death, and Romeo was also found dead. So that's a pretty good answer, right? "Who died in the play?" And this one shows you the lines that it found. There are duplicates, because I didn't clear out the database before I ran it this time. So we got some duplicates, but it takes the 15 lines it got back that were in the region of "who died in the play", and it did the correct thing.
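[The answer step just shown looks roughly like this, under the same assumptions as the setup sketch above (LangChain-era APIs, and the names carry over from that sketch):]

```python
from langchain.llms import Bedrock
from langchain.chains import RetrievalQA

# Claude 2 on Bedrock as the completion model; swapping model_id is all it
# takes to try a different one.
llm = Bedrock(client=bedrock_runtime, model_id="anthropic.claude-v2")

# The retriever embeds the question with the same Titan model, then pulls
# the nearest lines from the vector store (15 in this demo).
retriever = vector_store.as_retriever(search_kwargs={"k": 15})

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",                 # stuff retrieved lines into the prompt
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True,       # so we can show which lines it found
)

result = qa({"query": "Who died in the play?"})
print(result["result"])                 # the natural-language answer
for doc in result["source_documents"]:  # the raw lines (duplicates and all)
    print("-", doc.page_content)
```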
So does anybody have any questions about how this all works? I have an Astra database that's running, and that's what I used the credentials for. I also have AWS access for using Bedrock; that part is new, and it's something you probably have to add to your roles in AWS to make it work correctly. I would love it if people played with this. I will put a link to it in the session notes, and you're welcome to give me feedback: if it works for you, if it doesn't work for you, if you have questions about how to ask different kinds of questions. Yes? I'm sorry, what? So, Cassandra 5.0, which is in beta: you can actually get a Docker image of Cassandra 5, and it has this vector search in it, and it will work. In fact, the CassIO library I was talking about, let me show you the website. CassIO is designed to integrate Cassandra with LangChain, with LlamaIndex, with OpenAI, with all of the different providers; it's designed to be flexible so we can continue to add more. And there are paths in here where you can say "I'm using Astra" or "I'm using Cassandra", and it will tell you how to make it work with Cassandra, and then you'd be able to do all this stuff. Yeah, that's right, Stefano is amazing. His talk is tomorrow afternoon, and he's gonna talk about CassIO and how it works. What I have here is just one example of how it works, but he's gonna talk about the whole infrastructure he's created around Cassandra and AI. It's really, really amazing. So definitely go listen to Stefano tomorrow.

Okay, well, I'm done. So... oh, more questions, yes. It's a SaaS, yeah. Then you have... right, you don't need Colab; you can run the Jupyter notebooks locally. But I'm to go to Australia to be here, so I'm like going to some other class. So that's a great question. Astra actually has a backend, and you get to choose the backend: whether it's AWS, what region it is, GCP, or Azure. A lot of people choose to have two different regions from different providers, so that if Amazon has one of those catastrophic things that happen every now and then, you're still up and running on GCP or whatever. So, I don't know which part of AWS you're thinking would be moving, but... Well, I'm just thinking in terms of latency: some services are provided by AWS, but some are provided by a third party, right? Or even some other application that is complex; you have a lot of parties, and you want to take into consideration what's happening with the latency of your product, right?

Yeah. I haven't done all the testing myself, but we've done testing with Astra against other databases, and we are very performant; there's not a lot of latency. Using Bedrock, you'll probably get a little bit of latency because you're working with two different systems, but I haven't had Bedrock slow me down noticeably. And I think I saw you mention the 5.0, which is coming out. Yes. I guess it would be a Docker image or something like that. You can get a Docker image of the 5.0 beta now. So I can potentially run it in my Kubernetes cluster on EKS on AWS? Yes. So what would be the advantage of using... Well, Cassandra is amazing and wonderful, and I love it, but it's a pain to manage. It does some amazing things with scalability, and Astra takes care of most of that for you, so you're not spending your time managing Cassandra. The other thing we've just added, which might be interesting to people who are not excited about using CQL, is what we're calling the JSON API, and it's very similar to Mongo's interface. So if you're comfortable with Mongoose, for instance, you can drop this in and use us as the backend instead. It's much more friendly for people who live in front-end JavaScript land and want to hit a document database; that's a very common use case.

"And here, you don't demonstrate it much in this notebook, but what are the benefits of AstraDB versus other vector databases? I think you talked about this briefly at the beginning." Yeah: basically, Astra is a Cassandra database, and it has all the things a Cassandra database brings with it, which is performance and reliability and uptime and fast, fast, fast queries. And then we have vector search on top of that; it's part of it, not an extra thing we stuck on. There are a lot of companies that are just a vector database, and they're not really in a position to scale the same way for a production system. And we did a lot of testing; if you go by our booth, we have the numbers, but we were something like 18 times faster in some cases and four times faster in others. We've really got the performance you want if you're gonna have a vector database. "So it seems to be an operational play, and I guess replication, and so Cassandra is much better than maybe the other thing." Yes. Yes, yes.

Yeah, you can make vector tables, and they can coexist with non-vector tables. You can. What was the question? Oh, the question was: if I have production data in Astra and then I wanna add vector search, can those coexist? And what I said was, you can have tables that have vectors and tables that don't have vectors, but Astra's not gonna bring them together for you; you're gonna need to do that on the client side. And in your database, you can have multiple tables, right? So if you have a table that doesn't need vectors, you don't need to add vectors. But like she said, if you want to add vector search to a particular table, we can do that (there's a sketch of what that looks like below). Yes.
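[On that coexistence point: with vector search, a vector is just another column type, so a vector table can sit in the same keyspace as your regular tables. A hedged sketch using the Cassandra Python driver against Astra; the bundle path, keyspace, table, and dimension are placeholders, and the CQL is the Cassandra 5 / Astra vector syntax as I understand it.]

```python
import os
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Connect to Astra via the secure connect bundle (path is a placeholder).
cluster = Cluster(
    cloud={"secure_connect_bundle": "/path/to/secure-connect.zip"},
    auth_provider=PlainTextAuthProvider(
        "token", os.environ["ASTRA_DB_APPLICATION_TOKEN"]
    ),
)
session = cluster.connect("plays")  # a keyspace that also holds ordinary tables

# A vector column is just another column type on a normal table.
session.execute("""
    CREATE TABLE IF NOT EXISTS lines_v (
        line_id int PRIMARY KEY,
        body text,
        embedding vector<float, 1536>
    )""")

# A storage-attached index makes the vector column searchable by similarity.
session.execute("""
    CREATE CUSTOM INDEX IF NOT EXISTS lines_v_ann
    ON lines_v (embedding) USING 'StorageAttachedIndex'""")

# Approximate-nearest-neighbor query; in real use, query_vector comes from
# the same embedding model that was used at write time.
query_vector = [0.0] * 1536  # placeholder embedding
rows = session.execute(
    "SELECT body FROM lines_v ORDER BY embedding ANN OF %s LIMIT 5",
    [query_vector],
)
```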
Oh, and the slides: I will put my slides up. I didn't make the deadline, I'm very sorry, but if you check later today you should be able to get them. Yes. RAG is really kind of tricky: you aren't gonna get the same answer from the same model every time, even if you give it the same data. That's just the way LLMs work. I mean, I asked it the same question five times and it gave me five similar but different answers. So at the completion level, how you set up your prompt matters a lot, right? That's gonna be different based on the model you have, and you can also set things such as temperature, where you tell it to bound the variation in what it returns (there's a quick sketch of that below). So, I have to wrap up; it's 11, time for the next talk. But please feel free to contact me and ask if you have questions, or if you play with the notebook, let me know how it works for you; that would be great. And I really appreciate all of you coming. So thanks so much.
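[On the temperature point: most Bedrock models accept a temperature setting, which you can pass through the LangChain wrapper used earlier, reusing the bedrock_runtime client from the setup sketch. A sketch, assuming Claude 2's Bedrock parameter names:]

```python
from langchain.llms import Bedrock

# Lower temperature means less randomness in token choice; 0 is as bounded
# as it gets, though retrieval and the model can still vary run to run.
llm = Bedrock(
    client=bedrock_runtime,
    model_id="anthropic.claude-v2",
    model_kwargs={
        "temperature": 0.0,
        "max_tokens_to_sample": 300,  # also caps the answer length
    },
)
```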