All right. Thank you, everybody, for coming. We're going to talk about making Cassandra easier for AI developers. I realize that this sounds very similar to probably five other talks that you saw on the agenda throughout the course of this conference, and I think we're all making a point. I'm just going to make one small slice of a point that you've probably heard from some other talks. It's going to be pretty complementary to what you've heard.

One of the things that I think has been really interesting about this conference is that blurring of the lines between the Cassandra Summit and AI.dev. I didn't know what AI.dev was when I got here, I'll confess, and it's kind of blown my mind, this idea that the foundations of the AI revolution are open source. That's really what's going to drive things forward long term. I'm now a believer in that. I'm also a believer, and have been, in what Chet shared in his keynote: that Cassandra is the database for this. So again, this is just one slice of that. Very glad to be here. Maybe we won't remember this as the first Cassandra Summit since 2016 or whatever it was. Maybe we will all be claiming, five years from now, that we were at the first AI.dev. I don't know. Time will tell, so let's see where this goes.

Okay, I don't have a slide at the beginning that tells you what I'm going to tell you. The whole outline thing, I skipped that. But I'm going to tell you a story, and there's a premise here about making Cassandra easier for AI developers. That means that it's hard for at least some developers. So let's talk a little bit about why Cassandra is hard, or has been hard, for developers.

So I wrote a book, and people (I'm going to pick on Aaron because he just walked in the back) like to give me crap about the Cassandra book. Does anyone have any ideas why that might be?
Dang it. No, that's not it. Okay, so basically I work at DataStax, which is a company that for the past several years has been building a cloud service. There are a couple of other folks that are doing this, and we're basically trying to invalidate large portions of this book. I had a co-worker mention to me recently that there was a part of the book that they would like to obsolete, and I was like, "Oh, you're talking about the whole second half of the book where we talk about operations, right? Because we're building Astra." And he said, "No, I hate the data modeling chapter." By the way, this was Ed, our chief product officer. And I said, "What? You hate the data modeling chapter?" And he said, "Yeah, because you throw partitioning all up in my face, and you tell people that in order to correctly data model for Cassandra and build performant systems, you have to get your data model right." I admit that's right. So what if the chapter that we obsolete is the data modeling chapter?

Okay, I'm a couple of years past the season of my career in which my whole job was telling people how to be good with Cassandra. So look, I'm just bringing the receipts. We're not going to dwell on this very much, but yes, I've taught people a lot about partitioning and how to design your data model. Here are my training slides. Okay, they're small on purpose.
So you don't read them. All right, so there's a whole process that we went through, right? Not only do you have to have a conceptual, logical, and physical data model, there's also this thing called access patterns that you have to come up with, because it turns out if you want to get your data in more than one way (the same data, but you want to query it in a different way), you've got to design a whole different table. This is what we've been telling people for years, and it actually turns out this is still true if you want the most optimal performance. But actually, we're getting pretty good performance the other way.

But there's more to this complexity. I would go and be one of the people that would deliver a week-long training class, right? Because we wanted to make sure to let you know that in order to be competent to touch this, you must have at least one week of training. And that's if you're paying attention, right? If you weren't paying attention, then it's going to be ugly. And it's kind of a trope, right? We go to the Cassandra Summit and we hear the horror story talks. I just got out of a great one from Isaac and Lindsay: all the things that can go wrong. We know this.

So we teach you about the very user-friendly CQL, upper left, which is a very powerful tool. CQL has a lot of things in it. And then, upper right, I'm particularly fond of teaching you how to build microservice architectures in which individual services are responsible for their data types. No, don't take a picture of this part. And then, bottom left, there are all the drivers that we provide that help you to write and execute CQL statements. We have all these very elegant object models that are set up around this, with CQL sessions that you can build and configure, and look at this awesome fluent API. No, don't look at that. Then we taught you: okay, you need to understand the CAP theorem.
This is very, very important, and I'm sorry to be so irreverent, but this is how it is. This is what we've taught you all this time, right? You can't have it all, and you need to be aware of replication strategies and consistency fact... consistency factors, or what is it? There's a lot of terminology. Not only that, we don't have ACID transactions yet (pretend that I'm not talking about 5.1 yet), so we have these things called lightweight transactions, where you can lock in to have consistency guarantees on individual rows. Must be this tall to use. We have time to live, and there have been many horror stories around inconsistent settings of time to live, which is a very elegant feature: Cassandra will clean up your old expired records for you.

All right, I think I've gone on long enough with this rant. What I've been working on for the past couple of years is, I switched over to the engineering side and started working on making Cassandra easier for developers, instead of trying to help people work around all the hard parts, which is still a great and valid thing to do. Thank you, advocates.

All right, so here's approach one: let's make it easier with APIs. Look at all the complexity of setting up and using a driver to talk to a database, and I've already left out all the data modeling parts. Or you could just call an HTTP API. HTTP is great, right? So this is the mission: Stargate. Captain's log, Stargate 2023.

All right, so we produced APIs. There's more irreverence; I'm apologizing in advance. This is a slide that we presented quite a lot. We have multiple different APIs that are part of the Stargate project. There's a gRPC API, which is basically CQL over HTTP. There are two different flavors of GraphQL API. One of them is a little more CQL-centric.
One of them is a little more GraphQL-centric. We have a RESTful API, which models your keyspace and table name as elements on the URL path, and then everything else is in a JSON body. And then we have a document API, which does something similar with keyspaces and tables and tries to allow you to provide a blob of generic JSON, and we'll figure out how to put it in the database for you. This last one is pretty promising. It's the only one that doesn't bleed CQL all over you.

See, the trap that we fell into with designing a lot of these APIs, especially these ones in the middle, is that we made them expose things and syntax that you would expect to find from CQL. It turns out that once you start doing that, you can never stop, because you will continually receive feature request after feature request to add more of the CQL syntax into those kinds of APIs. You'll probably never catch up, and we certainly haven't. So what we did a few months ago was to stop listening to all the people that told us how great those other APIs were and decide that we were really going to go forward and create a new document-style API. The technical details of that are covered in Aaron's talk from a little bit before, the title of which escapes me, but that's fine because I have it a couple of slides down from now and I'll share it with you.

So, JSON API. The thing that we realized is we needed to start being idiomatic. Instead of pushing all of our CQL all over you all the time, what we really needed to do is provide APIs, or even a single API, that really just expresses a simple way of doing things. HTTP is very helpful, and having that structure is great, but really, most developers don't write a bunch of curl commands and then type that exact syntax of a web query into their program. That's not how you do things, right? You usually have a client
library that helps you to do things. So what we've got is a whole explosion of client libraries that are becoming helpful with our new API.

So today I'm going to do a quick recap of a couple of key points that Aaron made in his talk. First thing: we actually started from the outside in and asked, what is a really great developer experience around a document API? We looked at the JavaScript community and said, this is a community that does a lot with JSON. Document is very natural for them. There's a lot of Mongo usage there. How do people in the JavaScript community do document-style APIs? So we partnered with Valeri Karpov's Mongoose project. He did a great talk at a virtual conference that we did back in March called Cassandra Forward, kind of also under this Cassandra Summit umbrella, where he talked about the evolution of that project. So I'm going to talk about a couple of the elements of that.

It's pretty simple in its architecture. We have an API called JSON API that we've developed that sits in front of Cassandra, and we have a client. There's a Mongoose client. It's an existing client, with millions and millions of GitHub repos that are using it. Yes, you can take a picture of that one. This is the part where... that's fine. I also have a link at the end, right?
So I have all the pertinent links at the end if you want to wait, and my slides are online. So yeah, take a picture, don't take a picture, it's all gettable later. But the Mongoose library: we basically just plugged in a driver that allows it to talk to Cassandra through the JSON API. So the focus is, we're not really trying to teach you how to use a new HTTP API. For most users, the idea is that it's some API, currently called the JSON API (at least that's the name of the repo right now). We could rename it to something else; you probably don't care, because you'll actually be using an idiomatic client for your programming language. That's the whole strategy.

Okay, so what does this look like? Super fast, I'm not going to read the code to you, but it's pretty simple. On the left side, all we're doing is initializing a Mongoose client that's connecting to a particular database, plugging in the Stargate Mongoose library so that we can connect to Astra or an open source offering. We have a way where you can just download Docker containers and run this on your desktop, and yes, it works just fine. We also deploy it in Astra. On the right side, it's a very simple interaction model. Because Mongoose is an object data mapper, you define a schema that's going to describe your JSON data, and then it's a very simple interaction to create an object, populate the different fields, and hit save. And then you'll see some similar-looking syntax that involves the ability to find data on the following slides. So this is idiomatic to a JavaScript developer.
That's the way that they would like to interact with data.

Okay, so the details of the JSON API: some of the syntax actually looks quite similar. It is a JSON-based API. We have the ability to create a collection of documents, upper left. We have the ability to insert one or insert many, and we have update one or update many. These operations allow you to provide JSON documents, or with update you can patch portions of JSON documents. There's a bunch of syntax that you can see the details of from Aaron's talk. On the right side, you can see some of the different find operations. These can get pretty interesting. There's even a findOneAndUpdate, which is a combo read, update, and write type of operation. So you can do some pretty sophisticated and pretty selective data manipulations using this API, and yes, we took as inspiration the things that you could do in the Mongoose API to let it help us derive what operations needed to be on the JSON API. So, super powerful API.

Probably about two-thirds of what Aaron went through was how all this translates into CQL queries that are made on the database, so I'm not going to re-explain that portion of it. But this was sort of prefaced in his presentation by saying this is highly offensive to people that know CQL: the idea that we would be creating a table with so many indexes on it. I haven't even shown all the indexes that we create on each table on this slide. But we're using something called storage-attached index (SAI), which is new to Cassandra, coming in the 5.0 release, with some new advanced features in particular. We'll talk a little bit about those, because they'll become important in making Cassandra usable for AI developers, which is the next part of the talk.

So we have the initial storage-attached index, which is under CEP-7. What is just in the process, or recently completed and in the process of
getting merged in, I guess to 5.1 or to trunk, is the CQL NOT operator in CEP-29, which allows us to do some pretty cool things: NOT CONTAINS for sets and maps, and not-equals for map entries. I don't even know if there's a CEP yet for range queries on map entries, but we make use of that to do some pretty cool inequality-based filtering in the JSON API. So there's Aaron's talk that you can go back and look at later. We also did a blog together explaining some of the basic principles, so there are two different ways to consume that information.

So we're on this road to deploying this JSON API and getting it into a fully production GA state. We've been working on this for a few months, and all of a sudden the Cassandra community starts talking about vector, right? We're going to make Cassandra a vector database. So this comet is coming in, and from this perspective I guess I depicted JSON API as already being a dinosaur. Sorry about that. But we have this comet coming in of this AI revolution, and we're now talking about vector databases, and of course we immediately pivoted and asked ourselves the question: okay, we have a new API that's a document-based API. Is vector search a valid and desirable feature for a document-oriented style of interaction? The answer is very much yes. So we have this additional CEP related to SAI, which is CEP-30, the ANN vector search, and in short order we've also incorporated that into the API and into our clients as well.

So the cool thing is, if you're thinking of the question, does a comet destroy all life on earth? Well, not really. If there are things that get destroyed, there are a bunch of other things that start popping up around it. I'm going to show you a bunch of these things that have started popping up because we added vector search to our new API and ecosystem.
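To make the JSON API operations mentioned in this talk a bit more concrete, here is a minimal TypeScript sketch of what document-style command bodies could look like. It is not taken from the slides. The command names (insertOne, find, findOneAndUpdate) come from the talk, but the field names (`document`, `filter`, `update`, `options.limit`), the Mongo-style `$ne` and `$set` operators, and the sample photo document are all assumptions for illustration, so check them against the JSON API documentation. The code only builds and prints JSON; it does not call any server.

```typescript
// Hypothetical request-body shapes for a document-style JSON API.
// Field names and operators are assumptions, not the confirmed wire format.
type Doc = Record<string, unknown>;

type FindBody = { filter?: Doc; sort?: Doc; options?: { limit?: number } };

type Command =
  | { insertOne: { document: Doc } }
  | { find: FindBody }
  | { findOneAndUpdate: { filter: Doc; update: Doc } };

// Wrap a single document in an insertOne command.
function insertOne(document: Doc): Command {
  return { insertOne: { document } };
}

// Build a find command; sort and limit are only included when provided.
function find(filter: Doc, sort?: Doc, limit?: number): Command {
  const body: FindBody = { filter };
  if (sort !== undefined) body.sort = sort;
  if (limit !== undefined) body.options = { limit };
  return { find: body };
}

// Build a combined read-and-patch command.
function findOneAndUpdate(filter: Doc, update: Doc): Command {
  return { findOneAndUpdate: { filter, update } };
}

const commands: Command[] = [
  insertOne({ _id: "photo-1", title: "Harbor at dusk", tags: ["sea", "boats"] }),
  // Inequality-based filtering: everything whose category is not "archived".
  find({ category: { $ne: "archived" } }, undefined, 10),
  findOneAndUpdate({ _id: "photo-1" }, { $set: { title: "Harbor at dawn" } }),
];

// Each command would be sent as the body of an HTTP POST; here we just print it.
for (const command of commands) {
  console.log(JSON.stringify(command));
}
```

The point of the design is the one the talk makes: nothing in these bodies mentions keyspaces, partitions, or CQL, and an idiomatic client library would normally build them for you.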
So this is where we really get into the funnest part: making it easy for AI developers.

First of all, this is from the perspective of Mongoose. With the Mongoose project itself, working with Valeri Karpov, we started working toward adding vector as a thing that's available to all Mongoose users, not just people that are running this with JSON API or Cassandra or Astra. So now, when you are creating a model, you can define what vector is going to be stored as part of each schema that you create, setting what your vector dimensions are and what the algorithm is going to be. This is now part of the Mongoose project itself.

Then, when you insert data, you can provide the vector. There's a function shown here called embedding, which is sort of like, oh, we're just going to call some other thing to embed it. We're actually looking forward to providing the ability in the API to calculate the embedding for you. That's something that is definitely on our roadmap.

And then the vector search looks idiomatic to the Mongoose developer. On a find, in the middle section there, you can ask for your results to be sorted based on ANN search results. Or you can also do what's known as hybrid search, where you're doing some attribute-based searching and then also a vector-based search: you're using the attributes to narrow your search space, and then you want to do a similarity search on those results. This is a common pattern that is frequently used in AI applications.
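The hybrid search pattern described above can be sketched in plain TypeScript. This is a brute-force, in-memory stand-in, not the ANN index that SAI provides and not the Mongoose or JSON API syntax from the slides: the `Photo` type, the `hybridSearch` function, and the sample data are all hypothetical, and exact cosine similarity is swapped in for approximate nearest neighbor search. It only shows the order of operations, which is to filter on attributes first and then rank the survivors by vector similarity.

```typescript
// Hypothetical document shape: one attribute plus an embedding vector.
interface Photo {
  id: string;
  category: string;
  vector: number[];
}

// Exact cosine similarity between two vectors of equal dimension.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector dimensions must match");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hybrid search: narrow by attribute, then rank by similarity to the query.
function hybridSearch(photos: Photo[], category: string, query: number[], limit: number): Photo[] {
  return photos
    .filter((p) => p.category === category)
    .map((p) => ({ photo: p, score: cosineSimilarity(p.vector, query) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, limit)
    .map((scored) => scored.photo);
}

const photos: Photo[] = [
  { id: "a", category: "landscape", vector: [1, 0] },
  { id: "b", category: "landscape", vector: [0.6, 0.8] },
  { id: "c", category: "portrait", vector: [1, 0] },
  { id: "d", category: "landscape", vector: [0, 1] },
];

const results = hybridSearch(photos, "landscape", [1, 0], 2);
console.log(results.map((p) => p.id));
```

Note that photo "c" matches the query vector exactly but is still excluded, because the attribute filter runs before the similarity ranking. In a real deployment the database does both steps, and the similarity step uses an index rather than scanning every document.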
We have an example of this that we built. Yuki, one of the engineers on our team, built this photography app as soon as we finished adding the vector capability into the API, which I think is very cool because it's combining usage of the API's vector search features with calls out to other image analysis libraries. So it's a great example of how you can bring multiple different APIs together in a single application and create something pretty powerful. He also wrote a blog about it, which we've linked there, and I'll give you the link to the repo at the end.

So there has been a focus in everything I've talked about so far on JavaScript developers, and it's a pretty sophisticated ecosystem that is starting to be built out. I'll confess to you that I had to resist the temptation throughout the past two days of this conference to keep editing and adding things to the slides, because one of the awesome things about being at the end of the conference is you get to learn a bunch of stuff, and then you want to go add it all to your slides so you can talk about all the other things. And sorry, Kirsten, I didn't add the CLI to the Java ecosystem. But there's that temptation to come in and absorb and add in all this stuff, and I will say that I haven't really been at a conference before where everyone else was changing all their slides, not because they waited till the last minute to do their slides, but because literally the technology is being written and we're just trying to keep up and make the slides with all the new technology that we're creating. That's a pretty exciting place to be.

So, the JavaScript ecosystem. In the upper right, we have the Mongoose object data mapper style of client. We were able to pretty quickly factor out the core of that into a non-object-data-mapper version, which is called astra-db-ts.
So there's a TypeScript version of that client library, and then there are integrations that are built in. The integrations that we have for LangChain and LlamaIndex are already incorporating those JavaScript clients; they're already integrating the Mongoose library.

I did allow myself this one last-minute edit of the slide deck today: we just released this morning a blog about our Taylor Swift chatbot. I'll let you go play with this, or you can come up afterwards and we can ask it questions if you want. I have the link to the actual app, which you can use. I didn't put the link in here because I don't want you to go use it right now. We'll do that at the end. The source code is available for that, and I think we will take a quick look at some of the source, because I was so curious myself. There's a bunch of stuff here on the layout of the app and all that, but I went in here and just looked at the chat library to see... of course, I'm not connected to the internet. That turns out to be important. Okay, let's come back and do that later. If I can get it working, I'll show it to you. But basically, it's a RAG-style application, and the part that does the database query is, you know, five lines of code where we're going and retrieving the vector data that helps us narrow down the set of input that we're going to pass into the LLM for the call. It was so wonderful and pleasing and amazing to me that we did all this work on the back end so that you could write five lines of JavaScript and build a chat app. Brings a tear to my eye.

Okay, so we have something for other people as well. There's an equivalent ecosystem on the Python side. We also have AstraPy, which has integrated through its vector store class.
That's actually just a little layer on top of the JSON API. So if you are going to be doing a document style of interaction, that's going through AstraPy, and if you are going to do more of a classic CQL type of interaction, CassIO does that, although it does provide a nice abstraction, and thanks to Stefano's talk, I know that it doesn't bleed very many CQL-isms on you. Reference my earlier rant. No, it's quite a nice library. And then, for the Python world, we have the equivalent LangChain and LlamaIndex integrations. So the feel of this is going to be pretty much the same as we've seen in some of the other examples, except it's idiomatic to Python: we get our data back in dictionaries and all this manner of things.

So, the Java ecosystem. LangChain4j is what we have in the LangChain world, and I've just highlighted some of the key classes in these APIs here. There's also an Astra client for Java, so we're not neglecting Java. So again, we have all these client libraries in multiple different languages. This is going to continue to explode as the little baby pine trees come and grow up and populate our forest of AI tools.

Okay, I took a big risk here, because I provided an incomplete view of what is growing and will be a massive ecosystem, right? And this is admittedly centered around things that work with the JSON API. So I'll just highlight a couple of things. Don't tell me my arrows are wrong. Although, I mean, you can tell me my arrows are wrong; I like to learn. One of the things that we are working on is this idea of the JSON API being able to call embedding services on your behalf.
So this is an arrow that doesn't yet exist but will in the near future. We like this idea that there are these two paradigms in working with a powerful database like Cassandra. There's the classic CQL-based paradigm that you've always been able to use, and now we have this new document-based paradigm that sits on top of it. So do you have to choose? Where is all this going?

Here are another couple of slides where we're going to talk about what's coming. There are no dates or guarantees or implications of anything. Right now we have a preview available in DataStax Astra, so you can run and play around with all this stuff without having to install and configure anything. There are instructions for creating a vector database in Astra, and then some instructions on how to use some of the different clients, Mongoose in particular. So we've just added the vector search capability to the JSON API. People are asking things like: can you have analyzers? Can you do text search features? This is, I'd say, in the process of being added as a feature of Astra and then coming as an open source contribution for SAI. So, guess what, we'll probably be including that in the JSON API as well. Similar idea: people want to be able to do geospatial searches.
Okay, we can look at doing that. I'm just throwing out the laundry list of things that we're being asked. I've mentioned this idea of being able to actually calculate embeddings on people's behalf, and who knows what else is next. But one of the things that I think is really cool is that the JSON API is itself a source of search requirements for SAI. So we could draw a little spiral of interchange where we have these two different views of data, CQL and JSON, and then there are some complementary things happening in terms of changes to Cassandra that are going to bring a lot of power to both CQL and JSON. And maybe these will actually become more and more interchangeable and interoperable over time.

All right, so this is a conversation starter for you and me after this talk. I've had a couple of people hit me up lately about, hey, Jeff, you should write another book. I don't know why I entertain these conversations. It's a lot of work to write a book. But people have hit me up with a couple of ideas, and I'm looking for input on what to do, or just to talk about books, and that's fine too. So yeah, there's a definitive guide that has not been updated for Cassandra 5. Could do that. Could write a new book on Cassandra for AI specifically. Could write a book about vector databases. Or maybe you would like to tell me that no one is going to need books anymore because we have LLMs. This is also a valid input. So I'm curious about what you think, and I'm having a lot of these conversations. Is there some body of knowledge that is emerging and is well-defined enough that it's not going to be irrelevant in six months if we write something, that will have some staying power? I'd love to know what you think about it.

And that is the conclusion of this presentation. Thank you. I'll take some questions if I have time.
I don't know if I went over. I guess I'm two minutes over, but there are no more talks after this. That's right, we can do whatever we want. We can go have adult beverages. You could have left at any time, Eric.

Awesome. Okay. Yeah, grab this one, because for all the stuff that I flew by and made reference to, the links are on here. There's a QR code for where you can go and just get onto Astra quickly and get going with a bunch of this stuff. So I'll leave this up, and you can ask me something if you want, and then I'll turn my mic off and we'll go. Any burning questions?

Which one do you want? Oh, I uploaded the whole deck. If you go to my talk on the website or the app, it should all be there. Oh, yeah, do you guys want to do demos? We can go play with the app now. Yeah, I have no idea. It's not where the expo hall was. All right, go on, get out of here. Thank you, everyone.