Welcome back, everyone, to theCube SuperCloud 6. I'm John Furrier with Dave Vellante, here in our Palo Alto studio, live for our presentation all day. We're going to talk about the AI innovators. We've got two great startups here: Venkat, the CEO of Rockset, and Kyle, head of product at onehouse.ai, building AI apps at cloud scale. Guys, great to see you. Venkat, good to see you.

Great to come in. Thank you for having us.

Thanks for having us.

You guys are startups that are growing rapidly, both your companies, but you're on the growth curve and you're in the middle of what I call the perfect storm. If you're not on the right side of this wave, you're either going to be driftwood or miss the wave. So you guys are in good position. We've had you both on theCUBE before. The big conversation is, okay, as startups, you've got to get that next round of funding, and it comes down to success, and you're both in the hottest area. Data is the number one thing people are talking about with GenAI. Bad data means bad GenAI. Then the next question is: what infrastructure am I going to run it on? That's not about a server or a cloud. That's a collection of stuff, a system. So this is the hottest area, and I want to get your perspective. As an innovator, how do you see the current market relative to how people are keeping up with the trends? Venkat, we'll start with you.

For sure. Thanks for the opportunity. In terms of what's happening, GenAI applications, LLM-based applications, created a new category of applications that require a different data architecture. You can't use traditional databases and traditional backends to try and build and innovate in this space. And now I think the next wave of applications being developed is that every application is getting enhanced with AI.
So it's not just that there is a new category of applications, like chatbots and other things, being created; every application now is getting enhanced with AI. And so the demand for a data architecture that allows people to build these AI applications quickly and efficiently at scale is the most important need of the hour. That's exactly what we're doing at Rockset. As a search and analytics database, we power applications that need to be modern and that need to be enhanced with AI. We have customers like JetBlue building GenAI applications on Rockset; we can talk about those kinds of use cases. But I think it's really important to continue to innovate so that data architectures are enhanced with the ability to store vector embeddings and the ability to index vector embeddings, so that you can extract value from both your structured and your unstructured data.

I know you guys are doing a lot there, and we'll probably come back to that. But Kyle, I want to get you in here on this one question, because you're both doing things that aren't categorically what other people used to do. It used to be: they do observability, they do a database, they do that. Those were categories, maybe magic quadrants or whatever. But now we're in a world where the needs are different. You need to be multiple things. So these new categorical formations are starting to happen, and I noticed you both have that same dynamic. Can you explain what's happening and why?

Yeah, the wind is at our back, like you mentioned and teed up. We're building our businesses in the right space right now, where there's a lot of urgency and a lot of demand. Everyone's trying to build better data structures and systems, as you mentioned, that can be ready for building AI applications and generative AI.
And what we're building at Onehouse today is what we call the universal data lakehouse: something that can unify data from all of your variety of sources, whether these are event streams and you need to bring real-time data into your system, or transactional databases behind your line-of-business applications and you need to bring in change data capture, or even just pulling out any of the data that's inside your swamp of a data lake and adding structure, governance, and performance optimization to it. So it's analytics-ready and ready to train your ML models, to build and create the vector embeddings and store them in any downstream system that you need, and to leverage any tool that's out there to then go build on that.

So the big innovation in the so-called modern data platform, from let's call it 2015 to today, was the separation of compute and storage at cloud scale. We know that well; a lot of money came in. Obviously Snowflake and Databricks got escape velocity, and the hyperscalers obviously play there. So I'm curious how you see the next-generation platform. We call it the sixth data platform; we're kind of playing around with names, but it's very different. It's powering intelligent apps, much more real time. That's why we love having Uber in, because it's real-time people, places, and things. As well, it's unifying all the different data sources. Now, of course, you've got existing data platforms trying to get there. How do you guys see this playing out? Maybe Venkat from the intelligent-apps perspective, and Kyle, maybe from: what does that future data store look like?

I think if you look at these modern AI applications, the data architecture largely has two really important sides to it.
There's one side where you're training: you're either building new models or you're enhancing and fine-tuning existing models with your own data sets, the proprietary data that helps you fine-tune and build these models better. So there's this whole infrastructure around how you aggregate all this data and how you use it to build really efficient models that help you build products enhanced by AI, or you're taking open source models and fine-tuning them with your data sets. And then there's the inference side, where you take those models, you extract embeddings out of them, and you still need a serving tier to build applications on top of. Both of these sides need to get enhanced so that you can have very fast, iterative cycles.

And the other really important component that wires across this entire stack is real time. There are very few AI applications that can work in batch mode. I'll give you an example. Let's say you want to build a song recommendation. If, based on the song that is being listened to, you just recommend another song that is closest to it in vector space, that's not actually going to be a very good recommendation engine, because the user might have just heard that song five minutes ago. So that's a very bad recommendation for that individual user. You have to incorporate behavioral data, you have to incorporate real-time data, in order to make these products really effective.

Just real quick on that one point, because this is really nuanced, but it's important. The vectors are important for identifying context. What you're saying is a new kind of data set needs to be behavioral, in vector form or other form?

No, it's usually metadata. It's usually called metadata filtering, in the AI parlance.
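To make the song-recommendation point above concrete, here is a minimal sketch of combining vector similarity with a behavioral metadata filter. All the song names, embedding values, and the `recommend` helper are invented for illustration; a real system would use a learned embedding model and a vector index.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy catalog: song -> embedding (hypothetical values).
catalog = {
    "song_a": [0.9, 0.1],
    "song_b": [0.85, 0.2],   # closest to song_a in vector space
    "song_c": [0.1, 0.9],
}

def recommend(current_song, recently_played):
    query = catalog[current_song]
    candidates = [
        (cosine(query, vec), name)
        for name, vec in catalog.items()
        # Metadata filter: drop the current song and anything just played.
        if name != current_song and name not in recently_played
    ]
    return max(candidates)[1]

# Without the behavioral filter, song_b would win; because the user
# just heard it, song_c is recommended instead.
print(recommend("song_a", recently_played={"song_b"}))  # -> song_c
```

The vector search alone picks the nearest neighbor; the real-time behavioral state (the `recently_played` set) is what makes the recommendation useful.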
It's really important to combine traditional data sets and the real-time state you have about what is happening, the listening history and other things in this particular example. You have to incorporate that alongside the vector embeddings and vector search to build contextualized, personalized recommendation engines and so forth.

This is a unique use case, situational, based on the GenAI movement. What ML and GenAI have pulled out is the new holy grail: contextual and behavioral accuracy and personalization.

And for that you need real time, whether it is on the lakehouse side, where Onehouse and Apache Hudi allow you to build your entire data architecture on the training side, being able to accumulate all of that in real time, or on the serving side, where you really need a unified search and retrieval system, which is what Rockset is.

So, Snowflake would say: put it all into Snowflake and they'll do fine with that; they bring AI to all of it. But we know the but: it's hard to get transaction data. We can't get Unistore. Okay, so last June, I guess, we saw Databricks with Unity Catalog say, we'll bring anybody's data in. Okay, that's cool. And then Amazon: you've got metadata in Glue, you've got metadata in DataZone, and it's sort of all over the place, but they'll figure that out. So, is that the problem you're solving? How do you see the world differently?

Yeah, that's a great question. With the rise of AI, new tools are coming out every week, every month; you probably see new open source projects starting up, and customers need interoperability and choice. I think I even read some of the research that you guys published from ETR that showed that 40 to 50% of accounts that use either Databricks or Snowflake are using both. And there's a new trend you've seen also from Snowflake: they're supporting open table formats such as Iceberg on the data lake side.
And Databricks, of course, is where Delta Lake was created. Now there's a third one that's been around since 2016, that came out of Uber, and that's where the origin story for Onehouse is as well: it's called Apache Hudi. So right now customers are in a tricky situation: if they're using Databricks, Databricks wants them to use Delta Lake; if they're using Snowflake, Snowflake wants them to use Iceberg. So we also just recently launched a new open source project. We co-launched it with Microsoft and Google, and it's called XTable, Apache XTable. It's incubating in the Apache Software Foundation right now. This is fresh news, hot off the press, just put out the...

Is Elon involved today?

No, no, no. XTable stands for cross-table, and the purpose is to enable seamless interoperability between the...

Yes, yes. So what's the impact? What is XTable going to enable?

So now you no longer have to spend months. I've seen organizations spend months and months of analysis and get trapped in analysis paralysis over which table format to choose. Each of these communities has purpose behind it; they're all three great projects, and they all have special features. So people ask: do I need something that's closer to near real time and has faster ingestion capabilities, like Apache Hudi? Or do I need something with a really great specification of the format, like Iceberg? Or do I need something that works incredibly well with Databricks, like Delta Lake? And they get trapped in a decision-making process that's very hard to get through.

Yeah, I want to explain. I mean, our view is we're moving from a world that is very application-centric to one that's data-centric. In doing that, you've got metadata that's locked inside all of these application silos.
And that's the problem you're trying to solve, because you're applying AI to that. And then when you get to the point of systems of agency taking action, you can't do that unless you have a unified data source.

And like you called out, there was the great innovation of decoupling compute from storage, right? And the rise of Snowflake and similar tools. Now, as you guys have talked about with this sixth data platform, now is the time to decouple data from storage and from all of this compute, because everyone wants to build a vertically optimized stack.

Yeah, we call it moving from separating compute from storage to separating compute from data.

Yes, yes. All data sources are then available to any compute.

We're not quite there yet, though, because it feels like there are a lot of missing gaps. You guys are trying to fill those gaps, obviously.

That's right.

So again, back to this point: if you believe that unified data, or separating data from storage, is going to happen, which we do, then look at generative AI. To your point, what you're saying is that real-time information, data addressability, has to have low-latency availability.

Exactly. Low latency and high availability at the same time.

That's not the way it was. It used to be very slow: you call a database, you get stuff back. Now, making data freely available also has risks. You've got security risks, you've got exposed data, so you've got to have all that governance built in from day one. This is disrupting the market, because all the data applications were stovepiped. Okay, now if you take the silos away, take the stovepipes away, and make the data horizontally available, then you start thinking about what that means for categories like observability. What about databases? What about working with context? If I have vectors, if I'm generating answers, is there memory? How do I observe that?
So every category in these big markets is inadequate.

Yes. I think the models, the autoregressive LLMs, are doing a phenomenal job with enhancements like RAG, right? One way you're infusing context is: you turn a prompt into a similarity search, you retrieve relevant results, you add them to the prompt even though the user didn't actually specify them, and using that, the LLM is able to spit out a collection of words that happens to be quite accurate in quite a number of situations. But again, think about a major airline building a chatbot. What are people asking questions about? Real-time flight information, right? Take any GenAI application, recommendation engines, anomaly detectors: is this anomaly new? Is this happening now? Why does it matter? In any application that is sophisticated, that is actually going to have real-world impact, and that is getting enhanced with AI, real time has to be part of the answer. Otherwise it's a library that can give you static information about some book that was written and summarize it for you, but that's not enough if you really have to put it into production and actually build better products enhanced with AI.

This is why we use the metaphor of Uber for all. You've got real-time riders, drivers, people, places and things, routes, transaction data...

You've got an application.

And you've got prices and ETAs and the like. Those are all different data elements, and you're bringing them together in real time. Uber has thousands of engineers building this stuff, but for any other organization to build that, they need some kind of horizontal layer. So how do you deal with transactions? That's the hard part.

Yes, that's right. There's a tricky art to stitching your data across transactional systems, operational systems, and analytical systems, right?
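The RAG flow Venkat described above, turning a prompt into a similarity search and adding the retrieved results to the prompt, can be sketched in a few lines. The retriever here is a toy word-overlap scorer standing in for a real embedding model plus vector index, and the LLM call is left as a formatted prompt; the documents and names are all invented for illustration.

```python
documents = [
    "Flight 224 to Boston is delayed 40 minutes as of 3pm.",
    "Checked-bag fees are waived for loyalty members.",
]

def tokens(text):
    # Toy tokenizer: lowercase words, punctuation stripped.
    return {w.strip("?,.").lower() for w in text.split()}

def retrieve(prompt, k=1):
    # Stand-in for a vector similarity search: rank documents by
    # word overlap with the prompt and return the top k.
    q = tokens(prompt)
    ranked = sorted(documents, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]

def build_prompt(question):
    # Retrieved context is added even though the user never typed it;
    # a real system would send this augmented prompt to the LLM.
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("Is flight 224 delayed?"))
```

The point of the sketch is the shape of the pipeline: retrieve relevant, ideally real-time, context first, then hand the augmented prompt to the model, which is why fresh data matters so much for answer quality.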
Change data capture has been around for a long time, but when you try to bring it into data lakes and object storage, things like S3, that's when it gets even harder. And that's why some of these projects were born. If you read back in the origin story from Uber itself, they were trying to bring transactions to a data lake and object storage, Hadoop at the time. That's where Apache Hudi was born, and certain metadata layers were built around it to enable you to mutate data on otherwise immutable object storage. Once you can bring in data from transactional databases and OLTP systems, you can have it in lakes, where you can bring the best tools to the table, whether that's Databricks with Apache Spark, or Apache Flink, Presto, Trino, you name it. You can create your vector embeddings from there, then serve them into vector databases or systems with combined search capabilities, like Rockset, and stitch it all together through your data architecture.

Well, you guys are both doing great work as innovators, and we appreciate you coming on theCUBE. We have a few minutes left. It's hard for startups in your category, where you're doing things that don't look like what came before. As Andy Jassy used to say in the early days of AWS on theCUBE, you have to be misunderstood for a little while before people get it, and now people have started to get it. So in the last couple of minutes we have, I want each of you to talk to the cameras, to the audience, from my perspective as a customer and as an investor, because you guys are doing rounds of funding soon and have probably got tons of term sheets. What's the vision? What should the customer be thinking about your company? Why do you exist? What's the main thing you do that's different from what people might think you do?
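As an aside on the point above about mutating data on otherwise immutable object storage: the core idea can be sketched as copy-on-write snapshots, which is conceptually what Hudi's metadata layer manages at scale (this is a toy illustration, not the real Hudi API; the record names and values are invented).

```python
# Toy copy-on-write upsert over immutable "files". Nothing is edited
# in place: each commit applies CDC records to a copy of the latest
# snapshot and appends a brand-new version.
snapshots = []  # ordered list of immutable snapshot versions

def commit(upserts):
    base = dict(snapshots[-1]) if snapshots else {}
    base.update(upserts)          # apply change-data-capture records
    snapshots.append(base)        # write a new snapshot, never mutate old ones
    return len(snapshots) - 1     # commit id; old versions stay readable

commit({"ride_1": {"fare": 12.0}})
commit({"ride_1": {"fare": 14.5}, "ride_2": {"fare": 7.0}})  # update + insert

print(snapshots[-1]["ride_1"])   # latest value after the upsert
print(snapshots[0]["ride_1"])    # the earlier snapshot is still intact
```

Readers always see a consistent snapshot, and older commits remain queryable, which is how transactional semantics land on top of append-only storage.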
And then, if you're an investor, what does the pitch deck look like? What's the summary?

I can go first. Rockset is a search and analytics database built for the cloud. If you want to build GenAI applications, what customers are looking for is not one vector database over here, another database for traditional search, and then a completely different system for real-time analytics. We are a single database that can do unified search and retrieval, and we are built for powering end-user, customer-facing applications. So you can build very powerful applications, and we're entirely built for the cloud: build your applications faster, with extremely quick time to market, and scale them efficiently. We have a massive number of innovations; we can index in real time, with sub-100-millisecond response times and data latencies. Rockset unifies all of your data sets across traditional search, vector search, and real-time analytics, so that you can build these powerful applications, all simply using SQL.

Real quick follow-up there. Why is vector search important now, in the context of retrieval-augmented generation, or RAG as they call it? It's been around for a while. Why are people paying so much attention to it, and why is it important for an enterprise?

Yes. I think it is the best way for you to leverage all the innovation happening in AI. I'll give you a very simple example. If you're doing keyword search in your application and somebody types "cold beverage," you also want to suggest iced coffee, which doesn't have the word cold or beverage in it. That is the kind of enhancement your application can have when you enhance your search applications with vector search.
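The "cold beverage" example above is easy to see in miniature. The embedding values below are hand-made stand-ins for what an embedding model might produce; the point is that keyword matching finds nothing while vector similarity still surfaces the semantically closest item.

```python
import math

# Hand-made toy embeddings (illustrative values only): semantically
# similar items sit close together even with zero keyword overlap.
embeddings = {
    "iced coffee": [0.9, 0.1],
    "hot soup":    [0.1, 0.9],
}
query = "cold beverage"
query_vec = [0.85, 0.15]  # hypothetical embedding for the query

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Keyword search finds nothing: no item contains "cold" or "beverage".
keyword_hits = [k for k in embeddings if "cold" in k or "beverage" in k]

# Vector search still ranks "iced coffee" closest to the query.
best = max(embeddings, key=lambda k: cosine(query_vec, embeddings[k]))
print(keyword_hits, best)  # -> [] iced coffee
```

This is the whole pitch for semantic search: the match happens in embedding space, not on the literal words.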
And so our thesis is that AI applications are not a new category of applications that sit off to the side while traditional applications keep going. Every application that you use on a daily basis is going to get enhanced with AI, and you're not even going to notice it, because you're just going to take it for granted: when I search for cold beverage, iced coffee should show up, and otherwise you just won't use that application.

Because the keyword doesn't match linguistically, but the vector calculation captures it differently.

Correct.

Okay, great. Kyle, you're up: customer and investor. Start with the customer. What do you guys do that's different and important? And what's the pitch deck?

Yeah. Alongside the customer, I'll share a quick story, but also a question for you. As companies and organizations go to build and innovate around AI, you probably see pocketbooks open, right? In contrast to the general market trend of budgets tightening, in every organization I've talked to there's still great budget for building and innovating around AI.

Yes, and we talked about that earlier. We showed some data.

Great.

It showed that, and it's actually stealing from other budgets.

Great. But they've got to see proof points. They want to see instant ROI. And in this new, emerging domain, there are also baselines that haven't been set yet. As companies go to build these, they're very quick to build architectures that work for quick applications and quick innovations on these new platforms. But now, as the industry starts to mature and your data systems start to scale, you'll see people pull back from specific single databases and vector databases toward systems that can scale around data lakes. So I'll share a quick story.
There was an organization, I even wrote this up recently, that was trying to generate vector embeddings from images and other text inputs, and it was taking a very long time to get done. They were running at data lake scale, which is typically around 100x the scale of data warehouses. So they moved from specific single tools to the data lake, where they could create the embeddings, and then put the embeddings in a purpose-built system to run the vector search. Once they did this, they got tremendous cost savings and much faster time to embedding creation, and they could put the embeddings in the appropriate system for vector search and really fast analytics on top.

To round that back from one specific customer story: if you are aiming to build a data lake or data lakehouse, Onehouse is the fastest and easiest way to get it done. Point and click, very simple to pull in all of your data from a variety of sources, and not just bring it in: your data can now be performance-optimized. We take care of all the specialized database and data warehouse functionality, indexes, clustering, the things you take for granted in a normal database or data warehouse. We bring these to the lake and make it seamless for you to do all of your data analytics.

What about traction? Where's the traction point you guys are seeing right now?

Yeah, traction is great. We see it from all sides of the market: people want to build these data systems and structures, and they're eager, really pushing into these areas, whether it's driven by cost reduction or by innovation.

So you see customer adoption on your side. Venkat, where's your traction point?

Hundreds of customers in production, from large enterprises to ten-person startups.

Okay.
We have customers like JetBlue and customers like Klarna building anomaly detectors and GenAI applications on Rockset, and they all want to scale. And coming back to one thing you mentioned: they have to see the ROI. A lot of AI applications are migrating from the tinkering phase to: I've got to run it in production. Okay, show me the ROI, what is it?

And where do I run it?

Yes.

Where do I host it? Do I host it in a cluster on premises, or in the cloud?

Exactly. Klarna is a customer, and they published results in the last week or so on what they've achieved by adopting GenAI: they have replaced the work of 400-plus support agents, and GenAI agents are able to do that work at a fraction of the cost. That's the kind of ROI you need. And the other really important point you made is that this is not a separate workload that sits off to the side: 100% of workloads are going to be intelligent, AI-infused. That is quite different from what we've seen in the past.

Well, we're actually a little bit over time, but since you're here, it's great R&D and good research for us too. Quick question for both of you: is there a new persona emerging that's handling all of this? Because it's not the observability team, it's not the cloud-native team, although there may be a collection of them. It's not IT racking and stacking. Is there a new persona in organizations handling all of this?

What I'm seeing is that the research departments, the traditional data scientists, are really being challenged to bring in the right models, and to fine-tune the right models with the proprietary data sets that every enterprise has, so that the AI can do what that business needs, as opposed to being a very generic LLM.

Are new teams forming? Are they pre-existing security teams? Are they platform engineering teams? Is it mainly DevOps teams?

I think it's too early to say.
I think everyone is challenged to adopt this, because things are moving very fast and evolving very quickly. But I think it touches every function, from analysis to data science to architecture.

It's a new IT, it's a new thing.

Yes. Well, compared to traditional software engineering, the tools and systems for data and data engineering are still pretty immature. But as we mature those tools and systems, there's a newish role I hear talked about: the analytics engineer, someone who has wicked good chops in SQL and other things. And as data tools and systems become easier and more accessible, more people can fill those roles that were typically specialized.

It's the perfect storm. You've got the cloud-native scale folks who know horizontal scalability, the data engineering and SRE types who know DevOps and DevSecOps, and then you've got the analytics folks who have been handling the crown jewels, where the data value is, all coming together. Which, by the way, was not the motion we've seen in the past. Those were protected, highly protected, data-science driven.

Hyper-specialized. Do you see that consolidating? I mean, I know there are teams of 30, 40, 50 people in the data pipeline: data engineers, quality engineers, business analysts. They all have their own little job to do, and nothing can happen until it's done. It's a very linear, sequential process. How do you see that changing? Will it collapse? How do you see AI affecting it? Any thoughts?

I think as the data stack gets enhanced with AI, some of these things will get automated away, right? You can now look at the data and summarize it without really going deep into pre-processing it.
I think the way people interact with data through visualization tools and SQL-based analytics is going to get upended by prompt engineering. You should be able to just talk to your data: what's happening with my business? Do you really need an army of data analysts to ask some very important questions of the data and interrogate the data? So the whole data architecture, as I'm saying, is evolving very, very quickly.

It'll set the table. It's going to set the agenda. We totally agree.

Yes. The CGAIO is emerging: a chief generative AI officer.

Yeah, there you go. Guys, thanks for coming on. Venkat, great to see you.

Thanks for having us.

Guys, thanks for contributing this great conversation here on SuperCloud 6. You're both AI innovators, and good luck on your next rounds. I'm sure you've got term sheets flying at you right away.

Thank you so much.

Maybe theCUBE gets in there for a little of the round: theCUBE Capital. We haven't started that yet. We're working on it.

Thank you so much. Thanks for coming on.

Okay, we'll be right back with more SuperCloud 6 after this short break. Stay with us.