Good afternoon, AI fans, and welcome back to Google Cloud Next here in fabulous Las Vegas, Nevada. My name is Savannah Peterson. It's day three here on theCUBE. We are over 30 segments into our fantastic programming across three days. John, my co-host and co-founder of theCUBE, what a week for us. It's been great. I mean, first of all, we're AI drunk all the time on theCUBE, but this is big-time AI. It's in everything they're talking about. Absolutely. It's got all the winning formulas for theCUBE: speeds and feeds, high-scale performance, embedded AI, and the end-user experience is just awesome. Great event, and this next guest is with a really leading company. Still private, but doing amazing, open things in AI and all data, so it should be a great segment. Yeah, absolutely. Let's welcome Chris to the show. Chris, thank you so much for being here. Thanks for having me. Has the week been as exciting for you as it has been for us? It's been fantastic. This is my first Google Next event personally, so it's been a lot of fun. Yeah. What stood out to you about the experience so far? I just think the partnership with Google for us has been amazing, and this year in particular it's really accelerated. So talking with a lot of the teams from Google, and having this much traffic at our booth from people trying to better understand how our platform integrates with Google, has been great, and we've been very fortunate. We're the Google Technology Partner of the Year, which we're very proud of. That's a big deal. Congratulations. We're very thrilled. That's a great accolade from Google. Obviously Partner of the Year, but Thomas Kurian in his day-one keynote really laid it out: a cloud with first-party LLMs, third-party LLMs and open source. You guys introduced a new LLM, DBRX, which got a lot of awesome reviews, number one on Hugging Face.
People really liked what's going on performance-wise, so open-source models converging in adoption, speed and performance with the pre-existing proprietary models is quite an accomplishment; it shows growth and appetite for AI from developers. How is this converging into execution for companies that are evaluating it? The data lake has been a great strategy: you get all the data in there, and now developers are coming in, data as code, which we've been covering. This seems like the nexus, where the data is everywhere and AI is everywhere. Where's the customer's intersection point? Where does this come together for the customer? Yeah, I think DBRX is the culmination of a decade-long strategy, and let me go back to the co-founders of the company, researchers out of UC Berkeley. When they did the initial investment thesis for Databricks, it was really three big bets, and if you go back 10-plus years, these were pretty bold bets at the time. The first was that organizations were going to be moving to the cloud. The second was that successful ones were going to be using open source. And the third was that machine learning would be at the forefront. Now fast-forward an entire decade. Nice predictions. Yeah, very wise predictions. I think it's one of the many reasons why the company has had the success it's had. You fast-forward 10 years and we acquired MosaicML, and there was a lot of press around the price point of that acquisition. DBRX, I think, is the first step in the results of that: can you build open models that perform as well as the proprietary ones, and enable customers to do more with their own data in a very secure manner, to either fine-tune or do RAG-style augmentation for Gen AI? We're really proud of the results. As you said, it's the number one open model on Hugging Face. We're seeing new models coming out all the time.
Through the investment in MosaicML we've got an amazing research team, and we're going to continue to innovate in that space. How has your business changed at Databricks over the past couple of years? We've been following you, and I want to get a scoped perspective. Pre-Gen AI, you had all those bets coming in, but then Gen AI kicked things into high gear, and now you've got the language model. How has that changed Databricks' focus and execution, and ultimately customer benefit? Well, I think it's actually played into the strategy really well, right? Look at the overall architecture: when you compare us with some of the other vendors in the data space, Databricks was built to be a data analytics platform with machine learning at the forefront. We have a world-class SQL engine that enables data warehousing, but the core of it has been: how do you get all the different data personas inside an organization, from data engineers through data analysts, data scientists and machine learning experts, to work together as a team, view one copy of the data, and leverage the upstream work of other teams? You mentioned the data lake, and data lakes had their pros and cons. We're proud of the fact that we pioneered the lakehouse concept, which was: with all the improvements from the hyperscalers in terms of storage and network I/O, and the ability to spin up containers quickly, we can actually leverage a single copy of the data stored in the object store and run all these different use cases, including large language models. Chris, one of the things that Jensen said over at the GTC conference, he used the term AI factory. I remember saying at your event that you're enabling the AI factory model with the data lake.
And now that term has come out a lot, the AI factory, as the metaphor for things being built with AI and optimized for AI. What does that mean to you? How do you look at this AI factory positioning? How do you frame it? So organizations are really struggling with how to leverage it. We know it's a great opportunity, but how do you do it safely? First things first, you've got to separate the externally facing use cases from the internally facing use cases, especially for large enterprises, where they've got a large workforce that can provide that human feedback. The worst thing you want to do is put an externally facing LLM out there and have it generate content that's not accurate; there are plenty of examples in the industry where that's happened and created some legal challenges. What we're seeing is a lot of organizations starting with internally facing use cases, building off of these open models, and Databricks is designed to be very flexible. You could run our model, you could build your own model if you wanted to, and the Mosaic engine makes that really affordable, or you could leverage any of these proprietary third-party models through a model gateway that we've enabled. For us, we want organizations to feel like they've got choice and aren't locked into any particular solution, because we know it's changing, and we want them to be able to orchestrate the invocation of models that meet the specific needs of a use case. You know, one of the things I've been impressed with Databricks on, especially at last year's event, was that you introduced open formats, which kind of changes the game, and interoperability is going to be a big part of it, but also more reasoning is coming in on the intelligence side with AI. So I have to ask you, are we at the peak or at the beginning of true democratization of data?
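The "model gateway" idea Chris describes boils down to one calling convention with the model choice configurable per use case, so a team can swap between an open model, an in-house fine-tune, or a proprietary endpoint without changing application code. A minimal sketch of that routing idea in plain Python; the use-case names and endpoint identifiers here are invented for illustration and are not Databricks' actual API:

```python
# Hypothetical routing table: each internal use case maps to whichever
# model endpoint (open, in-house, or third-party proprietary) fits it best.
ROUTES = {
    "internal-summarization": "dbrx-instruct",      # open model
    "code-migration":         "in-house-finetune",  # custom fine-tune
    "external-chat":          "third-party-llm",    # proprietary, via gateway
}

def route(use_case: str) -> str:
    """Return the model endpoint configured for a given use case."""
    try:
        return ROUTES[use_case]
    except KeyError:
        raise ValueError(f"no model configured for use case: {use_case}")
```

The point of the indirection is the one Chris makes: the application asks for a capability, not a vendor, so the mapping can change as models improve.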
Because you combine open source, open formats, scale, intelligence and data. Are we at the peak or at the beginning? Where are we on the progress bar toward true democratization? Well, my personal opinion is that if you look industry-wide, we're at the beginning, and the reason is that many organizations, these large enterprises in particular, are coming off of proprietary systems, the vendor lock-in, and I think the technology community has really gotten the wake-up call: look, you've got to be more open. Now, Databricks, I would say, is at the peak. We've been open from the very beginning; the foundational elements of our platform are very popular open-source projects, Spark being the key one. For us, we look at it from two dimensions. One, we want to be as open as possible, and we think the entire ecosystem should be more open. Two, we want to be the best price-per-performance for a particular use case. You mentioned the open formats for table structures in the lakehouse environment. We have Delta, which is fully open-sourced, and we stand by it, especially because of its performance characteristics, but we saw in the industry Iceberg and Hudi and these other alternatives. So rather than stay fixed, we said, you know what, every time we get new data in, Delta will be the default, but we write out the additional metadata for both Hudi and Iceberg. So if you're a customer with a large Iceberg environment, Databricks will fit into that perfectly. Well, explain the importance of that. I brought it up first because I wanted to unpack it; I think that's a real nuanced point. Why is that important? That enables value in the form of what? Interoperability, ease of use, integration, walls, barriers, what is it? Yeah, absolutely, interoperability.
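The metadata layering just described — Delta as the default, with Iceberg (and Hudi) metadata written alongside the same files — corresponds to what Databricks ships as Delta UniForm. A hedged sketch of how a table might opt in; the property names follow the UniForm documentation at the time of writing but may vary by release:

```sql
-- Create a Delta table that also emits Iceberg metadata, so third-party
-- engines that only read Iceberg can query the same underlying Parquet files.
CREATE TABLE sales (id BIGINT, amount DOUBLE)
TBLPROPERTIES (
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);
```

This works because, as Chris notes below the table-format layer, the data itself is plain Apache Parquet; the competing formats differ mainly in the transaction metadata written around it.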
We don't want customers to have to think, okay, if I'm using Databricks and I have some third-party system in my environment that only reads Iceberg, why limit them in that integration? We would rather solve that and say, look, we think you're going to get the best performance out of the Delta format, but under the covers all of this is just Apache Parquet, and the differences are in the metadata and the way certain things are handled. It's a bold move, because you're basically taking a preemptive strike and saying, let's not squabble, let's get open, and everything else advances. So that will accelerate democratization. Yeah, I mean, it's stronger together: between the partners, between the open-source community as a whole, the whole shebang. You touch a lot of different customers across verticals. What are some of the trends you're seeing, outside of people getting excited without necessarily having their full strategy fleshed out? We're seeing a lot of POCs and MVPs rather than actualization. So again, in that mode of looking internally first at opportunities, you're seeing LLMs being used quite a bit in call center operations, to make those more efficient and to assist. On the one hand, people say AI is going to replace humans and jobs; this is really more of an augmentation play. This is making those call center agents more effective at their jobs, providing better customer experiences. So we're doing that. Another one, which is really interesting for us, is the ability to take LLMs, train them on COBOL as a programming language, and convert it to other programming languages. It gives organizations the ability to take legacy code and have a machine do a lot of the conversion for them, then have humans validate and approve it. What a time saver.
But yeah, think about the cost savings associated with migrating legacy platforms. And it's everyone's favorite part of their job, migrating legacy. Sure, reading someone else's old code with no comments and no documentation. Data is so exciting. We love data. It's always fun. But an area we've been checking out, and I want to get your thoughts on this, is not always in the mainstream news. When you peel back the onion on earnings, and even in the private companies, the hot area right now is governance. Can you share your vision on why governance is more important now than ever before with Gen AI? What is it about Gen AI that makes governance important to nail down now, and how is it being crafted or architected? Yeah, so I'm going to call back to Google's white paper on statistics-based models for AI, "The Unreasonable Effectiveness of Data," right? It was the notion that a less sophisticated model trained with a ton of data is going to outperform a more sophisticated model with less data. When you get into neural nets, and LLMs especially, the quality of that data is so critical, because you want the model trained on things that you know will provide the correct answers that your organization is willing to stand by. That risk, hallucinations, providing the wrong answer, those things all stem from the data that you've used to train the model. So governance of data, understanding where it's come in, the lineage associated with it, the data quality checks, all those things are so important, and that's why we built lots of capabilities around that inside of our platform. It all surfaces up through Unity Catalog, which is that single pane of glass to really understand what different data assets you've got and what the quality of the data is.
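The data-quality checks Chris mentions can be pictured as named expectations that gate records before they ever reach a training set, with failures routed somewhere auditable. This standalone Python sketch only illustrates the principle; on Databricks itself that role is played by platform features like Delta Live Tables expectations surfaced through Unity Catalog, not code like this:

```python
# Split incoming rows into (passed, failed) against a set of named
# predicates. In a real pipeline, "failed" rows would be quarantined
# and logged for the audit trail rather than silently dropped.
def check_quality(rows, expectations):
    passed, failed = [], []
    for row in rows:
        if all(pred(row) for pred in expectations.values()):
            passed.append(row)
        else:
            failed.append(row)
    return passed, failed

# Example expectations (illustrative field names):
expectations = {
    "has_id":       lambda r: r.get("id") is not None,
    "valid_amount": lambda r: isinstance(r.get("amount"), (int, float))
                              and r["amount"] >= 0,
}
```

The useful property is that the expectations are declared data, not buried logic, so governance tooling can report which rule rejected which records.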
And you could also foreclose value if data is not available for governance reasons, or through bad governance if data is missing. It begs the question: is there a best practice that you're seeing now for developers who want to do the hard stuff with governance and compliance? The performance stuff is all going to get taken care of, but the compliance side has always been, I wouldn't call it anti-innovation, but it slows things down a lot in the speed game. How do you see that speeding up on the compliance side so people don't get held back? Yeah, well, for us, we're trying to leverage technology to help speed it up. So in addition to creating an environment where people can build and train their own LLMs and other types of machine learning models, we are embedding machine learning inside of the Databricks platform. As new data comes in, we've trained models on the existing data, and as that new data arrives we're able to compare and contrast whether or not the new data is consistent. And so we're able to flag these things on behalf of the enterprise, not just for the developers, but for, say, the audit and risk folks in a banking scenario. So Databricks has obviously been good at predicting the future, if we're here now. Two questions for you, back to back. Actually, I'll separate them. Does it feel like a big moment for you, now that the whole world has jumped on the ML front and everyone wants to play with AI? Yeah, Databricks' history and roots have been more ground-up inside of an organization: finding those data science teams, the ones doing a lot of the hard work, and figuring out how to make their lives easier and more enjoyable.
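The consistency flagging Chris describes — profile the existing data, then compare each arriving batch against that profile — is essentially drift detection. A minimal sketch of the idea in plain Python; the statistic (a z-score on the batch mean) and the threshold are illustrative stand-ins for the richer tests a production system would use:

```python
from statistics import mean, stdev

def profile(values):
    """Summarize the training-time data a model was built on."""
    return {"mean": mean(values), "stdev": stdev(values)}

def is_consistent(batch, baseline, z_tol=3.0):
    """Flag a batch whose mean drifts more than z_tol baseline stdevs."""
    if baseline["stdev"] == 0:
        return mean(batch) == baseline["mean"]
    z = abs(mean(batch) - baseline["mean"]) / baseline["stdev"]
    return z <= z_tol
```

A batch that fails the check would be surfaced to audit and risk teams rather than silently ingested, which is the banking scenario Chris alludes to.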
And now with this massive wave of AI, and the success of the company, and the vision behind the company to build this platform out, it was basically there at the ready when this wave hit, in a way that doesn't require organizations to pass their data to some closed-off environment and run the security risk associated with that. You can do all the things you'd want to do with large language models and Gen AI inside the platform, with your own data and your own security model. That's awesome. All right, last question for you, going to the future-prediction side of things. When we interview you next time, what do you hope to be able to say then that you can't say today? Let's see, that's an interesting question. I think what we'll see is Databricks becoming a household name the same way that Google and other major data providers are. And I think the world will know more about Databricks as a result of the work that's being done. Love that. We will take that sound bite when it happens, and we really look forward to it. Thank you so much for joining us today. And we'll see you at the Data + AI Summit coming up. You will, yeah. Sure, it's going to be a whole party. We'll keep the party going. Thanks for being here with us, John. Thank all of you for tuning in to our live coverage here from Las Vegas, Nevada. We're at Google Cloud Next. My name is Savannah Peterson. You're watching theCUBE, the leading source for enterprise tech news.