Hello, welcome back to day two of our CUBE coverage of Databricks' Data + AI Summit. We're live in the lakehouse. I'm John Furrier, host of theCUBE, with Rob Strechay, our analyst here, breaking down all the action. Our guest is Joel Minick, Vice President of Marketing at Databricks. He's been around the block in the industry, seen a few waves before. Joel, great to see you. Thanks for coming back on. Good to see you again. Thanks for having me, John.

So, day four — how are you feeling? Feeling fantastic. Fantastic, it's been an awesome conference. I love the generative AI theme. It speaks to a developer culture and data science, data engineering. There's a real transformation going on with LLMs and foundation models. You guys are right in the wave. You've seen this movie before with Spark — what it did to Hadoop, how that grew in the cloud. Now you've got foundation models, LLMs. You've got a new culture developing in this generative AI space. Big part of the theme here. How's that tying together with the lakehouse? You've got some big announcements. What's your take?

Yeah, for sure. What it felt like to us, as we looked out there at what was happening with generative AI, was that we found the catalyst — the catalyst that makes deep learning click in the minds of organizations out there, of developers out there, on what this technology can actually do. And now it's on the lips of everyone, and everyone is thinking about how do we empower employees better with generative AI? How do we empower our customers through our products better with generative AI? And we wanted to lean into that this year and really make sure that folks understand what that is and how to take best advantage of that technology. And ultimately, what it comes down to when you look at how do I have success with machine learning and generative AI, it's data. Because the best algorithms in the world can't overcome bad data. And so we are really finding this great intersection between the conversations around the lakehouse and the conversations around generative AI, because that's what the lakehouse offers: for the first time, uniquely, my AI platform and my data platform are the exact same thing. And that gives me such wonderful insight into the provenance, the lineage, the quality of my data that I just can't get anywhere else.

I love how Ali describes the lakehouse. The lake is the unstructured data, the house is the structured data — you're going to have that kind of world. And that's been around for a while, there's nothing new there, and you've got 3.0 announced here. But the thing that's interesting to me is that Delta Sharing is a protocol that's now open source and working. You have a marketplace developing, where you have apps — data apps coming in that have a big headroom with generative AI, that's clear. And then you've got this UniForm unification going on. So Ali said, forget the format wars, we're going to unify. That's a huge deal. How did that come about? What was the conversation like? Were you guys like, okay, let's just unify this, let's just do it? What was that decision about unifying the formats?

Sure. Well, for us, it comes back to what Databricks has always been in pursuit of: the democratization of data and AI. When we thought about Delta Lake, in the ten years it's taken for the lakehouse to mature to the place it is today, it was always about, look, there are all these proprietary data platforms out there, and that restricts what people can do. We should disrupt that. We should find a new way for folks to use data.
And that was what led to Delta Lake — to knock those silos down. And the wonderful thing about the lakehouse is how much it's proliferated, but we also found ourselves in a world where there was beginning to be a format war: there's the Delta Lake standard for building out lakehouses, but there was also Iceberg and there was also Hudi. And a format war is never good for a customer. Nobody wins. Nobody wins. And so we looked and said, if we come back to our roots of democratization and unification, what would we do to solve this? Well, we would develop a universal format and allow Delta Lake to simply be able to read and write to all lakehouse foundational systems, regardless of what they're built on. And we feel like this is absolutely the right thing to help democratize the evolution of the lakehouse further.
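To make that universal-format idea concrete, here is a minimal sketch of what enabling it on a table can look like, assuming Delta Lake 3.0's UniForm table properties. The catalog, table, and column names are illustrative, and the exact set of required properties can vary by Delta release, so treat this as a sketch rather than a definitive recipe.

```python
# Minimal sketch: enabling Delta Lake UniForm so Iceberg-compatible engines can
# read a Delta table. Assumes a Spark session with Delta Lake 3.0+; table and
# column names below are placeholders, not anything referenced in the interview.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_orders (
        order_id BIGINT,
        region   STRING,
        amount   DOUBLE
    )
    USING DELTA
    TBLPROPERTIES (
        -- Required by UniForm for Iceberg metadata generation
        'delta.columnMapping.mode' = 'name',
        -- Write Iceberg metadata alongside the Delta transaction log;
        -- newer Delta releases may also want an IcebergCompat property
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")

# Writes still go through Delta as usual; Iceberg readers see the same files
# through the generated Iceberg metadata.
spark.sql("INSERT INTO sales_orders VALUES (1, 'EMEA', 42.0)")
```

The design point Joel is describing is that the data files are written once, in open Parquet, and only the metadata is produced in multiple formats, which is what lets one copy of the data serve several lakehouse ecosystems.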
You know, Rob Strechay and I were talking on theCUBE yesterday, and we were kind of goofing on the culture of Berkeley — Ali's a professor there. But Berkeley has been the ground zero of a lot of great innovations and standards and revolutions, frankly. You know, BSD — UNIX got beat by opening up that proprietary system. So there's a lot of systems thinking coming out of Berkeley, and that mindset of just moving forward instead of arguing — this idea of standards. Every major inflection point in the history of the computer industry has a moment where people just say, hey, let's not fight over things. Let's organize. Let's just agree on something and move forward, because everybody wins. Exactly. And so this is happening now — the format was one example. I'm expecting there'll be more with generative AI as LLM engineering becomes a practice. That's a big theme coming out of this show, that LLM engineering is emerging. Prompt engineering. Whatever that means. But the notion of data products is emerging, and Rob and I coined the term data developer. We see a future where the developer is going to be coding in line with the CI/CD pipeline, with data that's going to be provided to them by the lakehouses of the world, and products. This future's emerging and it's happening right now. How do you guys look at that as a marketer and as a business person? You say, okay, we want to nurture it, we don't want to foreclose that. How do you look at that — this data developer, this notion of operationalizing data at scale?

Right. That connection point between the developer and data is something that I think we've always leaned into a lot with Databricks, making sure that we have a very wide-open ecosystem: we support Python, we support SQL, we support Scala, we support R. If you want to go work in C++ or Java, and you want to do that through VS Code, we intersect directly with VS Code. So we're making sure that what we're doing is empowering developers to use data, because that is what is going to separate the winners from the losers in terms of application development going forward. But one of the other big things then is, well, how do you make somebody able to use data as effectively as possible? And that's a lot of what was behind the idea of LakehouseIQ that we talked about this week: giving somebody access to an assistant that can help them develop applications that use data, but do it in a way that really understands their data, not just giving them some generic LLM that can supply some SQL statements.

But instead, when you ask a question — say, I need the application to be able to understand what the revenue is for product foo in the European region since introduction — if I asked that of a normal coding assistant, it would have no idea how to answer. But with LakehouseIQ understanding how your data is structured, how your organizational charts work, how your data gets moved between different teams inside of your organization, we can say, I actually do know what product foo is. I do know how you structure your European region, and I do know when this product was introduced in each one of those countries. And therefore, I can tell you exactly what you need to put into your code to get the right answer. And so it's all about how do we help every developer be 10x more productive than they're able to be today.

What are some of the hot things that you guys like around these augmentation technologies that help developers and humans? I saw a vector database was announced — that's a hot area right now. Open source Milvus under the Linux Foundation and other open source projects are out there. Vector databases are hot. How does that relate to things? Because that seems to be a hot area with LLMs.

Well, it's something that we wanted to make sure folks had access to inside of the platform. But most importantly, they're able to take advantage of a lot of the data they had already built up. So when we talk about vector search inside of Databricks, what we wanted to do was make it very fast and very simple. What you're able to do is take the data you already have in your Delta tables and convert it into vector indexes with just a few lines of code. So you're able to get those vector embeddings out there, ready to be used by your LLMs, much, much more efficiently than you'd be able to do otherwise. It's a much smarter way to do indexing. Exactly. A new way to index content, to make it available.
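That "few lines of code" claim maps to something like the following sketch, assuming the databricks-vectorsearch Python client. The endpoint, table, index, column, and embedding-model names are placeholders, and the parameter names should be checked against the current client documentation before use.

```python
# Minimal sketch: building a vector index that syncs from an existing Delta table.
# Assumes the databricks-vectorsearch client, a workspace with Vector Search
# enabled, and an existing endpoint; names below are illustrative placeholders.
# The source Delta table typically needs change data feed enabled for sync.
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()

# A Delta Sync index keeps embeddings refreshed from the source Delta table,
# which is what lets existing lakehouse data be reused directly for LLM retrieval.
index = client.create_delta_sync_index(
    endpoint_name="vs_demo_endpoint",
    index_name="main.docs.support_articles_index",
    source_table_name="main.docs.support_articles",
    pipeline_type="TRIGGERED",                       # sync on demand, not continuously
    primary_key="article_id",
    embedding_source_column="body_text",             # text column to embed
    embedding_model_endpoint_name="databricks-bge-large-en",
)

# Once the index has finished syncing, query it with natural-language text.
results = index.similarity_search(
    query_text="How do I reset a customer's password?",
    columns=["article_id", "body_text"],
    num_results=5,
)
print(results)
```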
Okay, the next question is this: I love the platform story. Ali said, I'm a platform guy. We are too — we love platform people here on theCUBE. We get that. When you look at platforms, the trade-off between best of breed and platform is interesting, because you want to have things native to the platform to enable value on top of it. At the same time, you want choice for the customer. They might want to use a vector database somewhere else, or something else. So as you look at the platform story, how do you think about that? How should customers think about platforms versus having things native versus plugging in third-party models or whatever? How do you guys talk to the customer about that dynamic?

Well, that goes back to the roots of why all the fundamental areas of Databricks are built on open source, right? The data processing is built on Apache Spark, the data storage and management on Delta Lake, the ML workflow on MLflow. Because when we build the platform that way, we're able to say there are things that Databricks will always offer natively, and there will be places where we think we can innovate and provide value, but there's also this vast ecosystem out there. And because we build all of these technologies on open standards, it's very, very easy for you to bring those applications, those tools, into Databricks as well. And going back to this idea of unification and democratization, that's super important to us, and that's why this open source foundation has always been at the heart of how we build.

You can't argue with democratization — that is a great message, and it's legit. I mean, look at every trend: democratization, putting the tools in the hands of normal people, developers or humans, was always the key thing. And I love the reference to the PC revolution — a little bit old school for some of the younger demographics. But that cycle of innovation disrupted the mainframe and the minicomputer, which were timeshare-based systems. So we're seeing that kind of evolution now with LLMs. As a marketer, as VP of marketing, you are steering the ship — not just messaging, but you've got to look at your audience, your customer. What are some of the things that you see, that others might not know about, for Databricks around the demographics, the makeup of your customer? I'm definitely seeing a different demographic than in the classic data sense here — a lot of developers, as we talked about. What's the demographic, what's the audience, and how are you steering the ship? Because there's a lot of hype out there, definitely. But when you unpack the open source, when the hype goes away, you've got to have some meat on the bone. For sure. So how are you steering the ship? What are the demographics like?

Yeah, the demographics — we think about it as, what is the data team inside of an organization? And we try to always be sure that we're talking to that set of individuals. At the heart of it, it boils down to developers and data professionals who are focused on, first of all, how do I get the data and process the data? So we target a lot around data engineers, and how do we make their lives easier. Then, once you've got that foundation of the lakehouse, we also think about, well, I want to do traditional analytics on that. I have to do BI on that data, I have to do reports on that data. And that's where we focus a lot of Databricks SQL, and how we think about the data analyst and what they're doing to get more insights out of that data, and what the lakehouse can do for them by letting them do those types of analytics on all of their data, not just what's been exposed in the traditional data warehouse. And then the last audience is really that data scientist, that ML engineer, who is out there taking that data, often marrying it with some of the analytics that have been done on the data warehousing side, to go and build those next-generation models that are now powering all those intelligent applications. So it's really a cross-section, from startups to enterprises, of all the folks who are trying to get value out of data. And soon that's going to be abstracted away — as the tools get better, that'll be abstracted away. And we're looking at growing that pie bigger and bigger, absolutely, by bringing more natural language to the platform. So even if I don't know how to code Python or SQL, I can still get value out of it.

So I've got to ask you about the marketplace and the apps. There's an interesting dynamic going on there. You've got a marketplace that's not locked into the cloud, so it's open to everybody — love that story. And the clouds will still be successful with their marketplaces; they obviously have that procurement vibe going on there. But the apps are interesting. I interviewed two co-founders on theCUBE yesterday who are basically building data apps on top of Databricks. And one is kind of just open in the marketplace — they're not a full Databricks customer, but they work with Databricks.
So you're allowing people to come into the marketplace with these data apps who aren't just Databricks customers. How does that happen? What is a data app? How is that evolving? How do you guys see that emerging? I'd say it's a hot new thing popping up, these data apps. What are they?

Yeah, it usually falls into two camps. One is applications that help me get data into a lakehouse, and the other side is applications that then help me make use of that data. One that we talked a lot about this week is Kumo AI, who has a way of kind of marrying SQL with machine learning so that machine learning becomes much, much more accessible to folks. But, you know, to your comment about apps and how we think about that — one of the things that's super true about applications, especially in the data space, is that there are all these fascinating startups coming out right now. Startups really struggle with, how do I get my data app inside of an enterprise? Because I don't have the teams to go do all these security assessments, all these procurement dialogues, all the negotiation with legal departments — but the data has to be secure inside these enterprises. And so with Lakehouse Apps now, you're able to containerize that application and deploy it securely into the customer's tenant, so that there is never any movement of data. The application inherits all of the security and governance and protection of Databricks, so that enterprises can adopt these technologies very, very quickly and get all the benefit of the innovation that's happening out there in startups.

It's kind of liberating for the folks that have been doing this work in the past. It's been hard to stand up, say, a neural network, or to provision the whole apparatus of data — it was heavy lifting and a pain in the ass, frankly, right? So now you have people saying, hey, I've got a neural network, I've got recommendations as a service, I've got this product, and I don't have to do the heavy lifting of building out the data lake and the lakehouse. For sure. So this is going to, we think, spur an ecosystem. How do you guys look at that ecosystem? I mean, you have an ecosystem — you're not just an ISV, you're a platform.

Yeah, and that's the beauty of the lakehouse, John: what we are trying to make available is as much data as possible for this ecosystem to thrive. We talked about it yesterday — two exabytes of data are processed on Databricks on average every single day. That is a tremendous amount of knowledge and value for the world of data apps to take advantage of and to create value from for customers. And so making sure that all of this always plugs back to Delta Lake, plugs back to open standards, so that they can get into the Databricks ecosystem as fast and easily as possible, is always job number one for us.

Joel, great to have you on theCUBE. I know you're super busy. Thanks for coming on and sharing your insight on how you see things at Databricks. You've got to steer the ship, and you've got a demographic emerging — young developers too. Young and old, like me at my age, you know? I mean, data is really key. Absolutely, absolutely. Thanks for coming on. All right, well, thank you, John. We're here in the lakehouse. I'm John Furrier with theCUBE; we'll be right back. Rob Strechay and I will do a breakdown of the keynotes after this short break.