Okay, welcome back to SuperCloud 22. I'm John Furrier, host of theCUBE. We have Ali Ghodsi here, co-founder and CEO of Databricks. Ali, great to see you. Thanks for spending your valuable time to come on and talk about SuperCloud and the future of all the structural change that's happening in cloud computing.

My pleasure, thanks for having me.

Well, first of all, congratulations. We've been talking for many, many years, and I still go back to the video we have in the archive of you talking about cloud, really at the beginning of the big reboot, what I call the post-Hadoop revitalization of data. Congratulations, you've been cloud-first, now on multiple clouds. And congratulations to you and your team for achieving what looks like a billion dollars in annualized revenue, as reported by the Wall Street press. So first, congratulations.

Thank you so much, appreciate it.

So I was talking to some young developers and took a random poll: what do you think about Databricks? "Oh, we love those guys. They're AI and ML native, and that's their advantage over the competition." So I pressed them on why. I don't think they knew why, but it's an interesting perspective, this idea of cloud native, AI and ML native, MLOps. This has been a big trend and it's continuing, and it's a big part of how this structural change is happening. How do you react to that? And how do you see Databricks evolving in this new SuperCloud-like, multi-cloud environment?

Yeah, look, I think it's a continuum. It starts with having data. First they want to clean it and get insights out of it. Then they'd like to start asking questions and doing reports: what was my revenue yesterday, last week? But soon they want to start using the crystal ball, predictive technology: what will my revenue be next week, next quarter? Who's going to churn? And finally, can you automate that completely so that you can act on the predictions?
So this credit card that got swiped, the AI thinks it's fraud, we're going to deny it. That's when you get real value. So we're trying to help all these organizations move through this data and AI maturity curve, all the way to that prescriptive, automated AI and machine learning. That's when you get real competitive advantage. And we saw that with the FAANGs, right? I mean, Google wouldn't be here today if it wasn't for AI; we'd be using AltaVista or something. We want to help all organizations leverage data and AI the way the FAANGs did.

One of the things we're looking at with SuperCloud, and why we call it SuperCloud versus other things like multi-cloud, is that a lot of the companies that started in the cloud have been successful, but even enterprises that have done nothing deliberate with cloud still have some projects running on multiple clouds. So people have multiple cloud operations going on, but it hasn't necessarily been a strategy per se; it's been more of a default reaction to things. The ones that are innovating, though, have been successful in one native cloud because of use cases that drove that, got scale, got value. And now they're making that "super" by extending on-premises, putting in a modern data stack for modern application development, and dealing with the things you guys at Databricks are in the middle of; that's where the action is. And they don't want to lose the trajectory and all the economies of scale. So we're seeing another structural change, where the evolutionary nature of the cloud has solved a bunch of use cases, but now other use cases are emerging on-premises and at the edge, driven by applications because of the developer boom that's happening. You guys are in the middle of it. What is happening with this structural change?
Are people looking for the modern data stack? Are they looking for more AI? What's your perspective on this SuperCloud kind of position?

Yeah, look, it started with customers now being on multiple clouds, right? Multi-cloud became a thing. Seventy, eighty percent of our customers, when you ask them, are on more than one cloud. But soon they start realizing: hey, if I'm on multiple clouds, this data stuff is hard enough as it is. Do I want to redo it again and again with different proprietary technologies on each of the clouds? And that's when they start thinking: let's standardize this, let's figure out a way that just works across them. That's where I think open source comes in and becomes really important. Can we leverage open standards? Because then we can make it work in these different environments, as you said, so that we can actually go super.

That's one. The second thing is, can we simplify it? Today, the data landscape is complicated. Conceptually it's simple: you have data, essentially customer data, maybe employee data, and you want to get some kind of insight from it. But how you do that is very complicated. You have to buy a data warehouse and hire data analysts. You have to store stuff in a data lake and get your data engineers. If you want streaming, real-time processing, that's another completely different set of technologies you have to buy. And you have to stitch all of these together, and do it again and again on every cloud. So they just want simplification. That's why we're big believers in this data lakehouse concept, which is an open standard for simplifying the data stack and helping people just get value out of their data in any environment. So they can do that in this sort of SuperCloud, as you call it.
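The lakehouse pattern described above rests on a simple idea: open data files sitting in cloud storage, plus a transaction log that tells any engine which files make up the current table, so the same copy of the data serves warehousing, streaming, and ML workloads. Here is a minimal pure-Python sketch of that idea; the file names and JSON log layout are illustrative only, not the actual Delta Lake format (which uses Parquet data files, checkpoints, schema metadata, and more):

```python
import json
import os
import tempfile

# Toy "lakehouse" table: data files plus a JSON transaction log.
# Any reader that understands the log sees the same consistent table.

def commit(table_dir, version, rows):
    """Write a data file and a log entry marking it active."""
    data_name = f"part-{version}.json"
    with open(os.path.join(table_dir, data_name), "w") as f:
        json.dump(rows, f)
    # Each committed version lists every file live at that version.
    active = read_active_files(table_dir) + [data_name]
    log_dir = os.path.join(table_dir, "_log")
    os.makedirs(log_dir, exist_ok=True)
    with open(os.path.join(log_dir, f"{version:08d}.json"), "w") as f:
        json.dump({"active": active}, f)

def read_active_files(table_dir):
    """Latest log entry wins; no log means an empty table."""
    log_dir = os.path.join(table_dir, "_log")
    if not os.path.isdir(log_dir):
        return []
    latest = sorted(os.listdir(log_dir))[-1]
    with open(os.path.join(log_dir, latest)) as f:
        return json.load(f)["active"]

def read_table(table_dir):
    """Assemble the table from whatever files the log says are live."""
    rows = []
    for name in read_active_files(table_dir):
        with open(os.path.join(table_dir, name)) as f:
            rows.extend(json.load(f))
    return rows

table = tempfile.mkdtemp()
commit(table, 0, [{"id": 1, "amount": 40}])
commit(table, 1, [{"id": 2, "amount": 55}])
print(read_table(table))  # both commits visible to any reader
```

The point of the sketch is the separation of concerns: storage is just open files, and the log is what turns them into a table, which is why multiple engines and clouds can share one copy of the data.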
You know, we've been talking about that in previous interviews: do the heavy lifting, let them get the value. I have to ask you how you see that going forward, because if I'm a customer, I have a lot of operational challenges. The developers are kicking butt right now, we see that clearly, and open source is growing and continues to be great. But ops and security teams really care about this stuff, and most companies don't want to spin up multiple ops teams to deal with different stacks. That's one big problem that I think is feeding into multi-cloud viability. How do you guys deal with that? How do you talk to customers when they say, I want fewer complications in operations?

Yeah, you're absolutely right. It's easy for a developer to adopt all these technologies, and new things are coming out all the time. The ops teams are the ones that have to make sure it all works. Doing that in multiple different environments is super hard, especially when there's a different proprietary stack in each environment. So they just want standardization. They want open source; that's super important. We hear that all the time from them. They want open source technologies, and they believe in the communities around them. They know the source code is open, so they can see if there are issues with it, security breaches, those kinds of things, and there's a community around it that they can actually leverage. So they're the ones really pushing this, and we're seeing it across the board. It starts first with the digital natives, but slowly it's also percolating to the other organizations; we're hearing it across the board.

Where are we on the innovation strategies for customers? Where are they on the trajectory around how they're building out their teams? How are they looking at open source?
How are they extending the value proposition of Databricks and data at scale as they start to build out their teams and operations? Because some are just starting, a crawl, walk, run kind of vibe, and some are big companies that deal with data all the time. Where are they in their journey? What are the core issues they're solving? What are some of the most pressing use cases you see?

Yeah, what I've seen that's really exciting about this data lakehouse concept is that we're now seeing a lot of use cases around real-time. Real-time fraud detection. Real-time stock ticker pricing; anyone that's doing trading wants that to work in real time. Lots of use cases around how, in real time, we drive more engagement on our web assets if we're a media company. We have all these assets; how do we get people engaged, staying on our sites, continuing to engage with the material we have? Those are real-time use cases. And the interesting thing is, A, the real-time part: it's really important that you act now. You don't want to recommend that someone check out a restaurant they just came from half an hour ago. But B, it's also all based on machine learning. A lot of this is trying to predict what you want to see, what you want to do, whether something is fraudulent. And that's also interesting, because more and more machine learning is coming in. So it's super exciting to see the combination of real-time and machine learning on the lakehouse. And finally, I'd say the lakehouse is really important for this because that's where the data is flowing in. If you have to take the data that's flowing into the lake and copy it into a separate warehouse, that delays the real-time use cases, and then you can't hit those real-time deadlines. So that's another catalyst for the lakehouse pattern.
Would that be an example of how the metrics are changing? Because I've seen some people saying, well, you can tell if someone's doing well when there's a lot of data being transferred. And then I was saying, wait a minute, data transfer costs money, right, and time. So this is an interesting dynamic. In a way, you don't want a lot of movement, right?

Yeah, movement actually decreases for a lot of these real-time use cases, because what we saw in the past was that they would run batch processing over all the data: once a day, process everything. But if you look at what has actually changed since yesterday's data, it's not that much. So if you can process it incrementally, in real time, you can actually reduce the cost of transfers, storage, and processing. So that's a great point. That's one of the main things we're seeing with these use cases: the bill shrinks, the cost goes down, because they can process less.

Yeah, I want to see how those KPIs evolve into industry metrics down the road around this SuperCloud evolution. I've got to ask you about the open source concept of the data platform. You guys have been pioneers there, doing great work, picking up the baton where the Hadoop world left off, as Dave Vellante always points out. But working across clouds is super important. How are you guys looking at the ability to work across the different clouds with Databricks? Are you going to build that abstraction yourself? Do data sharing and model sharing come into play there? How do you see this Databricks capability across the clouds?

Yeah, let me start by saying we're big fans of open source. We think open source is a force in software that's going to continue for decades, hundreds of years, and it's going to slowly replace all proprietary code in its way.
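The incremental-processing point above can be sketched in a few lines: instead of reprocessing the whole table every run, keep a watermark (the last event time already processed) and touch only newer records. This is a pure-Python illustration of the cost difference, not a real pipeline; in practice this would be something like Spark Structured Streaming over a lakehouse table, and the field names here are made up:

```python
# Compare a full-batch run against an incremental run over the same
# events. The watermark records how far previous runs already got.

def process_incrementally(records, watermark):
    """Aggregate only records newer than the watermark.

    Returns (total_of_new_records, records_scanned, new_watermark).
    """
    new = [r for r in records if r["ts"] > watermark]
    total = sum(r["amount"] for r in new)
    new_watermark = max((r["ts"] for r in new), default=watermark)
    return total, len(new), new_watermark

# 1000 events with timestamps 1..1000.
events = [{"ts": t, "amount": 10} for t in range(1, 1001)]

# Full batch: every run scans all 1000 records.
batch_total, batch_scanned, _ = process_incrementally(events, watermark=0)

# Incremental: a run that has already seen up to ts=990
# scans only the 10 new records to produce its update.
inc_total, inc_scanned, wm = process_incrementally(events, watermark=990)

print(batch_scanned, inc_scanned)  # scanned: 1000 (batch) vs 10 (incremental)
```

The scanned-record counts are a stand-in for the transfer, storage, and compute bill Ghodsi describes: the work per run scales with what changed, not with the size of the whole table.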
We saw that it could do that even with the most advanced technology: Windows, a proprietary operating system, very complicated, got replaced with Linux. So open source can pretty much do anything. And what we're seeing with the data lakehouse is that slowly the open source community is building a replacement for the proprietary data warehouse, data lake, machine learning, and real-time stack, in open source. And we're excited to be part of it. For us, Delta Lake is a very important project that really helps you standardize how you lay out your data in the cloud. And then there's a really important protocol, Delta Sharing, that enables you, in an open way, for the first time ever, to share large data sets between organizations using an open protocol. The great thing about that is you don't need to be a Databricks customer. You don't even have to like Databricks. You just use this open source project, and you can securely share data sets between organizations across clouds. And it does so really efficiently: just one copy of the data, so you don't have to copy it if you're within the same cloud.

So you're playing the long game on open source.

Absolutely. I mean, this is a force. It's going to be there. If you deny it, before you know it, there's going to be something like Linux that is a threat to your project.

I totally agree, by the way. I was just talking about this earlier: the software industry is open source now. There's no more software industry, it's called open source, and it's the integrations that become interesting. Integrations are really where the action is. We had a panel with people who have been around for a long time, the Clouderati, we called it, and it was called the Innovator's Dilemma. And one of the comments was, it's the integrator's dilemma, not the innovator's dilemma.
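To make the Delta Sharing point above concrete: in the open protocol, the data provider hands a recipient a small JSON "profile" file containing an endpoint and a token, and tables are then addressed as `<profile>#<share>.<schema>.<table>`, with no Databricks account required. The sketch below shows that client-side shape; the field names follow the published Delta Sharing profile format, but the endpoint, token, and table names are made up for illustration:

```python
import json

# A recipient's Delta Sharing profile file, as JSON. The three fields
# are from the open profile format; the values here are hypothetical.
profile_json = json.dumps({
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing/",  # hypothetical
    "bearerToken": "not-a-real-token",                         # hypothetical
})

def parse_table_url(url):
    """Split '<profile>#<share>.<schema>.<table>' into its parts."""
    profile_path, fqn = url.split("#", 1)
    share, schema, table = fqn.split(".")
    return profile_path, share, schema, table

# With the real open source Python client this would be roughly:
#   import delta_sharing
#   df = delta_sharing.load_as_pandas("example.share#sales.retail.orders")
print(parse_table_url("example.share#sales.retail.orders"))
```

Because the protocol is just REST plus this addressing scheme, any client in any cloud can read the provider's single copy of the data, which is the "one copy, cross-cloud" property described above.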
And this is a big part of this piece of SuperCloud. Can you share your thoughts on how cloud and integration need to be tightened up to really make it super?

Actually, that's a great point. I think the beauty of this is, look, the ecosystem of data today is vast. There's this picture someone puts together every year of all the different vendors and how they relate, and it gets bigger and bigger and messier and messier. We see customers use all kinds of different pieces of what exists in the ecosystem, and they want it all integrated with whatever you're selling them. That's where the power of open source comes in. With open source, you get integrations that people build without you having to push for them. Databricks as a vendor doesn't have to go tell people, please integrate with Databricks. The open source technology we contribute to, people integrate with automatically. Delta Lake has integrations with lots of different software out there, and Databricks as a company doesn't have to push that. So I think open source is another thing that really helps with ecosystem integrations. Many of the companies in this data space actually have employees who are full-time dedicated to making sure their software works well with Spark, making sure their software works well with Delta, and they contribute back to those communities. And that's how you get this sort of ecosystem to flourish.

Well, I really appreciate your time. My final question, as we unpack and shape and frame SuperCloud for the future: how would you see a roadmap, architecture, or outcome for companies that are clearly going to be in a cloud where open source is dominating? Integration has to be seamless and frictionless. Abstractions have to make things super easy and take away the complexity. What is SuperCloud to them? What does the outcome look like?
How would you define a SuperCloud environment for an enterprise?

Yeah, for me, it's the simplification you get when you standardize on open source: you get your data in one place, in one format, in one standardized way, and then you can get your insights from it without having to buy lots of idiosyncratic proprietary software from different vendors that's different in each environment. It's this slow standardization that's happening, and I think it's going to happen faster than we think. In a couple of years, it's going to be a requirement: does your software work in all these different environments? Is it based on open source? Is it using this data lakehouse pattern? If it's not, I think customers are going to demand it.

Yeah, I feel like we're close to some sort of de facto standard coming, and you guys are a big part of it. Once that clicks in, it's going to highly accelerate in the open, and I think it's going to be super valuable. Ali, thank you so much for your time, and congratulations to you and your team on the continued success. We've been following you guys since the beginning; we remember the early days, and look how far it's come. You guys are really making a big difference and making a super cool environment out there. Thanks for coming on and sharing.

Thank you so much, John.

Okay, this is SuperCloud 22. I'm John Furrier. Stay with us for more coverage and more commentary after this break.