Hello everybody, this is Dave Vellante. Welcome back to SuperCloud 2, where we're exploring the intersection of data, analytics, and the future of cloud. In this segment, we're going to look at how the SuperCloud will support a new class of applications, not just work that runs on multiple clouds, but rather a new breed of apps that can orchestrate things in the real world. Think Uber for many types of businesses. These applications are not about codifying forms or business processes. They're about orchestrating people, places, and things in a business ecosystem. And I'm pleased to welcome my colleague and friend George Gilbert, former Gartner analyst, Wikibon market analyst, former equities analyst, as my co-host. And we're thrilled to have Tristan Handy, who's the founder and CEO of DBT Labs, and Bob Muglia, who's the former president of Microsoft's Server and Tools business and former CEO of Snowflake. Welcome, all. Gentlemen, thank you for coming on the program. Good to be here. Thanks for having us. Hey look, I'm going to start actually with SuperCloud, because both Tristan and Bob, you've read the definition. Thank you for doing that. And Bob, you have some really good input, some thoughts on maybe some of the drawbacks and how we can advance this. So what are your thoughts on reading that definition of SuperCloud? Well, I thought first of all that you did a very good job of laying out all of the characteristics of it and helping to define it overall. But I do think it can be tightened a bit, and I think it's helpful to do it in as short a way as possible. And so in the last day, I've spent a little time thinking about how to take it and write a crisp definition. And here's my go at it. It's one day old, so give me a break if it changes. And of course we have to follow the industry, and whatever the industry decides. But let's give this a try.
So in the way I think you're defining it, what I would say is a SuperCloud is a platform that provides programmatically consistent services hosted on heterogeneous cloud providers. Boom, nice. Okay, great. I'm going to go back and read the script on that one and tighten that up a bit. Thank you for spending the time thinking about that. Tristan, would you add anything to that? Or what are your thoughts on the whole SuperCloud concept? So as I read through this, I fully realized that we need a word for this thing, because I have experienced the inability to talk about it as well. But for many of us who have been living in the Confluent, Snowflake world of, like, new infrastructure, this seems fairly uncontroversial. Like, I read through this and I'm just like, yeah, this is the world I've been living in for years now. And I noticed that you called out Snowflake as being an example of this. But I think that there are many folks, myself included, for whom this world fully exists today. You know, I think that's a fair, I don't know if it's a criticism, but people have observed, well, what's the big deal here? It's just kind of what we're living in today. It reminds me of, you know, when Tim Berners-Lee said, well, this is what the internet was supposed to be, what Web 2.0 was supposed to be. Maybe this is what multi-cloud was supposed to be. Let's turn our attention to apps. Bob first, and then we'll go to Tristan. Bob, what are data apps to you? When people talk about data products, is that what they mean? Or are we talking about something different? What are data apps to you? Well, to understand data apps, it's useful to contrast them with something, and I just use the simple term people apps. I know that's a little bit awkward, but it's clear. And almost everything we work with, almost every application that we're familiar with, be it email or Salesforce or any consumer app, those are applications that are targeted at responding to people.
You know, in contrast, a data application reacts to changes in data and uses some set of analytics services to autonomously take action. So where the applications we're familiar with respond to people, data apps respond to changes in data. And they both do something, but they do it for different reasons. Got it. George, you and I were talking about, it comes back to SuperCloud, broad definition, narrow definition. Tristan, how do you see it? Do you see it the same way? Do you have a different take on data apps? Oh geez, this is a conversation that I don't know has an end. Like, I write a Substack, and there's this little community of people who all write Substacks, and we argue with each other about these kinds of things. You know, there are many different takes on these questions you can find. But the way that I think about it is that data products are atomic units of functionality that are fundamentally data-driven in nature. So a data product can be as simple as an interactive dashboard that has actually had design thinking put into it, and serves a particular user group, and has actually gone through the product development life cycle. And then a data app, or data application, is a kind of cohesive end-to-end experience that often encompasses many different data products. So from my perspective, this is very, very related to the way that these things are produced, the kinds of experiences that they provide, that, like, data infuses every product that we've been building in software engineering for as long as there have been computers. You know, Zhamak Dehghani oftentimes uses, she doesn't name Spotify, but I think it's Spotify, as that kind of example. But I wonder if we can maybe try to take some examples.
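As an aside, Bob's people-app versus data-app distinction can be sketched in a few lines of Python. This is purely illustrative; the function names and the anomaly-threshold rule are made up for the sketch, not anything described by the panelists.

```python
# A "people app" waits for a human to submit a form; a "data app" watches a
# stream of data changes and autonomously triggers an analytic action.

def detect_anomaly(reading: float, threshold: float = 100.0) -> bool:
    """Stand-in for an analytics service: flag readings over a threshold."""
    return reading > threshold

def data_app(change_stream):
    """React to each data change with no human in the loop."""
    actions = []
    for reading in change_stream:
        if detect_anomaly(reading):
            actions.append(f"alert: reading {reading} exceeded threshold")
    return actions

print(data_app([42.0, 150.0, 99.9]))  # only the 150.0 reading triggers action
```

The point of the sketch is simply that the trigger is a change in data, not a person typing into a form.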
If you take, George, if you take a CRM system today, you're inputting leads, you've got opportunities. It's driven by humans, they're really inputting the data, and then you've got the system that kind of orchestrates the business process, like runs a forecast. But in this data-driven future, are we talking about the app itself pulling data in and automatically looking at data from the transaction systems, the call center, the supply chain, and then actually building a plan? George, is that how you see it? I go back to the example of Uber. It may not be the most sophisticated data app that we build now, but it was one of the first, where you do have users interacting with their devices as riders trying to call a car or a driver. But the app then looks at the location of all the drivers in proximity, and it matches a driver to a rider. It calculates an ETA to the rider, it calculates an ETA then to the destination, and it calculates a price. Those are all activities that are done sort of autonomously, that don't require a human to type something into a form. The application is using changes in data to calculate an analytic product and then to operationalize that, to assign the driver, to calculate a price. That's an example of what I would think of as a data app. And my question then, I guess, for Tristan is, if we don't have all the pieces in place for sort of mainstream companies to build those sorts of apps easily yet, like, how would we get started? What's the role of a semantic layer in making that easier for mainstream companies to build? And how do we get started, you know, say with metrics? How does that take us down that path? So what we've seen in the past decade or so is that one of the most successful business models in infrastructure is taking hard things and rolling them up behind APIs. You take messaging, you take payments, whatever, and you all of a sudden increase the capability of kind of your median application developer.
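George's Uber example, matching the nearest driver, estimating an ETA, quoting a price, can be sketched in toy form. Everything here is a hypothetical simplification: the coordinates, speed, and fare constants are invented, and real ride matching is far more involved.

```python
import math

def nearest_driver(drivers, rider):
    """drivers: dict of name -> (x, y) location; rider: (x, y).
    Autonomously pick the closest driver; returns (name, distance)."""
    name, pos = min(drivers.items(), key=lambda kv: math.dist(kv[1], rider))
    return name, math.dist(pos, rider)

def quote(distance_km, speed_kmh=30.0, rate_per_km=1.5, base_fare=2.0):
    """Turn a distance into an (eta_minutes, price) pair."""
    eta_minutes = distance_km / speed_kmh * 60
    price = base_fare + rate_per_km * distance_km
    return round(eta_minutes, 1), round(price, 2)

drivers = {"d1": (0.0, 0.0), "d2": (3.0, 4.0)}
rider = (1.0, 0.0)
name, dist = nearest_driver(drivers, rider)
print(name, quote(dist))
```

No form was filled in anywhere: location changes flow in, and the match, ETA, and price fall out autonomously, which is exactly the property George is highlighting.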
And you say, previously you were spending all your time being focused on how do you accept credit cards, how do you send SMS messages. Now you can focus on your business logic and just create the thing. Interestingly, one of the things that we still don't know how to API-ify is the concepts that live inside of your data warehouse, inside of your data lake. These are core concepts that, you know, you would imagine the business would be able to create applications around very easily. But in fact, that's not the case. It's actually quite challenging to do, and it involves a lot of data engineering, a lot of planning, all this work to make these available. And so if you really want to make it very easy to create some of these data experiences for users, you need to have an ability to describe these metrics and then to turn them into APIs that make them accessible to application developers, who have literally no idea how they're calculated behind the scenes, and they don't need to. So how rich can that API layer grow if you start with metric definitions that you've defined? And DBT has, you know, the metric, the dimensions, the time grain, things like that. That's a well-scoped sort of API that people can work within. How much can you extend that to, say, non-calculated business rules, or governance information like data reliability rules, things like that? Or even, you know, features for an AI/ML feature store. In other words, you started pragmatically, but how far can you grow? Bob is waiting with bated breath to answer this question. Just really quickly, I think that we as a company, and DBT as a product, tend to be very pragmatic. We try to release the simplest possible version of the thing, get it out there, and see people use it. But the concept of a metric is really just a first landing pad. Really, there is a physical manifestation of the data, and then there's a logical manifestation of the data.
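The idea Tristan describes, a metric defined once (name, aggregation, dimensions, time grain) and served through a stable interface so application developers never see how it's calculated, can be sketched as follows. This is a hypothetical toy, not DBT's actual metric spec or API; every class and field name here is invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    """Declarative description of a metric, the 'logical manifestation'."""
    name: str
    column: str
    agg: str                                  # e.g. "sum" or "count"
    dimensions: list = field(default_factory=list)
    time_grains: list = field(default_factory=list)

def query_metric(metric: Metric, rows: list, group_by: str) -> dict:
    """Serve the metric behind an API; callers never see the calculation."""
    if group_by not in metric.dimensions:
        raise ValueError(f"{group_by} is not a dimension of {metric.name}")
    out = {}
    for row in rows:
        key = row[group_by]
        out[key] = out.get(key, 0) + (row[metric.column] if metric.agg == "sum" else 1)
    return out

revenue = Metric("revenue", column="amount", agg="sum",
                 dimensions=["region"], time_grains=["day", "month"])
rows = [{"region": "east", "amount": 10}, {"region": "east", "amount": 5},
        {"region": "west", "amount": 7}]
print(query_metric(revenue, rows, group_by="region"))
```

The caller asks for "revenue by region" and gets an answer without ever touching the underlying physical tables, which is the encapsulation the panel is after.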
What we're trying to do here is make it very easy to access the logical manifestation of the data, and a metric is one way to look at that. Maybe an entity, a customer, a user is another way to look at that, and I'm sure that there will be more kinds of logical structures as well. So Bob, chime in on this. What are your thoughts on the right architecture behind this, and how do we get there? Yeah, well, first of all, I think one of the ways we get there is by what companies like DBT Labs and Tristan are doing, which is incrementally taking and building on the modern data stack and extending that to add a semantic layer that describes the data. Now, the way I tend to think about this is as a fairly major shift in the way we think about writing applications, which is moving from today's code-first approach to a world that is model-driven. And I think that's what the big change will be: where today we think about data, we think about writing code, and we use that to produce APIs, as Tristan said, which encapsulate those things together in some form of services that are useful for organizations. And that idea of encapsulation is never gonna go away. That concept of an API is incredibly useful and will exist well into the future. But what I think will happen is that in the next 10 years, we're gonna move to a world where organizations are defining models first of their data, but then ultimately of their business process, their entire business process. Now, the concept of a model-driven world is a very old concept. I mean, I first started thinking about this and playing around with some early model-driven tools, probably before Tristan was born, in the early 1980s. And those tools didn't work, because the semantics associated with executing the model were too complex to be written in anything other than a procedural language. We're now reaching a time where that is changing. And you see it everywhere.
You see it first of all in the world of machine learning and machine learning models, which are taking over more and more of what applications are doing. And I think that's an incredibly important step, and learned models are an important part of what people will do. But if you look at the world today, I will claim that we've always been modeling. Modeling has existed since there have been computers in any form. But what we do is what I would call implicit modeling, which means that the model is written on a whiteboard. It's in a bunch of Slack messages. It's on a set of napkins, in conversations that happen during Zoom calls. That's where the model gets defined today. It's implicit. There is one in the system: it is hard-coded inside application logic that exists across many applications, with humans being the glue that connects those models together. And really, there is no central place you can go to understand the full attributes of the business, all of the business rules, all of the business logic, the business data. That's going to change in the next 10 years, and we'll start to have a world where we can define models about what we're doing. Now, in the short run, the most important models to build are data models, to describe all of the attributes of the data and their relationships. And that's work that DBT Labs is doing; a number of other companies are doing that. We're taking steps along that way with catalogs. People are trying to build more complete ontologies associated with that. The underlying infrastructure is still super, super nascent. But what I think we'll see is this infrastructure that exists today that's building learned models in the form of machine learning programs, some of these incredible machine learning programs in foundation models like GPT and DALL-E, and all of the things that are happening in these global-scale models.
But also, all of that needs to get applied to the domains that are appropriate for a business. And I think we'll see the infrastructure developing for that, infrastructure that can take this concept of learned models and put it together with more explicitly defined models. And this is where the concept of knowledge graphs comes in, and then the technology that underlies that to actually implement and execute it, which I believe is relational knowledge graphs. Wow, there's a lot to unpack there. So let me ask the Columbo question. Tristan, we've been making fun of your youth. We're just jealous. Columbo? I'll explain it offline, maybe. Okay, good. So, but today, if you think about the application stack and the data stack, which is largely an analytics pipeline, they're separate. Do those worlds have to come together in order to achieve Bob's vision? When I talk to practitioners about that, they're like, well, I don't want to complexify the application stack, because the data stack today is so hard to manage. But do those worlds have to come together through that model, I guess, abstraction or translation that Bob was just describing? How do you guys think about that? Who wants to take that? I think it's inevitable that data and AI are going to become closer together. I think that the infrastructure there has been moving in that direction for a long time, whether you want to use the lakehouse portmanteau or not. There's also a next generation of data tech that is still in the early stages of being developed. There's a company that I love that is essentially cross-cloud Lambda, and it's just a wonderful abstraction for computing. So I think that people have been predicting that these worlds are going to come together for a while. a16z wrote a great post on this back in, I think, 2020, predicting it, and I've been predicting it since then too, but what's not clear is the timeline.
But I think that this is still just as inevitable as it's been. Who's that? Let me follow up on that. Who is it, Tristan, that does cross-cloud Lambda? Can you name names? They're called Modal Labs. Modal Labs, yeah, of course. All right, go ahead, George. Let me ask about this vision of trying to put the semantics, or the code that represents the business, with the data. It gets us to a world that's sort of more data-centric, where data's not locked inside or behind the APIs of different applications, so that we don't have silos. But at the same time, Bob, I've heard you talk about building the semantics gradually on top of, into a knowledge graph that maybe grows out of a data catalog. And the vision of getting to that point is that essentially the enterprise's metadata, and then the semantics that get added onto it, are really stored in something that's separate from the underlying operational and analytic data. So at the same time, then, why couldn't we gradually build semantics beyond the metric definitions that DBT has today? In other words, you build more and more of the semantics in some layer that DBT defines, and that sits above the data management layer, but any requests for data have to go through the DBT layer. Is that a workable alternative, or what type of limitations would you face? Well, I think that is the way the world will evolve: to start with the modern data stack, which is operational applications going through a data pipeline into some form of data lake, data warehouse, the lakehouse, whatever you wanna call it, and then this wide variety of analytics services that are built together. To the point that Tristan made about machine learning and data coming together, you see that in every major data cloud provider. Snowflake certainly now supports Python and Java.
Databricks is, of course, building their data warehouse, but certainly Google, Microsoft, and Amazon are doing very, very similar things in terms of building complete solutions that bring together an analytics stack, one that typically supports languages like Python, with the data stack and the data warehouse. I mean, all of those things are going to evolve, and they're not gonna go away, because that infrastructure is relatively new. It's just being deployed by companies, and it solves the problem of working with petabytes of data if you need to work with petabytes of data, and nothing else will do that for a long time. What's missing is a layer that understands and can model the semantics of all of this. And if you wanna talk about all the semantics of even just the data, you need to think about all the relationships, you need to think about how these things connect together. And unfortunately, there really is no platform today; none of our existing platforms are ultimately sufficient for this. It was interesting, I was just talking to a customer yesterday, a large financial organization that is building out these semantic layers. They're further along than many companies are. And I asked what they're building it on, and not surprisingly, they're using combinations of some form of text-based search together with a document-oriented database; in this case, it was Cosmos. And that really is kind of the state of the art right now. And yet those products were not built for this. They can't manage the complicated relationships that are required, they can't issue the queries that are required, and so a new generation of database needs to be developed. And fortunately, that is happening: the world is developing a new set of relational algorithms that will be able to work with hundreds of different relations.
If you look at a SQL database like Snowflake or BigQuery, you get to tens of different joins coming together, and that query is going to take a really long time. Well, fortunately, technology is evolving, and it's possible with new join algorithms, worst-case optimal join algorithms they're called, to join hundreds of different relations together and run semantic queries that you simply couldn't run before. Now, that technology is nascent, but it's really important, and I think it will be a requirement for this semantic layer to reach its full potential. In the meantime, Tristan can do a lot of great things by building on what he's got today and solve some problems that are very real. But in the long run, I think we'll see a new set of databases to support these models. So Tristan, you've got to respond to that, right? So take the example of Snowflake. We know it doesn't deal well with complex joins, but they've got big aspirations. They're building an ecosystem to really solve some of these problems. Tristan, you guys are part of that ecosystem, and others, but please, your thoughts on what Bob just shared. Bob, I'm curious. I would have no idea what you were talking about, except that you introduced me to somebody who gave me a demo of a thing. Do you not want to go there right now? No, I can talk about it. We can talk about it. The company I've been working with is RelationalAI, and they're doing this work, first of all, across the industry with academics and researchers, over 20 different research institutions across the world, to develop this new set of algorithms. They're all fully published, just like the underlying algorithms that are used by SQL databases are. If you look today, every single SQL database uses a similar set of relational algorithms underneath, and those algorithms actually go back to System R and what IBM developed in the 1970s.
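For readers curious what "worst-case optimal join" means, here is a toy Python sketch of the generic-join idea behind that family of algorithms: instead of joining two tables at a time, bind one variable at a time and intersect the candidate values from every relation that mentions it, shown on the classic triangle query Q(a, b, c) = R(a, b) join S(b, c) join T(a, c). This is an illustrative simplification for intuition only, not RelationalAI's actual implementation.

```python
from collections import defaultdict

def triangle_join(R, S, T):
    """Enumerate all (a, b, c) with (a,b) in R, (b,c) in S, (a,c) in T."""
    # Index each relation by its first attribute for fast intersection.
    R_by_a, S_by_b, T_by_a = defaultdict(set), defaultdict(set), defaultdict(set)
    for a, b in R: R_by_a[a].add(b)
    for b, c in S: S_by_b[b].add(c)
    for a, c in T: T_by_a[a].add(c)

    out = []
    for a in R_by_a.keys() & T_by_a.keys():      # bind a across R and T
        for b in R_by_a[a] & S_by_b.keys():      # bind b consistent with a
            for c in S_by_b[b] & T_by_a[a]:      # bind c consistent with both
                out.append((a, b, c))
    return sorted(out)

R = [(1, 2), (1, 3)]
S = [(2, 4), (3, 4)]
T = [(1, 4)]
print(triangle_join(R, S, T))
```

Binding variables rather than pairwise-joining tables is what lets these algorithms avoid the blow-up of intermediate results on queries with many relations.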
There's an opportunity for us to build something new that allows you, for example, instead of taking data and grouping it together in tables, to treat all data as individual relations, a key and a set of values, and then be able to perform purely relational operations on it. If you go back to Codd and what he wrote, he defined two things: a relational calculus and a relational algebra. Essentially, SQL is a query language that is translated by the query processor into relational algebra. However, the calculus of SQL is not even close to the full semantics of the relational mathematics, and it's possible to have systems that can do everything, that can store all of the attributes of the data model, or ultimately the business model, in a form that is much more natural to work with. So here's my short answer to this. I think that we're dealing in different time scales. I think that there is actually a tremendous amount of work to do in the semantic layer using the kind of technology that we have on the ground today. And I think that there's, I don't know, let's say five years of really solid work for the entire industry to do, if not more. But the wonderful thing about DBT is that it's independent of what the compute substrate is beneath it. And so if we develop new platforms, new capabilities to describe semantic models in more fine-grained detail and more procedurally, then we're gonna support that too. And so I'm excited about all of it. Yeah, so interpreting that short answer, you're basically saying, because Bob was just kind of pointing at you, it's incremental, but you're saying, yeah, okay, we're applying it to incremental use cases today, but we can accommodate a much broader set of examples in the future. Is that correct, Tristan? I think you're using the word incremental as if it's not good, but I think that incremental is great.
We have always been about applying incremental improvement on top of what exists today, while allowing practitioners to use different workflows to actually make use of that technology. So yeah, we are a very incremental company, and we're gonna continue being that way. Well, I think Bob was using incremental as a pejorative. I mean, to your point, a lot of... I don't think so. I wanna stop that. No, I don't think it's pejorative at all. I think incremental is usually the most successful path. Yes, of course. We agree on that. Having tried many, many moonshot things in my Microsoft days, I can tell you that being incremental is a good thing. And I'm a very big believer that that's the way the world's gonna go. I just think that there is a need for us to build something new, and that ultimately that will be the solution. Now, you can argue whether it's two years, three years, five years, or 10 years, but I'd be shocked if it didn't happen. Yeah, so we all agree that incremental is less disruptive, boom. But Tristan, I'm inferring that you believe you have the architecture to accommodate Bob's vision, and I'm inferring from Bob's comments that maybe you don't think that's the case, but please. No, no, I think that... So Bob, let me put words in your mouth, and you tell me if you disagree. DBT is completely useless in a world where a large-scale cloud data warehouse doesn't exist. We were not able to bring the power of Python to our users until these platforms started supporting Python. DBT is a layer on top of large-scale computing platforms, and to the extent that those platforms extend their functionality to bring more capabilities, we will also surface those capabilities. Let me try. So Bob, do you concur with Tristan? Absolutely. I mean, there's nothing to argue with in what Tristan just said. It's what he's doing, and I believe he'll continue to do it. And I think it's a very good thing for the industry.
I'm just simply saying that, on top of that, I would like to provide Tristan, and all of those who are following similar paths to his, with a new type of database that can actually solve these problems in a much more architected way. When you work with something like Mongo or Cosmos together with Elastic, you're using Elastic as the join engine, okay? That's the purpose of it. It becomes a poor man's join engine. And I kind of go, I know there's a better answer than that. I know there is. But that's kind of where the state of the art is right now. George, we gotta wrap. So give us the last word here. Go ahead, George. Okay, I think there's a way to tie together what Tristan and Bob are both talking about, and I want them to validate it, which is that for five years, or some number of years, we're gonna be adding more and more semantics to the operational and analytic data that we have, starting with metric definitions. My question for Bob is, as DBT accumulates more and more of those semantics for different enterprises, can that layer not run on top of a relational knowledge graph? And what would we lose by having the knowledge graph store the joins, all the complex relationships among the data, but having the semantics in the DBT layer? Well, okay. I think first of all that DBT will be an environment where many of these semantics are defined. The question we're asking is, how are they stored and how are they processed? And what I predict will happen is that over time, as companies like DBT begin to build more and more richness into their semantic layer, they will begin to experience challenges: customers want to run queries, they wanna ask questions, they wanna use this for things where the underlying infrastructure becomes an obstacle. I mean, this has always happened in history, right? You see major advances in computer science when the data model changes.
And I think we're on the verge of a very significant change in the way data is stored and structured, or at least the way metadata is stored and structured. Again, I'm not saying that anytime in the next 10 years SQL is gonna go away. In fact, more SQL will be written in the future than has been written in the past. And those platforms will mature to become the engines, the slicer-dicers of data. I mean, that's what they are today. They're incredibly powerful at working with large amounts of data, and that infrastructure is maturing very rapidly. What is not maturing is the infrastructure to handle all of the metadata and the semantics that that requires. And that's where I say knowledge graphs are what I believe will be the solution. But Tristan, bring us home here. It sounds like, let me posit this: whatever happens in the future, we're going to leverage the vast system that has become cloud. We're talking about a SuperCloud, sort of where data lives irrespective of physical location; we're going to have to tap that data. It's not necessarily going to be in one place. But give us your final thoughts, please. 100% agree. I think that the data is going to live everywhere. It is the responsibility of both the metadata systems and the data processing engines themselves to make sure that we can join data across cloud providers, that we can join data across different physical regions, and that we as practitioners are going to kind of start forgetting about details like that. And we're going to start thinking more about how we want to arrange our teams, and how the tooling that we use supports our team structures. And that's when data mesh, I think, really starts to get very, very critical as a concept. Guys, great conversation. It was really awesome to have you. I can't thank you enough for spending time with us. I really appreciate it. Thanks a lot. All right, this is Dave Vellante for George Gilbert, John Furrier, and the entire CUBE community.
Keep it right there for more content. You're watching SuperCloud 2.