Welcome to the inaugural episode of our newest show here at theCUBE Research. The series will be called The Road to Intelligent Data Apps. I'm your host, Shelly Kramer, managing director and senior analyst here at theCUBE Research, and I'm joined by my colleague and fellow analyst, George Gilbert. George, welcome, great to see you.

Hi, Shelly, good to be here.

So we've been talking about this show for a very long time, and we're excited about it because we see The Road to Intelligent Data Apps as truly the next frontier. This last decade in data and analytics was all about cloud architecture and the process of separating compute from storage. The modern data stack (MDS) lakehouse became the historical system of truth, and we saw the rise of simple, scalable cloud platforms with the likes of Snowflake and Databricks. In the next decade, though, we anticipate some massive change. This decade will be all about separating compute from data. And we're evolving to what will be a new historical system of truth, with more easily composable data products that will evolve into applications. So this is exciting, and it's something we believe will be an even more profound shift than the MDS lakehouse. We're still figuring out what that will look like, which is part of why we wanted to do this series. What we do know for certain, however, is that in this world, it's the combination of data and metadata that will form this new system of truth. So in this episode, we'll explain what metadata is and why it's so critical to the data-centric architecture. George, I know this topic is something you're particularly passionate about and very knowledgeable about. That's the reason we decided to launch this webcast series. So this will be the first of many conversations and interviews to come, and I'm so excited to be walking this path with you. So in this first episode, we'll explore why we need to evolve.
We'll also start to explore how, in terms of potential approaches. And part of what we're gonna do is preview what we expect to see at AWS re:Invent 2023, which is kicking off in Vegas next week. So George, I'm gonna toss it over to you. I've talked enough so far this morning.

All right, so AWS re:Invent is a big event in enterprise software every year. And this one is a key milestone in this kind of once-in-a-decade shift in data architectures, because this is probably the year where we start to see the emphasis shifting away from the modern data stack that Snowflake really defined, which supported analytics by separating storage from compute and making it cloud native. So you could have as much data as you wanted and independently scale the compute. But we need something different now. And so this is where we're gonna see this equally profound shift that's ultimately gonna support intelligent data apps. And to do that, we need to separate data from compute. So before we get into the nitty-gritty of what we're looking for in AWS announcements and things like that, let's set up the context, which is the why and the how of separating data from compute now. So Shelly, why don't you set up some of the context: what do we mean by intelligent data apps?

Absolutely. So for starters, let's talk about apps. One of the things I remembered as I was preparing for this show: at the AWS Summit in New York City this past summer, one of the keynotes planted some data points in my brain. It was a keynote given by Gabrielle Tao, a Salesforce senior vice president and Data Cloud product manager. And she shared just a snippet of information that said we are today in an age where there are more devices than humans on the planet. Okay, that makes perfect sense, but I think it's something most of us don't really think about very often. So more devices, of course, means more data.
And that becomes, I think, probably one of the number one challenges that organizations of all sizes have today: getting their arms around this massive amount of data. As more disconnected data grows, it becomes exponentially harder to connect that data and use it to better serve customers. Salesforce talked about some research they did at AWS's event. And one of the things they shared was that, while customers expect companies to anticipate their needs, most feel that organizations do not. And I think we would probably all agree with that, based upon our own encounters with call centers and things like that when looking for some help. So some of the data from Salesforce's research was that 60 to 70% of customers expect companies to understand their needs, and yet 56% of them feel like, "They don't get me. I'm just a number, and it's just kind of a waste of time."

So, shifting to think about applications, why does this happen? Why is there such a disconnect with data? Why do customers feel like they're not being well served? What's the challenge here? Well, one of the things Gabrielle shared in her keynote was that the average company has a whopping 976 applications throughout its organization. So the fact that there are disconnects there is actually not at all surprising. I think this is where we're trying to solve for certain challenges within organizations. And this particular topic, the topic of apps, is critical because in order to better understand customer needs, there needs to be a better understanding of how to connect with customers more effectively and efficiently. That's the beauty of intelligent data apps, and that's where their importance comes in. Intelligent apps are essentially a platform that enables an enterprise to orchestrate and coordinate an ecosystem of people, places, things and activities.
You're probably already exposed to intelligent data apps on a regular basis without really realizing it. Some current examples would be Uber, Amazon's e-commerce platform, Airbnb, or Instacart. The way these companies are able to serve us effectively is by way of the sophisticated technology they're using, which is intelligent data apps. But the reality is that for most companies, today's modern data stack needs to evolve in the months and years ahead. That evolution is what mainstream enterprises need before they can think about building these applications.

So, some quick characteristics of intelligent data apps and how they're more advanced than data products. Here's an example. In the digital representations we see in intelligent data apps, you've got things: riders and drivers and fares and routes and ETAs, which is what you need when you're working within the Uber ecosystem, right? In the modern data stack, what you see are strings. So that's very different. You've got data assets, you've got transformation pipelines, you've got dashboards, you've got AI and ML models. All of those things are important, but they're not intelligent data apps. Intelligent data apps are orchestrated by AI. For example, it's like Uber matching those drivers and those riders, calculating the routes, being able to send you messages along the way, and those sorts of things.
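
The contrast between strings and things can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the opaque column names, the `COLUMN_MEANING` mapping, and the `Rider`, `Driver`, and `Trip` types are hypothetical and do not come from Uber or from any vendor's product. The idea is only that a small piece of shared metadata is what turns an untyped row into objects an application can reason about.

```python
from dataclasses import dataclass

# "Strings": an untyped row as it might sit in a warehouse table.
# The column names and values are invented for this illustration.
raw_trip_row = {
    "col_a": "r-1001",
    "col_b": "d-2002",
    "col_c": "12.50",
    "col_d": "7",
}


# "Things": typed business entities an application can work with directly.
@dataclass(frozen=True)
class Rider:
    rider_id: str


@dataclass(frozen=True)
class Driver:
    driver_id: str


@dataclass(frozen=True)
class Trip:
    rider: Rider
    driver: Driver
    fare_usd: float
    eta_minutes: int


# Hypothetical metadata: maps each opaque column to its business meaning.
COLUMN_MEANING = {
    "col_a": "rider_id",
    "col_b": "driver_id",
    "col_c": "fare_usd",
    "col_d": "eta_minutes",
}


def to_trip(row: dict, meaning: dict) -> Trip:
    """Use the metadata mapping to turn a raw row into a typed Trip."""
    named = {meaning[col]: value for col, value in row.items()}
    return Trip(
        rider=Rider(named["rider_id"]),
        driver=Driver(named["driver_id"]),
        fare_usd=float(named["fare_usd"]),
        eta_minutes=int(named["eta_minutes"]),
    )


trip = to_trip(raw_trip_row, COLUMN_MEANING)
print(trip.rider.rider_id, trip.fare_usd, trip.eta_minutes)
```

In this toy version the mapping lives next to the code. The argument in this episode is that it should live in the data platform instead, so that every application reads the same meaning.
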
So hopefully that quick little overview is helpful. And now, George, I'm gonna throw it to you to explain where AWS is relative to supporting these kinds of applications with its data stack.

Okay, so the big thing we need to start looking for is how AWS is actually fixing some of its legacy issues, because they basically missed the architecture of the modern data stack, and they've been struggling with that for 10 years. The modern data stack, just to recap, was about separating storage from compute so that you could apply different types of analytics to the same centralized storage, but it was done within the context of one vendor. So that's why you could have your business intelligence workloads operating on the same data as your data transformation pipelines. Again, it was done within the confines of your lakehouse. So that's why Snowflake and Databricks dominated. AWS's challenge was that they started with a legacy code base, and so Redshift never separated the storage from compute. So you had this Redshift data warehouse, but if you actually wanted to do your data transformation, you had a separate silo in S3 with Elastic MapReduce (EMR) or, more recently, the SQL-based Athena. And so you did your pipelines, then you put the data in Redshift, but Redshift only worked for your structured data, so you could only do your dashboards. So you had these different silos. And the whole point was you didn't want pipelines, you wanted one system of truth. That was the opening that essentially created Snowflake and Databricks and made them so large on the Amazon platform. The other thing was, as you mentioned, we were still dealing with strings, not things. So whoever was trying to build these intelligent data apps had to build all the intelligence about drivers, riders, and routes themselves. So what we need is something in this new and evolving modern data stack that allows you to unify the metadata that essentially transforms the meaning of strings.
And when I say strings, I mean the rows and columns that held operational and analytic data, and transforming that into the drivers and riders and routes and ETAs. That's in the metadata. Now, Amazon last year introduced something called DataZone, which was a business catalog that created glossaries that would give you discoverability. So you could find your dashboards or your pipelines, but it wasn't yet something that an application could use. So we wanna see if they continue to evolve this metadata platform into something more operational that an application knows how to use. Because right now, even though they started to unify that metadata last year, you still had the data silos. Only Redshift could access Redshift data, and only EMR or Athena could access their data. They still had the silos. And it was business metadata, not something an application could access. So we're looking to see whether they can continue that movement toward unification of data and unification of metadata across that unified data repository. That's what we're looking for.

And metadata is key.

Metadata is key.

So it's clear that the modern data stack has to evolve to support these apps. And when we're looking at a future data stack, the theme will be all about unbundling the DBMS. We're looking for some additional functionality. When you have a system of truth, you've got the data plus the metadata, and the metadata is the data that explains what the data means. That's going to grow and evolve substantially. And as you mentioned, George, we're looking beyond BI and AI and ML models: the stack needs to support intelligent data apps and products. We need to be able to have application intelligence, which helps us move from app silos into the data platform. This will allow apps to become composable building blocks.
For example, Uber can add surge pricing functions that affect all fares in a given region. This is something that happens in real time, on the fly, and that is really important. So as I said when we started talking about this, you may think you know what metadata is and how important it is, but we can promise you that it's more important than ever before when we're talking about intelligent apps. The role of metadata is incredibly critical. As this evolution continues and intelligent data platforms become more widely adopted, it is that metadata that's key. It's now the new system of truth. The metadata holds the key to what the data means. It's not the database maintaining the system of truth anymore. The metadata has to do it, and that's why we need to evolve. So now we're going to talk a little bit about how metadata can evolve.

All right, I'm going to get a little technical here. Metadata used to be the system catalog in a DBMS, and it was information about the schema: what rows and columns there were, what tables there were, how to connect the tables. That's technical metadata. Then it became something we called governance, and that was something the IT guys and the admins worried about. It was about who's allowed to access what. But that now has to become something everyone shares and maintains. And by everyone, I mean the developers, the data engineers, and the application developers, because that's what up-levels strings to things. So specifically, it's things like operational metadata. That's things like lineage: what data fed into what data as you transformed it, so you knew where it came from. That allowed you to monitor quality and observe things like where errors occur during the transformation. Then there was business metadata, like glossaries and discoverability.
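
To make those four kinds of metadata concrete, here is a minimal Python sketch of a single catalog entry that carries all of them together. It is an illustration under stated assumptions, not any vendor's catalog format: the `DatasetMetadata` class, the dataset names, the roles, and the helper functions `trace_lineage` and `can_read` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """One hypothetical catalog entry combining the metadata kinds above."""
    name: str
    columns: dict                                  # technical: column -> type
    allowed_roles: set                             # governance: who may read
    upstream: list = field(default_factory=list)   # operational: lineage
    glossary_term: str = ""                        # business: shared meaning


# Invented example catalog: raw data flows into cleaned data, then a rollup.
catalog = {
    "raw_trips": DatasetMetadata(
        name="raw_trips",
        columns={"rider_id": "string", "fare": "string"},
        allowed_roles={"data_engineer"},
    ),
    "clean_trips": DatasetMetadata(
        name="clean_trips",
        columns={"rider_id": "string", "fare_usd": "double"},
        allowed_roles={"data_engineer", "analyst"},
        upstream=["raw_trips"],
        glossary_term="Trip",
    ),
    "fares_by_region": DatasetMetadata(
        name="fares_by_region",
        columns={"region": "string", "avg_fare_usd": "double"},
        allowed_roles={"analyst"},
        upstream=["clean_trips"],
        glossary_term="Fare",
    ),
}


def trace_lineage(name: str, entries: dict) -> list:
    """Return every dataset that feeds `name`, nearest upstream first."""
    seen = []
    queue = list(entries[name].upstream)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        queue.extend(entries[current].upstream)
    return seen


def can_read(role: str, name: str, entries: dict) -> bool:
    """Governance check: is this role allowed to read the dataset?"""
    return role in entries[name].allowed_roles


print(trace_lineage("fares_by_region", catalog))
print(can_read("analyst", "raw_trips", catalog))
```

Because lineage, access rules, schema, and glossary terms sit in one record here, any tool that reads the catalog can answer where the data came from, who may see it, and what it means without asking a separate system.
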
The key point here is that when you start unifying and adding this all together, you gradually transform the strings into things as you add business logic that essentially gives you the meaning. Ultimately, all this metadata becomes what used to live in the applications. The intelligence that lived in the application now has to drop down into the metadata so that it's shared by everyone. And that's how we go from application silos to an intelligent data platform that gives you composability. So here's what we need to look for in terms of AWS products and, frankly, what we wanna see in everyone's introductions. First, to see the separation of compute and data, we wanna see that Redshift, SageMaker, and OpenSearch, key parts of the data stack that Amazon has, share a common repository. They should all work with data in S3. Today, most of those basically have their own: OpenSearch has its own data repository, Redshift has its own data repository. Those all have to have a common data foundation. And then we need to unify that metadata. Today there's business metadata, the glossary and discoverability, that's in DataZone. There's technical metadata in the Glue Data Catalog and the Redshift system catalog. All of that has to get unified into one sort of operational catalog that all applications can share. That's the key thing we need to look for. So Shelly, why don't you start to explain what we're looking for?

Well, some of the conversations we've had have, of course, looked at who the players are in this ecosystem. And of course, we're gonna start with AWS as we head into re:Invent. So AWS has a legacy and some challenges, and basically they kind of missed the cloud data lakehouse with Redshift because it was based on an on-prem code base. It didn't separate compute from storage, and that created the opening for Snowflake.
AWS, however, seems to be anticipating this next shift, which we like to see, with the development of DataZone and its metadata strategy. So we're also gonna take a look at some of Amazon's competition on this front. So George, who are you looking at?

All right, so Snowflake has some really strong foundations, but they left a couple of openings. So let's start with them. They have a really strong DBMS foundation, which allows them to provide simplicity to admins and developers if they fix a couple of things. First of all, in your data platform you want unification of things like query, transaction model, error handling, and administration. So if you can have the same DBMS manage your data and your metadata, that gives unification and simplicity to developers and admins. What Snowflake left open was all the metadata that would really allow people to turn strings into things and to manage a broader data estate, which is to say data that's not just managed by the Snowflake DBMS. That's the thing they have to fill in. And with Snowpark and Snowpark Container Services, they gave you the open extensibility to add any compute to access the data that goes through their platform. The problem with them right now is that the pricing and business model makes it not cost competitive for all workloads.

And that's a big problem. I mean, especially now. I've characterized this entire year as a year where we're all trying to do more with less across the board from a corporate standpoint, right? So this cost issue is a big issue.

Yes, and that actually has created an opening, because people are putting the less performance-sensitive workloads on different platforms right now. And because Snowflake doesn't have a platform for managing the metadata outside their platform, it means someone else essentially owns that system-of-truth metadata outside of Snowflake. In other words, this-

Not ideal.

Right, not ideal.
And that's what created an opening for Databricks, which came along and surprised some folks this year with Unity, which is this unified metadata platform where, even if the data is not something you're managing with their SQL engine, it can actually manage the metadata for your entire data estate. That's what caught Snowflake by surprise. And they're also more price competitive because they segment their pricing by workload type, so for the non-performance-sensitive stuff, it's much more cost effective. So what they're trying to do is own the metadata for your data estate even though their DBMS technology is weaker. They're trying to make their weakness their strength, which is to own all the metadata even if their DBMS isn't strong enough to manage all the metadata and the operational applications that you're gonna build eventually. Theirs is an analytic DBMS right now.

Yeah.

And Google made tremendous progress, showing at Google Cloud Next the separation of storage from compute so that all the AI tools, BigQuery, everything worked on a common data foundation, with this common metadata layer in Dataplex. So they're looking strong. Even Microsoft, which has basically been a non-entity in the data platform, showed Fabric, where all the analytic engines were separate from the data foundation, with a metadata layer. So they're even in the game. So Amazon's coming from behind here.

They are. But to me, with this whole conversation in this kickoff episode, I hope that our viewing audience and our listening audience can see why we're so excited about this. We think there's a ton of change ahead on this front, and there has to be, right?

And beyond the modern data stack, we essentially need different compute engines talking to a shared data platform. And then the whole point of this transition starts with the modern data stack.
There, we separated storage from compute, but storage and compute were owned by the same vendor. This is where we're evolving. Okay, so now we need to get to the point where we have different compute engines from different vendors accessing the same data. And that means you need a platform where you're up-leveling the intelligence that's in the data, so that as you build applications, you're not talking to strings where different application silos hold the intelligence about what those strings mean. That intelligence has to drop down to the data. And so the data platform has to maintain this new system of truth, where it's got all this intelligence and keeps it in sync so that you have applications that can build on each other, which is composability. That's the key, that's what we're looking for.

Yeah, and that, my friends, is what the series The Road to Intelligent Data Apps is all about: exploring the challenges and rethinking the way we do things as we explore the road to intelligent data apps. As we wrap this inaugural episode, we're glad to have you along on this journey with us, and we hope that you'll hit the subscribe button and come along. And if you're planning to be at re:Invent in Vegas next week and you wanna meet up, reach out and let us know. I'll include links to where you can find us on LinkedIn. We'd love your connection request, and we'd love to see you in Vegas next week. So with that, we're gonna wrap this show, and we'll see you next time. Thanks, George.

All right, thanks, Shelly.