From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante.

In the next three to five years, we believe a set of intelligent data apps will emerge that will require a new type of modern data platform, a sixth data platform, to support them. In our research, we've used the metaphor of Uber for all to describe this vision, meaning software-based systems that enable a digital representation of a business, a digital twin, if you will, where various data elements like people, places and things from the real world are ingested, made coherent and combined to take action in real time. Now, rather than requiring thousands of engineers to build this system like Uber did, we've said this capability will come in the form of commercial off-the-shelf software.

Hello and welcome to this week's Cube Insights, powered by ETR. To explore this premise, George Gilbert and I welcome Ryan Blue to this, our 201st episode. Ryan is the co-creator and PMC chair of Apache Iceberg and co-founder and CEO of Tabular, a universal open table store that connects to any compute layer, built by the creators of Iceberg. George, Ryan, welcome to the show.

Good to be here.

Yeah, likewise, thanks for having me on.

Okay, before we get into it, let's review what we see as today's five most prominent data platforms. Here we list them: Snowflake with the Data Cloud; Databricks' Lakehouse, Delta and all the other things we heard about at the recent Databricks Summit, Unity, the Spark execution engine; Google's BigQuery, making a lot of noise with its AI, Vertex AI in particular, and its LLMs that allow you to customize your own; Microsoft's Synapse and the recently announced Fabric; and Amazon's bespoke set of tools, Redshift, Lake Formation and Glue to stitch them together. But interestingly, we've got this emerging open, multi-vendor, modular set of platforms.
Iceberg is an example. We're using Starburst, Dagster, dbt Labs, whom we've had on before. And the question that we're asking here is, can a sixth data platform emerge from these components? But before we get into it, George, anything you'd add to these five great modern data platforms?

Well, there's a big delta between the data platforms we have today and that vision of the real-time intelligent data app. And the biggest question I have is, are the components, of which Ryan and Tabular with Iceberg are a representative, going to offer plug-in alternatives to components of today's data platforms, or will there be the ability to compose and assemble an alternative data platform? And then what are the trade-offs between that open assembly, if it's possible, and the integrated approach, which is typically simpler to operate but usually more expensive to buy? That's the big question I hope we explore with Ryan.

Okay, great. And Ryan, anything you want to comment on regarding today's modern data platforms? I mean, we moved on from Hadoop. We saw the separation of compute from storage. You've got a lot of VC investment, a lot of IPOs that were, I guess, successful, I guess we'll see how that all ends up, but certainly a lot of market momentum around those five.

So when you guys talk about these platforms, I think that we've already seen the modularity starting to creep in. So if you look at just the last three of these, Google, Microsoft, Amazon, and to a certain extent, Databricks and Snowflake coming on board, we're already seeing that these are not platforms that are a single system, like a single engine, a single storage layer. These are already complex things made of very diverse products. So you're already seeing engines from the Hadoop space in Microsoft Synapse, right? It's Spark. Spark is also part of the Databricks platform. A huge imperative for Snowflake these days is, how do we get data to people that want to use Spark with it?
And so I think we've already started to see these systems decomposing and becoming a collection of projects that all work together rather than one big monolithic system. The question is just, with all the VC investment that you were alluding to, how long are we going to wait until all of those components work together really well, and what needs to change?

Yeah, you bring up some really interesting points. Let's take a look at some of the spending data to put these in context. Ken, if you could bring up the next chart, I want to riff on some of the things that Ryan just talked about. So this is data from ETR, our survey partner, and it focuses on the cloud computing data platforms. It's the data warehouse and database sector. That's why we have the asterisk next to Microsoft. Microsoft is ubiquitous and there's probably a lot of on-prem SQL Server that creeps its way into the survey, but it's a sizable survey of over 1,100. The vertical axis is spending momentum or net score. It's a representation of the net percent of customers in the dataset that are spending more on a particular platform. And the horizontal axis, listed there as overlap, is really a proxy for penetration into the dataset. And that red dotted line at 40% is a high watermark. So you can see all five of these data platforms are above that high watermark. They have strong representation in the market and good momentum. The only one that's relatively small is Databricks, but it's kind of new to this space. We put Oracle on there because Oracle has certainly tried to keep pace with, hey, we've got that too, but you can see it doesn't have nearly the momentum, and it gives the other five context. And so that's one data point.

If you go to the next slide, Ken, this is a slide from the ETS, the Emerging Technology Survey. And it basically looks at market sentiment. So it's net sentiment.
It's not spending momentum; it's an intent to engage with a platform. These are all private companies, and then it's mind-share on the horizontal axis. And you can see we've just picked out a couple of these component players: Fivetran; AtScale, the semantic layer; Starburst, doing some things with data and data mesh; dbt Labs, which is sort of API-fying metrics inside of data warehouses. And you can see that squiggly line is the progress made by Starburst since November of 2020. You would see similar lines with these other players as well. So they're all moving fast, gaining market momentum. They're reasonably well-funded. And we see them as components in this whole picture. So maybe starting again with George and then Ryan, any thoughts on the data that we just shared?

Well, my big question is something Ryan, you brought up before, which is, are we seeing the modularization of previously integrated proprietary data platforms, or are we seeing a bunch of multi-vendor modules emerging? And what are the use cases, if so, that are drawing those multi-vendor components into the marketplace? And starting with Iceberg, where is it being drawn in? And then what use cases and what other modules that are open and multi-vendor do you think might emerge?

And then picking up on that, I mean, just something that you alluded to before, Ryan, you were saying, hey, we're already seeing that sort of modularity. I mean, look at Snowflake. It's got this single integrated system, and they announced last year support for Iceberg. They've improved upon that this year. They announced Unistore, which is still not in general availability, allowing them to bring in transaction data. So to your point, Ryan, that single integrated system, which is often the way that these markets get traction, and then they sort of decompose. But maybe some of your thoughts on this.

Yeah, absolutely.
If you take both Databricks and Snowflake, they want access to all of the data out there. And so they are very much incentivized to add support for projects like Iceberg. Databricks and Snowflake have recently announced support for Iceberg. And that's just from the monolithic vendor angle, right? I don't think anyone would have expected Snowflake to add full write support for a project like Iceberg two years ago. That was pretty surprising. And then if you look at the other vendors, they're able to compete in this space because they're taking all of these projects that they don't own and they're packaging them up as an out-of-the-box data architecture.

One of the critical pieces of this is that we're sharing storage across these projects. And that has, for the data warehouse vendors like Snowflake, the advantage that they can get access to all of the data that's not stored in Snowflake. But I think a more important lens is from the buyer's or the user's perspective, where no one wants siloed data anymore. And they want essentially what Microsoft is talking about with Fabric. They want to be able to access the same datasets through any engine or means, whether that's a Python program on some developer's laptop or a data warehouse at the other end of the spectrum like Snowflake. So it's very important that all of those things can share data. Of course, that's where I'm coming from, so that's what I'm most familiar with. If you go above the layer of those engines, I'm less familiar, but we even see that consolidation with integrations like Fivetran writing directly to Iceberg tables.

Right, and it's interesting because Snowflake initially got some traction with this concept of data sharing. They were early on, but basically saying, bring everything into Snowflake, and it'll be governed and you'll be safe and we'll break down all those silos within our big silo. And then they realized, oh, well, actually not all the data's going to end up here.
So we have to kind of begin to open up. And to your point, the end customer ultimately will be the arbiter of how these shifts occur. Ken, I wonder if we could bring up the key questions on the next slide. We're going to come back to this slide; I'm going to keep asking you to bring it up over the course of this conversation.

So the first one that we wanted to explore with Ryan is, you had Hadoop, which was actually quite atomic, all these piece parts, and it was very service-heavy. You'd bring in some outside experts, or you had some guys in lab coats who knew how to do this stuff. And it just never was able to get traction, a function of the Spark disruption and of course the cloud. And then we moved to this highly integrated Snowflake. And the question we had is, what are the implications for usability, cost and value? The reason we ask is because at the most recent Snowflake Summit, we talked to customers and some of the ecosystem partners who said, you know, we're seeing a lot of our customers doing the data pipeline or the data engineering outside of Snowflake, saying it's too expensive. And we've seen that. Now, we've talked to Snowflake about it, and they've said, well, actually, if you do this inside of Snowpark, you get the full value of the TCO, et cetera. So there's an interesting debate going on. But we've clearly seen in the ETR data that some of the momentum in Snowflake, in terms of the percentage of customers spending more, has come down. But at the same time, there is that value proposition. So what are the implications in your view around that usability, the cost and ultimately the value? I think you believe in an open, modular world. How do you see that playing out?

I do believe in the modular world. I think that the biggest change in databases in the last 10 years, easily, if not longer, is the ability to share storage underneath databases.
And that really came from the Hadoop world, because those of us over there, we weren't wearing lab coats, but we sort of had as an assumption that many different engines, whether they're streaming or batch or ad hoc, all needed to use the same data sets. So that was an assumption when we built Iceberg and other similar formats, and that's what really drove this change to be able to share data.

Now, in terms of cost and usability, that is a huge, huge disruption to the business model of basically every established database vendor that has lived on the Oracle model. And it's not really even the Oracle model; it's just that storage and compute have always been so closely tied together that you couldn't separate them. And so by winning workloads, winning data sets, you also were winning future compute, whether or not you were the best option for it, best option meaning what people knew how to use, best in terms of performance and cost and things like that. And so the shift to sharing data means we can essentially move workloads without forklifting data, without needing to worry about how we keep it in sync across these products, how we secure it, all of those problems that have been inhibitors to moving data to the correct engine, the one that is the most efficient or the most cost-effective, et cetera. So I think that this shift to open storage, and independent storage in particular, is going to drive a lot of that value and, basically, cost efficiency.

Dave, let me follow up on that, because you said two things in there which were critical: the shared storage, which means you'd compete for what's the best compute for that workload. But someone's still got to govern that data. And let me define what I mean by govern: got to make sure that multiple people aren't reading and writing, and I don't mean just a single table, because you might be ingesting into multiple tables. Someone's got to apply permissions and their broader governance policies.
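Ryan's shared-storage point can be sketched in miniature. The Python below is purely illustrative, not the real Iceberg metadata layout or any vendor's API; the `spark_rows` and `trino_rows` names just stand in for two independent engines. The point is that both engines plan an identical scan from one copy of table metadata on shared storage, so moving a workload means pointing a new engine at the metadata, not forklifting data.

```python
# Toy "shared storage": one table's metadata and data files, visible to any
# engine that understands the format. No copies, no sync pipeline.
shared_storage = {
    "warehouse/events/metadata.json": {"snapshot": 3, "files": ["f1", "f2"]},
    "f1": [{"id": 1, "event": "click"}],
    "f2": [{"id": 2, "event": "view"}],
}

def engine_scan(storage, table_path):
    """Any engine that reads the metadata plans the exact same scan."""
    meta = storage[f"{table_path}/metadata.json"]
    rows = []
    for data_file in meta["files"]:
        rows.extend(storage[data_file])
    return rows

# Two independent "engines" sharing one copy of the data:
spark_rows = engine_scan(shared_storage, "warehouse/events")
trino_rows = engine_scan(shared_storage, "warehouse/events")
assert spark_rows == trino_rows  # same data, zero copies
```

Because the table's layout lives in open metadata rather than inside one vendor's engine, "winning the data set" no longer automatically wins all future compute.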
So it's not just enough to say we've standardized the table format. Something else in there has to make sure that everything's, in the non-technical word, copacetic. What are those services that are required to make things copacetic, whether it's transactional support, whether it's permissions or broader governance? Tell us what you think needs to go along with this open data so that everyone can share it.

That's a great question. I think that access controls are one of the biggest blind spots here. So the data lake world, which is essentially what I define as the Hadoop ecosystem after it moved to cloud and started using S3 and other object stores, this is a massive gap for the data lakes and for shared storage in general. Spoiler, or disclaimer, this is what my company Tabular does. We provide this independent data platform that actually secures data no matter what you're using to access it, whether that is that Python process or Starburst Galaxy, for example. So I think that that is a really critical piece.

What we've done is actually taken access controls, which, if you have a tied-together or glued-together storage and query layer, go in the query layer, because the query layer understands the query and can say, oh, you're trying to use this column that you don't have access to. In the more modern sixth platform that you guys are talking about, that has to move down into the storage layer in order to really universally enforce those access controls without syncing between Starburst and Snowflake and Microsoft and Databricks and whatever else you wanna use at the query layer.

Right, so if you bring up the slide again, the question slide. It sounds like you buy the premise that a sixth data platform will emerge. You started getting into the components, obviously Iceberg is one of them, that are gonna enable this vision, and the role that Iceberg plays.
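The move Ryan describes, pushing access controls down into a catalog and storage layer so every engine hits the same policy, can be sketched with a toy catalog. Everything here (the `SecuredCatalog` class, the shape of the grants) is a hypothetical illustration, not Tabular's actual API:

```python
# Toy sketch: column-level access control enforced at the catalog/storage
# layer, so Spark, Trino, and a laptop Python script all see the same policy
# without syncing rules between query engines.

class SecuredCatalog:
    def __init__(self):
        self.tables = {"sales": [{"region": "EU", "revenue": 100, "ssn": "x"}]}
        # analyst may read region and revenue, but was never granted ssn:
        self.grants = {"analyst": {"sales": {"region", "revenue"}}}

    def scan(self, role, table):
        allowed = self.grants.get(role, {}).get(table)
        if not allowed:
            raise PermissionError(f"{role} cannot read {table}")
        # Project away columns the role has no grant for:
        return [{k: v for k, v in row.items() if k in allowed}
                for row in self.tables[table]]

catalog = SecuredCatalog()
rows = catalog.scan("analyst", "sales")
assert "ssn" not in rows[0]  # masked uniformly, whatever engine asked
```

Because the check lives below the engines, there is one policy to audit instead of one per query layer.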
It sounds like you're sort of aligned with that. There maybe is some fuzziness as to how that all plays out, but this is something that George and I definitely wanna explore.

Yeah, I think that the sixth data platform is as good a name as you can come up with, right? I don't think that we know what it's gonna be called quite yet. I don't know that I would consider it distinct, because I think what's gonna happen is all five of those players that you were talking about, plus Oracle, plus IBM and others, are going to standardize on certain components within this architecture. Certainly shared storage is going to be one. And I believe that Iceberg is probably that shared storage format, because it seems to have the broadest adoption. If Databricks and Snowflake can agree on something, then that's probably the de facto standard, and I think that that is Iceberg. We are still going to see what we can all agree on as we build more and more of those standards. In the Iceberg community, we're working on shared things like views, standardized encryption, ways of interacting with a catalog no matter who owns or runs that catalog. And I think those components are going to be really critical in the sixth data platform, because it's going to be an amalgam of all of those players sharing data and being part of the exact same architecture.

And George, you and I sort of debate, or at least discuss, quite frequently, like I always say, what about Oracle? And you're like, yeah, okay. And to Ryan's point, the existing five are clearly evolving. You've seen Oracle. I mean, somebody announces something and then Larry will announce it and act like Oracle invented it. And so they've maintained, at least they're spending on R&D and maintaining relevance, which is a good thing. But I think, George, the thing that we point to is the real-time nature of that Uber for all vision, which we feel many of these platforms are not in a position to capture.
And that's maybe a little more forward-thinking, or years out. But the idea that, and we've talked about things like the expressiveness of graph and knowledge databases with the query flexibility of SQL, or we've seen some of the things that Fauna is doing by being able to cross multiple documents and join across multiple documents in real time. Those are some of the things that we're thinking about, and why we feel that some of the five are going to be challenged. But George, I'd love for you to pick up on that and maybe follow up.

You're talking, Dave, now closer to that real world of intelligent data apps that we were describing, like Uber for all, where you've got digital twins fed in real time. And let me map that back down to Ryan and where we are today. Do you see a path, Ryan, toward adding enough transactional consistency, or a transactional model, that you can take in real-time data that might be ingested across multiple tables, and you understand windows, but at the same time you can feed a real-time query that's looking at historical context and the real-time window? And I guess what I'm asking is how far up this stack you're going. And the other part is, again, related to who's keeping track of data integrity and essentially access integrity. Where are the boundaries? Are there boundaries? Is it going to be all in the storage manager? Help enlighten us as to essentially the integrity, the real-time nature, and then how you start mapping that to higher-level analytics and digital representations, digital twins. That's a big question, but help us unwrap it.

Yeah, so I think you brought up a couple of different areas and I'll address those somewhat separately. So first of all, in terms of transactions, that's one of the things that the formats, or the open data formats, and I'm including just Delta and Iceberg in that, handle.
What they do is essentially have a transaction protocol that allows you to safely modify data without knowing about any of the other people trying to modify or read data. They're the two formats that do that and are open source. So that, I think, is a solved problem. Now, the issue that you then have is they do that by writing everything to an object store and cutting a new version, which is inherently a batch process. And that's where you start having this mismatch between modern streaming, which is a micro-batch streaming operation, and efficiency. Because you need to commit to the table at each point in time, and every single commit incurs more work. So in order to get toward real time, you're simply doing more work and you're also adding more complexity to that process. So I think that essentially you're seeing cost rise, at least linearly if not exponentially, as you get closer and closer to real time. I think that the basic economics of that make it infeasible in the long term to really make real time something that you're gonna use 100% of the time.

And this is the age-old trade-off between latency and throughput. If you want lower latency, you have lower throughput. If you're willing to take higher latency, you have higher throughput and thus better efficiency. So I think that where we need that streaming and the sort of constantly fed data applications, those are going to get easier to build, certainly. But I think that behind those applications is probably where you're going to store data and make it durable for this sixth data platform. And it'll be interesting to see the interplay between those real-time streaming applications and how we merge that data with data from the lake or warehouse.

And George, when we had Uber on, they were describing how they attack that trade-off. Now, that's a unique application: riders, drivers, ETAs, destinations, prices, et cetera.
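The transaction protocol and per-commit overhead Ryan describes can be sketched as an optimistic compare-and-swap on a table's metadata version. This is a drastic simplification of what Iceberg and Delta actually specify, but it shows the mechanics: a writer prepares new files, then atomically swaps the version pointer, retrying if someone else committed first, and every commit pays a fixed cost, which is the latency/throughput trade-off above.

```python
# Toy sketch of snapshot-based optimistic commits (not the real Iceberg spec).

class TableMetadata:
    def __init__(self):
        self.version = 0
        self.files = []

    def commit(self, expected_version, new_files):
        # Atomic compare-and-swap on the current-version pointer.
        if expected_version != self.version:
            return False  # someone else committed first; caller retries
        self.files = self.files + new_files
        self.version += 1
        return True

def append(table, new_files):
    while True:  # optimistic retry loop: read base, attempt swap
        base = table.version
        if table.commit(base, new_files):
            return table.version

table = TableMetadata()
append(table, ["batch-000.parquet"])
append(table, ["batch-001.parquet"])
assert table.version == 2
# Each commit cuts a whole new version: pushing commit frequency toward
# sub-second "real time" multiplies this fixed overhead per unit of data.
```

Nothing here coordinates writers up front; safety comes entirely from the atomic swap, which is why shared object storage is enough.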
But you could see potential in Industry 4.0 applications and IoT. But George, your premise has always been that it's gonna apply to mainstream businesses. Ken, if you bring that slide back up, the question slide. George and I love to go right to the dessert. I think we've hit on all of these, Ryan. What other building blocks beyond Iceberg and Tabular are going to enable choice and modularity in this world? What are the implications for today's modern data platforms? And we're talking about how applications will evolve so we can support this physical-world model. I think I'm hearing from Ryan, George, that that's somewhat limited, at least today, in terms of our visibility and scope; the activity is gonna be really along these analytic applications, but you're seeing a lot of analytic and transaction activity coming together. So, George, pick it up from there.

Let me follow up on that, Dave, which is, if I parse, Ryan, what you were saying, you get only so far trying to approach real-time efficiency for, say, ingest. What I think I'm inferring from what you're saying is, if you want a real-time system of truth, you're gonna use an operational database. If you want a historical system of truth that's being continuously hydrated, that might be a separate analytic system. Am I understanding that right? Is that how you see the platform evolving, where you're gonna have a separate operational database and a separate historical system of truth, to borrow a term from Bob Muglia?

So, yes, I think I would subscribe to that view, at least in the short term. The way that we are tackling the challenge of tearing down the data silos between these data platforms, the major players that you had up on the earlier slide, the way that we're doing that is to essentially trade off some of the machinery that would go to support those real-time, like sub-second latency, applications.
So I don't think that we're going to approach merging those two worlds anytime soon. However, I'm a big believer in usability, and it's not that we need to merge those two worlds; it's that it needs to appear like we have, right? We can have separate systems that make a single use case work across that boundary. Netflix did this, I think, classically with logs coming from our applications. We had telemetry and logs coming from the runtime applications powering the Netflix product worldwide. We needed access to that information with millisecond latencies. Iceberg was not a runtime system that provided that; we kept the fresh data in memory, because that was what made the most sense, but for historical applications, we absolutely used Iceberg. And to a user, that trade-off was seamless. So you would go and query logs from your running application and you'd get logs that are fresh to the millisecond, but you could also go back in time two years. Things like that, those are the applications that are coming.

How did you provide that seamless simplicity to the developer so that they didn't know they were going against two different databases?

By building a data app, which was a ton of work, that understood both of those data storage domains. So the app, or at least its backend, was responsible for receiving all of the logs in near real time, for storing them, and for managing the handoff between in-memory and Iceberg.

And George, again to use the Uber example, and correct me if I'm wrong, but they're essentially using Google Spanner, this globally distributed, strictly consistent database, but then they've got, I think they used the example that, hey, we don't update the pricing in real time. And of course, we can go back and look at historical data as well, which is in a separate data store. I forget what they were using, BigQuery or maybe it was some Postgres hack, I can't remember. But George, is that not a similar analogy?

It's a perfect analogy.
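The Netflix pattern Ryan describes, one query facade over a millisecond-fresh in-memory window plus a durable historical table, can be sketched like this. All names and data below are made up for illustration; the point is only that the handoff between the two stores is the app's job, not the user's.

```python
import datetime as dt

# Millisecond-fresh records held in memory by the serving layer:
recent_buffer = [
    {"ts": dt.datetime(2023, 7, 1, 12, 0, 0), "msg": "warn"},
]
# Older records durably stored in the table format (Iceberg, in Ryan's case):
historical_table = [
    {"ts": dt.datetime(2023, 6, 30, 9, 0, 0), "msg": "boot"},
]

def query_logs(since):
    """One API over two stores: callers never see the seam between them."""
    old = [r for r in historical_table if r["ts"] >= since]
    new = [r for r in recent_buffer if r["ts"] >= since]
    return sorted(old + new, key=lambda r: r["ts"])

rows = query_logs(dt.datetime(2023, 6, 1))
assert [r["msg"] for r in rows] == ["boot", "warn"]  # seamless to the caller
```

A production version also has to manage the handoff, flushing the buffer into the table and deduplicating across the boundary, which is the "ton of work" Ryan mentions.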
But a follow-up question for Ryan, which is, I just wanna understand the separation of what has to go down into storage, like permissions and maybe the technical metadata that I guess is stored as part of Iceberg: what the tables are, what columns are there, which tables are connected, whether they're in one repository, I don't even know if it's called a database. And then does that interoperate with a higher-level, open set of operational catalogs, like a Unity, or what's inside Snowflake today? In other words, is there a core set of governing metadata that's associated with the storage layer, and then everything else, all the operational metadata above that, is interoperable with these open, modular compute engines?

So yeah, in terms of Iceberg, we do make a distinction between the technical metadata required to read and write the format properly and higher-level metadata. And that higher-level metadata we think of as business operations info, even things like access controls and RBAC policy and all of that. That's a higher level that we don't include in even the Iceberg open source project. The technical metadata is quite enough to manage for an open source project. We need to build higher-level structures on top of that.

The reason I ask, Ryan, this is critical, is because you identified how the business model of all the data management vendors changes when data is open and they don't get to attach all workloads because they manage that data. Now they have to compete for each workload on price and performance. So my question is, and I go back to this question about who keeps the data copacetic, the more metadata you have, essentially, that governs all that data, it sounds like that's the new gateway. Is that a fair characterization, or is there no new gateway?

I think that it is absolutely critical to have a plan around that metadata. We don't really know how far we're going to go down that road.
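The split Ryan draws between technical metadata and higher-level governing metadata might be pictured like this. The field names below are hypothetical, not the Iceberg spec or any catalog's schema; the sketch just shows the two layers and why an engine consults the governing layer before using the technical layer to plan a scan.

```python
# Illustrative two-layer metadata split (field names are made up).

technical_metadata = {       # lives with the open table format itself
    "schema": {"id": "long", "event": "string"},
    "current_snapshot": 42,
    "data_files": ["s3://bucket/events/f1.parquet"],
}

governing_metadata = {       # lives in the catalog/storage service above it
    "owner": "growth-team",
    "classification": "internal",
    "rbac": {"analyst": ["SELECT"]},
}

def can_select(role):
    # The "gateway": an engine checks the governing layer before it ever
    # touches the technical layer to plan a read.
    return "SELECT" in governing_metadata["rbac"].get(role, [])

assert can_select("analyst")
assert not can_select("intern")
```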
So I think that today there's a very good argument that access controls and certain governance operations need to move into a catalog and storage layer that is uniform across all of those different methods of access and compute providers. What else goes into that, I think, is something that we're going to see. I think that that is by far the biggest blocker that I see across customers that I talk to. Everyone already has Databricks and Snowflake and possibly some AWS or Microsoft services in their architecture. And they're wondering, how do I stop copying data? How do I stop syncing policy between these systems? I know that we need to solve that problem today. But the higher-level stuff, because usually, like, we work with a financial regulator that has their own method of tracking policy and translates that into the RBAC access controls underneath, how you manage that policy may be organization-specific; it might be something that evolves over time. I think we're just at the start of this market, where people are starting to think about data holistically across their entire enterprise, including the transactional systems, how that data flows into the analytics systems, how we secure it and have the same set of users and roles and potentially access controls that follow that data around. This is a really big problem. And I wish I had a crystal ball, but I just know that the next step is to centralize and have fewer permission systems. And just one that maybe covers 95% of your analytic data is going to be a major step forward.

And you're definitely seeing, as you said earlier, Ryan, signs of this, where you see at least most of the top five, if not all, beginning to adopt these modular, open stances. You see things like, it's sort of brute force, but look at what Oracle has done with MySQL HeatWave, bringing transactions and analytics together with monster memory. Ken, bring back the questions if you would. I mean, I think, George, we hit on all of these.
I don't know if you had any final questions for Ryan before we wrap. I want you to lay out, George, your vision of what this modularity looks like, and then give Ryan the last word here.

Well, I guess my follow-up question for Ryan would be, now that we're prying open some of these existing data platforms, they had modularity, but generally they were working with their own modules. And you're providing open storage that can work across everyone, so that we have a data-centric world instead of a data-platform-centric world. And my question would be, and you really helped crystallize it, where everyone's now got to compete on workloads: what are the workloads that are first moving to a modular world, where people are saying, let me choose a more appropriate price or performance point for data that I can maintain in an open format?

I think right now, companies have a different problem. It's not like they're looking at this and saying, ooh, I want to move this workload. Or maybe they are, but it depends on where they're coming from. If you already have Spark and you need a much better data warehouse option, then you might be adding Snowflake. You might also be coming from Snowflake and going toward ML in Spark. Those sorts of things, I can't really summarize. I would say that the biggest thing that we see in large organizations is that you have these pockets of people: this group really likes Spark, this group is perfectly happy running on Redshift, someone else needed the performance of Snowflake, and the CIO level is looking at this as, what is our data story? How do we fix this problem of needing to sync data into five different places? I was talking to someone at our company who used to work in ad tech just this morning, and he said that they had five or six different copies of the exact same dataset, where different people would go to the same vendor, buy the dataset, massage it slightly differently to meet their own internal model, and then use it.
And it's those sorts of problems. It's like, let's just store it once. Let's store all of this data once and make it universally accessible. We don't have to worry about copies. We don't have to worry about silos. We don't have to worry about copying governance policy, and is it leaking because Spark has no access controls while I've locked everything down in, say, Snowflake or Starburst? It's just a mess. And so the first thing that people are doing is trying to get a handle on it and say, we know we need to consolidate this. We know we need to share data across all of these things. And thankfully, we can these days. Ten years ago, the choice was, we can share data, but we have to use Hadoop and it's unsafe and unreliable and very hard to use; or we can continue having Redshift and Snowflake and Netezza in our architecture. So we're just now moving to where it's possible, and we're discovering a lot along the way.

Guys, first of all, I have to congratulate you: we're 39 minutes in and we haven't said AI. So well done. I wonder, Ken, if you could bring up the last slide. George, I'd like you to unpack this, explain what we're looking at here, what your vision is. And I want to get Ryan to comment, and then we'll wrap.

Okay. And I would say this is not really my vision. I think this is my attempt to illustrate what we see unfolding in the market, which Ryan is helping us articulate: we start on the left, where we had a DBMS- or data-platform-centric view of the world, where it might be Redshift or Snowflake, where the state of the art of technology required us to integrate all the pieces to provide the simplicity and performance that customers needed. But as technology matures, we can start to modularize things, and the first thing we're modularizing, actually, is storage. Now, it means we can open up and offer a standardized interface to tables, whether it's Iceberg or Delta tables or Hudi.
And what Ryan is helping us articulate is that the permissions have to go along with that, and there's some amount of transaction support in there. And then we're sort of taking apart the components that were in an integrated DBMS. Now, it doesn't mean you're gonna necessarily get all the components from different vendors, but let's just go through them. There's like a control plane that orchestrates the work. Today, we know these as dbt or Dagster or Airflow or Prefect. There's the query optimizer, which is the hardest thing to do, where you can just say what you want and it figures out how to get it. And that is also part of Snowflake; it's in BigQuery and Azure Synapse Fabric, or Databricks built their own Databricks SQL, which was separate from the Spark execution engine. The execution engine is an SDK, sort of a non-SQL way of getting at the data. That was the original Spark engine. Snowflake has an API now, but I think it goes through their query optimizer. And there's the metadata layer, which is beyond just the technical metadata. And this is what we were talking about with Ryan, which is like, how do you essentially describe your data estate and all the information you need to know about how it hangs together? And that's like AtScale with their metric semantic layer. There's Atlan, and Alation and Collibra, which are sort of user and administrator catalogs. But the point of all this is to say that we're beginning to unbundle what was once in the DBMS, just the way Snowflake unbundled what was once Oracle, which had compute and storage together. They separated compute and storage, and now we're separating compute into multiple things, to Ryan's point, so that we can use potentially different tools for different workloads, and that they all work on one shared data estate, so that the data is not embedded in and trapped inside one platform or engine. That's the question: how are we getting there? Do we have the components right?
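The "transaction support" that makes an open storage layer safe to share can be sketched in a few lines. This is a simplified illustration of the snapshot-plus-pointer idea that table formats like Iceberg build on, not Iceberg's actual metadata classes: the table's history is an immutable list of snapshots, and a commit succeeds only if it atomically swaps the current-snapshot pointer from the state the writer expected.

```python
# Simplified illustration of snapshot-based table metadata (hypothetical
# classes, not Iceberg's real implementation). Independent engines can
# share one table because commits use optimistic concurrency on a single
# current-snapshot pointer.

class TableMetadata:
    def __init__(self):
        self.snapshots = []           # immutable history of table states
        self.current_snapshot_id = None

    def commit(self, expected_snapshot_id, data_files):
        # The commit only succeeds if no other writer has moved the
        # pointer since this writer planned its change; otherwise retry.
        if self.current_snapshot_id != expected_snapshot_id:
            raise RuntimeError("concurrent commit detected; retry")
        snapshot_id = len(self.snapshots) + 1
        self.snapshots.append({"id": snapshot_id, "files": list(data_files)})
        self.current_snapshot_id = snapshot_id
        return snapshot_id


table = TableMetadata()
first = table.commit(None, ["file-a.parquet"])
second = table.commit(first, ["file-a.parquet", "file-b.parquet"])

# A writer holding a stale pointer fails instead of silently clobbering
# another engine's commit.
try:
    table.commit(first, ["file-c.parquet"])
except RuntimeError:
    pass
```

This is the property that lets the compute pieces above be unbundled: any engine that follows the commit protocol can write to the same table without corrupting it.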
What is that world going to look like when we get there? It has big implications for the products customers buy and for the business model of the vendors selling them. So Ryan, I mean, I feel like George's picture, thank you George for sharing that, you know, maps pretty well to our conversation, but anything you'd add or any final thoughts that you want to bring forth? So just the high-level modularity versus simplicity, I think we absolutely need both, right? The modularity is clearly being demanded. The simplicity always follows afterwards. You know, databases power applications, so there's always been a gap in code and who controls what and these things. We're just adding layers. You know, we've done pretty well at knowing the boundaries between a database and an application on top of it and making that a smooth process. I think that we're doing that again, you know, separating that storage and compute layer, but we absolutely need both. And this is where some of the newer things that we've been doing in the Apache Iceberg community come into play: standardizing how we talk with catalogs, making it possible to actually secure that layer and say, hey, this is how you pass identity across that boundary. We're also, you know, moving database services for maintenance from the query layer into the storage layer; that's another thing that's moving. That modularity needs to be followed by the simplicity. We're pioneering a way to use OAuth 2 to connect the query engine to our storage layer, so that you can just click through and have an administrator say, yes, I wanna give access to this data warehouse to Starburst or something like that. That ease of use, I think, is really the only thing that is going to make modularity happen, because, I mean, again, the big failure of the Hadoop ecosystem was that simplicity and that usability.
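The OAuth 2 handshake Ryan describes for connecting a query engine to a catalog/storage layer has a standard shape. The conversation doesn't spell out Tabular's exact endpoints or scopes, so the URL, client ID, and scope strings below are hypothetical; the request itself follows the generic OAuth 2 client-credentials grant (RFC 6749, section 4.4).

```python
# Generic OAuth 2 client-credentials exchange of the kind described for
# engine-to-catalog authentication. Endpoint, client ID, secret, and
# scopes are hypothetical placeholders.
from urllib.parse import urlencode


def build_token_request(token_url, client_id, client_secret, scope):
    """Return the POST target, headers, and form body for a
    client-credentials grant."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return token_url, headers, body


url, headers, body = build_token_request(
    "https://catalog.example.com/oauth/tokens",  # hypothetical endpoint
    client_id="starburst-engine",
    client_secret="s3cr3t",
    scope="catalog:read catalog:write",
)
# The engine would then attach the bearer token from the JSON response to
# every catalog request: Authorization: Bearer <access_token>.
```

The administrator's "click through and grant access" step corresponds to issuing the engine its client credentials and scoping what they can reach; after that, identity flows across the engine/storage boundary without per-table password plumbing.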
We've only been able to see the benefits of that by layering on and maturing those products to add the simplicity. And so that's absolutely a part of where we're going. Excellent. Well, Ryan, I really wanna thank you for coming on the program and sharing your perspectives, and wish you all the best with Tabular. Thank you. It was a lot of fun. I appreciate you guys having me on. You bet. And of course, thanks to my colleague George Gilbert, and Alex Myerson, who's on production and manages the podcast, and Ken Schiffman, who's flying solo today. Good job, Ken, with the slides. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters, and Rob Hof is our EIC over at SiliconANGLE. Thank you for all the good editing. And remember, all these episodes are available as podcasts. All you got to do is search Breaking Analysis Podcast. We publish each week on wikibon.com and siliconangle.com, and you can email me directly, david.vellante at siliconangle.com, or DM me at dvellante if you want to get in touch. Comment on our LinkedIn posts and please check out etr.ai. Get some great survey data. They're accelerating their survey plans, so definitely check that out. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching and we'll see you next time on Breaking Analysis.