The stampede to cloud and massive VC investments have led to the emergence of a new generation of object store-based data lakes. And with them, two important trends, actually three important trends. First, a new category that combines data lakes and data warehouses, AKA the lakehouse, has emerged as a leading contender to be the data platform of the future. And this novelty touts the ability to address data engineering, data science and data warehouse workloads on a single shared data platform. The second major trend we've seen is that query engines and broader data fabric virtualization platforms have embraced next-gen data lakes as platforms for SQL-centric business intelligence workloads, reducing or, some even claim, eliminating the need for separate data warehouses, pretty bold. However, cloud data warehouses have added complementary technologies to bridge the gaps with lakehouses. And the third is that many, if not most, customers embracing the so-called data fabric or data mesh architectures are looking at data lakes as a fundamental component of their strategies, and they're trying to evolve them to be more capable, hence the interest in the lakehouse. But at the same time, they don't want to, or can't, abandon their data warehouse estates. As such, we see a battle royale brewing between cloud data warehouses and cloud lakehouses. Is it possible to do it all with one cloud-centric analytical data platform? Well, we're going to find out. My name is Dave Vellante, and welcome to the Data Platforms Power Panel on theCUBE, our next episode in a series where we gather some of the industry's top analysts to talk about one of our favorite topics, data. In today's session, we'll discuss trends, emerging options and the trade-offs of various approaches, and we'll name names. Joining us today are Sanjeev Mohan, who's the principal at SanjMo; Tony Baer, principal at dbInsight; and Doug Henschen, vice president and principal analyst at Constellation Research.
Guys, welcome back to theCUBE. Great to see you again. Thank you, thank you, thank you. It's early June and we're gearing up for two major conferences, several database conferences, but two in particular that we're very interested in, Snowflake Summit and Databricks Data and AI Summit. Doug, let's start off with you, and then Tony and Sanjeev, if you could kindly weigh in. You know, where did this all start, Doug, the notion of the lakehouse? And let's talk about what exactly we mean by lakehouse.

Well, you nailed it in your intro, you know, one platform to address BI, data science, data engineering, fewer platforms, less cost, less complexity, very compelling. You can credit Databricks for coining the term lakehouse back in 2020, but it's really a much older idea. You can go back to Cloudera introducing their Impala database in 2012. That was a database on top of Hadoop, and indeed, by the middle of that last decade, there were several SQL-on-Hadoop products and open standards like Apache Drill. And at the same time, the database vendors were trying to respond to this interest in machine learning and data science, so the likes of Vertica were adding extensions to support data science. But then later in that decade, with the shift to cloud and object storage, you saw the vendors shift to this whole cloud and object storage idea. So in the database camp, you have Snowflake introducing Snowpark to try to address the data science needs. They introduced that in 2020, and last year they announced support for Python. You also had Oracle and SAP jump on this lakehouse idea last year, supporting both the lake and the warehouse from a single vendor, though not necessarily on a single platform. Google very recently also jumped on the bandwagon.
And then you also mentioned, you know, the SQL engine camp, the Dremios, the Ahanas, the Starbursts, really doing two things: a fabric for distributed access to many data sources, but also very firmly planting that idea that you can just have the lake, and we'll help you do the BI workloads on it. And then of course the data lake camp, the Databricks and Clouderas, providing warehouse-style deployments on top of their lake platforms.

Okay, thanks Doug. Now I'd be remiss, those of you who know me know that I typically write my own intros. This time my colleagues fed me a lot of that material, so thank you, you guys make it easy. But Tony, give us your thoughts on this intro.

Right. Well, I very much agree with both of you, which may not make for the most exciting television, in that it has been an evolution, just like Doug said. I mean, just to give an example, when Teradata bought Aster Data, it was initially seen as a hardware platform play. In the end, it was really all those Aster functions that made a lot of big data analytics accessible to SQL folks. And so what I really see, just as a simpler, functional definition, is that the data lakehouse is really an attempt by the data lake folks to make the data lake friendlier territory for the SQL folks, and also friendlier territory for all the data stewards who are concerned about the sprawl and the lack of control and governance in the data lake. So it's really a continuation of an ongoing trend. That being said, there's no action without counteraction, and of course, at the other end of the spectrum, we also see a lot of the data warehouses starting to add things like in-database machine learning. So they're certainly not surrendering without a fight.
Again, as Doug was mentioning, this has been part of a continual blending of platforms that we've seen over the years, which we first saw in the Hadoop years with SQL on Hadoop and data warehouses starting to reach out to cloud storage, or I should say HDFS, and then with the cloud, going cloud native and trying to break the silos down even further.

Yeah, thank you. And Sanjeev, data lakes, when we first heard about them, it was such a compelling name, and then we realized all the problems associated with them. So pick it up from there. What would you add to Doug and Tony?

I would say these are excellent points that Doug and Tony have brought to light. The concept of the lakehouse was going on, to your point, Dave, a long time ago, long before the term was invented. For example, Uber was trying to do a mix of Hadoop and Vertica, because what they really needed were transactional capabilities that Hadoop did not have. So they weren't calling it a lakehouse, they were using multiple technologies, but now they're able to collapse it into a single data store that we call the lakehouse. Data lakes are excellent at batch processing large volumes of data, but they don't have the real-time capabilities such as change data capture, inserts and updates. So this is why the lakehouse has become so important, because it gives us these transactional capabilities.

Great, okay. So I'm interested. You know, the name is great, lakehouse, and the concept is powerful, but I get concerned that there's a lot of marketing hype behind it. So I want to examine that a bit deeper. How mature is the concept of the lakehouse? Are there practical examples that really exist in the real world that are driving business results for practitioners? Tony, maybe you could kick that off.

Well, let me put it this way. I think what's interesting is that both data lakes and data warehouses have each had to extend themselves.
So it's not like, you know, to believe the Databricks hype, this was just a natural extension of the data lake. In point of fact, Databricks had to go outside its core technology of Spark to make the lakehouse possible. And it's a very similar type of thing on the part of the data warehouses: they've had to go beyond SQL. In the case of Databricks, okay, there have been a number of incremental improvements to Delta Lake, you know, to basically make the table format more performant, for instance. But I think the most dramatic change in all that is in their SQL engine. They had to essentially abandon Spark SQL, because Spark SQL, in and of itself, is essentially a stopgap solution, and if they wanted to really address that crowd, they had to totally reinvent SQL, or at least their SQL engine. And so Databricks SQL is not Spark SQL. It is not Spark. It's basically SQL that is adapted to run in a Spark environment, but the underlying engine is C++. It's not Scala or anything like that. So Databricks had to take a major detour outside of its core platform to do this. So to answer your question, this is not mature, because even though the idea of blending platforms has been going on for well over a decade, I would say that the current iteration is still fairly immature. And in the cloud, I actually see a further evolution of this, because if you think through cloud-native architecture, where you're essentially abstracting compute from data, there is no reason why, if you're dealing with the same data target, say cloud object storage, you might not apportion the tasks to different compute engines.
And so therefore you could have, for instance, let's say you're Google, you could have BigQuery perform the SQL analytics that would be associated with the data warehouse, and you could have BigQuery ML do some in-database machine learning. But at the same time, for another part of the query, which might involve, let's say, some deep learning, just for example, you might go out to the serverless Spark service or Dataproc. And there's no reason why Google could not blend all those into a coherent offering that's basically all triggered through microservices. And I just gave Google as an example; you could generalize that to all the other third-party vendors. So I think we're still very early in the game in terms of the maturity of data lakehouses.

Thanks, Tony. So Sanjeev, is this all hype? What are your thoughts?

It's not hype, but I completely agree it's not mature yet. Lakehouses still have a lot of work to do. So what I'm now starting to see is that the world is dividing into two camps. On one hand, there are people who don't want to deal with the operational aspects of vast amounts of data. They are the ones who are going for BigQuery, Redshift, Snowflake, Synapse and so on, because they want the platform to handle all the data modeling, access control, performance enhancements. But there's a trade-off: if you go with these platforms, then you are giving up on vendor neutrality. On the other side are those who have engineering skills and want independence. In other words, they don't want vendor lock-in. They want to transform their data for any number of use cases, especially data science and machine learning use cases. What they want is agility via open file formats, using any compute engine. So why do I say lakehouses are not mature?
Well, cloud data warehouses provide you an excellent user experience. That is the main reason why Snowflake took off. If you have thousands of tables, it takes minutes to get them uploaded into your warehouse and start experimenting. Table formats are resonating far more with the community than file formats. But once the cost of the cloud data warehouse goes up, then organizations start exploring lakehouses. The problem is, lakehouses still need to do a lot of work on metadata. Apache Hive was a fantastic first attempt at it. Even today, Apache Hive is still very strong, but it's all technical metadata, and it has so many restrictions. That's why we see Databricks investing in something like Unity Catalog. Hopefully we'll hear more about Unity Catalog at the end of the month. But there's a second problem I just want to mention, and that is the lack of standards. All these open source vendors, they're running what I call ego projects. You see on LinkedIn, they're constantly battling with each other. But the end user doesn't care. The end user wants a problem to be solved. They want to use Trino, Dremio, Spark from EMR or Databricks, Ahana, Dask, Flink, Athena. But the problem is that we don't have common standards.

Right, thanks. So, Doug, I worry sometimes. I mean, I look at this space we've debated for years, best of breed versus the full suite. You see AWS with whatever, 12-plus different data stores and different APIs and primitives. You've got Oracle putting everything into its database, and it's actually done some interesting things with MySQL HeatWave, so maybe there are proof points there. But Snowflake is really good at data warehousing, simplifying the data warehouse. Databricks is really good at making lakehouses actually more functional. Can one platform do it all?

Well, in a word, no. You can't be best of breed at all things.
I think the upshot of that cogent analysis from Sanjeev is that the vendors coming out of the database tradition excel at SQL and are extending into data science, but when it comes to unstructured data, data science, ML and AI, it's often a compromise. The data lake crowd, the Databricks and such, have struggled to completely displace the data warehouse. When it really gets to the tough SLAs, they acknowledge that there's still a role for the warehouse. Maybe you can size down the warehouse and offload some of the BI workloads, and maybe some of the SQL engines are good for ad hoc work and minimizing data movement. But really, when you get to the deep service-level requirements, the high-concurrency, high-query workloads, you end up creating something that's warehouse-like.

Mm-hmm. Where do you guys think this market is headed? You know, what's going to take hold? Which projects are going to fade away? You've got some things in Apache projects like Hudi and Iceberg. Where do they fit, Sanjeev? Do you have any thoughts on that?

So thank you, Dave. I feel that table formats are starting to mature. There is a lot of work being done. We will not have a single product, a single platform; we'll have a mixture. So I see a lot of Apache Iceberg in the news. Apache Iceberg is really innovating, and their focus is on the table format. But then Delta and Apache Hudi are doing a lot of deep engineering work. For example, how do you handle high concurrency when there are multiple writes going on? Do you version your Parquet files, and how do you do your upserts? So there are different areas of focus. At the end of the day, the end user will decide what is the right platform, but we are going to have multiple formats living with us for a long time.

Doug, is Iceberg, in your view, something that's going to address some of those gaps in standards that Sanjeev was talking about earlier?

Yeah. Delta Lake, Hudi and Iceberg all address this need for consistency and scalability.
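As a rough illustration of the versioned-file upserts and concurrent writes Sanjeev describes, here is a minimal, hypothetical sketch in Python. It is not any real table-format API; `TinyTable` and its dict-of-dicts "files" simply stand in for the copy-on-write, snapshot-plus-atomic-pointer mechanic that formats like Delta, Hudi and Iceberg build over immutable Parquet files.

```python
# Hypothetical sketch (not Delta/Hudi/Iceberg code): copy-on-write upserts.
# Each commit produces a new immutable snapshot and atomically bumps a
# version pointer, so readers always see a consistent table version even
# while writers are working; conflicting writers fail and must retry.

class TinyTable:
    def __init__(self):
        self.versions = {0: {}}   # version -> {key: row}; stand-in for Parquet file sets
        self.current = 0          # atomic pointer that readers follow

    def upsert(self, rows, expected_version):
        # Optimistic concurrency control: if another writer committed first,
        # this write conflicts and must be retried against the new snapshot.
        if expected_version != self.current:
            raise RuntimeError("conflict: retry against the latest snapshot")
        snapshot = dict(self.versions[self.current])  # copy, never mutate in place
        snapshot.update(rows)                          # apply inserts/updates
        self.current += 1
        self.versions[self.current] = snapshot

    def read(self):
        return self.versions[self.current]

t = TinyTable()
t.upsert({"u1": {"email": "a@x.com"}}, expected_version=0)
t.upsert({"u1": {"email": "b@x.com"}, "u2": {"email": "c@x.com"}}, expected_version=1)
print(t.read()["u1"]["email"])  # the reader sees the latest committed value
```

Older snapshots stay readable (the "time travel" these formats advertise), and the conflict check is what lets multiple writers share one table without corrupting it.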
Delta Lake is technically open, but in practice you don't hear about Delta Lake anywhere but Databricks, whereas you're hearing a lot of buzz about Apache Iceberg. End users want an open, performant standard. And most recently, Google embraced Iceberg for its recent BigLake announcement, thereby supporting both lakes and warehouses on one conjoined platform.

And Tony, of course, you remember the early days of the sort of big data movement. You had MapR as the most closed, you had Hortonworks as the most open, and you had Cloudera in between. There was always this kind of contest as to who was the most open. Does that matter? Are we going to see a repeat of that here?

I think it's spheres of influence, and Doug very much was referring to this. I would call it the MongoDB syndrome, which is that you have, and I'm talking about MongoDB before they changed their license, an open source project, but one very much associated with MongoDB, which pretty much controlled most of the contributions and the decisions. I think Databricks has the same ironclad hold on Delta Lake, and the market still pretty much regards Delta Lake as the Databricks open source project. Iceberg is probably further advanced than Hudi in terms of mind share. And so what I see this breaking down to is essentially the Databricks open source versus the community open source, everything else. So I see a very similar type of breakdown repeating itself here.

So by the way, Mongo has a conference next week. Another data platform, not really relevant to this discussion, except in a sense it is, because there's been a lot of discussion on earnings calls these last couple of weeks about consumption and who's exposed. Obviously people are concerned about Snowflake's consumption model. Mongo is maybe less exposed because Atlas is prominent in the portfolio, blah, blah, blah.
But I wanted to bring up a little bit of controversy that came out of the Snowflake earnings call, where an Evercore analyst asked Frank Slootman about discretionary spend, and Frank basically said, look, we are not discretionary, we are deeply operationalized, whereas he kind of pooh-poohed the lakehouse, or the data lake, et cetera, saying, oh yeah, data scientists pull files out and play with them, that's really not our business. Do any of you have comments on that? Help us unpack that controversy. Who wants to take that one?

Let's put it this way: the SQL folks are from Venus and the data scientists are from Mars. It really comes down to that type of perception. The fact is that traditionally, analytics was very SQL-oriented, and the quants were kind of off in their corner doing their own thing, whether they were using SAS or Teradata. What's really the great leveler today is that Python has become arguably one of the most popular programming languages, depending on what month you're looking at the TIOBE index. And of course, as I tell the MongoDB folks, SQL is not going away; you have a large skills base out there. So basically, I see this breaking down to each group having its own natural preferences for its home turf. And the fact that, let's say, the Python and Scala folks are using Databricks does not make them any less operational or mission-critical than the SQL folks.

Anybody else want to chime in on that one?

Yeah, I totally agree with that. You know, Python support in Snowflake is very nascent. With all of Snowpark, all of the things outside of SQL, they're very much relying on partners to make things possible, to make data science possible. And it's very early days.
I think the bottom line is that each of these camps is going to keep working on doing better at the thing that they don't do today, or that they're new to, but they're not going to nail it. They're not going to be best of breed on both sides. So the SQL-centric companies and shops are going to do more data science on their database-centric platforms. The data science-driven companies might do more BI on their lakes with those vendors. And the companies that have highly distributed data are going to add fabrics, and maybe offload more of their BI onto engines like Dremio and Starburst.

So I've asked you this before, but I'll ask you again, Sanjeev, because Snowflake and Databricks are such great examples. You have the data engineering crowd trying to go into data warehousing, and you have the data warehousing guys trying to go into the lake territory. Snowflake has $5 billion on the balance sheet. I've asked you before and I'll ask you again: doesn't there have to be a semantic layer between these two worlds? Does Snowflake go out and do M&A and maybe buy an AtScale or a Datameer, or is that just a bandaid? What are your thoughts on that, Sanjeev?

Well, yeah, I think the semantic layer, the business metadata, is extremely important. At the end of the day, the business folks would rather go to the business metadata than have to figure out where everything lives. For example, let's say I want to update somebody's email address, and we have a lot of overhead with data residency laws and all that. I want my platform to give me the business metadata so I can write my business logic without having to worry about which database, which location. So having that semantic layer is extremely important. In fact, now we are taking it to the next level. Now we are saying it's not just a semantic layer, it's all my KPIs, all my calculations.
So how can I make those calculations independent of the compute engine, independent of the BI tool, and make them fungible? It's more disaggregation of the stack, but it gives us more best-of-breed products that the customers have to worry about.

So I want to ask you about the stack, you know, the modern data stack, if you will. We always talk about injecting machine intelligence and AI into applications, making them more data-driven. But when you look at the application development stack, it's separate; the database tends to be separate from the data and analytics stack. Do those two worlds have to come together in the modern data world? And what does that look like organizationally?

I think it is starting to happen, organizationally and even technically. Microservices architecture was the first attempt to bring the application and the data worlds together, but they are fundamentally different things. For example, if an application crashes, that's horrible, but Kubernetes will self-heal and bring the application back up. But if a database crashes and corrupts your data, we have a huge problem. That's why they have traditionally been two different stacks. They are starting to come together, though, especially with DataOps, for instance, versioning of the way we write business logic. It used to be that business logic was highly embedded in our database of choice, but now we are disaggregating that using GitHub, CI/CD, the whole DevOps toolchain. So data is catching up to the way applications are built.

Also, translytical databases — that's a little bit of what the story is with MongoDB next week, adding more analytical capabilities. But I think companies that talk about that are always careful to couch it as operational analytics, not warehouse-level workloads. So we're making progress, but I think there's always going to be, or at least there will long be, a separate analytical data platform.
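The fungible-KPI idea Sanjeev raises, calculations defined once and rendered for any compute engine or BI tool, can be sketched in a few lines of Python. This is a hedged illustration, not any vendor's semantic-layer API: `METRIC`, `DIALECTS` and `render_metric` are all invented names, and the dialect strings cover only the date-truncation function, one of the places SQL engines commonly diverge.

```python
# Illustrative metrics-as-data sketch (invented, not a real semantic-layer API):
# a KPI is declared once as data, then compiled to engine-specific SQL, so the
# calculation is independent of the compute engine and the BI tool.

METRIC = {
    "name": "monthly_revenue",
    "agg": "SUM",
    "column": "amount",
    "table": "orders",
    "time_column": "ordered_at",
}

# Per-engine date truncation, a typical point of dialect divergence.
DIALECTS = {
    "snowflake": "DATE_TRUNC('month', {col})",
    "bigquery":  "TIMESTAMP_TRUNC({col}, MONTH)",
    "trino":     "date_trunc('month', {col})",
}

def render_metric(metric, dialect):
    """Compile one metric definition into SQL for the requested engine."""
    trunc = DIALECTS[dialect].format(col=metric["time_column"])
    return (
        f"SELECT {trunc} AS period, "
        f"{metric['agg']}({metric['column']}) AS {metric['name']} "
        f"FROM {metric['table']} GROUP BY 1"
    )

print(render_metric(METRIC, "bigquery"))
print(render_metric(METRIC, "trino"))
```

The same `METRIC` entry drives every engine, which is the point: change the definition of `monthly_revenue` once, and every downstream tool picks it up.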
Until data mesh takes over.

Yeah, I'm opening that can of worms, so... Well, I know it's out of scope here, but wouldn't data mesh say, hey, to Doug's earlier point, you can't be best of breed at everything, so take your best of breed: data lakes, do your data lake thing; data warehouses, do your data warehouse thing; and then you're just a node in the mesh. Now you need separate data stores and separate teams, but...

I think, I mean, put it this way: data mesh itself is a logical view of the world. Data mesh is not necessarily on the lake or on the warehouse. For me, the big fear there is more in terms of the silos of governance that could happen, and the siloed views of the world, how they're redefined. And that's why, and I want to go back to what Sanjeev said, it's going to raise the importance of the semantic layer. Now, that opens a couple of Pandora's boxes. One, does Snowflake dare go into that space, or do they risk alienating their partner ecosystem, which is a key part of their whole appeal, which is best of breed? They're kind of in the same situation Informatica was in during the early 2000s, when Informatica briefly flirted with analytic applications and realized that was not a good idea and needed to double down on their core, which is data integration. The other thing whose importance this raises, and this is where the best of breed comes in, is the data fabric. My contention is that if you employ data mesh, you need a data fabric; but if you deploy a data fabric, you don't necessarily need to practice data mesh. And the data fabric, at its core, and admittedly it's a category that's still very poorly defined and evolving, but at its core, we're talking about a common metadata backplane.
It's something we used to talk about with master data management, but this would be something more mutable, more evolving, basically using, let's say, machine learning, so that we don't have to pre-define rules or pre-define what the world looks like. So I think in the long run, what this really means is that whichever way we implement it, whichever physical platform we implement, we all need to be speaking the same metadata language. At the end of the day, whether it's a lake, a warehouse or a lakehouse, we need common metadata.

Doug, can I come back to something you pointed out about bringing analytic and transactional databases together? You had talked about operationalizing those and the caution there. Educate me on MySQL HeatWave. I was surprised when Oracle put so much effort into that, and you may or may not be familiar with it, but a lot of folks have talked about it. Now, it's got nowhere in the market, no market share, but you've seen these benchmarks from Oracle. How real is that bringing together of those two worlds and eliminating ETL?

Yeah, I have to defer on that one. That's my colleague Holger Mueller; he wrote the report on that, he's way deep on it, and I have not gone deep on it.

Okay. I just wonder how real that is, or if it's just Oracle marketing. Anybody have any thoughts on that?

I'm pretty familiar with HeatWave. It's essentially Oracle doing, I mean, it's kind of a parallel with what Google's doing with AlloyDB.
It's an operational database that will have some embedded analytics, and it's also something I expect to start seeing with MongoDB. I think it's what Doug and Sanjeev were referring to before, the operational analytics that are embedded within an operational database. The idea here is that the last thing you want to do with an operational database is slow it down. So you're not going to be doing very complex deep learning or anything like that, but you might be doing things like classification, you might be doing some predictions. In other words: we've just concluded a transaction with this customer, but was it less than what we were expecting? What does that mean in terms of, is this customer likely to churn? I think we're going to be seeing a lot of that, and I think that's a lot of what MySQL HeatWave is all about. Whether Oracle has any presence in the market, well, it's still a pretty new announcement. But the other thing that goes against Oracle, that they have to battle, is that even though they own MySQL and run the open source project, in terms of the actual commercial implementations, it's associated with everybody else, and the popular perception has been that MySQL has basically been a sidelight for Oracle. So it's on Oracle's shoulders to prove that they're damn serious about it.

Yeah, it's no coincidence that MariaDB was launched the day that Oracle acquired Sun. Sanjeev, I wonder if we could come back to a topic we discussed earlier, which is this notion of consumption. Obviously Wall Street's very concerned about it. Snowflake dropped prices last week. I've always felt like, hey, the consumption model is the right model, I can dial it down when I need to. Of course, the Street freaks out. What are your thoughts on pricing, on the consumption model? What's the right model for companies, for customers?
The consumption model is here to stay. What I would like to see, and I think it's an ideal situation that actually plays into the lakehouse concept, is that I have my data in some open format, maybe it's Parquet or CSV or JSON or Avro, and I can bring whatever engine is the best engine for my workloads, bring it on, pay for consumption, and then shut it down. And by the way, that could be Cloudera. We don't talk about Cloudera very much, but it could be that one business unit wants to use Athena, and another business unit wants to use another engine, Trino, let's say, or Dremio. So every business unit is working on the same data set. See, that's critical. That data set is maybe in their VPC, and they bring any compute engine, pay for the use, shut it down. Then you're getting value and you're only paying for consumption. It's not like, oops, I left a cluster running by mistake. So there have to be guardrails. The reason FinOps is so big is because it's very easy for me to run a Cartesian join in the cloud and get a $10,000 bill.

Well, it's been sort of a victim of its own success in some ways. They made it so easy to spin up single-node instances, multi-node instances. Back in the day, when compute was scarce and costly, those database engines optimized every last bit so they could get as much workload as possible out of every instance. Today, it's really easy to spin up a new node, a new multi-node cluster. So that freedom has meant many more nodes that aren't necessarily getting that utilization. So Snowflake has been doing a lot to add reporting, monitoring and dashboards around the utilization of all the nodes and multi-node instances that have been spun up. And meanwhile, we're seeing some of the traditional on-prem databases moving into the cloud trying to offer that freedom, and I think they're going to have that same discovery: the cost surprises are going to follow as they make it easy to spin up new instances.
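The guardrails Sanjeev calls for can be as simple as a pre-flight cardinality check. This is a minimal sketch, assuming you know (or can fetch) the input row counts; `estimated_join_rows`, `check_query` and the threshold are all invented for illustration, not features of any FinOps tool. The key observation is arithmetic: a join with no equality keys degenerates to a Cartesian product, so its output is the product of the input sizes, which is exactly the $10,000-bill scenario.

```python
# Hypothetical FinOps-style guardrail (invented names, not a real product API):
# estimate the worst case of a join before paying a consumption-priced engine
# to run it, and refuse queries that blow past a budgeted row count.

def estimated_join_rows(left_rows, right_rows, join_keys):
    if not join_keys:
        # No equality keys: a Cartesian product, output = |L| * |R|.
        return left_rows * right_rows
    # Crude heuristic for a keyed join; real optimizers use column statistics.
    return max(left_rows, right_rows)

def check_query(left_rows, right_rows, join_keys, max_rows=100_000_000):
    est = estimated_join_rows(left_rows, right_rows, join_keys)
    if est > max_rows:
        raise RuntimeError(f"refusing to run: ~{est:,} output rows estimated")
    return est

print(check_query(1_000_000, 500, ["customer_id"]))   # passes the guardrail
# check_query(1_000_000, 1_000_000, []) would raise: a trillion-row Cartesian join
```

Even this crude check catches the accidental cross join; the same budget idea extends to bytes scanned or credits consumed per warehouse.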
Yeah, a lot of money went into this market over the last decade, separating compute from storage, moving to the cloud. I'm glad you mentioned Cloudera, Sanjeev, because they got it all started, the kind of big data movement. We don't talk about them that much. Sometimes I wonder if it's because, when they merged Hortonworks and Cloudera, they dead-ended both platforms, but then they did invest in a more modern platform. What's the future of Cloudera? What are you seeing out there?

Cloudera has a good product. I have to say, the problem in our space is that there are way too many companies, there's way too much noise. We are expecting the end users to parse it out, or expecting analyst firms to boil it down. So I think marketing becomes a big problem. As far as technology is concerned, I think Cloudera did turn themselves around, and Tony, I know you talk to them quite frequently. I think they've had quite a comprehensive offering for a long time, actually. They created Kudu, so they've got the operational side. They have Hadoop, they have an analytical data warehouse, they migrated to the cloud, and they are in hybrid, multi-cloud environments. A lot of cloud data warehouses are not hybrid; they're only in the cloud.

Right. I think where Cloudera has been most successful has been in the transition to the cloud, and the fact that they're giving their customers more on-ramps to it, more hybrid on-ramps. So I give them a lot of credit there. They've also been trying to position themselves as the most price-friendly, in terms of, we will put more guardrails and governors on it. Part of that could be spin, but on the other hand, they don't have the same vested interest in compute cycles as, say, AWS would have with EMR. That being said, yes, Cloudera does it all, but it certainly has its strengths and weaknesses, and I don't want to cast them as a legacy system.
But the fact is they do have a huge landed legacy on-prem and still significant potential to land and expand that to the cloud. That being said, even though Cloudera is multifunction, it certainly has its strengths and weaknesses. Yes, Cloudera has an operational database, or an operational data store, kind of an outgrowth of HBase, but Cloudera is still primarily known for the deep analytics. Nobody's going to buy Cloudera Data Platform strictly for the operational database. They may use it as an add-on, in the same way that a lot of customers have used, let's say, Teradata to do some machine learning, or Snowflake to parse through JSON. So again, it's not an indictment or anything like that, but obviously they do have their strengths and their weaknesses. I think their greatest opportunity is with their existing base, because that base has a lot invested, and the fact is they have a hybrid path that a lot of the others lack. Yeah, and of course, being on the quarterly shot clock, under the microscope, was not a good place to be for Cloudera. Now they can at least refactor the business accordingly. I'm glad you mentioned hybrid too. We saw Snowflake last month did a deal with Dell whereby Snowflake could access non-native data in on-prem object stores from Dell. They announced a similar thing with Pure Storage. What do you guys make of that? How significant will that be? Will customers actually do that? I think they're using either materialized views or external tables. There are data residency requirements. There are desires to have these platforms in your own data center. And finally, they capitulated. I mean, Frank Slootman is famous for saying he wants to be very focused.
And earlier, not many months ago, they called going on-prem a distraction. But clearly there's enough demand, certainly government contracts. Any company that has data residency requirements, it's a real need. So they finally addressed it. Yeah, I'll bet dollars to donuts there was an EBC session where some big customer said, if you don't do this, we ain't doing business with you. And that was like, okay, we'll do it. So Dave, I have to say, earlier on you had brought up how Frank Slootman was pooh-poohing data science workloads. On your show about a year or so ago, he said, we are never going on-prem. We burned that bridge. That was on your show, I think. I remember the statement exactly because it was interesting. He said, we're never going to do the halfway house. And I think what he meant is, we're not going to bring the Snowflake architecture to run on-prem, because it defeats the elasticity of the cloud. So this was kind of a capitulation in a way, but I think it still preserves his original intent, sort of. I don't know. The point here is that every vendor will pooh-pooh whatever they don't have, until they do have it. And then it'll be like, oh, we're all in. We've always been doing this. We've always supported this. And now we're doing it better than others. Look, it was the same type of shock wave we felt when AWS, at the last moment at one of their re:Invents, said, oh, by the way, we're going to introduce Outposts. The analyst group is typically pre-briefed about a week or two ahead under NDA, and that was not part of it. When they just casually dropped that in the analyst session, you could have heard the sound of lots of analysts changing their diapers at that point. I remember that. Props to Andy Jassy, who many times actually told us, never say never when it comes to AWS. So guys, I know we've got to run. We've got some hard stops. Maybe you could each give us your final thoughts. Doug, start us off and then...
Sure. Well, we've got Snowflake Summit coming up. I'll be looking for customers that are really doing data science, that are really employing Python through Snowpark. And then a couple of weeks later, we've got Databricks with their Data + AI Summit in San Francisco. I'll be looking for customers that are really doing considerable BI workloads. Last year I did a market overview of this analytical data platform space: 14 vendors, eight of them claiming to support lakehouse, from both sides of the camp. Databricks' top customer that they could cite was unnamed; it had 32 concurrent users doing 15,000 queries per hour. That's good, but it's not up to the most demanding BI SQL workloads, and they acknowledged that and said they need to keep working on it. Snowflake, asked for their biggest data science customer, cited Kabula: 400 terabytes, 8,500 users, 400,000 data engineering jobs per day. I took the data engineering jobs to be probably SQL-centric, ETL-style transformation work. So I want to see the real use of Python, how much Snowpark has grown as a way to support data science. Great, Tony? Yeah, actually, of all things, and certainly I'll be looking for similar things to what Doug is saying, but kind of out of left field, I'm interested to see what MongoDB is going to start to say about operational analytics. Because they're into this conquer-the-world strategy, we can be all things to all people. Okay, if that's the case, what's it going to mean to put in some inline analytics? What are you going to do with your query engine? So that's actually kind of interesting, and it's the thing I'll be looking for next week. Great. Sanjeev, bring us home. So I'll be at MongoDB World, Snowflake and Databricks, and since Tony brought up MongoDB, I'm very interested in seeing that even the databases are shifting tremendously.
They are addressing both sides of the HTAP use case, online transactional and analytical. I'm also seeing that these databases started, in the case of MySQL HeatWave let's say, as relational, or in MongoDB's case as document, but now they've added graph, they've added time series, they've added geospatial, and they just keep adding more and more data structures, really making these databases multifunctional. So, very interesting. It gets back to our discussion of best-of-breed versus all-in-one. And of course, Mongo's path, a big part of their strategy, is through developers. They're very developer-focused. So we'll be looking for that. Hey guys, I'll be there as well. I'm hoping we maybe have some extra time on theCUBE, so please stop by and we can chat a little bit. Guys, as always, fantastic. Thank you so much, Doug, Tony, Sanjeev, and let's do this again. It's been a pleasure. All right. And thank you for watching. This is Dave Vellante for theCUBE and these excellent analysts. We'll see you next time.
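The multi-model idea above, one engine serving relational and time-series shapes side by side, can be sketched in a few lines. SQLite here is purely a toy stand-in, not MySQL HeatWave or MongoDB, and the tables and values are invented for illustration:

```python
# Toy illustration of one engine handling more than one data shape.
# SQLite is a stand-in only; it is not any of the vendors discussed.
import sqlite3

con = sqlite3.connect(":memory:")

# Relational shape: a plain customers table.
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Acme"), (2, "Globex")])

# Time-series shape: timestamped readings living in the same engine.
con.execute("CREATE TABLE readings (ts TEXT, customer_id INTEGER, value REAL)")
con.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
    ("2022-06-01 09:05:00", 1, 10.0),
    ("2022-06-01 09:40:00", 1, 14.0),
    ("2022-06-01 10:10:00", 2, 7.5),
])

# One SQL query spans both shapes: hourly averages joined to names.
rows = con.execute("""
    SELECT c.name, strftime('%H', r.ts) AS hour, AVG(r.value)
    FROM readings r JOIN customers c ON c.id = r.customer_id
    GROUP BY c.name, hour
    ORDER BY c.name, hour
""").fetchall()
print(rows)  # [('Acme', '09', 12.0), ('Globex', '10', 7.5)]
```

The point of the sketch is the panel's best-of-breed versus all-in-one tension: once transactional rows and time-series readings share one engine, a single query can cross them, which is the appeal the multifunctional databases are chasing.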