From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante.

Over the last several months, George Gilbert and I have collaborated on a number of data-oriented research projects. We started by looking at Databricks and some of the potential disruption scenarios for Databricks, then carried that through to Snowflake as a preview to this year's 2023 Snowflake Summit, and then followed that up with a vision of the future of data apps, with Uber as an example of how we see this world unfolding. We talked about the focus of applications changing from being process-centric to data-centric, meaning that the business logic will be embedded into the data, versus today, where the data is embedded and buried in application silos. So we see applications built on top of this data as the future, where data, as John Furrier has often said, becomes the new development kit.

Here at Snowflake Summit, we heard a strong message of all data and all workloads. That's the very high-level focus. At the lower level, when you talk to the technical people, you have a really strong technical foundation. In the middle, you had a fire hose of announcements, which were sometimes hard to connect. So that's what we're going to try to do today.

Hello and welcome to this week's Wikibon CUBE Insights, powered by ETR. In this Breaking Analysis, we're going to do just that: connect the dots between the high-level messaging and the underlying technical architecture to explain what we think is the differentiation that Snowflake brings. We'll also bring in ETR data and talk about how Snowflake compares to some of the other leading data platforms. We think there are maybe five in the world that are vying for the new modern data stack.

Okay, let's bring up the first slide and talk to that a little bit. We've shown you this before; we've made some changes to it. It speaks to the traditional strengths of Snowflake, coming at the problem from a database perspective. Snowflake are database guys; they came out of Oracle. And Databricks is coming at it, on the right-hand side of this chart, from a data science and machine learning angle. Now they're both trying to expand into each other's domain.

So in thinking about Snowflake and what we've been talking about and learned at this Summit: this message of all data, all workloads, with many ways to query the data and many data types, what we're calling in this slide pluggable storage. So you've got that top-level messaging, and underneath it a very strong technical architecture and brand promise of governance, with all the challenges that brings. Snowflake has to make sure that it is secure and governed and can share data. The difference is this is a completely integrated experience. So let's hear this in the words of Frank Slootman when he came on theCUBE this week. Please play the clip.

The narrative for Snowflake as a data cloud is it's a multi-layer cake, right? We obviously have infrastructure, elastic, consumed by the drink. We have live data in extraordinary amounts. We have the complete workload enablement layer, the programmability platform, which is Snowpark, the marketplace, and then the transactional model, where people can monetize data and applications. So the strategy really is we enable data engineers, big time. They're sort of, I call them, our homies.
Those are the people that we're super close to historically, because we're a database company from way back. But we now have completely embraced the functional layer that lives above the data layer. You have data engineers and software engineers, and we now said, look, we address both of these audiences. It is a big vision, but we think in the cloud you have to have this. In an on-premise environment, it's very, very different; you can really stratify these things. But in the cloud it's like, wait a second, who manages security and governance here? Well, it's not you, it's them. So in other words, unless we step into that void and say, no, no, no, we own it: if you're on Snowflake, you're safe, you're compliant, all these things. That's a really important thing, because if we're not doing it, who's going to do it? So there's a lot of discomfort around where the software engineers live, where the data engineers live, how they interact. And that's really the space we want to assert, if you will.

All right, so there's a lot to unpack there from Frank Slootman's comments. He talked about data engineers, you're going to laugh, being his homies. I remember when he was at ServiceNow, he used to talk about the IT people being his homies. Well, the data crowd are his homies now. Somebody at the keynote, I can't remember who, said, you know what, in many respects we're all data engineers today. So Snowflake is really trying to simplify and democratize that data engineering complexity. But the last thing Frank Slootman talked about was governance, and that's really the big promise and the hard part of what Snowflake does. Everything's integrated. This is a really important point and one of the differentiators that I don't think always comes through when you're getting the fire hose of product announcements. So I want to work with you to explain that and connect those dots a little bit.

So let's start with the two big vectors here: the many ways to query the data and the many different data types. Let's unpack that a little bit. Snowflake started with SQL. That's the mainspring, and you can see on this chart a lot of different options in terms of query options and storage formats. When you talk to Benoit Dageville, one of the co-founders (and Frank Slootman will bristle when people talk about Snowflake being a data warehouse), Snowflake isn't a SQL data warehouse. That's where it started, but it's really a data flow engine that can have multiple personalities talk to it. This is something George Gilbert and I talk about all the time, and George articulated it on theCUBE; we did a great session on Tuesday that I'd encourage you to listen to. We'll put the link in the show notes. Anyway, those multiple personalities can in turn talk to multiple data types. What we show here is pluggable storage, and that in a nutshell is their huge value add, because no one else really has that.

Okay, so again, to review: they started with SQL, then they added data frames. The data frame is the way Python programmers talk to their data, to find out what's in there, to explore it, to clean it. It's more forgiving than SQL. A data frame can generate SQL, but it's a different interface.
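To make that data frame point concrete, here's a minimal sketch using the Snowpark Python API, assuming the snowflake-snowpark-python package and a reachable Snowflake account. The table name, column names, and connection parameters are hypothetical placeholders; the point is that the DataFrame calls compose lazily and compile down to SQL that the same engine executes.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

# Connection parameters are placeholders; fill in your own account details.
session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

# A Snowpark DataFrame is a lazy description of a query; nothing runs yet.
orders = session.table("ORDERS")

# Compose filters and aggregates Pythonically, the way pandas users expect.
summary = (
    orders.filter(col("STATUS") == "SHIPPED")
          .group_by("REGION")
          .agg(sum_("AMOUNT").alias("TOTAL_AMOUNT"))
)

# Peek at the SQL the DataFrame compiles to, then execute it on Snowflake.
print(summary.queries)  # the generated SQL text
summary.show()
```

That's the sense in which a data frame "can generate SQL": the interface is different, but the back end is the same governed engine.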
So Python programmers, and there may now be more Python programmers than SQL programmers, can access the same back-end data capabilities. Then search: they brought in search with the Neeva acquisition, so you can talk in natural language and underneath it can generate a SQL query. With Neeva, people thought, well, it's a consumer search app, what's the point of that? And in fact, they're out of that business now. Here's what we think Snowflake is going to do: translate that natural language into a SQL query, so Snowflake can generate a query to talk to documents and then pull that query result together and integrate it. That's the key.

Then there's another way of talking to the data, which is traditional supervised machine learning libraries. And just this week, Snowflake and NVIDIA announced that NVIDIA's generative AI and some packaged machine learning models are going to be encapsulated inside Snowpark Container Services. So it's not just NLP, but also recommenders out of the box to build really advanced models. The interesting question here, when you think about Databricks' wheelhouse, is that Databricks built a tool chain and language for supervised machine learning, which cleans data, does feature engineering, trains models, serves them up, and so forth. So can Snowflake leapfrog this approach using NVIDIA and unsupervised learning, with generative AI that trains itself? And if so, does that make it too reliant on NVIDIA? That's a topic for another day.

But I want to take some time now and explain this idea of many different data types. Snowflake talks in terms of structured, semi-structured, and unstructured data. Come back to this chart: we call it pluggable storage. Even with structured data, when you're doing analytics you store data in column format; when you're working with transaction data, you work in rows. Now, last year Snowflake announced Unistore to deal with transaction data, but the product's still not shipping. We talked to a number of customers this week who are dying to get their hands on it, but Frank Slootman and Christian Kleinerman had the posture of, look, it'll ship when it's ready. It's getting close, but to ensure our promise of data sharing and governance and security, it just takes time. It's really hard.

Okay, so you have OLAP and you have transactions. Then Snowflake added streaming data: they can land that in a table and join across different table types, with dynamic tables that can be updated from a stream and reference data. Then they've added vector data; we saw Pinecone this week. So they can bring in documents, shred them, and turn them into machine-readable format. They've also added graph data; we saw that with RelationalAI this week. Also on theCUBE, we talked to Blue Yonder, which is re-architecting its platform on Snowflake along with RelationalAI to address the complexity of supply chains. The result of all this is that Snowflake takes all this mess and integrates it, pulling from different table types but presenting it as one relational experience. That is the magic of Snowflake.

Now, you may be asking yourself, wait a minute: Amazon's got a lot of different ways to query, Amazon supports a lot of different data types, so what's different here? George Gilbert explained it on theCUBE, so let's listen to his words. Please play the clip.
So you teed this up perfectly, because if I wanted to take a before-and-after image, the before image would be Amazon Web Services when Werner Vogels got up 18 months ago at re:Invent and put up the slide of all 200 Amazon services and said, you guys are telling us this is complicated, but it's your fault; you asked for all this choice and power. What Snowflake is doing is taking that slide and integrating all those capabilities so you're not trying to stuff together a bunch of piece parts that didn't fit. Now, to be specific, Benoit started out by saying, look, from day one we built this data flow engine. It wasn't a SQL data warehouse. It was a data flow engine that could have multiple personalities talk to it, and it in turn could talk to multiple data types. That in a nutshell is their huge value add, because no one else really has that.

All right, before we bring in the ETR data, I want to talk about this. First of all, on what George was saying about Werner Vogels: years ago, John Furrier and I on theCUBE asked Andy Jassy, why the Lego blocks, why the piece parts? And what he said was, by having all these primitives, we can move as the market moves. Now, of course, the problem with that is it creates complexity. So that's the trade-off.

Nonetheless, coming back to Snowflake and what George was talking about, there is a little bit of a nit here, in the following sense. Snowflake, very Amazon-like, wants you to put all the data into Snowflake. It wants you to do all the data cleansing and all that pipeline work inside of Snowflake, inside the data cloud. The advantage of doing that is everything's integrated, everything's consistent, it's all governed, it's secure, you can share data safely, et cetera. But in speaking to some customers, whether it's real or a perception, a lot of them said, well, we don't want to do all that data cleansing inside of Snowflake, it's too expensive. So we might want to use our Spark tool chain, and we might do something with an ETL vendor like Informatica, and we'll do that outside because it's cheaper, maybe just run it on top of S3.

Now, in trying to understand this, there's a two-edged sword. One is, if you're doing that outside of Snowflake, there's a lot of other heavy lifting that you have to do. So again, very Amazon-like, Snowflake wants to take away that undifferentiated heavy lifting. The other interesting fact here, and we'll come back to this later, is that the reason why people think Snowflake is more expensive, we think, is because Snowflake bundles in the AWS fees. So you're not only paying for Snowflake, you're paying for AWS compute and storage. Databricks does it differently: Databricks doesn't bundle that in and charge for it, so people may have the perception that it's cheaper. Now, we haven't really done the full analysis, but the full analysis should accommodate, for example, the cost of Databricks, the cost of AWS, and the staff time it takes to do all that and put it together. As I say, we haven't built the model yet, but it's worth looking at. I think the consensus here, anecdotally, has been that if it's greenfield, it's going to be more efficient to do it inside of Snowflake; but if you've got other processes built up around it, whether it's your Spark tool chain or other ETL or ELT processes, it might be more advantageous to do it outside. Okay, so we'll come back to look at that whole TCO.
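In the meantime, here's only a back-of-the-envelope sketch of the shape that TCO comparison might take. Every number below is a made-up placeholder, not a benchmark; the structure of the comparison (software fees, separately billed cloud fees, and staff time) is the point.

```python
def pipeline_tco(software_fees, cloud_fees, engineer_hours, hourly_rate=120):
    """Annual cost of a data pipeline: platform/software fees, any
    separately billed cloud compute/storage, and the staff time to
    build and operate it. All inputs here are hypothetical."""
    return software_fees + cloud_fees + engineer_hours * hourly_rate

# Inside Snowflake: AWS compute/storage is bundled into the Snowflake bill,
# and there's less undifferentiated heavy lifting to staff.
inside = pipeline_tco(software_fees=125_000, cloud_fees=0, engineer_hours=200)

# Outside (Spark tool chain / ETL on S3): lower sticker price, but AWS is
# billed separately and the integration and governance work falls on your team.
outside = pipeline_tco(software_fees=60_000, cloud_fees=30_000, engineer_hours=700)

print(f"inside: ${inside:,}  outside: ${outside:,}")
# With these placeholder inputs: inside = $149,000, outside = $174,000.
# Flip the assumptions (sunk Spark skills, cheap S3 compute) and the
# greenfield-versus-brownfield conclusion flips with them.
```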
Let's dig a little bit more into the ETR data and see what it shows. This slide shows net score on the vertical axis. Net score is a measure of spending momentum. The horizontal axis is a measure of presence and overlap in the data set; it's really the N, if you will. And we're plotting several companies that we think are vying for the future of the new data stack. Microsoft Azure is in the upper right, along with AWS; they're leaders in cloud, obviously, and also machine learning. Google Cloud Platform you can see right on that red dotted line. That red dotted line, by the way, is the indicator of a highly elevated spending momentum, a 40% net score. Then you see we've plotted Snowflake, you see Databricks, and you see Snowpark, which ETR has just added, and then Streamlit as well, which is just below that 40% line.

Step back for a moment. In the middle of the last decade, we saw the emergence of what we thought was a new data stack: AWS for infrastructure in the cloud, Snowflake for the simplified data warehouse, and Databricks for machine learning. We thought that was going to form the new stack. Well, what happened was both of these companies raised a boatload of money, they saw a massive TAM, and they went after it. Now they've become much more competitive, as we showed on that earlier chart.

Let's come back to Microsoft on this chart. Their data platform in Azure was always very fragmented, and with Fabric, which they just announced at their recent conference, with Synapse as the core engine, this is the first time they've at least standardized the table format. They standardized on Delta tables, and the reason they did that is because Databricks had something like 40% of all VMs running on Azure, meaning a lot of the data on Azure belonged to Databricks. So Microsoft is trying to co-opt that data back and say, use our analytic engines, and we'll partner with Databricks; between the two of us, we'll try to get everyone into Delta tables, and then we'll compete on the basis of the quality of our analytic engines. So that's their play. But neither of them has this core engine, this wholly integrated engine with multiple query options and multiple data types that Snowflake has for handling all these different workloads.

AWS, as we heard from George's discussion, has a different approach; it's really a bespoke approach. Now, Google's got a very strong, coherent play, and there's a good reason to use it, but they're sort of forcing everybody into their platform around BigQuery. Think about what Snowflake and AWS want: they want that too, but AWS is happy to sell Snowflake because it drives compute and storage, whereas Google is more reluctant to do so because data is their main play. So they're trying to push you into their data stack; they don't want to help Snowflake, necessarily. At least that's our take, and that's the appearance we have.

Now, focus on Databricks and Snowflake for a moment and their relative positions. Snowflake's net score has come down in the last several quarters, and its spending momentum continues to decelerate. It's still well above the 40% mark, but it's now below that of Databricks for the first time. Now, ETR has added Snowpark and Streamlit, and these are add-on sales that Snowflake can drive.
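As a reminder of the mechanics behind these two axes: ETR's net score nets the percentage of customers spending more against the percentage spending less, and the N is simply the number of survey respondents citing the platform. Here's a small sketch with hypothetical respondent counts, bucketed the way ETR classifies spending intentions.

```python
# Hypothetical counts for one platform; not actual ETR survey data.
responses = {
    "adoption": 90,      # new to the platform
    "increase": 180,     # spending meaningfully more
    "flat": 200,         # spending about the same
    "decrease": 25,      # spending meaningfully less
    "replacement": 5,    # leaving the platform
}

n = sum(responses.values())  # the N plotted on the horizontal axis
pct = {k: v / n * 100 for k, v in responses.items()}

# Net score = % spending more minus % spending less.
net_score = (pct["adoption"] + pct["increase"]) - (pct["decrease"] + pct["replacement"])

print(f"N = {n}, net score = {net_score:.0f}%")  # N = 500, net score = 48%
```

With these placeholders, a 48% net score sits above the 40% dotted line, i.e., highly elevated momentum. Shift respondents from "increase" into "flat" and "decrease" and you get exactly the compression described next.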
But what we're seeing in the data is a smaller percentage of customers that are new adoptions and a smaller percentage that are spending more. We're also seeing a higher percentage that are spending flat and a higher percentage that are spending less. That translates to a compressed net score and decelerating spending momentum.

I want to make a comment on new logos. I had said, because I think this was said at the last financial analyst day by Mike Scarpelli, that Snowflake didn't incentivize its sales force on new logos. I asked Frank Slootman about that. He said, no, that's not true; we've actually now created wholly separate teams, one that goes after new logos and one that mines the existing install base and focuses on consumption, which I had thought was their only way of doing incentives. So I missed that transition. Sometime between a year ago and this year they made that change. I'm not sure exactly when, but I think it's smart, because otherwise they run the risk of competition getting a foothold in those new logos, where it's harder to unseat.

Okay, so you have this battle going on between two companies, Snowflake and Databricks. Let's put this into context with the next graphic. We know the big topic in the battle right now, of course, is AI. So what we did, and this is the power of the ETR data set, is we cut the ETR database by those customers who are leading AI/ML accounts. These are accounts doing advanced ML and AI, and we're filtering the same companies on those accounts to see what kind of net score and presence they have in the data. You see in the upper left we're talking about ML/AI accounts, and the N, which you can see in that red rectangle, drops to 492; before it was over 1,500, and now it's just under 500. We plot the positions of those same firms, and we're cutting the data for their product portfolios in analytics, database, and AI platforms. So that filters out things like Microsoft's Azure cloud, AWS Lambda and AWS cloud, et cetera; it's pure within those three taxonomy sectors.

Now, Microsoft, as you can see, plays a leading role here and leapfrogged everybody with its OpenAI deal. Amazon's got a really good AI story with SageMaker, very good actually. And we know Databricks is really strong in the ML/AI space, but Snowflake has been making moves there; examples are Neeva and the NVIDIA announcements this week that are being bundled into Snowpark Container Services. It's also fascinating to see Snowflake and Databricks right on top of each other on this chart. On the one hand, you'd say, well, that's interesting: Snowflake's presence in ML/AI accounts is strong, even though you'd think Databricks would be stronger. But there are a lot more Snowflake accounts, and the reality is there's a large overlap between Snowflake and Databricks accounts because, as we said before, historically they solved different problems. They were part of that new, new stack, but increasingly they're on a collision course.

So there's an interesting flip side here that we want to explore. Again, this is preliminary, but we'll introduce it. As I said before, Snowflake's revenue includes the AWS charges; Databricks' revenue doesn't. So what if you stripped out the value of that AWS revenue that Snowflake is actually charging for, a kind of Snowflake net revenue? We think those companies would be much more comparably sized.
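Here's the simple arithmetic behind that thought, with loudly hypothetical figures, since Databricks is private and, as we said, we haven't done this analysis yet: if some share of Snowflake's reported revenue is really a pass-through of AWS compute and storage fees, stripping it out changes the sizing comparison.

```python
# All figures below are invented for illustration; neither company's actual
# revenue mix is public at this level of detail.
snowflake_reported_m = 2_800     # hypothetical reported revenue, $M
aws_passthrough_share = 0.35     # hypothetical share that is bundled AWS fees

snowflake_net_m = snowflake_reported_m * (1 - aws_passthrough_share)

databricks_reported_m = 1_900    # hypothetical; already excludes cloud fees

print(f"Snowflake net: ${snowflake_net_m:,.0f}M vs Databricks: ${databricks_reported_m:,}M")
# 1,820 vs 1,900 with these placeholders: much more comparably sized,
# which is the point of the exercise.
```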
So it'll be interesting, if and when (I'm sure it's more a matter of when) Databricks goes public: we'll be able to dig into that a little bit, and we'll start probing a little more to try to understand what that net revenue would look like in the Snowflake context. This is important because of the misunderstanding we were talking about earlier, that maybe Snowflake's more expensive; we have to do an apples-to-apples comparison. So we're really trying to unpack that and understand it. You see the two firms in this graphic right on top of each other, but one has to ask: what if you pulled that bundled revenue out? Would they be about the same size? It's possible. We're going to look at that.

Okay, let's end with a quick summary of what we've learned this week, and after we talk to John Furrier, Rob Strechay, and George Gilbert, we'll bring in the context from the Databricks event out in San Francisco to get their take. This slide is about the expanding Snowflake universe: the big three announcements this year, and there were more. Actually, really the big four; I'd be remiss to leave out the big Monday night announcement, NVIDIA and Snowflake encapsulating NVIDIA's entire stack within Snowpark Container Services. That's huge. Jensen Huang said, we're going to supercharge Snowflake, and that was a really interesting discussion that we talked about a lot this week on theCUBE.

The other big three announcements came on Tuesday, and Frank Slootman teed them up. First, Iceberg open tables: the big news there was that they basically did away with any performance degradation or lack of benefit between managing Iceberg inside of Snowflake and leaving data in external Iceberg tables. The performance is now the same; they're first-class citizens (we'll sketch below what that might look like in practice). This is an example of Snowflake stretching its fabric, or its mesh, to more data types and creating what it would like to be the best possible experience.

Second big announcement: the Native Application Framework. Think of that as the app store for the enterprise. There are 25 companies now participating in that application framework, and we can see a lot more coming. We heard from Denise Persson earlier today that next year the event is going to be at Moscone, and they're going to integrate their developer conference into Snowflake Summit. So it's probably going to double in size, we're talking about going from 10,000 to 20,000 people, because they're really starting to cater to that developer persona. We'll be watching that closely.

The big, big announcement, in addition to NVIDIA, was Snowflake's Snowpark Container Services. This really changes the game in terms of how you can bring data and apps together and build apps inside of Snowflake; it dramatically simplifies it. The example we gave a lot this week was Blue Yonder bringing in the former Manugistics capability, containerizing it, and making it a first-class citizen, as well as startups simplifying the development experience inside of Snowflake. There were tons of other announcements, including boring-but-important stuff like auto-sync with Git repos, logging and tracing APIs, and a new open-source command-line interface, which people were really excited about. These are the nitty-gritty developer things that just make their lives easier.
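For a flavor of what that Iceberg announcement means in practice, here's a hedged sketch, again issuing SQL via Snowpark. CREATE ICEBERG TABLE was in preview at the time, so treat the parameter names (CATALOG, EXTERNAL_VOLUME, BASE_LOCATION) as our reading of the preview documentation, and the table, volume, and path values as placeholders.

```python
from snowflake.snowpark import Session

# Connection parameters are placeholders, as in the earlier sketch.
session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<db>", "schema": "<schema>",
}).create()

# Create a Snowflake-managed Iceberg table whose files live in the open
# Iceberg format on external storage (the external volume), rather than
# in Snowflake's proprietary storage format.
session.sql("""
    CREATE ICEBERG TABLE events (event_id STRING, ts TIMESTAMP)
      CATALOG = 'SNOWFLAKE'
      EXTERNAL_VOLUME = 'my_iceberg_volume'
      BASE_LOCATION = 'events/'
""").collect()

# The Summit claim is that querying this performs like a native table:
# Iceberg as a first-class citizen, not a degraded external format.
session.sql("SELECT COUNT(*) FROM events").show()
```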
So anyway, the bottom line is the future of data apps is evolving. We've laid this out in a series of posts, and George and I will continue to do so. He was critical in helping me put this Breaking Analysis together before he left for the Databricks event, so I want to thank him for that. I want to thank the entire CUBE team; guys, really appreciate you getting this done in amazing fashion. We're going to toss this over to Alex Myerson and Ken Schiffman for final packaging. Kristen Martin and Cheryl Knight help get the word out on siliconangle.com, in our newsletters, and on social media. Remember, I publish each week on wikibon.com and siliconangle.com. All these episodes are available as podcasts; all you've got to do is search "Breaking Analysis podcast." If you want to get in touch, email me at david.vellante@siliconangle.com, DM me @dvellante, or comment on our LinkedIn posts. And do check out etr.ai; they've got the best survey data in the business. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching, everybody, and we'll see you next time on Breaking Analysis.