From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. At our inaugural SuperCloud 22 event, we further refined the concept of SuperCloud, iterating on the definition, the salient attributes, and some examples of what is and what is not a SuperCloud. Welcome to this week's Wikibon Cube Insights, powered by ETR. You know, Snowflake has always been what we feel is one of the strongest examples of a SuperCloud. And in this Breaking Analysis from our studios in Palo Alto, we unpack our interview with Benoit Dageville, co-founder and president of products at Snowflake, and we test our SuperCloud definition on the company's data cloud platform. And we're really looking forward to your feedback.

First, let's examine how we define SuperCloud. Very importantly, one of the goals of SuperCloud 22 was to get the community's input on the definition and iterate on previous work. SuperCloud is an emerging computing architecture that comprises a set of services which are abstracted from the underlying primitives of hyperscale clouds. We're talking about services such as compute, storage, networking, security, and other native tooling, like machine learning and developer tools, to create a global system that spans more than one cloud. SuperCloud, as shown on this slide, has five essential properties, X number of deployment models, and Y number of service models. We're looking for community input on X and Y and on the first point as well, so please weigh in and contribute.

Now, we've identified five essential elements of a SuperCloud, so let's talk about them. First, a SuperCloud has to run its services on more than one cloud, leveraging the cloud-native tools offered by each of the cloud providers. The builder of the SuperCloud platform is responsible for optimizing the underlying primitives of each cloud and optimizing for specific needs, be it cost, performance, latency, governance, data sharing, security, et cetera. But those primitives must be abstracted such that a common experience is delivered across the clouds for both users and developers, an idea we'll sketch in code in a moment. A SuperCloud has a metadata intelligence layer that can maximize efficiency for the specific purpose of the SuperCloud, i.e. whatever that SuperCloud is intended for, and it does so in a federated model. And it includes what we call a super PaaS. This is a prerequisite, a purpose-built component that enables ecosystem partners to customize and monetize incremental services while at the same time ensuring that a common experience exists across clouds.

Now, in terms of deployment models, we'd really like to get more feedback on this piece, but here's where we are so far based on the feedback we got at SuperCloud 22. We see three deployment models. The first is one where a control plane may run on one cloud but supports data plane interactions with more than one other cloud. The second model instantiates the SuperCloud services on each individual cloud and within regions, and can support interactions across more than one cloud with a unified interface connecting those instantiations, those instances, to create a common experience. And the third model superimposes its services as a layer, or in the case of Snowflake a mesh, on top of the cloud providers' regions, with a single global instantiation of those services that spans multiple cloud providers.
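Before we get to Snowflake specifically, here's a minimal sketch of the core abstraction idea: one interface over the native object storage primitives of two clouds, so that "a common experience across clouds" isn't just an abstract phrase. This is our illustration, not any vendor's implementation; the class names and buckets are hypothetical.

```python
# A minimal sketch (not any vendor's code) of abstracting cloud primitives:
# one interface over the native object storage of two hyperscale clouds.
from abc import ABC, abstractmethod

import boto3                      # AWS SDK for Python
from google.cloud import storage  # Google Cloud Storage client


class ObjectStore(ABC):
    """Common experience: callers never see the cloud-specific API."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class S3Store(ObjectStore):
    def __init__(self, bucket: str):
        self._s3 = boto3.client("s3")
        self._bucket = bucket

    def put(self, key: str, data: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)

    def get(self, key: str) -> bytes:
        return self._s3.get_object(Bucket=self._bucket, Key=key)["Body"].read()


class GCSStore(ObjectStore):
    def __init__(self, bucket: str):
        self._bucket = storage.Client().bucket(bucket)

    def put(self, key: str, data: bytes) -> None:
        self._bucket.blob(key).upload_from_string(data)

    def get(self, key: str) -> bytes:
        return self._bucket.blob(key).download_as_bytes()


def write_everywhere(stores: list[ObjectStore], key: str, data: bytes) -> None:
    """The 'global system' view: one call, more than one cloud underneath."""
    for store in stores:
        store.put(key, data)
```

A real SuperCloud builder, of course, has to do this for compute, networking, security and the rest, and optimize each implementation for cost, performance and governance; the sketch only shows the shape of the abstraction.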
That third model is our understanding, from the conversation with Benoit Dageville, of how Snowflake approaches its solution. And for now, we're going to park the service models. We need more time to flesh that out and we'll propose something shortly for you to comment on. Now, we peppered Benoit Dageville at SuperCloud 22 to test how the Snowflake data cloud aligns to our concepts and our definition. Let me also say that Snowflake doesn't use the term SuperCloud; they use the term data cloud. They really want to respect, not denigrate, the importance of their hyperscale partners, and so do we. But we do think the hyperscalers, today anyway, are not building what we call SuperClouds; rather, the people who are building SuperClouds are building them on top of hyperscale clouds. That is a prerequisite. So here are the questions that we tested with Snowflake. First question: how does Snowflake architect its data cloud and what is its deployment model? Listen to Dageville talk about how Snowflake has architected a single system. Play the clip.

There are several ways to do this, you know, SuperClouds, as you name them. The way we picked is to create one single system, and that's very important, right? There are several ways, right? You can instantiate your solution in every region of a cloud, and potentially that region could be AWS, that region could be GCP. So you are indeed a multi-cloud solution. But Snowflake, we did it differently. We are really creating cloud regions which are superimposed on top of the cloud provider, you know, region, infrastructure region. So we are building our regions. But where it's very different is that each region of Snowflake is not one instantiation of our service. Our service is global by nature. We can move data from one region to the other. When you land in Snowflake, you land into one region, but you can grow from there and you can, you know, exist in multiple clouds at the same time. And that's very important, right? It's not, I mean, different instantiations of a system; it's one single instantiation which covers many cloud regions and many cloud providers.

Snowflake chose the most advanced of our three deployment models. Dageville contrasted it with the second model, and presumably Snowflake went this route so it could maintain maximum control and ensure that common experience, like the iPhone model. Next, we probed about the technical enablers of the data cloud. Listen to Dageville talk about Snowgrid. He uses the term mesh, and this can get confusing with Zhamak Dehghani's data mesh concept. But listen to Benoit's explanation.

Well, as I said, you know, first we start by building, you know, Snowflake regions. We have today 30 regions that span, you know, the world. So it's a worldwide system with many regions. But all these regions are connected together. They are, you know, meshed together with our technology. We name it Snowgrid. And that mesh means that, you know, an Azure region can talk to an AWS region or a GCP region. And as a user of our cloud, you don't really see these regional differences, that, you know, regions are potentially in different clouds. When you use Snowflake, you can exist, your presence as an organization can be, in several regions, several clouds if you want, both geographic and cloud provider. So I can share data irrespective of the cloud, as long as I'm in the Snowflake data cloud, is that correct? I can do that today? Exactly. And that's very critical, right? What we wanted is to remove data silos.
And when you instantiate a system in one single region, and that system is locked in that region, you cannot communicate with other parts of the world. You are locking data in one region, right? And we didn't want to do that. We wanted, you know, data to be distributed the way the customer wants it to be distributed across the world, and potentially sharing data at world scale.

Now, maybe there are other ways to skin this cat, meaning perhaps if a platform does instantiate in multiple places, there are still ways to share data, but this is how Snowflake chose to approach the problem. Next question: how do you deal with latency in this big global system? This is really important to us because, while Snowflake has some really smart people working as engineers and the like, we don't think they've solved the speed of light problem. They've got their best people working on it, as we often joke. Listen to Dageville's comments on this topic.

So, yes and no. The way we do it, it is very expensive to do that, because generally if you want to join, you know, data which are in different regions and different clouds, it is going to be very expensive, because you need to move data every time you join it. So the way we do it is that you replicate the subset of data that you want to access from one region into the other regions. So you can create this data mesh, but data is replicated to make it very cheap and very performant too. And is it Snowgrid, does that have the metadata intelligence to actually perform that? Can you describe that a little bit? Yes, Snowgrid is both a way to exchange metadata, so each region of Snowflake knows about all the other regions of Snowflake. Every time we create a new region, the metadata is distributed over our data cloud. Not only does each region know all the regions, it knows every organization that exists in our cloud, where this organization is, where data can be replicated by this organization. And then of course it's also used as a way to exchange data. So you can exchange data at scale, and I was just receiving an email from one of our customers who moved more than four petabytes of data cross-region, cross-cloud-provider, in a few days. And it's a lot of data, so it takes some time to move, but they were able to do that online, completely online, and switch over to the other region; failover is very important also.

So when he says yes and no, that probably means no. It sounds like Snowflake is selectively pulling small amounts of data and replicating it where necessary. But you also heard him talk about the metadata layer, which is one of the essential aspects of SuperCloud.
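For readers who want to see what that replicate-then-share pattern looks like in practice, here is a hedged sketch of cross-region, cross-cloud database replication driven from Python via the Snowflake connector. The account identifiers and database names are placeholders, and the SQL is illustrative; the exact replication syntax and required privileges should be checked against Snowflake's current documentation.

```python
# A hedged sketch of replicating a database from a primary account in one
# cloud/region to a secondary account in another, roughly following
# Snowflake's database replication SQL. Identifiers are made up.
import snowflake.connector

# Connect to the primary account (say, an AWS region). Credentials are placeholders.
primary = snowflake.connector.connect(
    account="myorg-aws_useast", user="admin", password="***"
)
primary.cursor().execute(
    # Allow a secondary account (say, an Azure region) to host a replica.
    "ALTER DATABASE sales ENABLE REPLICATION TO ACCOUNTS myorg.azure_westeu"
)

# Connect to the secondary account, materialize the replica, and refresh it.
secondary = snowflake.connector.connect(
    account="myorg-azure_westeu", user="admin", password="***"
)
cur = secondary.cursor()
cur.execute("CREATE DATABASE sales AS REPLICA OF myorg.aws_useast.sales")
cur.execute("ALTER DATABASE sales REFRESH")  # pull the latest changes across clouds
```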
Okay, next we dug into security. It's one of the most important issues and, we think, one of the hardest parts related to deploying SuperCloud. We've talked about how the cloud has become the first line of defense for the CISO, but now with multi-cloud you have multiple first lines of defense, and that means multiple shared responsibility models, multiple tool sets from different cloud providers, and an expanded threat surface. So listen to Benoit's explanation here, please play the clip.

This is a great question. Security has always been the most important aspect of Snowflake since day one, right? This is the question that every customer of ours has, you know, how can you guarantee the security of my data? And so we secure data really tightly in region. We have several layers of security. It starts by encrypting every data at rest, and that's very important. A lot of customers are not doing that, right? You hear these attacks, for example, on cloud, you know, where someone left, you know, their buckets open and then, you know, you can access the data because it's not encrypted. So we are encrypting everything at rest. We are encrypting everything in transit. So a region is very secure. Now, you know, from one region, you never access data from another region in Snowflake. That's why also we replicate data. Now the replication of that data across regions, or the metadata for that matter, is really highly secure. So Snowgrid ensures that everything is encrypted. Everything is, you know, we have multiple, you know, encryption keys, and they are, you know, stored in hardware security modules. So we built, you know, Snowgrid such that it's secure and it will always be a secure movement of data.

So when we heard this explanation, we immediately went to the lowest common denominator question, meaning when you think about how AWS, for instance, deals with data in motion or data at rest, it might be different from how another cloud provider deals with it. And what about differences, for example, in the AWS maturity model for various cloud capabilities? Let's say they've got a faster Nitro or Graviton; how does Snowflake deal with that? Does it have to slow everything else down, like a caravan crossing the desert at the pace of its slowest truck? Let's listen.

It's a great question. I mean, of course our software is abstracting, you know, all the cloud providers' infrastructure, such that when you run in one region, let's say AWS or Azure, it doesn't make any difference as far as the applications are concerned. And this abstraction, of course, is a lot of work. I mean, really, really a lot of work, because it needs to be secure. It needs to be performant on, you know, every cloud, and it has, you know, to expose APIs which are uniform. And, you know, cloud providers, even though they have potentially the same concept, let's say blob storage, the APIs are completely different. The way, you know, these systems are secured is completely different. The errors that you can get and the retry, you know, mechanism is very different from one cloud to the other. The performance is also different. We discovered that when we were starting to port our software, and, you know, we had to completely rethink how to leverage blob storage in this cloud versus that cloud, just because of performance too. And so we had, you know, for example, to stripe data. So all this work is work that, you know, you don't need to do as an application, because our vision, really, is that applications which are running in our data cloud can, you know, be abstracted from all these differences, and we provide all the services, all the workloads that these applications need, whether it's transactional access to data, analytical access to data, you know, managing logs, managing metrics. All of these are abstracted too, such that they are not, you know, tied to one particular service of one cloud, and distributing these applications across, you know, many regions, many clouds, is very seamless.
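Dageville's point that each cloud has different errors and retry mechanisms is worth making concrete. Here is a toy sketch, ours rather than Snowflake's, of normalizing transient failures from two providers' SDKs behind a single retry policy; the specific error codes chosen are illustrative.

```python
# A toy sketch (not Snowflake's code) of presenting one failure model to
# callers by normalizing each cloud SDK's transient errors and retrying them.
import time

from botocore.exceptions import ClientError                 # AWS SDK errors
from google.api_core.exceptions import ServiceUnavailable   # GCP client errors


class TransientCloudError(Exception):
    """Provider-neutral 'safe to retry' error."""


def _normalize(exc: Exception) -> Exception:
    # AWS: throttling and server-side errors surface as ClientError with a code.
    if isinstance(exc, ClientError):
        code = exc.response.get("Error", {}).get("Code", "")
        if code in ("SlowDown", "ThrottlingException", "InternalError"):
            return TransientCloudError(code)
    # GCP: the client library raises typed exceptions for 503-style failures.
    if isinstance(exc, ServiceUnavailable):
        return TransientCloudError("ServiceUnavailable")
    return exc  # everything else is surfaced unchanged


def with_retries(fn, attempts: int = 5, base_delay: float = 0.5):
    """Run fn(); retry only errors classified as transient, with backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # non-transient errors are re-raised below
            err = _normalize(exc)
            if not isinstance(err, TransientCloudError) or attempt == attempts - 1:
                raise err
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
```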
So from that answer, we know that Snowflake takes care of everything, but we really don't understand the performance implications, you know, in that specific case. But we feel pretty certain that the promises Snowflake makes around governance and security within their data sharing construct will be kept. Now, another criterion that we've proposed for SuperCloud is a super PaaS layer to create a common developer experience and an enabler for ecosystem partners to monetize. Please play the clip. Let's listen.

We built it, you know, as a custom build, because as you said, you know, what exists in one cloud might not exist in another cloud provider, right? So we have to build, you know, all these components that a modern application, a modern data application, needs, and that, you know, goes to machine learning, as I said, transactional, analytical systems, the entire thing, so that it can run in isolation, basically. And the objective is the developer experience will be identical across those clouds? Yes, the developer doesn't need to worry about the cloud provider, and actually our system, we have, we didn't talk about it, but the marketplace that we have, which allows actually to deliver... We're getting there. Yeah. Okay.

Now, we're not going to go deep into ecosystem today. We've talked about Snowflake's strengths in this regard, but Snowflake pretty much ticked all the boxes on our SuperCloud attributes and definition. We asked Dageville to confirm that this is all shipping and available today, and he also gave us a glimpse of the future. Play the clip.

And we are still developing it. You know, transactional, you know, Unistore, as we call it, was announced at last Summit. So these are still, you know, works in progress, but that's the vision, right? And that's important, because we talk about the infrastructure, right? You mentioned a lot about storage and compute, but it's not only that, right? When you think about applications, they need to use a transactional database. They need to use an analytical system. They need to use, you know, machine learning. So you need to provide also all these services, which are consistent across all the cloud providers.

So you can hear Dageville talking about expanding beyond taking advantage of the core infrastructure, storage and networking, et cetera, and bringing intelligence to the data through machine learning and AI. So of course, there's more to come. And there had better be, at this company's valuation, despite the recent sharp pullback in a tightening Fed environment. Okay, so I know it's cliche, but everyone's comparing Snowflake and Databricks. Databricks has been pretty vocal about its open source posture compared to Snowflake's. And it just so happens that we had Ali Ghodsi at SuperCloud 22 as well. He wasn't in studio; he had to do a remote because I guess he was presenting at an investor conference this week, so we had to bring him in remotely. Now, I didn't get to do this interview, John Furrier did, but I listened to it and captured this clip about how Databricks sees SuperCloud and the importance of open source. Take a listen to Ghodsi.

Yeah, let me start by saying we're big fans of open source. We think that open source is a force in software that's going to continue for decades, hundreds of years, and it's going to slowly replace all proprietary code in its way. We saw that it could do that with the most advanced technology: Windows, a proprietary operating system, very complicated, got replaced with Linux.
So open source can pretty much do anything. And what we're seeing with the data lakehouse is that slowly the open source community is building a replacement for the proprietary data warehouse, data lake, machine learning, real-time stack, in open source. And we're excited to be part of it. For us, Delta Lake is a very important project that really helps you standardize how you lay out your data in the cloud. And it comes with a really important protocol called Delta Sharing, that enables you, in an open way, actually for the first time ever, to share large data sets between organizations. But it uses an open protocol. So the great thing about that is you don't need to be a Databricks customer. You don't even need to like Databricks. You just need to use this open source project and you can now securely share data sets between organizations, across clouds. And it actually does so really efficiently, just one copy of the data. So you don't have to copy it if you're within the same cloud.

So the implication of Ali Ghodsi's comments is that Databricks, with Delta Sharing, as John implied, is playing a long game. Now, I don't know enough about the Databricks architecture to comment in detail; I've got to do more research there. So I reached out to my two analyst friends, Tony Baer and Sanjeev Mohan, to see what they thought, because they cover these companies pretty closely. Here's what Tony Baer said. Quote: I've viewed the divergent lakehouse strategies of Databricks and Snowflake in the context of their roots. Prior to Delta Lake, Databricks' prime focus was compute, not the storage layer. And more specifically, they were a compute engine, not a database. Snowflake approached from the opposite end of the pool, as they originally fit the mold of the classic database company rather than a specific compute engine per se. The lakehouse pushes both companies outside of their original comfort zones: Databricks to storage, Snowflake to the compute engine. So it makes perfect sense for Databricks to embrace the open source narrative at the storage layer and for Snowflake to continue its walled garden approach. But in the long run, their strategies are already overlapping. Databricks is not a 100% open source company. Its practitioner experience has always been proprietary, and now so is its SQL query engine. Likewise, Snowflake has had to open up with support of Iceberg as an open data lake format. The question really becomes how serious Snowflake will be in making Iceberg a first-class citizen in its environment, which is not necessarily officially branded a lakehouse but effectively is. And likewise, can Databricks deliver the service levels associated with walled gardens through a more brute force approach that relies heavily on the query engine? At the end of the day, those are the key requirements that will matter to Databricks and Snowflake customers. End quote. That was some deep thought by Tony, thank you for that.

Sanjeev Mohan added the following. Quote: Open source is a slippery slope. People buy mobile phones based on open source Android, but it's not fully open. Similarly, Databricks' Delta Lake was not originally fully open source, and even today its Photon execution engine is not. We are always going to live in a hybrid world. Snowflake and Databricks will support whatever model works best for them and their customers. The big question is, do customers care as deeply about which vendor has a higher degree of openness as we technology people do? I believe customers' evaluation criteria are far more nuanced than just deciphering each vendor's open source claims. End quote.
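Before we leave the open source discussion, here is a minimal consumer-side sketch of the Delta Sharing flow Ghodsi described, using the open source delta-sharing Python connector. The profile file and the share, schema, and table names below are hypothetical placeholders; in practice the data provider issues the profile file.

```python
# A minimal consumer-side sketch of Delta Sharing using the open source
# delta-sharing Python connector; names and the profile path are hypothetical.
import delta_sharing

profile = "path/to/provider-issued.share"   # credentials file from the data provider

# Discover what the provider has shared with you.
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Load one shared table into pandas: no Databricks account required on the
# consumer side, and the provider keeps a single copy of the data.
df = delta_sharing.load_as_pandas(f"{profile}#retail_share.sales.orders")
print(df.head())
```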
Okay, so I had to ask Dageville about their so-called walled garden approach and what their strategy is with Apache Iceberg. Here's what he said.

Iceberg is very important. So just to give some context, Iceberg is an open table format which was first developed by Netflix, and Netflix open sourced it in the Apache community. So we embrace that open standard because it's widely used by many companies, and also many companies have really invested a lot of effort in building big data Hadoop solutions or data lake solutions, and they want to use Snowflake, but they couldn't really use Snowflake because all their data were in open formats. So we are embracing Iceberg to help these companies move to the cloud. But why have we been reluctant about direct access to data? Direct access to data is a little bit of a problem for us, and the reason is, when you have direct access to data, now you have direct access to storage. Now you have to understand, for example, the specificity of one cloud versus the other. So as soon as you start to have direct access to data, you lose your cloud-agnostic layer. You don't access data with an API. When you have direct access to data, it's very hard to secure data, because you need to grant direct access to tools which are not protected, and you see a lot of hacking of data because of that. So direct access to data is not serving our customers well, and that's why we have been reluctant to do that, because it's not cloud-agnostic, you have to code for that, you need a lot of intelligence, whereas API access is. So we want open APIs. I guess the way we embrace openness is through open APIs rather than direct access to data.

Here's my take: Snowflake is hedging its bets, because enough people care about open source that they have to have some open data format options, and it's good optics. And you heard Benoit Dageville talk about the risks of directly accessing the data and the complexities it brings. Now, is that maybe a little FUD against Databricks? Maybe, but the same can be said for Ali's comments, maybe FUDing the proprietary nature of Snowflake. But as both analysts pointed out, open is a spectrum. Hey, I remember when Unix used to equal open systems.

Okay, let's end with some ETR spending data, and why not compare Snowflake and Databricks spending profiles? This is an XY graph with net score, or spending momentum, on the Y axis and pervasiveness, or overlap in the dataset, on the X axis. This is data from the January survey, when Snowflake was holding above an 80% net score, off the charts. Databricks was also very strong, in the upper 60s. Now, let's fast forward to this next chart and show you the July ETR survey data, and you can see Snowflake has come back down to Earth. Now, remember, anything above a 40% net score is highly elevated, so both companies are doing well, but Snowflake is well off its highs and Databricks has come down somewhat as well. Databricks is inching to the right; Snowflake rocketed to the right post its IPO. And as we know, Databricks wasn't able to get to IPO during the COVID bubble. Ali Ghodsi is at the Morgan Stanley CEO conference this week. They've got plenty of cash to withstand a long-term recession, I'm told. And they've started to message that they're at a billion dollars in annualized revenue. I'm not sure exactly what that means. I've seen some numbers on their gross margins; I'm not sure what those mean either.
I've seen some numbers on their net revenue retention. Again, we'll reserve judgment until we see an S-1. But it's clear both of these companies have momentum and they're out competing in the market, which will, as always, be the ultimate arbiter. Different philosophies, perhaps. Is it like Democrats and Republicans? Well, it could be, but they're both going after solving data problems. Both companies are trying to help customers get more value out of their data, and both companies are highly valued, so they have to perform for their investors. To paraphrase Ralph Nader, the similarities may be greater than the differences.

Okay, that's it for today. Thanks to the team in Palo Alto for this awesome SuperCloud studio build. Alex Myerson and Ken Schiffman are on production in the Palo Alto studios today. Kristen Martin and Cheryl Knight get the word out to our community. Rob Hof is our editor in chief over at SiliconANGLE. Thanks to all. Please check out ETR.AI for all the survey data. Remember, these episodes are all available as podcasts wherever you listen; just search "Breaking Analysis podcast." I publish each week on wikibon.com and siliconangle.com. You can email me at david.vellante@siliconangle.com, DM me @dvellante, or comment on my LinkedIn posts. And please, as I say, ETR has got some of the best survey data in the business. We track it every quarter and we're really excited to be partners with them. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching, and we'll see you next time on Breaking Analysis.