Hi everyone, welcome back to theCUBE's coverage of AWS re:Invent 2021. We're wrapping up four days of coverage, two sets, two remote sets, one in Boston, one in Palo Alto. And really it's a pleasure to introduce Benoit Dageville. He's the co-founder of Snowflake and president of products. Benoit, thanks for taking some time out and coming on theCUBE.
Yeah, thank you for having me, Dave.
You know, it's really a pleasure. We've been watching Snowflake since maybe not 2012, but mid last decade, you hit our radar and we said, wow, this company is going to go places, and we made that call correctly. But it's been a pleasure to sort of follow you. We've talked a little bit remotely. I kind of want to go back to some of the fundamentals. First of all, I wanted to mention your earnings last night. If you guys didn't see it, again, triple digit growth, $1.8 billion RPO, cash flow actually looking pretty good. So pretty amazing. Oh, and 173% NRR, you know, wow. And Mike Scarpelli's kind of bummed that you did so well. And I know why, right? Because at some point, you know, he dials it down for the expectations and Wall Street says, oh, he's sandbagging. And then at some point you're actually going to meet expectations and people are going to go, oh, they met expectations. But anyway, he's a smart guy. He knows what he's doing. It was so funny listening to him last night. But anyway, I want to go back to, when I talked to practitioners about data warehousing pre-cloud, they would say sound bites like, it's like a snake swallowing a basketball, they would tell me. And the other thing they said, we just chase the chips. Every time a new Intel chip comes out, we have to bring in new servers. And we're struggling. The cloud changed all that. Your vision and Thierry's vision changed all that. Maybe go back to the fundamentals of what you saw.
Yeah, I mean, we really wanted to address what we call the data challenges. And if you remember at that time, you know, that challenge was first the volume of data, machine-generated data. So it was way more than just structured data, right? Machine-generated data is web logs. And it's at petabyte scale. And there was no good solution for that type of data. Big data was not a great solution. Hadoop was really bad. And there was no good solution for that. So we thought we should do something for big data. The other aspect was concurrency, right? Everyone wants to use this data analytic platform in an enterprise, right? And you have more and more workloads running against the same data, and the systems that were built were not scaling, you know, for this workload. So you had to silo data, right? The only way big enterprises could deal with that was to create many different silos, Oracle, Teradata, you know, data marts, you would hear data marts. All of it was to offload, right, this data. And then there was what we call data sharing. How to get access to data which is not born inside the enterprise, right? So with Thierry, we wanted to solve all these challenges, and we thought the only way to solve them was the cloud. And the cloud has really two or three aspects. One is the elasticity. All of a sudden, you can run every workload that you want concurrently, in parallel, you know, on different compute resources, and you can run them against the same data. So this is kind of the data lake, you know, model, if you want. At the same time, you can, you know, in the cloud create a service. So you can remove complexity from users and make it really easy for new workloads to be added to the system, because you can create a managed service where, all of a sudden, our customers don't need to manage infrastructure, they don't need to patch, they don't need to tune. Everything is done by Snowflake, the service, and they can just load and run their queries.
And the third aspect is really collaboration. It is how to connect data sets together. And that's, you know, almost a new product for Snowflake, this data sharing. So really, Snowflake was all about combining big data and data warehouse in one system in the cloud, and having only one single system where you can put all your data and all your workloads.
So you weren't necessarily trying to solve the data warehouse problem, you were trying to solve a data problem. And then it just so happened, data warehouse was a logical entry point for you.
You know, it's really not that. Yes, we wanted to solve the data problem. And for us, big data was a really important problem to solve. So from day one, Snowflake was all about machine-generated data, petabyte scale, but we wanted to do it right. And for us, right was not compromising on data warehouse principles, which is ACIDity of transactions, which is really fast response time, and which is also simplicity. So as I said, we wanted to solve kind of all the problems, you know, at the time: volume of data, concurrency, and this sharing aspect.
And this was 2012, you knew at that time that Hadoop wasn't going to be the answer.
No, I mean, everyone knew that. I mean, everyone knew Hadoop was really bad, you know, complex to manage, really slow. I mean, it had a good aspect, right? This was the only system that could manage, you know, petabyte scale data sets. That's the only thing.
Cheaply.
Yeah, and cheaply, which was good. And we wanted really to do that plus, you know, have all the good attributes of a data warehouse system. And at the same time, we wanted to build a system where, if you are a data warehouse customer, if you are coming from Teradata, you can migrate to Snowflake and you will get a system which is faster than what you had on premise, right? That was critical. So we wanted to do big data without compromising on data warehouse.
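As a rough illustration of the elasticity point in this answer (several workloads on different compute resources, all running against the same data, without giving up transactional commits), here is a toy Python sketch. It is not Snowflake's implementation: `SharedStorage`, `Warehouse`, and `demo` are hypothetical names, an in-memory tuple stands in for cloud storage, and thread pools stand in for independent compute clusters.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """An immutable, versioned view of the shared data."""
    version: int
    rows: tuple


class SharedStorage:
    """Hypothetical stand-in for the single shared copy of the data."""

    def __init__(self, rows):
        self._lock = threading.Lock()
        self._current = Snapshot(0, tuple(rows))

    def snapshot(self):
        # Readers always receive one complete, immutable version.
        return self._current

    def commit(self, new_rows):
        # Publish a new version atomically; later readers on any pool see it.
        with self._lock:
            old = self._current
            self._current = Snapshot(old.version + 1, old.rows + tuple(new_rows))
            return self._current.version


class Warehouse:
    """Hypothetical independent compute pool attached to the shared storage."""

    def __init__(self, name, storage, workers=2):
        self.name = name
        self._storage = storage
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def run(self, query):
        # Each query reads whatever snapshot is current when it starts.
        return self._pool.submit(lambda: query(self._storage.snapshot().rows))

    def close(self):
        self._pool.shutdown(wait=True)


def demo():
    storage = SharedStorage([
        {"region": "us", "amount": 10},
        {"region": "eu", "amount": 5},
    ])
    dashboards = Warehouse("dashboards", storage)  # one workload
    loader = Warehouse("loader", storage)          # another workload, own compute

    def total(rows):
        return sum(r["amount"] for r in rows)

    before = dashboards.run(total).result()
    version = loader.run(
        lambda _rows: storage.commit([{"region": "eu", "amount": 7}])
    ).result()
    after = dashboards.run(total).result()

    dashboards.close()
    loader.close()
    return before, version, after
```

Calling `demo()` runs a read on one pool, a commit on the other, and a second read on the first pool, which then sees the committed row. A real system has to provide the same guarantee across machines and regions, which is the hard part this sketch leaves out.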
So several years ago, we looked at, you know, the hyperscalers and said, wow, I mean, last year they spent $100 billion in CapEx. And so we started to think about this abstraction layer, and then we saw what you guys announced with the data cloud. We call it super cloud. And we see that as exactly what you're building. So that's clearly not just a data warehouse or database. It's technology that really hides the underlying complexity of all those clouds and then allows you to have federated governance and data sharing, all those things. Can you talk about sort of how you think about that architecture?
So for me, what I say is that, really, Snowflake is the World Wide Web of data. And we are indeed a super cloud, or we are superposed on the infrastructure cloud, which is our friends at Amazon and of course, you know, Azure, I mean, Microsoft, and Google. And, you know, as any cloud, we have regions, Snowflake regions, all over the world and located on different cloud providers. At the same time, our platform is global in the sense that every region interconnects with all the other regions. This is our Snowflake Snowgrid, data mesh if you want, such that as an organization, you can have your presence on several Snowflake regions. It doesn't matter which cloud provider. So you can mix AWS with Azure. You can use our cloud like that. And indeed, you know, this is a cloud where you can store all your data. That's the first thing that really matters. And data is structured, but it's also semi-structured, as I say, machine generated, petabyte scale, and it's also unstructured, right? We have added support for images, text, videos, where you can process this data in our, you know, system. And then there is the workloads part. And for workloads, what is very important is that you can run this workload.
Any number of workloads, so the number of workloads is effectively unlimited with Snowflake, because each workload can have its dedicated set of compute resources, all operating on the same data set. And the type of workloads is also very important. It's not only about dashboards and data warehouse, it's data engineering, it's data science, it's building applications. Many of our customers are building full-scale, I mean, cloud applications on top of Snowflake.
Yeah, so the other thing, if you're not familiar with Snowflake, I don't know, maybe your head has been in the sand for a while, but separating compute and storage, I don't know if you were the first, but you were certainly the first to popularize it. And that allowed you to solve that chasing the chips problem and the swallowing the basketball, right? Because you had virtually infinite resources now at your disposal.
Yeah, this is really the concurrency challenge that I was mentioning. Everyone wants to access the data. And of course, if everyone runs on the same set of compute resources, you have a bottleneck. So Snowflake was really about this multi workload, we call it multi-cluster shared data architecture. But it's not difficult to run multiple clusters if you don't have consistency of data. So how to do that while maintaining transactional properties of data, ACIDity, right? You can modify data from different clusters, and when you commit, every other cluster will immediately see the change, right? As if everyone was running on the same cluster. So that was the challenge that we solved when we started Snowflake.
You used the term data mesh. What is data mesh to Snowflake? Is it a concept? Is it a fabric?
No, it's a very interesting point. You know, as much as we like to centralize data, this becomes a bottleneck, right?
When you are a large organization with different independent business units, everyone wants to manage their own data, and they have domain specific expertise about that data. So having it centralized in IT is not practical. At the same time, you really want to be able to connect these different data sets together and join, you know, different data together, right? So that's the data mesh architecture. Each data set is managed independently by business owners, and then there is a contract which is exposed to others, and you can combine. And the Snowflake architecture, with data sharing, right, data sharing that can happen within an organization or across organizations, allows you, you know, to connect any data with any other data on our platform.
Yeah, so when I first heard that term, you guys using the term data mesh, I got very excited, because the data mesh, in my view anyway, is going to be the fundamental architecture of this decade and beyond. And the principles, if I understand it correctly, you're applying the principles of Zhamak Dehghani's data mesh within Snowflake. So decentralized data, it doesn't have to be physically in one place. Logically, it's in the data cloud.
It's logically decentralized, right?
Yes.
It's independently managed. And the reason, right, is that the data you need to use is not all produced by you. Even if in your company you want to centralize the data and have only one organization, let's say IT, managing that, let's pretend, you know, you still need to connect with other data sets which are managed by other organizations. So by nature, the data that you use cannot be centralized, right? So now that you have this principle, if you have a platform where you can store all the data, you know, wherever it is, and you can connect this data very seamlessly, then you can use that platform for your own enterprise, right?
To have different business units independently manage their data sets and connect these together, so that as a company you have a 360 view of your customers, for example. But you can expand that, you know, outside of your enterprise and connect with data sets which are from your vertical, for example a financial data set that you don't have in your company, or any public data set.
And the other key principle I think that you've touched on is really that it's the line of business now, increasingly they're building data products, all right, that are creating value, and then also there's a self-service component. And then there's the fourth principle, governance, you got to have federated governance. And it seems like you've kind of ticked the boxes, and more than ticked the boxes, but engineered a solution to solve for those.
No, it's very true. So Snowflake was really built to be really simple to use, and you're right, our vision was that it will be more than IT, right, who is going to use, you know, Snowflake. It is now going to be business units, because you do not have to manage infrastructure, you do not have to patch, you do not have to do these things that business, you know, cannot do. You just have to load your data and run your queries and run your applications. So now, you know, business can directly use Snowflake and create value from that, and yes, you're right, then connect their data with other, you know, data sets to get, you know, maximum insight.
Can you please talk about some of the things you're doing with AWS here at the event? I'm interested in what you're doing with your machine learning initiatives that you've recently announced, the AI piece.
Yes, so one key aspect is that data is not only about SQL, right? We started with SQL, but we expanded, you know, our platform to what we call data programmability, which is really about running programs at scale across, you know, large volumes of data. And this was made popular with a programming model which was introduced by pandas DataFrames, later, you know, taken up by Spark, and now we have DataFrames in Snowflake. Where we are different from other systems is that these DataFrame programs, which are, you know, in Python or Java or Scala, where you program with data, these DataFrames are compiled to our single execution platform. So we have one single execution platform, which is a dataflow execution platform, which can run both SQL very efficiently, as I said, you know, at data warehouse speed, and also these very complex, you know, programs running, you know, Python and Java against, you know, this data. And this is a single platform. You don't need to use two different systems.
Yeah, so now, you kind of really attacked the traditional analytics space. People said, wow, Snowflake's really easy. Now you're injecting AI and machine intelligence. I see Databricks coming at it from the other angle. They started with machine learning, now they're sort of going after the analytics. Does there need to be a semantic layer to connect, because it's the same raw data, does there need to be a semantic layer to connect those two worlds?
Yes, and that's what we are doing in our platform, and that's very novel to Snowflake. As I said, you interact with data in different programs, you pick your program. You are a SQL programmer, you know, use SQL. You are a Python programmer, use DataFrames with Python.
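The idea in this exchange, that a SQL query and a DataFrame program can be compiled to one shared execution platform, can be sketched with a toy example in Python. Everything here is an assumption made for illustration: `Plan`, `DataFrame`, `plan_from_sql`, and `execute` are hypothetical names, the SQL parser handles only one simple `SELECT ... FROM ... WHERE` shape, and none of it reflects Snowflake's actual compiler or API.

```python
import operator
import re
from dataclasses import dataclass
from typing import Optional, Tuple

OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


@dataclass(frozen=True)
class Plan:
    """Toy internal representation shared by both front ends."""
    table: str
    columns: Tuple[str, ...]
    predicate: Optional[Tuple[str, str, int]] = None  # (column, op, value)


class DataFrame:
    """Minimal DataFrame-style builder that only records a Plan."""

    def __init__(self, table, columns=(), predicate=None):
        self._plan = Plan(table, tuple(columns), predicate)

    def filter(self, column, op, value):
        return DataFrame(self._plan.table, self._plan.columns, (column, op, value))

    def select(self, *columns):
        return DataFrame(self._plan.table, columns, self._plan.predicate)

    def plan(self):
        return self._plan


_SQL = re.compile(
    r"SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<col>\w+)\s*(?P<op>[<>=])\s*(?P<val>-?\d+))?\s*$",
    re.IGNORECASE,
)


def plan_from_sql(sql):
    """Lower a very small SQL subset to the same Plan type."""
    m = _SQL.match(sql.strip())
    if m is None:
        raise ValueError("unsupported SQL in this toy parser")
    columns = tuple(c.strip() for c in m["cols"].split(","))
    predicate = (m["col"], m["op"], int(m["val"])) if m["col"] else None
    return Plan(m["table"], columns, predicate)


def execute(plan, tables):
    """Single execution path, regardless of which front end built the plan."""
    rows = tables[plan.table]
    if plan.predicate is not None:
        column, op, value = plan.predicate
        rows = [r for r in rows if OPS[op](r[column], value)]
    return [{c: r[c] for c in plan.columns} for r in rows]
```

With this sketch, `plan_from_sql("SELECT region, amount FROM sales WHERE amount > 5")` and `DataFrame("sales").filter("amount", ">", 5).select("region", "amount").plan()` produce equal `Plan` values, so `execute` runs the same work for both. Any optimization made in `execute` then benefits both the SQL and the DataFrame front end, which is the point of funneling them into one engine.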
It doesn't really matter. And then the semantic layer is our compiler, and our processing engine is going to translate both programs, my program in Python, your program in SQL, to the same execution platform and to the same programming language that Snowflake uses internally. We don't expose that programming language, but it's a dataflow programming language that our execution platform executes. So at the end, you know, we might execute exactly the same program, potentially, right? And that's very important, because we spend all our IP and all our engineering time to optimize this platform, to make it, you know, the fastest platform, and we want to use that platform for any type of workload, you know, whether it's data programs or SQL.
Now, you and Thierry were at Oracle, so you know a lot about benchmarking, as Larry would stand up and say, we killed the competition. You guys were probably behind it, right?
We were very much behind it.
Yeah, so you know a lot about that. I've had some experience. I'm not a technologist, but I'm an observer and analyst. You have to take benchmarking with a very big grain of salt. So you guys have generally stayed away from that. Databricks came out and they came up with all these benchmarks. So you had to respond, because otherwise it's just out there. Now you re-ran the benchmarks, you took out the materialized views and all the expensive stuff that they included in your cost, you know, your price performance. But you wrote, I thought, a very cogent blog. Maybe you could talk about sort of why you did that and your general philosophy around benchmarking.
Yeah, yeah, you know, from day one with Thierry, we said never again will we participate in this, you know, really stupid benchmark war, because it's really not in the interest of customers. And we have been really at the front line of that war with Thierry, both of us, you know, really doing special tricks, right?
And optimizing these queries to death, these queries that no one runs apart from in this synthetic benchmark. We optimized them to death to have the best number when we were at Oracle. And we decided that this is really not helping customers in the end. So we said we at Snowflake will not, you know, do that. And actually we are not the only ones not to do that. If you look at who has published TPC-DS, you will see, you know, no one, none of the big vendors. It's not because they cannot run TPC-DS, you know, Oracle can run it. I know that, and all the other, you know, big data warehouse vendors can. But, you know, it's a little bit a thing of the past. And TPC was really important at some point, and it's not really relevant now. So we are not going to compete. And that's what we said basically in our blog. We are not interested in participating in this war. We want to invest our engineering effort and our IP in solving real world, you know, issues and performance issues that we have. And we want to improve our engine, you know, for these real world, you know, customers. And the nice thing with Snowflake, because it's a service, is that we see exactly, you know, all the queries that our customers, you know, are executing. So we know where we are struggling as a system. And that's where we want to invest. And we want to improve. And if you look at, you know, many announcements that we made, it's all about, you know, under the cover improving, you know, Snowflake and getting the benefit of this improvement to our customers. So that was the message of that blog. And the other message was, okay, you know, Mr. Databricks, you know, it's nice. I mean, everyone makes a decision, right? We made a decision not to participate. Databricks made another decision, which is very fine. And that's fine that they publish, you know, their number on their system.
What is not fine is that they publish numbers, you know, using, you know, Snowflake and really misrepresenting our performance. And that's what we wanted also to correct.
Yeah. Well, thank you for going into that. Look, leaders don't necessarily have to get involved in that mudslinging.
Yeah. And we, you know...
Right. There was a lot of back and forth. Enough said about that. So that's cool. I want to ask you, I interviewed Frank last spring, right after the lockdown, he was kind enough to come on virtually. And I asked him about on-prem. And, you know, Frank, he doesn't mince words. He said, we're not getting into a halfway house. That's not going to happen. And of course, you really can't do what you do on-prem. You can't separate compute. Some have tried, but it's not the same. But at the same time, you see, like, Andreessen comes out with this blog that says a huge portion of your cost of goods sold is going to be the cloud, so you're going to have to repatriate. Help me square that circle. Is it cloud forever? Will you never say never? Can you speak to that?
I will never say never. It's not my style. You know, I always say you can always change your mind, and maybe different factors, you know, can change your mind. What was true at some point might not be true, you know, at a later point. But as of now, I don't see any reason for us to go on-premise. As you mentioned at the beginning, right, Snowflake is growing like crazy. The world is moving to the cloud. I think, you know, maybe it goes both ways, but I would say 90% or 99% of the world is moving to the cloud. Maybe 1% is coming back for some, you know, very specific reasons. I don't think that the world is going to move back on-premise. So in the end, we might miss, you know, a small percentage of the workload that will stay on-premise, and that's okay.
Yeah, so, and as well, if you dig into some of the financial statements and read the notes, you'll see where you've renegotiated, right? I mean, we're talking big numbers. I mean, hundreds and hundreds of millions of dollars of cost reduction, actually more over a 10-year period. Billions off your cloud bills. So the cloud suppliers, I mean, they don't want to lose you as a customer, right? You're one of their biggest customers. So it's awesome. Last question is kind of, I mean, your work now is to really drive the data cloud, get adoption up, you know, build that super cloud, we call it. Maybe you could talk a little bit about, you know, how you see the future.
I mean, the future is really to broaden, you know, the scope of Snowflake, and really, I would say, the marketplace and our, you know, data sharing and services, you know, which are built natively on Snowflake and are shared, you know, through our platform and can mix, you know, data on the provider side with data on the consumer side. Creating this collaboration within the, you know, Snowflake data cloud, I think, is really the future. And we are really only scratching the surface of that. And you can see the enthusiasm for the Snowflake data cloud in, you know, vertical industries. We now have the financial, you know, data cloud, you know, a complete vertical industry latching on to that concept and collaborating, you know, via Snowflake, which was not possible before. And I think, you know, you talked about machine learning, for example. Collaboration through machine learning: the ones who are building these advanced models might not be the same as the ones who are consuming these models, right? It might be this collaboration between, you know, expertise and the consumer of that expertise.
So we are really at the beginning of this interconnected world, and to me, the World Wide Web of data that we are creating is really going to be amazing, and it's all about connecting.
Yeah, I'm glad you mentioned the ecosystem. I didn't give enough attention to that, because as a cloud provider, which essentially you are, you've got to have a strong ecosystem. That's a hallmark of cloud. And then the other vertical that we didn't touch on is media and entertainment, a lot of direct to consumer. I think healthcare is going to be a huge vertical for you guys. All right, we got to go, Thierry. I mean, Benoit, thanks so much for coming on theCUBE. We really appreciate your time. And thank you for watching. This is a wrap from AWS re:Invent 2021. theCUBE, the leader in global tech coverage. We'll see you next time.