 Extracting the signal from the noise, it's theCUBE. Covering VMworld 2015. Brought to you by VMware and its ecosystem sponsors. Now your host, Jeff Frick. Hey, welcome back everybody. Jeff Frick here with theCUBE. We are live in San Francisco, Moscone North, stopped by the lobby. It's not quite as busy without the keynotes as it's been the last couple of days, but we're excited for day three wall-to-wall coverage from VMworld 2015. Joined in my next segment by George Gilbert from Wikibon, George, good to see ya. And Bob Muglia, CEO of Soft Flake Computing, welcome. Snowflake. Snowflake, I'm sorry, Snowflake Computing. It's live TV and ice cream are probably more than anybody. So welcome, Bob. Glad to be here. So for people that aren't familiar with Snowflake, why don't you give them kind of the quick overview of what you guys are up to? We're a cloud data warehouse company. We were founded about three years ago. We built a complete SQL relational database, a data warehouse, from scratch. So it's all new database code. We don't use any pre-existing things like Hadoop or Postgres or anything like that in creating it. It's an all-new system. It was designed for the cloud and it's an incredible data warehouse. I think it's really, like, one of the best products I've ever experienced, and it's solving a lot of problems for our customers. And then where are you kind of in the life of the company in terms of, you're obviously GA, how many people, kind of funding, give us kind of that input? We've completed our Series C funding, which was earlier this year. That was 45 million; we've raised a total of 76 million. About 85 people in the company, and as I said, three years old. The company was founded by the three founders, all architects: two Oracle data warehouse architects that had been in the data warehousing group for a long time, and then the third founder is a performance expert. So Bob, you have a rather illustrious background. 
Why don't you tell us a little about sort of your history and what brought you back to the startup world? Sure, thanks George. So yeah, the thing for me is I spent 23 years at Microsoft. I was the first technical guy on SQL Server back in 1988, when it was the Ashton-Tate/Microsoft SQL Server on OS/2, if anyone can remember that. And then I had the good fortune to work on and run product teams within Microsoft for those 23 years. So for the last eight years, seven years at Microsoft, I was the president of the Server and Tools group. So I ran Windows Server, SQL Server, Visual Studio, all of our management and security products. And then I left there, I spent a couple of years at Juniper, and decided I really wanted to focus on building things. And what I saw looking at the big tech industry is a lot of big tech companies that have strong legacy businesses, but they're not doing the innovation, and all the innovation was happening in small companies. So I sought out a small company that was doing something really unique and different, and Snowflake was a perfect fit. So continuing on that theme, Snowflake as a cloud database, what distinguishes a cloud database from packaged software? Well, it's interesting. I mean, there's different kinds of cloud solutions. People think of infrastructure as a service, platform as a service, and then software as a service. We're software as a service. So we're fully turnkey, like Salesforce. You load data, you run queries. There's no administration, no DBA work to do. There's no indices to create. There's no keys that need to be built to do distribution across different nodes. We handle all of that for the customer. And that's pretty different than other cloud data warehouses. And it's certainly very different than getting an appliance, a data warehouse appliance, or software that's installed on a set of machines within a data center, where really those tasks are taken on by the IT department. 
So, okay, let's talk about first Oracle, which, you know, it's there because it's... It's there. It's there. They want to put it on their cloud now, but still, at its heart, it's a packaged product. So whether they manage it or whether the user manages it, perhaps Oracle has more sophistication to bring to bear. All those activities you just mentioned, all those administrative tasks, still need to be done. Well, it depends, right? With Oracle, they absolutely still need to be done. And in fact, you know, one of our founders, Benoit Dageville, in particular, worked on Oracle management on Exadata. That was his, you know, function: to help make Oracle easier to use. And he found it to be a relatively fruitless and hopeless task. And so what winds up happening is, while you can make it a bit easier, there's still a lot of work to do because of the inherent architecture of Oracle. And when Oracle talks cloud, they always mean managed services. I mean, that's really what they're saying when they talk about Oracle cloud products. Snowflake is totally different. We were designed from the ground up to be a full scale-out SaaS service to support thousands and thousands of customers simultaneously with effectively no management tasks. All of that tuning and all the different knobs and dials to be set, that's all replaced with an architecture that's designed to not require those settings. So let me move from Oracle to something a little sort of further along the service spectrum, but maybe not all the way with the scalability. Microsoft SQL Azure, you know, data services. I heard, I think at PASS Summit last year, they have like a million five instances, you know, last year. Of SQL Server. Yeah, of SQL Server. Right, very successful product. I love it. I have some affinity for SQL Server since I was there at the very beginning. So, you know, if you had sort of a DBA managing 50 databases, you'd still need 30,000 DBAs there. 
So they must have done a fair amount of automation. You know, they've done some for sure, and they will make it a bit easier to run in the cloud environment, but most of the administrative tasks that are present in SQL Server will still be present in SQL Azure Data Warehouse. And so you'll still need a DBA. Just like with Redshift, you still need a DBA. The existing products were all built... You know, if you look at competitive cloud technologies other than Snowflake in the data warehousing space, they're all existing packaged products that have been picked up and put into the cloud. And so the fundamental characteristics remain the same. If you look, I mean, sure, Microsoft or Amazon will provision the instances for you. That's great. And that helps for sure. You don't have to wheel in a new piece of hardware. But then from that point on, there's a lot of management tasks associated with that. Let's then key in on Redshift, because John Furrier had, I think, dinner with Andy Jassy a couple of weeks ago, and he singled out Redshift as sort of on fire, and we've heard that from, you know, our startup contacts in the Valley. We know that technology came partly from, you know, ParAccel, and then they built a lot on it. What still is exposed in terms of administrative knobs that, like... Basically all of them. Really? I mean, what Amazon did is they acquired rights to ParAccel and they hosted it in the AWS cloud environment. They've done a very good job of doing that. So it's super easy in Amazon to instantiate a new Redshift cluster. But that's kind of where it ends. I mean, they help you back it up. There's a few things they do, but all of the administrative tasks you still have to do: you still have to vacuum it, you still have to manage it, you still have to determine your distribution keys. All of the things that you have to do with ParAccel, or really with any shared-nothing database, you have to do with Redshift. 
And, you know, that's one of the big differentiators that Snowflake has: all of those tasks don't exist. We don't use a traditional architecture like shared disk or shared nothing. In fact, we have a new architecture that has never existed before, that we call multi-cluster shared data, that essentially makes this administrative work go away and provides us with an incredible degree of elasticity and almost limitless scalability. Okay, so let's try and quantify that. Whether it's in DBAs, or databases per DBA, that, like, Redshift requires, or Azure SQL database services, or, you know, Snowflake, or just dollars per terabyte for running costs. So, I mean, you take whatever it is for on-premise and you say, okay, it's almost the same in the cloud, except you don't need to have the person ordering the hardware and installing the hardware and connecting wires. That's helpful, right? But the administrative tasks for Azure, SQL Azure Data Warehouse, or for Redshift or for other products, certainly things like Hadoop, God forbid, are essentially the same in a cloud environment as they are in an on-premise environment. So that's those comparisons. In Snowflake, you don't need a DBA. So the equation is divide by zero. There's no need to have a DBA in Snowflake. Is there, just so we can get a sense of the metrics? I understand you can say zero for Snowflake, but sort of how would you measure the running cost, you know, in DBAs per terabyte, DBAs per database, you know, or just operational costs, total cost of ownership? Well, I mean, the first thing about all of these products is, what does it cost to acquire the system, right? There's the hardware cost, there's a software cost, maybe it's packaged as an appliance. And if you look at traditional enterprise data warehouses that exist on-premise, so whether it's Oracle or Teradata or Netezza, you go down the list, those things are expensive. They are just really expensive. 
So comparing them to a cloud solution, it's almost an unfair comparison. It really is, because the cost that the customer will bear is a fraction of what they would pay for the software and hardware alone, let alone the administrative cost. And then there's further savings, as I said, from an administrative perspective. Can you help us quantify? I know I'm repeating myself, I just want to get something. Let me give you an example, let me give you an example. We have had customers come to us that have Oracle maintenance, Oracle systems that are approaching the end of their useful life, and, you know, Oracle has quoted them like a $6 million replacement cost. And their annual maintenance on that is north of a million dollars, right? Because of the God-given 22% maintenance cost that Larry charges all customers. One of Larry's rules. So if you look at that, you could get the same level of performance from Snowflake, in fact better performance on Snowflake, for that installation on an annual basis for well under half a million dollars a year. All in, operational costs. And that doesn't, you know, and that's, you know, of course, with the Oracle system, you're not talking about power, you're not talking about data center costs, you're not talking about operational costs. So you add all of that up. I mean, you're talking about a lifetime cost over a five or six year system of 10, 12, you know, million dollars at least. We would be able to solve that customer problem all in upside down for well under, you know, well under two and a half million dollars. And so often it's a quarter or even less. And I'm being conservative. I want to be clear. I'm being conservative in these numbers. The actual numbers are probably better. So that's a good transition kind of to the business of what you guys are doing. So how much of your business is because you've got the special cloud native system ready to go for new apps. 
And then there's always a debate about kind of rip and replace versus, you know, what's already up and running. We don't want to touch it, migrate it to the cloud. And maybe, maybe not versus native, of course we'll put it there. But it sounds like you're getting some activity and people actually swapping out legacy and moving on. We are, we're in the middle. In fact, I have a call this afternoon with one of our customers that is doing an Oracle migration, a large Exadata migration. And we're in the process of actually implementing that solution for the customer. We see several things. I mean, our sort of initial customer base came from people that had data already in the cloud and were familiar with the cloud. So often tech companies, advertising, media, gaming, companies that already were big users of the cloud. You know, we're now seeing a broader and broader set of customers that are more traditional enterprises evaluate the cloud. And I'm talking about public cloud situations here, right? So SaaS services like Snowflake. And the reality is almost all of these large companies have begun to adopt some SaaS services, whether it's for their CRM applications with Salesforce or whether it's email. They're moving down the path of bringing SaaS into their environment. Now obviously, if you're a major enterprise with thousands of business applications, you're not going to change overnight to the cloud. And so you're going to be operating in some form of hybrid environment for some period of time. But the migration to and the acceptance of the cloud as a part of the overall IT solution is becoming much more commonplace. And I think one of the big things we've seen is that if you went back 12 months and you talked to companies that were in the financial services industry or the healthcare industry, they would go, no way to the cloud. We're not ready for that. 
And now what we're seeing is even these companies that are in highly regulated industries being in a situation where they are putting in place policies, security policies and procedures, to allow them to appropriately adopt cloud systems that take into account their security and regulatory needs. So they're laying the foundation then to really start to make that move, at least in a limited way, or find the places where they can make that move. And we find great engagement with companies. We're having good conversations with both financial services and healthcare companies right now on that. And as an example, we're in the process of HIPAA certification for what we're doing with Snowflake. And that's a crucial thing to achieve in order to enable many of these healthcare companies to work with a cloud data warehouse. So the other thing we talk about quite a bit is horses for courses. It's all driven by the workload. It's all driven by the workload. Where you put what is all driven by the workload. So are there any particular workloads where Snowflake is really a better solution, some of those that kind of stand out? Absolutely. So when we built this thing, when the engineers built this fully relational SQL, and by the way, it's full SQL, not partial SQL. If you're familiar with SQL benchmarks, there's something called TPC-DS, which is not a great benchmark, but it's a great test of the thoroughness of the analytical SQL capabilities that a data warehouse provides. And we run the whole thing. And many existing data warehouses don't do that. But the workload that is most interesting, and where Snowflake is really highly differentiated, is for customers who have some form of machine-generated data, be it from web applications or cloud apps or mobile devices or sensors. And this data tends to take the form of what we call semi-structured data. And it comes in typically packaged in JSON or Avro format, sometimes XML. 
And Snowflake just is awesome at that. I mean, if you look compared to almost anything else in the market, customers that are trying to work with this data are in misery. I mean, they're practically crying at their desks. It's so difficult to actually get answers out of this data using traditional solutions like Hadoop, which just don't perform very well and are very hard. And with Snowflake, it's incredibly easy. You just load the data and you run queries, and we're able to give them the answers almost instantly. So just along those lines, because I read the JSON, or semi-structured data support, white paper. And others, on paper, have this sort of schema-on-need, schema-on-read sort of thing. What's your secret sauce that makes this possible? Because... I love that question, Drew. Thanks for that question. So the other people can read in data and then just throw brute CPU horsepower at it to go through and find the answers. And sure, they can scan it. And if you take a Hadoop system and you use Hive or you use Impala or something, that's what you're doing, right? You can run a query against that and you're just throwing the brute horsepower of the cluster at it. And someday you'll get an answer. I mean, the answers will come out, but it is not a very performant thing. And then meanwhile, you're dealing with the administrative costs of that solution, which by the way are even worse than the administrative costs of a traditional data warehouse. With Snowflake, what we do is, we are schemaless in our design just like that. But unlike those solutions, when you load data into Snowflake, we have a special data type we call a variant data type. And when data gets loaded into variant, what happens is we discern the schema of the JSON file. So if you look at JSON or Avro, it is organized as a hierarchy, with different levels within that hierarchy providing different data elements. 
Now, while it is schemaless, and in fact the schema evolves dynamically, it changes over time. So unlike structured data, which has a fixed schema, this changes. There is some commonality to the schema. It repeats, you know, you tend to have attributes that repeat. Well, we actually discern that. And then we use the same columnar compression that would be used in a modern data warehouse. And we apply that to this semi-structured data. And then we use our full query processor with pruning to be able to select exactly the data you need. So you can perform a full relational query using the full rich semantics of SQL against this semi-structured data with performance almost as fast as structured. Would it be fair to say then that you're doing the work while the data is being ingested to add enough structure so that when you want to get it out at query time, you've structured it enough that, for the elements that are in common, you can scan it very quickly? That's a good way to put it. We essentially discern the structure that exists within that at the time the data is loaded. And we don't have to require, there's no pre-declaration of this required. You just load the data and we intuit it as the data is loaded. And then we actually store metadata associated with that. So we compress the data. And so you have very low amounts of data, small amounts of data, that you have to scan. And then we discern the key attributes of that data and we get statistics from that so that our query optimizer can just issue a query result. So, like, we have one of our customers that has a single table that approaches 200 terabytes in size. And that's compressed. Uncompressed it would be somewhere between one and two petabytes in size. And with that customer, I mean, if you scan 150 terabytes, you know, if you have to scan it all like you would do with Hadoop, it's going to take a while even with modern horsepower. 
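As a rough illustration of the load-time structure discovery described above, here is a minimal Python sketch. It is not Snowflake's implementation; the function names (`flatten`, `load_semi_structured`), the path notation, and the sample events are all hypothetical. It only shows the general idea: walk each JSON document as it is loaded, record which hierarchical paths repeat across documents, and store the values per path (column-wise) without any schema being declared up front.

```python
import json
from collections import defaultdict


def flatten(record, prefix=""):
    """Yield (path, value) pairs for every leaf of a nested JSON value."""
    if isinstance(record, dict):
        for key, value in record.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(record, list):
        for item in record:
            yield from flatten(item, f"{prefix}[]")
    else:
        yield prefix, record


def load_semi_structured(lines):
    """Discover the paths in a batch of JSON documents and group values by path.

    Returns two dicts: path -> list of (row_index, value), and
    path -> set of Python type names observed for that path.
    """
    columns = defaultdict(list)
    types = defaultdict(set)
    for row, line in enumerate(lines):
        for path, value in flatten(json.loads(line)):
            columns[path].append((row, value))
            types[path].add(type(value).__name__)
    return columns, types


# Hypothetical machine-generated events; the second one has a different shape.
events = [
    '{"device": {"id": 7, "os": "ios"}, "ts": 100, "tags": ["a", "b"]}',
    '{"device": {"id": 9}, "ts": 160, "battery": 0.42}',
]
columns, types = load_semi_structured(events)
print(sorted(columns))       # the paths discovered across both documents
print(columns["device.id"])  # values for one repeated attribute, stored together
```

Because values for the same path end up side by side, a real system could then compress each path as a column and keep statistics on it; this sketch stops at the grouping step.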
But what these folks are able to do is, the vast majority of their queries are, what's happened in this period of time with these conditions; they set a set of predicates on it. And so instead of the result taking hours to happen, it can happen in minutes or even seconds. So it's literally orders of magnitude faster than the alternative approach. This is a lead-in sort of to the, we're running out of time, the last question, which is: so you have a product, sort of data warehouse as a service, and we've talked before about how Azure, Google Cloud Platform and Amazon have multiple data management products. They're calling it a data platform, but they're still pretty discrete. When those start to come together, if they start to come together, are you essentially positioning Snowflake as, you know, already multi-model in the sense that it... It already is multi-model. And I don't know whether they ever will come together, to be honest, because those are all built as discrete technologies with clear architectural foundations underneath them. And you just can't take, you know, you can't take an apple and an orange and, you know, make a pear out of it. It's a different fruit. Almost like the organizational, the R&D organizational barriers or boundaries show up in the product boundaries. The organizational and the product lineage. Now, I mean, all of these products, I mean, whether you're talking about SQL Server, whether you're talking about Redshift with ParAccel, I mean, all of these products have lineages that go back 20, 30 years into code bases that were designed, frankly, in the 1980s, right? SQL Server, I said, 1988. A lot of that code's still in the product. And if you look, you can't just snap your fingers and change that. So the crazy thing is that our founders had the guts and the ability to actually take and build the first and only, these days, modern cloud data warehouse that was built from scratch to solve the problems that today's customers have. 
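The point made earlier, that most queries ask what happened in a period of time and so do not need to scan the whole table, can be sketched in a few lines of Python. This is only an illustration of the general min/max pruning technique under stated assumptions, not a description of Snowflake internals: the `Partition` class, the `ts` field, and the partition names are hypothetical. Each chunk of data keeps the minimum and maximum timestamp it contains, recorded at load time, and a query skips any chunk whose range cannot overlap the predicate.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """A chunk of rows plus the min/max statistics recorded when it was loaded."""
    name: str
    min_ts: int
    max_ts: int
    rows: list


def build_partition(name, rows):
    """Compute the per-partition timestamp statistics once, at load time."""
    timestamps = [row["ts"] for row in rows]
    return Partition(name, min(timestamps), max(timestamps), rows)


def query_time_range(partitions, start, end):
    """Return rows with start <= ts <= end and the names of partitions scanned."""
    scanned, result = [], []
    for part in partitions:
        # Prune: the stored range proves no row in this partition can match.
        if part.max_ts < start or part.min_ts > end:
            continue
        scanned.append(part.name)
        result.extend(row for row in part.rows if start <= row["ts"] <= end)
    return result, scanned


partitions = [
    build_partition("p0", [{"ts": 1}, {"ts": 5}]),
    build_partition("p1", [{"ts": 10}, {"ts": 14}]),
    build_partition("p2", [{"ts": 20}, {"ts": 29}]),
]
rows, scanned = query_time_range(partitions, start=9, end=12)
print(rows, scanned)
```

With data that arrives roughly in time order, a narrow time predicate touches only a small share of the partitions, which is the kind of effect that turns a full scan into a much smaller one.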
And those problems include, sure, a fully functional, structured, relational data warehouse that's super competitive against Oracle and Teradata on one hand, but also a product that at the same time seamlessly solves the problem for customers that are working with machine-generated data and essentially blows the socks off of alternative solutions like Hadoop. All right, we're out of time. That was a great wrap. Anyway, I was going to ask you for the last word. I think you got it in. It's a great, the last word is we love our product. There you go. It's the last word. It's possible. Thanks for stopping by, Bob Muglia, Snowflake Computing. Check it out. I'm Jeff Frick with George Gilbert. We're at VMworld 2015. You're watching theCUBE. We'll be back with our next guest after this short break.