Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017, brought to you by Hortonworks.
Hey, welcome back to theCUBE. We are live on day one of the DataWorks Summit in the heart of Silicon Valley. I'm Lisa Martin with my co-host, Peter Burris, just chatting with our next guest about the Warriors win yesterday. We're also pretty excited about that. Please welcome David Hsieh, the SVP of marketing from Qubole. Hey, David.
Hey, thanks for having me.
We're glad that you still have a voice after no doubt cheering on the whole team last night.
It was a close call because I was yelling pretty loud yesterday.
So talk to us. You're the SVP of marketing for Qubole, a big data platform in the cloud. You guys just had a big announcement a few weeks ago. What are your thoughts? What's going on with Qubole? What's going on with big data? What are you seeing in the market?
So, you know, we're a cloud native data platform. And when we talk to customers, they're really complaining about how they're just struggling with complexity and the barriers to entry. And, you know, they're really crying out for help. And the good news, I suppose, is we're in an industry that has a very high pace of innovation. That's great, right? You know, Spark has had eight versions now in two years. But that pace of innovation is making the complexity even harder. I was watching Cloudera bragging about how their new product is a combination of 24 open source projects. You know, that's tough stuff, right? And so if you're a practitioner trying to get big data operationalized in your company and trying to scale the use of data and analytics across the company, the nature of open source is it's designed for flexibility, right? You know, the source code's public. You have all these options and configuration settings, et cetera.
But moving those into production and then scaling them in a reliable way is just crushing practitioners. And so data teams are suffering. And I think, frankly, it's bad for our industry because Gartner's talking about an 80% failure rate of big data projects by 2018. Think about that. Like, what industry can survive when 70 or 80% of the projects fail?
Well, let me push on that a little bit, because I think the concern is not that 70 to 80% of the efforts to reach an answer in a complex big data thing are going to fail. We can probably accommodate that, but what we can't accommodate is failure in the underlying infrastructure.
Absolutely.
So the research we've done suggests something as well, that we are seeing an enormous amount of time spent on the underlying infrastructure. And there's a lot of failures there. People that say, I have a question, I want to know if there's an answer, and then trying to get to that answer and not getting the answer they want or getting a different answer: that kind of failure is still okay, because that's experience. We're getting more and more and more. So it's not the failure in the data science side or in the application side.
I would say, like, getting to an answer you don't like is a form of success. Right? Like you have an idea, you try it out. That's all great. So it's just testing. What Gartner is really saying is it's failure in the implementation of the infrastructure side. So it's the administrative and the operational side. It's a project that didn't deliver an end result, and if the end result is what you hoped, great. If it was, you know. You proved it. Exactly. Couldn't even answer the question.
So let me test something on you, David. We've been carrying a thesis at Wikibon for a while that it looks like open source is proving that it's very good at mimicking and not quite as good at inventing.
Right.
So by that I mean that if you drop an operating system in front of Linus Torvalds, he can look at that and say, I can do that. Right. And do a great job of it. If you put a development tool, same kind of a thing. But big data is very, very complex. An enormous number of use cases, and open source has done a good job at a tool level. And it looks as though the tools are being built to make other tools more valuable as opposed to making it easy for a business to operationalize data science and the use of big data in their business. Would you agree or disagree with that?
Yeah, I think that's sort of fundamental to the philosophy of open source. You know, I'm going to do my work, something I need for me, but I'm going to share it with everybody else and they can contribute. But at the end of the day, unlike commercial software, there's sort of no one throat to choke. Right. And there's nobody who is going to guarantee the interoperability and the success of the piece of software that you're trying to deploy. There's not even a real coherent vision in many respects about what the final product is going to look like. So what you have is a lot of really great cutting edge technology that a lot of really smart people have sort of poured their heart and souls into, but that's a little different than trying to get to an end result. And, like it or not, commercial software packages are designed to deliver a result that you pay for. Open source, being philosophically very different, I think breeds inherent complexity, and that complexity right now is, I think, at the root of the problem in our industry.
So give us an example, David. You know, you're a marketing guy and I'm a marketing gal. Give us an example of a customer, maybe one of your favorite examples. Where are you helping them? They're struggling here. They've made significant investments from an infrastructure perspective.
They know there's value in the data, of varying degrees, as we've talked about before. How does Qubole get in there and start helping this use case customer start to optimize and really start making this big data project successful?
That's a great question. So there's really two things. Number one is, we are a SaaS based platform in the cloud, and what we do basically is make big data into more of a turnkey service. So actually the other day I was sort of surfing the internet, and we have a customer from Sonic Drive-In. You know, they do hamburgers and stuff.
Oh yeah.
And they're doing a bunch of big data, and this guy was at a data science meetup talking about, and we didn't put him up to this, he just volunteered and stuff, he was talking about how we've made his life so much easier. Why? Because all of the configuration stuff and the settings and how to manage costs was basically filling out a form and setting policy and parameters, and not having to write scripts and figure out all these configuration settings. And if I set this one this way and that one that way, what happens? You know, we have a sort of more curated environment that makes that easy. But the thing that I'm really excited about is we think this is the time to really look at having data platforms that can run autonomously. Today, companies have to hire really expensive, really highly skilled, super smart data engineers and data ops people to run their infrastructure. And if you look at studies, we're about 180,000 people short of the number of data engineers and data ops people this industry needs. So trying to scale by adding more smart people is super hard, right? But instead, if you could start to get machines to do what people are doing, just faster, cheaper, more reliably, then you can scale your data platform.
So we basically made an announcement a couple weeks ago about kind of the industry's first autonomous data platform, and what we're building are software agents that can take over certain types of data management tasks so that data engineers don't have to do it or don't have to be up at three in the morning making sure everything's going right.
And from a market segmentation perspective where's the sweet spot for that? Enterprise, SMB, somewhere in the pool.
Oh, the bigger you have to scale. It's not about company size, it's really about sort of the scope and scale of your big data effort. So the more people you have using it and the more data you have, the more you want automation to make things easier. It's sort of true of any industry, it's certainly going to be true of the big data industry.
Yeah, the more complexity in the question set.
Correct.
The more complexity. The more users you have, the more teams you have or data sources. Presumably that's going to be correlated.
Absolutely, correct.
Which is, we can use a big data project to really ascertain that.
Correct, well, in fact that's sort of what we're doing. So because we're a SaaS platform, we take in the metadata from what our customers are doing. What users, what clusters, what queries, which tables, all that stuff. And we basically use machine learning and artificial intelligence to analyze how you're using your data platform and tell you what you can do better, or automate stuff that you don't have to do anymore.
So we've presumed that the big data industry at some point in time was going to start moving its attention to things like machine learning and AI. You know, up into applications. Are we going to see the big data industry basically move pretty rapidly into more of a service or application conversation?
Or are we going to see a rebirth as folks try to bring a more coherent approach to many of the tools that are here right now? What do you think?
Oh, I think we're going to see some degree of industry consolidation, and you're going to see vendors, and you're seeing it today, trying to simplify and consolidate, right? And so some of that's moving up the stack towards applications. Some of that's about sort of repackaging their offerings and adding simplicity. It's about using artificial intelligence to make the operation of the platform itself easier. I think you'll see a variety of those things because companies have too many places where they can stumble in their deployment. And it's going to be the vendor community that has to step in and simplify those things to basically gain greater adoption.
So as you think about it, I mean, I have my own idea, but what do you think is the metric that businesses should be using as they conceive of how to source different tools and invest in different tools, put things together? I think increasingly we're going to talk about time to value. What do you think?
I think time to value is one. I think another one you could look at is the number of people who have access to the data to create insights, right? So if you can say 100% of my company has access to the data and analytics that they need to help their function run better, whatever it is, that's a pretty awesome accomplishment. And there's a bunch of people who may or may not have 100%, but they're pretty close, right? And they've really become a data-driven enterprise. And then you have lots of companies who are sort of stuck with, okay, we have this use case running, thank goodness, took us two years and a couple million bucks. And now they're trying to figure out how they get to the next step.
And so they have five users, you know, who are able to use their data platform successfully. That's, you know, I think that's a big measure of success.
So, I want to talk quickly, if I may, about the cloud, because it's pretty clear that there are some very, very large shops that are starting to conceive of important parts of their overall approach to data and putting things into the cloud. There's a lot of advantages of doing it that way. At the same time, they're also thinking about how am I going to integrate the models that I generate out of big data back into applications that might be running in a lot of different places. That suggests that there's going to be a new challenge on the horizon of how do we think about end-to-end, bringing applications together with predictable data movement and control and other types of activities. Do you agree that that's on the horizon, how we think about end-to-end performance across multiple different clouds?
I think that's coming. I'm still surprised at how many people have not figured out that the economic and agility advantages of cloud are so great that you'd be honestly foolish not to consider a cloud and have a proactive way to migrate there. And so there's just a shocking number of companies who are still plodding away, building their on-prem infrastructures, et cetera. And they still have hesitancy and questions about the cloud. I do think that you're right, but I think what you're talking about is three to five years out from the mainstream of the industry. Certainly the early adopters who have sort of gotten there, they're talking about that now. But as sort of a mainstream phenomenon, I think that's a couple of years out.
Excuse me, Peter. One of the things that just kind of made me think of was, these companies that we were saying there's a lot that still have hesitancies regarding cloud. Kind of vendor lock-in popped into my head.
And that kind of brought me back to one of the things that you were mentioning in the beginning, open source complexity there. Are you seeing or are you helping companies to go back to more of that commercialized proprietary software? Are you seeing a shift in enterprises being less concerned about lock-in because they want more simplicity?
You know, it's a great question. I think in the big data space, it's hard to avoid sort of going down the open source path, right? I think what people are getting more concerned about actually is being locked into a single cloud vendor. And so more and more of the conversations we have are about what are your multi-cloud and eventually cross-cloud capabilities?
That's the question I just asked.
Right, exactly. And so I think more and more that's coming to the front. I was with a very large healthcare company a week ago and I said, what's your cloud strategy? And they said, well, we have a no vendor left behind policy. So, you know, we're standardized on Azure. We've got a bunch of pilots on AWS and we're planning to move, you know, from a data warehousing vendor to Oracle in the cloud. So I think for large companies, a lot of them can't control the fact that different divisions, departments, whatever, will use different clouds. And so architecturally, they're going to have to start to think about these sort of multi-cloud, cross-cloud scenarios. And most large companies, given a choice, will not bet the farm on a single cloud provider. And, you know, we're great partners and we love Amazon, but every time they have an S3 outage like they had a few months ago, it really makes people think carefully about what their infrastructure is and how they're dealing with reliability.
Well, in fairness, they don't have that many, so.
They don't. But it only takes one, right?
That's right, that's right.
And there's reasons to suspect that there will be increased specialization of services in the cloud.
Correct.
So, I mean, it's going to get more complex in the cloud as well.
Correct.
Not less.
Well, David Hsieh, SVP of Marketing at Qubole, thank you so much for joining and sharing your insights with Peter and myself. It's been very insightful. So this is another great example of how we've been talking about the Warriors and food. Sonic was brought into play here.
Exactly.
Very exciting. You never know what's going to happen on theCUBE. So for David and Peter, I'm Lisa Martin. You're watching day one of the DataWorks Summit in the heart of Silicon Valley. And stick around because we've got more great content coming your way.