Live from Dublin, Ireland, it's theCUBE. Covering Hadoop Summit Europe 2016, brought to you by Hortonworks. Now your hosts, John Furrier and Dave Vellante.

Hello everyone, welcome to our live coverage of Hadoop Summit 2016 in Dublin. We are live in Dublin, Ireland. This is theCUBE, SiliconANGLE's flagship program, where we go out to the events and extract the signal from the noise. Dave, we did 77 events last year, the seventh season of theCUBE. We've done all the Hadoop World events except for the first one. In 2010 we did the original Hadoop World, when theCUBE was essentially started in the Cloudera office. And we've seen so much. I've been in this business 30 years, working at the big companies, covering the enterprise, being in the enterprise, and you obviously have 30 years of experience too. The insight and knowledge that we bring and also acquire on theCUBE is phenomenal, and you get to see the landscape forming: the Hadoop ecosystem, which Arun Murthy, the co-founder of Hortonworks, described yesterday as essentially just a name now for the ecosystem around big data. And you've got to give credit to Cloudera for being the first mover. Mike Olson, Amr Awadallah, we were there. Jeff Hammerbacher. Jeff Hammerbacher, all the guys involved. We saw some early Cloudera employees last night, and in those early days there was nobody. Hortonworks came out of nowhere, spun out of Yahoo, and became the second mover in the big three; the third, MapR, came along as well, all venture funded. But you're seeing the transformation of the ecosystem, and it couldn't be more highlighted than by this show. In our view, it's growing. But is it big enough? The big players are coming in. We have relationships with Oracle, IBM, HP; they're all coming in in a big way, and Microsoft too. And so the question is, what is the landscape going to look like for this ecosystem?
And I was saying yesterday that I think the Hadoop ecosystem has failed customers, and I meant that in the sense of, hey, you've got to go faster. What I meant by that is, you have a market that's begging for solutions, the cost of ownership to run Hadoop is still high, and the penetration of Hadoop in the enterprise is still sub 10%, depending on the estimates you look at from Wikibon and/or the market, and on the revenues from the big companies pioneering the space, like Hortonworks, like Cloudera, and others. So it's early, but the market's shifting. And the market force we teased out yesterday is that if it doesn't grow fast enough, the big guys are going to come in with a cloud. So I want to get your take on this and bring your insight and knowledge into the conversation, because we see those other guys too. We see their moves, we talk to their customers, and we hear comments like, IBM does more revenue on one small product line than all the Hadoop vendors combined. The big companies, Oracle: Mark Hurd, talking to me privately, said, hey, we do a lot more business than a lot of these companies that exist. So you've got Cloudera pushing $100 million a quarter. You've got Hortonworks, a public company, growing. The ecosystem's evolving, but it's still fragmented, not yet unified. What's your take on all this? Well, first of all, you're right, Cloudera got it all started. I remember the first Strata we ever went to, there was no competition for Cloudera. I remember we talked to Amr Awadallah about that at the time, and he said, well, there probably will be competition because it's a big market. And sure enough, the next year when we had that conversation, there was lots of competition. Of course, it's well known how Hortonworks got started with the Yahoo! spin-out. But of these companies, as we talked about yesterday, John, the pure-play so-called big data companies, the only two that hit the leaderboard in revenue are Palantir and Splunk.
And revenue, that's the ultimate metric, right? Forget headcount, headcount's a whole different story, and valuation too, but add up Palantir and Splunk and you're talking about a billion dollars. Like you said, Cloudera is a private company, so you really don't know what they're doing, but we had them at just under $200 million in revenue last year. I think it's not unreasonable to expect that they could double. They're probably on the steep part of the S-curve right now, so they could do 400 this year. So they're getting to be a little bit more than a rounding error. Hortonworks did, what, 120 last year, so they're going to push 200. So they're becoming more than just rounding errors. However, look at what IBM claims is its analytics business: it's almost $20 billion. We count about 1.8 of that, almost two this year, as so-called big data, because we define big data as a subset of, you know, the old Cognos business and the old BI business. And Oracle and SAP are pretty big as well, and they're just getting started. So these big guys are starting to throw gas on the fire. Now, the other thing that George Gilbert's been talking about is Microsoft. Microsoft has $8 to $10 billion of operating profit in its cloud and enterprise business. It could just aim at the big data business if it wanted to: okay, we're going to own that, boom. Just come in and sort of take over, and do what Amazon's doing, maybe in a bigger way. John, I'd love to get your take on Amazon. We talked to Scott Gnau yesterday, and he said, yeah, you know, Amazon is nice, kind of an on-ramp, to use your term, but we really don't see them as enterprise grade. Do you agree with that?
I do, and I think one of the things, Dave, is that you and I are old enough, and have enough experience, that we've seen a lot of these industry cycles before, and I've got to tell you that you're seeing the same movie over and over again. A lot of these, I won't call them dot-com-like companies, but these programmer-led, millennial startups worth over a billion dollars just don't have a lot of experienced senior management in the top positions. So we're going through a wave of change right now, with the enterprise as hot as it is, Dave, and with a transformation in technology where experience matters. Based on my experience, what I've seen in the past is the same movie over and over again, with Linux and the early open source days, when open source was a tier-three, tier-two citizen. Obviously, look at the success of Red Hat, Dave: open source is now a tier-one citizen. Communities and the open source business model are a tier-one software environment. So you now have a historic moment in time where tier-one software development practices are all being done in open source, okay? And community is the number one factor. That being said, the Hadoop ecosystem, in my opinion, is a tier-three, tier-two citizen relative to the existing incumbent businesses. For example, I noticed here at Hadoop Summit that Oracle's Big Data SQL lets you run Hortonworks and Oracle Database 12c, with or without Exadata. That's a huge move, and Oracle could run the table. So just little moves by the big guys could really crush the industry. I believe this ecosystem is fragmented, okay, across all the different vendors and approaches. When they catch up, they fall behind. When they catch up, they fall behind. When they catch up, they fall behind.
And it couldn't be clearer than in some of the messaging you hear from Cloudera, Hortonworks, and MapR, where they're targeting the CIO. And the CIO wants simplicity, not complexity. So you have a problem right now: they catch up, they make it bulletproof, they bring in security, and yet it's complex. So they're misfiring on alignment with what the CIOs want. That's one challenge. The other challenge, the elephant in the room, is Amazon Web Services. If you compare and contrast, and I know this may or may not be a great metaphor, historically, when we lived through the Linux revolution, there was a forcing function that made Linux come together. And then IBM made a huge investment in Linux, over a billion dollars. And there was Sun Microsystems, the minicomputer guys, HP-UX; the Unix market really was fragmented. Linux came together. Digital, HP, IBM, and Sun were all fighting over whatever it was, a $20 billion workstation market. And they were fighting Microsoft. And there were all the different convoluted Unix versions, licensing, who pays what. Unix was a really powerful operating system, you know, and BSD came out of that at Berkeley. You had versions of Unix, and Linux formed around that forcing function: if we don't come together... But for people who don't remember it, the Unixes were all essentially proprietary operating systems built on top of some open code. Hardware. Berkeley, whatever, yeah. But they were all proprietary. They were stovepipes. Hence Linux comes to the fore. Yeah, so Linux came together, and it came together in a beautiful way. The Linux community really was a defining moment in the evolution of open source. It was a flash point of innovation that, combined with Apache and other open source communities, showed the business model. Well, and I want to just interject.
And Steve Mills at IBM led IBM's billion-dollar investment in Linux, which back around 2000 was a lot of dough. And so that formed a community, and the rest is history. Now we have a tier-one environment with Linux, Red Hat, and all that stuff going on around OpenStack, and here in Hadoop. My view, my opinion based on my experience, is that what we can take out of this market is that you still don't see the revenue numbers blowing it out. And it's a fragmented Hadoop market relative to open source, because you now have vendor solutions coming in from Oracle, IBM, and all the big guys. So I believe this community has to come together around a core platform, because that is the forcing function for this community to be relevant in the coming IoT phase and also in the analytics picture, because of the power of open source. So to me, you're seeing a lot of parallels with Linux in that way, and this community being fragmented the way it is, in my opinion, is interesting. Now, Unix was a fragmented environment, but you had big guys like Microsoft with Windows 95 and the Windows environment, and the other big players wanted to use Linux as a way to get around that. Amazon is now the big enemy in all this, and I think Amazon will provide those kinds of services to these communities in the absence of some sort of solidarity in the Hadoop ecosystem. So I want to ask you a question. You responded yes, that you agree Amazon's big data offering is lightweight? Yes. You do. You agree with that, okay. However, the potential energy they have, in the absence of baked-out solutions, is interesting. Amazon is moving into the enterprise, and others could come in as well, but they're the big threat. So clarify something for me. Can't you basically run Hortonworks or Cloudera on Amazon? Yes. Right? Okay. That's like Lotus 1-2-3. But head to head? No, but head to head, what are you saying?
Is it that Amazon's, you know, whatever it is, Kinesis and Lambda and DynamoDB, all of that stacked up, is maybe not as robust? But if that's a problem, I can just run Cloudera or Hortonworks or MapR in the Amazon cloud, correct? Yeah, in a stovepipe kind of way. Okay. So are you saying that Amazon will just keep getting better and better? Well, I think the way Linux evolved was to be that force as an alternative, first a tier-two kind of citizen and now tier one, to the incumbent proprietary systems of the old days. I'm not saying Amazon's proprietary, but it's similar. Well, it is. And Amazon and Microsoft could run the table on big data. Okay, so who wins and who loses? Here's the thing: there are opportunities. I'm pointing this out because I made a prediction yesterday that everyone's got to come together, and I think this is why. Hybrid cloud is what people are doing. Not everyone's going to be public cloud. So Amazon has to move in that direction, into the enterprise. The opportunity for this community is to absolutely own the hybrid cloud environment for Hadoop and the surrounding open source and vendor solutions around big data: the analytics engine, with cloud and hybrid infrastructure powering it. To me, that is the opportunity, if they have that Linux-like moment. The Hadoop ecosystem needs that Linux-type moment right now to be a viable solution for CIOs: reducing the complexity, lowering the total cost of ownership for managing clusters, solving the training issues. We're hearing this time and time again, even up on stage. Herb Cunitz said it. That's a key point, the TCO point you're bringing up right now. So all the trends around automation and assembly point to the fact that this has to go faster. And this is why a fragmented environment just does not create an environment for speed.
Okay, so I agree with you on hybrid. However, there are some caveats here. I also want to unpack this TCO thing, because I think you're really onto an important point. The cost of running and managing Hadoop, and we know this from our own experience, is significant. And it's much, much lower in the cloud, because the cloud takes on all the undifferentiated heavy lifting you otherwise have to do. So if I hear you correctly, you're essentially saying that as Amazon, and I'd throw Microsoft in there, gets better and better, there's a need to coalesce this fragmented ecosystem. Yes, yes. All right, that's cool. Now, the second thing: I want to talk about hybrid. We heard the keynotes this morning, and I thought Herb did a really good job. But like many people, and as I often do myself, he asked the questions about cloud as though it's a destination. Brian Gracely is big on this: cloud is not a destination. It's not about where you run your apps or place your data. It's about the operating model. That's what's different about cloud. Whether it's on-prem or off-prem, it's all about whether you can drive the TCO down. Yeah, okay. I would put forth the assertion that enterprises are not operationally ready for the kind of dreams the Clouderas and the Hortonworks have for their sales targets. And I think sales will suffer for those guys. Cloudera has the biggest revenue numbers, so they're going to be under a lot of pressure immediately, because they have to expand their sales base fast; they land and expand. Before you get there, let's talk about it from the customer perspective. The customers on stage today basically said, we use hybrid.
And I think they're beginning to think of cloud as an operating model, because one of the folks on the panel said, yeah, we take the less risky stuff, and the data's in the cloud; we can access all these open data sources in the cloud. We filter what we want, bring it back behind our firewall, and then blend it with our own proprietary data. And it's our own cloud. So they're starting to develop this notion of a hybrid cloud where the operating model is consistent across public and private. Now, we know there's a big schism. Oracle's strategy, of course, is same-same, but they're not quite there yet. Microsoft, with its partnerships, is trying to get to same-same. IBM is doing a little bit of that. But most IT operations are not same-same. Here's my point. We know from the data that Amazon is crushing it from a profitability standpoint. AWS operating margins were 28% last quarter; EMC's are 17%. So Amazon is selling at ostensibly one-third the price, but it has a significantly higher operating margin. What does that tell you? That Amazon's costs are lower. And my bet is that they're coming down faster than they are in the enterprise, because of Amazon's volume. So here's the question: can IT organizations keep up with the cost structure and the low TCO that the cloud guys are delivering? No. The answer is no. And here's why. The complexity involved and the cost of ownership kill the profitability of the promised operating model. Take the promise of cloud and/or Hadoop in the context of big data applications and how data is landing, and I'm not even factoring in the complexity of where the data is acquired: if it sits on the edge of the network, does it come back in? All of that complicates things even more. The operating profit model of that is going to get crushed. That's why it's a race right now in this ecosystem to provide automation, tooling, recipes, and reference architectures for those big data apps.
Because otherwise it's the same movie. It's like Groundhog Day, the same movie. Catch up, fall behind. Catch up, fall behind. Amazon and Microsoft, I don't know about Google. Google, let's throw them in the mix. Amazon, Microsoft, Google: they win the TCO battle. 100%. Okay. Well, they win two battles. They win on ease of entry into solution sets, with shadow IT now converting into open source and/or public cloud, or a baby hybrid, mini hybrid cloud. And then, as customers scale up the operating model with a low cost structure and high yield on the app side, the providers can leverage that. So the operators in this case, Microsoft, Amazon, Oracle, whoever, can provide economies of scale that dwarf anything else. If you're not in that flywheel, if you're on the outside looking in, you're dead, in my opinion. That, to me, is the big picture. So we forecast that $200 billion of operating profits are going to vaporize; no, not operating profits, $200 billion in spending is going to vaporize, shifting from heavy lifting on infrastructure management into the cloud and into vendor R&D. The other factor here is that there's been a slow-motion collapse in infrastructure hardware and software pricing, which is one of the reasons it's been hard for a lot of these big data software companies to achieve escape velocity. Because, to use your line, why buy the cow if the milk's free? And the other dynamic is that this industry has to move from catch up, fall behind, catch up, fall behind, to catch up and drive, right? Until they get to that point... So carry this through. If the public cloud vendors win on cost, simplicity, and the TCO model, that means the on-prem guys, all the guys talking hybrid, have to have significant differentiation on-prem. People today talk about security and privacy; maybe that's going to continue. I don't know how much of that is illusory versus real.
To me, the bigger issue is how do you add value up the stack beyond what you can do in the public cloud? If you're going to continue with a significant on-prem presence, you have to have, number one, a major differentiable advantage, or, the other possibility, it's just too damn expensive for you to move. But that becomes a very defensive business. We agree: the growth business is in the cloud. And here's the bottom line. Insights alone will not win that up-the-stack battle. The apps themselves have to be highly differentiated. Insights are a good starting point, but you've got to deliver actual value that's differentiated. Okay, and I would argue that the apps are all going to be SaaS, which is cloud. Or hybrid; the operating model is going to be different for the users. But the good news in all this, Dave, and the good news for the folks out there, is that there's some low-hanging fruit supporting the industry. Cybersecurity is a huge big data problem, and Hadoop wins in that environment. You have other use cases like fraud detection, persona-based 360 marketing, and ultimately IoT, right around the corner. So there's a rising-tide aspect that will keep revenues decent, but ultimately that's not going to be enough. The foundation has to be set to catch up and drive the industry, versus catch up, fall behind, catch up, fall behind. And one last point. Lest you out there think I'm one of these people who says the whole world's going to public cloud, I'm not. I am saying that the public cloud guys are going to make a lot of money, and they're going to win the profit battle. We see about a third of the market long-term going to the public cloud, a third going to on-prem cloud that replicates public cloud functionality to a great degree, which we call true private cloud, and the last third is just legacy stuff that's going to sunset, which is not a very exciting business.
So that's sort of our take on the business going forward. Yeah, and do the customers really care? As long as it's a single pane of glass: I have some public cloud, some percentage, maybe 20% or so, some hybrid, and then on-prem. It's going to be one operating environment, so it should be kind of like, who cares? Security will drive it, and the application workloads will drive those behaviors. And asset management: I've got a third of my portfolio that's in managed decline. I'm not going to rip and replace it. That's what it is. The net-net is that the Hadoop ecosystem, now categorically going beyond just Hadoop open source, and the big data applications from the vendors have to do a better job of providing that kind of value. The CIO conversation is not about complexity, it's about simplicity, and that's the angle. Okay, this is the kickoff segment. This is theCUBE. We'll be back with more live coverage. We've got great guests lined up. Day two of Hadoop Summit here on theCUBE. Follow the hashtag #HS16Dublin, and go to crowdchat.net/HS16Dublin for the conversation. We'll be right back with more live coverage after this short break. I'm John Furrier with Dave Vellante.