Live from the San Jose Convention Center, extracting the signal from the noise. It's theCUBE, covering Hadoop Summit 2015, brought to you by headline sponsor Hortonworks, and by EMC, Pivotal, IBM, Pentaho, Teradata, Syncsort, and by Attunity and WANdisco.

Okay, welcome back everyone. We are here live at Hadoop Summit 2015 for day three, about to kick off our third day of wall-to-wall coverage here at Silicon Valley's own San Jose Convention Center. I'm John Furrier, with my co-hosts George Gilbert, Big Data Analyst at Wikibon.org, and Jeff Frick, general manager of theCUBE. Guys, day three, we're kind of winding down, and you got a good night's sleep last night, great networking at the event. Going out at night, I get to see all the action and the relationships, and I think going to the events, Jeff, is all about relationships, and you get to see people that we know from the Hortonworks days, early on, people in the industry. Some people change jobs, different positions, but ultimately, this is a really tight community. It's really fun to see people growing into their jobs, either getting promoted, getting new jobs, or starting companies. I saw at least four or five people that I knew from the early days at Cloudera. I mean, basically all the early employees pretty much have moved on from Cloudera, and have gone on to start companies. I've seen guys start companies at Hortonworks. It's all kind of part of the keiretsu, part of the Hadoop community, and it has been exciting to watch, and certainly yesterday was a great day. We had great content, and again, George, the themes are clear. We are seeing an era of innovation here that is like the railroad, as Herb Cunitz said. The railroads and electricity, this is utilities, this is like Monopoly, you know, own the railroads, own electricity. This is a fabric of standardization that's happening with Hadoop, but it's not just a Hadoop game.
We heard from Teradata, with the Hadapt guys over there; we saw Presto, another SQL engine; we heard IBM exploring analytics. We heard WANdisco really hit the nail on the head, because they've successfully predicted two major shifts that they made bets on that paid out huge for them, and their thesis is there's a world beyond Hadoop. So that, to me, is not so much a pigeonholing of Hadoop as just the expansive business opportunity. Hadoop is clearly growing, Hadoop will be standard. I mean, Merv at Gartner said 50-something percent will have Hadoop in the enterprise. That's still a huge number; I still feel like it could be 70 percent. So, you know, we got people here with thousands of clients. I want to get your take, George, first. What is your analysis of the market? Do you agree that Hadoop is growing and there's more besides Hadoop? We got Spark Summit coming around the corner, and we're expecting IBM to make a huge announcement. It's already kind of leaked out through Bloomberg, so Spark is out there. That's causing a flywheel of innovation and excitement that will potentially fuel Hadoop and other things. What are your thoughts?

Well, it's pretty clear that ultimately we're going to get to the point where Hadoop is a utility, like railroads, like electricity. And like some of our guests said, right now the gauge on the railroad tracks is different between one and the other. It's pretty clear that when we look at the Open Data Platform, that was really an attempt to say, you know, all these Apache projects, there are several dozen of them. And in any Hadoop distro, it's about 17 to 20. We need to rationalize those so ISVs and partners targeting them can look at one platform, not, you know, a whole collection of projects. The big debate, and what we asked everyone who came on, was: so where are we in this life cycle? And pretty much everyone said we've got a certain set of repeatable solutions.
You know, whether it's the data warehouse offload, whether it's the ad serving and personalization. But we haven't gotten to the point where there's like a huge library of solutions. So we're very much approaching the inflection point, and there are a lot of people buying it and, you know, sort of getting comfortable with it. But I'm not sure I would say we're in the tornado yet.

It's interesting, you bring up a good point, because it's parsing the language of the industry and then translating that to the customer. And what I'm seeing is, this is a product, this is a technology, and we ask direct questions. Is analytics a process or a product? We ask questions like, what's going on with tooling and platform? These are industry words, and these are the things that determine how we cross the chasm, because there is a product, there is a technology, there is a process, then a product. Now, that is just the beginning of an enablement kind of culture. You're starting to see that the next generation of this is not yet a product for the customer, because the solutions that come out of that from the practitioners render themselves in the customer market. So that to me is something that I'm seeing that's very important, which is there's so much more work to do. The industry shouldn't get ahead of itself and congratulate each other, high-five each other. We should be looking at it as, okay, there is now another leg of the journey to do.
You know, you mentioned WANdisco, and they're very articulate about it, which is, look, we have this data fabric, they call it, and they're essentially separating the data from the underlying infrastructure by saying the data could be part of Hadoop, it could be in a relational database (this isn't shipping yet, but this is their vision), it could be on EMC-type storage, but we're going to make it so that you don't care where it is. And that is a critical foundation step for building applications that are sort of location independent and infrastructure independent. And yes, so that's a part of making the platform more mature.

The other thing I'm observing as we look at the close of this conference, day three, Jeff, I want to go to you on this, because we've been doing something over the past year that's been not obvious to the public, but now we're out there with theCUBE: we've been looking at the social data, we've been looking at the community, even noticing a change in what we've been doing by adding more omnichannel-like content pieces with images and links, and we've been looking at the community and looking at the conversations through our CrowdChat platform that theCUBE is enabling. And Jeff, I want to get your thoughts on this because this is a community. I want you to talk about those relationships. We've known some of these folks, like I said in the intro, for many, many years. We've done all the Hadoop Summit events until O'Reilly kind of blocked us out of there, but now we've done every Hadoop Summit, and we're going to continue to do it. This is an ecosystem that we've been embedded in from day one, and now we have social data. Comment on the relationships, comment on the people side of the equation.

Yeah, I think the relationships are huge, John, and I think you made an interesting point, because we're seeing people now who have left some of the early adopter companies and are now at new startups.
So there's this whole other generation of startups looking for specific applications so they can own a marketplace and leverage the infrastructure that's come such a long way. I mean, we've been coming to this thing for five years now. It's kind of like when we were at OpenStack a few weeks ago. You know, it takes a while for these things to go, and George, I like your comment about the rail gauge. You know, there's a terrific book called The Box, which is about the shipping container, which you think is such a simple thing. Of course, they're all 40-foot shipping containers that can fit on a ship, a truck, and a plane, or in a rail car, but it took a long time to get that settled in. So, you know, with Hadoop and all the buzz, I always go back to Amara's Law. I don't think it gets enough buzz in the industry. We always talk about Moore's Law, but Amara's Law is that we overestimate in the short term the effects of technology, but we way underestimate the long-term effects. And I think it was Merv that said, you know, we're in the age of disillusionment, and, you know, we're kind of past that early hype, but that means real stuff starts to get done, the infrastructure components continue to mature and evolve, and now we'll see that next class of applications. And I think, John, I remember it was a big question two or three Hadoop Summits ago. When are we going to see the applications? When are we going to see the applications? And it feels like we're getting pretty close to that.

So on the social data in particular, what are you seeing out there? We've been in the conversation, we've been putting out this thesis of join the conversation, because people who aren't at the event want to join the conversation. We had Ariana here, we had our team back in Boston and Palo Alto and Dallas kind of monitoring the conversation space. What are you seeing?
Well, we're still seeing, not to brag, that some of the best conversations are the conversations that we're generating here out of theCUBE. It's just the nature of our format, the way that people sit down. We just get people to share terrific nuggets of information about what's happening in the past, the present and the future. And those are really the catalysts that are bringing people together. Just Merv today will probably be a classic CUBE moment, with water not being thrown on you, John, by John Cleese, but really, is the glass half full or is the glass half empty? And there's a lot of debate around the Gartner report that came out about, like he said, 50%. Is that a good number or a bad number? And those are the types of things that get people involved in that conversation, because there is no right answer, right? It's all relative. And what's your point of reference? And what are you comparing it to? But clearly the trajectory is up and to the right. George, go ahead.

I just wanted to add to that half-full, half-empty comment. One of the themes that we kept hearing about from many of the guests was, we've got this incredible innovation on one side that the Apache Software Foundation is fostering, with an unending and growing number of projects, but then the distro vendors, you know, MapR, Hortonworks and Cloudera, they're trying to keep up with the different release cadences and trying to create a coherent platform. But in doing that, they're exposing a lot of complexity to the administrator and the developer. So, you know, there's this tension, and we haven't figured out how to resolve it yet. And that's partly what's slowing us down.

Guys, I want to get your thoughts, George, and particularly your analysis, on the evolution of Hadoop and the Hortonworks role in the ecosystem, vis-a-vis the expansion of the scope of Hadoop and Big Data in general.
So one of the things that's becoming clear to me out of this is that, and by the way, the Hortonworks messaging I thought is solid. They look good, they're tight on their messaging. But what's interesting is, we asked the ODP question because it's kind of like the firecracker you just want to, you know, get going. Certainly we like to light the match and get the firecracker going. But Shaun Connolly made his quote. He said, it's like the housewives of Hadoop. It's like this bickering contest going on between, you know, Cloudera, Hortonworks and others, and the press loves that because it gets into the nuances of the inside baseball. But putting that aside, ODP is real because of just the players involved. You cannot ignore IBM. You cannot ignore the size and scope of the players involved in ODP. And what's interesting is, if you look at what Hortonworks is doing by saying we're the railroad, we're going to lay the tracks down to get some standardization, and you have ODP as a consumption option downstream, you have an interesting thing. And I want to bring this up because I have a search background. So I look at Google search. Google search is organic search. It's pristine, it's, you know, purely algorithmic, and the quality of their organic search is good. And they have the ads on the right-hand side and called out on top. So you know the difference between an ad and organic search. So in a way, the Hortonworks strategy is: let's win the open source, contribute there, and let's make ODP a consumption option for folks like IBM. So it's an interesting strategy, completely clean, and they've been consistent. They haven't changed their position on that. They've said, we're going to stay in open source, and now we have this consumption vehicle, ODP. That makes a lot of sense from my standpoint. I mean, I've been trying to shoot holes in this thing from day one and I just don't see any holes.
I mean, if someone wants to consume a reference architecture or some baked code base.

It's a perfect answer to this problem of, we have these ever-broadening distributions that are approaching two dozen component projects within a distribution. And so Hortonworks, IBM, Pivotal, and others get together and they're saying, okay, let's standardize on a core of those, make that the standard-gauge railroad track. And if you want to build a wider car on it, a railroad car, you can, but the wheels are going to have to stick on that track. And the example you're using is the ad unit next to the Google search results. We're all going to agree on the size of that ad unit, and then all the other stuff we leave open to innovation. But over time, we'll standardize more and more. And so I think I agree.

And here's one thing. I've been involved in the enterprise business for 30 years, I know the CIOs, I've worked with them and sold large systems to them. Here's one thing that enterprises hate. I want to get your thoughts on this. Enterprises hate when there's a scorpion that could bite them. You know the old joke: the scorpion wants to go across the river and promises not to bite the guy and poison him, and then halfway through bites him, and the guy goes, why? You said you promised. I'm a scorpion, that's what I do. So in a way, that joke kind of applies to the enterprise. They don't want someone to come in saying, hey, we're here to replace Oracle, we're here to do all this stuff. The issue is just the fact that I could get burned. So I think the challenge for Cloudera and others is you can't be shifting your strategy all the time, you've got to stay true to the enterprise. You have to give them comfort and trust. And I think that is something that we have to look at, because if ODP gets legs, it's going to cause some friction in the market over who consumes what version. I think that's a train wreck for somebody down the road. What are your thoughts?
Well, the interesting thing is, all infrastructure software is sort of headed in this open source direction. And several people on theCUBE over the last day or two have talked to us about the Red Hat analogy. But Red Hat was in a position to take all these different components and put them in a release as an operating system. The challenge with Hadoop is that the definition of what goes into this platform is broadening all the time. And Cloudera is doing a good job of packaging it all up, but it's their packaging. So if you want to move to a different distro, it's not all that easy, because someone else will have curated a whole bunch of different components. Yes, each component is open source, but with the ODP, there are a lot of cynical reasons people offer: hey, Pivotal didn't want to put all the resources into maintaining its own distro. But basically it's the value add.

Well, Cloudera, I mean, first of all, I think Cloudera's got a good strategy. I'm not saying that they're the scorpion. But what I'm saying is, if the market shifts, Cloudera could get caught flat-footed, but they're winning right now. So from their standpoint, they have a strategy. They're winning big deals like Apple, big licenses. So they're in their lane. I don't see them shifting. If they do shift, or if they can't deliver enterprise-wide, is it all Cloudera or not all Cloudera? That to me is still an open question that I want to dig into, because if I'm in an enterprise, I'm now seeing 1,000 flowers blooming, I'm seeing distros pop up, I'm seeing a lot of organic innovation. Not unlike what shadow IT did for cloud, I'm seeing that with Hadoop, where people are just playing with it, doing POCs, with kind of five different versions lying around and IT sitting there reining it in. That is an issue, so I'm looking at that, because the new enterprise market is: don't burn me. I've got to be plug and play. I've got to be interoperable and I need choice. That seems to be the vibe.
So with that, a one-size-fits-all doesn't work, in my opinion. I'm just saying that's my take.

The part where Cloudera is trying to differentiate appears to be in the management layer and in the governance around the data. In other words, they'll say, we'll tell you where the data came from, we'll make sure you know it's trustworthy, and we'll make sure it's easy to manage your Cloudera distribution of Hadoop. And the Hortonworks guys were coming from behind with things like Ambari to manage it and Zeppelin to put a UI around it, in that case, I think, for data scientists. And so that's catching up. But ultimately, I mean, we have to be realistic. Yes, we call them railroads and electricity, but the real value add was all the applications that were built on top of those. You know, the railroads, ironically, never really made that much money because they were so expensive to build and maintain, and frankly, utilities as well. But it was all the industries that were built on top of those where the money was. And so this may not be a perfect analogy, but once we standardize this infrastructure, that's where we're going to see some real wealth creation.

And that's what we see with, like, Abhi Mehta from Tresata. His is a very specific company built for a very specific application in the financial services market, where he can leverage the underlying infrastructure, but that's just an enabler to really go out and deliver a solution to his client base that's buying his application. And what's interesting is, Abhi's made a great point before about being a startup, and I think, John, you asked the question: where should startups come and play in the space?
And as I look out behind you, John, there's all kinds of little tiny tables with new companies. The answer is: find an application space, a problem space, where you can actually be the number one or two player leveraging this infrastructure. Use your domain expertise and your application development skills to take down a thing where you can be one or two. And we're starting to see that.

You know, we used to have infrastructure software where someone could build a giant company, an Oracle, and with the Oracle relational database, you couldn't really port applications between databases easily. Only the big ISVs for the most part could, and administrators were really trained on one and not the other. So you could build a real platform business. Here, it's open source, the interfaces are very similar. You've got multiple suppliers. It's not like you're going to build a $10 billion business or a $20 billion business at 50% gross operating margins on this. As a platform business. As a platform. But you can as an application, right? Yes. And especially in the API economy. Yeah. I know we way overuse the Uber example, but, you know, break it down into component parts. It's GPS and a payment system. And, you know, the other part is, look at one of the critical value-add components on this platform, all the SQL engines. You know, we're getting close to two dozen and counting. It's like hunting geese. You know, you shoot one out of the sky, another one shows up. And well, you know who's going to like that is Rob Bearden. He's a big hunter, from what I heard.

Okay, this is a wrap for the segment, but we got a big day here. We got Bloomberg R&D. We got Peter Goldmacher, who was fantastic on theCUBE in New York City. Great analysis. He's now running strategy and market development at Aerospike, a great hire by Aerospike. We got IBM, Anjul is coming on.
VP of big data analytics. BMC. John Fanelli, ex-Citrix, now with DataTorrent, coming on, looking forward to that conversation. Informatica and Rackspace. We got a lot of great guests coming in to wrap up the day. We're going to hold it together. Again, three days of wall-to-wall coverage. This is theCUBE, live in Silicon Valley at Hadoop Summit 2015. We'll be right back after this short break.