Live from the San Jose Convention Center, extracting the signal from the noise, it's theCUBE, covering Hadoop Summit 2015. Brought to you by headline sponsor, Hortonworks, and by EMC, Pivotal, IBM, Pentaho, Teradata, Syncsort, and by Attunity. Now your hosts, John Furrier and George Gilbert.
Okay, welcome back everyone, we are live here. Day two at Hortonworks' Hadoop Summit 2015. This is theCUBE, SiliconANGLE's flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE. I'm joined by my co-host, George Gilbert, our new big data analyst at wikibon.com. George, kickoff segment, day two. Three days of wall-to-wall coverage here. Again, another consecutive year for us; we've covered all the Hadoop Summits. What's going on? I mean, a lot of action. Enterprise grade, operationalizing. We had Merv Adrian from Gartner yesterday. A lot of good stuff. What are you seeing after day one? What are the signals coming from this market?
So, a couple of high-level things, which were somewhat contradictory, actually. Some people were saying, hey, we've got this infrastructure that's like electricity now, you know, ubiquitous, and people are going to start building apps. And others are saying, you know, we've got this ecosystem that is throwing off innovation at a pace that we've never seen. But because there are so many pieces, we've got this complexity that is kind of daunting to administrators and developers. I thought it was interesting that the Cloudera folks were talking about applications, like in financial services related to fraud, or telco, and the Hortonworks guys were very much focused on ubiquity. And then Abhi Mehta, who did the kickoff with us, was saying that really where we want to go is turning analytics into a set of building blocks for automated intelligence. And that's where, you know, Hadoop is the foundation and we start building these repeatable applications.
So as you look at the market, I'll say, you have your research agenda, you just put out a great post, and you've got more research coming around the systems of intelligence, with, essentially, you know, IBM even saying they like that better than systems of insight. Joel, that's his personal opinion at IBM, but that teases the trajectory of where this is going, and you think about apps, you think about what needs to happen. And I think that's a key indicator. The other key thing that I'm looking at, and you brought up Cloudera, is Cloudera's relationship with EMC, which is very telling, because Cloudera is growing up, and working with EMC means that EMC is in a good position to be that platform for big data where you don't have to say I'm a Hortonworks shop or I'm a Cloudera shop. They're not mutually exclusive anymore. You can have workloads and apps driving the tooling underneath. And so what that means, in my opinion, based on what I'm hearing and seeing and talking to Merv at Gartner, is that we are going to see an environment where it's post-Hadoop. Hadoop is going to be there, that's a no-brainer, it's going mainstream. But there's an indicator that there's a world beyond Hadoop. So I don't have to have everything running on Hadoop. I can have Hadoop, I can have other platforms. This could be replatforming, if you will. It's very interesting. So what that means is, if you have projects going on, there's a risk factor going on. So to me, it's like, okay, I'm a buyer consuming this technology. I have risks on two fronts, failure and success. So if I'm successful in my Hadoop deployment, what happens? Do I put my data in Hadoop? Now I've got to move the data around. So there are now integrated intricacies of storage and flash, and we kind of teased it out yesterday, and no one's really talking about that. EMC is really the only one here talking about that. So that means the customer deployments will not be one vendor. It's not a Hortonworks, it's not a Hadoop.
So what's your take on that? I mean, do you agree? What's your sense of that? Because ultimately, for the customers, it's going to cost them more money if they're successful and they have to retool. I mean, look at Teradata, for instance.
So actually, we did hear from one customer about an unnamed, sort of traditional data warehouse vendor where, if you add in the acquisition cost and the operational cost, effectively you're paying $100,000 a terabyte for a traditional data warehouse. And you compare that to Hadoop, where this particular developer or system integrator cited roughly $1,000 a terabyte. I think that's a little bit low. But where you see the blessing and the curse of success is that Apache is set up really to help govern and release a couple of projects. And each of the distros has like 17 components now. And there's nothing that's encompassing all those and making them into coherent platforms.
What happens when customers have multiple versions of Hadoop running? So let's say I'm a big bank, for instance. I do this little audit and they say, hey, I have nine different distros running in my enterprise. How do they rein that in? Or is it kind of a problem?
There's a huge administrative overload. There are some tools, management tools, that can help with that. WANdisco, for example, can make it possible so that you can actually have Isilon from EMC running your storage on one cluster and sort of commodity storage on another cluster. And from the point of view of adding the data, you don't know and you don't care about the difference. The thing, actually, that the Cloudera guy said relating to EMC was that some customers need storage that's manageable and that grows much faster than their Hadoop compute, so they might have an Isilon cluster growing 100% a year that has manageability they can't get from Hadoop.
So they want to keep their data in Isilon and then they want to bring Hadoop to Isilon, and that's something that's possible.
Well, we've got a big lineup today. We have a lot of people stacked up. It's like the airplanes backing up at the airport. Starting off, we have Talend, we have Attunity, we have Teradata, Facebook. We have someone from Facebook; it's going to be great to talk to those guys. David Richards from WANdisco, always great to get David on, he's awesome, very articulate, and we love WANdisco. And then we have Scott Gnau, who's the CTO of Hortonworks. He just gave the keynote speech. He's going to be great. Herb, the president of Hortonworks, I saw him last night. Jim Campigli, he's coming on. And then we've got IBM, EMC. Man, what a lineup.
You know, I just want to jump back to one thing from yesterday that was interesting, because they were two opposite ends of the same spectrum, which was looking at Pentaho and looking at Syncsort. We don't normally think of them as core foundation platform parts because they're not part of Hadoop. But Pentaho has an end-to-end tool that does the data prep all the way through to the presentation and analysis. And that takes away all the difficulties that you normally would have when you're moving data from tool to tool, where you have to explain, okay, from the tool you're coming out of to the tool you're going into, how to stitch them together. Syncsort is doing something different, which is really interesting, where they're saying, we're just going to go deep on the data preparation and offloading from a data warehouse. But we're going to put all the information that you need to pass off to the next tool, the business intelligence tool, down in the Hadoop infrastructure. In this case, it's in Hive. So they can achieve, to some extent, that seamless handoff that formerly you needed an end-to-end tool for.
So I think the tools have room for a lot of innovation.
Yeah, and there's a lot of room for good tools. I mean, it's a tooling market. But again, I'm back to this replatforming message. What's happening, in my mind, in the customer base is, you need tools. And the tools are dictated by the application. So the applications decide what it looks like underneath. So I would say the big story that I'm trying to sniff out at this show is, what does the platform look like? Because I do not believe that it's a Cloudera-only situation. I don't believe it's a Hortonworks-only situation. I don't believe it's a Pivotal-only situation. I think customers will have pockets of platform components from different vendors. And that question is not really being addressed. Merv kind of brought it up; we talked about it yesterday. And I want to hear more. I want to hear, okay, give me the real-life customer scenario where, okay, let's take the trajectory of success. What's the cost involved? And are there headaches or not? If there are going to be more costs, like, oh, well, we put everything in a data lake, it's all in Hadoop, and now I've got to move the data, I've got to replicate the data, that's more cost. You know what I'm saying? So I want to hear a seamless story. I want to hear that.
You're hitting on what I think is the biggest issue, and it goes back to Merv's point. We have 17 products in these distros, and the Hortonworks guys were saying, we're trying to simplify by putting, you know, Ambari on as a management kind of skin. It's deeper than the skin around it. You know, there are these other tools for security and so on that would hide the fact that there are seams between 17 products. But there's another alternative, which is Azure from Microsoft, Google Cloud Platform, and Amazon Web Services, where they design, build, test, integrate, deliver, and operate a complete platform.
And those, I think, long-term represent the real competition to the Hadoop distros. Because-
Yeah, and this is the new formula. Okay, so where does this go next? And I think that's the story.
Yes.
All right, so let's talk about what's coming up today, and what happened yesterday. Bottom line, what is your take on day one? Give us the analysis on what's going on. I mean, big picture.
You know, we asked pretty much everyone, how did Hadoop cross the chasm? And we got some different answers. You know, one was YARN, which is, you know, a piece of infrastructure, not a solution. Others said financial services, like fraud detection, or retail merchandising. I'm not sure we've crossed the chasm yet. There are a lot of people deploying it and kicking the tires, but it's not clear that there's something that every early majority buyer says, I can write a check for. With the possible exception of data warehouse offload, where, you know, at $100,000 a terabyte and data growing 50 to 100% a year, IT budgets can't afford to keep adding-
Well, I mean, you can argue that there's a mini chasm that's been crossed. I would agree with you on that statement, but I would also say, looking at the industry's success, is it going to implode, or did it land somewhere? And I think the landing is a series of chasms. I think it's a halfway point, if you will, or a landing area before the chasm. It's like when you hit the ball in the water in golf and there's a drop area. So I think right now they're in play and they're good in this ecosystem. Obviously the buzz here is fantastic. There's a lot of activity. So I don't get the sense that people are dragging a little bit, going, oh, wow, where's the meat? Where's the meat on the bone? I think it's like, okay, we know there's action, and I just don't believe Gartner's numbers. I don't think Gartner's survey is representative of the mainstream audience.
I think they have a point to make, and I think the message is clear, and I can buy the rationale from Gartner. And we love those guys, and Merv is great, but the sample size just isn't big enough. We're in a global economy, at a total inflection point. I just think a better way of identifying what that is has to come out. Sending an email blast to thousands of people, where 300 people fill out a form that says, are you using Hadoop? I don't think that's going to be workable. So again, we're keeping an eye on it at wikibon.com. This is theCUBE, day two, kicking off live in Silicon Valley. We are here on the ground with the team on the Twittersphere. Join the conversation, go to crowdchat.net slash Hadoop Summit, go to the hashtag Hadoop Summit, and of course, hashtag theCUBE. We'll be watching and will answer any questions you have, so send them on. I'm John Furrier.
I'm George Gilbert. We'll be back after this short break.