theCUBE at Hadoop Summit 2014 is brought to you by anchor sponsor Hortonworks. We do Hadoop. And headline sponsor WANdisco. We make Hadoop invincible. Hey, welcome back, everyone. This is day three coverage from SiliconANGLE and Wikibon's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. Wall-to-wall coverage. I'm John Furrier, with my co-host Jeff Kelly, big data analyst at Wikibon. Jeff, day three's kicking off. Let's get into the analysis. What's your take? Two days of meetings, talking to executives, entrepreneurs. And we've seen three levels of interest here: the startups, the growing companies, and also the big public companies. So in each one, there's a lot of activity, a lot of biz dev, a lot of new products. What's your take? Well, a few things. One, I think some of the conversations I've had here are validating some of the data we've been collecting at Wikibon around the adoption of Hadoop. And really, I think what we're seeing here is that we are very close to a tipping point where a lot of the early adopter POCs and experiments blossom into full-blown production deployments. We're getting very close to that point where Hadoop goes mainstream. We just saw this morning on the keynote panel companies like Kohl's talking about their use of Hadoop. So it's being used much more widely than just in some of the Valley companies. So that's a good sign for sure. I think one of the other big topics here over the last couple of days has been Hadoop and the data warehouse, and how they interact and how they relate to one another. Do they complement one another? Will Hadoop replace the data warehouse? And it's really interesting, particularly from the vendor perspective. There are 88 vendors here. A number of them are the data warehouse heavyweights that are household names. And it's really interesting to watch the different companies try to position their technologies relative to Hadoop.
Some are kind of tying themselves in knots trying to do that. Others have a more elegant story depending on their technology. So those are two of the big stories. And the third, I think, is of course some of the capabilities around YARN and SQL on Hadoop, or in Hadoop, depending on which nomenclature you prefer, making Hadoop go real time. And that relates to what I mentioned earlier about it being just about ready to go mainstream. So Jeff, you had a good show here. Obviously you had a great presence on stage, Merv Adrian and yourself, Jeff Kelly from Wikibon, headlining the keynotes. Talk about what you did on the panel, and then I want to talk about the survey that you've been kind of leaking out a little bit here. So let's talk about your keynote on stage with Doug Cutting, the inventor of Hadoop, and Arun Murthy, lead tech guy at Hortonworks. Really the players, the tech athletes behind Hadoop. What was your agenda, what did you guys talk about, and what was the walk-away from that? Well, the main theme of that conversation was where is Hadoop going, what's the future of Hadoop, and who better to provide their input and their insights on that than really two of the most important people in the community. Doug Cutting, who is essentially the founder of Hadoop, the father of Hadoop, founded the technology and the approach back at Yahoo. And of course Arun Murthy, who worked with Doug at Yahoo and who created Hadoop 2, or YARN, which is essentially enabling Hadoop to again go mainstream and be more applicable to different use cases. So it was a really interesting conversation. We talked a lot about how to keep the community vital going forward. Over the first, about nine years of its lifespan, I mean the first half of its life, there were no commercial vendors around the Hadoop space. It was all the community.
Then there were some startups that started getting involved in recent years, and now you see all the big companies here: Microsoft, Teradata, Oracle, SAP, they're all here. So we talked a little bit about how that's going to impact the development of the platform. And there are positives and negatives. There's a lot of money flowing into this market from those companies I mentioned, and also from VC firms. Money can be a good thing in that it spurs development, it spurs interest, and it shows that Hadoop really is becoming a critical part of the data infrastructure and is going to be in the future. But of course when you've got different competing interests from the different large vendors, that has the potential to take Hadoop in different directions that might not fit exactly with what the open source community is interested in. So we talked a little bit about that. And it's a balancing act. I mean, there's no definite answer on how that's going to happen, but keeping the community aspect vital, I think, is really critical for Hadoop to become all that it can. And then going forward we talked a little bit about where they saw Hadoop in five, 10 years. And not surprisingly they really see it as a mainstream part of the data center. And it's going to take its rightful place alongside other technologies. So the Wall Street Journal has an article today from your panel, so good job, congratulations on really generating that kind of attention on a global scale. The Wall Street Journal is talking about your interview, but they also bring up a good point, which is the skills gap. Talk about that. What did those guys talk about? I mean, a new generation of computer science programmers and developers and architects is coming into the market. What was discussed there? And why did the Wall Street Journal focus on that panel so much?
Well, famously McKinsey pointed out a few years ago in their big data study that, I think the number was, we're going to be short of about 190,000 data workers, whether that's data scientists or application developers, whatever the case might be. So this has been a theme or a meme going around in this community for a while, because Hadoop is a complex technology and in its earlier form you really needed some particularly advanced skills to work with MapReduce and program in Java. And these were not skills that were widely available. When you think about the data science level, being able to combine the statistics capabilities, the math capabilities, as well as more of the business acumen which you need to be a successful data scientist, there are not a lot of those people around. So that is kind of the main issue that we've been grappling with in that sense for the last few years. And one way I think we're going to overcome that, and we're starting to overcome that, is one, the tools will get easier. They'll abstract away some of the complexity. But the other thing is, as Doug pointed out in the conversation, the younger generations coming up through universities now are being born into this world where they're comfortable with Hadoop and MapReduce, and they understand that being data savvy is critical no matter what your line of work going forward, if you're a knowledge worker of any kind. So as that generation comes up, that's one way we're going to close this gap. And I think that was Doug's main point. I like the comment where they were kind of saying Java might not be the young guns' language of choice. I think that's true. We've said that on theCUBE many times. It's a great panel, a great keynote panel with those two legends in the industry. It's fun watching the industry grow up with those guys. See, they're normal guys doing their thing. Now they're celebrities. Yeah, well, you know, it's interesting too.
I mean, there's always this subtext, you know, and I think people were wondering, we have got Hortonworks and Cloudera on stage together, and there was a joke going around in the green room that we should have got those sumo wrestler outfits for the two guys. I wanted to make sure in that conversation that it was not a vendor food fight, because they do represent their respective companies, but they're really, I think, even more important as members and leaders of the community. So I wanted to keep it on that level. I mean, Doug Cutting, you know he works for Cloudera. It's very clear he is a pure open source guy. He's not one of these guys who just wears his team's jersey. He is a pure community guy, and Arun, obviously being one of the co-founders of Hortonworks, is all about the open source. So those two guys deserve a lot of credit, independent of the companies that they work for. I think, you know, we've seen that and we can say with absolute certainty, both guys are stand-up, awesome guys. So Doug and Arun, fantastic people. It's great to follow them and see their work get recognized on a global scale. The next question I want to ask you is about the survey that you're showing people that hasn't been released yet. It's coming out, the exclusive Wikibon survey that you put together, narrowed down, very targeted. What are some of the results there and how is that playing out? What's some of the feedback? Well, the feedback's been good so far. You know, we've been, as you say, sprinkling in some of our insights from that survey on the show over the last couple of days and on Twitter. We'll be formally releasing the results shortly. We don't have an exact date yet, but we're going through the results now and we will make those available to the community.
But you know, some of the interesting findings, I think, relate to what we were just talking about, the point I made about being very close to this tipping point where Hadoop's going to start to go mainstream. One of the interesting findings is that among Hadoop practitioners, people who have deployed Hadoop in their enterprise, only about 25% are actually paying customers of a Cloudera, a Hortonworks, a MapR, et cetera. The majority are actually using not even a free distribution from one of those vendors, but roll-your-own Apache Hadoop. So that tells me there's a huge opportunity there. I mean, I guess there are two ways to look at it. One, you could say if you're one of these Hadoop vendors, wow, we need to do a better job, we need to start ramping up the revenue. On the other hand, look at that opportunity. And also keep in mind that our survey was probably a little bit more forward-leaning than some of the work from a lot of the other analyst firms. You know, we kind of deal more with the cutting edge of technology and early adopters. So, you know, we're going to apply some of our statistical methods to this. And I think when you really take a step back, the actual percentage of Hadoop practitioners that are paying customers is probably even lower than that. So there's a huge opportunity here. And if you couple that with all the talk we're hearing at this show about Hadoop being ready for the mainstream, that's why I think we're at this tipping point. Some of the other findings, you know, one interesting finding that we haven't talked a lot about at this show, and that hasn't been a real big focus of the show, is the use of the public cloud to support big data deployments. Around 56% of the respondents to our survey who have deployed big data technology in one form or another are using the public cloud. And another 24% plan to use the public cloud.
So that's an interesting one. You know, we talk a lot about it on theCUBE, the convergence of cloud and big data, along with application development being agile. We don't hear a lot about it at shows like this, but I think that's the direction we're moving. And it's clear from our data that that's what practitioners are looking at. So what are your plans for the survey? Take us through your roadmap. You're going to take it on the road. You're going to publish it. What's the timetable? What are some of the activities around it? Are you going to do follow-on surveys? A lot of folks have asked me, one, where's the survey? How do I get my hands on it? And two, can you do more with it? Are you going to do more? Absolutely. So, you know, we just wrapped up the survey. So right now we're kind of going through the data at a high level. Once this show wraps up, my next big focus is going to be digging really deep into the data, doing a lot of cross-tabulations and finding some of those really great insights. And indeed, we're going to take it on the road. We're going to be visiting a lot of the people in this room at their respective companies and sharing some of the insights. And then we will, of course, publish some of the top-level results on Wikibon so the community can get a look at that. And then I suspect we'll be getting a lot of inquiries from our clients to dig even deeper into the data. And that's something we plan to do. And then in Q3, we're going to do another survey. We're going to follow up on this. And that's really our plan at Wikibon, to double down on our survey and data collection work, because we find that's what a lot of our clients are interested in, and what the community is interested in. Okay, it's day three. We're going to dig in a little bit deeper today. Day three is usually the day we kind of relax.
There are a lot of activities trying to get the energy going here, but mainly we're going to dig into some of the deeper conversations around the three areas we've been exploring at this show. You know, the startups, the series A and B companies; the series C funded, rapidly growing ventures; and obviously the big whales, the public companies. All three of those theaters are really pumped right now and active, all doing well in business. We're going to have CEOs and CTOs from all three of those sectors coming in and sharing with us their insights, their experiences, their objectives, and how they see this world playing out in the Hadoop big data landscape. This is theCUBE. We go out to the events and extract the signal from the noise. I'm John Furrier with Jeff Kelly. We'll be right back with our first guest of day three, right after this short break.