theCUBE at Hadoop Summit 2014 is brought to you by anchor sponsor Hortonworks. We do Hadoop. And headline sponsor WANdisco. We make Hadoop invincible.
In San Jose for Hadoop Summit 2014, I'm John Furrier, the founder of SiliconANGLE, with my co-host Jeff Kelly, analyst at Wikibon.org, top industry analyst in big data, first out ever with the big data market sizing, and now we're going to announce some big news here. We also have some new market survey information, surveying over 900,000 unique people, with a targeted direct measurement of around 300 direct, targeted affinity people. Jeff Kelly, welcome to kicking off Hadoop Summit, third year in a row.
Thanks, John. Good to be here. It's an exciting time. Over 3,000 attendees at this event representing 1,000 companies, we learned today in the keynote. So a lot happening. As you mentioned, we've done some research recently and got some really interesting survey findings that we'll be sprinkling throughout the broadcast over the next three days, kind of giving you guys a sneak peek of the survey findings before we publish them on Wikibon. So yeah, a lot of good stuff happening this week.
So exclusive data coming here from theCUBE, obviously putting out more and more free content. That's what we do. Kind of unique, Jeff. People are kind of still scratching their heads like, you know, why would you give away free content? Such a valuable survey. Stay tuned watching theCUBE. We will share that survey with you. But Jeff, let's just get right into the news. What's going on? Obviously Hadoop Summit is Hortonworks' big show, but it's not just Hortonworks putting on the show. It's the entire industry. So this is not a Hortonworks show. It's an industry show. Just like Cloudera runs Hadoop World with O'Reilly Media, Hortonworks runs this developer conference and it's the full industry. So what's the story? What do you see for news? Obviously MapR. You see Concurrent out there. You've got Cloudera. You've got Hortonworks. What's the story?
Well, anytime we have an event like this, there's going to be a lot of news that drops right as the event kicks off. All the PR newswires are flooded with announcements. Some of the more interesting things we've seen today: we've seen a partnership between Trifacta, which is a really hot startup focusing on data transformation and helping data scientists become more productive. Trifacta is teaming up with Hortonworks. We saw another big one this morning, actually an acquisition announcement. Cloudera, Hortonworks' competitor, has acquired Gazzang, which is focused on Hadoop security. And of course, this comes a few weeks after Hortonworks themselves acquired a company called XA Secure, again, a Hadoop security company. I think that says a lot about where the focus is on Hadoop, and that is making it enterprise grade. We've been saying this over the last couple of years, but we're really seeing these companies like Hortonworks and Cloudera put their money where their mouth is with these kinds of acquisitions and really focusing on, in this case, security and data governance, two critical components if you're going to run Hadoop in production with critical and sensitive data from either your customers or partners, whoever. So that's one of the areas, I should say, security, that's really getting a lot of focus at this show. The other, of course, is the continuing saga of SQL on Hadoop. All the different ways you can tackle that problem. You've got things like the Stinger Initiative, which was completed recently and essentially improved significantly the performance of Hive on Hadoop, so you can use SQL-like interactive analytics on Hadoop. You've got Cloudera's approach with Impala. You've got MapR focused on more of an open approach, supporting both Impala and Hive, Apache Drill, and of course, their partnership with HP Vertica.
So the whole SQL on Hadoop area is a really interesting one because there are so many different options, so many different ways to go ahead and process data in that format on Hadoop. It hasn't been decided which is the best fit yet, so that's still playing out.
So is there equilibrium in the ecosystem? I've got to ask you the question because this is the top story in my mind. Obviously, Cloudera and Hortonworks, there is a real story there. You saw Amr Awadallah had a tweet last night, kind of vocal, pretty targeted at Hortonworks. There's no love lost between these two companies, apparently, one-upping each other. First Hortonworks with the big acquisition, all in on XA Secure. I mean, Hortonworks makes the first move. Not to be outdone, Cloudera has their own war chest. So, I mean, it's pretty obvious. They can dance all they want, but the fact of the matter is they're matching volley for volley. It's on the table. Security by Hortonworks with their acquisition, and Cloudera: which one ranks better in your mind? Obviously, the Hortonworks one looks very solid. And I think that's going to be a game changer for them. They're all in on security. And the bottom line is you can't have Hadoop without security, especially with Heartbleed and all the paranoia around open source and security. So clearly they're filling the holes in the platform that some are saying isn't ready for prime time. What are your thoughts?
Well, hard to say who's got the better technology. They're both in the development stage. Clearly, I think we saw in the keynote this morning, Merv Adrian from Gartner talking about the life cycle of any given technology, about a 20-year maturity cycle. We're about halfway through that with Hadoop right now. And so we're starting to see more focus on these enterprise-grade capabilities.
I think the more interesting thing is just watching the two different approaches: Cloudera with their open core approach, open core Hadoop, surrounding it with some proprietary and some open source components to extend the functionality and create what they're calling their enterprise data hub. And then Hortonworks, which has really stuck to their guns from the beginning, focusing on core Hadoop, open source Apache Software Foundation Hadoop, and that's it. And they've stuck to that. I think there are two different approaches in terms of go-to-market. Obviously, again, it's very different. Cloudera is going out there to prospective customers, selling direct. Although they do have a partner network, they're more of a direct sell. Whereas Hortonworks is very much focused on reselling through partners: SAP, Teradata, Microsoft, Red Hat. So it's a very different approach. You mentioned the ecosystem and the competition between these two companies, and some of the others, MapR and Pivotal as well. Let's not forget them. I've heard at various Hadoop events that it's still a very collegial atmosphere. It's an open source community and there's a lot of cooperation, and there is. But make no mistake about it, this is a cutthroat market. When you've got this much money in the market, you've got Cloudera having raised $900 million, $740 million of that from Intel. You've got Hortonworks having raised $200 million of their own. We've sized this market: the overall big data market is going to hit $50 billion in 2017. This is a cutthroat market. There is no love lost between these companies. That's just the reality. They're going for it and it's really fun to watch. The question now, given the money involved, Cloudera in particular, is what are they going to do with all that money? That's the next question. We've got one answer today with their acquisition of Gazzang.
Similarly, Hortonworks has raised $200 million. That's nothing to sneeze at either. What are they going to do with their funds? It's really interesting to watch. It's a cutthroat market, and again, a good one for an analyst like myself to be covering.
I think it's clear to me that obviously there's success in the market. There's this whole war between both companies. It's really competition. I don't want to trivialize the competition. Certainly it's heated, there's a lot of money at stake, but I think if you look at their paths, Cloudera is aggressively going in for the platform play. I think that's why the Intel involvement is critical. And having Kim Stevenson on the board, which was announced yesterday, is a really great move for Cloudera because, one, she's a great woman in tech, she's a woman on the board, and they have a good, diverse board of talent over there. Now the question is, does the board have the muscle to actually make the bold moves? And I think the answer with this acquisition is yes. Cloudera is not afraid to throw the dough around. And that's an answer to Hortonworks' advantage. Hortonworks made that move with security ahead of Cloudera. Cloudera is quickly following. So if it becomes an arms race, Cloudera has the war chest. That means the folks on the board of Hortonworks are going to have to sharpen the pencil and hit the market for another round of funding, because clearly Hortonworks is going to be a winner as well. So I don't think the capital markets are going to be really hard on Hortonworks. I think it's going to be simply a matter of how much money do they need? What's the market opportunity? So obviously you're sizing the market in the billions and billions of dollars. If you factor in the Internet of Things, I think there's room for both companies to be billion-dollar plays, absolutely, from a revenue standpoint.
I mean, if you look at the Internet of Things, there's a lot of space where Hortonworks' strategy of open source all the time and Cloudera's platform, which is open source plus license, can work. So what's your take? Is there enough market opportunity for both companies?
Absolutely. Nevertheless, one will come out ahead of the other and they both want that top spot. That's understandable. But I do think there is enough room for more than one company in this market. You're going to have a winner, you're going to have a second place, and then you might have a third, a little bit further back. There are different approaches for sure, but this is such a young market and there's so much opportunity that you wonder sometimes, are they spending too much time competing or focusing on one another when they should be focusing more on the market? That may be a little bit unfair of a statement. They're certainly focused on the market. But as I said, there's still a lot of competition going on and we're hearing a lot from the different players, with tweets flying around about this conference in particular. And it's a really interesting market to cover right now. I think the other important question to ask is not necessarily about the competition between Hadoop competitors, but how is this going to impact the data warehouse market? We've been talking about this: will Hadoop replace the data warehouse? Does it complement the data warehouse? How is that going to play out? And I think clearly Hadoop is not going to completely replace the data warehouse. That's not going to happen. People are not ripping out their Teradata installations or their Oracle installations and replacing them with Hadoop. There are just too many mature, important workloads happening in those environments. But there's definitely competition for some of the same dollars.
As I mentioned earlier, we've done some recent survey work and we found that over 60% of people that have adopted Hadoop have actually shifted workloads from their traditional data warehouse or other systems such as mainframes to Hadoop. So there's definitely going to be a shift of the workloads, and with that goes some of the revenue for some of those data warehouse vendors. So I think that's something to watch. We'll see how that plays out. We've done some...
So I've got to ask you about the survey. We're going to get to that in a different segment. We're going to announce all those survey results. How many people are paying for subscriptions and how many are freebies? And how does open source in general impact software revenues?
Well, it's interesting. Around 30% to 35% of the folks using Hadoop that we talked to are paying companies like Cloudera, Hortonworks, MapR, Pivotal, IBM. But a good chunk of them, close to half, are using either roll-your-own or one of the free distributions from one of those vendors. So what that tells me is we're still in a very early phase. We're seeing a lot of proofs of concept, projects in the experimentation phase, and there's still a huge opportunity when those projects move to production, because they're going to look to providers such as Hortonworks and others to help support those production workloads. So it's actually a little bit smaller market than I think we thought in terms of the number of people using Hadoop that are actually paying these distribution providers. But in a sense, that's a good thing, because that's a huge opportunity as the proofs of concept and experiments kind of grow up, if you will. So it'll be interesting over the next year to see how fast that happens. We spent the afternoon yesterday at an analyst event Hortonworks held. They threw out a few stats: they're adding over 70 customers per quarter.
And you've got to wonder how many of those, as they expand their deployments, are going to become truly large, jumbo paying customers, as some of the Hortonworks folks put it. So, we'll see. It's a really interesting market, as I said. Huge opportunity. And the dynamic with the data warehouse vendors is also really interesting: how Teradata is embracing Hadoop, where with some of the other vendors maybe you're not seeing quite as much of an embrace. And then how that's going to impact their long-term revenue will be interesting to watch.
So I've got to ask you about the survey: when are you planning to announce it formally on the blog? Just give a quick tidbit on this before we break for our next segment. Obviously 900,000 unique people on our monitoring system led you to a targeted group of 300, roughly, give or take, a really targeted survey. What was your focus? What was the objective of the survey and what are you looking for? How are you going to roll that out?
Sure, so we ended up surveying about 303 big data practitioners. As I mentioned, about 36% of those folks, 110 or so, have deployed Hadoop either in a proof of concept or experimentation phase or in production. And basically what we were after in the survey was a few things. One, we just wanted to understand, of course, what types of use cases they were deploying these big data technologies for, big data being not just Hadoop but also analytic databases, data visualization tools and other things. So we got some interesting data around use cases. We also wanted some profile information. Who are these people? What industries are they in? Are we seeing any trends in terms of industry, one industry adopting big data technologies over another? And then of course we asked about barriers to success. What are the things that are really preventing practitioners from achieving the full value of their investment in big data technology? So we got some interesting data on that.
And we're able to go back and look at that and compare: are we seeing certain barriers impacting certain industries over others, or are we seeing certain barriers coming up at different points in the life cycle of a deployment? So a lot of great information. Also a little bit about deployment methods. Interesting data around how many big data practitioners are actually using the public cloud in one form or another to support their big data projects. So a lot of great data. We'll be sprinkling some of the insights throughout the three days here. In terms of publishing on Wikibon, you'll have to stay tuned on that. It will not be too long, but we're really excited about the project, about the survey. We're going to continue to follow up on this work with more surveys and, as you said, we're always watching the market.
So I'm really looking at the raw data right now. It's really impressive. Certainly the passive monitoring of the 900,000 on our CrowdChat platform is fantastic. But the questions you have are very interesting. And so I've got to ask you, how do companies get involved? Are they sponsoring this? Was it sponsored by vendors? Is it your product? Just clarify some of the nuance. And how do people get involved to influence some of the questions? Are you open to that? Do they pay to get involved? How do they get the data?
Sure, so the Wikibon business model is one in which companies and end users will come to us and basically underwrite our research. So this particular survey was not sponsored by any one given vendor. This was proprietary Wikibon. Yeah, it was put on by us. It was supported by the revenue that we drive from all our clients, because we think it's valuable information for everybody in the Wikibon community. If you want to get more involved with Wikibon and understand some of the things we do, obviously, you can tweet me at Jeffrey F. Kelly. You can email us at Wikibon. You can find us on Facebook, LinkedIn, et cetera.
The Wikibon community has open arms. So we welcome everybody and we'd love to hear from you.
Okay, this is theCUBE. I'm John Furrier with Jeff Kelly. We're going to be breaking down Hadoop Summit for the next three days, two and a half days, live in San Jose, live in Silicon Valley. This is theCUBE. We'll be right back with our next guest after this short break.