Okay, we're back live at the Strata Conference. I'm John Furrier, the founder of SiliconANGLE.com, SiliconANGLE.tv. We're here live with theCUBE at the Strata Conference. We're going to extract the signal from the noise. This is day one. I'm here with my co-host Dave Vellante from wikibon.org, the research firm, open source, research free. And we have some recent news here, and we want to cover it here in a drill-down about big data. We've heard from some scientists. We've heard from open source. We've heard from nonprofits. We've heard from big consulting vendors like EMC. Big data is a big business changing the world and changing society. It's creating a ton of opportunities. And the big number that we're seeing is a $50 billion market. This report was put out by Wikibon's own Jeff Kelly, the best analyst in the business, who put out the only report, the first report on market sizing, market share. Jeff Kelly, welcome back to theCUBE. So, one. Thanks for having me. What's the feedback from the report? So, what do you hear? It's good to be first. You take all the arrows, right? Right. You're a man for doing that. We know there's some real-time wiki editing going on when the calls were coming in. So, one, tell me about the report, feedback about the report. First, tell the folks out there a high-level synopsis of what the report is, what the feedback has been, and what corrections and/or iterations are you making? Mm-hmm. Well, basically we set out to size the big data market because we didn't think anybody else had done it. Yeah. Right, yeah. So, essentially, the first challenge there was defining big data for the sake of the report. And we cast a pretty wide net, including Hadoop, but also other analytic tools and applications as applied to big data, kind of the next-generation data warehouse market as well. So, that's kind of where we came at it.
We then had to look at not just the Hadoop pure plays, but also the big guys, IBM, et cetera. IBM actually came out on top, due largely to their wealth of database options and their adoption of Hadoop. So, you included, for example... Hold on, Dave, Dave, just to interrupt. Mark Hopkins, if you could put on the screen for the folks out there who want to look at this report, it's wikibon.org/bigdata. So, you type in that URL, you put it up on the screen on the lower thirds, if possible, just so people know where the report is. That'll take you directly to the research report, the big data market size and vendor revenues. Huge report. Sorry, Dave, didn't mean to interrupt. No, it's okay. So, Jeff, you were going through the definition. You said you cast a pretty wide net. So, you included things like, obviously, you look at hardware, software, and services, the whole enchilada. That's how you get to five billion in 2012. But you included things like Oracle's Exadata, correct? We did. All of Exadata? Well, not all of Exadata. Certainly, that was one of the issues. Some people have taken issue with us. I don't think all Exadata is big data. No. We could have a little debate about that. Well, you mentioned big data washing earlier, so that may fall into that category. But we considered a certain portion of Oracle's Exadata revenue to be big data based mainly on the data volumes involved, as well as, you know, it is a unique appliance. And again, just given the volumes involved, we thought it was fair to include a portion of their Exadata revenue. Okay, so, and your numbers are, I said, $5.4 billion this year, up to $53 billion by 2017. What's that growth rate? That's a... So, somewhere around 58% compound. Compound annual growth rate, yeah. Now, you had this slice of that five billion that is pure play. So, you said IBM led all players, and a lot of that is services, I presume, right? Yeah, most of it on there.
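Editor's note: the "somewhere around 58% compound" figure quoted above can be sanity-checked from the two endpoints mentioned in the conversation. The sketch below is not from the report itself. It assumes $5.4 billion is the 2012 base, $53 billion is the 2017 figure, and growth compounds over five annual periods; the function name `cagr` is our own.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate, returned as a fraction (0.5 means 50%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    # Solve end = start * (1 + r) ** years for r.
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Assumed endpoints, in billions of dollars: 2012 and 2017 as quoted on air.
rate = cagr(5.4, 53.0, 5)
print(f"Implied compound annual growth rate: {rate:.1%}")
```

Under those assumptions the implied rate lands just under 58%, consistent with the number given on air. A different base year or period count would change the result.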
A lot of software and their analytics. Well, yeah, they've got a lot of software. Obviously, they've made countless, it seems, analytic acquisitions. But also, yeah, they lead their big data play with services first and foremost. So, talk about the pure plays. Who led in the pure plays? Well, overall in terms of revenue it was Vertica, now part of HP. Are they a pure play? Well, we included them in the pure plays. Essentially we included a handful of the next-generation data warehouse vendors in the pure-play market, mainly because we feel like they were doing a lot of the innovation prior to, and even since, their acquisitions. And also, they really haven't been kind of polluted, as we say in the report, by their acquirers at this point. Vertica's been allowed to, for the most part, continue. Wouldn't HP say we add value to our acquisitions? Well, you know, I'll let HP speak for themselves. They surely would, but. Or EMC, who acquired Greenplum. I think that EMC number might be a bit high. What are your thoughts on EMC? I mean, Greenplum is not doing that well. Yeah, what else is in there? So you're saying that's just pure play, Greenplum; Isilon is not, and that's just Greenplum. Right, that's the Greenplum. I mean, the Isilon is running as Greenplum. Greenplum is running as Greenplum. EMC overall. 43 million? All right, EMC overall, we included a slice of the Isilon business as well as their Hadoop business, which obviously, you know, falls under the Greenplum area, as well as their services. So what's your take? Well, Bill Schmarzo basically ratified the fact that there's big business in the integration with BI, which I would agree with. It makes a lot of business sense. Can you talk about the pure play and/or greenfield? Because Cloudera and Hortonworks are battling it out, and we'll have both those companies on here at theCUBE this week.
So tomorrow, Cloudera is coming on and Hortonworks is coming the next day. And by the way, Cloudera helped sponsor us to get here. So shout out to our friends at Cloudera, whom we love for sponsoring us. Thank you very much. Those two guys have a greenfield. So they have no legacy. They're pure, you know, big data, commodity hardware. So they're hitting the greenfield. What are some of the greenfield applications that you're seeing for those guys? Yeah, and Cloudera would be the pure-play Hadoop, right? So let's talk about that a little bit too. Well, essentially, you know, those vendors are doing the major innovation around making Hadoop as a platform stable, enterprise ready, you know, making it ready to go into enterprise environments where there are strict SLA requirements, uptime requirements, things like that. There's still a lot of work to be done, of course, but that's where they're playing a major role. Cloudera in particular is also working very closely with a number of partners. Like Oracle. That's not the direction I was going right here. I want to go and talk about that. But they do partner with Oracle. We should talk about that. But they are also partnering with a lot of the big data application vendors out there that are sprouting up everywhere these days, it seems, trying to make it as easy as possible for big data application vendors to apply their applications to Cloudera's Hadoop stack. So, and then of course, Hortonworks is taking a little bit of a different model. They recently had a shuffle in their executive ranks. Well, the COO became CEO. That's not really a shuffle. Well, their CEO, I mean, everyone at Yahoo knew that he wouldn't be a long-term CEO and he was publicly saying that, you know, when I interviewed him when they did the spin-out or the, quote, walkout. So this wasn't a demotion, right? No, no, not at all.
I mean, the guy from Benchmark came in clearly to run the show, and that's clearly the way they're going, and they're going full throttle with the Benchmark playbook, so. That's pretty common, right? I mean. For Benchmark, it's clear. That's their playbook. They're going to go right at Cloudera, drive for second place as fast as they can, and then see what they can get from there. Well, you know, they confirmed or recommitted to their 100% open source distribution. Who did? Hortonworks. Oh, absolutely, yeah. Recently, when they got a little bit of press around that move in the executive ranks. But how did Hortonworks respond to the massive number for Cloudera's revenues? Because, you know, the basic contrast between Cloudera and Hortonworks right now is, I'll say the service revenue is Hortonworks, and Cloudera's got a license with the Enterprise Edition and their proprietary add-ons. You know, you've got to look at that and say, huh, you know. Well, look, they don't have a product yet, Hortonworks. They're putting out their own Hadoop distribution, which essentially hasn't gone GA yet. They've got their services business, which is how they're going to make money. But, you know, they're still a young company. They're still, as far as we know, working with Yahoo as their main customer. Some work with Microsoft. But, you know, they've got a ways to go. They've obviously got some smart people and some real talent there. And I certainly think they can make a go of it. But like I said, they've got a ways to go if you look at our report. Cloudera's got a pretty significant lead revenue-wise at this point. I mean, they've definitely got the marquee proofs of concept going, from what we're hearing. But it's still small. I mean, we're talking about 300 million out of 5 billion, right? So there's still a lot of upside there, right? Yeah, absolutely. What about acquisitions?
I mean, are these companies all going to stay private, in your opinion? Or is there going to be, based on what you're hearing at conferences like Strata, a whole new slew of startups? I mean, "other" is only 10 million, but there's got to be like 10,000 companies in there. Right, right. Well, there's going to be tons and tons of big data application startups. You're already seeing it. You're going to continue to see it over the next five years. I think with a lot of the players we have here on the pure-play chart currently, you are going to see a lot of consolidation in the market, and they are going to be acquired, a good number of them, especially the more successful of the players, much like we saw in the BI world a few years ago, 2007, 2008, much as we saw maybe last year with the next-generation data warehouse vendors who we've still included in the chart, Vertica, Aster, Greenplum. And then, this might be a good segue into the Cloudera-Oracle conversation. We're seeing a lot of partnerships among some of the bigger, more traditional database vendors and some of these Hadoop-focused distribution vendors. Cloudera and Oracle signed a deal. Teradata and Hortonworks just announced a deal, I believe earlier this week, it might have been late last week. So, we're seeing the traditional database vendors. Well, Microsoft cut a deal with Hortonworks. They reaffirmed that deal in another announcement today, which I got an email about. So guys, how's this shaking out? You've got basically the three big guys, or the three main players in Hadoop distributions are Cloudera, Hortonworks and MapR. And I guess LexisNexis, which isn't Hadoop, and then IBM sort of, right? So what's the landscape look like? Can we talk about that? Last time we had this conversation was at Hadoop World. I'd love to get your thoughts on it, on how you see that all shaking out.
Well, the landscape in terms of the options out there, if you want to deploy an internal Hadoop. A year ago, you had Cloudera, and we had Amr Awadallah on at the last Strata, in February, John. And I said, what about competition? Dave, there is no competition. Now look, you've got Hortonworks. Yeah, but they had a sizable lead, and they had the marquee proofs of concept, and they're perceived as better code, given the premium upgrade, because the freemium model's totally working for Cloudera. So the horses are on the track, let's handicap them. What do you see? Well, among the pure plays, you've got Cloudera, as we said, clearly in the lead revenue-wise. They've done a lot of successful proofs of concept, and their challenge right now is going to be to expand those into really large deployments. Hortonworks, as we talked about, they've got to prove it now. They've got the talent, they've got the business model that they've settled on, and now it's their time to go... Get the backing. Right, it's now their time to go show they can make a business out of it. MapR... Which is going to be purely a services business. Purely services, technical support and training type services. MapR is taking a slightly different approach, depending on who you ask, forking the Hadoop code, or they would say improving it. Well, so let's talk about this a little bit. John, I'd love your opinion on this. I mean, Hortonworks, I like the strategy, because it's pure services, but it's hard to leverage services. It's hard to scale services. It's hard to get, like, five X revenue valuation multiples on services. So that's sort of one thing. What's your take on that? Versus, say, the Cloudera model, where they started in services, and are now really working towards software. Regarding Hortonworks, do you want me to sugarcoat it, or do you want me to do it? No, I want that. We want, hey, listen, this is theCUBE. We want to know how it is.
When everyone says, John, sugarcoat it, that means they're not going to sugarcoat it. Look, Hortonworks is a great company. They have really good horses inside the company. You know, as we say in baseball, it's from Arzo, pitching wins pennants, and defense wins Super Bowls, right? You know, Hortonworks clearly has got the engineering. A lot of contributors. A lot of talent. Well, whether they have more contributors over there than at Cloudera, they can argue that. They can piss in the wind on that, but both companies, Cloudera and Hortonworks, have killer engineering talent. Okay, and then Hortonworks also has the service contract with Yahoo, so they're also still kind of collaborating, quote, with Yahoo. Separate from Yahoo, Hortonworks has got great talent. Here's the problem with Hortonworks. Services revenue's not going to cut it. They've got to pivot, which, I hate that word, but services revenue's not going to cut it. It's a total red herring. I think that's just nice messaging for the marketplace. I think you're going to see them move very quickly to a freemium model, and figure out that, Cloudera and Hortonworks, in the scheme of the dollars, if the market continues to grow at this pace, and if it does, then these numbers might still be on the small side. If they are on the small side, it's just too much money, and the competition's not Cloudera or Hortonworks. It's those guys against EMC, Oracle, and others. So to me, that's the big issue, and I think Hortonworks is going to wake up and say, okay, we're a solid number two with Cloudera as the new school, and from there, you'll see a lot of different stuff. Okay, now you've got Cloudera who's doing a pivot, right? It was really primarily selling training services and other services. Cloudera's never done a pivot. They've always had that model. Okay, they've always had that model, but they weren't actually executing on it until what, a year ago, maybe? No, they've always been executing on it. On software?
Yeah, it's always been their model. Cloudera has not changed from their mission, which has been great, which is they want to change the world, commodity hardware, Hadoop, contribution, and their business model was clear from day one. And the way you make money in that open source world, Jeff, with Cloudera, is how? Well, they essentially add some proprietary tools around the management of your Hadoop cluster, as well as services around managing Hadoop internally at their customers' sites. So essentially that's how they're making money there. The core of the Hadoop distribution that they are selling, or I should say are giving away, is pure Apache, 100%. Okay, so, but the Enterprise Edition is a proprietary set of code that you have to pay for. They don't make that source code available. Exactly, the management console, which allows you to kind of streamline the management of your Hadoop cluster as opposed to doing it with the open source tools and doing it yourself. Okay, but MapR gets criticized for being proprietary, and isn't Cloudera proprietary? Well, the difference there is, some would argue that the core of the Hadoop ecosystem is the file system. And that's where MapR takes a proprietary approach with their direct access NFS, which even competitors, I think, would agree does improve performance significantly over HDFS at this point. It removes some of the single point of failure issues, improves performance speed. And it's also, according to MapR, API compatible, which means while the code isn't open, it is easy to move your data in and out, to go from MapR's distribution to someone else's if you wanted to. So there's value in that proprietary approach? Oh, there's absolute value. They're starting to make a little bit of money, they're starting to make some noise.
And I expect them to, I mean, they have a solid but potentially limited business, but I still think they can make some money based around those organizations that are a little bit more mature with Hadoop, that kind of know where they want to go with it, rather than those in the experimentation phase. Does EMC buy them? They've got a relationship, or do you think EMC will go in a different direction? Well, EMC is already to some degree going in another direction. They've rebranded their MapR-based distribution as Greenplum MR. And again, they're focusing that on the high-end customers that have very sophisticated big data needs, that know what they want, that are really leveraging big data for monetary value now. Okay, so they also have Greenplum HD, which is their enterprise edition, that does not include MapR. So they're hedging all the bets. So who's the favorite on the track, Cloudera? I would say Cloudera is the favorite right now among the pure plays, for sure. I mean, you've got to consider IBM. Yeah, so of those three. Now, actually, before we handicap, let's bring in IBM. Where do they fit, and are they maybe the favorite horse? Well, I certainly wouldn't count them out. I mean, the one thing is they have a lot of different products that overlap. They've got, as we mentioned earlier, all the analytic companies that they've acquired over the years, they've got DB2, the InfoSphere warehouse line, and Netezza, they've got their own Hadoop distribution and a platform called BigInsights. So, you know, it can be a little bit confusing when you're trying to understand all the options they have. And as I said, they lead with their services business and they go in, unlike someone like an Oracle now saying, hey, we've got a big data appliance, easy to understand, and similarly with EMC, IBM says, well, let's talk about your business problem. What do you need?
And then they say, we've got you covered, but let's figure out what you need. Stop, IBM, didn't IBM outperform Apple last year? At least at one point in the year. I believe, I believe you. All right, we've got to wrap here, but so last question is, if you had to handicap those four as the leading distributor of Hadoop, who do you put in the lead? Right now, I put Cloudera in the lead. My honest opinion is that they will be acquired at some point in the next three to five years. Who it will be is hard to say, but I think this conversation is going to be very different in three years when we're talking about who's leading, because we're going to be talking about the mega vendors. Okay, Dave, just some quick news bites from here, the news cube, we'll bring you some quick news cubes here. Vanessa Alvarez from Forrester canceled her appearance, so folks, she won't be here. Forrester is not going to have any presence here. Wikibon will carry the load, so we'll help her out. Vanessa is a good friend of ours. Other news, Apple is announcing their iPad event on March 7th coming up, and so that's going to be a big thing. Obviously, the iPad really represents the future of what big data can represent. We saw that at SAP Sapphire. You're going to get an iPad 3? I'm definitely getting an iPad 3, for sure. Give the other one to your kids. I already lost that one to my wife and kids. And so just a lot of good stuff going on. Here, we're seeing a lot of discussion about journalism. Data journalism going on. So obviously, you have your pet little memes floating around here. Obviously, O'Reilly is very academic, very commercial developer kind of mindset. We bring more of the business mojo here with the market share numbers, so Jeff, thank you very much for that. There's another theme that I wanted to just bring up, and it sort of relates to big data, though it's really not related to Strata, but there's this hard disk drive shortage.
A number of the companies like HP and Dell have indicated that the hard disk drive shortage has hurt them, has hurt their supply, and they've had to raise prices. My question is, what does that do to the cloud? What does that do to flash? Does it accelerate the adoption of those technologies? And that's just a theme that has been sort of bouncing around. Well, they had a session here called Deep Data, and when I looked at the sessions on this one, I thought that they were good sessions. I noticed they were kind of middle of the road. They weren't as deep. I think you're going to see a different kind of dynamic at the different conferences. We will be at Hadoop Summit, and you're also going to see an HBase conference being put on by Cloudera. So a lot of great stuff going on, a lot of great news. IBM just cut 1,000 workers. HP is laying off 250 from the webOS division. iPad event, whole world's changing. The big guys are retooling. iPad's changing the world. Obviously, data is a big part of that. I believe that big data is kind of a revolution, the kind we haven't seen since the PC. We're going to bring it all to you from SiliconANGLE TV.