Big Data SV 2014 is brought to you by headline sponsors WANdisco ("We make Hadoop invincible") and Actian ("Accelerating Big Data 2.0"). Okay, welcome back everyone, we're here live at Big Data SV. I'm John Furrier, the founder of SiliconANGLE. It's our day one wrap-up of Big Data SV, covering the Strata Conference and all that's happening in big data in Silicon Valley, all the innovations. I'm here with my co-host Dave Vellante, the co-founder of Wikibon.org, joined by Wikibon industry analyst Jeff Kelly, who just put out another seminal, authoritative big data study. Guys, this is the day one wrap-up. I think the initial vibe here, and in the report, is a ton of news. You're seeing the distros really being successful. Obviously Hortonworks has the Red Hat announcement, and they were just on earlier with Microsoft. You're seeing big data explode into more of a budget-oriented discussion. We always ask, where's the proof in the pudding, where does the rubber meet the road, where's the meat on the bone? And that is customers. You're seeing a lot more business conversations with budgets, with deliverables, with outcomes. So very exciting. We had a lot of great people on. Dave, what's your take? First we'll get your take on today and then we can go into some of the commentary. Well, like I said earlier, I think you've seen a lot of focus on partnerships, a lot of distribution, a lot of relationships, a lot of ecosystem development, which is a sign of maturity. I think we are hitting that phase: we've heard a lot of announcements and products and other innovations, and now you're seeing more partnerships, you're seeing new pricing models, you're seeing people tweak, and I think toward the end of this year you're going to start to see the pretenders separated from the contenders. Jeff, what's your take on that?
I don't know if it's going to happen this year. I think it's going to take a little bit more time before the pretenders get weeded out of the field. Yeah, because there's still a lot of VC money out there. There's a lot of money out there, so I don't think it's going to happen this year. We saw a couple last year, we'll see a couple this year, but I think it's going to be a couple more years before we really start to see it. In our industry report we had, what, 70-plus vendors. I think next year we'll have just as many, but in three years it might be half that, so we'll see. But it will shake out. So when does the bubble burst? You say in two years? I think at least a couple more years before you start to see the field thin out, through acquisitions, and some will fold. So Dave, I've got to get your take. We always talk about the entrepreneurial sidebar conversation: startups, innovation, Silicon Valley. From your standpoint, what's the state of the startup landscape for big data? So I think, as we were just talking, there's still a lot of VC money in the pipeline. Even though they wouldn't call it a roll-up, it's really interesting to see what Actian did in terms of bringing in all these different companies and presenting them as a platform. At the same time, you look at a company like Paxata. Paxata is solving problems that people have been talking about, and we'll see. I haven't seen the solution yet, but from what they're describing, and the customer uptake they're describing, there's still innovation going on here. We saw that with ClearStory last year, sucking in a bunch of money. So I think there's still a way to go.
I think a lot of the problems, whether it's text analytics, making Hadoop run better, or real-time performance, still haven't been solved, and there's money pouring into a lot of startups saying, okay, go ahead and solve these problems. So when I was saying that toward the end of this year you're going to start to separate the pretenders from the contenders, I mean this is the year that if you're very well funded with VC money and you've been around for four or five years, you'd better start to show customer traction, or you're going to be out of money in a couple of years and be one of the ones that disappears. It's like the jockeying before the start of the race: you're starting to see the positioning take place, and as you squint through it, you can start to see which companies are better positioned. I think the big distro vendors are all grabbing their spots, their elbows are sharpened, and we'll see where it ends up this year. This is going to get intense this year. It's going to be a real fight, because you're starting to see real revenue from Hortonworks, from Cloudera, and from MapR as well. MapR is not going away, by the way. Yeah, we never thought they were. No, I mean, some have questioned their longevity, but I think they're around. That's the Silicon Valley whisper, you know? Yeah, you hear it all the time. But I think this year is really when it happens. You've got companies like Cloudera, for instance: they need to hit the $100 million revenue mark because they want to go public the following year. Hortonworks, similarly: I think they have now revved up their channel.
They've got Teradata, they've got Microsoft, a sales force ready to go out there and really start selling. I think you're going to see a real boost from them this year. There was some back and forth in the market between those players and MapR as well, and I think it's only going to increase this year. The stakes are high. There's a lot of money being invested, there's going to be more money invested in those companies this year, and you might see more raises. So John, when we first started here, it was like, okay, what is Hadoop? How can I use Hadoop? And Cloudera had no competitors. You were early on; you used to be co-resident at the Cloudera offices, so you had early line of sight into the market trends. How has it evolved in your view? What has surprised you, and what has not surprised you? A lot of things have surprised me, and some things haven't. One thing that hasn't surprised me is the dominance of Hadoop as a viable open source platform. I saw that early on; we were both early in the Hadoop world. It's now evolved into an industry. It was pretty obvious the trajectory was there and that it would become an industry. The disruption didn't surprise me either; the disruption from Hadoop was on my radar, as expected. What surprised me is that Cloudera didn't dominate as much as I thought they would. I thought Cloudera was going to be public by now, dominating at many levels. I think they got caught early on thinking they had a good lead. I'm surprised by Hortonworks, by their rapid ascension to number two, if not challenging Cloudera for the number one position. The Red Hat and Microsoft announcements are clearly validation for Hortonworks. At the same time, Cloudera is valid.
What didn't surprise me: when Hortonworks came into the market, I said there's plenty of beach for everybody. Remember that comment? There's plenty of beachhead; just get your section. That's true. I think the Cloudera-versus-Hortonworks fight never really materialized. I think GigaOM was playing up the war of the distributions. I don't think that's going to be a problem. I think Hortonworks is going to have a good business model, and I think Cloudera is going to have a great business model. I'm bullish on Cloudera. I'm surprised they didn't dominate as much as I expected, but I'm very bullish that Cloudera is going to have a nice position in the market. I think they're smart enough to differentiate, and clearly they're on different vectors. You look at Hortonworks and Cloudera, and they're completely different animals in my opinion. The other surprise is that the big guys have not come in and done a lot of buying. I thought there'd be a lot more M&A activity at this level. I think this might be the year we see some of that M&A. Do you think the big guys don't know what to buy yet? They don't know where to place their bets? Or they don't think it's real? What's your take? I think it's similar to enterprises out there. They're trying to determine who to place their bets on, whether it's which Hadoop distribution to pick or which SQL database to go with. The mega IT vendors aren't sure either; they're watching this play out just like the rest of us. Frankly, I think some of the big vendors are still trying to get their thoughts together on how they're going to compete in this market, because it's so disruptive to the traditional business models of these big companies. In my opinion, the reason is that it's just too frothy. Look at the valuations of Cloudera and Hortonworks. They're just priced out.
The VCs behind Cloudera and Hortonworks are big-time VCs. Cloudera has Accel in its contingent; Hortonworks has Benchmark in its contingent. They've got big bets, and they're expecting massive returns on them. Once you hit that valuation, you're way out of range. Look at the consumer side: Box is going to go public, and you've got Dropbox. The valuations are outrageous. If they don't get sold or go public before the bubble bursts, it's going to come down to revenue. That's why this year you really start to feel the energy: there's going to be an explosion of growth, because it's at the point now where if they don't start banking some revenue, diminishing returns are going to kick in. They have high valuations. It's like, come on, we've been hearing this; let's see it. I think that's where the energy is coming from. How important is the distro? You've really got the big three in the distro space; we also talk about IBM, Intel and others. The big three are Cloudera, Hortonworks and MapR. How important is it to own the distro, and how has Hortonworks' entrance into the marketplace, and their dogma toward open source, affected that dynamic? Well, I think Hortonworks has changed the dynamic significantly. They're giving away the product. That puts significant downward price pressure on the other competitors in this space. There are different ways to compete with that. You can build modules on top and expand your value that way, like Cloudera is doing with Impala and their Spark announcement today. MapR has been focusing more on bringing transactional workloads on top of HBase. Up-leveling the conversation is one way to compete with that. As for how important it is to own the distro: it's a potentially valuable market, but it's not huge.
Actually, when you look at our forecast, Hadoop is really just a small slice of the overall big data market, and we forecast it to remain that way. So you're saying it's not critical? Well, from a leverage standpoint, I think it's an important part of the overall picture, but it's not necessarily the most important thing. There are so many other components that go into leveraging big data. We talked to Syncsort today about data integration. One topic we didn't get into much today is the cloud and what role that's going to play. Then you've got all the application providers that are going to come out of the woodwork as the infrastructure hardens. That's where the real value is, in my opinion: the applications and the service providers. I want to ask you about the changes in pricing models. My understanding is Cloudera has changed its pricing model, or is it still experimenting? What can you tell us about that? Yeah, so they simplified their pricing model. I think they got customer feedback that their old model was confusing, and they made the strategic decision to streamline their offerings and how they price them. So essentially they've got it down to three basic pricing levels. The basic edition is your core Hadoop plus the management software, including their proprietary management software. The second is what I think they're calling Flex: in addition to the core Hadoop and the management software, you get one of the add-on modules, whether it's Impala or Spark or Search. And then Enterprise Data Hub is the third pricing level, which is essentially all you can eat: the core Hadoop, the management software, and as many of those modules on top as you want to use. So they've just tried to make it a little bit simpler. Before, they had separate subscriptions for backup and recovery and so on.
Yeah, it was just a bit confusing, and they realized it was important to simplify that. On the pricing side, you can price it either per terabyte or per node, so they're trying to offer a bit more flexibility while simplifying at the same time. So that makes sense. I think it's a smart move. Frankly, there's still a lot of confusion in this market, whether it's Hadoop specifically or big data generally. So when you can take steps to simplify the conversation and make things easier for the customer to understand, that's a good thing. Here are the day-one trends; I've been taking notes on the themes: data science; the Hadoop market still rocking; database applications and development; legacy vendors; and then new emerging areas and new modern approaches. I think the big developer conversation is also going to continue to accelerate; we're seeing that here today. The SQL genie is out of the bottle. That's going to be a done deal. I think the big gorilla guys are going to take that down. If you're a startup, you'd better have a nice niche in that market, or big muscle. Yeah, well, SQL on Hadoop is an important part of Hadoop as a fully functioning, multi-application platform, but that's what it is: a feature. So it's one part, but it's a challenge to build a business on that alone. So Jeff, you were out there on the ground at the Strata conference while we were broadcasting live here at Big Data SV, right across the street. What was it like out there? What was the vibe? Give us some highlights. Who did you see? Did you see any celebrities? Any action at all? We had a lot of celebrities here on theCUBE. You probably had more celebrities here today. So it's day one over there at Strata: a lot of full-day and half-day workshops. It's a core audience. Yeah, the exhibition floor wasn't even open yet, so there wasn't a lot of action to see.
I did hear a little, talking to a couple of the folks who attended the Data Science track today, and there was some disillusionment with that. A lot of the people who attended the Data Science track, whether they call themselves data scientists or not, were a little disappointed that the content didn't go deeper, frankly, and they were talking about that on Twitter. So that's a challenge, I think, for O'Reilly over at Strata. They've put together this great conference. But what it started as, as you guys have talked about today on theCUBE, was mostly T-shirts. It was data scientists and hackers, some really cool, in-depth, hands-on stuff. Now we've got a lot more business people there. So as the core audience of a show like Strata expands, the challenge is to keep it relevant for all those different stakeholders. A lot of skill gaps, a lot of education needed: Hadoop, data science, big data, data transformation, data fusion. Dave, I always talk about our first Hadoop World, where Abhi Mehta, you and I were talking about data factories, the revolution. Abhi, if you're out there watching, shout out to Tresata, a great friend. He first coined the term data factories in New York City, back when it was Hadoop World. And we're seeing that now. You're seeing the data fusion; you're starting to see people playing and partying with data in a way that's line-of-business and C-level, not the CIO. This comes down to the data manager: who's going to be doing the data? These data factories need to be managed by people who know how to manage data. You know, there hasn't been much discussion today about security and data privacy, and I would expect we'll hear more about that over the balance of this week. You mentioned Abhi; we're going to have Abhi on later this week. I don't know if the guys from Sqrrl are coming on; I think they are.
I'm sure they'll be talking about security and privacy as well. With the cloud and Snowden and all that, it was a big topic of discussion last year. I think it really peaked at Hadoop Summit and then carried through Big Data NYC. What have you been hearing about the whole security meme? Without question, security is critical, and it's not getting enough coverage; people are not putting enough thought into it right now. I think it's going to be a challenge as we go from early adopters to early mainstream adoption of Hadoop and some of the surrounding technologies. Security and privacy are two issues that are going to keep coming up, and the industry really needs to address them. With the NSA scandal, the public's eye has been mostly focused on the government. But as people start to realize just how much data commercial enterprises are collecting and using, I think that focus is going to turn pretty quickly to the commercial side. So the industry, and I've been saying this for a couple of years now, needs to get out in front of this topic. John, what have you got? So I just want to do a shout out as we wrap up this segment and thank our sponsors. We would not be here if it wasn't for the support of the community. We've been doing big data coverage for a while now; we're on our fifth season. Dave and I and theCUBE team have interviewed 3,204 guests. Our team was telling me we'd interviewed over 1,000 people. Damn! 3,200-plus people, Dave. That's ridiculous. What are their names? We would not be able to do this without the support of sponsors. We have an underwriting model that keeps us independent and brings that tech sports coverage to you. We love it. We love doing it. The sponsors here for Big Data SV are WANdisco, Actian, HP Vertica, Cloudera, Tresata, MapR, Syncsort, InfoObjects, Alteryx and Sqrrl, and we really appreciate it. And a shout out to WANdisco.
They were also sponsoring Big Data NYC. It's the support of the community, and the vote of confidence that comes with the sponsorship, that we really appreciate. It allows us to do our thing and pay the freight to get all the equipment and the full team here. And we're doing CrowdChats. We're doing our best to extract the signal and share it openly and publicly. So thank you very much. I also want to say that we're broadcasting here at the Hilton, in the Yosemite room today, tomorrow (Wednesday), and all day Thursday. Also tomorrow night, Wednesday at 6 o'clock, we've got a reception, a party, in the coastal room: just come into the Hilton and go to the right. You'll see us there. So all our friends in Silicon Valley, if you are at Strata, definitely stop by, and even if you're not at Strata, come on by. We're in Santa Clara, right in the heart of the valley, so you're more than welcome. Friends of theCUBE, we'd love to see you guys. Great. We're going to be back all day, live tomorrow: packed house, sold out. All the slots are filled. Editorial guests, reporting live, a ton of news to talk about, a lot of action. Now it's time to go to the MapR party. We're right next door, and we're going to have a beer with those guys, so it's beer time for theCUBE guys. I'm John Furrier with Dave Vellante and Jeff Kelly and the whole crew. We'll see you tomorrow live, first thing in the morning here at Big Data SV, hashtag BigDataSV. Live from Silicon Valley, Big Data Silicon Valley. We'll see you tomorrow.