Live from New York, it's theCUBE covering Big Data NYC 2015. Brought to you by Hortonworks, IBM, EMC, and Pivotal. Okay, welcome back everyone. We're live here in New York City on the ground, live for Big Data NYC, SiliconANGLE Wikibon's theCUBE, CrowdChat, our event as part of and in conjunction with Strata Hadoop, which is 300 yards behind us, down the street, right next to the Javits Center. Again, this is all about getting the data, about Big Data, our flagship program theCUBE. We're out here extracting the signal from the noise. Final segment for day one, analyzing, talking to all the entrepreneurs, the vendors, the VCs, all coming through our studio here in New York City, and I'm here with the co-founder of Wikibon, Dave Vellante. Of course, I'm John Furrier, the founder of Wikibon SiliconANGLE, and our Big Data analyst, chief analyst on the Big Data side, George Gilbert, to break down and share with you the signal that we're extracting and some of the highlights. Dave, I want to talk to you first. Interesting lineup today, we had the big dogs come in, we had IBM, we had Cisco, EMC, Pivotal, a lot of the big players, and we had some startups. We had a launch on theCUBE, we had a product launch, so we talked to the vendors. Okay, no customers yet, we'll see those tomorrow. But everyone wants to know what's going on in the space. It's confusing, there's a transitional transformation going on, a sea change, you're obviously seeing big moves, Hadoop is being commoditized, analytics is front and center, a lot of speeds and feeds with Spark, a lot of new game-changing things over the past 12 months that are going to impact the future. What is your take? We're beyond Hadoop, certainly, no doubt about it. Poll question out there today, that Hadoop is going to be relegated to being invisible, not going away, it'll be relevant as part of a bigger landscape, but things like Spark under the hood are happening. So there is now a beyond-Hadoop architecture.
So I want to get George's take on the architectural impact, but I want to get your take on the business landscape because it's going to impact the practitioners, people who are making decisions about what to do out there. Certainly they are concerned, they want to know what's going on. What's your take on those guys? Here's my take, a couple of adjectives: crowded, overfunded, profitless, okay, great potential. But I'll go back to the original theme that we struck years ago on theCUBE, that the practitioners, the people who are applying big data technology, are the ones who are really going to make the big money here. Okay, so the investors, the Wall Street crowd, they're still sort of looking for that next Intel, the next Microsoft, the next Cisco. I tell you, they're not going to find it in the vendor booths over there. They're going to find it in the way in which people are transforming their businesses. It's going to come out of disrupting industries, leveraging that digital matrix. That's where the money is to be made. Now, I'm not saying that the- So what were they? Crowded, profitless? As I said, crowded, overfunded, and profitless, with great potential. So I'm not saying that you won't see some players emerge, but I mean, are we going to see a $5 billion company emerge from the Hadoop ecosystem? I mean, that's a big question right now. I think Cloudera was the company I was looking towards as the one to be the breakout name, kind of like what Amazon did on cloud. I just don't see Cloudera getting there with Cisco, EMC, all these big guys, Oracle, kind of crashing the party here with the billboards and the cars. I just don't see Cloudera getting to being a sustainable, multi-multi-billion dollar company. I just don't, unless they go all in on analytics. I think they have to go in and be that new vendor, Dave. It's a big pivot, right? So it's a huge task.
But this is where, George, your analysis is, I think, spot on when you said there's a slow, steady decline in infrastructure software pricing. I'd even reword it. A slow-motion collapse in infrastructure software pricing. I love it. And so that really underscores the challenges of companies like Cloudera. Hold on, say that again. A slow-motion collapse in infrastructure software pricing, for two reasons. Open source and metered pricing coming from the cloud. Open source: if you look at a chart of how many projects are on GitHub, maybe five years ago it was below, I think, the one million level. It's now at 20 million and climbing at an accelerating rate. When you look at that, it makes it very hard for a company to sell infrastructure software that's closed source or proprietary unless they really have some serious value-add. And metered pricing just means you can't charge up front, which is how software companies essentially always paid for their sales forces. So I mean, the license was what paid for the sales rep, and maintenance was where you made profit later on. And that model is pretty much dead. And I think conventional wisdom says, oh, the traditional data warehouse guys, the IBMs, the big players are screwed. That was the traditional thinking coming into this wave. But I will tell you, you just look at IBM's analytics business. I mean, they're throwing off more revenue in a year. You know, their growth in revenue in one year is bigger than this whole ecosystem combined. This is an $18 billion business for IBM. And they're making a lot of money there. So to your point, how do these other guys get escape velocity? Is that what you were talking about when you were saying you don't really see how some of these guys can break out? Yeah, so to my point, Dave, that's a great, great point, great question.
I want to clarify, I don't see a breakout company because there are too many big competitors with a lot to lose in the enterprise space that see what George just said, the slow-motion collapse of infrastructure pricing. And they have the resources to shift to value. And they have sales forces and technology that they could actually pivot through. Big companies can pivot in this market just like little guys. So I think that's one thing. Hadoop as a storage layer is finally being talked about now, and Cloudera validated that with their announcement today. They're trying to put, you know, rose-colored sunglasses on that one, but it is storage. And they are positioning themselves saying, look, Hadoop has commoditized, it's storage. A lot of access is all gray. We saw this coming. Okay, I'd buy that. But Spark is driving the conversation. Spark is about analytics. And I think the big play for someone to break out and be that billion-dollar unicorn, a true billion-dollar unicorn with profits, not a paper one, is to co-opt Spark and take that ball to the finish line. To me, that is going to be the dangerous move by Cloudera or any one of those players, to try to co-opt Spark and run to the finish line. To me, that's the only move that I see right now. And again, integration, unified. These are talks of the big guys, Dave. This is not a- I got a question for you. You're Silicon Valley. Is the goal in Silicon Valley to build great companies, or is it to achieve a great exit? They're not mutually exclusive, right? So I think Silicon Valley has a machine, right? An innovation machine. So you're seeing Pure Storage in particular, and Cloudera, overvalued, in my opinion, based upon the revenues. It's all about the future. So that means Cloudera needs to be that unicorn. If they can be that unicorn, then this strategy is good. They've already had liquidity. Intel put a lot of money in. There's been some exit off the table from the investors and management team.
So that's an exit, not a full exit, but the private market's good. So there's some liquidity going on there. But Silicon Valley is about innovation. Money-making is a side effect of that, but I believe that Cloudera wants to be a durable company. I said it in the CrowdChat. They do not want to sell. I've talked to the founders since inception, and I see the same thing today. We are going to go big or go home. That's Silicon Valley. So, a couple of other things. We were talking about Spark, George. I wonder if we could parse through that a little bit. The survey data that we have suggests very clearly that, of the 60% of the people who are doing Hadoop, a very, very large proportion are doing Spark. It was the number two software instance installed on the person's after Cloudera Manager. That's right. The other data point is the vast majority, 70, 80, 90% of the people that we talked to, said they are going to shift some workloads to Spark that would have run on Hadoop. That's very clear. The flip side of that is there are naysayers on Spark who say, look, Spark's complicated too. You need to have Python expertise. It's not just nirvana. It's really for developers. What's your take on all that? There's one picture you can draw that's evolving. Hadoop used to be a set of analytic processes or data processing capabilities surrounded by management and security, with storage underneath. So you could think of it as a box. Management on one side, storage on the bottom, say security or governance on the other side. And what's happening is Spark is carving out that whole data processing, analytic processing center, so that what's left with Hadoop has value, but it's a lot less value. It's hollowing out the Hadoop ecosystem. And that's for companies that have committed to Hadoop. As we know, actually, not just from our data but from Databricks' data, it turns out most Spark users now are not running it on the Hadoop infrastructure.
48% are using it outside of the Hadoop ecosystem. So value comes from being a trusted advisor to the enterprise of a high-value application. And that's been the traditional model. And this is why you see Oracle spending their diminishing cash flow from the database on buying application companies to move higher up in the stack. It's a heat shield for them. But we've always been skeptical about people's ability to make money, John, in Hadoop. Does that say then that Hortonworks actually had the right model in a very difficult market, i.e. just give it away and sell a subscription because the rest of it is going to be free? No, I think, I mean, no, I don't think they had the wrong model. The Hadoop model, the Cloudera- You think Hortonworks has the wrong model? Yeah, I think they have the right model. Oh, okay, sorry. Yeah. I thought I heard differently. No, that's what I'm saying. Does that sort of underscore that Hortonworks actually has the right model? Albeit a difficult one, but it's- Well, I'll just say for the record, Cloudera and Hortonworks had the right model. It's evolving, and both are positioning themselves very quickly for the future. You see, if you look at the tactics of both Hortonworks and Cloudera, they are sitting in the boardroom saying we'd better get to a position in the marketplace before the puck gets there, and they're smart enough, both companies, to know what that is. They're taking different approaches. Again, I think Hadoop will be invisible in the future, but very relevant, no doubt about it. You can compute the data, that's value in the storage, hence the Cloudera announcement. But the real value, where the money's going to be paid, is going to be for real-time, high-value activities that people see value in. That's where the money's going to be. That's clearly analytics. So as far as I'm concerned, it's really clear where the market's going.
It's software systems that are intelligent, systems of intelligence, where today, with mobile, you get recommendation engines, we joked about that earlier, advertising, where the speed of the app for the human experience has to be, oh cool, click buy, that's a nice recommendation. That's the speed of the interface for the humans. That is the speed of business. That is only, how many milliseconds? 100 milliseconds? You move to machines. The speed of the machine dwarfs the speed of a human being. So you're going to start to see machine learning and tools, and streaming, flow-based architectures are going to change the game. That's why Watson's a big bet for IBM, and that is a huge opportunity. Well, you're going to talk about this in your talk tomorrow. Essentially, there are two areas that customers have to invest in: the ability to capture that data in near real time and affect an outcome, but also the ability to improve over time. Can you talk about those two things? Yeah, right now we talk about them as sources of competitive advantage relative to the plumbing that is systems of record. These are two pipelines. One is to make that recommendation or prediction in real time, or human real time, and eventually in machine real time. Then there's another pipeline, which is to improve that decision-making process so you're getting smarter. But I want to go back to what Dave said, which is, at some point, the rocket science that goes into building these systems is going to diffuse out to mainstream companies. When the Microsofts and Amazons and Googles of the world make it much simpler to build these types of systems, the advantage may no longer be in selling that type of software. It'll be in the companies who use it, whether it's ad tech or banking or some other type of vertical solution. George, thanks so much. We have to end it right there. We're getting on the break. They're going to pull the plug.
We have free research here at 527 West 37. That's our studio, one block from the Javits Center. Come by, we have a party tomorrow, Wednesday. Five to seven is a special presentation releasing new research, and then from seven on, we're going to party here. Come by, check us out down at theCUBE. Two more full days of live coverage coming up. Stay with us on theCUBE. We're going to be bringing you a lot of action and a lot of signal here at Big Data NYC as part of and in conjunction with Strata Hadoop. Thanks for watching. We'll see you tomorrow.