New York City for Big Data NYC, hashtag #BigDataNYC, as part of and in conjunction with Strata Hadoop, O'Reilly's big conference, which is literally 100 yards from us. We are on the ground, live in New York City, for a special presentation of theCUBE. I'm John Furrier, with my co-host Dave Vellante and our Big Data analyst, George Gilbert, kicking off basically three days of wall-to-wall coverage of what's going on in the Big Data space. And certainly Big Data NYC really is an encapsulation of O'Reilly, Strata, Hadoop, what's going on in Wall Street, what's going on in the markets, what's happening with Big Data and its impact on society and, more importantly, the value of these companies that are selling software and cloud services. Dave, George, great to see you guys. We're here. You can see it's like the Today Show here in theCUBE. We're down on the ground. People are walking by. And we're really right on the doorstep of the Javits Center, where all the action is happening. And I'm going to bring a lot of news coverage. Fifty yards. Fifty, I said a hundred. It's like a good three-iron fade. You know, nice and easy. Now to draw George. Great to see you guys. First of all, let's just jump right into it. A lot of big news is going to be coming out. Cloudera is kind of reinventing themselves, putting out now a storage layer to try to fill a hole in there. We see Hortonworks with Dataflow, a lot of stuff happening with those guys, the incumbents, if you will, of Big Data and the Hadoop space. But the advent of Spark, the advent of DevOps and software is really a big part of what this show is about. Because as we had talked about at Hadoop Summit, Big Data analytics is the killer app. It's a competitive advantage, but the cloud is powering that. And the technology under the hood is changing. Dave, this is impacting the value of companies. It's impacting their offerings. Let's start with you. What's your take?
Well, when we first did Hadoop World back in 2010, John, in New York City, when I got the call from you and we flew in and we kind of guerrillaed our way in, it was a shiny new toy. It was not well-known, it wasn't widely adopted, certainly not in the enterprise, and we saw the adoption of Hadoop in the enterprise take place over the last several years. And the big theme became making Hadoop enterprise-grade, enterprise-ready. And that's still one of the big complaints about Hadoop, but the other one is complexity. And George, as I'm sure you're going to talk about, Spark coming into the fray, meaning more integration and simplifying things for developers, has really taken the world by storm. And you're seeing this interesting rift in the community. The Hadoop huggers, if you will, saying, well, no, no, it's all good. We're adopting Spark, and then the Spark guys really pushing that Spark is going to replace a lot of the Hadoop functionality. So there's an interesting rift going on. There's a lot of jockeying for position. You're seeing, George, a lot of database announcements. So very interesting developments in the ecosystem, which is one big mess. Dave, so the mess is out there, and that is really just an evolution of the market space. Impact on the public companies and the impact on the private market. Obviously, Cloudera's valuation is over $4 billion. We saw Pure Storage have a down round as they file for their IPO. Is the economic bubble bursting or the softening of the market going to affect these high private company valuations? And obviously, what's the impact for the public companies and then the opportunity for startups? Well, I love how George Gilbert phrases this. This is a slow, steady collapse in infrastructure software pricing. And the key to me is that it's a slow and steady collapse. So it's giving the Oracles and the Teradatas of the world enough time to pivot. And so they're able to make investments.
They're able to make both R&D, organic investments, and also other tuck-ins. You're certainly seeing Teradata do that. You're seeing Oracle sort of reposition its portfolio. They've both got big presences here at Strata and Hadoop World. So it's giving them time. And then the second thing I want to say there is initially we saw a big backlash within the traditional data warehouse world where it was that big sucking sound. Our survey data, we just completed a new survey at Wikibon, suggests that the data warehouse continues to be a mainstay. Data warehouse, data integration, some of those tried-and-true tools that you get from Informatica and Oracle and others, are actually fundamental to people's big data analytics. Right, George? Just keying off that notion of the simplicity of the data warehouse, Hadoop is the opposite. It's not a product. It's an ecosystem. Now on that side, the benefit is you get hyper innovation. But you've got dozens of different products where each distribution vendor is trying to glue them together. Now, we're beginning to see that ecosystem spin out beyond the control of any one distro vendor. And so we're seeing where we used to have all the core parts of Hadoop be essentially agreed on by each vendor. And they would do their own kind of management and security around the edges. Now even the core is becoming fragmented. So we're back to the proprietary flavors of Unix that we saw in the late 90s. The idea being that you can no longer have a second source, a reliable second source for your distro. So that would suggest that the winner is going to make a lot of money. The number two guy is going to make a little money. The number three guy is going to break even and everybody else is going to be gone. And there are another two dimensions beyond that, which is people haven't really talked about Azure, Amazon, and Google as competitors with their native services.
But they are building deep, deep integration: design, build, test, integrate, operate, where- As a service. Yes, as a service, and that's the operate part. Where unlike Hadoop, it's not an ecosystem out of control. It's a set of tightly integrated services all the way from development to operations. And that makes it more palatable, more consumable to mainstream companies. The second thing that's coming actually as a surprise in terms of much faster adoption than we expected is Spark hollowing out the Hadoop ecosystem. Just briefly, Spark always leveraged the storage layer, the management layer, the security layer of Hadoop. But now in the latest survey put out by Databricks, and so to be taken with a grain of salt, except that it's not of Databricks customers. It's all Spark customers. 48% of Spark customers are running it without YARN, which means they're running it without the Hadoop bits. So they're running it outside the Hadoop ecosystem. I gotta ask you, George, what's changed in your mind? Dave, I'd love to get your perspective too. As analysts, looking back right now, what's changed? And what does the research tell you? I guess what's changed is that there's a growing awareness that we have a complexity problem. And when you have enterprise software growing at 100% a year, as the Hadoop vendors are showing, and you have essentially a complex product, the customers aren't able to absorb and deploy software at that rate. We've seen this movie before with ERP in the late 2000s, with B2B software in the late 2000s. Basically it means you've got software piling up within the customer site. And basically, as-a-service versions, whether Hadoop, Spark, or the proprietary cloud vendors, help it get consumed faster. I think, well, if I can add, so I think what's changed is this whole ecosystem used to be, oh, let's play nice, and it's getting extremely competitive.
Anytime you have a TAM that's this big, and you get the big enterprise vendors coming in, and now you get the cloud guys coming in, there's a lot of jockeying for position. We're going to have a panel Wednesday night here at the location on the ground here with Merv Adrian from Gartner. We're going to have some customers up there, big name customers, Time Warner among others, huge advantage. Final question guys, what is the noise out there, and what is the signal we should be looking to extract from the noise? Dave, we'll start with you. Well, so I think it's how are people going to differentiate when, as George says, there is this slow decline in software pricing as a result of open source. So how are people going to make money? They're going to make money by identifying, with the exception of Hortonworks, ways in which they can differentiate from the competition. Hortonworks, obviously we know their game. It's the long game. It's the volume game. Interestingly enough, two-thirds of the people in our survey are actually paying for either a Hadoop distribution or a subscription. So there's light at the end of that tunnel. But I think that's really the thing that we have to look for. What is the differentiation, and what's the adoption? George? And I would just add to that the business model is evolving where these systems of intelligence have so many components in them. And vendors are saying, with this slow-motion collapse in upfront pricing, they're basically saying we'll make money helping you, the customer, run our software. But there are so many components in the software that any one vendor helping a customer run their component doesn't really add a lot of value unless they plug into something larger. Guys, great stuff. Great research from Wikibon. Cutting edge, the best in the business. wikibon.com. Go to that site. Check out the research. A lot of free content there, as well as a subscription for some of the more advanced cutting-edge frontline stuff.
And of course, guys, it's all about beyond Hadoop. My view is simply this, this week: what's going on beyond Hadoop? Hadoop is moving to the next level. Cloudera is already saying it with, hey, Hadoop's there, but we're going to do more. And Hadoop's taking a backseat with the innovation. So very interesting to see how that gets played out. This is theCUBE. We'll be right back with more live in New York City after this short break.