Live from New York, it's theCUBE, covering Big Data NYC 2015. Brought to you by Hortonworks, IBM, EMC and Pivotal. Now your hosts, John Furrier and George Gilbert.
Okay, welcome back everyone. We are here live in New York City for Hadoop World, Strata, Hadoop, Strata Conference, Big Data NYC. It's all happening here. We are on the ground 100 yards from the Javits Center, right up West 37th Street, big studio kind of like the Today show. This is theCUBE. I'm John Furrier, with George Gilbert, our big data analyst at wikibon.org. And our next guest is CUBE alum David Richards, the founder and CEO of WANdisco. Great to see you. Welcome back.
Always great to be here. And what a fantastic space you guys have. Phenomenal.
Well, I love having you on, one, because you're the CEO of a public company in the space for many years. But you're also kind of like a guest analyst. We get to get the perspective from you. So the theme this year that we're hearing on theCUBE is beyond Hadoop, get real Hadoop. Beyond Hadoop, Hadoop is invisible. It's turning into a Spark world. A lot of hype there. So the bottom line is that there are people who want to write big checks in the enterprise to bring the infrastructure to the next level. And that means it's got to be real. And the reality is you're going to have a valuable solution that you get paid for. And that's going to differentiate a lot of people. You've been doing business successfully. What's your take on that? Do you believe in that statement? And what are some of the things that you're hearing relative to what's making Hadoop real?
So that's a very big question and warrants an answer of at least 30 minutes. But I'll try and keep it to about five. So first of all, the evolution of the Hadoop marketplace is really fascinating. We're at a really interesting inflection point right now.
Firstly, I'd say that Silicon Valley is this humongous vacuum chamber, where we want to believe that suddenly Hadoop is so old and now we're onto the next thing, Spark, and then we're onto the next thing, which is in-memory compute, then we're onto the next thing. And that isn't the way that markets really work. We're only now seeing the first green shoots of companies going into full-blown production environments with Apache Hadoop. And it's interesting, I mean, Mike Olson 12 months ago said Hadoop will disappear. And I think in a way he's right. And by that, if I go and ask the average CIO of any of the larger businesses that are deploying Hadoop right now, which Hadoop distribution are you using, nobody knows. Because they might be using an Oracle Big Data Appliance. We're doing a lot of deals with Oracle right now. They may be using a Teradata appliance, and these are OEMs of various sundry Hadoop distributions. I think the marketplace is moving on from basically lab work, and in a lab, and this is to your point, John, there are no SLAs, there are no service-level agreements as in an enterprise. The first question I ask any company deploying Hadoop is, what's your SLA? Is it less than two hours? If it's not, that is not a real production mission-critical application. If there is, then it is. And that's where one is at.
I would say it could be less than 20 minutes. I mean, the numbers can only get smaller. It's a huge window, two hours.
Well, exactly. And believe it or not, a lot of the applications, when we say companies are in production with Hadoop, a lot of those applications are batch-based, where an outage really doesn't matter. They're not using mission-critical data, so security isn't that important. But the market's moving and evolving. And I think that as we move beyond these lab projects, as companies come out of a lab, I think the market reverts to type. And by type, I mean, you're going to see the usual suspects dominate the marketplace.
It's going to be Oracle, IBM, EMC, Teradata, Cisco, Microsoft. They're the companies that are going to absolutely dominate enterprise deployments. But there is a real battle royale going on right now. We saw announcements from Google Cloud, which is Spark on Hadoop. I think it was one cent for an hour. We've got IBM launching, apparently, exactly the same thing. All the cloud vendors are trying to bring those services to market. That's fascinating.
So let's talk about what you guys announced. WANdisco Fusion, partnership with EMC. I see they're big here. And this world also is, if you look at what Hadoop was seven years ago, when we first started covering it six years ago with theCUBE, it was all startups. EMC, with Greenplum, was the first kind of entry in. And then now Oracle, EMC, the whales are all here. So how does someone become a unicorn when the megacorns are coming in?
So that's a great question. And I think WANdisco is ideally positioned for this move. I'm bound to say that, of course. But let's just think about what's actually happening. Hadoop is going into the enterprise. The companies taking it there are not the startup tech businesses. It's, as you put it, the whales, the enormous businesses, the IBMs, the EMCs, and so on. We announced the partnership with EMC today. Most of our production deals, as I go down both pipeline and deals closed to date, 90%, you know, are with the whales. So how do you become a unicorn in that environment? You provide a piece of technology that is so specifically unique that they can't do it themselves. Now, WAN-scope active-active replication for business continuity is what we do. And we're performing very, very well as this marketplace matures. From a lab, when it's just all the propeller heads saying I'm going to build a nice application that some might need, not a must-have, into production, that's where WANdisco comes to life.
So let's just take that as an example. So I'm just going to use a metaphor.
It may or may not be accurate. Data Domain had a niche concept called backup and recovery. They nailed it so hard no one could replicate them. But yeah, EMC paid billions of dollars for it, but then it was obviously a bidding war. Frank Quattrone was involved in that deal too. We just talked about the banking side, which we'll get to in a second. I want to talk about the M&A activity. But that was something that no one could replicate. Data Domain really kicked some butt there and then EMC actually paid up for it. I'm not saying that you'll get bought. But potentially, is that what you're saying, that WANdisco has some core competency and some inability? Or I was with... Because everyone's going to say, well, if you're so good, then why not just copy that?
So first of all, we have patents. We've got three granted and seven filed in one of the most complicated, pivotal problems in computer science, active-active WAN-scope replication. I know it's a big mouthful. I know people's eyes... Gigi would love to talk about it for an hour. But it's a hard problem. And I'll quote a Wall Street CIO that I met with on Monday, who said, I hope you guys don't get bought. And I said, we're a public company. We don't need to get bought. We've got access to capital markets and so on. He said, because what you're doing, you're in a category of one, nobody else can do this. As companies go into production, business continuity becomes an absolutely critical component of that. And that's what we do.
Well, I mean, I would argue a tender offer would get your attention, or some sort of, you know, you're a category of one, a number will always get your attention. But let's talk about the M&A market, not necessarily for WANdisco. Because what you have here in this marketplace, as Dave Vellante was saying yesterday, is you have words like profitless prosperity. Meaning that's the development market. Meaning I'm a startup, I may raise a bunch of venture capital or private equity.
I spend $2 to get $1 in sales. That's not a sustainable model, but I'm going to take down market position and grow from there and have a valuable product. Or I have a valuable product and I sell, I get paid. Or, I don't know, I have a valuable product and I go out of business. So in that scenario, there's growth capital, tuck-under deals. And there's also acqui-hires. So I want you to comment on the landscape, because we will see some dying unicorns across the entire landscape of tech. You're going to see growth startups just become the next brand name. And then you're going to see great product, no market.
That's a great question. So the way I look at the marketplace is the private equity funded businesses are just out of control in valuations. I mean, we're seeing, you know, for companies with less than $100 million of revenue, $4 billion valuations on what might happen. And I think there's two factors there. Undoubtedly, there's this highly disruptive marketplace called big data that's really driven by cost, and George, you've got a great slide that used to show data storage growing at 60%, IT budgets growing at 5%. That means that the marketplace is being disrupted by cost, not technology. So I think those businesses are benefiting from that. And secondarily, they're benefiting from a PE bubble with interest rates at zero. I mean, let's face it, where else can money go? Money's like a heat-seeking missile for opportunity. And one of the places it's going is into small private companies where, you know, they don't have to disclose revenue numbers. And often, you know, when they do file IPOs, there are big surprises that their revenues are half or a third of what we all thought they were. And I think good luck to them, because we are creating unicorns. How real are some of them? I mean, come on. I mean, you can't get multi-billion dollar valuations on relatively insignificant revenue. The other market, the M&A marketplace, is interesting.
This inflection point, this idea that we are going from essentially a lab project into production, changes the market, because if you look at the average deal sizes-
In what way?
Because the first thing that changes is the average deal size grows from a sort of $30,000 product sale, which we know because there is some visibility in this invisible market now, versus companies like us. I mean, what's our average deal size? Three, four, five hundred? This quarter, our average deal size, I think, will be half a million bucks. That's a clear indication that the market is going into production. And I think-
That's some good metadata, so just sharing. Appreciate that, but I want to go back. The $30,000 deals, are those POCs, average POCs?
Well, so the theory there is land and expand, right? So you go into a business and they're taking it into a lab environment. You land the deal and they pay you for some services, for some training, for some education, and for a little bit of product. When those deals want to go into production, the theory is that they'll use you to go into production. I don't necessarily think that that is the case, because I think the whales, as you put it, John, are going to be those companies that take them into production. And Merv Adrian, I was talking to him on the way here, has that exact thought, so you try it, you put it into a lab with the techies, with a lot of the technology-
It's called vendor hopping, party hopping.
Correct, it's just the same thing. And I want to be part of that second move in the marketplace, which is where all the money is.
So we were speculating, I want to get your thoughts on this, this may not relate to WANdisco, or you might have some visibility on it. Dave and I were talking yesterday, he asked me directly, how does Cloudera become the unicorn, not a dead unicorn? I go, well, they're already technically a unicorn on paper at a $4 billion valuation.
Pure Storage got a down round on their IPO filing. So if they go IPO, they may have to get that valuation. But I said, given the kind of stagnant growth of Impala, or this concept of the data hub, with SQL going on Hadoop, driving the lingua franca, that kind of puts a stall on that. So how does Cloudera become a billion dollar revenue company? They've got to own analytics. To me, so how do you do that? Co-opt Spark. So I just was thinking in my head, what is that move? There's a lot of competition in Spark. And, oh by the way, who's coming into Spark? IBM, HP, Oracle, Google, even Microsoft.
Yeah, you see, that's a great point. It's a difficult conundrum, right? Because if the marketplace moves to type, which I think it probably will, and I sat down with a government CIO a couple of weeks ago and I gave him my whole product adoption life cycle talk about how the market will move back to Oracle and Google and all these companies. And he said, you've really upset me, because I was really hoping that the market would be made by these really small, new unicorn kind of companies. But actually, I don't think it will. I think that this is an evolution of cost in the data center driven by the major vendors and nothing will change.
But okay, you said two things there that could be in conflict, which is the major vendors have a price umbrella that they're trying to preserve, like Oracle with a list price of $300,000 per processor core for the 12G database, or 12c. But sort of in the open source infrastructure land, we've got the slow motion price collapse. They're helping to bring down costs for others. But customers, big customers, don't want to deal as much with small vendors. So what happens to the pricing if it's the big guys who are going to bring enterprises into production?
So there are two forces at play in the marketplace. One is the cost pricing pressure on those companies' traditional technologies caused by open source and Hadoop.
The second pressure that you cannot ignore is this idea of cloud and running Spark on it, and the point of Spark is that I can run the same compute with a third of the amount of hardware, which brings cloud into play. And John, you made a great point. All of the big vendors are placing ginormous bets on that happening.
So it's two things. Even if it's the big companies who traditionally have a higher price point, a price umbrella, open source software is pressuring it and metered pricing is pressuring it. That's bringing everyone's level down. It may not be down to the level of one of the Hadoop vendors, the independent vendors.
But that, you see, I was talking to an investor, a very big investor, who said how much pricing power is there from those early vendors in the marketplace around Hadoop? I don't think they've got very much pricing power. I think that's a concern, because is that a race to the bottom?
Well, the value shift, no, it's a race to the bottom for sure. We're seeing Hadoop commoditize significantly. That's Mike Olson's invisible. He said kind of going away. What he was trying to say was, and he reiterated this year, funny, he brought it back up again. Probably took a lot of heat for that. But what he was saying was not that it was irrelevant. What he was saying is Hadoop is going to be abstracted away and the value creation is going to be around it. That's analytics. That's where we're seeing Amazon crushing it with Redshift. I mean, they've just decimated the price value point for data warehousing. So this is going to be the new normal in every category. Analytics is going to be embedded everywhere and it's got to be run with critical infrastructure underneath. With that being said, what does that mean for a CIO or a CXO who's out there saying, hey, go to Strata Hadoop World? 13,000 people have come here this year because they're all trying to figure out what do I write the check for? So how do you break that down for a CIO?
Okay then, so when I go and talk to CIOs, I say, okay, you need to be looking at two big trends. One is that you undoubtedly have to reduce the cost of storage across the enterprise, because it's the fundamental source of competition in every single industry. I was talking to a big insurer yesterday, Monday, sorry, who was talking about how suddenly their business model, they believe, is going to be disrupted; they're retraining all of the actuaries to be data scientists. So the most important thing for them is to get their data now quickly into storage, into commodity storage, and then start running these algorithms, retraining that stuff. The second thing that they think they're going to do in three years, I'm going to use the same example, is they think because of significant cost benefit, they're going to move to a cloud-based infrastructure. So really, Hadoop behind the firewall, possibly running Spark on top, is a stepping stone really to get to the promised land of cloud, and I'm going to reduce my data center costs down to virtually zero. That's what they're all trying to do.
Would it be fair to say that with Hadoop originally, you were able to second source with different distros for the most part, but now we're seeing not just at the management, security, governance layer, but even core processing, we're seeing differentiation. But more important, Spark as a sort of processing center of gravity is hollowing them out so that they have this very limited set of almost proprietary value add, but that's open source. So there's less room for them to maintain a price point. And so I guess moving that whole framework to the cloud, it's going to be sort of intense price pressure.
Correct. What is happening right now, we are going to see the equivalent of browser wars in cloud wars. Every single major vendor, all seven, are going to have Spark running on Hadoop.
They're already doing it. And that's really where the pricing pressure is going to come from, because every CIO is going to say, well, if I can buy this for one cent per hour from Google with their new products, I mean, where does it go? I mean, that's the lowest coin denomination that you can get to, right? For an hour's use of their storage, incredible. Absolutely incredible.
All right, so final word, bumper sticker for this event. If you had to look back at this year's event, we've still got another day to go. What's your take so far? What's the aroma of the vibe, core messaging? Is this what we were talking about? Any other insights and color you could share?
Hadoop's getting serious. And when it gets serious, it means money. When it means money, the whales arrive. And if you walk around the floor, who's got the biggest booths? It's the whales. Who's got more people hunting out these deals? It's the whales. The whales are here.
There'll be whales out there. It's whale season here in the Hadoop ecosystem. Obviously, the Amazon Web Services event is next week. There's going to be a lot of whales there. You're starting to see everyone showing up. Cloud is powering analytics. It's the perfect storm. But you've got to have stuff under the hood. You guys are doing great. WANdisco, doing the hard stuff so no one else has to, right?
Exactly.
David Richards, great to have you on. Thanks for your perspective. CEO of WANdisco, also kind of our guest analyst. Appreciate it. We'll be right back. More live coverage in New York City after this short break.