 from San Jose, California, in the heart of Silicon Valley. It's theCUBE, covering Hadoop Summit 2016, brought to you by Hortonworks. Here's your host, John Furrier. Okay, welcome back, everyone. We are here live in Silicon Valley at Hadoop Summit 2016, actually San Jose. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. Our next guests are David Richards, CEO of WANdisco, and Joel Horowitz, strategy and business development, IBM Analytics. Guys, welcome back to theCUBE. Good to see you guys. Thanks for having us. It's great to be here, John. Okay, so give us the update on WANdisco. What's the relationship with IBM and WANdisco? Because, you know, I can just almost see it, but I'm not going to say it, just tell us. Okay, so I think the last time we were on theCUBE, I was here with Ritika, who works very closely with Joel, and we began to talk about how a partnership was evolving. And of course, we were negotiating an OEM deal back then, so we really couldn't talk about it very much, but this week, I'm delighted to say that we announced, I think it's called IBM Big Replicate. Big Replicate, yeah, we have a big everything and Replicate's the latest addition. So it's going really well. It's OEM'd into IBM's analytics, data products and cloud products. You know, I'm smiling and smirking because, you know, we've had so many conversations, David, on theCUBE with you, following your business through the, you know, bumpy road or the wild seas of big data. And, you know, it's been a really interesting, you know, tossing and turning of the industry. I mean, Joel, we've talked about it too. The innovation around Hadoop and then the massive kind of slowdown and realization that cloud is now on top of it. The consumerization of the enterprise created a little shift in the value proposition and then a massive rush to build enterprise grade. 
Right, and you guys had that enterprise grade piece of it. IBM, certainly you're enterprise grade, you have enterprise everywhere, but the ecosystem had to evolve really fast. What happened? Share with the audience this shift. Well, so it's the classic product adoption lifecycle, and the buying audience has changed over that time continuum. So in the very early days, when we first started talking about these events and we were talking about Hadoop and we all really cared about whether it was big. You once had a distribution. That's a throwback. Tomorrow's Thursday, we'll do that tomorrow. So the buying audience has changed, and consequently the companies involved in the ecosystem have changed. So where we once used to really care about all of those different components, we don't really care about the machinations below the application layer anymore. Some people do, yes, but by and large we don't, and that's why cloud, for example, is so successful, because with cloud, you press a button and it's there. And that I think is where the market is going, very, very quickly. So it makes perfect sense for a company like WANdisco, where we've got 20, 30, 40, 50 salespeople, to move to a company like IBM that has four or five thousand people selling their analytics products. Yeah, and so this is an OEM deal. Let's just get the news on the table. It's an OEM: IBM is OEMing their product and branding it IBM Big Replicate. Yeah, it's part of our BigInsights portfolio. We've done a great job at growing this product line over the last few years, with last year talking about how we decoupled all the value adds from the core distribution. So I'm happy to say that we're both part of the ODPi. It's an ODPi certified distribution of Hadoop that we offer today for free. 
But then we've been adding, not just in terms of the data management capabilities, but the partnership here that we're announcing with WANdisco and how we branded it as Big Replicate is squarely aimed at the data management market today. But where we're headed, as David points out, is really much bigger, right? We're talking about support for not only distributed storage and data, but we're also talking about a hybrid offering that will get you to the cloud faster. So not only does Big Replicate work with HDFS, it also works with the Swift object store, which is, as you know, the underlying storage for our cloud offering. So what we're hoping to see from this great partnership is, as you see around you, Hadoop is a great market, but there's a lot more here when you talk about managing data that you need to consider. And I think hybrid is becoming a lot larger of a story than simply distributing your processing and your storage. It's becoming a lot more about, okay, how do you offset different regions? How do you think through that there are multiple, I think there's this idea that there's one Hadoop cluster in an enterprise. I think that's factually wrong. I think what we're observing is that there's actually people who are spinning up multiple Hadoop distributions at the line of business for maybe a campaign or for maybe doing fraud detection or maybe doing log files and whatever, and managing all those clusters, and they'll have Cloudera, they'll have Hortonworks, they'll have IBM, they'll have all of these different distributions that they're having to deal with. And what we're offering is sanity, right? It's like, give me sanity for how I can actually replicate that data. I love the name, Big Replicate. Fantastic. No, BigInsights, Big Replicate. And so on go-to-market, you guys are going to have a bigger sales force. It's a nice pop for you guys. I mean, it's a good deal. Nice. 
We were just talking before we came on air about sort of deal flow coming through, or potential deal flow coming through this, which has just been off the charts. I mean, obviously when you turn on the tap and suddenly you enable thousands and thousands of salespeople to start selling your products, I mean, IBM are doing a great job. And I think IBM are in a unique position where they own both cloud and on-prem, right? There are very few companies that own both the on-prem. Well, they're going to need to have that connection for the companies that are going hybrid. So hybrid cloud becomes interesting right now. Well, actually it's, I mean, there's a theory that says, okay, so we were just discussing this, the value of data lies in analytics, not in the data itself. It lies in your being able to pull out information from that data. Most CIOs- If you can get the data. If you can get the data. Let's assume that you've got the data. So then it becomes a question of- That's a big assumption. That's a big assumption. How's Nancy Henley on about metadata? I mean, that's an issue. People have data, they store it, they can't do anything with it. Exactly. So, and that's part of the problem, because what you actually have to have is CPU slash processing power for an unknown amount of data at any one moment in time. Now that sounds like an elastic use case, and you can't do elastic on-prem. You can do elastic in cloud. That means that virtually every distribution will have to be a hybrid distribution. IBM realized this years ago, right? And began to build this hybrid infrastructure. We're going to help them to move data, completely consistent data, between on-prem and cloud. So when you query things in the cloud, it's exactly the same results and the great results you get. And also the stability too. There's so many potential, as we've discussed in the past, that sounds simple and logical, but to do it at enterprise grade is pretty complex. 
And so it just gives a nice, stable enterprise grade component. I mean, the volumes of data that we're talking about here are just off the charts. All right, so give me a use case of a customer that you guys are working with, or has there been any go-to-market activity, or an ideal scenario that you guys see as a use case for this partnership? We're already seeing a whole bunch of things come through. What's the number one pattern that bubbles up to the top? Use case-wise. As Joel pointed out, he doesn't believe that any one company just has one version of Hadoop behind their firewall. They have multiple vendors. I 100% agree with that. So how do you create one simple cluster from all of those? That's one problem you solve. That's, of course, a very large problem. The second problem that we're seeing in spades is I have to move data to cloud to run analytics applications against it. That's huge. That requires completely guaranteed consistent data between on-prem and cloud. And I think those two use cases alone account for pretty much every single company. I think there's even a third here. I think the third is actually, I think, frankly, there's a lot of inefficiencies in managing just HDFS. How many times do you have to actually copy data? If I looked across, I think the standard right now is having three copies. And actually, working with Big Replicate and WANdisco, you can actually have more assurance that you actually have to make fewer copies across the cluster and actually across multiple clusters. So if you think about that, you may have three copies of the data sitting in this cluster. Likely, analysts have dragged a bunch of the same data into another cluster, so that's another multiple of three. So there's a huge amount of waste, right, in terms of the same data living across your enterprise, so I think there's a huge cost savings component to this as well. Yeah, does this involve anything with Project Atlas at all? 
You guys have been working with that project. It's interesting. I mean, it's going to be, we're seeing a lot of opening up the data, but all they're doing is creating versions of it. And so then it becomes version control of the data. Do you see a master or a centralization of data? I mean, I see nothing so much centralized, pull all the data in one spot, but why replicate it, or, I mean, do you see that going on? I guess I'm not following the trend here. I mean, I can't see the mega trend going on. Tell me, what's the big trend? I mean, the big trend is I need an elastic infrastructure. I can't build an elastic infrastructure on premise. It doesn't make economic sense to build massive redundancy, maybe three or four times the infrastructure I need on premise, when I'm only going to use it maybe 10, 20% of the time. So the mega trend is cloud provides me with a completely economic elastic infrastructure. In order to take advantage of that, I have to be able to move data, transactional data, data that changes all the time, into that cloud infrastructure and query it. That's the mega trend. This is something that's there. So moving the data around at the right time. And that's transactional. Anybody can say, okay, press pause, move the data, press play. So if I can understand this correctly, I'm just sorry, I'm a little slow into the day today. So instead of staging the data, you're moving data via the analytics engines. Is that what you're getting at? Data, data that's being transformed. I think you're accessing data differently. So I think today with Hadoop, you're accessing it maybe through like Flume or through Oozie, where you're building all these data pipelines that you have to manage. And I think that's obnoxious. I think really what you want is to use something like Apache Spark. Obviously we made a large investment in that earlier, actually last year. And to me, I think what I'm seeing is people who have very specific use cases. 
So they want to do analysis for a particular campaign. And so they may just pull a bunch of data into memory from across their data environment. And that may be on the cloud and maybe from a third party and maybe from a transactional system and maybe from anywhere. And that may be done in Hadoop, it may not, frankly. Yeah, this is a great point. And again, one of the themes in the show, this is a question that's kind of talked about in the hallways and I'd love to get your thoughts on this, is there are some people saying that there's really no traction for Hadoop in the cloud. And that customers are saying, you know, it's not about just Hadoop in the cloud. I'm going to put it in S3 or object store. You're right. I think- Yeah, I'm right. Every single- There's no traction for Hadoop in the cloud or- Well, I'll tell you what customers tell us. So customers look at what they actually need from storage, right? And they compare whatever it is to do for any on-premise proprietary storage array and then look at what, you know, S3 and Swift and so on offer to them. And if you do a side-by-side comparison, there isn't really a difference between those two things, right? So I would argue that it's a fact that, functionally, storage in cloud gives you all the functionality that any customer would need. And therefore the relevance of Hadoop in cloud probably isn't there. I would add to that. So it really depends on how you define Hadoop. So if you define Hadoop by the storage layer, then I would say for sure, like HDFS versus an object store, that's going to be a difficult one to, you know, find some sort of, you know, benefit there. But if you look at Hadoop, like I was talking with friend Blake from Netflix and I was asking, you know, so I hear you guys are kind of like replatforming on Spark now. And he was basically telling me like, well sort of, I mean, they've invested a lot in Pig, right, in Hive. 
And so if you think now about Hadoop as this broader ecosystem, which you brought up Atlas, you know, we talk about Ranger and Knox and all this stuff that keeps coming out. There's a lot of people who are still invested in the peripheral ecosystem, you know, around Hadoop as that central point. I mean, my argument would be that I think there's still going to be a place for, you know, distributed computing kind of projects. And now whether those will continue to interface through YARN and then down to HDFS, or whether, you know, that'll be YARN on, say, an object store or something. And those projects will like, you know, persist on their own. To me, that's kind of more of how I think about, you know, the larger kind of discussion around Hadoop. I think people have made a lot of investments in terms of that ecosystem around Hadoop. And that's something that they're going to have to think through. And Hadoop wasn't really designed for cloud. It was designed for commodity servers, deployment with ease and at low cost. It wasn't designed for cloud-based applications. Storage in cloud was designed for storage in cloud, right? That's what S3, that's what Swift and so on were designed specifically to do. And they fulfill most of those functions. But Joel's right, they will come to use it. Well, what's my whole argument? My whole argument is that, why would you want to use Hadoop in the cloud when you can just do that? There's object store out there, there's plenty of great storage opportunities in the cloud. So it's almost, say, shoehorning Hadoop. But I think that's, anyway. There are two classes of customers, right? There are customers that were born in the cloud. And they're not going to suddenly say, oh, you know what? We need to build our own server infrastructure behind our own firewall, because they were born in cloud. All right, so I'm going to ask you guys this question. You can choose to answer or not. 
Joel may not want to answer it because he's from IBM and gets his wrist slapped. Hadoop ecosystem, this is a question I got on DM. Hadoop ecosystem consolidation question. Yeah. That was mainly in the question. People are mainly in the question now. Keep sending me your questions. If you don't want your name on it. Hold on, Hadoop system ecosystem. When will this start to happen? What is holding back the M&A? So that's a great question. So first of all, consolidation happens when you sort of reach that tipping point, that inflection point where the market levels off and we've reached market saturation. So there's no more market to go after. So the big guys like IBM and so on come in. Or there was never a market to begin with. Exactly. And that's the case. But yes, I see the point. Now, what's stopping that from happening today? And you're a naughty boy, by the way, for asking this question. A lot of these companies are still very well funded. So while they still have cash on the balance sheet, of course, it's very, very hard for that to take place. You picked up my next question. But that's a good point. The VCs held back in 2009, after the crash in 2008, Sequoia's memo, the good times roll, or "RIP Good Times", they stopped funding companies. These companies are getting funded, continually getting funding. So I don't think you can look at this market as like an isolated market. Like there's the Hadoop market and then there's a Spark market. And then even there's like an AI or cognitive market. I actually think this is all the same market, right? Machine learning would not be possible if you didn't have Hadoop, right? I would say it wouldn't have had the resurgence that it has had, right? And Mahout was like one of the first kind of machine learning languages that caught fire, right? From Ted Dunning and others. And that kind of brought it back to life, right? And then Spark, I mean, if you talk to... I wouldn't say create, I'd say incubated. 
And create that renaissance experience. Yeah, I mean deep learning, some of the machine learning algorithms, right? They require you to have a distributed kind of framework to work in. And so I would argue that it's less of a consolidation, but it's more of an evolution of people going, okay, there's distributed computing. Do I need to do that on premise in this Hadoop ecosystem? Or can I do that in the cloud or in a growing Spark ecosystem? But I would argue there's other things. I would agree with you. I love both areas of my snarky comment. There's never a market to begin with. What I'm saying there is that the monetization of commanding the hill that everyone was fighting for was just one of many hills in a bigger field of hills. And so you could be in a cul-de-sac of being your own champion of no paying customers. Or a free open source product. Unlike the dot-com era, where most of those companies were in the public markets and you could actually see proper valuations, most of the companies, the unicorns now, are not public. So the valuation metrics are hard to come by. There are only a few of those companies that are in the public market. Well, the cash story is right on, I think, to Joel's point. It's easy to pivot in a market that's big and growing. Just because you're in the wrong corner of the market, pivoting or vectoring into the value is easier now than it was 10 years ago because, one, if you have a unicorn situation, you have cash in the bank. So they're flush with cash. Your runway's so far out, you can still do your thing. If you're a startup, you can get time to value pretty quickly with the cloud. So again, I still think it's very healthy. In my opinion, I kind of think you guys have good analysis on that point. 
I think we're going to see some really cool stuff happen working together, and especially from what I'm seeing at IBM and the fact that IT, you know, in the IT crowd, I mean there is a behavioral change that's happening that Hadoop opened the door to, that we're starting to see more and more IT professionals walk through. In the sense that Hadoop has opened the door to not thinking of data as a liability but actually thinking about data differently, as an asset. And I think this is where this market does have an opportunity to continue to grow, as long as we don't get carried away with trying to solve all of the old problems that we solved for on-premise data management. Like if we do that, then we're just, you know- The metadata is the key. Metadata is a huge issue. I think that's going to be a big deal. And on the M&A, my feeling on the M&A is that, you know, you've got to buy something of value, so you have revenue, which means customers and or intellectual property. So in a market of open source, you know, it comes back down to the valuation question. You know, if you're IBM or Oracle or HP, I mean, they can pivot too and they can be agile. Now slower agile, but, you know, they can literally throw some engineers at it. So if there's no customers or IP, they can replicate that product. So they don't know what they're buying. My whole point is, if there's nothing to buy, you know. I mean, I think it depends on, you know, ultimately it depends on where we see, you know, people deriving value. And clearly with WANdisco, there's a huge amount of value that, you know, we're seeing our, you know, customers derive. So I think it comes down to that. And there is a lot of IP there, and there's a lot of IP in a lot of these companies. I think it's just a matter of, you know, widening their view. And I think WANdisco is probably the earliest to do this, frankly. Which was to recognize that for them to, you know, to succeed, it couldn't just be about Hadoop. 
It actually had to expand to talk about cloud and talk about other data environments, right? Well, congratulations on the OEM deal with IBM. Great name, Big Replicate. Love it. Yeah. Fantastic name. We're excited. It's a great product and, you know, we've been following you guys for a long time, David. Great product, great, great energy. So I'm sure there's going to be a lot more deals coming on here. Good strategy, the OEM strategy thing, huh? Oh yeah. Do you use the sales cost? Gives us tremendous operational leverage. I mean, you know, I mean, getting 4,000... Got a great partner in IBM. They know the enterprise. Great stuff. This is theCUBE, bringing all the action here at Hadoop Summit: IBM, the OEM deal with WANdisco. All happening right here in theCUBE. We're back with more live coverage after this short break.