Live from New York City, it's theCUBE at Big Data NYC 2014. Brought to you by headline sponsor WANdisco, with support from EMC, MarkLogic, and Teradata. Now, here is your host, Dave Vellante. Welcome back to Big Data NYC, everybody. You're watching theCUBE. theCUBE is our live mobile studio. We go out to the events. We extract the signal from the noise. Jagane Sundar is here, the CTO of WANdisco, a long-time CUBE guest and CUBE alum. Jagane, good to see you again. Thanks for coming on. Thanks for having me, Dave. So we were talking offline here. You've been shuttling back and forth from the Javits Center. What's the vibe like down there? What's the traffic like? What's the conversation going on? Traffic's a mess. New York's a little bit harder to get around these days. But the crowd at the Javits is really good. A lot of customers with active deployments in production, which is a vast difference from a year ago. We are hearing concerns from customers that all relate to taking their applications to full-on, 100% uptime solutions. Yeah, when we first started broadcasting at Hadoop World back in 2010 (we had our fifth-year celebration of Hadoop World last night), the crowd was the T-shirt crowd, nose rings, you know, that whole deal. My understanding is it's creeping up closer to my age. You're seeing the enterprise IT folks and the business people. So the enterprise is embracing Hadoop, isn't it? Absolutely. We're seeing the enterprise IT folks and the business folks put different values on specific features. And that's changing the nature of how Hadoop evolves, because the vendors are listening. I mean, make no mistake about it. Hortonworks, Cloudera, and the other distro vendors are all tuned to what the customers are saying. So the feature set that Hadoop is evolving is improving, I think, for the better. Yeah, and innovation's happening quickly.
You know, the book on open source has always been: it's great, it's wonderful, kumbaya, but it's really slow to develop. And it's risky, and all those things. You've heard them. It seems to be changing. It seems like the pace of innovation is quite rapid, given all this investment that's going into it. I agree. I mean, The Cathedral and the Bazaar was written in the day of Linux, and today's open source world is vastly different, partly because of the money that's going into the Hadoop companies, and partly because the Hadoop companies have taken a different approach. It's not GPL, it's Apache licensed, so it's more of a free-for-all environment. I think the old model for open source may not necessarily apply anymore. Hadoop is a new generation, a new evolution of open source software. You're saying that the GPL was more of a free-for-all, and now it's more structured, with Apache providing the leadership. Quite the reverse, actually, is what I'm saying. Oh, I misunderstood. What did you mean by that? By that I mean, the GPL still forces you to contribute everything you do back to the community, while the Apache license does not. And the original premise was that the Apache license would cause people to take the code and not put things back. But that hasn't happened. Every company in the Apache ecosystem desperately wants to put their source code back, wants to put their source and offerings back into core Hadoop. So it's turned into this free-for-all environment where open source companies are competing with proprietary companies to improve Hadoop in different directions. But effectively, isn't it achieving the same result? Or sorry, couldn't the GPL have achieved the same result? Is that what you're saying? Or was it that people were afraid to jump in because of the GPL? The GPL scared a lot of people off, whereas the Apache license has opened it up.
Proprietary companies look at it and go, we can take this stuff, build our stuff, and sell the software. And a year or two down the line, they realize that without community support, it's really not worth much. So the GPL concept was not only free of charge, but free to do what you want, as long as you give it back. And what you're saying is the Apache license says free of charge, free to do what you want with it, and you don't have to give it back to anybody, but companies give back anyway. We have to give it back, because our customers are telling us that if it's not supported by the community, they won't pick it up. Next thing you know, your idea of taking from the open source community and not giving back has changed, and you want to be accepted by the community. So this is all new and different in the open source world. And it's resulted in an explosion of innovation and an acceleration of innovation, right? So if your strategy is to try to stay ahead of the open source curve, it's harder and harder to do that. That's correct. Now, you guys, part of your strategy is to stay ahead of the open source curve. So give us the update on what's going on at WANdisco: active-active, global distributed namespace, you know, 24/7. What's been the uptake of what you guys are doing? And you've been working on some interesting projects. Let's get into that. Right. So the uptake has been phenomenal. The financial customers just love us, because they believe that this is the only way to really comply with regulation. And having 100% active-active with full utilization of all resources is just very exciting. That's our current product. We have a new, exciting product that's coming out from the labs that I run. This is exciting because at the core of our company, we are a replication company, strongly consistent replication. So we looked at it, and we've seen a lot of customers with different Hadoop systems struggling to integrate all of those.
The product we've come up with is a unification product, a single namespace across different distributions running in different data centers. You can point your data ingest pipeline at this new namespace. You can run your analytics against it. It's strongly consistent. So everywhere in the world, you can have the Hortonworks distro in one data center, Cloudera in another one, MapR, or even Isilon and other shared storage systems. If you do an ls, you'll find the same files. If you read the data, you'll find the same data. You can run applications, MapReduce and Hive, against all of those. We came up with this mostly from conversations with customers who firmly decided that they are not going to settle on one distribution. They're supporting multiple distributions for different applications, for different reasons, and we found that unification in this manner provides a lot of value for them. That's interesting. I've got a lot of questions. I want to start with your insight on the diversity of Hadoop distributions. Jeff Kelly yesterday said, listen, with all due respect to everybody outside of these three, Hortonworks, Cloudera, and MapR are really the big three that are gaining momentum in the marketplace. And there are others, but his prediction was that the market's going to consolidate around those three, and then maybe there's going to be further consolidation. How much difference and diversity is there between those three distros, and how does that affect your development? So between Cloudera and Hortonworks, there's a little bit of difference, not a whole lot. They both tend to leapfrog each other across minor release versions. MapR, of course, is a completely different, proprietary file system, but it has its place in this ecosystem, and there are some applications that run well on each of these platforms. Customers are not standardizing on one. They're picking what works best for that specific application.
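The single-namespace idea described here can be sketched in miniature. This is only an illustration, not WANdisco's actual API: `UnifiedNamespace` and the dict-backed clusters are invented stand-ins, and a real system would achieve strong consistency with a consensus protocol (such as Paxos) rather than the naive fan-out shown.

```python
# Hypothetical sketch: one namespace bridging several Hadoop-compatible stores.
# Each backing "cluster" is modeled as a plain dict mapping path -> bytes.

class UnifiedNamespace:
    """Presents one strongly consistent view over multiple backing clusters."""

    def __init__(self, clusters):
        # clusters: mapping of cluster name -> dict acting as a path -> bytes store
        self.clusters = clusters

    def write(self, path, data):
        # Consistency sketch: a write succeeds only once it has been applied to
        # every backing cluster (a real system would use consensus, not fan-out).
        for store in self.clusters.values():
            store[path] = data

    def read(self, path):
        # Any cluster can serve the read, since all replicas agree.
        store = next(iter(self.clusters.values()))
        return store[path]

    def ls(self, prefix="/"):
        # A listing is identical regardless of which cluster you ask.
        store = next(iter(self.clusters.values()))
        return sorted(p for p in store if p.startswith(prefix))


# Three different distros' storage, unified behind one namespace.
hortonworks, cloudera, mapr = {}, {}, {}
ns = UnifiedNamespace({"hdp": hortonworks, "cdh": cloudera, "mapr": mapr})
ns.write("/data/events/2014-10-15.log", b"click,user42")
# The same file is now visible in every cluster, matching the "do an ls,
# find the same files" behavior described in the interview.
assert all("/data/events/2014-10-15.log" in c for c in (hortonworks, cloudera, mapr))
```

The point of the sketch is the invariant, not the mechanism: reads and listings return the same answer no matter which distro's cluster serves them.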
I think a little bit of this has to do with the fact that the application programming paradigm has changed. You're not necessarily writing SQL applications anymore. You're doing MapReduce processing, Hive, a number of things that sort of look like SQL, but not always. So that opens up this whole new way of doing things. This is not like picking Oracle and writing to Oracle SQL. There are a different set of applications, and people are happy to go with different solutions that work best. So you have, I mean, obviously you have a limited R&D budget. Certainly, you don't have an unlimited R&D budget. So you have to prioritize, and if I heard you correctly, you've got the big three, and you can support other distros like Pivotal. We do have support for all the distros. We support Cloudera, Hortonworks, Pivotal, IBM. We're certified against all of these. The MapR product is a separate file system, so we don't support that in our nonstop HDFS product. With this new upcoming unified file system, we will support that as well, and we're also working with shared storage vendors to offer support for that. So that's just brute-force hard work. It's engineering on each of those to make sure that... It's a certain level of work, but the core intellectual property actually buys us a lot. The replication engine hasn't changed a whole lot from nonstop HDFS, and I have a crew of engineers that are just amazing with this stuff now, because we've taken the time to build nonstop HDFS. That gives us a huge advantage. We can go in and build a unification system without too much effort. And the reality is, the application API for talking to Hadoop systems is relatively simple. It doesn't support the complex POSIX API, so we got a little bit of relief from that.
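That narrower-than-POSIX contract is what makes bridging feasible, and it can be illustrated with a toy file system exposing only create, append, read, list, and delete. The class and method names below are invented for illustration and are not taken from any real Hadoop client library.

```python
# Toy sketch of an HDFS-style contract: files are write-once/append-only,
# with no random writes, seek-and-overwrite, or POSIX-style locking.

class HdfsLikeFS:
    def __init__(self):
        self._files = {}

    def create(self, path, data=b""):
        # HDFS-style semantics: creating an existing path is an error.
        if path in self._files:
            raise FileExistsError(path)
        self._files[path] = bytearray(data)

    def append(self, path, data):
        # Appending is the only mutation, unlike the full POSIX API.
        self._files[path] += data

    def open_read(self, path):
        return bytes(self._files[path])

    def listdir(self, prefix):
        return sorted(p for p in self._files if p.startswith(prefix))

    def delete(self, path):
        del self._files[path]


fs = HdfsLikeFS()
fs.create("/logs/a.log", b"hello ")
fs.append("/logs/a.log", b"world")
print(fs.open_read("/logs/a.log"))  # b'hello world'
```

Because the surface is this small, a replication layer only has to intercept a handful of operations, which is the "relief" being described.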
Well, I think this is incredibly powerful conceptually, because I've seen over my career so many situations where a company that was solving a really hard problem, or was in a niche, had to pick a winner. And if they picked the wrong winner, if they picked OS/2, they were doomed. It sounds obvious now, but at the time: well, IBM is doing OS/2, and this little Microsoft thing has taken Windows; maybe I'll put my resources there. Oops, I'm going to bet my company on Lotus Notes. Okay, I mean, so many bets that just didn't pan out. You're able to hedge your bets. True. And now it sounds like it's a function of both great engineering and the nature of the state of programming today. That is true. It's a function of both, and the deep expertise we have in strongly consistent replication helps. And customers are really glad to see the product, and that just electrifies my engineers. My guys come back with really good ideas for doing it. So we're excited. Well, it de-risks it for the customer. Absolutely. So use whatever distro you want. We're not going to try to push you one way. We're distribution agnostic. Different distros are going to be horses for courses. You know, obviously MapR is good for this, Cloudera for that, et cetera, et cetera. We can handle it all. So that's going to be a very compelling value proposition. It's a compelling value proposition. We've had customers come back to us and say, we want to use this for upgrading from one version of the same distro to another. And sure, that works really well. And it's a use case that's low risk for them. They go in, they keep their existing cluster operating as normal, point some ingest pipelines at this new file system layer that sits on top of it. The replication takes over, and at another data center, or maybe another cluster in the same data center, they now have this new version of that distribution. So that's a use case that came out of the blue to us, and it's perfectly acceptable.
Sorry, just to be clear, you're saying within the same distro. Within the same distro: CDH 4.4 to CDH 5.1, or an earlier HDP to HDP 2.1. Well, I want to explore this a little bit, because this is a big deal. Normally, when you're doing some kind of major software upgrade, you've got to have planned downtime. And I've seen it before. I can think of several clients we had that said, oh no, the vendor said it's okay. And then something happened, and they were down for days. In one case, it was really weeks before they got back to where they needed to be. And the reason was they had to sort of half operate, and they couldn't keep up with the changes. And it was just a real disaster. A true disaster. And I don't take that term lightly. So pre this solution, how would an upgrade go? Take us through how it worked, and then compare the two. So in a normal upgrade process, going from a minor rev to a minor rev, it's usually just supported by Cloudera Manager. They've done a good job of that. And you go through and rotate through the servers, upgrading them. A major rev upgrade is a major problem. Sometimes there's data migration involved. Both the distro vendors have tried very hard to keep that minimal. But it's a disruptive change, and you can expect a few days of downtime at least. What we bring to the table is: keep your existing applications operational on your existing cluster. If you happen to be looking at a DR cluster, or if you already have a DR cluster, then we'll install the new version on that. And we'll install our bridge file system on top of both of these and start replicating from one to the other. Now you can try porting your applications. The little-known secret, or dirty little secret, is that not all applications just run from one version of a distro to another version, because sometimes the APIs change. Sometimes there are subtle semantic differences. So you try your new applications on the new cluster.
Once you've got that running satisfactorily and you know that the new system is fully up, you can decommission nodes from your old cluster and perhaps upgrade that to the new version. So now you have a bridged cluster, both sides running the same rev, the newest rev of your distro. Okay, so let's go through that. In your solution, when I'm going to the new version in a major upgrade, if something goes wrong, what happens? In our solution, remember, the existing cluster remains intact; it's untouched. Applications are running on it. We have installed a layer in front of that for certain ingest, and perhaps start replicating over time. Now, if you're sitting just in the next rack, then that replication takes a very short time. When the data is replicated, you have the same data, with strong consistency, in a new cluster running the latest version of that distro. So now the next job is to take your applications and try running them on those. 99% may work out of the box. One or two things will break. We all know how these things go. You fix those; maybe it takes a couple of days, three days. Your data remains consistent. Remember that as your data is pouring into your old cluster, it's also getting replicated into the new cluster. When you've got the applications running satisfactorily, you have the option of running both simultaneously for a week or so, or cutting the old one out and switching to the new one. And without any downtime, you now have a new distro and applications. Okay, and so in the old world, you would have to, what, try to anticipate the things that were going to go wrong with the applications, or try to test it? Correct. And then try to predict how long you were going to be down, and hope. The big risk there is, like you said, you lose data consistency. Correct. So even in your scenario, if it takes longer, if there are some unknown blind spots, if it takes a week or two weeks, your data is still consistent. The data is consistent.
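The upgrade flow just described can be modeled as a small sketch, under the assumption that ingest is fanned out to both clusters while the old one keeps serving, and cutover happens only after applications validate on the new version. All names here are hypothetical; this is an illustration of the workflow, not WANdisco's implementation.

```python
# Sketch of the zero-downtime upgrade flow: dual ingest, consistency check,
# then an application-gated cutover. Clusters are modeled as plain dicts.

def upgrade_without_downtime(old_cluster, new_cluster, ingest_stream, app_ok):
    """Return the cluster serving traffic at the end; the old one stays live throughout."""
    serving = old_cluster
    for record in ingest_stream:
        # Step 1: every incoming record lands in both clusters (the bridge
        # file system layer handles this replication in the real product).
        old_cluster["data"].append(record)
        new_cluster["data"].append(record)
    # Step 2: data is strongly consistent across both before any cutover decision.
    assert old_cluster["data"] == new_cluster["data"]
    # Step 3: cut over only once applications validate on the new version;
    # otherwise keep serving from the old cluster, so nothing is ever down.
    if app_ok(new_cluster):
        serving = new_cluster
    return serving


old = {"version": "CDH 4.4", "data": []}
new = {"version": "CDH 5.1", "data": []}
# Pretend application validation succeeds on the new cluster.
serving = upgrade_without_downtime(old, new, ["rec1", "rec2"], lambda c: True)
```

If validation fails, the function simply keeps returning the old cluster, which mirrors the point made above: even a botched application port costs time, not data consistency or uptime.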
Now, I will admit, you need more resources for doing it this way, because you basically have two copies of the data at one point. You mean compute resources. Right. It's a small price to pay, though. Absolutely. Relative to the cost of downtime. Right, and most often, companies operating at this level of availability will have a disaster recovery cluster that's probably lying idle. What we've done is convert that disaster recovery cluster into an active part of your workload. I mean, again, I would think if a customer is saying, well, their big objection is, I've got to buy more servers or more storage, then the value of this feature is not enough to offset that. But for the guys who understand the importance of this, where the cost of downtime is so high, it's a no-brainer. It's a no-brainer. And we're fielding all sorts of requests from people for this. It's at alpha stage, and we've just shown it to a few people, but the word has spread, and now we're having a hard time keeping the sales guys at bay. So now, what do you call this? The unification product? So, David, our CEO, came up with the name One Hadoop, and I'm not supposed to say that, but we're still mulling the name, and I like it. Well, you're on theCUBE, so that's great. There's only a thousand people watching today. So that's interesting. Where did this requirement come from? Can you talk about that a little bit? So, the idea is mine, and I've been talking to customers. I work very closely with customers; I believe that's an important part of my job. And I can't mention the names of a few customers, but the original idea came from conversations with some of these customers and the struggles they were experiencing, and the realization that there is never going to be a single Hadoop; there's never going to be a Red Hat of the Hadoop world. Let's put it that way. And we jumped on the opportunity to bridge all these disparate distros.
Yeah, well, you're the second person this week who has said that. Abhi Mehta said the same thing yesterday on the panel. He said there will be no Red Hat of Hadoop, for so many reasons. One is that Hadoop is going to be much bigger than Linux. And the second is, you know, back in the day, Red Hat had competition, but the big whales kind of left them alone. If anything, IBM actually helped them by injecting a billion dollars into the Linux ecosystem. And today, I think people see the opportunity. The question we have, and I don't know if you have any thoughts on it, is, you know, how many of the distro vendors will actually survive? My feeling is that the market can support three distros. I agree. I've seen distinct applications that really work well on each of these distros. And customers are fanatical about that. At the end of the day, you can sell them a storage layer, you can sell them a compute layer; all of that doesn't matter if the application doesn't work well. And given that each of these applications has unique attributes, and people are developing those, I mean, there's a lot of energy going into that. I think the ecosystem can easily support three distros. What I love is the passion around the business model. So you've got MapR, which actually can make money off its distro, Hortonworks not trying to make money off its distro, and Cloudera trying to make its distro sticky so it can move up the stack. Three dramatically different business models. Again, I can see all three of them having legs and working. I mean, if you've got passion around that strategy, and you've got people who believe in it, and customers that are loyal, there's no reason why you can't make money at all three of those. No, and I've met some folks from the traditional shared storage industry, and they have a handle on this as well. They're getting some Hadoop applications to run better on their systems than anybody else.
And remember, they have longstanding relationships with a lot of these customers. They can walk in and, in two days, have one of these applications running well. So there's more to it, and it is going to be much bigger than Linux, and there's definitely room for a number of players in the industry. And I can't say what the best business model is. I mean, I could see, okay, if I'm a VC, I want to invest in Hortonworks, because they're going for massive volume and they've got the software economics game. I could see, Intel obviously put a lot of money into Cloudera, but if I'm Intel, why not? I'm going to build an ecosystem around that, and this is this huge opportunity. I can see that being a viable model. And then I look at MapR: okay, maybe it's smaller, but I see a potential for great profit there, to the extent that they can stay ahead. I mean, I guess the argument against MapR, again, is the one we were having: can they stay ahead of the open source world? And that's an interesting question, and one that you probably think about all the time. We think about that a lot. We do have products that mitigate some of this risk. One Hadoop is an example. We really help customers live with multiple distros, so that product has legs beyond anything we do. On the other hand, we are a proprietary enhancement to Hadoop, and we add value that nobody else can. So much like MapR, we're seeing customer traction precisely because of that. At the end of the day, to users, does it really matter whether it's open source versus closed source? All that matters is it must work and must be well supported. No, people will pay up for proprietary function. I mean, everybody in this world is open. Who's not open? I mean, is Oracle open? Oracle's the least open of them all, but they can point to a lot of things. I mean, Java, it's open. It's open, yeah. So today's world is open. It's a spectrum, and it's a moving target. So people will pay up for proprietary function.
There's no question about that. I mean, sometimes in the community, you get passion around, well, we're more open than you are, and you're bad, we're good. That's all nonsense. I mean, the customers don't care about that. They care about solving business problems, as you well know. Financial services is interesting. I was talking to David Richards about this. They seem to really be leading the charge again here. I mean, you're biased, because the financial services world really needs what you have. But nonetheless, it seems to have started in financial services, obviously the web giants, but in the commercial enterprise, it seems like financial services is leading the charge again, whereas just a few short years ago, everybody thought the banking system was dead. Here we are again, leading the tech charge. That's very true, and you are correct. I meet a lot of IT folks from the top banks and the top financial institutions. They're smart, savvy people. They may have been totally Oracle people five, six years ago, but today they know more about Hadoop applications than most people I've seen. And I had a discussion with this gentleman from one of the big credit card companies. He was talking about how somebody walked up to him in a bar and said, you guys are history, you're old, and we have the new way to exchange money and do business. And he happened to mention that they had supplied a lot of the technology that went into the wallet software offered by one of the big new, very catchy vendors. I'm trying hard not to mention names, but the fact is that the financial industry is very savvy these days, and I'm somewhat convinced, and this is totally my own theory, that within months, maybe years, there will be credit card companies that accept Bitcoin for payment, perhaps. It's going to become a much more dramatically evolved world, and the technology they're using is all Hadoop. Almost all of it is Hadoop based. Well, I'm a believer in digital currency.
It's interesting when I hear Warren Buffett on CNBC saying, stay away from it, it's never going to happen, it's going to die, and then you get guys like Marc Andreessen on the other end talking about how it's much more than a digital currency: it's a platform, with an ecosystem evolving around it. Kind of esoteric terms, but we've seen these wild cards before, like open source. You know, Warren Buffett might say of open source, how can you make any money? Don't ever do open source. And then you look at what happens around the innovation. Not that he ever said that, he didn't, but I'm just using it as a metaphor for digital currency and the naysayers and the supporters. But it's interesting to see digital currency riding on top of this, what we call, what others call, the digital fabric, and it's going to be interesting to see how the financial services firms transform. The money business has always had the smartest people; they're sharks, and they have a tendency to figure it out, and then they do some stupid things, and then they shake out, but they have a knack for rebirth. Yes, yes. So it's going to be fun to watch. Well, Jagane, thanks very much for coming on theCUBE. Again, always a pleasure riffing with you on industry trends. I'll give you the last thought: just your perceptions of Hadoop World this year, sort of where we're at with Hadoop, some of the things that are exciting you, whatever you want to close with. I want to close with this. I did a talk on global Hadoop yesterday. I've done similar talks a year ago. Yesterday's talk was standing room only. All the questions were from customers who were experiencing the pain of trying to do things like this, and they want a real solution to that. These are production problems. So the notion that Hadoop is moving out of the labs and environments where POCs are done into real production is very true. All the customers you see at the conference today are customers doing production applications.
So that's wonderful to see. Awesome. You know, Rich Napolitano was on earlier. He said, if you want the truth, you've got to talk to either engineering or sales. That's where the truth lives. So we love having you on, because we get the truth. So thanks again. Thanks, Dave, wonderful to be here. Great to see you. All right, keep it right there, everybody; we'll be back right after this short break. This is theCUBE. We're live from Big Data NYC.