theCUBE at Hadoop Summit 2014 is brought to you by anchor sponsor Hortonworks. We do Hadoop. And headline sponsor WANdisco. We make Hadoop invincible.

Okay, welcome back, everyone, live in Silicon Valley in San Jose for Hadoop Summit. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier with Jeff Kelly and all of us at Wikibon.org. Exclusive coverage of our new survey that Wikibon's putting out. So stay tuned, keep watching. We're leaking out little tidbits. The first look here on theCUBE. Our next guest is Cube alum Jagane Sundar, CTO of WANdisco. Welcome back.

Thank you, John.

So what's new with you guys? Obviously, this is a power show for you guys. All your customers are here, and developers. It's a tech geek fest, but there are also a lot of critical things happening in the ecosystem. So share with us what's going on for you guys here at the show, and what are you looking to do at the show?

Absolutely. As we've always said, availability and continuous availability are our principal concerns. So we've taken HDFS and made it run across the wide area network. This is of course the best way to have a multi-data center Hadoop installation. Customers love it. It's a great alternative to weaker solutions like DistCp for disaster recovery. We're also pushing ahead with HBase, making that continuously available across data centers. And we've got a little bit to talk about Non-Stop.

So you have Non-Stop Hadoop. What do you call HBase? Non-Stop HBase? What's the slogan for that?

That is Non-Stop HBase, you're exactly right.

Okay, so Non-Stop HBase. Describe the innovations on top of HBase, because it's very popular, but some have said it's kind of hard to work with. How are you guys making that any better?

So the biggest problem with HBase is region server failure. If a region server fails, all the regions that it serves need to be migrated to other region servers.
And that could take an indeterminate amount of time. We use our distributed coordination engine to store the data in three region servers. That gives you the ability to have active-active region servers. So if a single region server fails, it's a non-event in our system.

So one of the themes this year is security, right? Obviously, Hortonworks makes the first power move, makes a big acquisition, and Cloudera now follows and copies that move. They're basically matching volley for volley, but it really points to the bigger picture, which is the need for enterprise-grade Hadoop. You really can't go anywhere without security. In fact, from Jeff Kelly's survey that we've been talking about, I just tweeted out that 20% of Hadoop is paid for and the rest is free. So if you believe the numbers, and Jeff certainly can answer that, that means that there's a huge tsunami of adoption coming.

Yes.

So this security is a telltale sign that there's a need for enterprise grade. With that in mind, what are you guys doing? How do you talk to your customers? Because one of your things is you're enterprise ready. Is security like table stakes, and how do you guys fit in?

So security is a first-order concern of ours. If you consider something like a home PC with a USB drive attached, and you store data unencrypted on that USB drive, then should that drive need to go out for service of some kind, very confidential information may become available to unscrupulous persons. You have a similar situation with the DistCp-based copy solutions that are being used. If you have two or more data centers, and you're copying your data across the WAN to a different data center, what you're essentially doing is compromising the security of your system.
For enterprise-grade security, we believe the best way to do it is to have a single HDFS, where the security policies, authentication, authorization, everything that you establish, run in a very symmetric manner across all these data centers. That is step one to ensuring a secure Hadoop.

So Jeff, what's your take on that? I mean, you're the analyst, you've got the survey. Obviously security is one major thing. I think we talked about Cloudera and Hortonworks. There are other issues. What do you find?

Sure. Well, as you mentioned, with deployments that are spanning multiple data centers, you've got to keep those coordinated and essentially acting as a single deployment. We've talked about that before on theCUBE, and I think that issue becomes more important as the market matures and some of these deployments actually do start to span multiple data centers. So in our recent survey, we asked just that question of Hadoop practitioners: how many of their deployments span multiple geographically dispersed data centers? And actually about 70% said that is, in fact, the environment in their organization, which to me was higher than I expected. It says to me that this market is moving even faster than we thought, and we thought it was moving pretty quickly. But that's exactly where WANdisco plays, allowing those types of deployments to again act as a single deployment. Is that an accurate way to say it?

Indeed, absolutely. I mean, the number is not a surprise to me. We're seeing bigger and more interesting things from our customers. The point is that the services that most of these enterprises offer cannot be limited to a single data center anymore. That is decades-old technology. You need your data to be available continuously in all the data centers. Further, you really can't deal with reconciling wandering file systems.
If you have two slightly different file systems, even if it's just 1% of the files that are different on a multi-petabyte system, that's a huge task to try and reconcile the difference. So you really need a single HDFS that runs across multiple data centers.

So that's an administrative nightmare, it sounds like.

Absolutely. It's the sort of thing where IT administrators have to call upon their application developers and the users of those applications and tell them the bad news: they've got two files, and they need to figure out which one is the one they want to retain and which one is outdated or contains inadequate information. That situation is not possible with a system where you have a single HDFS running across the WAN, and that's a huge simplification of the administrative overhead.

Can you maybe provide a little color around what you're seeing from customers, in terms of the applications that are being supported by multi-data center Hadoop deployments?

So financial customers are a big vertical in our space. They have tier three, tier two, and tier one applications. Until now, with their disaster recovery solutions, they've never been able to push further than tier three applications in a multi-data center environment. With our technology, what excites them most is the ability to run tier one applications and consider failover as a very essential part of the system. That's exciting to us because we're going in there and convincing our customers to go with an architecture that is active-active on both sides. You can ingest data at any of the data centers; local makes the most sense. And you can have that data available for tier one applications with virtually no time to fail over.

And so let's take that a step further. What kind of market opportunities does that open up for customers of yours?
The first thing: we had a very interesting conversation with a specific financial customer who said that they like the notion of buying their high availability solution from a third party, separate from the distributions, who specializes in this high availability software. We've done this for over eight years now, and it's opening up the entire financial and healthcare verticals to us.

So in terms of the different distributions you just mentioned, of course, there's Hortonworks, Cloudera, MapR, and others. How do you interact with the different companies, and how do you maintain some kind of uniformity across the different distributions?

Absolutely. So we're based on open source Apache Hadoop. We are a thin layer that extends open source Apache Hadoop. And so it was naturally very easy for us to get this to run on the Cloudera distribution and the Hortonworks distribution. We're also working with other partners such as Pivotal and IBM. We are distribution agnostic, and we have wonderful relationships with both Hortonworks and Cloudera and the other distro vendors.

And I've got to get your opinion. I want to ask everybody about the market dynamics and what's going on. How have you seen this market evolve in terms of the level of competition among the different players? When you go into customer sites, are you going in with particular vendors such as Hortonworks or Cloudera? Are you seeing them? Are you kind of watching them fight it out as you go in to talk about high availability?

As our CEO often points out, we don't sell religion, we sell Bibles. So the decision has been made before we step in. Also, just from the perspective of the industry, I see Cloudera, Hortonworks, and other vendors as strong players in this Hadoop space, and we may actually be in a position where there's more than one winner in the Hadoop ecosystem and more than one distribution is popular or widespread.
Open source being the powerhouse that it is, we can see innovation that's the cumulative sum of maybe 100 different companies pouring into open source. I can see a role for a different style of distro from Cloudera and a different one from Hortonworks, and maybe others as well.

And what do you think that means for the future of Apache Hadoop? I'm actually going to be on stage tomorrow with Doug Cutting and Arun Murthy, two of the critical members of the Hadoop community, and we'll be talking about the future of Apache Hadoop. I'm curious, from your perspective, how do you see it evolving, particularly when you've got the vendor dynamics happening and you've got the open source ecosystem and community contributing? How do you see it evolving over the next one, two, five years?

So I see a clear parallel between the Linux world and the Hadoop community at this point. There's Linus, who puts out the kernel and is very careful about what goes into the kernel, but it takes further packaging, further development, and further bundling before you have an operating system such as Red Hat or SUSE. Similarly, I see the Apache community playing the central role in putting out the key distribution components, but there will be enhancements, improvements, and enterprise features added by the distro vendors and partners such as ourselves, and that's the size of the ecosystem, basically.

Well, it's a really interesting topic, John. I mean, this market could go any number of directions.

You know, I was actually just tweeting on our CrowdChat. Go to crowdchat.net slash Hadoop Summit; we're putting all the tidbits out there for the survey. And I was just commenting on WANdisco, going back and forth with some folks out there. Make no bones about it: the enterprise-grade features are being filled in.
I actually just had to delete my statement, because sometimes I get in trouble when I tweet from theCUBE because I say things too fast, like Cloudera and Hortonworks filling in the holes of their product lines. It doesn't come across in a positive light, but in reality, it is evolution. They are filling in features, correct? Not necessarily holes. They are holes in the sense that as you evolve, security and industrial-grade enterprise capability are critical. So I've got to ask you from a technical perspective: where are we on this, and what do you see as the threshold issues for the Hadoop community, not just Cloudera, Hortonworks, others, and you guys? As you cross the threshold, what are the table stakes for enterprise Hadoop?

So enterprise Hadoop has to be continuously available, secure of course, and it has to offer services for big data in the same manner that people are used to with traditional databases, hence the importance of SQL and the importance of other applications. But a continuously available, secure service, that's your starting point. Without that you don't have a platform to deploy.

So you saw Apple's big news yesterday, iOS 8, really enterprise friendly; that's the road they're trying to go down. The consumerization of IT. I've got to ask the question, because this is ultimately the Pandora's box that gets opened up: isn't there a whole lot of Swiss cheese in those products too? I mean, it's a lot of holes. You talk about applications, virtualization, data virtualization now coming around the corner. What does an enterprise need to do with consumerization, with more iOS 8 in the enterprise? Android, some say, is more insecure than, say, Apple in the enterprise. What's your take on that?

The first and biggest impact that enterprises are going to see is that the volume of data that's going to flow into their systems is going to multiply by several orders of magnitude. And it's not clear that you can afford to throw away any of this data.
So with consumerization, with iOS coming in, if it's a single mobile device that people are carrying, they have a great deal of personal information on that as well. It's also the enterprise's responsibility at this point. So I would begin to think that the volume of data that's going into enterprise systems is going to increase as a result of this move by Apple.

Do you think data virtualization has legs?

It's too early to say at this point. I don't have a strong sense of whether it's going to have legs or not.

Okay, while I've got you here, I might as well ask you about the Spark Summit, because we're going to bring theCUBE up to the Spark Summit. What's your take on that whole movement? Are you into it? Do you think it's relevant? What's going on with that?

I think it's relevant. I think it is much more interesting when it runs on top of YARN. And I think it's the sort of platform that certain applications will be drawn to, and it will be supported as a first-class citizen on top of Hadoop.

Since I'm putting you on the spot here, I might as well ask the final tough question. If you had to put the two horses on the track, Hortonworks and Cloudera, and I don't want you to talk trash or anything, just be really specific: how would you describe the difference between those two companies?

I believe that Cloudera has a stronger enterprise focus and Hortonworks has a stronger open source roots focus. They have a stronger presence in the community. So it's an interesting battle to watch, and it's possible that there'll be two winners in this equation, but that's where I would.

Yeah, we have a unique perspective. theCUBE started at Cloudera. I've seen that company grow from 30 employees. I remember when Amr Awadallah was an entrepreneur in residence. Seeing it grow, and now a lot of people are saying they don't recognize the new Cloudera.
I mean, it's a big company. They have billions of dollars of valuation, 900 million net, 740, early investors cashed out. I mean, it's a success story.

Cloudera is a success story, and Hortonworks is not done. It's not done, and I truly believe that Hortonworks has the open source roots to produce for and satisfy certain segments of the market. Cloudera is strong in other segments, there's no doubt about that.

Yeah, and the vision, they've always stayed true to the vision. I was talking with the product guys and it's like, they've had this vision from day one. But I will tell you that a lot of people are talking about their lack of presence in the open source communities compared to Hortonworks, saying that the old Cloudera was much stronger. But then Cloudera counters and says, oh no, we have tons of guys contributing. So that's always going to be kind of he said, she said. We're keeping an eye on it, but people are talking about that. Good point. Thanks for coming on theCUBE, really appreciate it. WANdisco here, the CTO, talking about Non-Stop Hadoop, Non-Stop HBase, and enterprise-grade Hadoop. It's coming fast. Our prediction, and our survey is showing it, is that the freebies and the proofs of concept are rolling right into production and that 80% of the market will be exploding. That's why there's kind of a clutching and grabbing going on, as we say, jockeying for pole position. This is theCUBE. We'll be right back after this short break.

Thanks for having me, John.