Dave Vellante: Everybody, we're back. This is Dave Vellante. I'm with Wikibon.org and this is theCUBE. We come to the events, we extract the signal from the noise, and we bring you the best guests, the smartest minds in the industry. David Richards is here. He's the CEO of WANdisco. We're also here with Jagane Sundar, the CTO and VP of Engineering for the big data business at WANdisco. Gentlemen, welcome back to theCUBE. It's great to see you guys again.

David Richards: A pleasure, Dave, as always.

Dave Vellante: So we've been tracking WANdisco for a while now. You guys are making steady progress. It's like you go after the hard problems first; the easy stuff comes later. But give us the update as to where you are. We were talking off camera about some of the milestones you hit at Strata. You're now at Hadoop Summit, you've got four big announcements today, and you guys are just exploding. So David, give us the update.

David Richards: So it's one of those, where do I start? But I think the jewel in the crown is our active-active replication technology, which we can do over a wide area network, and we're probably the only company in the world that can do that. We've applied that technology to Hadoop. Our catchphrase is that the data center is no longer a single point of failure. We can provide active-active replication for Hadoop deployments over a WAN. At the last conference, at Strata, we announced, if you remember, the intra-data-center product, where we could replicate Hadoop within the data center. And now, and this team have done a fantastic job, we have, in probably world-record time, Hadoop 2.0 replicated over a wide area network. That's active-active replication of Hadoop over a WAN.

Dave Vellante: Okay, so you've got the active-active piece. You've also made some announcements around Amazon S3.
David Richards: Yeah, when we first launched our distro back in February, as part of that we launched something called S3HDFS, which allows you to do the public-cloud-to-private-cloud deployment with reference to Amazon. And we announced that we've open-sourced that today. So it's available to everyone for free download at wandisco.com, our website, which we're very pleased about. I think the community will be very happy that we've done that. We also announced, we had the AMPLab team from UC Berkeley in our offices a couple of weeks ago, and I'm delighted to say that we're the first company to provide commercial support for Spark and Shark. That was actually driven by customers. We began to see a growing demand for this high-volume, high-velocity, in-memory analytics from an open-source perspective, and Spark and Shark are gaining momentum, which is kind of interesting in that space. So we're the first company, working in conjunction with the AMPLab guys from Berkeley, to provide commercial support for those products.

Dave Vellante: So four big pieces: the active-active Hadoop, S3HDFS, a new Hadoop distro, and then the in-memory pieces with Spark and Shark. Jagane, I want to dig into the active-active piece a little bit, this notion of a global namespace. Talk about that capability, how you guys are attacking it, and then we can help our audience understand a little bit what the business value is.

Jagane Sundar: Sure, Dave. Ours is the first solution that runs HDFS across multiple data centers separated by thousands of miles. We have a single HDFS cluster, so you can create a file in one data center, it's visible in the other data center immediately, and the data blocks get replicated there as well. Over at our booth, we have a demonstration where we generate data using TeraGen in one data center, Amazon us-east-1 as a matter of fact, and we run TeraSort, which consumes that data, in another data center in Oregon.
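The booth demo described here, generating data with TeraGen in one region and sorting it with TeraSort in another, can be sketched with the standard Hadoop examples jar. The paths, row counts, and jar name below are illustrative assumptions, not the actual demo configuration:

```shell
# In the us-east-1 cluster: generate 10 million 100-byte rows (~1 GB)
# into the shared, replicated HDFS namespace.
hadoop jar hadoop-mapreduce-examples.jar teragen \
    10000000 /benchmarks/terasort-input

# In the Oregon cluster: the input directory is already visible through
# active-active replication, so TeraSort can consume it directly.
hadoop jar hadoop-mapreduce-examples.jar terasort \
    /benchmarks/terasort-input /benchmarks/terasort-output
```

In a conventional deployment both jobs would run against the same cluster; the point of the demo is that a single namespace spans both regions.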
Jagane Sundar: By this, what we've essentially done is provide you with protection against the complete failure of a data center. Back at Strata, we had a solution where, if a single NameNode from our three-, five-, or seven-NameNode cluster were to fail, your Hadoop continues to run uninterrupted. With this solution, you can lose an entire data center to a flood or some natural disaster, and you'd still be fully functional with your other data center. That's the big, important piece that we bring to the table.

Dave Vellante: Yeah, so this is a really hard problem to solve, and a lot of people are going to say, no, that's magic, how does that work? We were talking off camera about who are the companies actually trying to solve this problem. You guys and Google Spanner are really the only two attacking replication at a distance, and Google Spanner, of course, brings a lot of specific assumptions and constraints, atomic clocks and things like that. You guys use the notion of eventual knowledge. We don't have time to really dig into that, but it essentially allows you to take a hit within a data center while still propagating that knowledge, such that you can recover from the failure.

David Richards: I think one of the important things to note as well is that the core technology that enables this is not new. The Paxos algorithm has been around for what, 20, 25 years. And in fact, our application of Paxos to enable this distributed coordination engine that we have, that's not new either. We've had the technology in the market now for eight years, and we built a product on the back of it called Subversion MultiSite. We've seen instances like the Chengdu earthquake, where data centers collapsed, right? One of our major customers, O2Micro, who make power adapters for laptops, their whole data center was out of commission. But what actually happened to their developers?
David Richards: Well, nothing, because they were able to go home and fail over to another data center, in this case one in the United States. So we can actually do this, as Jagane was saying, this complete one-site failover with zero downtime and zero data loss. I think it's pretty incredible.

Dave Vellante: Yeah, zero data loss is key, because normally you either have to do unnatural acts, like put in three data centers, which is incredibly expensive, or you have to expose yourself to data loss. What do we mean by that? You write to a synchronous data center and then you asynchronously trickle the data to some data center at a distance, and there's always that lag time. You guys have solved that problem, if I understand it.

Jagane Sundar: Correct, it's a complete zero time to recovery. You can recover immediately in a completely different geo region.

Dave Vellante: Yeah, so there are probably some speed-of-light constraints in terms of, I hate all that, but you haven't solved the physics problem yet, though you're working on that, I understand.

Jagane Sundar: I have to defer that to our chief scientist, but yes, we have an implementation of Paxos that doesn't worry about time. So we can do this replication between India and the US, for example; the latency is not a problem. If you want instant failover, you can buy more bandwidth. Other solutions that use block-level replication have an issue where file systems such as ext4 or NTFS will fail. Ours is a solution built on top of stock Apache Hadoop. You're using Apache Hadoop, the file system doesn't change on disk or on the wire, yet you get this disaster recovery capability where an entire data center can fail.

Dave Vellante: So WANdisco is kind of an amazing company. It was a self-funded startup, right? You never took a dime of outside capital until you did your IPO, if I understand this correctly.

David Richards: That's correct.
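The asynchronous-trickle problem described here, acknowledging a write before the remote copy is durable, is what majority-quorum replication avoids. A minimal sketch of the general idea follows; it is a toy model for illustration, not WANdisco's implementation:

```python
# Toy model of majority-quorum replication: a write is acknowledged only
# after most replicas have stored it, so any surviving majority still
# holds the data after a site failure. Asynchronous replication, by
# contrast, acknowledges first and copies later, risking data loss.

def quorum_write(replicas, key, value):
    """Apply the write to every reachable replica; succeed on majority."""
    acks = 0
    for replica in replicas:
        if replica.get("up", True):
            replica.setdefault("data", {})[key] = value
            acks += 1
    return acks > len(replicas) // 2

replicas = [{"up": True}, {"up": True}, {"up": False}]  # one site is down
ok = quorum_write(replicas, "block-42", b"payload")
# ok is True: two of three replicas stored the block, so losing any one
# site afterward still leaves a durable copy. With two of three sites
# down, quorum_write would return False rather than risk a lost write.
```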
Dave Vellante: You're solving some of the world's hardest problems, like the zero-data-loss problem, which obviously would be big, for example, within financial services. Take us back to how you actually did that. I can imagine you're going into an account saying, hey, we have this capability, and somebody, again in financial services, saying, wow, I really need that, but WANdisco, little old WANdisco, who are you guys? If you were IBM, I would say yes, but I can't bet my business on you because I'll lose my job. How were you able to get through that hurdle?

David Richards: So, necessity is the mother of invention, I'm very fond of saying, and I think great companies are founded out of necessity, actually, more than by design and venture capital, et cetera. I felt that the intellectual property in the company was so strong, I really didn't want to sell it to venture for cents on the dollar. You know, sorry, venture capitalists out there, but I just didn't want to do that. So we grew the company organically, and we did it without sales guys. Every single one of our initial customers came to us, because we didn't have an enterprise sales team. We had this amazing product where customers had a problem and they came to us for a solution. And back in 2009, I got a phone call one day from a very senior guy at Hewlett-Packard, who eventually bought an enterprise-wide deal for their entire company. So WANdisco technology is now available to every single HP developer throughout their organization. That was a pivotal moment for the company, because it took us from being a small startup to doing a multi-million-dollar deal, and the rest, as they say, is history, with an IPO, as you mentioned, in 2012.

Dave Vellante: Well, you've actually written about this in terms of the distributed nature of developers and the impact of network downtime on developer productivity. So that was sort of an early value proposition?
Dave Vellante: Is that the case, and how has that evolved over time?

David Richards: So the company started out solving the problem of, if you have developers in India and China, the United Kingdom, the United States, how do they collaborate effectively over a wide area network? Well, the answer is, of course, they can't. We often found, when we initially set up a system, that it was cheaper to FedEx a DVD of the source tree over to China, because it takes three days to download the thing in the first place. So there is that speed-of-light problem you keep coming back to. I think somebody once did some research showing it's faster to send a message to South Africa by carrier pigeon than to send it electronically.

Dave Vellante: The Chevy Truck access method, I call it.

David Richards: So the original value proposition was twofold. We had one customer who had downtime in India consistently, because people were hanging their washing on the fiber line, using it as a washing line, so the fiber kept going down every five minutes. We solved that problem, which is outage. We also solved the performance issue, because if you have 4,000 developers in India constantly downloading from a master server somewhere in the United States, that's 4,000 multiplied by all of those downloads. That's a hell of a lot of downloads that you've got to cope with. We always have the data locally for those developers. So, a really strong value proposition in both performance and outage in that marketplace. And now we're taking that message and applying it to the Hadoop marketplace. How did we get into the Hadoop marketplace? We didn't just one day go out and say, right, we're going to develop a Hadoop solution. We went and acquired Jagane's company, which included Jagane himself and also a guy called Konstantin Shvachko, one of the original six at Yahoo that developed Hadoop in the first instance.
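The "FedEx a DVD" anecdote comes down to simple arithmetic: a large source tree over a thin international link can take days, while shipping is bounded. A back-of-envelope sketch, where the 4.7 GB payload and 200 kbit/s effective share of the link are illustrative assumptions:

```python
# Rough transfer-time arithmetic behind "shipping the DVD is faster".
# 4.7 GB is a single-layer DVD; 0.2 Mbit/s models a congested share of
# an international link circa the period described. Both are assumptions.

def transfer_hours(size_gb, link_mbit_s):
    """Hours to move size_gb (decimal GB) over a link of link_mbit_s."""
    bits = size_gb * 8 * 1000**3          # decimal gigabytes to bits
    return bits / (link_mbit_s * 1e6) / 3600

dvd_hours = transfer_hours(4.7, 0.2)      # roughly 52 hours, over 2 days
fast_hours = transfer_hours(4.7, 100)     # ~6 minutes on a 100 Mbit link
```

Keeping a full replica in each region, as the speakers describe, removes this transfer from the developer's critical path entirely.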
David Richards: So I think our execution strategy, because we never took venture capital, alluding back to your initial question, is very, very strong. The company was founded on very, very strong foundations.

Dave Vellante: Yeah, when we first met, I started asking questions and was very impressed with the chops you guys had in the Hadoop committer community. Jagane, I wonder if I could ask you about this notion of active-active at a distance. David, you just pointed out the Paxos algorithm has been around for a long time, yet very few are able to solve this problem. Nobody, really; you guys are the first. And again, I would throw Google Spanner in there, because it's a really interesting academic paper, worth reading if you've got nothing to do and you want to make your eyes bleed. But why has it been so difficult, and how have you guys been able to solve this problem where others have not?

Jagane Sundar: So, in addition to the eight years that we've had this product in the marketplace, our chief scientist, Aahlad, has been doing distributed coordination for the better part of 20 years. And Paxos, a true implementation, is really hard to get right. All sorts of compromises can be made, and you can build things such as ZooKeeper and other solutions that make some compromises. But then your solution is only 98% true, not 100% true. We didn't take that path. We took the long, hard route. We built a Paxos that's truly independent of time, proven and robust. A lot of credit goes to Aahlad, our chief scientist, of course. But that's the core of our intellectual property.

Dave Vellante: Yeah, because the hard part is, if something goes wrong, you've got to make sure that you don't lose data, and that's a really tricky problem. All right, so David, how do you envision customers applying this capability? What are the discussions like, and how does it relate to what's going on here at Hadoop Summit?

David Richards: So, first of all, we've got a lot of activity in our sales pipeline right now.
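For readers who want a feel for why Paxos is hard to get right, here is a minimal single-decree sketch of the classic two-phase protocol (prepare/promise, then accept). It illustrates the textbook algorithm only; WANdisco's production engine is far more involved:

```python
# Single-decree Paxos sketch: once a majority of acceptors has accepted
# a value, every later proposal is forced to adopt that same value, so
# the chosen value can never change. This is the safety property that
# compromised implementations tend to break.

class Acceptor:
    def __init__(self):
        self.promised = -1        # highest ballot promised
        self.accepted = None      # (ballot, value) or None

    def prepare(self, n):
        """Phase 1: promise not to accept ballots below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher ballot was promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False

def propose(acceptors, n, value):
    """Run one proposal; return the value chosen, or None on failure."""
    replies = [a.prepare(n) for a in acceptors]
    granted = [acc for ok, acc in replies if ok]
    if len(granted) <= len(acceptors) // 2:
        return None                       # no majority promised
    prior = [acc for acc in granted if acc is not None]
    if prior:                             # must adopt highest prior value
        value = max(prior, key=lambda t: t[0])[1]
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks > len(acceptors) // 2 else None

acceptors = [Acceptor() for _ in range(5)]
first = propose(acceptors, 1, "A")        # "A" is chosen
second = propose(acceptors, 2, "B")       # forced to re-choose "A"
```

Note that ballot numbers, not clocks, drive the protocol, which is the sense in which a correct Paxos is "independent of time."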
David Richards: We anticipate that we'll be making an announcement in the next 48 hours of quite a large OEM with a very large telecommunications company, where they have to have the five nines; they have to have 99.999% uptime if Hadoop is in fact going to move from batch to the transactional sphere. That notion, which we said four months ago was going to be very important, is now critical. If Hadoop is going to be an applications platform, it has to have guaranteed high availability, and we're providing that. We're also talking to other vendors about OEM deals, and we have a very, very strong sales pipeline right now of customers that may have already tried Cloudera or Hortonworks on a trial basis and are now looking to move into full production, and I think we're the company to take them there.

Dave Vellante: Yeah, you mentioned the possibility of OEMs, and I would see a number of enterprise companies wanting access to this capability. So I was going to ask you about partnerships and your ecosystem. Can you talk a little bit more about your philosophy in that regard?

David Richards: Yeah, so our initial execution strategy is certainly an OEM strategy, and we see OEM as a critical component of our early entry into the marketplace. I think the market's so big that, even with 40 enterprise sales guys, I can't scale my enterprise sales team fast enough to take advantage of the demand out there. So we have to look at OEMs. Without being disrespectful to some of the companies we're currently in negotiation with, some of the usual suspects in and around the Hadoop distribution arena, we're very interested in partnering with those. Typically we've had a direct sales model, so OEM is actually kind of new to us, but we're certainly very open to doing OEMs.
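The "nines" shorthand used above maps directly to allowed downtime per year, which is why a telco treats it as a hard requirement. A quick sketch of the arithmetic:

```python
# Availability expressed as "nines" versus permitted downtime per year:
# at 99.999% (five nines) a service may be unavailable only about five
# minutes a year, which is why batch-style recovery windows don't qualify.

def downtime_minutes_per_year(availability_pct):
    """Minutes of permitted downtime per (non-leap) year."""
    return (1 - availability_pct / 100) * 365 * 24 * 60

five_nines = downtime_minutes_per_year(99.999)    # about 5.26 minutes
three_nines = downtime_minutes_per_year(99.9)     # about 525.6 minutes
```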
David Richards: On the partnering side, we've partnered with companies like Dataguise, for example, to fill some of the gaps that we see in security. And certainly around services, certainly for the early part of the market, educating customers and helping them get Hadoop into production is very important. So we've partnered with a number of those vendors recently.

Dave Vellante: Excellent. So we'll be looking for potential announcements there over the next several months, and then we've got, of course, Hadoop World coming up in the fall.

David Richards: Yep.

Dave Vellante: Yet another milestone. You seem to be on this cadence of innovations and announcements around these big shows. What should people be looking for? I'll give you the last word.

David Richards: I think you should be looking for us to make announcements that have revenue associated with them, and certainly our shareholders would be looking for some of those things as well. It's all well and good to make lots of product announcements; we now have to show execution and take those into full-blown, mission-critical Hadoop deployments. And hopefully Jagane can keep his eyes open; I know he's been working on this for a number of weeks.

Dave Vellante: Excellent. All right, gentlemen, listen, thanks very much for coming on theCUBE. WANdisco, really interesting company. Check them out. Check out the Paxos algorithm; look at the Wikipedia entry, it's very, very interesting. They're really the first example of active-active. Our David Floyer is going to be all over this topic. I want you to spend some time with him, Jagane, if you wouldn't mind. All right, everybody, keep it right there. We'll be right back with our next guest. This is theCUBE. I'm Dave Vellante, live from the Hadoop Summit in San Jose, California.