OK, this is SiliconANGLE's theCUBE and we're here live at Strata in Santa Clara, California, O'Reilly Media's big data event. This is, I think, our fourth or even fifth Strata or Hadoop World, if you combine them as Cloudera and O'Reilly did last year. And what we like to do here is try to extract the signal from the noise and bring to you some of the major trends. We've been talking a lot about some of the activity that's going on in the platform world and what we've called the land grab. Cloudera used to be all alone in the Hadoop distribution business, and then others jumped in, and now there's just been a sea of innovation that has come to the fore. We're with a very interesting company, and I think you'll be interested in their background and their innovation in big data. The company's called WANdisco. We're here with David Richards, who's the CEO, and Jagane Sundar, who's the CTO and VP of engineering for the big data side of the WANdisco business. Gentlemen, welcome back to theCUBE.

Dave, thank you, it's great to be here.

Yeah, it's good to see you guys again. We had a great meeting in Marlborough on the East Coast, and Jeff Kelly did an interview with you guys on theCUBE in our Marlborough offices, kind of introducing the company and the acquisition that brought you into the big data space, but I want to start with sort of why you're here. But before we actually get to that, tell us about who you are, because I think a lot of people don't know.

So yeah, we kind of sound like a 1970s Donna Summer disco kind of company, don't we? But we've got nothing to do with that. So our heritage is in data replication. WANdisco stands for Wide Area Network Distributed Computing. We're kind of a strange company in many ways, not just the name, but we did an IPO about seven, eight months ago now and we did it without ever raising venture capital. 
So we were able to raise a lot of money on the London Stock Exchange recently, and I think we won the IPO of the year award along with numerous other awards. So the company's rich heritage is in the field of distributed computing, and we're quite famous, I guess, for making a product called Subversion MultiSite that a lot of the big companies use for their software development today.

Yeah, and that's kind of a rare journey that you went on. We actually just had a company called DDN, DataDirect Networks. They're a $300 million company, also self-funded. So two very successful stories. They've talked about an IPO and sort of pulled back from that, but it's not something that you see typically in this business. How were you able to achieve that?

So in order to become a successful company, it's back to the old-fashioned things that your grandfather and my grandfather would probably have known, which is actually selling products that customers are prepared to buy. So as a company we had a great discipline, from 2005 when the business was incorporated, to build products for a market that companies were going to buy, and they were solving big problems that companies were prepared to pay money for. We're not doing anything different now. Our entry into the big data marketplace, which I'm presuming we're going to get onto, is actually that: we're using exactly the same processes, exactly the same discipline around open source, and applying our unique technology to that market.

So since we last met, you guys have announced a Hadoop distribution based on Hadoop 2.0. Talk about what you announced and then why you needed to do that.

So at a very high level, and I'll hand over to Jagane in a second and he can fill in the gaps. We announced the world's first production-ready Hadoop 2.x distro, which we're very excited about. We did it in very short order. We hired a very talented group of engineers and existing committers, including a guy called Dr. 
Konstantin Boudnik, who actually was at Cloudera for a while and at Yahoo before that, who really pressed forward and built that distro very quickly. And there are numerous reasons why we did that. I mean, yeah, we're going to see other companies enter into this marketplace. We've just seen Intel, we've just seen EMC announce.

With their third distribution.

With their third distro. I mean, look, this is a huge marketplace, right? I mean, you guys upped your forecasts on the market; it's growing much faster than we all imagined. And in order for us to execute, we have to have our own distro. That isn't exclusive. I mean, we are prepared to support, for example, Cloudera. We've got a partnership with Cloudera, we've got a partnership with Hortonworks. So it's not an either/or situation for us. Obviously we'd prefer people to be on our distro, but we're equally prepared to support their distributions.

And you prefer people to be on your distro because it accentuates your value proposition, presumably, right? I mean.

That's right. I mean, we have a patent on the way that you do active-active replication over a wide area network. We can guarantee 100% that Hadoop will be available with zero outage, zero downtime. We've been doing this for seven years now in the Subversion space. So we can save people from earthquakes, floods, other disasters. We can guarantee Hadoop is bulletproof for the enterprise. And of course, it fits in very nicely. Our active-active replication technology can plug into our existing Hadoop distribution de facto, or in fact, it can plug into Hortonworks and Cloudera as well in equal measure.

So Jagane, can you take us back, talk about Hadoop, go to the roots and talk about why it was inherently not, as everybody says, enterprise ready, and the problem that you were trying to solve. And then we'll talk about how you solved it.

Sure. Thanks for the intro, Dave. 
Hadoop was originally based on Google's work with their MapReduce algorithm. And the paper that they wrote laid out the NameNode, or the metadata for the file system, as a fully in-memory system. That way it could scale to several thousands of servers without any difficulty. The problem with that, though, is that it's really hard to make it a replicated multi-server architecture. So traditionally all the efforts in making Hadoop more available have focused on cold standby failover and warm standby failover. There never was a multiple-server architecture. Enter WANdisco: we have technology that we've used in the SVN space for peer-to-peer replication, both within the data center and across the world in different data centers. We took that after our acquisition. We took that technology and we applied it to the NameNode in a really non-invasive manner. In this way we can add active-active capabilities to the NameNode metadata. This is something we're offering tightly integrated into WDD, our own distro. We will also offer it on top of CDH and the Hortonworks distro when they come about. So support in WDD will be much more tightly integrated. But other than that, you can see the benefits on other distributions also.

So it's active-active, you do data replication over the WAN, so essentially you have a non-stop NameNode.

Correct.

And now, talk about the advantages of your distribution versus sort of applying it to other distributions.

So first and foremost, we are very much at the leading edge of the distribution business. We were the first with a 2.x-based distribution. We were the first with YARN, which is the next-generation resource scheduler for compute. We're the first to support YARN as a first-class citizen in the Hadoop distro business. It's included as an unsupported piece of software in Cloudera. 
We're also the first to include enterprise-ready features such as the non-stop capability, both within a data center and across data centers. Finally, we have features such as S3 HDFS, which is the capability to move your workload from a public cloud, such as Amazon's S3 and Elastic MapReduce, into your private-cloud-based Hadoop. So all these features are not available in any of the other distros.

So I mean, obviously your technology is more appropriate for mission-critical applications. Is that the exclusive target, or do you see it as wider than that?

Not quite. So our WANdisco Hadoop Console, which is our management solution, is a fully multi-tenant management console. So you can go in and create a cluster with very few pages of information. You just tell it how much storage you want, tell it how many users you want and how CPU-intensive it is. For experimental and development purposes, this could be setting up a cluster, running some tests and tearing it down, all within a few hours. Or if you want to use it for production, you can have the cluster running for months on end. We do cover both targets. We expect this to be used by people with Linux administration skills. We don't require you to undergo extensive Hadoop training.

We have seen, just to add to that, we have seen from our partners numerous companies, enterprises that are trying to put Hadoop into a production environment but do have concerns about high availability. And there's that very famous issue, the NameNode being the central point of failure. And I think, certainly in the early market, we're seeing a great deal of interest from companies that want to solve that central point of failure. And we do solve that problem.

You know, David, we've heard today and even at other events that for every five Hadoop deployments there's one in production. So with that ratio, do you feel like you can move that needle?

We do, and our partners think that too. 
I mean, we just launched what we're calling the non-stop partnership program, and we've got seven partners, all pretty well known, like Hive, the drive manufacturers, Fusion-io, SUSE, we're looking at an operating system deal with those guys. As well as Cloudwick and numerous other system integrator partners, who all think that we can drag that revenue forward for them. So we're the roadmap for them to take Hadoop into production. So that one in five, which I think is about accurate right now, I think we can turn that into maybe three in five, if our technology works in the way that we think it will.

Yeah, I saw the Fusion-io partnership announced. It caught my eye, and the reason it did is because I often say Hadoop is the new tape. Think about flash in this world. Where's the fit there?

Well, hardware vendors love us. So why do hardware vendors love us? Because we're replicating data. So for every one box, now they have to have two boxes to gain the high availability. So with the hardware vendors, we're really seeing traction in the hardware space.

Yeah, so where do you see us in this whole Hadoop space? Talk about your vision for Hadoop generally and big data specifically.

So I think Hadoop to date has been used very successfully in non-mission-critical environments. And when I say non-mission-critical, I mean running sentiment analysis, running data analysis. I want to see Hadoop move into the enterprise, in the runtime, replacing relational databases, spreading throughout the enterprise, changing the way in which we look at data and the possibilities that organizations have from data. You know, I mean, just looking at the MIT paper about Obama's 2012 election campaign, that shows you the possibilities that big data opens up to companies and organizations and enterprises today. 
I was on the phone this morning to an analyst firm in the UK, and they think that nine out of 10 of their clients are currently looking at Hadoop big data implementations and probably only one of them has done it so far. So there's a long way to go in this marketplace. I think your Wikibon analysis of this market has been pretty accurate so far. I see you've upped your estimates recently, which is great news, and I think we're going to see a lot more enterprises move into Hadoop and start deploying Hadoop.

Yeah, it's kind of been the Hadoop tail wagging the dog in the last couple of years. Do you see that flipping?

I do. I think we're going to end up in a situation where it's going to be very similar. Remember back in '95 to 2000, I used to be an SAP consultant, when SAP consultants ruled the world. I think we're going to see data analysts rule the world, and if you're a data-intensive organization and you're not doing a big data implementation, you're going to be at such a massive competitive disadvantage that you are simply going to have to look at big data somewhere down the line in your organization. And we're seeing similar adoption, so a lot of CIOs are currently saying we need to implement big data without really knowing what it is or what it does for their organization. But that's going to change. I think 2013 is going to be the year where organizations go into full production with a lot of big data implementations.

So, Jagane, from a technical standpoint, you're obviously attacking the mission-critical piece of it. What other areas do you see, from a technical perspective, that the industry needs to knock down in order for that adoption to accelerate the way we think it will?

So management of data is a significant problem. 
As big data grows, you cannot do things like have four copies of the data in four different data centers just because you cannot have a file system that runs across the data centers. We plan on attacking that with our next-generation file system that can replicate across the data centers. If you have a workload that's running in the public cloud, I believe that you will be considering options for private cloud. Very well-known companies like Netflix have dabbled in both, and we know that the cost advantages of moving workloads into the private cloud are significant. We expect to see vendors offering solutions that blend the two. We expect to see forward progress in terms of bursting workloads onto the public cloud, but mostly keeping the data in the private cloud. These are all areas that I think you will see significant developments in.

Yeah, you know, you're making, I think, a great point. We've seen Amazon really going after the enterprise, and of course its marketing would suggest that it's actually cheaper to put stuff in the public cloud, but it's always more expensive to rent than it is to own.

Absolutely.

And I would imagine a lot of your customers are saying, okay, we're not going to deploy in the public cloud. Maybe the right tool for the right job, selectively. That's another tool in our bag, but it's the private cloud piece that you see as having to evolve to the point where this thing can really take off.

Agreed. We think that there will be a lot of experimentation in the public cloud. There might even be small-scale deployments, but given an even playing field where it's easy to move your workload into your private cloud, the cost advantages of running it there will win in the long run.

So what's next for WANdisco, David? What should observers be looking for and tracking you against?

What we told shareholders is, look out for three things in our execution plan. One, products, check. 
We've launched four products into the marketplace a year ahead of schedule. Number two, partners, check. We've got seven partners in less than a month. We're expecting probably about 50 this year. And number three is customers, and that's not quite a check because we've just launched these products and partnerships, of course, but I'm expecting us to enter the marketplace in the next nine months with lots and lots of customers.

Well, WANdisco, when I first met you guys, I was blown away by your knowledge and your savvy in this space. And we're starting to see execution follow, which is great. So vision and execution usually make a good top-down, bottom-up success. So continued success for you guys. We'll be watching, and we really appreciate you guys coming on.

Thank you very much. Pleasure to be here.

Thank you, David. Take care. All right, keep it right there. We'll be right back with our next guest. This is SiliconANGLE's theCUBE and we're live at the Strata Conference in Santa Clara.