Big Data SV 2014 is brought to you by headline sponsors WANdisco, "We make Hadoop invincible," and Actian, "Accelerating Big Data 2.0."

Okay, welcome back everyone. This is theCUBE, SiliconANGLE and Wikibon's flagship program. We go out to the events and extract the signal from the noise. We're here live in Silicon Valley for Big Data SV, covering all the action in Silicon Valley and across the street at the Strata conference. I'm John Furrier, the founder of SiliconANGLE, and we have two special guests here from WANdisco: the Chief Marketing Officer, Jim Campigli, and Jagane Sundar, CTO. Welcome to theCUBE.

Thanks, John.

Guys, first of all, I really appreciate your support. We had David earlier talking about all the great success and the news and everything. So I want to bring you guys in and have a little bit more of a drill-down on what's going on in the customer base, in the technology theaters, in and around big data. So, Jim, I've got to ask you first. It's the time in the market where we're starting to see the rubber hit the road: proofs of concept are growing into production, we're starting to see some decision points, and we're starting to see the value propositions on the outcome. So my first question is, given your focus on the market, where do you guys see the pressure points for these value propositions and these business outcomes?

So I think one of the things that everybody has to remember is that even though there's a lot of excitement around Hadoop, and it's justified, there's a lot of power in terms of what you can do with it, only about 24% of the Hadoop clusters out there are actually in production.
And the reason for that is basically that a lot of the enterprise-enabling capabilities, in terms of disaster recovery, continuous availability, the ability to have Hadoop deployed across multiple data centers over a wide area network, all of those things really haven't been there until WANdisco arrived on the scene.

One of the things Dave and I always talk about, and Dave loves to talk about as an analyst, is the customer focus, right? And the customer focus is one of those things that can be used as a punchline, with a lot of big data washing or any kind of solution sets, but ultimately there's still a lot of fear, uncertainty and doubt for CIOs because of the fear of downtime. You mentioned continuous availability; planned downtime, unplanned downtime, these are not things that make people feel comfortable. So how do you guys talk to customers when they're saying, hey, I love the promise of big data, but I really have to get to an operational position where I have good visibility on migrations and movements and stay agile, but yet I can't have downtime?

Well, that's one of the things that we really help them overcome. So effectively what you have with our Non-Stop Hadoop solution is a package that enables you to have active data centers in multiple locations. So 100% uptime is critical, it's key, and it's the core of what we provide, but there's also that ability to not have to worry about disaster recovery, to not have to worry about data availability, failover, those kinds of things, and being able to access the same data at every location, effectively at local area network speeds. We resolve all these problems. In fact, backup and recovery, the way we automate it, really is a byproduct of what we do with our patented active-active replication. And it's much more than simply mirroring or snapshots.
Every data center where you've got Non-Stop Hadoop installed is active, and we literally enable you to deploy a single Hadoop cluster and, now with today's announcement of our Non-Stop technology applied to HBase, a single HBase instance running on that cluster across multiple data centers. So you now have real-time interactive access to your data everywhere that you've deployed Non-Stop Hadoop.

Jagane, I want to get your perspective. Obviously HBase, everyone who follows us knows we've built on HBase with our CrowdChat product, and we were also the first ones to do theCUBE at HBaseCon, at the inaugural event in San Francisco. Things like the NameNode and region servers, these are all things that people worry about. So I want to get your take under the hood. Even back in the early days, when Facebook was showing their implementations and everyone was rolling out the greatness of HBase and the buzz started to lift, HBase was called a tailored suit: once you had it fitted, it's hard to use it in other places. So tell us two things. Talk about the NameNode and the region servers, those issues around operation, but also this notion of a tailored suit. Has it changed? Is it more flexible? What's your take?

So let's start with HDFS. If an HBase instance cannot talk to HDFS, it gets really unhappy. It's likely to lose data. Availability is out of the question. Your applications will freeze. So by solving the availability of HDFS, you solve the first problem. But the next layer up is the actual services that HBase uses. One is the HBase master, then there are the region servers. You have a multitude of region servers and each one serves a portion of your database. That's a single point of failure for that portion. You need to resolve that. You need to be able to have multiple servers playing an active role in serving that particular region. To answer your question, is it still a tailored suit?
It's gotten better. It used to be that HBase would really not do well if you pushed it beyond its comfort zone, but it's getting better. It's more resilient, and solutions such as ours make it all the more available. It's getting to the point where enterprise-grade applications that are transaction-based are possible in this environment.

Can you give some examples of that? Because that's a big thing that people who aren't close to the action talk about: oh, HBase is not ready for prime time, or it isn't enterprise ready. What are some of the proof points that you can share around some of the things that HBase is doing that are getting a lot of traction?

We can't talk about specific customers, but we have a large public utility in the Europe zone that's using us for customer data. This is the sort of data that fits well in traditional RDBMS applications, SAP-type applications, and HBase is gaining favor over those traditional systems. That's a sign of the availability and improved capability in HBase. Now, it used to be that a MapReduce job would take minutes to start up, and it's not really suitable for real time. This changes the game altogether.

Jim, talk about the replication piece you mentioned earlier around disaster recovery. I mean, every enterprise has this paradigm, right? You saw EMC buy Data Domain back in the day, and everyone has all these practices. Are there shifts in the practices of replication and disaster recovery where customers need an aspirin today? Or is it more a shift toward the needs of new applications? How do you see this?

Well, I think one of the challenges that a lot of firms have had with the traditional backup and recovery solutions is that at best, the recovery data center, if you will, was limited to a metropolitan area network. And if a Hurricane Sandy comes through, that's not gonna be good enough.
In fact, a lot of firms in financial services and other industries really have regulatory requirements that their data has to be available and accessible no matter what's happening. The other thing about those traditional solutions is you have a primary data center where everybody's doing their work, and then you have this backup data center that, even if you're using a solution that claims to be real-time backup, is not fully accessible. It's standby. You've got this extra hardware that's not doing anything for you until some disaster strikes, and, assuming that secondary data center didn't get blown out, then it becomes active. But if you have some major disaster, even something within a 50-mile radius over a metropolitan area network isn't going to help you.

We had David Richards on earlier, and he talked about some of the tectonic shifts happening in IT. In particular, you have DevOps and cloud playing a big part in this new modern era. How do the customers that you're talking to handle this kind of shift? Disaster recovery is always very important, but it's not always on the front burner when you look at architecture, right? It's like, oh yeah, I'm going to load all these databases, and you have pipelining of data and new analytics. Is it front and center from your perspective? And when they find out, is it too late? "Oh, I should have factored that in." We hear a lot about that with backup, recovery, disaster recovery, these areas where it's always top of mind but not always at the top of the architectural discussion. How do you guys talk to customers?

Well, we explain the benefits that we add: that you're not limited to a metropolitan area network like some of the other popular solutions. And again, that's only one facet of what we offer.
We enable companies that have data centers all over the world to actively use that hardware in every data center, and it's a byproduct of the patented active-active replication that we have, which enables, in effect, continuous synchronization across all those data centers, and it enables the users at each location to actively access that data, not only to read from it but to write to it and change it and have those changes reflected everywhere. So we are now talking to some companies that are looking at doing things like global clickstream analysis, but not just gathering logs: doing more intelligent things, time series analysis. They operate in a number of countries, they introduce new products and services on their websites, and they want to see where they're getting the most hits and what other interaction they're having with the people hitting their websites, so that they can intelligently look at the impact of what they've just launched.

So I want to ask a question.

And localize that as well.

I want to ask a question for both of you guys, from different perspectives. You can both answer the same question, one on the business outcome and one on the technology. With every startup I talk to, entrepreneur or big company, something happens, you mentioned Hurricane Sandy, and they go, oh, if they had my solution, they would have been saved, you know? And because you know your product, you know how it renders itself and the value proposition, share with the folks some color. You don't have to name names. Point to some examples of folks where, if they had your solution, there's an outcome or a use case that you could really knock out of the park, on the business side and then on the technology side.

I think one of the most common use cases is that ability to continue operating when something strikes regionally, a regional disaster.
You can immediately start accessing another data center, and it's not just a backup data center, it was fully operational all along. It has all your data, including your login information, even though you were in New York and the other data center was in Europe or Asia somewhere. You can take just about any scenario and you'd be 100% available.

And if they're not 100% available, what are some of the consequences?

They may go several days without access to the data they need. I mean, obviously, that can cripple a business. You can suffer tens of millions of dollars in losses. There have been some, I won't name the vendor necessarily, but some cloud-based options for big data that some large, well-known companies depended upon. When they had an outage, what effectively happened is that a business, Netflix, over Christmas about a year or so ago, effectively was not able to process any orders for any of their movies and deliver them to their customers. Huge customer satisfaction problem right there.

So Jagane, how about you, from the technical perspective? What are some of the use cases you've seen where you lick your chops, going, man, if they had our solution?

So the one anecdote that comes to mind is a CIO who was unhappy because 37 out of his 90 servers were going to be sitting in read-only standby mode. And it didn't seem like such a big problem to a large part of his organization, but it hit his desk and he looked at it and went, no, that's 37 servers that are gonna be sitting idle. And you're gonna expand this cluster to 10 times what it is, so you're gonna have 300 to 400 servers sitting idle. So that was our foot in the door at this particular customer.

So that's 37 idle servers, and they looked at the multiple on the scale side.

Exactly, and of all of our value props, that's the one that hit his radar. We walked in and we explained that we could rebalance their load into both data centers, or multiple ones when he needed it.
And that was instant, there was a flash of light that went off, and that's the sort of experience we're seeing. This is not technical at all, but the technology we bring to the table in order to enable this is very hard to do. It is the sort of thing where projects like ZooKeeper start off by saying, Paxos is very hard to do, so we decided to do something that's less, and it's probably good enough. Well, if it's critical data, "probably good enough" is not good enough, really. You need guarantees, you need mathematical certainty that your data is replicated and available. That's what Paxos brings to the table. Our own patented enhancements to Paxos add additional value to this. We're able to offer guarantees that most other vendors cannot, and that's probably why we're successful in sales.

So I've got to ask you, most customers have that same itch they're always scratching. It's usually a cultural thing or the business that they're in, whether it's retail or finance or some vertical; they always have that unique itch that they're scratching, technically or from a business outcome. What would you say you guys align with most, from a success standpoint, in that customer use case, that itch, if you will?

So the thing that immediately comes to mind is that now that people are looking at a wholesale replacement, a key change in the way they do data processing, they're actually open to fixing a lot of things that have been annoying them for a while. Take the 50-mile radius limitation on data center recovery, which exists because SRDF was built as a block replication technology that's unaware of the file system sitting on top of it. This was an annoyance for customers for a long time. When we walk in and say, now that you're building a big data system that's completely different, we can resolve all of these additional problems in addition to the prices that you get with big data, they're quick to jump on that. It's really resonating with them.
Jim, what's your take on getting the word out on the marketing side? WANdisco at Big Data NYC, you guys had Non-Stop Hadoop; this is the HBase announcement. What's the trend line? What are you vectoring on from a solutions standpoint that you're packaging, and how do you explain that to your customers?

Well, I think basically, especially with the HBase announcement, and as I've been saying and Jagane's been saying, you need the Non-Stop NameNode capability, which we started with, and then today with the HBase announcement, the multiple active master servers and multiple region servers and so forth. What that means is you can now deploy this in a real-time interactive environment where in some cases, high-frequency trading, those kinds of applications for example, even a moment of downtime can cost tens of millions of dollars. So there's a very big value proposition for the right set of applications.

Let's talk about mission critical as we get to wrap up on time. I want to get your take on a final question on mission-critical applications for developers out there. What do they need to understand in the current market? It's more of a macro question. As you guys go out there and work with some of your customers and potential customers, what's the environment for them? What do they need to pay attention to that's important for developers today?

Well, one of the things to understand about our technology in general is that if they understand Hadoop, we're transparent. Our implementation doesn't alter Hadoop's functionality. They can use all of the components of the Hadoop ecosystem that they're used to working with. So really, I think what they need to understand more than anything is you've got a new set of capabilities. Hadoop has moved from kind of a batch-oriented mode, where you run a MapReduce job, pull the data out, display it.
It's probably several hours old when you're looking at it on Yahoo or whatever the site is, to more of a real-time interactive kind of mode, trying to use it like you'd traditionally use an RDBMS like Oracle or DB2. And you can now, with our technology, do those kinds of applications on Hadoop and do them with all the kinds of rich data that Hadoop supports, not just text and rows and columns, but semi-structured and unstructured data of any stripe. So I think what they need to understand is the skills they have are going to serve them well, but now you can start building applications as if there is no downtime, and you can take those extra risks of rolling out real-time interactive mission-critical applications on top of Hadoop that simply would not have been possible before, and do it on a worldwide scale, not just for a single location.

So, Jagane, a final question for you: the landscape of the Hadoop ecosystem. We've seen the evolution. We're now in, I don't know what year, the fifth, maybe the sixth year? Seems like a decade, but the maturing of the market is happening. Now, what are you seeing technically and in the community as key trends that folks should be aware of, folks who might not be paying attention to the day-to-day? Open source, HBase, communities, there's a lot of stuff going on in the community. What's your take on where we're at?

I don't want to be too controversial, but I do see consolidation among distributions. I think there are going to be a few key distributions, much like Linux consolidated around a few. That's a good thing for both the community and the users. The application environments are also going to consolidate, to a smaller extent, because I think with applications there is more appeal for different ways of getting access to the data. Most importantly, the notion that you cannot run SQL on top of big data systems is basically going to disappear.
And the final point I'd like to make is that it's really important for applications to run across multiple data centers these days. Anybody can go to Amazon and purchase VMs in two data centers. That's really easy to set up. But the software to run your application across these is not, and that's important for even small and medium businesses to consider. People need that kind of data.

So you think that's table stakes for the future, data motion across data centers, hence the regional focus?

Exactly.

Okay guys, well, thanks so much, Jim, and we appreciate WANdisco's support. You guys have been amazing partners with theCUBE. We really appreciate it, and thanks for the insight. We are here live at Big Data SV, getting all the thought leadership, all the data, streaming it live here, extracting the signal from the noise. This is theCUBE. I'm John Furrier. We'll be right back with our next guest here, live in Silicon Valley for Big Data SV. We'll be right back.