 Live from the San Jose Convention Center, extracting the signal from the noise, it's theCUBE, covering Hadoop Summit 2015, brought to you by headline sponsor Hortonworks, and by EMC, Pivotal, IBM, Pentaho, Teradata, Syncsort, and by Attunity. Now your hosts, John Furrier and George Gilbert. Okay, welcome back everyone. We are live in Silicon Valley at Hadoop Summit 2015. This is theCUBE, SiliconANGLE's flagship program, where we go out to the events and extract the signal from the noise. I'm John Furrier, with my co-host George Gilbert, big data analyst at wikibon.com. Check out his research there. Go to siliconangle.tv, join the conversation, and check out the new cutting-edge app at crowdchat.net, where you can jump into a chat. Our next guest is Jagane Sundar, CTO of WANdisco and a CUBE alumni. Welcome back. You're the CTO of WANdisco, and you guys have made some significant bets over the past few years in this big data Hadoop space. Successful bets, and they played out. The horse came in, you won the triple crown. How do you feel? We feel wonderful, absolutely wonderful. Thank you, John. Thank you, George, for having me back. Specifically, the bet you're talking about is the recognition that the big data space is actually bigger than Hadoop. We had all sorts of market feedback telling us it's not just HDFS, it's other storage systems; it's not just analytics using MapReduce, it's other analytic systems. So what we ended up doing was building a replication layer above not just HDFS, but GPFS, Isilon, and other storage systems. The result was that for all those customers out there with diverse environments, we gave them the ability to have strongly consistent replicated versions of their data. And the wonderful thing is, the replication can happen across the wide area network.
So the next thing you know, you have people coming back to us saying, this is great, can I use it for migrating from one distribution to another? Can I use it for migrating from one version to another? What they're really saying is, I want to use this for my business, because what's happening with Hadoop and big data is it's going mainstream. It's already crossed over, and you guys have been so successful with your vision about the data center, because you essentially went to the future, pre-built the software, and now the world is rolling right into it. And you're seeing the examples in the mainstream. If people have an Apple Watch, if they use iTunes or Apple Pay, if they're listening on Spotify, if they're doing anything on Facebook, they're touching Hadoop. So this is not a one-off thing now. It has completely crossed over. It has absolutely, completely crossed over, and it's not just single Hadoop installations anymore. Everyone has multiple distributions in multiple data centers, and traditional storage vendors are also playing a very strong role; they have certain use cases that work best with them. So a solution like ours, which helps you keep your data consistent across these disparate storage environments, plays really well into that world. I love interviewing you guys and everyone on theCUBE here, because it's technologists like yourself who build the engine under the hood that powers all the things I just talked about, the goodness on the consumer side, now coming to the enterprise. So I've got to ask you, what's the new innovation under the hood in this Hadoop real estate market, if you will? It's like buying beachfront property. True, the traditional vendors are going after scale and other limits within the data center. There are others, such as ourselves, who believe this world is much larger than single data centers.
We go after cross-data-center replication, cross-data-center storage and analytics, that sort of thing. There are other vendors going after pure speed, where the storage types amenable to speed are more likely to be used in those environments. This is becoming a true enterprise play, with all sorts of storage and all sorts of compute playing in this one so-called big data environment. So I think about the flash points. It was always hard to describe what you guys do, because you're the smart guys doing all that stuff under the hood. But Hurricane Sandy, when it hit New York, that was when we interviewed you about that event. Hadoop was very fragile at that time. Right. And what you guys proved was, there was a storm after the storm, meaning in the data centers that got washed out in New York, in lower Manhattan, the people who didn't have any kind of... So that was a pivotal moment in your history. It was a pivotal moment. Explain that Hurricane Sandy situation. This was a classic situation where folks had solutions built for the metropolitan-area network, solutions that could not tolerate more than five or ten milliseconds of latency. A strong mathematical algorithm like Paxos, extended over the wide area network, can solve those issues. So those folks who had data centers in Manhattan with a backup data center in New Jersey were completely wiped out. That's when we stepped in and offered a solution that works across any distance you want. It was a big hit for the Hadoop use case at that point. But as time went on, we found customers coming back to us saying, what about all this NFS data I have? Is there anything you can do about it? And that got us thinking: this replication, getting a global data platform out, is about more than just Hadoop and HDFS. So we started looking at extending our platform beyond Hadoop.
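The core idea behind replication that survives the loss of an entire site, like the Manhattan data centers in the Sandy example, is majority-quorum agreement. The following is a deliberately simplified sketch of that principle, not WANdisco's actual implementation; the site names, classes, and topology are illustrative assumptions:

```python
# Hypothetical sketch: why majority-quorum replication (the principle behind
# Paxos-style agreement) survives the loss of a whole data center.
# Site names and the Replica class are illustrative, not a vendor API.

class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []        # durable log of accepted values at this site
        self.alive = True

    def accept(self, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} unreachable")
        self.log.append(value)
        return True

def replicate(replicas, value):
    """Commit a value once a strict majority of replicas acknowledge it."""
    acks = 0
    for r in replicas:
        try:
            if r.accept(value):
                acks += 1
        except ConnectionError:
            continue  # a washed-out or partitioned site does not block the write
    majority = len(replicas) // 2 + 1
    return acks >= majority

# Three sites: even with Manhattan wiped out, writes still commit.
sites = [Replica("manhattan"), Replica("new-jersey"), Replica("chicago")]
sites[0].alive = False
print(replicate(sites, "edit-42"))  # True: 2 of 3 sites acknowledged
```

With only two sites, a primary-plus-backup pair in the same metro area, losing one site loses the majority, which is why the Manhattan/New Jersey pairs described above were wiped out while a geographically wider quorum keeps working.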
This was before most vendors had even recognized that Hadoop was not the only player in the big data space, so it gave us a head start on building our cross-data-center, cross-platform replication solution. So now, what's the new Hurricane Sandy event? You've got Fusion. You guys made a good bet showing you could have non-stop Hadoop, or what I call an industrial-grade data center. The messaging now is "enterprise ready"; you guys had that two years ago. So what's happening now for you? What's the new innovation? The new innovations we're coming up with are mostly in the compute space: you can submit a job, we'll find out where the data is available and where there's compute capacity, run it, and provide you with results without your having to do anything special. Then we're going to bring NFS, CIFS, and other storage protocols into the replication system. So if you have a standard NFS server that you're using for, say, web log collection, replication will be automated with our system, and you can run analytics on top of that. The final frontier, which is something I'm starting to talk about a little more within the company, is actually running database, RDBMS-type technology across different data centers. This is rarefied technology; not many people have seen success with it. But we believe we can offer Fusion for SQL, which basically means you can run your SQL queries with multiple data centers at play. You can have ingest happening in each of those data centers, and you'll always get consistent answers. You can go back to your database from three days ago, run a query, and get the exact same answer. You can go back to your database from two weeks ago and get exactly the same answer. This is globally distributed RDBMS capability, which we think is the final frontier in how Fusion is going to evolve. So what's the future outlook for the engineers? What are you guys working on?
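The "run a query against your database from three days ago and always get the same answer" property described here is snapshot consistency over versioned data. A minimal sketch of that guarantee, assuming a toy append-only versioned table rather than anything resembling Fusion's actual internals:

```python
# Hypothetical sketch of repeatable snapshot queries: every write produces a
# new immutable version, and a read pinned to a version always returns the
# same answer, no matter how much ingest happens afterward.

class VersionedTable:
    def __init__(self):
        self.versions = [{}]          # version 0 is the empty table

    def write(self, key, value):
        snap = dict(self.versions[-1])  # copy-on-write of the latest snapshot
        snap[key] = value
        self.versions.append(snap)
        return len(self.versions) - 1   # version id of this ingest

    def query(self, key, version=None):
        """Read from a pinned snapshot; defaults to the latest version."""
        v = version if version is not None else len(self.versions) - 1
        return self.versions[v].get(key)

t = VersionedTable()
v1 = t.write("orders", 100)
v2 = t.write("orders", 250)   # later ingest does not disturb v1
assert t.query("orders", version=v1) == 100
assert t.query("orders", version=v2) == 250
```

The distributed-RDBMS problem Jagane describes is much harder, since the versions themselves must be agreed upon across data centers, but the user-visible contract is the same: a pinned version is immutable, so re-running a query is deterministic.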
So you're in R&D, and you guys have successfully predicted two major pivot points in the industry. What's next? In our company, the emphasis is always on network capability. We look at it and tell people there's never a customer who has just one data center. So we build total awareness of latencies, total awareness of distributed coordination. Our engineers go through almost an epiphany when they realize how distributed coordination plays a central role in keeping data consistent across any distance. That's where all of our engineering education is, if you will. Going forward, of course, we'll develop expertise in other protocols such as NFS and the other things we're talking about. From a customer perspective, essentially what we're offering is the ability to have a single data lake that spans more than just Hadoop. You can have your NFS in that single data lake, if you will. Oh, so it spans locations. Correct: locations, protocols, and the applications that run on top of it. You could call it a data fabric; in fact, that's a term you yourself used a few minutes ago, which I think is very appropriate. It's the sort of fabric where you can put data in from any application and run analytics of any kind on top of it. So what do you think about this year's event? I've got to ask you, because every year you've got a lot of people here. Is it about 40 people? Indeed, we've got about 40 people. And, you know, early on, the Stratas were, I'm going to be very honest here, the Stratas were better than the Summits. But this year, I think it's the first year that the Summit is actually better than Strata. I have one more data point for you: this is the first year that the price of entry to the Summit is more expensive than the price of entry to LinuxCon. What made the Summit better? The price of entry is what? The price of entry to the Summit is more than the price of entry to LinuxCon, the big Linux conference.
Good, yes, all right. It sort of tells you something. What made the Summit better than Strata? You know, I think customers view the Summit as more open, more amenable to open source developers, a little more friendly to upstarts, people who have a new and innovative idea and want to pitch an innovative product based on it. I think Strata is a little more closed on that front. That's my perspective. That's a tweetable moment; just tweeted it. But no, it's interesting. It's who leapfrogs whom, because Strata was first. Now, even if Strata was better last year, Strata is kind of a different conference. It's a very closed event; it's exclusive, even exclusionary; they can exclude people. This one has much more of a free feel. It's open source. It's not so much about the money, although people want to make money in this business, and it's not all talk about that kind of thing; there's a real developer community here. George and I were just talking about this over lunch: to me, this conference is a true representation of the diverse workforce of corporations. You have DevOps, you have IT guys, you have software developers who work for companies, software developers who work in open source, open source folks in general, data scientists, and executives trying to figure it all out. To me, this is the enterprise. That's DevOps, a complete flattening of the silos. Right, it is. I mean, the modern enterprise data strategy includes all of the folks you just mentioned, and if you have a good idea in any of these areas, you can come out here to the Summit and stand a chance of making an impact. So I've got to ask you the hard question. Well, not a hard question, an easy one: Spark. There is a world right now where Spark is dominating. Today it's on Hadoop, but there could be Spark not on Hadoop. There could be a non-Hadoop world.
Hadoop's going to grow; we know that's a fact. How big is unknown. However, we also know this is an environment where a lot of the goodness in the stack will sit on top of something other than Hadoop. Sure. So that's just life, right? It's called multi-vendor. What's your vision on Spark and these other non-Hadoop opportunities? As you point out, the non-Hadoop aspects of big data are real. There are use cases for which traditional storage actually works very well, so it's going to have a big presence. Spark, for its part, does a very good job on specific applications. But remember, they're just specific applications; it's not the be-all and end-all compute platform. It may compete well in certain applications, but not in all. And as the developer community matures, they're coming up with better ways to compute things. So my guess is that Spark is a great compute platform for today, but there will be more and better innovations as well. But there's also a core difference when you get right down to the heart of the two ecosystems. Hadoop's approach is, we're going to build an engine that does one workload really well, whether that's SQL, machine learning, or streaming; whereas Spark's approach is, let's build a common core, so we can take the output of one and feed it into another. But it's not as broad today as Hadoop, partly because it's immature, no? It is immature at this point, and there is more thought going into it. But if you really look at it, it's not the sort of innovation that's two generations ahead. For example, it still all runs within a single data center. There's a better notion of feeding one job into another, so you can have map-map-reduce or map-reduce-reduce type pipelines, better than a traditional MapReduce environment. But it's still all some form of processing that happens in sequence within a single data center.
So while it is innovative, I believe the next generation of innovation will come from a more global perspective: more things running in multiple data centers, with ingest happening in all of them. There are disparate solutions today: Kafka for multi-DC ingest, which is somewhat functional; single-DC analytics like Spark and, to an extent, Impala; and the traditional folks from SAS and others have solutions too. But as developers improve their understanding of how these systems work, you will get more generational improvements. So would you see, if you're managing the data fabric across storage tiers and across application workloads, with real-time analytics maybe over here and ingest over there, would you see some workload manager or framework sitting on top of your fabric? Is that where your mindset is moving? We think there's value in that, particularly since we have the ability to know where certain data exists and where replication has completed. So a more intelligent YARN that can tell you there's compute capacity available in your New York data center, and replication has already completed there, so you should run the job there. That sort of intelligence is very welcome in this environment. Oh, because you've got intelligence about the data that Spark or Hadoop can't yet take advantage of. Correct. They don't have the ability to know that yet, because their scope is all within a single data center, and the minute you move off a single data center you have consistency problems: a whole stack of software that was never meant to operate with long latencies, and all the problems associated with that. So Google has started to make some innovations in multi-data-center databases, like Spanner. Correct. Would your infrastructure make it possible for another ISV, one without quite the chops of Google, to build that sort of database? Indeed. That's exactly right.
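The "more intelligent YARN" idea, routing a job to a site that already holds a consistent replica of its input and has spare capacity, can be sketched as a simple placement rule. This is an illustrative sketch of the concept, not WANdisco's or YARN's actual scheduler; the data-center records and field names are assumptions:

```python
# Hypothetical sketch of replication-aware job placement: pick a data center
# where the input dataset is fully replicated AND compute capacity is free.
# Data-center names and record fields are illustrative assumptions.

def pick_data_center(job_input, centers):
    """Return the best site that already holds a consistent copy of the data."""
    candidates = [
        c for c in centers
        if job_input in c["replicated_datasets"] and c["free_slots"] > 0
    ]
    if not candidates:
        return None  # no site qualifies: wait for replication, or run remotely
    # Among qualifying sites, prefer the one with the most spare capacity.
    return max(candidates, key=lambda c: c["free_slots"])["name"]

centers = [
    {"name": "new-york", "replicated_datasets": {"weblogs"}, "free_slots": 40},
    {"name": "london",   "replicated_datasets": {"weblogs"}, "free_slots": 5},
    {"name": "tokyo",    "replicated_datasets": set(),       "free_slots": 90},
]
print(pick_data_center("weblogs", centers))  # new-york: replicated, most capacity
```

Note that tokyo has the most free capacity but is never chosen for "weblogs", which is exactly the point Jagane makes: the scheduling decision needs replication state, which a single-data-center scheduler cannot see.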
That's a great example. Now, underlying Google's multi-DC effort is a Paxos implementation that depends on very strong time guarantees provided by GPS clocks and atomic clocks. Our software, on which we hold patents, can run a Paxos algorithm across wide area networks without any of those enhancements. The advantage to enterprise customers is that they get capabilities like what Google offers, and more capabilities built on top of Hadoop, without the need for extra, specialized hardware. That's where we're headed. Okay. We are here live inside theCUBE. Final question, and thumbs up on the show: what's the key thing happening here at the show for the folks watching, in a bumper sticker? What's the vibe? Thumbs up on the show, and the key differentiator, the key moment this represents, is the true integration of traditional enterprise software and hardware vendors with the Hadoop ecosystem. I think 2015 is the year that becomes real. That's a great nutshell. Okay. Jagane Sundar, CTO of WANdisco, some of the smartest minds in the business. Congratulations on the bets you made. It was a gamble; you stopped the distro and threw the arrow forward. You guys really did a good job. Congratulations. Thank you, John. Thank you, George. Okay, we'll be right back with more after this short break. This is theCUBE, live in Silicon Valley for Hadoop Summit 2015. I'm John Furrier with George Gilbert. We'll be right back.