Live from the Moscone Center, it's theCUBE, covering AWS Summit San Francisco 2018. Brought to you by Amazon Web Services.

Welcome back, I'm Stu Miniman, and this is theCUBE's exclusive coverage of AWS Summit here in San Francisco. Happy to welcome back to the program Jagane Sundar, who's the CTO of WANdisco. Jagane, great to see you. How have you been?

I've been great, Stu, thanks for having me.

All right, so every show we go to now, data really is at the center of it. I'm an infrastructure guy, and data is so much of the discussion here in the cloud; the keynotes were talking about it, and with IoT, of course, data is so much involved. We've watched WANdisco from the days when we were talking about big data. Now there's AI, there's ML, and data's involved in all of it. But tell us, what is WANdisco's position in the marketplace today, and what's the updated role of data?

Stu, we have this notion, this brand-new industry segment, called live data. Now, this is more than just static big data. In fact, this is cloud-scale data located in multiple regions around the world and changing all the time. So you have East Coast data centers with data, West Coast data centers with data, European data centers with data, and all of it is changing at the same time. Yet your need for analytics and business intelligence based on that data is across the board. You want your analytics to be consistent with the data from all of these locations. That, in a sense, is the live data problem.

Okay, I think I understand it, but we're not talking about, like in the storage world, hot data versus cold data. We've talked about real-time data, streaming data, and everything like that, but how do you compare and contrast? You said global in scope, you talked about multi-region, really talking distributed from an architectural standpoint.
What's enabling that to be the discussion today? Is it the likes of Amazon and their global reach? And where does WANdisco fit into the picture?

So, Amazon's clearly a factor in this. The fact that you can start up a virtual machine in any part of the world in a matter of minutes, and have data accessible to that VM in an instant, changes the business of globally accessible data. You're not simply talking about a primary data center and a disaster recovery data center anymore. You have multiple data centers, the data's changing in all of those places, and you want analytics on all of the data, not part of the data, not just the primary data center's data. How do you accomplish that? That's the challenge.

Yeah, so drill into it a little bit for us. Is this a replication technology? Is this just a service that I can spin up? When you say live, can I turn it off? How does it fit with all the cloud dynamics and levers?

So, it is indeed based on active-active replication, using a mathematically strong consensus algorithm called Paxos. In a minute, I'll contrast that with other replication technologies. But the essence of this is that we offer that replication technology as a service. So if you go to Amazon Web Services and you're purchasing some analytics engine, be it Hive or Redshift or any analytics engine, and you want it to be accessible from multiple data centers, to be available in the face of a data center or entire-region failure, with the data still accessible, then you go with our live data platform.

Okay, so we want you to compare and contrast. I hear active-active, and the speed of light is always a challenge. Globally, having consistency is challenging. There are things like Google Spanner out there that look at those problems.
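The active-active idea he describes rests on one property: if every region agrees on a single global order of writes, then applying that ordered log independently at each region produces identical state, so analytics run anywhere see the same data. A minimal sketch of that idea follows; a real system would use Paxos for the agreement step, while this toy stands in a single in-process coordinator for the consensus, and all class and variable names are illustrative, not WANdisco's API.

```python
# Toy illustration of active-active replication via an agreed total order.
# In production, Paxos consensus decides the order; here a single
# in-process "coordinator" stands in for that consensus step.

class Coordinator:
    """Assigns each proposed write a slot in a globally agreed log."""
    def __init__(self):
        self.log = []  # the agreed, totally ordered log of writes

    def propose(self, op):
        self.log.append(op)          # real systems reach consensus here
        return len(self.log) - 1     # agreed slot number

class Region:
    """A replica that applies the agreed log in order."""
    def __init__(self, name):
        self.name = name
        self.state = {}
        self.applied = 0             # how far into the log we have applied

    def catch_up(self, coordinator):
        # Apply any log entries this replica has not yet seen, in order.
        for key, value in coordinator.log[self.applied:]:
            self.state[key] = value
            self.applied += 1

coord = Coordinator()
east, west = Region("us-east"), Region("us-west")

# Writes originate concurrently in different regions ("active-active"),
# but all flow through the same agreed ordering.
coord.propose(("car-123/speed", 61))   # arrives via the East Coast
coord.propose(("car-987/speed", 48))   # arrives via the West Coast
coord.propose(("car-123/speed", 64))   # a later East Coast update

east.catch_up(coord)
west.catch_up(coord)

# Both regions converge to identical state, so analytics run in either
# region produce the same answer.
assert east.state == west.state
print(east.state)
```

The key design point is that writes are accepted anywhere but applied everywhere in the same agreed order, which is what distinguishes this from the active-passive schemes contrasted below.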
How does this fit compared to the way we've thought of things like replication and globally distributed systems in the past?

Interesting question. So, ours is great for analytics applications, whereas something like Google Spanner is more of a MySQL database replacement that runs in multiple data centers. We don't cater to those database-transaction types of applications. We cater to analytics applications: batch, very fast streaming applications, enterprise-data-warehouse-type analytics applications, all of those. Now, if you look inside and see what kind of replication technology we use, you'll find we're better than the two existing types. There are two different types of existing replication technology. One is log shipping, the traditional Oracle GoldenGate style: ship the log once the change is made to the primary. The second is to take snapshots and copy the differences between snapshots. Both have their deficiencies. Snapshotting, of course, is time-based, and it happens once in a while; you'll be lucky to get a one-day RPO with those sorts of things. There's also an interesting anecdote that comes to mind, because the Hadoop folks implemented a version of snapshot and snapshot-diff in HDFS. The unfortunate truth is that it was engineered such that if you have a lot of changes happening, the snapshot and snapshot-diff code might consume too much memory and bring down your NameNode. That's undesirable: now your backup facility has just brought down your main data capability. So, snapshotting has its deficiencies, and log shipping is always active-passive. Contrast that with our live data technology, where you can have multiple data centers filled with data and write your data to any of them. It makes for a much more capable system.

Okay, can you explain how this fits in with AWS? Can it live in multiple clouds? What about on-premises?
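The snapshot-diff deficiency he describes can be sketched in a few lines: the backup only ever sees state as of the last snapshot, so every write made after it sits in a loss window until the next snapshot ships. This is a simplified illustration, not HDFS's actual snapshot implementation; all names are made up, and deletions are omitted for brevity.

```python
import copy

# Sketch of snapshot/diff replication: changes are shipped only at
# snapshot boundaries, so the recovery point (RPO) is however long ago
# the last snapshot was taken.

primary = {}        # the live primary's key-value state
snapshots = []      # point-in-time copies, taken periodically

def write(key, value):
    primary[key] = value

def take_snapshot():
    snapshots.append(copy.deepcopy(primary))

def snap_diff(old, new):
    """Keys added or changed between two snapshots (deletions omitted)."""
    return {k: v for k, v in new.items() if old.get(k) != v}

write("a", 1)
take_snapshot()            # the backup now reflects {"a": 1}
write("b", 2)
write("a", 3)              # these changes exist only on the primary...
take_snapshot()            # ...until the next snapshot is taken
diff = snap_diff(snapshots[0], snapshots[1])
print(diff)                # {'a': 3, 'b': 2} is shipped only at snapshot time

write("c", 4)              # never snapshotted: lost if the primary fails now
```

Log shipping narrows that window by forwarding each change as it commits, but as noted above it still flows one way, from a single active primary to passive replicas.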
The whole multi- and hybrid cloud discussion.

Interesting. So, the answer is yes. It can live in multiple regions within the same cloud, or in multiple regions across different clouds. It can also bridge data that exists in your on-prem Hadoop or other big data systems to object store systems in the cloud, whether S3 or Azure or any of the blob stores available in the cloud. And when I say this, I mean in a live data fashion: you can write to your on-prem storage and write to your cloud buckets at the same time, and we'll keep it all consistent and replicated.

Yeah, what are you hearing from customers when it comes to where their data lives? Last time I interviewed David Richards, your CEO, he said data lakes used to really be on-premises, and now there's a massive shift moving to the public clouds. Is that continuing? What's the breakdown? What are you hearing from customers?

So, I cannot name a single customer of ours who is not thinking about the cloud. Every one of them has a presence on-premises, but on-prem does not appear to be on a growth path for them. They're looking at growing in the cloud, they're looking at bursting into the cloud, and they're almost all looking at multi-cloud as well. That's been our experience.

At the beginning of the conversation, we talked about data. How are customers exploiting and leveraging data, and making sure it doesn't become a liability for them?

All right. So, there are so many interesting use cases I'd love to talk about, but the one that jumps out at me is a major auto manufacturer. Telematics data is coming in from a huge number, hundreds of thousands, of cars on the road. They chose to use our technology because they can feed their West Coast car telematics into their West Coast data center while simultaneously writing East Coast car data into the East Coast data center. We do the replication.
We build a live data platform for them. They run their standard analytics applications, be they Hadoop-based or something else, and they get consistent answers. Whether you run the analytics application on the East Coast or the West Coast, you will get the exact same answer. That is very valuable, because if you are doing things like fault detection, you really don't want spurious detections just because the data on the West Coast was not quite consistent and your analytics application was led astray.

That's a great example. We also have another example with a top-three bank that has a regulatory requirement to operate out of their so-called backup data center periodically, once every three months or so. Now, with live data, there is no notion of an active data center and a backup data center; all data centers are active. So this particular regulatory requirement is extremely simple for them to satisfy: they just run their queries on one of the other data centers and prove to the regulators that their data is indeed live. I could go on and on about a number of these. We also have a top-two retailer with such a volume of data that they cannot manage it on one Hadoop cluster. They use our technology to create a live data lake.

Yeah, one of the challenges, always: customers love the idea of global, but governance, compliance, things like GDPR pop up. Does that play into your world, or is that a bit outside of what you're discussing?

It actually turns out to be an important consideration for us, because if you think about it, when we replicate, the data flows through us. So we can be very careful about not replicating data that is not supposed to be replicated, and we can be very careful about making sure that data is available in multiple regions within the same country if that is the requirement.
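Because the data flows through the replicator, policy can be enforced in-line: records can be held back entirely, or fanned out only to regions in a permitted country. A hypothetical sketch of that idea follows; the policy fields, region names, and function signature here are all invented for illustration and are not WANdisco's actual API.

```python
# Hypothetical sketch of policy-aware replication. The replicator sits in
# the data path, so for each record it can decide which regions (if any)
# are permitted targets before forwarding anything.

# Illustrative region -> country mapping (not a real deployment).
REGIONS = {
    "eu-west": "DE",
    "eu-central": "DE",
    "us-east": "US",
}

def targets(record, origin):
    """Return the regions a record may be replicated to from its origin."""
    if record.get("no_replicate"):
        # Data that must not be replicated never leaves its origin region.
        return []
    if record.get("residency"):
        # Data-residency rule (e.g. under GDPR): replicate only within the
        # required country, for availability without crossing borders.
        return [r for r, country in REGIONS.items()
                if country == record["residency"] and r != origin]
    # Unrestricted data replicates to every other region.
    return [r for r in REGIONS if r != origin]

print(targets({"id": 1, "residency": "DE"}, origin="eu-west"))     # ['eu-central']
print(targets({"id": 2, "no_replicate": True}, origin="us-east"))  # []
```

The design point is that filtering happens once, centrally, in the replication path, rather than being re-implemented in every downstream analytics application.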
So GDPR does play a big role in why many of our customers, particularly in the financial industry, end up purchasing our software.

Okay, so with this new term, live data, are there any other partners or vendors involved? As always, you want a bit of an ecosystem to help build out a wave.

Correct. So our most important partners are the cloud vendors, and they're multi-region by nature; there's no notion of a single-data-center or single-region cloud. So Microsoft with Azure, Amazon with AWS, these are all important partners of ours, and they're promoting our live data platform as part of their strategy of building huge hybrid data lakes.

All right, Jagane, give us a little view looking forward. What should we expect to see with live data and WANdisco through the rest of 2018?

So looking forward, we expect to see our footprint grow across a variety of applications, all the way from batch Pig scripts that used to run once a day, to Hive that's maybe once every 15 minutes, to data warehouses that are almost instant and queryable by human beings, to streaming data that pours things into Kafka. We see the whole footprint of analytics databases growing. We also see cross-capability, meaning perhaps replication from Amazon Redshift to Azure SQL Data Warehouse. Those things are very interesting to us and to our customers, because each cloud has strengths in certain areas, and customers want to exploit all of them. So we see ourselves as being the glue for all world-scale analytics applications.

All right, well, Jagane, appreciate you sharing with us everything that's happening at WANdisco and this new idea of live data. We look forward to catching up with you and the team in the future to hear more about the customers and everything going on there. We'll be back with lots more coverage here from AWS Summit San Francisco. I'm Stu Miniman, and you're watching theCUBE.