Live from New York, it's theCUBE, covering Big Data New York City 2016. Brought to you by headline sponsors Cisco, IBM, Nvidia, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and George Gilbert.

We're back in New York City. This is theCUBE, the worldwide leader in live tech coverage, and we're here with a special presentation as part of Strata + Hadoop World. Tendu Yogurtcu is here. She's the general manager of Big Data at Syncsort. Tendu, it's great to see you again.

Same here. Hi, Dave. Hi, George.

So it's been a while since we sat down and talked. I know you've been chatting with some of the folks at Wikibon and SiliconANGLE, but give us the quick update on Syncsort. What's new?

A lot of things are new. We just announced capabilities around data governance. Data governance is a very important area of focus for us this year. Half of our Hadoop and big data customers are already in production, and these are enterprise companies in financial services, insurance, and healthcare, where regulatory compliance is very critical. So we started focusing on this area, complementing what the Hadoop vendors are also doing. We are integrated with Cloudera Navigator, and we are working towards integration with Atlas. Some of those integrations happen quickly and organically, as you know, because of our native integration and open source contributions.

What we announced last week makes data governance audits and metadata lineage simpler and more flexible for our customers and their organizations. Most financial services and insurance companies already have custom, in-house metadata management frameworks. They are looking to transition to Hadoop-based metadata management frameworks. However, that transition happens at a certain adoption speed, and the Hadoop-based tools are maturing at the same time. So we wanted to give them flexibility by publishing the metadata lineage cross-platform, in an open text format that anybody can use with their in-house metadata management tools.

When I talk about cross-platform: Hadoop-based frameworks know very well what's happening in Hadoop and in the Hive metastore once the data is in the platform. We also have knowledge of the data as we are accessing all enterprise data, from the mainframe (VSAM files, DB2 on z/OS, legacy data stores and databases) to new data stores, streaming data, and mobile and web data. We have that understanding of where the data originated and what happens to it as it lands in the enterprise data lake or data hub and the analytics happen. So we publish that in an open format, in addition to our integration with Navigator and the work in progress with Atlas.

So, Tendu, the problem that you're solving here is that with all these data sources, people can't trust the data. Is that right?

That's correct. With big data, organizations are trying to break the silos of data and have all of the enterprise data available for advanced analytics. Advanced analytics, machine learning, deep learning: they all require data to be accessible, to be brought to them. While making that happen, security becomes a challenge, and data governance becomes a challenge. Where did the data move? How many copies of the data have been created? Do I have that data already available in Hadoop so I can actually make use of it as part of my analysis, whether it's with MapReduce or Spark?
These questions are becoming bigger challenges for enterprise organizations. So this week's announcement is that, in addition to being integrated and compatible with Cloudera Navigator and the Hadoop-based frameworks, we are making this data available to you: the lineage is available in an open format. If you want to make use of it in your enterprise metadata management tool or a custom in-house tool, the organization has the flexibility of its own choice and adoption speed.

So, two more questions about this. This is part of the DMX platform, is that right?

This is part of DMX and DMX-h, yes. And we have this available on-premise and in the cloud, since the data may be sourced on-premise and end up in the cloud.

And the output is a CSV file, and I can do whatever I want with it?

You can import it into the metadata repository that they have. In most cases, in insurance and healthcare, they have something custom-built in-house. They are looking to move to a Hadoop-based framework, but they have something in-house already.

So it sounds to me, Tendu, like with Hadoop and other tools dropping in, these pipelines are getting much, much larger in terms of the transformations and the analytics. What are some of the challenges, in addition to keeping track of the metadata or the lineage information that you're now enabling to be captured between the tools? What's the next step in terms of ensuring this integrity?

I think security, governance, and also simplifying this environment: simplifying it whether the data is in motion or in batch, whether the data is on-premise or in the cloud. Having a single interface and a single tool for batch and streaming, on-premise or in the cloud, and for both legacy mainframe data and new data sources is really critical. When Matt and Wei were talking about examples, for instance an insurance company collecting streaming data from cars: it's often the case that when you are collecting data, whether it's IoT, mobile, or sensor data, you have to make sense of it by referring to historical customer data or transactional data. Those critical data assets still reside in the legacy stores, often on the mainframe. When we make a credit card payment or swipe the card in a yellow cab or an Uber in New York City, we are accessing a mainframe. So bridging the gap between the mainframes and big data and advanced analytics is where we are really positioning ourselves and bringing value to our customers. This wide variety of use cases, and having silos of data and silos of tools, is a big challenge. So bridging the gap between these multiple types of data, as well as platforms, is important.
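(A concrete aside on that mainframe gap: data extracted from z/OS systems typically arrives EBCDIC-encoded in fixed-width fields described by COBOL copybooks, so even reading it correctly off-platform takes work. The Python sketch below is purely illustrative, with an invented record layout; it is not how Syncsort's products do it.)

```python
# Illustrative only, with an invented layout; real layouts come from a
# COBOL copybook. One 30-byte fixed-width EBCDIC (code page 037) record:
# a 10-byte account id followed by a 20-byte name, padded with EBCDIC
# spaces (0x40).
raw = bytes.fromhex(
    "f0f0f0f1f2f3f4f5f6f7"    # "0001234567"
    "c1d3c9c3c540e2d4c9e3c8"  # "ALICE SMITH"...
    "404040404040404040"      # ...padded to 20 bytes
)

account_id = raw[0:10].decode("cp037").strip()
customer_name = raw[10:30].decode("cp037").strip()
print(account_id, customer_name)  # 0001234567 ALICE SMITH
```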
Is there one format that has enough information to move between all these different tools and platforms, and to capture all of the changes in the lineage? In other words, is there a generic format that you've modeled in these tab-delimited or comma-separated tables that captures everything the customer needs to know across all their different tools?

For us, we partner with our customers. This is part of a bigger data governance story, about making life easier for organizations that need to keep track of what happened to their data for compliance reasons. And when we partnered, we saw that if you speak to five different customers, they have five different ways of doing metadata management in their environment. So the simplest common format was to publish it in an open CSV file format, because they all have ways to import that into their environment. They can now see that data originated from mainframe DB2 and, let's say, Teradata, was joined with data coming from their online financial services offering, and was written into a Hive table. That information about what goes in and out of Hadoop is available to them. We already publish that into Cloudera Navigator; we were one of the first vendors to integrate with it, last year. We are currently working on the same for Hortonworks' Atlas.
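(The interview doesn't spell out the columns of that CSV, so the sketch below assumes a hypothetical schema: job name, source and target system and object, and the transformation applied. The point is only that a flat, open format is trivial for an in-house metadata tool to produce or consume.)

```python
# Hypothetical lineage records in an open CSV format; the column names
# and values are invented for illustration and are not Syncsort's schema.
import csv
import io

FIELDS = ["job", "source_system", "source_object",
          "transformation", "target_system", "target_object"]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerow({
    "job": "claims_load",
    "source_system": "DB2/z", "source_object": "CLAIMS.POLICY",
    "transformation": "join+cleanse",
    "target_system": "Hive", "target_object": "lake.policy_claims",
})

# An in-house metadata tool only needs a CSV parser to take this in:
buf.seek(0)
for record in csv.DictReader(buf):
    print(record["source_object"], "->", record["target_object"])
```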
We were at a chief data officer conference last Friday in Boston, and when we asked one of the CDOs what the framework for getting started is, how people should think about this, he said there are five things you have to do to be a data-driven company. Three are sequential and two are simultaneous. The two simultaneous ones were: partner with the line of business, and cultivate and train people in the proper context. Those are ongoing activities. But of the three sequential activities, the first was to understand how you make money. Not how you monetize the data, but how your company makes money and how data can support that, which was a nice little twist, I thought, because everybody used to think, let's go sell the data. Wow, that's hard. How do we sell data? Nobody wants to buy our data. So that was good. The second was to understand your data sources. And the third was to achieve data trust. So that's what we're talking about here, this lineage and provenance of data. My question is, what else is part of that number three? This announcement is good news for you on number three, but what else is there? Is there a data quality component? And where does Syncsort fit in that number three?

There's a data cleansing component, definitely, and a data preparation component; those are part of it. And data quality is definitely part of that as well. And talking about those three, the good news is that we have something to say for all three, right? Making data accessible is one of our biggest value propositions: making all data accessible. There are several tools for accessing new types of data; we are also making legacy data accessible for advanced analytics, machine learning, and Spark analytics, for example. And the first one, how your company makes money: obviously we help with that by accelerating the return on investment and reducing the total cost of ownership.

In terms of what else is happening, at the end of August we announced another acquisition, Cogito, a UK-based company, which fits very well into our overall strategy of bridging the gap between the mainframes and big data and liberating more data from the enterprise legacy data stores. They have expertise in IBM DB2 and CA's IDMS, and we are very happy to have their products as part of the portfolio and, more important, the talent they bring, because DB2 is very common among our big data customers. Another announcement we made, during Hadoop Summit, was that Hortonworks is reselling DMX-h for ETL onboarding: basically onboarding new customers onto Hadoop while reducing the skill sets required.

You talked about this announcement in terms of financial services, insurance, health care. Those are highly regulated industries, and those industries tend to have a chief data officer. Are you increasingly selling to and interacting with chief data officers?

We are interacting with the chief data officer, sometimes a big data CTO, sometimes the data warehouse architect. Usually the data warehouse architect is still heavily involved in the big data implementations, and whoever owns the Hadoop-as-a-service or data-as-a-service architecture, those are the people we interact with. And here is one of the areas where I think it becomes critical: because we provide a single graphical user interface, an environment where people can combine batch and streaming and work with different types of data cleansing and data preparation, we are able to help organizations leverage the skill sets they already have in-house. They don't have to be experts in Scala coding, which you have to be, somewhat, in order to write efficient and optimized Spark programs. They don't have to understand how to tune MapReduce for a particular cluster configuration. That really helps. That's why we still have the data warehouse engineers and architects as part of our audience.
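(For contrast, here is roughly the kind of hand-written job a graphical tool like DMX-h is meant to spare that audience from: an illustrative PySpark sketch, standing in for the Scala Tendu mentions, with invented paths and table names; it is not Syncsort's generated code.)

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("policy-activity")
         .enableHiveSupport()
         .getOrCreate())

# Historical policy records, already landed from a mainframe extract.
policies = spark.read.parquet("/lake/landed/policies")

# New events from the online channel.
events = spark.read.json("/lake/landed/web_events")

# Join the new activity to history and persist to a Hive table, the
# "joined ... and written into a Hive table" step described earlier.
enriched = events.join(policies, on="customer_id", how="left")
enriched.write.mode("append").saveAsTable("lake.policy_activity")
```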
It sounds like you're step by step reconstituting the traditional ETL function, but in a way that can accommodate streams, and accommodate lineage and other governance functions. It sounds like you want to be a trusted conduit across systems where those systems form an analytic pipeline.

Yes, definitely. However, ETL is also being redefined in the context of Hadoop and big data, as big data preparation and integration. As part of that next-generation data warehouse architecture, or Hadoop-as-a-service and data-as-a-service architecture, we see data integration reinventing itself. Our goal is to bring value to our customers by making all of the data available in a single interface, and having the flexibility to accommodate cloud, hybrid, and on-premise-to-cloud environments, in addition to multiple ways of working with data. And we are not trying to do this from scratch. We already have expertise in dynamic optimization, self-tuning, and simplicity with our products. Partnering with the Hadoop vendors, Cloudera, Hortonworks, and MapR, has been very complementary to our overall strategy. They do the platform-specific enhancements and strengthen the platform, and we complement them by solving hard problems on the user side, governing the data and the lineage, leaving the security frameworks to them, for example, and organically inheriting some of the things happening in the Hadoop stack.

Tendu, we're out of time, but to close: you've observed this space for a number of years now. You've seen all those shiny new toys come out and then the enterprise subsume them, and you guys were at the center of all that. How would you describe where we're at in this whole space?

I think we are at a very good place now. One of the indicators is that when we talk with chief data officers, most of them have already made the decision and have a Hadoop cluster in place. Two years ago, you would hear, oh, I'm trying to make up my mind about which Hadoop distribution and which use case. If you don't have a use case, if you don't have a business problem to solve, you shouldn't really be talking about a big data initiative. Now we can see the maturity of the technology stack and, more importantly, the maturity of the market: people actually have a business use case, and they know how they will measure the results. And this is happening at many enterprise companies, not just the social media companies where everything originated, right? It's not just companies like Facebook or Twitter and Google. It's the financial services and insurance companies. Being able to actually access and analyze these data sets is now part of their sustainable competitive value proposition. So yes, we are at a very good place, I think.

Big data goes mainstream.

Big data is mainstream already, yes.

All right, we'll leave it there. Thank you, Tendu, for coming back on theCUBE. It's good to see you again.

Thank you for having me. It's always a pleasure.

All right, keep it right there, everybody. We'll be back with our next guest. This is theCUBE. We're live from New York City. Right back.