Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017. Hey, welcome back everyone. We're here in Silicon Valley, the heart of the Big Data world. This is theCUBE's coverage of Big Data Silicon Valley in conjunction with Strata Hadoop. Of course, we've been here for multiple years, covering Hadoop World for now our eighth year; now it's Strata Hadoop. We also do our own events, Big Data SV here in Silicon Valley and Big Data NYC in New York City. I'm John Furrier, with my co-host George Gilbert, an analyst at Wikibon. Our next guest is Tendü Yoğurtçu of Syncsort, general manager of Big Data. Did I get that right? Yes, you got it right. It's always a pleasure to be here. I love your name. It's so hard for me to get, but I think I was close enough there. Welcome back. Thank you. Great to see you. One of the things I'm excited about with Syncsort is we've been following you guys. We talk to you every year, and it seems that every year more and more announcements happen. You guys are unstoppable. It's like what Amazon does, just more and more announcements. But the theme seems to be integration. Give us the latest update. You bought Trillium. You had a deal with Hortonworks. You got integrated with Spark. You've got big news here. What's the news this year? Sure. Thank you for having me. And yes, it's very exciting times at Syncsort. I probably say that every time I appear, because every time it's more exciting than the previous one, which is great. So we bought Trillium Software, and Trillium Software has been the leader in data quality at many of the enterprises for many years. It's very complementary to our data integration and data management portfolio, because we are helping our customers access all of their enterprise data, not just the new emerging sources in connected devices, mobile, and streaming, but also the reference data in the mainframe legacy systems and the legacy enterprise data warehouse.
While we are doing that, accessing data, the data lake is now actually, in some cases, turning into a data swamp. That was a term Dave Vellante used a couple of years back in one of the crowd chats, and it's becoming real. So data... The real part being the data swamps; I mean, data lakes are turning into swamps because they're not being leveraged properly. Exactly, exactly. Because it's also about having access to the right data. And data quality is very complementary, because Trillium has served trusted, right data to enterprise customers in the traditional environments. So now we are looking forward to bringing that enterprise-trusted data quality into the data lake. In terms of data integration, data integration has always been very critical to any organization. It's even more critical now that data is shifting gravity, given the amount of data organizations have. What we have been delivering in very large enterprise production environments for the last three years, we are now hearing our competitors announce in those areas very recently, which is a validation, because we are already running in very large production environments. We offer value by saying: create your applications for integrating your data, whether it's originating in the cloud or on the mainframe, whether it's on the legacy data warehouse, and you can deploy the same exact application, without any recompilation, without any changes, on your standalone Windows laptop, in Hadoop MapReduce, or in Spark in the cloud. This design-once, deploy-anywhere approach is becoming more and more critical with data originating in many different places, and cloud is definitely one of them. And our data warehouse optimization solution with Hortonworks and AtScale is a special package to accelerate this adoption. It basically helps organizations offload workloads from an existing Teradata or Netezza data warehouse and deploy them in Hadoop.
We provide a single button to automatically map the metadata, create the metadata in Hive on Hadoop, and also make the data accessible in the new environment. And AtScale provides fast BI on top of that. Wow, that's amazing. So I want to ask you a question, because this is another theme. I just did a tweet while you were talking, saying the theme this year is cleaning up the data lakes, or data swamps, aka data lakes. The other theme is integration. So could you just lay out your premise on how enterprises should be looking at integration now? Because it's a multi-vendor world, a multi-cloud world, a multi-data-type-and-source world with metadata. How do you advise customers that have this plethora of action coming at them? IoT, cloud, big data; I've got Hadoop here, I've got Spark over here. What's the integration formula? So the first thing is to identify your business use cases. What's your business challenge? What are your business goals? Because that should be the real driver. We see some organizations start with the intention "we would like to create a data lake" without a very clear understanding of what it is they're trying to serve with that data lake. Data as a service is really becoming a theme across multiple organizations, whether it's on the enterprise side or at some of the online retail organizations, for example. And as part of that data as a service, organizations really need to adopt tools that are going to enable them to take advantage of the technology stack. The technology stack is evolving very rapidly. The skill sets are rare, and skill sets are rare because you end up making the argument: am I hiring PhD students who can program Scala in the most optimized way, or should I hire Java developers, or should I hire Python developers? And the tools in the stack change too; Spark 1 versus Spark 2 APIs changed. It's really evolving very rapidly.
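The warehouse-offload step described above, mapping an existing warehouse table's metadata into Hive so the data becomes queryable in the new environment, can be pictured roughly as below. This is a minimal sketch, not Syncsort's implementation: the table name, columns, and the legacy-to-Hive type map are all hypothetical, and real offload tooling generates far more complete DDL.

```python
# Hypothetical sketch: translate a legacy warehouse table definition
# into Hive DDL, the kind of metadata mapping described above.
# Table/column names and the type map are illustrative only.

TYPE_MAP = {  # legacy warehouse type -> Hive type (simplified)
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "VARCHAR": "STRING",
    "DECIMAL": "DECIMAL(18,2)",
    "DATE": "DATE",
}

def to_hive_ddl(table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement from (name, legacy_type) pairs."""
    cols = ",\n  ".join(
        f"{name} {TYPE_MAP.get(legacy_type, 'STRING')}"
        for name, legacy_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        f"STORED AS ORC\nLOCATION '{location}'"
    )

ddl = to_hive_ddl(
    "sales_fact",
    [("order_id", "BIGINT"), ("customer", "VARCHAR"), ("amount", "DECIMAL")],
    "/data/warehouse/sales_fact",
)
print(ddl)
```

The point of the sketch is the one-button idea: once the legacy schema is known, the Hive-side metadata can be generated mechanically rather than hand-written per table.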
It's hard to find Scala developers. I mean, you can't just go outside and find them in the alley. Exactly. So as an organization, our advice is that you really need to find tools that fit those business use cases and provide a single software environment, because data integration might be happening on-premise now with some of the legacy enterprise data warehouses, it might happen in a hybrid on-premise and cloud environment in the near future, and perhaps completely in the cloud after that. So standard tools, tools that have some standard software behind them, so you don't get stuck in the personnel-hiring problem of some unique domain expertise that's hard to hire. Yes, skill set is one problem. The second problem is the fact that applications need to be recompiled, because the stack is evolving and the APIs are not compatible with the previous version. So there's also the maintenance cost of keeping up, of being able to catch up with the new versions of the stack. That's another area where the tools really help, because you want to be able to develop the same application and deploy it anywhere, on any compute platform. So, Tendü, if I hear you properly, what you're saying is integration sounds great on paper, and it's important, but there are some hidden costs there. Yes. And that is the skill set, and then there's the stack recompiling in the future. Okay, that's awesome. So take a step back, zoom out, and talk about Syncsort's positioning, because you guys have been changing with the stacks as well, and you've obviously been doing very well, with announcements coming to market all the time. What is the current value proposition for Syncsort today? The current value proposition is really that we help organizations create the next-generation modern data architecture by accessing and liberating all enterprise data, and delivering that data at the right time and with the right quality. That's it: liberate, integrate, with integrity.
So that's our value proposition. How do you do that? We provide that single software environment: you can have batch legacy data and streaming data sources integrated in the same exact environment, and it enables you to adapt to Spark 2 or Flink or whichever compute framework comes next. That has been our value proposition, and it is proven in many production deployments. Hey, what's interesting too is the way you guys have approached the market: you've locked down the legacy. We've talked about the mainframe, and it's well beyond that now. You guys understand the legacy, so you kind of lock that down, protect it, and make sure it works. I don't just mean secure, security-wise, though you do that too, but making sure it works, because there's still data there. These legacy systems are really critical in the hybrid environment. The mainframe expertise and heritage that we have is a critical part of our offering, and we will continue to focus on innovation on the mainframe side as well as on the distributed side. One of the announcements we made since our last conversation came through our partnership with Compuware: we now bring more data types, application-failure ABEND data, to Splunk for operational intelligence. And we will continue to support more delivery types. We have batch delivery, we have streaming delivery, and now, because replication into Hadoop has been a challenge, our focus is replication from DB2 and VSAM on the mainframe to Hadoop environments. We will continue to focus on the mainframe because we have heritage there, and it's also part of the big enterprise data lake.
You cannot make sense of the customer data that you are getting from mobile if you can't reference the critical data sets that are on the mainframe. And with the Trillium acquisition, it's very exciting, because now we are at kind of a pivotal point in the market: we can bring the superior data validation, cleansing, and matching capabilities we have to the big data environments. And one of the things is also... So do you guys do the whole low-latency thing too? You bring it in fast? Yes. That's our current value proposition, and as we are accessing this data and integrating it as part of the data lake, now we have capabilities with Trillium to profile that data, get statistics, and start using machine learning to automate the data steward's job. Data stewards are still spending 75% of their time trying to cleanse the data. So if we can... A lot of manual labor there. Exactly. And modeling too, by the way; the cleansing and the modeling kind of go hand in hand. If we can automate any of these steps, derive the business rules automatically, and provide the right data on the data lake, that would be very valuable. This is what we are hearing from our customers as well. You know, we've heard for probably five years about the data lake as the center of gravity of big data, but we're hearing at least a bifurcation, and maybe more, where now we want to take that data and apply it, operationalize it, in making decisions with machine learning and predictive analytics. But at the same time, we're trying to square this strange circle of the data lake, where you didn't say upfront what you wanted it to look like, but now we want ever-richer metadata to make sense of it. You're putting a data prep layer on it, and others are trying to put different metadata on top of it. What do you see that metadata layer looking like over the next three to five years?
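The profiling step mentioned above, computing statistics over a data set so that quality rules can later be derived, can be illustrated with a minimal column profiler. This is a sketch under assumptions: the sample records, field names, and the three statistics chosen are invented for illustration, not drawn from Trillium.

```python
# Minimal sketch of column profiling, the statistics-gathering step a
# data steward (or an automated tool) starts from. Records are made up.

def profile(records, field):
    """Return simple quality statistics for one field across records."""
    values = [r.get(field) for r in records]
    non_null = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
    }

records = [
    {"email": "a@example.com", "phone": "555-0100"},
    {"email": "", "phone": "555-0101"},
    {"email": "b@example.com", "phone": None},
    {"email": "a@example.com", "phone": "555-0100"},
]

email_stats = profile(records, "email")
print(email_stats)  # {'count': 4, 'null_rate': 0.25, 'distinct': 2}
```

Even statistics this simple (null rate, cardinality) are the raw material for the automation discussed in the interview: a high null rate or an unexpected cardinality is what flags a column for a cleansing rule.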
So governance is a very key topic, and for organizations who are ahead of the game in big data, who have already established that data lake, data governance, and even analytics governance, becomes important. What we are delivering here with Trillium, which will be generally available by the end of Q1, is basically bringing the business rules to the data. So instead of bringing data to the business rules, we are taking the business rules and deploying them where the data exists. That will be key because of the data gravity you mentioned: the data might be in the Hadoop environment, it might be in a legacy enterprise data warehouse, and it might be originating in the cloud, and you don't want to move the data to the business rules. You want to move the business rules to where the data exists. Cloud is an area where we see more and more of our customers moving forward. The two main use cases around integration are, one, the data originating in the cloud, and two, archiving data to the cloud. We actually announced tighter integration with Cloudera Director earlier this week for this event. We have been in cloud deployments for a while; we have had an offering on Elastic MapReduce on EC2 for a couple of years now, and also on Google Cloud Storage. But this announcement is primarily about making deployments even easier by leveraging Cloudera Director's elasticity to scale deployments up and down. Now our customers' integration jobs will also take advantage of that elasticity. Tendü, it's great to have you on theCUBE, because you have an engineering mind, but you're also now the general manager of the business, and your business is changing.
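The "move the business rules to the data" idea above can be pictured as rules expressed as small, portable predicates that get shipped to and evaluated wherever the records live, rather than hauling the data back to a central rules engine. The rule definitions and records below are invented examples for the sketch, not Trillium's rule language.

```python
# Hedged sketch: business rules as portable predicates evaluated
# in place, wherever the data resides. Rules and records are illustrative.

import re

RULES = {
    "valid_email": lambda r: bool(
        re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", r.get("email", ""))
    ),
    "amount_positive": lambda r: r.get("amount", 0) > 0,
}

def apply_rules(records, rules):
    """Evaluate every rule against every record; return failing row indices per rule."""
    failures = {name: [] for name in rules}
    for i, record in enumerate(records):
        for name, check in rules.items():
            if not check(record):
                failures[name].append(i)
    return failures

records = [
    {"email": "user@example.com", "amount": 10.0},
    {"email": "not-an-email", "amount": -3.0},
]
result = apply_rules(records, RULES)
print(result)  # {'valid_email': [1], 'amount_positive': [1]}
```

Because the rules are self-contained, the same definitions could run against records on-premise, in Hadoop, or in the cloud, which is the data-gravity point being made in the interview.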
You're in the center of the action, so I want to get your expertise and insight into the enterprise-readiness concept. We saw last week at Google Cloud Next 2017 Google going down the path of being enterprise-ready, or taking steps; I don't think they're fully ready, but they're certainly serious about the cloud in the enterprise, and that's clear from Diane Greene, who knows the enterprise. And it sparked a conversation last week around what enterprise readiness means for cloud players, because there are so many details between the lines, if you will: what the products are, the integration, certification, SLAs. What's your take on the notion of cloud readiness, vis-a-vis Google and others that are bringing cloud compute and a lot of resources, with an IoT market that's now booming, big data evolving very, very fast, a lot of real time, a lot of analytics, a lot of innovation happening? What does the enterprise picture look like, and from a readiness standpoint, how do these guys get ready? So from a big picture, for the enterprise, there are a couple of things that cannot be an afterthought: security, and metadata lineage as part of data governance. And being able to have flexibility in the architecture, so that they will not be recreating the jobs they might have already deployed in their on-premise environments, right? Being able to have the same application running from on-premise to cloud will be critical, because it gives flexibility for adoption in the enterprise. An enterprise may have some MapReduce jobs running on-premise alongside Spark jobs in the cloud, because they are doing some predictive analytics or graph analytics there. They want to have that flexibility of architecture, which is where we hear this concept of a hybrid environment. And you don't want to be deploying a completely different product in the cloud and redoing your jobs. That flexibility of architecture, flexibility in adoption.
So having different code bases in the cloud versus on-prem requires two jobs to do the same thing: two jobs to maintain, two jobs to standardize, and two different skill sets of people, potentially. So you want security, governance, and the ability to access data easily and have applications move between environments; that will be very critical. So seamless integration between cloud and on-prem first, and then potentially multi-cloud? Yes. So that's table stakes in your mind. They are absolutely table stakes. And a lot of vendors are trying to focus on that; the Hadoop vendors are definitely focusing on that as well. Also, when people talk about governance, the requirements are changing. We have been talking about single view and customer 360 for a while now, right? And do we have it right yet? Enrichment is becoming key. With Trillium, we made a recent announcement, Trillium Precise. Enriching: it's not just the postal address that you want to deliver and make sure is correct. It's also the email address and the phone number. Is it a mobile number? Is it a landline? It's enriched data sets that we now really have to deal with, and there's a lot of opportunity. We are really excited, because data quality, discovery, and integration are coming together, and we have a good... Well, Tendü, thank you for joining us, and congratulations as Syncsort broadens its scope to being a modern data platform solution provider for companies. Thank you. Thank you for having me. We're live on theCUBE here in Silicon Valley, in San Jose. I'm John Furrier, with George Gilbert. You're watching our coverage of Big Data Silicon Valley, in conjunction with Strata Hadoop. This is SiliconANGLE's theCUBE. We'll be right back with more live coverage. We've got two days of wall-to-wall coverage with experts and pros talking about big data and the transformations here, inside theCUBE. We'll be right back.