Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks. theCUBE, we are live at day one of the DataWorks Summit. We've had a great day here. I'm surprised that we still have our voices left. I'm Lisa Martin with my co-host, George Gilbert. We have been talking with great innovators today across this great community. Folks from Hortonworks, of course, IBM, partners. Now I'd like to welcome back to theCUBE, who was here this morning in the green shoes, the CTO of Hortonworks, Scott Gnau. Welcome back, Scott. Great to be here yet again. Yet again, and we have another CTO. We've got a CTO corner over here with CUBE alumni, and the CTO of Syncsort, Tendu Yogurtcu. Welcome back to theCUBE, both of you. Thank you for being here. Thank you. So, guys, what's new with the partnership? I know that Syncsort, you have 87 of the Fortune 100 companies as your customers. Scott, 60 of the Fortune 100 companies are customers of Hortonworks. Talk to us about the partnership that you have with Syncsort. What's new, what's going on there? You know, there's always something new in our partnership. We launched our partnership, what, a year and a half ago or so? And it was really built on the foundation of helping our customers get time to value very quickly, right? And leveraging our mutual strengths. And we've been back on theCUBE a couple of times, and we continue to have new things to talk about, whether it be new customer successes or new feature functionalities or new integration of our technology. And so it's not just something that's static and sitting still, but it's a partnership that had a great foundation and value and continues to grow. And, you know, with some of the latest moves that I'm sure Tendu will bring us up to speed on, that Syncsort has made, customers who have jumped on the bandwagon with us together are able to get much more benefit than they originally even intended. 
Let me talk about some of the things actually happening with Syncsort and with the partnership. Thank you, Scott. The Trillium acquisition has been transformative for us. Really, we have achieved quite a lot within the last six months, delivering joint solutions between our data integration product, DMX-h, and the Trillium data quality and profiling portfolio. And that was kind of our first step, very much focused on data governance. We are going to have a data quality for Data Lake product available later this year. And this week, actually, we will be announcing our partnership with the Collibra data governance platform, basically making business rules and technical metadata available through the Collibra dashboards for data scientists. And in terms of our joint solution and joint offering for data warehouse optimization, and the bundle that we launched in early February of this year, that's in production, with large, complex production deployments already helping our customers access all their data, all enterprise data, including legacy data warehouse, new data sources, as well as legacy mainframe, in the Data Lake. So we will be announcing, again in a week or so, change data capture capabilities from legacy data stores into Hadoop, keeping that data fresh and giving more choices to our customers in terms of populating the Data Lake, as well as use cases like archiving data into the cloud. Tendu, let me try and unpack what was a very dense, in a good way, a lot of content. I'm sticking my foot in my mouth every 30 seconds. I think I just called you dense. Yeah. So help us visualize a scenario where you have maybe DMX-h bringing data in, you might have change data capture coming from a live database, and you've got the data quality at work as well. Well, help us picture how much faster and higher fidelity the data flow might be relative to- Sure, absolutely. So our bundle and our joint solution with Hortonworks really focuses on business use cases. 
And one of those use cases is enterprise data warehouse optimization, where we make all enterprise data accessible in the Data Lake. Now, if you are an insurance company managing claims, or you are building a data as a service, Hadoop as a service architecture, there are multiple ways that you can keep that data fresh in the Data Lake. And you can have change data capture by basically taking snapshots of the data and comparing them in the Data Lake, which is a viable method of doing it. But as the data volumes are growing and the real-time analytics requirements of the business are growing, we recognize our customers are also looking for alternative ways that they can actually capture the change in real-time, when the change is just, like, less than 10% of the original data set, and keep the data fresh in the Data Lake. So that enables faster analytics, real-time analytics, and in the case that you are doing something from on-premise to the cloud, or archiving data, it also saves on resources like network bandwidth and overall resource efficiency. Now, while we are doing this, obviously we are accessing the data and the data goes through our processing engines. What Trillium brings to the table is the unmatched capabilities around profiling that data, getting a better understanding of that data. So we will be focused on delivering products around that, because as we understand data, we can also help our customers create the business rules to cleanse that data and preserve the fidelity of the data and the integrity of the data. So with the change data capture, it sounds like near real-time, you're capturing changes in near real-time, could that serve as a streaming solution that then is also populating the history as well? Absolutely, we can go through streaming or message queues, and we also offer more efficient proprietary ways of streaming the data to Hadoop. 
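The snapshot-comparison style of change data capture Tendu describes first can be sketched in a few lines. This is a minimal illustration, not Syncsort's implementation: the keyed-dictionary record layout and the claims example are hypothetical, but the diffing logic is the general technique.

```python
# Sketch of snapshot-comparison change data capture (CDC): diff two keyed
# snapshots of a source table to find inserts, updates, and deletes,
# rather than re-loading the full data set into the lake each time.

def diff_snapshots(old, new):
    """Compare two snapshots keyed by primary key.

    old, new: dict mapping primary key -> record value.
    Returns (inserts, updates, deletes).
    """
    inserts = {k: v for k, v in new.items() if k not in old}
    updates = {k: v for k, v in new.items() if k in old and old[k] != v}
    deletes = set(old) - set(new)
    return inserts, updates, deletes

# Hypothetical example: an insurance-claims table across two nightly snapshots.
yesterday = {101: "OPEN", 102: "OPEN", 103: "CLOSED"}
today     = {101: "OPEN", 102: "CLOSED", 104: "OPEN"}

ins, upd, dele = diff_snapshots(yesterday, today)
print(ins)   # {104: 'OPEN'}
print(upd)   # {102: 'CLOSED'}
print(dele)  # {103}
```

The cost of this approach is that both full snapshots must be read and compared, which is exactly why, as Tendu notes, real-time capture becomes attractive when the changed rows are a small fraction of the data set.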
So I assume the message queues refers to probably Kafka, and then your own optimized solution for sort of maximum performance, lowest latency. Yes, we can go through Kafka queues, which is very efficient as well, and we can also go through proprietary methods. So Scott, help us understand then the governance capabilities that, I'm having a senior moment, and I'm getting too many of these. Help us understand the governance capabilities that Syncsort's adding to the sort of mix with the data warehouse optimization package and how it relates to what you're doing. Yeah, right, so what we talked about even again this morning, right, the whole notion of the value of open squared, right, open source and open ecosystem, and I think this is clearly an open ecosystem kind of play. So we've done a lot of work since we initially launched the partnership and through the different product releases, where our engineering teams and the Syncsort teams have done some very good low-level integration of our mutual technologies, so that the Syncsort tool can exploit those horizontal core services, like YARN for multi-tenancy and workload management, and of course Atlas for data governance. So as the Syncsort team adds feature functionality on the outside of that tool, that simply accretes to the benefit of what we've built together. And so that's why I say customers who started down this journey with us together are now going to get the benefit of additional options from that ecosystem, that they can plug in additional feature functionality, and at the same time are really thrilled because, and we've talked about this many times, right, the whole notion of governance and metadata management in the big data space is a big deal. 
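The streaming path Tendu and George just discussed, change events flowing off a queue into the lake, can be illustrated with a toy consumer loop. This is a simulation of the idea only: the event tuples and table shape are invented for the example, and the products' actual Kafka integration and proprietary wire formats are not shown here.

```python
# Illustrative sketch (not Syncsort's actual format): applying a stream of
# CDC events, as they might arrive in order off a Kafka topic, to keep a
# data-lake table in sync with the source system without full reloads.

def apply_change_events(table, events):
    """Apply (operation, key, value) events in order to a keyed table."""
    for op, key, value in events:
        if op in ("insert", "update"):
            table[key] = value       # upsert the changed row
        elif op == "delete":
            table.pop(key, None)     # drop the removed row if present
    return table

lake = {1: "A", 2: "B"}
events = [
    ("update", 2, "B2"),   # row 2 changed at the source
    ("insert", 3, "C"),    # new row 3 created
    ("delete", 1, None),   # row 1 removed
]
print(apply_change_events(lake, events))  # {2: 'B2', 3: 'C'}
```

Because only the changed rows travel, this is the path that saves the network bandwidth Tendu mentioned for on-premise-to-cloud and archiving scenarios.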
And so the fact that we're able to come to the table with an open source solution to create common metadata tagging, that then gets utilized by multiple different applications, I think creates extreme value for the industry and frankly for our customers, because now, regardless of the application they choose, or the applications that they choose, they can at least have that common trusted infrastructure where all of that information is tagged, and it stays with the data through the data's life cycle. So your partnership sounds very, very symbiotic, that there's changes made on one side that reflect the other. Give us an example of who is your common customer, and this might not be, well, they're all over the place, who's got an enterprise data warehouse. Are you finding more customers that are looking to modernize, that have multi-cloud, core, edge, IoT devices, that's a pretty distributed environment, versus customers that might be still more on-prem? What's kind of the mix there? Can I start, and then I will let you build on. I want to add something to what Scott said earlier. Atlas is a very important integration point for us, and in terms of the partnership and the relationship that you mentioned, I think one of the strengths of our partnership is that it's at many different levels. It's not just executive level, it's cross functional, and also very close field teams, marketing teams, and engineering field teams working together. And in terms of our customers, it's really organizations trying to move towards a modern data architecture. And as they are trying to build the modern data architecture, there is the data in motion piece, which I will let Scott talk about, and the data at rest piece, and as we have so much data coming from cloud, originating through mobile and web, in the enterprise, especially the Fortune 500, the Fortune 100 we talked about, insurance, healthcare, telco, financial services, and banking have a lot of legacy data stores. 
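Scott's point about common metadata tagging that "stays with the data through the data's life cycle" is worth making concrete. Apache Atlas exposes this through typed entities, classifications, and a REST API; the sketch below is only a toy in-memory model of the propagation idea, not the Atlas API, and the dataset names and tags are invented for illustration.

```python
# Toy model (not the actual Apache Atlas API) of lifecycle tag propagation:
# classifications attached to a dataset travel with it as derived datasets
# are created, so every downstream application sees the same governance tags.

class Dataset:
    def __init__(self, name, tags=None):
        self.name = name
        self.tags = set(tags or [])   # governance classifications, e.g. "PII"

    def derive(self, new_name, extra_tags=()):
        """A derived dataset inherits its parent's tags, plus any new ones."""
        return Dataset(new_name, self.tags | set(extra_tags))

# Hypothetical example: a raw claims feed tagged at ingest...
claims = Dataset("raw_claims", tags={"PII", "insurance"})
# ...keeps its tags when a cleansed copy is produced downstream.
cleansed = claims.derive("cleansed_claims", extra_tags={"quality-checked"})
print(sorted(cleansed.tags))  # ['PII', 'insurance', 'quality-checked']
```

In Atlas itself this behavior is called classification propagation along lineage, which is what lets any application consuming the derived dataset trust the same tags without re-tagging.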
So really our joint solution, and the first couple of use cases, business use cases, we targeted were around that. How do we enable these data stores and that data in the modern data architecture? I will let Scott continue. Yeah, and so I agree. And so certainly we have a lot of customers already who are joint customers, and so they can get the value of the partnership kind of because they've already made the right decision, right? I also think, though, there's a lot of greenfield opportunity for us, because there are hundreds if not thousands of customers out there who have legacy data systems where their data is kind of locked away, right? And by the way, it's not to say the systems aren't functioning and doing a good job, they are. They're running business facing applications, and all of that's really great. But that is a source of raw material that belongs also in the data lake, right? And can certainly enhance the value of all the other data that's being built there. And so the value, frankly, of our partnership is really creating that easy bridge to kind of unlock that data from those legacy systems and get it into the data lake. And then from there, the sky's the limit, right? Is it reference data that can then be used for consistency of response when you're joining it to social data and web data? Frankly, is it an online archive and optimization of the overall data fabric, and offloading some of the historical data that may not even be used in legacy systems, and having a place to put it where it actually can be accessed? And so there are a lot of great use cases. You're right, it's a very symbiotic relationship. I think there's only upside, because we really do complement each other, and there's a distinct value proposition, not just for our existing customers, but frankly for a large set of customers out there that have kind of the data locked away. So how would you see, do you see the data warehouse optimization sort of solution set continuing to expand its functional footprint? 
What are some things to keep pushing out the edge conditions, the realm of possibilities? One of the areas that we are jointly focused on is that we are liberating that data from the enterprise data warehouse or legacy architectures. Through Syncsort DMX-h, we actually understand the path that data traveled. That metadata is something that we can now integrate into Atlas and publish into Atlas, and have Atlas as the open data governance solution. So that's an area where we definitely see an opportunity to grow and also strengthen that joint solution. Sure, I mean extended provenance is kind of what you're describing. And that's a big deal when you think about some of these legacy systems, where frankly 90% of the cost of implementing them originally was actually building out those business rules and that metadata. And so being able to preserve that and bring it over into a common and open platform is a really big deal. I'd say inside of the platform, of course, as we continue to create new performance advantages in the latest releases of Hive, as an example, where we can get low latency query response times, there's a whole new class of workloads that now is appropriate to move into this platform. And you'll see us continue to move along those lines as we advance the technology from the open community. Well, congratulations on continuing this great symbiotic, as we said, partnership. It sounds like it's incredibly strong on the technology side, on the strategic side, on the GTM side. I've loved how you said liberating data so that companies can really unlock its transformational value. We want to thank both of you, Scott for coming back on theCUBE twice in one day. Tendu, thank you as well for coming back to theCUBE. For both of our CTOs that have joined us from Hortonworks and Syncsort, and my co-host George Gilbert, I'm Lisa Martin. You've been watching theCUBE live from day one of the DataWorks Summit. Stick around, we've got great guests coming up.