Peter Burris: Welcome to another CUBE Conversation, where we go in depth with the thought leaders in the industry who are making significant changes to how we conduct digital business and to the likelihood of success with digital business transformations. I'm Peter Burris. Every organization today has some experience with the power of analytics, but they're also learning that the value of their analytic systems is in part constrained and determined by their access to core information. Some of the most important information that any business can start to utilize within these new advanced analytic systems, quite frankly, is the operational business information the business has been using to run itself for years. Now, we've looked at that as silos, and maybe it is, although partly that's in response to the need for good policy, good governance, and good certainty and predictability in how the system behaves and how secure it's going to be. So the question is, how do we marry the new world of advanced analytics with the older, but nonetheless extremely valuable, world of operational processing to create new types of value within digital business today? It's a great topic, and we've got a great conversation. Tendü Yoğurtçu is the CTO of Syncsort. Tendü, welcome back to theCUBE.

Tendü Yoğurtçu: Hi Peter, it's great to be back in theCUBE.

Peter Burris: Excellent. So look, let's start with a quick update on Syncsort. How are you doing? What's going on?

Tendü Yoğurtçu: Oh, it's been a really exciting time at Syncsort. We have seen tremendous growth in the last three years. We quadrupled our revenue and also our number of employees, through both organic innovation and growth as well as through acquisitions. So we now have 7,000-plus customers in over 100 countries, and we still have 84 of the Fortune 100, serving large enterprises. It's been a really great journey.

Peter Burris: Well, so let's get into the specific distinction that you guys have. At Wikibon and theCUBE, we've observed, or rather we predicted, that 2019 was going to be the year that the enterprise asserted itself in the cloud. We had seen a lot of developers drive cloud forward. We've seen a lot of analytics drive cloud forward. But now, as enterprises are entering into cloud in a big way, they're generating or bringing with them new types of challenges and issues that have to be addressed. So when you think about where we are in the journey to more advanced analytics, better operational certainty, and greater use of information, what do you think the chief challenges that customers face today are?

Tendü Yoğurtçu: Of course. As you mentioned, every organization is trying to take advantage of the data. Data is the core, and they want to take advantage of the digital transformation to get more value out of their data. And in doing so, they are moving into cloud, into hybrid cloud architectures. We have seen early implementations starting with the data lake. Everybody started creating the centralized data hub, enabling advanced analytics and creating a data marketplace for their internal or external clients. And the early data lakes were utilizing Hadoop on on-premise architectures; now we are also seeing data lakes sometimes expanding over hybrid or cloud architectures. The challenge these organizations are also starting to realize, once they create this data marketplace, is around access to the data: the critical customer data, the critical product data, the order data.
Tendü Yoğurtçu: Access to that data is a bigger challenge than it appeared to be in the pilot projects, because these critical, core data assets, often in financial services, banking, insurance, and healthcare, live in environments, in data platforms, that these companies have invested in over multiple decades. And I'm not referring to them as legacy, because the definition of legacy changes; these platforms have been holding these critical data assets successfully for decades.

Peter Burris: We call them high-value traditional applications, because they're traditional, we know what they do, there's a certain operational certainty, and we've built up the organization around them to take care of those assets, but they are still very, very high value.

Tendü Yoğurtçu: Exactly. And making those applications and data available for next-generation, next-wave platforms is becoming a challenge for a couple of different reasons. One, accessing this data and making sure the policies, the security, and the privacy around these data stores are preserved when the data is made available for advanced analytics, whether in cloud or on-premise deployments.

Peter Burris: So before we go to the second one, I want to make sure I understand that, because it seems very, very important. What you're saying is, if I may, that the data is not just the ones and zeros in the file. The data really needs to start being thought of as the policies, the governance, the security, and all the other attributes and elements, the metadata, if you will, which have to be preserved as the data is getting used.

Tendü Yoğurtçu: Absolutely. And there are challenges around that, because now you have to have the skill sets to understand the data in those different types of stores: relational data warehouses, mainframe, IBM i, SQL, Oracle, with many different data owners and different teams in the organization. And then you have to make sense of it and preserve the policies around each of these data assets while bringing it to the new analytics environments, and make sure that everybody is aligned on the access, the privacy, the policies, and the governance around that data. And also mapping the metadata to the target systems, right? That's a big challenge, because somebody who understands these data sets in a mainframe environment doesn't necessarily understand the cloud data stores or the new data formats. So how do you bridge that gap and map into the target environment?

Peter Burris: And vice versa, right?

Tendü Yoğurtçu: Yes. Likewise, yes.

Peter Burris: So this is where Syncsort starts getting really interesting, because as you noted, a lot of the folks in the mainframe world may not have familiarity with how the cloud works, at least from a data standpoint, and a lot of folks in the cloud who have been doing things with object stores and Hadoop may not have knowledge of how the mainframe works. And so those two sides are seen as silos, but the reality is both sides have set up policies and governance models and security regimes and everything else, because it works for the workloads that are in place on each side. So Syncsort's an interesting company, because you guys have experience crossing that divide.

Tendü Yoğurtçu: Absolutely. And we see both the next-wave and the existing data platforms as a moving, evolving target, because these challenges existed 20 years ago, 10 years ago; it's just that the platforms were different, and the volume, variety, and complexity were different. Hadoop, five or ten years ago, was the next wave. Now it's the cloud.
Tendü Yoğurtçu: Blockchain will be the next platform that we still have to adapt to, making sure that we are advancing our data and creating value out of the data. So accessing the data and preserving those policies is one challenge. And then the second challenge is that as you are making these data sets available for analytics, or for machine learning and data science applications, deduplicating, standardizing, and cleansing, making sure that you can deliver trusted data, becomes a big challenge, because if you train the models with bad data, if you create the models with bad data, you have a bad model, and then bad insights. Machine learning and artificial intelligence depend on the data and the quality of the data. So it's not just bringing all the enterprise data for analytics, it's also making sure that the data is delivered in a trusted way. That's a big challenge.

Peter Burris: Yeah, let me build on that if I may, Tendü, because a lot of these tools involve black-box belief in what the tool is performing.

Tendü Yoğurtçu: Correct.

Peter Burris: So they don't have a lot of visibility into the inner workings of how the algorithm is doing things; that's just the way it is. So in many respects, your only real visibility into the quality of the outcome of these tools is visibility into the quality of the data that's going into the building of these models. Have I got that right?

Tendü Yoğurtçu: Correct. And in machine learning, the effect of bad data really multiplies, because of the training of the model as well as the insights. And with blockchain in the future, it will also become very critical, because once you load the data into a blockchain platform, it's immutable. So data quality comes at a higher price, in some sense. So that's another big challenge.

Peter Burris: Which is to say that if you load bad data into a blockchain, it's bad forever.

Tendü Yoğurtçu: Yes, that's very true. So that's obviously another area for us: as we are accessing all of the enterprise data, discovering and understanding the data and delivering deduplicated, standardized, and enriched data to the machine learning and AI pipelines and the analytics pipelines is an area we are focused on with our products. And the third challenge is that as you are doing this, speed starts to matter, because, okay, I created the data lake or the data hub. The next big use case we started seeing is: yes, but I have 20 terabytes of data and only 10% is changing on a nightly basis, so how do I keep my data lake in sync? And not only do I want to keep my data lake in sync, I also would like to feed that change data and keep my downstream applications in sync; I want to feed the change data to the microservices in the cloud. That speed of delivery started becoming a very critical requirement for the businesses.

Peter Burris: Speed, and the targeting of the delivery.

Tendü Yoğurtçu: Speed and the targeting, exactly. Because I think the bottom line is that you really want to create an architecture where you can stay agnostic and also be able to deliver at the speed the business is going to require at different times. Sometimes it's near real time or batch; sometimes it's real time, and you have to feed the changes as quickly as possible to the consuming applications and the microservices in the cloud.

Peter Burris: Well, we've got a lot of CIOs who are starting to ask us questions about this, especially as they start thinking about Kubernetes and Istio and other types of platforms that are intended to facilitate the orchestration and, ultimately, the management of how these container-based applications work.
Peter Burris: And we're starting to talk more about the idea of data assurance: make sure the data's good, make sure it's high quality, make sure it's being taken care of, but also make sure that it's targeted where it needs to be. Because you don't want a situation where you spin up a new cluster, which you can do very quickly with Kubernetes, but you haven't made the data available to those Kubernetes-based applications so they can actually run. A lot of CIOs, a lot of application development leaders, and a lot of business people are now starting to think about that: how do I make sure the data is where it needs to be so that the applications run when they need to run?

Tendü Yoğurtçu: That's a great point. And going back to your comment around the cloud and taking advantage of cloud architectures, one of the things we have observed is that when organizations were looking at cloud in terms of scalability, elasticity, and reducing cost, they did a lift and shift of applications. And not all applications can take advantage of cloud elasticity when you do that. Most of these applications were created for the existing on-premise, fixed architectures, so they are not designed to take advantage of it. We are seeing a shift now, and the shift is: instead of trying to lift and shift existing applications, at least for new applications, let me try to adopt technology assets, like Kubernetes, as you mentioned, so that I can stay vendor agnostic across cloud vendors. But more importantly, let me try to have some best practices in the organization so that new applications can be created to take advantage of the elasticity, even though they may not be running in the cloud yet. Some organizations refer to this as cloud native, some as cloud first, different terms. And make the data, because the core asset here is always the data, make the data from these existing on-premise and other platforms available for cloud, instead of going after the applications. We are definitely seeing that shift.

Peter Burris: Yeah, and then assure that that data is high quality, carries the policies, carries the governance, doesn't break the security models, all those other things.

Tendü Yoğurtçu: There is a big difference between how organizations went into their Hadoop data lake implementations versus the cloud architectures now, because when the initial Hadoop data lake implementations happened, it was "dump all the data," and then, "oh, I have to deal with the data quality now."

Peter Burris: No, it was also, "oh, those mainframe people, they're so difficult to work with." Meanwhile, you're still closing the books on a monthly basis, on a quarterly basis, you're not losing orders, your customers aren't calling you on the phone, angry. And that, at the end of the day, is what the business has to do. You have to be able to extend what you currently do with a digital business approach, and if you can replace certain elements of it, okay, but you can't end up with less functionality as you move forward into the cloud.

Tendü Yoğurtçu: Absolutely. And it's not just mainframe; it's IBM i, it's Oracle, it's Teradata, it's the DTSA, and that data infrastructure is growing rapidly in terms of complexity. And for cloud, we are seeing now that a lot of pilots are happening with the cloud data warehouses, trying to see if the cloud data warehouses can accommodate some of these hybrid deployments. And we are also seeing more focus, not after the fact, but focus on data quality from day one.
Tendü Yoğurtçu: How am I going to ensure that I'm delivering trusted data and populating the cloud data stores, or delivering trusted data to microservices in the cloud? There is a greater focus on both governance and quality.

Peter Burris: So, high-quality data movement that leads to high-quality data delivery, in ways that the business can be certain that whatever derivative work is done remains high quality.

Tendü Yoğurtçu: Absolutely.

Peter Burris: Tendü Yoğurtçu, thank you very much for being once again on theCUBE. It's always great to have you here.

Tendü Yoğurtçu: Thank you, Peter. It's wonderful to be here.

Peter Burris: Tendü Yoğurtçu is the CTO of Syncsort. And once again, I want to thank you very much for participating in this CUBE Conversation, with cloud on the mind. Until next time.