From around the globe, it's theCUBE, with digital coverage of Smart Data Marketplaces, brought to you by Io-Tahoe.

Digital transformation has really gone from buzzword to mandate. A digital business is a data business, and for the last several months we've been working with Io-Tahoe on an ongoing content series focused on smart data and automation to drive better insights and outcomes, essentially putting data to work. Today we're going to do a deeper dive on automating data discovery, and one of the thought leaders in this space is Ajay Vohora, the CEO of Io-Tahoe, who's once again joining me. Ajay, good to see you. Thanks for coming on.

Great to be here, David, thank you.

So let's start by talking about some of the business realities. What are the economics driving automated data discovery? Why is that so important?

Yeah, David, on this one it's a number of competing factors. We've got the reality of data, which may be sensitive, so there's control. The other elements are wanting to drive value from that data, so innovation; the ability to exchange data, because you can't really drive a lot of value without exchanging it; and managing the cost overheads. Data discovery is at the root of managing all of that in an automated way: classifying data into sets and putting policies in place to drive that automation.

Yeah, look, we have a picture of this. If we could bring it up, guys, because I want you, Ajay, to help the audience understand where data discovery fits in here. As we've talked about, this is a complicated situation for a lot of customers. They've got a variety of different tools, and you've laid it out nicely in this diagram. So take us through where that piece fits.

Yeah, we're at the right-hand side of this exchange. We're now in a data-driven economy where everything's connected through APIs that we consume online and through mobile apps. What's not apparent is the chain of activities and tasks that have to go into serving that data to an API. At the outset there may be many legacy systems, technologies, and platforms: on-premise, in cloud, hybrid, you name it. Across those silos, getting to a unified view is the heavy lifting. I think we've seen the great impact that BI tools such as Power BI, Tableau, Looker, Qlik, and so on, which are in our ecosystem, have had on visualizing data. CEOs, managers, people working in companies day to day get a lot of value from seeing the real-time activity, or the trend this month versus last month; those tools enable that. We hear a lot of good things about the work we're doing with Snowflake and MongoDB, and on the public cloud platforms, GCP and Azure, in building the pipelines that feed those analytics. But what often gets hidden is how you source data that could be locked into a mainframe, a data warehouse, or IoT systems, and pull all of that together. The reality is that it's a lot of heavy lifting, hands-on work that can be time consuming. And the issue there is that the data may have value; it might have the potential to impact the top line for a business, or outcomes for consumers. But you're never really sure unless you've done the investigation: discovered that data, unified it, and been able to serve it through to other technologies.
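To make that heavy lifting concrete, here is a minimal sketch of pulling the same business entity out of two disparate sources, a flat-file extract from a legacy system and an operational database table, and normalizing them into one unified view. The paths, table, and column names are illustrative assumptions, not Io-Tahoe's implementation; discovery tooling of the kind discussed here would derive the column mapping automatically rather than hard-coding it.

```python
import sqlite3

import pandas as pd

# Each silo names the same business entity differently. This mapping is
# exactly the kind of metadata that automated discovery would produce.
COLUMN_MAP = {
    "cust_id": "customer_id",     # mainframe flat-file extract
    "CustomerNo": "customer_id",  # CRM database table
    "eml": "email",
    "EmailAddress": "email",
}

def load_mainframe_extract(path: str) -> pd.DataFrame:
    """Read a flat-file extract exported from a legacy system."""
    return pd.read_csv(path)

def load_crm_table(db_path: str) -> pd.DataFrame:
    """Read a customer table from an operational database."""
    with sqlite3.connect(db_path) as conn:
        return pd.read_sql("SELECT CustomerNo, EmailAddress FROM customers", conn)

def unified_view(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Rename silo-specific columns onto shared business terms and merge."""
    renamed = [frame.rename(columns=COLUMN_MAP) for frame in frames]
    merged = pd.concat(renamed, ignore_index=True)
    return merged.drop_duplicates(subset="customer_id")

# Usage: customers = unified_view([load_mainframe_extract("extract.csv"),
#                                  load_crm_table("crm.db")])
```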
Guys, if you would bring that picture back up again, because Ajay, you made a point there and I want to land on it for a second: there's a lot of manual curating. An example would be the data catalog. Data scientists complain all the time that they're manually wrangling data, so you're trying to inject automation into that cycle. And the other piece I want you to address is the importance of APIs. You really can't do this without an architecture that allows you to connect things together; that's what enables some of the automation.

Yeah, I'll take that in two parts, David. On the APIs: virtual machines are connected by APIs, business rules and business logic are driven by APIs, and so are applications, so everything across the stack, from infrastructure down to the network and hardware, is connected through APIs. And the work of serving data through to an API, building those pipelines, is often miscalculated; people underestimate just how much manual effort it takes. We've got a nice list here, down at the bottom, of what we automate: those tasks of indexing, labeling, and mapping across different legacy systems. All of that takes away from the job of a data scientist or data engineer looking to produce value, monetize data, and help their business's data consumers.

Yeah, so it's that top layer that the business sees, and of course there's a lot of work that has to go into achieving it. I want to talk about some of the key tech trends that you're seeing. One of the things we talk about a lot is metadata; the importance of metadata can't be overstated. What are some of the big trends you're seeing, metadata and others?

Yeah, I'll summarize it as five. There's a trend now to look at metadata more holistically, across the enterprise, and that really makes sense when you're trying to look across different data silos and apply a policy to manage that data. So that's the control piece, that lever. On the other side, sometimes competing with that control around sensitive data and around managing the cost of data, is innovation: being able to speculate, experiment, and try things out where you don't really know what the outcome is. If you're a data scientist or an engineer, you've got a hypothesis, and therefore you've got that tension between control over data and innovation, driving value from it. Enterprise-wide metadata management is really helping to unlock where that latent value might be across those sets of data. The next piece is adaptive data governance. The controls that come from the data policemen, the data stewards, who are trying to protect the organization, the brand, and consumers' data, are necessary. But in different use cases you might want to nuance that and apply a different policy to govern the data, relevant to the context: where data is less sensitive, it can be used for innovation, and adapting the style of governance to fit the context is another trend we're seeing come up. A few others sit where we're working quite extensively, in automating data discovery. We're now breaking that down into what we can direct: where a business outcome is a known, upfront objective, we direct that data discovery towards it, which means applying our algorithms, our technology, and our tools towards solving a known problem. The other one is autonomous data discovery, which means allowing background processes to understand what changes are happening with data over time and flagging the anomalies (there's a small sketch of this below). The reason that's important is that when you look over a length of time and see different spikes and trends in activity, that's really giving a data ops team the ability to manage and calibrate how they're applying policies and controls to data. And the last two, David, that we're seeing: there's this huge drive towards self-service, reimagining how to put policy and data governance into the hands of a data consumer inside a business, or indeed the consumer themselves, self-service if they're a banking customer or a healthcare customer; and then the policies, controls, and rules, making sure those are all in place to adaptively serve the data marketplaces that we're now involved in creating.
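As a rough illustration of that autonomous data discovery idea, below is a minimal sketch of a background process flagging anomalies in a data profile metric over time, here daily row counts. The window size, threshold, and sample numbers are arbitrary assumptions for illustration; production systems would track richer profiles against learned baselines.

```python
import statistics

def flag_anomalies(daily_row_counts: list[int],
                   window: int = 7,
                   threshold: float = 3.0) -> list[int]:
    """Return indices of days whose row count deviates sharply
    from the trailing window, using a simple z-score test."""
    anomalies = []
    for i in range(window, len(daily_row_counts)):
        history = daily_row_counts[i - window:i]
        mean = statistics.mean(history)
        spread = statistics.pstdev(history) or 1.0  # avoid dividing by zero
        if abs(daily_row_counts[i] - mean) / spread > threshold:
            anomalies.append(i)
    return anomalies

# A sudden spike on the last day gets flagged for the data ops team.
counts = [1000, 1010, 990, 1005, 995, 1000, 1008, 5000]
print(flag_anomalies(counts))  # [7]
```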
I want to ask you about autonomous data discovery and adaptive data governance. Is the problem we're addressing there one of quality, in other words, that machines are better at this than humans? Is it one of scale, that humans just don't scale that well? Is it both? Can you add some color to that?

Yeah, honestly, it's the same equation that existed 10 years ago, 20 years ago. It's being exacerbated, but it's that equation of: how do I control all the things I need to protect? How do I enable innovation where it's going to deliver business value? How do I exchange data safely with a customer or somebody in my supply chain? And how do I do all of that whilst managing the fourth leg, which is cost overheads? There's no open checkbook here; if I'm a CIO or a CDO, I've got to figure out how I do all of this within a fixed budget. So those aspects have always been there. Now there are more choices, infrastructure in the cloud, API-driven applications, on-premise, and that's expanding the choices a business has in how it puts its data to work. It's also creating a layer of management and data governance that has to manage those four aspects: control, innovation, exchange of data, and the cost overheads.

That top layer of the first slide we showed was all about business value, so I wonder if we could drill into the business impact a little bit. What are your customers seeing specifically in terms of the impact of all this automation on their business?

Yeah, we've had some great results. A few of the biggest have been helping customers move away from manually curating their data and their metadata. There used to be a time when, for data quality initiatives or data governance initiatives, there'd be teams of people manually feeding a data catalog. It's great to have that inventory of classified data, to be able to understand the single version of the truth, but having 10 or 15 people manually process it and keep it up to date when it's a moving feast? The reality is that what's true about your data today changes: add another few data sources in a few months' time, start collaborating with new partners, and suddenly the landscape has changed and the amount of work has gone up. What we're finding is that by automating that data discovery and using it to feed our data catalog (sketched below), we're releasing a lot more time for our customers to spend on innovating with and managing their data. A couple of other impacts are around self-service data analytics: moving the choices about which data might have business value into the hands of business users and data consumers, for faster cycle times around generating insights. We're really helping there by automating the creation of the data sets that are needed for that. And the last piece I'd have to say, where we're seeing impacts more recently, is in the exchange of data. There are a number of marketplaces out there that are now being compelled to become more digital and rewire their business processes, and everything from an RPA initiative to automation to digital transformation is having CIOs, chief data officers, and enterprise architects rethink how they rewire the pipelines for their data to feed that transformation.
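To illustrate the automated curation just described, here is a minimal, rule-based sketch of classifying a column's contents and feeding a catalog entry. The patterns, the sampling, and the sensitivity rule are hand-written assumptions for illustration only, not Io-Tahoe's classification algorithms.

```python
import re

# Hypothetical rule set; a discovery product would learn and maintain
# patterns like these at scale rather than hand-coding them.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "phone": re.compile(r"^\+?[\d\s\-()]{7,15}$"),
}
SENSITIVE_CLASSES = {"email", "us_ssn", "phone"}  # treated as personal data

def classify_column(values: list[str], min_match: float = 0.8) -> str:
    """Label a column by the pattern that most of its sampled values match."""
    sample = [v for v in values if v]
    for label, pattern in PATTERNS.items():
        hits = sum(bool(pattern.match(v)) for v in sample)
        if sample and hits / len(sample) >= min_match:
            return label
    return "unclassified"

# Feed a catalog entry: physical column, inferred class, sensitivity flag.
inferred = classify_column(["a@b.com", "c@d.org", "e@f.net"])
catalog_entry = {
    "column": "contact_1",
    "class": inferred,
    "sensitive": inferred in SENSITIVE_CLASSES,
}
print(catalog_entry)  # {'column': 'contact_1', 'class': 'email', 'sensitive': True}
```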
Yeah, to me it comes down to monetization. Of course, that's for a for-profit industry; for nonprofits it's cost-cutting, or in the case of healthcare, which we'll talk about in a moment, it's patient outcomes. But the job of a chief data officer has gone from data quality, governance, and compliance to figuring out how data can be monetized, not necessarily by selling the data, but in how it contributes to the monetization of the company, and then understanding, specifically for that organization, how to apply it. And that is a big challenge. We chatted about this 10 years ago in the early days of Hadoop, when maybe one percent of companies had enough engineers to figure it out, but now the tooling is available, the technology is there, and the practices are there. The bottom line to me, Ajay, is: show me the money.

Absolutely, and it's that single lens, focusing in on the single view of the customer. Where we're helping is in pulling together those disparate, siloed sources of data to understand the needs of the patient, of the broker if it's insurance, of the supply chain manager if it's manufacturing. Providing that 360-degree view of the data is helping that individual unlock the value for the business. So data's providing the lens, provided you know which data it is that can assist in doing that.

And you know, you mentioned RPA before. I had an RPA customer tell me, she was a Six Sigma expert, that they would never have tried to apply Six Sigma to a business process, but with RPA they can do so very cheaply. What that means is lower cost, which means better employee satisfaction and, really importantly, better customer satisfaction and better customer outcomes. Let's talk about healthcare for a minute, because it's a really important industry, one that is ripe for disruption and, up until recently, has been pretty slow to adopt a lot of the major technologies available. What are you seeing in terms of this theme we're using, putting data to work, in healthcare specifically?

Yeah, healthcare has had a lot thrown at it. There's been a lot of legislative change recently, particularly in the US market, and in other economies healthcare is on a path to becoming more digital. Part of that is around price transparency. To operate effectively as a healthcare marketplace, having price transparency around what an elective procedure is going to cost, before taking that step forward, is super important to making an informed decision. If we look at the US, for example, healthcare costs have risen to $4 trillion annually, but even with all of that cost, we have healthcare consumers who are sometimes reluctant to take up healthcare even when they have symptoms. A lot of that is driven by not knowing what they're opening themselves up to.
And I think, David, if you or I were to book travel, a holiday maybe, or a trip, we'd want to know what we're in for and what we're paying for upfront. But sometimes in healthcare, the choice, the option, might be there, but the cost that comes with it isn't. So the recent legislation in the US is certainly helping to bring forward that price transparency. The underlying issue, though, is the disparate formats and types of data being used by payers, patients, employers, and different healthcare departments to try to make that work. Where we're helping on that aspect, particularly as it relates to price transparency, is in making that data machine readable (a simplified example appears below). Sometimes the beneficiary of data might be a person, but in a lot of cases now we're seeing different systems interact and exchange data in order to process a workflow, and being able to generate online lists of pricing from a provider, as negotiated with a payer, is really an enabling factor.

So guys, I wonder if you'd bring up the next slide, which is kind of the nirvana. On the previous slide, the middle was all different shapes, presumably disparate data; this is the outcome you want to get to, where everything fits together nicely and you've got this open exchange. It's not opaque as it is today, not bubblegum, band-aids, and duct tape. Describe this outcome you're trying to achieve, and maybe a little bit about what it's going to take to get there.

Yeah, that's a combination of a number of things. It's making sure the data is machine readable and available to APIs, which could be RPA tools. We're working with technology companies that apply RPA to healthcare specifically, to manage patient and payer data and bring it together. With our data discovery, what we're able to do is classify that data and make it available to a downstream tool, technology, or person to apply a workflow to it. So this looks like nirvana, it looks like utopia, but it's the end objective of a journey, and we can see that different economies are at different stages of maturity in turning healthcare into a digital service, even to the point where you can consume it from where you live, from home, through telemedicine and tele-care.

Yeah, and this is not just healthcare; you want to achieve that self-service data marketplace in virtually any industry. You're working with TCS, Tata Consultancy Services, to achieve this. A company like Io-Tahoe has to have partnerships with organizations that have deep industry expertise. Talk about your relationship with TCS and what you're doing specifically in this regard.

Yeah, we've been working with TCS for a long while now, and we'll be announcing some of those initiatives here. We're working together to reach their customers, where they've got a brilliant framework, Business 4.0, in which they reimagine with their clients how a business can operate with AI and automation and become more agile and digital. Our technology and the reams of patents in our portfolio, applied at scale, on a global scale, across industries such as banking, insurance, and healthcare, are allowing us to see a bigger impact on consumer outcomes and patient outcomes, and the feedback from TCS is that we're really helping those initiatives remove friction.
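Coming back to the machine-readable price transparency point above, here is a simplified sketch of what such a record can look like as structured JSON that a payer's, broker's, or RPA system could parse. The fields and values are illustrative assumptions and do not follow the exact schema that US price-transparency rules mandate.

```python
import json

# One hypothetical negotiated-rate record for an elective procedure.
negotiated_rate = {
    "procedure_code": "29881",  # CPT code for knee arthroscopy with meniscectomy
    "description": "Knee arthroscopy with meniscectomy",
    "gross_charge_usd": 12000.00,
    "payer": "ExamplePayer",         # hypothetical payer name
    "negotiated_rate_usd": 4850.00,  # illustrative figure
    "effective_date": "2024-01-01",
}

# Serializing to JSON is what makes the price list machine readable:
# another system can parse it without a person re-keying the data.
print(json.dumps(negotiated_rate, indent=2))
```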
They talk a lot about data friction, and I think that's a polite term for the image we just saw, with the disparate technologies and the legacy that has built up. So if we want to create a transformation, having that partnership with TCS across industries is giving us the reach and the impact on many different people's day-to-day jobs and lives.

Let's talk a little bit about the cloud. It's a topic we've hit on quite a bit in this content series. The cloud companies, the big hyperscalers, would put everything into the cloud, right? But customers are more circumspect than that. At the same time, machine intelligence, ML, AI: the cloud is the place to do a lot of that; it's where a lot of the innovation occurs. So what are your thoughts on getting to the cloud, putting data to work, if you will, with machine learning, and the work you're doing with AWS? What's your fit there?

Yeah, David, we work with all of the cloud platforms, Microsoft Azure, GCP, IBM, but we're expanding our partnership with AWS, and we're really opening up the ability to work with their greenfield accounts, where a lot of the data and technology still sits in the customer's own data centers, across banking, healthcare, manufacturing, and insurance. And for good reason: a lot of companies have taken the time to see what works well for them among the technologies the cloud providers are offering. In a lot of cases, testing services or analytics, using the cloud to move workloads there and drive data analytics, is a real game changer. Equally, there's good reason to keep a lot of systems on-premise where that makes sense from a cost or a liability point of view, and the number of clients we work with that have, and will keep, mainframe systems written in COBOL is no surprise to us. But they also want to tap into technologies that AWS has, such as SageMaker. The issue is, as a chief data officer, I don't have the budget to move everything to the cloud on day one. I might want to show some results up front to my business users and work closely with my chief marketing officer to look at what's happening in terms of customer trends and behavior. What are the customer outcomes, patient outcomes, and partner outcomes that I can achieve through analytics and data science? So we're working with AWS and with clients to manage that hybrid topology, where some of the data is in the cloud being put to work with AWS SageMaker, and Io-Tahoe is being used to identify where the data is that needs to be amalgamated and curated to provide the data set for machine learning and advanced analytics to have an impact for the business.

So what are the critical attributes you look at to help customers decide what to move and what to keep, if you will?

Well, one of the quickest outcomes we help a customer achieve is applying their business glossary, the items of data that mean something to them, across those different silos, and pulling all of that together into a unified view (there's a small sketch of that matching below). Once they've got that, a data engineer working with a business manager can think through how they want to create an application: what is the churn model, the loyalty or propensity model, we want to put in place here? How do we use predictive analytics to understand what a patient's needs are? That sort of innovation is what we're unlocking, and applying tools such as SageMaker on AWS to do the computation and build the models that deliver the outcome spans that whole value chain.

And it goes back to the first picture we put up, David. The outcome is that API; behind it, you've got a machine learning model that's been developed in a tool such as Databricks or a Jupyter notebook. That data has to be sourced from somewhere, and somebody has to say, yep, you've got permission to do what you're trying to do without falling foul of any compliance around the data. And it all goes back to discovering that data, classifying it, and indexing it in an automated way, which cuts those timelines down to hours and days.
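Here is a minimal sketch of the glossary-to-column matching just described, using simple string similarity from the standard library. The glossary terms, physical column names, and cutoff are illustrative assumptions, not Io-Tahoe's matching algorithm; in practice both lists would come from the catalog built during discovery.

```python
from difflib import SequenceMatcher

GLOSSARY = ["customer id", "annual revenue", "churn flag"]  # business terms
PHYSICAL_COLUMNS = ["CUST_ID", "ann_rev_usd", "CHURNED_YN", "load_ts"]

def normalize(name: str) -> str:
    """Reduce a physical column name to a comparable form."""
    return name.lower().replace("_", " ")

def match_glossary(columns: list[str], cutoff: float = 0.5) -> dict[str, str]:
    """Map each physical column to its closest glossary term, if close enough."""
    mapping = {}
    for col in columns:
        best_term, best_score = None, 0.0
        for term in GLOSSARY:
            score = SequenceMatcher(None, normalize(col), term).ratio()
            if score > best_score:
                best_term, best_score = term, score
        if best_term is not None and best_score >= cutoff:
            mapping[col] = best_term
    return mapping

# load_ts matches nothing well and is left out of the unified view.
print(match_glossary(PHYSICAL_COLUMNS))
# {'CUST_ID': 'customer id', 'ann_rev_usd': 'annual revenue', 'CHURNED_YN': 'churn flag'}
```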
Yeah, and it's the innovation part of your data portfolio, if you will, that you're going to put into the cloud and apply tools like SageMaker to, or your tool du jour, whatever your favorite tool is. You don't care; the customer's going to choose that. And the cloud vendors, maybe they'd like you to use their tools, but they're making their marketplaces available to everybody. It's that innovation piece, the part you want to apply the self-service data marketplace to, that really drives, as I said before, monetization. All right, give us your final thoughts, Ajay. Bring us home.

So my final thought on this, David, is that at the moment we're seeing a lot of value in helping customers discover their data using automation and automatically curate a data catalog. That unified view is then being put to work through our APIs, with an open architecture to plug in whatever tool or technology our clients have decided to use. And that open architecture feeds into the reality of what CIOs and chief data officers are managing, which is a hybrid on-premise and cloud approach using best of breed. Business users want to use a particular technology to get their business outcome, and having the flexibility to do that, no matter where your data is sitting, on-premise or in the cloud, is where self-service comes in. So that self-service view of what data I can plug together, drive, exchange, and monetize is where we're starting to see some real traction, with customers now accelerating, becoming more digital in order to serve their own customers.

Yeah, we've really seen a cultural mind shift away from complacency, and obviously COVID has accelerated this, but the combination of that cultural shift, the cloud, and machine intelligence tools gives me a lot of hope that the promises of big data will finally be lived up to in this next decade. So Ajay Vohora, thanks so much for coming back on theCUBE. You're a great guest, and I appreciate your insights.

Appreciate it, David, see you next time.

All right, keep it right there, and we'll be right back after this short break.