From around the globe, it's theCUBE, with digital coverage of smart data marketplaces, brought to you by Io-Tahoe.

Digital transformation has really gone from buzzword to mandate. The digital business is a data business, and for the last several months we've been working with Io-Tahoe on an ongoing content series focused on smart data and automation to drive better insights and outcomes, essentially putting data to work. Today we're going to do a deeper dive on automating data discovery, and one of the thought leaders in this space is Ajay Vohra, the CEO of Io-Tahoe, who's once again joining me. Ajay, good to see you, thanks for coming on.

Great, great to be here, David, thank you.

So let's start by talking about some of the business realities. What are the economics that are driving automated data discovery? Why is that so important?

Yeah, on this one, David, it's a number of competing factors. We've got the reality of data, which may be sensitive, so that's control. The other elements are wanting to drive value from that data, so innovation; you can't really drive a lot of value without exchanging data, so the ability to exchange data; and then managing those cost overheads. And data discovery is at the root of managing all of that in an automated way: classifying that data and setting some policies to put that automation in place.

Yeah, look, we have a picture of this, if we could bring it up, guys, because I want to help the audience understand where data discovery fits in here, Ajay. As we talked about, this is a complicated situation for a lot of customers. They've got a variety of different tools, and you've really laid it out nicely here in this diagram. So take us through where that piece fits.

Yeah, I mean, at the right-hand side of this exchange, we're really now in a data-driven economy where everything's connected through APIs that we consume online, through mobile apps. And what's not apparent is the chain of activities and tasks that have to go into serving that data to an API. At the outset there may be many legacy systems, technologies, platforms: on-premise, in cloud, hybrid, you name it. And across those silos, getting to a unified view is the heavy lifting. I think we've seen the great impact that BI tools such as Power BI, Tableau, Looker, Qlik, and so on, which are in our ecosystem, have had on visualizing data. CEOs, managers, people working in companies day to day get a lot of value from seeing what the real-time activity is, what the trend was this month versus last month. The tooling is there to enable that. We hear a lot of good things about the work we're doing with Snowflake and MongoDB, and on the public cloud platforms GCP and Azure, about building those pipelines to feed into those analytics. But what often gets hidden is how you source that data, which could be locked into a mainframe, a data warehouse, IoT data, and pull all of that together. The reality of that is it's a lot of heavy lifting; it's hands-on work that can be time-consuming. And the issue there is that the data may have value, it might have the potential to impact the top line for a business, or outcomes for consumers, but you're never really sure unless you've done the investigation: discovered it, unified it, and been able to serve it through to other technologies.

Guys, if you would bring that picture back up again, because Ajay, you made a point and I want to land on that for a second. There's a lot of manual curating.
An example would be the data catalog; data scientists complain all the time that they're manually wrangling data, and so you're trying to inject automation into that cycle. And then the other piece that I want you to address is the importance of APIs. You really can't do this without an architecture that allows you to connect things together, and that sort of enables some of the automation.

Yeah, I'll take that in two parts, David. The APIs: so virtual machines are connected by APIs; business rules and business logic are driven by APIs; applications too. Everything across the stack, from infrastructure down to the network and hardware, is all connected through APIs. And the work of serving data through to an API, building those pipelines, is often miscalculated: just how much manual effort that takes. For that manual effort, we've got a nice list here of what we automate, down at the bottom: those tasks of indexing, labeling, and mapping across different legacy systems. All of that takes away from the job of a data scientist or data engineer looking to produce value, monetize data, and help their business data consumers.

Yeah, so it's that top layer that the business sees, of course, but there's a lot of work that has to go into achieving it. I want to talk about some of the key tech trends that you're seeing. One of the things we talk about a lot is metadata; the importance of metadata can't be overstated. What are some of the big trends that you're seeing, metadata and others?

Yeah, I'll summarize it as five. There's a trend now to look at metadata more holistically, across the enterprise. And that really makes sense when you're trying to look across different data silos and apply a policy to manage that data. So that's the control piece, that lever. The other side, sometimes competing with that control around sensitive data and around managing the cost of data, is innovation: being able to speculate, experiment, and try things out where you don't really know what the outcome is. If you're a data scientist or engineer, you've got a hypothesis, and therefore you've got that tension between control over data and innovation and driving value from it. So enterprise-wide metadata management is really helping to unlock where that latent value might be across those sets of data. The other piece is adaptive data governance. Those controls that come from the data police, the data stewards, where they're trying to protect the organization, protect the brand, protect consumers' data, are necessary. But in different use cases you might want to nuance that and apply a different policy to govern the data, relevant to the context, where you may have data that is less sensitive and can be used for innovation. Adapting the style of governance to fit the context is another trend we're seeing come up. A few others are where we're investing quite extensively, in automating data discovery. We're now breaking that down into what we can direct: what do we know is the business outcome, the known upfront objective, and how do we direct that data discovery towards it? That means applying our algorithms, our technology, and our tools towards solving a known problem. The other one is autonomous data discovery, and that means allowing background processes to understand what changes are happening with data over time and flagging anomalies. And the reason that's important is that looking over a length of time at the different spikes and trends in activity really gives a DataOps team the ability to manage and calibrate how they're applying policies and controls to data.
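(As a rough illustration of the anomaly flagging Ajay describes, here is a minimal Python sketch, assuming a background profiler has been collecting daily metric snapshots for a column. The metrics, thresholds, and function names are illustrative only, not Io-Tahoe's actual algorithms.)

```python
import statistics

def flag_anomaly(history, latest, threshold=3.0):
    """Compare the latest profile snapshot of a column metric (e.g. row
    count or null rate) against its history; flag large deviations."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    z_score = abs(latest - mean) / stdev
    return z_score > threshold

# Daily null-rate snapshots for a column, gathered by a background profiler.
null_rate_history = [0.01, 0.02, 0.01, 0.02, 0.01, 0.02]
todays_null_rate = 0.35  # a spike worth surfacing to the DataOps team

if flag_anomaly(null_rate_history, todays_null_rate):
    print("Anomaly: null rate shifted sharply; review policies for this column")
```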
And the last two, David, that we're seeing: there's this huge drive towards self-service, so reimagining how to put policy and data governance into the hands of a data consumer inside a business, or indeed the consumer themselves, self-service if they're a banking customer or a healthcare customer. And then the policies, controls, and rules: making sure those are all in place to adaptively serve the data marketplaces we're now involved in creating.

I want to ask you about the autonomous data discovery and the adaptive data governance. Is the problem we're addressing there one of quality, in other words, machines are better than humans at doing this? Is it one of scale, that humans just don't scale that well? Is it both? Can you add some color to that?

Yeah, honestly, it's the same equation that existed 10 years ago, 20 years ago. It's being exacerbated, but it's that equation of: how do I control all the things that I need to protect? How do I enable innovation where it's going to deliver business value? How do I exchange data with a customer, or with somebody in my supply chain, safely? And how do I do all of that whilst managing the fourth leg, which is cost overheads? There's not an open checkbook here; if I'm a CIO or a CDO, I've got to figure out how I do all of this within a fixed budget. So those aspects have always been there. Now, with more choices, infrastructure in the cloud, API-driven applications, on-premise, the options a business has in how they put their data to work are expanding. That's also creating a layer of management and data governance that really has to manage those four aspects: control, innovation, exchange of data, and the cost overhead.

That top layer of the first slide that we showed was all about the business value, so I wonder if we could drill into the business impact a little bit. What are your customers seeing specifically in terms of the impact of all this automation on their business?

Yeah, so we've had some great results. A few of the biggest have been helping customers move away from manually curating their data and their metadata. There used to be a time when, for data quality or data governance initiatives, there'd be teams of people manually feeding a data catalog. And it's great to have that inventory of classified data, to be able to understand the single version of the truth. But having 10 or 15 people manually process that and keep it up to date when it's a moving feast? The reality is, what's true about your data today? Add another few sources to your business in a few months' time, start collaborating with new partners, and suddenly the landscape has changed and the amount of work has gone up. What we're finding is that by automating that data discovery and feeding the data catalog, we're releasing a lot more time for our customers to spend on innovating with and managing their data. A couple of others are around self-service data analytics: moving the choices about what data might have business value into the hands of business users and data consumers, to get faster cycle times around generating insights. And we're really helping there by automating the creation of the data sets that are needed for that.
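(To make the automation point concrete: below is a toy Python sketch of rule-based classification feeding a catalog tag, the kind of task teams used to do by hand. The patterns and tags are illustrative, not Io-Tahoe's classifiers; real discovery tools combine rules with ML models.)

```python
import re

# Illustrative patterns only; a real tool would use many more signals.
CLASSIFIERS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "phone": re.compile(r"^\+?[\d\s().-]{7,15}$"),
    "ssn":   re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def classify_column(sample_values, min_ratio=0.8):
    """Tag a column when most sampled values match a known pattern."""
    for tag, pattern in CLASSIFIERS.items():
        hits = sum(1 for value in sample_values if pattern.match(value))
        if sample_values and hits / len(sample_values) >= min_ratio:
            return tag
    return "unclassified"

# Sampled values from a column in some silo; the tag feeds the data catalog.
sample = ["alice@example.com", "bob@example.org", "carol@example.net"]
print(classify_column(sample))  # -> "email"
```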
And the last piece I'd have to say, where we're seeing impacts more recently, is in the exchange of data. There are a number of marketplaces out there that are now being compelled to become more digital, to rewire their business processes, and everything from RPA initiatives to the automation involved in digital transformation is having CIOs, Chief Data Officers, and Enterprise Architects rethink how they rewire the pipelines for their data to feed that digital transformation.

Yeah, to me it comes down to monetization. Of course, that's for a for-profit industry; for nonprofits, for sure, it's cost-cutting, or in the case of healthcare, which we'll talk about in a moment, it's patient outcomes. But the job of a Chief Data Officer has gone from data quality, governance, and compliance to really figuring out how data can be monetized, not necessarily selling the data, but how it contributes to the monetization of the company, and then really understanding, specifically for that organization, how to apply that. And that is a big challenge. We chatted about it 10 years ago, you know, in the early days of Hadoop, and back then only one percent of companies had enough engineers to figure it out. But now the tooling is available, the technology is there, and the practices are there. And that really, to me, is the bottom line, Ajay: it's show me the money.

Absolutely. It's definitely about focusing in on the single view of that customer, and where we're helping is to pull together those disparate, siloed sources of data to understand the needs of the patient, or of the broker if it's insurance, or of the supply chain manager if it's manufacturing. Providing that 360-degree view of the data is helping that individual unlock the value for the business. So data is providing the lens, provided you know which data it is that can assist in doing that.

And you know, you mentioned RPA before. I had an RPA customer tell me, she was a Six Sigma expert, and she told me, we would never try to apply Six Sigma to a business process, but with RPA we can do so very cheaply. Well, what that means is lower cost, better employee satisfaction, and, really importantly, better customer satisfaction and better customer outcomes. Let's talk about healthcare for a minute, because it's a really important industry, it's one that is ripe for disruption, and it has really been, up until recently, pretty slow to adopt a lot of the major technologies that have been made available. What are you seeing in terms of this theme we're using of putting data to work in healthcare specifically?

Yeah, I mean, healthcare has had a lot thrown at it. There's been a lot of change in terms of legislation recently, particularly in the US market, and in other economies healthcare is on a path to becoming more digital. Part of that is around transparency of price. To operate effectively as a healthcare marketplace, having price transparency around what an elective procedure is going to cost, before taking that step forward, is super important to making an informed decision. So if we look at the US, for example, we've seen healthcare costs rise to $4 trillion annually, but even with all of that cost, we have healthcare consumers who are sometimes reluctant to take up healthcare even when they have symptoms. And a lot of that is driven by not knowing what they're opening themselves up to.
And I think, David, if you or I were to book travel, a holiday maybe, or a trip, we'd want to know what we're in for, what we're paying for, up front. But sometimes in healthcare that choice, the option, might be there in-plan, but the cost that comes with it isn't visible. So the recent legislation in the US is certainly helpful in bringing forward that price transparency. The underlying issue there, though, is the disparate formats and types of data being used by payers, patients, employers, and different healthcare departments to try and make that work. And where we're helping on that aspect, particularly related to price transparency, is in making that data machine-readable. Sometimes with data the beneficiary might be a person, but in a lot of cases now we're seeing different systems interact and exchange data in order to process a workflow. Being able to generate online lists of pricing from a provider, as negotiated with a payer, is really an enabling factor.
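(To give a feel for what "machine-readable" buys you here, the Python sketch below parses a simplified pricing record and prints rows another system could consume. The field names and values are invented for illustration; real transparency-in-coverage files are far larger and richer than this.)

```python
import json

# A simplified, invented example of a machine-readable price record.
document = """
{
  "provider": "Example Hospital",
  "negotiated_rates": [
    {"procedure": "MRI, lower spine", "code": "72148", "payer": "Acme Health", "rate": 512.00},
    {"procedure": "Colonoscopy",      "code": "45378", "payer": "Acme Health", "rate": 1450.75}
  ]
}
"""

data = json.loads(document)
for item in data["negotiated_rates"]:
    # Because the data is structured, a downstream workflow (or an RPA bot)
    # can compare rates across providers with no manual re-keying.
    print(f'{data["provider"]}: {item["procedure"]} ({item["code"]}) -> ${item["rate"]:.2f}')
```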
So guys, I wonder if you'd bring up the next slide, which is kind of the nirvana. If you saw the previous slide, the middle there was all different shapes, and presumably disparate data. This is the outcome you want to get to, where everything fits together nicely and you've got this open exchange. It's not opaque as it is today; it's not bubblegum, band-aids, and duct tape. Describe the outcome you're trying to achieve, and maybe a little bit about what it's going to take to get there.

Yeah, that's a combination of a number of things. It's making sure that the data is machine-readable, and making it available to APIs, which could be RPA tools. We're working with technology companies that employ RPA for healthcare specifically, to manage patient and payer data and bring it together. In our data discovery, what we're able to do is classify that data and make it available to a downstream tool, technology, or person to apply a workflow to it. So this looks like nirvana, it looks like utopia, but it's the end objective of a journey, and we can see that different economies are at different stages of maturity in turning healthcare into a digital service, even so that you could consume it from where you live, at home, with telemedicine and telecare.

Yeah, and this is not just for healthcare; you want to achieve that self-service data marketplace in virtually any industry. You're working with TCS, Tata Consultancy Services, to achieve this. A company like Io-Tahoe has to have partnerships with organizations that have deep industry expertise. Talk about your relationship with TCS and what you guys are doing specifically in this regard.

Yeah, we've been working with TCS for a long while now, and we'll be announcing some of those initiatives here, where we're working together to reach their customers. They've got a brilliant framework, Business 4.0, where they're reimagining with their clients how their business can operate with AI and automation and become more agile and digital. Our technology, the reams of patents we have in our portfolio, being able to apply that at a global scale, across industries such as banking, insurance, and healthcare, is really allowing us to have a bigger impact on consumer outcomes and patient outcomes. And the feedback from TCS is that we're really helping remove the friction in those initiatives. They talk a lot about data friction. I think that's a polite term for the image that we just saw, with the disparate technologies and the legacy that has built up. So if we want to create a transformation, having that partnership with TCS across industries is giving us that reach and that impact on many different people's day-to-day jobs and lives.

Let's talk a little bit about the cloud. It's a topic we've hit on quite a bit in this content series. The big hyperscalers say put everything into the cloud, right? But customers are more circumspect than that. At the same time, machine intelligence, ML, AI: the cloud is a place to do a lot of that; that's where a lot of the innovation occurs. So what are your thoughts on getting to the cloud, putting data to work, if you will, with machine learning? What are you doing with AWS, what's your fit there?

Yeah, David, we work with all of the cloud platforms, Microsoft Azure, GCP, IBM, but we're expanding our partnership now with AWS, and we're really opening up the ability to work with their greenfield accounts, where a lot of that data and technology is still in the customer's own data centers. And that's across banking, healthcare, manufacturing, and insurance. For good reason, a lot of companies have taken the time to see what works well for them with the technologies the cloud providers are offering. In a lot of cases they're testing services or analytics; using the cloud to move workloads there and drive data analytics is a real game changer. But there's also good reason to maintain a lot of systems on-premise, if that makes sense from a cost and liability point of view. The number of clients we work with that have, and will keep, their mainframe systems written in COBOL is no surprise to us. But equally, they want to tap into technologies AWS has, such as SageMaker. The issue is, as a Chief Data Officer, I don't have the budget to move everything to the cloud on day one. I might want to show some results up front to my business users, work closely with my Chief Marketing Officer to look at what's happening in terms of customer trends and behavior: what are the customer, patient, and partner outcomes I can achieve through analytics and data science? So we're working with AWS and with clients to manage that hybrid topology, with some of the data in the cloud being put to work with AWS SageMaker, and Io-Tahoe being used to identify where the data is that needs to be amalgamated and curated to provide the data set for machine learning and advanced analytics to have an impact for the business.

So what are the critical attributes you're looking at to help customers decide what to move and what to keep, if you will?

Well, one of the quickest outcomes we help a customer achieve is to apply their business glossary, the items of data that mean something to them, across those different silos, and pull all of that together into a unified view. Once they've got that, a data engineer working with a business manager can think through how they want to create an application. What is the churn model, the loyalty or propensity model, that we want to put in place here? How do we use predictive analytics to understand what a patient's needs are? That sort of innovation is what we're unlocking; applying tools such as SageMaker on AWS to then do the computation and build the models that deliver the outcome is across that value chain.
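(A minimal sketch of the glossary step Ajay describes: fuzzy-matching physical column names from different silos to business glossary terms to assemble a unified view. The matching heuristic, cutoff, and names are illustrative, not Io-Tahoe's implementation.)

```python
from difflib import SequenceMatcher

# Business glossary terms, plus physical column names from two silos.
GLOSSARY = ["customer id", "date of birth", "policy number"]
columns = {
    "mainframe": ["CUST_ID", "DOB", "POL_NO"],
    "warehouse": ["customer_identifier", "birth_date", "policy_num"],
}

def best_term(column, glossary, cutoff=0.3):
    """Naive fuzzy match of a column name to its closest glossary term."""
    normalized = column.lower().replace("_", " ")
    scored = [(SequenceMatcher(None, normalized, term).ratio(), term)
              for term in glossary]
    score, term = max(scored)
    return term if score >= cutoff else None

for silo, cols in columns.items():
    for col in cols:
        print(f"{silo}.{col} -> {best_term(col, GLOSSARY)}")
```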
And it goes back to the first picture we put up, David. The outcome is that API. On the back of it you've got a machine learning model that's been developed in a tool such as Databricks or a Jupyter notebook. That data has to be sourced from somewhere. Somebody has to say, yep, you've got permission to do what you're trying to do without falling foul of any compliance around the data. And it all goes back to discovering that data, classifying it, and indexing it in an automated way, to cut those timelines down to hours and days.

Yeah, and it's the innovation part of your data portfolio, if you will, that you're going to put into the cloud and apply tools like SageMaker and others to, your tool du jour, whatever your favorite tool is. You don't care; the customer is going to choose that. And the cloud vendors, maybe they want you to use their tool, but they're making their marketplaces available to everybody. It's that innovation piece, the one you want to apply that self-service data marketplace to and really drive, as I said before, monetization. All right, give us your final thoughts, Ajay, bring us home.

So my final thought on this, David, is that at the moment we're seeing a lot of value in helping customers discover their data using automation and automatically curating a data catalog. That unified view is then being put to work through our APIs, with an open architecture to plug in whatever tool or technology our clients have decided to use. And that open architecture is really feeding into the reality of what CIOs and Chief Data Officers are managing, which is a hybrid on-premise and cloud approach to using best of breed. Business users want to use a particular technology to get their business outcome, and having the flexibility to do that, no matter where your data is sitting, on-premise or in cloud, is where self-service comes in. So that self-service view of what data I can plug together, exchange, and monetize is where we're starting to see some real traction, with customers now accelerating, becoming more digital to serve their own customers.

Yeah, we really have seen a cultural mind shift away from complacency, and obviously COVID has accelerated this, but the combination of that cultural shift, the cloud, and machine intelligence tools gives me a lot of hope that the promises of big data will ultimately be lived up to in this next 10 years. So Ajay Vohra, thanks so much for coming back on theCUBE. You're a great guest, and I appreciate your insights.

Appreciate it, David, see you next time.

All right, keep it right there, everybody; we're right back after this short break.

Are you interested in test-driving the Io-Tahoe platform? Kickstart the benefits of data automation for your business through the Io Labs program, a flexible, scalable sandbox environment on the cloud of your choice, with setup, service, and support provided by Io-Tahoe. Click on the link and connect with a data engineer to learn more and see Io-Tahoe in action.

From around the globe, it's theCUBE, with digital coverage of smart data marketplaces brought to you by Io-Tahoe.

We're back. We're talking about smart data, and have been for several weeks now. Really, it's all about injecting intelligence and automation into the data lifecycle and the data pipeline. Today we're drilling into smart data marketplaces, really trying to get to that self-serve, unified, trusted, secured, and compliant data model. And this is not trivial.
With me to talk about some of the nuances involved in actually getting there are folks with experience doing it. Ved Sen is here; he's the digital evangelist with Tata Consultancy Services, TCS. And Ajay Vohra is back; he's the CEO of Io-Tahoe. Guys, great to see you. Thanks so much for coming on.

Good to see you, Dave.

Hi, Dave. Nice to be here.

Ajay, let's start with you. Let's set up the smart data concept. What's that all about? What's your perspective?

Yeah, so our way of thinking about this is: you've got data, and it has latent value. It's really about discovering the properties of that data. Does it have value? Can you put that data to work? And the way we go about that is with algorithms and machine learning, to generate signals in that data and identify patterns, which means we can start to discover how we can apply that data downstream. What value can we unlock for a customer and the business?

Well, you've been on this, I mean, you're really like a laser. Why this issue? Did you see a gap in the marketplace in terms of talking to customers? Maybe you can help us understand the origin.

Yeah, I think the gap has always been there; it's just become more apparent in recent times with big data. The ability to manually work with volumes of data in the petabytes is prohibitively complex and expensive, so you need a different route, a different set of tools and methods. Metadata: data that helps you understand your data. That's what we're focused on, discovering and generating that metadata, which then allows you to automate those DataOps processes. So the gap, David, is being felt by business enterprises in all sectors, healthcare, telecoms, in putting their data to work.

So, Ved, let's talk a little bit about your role. You work with a lot of customers, and I see you, as an individual and as a company, really trying to transform what is a very challenging industry that's ripe for transformation. Maybe you could give us your perspective: what kind of signals are you looking for from the data pipeline? And then we'll get into how you're helping transform healthcare.

Thanks, David. You know, I think this year has been one of those years where we've all learned about this idea of unknown unknowns, where something comes around the corner that you're completely not expecting. And that's really hard to plan for, obviously. I think what we need is the ability to find the early signals and act on things as soon as we can. The COVID-19 scenario is, hopefully, a once-in-a-generation thing, but most businesses struggle with the idea that they may have the data there in their systems, yet they still don't know which bit of it is really valuable and what the signals are that they should be watching for. The interesting thing here is the ability for us to extract, from a mass of data, the most critical and important signals. And I think that's where we want to focus.

And so talk a little bit about healthcare in particular, your role there, and maybe, at a high level, how Tata and your ecosystem are helping transform healthcare.

So if you look at healthcare, you've got the part where people need active intervention from a medical professional.
And then you've got this larger body of people, typically elderly people, who aren't unwell but have frailties, have underlying conditions, and are very vulnerable, especially in the world we're in now, in the COVID-19 scenario. What we're trying to look at is: how do we keep people who are elderly, frail, and vulnerable safe in their own homes, rather than moving them to care homes, where there has been an incredibly high level of infection for things like COVID-19? The world works better if you can keep people safe in their own homes. And if you can see the slide we've got, we're also talking about a world where care is expensive. In most Western countries, especially in Western Europe, the number of elderly people is increasing quite significantly as a percentage of the population, and resources just are not keeping up. We don't have enough people, we don't have enough funding, to look after them effectively. And the care industry that used to do that job has been struggling badly. So it's kind of a perfect storm, and it creates the need for technology intervention there.

In that space, what we're saying is that the data signals we want to receive are exactly what you, as a relative, a son or a daughter, might want from a parent: to know that everything's okay, that today has been just like every other day, that there are no anomalies in their daily living; and to get the signals that might tell us something's wrong, something's not quite right. We don't need very complex diagnostics, we just need to know something's not quite right: that my dad hasn't woken up at seven o'clock as he always does, and by nine o'clock there's still no movement, so maybe he's a bit unwell. It's that kind of signal which, if we can generate it, can make a dramatic difference to how we look after these people, whether through professional carers or family members. So what we're looking to do is sensor-enable the homes of vulnerable people, so that those data signals come through to us in a curated manner, in a way that protects the privacy and security of the individual but gives the right people, the carers or chosen family members, access to the signals: alerts that might tell you there was too much movement at night, or the front door was left open, things like that, that would give you a reason to call in and check.

Everybody I've spoken to about this has an example of an uncle, a relative, or a parent that they've looked after, and all they're looking for is a signal. There are even stories like, my father's neighbor calls me when he doesn't open his curtains by 11 o'clock. If you think about it, that's a data signal that something might not be all right. What we're trying to do with technology is create those kinds of data signals, because ultimately the healthcare system works much better if you can prevent rather than cure. Every dollar you put into prevention saves maybe $3 to $5 downstream, so the economics of it also work in our favor. And those signals give family members the confidence to act.
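(The kind of signal Ved describes can be expressed very simply. Below is a minimal, hypothetical Python sketch: a rule that raises an alert when no morning movement has been recorded by an expected hour. The event format and threshold are invented for illustration, not any particular product's design.)

```python
from datetime import datetime, time

# Motion events from a sensor-enabled home: (timestamp, room).
events = [
    (datetime(2020, 9, 17, 6, 55), "bedroom"),
    (datetime(2020, 9, 17, 7, 2), "kitchen"),
]

def morning_check(events, day, expected_by=time(9, 0)):
    """Alert if no movement was recorded between midnight and the expected hour."""
    morning = [ts for ts, _ in events
               if ts.date() == day and ts.time() <= expected_by]
    if not morning:
        return f"ALERT {day}: no movement by {expected_by}; worth a call to check in"
    return f"OK {day}: first movement at {min(morning).time()}"

print(morning_check(events, datetime(2020, 9, 17).date()))
print(morning_check(events, datetime(2020, 9, 18).date()))  # no events -> alert
```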
Ajay, it's interesting to hear what Ved was talking about in terms of the unknowns, because when you think about the early days of the computer industry, there were a lot of knowns. The processes were known; it was the technology that was the big mystery. Now I feel like it's flipped; we've certainly seen with COVID that the technology is actually quite well understood, quite mature and reliable. One of the examples is automated data discovery, which is something you've been focused on at Io-Tahoe. Why is automated data discovery such an important component of a smart data lifecycle?

Yeah, I mean, if we look at the schematic, David, this one moves from left to right, and right at the outset you've got that latent data. The value is latent because you don't know: can that data be applied, can it be put to work or not? And the objective really is about driving some form of exchange or monetization of the data. If you think about insurance or healthcare, you've got lots of different parties: providers, payers, patients. Everybody's looking to make some kind of exchange of information. The difficulty is that in all of those organizations, the data sits within its own systems. So data discovery, if we drill into the focus of it, is about understanding which data has value, classifying that data so it can be applied, and being able to tag it so it can then be put to use. It's the real enabler for DataOps.

So maybe talk a little bit more about this. We're trying to get to self-service; it's something we hear a lot about. You mentioned putting data to work. It seems to me that if the business can have access to that data and serve themselves, that's a way to put data to work. Do you have thoughts on that?

Yeah. Thinking back to what IT, the IT function in a business, could provide, there were limitations around infrastructure, around scaling, around compute. Now that we're in a digital economy driven by APIs, your infrastructure, your data, your business rules, your intelligence, your models are all on the back of an API. So the options for how you can drive value and exchange that data become limitless. What that allows us to do is be more creative, if we can understand which data has value for which use case.

Ved, let's talk a little bit about the US healthcare system; it's a good use case. I was recently at a Chief Data Officer conference, listening to the CDO of Johns Hopkins talk about the multiple different formats they had to ingest to create that COVID map. They even had some PDFs. They had different definitions, and that sort of underscored for me the state of the US healthcare industry. I'm not as familiar with the UK and Europe generally, but I am familiar with the US healthcare system and the diversity that's there, the duplication of information, and the like. Maybe you could summarize your perspective and give us the before, and your vision of the after, if you will.

The US, of course, is a particularly large and complex system; we all know that. We also know, I think there is research that suggests, that in the US the per capita spend on healthcare is among the highest in the world. I think it's about 17%, and that compares to just under 9%, which is a typical European figure. So it's almost double that, but the outcomes are still vastly poorer. When Ajay and I were talking earlier, I think we agreed there's this concept of data friction. When you've got multiple players in an ecosystem trying to provide a single service, as a patient you are receiving a single healthcare service, but there are probably a dozen, up to 20, different organizations that have to collaborate to make sure you get the top-of-the-line healthcare service that that kind of investment deserves.
And what prevents that from happening, very often, is what we would call data friction, which is the inability of whole organizations to effectively share data. Something as simple as a healthcare record which says: this is Dave, this is Ajay, and when we go to a hospital for anything, whatever happens, that healthcare record can capture all the information and tie it to us as individuals. And if you go to a different hospital, that record will follow you. That's how you would expect it to be implemented, but I think we're still on that journey, and there are lots and lots of challenges. I've seen anecdotal data about people who suffered because they weren't carrying a card when they went into hospital, because that card held the critical elements of their data. But in today's world, should you need to carry a piece of paper, or can that entire thing be a digital data flow that doesn't fall over for lack of a piece of paper? So the vision, I think, is an effective data exchange or marketplace, backed by a kind of backbone model where people agree and sign off on a data standard, where each individual's data is always tied to the individual. So if you want to move states, move providers, or change insurance companies, none of that would impact your medical history, your data, or the ability of care and medical professionals to access that data at the point of need and at the point of healthcare delivery. That's the vision we're looking at. But as you rightly said, there are an enormous number of challenges, partly because of the history: technology enablement of healthcare started early, so there's a lot of legacy as well. So we shouldn't trivialize the challenges the industry faces, but that, I think, is the way we want to go.

Well, privacy is obviously a huge one, and a lot of the processes are built around non-digital workflows, and what you're describing is a flip to digital-first. I mean, as a consumer, as a patient, I want an app for that, so I can see my own data, I can see price transparency, and I can give access to the people that I think need it. And that is a daunting task, isn't it?

Absolutely. And I think the implicit idea in what you just said, which is very powerful, is that on that app you want the control.

Yes.

And sometimes you want to be able to change access to the data at a given point. Right now I'm at the hospital, so I would like you to access my data; when I walk away, or maybe after three days, I want to revoke that access. It's that level of control. And it is by no means a trivial problem, but that's where you need the data automation tools. If you try to do any of this manually, we'll be here for another decade trying to solve it. That's where tools like Io-Tahoe come in, because to do this, a lot of the heavy lifting behind the scenes has to be automated. There has to be a machine churning through that and presenting the simpler options. And you were talking about it just a little while ago, Ajay. I was reminded of the example of how McDonald's serve Coke, because of the self-serve idea: you can go in and do your own ordering off a menu, or you can go and select from different flavors at a Coke machine and choose your own particular blend. It's a very trivial example, but I think that's the world we want to get to with access to data as well.
If it was that simple for consumers, for enterprises, for business people, for doctors, then that's where we'd ultimately want to arrive. But of course, to make something very simple for the end user, somebody has to solve for the complexity behind the scenes.

So Ajay, it seems to me there are two major outcomes here. One, the most important, I guess, is patient outcomes, and the other is cost. We talked about the cost issues; we all, in the US especially, understand the concerns about rising healthcare costs. So my question is: how does a smart data marketplace fit into achieving those two very important outcomes?

Well, think about how automation is enabling that. We've got different data formats, the manual tasks that are involved, duplication of information. The administrative overhead of that alone, and the work, the rework, and the cycles of work it generates: that wasted effort is really what we're trying to help eliminate with data. With that wasted effort comes the time and money to employ people to work through those siloed systems. So getting to the point where there is an exchange, a marketplace, just as there would be for banking or insurance, is really about automating the classification of data to make it available to a system that can pick it up through an API, run a machine learning model, and manage a workflow, a process.

Right, and you mentioned banking and insurance; you're right. I mean, we've actually come a long way in terms of know-your-customer, and applying that to know-your-patient would be very powerful. I'm interested in what you guys are doing together, just in terms of your vision. Are you going to market together? What are you seeing in terms of promoting or enabling this self-service, self-care? Maybe you could talk a little bit about Io-Tahoe and Tata, the intersection at the customer.

Sure. I think we've been really impressed with the TCS vision of Business 4.0: how they're reimagining traditional industries, whether it's insurance, banking, or healthcare, and bringing together automation, agile processes, robotics, and AI. Once those technology enablers are brought together to reimagine how those services can be delivered digitally, all of them are dependent on data. So we see a really good fit here: to help understand the legacy, the historic situation that has built up over time in an organization, and to shine a light on what's meaningful in there, to migrate to the cloud or to drive a digital twin or data science project.

Anything you can add to that?

Sure. I mean, we do take the Business 4.0 model quite seriously as the lens with which to look at any industry, and what I talked about in healthcare was an example of that. For us, Business 4.0 means a few very specific things. The technology that we use in today's world should be agile, automated, intelligent, and cloud-based; these have become kind of hygiene factors now. On top of that, the businesses we build should be mass-customized, they should be risk-embracing, they should engage ecosystems, and they should strive for exponential value: not 10% growth year on year, but doubling or tripling every three or four years, because that's the competition most businesses are facing today. And within that, the Tata Group itself is an extremely purpose-driven business. We really believe that we exist to serve communities, not just one specific set, i.e. shareholders, but the broader community in which we live and work.
And I think this framework also allows us to apply that to things like healthcare, to education, and to a whole vast range of areas where everybody has a vision of using data science or doing really clever stuff with algorithms. But what becomes clear is that to do any of that, the first thing you need is the foundational data piece, and if the foundation isn't right, then no matter how much you invest in the data science tools, you won't get the answers you want. The work we're doing with Io-Tahoe, for me, is particularly exciting because it sorts out that foundational piece. And at the end of it, to make all of this, I will repeat, simple and easy to use for the end user, whoever that is. I realize I'm probably the first person who's used fast food as a shining example for healthcare in this discussion, but you can take a lot of different examples. Today, if you press a button and start a car, that's simplicity, but someone has solved for that. And that's what we want to do with data as well.

That makes a lot of sense to me. We talk a lot about digital transformation and the digital business, and I would observe that a digital business puts data at its core. The best example, of course, is Google, an all-digital business, but take a company like Amazon, which obviously has a massive physical component to its business: data is at the core. And that's exactly my takeaway from this discussion. Both of you are talking about putting data at the core, simplifying it, and making sure it's compliant. In healthcare it's taking longer because it's such a high-risk industry, but it's clearly happening, and COVID, I guess, has been an accelerant. Guys, Ajay, I'll start with you. Any final thoughts that you want to leave the audience with?

Yeah, we're really pleased to be working with TCS. We've been able to explore how to put data to work in a range of different industries. Ved has mentioned healthcare and telecoms; banking and insurance are others. And the impact seems to be the same: wherever we see exciting digital transformations being planned, being able to accelerate them and unlock the value from data is where we're having an impact. And it's good that we can help patients in the healthcare sector, and consumers in banking, realize a better experience through having a more joined-up marketplace for their data.

And Ved, what excites me about this conversation is that, as a patient, or as a consumer helping loved ones, I can go to the web and search and find a myriad of possibilities. What you're envisioning here is really personalizing that with real-time data, and that to me is a game changer. Your final thoughts?

Thanks, David. I absolutely agree with you that this idea of data-driven simplicity is absolutely at the forefront. But I think if you were to design an organization today, you might design it very differently from how most companies are structured, and maybe Google and Amazon are better examples of that, because you almost have to think of a business as having a data engine room at its core. A lot of businesses are trying to get to that stage, whereas what we call digital natives are those who started life with that premise. So I absolutely agree with you on that.
But extending that a little bit: if you think of most industries as ecosystems that have to collaborate, then you've got multiple organizations that also have to exchange data to achieve shared outcomes, whether you look at the supply chains of automobile manufacturers, or insurance companies, or healthcare, as we've been discussing. So I think that's the next level of change we want to be able to make: to do this at scale, across organizations, at industry level, or at population scale for healthcare.

Yeah, thank you for that. Go ahead, Ajay.

David, that's where it comes back again to the origination, where we've come from in big data. The volume of data, combined with the specificity of individualizing and personalizing a service around an individual amongst that mass of data from different providers, is where it's exciting that we're able to have an impact.

Well, and you know, Ajay, I'm glad you brought that up, because in the early days of big data there were only a handful of companies, the biggest financial institutions and obviously the internet giants, who had all the engineers needed to take advantage of it. But with companies like Io-Tahoe and others, and the investments the industry has made in providing the tools and simplifying them, especially with machine intelligence, AI, and machine learning becoming embedded into the tooling, everybody can have access to them: small, medium, and large companies. That's really, to me, the exciting part of this new era we're entering.

Yeah, and we're pleased to take it down to the level of not-for-profits and smaller businesses too, businesses that want to innovate and leapfrog into growing the digital delivery of their services.

I know we're about out of time, but Ved, what you were saying about TCS's responsibility to society I think is really, really important. Large companies like yours, I believe, and you clearly do as well, have a responsibility to society beyond just profit. I think big tech gets a bad rap in a lot of cases, so thank you for that, and thank you, gentlemen, for this great discussion. Really appreciate it.

Thank you.

Thank you.

All right, keep it right there; we're back right after this short break. This is Dave Vellante for theCUBE.

In today's rapidly evolving enterprise landscape, data management and data governance are shifting from traditional manual operations to automation, driven by digital transformation and the need to streamline how businesses create value from their data assets. Io-Tahoe has put together a guide to help leaders and doers put their data to work with automation. This guide was written by data practitioners for data practitioners, as well as other business executives, and gives a comprehensive overview of both the methodology and the business impact of enterprise data automation. Each chapter has a case study demonstrating how capabilities and best practices are used in real-world settings, and what business outcomes resulted from implementation. Guidance on how to set up automation to achieve the greatest time-to-value from data is provided in a way that speaks to technical and non-technical readers alike. Understand how other organizations are transforming their businesses by leveraging Io-Tahoe's data automation technologies, and gain insights on aligning people and process to achieve your roadmap for modernizing your approach to data management and data governance.
Request early access to Io-Tahoe's Smart Data Marketplaces ebook, launching in November 2020, by clicking on the link below.

From around the globe, it's theCUBE, with digital coverage of smart data marketplaces brought to you by Io-Tahoe.

Hi everybody, this is Dave Vellante, and welcome back. We've been talking about smart data; we've been hearing Io-Tahoe talk about putting data to work. A key part of building great data outcomes is the cloud, of course, and also cloud-native tooling. Stuti Deshpande is here. She's a Partner Solutions Architect for Amazon Web Services and an expert in this area. Stuti, great to see you. Thanks so much for coming on theCUBE.

Thank you so much for having me here.

You're very welcome. So let's talk a little bit about Amazon. I mean, you have been on this machine learning journey for quite some time. Take us through how this whole evolution in technology has occurred over the period that the cloud has been maturing.

Amazon itself is an example of a company that has gone through a multi-year machine learning transformation to become the machine-learning-driven company you see today: improving on its original personalization models, using robotics throughout the fulfillment centers, developing forecasting systems to predict customer needs and meet customer expectations on convenience, cost, and delivery speed, from developing natural language processing technology for end-user interaction to developing groundbreaking technology such as Prime Air drones to get packages to customers. So our goal at Amazon Web Services is to take this rich expertise and experience with machine learning technology across Amazon and, working with thousands of customers and partners, to put this powerful technology into the hands of developers and data engineers of all levels.

Great. Okay, so if I'm a customer or partner of AWS, give me the sales pitch on why I should choose you for machine learning. What are the benefits I'm going to get specifically from AWS?

Well, there are three main reasons why partners choose us. First and foremost, we provide the broadest and deepest set of machine learning and AI services and features for your business, and the velocity at which we innovate is truly unmatched: over the last year we launched 200 different services and features. So not only is our pace accelerating, but we provide fully managed services to our customers and partners, who can easily build sophisticated AI-driven applications; utilizing these fully managed services, they can build, train, and deploy machine learning models, which is both valuable and differentiating. Secondly, we can accelerate the adoption of machine learning. As I mentioned, for machine learning we have a fully managed service, Amazon SageMaker. SageMaker is a fully managed service that any developer, of any level, or any data scientist can utilize to build complex machine learning algorithms and models and deploy them at scale, with very little effort and at a very low cost. Before SageMaker, it used to take so much time, expertise, and specialization to build all these extensive models; with SageMaker, you can literally build complex models in a matter of days or weeks.
To increase adoption further, AWS has acceleration programs such as the ML Solutions Lab, and education and training programs such as DeepRacer, which focuses on reinforcement learning, and Embark, which actually helps organizations adopt machine learning readily. We also support the major frameworks, such as TensorFlow and PyTorch, and we have separate teams dedicated to each framework, improving its support for a wide variety of workloads. And thirdly, we provide the most comprehensive platform, optimized for machine learning. When you think about machine learning, you need a data store for your training sets and your test sets that is highly reliable, highly scalable, and secure. Most of our customers want to store all of their data, any kind of data, in a centralized repository that can be treated as the single source of truth; in this case, typically an Amazon S3 data store, to build an end-to-end machine learning workflow. So we believe we provide the most comprehensive platform on which to build that machine learning workflow from end to end.

Great, thank you for that. So my next question is: this is a complicated situation for a lot of customers. Having the technology is one thing, but adoption is sort of everything. I wonder if you could paint a picture for us and help us understand how you're helping customers think about machine learning, think about that journey, and maybe give us the context of what the ecosystem looks like.

Sure. If someone can put up the slide, I'd like to give a picture of how AWS envisions machine learning as a three-layer stack. Moving on to the next build, I can talk about the bottom layer. The bottom layer, if you can see it on the screen, is basically for advanced technologists, advanced data scientists, machine learning practitioners who work at the framework level. Ninety percent of data scientists use multiple frameworks, because different frameworks are suited to different kinds of workloads, so at this layer we provide support for all of those frameworks. The bottom layer is for the advanced scientists and developers who actually want to build, train, and deploy these machine learning models by themselves. Moving on to the next level, the middle layer, this layer is suited for non-experts as well. Here we have SageMaker, a fully managed service where you can build, tune, train, and deploy your machine learning models at a very low cost, with very minimal effort, and at a higher scale. It removes all the complexity, heavy lifting, and guesswork from each stage of machine learning, and Amazon SageMaker has been a sea change: many of our customers are actually standardizing on top of it. Then moving on to the topmost layer: we call these AI services, because they mimic human cognition. All the services mentioned here, such as Amazon Rekognition, which is basically a deep learning service optimized for image and video analysis, and Amazon Polly, which does text-to-speech conversion, and so on and so forth: these are AI services that can be embedded into applications so that the end customer can build AI-driven applications.
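(As a tiny illustration of that top, AI-services layer: with a service like Amazon Comprehend, a developer calls a pre-trained model through an API rather than building one. A minimal sketch, assuming AWS credentials and the boto3 SDK are configured; the sample text is invented.)

```python
import boto3

# The top layer of the stack: a pre-trained AI service behind a simple API.
comprehend = boto3.client("comprehend", region_name="us-east-1")

response = comprehend.detect_sentiment(
    Text="The discharge process was quick and the staff were wonderful.",
    LanguageCode="en",
)

# No model to build, train, or deploy; the service returns the prediction.
print(response["Sentiment"])       # e.g. POSITIVE
print(response["SentimentScore"])  # confidence per sentiment class
```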
Love it. Okay, so you've got the experts at the bottom with the frameworks, the hardcore data scientists; you've kind of got the self-driving machine learning in the middle; and then you have all the ingredients, so I'm like an AI chef or a machine learning chef: I can pull in vision, speech, chatbots, fraud detection, and sort of compile my own solutions. That's cool. We hear a lot about SageMaker, Stuti. I wonder if you could tell us a little bit more; can we double-click on SageMaker? That seems to be a pretty important component of the stack you just showed us.

Sure, and I think that was a great summarization of all the different layers of the machine learning stack, so thank you for providing the gist of it. Of course, I'll be really happy to talk about Amazon SageMaker, because most of our customers are actually standardizing on top of it. We've spoken about how machine learning has traditionally been a very complex, expensive, and iterative process, which is made even harder because, with a traditional machine learning deployment, there are no integrated tools for the entire workflow. That is where SageMaker comes into the picture. SageMaker removes the heavy lifting and complexity from each step of the machine learning workflow. It solves these challenges by providing all of the different components, optimized for every stage of the workflow, in one single toolset, so that models get to production faster, with much less effort, and at a lower cost. And we continue to add important capabilities to Amazon SageMaker; I think last year we announced 50 capabilities just for SageMaker, improving its features and functionality. I'd like to call out a couple of those here. SageMaker notebooks are one-click-deployment notebooks that come along with EC2 instances, sorry for the jargon there, Amazon Elastic Compute Cloud instances. So you just need one click and you have the entire SageMaker notebook interface, along with the compute, up and running, which gives you a faster time to production. If you're a data scientist or a data engineer who works extensively on machine learning, you'll be aware that building training datasets is really complex; there we have Amazon SageMaker Ground Truth, which is just for building machine learning training datasets and can reduce your labeling cost by around 70%. And if you perform machine learning and are aware of the technology in general, there are workflows where you need to run inference; there we have Amazon Elastic Inference, where you can reduce inference costs by up to 75% by adding just a little GPU acceleration, or you can reduce training costs with Managed Spot Training, utilizing EC2 Spot Instances. So there are multiple ways to reduce cost, and multiple ways to improve and speed up your machine learning deployment and workflow.
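(For readers curious what Managed Spot Training looks like in practice, here is a minimal sketch using the SageMaker Python SDK. The image URI, IAM role, and S3 paths are placeholders to substitute with your own; this is an illustration, not a production setup.)

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

estimator = Estimator(
    image_uri="<your-training-image-uri>",       # placeholder
    role="<your-sagemaker-execution-role-arn>",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://<your-bucket>/model-artifacts/",
    sagemaker_session=session,
    use_spot_instances=True,  # Managed Spot Training to cut training cost
    max_run=3600,             # cap on training seconds
    max_wait=7200,            # total time, including waiting for spot capacity
)

estimator.fit({"train": "s3://<your-bucket>/training-data/"})
```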
So I'm wondering if that applies to the tech world and the machine learning world. Are you seeing any patterns emerge across the various use cases? You have such scale. What can you tell us about that? Sure. One of the patterns that we see all the time is building a scalable data layer for any kind of use case. As I said before, customers are really looking to put their data into a single repository where they have a single source of truth. Storing any kind of data, at any velocity, in a single source actually helps them build models that run on that data and get useful insights out of it. So when we speak about an end-to-end workflow, one of the patterns we've seen is using Amazon SageMaker along with a scalable analytical layer, where customers can perform analysis and build predictive models. Suppose you want to take a healthcare use case: they can build a predictive model with Amazon SageMaker that predicts patient readmissions. What I mean to say is that by not moving data around, and by connecting different services to a single source of data, customers avoid creating extra copies of the data, which is very crucial when you're working with training and test datasets in Amazon SageMaker; it is highly important to consider this. So the pattern we've seen is to utilize a central repository of data, which could be Amazon S3 in this scenario, plus a scalable analytical layer, along with SageMaker. I would like to quote an Intuit success story here: using Amazon SageMaker, Intuit reduced machine learning deployment time by 90%, from six months to one week. And if you think about the healthcare industry, there has been a shift from reactive to predictive care, utilizing predictive models to accelerate research and the discovery of new drugs and new treatments. We have also observed that nurses supported by AI tools have increased their productivity by 50%. And one of our customers is really diving deep into AWS's broad portfolio of machine learning and AI services, including Amazon Transcribe Medical, where they are able to surface insights that benefit their own customers. Most of their customers are healthcare providers, and these insights help them deliver more personalized, improved patient care. So there you have the end-user benefits as well. One of the patterns I can speak about, and that we have seen as well, is pairing a predictive model with real-time integration into healthcare records, which actually helps healthcare-provider customers with informed decision-making and improving personalized patient care. You know, that's a great example, several there, and I appreciate that. I mean, healthcare is one of those industries that is just so ripe for technology injection and transformation, and it's a great example of how the cloud has really enabled major changes in healthcare: proactive versus reactive care, lower costs, better health, longer lives. It's really inspiring to see that evolve, and we're going to watch it over the next several years. I wonder if we could close on the marketplace. I've had the pleasure of interviewing Dave McCann a number of times, and he and his team have built just an awesome capability for Amazon and its ecosystem.
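To show what the "predictive model plus real-time integration" pattern could look like, here is a minimal, hypothetical sketch of scoring one patient record against a deployed SageMaker endpoint (for example, one created from the estimator in the previous sketch via estimator.deploy()). The endpoint name and feature layout are invented for illustration.

```python
from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer

# Attach to an already-deployed endpoint; the name is hypothetical.
predictor = Predictor(
    endpoint_name="readmission-model",
    serializer=CSVSerializer(),
)

# Invented feature vector for one patient (e.g. age, prior admissions,
# length of stay, chronic conditions, abnormal-lab flag).
patient_record = "67,3,5,2,1"

# Real-time inference call; the response format depends on the model.
risk = predictor.predict(patient_record)
print("readmission risk:", risk)
```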
What about the data products, whether it's SageMaker or other data products in the marketplace? What can you tell us? Sure. AWS Marketplace is a very interesting thing, so let me first talk about it. With AWS Marketplace, you can browse and search for hundreds of machine learning algorithms and model packages in a broad range of categories, such as computer vision, text analysis, voice analysis, image and video analysis, predictive models, and so on and so forth. And all of these models and algorithms can be deployed through a Jupyter notebook, which comes as part of the SageMaker platform. You can integrate all of these different models and algorithms into our fully managed service, Amazon SageMaker, through Jupyter notebooks, the SageMaker SDK, and even the command line, as sketched below. This experience is powered by the AWS Marketplace catalog and APIs, so you get the same benefits as any other Marketplace product, which is seamless deployment and consolidated billing. So you get the same benefits for your machine learning algorithms and model packages as for the other products in AWS Marketplace, and this is really important because these can be directly integrated into our SageMaker platform. And you asked about the data products as well; I'll be really happy to quote one example here. Given the COVID situation, and because we are in unprecedented times, we collaborated with our partners to provide some data products. One of them is Data Hub by Capitol Hill, which gives you time-series data of cases, gathered from multiple trusted sources. And this is to provide better, more informed knowledge so that everyone utilizing this product can make informed decisions and help the community in the end. I love it. I love this concept of being able to access the data, algorithms, and tooling, because it's not just about the data, it's about being able to do something with the data. And we've been talking about injecting intelligence into those data marketplaces; that's what we mean by smart data marketplaces. Stuti Deshpande, thanks so much for coming on theCUBE, sharing your knowledge, and telling us a little bit about AWS. It was a pleasure having you. It was my pleasure too, thank you so much for having me here. Are you interested in test-driving the IOTAHO platform? Kickstart the benefits of data automation for your business through the IOLabs program, a flexible, scalable sandbox environment on the cloud of your choice, with setup, service, and support provided by IOTAHO. Click on the link and connect with a data engineer to learn more and see IOTAHO in action.
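As a closing footnote on the Marketplace discussion above, here is a hedged sketch of deploying a subscribed model package through the SageMaker Python SDK; the model package ARN, IAM role, and endpoint name are placeholders you would replace with values from your own AWS Marketplace subscription.

```python
import sagemaker
from sagemaker import ModelPackage

session = sagemaker.Session()

model = ModelPackage(
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    model_package_arn=(
        "arn:aws:sagemaker:us-east-1:123456789012:"
        "model-package/example-vision-model"  # placeholder ARN
    ),
    sagemaker_session=session,
)

# One call stands up a managed endpoint behind the Marketplace model.
model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="marketplace-example",  # hypothetical name
)
# The endpoint can then be invoked with the SageMaker runtime
# (boto3 client "sagemaker-runtime", invoke_endpoint).
```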