Live from Boston, it's theCUBE. Covering IBM Chief Data Officer Summit. Brought to you by IBM. Welcome back everyone to theCUBE's live coverage of the IBM CDO Summit here in Boston, Massachusetts. I'm your host, Rebecca Knight, and I'm joined by my co-host, Paul Gillin. We have two guests for this segment. We have Steven Eliuk, who is the Vice President of Deep Learning, Global Chief Data Officer at IBM, and Christopher Bannocks, Group Chief Data Officer at ING. Thanks so much for coming on theCUBE. My pleasure. Before we get started, Steve, I know you have some very important CUBE fans that you need to give a shout-out to, please. For sure. I missed them on the last three rounds of theCUBE, so I'd like to give a shout-out to Santiago, my son, five years old, and the shortest one, which is Elena. Miss you guys tons, and now you're on the air. Yeah. Now, to the important piece of business. Absolutely. So, let's talk about metadata. What's the problem with metadata? So, I mean, you know, the one problem, or the many? How many problems? Well, how long have you got? The problem is it's everywhere, and there's lots of it, and bringing context to it, and understanding it from an enterprise-wide perspective, is a huge challenge: just connecting to it, finding it, or collecting it centrally, and then understanding the context and what it means. So, the standardization of it, or the lack of standardization across the board. Yeah, it's incredibly challenging, just the immense scale of metadata. At the same time, dealing with metadata, as Chris mentioned, just coming up with your own company's glossary of terms to describe your own data, is kind of step one in the journey of making your data discoverable and governed, right? So, it's challenging, and it's not well understood, and I think we're very early on in these stages of describing our data, but we're getting there slowly but surely.
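As a concrete illustration of that "glossary of terms" step, here is a minimal, hypothetical sketch in Python. The class names and sample terms are invented for illustration, not IBM's or ING's actual catalog:

```python
from dataclasses import dataclass, field

# Toy sketch of an enterprise business glossary. Every name here
# (BusinessTerm, Glossary, the sample terms) is hypothetical.
@dataclass
class BusinessTerm:
    name: str                # the agreed enterprise-wide term
    short_desc: str          # one-line description for the business user
    long_desc: str = ""      # fuller context, if available
    tags: list = field(default_factory=list)

class Glossary:
    def __init__(self):
        self._terms = {}

    def add(self, term: BusinessTerm):
        self._terms[term.name] = term

    def search(self, keyword: str):
        """Find terms whose name, descriptions, or tags mention the keyword,
        which is what makes the data discoverable in the first place."""
        kw = keyword.lower()
        return [t for t in self._terms.values()
                if kw in t.name.lower()
                or kw in t.short_desc.lower()
                or kw in t.long_desc.lower()
                or any(kw in tag.lower() for tag in t.tags)]

glossary = Glossary()
glossary.add(BusinessTerm("interest_rate",
                          "Annualized rate applied to a balance",
                          tags=["finance", "pricing"]))
glossary.add(BusinessTerm("maturity_date",
                          "Date on which an instrument's principal falls due",
                          tags=["finance", "dates"]))

print([t.name for t in glossary.search("date")])  # ['maturity_date']
```

The point is not the data structure itself but the discipline: once every element is described with agreed terms, a project looking for "similar types of data" can actually find it.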
Well, perhaps in that context, it's not only the fact that it's everywhere, but also that we've not created structural solutions in a consistent way across industries to be able to structure it and manage it in an appropriate way. So, help people do it better. What are some of the best practices for creating and managing metadata? Well, it's such a broad space, you can look at different parts of it. Let's just take the work that we do around describing our data, and we do that for the purposes of regulation, for the purposes of GDPR, et cetera. It's really about discovering and providing context to the data that we have in the organization today. So, in that respect, it's creating a catalog and making sure that we have the descriptions and the structures of the data that we manage and use in the organization. And to give you perhaps a practical example: when you have a data quality problem, you need to know how to fix it. So, you create and structure metadata around, well, where does it come from, first of all? What's the journey it's taken to get to the point where you've identified that there's a problem? But also then, who do we go to to fix it? Where did it go wrong in the chain, and who's responsible for it? Those are very simple examples of the metadata around the transformations the data might have come through to get to its endpoint, the quality metrics associated with it, and then the owner or the data steward that it has to be routed back to to get fixed. And all of those are metadata elements, right? All of those, yeah, because we're not really talking about the data itself. The data might be a debit or a credit, something very simple like that in banking terms, but actually it's got lots of other attributes associated with it, which essentially describe that data. So, what is it? Who owns it? What are the data quality metrics? How do I know what its quality is? So, where do organizations make mistakes?
Is it that they create too much metadata? Is it poorly labeled? Is it not federated? Yes, a mix of all of them. One of the things that Chris alluded to, and that you might not have picked up on, is that it's an incredibly labor-intensive task. There are a lot of people involved, and when you have a lot of people involved in a sadly quite time-consuming, slightly boring job, there are errors and there are problems. That's data quality, that's GDPR, that's governance and regulatory issues. Likewise, if you can't discover the data because it's labeled wrong, that's potential insight that you've now lost, because that data's not discoverable to a potential project that's looking for similar types of data. So, kind of step one is describing your metadata to the organization, creating a taxonomy of metadata and getting everybody on board to label that data, whether it be short and long descriptions, having good tools, et cetera. Yeah, I mean, look, the simple thing is, we struggle as a capability in any organization, we struggle with these terms, right? Metadata? If you're talking to the business, they have no idea what you're talking about. You've already confused them the minute you mentioned "meta." Hashtag. What's a hashtag? That's basically what it is. Yeah, it's very simple. It's essentially just data about data. It's the descriptive components that tell you what it is you're dealing with. And if you just take a simple example from finance, an interest rate on its own tells you nothing. It could be the interest rate on a savings account, it could be the interest rate on a bond, but on its own, you have no clue what you're talking about. Same with a maturity date, or a date in general. You have to provide the context, and that is its relationships to other data in the context that it's in, but also the description of what it is you're looking at.
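That difference between a bare value and a value in context can be sketched in Python. Assuming an invented catalog schema (the field and system names below are hypothetical, purely for illustration), a raw number versus a described data element looks roughly like this:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch: the same value with and without metadata context.
# Field names (term, source_system, owner, quality_score) are invented,
# not a real catalog schema.
@dataclass
class DataElement:
    value: object          # the data itself
    term: str              # what it is, per the enterprise glossary
    source_system: str     # where it came from (the start of its lineage)
    owner: str             # the steward a quality issue is routed back to
    quality_score: float   # e.g. share of records passing validation rules

raw = 0.035  # on its own this tells you nothing: savings rate? bond coupon?

described = DataElement(
    value=0.035,
    term="interest_rate/savings_account",
    source_system="core_banking_es",      # invented system name
    owner="retail-deposits-steward",
    quality_score=0.98,
)

def needs_remediation(elem: DataElement,
                      threshold: float = 0.95) -> Optional[str]:
    """Return the owner to route a fix to if quality is below threshold."""
    return elem.owner if elem.quality_score < threshold else None

print(needs_remediation(described))  # None: quality is above threshold
```

The lineage and owner fields are exactly the "who do we go to to fix it" metadata described above: when a quality metric drops, the element itself carries the routing information.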
And if that comes from two different systems in an organization, let's say one in Spain and one in France, and you just receive a date, you don't know what you're looking at. You have no context for what you're looking at, and you simply have to have that context. So you have to be able to label it there and then map it to a generic standard that you implement across the organization, in order to create the control that you need to govern your data. Are there standards, I'm sorry, Rebecca, are there standards efforts underway, industry-wide efforts? Yeah, there are open metadata standards that are underway and gaining a great deal of traction. And internally, you have to standardize anyway, irrespective of what's happening across the industry. You don't have the time to wait for external standards to exist in order to make sure you standardize internally. Another difficult point is it can be region- or country-specific, right? So it makes it incredibly challenging, because in every region you work in, you might have to have your own sub-glossary of terms for that specific region, and you might have to control the export of certain data with certain terms between regions and between countries. It gets very, very challenging. Yeah, and then somehow you have to connect it all to be able to see what it all is, because if one system maps its local definition, let's say maturity date, to the generic term date, whereas someone else maps birth date to date, you know you've got a problem, yeah? You just know you've got a problem. And exposing the problem is part of the process, you know? Understanding, hey, that mapping's wrong, guys. So where do you begin? If your mission is to transform your organization to be one that is data-centric, and the business side's eyes are glazing over at the mention of metadata, what kind of communication needs to happen, what kind of teamwork, collaboration?
So, I mean, teamwork and collaboration are absolutely key. The communication takes time. Yeah, don't expect one blast of communication to solve the problem. It's going to take education and working with people to actually get them to realize the importance of things, and to do that you need to start something. Just the communication and the theory doesn't work; no one can ever connect to it. You have to have people who are working on the data for a reason that is business-critical, and you have them experience the problem to recognize that metadata is important. And until they experience the problem, you don't get the right amount of traction. So you have to start small and grow. Yeah, and you can potentially use the whip as well. Governance and regulatory requirements, that's a nice one to push things along. That's often helpful. Yeah, it's helpful but not necessarily popular. No, no. So you have to get that balance; we're always struggling with that balance. There's a lot of regulation that drives the need for this, but equally, that same regulation essentially drives all of the same things that you need for analytics, for good measurement of the data, for growth of customers, for delivering better services to customers. All of these things are important. Just the web click information you have, that's all essentially metadata. The way we interact with our clients online and through mobile, that's all metadata. So it's not all whip and stick; there's some real value in there as well. This would seem to be a domain that is ideal for automation, where through machine learning and contextualization, machines should be able to figure a lot of this stuff out. Yeah, no, absolutely right. And I think we're working on proofs of concept to prove that case, and we have IBM AMG as well, the automatic metadata generation capability, using machine learning and AI to be able to start to auto-generate some of this insight by using existing catalogs, et cetera, et cetera.
And we're starting to see real value through that. It's still very early days, but I think we're really starting to see that one of the solutions can be machine learning and AI. For sure. There are various degrees of automation that will come in waves. Immediately, right now, we have certain degrees where we have a very small term set with very high-confidence predictions, but then you want to get to the specificity of a company, which might have 30,000 terms sometimes. Internally we have 6,000 terms at IBM, and for that level of specificity, to have complete automation, we're not there yet, but it's coming. It's on track. I mean, it takes time, because the machine is learning, and you have to give the machine enough inputs, and that gradually takes time. Humans are involved as well. It's not about just throwing the machine at something and letting it churn; you have to have that human involvement. So it takes time to have the machine continue to learn and grow, and to give it more terms and give it more context, but over time I think we're going to see good results. I want to ask about that human in the loop, as IBM so often calls it. I mean, one of the things that Inderpal Bhandari was talking about is how the CDO needs to be a change agent in chief. So how are the rank and file interpreting this move to automation and the increase in machine learning in their organizations? I mean, is it accepted? Is it a source of paranoia and worry? I think it's a mix. I think we're kind of blessed, at least in the CDO at IBM, the global CDO, in that everyone's kind of on board for that mission. That's what we're doing. There are team members 25, 30 years on IBM's roster, and they're just as excited as I am, and I've only been there for 16 months. But it kind of depends on the project, too. For ones that have a high impact, everyone's really gung ho, because we've seen process times go from 90 days down to a couple of days.
That's a huge reduction, and that's the governance and regulatory aspects, but more for us, it's a little bit about the linkage and availability of data, so that we can get more insights from that data and better outcomes for different types of enterprise use cases. And a more satisfying workday. Yeah, it's fun. No, that's a key point. I mean, it's much better to be involved in this than doing the job itself. The job of tagging and creating metadata associated with a vast number of data elements is very hard work; it's very difficult, and it's much better to be working with machine learning to do it, and dealing with the outliers or the exceptions, than it is chugging through. I mean, realistically, it just doesn't scale. You can't do this across 30,000 elements in any meaningful way, or a way that really makes sense from a financial perspective. So you really do need to be able to scale this quickly, and machine learning is the way to do it. Have you found a way to make data governance fun? Can you gamify it? Are you suggesting that data governance isn't fun? Yes. Can you gamify it? Can you compete? We're using gamification in many ways; we haven't been using it in terms of data governance yet. But look, governance is just a horrible word, right? People have really negative connotations associated with it. But actually, if you just step one degree away, we're talking about quality. And quality means better decisions, and that's actually all governance is. Governance is knowing where your data is, knowing who's responsible for fixing it if it goes wrong, and being able to measure whether it's right or wrong in the first place. And it being better means we make better decisions, our customers have better engagement with us, we please our customers more, and therefore they hopefully engage with us more and buy more services.
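The division of labor described above, where the machine auto-tags the high-confidence cases and humans handle the outliers and exceptions, can be sketched as follows. This is a toy: the keyword-overlap scorer merely stands in for a real ML model, and all term and column names are invented:

```python
# Toy human-in-the-loop sketch of automatic metadata generation.
# A scoring function predicts a glossary term for each column description;
# high-confidence predictions are auto-accepted, the rest are routed
# to a human data steward for review.

GLOSSARY_KEYWORDS = {
    "maturity_date": {"maturity", "due", "redemption"},
    "birth_date": {"birth", "dob", "born"},
    "interest_rate": {"interest", "rate", "coupon"},
}

def predict_term(column_desc: str):
    """Return (best_term, confidence) via keyword overlap, a stand-in
    for a trained classifier's prediction and confidence score."""
    words = set(column_desc.lower().split())
    scores = {term: len(words & kws) / len(kws)
              for term, kws in GLOSSARY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

def triage(columns, threshold=0.6):
    """Auto-accept confident predictions; queue the rest for a human."""
    auto_tagged, review_queue = {}, []
    for name, desc in columns:
        term, confidence = predict_term(desc)
        if confidence >= threshold:
            auto_tagged[name] = term
        else:
            review_queue.append(name)  # the exceptions go to a human
    return auto_tagged, review_queue

cols = [("mat_dt", "maturity due date of the bond"),
        ("dt1", "a date field")]
tagged, queue = triage(cols)
print(tagged, queue)  # {'mat_dt': 'maturity_date'} ['dt1']
```

The scaling argument in the conversation is visible even in this sketch: only the ambiguous "dt1" column consumes human time, while the confident match is handled automatically.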
So I think data governance is something we invented through the need for regulation and the need for control, and from that background. But realistically, it's just: guys, we should be proud of the data that we use in the organization, and we should want the best results from it. It's not about governance; it's about us being proud of what we do. Yeah, a great note to end on. Thank you so much, Christopher and Steven. Thank you so much. Cheers. I'm Rebecca Knight, for Paul Gillin. We will have more from the IBM CDO Summit here in Boston, coming up just after this.