Hello and welcome. My name is Shannon Kemp and I am the Chief Digital Manager of Dataversity. We would like to thank you for joining the current installment of the monthly Dataversity webinar series, Real-World Data Governance with Bob Seiner, and guest speaker today Chris Paskins. Today Bob will be discussing data governance and data science to improve data quality, sponsored today by Alation. We have a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so. Note that Zoom defaults the chat to send to just the panelists, but you may absolutely switch that to chat with each other. For questions, we will be collecting them by the Q&A section, or if you like to tweet, we encourage you to share highlights or questions by Twitter using hashtag #RWDG. To find the chat and the Q&A panels, you may click those icons in the bottom middle of your screen to activate those features. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now let me turn it over to Miles for a brief word from our sponsor, Alation. Miles, hello and welcome.

Thank you. I'm really thrilled to be with everybody today and to share a little bit of perspective that will blend into what you're going to hear from Bob today. I'd like to talk a little bit about data governance and data science and what we're seeing out there when we talk to CIOs, CDOs, and other data leaders. One of the things that I can't stress enough is that data governance really does matter. It matters to data scientists. It matters to those who prepare data for them. And the key thing we find really important is that it delivers the kind of agile, modern data architecture you need.
In fact, one CIO recently said in one of the CIO chats that if you haven't fixed your data issues, you don't want to take your mess from where it was and put it up in the cloud; you need to fix the data before it moves. Obviously, that implies the importance of data quality and trust, and the need, as we increasingly create new repositories of data, to govern it, protect it, and ensure we comply. As well, we want to do things that speed up the process of creating analytical models, and I'll talk about that in just a little bit. Obviously, what we're all aiming for is more agile data processes, and the analysts have recently coined a term, DataOps, to reflect this and the role of data governance in it. I'm pulling here from a report that Tom Davenport did a few years ago, but it was really interesting. It described the issues with data. First of all, where is it and what do we know about it? Second, how much time do these analysts spend just finding things? If data isn't discovered, if it isn't prepped, it's very, very hard to actually make use of it. In fact, as Kirk Borne says, a lot of data scientists are really better labeled as data plumbers. And CrowdFlower has found similar things to Tom Davenport in terms of how much time is actually spent working on data rather than making data into analytical models, which is obviously the end goal. So the big opportunity that large legacy organizations have is that they have a lot of data, and if it can be put together, as my friend Jeanne Ross likes to say, into a nice neat package, they can win against the startups that are disrupting their industries. But they need to be able to get it into that nice neat package. So if you're going to do effective data governance, and you're going to hear more about this from Bob, there are a couple of principles that we think are important. The first, obviously, is being more people-centric.
I can't tell you how many CIOs have said they hated data governance because they had to go find the stewards and then force them to do the work. That just doesn't work, so a new, less invasive approach is needed. Increasingly, if you're going to find these people, you want the process to be as autonomous as possible. You want to take the work out, so that you're using the brain power but not creating another job for the data steward, who in many cases already has an important role in the business. Increasingly, continuous improvement should happen, so each new wave needs to be easier and easier to do. And one big concept that Bob and I like to agree on is that data is not a one-and-done. It's a continuous process, and there was some research done a few years ago by Forrester that showed that data degraded at something like 10% per year. If we can do this, however, we can save our businesses in a world of disruption where the life of public corporations gets shorter and shorter. Data is how you win against disruptors. And so there's an important role to play here in terms of what we're doing in data governance. It really does matter. And with that, I'd like to turn it back to Shannon.

Thank you so much for this. And thanks to Alation for sponsoring today's webinar and helping make these webinars happen. If you have any questions for Miles or about Alation, Miles will be joining us for the Q&A portion of the webinar at the end. And now let me introduce to you our speaker for the series, Bob Seiner. Bob is the president and principal of KIK Consulting and Educational Services and the publisher of The Data Administration Newsletter, TDAN.com. Bob specializes in non-invasive data governance, data stewardship, and metadata management solutions. And with that, I will give the floor to Bob to start his presentation and introduce his guest speaker, Chris. Bob, hello and welcome.

Hi Shannon, hi Chris. Thanks, Miles, for a great presentation.
I think there are a lot of synergies between the things that you said and the things that we're going to talk about today. Today I am very fortunate to have a special guest join me. I'll tell you a little bit about Chris in a minute here, but this is an important topic, one that we have been looking forward to presenting on for a while now. Certainly data science is something that's important to everybody, and a lot of folks are looking at data governance to help improve data quality. Let's look at ways that we can marry up data governance and data science, or at least bring them closer together, with the focus on being able to improve the quality of the data in the organization. Before I get started, I just want to share with you a few of the things that I'm involved with. And these are literally just a few of them. As you know, there is this monthly webinar series. Next month is going to be another great subject, where I would hope that you can walk away from that webinar with an actual data governance framework completed, so we call it the do-it-yourself data governance framework. I talk a lot about non-invasive data governance, and I'm sure that we'll at least touch on the subject a little bit in today's webinar. I'm speaking at Dataversity's upcoming event in December, the Data Governance and Information Quality conference, DGIQ. Like I said, I do a lot around non-invasive data governance and metadata governance, and you can go to the Dataversity Training Center and find several courses and learning plans that I've put together. The Data Administration Newsletter and KIK Consulting are the two businesses that dominate my time. And last but not least, I am recently an adjunct faculty member at Carnegie Mellon University in their CDataO, Chief Data Officer, program.
With that, I'm going to let Chris introduce himself here in a second. Chris and I worked together years ago in a client-consultant relationship, and Chris is still somebody that I go to all the time when I have questions about how to implement effective stewardship globally. Chris has been really successful at doing that for Western Digital, and he focuses a lot on data quality. With that, Chris, is there anything else that you want to add to my introduction here?

Well, thanks, Bob, for that great introduction. Hello everybody, and thanks for joining. This is really a passion of mine: data governance, data science, and data quality all mesh together, and I hope this discussion will be very fruitful. Officially, my title is Global Head of Enterprise Data Governance and Data Quality. My unofficial title is Director of Cat Herding. As Bob was explaining, we have a very successful data stewardship program here. We have over 1200 data stewards that are named and actively engaged in data processes at Western Digital. And we currently have a very active and aggressive data science program specifically for data quality. So hopefully you guys can get a few key tidbits out of this. I'm looking forward to the discussion.

Okay, thanks Chris, and like I said before, it's great to have you with me today. Here, as I always do, I'll run through the topics that we're going to talk about in today's webinar. I know that when Chris and I talk, we could take any one of these subjects and stretch it out for the whole webinar, but we can't do that today. We're going to be short, succinct, and to the point with some of the comments. First of all, we're going to talk about the relationship between data governance and data science.
We're going to talk about how data governance can leverage data science and make use of data science to help improve the quality of data in the organization, and I think you'll see that in some of the things that Chris talks about. We'll talk about the data scientist's role in data governance, and then I'm going to turn it over to Chris to tell you a little bit about how he sees the data scientist's role participating in his program. We'll talk about the data scientist's role in improving data quality as well, and again, that's been the focus of the stewardship that Chris will talk about. Then, to wrap this up, we're going to combine the three topics and talk about data governance and data science being used together to improve quality. So we're going to bring these three together as that tour de force that we hope will be very beneficial for your organization. But the first thing that we want to talk about is the relationship between data governance and data science. One of the things that I talk about all the time, and that I write about, is how data governance really focuses on people. In fact, somebody told me many years ago that we should call it people governance instead of data governance, because we're really governing people's behavior. In the definitions that I'm sharing with you of data governance, data science, and data quality, you'll notice that there's a behavioral aspect to each and every one of them. When I say that data governance is the execution and enforcement of authority, that comes down to people's behavior and how they act with data. Data governance practitioners need to make certain that we recognize that people are already doing a lot of the things that we're going to ask them to do. And I expect Chris will be able to bear some of that out when he talks about his data stewards.
But we need to focus on people's behavior, because if we can get the right people to do the right thing in the right way, we will be very successful with how we govern data in our organization. So the bottom line from these three definitions, and some of these definitions are really a mouthful, is that data science and data quality really require improvements in people's behavior in managing data. And it's really important to look at what approach we are taking to implement governance as we move forward in our organizations. Chris, what do you think about this? How have you drawn the relationship between data governance and data science at Western Digital?

Sorry, I was on mute. I always go back to: how do I get the business involved? Because here we talk about data governance, and just at the words data governance, people already start to take a step back. But right now, most companies are all about data science, so this is a great leveraging point. How I look at this, and how I explain it to a non-technical business person, is that it's like running a race. The outcome will be a direct reflection of the training and preparation that's been put forward. The outcome we want, of course, is predictions and prescriptions for data quality. And the training and preparation is what you put into the data in terms of quality before you even attempt a model or something of that nature. Usually when I tell them this, they say, oh yeah, I have run a marathon before, and you have to train a lot. So putting a few analogies together about setting up your data science and data analytics program for success is really what conveys the business value and what it should be. I also express to them that it's really not a luxury, right?
There are a lot of companies and businesses that, like I said, have embarked on this great thing, data science, expecting to get so much out of it, and then come to find out that it's hard to get the great business value out of it. So we also like to stress that, yes, data governance does a lot of great things for an organization, but it also helps to ensure that your data can be used and cultivated to its full potential. And just a quick fun fact: I got into data quality because I was a data scientist in the high-tech manufacturing world, my data quality wasn't up to par, and my programs weren't as successful as I'd like. So I now have the best of both worlds in doing both. I can explain and express that to the executives and to the engineers, and that's what really helps.

So, Chris, when you say that data governance can't be viewed as a luxury, what have you done about that? First of all, did you encounter people within your organization who thought that way? Data science is the big sexy thing right now. Data governance is more bland, and we need to convince people, get past the why, and get to the how. But what did you experience? Did you experience that people did not view data governance as being a necessity?

Absolutely. As a matter of fact, you're always going to have a majority of organizations or executives that just have not been exposed, right? Because they just turn to a data team, IT team, or analytics team and say, give me what I need, and then they snap their fingers and it's supposed to happen. A good example: we embarked on a major IT transformational project at WD, and it was really hard to get a foot in the door with that transformational project as a data governance initiative.
So what I had to do was work with the separate reporting teams, business teams, and master data teams in this transformational project and ask: does data quality, not data governance itself but very particularly data quality, have an impact on what you can deliver during this multi-million dollar project? And the answer I got was yes. As we had those conversations, we were able to find out exactly what was needed that we could provide, not from a data quality perspective, but from a good data governance practice and process perspective, to help them fulfill their deliverables and really lay the groundwork for what's happening today in the company.

So it sounds to me as though not only is data governance using data science to help push its discipline, but your organization is using data science to help push data governance in the organization. And that's a really cool thing if we think about really trying to marry these two, or at least bring them together so that, in some realms, we're talking about them in the same world. The next thing that we're going to talk about, and we've got several subjects to get through here, is how data governance can utilize and leverage data science to improve data quality. Just real quickly, before I turn it over to Chris again: when we look at our organizations, they're already investing a lot of money in data. They're investing in data science, informatics, machine learning, data mining, big data, whatever is the big hot technology of the day, even data catalogs and metadata management. Organizations are investing pretty heavily in a lot of these disciplines. And we need to recognize that there's a common thread between these disciplines, and that is the data that is going to be necessary to do informatics, machine learning, data mining, all of these types of things.
And really, data governance, at least in a lot of organizations I know, is focusing on getting people just to trust the data, whether that's through metadata in the catalog, the ability to find the data, or spending less time wrangling data like Miles was talking about, and on getting people to really start to leverage the data. I've pointed out that a very strong purpose statement for data governance in some organizations is the phrase "to use strategic data with confidence," because if we can identify what strategic data is, focus on that first with our governance program, and deliver confidence in that data, that's going to be really important. Certainly our organizations are investing in these things, and there are a lot of organizations that want to lead in these things too. They're recognizing that it's really not enough just to label people as data owners. Some data governance programs do focus on the data owners, and that makes me cringe a little bit, because I don't like the term owner. Chris can tell you what the term steward means and how he's applied it. But it's not enough just to know who the people are that own the data; we need to get people to be engaged in how they define, produce, and use the data. That's critical. We want to make certain that we can use governance to leverage data science to improve quality by taking advantage of those types of things. Senior leadership certainly needs help understanding what governed data looks like versus what ungoverned data looks like, and they need to be able to tell the difference and see the value that comes from it. Chris, I know that you've spent a lot of time focusing on these types of things. Maybe you can share with us a little bit about how you're leveraging data science and focusing on data quality.
Absolutely, thanks Bob. Like I said before, we have a very aggressive program of data science to improve data quality for this exact purpose, and it's not just my team: there are a lot of companies, startups and well-established companies alike, that are using data science specifically for data quality. But here are some of the areas that my team is looking at.

First is anomaly detection. Is my data fresh? Is it complete? You can directly translate data quality dimensions into anomaly detection, with very specific models for specific data pipelines. Those are the kinds of things that will let you know, artificially, if things are going well. That way you don't have to write 10,000 business rules; you can use models to help you with that.

That leads to the next one, which is: okay, I have a pipeline that has 200 attributes, or 200 fields, whatever. What should I care about? A scientist will tell you, okay, I've done some initial modeling, and this is what I should care about. But what if there are some hidden things that we don't necessarily see, or that the SMEs don't tell us about? We can use modeling to help us prescribe rules that we might be interested in. Then we as SMEs and experts take a look at those rules and say, oh yeah, that makes sense, let's measure this.

The next one that we're currently working on is model data scoring. When a data scientist deploys or trains a model, that model is using data to train. If that data quality degrades over the life of the model, and it is now affecting the performance of that model, the data scientists want to know: is the data quality in my current production run the same as, better than, or worse than that of my training set? So this is really big, and my team isn't the one doing all of this model data scoring.
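To make the model data scoring idea concrete, here is a minimal sketch in Python. It is an illustration only, not Western Digital's platform or method. It assumes records arrive as plain dictionaries, scores a single data quality dimension (completeness, the share of non-missing values per field), and compares a training snapshot against a production batch. The function names, the sample fields, and the 0.05 tolerance are all hypothetical choices.

```python
from typing import Any


def completeness(records: list[dict[str, Any]], fields: list[str]) -> dict[str, float]:
    """Share of records with a non-missing value, per field (hypothetical scoring)."""
    total = len(records)
    scores = {}
    for field in fields:
        # Treat None and empty strings as missing; zero counts as present.
        present = sum(1 for r in records if r.get(field) not in (None, ""))
        scores[field] = present / total if total else 0.0
    return scores


def compare_quality(
    train: list[dict[str, Any]],
    prod: list[dict[str, Any]],
    fields: list[str],
    tolerance: float = 0.05,  # assumed threshold, not from the webinar
) -> dict[str, str]:
    """Label each field 'same', 'better', or 'worse' in production vs. training."""
    train_scores = completeness(train, fields)
    prod_scores = completeness(prod, fields)
    verdicts = {}
    for field in fields:
        delta = prod_scores[field] - train_scores[field]
        if delta < -tolerance:
            verdicts[field] = "worse"
        elif delta > tolerance:
            verdicts[field] = "better"
        else:
            verdicts[field] = "same"
    return verdicts


# Made-up sample data: the production batch is missing half of its temperature readings.
train = [{"part_id": i, "temp": 20.0 + i} for i in range(4)]
prod = [
    {"part_id": 10, "temp": 21.5},
    {"part_id": 11, "temp": None},
    {"part_id": 12, "temp": 22.0},
    {"part_id": 13},
]

print(compare_quality(train, prod, ["part_id", "temp"]))
# {'part_id': 'same', 'temp': 'worse'}
```

A real scoring platform would presumably track more dimensions, such as freshness, and would alert the model owner when a field degrades; the comparison above only shows the same, better, or worse question in its simplest form.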
All of the data scientists in the company are now coming to me and saying, build me a platform where I can do this.

The next one is pretty new: impact risk analysis. Our supply chain and procurement teams use impact risk analysis for their supply chains, and we can do the same thing for the data supply chain. If we have a great catalog where we can see the data's movement, we can start looking at whether things are going down or not being refreshed when there's a critical need in the business. How do we reroute things in order to get them back up and operational? These are the kinds of real things that data science can do to improve data quality.

You know, I really love the one about model data scoring, being able to ask: is the data that we're using now for my model the same quality as the data the model was trained on, or is it better or worse quality? Certainly data scientists want to know that, and a lot of that information is available through metadata and information that you'll collect in your catalog. We need to more formally collect that information and make it available to data scientists, so they have that higher level of trust that I spoke about earlier. I love to see this. These are some great examples of how you can use data science to really focus on improving the quality of the data. Let's spend a couple of minutes talking about what the data scientist's role is in data governance, because we're talking about using data governance to improve quality and using data science to improve quality. When I talk about roles and responsibilities for data governance programs, and I do that quite a bit, you may recall that I talk about data governance partners. If you haven't seen it yet, please go look for other Real-World Data Governance webinars on roles and responsibilities, but I have the role of the data governance partner.
Data governance partners are parts of the organization, or functions within the organization, that are already governing. They're not necessarily governing data, but you've got project management that's governing projects. You've got finance, and they're a partner plus a business unit, but they're governing the finances of the organization. So I think that data scientists, just like IT (and they don't always reside within IT), are also a partner of data governance. We're not here to tell them what to do, and they're not here to tell us what to do; at least we hope that's not the case. We've got to find ways to work together. If we can bring the power users and the data scientists into the data governance fold and really understand what they're looking for in terms of quality of data, we can really govern the use of the data analytics, or at least help the power users and the data scientists to govern that use. The data scientists are extremely important in the fact that they require confidence, so we need to be having conversations with them. I don't know how often, in the days when we were all in the office, you would go into the lunchroom and find your data scientists sitting with your data governance folks. We need to build relationships with the data scientists, and we need to build relationships with the data stewards, so that we can truly understand what their requirements are. Then we're not a data governance program that's shooting at targets that haven't been defined. If we can get these folks to work with us in data governance to provide requirements for the analytical use of the data in the organization, we're going to find that we can strengthen that bond, and we can really provide them with things that are going to make their lives easier.
Chris, have you done that in some of the situations with your global data stewards? Have you engaged the stewards and the scientists to see how data governance can help us to do a better job of, for one example, improving data quality?

Real quickly on that, and then I'll go over these bullets. We have at WD, in our global operations function, a fairly large analytics and automation initiative for the next couple of years. They actually reached out to me and said, we need good data health, how do we do this? So now I lead a functional track, a foundational track, for that initiative, and I tie my value and data health directly to their proposed ROI, which, I can tell you, is in the hundreds of millions of dollars of savings over the next three years. With that, I partner very closely with my analytics and data science community because of what they're doing to provide that tangible value for the company. They came and said, I need good data quality to make this happen, so that's a very good leap forward in terms of relationship building for our program.

The data scientist's role in data governance has changed over the last couple of years, I think, but in a lot of cases that role can be overlooked inside the overarching data governance umbrella. I've gotten a list of excuses and explanations: it's not their role in the company; they are not data stewards; they don't have time to do this; or they don't have a direct impact or influence on data quality. But the fact is that they're usually aggregating data from several different data domains, and they might claim that they don't have any direct influence on the source data that they're using. Really, the truth is that a data steward can be anyone who has a relationship with the data, and this of course includes data scientists.
In a lot of cases, data scientists can see data issues that arise across domains, so they're in the ideal role to make a huge impact. It might be a little bit late in the game, but they see data that's being stitched together across many different domains, and that's the first time there's an opportunity to look at the data in that way. So they can give great feedback to the source data owners and data stewards and say, hey, in order for this program to work out, we need better data quality. They actually have a very key role in our data stewardship program, since the company relies really heavily on their expertise to provide business value.

Okay, you know what, I love what you were saying about the scientists being in that perfect role. I always break down the actions that people can take with data into three: they can define data, they can produce data, and they can use data. Typically everything falls under one of those three. I've challenged people to give me a fourth or a fifth, and I like that I've found a way to slot everything under those three. Well, data scientists are certainly users. In fact, they're power users, and they're even power users on steroids. I love when you talk about how they see a lot of the data problems, how they see where the data is stitched together, because, again, these are scientists that are looking to solve these issues with the data that we have. So maybe you can go a little bit deeper: what kind of things do they see that might not be seen by somebody who's not in that role?
That's a great kick back to me. So in our global operations, this is manufacturing: these are factories that are producing a high volume of data and a high number of parts, with a lot of things going on. Probably two or three years ago, we were using a fraction of the data, I would say 5 to 8% of the data being produced, to make decisions on. Now we've calculated that it's more like 25%, even while collecting at a higher rate. So we can actually see and do things on the manufacturing floor in real time with streaming data that we weren't able to do before, and these data scientists are the ones that use that source data picked out of the stream. Data engineers will stitch it, put it all together, and provide the data to the data scientists. Data scientists are paid to look for needles in the haystack. They find those needles very regularly when the data quality is good; when it's not, they're finding pins, not needles. So they're in that cross section where they're paid to look at data and use data when people usually aren't looking that deeply into it.

You know what, that's something that I really hope people will take away from this webinar. If you are in the position of implementing a data governance program, and you've got data scientists, or are even starting to have data scientists, within your organization, get in touch with them and give them a role in your program. Chris is a perfect example of how data scientists can really help. Because they know the data, or they need to know the data in order to do their function, they just need to be a good partner of data governance, and there's certainly a big role for them. It's the same thing when I talk about the data scientist's role in improving data quality. They are quality partners. I think I need to go to my next slide here, or did I?
Yep, that was your next slide.

That was my next slide. Yeah, sorry about that, folks. I'm going the wrong direction here. Right there. Right here. Okay, so we talked about the data scientist's role in data governance. Well, the fact is that, just as Chris was saying, they're a partner in data quality as well, and we need to engage them. That can help us to determine where the pins are versus where the needles are, the things that we or they are actually looking for. We can certainly leverage a lot from them. So, again, as I mentioned, when they're partners of data governance, they're data quality partners as well. I don't know if organizations define sets of roles and responsibilities specifically around data quality, but if you did, I would think that the data scientist would be a very important role, even just for the reasons that Chris shared with you. They govern the use of the data, and they define what their confidence requirements are in the data that they use. They do all the same things for data quality that they did for data governance. So it makes sense that even if you're doing data quality without doing data governance, you need to be thinking about bringing the data scientists into the fold. Like I said before, if there's one really important takeaway from this webinar, it's the importance of sitting down with your data scientists and understanding how they use the data, what will help them to use the data better, and what you can provide to them so that they can improve their level of quality in the data. So maybe, Chris, you can share a little bit of how you've gone about that in getting the folks engaged, because I know that a lot of what you've been focusing on is data quality.

So, just to expand a little bit on the previous slide before I actually start this one.
We don't have a true enterprise data governance team where those 1000 data stewards report to me, right? I have a very small team of people, but I have experienced people, we call them data governance program managers, who work with the stewards that are embedded in the data. Having said that, I kind of Jedi mind trick them into creating a data governance program, right? We focus on data quality, but in order to get data quality you have to have some processes in place. Then you've got to have some roles and responsibilities around stewardship and those processes, and then you can back into policies. That way it doesn't have to start from a policy that you then try to implement; sometimes that never works. To break it down: look, I'm very pragmatic, very practical. There is a wide range of disciplines inside data science. So I look first at building the architecture and the pipelines, right? We have analysts, architects, engineers. Those have a direct impact on data quality, because they're building how the actual statisticians and analysts are going to use the data. So data quality built into the pipe, not bolted on later, is very, very important in our program. Then there's using the data. Once those architects and engineers have built the foundation, we can safely provide data to the analysts, the statisticians, whoever we're actually using to do the models. They have direct knowledge of that data domain and can give direct feedback on data quality. Usually these folks, the data analysts, the BI analysts, or the statisticians, are involved in the continuous improvement program, and you'll see later in the slide deck that this is not a one-time build. It starts small and keeps building, and it's never finished, so you have to have a good continuous improvement program in order for this to be successful.
Did I hear you correctly when you said that you played Jedi mind tricks on them to build this up? Can you elaborate on that a little bit?
Yeah, because when we started the data governance journey at WD, it was, "Data governance? Oh." Immediately there were flashes of red tape and bureaucracy, just from hearing "data governance," before we'd even had one conversation, right? So we had a journey just like most data governance teams, and we asked, what does this company care about? Talking to executives, mid-level executives, all the way down to the people using the data, it was data quality. So we flipped it on its head. We said, we're going to do data quality, but we're going to need these other things too, which are really data governance program practices, right? That is how we backed into it, and that's how I get away with having a rather small team. Now the people in the business are performing these processes, these checks, this ownership of the data. What my team does is just help them facilitate data governance processes without calling it a data governance process. That's the Jedi mind trick, right? Provide small wins. They'll want more, and then they'll start asking for it. And before you know it, they're doing data governance without even knowing they're doing it.
You know what, it sounds to me like you're tricking people into thinking that it's their idea. Absolutely, governance is necessary, and that's a brilliant philosophy to follow, because people come away thinking that they're being listened to. And if it just so happens that governance, with the focus on quality, addresses the things that they're talking about, that's even better. I've heard that in meetings with executives too: get them to think that these are their ideas, and that we're following their ideas.
So not only are you learning to work with your data scientists in this webinar, you've now got some Jedi mind tricks that you can use to engage your stewards and your scientists, to get them focused on improving the quality of data in the organization. It's great when there are takeaways from the webinar. But when you said "Jedi mind trick," you immediately start thinking, okay, how's he tricking people into doing things? You're not really tricking people. You're getting them to come up with the ideas themselves, and when they make that connection, there's nothing more valuable than that. The last subject that we want to talk about in this webinar is how we can combine these three topics into a tour de force. On the next slide, when I get to it, there's a great diagram that I think shows how we can bring those things together. But just real quickly, before I turn it over to you to share that, and before we toss it back to Shannon for questions for Miles, for you, and for me to wrap this up: the first thing that we need to focus on is stewardship. I know that stewardship has been at the heart of your program since I started talking to you several years ago. The idea is that, if you're going to take a non-invasive approach, people are already governing data; they're doing it informally. And because they're doing it informally, they're doing it inefficiently and ineffectively. If we can help to formalize that accountability, then, as I've been known to say, and Chris, I know you've heard me say it many times, everybody in the organization is essentially a steward of the data, because everybody defines, produces, and/or uses data as part of their job. So we've got to focus on the stewardship aspect of governance.
If we're going to get to the execution and enforcement of authority for analytics data, or for any data in the organization, the way that we're going to do that is through our stewards. A big reason why Western Digital has been so successful is that they started with that core discipline of stewardship as a piece of governance. We also need to look to further partnership, coordination, and cooperation with the data scientists. If there are separate folks who focus on data quality, they need to be engaged and part of the data governance program. So we need to build out communications around governance, around science, and around quality of data. That includes orienting people to the subjects so they're not scary. The term "governance" is scary; people go running for the hills when they hear it. So we need to orient them to what governance is, why it's necessary, and how there are different approaches, like the active approach that Alation talks about and the non-invasive approach that I talk about. Then there's the whole onboarding of these folks in the organization, and ongoing communications. We need to make certain that communication becomes a critical aspect of all three of these things, and we can tie them together through those communications. The ideal scenario in our organizations would be to get people to view data governance, data quality, and data science as complementary disciplines, not as separate disciplines that are unrelated. That partnership is extremely important, and Chris, please share how you went about marrying these three topics together, and then I'll go to your next slide here. What I'll say is that data science relies on data quality, and we're going to achieve data quality through formal data governance, that is, by taking people's informal accountability and making it formal.
Chris, why don't you share with us this incredible diagram of how you bring these three concepts together?
So this one has resonated in the business, right? The name of the game is accelerated value. Business value can be gained; companies have been successful for a while now with some of their analytics programs, right? But it takes longer, and they have a little bit more frustration. If we take a look at the diagram, we have data governance. We've been saying this the whole presentation: data scientists should be part of data stewardship, and in fact have a very large role in continuous improvement. I think having a large stewardship base really helps your continuous improvement program, because there's no ambiguity about who to contact for data quality in your different source systems. And this is a big part of it for our company, because we make products, right? We build storage solutions. That's what we sell; it's our value. Data quality should be treated like a product. Here we're building things, but we're also using the data to help build the things that we build. So if we look at data as a product, then we know that, just like any other quality work that we do, there are measurements, we've got to make sure there's remediation, and we've got to have programs around those. This has really resonated because, like I said, we're engineers; it makes sense to our minds as how we do anything in quality. And the last is data science, right? We're not using data science or automation to replace anybody. We're using data science to make humans more effective and productive, because that gives you your business value.
In fact, we have a program here at WD: if a data model or any type of analysis or automation affects somebody's job, we retrain them, we upskill them, so that they can be part of the long-term viable solution of the company. A few notes here. Data governance does not just have data stewardship in its core set of competencies; stewardship is the major factor of success, without a doubt. And data quality should be part of a larger management process of continuous improvement. These three things together will get you accelerated business value. We have on the books with our executive team hundreds of millions of dollars of savings that we can provide over the next couple of years. The company is really relying on us to make data science work, and without these other components the ROI wouldn't be there. So when we're talking to the company and executives, business value is the name of the game, and if you can somehow tie it to your analytics programs that need data quality, then the ROI will write itself.
You know, you and I could talk about this all day, and I love the concept of data quality as a product. You hear about data governance as a service, you hear about a lot of different things as services and as products, but data quality as a product really makes sense to me. You want your product to be of high quality, and if your product is your data, then you need that to be of high quality as well. There's one question people may have, and then I'm going to wrap it up here in a second and turn it over to Shannon. The one question for you, Chris, is: how long have you been doing this? How long have you been working on this within Western Digital?
And when did you come to the realization, and maybe it was right from the get-go, that data science and data governance really needed to be that tour de force to help improve data quality? Was it right away, or was there a revelation somewhere along the journey?
So, at WD we've been down this data governance path for about five years now, and it's evolved since the inception. When we started, data governance was under different leadership that was really focused on policy-driven, I would say traditional, data governance things, for lack of a better word. But I know that you have to deliver; you have to get these wins in order for people to take notice and be interested. So almost immediately, I would say a year or two into it, we focused on data quality and started driving toward data quality, largely as the value that data governance, quote unquote, can bring to the table. Then it was, I would say, a year and a half ago, not even, about a year and a couple of months ago, that the data science community came together and said, "Hey, I need trusted data. How do we build a program to do this?" Over the last year, building frameworks and architectural solutions for data health platforms, is really when these three melded together. It's now viewed as one of the driving forces of how we are going to gain that value in the company.
Well, you know what, you've been the ideal guest for me to have to talk about this subject. Like I said, when I want to talk about stewardship and how it's been effective, we have conversations about it, so I really appreciate your help and your willingness to share what you've been through, how long it takes, and those types of things. So we're coming to the point of the webinar where we're going to start to take some questions, but first, real quickly, a recap.
What we talked about today: we spoke a little bit about the relationship between data governance and data science, and how important it is to build that relationship. We talked about how we can leverage data governance and data science to actually improve the quality of the data, and Chris gave several examples of ways that he has gone about doing that. We talked about partnership and where data scientists play a role in data governance. Chris, that was a great example that you gave of the Jedi mind trick in governance and improving quality: get these folks to say, "Hey, we really need this," and let them think it was their idea. That is the way to be successful in organizations. And the last topic that we touched on was the combining of these three concepts. They are not separate, unrelated disciplines. In some of the frameworks they may be separate items, but the fact is that the more we can bring these items together, the more successful we are going to be with data science. As data science continues to grow as a discipline within our organizations, and quality continues to have such an impact on the confidence that people have in data, we need to bring governance and data science together to improve quality. And with that, I would like to turn it back over to Shannon to see if we have any questions today.
Thank you all for these great presentations today. If you have questions for any of our speakers, feel free to submit them in the Q&A portion of your screen. And just to answer the most commonly asked questions, a reminder: I will send a follow-up email to all registrants by end of day Monday with links to the slides, the recording, and anything else requested throughout the webinar. Diving in here: is there tooling available to automate the capture of attribute-level lineage? It is otherwise a very manually intensive area of a data governance program.
Well, maybe we'll start with Miles. Miles, do you want to talk about how the Alation catalog can help with automating the lineage of the data, to help people know where the data came from?
Yeah, I really liked the remark that was made earlier around the notion of continual improvement. This is an area where data science applied to data really can help. A catalog can be a great place to automate a lot of these processes as you go downstream, and a great place to find stewards. I've said it before, but it's really hard to find stewards if you have to go do it top down. So the ability to find who the people are who are talking about a topic, modifying data, and things like that: those are your natural stewards. And of course lineage is a huge issue: where did the data come from? How we better control it, even going back into the transactional systems, is a big deal too. So my thoughts are, yeah, tooling can help a lot, but you still have to do the process and people stuff too.
And Chris, did you use a tool to help you determine who the stewards were and to document lineage as you were building out your program?
Yeah, great question. Right now we are looking to buy some tooling. As this program continues to grow, manually herding, let's say, and keeping up with the steward list and associating stewards with data assets is a rather large task. So we are currently looking for a data catalog tool. So the answer is no, we don't currently have one. Having been a solution architect in the past, I've built a homegrown solution that was great for learning how to crawl and to walk. But now we're starting to run, and we need some automated solutions to help us. That's the point we're at right now. So I guess the answer to the question is, yes, it's a great question, but I think the answer...
The easy answer is yes. Yes, tooling totally can help. But it's not the overall solution. People are the solution.
Okay, I love it. So, how do you get buy-in from areas of the organization to participate in data governance and to foster collaboration across the organization's data scientists, for example within research and development departments? Top down, bottom up, or a combination of both? Any suggestions? Chris, do you want to start with that?
That's a great question, and I can see why it was asked. It's very difficult to do. But again, it goes back to the value proposition of your data governance program. If it's risk, and your data scientists are looking at models to help assess and predict risk, then that's a slightly different conversation. Whereas on the manufacturing floor, we sell product. So the leap to data science is: data scientists will produce a model that will help the floor do this, which could save half a million dollars a year in efficiencies or something, right? Then you can tie it to that, and the conversation becomes much easier, because you talk to your data scientists and you say, "Hey, can you do this model with crappy data?" They say no. Get engineers to where they have to admit something and they'll back you, or get data scientists to back you and say, "No, this will not work without quality data." Then you start having that conversation with your data domains, toward executives and mid-level management, and then they'll start putting it on the books: "Hey, we need some resources for this." So as long as you get someone out of their comfort zone to say, "Yes, I need it," then it becomes easier to have that conversation, no matter if you're in this operational space of manufacturing, or in risk, or credit, or whatever.
Finance. That's kind of what I always go back to: having them admit whether they can do this without good data quality.
You know, I always ask people, what can't you do because you don't have the data to do it? Or what would you do if you did have the perfect data? Miles, I know you talk to a lot of CIOs. Have you found any specific messaging that seems to really hit home or resonate well with these folks when we're trying to bring together these disciplines, or when you're just talking to them about the state of the industry?
You know, the key thing they say to me all the time is, put it as a business issue. One of the things I've been really fascinated with is that they'll describe it as, "I've got data out there, I don't know where it is, and it may be a risk to me," or, "I have people wanting to make decisions, and they can't trust the data." So I think if you can talk about it not in terms of technology but in terms of business problems, you'll get them on board, and then they'll help you get the rest of the business stakeholders you need to get a program established.
And you know, I think people should listen to you, because I know that you do talk to a lot of CIOs, so I think that's really good.
And I think I have time to slip in at least one more question here. And just to everybody out there, keep your questions coming, because after the webinar I'll get all the questions we didn't have time to answer over to Bob, so we can get those answers into the follow-up email to you. So, Chris, are you using a data quality tool? There are some tools now that use AI to do the same things talked about here, tools like DataBuck and others. Or are you finding that you still need the data scientists to do the data quality work?
So we do not have a data quality tool that does a bunch of the AI and ML, and the answer is twofold. One, they're very expensive, and you've got to have a really good ROI to justify just the purchase of that tool. Two, the quality tool isn't going to be all you need. You need a good catalog as well, because that's going to hold all of your metadata. So when I say the answer is twofold, it's that when we start looking into tooling, there are solutions out there where you can buy a data catalog and put a data quality tool on top, and then we're talking about dollars. When you're trying to build a data governance program, you want the lowest-cost footprint available. But I can tell you, since we have partnered with our analytics and automation team in operations, we're not going to give up, because they have a large ROI promise, and the amount of money to buy tools is now negligible compared to what they'll save. So it's really, really important: the business problem, the business value, needs to have a clear, direct link to saving money or putting the company at less risk. So we don't plan on buying a data quality tool right now. Some of the data catalog tools that we're looking at have some of this built in. But we have a plethora of data scientists at our disposal in the company, so we want to take that metadata and utilize it for something very pointed. Some of the data quality tools are getting there. They just don't provide enough of what we need to get the ROI for that spend.
So I guess I could summarize what you said as: go where there's money. If there's money being invested in other things, then yeah, you're right, the data governance and quality tooling actually becomes a negligible, or at least a minimal, aspect of the overall investment.
So I think that makes sense to me. You hear the expression, "there's gold in them thar hills." Think of these other efforts as them thar hills and go after the gold there.
Yeah, one last thing just to add: also invoke your community. If you can capture what people think is good data versus bad data, it can help as an initial step.
Thank you all so much for these discussions, but I'm afraid that is all the time that we have slated for today. Again, I will get all the questions we didn't have time to get to over to Bob, to get those answers into the follow-up email, which I will send out by end of day Monday and which will also contain links to the slides and the recording. Thank you all for these great presentations. Again, thanks to Alation for sponsoring and helping to make these webinars happen. Love it. And thanks to our community for always being so engaged in everything we do. You guys are just the best. I hope you all have a great day. Thanks so much, everyone.
Thanks. Thanks, Chris.