Hello and welcome. My name is Shannon Kempe and I'm the executive editor of DATAVERSITY. We'd like to thank you for joining this month's installment of the DATAVERSITY webinar series Data-Ed Online with Dr. Peter Aiken, brought to you in partnership with Data Blueprint. This month, Peter will be joined by guest speaker and colleague Karen Aiken to discuss data quality success stories. Just a couple of points to get us started: due to the large number of people attending these sessions, everyone will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the upper right for that feature. For questions, we'll be collecting them via the Q&A in the bottom right-hand corner of your screen, or if you'd like to tweet, we encourage you to share highlights and questions on Twitter using hashtag #dataed. To answer the most commonly asked questions, as always, we will send a follow-up email to all registrants within two business days containing links to the slides and the recording, and likewise anything else requested throughout the webinar. I'm tripping over my tongue today. Now let me introduce our speakers for today. Peter Aiken is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. I don't know, Karen, if you want to switch to the Peter slide. Peter is also founding director of Data Blueprint. He has written dozens of articles and eight books; the most recent is Monetizing Data Management. Peter has experience with more than 500 data management practices in 20 countries and is consistently named a top data management expert. Some of the most important and largest organizations in the world have sought out his Data Blueprint expertise. Peter has spent multi-year immersions with groups at the U.S. Department of Defense, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. He also often appears at conferences and is constantly traveling. Joining Peter today in the discussion is colleague Karen Aiken. Karen is a data management consultant with Data Blueprint. As a certified data management professional, she has data management and solution development experience for numerous government and commercial clients. Her skill set includes in-depth analysis of clients' business processes, analysis of data and data sources, and development and communication of data-centric tailored solutions that add business value. Her expertise focuses on eliciting business and technical requirements and facilitating communications between business users and technical experts, including all levels of management. She has helped clients improve data flow logistics, develop data quality programs, implement data governance programs, and utilize data visualization for effective decision-making. She is also a board member of DAMA-CV. Now, let me give the floor to Peter and Karen to get today's webinar started. Hello and welcome. Hi, Shannon. Thanks so much for that great introduction. And welcome, everybody. We had such a good response to John Selve last month; you were terrific about giving him kudos, and he did a terrific job. So we decided to ask Karen, who's also one of our lead consultants, to talk to us this week about data quality.
And we're going to do it a little bit differently, and we're looking for your feedback, so let us know if you like this better, worse, you know, whatever. Not Karen, but the format. Because what we're going to do now is have more of a conversation about data quality. And Karen, we're going to jump right in on this. Really, your role is very much that of what we call a business analyst: you speak both technology and the business. And that's a lot of different businesses, and a lot of translational work in there. So when somebody approaches you from these perspectives, what do you say when they ask, why should I be concerned about data quality? That's a great question, Peter. I think, first of all, when we look at an organization's data and try to help them understand why they should be focused on data quality, we're looking at several things, and we ask them a number of different questions as to what data is important to them. And usually their answers come back that they are interested in doing things like analytics and business intelligence, increasing efficiencies, and finding out where their costs are. And each one of these areas requires good data quality in order to make effective business decisions. So that's how I would answer that question. I think it's also really important to point out that data quality is not just an IT issue. The business really must take an active part in it. And that's really where your expertise comes in, because many people think that because it's data, it must be an IT issue, and therefore we can solve it by buying a tool. And of course, that's just a very naive perspective, isn't it, Karen? Absolutely. More often than not, that's why their data has quality issues: they tried to throw a tool at the problem. And we need more than that. We need that business knowledge around all of it in order to pull it together. The answer here is that data really needs to be governed by people, policy, and procedure, as well as technology. And that's one of the things you're going to get into in the presentation. Exactly. So I'd like to talk a little bit about how you would get started with a data quality program. Taking what Peter just said: first, you have to look at your foundational practices. You have to look at the data that's most impactful and important to the business needs. In the case study that we're going to talk about today, we'll be looking at a data quality program that was implemented for a global education company. In this case, the data that was most important to them was data that they were going to migrate into a new ERP system. Once you've looked at those foundational practices, you also have to look at what your organization's current capabilities are across your data management practices. What do you currently have available for preparing and managing your data? In this case, we were again looking at their data migration plan and determining that a data quality effort was a crucial component of it. We then created a roadmap to help them deliver on this, and a very key component of this roadmap was a repeatable data quality process. So that's what I'm going to talk to you a little bit about today, and hopefully you'll be able to take something away from this that you can use in your own organization. We should point out a couple of things about this particular case study that you're presenting.
First of all, the client gave you permission to talk about the exact challenges they were facing. Not because it was a particularly good or bad instance, but because these are very illustrative of exactly the type of real-world problems that you run into as a person who's trying to help an organization better understand how data quality can make a tremendous impact on business activities. Exactly. So let's talk a little bit more about what this particular client's data landscape looked like. They had grown tremendously through acquisition. It's a global education company. There was no accountability for their data as they acquired new companies; there was no requirement to conform to their existing ERP systems. They had been through many reorganizations. Even during the time that we were working with them, there was a reorganization going on. So it really created quite a challenging environment just to manage their data. And of course, that had an impact on their business. They had no way to look comprehensively across the organization and do any business intelligence that looked at all of their data in one spot. They really didn't have good data governance surrounding their data. They had a new CEO who came in and looked at the business a little bit differently, and who started asking some seemingly fairly simple questions: What's our top product? Who are our top customers? Who are our top vendors and suppliers? What do those relationships look like? And these were all questions that could not be answered, or could not be answered in a reliable manner, with the data that they had available to them. So that presented a great opportunity, and their chief data steward saw this opportunity to implement a centralized data governance program and formalize data stewardship across the organization, while at the same time being very proactive instead of reacting and fighting fires. Their chief data steward worked with us to implement a data governance program for which this data quality program was really the basis and a very key component. And as you'll see throughout this, there are many benefits to implementing this particular program. I always hate to make everybody wait until the end to see what the results were, so we're going to give you a sneak peek at how this program actually improved their data quality in just one of the several domains that we looked at during this particular engagement. We looked at supplier data in a number of different systems, a couple of SAP systems, a couple of Oracle systems, and a couple of homegrown systems, in four different countries. These particular graphs show data that we looked at from a quality perspective from the South African area of their business. They had some very significant issues that they wanted to focus on because they had uncovered some fraud and were working to minimize that, as well as to do things like increase their efficiencies and maintain cash flow. So we started out looking at just their total numbers of suppliers, and the blue bars here on the left indicate what it was when we started looking at this data, just in pure record counts, and how we were able to help them monitor and reduce some of the data quality issues that they had.
And this is just two different systems that they used in South Africa. So they had things like a lot of duplicate data, missing data, missing payment terms, and missing email addresses, which was hampering communications, and those all led to specific business impacts like missed payments, cash flow implications, and supplier and customer relationship issues. Like our client, I'm sure that many of you could look at this particular slide and find data quality issues that you could identify with as well. I mentioned a few of them before: missing information, duplicate accounts, inconsistent use of business terms, even trying to define what a particular piece of data should actually mean. They had no product hierarchy, no universal product model. So when they were doing some of their reporting and trying to understand what these particular reports were even telling them, let alone what the quality of the data behind them was, all of these issues came into play, and they all acted like ticking time bombs waiting to explode. We approached this by talking to them about doing a repeatable data quality pilot program, and we will walk you through those steps later in the webinar, exactly what we did. But the first thing that needed to be done was for this chief data steward to sell the message across the enterprise. She did that in a variety of different ways. She developed a 60-second elevator speech to try to get people engaged at all levels. She used some of those current inconsistencies that we pointed out before, the ones impacting people's reports, to really get people on board. When two people came into a meeting with the same report showing two different numbers and couldn't understand why they didn't match, those were great illustrations for her to use to sell her message as to why a data quality program was actually needed. She went directly to the senior level, to the C-suite, to get sponsorship for the program, and by quantifying the cost of poor data quality, she was able to obtain that sponsorship. We went on then to perform this data quality pilot, and she used the results of that pilot to demonstrate the success story, which helped her really sell the message and get buy-in for implementing a program that could be used over and over again across the enterprise, in different areas, across different data domains. And Karen, this is really one of the major successes that she had, because unfortunately, when many people think about data quality, they sort of think about being done. And we knew in this case that being done meant that you would achieve some tangible results, but at the same time, the idea was always that if you invest even more into this, you'll be able to save even more out of it and avoid the sort of death by a thousand cuts that you were describing earlier in the process. Exactly, Peter. And that brings up a good point. We used this particular data quality program as a foundation for a data governance program. And you'll notice I use the word program a lot. To me, you're never done. If you term it as a project, there's always an end date to a project. But if you sell it as a program, you have a much better chance of it continuing on, rather than being a once-and-done type of project.
To think about the different situations that you might have with your data, how to develop this message, and how to get some buy-in to support a data quality program, you can look at a lot of different situations or very specific impacts that poor data quality might have on your particular company or organization. In this particular case, these are real-life examples at this organization. They had, as I mentioned before, no standard product hierarchy, so they couldn't determine what their most profitable products were. They didn't have a process for deactivating vendor records, which led to over 200,000 obsolete vendor records that had to be removed from one of their systems. And they were very aware that there was a lot of ROT, that is, redundant, obsolete, and trivial data, in their systems. They had also identified data quality issues in the payment terms on their vendors and suppliers, things that directly impacted their cash flow. When we did interviews with their subject matter experts, they said, oh, 90% of our suppliers should be on 30-day payment terms. When we actually profiled and looked at the data, we found it was the exact opposite: 90% of their suppliers were on immediate payment terms. So just looking at some of these different situations and putting a plan in place helped them avoid and lessen some of the true business impacts that they were having from these quality issues. One of my favorites on this slide, Karen, is your supply chain example, where you were able to identify hundreds of hours that were spent tracking down missing data for tax purposes. And that's such a quantifiable piece. Nobody likes to do that kind of work anyway, and when you turn around and say, it's not fun work and it costs us money, there's a ready-made case for it. Exactly. We were able to put a dollar value on that. At first it did seem kind of trivial when you were thinking how many hours or how much per hour, but when you could actually quantify it and show that dollar value to somebody in upper management, it made a huge difference. Karen, I was on a panel recently at another conference where the topic was, can data quality be quantified? And this is exactly the example that I used from this slide. I said, yes, absolutely. If you put $10 an hour on it, it's over $3,000. Nobody wants to throw that kind of money away. Right. So if you want to avoid situations like you see on this particular slide, where do you go? We really advocate that you need to start with an enterprise data strategy, and I know that many of you have heard Peter and others speak about the importance of having a strategy. I'm going to walk you through several steps. Having senior-level sponsorship and an organizational culture that really focuses on data as a strategic asset is very important. And this is going to appear kind of as a path. It doesn't mean that you have to have every part in perfect order at this point, but these, we feel, are very critical areas that will help you sustain a data quality program. We always look at a good data governance and stewardship framework. You may need to have a data governance board, depending on the type of your organization. Of course, good data quality principles that are part of your process and your system design are really important, as well as a standard business glossary and a master data management solution. But you don't necessarily need to have them all at once.
And today we're really going to focus on what those data quality principles look like and how you embed them as part of your business process moving forward. And again, Karen, you came to us with business process engineering skills, business analysis skills, as well as data skills. And you've already mentioned a couple of times that this is a sort of shared, bridging function that you perform. So you put together a framework, and recommend it as a way of moving forward in this area, to help everybody literally get on the same sheet of paper. Exactly. And this next slide shares a little bit about what this framework looks like and where an organization might start. The first thing is: who has accountability and responsibility for the data? We often see this as an issue. I don't think I've been in an organization where this isn't a problem. Some are much further along in identifying who is accountable and responsible than others. But how do you look at that? You need to establish data ownership and make somebody accountable. There are certainly decision rights that need to be established: clearly defining who owns the data, who can make changes to it, and who's responsible for it. Sometimes you start at the very bottom and work your way up, and other times it's easier to start at the top. It really just depends on how your organization is laid out. The second area that we see needs to be immediately addressed as part of this framework is inconsistent data definitions. Those lead to poor data quality faster than anything else. When you have two different areas of your business using the same term, but defined differently, they're going to populate data in systems differently, and they're going to interpret it on reports differently. Those things can lead to very poor data quality very quickly. So establish some type of process where you define terms and where there's a centrally accessible business glossary. Even if it's something as simple as a spreadsheet, it doesn't have to be fancy, but it should be something that people have agreed upon across the organization, so that when you're talking about a particular term, you know that supplier means supplier. Or, I think, Peter's favorite example of a business glossary conflict: who's a customer? You need to get some agreement on that. Right. You need to get some agreement on that, because once you've defined those terms, you can set your standards, what the data quality standards should look like, and set some metrics for meeting those standards. And then the third part of the framework that we always encourage you to begin to think about is looking at your master data. Having little or minimal standardization across your lines of business, again, leads to very inconsistent data and data quality issues across the organization. So there are some steps that can be taken there, including developing master data standards and establishing a change control process, which, again, goes back to the fact that you must have some ownership and accountability first, and then bringing in data modeling so that you've got consistent data models, all of which play a big part in your governance and stewardship program. And the master data piece in particular, Karen, goes back to your comment about ROT. Not all data is important, and focusing on the master data is much more effective than focusing on the transactional data, in general, for most organizations.
You've offered some very nice caveats by saying that each organization has unique problems, but this is one thing I think we can say more universally: master data is a really good place to start when you're taking a look at these types of data quality issues. Yeah, and as we looked at different data domains to start with, we looked at some master data areas, data that was going to be used across the organization. And let me elaborate a little bit more on what this particular client's landscape looked like. They were consolidating 43 ERP systems into one. So the data quality component was very, very important as a part of their migration plan, and we used our framework here to help identify the steps they needed to take to implement this quality program, which would then give them a successful data migration. Every part of an organization has to become involved if they are going to move from what we typically see as a reactive approach to a proactive approach for improving data quality. And this is where some of that engineering comes into play, embedding it into your processes on an ongoing basis. There are three principles that we really try to stress and help organizations implement. The first is to capture data right the first time. I think that's really the number one rule, and you'll hear me repeat it throughout the rest of this presentation: whenever possible, the data should be captured once, at the source, and correctly, and there should be validation rules on input. The second principle is to engineer in positive impacts on data quality. So again, wherever possible, make it automated, be proactive, and make it an ongoing part of your process. And that involves the third principle: integrating data quality right into your business processes. When data quality becomes part of your day-to-day job, not something you do in firefighting mode, the people who are actually creating the data become more accountable to the people who are actually using the data, and the accountability for good data quality flows throughout your entire organization. One of the really interesting challenges in this particular case study that you're presenting was that these are things that we wanted the organization to aspire towards. But of course, the situation they were actually facing on the ground was that they had dozens, literally dozens, of systems that hadn't had these controls in place. And unfortunately, all you could tell them was: these are great examples of why you should be doing this from the start, to avoid exactly the problem you now have to clean up. And there was no way of going in and boiling the ocean, which is the way most people would describe what they, of course, wanted. So instead, you developed a repeatable process so that they could take a slice of it at a time, depending on the business needs, the appetite of the organization to absorb the change, and the number of people and resources that you could involve. Exactly. And we are going to walk through each step of this repeatable process. But if you walk away from this presentation with one slide, I would hope it would be this slide, to help you understand how you would develop a repeatable data quality process on your own.
I think this is a really good summary slide overall, and we will walk through each of these steps. What I want you to notice right now is just how cyclical this particular process is. It's a continuing process. It has a starting point, but once you're on this wheel, you continue around it through each step and then move on to your next data domain on an ongoing basis. Somebody out there is going to say hamster. Hamster, exactly. The first area that we look at is this discovery area. And in this discovery process, again, I bring you back to the point that data quality is not solely the responsibility of the business, of IT, or of your data governance and data quality organization. It's really important for all three of those areas to collaborate and work together. The first thing you need to do is identify what the business need for good data quality is. In this particular case, as I mentioned before, we looked at several of the business impacts that poor quality was having, but the immediate need was that they were migrating data from 43 ERP systems into one, and they did not want to just lift and load bad data from one system to another. They had felt the pain and knew how many issues and problems it caused. They were looking at master data management processes. They knew that they had data quality deficiencies, things that were impacting them from a regulatory perspective, from a tax perspective, and in vendor and customer relationships. They were being driven by a recently implemented data governance board that was helping to drive this initiative, and also by the executive sponsorship that the chief data steward was able to engage. So there are a lot of different ways that you can identify what your business needs are. That moves us into the team members, and who it is that you need to bring to the table to play in this game. As I mentioned, it's a collaborative process among several areas: your data owners and your business data stewards from the business point of view, an IT data steward to help you with the technical side of things, and then a good data quality analyst. In this particular case, their goal was to implement a data quality center of excellence, but they did not yet have that in place, so we played the role of data quality analyst on this particular team. Once we had looked at the business needs and resources, the next step in this discovery area was to really work on refining the problem statement and then developing an initial business case that could be shared to justify the current project as part of a bigger program. So the data quality team worked together to look at objectives that were achievable but, more importantly, in alignment with the enterprise data strategy, and they also helped to develop that initial business case. They looked at things like: How much does it cost for somebody to manually go back and find and fix these errors? How much revenue is lost because of data quality errors, things like the discrepancies in payment terms I mentioned before? What kind of regulatory fines are being assessed because they aren't meeting tax deadlines? And what kind of damage to their corporate reputation do they have? In this particular case, an example was that they buy rights to publish information from various authors. Well, often they were rebuying the same rights because they didn't know they had them.
They weren't tracking them very well, and so that was an increased cost implication. But also, they thought that they owned rights to things for certain areas. They published not only in print, but digitally and online, so they needed to own the rights for all of those channels. They may have owned the rights to publish in print, but then used that particular material online, causing issues with their rights owners and damaging their corporate reputation. So having this group work together to help really refine the problem and identify that initial business case is very important, and it encompasses that first part of what we call the discovery area. So you're going to take us on to profiling, and explain a little bit about what this profiling is, because we've seen the term profiling in the news a lot lately. It's not quite the same thing, though they're somewhat similar topics. What is profiling, Karen? So profiling is the technique that we use to examine sets of data, to generate descriptive metadata, and to really gain a rapid understanding of an organization's data. And to do that, we use some technology; it can certainly help you automate the process of looking at the larger data sets. But before you actually are able to profile the data, you have to access it. You have to get it first. And to get the right data, you really have to know what to ask for. So how do you know what data you should be looking at? As Peter mentioned before, you can't boil the ocean. You cannot profile and look at every piece of data, and not every piece of data matters. So once they had their business case, the data quality team worked to define the data elements and the various source systems that would support it, and they focused primarily on the data that was answering the questions that needed to be answered. They worked backwards from artifacts: they took reports and worked directly backwards into source systems. They looked at the data flows that were available. They tried to understand any data lineage documentation. Sometimes that involved just tracing data back through systems; sometimes it involved looking at data in a data warehouse and understanding the source systems and source feeds. But the really important thing to note here is to get as far upstream as possible into those source systems, so that you understand where those data quality issues are coming from. And once you've identified this data, there are really two options for accessing it. You can always ask for it in flat files, and that's almost always an option. But of course, your best option is to build direct database connections if you can, and that is one of the reasons that we have that IT data steward as part of our team. We go through this process of initially profiling the data. And really, this is looking at the data without any business rules: one, to confirm that you actually successfully got the right data, and two, to make a sense check. This happens to be a screenshot from a data profiling tool that we use; the client had Informatica, so that's what we were using at this point, but really, any tool of this type can help you with initial data profiling. And you're looking at things like uniqueness and nulls, mins and maxes, outliers, things like that.
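To make that first pass concrete, here is a minimal sketch of the kind of rule-free profiling Karen is describing, using Python and pandas rather than a commercial tool like Informatica. The file name and the supplier_id column are hypothetical stand-ins, not the client's actual schema:

```python
import pandas as pd

# Hypothetical supplier extract; in practice this would come from a flat
# file or a direct database connection set up by the IT data steward.
df = pd.read_csv("suppliers_extract.csv")

# Uniqueness and nulls, per column.
profile = pd.DataFrame({
    "nulls": df.isna().sum(),                       # completeness
    "null_pct": (df.isna().mean() * 100).round(1),
    "distinct": df.nunique(),                       # uniqueness
})
print(profile)

# Mins, maxes, and spread for numeric columns help surface outliers.
print(df.describe())

# A quick sense check on what should be a unique key.
print("duplicate supplier ids:", df["supplier_id"].duplicated().sum())
```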
Again, this is really a common-sense viewpoint, not necessarily looking at the data with business rules applied yet. And so, what do you do next? You move into what we call a findings review. The findings review is important because, having conducted this data profiling, you need to debrief with the data owners and the business stewards to share the results. The purpose is really to help them understand what the profiling exercise was for, to make sure you got the right scope of data, and to help them develop expectations for their participation in developing the business rules. So, tactically, how do you do that? Not every member of the team, nor all of the business stewards, had access to the particular tool that we were using, so we extracted the information into Excel spreadsheets and PDF files so that everybody could review the findings. We highlighted for their review some of the peculiarities that we found, some of those data anomalies, and we also took an initial look at what we thought were potential business rules, as well as any patterns that we could see in the data. So when you've finished the profiling, where does the organization start to see the value from it? Because the profiling sort of points out the problems; now you're working toward solutions, right? Right. So the real benefit then was helping these groups that we were doing the findings review with develop business rules that could be applied to their data. And this is where that repeatable process comes in, because they're developing rules, and then we're going to profile the data again against those rules and defined metrics and review it again with them. This is kind of an offshoot of this cyclical process; it may happen multiple times as more rules are developed. As we move into defining business rules and metrics, there's lots of information out there on what makes a good business rule and what kinds of things you should be looking for, so I'll just very briefly cover some of the sources of the rules that we used. We looked at existing documentation and existing data standards. We did a lot of interviews with subject matter experts, because what we tend to find is that well over half of the business rules are not documented anywhere; they exist in the heads of people who just know what the process is supposed to be. We looked at desktop procedure documents, and we also looked at system documentation, because a lot of times systems have been configured to enforce business rules that people may not necessarily be aware of. I think we're all familiar with that: drop-down validations and mask validations, those kinds of things that are put right into a system configuration. Those truly are set up as business rules, whether any of the current users know that they've been defined that way or not. Some of the things that you're looking for as you're going through these various sources include what's allowed and what's required. If I enter this type of publication, what else is predicated off of that? What other fields must be filled in? Any fields that would link between data domains, a customer number or a GL account, are links that we often see business rules developed around. And then any of the patterns that you might have seen in that initial data profiling may suggest business rules as well.
Some of the example rules that we had, and this leads back to one of those first data quality issues that I pointed out: a tax identifier is required for all non-employee vendors. Many of those 320 hours per year that Peter pointed out on one of those early slides, spent collecting data at the end of the year in order to produce 1099s, were spent chasing missing or invalid tax identifiers. So those first two rules focus on what those identifiers need to look like. Another example of a business rule that we defined is that an entity should be unique and duplicates should not be entered. This was a common problem among those 200,000 duplicate or out-of-date suppliers, because there really wasn't any control in the system; the rule was known, but it was not enforced. And then something as simple as email addresses must be entered in a valid format: again, a seemingly simple rule, but something that wasn't documented and therefore wasn't enforced in the system. (There's a short sketch of how rules like these can be automated just below.) Now, I'll talk briefly about what types of rules make good metrics. A metric needs to be meaningful to the business. It needs to be measurable, something you are able to measure. It needs to be controllable: you can take an action to improve it. It needs to be reportable, with enough information in the metric to allow somebody to take action. And another important aspect of a good metric is that it must be traceable: you want to be able to show improvement over time. These are just some additional examples of questions you might ask, or particular metrics for various dimensions that you might look at, when defining your business rules. And again, I'll emphasize that this is a cyclical process: you develop some rules, define the metrics, and then you move right into evaluating your data against them and doing another findings review. Lather, rinse, and repeat, right? Exactly. And after you do that, you may go back and add additional rules or update and change those rules. Again, with this ongoing circle, you really conduct this process in an iterative manner and review it with those data owners until they're satisfied that the rules are correct. They often don't like the results that they see, because many of them think their data is of higher quality than it actually is. But this leads them towards what they can do to remediate their anomalies and move into a monitoring process. And you have to make a decision at this point of whether you go back and fix the errors or change the process, or a little of both in many cases. Exactly. And I think all of us are very familiar with the find-and-fix option; I think all of us have been involved in find-and-fix efforts for data quality. That often happens when you may not have access to the source system, or it's a once-and-done: you might be working with legacy data that there is no way to go back and do a process change on, and you're trying to correct data that's being migrated into another system. The other route is to implement a process change. Really, the best practice that we always encourage, and it goes back to engineering good data quality habits into your process, is to implement a process change.
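Before we get to process changes, here is a minimal sketch of how the three example rules above (required tax identifier, no duplicate entities, valid email format) might be expressed as automated checks. This is a sketch under assumptions, not the client's implementation: the column names and the "employee" vendor-type value are hypothetical.

```python
import pandas as pd

def apply_rules(vendors: pd.DataFrame) -> pd.DataFrame:
    """Evaluate the three example rules and return pass/fail counts.
    Column names (vendor_type, tax_id, vendor_name, email) are assumed."""
    checks = {}

    # Rule 1: a tax identifier is required for all non-employee vendors.
    non_employees = vendors[vendors["vendor_type"] != "employee"]
    checks["tax_id_present"] = non_employees["tax_id"].notna() & (
        non_employees["tax_id"].astype(str).str.strip() != ""
    )

    # Rule 2: an entity should be unique; duplicates should not be entered.
    names = vendors["vendor_name"].astype(str).str.strip().str.lower()
    checks["no_duplicates"] = ~names.duplicated(keep=False)

    # Rule 3: email addresses must be entered in a valid format.
    checks["valid_email"] = vendors["email"].fillna("").str.match(
        r"^[^@\s]+@[^@\s]+\.[^@\s]+$"
    )

    return pd.DataFrame(
        {rule: {"passed": int(ok.sum()), "failed": int((~ok).sum())}
         for rule, ok in checks.items()}
    ).T
```

Each failed count then becomes a candidate metric to trace over time, which is exactly the traceability property just described.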
How do you go forward, and how do you implement this best practice? And really, one of the ways to do that is to set up continuous monitoring of your data quality reports to confirm that what you're doing, whether it's data cleansing from a find-and-fix or from a process change, is really being effective. And to determine whether to implement a process change, you really need to look at the cost and the value, what the return on the investment would be. As I think we pointed out earlier, there are four key costs of poor data quality that we identified in this particular area; some of you may have others. But this is also how you elicit the value: by looking at your costs and lowering them, and seeing how your program can then add value back. And I think this is Peter's favorite thing: monetizing the value of your data. This particular slide looks at invalid customer addresses and what the true cost of correcting those addresses would be. They had seven instances of SAP, so this was looking at just one instance of SAP, in one country. There were over 84,000 errors identified through this data profiling process and through setting up business rules against the data. We then looked at the average salary in that particular country, converted to US dollars, for a worker who was engaged in correcting these address issues, and it was $25,000 a year. We also looked at the true cost of that worker, including their benefits, and it went up to over $34,000 a year. That calculates out to a salary per hour of $16.53 and a salary per minute of $0.28. These are relatively simple calculations if you can get your hands on some of the salary information, and this was fairly easy to do when we worked with the HR department. They don't generally want to share salary information, but they shared a range with us, and that's how we came to these calculations. So it's $0.28 a minute, and it takes four minutes to correct an address: that's $1.10 an address. Now, $1.10 an address, or even 100 addresses, might seem a little bit trivial. But think about the fact that there were 84,000 errors in this one particular system. That's over $90,000 just to correct these address errors. And if you have a similar number of errors in your other six systems, so you've got seven systems with these types of errors, you're looking at over half a million dollars across seven instances just to correct invalid customer address errors. And that is a huge chunk of change to any company. It adds up quickly. It certainly does.
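Since this per-error arithmetic is the template for monetizing any recurring data quality problem, here is a short sketch reproducing the numbers from the case study. The 2,080-hour work year is an assumption on my part, which is why the per-hour figure lands a few cents off the $16.53 quoted above; the shape of the calculation is the point:

```python
# Monetizing the invalid-address example from the case study.
true_cost_per_year = 34_000      # salary plus benefits, USD
hours_per_year = 2_080           # assumed: 40 hours/week * 52 weeks
cost_per_minute = true_cost_per_year / hours_per_year / 60   # ~ $0.27

minutes_per_fix = 4              # average, as timed by the reps themselves
cost_per_address = cost_per_minute * minutes_per_fix          # ~ $1.09

errors_one_system = 84_000
one_system = errors_one_system * cost_per_address   # ~ $92,000, one system
all_seven = one_system * 7                           # ~ $640,000 across seven

print(f"${one_system:,.0f} in one system; ${all_seven:,.0f} across seven")
```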
So just by identifying this and putting a dollar value on it, it became very easy to guide the company into making these process changes, versus just finding and fixing the errors, because these are the kinds of things that would continue to happen had they not changed their process. One of the things we look at, as we're trying to understand how to change the process, is how to find out what the root cause really is. There are several different methodologies for doing that. Danette McGilvray has a methodology that she uses that I really like, and we implement it often. It's called asking why five times. So you state what the issue is: there are duplicate vendor records, which were causing issues with payments, because bills were getting posted to one instance of the vendor record and payments might be getting posted against another one. So it was causing duplicate payment issues. The first question we asked is: why are there duplicate records? Well, new master records were being created instead of using existing ones. Why were they creating these new duplicate records? Well, the representatives didn't want to take the time to search for existing records. Why not? Well, they told us the search took too long. Why was the search taking too long? Well, the reps were actually not trained in the proper way to use the system, so the search techniques they were using were not the optimal way to search the system. But we also discovered that it was partly a technology issue: their system performance was poor. The crux of the issue, though, is that the long search time really was leading directly to the duplicate record creation, because the representatives who were entering this information were being measured on how quickly they could create new master records. They did not understand the downstream implications of having duplicate data. So by going back and changing their process, one, having them understand where the data was flowing to, and two, changing the process so that these reps were no longer being measured on how quickly they could create a record, but more on how accurately they could create a record, they implemented a process change that then allowed for better data quality. And that's just one example of how you might implement a process change. So you have remediated these anomalies, either by finding and fixing or, hopefully, by implementing process changes. What's the next step? Well, it's really to monitor the ongoing health of your data, and we encourage monitoring at the enterprise level, done by data stewards. This was just a very simple dashboard that was put together to show how many open, how many deferred, and how many remediated data quality issues they had in each of three data domain areas: customer, product, and supplier. It showed them how many critical issues there were and how many had been open over long periods of time, a more executive-level view for monitoring these particular issues. We then established a process for data stewards to monitor the data quality on an ongoing basis. That included looking at data profiling artifacts, taking the corrective measures, and then verifying that these improvements were being put in place over time. The data stewards were really responsible for understanding what was being measured and why. And we want to point out that monitoring can be very costly, so you really should focus on those processes that are primarily essential for you to do business. As with all your business analyst tasks, you're trying to focus on the biggest bang for the buck that you can get out of it. So we've just got about two minutes left, Karen. It's a fairly complex diagram here. But lead us through the sort of maturity model around this, how an organization grows into these capabilities. All right. As I mentioned, data quality really plays a part in an overall data governance and stewardship program. The first part of this is to identify and catalog your data assets. Then you define your controls; those would be your data standards. Then you measure your data quality, which is exactly what we were doing here, and that monitoring process comes into play. And then you expand it so that, just as the data quality piece was repeatable, your data management processes also become repeatable.
And then as you really mature along this data management path, you're going to be able to optimize your data management capabilities, and that way you can enhance quality and stewardship performance. This is again one of those nice takeaway slides. And I'm going to move very quickly into the next slide, which really talks about what that journey looks like, because it's never simple and easy. This particular slide talks about some of the risks and challenges that we ran into, some of the same risks that you might run into in implementing a program, and then how to mitigate them. Making your organization aware of what's going on really requires a strong communication and change management plan. Getting the organization, and all the business units within it, to actually adopt the program requires that accountability; once you've identified your data owners and your data governance organization, they help with adoption. Funding can always be an issue; in this particular case, the company used a cost allocation model. And training: stewardship skills are hard to maintain, but this particular company defined their staffing models and their career paths and made data governance and data stewardship training part of everybody's onboarding. And then the biggest one is time. Even though we're running out of time right now, I would strongly encourage you to always allot enough time for your program. It's an ongoing program that takes time to build. And that about wraps it up. Karen, thank you for taking us through that so quickly. It's a lot of material. And we've actually got some good questions out there, so I'm going to turn it back over to Shannon as we move into the Q&A part. Agreed, Peter. Thank you, Karen, for this great presentation. And thank you, Peter, for the presentation as well. Of course, you know the most common question that we receive throughout is people asking about a copy of the slides and the recording. I will be sending out a follow-up email for this webinar within two business days, so by end of day Thursday, with links to the slides, the recording, and anything else requested throughout the presentation. So, diving right into the great questions coming in here: Did you come up with the statistics via data proofing, or, excuse me, via data profiling? If so, what tool did you use? And that was early on in the presentation. Sure. We used Informatica's IDQ (Informatica Data Quality) tool to do the data profiling, and the statistics that were provided are automatically generated within that tool. There are many good data quality tools out there, but one of the things that you should look for in a data quality tool, if you choose to use one, is the ability to build a library of your business rules. And that was very important: as we developed these business rules, they could be used over and over again, because they could be stored in the tool and applied against the different data domains. I'll tell a little story here. One of the other groups we worked with was a company you guys have probably heard of called Nokia, and we spent about four years with them on and off. One of the things they did, though, from a cultural perspective, was that when they started defining these data quality rules, they put them out in an online, accessible place so that everybody in the organization could look at them.
And it just became part of the corporate culture: when a question came up at a meeting about anything having to do with data quality, business rules, et cetera, they would immediately, all as a group, say, let's turn to the rule bank and see what is in there. And sometimes they found what they needed in there, in which case they could all say, great, that was a wonderful use of our time. And if they didn't have it, then they would suggest improvements and enhancements to the bank. And in the one that Karen's describing here, you can sort of see a pathway: first you get them interested, crawling along the way; then walking, as Nokia did; and then eventually perhaps sprinting, when you start to really automate some of these processes. Perfect. Thank you. Moving on to the next question: How can you quantify this actual increase in data quality in business dollars? So, Karen, your slide on that showed a dollar per address change, and that was at four minutes to change the address and make it correct. That's probably conservative. It was also using overseas resources, so not nearly as expensive as, for example, European or American costs for those resources. So it becomes relatively easy to do that, and Karen has several other examples that she didn't have time to get into here, where they were able to go in and look specifically at examples of costs. But it's really just a matter of starting and then working your way forward. One of the other things you mentioned, Karen, was that you didn't need to have the exact amount of money that each individual made. That's kind of a tough question to ask, how much money somebody makes, and you can't stand over somebody as they're making or correcting errors. But if you get a range from personnel, that people who do that work make between X and Y, then using the lower value of that range you can say pretty authoritatively that it's at least a dollar per address change. And if we have more than half a million address changes a year, that adds up to half a million dollars, and somebody's eventually going to say, that's my bonus, you know, that goes into that. Do you want to add to that, Karen? Just to say that in the particular example that I showed, with the corrections to customer addresses, the chief data steward at the client actually did a study, and the reps who were making these address changes and corrections timed themselves. So they were trying to be as efficient as possible. It wasn't just a number, the four minutes that we pulled out; we actually asked them: we know that you're making these updates to your system; give us the average time that it takes you to do one. Which I thought was very, very interesting, because sometimes when people are trying to quantify these things, they're just taking straight-out guesses. But in this particular case, it was actual timing done by the people making these address changes. The book that I use as inspiration for all of this is called How to Measure Anything, by a guy named Douglas Hubbard. And you know, I've got 10 books out right now; I guarantee you I've sold more of Douglas Hubbard's books than I've sold of my own, because it does help you with that process of quantifying these issues. There are some specific data-related examples in my book, Monetizing Data Management, as Karen mentioned.
Sometimes you get good answers; sometimes you get perfect answers. We say it's at least this much money, and people are able to see that, oh, wow, maybe it's even more than that. Maybe it's $2 per address change; well, now you've gone from half a million dollars to a million dollars very, very quickly. So, moving off those slides a bit and into some general questions, although certainly we may go back to the examples you've been using. I'm going to combine a couple of questions, each of which could probably be, and has been, an entire webinar in itself, and one leads into the other. First question: do you have an example of an enterprise data strategy? And then: would you please define master, meta, and reference data? So data strategy is the idea that data should be managed as an asset in your organization. All of your assets, if you look in your organization, have a strategy behind them. There's a strategy for managing your finances. There's a strategy for managing the human capital that you have in the organization. So data, as an asset, also needs a strategy, because data without a strategy is managed very well at the workgroup level, but typically not well at the department or organizational level. And that's what we're trying to do: elevate that, and get all of those rowers, if you will, rowing in the same direction. Karen, do you want to define master data management and take your crack at metadata? And I forget what the third term was, Shannon. Master, meta, and something else. Reference data. Well, I'll start with reference data, actually. Reference data we always look at as standard lists of terms. It's usually very fixed: things like lists of states or countries, things that people, processes, and technology use to refer back to. It's usually fairly consistently defined across the organization, although in this particular case we found even some reference data that was very out of date. It's usually the easiest place to look at data quality, and it's sometimes of the highest data quality, although I don't always find that. Master data is more, we see that as, things like customer data and product data. It's the data that's shared across the organization. And Peter, you probably have a better definition for that. And then I know Peter has just done a lot of speaking about metadata, so I'm going to throw that one back to you. Right. Master data is the major people, places, and things on which the transactions in your organization depend. So, for example, we can build an example up from the bottom. Karen, one of the other examples that you have in this case study, which didn't come out today, was the misuse of reference data: some people in the organization were putting in GB as the abbreviation for Great Britain, whereas the proper value was supposed to be UK, the United Kingdom. And the implication is that if the reference data is wrong, and somebody asks what our sales in the United Kingdom are and doesn't count all of the GBs as part of the UKs, then the number that you provide to management will be wrong, and they won't be able to manage the business. So from a master data perspective, the list of master country codes would become master data, as would the product hierarchy you described. (A tiny sketch of that kind of reference-data check follows.)
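The GB-versus-UK problem is easy to catch mechanically once the agreed-upon reference list lives in one place. Here is a small illustrative sketch; the canonical list and alias map are made up (notably, the ISO 3166 standard actually uses GB for the United Kingdom; what matters is that the organization agrees on one value and maps the strays to it):

```python
# The organization's agreed country-code reference list (illustrative).
CANONICAL = {"UK", "US", "ZA", "DE", "FR"}
# Known nonstandard values mapped to the agreed code.
ALIASES = {"GB": "UK", "GREAT BRITAIN": "UK", "UNITED KINGDOM": "UK"}

def normalize_country(value: str) -> str:
    """Return the canonical code, or raise so bad values surface on input."""
    code = ALIASES.get(value.strip().upper(), value.strip().upper())
    if code not in CANONICAL:
        raise ValueError(f"unknown country code: {value!r}")
    return code

# 'GB' rows now roll up under 'UK', so a query for United Kingdom sales
# counts all of them.
print(normalize_country("gb"))  # -> UK
```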
So again, suppose we're looking for how much Harry Potter has sold in a particular geographic region, and we don't have all of those Harry Potter items grouped under a hierarchy. Maybe Harry Potter gets the code 666, I'm making that silly, of course, and we have other product codes that are 667 and 668 and 6612 and things like that. So we ask the question, how many 666s did we sell in the United Kingdom, but some Harry Potter things are listed under 444 or 333. Again, we're providing management with the wrong information. Then finally, the last part of it: metadata is just data about data. So if we have a code that lets us track Harry Potter books in paperback versus hardback, that's an important distinction, and where the sales are can help us determine whether we should put more hardback copies of Harry Potter into the United Kingdom or more softback copies. So again: reference data are the allowable values. Master data are the things that provide transaction data context in your organization, the people, places, and things that you manage organization-wide. And metadata is data about the data. All of them are, of course, critical to understanding data quality issues. And Shannon's right, we have, in fact, webinars on each one of those separate topics. Yes, very good and very important questions. For organizations that have dedicated data quality analysts, how many are typically allocated as a percentage of total company employees? That's a great question. We don't have any numbers industry-wide there, but I'll tell you what we do when we go into organizations to do this kind of work. One of the groups we've worked with over the years, as you know, is Walmart, and Walmart has a million-plus employees. And they have about 10,000 HR managers who effectively manage this human resource, the employees at Walmart. So there's clearly a ratio there of roughly one HR manager per 100 employees, and that seems to provide good results for Walmart. Again, that may not be appropriate for every organization, but for Walmart, it seems to work very, very well. We then turned around to them and said, you're trying to do data quality with three people, and you guys have more data than even the federal government. And they very quickly got the idea that three people trying to do that is probably the wrong answer. Now, we don't know what exactly the right answer is. So rather than trying to specify the right answer, what we said is, what would a successful data quality implementation look like at this organization? And that would be that we would reduce the number of 1099 end-of-year reconciliations that we have to do. Right now, there are at least 320 hours spent doing this. Maybe we get to the point of saying 100 hours is a really good number instead of 320, or maybe zero. Again, these are things that the organization itself can adjust. But what you have to do is put in place a programmatic approach, as Karen was describing, a repeatable approach, and then look at these results year after year, because it may not be worthwhile to eliminate all of the 1099 problems. There may just be some that are always going to be a problem for you, and the best you can hope for is to get down to 30 hours or 100 hours. But you don't know if you don't manage it.
And what I can assure you is that if you don't manage it, that number will continue to grow, and the number of errors that you encounter as a result will increase. Let me give you one more example of sizing the data quality analyst group. Every network organization within IT has at least one individual whose task it is to keep track of where every wire in the organization goes and where every network drop, wireless port, and device connected to those exists. So somebody is managing the metadata for the network. In a large networked company, you will usually have more of those people than in a small networked company. Similarly, of course, organizations with more complicated data problems are going to have more people involved as analysts. And Karen, maybe you want to speak to how much of that is a dedicated function versus how much of that can be pushed back into the business to get them to do it. You mentioned that every employee who was brought on board was given a data quality lecture as part of the onboarding process.

And that's a great point, Peter, because I was jotting some notes here. The company we were working with had 40,000 employees worldwide. And we were looking at putting a center for data quality excellence in place at this company that would ultimately have had just four full-time data quality analysts. And the reason the sizing was at that level was because they were making it a point to make data quality a part of everyone's job. It was not just the responsibility of these analysts to identify data quality issues and to put those processes in place. They really made it the responsibility of not only the data owners and the business data stewards, but everybody throughout the company. So it was really a rethinking of how they were making this a part of everybody's job, not just a side-of-the-desk or afterthought type of thing. This particular company had a very mature training program in place. They used a lot of gaming-type things, some incentives for their training and such. And so the Chief Data Steward was able to tap into that system with a data quality component and make it very successful there, thus reducing the number of dedicated data quality analysts at the central level. But that was unique to that organization; with 40,000 employees and the sheer number of systems they had, we could have expected the number of dedicated analysts to be a lot higher.

Thank you both. And we're just a little more than halfway through the Q&A portion of the webinar. We have a ton of great questions coming in, and I just want to encourage people to keep them coming. If we don't have time to get to your question during the presentation today, Peter and Karen will get answers to you, and we'll likewise get those into the follow-up email by end of day Thursday with links to the slides and the recording as well. So then, moving on: when valuing data quality errors, we haven't had much luck getting executives engaged with the need for data quality or data governance, in spite of some figures in the million-dollar range. Have you run across stubborn execs, and what do we do?

Ah, yes, the stubborn execs. I know Shannon is telling us we have a lot of questions and we have to be short with our answers, so we'll try to do that as well.
But we ran across an executive in one organization who had 100 people fixing billing errors, delaying the billing going out by about 30 days. And we asked this individual, you know, do you realize that this could be solved by fixing the data quality errors? And they said, no, and I don't care either, because I just had the best quarter of the best year that I've ever had. So I'm thinking of adding more people to the room, doubling the size of this room from 100 people to 200 people, because clearly whatever I'm doing is working for the company. So that executive was not going to get the message. We were there working on a different task. But of course what we did is we went around and talked to the CFO, and when she found out that we could improve her cash flow on $9 billion by 30 days, that was on the order of $800 million a year in freed-up cash ($9 billion held for an extra 30 out of 365 days ties up roughly $740 million), and it didn't cost anywhere near $800 million to fix their data quality problem, so everybody was happy as a result. It's a matter of looking around and finding the people to whom these numbers mean something.

Now, that's a commercial situation. One of the things we're doing is a lot of government and NGO work. And another group that we're working with is a very large organization dedicated to providing services for people who are less well off. And it used to be that their thought was peanut butter and jelly sandwiches, but they're now looking at data and saying, you know, instead of making lots of peanut butter and jelly sandwiches, we could use this data that we're collecting about the environment to convince a locality to extend the bus line so that they eliminate a food desert. And by eliminating the food desert, they eliminate the need to make those peanut butter and jelly sandwiches. So again, it's the same kind of thing Karen was talking about: shifting from reactively making peanut butter and jelly sandwiches to proactively giving people access to transportation so they can get out of their food deserts and get to healthy food. This is the kind of thing that we're talking about.

All right, moving down the queue here. How do we establish specific data quality threshold targets for specific data sets?

Well, I think you first have to recognize that data quality doesn't always have to be at 100%. And for some people that's hard to let go of, and they think, well, then why should I even be doing this? But each organization has to look at what that, we call it, good-enough threshold is. No organization... Fit for purpose. Exactly, is it fit for purpose? So if I can make a sound business decision knowing that the quality of this particular data element, or the numbers in this report, is 95% accurate, then that's how you establish your metric. In some cases, you need to know that the data you're looking at is 100% accurate, and then that's what your threshold becomes. But our experience has shown us that good business decisions can be made on data that is not always 100% accurate. It's very specific to each individual organization's business, but I think if you approach it in that manner, it becomes a little easier to set the metrics and to set those particular thresholds. And again, that's not to say you can't change thresholds over time.
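Before Karen continues on adjusting thresholds over time, here is a minimal sketch of the fit-for-purpose check she describes: measure a data element's accuracy against a business rule and compare it to the agreed good-enough threshold. The field, the validity rule, and the 95% figure below are illustrative assumptions, not the client's actual metrics.

```python
# A minimal sketch (field, rule, and threshold are illustrative) of a
# fit-for-purpose data quality metric: compute the fraction of records that
# pass a business rule and compare it to the agreed "good enough" threshold.

from typing import Callable

def accuracy(records: list[dict], field: str, is_valid: Callable) -> float:
    """Fraction of records whose field value passes the validity rule."""
    results = [is_valid(r.get(field)) for r in records]
    return sum(results) / len(results) if results else 0.0

THRESHOLD = 0.95  # hypothetical: the business decided 95% is good enough here

records = [{"zip": "23220"}, {"zip": "2322"}, {"zip": "90210"}, {"zip": "10001"}]
score = accuracy(records, "zip",
                 lambda v: isinstance(v, str) and v.isdigit() and len(v) == 5)
print(f"accuracy = {score:.0%}, meets threshold: {score >= THRESHOLD}")  # 75%, False
```

The design choice worth noting is that the threshold lives outside the check itself, so the business can raise or lower it over time without rewriting the rule, which is exactly the adjustment discussed next.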
If you set your threshold and say, okay, I want to be 93% accurate, and you suddenly find yourself achieving that, then we would encourage you to move that threshold a little higher if it raises the confidence level in the business decisions you make based on that data. And as much traveling as Karen and I do on airplanes, we're really glad the airlines don't decide that one in ten airplanes falling out of the sky is a perfectly reasonable level. So you're right, Karen, it's going to depend exactly on what business you're in and what measures are set up. And you guys are the ones who are best able to help organizations figure out exactly what those measures are.

Speaking of these processes, the next question, I think, is inevitable. How portable is this model to big data?

Another good question. Back to what Karen said: it's fit for purpose, finding that correct balance. In most cases, when people are talking about big data techniques, which is actually a more proper term to use than big data, they're talking about data that they're reading as it goes by them. So there's really very little chance to actually insert quality into it. And so quality around big data techniques and big data technologies becomes less a case of fixing the big data that's out there, and more a question of what an appropriate use for it is. An example: while people's comments about movies are not exactly the thing you want to use to decide whether a movie is suitable for you and your family to go see on a Friday night, the studios, the Disneys and Universals and Foxes of the world, use those Twitter feeds in a very effective manner to decide which movies should have more or less advertising put behind them, because those Twitter feeds are a good gauge of public sentiment. So again, it's not that you're going to change the big data, but how are you going to use the data analyzed by these techniques for appropriate decisions? It's kind of flipping the situation around, because you're not really going to affect big data. You know, your data lake is not something that you're going to go and try to clean up. In fact, in many cases, people refer to them as data swamps instead of data lakes, given that context. I think we've got a big data topic coming up later this fall, don't we, Shannon? I believe so, we do. Yeah, so we'll dive into that in more detail this fall.

All right. Well, Karen, anything to add before I jump into the next question? No, go ahead. I think we have time for one more. Again, we have so many great questions coming in for this particular webinar, so keep them coming and we'll get answers out to you in the follow-up. For organizations that have dedicated data quality... excuse me, I already asked that one, it just came in again. How do you sustain data steward engagement once the initial windfall of data quality issues has been resolved?

So, Karen, this is a relatively new job category. What are the techniques that you've been using to help people understand how best to use those people on an ongoing basis? I'm sorry, Peter, you were breaking up. Could you ask the question again, Shannon? Sure, absolutely. How do you sustain data steward engagement once the initial windfall of data quality issues has been resolved? Okay. And this is definitely something that we addressed with this particular client, as well as another client that we're working with right now: how do you keep these stewards engaged?
Especially if their particular problem seems to have been fixed, or the quality in the particular area they're responsible for has increased to an acceptable level. We actually wound up taking some of those stewards and expanding the domain of the area they were working on, giving them some different responsibilities, because you always wind up with some superstars who are hungry for more. They understand what you're trying to do, and it was an easy step to make them maybe a chief data steward over their particular area, and to make them responsible for training others along the way. Another thing we saw happening at this particular organization was the desire to give them a little more responsibility in the process changes, because often those data stewards were the ones who were not only responsible for the data, but could also help identify the process changes that needed to be made and how best to implement them. And it was kind of a skill enhancement opportunity for those stewards who had, I don't want to say run out of things to do, because there's always that continuous monitoring that goes on, but really it was more of an expansion of their role that kept them engaged. So there's very definitely an ongoing role for these individuals.

All right, well, we've got just a couple of minutes left. Let me look here. Do we have time for another question? I'm looking at these questions, and I don't know. Let me jump to this one and see if we can answer it in a couple of minutes. If not, we can expand further in the follow-up. Can you please expand on what we should look for when profiling the data?

Patterns, anomalies, conformance with business goals and objectives. Outliers. Yeah, that's the very quick answer. Well, on the initial profile, we really are looking at it, and I go back to that common-sense-check kind of thing: does the data look right? We often say on that initial profile, have somebody who doesn't really know your data, who doesn't know your business process well, look at it, because they often see the unexpected things. It's kind of like that exercise where if you've got, you know, 50% of the letters in a word, you still see the whole word. Well, if you have somebody who doesn't know your business process looking at your data, they're going to see where the gaps are and where the strange-looking things are, and that's especially helpful in that first pass of profiling. And then, as Peter was saying, once you've got your business rules, you've got to look at conformance to business rules and things like that. But I think the question was probably aimed at what to look for in that initial pass of profiling.

Well, we are certainly out of time at this point. I just want to thank you, Karen and Peter, for this fantastic presentation and Q&A session. And to our attendees, who are always so engaged in everything we do, we just appreciate all the great questions that have come in. Again, a reminder, we will send a follow-up email for this webinar to all registrants by end of day Thursday with links to the slides, the recording of the session, and anything else requested.
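As a brief postscript to that last profiling question: here is a minimal sketch of the kinds of first-pass checks just described, fill rates, value patterns, and crude outliers. The column names, sample rows, and anomaly rule are all hypothetical, chosen only to make the idea runnable.

```python
# A minimal first-pass profiling sketch (columns, rows, and rules hypothetical):
# fill rates, value-shape patterns, and a crude outlier check per column.
from collections import Counter
import re
import statistics

rows = [
    {"country": "UK", "amount": 120.0},
    {"country": "GB", "amount": 115.0},
    {"country": None, "amount": 9_999.0},  # a missing value and a suspicious amount
]

def generalize(value: str) -> str:
    """Reduce a string to its shape, e.g. 'AB12' -> 'AA99', to surface patterns."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def profile(rows: list[dict], column: str) -> dict:
    values = [r.get(column) for r in rows]
    present = [v for v in values if v is not None]
    report = {"fill_rate": len(present) / len(values),
              "patterns": Counter(generalize(str(v)) for v in present)}
    numeric = [v for v in present if isinstance(v, (int, float))]
    if numeric:
        med = statistics.median(numeric)
        # crude anomaly rule for illustration: anything more than 3x the median
        report["possible_outliers"] = [v for v in numeric if med and v > 3 * med]
    return report

print(profile(rows, "country"))  # fill rate ~0.67; both values share pattern 'AA'
print(profile(rows, "amount"))   # flags 9999.0 as a possible outlier
```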
And then just a reminder, we will be meeting next month on August 9th to discuss best practices with the Data Management Maturity Model, always a very popular topic, with guest speaker Melanie Mecca, who helped build and helps manage the DMM. Thank you again, everyone, for participating, and I hope you all have a great day. Thanks, Karen. Thanks, Peter. Thank you, too.