Hello and welcome, my name is Shannon Kemp and I'm the Executive Editor for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Data Quality Success Stories, the latest installment in the monthly series called Data-Ed Online with Dr. Peter Aiken, brought to you in partnership with Data Blueprint. Now let me give the floor to Stephen McLaughlin, the webinar organizer from Data Blueprint, to introduce today's speakers and today's webinar topic. Stephen, hello. Hi, thanks, Shannon. How are you doing today? Fantastic, and yourself? All right, pretty good. Hello, everyone, and welcome. Thank you for finding the time in your busy schedules to join us for today's webinar, Data Quality Success Stories. As always, a big thank you goes out to Shannon and DATAVERSITY for hosting us. We're going to get started in just a few moments after I let you know about some housekeeping items and introduce your presenters. We have a one-hour presentation today, followed by 30 minutes of Q&A. We try to answer as many questions as time allows, but feel free to submit questions as they come up throughout the session. To answer the two most commonly asked questions, and I promise you these will be asked again today, which is always funny for me: yes, you will receive an email with links to download today's materials and the webinar recording so that you can view it afterwards. These materials will be sent out within the next two business days. You can find us on Twitter, Facebook, and LinkedIn. We've set up the hashtag #DataEd on Twitter, so if you're logged on, feel free to use it in your tweets and submit your questions and comments that way. We'll keep an eye on the Twitter feed, and we'll continue to answer questions in our post-session email if we don't get to them off the Twitter. All right, so now let me go ahead and introduce your presenters. Peter Aiken is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. Peter is also the founding director here at Data Blueprint. He's written dozens of articles and eight books, the most recent of which is Monetizing Data Management. Peter has experience with more than 500 data management practices in 20 countries and is consistently named a top data management expert. Some of the most important and largest organizations in the world have sought out his and Data Blueprint's expertise. Peter has spent multi-year immersions with groups as diverse as the U.S. Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. He often appears at conferences and is constantly traveling, but today we're lucky enough to have Peter in the office, which is a rare treat. All right, and joining us today is also Karen Aiken. Karen is a certified data management professional with data management and solution development experience for numerous government and commercial clients. Her skill set includes in-depth analysis of clients' business processes, analysis of data and data sources, and development and communication of data-centric tailored solutions that add business value. Her expertise focuses on eliciting business and technical requirements and facilitating communication between business users and technical experts, including all levels of management.
She's helped clients improve data flow logistics, develop data quality programs, implement data governance programs, and utilize data visualizations for effective decision-making. She is also a board member here at DAMA-CV. Karen, thank you for joining us today. All right, and with that I will hand the floor over to Peter and we'll get started. Thanks, Stephen. Again, happy to see everybody today. Our topic today is quality, and we take a somewhat different perspective on quality, I think, than most organizations, which is that it's fine to look at quality in isolation, but it's actually more helpful if you look at it in business context. And so what we've got here are a couple of stories that Karen's been involved with lately, to talk to us about how to do that process and get us through that particular piece. Our agenda today: we'll look at data quality in context, give you some definitions with an example, and look at something called the data quality engineering cycle. We'll look at the approach to solving some of these problems. Then we'll look at the root causes, which really have to do with what are called data quality dimensions and a data quality lifecycle, presented in a way that you probably haven't seen other organizations present them. And we'll finish up here with sort of the knockout punch, the piece that we really wanted to invite Karen on here to present. She's worked with a number of different organizations, and she's come up with an amalgamation of how to quantify this, which is something we hint at in the business value piece of the monetizing book, but this really is a very concrete example of it. The incentive for this came out of a panel that I was on at the Data Governance Conference back in June, and we might get a chance to replicate this panel at the event that's coming up in December. So I'll throw in a little plug for the Data Governance Conferences that DATAVERSITY co-sponsors as well. And the question was, can you quantify the answer to these questions? Can you quantify the value of data quality? We're going to give you a resounding yes, just for starters. We're not going to give you a precise figure, but we can give you an at-least amount. And if you get an at-least amount, that's enough to get people diving into this. We'll look to dialogue with you all a little bit further as we go into the Q&A at the end. So let me dive in and just give you a little bit of where we start from. Data today is the most powerful yet underutilized and poorly managed organizational asset. And when you speak of it as your sole non-degrading, durable strategic asset, people start to think about it a little bit differently. We see a lot of places where people are talking about data being the new oil. We really object strenuously to thinking about data solely as a production function, because if it runs through the pipes and out the other end, you're missing a really important aspect of this, which is its reusability. So another way we've seen this is that data is the new soil: plant something in it, and it will grow. That'd be great. We've also seen data as the new bacon. I don't know if you've seen our new t-shirt designs that Stephen actually developed here; it's "data" in a mom-style tattoo. We should probably put that in here somewhere, Stephen, and let people take a look at that as well.
They're very popular at the events we go to, as we talk about our collective mission here in the data management world: strengthening your data management capabilities, providing solutions that are tailored for everybody, and building lasting partnerships in order to do this. Now, we start out with what is data management, and even this has evolved over time. So I'm going to show you what we thought was pretty good recently, which is that data management is everything between the two important events in a data item's life: when it's actually created, or sourced, and when it's used. And everything in between is the process of connecting the two. I mentioned this was the way we used to do it because we said, well, this encompasses at least three disciplines: engineering the data, storing it in some place, and delivering it in a different way. These are also then wrapped into a governance function that we need to have, and it requires specialized team skills in order to do this. But we found even this was not really a good representation, because it didn't talk about the reuse component that I just referenced on the previous slide. We do need to understand that the purpose of gathering this data is not to use it once, but to reuse it as many times as possible. And the more you reuse it, the better it will become and perform for you as an asset. So we've changed our what-is-data-management slide around a little bit and put that reuse component right there in the middle of it, resources in there optimized for reuse, and understand that these specialized team skills go to a variety of different missions within your organization, and that we also need to have a decent feedback function within that data management piece. We call this the analytic insight loop. Most organizations are having trouble with their analytics because they aren't doing this as well as they could, and governance encompasses all aspects of this particular slide. So our latest definition here really is different, and we're going to continue to evolve this, because this is a discipline that's really only been around for about 100 years as we look at it overall. Data management is a lot like Maslow's hierarchy of needs. The idea here is, of course, that we like to have the things that are in the golden triangle. Everybody wants the things that are in the golden triangle: master data management, data mining, big data analytics, whatever it is. And we've been using this chart for 30 years, so we've changed these acronyms out a bunch of different times as things get popular. What hasn't changed, however, is knowing that those things in the golden triangle are just the tip of the iceberg, and that we really do need to focus on these foundational practices in order to make them work: strategy, governance, quality, architecture, and operations. And these foundational practices are linked by a weakest-link-in-the-chain dynamic. In the example that I'm showing here, governance, quality, strategy, and operations are linked together with a strong chain, but there's a weak link to the data platform and architecture, and it doesn't do any good for organizations to spend money and effort on governance, quality, strategy, and operations until they bring that platform and architecture up to a comparable level.
Again, most people in the golden triangle concentrate on technologies, and those are wonderful, but really what needs to happen is that organizations need to concentrate on capabilities, because that will provide more lasting value to the organization. We get a lot of questions at Data Blueprint: yeah, Peter, I hear you say that, but I need the results by Friday. Can you do this without the other stuff? Can you do the things in the golden triangle for us without doing the things that are at the foundational level? And the answer is yes, you can, but it will take longer, cost more, deliver less, and present greater risk to the organization than if instead you learn how to crawl, walk, and run your way up to those better capabilities. We look at this now in terms of our two guiding frameworks that we use here in data management. The first one is the DMM. Thanks to the CMMI Institute, we now can talk about being able to manage data as a coherent discipline, professionally, in a way that maintains fit-for-purpose data with the corresponding architecture and lifecycle management capabilities, and of course the supporting processes that you need to have in place. You may have heard Melanie and me talk about this in various places. The other guiding framework is the Data Management Body of Knowledge, the DMBOK, first published in 2009. You can see that quality is called out as the tenth discipline area there. So these functions are critically important and able to be put in place in order to understand this. So let's dive into some specific definitions now and talk about what it means to have data. The first question is, does everybody know what the number 42 is? Those of you that aren't in on the joke don't understand that 42 is the meaning of life, the universe, and everything. And this means that what we're doing really is talking about combining facts and meanings in order to come up with an individual piece of data. If you don't know the joke, The Hitchhiker's Guide to the Galaxy has a subplot in it, in which the white mice and the dolphins decide they'd like to solve the meaning-of-life problem, with us as the experiment. So they run a gigantic supercomputer; it runs for 300 centuries and comes up with the answer to life, the universe, and everything being 42. Again, not really a useful definition, but certainly you can see here that it's absolutely necessary to get that fact-and-meaning combination put together in place first of all, and then we can move our way up to information and requests, which is about where IT leaves things. That's the requirements piece; Karen's going to talk about it a little bit later. But we also have to understand the strategic use of this information, and that really does require a further refinement of that process. Now quality data is an interesting piece. It means fit for use; Martin Eppler is the originator of that term. And I have to tell you that, interestingly enough, spinach falls into an interesting data quality problem. It turned out that somebody in the government was doing an evaluation of various leafy green vegetables, and they made a data quality error. And so for years and years, spinach was believed to have stronger properties than other leafy greens, resulting, of course, in Popeye. It turns out that that data quality error perpetuated throughout the government but was completely wrong.
So spinach, unfortunately, has no better or worse properties than other leafy green vegetables. Understanding that data has to be fit for use means it's very hard to look at any one piece of data and precisely define what quality is for it; the best way you can discover that balance is through a conversation between the business and the IT users. Quality management then requires change management and continuous support. You can't just do your data quality and then be done with it. And we'd like everybody to understand further that this is really an engineering discipline, and that these engineering concepts are generally not known or understood within either IT or the business. It's kind of like the blind men and the elephant example here. Everybody touching data quality comes at it from a different perspective. There's no one right or wrong perspective, and therefore there is no universal definition for understanding what it means to have a data quality problem. Now the real aspect of this is that the solution, by taking an engineering approach, gives us the ability to start looking at it as a structured problem. And I'm going to turn it over to Karen here in a minute to talk about how that works out, but a structured approach to data quality engineering means that you're going to allow the form of the problem to guide the form of the solution. There is no cookie-cutter approach. The example Karen is going to give you today, I think, is a really interesting one. That form of the problem then gives you a means of decomposing the problem into a series of manageable pieces. If we understand what those manageable pieces are, we can come up with an appropriate set of tools for simplifying the understanding of that problem in context and offer a set of strategies for evolving a solution. We can then look at specific criteria for evaluating each of the various solutions. Yes, I could get you to 100% perfect data, but it would cost you your bonus this year. Okay, maybe I'm not going to do that. Again, I'm oversimplifying here, of course, but it does give us a framework for developing institutional knowledge around that particular process. Now, Karen, this next example is one where you were working with a customer and came up with some principles for a specific problem. Yes, good afternoon. As Peter said, this is definitely very much focused on a particular client, but it shows the way that we engineer the process. And we really wanted to focus on moving from reacting to data quality problems, which is something we see very commonly in the industry (it's broken, we have to go find the answer), to creating a proactive approach to improving data quality. And the way that we do that, we look at principles of capturing data right the first time, and what this means is we are setting up a process so that wherever possible, data is captured once at the source and validated on input. This requires planning and design. We also strongly embraced this philosophy of engineering positive impacts on data quality into the process. We wanted to make sure that wherever possible, there were automated, repeatable processes so that data entry was quick and intuitive for the users.
The processes were also designed so that the users were entering and maintaining accurate data and so that data errors were engineered out. We also wanted to focus on integrating data quality directly into the client's business processes. So often, and we'll talk about this in more detail in a minute, data quality isn't necessarily seen as a business problem. We go back and forth on whose problem it is, but the way that we approached this with the principles was to integrate the quality directly into the business processes. And really, when we're hired, a lot of times people bring us in and just want us to fix something. Yes. We're talking about changing a mindset in this case. Absolutely. And that's non-trivial, isn't it? Okay. Now we're going to move on and talk about the cycle. Typically, there are different ways in the industry to approach making your data sparkle. And while each of these ways works to some degree, they are all at the macro level; they're all too high. We've seen advice to prioritize the task, to identify your mission-critical and non-critical data, to involve the data owners moving forward, which we strongly agree with, but you have to use each one of these as part of an overall process. Keeping your future data clean is also very important, but it's not just that simple. And aligning your staff with the business is another very important principle, but any one of these taken alone doesn't necessarily work on an ongoing basis. However, the problem that you were really faced with was... Yes. So what we're going to look at now is one example from a real practice. These are examples of data quality issues that were costing a particular client money and creating regulatory and reputational risk. There were things like inconsistency in payment terms, which meant that many of their suppliers were on immediate payment terms, causing cash flow issues. They spent many, many workdays per year tracking down missing vendor information to do their 1099s. They had an automated process in which there were over 222,000 obsolete vendors. This is what we classify as redundant, obsolete, or trivial data, and ROT is a favorite term of Peter's. They had vendor phone numbers that were missing or in the wrong format. They had remittance emails that were causing issues, with payments having to be mailed. And they had suppliers on immediate payment terms, again, another cash flow issue. And these all illustrate the types of problems that are created by data quality issues. We will talk a little bit later about how we quantify these. If you want to avoid those problems, you need to have things like this. And again, this is a very specific solution, a very tailored solution for a particular client; however, there are several pieces of the solution that work well for many organizations. And there's not just one singular answer. In this particular case, we were looking at a data quality center of excellence, and that required a framework to give them clarity. It required an organizational culture that treats data as a strategic asset. This particular building block was very important and required a lot of education within the organization, but it was definitely a foundational piece that needed to be in place. The data quality principles embedded in process and design, this is exactly the heart of the matter that we are talking about today, where we need to engineer in those principles.
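To make the "engineering errors out" principle concrete, here is a minimal Python sketch of capture-time validation. The field names, the allowed payment terms, and the phone format are hypothetical illustrations, not the client's actual rules:

```python
import re

# A minimal sketch of "engineering errors out" at the point of data capture.
# The field names, allowed payment terms, and phone format are hypothetical,
# not the client's actual business rules.
ALLOWED_PAYMENT_TERMS = {"NET30", "NET45", "NET60"}  # "IMMEDIATE" deliberately excluded
PHONE_PATTERN = re.compile(r"^\d{3}-\d{3}-\d{4}$")   # assume one canonical format

def validate_vendor(record: dict) -> list:
    """Return a list of validation errors; an empty list means the record may be saved."""
    errors = []
    if record.get("payment_terms") not in ALLOWED_PAYMENT_TERMS:
        errors.append(f"payment_terms must be one of {sorted(ALLOWED_PAYMENT_TERMS)}")
    if not PHONE_PATTERN.match(record.get("phone", "")):
        errors.append("phone is missing or in the wrong format")
    if "@" not in record.get("remittance_email", ""):
        errors.append("remittance_email is missing or malformed")
    return errors

# Reject the bad record on input instead of cleansing it downstream later.
print(validate_vendor({"payment_terms": "IMMEDIATE", "phone": "804-555-0101"}))
```

The point of a check like this is that the error never enters the system in the first place, so it never has to be cleansed later.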
We recommended and set up a data governance board with a mandate to drive data quality enterprise-wide. They created a governance framework that very clearly articulated the roles of the data owners, custodians, and stewards. And one of the first things that they worked on was developing a master data management solution. Building an organization like this and building a process demonstrates cooperation across boundaries. It definitely also supports the recognition of data usage across the organization. And it supported their corporate objective to have a data quality policy, and their data quality policy is to provide the highest quality data available and to give their users confidence in the accuracy of their data. Of course, you see here, when they look at this type of thing, they say, well, I thought we just hired you to fix our data. Exactly. They were not aware of how in-depth the solution needed to be, but what we were able to provide them with was a very well-engineered solution and a repeatable process. And again, they were receptive to this type of thing mainly because you were able to articulate those pieces, whereas if you had come in there with a traditional, well, here's a tool that will fix this problem, that wouldn't, I think, have served the customer's needs very well. Exactly. And when we went in, they were looking at just laying a particular tool on top of it, a shiny object. And we really demonstrated the need to have this very well-thought-out, engineered, repeatable process. Stephen's laughing at me. I think SOS is one of my favorite terms in this space. SOS stands for shiny object syndrome, where people just think, I bought a tool, so I'm done. We're not saying that the tools don't work. No, the tool works very well. Right. No, the tools are great. It's everything else around it. But the tool was more just a piece of the whole solution. So let's talk a little bit about what that repeatable process looked like. We like to use this kind of circle here to demonstrate that data quality is not just a once-and-done. We start out with discovery, identifying potential issues, and those can be identified at a number of levels, including by end users. They can be identified by data governance boards, or driven by business strategy. There are a number of different ways that the business problem with data can be discovered. Then we move on to profiling the data, which really allows us to review sample data and their data creation and usage processes. One of the things that this particular client didn't quite understand was business rules, and that's a very important part of the process of data quality and your repeatable process. So we worked with data owners and business data stewards to review documented business rules, and then our profiling also helped us develop the undocumented rules. Moving on to defining metrics: how are we going to measure the quality? And we'll talk a little bit more about metrics here in detail. We evaluated the metrics, and then you'll see these findings-review call-outs. At a couple of different places in the cycle, we go back and review what we found with the business users. And then we move on to remediating anomalies. How do they clean, or execute a remediation process? How much of the data do they need to clean? And at what cost? And this is where we'll talk about calculating the business value.
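As a rough sketch of the profiling and define-metrics steps just described, the fragment below scores sample records against business rules and thresholds of acceptable performance. The records, rules, and thresholds are invented for illustration:

```python
# A rough sketch of the profiling and define-metrics steps of the cycle.
# The sample records, rules, and thresholds are invented for illustration;
# they are not the client's actual business rules.

def pass_rate(records, rule):
    """Fraction of records satisfying a business rule (a boolean predicate)."""
    return sum(1 for r in records if rule(r)) / len(records) if records else 1.0

sample = [
    {"tax_id": "12-3456789", "payment_terms": "NET30"},
    {"tax_id": "",           "payment_terms": "IMMEDIATE"},
]

# Each metric pairs a documented (or profiled-out, undocumented) rule with a
# threshold; a compliance-driven rule would carry a 100% threshold.
metrics = [
    ("tax id present for 1099 reporting", lambda r: bool(r["tax_id"]), 1.00),
    ("payment terms not immediate",       lambda r: r["payment_terms"] != "IMMEDIATE", 0.95),
]

for name, rule, threshold in metrics:
    rate = pass_rate(sample, rule)
    print(f"{name}: {rate:.0%} vs threshold {threshold:.0%} ->",
          "OK" if rate >= threshold else "REMEDIATE")
```

A run like this is also what feeds the findings reviews: each rule that misses its threshold becomes an item to take back to the business users.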
Another very important part of this whole process is to monitor the health of the data. And this is where you define and implement a continuous monitoring and remediation plan. You've done all of this work to develop these data quality business rules, evaluate your data, fix the data; you want to be able to monitor it on an ongoing basis. And you'll notice the circle doesn't stop here. It continues back into your repeatable process, reinforcing that this is an ongoing cycle, not a once-and-done. Peter? So, again, as you're looking at that, what you end up with is saying to people, okay, I've got an approach here that is more involved than I originally thought it was going to be. And now what I need to do is understand how, instead of relying strictly on a tool, relying on a solution approach to that whole process will in fact give us better results. And then, of course, the answer to that is very simple: if you do clean the data up but you haven't cleaned up the input streams, your data will soon be back to dirty. Back to dirty. Now that we've talked about our approach, let's return to our discovery process, identifying our business need and resources. As I said before, the discovery process is not the sole responsibility of one area in the enterprise. It could be the business users. It can be IT. It can be an existing data governance or data quality organization, if you've got one. But most often it requires collaboration between all of these areas. And these are just some examples of the ways a business need or problem can surface. Perhaps the organization is migrating to a single ERP or CRM system. Maybe they're looking at master data management processes. Maybe there are suspected data quality issues that are impacting regulatory requirements; compliance is often one area where we see the business problem present itself in a more obvious way. There may be a data governance board initiative, again probably based on enterprise strategy, or the needs of a data-centric business strategy and the opportunities that arise from that. If you've got an organization that is already looking to be very data-centric, the business needs often float to the top there. Or there may be directives from an executive sponsorship team. These are all things that drive identifying what the problem is. Which means we can then go back and look at the elephant, right? And again, different parts of the elephant are going to look different. Migrating to a single ERP may be looked at as purely a technical aspect of the problem, whereas directives from executive sponsorship may look like a different part entirely. But we're still asking the same data quality team in this case to solve those particular problems. Exactly. And in my experience with different clients who are looking at these types of problems, everybody does approach it differently. And sometimes it's a single problem that's identified, but it looks different to all of these different entities. Once you've identified the problem, who is it that you need to help work on solving the problem? Often we'll go into an organization and hear that it's an IT problem. IT might recognize it as a problem, but they certainly are not the only ones who should be involved in solving it. You need to involve your business side. And in this particular example, with our Data Quality Center of Excellence, they were also very critical in solving the problem.
And it's real key to understand, too, that IT is not really well qualified to talk about the business impact there. And I say that as a general rule. About one in ten IT organizations that we've worked with over the past 30 years is actually very good at this. And if you're listening to me now and saying, well, we're not like that, that's great. I'm real happy for you. We'd actually like to put you on a list and hold you up as a best practice. But in most organizations, when something goes wrong, the business gets impacted, but IT has no real consequences. There's no real cost to IT of something not going right in there. And that can be a very, very big challenge for organizations. Let's talk a little bit more about the specific roles that each of these players, or meeples, as we like to call them, play. Your data owners and your business data stewards, these are the two pieces that encompass your data bubble there, or excuse me, the business bubble on the previous slide. Your data owners are accountable for data quality. They're the ones who get in trouble when things go wrong. They also usually are the ones that have the authority to grant access to data to a team that is working on solving this problem. The business data stewards really play a big part in this circle. They understand how the data is used in business processes. They are probably the ones that are using the data on a day-to-day basis. They are also the people that can help you articulate the business rules, and they most often wind up participating in the data cleansing process. So a quick little question; I know we're going to get a question about this later. What is a meeple, Stephen? Well, the term comes from board games, the little human-shaped board game figures that are used in a bunch of different games. People call those meeples. So when Karen put these together, I thought, hey, that's a meeple right there. Well, and the idea is that it's a placeholder, that somebody does need to be assigned to that role at some point. So Karen, are you saying now that we can't solve data quality problems without a business data steward? I firmly believe that. I firmly believe that the business data stewards have to be involved in the whole solution. So if you don't have a business data steward, can you solve data quality problems? You can, but you need to find one. That's part of the solution, in other words, is what we're headed towards. Yeah, absolutely. So again, you get the idea: it's not a matter of strictly looking at a tool and IT fixing a problem, but the business instead changing its behaviors, changing some of the practices around this. We still need to have IT in here as well, though. Absolutely. And the other two parts of our triangle were our data quality analysts; in this particular instance, the organization had data quality analysts who could help lead this effort. This is the person who had the analytical and technical skills to use IDQ, the Informatica Data Quality tool we were using at that point. And again, it was a shiny object, but it served its purpose when it was implemented correctly. They also led the charge in providing those findings back to the data owners and the data stewards. And then we have our IT data stewards. So IT is also a critical part of the triangle. They provide physical access to data. They are usually the ones who understand the data linkages and how data flows in the technical space.
They also help identify any archiving and deletion processes that are very important for eliminating redundant data. When we talk about meeples, we really do mean that when you approach these problems, you actually do say, so who's going to fill these roles? And it's possible to have one person fulfill multiple roles, but these are distinct functions that need to occur. Yeah. We actually encountered a situation where we had a person who was on the business side playing the role of a data quality analyst as well. And it did work, but they were definitely playing both roles. So this is an interesting place to sort of switch hats now. We're going to dive a little bit into why this is so difficult. And the reason is this; here's an articulation that we did a while ago. There are really two things that happen in an organization to cause data to go bad. One is the practice areas. And this is what Karen was talking about before. You could certainly cleanse the data, and it would be clean for a period of time, but if you don't then capture the data correctly on the way in and keep that data at the same level of quality, the overall quality of the organizational data will decrease over time. Those are what we call practice-oriented activities. The other side of this is structure-oriented. And unfortunately, it's usually a combination of both of those that is required to get to really good quality data. So let's talk about them individually. Practice-oriented means things along the lines of, and you had some examples there earlier, zip codes. If you only allow people to pick zip codes from a list that is already approved, we'd call that a domain constraint if we were actually designing the physical database, but our business people aren't going to know what a domain constraint is. What we can say is, we're not just going to let anybody put anything in there. In fact, if you want to try an experiment on your organization, it's a fairly expensive one, so be careful with it: open up your brand new ERP or CRM and say, hey, you guys can all update your data. And I can guarantee you that the overall quality of the data will decrease as a result of that, not because people are bad, but because people are sloppy and things don't always work the way they want them to. So again, it's checking the inputs on the way in to make sure they conform to within a certain range, not letting somebody have an age of negative 42, which would be a very difficult age under any circumstances. These practice-oriented activities affect two dimensions of data quality that I'll show you later on, data value quality and data representation quality. And it's a very key piece to understand that this is sort of a bottom-up approach to data quality. Then I also mentioned the structure-oriented side. This is when you buy the wrong package, design the database wrong, or try to get something to do what it was not engineered to do in the first place. Now, an example of that that we use quite often came from my time in the Defense Department. In DOD, we have many service men and women, volunteering to help with the defense of our country, who take multiple jobs within the Defense Department. They might work full-time for the Army, but then have a part-time job for the Marine Corps at night. That's a very reasonable practice. It's a very standard practice.
And if our systems are designed only to support one job, then for my second job I become Peter1, as opposed to Peter, and what you end up with is a situation where, at the end of the year, when we have to do tax reporting, we have to go find my 1099 for Peter, which the system will correctly report, and then I have to go up to the Marine Corps and say, oh, by the way, Peter had an extra job over at the Marine Corps. We have to build that process in, which means we have additional data processing, additional tax processing, and everything else. Whereas the desired structural approach to that quality problem would have been to get the system to support multiple jobs internally in the first place. Much overhead, much ROT, and much additional wasted effort for organizations falls into this category. The problem when we build these systems is that developers only think within the system boundaries instead of across system boundaries. And that's an issue that affects the model quality and the architecture quality of what's going on. This is really more of a top-down approach, and it never really gets us to the root causes. So when you look at data quality, you've got to look at it from this holistic perspective, and this is somewhat frustrating for customers, because oftentimes they'll approach Karen, Stephen, and me and say, hey, can you fix this data quality problem, and we'll want to have a longer, more involved conversation with them, and say it's not just about buying a tool; that's absolutely part of the thing, and it's not just about making sure that the meeples have the right labels on them either. You have to look at this from a more holistic perspective, and you need to say some of the data quality problems are value-related, some representation-related, and some structure-related, having to do with the quality of the models and the architecture. And that's critically important, because as you can see, the practice-oriented pieces focus on half of the equation, and the structure pieces focus on the other half. And actually, Peter, we encountered that exact issue, where we were very focused on the practice-oriented pieces, and the business actually understood that, and what we were able to show them was that they had some structure-oriented pieces as well. We had the exact same problem, where they had an employee who was also being paid for providing content, and the system couldn't handle it. They had to enter that employee twice, and we had the Peter and Peter1 problem all over again. All right, so that's a real business problem that somebody had to encounter, and of course you're going to get to that when you start adding up the costs of these things in just a little bit, Karen. Great example. So when we look at the dimensions of data quality, we can really array them across this model that I'm showing you on the bottom of this chart. You can see the left-hand side of the chart is very close to the user; data representation quality on the right-hand side of the chart is closer to the architect. Now, those of you that are data modelers can read my very poor data modeling piece; I'm going to actually expand this one chart into the proper notation for it. Here's the full set of this, which also includes the specific attributes for each of these dimensions. And you can see that on the right-hand side of the chart, one data architecture quality spawns multiple models.
That's what that one-to-many notation there means. And each model spawns multiple values, and each value spawns multiple representations. Now, these are all very good. And by the way, if you need an official authoritative source, you can go back and say, this is it, it's the complete definition; there is no point in sitting down with any others. You can, however, use these to say which are more important to the particular business problem that you're attacking. The challenge, though, is that if you're starting at the bottom, close to the user, which is where these things manifest themselves, where they are obvious to everybody and where everybody sees them, it's an awful lot like being in that little boat down there at the bottom right-hand corner of the screen and somebody telling you that you need to fix the water quality problems coming across Niagara Falls. You know, you can wait till the falls freeze over, but that's not likely to happen, again, given global warming, et cetera, et cetera. What we really have to do is take this more holistic approach that Karen's been describing to us. One other little piece here before we get back into the components of this, just one more aspect that's a little bit problematic. Tom Redman put together the original definition of life-cycle quality, a very good definition. It got us started thinking about it back in '93. He said we have acquisition, we have storage, and we have usage, very similar to our previous model of data management being between sources and uses. We've expanded it since then, and here's our new model. We have to look at metadata creation, which is a new component of this, because if you don't have metadata creation, then you can't have a structuring of that metadata, and that structuring produces the architecture and the models that you have in place. Then, and only then, can you actually create the data. You need the container before you can put the data in it, and that populates the data within the model and the storage locations that are there. Then, of course, we have our data storage; you've seen that as well. We then may utilize the data. Our data may be manipulated; we may change the values, and we may restore the data back into the original data set, all perfectly reasonable ways to look at it. In addition to that, the assessment part, Karen called it profiling, the formal how's-it-working-for-us and is-it-going-to-be-utilized, may also result in data being restored, because these tools allow us now, in some instances, to go back and change the data that's actually in the systems we use. We may also eliminate the value defects that we have there, and that's called a refinement process. Finally, we get up to a metadata refinement process where, again, as in the example that we just talked about, of those employees not being properly characterized, the system needs to be corrected in order to actually do that, and, like it or not, that's not something a data quality tool is going to be able to fix. It'll help you diagnose it, but the fix for it is very definitely an engineering task. That results in models that are then refined. And while some people didn't like it laid out this way, it really is a cycle, so I had to draw it as a cyclical piece. The key is to understand that the upper left-hand corner is where you start when you're creating new systems.
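For the structure-oriented side, here is a hedged sketch of what the fix for the Peter and Peter1 problem looks like in data structure terms: one person record related to many job records, the one-to-many relationship the modeling notation above expresses. The class and field names are assumptions for illustration:

```python
from dataclasses import dataclass, field
from typing import List

# A sketch of the structural fix for the "Peter and Peter1" problem: model one
# person related to many jobs (a one-to-many relationship) instead of
# duplicating the person for each job. Class and field names are assumed.

@dataclass
class Job:
    employer: str   # e.g. "Army", "Marine Corps"
    wages: float

@dataclass
class Person:
    person_id: str  # the single identifier that tax reporting keys on
    name: str
    jobs: List[Job] = field(default_factory=list)

    def total_wages(self) -> float:
        # Year-end reporting now sees every job without a manual merge step.
        return sum(j.wages for j in self.jobs)

peter = Person(person_id="P-0001", name="Peter")
peter.jobs.append(Job("Army", 60000.0))
peter.jobs.append(Job("Marine Corps", 12000.0))
print(peter.total_wages())  # 72000.0: one record, both jobs reported
```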
We spend only 20% of our IT dollars on new systems; the vast majority starts at the bottom right-hand corner, as Karen described to you earlier. Now, all this is fine in terms of how you approach these things, but how do we go about the process of actually calculating those values? Karen, you've done some really neat work here, which is what we really wanted to show off. Sure. We're going to go back to talking about those business rules and metrics, because this is where you start to calculate values. You have to define what your business rules are. A business rule defines what good data looks like; it establishes how the data is supposed to behave and what the integrity of the data is supposed to be in relation to the business action. And then, of course, once you've defined them, you must collect them, and they become useful input for creating those data quality tests. So how do you go about that? Well, we looked at every business rule that was defined. In addition, we looked at business rules that were implied based on our profiling of the data, and at the data quality dimensions they addressed, looking for things that could be measured, and then understanding what the thresholds of acceptable performance were. In some cases, those thresholds were less than 100%. And in many cases, that's acceptable; you can't always have your data be 100% clean. However, there were certain issues that we looked at, like compliance. For example, if we look at our 1099 example, we needed to have everyone's social security number 100% correct in order to comply. And we also looked at where the business rule fell and under which data domain. In this particular example, we had global standards, which were established by the Data Governance Board, and those affected the data across the enterprise. This being an international client, they also had certain thresholds and certain rules that were important at the local level. So there were some business rules and metrics that were defined at a more granular level, for the local level, and they didn't necessarily apply across the enterprise. So what makes a good business rule? It needs to be meaningful to the business: as we're looking at those metrics, the way that a rule is measured needs to relate to business performance. It has to be measurable; you've got to be able to quantify it. It needs to be controllable; you have to be able to change your data and improve your scores on it. It needs to be reportable: for every business rule, the information about that rule and how it's measured needs to be stored, with enough information to take action. And it must be traceable, because we want to monitor things over time. Some examples of various metrics that we used to define rules: We looked at things in regard to accuracy. Does the data fall within an allowed set of values? As Peter was talking about earlier, we could control some of that by masking a field, but often the data we're looking at is free text, or it may not be controlled. Is the data present? Is the data complete? This is a big issue across many data quality projects that we look at. Often users or clients will say, oh, well, we have to fill in every single piece of data on this form. Well, it may be complete, there may be data present there, but it may not necessarily be the data you want to be there. Is it consistent?
This comes back to the metadata. Is the data used the same way across the enterprise? Is everybody defining each field in the same way? Is the data up to date? How current is it? Integrity is a big issue. Are the identifying data elements unique? Do you have multiple references to the same types of data across different systems? Conformity: do you have defined data types? Are there standards for how certain things are stored? We found that to be a big issue, with phone numbers stored in strange ways. Duplication: do duplicate records exist? Another big issue with data quality. So how do we determine the business value from data quality, and what kinds of costs can be associated with this? This comes back directly to what Peter started with at the beginning of this presentation: can you quantify the value of your data quality? Well, in this particular case, we found there to be a number of ways that we could quantify the costs of bad data quality. We looked at the human capital expense required for manually correcting data. We looked at revenue lost due to inaccurate information. We also looked at regulatory fines that were assessed based on compliance violations. And we also could look at the damage that is done to a corporate reputation. Let me ask both of you guys, though, because when we go into these organizations, I'm trying to figure out, and I'd love to gain the insight here, what is it that enabled you guys, as technical people in a data management consulting firm, to have this sort of broader perspective? Because I very, very much doubt, for example, that most data quality tool companies concentrate on something like damage to corporate reputation in that sense. I think, in our case, we had an incredibly engaged and knowledgeable client who had the support of her superiors. I really think there was an organizational understanding of the importance of what she was doing, and she was just incredibly engaged. Well, and I think as far as corporate reputation goes, any company that has been fined, or any company that's had publicity surrounding data quality issues, and I know, Peter, you could name a list of many that have had these types of issues in the past, but any company that is even aware of other competitors or other companies in their arena that have had these types of data quality issues is aware of the need to protect their corporate reputation. And as more and more data is out there, I think more companies are realizing how important it is. Back to Stephen's point, though, we had a very engaged client who not only understood the value of data herself, but was willing to do the education across the enterprise that data really was a strategic asset. I've seen growth in this particular company over the last two years in their definition of what data is and how much interest there is in data quality. And it's not a simple solution, right? It's not a shiny object. It really is an organizational shift in their beliefs about what data means. I mean, I know that's very broad-level, but it really is trickling down through the whole fiber of their being. And if I recall correctly, on this engagement you guys came in through the chief financial officer, as opposed to coming in through IT... We did. We came in from the business side, which is an interesting way for us to be brought in, because we're not always brought in that way. So they definitely were looking at the dollar figures of what was the associated cost of data quality.
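A small sketch of two of the metric examples just listed, conformity (phone format) and duplication (repeated identifiers), assuming hypothetical vendor records and a canonical phone format:

```python
import re
from collections import Counter

# A small sketch of two metric examples: conformity (phone format) and
# duplication (repeated identifiers). The records and the canonical format
# are assumptions for illustration.
CANONICAL_PHONE = re.compile(r"^\d{3}-\d{3}-\d{4}$")

records = [
    {"vendor_id": "V1", "phone": "804-555-0101"},
    {"vendor_id": "V1", "phone": "(804) 555 0101"},  # duplicate id, nonconforming phone
    {"vendor_id": "V2", "phone": "8045550102"},      # nonconforming phone
]

nonconforming = [r for r in records if not CANONICAL_PHONE.match(r["phone"])]
duplicates = [vid for vid, n in Counter(r["vendor_id"] for r in records).items() if n > 1]

print(f"{len(nonconforming)} of {len(records)} phones out of format; duplicate ids: {duplicates}")
```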
And that brings up an interesting point, Stephen, where the trickle-down effect, providing this data quality kind of pilot program and demonstrating the value of good data quality, allowed them to then build from the bottom up as well in the education piece, because it was clearly demonstrated what the business value was. Yeah, it really felt like we came in right at the convergence, like there was really a lot of momentum behind it. And like you said, really from the bottom up and from the top down. And you'll find sometimes one or the other in organizations, where the top will be pushing down, but the bottom's like, yeah, whatever, we don't need this. Or vice versa: the people at the bottom feel like we really need this, but they don't see the value sort of at the top. So, you know, it's kind of a tough thing. So generally it's a mix of those two, bottom up and top down. Now, I know everybody's waiting to get to the cost part. How much did it cost? I don't want to brush through that. That's okay. We're going to show you a very simple calculation of the human capital expense for manual correction. And some of this might seem very obvious, but it really was a process to identify what the cost was and to show the upper level of the C-suite just exactly what the cost was and how they could go about fixing it. In this particular example, we were looking at invalid customer addresses, which was a huge problem for them. And I know you didn't do all their systems, and you had already identified 84,000 of them. Absolutely. And this was probably, on a percentage basis, maybe 20%, 25%. I don't remember the exact number, but it was big. It was big, 20%, 25%. And 84,000 errors. And we worked with the client to guesstimate the approximate number of minutes to correct an invalid address. So at four minutes to correct an address, with an average salary, benefits rolled in, it came to about 28 cents a minute. That doesn't really sound like a whole lot. About a dollar per address, again, doesn't sound like a whole lot. You multiply that times 84,000 and that's a chunk of change. And again, this was just one system. That's death by a thousand cuts. Exactly. So again, you're giving us an example here of just the second bullet there, human capital expense. Sorry, I went back too far. I'll try it again. Here we go. Exactly. That was just an example of human capital expense. We also were able to examine revenue loss due to inaccurate information, and that was really part of cash flow. Among the issues that we examined, we also looked at purchases across multiple vendor streams, again, things that could be impacted by inaccurate information. The left hand doesn't know what the right hand is buying, and if they put those two things together, they could have gotten significant discounts. That's just one example. Regulatory fines from compliance violations: again, I think there's not a company out there that's not aware of those types of things, and that drives a lot of our data quality initiatives. The more important piece of this, too, was that as part of the engagement, you taught them how to think in these terms, and this particular set of customers was very, very amenable to learning this process, and now they're off on their own doing this type of additional work. While we may stop at a certain point, they say, great, we've got enough, now let us go out and try some more of these things.
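The human capital arithmetic in that example works out as follows, using the figures as quoted: four minutes per fix at roughly 28 cents per minute of fully loaded salary, which is $1.12 (the "about a dollar" per address), across 84,000 errors:

```python
# The human capital calculation from the example, using the figures quoted:
# 84,000 invalid addresses, about four minutes per fix, roughly $0.28 per
# minute of fully loaded salary. Four minutes at $0.28 is $1.12 per address.
errors = 84_000
minutes_per_fix = 4
cost_per_minute = 0.28

cost_per_address = minutes_per_fix * cost_per_minute   # $1.12
total_cost = errors * cost_per_address                 # $94,080
print(f"${cost_per_address:.2f} per address, ${total_cost:,.0f} in this system alone")
```

That is roughly $94,000 in avoidable labor in a single system, which is the "at least" amount the presenters said is enough to get people diving in.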
And in fact, you might even be able to tell us about some of these at an upcoming conference. That very well might be. Stay tuned; we may have some additional pieces. It just goes back to showing you, and I know Peter's going to wrap this up here in just a minute, but it goes back to showing you just how important those foundational data management practices that Peter highlighted early on were, that they lay the groundwork for being able to move on to advanced data practices, because that's exactly what's happened at this particular client. So again, they've learned a little bit about it and they're getting better now at the process. Let me do one other example here, because we just have a couple minutes left, and talk to you a little bit about New York City and trees, which is kind of interesting from the other perspective, and it illustrates more of that structural piece that we need to get into. So it turns out there are about two and a half million trees in New York City, and in an 11-month period between 2009 and 2010, four people were killed or seriously injured by falling tree limbs in Central Park. Okay, nobody wants to walk through Central Park and get hit on the head by a tree. So arborists in the city of New York, and there are a number of them, say pruning trees can make them healthier and more likely to withstand a storm, but they had no basis for actually saying that other than a good core set of beliefs. There was no research or data to back that particular piece up. So New York City did a really interesting exercise around that. What they were looking at specifically was, could they predict what's going to happen? All right: does pruning trees in one year reduce the number of hazardous tree conditions in the following year? And the answer turned out to be yes. They kept track of the pruning data, but interestingly, they had a mismatch between their data collection pieces. While they were collecting the pruning data block by block (okay, I've done this block of Avenue of the Americas and this block of West 84th Street and this block here), the cleanup data was actually recorded at the address level. And trees, guess what, have no unique identifiers, no social security numbers or anything else. So they had to download, clean, merge, analyze, model, and all sorts of other things here, and they were able to come back and say, wow, if we do the pruning, it actually results in a 22% reduction in the number of times crews had to go out and do these emergency cleanups. Obviously, treating something as non-emergency instead of an emergency is cheaper. And of course this gets us back to the really best part of analytics all the way around, which is that as soon as you come up with one solution, one answer to one question, the best result from any analytics exercise is another series of questions. Well, that's the kind of thing that drives IT crazy, because they say, I spent all this time getting the answer to your tree problem. You know, what's the matter with you guys? Now you want another thing. Well, the key is, if we solve this the one time at the top of the pyramid, we only have in place the one solution. But if we solve it at the bottom of the pyramid, as Karen was describing before, foundationally, we can go back and say yes, absolutely, we can now go back in and check what's happening. And so New York is now looking at block risk profiles for trees. What is the number of trees?
The type of trees, whether the block is in a flood zone, a storm zone, et cetera, et cetera. But only by re-architecting the data and putting it in place could they do that. So again we have a combination here of structural and non-structural. And at least if you're a taxpayer in New York City, you can say New York is correctly using data techniques to reduce the number of hazardous conditions, so that people don't get hurt around the city, and it's resulting in a lower total cost to the city of New York. So again, a very interesting example. It's definitely a perfect example of that proactive versus reactive approach. Which, again, we'd certainly like everybody to adopt. So we've looked here at quality in context and definitions, and I hope you understand that these fit-for-purpose definitions are very subjective, which speaks to the need for that partnership between IT and the business. The cycle is very different from how most people think of it conceptually, and the solution approaches have to involve knowledge transfer to the rest of your organization. You can't just solve it one time and expect that that's going to fix your future problems. You now understand a bit more about the dimensions, life-cycle quality, and related things. And we've given you a little bit of an example of how to calculate business value out of that, both at the monetary level in a particular example and also at the macro level from New York City's piece. And so to really sum up all of this, what we need to understand is that we can do a lot of data quality engineering, but we don't want to try to get all data perfect. Again, we're looking for a sort of Pareto analysis, with the 80% of the data, excuse me, the 20% of the data that causes 80% of the problems, because not all the data is of equal importance, and it's largely a matter of scientific, economic, social, and practical knowledge combined, and that's what we've tried to give you today. So it's time now to drop over to our Q&A section and see what sort of questions we can answer from you guys. Thank you guys for that. It's now time for you guys out there in the listenership to ask questions. So feel free to click on the little Q&A window feature at the top of your screen, and you should be able to submit your questions through that feature. We've got a bunch of questions coming in, so we're just going to roll right in here so we're not wasting any time. Commence the speed round. If there is no financial, what is that? If there is no financial impact, is ROT applicable? What is the benefit of data cleansing of master data which is not or never will be used? Let's take that in two parts. The first part is, since data is free, do you care about ROT? I think everybody has heard the story that the cost of data storage is declining. Now let's put that in strict context. What that means is that the price of solid state storage devices, the things that we're all glad our computers are running these days, is now on what we call the Moore's Law curve: every 18 months it's going to be twice as powerful and twice as fast, and that's really cool. Which means that data is not exactly free, but if you calculate only the cost of data storage, it is absolutely cheap. The question comes in, though, when somebody says, I need to go find out what a customer's address is. It turns out that the average corporation maintains customer information in 17 different databases.
I was talking to one client last week and they said, well, ours is in 12. And I said, okay, you're better than average, that's wonderful, and you can now say that. So, any of you out there with fewer than 17 places, that's actually a really, really nice piece. But if you have to go check all 17 of them, yes, the storage is cheap, but the cost of the person, the knowledge worker, going out to find the correct one, well, that can be very, very complex and very, very problematic.

The second part of your question is really interesting, Steven, and I'm going to toss a diagram up here, give me a second to find it, that most people really just don't get. Which is that what's happening in most organizations, when they look at a master data management solution, and they look at it again from the typical, what Karen described earlier, tools-only approach, is a real, real problematic aspect of this. So we're going to give you some multimillion-dollar advice here, which is that if you put in a master data management solution that way, you're probably going to do it three times before you actually get it right. And one way to avoid that problem is to understand that it's very difficult to put in master data management solutions at all unless at least two other components go in at the same time: a component of data governance, and also data quality. We have seen many, many of these things fail, and we've in fact rescued a number of organizations that had spent tens of millions of dollars on this. Now, the question was specifically asking, do you really need to do data quality if you're tossing it into master data? Am I paraphrasing that correctly? I think more or less, yeah. And the answer is, yeah, you know, certainly not all data is as important as all other data. Absolutely. And I think the other part of that is, the question asks, what's the benefit of data cleansing of master data if it won't be used? Well, if it's not going to be used, maybe it shouldn't be considered master data, right? Again, one of the hardest things for most organizations to come up with is an objective test of whether something is in fact master data or not. And we tell people that before they spend a dollar on the technology, they need objective criteria that say, our organization is defining master data in this way for this period of time. By the way, that's not your final answer; you're going to evolve that over the years and grow with it. But we may say, for example, and in this case we know the client you were working with had physical locations around the world, that master data starts out as just defining those locations around the world. That is an objective test. We can look at any piece of data and say, does it meet that test or not? Therefore it is or isn't master data, and therefore it should or should not be subjected to master data cleansing, which I think we all agree should be held to a higher standard than traditional non-master data cleansing, because of the nature of master data. So we'll remember to include this slide in the materials. Next piece.
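To make that "objective test" idea concrete, here is a minimal sketch, assuming the physical-locations definition from the answer above. The entity types and the test itself are placeholders for whatever definition your organization writes down for this period of time.

```python
from dataclasses import dataclass

@dataclass
class DataElement:
    name: str
    entity_type: str   # e.g. "location", "customer", "transaction"

# The organization's current, written-down definition of master data.
# Deliberately narrow to start; it is expected to evolve over the years.
MASTER_ENTITY_TYPES = {"location"}

def is_master_data(element: DataElement) -> bool:
    """Objective test: every element either meets the definition or it
    doesn't, so cleansing standards can be applied consistently."""
    return element.entity_type in MASTER_ENTITY_TYPES

warehouse = DataElement("warehouse_address", "location")
web_click = DataElement("page_view", "transaction")
assert is_master_data(warehouse)      # held to the higher cleansing standard
assert not is_master_data(web_click)  # ordinary cleansing rules apply
```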
Great question. Thank you. Okay, the next one, I think, well, we'll see. This is data as an asset: please substantiate which are the applicable IASB, FASB asset accounting standards. I did some quick Googling and found that IAS 38 covers intangible assets, including databases of business-important information. But I think we're clearly being a bit more liberal and less literal with the term asset here. So do you want to take that away, Peter? No, actually, that was pretty good. And I would also say that we talk about durable assets differently than we talk about consumable assets, and that's the distinction we're trying to make here. If you think of data as oil in a pipeline, it's consumed on the other end, and nobody thinks about that gasoline once it's gone into your car and been burnt. We actually want to keep very good track of what data has been used, for regulatory purposes and from a number of different perspectives. So the distinction is, no, data is not going to show up on the balance sheet, and that was a very nice tongue-in-cheek way of keeping us honest, and thank you for pointing that out. But we'd like it to be, and we actually have some people working on it. There's a fellow in St. Louis who's come up with a way of putting at least some value on a row in a database, and his organization is paying a lot of attention to that. I'm not sure it's really going to turn into something, but it's certainly an interesting step. We also have a lot of people working on trying to get it there. So again, durable assets versus non-durable assets: we would want data to be in that durable assets column. Great question.

All right, so here's a good one. The folks working DQ in my org say there is no data owner. And I quote here: data is for everyone and owned by everyone. Okay. Oh boy. I have a million-dollar piece of advice that I'll toss in at the end, so you go ahead. I would have to respectfully disagree with the folks at your organization. There may be no one who wants to claim ownership of the data, but there is definitely always a data owner. When we go into particular clients, we do often have trouble getting people to stand up and say, I own the data; that's extra responsibility, I don't want that. Absolutely. I'm going to go a little further than that and say that actually the only data owner is the organization. And I'll tell a very, very brief story that's in the monetizing book, if you want more detail on it. We were working on the Army suicide mitigation project, which some of you have heard about. Soldiers in the military were dying more by their own hands than at the hands of the enemy, and that's a problem; we wanted to fix that. We were able to get some support from a high-level executive in the Army. The Secretary of the Army came in and said, ladies and gentlemen, I have an announcement for you all: it's all my data. And by calling it my data, it changed "can I use my data for this purpose" into "of course we can use our data to save soldiers' lives." Now, the ownership belongs to the organization, but what you can in fact own is something that Lewis teased out a couple of years ago. And for those of you who haven't seen Lewis on some of these podcasts and things, we're very grateful to him for coming up with this distinction: you can own only the requirements for that data. You cannot actually own the data. And it's a brilliant piece of insight, because that way multiple people can own different data requirements for the same data items.
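One way to picture Lewis's distinction is as a simple structure in which nobody owns the data item itself, while several parties each own a requirement against it. A minimal sketch, with hypothetical owners and requirements:

```python
# One data item, multiple owned requirements. Nobody owns "sales_amount"
# itself; each stakeholder owns a requirement placed on it.
requirements = {
    "sales_amount": [
        {"owner": "Finance",    "requirement": "reconciles to the general ledger monthly"},
        {"owner": "Sales Ops",  "requirement": "available within 24 hours of close"},
        {"owner": "Compliance", "requirement": "retained for seven years"},
    ],
}

for req in requirements["sales_amount"]:
    print(f'{req["owner"]} owns: {req["requirement"]}')
```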
And owning the data requirements is really the conversation that you want to have around that. If nobody's willing to own the requirements, then nobody is willing to talk about the business value, and everybody knows you can't build IT projects without requirements. So it's a very nice way of linking that particular piece in. Again, that's a good way to look at it. Absolutely. No, this has been very, very good. I consider that to be a several-million-dollar piece of advice right there.

All right, next question here. How does metadata management fit into the quality aspects you've described? Now, I think that was actually asked right before you covered metadata, so why don't we table that one? But I think, Sandra, you asked that question; if you'd still like that answer, please go ahead and sound off. Otherwise, I'm going to move on, but we'll keep that one logged. And then literally the next slide was metadata.

All right. So how do you deal with contextual aspects of data definitions or domains? That actually goes back to metadata, doesn't it? It does go back to metadata. And again, this is why somebody can own the requirements for the data but not actually the data itself. So for example, sales from one perspective might be an important accounting number for Wall Street to pay attention to in deciding whether they're going to vote the stock down the next day on the trading floor. On the other hand, the executive in charge of that, or the sales team, may be looking at it as their bonus. So contextually, it is absolutely critical. And this again gets us to why the partnership that Karen and Stephen have both referred to over the last hour and a half is so important. IT is not necessarily going to be cognizant of that. Again, about 10% of the IT groups we work with are very, very good at this, but 90% of them are focused more on the IT components and just say, well, it's bits and bytes that go into that particular piece. You know, we oversimplify that approach by saying: they can get access to the server, so their job is done. Again, it's a problem. So contextually it's absolutely huge; it's very, very difficult to make any intelligent decisions about what fit for purpose means for data quality unless you have that context.

All right. So this is one of the questions regarding your cost calculations, and the question is, what about opportunity loss? Oh, I think that could definitely be another value you would want to calculate. These were just some that we calculated directly in this particular engagement, but opportunity loss could be huge depending on your situation. Again, back to context, but... Remember, when we started the discussion on this, it's not important to develop a comprehensive approach. But if somebody is looking at this and saying, all right, I've invested $5,000 in this exercise, you are clearly showing value here of 20 times that. Another thing that we do, and this is a little tricky bit of social engineering, is that when we're displaying these numbers, we will actually show them out there and say, oh, let's just do it at one minute per fix, which would have taken our number from $92,000 to $84,000. This is still a pretty good number, right? I'm sorry, that's not exactly right; let me do the numbers again. That would have taken us down to $25,000.
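The arithmetic behind that bit of social engineering is deliberately simple. Here is a minimal sketch; the record count and labor rate are hypothetical (chosen so that one minute per fix lands at the $25,000 figure mentioned), since the engagement's actual inputs weren't shown on this slide:

```python
def at_least_cost(bad_records: int, minutes_per_fix: float,
                  hourly_rate: float) -> float:
    """Deliberately conservative 'at least' cost of manual correction:
    records x minutes per fix x loaded labor rate."""
    return bad_records * (minutes_per_fix / 60.0) * hourly_rate

# Hypothetical inputs; lower the minutes-per-fix and let the room object.
for minutes in (1, 5, 10):
    cost = at_least_cost(30_000, minutes, 50)
    print(f"{minutes} min/fix -> at least ${cost:,.0f}")
```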
And somebody will look at that number and say, can you really fix a data quality problem in one minute? You say, oh, you don't like the one? How about we put five in there? Somebody will look at it and go, five? Now it's ten minutes, right? Either way it comes up, all the way around, what we're doing is saying: it's costing you at least this much money. But absolutely, the opportunity cost is not figured into this. And if you are able to say, it's $92,000 plus, these people are spending their time doing this instead of that, right? They could be upselling new books instead. Exactly. Or in a smaller organization... In a smaller organization, that number could be much more significant. I know when we were first running through some of these calculations, some of them were coming up at 30, 40, 50, and the client said, oh, not enough impact to the bottom line. Well, we started lumping some of them together and showing the different types of calculations, and it quickly became enough impact. But for a smaller organization, this is a couple of people. Exactly. They could be out generating revenue instead of sitting in the office, because often it's not the person at the lowest salary level who has to be correcting these problems. You have a person finding these problems while trying to fix whatever business problem they're working on that needs this data, and they need it done now, so they've got to find a workaround. There are more costs than even this simple calculation captures. We have a perfect example right now with somebody who's going through an external audit, and it's data quality issues. It also involves access to data, this particular audit. But the cost, in terms of the number of people at the very highest levels of the organization who are involved, has quickly escalated this into almost a half-a-million-dollar cost to that organization, just for one piece. It adds up and it adds up and it adds up. Again, it's death by a thousand cuts. Exactly.

All right, so here's another one. How does this apply to a situation where most of the data are collected not by the organization, but by consultants and contractors? So this is data coming in from a third party. We were doing this one time where we were on a phone call, kind of like this situation, with four of us sitting around a conference table. And we said, by the way, did you dig out your data quality service level agreement? And the answer came back: what service level agreement? So you can specify this. If you're paying for data from the outside, you should be able to say that the data has to conform to certain constraints; quality constraints are as valid as volume constraints. We worked with one company that was largely subscription-based, and they had outsourced their subscription operation to the web, with a third party producing the data. And we found out that only 30% of the data coming in was any good. Any good. Not just good, any good. So 70% of the data coming in was useless, and the third party was getting paid for every piece of it. And they literally didn't have a service level agreement in place. It's an easy thing to fix: just next time the contract comes up for renewal, say, hey, we have one little thing we'd like to add in here. But of course, you also have to monitor that, which goes back to the behavioral pieces that Karen was talking about as well.
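And monitoring a data quality service level agreement can be largely mechanical. A minimal sketch, with hypothetical constraints standing in for whatever the contract actually specifies; the conformance rate at the end is the number you pay, or push back, on:

```python
import re

# Hypothetical contractual constraints for incoming subscription records.
def conforms(record: dict) -> bool:
    return (bool(record.get("subscriber_id"))
            and bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+",
                                  record.get("email", "")))
            and record.get("start_date", "") <= record.get("end_date", ""))

batch = [
    {"subscriber_id": "S1", "email": "a@example.com",
     "start_date": "2014-01-01", "end_date": "2015-01-01"},
    {"subscriber_id": "",   "email": "not-an-email",
     "start_date": "2014-01-01", "end_date": "2013-01-01"},
]

rate = sum(conforms(r) for r in batch) / len(batch)
print(f"conformance: {rate:.0%}")  # the number the SLA is enforced against
```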
Yeah, those are good answers. But of course, if you're consuming free data from somewhere like OpenData.gov, you may not have much influence on how that data is submitted. So at that point, if there's not another way to work around it, you just have to figure it out on your own side; maybe come up with some ETL strategies for how you're going to cleanse it as you go. Well, let's get to a more fundamental issue, which is that we've seen articles in Wired magazine and other places saying you don't need to do hypothesis testing anymore, that it's the end of the scientific method. You know, garbage in, garbage out still applies. That's right. Absolutely.

All right. So how do you address structure-oriented activities when the organization is already using a number of systems, many of them off-the-shelf, with data structures that have already been determined? We can definitely relate to this, right, Karen? Yes. We have definitely worked on a number of projects where we've looked at data produced in an off-the-shelf system. It goes back to the same cycle again: we profile the data, understand what's there, and create certain business rules around it (a sketch of that cycle follows below). What we oftentimes wound up doing, because it was impractical to pay to customize that off-the-shelf package, is engineering those business processes to define the quality of the data going in. Back to a point that I'm not sure we covered during the actual webinar, but I thought of it when Peter was showing the slide of where data starts and where it ends: one of the things that helps tremendously, when you're trying to demonstrate business value or to actually change a business process, is to help the people at the source of the data, those doing data entry, understand how the data is used and what impact it has down the line. Often the people at the front end have no idea how the data is being used at the back end, and really have no idea how "oh, I want to get through this field, so I'm just going to put an X in there" affects things down the line. Nine, nine, nine, nine, right? Exactly. We see that a lot. Another component of this, too, which gets to the longer-term, behavior-changing piece, is that data governance, as far as we're concerned, should have actual veto power over whether or not a software package is selected. Because if a package is coming in and the vendor has said, look, it works great (on PowerPoint they all do), the software will work; the question is how much you are going to have to modify your organizational practices to make it work. And if your workforce is relatively unskilled and you're depending on them to get the right type of data into these packages, it's probably worthwhile to invest in a better package, or a package that more closely meets the business requirements, given that set of constraints. I hate to use an old platitude, but it's sort of like you've got to have the clarity to know what you can fix and how to work with the things you can't. I mean, if you're in a massive organization, you've got software and it's not going to change; well, that's the reality of it, right? We're talking blue sky, best-case scenario, but we realize that people out there have to find ways to work with this, and it's easier said than done.
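As referenced above, here is a minimal sketch of that profile-then-rules cycle against data extracted from a packaged system. The fields and rules are made up, and the 99999 placeholder is the same data-entry dodge mentioned a moment ago; the point is that since the package's schema can't be changed, the rules get re-run on every load instead of being enforced at entry.

```python
import pandas as pd

# Extract from the packaged system, structure taken as-is.
df = pd.DataFrame({
    "customer_zip": ["23220", "99999", "23220", None],
    "order_total":  [19.99, -5.00, 102.50, 41.00],
})

# Profile first: how bad is it, and where?
profile = {
    "null_zip_pct": df["customer_zip"].isna().mean(),
    "placeholder_zip_pct": (df["customer_zip"] == "99999").mean(),
    "negative_totals": int((df["order_total"] < 0).sum()),
}
print(profile)

# Then codify business rules you can re-run on every load, since the
# package itself can't be modified to enforce them at data entry.
violations = df[(df["customer_zip"].isna())
                | (df["customer_zip"] == "99999")
                | (df["order_total"] < 0)]
print(f"{len(violations)} of {len(df)} records violate the rules")
```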
And so that's why we're talking about this sort of grassroots, organization-wide belief. It's going to take a long time for these behaviors and beliefs to change, but at least the next time it comes up, you can go back and say, look, every time we've gone through this reconciliation process, it has cost us an extra million dollars a year to balance our books. Somebody will then be able to figure that into the cost of the package the next time it comes around. Without that data governance approach, you'll still have a largely application-driven solutions method in play, and that's simply not going to be helpful for organizations. Yeah, that's right.

Okay, this is just a quick comment someone made, in reference to the cost breakdown. They said that cost was quite accurate: I used to have a manual data cleansing team, and that's more or less what I would charge for each record. So maybe a pretty decent rule of thumb there.

Okay, the next question is, do you have a prescribed best-practice-based data quality roadmap aligned to the DMM? For those of you who aren't familiar with the DMM, we referenced it briefly at the beginning. We are working on that reconciliation process. We do not have one just yet, but what we will find, I'm confident, as we move forward, is that it will not be inconsistent with what we've done here, because we've done this so many times, and so many of the organizations we work with are familiar with this in general. The nice thing about the DMM is that it is, in fact, based on a lot of common sense, and Melanie, whom we referenced earlier, will be the first person to say that what she's really tried to do is codify a lot of the common sense that's been out there. You'd have to dive further into the data quality aspect of the DMM to look at that, but the DMM is not about prescribing behaviors; it's about looking at outcomes. Any road that gets you to nirvana is acceptable as far as the DMM is concerned, assuming it's compatible with what you're trying to do internally. And I know Peter's just put this back up on the screen, but one of the things I really like about this approach is the interconnection between all of the other areas. The DMM wheel is great, and those of us who know it well know that those areas are all interconnected; this particular picture, with its double-ended arrows, makes that very easy to see. Cool. Is that the devil's horn? Yeah. I'm not going to say anything. I love it.

All right. How do you get business leaders to own the data when they are often very averse to owning anything? I feel like this is the whole webinar topic right here. Well, it is. It's about changing hearts and minds. It absolutely is about changing hearts and minds. One of the things we've found is that if they make decisions based on that data, which many of the owners wind up doing, that's one way to help identify those owners. What data do you use to make decisions? What data is critical to their operations? We talked about multiple owners based on requirements, but it does come back to what Peter was talking about earlier: the requirements as defined by those business people.
The other thing is to show that unless your IT group can show you specifically what those data requirements are, which they very likely can't, then you can say, well, if they're not going to come up with them and we're not going to come up with them, any answer must be the right answer, and we know that's the wrong answer. Exactly. Yeah. Great. Well, that's the last question I have. I don't see any coming in. We're checking the Twitter really fast to make sure there aren't any we missed. I'm saying "the Twitter" like my grandpa. The Twitter... looks like that's all we've got. So thank you guys so much.

All right. So we've got a couple of minutes left, so you can go mull over what you thought and how you're going to change the hearts and minds in your organization. Thanks to everyone for participating in today's event; we really hope you've enjoyed it. Thanks again to Data Diversity and Shannon for hosting us. Once again, you will receive today's materials within the next two business days. Our webinar next month will be Design and Manage Data Structures, on October 13th. Hopefully you'll be able to join us for that as well. As always, feel free to contact us if you have any questions. Thanks again. Thank you, Karen and Peter and Steven, for this great presentation, and thanks to all of our attendees for the active participation and all the great questions you asked. Hope everyone has a great day. Thanks, Karen. Bye, everybody.