And welcome, my name is Shannon Kempe and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Data Governance Strategies, sponsored by Infogix. It is the latest installment in a monthly series called Data-Ed Online with Dr. Peter Aiken, brought to you in partnership with Data Blueprint. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the bottom middle of your screen for that feature. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DataEd. To answer the most commonly asked questions, as always, we will send a follow-up email to all registrants within two business days containing links to the slides. And yes, we are recording, and we'll likewise send a link to the recording of this session as well as any additional information requested throughout the webinar. And if you'd like to continue the conversation and networking after the webinar, you may go to community.dataversity.net. Now, let me turn it over to Emily for a brief word from our sponsor. Emily, hello and welcome.

Hello, and good afternoon, everyone. It's nice to meet you. I'm Emily Washington. I lead the product management team here at Infogix; I've been with Infogix a little over 17 years. For those who don't know us, we are a 40-year-old company based out of the Chicagoland area focused on data integrity. We spend quite a bit of time focusing on data quality challenges, and we work with a lot of large organizations across the globe. In terms of how we work with organizations, we have a fundamental belief that success is really about focusing on the information that matters. If you think about all the data that is available across the enterprise, there's a subset of that information that's required to bring insights and process optimizations, with business goals and objectives being the ultimate success criteria for which data is valuable across the enterprise. Here at Infogix, we focus on a number of solutions, both technologies as well as services, to enable that stream of information to bubble up and create high-value data: everything from creating a catalog of your metadata all the way through to enabling business glossaries and providing those insights. But we have a fundamental belief that data quality is really at the root of everything that you do, which is why this particular session, what Peter's going to be sharing in a couple of minutes here, is really important and something that I personally take to heart. A lot of what Infogix is focused on is helping ensure data quality across operational environments. So we'll work with insurers, focusing on ensuring accuracy of information across policy, billing, and claims payments, or, in the banking industry, financial reporting. We've seen a lot of shift over the years as we've worked in various data environments, from mainframes all the way through to, most recently, streaming data sources and Hadoop sources of information. How do you reconcile all that data across these different environments? That's a critical focus for us here at Infogix.
So really what we're helping bring to today, and one of the things we've been talking quite a bit about, is how we help organizations bring quality initiatives to bear and support a lot of the challenges that Peter will be speaking about through the presentation today. So with that, I'll turn it back over to you, Shannon and Peter, to kick things off.

Emily, thank you so much. And if you have questions for Emily, she will be joining us in the Q&A portion of the webinar at the end, so feel free to submit some questions in advance in the Q&A section. And now let me introduce our speaker for today, Dr. Peter Aiken. Peter is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. Peter is also the founding director of Data Blueprint. He has written dozens of articles and 11 books; the most recent is Your Data Strategy. Peter has experience with more than 500 data management practices in 20 countries and is consistently named a top data management expert. Some of the most important and largest organizations in the world have sought out his and Data Blueprint's expertise, and Peter has spent multi-year immersions with groups as diverse as the US Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. And with that, let me turn everything over to Peter to get today's webinar started.

Hello and welcome. Thank you, Shannon. And Emily, thank you as well; we are looking forward to you joining us at the top of the next hour so we can dive into some of these stories. The topic today is really about increasing awareness around data, and I'm going to start out with a couple of stories about things people aren't really aware are caused by data quality problems. I've been to, as Shannon said, dozens of events, and at about every one in five or so, somebody will come along and say, well, you can't justify data quality. And I think, no, that's not correct. You actually can. If you're in a situation where you've got your customer data in this kind of shape, where the prospect comes in through a website and puts in J.E. Smith, and then somebody else comes in through the call center, but the call center had a bad connection that day and they become a slightly different J.E. Smith, or you get it from a third-party list, maybe a banking list that has a middle initial, so it's yet another J.E. Smith, and then again we've got another data source, the customer DB internally, that renders J.E. Smith differently still. Well, it's all the same person, and we've got to be able to pull all of these records together. Let's do a couple more stories real quick. There's an interesting one where a colleague of mine bought a car. They hadn't done business with the dealership prior to this, but literally within 30 days of getting the car, they were sent a mailer that said, hey, by the way, we'd love to get you a car payment and put you in a new car. Not realizing, of course, that this particular customer had just bought a car. And you can see here in the comment, it makes them seem sleazy, which is of course not how we want to appear to our various customers. Or here's a letter I got from a bank at one point. I love keeping this one because it's a real one, obviously.
It says, you know, please continue to open your mail from either Chase or Bank One. And I've done business with both of those banks; they're very fine organizations. But this one was still a data quality problem, because if you look at the screen here, you'll see the letter is crumpled: I had originally thrown it in the trash and then pulled it back out of the trash can after I figured out that it was actually something important, even though it looked like something that wasn't. You know, I'm still a stockholder in one of the banks on here. And just to make sure, they say at the bottom, please watch out for mail from each of these two entities. So even though I wasn't necessarily a customer of one bank or the other, they're forcing us to make those decisions. Or here's another quick data quality problem. Being a guy, I went and purchased a microwave at one point in time, and that would be kind of important. But also being a guy, I ended up dropping the cooking tray, the little spinning thing in the middle, and broke it. So I figured I'd get online and fix it before anybody noticed and replace it. I get onto the General Electric site, I look at microwaves, and I come in and say, okay, what is it? It's a removable turntable; that's what I'm looking for, right? And they said, oh no, we can't find that part. There is no removable turntable for this model. Notice the model number underneath the red there: it's JES103GPWH002. All right, well, I poke around a little bit further, I eventually get a schematic, and I find out it's not called a removable turntable, it's called a "tray, cooking." Now again, not good. By the way, I paid a fairly inexpensive amount of money for this microwave, certainly less than the $48 they wanted to charge me for the tray-cooking, plus $8.95. For that, I could almost buy two microwaves. Here's another bank. Again, I've done a lot of business with SunTrust over the years. We actually called SunTrust on this one, because we said, hey, SunTrust, did you guys really send us a gift card? And can we use this gift card for buying a car, or could I buy a horse, or could I charge something with it, or whatever? Of course, eventually the person on the other end of the phone said, wow, did we really send you a gift card for $0? The answer, of course, was yes, you did. Finally, a quick joke on this one here. Oh, that's going to run badly; well, let me tell you the story. This one starts out where the school calls and says, this is your son's school; we're having some computer trouble. And mom says, well, did he break something? And they say, oh, well, no, we didn't actually break anything, but did you really name your son Robert'); DROP TABLE Students;-- ? Oh yes, we call him little Bobby Tables. And they say, well, we've lost this year's student records; I hope you're happy. And mom says, and I hope you've learned to sanitize your database inputs.
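The Bobby Tables joke is really about sanitizing database inputs, and the standard fix is a parameterized query. Here is a minimal sketch in Python using the standard-library sqlite3 module; the table and the student's name come straight from the joke, and everything else is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

new_student = "Robert'); DROP TABLE students;--"

# Unsafe: splicing user input directly into SQL text is what lets
# little Bobby Tables end the statement early and run his own, e.g.
#   conn.executescript(
#       "INSERT INTO students (name) VALUES ('%s')" % new_student)

# Safe: a parameterized query treats the input strictly as data.
conn.execute("INSERT INTO students (name) VALUES (?)", (new_student,))
conn.commit()

print(conn.execute("SELECT name FROM students").fetchall())
# [("Robert'); DROP TABLE students;--",)]  (stored verbatim, table intact)
```

The point matches the mom's punchline: the problem is not the strange name, it's an application that lets a name rewrite the database.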
Let's go one more example here just to get started. These are companies that didn't necessarily know they had a problem with data. Nike, a very fine company, I've worked with them on and off over the years, and they kept saying they didn't have data quality problems. Well, guess what? When you have a blowout on a brand-new sneaker and it hurts an NBA star, not good, right? Billions of dollars down the drain for Nike, and imagine if they'd been paying a little bit of attention to data quality. So what we're going to talk about here is a six-step process before we get to the Q&A section. First, data quality has to be understood as an engineering challenge; this is part of the discipline of engineering. We're then going to talk about how to put a price on data quality. We'll look specifically at DMBOK components, and the idea is that the DMBOK components are well oriented so that they complement each other. And that's really a bottom line for all of this: you probably aren't going to use one pie wedge at a time; a combination of three pie wedges seems to be the best way to put these things together. Then we'll do some savings-based stories, we'll look at some innovation-based data quality stories, and we'll finish up with some non-monetary stories. So, first thing. I hate to start off with a grumble, but I see this all the time on LinkedIn and other places, where people will say, hey, four steps to make your data sparkle: just prioritize the task, involve the data owners, keep all the data that's coming in clean, and align your best staff with the business. And it's just wrong. It's really unfortunate to let people see this, because when they do, they get the idea that things are easier than they are. It's important to have an understanding of what high-quality data gets you. We're talking about information transparency. We're talking about the basis for analytics and business intelligence. We're talking about increasing efficiencies, decreasing costs, and driving better decision-making. But it's not simple, and it is important that you get some help. This is where having organizations that have been around for 40 years is an absolute godsend, because these guys have been doing this for a while. So let's take a look at some numbers from some of the clients that I've worked with over the years. Again, just numbers: 47 quintillion bytes of SQL Server data. An Informix system that runs almost 2 billion queries every day; actually, on the bottom one, there are 30 billion queries each day. 117 billion records; 23 terabytes for a single table. These things mean lots of things are happening in your organization very, very fast. And the point is that when you repeat them and repeat them and repeat them and something is wrong with any aspect of it, you end up with a situation that we call death by a thousand cuts. Now, the most profound lesson I've learned in this business is garbage in, garbage out. From a data quality perspective, that's so important, because if you have garbage data going in, it doesn't matter what you're doing in the middle. It could be the perfect model. It could be a data warehouse, machine learning, business intelligence, blockchain, AI, MDM, data governance, technology, analytics: it doesn't matter. If you've got garbage in, none of the rest of this stuff is going to work. So you've got to start at the very beginning and clean up that data. Only when you have clean data can you actually push clean data into your various applications and processes, and then and only then will you start to reap the benefits of quality data going in and going out. Data knowledge around the organization is informal and insufficient. Data understanding happens pretty well at the workgroup level, but data chaff becomes sand in the machinery.
And if you have that sand in the machinery, it's going to impact your business in thousands and millions and billions and trillions of ways that are really kind of unknown. What that means is that this sand, this grit, is going to get in the way of what you're attempting to do, which is improve the literacy of your organization, clean up your data supply, and start to selectively apply standards. When you do that, you can see it starts to become a little bit cleaner, a little bit smoother in the process, and then we can get them all to work together. Now, this process cannot happen without engineering and architecture; we're concentrating in this particular talk specifically on the engineering aspect of it. I was on a tea farm in India a couple of years ago on holiday, and I saw a wonderful Deming quote over the cash register at this tea farm: quality doesn't happen by accident. Quality engineering and architecture work products do not happen accidentally either, and of course what we're talking about here is data: making sure people understand that if these high-volume transactions are incorrect in any way, shape, or form, they're going to replicate and multiply, and most importantly, give us a poor-quality foundation. Now, those of you that know me know that I'm what's called a horse husband. It means my wife has a t-shirt that says "I love my husband," in small letters at the very bottom, "almost as much as my horse." And I put this picture up here of the barn that we built. The key is, you take a picture of something like this to show that it's passed a building inspection. The building inspectors come out, and one of the things they double-check is to make sure that you are in fact using quality standards, in this case materials that are of known capabilities. Now, why is it important to have known capabilities? Because as your business expands, as your mission expands, you're going to need more and more engineering support. I'm going to finish off this brief section with a quick example of one of my favorite pieces of engineering. This is a machine that was built in 1942. It's taller than I am. It has a clutch, and it is cemented to the floor. You might say to yourself, goodness, what were they attempting to do? Why does a mixer need a clutch? And the answer is, well, when you put 4,000 soldiers on a warfighting machine and send it out to win a war that, at the time, we were losing, one of the things those people will need every morning is breakfast. And if we don't have breakfast for them every morning, we're certainly not going to be able to win the war. And I don't care how many KitchenAids you have out there, it wouldn't do to try to make breakfast with them. In fact, the duty cycle of that machine is so short, it's probably not going to last you through one breakfast for 4,000 people. Not that the KitchenAid is not a good machine; it's a great machine. It's simply not engineered for that purpose, whereas the machine on the right was engineered to last a long time. In fact, we didn't know how long the war was going to run, but the machine still works today. You can go to San Diego today and eat pancakes made by that very machine. So data quality must be understood as an engineering challenge, but we also need to be able to put a price on it. And my colleague Tom Redman coined a term a couple of years back: the hidden data factories.
Now, consider these two questions. Were your systems explicitly designed to be integrated or otherwise work together? And if they're not, what are the chances that they will just happen to work together? The answer is: not very good. So data has to function at the most granular interaction, or it results in things that take too long, cost more, or deliver less. And that's a problem, because all of these represent greater risk to the process as we go through and try to figure out what's actually going on. In fact, 20 to 40% of all IT budgets are spent evolving data in one form or another: migrating data, converting data from one place to another, and improving data because we find it is insufficient for use. By the way, I'm showing a little I Love Lucy still at the bottom there that some of you may be familiar with: this lady's yelling "speed up the assembly line" because they're doing such a great job. Probably not the best way to create high-quality products. One other thing about data quality is that it is context specific. Just as these icons mean different things in different parts of the world, your data quality problems are going to be specific to you. So we may be able to learn some meta-lessons from looking across industries, but in your environment, this is a very personal thing; I'll give you a specific example of this shortly. In terms of how we have to do this, we have to do more analysis. This is a very new field, trying to find out what combinations of people, process, and technology we can include in our solutions, because the volume of data just continues to increase. This is a wonderful chart, an infographic from a company called Domo, and every year for the past five years they've added up what happened every minute of every day for the preceding year. So, every minute of the entire year of 2018, the Weather Channel fielded 18 million weather forecast requests. You've got to have some significant power to be able to do that. Netflix streamed almost 100,000 hours of video every minute of every day. LinkedIn added 120 new professionals every minute. 1,300 Uber rides, almost half a million tweets, and 7,000 Tinder matches occurred every minute. And I hope you're not investing in Bitcoin, but if you are, just know that every minute of every day for the entirety of 2018 we created 1.25 new bitcoins. That should give you a little bit of pause in terms of what you're thinking about. So, one of my favorite books, and I've sold many more of Douglas Hubbard's books than I have of my own, is the one at the top there, How to Measure Anything. The first thing he points out is that measurement is a reduction in uncertainty. It's not an exercise in precision; it's just the opposite. It formalizes stuff, which forces clarity: when you start to write it down, it gets more concrete. And whatever your measurement problem, it's probably been done before. You have more data than you think; you probably need less data than you think; getting data is more economical than you think; and you probably need different data than you think.
And just a quick note on chapter seven: if you've ever had somebody say, well, I want to wait a little longer because I need more information, he's got a calculation in there for the cost of that delay to your overall project. I like this particular cartoon so much, I bought the rights to it from the cartoonist. Sheena drove to the airport so slowly that she caused 120 other drivers to each arrive five minutes late. Then she got stuck at TSA by forgetting to take her laptop out of her bag (I've done it too); well, there's another little bit we can add up. And then she's of course the person in first class who can't get her bag out of the overhead when everybody else is trying to make connections at Chicago O'Hare International Airport on a rainy Friday afternoon. The point is that you can add up all of these things, and in this particular instance she's killed an entire person-day. Now, when I say formalizing stuff helps, this is an example that Hubbard used in the book, where he asks: how many piano tuners are there in the city of Chicago? Most people go, I don't know. By the way, you can't use the Yellow Pages or Google to figure this out; let's go back to 1938 and say that the current population of the city of Chicago is three million people. The average number of people in a household is two or three. The share of households with regularly tuned pianos is one in three. The required tuning is once per year. How many pianos can a piano tuner tune every day? Four or five. And how many workdays are in a year? Therefore, the number of piano tuners in the city of Chicago is approximately equal to the population, divided by the people per household, times the percentage of households with tuned pianos, times the tunings per year, all divided by the tunings per tuner per day times the number of workdays per year. Do we have the right answer? No. But do we have information that will now help people understand the size of the challenge we're facing? The answer is yes. Let's take a very specific example. This is an older example from the Virginia Department of Transportation. What we had going on in this instance was a number of people using duplicative systems, another type of data quality problem, something that shouldn't have to occur. By taking this chart, I can take the number of employees in each district and what pay grade they are at. And when I look at what's happening there, I can find at least 300 employees who are spending at least 15 minutes per week doing redundant data work. That redundant data work can be costed by finding the entry-level pay for each grade. We don't have to go around and ask people how much money they make, but we can start to apply these calculations in a way that allows us to add things up. And in this instance, we were able to add it up very quickly to a very significant number. The monthly cost here is $21,000 plus $137,000, and when I go back over here and add up all those districts, I get $10 million annually being spent because of bad data quality. Now, you can imagine the Virginia Department of Transportation, even though it's a several-billion-dollar-a-year organization, was nevertheless happy to recover money that amounted to $10 million annually.
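Both of these back-of-the-envelope calculations have the same shape, and it can help to see them written down. Here is a small sketch in Python: the piano-tuner inputs are the ones just listed, while the redundant-work inputs (an entry-level hourly rate, weeks worked per year) are illustrative assumptions rather than figures from the VDOT slide:

```python
def piano_tuners(population=3_000_000, people_per_household=2.5,
                 households_with_pianos=1/3, tunings_per_year=1,
                 tunings_per_tuner_per_day=4.5, workdays_per_year=250):
    """Hubbard-style Fermi estimate: total demand divided by one tuner's capacity."""
    tunings_needed = (population / people_per_household
                      * households_with_pianos * tunings_per_year)
    tunings_per_tuner = tunings_per_tuner_per_day * workdays_per_year
    return tunings_needed / tunings_per_tuner

def redundant_work_cost(employees=300, minutes_per_week=15,
                        hourly_rate=25.0, weeks_per_year=50):
    """VDOT-style roll-up: entry-level pay stands in for asking actual salaries."""
    hours_per_year = employees * (minutes_per_week / 60) * weeks_per_year
    return hours_per_year * hourly_rate

print(f"Piano tuners in Chicago: ~{piano_tuners():.0f}")
print(f"One district's redundant data work: ${redundant_work_cost():,.0f}/yr")
```

Neither output is "right"; the value, as the talk says, is that writing the estimate down forces clarity about what you are assuming, and summing the same roll-up across every district is how a figure like the $10 million annual number gets built.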
This way of putting a price on it, when you complement it with the DMBOK data management practice areas, allows us to start to go in and put things in place. So when we look at the DMBOK, I like to show the original DMBOK wheel, because it has the little sub-pieces broken out. The new one we put in just has the labels on it, but data quality is still just one wedge, and quality is a good thing. By the way, another really bad thing we did in DAMA was that we made data quality the last chapter of the original DMBOK. If anything, we never want to say data quality comes last. It's symbolic, but it's very important. This gives you an idea of the data quality engineering components: we see inputs, processes, and outputs. Again, that's for your reference; we're not going to walk through it, but we will talk specifically about some definitions. Now, quality data is data that is fit for purpose. That is really key. That's why I say it's context specific for you, and it's really important to understand that. Interestingly, most people are not aware that spinach comes in as a data quality error: somebody in one of the government offices made a measurement error and was orders of magnitude off in the amount of iron that spinach was hypothesized to have. It turns out spinach is good for you, but it's no better for you than kale or green beans, or broccoli if we're going to go that way. So our definition of data quality management, then, is to establish a program. And it's really key to have a program to do this, because you've got to have change management that occurs with it, and you've got to have a continuous process for evaluating all of these things. As I mentioned before, it is a definition that speaks specifically to engineering. These engineering concepts, however, are generally not known or understood in the business, and that's why it's important to participate in these types of educational opportunities and learn more about what's going on. Let me give you a quick example of how to use the DMBOK wedges in concert. A typical question that occurs, and many people have experienced this: well, my data wasn't very good, and somebody told me that buying a data warehouse would solve my data quality problems. Well, it turned out it didn't work that way. So we might have, as a first version of a data strategy, a combination of data quality management combined with data governance and some data warehousing and business intelligence capabilities. That's what I said: these things almost always work better in threes, because three legs give you a much better stool to sit on than a two-legged or one-legged stool. However, the first version doesn't necessarily work, because somebody forgot to look at metadata. So now we've moved one wedge over to the other side. And finally, in the third version, maybe we're going to look at some reference and master data as we go through that as well. The key again is to understand that we use the DMBOK components almost always in unique combinations applied to our specific challenges, and sometimes within a specific challenge we may reuse those components in a couple of different ways. Now, we talked about stories, so let's talk about the series of stories that we'll do, and first of all I'm going to lay out the groundwork. The first set of stories is going to look at improving operations within your existing company. But there's another component here as well: innovation. And one of the things that organizations get tripped up on is how they try to fix things.
So I've been with CEOs who go, oh, I get it, data quality caused that last problem; okay, fine, so let's fix it. Can we be done by Friday? And the answer, of course, is no, it doesn't work that way. But even more so, within the idea of coming up with data quality success stories, it's probably not good to try to do two things at once; as much as we think we're good at multitasking, we sometimes aren't. So I'll explain a little bit. Walmart is certainly known as an organization that has absolutely rock-solid, very high-performing systems and operations. They know how to squeeze the last penny out of operations; in fact, their business model depends exactly on that process. These guys are geniuses, they do brilliant work. And we'll pretend the folks at Apple up there are good at innovation, right? You could question that, but let's say they're innovative. Now, I want you to imagine Jony Ive, the erudite British guy who used to design all the Apple products, being told to be cheap. It's just not going to work. Or tell the guys at Walmart, who are devoted to squeezing the last penny out of their operations, to be innovative. Again, that doesn't work either. So this first set of examples is going to talk specifically about improvements in operations. One of the fun things that occurred on one of my trips was that I got to Nokia, a very fine company in Finland, and everybody at Nokia had one of these things in their office. We were kind of looking at it and going, what is this thing? And they went and grabbed the documentation and handed me a 50-page manual. This is the one-page summary of the 50-page manual for how to use their recycling system. Now, it was a little bit ironic, but within a couple of hours I was actually in front of their CEO, describing to them that they had approximately 50 times more documentation for handling their waste than they did for handling their data quality problems. A little example like that can be very powerful, and they have, of course, corrected that particular area. Here's another one. This is the Defense Logistics Agency. So: real organization, real case study. They manage four million items, and the executive in charge requested a conversion update. They were told verbally that the conversion was going well. So they came back (luckily this individual had been coached) and asked, well, how many items did you attempt to convert? The answer was 100 items. And how many actually converted? Five. We'll just let that hang there in the air for a minute. You can imagine the officer in charge was not particularly happy at having been told that things were going well when they clearly weren't. So the project was not reporting the right results, the problems were discovered too late, and most importantly, we had a very unsophisticated contractor. So we got involved at some point, and the challenge was to take these millions of NSNs, national stock numbers, essentially the military's SKUs or stock keeping units, where the data was stored, somehow, in a comments field. It was not in the actual database fields. The reason for that is, again, another data quality issue: they originally had a hierarchical database, and when they got rid of the hierarchical database and brought in Oracle, another very fine database, the Oracle folks made it work like a hierarchical database. It's a trick you can do. They did that so they wouldn't have to change any of the rest of their programming.
But when they were faced with this, they were looking at four million items stored out there, trying to find the data and figure out how to pull this information back out. They said, we're going to have to do it manually. And that really was not helpful. So we developed what would now be called text analytics to convert the non-tabular data into tabular data. By the way, everybody's going to tell you that, as part of data quality, they can convert unstructured data into structured data. Sorry, that's BS. What you can do is convert non-tabular data into tabular data, so get your vendors speaking to you in the right terms. I like this story, though, because it was a $5 million savings to the government in terms of improving the data quality associated with the system. And more importantly, it was the first time I saved the government a person-century; we like person-days and person-hours and things like that, but we actually did a lot more here. So here are the numbers, and the question was: at what point should we stop being automated about the process and switch back over to a manual process? In most instances, the tendency is for IT people to try to get a 100% solution on these things. Here, of course, the answer was that that was not the right thing to do. Notice the first piece was, in this case, to determine and set expectations. So for the first three weeks, we didn't manage to match anything, but by the fourth week, we had solved half of the problem. Sounds pretty cool. We had also determined that 12% of the data was absolutely useless and we could simply throw it away, which meant our remaining problem space was hovering around 30% of the data. Now, the question was how far to continue to push this. We got to week 14, and they said, well, if you can convert one more slice of the data, we think that would be worth investing X into it. And we did get that number down to 7.5%, eventually figuring out that 22% of their data was absolutely rotten and that 70% of the problem was solved. I don't know about you guys, but if I had to go through and do data quality on that many items versus that many items, I'll take the green pile any day. Now, the success story part of this is really key, because you've got to be able to articulate it. So we take the number of NSNs, we put down five minutes to clean each and every one of them, we factor in the number of workdays per year and how much we're paying people, and here's my person-century: 92.6 person-years. Multiply that times some salary information, and there's my five and a half million dollars. And remember, I saved the government five and a half million dollars. So here are the new numbers: when we go back in and take a look, I now only need to hand-clean 150,000 items, and that reduces these numbers down to considerably less: seven person-years, $420,000. Okay, that's a lot better than spending five and a half million. So there's my five-million-dollar savings from a data quality problem, but let's go a little further, and I can't wait to get Emily back online to ask about this. (Whoops, sorry, hang on, I went too far on that slide; let me pull it back. There we go, nothing like interactivity, guys.) There's one number on this slide that is absolutely critical, and that is the number five minutes in the upper right-hand corner. Do any of you on the line think that you can solve a data quality problem in five minutes? Absolutely not.
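The arithmetic behind the person-century is worth seeing laid out. A minimal sketch, assuming 1,800 working hours per person-year, a $60,000 loaded salary, and a cleanup scope of about 2 million items going in; those inputs are reverse-engineered assumptions that happen to reproduce the slide's 92.6 person-years and the roughly $5.5 million and $420,000 figures, not numbers stated in the talk:

```python
def manual_cleanup_cost(items, minutes_per_item=5,
                        hours_per_person_year=1800, salary=60_000):
    """Person-years and dollars required to clean `items` records by hand."""
    person_years = items * minutes_per_item / 60 / hours_per_person_year
    return person_years, person_years * salary

# Assumed ~2 million items in scope for manual cleanup going in;
# automated matching left about 150,000 for analysts to handle.
for label, items in [("before", 2_000_000), ("after", 150_000)]:
    py, dollars = manual_cleanup_cost(items)
    print(f"{label}: {py:5.1f} person-years, ${dollars:,.0f}")
# before:  92.6 person-years, $5,555,556
# after:    6.9 person-years, $416,667
```

Note that every figure here scales linearly with the minutes-per-item input.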
And so that five minutes is really critical, because when we go through and add up how long the work actually takes: if the real number is ten minutes instead of five, everything doubles, and I now get 10 million dollars. And by the way, in practice it's closer to an hour per item. So, kind of important. Another interesting example: why should a chemical engineer, somebody with a PhD in chemical engineering getting paid a six-figure salary, have to know whether a product is Y2K compliant or not? And what does this have to do with data quality? Well, this is a big international chemical company doing several billion dollars a year, creating fuels that burn cleaner, engines that run smoother, machines that last longer. They perform tens of thousands of tests annually, and these tests cost up to a quarter of a million dollars apiece. So we put together a diagnostic for them so that we could take a look at this, and we found that data quality problems were being introduced even though we had good people who were smart. They had PhDs, but they were sitting at digital computer A and transferring data to digital computer B by looking at one screen and typing that information into another screen. I'm pretty sure anybody on this call could have helped them solve that particular data quality problem. They were moving files around in a very haphazard fashion, which meant you would often end up with the wrong inputs to various processes because of manual data manipulation. People were clipping things out of spreadsheets in an undocumented fashion. There were synonyms that needed to be reconciled, causing problems. There were macros that were or were not run, depending on where the data was. And finally, the real problem, of course, was that there were data quality problems associated with the date-time routine in the product called FoxPro, which was never made Y2K compliant. Now, we went ahead and helped them out and improved the data quality around all of this. And when we did, the client came back and said, you've made my group $25 million more productive each year. So there's a really nice success story, very easy to quantify and put in place. This next example I'm going to pull from Doug Laney. Doug's done some really interesting work on this, and he cites an example where Lockheed Martin had accumulated 20 years of email from all of their employees on various projects and was trying to figure out what would be a good use for this very large, high-quality data set. They really didn't want to use it in a way that was invasive, but they were able to look around and ask: what types of groupings, what people work well together? When I put these people on a project, the project seems to succeed more than it fails. These are somewhat less quantifiable types of things, but nevertheless something they found tremendously useful. Another example is a logistics company out in the Midwest. There was a room out there with a hundred associates in it, and I, as I always do, asked questions. I said, what are you guys doing? And they said, oh, well, our mainframe system doesn't really produce very good data, so by the time it gets to the billing stage, every piece of information on every bill is always 100% wrong. The customer name is wrong, the date service was provided is wrong, the type of service provided is wrong, the price we quoted them is wrong, the date we showed that we delivered the service is wrong, and so on down the list.
And I said to the individual in the room, hey, you know we can fix this, right? I mean, that's a data quality problem that would be relatively easy to fix. The answer was: why should I fix this problem? I just had the best quarter of the best year that I've ever had; in fact, I'm thinking of doubling the number of people in that room next year. Well, that may be a very nice thing to think about, but literally, let's imagine the cost of putting an additional 100 associates in a room for a year. That is going to add up to millions and millions of dollars. Now, they were still thinking that was a good value, but I went down the hall to the CFO and said, would you like to improve cash flow on your billing procedures? Could we get the bills out 30 days faster than we're currently getting them out, improving cash flow on $9 billion annually by 30 days? And the CFO came back and said, oh, absolutely. And who was the person who told you they were going to put twice as many people in that room and continue to treat the symptoms instead of actually fixing the problem? So these are things we can do that help us understand how to derive savings from each of these exercises in improved data quality. The next series of quick stories are innovation-based stories. The first one is a personal story. When I entered the US Department of Defense in the late 1980s, I had the title Reverse Engineering Program Manager. And my boss said to me, your first job is to keep me from testifying in front of a congressional inquiry. Now, if you've ever been in front of Congress (I haven't, thank you), we all know that when that happens, the person seated to your immediate right is your lawyer. Somebody's in trouble, and we're trying to avoid some explanations. So the problem in the DOD was that we had 37 systems that paid people. How many systems did we need within the Department of Defense? I'll take a wild guess and say one. Which meant there were going to be either 36 or 37 losers in this particular game, and all of a sudden Peter is getting calls: Dr. Aiken, you need to come to New Orleans and spend some time here to meet the 2,000 people who will lose their jobs if you don't select the system in New Orleans. Repeat with Columbus, Ohio; Denver, Colorado; Memphis, Tennessee; and each of the other processing systems. The point was, we needed an objective way to solve the problem, and the problem was that we weren't using a standard definition for the word "employee." When somebody from the Pentagon would ask Pensacola, Florida (to pick one of the systems), how many employees do you have in Pensacola, Florida, the people in Pensacola would come back and say, well, what do you mean by an employee? Because, after all, we have full-time employees, we have part-time employees, and we have employees who work for both the full-time force and the part-time force. Do they count as one, one and a half, or two? You can see this gets confusing very, very quickly, and it has a lot to do with data quality, because data quality was keeping the Department of Defense from actually doing what it was trying to do, which was manage these things. We had done process modeling and come up with incomplete, inconclusive results; quite simply, all of the process models looked exactly the same. So we developed a technique called data reverse engineering that gave us a definitive answer.
We found out there was an authoritative category in the Department of Defense that required a specific pay category for one-legged engineers working in waist-deep water underneath rotating helicopter blades on overtime. And when the people in New Orleans said, why didn't you pick us?, we said, you didn't have the one-legged engineers; and they looked at their system and said, oh. And when the people in Columbus, Ohio said, why didn't you pick us?, we said, you don't have the category for working under rotating helicopter blades; and they went, oh. In other words, the problem was very nicely solved. But more importantly, we were also then able to improve the quality of the data processing within the Department of Defense and start to look around and take on some other activities. We ended up writing an article out of this; it's out there if you're interested in taking a look. You can see it's kind of old, but it's not well known. And again, when Emily gets on later, I'm sure she's going to talk about this as well, because a lot of what we do is trying to help people understand how to do these things better. Here's another interesting challenge, and just kind of an interesting thing: we're going to migrate data from place A to place B, and here are the attributes associated with it. The only thing on this screen that's important at all: look at the bottom, at the number of attributes on the green system; there are 683 of them. On the red system, there are almost 1,500. And our target system, as you've already seen, was a PeopleSoft implementation with thousands of attributes. One of the questions we asked was: how is the contractor who's coming in going to handle that process? It turned out we got their plan, and their plan, just like most plans, left the three T's of any systems implementation to the very last minute, and then said: we're going to put two person-months on each one of those tasks. Now, the three T's are training, transformation of the data, and testing of the system. In order to transform the data, and to look at whether we can actually do that transformation, here's how the numbers worked out. They were going to take two person-months, 40 person-days, and map a couple of thousand attributes onto 15,000. So we do some math and find out that's about 62 attributes per hour on the source side, and on the target side it's about 46 attributes per hour. So you're going to tell me that you can locate, identify, understand, map, transform, and document at a combined rate of 108 attributes every hour, or roughly two attributes every minute? Well, if you're that good, by all means have at it, but I think most organizations would simply back away and say, let's rethink that process and do a little bit of risk analysis around it. So if we're going to try to improve data quality, we need to make sure we have the resources to do it, and transforming data at a rate of two attributes per minute is not realistic. We as human beings don't do much at a rate of two per minute other than breathe, blink our eyes, and hopefully have our hearts beat a couple of times in there as well.
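You can check a plan like that with one line of division. A small sketch, assuming 2 person-months means 40 person-days of 8 hours: the 15,000 target attributes reproduce the quoted 46 per hour, while the quoted 62 per hour on the source side only falls out if the source attribute count was nearer 20,000, so treat that input as an assumption rather than a figure from the slide:

```python
def attrs_per_hour(attributes, person_days=40, hours_per_day=8):
    """Attributes to locate, map, transform, and document per working hour."""
    return attributes / (person_days * hours_per_day)

source = attrs_per_hour(20_000)   # assumed source-side attribute count
target = attrs_per_hour(15_000)   # PeopleSoft target attributes
print(f"source: {source:.1f}/hr, target: {target:.1f}/hr, "
      f"combined: {source + target:.0f}/hr "
      f"= {(source + target) / 60:.1f} attributes per minute")
# source: 62.5/hr, target: 46.9/hr, combined: 109/hr = 1.8 per minute
```

Whatever the exact counts were, the shape of the risk analysis is the same: divide the work by the hours budgeted for it, and ask whether a human being can actually go that fast.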
Take another example: Rolls-Royce, a very interesting company. Their old model was selling jet engines, and they came up with a new model where they were instead going to sell hours of powered thrust, with no payment for downtime. They eventually came to a new place where they were having different conversations: they could have a much better conversation with people about data quality, because they were now starting to partner with organizations instead of being a vendor to them. And one of the examples they used, which came from NASCAR and Formula One types of activities, was pit-stop tire changes. You can see the metrics in the example here; if we had the audio on, it would say, we changed two tires and it took 67 seconds. Now, if you're running the Indianapolis 500, that's kind of important. However, that's been improved on since. And the question is, what could we learn from the racing industry about replacing jet engines on airplanes? The answer is: quite a lot. So here's our new measure: four tires in four seconds. Those are the kinds of numbers that management will pay attention to, but there's a secondary effect from a data quality perspective as well. Oh, and by the way, just a quick note: that power-by-the-hour model was invented by Rolls-Royce back in 1962, so that's a pretty interesting piece there. One of the things they discovered, though, now that they were able to have these conversations, was that they could talk about fan blades. The old way of sensing how a fan blade was doing was that you had a sensor on it, and at the end of the trip you would download the data from that sensor onto something else, which gave you a generalized maintenance forecast: okay, a fan blade needs to be maintained every 100 hours of flight. (I am not an aeronautical engineer; these numbers are made up, I'm just illustrating.) Now, let's imagine that instead of one sensor we have continuous sensor monitoring. Maybe there are 100 sensors, and we can establish optimal monitoring targets, which means that instead of playing it safe and saying, oh goodness, I've got this many engines and this many things going on, we can say: this engine has flown only at low altitudes, which has a different impact on the fan blades. And so we can fine-tune that process and make it much more efficient. But that efficiency changes not just the time at which we maintain the fan blades. It also has the ability to improve the readiness of the airplanes, and it eliminates storage costs, handling, all kinds of opportunity costs, so we can put those efforts into something else. In this case, there was a $1.5 billion savings on the entire process, just from that one little piece. So those are some innovation stories, things we looked at from an innovation perspective that didn't necessarily have pre-existing solutions we could bring onto the floor. I'm going to finish off with a section about some non-monetary stories, because while it is important to put a number on these things, it's not always all about money. First one: the British Armed Forces. This is a wonderful story that Richard Nugee told at one of our conferences a couple of years ago. There was a young lieutenant who was trying to correct a four-year underpayment to his private. Now, if you've got folks that you're depending on, the last thing you want them worried about is a data quality error in their pay.
And we were able to look at that and determine significant impacts on morale and immediate cash flow issues, and the amount of effort it took to resolve the problem far, far outweighed the size of the underpayment itself. The British Armed Forces really got this, and our US Armed Forces have adopted a similar approach, saying that morale is a key mission-readiness issue and that we need to make sure these things are handled. And I want to carry on with a couple more military examples here; that's just where a lot of my experience is, but these are hopefully things that you can find in your organization as well. This is a terrible story that occurred in Iraq, where we had a couple of soldiers who were out trying to light up a target. What that means is, they go near a thing they want to blow up and they point a laser beam at it, and that laser beam helps the bomb go in and blow up what they're aiming at, which hopefully is the bad guys, right? Well, I don't know about you, but if I change the batteries in something, I kind of expect the thing to continue to work the same way. This one did not. They lit up the target, they replaced the batteries, and guess what the device targeted? It targeted them. So we had some special forces soldiers who were killed and others who were injured in a very, very awful scenario. Working with the military was kind of nice, because when you start to talk about how important data quality and data governance are to an organization that does governance anyway, their question was: where do we need to go to fix this? If we have something that isn't quality and it's not being governed well, we need to fix that right away, because they have a culture around this. Not all organizations have the same culture, which is why that model would not work, for example, at some of the larger companies that I've worked with. Nevertheless, it's still a process that we can bring in and utilize where we need to. Here's another example in the military; this is a different branch of the armed forces. Every time they buy a tank, they end up with about 3 million new data values, and exactly one of those data values controls the obsolescence of that tank. Now, one of the things we all want to make sure of is that our armed forces go out there with the latest and greatest equipment that will help them achieve their mission. You might imagine the military wasn't really good at figuring out which one of those 3 million values actually controlled the obsolescence of the tank. But with a little bit of work, we could pull in some tools and technologies, do a little bit of mapping, and find, in this case, $5 billion worth of savings. The real key when we're looking at these is to try to find the correct balance between manual and automated approaches. Just as I showed in my diminishing-returns example earlier, it is absolutely critical to understand that humans do some things really well and machines do some things really well. And yes, the dynamic is changing slightly with the advent of machine learning and AI and things like that. Just a quick note, though, on this machine learning and AI craze that we're facing at the moment, which is really good stuff, and where we're making really good strides.
In fact, the biggest, most promising area of machine learning right now is processing metadata from our existing legacy systems; many organizations call this dark data. But the real problem with most machine learning is not the algorithm; it's that we don't have the data to actually train it. So AI as a whole, the entire industry, is being held back due to a lack of quality data. Let's take it a little bit further. One of the other projects that we did for the military was the first version of the military suicide mitigation project. Even today, more of our soldiers are dying by their own hands than at enemy hands. That is not a good thing, and we were trying to help fix it. We had a bunch of different sources of data that we were trying to map from and to, and we actually ended up with a 30-by-30 matrix where we were looking at all of these things. And I don't know about you, but if you're trying to work from a 30-by-30 matrix, it's generally not a good thing. So the way we were conducting these meetings was that I had this matrix on a big screen, and I had a council of colonels that I was working with, and I had a favor I could ask of somebody in the military brass. So into this room full of colonels I brought, at one point, someone with, let's just say, a very heavy rank, pretty high up in the chain. And he slammed his portfolio down on the table after the third person stood up and said, sir, you can use my data under these circumstances, and here's the quality that speaks to it. He said, all right, let's just change this around a little bit. You guys are all talking about this as your data. How about we talk about it as my data? And by calling it his data, this individual changed the entire conversation from "can I use my data for this purpose, is it fit for these purposes" to "let's get the mission done" in a way that would actually work. I wrote this up in a book; he gave me permission to do that. And the reason I'm telling you this story is because I've told the same story to more than 100 chief executive officers of companies around the world, and not a single one of them would take the courageous step that this individual took. Let's finish up with one example that represents the biggest threat to US national security that we have ever faced as a country, and I'm putting the word Target up there. Yes, that is the one: Target. Many of you had your charge cards swapped out because Target lost control of that data. And I'm sure none of you on the call have ever heard of a company called Ashley Madison. It's a Canadian website for married people who want to date other married people; apparently they have more fun with that stuff in Canada than we do. You take those two pieces and ask, were they really doing quality data work? And then you combine that with something called the OPM data breach. People who were given a position of trust by the United States government, who applied for and received a security clearance and disclosed certain information to the United States government as a result, had that information put in a database that somebody took. So let's see what happens here. First of all, in the Ashley Madison database there were 44 users who used their whitehouse.gov email addresses to sign up. It's out there still; you can go look these up yourself. There were thousands of people with military and .gov email addresses.
There were bunches of Canadian citizens; in fact, a quarter of the entire city of Quebec showed up in it. I'm sure there were some awkward conversations around the dinner table when that got released. By the way, the data set is still out there; you can go look. But if I'm a bad guy, all I really need to do is take the 25 million people in the OPM database, the 37 million in the Ashley Madison database, and the 70 million in the Target database, and find one or two or three individuals who were not very smart and signed up to date married people using their work email address. And then I can go over to Target, because the Target database was hacked also, and let's just see what Target has in it. The Target corporate database contained your age, your marital status, all sorts of things, like how long it takes you to drive to work. Because we are creatures of habit, and if it normally takes me 20 minutes to drive to work and all of a sudden somebody observing me discovers it's taking 40 minutes, one may ask the question: was I dropping off kids at school, or was I having a dalliance? It looks at your salary; it looks at the websites you visit, one of which might have been Ashley Madison. Why on earth would Target need to understand your sexual preferences or what topics you talk about online? Well, they do it because they can. There are all sorts of other things in here, but this is how you identify high-risk individuals: taking this data and pulling it together into a threat we have not faced before, the US government jeopardizing our national security with poor-quality data for more than a generation. So, we've covered a fair amount very, very rapidly, and as we get ready for the Q&A, I'm going to do a couple of quick takeaways to get us there. The first one is that data quality requires a context-specific definition. That means your definition of data quality is going to be relevant to your organization, and it may or may not be relevant to other organizations that are similar to yours. It's going to require that specific definition, and that definition has to become part of your success story. If it doesn't, people will ask you questions like, will you be done by Friday? Most business problems have data challenges, hidden data factories, at the root of the problem, and you've got to go look for those data challenges. I hope you've seen over this very, very dense presentation that there are a lot of things that are data quality problems that people would not normally think of as data quality problems. I've personally investigated well more than a hundred of the major systems implementations that failed over the last decade or so, and every one of them has a data quality problem as the root cause. All advanced data practices depend on quality data. So if you're having to choose between improving your data quality, moving it to a new platform, or doing a visualization on it, I would suggest the data quality piece ought to precede the other activities. Again, the AI and machine learning that everybody is so hyped up about are suffering from an incredible lack of good-quality data to train on. Let me give you one specific example.
Let me give you one specific example. The data set we use to train image recognition, the one we show all the college and university people doing image recognition how to work with, has exactly one image in it to define the concept of a bride: a white woman wearing a white veil. There's nothing wrong with that image, but it is certainly not representative of the real world. There are very few easy fixes, but if you start to incorporate this storytelling method into the process, you will be much better able to bring other people along. Data quality engineering works best when we combine three of the DMBOK pie wedges, as we call them. Hopefully this isn't the first time you're seeing the DMBOK, but it is important. And your quality stories, which have to become part of the DNA of the company you're working for, have to demonstrate tangible ongoing savings, innovative data uses, and outcomes that are oftentimes more important than the strictly monetary pieces. And with that, we're back at the top of the hour, so it's time to invite Emily and Shannon back in and talk about some of your questions. Peter, thank you so much, and thank you, Emily, for these great presentations. If you have questions, feel free to submit them in the bottom right-hand corner of your screen, and just a reminder: I will send out a follow-up email to all registrants by end of day Thursday with links to the slides, links to the recording, and anything else requested throughout. So, diving in: are you seeing any concerns out there around data quality and CCPA and GDPR? Your example of multiple entries for a single customer seems like it would be problematic with a removal request. And we just saw a Facebook ruling in the European court where they're telling Facebook it must now obey the laws of Russia, which is kind of an interesting thing. So yes, how would you go about the process of finding all the records you've agreed in a court settlement to remove? I'll give you one example. We were working on a statewide data project and found there were 150 systems in that state maintaining information on personnel. That's exactly the problem from the very front of the presentation: we have lots of different places, so how do we know where to go to get them all out? If we don't get it right, the penalties for GDPR can be as much as 4% of worldwide gross revenue, which can be an astounding number when you start to think about it. Emily, what sort of things have you seen out there for people trying to reconcile and grapple with GDPR? I find there's still fairly low awareness in this country, but it's growing. Yeah, I completely agree. We actually get quite a few questions about it, but the implementation and the thought process around how to address it are still relatively immature in my view. A lot of what we help promote is that, as part of your typical data quality use cases, where you're ensuring accuracy or completeness of information and reconciling data from system to system, you also detect sensitive data: look for those patterns to flag where potentially sensitive data lives and document it. That's something we're particularly focused on; the kind of pattern scan I mean is sketched below.
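As one illustration of the pattern-based detection Emily describes, here is a minimal sketch of the general idea. It is not InfoJix's actual implementation; the rule names, patterns, and threshold are made-up assumptions:

```python
# A minimal sketch of rule-based sensitive-data detection. The patterns and
# the 80% match threshold are illustrative assumptions, not a vendor rule set.
import re

PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
}

def flag_sensitive_columns(rows, threshold=0.8):
    """Return {column: pattern_name} where most non-empty values match a pattern."""
    flagged = {}
    for col in rows[0]:
        values = [r[col] for r in rows if r[col]]
        for name, pattern in PATTERNS.items():
            if values and sum(bool(pattern.match(v)) for v in values) / len(values) >= threshold:
                flagged[col] = name
    return flagged

sample = [
    {"customer": "J. Smith", "contact": "j.smith@example.com", "tax_id": "123-45-6789"},
    {"customer": "A. Jones", "contact": "a.jones@example.com", "tax_id": "987-65-4321"},
]
print(flag_sensitive_columns(sample))  # {'contact': 'email', 'tax_id': 'ssn'}
```

The same scan run against the sources you already monitor for quality is exactly the "another set of rules" idea Emily goes on to describe.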
We're working with a number of organizations to figure out how best to go about doing that, because we do see it vary from organization to organization. Again, you talked a lot about business value and where the critical data elements are across the enterprise. So it's really making sure that, as part of your data quality processes, you're also thinking about security detection and sensitive-data detection. It's just another set of rules, if you will, applied to the same data sources you're already monitoring for quality purposes. And Emily, how do you use storytelling, either as part of your sales process or as part of your sustainment process? Let me explain sustainment to the group. It's easy to get these things started: many people look at this, get very excited, and say, oh good, this is wonderful, we've done some things and achieved some business value. Then the champion moves on to another focus and it all falls apart. That's what we mean by sustainment: you've got to have an ongoing program. Yeah, it's interesting. We typically start working with an organization because something really bad happened. A lot of those examples you shared, customer communication inaccuracies and other reporting inaccuracies, are usually the trigger point for needing data quality. We're definitely seeing more enterprise data quality and governance initiatives being put into place, but then the question is how not to boil the ocean with those programs. So a lot of what we focus on when talking to organizations is: what is the biggest pain point you need to solve immediately? What tends to happen when we start to build these quality implementations is that you touch a number of applications or business processes, and it starts to snowball from there. Take your customer communications: you're dealing with a CRM system, a marketing database, and potentially some sort of outbound communication system. You reconcile those three sources for accuracy or sensitive data, and then you ask what other adjacent systems or processes are associated with them. You already have half of the puzzle for the next process. So you start with what you've built and then scale out into the adjacent processes to keep that momentum going. It doesn't have to start as an enterprise initiative; if you don't know where to start, start with one area where you can prove some initial quick value and expand from there. The other part of it is to understand that you're practicing. The first time you do this, you're going to be okay at it; the next time, you should be better; the time after that, better still. That is the nature of a program: building up that kind of momentum. One of the things we have seen over and over again is organizations going off and doing a data quality piece without fully understanding the root cause. So we may get somebody, and I catch this all the time, with a really nice visualization scenario where they're doing all kinds of really good things.
And when they correct the data, they go back and correct their own pile of the data, but nobody ever puts anything back into production to keep it from happening again. That's really unproductive in terms of your data support. What it really comes down to, and I think this is what the questioner was getting at, is: how are you going to respond to GDPR? To CCPA? To PIPA, the federal government law, which only applies to federal agencies but is still going to ripple out into the private sector? If you don't know where your data is, how are you going to manage it? And if you can't manage it, you certainly can't improve its quality. Right. And we actually see that you usually go into these initiatives thinking the root cause is one thing, but once your analysts really dig into resolving certain quality issues, you start to see the actual cause, and then the focus shifts to how you apply rules, or who else needs to get involved to really own the quality issues and make changes. That typically works its way upstream to the environments at the very front. When third-party data comes in, when you're working with data sources across organizations, or as you do cloud migrations and similar projects, we often find the initial theory of the quality challenge ends up being something fairly different. So being able to adjust course as you implement these programs is critical. And I'll add one more piece before our next question: one of the responsibilities now considered a best practice for a data steward is to know both the sources and the uses of the data. It's not enough to look at the data right in front of you; you have to know where it came from and where it's going. It's virtually impossible in most large organizations to know everything about all your data, but if your stewards are at least looking backwards and forwards, upstream and downstream, in the areas you were describing, Emily, they have more ability to connect the dots and come to a different understanding. You know: okay, we started out with this conception of the problem because we had a burning-bridge issue and it seemed like that was it, but now I have a better understanding. Right, yeah. And that's where a lot of these governance programs are really forcing the prioritization of data quality programs. We've always done data quality, just approached in different ways, but as we have more data stewards, whether in a centralized or federated model across the enterprise, there are more folks looking at the upstream and downstream impact: how do all these data sources fit together, and what's really going on in the business process? So when we talk about lineage and everything else from a metadata standpoint, data quality becomes an integral part of the success of those programs. Absolutely.
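Here is a minimal sketch of that sources-and-uses idea: record which system feeds which, then let a steward walk the graph upstream and downstream. The system names are hypothetical, and a real lineage store would carry far more context:

```python
# A minimal sketch of "know the sources and uses of the data": represent
# feeds as edges, then walk the graph both ways. System names are made up.
from collections import defaultdict, deque

feeds = [  # (source, target) edges: data flows source -> target
    ("crm", "customer_master"),
    ("web_signup", "customer_master"),
    ("customer_master", "billing"),
    ("customer_master", "marketing_db"),
    ("billing", "finance_report"),
]

downstream = defaultdict(set)
upstream = defaultdict(set)
for src, dst in feeds:
    downstream[src].add(dst)
    upstream[dst].add(src)

def reachable(start, graph):
    """All systems reachable from `start` by following edges in `graph`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen

# A defect in customer_master touches everything downstream of it, and a
# GDPR-style removal request must chase every upstream source as well.
print("uses:", reachable("customer_master", downstream))
print("sources:", reachable("customer_master", upstream))
```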
What readiness framework should exist or be created to begin a data quality assessment for an organization? Well, when you talk about a framework, there are a number of different types: governance frameworks, quality frameworks, et cetera. I'll see if this answer suffices, and maybe Emily will have some different thoughts. I think it's really a matter of opening people's eyes. Emily was just describing how somebody's conception of a data quality problem can evolve over time as we keep asking "why" five times and get to the root cause of the problem. By the way, there's nothing magic about the number five, but if you've bugged somebody that many times, you're probably going to have a more in-depth understanding. So what we need to do is adopt what I call a data quality firehouse model; many people call it a center of excellence or something similar. Imagine telling all the knowledge workers in the organization that they need to learn how to do data quality. Most of them will say, eh, whatever. But if you can take a few people who learn how to do this well and concentrate your effort there, you will get much better results overall. Again, if I could have 10% of ten people's time, that nominally gives me a single FTE, but it doesn't really work that way, and I will always trade the ten-percenters for one full-time person, because that full-time person will develop a much better understanding of how storytelling matters in data quality and, more importantly, of how data quality is really an integrative discipline across all of these pieces. Emily, what have you discovered there? Do you have a framework that you use specifically? Yeah, actually, some of the biggest successes I've seen come when data quality has a seat at the table in your PMO organization. So anytime you start a new project, adjust an application, or adjust a business process, data quality is at the table. We're increasingly seeing more organizations do that. I do think building the framework tends to be slow at the beginning. You need that initial ROI to prove it's going to work, and to calculate a true value. A lot of what we see is that when you catch that one duplicate financial transaction that is, say, $500,000, you see some ROI immediately. That's the tangible side; you articulated more of the intangibles in the presentation, like how much time you're spending working these issues, and there's a way to calculate all of that too. Once you do this at a smaller scale, you can build the business case for bringing it all the way to that center of excellence concept, where it's baked into your processes and your engineering initiatives. Because otherwise it's often seen as added work, or as something that will slow the process down. But the benefits certainly outweigh the rework you'd otherwise have to do after the fact. And the PMO example is a very good one to bring up, because a PMO should see all projects of a certain size run through it.
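Emily's $500,000 example is the classic duplicate-transaction catch. Here is a minimal sketch, with hypothetical field names and records, of the kind of rule that finds it:

```python
# A minimal sketch of a duplicate-transaction check, in the spirit of
# Emily's ROI example. Field names and sample records are hypothetical.
from collections import defaultdict

transactions = [
    {"id": "T1", "account": "A-100", "amount": 500_000.00, "date": "2019-06-03"},
    {"id": "T2", "account": "A-100", "amount": 500_000.00, "date": "2019-06-03"},
    {"id": "T3", "account": "B-200", "amount": 1_250.00, "date": "2019-06-03"},
]

def find_duplicates(txns):
    """Group transactions sharing account, amount, and date; keep groups of 2+."""
    groups = defaultdict(list)
    for t in txns:
        groups[(t["account"], t["amount"], t["date"])].append(t["id"])
    return {k: ids for k, ids in groups.items() if len(ids) > 1}

for (account, amount, date), ids in find_duplicates(transactions).items():
    print(f"possible duplicate on {account}: {ids} for ${amount:,.2f} on {date}")
```

Catching one such pair pays for a lot of program building, which is exactly the "initial quick value" argument.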
So I'm going back to the example of somebody proposing to do work for somebody. In this instance, the PMO we were talking about now understood the three T's, and they would always look at any bids coming in and ask: do we treat the three T's in the standard fashion, which is to not pay much attention, assume there will be an overrun, and therefore a renegotiation of the contract and more money to the vendor? Or can we be proactive and say, gosh, you really think you can map 2,000 attributes onto 15 systems in 40 person days? Because that's two an hour. I'm sorry, two a minute. And two a minute is pretty good work; if you're that good, I'd even be willing to give you a raise. Let me add one more point here, Emily, to what you brought up, which is very nice. In this particular example, and this is literally the project documentation from the project I was talking about, we found that one of the proposers was going to bring in a partner and charge 2,000 hours a year for three years. We went back and said, wow, that is just wonderful. We're so fortunate to have somebody with the expertise of a full partner on site here, working a full year, for three years. We feel this project is going to do really, really well. At that point we got a little pushback: oh my goodness, I'm sorry, that's not what we intended; our partners don't actually come and work at your site on our projects; they work at headquarters, they have lots of other things to do. And we reminded them of another clause in the Master Services Agreement that said anybody who charges time to the project must actually be on site. And they went, oh no, we can't do that; we'll have to change that 2,000 hours a year to 200 hours a year; we'll call it oversight; they'll come visit occasionally. That doesn't sound like a data quality problem, except that right there we cut the bid by three and a half million dollars. And if it results in better outcomes for the business, I consider that a data quality success story. Yeah. I love the conversation going back and forth between you both; it's just been great. There's a comment here that I think you'll both want to add on to: "I started as a data quality person in 2005. We were reverse engineering quality into a data warehouse that started in 1998. Initially we focused on metadata. We now have a metadata staff and a metadata analyst position." Any thoughts on that setup, or any additional comments? Emily, you want to go first? I've got one, but... Yeah, it's been an interesting ride for me personally. I'm operating on a similar timeline here at InfoJix, and data quality has shifted over the years from plugging holes along the way, fixing that financial report, or supporting a particular compliance initiative, whether SOX or CCAR or NAIC MAR. What's happened is that more and more we're finding metadata is directly related to data quality. It's not just looking at transactional data and fixing problems or reconciling it along the way; there's so much context sitting within the metadata to support quality initiatives. So we too are seeing data quality roles being brought together with metadata management.
This is where your data stewards are also becoming more responsible for data quality in support of metadata initiatives. There's definitely a lot going on in how these two roles come together. I don't know, Peter, if there's more you want to add. Just that we are in a discipline that is still maturing. We used to consider that data governance, for example, was contained within a label we called data administration. Of course we were going to be governing the data, but it wasn't explicitly named. Eventually we broke it out and said, hey, we need to do this. That represents our improved understanding of the problem space, as well as our improved understanding of the need for specialization. Again, if I could have 10% of ten people, that should add up to an FTE, but I'd much rather put that expertise into one individual I can really get up to speed and quite expert at these things, because they are simply non-trivial. And I love the comment about focusing on metadata as a specific practice. There's another whole side to this; I'm sure you've seen it too. People will call us up occasionally and say, well, we really don't want to improve our data quality, we want to start with our metadata quality. And yes, you need to do that. But you know what? Metadata is data. So most of what we're talking about here applies to metadata as well as to data. So how do you build that even further into the elevator pitch for needing data quality staff? Sorry, Peter, I'll jump in here. I think, to really help justify more data quality support, especially as you bring together things like sensitive-data detection with data quality and the security and compliance initiatives going on, each of these has very defined roles and skill sets. While metadata is coming together with data quality, the folks working on the metadata side are really bringing the business context, whereas the data quality analysts are chartered with making sure issues are assessed: we know exactly what the quality problems are, where we can improve the processes and systems to work better together, and what other data quality rules need to be put in place to enhance the metrics. We talk about all these newer initiatives and technologies using machine learning and AI, but you still need those analysts at each level, from data quality to metadata management, to put a human touch on assessing these problems and identifying where improvements need to be made in the processes themselves. So as we build metadata management programs with organizations, with data quality as a piece of that, we look at what types of quality challenges bubble up and how to staff appropriately so you don't leave a gap. You're doing all this great work in these programs; make sure you have the staff and the technologies to support that end-to-end process. There are a number of ways we look at it, but there's definite value in capturing metrics around this whole process to support the different skills needed in this area.
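As a small illustration of the metrics Emily mentions capturing, here is a sketch that computes per-column completeness and uniqueness over sample records. The columns and records are made up, and a real program would track many more dimensions:

```python
# A minimal sketch of basic data quality metrics: per-column completeness
# (share of non-empty values) and uniqueness (share of distinct values).
def column_metrics(rows):
    """Return {column: (completeness, uniqueness)} for a list of dict records."""
    metrics = {}
    for col in rows[0]:
        values = [r.get(col) for r in rows]
        filled = [v for v in values if v not in (None, "")]
        completeness = len(filled) / len(values)
        uniqueness = len(set(filled)) / len(filled) if filled else 0.0
        metrics[col] = (round(completeness, 2), round(uniqueness, 2))
    return metrics

records = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C2", "email": ""},
    {"customer_id": "C3", "email": "a@example.com"},
]
print(column_metrics(records))
# {'customer_id': (1.0, 1.0), 'email': (0.67, 0.5)}
```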
And I'm going to layer one more challenge onto that, very briefly: while we're doing lots more work in the area of data education, we're still not looking at it in, I think, the correct fashion. Our, quote, "data science" approach is that lots of people can do really good things with the algorithms, and that's wonderful, but they expect to have perfect quality data. I've been in shops where people say, oh, we just discard all the outlying values, and I say, no, you shouldn't discard them. You need to find out where they're coming from and why, because they might in fact be important. We had one customer with a critical business failure that was signaled in the data quality, but the data scientists were simply discarding the values because they didn't like them. And there's a comment right in line with both of you here, along the lines of technology: "artificial intelligence, machine learning, and deep learning will never yield their promise without data quality." Right. How do you train the algorithms? If you don't have good training data sets, you cannot get good results from your algorithms. It's an interdependent process, and we've been concentrating on one side; we need to come back and look at the other side as well. Yeah, that's where a lot of the... oh, go ahead, Emily. Sorry, I was just going to add one thing here. That's where we're seeing metadata management also becoming a key area in support of these analytics initiatives, because, to your point, Peter, we see a lot of organizations trying to discard data they don't believe is valuable, but once you understand the context of the information you're working with, it often turns out to be fairly important. So there's a direct tie between linking metadata with quality initiatives and analytics programs. Absolutely. And there is a comment very much in line with that: "I'd like to learn more about automated metadata management, specifically data lineage." That's another topic, obviously, but we can certainly dive in there. Given that there are ways of approaching it, reverse engineering being one component, what can we do to help improve those practices? The first thing we've got to do is build awareness, because we're not even teaching students that metadata exists. We don't teach students that CASE tools exist. We don't teach students that data quality tools exist. So they come out thinking they can solve everything with an Excel spreadsheet, and we know there are many problems that simply are not solvable in Excel. Mm-hmm. And lineage has been an interesting one for me personally, because we're all looking for that easy button to connect data sources and identify and present visualizations of lineage across the organization. There's so much information sitting in these systems that automatically drawing lineage that is meaningful, really spotting the high-value data across them, has no magic bullet. There are a lot of ways to determine what the high-value data is, but to your point, Peter, there's more that goes into it, and we do need to take the time to really understand it. Automation alone isn't going to fix that situation.
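A minimal sketch of Peter's point about outliers: flag them for investigation rather than silently dropping them. The z-score rule, threshold, and sample numbers here are illustrative assumptions, not anyone's production logic:

```python
# A minimal sketch: flag outliers for investigation instead of discarding
# them. Uses a simple z-score rule; the threshold is an assumption.
import statistics

def flag_outliers(values, z_threshold=3.0):
    """Return (index, value) pairs far from the mean. These are leads to
    investigate, not noise to drop: they may signal a real business failure."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / stdev > z_threshold]

daily_orders = [102, 98, 110, 95, 104, 3, 101, 99]  # the 3 may be an outage, not noise
for i, v in flag_outliers(daily_orders, z_threshold=2.0):
    print(f"day {i}: value {v} is anomalous; investigate before excluding")
```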
In fact, we've got some good numbers now showing that automation and technology components really represent about 5% of our data environment, and that the people and process issues associated with it are the other 95%. That's an astounding number, but it has a solid basis. I put this slide back up a second time because it just frustrates me. I watch LinkedIn like everybody else does, and about once a week somebody will come up with some "pretty easy ways to do this" or "four easy ways to do that." I don't think any of them are actually easy, and it's just not helpful to me. It bugs me. Now, I will say this one is from July 2004, but I could pull one out from last week just as easily. It's just very frustrating. All that said, there's definitely a place for automation, and we actually often see it as a way to jumpstart this. Once you start to see the metadata, that starts to snowball, and you can start to challenge it. You're not working off a blank slate, if you will, as to what metadata is out there. So there's definitely a place for it, but automation alone isn't going to solve the problem. And as you noted, Emily, if you fix one problem in that garbage data on the left-hand side there, you're actually improving many things in the middle, which should help. That's tremendous leverage that organizations aren't aware they can take advantage of. Well, can we get in one more? No, that's all the questions for today. Let's see, if somebody has anything, they can submit a final question at the bottom; we have maybe a couple of minutes left. But I want to thank you both so much for a great presentation today. This has been fantastic, and again, I really loved the dialogue going back and forth in the Q&A. Emily, thank you so much for joining us this month, and thanks to InfoJix for sponsoring and helping to make all these webinars happen. So, are you going to be at the Data Architecture Summit in Chicago? Is it next week, Shannon? It's the week after next. We will not be at Data Architecture this time. Ah, well, we will be, so we'll miss you there, but I'm sure there are other shows we'll both be at. We'll see you soon. Super. And just a reminder: I will send a follow-up email by end of day Thursday to all registrants with links to the slides, links to the recording, and some information on InfoJix as well. Well, thanks, everybody. Thank you all, and I hope you all have a great day. Thank you, Shannon. Thank you, Emily. Thank you, guys. Thanks. Bye-bye.