Hello and welcome, my name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Getting Data Quality Right: Success Stories. It is the latest installment in a monthly webinar series called Data-Ed Online with Dr. Peter Aiken. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We will be collecting questions via the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights and questions via Twitter using hashtag DataEd. And if you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the bottom right-hand corner for that feature. To answer the most commonly asked questions, as always, we will send a follow-up email to all registrants within two business days containing links to the slides. And yes, we are recording, and we will likewise send a link to the recording of the session as well as any additional information requested throughout the webinar. Now let me introduce our speaker for today, Dr. Peter Aiken. Peter is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the data profession. He has written dozens of articles and 11 books, the most recent on data strategy. Peter has examined more than 500 data management practices in 20 countries and is consistently named a top data management expert. Some of the most important and largest organizations in the world have sought out his expertise, and Peter has spent multi-year immersions with groups as diverse as the U.S. Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. And with that, let me turn everything over to Peter to get today's webinar started.

Hello and welcome. And welcome to you, Shannon — thank you as always for a great introduction. I hope everybody's having a great day. It's a relatively nice fall day on the east coast of the United States where I am today, so at least we've got some good weather. I actually got ahead of my charts here, Shannon — I haven't done that for a while, but there we go, there's our first chart. So, topic for today, and thank you for joining us, of course. For those of you joining us for the first time, do remember that I'm trying to give you reference material you can come back to later. A lot of people say, boy, he's going fast. Yep — that's by design, for your value. So anyway, today's topic is data quality. And the real key to getting data quality right is to truly treat it as an engineering discipline and make sure that everybody understands the various success stories that are part of the process. So let's dive in and give a quick overview of where we are headed today. First of all, it's critically important to adopt a very broad definition of data quality; I'll give you a couple of examples in just a minute. It's also really important to understand data quality in the broader context of the organizations that are using the data.
From those two, we can easily get to the third proposition, which is that it's critical to approach data quality as an engineering discipline. We're going to have to start putting a price on the lack of data quality, which means you can put a price on data quality as well. I'll talk specifically about some savings-based stories, some innovation-based stories, and then some non-monetary example stories. And again, these are all from a book that we did a couple of years back. So let's jump right in and get to the broad definition of data quality. I'm going to start off with a self-referential story. Being a guy, and at one time a bachelor — now married, and not recently, it was 16 years ago, not that you care about any of that — I had a microwave oven that I bought for $40. And it had something that I really liked, being a bachelor guy at the time: a glass turntable. When you put your leftovers on it, it would heat them without you having to stir them, right? Typical guy stuff. Well, of course, I drop the glass turntable and break it, and I say to myself, I can fix this, I'm capable. I get onto the General Electric Appliances website and look for, not an appliance accessory, but in this case a repair part, which gives me a drop-down category, and I can pick microwaves from that drop-down. Clicking again, it says, OK, I need to know the model number of your microwave. Now, if you've ever tried to read a model number on the back of a microwave, the microwave must be dismounted from the wall, and you still need a magnifying glass. But I eventually got the number: JES1036PWH002. I put that in and think, great, now they know my microwave. What I need is the removable turntable, so I type that into the second search box, click, and it says: sorry, the part or keyword searched could not be located for the model you selected, please try again. Well, that's kind of hard, because I know how to spell all the words in "microwave glass turntable." Finally, it turns out they have a schematic, and luckily I knew what a schematic was, so I could look at it and find this thing. It turns out what they call it is a "tray-cooking." Right — rolls right off the tip of everybody's tongue. And here's where the data quality part really comes into play, although we've seen data quality issues all the way through: the price on this item is $48, and when you add in the delivery charges it's $56.95 — and the darn microwave oven was only $40 in the first place. It would have been impossible to solve this data quality problem using tools alone, and so I definitely put this in the category of data quality problems. Here's another easy one to get. Looking around the United Kingdom, and recognizing that miraculous things do occur, 17,000 pregnant men is not one of them. The problem here, as you can probably guess, is that the medical codes were similar, so a miscue on the part of a data entry clerk anywhere produces a result that makes Burton look like it's got men that are pregnant. Yes, absolutely a data quality problem. And the IRS sending checks to dead people, for example — this one is literally just off the presses.
Now, first of all, look at the title here: Treasury sent more than 1 million coronavirus stimulus payments to dead people, Congress watchdog finds. The Government Accountability Office is one of the finer parts of the government, and it was unfortunate that this report got played the way it did, because if you think about it and do a little bit of math: they had to send out, quickly, 160 million payments, and the ones that went to dead taxpayers made up 1,100,000 of them, which means the error rate on the part of the government was roughly 0.7 percent. That's actually pretty darn good. And it turns out there was actually a legal reason in this case, because the IRS lawyers had been asked, can we withhold checks from dead people? And they said no, because the money might belong to their estate. The unfortunate part is that the real story should have been that the government did its job pretty well. The quality problem here is that the government wasn't able to get out in front of this story and say, hey, this is really good — instead of a typical bad story, it's actually a good one. Within two weeks, the IRS electronically delivered 80 million payments. That's an amazing part of it, and it enabled folks to get back on their feet faster. A data quality problem all the same? Absolutely. One final one here. A congratulations letter came from a bank with a gift card in it, and we called the people that had sent the letter and said, hey, where can we spend this? And the bank said, oh, you can spend it anywhere. So we said, can we buy a car with it? And it was a really good conversation until finally the person on the other end of the line said, oh my goodness, did we really send you a gift card for zero dollars? I am so sorry. Again, tools alone wouldn't have prevented this type of error. And quite frankly, sometimes you lose confidence in the bank based on something like that, and they did not have a good ability to respond. Every time we encounter these things, I always ask: who are you going to tell about this clear error in your process? Some of them have an ability to deal with it; some do not. Data quality problem? Yes. The Y2K bug? Some of you may or may not know of this, but it's actually labeled out there as one of those hoaxes. It was not, I assure you. Before the internet, computing resources were expensive, so it was worth the trade-off to represent the year field using two digits, because the anticipated life of these systems was measured in years, whereas they turned out to be useful for decades. So the year 1959 was represented to the computer as 59, and subtracting 59 from 99 yields the correct answer, 40, for dates prior to 2000. Nobody expected these programs to still be in use, and the documentation was poorly created and maintained. If all those fields weren't expanded to four digits, the date calculations wouldn't produce correct results: subtracting 59 from 00 yields the incorrect answer, minus 59. So no one knew how long the fixes would take, but everyone knew exactly when they had to be completed — if we didn't get it done before that clock, there was no point in doing any of it. A data quality error? Absolutely. And interestingly enough, the official clock of the U.S. got messed up that particular night as well, which puts a nice stamp on the story.
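Just to make the failure mode concrete, here's that two-digit arithmetic as a quick sketch — nothing beyond the numbers from the story:

```python
# Minimal sketch of the Y2K two-digit year bug.

def years_elapsed_2digit(birth_yy: int, current_yy: int) -> int:
    # The pre-Y2K representation: only the last two digits are stored.
    return current_yy - birth_yy

print(years_elapsed_2digit(59, 99))  # 40  -- correct in 1999
print(years_elapsed_2digit(59, 0))   # -59 -- nonsense once "00" arrives

def years_elapsed_4digit(birth_year: int, current_year: int) -> int:
    # The remediated representation: store all four digits.
    return current_year - birth_year

print(years_elapsed_4digit(1959, 2000))  # 41 -- correct
```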
Another one: this is the title I had when I worked for the Defense Information Systems Agency in the early 90s — U.S. Department of Defense Reverse Engineering Program Manager. Sounds like a great title. And my boss said to me, your first project is to keep me from having to testify in front of a congressional hearing. And I'm thinking, I thought CIO was the "career is over" kind of title here. Briefly, the problem was that we had 37 systems in DoD that paid people. We needed, of course, one. The question was, who was going to be the loser? And the answer was everybody except the winner — and the winner could be none of the above. So they really did have a problem of figuring out what was going on. Remember, the Pentagon was put in place to manage the Department of the Navy and the Department of the Army after the end of World War II. And one of the things the Pentagon would ask those 37 systems was, how many employees do you have? And instead of getting an answer back, they would get back the question: what do you mean by employee? That's a very significant piece, because if you don't agree on the definition of an employee, you're not providing the right information. It was a correct response, but nobody was understanding the conversation that was going on. The real primary factor in all of this was that 30% of the Department of Defense workforce also had a second job with the Department of Defense. Do you count them as one, one and a half, or two employees? It does make a difference, and that was what prompted the question, what do you mean by an employee? So we did process modeling of all of these things and ended up being credited with the invention of data reverse engineering techniques, which allowed us to definitively determine that a one-legged engineer working in waist-deep water underneath rotating helicopter blades on overtime was a valid job classification. A data quality problem? Absolutely. Here's one from COVID, going on right now. Microsoft Excel is being blamed — well, it's not really Microsoft Excel's fault, but it's being blamed — for under-reported case figures in the British health system that occurred because somebody used the wrong version of an Excel file format. If I haven't lost you already on the esoterics, good for you for sticking with me, but imagine trying to have this conversation with somebody who knows nothing about what we're discussing. Rows of data were literally dropped, because XLS files only maintain the first 65,536 rows. No error message was raised, so the operator was not aware of the problem, but you can see the undercount was tremendous. Is it a data quality problem? Absolutely. So what happens is that reasonable individuals disagree on basic definitions — is data quality a part of data management, or is data management a part of data quality? It really doesn't matter. The point is there's something wrong and we need to fix it. And poor data manifests itself as multi-faceted organizational challenges. In other words, people don't see a data quality problem; what they see is an IT system that presents them with a business challenge, or a broken business process, or any number of other descriptions of the way systems work. But at the heart of all of this is the data. If you take a good system and feed it bad data, you will end up with bad results.
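And the guard against that kind of silent truncation is a one-check fix. Here's a minimal sketch — the CSV loader and file layout are assumptions for illustration, not the actual UK pipeline:

```python
# Hypothetical guard against silent row truncation like the UK case above.
import csv

XLS_ROW_LIMIT = 65_536  # hard sheet limit of the legacy .xls format

def load_case_rows(path: str) -> list[dict]:
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    # If the row count sits at the legacy ceiling, upstream truncation
    # is likely -- fail loudly instead of silently under-reporting.
    if len(rows) >= XLS_ROW_LIMIT - 1:
        raise ValueError(
            f"{path}: {len(rows)} rows is at the .xls ceiling; "
            "verify the source file was not silently truncated"
        )
    return rows
```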
And these poor results are what all of these stories have in common. The upshot is that a data quality problem is very much like the story of the blind men and the elephant: each person grabs a different part of the elephant and describes it differently. The key from a data quality perspective is similar — people see symptoms, not realizing that they all share the underlying problem of data quality not being addressed. So the focus stays within boundaries rather than across boundaries, and a fair amount of confusion and dispute occurs. The key, as I said before, is to look at this as an engineering discipline. Quality data is data that is fit for purpose. We use that as a very standard, but obviously quite subjective, definition. Fit for purpose means data quality is effectively synonymous with information quality, because poor data quality results in inaccurate information and therefore poor business performance. And so we have a discipline called data quality management that focuses on how that process is run, which makes it critical to incorporate supporting processes like change management, along with an ability to understand what's happening from the perspective of the business and how the business is using the data. This leads to data quality engineering: if you're just managing your data quality problems, you're reacting; to truly get in front of them, you need to engineer, or probably re-engineer, a solution. Engineering is a very well understood discipline, but data quality engineering is barely even known as a discipline within or outside of IT. Let me tell you a story on that. One of the funny things in the U.S. is that there's this fading but still strong belief that spinach as a vegetable is better than other leafy greens. And the reason is a data quality error that occurred within the Bureau of Labor Standards: they misplaced a decimal point, an error of two orders of magnitude, so it looked for a while like spinach was the best thing to get hold of. Turns out it was a data quality error all the way around. So again, the key is that data quality has to be looked at as fitness for purpose, and I urge you to adopt a similarly broad definition. Another good technique here is root cause analysis, which will give you the same kinds of results: the data really is the underlying challenge. Let's go a little further. Data quality occurs in a context of organizational use. One of the things we often talk about is separating the wheat from the chaff. I hope you'll agree with me that better organized data increases in value. If somebody disagrees with that premise, take a book, rip the spine off the back of it, and hand them a bunch of pages without page numbers. Yes — better organized data increases in value, and data that has organization is data that has value. But one interesting piece of data: 80% of all organizational data falls into the category of ROT — data that is redundant, obsolete, or trivial. What organizations find is that they are not analyzing, not using, the majority of their data.
And the question is, of course, which data should I keep and which should I leave behind? Let's go to a precise model of what we're talking about as we finish the second chunk here. First of all, the number 42, for those of you that don't know, is the meaning of life, the universe, and everything. So says Douglas Adams, author of The Hitchhiker's Guide to the Galaxy. Now, I've just taught you some really wonderful stuff: 42 is the meaning of life, the universe, and everything. If you take nothing else from this webinar, you certainly have that. What I've given you, however, is a random fact paired with a meaning. You may or may not like the meaning I gave it — but it's a wonderful book, I assure you. And that's the essence of data: pairing a fact with a meaning. So 42 means the meaning of life. My year of birth being 59 means I'm old enough to buy adult beverages, right? Facts and meanings combine. But of course, we don't want just data, we want useful data, and in order to understand what useful data is, we have to understand how the business intends to use it. Useful data combined with a request gives us the difference between data and information: data is just out there; information is data that somebody has requested. You'll observe the obvious at this point, that you can have data without information, but you cannot have information without data. I get real frustrated when people try to manage these things separately — they are so integrated and so interdependent that it's much more trouble to manage them apart than together. We're not all the way there yet, though. To get to nirvana, we have to understand not just what people want from information, but how they actually use that information, and that's the part we have done the least on: understanding strategic use of information. We'd like to be able to gain access to it and use it in the context that we want. But as I said already, this architecture is still dependent on one of my most profound lessons: bad data plus anything awesome is still going to give you bad results. And that occurs because we have organizations that don't quite get the way data quality impacts them. They may have the perfect model, but poor quality data is not going to help them. Let me say that again, because in addition to the perfect model, people focus on the algorithms. There's a new series of artificial intelligence and machine learning components coming online, which depend on what are called training algorithms, and a training algorithm needs data. Here, of course, is the same challenge: if we don't have data of sufficient quality to get us the results we need, the perfect model will still give us poor results. And literally, this year has been marked by the absence of quality data to feed these algorithms, which has held up their progress, because they've been unable to continue at the pace they need. Similarly, I can replace "model" with data warehouse, machine learning, business intelligence — let's go blockchain, AI, MDM, data governance, analytics — any of these technologies, all of them together.
And of course, if we don't take quality data and put it into those technologies, we are going to be unable to deliver with them. So hopefully this convinces you that a good perspective is to understand data quality as a contextual thing within your organization. Now let's talk about what it means to approach something as an engineering challenge, and let's first start off with a why, which is a great place to start. These are just some numbers from some customers we've worked with over the years — I just want you to get a sense of the order of magnitude. The very bottom one is almost 30 billion queries a day. Well, if you have 30 billion queries a day and you repeat them day after day, it adds up, and if you can save a little bit on each one, that adds up too. The phrase we use is death by 1,000 cuts. We really don't want to use that phrase, because first of all, nobody's dead; it's probably closer to something like bleeding unnecessarily from a lot of cuts, but that sounds pretty awkward — maybe "working while bleeding." Again, this is where you all can help out at the end, and we'll come up with a better way to describe it. The key, though, is to understand what I call the better data sandwich. Data sandwiches consist of three components: data literacy, data supply, and some use of data standards. These are going to vary from organization to organization, but what we want to do is get better at all three of them, and if we do, you'll find they work better together. And when they work together, somebody will eventually point out a Deming quote from many years ago that says this can't happen without engineering and architecture. In fact, I was on a tea farm in India a couple of summers back, and this Deming quote was on the cash register: quality engineering and architecture work products do not happen accidentally. We're going to put data in there as well, just to make sure it works that way. You can see we can't build something this big — and it's going to be as massive as many of your organizations are — without the engineering part. So let's define engineering by contrasting it with architecture. Architecture is used to create systems — buildings are one type of system — that are too complex to be treated by engineering analysis alone. In an architectural context, you handle the technical details by exception, whereas engineers develop the technical designs and typically supervise the manufacturing or building contractors. This is my favorite example of engineering — just indulge me for a minute, and I'll give you a couple of characteristics about it. It's taller than I am, so that gives you a bit of scale, and I'm average height. It has a clutch. It was built in 1942. It is cemented to the floor — clue there, right? And it's still in regular use. So what's going on here? Well, you have 4,000 war fighters who have gotten on the battleship — excuse me, aircraft carrier — Midway and gone out to help us turn the corner of World War II. Remember, our problem in 1942 was that we were losing the war. So we put 4,000 war fighters on this ship and said, please go out and change the fortunes that are occurring here. And one thing they needed every morning was breakfast.
So they needed this thing cemented to the floor — unless the aircraft carrier sank, it was going to be available. And more importantly, making pancakes for 4,000 people requires a lot more than an ordinary kitchen machine. They're wonderful machines, but ordinary ones aren't engineered to be, first of all, in use for well over 60 years, and secondly, put to this kind of use — making pancakes for 4,000 people every day for years at a time. That's astounding, and it can't be done without engineering. This is one of the reasons we won the war, whatever else gets said about the engineering on the other side. So hopefully what you're getting from this is that bad data winds up being the chaff in your organization — Tom Redman likes to call the result hidden data factories. You'd like the machinery of your organization to be running well, but something is gunking up the works. So now what we need to do is put a price on that, because only when we put things in terms people understand do we actually communicate. You've probably seen the posters around — some people love things and hate things, and it all depends on what you're doing; data quality challenges are going to be context-specific. I mentioned Tom's hidden data factories concept before — I've got a link to his Harvard Business Review article at the bottom there. Department A delivers work products to department B, but the first thing they do in department B is check department A's work and make any corrections, because it's just easier than going back and trying to get them to do their jobs right. This happens all the time. That's why they're called hidden data factories — and department B still has to complete its own work and deliver it to the customers, who may or may not find it correct, and then deal with the consequences. And that's just two departments. What you'll find is that knowledge workers spend 80% of their time looking for stuff and only 20% doing useful work, and that's an inordinate amount, because these hidden data factories are pervasive in your organizations. Consequently — and this is often not well known — in addition to being pervasive, they are also expensive. Now, the first question is: were your systems explicitly designed to be integrated or otherwise work together? The answer is almost certainly no, and the chances of them just happening to work together are very, very small — yet data must function at this most granular level. Now, this is a Lucy clip from many, many years ago — look up Lucy in the chocolate factory if you want to see the whole thing, and I'll just narrate here. She and Ethel are being told to wrap each piece of chocolate, and the assembly line is getting ahead of them. This is actually quite similar to the way data is becoming a problem for us in organizations, particularly data quality problems — though I'm not advocating that you eat your data quality problems, although that probably would be a lot more fun for some of us. I'm going to turn the volume up here a little bit. "Fine, you're doing splendidly. Speed it up a little!" In our world, nobody even has to say speed it up — the line speeds up on its own. And this is why your end-of-day job runs 45 hours, or the wrong assets are transferred, or features are not available at delivery, or there's additional risk incurred around all of this.
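To put rough numbers on that bleeding, here's a back-of-envelope sketch using the 30-billion-queries-a-day figure from a moment ago; the one-millisecond saving is purely an assumed illustration:

```python
# Back-of-envelope: what shaving one millisecond off a repeated query is worth.
queries_per_day = 30_000_000_000     # figure cited earlier in the talk
saving_per_query_s = 0.001           # assumed: 1 ms saved per query

seconds_saved_per_day = queries_per_day * saving_per_query_s
compute_days_saved = seconds_saved_per_day / 86_400

print(f"{seconds_saved_per_day:,.0f} compute-seconds saved per day")
print(f"~{compute_days_saved:,.0f} compute-days saved, every day")  # ~347
```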
In fact, easily 20 to 40% of IT budgets are spent migrating, converting, or improving data, all of which results from data quality problems. Now, I like this particular example of the thought process behind what many people refer to as activity-based costing — a wonderful set of constructs to take a look at. Let's look at it this way. Sheena drives to the airport, and she's slow, talking on her phone or something, and she causes 120 other drivers to arrive five minutes late. We can add that up: 120 drivers times five minutes is 600 minutes, or 10 person-hours right there. Then of course she's the knucklehead going through TSA fumbling with her laptop, causing some more delays. And then she's also the knucklehead at the front of the cabin with 300 people behind her trying to make their connections in and out of O'Hare on a snowy night, right? We can add this stuff up, and in this case it came to a whole person's worth of time. The inspiration here is from a fellow called Douglas Hubbard, and even though I've got a lot of books out, I've sold more of his books than my own, which I guess is a testimony to something. By the way, he's also written some of the best books on risk management and big data out there, but we're just going to talk about the first one, the measurement book. First of all, if something can be observed, it can be measured, and that has to be the basis for what you're attempting to do. Also, consider that measurement is a reduction in uncertainty. And finally, writing stuff down forces clarity. Hubbard gives some specific challenges to all of us, and having worked with a lot of different groups over the years, I can tell you: whatever your measurement problem is, it's been done before, so Google is your friend. You have tons more data than you think, you need less data than you think, you probably need different data than you have, and getting it is fairly economical. So let's take an example of this applied. If I ask, how many piano tuners are in the city of Chicago, you might go, I don't know — what kind of dumb question is that? Well, it's an exercise in how we can figure things out. If we had been doing this in the 1930s with Nobel Prize winner Enrico Fermi, we couldn't have used the Yellow Pages, and of course Google didn't exist. But we could take the population of Chicago in 1938, which was about three million, and say that the average number of people in a household was two or three, that the number of households with regularly tuned pianos was one in three, that the required frequency of tuning is once a year, and that the number of pianos a tuner can tune daily could be four or five. Again, these are estimates we can all come to. And how many working days are there in a year? So the equation becomes: tuners in Chicago is approximately equal to the population, divided by the people per household, times the share of households with tuned pianos, times the tunings per year, divided by the tunings per tuner per day times the workdays per year. That's an approximate answer, but it does give us something we can use. In most cases with this approach, our answer will be approximate but 100% defensible — though if somebody looks at some of these numbers through a historical lens, they'll look a bit different. So now we're going to dive into a series of examples.
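The piano-tuner estimate drops straight into a few lines of code. Every input is an assumption, which is the point — the 250 workdays a year is mine, the rest are the numbers above:

```python
# The piano-tuner Fermi estimate as code, using the talk's inputs.
population           = 3_000_000   # Chicago, late 1930s
people_per_household = 2.5         # "two or three"
pianos_per_household = 1 / 3       # households with regularly tuned pianos
tunings_per_year     = 1           # per piano
tunings_per_day      = 4.5         # "four or five" per tuner
workdays_per_year    = 250         # assumed

pianos   = population / people_per_household * pianos_per_household
demand   = pianos * tunings_per_year             # tunings needed per year
capacity = tunings_per_day * workdays_per_year   # tunings one tuner can do

print(round(demand / capacity))    # ~356 tuners: approximate, but defensible
```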
As we go into them, I just want you to think about the story types. On one side we've got Dilbert, the geek, the technical side, and on the other we've got Dilbert's pointy-haired boss, who's useless, but we'll call him the business side. On the technical side, Dilbert might say something like, we need to cleanse some data; what the boss needs to hear is, decrease the number of undeliverable targeted marketing ads. Reorganize the database, says Dilbert; the boss needs to hear, increase the availability of the sales force to perform their own analysis. Dilbert: develop a taxonomy. Boss: common vocabulary — okay, that one's actually pretty close. Optimize a query, from Dilbert's perspective, is shave one second off a task that runs a billion times a day; even the pointy-haired boss actually understands that. Reverse engineer the legacy system: understand what's good about the old system so it can be formally preserved, and what's bad about it so it can be improved. The story types have to be different. Now we're going to dive into some measurement examples. The first is from a state-level agency where we were challenged to save $10 million during one of the periodic budget crises agencies go through. What we found was that there were a lot of people doing clerical tasks — in this case, tasks around time and leave tracking, and worse, maintaining that information across multiple systems. We had an analysis product that showed, on the y-axis, the number of employees, where they were located, and what pay grade they were at. That didn't tell us a whole bunch, but it did tell us that at least this much effort was going into it, and that we had identified 300 or so employees who were spending a minimum of 15 minutes each week tracking time and leave. We could then go to the charts that tell us the base-level salary for those individuals — we don't need to know exactly what each one makes. So again, you see what I'm saying: it's entirely defensible, but it probably understates what's really in there. We add up the labor costs, the timesheet tracking, all the things that go into it, and come up with monthly amounts — the $21,000 here and the $137,000 there — and if I roll that up to an annual charge by district, I get my $10 million quite easily. I know it looks like magic math, but I assure you it works. That's the first one; we're going to do this a couple more times, and you can always go back and watch it again on YouTube when Shannon gets it out there. Let's look at some stories specifically focused on dollar cost savings. This first one I was quite proud of, because we did save the government some money. This is the Defense Logistics Agency, a very fine group that I worked with for many years. They were moving off a system that had literally 2 million NSNs — national stock numbers, or what the outside world would probably know as SKUs, stock-keeping units — maintained in a catalog. Unfortunately, due to some database re-engineering that had occurred before we got involved, the key and other important data was stored in a comments field. So you had wonderful subject matter experts who would call this information up on an old green-screen terminal and say, an asterisk in column four means that it's Thursday and therefore this is an Air Force part rather than a Navy part.
I mean, that's pretty involved, and it presents a problem if we're going to put that into a new system — the system was BSM, for those of you that go back that far. The original suggestion was the manual approach: well, we don't have a choice, we'll have to read each of those comment fields and individually figure out what's going on. That really didn't help, because we could have moved the data from one place to another, but we really wanted to move it into a structure — not move it somewhere and then try to create the structure afterward. I hope that makes sense; if not, we can clear it up in the Q&A. So the solution was what would now be called text analytics; at the time it was called an improvable text extraction process. It was the same kind of thing I described before with the learning algorithms, and it converted that non-tabular data to tabular data at a cost far below the government estimate, which was a total of five and a half million dollars. So I saved the government five million dollars, which was fun — but more fun was that it was the first time I saved a person-century of work. Let's dive in and see what happened. First of all, when you're doing something from an automated perspective, it's important to do it only as long as it is effective — as long as you are getting out more than you are putting in. When that stops, that is the very definition of diminishing returns. So the management at DLA and our team had an agreement, and I'm going to put a dollar figure on each week here: I've got four of them shown, so that would represent $10,000 a week. Gosh, I wish I could have gotten that much money from them — it would have been a lot more fun — but we still had a good time and everybody was happy. So anyway, it's a $40,000 investment, and look at that: we were terrible the first three weeks, but by the fourth week we had solved 50, 55% of the problem. Now, let's go a little further. You can see we did this for a total of 18 weeks, which at $10,000 a week means we're talking about a $180,000 total investment — and I'm doing this for illustration purposes, those are of course not the real numbers. But the key here is: we started, and we got better. We also found out that some portion of the data was absolutely useless — the ROT I was referring to earlier. In this case we verified that 12% was ROT, which meant our unmatched score was hovering right around 30%. We were at the start of week 14 when this conversation occurred: I think we've gotten enough. Well, said the subject matter experts, if you could find us one more slice of this type of data, that would be worth $50,000 to us — or half a million, whatever the number is. So we went another five weeks to try to uncover that information, and you can see we did: we took the 9% down to 7.5%. We got to a fairly steady state on the ROT — just one fifth of the entire database was absolutely useless — and we had solved 70% of the problem. Now, back to our original choice: it was either doing this many things manually, which would be error-prone and not nearly as accurate, or doing this much smaller number of things. And I don't know about you, but the green one looks really good to me. So let's see where it came out on the dollar score, because this is what we're talking about: how do you cost a data quality problem?
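Here's the shape of the arithmetic on that slide, as a sketch — the five minutes per item and labor figures are the illustrative numbers from this example, and the work-year parameters are assumed:

```python
# Costing a data cleansing effort: the DLA arithmetic as a sketch.
minutes_per_item = 5                        # assumed manual cleanse time per NSN
minutes_per_person_year = 60 * 8 * 5 * 45   # 108,000: 8-hr days, 45 wks/yr (assumed)
cost_per_person_year = 60_000               # illustrative loaded labor cost

def cleanse_estimate(items: int) -> None:
    person_years = items * minutes_per_item / minutes_per_person_year
    print(f"{items:>9,} items -> {person_years:5.1f} person-years, "
          f"${person_years * cost_per_person_year:,.0f}")

cleanse_estimate(2_000_000)  # ~92.6 person-years, ~$5.6M: the original estimate
cleanse_estimate(150_000)    # ~ 6.9 person-years, ~$0.4M: after automation
```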
Well, if we hadn't done this, all two million NSNs would have required manual cleansing. We put in five minutes per item, giving us a total cleanse time, times the weeks per year and so on, and then we look at how many minutes are available per person-year. This is my personal century: 92.6 person-years. And $60,000 times those 93 person-years is five and a half million bucks, right? That was the original estimate of what it would cost to clean that data. Now, with what we had done by reducing the problem space, we had brought it down considerably — you'll notice the number at the top of the spreadsheet changing, it's gone from 2 million to 150,000 — which meant the total effort required was only seven person-years, a total cost of $420,000. Remember, the original was five and a half million: there's my $5 million saved. And one more thing on this. The most important number on this chart is a deliberate piece of social engineering — whoops, I circled the five there and it went away, hang on, let me go back. You want somebody in the back of the room to ask, when you're briefing this: can you really solve a data quality problem in five minutes? And the answer is clearly no. So what should the number be? Nobody knows. If you use a statistic that says one hour per item, you can see that those 93 person-years turn into many, many centuries. So a little bit of drama is an important part of presenting this. Let's change context slightly. This is a different client, a chemical engineering company with a resource of 100 PhDs in chemical engineering. These are 100 chemical researchers, and their job is not to know whether a product is Y2K compliant, or to give you the definition of Y2K as one of the world's biggest data quality problems. This is a billion-plus-dollar-a-year company; they manufacture additives, and these researchers are trying to enhance engine and machine performance. They're trying to help fuels burn cleaner, engines run smoother — I sound like a commercial, don't I? — and machines last longer. And the fun part was, when they had a breakthrough, they would go down to the basement and run the tests, and the tests cost up to a quarter of a million dollars each, and they were running tens of thousands of tests. Now, we did something for them that was kind of interesting. I put together a chart that showed essentially the processes they used to gather information. I put it together to understand the workflow, but along the way we discovered some interesting things. For example, we had people getting paid $100,000 a year taking digital data off of computer A and turning around and retyping it into computer B. Probably not the most efficient use of that information or that individual's time, and probably anybody in this class could have solved that problem for them — but remember, their degree was not in IT, their degree was in chemical engineering. And what do we teach chemical engineering folks about IT? Apparently they could use some more. They were using flash drives to transfer data, which caused manual workloads and duplication of effort, and things got confused along those lines.
There was a lot of cutting and pasting and manual data manipulation that wasn't well understood or explained. They had synonyms to watch out for, and some tribal knowledge requirements — such as the fellow who worked in the UK and refused to ever convert anything to U.S. units along the way. And finally, the bottom line of all this was that they had a bunch of FoxPro databases, and FoxPro simply hadn't been made Y2K compliant, so they were stuck. So we helped out. We solved the immediate problem, which was: why is the date function not working? But in addition, we were able to help them reduce expenses and improve their competitive edge and customer service, because of the time savings and improved operational capabilities. Their internal business case said they would realize a $25 million gain each year on a $100 million investment. I'd love to know how to manage a business that well — it was just a phenomenal, phenomenal aspect of this. So again, here's a cost somebody else put on the data quality problem that was holding them back: opportunity costs of $25 million a year. Here's another way to use this, to head off certain kinds of problems. We have a legacy payroll system and a legacy personnel system, and the big numbers in green and red there are the number of data items from each system that need to be moved to the new system, which was going to be a PeopleSoft system. We needed to move the 683 and the 1,478 into the 7,073, somehow. Now, there's always a task in your project plan that says we are going to allocate 40 person-days for this — two person-months, essentially. If you do some math around it: when you add the 683 plus the 1,478, it's about 2,000 attributes mapping onto 7,000. We can do some math on the left-hand side and say that just understanding the source requires going through it at a rate of six and a half attributes each hour, and on the target side we'd have to manage 46.875 attributes per hour. So we'd have to locate, understand, identify, map, transform, and document at roughly an attribute per minute. That's doing even better than the five minutes I showed you two examples ago, isn't it? Not likely to happen. We call that extreme data engineering, and it's just not going to happen. Companies that sell it plan on you not understanding that, and then they fleece you on the other end when it doesn't go the way it should. Do not let anybody give you a fixed price for converting data if they have never looked at the data. Take another example of data quality problems: a Fortune 500 logistics company. I find this room full of associates where things are running around in circles and everybody's jumping up and down. I asked around, and they said, okay, here's the deal: the mainframe we're trying to get off of requires us to go through and correct every item on every customer invoice. The service is wrong, the type of service is wrong, the way in which the service was provided is incorrect — all of these things are absolutely wrong. We don't want to send those bills out, because we will look like idiots if we send them out this way, plus we won't get the right revenues. And so what we do is correct all of this.
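As a quick sanity check of that 40-person-day plan — eight-hour days assumed, attribute counts from the slide:

```python
# Sanity-checking a fixed 40-person-day estimate for a data mapping effort.
source_attrs = 683 + 1_478        # legacy payroll + legacy personnel
target_attrs = 7_073              # the new system's attributes
person_days = 40
working_minutes = person_days * 8 * 60   # 19,200 minutes

# Each attribute on BOTH sides must be located, understood, mapped,
# transformed, and documented:
minutes_each = working_minutes / (source_attrs + target_attrs)
print(f"{minutes_each:.1f} minutes per attribute")
# ~2 minutes per attribute for the whole chain of tasks -- not credible.
```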
So these people are heroes who work tirelessly to make sure those bills go out. And the real question was, how much of a delay does this introduce into the process? The answer was about 30 days — and this was a $9 billion organization. So one of the things we had a duty to point out was that there's a better way to address that problem than continuing to treat the symptoms. And the executive's response was just amazing. He said, oh my goodness, I just had the best quarter of the best year I've ever had — I'm actually thinking about doubling the number of people in this room. Now, 200 people, adding 100 people — that's probably a $6 million investment just for starters. Whoever approves that really has to ask what's actually being achieved. If you do a little bit of math here, as I did when I walked down to the CFO and said, hey, would you like to improve your cash flow by 30 days — the answer was quite clearly yes. She said, if you can do that for less than $800 million, then I've got a very positive return on my investment. She's obviously been listening to some of these webinars. So, a couple of conversations later, you've got this organization completely refocused — but it's very difficult for people to see beyond the symptoms on their own. So again, several savings-based stories: in this case maybe $6 million avoided, and more importantly, spending a fraction of that — maybe $800,000 — on the root cause. You might not fix it in 30 days, but certainly within a year you could get there, and your investment would still be very positive at the end of that year. Now let's change categories a little. Savings are fun, but they don't work for everybody, particularly nonprofits, so sometimes innovation stories are very useful as well. The first question is, what do you need as an organization to do more with data? Obviously, improving your organization's data is crucial, because data points to where valuable things are located, the data itself has intrinsic value, and data has inherent combinatorial value — if I take my chocolate and mash it together with your peanut butter, we've got a pretty yummy treat on the other end. But another part of data quality is that you have to improve the way your people use the data. We can't just improve the data; we've also got to improve the way people use it, because you use data to measure change, you use data to manage change, and you use data to motivate change. All three of those are absolutely key to making anything happen. I'm going to describe a story that is representative of an organization attaining a sustained competitive advantage because it improved its data and improved the way people use its data in support of the organizational strategy. The company is Rolls-Royce, and it is a really, really fun story. Let's dive in a little. Rolls-Royce for years and years has been selling things that are great, that work — these are among the most reliable pieces of equipment anywhere in the world. And one of the reasons is that right now Rolls-Royce is able to take 9 million data points off of each engine every minute; they know more about what's happening on these engines than anyone. In fact, failures are so infrequent that they're actually noteworthy — like the one engine failure on flight 1381.
But again, it was a fluke. These things perform phenomenally well; people do not worry about the engine blowing up or failing as they fly across large expanses of water or around the world. Here's the interesting challenge, though. Rolls-Royce at the time was a product organization, and there's nothing wrong with being a product organization. But as a product organization, they couldn't have certain conversations — and they wanted to have conversations that would allow them to even better serve their customers. So the new model was given a name: selling hours of powered thrust. Remember, they used to sell engines; now they're selling hours of powered thrust. It even had a catchy name: Power by the Hour. And the reason for doing this was, in one respect, to reduce payment for downtime. If the engine is not pushing passengers around the world and making money for the airline, it's not making money for Rolls-Royce. Their interests are aligned, and that's what enabled them to have these conversations. Now I'm going to pop on a little bit of audio and we'll see how it works — I think it'll do just fine. First I'm going to show you an Indianapolis 500 pit stop: "Holland comes in for a pit stop. Time to refuel and change tires. Newmore himself changes the tires. Only four crew members, including the driver, are allowed to work on the car. It's the final stop; Holland is anxious to get away. Let's watch. The tires are changed; a crewman polishes the windshield as Holland blows away just 67 seconds after he stops." Now, the key here is that Rolls-Royce was never considered as having a role in this process. But as you can see, the process of changing a tire evolved greatly from that Indianapolis 500 clip to the second race being shown. The metrics are very straightforward: two tires in 67 seconds in the first example, and four tires in four seconds in the second. That's a really significant difference, and it's the kind of sustained competitive advantage that can only be achieved by increasing the focus on data quality. What I think is most amazing about this story is that if we were in person, I would ask you: when do you think this model was invented? The answer is 1962. That is an astounding achievement they've been able to pull off. And monetizing it: one of the things they looked at was not just what's happening on the sensors. With this data you can do better-tuned and safer maintenance, you can quantify mission readiness; there may be storage aspects, there may be handling aspects. If you can reduce this type of thing — going from one sensor, where they got one aspect of it, to having 100 sensors — their total here is at one and a half billion dollars, and that's just to them, bottom-line savings in all of this. So there are lots and lots of things that can happen. We're just about out of time here, but we've got some time for some non-monetary stories. The first one's kind of a cute little one, but it's worth repeating anyway. When I first went to Nokia in the early aughts, one of the things they had was this little box in everybody's room. We were looking at this and trying to figure out, goodness, what is this? And when we asked, they said, oh — and they handed us a 50-page manual on how to use their refuse collector.
And that's kind of cute, but it was interesting, because within just a few short minutes I was able to be in front of their senior leadership and say: you have, in fact, more documentation on your garbage than you do on your data. And believe it or not, that story really stuck. It's not to denigrate anybody, but when you can point out that they have more documentation on their garbage than on their data, that's a pretty good lesson. People take it to heart and say, we don't want that to be true again. Speaking of again — kind of a depressing story, but an inspirational one here for you. We were working at the Pentagon when President Obama focused attention on the rise in military suicides — the fact that so many of our warfighters, in fact more of our warfighters, were dying by their own hand than at the hands of the bad guys. That's an incredibly bad situation that the president absolutely wanted to fix. So we were essentially given an unlimited charter to go out and fix this thing. What we ended up doing was a lot of mapping, as you might imagine, trying to understand what was happening, and after going through channels we ended up with a 30-by-30 matrix. Imagine this room full of colonels we're working with, and Colonel X stands up and says, sir, we're speaking about row nine here, and in row nine we want to check column 10 as yes and column 18 as maybe. Now, I hope you'll agree that working off a 30-by-30 matrix is a very difficult task, and probably not one that's going to produce good results in the long run anyway. That said, I had one chip I could play with the Secretary of Defense, and I was able to bring the Secretary of the Army into one of these meetings. Again, we've got this room full of stewards who are really trying to do the right thing. The Secretary of the Army simply looked around and said, I understand why you brought me in here, Peter. Let me just make something very, very clear. This is my soldiers' data, and anybody that doesn't want to use my soldiers' data to save my soldiers' lives — my office door is open. They can make their point with me, and we will start to work through whatever the issues are. Just a wonderful story. And he also said, on the way out of the room: I'm probably not authorized to do this — but it was very much the right thing to do. I've told this story to more than 100 corporate executives, and not a single one of them has yet taken on this challenge, and I'm just appalled, because from a quality perspective, data ownership is one of the more corrosive pieces in here. I've got one more story to tell you before we get to the top of the hour, and this is a darker story, but certainly a quantifiable one. Most of you remember the Target data breach; some smaller number of you may even know what Ashley Madison is — it's a Canadian dating site for married people. So that's an interesting component. And then we add one more piece: the Office of Personnel Management had a data breach as well. Put these together and we're going to see something really interesting. First of all, lots and lots of federal employees used their real federal employee names to get onto Ashley Madison.
Not a great idea — including 44 people with WhiteHouse.gov email addresses, and thousands of military and government addresses. Perhaps even more interesting: what do we really know about Canadians? There were tons of Canadians on there — it is a Canadian company — and somebody did an analysis and found that one fifth of the entire city of Quebec was in this database. Wow. Fascinating. Now, what the bad guys really do with this is look at the tiny little piles of data that represent the intersection: somebody at the Office of Personnel Management, with perhaps a high-level security clearance, who might at the same time be carrying on an affair, discreetly or not — and then go into Target's data and get their buying habits. What does Target have to do with this? Target's database keeps all sorts of things. They used to talk about this publicly, before they got smart and their data scientists stopped talking in public, but there are a couple of good New York Times pieces out there; look them up. With this information, it's quite easy to set up a blackmail situation — the kind of thing that drew congressional attention from the likes of Jason Chaffetz and Mark Meadows. In 2014, the government labeled this among the worst national security threats we've had, coming in the form of bad data. So these things can go all the way up to the emergency level, but at the same time, it's very possible to find things like this specifically in your own organizations. So, we're getting close to the question and answer session. Let's do a quick review and wrap-up. First, again, it's important to adopt a broad definition of data quality, because if you keep it narrow, you will not get to the root cause of the problem. Understand that data quality is only relevant in the broader context of your organizational data use. Focus only on the non-ROT data — cleaning the ROT is a waste of money. Approach data quality with an engineering eye; if you don't, you're not going to be able to achieve the kind of leverage you need, and it's only going to get funded when you can put a price on it. I have literally been on a dozen panels in the last five years where people want to say, I can't put a price on this, and I think the answer is: yes, we can. We've just got to start and get better at it. I've given you a series of savings-based stories, a series of innovation-based stories, and then some non-monetary stories, which are relevant as well. The idea is not necessarily that any of these will work for you, but hopefully you'll be inspired and look for these types of opportunities in your own area. And I said we'd leave space for the Q&A, so let's look at just a couple of quick takeaways. Data quality always requires a context-specific definition, which means it's very unlikely it can be solved in a one-size-fits-all manner. Most businesses have data quality challenges with hidden data factories at the root — you just have to keep looking far enough, and if you fix the data at the root, a lot of other things will get better. All advanced data practices depend on data quality: imagine trying to do master data management without good quality data — it's a useless exercise.
Artificial intelligence and machine learning, likewise, are suffering from the lack of quality training data out there. There are few easy fixes, and the key is to make sure people understand stories, because that's such a human component of what we do. You've got to have some tangible ongoing savings, you've got to develop some innovative data uses, and the non-monetary outcome is often more important than the money. So look for it, because it makes a good deal of difference. As always, there are a couple of references here for you to pass along, some special event pricing, and of course a quick slide for upcoming events, and then we roll it right over to Shannon and say: Shannon, it's time for Q&A.

Peter, thank you so much for this great presentation, as always. Just a reminder: if you have questions for Peter, feel free to submit them in the Q&A portion in the bottom right-hand corner of your screen. And to answer the most commonly asked question, I will send a follow-up email to all registrants by end of day Thursday with links to the slides and the recording, as well as anything else requested throughout. So, Peter, diving in: why is it such a challenge to help people understand the relationship between poor data and poor results?

I think it's that we haven't practiced it enough. I like to say we're relatively new as a profession; if we trace our lineage back to Lady Ada, then perhaps we're 250 years old or so, something along those lines. That's very different from accounting, which has in fact developed generally accepted accounting practices. That said, Dave McComb, John Ladley, and several others are working collaboratively in this area, where we can get to some objective definitions, but it is a very, very new profession. That's nevertheless not an excuse not to do things; it's just that people haven't been practicing root cause analysis and telling stories around it. I'll give you one illustration I found particularly revealing over the last couple of weeks. It was an online conference, I think it was data.world, and there was a wonderful CDO story where the individual had gone to the boss and said, hey, guess what, boss, I've got maybe a six percent correlation. And the boss turned out to be really upset and said, wait a minute, around here we only do things at 100 percent. So: get back to work. Just a complete miscommunication, a lack of context, and it certainly produced an awkward moment. There's a lot we need to do around data literacy to improve everybody, not just the data world, particularly since so many people are getting preyed upon. But that's a topic for a different webinar; maybe next year we'll do that one.

Indeed, always new ideas coming up. So, when you mentioned the requirement for architecture, is it the same architecture applied to data governance or data strategy?

Generally, yes, if you're working off a shared data architecture. My business definition of data architecture is common vocabulary. So many data governance efforts get off track because people get confused about what they're describing. I like to say, and that's a different webinar obviously, that the language of data governance really should be metadata, and the need for quality should be obvious if we're focusing on that. It's just a key component of it. A small sketch of what that common vocabulary can look like follows below.
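As an editorial aside, here is one minimal way to picture "common vocabulary as metadata": a business glossary entry captured as data rather than opinion. The terms, fields, and rules are all hypothetical, not from Peter's material.

```python
# Hypothetical sketch: common vocabulary captured as metadata, the
# minimal core of a business glossary. Terms and fields are invented
# for illustration only.

glossary = {
    "customer": {
        "definition": "A party that has purchased at least one product.",
        "steward": "Sales Operations",
        "quality_rule": "customer_id is unique and never reused",
    },
    "active_customer": {
        "definition": "A customer with a purchase in the last 12 months.",
        "steward": "Marketing",
        "quality_rule": "derived nightly, never entered by hand",
    },
}

def lookup(term: str) -> dict:
    """Governance conversations can start from the shared definition."""
    return glossary.get(term.lower(), {"definition": "UNDEFINED: raise with the stewards"})

print(lookup("Active_Customer")["definition"])
```

The design point is simply that when two teams disagree, they argue with the glossary entry and its steward, not with each other's recollections.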
And if you have any additional questions, again, feel free to submit them in the Q&A section in the bottom right-hand corner. So, in the context of data literacy and democratization, do you think data quality literacy is easier to start with than statistical and AI literacy?

Yes, and for one primary reason, which is the numbers. Imagine we make the data scientists even smarter; I'm absolutely happy to do that, and there will be an impact on society. But on the other hand, we're pretty sure there are a billion knowledge workers out there right now; imagine the effect of a billion knowledge workers becoming data literate. I think the effect is going to be quite a bit greater in the second scenario. So, while we can argue about how many angels dance on the head of a pin, really doing something good for society involves much more leverage, and taking a billion people and making them data literate would be a phenomenal gift to the world. Yes, we need a lot more energy in that area, and that's what I'd like to see.

Everyone's being shy today, Peter. We don't have any more questions coming in.

Maybe we should do something controversial. Should we sing and dance? Or we could talk politics, right, Shannon?

No. You guys have to understand, Shannon's in Portland, Oregon, and I'm in Richmond, Virginia. Well, it's been hot in Richmond. It's been quite hot in Portland. So stay cool.

I love this question; we get it a lot, Peter, and it's always a challenge: how do you approach board members on the topic of data quality?

It's really important that you do, just for starters. And I know whoever wrote the question intended it in the right way, but you're not going to be able to convince most executives who are operational in nature, because they can't focus their attention for long enough. We can get into the causes of all this; it's real clear that brains have been rewired. Go read some Nicholas Carr, the book is called The Shallows, a wonderful book that describes this. It's a very real condition. People need to sit down and pay attention to this: you can't just dabble in data. You've really got to make an investment in it, and that investment is going to require somebody beyond just the CEO committing to it, really making the CEO accountable to the board for this. I don't know if this is a former student asking the question, in which case I'm totally proud, or somebody else picking it up out of one of the case studies we do, but it's a great question, and yes, you need to have the board in there. The only way they're going to engage is if you talk to them in terms of how they understand the world. If they think of everything in terms of cows because they're dairy people, then you need to learn a little bit about cows and dairy farming. That's absolutely critical, because board members are there for their brains and thinking process; they're not necessarily there for their expertise, although there are certain aspects of that as well. If you want to get into this in a somewhat interesting way, there's an interesting book called Adventures of an IT Leader. We use it to teach executive MBA classes, and it's about an individual who becomes a CIO accidentally.
It could just as well be called The Accidental CIO and be just as effective. The character goes through a series of adventures, including having to sell something like this to the board. It's quite a good example if you haven't had a chance to see it, and it's great to see good topics like this coming out of the academic world. Anyway, thanks for the question. I agree with that, obviously.

Just to expand on that even further, Peter, what if you have limited time to talk people into it, you know, 15 minutes? What's your elevator pitch?

It's definitely important, for not just you but your team, to have 30-second, three-minute, 30-minute, and three-hour agreed-upon versions of the pitch, again spoken with a common vocabulary. If somebody corners you, what I would say is, let's go back to this slide here and just remember that as people talk about this, it's got to be in their language. Oh dear, where did it go? Searching for my slides; I'm so bad at this. Dead air, Shannon, right? Terrible thing. It's the Dilbert one, you guys know what I'm talking about. Ah, there it is. Sorry, guys, you're probably all bored to tears by now. But, you know, it's about the technical and business pieces. If you're in the elevator and you say, I've got to reorganize the database, they're going to say, let me off, I'll see you later. Whereas if you say, hey, would you like your sales force to be better at what they do? Yeah? OK, well, then I need to do this. Give them that cause and effect, because if they don't have it, they're going to tune out. And yes, the title of the book was Adventures of an IT Leader; I didn't see that particular question come through. Anyway, thank you, it was a good question. It's absolutely critical that we communicate properly.

Indeed. So, in your lube additive example, what was the key success factor?

In the lube additive example, is that what you said? Correct. OK, the success factor, and it's a lesson that will unfortunately be learned over and over again by a lot of organizations, so let's just put this chart up on the board. This workflow produced a certain number of new products and services, or improvements to existing products and services, every year, and it cost them at least $10 million a year to run. So we know it was producing something worth more than $10 million. And they're not going to get any ideas about data into the chemical engineering world for at least three decades; we've got to start back in the high schools and push this all the way down. It incenses me, and I'm sorry, I'm going off on a rant here, but it incenses me to see people wanting coding to be the only thing we teach. If you learn coding at age six, you're going to be a coder the rest of your life, and we've got demand in other areas that we need to fill. So, having an idea of what this workflow does, you can then look at it and say, where are the issues? The key was not that they were doing anything wrong, but that somebody with a good data management background could come in and say, let's just reorganize a couple of things here, add some workflow, improve these pieces, and automate this. The sketch below gives a feel for what that kind of automation can look like.
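As an editorial illustration of that last point, here is a minimal sketch of turning a repetitive manual cleanup chore, the kind a hidden data factory does by hand every week, into a repeatable script. The file layout, column names, and rules are hypothetical, not from the engagement Peter describes.

```python
# Hypothetical sketch: automate a weekly spreadsheet chore instead of
# redoing it by hand. The layout and rules are invented for illustration.

import csv

def clean_samples(in_path: str, out_path: str) -> int:
    """Standardize IDs and drop blanks and duplicates once, at the root,
    instead of someone re-fixing them by hand every week."""
    seen = set()
    kept = 0
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=["sample_id", "result"])
        writer.writeheader()
        for row in reader:
            sample_id = row["sample_id"].strip().upper()   # normalize once
            if not sample_id or sample_id in seen:
                continue                                   # skip blanks and duplicates
            seen.add(sample_id)
            writer.writerow({"sample_id": sample_id, "result": row["result"]})
            kept += 1
    return kept

# Run against each weekly extract instead of cleaning by hand:
# kept = clean_samples("weekly_extract.csv", "cleaned.csv")
```

Nothing about the chemistry changes; the PhDs keep doing what they are good at, and the repetitive data handling stops consuming their time.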
Again, go back to your workforce and try this in a public meeting sometime. Ask everybody who has learned Excel to stand up, and a large part of the room will stand up. Then say, please remain standing if you also learned that Excel has a feature that will make a task run the same way every time, called a macro. You'll be surprised at how few remain standing. It's an absolute dearth of learning that we've got to overcome; many people are calling it the data debt we've got to get up and over. Sorry, Shannon, I'm babbling here, but I think I got to the key: really just looking at what they were doing and how they could do it better from a data perspective, because their specialty is chemicals. They've got PhDs in chemical engineering; of course that's what they're good at. Why would a PhD in chemical engineering need to understand anything at all about technology? So again, let's make the knowledge workers smarter and more literate. That'll be the real key.

I like it. And rambling is good, Peter; we've got plenty of time, so no worries. So, how do you suggest getting an organization that is not data savvy to become aware of the cost of data quality?

Again, not to pick on these guys, but this is a story that needed repeating, and there were two parts of it that were interesting. One, they were apparently going to spend $6 million by doubling the number of people in the room without any real idea of the return on that investment. It was a clear case of local optimization, where this part of the company was doing better at the expense of the rest of the organization. If you took that same $6 million and invested it in data quality, the problem would go away and be cured, and you wouldn't need a hundred people in the room at all. So it's very, very key to communicate with people in ways they understand. Everybody understands $6 million: that's a lot of money, right? And once you put those people in there, they'll be there going forward and going forward and going forward. We've just increased the cost on the bottom line by about $6 million a year without any discernible increase in the top line.

Let me take the next question here. When we look at data science today, we're seeing three primary complaints. One, data scientists often don't show a desire to learn the business; we're trying to work on that with the curricula, but it's a slow process, as you've already heard me grumble about. Two, they're not productive enough, and that goes back to the slide with the chemicals: adding data engineering capabilities to many operations can take things from rough and crumbly to a really well-engineered machine. And the third piece, again, is that lack of interest on the data science side in learning the business, and that's unfortunate. We can get there with cross-training, if we can get people who are interested from both perspectives. In my class this semester I have information systems people, but I've also got accounting forensics people and some Homeland Security people in there. It's a really interesting mix, because they're trying to get an idea of how to do this, but these concepts have never been presented to them before; even root cause analysis is a foreign concept in many cases.
So let's do what we can to get this out there and make sure people are aware of it. This became a story in the organization, and years later they still tell it: remember when we used to put people in a room and try to fix things that way? It became part of the DNA of that particular organization, and they were able to take it, build on it, and say, let's not do that again. I've got another company I work with where we talk about the chocolate story. Oh no, don't go there again, Peter, we don't want to hear it. But these become part of the organization's stories, and I think it's important that everybody who's involved in data be able to tell these kinds of stories.

Indeed. We've talked about how to get executives on board, but should data quality be a chief data officer mandate or a more general responsibility?

Well, we've got to take a realistic look at what's been working and how things have been attempted so far. For example, most people will say data quality is everybody's responsibility, and my next question is, how's that working out for you? In many cases the answer is that it's not working out the way people want. Given that, I think it's really important that people put a concerted effort into this, and that requires some leadership around it. I'm not a big fan of the title chief data officer, even though I obviously work with lots of them, but the real key is to have leadership that understands the data, understands how to leverage it, and gets the whole picture of the elephant, as opposed to just the ears or just the legs or just the trunk or just the tusks. Each of those gives a different perspective, and without the ability to do that systems thinking, the effort is really going to be hurt in the long run. I think that is it. All right, guys, well, that was totally fun, and thank you for paying attention.

Thank you, Peter, for the great presentation; we really appreciate it, as always. And again, just a reminder to everybody: I will send a follow-up email by end of day Thursday with links to the slides and the recording. Thanks, everyone, for being so engaged and for all the great questions. I hope everyone has a great day, and stay safe out there. Thanks, Peter. Bye-bye.