Hello and welcome. My name is Shannon Kempe and I'm the Chief Digital Officer for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Getting Data Quality Right, sponsored today by Collibra. It is the latest installment in a monthly series called DataEd Online with Dr. Peter Aiken. Just a couple of points to get us started: due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A section, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DataEd. And if you'd like to chat with us or with each other, we certainly encourage you to do so. To open either the Q&A or the chat panel, you will find the icons for those features in the bottom middle of your screen. Just note that the Zoom chat defaults to sending only to the panelists, but you may absolutely change that to network with everyone. To answer the most commonly asked questions: as always, we will send a follow-up email to all registrants within two business days containing links to the slides, and yes, we are recording and will likewise include a link to the recording of this session, as well as any additional information requested throughout. Now let me turn it over to Henry for a brief foreword from our sponsor, Collibra. Henry, hello and welcome.

Hello Shannon, thanks for the introduction, and hello everyone. Good afternoon, good evening, and good night from wherever you folks are joining. My name is Henry Tram and I'm a data quality specialist here at Collibra. The reason we're sponsoring today is that we're extremely passionate about data quality, data management, and data governance in this space. And we're here to introduce, before Peter, some of the things we've been seeing in the space as it pertains to your organization, whether you're a small or midsize business all the way up to the enterprise.
Data is exploding, as you can imagine. IDC predicts there will be 175 zettabytes of data by 2025. And as you can imagine, managing all that data is extremely difficult, whether it's entering data from your front-end offices into your CRM or HR systems; your back offices working with financial and accounting data, trying to reconcile it and make sense of it all; all the way up to management and leadership, thinking through how to report off of your organization's data, how to make data-driven decisions on a day-to-day basis, and how to get quality, trusted data in real time. Every employee in your organization, whether an everyday employee or otherwise, has questions pertaining to data quality issues. How do we run our own data quality checks, and where do we get started? What are the unknown unknowns? We've been reporting off of our financials and transactions for quarters and years now, but what are the potential pitfalls and anomalies we haven't considered? All the way to the BI analysts and data scientists deciding whether data quality is getting better or worse over time: where are the inconsistencies, and where do we start with our data cleanup project? And management and leadership: how do we make data-driven decisions if we can't trust the dashboards, if we can't trust the BI and reporting, if we can't trust some of our financial figures? As you can imagine, all these questions drive a lot of operational headaches and costs, whether it be 50 to 70% of time spent on manual operations, 15 to 25% loss of revenue due to bad data, or $1.9 billion in data quality spend, perhaps on manual work, perhaps on hiring new FTEs to create manual processes to manage all the data quality.
This is exactly where Collibra Data Quality can help: leveraging technology so that business and IT can collaborate in a self-service fashion, with ML and AI built into the platform to give you the easy button of auto-generated rules, adaptive rules, and rule discovery to automate the rule-writing process. And because it is built on a modern Spark architecture, you can scale with ease and really take the framework to be proactive about data quality rather than reactive, partnering people, processes, and technology to build a scalable framework so your organization doesn't have to worry about whatever data quality issues come into play. The last slide before we introduce Peter shows an overall comprehensive, end-to-end data management and data quality platform. When you think through the needs, functions, and capabilities for your organization, from the back-end IT analysts all the way up to the business users who need simple point-and-click functions to run data quality checks within minutes, you're going to need a comprehensive enterprise solution. You can start small and scale fast: data quality within a couple of weeks, reducing the time to data quality; out-of-the-box workflows to navigate and sort through data quality issues as they progress through your organization; and adaptation to your specific data quality rules, with out-of-the-box ML- and AI-driven rules, but also the ability to create custom rules and automate the rule application process, building rule libraries and semantic rules to apply to new data sets as you onboard them.
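The rule-library idea Henry describes can be sketched in a few lines of plain Python. To be clear, this is an illustrative sketch, not Collibra's actual API; the rule names, the sample record, and the `run_rules` helper are all assumptions made for the example.

```python
# A toy "rule library": named, reusable data quality checks that can be
# applied to each new data set as it is onboarded. Real platforms would
# discover and generate rules automatically; here we hand-write two.
RULE_LIBRARY = {
    "email_present": lambda row: bool(row.get("email")),
    "amount_positive": lambda row: row.get("amount", 0) > 0,
}

def run_rules(row, rules=RULE_LIBRARY):
    """Return the names of every rule the given record fails."""
    return [name for name, check in rules.items() if not check(row)]

# A record with a missing email fails exactly one rule:
record = {"email": "", "amount": 120.0}
print(run_rules(record))  # ['email_present']
```

The payoff of the library pattern is that the same named checks can be pointed at every newly onboarded data set instead of being rewritten per team.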
And as your organization and your data goals change, you want automated ways to detect changes as well: schema drift detection and source-to-target validations for data reconciliation; always running profiles over time and having ML study those profiles so it can surface the anomalies you need to focus on; and integration with all your sources, whether it be the Snowflake environment, the AWS, Azure, or GCP world, or your on-premises sources. So really: taking the framework, marrying it with technology, and building out scalable processes in your organization in a very simple and easy fashion. I'm sure Peter is going to elaborate on all of the things we've discussed. Just keep us in mind; we're happy to have a conversation with your organization, talk through how we can help with your data quality projects, and partner up with you. Shannon, Peter, that's all I had, so over to you.

Henry, thank you so much, and thanks to Collibra for sponsoring today's webinar and helping make these webinars happen. If you have any questions for Henry, or about Collibra, he will be joining us for the Q&A portion at the end of the webinar today. Now let me introduce our speaker for the webinar series, Dr. Peter Aiken. Peter is an acknowledged data management authority, an associate professor at Virginia Commonwealth University, President of DAMA International, and associate director of the MIT International Society of Chief Data Officers. For more than 40 years, Peter has learned from working with hundreds of data management practices in more than 30 countries, including some of the world's most important organizations. His 12 books include the first making the case for data leadership and CDOs, the first focusing on data monetization and on modern strategic data thinking, and the first to objectively specify what it means to be data literate.
His recognition has resulted from these and a pre-COVID intensive worldwide event schedule. Peter also hosts the longest-running data management webinar series, from DATAVERSITY; it's 10 years now, I can't believe it, starting before big data was big and before data science. Peter has founded several organizations that have helped more than 200 organizations leverage data; specific savings have been measured at more than 1.5 billion US dollars. His latest venture is Anything Awesome. And with that, let me turn it over to Peter to get his presentation started. Hello and welcome.

Thank you, Shannon, and thank you, Henry, for a nice start on all of that. Yes, you're absolutely right: these themes are absolutely rock solid, and we're going to step right into them. We're going to talk a little bit about Popeye today, among several other things; I figure he's sort of our main character. So let's dive in and look at what we're going to talk about. The first third of this will really look at what you have to think about as you start to approach data quality. The second is what we need to get better at, and this is not really well understood; again, buying technology like Collibra is an excellent way to address this, and in fact it's difficult to imagine doing it without some sort of technology support, and it's been wonderful to watch how much the product has matured. The last part is how we get better at what we're doing. So it's really a matter of getting started and then getting better in the process. Let's dive in and look at how data quality tends to manifest itself when an organization approaches it for the first time. It's kind of fun living as long as I have, for a number of reasons, because even back in the 90s we were asking these questions. I wrote these words in a book called Building Corporate Portals Using
XML, back when we were focused on technology, and I was very fortunate to write it with Clive Finkelstein, who we acknowledge as a leading visionary in this particular field. The part I wrote was: fixing data quality problems is not easy. It's a challenge in the sense that it never looks as simple from the inside as it does from the outside, and it can become dangerous. I've seen people who said, "Gosh, Peter, you know I'm passionate about data, and I thank you for getting me into it, but as soon as I started exposing some data quality problems, they came after me." Yes, that is likely the case, and your efforts are unfortunately likely to be misunderstood, not through any specific fault, but mainly because we're not practiced enough with the vocabularies we should be using in this context to speak with people; you'll see some examples of that as we get into the presentation. And you could make things worse. Now, we've observed in our studies of members at DAMA International that people tend to start out in the data world by starting out in IT, working for the business. After a while, you start to realize that, gosh, if somebody could fix the quality of this data that we're working with, things would be easier. That's the promise, by the way; easier is still better than harder, but nevertheless, we can't of course fix everything. Eventually, somebody in your organization turns around and says to you (I'll just pick on Henry as an example): "Henry, you've said 'data' three times in the last couple of weeks. You must be the data person. So guess what, Henry: now you get to fix the problems."
And of course that's a really daunting thing, and people then look around, start Googling, find DATAVERSITY webinars and other places you can go for this, and perhaps even stumble across the blogs and start to get a handle on what's going on in this area, because a single data quality issue can grow into a really significant, unexpected investment. Now, the place that most organizations are going these days is the latest buzzword: digital. Everybody wants to go digital. It is kind of interesting, and I want to quote my friend and colleague Mark Johnson on this; he actually invented a little equation. He said: if I subtract the data from digital, I'm not sure what I have left, but if I do it the other way around and subtract the digital from data, I still have data left over. And of course, that is critical: it is impossible to go digital without a good foundation of high-quality data. And yet so many of these initiatives struggle; remember that in our profession, we consider one in three projects succeeding (on time, with full functionality, within the schedule promised, with low risk to the organization) as successful. If my dentist were that successful, I would certainly find another dentist; we have not matured as much as a profession as we ought to. The reason is kind of simple, and I'll just pop in this quick one-minute bit on what's happening here. As you can imagine, bad data plus anything awesome is still going to give you poor results, or, as some of you learned: garbage in, garbage out. What that means is that if I have garbage data and a perfect model, I'm still going to get garbage results. And that's going to be true whether my perfect model is a data warehouse, machine learning, business intelligence, blockchain, or AI; it doesn't matter what's in that blue box.
It's always true that garbage in is going to give us garbage out. The challenge, though, is that most people don't really understand that fundamental piece. Until we start to look at our data flows in the organization, which are often duplicated, and which can save very significant amounts of dollars very quickly just by being normalized, we can't even evaluate the results that come out, because until we get decent results out, we're not going to have anything else. So once again: bad data plus anything awesome is still going to give you bad results. Let's dive into a couple of definitions here. The definition we collectively use in the industry is that quality data is data that is fit for purpose, and thanks to Eppler for coming up with a really good definition. And of course it has to be synonymous with information quality. Many organizations will try to distinguish between the two, but if you understand properly that data is a combination of a fact paired with a meaning, and information is data provided in response to a request, you'll understand that you cannot do information quality without data quality as well. So let's stop arguing about it and understand the relationships that are there. Our definition of data quality management is then: the planning, implementation, and control activities that apply quality management techniques to data in order to measure, assess, improve, and ensure data quality. That means it encompasses a variety of different life cycles (we'll look at two of them in particular today), and it has to include supporting processes: if you're trying to change the culture in your organization, then of course you need change management and organizational leadership around this.
The best way to think of this is as a continuous improvement process, really requiring your organization to develop some core expertise that it may or may not have at this point in time. So we get to our Popeye story, and I really am quite enamored with this particular one. There was a German chemist in 1870 who was doing some investigations, and I can assure you he was very diligent about the process, but he unintentionally missed a decimal point when he was recording the amount of minerals inside the green leafy vegetable spinach. The data quality error was that he ascribed 35 milligrams of iron content to spinach through this misplaced decimal point, when there were really only 3.5 milligrams. If he had had a solution like Collibra in place when he passed those results along, some of the rules might have said (probably not in exactly these words): do you realize you just claimed that everybody who eats a serving of spinach is eating the equivalent of part of a paper clip? I'm pretty sure most of you don't want to eat part of a paper clip, and you would realize that was a lot. And of course, if there were enough iron in a serving of spinach to amount to part of a paper clip, spinach would be a very different vegetable. So the error got transcribed: again, 100 grams of spinach with 35 milligrams of iron in it. Unfortunately, this started a legend. They didn't catch the data error until it actually got back to our Bureau of Statistics in the United States, where they went back and said: ah, you know what, spinach is a good vegetable, but it's no better than kale or any of the other wonderful leafy green vegetables we have. But you can see, of course, the myth around it: I suppose that if I just eat the spinach, it's going to make me into a super person.
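The kind of plausibility rule that would have caught the spinach error can be sketched very simply: compare each value to the values of comparable records and flag anything wildly out of range. This is an illustrative sketch, not a Collibra rule; the `flag_outliers` helper, the 5x threshold, and the comparison figures are assumptions made for the example.

```python
def flag_outliers(values, max_ratio=5.0):
    """Flag values more than `max_ratio` times the median of their peers."""
    ordered = sorted(values)
    median = ordered[len(ordered) // 2]
    return [v for v in values if v > max_ratio * median]

# Iron content (mg per 100 g) for a few leafy greens; 35.0 is the
# misplaced-decimal spinach figure, the others are illustrative peers.
iron_mg = [2.7, 3.5, 1.5, 2.1, 35.0]
print(flag_outliers(iron_mg))  # [35.0] -- the decimal-point error stands out
```

A rule like this never has to know the "right" answer; it only has to notice that one record disagrees with every comparable record by an order of magnitude.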
I haven't seen a Popeye cartoon in years; I guess I should probably go back and brush up. It is a wonderful story and a really good one, because it illustrates how a single data quality error from long ago can carry on and have unintended consequences down the road, in this case spawning the character Popeye and all the myths that go along with him. The next definition, on the left of this slide, is data quality engineering: the idea that a good way to approach data quality is from an engineering perspective. That means these problems are not simply managed; you need to develop an approach to them that is based in an engineering discipline. These concepts, however, are not generally known and understood within IT or the business. I can tell you, as a professor who has been teaching data topics for more than 35 years, I'm pretty sure I don't know of any class in data quality at any US university. There are some elsewhere; in particular, in Germany they've done some very nice work, and I'm sure there are others. One of the things we'd like to do is actually catalog all those programs, so if you're teaching at a university somewhere worldwide and you have knowledge of such programs, we'd like to start collecting them so we can make these resources known to everybody else. Just ping me; it's very easy to get in touch with me about all of these activities. So now we've got a definition of data quality and of data quality management, and again, the approach is an engineering-based one as we go forward. The reason we need data quality comes from a number of different perspectives. If we're looking at new data analysis, everybody understands that when you do this kind of analysis, you get the data in some form, and then you clean it and arrange it in some form; this actual slide could have come from my doctoral dissertation back in 1989.
The patterns and the process have not changed in getting the data into a usable grouping, and that makes perfectly good sense and is absolutely something we want to do. But what most people aren't really aware of is the ratio of time spent doing this. We also tend to assume that since we have hired data scientists, or something like them (even data engineers), they will have a pretty good idea of the most effective ways and the most effective tools to apply to data quality, and I'm sorry to inform you that that is simply not true; we do very little around that in the universities. Here is my favorite example of this. Data quality tends to happen pretty well at the work group level; in fact, it is a defining characteristic of a work group, the ability to share data practices all the way around. And the gentleman who was just throwing the pink balls at the piano on the ground (his name is Wally Eastwood, according to the internet) is somewhat representative of your organization trying to deal with data quality. He was perhaps told to go out and learn to play the piano, and he did exactly what he was supposed to do: he learned how to play the piano. It's probably not the type of performance you would want, though, even if you had a bunch of data people sitting around, with perhaps some alcohol, at one of the many famous conferences that Shannon and I have participated in over the years. Consider the amount of time and effort lost in your organization with everybody trying to fix data quality problems individually. It's working about as well as what we've just seen, and if you're satisfied with the way it's working, you probably don't want to be on this particular webinar.
Data over time becomes sand in the machinery: it prevents smooth interoperation and exchange, and we have lots of losses in time, money, and opportunities due to lots of little data cuts that have been, up to this point, difficult to account for. One of the things we'll talk about is that while it may not be possible to get a total cost of data quality challenges, it is possible to ascribe some cost to them, and in most cases those costs are far more than the cost of fixing the data. So organizations and individuals lack data quality understanding, and the first place you can look in your organization is when somebody pops up and says "data is the new oil." I really don't think that's a good way to think about it; I think you should think of data instead as perhaps the new soil. Organizations likewise lack the knowledge, the skills, and the data engineering know-how, and finally, we don't have all the resources in the world, so we need some sort of strategy to say what we should do first, second, and third. I mentioned a little secret about data work: everybody understands we're going to have to do some sort of data analysis and some sort of data preparation, and you might ask yourself, would it be fair to say half and half? Half my time spent doing data cleansing, cleanup, and prep, and half doing data analysis? I'm pretty sure some of you would look at those numbers and say, gosh, with the amount of money I'm spending on data science and data engineering in my organization, I sure hope they do a lot more analysis than that. Well, it turns out that even an 80/20 split in favor of analysis would be an idealized picture; in reality it's 20/80.
So data scientists thoroughly understand the fact that the data problems are going to consume the vast majority of their time, roughly four times the amount of time they're able to spend doing data analysis. Let's start with a very simple bit of economics from a data quality perspective. If I'm sending this data to be analyzed, and I'm spending, let's just say, $100,000 (which is, by the way, a very cheap data scientist), and I'm having that data scientist spend four hours out of every five they work for us doing data munging, cleaning the data, that is not a good use of that individual. It clearly would make sense to add some knowledge and skills here, probably at a lower price point than the data scientist, because quite frankly data scientists have not been taught anything about data quality and have no experience fixing it, which means they'll do their level best, but it's just not the best that could be applied to that particular problem. Similarly, you'll also start to recognize as you get into this (and I hope a lot of you have seen this in your own areas) something out there called hidden data factories, a wonderful term coined for us by Dr. Tom Redman, the Data Doc. Hidden data factories are where you have department A sending off its work product to department B, and department B, rather than trying to push back on department A, just fixes the problems, because that's easier than trying to work across departmental boundaries. So let me look at that.
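Peter's back-of-the-envelope economics can be written out directly. The figures are the ones quoted in the talk ($100,000 salary, four of every five hours on munging), not measurements of any real team:

```python
# The 20/80 split applied to a $100,000 data scientist: four of every
# five working hours go to data munging rather than analysis.
salary = 100_000
munging_fraction = 4 / 5   # the 20/80 reality, not the hoped-for 80/20

munging_cost = salary * munging_fraction
analysis_cost = salary - munging_cost
print(f"Spent on cleaning: ${munging_cost:,.0f}")   # $80,000
print(f"Spent on analysis: ${analysis_cost:,.0f}")  # $20,000
```

Seen this way, the case for cheaper, dedicated data quality skills is just arithmetic: $80,000 of a data scientist's salary is being spent on work they were never trained for.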
And there's a hidden data factory, our first one. Then maybe B sends to C, and maybe C has the same process again; these things happen over and over. (And I didn't get to my one and a half billion dollars in total savings through any sloppy math; again, that came from very careful analysis.) And of course we hand the product to the customer, who also has a complaint, and now we're up to three hidden data factories in this particular stream. The reality is that there are a lot more hidden data factories out there in your organization, and they are causing you all sorts of challenges. Here's the thing: you don't necessarily recognize the challenges being presented, because by the time the data gets to the place where people decide whether it is fit for use or not, it has always been filtered through one or more IT systems and one or more business practices. Until we connect the dots and see that these problems have a common root cause, we will not be able to apply any systematic approach; again, that's the engineering-based approach. Almost all analysis reveals that there's a data component in every business challenge. That means that from an organizational perspective, we really need to reverse the flow of information and find the common root cause of all of these various data problems. But most importantly, it requires us to have a team with specialized skills, deployed to create a repeatable process and develop these organizational skill sets.
So let's take a look now at a couple of things that you may or may not recognize as data quality challenges. I'll just list them: a letter from a bank; a very small rounding error that cost a lot of money, very tangible; a health data example; something I call the chocolate story; and of course we'll get into COVID a little bit as well. Here's a letter I got from a bank. It's pretty old now, but it's still a very good example, and I'm not picking on SunTrust (they're now called Truist), but the problem was that the bank didn't know they had made a data quality error. The reason we know is that we, being a data firm, called them up when we got this and said: hey, did we really get a gift card? This is wonderful; where can we spend this money? And they said, oh, you can spend it on this, and we said, can we buy a car with it? It wasn't until after about 20 minutes that the representative from SunTrust finally said: I'm sorry, did we really send you a gift card for $0? And we said: zero? We thought you sent us a gift card for a million and it was just an overflow problem. But of course it wasn't; they had actually sent us a gift card for $0. The lesson here, of course, is that you should probably have a control that says: if you're sending things of value to a customer, make sure the value is not a negative number or zero. So again, they didn't recognize this as a data quality problem. Here's another interesting one. I've been a member of the IEEE, just as I've been a member of DAMA, for more than 30 years. In this case I was on their site just the other day, and it said to me: hey, great, you've been a member for three minutes and 44 seconds. Now, I just told you I've been a member of this organization for 30 years. I think that also counts as a data quality error, although they would tell you:
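The control Peter suggests for the $0 gift card is about the simplest data quality rule there is: verify that anything of value being sent to a customer has a strictly positive amount. This is a sketch; the function name and structure are assumptions made for illustration.

```python
def validate_gift_card(amount_dollars: float) -> bool:
    """Reject zero or negative gift-card values before they reach a customer."""
    return amount_dollars > 0

# The $0 card described in the talk fails; an ordinary card passes.
print(validate_gift_card(0))      # False -- should never have been mailed
print(validate_gift_card(-25.0))  # False -- negative values are just as wrong
print(validate_gift_card(50.0))   # True
```

The point is not the code's sophistication; it's that a one-line check placed before the mailing step would have caught the error the bank never knew it had made.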
Oh well, no, this just means you're a member of IEEE TV, which is a sub-membership category of membership in the IEEE, and you've been a member there for three minutes and 44 seconds. Again, confusion also counts as a data quality error in your customers' minds, because if they look at this, they will not understand (a) that it's a subcategory, and (b) how their membership is generally being counted; in which case you should simply show a better, more informative message. Here's an article from the Seattle Times, where they were building out a port (you may have heard of some supply chain difficulties recently), and the error was that they needed to run an electrical cable, and the trench was specified to be 22.5 inches; however, when the specification was transferred from one piece of paper to another, it was truncated, and the cable now had a specified trench of 2.5 inches, which meant it was not big enough. That small truncation cost them a significant sum, because they had to divert work and wait until it was corrected. Again, these are not what you think of as typical data quality errors, but I contend, of course, that they are. Now we get to the chocolate story. At one organization I worked with for a number of years, it became plain to them that trying to sell chocolate at the same time as you're trying to change your systems is probably not the best way to do things. We made that story a cultural touchstone, a meme if you will, within the organization. And I know I've succeeded when people come back and say: oh Peter, are you getting ready to tell us the chocolate story?
And that's exactly what you want to hear, because it means they know the chocolate story. Of course, what I do then is say: great, I've told it a bunch of times; you tell it to the people in the room who haven't heard this particular data quality story. Which was simply that they had spent so much money on their IT that when it came time to sell chocolate at the holidays, they delivered their chocolate to the wrong places, so customers were not able to get the pounds of chocolate they were looking for. Another quick story, out of the UK in this case (there's a great blog post on this one): in the UK, apparently, they had 17,000 pregnant men. What was going on? Well, somebody miskeyed the records. And how often do any of us look at our medical records and double-check the metadata on them? On the other hand, if you get a letter in the mail saying you need to come in for your latest OB-GYN exam and you're male, you're probably going to think maybe somebody confused something somewhere; and yes, of course, that's exactly the answer in this particular instance. Finally, one last one: those of you who use Microsoft Excel understand that there are of course challenges with it. There is absolutely no reason why a healthcare worker on the front lines in the middle of a pandemic should have to know which version of an Excel file they are saving. Nevertheless, in this well-documented case, they were using an older .xls file instead of a newer .xlsx file; the .xls format stops counting rows after about 65,000, and the additional data is dropped without notification. This caused them to underreport; when they went back in after the oopsie to see what had happened, they found they had underreported 50,000 cases, along with lots of other problems: eight days of incomplete data. And I'm certain this is not the only time this has happened.
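A simple guard against the silent truncation Peter describes is to treat a sheet that lands exactly at its format's row ceiling as suspect. The row limits below are the documented worksheet maxima for the two formats (65,536 for legacy .xls, 1,048,576 for .xlsx); the `looks_truncated` check itself is an illustrative assumption, not part of any real pipeline.

```python
XLS_MAX_ROWS = 65_536       # legacy .xls worksheet row limit
XLSX_MAX_ROWS = 1_048_576   # modern .xlsx worksheet row limit

def looks_truncated(row_count: int, fmt: str) -> bool:
    """Warn when a sheet sits exactly at its format's row ceiling:
    data beyond the limit would have been dropped without notice."""
    limit = XLS_MAX_ROWS if fmt == "xls" else XLSX_MAX_ROWS
    return row_count >= limit

print(looks_truncated(65_536, "xls"))   # True: almost certainly cut off
print(looks_truncated(12_000, "xls"))   # False: well under the ceiling
```

A file that stops at precisely a power-of-two boundary is a classic red flag; this check costs nothing and would have flagged those eight days of incomplete case counts the moment the files were loaded.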
So you've seen that data quality errors manifest themselves in all sorts of different ways; I'll show you one other data quality error a little later in the presentation. But let's talk specifically about data quality. Another thing that is not well understood is that there are two aspects to it: what we call practice-oriented activities and structure-oriented activities, and both of them have to be correct in order to get fit-for-purpose data. Practice-oriented might mean that we have, again, failed to tell frontline healthcare workers to keep data in an .xlsx file rather than an .xls file, which is hard enough to say, much less train health workers to do. This allows incorrect data to be collected when the requirements specify otherwise, or data to be presented out of sequence; these are all practice-oriented failings. On the structure-oriented side, it means the data is arranged imperfectly. I had an interesting example of this in class last night: a student wanted to take an exam and needed a little extra time, so I set it up so the student could have the extra time, but it turned out the attribute lived at the quiz level, not at the student-taking-the-quiz level. So the data was arranged imperfectly, and that student got cheated out of some time. Another example might be that your data is organized by street address when you really need GPS coordinates, or your data is captured but not accessible; yes, that does happen on a regular basis in those giant data warehouses we've seen over the years. Similarly, incorrect data can be provided in response to a correct request. One of my favorite examples: a group I was working with had a Likert scale where the data should have been between the values of one and five, and they got an average of seven.
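The Likert example reduces to a one-line bound check: on a 1-to-5 scale, any aggregate outside [1, 5] is impossible and signals a data error. The sample responses below are illustrative assumptions (e.g. values miskeyed outside the scale), not the group's actual data.

```python
def mean_in_range(values, low=1, high=5):
    """Return (mean, ok), where ok is False if the mean is impossible
    for a scale bounded by [low, high]."""
    mean = sum(values) / len(values)
    return mean, low <= mean <= high

# A miscoded batch: several responses keyed outside the 1-5 scale.
mean, ok = mean_in_range([9, 8, 5, 7, 6])
print(mean, ok)  # 7.0 False -- an average of 7 on a 1-5 scale is impossible
```

Note that the check never asks whether the answers are *right*; it only asks whether the aggregate is *possible*, which is exactly the sanity test the group skipped when they decided the computer must be smarter than they were.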
Well, if my range is one to five, I cannot get an average of seven, but the computer was giving it to them, so they said, well, it must be that the computer is right, because we're obviously not as smart as a computer. Obviously, I strongly disagree with that. One of the reasons we're in this particular situation is that data is not broadly or widely understood. It's like the blind men and the elephant: depending on which part of the elephant you touch, you will come away with a different perspective of the entire elephant, because you're only seeing a little bit of it. Data is very similar; some people think warehousing, some people think it's visualization, storytelling, etc., etc. These different ways of manifesting are big and challenging. Again, when we look at the images here, you can see that in certain cultures things read as loathe versus love, scary versus reassuring, pain versus pleasure, depending on what's going on; all of this is obviously related directly to the context in which you're looking at it. There's a lot of help in organizations if you have what we call a burning bridge, and here is a very specific example of one. This is Zion Williamson, who is getting ready to play a basketball game, and he's got a brand new Nike sneaker on, and what happens? The Nike sneaker busted open on the court less than a minute into Duke's Wednesday night game, and the star forward went down hard. And Nike the company saw its stock sliding on the news, closing the day down nearly a point and losing an estimated billion dollars in value. Well, this is of course the burning bridge: something's going on here and we need to go back and find out what the data root cause of it is. Now Nike originally didn't look at it this way; they thought it was a shoe quality, a manufacturing quality problem.
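Coming back to the Likert example for a moment: an impossible aggregate like that is exactly the kind of thing a simple automated check catches before anyone has to argue with the computer. A minimal sketch, with a function name and interface of our own invention:

```python
# Validate that survey responses actually fall in the declared scale
# before aggregating, so impossible statistics never reach a report.
def likert_mean(values, low=1, high=5):
    """Mean of Likert responses; rejects out-of-range values instead of
    quietly producing an average of 7 on a one-to-five scale."""
    out_of_range = [v for v in values if not (low <= v <= high)]
    if out_of_range:
        raise ValueError(f"{len(out_of_range)} response(s) outside [{low}, {high}]")
    return sum(values) / len(values)
```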
Nike tried to go back and find out what actually happened, and of course they immediately ran into data problems; they found out the data wasn't necessarily theirs, it belonged to some of their suppliers. They had to do all sorts of things; somebody needed to go in and fix the poor quality, and luckily with a burning bridge you've got somebody's attention. But it's also important that you don't settle for just their attention; take this time to educate them. Again, use the data that I've given you here so far just to illustrate to management why all of these things are burning problems for the organization. Typically what happens is management says, okay, you're right, I got it, go do something, which often turns into buy something. Well, buying technology is absolutely necessary, but it certainly can't be the only stage; a fool with a tool is still a fool, and the tool is not going to be as useful. So make sure that whatever it is they're trying to do, something actually gets accomplished, because most often all the project funding gets used up. The early cases that we look at in data quality have a dual purpose. You need to make sure that the case will fix the immediate challenge, put out the fire, figure out why that sneaker busted open and make sure it will not bust open again, but you also need to illustrate why you can't be done with this by Friday: data quality cannot be approached as a project. In fact, data quality complements our goal in all of this of leveraging our data. If I have a 100-kilogram weight on one end of a lever and a one-kilogram weight on the other side, you can see what it's obviously going to do, but when I add some tools and technology, I can start to make it work more correctly; adding a larger weight on the other side of the fulcrum will actually make it work a little better.
In fact, if I add more, I'll get even better leverage, but there are other ways of getting leverage. When we look at it from this perspective, here's our organizational data, and we have some technologies; again, some technology may be, and almost always is, necessary. Then we look at what the technology is. In the example I'm using here, I could use just the lever to try to move that organizational data on the other side, but of course you understand the technology is really the lever plus the fulcrum (the fulcrum is the purple thing the lever rests on), and that's how you properly leverage your data. We've got people; typically they are knowledge workers, sometimes supplemented by data quality professionals, sometimes not. And we need a process in order to do this; we can't just have people repeating Wally Easton's wonderful performance. So notice we've got our triumvirate of people, process, and technology, guided by some form of strategy that says these things are more important than those things, whatever they happen to be. We'll get to a bit where we talk about data ROT, and we will understand that reducing data ROT will help us increase data leverage as we go forward. So data leverage is a multi-use concept that permits organizations to manage data better within the organization and with the partners we exchange it with, all of this in support of the organizational mission. Leverage is obtained by applying data-centric skills, processes, and technology, focusing in on the non-ROT data (we'll get to that type of data in just a second), and the bigger the organization, the bigger the leverage potential that exists. Treating data more asset-like does two things simultaneously.
It lowers organizational IT costs, because IT spends between 20 and 40 percent of all IT spend working on data challenges. And it increases our knowledge worker productivity, which is the biggest source of untapped productivity that we have in our organizations. One more specific, concrete example of data leverage comes from the master data management initiative. The little yellow dot in the upper right-hand corner of this slide is what we call reference data. It's not a lot of data, but it's important to get it right, because reference data tells us what values our databases are allowed to contain. It may address questions like the countries where we do business, the types of accounts that are available, the controlled vocabulary items that we're going to use. This leverage then expands to master data, which again is a subset of all of our data; master data might say, are you a member of our premium or our VIP club, are you even authorized to be a user on our system, and are we using common data standards across this. The master data controls all of the rest of the data that's out there, so we need to have it correct. When somebody comes along down the road and says, I'd like a transaction for five bucks, or I'd like to be authorized, or I'd simply like to like something, each of those is a database call against these instances of values, and you all of a sudden have to go back to the board of directors and say, we can't do business overseas because we didn't control our reference data correctly.
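Reference data checks like the country example can be enforced mechanically. A minimal sketch, where the allowed-country list, the field name, and the function are all hypothetical stand-ins for whatever controlled vocabulary an organization actually maintains:

```python
# Reference data: the small controlled vocabulary the rest of the data must obey.
ALLOWED_COUNTRIES = {"US", "CA", "GB", "DE"}  # hypothetical list for illustration

def invalid_country_records(records):
    """Return the records whose 'country' field is not in the controlled vocabulary."""
    return [r for r in records if r.get("country") not in ALLOWED_COUNTRIES]
```

A small gate like this in front of the transactional systems is exactly the "little bit of leverage controlling a lot of data" being described.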
Similarly, we can't determine the country of origin of our product because we never captured that information, or we can't add a foreign language to our website because we didn't plan for it. So again, a little bit of leverage controls an awful lot of data. These examples are from my colleague Chris Bradley, who's done a great job of helping to articulate some of these things. It's got to come down to a fairly simple math proposition in management's eyes: if X is invested in Y, then outcome Z must be greater than X. Again, if I invest $100, I want to get at least $101 back out of it. So at the beginning of the project, the parties need to agree, at least about each other and at least about the data quality items; we're expecting to agree on the full meaning of price, timing, and functionality, defining X the investment, Y the resources (cleaning one set of data), and Z the outcome (the data set will be clean). Most managers go, I'm sorry, you lost me; it's just not very interesting. But if I instead go back and say, no no no, I invest X, I clean one set of data, and I can save $1,000, the "so what" changes to "I really care now." Right, and that's what we're trying to do: get management understanding of this. So what does it mean to have a data quality program? Well, it's an ongoing commitment. In fact, you are unlikely to not need a data quality program; let me say that in a positive way: your data quality program needs to be around until your organization decides to close up its doors. You're going to have some governance here with some senior-level control and direction, and you need to make sure that executive management understands that when they run into an organizational business challenge, they need to think of data quality as part of the problem, and therefore it has to be part of the solution, almost immediately.
So a programmatic stance inherits a budget, senior management attention, somebody who's in charge of it, and reasonable timelines and expectations; again, we don't want to set all this up and then suddenly say, by the way, you can have it done by Friday. That's not going to work. Data is not a project; it is a durable asset, and that asset has a useful life of more than one year and represents something that we want to put to additional uses so that we can reuse it. Reasonable project deliveries might be 90 days or two weeks, depending on what we're doing, but our data evolution has to be measured in years, because data evolves; it is typically not created on a project-initiative basis, and it is significantly more stable than other parts of the organization, certainly the process architecture. There will always be more in flux in the organization than in the data. This means that what we need from a data quality perspective is to produce ready-made data architecture components, which should be a prerequisite to agile development, because the only alternative is to create a bunch of additional data structures. And I've already said this: the difference between projects and programs is that projects have a start and a finish. We do not believe that data quality will ever finish at the organization; it might not be needed at such a high intensity forever, but we absolutely do need to keep doing it. Let me give you an example from bridge maintenance. In New York City they've done a great job of looking at their bridges and staffing the maintenance in a way that makes perfect sense: they start painting, and when they've painted the entire bridge, they go back to the beginning and start over again. Why? Because they know how long that paint is going to last and how long it's going to take them to paint it.
It's also the same way they maintain them: they've turned it into a regular process that they can repeat, and it means they will have a workforce that really does understand this in the longer term. And suddenly I'm transitioning from bridges to sandwiches. What does this have to do with anything? Well, in most organizations the problem is an uneven amount of data literacy, uneven amounts of data supply, and very limited use of standards in the process, so we try to harmonize these things, make them more compatible with each other so that they will work together, because you're dependent on high-performance automation, and high-performance automation cannot happen without engineering and architectural components. And I had to go all the way to India to find this particular thing: it was a Deming quote on the cash register at a tea farm in India that said quality engineering and architecture work products do not happen accidentally. Yes, absolutely, and if we add the word data in there, of course it is even more true. Let's look at it from an engineering perspective. Here's a really good example of something that is engineered; it is one of my favorite objects in the world. I'll give you a couple of attributes: it's taller than I am, it has a clutch, it was built in 1942, and it's cemented to the floor. Why might you do something like that? Well, by the way, it's still in regular use, which is kind of interesting, more than 80 years after it was originally built. It lives on the USS Midway, which is harbored in San Diego Harbor; they regularly use the kitchen of this wonderful piece of history to have parties and things like that to celebrate the museum. But of course the other part of this is that we were sending 4,000 war fighters out there to fight in World War Two, and the last thing we wanted was for them to be hungry.
So we wanted to guarantee that these folks would have breakfast every morning, as well as other meals. This is a very large piece of equipment that is engineered so well that unless the ship sinks, it's probably going to keep working. You can contrast that with a very wonderful KitchenAid version, something like I've got downstairs in my kitchen here, but there's no way that I could make pancakes or breakfast for 4,000 war fighters; the duty cycle of the small red machine is not going to be up to what it needs to be, and you can see the other one actually looks a little more like a drill press. I'm actually speaking to you from this location here in beautiful western Hanover, outside of Richmond, Virginia, and what I'm showing you is a barn that we built; I'm what's called a horse husband, so part of the process was making sure we built a barn. In the process of building the barn I borrowed money from the bank, and they gave me exactly this much money and said, you may not construct anything further until we have inspected this. The point is that it makes good business sense to make sure that you have a good foundation in place, but there is no IT equivalent of this. And that's why we have to be focused on quality engineering. So what do we need to get better at? Well, let's take a look. The first one is systems thinking, which is the idea that we have to be able to see both the forest and the trees, and that we can't fully understand any sort of challenge unless we understand all of it. Here's a good way of looking at this, thinking very briefly using what's called an input-output diagram: here are our inputs, there's our process, there's our output. Very simply, if my process is called make pizza and I have dough and water as the inputs, we understand for sure that calling the output pizza is probably not going to work.
So how can we make this work? Well, we can say we're not going to make pizza, we're going to make pizza dough, pizza crust. Terrific, that works out a whole lot better, and you can all see the point of the example: we don't have sufficient inputs to make the outputs that we want, so that process would no longer be called make pizza, but instead make pizza crust. Data storage needs similar understanding. Looking at quality problems: where did these inputs come from, what level of quality is required by my processes, what role does quality play in my process, and where do these quality attributes get used by future customers downstream, as in the next minute, next hour, next day; it could be a number of different things. Data quality is also interdependent with other disciplines, so for example I might have data governance and data quality in play as I'm doing an initiative around customer relationship management. As I mentioned before, here is our visual representation of the DMBOK, and we may look at this and say, well, we may think we're doing a data warehouse, but it's very unlikely that we're going to do a data warehouse without approaching data quality (although I have seen it done in many cases) and also without doing data governance as well. Now, another important category for getting better at this is the idea of the 80-20 rule applying to wheat versus chaff, or ROT data versus non-ROT data. So let's take a look at how that works out. We can first ask the question, does organizing data add to its value? I think the answer is a very easy yes, which we can see by looking at some pre-information-age metadata, as in: I'm making a book.
And if I make this book and I don't include page numbers, alphabetized indices, and diagrams, it's not going to be a very useful book. In fact, a real easy way to watch this process is to take the spine off of the book and distribute the pages; very quickly the information coming from it becomes a shambles. So better-organized data does increase in value. And 80 percent of what we're looking at here is ROT data, that is, data that is redundant, obsolete, or trivial. So the question becomes, why should I clean data that doesn't need cleaning? Cleaning part of my data is a much more effective approach than trying to clean all of it, and of course, who is better qualified to decide? This is not the seminar for it, but there is a structured approach to this, and the idea is that there's a relatively well-thought-out method behind it; there's nothing proprietary about it at all. I do like to use it in the context of the theory of constraints, which says that in any system there's something blocking achievement of the goal, and the more you can eliminate those blockages, the more efficient your cycle will be. In data, we should take exactly the same approach. And by the way, if that theory of constraints cycle down there looks a little bit like the Deming quality circle, Deming is there for a reason. Yes, the Deming cycle, plan-do-study-act, is the way Deming created it; many people actually say plan-deploy-monitor-act. It's the same cycle, we're just using different labels for it. The idea, of course, is: let's do some diagnostics, what is going on, how can I scope these things? What do we need to do to address those pieces? Did I fix the problem? And then let's take some additional action, because even if I did fix the problem, I now need to know what the next problem is.
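The loop just described, diagnose the biggest blockage, address it, re-measure, repeat, can be sketched as a toy cycle. Everything here is a hypothetical stand-in: the severity scores, the `resolve` function, and the round count are ours, not part of any named method.

```python
# A toy sketch of the theory-of-constraints / plan-do-study-act loop applied
# to data cleanup: repeatedly find the worst blockage, address it, re-measure.

def cleanup_cycle(issues, fix, rounds=3):
    """issues: dict mapping problem name -> severity score.
    fix: callable(issues, worst) -> new issues dict with that problem addressed."""
    for _ in range(rounds):
        if not issues:
            break                              # nothing left blocking the goal
        worst = max(issues, key=issues.get)    # diagnose: find the constraint
        issues = fix(issues, worst)            # act: address that constraint
        # looping is the study/act step: re-measure and pick the next constraint
    return issues

def resolve(issues, worst):
    """Hypothetical fix: the named problem is simply resolved and removed."""
    remaining = dict(issues)
    remaining.pop(worst)
    return remaining
```

The design point is the cycle itself: you never clean everything at once, you keep taking out the current worst constraint until the remaining ROT is not worth the effort.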
Life cycle models are very problematic. We tend to look at them and say, if we're starting with new development, that's where it starts, in the upper left-hand corner there, and you can see it travels the entire cycle; but if I'm starting from existing systems, I instead go to the bottom right-hand corner and do my cycle there. So again, people don't even know where these things start, and this is a very good way of examining all of those bits and pieces. There are a number of data quality attributes; I'm just going to put them all up: architecture, model, value, and representation quality. On the left-hand side of this diagram, if I'm trying to do my work where there are more representations of the data, it's kind of like being in a boat at the bottom of Niagara Falls and trying to fix the data quality problem there. Yes, we're closer to the user, so we can achieve more quickly, but our structural problems are much more about: let's plan the architecture for what we need. So how do we get better at the process? Well, let's look at the conversations that we have. Engineers want to say, I want to clean some data, but the business wants to hear, decrease the number of undeliverable targeted marketing ads. Reorganize the database becomes increase the ability of the sales force to perform their own analysis. Develop a taxonomy, as the engineers say, becomes create a common vocabulary for the organization. Optimize the query becomes shave one second off a task that runs a billion times a day; yes, those seconds add up very, very quickly. Reverse engineer the legacy system, something I would of course say, becomes understand what's good about the old system so we can formally preserve it and what's bad so we can improve it. And our data leadership, the CDOs, chief data officers, that we want to have should also be focused on these things.
That means inventorying the data, coming up with a first version of a strategy, and monetizing: maybe in the sense of making money, but also in the sense of putting a cost on how much it costs us to do things poorly. There has to be a strategy to this, because of course nobody has all the time and resources they'd like to have. We started using the term strategy in business in about the 1950s, but the business consultants turned strategy into a plan; if you go back to the origin of the word, it's actually a process, a pattern in a stream of decisions. So when we look, for example, at a choice: if we've got the ability to improve operations or to innovate using data quality, both of which we can do, we don't want to not do data quality, so let's do data quality, no problem with that. But if I'm over here in quadrant two, I'm going to be very efficient and effective. Everybody would agree that Walmart is the world leader in terms of being efficient and effective, and Apple perhaps might be the world leader in innovation, but what we can't do is both of those things simultaneously; instead, what we should do is get some savings from the one and use those savings to start initiating the other types of efforts. It also helps to educate management about the math. If I'm in a 48-bed hospital and I have a quality problem where beds are becoming scarce, that might be three of them on Monday, six on Tuesday, twelve on Wednesday, 24 on Thursday, and all 48 of them on Friday. What are we going to do tomorrow? You want to catch that way before it happens. Now, all of this falls into the category of making sure we're all singing off the same sheet of music. And while I love Bruce Springsteen and the E Street Band on this,
there's no way they could play the volume and variety of songs that they do unless they do an awful lot of practicing, and that's what I'm saying is really critical here: looking at the practices. So we've talked about how to approach data quality, and I've gone through a number of different considerations. We've talked a lot about what we need to get better at in order to do this: not treating data quality in isolation, looking at data ROT, understanding the way in which culture plays a role, and developing repeatable practices. So we need to refocus whatever conversation we're having into business terms, make sure that we've got some leadership around this topic, that we have a programmatic focus as opposed to a project focus, and that at least our management understands the simple math. Key performance indicators go in right there, and then we've got to get good at storytelling around that whole process. As we head toward the top of the hour, getting ready to invite Henry back in for your questions, I'm just going to sum up with a couple of quick bits, and that is the idea that most people, when they talk about data quality, put up a wonderful slide like "high quality data is critical." I absolutely believe it. But what does it mean to be transparent with your information? I still have never found a good definition of the words analytics or business intelligence. And increasing efficiencies and decreasing costs: these are not helpful unless we actually put in place something that will be meaningful. So rather than just going with the platitudes offered by many of the offerings that are out there, there are very specific pieces in here. One: the first project needs to be relatively small.
Two: the project should not be allowed to begin unless the data requirements for the entire project are verified. Three: the product owner must be highly skilled; we need to have somebody who can do this, and very few in IT or the business have the requisite skills and knowledge around data quality, so you'll need to do some socialization there. One of my favorite things to do is to take somebody who comes to me and says, I can't quantify this, can you help? And yes, we may not get the full cost, but I can definitely get to something that will be recognized as important. The process must be agile-ready; in other words, to implement this we need a construction technique that we can get ready for, but we don't want to do this without the right type of planning. This really requires more planning before we actually start constructing. Number four: the team must be highly skilled in both data quality processes and technologies, and again, few teams have the requisite skill to get started. Finally, the organization must be skilled at a mostly mature level. Otherwise management is just going to say, I don't care what you do; I have literally been locked in computer centers and told not to come out until we fixed certain issues. Some of them can be fixed that way, some of them can't, but most organizations really don't understand this data stuff. So the approach that we're looking for here really says: data things happen on this side, and what we've got to do is translate that into organizational things happening. We can celebrate the data things happening, but we need to quantify the organizational things. Again, lots and lots of things around this; the real key is the overall process I've gone through here with you, with the specific values along the way. I actually ran over by 20 seconds here, but we're coming around to the top of the hour and our upcoming events.
So again, let me invite Henry back on and see what sort of questions you all have to get us a little further down the line. Thank you so much for another great presentation. If you have questions for Peter and Henry, feel free to submit them in the Q&A portion of your screen. There was a question that came in earlier, Henry, during your presentation about Calibra: does Calibra automatically suggest explainable rules that span across several fields? The short answer is yes. Calibra offers out-of-the-box rules with ML and AI, such as behavior studies profiling statistics over time and outlier studies grouping by keys and data over a time bin, whether it be quarters or days or months, and then aggregating transactions; it points out exactly what it sounds like, outliers, and there are other functions as well. So there are a number of out-of-the-box functions associated with the data, and it's data agnostic as well: you run it through the tests, you run it through the checks, it scans, it studies, it learns, and it's going to give you potential anomalies based upon your data itself. And I can add one thing to that, because I know Calibra has studied the process of collecting this type of data and extracting it, so that not only is it an initiative but you're also gaining from the collective experience that exists within Calibra in addressing these kinds of problems, and I really do think you should play that aspect of it up quite a bit; it's really quite mature.
Thanks for that, Peter. Yes, our adoption services and our programs here, and to your point, a lot of what you're talking about is what we call adoption: reducing that time to data. How do we get organizations on board with some simple data quality projects and metrics, and then roll that out to the rest of the organization without reinventing the wheel? So typically we take a number of use cases for a business unit, we roll out data quality there, we prove success, we get everyone on board within 15 to 30 days, and then the adoption is incremental, and at that juncture it can become exponential, because again, a lot of folks get excited about the tools. It's not just simple in nature but also comprehensive, Peter, to your point, where folks are finding data inconsistencies on a self-service basis, whether it be missing values or null values, and working backwards; not boiling the ocean, but taking the project one bite size at a time. How do you eat an elephant? Right, one bite at a time. Exactly right. Other questions? Yes. And in case I missed answering the most commonly asked questions, just a reminder: I will send a follow-up email to all registrants by end of day Thursday for this webinar, and yes, you will get a copy of or a link to the slides and a link to the recording, along with anything else requested throughout. So diving in here, Peter, there are a couple of questions around ROI. One reads: I am the BI analytics manager at a higher education institution. I spend at least 50 percent of my time addressing data quality issues that impact our ability to use BI assets with consistency and accuracy. Getting time allocated to data quality projects is very difficult, but perhaps equating those issues with cost would help. Can you share more about how you calculate the cost of poor data quality, direct or indirect?
If you have links to resources on that, please share; your slide on simple math kind of addresses this, but how do I calculate the costs and benefits of a particular data quality project? The method around that, if you will, is pretty straightforward; I'll give you an example with the query that I spoke about earlier. Again, as you can see from my resume, I have literally thousands of hours working with some of these groups, and you take a simple query that runs literally billions of times a day. I took that query, took a look at it, and it had never been optimized. It's not a skill that is taught to most SQL developers, and consequently they don't think to do it. So we took that query and saved a second off of it. Well, a billion seconds a day added up to machine time and runtime, and even electrical costs that we were able to recover, and I'm giving you a very trivial example. There are other approaches out there, and I have a small pamphlet called Monetizing Data Management that's available on Amazon. It will give you more: 17 different patterns, how to see whether these patterns apply to you, and then what measures to use. I did one where I justified a data quality initiative by saying, you have thousands of IT workers; if we could make those IT workers just one hour per year more productive, that added up to the cost of being able to purchase the technology. And again, as I said, we don't start out by saying we're going to fix it all, certainly not by Friday; what we do do is say where things are going to start getting better. And as things get better, we are very careful to make sure that we have an internally facing website, so that we can go up and post successes, because everybody's going to say, yeah, you did great for me last year, but what have you done for me lately?
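The simple math behind that kind of justification fits in a few lines. This is a back-of-the-envelope sketch only: the cost per CPU-second and every other figure below are illustrative assumptions, not numbers from the engagement Peter describes.

```python
# Back-of-the-envelope value of shaving one second off a frequently run query.
# All inputs are illustrative assumptions, not numbers from a real engagement.

def annual_savings(runs_per_day, seconds_saved, cost_per_cpu_second, days=365):
    """Dollar value per year of saving `seconds_saved` on each of
    `runs_per_day` executions, at an assumed cost per CPU-second."""
    return runs_per_day * seconds_saved * cost_per_cpu_second * days

# One second saved on a query that runs a billion times a day, at an assumed
# machine cost of $0.000001 per CPU-second: roughly $365,000 per year.
saved = annual_savings(1_000_000_000, 1.0, 0.000001)
```

The point is not the particular figure; it's that once the inputs are written down, management can argue about the assumptions instead of dismissing the whole claim.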
Well, again, if I can go out there and say that we shaved a billion seconds and that billion seconds translated into a dollar value of X, or, in the educational situation, that we were able to push enrollments up. I know that one of the things all institutes of higher education are terrified about is something called the enrollment cliff that is coming. What we've seen over the years is that the people who forecast students coming to universities are good but not great, so the question of how we play out different scenarios with the enrollment cliff can actually have an awful lot to do with this; we may find, for example, that while there are going to be fewer students, those students that remain will be more motivated. And I'm diving into a very topic-specific example there; there are a lot of others out there as well. Henry, maybe you have some examples of an ROI calculation that could be done, because you said you've got the ability to come in for these proofs of concept; how do you justify value in a very short time? Yeah, definitely, and I can tell you from experience on both sides, as a data quality consumer of a product; I used to work as a business analyst. For us, we had specific processes and procedures in our month-end close, working with financial master data management teams, and that took about three to four folks for a single entity. We quantified that workflow and that process at about 20 to 30 hours in a given month, and that's for just five to seven tables within our enterprise databases. So when you think about the scale, we quantified that at a blended rate of X dollars, and that's taking our teams a lot of time. What happens when we acquire another entity?
We were doing some M&A and acquiring more entities. What happens when we're onboarding more data sets? As you can imagine, at 2x, 3x, 4x, 5x the data, your scale and your challenges are going to grow accordingly. So quantifying the data quality issues in our world became quite simple once we started thinking through big data volumes and onboarding data sets. That's the perspective of a data operator.

From a data quality tooling perspective, when you quantify the amount of time it takes for rule writing, or the time it takes for folks to inspect the data and run Python or SQL rules, again, it's very lengthy, so that's another way to quantify it, from the back-end IT administrator's side. And from a business use case, there are a lot of folks we work with, to your point: the additional zero, the fat-fingering, the additional characters. With the banks we work with, every returned credit card costs about $50. If we're able to resolve those data quality issues, multiply that by how many issues we run into, and it's a fairly simple calculation from that perspective.

And I think we have somebody from Mastercard in the audience today who might or might not want to share the number of credit cards that come back on a regular basis, but rest assured, it's a fairly large number. It absolutely adds up over time. So it really is the process of saying: what can I quantify, and can I get it to the point where it adds up to a significant amount? I had another quality piece where we were moving data from one organization to the next. The way that usually happens with consultants is that they hand the subject matter users a spreadsheet and say, "Write in where each field in the source system goes in the target system." And while that is a good start, and certainly information that we want to have,
it does not in fact address some of these structural data issues; in fact, it can make them significantly worse. Take the example I mentioned earlier of not having an attribute at the right level in the hierarchy: that reduces our control. If I don't have language specifications in the reference data, there's no way I can add them to the system after I've built it. I might do business in a foreign language; how long is that data quality gap going to hurt? It means my initiation, my time to money in the market, is delayed by a year, and that's a pretty easy quantification to come up with as well. I'm sure that both Henry and I would be happy to chat with you on any specifics around this, but it really is a matter of getting started, practicing it, and then starting to add up the costs. You will probably never fully quantify the entire thing, and there's no need to. Once you get it over a certain amount, management is going to say, "You're going to save me a million bucks by this time next year? You're on. All right, let's go."

Continuing along a similar line, with the same approach: what is the best approach to remediate historical data quality issues after the root cause has been identified and fixed? It almost always appears that a manual approach is the best alternative, which is usually rigorous; please advise.

It's tough to tell from the scant amount of detail you've given us about the specific challenge, but certainly there are going to be times where you want to do things by hand. One of my favorite data quality efforts, and I'll go back to Tom Redman, who taught me this many years ago, is what he calls the Friday Afternoon Measurement. Of course you might like to have a bit of an adult beverage included with this, but the idea is to sit around for a couple of hours on a Friday afternoon and examine the quality of the data that we keep about our top ten customers. And on that Friday afternoon, I guarantee you will find some,
"Wow, I didn't realize that was going on." We call them a small business when they're actually an enormous size; all sorts of things will be wrong with these top ten. And if these are the quality errors we found in our top ten customers, imagine what the rest of the data will look like. That, by the way, makes for an instant story that you can start running up the chain: "Hey, we sat down last week with some technology, looked at some things, and found some interesting potential risks that the organization is facing. Should we perhaps take a more serious, more careful look at this?" And of course that's when somebody like Henry would want to get involved, so they can help you plan out that particular process in the case of looking at it with a specific technology. Henry, anything to add?

Yeah, definitely, and I myself have worked with a number of other data quality products as well, on their presales consulting teams, and there are different ways to do it. There are probably a million different ways you can cook a potato, whether it be French fries or mashed potatoes. But here at Calibra, we are very confident and stand by building what is simply an observability layer. What I mean by that is actually pointing out potential anomalies for your organization in a scalable fashion. Through a scan, there are maybe 20 to 40 different anomalies that come your way, and then as an end user you want them to flow through a seamless workflow: what are the mission-critical things to tackle today, what are the things to tackle, let's say, within the next couple of weeks, and what are the things we can get to by the end of the quarter? So again: quantifying and prioritizing the data quality scans, and then being able to assert them through a workflow and a work stream.
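The today / coming-weeks / end-of-quarter prioritization just described can be sketched as a simple triage over scan findings. This is a generic illustration, not Calibra functionality; the severity scores, thresholds, and finding names are all invented for the example.

```python
# Hypothetical triage of data quality scan findings into the three
# buckets described above. Severity scores and thresholds are assumptions.

def triage(anomalies, today_threshold, weeks_threshold):
    """Bucket (name, severity) findings by how urgently they need attention."""
    buckets = {"today": [], "coming_weeks": [], "end_of_quarter": []}
    for name, severity in anomalies:
        if severity >= today_threshold:
            buckets["today"].append(name)
        elif severity >= weeks_threshold:
            buckets["coming_weeks"].append(name)
        else:
            buckets["end_of_quarter"].append(name)
    return buckets

findings = [
    ("null_customer_ids", 0.9),   # breaks downstream joins: mission critical
    ("stale_fx_rates", 0.6),      # wrong but workaround exists
    ("odd_zipcode_format", 0.2),  # cosmetic
]
print(triage(findings, today_threshold=0.8, weeks_threshold=0.5))
# {'today': ['null_customer_ids'], 'coming_weeks': ['stale_fx_rates'],
#  'end_of_quarter': ['odd_zipcode_format']}
```

However the scoring is done in practice, the design point is the same: a scan that surfaces 20 to 40 anomalies is only useful if something downstream ranks them before anyone acts.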
And then having end users validate whether or not it's actually an issue, saving that feedback, and retraining and interacting with our ML models, so the models get closer to the right guardrails for what a true data quality anomaly is for your organization. For instance, say a potential anomaly comes in on a transaction of about $100,000, and you know that expense transactions for that cost center typically veer toward maybe $120,000. You can edit the guardrail and the control there so that anything over $120,000 is an anomaly and anything under $120,000 is not.

The reason I bring all this up is the cost of remediating data in bulk and at scale. Think about pushing updates back for, say, 1,000 records in a CRM such as Salesforce, or for transaction data: if one little field or mapping is off, it can be very disastrous for your operational processes. We had organizations that pushed those updates, and some of those customer accounts were actually down for a couple of weeks because they had to go backwards and figure out what was wrong. There was a lot of lost revenue within those couple of weeks from what they quote-unquote called auto-remediation. So there's no right or wrong way to do it, but for us at Calibra, we really stand by having a subject matter expert dictate and denote that this truly should be resolved, or that it needs a later review via a ServiceNow ticket; then our stewards can go into the detailed pipelines to rectify it, or go back to the source to remediate the data there.

I'm also reminded of another piece on this, which is one of my bugaboos. I would suggest, if you haven't already attempted it, that your data quality initiative should really report to the business. IT has not proven itself capable of understanding the value ascribed to data.
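The $120,000 guardrail example translates directly into code: a scan proposes a threshold, and a subject matter expert widens it rather than letting anything auto-remediate. This is a minimal sketch under those assumptions, not Calibra's actual API.

```python
# Minimal sketch of an SME-adjustable anomaly guardrail.
# The class, field names, and dollar figures are illustrative only.

class Guardrail:
    """An anomaly threshold a subject matter expert can tune over time."""

    def __init__(self, upper):
        self.upper = upper

    def is_anomaly(self, amount):
        return amount > self.upper

g = Guardrail(upper=100_000)
print(g.is_anomaly(110_000))  # True: flagged under the initial threshold

# The SME knows this cost center legitimately runs up to about $120,000,
# so the guardrail is widened instead of auto-remediating the record:
g.upper = 120_000
print(g.is_anomaly(110_000))  # False: no longer an anomaly
print(g.is_anomaly(130_000))  # True: still flagged
```

Keeping a human in this loop is the safeguard against the bulk-update failure mode described above, where a bad auto-remediation took customer accounts down for weeks.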
They wouldn't know the difference between a customer number and another set of numbers that might be, you know, a hashtag on a picture or something like that. I'm not insulting IT; it's simply not their expertise. The business is going to be more able to add to that and, as Henry said, to help with the prioritization of these things, to say, "Yeah, it doesn't matter that we've got the wrong colors in there, because for what we do they're all red anyway," but also to say, "If we're really trying to make fire trucks, they have to be a certain size and weight so they can go over the roads and bridges in our use case." So business users are better able to articulate, understand, and evaluate the issues around data quality than IT is, and if you have the option of getting your data quality group to report into the business instead of IT, I predict that you will achieve better results.

Yep, great and valid points there.

And by the way, I didn't pay Henry to say that, so it must be a like mind. So: how do you learn to translate data lingo into business language, like your slide one demonstrated? Thank you.