Hello and welcome. My name is Shannon Kempe. I'm the Chief Digital Manager of DATAVERSITY. We'd like to thank you for joining this DATAVERSITY webinar, Data Quality for Non-Data People. As you can see, WebEx has undergone a significant UI update, so feel free to look around; you will find most of the icon buttons you need at the bottom middle of your screen. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. We will be collecting questions via the Q&A panel in the bottom right-hand corner of your screen. Or, if you'd like to tweet, we encourage you to share highlights or questions on Twitter using the hashtag #DATAVERSITY. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout.

Now let me introduce our speaker for today, Kasu Sista. Kasu has more than 35 years of experience in information technology, strategic solution alignment, and project and program management. He has spent a significant amount of time working with data assets, including four years as the CIO of a health information technology company, where he led an effort focused on acquiring, cleaning, blending, securing, and distributing healthcare data for the purposes of population health management and patient engagement. Prior to that, Kasu worked on various aspects of data such as governance, metadata, data quality, BI, and analytics. Kasu is uniquely qualified to deliver on the vision of using data for better outcomes, drawing on that 35-plus-year career spanning many industries and technologies. And with that, let me turn it over to Kasu to get today's webinar started.

Hello and welcome. Thank you, Shannon, and thank you everybody for taking the time to attend, and thanks to DATAVERSITY for hosting. I'm trying to flip this slide; there we go. Today we'll be talking about quality, quality from the point of view of both data practitioners and the quote-unquote business people. We'll be talking about who they are, what quality is, perceived quality, the data lifespan, metadata, KPIs and core metrics, what you measure, the impact of data quality, and some takeaways. I promise I'm not going to read these slides going forward. As Shannon graciously introduced me, my name is Kasu Sista. I work with a company called the Wisdom Chain, a consulting firm based out of Chicago, and we focus mainly on data analytics, data governance, and data quality.

I want to start by setting the context for the conversation we're having today: who are the data people, and who are not? I'm going to define both very broadly. I say data people are anyone that has the words data, analyst, or reporting in their title, or identifies with them. I'm sure there are hundreds of titles out there that directly involve data, but for the conversation we're having today, this is how I'm going to define it. And business people are anyone that did not identify with the previous slide. This next slide pretty much sums up what I'm going to talk about today. When a business person is talking to a data person, they have two entirely different perspectives. If you look at the slide, the guy on the right is the one who filled up the blackboard. He's very excited about his work. He wants to show the work.
And he wants people to appreciate the difficulty of getting at the answer. Whereas the person on the left-hand side is saying, just tell me yes or no; I don't really need to know, I trust you. This is a conversation we have many times through our careers and through our days. I have been on both sides of the equation: I have run businesses, and I've been a practicing data person for the last 35 years. This is a very difficult conversation to have.

So with that, I want to start by defining what quality is. I'm going to show you a couple of pictures, and since I cannot interact with you face to face, I'm going to answer the questions myself and hope you agree with my answers. I'm looking at these two cars here, and somebody asks, okay, which one is higher quality? Immediately what comes to my mind is the Mercedes, because the bottom one is a Chevy and the top one is a Mercedes. And we never ask ourselves why. Now we have a Toyota and a Chevy. Again, what comes immediately to my mind is that the Toyota is higher quality. I know a little bit about Toyota's manufacturing processes, their TPS, the Toyota Production System, which is based on hypothesis and experiment; it's basically a continuous-improvement methodology. But even if I didn't know all that, I would still say the Toyota is of higher quality. On the next slide, I have two comparable cars: a BMW and a Mercedes. Now, how do I know which is of higher quality? At this point I don't know, because I think both are of comparable quality, and now intangibles come into play. We start looking at other things to evaluate quality.

Before I go further, this is one of the things I want to establish early in the conversation: this equation. Quality means trust. You trust, and then you buy. Essentially, there is a directly proportional relationship between trust and quality. Now let's take a look at the quality dimensions, and at perceived quality, because quality is not an absolute thing; it depends on who is looking at it and who is making the decision. These are the dimensions distilled from a survey of mobile phone owners that the European Journal of Business Management did a while back, in 2013. They are the dimensions of perceived quality in a product, and although the survey was about mobile phones, I'm sure it applies just as well to other types of products. Reliability is number one, durability is number two, ease of maintenance is third, then ease of use, then brand name and price. Surprisingly, brand name and price are at the bottom. Most of us, if something is expensive, have a tendency to equate that directly with quality, but it's not necessarily always true. What's interesting is that reliability and durability are two things you can only establish over time; they are not immediately apparent when you're trying to make a decision.

So how do we evaluate? How do we decide that one thing is of higher quality than another? How do you compare before you decide? We use our five senses: you see, you hear, you smell, you touch, and you taste, depending on what the product is. Obviously you're not going to taste an automobile; you'll taste fruit. So really, we use our senses to make a tangible evaluation of something.
At the next level, we walk into the store and look at the product, or talk to a real estate agent or an automobile dealer. Or we ask family and friends if they have any experience with the product. And we do our research: if I'm looking for a car, we go through magazines, Road & Track, Motor Trend. There's so much information on the web now that it's almost impossible to know what's true, so I have a tendency to put more faith in printed material when I'm doing research. Then, lastly, we look at the company's marketing, the glossy brochures, the marketing material. After all this, we make a determination. Notice that at this point I'm not even talking about personal experience. We do all those things and we buy something, but we still don't really trust it. The first time you buy a car, you do all your research, but not until a few years later do you say, I really like this car; it has checked all the boxes I need. Trust has to be earned, and should come only after the passage of time. Author? I said that. And it's true, right? You only trust people, products, and clients after your own personal experience with them. That's what we do when we're buying products, when we play the role of the consumer.

Now we come to data. In real life, we're used to evaluating things using our senses and all these other dimensions, saying I like this product, developing a personal experience with it, and then trusting it. When I come to data, all of those things are thrown away. I have an entirely different set of criteria to judge whether data is of good quality or not. And this is a very large gap between the two sides, the two kinds of people we're talking about today. The data quality dimensions are validity, accuracy, consistency, completeness, uniqueness, timeliness, fitness for purpose, and source. I want to go through these quickly, one by one. If data is to be trusted, it has to deal with all the dimensions I just mentioned. If I am data, what do I need to do to earn your trust? I have to be valid, accurate, complete, consistent, unique, and timely. Are all of these things measurable? That is the question. This is my experience with what can be measured and what is kind of hazy. Validity can be measured, because typically we have a reference set to bounce against. Accuracy is hard to measure. Completeness is situation-dependent: sometimes you can measure it, sometimes you can't. Consistency, again, depends on the situation. Uniqueness can be measured, and timeliness can be measured.

Now let's look at them one at a time. Validity means that the values I see in my data are consistent with the defined values for that field. For example, ICD-10 codes are published by the federal government, and those are the codes used by everybody. If I get an ICD-10 code in the record I'm checking and it doesn't match one of the ICD-10 codes in the reference data, then it's invalid; otherwise I would say it's valid.
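To make that concrete, here is a minimal Python sketch of a validity check. The reference set below is a handful of invented stand-in codes; a real check would load the full published ICD-10 list.

```python
# Minimal validity check against a reference set.
# These codes are invented stand-ins for the published ICD-10 list.
VALID_CODES = {"E11.9", "I10", "J45.909"}

def validity_rate(records, field="icd10_code"):
    """Return the fraction of records whose code is in the reference set."""
    if not records:
        return 1.0
    valid = sum(1 for r in records if r.get(field) in VALID_CODES)
    return valid / len(records)

claims = [
    {"icd10_code": "E11.9"},   # matches the reference set -> valid
    {"icd10_code": "XX1.23"},  # no match -> invalid
    {"icd10_code": "I10"},     # valid
]
print(f"validity: {validity_rate(claims):.0%}")  # validity: 67%
```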
So validity is something we can talk about and calculate metrics on. Accuracy, on the other hand, is harder. For example, I cannot be accurate about lifespan, because it's an estimate. There are things we use in business that are not necessarily measurable from an accuracy point of view. Every insurance company has its own lifespan calculation, and which one is more accurate? We don't know, because those happen to be statistical numbers, essentially an expected value, a probability of something happening. So accuracy is harder to measure.

Next, data has to be complete, and sometimes we can measure that. For example, if you're applying for a loan at a bank, they will not process it unless all the fields are filled in and all the signatures are there. But here's the interesting thing: just because something is complete doesn't mean it's accurate. People fill out fake applications every day; the data is complete but doesn't really match the person filling it out, and they do it anyway, and that's a huge business. So completeness doesn't imply accuracy.

Next, be consistent: make sure the data meets expected constraints over a period of time. This is basically saying that things happen the same way. Say I'm getting about 100,000 transactions a day; if something goes out of whack and I get 150,000 one day, I know something is wrong. So consistency can be measured against a constraint, and it can also be used to set new constraints. Next, be unique, and this can be measured. Our data practices depend on unique values to identify things, to make sure there's no mishandling of services based on an ID. This is something we deal with, and we may not always achieve it, but we know we can manage it. And timeliness: timeliness absolutely can be measured, and sometimes we depend on it. For example, stock prices have to be up to date at the tick level; each transaction gets transmitted, otherwise the people trading are not going to be very happy.

So these are some of the metrics we look at for each of the dimensions. For validity, you have to define a threshold. If it's 100%, it's 100%; the data has to be absolutely valid every single time. Or you can say, hey, 90% is fine, and then we can make that a business rule and make it happen. Similarly, accuracy has a threshold that's acceptable. Completeness is, again, an acceptable threshold. Consistency is either a number or some kind of probability. Uniqueness can be measured: no duplicates. And timeliness is a ratio saying how timely am I, am I current 99% of the time, 100% of the time? So these are the dimensions and measures we currently have in our toolbox as data practitioners.

Setting these thresholds cannot be done by the data practitioner alone; this is where data and business have to collaborate to set the thresholds for the data. As somebody who built and ran a data analytics platform, I can tell you it is extremely, extremely important to be able to say what's acceptable. We were processing medical claims, Medicaid medical claims and pharmaceutical claims, along with some other data, and some of the data had to be 100% accurate, otherwise we wouldn't pass it. Anytime something pertained to a patient, we had to be 100% accurate, because we could not make any mistakes with a patient's healthcare data.
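Here is a minimal sketch of how those negotiated thresholds might be wired into checks. The field names and limits are hypothetical; the point is that the acceptable levels are business decisions that the code merely enforces.

```python
# Hypothetical per-check thresholds agreed with the business side:
# patient-identifying fields must be perfect, dollar amounts get slack.
THRESHOLDS = {
    "patient_id_complete": 1.00,   # 100%: no mistakes on patient data
    "paid_amount_complete": 0.90,  # "in the ballpark" is acceptable
    "claim_id_unique": 1.00,       # no duplicate claim IDs
}

def completeness(records, field):
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def uniqueness(records, key):
    values = [r[key] for r in records]
    return len(set(values)) / len(values)

def volume_consistent(daily_count, expected=100_000, tolerance=0.25):
    # Consistency as a constraint: ~100,000 transactions a day is normal,
    # so a 150,000-record day should trip an alert.
    return abs(daily_count - expected) <= expected * tolerance

records = [
    {"claim_id": 1, "patient_id": "P1", "paid_amount": 120.0},
    {"claim_id": 2, "patient_id": "P2", "paid_amount": None},
]
observed = {
    "patient_id_complete": completeness(records, "patient_id"),
    "paid_amount_complete": completeness(records, "paid_amount"),
    "claim_id_unique": uniqueness(records, "claim_id"),
}
for check, value in observed.items():
    status = "pass" if value >= THRESHOLDS[check] else "FAIL"
    print(f"{check}: {value:.0%} ({status})")
print("daily volume ok:", volume_consistent(150_000))  # False
```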
So we had to be extremely diligent and stringent, and track the lineage (I'll talk about that a little later) to make sure we had control of those data elements and that they were correct. For some other fields we cared less. We never really cared much about the payment amount, because our warehouse was supporting care management and population management; we were not part of the revenue cycle. So we didn't care that much about the dollars. If they were in the ballpark, if they were reasonable, we were fine; we weren't going to spend a lot of time trying to make them accurate. We had to work with our customer to understand what was important and what was not so important. So it's really a collaborative effort for data quality metrics to work: a collaboration between the people who clean, process, and distribute the data and the people who benefit from consuming it.

Okay, moving on. Basically, it takes two to do the trust tango: the one who risks, the trustor, in this case the business person, and the one who is trusted, the data practitioner. Each must play their role, but they have to work together. So I want to talk about the two points of view. We have the business person, who is business oriented. And this slide is what I always felt like when I first walked into an organization: who does what? Who has what data? Who is in charge of what? It's a very confusing thing to walk into a new data shop and try to make sense of what's going on, and I don't know how many of you have done that, because if we have 10,000 data shops, we'll have 10,000 different ways of running them. There are not too many similarities in how things are managed and run from shop to shop.

The business person's point of view is different. He's thinking: what's the most viable product I can build? Who is my customer? Where do I find good people? How do I pay for it? And how do I use technology? A really committed business person is not worrying so much about the technology itself; his or her point of view is how to use technology to do all the other things on this slide that need to get done.

As a data practitioner, if I put my other hat on, my first question was always: why do they need this? Don't we already have that? They have this, they have that. Our immediate reaction is why. Why do we need to do this? The second question is: do I have all the data? Where do I get the data from? And then: okay, now I need to process it; who can tell me what the business rules are? This is where we're scrambling around asking who knows what to do with this, and many times there's nobody; you probably know more than anybody on the business side about what to do with it. All these cases have to be handled. And then: do I have to modify the data model? That's always a big effort, especially in a warehouse; if you modify the data model, there may be many, many issues to deal with. For example, say I keep five years' worth of data online and I add a column to my fact table. How do I backfill it? Do I have the data available? How do I explain to the business person that we only have data going forward?
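Here is a tiny sqlite3 sketch of that fact-table problem, using an invented toy table rather than any actual warehouse. The new column exists immediately, but every historical row comes back NULL until someone gap-fills it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# A toy fact table standing in for five years of loaded history.
cur.execute("CREATE TABLE fact_claims (claim_id INTEGER, paid REAL)")
cur.executemany("INSERT INTO fact_claims VALUES (?, ?)",
                [(1, 100.0), (2, 250.0)])

# The business asks for a new column. Adding it is the easy part.
cur.execute("ALTER TABLE fact_claims ADD COLUMN copay REAL")

# The history is not backfilled: every existing row reads NULL (None).
for row in cur.execute("SELECT claim_id, paid, copay FROM fact_claims"):
    print(row)  # (1, 100.0, None) then (2, 250.0, None)
```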
Just because we added a column doesn't mean five years' worth of data magically appears in the warehouse so that all the existing reports can immediately use that column, right? I've had many, many conversations and many arguments about this. It's easy enough to add columns to dimensions, as we know, but it's harder to add columns to a fact table, because you have to gap-fill the history. And we calculate metrics based on columns, so this new column might actually impact other columns. These are issues we deal with on a daily basis. Adding a column to a fact table is not as simple as it looks, but we're always having these conversations with the business people. So the idea is that we have to educate, in simple terms, why it's difficult to add a column to a fact table. That's a hard thing to do, because most data and technology people are not known to be great communicators, especially when they have to explain difficult concepts to a layperson, if you will. So this is the mindset we have: we're thinking at ground level, and the business person is thinking at maybe the 50,000-foot level.

Going back to the little matrix Donald Rumsfeld talked about way back: there are known knowns, known unknowns, unknown knowns, and unknown unknowns. Where are data people most comfortable? In the top-left quadrant, the known knowns, because they know the data, they know where it goes, they know what to do with it. That's the place we're most comfortable. Known unknowns we're okay with, because we know we don't know, and at least we can explain to people that we don't have that data. Unknown knowns and unknown unknowns, on the other hand, data people couldn't care less about: if I don't know it, I don't care. We try to future-proof data models, but it never worked for me to add extra columns to a table just because something might show up there some day. It's not worthwhile keeping things for the future when you're a data person. Data happens now, and if it doesn't happen, I know it doesn't happen. Whereas business people are dealing in all four quadrants. They're dealing with unknowns, and that is uncertainty. As a business person, you have to deal with a lot of uncertainty in your daily life, whereas if you're a data person, you're not dealing with uncertainty; basically, you like knowing things. That's the gap between the two points of view.

So I'm going to go back to the equation I created before: quality equals trust. Now I say trust is inversely proportional to uncertainty. What does that mean? The more uncertainty I have, the less trust I have. And this goes for people, products, and data; it's a universal formula, because we are afraid of strangers. I don't know this person who is writing a report for me, so I'm not so sure the report is going to come out correctly. I don't have a way to check this data; I don't have a third-party benchmark to test it against. So I don't know, and therefore I don't trust it. Uncertainty creates mistrust.

So where do we meet? If you look at the why, the what, and the how, and ask where each of us plays: the business people typically play in the why space. They're the ones trying to figure out what product to build, what report they need to make a decision.
And then the what space is where we collaborate; this is where we decide what is to be built. In the famous words of Steve Jobs, the customer doesn't know what he wants. That may not always be true, but a lot of the time customers don't know what they want; what they always know is why they want it. And the data practitioners know how: how to use the data to produce whatever product is required. So the why needs to be articulated by the business person, the collaboration on the what has to happen between the business and the data practitioners, and the how should be left to the data practitioner. Any overlap between these circles, these ellipses, creates conflict. For example, if the business person starts telling the data practitioner how, that creates friction.

The topics of conversation between business and data should involve six things; this comes from Tom Davenport's 2013 article in HBR. It's important that both sides understand the business problem clearly. You have to decide, as you're solving the problem, how you'll measure the impact on the business of solving it. What data is available? It's important that both sides know that just because you want something or need something, you cannot always have it, because the data may not be available; that means we may have to go collect the data, go to a third party, or impute the data. There are many, many possible ways to get the data, but it's not going to be as simple as writing a report, so we need to know what data is available. Then have an initial hypothesis: what's the first version of what we want, and the solution for that hypothesis? And finally, as the second bullet said, a way to measure the business impact of the solution: once the problem is solved, what is the impact on the business? Those are the topics for both sides to have a conversation around.

So far we've talked about data quality, the two ways people perceive quality, and how to have conversations between data and business people. Now I want to switch gears a little and talk about data itself and the data lifespan. For me, data quality is a three-legged stool: metadata, data governance, and metrics. We always need metrics on those six or eight dimensions (I've seen many different ways of parsing the data quality dimensions), and those metrics are extremely important: they allow us to understand data quality and to get the data to the quality threshold we want. Metadata helps us get there, because metadata is what tells us about the data. Governance is all the policies and procedures; governance uses metadata to put constraints in place and to monitor and track the data. And metrics allow us to evaluate how well we're doing and to put processes in place to continuously do better. Those are the three legs of the stool that gives us data quality. Now, this next diagram looks very similar to any warehouse/BI type of diagram, because that's how data also flows.
On the left-hand side of this diagram are all the enterprise entities, and the entities run business processes. At any point there might be hundreds of business processes, even in a small company. Every time you execute a business process, you use tools such as applications, and data gets collected and stored. That's how data gets created. Then that data is used for purposes other than what the transactional system intended. ETL and data preparation tools take the data from where it's originally stored and move it into different constructs: data lakes, data warehouses, relational databases, in-memory databases, NoSQL databases. There could be many, many instances of these things. Finally, we use the data we've collected and curated to create products such as reports, visualizations, extracts, and other products, and people consume them. Eventually the data gets stale and is disposed of. Notice that through the whole process, metadata management and data governance sit at the bottom, because those are what really help us ensure quality as data moves through its life of being created, transformed, used, and disposed of.

Metadata is really data about data; I'm not going to go into it at great length. There are such things as technical metadata and business metadata. Data governance applies during data creation, the left side of the lifespan slide I just went through, and governance is also used on the data consumption side. The left side is what we call active governance: in the process of data creation, we are much more active in terms of doing governance. On the consumption side it's more passive, in the sense that we are mostly controlling who sees what data and what capabilities they have. So passive data governance lives on the consumption side, whereas active data governance resides on the creation side.

With that, I want to talk a little bit about core metrics, core metrics from the organization's point of view. So far we've talked about metrics that measure data quality; now I want to talk about the metrics a corporation uses, for which it needs quality data. This picture is a balanced scorecard, something Kaplan came up with back in the 90s, and it's still pretty relevant to every organization. So how do we decide between value and waste? We've collected a lot of data, and most data shops are a lot like my basement: there's a lot of stuff in there, but I'm not sure which of it I would ever use again. How do we know what's going to be useful to us and what's not? Valuable data helps us make better decisions. It helps us attract more customers, because we can do, for example, what Netflix does: customize your viewing experience based on your preferences, your buying patterns, and things like that. By doing that type of analytics and providing that type of experience, we can attract more customers. And if we have good data, we can monetize it and make more money. So how do we know whether data is good or bad? We have to define our KPIs. This could be at the enterprise level.
It could be at the departmental level. By defining KPIs, you'll understand which data is valuable and which is not. I'm making a very broad statement, but I think it's true: if data doesn't go into any of the KPIs required to run your company, then that data has no value for you. So have a way to decide how to create and maintain the KPIs: data governance. Determine what data is needed to create them: metadata. Understand how data is created, taken care of, used, and disposed of: the data lifespan. Any data that's not used for a purpose is waste. So we're using data governance, metadata, and metrics to also help us determine which data is valuable and which is not.

And create a process that helps you choose metrics. Metrics should not be chosen randomly; we all know that, and I'm sure everybody in the audience has experienced it. You need a standard process for creating metrics: work backwards from what's required by the business. For example, the business wants to increase sales. What are the levers required? What are the measures required? And then, what are the metrics that allow me to monitor that? It's really about having a process that creates metrics that actually have an impact on the business. It can be very simple, but very effective.

Let me show you a couple of examples. Uber and Airbnb measure very similar metrics. One is liquidity: are there enough vehicles on the road at any time, and how good are we at matching customers with cars? Another is how much the customers trust us: they aggregate the ratings you give and use that as a measure of the trust level of customers in a particular area. Zappos is another one. They have a single form that gets filled out, and they call it the happiness experience. It covers a bunch of things: how long were you on the phone with the customer? Did you find out anything personal about the customer? These are the kinds of things they measure from the call center operators, and from them they create a happiness experience score that tells them how happy the customer is. That is one of the core metrics for Zappos. Netflix, on the other hand, has one they call efficient content, which really means: how much happiness are you generating per dollar spent? I don't know exactly how they calculate it, but if the customer spent $10, how happy is he with the dollars spent? The idea is to increase that happiness for that customer. These are the types of core metrics you need. To measure that, Netflix has to go through a lot of analysis; they have to decide what data is needed, and then make sure that data is of high quality, before they can really understand how happy the customer is. It always helps me to understand what data is needed, and why, by starting with the metric and working backwards to see what data it requires. Having a process to do that gives you a leg up on defining metrics.

Good metrics are comparable: you can compare them against other organizations and other people, or against yourself in a different time frame. And they're understandable: they're easy to grasp, so everybody understands, for example, happiness experience. Anybody knows what that means as soon as they hear it.
Good metrics are also usually ratios, because numbers by themselves are not worth a whole lot. Sometimes they may be, but usually ratios are the most useful metrics. And too many metrics is the same as no metrics, because it doesn't help you. If you look at where Google search results used to be compared to where they are now, there are so many ads on my first page that I have to seriously search for a real link that doesn't say "ad" on it. Too many metrics, too many results, too much information is no information, because it doesn't give you any value.

So what is the impact of data quality? We've talked about how to measure the quality dimensions, how to improve the conversation between business and data, and the data lifespan: where data gets created, processed, and consumed. Now let's talk about the impact of quality on those things. Poor data quality creates lack of trust and confidence, potentially missed opportunities, lost revenue, and reputation damage. Trust is easy to lose: one mistake. This happened to me once. We were producing a report that calculated how many emergency room visits a patient had made in the last 90 days. We had the data from the admits and discharges, and we had the data from claims. What we didn't take into account was that some of the visit information we got from claims was already covered by the admit-discharge-transfer (ADT) messages we received. So we counted both and came up with a very large number, which was obviously a mistake. We had to dedupe and reconcile the two records: if I got the same information in an ADT message and in a claim, same patient at the same time, it should count as one, not two.
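A minimal sketch of that kind of reconciliation, assuming simplified ADT and claim events keyed by patient and service date (the field names and values are invented for illustration):

```python
# Dedupe ER-visit events that arrive from two overlapping feeds.
adt_events = [
    {"patient_id": "P1", "visit_date": "2020-01-03", "source": "ADT"},
    {"patient_id": "P1", "visit_date": "2020-02-10", "source": "ADT"},
]
claim_events = [
    # Same patient and date as the first ADT message: one visit, not two.
    {"patient_id": "P1", "visit_date": "2020-01-03", "source": "claim"},
    {"patient_id": "P2", "visit_date": "2020-01-15", "source": "claim"},
]

def er_visit_counts(*feeds):
    """Count distinct ER visits per patient across overlapping feeds."""
    seen = set()
    counts = {}
    for feed in feeds:
        for event in feed:
            key = (event["patient_id"], event["visit_date"])
            if key not in seen:  # skip the duplicate from the other feed
                seen.add(key)
                counts[key[0]] = counts.get(key[0], 0) + 1
    return counts

print(er_visit_counts(adt_events, claim_events))  # {'P1': 2, 'P2': 1}
```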
Even after the deduping fix, it took us about three to four months to convince our customer that we had fixed the problem and that the data was now good. It's very easy to make a mistake when you don't know the data fully, and it's very hard to win the trust back, because that one mistake caused a loss of confidence. On the flip side, good data quality gives you confidence. You can make your decisions confidently, because you know the metrics you got are correct. It increases productivity, because it helps you see where the waste and the deficiencies are so you can fix them. It increases agility, and it definitely improves customer satisfaction. Over the last 10 years, customer service has improved across the board. I know, I know, there are a lot of places where it doesn't feel like it, but in general your vendors know you much better than they used to. That has also led to better marketing: because we know who the customer is, we can segment customers very precisely and do a lot of targeted marketing, which brings more sales and more profits. The impact of data quality on business processes is very direct and directly attributable.

We've almost come to the end, so I have a few takeaways for you. First, understand the different points of view; it has to be a two-way street. Data people deal in knowns and business people deal in uncertainties, and uncertainty creates mistrust. Knowing this, I think we can work towards having better conversations, which lead to more trust, which leads to better data quality. Second, how do we evaluate quality? Product quality is evaluated with tangible and intangible dimensions, whereas for data quality we use quantitative and qualitative dimensions. That is the gap in how we evaluate and understand quality between products and data, and knowing that will hopefully help.

Third, how to talk to data people: explain why. That's probably the most important thing you can do to get data practitioners on board: explain why it's important and how it's going to help you. That requires patience, so have the patience to explain why you need something. Then spend the time to figure out the what together, so you know what needs to be delivered and you have some measures or metrics for what's acceptable. And leave the how to the data practitioners; they don't like to be told how to do things, because they have spent a lot of time figuring out lots of different ways of doing them, and it's best to leave it to them. But make sure to give them a deadline: data people never stop tinkering, so if you don't give them a deadline, you're never going to get anything.

Fourth, capture data the right way. Governance really happens at creation time, so quality at creation time is the easiest to enforce, and it propagates. Obviously, quality changes as a data element that was created with good quality travels through its path in life and gets combined with other things; quality might suffer and have to be fixed again. But the easiest place to impose quality and governance is at creation time. Fifth, the importance of metrics and benchmarks: if we can't measure, we can't improve. Measure what is of value; we talked about what's of value and what's not. More importantly, do not measure vanity. Vanity metrics, like number of followers or number of people that love me, make you feel good, but they're not useful, so they're not necessary. And finally, the impact of data quality: good data makes good things happen, and good data prevents bad things from happening. With bad data, bad things happen. Those are the two things to remember, and that's the impact data has on business. So those are the takeaways I have for you. I hope this presentation has been helpful. Now I'll turn it back to Shannon for Q&A.

Kasu, thank you so much for this great presentation. If you have questions, feel free to submit them in the Q&A section in the bottom right-hand corner of your screen; to find that icon, just click the three dots in the bottom middle of your screen. And to answer the most commonly asked question: just a reminder, I will be sending a follow-up email to all registrants by end of day Monday with links to the slides and links to the recording. So, jumping right in here, Kasu: how much of the nuts and bolts needs to be explained to non-data people? Yeah, as little as possible, right? Although, are you asking how much business people have to explain to data people? Is that what you're asking? It's how much of the nuts and bolts needs to be explained to non-data people, not the other way around. Oh, that way around. Again, it depends on the data person. Typically, I would not explain a lot of the nuts and bolts.
The only time you really need the nuts and bolts, going back to that first slide with the board full of equations, is for the rare person who wants them; most business people don't. There are always what we used to call power users who want to know exactly how you do it, but those tend to be the exceptions. Most business people are interested in seeing the end product.

Looking at the questions we've got... none? That can't be. Okay, it must have been crystal clear. Ah, we've got more coming in. Here's an additional comment on that question: they need to justify the resources required for data quality. How do you do that? It's hard, because if you say, I need a data quality engineer, it's hard to define the job description. You can always say, I need a QA person; from the application side, it's much easier to justify resources. So what we have to do is come up with creative titles. I had a guy who was in charge of my data quality, and I called him the content officer: I need somebody to curate the content, somebody to make sure that all the data we receive and its sources are correct. All of a sudden he had a status that is not QA, because nobody wants to talk to QA. By being creative like that, you can say I need a content officer, as opposed to saying I need a data quality guy. I know that's not much, but it's very hard to justify data quality people, because people don't look at the data pipeline and say, I can justify an FTE there. The other way I used to do it is that everybody gets a little bit of it, and this is really what has to happen: you go to somebody who is already doing something and say, you're now spending 10 hours on data quality, and then you create the need for additional resources in those existing roles, as opposed to standalone data quality roles. Does that make sense?

It does. And what about the case for metadata analysts? Metadata is relatively new, and it depends on the organization, because most of us already collect metadata. If you really need a metadata person, say you're rolling out Alation, one of the metadata tools, then for that person's role you can use a title like librarian, a data librarian; it sounds better than metadata analyst, and that's what they're doing. They're in charge of curating the data, working with the stewards, and they're essentially the ones maintaining the metadata repository. So change the job description and call it a data curator, something that appeals to the business people. Business people understand curation; they don't understand metadata. It's on us as data practitioners to describe the job in terms of what the person is actually doing, as opposed to our technical jargon.

How do you convince non-data people that the data actually does have value to the stakeholders needing it? This is where you're playing the role of Steve Jobs: you're deciding that they need some data, right? To do that, you have to have evidence. To get the ear of someone who needs a metric, when you know he needs it but he doesn't know it yet, you have to help prove why they need it. Can you show that if that metric is used, they'll make better decisions? How is it going to help them? This is why you have to bring proof.
You have to create trust in whatever it is you think they need. Otherwise, it's only you thinking they need it; they haven't seen the need yet, and there's no way you can sell anything to anybody when they don't see the need for it.

How do you move from a minimum viable product to a "most viable product" definition of MVP? I haven't heard that one before. As I think about it, this is how we over-engineer data products: we are planning for the apocalypse when the business side is just looking to cross the street, right? First of all, "most viable product" is, to my mind, pretty hard to define. And if it exists only in the mind of the person who knows the data, it's going to be really difficult to sell those extra things, because they cost money to anybody on the business side. I wouldn't even try to do that.

All right, Kasu. Well, that brings us right to the top of the hour. There are other great questions, and thanks so much to our attendees for being so engaged; we just love the questions that have come in. But again, we're right at the top of the hour. Just a reminder: I will send a follow-up email by end of day Monday with links to the slides and links to the recording for everybody. Kasu, again, thank you so much for this great presentation, such an important and hot topic, of course; we really appreciate it. Absolutely, it's my pleasure. Thanks, everybody, for joining us and staying for the hour. I appreciate it. Thank you. Have a good day. Bye.