Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Officer of DataVersity. We'd like to thank you for joining the latest installment of the monthly DataVersity webinar series Advanced Analytics with William McKnight. Today, William will be discussing the ROI of adding intelligence to data, sponsored this month by Semarchy and Monte Carlo. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We will be collecting questions through the Q&A section, and if you'd like to chat with us or with each other, we certainly encourage you to do so. To open the Q&A panel or the chat panel, you will find the icons for those features in the bottom middle of your screen. Just note that the chat defaults to sending only to the panelists, but you may absolutely change that to network with everyone. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. Now let me turn it over to Steven from Semarchy for a brief word from our first sponsor. Steven, hello and welcome.

Hey, Shannon, thanks for introducing us. Can you hear me all right?

You sound good, and I see your slides just great.

Perfect. Hi, everyone. My name is Steven Lin. I'm the product marketing manager here at Semarchy. I'm going to share with you very briefly how our unified data platform can help you master your data and accelerate the ROI on your data quality. A little background on who we are: we're a master data management leader recognized by most analysts in the industry, and probably by many of your peers. We've done this rodeo a couple of times. We have about 300 clients across the world, in very different industries with very different use cases.
Our main claim to fame is that about 80% of our clients can deliver a tailored MDM solution in about 12 weeks, so a little under three months. And once they get that up and running, improve their quality, and see the benefits, they usually leverage us for more than one use case, or add on another domain. So why is data quality challenging, and how does MDM fit into this? The biggest thing we see from our clients is that the data ecosystem is growing extremely fast, and so are their business needs, and oftentimes not in the same direction, which brings a lot of frustration. And as you're probably aware from your own experience, data needs are less simple now. There's not just a single use case or a single team managing and stewarding all this data to create the high-quality data you need for analytics or other business use cases. It's more convoluted and complex than that. Just because you have a customer 360, that's not the end of it. That's not the only data you need to manage and keep at high quality, because customers buy products, and different partners share data with you to create those products. Customers also give you money, so finance needs to be involved to make sure invoices are collected and reconciled, and so on. So there needs to be something foundational to tie all this together and organize it in a way that's actually meaningful. But we still recognize there are a lot of headaches, because without something to organize the complexity and make it more manageable, you run into situations where you're asking these very vague but important questions: what is the value of doing something like this? Where do we start? How do we deliver, and who can we trust?
Because so far, maybe an Excel spreadsheet has worked, but it won't keep working forever. Those are some of the things we run into quite frequently. The main point of MDM is that it helps accelerate alignment between what your business needs and what your data teams need, and it creates a collaborative foundation so you can actually measure and deliver ROI on your data quality initiatives. Okay, so that's a lot of talk. How do I actually get started today? What are some lessons we can share from our decade of experience? The biggest thing we see in our clients' success is starting small: really prioritizing a single use case for a single domain. Most of the time that's customer 360, just because customer is typically the largest data domain for our customers, but there are very different places to start depending on which domain is most important and most critical for you. And then actually aligning it to results, to an actual business KPI. Oftentimes we see our clients get bogged down in baseline metrics, like "we increased our data accuracy" or "we reduced duplicate data," but they aren't sure how to align that with what actually matters to the business, say increasing month-over-month revenue growth: an actual business KPI that matters to them. Then, as we look at solutioning and design and scaling, because the future is always going to be changing, something we preach to our clients and build into our solution is being future ready: something easily configurable and no-code, not so complex that business users can't understand it; something flexible with an open architecture, so as new data sources come in and technologies change, it can rapidly adapt without your having to migrate to another solution; and something that's a unified data platform like ours.
That means something with more than one capability, which can help you address the different angles and areas where data quality challenges can creep in. So what do we actually sell? We have two modules in a single unified data platform. xDI helps orchestrate data from your source systems to your target systems, and xDM is the mastering engine that you configure for your business without any code. It then generates custom applications for your business teams or data teams to manage the data and improve its quality collaboratively, and finally pushes trusted golden records with high-quality data to your analytics or operational needs, updating your systems and applications. So that's a quick look at what we can do, and you can deploy it anywhere, whether on-prem, cloud, or hybrid. If you want to ensure that your business teams and data teams are working together to improve the ROI of data quality, Semarchy is definitely here to help. Thanks, Shannon, back to you.

Steven, thank you so much for kicking us off, and thanks to Semarchy for sponsoring and helping to make these webinars happen. If you have any questions for Steven, feel free to submit them in the Q&A section of your screen; he'll be joining us for the Q&A at the end of the webinar. Now let me turn it over to Jesse from Monte Carlo for a brief word from our second sponsor. Jesse, hello and welcome.

Hello, thanks for having me. I'm going to do a quick overview of Monte Carlo and how it fits into this conversation, because the ROI of data quality really does tie in closely to what we see a lot of at Monte Carlo. By way of quick introduction, I'm Jesse Mailer, and I'm on the product team at Monte Carlo. We're going to talk about something we call data downtime. It's a term we use at Monte Carlo, and it's probably best described with a meme.
Data downtime is very much encapsulated by this meme. It's a period of time when your data is down: unavailable, full of errors, or otherwise not accessible to your team. And this has become something that a lot of data teams are just okay with. We know it's going to happen, so we sit in the burning room and tell ourselves it's fine. It's quite common among the customers we talk to and the people asking us about data quality. But we know that data downtime and data quality issues have a huge impact on your business. From talking with our customers, we know that around 70 high-severity incidents occur every year per 1,000 tables. We also know that when those incidents occur, around 30 to 50% of a data engineering team's time is spent on fire drills to correct these data quality issues. Furthermore, roughly 80% of data science and analytics teams' time goes into fixing, cleaning, and preparing data. So there's a huge time sink in just working with data, and often with bad data at that. And to bring it home: research suggests that around 12 to 27% of annual revenue can be lost to poor data quality. That's a big impact from data downtime and poor data quality. We also know that 90% of data downtime incidents are detected downstream by consumers, and only 10% are caught upstream, at the time the code is written or through automated tests.
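To make those figures concrete, here is a rough back-of-the-envelope estimate of what data downtime might cost a single team, using the incident rate quoted above. The table count, hours per incident, and hourly rate are purely illustrative assumptions, not Monte Carlo's methodology.

```python
# Rough cost-of-data-downtime estimate built from the figures quoted in the talk.
# Inputs marked "assumption" are illustrative, not sourced from Monte Carlo.

tables = 2_000                   # tables in the warehouse (assumption)
incidents_per_1k_tables = 70     # high-severity incidents/year per 1,000 tables (quoted)
hours_per_incident = 8           # engineering hours to resolve one incident (assumption)
hourly_cost = 100                # loaded cost per engineering hour, USD (assumption)

incidents_per_year = tables / 1_000 * incidents_per_1k_tables
annual_firefighting_cost = incidents_per_year * hours_per_incident * hourly_cost

print(f"{incidents_per_year:.0f} incidents/year")               # 140 incidents/year
print(f"${annual_firefighting_cost:,.0f} in engineering time")  # $112,000 in engineering time
```

Even with conservative assumptions, the fire-drill cost alone adds up quickly, before counting the 12 to 27% revenue impact.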
And ultimately, that 90% of delayed detection means days, weeks, or months pass before these issues are discovered, detected, and ultimately resolved. Now, amid all this negativity, there is a good part: data downtime and data quality incidents look largely similar across companies. You can see this by asking some common questions. Is my data up to date? Does the size of the data look off? Why is this value suddenly higher than the normal rolling average? There are many such questions we can ask, and they give us a template of what data downtime and data quality incidents look like. And that's really what we do at Monte Carlo: we've formed five pillars of data observability to address data quality incidents and data downtime. These five pillars form the foundation of the Monte Carlo platform: freshness, volume, distribution, schema, and lineage. Together, they help our customers detect, solve, and ultimately try to prevent data quality incidents in their data pipelines. So that's what we do at Monte Carlo. We have some great resources out there about data quality incidents; if you'd like to learn more, look us up afterwards. Back to you.

Thank you so much, and thanks to Monte Carlo for also sponsoring and helping to make these webinars happen. Likewise, if you have any questions for Jesse, feel free to submit them in the Q&A section of your screen; he'll be joining us in the Q&A at the end of the webinar. Now let me introduce our speaker for the series, William McKnight.
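Two of the five pillars, freshness and volume, can be illustrated with simple threshold checks. This is a minimal sketch of the idea, not Monte Carlo's implementation; the function names, windows, and tolerances are all assumptions for illustration.

```python
# Minimal sketch of the freshness and volume pillars as threshold checks.
# Thresholds and names are illustrative assumptions, not a vendor API.
from datetime import datetime, timedelta, timezone

def check_freshness(last_loaded_at: datetime, max_age: timedelta) -> bool:
    """Freshness: did the table update within the expected window?"""
    return datetime.now(timezone.utc) - last_loaded_at <= max_age

def check_volume(row_count: int, rolling_avg: float, tolerance: float = 0.5) -> bool:
    """Volume: is today's row count within +/- tolerance of the rolling average?"""
    return abs(row_count - rolling_avg) <= tolerance * rolling_avg

# Example: a table loaded 2 hours ago against a 6-hour freshness window,
# and a load of 40 rows against a rolling average of 100.
recent = datetime.now(timezone.utc) - timedelta(hours=2)
print(check_freshness(recent, timedelta(hours=6)))  # True
print(check_volume(row_count=40, rolling_avg=100))  # False: 60% below average
```

Real observability tools learn these thresholds from history rather than hard-coding them, but the questions being asked, "is it late?" and "does the size look off?", are exactly these.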
William has advised many of the world's best-known organizations. His strategies form the information management plans for leading companies in numerous industries. He is a prolific author and a popular keynote speaker and trainer, and he has performed dozens of benchmarks on leading database, data lake, streaming, and data integration products. With that, I'll give the floor to William to get his presentation started. Hello and welcome.

Hello, and thank you, Shannon. Thank you also, Steven and Jesse, for those great presentations up front. Hello from rainy Texas today; the rain is much needed here. Okay, I'm talking about a topic that has been around since probably the dawn of man, or the dawn of language. I can certainly imagine cavemen back in the day, in their grunting way, saying to each other: well, I made a mark on this stone for every spear we have, and I thought there were 100, but there are only 99. Where's the missing spear? And that has carried forward into the explosion of data quality issues we have today. So, to be sure, it's still an issue. What I want to do is clear up some of the cobwebs around data quality and get you all on a good path to improving something that, as Jesse just showed us, is so important to the bottom line of our business. I'm going to talk about what data quality is, how you know when you have enough of it, what a violation looks like, what we can do about it, and how we ultimately determine the ROI of data quality efforts, which I think is a good way to help you justify adding data quality to whatever it is you're doing. You'll certainly hear a lot of my philosophies coming through: where I find data quality should be placed in the organization, and so on. So let's dive in. Like Jesse just said, enterprise data is still a mess, and the proliferation of data sources is part of what I attribute that to.
The complexity of data formats is another thing I attribute the problem to. We now have so many more formats and so many different types of data stores: relational databases, cloud storage, NoSQL databases, and so on. New data types that turn out to be very important seem to come up every three to four months, and vendors have dedicated teams just keeping their software up to date with these formats, which generally means handling different kinds of data. Then there's the lack of data governance. I still have to say the lack of data governance, although I do believe that as time has gone on, a lot more enterprises out there are acknowledging the need for what they're calling data governance. But we're still in the early days of making it truly effective in a lot of organizations, so I'm going to address that, because data governance is where the data quality rules come from. That's where we start to learn where we have violations and what we can do about them. This is not a technical issue. This issue of data quality requires business interests to be heavily involved in the process, and that can become the bottleneck, so we'll talk about that as we go on. And then there's this big push into AI, which is really a new form of data utilization. Like we need another one; well, apparently we do. So we are marching headlong into using data for these really complex and important purposes, and using every last square inch of data we can get our hands on.
And so it is all the more important that the data have great quality. You see, what gets the budget in organizations is often the uses of the data; it's what the executives see: I want this dashboard, I want this utilization of data, this application. But what about the data those things use? That is where the majority of the budget should go. The data should be screaming out, "Hey, here's what you get to do with me," and maybe even doing some of it on its own, but not if it's of poor quality. If you keep putting band-aids on a gushing wound by only continuing to work on the dashboards and the things above the waterline that everybody gets to see, and not the 90% of the effort that's behind the scenes, then you're really not getting far. So I must say: we've got to focus on the data. If we're going to fix data quality issues, we can't just keep changing data in the dashboard, changing data for this report, changing data for this KPI, because it's coming off the database wrong and we're just going to fix it right here. Let's fix it back closer to the source. "But William, we have the CDO function in our organization." I've been consulting to CDOs; I have a CDO advisory service and have been doing this for about five years, and I can tell you that it tends to become a political position, like a lot of other things. So I try to keep the CDO focused on hyper-shared artifacts like the data warehouse, the data lake, and master data management. By the way, "hyper-shared" is my term; you know what I mean: these have a lot of leverage. They're used once, they're used twice, they're used many, many times for many, many things. So the leverage is higher in these types of artifacts. That being said, I do acknowledge that at any point in time in an enterprise, there is one application that is the feature application for these hyper-shared artifacts, and we do focus on that.
But over time, if done right, these artifacts support many, many applications, so let's focus there. Data architecture: yes, that should be the CDO's focus; I'm just rounding it out here. Data innovation: new ways to use the data. The CDO is responsible for driving innovation in the organization's use of data. This includes exploring new data technologies and applications and developing new ways to use data to improve the organization's business performance. And finally, I see the data governance function, and the data quality function that flows from it, coming under the CDO's focus very intently. So when it comes to who at the executive level is going to be responsible for the issue we're talking about today, I'm saying the CDO, if you've got one, and studies show that a lot of you do. Data quality is essential to business success. Correct data is a widespread need, yet data quality lacks a consistent definition, so we've got to put some definition around it, or we're not going to get anywhere. In so many organizations, people walk around saying the data lacks quality, and nobody drills in on what they're actually talking about. Is it missing? Is it wrong? Is it not fit for purpose? Are there gaps? I'll get into the various data quality defects as we go along, but it's important to drill in. Half the time, when people are talking about data quality, they're not describing what I would call data quality issues; it's more that the dashboard looks wrong, or we're loading the wrong sets of data. These aren't necessarily data quality issues, but allow your stakeholders the freedom to say these things, go with the flow of what they're saying, and figure it out. So you need a definition for data quality, and I'm not trying to be profound: a lack of intolerable defects in the data. That's about the most unprofound statement I could make about it, yet it carries the connotation of how I want you to think about data quality: a lack of intolerable defects.
So you see, we are dealing with a discipline, data quality, that most people don't care about until they do, until it bites. It's usually not considered critical path. What is critical path? Building the application, building the database that will support the application, building the dashboard or the AI or however you're going to use the data. Those are considered critical path, but in many organizations, data quality hasn't quite crossed the chasm into being in line with those things, considered up front, early and often. At least after this webinar you'll know, and you probably already know, that data quality is really important, and you must now be an advocate for it. And you must cite the downside risk of doing an application or an AI without it. I forget who it was at the Teradata show last week who said from the speaker platform that if you cannot trust your data, you cannot trust your AI, and I couldn't agree more. Absolutely. So consider these business imperatives, and this is going to lead into some architecture. In-store and contact center cross-selling and up-selling: what do you need to do that function well? Clean customer and product data. You see, you can cross-reference your business imperatives to your business subject areas, and those subject areas are mastered somewhere within the organization, or maybe they're not, they're all over the place, and they should be mastered. So we're talking here about customer; I think Jesse alluded to customer being very important. Credit card fraud detection needs clean customer and transaction data; almost everything needs customer data. Supply chain efficiency needs clean product and location data. Of course, I'm just scratching the surface here when it comes to really doing a great job at any of your business imperatives.
There are a whole lot of data subject areas that need to be clean, but let's start with the biggest bang for the buck: often customer and product. Now, I suggest these be put in leverageable artifacts like master data management for the kinds of things we're talking about here today. All of these business imperatives have failed or underperformed somewhere because of incomplete, incorrect, or inconsistent data; let's say data quality issues. So I think it's important for you to go now and cross-reference all your business imperatives with subject areas. That will help you see how important some of these subject areas are, and how important it might be to build them once, in MDM I suggest, and use them many times for all of these applications, rather than have everyone roll their own out there. Okay, so I'm going to make some investments in data quality, and I'll show you a little later where those investments can go. But what are these investments going to do? They're going to give you cleaner data. And somebody somewhere is saying "so what" to that. By the way, business objectives cannot be met without quality data in support; yes, we know this. Data quality's returns are in the improved efficacy of projects. You came for data quality ROI, but I kind of fooled you: there is no standalone data quality ROI. The ROI is in the applications that use the data, which hopefully will have enough data quality to attain that ROI. So data quality should be an integral part of most projects; hopefully you'll agree with me by the end of this. It's important to know about data quality, and it's important to act on data quality issues. I think a lot of us know, but we're not all acting; maybe we don't know what to do. It's not a proper strategy to just beat the data into submission, without doing things systemically, and without business input and governance.
So you don't necessarily need a tool; we'll get to that. Tools can be distracting, but it depends on the data and the level of data quality issues in an organization as to whether I would recommend one. Remember I said somebody somewhere is saying "so what": the benefit of clean data alone is not enough to take to your organization. Now, in some data-driven organizations, the ones that have crossed that chasm and recognize the value of data from top to bottom, and there are not too many of them, you can get away with that kind of pitch more. But even in those organizations I would say, let's drive back to ROI. Let's talk about the strategic benefit. How is it going to lower TCO, which is a form of ROI: it just means you're lowering expenses. Data quality should have a value proposition to a project or projects. And let's go in a stepwise manner here. These are not ROIs, but they lead to ROI. You've got to understand what category of thing you are improving by improving data quality. Are you improving decision making? Are you increasing efficiency? Are you reducing risk? And so on, all the things you see there. All of these can be taken further into ROI. You've got to start here, but you cannot stop here; that's kind of the point. So let's get to data governance. Some of us are calling this part of DataOps now, or just straight-up DevOps, however you say it. People experience a range of emotions in the process of transformation, and data governance is about transformation of the organization. We want people to be involved, and we want to use our data governance program to keep business interest in data quality, to keep the idea that data quality is important alive in the organization.
Probably most of you can find five ways in the next month to inject this idea into your business. Maybe it's through a presentation you're giving; maybe it's a side comment to an executive at the right point in time. I don't know; there are a few different ways you can keep it alive. But I can tell you that if you have good data governance going in your organization, marching toward being data driven and that sort of thing, it will grease the skids for your data quality program. You won't have to fight so hard to get funds for data quality, and you won't have to fight so hard to prove that it's important. However, most of you are probably still going to have to fight the good fight here. Okay, now when it comes to data governance, there's a wide mix of effectiveness among implementations, and I'm afraid many enterprises have made it more of an academic exercise. This very malleable term "data governance" cannot remain a theoretical exercise for long. I've seen them; we all have. I still say a good 50% of the so-called data governance programs out there are not helping applications, and that's what I want these programs to do: help applications. Without establishing accountability and tangible delivery, establishing data governance is not helpful. Deliver to the organization both in support of projects and applications, and as a horizontal organizational function; but mostly, if you're not in support of applications, then the data governance is not accountable, and it's not going to last very long in an organization. Governance needs to strongly align itself with those data stores that have high leverage in the organization. I've mentioned this probably three times now: we keep coming back to leverageable data stores, MDM, the data warehouse, the data lake, those sorts of things.
I'm going to pick up the pace a little here. A lot of data quality, as I said already, is about the business, about communication. Without a basis in data quality, the counterparty has no idea what you're talking about. It's like people on Twitter just talking into the air, and that's what it feels like when an organization is having its data quality discussions: oftentimes people talking over each other, talking past each other, not understanding. So bring some ideas from your experiences and try to lay down that culture. Listen actively and attentively, be open and honest, be patient and understanding, be flexible and compromise where possible, and so on. Don't make it rigid: "you must do this or the company is going to fall over tomorrow, change this one field." Speak appropriately in your communications about data quality. It's real work, by the way; it's not an afterthought. You've got to think about those messages and about being effective with your message, especially today when we all seem to be working remotely and we don't have a lot of face time; we have to make the most of our limited interactions. So I do recommend, for data governance, and this is not the data governance presentation, but it helps data quality: have those meetings on a regular basis, make decisions and take actions, have great timing, and understand that the focus will change over time as the applications change over time. Part of data quality is changing processes and changing data, and changing those things anywhere requires extreme buy-in and education: education about the change. So if we want to make changes, we need some support, and data governance can be that support for us as we assure some data quality.
Only a methodological approach will work. It must be a repeatable process with progressive improvement; you're not going to go from zero to 100 overnight. You're going to have to make little changes. It's like going to the gym to improve your bench press: it's not going to go from 135 to 225 in one session, but it might inch its way up to 140 next week, and so on. That kind of progressive improvement is what we need out of our data quality. I'm going to show you how to measure it in a bit. This methodological approach must also encompass new data, because we're always getting new data and changing requirements, so the program can't be locked down to the way things are today. You see, the causes of poor data quality keep coming in the front door. We're behind the door trying to clean up what's in our house, but we keep opening the door, and in comes new trash. I don't know if that was a great analogy, but you know what I mean: the causes of poor data quality keep coming in, so we've got to get on top of it. So let's have a data quality improvement program: define the quality expectations, profile the data, measure the data quality improvement options, select the best one, and then go about improving the data. I have whole presentations on just this slide, just that process. But this is the process, and it's a mouthful. I just said "define the quality expectations" as if we all know what that is; we'll get to it. I'm just setting the stage here before we dive into the pieces. It's important to define the expectations; it's not going to be 100%, but define where it needs to be.
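The "define expectations, then profile" steps above can be sketched very simply. In this hypothetical example, we grade each column's completeness against an expected threshold; the sample records, columns, and the 95%/90% expectations are all illustrative assumptions, not a prescribed standard.

```python
# Sketch of "define expectations" and "profile the data": grade each column's
# completeness against a target so you know your baseline before improving.
# Sample data and thresholds are illustrative assumptions.

records = [
    {"customer_id": 1, "email": "a@example.com", "phone": "555-0100"},
    {"customer_id": 2, "email": None,            "phone": "555-0101"},
    {"customer_id": 3, "email": "c@example.com", "phone": None},
]

def completeness(rows, column):
    """Fraction of rows where the column is populated."""
    filled = sum(1 for r in rows if r.get(column) not in (None, ""))
    return filled / len(rows)

# The defined expectations: keys are columns, values are minimum completeness.
expectations = {"customer_id": 1.0, "email": 0.95, "phone": 0.90}

for col, expected in expectations.items():
    actual = completeness(records, col)
    status = "PASS" if actual >= expected else "FAIL"
    print(f"{col}: {actual:.0%} complete (expected {expected:.0%}) {status}")
```

Completeness is only one dimension to profile; accuracy, uniqueness, and conformity checks follow the same pattern of measuring an actual value against a defined expectation.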
And I highly recommend grading the data quality in every one of these databases, these leverageable artifacts and others: profiling the data so you know what your grade is now, then measuring some data quality improvement options, because there are always options for what you can do about data quality. A lot of the time the options fall into a few buckets; I'll get into it, but let me lead with this now. You're going to change data entry, you're going to cross-check data to see if it's correct, or you're going to change the data. Those are some of the things we can do about it. Now, what do we want to do something about? These are the data quality rule categories, and if they are violated, you have a data quality violation. Congratulations: we probably all have a bunch of them. Business processes with data-driven insights: are they aligned? If business is conducting business as usual without bringing enough data to bear on it, you might say, "William, that sounds like an application issue; you're getting into applications." I'm going to pull it under data quality, though, because you're not using the data properly when your business processes are not aligned with the data, and it's often because of communication. Data-driven decision making: are decisions being made based on the data you have readily available, or not? Does the data conform to referential integrity? Does it have the uniqueness you would expect in a field like customer number, which should be unique? If it isn't, that's a quality violation. Is the data reasonable? If you're B2C and you're keeping track of, say, people's cars, is it reasonable that one person could have one car? Of course. One to two? Yes, of course. One to, I don't know what the magic number is, ten? Beyond that it may look kind of fishy: maybe there was a fat-finger entry, or maybe somebody was using that field for something else, and on and on.
All these things happen inside organizations, and I'll give you some examples, but the point we're getting at is: when are you doing enough to meet the quality standard that is required? I would also add here that the quality standard for a data warehouse, or one of these individual platforms, is higher. So these are your four actions you can perform; actually, I've got five actions to perform for data quality, and I mentioned a few of them before. Screen data entry. Add cross-checking. Quarantine data, which is just a transfer of the data quality problem to somewhere else; it's a bit like kicking the can down the road, but still, you can do it and prevent the quality violation from going forward into your architecture. Report on quality violations: let the data flow, but let somebody know who should care, so they can fix the future and it doesn't keep coming in that front door. Or change, repair, and correct data to conform to data quality. All of this is about preventing improper use of data and raising awareness of data quality. Yes, data quality can be automated, and as much as you can automate, go for it. Data profiling can be used to automate data quality checks, data cleansing, validation, everything you see there. Now, most of us are in a position of data architecture, and I wouldn't go out and start implementing data quality automation without getting it cleared through data governance. Data governance should of course have subject matter experts on the different domains, and they should be empowered to speak on behalf of the organization for those domains. If you've got that set up, then you're running data governance efficiently and you can do things like this. Put your data quality in a leverageable platform; I think I've already hit that point with a hammer.
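A sketch of how those actions might be wired into a pipeline: each record runs through per-rule checks, and each rule carries an action (screen, quarantine, report, or repair). Which action applies to which rule is illustrative here; in practice, as noted above, data governance sets that policy:

```python
# Sketch: route each record through per-rule actions. The action
# assigned to each rule is an illustrative placeholder; governance
# decides the real policy.

quarantined = []   # held back: moves the problem, but stops bad data
violations = []    # reported, yet allowed to flow onward

def process(record, rules):
    """rules: list of (check_fn, action, repair_fn_or_None) tuples."""
    for check, action, repair in rules:
        if check(record):
            continue                      # record passes this rule
        if action == "screen":            # reject at data entry
            return None
        if action == "quarantine":        # hold for later review
            quarantined.append(record)
            return None
        if action == "report":            # let it flow, notify an owner
            violations.append(record)
        elif action == "repair":          # correct to conform
            record = repair(record)
    return record
```

A malformed email can be repaired in flight, while a record missing its key is quarantined so the violation never moves forward into the architecture.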
I've added data hub here because some things are not called these other names, but they are still a leverageable platform for multiple applications, which I think is great, and I like to focus on those things. Every project needs to focus on data quality. Clean data is the key to unlocking the power of many processes, including some of the ones I mentioned before. Having clean customer, product, transaction, and location data is essential for these projects to be successful. So wherever you have your clean data (maybe it's a hub, hopefully it's more MDM-based, or maybe it's in the data warehouse if it's more transaction-based), there should be a go-to place. And it shouldn't be that every application needs to create its own copy of the data warehouse because they don't like the transformations that somebody somewhere did, or maybe that group's hard to work with, or (I'm just shooting in the dark here) they just like control. If that's the case, then you need to become a more effective IT organization, or address those shadow IT organizations, so that you are providing leverageable data to the enterprise. Now, how much? Here's the big question: how much money should be spent on data quality? Well, sometimes I'm given zero, and I know you're given zero, but we do it anyway; we find other categories of effort within the project to tuck data quality into. That's not great; it's less effective, but it's all about how we want to phrase things. To some people it's perfectly fine to do data quality as part of data integration, as part of the architecture. Okay, that's fine. That's great. But if it's not, and somebody is saying, "Don't waste your time on data quality, it doesn't matter"...
But 10% of the budget, somewhere, some way, somehow, should be spent on data quality for most projects, because most projects use data. That's my starting point, and I can definitely be swayed to more or less based upon the level of data being used and so on, but as a rule of thumb: 10% of the budget on this stuff. Note that data quality will never be perfect, but hopefully it becomes all of these things; this is your goal. Make it one through ten, and then you've got number eleven: fit for purpose. So how do you know? Well, this is not easy, but I do advise that you score your data quality: adherence over the possibility. Understand what you're expecting out of the data for it to be hugely successful, and then go profile to measure where you are against that. Okay, and the tricky part here is the bottom sentence: multiple weighted rules are used to determine the overall system score. Where many organizations fail is that they hear this, they get it, they want to do it, but how do you come up with, say, ten data quality rules that are going to define the data quality for this data warehouse? How are you going to do that? With the help of governance, hopefully, but it can be done. And I'm one that's going to trial-balloon things, and I'll apologize later if I got it wrong, but I'm going to start putting scores out. People are going to start asking, "Where did you get that score?", and maybe that will generate the conversation necessary to get that scoring down a little tighter. I really like it when the culture understands that we are scoring data quality. That is a motivator right there to get the data quality score up, and you can do celebrations around that and so on. Next, the cost to the enterprise of poor data quality. Yeah, there are so many costs.
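To sketch "adherence over the possibility" with multiple rules rolled into one system score, here is one way the arithmetic can work. The rules and weights below are made up for illustration; as said above, governance should set the real ones:

```python
# Sketch: an overall data quality score as weighted rule adherence,
# scaled 0 to 100 so it maps onto familiar grade-school letter grades
# (90 and up is an A, 80 to 90 a B, and so on).

def rule_adherence(records, rule):
    """Adherence over possibility: passing records / total records."""
    return sum(1 for r in records if rule(r)) / len(records)

def overall_score(records, weighted_rules):
    """weighted_rules: list of (rule_fn, weight). Returns 0 to 100."""
    total_weight = sum(w for _, w in weighted_rules)
    weighted_sum = sum(w * rule_adherence(records, rule)
                       for rule, w in weighted_rules)
    return 100 * weighted_sum / total_weight
```

With two records, an id rule weighted 2.0 that both pass, and an email rule weighted 1.0 that one fails, the system scores (2.0 × 1.0 + 1.0 × 0.5) / 3.0 × 100 ≈ 83.3, a B.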
And Jesse mentioned some of them in his presentation before, but here are some costs. One-off data quality remediations, repeated: I fixed data quality but didn't make it systemic, I put a band-aid on my problem, and the rest of the organization still has a big data quality problem, so we're going to repeat data quality efforts. Failed enterprise initiatives: that's huge right there. And sometimes you don't know until down the road that you've been doing, let's say, predictive maintenance with bad data, or targeted marketing with bad data, and so you haven't been doing a great job at it, not as great as it could be. So whatever the ROI is on those projects, it could be improved if you had better data quality. How much ROI for how much data quality improvement? Well, did I say this was all science and no art? It isn't; it's a lot of art, a lot of abstract thinking. And that's one thing I like about it: we're not doing data quality strictly by the numbers here. It's so multifaceted that it requires you to think in the abstract quite a bit in order to get this right. For example, which rules are you going to go after? Which applications are you going to suggest will be improved by you doing this? And by the way, what thing are you doing, because there are a few things you can do. To round it out here: misguided roadmaps, where you decided on a roadmap based upon data quality issues and it comes back to bite you. Compliance costs, which are a driver for a lot of data quality efforts out there today; you might end up in a situation where compliance, not only adhering to it but maybe being in a fine situation, is costing you. And then you can attribute a cost of so many dollars per bad data record, whatever number makes sense.
Failed outreach: you're losing customers, and therefore you lose their customer lifetime value to the business. Storage space for keeping poor data with a bunch of duplicate records. Incorrect marketing segmentation and personalization. These things really add up, and the cost keeps expanding, because on average corporate data is growing at 40% per year. So until you get a handle on your data quality issues, it just keeps growing; that front door keeps opening. So let me give you an ROI example here, a simple one that I hope we can all relate to: targeted marketing. We're sending out mass emails or something with our promotion on it. Okay, so we've got a bunch of customer data. Some of it's bad; some of it's always bad, but we want less bad contact data. We want more good, I guess I could have said. We want to improve our customer segmentation, because bad data means bad segmentation, meaning bad marketing, meaning lower ROI on the marketing initiative. That's the bottom line. So we have these hypothetical data quality scores: 85, 90, and 95. "Well, William, where did the score come from?" I just went over that: it's adherence over the possibilities, and you're pro-rating a bunch of different rule scores to come up with an overall number. I like the scoring to land where our scoring used to land in grade school, where 90 to 100 is an A, 80 to 90 is a B, et cetera, but to each his own as far as that goes. So you can reach a certain number of prospects based upon a higher data quality score, and you will get a higher return on marketing with better customer profiles, because the targeting is better. But the average profit of a conversion doesn't change. So you do the math. Then you ask: what is the investment (I'm on the second-to-last column there) to get to 95, to 90, to 85? What if I do nothing? Okay, then the score is 60 and the investment is zero.
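Here is that arithmetic as a sketch. Every number below (prospects, conversion rates, investments) is invented for illustration; only the pattern is taken from the example: higher scores reach more prospects and convert better, while the average profit per conversion stays fixed:

```python
# Hypothetical ROI comparison across data quality scores. All figures
# are made up; the arithmetic pattern is what matters.

PROFIT_PER_CONVERSION = 50  # average profit per conversion, held constant

# (dq_score, prospects_reached, conversion_rate, investment)
scenarios = [
    (60, 100_000, 0.010, 0),        # do nothing: score stays at 60
    (85, 120_000, 0.014, 20_000),
    (90, 130_000, 0.017, 40_000),
    (95, 140_000, 0.019, 60_000),
]

def net_return(prospects, rate, investment):
    """Marketing profit minus the data quality investment."""
    return prospects * rate * PROFIT_PER_CONVERSION - investment

results = [(score, net_return(p, r, i)) for score, p, r, i in scenarios]
# In this made-up table, each step up in score nets more than it costs.
```

Doing nothing nets 50,000 here, while investing 60,000 to reach a score of 95 nets 73,000: the data quality spend pays for itself in this hypothetical.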
You don't have to do anything extra to get your 60, and then you can measure the ROI. As you can see, in this example the higher the data quality score, the higher the ROI. Oh, but look: you had to spend, what is it, 100,000 more to get to the 95 data quality score? That's okay. It pays for itself, and I'll have a big statement for you in regard to that as we go on. So how do you know this stuff? Well, fortunately, I've never really had to know this myself, but I do know how to set the table for the business interests, in this case marketing, that should know it. And if they don't, there are things we can do. We can do some A/B testing in a limited market to see how we do with different levels of data quality. Or we can guess, and by a guess I don't mean just sticking a finger in the air; a good educated guess goes a long way. I just wanted to give you a quick time check: we're about 12 minutes away. Okay. All right. Yeah, so much to say. Okay, let me finish up here. This is a different example, different data quality scores, same kind of deal, but here we see that raising the data quality score all the way up to 99 is not going to give you the best ROI. We find the best value proposition here is: spend the 200,000, get it to 98, call it a day, keep it there, and move on. Now, this next slide is more of the example we just showed you, in reverse: what can we do? How do we get the data quality score up? Remember, I said there are a few ways: screen the entry, cross-check the data (which is what we're going to do here), or change the data based upon what we know. Cross-checking the data here means that we're going to use a third-party service to cross-check our customer addresses and customer demographics.
We're going to purchase additional prospects that look like our good prospects, and so on. We're going to use multiple, different third-party providers and combine the corresponding deduplication results, so not just one but two. A combination of these things is going to drive our data quality score up, which in a limited way will drive up the overall ROI of the application. Here are some other examples. I'll just mention AI applications. Many of us are doing AI applications in all our verticals. Inaccurate or non-representative data leads to biased and inaccurate results, which lead to subpar application ROI and potential compliance costs. And increased computation on all that inaccurate data leads to increased expenses; that's a factor when it comes to AI applications. So here's my big bottom line for you: if you can materially, say by 1%, improve any of these types of items, or literally a hundred other things you can think of for your business, and you do a reasonable job with data quality, then data quality will more than pay for itself. If you can help the fraud detection application detect 1% more fraud, if you can help the predictive maintenance application predict with 1% greater accuracy, et cetera, it should more than pay for any data quality initiative. But the key is not only doing it, but making sure that people know that, yes, data quality did this. All right. Now I've got some miscellaneous items to round out my part of the presentation. If you have any questions for me or Steven or Jesse, please go ahead and put them in the Q&A panel and we'll get to them in just about two minutes. I have some things here about when to consider using a data quality tool, and I've had to make up some thresholds in terms of numbers of attributes and entities and that sort of thing, but what I'm trying to say is that at a certain level of environment complexity, a tool can really help out.
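One way to combine deduplication results from multiple providers, sketched here as a simple majority vote, might look like the following. The provider functions stand in for hypothetical third-party services, and the majority-vote combination is one illustrative strategy, not a prescribed one:

```python
# Sketch: merge a candidate pair of records only when a majority of
# (hypothetical) third-party cross-check providers agree they are
# duplicates.

def majority_duplicates(pairs, providers):
    """pairs: candidate (record_a, record_b) tuples.
    providers: functions returning True if a pair is a duplicate.
    Returns the pairs confirmed by a strict majority of providers."""
    merged = []
    for a, b in pairs:
        votes = sum(1 for is_dup in providers if is_dup(a, b))
        if votes > len(providers) / 2:
            merged.append((a, b))
    return merged
```

For example, two records agreeing on email (case-insensitively) and name but not ZIP code win two of three votes and get merged, while a pair with no agreement is left alone.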
And by the way, those big data collection systems don't all get everything exactly once; sometimes they get it multiple times. You have to look at your particular data store to see how it's doing it, and be careful with that. Data catalogs, I think, are definitely part of the data stack now. They serve as the metadata store for all services, including data integration, prep and transformation, the data lake, the data warehouse, and machine learning. We could do a whole presentation on data catalogs and all the wonderful things you're doing with them out there. If you haven't started your journey there, your environment is only going to get more complicated over the next five to ten years, not less; we're not at a point where I can even foresee environments getting less complicated. So now's the time to start putting some of those pieces around our environment. Streaming data quality: wow, yeah. Data is coming in so fast that it's extra difficult to assess and maintain data quality. Yet we still need data quality even in streaming data. Maybe we're not checking for referential integrity and things like that, but we should be continuously monitoring streaming data to identify anomalies, outliers, and deviations from expected patterns, and we should be able to do this without slowing down the stream. Many implementations I have seen evaluate streaming data just based upon loading the data (I'm guilty of this too) without also checking what the overhead would be of doing a few data quality checks on that data. We still want to stay ahead of the stream, so please do that. Data lineage has become really important, especially in the era of compliance. It provides you a graphical representation, impact analysis of change, and root cause analysis.
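One lightweight way to watch a stream for deviations without slowing it down is a running z-score check; Welford's online algorithm keeps the mean and variance in constant time per record. This is a sketch of the idea, not any particular tool's approach:

```python
# Sketch: streaming anomaly check with a running mean and variance
# (Welford's algorithm), so each observation costs O(1) and the check
# can stay ahead of the stream.

class StreamMonitor:
    def __init__(self, threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0              # running sum of squared deviations
        self.threshold = threshold  # flag beyond this many sigmas

    def observe(self, x):
        """Return True if x deviates more than `threshold` standard
        deviations from the values seen so far, then fold x in."""
        anomaly = False
        if self.n >= 2:
            std = (self.m2 / (self.n - 1)) ** 0.5
            anomaly = std > 0 and abs(x - self.mean) / std > self.threshold
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomaly
```

After warming up on values oscillating around 10, a sudden 100 stands out as many sigmas from the expected pattern and gets flagged.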
This function, which can be found in different kinds of tools, along with the data catalog, is one of the things I am recommending be part of your standard stack, one of those bounds you're putting around your widening data environment. And yes, data quality is getting subsumed into data observability, which gives you not just data quality but also data freshness, data volume, schema change, data lineage, and FinOps. Data observability is right there as well, and Jesse can tell you more about that from Monte Carlo. So what are my recommendations for you here today? Bottom line: data quality can and should have a value proposition. It's not going to happen by accident. Every time an application is considered, I want the team to come to data governance, or what have you, and get some data quality advice as well. Measure the level of your data quality; it's not good enough to say it's good or bad or something like that, right? It's a business-driven imperative, not an IT one. It's becoming part of data observability. And take care of the people issues associated with data quality, of which there are many; there's much to say about communication here. Establish the value proposition for data quality: what is the ROI of adding data quality to each project? And run it as an ongoing process. There you go. And now I will turn it back to Shannon to see if we have any questions. Thank you so much. Lots of great questions coming in. Just to answer the most commonly asked question first: as a reminder, I will send a follow-up email by end of day Monday with links to the slides and the recording. So diving in here, these came in kind of early: what are the good choices to measure data quality for data in motion? There are many tools for doing data quality on static data.
But what about data in motion, discovering abnormalities in the data such as unexplainable variations in volume or unexpected variations in data values? Okay, if that's the question, I'll start. This is what I was touching on when I was talking about streaming data: checking the data in motion, before it lands in whatever data store it's going to live out its life in, a data lake or what have you. There are tools available for doing that; a lot of streaming tools have that ability. We don't all have those functions turned on in our streams. I would say there's still a long way to go, but this whole idea of data quality is making its way over into that world, and you're soon going to see more possibilities there. And maybe, Jesse, you have a comment on that with your tool? Yeah, absolutely. From the Monte Carlo side, we've actually just recently released our first entry into helping people do data quality monitoring for streaming data, specifically around Kafka. And this really came from seeing exactly what you mentioned there, William: streaming data is becoming a more integral part of the full data infrastructure. People may just be moving data in motion between two places at rest, or actually needing to use that data in motion to power generative AI models or real-time interactions, things like that. So it's becoming a much bigger part of the data stack, and it needs the same level of data quality checks that we've known and seen evolving for data at rest. It's very much a new space, but it needs the same kind of data quality checks as the rest of the stack. Anything you want to add? No, I think Jesse's covered that pretty well. I've just got three minutes here, and I'm seeing if I can get as many questions in as possible.
How do you suggest handling the scenario where people say data quality is important, but resources aren't allocated because there are too many business operational items to focus on? I hope I covered this already: not easily, but with great communication. I think that's the real basis for breaking through in that scenario, which I've encountered so many times. Yeah, this is a problem. "Why do you want to add this many thousands of dollars for data quality checks and this much time to the project? We don't have that much time." Well, okay, then you're going to live with this level of efficacy for the project. And I take it back to: okay, your data quality score will be a 60 and your ROI will be this. And studies have shown, generally speaking, that if we raise it to a certain level, you're going to get better ROI out of the application. As a matter of fact, it'll probably be the difference between what we're going to call success and what we're going to call failure. So which do you want? And by the way, a lot of times people say they don't want data quality, but then break it down. Do you want gaps in the data? Is that going to serve you well? No. Do you want numerous customers in the database with the same customer ID? Oh, no. Well, that's data quality. So when you break it down, you get to a better conversation about it. Steven or Jesse, anything you want to add? Yeah, I would say seeing is also believing sometimes, right? Involve the business early on so they put in their input and actually have a stake in what managing data quality looks like for them. Rather than, at the very end, having a data or IT team present, "Hey, this is what we've done to improve data quality," have the business saying, "Here's what I need from my data quality to actually do my job or meet these goals."
That makes it hard for them to escape accountability, because they're also involved in improving the quality. We've seen that be pretty successful in our use cases, as well as doing small pilots, right? A pilot takes just a couple of weeks: "Hey, let's see what this looks like, see how well it works for us, and whether this data quality improvement actually helped me at all." Really let them see the end result and get them involved early; that typically helps them overcome some of these objections and say, "Okay, now I understand from this small sample set that it works; data quality is actually helpful for what I do." Then go back to your sponsor or stakeholder and say, "Hey, everyone's aligned, and we actually have a use case and a success metric; let's get some more investment and take this to full scale in production," or on to a different scenario. I don't know if that's a good answer to your question, Darcy, but that's what we see work. I love it. We're a little bit over time, but Jesse, I want to make sure you have an opportunity to speak if you have something to add there. Oh, no, nothing else to add from me. This is great. I love it. Well, thank you, Steven and Jesse, so much for joining us today and for your presentations, and William, thank you so much as always. And thanks to our attendees for being so engaged in everything we do. I'm afraid that is all the time we have for the webinar. Just a reminder again: I will send a follow-up email by end of day Monday with links to the slides and links to the recording, so everybody can get a copy of all of that. Thank you, everybody. And thanks to Monte Carlo and to Simarchy for sponsoring today's webinar and helping make these webinars happen. Hope you all have a great day. Thanks, all. Thank you. Bye-bye.