Here we go. Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining the latest in the monthly webinar series, Data Architecture Strategies with Donna Burbank. Today Donna will talk about data quality best practices, and she is joined by guest speaker Nigel Turner.

Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. We very much encourage you to chat with us and with each other throughout the webinar; to do so, just click the chat icon in the bottom middle of the screen to activate that feature. For questions, we will be collecting them via the Q&A section. Or, if you like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DAStrategies. And as always, we will send a follow-up email within two business days containing links to the recording of the session and any additional information requested throughout the webinar.

Now let me introduce our series speaker, Donna Burbank. Donna is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She currently is the Managing Director of Global Data Strategy Limited, where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa, and speaks regularly at industry conferences. And joining Donna today is Nigel Turner, the Principal Information Management Consultant for EMEA at Global Data Strategy. Nigel has over 20 years of experience in information management, with specialization in information strategy, data quality, data governance, and master data management.
He has created and led large IM and CRM consultancy and delivery practices in multiple consulting organizations, including British Telecommunications Group, IPL, and FHO. Nigel is a well-known thought leader in information management and has presented at many international conferences, in addition to writing numerous papers and blogs on information management topics. And with that, let me give the floor to Donna to get today's webinar started. Hello and welcome.

Hello, Shannon. Hello, everyone, and thanks again for joining. It's always nice to see some familiar names on the call. And for those of you who might be joining us for the first time, one of the common questions we get is: will this be recorded? Yes. We'll have this recorded on DATAVERSITY, I think in perpetuity, if you want to keep listening to it again. Or if any of the other topics earlier in the year were of interest to you, they are all on demand, both the slides and the actual recording. And then please join us for any of the upcoming sessions later this year.

So without further ado, the topic of the day is data quality. We had a lot of registrants today because this is always a popular topic, unfortunately, because no matter how long we've been in data, there's always... well, because data keeps changing, right? There are always data improvement issues that we can focus on. And what we want to focus on today, and Nigel and I do a lot of this in our day job at Global Data Strategy, is really how you look holistically at data quality problems and not just do a sort of one-off cleaning effort. The analogy we use a lot is that you can clean up the pond, but if the stream going into the pond is still putting in dirty water, you're still going to have an issue. And that's really what it is like with data quality. A one-time cleanup is not going to fix the holistic problem.
And the holistic problem, and we'll talk a lot about this, is what's so tricky, but also rewarding, about data quality: it's not just a technical problem. Like many things in data, it's people and it's process as well as technology. So is it the business process itself that's causing the data issues? Is it a badly designed data model or a poorly defined definition? You know, all of the above. So we'll be touching on that throughout the session today.

And if you've joined our webinars before, you are probably familiar with this framework that we always use. I can say this with any topic, but data quality in particular is one that is particularly multifaceted and touches a lot of these areas. If you don't have good data quality, is it because you don't have great data governance in place? Is there ownership over that data? Do you have a culture of data quality in your organization to really promote data? Is it, as I mentioned before, a poor architecture? On data quality, don't get me started. This is where I feel like I've been around for over 20 years with the data quality rant. So something as simple as "can a customer have more than one address?" is a data quality rule, but the fix could be in the data model. And things like "do we have duplicates?" could be master data management itself. So we really need to look holistically across all of that.

And then one of the things we always focus on at Global in our engagements... I'm just going to pause for a second. There's a lot of background noise. You might want to go on mute if you're not speaking.

So the key to all of this is to focus on the business strategy. And I know that's obvious, but all of us, myself included, sometimes forget this, either in the eagerness to fix something quickly because, oh, I know I could just fix this.
Just give me a minute. Or to play around with the tech that you want to play with. Or maybe we don't see the big picture. But the key with data quality, and we'll talk a lot about this, is really focusing on the right problem, getting the right stakeholders involved, and then making sure you do a bit of marketing around the benefits. And so that's another thing that we'll talk a lot about. Before you even do the data quality cleaning, have you quantified any harm that's been done by the data quality or, even better, any opportunity that could be generated by cleaning up the data quality? Look at both of those. And then, as you improve the quality over time, keep track of that.

That is one of the nice things about data quality as part of this entire framework: it's actually very rewarding. It's easier to quantify than some of these other more, you know, nebulous areas, and it's easier to show the results. Often when we do a data strategy, one of the quickest wins we look at, especially when we're trying to tie it to ROI, is some sort of data quality cleanup: cleaning up customer data for a marketing campaign, or solving issues with mailings based on bad addresses. There's so much that can be very rewarding as a quick win. So we're going to talk more about that.

I'm going to pass it over to Nigel. This is kind of a fun thing, and we could do a whole session just on this. We all do it, especially if you're in data (actually, my family has started doing it): you go to the store and something's wrong, and you know that the customer experience was bad because of a data quality issue. One member of my family went to the bank the other day and there was a problem, and he told them they had bad master data and that's what it was all about. So I'm going to pass it over to Nigel to share some of his data quality stories and kind of put this in context.
Okay, hello everybody. As Shannon said earlier, I've been in information management for 25 years and started with data quality, so it's always been my primary passion in the information management space. And I think there are still a lot of common misconceptions around about data quality.

Donna has already touched on one of them, which is that it sometimes seems a slightly odd, off-to-the-left discipline of data management. It actually isn't. I think it's fundamental and embedded, if you like, in many of the others. We all know you can't have good BI without data quality. You can't have accurate predictive analytics if the core data that you're working with isn't accurate. So it's not a standalone discipline: it depends on other disciplines, but many disciplines also depend on it.

I think a more common misperception is the second one, which is that there are still a lot of people out there, unfortunately, who think that data quality is first and foremost a problem for the IT department. What that then does is encourage people in organisations to think: well, if it's an IT problem, IT can solve it, so let's buy some data quality tools, let's try and apply them somewhere, and all the problems will go away. But as Donna has said already, that is rarely the case. I think every single data quality problem I've come across in organisations, and I've come across a few in the last 25 years, is usually caused by process, people and IT; usually the three things simply not working well together. So if you're going to fix that particular data quality problem, any solution that you come up with must, as Donna said earlier, be holistic. It also has to be driven by the business, because if you're going to change the way that people behave or change business processes, then the only people who are able to do that are the business, not the IT department.
I think another common misconception about data quality is one I've heard in many organisations. "You need to improve the quality of your data." "Yeah, I know, but we've got a lot of other priorities at the moment, and maybe we'll think about that next year when we've sorted out the problems that we've currently got." Of course, the problem is that data quality improvement isn't a choice. I'll give you an example. Let's say I'm a billing clerk, I work in an invoice department, and I'm just about to send an invoice to a client. I pull up the information on screen and notice that half the address is missing. So what do I do? I may go back to an old invoice and hope I can retrieve the address from there. If I can't find an old invoice, I have to get in touch with the sales department, and the sales department might go through their records, and then eventually they might come up with a decent address so I can actually invoice the customer. Now, that is data quality improvement. It's just a really inefficient way of doing it. And what happens in that case? You waste a lot of business time and hence money. You've probably delayed the production of the invoice and therefore you delay payment. And worse still, the next time that happens you've got to go through the whole process all over again.

So I don't think data quality improvement is a choice. It's not a choice about if you do it; it's how you do it. And what we're going to try and suggest in this webinar are ways in which you can be proactive about data quality, so that you shift your model, your paradigm, from one where you're fighting fires all the time because your data is poor to one where you're engaged in fire prevention. You're anticipating the problems, you're fixing them before they happen, and it stops fires breaking out.

And then finally, data quality improvement.
Some people think it's a project. As Donna said, you know, we've got a problem with our marketing database, so what we should do is a data cleanse. And you do a data cleanse and nothing else, and then you do another project a year later to do another data cleanse, and yet another data cleanse. That's not proper data quality management; that's simply data cleansing, and a cost of failure to the business. So data quality improvement may start with projects, and we will certainly show you how you would do that, but if you're going to get a handle on data quality in a permanent sense in an organization, then you must develop processes and organizational structures that tackle data quality as a business-as-usual activity. Your data is always changing and your business is always changing, so the challenge of managing data quality doesn't have a start and an end. It just continues.

So what exactly do we mean by data quality? Being simple people, we go for simple definitions, and this is ours: it's simply data that is demonstrably fit for purpose. So what does that actually mean? I think, first, "demonstrably" means that you can't control and manage data quality if you can't measure it. I know that's one of the oldest cliches in data management, but for data quality it's more true than for any other discipline, I think. The other thing that it implies is that any data quality improvement that you do must be directly aligned with business outcomes. So, for example, in the example given earlier, if you've got poor data in your marketing database and you put some steps in place to improve the quality of the data, you need to be able to relate that to business outcomes, i.e. in this case better marketing success and more revenue. Otherwise, what's the point of cleaning it?
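To make the "demonstrably" part concrete, here is a minimal sketch of one possible measurement: the share of records in which every required field is filled in. It is only an illustration and not something shown in the webinar; the function name `completeness`, the field names and the sample customer records are all made up for the example, and completeness is just one of several dimensions an organization might choose to measure.

```python
from typing import Iterable, Mapping, Optional, Sequence


def completeness(
    records: Iterable[Mapping[str, Optional[str]]],
    required_fields: Sequence[str],
) -> float:
    """Return the fraction of records where every required field is non-blank."""
    rows = list(records)
    if not rows:
        # An empty dataset cannot demonstrate anything, so score it as 0.
        return 0.0
    complete = sum(
        1
        for row in rows
        # A field counts as filled only if it exists and is not just whitespace.
        if all((row.get(field) or "").strip() for field in required_fields)
    )
    return complete / len(rows)


# Hypothetical customer records, invented purely for illustration.
customers = [
    {"name": "A. Smith", "street": "1 High St", "city": "Leeds", "postcode": "LS1 1AA"},
    {"name": "B. Jones", "street": "", "city": "York", "postcode": "YO1 7HH"},
    {"name": "C. Patel", "street": "5 Mill Ln", "city": "Bath", "postcode": None},
    {"name": "D. Brown", "street": "9 Park Rd", "city": "Hull", "postcode": "HU1 2AB"},
]

score = completeness(customers, ["street", "city", "postcode"])
print(f"Address completeness: {score:.0%}")  # prints "Address completeness: 50%"
```

Recording a number like this before any cleanup, and again afterwards, is what lets an improvement be demonstrated rather than asserted.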
Data quality is never an end in itself; it's a means to business improvement. And "fit for purpose" also implies something, which is that there isn't an organization anywhere in the world, I would venture, that has got data quality cracked. No one has 100% data quality in every single system or platform that they operate. So it's the business that must decide how good the data needs to be, and for different purposes different quality of data would be acceptable. A marketing email, for example, doesn't need to be of such high quality, maybe, as an invoice, because you want to make sure on the invoice that you've got it right. And it must be the business that defines that. So if you're a good organization with data quality, then all your data meets that definition. But unfortunately there are many organizations out there that are a long way from that, and I'm a bit of a data quality nerd, so I've collected a few horror stories. I'll keep these pretty brief.

Let's take the first one. In January 2020, a UK insurance company made headline news in the UK for all the wrong reasons, because it decided to send a marketing email to everybody on its contact base, and that amounted to more than two or three million customers. Unfortunately, every single email they sent started "Dear Michael". Now, of course, that wouldn't have been a problem if they had sent it to Michael Caine, if he happened to be a client of that particular company, but I suspect he wasn't. And of course, when it hit the headlines, the company was approached and they blamed it on a "temporary technical error", which is a classic non-explanation for poor management of data quality. It's always the fault of some technical thing somewhere, rather than the lack of some intelligent vetting process to make sure these things can't happen. And of course that was a bit of a laugh and everybody had some fun, but then it got a bit more serious, because people started to contact this insurance company and said: well, if you make mistakes like this with data, how do I
know that you're not sending my policy information to the wrong customers? And as a result they've lost a significant number of customers because of that one error.

That's pretty bad, but look at the second example I've got, which is again from my home country, the UK. Back in 2018 a retail bank undertook a customer data migration, and they decided to do it over a weekend, thinking that would minimize the impact on customers. They did it, and they migrated five million of their customers to a new platform. On the following Monday, after the weekend, the outcome was, to put it very briefly, that two of the five million customers couldn't access their accounts at all. You'd think that was bad enough, but it got worse than that, because many customers who could access accounts discovered they were accessing accounts that weren't theirs. And worse still, they could actually move money from those accounts to other accounts. So the fraudsters got hold of this very quickly, and millions of pounds' worth of money was siphoned off from customers' accounts. The reason I put a date of November on that is that a major report came out in the UK on the outcome of that mess, which was caused by the fact that they didn't bother to profile or analyze the data before they did that migration. They lost 80,000 customers, and paying the customers compensation for their accounts being siphoned off and fixing the problem that had been caused cost that company 480 million dollars at the end of the day. As a result, the CEO was sacked, the whole IT department, who were blamed for the fiasco, were sacked, and they outsourced all their IT to a third party. So that's probably the most expensive data quality error that I'm aware of.

And then, bringing it more up to date, we're all aware, I'm sure, of COVID. The UK government sent out a whole bunch of letters, nearly a million letters, to people who were regarded as vulnerable and therefore needed to shield
themselves during the lockdown. Unfortunately, although they sent them to a million people, they later discovered that 600,000 people were not contacted as part of that original letter circulation. And of all the letters that were sent out, 17 percent were sent to the wrong addresses, which meant that people who should have shielded didn't shield, and people who shouldn't have shielded were told to shield. So that again was a bit of a catastrophe, and I think it critically destroyed confidence in the UK governments involved at the time when they most needed the support and the confidence of their people.

And I put this one in because Donna always does, and I told you my photograph unfortunately comes up again. I've had a UK pharmacy loyalty card since 2012, and even after eight years they are still convinced that I'm female and not male. Despite the fact that I've emailed them on frequent occasions, they still keep sending me stuff that says "Ms Nigel Turner", and I still get lots of cosmetics offers: mascaras, foundations and everything else to do with cosmetics.

He does look nice.

Yeah, and that's why I look good at my age, I was going to say. Thank you, Donna. But basically, yeah, they still haven't got that right, so I miss out on all the offers that I'm interested in and get offers that I'm not interested in, even after eight years.

So what's the impact of errors such as this? If you take the industry as a whole, poor data quality impacts both companies and individuals. Some of the examples, and the bank one is probably the prime one, can have a massive economic impact. If your marketing data is poor, your marketing will be poor, and so it will affect your revenues. If you are constantly reworking data because the data can't be trusted in your core systems, that costs the company money. And of course, if you increase costs and you decrease revenues, then your profits go down. I think the examples
will show you as well that it impacts brand and customer loyalty. If customers don't trust you with their data, and customers are much more savvy than they used to be about this, then your brand can be damaged and you can lose customers. And of course we all know about the increasing law and regulation. In Europe, of course, GDPR, the General Data Protection Regulation, is now law in all countries. And if you work in a regulated sector such as banking or insurance or telecommunications, there will be specific regulations as well that control how good your data needs to be, and you might need to prove to these regulators that you have done the right things with data. So if your data quality is poor, you are increasing the risk and exposure of your organization.

But the damage is not just for companies; it's also, of course, for individuals. Take personal harm. The COVID example is a very good one. To give you another one, very close to home: I have an uncle who applied for a mortgage and was denied, not because there was anything wrong with his credit record, but because, as he discovered eventually after many complaints, the organization concerned had mixed him up with somebody who had the same name and mistook him for this person, who had a bad credit record. So he was actually personally damaged by that. It can cause annoyance as well, of course. I'm sure all of us have had letters or emails from organizations that can't even spell our name right, and that's hardly going to encourage us to buy things or do business with the company.

And I think what's changed in data quality since the advent of social media is that in the past companies could sweep data quality problems under the carpet. They can't anymore, because if people feel they've been badly treated, or their data has been badly managed by an organization, they tell other people, and there are lots of cases now of
things going viral, where somebody has made a complaint about an organization and put it out on social media, and before you know it there are hundreds of thousands of people doing exactly the same thing and really causing big problems for that organization. So that's the scope and scale of the problem that still exists with data quality. What Donna's now going to do is start to talk you through how we think you can fix these problems. Donna.

Great. So I'm sure a lot of us can relate to the stories Nigel has told, because we all unfortunately live these things day to day, and it always amazes me, after as many years in the business as I've had, that companies are still having big issues, right? But it is complicated to solve, and there's been a lot of chat in the discussion on this very topic: you know, isn't data quality like MDM, isn't it like governance, isn't it like...? Yes. It really takes a holistic approach, and how we like to look at it is that it really is a combination of people, technology and business process, which I guess is itself a combination of people and technology, right?

If we look at the people first: is there governance in place? With some of the examples that Nigel talked about, and some of the customers we've worked with to help clean things up, you sort of wonder why it happened. Did a person not check it? Was anyone accountable for this issue? And that's part of the problem: if nobody owns it, nothing gets done. Some of the other pieces, and some of the chat mentioned this: is it training? Do the people know the right business rules to even put in? Some person is stressed, they're on the front line, they need to enter some data, and they put it in. Do they understand the importance of that data? Or, even better, could you automate those business rules straight into the front-end applications, especially with things like master data? You know, the beauty of a drop-down, right? Actually, I tell this joke all the time: I was registering for a data
quality webinar somewhere, and the US state code was a free-form text field. That's a really nerdy sort of joke, but it's true, because what better way to have bad data quality than to not even have a simple list of the valid values, right? So a lot of that people problem can be alleviated by tech, it can be alleviated by training, and it can be alleviated by understanding. Part of it, a lot of this, is that what might be the rules for one department might be used differently in another department. So do we have aligned business rules? You know, that famous "I used this field because it was empty, and I put something else in it because I needed a place for it", right? So it's really about getting those business rules aligned, and then aligned into governance, so we have that holistic view.

That aligns with business process, which is a very nice analogy to "are we cleaning the streams going into the pond?" What caused these data quality issues? What business processes could be better managed? The people involved may each see only one part of the process; the person inputting the data may not be the person using the data, right? So we need to take a complete, holistic look.

And also, you know, there's the process of data management itself. I'm always amazed: one of the big data quality issues at one of our retail companies, this was two years ago, was that one of the DBAs decided to change the field length of the product code. Probably a lot of you are cringing. They shortened it from 10 characters to eight for some reason, and it brought down their retail system. Again, millions of dollars lost, customer reputation, all of that. And that was really just data management best practices, life cycle management, right? So we'll talk a lot about the business side, and I think that's an easier way to think about data quality and its impact, but some of this is just basic housekeeping on the back end, right? That shouldn't happen.

And then technology, of course. We're here at
DATAVERSITY because we probably love data and we love tech, and there are tools. We're not going to talk about tools much. There are, quote, "pure play" data quality tools that can help, as Nigel mentioned, automate a lot of this cleansing and do data augmentation. They're great, they have a purpose. But to look holistically at what a data quality tool is, you really need to look fully at the data architecture. What about the data modeling tool, to get those business rules aligned into the database, right? Do you have an MDM tool to really help automate a single version of the truth? Is the tool, again, the front-end system that has the right drop-down to integrate with MDM, et cetera, et cetera? So you do need to augment data quality with technology. I mean, you could be writing everything with pencil and paper, but really we're using technology for this. So they all have to be integrated, and you can't look at any of them in a vacuum.

That can sound daunting. Often these problems are large and they are complex; Nigel will talk more about that in a bit. But we work a lot with complexity, and I've found that anyone who does work with complexity keeps it very simple, right? I'm in a stressful situation: what are the key things to remember? So we thought a nice way to look at it would be the ABCs, right? Just a simple five-step approach, and mnemonics are always a nice way to remember things. So thinking of the ABCDE is maybe a helpful way to look at it. We always do these steps, and whether you call them this or not is really up to you, but it's a nice way to look at it.

So first, assess. In any situation that you're in, before you get to a solution, stop for a second and think: what is the business value of this? Is it even a problem? What's the priority? What's the impact? As Nigel mentioned, every company has data quality issues, and that's actually not a problem. I mean, I'm a big old data nerd, you all know that, and
here in our own company, as some people could probably tell you, I often have to stop myself cleaning data. I'll say, you know, here's a set of data we're just using for a one-off event or something; I don't need every field to be correct, right? Because if you did that, no one would ever get anything else done in the organization. So you really have to prioritize. As Nigel mentioned, is this going to send an invoice out to help us get paid, or is it just something that we're using for a draft and it's going to be thrown away? Those of us who probably want to fix everything sometimes have to look at it the other way. Most people need to clean things; some of us have to not clean things. It depends how you look at it, but both are valid.

And then baseline. Nigel mentioned it and will mention it again; it may be trite, but it's true: you can't manage what you can't measure. So where are we today, and where do we want to go? You also want to be realistic. Even for your high-priority invoice data there's probably a "good enough" there, and that's defined by the business. So often when we work with data governance, the business defines the goal. Is it 90% accuracy? 100%? Maybe it's health data; I hope my doctor knows which limb to operate on, right? You don't want any error there. But that's what you need to baseline. And don't forget that. Again, we've all made these types of mistakes, probably in our eagerness to get it fixed: "oh, I can fix that". Stop, see how bad it is now, and make that baseline, and then when you do improve it you can show folks how far you went. We often forget that with any kind of project.

Converge: how do we prioritize? As I mentioned, what's the biggest value? And then give a little thought to how we develop those improvements. Is it people, is it process, is it tech? Again, a lot of us on the call probably go right to the tech.
Sometimes it's training; sometimes it's just changing the business process and there's absolutely no tech involved, and that can be as valid, if not more, of a solution. And then again, don't forget (we often do this as well; we're busy, we go on to the next thing) to stop and show the ROI. And if you want to get more buy-in, tell everybody about that ROI. There's a bit of evangelism there. And then don't forget, as Nigel mentioned and we'll mention again, that this isn't a one-time thing. You clean it up and you aren't done. How do you integrate that with an ongoing business-as-usual activity, so that this is just part of your DNA in the company and you don't forget about it?

Moving on, we'll go through each of these steps in a little more detail. I mean, gosh, each of these steps could be a whole webinar, but again, as we do in a lot of the webinars, it's just enough of a takeaway to hopefully give you some ideas of a couple of things you may not be doing, or to put some thoughts in your head.

So again, what is the business landscape? What is the organization trying to do? Are you a healthcare organization trying to do telehealth and wanting to make sure it's accurate? Are you a marketing organization that wants to sell more widgets to customers? The goals are important. Analyze where you are today and who the primary stakeholders are: who are the parties that are going to be involved in both fixing it and consuming it? And then understand, as Nigel mentioned, this idea of fitness for purpose. What does the goal need to be? What's working well? What needs to be improved? And then, again, document it. Now, there are fancy ways to do things like data quality issue logs, and I've seen tools scrolling by in the chat. Tools are great, but a good old spreadsheet might be something to start with, even, right? But just document it. Don't forget to list and prioritize some of the issues. So
some of the tools you can use to prioritize and show what that impact is: again, there's so much value in just doing some simple documentation. Do you have a list of the stakeholders? We spend a lot of time on any engagement asking who cares about this, who's impacted by this, and who is influential in solving this. Spend a lot of time on that, and really look at how those people are using the data, and then at how you are going to communicate with those people.

We use pictures a lot in our practice, because a picture's worth a thousand words, and people can start to relate to their own problems when you tell the story as a real-world issue. So it's a bit of qualitative and quantitative. Everyone can understand an anecdote: "remember last year when we had this?", or the stories Nigel just told. Everyone can relate to stories. But then back it up with evidence: remember we had that embarrassing thing where we sent out the emails to all the Michaels? Well, this is how much it cost the company, and this was our reputation loss, et cetera; our Net Promoter Score went down, right? The more you can do both, the better.

And then my favorite: business data models and process models. Draw out that flow and explain it. I have whiteboarded a data model in just five minutes to show why the data structure might have caused some of these issues. Or have a collaborative session and link that with your process model: where in the process did things break down? What had a human in the loop and what was automated? And again, do some initial ROI, quantifying the problem and also quantifying the opportunity. We'll talk a little bit more about this as well, but it's often so easy to just call out the problems, right?

And we had an example of this. When you think of the stakeholders, don't just make this an IT issue. If you are an IT person on the call, get some business people to be your co-champions and align with them. And again, when we talk about governance and
accountability, can they put themselves on the line too, to say, yes, we're going to be accountable? In one of the examples we had, we actually went to get funding and approval with the marketing department. We'd done the analysis together with them, and we said, not only can we solve these issues, like the ones that Nigel mentioned, but we will commit that if we can clean up this data, we will have a 12% increase in our campaign click-through rate, and that will generate X amount of revenue. And we will commit to that. Obviously we were a little bit conservative so we could make it, but again, that speaks volumes. That shows the direct link between data quality and the business, and the opportunity as well: we can make more money with better data quality. It isn't always about avoiding risk; you can use the opportunity as well. I mentioned that we can have more examples of the ROI, and we will. But sometimes, and you and I have done this, we do a lot of work with the C-level, it's just getting that right anecdote, the right story, having it hit home, and pictures and stories do great for this. So I'm going to pass it over to Nigel to show at least one of the methodologies we use for that. So, Nigel? Yeah, okay, thanks, Donna. You're familiar, I think, with many of the outputs that Donna has just listed, so we thought what we'd do, because time is limited, is focus on a few techniques that we use but which are perhaps less commonly used in the data quality space. One example of that is these things called rich pictures, which you may have come across. Originally they were derived from systems thinking methodologies and disciplines, and systems thinking is all about solving messy and complex problems. What we've tried to highlight so far, successfully I hope, is that data quality is a classic messy or complex problem, and it's complex for the reasons listed there. Very often there's a lack of
information and hard facts. People know the data is not very good, but if you ask how not very good is not very good, as Donna said earlier, we don't know, because we never measure it. Very often as well, data quality involves large numbers of people, especially in big organizations. We've come across situations where we talk to the data producers and they say, yeah, the data is fine, we think it's fit for purpose. But when you talk to the consumers of that data, they say, no, it doesn't meet our needs, it's not nearly good enough for what we need. So you get different perspectives and different perceptions of what the problems really are. And then when you do find problems, very often who owns those problems is a bit of an unknown. That little diagram in the middle there on the right-hand side, with the balls on it: those of you who were around in the 1980s might remember it being called Newton's cradle. It was based on Isaac Newton's third law of motion, which is that action and reaction are equal and opposite. In an organization, very often data quality problems are caused at the front end of that chain, but you don't feel the pain until the ball at the back end pops up. That's because a bad input of an address at the front end means that somebody at the back end, say in dispatch, can't send what they should send. That makes the ownership of the problem really difficult. Where we think rich pictures are of great value is that they're a great starting point for getting a grip on what we need to focus on, because they cover holism, and what they also do very well is highlight how interconnected many problems are in data quality. Certainly we've used these in workshop settings where all you need is a whiteboard and a bunch of colored pens, and you encourage the people in that workshop just to get up to that whiteboard and show or draw, in any way they like, using a picture, a cartoon, words, whatever, how they feel about data quality. And the great advantage of
doing that is that it's a great and quick way of deriving these things we call problem themes. We've got an example here that's based on a real company that I had some dealings with a few years ago. This is a rich picture of an admittedly fictional hotel and casino group, but a lot of the problems were derived from conversations I had with a real hotel group. If you draw a decent rich picture, which I hope this is, you should be able to look at the picture and very quickly at least get a feel for what some of the main problem areas and issues really are. You can see one or two things there: at the top middle, the scale of the organization; just below that, they've got a new CEO and some new business goals; and they've got a lot of duplicate customers and no single customer view. By doing that you can then derive these things which we call problem themes, and those are just a few I've highlighted. The advantage of having problem themes like this, of course, is that it gives you a starting point to say, okay, these are potential data quality issues that we can start to tackle. So maybe we hit the marketing database first. We need to look at finance data. What do we do about uncontrolled customer data duplication? How do we deal with supply management problems? And again, this diagram is a classic example. If you take the potential need for IT investment, that implies a technology problem. If you look at the cultural issues about data capture, that seems to be a people problem: there seems to be a problem with the culture of people who don't feel that it's part of their job to collect data accurately. And then at the bottom, supply management problems sound like a process issue. Probably in all three cases it's a combination of the three, but it gives you a good feel as well for the sort of people, process, and technology issues that you've got to
face. And it is a good place to start, I think, in terms of getting that initial business assessment of the business and some of the problems it's got with data quality. Once you've done that, you can move on to step two of the methodology, the B of the A-to-E methodology, which is to baseline the data. What things like rich pictures and some of those other tools will do is give you a qualitative view of some of the data quality issues. But in order to really convince people of the need for action, you then need to supplement this with a quantitative view. As Donna said, it's really important to baseline the quality of some of the key data sources that you've discovered in the first stage. To do this, and I think this is all pretty self-evident, you decide which are the key data sources and domains, and you profile the data. I'd always recommend using a data profiling tool rather than trying to do it in SQL or Excel, for reasons I won't go into now. You can pay hundreds of thousands of dollars for a data profiling tool, or you can get free versions on the internet; if need be, start with a free one and then make the case to get something a bit more sophisticated. Then you can assess how good the data is according to the seven dimensions, and I'll come to that in a second. Then you can present the results of that exercise back to your stakeholders, gain consensus on what impact some of those problems are having on the business, and refine the data quality issues log, which, as Donna said, could be a spreadsheet. As for potential outputs of all this, you would do some sort of initial report with a lot of numbers in it about how good or how bad the data is, and then you can start to look at some of the potential financial costs and business impacts of the poor data. Once you've done that, you can start to actually put some baseline measures together, and this is where the seven dimensions
of data quality come into play. Why seven? Well, basically there is no industry standard on what the dimensions of data quality are. The important thing to remember is that data quality isn't a unified entity. There are a number of reasons why data can be of poor quality, and it encompasses all of those things. The ones in blue we call the content dimensions of data quality; in other words, they're about the data itself, and they demonstrate that data is multi-dimensional. So when you're looking at a particular data set or system: how complete is the data? Are all the fields populated that should be populated, or are some of them left blank? I'm sure we've all seen databases where, I don't know, the prefix field is blank, or the postcode or zip code field is blank. Does that matter? Until you can actually establish that you have blanks in those fields, you can't really ask the question: we've got blanks, does it matter? Then, is the data accurate? In other words, does it reflect the real world? If an organization holds my physical address, my home address, is that the one I currently live at, or is it the one I lived at four years ago? The other thing that is very important to measure is how unique the data is. I'm sure every organization we've ever been to has a problem with unintended duplicate records. In a marketing database you might have a hundred variations of a client name, which may be a company, because people have input that data and spelt it in a hundred different ways. That's a great testament to people's creativity, but it's pretty hopeless if you're trying to get a single view of that customer as a purchaser of your supplies or services. Then you can look at validity: is the date of birth in a particular format? The format there is the UK DD/MM/YYYY. Maybe you've got both US and UK formats in there. Can you identify the two, and do you ever get them mixed up? And then you've got business
rules, which say that our customers probably need to be aged between 18 and 120 at a push. If you've got customers younger than 18 or older than 120, there's probably something wrong with that date of birth. And then consistency. In many big organizations, one of the problems of data quality is that they don't hold customer data in one place, once, unless they've got really good MDM (master data management); they hold it in very different sets of platforms and systems. So the key challenge becomes: are those systems consistent? If my current address is held in one system and I'm also held in eight other systems, do those eight other systems also hold my current address, or have they got out-of-date addresses? So those are the content dimensions. You also need to look at the context ones. The first is simply: if people need access to the data, can they get it at all? We've come across organizations that say, well, we need this data, but for some strange reason we've been waiting six months for the IT department to give us access to it, so we can't do our job properly. And then of course the other question is timeliness: if invoices need to be sent out promptly, then they should be available in the data warehouse at 9 a.m. the following day for that to happen. So you can make those assessments and come up with a baseline figure for each of those dimensions for a data set. And what you can do, and I've seen some organizations do this, is roll each of these up into a sort of weighted scoring system and come up with an overall score for a data set, and say, well, it's a 79 at the moment; it's 85 for accuracy but only 62 for completeness. So you can start to derive some measures and then eventually KPIs, which Donna will talk about. I think that's really important. What I'll do now is hand back to you, Donna, to talk about how you derive KPIs from that. Okay, great.
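As a rough illustration of the baseline measures and weighted roll-up just described, here is a minimal Python sketch. It is not the speakers' method or any profiling tool's output: the field names, sample records, function names, and weights are all hypothetical, and it covers only four of the dimensions (completeness, uniqueness, validity against the UK DD/MM/YYYY format, and the 18-to-120 age rule). A real baseline would normally come from a data profiling tool, as recommended above.

```python
from datetime import date, datetime

def is_blank(value):
    """True when a field is missing or contains only whitespace."""
    return value is None or not str(value).strip()

def completeness(records, field):
    """Percentage of records in which `field` is populated."""
    return 100.0 * sum(not is_blank(r.get(field)) for r in records) / len(records)

def uniqueness(records, field):
    """Percentage of populated values that are distinct after trimming and lower-casing."""
    values = [str(r[field]).strip().lower() for r in records if not is_blank(r.get(field))]
    return 100.0 * len(set(values)) / len(values) if values else 100.0

def parse_uk_date(text):
    """Return a date if `text` matches DD/MM/YYYY, otherwise None."""
    try:
        return datetime.strptime(str(text).strip(), "%d/%m/%Y").date()
    except ValueError:
        return None

def validity(records, field):
    """Percentage of records whose `field` parses as a UK-format date."""
    return 100.0 * sum(parse_uk_date(r.get(field)) is not None for r in records) / len(records)

def age_rule(records, field, today, min_age=18, max_age=120):
    """Percentage of records whose date of birth gives an age within the business rule."""
    passed = 0
    for r in records:
        dob = parse_uk_date(r.get(field))
        if dob is None:
            continue  # an unparseable date cannot satisfy the rule
        age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
        if min_age <= age <= max_age:
            passed += 1
    return 100.0 * passed / len(records)

def weighted_score(scores, weights):
    """Weighted average of per-dimension scores (weights are a business choice)."""
    total_weight = sum(weights[d] for d in scores)
    return sum(scores[d] * weights[d] for d in scores) / total_weight

# Hypothetical sample data: a duplicate name, two blank postcodes,
# one non-UK date format, and two implausible ages.
records = [
    {"name": "Acme Ltd", "postcode": "AB1 2CD", "dob": "01/02/1980"},
    {"name": "acme ltd ", "postcode": "", "dob": "1980-02-01"},
    {"name": "Bolt plc", "postcode": "EF3 4GH", "dob": "31/12/1890"},
    {"name": "Cog Inc", "postcode": None, "dob": "15/06/2015"},
]

scores = {
    "completeness": completeness(records, "postcode"),
    "uniqueness": uniqueness(records, "name"),
    "validity": validity(records, "dob"),
    "business_rules": age_rule(records, "dob", today=date(2024, 1, 1)),
}
# Assumed weights: completeness counts double in this example.
weights = {"completeness": 2, "uniqueness": 1, "validity": 1, "business_rules": 1}
overall = weighted_score(scores, weights)
print(scores, overall)
```

The per-dimension figures are what would feed a baseline report, and the single weighted number is only as meaningful as the weights agreed with the business.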
Lots we can do on KPIs. One is to have them; that sounds really obvious. And we can talk a lot about that: do we have duplicates, is the data valid, is it correct? It's good not only to get a quantitative measure of that, but then a target, right? What is it? And do that with the business; do that as part of your data governance council or with your stewards. What is the status now? I think there are a lot of tools that help with that, but one of the key parts is linking that back to the first stage: what are the business benefits? Can you quantify them? One of the ones we find, especially where companies still do physical mail: just think how much you're spending to send physical things to the wrong address. Just stopping that will help plug the leak. But there are not only quantifiable business benefits; there's reputation. Some of the stories Nigel told didn't make the company look too good; they've lost some trust. So do the quantifiable ones, absolutely, but also consider that gut feel of brand reputation. You can't really do great marketing without the right data. Obviously a lot of companies now are trying to go digital, and then wake up and say, well, we don't even have emails, let alone text message phone numbers and things like that. So it's really going to limit your ability to be innovative if you don't have the right data. Again, we use the marketing example, but it could be anything. We have some companies going to telehealth, and the ability to have very safe, secure, and correct patient information helped them do that very quickly. So again, look at the business driver. That not only helps prioritize what you need to do and set a target that you can achieve and track against, but also helps get buy-in. I saw some of the comments that it's often really hard to get buy-in from the business. Often this is the absolute easiest thing to get buy-in from
the business, because they're feeling it every single day. You'll probably get a lot of nodding heads. We've talked about this next slide a lot, and it does hit home: that idea that you can't manage what you can't measure. Finance does this all the time. Money is an asset; nobody questions that we have to get a list of how much money the company's making and what the assets and liabilities are. It would be a very strange company that didn't have that. So I think we do see improvement. We've done a bit of griping about how long we've been in the industry and still see problems, but we have seen improvement. A lot of our customers do have, and we'll talk later about this, a dashboard that monitors the key KPIs around their data assets: is the data accurate, is it complete, is it timely, all of those dimensions Nigel mentioned. So do track it, and then do move on with that. Moving through our ABCs: we talked about this a bit and we don't want to beat it to death, but you understand the business value, you understand where you are, and then where do you want to go? Do give that some thought. We're probably all familiar with the Pareto principle, the 80/20 rule: 20 percent could have the biggest benefit, so what is that 20 percent? Again, from that issues log you have some ROI analysis, and that should really be something that's bought into by your business stakeholders. It might be really easy; we'll talk about some tools for how to prioritize that. And again, all of these things sound like they can be big, long, behemoth discussions, but so much can come from quick little workshops and whiteboarding, and some nice templates can really help. Even stopping to ask the questions can go a long way. I think a lot of people say, oh, we don't have time to do that, let's just go do something. Oh gosh, just stop and look at the map for a second, or you're going to get lost, right? So a lot of these tools and templates
we can share with you can help with that. And we'll talk about this in the next slide, but the first one you do as a group isn't always the highest-value, biggest thing. Let yourself practice a little bit. Find a quick win that is valuable and that everyone will understand, but that might be easy to achieve. So yes, we need to employ a customer master data management system: absolutely valuable, absolutely beneficial, probably not the best first thing to start with. Can we even get the email addresses right? Can we just reduce the duplicates, or something like that? One tool we use is a quick priority grid. We've done it on a whiteboard: people bring up sticky notes, or everyone brings to the meeting a sticky note with their problem, and then you place them as a group. Again, this is not a detailed analysis of difficulty and level of effort and all that; it really is just a finger in the wind of what's the biggest value and what's the difficulty. Ideally, the lowest difficulty and the highest value would be, say, sort of the green. The things you probably want to avoid are things that don't have a lot of benefit and are really, really hard. And then some are in the middle. Maybe the first one might be a low-benefit, low-difficulty one just to get things started, or ideally it's one of these, but give that some thought. These are really interesting workshops, especially when done together, because I do feel like people can be adults in the room and realize, oh, this was my issue, but you're right, the real thing we need to do is get those addresses right, let's start with that. Then everyone agrees on it together, and it's just a nice way of getting people up. In the old days we did this in person; we've also done things like that pretty successfully online. You can have online workshops; we do that as well.
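The priority grid described above can be sketched in a few lines of Python, for anyone who wants to capture the sticky notes after the workshop. This is only an illustrative sketch: the 1-to-5 scales, the threshold of 3, the quadrant names ("quick win", "major project", "fill-in", "avoid"), and the sample issues are assumptions for the example, not part of the speakers' template.

```python
from dataclasses import dataclass

@dataclass
class Issue:
    name: str
    value: int       # rough business value, 1 (low) to 5 (high), agreed by the group
    difficulty: int  # rough difficulty, 1 (easy) to 5 (hard), agreed by the group

def quadrant(issue, threshold=3):
    """Place an issue in one of four grid quadrants (quadrant names are assumed)."""
    high_value = issue.value >= threshold
    hard = issue.difficulty >= threshold
    if high_value and not hard:
        return "quick win"      # the "green" corner: high value, low difficulty
    if high_value and hard:
        return "major project"  # valuable but probably not the first thing to do
    if not hard:
        return "fill-in"        # low value, low difficulty: fine for practice
    return "avoid"              # low value, high difficulty

def prioritise(issues):
    """Order issues by quadrant, then by higher value, then by lower difficulty."""
    order = {"quick win": 0, "major project": 1, "fill-in": 2, "avoid": 3}
    return sorted(issues, key=lambda i: (order[quadrant(i)], -i.value, i.difficulty))

# Hypothetical sticky notes from a workshop.
issues = [
    Issue("Customer master data management system", value=5, difficulty=5),
    Issue("Rename legacy product codes", value=1, difficulty=5),
    Issue("Fix customer email addresses", value=4, difficulty=2),
    Issue("Tidy unused lookup values", value=2, difficulty=1),
]

for issue in prioritise(issues):
    print(f"{quadrant(issue):13} {issue.name}")
```

The scores are deliberately coarse, matching the "finger in the wind" spirit of the exercise; the point is the group conversation, not the arithmetic.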
I'm going to pass it back to Nigel. Okay, now we know what the problem is, where we are, where we want to go, and how we do it. Okay, thanks. I'll go through this fairly quickly because I'm conscious of time. Basically, this is where you actually now design and implement some improvements, and fundamentally there are three basic things; none of these, I think, is rocket science. First of all, you need to create a team of people to tackle these problems. It's really important to get the right stakeholders on board: for example, the producers and the consumers of the data. You need IT support, and you may need something like a DPO, for example, if personal data is involved in the project, to make sure that you're not doing anything that contravenes GDPR or something else. Whenever possible, if you've got data governance, a lot of these things should clearly be led by data owners and driven by data stewards on behalf of the data owner. Then you might want to do some reanalysis of the problems you've identified, maybe go into more detail to refine things like the ROI, as Donna mentioned. And then you design some improvements. Those improvements, as Donna said, are often process, people, and technology changes, but sometimes I've seen really good data quality improvement projects that involve nothing more than a bit of reeducation of people, or nothing more than a tweak to a process, or creating perhaps one extra feed into a data warehouse. So they don't always have to be really complicated; I think that's the key thing. And again, in terms of outputs and tools, we've covered most of that already. Remember, the business case is really important: as Donna said, if you can't relate this to a business outcome, then you shouldn't be doing it. One of the tools we use quite a lot in this stage, or at the early stages of developing and designing improvements, is root cause analysis diagrams, and here's an example of one from a real organization, which obviously we've
anonymized. Basically, they show two things, I think: first of all the interconnectedness, but they also model things like causation chains. This also comes, by the way, from systems thinking initially. You start maybe with poor data quality, which is center, lower middle, and then you ask the question: what causes poor data quality? Well, we've got multiple versions of the truth in our systems; we have siloed data, and problems are always fixed in silos, nobody ever looks at the whole thing. And then you go back and say, well, why do we have siloed data problem fixes? Because we don't clearly prioritize our data efforts. Why don't we do that? Because we don't have a data strategy or a data architecture. So you can go around and around in this model, and it really helps, I think, to understand which you should tackle first. What you can then do is highlight what we call the root causes and the outcomes. If you look at this particular model and start with the yellow blobs, which are the outcomes, you can see that the result of bad data quality in this organization is that they have high rework and failure costs, so their costs are higher than they should be as a company. They've lost revenues because of bad customer experiences and ineffective marketing. And because they've got poor data quality, they are risking the wrath of the regulator, which is easy for me to say. Those outcomes, by the way, if you've drawn this correctly, should be the things where you have arrows, or causations, going into them and nothing coming out of them. Conversely, the root causes are the ones where arrows go out of them but don't come into them. So if you look at this in terms of where the hell should we start with all of this, then you could argue the first thing we need to do is make business people accountable for data, so let's appoint some data owners and data stewards to lead this work. Or it
could be we need a data strategy, because we don't understand what data we need to prioritize. Or maybe we need a data architecture to understand the interconnectedness between the data elements, which is something we can't do at the moment. So again, it's another technique that's a really good starting point for a data improvement project. And talking about which, these things start off as projects, as we mentioned earlier, and once you've identified them I think it's really important to plan them properly as projects, because you should kick-start them as projects. Our very simple way of doing this is something called data improvement planning. Every single piece of work you do within data quality should have something like this driving it: a data improvement plan, which should have an owner and should have many of the elements that you see there on the contents page. A data improvement plan could cover a data domain, so for example you might say, let's figure out how we're going to improve our product data, or our inventory data, or location data. Or it could cover a problem area of the business, like our supply chain process or the data literacy of people within the organization. But every initiative that you do should be controlled by these plans, and if you do that, then we believe the benefits are as laid out there on the left-hand side. I won't highlight all of these now. The key thing about them is that they're not static; like every good project plan, they evolve, so as the business changes, the plan should change. The other advantage of having these is that if somebody asks, what's our data quality strategy, you can roll up these individual data improvement projects to form, in effect, a data improvement program, which is the way we tackled this when I worked in BT many years ago. So data quality is a good case, I think, where a data quality strategy is sort of top-down and bottom-up, so that it's the summation, if you like, or the sum
of the individual projects that you do on the ground that then becomes the program. So at that point I'll hand over for the last time to Donna, just to finish off. So this last phase is often the one folks forget, right? Because we're busy, we fixed it, we want to move on to fixing the next thing. But this really, as Nigel mentioned, isn't a one-time effort; it's a sustainable program. So yes, maybe the initial cleanup and the initial analysis is a project, but how do we keep this up and sustain it over time? Otherwise you're just going to keep cleaning up the same thing. Part of that is accountability, and it's part of an ongoing data governance program, a business-as-usual activity. A nice way to do that, which most of our clients use, is to have some sort of data quality dashboard. These are the 10 data elements or data areas that are critical to our business. For every other area of the business you probably have these sorts of charts: are we on track, are we red, amber, or green, where is there an issue? So you can proactively start to look at some of these and track them. It's a really nice visual way, especially for business users, and some folks look at these at every single governance meeting and just say, are we on track? If not, what do we do about it? And then don't forget, and we often do, especially folks who are maybe a little modest, to evangelize the benefits of what you did. You had a data quality cleanup and you showed the ROI. Did you tell your sponsors about it? Did you tell the CEO about it? Did you put it in the newsletter? Did you thank the people who did it? And then tell them again. The data quality will continue to improve with a bunch of these quick wins that everybody hears about. So really make it part of everyone's job. A cute thing one of my customers did: they had a data-centric initiative, and they had little videos of everyone at their desk, folks from all of the
walks of life that we mentioned: the data entry clerk, the salesperson, the product manager. They would be doing something with data, and they said, this is what data-centric looks like: I'm putting the right values into the field; I'm selling to the customer and I'm getting their email address. This is what data quality looks like. I thought that was really cute, because it really hit home that this is everyone's job; data quality is part of your day job. You don't have to go quite that far, with a whole video campaign, but do keep people aware, keep up their responsibility, and when we do something right, we all see the value. So, just summarizing, because I know we want to move over to questions: I think we hit home that data quality is complex, well, because businesses and organizations are complex, right? So these will always be things we need to look at, just like your finances are something you look at every year at a company. It's not a bad thing, and it's not that data is somehow unique and terrible because we have to keep managing it. So you want to look holistically at the people, process, and technology, and really start to embed that not only in your governance but in some very quantifiable plans, where you can have those incremental improvements that are really multi-dimensional. Before I quickly pass it over to Shannon for questions, our typical plug: we do this for a living, so if you need help, do let us know; we're happy to help. And then one plug that should be helpful: just published, hot off the press, is a blog on data quality and this multi-dimensional approach, which you might be interested in. It shares a little more detail about some of those dimensions, and that's out on our website under the blog section. One more plug: please join us next month, if you're able to, on data virtualization; we'll have a nice talk about that. So Shannon, I would like to pass it over for questions and open it to
you. Thank you, Donna, and thank you so much for this great presentation. Just to answer the most commonly asked questions, a reminder that I will send a follow-up email to all registrants by end of day Monday for this webinar, with links to the slides, links to the recording, and anything else requested. Let me dive into the questions for you. For someone who doesn't have access to paid data modeling solutions, what applications would you recommend to make visual data modeling graphics? I'll take that one, and then Nigel, if you want to chime in. Data modeling tools are great. There are some low-cost data modeling tools; they often tend to be on the techie side. But good old Visio and PowerPoint can work, especially when you're trying to tell that story. I almost put some examples in the slides: put a picture of a customer, put some lines to show a customer can buy more than one product, but the website won't let you do that, or something. So if you don't have a data modeling tool and you're trying to tell the story, and I would not recommend PowerPoint for, obviously, forward engineering a database, but just to start to tell that story, you can be very creative with some of this stuff. How do we determine that this is a sense of this identified when determining the problem theme? I'm sorry. Oh, go ahead. Yeah, I could pick that up if you like. The problem theme is part of that stage one assessment process. In order to gather a lot of the information that you need to do that process effectively, yes, you can look at documents and other things, but the most effective way of doing it is to talk to people. You ask people in interviews or in small workshops or small groups: what are your biggest data quality problems and challenges? And then once they tell you what they are, the first
question we'd always ask is: okay, how does that impact your part of the business? What problems does it cause you? Can you quantify, in any shape or form, the impact of those problems on your business, in terms of money, loss of customers, or anything else? So you gather that information, and what the problem themes that you extract from the rich pictures then do is simply help you group those problems in a way that helps you tackle them as part of a data improvement project. I love it. Well, you guys, this has been so great. And Nigel, thank you for joining us this month. It's been such a great topic; it's always super hot. But I'm afraid that brings us to the top of the hour, and that is all the time that we have today. Again, just a reminder: I will send a follow-up email to all registrants with links to the slides and links to the recording of this session. Thanks to all of our attendees for being so engaged in everything we do. We always love watching the chat and the questions coming in. So I hope everybody has a fabulous day, and stay safe out there. Thanks, Donna. Thanks, Nigel. Thank you. Thanks, all.