We'd like to thank you for joining the latest in the monthly webinar series Data Architecture Strategies with Donna Burbank. Today Donna will be joined by special guest Nigel Turner to talk about data quality best practices. Just a couple of points to get us started: due to the large number of people attending these sessions, you will be muted during the webinar. We very much encourage you to chat with us and with each other throughout the webinar; to do so, just click the chat icon in the bottom middle of your screen to activate that feature. For questions, we will be collecting them via the Q&A section, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DAStrategies. And if you'd like to continue the networking and conversation after the webinar, and to learn more about Donna, just go to community.dataversity.net. As always, we will send a follow-up email within two business days containing links to the recording of the session and any additional information requested throughout the webinar.

Now let me introduce our speakers for today, Donna Burbank and Nigel Turner. Donna is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information.
She is currently the managing director of Global Data Strategy, Ltd., where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa, and speaks regularly at industry conferences. Joining Donna today is Nigel, who has worked in information management and related areas for over 20 years. This experience has embraced data governance, information strategy, data quality, master data management, and business intelligence. He is a great advocate for keeping information management as simple and business-focused as possible, and feels that a key role of information management professionals is to help business people relate information management to real business benefits. And with that, let me give the floor to Donna to get today's webinar started.

Hello and welcome. Thank you, Shannon, always a pleasure to do these, and thank you to all of the familiar names and faces on the call. We get a lot of repeat folks, and I know you're always very active in the chat, which is always nice to see. I don't always answer, because I'm a terrible multitasker, but I always appreciate reading it afterwards. And for those of you who have been on previous webinars, you'll know that they are all available on demand. So if this is your first time joining us, thank you very much, and if any of these other topics are of interest to you, DATAVERSITY is very good about keeping things, I think indefinitely, both the slides and the recordings. Or if you were just so excited by this one that you want to catch it again,
you will have an opportunity. So this month, as Shannon mentioned, we have my colleague Nigel Turner; it's always a pleasure to have him over from Wales, virtually. Next month we'll be talking about self-service BI and analytics, one of the few sessions remaining this year, which we hope you can join us for, and we already have an exciting lineup planned for next year as well.

So without further ado, today we're talking about data quality. Those of you who are data quality experts, or who are trying to embark on a data quality program, realize that it's more complicated than it seems. A lot of these issues, and we'll talk a lot about this, seem very simple on the surface. How hard is it to fix a wrong zip code, or a wrong gender code in a medical record? Okay, we just fix it. Well, as we get to root cause, really getting these right, and getting them right long-term rather than as a quick fix, takes a holistic architectural approach spanning people, process, and technology, and we'll talk about all of those. And as the theme of this series is data architecture, we always like to look broadly; that's our way of working in general. Many of you who have joined these presentations before will have seen this architecture slide. We always go back to this framework, because you kind of can't not, right? As Nigel and I were preparing this, we kept going back and forth; it became a bit of a meta conversation, because we are big fans of looking holistically. To understand data quality you need to look at business strategy, which drives everything. Why is this important? What is the business meaning of a rule? What's the business value of fixing some of these things? Who are the people governing it? Do we have the right architecture to design the data quality and keep it relevant?
Do we have metadata management to know what those rules are? And we kept checking ourselves, asking: is that data quality, or is that governance? Are they the same thing? Because they are so intertwined, right? A lot of these topics we have done a deep dive on in previous webinars, or will in future ones, and we obviously can't cover everything in an hour. So in this particular presentation, and we haven't done this one in a while, we'll focus on data quality itself, and of course there will be touch points with everything. As I mentioned, you can't think of data quality without governance; they go hand in hand. You shouldn't think of data quality without understanding the business context, and you can't do data quality without understanding the metadata, the technical and business lineage behind it. So we'll touch on all of this. It's a favorite slide of mine because, to me, it sums up the interconnectedness of all of these disciplines. So without further ado, I'm going to pass it over, and we'll switch back and forth as we usually do, to my colleague Nigel, who's going to kick us off. He's been working many years in data quality, so we're interested in what he has to say. Nigel, welcome.

Okay, yeah, thank you, Donna, and good morning, good afternoon, good evening, wherever you're listening to this webinar. I was enjoying the warm-up music at the beginning; I had absolutely no idea that Donna and Shannon were such good singers. Anyway, you learn something new from these things every time. So, our agenda today: what I'm going to cover with Donna is, basically, to start with some basic definitions. As Shannon said earlier, I like to keep things simple, and that's certainly true with data quality. Then, obviously, to cover why it matters for many organizations.
I'll try to prove why it matters by giving you a couple of pretty recent examples of what happens when data quality goes wrong, talking in more general terms about how poor data quality can really hurt an organization and stop it doing the things it wants to do. I'll then finish off by talking about how data quality has traditionally been approached and addressed in many organizations, why that approach still has value, but why it's no longer the only way to tackle things, and why, as Donna said earlier, more holistic approaches are now needed. When I started in data quality, it was seen almost as a discipline in itself, one that was divorced from and outside all other data disciplines. Today, as Donna said, what we're finding is that data quality has to be done as an integral and inherent part of any data management initiative within an organization for it to really succeed and reap the benefits. Then, hopefully, if we finish on time, we'll also have time for some Q&A at the end.

So let's start with some definitions. Oh, I don't know where that one's gone; we're on slide 16, according to me. Thank you. Data quality: a simple definition. This is my definition of data quality. There are many definitions out there, and some of them are very complicated; this one is my favorite simply because it's simple. What does "demonstrably fit for purpose" mean? I think "demonstrably" means that it's all very well saying "I think our data quality is okay" or "I've got a feeling that we could improve it a bit."
It's really about quantifying and proving how good your data actually is. That implies that an essential component of any sort of data quality management or improvement is that you can measure the baseline: how good is your data quality now? Put some numbers around that, look at what sort of numbers you need to achieve to get the data quality fit for purpose, and then come up with plans for how you get from where you are to where you want to be. And the second bit, "fit for purpose": what does that mean? I think the other key message is that data quality isn't an absolute. I've worked, and Donna has, in many organizations over the last few years, and there isn't a single organization out there that can put its hand in the air and say, we've got our data quality totally under control, all our data is 100% accurate, 100% complete, and reliable wherever it sits. That just isn't true. What "fit for purpose" basically means is that, depending on its uses, the data is good enough for the use it's put to. If you're doing a monthly finance report and the data that goes into that report is two weeks out of date before it's published, that doesn't really matter, because that window is fine. But if you work with a transactional system or an online purchasing system, where you need the data to be right and right up to date for it to work within the context of the business process, then the quality of the data needs to be pretty much 100%, straight away, right at the beginning. These sorts of things have to be put in that business context, as Donna said. So what does having data that is demonstrably fit for purpose mean? It means the things on this slide. First of all, it needs to be accurate enough. What accuracy means, I think, is: does it model the real world? So if a company out there holds my National Insurance number, is that my correct NI number?
If it isn't, then it's inaccurate; it doesn't model the real world, simple as that. Completeness simply means: do you have all the data that you need? If, for example, you're doing online marketing and you've got a lot of customers in your CRM system with no valid email address, then clearly your data isn't complete and fit for purpose. Reliability simply means: is the data consistent across different sources? So, disregarding the timescale point I mentioned earlier, generally speaking, if you have details on me as a customer in your CRM system, they should be roughly the same in your marketing system, and maybe in your sales system as well. Basically, you need to be sure that the data is reliable. Those are what I call the content criteria of data quality, in other words, criteria about the data itself; the other two are about the context. Obviously, having high data quality isn't any good if the people who need access to it can't get access when they need it. And the second contextual criterion is timeliness: if you're working as part of a process and you need data that was accurate as of yesterday, it's pointless getting that data a week too late, because it simply won't work. So that's a fairly simple definition, I think. And why does it matter?
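As an editorial aside, Nigel's point about measuring a baseline and putting numbers around it can be sketched in a few lines of Python. The sample records, field names, and the 95% target are illustrative assumptions, not figures from the webinar:

```python
# Measure a completeness baseline and compare it against a
# fit-for-purpose target. All data here is hypothetical.

def completeness(records, field):
    """Share of records with a non-empty value for the given field."""
    filled = sum(1 for r in records if r.get(field))
    return filled / len(records)

customers = [
    {"name": "A. Jones", "email": "a.jones@example.com"},
    {"name": "B. Smith", "email": ""},                    # captured but empty
    {"name": "C. Patel", "email": "c.patel@example.com"},
    {"name": "D. Evans"},                                 # never captured
]

baseline = completeness(customers, "email")
print(f"email completeness: {baseline:.0%}")  # 50% for this sample

# "Fit for purpose" depends on the use: online marketing might need 95%,
# while a monthly report might tolerate far less.
TARGET = 0.95
print("fit for purpose" if baseline >= TARGET else "improvement plan needed")
```

The same shape of check extends to the other dimensions Nigel lists: accuracy (compare a field against a trusted source) and reliability (compare the same field across two systems).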
Well, I'm sure many of you will be very familiar with this slide, or maybe not exactly this slide, but certainly some of its key messages: if data quality isn't demonstrably fit for purpose, it has a very negative impact on companies and organizations. It has an economic impact; it hits the bottom line of any organization, whether a private company or even a government department, because government departments have incomes and they have costs too. Revenue is a good example: you can lose a lot of revenue if your data isn't fit for purpose. If you've got poor marketing data, for example, and you rely very heavily on marketing mail shots to your clients, then the chances are that the response rate will be lower than you expect, and therefore the revenues you would hope to achieve are going to be lower than you would want. Similarly, in terms of costs, there is the famous phrase that one of the gurus of data quality, Larry English, came up with when he talked about the cost of failure: every time data is found not fit for purpose within a business process, somebody somewhere usually has to do something to put it right, and not getting it right first time incurs a cost. Those costs, as you'll see later, can really add up. And of course, if your costs are higher than they need to be because of poor data, and your revenues are lower than they should be, that impacts profits and the bottom line in a commercial organization. But it isn't just about economic costs, because poor data quality also impacts other things. I'm sure we've all seen, and I'll come to a couple in a minute, horror stories from organizations who got data wrong.
It impacts their brand, damages their reputation, and also damages customer loyalty. Because I'm a bit of a geek about these things, I have a policy that if any organization sends me a communication, whether through the post, through email, or any other way, and they cannot get my basic details right, like how to spell my name, I think, well, they obviously don't manage data very well; I suspect they wouldn't manage my insurance very well, or sell very good car parts. If they can't get that right, it does give an indication that the company is not trustworthy in some way. And of course a big driver in recent years, particularly in some parts of the world, has been the importance of data quality in terms of law and regulation. I'm sure you're all familiar with the General Data Protection Regulation that came into force across the European Union in May 2018. Article 5 of that states very simply that personal data shall be accurate and, where necessary, kept up to date; that phrase "where necessary" means it must be fit for business purpose and comply with personal data laws within the European Union. So all those things are very good reasons why you need to get it right. But despite those reasons, companies still get it wrong, and here are a couple of very recent examples. The first one might surprise you, because Amazon is often held up, quite rightly,
I think, as a paragon of good data management. They are a data management company first and foremost that happens to sell things, so this goes to prove that these errors can happen to the best companies in the world. Basically, this happened on their Amazon Prime Day, which weirdly extends over two days, the 15th and 16th of July 2019. The bullet points basically tell the story: that lens normally costs about 10,000 pounds, around 13,000 dollars, but at the start of Prime Day they quoted a price for it of 78 pounds. Eventually the error was spotted and they adjusted the price to the correct one, which was about nine and a half thousand dollars. But in the meantime, of course, such is the power of social media, as soon as a few keen souls got on the site and noticed it, they told all their friends, who told lots of other people as well, and hundreds at least, Amazon won't say how many, of those lenses were purchased at that incredible price of ninety-four dollars. Rather than get into any legal complications, Amazon decided to honor the deal, but in effect they lost six thousand pounds, about seven and a half thousand dollars, for every lens they sold. Now, Amazon, you might say, can afford that, and good luck to the consumers, but it's a great example of how hundreds of thousands of pounds can be lost by an organisation because of one fairly simple error. And it's an error you would think could be quite easily spotted, because the average reduction on Prime Day was apparently about twenty-five to thirty percent. With a simple business rule that asks, is this sale price more than twenty-five or thirty percent lower than the full price, Amazon would never have published that price in the first place. So even the best of us can make mistakes. That's a good, very recent example of an economic consequence of bad data, but I've got a more personal one as well. Not personal to me.
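The sanity check Nigel describes, flagging any sale price discounted far beyond the expected promotional range, takes only a few lines. This is an illustrative sketch, not Amazon's actual system, and the 35% cap is an assumed threshold:

```python
# Reject sale prices discounted more deeply than a plausible maximum.
# The 35% cap is a hypothetical cutoff a retailer might choose when
# typical promotions run around 25-30% off.

def plausible_sale_price(full_price, sale_price, max_discount=0.35):
    """Return True if the discount is within the plausible range."""
    if full_price <= 0 or sale_price <= 0:
        return False  # nonsense prices fail outright
    discount = 1 - sale_price / full_price
    return discount <= max_discount

# A 10,000-pound lens listed at 78 pounds is a ~99% discount:
print(plausible_sale_price(10_000, 78))     # blocked before publishing
print(plausible_sale_price(10_000, 7_500))  # an ordinary 25% promotion passes
```

The point is less the arithmetic than where it runs: a rule like this only prevents losses if it sits in the publishing pipeline, before the price goes live.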
Thank goodness. This was one I came across very recently in the UK, in the National Health Service. A man entered hospital for a cystoscopy operation; I can barely even say it. If you don't know what that is, I had to look it up: it's where a camera is inserted into someone's bladder to examine the state of the inside of the bladder. Unfortunately, in the hospital at the same time was another patient who had a very similar name, and I've avoided saying what that name is. I think the golden rule is, if you go into hospital, make sure you've got an unusual name; this is far less likely to happen to you. When this man came around from his operation, he'd been confused with the other patient, and he had been given a circumcision instead of a cystoscopy. Now, I mentioned earlier that poor data quality is a cost of failure; I think you'd probably call that one a cut of failure. But it was one of a string of errors at the hospital around bad management and bad use of data, and a major investigation is now underway into the workings of the hospital. The brand and the reputation of that hospital, apart from the injury to the poor man, are things that don't recover very easily once you get these things wrong.

Donna and I have worked in lots of different companies over the last couple of years, and one of the things we like to do when we interview stakeholders during data assessments and data maturity assessments is to take down quotes of some of the things these people say. These are from a number of different companies, from different industries, different sectors, different roles, different parts of the world, and yet some of these issues come up time and time again. I certainly won't go through them all, but the bottom right is probably my favorite, which was a conversation with the marketing manager at one of our client
companies: "Well, we don't really know how good our data is, and the only way we find out it's wrong is if a customer contacts us and tells us." Basically, when they do a marketing exercise, they assemble the data, wish for the best, press the button, and send it. That's not a very good feedback loop for data management and data improvement. Lots of companies also tell us that there's no accountability for bad quality data; we'll come back to that later. Basically, no one is responsible. And I think the top right is also significant: there's a lack of appreciation of what happens to data from the front end to the back end. I always liken data quality problems to Newton's cradle, if you remember it, an old executive toy that was very popular back in the 1990s, for those of you old enough to remember. There's a series of balls hanging on strings; you hit the ball at one end, and because of Newton's laws, the ball at the other end springs out. That's what happens with data: where the problems are caused is often not where the problems have an impact and cause pain. That's one of the key reasons why you need a holistic, cross-company approach to data quality rather than a project-by-project approach. But you can see there are lots and lots of negative impacts from poor data quality.

And if you add all those negative impacts up, here's some fairly recent evidence, some of it a little older than the rest, from the many studies that have been done to show how damaging bad data quality actually is. The top two are really about the company level. The most recent data there comes from a survey done earlier this year: half of the organizations surveyed said they believe a quarter of all the data they hold to be inaccurate. And in our experience, if they think a quarter of their data is inaccurate, in reality it's probably more like a half, so that's
probably an underestimate. The top right is a figure that's been validated time and time again: if you have an organization where data quality problems are endemic, that can cost you up to a quarter of your revenue in things like failure costs and lost revenue, which is really quite staggering. The bottom left is an economy-wide impact: IBM calculated a couple of years ago that around three trillion dollars a year is lost to the US economy because of data quality issues. And the bottom right one, from the UK, found that problems with customer data alone cost a company an average of six percent of its annual revenue. So it seems a bit weird, doesn't it? We all recognize how important good data quality is, and yet these problems persist. What's more scary is that some of these numbers simply reflect what I remember from when I first started in this, something like 15 or 20 years ago. So why hasn't the situation improved? Why do these things continue to persist?

Well, the first thing to say is that the data world has changed an awful lot in the last 20 years, and therefore the problems that organizations are dealing with today are not the same problems organizations were dealing with when I started in this back in the late 1990s. The data world has simply become more complex and diffuse. Apart from the increasing volumes of data that every company is now experiencing, in many companies doubling every 15 months, the speed at which that data is processed has increased as things like Internet of Things technologies come in, and of course the variety of data is so much broader than it used to be, where you've now got structured, semi-structured, and streamed data, etc.,
all hitting a company much harder, so getting a handle on it is obviously more difficult than it used to be. The second reason is the age-old one I mentioned earlier: accurate data models the world, and unfortunately the world changes. If you don't take proactive steps in your organization to capture those changes and model them in your data, your data inevitably and inherently gets out of date, in many cases the day after you first collect it. The third thing is that I've been in many, many organizations where the predominant paradigm is unfortunately still "data's bad, data quality's poor, it's all IT's fault." It's not IT's fault: data is a business asset, and it's a business problem. Data quality problems in a business are caused by the business, and therefore, to sort them out, the business has to take prime responsibility. And people will make mistakes with data, as they did in that NHS hospital; that inevitably is going to happen, so you need to cater for it. We've talked in other webinars, I know Donna has, about the lack of common data definitions and metadata around data, which can cause people to misinterpret it and misuse it. I've mentioned the data Newton's cradle: it's hard to solve a problem when the people creating the problem don't feel the pain, because you've got to get the people who do feel the pain to talk to the people who create the problem, and in siloed organizations that's quite a hard thing to do. And then finally, as Donna said earlier, it's all about governance: if nobody is responsible for improving data, well, guess what happens, no data is improved, and we see that time and time again. So let's take some of these in turn.
This is a diagram I came across recently about the data world becoming more complex, and I admire the people at FirstMark who created it; it must have taken a very long time. It looks very pretty, but I think it's an indication of how much more complex the data landscape is today than it has been in the past, reflecting the increasing variety, velocity, and variability of data. I think it also reflects the fact that the data management industry, with things like big data, is getting much, much bigger, so there are a lot more vendors out there than there used to be. When I started in data management many years ago, we had mainframe computers and dumb terminals, and it was a lot easier to manage data in that environment. In this environment, all those tools hold, process, and manage data. So how do you start to address a problem where variances in that data across those platforms and systems are inevitable? You simply cannot keep all of those completely up to date. And I mentioned earlier that the world changes; there's a little bit of research on the next slide that I did earlier. Not earlier today,
I mean earlier this year. The world does change, and it changes surprisingly quickly. Just to put this in context, there are about 60 million people living in the UK at the moment, and 3 million of those 60 million, roughly 5%, move house every year. So if you've got a marketing database with customers in it and you don't actively seek to keep on top of that, then every year 3 million people effectively drop off your marketing list because you've got their address wrong. Lots of babies are born every year; lots of people die every year, of course; unfortunately, lots of people get divorced, which causes all sorts of splits; and people come into and leave the country as well, although in the current Brexit mess we're in, goodness knows what those figures will look like in one or two years' time. Your guess is as good as mine. And it's not just on the consumer side that data changes all the time; these are some facts from business-to-business, B2B, as well. There are about five million businesses in the UK at the moment, roughly half a million new businesses start up every year, and of course a lot of companies also disappear every year.
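The compounding effect of churn rates like these is easy to underestimate. Here is a quick sketch of the arithmetic, using the roughly 5%-a-year house-move rate above; the function and scenario are illustrative, not from the webinar slides:

```python
# How much of an unmaintained contact list is still current after n years,
# assuming a constant annual decay rate (e.g. 5% of people move house).

def still_current(annual_decay, years):
    """Fraction of records untouched by decay, compounded over the years."""
    return (1 - annual_decay) ** years

for years in (1, 3, 5):
    frac = still_current(0.05, years)
    print(f"after {years} year(s): {frac:.0%} of addresses still current")
# after 1 year(s): 95% ... after 5 year(s): 77%
```

At B2B email-churn rates of around 30% a year, the same formula leaves barely a third of contacts valid after three years, which matches the "out of date in three years, useless in four" rule of thumb below.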
So if you're doing B2B, keeping track of all that is quite a challenge. There's a surprising figure on the bottom left, but I've seen it validated in several places: roughly 30% of business people, nearly a third, change their email address every year. So if you're trying to keep a contact database for your B2B contacts and you don't actively manage it in some way, it's going to be badly out of date in three years and basically useless in four; the average decay is about 2% a month. Again, if you don't actively manage these things, things tend to go wrong. So those are a couple of reasons why data quality problems persist, and I'll hand back over to Donna now, who's going to talk about some of the other reasons these problems continue.

Yep, and I think a lot of you can probably relate to some of those. More importantly, I think we can all understand them, but how do we fix it? Because hopefully that's why you joined this call. As Nigel mentioned, it's not an IT problem, or not only an IT problem; it's also a business problem. So I think the key is getting both IT and business working together, because generally there's a piece of everything; it's not that IT is innocent either, right?
So if you look at the triangle of people, process, and technology, sometimes it's just human error, such as when there's manual data entry, which even with automation can't always be avoided; often that's how the data gets in. But that's where, for example, pairing with technology can help. Maybe the person put the wrong gender code into the hospital system, typing the word "male" instead of the letter M; could the system just use a drop-down? If there are only two values, or four values, or whatever, in the gender code, make sure those are the right ones. Or if you're doing 90% of your business in a certain region or state, why not default to it? Just minimize human error. So understanding how the data is used, and how technology can help with that, can be a big benefit. Sometimes it's the human interface with the technology that has the poor design, and sometimes it's the business process itself. When we look at the idea of data silos, so much of it comes back to that; I've been watching some of the comments, and people say things like, fit for purpose, but whose purpose? That's exactly the issue, and that's where things like governance come into play. Certain things have enterprise-wide standards. Can we all agree whether the gender code is M and F, or with T for transgender, or whether we use the words "male" and "female"? Let's just agree on some of the basics and make sure that's done consistently, but also understand the why behind it, because sometimes that's where data problems come up. So getting those agreed standards, making sure they're part of the business process, making sure people are following those rules, making sure everyone's trained: really, that's the crux of getting everything right.
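A capture-time check along the lines Donna describes, agreed code values plus a sensible default, might look like the sketch below. The code set and default region are hypothetical examples, not a recommendation:

```python
# Validate entries against agreed reference values at the point of
# capture, instead of accepting free text and cleaning up downstream.

VALID_GENDER_CODES = {"M", "F", "T"}   # whatever standard governance agrees on
DEFAULT_REGION = "CA"                  # default to where most business happens

def capture(gender, region=None):
    """Normalise and validate a record at entry time."""
    code = gender.strip().upper()
    if code not in VALID_GENDER_CODES:
        raise ValueError(f"invalid gender code: {gender!r}")
    return {"gender": code, "region": region or DEFAULT_REGION}

print(capture("m"))    # normalised to the agreed letter code, default region
try:
    capture("male")    # free-text entry is rejected instead of stored
except ValueError as err:
    print(err)
```

A drop-down in the UI enforces the same rule one step earlier; the server-side check catches whatever slips past it.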
And that really ties into accountability, as Nigel mentioned. We do a lot of these projects, and often we start out with interviews; you probably do similar things, whether you're a consultant or just working with your own team. You might introduce yourself as being part of the data governance team or the data quality team, and everyone loves to vent about the problems. They say, great, you're going to fix that, right? No, we're going to fix that, right? So yes, there are things systems can do, and we can help with metadata, but the data is often entered by business people, and the rules should come from business people. What you don't want, and I think IT can be just as frustrated by this, is IT having to come up with the business rule for how total sales is calculated. That should be sales, or accounting, not me. But sometimes, just because things need to get out the door, that happens. One of the quotes from one of our customers that we liked, because it just sums it up: we often start with core principles for data governance, like "data is everybody's responsibility," and people nod along. But if we're all responsible, no one's responsible, right? Nothing changes. So yes, we all should take care of data quality, but I am responsible for the sales data that I type into the CRM system, right? So I think defining that matters. We can talk about the general issues; Amazon had a problem with sales prices, but who could have affected that? Was it an IT issue or a business issue? Not to punish people, but to really understand root cause and give people stewardship and accountability. And that really helps with turning this big ship around; it's not something that can happen overnight.
It happens over time, but some of the things that can help turn the ship around start with that business-led data governance framework, and both Nigel and I could wax poetic for hours about data governance; it's near and dear to our hearts in different ways. Really getting that framework and the organization in place means understanding what data you're prioritizing, who does what, in which part of the organization, and how we escalate these issues. Data quality is as much a process, people, and policy issue as it is a technology issue. Once you have that framework in place, you have the right people at the table, and IT is right when they say, I shouldn't be the one deciding whether it's "M" or "male," or whether it means something else altogether; that should come from the business, your steering committee, your stewards, your owners, and then we have something we can reuse. And with data silos, often the problem is that one part of the organization solves it locally. We want to make sure, if you have something like a data quality tool or a holistic master data system, that the same rules are used across the organization, so that you don't fix something in one place and it just gets spoiled down the road. Where you can automate, all the better, though we want to be careful about what we mean by that. I can still remember a very painful call early on when we were trying to explain this to one particular client; generally 99.9% of the people we work with are very understanding of this, but there was one in particular. We were trying to explain the data quality issue, and he said, gosh, can't you just come in and fix the address data and be done with it? How complicated can it be? It's address data. Needless to say, that was a challenging project, because it might seem like it's just so easy and you can just automate it. And yes, you can automate it,
and the analogy we often use is that you can clean up the pond in your backyard. There's some pollution in the pond, but if the pollution is coming in from the streams feeding that pond, your pond is still going to get dirty, right? So I think a lot of people, reasonably I guess, think in the beginning, "We'll just run the profiling script and be done; there are automated tools that can do this." Yes, there are, but you need to make sure those automated tools are using the right rules, that they're aligned with your business process, and that they're being applied at the source system, not at the data warehouse. Too often people catch it after the fact and think they're done. Well, yes, we caught that these codes are wrong; now how do we fix that? Is it master data that we need to publish as reference data, with metadata about how to use it? Et cetera. And that really ties into the next one, which is obviously near and dear to my heart: really linking to your data architecture. And I'm using data architecture very broadly: do we have a data model to understand what we mean by customer and how that links with contact, and so on? What are the fields, what are the valid values? I might even be slightly worse than Nigel with these rants, about getting something in the mail with my name wrong, or my bank sending me an ad for a credit card that I already own, because I know how that could be fixed, right? So getting these business rules and understanding the valid values, some of it could just be understanding the difference between a postal address and a physical address. Do I have a PO box, or is it a physical address? Add the valid values, right?
So that's kind of your data architecture. That might be in a data model, but also in your holistic enterprise architecture. I had one client that had one of the industry-leading master data management tools. It was integrated very well with one of their systems, but wasn't integrated at all with about 80% of their other systems. A perfect example: their siloed data architecture worked really well, but they were still having all of the classic problems, customers not getting the right mailings, products not arriving at people's houses, because they didn't look at the holistic architecture. They did the micro data architecture, which again wasn't wrong for them, but it was wrong for the organization, because it wasn't tied into the larger data architecture and the larger data governance framework to get the right people to the table. Again, very few times in my career have I met someone who is maliciously saying, "I want to make my data quality bad," and if you do, that's an entirely different issue. Mostly people just need to get their job done, and they don't see the downstream effect. If you've walked around a city, some of the covers on the storm drains say something like "this water flows into the Chesapeake Bay," because you might not think, when you dump your cigarette butts or the oil from your car into that drain, that it's going to be your drinking water down the road, right?
So just some education about where that data is going to go can be a big improvement. In fact, I think I've told this story in these webinars before, but I need to tell it again. It was a retail company, and they were having the classic problem of email addresses not being updated and not being correct, and some of their high-profile loyalty customers weren't getting their emails. So we did your classic data flow diagram and system architecture, and showed it to the chief marketing officer, not someone who would normally be thinking about data. But we clearly showed what happened when we changed the email and it didn't flow through, and she was very bubbly, as you can imagine a marketer might be. She said, "Oh my gosh, I never thought in my life I would say the words 'data flow diagram,' but I love it. That exactly explains why we have the problem." And so, for the data architects on the call: you often can get a seat at the table for very high-level decisions if you explain these kinds of impacts. That's why I love data architecture. You can really see how small changes can affect something downstream. So then, once you've found these small things that affect something downstream, you build the business case, because there are a lot of things we could fix. What is the impact this change is going to have? To keep going with that analogy: with that marketing company, what we did was basically a very targeted micro enterprise architecture, if that makes sense. We said, there are a lot of issues, and we know email is an issue. If we can just fix email address, that's going to fix 90% of our marketing mailings, it's going to help our loyalty program, and it's going to help shipment, because we send shipment notices to our people. If we could just fix, literally, just email address. And we did that. We did a cost-benefit analysis, we had the marketing team sign up, we had supply chain sign up. And of course we fixed
some other problems along the way by understanding that flow of data, but we had a very targeted use case, we got people bought into governance, and we were able to solve things. So I think that's important to remember, because it can be overwhelming, and there are a lot of things you could fix, but what's going to be the highest value? And then, when you do have this high-value item, what do you do about it? Nigel will talk more about this later: do you have a data improvement plan? You can't improve what you can't measure. So yes, we can all tell the funny stories about quality and complain about what we found wrong, but what tactical thing are we going to do, and what ROI are we going to show from it? And then you show the benefit from that, and that closes the loop: you have the data governance framework, you can report on it, everyone knows the value of it, and it's a repeatable process. Which ties into our favorite topic, data governance, and it truly is a framework, starting from, as we've both talked about, what are your business goals, why are we doing this, and what issues and challenges keep us from doing it with the data we have? And keep it simple. Nigel mentioned it, and I'm a fan as well: the more complex it gets, the worse. Keep it simple. Think of pilots on an airplane: they have very simple checklists. "I'm going to fly a massive plane with complex systems. Did I turn the switch? Did I do x, y, z?" And anyone who has ever worked in emergency services, or anything with complexity,
they always go right down to basics. So especially if you're a technical person on the call and you're speaking to the business, just keep it very simple: "We need to fix the email address because it's affecting your marketing campaigns, and we're losing 90,000 US dollars in potential revenue as a result." Everyone can get their brain around that and then go fix it. So be very clear on what you're going to solve and how, and then get the right organization and people. And I'll just use this email analogy to death, because I think it's just so easy to understand. Right, so that email address: who cares about getting the email address right? I mentioned it before: sales cared, the loyalty program cared, marketing cared, the supply chain cared. Get the right people involved, and the right people to make the decision, because so often you think you're being helpful: "I don't want to bother people, I'll just fix this email." And then you realize someone is using it a different way. For example, we actually went into the stores of this retail organization and talked to the salespeople. And they said, "Oh yeah, we never get the email, because we want to close the sale. So we don't even ask; we just put in me@me.com and move on." And then we looked, and we profiled the data, and there were 20,000 entries of me@me.com. As soon as we explained what that email was used for, they changed how they did it. But again, had we not literally gone onto the sales floor, we might not have seen that; we might have just thought, "that's a data quality issue." So think outside the box, think of everyone it could affect, and put a process in place. Which ties into process and workflows: do we have a process for data remediation? Do we have a process to escalate issues? Often the people who find the issues aren't the ones on your data governance steering committee; they're the ones in the field being affected by them. So how can they escalate? How do you manage that?
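As an aside, the kind of profiling that surfaced those 20,000 me@me.com entries boils down to a frequency count over the email column. A rough sketch, with a made-up sample and threshold rather than anything from the actual project:

```python
from collections import Counter

# Hypothetical sample of a CRM email column; in practice these values
# would come from profiling the live source system.
emails = ["me@me.com", "jane@example.com", "me@me.com",
          "bob@example.com", "me@me.com", "none@none.com"]

def flag_placeholder_emails(values, threshold=3):
    """Return addresses that repeat suspiciously often --
    likely placeholders typed in just to close the sale."""
    counts = Counter(v.strip().lower() for v in values)
    return {v: n for v, n in counts.items() if n >= threshold}

suspects = flag_placeholder_emails(emails)
print(suspects)  # {'me@me.com': 3}
```

A commercial profiling tool does the same thing at scale, adding pattern and format checks, but the principle really is just counting values and asking whether the distribution makes business sense.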
How do I know what good looks like? How do I show ROI on that? Nigel will talk more about that. And culture and communication: we could do a whole lecture, and we have a whole presentation, just on that. That's really what solves the "you're going to fix data quality for me, right?" problem. No: everyone realizes that yes, data is everybody's responsibility, and this particular piece of data is my accountability, and I will take charge of it and make sure it's right. And then, of course, the tools and technology. But that is really the icing on the cake: if you don't get the top of that house right, having the foundation doesn't help. So I'm going to pass it back to Nigel, who's going to talk a little bit more about how some of these trends have changed and how that applies directly to data governance. Yeah, thanks Donna. I mean, the good news is that the data world has changed enormously since the discipline of data quality really took off, which I'd say was around the 1990s in my experience, and this is really how the focus has shifted. When I started, data quality pretty much only focused on batch, because batch was the predominant mode in those days, so things like data warehouses and the like were really the main focus. Today there's a lot more focus on getting data right, as I mentioned, in real time, online. And that suggests a whole different approach to data quality, because in batch the implication is that you can be reactive: you wait until the problems occur and then you fix them. Today you can't afford to do that, because in the digital world you can't wait until things fail. You've got to get it right the first time and prevent the problems from happening in the first place. I mentioned before that a lot of data quality was driven by the IT department; today, through governance and other things, and with the rise of the digital company,
it's now becoming much more business-driven, so your approaches need to reflect all these things. It was also very platform-specific: your CRM system had data quality problems, so let's try and fix those problems reactively in your CRM system. As Donna said, today it's much more about identifying the horizontal flow of data inside and also outside your organization, and how those flows are impacted by bad data as it passes through. So enterprise-wide approaches are now necessary in order to get a good handle on this. Again, the original emphasis was on data cleansing: wait for the problem to occur, then do a clean-up. Get the scrubbing brush out and scrub the data a bit. The trouble with that approach, of course, is that you never finish, because the world changes so quickly that as soon as you've cleaned the data you're cleaning it again and again and again, which is itself a cost of failure. So there's a lot more discussion these days about a data re-engineering approach. I had this philosophy when I did this in a telco: you don't fix data unless you also fix a method for how to keep it fixed, because otherwise you constantly keep cleaning it all the time. There's a lot more focus on that, and the tool sets, as Donna will touch on in a minute, have improved significantly. And today a lot of the focus around better data quality is not on operational processes; it tends to be more and more on reporting and analytics. As big data and analytics become more prevalent and important in organizations, the importance of getting data quality right to support that BI and analytics focus is growing as well. And instead of putting data quality tools into platforms like data warehouses, today data quality tools are much more often offered as a service across the organization, so that you have a set of common rules, as Donna mentioned earlier, and apply those rules to whatever system, whatever platform, is using that
common data. And that means you can't be siloed anymore; you have to be part of a broader, holistic data management change program, which was also touched on earlier in the conversation. But that's not to say the traditional approaches to these problems don't still have relevance. When I started in this field a long time ago, people were really focused on some of the key systems, like CRM systems, ERP, data warehouses, operational data stores, and so on, and the approaches that were originally developed then are still very relevant today when you're looking at data issues in those areas. I'm sure you're familiar with many of these steps. You start basically by profiling the data, actually looking at the real data contained in those platforms, looking at gaps, inconsistencies, and obvious errors, and quantifying a baseline for that data. Then you develop some data standards and definitions, build the business rules on the basis of those, automate the cleansing and enhancement wherever you can, embed those standards and rules into batch and sometimes real-time environments, and produce the KPIs to make sure this isn't a project that goes away.
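Those first two steps, profiling the real data and quantifying a baseline, can be sketched in a few lines. The field names and the simple completeness rule here are assumptions for illustration, not from the webinar:

```python
# Minimal profiling sketch: quantify completeness per field as a baseline.
# The records are a hypothetical sample of a customer table.
records = [
    {"name": "Ann", "email": "ann@example.com", "postcode": "CF10 3AT"},
    {"name": "Bob", "email": "", "postcode": None},
    {"name": "Cy", "email": "cy@example.com", "postcode": ""},
]

def completeness_baseline(rows):
    """Percent of non-empty values per field, a simple data quality KPI."""
    fields = rows[0].keys()
    total = len(rows)
    return {
        f: round(100 * sum(1 for r in rows if r.get(f) not in (None, "")) / total, 1)
        for f in fields
    }

print(completeness_baseline(records))
# {'name': 100.0, 'email': 66.7, 'postcode': 33.3}
```

Rerunning the same measure after each improvement cycle is what turns this from a one-off audit into the KPI that keeps the program alive.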
It becomes a continuous process. So data quality improvement is not a one-off project; it actually becomes a continuous, business-as-usual process. But the way things have changed brings us to what we call the new age of data quality. For digital organizations in particular, those approaches, although they have value, aren't enough in themselves, and I think there's an increasing need for some of the approaches you see here. I'm not going to go through all of these, in the interest of time, but I would highlight the increased focus these days on validation rather than cleansing and enhancement at the back end: it's going to be done at the front end. And I think the concept of intelligence at the edge is now being accompanied by something I call data quality at the edge. If, for example, you have a smart meter or an Internet of Things device, then building some basic data quality checking into those devices is a good idea, so that you're pretty confident the data will be correct by the time it hits your systems, rather than waiting until it hits the systems and then trying to sort the problems out there. It's a golden rule of IT that we're all familiar with: if you prevent a problem, you fix it a lot more easily than if you wait for the problem to occur. So some of those things, I think, are going to be increasingly important. And things like artificial intelligence as well. Certainly at Cardiff University, in my home city, where I work part-time, they're looking at things like self-checking data quality tools and algorithms in AI, so that if the data feels wrong, the system itself learns that the data is wrong and starts to flag up some of the issues and errors. And as Donna mentioned as well, the days of IT control of all this have gone, and the tool sets and approaches these days need to give business users a lot more control in terms of
dynamically preparing business rules. As I mentioned, fit for purpose becomes more complicated, because what's fit for one business intelligence consumer will be different for another, but that's fine; that's how dynamic and flexible these approaches and tools need to be. And because of end-user self-service data quality, a lot of those users themselves, eventually (and I don't think we're there yet), are going to need some of these capabilities themselves, rather than passing the buck to IT and saying, "Excuse me, can you do some data preparation for me?" Clearly these end users need to do that themselves; there simply isn't time, and IT would become a bottleneck. And so the tool sets that support improving data quality, and Donna will touch on those in a minute, must be able to operate on a much wider variety of platforms than previously. They should work on a big data platform as well as on a traditional relational data warehouse, for example, both in real time and in batch, and increasingly on semi-structured and unstructured data types, so that you can do data profiling, for example, of key data within unstructured data. A lot of that is beginning to take shape, but I think over the next few years we'll see more and more commercial tools come out that will actually make that a reality for many more organizations than at the current time. So those are some of the new ways of working, but finding the right balance is important, so I'll hand back to Donna to talk about that a bit more. Sure. So, as Nigel mentioned, we have to be realistic and proactive about this, but there are also the realities of life, right? So I think the proper mix will always be a mix of both human and automated, because data quality is fit for purpose for human beings, but we can automate a lot of it. So what is the right balance? What can we automate?
What can we fix after the fact? What do we do up front? Obviously it's a continuum, but think of it along two axes: are we resolving it at the source, or post-processing after the fact? And what's automated versus what needs human intervention? If you're at high human intervention and high post-processing, doing it after the fact: well, when Nigel and I build data governance frameworks, one of the first things to do, and we still say it's a good idea even though it is very reactive, is to ask what some of these issues are, form a data quality working group, fix them quickly, and move on. That makes governance popular, and you can do some of that reactive work on distinct problems. But wouldn't it be better to do it more proactively? How do you do that? Things like business process change: when we're designing a new process, can we think about how the data flows through it? I'm a big fan of process models, and whenever we look at data issues, we ask what the issue with the process is. So can we design those processes up front? Do we have the right policies and procedures? Does the governance steering committee understand how to keep this from happening? Can we train people? Some of my customers actually sit on industry advisory councils; sometimes it's a data exchange. We can't be the first people on the planet who have managed addresses, or retail product codes if you're in retail. So to help with that, a lot of my customers are on these industry advisory councils to get cross-industry data quality correct.
Definitions and glossaries: I could do my whole metadata rant here, but I won't. Often, just knowing how the data is used can go a long way, because people are trying to put in the best answer. For example, one of my clients is a non-profit. They have children and parents, and they asked for education level. The data quality was terrible, because people really didn't know: do you mean the education level of the parent, college-educated versus high school, or of the child, who's in kindergarten? And the form didn't help; there were no drop-downs, just free text. So of course it was wrong. Teach people what you mean by education level. That would have been an easy fix, and we actually did fix it. In between, at the human level, between proactively planning ahead and reactively fixing, is this idea of stewardship: people who can fix things as they go, but who are also looking ahead and proactively understanding. And then, and this might seem strange, conscious disregard. Sometimes we know there are issues, but we can't fix everything, and I think that's almost the most valuable skill: picking the right things. If someone filled out a survey where we asked for their favorite hobby and everyone put in a sarcastic answer, does that really matter? No. But it matters if they put in the wrong email, right?
So pick your battles. When we think of the tech side, it's a similar thing, and we've talked about this already: you can clean up the data after the fact, but that's only going to be a one-time fix; you'll keep cleaning it and cleaning it. Similarly with ETL: you can fix it in the warehouse, but that might be even more confusing, because people still see it wrong in the source. So ideally you do want to fix it at the source, in application data entry and workflow. In that example I had with the parents and children, if the drop-downs were kindergarten, first grade, and second grade, it would have been pretty obvious. You wouldn't need a data dictionary, because it would have been obvious what you needed to fill in, versus college, PhD, etc. Application-level validation, similarly: if you want "M" and "F" versus "Male" and "Female," can you have a drop-down? Put the right validation in and only give them the two options. Are those the valid state codes in the US? Et cetera. And a lot of that can be driven by your data models. And as Nigel mentioned, a lot of these data quality tools can validate at the source. They can check against a database of valid addresses, for example, and say yes or no. Back to those dimensions of data quality: it's a correctly formatted email address, but it doesn't exist anywhere; you can't mail to it. You can do those kinds of automated data quality checks. And again, kind of in the middle, there's this idea of audits and dashboards. You can't manage what you can't measure, so do you have a dashboard that's proactively, continuously auditing? We often recommend at these steering committees: if email, address, gender codes, and age are important for your demographics, look at them at every data quality steering committee meeting, or data governance meeting, or whatever you're calling it, and make sure you're getting better. And sometimes, again, it ties in with this idea of validation:
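A minimal sketch of that application-level validation idea: restrict a field to its valid values and check email format before the record is saved. The value lists and the regex here are illustrative assumptions, not any particular tool's rule set:

```python
import re

# Illustrative valid-value lists; a real system would pull these from
# reference data published by governance, not hard-code them.
VALID_GENDER_CODES = {"M", "F"}
VALID_US_STATES = {"CA", "NY", "TX", "WA"}   # truncated for the sketch
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_record(record):
    """Apply source-system validation rules; return a field -> error map."""
    errors = {}
    if record.get("gender") not in VALID_GENDER_CODES:
        errors["gender"] = "not a valid code"
    if record.get("state") not in VALID_US_STATES:
        errors["state"] = "not a valid US state code"
    if not EMAIL_RE.match(record.get("email", "")):
        errors["email"] = "not a well-formed address"
    return errors

print(validate_record({"gender": "Male", "state": "CA", "email": "me@me"}))
# {'gender': 'not a valid code', 'email': 'not a well-formed address'}
```

Note the format check only says the address is well formed; confirming it actually exists and is deliverable needs the external validation services mentioned above.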
you can use external data sources to help validate that. So anyway, lots of tools of the trade, which leads me to tools. We're often asked what the right tool for the job is. There is no one tool, and some of the tools have gotten better; there is overlap. A lot of the MDM tools have data profiling, a lot of the data modeling tools have metadata, and a lot of the pure-play data quality tools do many of these things, from analysis to augmentation and so on. The key thing is that you want business rules that are repeatable and can be used across systems and across tool sets, so the business rule you create once can be cascaded everywhere. So really, when you get these right, and we've talked a lot about each of them, data governance, data architecture, and data quality form a virtuous cycle. Data governance provides the people, the policy, and the prioritization. That helps drive the architecture, which provides the model, the integration, and the prioritization. And then quality is the culmination of that: can I track the metrics and the business quality rules that then tie back into governance to fix them? You really need all of those in place, in equal measure, to get this right; none of them lives in a vacuum. The other piece we mentioned as a result of these: identify the issue, find the business value of it, prioritize it, fix it, and make sure you're fixing it in a holistic and documented way, which is what Nigel is going to talk about with this idea of data improvement plans. Yeah, thanks Donna.
This is something that I really encourage if you're embarking on any data quality improvement work, whether it's at an organizational level, a departmental level, or maybe within a single platform or a single system: having a plan, I think, is really important. I don't know who first said it, and it's a cliché, I know, but if you fail to plan, then basically you're planning to fail. In my experience, unless you have some really hard targets and you treat this like you would treat any good, solid project, you're never going to succeed. What you see there are just the headings from a typical data improvement plan that we've been involved in helping companies with. You need to look at the data area and the elements involved: what's the scope, as Donna said earlier, of the data you should focus on? Then, what are the key issues and problems with that data? And those problems have to be defined not in terms of "32% of the time this particular field is missing," but in terms of "we can't email all our customers on our marketing list because 32% of our email addresses are missing." It's got to be related back to the business somewhere. There are lots of benefits to having this approach. How do you create one? I could talk for a whole presentation about this, so I've tried to summarize it in one slide: how do you go about creating an improvement plan for data?
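As a loose illustration of tying a field-level measure back to a business statement the way Nigel describes, one entry in such a plan might pair the raw metric with its business impact and a hard target. The structure and numbers here are invented, echoing the 32% example:

```python
# Sketch of one entry in a data improvement plan: pair the raw metric
# with the business problem it causes and a hard target. Illustrative only.
plan_entry = {
    "data_area": "customer contact data",
    "element": "email_address",
    "baseline_missing_pct": 32.0,
    "target_missing_pct": 5.0,
    "business_impact": "cannot email 32% of customers on the marketing list",
}

def required_improvement(entry):
    """Percentage-point gap the improvement work has to close."""
    return entry["baseline_missing_pct"] - entry["target_missing_pct"]

print(required_improvement(plan_entry))  # 27.0
```

Keeping the business impact next to the metric is what makes the plan a business case rather than a technical punch list.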
One of the things I should stress, by the way, is that this doesn't have to mean a customer data improvement plan or a product data improvement plan. Those would be valid areas, but sometimes your data quality issue could be focused around a business process. It could, for example, be something like a customer fulfillment data quality plan, because that process is beset at the moment with a whole series of data quality issues as the data flows across it. So you can apply these at all sorts of levels, to all sorts of things, but the basics stay the same: you investigate the data, you baseline it, you identify some of the key problems, and you get organized to fix them. You then prioritize which of the problems you've come across to address first, and that is normally purely on the basis of value to the business: which things will create the biggest buck for your bang, or your biggest bang for your buck, I should say, in the initial phase, and then in the improvement phase after that. And very often these data improvement plans don't stay a single project with a beginning and an end date; they become almost a continuous process. For example, you might start with a customer fulfillment data quality working group, but that could very well turn into a stewardship group, which sits on a continuous basis and continues to improve the data as the needs for that data change. So I'm a great advocate for having those. And one of the things I firmly believe in is never doing any data quality improvement work unless you first have a business case for action. This is a very simple example from a real client, actually: an organization I worked with a few years ago who were doing online gaming, not everyone's favorite type of company, but they were a customer. And they had issues with customer data and also with their
sales data. And you can see there that we spent some time with them actually figuring out how much these things were costing their bottom line, and how much the improved revenue, or the cost reduction, would be from improving the data to a certain threshold. There were some pretty big numbers there, and you can see that gives you the justification for what you do. And I spent 10 years of my life working at BT, and if you want any evidence that some of the approaches Donna and I have talked about actually work, this is probably as good a case study as any, although it's quite an old one now; I still have people who refer back to it. BT, through the approaches we've just described, aggregated more than 800 million dollars in benefits as a result of a whole series of data improvement projects it ran within an overall data improvement program. BT saved lots and lots of money through those things. And if you want to validate that, independent reports on that program were conducted by both Gartner and Forrester, and I think if you hunt the internet they're still available somewhere; if you can't find them, I've got copies of them. So these approaches really do work. I'll pass back to Donna to sum up and finish. Sure. So yeah, as we mentioned, this is a holistic approach which identifies a lot of the organizational and technological issues. And I know Shannon's probably wanting to open it up for questions, so I'll pass it to you, Shannon. Donna, thank you so much, and Nigel,
thank you so much for joining us for this great topic and conversation. Just to answer the most commonly asked question: as a reminder, I will send a follow-up email by end of day Monday for this webinar with links to the slides and the recording. If you have questions, feel free to submit them in the bottom right-hand corner of your screen in the Q&A section. There's been a lot of conversation going on, but no direct questions coming in yet. All right, then we'll wrap up; everyone's ready, yes? All right. Well, we only had a couple of minutes to spare anyway, and we've been kind of following some of the great discussion as we went. So yeah, join us next month for BI and analytics for self-service, and this will all be recorded if you want to pass it along to your friends. Yeah, and Nigel, there are a couple of requests for the case study; if you have a link to it and send it to me, I'll get it out in the follow-up as well. Yeah, and actually, at the bottom you'll notice that if you're a Gartner client, that is the number of the publication from Gartner. There you go. All right, well, thank you again, Nigel; thank you so much for joining us in the late evening, I really appreciate it. And Donna, thanks as always, and thanks to our attendees for being so engaged in everything; I love the conversation going on throughout. And again, I will send a follow-up email by end of day Monday. Thanks, everybody. Thank you very much. Thanks for listening. Bye.