Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We want to thank you for joining us for the latest in the monthly webinar series Data Architecture Strategies with Donna Burbank. Today Donna will be joined by guest speaker Nigel Turner to talk about data quality best practices. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A section, or if you like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DAStrategies. And if you'd like to chat with us or with each other, we certainly encourage you to do so. Just to note, Zoom defaults to sending chats just to the panelists, but you may absolutely switch that to chat with everyone within the webinar. To access the Q&A or the chat panel, you will find the icons for those features in the bottom middle of your screen. And as always, we will send a follow-up email within two business days containing links to the recording of the session and any additional information requested throughout the webinar.

Now let me introduce our speaker for the series, Donna Burbank. Donna is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She is currently the managing director of Global Data Strategy, Ltd., where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia and Africa, and speaks regularly at industry conferences. And joining Donna today is her esteemed colleague Nigel Turner. Nigel has over 20 years of experience in information management, with specialization in information strategy, data quality, data governance and master data management.
He has created and led large IM and CRM consultancy and delivery practices in multiple consulting organizations, including British Telecommunications Group, IPL and FHO. Nigel is a well-known thought leader in information management and has presented at many international conferences, in addition to writing numerous white papers and blogs on information management topics. And with that, let me give the floor to Donna and Nigel to get today's webinar started. Hello and welcome.

Hi, Shannon. Thanks. It's always great to join these webinars. And for those of you who return each month, thank you so much. It's always nice to see some familiar names in the chat. And I know folks are never shy with the chat, which is nice. And always some new names as well. So if this is your first time, welcome. If you have missed any of the webinars in the series, as Shannon will mention in the follow-up, all of the previous webinars are recorded and the slides are available. And for this webinar, which is always the most popular question, Shannon will follow up in the next few business days with both the recording and the slides as well. We'll talk a little bit about data modeling today, but we have an entire webinar next month just on data modeling, which I know is always a popular topic. So feel free to join us next month as well. But moving on to the next slide. So today is specifically around data quality. And what's interesting, or difficult, or challenging around data quality is that it isn't a one-size-fits-all approach and it never is just one thing. What's difficult is that data quality, as with many things in data management, is half business issues and half technical. And so in our practice, Nigel and I really propose a more holistic approach. It's part people, it's part process and part technology. So let's go to the next slide and delve more into what we're covering today.
We're going to talk a bit about some of the methodology that we can use: how do you baseline data quality, know where you are and know where you want to go. In this session we're going to particularly talk about business rules, which I think are kind of an unsung hero in the data management world. They're embedded, and we'll cover this, in a lot of other things you might use: they might be embedded in a data profiling tool or a master data management tool or a data model. And yet they're ubiquitous and often very critical to the success of getting data quality right: how do we even assess what right is? There are a lot of things we can talk about in data quality. Nigel and I have done sessions on this for the past few years now, so we wanted to do something a bit different and take this different aspect. So if we go to the next slide. You may be familiar with our wider data strategy methodology. As you can tell from our name, Global Data Strategy, we do a bit of that. What makes data management and data strategy complicated and exciting is that everything is interrelated, so we like to always show this framework. We've gotten some good feedback that it's helpful. For example, data quality management is, and should be, a practice in and of itself. There are a lot of technologies and techniques to support data quality, and it's interrelated with many other practices. Take data governance: it's hard to do quality without governance because, as we'll talk about today, a lot of the thresholds and the rules around data quality need to come from the business, so you really need to get that culture and people aspect of data quality.
So you need the right architecture to manage and measure it: you need data models, you need those tools to support it, you need metadata to manage data quality, and you can generate metadata about data quality, right? So they're all interrelated and there are some healthy touch points. We'll talk a little bit about that in this session, but we always like to call this back to the bigger picture, that data management is really driven by the business strategy, and we'll kind of stress this. So much of what we do when we try to implement anything in data management is to really focus on that: what's the business value? And I know Nigel will touch on this. There are so many things you can look at with data quality. How do you pick just that right amount that's going to have the biggest impact on the organization? That's a bit of an art and a bit of a science. So we'll talk about that today. So if we go to the next slide. This is an approach that we often use in different phases that we kind of call the A to E, from assessing to eventually getting that ROI and evaluating the ROI. We've talked about other aspects of this in other webinars. Today we're really going to talk about the business rules lens of that, but I'll quickly go through all of the aspects because I think it's helpful to frame data quality. As I just mentioned, assess the business usage: why are we doing this, and where do we zoom in and focus? We're big fans of quick wins. We've worked with a couple of organizations recently where I think their biggest problem, ironically, was that they were doing too much data quality. They were scanning all of their data sources. They had a lot of profiling, they had some of the fanciest tools in the market to do that, but they were missing the focus.
Where were we going to focus? How did these data quality problems cause a business impact? It didn't mean that eventually you didn't need to focus on more areas of the business, but their biggest challenge was: how do I focus on just that right amount? They were also trying to kick off data governance. Generally, data quality is one of the best ways to kick off data governance because people can relate to it: they can see that the numbers in the report are wrong or that there are operational issues due to the quality, and it's often a good thing to champion. So one of the ways to start, once you've identified the area, maybe it's customer data, maybe it's product information or whatever you pick, is to baseline where you are today. Sometimes that's forgotten: with any data management project, you're going to show success, but remember to show where you started. And we'll talk later about this idea of a data quality dashboard, not just a data dashboard with quality but, to be a bit meta, looking at the quality of the data itself and having a dashboard on the quality itself, to really see how you're improving over time and monitor that over time. And then really prioritize what things we're going to fix and develop improvements on those. So there are kind of two levels of focus, as with anything in data management: focus, focus, focus, and really focus on that ROI. So now that we've picked the area, and maybe it's customer contact information, what part of that do we focus on to start doing actual improvements? We'll talk about that today, where business rules can have a nice impact. And then never forget to really look at the benefits and ROI. It's so easy to jump off to the next thing. We're fixers: as data management professionals we always want to build and fix, but sometimes it's good to remember this as you're fixing the next thing.
People are just consuming what you've just built, so really part of your rollout needs to be continually showing the ROI, continually showing the benefits. And the other thing, before I move off the slide, is that it's sort of the thankless effort of getting it right. We've gone into some organizations, or helped certain organizations, where everything eventually just runs smoothly, and nobody notices you because everything's just right, and they can't imagine how you could ever run the business when things didn't work. Remember back in the day, and that's where some of these data quality baselines come in: showing improvement and then showing how that affected the business. We removed inefficiencies, we were able to generate better marketing campaigns with better customer information, et cetera. So don't forget to toot your own horn a bit and show that ROI, because it's easy to forget as you go into the next thing. So, moving on to the next slide, I'm going to pass it over to Nigel, who's going to dig into this business rules aspect of data quality. So over to you, Nigel.

Okay, thanks Donna, and good morning, good afternoon, good evening to everybody listening to this live or as a recording. We're going to focus today on these things called business rules, and particularly the role they play in two stages of that A-to-E methodology. The first is the baselining of data. The key questions that always come up there are: how do we baseline the data that we've got, how do we know how good it currently is, and what do you measure that against? Because if you measure something as being good, you have to measure it against some sort of benchmark or threshold. So those are two key questions. And then, once you've selected the areas that you want to focus in on, you start to look at developing improvements.
And then you've got the question of what data we need to focus on to steer that improvement, and how we are going to assess whether we've been successful or not. In order to answer those questions, I believe that business rules are absolutely key. And what is a business rule? Well, I've used that definition there on the left-hand side, as you can see, from Ronald Ross, which is a widely accepted one. To put it in other words, business rules are guidelines for how a business should operate and make its decisions. When you think about it like that, every organization depends on business rules to maintain its governance, whether it's people, process, technology, finance or anything else. Some example business rules would be things like levels of authorization for payment of invoices. In every organization, if the invoice is for $10 it can be signed off by somebody quite junior; if it's for $10 million it's probably going to be signed off by somebody at board level. Another example would be how salespeople in an organization earn a bonus: depending on the value and the volume of sales that they make, there will be a set of rules around that. And if you run construction sites, for example, it's a business rule that all the workers on that site must wear hard hats and hi-vis jackets in order to comply with health and safety rules. Those sorts of rules in themselves aren't directly to do with data. But very often, business rules like that, in a data context, lead to data-related business rules. And there's, I think, a strong connection here, as you can see at the top of this slide, where we say business rules are used to define and enforce standards.
We're all familiar with data standards, and I would argue that data standards are a specific subset of business rules that relate to data. Once you've got those standards in place, they can help you to assess how good your data currently is, and also how best to improve it. You can see there are a number of different uses, and we'll touch on some of these as we go through the rest of the session. First of all, you can use business rules to cleanse your existing data and enhance it. So if there are gaps in the data or inadequacies in the data, you can use business rules to drive that. Then, when new data is created, they become the standards which new data must conform to. The third thing is, if you're building new platforms, new applications, new data sources, then clearly those rules should be applied within those new developments as well to ensure they're consistent, as well as enforcing standards in existing applications. And then the other key thing about business rules: if you've got well-designed business rules, you can stop bad data getting into your systems in the first place. A good example of that is a business rule that specifies the allowable values for a field: you can then create things like drop-down lists and on-screen entry validation to make sure that people can't put into that field something other than what the business rule specifies. So business rules are very useful in all these respects.

So I suppose the next question becomes: how do you classify business rules as related to data? Well, the one thing I found doing a little bit of research for this presentation is that there are many, many different ways of classifying business rules, including data rules, some of which are extremely complex. And one thing that Donna and I always greatly believe in is to keep things as simple as possible.
So when it comes to data business rules, we contend that really there are two main types that are relevant. The first is about specifying the format of data, which we very originally called format business rules. Those of you who have ever done any coding or any system design will be very familiar with these. They include things like, if you have a field or an attribute, how long should that field be in bytes? Should it be fixed length, should it be variable, et cetera. Those are business rules. Then you can look at things like what the format of the field should be. Should all the contents of that field be alphabetic, for example in the case of somebody's first name? Should they be entirely numeric in the case of a finance field, or maybe a mixture of the two in the case of a code? So format rules are really important, but then you can also have content business rules. Even though you might have the format of a data entry or a field right, you can also check its content against set business rules. For example, things like allowable values: the field must only contain the numbers one to 10 and nothing else. Or whether the field is mandatory or optional: in some cases you might have a field which says mobile number, but clients and customers may be able to leave that field blank if they don't want to give you their mobile number. And then also relationships with other fields or records, and I'll touch on that later as well: sometimes business rules will specify how one field should relate to, or is interdependent with, another field. The best way to bring this to life is really just to give you some simple examples, and these are fairly simple examples because business rules can get very complicated, as anyone who has done any coding and tried to implement some of these in computer code will know.
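As a rough illustration of the two rule types just described, here is a minimal Python sketch. The field names (`first_name`, `rating`, `mobile_number`), the 30-character limit, and the `Rule` structure are all hypothetical, chosen only to mirror the spoken examples; they are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    field: str
    kind: str          # "format" or "content"
    description: str
    check: Callable[[Optional[str]], bool]


def is_blank(value: Optional[str]) -> bool:
    """True when a field is missing or contains only whitespace."""
    return value is None or value.strip() == ""


# Hypothetical fields and limits, assumed for illustration only.
RULES = [
    # Format rule: alphabetic, at most 30 characters. Blank values are left
    # to the "mandatory" content rule. Real names can contain hyphens or
    # apostrophes, so isalpha() is a deliberate simplification.
    Rule("first_name", "format", "alphabetic, max 30 characters",
         lambda v: is_blank(v) or (v.isalpha() and len(v) <= 30)),
    # Content rule: the field is mandatory.
    Rule("first_name", "content", "mandatory",
         lambda v: not is_blank(v)),
    # Content rule: allowable values are the whole numbers 1 to 10.
    Rule("rating", "content", "whole number from 1 to 10",
         lambda v: not is_blank(v) and v.isascii() and v.isdigit()
         and 1 <= int(v) <= 10),
    # Format rule on an optional field: only checked when a value is supplied.
    Rule("mobile_number", "format", "optional; digits only when supplied",
         lambda v: is_blank(v) or (v.isascii() and v.isdigit())),
]


def violations(record: dict) -> list:
    """Return a description of every rule the record breaks."""
    return [f"{r.field} ({r.kind}): {r.description}"
            for r in RULES if not r.check(record.get(r.field))]
```

Calling `violations({"first_name": "N1gel", "rating": "11"})` reports one format problem and one content problem, while a blank `mobile_number` passes because that field is optional. Keeping the rules in a list, separate from the checking loop, is one way to make them easy to review with business stakeholders.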
Here's a good example: anybody that lives in the UK, as I do, will know that the National Insurance number in the UK is always, or should always be, in that format: two alphas, three double sets of numerics and an alpha to finish. Or if you live in the US or Canada, then date of birth is usually in that format, month, day and year, as opposed to day, month and year as it is in Europe, for example. And you can go to something like a router identifier, which must conform to that pattern. So format business rules basically enable you to specify the shape that the data should have. Therefore, when you look at actual data in those fields, you can use this as a benchmark to ask: does the data in those fields comply with those format rules? And then you can do the same process with content rules as well. Here are some examples of content rules. In this organization, a sales rep can only be assigned to one and only one sales region. You can't have a sales rep in this organization that is allocated to more than one sales region. Or you could look at, for example, a supplier, and say a supplier cannot be a valid supplier unless it has at least one associated geographical address. And sometimes business rules can involve calculations as well, like the bottom one here: you could have a rule in a manufacturing organization that says the price we charge for our product should always be calculated on the basis of the unit cost of that product. And you'll know this if you've ever tried to implement some of these things. I'm old enough to remember coding in COBOL, believe it or not, which for some of you is probably ancient history. I remember getting into terrible pickles as a programmer, trying to work out a lot of nested if-then statements including ANDs and ORs. So if this is true, then this may happen, or this may happen, and this may have to happen.
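The National Insurance format and the two relationship rules mentioned above might be checked along these lines. This is only a sketch: the pattern encodes just the shape described in the talk (two letters, three pairs of digits, one final letter) and ignores any further restrictions the real scheme places on particular letters, and the function names and input shapes are assumptions.

```python
import re
from collections import defaultdict

# Shape only: two letters, three pairs of digits, one closing letter,
# with optional single spaces between the groups.
NI_PATTERN = re.compile(r"[A-Z]{2} ?[0-9]{2} ?[0-9]{2} ?[0-9]{2} ?[A-Z]")


def ni_format_ok(value: str) -> bool:
    """Format rule: does the value have the shape of an NI number?"""
    return NI_PATTERN.fullmatch(value.strip().upper()) is not None


def reps_in_multiple_regions(assignments):
    """Content rule: a sales rep belongs to one and only one region.

    `assignments` is an iterable of (rep_id, region) pairs. Reps with no
    region at all never appear in the pairs, so detecting those would need
    a separate list of all reps.
    """
    regions = defaultdict(set)
    for rep_id, region in assignments:
        regions[rep_id].add(region)
    return sorted(rep for rep, found in regions.items() if len(found) > 1)


def suppliers_without_address(addresses_by_supplier):
    """Content rule: a valid supplier has at least one non-blank address."""
    return sorted(supplier
                  for supplier, addresses in addresses_by_supplier.items()
                  if not any(a.strip() for a in addresses))
```

For example, `ni_format_ok("AB 12 34 56 C")` passes and `ni_format_ok("A1 12 34 56 C")` does not. The two content checks return the offending identifiers rather than a yes/no answer, which is usually more helpful when the results feed a discussion with the business about what to fix.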
Those nested if-then statements are in effect very low-level, code-level business rules that are often derived from rules such as these. So, as Donna said, where can you identify business rules? It'd be nice if they were all neatly laid out somewhere in an organization, but I don't believe there's an organization on this planet that does that. So when you're looking for business rules, you're really going to have to do a little bit of detective work. One very good source of business rules, and Donna's going to talk about this in a minute, is data models. If you have data models, Donna will show how a simple data model can often help you to figure out what the business rules should be. Then of course you've got good old business documentation. You may have things like user instructions for people who are inputting data, and there might be some indications in that documentation, when they complete an input screen for example, which tell them what valid values can be put into particular fields on screen. Then you've got IT documentation as well: a requirements spec or even a system manual may contain some of the rules. Source code, of course: sometimes I've worked in organizations where you literally have to look at the source code in order to figure out what business rule is being implemented. And if you're lucky enough to have master or reference data, as Donna mentioned earlier, things like currency codes will give you a valid set of values that you can then implement within your particular application. And again, metadata is often a good source of business rules, for example glossaries, dictionaries and metadata repositories. And we'll talk in a minute about how you can empirically derive business rules from data profiling outputs.
The other thing we always say is that it doesn't matter how good the above are: when you're looking for business rules, always engage with the stakeholders, whether they're the data producers or the consumers. If you've got data governance in place, they might be the data owners, the data stewards, or any subject matter experts that you can find. There are two reasons for that. The first is that many business rules in many organizations are actually implicit. They're inside people's heads, never formally documented anywhere, and almost passed on by legend and tradition. I've asked questions in an organization where I've said, "So what values can that field be, then?" "Oh, they can be either this or that." "Right, is that written down anywhere?" "No, it's something we've always done." So you need to talk to people to extract these implicit rules from what they do. The other thing is that where you do get documented rules, very often the documentation gets out of date, and changes are made, for example to applications or systems, that are not reflected in the documentation. So you cannot always take the documentation at face value. So I think it's really important to do that digging around. I'll hand back to Donna now, who's going to tell you a little bit about the value of data models as a really good starting point for deriving some core and key business rules. Donna.

Thanks, Nigel. So I'll just go lightly on this; as I mentioned, we'll have a whole webinar next month on data modeling. But one of the beauties of data modeling as it relates to business rules, to touch on a few things Nigel just said, is that it's both top-down and bottom-up, and then matching to see if they align. Some of the top-down rules are the things that are in people's heads or in documents or in training materials, et cetera.
And one of the nice things about a top-down, conceptual or logical data model is that you can actually translate it into sentences. You can start with the data model, or you can take sentences and turn them into a data model. And sometimes it's these very simple issues that cause a system not to run or a report to be wrong. I almost make a game of it in life: when some system I'm using doesn't work correctly, I often think I can almost picture the data model behind it that isn't working correctly. An employee can work for more than one department; a customer can have more than one account. Data models capture a subset of the business rules, and you can kind of get that pattern: does one exist, can you have more than one? As Nigel mentioned before, software programs can get a lot more nuanced, say you can have three departments if your department code is this, and that requires programming logic. These are some of the basic core rules, but I wouldn't underestimate them. For example, that first one, an employee can work for more than one department: that's made my life miserable. I was in an organization where I was partly in the consulting organization, I was partly in product development, and I sometimes did some things for marketing. If I went on a business trip, I might have been doing three of those things at once and had to allocate my expenses and my time to different groups. The ERP system that organization used, or the way it was set up, or again who knows, you'd have to look at the business rules, didn't allow that. Their rule was that an employee could not work for more than one department, but I worked for three. So just think of me spending hours of time, and the accounting department having to get involved, all over something as simple as going through some of these very core business rules. If you have one, can you have more than one? Must you have one?
And if you're a data modeler you're probably nodding your head. Sometimes it's just those simple rules that make such a difference. And then, what does one of those mean? I love to tell war stories about data, but how many issues have come down to what we mean by a customer? I was at a very large organization where I was an employee, and they sent out renewal notices to people who didn't have the product because they were sending them to a marketing database that said customer. It wasn't customers, it was people they were prospecting to. And I'm sure you have similar stories about any of these: what's a product, what's an employee, what's a patient. That's where you sound strange to your friends, but all of these really relate to business rules, and they can have a massive effect, either positively or, often unfortunately, negatively, on an organization. So those are some of the top-down rules. If we go to the next slide. We're still at the conceptual and logical level, and again it's the definitions: a business rule might be in the definition. This particular tool I like because it shows those business definitions right on the data model. An employee is a full- or part-time worker who's on the active payroll of the organization; contractors are not employees. Well, they have to be on the active payroll to be an employee. What if they're taking some time off? Maybe they're not an employee. Let's think about that. Or part time: do we include part-timers? Those are all things that business people can start to discuss when they look at this model. Some things are different for every organization and it's a choice: do we include part-time people as employees in our discussion, in this particular scenario, or not?
And here contractors are not employees, but again, each organization is different. Or, I love the cardinality lines of a data model: can something be more than one? A sales rep can be working with more than one company, in the lower left, but does each company have a dedicated sales rep? Some companies do not have a sales rep; they're too small to have one. Or can we have several sales reps helping that customer? Let's really think that through. So that's the beauty of a data model. I like them because they tell stories. And again, that's the top-down approach. The rule might be, say, that an employee can only be in one department. Then we may reverse engineer from the databases, and maybe the embedded logic in the databases says the opposite, that there can be more than one. You can look at the relationships, and often that's where you see issues: the top-down, conceptual and logical design is what's in people's heads, and the bottom-up reverse engineering from the database is what's in the system. We've had some huge aha moments in our consulting practice with that type of thing: okay, you're saying one thing, but the systems are not doing that. We're doing that in some workshops right now with a big financial organization, and it was something as simple as: is this field required? The business folks said no, and then we looked in the systems, and part of it was the databases, part of it was, as Nigel was saying, the system logic itself, and we said, well, it is. So which one do we change? Do we change the business rule, or do we change the system? Nobody had realized that, and they were wondering why they had data quality issues. So it takes some effort, takes some sleuthing. But if you enjoy that sort of thing, it can be kind of fun to find these hidden nuggets of why things aren't working. So I'm going to pass it back to Nigel.
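The "is this field required?" comparison between what the business says and what the system enforces can be sketched in a few lines of Python using the standard-library `sqlite3` module. Everything here is assumed for illustration: the `customer` table, its columns, and the `business_view` dictionary standing in for workshop answers. A real exercise would reverse engineer the actual databases, and would also need to look at application logic, which a schema query cannot see.

```python
import sqlite3


def required_in_schema(conn, table):
    """Bottom-up view: which columns does the database declare NOT NULL?

    The table name is interpolated into the statement, so it must be a
    trusted identifier, never user input.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: bool(row[3]) for row in rows}


def rule_mismatches(business_view, schema_view):
    """Columns where the stated rule and the system disagree."""
    return {column: {"business": required, "system": schema_view.get(column)}
            for column, required in business_view.items()
            if schema_view.get(column) != required}


# Hypothetical system: every column is declared NOT NULL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER NOT NULL, name TEXT NOT NULL, tax_code TEXT NOT NULL)"
)

# Hypothetical top-down view: the business says tax_code is optional.
business_view = {"customer_id": True, "name": True, "tax_code": False}

mismatches = rule_mismatches(business_view, required_in_schema(conn, "customer"))
print(mismatches)
```

In this made-up case the only mismatch is `tax_code`, which the business calls optional while the schema requires it. The sketch only surfaces the disagreement; deciding whether to change the rule or the system remains a conversation with the business.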
And again, if you're into data models, join us next month. But over to you, Nigel.

Thanks, Donna. Yeah, what I'm going to do now: as Donna mentioned earlier, we're very keen on the "so what" question. So why are business rules so important? I thought a good way of illustrating this would be to give you a couple of real-life examples, both from the UK, of where business rules either went missing or were simply incorrect. The first of those made headline news in the UK back at the beginning of this year, at the height of our second winter lockdown. It happened in the city of Liverpool, which most people across the world probably know best because it's the place where the Beatles came from, of course. And that's the Beatles statue that you can see at the waterfront in Liverpool. Liam Thorpe, the guy in the bottom left there with a pint of beer, was a 32-year-old Liverpool resident who no one had ever heard of until he received a letter from his local clinical commissioning group, who were rolling out the vaccines within Liverpool at the height of the pandemic. He received a priority invite for a vaccine because he was medically classed by that health board as morbidly obese. Now if you look at poor old Liam, he might not be the skinniest guy you've ever seen, but you certainly wouldn't describe him as morbidly obese, so he was a little bit confused by this and started to ask questions about it. Anyway, to cut a long story short, the reason that he was regarded as morbidly obese was that for some reason, on his health record, his height had been recorded not as his actual height of six feet two inches, but as 6.2 centimetres, which probably would have made him one of the shortest residents of Liverpool in history, and also would have meant that rather than getting ill from COVID he was much more likely to be savaged by his neighbour's pussycat, I would have thought.
And to make it even worse, they then used that height of 6.2 centimetres to calculate his body mass index, because they knew his weight and they had recorded that okay. It came out at 28,000, whereas a body mass index of higher than 40 is classed as morbidly obese, so he was not just morbidly obese, he was very morbidly obese, if you believe that. When he complained about it, they realized the error of their ways and he was put back in the vaccine queue in his rightful place, which was a little bit further down, him being a young guy. And if you notice the quote there in the yellow box from the chair of the Liverpool clinical commissioning group: this, as I said, made headline news on various news channels in the UK. She said, yeah, I can see this is quite a funny story, but she also recognized that there are important issues for them to address. And basically you can see that, whatever went wrong here, certainly the business rules weren't right. You would have thought that a business rule should specify what somebody's minimum height should be, and clearly somebody who is 6.2 centimetres tall is probably not right, not unless his name is Tom Thumb. And there should be some sort of maximum BMI which makes sense, so having a BMI of 28,000 is really a bit crazy, given that 40 is morbidly obese. And of course the other thing is the format business rule: what sort of system design was taking place where somebody could record somebody's height in centimetres and not in feet and inches? It's a bit weird as well. So anyway, as she said, a funny story, which cheered us all up at the time when we were all being miserable in the winter in lockdown, but it did raise some issues about the way that this clinical commissioning group was actually managing its data and the quality of that data. It must have made people feel a bit nervous, I think, about whether they'd got their own data right as well.
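The two missing rules called out here, a plausible height range and a sanity ceiling on BMI, could look something like the sketch below. BMI is weight in kilograms divided by height in metres squared, and the threshold of 40 comes from the talk. The range limits and the 100 kg weight are assumptions made up for the example; the patient's actual weight was not stated.

```python
MORBIDLY_OBESE_BMI = 40.0   # threshold quoted in the talk

# Assumed plausibility limits, for illustration only.
MIN_HEIGHT_M = 0.5
MAX_HEIGHT_M = 2.6
MAX_PLAUSIBLE_BMI = 150.0


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def check_measurements(weight_kg: float, height_m: float) -> list:
    """Return the business rules broken by a weight/height pair."""
    problems = []
    if not MIN_HEIGHT_M <= height_m <= MAX_HEIGHT_M:
        problems.append(f"height {height_m} m is outside "
                        f"{MIN_HEIGHT_M}-{MAX_HEIGHT_M} m")
    # Skip the BMI check for non-positive heights to avoid dividing by zero.
    if height_m > 0 and bmi(weight_kg, height_m) > MAX_PLAUSIBLE_BMI:
        problems.append(f"BMI {bmi(weight_kg, height_m):.0f} exceeds "
                        f"plausible maximum {MAX_PLAUSIBLE_BMI:.0f}")
    return problems


# Hypothetical weight of 100 kg; 6.2 cm is 0.062 m, and 6 ft 2 in is about 1.88 m.
print(check_measurements(100.0, 0.062))   # both rules fail
print(check_measurements(100.0, 1.88))    # no problems
```

With the assumed 100 kg, a height of 0.062 m gives a BMI in the tens of thousands, the same order of magnitude as the 28,000 quoted, while 1.88 m gives about 28.3. Either rule alone would have stopped the record before it triggered a priority invitation.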
The second example, which came out last year as well, was about aircraft taking off from Birmingham Airport in the UK. The Air Accidents Investigation Branch published a report in April highlighting three flights to Europe that had taken off the year before. The big issue was that the weight of those planes had been underestimated by 1,200 kilograms per flight. Anyone who knows anything about aircraft knows that the weight of the plane is absolutely critical to the pilot deciding the correct take-off speed, the correct thrust to apply to the engines, and so on. Because of the miscalculation there could have been a serious incident on take-off. Luckily the planes did take off safely, but the pilots reported some issues with the take-off. The reason it happened was really very simple: in the system this particular airline was using, all passengers with the title Miss were assumed by the IT suppliers to be children and not adults. The business rule was that if you're a child your estimated weight is 35 kilograms and if you're an adult it's 69 kilograms. In other words, there were a lot of adults on the plane who the system thought were children, and it therefore underestimated their weight. I like that the airline said it was a simple flaw in its IT system, but if you think about it, it was actually quite a serious problem with the business rules. The key problems are on the right. First of all, as Donna said earlier, if you rely on IT to draw up the business rules, they're going to make mistakes. Part of the reason the mistake was made is that in this case the outsourced IT suppliers were based in India, and it said in the report that in India, if someone is called Miss, it's immediately assumed they are a child. So the Indian subcontractors made that decision on the basis of their own culture.
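The rule as described can be sketched in a few lines, next to a variant that decides child or adult from date of birth instead of title. The 35 kg and 69 kg figures come from the talk; the age cut-off, the function names and the example dates are assumptions, and this is not the airline's actual code.

```python
from datetime import date

CHILD_WEIGHT_KG = 35   # standard weights quoted in the talk
ADULT_WEIGHT_KG = 69
ADULT_FROM_AGE = 12    # assumed cut-off, for illustration only


def weight_by_title(title: str) -> int:
    """The flawed rule: every 'Miss' is treated as a child."""
    return CHILD_WEIGHT_KG if title.strip().lower() == "miss" else ADULT_WEIGHT_KG


def weight_by_birth_date(born: date, flight_day: date) -> int:
    """Alternative rule: decide child or adult from age on the day of the flight."""
    had_birthday = (flight_day.month, flight_day.day) >= (born.month, born.day)
    age = flight_day.year - born.year - (0 if had_birthday else 1)
    return ADULT_WEIGHT_KG if age >= ADULT_FROM_AGE else CHILD_WEIGHT_KG


# How many misclassified adults does a 1,200 kg shortfall imply?
shortfall_per_adult = ADULT_WEIGHT_KG - CHILD_WEIGHT_KG
misclassified = 1200 / shortfall_per_adult
print(f"{shortfall_per_adult} kg short per adult, about {misclassified:.0f} adults per flight")
```

On the talk's own numbers, each adult treated as a child leaves the load sheet 34 kg light, so a 1,200 kg error corresponds to roughly 35 passengers per flight.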
But nobody in the business of this airline ever checked that that was right. And, as it says at the bottom there, the way the airline claimed they're going to sort this problem out is by doing manual checks at airports. I would have thought it would be a lot easier to come up with a simple business rule that says: check the person's date of birth before you decide whether a Miss is a child or an adult. That would solve it a lot more easily than putting additional checks in at the airport. So sometimes these sorts of errors don't just cost money and embarrassment, they can also cost lives. That's why getting business rules right is really important. As usual, we've come up with a really simple way of using business rules to help you improve data quality, and I'll run through these four steps quite quickly. The first, as Donna mentioned already, is to profile the data sources. That means an empirical exercise to figure out what the data looks like today, and to highlight data that doesn't look as if it conforms to the quality standards or business rules you would expect of it. Once you've done that, have the debate with the business about which of the problems you've identified are real problems, and which are the priority problems that would help the business most if you fixed them. Then you work with the business to design what the business rules should be to drive that improvement. Then you deploy the business rules, and Donna will talk about how you would do that. And, as we said earlier, you then monitor and report adherence to those rules, which is where Donna will start to talk about data quality dashboards, for example.
The reason we drew that as a circle is that, like all things in data quality, you never get it right first time, and there's a continuous process of improvement going on. For a particular data area you might run through those four steps. You then begin to notice in step four that some new data coming in is again not adhering to the business rules you put in place. You therefore need to profile your data sources again, look at where that data is coming from, and figure out how you might need to adjust or tweak the business rules to improve the quality of the data again. So it's a never-ending process. I'm just going to talk a little bit about steps one and two. One of the common ways of getting a baseline for what your data actually looks like is data profiling, and there are various ways of doing it. You can do it manually. You can do it using things like SQL. You can download some data onto Excel spreadsheets and eyeball it. And some companies will choose to use a specialist data profiling tool. Whichever way you do it, the benefits are manifold: you can check whether the data set meets the business rules you have laid down for it. For example, take the table on the right, which is from a commercial data profiling tool, and look at the field called country. You will see there were 5,438 records containing country. The minimum length was one and the maximum length was 13. That immediately implies a format business rule: the field is variable length and can extend from one to 13 characters. Is that correct? That's a good question to ask the business. Then you notice as well that the field can be blank. Is that right? Are you allowed to have a country that does not have a value, or is that a data quality error?
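For anyone without a profiling tool, the same kind of column summary can be produced with a few lines of code. This is a minimal sketch using only the Python standard library; the function name and the sample values are made up, and a commercial tool would report far more than this.

```python
from typing import Optional


def profile_column(values: list[Optional[str]]) -> dict:
    """Basic column profile: counts, blanks, length range and value range."""
    filled = [v for v in values if v is not None and v.strip() != ""]
    lengths = [len(v) for v in filled]
    return {
        "records": len(values),
        "blank": len(values) - len(filled),
        "distinct": len(set(filled)),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "min_value": min(filled, default=None),
        "max_value": max(filled, default=None),
    }


# Hypothetical sample of a country column.
country = ["United States", "UK", "U", "", "Canada", None, "UK"]
profile = profile_column(country)
for name, value in profile.items():
    print(f"{name:>10}: {value}")
```

Each number in the output is a question for the business rather than a verdict: a minimum length of one or a non-zero blank count only becomes an error once someone who owns the data says it is.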
Looking at this as well, if you've got United States as the maximum value, that's the 13-character entry in there. So does that mean there are other countries that are not covered by that particular rule? As I said, data profiling tools make this job a lot easier, but you don't have to have them to do the profiling piece. An alternative way of doing it comes from a really interesting survey done a few years ago that I still call on, because I think it was a really good piece of work. It's particularly useful, I think, if you're at a relatively early stage of trying to make the case for data quality improvement. In this survey they went to 75 different companies or organizations and got 75 executives, one from each organization, to take a core data source they relied on and extract 100 records at random from it. Each of them then eyeballed those records and figured out how many of the 100 were correct in terms of conformance to the business rules they would expect. What came out of that was quite scary, really: of all the records that were looked at, only three in 100 met the data standards or business rules the data was expected to conform to. So it shows how bad most data quality is. But it's also a very useful technique if you're trying to make the case for improving data quality, to be able to say: our data records are not good, and we need to take some action to improve them so that they conform with the business rules we've set for them. I think that's really important. Of course, the danger of doing it this way is that you're only taking a sample of your data, which means you could have sample bias.
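The sampling exercise described here is easy to reproduce. The sketch below draws a random sample of 100 records and reports the share that passes a check; the function name, the fixed seed, the email rule and the invented data are all assumptions, and in the survey the check was a person eyeballing each record rather than a function.

```python
import random
from typing import Callable


def sample_conformance(records: list, is_conformant: Callable[[dict], bool],
                       sample_size: int = 100, seed: int = 0) -> float:
    """Share of a random sample of records that passes the given check."""
    rng = random.Random(seed)              # fixed seed so the sample is repeatable
    size = min(sample_size, len(records))
    if size == 0:
        raise ValueError("no records to sample")
    sample = rng.sample(records, size)
    return sum(1 for r in sample if is_conformant(r)) / size


# Hypothetical data: every tenth record is missing its email address.
records = [{"id": i, "email": "" if i % 10 == 0 else f"user{i}@example.com"}
           for i in range(1, 1001)]
rate = sample_conformance(records, lambda r: r["email"] != "")
print(f"{rate:.0%} of the sampled records conform")
```

Because only 100 records are inspected, the result is an estimate, and rare outliers can easily fall outside the sample.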
Another thing, of course, is that if you've got particular outliers that may cause you problems, the chances are you might miss some of them. But it is a good way to start if you don't have any tools and you don't want to write any SQL: just get some people to look at some data for you and tell you how good they think it is. That proved to be very effective in that case. Let me give you an example, derived from something I did a few years ago. I've changed the names to protect the guilty, but this was part of an HR table we looked at when I was involved with a company specializing in data quality. We extracted a few records at random just to see whether we could tell anything from the data. There are three approaches you take to this, and you can do it manually if you don't have tools. Start by going down a column: take employee number, for example. If you're looking at it as I am, you can see some strange potential problems with employee number. The second one down, Greg, looks suspiciously like a UK national insurance number. Most of the employee numbers are six digits and all numeric, but Roy's has only five digits, and Taylor at the bottom doesn't have an employee number at all. You can't assume those things are wrong, but they are the starting point for the discussion with the business: is it possible for an employee not to have an employee number? Another thing you may notice is that there are two rows in this table where the employee number is identical. If the employee number is supposed to be a unique key, that indicates there could well be something wrong with that field as well.
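The checks on the employee number column can be written down as a small column analysis. This sketch assumes the expected format is exactly six digits, which is an inference from the description rather than a confirmed rule, and the values are invented to mimic the problems mentioned.

```python
import re
from collections import Counter

SIX_DIGITS = re.compile(r"\d{6}")   # assumed format: exactly six digits


def analyse_employee_numbers(numbers: list[str]) -> dict:
    """Group employee numbers into blank, badly formatted and duplicated."""
    filled = [n.strip() for n in numbers if n and n.strip()]
    counts = Counter(filled)
    return {
        "blank": len(numbers) - len(filled),
        "bad_format": sorted({n for n in filled if not SIX_DIGITS.fullmatch(n)}),
        "duplicated": sorted(n for n, c in counts.items() if c > 1),
    }


# Invented values that mimic the problems described above.
numbers = ["100234", "AB123456C", "100781", "10093", "100234", ""]
print(analyse_employee_numbers(numbers))
```

The output is a list of candidates to raise with the data owner, not a list of confirmed errors.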
And then you can go down the other columns. Look, for example, at first name: for Greg again, the first name is blank. Is that allowable? If you look at gender, Patel has four X's in the gender field. Is that acceptable? You can carry on like this, and you can see at the bottom, in the date of birth field, there is one with three X's, one with no date of birth, and one apparently born in year zero. And then you have Kevin Taylor again, whose date of birth seems to be in a US format, in other words the 30th of December 69, whereas all the others, apart from Hayes, seem to be in a European format. So immediately you can start to have that conversation with the business. Once you've done that, you can look at cross-column analysis. For example, take Brian Smith at the top there. Brian, for most of us anyway, is a male name, and yet the gender field indicates that this person is female. That comes from looking at interdependencies between fields, and there seems to be a bit of an issue there that we need to check. I do know that some of the data profiling tools hold a dictionary of common first names and can therefore identify whether a name is generally male or female. That's not always easy if you've got a Leslie or something like that, but in this particular case Brian is probably a man's name. Once you've done the cross-column analysis, you can look at the row analysis, or record analysis. If you look at the top record, Brian Smith, and the one two from the bottom, Brian Smith, they would appear potentially to be duplicate records. Brian is spelled differently, but they've got the same employee number and the same surname. The genders differ, but we've already figured out that looks a bit odd anyway.
They have the same date of birth but a different role code, which might indicate, for example, that Brian Smith was promoted at some point from PM 10 to PM 16, and instead of updating his record they created a new record for him. That raises all sorts of questions about adherence to business rules. Just to summarise what we get from looking at that little table: all the cells marked in red highlight a potential data quality problem, and all the ones in yellow indicate potential duplicate records. So immediately you're starting to home in on the things that may or may not be wrong with this data. But remember, at this point you cannot assume that the things that look like a problem are a problem. To give you an example, take Kevin Taylor again and the blank employee number. It may well be the case in the HR function that a new employee is not given an employee number until he receives his first payment, which might mean Kevin Taylor joined two weeks ago and hasn't yet had his first payslip. You can't establish that if you're in IT. You can only establish it by asking the data owner, if there is such a thing, or the subject matter expert: is it OK to have a blank employee number? If it is, then there isn't a data quality problem. If every employee has to have an employee number, you've highlighted a data quality problem. This is why the business and IT must work together very closely on these things. So this is what I mean when I say that once you've baselined and done some data profiling, you've got to do the review with the appropriate business and IT stakeholders. And I would always say that if you've got data governance in your organization, your data stewards should be the people leading this effort to understand what the business rules really should be.
And then you need to try to get consensus. You can do that with workshops, for example. I've used these sorts of techniques very often in workshops, and they're a powerful way of asking: can an employee number be blank? No. Yes, it can. Really? I didn't know that. And you start to build a shared understanding of what some of the issues with the business rules are. That review and validation should also involve the business saying: OK, we know we don't have everybody's first name, but it doesn't really matter because we don't use it anyway, so that's not a high priority for us. What is a high priority is getting date of birth right, because that has an impact on pay rates and pay scales. That's where the business needs to come in. Once you've done that, you can use that group of stakeholders to create what the business rule should be. You can then design that business rule and deploy it somewhere in the organization. On this little table that I've done here, just to demonstrate, you can already see some potential format and content business rules that come out of looking at that very small and very simple table of data. And I didn't pick that data because it was bad, by the way. It was honestly a typical sample of what we found within that database. So maybe one business rule is that first name must not be blank, which means Greg's record doesn't meet the standard. The allowable genders are female, male, self-determined or unknown, for example, so Patel's gender of four X's does not conform to the business rule, and you need to take action to improve that, and so on. You can see how that all begins to work.
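The two example rules just mentioned can be captured as data and run against records. This is a minimal sketch, not the way any particular tool stores rules; the rule list structure, the function name and the first names in the sample records are assumptions made up for the example.

```python
ALLOWED_GENDERS = {"female", "male", "self-determined", "unknown"}

# Each rule: (name, field, check). The rule wording follows the examples in the talk.
RULES = [
    ("first name must not be blank", "first_name", lambda v: bool(v and v.strip())),
    ("gender must be an allowed value", "gender",
     lambda v: (v or "").strip().lower() in ALLOWED_GENDERS),
]


def failed_rules(record: dict) -> list[str]:
    """Names of the rules that this record does not meet."""
    return [name for name, field, check in RULES if not check(record.get(field))]


# Invented records echoing the two problems mentioned above.
staff = [
    {"surname": "Greg", "first_name": "", "gender": "Male"},
    {"surname": "Patel", "first_name": "Anita", "gender": "XXXX"},
    {"surname": "Hayes", "first_name": "Jo", "gender": "Female"},
]
for person in staff:
    print(person["surname"], failed_rules(person))
```

Keeping the rules in a list rather than scattering them through the code means the business can review the wording of each rule separately from how it is checked.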
So the question then is: you've decided what the business rules should be, you've designed them and you've agreed them, so how do you deploy them? I'm going to hand back to Donna, who is going to talk about the next step in this process, which is deploying the business rules across an organization. Donna.
Thanks. To come full circle from where we started: some of the ways to discover these business rules are in systems and applications, and that's also the best way to implement them. So, just going through some of these. The best way, or maybe I'm being biased, but a very good way, is to catch it at data entry. I've seen that in some of the chat and questions: where do you store data quality, where do you check data quality, where do you fix it? I'll go out on a limb and say I highly discourage trying to fix it only at, say, the data warehouse level or through ETL, because then your reporting might be right, but you're only fixing the output, you're not fixing it at its source. It's one of those things that is helpful but frustrating, because no one realizes all the work you've done when it's right. But if the valid values are only red and green, the drop-down should just have red and green, and there goes your data quality problem. I have to tell some of my own stories as well: I was actually registering for a data quality webinar, for an unnamed organization, and when you put in your state code in the US it was a free-text field. I wanted to capture that and laugh, because the easiest drop-down to have is a list of the 50 US states, and they were using free text, as for many things. So again, things like drop-downs and valid value ranges: the best way is to catch it at the source. Linked to that is your reference and master data, as when I asked what the drop-down list of red and green is.
That's probably a reference data list. Often when I explain what reference data is to people, I say, you know that drop-down list: your country codes, your state codes, your product codes. That's another great way to validate at source. It's also a helpful discussion. I saw some chat on gender codes: is it male and female and that's it, is it male, female and undetermined, will that change over time, and so on. Across your organization, each application may have its own list of valid values, so do you rationalize them? I think that's why there's mapping done. In a perfect world they'd all use the same codes. I've seen so much of that in the chat and the questions, and that's why in the beginning I said it's an art and a science. There are scientific rules you can apply. Some of it is people looking at it, and some of it is a decision about whether this is important. Probably not on this call, but in a normal business I'm one of the few people who overdo data quality. I'll often fix a list of things and then realize: Donna, why are you doing that? You're just being OCD. This is just a spreadsheet no one else will see, fixing its data quality will not affect the business, it's not a good use of my time, let it go. I guess it goes the other way too: organizations, I think, let too many things go, so it's about finding the right balance. Master data management, I think, is a great way to catch data quality problems. In the Q&A someone was asking about the dimensions of data quality, and one of them is duplication. Master data can help identify those duplicates and create that golden record, and then, again, it's in the spirit of pushing back and getting the data right in the source systems.
If you have that publish-and-subscribe model, the core source systems can subscribe back to the master. A classic example, especially in large organizations that have been around for a long time, and who hasn't had this sort of problem: don't get me started on my insurance company. I changed my address on the website, it does not cascade to everything else, and I'm just trying to change my address. That's a perfect example for master data: change it in one place, it's updated there, and it cascades to all the other systems. A great operational use for data quality. Related to that, catching it at input, is the code. Again, that's where you can discover things, and that's also where you can implement them. These range rules, as I mentioned with the data model, can be done at the database level, but I also mentioned that's fairly limited. It's blunt force: can you have one, must you have one, how many. Often data quality is more nuanced, like Nigel was saying: if it's in the HR department they can be in two departments, but if it's in this other department, and so on. You have these ifs and thens and and-clauses, and that's where the application comes in. Then there's the data quality tool, which can monitor over time. Business rules engines are a way to capture these rules and propagate them to the systems. You could say that sometimes a master data management system could be your business rules source. It's a nice idea, and Nigel touched on this earlier: rather than have these business rules scattered, is there a single source for business rules that we can then cascade? That's probably a good segue into the next slide. In fact, I have one client that's debating: do I really need a master data tool, or maybe just a business rules engine?
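As a rough illustration of the single-source idea, the sketch below keeps one set of rules and uses it in two places: to reject bad data at entry and to audit data that already exists. The field names, the department limits and the function names are hypothetical, and a real business rules engine or master data tool would do far more than this.

```python
# One shared rule set; both the entry check and the batch audit read from it.
REFERENCE_DATA = {"colour": {"red", "green"}}   # valid values behind a drop-down
MAX_DEPARTMENTS = {"HR": 2}                     # hypothetical conditional rule
DEFAULT_MAX_DEPARTMENTS = 1


def violations(record: dict) -> list[str]:
    """Every rule the record breaks, using the shared rule set above."""
    found = []
    for field, allowed in REFERENCE_DATA.items():
        if record.get(field) not in allowed:
            found.append(f"{field} must be one of {sorted(allowed)}")
    limit = MAX_DEPARTMENTS.get(record.get("home_department"), DEFAULT_MAX_DEPARTMENTS)
    if len(record.get("departments", [])) > limit:
        found.append(f"at most {limit} department(s) allowed")
    return found


def accept_on_entry(record: dict) -> dict:
    """Real-time use: refuse the record before it reaches the source system."""
    problems = violations(record)
    if problems:
        raise ValueError("; ".join(problems))
    return record


def audit_batch(records: list[dict]) -> dict[int, list[str]]:
    """After-the-fact use: report which existing records break which rules."""
    report = {}
    for index, record in enumerate(records):
        problems = violations(record)
        if problems:
            report[index] = problems
    return report


existing = [
    {"colour": "red", "home_department": "HR", "departments": ["HR", "Payroll"]},
    {"colour": "blue", "home_department": "Sales", "departments": ["Sales", "Ops"]},
]
print(audit_batch(existing))
```

Because both functions call the same `violations` check, a change to a rule shows up at data entry and in the audit at the same time, which is the point of having one source for the rules.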
And there are pros and cons to each, but the idea is that having a single source of business rules means they can cascade into the data input, into your source systems, into the warehouse, and then into your reporting layer. I would tend to say that the beauty of catching it at input is that it's more your real-time data validation. The reporting layer is generally batch and after the fact, with your mapping tables and things like that. Mapping tables come in handy, but I tend to cringe at them, because I usually ask what band-aid you're putting on if you have to map. It can also be really confusing, fixing it in the warehouse: it's not fixing the source, so you're not really reflecting... So, if we move on to the next slide. A favorite quote of mine, if I can get it right, is: you can't improve what you can't measure. We often track KPIs and measures in the organization: total sales, total headcount, total churn of customers, so many things to measure. But are we looking at the data itself? We have several clients we work with where we generally recommend this: if you have a data governance council or data stewards, how are they managing their progress? So often at each data governance meeting, or as a weekly task for each of the data stewards, the job is to look at their dashboard and actually take action on it. We've caught numerous things ahead of the game that would not otherwise have been fixed. Oh my gosh, all of a sudden we're not getting email addresses, and we really need those. What happened? Oh, there was a change in the source system. Had we waited until the reporting layer months later, it would have been a lot harder to fix. And the other beauty of these dashboards, and something to think about, is that they can change over time.
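A dashboard measure of this kind can be as simple as a completeness score per load with a traffic-light status. The sketch below is one possible shape for it; the thresholds, the function names and the weekly data are assumptions invented for the example.

```python
def completeness(records: list[dict], field: str) -> float:
    """Share of records in which the field is filled in."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if str(r.get(field) or "").strip())
    return filled / len(records)


def status(score: float, red_below: float = 0.80, green_from: float = 0.95) -> str:
    """Traffic-light status for a score between 0 and 1 (thresholds are assumed)."""
    if score < red_below:
        return "red"
    return "green" if score >= green_from else "amber"


# Hypothetical weekly loads: in week 2 a source system change drops half the emails.
loads = {
    "week 1": [{"email": f"user{i}@example.com"} for i in range(20)],
    "week 2": [{"email": f"user{i}@example.com" if i % 2 else ""} for i in range(20)],
}
for week, records in loads.items():
    score = completeness(records, "email")
    print(f"{week}: email completeness {score:.0%} ({status(score)})")
```

Reviewing a measure like this every week is what lets a steward spot the drop at the point it happens rather than months later in the reporting layer.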
We have one client that's been going through things systematically, and I think we've been talking about that a bit: you really need to focus, fix and move on. You can't fix everything. So they said, well, the emails were really terrible. We made a concerted effort on training and really getting that right, and now they're fine. We don't have to monitor them as a top KPI, so let's move on to something else. Still monitor it, but it's not in the red zone. So again, it really helps with keeping and maintaining quality over time. If we go to the next slide. I do want to go into questions, because I can see there are a lot, but just to summarize: there are a lot of things in data quality, and there are a lot of dimensions of data quality, but I think in a lot of ways they come back to the business rules, because the "so what" of data quality is that it affects the business. Thinking with that business-first view is a core tenet of our practice, and the rules are a nice way to get to that. Before we open it up for questions, if we just go to the next slide: our shameless marketing plug. We do this for a living, so if you need help, you know where to find us. And the next slide, another shameless plug: if you enjoyed this and you haven't seen this before, please do join us, either for the recordings, or for next month's session on data modeling, or in October and December on governance and digital transformation. So I'm going to pass it back to Shannon to open it up for Q&A. Over to you, Shannon.
Donna and Nigel, thank you so much for this great presentation. We have a lot of questions coming in here. Just to answer the most commonly asked question, a reminder that I will send a follow-up email by end of day Monday for this webinar with links to the slides and links to the recording, along with anything else requested.
So, diving in here: do we do data quality for only CDEs, critical data elements?
Good question. I'll answer, then pass it over to Nigel. I wouldn't say only, and everyone defines CDEs differently, but I think that's a good way to start and look at it, because the idea of a critical data element is that you've done that filtering and it generally has some impact on the business: consumer contact information, patient medical history and things like that. So it's a good place to start. I wouldn't say only, because sometimes it's those other gnarly ones, but I think that's a great way to think of it. Your thoughts, Nigel?
I agree with that. To paraphrase George Orwell, I think all data is equal but some data is more equal than others. That's the whole point of step two of the simple four-stage methodology we outlined: once you've identified the data quality issues, or suspected issues, you have to have that dialogue with the business to figure out which of those issues really matter, and which are the best issues to fix to get the maximum business return.
Fantastic. And should data quality be done at the source level, the raw layer, the transform layer or the curated layer?
And that's the whole answer, with an exclamation point. I think I touched on it in the session: I'm a big fan of at entry. The absolute best is to catch it at its source. This might be a hackneyed phrase, but we often think of data quality in terms of streams and a pond or a lake. If you just clean up the lake and don't clean up the streams feeding pollution into it, you just keep cleaning over and over. Sometimes you do patches, and I think it's good to monitor at each level.
We talked a lot about business-centric data quality rules, and those are the best ones to catch at source. If, say, all your rules are in the application and no one can enter bad data, that's absolutely the best way to catch it. There could still be something in the data pipeline, such as data type errors and those more low-level problems, so it's still good to check. We generally check each layer of the transformation as part of our ETL framework, just to triple-check, especially for those critical data elements. But other than that, I would say the best way is to catch it at source. Nigel, anything to add to that?
OK, great. Awesome. So, should business rules cover data quality rules?
Yes, I see them as linked. Business rules may be broader than just pure data quality rules, but I think data quality rules are generally business rules.
Yes, they're a subset, I think, of business rules as a whole. Anything else, Shannon? Any others you want to fire at us? I think I've lost Shannon.
I'm talking to the mute button here. There are some comments here, Nigel: there were some big fans of your examples on the business rules. So, considering that our data environments are becoming more complex, data lake plus data warehouse plus several BI tools plus plus plus, will AI become a good option for data quality automation in the future?
Yes. I'm just going to say yes. I'm involved with my local university, and one of the key research areas there is looking at ways of identifying data problems when they're created. If you're in an environment where there's a lot of real-time data and a lot of automated processes going on, you can have AI sift out or weed out the data that looks problematic, and then of course you can also give it some business rules for fixing that data.
And of course a good AI engine should also be able to figure out for itself what the business rules should be. So there's potential for that, but it's very early days.
We have a slide for that which we didn't put in, actually. I would agree, though I'm a little bit jaded: I think what people call AI may not be. But for a lot of those pattern-related things, like "this looks like a social security or social insurance number", all of the stuff we used to do manually, that core matching is a great use of a machine. I would caveat that when we talk about CDEs. Automation, absolutely, which is why some of the tools we mentioned are good, but some data is so important, patient data, maybe some education data, that it's also good to have a second set of eyes on it as well. But yes, I agree.
We don't have a lot of time here, but let me slip in a quick question, especially since, Donna, I know you are a member of DAMA: is the framework taken from the DAMA DMBOK 2?
Yes, Nigel and I are both DAMA fans, so I would say we rely on it, and our framework has a lot of its foundation there. Nigel and I are both contributors to the DAMA DMBOK. I think what we add to it is some of that business view, and the idea of governance being a collaboration engine. So I think DAMA has a good foundation for some of those core building blocks, and we've just extended that.
Yes, but to answer the question as well, both the A-to-E data quality methodology and the four-step methodology for business rule application are ours. We've come up with them within Global Data Strategy. They're alluded to in the DMBOK, but they're not specifically in there.
I would just say in general that the DMBOK is a great kind of dictionary, and what we add to that is the application. I would say, please look at the DMBOK as a first step, but don't have that be your last stop. Those are suggestions, and you really need to apply them to your real-world example. What's in there is suggestions, and it doesn't work for everybody.
No, it's very good advice.
Well, that does bring us to the top of the hour. Thank you both so much for this great presentation. You've got some big fans there of the content today. And thank you to all the attendees for being so engaged in everything we do. We just love it. Just a reminder again: I will send a follow-up email by end of day Monday with links to the slides and links to the recording of this session. Thanks, everybody, and I hope you all have a great day. Thanks for joining us today. Thanks, Donna, as always.
Thank you. Bye bye.
Thanks a lot.