 All right, so now I have the pleasure of taking us through some of the basics of data quality. Again, a lot of this is going to be kind of foundational to many of you. Many of you, like I said, could probably give this presentation yourself. But I just want to make sure that we all appreciate all the different components of data quality so that we are just starting off on the same page, singing the same song, so to speak. Let me just progress through the slide here. Okay, so what are we doing in this presentation? We're going to start off with some definitions of data quality. Many different definitions out there. We're going to kind of give you the textbook definition as we understand it. Some implications of different data quality checks, some causes of some data quality issues, and then quickly just point out some strategies to address them. Certainly not a comprehensive overview of every single thing about data quality, but we are going to just lay the foundation here in this presentation. So why is it important to address data quality? This is a no-brainer, hopefully rhetorical question. But what I want you to appreciate is that data quality and data use are connected. And we call this the virtuous data cycle, okay? So we have data quality, and we have poor data quality. We have good data quality. You can see poor on the left, good on the right. Then we also have data use. We have low data use and high data use. Low on the left, high on the right. And what we have to appreciate is that these are completely linked together. Data quality and data use are inseparable. You cannot have high data use with poor data quality. It's just not really possible. We've not seen any evidence that that is possible to have high data use and poor data quality. We have seen examples where you have good data quality, but low data use. Now that is possible. 
Unfortunately, many countries or many projects put a lot of emphasis on having high data quality, but then ultimately people don't use the data. That's a separate issue. And there's another academy for that as well, on data use practices. But the important thing to appreciate here in this virtuous data cycle is that we have to have good data quality to have high data use. We just won't ever get data use there if we have poor data quality. And if we do have good data quality and we do have high data use, we can start to do all these things that everyone's always talking about. Data-driven actions and decisions, ownership of data, data cultures, data actually being used in our routine daily practices, people consulting data prior to making decisions, prior to planning processes. And people are asking for data. People want data throughout your entire hierarchy or organization, your project, because they know that it's good quality, it's reliable, and it's going to make the decisions that they have to make have more impact. So we get data demand. So why do you think that data is of poor quality? There are many, many reasons for this. And actually one of your assignments later today is to tell us some of the reasons why, in your country or your project, you think that you have some data quality issues. One of the major ones, the first bullet point here, is poor form design, or poor system design, or high reporting burden. You put up many different kinds of infrastructural barriers, or you make the tools that people need to use to report data, or even to visualize data, hard to use. If you make the tool itself hard to use, then people don't use it. If they don't use it, then when they do have to use it, they're going to struggle to understand it. They're going to struggle to fully utilize it, and that's oftentimes going to result in poor data quality.
The second bullet point, and this is really very pervasive in many countries, is that people do not feel accountable for their data. People don't receive any feedback when they submit data. They don't receive notifications that their data is being used or being checked. Especially think about the folks at district level or health facility level. How much feedback are they getting on data quality? In many places, it's very, very little. They're also seeing that their data is not being used. They don't see any decisions coming from higher levels. They're not being encouraged to make decisions off their data. So people don't feel accountable for it. People just kind of think, it's just a thing that I have to do. I just put it in, and then it's gone, never to be concerned about again. And without accountability, people don't put a lot of effort into making sure that what they're doing is correct. Third bullet point here, and I already kind of touched on this, is that there's very little ownership of data. And the ownership of the data needs to be principally at the lowest levels. The people who are actually generating the data: community health workers, facility staff, district staff. These are the people who are actually providing the services, filling out the forms, and in many countries, putting that data into DHIS2 or into the reporting tools. And if they don't feel ownership of the data, then they are not going to do a good job with it. And that's kind of just the bottom line. They have to appreciate why their data is important, why collecting the data and putting it in is important, and they need to say that, yes, this is my data. It represents the work that I do, and I stand by it. I say that it is accurate. And in order to get to that place, they need to see people using it. They need to be encouraged to use it. They need to be given the tools to use it. They need to receive feedback. That's how you generate a sense of ownership.
The last bullet point, and this is probably one of the biggest reasons that data is actually of poor quality, is that data is not trusted. There are many countries that I've worked in where people just assume that the data quality is bad. And they just assume this because they don't actually have any insights, or they're not able to see where the data quality issues are. They just know from some kind of anecdotal experience or some kind of process they went through that some data shouldn't be trusted. And if we just don't worry about that data, then we can kind of go about our lives. Well, if we build a culture of data not being trusted, then we are not gonna see any data quality improvements. And one of the best ways to make sure that we actually get to working on data trust issues is we start to show people where the data quality issues are. We start to use tools, dashboards, alerts, notifications, all the various things that we're gonna show you this week, to actually push data quality issues to the user so that the data quality becomes very transparent. People are saying, yes, there's a problem here. I can do something about it, right? And that's really fundamentally necessary: that people see the data quality issues, that people feel they can do something about it, and that starts to build data trust. How do we define good data quality? Well, in the most basic sense, data is of high quality when it's fit for its intended use. Can I use it for my operations, my decision-making, my planning, my policies? Do I have standard operating procedures built around it? When data is routinely utilized in these ways, we can say that your data is of high enough quality. More often than not, data quality is simply defined as good when we are able to make it fit. Make it fit for purpose, right? Make it something that we can reliably depend upon. Let's quickly cover some of the data quality dimensions now.
The first one that we want to go through is completeness. Completeness is: did we get all the data in that we needed to get in? Do we have all the health facilities reporting this month? And if we do, is that enough to be able to use to make a decision? You can't make decisions if you only have, let's say, 50% of your health facilities reporting every month, right? You need to have high completeness. You need to have all the data coming in that you expect to come in. This is easily measurable in DHIS2. The next thing is timeliness. Is all the data there when we need it to be there, when we expect it to be there? If we are receiving data from many, many months ago, maybe that data's not even relevant anymore. Maybe things have changed. Maybe you're getting malaria data before the rainy season. Is that useful if you're in the middle of the rainy season? Probably not. So your immediate decisions and actions are not going to be based upon this old data. So we need the data to be coming in when it's supposed to be coming in. And we need to build our operating procedures and our processes around the most relevant and up-to-date data. Again, this is something that we can easily measure in DHIS2. Consistency. This is one that a lot of people can easily forget about. Does our data reflect the same information across all the various tools that we're using? So if you go down to the facility, is the number that's recorded in that registry also the number that I'm seeing in DHIS2? Maybe you have some kind of patient tracking system, right? Is the data that's in that patient tracking system also the same as the aggregated numbers I'm seeing in DHIS2? We need to be able to check all of these various reporting forms. And we need to say, is the data consistent across all these various sources? We also need to think about consistency over time. Is the data following a similar trend that we expect to see year after year?
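To make completeness and timeliness concrete: both dimensions boil down to simple ratios over a list of expected reports. Here is a minimal Python sketch, not DHIS2's actual implementation; the facility names, dates, and the 15-day deadline are invented for illustration:

```python
from datetime import date

# Illustrative report log: facility -> date the monthly report was received
# (None means the report never arrived). All names and dates are made up.
expected_facilities = ["Facility A", "Facility B", "Facility C", "Facility D"]
received = {
    "Facility A": date(2024, 2, 10),
    "Facility B": date(2024, 2, 20),
    "Facility C": None,            # never reported
    "Facility D": date(2024, 2, 5),
}

period_end = date(2024, 1, 31)     # reports cover January 2024
timeliness_days = 15               # assumed submission deadline

# Completeness: actual reports received divided by expected reports
actual = [f for f in expected_facilities if received.get(f) is not None]
completeness = 100 * len(actual) / len(expected_facilities)

# Timeliness: reports received within the deadline, out of all expected
on_time = [f for f in actual
           if (received[f] - period_end).days <= timeliness_days]
timeliness = 100 * len(on_time) / len(expected_facilities)

print(f"Completeness: {completeness:.0f}%")  # 3 of 4 reported -> 75%
print(f"Timeliness:   {timeliness:.0f}%")    # 2 of 4 on time  -> 50%
```

In practice the system computes these for you, but the underlying logic is no more than this: actual over expected, with a date cutoff for timeliness.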
For example, with a lot of seasonal diseases, we expect this very specific trend over time, year after year after year. Is the data following that trend? We consider that consistency, right? Conformity. This is one that we stumble over quite a lot. So, is the data following a set of standard definitions? A good example of this is when, in the form design or system use, many countries have a lot of different ways of naming the same thing. So malaria versus confirmed malaria, for example. Which one do we use? Which one is the data that we should be consistently using? Which one is the one that's actually reported on? Sometimes it's not at all clear which data should be used. The next one is accuracy. Does the data reflect the real world events? Is the number that gets into DHIS2 the actual number of patients that was seen during the antenatal care visits? Integrity. So this is kind of our internal consistency and traceability. The integrity here is about whether we can look at the data over time and make sure that we are able to trace it back to its original source, right? So we see the higher level of data at the very top, the outcome and impact indicators. Can we go down the hierarchy? Can we go down to a much more granular level and say, yes, this is consistent throughout the whole process? One big example of this is, if you actually have patients in your database, you are maybe collecting individual patient data, and do you have multiple records for the same person in your database? This is becoming increasingly common. You need to be able to perform these kinds of data quality checks as well, to make sure you're not double counting patients, which would then skew your aggregated data, which can ultimately skew your entire impact and outcome indicators.
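That double-counting risk can be illustrated with a small sketch. This is plain Python with invented patient rows, not how any particular patient tracking system does it; real deduplication would use proper identifiers and fuzzier matching, but even a first pass on normalised name plus date of birth catches the obvious cases:

```python
from collections import Counter

# Illustrative patient rows (all names invented). Messy data entry has
# created two records for the same person.
patients = [
    {"name": "Jane Doe",   "dob": "1990-04-12"},
    {"name": "jane doe ",  "dob": "1990-04-12"},   # duplicate, messy entry
    {"name": "John Smith", "dob": "1985-01-30"},
]

def key(p):
    # Normalise case and whitespace so trivially different entries collide
    return (" ".join(p["name"].lower().split()), p["dob"])

counts = Counter(key(p) for p in patients)
duplicates = [k for k, n in counts.items() if n > 1]
print(duplicates)  # [('jane doe', '1990-04-12')]
```

Any duplicate found here is a record that would be counted twice in the aggregated numbers, which is exactly how the impact and outcome indicators get skewed.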
Let's just look at some quick examples from the real world. So first of all, completeness. This is the DHIS2 dataset completeness app. We're gonna cover this app, but this is a picture of some Sierra Leone generated data. And you can see that looking at completeness, we have the actual reports divided by the expected reports. And then we can see what percentage of reports are actually incomplete. And we can see the number of reports that reported on time as well. This is an app that, as I said, we'll go through, and you can quickly start to see the completeness by different aggregation levels, so national level, district level, regional level, and go all the way down to individual facilities if you need to. Timeliness. If you're reporting aggregated data, DHIS2 has a really good tool for timeliness. And that is in the dataset configuration. You can set the number of days after the period within which you want the data to qualify as timely. So for example, you can say 15 days, as you see here. And that says that the data for the previous period has 15 days to be entered for it to be considered timely. And if it's entered after that 15 days, then it's considered late. It's not considered timely or current data. And that can be a data quality check that you build in. You can either say don't allow it, or you can say allow it, but flag it and we need to follow up on that. Consistency is a good one here. To give you a bit of an example, we're not actually looking at DHIS2 data here, but we're just looking at an example from 2016 Zambia population figures, where there was an NGO, PATH, that had population figures. And then there was the district health office that had population figures. And then there was a population figure provided by the central statistics office.
And here you can actually see that the district health office population figure was significantly different than the NGO's population figure. Now, which one is used? Which one is used, and in which indicator? How are we able to monitor and check for this? It's important that we actually have some processes to appreciate how the data is consistent. The same number should be consistent across various sources, and it should be relayed and communicated over time and stay consistent, again, following those trends over time. The trends over time is something that we're gonna really take a look at in the WHO data quality app. So consistency over time, if it's a little bit vague now, it will become crystal clear to you in the next few sessions. Conformity. Conformity is a good one here. This is actually going back to kind of our design considerations. So here we're actually looking at a picture of a data entry form, right? And you can see what we've zoomed in on here is the number of vaccine doses discarded due to, and then you have the various vaccines, you have an expiration date, that's fine, but then you have a VVM change. And what is supposed to be entered here? Is this clear? Is anyone actually really sure? So one person wrote no, and then there's a bunch of tick marks. Is this counting anything? How is this going to be reported into DHIS2? This is very confusing. And I think the important thing to appreciate is that if the person entering the data is left to make their own interpretation of how the data should actually be filled out on the original form, then you're gonna have very poor conformity. People are gonna be doing different things. Accuracy, another example from the field, a data entry form. And let's take a look at that second row there, which is IPT first doses. And here you see that it's a tick form. It's a common form you see out in the field, where they've ticked off one, two, three, four, five, six, seven.
But then they have to do a manual aggregation of this, and if you follow that all the way over to the right, you see they have written two. Well, they've ticked off seven, but they've aggregated to two. Is that accurate? Who knows? I mean, it could be seven, it could be two. It's probably seven, if they've filled out the form correctly, but somehow they did a manual aggregation to two. And what goes into DHIS2? Probably that two. And so the accuracy needs to go down all the way to the individual reporting forms. And you need to be able to check for this in your standard operating procedures, in your routine processes, the supervision and support that you're giving to health facilities and community health workers, to be able to check for these kinds of things. Oh yeah, sorry, just highlighting the points here. Another one here: if you have a total number of deliveries, they've ticked off four of these, and then what do they do? They put a zero. Integrity is very important. Integrity is essentially, are we able to track the changes in the data over time for a single data value? And are we able to have processes to see, if a value is changed, who changed it, and potentially follow up with them on why they changed it? And here you can actually see, and Nora's gonna take us through this, the integrity checks within DHIS2, the audit trail within each individual data element. And here you are able to see that change over time. All right, a few key points here to wrap us up on the session. The first one is that data quality issues are a fact of life. Everyone is saying, oh, I've got so many data quality issues. How do I get rid of my data quality issues? That's kind of the wrong question to ask. We have to appreciate that data quality issues are going to happen. Every information system that I've worked with, and I think any that anyone here has worked with, has outliers. People put in crazy numbers sometimes, usually data entry mistakes.
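That tick-versus-written-total mismatch is exactly the kind of check a supervision routine can automate once both numbers are captured digitally. A minimal sketch, with invented field names and records:

```python
# Illustrative digitised supervision records: the number of tick marks
# counted on the paper form vs the manually aggregated total written on it.
rows = [
    {"indicator": "IPT first dose", "ticks": 7, "written_total": 2},
    {"indicator": "Deliveries",     "ticks": 4, "written_total": 0},
    {"indicator": "ANC visits",     "ticks": 5, "written_total": 5},
]

# Any row where the tally and the written total disagree needs follow-up
mismatches = [r for r in rows if r["ticks"] != r["written_total"]]
for r in mismatches:
    print(f"{r['indicator']}: counted {r['ticks']} ticks but "
          f"{r['written_total']} was reported")
```

The point is not the code itself but that the check is mechanical: once you can compare the granular tally to the reported total, these accuracy errors stop being invisible.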
Outliers happen, it's a fact of life. Data entry errors happen, absolute fact of life. Validation alerts are going to be triggered. If you really have clearly well-defined validation rules, there's probably never a week that will go by without some validation notifications being triggered. Fact of life. Fraudulent data: people make up data. It does happen. Many countries incentivize data to be reported on time more than they incentivize data to be reported accurately. What does that mean? That means that people feel a lot of pressure to get their data in on time, but because the data's not being used, because data quality checks are not being performed, they know that they can get by by just making up data. They get it in on time by just making up some data. This happens. This is a fact of life in many countries, and it needs to be appreciated, and we need to have the controls in place to catch it. So data quality issues are not a failure of the system. They're a fact of any information system, really. What is a failure is not being able to address the data quality issues, and specifically to address them as they come in. So I just want to make the point: no one here should be beating themselves up that they have outliers or that they have data quality errors. What you should be upset about is, are you able to identify and address them? And if you're not, that's a failure, not the fact that they exist in the first place. When we talk about data quality and data quality checks, we like to think of it as some kind of race. Here I'm going back to an old analogy, where the data quality issues are like the rabbit, or the hare, and the data quality corrections are like the tortoise. Usually, in most country databases, we see that data quality issues are coming in so fast. They're constant: every day, more and more data quality issues coming in.
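A routine outlier flag, the kind that triggers those weekly alerts, can be as simple as comparing a new value against the facility's own history. This is a generic sketch with invented numbers, not DHIS2's actual outlier analysis; the three-standard-deviation threshold is a common but arbitrary choice:

```python
from statistics import mean, stdev

# Illustrative monthly counts for one facility and indicator; the final
# value looks like a data entry error (an extra digit typed).
values = [42, 38, 45, 40, 41, 39, 44, 43, 37, 40, 42, 410]

# Baseline from the historical values, excluding the newest one
m = mean(values[:-1])
s = stdev(values[:-1])

latest = values[-1]
z = (latest - m) / s  # how many standard deviations from the baseline

if abs(z) > 3:
    print(f"Outlier alert: {latest} is {z:.1f} SDs from the mean ({m:.1f})")
```

A check like this doesn't tell you the value is wrong, only that it deserves a look, which is precisely the "flag it and follow up" posture described above.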
And we're not able to keep up with them. Our data quality correction processes, if we even have them, are very slow. They're very formal. They're too cumbersome. We can't keep up with the volume of data quality issues coming in. And what does that mean? That means that we have low data trust. We have low data ownership. And ultimately we have low data use. Remember, it all goes back to data use. And what we need to appreciate is that we have to build a process where these move together. So our data quality issues and our data quality corrections are moving at the exact same speed, like the hare riding on the tortoise. We're talking about routine data quality corrections. We're talking about standard operating procedures that are built in, that are integral to our entire operations within our information system, that account for data quality. Everything is moving together. Everybody knows their job. And we're utilizing all the tools that we have available to us to stay on top of this. Let's talk about some strategies to address data quality issues. This is really what this whole academy is about: how do we actually start to get that hare riding on that tortoise? How do we actually get the data quality correction processes moving at the same speed as our data quality issues as they come in? A couple of points here. The first one is that even before we get any information system launched or implemented, we need to appreciate how data quality affects design. And again, we have an asynchronous session on this. So we have an entire session on how to link up design and data quality. We need to make sure that we have the right training processes and approaches to make sure that people understand and are able to address data quality issues. We need to make sure that everything is relevant, that we've taken into account the various contexts in which the data quality checks will be performed.
For example, we can potentially configure DHIS2 to send automated emails whenever a data quality issue is detected. Well, is an automated email relevant to a community health worker? Most community health workers probably don't have email, at least in many countries throughout Central and Southern Africa. And maybe it's not appropriate to be sending out email alerts to community health workers. Maybe it's not relevant to them. We have to find other ways, other tools, to make sure that they know about their data quality issues and they're able to address them. And then finally, we have to give it sufficient resources. If we're gonna take data quality seriously, you have to give it the resources that it requires, which is tools, staff, exercises, processes, routines, meetings, workshops, trainings. All of the things you need to actually bring these kinds of things to life require resources. When you actually have your system live and operating, during data entry and in the routine processes, you have to give yourself adequate time to perform data quality checks. It does take time. It takes effort to be on top of the data quality. You have to build data quality into your feedback and into your standard operating procedures. Again, people need to see the data quality issues. People need to be held accountable for their data. That requires very intentional, very targeted feedback. If you don't have feedback mechanisms going down to district level, to facility level, to community level, there's a good chance that you have very low data accountability, low ownership, and poor data quality. So the feedback has to be very intentional, and you need to figure out how to develop this feedback. What's appropriate to the person that you're giving the feedback to, right? Data quality checks need to be routine. They need to have intention behind them.
They can't just be, oh, when you have time, check the data quality. No, they need to be built into people's processes and their day to day work. And when possible, let's avoid manual aggregation. Manual aggregation, as we saw in that reporting form, is a really easy place to introduce a lot of data quality issues. We need less manual aggregation. Let DHIS2 do the aggregation for you. Capture the most granular data that you feel comfortable with, and then let it aggregate up throughout the hierarchy. After you have your data in the system, we need to have very specific events and activities built around data quality assessments and data cleaning. Nora's gonna go through some of these, and we'll have an example, I think from one of the case studies, on how you actually build procedures and activities around routine or periodic data quality assessments and data quality checks. Extremely useful exercises. WHO has a lot of really good guidance on this. But once you have the data in the system, the work is not done. You have to routinely review it and clean it. All right, so let's take a look at a quick cycle here. All right, so poor data quality. I've talked about this virtuous cycle where data quality and data use are connected, I think earlier in the presentation, on my first or second slide. Now I'm gonna talk about what we call a vicious cycle of poor data quality. So we have poor data quality. It results in data not being trusted. This means that we have weak demand for data. Weak demand for data means that people are not actually utilizing or valuing the HMIS. They're not actually thinking that the information system has any usefulness, any utility, if the data inside of it is just garbage. That usually means that if people aren't appreciating why the HMIS or information system is necessary, we end up having weak health information systems.
It's not updated, no one is monitoring it, no one is providing the admin support that it requires, low data use, no one's really referring to it when making decisions. And that then feeds back into poor data quality. This is a vicious cycle. And it's something that you can spiral down, and you have to build in very specific activities, exercises, standard operating procedures, tools, everything that we're gonna present to you in this academy, to be able to break out of this vicious cycle. One of the big outputs of this breakdown, and this is the arrow that's coming off to the left of the screen here, is that when donors feel that they have lost trust in the health information system, what do they do? They build their own parallel systems, or programs or projects build parallel systems. And then you end up in a situation with many different information systems in one country, some of which are reporting on the same thing, a lot of redundant and parallel efforts. And they can all go down this trap together or individually, but it does weaken the infrastructure, or the idea of a health information system, a national health information system, where all data is in one place. Yeah, exactly. So fragmentation, which leads us back to weak health information systems. So you can see that data quality doesn't just affect perceptions and feelings about how data can be used. It actually means that a lot of times parallel systems are built, fragmentation happens within the country or the project, and that then just exacerbates the whole situation and makes it even worse. Right, so weak demand, weak decision making. All right, so from this presentation, we have two assignments. And it's important that you complete these assignments. They are graded for completion, not necessarily for accuracy, because we're asking your opinion.
The first assignment is that we want you to list five reasons why you have poor data quality in your country or project. What are those? Some examples: wrong use of data; strange issues with indicators, like asking for maternal mortality ratios at facilities. Don't ask for ratios at facilities; you're asking for difficult data at the lowest levels. Another example could be lack of standard operating procedures, or you have a large manual aggregation process, or little to no data use or feedback. So what are some of the reasons that you have poor data quality in your country? It's important to appreciate what these are. It's important, again, to realize that everyone has poor data at some point in their system. It's not something to be embarrassed about; it's something to acknowledge and to use as motivation to improve. So we're asking you to acknowledge what are some of the issues that you have. The second one is to list five implications of poor data quality in your country. Having some poor data quality, what does it actually mean? What are the real world consequences of having poor data quality? So, wrong or no decisions are made, lack of trust or use, weak accountability, and the vicious cycle. These are kind of broad ones, but you can be even more specific than this. You can say that we are not able to use data in our monthly planning meetings because of poor data quality, or maybe you have fragmented systems. For example, the malaria project decided to build their own information system because they didn't trust the malaria data in the HMIS, or a donor came in and built their own immunization system because they couldn't trust the immunization data in the HMIS. You can be very specific here, and we would like to hear those stories and understand what some of the outcomes or consequences of poor data quality are. Okay, so this is the last slide here.
Again, hopefully you've been asking any questions that you have on the Slack channel. If not, please go ahead and post those there. We are now going to take a 12-minute break. So we are gonna come back at, on my time, 12:15, which is 12 minutes from now. So please take this time, get yourself a new cup of coffee or a cup of tea, take a bathroom break, whatever it is you need to do. And we are gonna come back here in 12 minutes, at 12:15.