Good morning, everyone. My name is Shereji Dutta, I'm with the University of Oslo, and I'll be talking today about data quality. We're just letting some people into the room, so I'll get started in a moment. As a reminder, I won't be able to answer questions live, but if you have questions, please post them in the chat, and afterwards I'll try to address some of them through the chat if we have time. I'll also remind everyone afterwards that you can post your questions on the community of practice, and I'll make sure we have some help there, and people can flag questions so we can answer them as well. I don't believe you can turn on your mic anyway, but please don't try to ask questions live; we won't be able to answer them that way. If you add your questions in the chat, that would be much appreciated. I'm here with a couple of team members: Alice, who is helping manage and facilitate the webinar, as well as a lot of the information on the community of practice; Victoria, who has managed many of these webinars in the past; and our French team, who will be giving this webinar on Thursday and has been assisting in preparing this version. So I'm going to go ahead and get started. Once again, if there are any questions about any of the concepts I discuss, please post them in the chat and we'll try to get to them as soon as we can. All right. I'll be talking about the data quality toolkit today, and I also want to quickly mention my colleagues Scott and Olaf, who have been working on this together with me since about September of last year. We've released a new toolkit, and I'm going to discuss what it contains and demonstrate some of the specific features related to this data quality toolkit. All right.
So, the outline for today's session: we'll start with a bit of an introduction — what the data quality toolkit is, what it comprises, and some of the changes we've made. Then we'll talk about metrics of data quality, and in particular the key metrics we've implemented, and we'll demo those within a demo instance that you can access yourself to view some of these measures. We'll also discuss a new version of the data quality annual report app, which has been built with our colleagues from HISP Rwanda in particular, who have been helping us redesign this application; I'll discuss the changes, show what it looks like now, and demonstrate it as well. And then we have a number of implementation tools and considerations to give you some guidance on how to make this work in practice. All right. So let's start with a brief introduction. What is the new data quality toolkit? The data quality toolkit is basically comprised of a number of different guidelines, demonstrations, and implementation guidance that help you make this work in practice. We start with new documentation that outlines all the different features we support related to data quality within DHIS2, with detailed steps on how to configure them, how to interpret the measures created through this data quality feature set, and showcasing, demonstrating, and discussing how to make them work within a particular implementation. The guidance is generic enough that it can be applied to different use cases, and I'm going to demonstrate that as well: while we have a data quality set prepared for immunization, we also have, for example, a core set of variables that looks across programs.
And the guidelines are meant to be applied on an as-needed basis, based on the data you are analyzing in your system and want to review routinely. We also have a demo instance, and I'll be using it throughout the presentation to demonstrate some of these specific concepts. You are able to access it, and we'll share the link at the end of the presentation. It's a publicly available link: you can log in, view the dashboard, and view all the different types of analyses you can perform through that data quality demo. We also have a couple of new tools we've created to help with this process. There's a new app for data quality review — the one I mentioned earlier, developed by our colleagues at HISP Rwanda. We also have scripts available for generating min and max values, which I'll discuss more when I get to the min-max section. And we have some templates — SOPs, standard checklists, et cetera — and I'll try to show those as well, so you can see what they look like and understand how they can support your implementation. The data quality toolkit is still framed around the WHO data quality assurance guidelines, and in particular we're focusing on routine review of data — for those of you familiar with those guidelines, that's module two. This has been developed largely with support from the Global Fund. We're also going to work on new data quality training material: while we have the guidance and the documentation, new training material is planned for later this year. So, real quickly, I wanted to demonstrate the documentation, because it can be a little tricky to navigate.
So, from DHIS2 — the link is in the presentation, we'll give it to you — when you access this documentation, the first thing you'll see is a bit of a review of the principles. This gives some background, so everyone is on the same page when reviewing it. Some of this might seem familiar to those of you who have worked with data quality in the past, and that's because we're still basing a lot of the toolkit and the feature set on the WHO data quality assurance guidelines, and I have a link to them here in the documentation. So when you open it up initially, you might see some information that is consistent with what you've experienced before. If you navigate the left side of the screen, we then have specific features for diagnosing data quality issues at the point of data entry. We discuss what those are, and we also discuss how to configure them — for example, validation rules, a concept I'll demonstrate later today — walking through the detailed steps to configure them in the system, and how that's done in the Maintenance app, for example. Then, on the analysis side, we have a number of different features and items. I'm going to walk through those metrics and show them to you, but similarly, we discuss what they are, how to interpret them, and how to configure them within DHIS2. So there are detailed step-by-step walkthroughs of what these items are, how to interpret them, and how to set them up in your own systems as well. All right. We also have a section on implementation, which discusses how to take all of that into practice: it gives you checklists, standard operating procedure templates, and other kinds of resources that will help you actually implement and set this up.
So this is not just step-by-step clicking in DHIS2; it's more about thinking through what I'm going to do at different levels of my health system to implement data quality review. What are the different procedures associated with that? What checklists can I give to people to use after a training? There are other resources as part of this toolkit, more on the implementation side of things, that help you think about how to get this operational in the field. All right. While the documentation is contained within this one page, make sure you're navigating the left side of the screen, because that will really help you find the different resources associated with it. That's how we've split it up at the moment: a review of the principles, data quality features for data entry, data quality features for analysis, and then all of our different implementation procedures and documentation in separate sections. All right. As for the demo instance, I'll showcase it in a moment as I get into the metrics and measures; we'll discuss what it is and how you can access it towards the end. Then we have some tools and links here — I'll also discuss the SOP template later on as well. All right. So this toolkit is actually a revision of many features that have been in place for a while: we've tried to streamline them and make them easier to access and configure, and we have added a number of new features as well. So we've added new features, combined them with existing features, and created this summary within the toolkit.
So some of the features we're demonstrating are not necessarily new, but we have tried to combine them with others to identify where they actually fit into the broader scope of what we're trying to do. For example, validation rules: this is a concept that's been around for a long time, and it can help us with measures of both internal and external consistency. We try to describe that in more detail within the toolkit documentation. As I mentioned, for those of you familiar with the WHO data quality assurance guidelines, there are four different modules, and we're really focusing on the desk review of data. There is, for example, a part of the guidelines that looks at reviewing data in the field. There's also more information on community health information systems, for example, where we can follow some of those routine guidelines for desk review within the community health information system guidelines as well — and we have done that too. So we try to support the desk review side of things to the extent possible, and then we have the WHO annual reporting app, which we'll discuss as well. All right. So, for those of you who've been using DHIS2 for a little while, maybe you're familiar with the WHO data quality tool that was previously available in DHIS2. I'll just open it up real quick — this is what it looks like, for those of you who may have seen it before. Okay. I want to discuss this tool, and what the ramifications of our new data quality toolkit and approach to data quality are for this application, for those of you who are familiar with it. Okay, so we previously relied quite heavily on the application I just showed, with its different dashboards coinciding with the WHO data quality assurance guidelines. But there were a number of challenges with this app.
So we couldn't actually save any analyses or add these analyses to dashboards. On larger systems, the app would often hang and wouldn't really run the way it was intended. And if you were doing any specific analyses, it required some skill in navigating the interface to filter outputs correctly. Let me give some examples of what I mean. If I have a look at the dashboard within this application, I have to go to this separate application; I can't just have it on a DHIS2 dashboard that people see when they first log in. And I have different types of analyses here, and they are quite useful, but I'm not able to take any of these specific charts and place them on a DHIS2 dashboard. Some of the other issues we had: if I start modifying some of the options, I need to add specific filters to get where I need to, and I need to understand, to some extent, what these terms mean. What is a modified Z-score? What is an extreme outlier? This can be difficult to cascade down to various levels of the health system in terms of training and building up capacity overall — especially if this is a routine activity you want done every month, for example. So because of these challenges, we've tried to take a new approach in order to rectify them. Okay. The other issue I mentioned: on a larger system, say a country with tens of thousands of facilities, sometimes the tool would not actually run the way it was intended to. So we're trying to address some of these challenges with this particular tool. It's been useful for a very long time, and we've been quite happy with it, but we are trying to move on and improve upon what we already have.
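To make the "modified Z-score" term mentioned above more concrete, here is a minimal sketch in Python of how such a score is commonly computed for outlier screening. This is an illustration of the general statistical technique, not the actual implementation inside the tool, and the data and the 3.5 cutoff are illustrative assumptions.

```python
# Illustrative sketch: flagging outliers in a monthly series with the
# modified Z-score (a robust alternative to the standard Z-score).
import numpy as np

def modified_z_scores(values):
    """Modified Z-score: 0.6745 * (x - median) / MAD.

    Uses the median and the median absolute deviation (MAD), so a single
    extreme value does not distort the scores of the other values.
    """
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))  # median absolute deviation
    if mad == 0:
        return np.zeros_like(values)
    return 0.6745 * (values - median) / mad

# A hypothetical series of monthly doses with one suspicious spike:
doses = [110, 105, 118, 102, 111, 980, 108]
scores = modified_z_scores(doses)

# |score| > 3.5 is a commonly used cutoff for extreme outliers.
flagged = [v for v, s in zip(doses, scores) if abs(s) > 3.5]
print(flagged)  # [980]
```

The point is that a single data-entry error (an extra digit, say) stands out clearly, while ordinary month-to-month variation does not.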
So, at the time this tool was released, much of the data quality functionality we're going to demonstrate today was not available within DHIS2, but that has changed quite a bit over time. We're now almost at feature parity: the same features are available, for example, to place on a DHIS2 dashboard as were available in the application I was just walking through. Okay. So over the coming one to two years — sometime in 2025, I think — we're going to start to deprecate the WHO data quality app. We're not going to support it as much anymore, because we've moved a lot of those features over to the core DHIS2 application. But there are some functionality gaps, and a new application, the WHO annual report app, has been made to fill those gaps. Just real quickly, to give you an idea of the features across the two applications: the left column is the WHO data quality tool, the application I was just showing, and on the right side we have the core DHIS2 system. We can see there are just a handful of features not implemented, here at the bottom: consistency over time using scatter plots, and automatic generation of the data quality report. Now, consistency over time using scatter plots is something that will be available in the not-too-distant future — it can actually be done now, but it's a little difficult. And for automatic generation of the annual data quality report, we have made a separate application to deal with that gap. In particular, even though that functionality was there in the previous tool, on a large system it often would not run at all.
So we have rectified that as well, to make sure it can run on these larger systems. All right, so what I want to do now is go through some of the metrics in the toolkit, explain what they look like, and describe how we can use them to review data quality before we get into analyzing our data. As I mentioned, the whole focus of this toolkit is routine data quality: we want to look at our data as frequently as it's submitted. If it's submitted monthly, we want to look at it monthly. It's not just about having an annual data quality review — we have features to support that as well, because we know it's done in practice — but we also want to focus on routine data quality review. Okay, so within the data quality toolkit we have a number of different metrics: internal and external consistency via validation rules, the setting of min-max values, data set reporting form completeness and timeliness, and data element completeness. I'll demonstrate most of these. The ones with stars next to their names are measures or features we've implemented very recently, whereas the first three, without stars, have been around for some time and are how we've traditionally measured some aspects of data quality in DHIS2. We'll describe each of these measures, and also why we've added the additional functionality. Okay, so data element completeness — that's a new one — and organisation units, or facilities, consistently reporting; we'll describe all these measures if you're not familiar with the terms. Then there's consistency of related data, via scatter plots and dropout rates: that was quite heavily used in the WHO data quality tool, and we've since been able to support it within the core DHIS2 application.
We also have consistency over time — those are some of the charts you see within this tool; these types of charts, for example, were not supported previously in DHIS2. And we have outlier analyses, which allow you to look at outliers using different formulas; there are now a lot of different ways we can analyze outliers within DHIS2 when comparing values to see if they're consistent. All right, so let's start with some of the features we've had for some time. While you might be familiar with them, we're also able to use them in new and improved ways to get what we want out of the system. The first concept I'll discuss is validation rules. Validation rules can be used to measure both internal and external consistency. What do I mean by that? Internal consistency looks at data collected within your routine system and compares various values together. In the example below, I'm checking the internal consistency of data using data that already exists in the system: I'm comparing DPT3 doses given against an outlier threshold — a calculated threshold based on previous data in my system. You could also, for example, compare DPT3 doses given with stock availability within a given month, to make sure those values are consistent. For external consistency, I might want to compare against survey data — data collected using a different method and a different process — and compare those values together. So if I were to bring that survey data into my system and compare it, that would be an example of checking external consistency.
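At its core, a validation rule of the kind described above is just a left-side expression compared to a right-side expression with an operator. The sketch below illustrates that idea in Python; the function name, the operators supported, and the data are invented for illustration and are not the DHIS2 API.

```python
# Minimal sketch of what a validation rule expresses: compare a left-side
# value to a right-side value with a chosen operator.
def check_rule(left, operator, right):
    ops = {
        "<=": lambda a, b: a <= b,
        ">=": lambda a, b: a >= b,
        "==": lambda a, b: a == b,
    }
    return ops[operator](left, right)

# Internal consistency example from the text: doses given should not
# exceed the stock available in the same month.
doses_given = 88
doses_available = 84
passed = check_rule(doses_given, "<=", doses_available)
print(passed)  # False -> a validation rule violation to follow up on
```

In DHIS2 the two sides can themselves be expressions built from several data elements (for example, doses given plus wastage), but the pass/fail logic is the same comparison shown here.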
So validation rules support both of these, and the comparison against outlier thresholds is a relatively new capability within a feature that has existed for some time. The nice thing about validation rules is that they can be used both during data entry and in bulk analysis. So here I am in data entry — many of you might be familiar with this screen — and I'll just quickly run my validation rules. Here we see that some validation issues have popped up. In this case, most of these are comparisons of doses given with stock availability: there are more doses given than there is stock available in most of the validation rules appearing here. So, depending on the standard operating procedures you have, or the way you go about doing things in your system, you could have procedures that help people identify what these rules are telling them — and in particular, when you're doing data entry training, you may want those procedures available so people understand what they need to do to fix these issues. So let's look at one of these specific rules: MR doses given and wastage should be less than or equal to doses available. We have 88 as our total, and there are only 84 doses available, so somehow more doses have been given than were available. If I look at my MR1 doses here, I have a total of 88. So let's say I reduce this, and then run the validation again: that validation rule will no longer appear. So, depending on your procedures, you might want people to modify these values if they're incorrect, especially when the error is obvious — maybe they added too many zeros at the end of a value.
Or maybe they've made some other kind of mistake. Now, depending on people's capacity and skills, and the procedures you have, their ability to actually modify these values might be limited. Maybe they go back to their logbook, they go back to their registers, and they still come up with the same value, or there's some other issue they're not able to resolve. The data is still saved, and you can view these validation rule violations at different levels. So let's say data is being entered at the facility, and whoever is entering it is not able to fix the value — that may have to be handed up to a higher authority or someone else in the system. You can also run these rules in bulk, viewing everything together rather than one at a time. And another feature tied to this is that you can send notifications related to these violations, either as a manual check or on a schedule. So let's say I want to run this 15 days after the end of the month: I want to make sure all my data is submitted, and then have the system run this check on all the data I have for that month. And what I'm doing in this example is comparing against a threshold. In the first example, in data entry, we compared the data entered — doses given versus stock data. In this example, I'm comparing doses given against a threshold based on previous values — a statistically calculated threshold — and I can do this for as many values as I need. So I'll just run this real quick; I'll click on Validate. Okay. And we can see that a couple of violations appear.
I've run it just for the month of December within one particular province in this example, and we can see some violations here, where the number of doses given exceeds a statistically calculated threshold based on the previous data available for that particular facility. Okay. As an example here, the value is a little bit higher than the threshold. Now, this might be within range; if these values were more extreme, it would be more noteworthy and worth checking up on. Now, as I mentioned, in addition to running this as a bulk analysis, you can also receive these as notifications. This is the analysis I just ran: the same information that appeared on that outlier page — the same validation rule violations — is now in my email. So someone could check on those if they wanted to and determine whether something is incorrect. You can see here that it describes the outlier threshold in a bit more detail — in this case, the formula used to create the threshold — and then the actual value that was submitted, compared against it. And this can be run as a manual process, as I just did, or on a schedule that runs, say, 15 days after the end of each month and checks all the data submitted for the previous month. It can be run on different schedules; it's completely user-defined. So, another feature available at the point of data entry is min-max values. Minimum and maximum values basically allow you to define, for every data element or variable, within every organisation unit you have — every facility, or maybe at lower levels, community — and for every disaggregation (male, female, less than one, greater than one, etc.), the range of values that can be entered for any particular period.
Now, right now in DHIS2, the features to generate these minimum and maximum values are not very strong; they require some revision. To supplement that, as I mentioned in the first slides, we have a prototype script that helps generate minimum and maximum values based on a number of statistical formulas that are well accepted for generating this type of information. So at the moment, unfortunately, we do recommend that the min-max values are generated outside of DHIS2 and then brought into DHIS2 through an import process. But once they're in there, you can use them to effectively check your minimum and maximum values. Just like validation rules, these can be checked both at the point of entry and through bulk analysis. Now, I'll just show it quickly at the point of entry. This is the new data entry app, the one we've been testing out — if you're not familiar with it, I thought it would be interesting to show both the current one and the new version. What I'll do here is enter a value. Okay, I entered a value, and you can see that if I hover over this box, it has turned red, telling me that the value is not acceptable: the number cannot be less than zero or more than 100. I also have a notification here: one invalid value, value not saved. So this value does not actually get saved in the system, because it's invalid based on the min-max limit. Now, I can check these min-max limits for each of my data elements. Here I'll just click this View Details button, and it opens a window showing that the min-max range is zero to 100. That's why it has highlighted this particular value for this user as unacceptable.
So I'll just close this and change the value — it probably should be much less; maybe an extra zero was entered, for example. When I correct the value, it's highlighted in green, it's now saved, and no more warnings appear. So this is another way to identify, at the point of data entry, values that may not be acceptable based on the min-max limits set in your system. And we describe a bit more how you would import these: there's a link in the presentation to the script, which describes how it can be used, as well as the statistical methods we use to generate the min-max values. We are looking at how we can supplement the features in DHIS2 so you don't have to do this outside of DHIS2, but for the moment we do recommend that you use some other software or tool to generate the min-max values outside the system and bring them in. We are hoping to change that in the near future. All right. Okay, so let's have a look at some of the classic measures we've used for some time for measuring data quality in DHIS2. One of the more common ways people have measured data quality is data set completeness and data set timeliness. While those are still useful, we're going to discuss some measures that also assist with measuring completeness and timeliness beyond data set reporting. All right. So let's just define these real quick, and then we'll describe some examples. Data set completeness is defined as received reports divided by expected reports, times 100%. Data set timeliness is received reports on time divided by expected reports, times 100%. So let's describe how these are calculated in DHIS2 — where you actually get these values from. All right, I'll just go back to my data entry page.
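To illustrate the kind of statistical generation the min-max script performs, here is one possible approach in Python, using an invented mean ± 3 standard deviations rule over a facility's historical values. The actual prototype script supports its own set of methods; this sketch, including the function name and the sample data, is an assumption for illustration only.

```python
# Sketch: derive min-max bounds for a data element at one facility from
# its own reporting history, so the bounds reflect local volumes.
import statistics

def min_max_from_history(history, k=3.0):
    """Return (low, high) bounds as mean +/- k standard deviations."""
    mean = statistics.mean(history)
    sd = statistics.pstdev(history)          # population SD of the history
    low = max(0, round(mean - k * sd))       # counts cannot be negative
    high = round(mean + k * sd)
    return low, high

# Hypothetical monthly doses reported by one facility over 8 months:
history = [42, 38, 45, 40, 44, 39, 41, 43]
low, high = min_max_from_history(history)
print(low, high)  # 35 48
```

Bounds like these would then be imported into DHIS2 per data element, organisation unit, and disaggregation, and the data entry app enforces them exactly as shown in the demo.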
So when you're in data entry, whether in the new app or the current application, data set completeness is defined based on complete registrations. When the user clicks this Complete button, the data set is considered complete for that particular period — in this case monthly, for the immunization data set I'm demonstrating. Then, if I go back to my dashboard, here we have a representation of that reporting rate; I'll just make it full screen. Okay. This is a very traditional way of identifying completeness of data. It's a monthly data set, so I'm checking the monthly reporting rate: for this particular month that I've highlighted, 79% of my facilities have reported on this data set, so I'm still missing about 20% of my reports. That means the entire data set has not been submitted for those facilities, and it's calculated based on that Complete button within the data entry screen. Okay. We also have measures of timeliness. This is a user-defined parameter. Let me quickly open this up in DHIS2: when you're configuring a data set — I don't really want to get into too much of the configuration here — you have this field, which allows you to define how many days after a particular period a submission still qualifies as timely. So if I say 15 days, and this data set is captured monthly, that means 15 days into the next month. If people do not click that Complete button within 15 days, then it is not a timely submission. If they do — say they submit on the 10th of the next month — then it contributes to my timeliness measure. Okay. So this is a completely user-defined parameter for timeliness.
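The two formulas above reduce to simple ratios, which can be sketched directly; the counts below mirror the 79% example from the dashboard, with an invented on-time count for the timeliness side.

```python
# Data set completeness and timeliness, as defined in the WHO data
# quality assurance guidelines: both are ratios against expected reports.
def dataset_completeness(received_reports, expected_reports):
    return 100.0 * received_reports / expected_reports

def dataset_timeliness(received_on_time, expected_reports):
    return 100.0 * received_on_time / expected_reports

# 79 of 100 expected facilities clicked "Complete" for the month;
# suppose (hypothetically) 60 of them did so within the 15-day window.
print(dataset_completeness(79, 100))  # 79.0
print(dataset_timeliness(60, 100))    # 60.0
```

Note that timeliness is always less than or equal to completeness, since an on-time report must also have been received.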
Whereas completeness is based on an actual button in the DHIS2 data entry screen, timeliness is entirely based on the routines in your own setting. Okay. All right. So those are very traditional ways of measuring the completeness of data in DHIS2. But we've since identified some challenges — and not just us within the DHIS2 community, I should say; the broader community of practice looking at health data and health measures has identified challenges with relying on this type of measure. Okay. So there are a couple of new variables and features we've added, with input from colleagues in the broader community: data element completeness, which looks at the individual values within a data set, not the entire data set itself, and organisation unit completeness — or facilities consistently reporting, as it's more often referred to. Okay. So data element completeness is the number of received values divided by the number of expected values, for a particular variable. In the screenshot here I have, for example, IPV, DPT1, DPT3 — whatever vaccine you're giving, or whatever other variable you're looking at — and it looks, on a basis of your choosing (not just monthly), at how many times data has been entered for that particular variable. Are there any blanks? Is there data not submitted? It checks and is able to give you a representation of this. And oftentimes what you will see is that data element completeness can be quite a bit lower than data set completeness — and this is why using data set completeness as a proxy can be challenging in some cases. We also have facilities consistently reporting, and what I want to do now is describe some of these measures.
So let's look at the dashboard and I'll just scroll down. Here we have a number of measures on data element completeness. These are percentages, based on the number of values submitted within a month divided by the expected number of reports. I'm just going to open up a pivot table here. We see values, but we don't need to focus so much on the values themselves; what we're looking for is blank spots in the table. So as an example, in this first row for this particular facility, we're looking at DPT1 doses given at the facility level, and I see a blank spot. So for this year, I should say, not for this month, your data element completeness would be 11 out of 12, which is 92 percent or so. When we're taking the overall aggregation of this, let's say we looked at a particular month, say August: I'd have all these facilities that didn't report on this value for August. What it would do is take the number of facilities in this particular table that have not reported any data, and the total number of facilities expected to report data, and from that I can get my data element completeness. And you can see that we can also investigate which facilities reported and which ones didn't. Now the key with this is to identify values or measures where facilities are expected to report every month. If you're using variables where reporting is sparse just due to the nature of that particular variable, then this is not a good measure to use, because completeness may never reach 100 percent; that's simply not the expectation within the particular setting that you're working with.
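The month-level aggregation just described, the share of facilities with a value present for a given month, could be sketched like this (again an illustrative Python example with hypothetical facility names, not the actual DHIS2 calculation):

```python
def monthly_element_completeness(reports: dict, month: str) -> float:
    """Share of facilities that submitted a value for `month`.
    `reports` maps facility name -> {month: value or None}."""
    expected = len(reports)
    received = sum(1 for fac in reports.values() if fac.get(month) is not None)
    return 100.0 * received / expected

reports = {
    "Facility A": {"Aug": 40, "Sep": 38},
    "Facility B": {"Aug": None, "Sep": 35},  # no DPT1 value reported in August
    "Facility C": {"Aug": 52, "Sep": 49},
    "Facility D": {"Aug": 31, "Sep": None},
}
august_completeness = monthly_element_completeness(reports, "Aug")  # 3 of 4 -> 75.0
```

A list of the facilities where the value is `None` is exactly the follow-up list described in the next part of the talk.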
So you do have to be a bit careful when setting this up so it's not misinterpreted. You want to make sure that you can follow up on the particular issues you're identifying. In this case, if I'm expecting every facility to report on DPT1 doses given, and I know from my dashboard that 30 percent of facilities for December have not submitted any value, at least I can create a list, identify which facilities are missing that value, and subsequently follow up with those facilities to find out what happened and why they didn't submit any data. And of course, if I'm interpreting any value, for example a coverage rate, and I'm missing 30 percent of values, that might alter the way I interpret that particular indicator. So we also have to be careful when we're drawing conclusions from our information, and that's why it's often useful to do this type of exercise before actually analysing the data: I want to make sure I can make proper comparisons and conclusions about my program's performance based on the data that I do have. Another measure we discuss is facilities consistently reporting. This measure identifies, within let's say a 12-month period, facilities that reported on a particular variable every month. We see the percentage down here, and this chart shows trends in the number of facilities reporting for the last 12 months based on the current month. So for example, the July measure here covers July 2022 until June 2023 for this particular percentage, and then we keep progressing through the months. So let's describe this as well real quick. Let's have a look at some of the rows with no blanks. Basically, if I look at this row here and this row here, there are no blanks, no months where there is no data reported.
So these two facilities, as an example, would be facilities that are consistently reporting for 2023. If we look at some of these other facilities, this row here has a blank, and this row here has a couple of blanks. These facilities are not consistently reporting this particular value. So these are just different ways of looking at completeness at an individual level, in order to follow up and identify where you are not receiving data from. And of course, as I mentioned, the idea is not just to look at this in a bubble and identify which facilities are not reporting, but also to consider, when making conclusions about your program's data, that if you're missing a lot of data, maybe the values you're reviewing are not correct. So these are two new types of measures that are supported within DHIS2 and can really help with managing your routine data, in particular comparing completeness measures to identify whether your dataset reporting rate is a valuable and useful proxy, or whether it's inconsistent with the data element or variable completeness. And if there's a big gap between them, it might be time to reconsider how you review those completeness measures.
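The facilities-consistently-reporting measure described above can be sketched as follows (an illustrative Python snippet with made-up facility data; the real measure is computed inside DHIS2 over a rolling 12-month window):

```python
def consistently_reporting(facility_values: dict) -> float:
    """Percentage of facilities that submitted a value for every month
    in the 12-month window. None means no report for that month."""
    consistent = sum(
        1 for months in facility_values.values()
        if all(v is not None for v in months)
    )
    return 100.0 * consistent / len(facility_values)

facilities = {
    "Facility A": [10, 12, 9, 11, 10, 13, 12, 10, 9, 11, 12, 10],    # no blanks
    "Facility B": [10, None, 9, 11, 10, 13, 12, 10, 9, 11, None, 10], # has blanks
}
share = consistently_reporting(facilities)  # only Facility A qualifies -> 50.0
```

Rows with any blank drop out of the numerator entirely, which is why this measure is stricter than the plain data element completeness percentage.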
Consistency of related data is another way we can look at data quality. This was something within the WHO data quality app, and it appears in a lot of the WHO data quality assurance guidelines as well. What we're doing here is using various types of tools to identify the consistency of related data, and in particular we have two types of charts we use for this. The one displayed in the screenshot is a scatterplot. We also use, for example, dropout rates in immunization; it's very common to look at this consistency of related data, and often what we're looking for is a predictable pattern. There's a relationship between two variables, let's say DPT1 and DPT3 doses given. It doesn't mean they'll necessarily be equal, but there's an expected pattern between the variables we're investigating, based on some statistical formula. So let's go back to our dashboard and have a look at some of these values again, and what I'll focus on real quick is dropout rates. So for example, here I have a dropout rate of OPV1 to OPV3 over the last 12 months, and I'm looking at provinces within the country. Now you'll see here this negative dropout rate. This is invalid; it's not really possible if your data is correct. So in this particular province I would want to investigate the relationship between OPV1 and OPV3, because the values are not correct on one side of the equation. You see these other dropout rates; maybe this one's a little high as well, but it could be valid. The negative one, though, is a data quality issue that needs to be checked and verified, because it is not technically possible in this example. So when you have a relationship between two variables and you understand the expected outcome, it makes it quite easy to validate the information and verify the values. So in this particular case we know we would have to check OPV1 and OPV3 in this particular facility to identify whether or
not those values are correct. We also use scatter plots to do this; I'll open that up. Similarly, we can see a number of different values appearing here. This is looking at the relationship between OPV1 and OPV3 again, just in a different format. The dots represent all of the facilities in this particular country; the red dots are ones that may require investigation; and the middle line represents the expected relationship based on different statistical parameters. The further the dots are from the line, the more likely there is a potential issue with the data. So where you want to investigate are the red dots that are far from the middle in particular. There are a lot of different ways to interpret this, and I won't go over all of it right now, but just know that you can make these types of scatter plots within DHIS2 and place them all on the dashboard. So as I'm scrolling through this example dashboard, hopefully you've noticed that everything I'm demonstrating can be placed on the dashboard, whereas previously you could not do that, and that was one of the challenges of using the WHO data quality tool. We also have consistency over time. This was quite commonly shown, if I just go back, in these year-over-year charts that you see in the WHO data quality tool; here you see the scatter plot, for example, that I just talked about a bit. Previously you could not add these to the dashboard either, but now you are able to do so. All right, so as to what these are looking at, I'll just scroll down on my dashboard and open this one up. What I'm showing is, for one value, a comparison of five years of data on a month-by-month basis. These different lines each represent one particular year of data; on my x-axis at the bottom I have my months, the y-axis is my values, and I'm seeing the comparison of them stacked over time. Now essentially what you're expecting with this type of chart is that the
values follow a broadly consistent pattern, and you're looking for any line that clearly deviates from the others. You can see here that many of the values are closely linked. Maybe there are some issues here, because you're seeing the value actually decrease over time (this is the number of MR1 doses given), so maybe that's worth noting, but you can also see there's an obvious outlier. One thing that's nice about this particular chart is that, in terms of training, it's very simple for people to review on a routine basis. You can tell them: if there are any significant increases or decreases in the data, please identify those, follow up on them, or flag them to us so we can follow up. It's quite easy to teach people how to interpret these, develop procedures to deal with these values, and follow up to find out what's going on. The cause might not be immediately apparent within this type of chart, but here there has probably been some type of data entry error somewhere. You can make a pivot table for 2019, check your facilities, follow up, and see which value is incorrect, as an example, in order to rectify this quite easily. So this type of thing should be handled on a routine basis, and it's quite nice to have it on the dashboard so people can review it immediately and take subsequent action. Outliers are another value we focus on quite a bit. There are different statistical formulas you can use to calculate outliers. A common one might be looking at the mean plus three standard deviations for the last, let's say, 11 or 12 months, and comparing the value to its previous values. But there are a lot of different ways you can calculate these outliers, and we now have a lot of functionality available that allows you to compare the value that's been entered with some type of
statistically generated or calculated outlier threshold based on all the previous data you already have in the system. And this is all done within DHIS2, so you don't need to export the data, do something in some other tool, and then bring it back in; you can make these comparisons and generate all the outliers directly within DHIS2 itself. So I'll scroll down here. At the bottom of the dashboard we have quite a few different variables on outliers. Here we have some information on data excluding outliers, and I'll explain in a moment what that means. What we're trying to do is identify the weight of the outlier and the effect it has on a national total. So let me just open up an example here. What I have here is two values. This middle box is BCG doses given; that's what's reported in my system. Then I have a separate value which strips the potential outliers out of my value, and the percentage is just the one as our numerator and the other as our denominator, times 100, which gives us the percentage represented by the outliers. So if I have a look at this, and let me just add some totals here first: you can imagine, if I have outliers, that in this bottom row for December roughly 3,000 of the values reported is a potential outlier. This could be inflating some of my coverage rates, as an example. So here the value reported is 126,000, and here the value, if I strip out all the outliers for the year, is 109,000. So there's a difference of roughly 16,000 to 17,000 doses given, which comprises something like 13% of all values. This is quite significant: if I'm making conclusions about my data and it potentially includes a bunch of outliers, then maybe the conclusions I'm making are not correct, so I have to be very careful here. This is just an example to show you the feature; it could be much less in a real system, or it could be much more as well if there
are significant issues with some of the outliers based on the statistical calculations you use. But the idea we're trying to convey is that the weight given to certain data can really affect how you interpret that information. In this particular case, if I were to do any calculations looking at BCG coverage, knowing that maybe 13% of my values are outliers, maybe I want to fix that first, because it could be having a significant effect on my overall coverage rate. Maybe when I review the data it surpasses my target, but if I strip out those outliers I'm now below my target for that particular year. So there can be a lot of implications here. When you're checking and reviewing your data, it's often useful to add in some type of outlier analysis to identify whether your data is internally consistent with the other data in your system, or whether it's creating issues: are there data entry errors or problems? These are the types of things that should be sorted out before you make any conclusions about your program's performance. So those are just a couple of the new measures that have been implemented to support the review of data quality on a routine basis. Now, as I mentioned, everything we've talked about, as distinct from validation rules, which have to be run separately, can all be placed on this dashboard. So when people log in, let's say the standard operating procedure is to review this on a monthly basis, because the data is submitted monthly. The idea is not to wait until the end of the year to perform one large data quality exercise, but to perform these routine checks; then you can review your data, discuss it, and make conclusions about your program's performance, the idea being that you'd be confident in those conclusions before you make them. So if you're evaluating and having some meetings, let's say on a
quarterly basis or something like that, you'd want to make sure these checks are performed first. So we make it part of the dashboard: people log in, people can access it, and they're trained on what the values represent. We do have different types of dashboards, as I mentioned. This particular dashboard comprises immunization data, but as an example I've created a variation of it for the facility level, versus the national-level one I was showing, which focuses on some different measures. We have some particular analyses that would be more useful at lower levels of the system, where we're comparing, for example, individual doses given against their statistical threshold, so you can see where they surpass it; in this case there are not too many issues, but we can see some potential problems. And then we also have a generic dashboard which comprises the core elements and core indicators within the WHO data quality assurance toolkit. So the idea is that the dashboards and the toolkit we've made can be applied in many settings and many programs. While we've mostly focused on immunization data in the course of this webinar, we can apply this more broadly to other programs as well. Of course, each program will have some variations in the data it collects and analyses as a routine part of this process. As an example, in this immunization dashboard I've added coverages that are over 100%, where either the numerator or the denominator might need to be checked. So for each program you would also want to make some changes to fit your needs. All right. Real quickly, I'll talk about the WHO data quality annual report. This has been made to generate data on an annual basis, when you are performing that big annual review. We have these four measures that the tool covers, and these are from the WHO data quality assurance guidelines:
completeness and timeliness, internal consistency, external consistency, and consistency of population estimates. Now, when I was going through those metrics just now and discussing how you can review them in DHIS2, we really focused on the first two boxes: for routine data quality review you're mostly looking at completeness and timeliness and at internal consistency. However, external consistency, and reviewing the consistency of your denominators, is something that is perhaps more applicable on an annual basis, and should be done especially when you're making large conclusions about your program. External consistency, as I briefly mentioned, is where you're taking your data and comparing it with an external data source. Let's say you're taking data from your routine HMIS, your routine DHIS2 system, and comparing it with a survey: MICS, DHS, or other large surveys that you conduct. Obviously this can't be done routinely, because those surveys are only done maybe every five to ten years, so when those surveys are available you would want to make sure you're performing some type of external consistency check. The same goes for your population estimates; these are your denominators. If your denominators are not correct, this can really affect a lot of issues around performance in immunization as well as other programs, and you might want to compare, for example, your national census estimates with UN projections or other population data sources to determine whether there's a large variance between the different values available. So, real quick, this is the data quality annual report application; I'll just reload it here. This is a new application that has been developed. In the WHO data quality tool, the previous tool, we also had a version of this, but there were some major challenges. What I mean is, here you have this annual
report, but basically on large systems it did not run. So we tried to rectify this by replacing it with this app. It doesn't have all the dashboards and everything else; we've discussed why we've moved away from that approach, but generation of the annual report was something we still wanted to support. So I'll just run the report quickly here: you select a couple of inputs after this is configured, you generate your report, and it highlights all four of the domains I talked about, completeness and timeliness, internal consistency, external consistency, and consistency of population estimates. It gives us summaries at the beginning, and I'll just describe this real quick. Here's our internal consistency; we have a number of different measures that we've discussed, and you can see these scatter plots, some of which are very similar to what I demonstrated on the dashboard. For each of these four measures it gives us a bit of a summary. Here we see consistency of related data, or related indicators, and for example we see dropout rate charts for Penta 1 to Penta 3, which is what we described earlier. The nice thing about this tool is that for each of these items you can add in text interpretations, and you can save the report, print it out, and share it. It gives you the summary, and it also adds in the two domains that we don't necessarily review on a routine basis. External data comparisons, as an example: this is comparing coverage from a routine data source with a DHS, a demographic and health survey. We see coverage here, and you can see differences; these values are just maybe not so accurate, so there are significant problems in this comparison, but you would want to be able to highlight those whenever the surveys or the data are available for you to use. And then we have this last part, which is consistency of population data, which is
comparing current data within your routine systems with data collected via a different mechanism or method. You can add in text comments, as I said, and you can print this out, save it as a PDF, and share it on an annual basis, for example by email, however you want to use it. So this annual report is meant to help support the annual review of data. It adds in two domains that we don't necessarily see on the example dashboard we've made, and that's because those comparisons are not traditionally done on, for example, a monthly basis; you won't necessarily be able to compare survey data with your routine data routinely. So we made sure to support this as well through this new application, and it will be released on the DHIS2 App Hub shortly; it's not available yet, but it will be available soon. I see that I'm actually at my time limit, so real quick, what I'm going to do is go to this last part here and discuss some of the resources that are available. I'm just going to open them up so you can understand what they're for. We have the documentation; there's a lot of information here if you are thinking about implementing this in practice. We have some checklists that discuss how you might think about tackling this process. It starts from just reviewing the features, introducing them to the ministries you're working with and conveying that information to them, describing the requirements you might want to gather, the things you can do together sitting in their office versus things you might be able to do remotely or in your own office without their input so much, and it goes through the different activities that we think might be useful to help you actually get this working in practice. We also have an SOP, real briefly. This describes roles at different levels, national level, regional or provincial level, district level, etc., and what
they're responsible for. The idea is that you add in your own procedures. Maybe you already have procedures, but if you're adding in these specific data quality features, they're not meant to just sit there on a nice dashboard you've made; what matters is how you act upon the problems. If you identify an outlier, what do you do? How do you change that value? What is the process for doing that, whose responsibility is it, how are these values reported, and so on: the day-to-day workflow of implementing these data quality features within your own setting. It's less about where you click in DHIS2 or how you configure things, and much more about modifying the workflow in your daily work practices. It's just meant as a template; maybe you have something already, and it's not meant to replace that by any means, but perhaps you could review it, and if there's anything you could adopt, it could be a potentially useful resource. Tied to this resource, we also have a checklist at the sub-national level. This is to be used, for example, after trainings, when you've done data quality trainings, to observe people's behaviour and see whether they're doing the checks you proposed within your standard operating procedures on a routine basis. Are they checking the values you've told them to? Are they investigating things based on the procedure you've identified? So it's really meant to measure the effectiveness of your training and whether people are able to adhere to the procedures you've implemented. All right. So those resources are meant to help more on the implementation side of things, where you're actually thinking through what the different features are, how you implement them in practice, what you need to do at different levels of your system, how you deal with data quality issues, and what the procedure is to actually modify values in the system
when you find those values are incorrect. So that's meant to walk through the process a little bit more. We don't have the training material yet; there's a national checklist that will be tied to it a little more closely when that's available. Here's the demo link where you can access the data quality dashboards. This is publicly available; you're able to log in, have a look at the dashboards, see how things are set up, and review how they're configured. The documentation, of course, has much more detail on how each of these measures has been configured and walks through that in significant detail. So if you're interested, have a look at the dashboard and play around, and then also consult the documentation, because there's a lot more detail there on the configuration. Okay, and for now, if there are any questions or people want to stay around a bit, maybe we can do that, but once again, as I mentioned, you can add your questions to the CoP. I see the chat has been quite active, so I'll have a look at that as well, but you can add your questions to the community of practice and we'll make sure to respond to those as well. So with that, I'll close there, and thank you very much for your time today. Thank you so much, Shereji. As we posted in the chat, some questions have already received answers, and there are some in particular that are especially relevant and would be great to share with the wider public. I've noted them down and I'm going to post those questions in the community of practice, so maybe you can follow them up directly there, because it's great that they remain in writing; they are valid questions and I think they can contribute for the general public. So yes, the majority have been answered, I'm going to post the missing ones in the community of practice, and you can continue the conversation there. Maybe we
can also refer you to our HISP network if you'd like to receive more information, and of course we're always available to be reached in the community of practice. Shereji will also keep an eye out to answer if there are more questions. And please don't forget that on Thursday there is also the French session of this webinar, so if you haven't already signed up, or if you want to publicise it among your network and colleagues who maybe are not that confident in English and would rather follow it in French, it's coming up pretty soon. So that's everything from my side, and thank you so much to everyone who has joined us today.