Okay, everyone. I think I'll get started, because I understand you have another excursion at 5:30, meeting in the lobby to go on a city tour. So I'll try to end a little early and give you a bit more time to get ready. In this session we're talking about data quality. Keep in mind, if you want to learn a bit more about climate, that's happening in the Regency Room, so you can attend there as well.

All right. So for data quality, we have just recently released a new toolkit, and I'm going to try to talk about some of these things. I might not be able to mention everything in an hour, admittedly, but I'll try to cover the main points and demonstrate some of them for you as well. One thing that's important to note: many of you who've used DHIS2 for data quality before might be familiar with the WHO Data Quality Tool. There are some changes there that I want to mention, just so we're transparent from the beginning, but I will also go over the toolkit itself, everything it contains, and all the different aspects. There are a number of different concepts I'd like to cover, and we'll see how far we get. We might not cover all the features; there's a lot there, and we are looking at other ways to share this information and build more capacity to manage data quality within different systems.

All right, so some general comments on our approach and how we have designed this data quality toolkit. We've heard this word toolkit a couple of times now. The toolkit for data quality is built a little differently, and I'll talk about that in a moment. But generally speaking, we base it on comprehensive guidance provided by WHO. They have a set of data quality assurance guidelines; this presentation is on Google Drive, and I've linked to those guidelines in it. There are four manuals in that WHO data quality assurance toolkit, including a new one for community health that they released earlier this year, in March, I believe. So it's quite a comprehensive guideline: it encompasses desk review, routine data quality review, and a number of measures, indicators, concepts, and principles used for measuring public health data quality. It's a very good resource, and it's generic; it's not DHIS2 specific by any means. There's no mention of DHIS2 in those guidelines, but it's a good foundation when looking at health data and measuring its quality.

So previously we made the WHO Data Quality Tool, or WHO Data Quality App, and used it within DHIS2, and some of you may be familiar with it. But there are several challenges we've had with this tool, and we've seen them in many places. For example, we could not put any of the outputs from that tool onto a dashboard. There were also performance problems on large systems: if your system was very big and you were trying to analyze large amounts of data, the app could crash or just not work as intended, and we've seen this in several systems. Also, because you couldn't save the various outputs in the WHO Data Quality Tool, a person had to know how to navigate the tool in order to replicate what people were seeing for certain aspects of data quality.
Or if you wanted to perform a routine review of data quality, similarly, a person had to add filters and navigate to the right spot in order to perform that routine analysis. So we started thinking about how to automate this a little more: how to make it so people could log in and see the errors or issues right away, with someone perhaps setting things up for them, and with those outputs being saved so they could simply mark the issues and follow up on the specific values flagged for follow-up.

At the time the WHO Data Quality Tool was released, a lot of the functionality we're discussing wasn't really available within DHIS2, but that has changed quite a bit, and that's what we're going to discuss today: how we can transition some of these things over so we can create much more automated analysis for the purposes of measuring data quality. We are now slowly phasing out the WHO Data Quality Tool. What we mean by that is it won't be updated anymore, because of the challenges I mentioned before. We have some complementary tools that we've developed to make sure this transition is smooth, and we also have a new application being developed specifically for this. The old tool had a feature for generating an annual report, and this did not run well on large systems at all. So the new app is being built from the ground up, with a lot of performance work under the hood, to make sure it doesn't crash or fail on systems that are relatively large in size, which a lot of systems are these days. So there was mixed success with the implementation of that tool over the last couple of years, but we still base everything on those WHO guidelines.

Here's a quick comparison of where we are right now: the functionality we built inside the WHO Data Quality Tool versus what's in the DHIS2 core. We can see that for the most part we're almost there; we've almost got everything inside DHIS2 at the moment, where you no longer need the WHO Data Quality App to do the various data quality analyses. There's a half check mark here for the completeness and timeliness of data elements: it's supported in the annual report, but not yet on a DHIS2 dashboard, which is why I said half. But the point here is that we have some differences between the WHO Data Quality Tool and what DHIS2 can do, and neither of these tools on its own covers everything at the moment, which is why we're still supporting that tool for now. The idea is that all these features do get put into the core. Then you can make dashboards, automate all the analyses, and create various outputs for people, and they can just log in, review them, and mark the issues, without necessarily having specific knowledge of creating data quality outputs. So we are transitioning a lot of the functionality over to DHIS2, and we can see in this comparison that a lot of this is almost done. We just have a couple of small things to add: consistency over time using scatter plots is not there yet, and for the annual data quality report we are building a new app. That will not be a core part of DHIS2; it will still be an application.
And if we have time, I can maybe show it as well, so you can see what the future looks like a little bit.

Okay, so we've talked about toolkits; Yuri talked about these in a previous session, and if you attended that session, he would have given you a breakdown of what a toolkit typically is. For data quality it's a little different. It's not a metadata package, and it's not a set of guidance focusing on a particular disease. The data quality toolkit comprises many different components. We have new documentation, which I'll show, and we have a demo instance set up with all the features I just described, so you can see how they're configured and how they can be utilized, and think about how you can apply this to your own systems. That's also true for all the other toolkits, by the way; there is a demo instance available. We also have some new tools that we're developing: there's a new app for data quality review, for generating that annual report I spoke about, and we'll see if we can show it depending on time. There's also a script for generating minimum and maximum values; I'm not going to get into all the details, but I'll cover what min-max values are. For data quality specifically, we also have a number of SOPs and checklists, which help more on the implementation side. We have a draft template for an SOP that you could take and implement in your own setting. We also have a checklist for the subnational level, a district or lower, for example, covering what should be done there on a routine basis. All of our toolkits focus on what should be done routinely, rather than analyzing data quality once a year. We're really looking at once a month or once a week, depending on the frequency of your data, and we want to implement procedures to check on data more routinely, not just as a big exercise at the end of the year. And then sometime early next year, I'm still working on the schedule, we will have a new data quality academy with a lot of new training guidance on implementing and configuring these features. I'll cover some of them now, but this is really a crash course; the academy will of course be a lot longer and give you more time to actually log into DHIS2, do exercises, and configure everything we're going to show today. So this is our path forward, and there are a number of links here on the screen, which are in the presentation as well, so you are able to access this.

Okay, so first, I'll quickly start with the documentation we've created. It's separated into four sections. We have general principles for data quality, largely based on the WHO guidance I mentioned earlier, the data quality assurance guidelines, to call them by their correct name. I'll just increase the size a bit so you can see. So we have the principles themselves. We have data entry, which focuses on data quality features you can implement for the people entering data. Then we have a section on analysis, which covers all the different analyses you can perform on the data after it's been entered.
And this is the largest section; there are a lot of different analyses that can be performed. Then we have a section on implementation guidance. This is around the SOPs, some extra things that might need to be configured to support the procedures, the checklist, and guidance on actually making this work from a systems perspective. So it's not focused so much on the DHIS2 configuration, but on the routines and procedures you need to put in place to get this working in practice. These were just released, I think we finished them just last month, so they're very new, very fresh. If you want to learn a bit more, I do suggest you have a look at them in more detail, as they give you a lot more information than I'm able to provide in the next hour.

Similarly, I mentioned a demo instance. In the presentation there is a link to this instance, and I'm going to use it to demonstrate some of these features as we go through the session today. Here's the data quality dashboard. We have two dashboards, actually, that we've configured: one for the national level and one for lower levels of the health system, for example facility or district, because there are different measures we focus on at each level. We have some things you may be familiar with, like reporting rates for data sets. We also have some new measures that I'm going to discuss in this session: measures for data element completeness, which looks at the completeness of individual variables rather than entire data sets, and facilities that are consistently reporting. I'll explain these measures as we go. There are a number of new measures we've introduced as part of this toolkit, and we've worked with partners like WHO and the Global Fund to derive them.

Now, the good thing about this, of course, is that I have a dashboard; I'm not logged into a separate application. People log into my system, there are a number of data quality issues they can review, and this can be reused across many levels of your health system. This allows people to log in without any specific knowledge of setting up data quality measures or features and just review the data quality on a routine basis, rather than having to go to a different application, add filters, and configure it the way they want in order to get the output they need. That's the whole rationale behind revising our procedures: it allows people to log in and review this routinely, hopefully however often the data is collected. Here we have some other measures: data element completeness again, which I'll talk about in a moment, and facilities consistently reporting, and I'll discuss how these are derived. We have some different charts here that you might recognize from your own work: scatter plots, where we're viewing consistency of related data. In this example we have two variables that are related, and we're examining the difference between them and whether they fall within an expected pattern; I'll explain this more when we get to it. We also have dropout rates; many of you might be familiar with dropout rate calculations.
They're very common in immunization, for example, but can be used for other purposes. We also have various consistency-over-time charts, where we plot values over a set number of years in order to evaluate their consistency over different periods of time. This is all measuring internal consistency. Then we have some new measures here of values excluding outliers, and I'll discuss what that means as well. So we have three or four new data quality measures that we've implemented with the help of our partners, looking at data in a different way than we're used to, but we also have a lot of the classic measures in place.

I have the demo link in the presentation; maybe not in the best place, but it's on the last slide. You can log into the demo while I'm doing this presentation if you'd like and have a look at that dashboard. We have two dashboards: if you just type in data quality, you'll see one called data quality core, that's the national-level dashboard, let's say, and then one for the facility level. There are some different measures on the facility one. For example, we're also examining the reported value versus what we derive as a threshold, a statistically sound threshold, for example the mean plus three standard deviations, or some other calculated threshold for the data. The facility dashboard has some additional measures that aren't on the national dashboard because they aren't suitable for measuring at a national level. So feel free to log into the demo and review this as I go through the demonstration; I'll be using these dashboards to demonstrate, discuss, and explain some of these features. And once again, if you open the presentation, it's the very last slide; you can open the demo from there, and the login details are on the screen.

Okay, so the first things I'm going to focus on are features for data entry. We want to be as proactive as possible in order to mitigate the number of data entry errors we are seeing. You might be familiar with some of these features already; admittedly, there's nothing too complicated here, but the whole idea is that we try to reduce the number of errors at the point of data capture.

The first concept I'll talk about is validation rules, which many of you might be familiar with. Validation rules consist of a left side, a right side, and an operator. This is a measure of internal consistency, measuring data within our system. They can also be used to measure external consistency: if we bring in data from another system, for example survey data, then you can also create measures of external consistency using these validation rules. So validation rules express what should be true, and if it's not the case, you detect a violation. You are able to view these directly in data entry, where our focus is right now, but you can also have them run automatically in a batch. So let's say you want to view a number of organization units together and see if there are problems with their data quality; you can run these as a batch operation and then view the results.
Maybe that's more applicable at a provincial or national level, for example. You can also run them manually in bulk. So you can either view them per individual data set, or have them run in bulk on a schedule or through a manual procedure.

Here are some examples of what it looks like, so I'm just going to switch over. This is our regular data entry screen, and many of you are familiar with this, I'm sure. I'm just going to run a validation here. This is an example of validation rules at the point of data capture, and you can see there are a number of issues with my data. Now, whether the person reviewing this can act on it depends on the training you've given them: some people might be able to review these values and change them, some may not. That's why we also have operations that feed upwards, so you can look at these in bulk and not just one at a time. But if the person is able to review these rules and change the values accordingly, maybe it was just a mistake when they entered the information, then this can result in significant savings in time when you're reviewing your data and your data quality.

As an example here, we have a couple of rules that have been violated. Cases treated should be less than or equal to the number of suspected malaria cases; we have this one here. So we could change one of these values, either the left side or the right side, in order to correct this. The person would, of course, have to understand the rule in order to do so, but it can be done. So, for example, here I removed some of the values for cases treated because there were no suspected cases reported, and you can see the number of rules showing now is much lower, because we've fixed that error. So this can be part of your routine procedures if you set all these rules up. These are a local customization: you configure them per instance, per setting, so you want to make sure you have local rules that make sense and follow sound internal validation checks. There are guidelines for setting these up; WHO's facility analysis guidance includes, for a number of health programs, recommendations on what makes sense for reviewing internal consistency.

And just quickly, this is the new data entry application, which Austin and Phil talked about a little earlier. You can also run validation rules in this new data entry app; there's a slightly different interface, which we can see here. The validation rules appear on the right side of the screen instead of as a pop-up, and we get the priority and a nicer description of how to handle the rule. So these are available both in the current data entry app and in the new up-and-coming one, and they're configured the same way. I have a number of resources available where you can review how these validation rules are configured if you want to add them. But generally, our implementation guidance is that all of your aggregate data sets should have these rules.
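Just to make the left side / operator / right side structure concrete, here's a minimal sketch of creating such a rule through the DHIS2 Web API. The instance URL, credentials, and data element UIDs are placeholders I've invented for illustration, so treat this as a sketch of the shape of the metadata rather than a copy-paste recipe:

```python
# Minimal sketch: creating a validation rule via the DHIS2 Web API.
# The base URL, credentials, and data element UIDs are placeholders.
import requests

BASE_URL = "https://play.dhis2.org/demo/api"  # placeholder instance
AUTH = ("admin", "district")                  # placeholder credentials

rule = {
    "name": "Malaria cases treated <= suspected malaria cases",
    "importance": "HIGH",
    "periodType": "Monthly",
    "operator": "less_than_or_equal_to",
    "leftSide": {
        "expression": "#{deUidCasesTreated}",    # hypothetical data element UID
        "description": "Malaria cases treated",
        "missingValueStrategy": "NEVER_SKIP",
    },
    "rightSide": {
        "expression": "#{deUidSuspectedCases}",  # hypothetical data element UID
        "description": "Suspected malaria cases",
        "missingValueStrategy": "NEVER_SKIP",
    },
}

resp = requests.post(f"{BASE_URL}/validationRules", json=rule, auth=AUTH)
resp.raise_for_status()
print(resp.json())
```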
And then for things like Tracker, you want program rules or program indicators, for example, that check the validity of your data during data entry. We always want these warnings or error messages to pop up and prompt people when the data is not being entered accurately or as intended. Let me just zoom out before I switch over here.

Okay. On the implementation side, we have some general recommendations on using validation rules. Firstly, as I mentioned, for any data set you have in DHIS2, you should review whether you have the necessary validation rules in place. Because a lot of times what happens is the data set is created, the validation rules are not, then the training happens, the data set gets rolled out, a lot of data is entered, and maybe you add the validation rules six months down the line. You go to check things and you're not really sure how to fix the data. So it's better to have this in place before you do any training on anything you're introducing. And if it is after the fact, okay, no problem, it's better than nothing, but generally speaking, try to be proactive with this.

A lot of the time, data entry occurs from paper: there's a paper form of some kind, and the data is entered manually into the system, not in real time. So what the user can fix might be limited in scope, and you need very clear procedures, training, and guidance for those people. It depends on capacity as well, to some extent: if the person entering data doesn't really understand how everything relates together, it might be hard for them to go back and fix things. But generally, the more you can do here, the more you can improve, because it allows things to be corrected at the point of data entry.

We also recommend that validation rules should only be set up for conditions that are basically impossible in real life. What I mean by that: there are two examples here. Positive tests should be less than or equal to tests performed; within a month that's always true, right? The number of positives should never be more than the number of tests I've done, in any facility, in any time period. The second one, ANC1 greater than ANC4: over a year, or some other longer period, ANC1 is generally greater than ANC4. But for a given month, ANC4 could be higher than ANC1. We don't want a flashing error telling somebody to go check their ANC1 value just because it's less than ANC4; for a given period you could simply have more mothers attending the clinic for their scheduled fourth ANC visit. So in that case this would not be a good rule, because the violation would occur naturally from time to time. You want to be careful that these rules follow a logic that holds all the time. You can still perform the ANC1-versus-ANC4 analysis over a longer period separately, but make sure those prompts don't appear for the user, because that can be quite confusing.

All right, we also have another functionality called min-max values.
This is where we can define minimum and maximum values for every variable within our system. Now, the functionality for generating these inside DHIS2 is not as well built as it could be, but the functionality for comparing values against them is very strong. So what we generally suggest, and this is why we have the tool I spoke about at the beginning, is to generate the min-max values outside of DHIS2; there are many ways to generate them, and it's essentially a statistical method. We do have a tool, linked at the beginning of the presentation, that supports a number of different statistical calculations; you can see some of them here, for those of you who are familiar. We use modified Z-scores, and also the Box-Cox transformation, which is a method of normalizing our data before calculating the minimum and maximum for each value within DHIS2. So there are some fairly advanced methods associated with this, but let me explain it a little more.

If I head here to one of my variables, and this is in the new data entry app, though I can also show it in the old one, you are able to edit the minimum and maximum limits for each value. So let's say my minimum is zero and my max is five; this is saved. Now, generally speaking, you're not going to do this one by one; you can do it in bulk and generate all the minimum and maximum values together. But as I said, the methods within DHIS2 right now could use some strengthening, so we have outside tools to help with generating these values. Now, if I enter a value of seven, this value is actually not saved. Let's figure out what's going on here: it's highlighted in red, and if I scroll over it, it might be hard to see this message, so I'll increase it at the top here, it says the number cannot be less than zero or more than five. That's because there's a minimum and maximum set for the data value, and it's prompting you in data entry to make sure you're within the expected range. Generally speaking, we set these in bulk, as I said, not one at a time, but I just showed you quickly how it's done for one. I'll just remove those. We can also do analysis of this in bulk, and I'll show some of that in a moment.

There's a lot of different material here, and I've added specific resources for all of these things: links to the documentation, and links within that data quality toolkit documentation I showed you. It shows you how to configure the validation rules, how to use them in data entry, and how to apply them to your own system. Within the slides, you can just click on the item and it'll take you to the link, so even if you don't remember everything in the slides, there are a lot of extra resources. We also have some more information on minimum and maximum values if you want to read about that.
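For those curious about the statistics, here's a simplified sketch of deriving min-max bounds from a variable's history using the modified Z-score. This is an illustration of the idea only, with invented numbers; it is not the actual generation script linked in the slides, which also supports Box-Cox normalization and other options:

```python
# Simplified sketch of generating min/max bounds from a variable's history
# using the modified Z-score (median/MAD based). Invented monthly counts.
from statistics import median

def min_max_bounds(values, threshold=3.0):
    """Return (min, max) such that values whose modified Z-score
    exceeds `threshold` fall outside the bounds."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return med, med  # degenerate case: all history identical
    spread = threshold * mad / 0.6745  # 0.6745: consistency constant for MAD
    # Clamp the minimum at zero: a negative minimum makes no sense for counts.
    return max(0, round(med - spread)), round(med + spread)

history = [34, 41, 38, 36, 40, 39, 35, 37, 42, 36, 38, 40]  # invented values
print(min_max_bounds(history))
```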
So that was the handful of features for data entry. For analysis, we have quite a few more, and we've also introduced several new methods for measuring data quality within DHIS2 that are a bit different from the ways you may have seen previously, at least within DHIS2. Some of these methods have been around for a while, but now we can implement them quite successfully inside our systems.

First I'll talk about completeness and timeliness. Many of you might be familiar with these measures as they're historically represented within DHIS2. Typically we're talking about data set completeness or data set timeliness: we expect 12 data set reports, we only get 10, and we calculate our completeness as 10 divided by 12. That's the typical way to calculate these things. We've also introduced the two new measures in orange for completeness of data: the proportion of facilities that are consistently reporting, and the completeness of individual data elements, and I'll explain both calculations.

Data set completeness and timeliness I think we're all pretty familiar with, so I'll just quickly demonstrate them. In this example we have our data values reported, and the completeness rate on the same chart, so we can get a good sense of whether the data is representative or not. These can be very important: if we have very low completeness, we might have issues in our health system, and we'll certainly have issues interpreting the data and taking action or making plans based on it. If we only have, say, 60% of our data, it's hard to draw conclusions about the service delivery that's actually occurring in our setting. So that's the typical way timeliness and completeness are implemented within DHIS2: the number of actual reports, or the number of reports on time, within a system.

We also have a new measure here, and I'll go a little slower through this one. We refer to it as data element completeness. This is not new in terms of statistics or data quality generally, but it is new in terms of being easy to implement inside the system. Data element completeness is a useful complement to data set completeness; it's not one or the other, and we generally recommend both are in place. With data element completeness, we're not evaluating the completeness of an entire data set; we're evaluating the completeness of an individual variable inside it. Say one of the variables is RDT tests performed: you can measure the completeness of that specific variable. This is more granular in nature, and it gives you more insight into whether the data you specifically care about is complete or not. The data set could be submitted but be missing many variables; here you're measuring the variables within the data set. Now, the way this is calculated can vary: you can see in this figure that I have a number of different denominators it can be calculated against.
And that's because we can use more than one denominator to calculate this measure. The numerator, in my example, is the number of values reported for RDT tests performed, one variable within an entire data set of, say, 100 variables. As the denominator, we can then use a number of different calculations. We can use the number of expected reports for the data set: let's say we expect that value to be reported 12 times a year for one facility, because RDT tests performed should be reported once a month, so 12 could be our denominator in one case. We could instead use the received reports: let's say we only actually received 10 reports; we could take the number of data values reported and divide by those 10 reports. We could also compare against data values reported for a related data element: for RDT tests performed, maybe we look at microscopy tests, or something else similar in nature that has a relationship with that variable. And we can also use historical data instead: rather than looking at current data, we can look at facilities that have previously reported the data.

I'm going to skip ahead, because there's a lot going on here, but I'll describe and demonstrate why this measure is useful. In addition to configuring data element completeness indicators, we can also run ad hoc analysis inside DHIS2 to quickly give us data element completeness. One of the difficulties, of course, is configuration: doing this for 10,000 data elements using the approach we've defined is not really a good idea. What we suggest, just as the WHO data quality assurance guidelines suggest, is that you find core variables within your HMIS, maybe 15 or 20 of them, and use those as a proxy for data quality.
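As a toy illustration of that arithmetic, with invented numbers, here is the same completeness computed against two of the denominators just described, expected reports versus received reports:

```python
# Sketch of the data element completeness arithmetic with two of the
# possible denominators; all numbers are invented for illustration.

values_reported  = 10   # months where "RDT tests performed" had a value
expected_reports = 12   # the data set should be reported monthly
received_reports = 10   # data set reports actually submitted

# Denominator 1: expected reports.
completeness_vs_expected = 100 * values_reported / expected_reports   # 83.3%

# Denominator 2: received reports (of the reports we did get,
# how many actually contained this variable?).
completeness_vs_received = 100 * values_reported / received_reports   # 100.0%

print(f"{completeness_vs_expected:.1f}% vs expected, "
      f"{completeness_vs_received:.1f}% vs received")
```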
The data quality assurance guidelines discuss selecting that core group of variables in quite a bit of detail. The configuration will get a little heavy, but we also have ways to run this without configuring much at all; it's not as specific, but it can give us some interesting information, and I'll demonstrate both methods.

Let me go over here, to our dashboard. Here you can see my numerator is the number of values that have been reported, and my denominator is the number of expected reports, and I'm doing it just for one specific variable, in this case ANC1. You'll see on the dashboard that I've made, oops, I don't want to do that, okay, I have just four variables. In practice you might want a few more, but there is a lot of configuration involved, it is heavy on the system, and if you have too many it's going to be impossible to follow up on the issues anyway, so you have to keep it contained. You can do this proxy analysis and understand what's going on in your system. To create these indicators there's some configuration in DHIS2, and we select a core group of variables to do it. What this is saying is that for this period, October 2023, within the country, my ANC1 variable is 84% complete, for this specific variable, not the data set. We could run it for a year, a quarter, whatever we want, but it gives us more granularity. And if we compare it, I'll just remove these bottom ones, to the data set it comes from, the RMNCAH data set, in any given period the reporting rate is no greater than 67%, but the data element completeness is higher. So there's a mismatch between the variables being reported and our data set completeness; in this case the data element completeness is higher than the reporting rate, and it could be the reverse as well. These values won't always match, and that gives us more insight into what's going on; in this case we'd want to check what's happening, because it doesn't quite make sense.

And then, as I said, you can also do ad hoc analysis of data element completeness. It's one thing to configure completeness indicators for all these different variables, which can be quite time-consuming, so as I said you might pick a core set; but you can also do a count of the values that have been reported quite easily, using a pivot table or a chart or anything else within DHIS2. Right now, I'll just change it back. Our traditional method of looking at variables is to sum them: if I'm looking at the number of doses given in a particular year, it takes the total value and adds them up, so 13,655 is the actual number of doses given for this particular org unit for this period. Now, here I have the number of expected reports, and right now this doesn't help me much. But what I can do is change the aggregation type in the data visualizer to count. If we do this, rather than summing the data values, it will count the number of values that have been reported. So if I click on update: here we have our number of expected reports, which hasn't changed, 1,385, and here we have the number of values that have actually been reported, which is 1,179.
So if we use that expected reports figure as our denominator, we can see we are missing quite a few reports. We could calculate a percentage from this quite easily, of course, and we're able to do it for many variables at once: these are all counts, and we can compare them to our expected reports to get some sense of what our completeness is within the system. Now, we might also want to create configured measures like this one here, which gives us the percentages directly; that takes a little more work, not impossible, but it can be done. If we just want some routine analysis quickly giving us information or insight on these variables, we can simply use this count function in the pivot table. Of course, we have to make sure to add our reporting rate, that is, the number of expected reports from that particular data set. And we have a whole host of variables here, a long list; this is many, many variables for the immunization program, and to configure percentages in totality you might only want to select two or three as a proxy. So this is a good way to run ad hoc analyses of your data element completeness without having to configure too much.

Okay, this is another new measure, for organization units that are consistently reporting. What we mean by this: for a 12-month period, or whichever period we define, six months, 12 months, two years, they have reported every month or every week for that period; they haven't missed a single period where the variable or report was blank. And we do this on a per-variable basis, so once again there is a fair amount of configuration involved: you decide on a core group of indicators and configure those in the system. The example calculation on screen is for facilities consistently reporting ANC1 in the last 12 months. The numerator is the facilities that have reported ANC1 for every month in the last 12 months; if they did not report ANC1 in even one month, they drop out of the numerator. On the bottom, the denominator is the facilities that reported ANC1 in any of the last 12 months, so as long as they reported at least one time, and that number will be higher than the numerator. The bottom part, on the configuration, I'm going to direct you to the documentation, or come ask me separately, just for the sake of time.

So in this example we have five facilities, and we're measuring consistent reporting. The only two facilities that have reported for every month are facility B and facility E; you can see there are no blanks for them. Then we take the number of facilities that have reported at least one time: facility D didn't report any ANC1 values, so it's not included in our denominator, while the remainder are included because we see at least some values scattered about. So facilities A, B, C, and E are all in the denominator, but facility D is excluded because it did not report at all in that 12-month period. That gives us two consistently reporting facilities out of four that reported any value within the same period, so 50%.
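Expressed as a quick sketch, using the facility A to E example from the slide (the monthly values themselves are invented, but the reporting pattern matches: B and E report every month, D never reports):

```python
# Sketch of the "facilities consistently reporting" calculation; None
# marks a month with no reported value. Values invented, pattern as above.
monthly_anc1 = {
    "Facility A": [12, None, 15, 10, 11, None, 9, 14, 12, 10, None, 13],
    "Facility B": [20, 22, 19, 21, 20, 23, 18, 22, 21, 20, 19, 24],
    "Facility C": [None, 8, 7, None, 9, 6, 8, None, 7, 9, 8, None],
    "Facility D": [None] * 12,  # never reported ANC1
    "Facility E": [5, 6, 5, 7, 6, 5, 6, 7, 5, 6, 6, 5],
}

numerator = sum(all(v is not None for v in months)    # every month reported
                for months in monthly_anc1.values())
denominator = sum(any(v is not None for v in months)  # at least one month
                  for months in monthly_anc1.values())

print(f"{numerator}/{denominator} = {100 * numerator / denominator:.0f}% "
      "of reporting facilities reported consistently")  # 2/4 = 50%
```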
This is a good proxy to help us understand how consistently various facilities or organization units are reporting, and if there's a lot of inconsistency, we would definitely want to check on those facilities to determine what's happening. Once again, we do this per variable in practice: we take the number of facilities that have reported consistently, divided by the number of facilities that have reported at least one time, and that is the value we see here. And the implications of this, let me see if I can find it within my sea of different things I've made... no, wait, I don't have it right now, hold on, give me one second. Okay, we'll go back to the dashboard.

So then we can follow up, of course. We can drill down and see the individual facilities that are not reporting, because we have the numerator and the denominator available to us as separate calculations. Just to go back: the numerator was the facilities consistently reporting, the denominator those reporting at all. So we could get a list of all the facilities that were not consistently reporting, find out which months they weren't reporting, and ask them what's going on, as a follow-up action. What we're trying to do with a lot of these proxies is not just display values for the sake of doing so, but give us values we can act upon, because this gives us lists of facilities: what happened, why are you not reporting? Maybe it's expected behavior and you know that, and that's fine; but if it's not, then you'd want to develop SOPs so you make sure to follow up with those specific facilities when the situation calls for it.

Before I go on, I've probably thrown a lot of information at you. Is there any clarification needed, or questions, before I proceed with some of the other types of analyses? So far so good? Oh, okay, Farida, please. Can we get her a mic?

I think there's a little difference from the WHO data quality app, especially for consistency. Based on the WHO standard, there is consistency between indicators and consistency over time, and you said consistency over time cannot be provided by DHIS2 yet, right? And for consistency between indicators, there are two kinds, internal and external, and I haven't seen internal consistency in the presentation. Also, you separated validation rules and min-max, while in the WHO standard we should combine validation rules and min-max with criteria. For example, sometimes we expect A to be less than or higher than B, and there are also criteria, for example 50% or 33%. I haven't seen that in DHIS2.

I just haven't got that far yet; those were just the data entry features, and then I talked about some new measures. For example, when we talk about scatter plots, we use quite statistical methods to set the threshold, and they're more sensitive than just setting 33% or 10%. I'll talk about it when we get there: we use methods such as modified Z-scores or the interquartile range, which are much more sensitive at picking up these errors. So when we're measuring, for example, consistency between two related variables, internal consistency, we can use those measures instead, which are much more sensitive than what we had previously in the WHO tool. We just haven't gotten there yet.
Yeah, that's a good pickup. Yes, Anton?

About running validations and min-max: if the person entering data inputs a value that exceeds the limits and a notification appears, can the DHIS2 platform prevent the data from being saved, or is the data still saved? Because in implementations there are some unusual situations, for example if there are innovations in how services are delivered, or an outbreak happens, so the data genuinely exceeds the targets. How about that?

Yeah, so there's an option when you create your data sets: complete allowed only if validation passes. What this does, basically, is that you cannot complete the data set unless all of your validation rules pass. It will still save the data; it won't just discard the values. The values will be saved, but the data set will not be completed, and it will only be completed once all the validation rules pass. That's our way of getting around this: we don't want to stop data values from being saved, that could be very dangerous on our part, I think, but we can stop the data set from being completed.

About data element completeness, is it only applicable to aggregate data sets, or is it there for Tracker?

Yeah, so the problem with Tracker data is we don't know the denominator; that's the challenge. What is our expected value, how many values are we supposed to report in a month? If you can define that, I think you could create some type of proxy. But the issue is, let's say we're doing something on maternity, people are coming in for ANC1, and you collect gravida as part of your ANC program: how many times are you supposed to report gravida in a month? It could vary, we have no idea. Because we don't have the denominator, that's the challenge we have right now with Tracker data. To define the expected value, you could potentially use historical data: you could say that on average gravida is reported 100 times a month in this facility, and using that historical figure you could create a denominator and then the same calculations on completeness. You couldn't use an expected value unless you had some type of algorithm for it. So if you're able to bring in values to compare as your denominator, like I have here, you could do the same type of procedure for Tracker data elements or event data elements as well; as long as you know what your denominator is in that scenario, it'll work the same way.
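As a purely illustrative sketch of that workaround, with invented numbers: take a historical average as the stand-in denominator for a tracker-captured value like gravida, then compute a completeness proxy against it.

```python
# Illustration of the workaround discussed above: tracker data has no fixed
# expected count, so a historical average stands in as the denominator.
# All numbers here are invented.
gravida_counts_last_12_months = [98, 105, 92, 110, 101, 97, 103, 99, 95, 108, 100, 96]

expected_per_month = (sum(gravida_counts_last_12_months)
                      / len(gravida_counts_last_12_months))

reported_this_month = 61  # gravida values actually captured this month
completeness_proxy = 100 * reported_this_month / expected_per_month

print(f"~{completeness_proxy:.0f}% of the historically expected gravida values")
```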
Yes, Naeem?

So when we calculate the min-max, usually in DHIS2 we take the last 12 months of data with a standard deviation, probably 2 or 3. But some data fluctuate a lot over the period and some don't. Is it possible, or is there any plan, to make this definable per data element, so some data elements use a standard deviation of 3 and some use less? And in some scenarios we need not just the last 12 months but the last 24 months, or maybe the last 6 months.

Yeah, right now the features in DHIS2 for this are not great, and that's the challenge we're having. Where did I put that... okay, GitHub doesn't want to let me in. We have a link here to a Python script that allows you quite a bit of flexibility: you don't just have to look at 12 months, there are various statistical methods you can use, and you can change a lot of the different inputs. It might not be exactly what you're looking for, I'm not sure, but it's definitely more flexible than what's in the system right now. What you'll often see in DHIS2 right now, if you use the automated feature to calculate the min-max, is that you actually get a lot of negative values for your minimum, and that's not good; it usually doesn't make sense. This is something we're looking at fixing, but right now we do suggest that you calculate those minimum and maximum values outside of DHIS2, whether with the tools we've developed or with other statistical methods that are available. Unfortunately, that's something we need to improve on a bit.

Let's see, it's almost time to let you go, so maybe I'll just try to cover a couple more things to help answer some of the outstanding questions. There's a lot of material here, so I'm not going to get through it all; your question might not be answered, but you can come see me and I can try to explain where we've made the switch, and we can discuss.

For consistency of related data, we have scatter plots. These were available, or are available, I should say, within the WHO Data Quality Tool. We don't have consistency over time as scatter plots yet, we're working on that, but we do have consistency of related data. And in the WHO tool, the threshold was an estimate, I think 33% or 55%; there's no real basis in statistics for it, it's just guessing what the difference should be. So, quickly over here: this is a scatter plot in DHIS2, for anyone who's worked with the WHO Data Quality Tool. I have ANC1 on my Y axis and ANC4 on my X axis, and what I'm looking for is an expected relationship between these two variables. That doesn't mean they'll be equal; typically they're not. But I want some consistency in the difference between these items, falling within a relationship that I define. Let me zoom in here. All these red values, according to this, require some further follow-up. These lines here are the 1% extreme values; those are our highest outliers, which we should prioritize for follow-up. This middle line represents the relationship between the two variables, the closer the green dots are to it the better, and these outer lines are the two sides of our expected relationship. The further a point falls outside those lines, the more likely the relationship is not correct. In this case, if I hover over the line, we're using something called a modified Z-score. I know everyone might not be familiar with that term, but these are sound statistical methods, and you might be familiar with some of them: interquartile range, Z-score, modified Z-score, rather than just defining a percentage for the difference. And then we can define what the threshold should be. The lower the threshold, the more sensitive the analysis is to finding outliers: generally speaking, you'll flag more outliers, because we decrease the acceptable difference between the variables in our relationship.
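Roughly speaking, the idea is something like the sketch below, which flags facilities whose ANC1-to-ANC4 relationship sits far from the typical pattern using a modified Z-score. This is a simplification of what the chart actually does, with invented data apart from the suspicious point discussed in a moment:

```python
# Rough approximation of the scatter-plot check: flag facilities whose
# ANC1:ANC4 relationship deviates strongly from the typical pattern,
# using the modified Z-score. Data are invented.
from statistics import median

pairs = {  # facility: (ANC1, ANC4)
    "A": (120, 95), "B": (200, 150), "C": (80, 64),
    "D": (391, 9),  # the suspicious point from the demo
    "E": (150, 118),
}

ratios = {f: anc1 / anc4 for f, (anc1, anc4) in pairs.items()}
med = median(ratios.values())
mad = median(abs(r - med) for r in ratios.values())

THRESHOLD = 3.0  # lower threshold -> more sensitive, more points flagged
for facility, r in ratios.items():
    z = 0.6745 * (r - med) / mad  # modified Z-score of this facility's ratio
    if abs(z) > THRESHOLD:
        print(f"Facility {facility}: ratio {r:.1f}, modified Z {z:.1f} -> follow up")
```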
So let me just zoom out here. This is what it looks like with the threshold at three, just to give you an example; you can see where the lines lie. And then I can switch it to five, and the lines expand a little bit.

Let me change it back and explain this a little more. The nice thing about this chart is that it's quite interactive, and you can do lots of interesting things with it. What I've done is zoom in on a particular area of the chart, because I want to investigate these items in particular, like this one: for this period, ANC1 was 391 and ANC4 was nine. What we've said is that if these vary by a modified Z-score of at least three, the difference in the relationship is off; it could be the ANC1 value, it could be the ANC4 value. What we can check is the historical values. If we examine the history, ANC4 at nine looks consistent, so ANC4 is probably okay, but ANC1 at, what was it, 391, probably falls outside the expected relationship. This kind of error is harder to spot: it doesn't necessarily look like such an extreme value on its own, but when we compare it to historical data, or to data that relates to that variable, we might find that some type of error has been made in reporting or data entry. Maybe this should only be 39, for example, or maybe nine. So this is a chart type inside DHIS2 called scatter, and to modify the outlier options, I apologize, I'm going very quickly, but we do have all this documented, you go to options, then outliers, and you have all the different options to perform your outlier analysis for two related variables.

Another one is dropout rates; I mentioned there are other use cases as well. Let's use the dashboard to examine this one; the dropout rates are fine here, and you can see the screenshot too. For those of you familiar with this measure, this was also something you could previously do in the WHO tool to analyze consistency of related variables. It's a very common measure for immunization, but it can be used for other things as well, such as the ANC1 to ANC4 dropout rate, calculated as (ANC1 - ANC4) / ANC1 x 100. We have resources and links for this as well. For all these sections, I've tried to put in links with more details on how to configure these features, how they're utilized, and explanations of them. I don't have a lot of time and I have way more material left, so I'm not going to get through all of it. I'll show one more feature, then see where you're at and let you go for the evening.

So another feature for consistency, available previously as well, is year-over-year charts. These are very basic and easy to interpret: if I have a look at this chart, without any real training I can see what the obvious outliers are. And that's the advantage of some of these tools: not all of them have to be as advanced as what I was describing previously. Some of them can simply identify immediate data quality issues and be easy to interpret; this one is pretty self-explanatory. So a lot of these measures don't have to be as complex as the ones I was just describing; those are there to supplement the measures of data quality that we have.
I'll show you some examples of this on the dashboard. Here's one of these year-over-year charts, which we saw previously in the WHO Data Quality Tool as well; you can see there are some lines that stand out when compared to previous years. The way these charts work, one line is one year of data, and we can see here it's showing January to December. I can hover over one line to show you the values, and if I get rid of some of these: one line is one year of data, January 2023 to December 2023, and laid over that same window are a number of other years of data, so I have 2018, '19, 2021, and '22. What we're looking for is extreme variation in the pattern from year to year or month to month. For this particular month, March, I have five years of data to compare against. Generally speaking, we might see a gradual increase in services because the population is increasing, but a spike like this can't be explained that way; it's too high. So we'd want to investigate the source of that problem, and maybe do a more detailed outlier analysis to identify the specific facility or facilities it's coming from, in order to fix those values. We can see that in this example here, where we have all these lines closely bundled together except one, and that warrants investigation. Typically a person could log in on a monthly basis, have a look at these charts, and if something looked incorrect, at least flag it for possible follow-up so the issue could be investigated further.

All right, outliers. There's a lot of material on outliers, so let's try to finish by 4:45. We now have a lot more granularity for displaying outliers and using them in a number of calculations, and what we're looking at is outliers over time. Once again, we stick to defining these for the core variables within our framework, because the configuration is heavy; once you select your core data quality variables, you'd configure these for those. We rely heavily on a concept called predictors to define and set these up. There's a long-winded description of how they're configured that I'm not going to go into, because that's a whole longer session, but just to give you some idea: we have various predictors and data elements that are used, and eventually we get down to the resulting values here, including the data values excluding outliers. So if we remove the outliers from our totals, what would our new value be?

Let me show some examples of this, and let's go through a worked example first, just so you can get a grip on what I'm describing. Let's have a look at ANC1 here. The middle column is the reported values, and the outlier has been identified in this orange color. Then we have some other columns: ANC1 excluding outliers, on the left-hand side as you're looking at it, and ANC1 outliers on the right-hand side. So what we're able to do is get a summary, and all the values you see here we get in the system; we're also able to exclude these values, and I'll discuss the implications of that in a moment, and then we'll maybe end. The first thing we do, to make sure we can calculate all of this, is calculate our statistical threshold.
All right, outliers. There's a lot of material on outliers, so let's try to finish by 4.45. We now have more granularity in displaying outliers and using them in a number of calculations, and what we're looking at is outliers over time. Once again we stick to defining these for the core variables, because the configuration is heavy; if you select your core data quality variables to use within your framework, those are the ones you'd want to configure. We rely heavily on a concept called predictors to define and set these up. There's a rather long-winded way of describing how these are configured, which I'm not going to go into because that's a whole longer session, but just to give you some idea: we have various predictors and data elements that are used, and eventually we get down to values such as the data values excluding outliers; that is, if we remove the outliers from our totals, what would our new value be?

Let me go through an example first, so you can get a grip on what I'm describing. Let's have a look at ANC1 here. The middle column is the ANC1 value, and anything identified as an outlier is highlighted in this orange color. Then we have some other columns: ANC1 excluding outliers, on the left-hand side when you're facing it, and ANC1 outliers, on the right-hand side. What we're able to do is get a summary; all these values you see here, we get them in the system, and we're also able to exclude these values. I'll discuss the implications of that in a moment.

The first thing we do, so that we can calculate all of this, is work out our statistical threshold. We take all the values we have in that column and calculate the mean plus three standard deviations; the mean is another word for the average, and adding three standard deviations to it gives us our statistical threshold. We don't want any single value within a month to be over that threshold. For reference, the overall total of this column is 6,415, and note that the outlier is also throwing off our mean, our average, quite a bit; it's increasing it a lot. But let's stick to one thing at a time. Next we get a count of the outliers: how many values for this specific variable are outliers? In this case it returns a value of one, because that large value of 4,243 is an outlier. That count is saved in DHIS2, so you could display it on a dashboard, in a pivot table, on a chart, whatever, and you could display it per org unit you define this for, say a district with ten facilities. Then you can ask how many values were not outliers: there are 11 values in total here, so 10 of them are not outliers, and you get a count of those as well.

This next one might be a little tricky; it's the leftmost column when you're facing it, ANC1 excluding outliers. What this does is remove the outlier from the total and give you a new total, without the outlier affecting your overall data value. So the ANC1 value excluding outliers is 2,172, because it does not include that value of 4,243 in the overall total. Then we have the actual outlier value itself, and we can also get a percentage representation of this, as well as a percentage representation of how many of the values are outliers.

There's a lot that can be done here, and a whole lot of guidance on this information, but let's quickly look at the implications. What I have here is a bar chart, or column chart: the green bar is the ANC1 value excluding the outliers, and the blue bar is the value including them. For most months they're not that different, a difference of 100, 200, 300, which as a national total is not so bad. But for December the green bar, the value excluding the outliers, is just 29,000, while the value including the outliers is 45,000, roughly 15,000 higher. That is the implication of including those outliers within your totals: in some cases, if they're really statistically implausible or simply impossible, they should not be included when you're making reports, because they will affect your national total. For example, if I were to make any type of estimate of ANC1 coverage for December, it would be wrong if I included those outliers; it would probably be over 100% in this scenario, which doesn't make sense either. That's why we try to exclude those outliers from analysis, or at least identify what they are, so we can define some follow-up action to mitigate the effect they have on our overall national totals.
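The numbers in this ANC1 example fit together: 2,172 plus the 4,243 outlier gives the column total of 6,415. Here is a minimal sketch in Python of the threshold and exclusion logic just described. The eleven individual values are hypothetical, chosen only so the totals match the example; the mean-plus-three-standard-deviations rule mirrors what was described above, but this is an illustration of the arithmetic, not the actual predictor configuration, and which standard-deviation variant DHIS2 applies is a configuration detail.

```python
from statistics import mean, pstdev

# Hypothetical: 11 values in the column, summing to 6,415, with one
# extreme value of 4,243 (the outlier from the example above).
values = [195, 210, 4243, 205, 220, 215, 230, 225, 220, 215, 237]

# Statistical threshold: mean plus three standard deviations.
threshold = mean(values) + 3 * pstdev(values)
outliers = [v for v in values if v > threshold]

print(f"threshold (mean + 3 SD):        {threshold:.0f}")               # ~4055
print(f"total including outliers:       {sum(values)}")                 # 6415
print(f"total excluding outliers:       {sum(values) - sum(outliers)}") # 2172
print(f"outlier count:                  {len(outliers)}")               # 1
print(f"non-outlier count:              {len(values) - len(outliers)}") # 10
print(f"outlier value(s):               {outliers}")                    # [4243]
print(f"% of total that is outliers:    {100 * sum(outliers) / sum(values):.1f}%")
print(f"% of values that are outliers:  {100 * len(outliers) / len(values):.1f}%")
```

With these hypothetical values the threshold comes out around 4,055, so exactly the one value of 4,243 is flagged, and every summary figure from the example (1 outlier, 10 non-outliers, 2,172 excluding outliers) falls out of the same few lines of arithmetic.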
Okay, I'm going to stop here; I've given you a lot of information, and I apologize, because there's still a lot more that we include. We have various methods for outlier analysis, and there are resources for all of this. We have validation rules and notifications, and we can also include our outliers in the validation rules. We can also perform external consistency analysis using these validation rules; I didn't have an opportunity to discuss that, but if you bring in external data, you can do that external consistency analysis as well, and the results can be sent to an email.

Then we have a number of considerations for implementation, because it's not just about features, and there's a lot of guidance we've recently written: we have SOPs, we have the full toolkit, we have checklists, and we have other types of configuration that support the use of these features. I've put all of that in the Google Drive. For example, we have an SOP template for data quality that discusses general roles and responsibilities: what the different levels of the health information system should be responsible for and what they should perform, if you're considering revising your data quality procedures. We also have checklists of the different tasks you should consider if you're looking to implement new data quality procedures or strengthen your data quality configuration. These cover the various preparatory tasks, in terms of reviewing things and getting them set up, what we would expect to be minimally configured, and the different types of outputs and reports; the WHO data quality tool is still in there, because there is a report in it, and there are all kinds of visualizations and dashboards linked to this. And then we cover the implementation side of things as well, within that SOP and within the Google Drive. There's also a checklist for observing behavior at lower levels of the health system, to ensure that the various procedures you've implemented are being reviewed on a routine basis. It's meant to accompany training and your standard operating procedures, to make sure that behaviors are changing over time: it lists the various checks we've described and asks whether people are performing them on a routine basis, yes or no, and it would typically be filled in by the supervising person within the facility, at whichever level you happen to be working.

I'm going to stop there; I apologize, there's just too much for me to get through. It's 4.45, and I know you have your social event at 5.30, so you can meet in the lobby at 5.30. If you have any questions, I'm happy to stay back and answer them, and if there are any more questions about this during the week, please have a look at the remainder