We'll discuss completeness and timeliness and what that looks like, validation rules and validation notifications, and different types of outlier analysis, including analysis based on standard deviations as well as minimum and maximum outlier analysis. We'll also talk about predictive analytics and how we can apply it for the purposes of reviewing data quality. And then we'll briefly touch on the WHO Data Quality Tool, which some of you might be familiar with and some of you might not be, but we're happy to discuss that in more detail later on as well. So just as a group, since you're all here and we're talking about data quality, I want to do a very simple exercise with you all, okay? What we're going to try to do is identify the outlier based on the visualization I'm showing. You can either say it aloud or just think to yourself. So have a look at this chart. I know it's kind of hard to see, but even with what you can see, what do you think are the outliers on this chart? Are there any? So you're all able to identify them. I can hear you pointing and gesturing with your hands, right? It's very easy to see that these are potentially suspicious values, outliers that might need investigation. And this is just a very simple line chart showing our data over the last several months, the different values. And we can very quickly identify the values we might want to check on. That's not a hard exercise, and it's not hard to train somebody to review something like this and do it regularly. The whole point is that we don't need to make this extremely complicated; there are some simple tools. There are some challenges with that, which I'll discuss in a moment. But the idea is that we try to make this as simple as we can, because it really helps us in reviewing our data quality. 
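The standard-deviation approach mentioned above can be sketched very simply. This is a minimal illustration, not DHIS2's actual implementation; the monthly values and the 2-standard-deviation cutoff are assumptions chosen for the example:

```python
# Flag values more than k standard deviations from the mean of a series.
# Hypothetical monthly counts for one facility; 900 is the planted outlier.
def flag_outliers(series, k=2.0):
    n = len(series)
    mean = sum(series) / n
    std = (sum((x - mean) ** 2 for x in series) / n) ** 0.5
    return [x for x in series if abs(x - mean) > k * std]

values = [120, 135, 128, 110, 900, 131, 125, 118]
print(flag_outliers(values))  # [900]
```

This is exactly the kind of check a first-line reviewer does visually on the line chart; the code just makes the "way off the trend" judgment explicit.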
What about this one? This is drop-out rates for DPT1 to DPT3, or Penta1 to Penta3, whichever you happen to be using. Can you spot the issue here? Some of you are nodding, some of you are giving me a look like, uh-huh. So this one's a bit tougher, right? Sometimes it can be very simple, like our last one; sometimes we need to understand a little more about the data. For this one, I didn't even mention which variable we were looking at. There are a number of variables at the bottom that you can't really see, but you can automatically kind of tell, okay, the ones that are really high, maybe there's a problem. Here we're looking at drop-out rate, and sometimes we require that program-specific knowledge or context to apply this in practice. So this is for immunization, and look at these negative values here: negative drop-out rates. That's typically not correct. It means the amount of DPT3, or Penta3, reported is more than the amount of DPT1, so you might have some challenges in how that information is being recorded or classified at the source. So sometimes, as you get into this, you see it can be a bit more challenging. This one you can't really see the colors, huh? Okay, so I'm going to skip this one because it's too hard to see. The idea is that there are visualizations that can really help us spot some obvious issues with the data. Sometimes it requires program-specific knowledge; sometimes you can just very quickly spot these issues. And the idea is to make it as simple as possible, at least for this first-line level of support. Because often, when we refer to data quality, it becomes this very large exercise: reviewing data quality maybe on an annual basis, where some external person comes in, reviews all of your data together, and submits a large report that is very difficult to understand. 
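For reference, the drop-out rate discussed here is conventionally computed as (first dose minus third dose) over first dose. A quick sketch with made-up numbers shows how a negative rate falls out when more third doses than first doses are reported:

```python
# Drop-out rate: (first dose - third dose) / first dose, as a percentage.
# A negative result means more DPT3/Penta3 than DPT1/Penta1 was reported,
# which usually signals a recording or classification problem at the source.
def dropout_rate(dose1, dose3):
    return round(100 * (dose1 - dose3) / dose1, 1)

print(dropout_rate(200, 170))  # 15.0  -> plausible drop-out
print(dropout_rate(150, 180))  # -20.0 -> suspicious, needs investigation
```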
It's very hard to act upon those issues afterwards. So there are a number of things we can do to perform this on a more routine basis and make it a little easier on people. A lot of the features we'll be discussing are not just meant to be used on an annual or two-year basis, where we have this big exercise reviewing all the data quality in the system. That is very time-consuming, first of all. And secondly, the resulting report can be challenging to interpret and actually act upon. Whereas if you're proactively looking into your issues on a more routine basis, you can deal with these items well before they become a big problem. It's not true for everything, of course; this might be limited to a subset of information, but it's still possible to do. And you can see, especially with items like this, it's very easy to train somebody on a monthly basis: can you just look at this chart, and if there's a value that's really high, inform us? You can come up with some very simple guidance to allow that person to review the quality of information in the field. You can even have staff who are maybe just entering the data review this on a more routine basis, if these outputs are available. And then, of course, where program-specific knowledge and skills are needed, you might have to discuss in a bit more detail: have the programs bring in experts who know what these indicators represent and can discuss why something is a problem, whether it's a problem within the facility itself, and so on. All right. So this next one is quite hard to see on the screen; it's still hard to see with the light. What I'm showing here is that sometimes these obvious challenges are not as obvious as they could be. 
If you look at this data, there are small, minor bumps in the information, but nothing really crazy, right? We don't see high variation from one value to the next. We see this bigger bump here, but it's not so large compared to the previous value. So sometimes there are issues with the sensitivity of the information. But then we look at this at a facility level, at a lower level, where it's not all aggregated; this first view was aggregated for the entire country. And we can see here, for example, values around 200 or 300, and then all of a sudden there's this value, it's hard to see, but it's 33,000. That is a big increase compared to our previous periods, and we're able to identify those outliers at the lower level. But the national total might be very high; maybe you're dealing with millions of people, especially for something like immunization, or maybe antenatal care, where you have millions of people registered. Then a small variation of 30,000 at the national level is not going to be so obvious, whereas at the facility level it's going to be much more noticeable. So sometimes you have to look at your data at lower levels in your system, and we have a number of tools that support this. This was just looking at national totals year over year, in what's called a year-over-year chart inside of DHIS2. We can create that and put it on the dashboard, but it might not be as sensitive in picking up values at lower levels of the system if you are aggregating everything up. It might be more useful to view those year-over-year values at the facility or district level, where you might pick up on this a little more. Scatter plots are also another visualization type that we have inside of DHIS2, and we can use them to identify outliers on a routine basis. 
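The masking effect described here is easy to demonstrate. In this made-up sketch, a facility-level spike more than 100 times the facility's normal value moves the national total by only a few percent:

```python
# Hypothetical: 500 facilities reporting ~2,000 each, so a national
# total of about 1,000,000 per month.
normal_facility = 2_000
n_facilities = 500

national_normal = normal_facility * n_facilities       # 1,000,000
# One facility mistakenly reports 33,000 instead of its usual ~300.
national_with_error = national_normal - 300 + 33_000   # 1,032,700

pct_change_national = 100 * (national_with_error - national_normal) / national_normal
pct_change_facility = 100 * (33_000 - 300) / 300

print(round(pct_change_national, 1))  # 3.3   -> easy to miss nationally
print(round(pct_change_facility, 1))  # 10900.0 -> obvious at the facility
```

This is why reviewing year-over-year charts at facility or district level catches errors that the national aggregate hides.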
So we can put these on dashboards and let them help us view extreme outliers in particular: values that are just really too high or really too low compared to the other data we have in our system. Once again, I apologize, the light makes it very hard to see, but this chart is subdivided into four quadrants, and the top quadrants are our problematic areas, where the values are really high compared to our other data; they're not consistent with the other values that we have. Maybe you're not as familiar with this chart because it's in the newer versions of DHIS2, but it's a very nice way of identifying outliers and showing them on the DHIS2 dashboard for follow-up and action. We can see, for example, this red; I mean, all the red values are outliers theoretically, but really these ones that are way out here and way out here, very far off from the rest of our previous values, these are quite problematic and would require further investigation. So it also allows you to prioritize a little. You will always have some variation, and maybe you'd otherwise spend too much time on all of these values that are more tightly knit or close together, whereas the values that are very far off might require more investigation. And this is, once again, something that is quite easy to train somebody to use, even if they're not familiar with the entire theory behind how it's calculated. Basically, you're just looking for these very far-off values, and if there are any, that might require some investigation: you inform somebody, look into it a bit more, and determine the direct cause of the problem. So all of the items I've shown so far can go on the DHIS2 dashboard. 
So it also might be worth considering, if you don't already, building data quality dashboards for various people to review. And of course, you can share those at multiple levels of your system, and different users will have access to different geographies based on what they're doing. All right, so completeness and timeliness. This is a feature that DHIS2 has supported for a very long time in data entry for aggregate values. My colleague will talk a bit more about timeliness of tracker data in his presentation and show you what that might look like. But once again, I apologize, it's quite hard to see for those of us in this room; there's a lot of variation in these lines. This is just the number of reports being submitted, so you can already tell there's some kind of problem or issue with the data. If I look at a national total and every month I have a different number of facilities reporting, the values are probably not that accurate, right? We see big spikes, we see them go down again, up again; it's just all over the place. When this is the case, it makes reviewing the data more difficult and challenging, and you might want to figure out, for those facilities, what's going on. Of course, we also have reporting rates that we can calculate. You can see, for example, it's sometimes quite obvious when there's a problem, when the rate just dips. There might be a reason behind that, of course, service outages or things of that nature. But if there's not, it might warrant investigation. All of these measures that I'm showing here can be calculated automatically by DHIS2; I'm just using a very simple feature set for aggregate data values. Validation rules are another tool that we have inside of DHIS2. 
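A reporting (completeness) rate like the one on the slide is just received reports over expected reports, and a timeliness rate restricts the numerator to reports received by the deadline. A minimal sketch with hypothetical counts:

```python
# Completeness: actual reports / expected reports, as a percentage.
# Timeliness: reports submitted on time / expected reports.
def reporting_rate(received, expected):
    return round(100 * received / expected, 1)

expected = 120   # facilities expected to report this month (hypothetical)
received = 102   # reports actually submitted
on_time = 87     # of those, submitted before the deadline

print(reporting_rate(received, expected))  # 85.0 -> completeness
print(reporting_rate(on_time, expected))   # 72.5 -> timeliness
```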
Now, these are features that are supported, but remember, none of them are configured out of the box. You have to apply them to your own data and your own data sets when you're actually reviewing this information. That's the case at least for aggregated values; you can also do this with tracker data by aggregating those values and comparing the aggregation over whatever period of time you would like to see. And all of these validation analyses produce quite a bit of information that you can use to inform the users who are reviewing the data, and to inform yourself so you can follow up on it afterwards. Validation notifications are another feature, tied to those validation rules. So anytime there's some type of consistency issue, let's say your number of malaria tests is greater than your number of suspected cases, for example, which shouldn't be the case, then you can send out a notification of some kind: via email, via SMS, or using the DHIS2 internal messaging system, though that might not be so helpful in all cases. For people who aren't regularly logging into the system, this can be quite useful. Of course, you don't want to overwhelm them, because they'll just start ignoring all these messages. So you want to be careful and targeted, to make sure that only the most critical issues are the ones being sent out. But you can use various thresholds and various calculations to compare your internal consistency, or even your external consistency with other data sources, to create these notifications that are then sent out to people so they can check on the data, or inform the right person to perform specific follow-up actions. 
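The malaria example above has the classic shape of a validation rule: a left-side expression, an operator, and a right-side expression, evaluated per organisation unit and period. A minimal sketch; the rule and data here are invented, and in real DHIS2 such rules are configured in the application, not written in code:

```python
# A validation rule as data: left-side item, operator, right-side item.
# Here: "malaria tests performed" should be <= "suspected malaria cases".
reported = {
    "Facility A": {"malaria_tests": 40, "suspected_cases": 55},
    "Facility B": {"malaria_tests": 80, "suspected_cases": 60},  # violation
}

def check_rule(data, left, op, right):
    violations = []
    for org_unit, values in data.items():
        if op == "<=" and not values[left] <= values[right]:
            violations.append(org_unit)
    return violations

print(check_rule(reported, "malaria_tests", "<=", "suspected_cases"))
# ['Facility B'] -> a candidate for a validation notification
```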
DHIS2 also has a fair amount of outlier detection features, and these are built on a couple of different methods. The first was the scatter plot; that's one method you can use to detect outliers. We don't have to get into the statistics right now, but we also support outlier detection here as well, where we get information on the different outliers that are detected based on a comparison of the data with previous values, or values from other periods, to determine the consistency of those values overall. So there's quite a large feature set that you can implement to support this. We can also use predictive analytics in support of our data quality. I'm going to move through this a little slowly just so I can explain what I'm trying to get across. I know it's hard to see, once again; I apologize for that. This is a value for ANC 1, and I've just shown it by facility. If I look at it over many months, my values are between 50 and 200, so there's not a ton of huge variation. Some of these hospitals have higher values compared to the facilities; that's expected if you look at the previous trends. So what I'm going to do is sort the table. Now I've sorted the table, and we see some values like this one, 25,000, in comparison to the next one, which is 28. We have 28, 28, 34, 28, 22, 18; it just goes on and on, very small values, right? And we have some other instances of this. If we look here, we have more than a thousand reported, but that's fairly consistent, because the other values are within that range. So what we can do is use these predictive analytics tables or features to actually identify these very large outliers and just pluck them out of our data. 
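Alongside the standard-deviation method, the minimum-maximum outlier analysis mentioned at the start is even simpler: flag any value outside a configured minimum and maximum. A minimal sketch, where the bounds and values are assumptions for illustration (in DHIS2, min-max bounds can be set per data element and org unit):

```python
# Min-max outlier analysis: flag values outside a configured [lo, hi] range.
# Bounds might be derived from historical data or set manually.
def min_max_outliers(series, lo, hi):
    return [(i, x) for i, x in enumerate(series) if x < lo or x > hi]

monthly = [180, 160, 25_000, 140, 2, 175]
print(min_max_outliers(monthly, lo=50, hi=400))
# [(2, 25000), (4, 2)] -> both too-high and too-low values get flagged
```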
So we can pull them out and see where they're coming from, understanding that the data has a certain range of values, and that we might need to review these particular values. What we're saying here, basically, is that this value of 25,000 is extremely high compared to all the other values that the facility has reported. It's not really giving us a trend analysis of any kind; that's why I went through those other tables, just to show you where it was pulling the values from. But we can use these features to identify these outliers, and we set the formula, basically, for it to pull these values out and display them. And this is just a regular pivot table, which means we could put it on our dashboard, and every month we could see those really high values, if there are any. You can see some cells are blank, meaning no suspiciously high outlier was detected using this feature, so maybe I don't have to worry for that month, depending on what you're looking at. But for these other values, you might want to find out the source, where they're coming from: maybe it was a data entry error, or something else. And it can help us with that investigation. Another example of using these predictive tools is creating thresholds for our data values. In DHIS2, you can create a validation rule where you just compare two data items that you're currently reporting on. But you might also want to detect whether a value is very high in comparison to previous trends. You can also use the WHO Data Quality Tool for this, which I'll discuss in a moment. The issue with that tool, however, is that you can't place any of its outputs on the dashboard, right? 
So the idea here is basically that we can calculate those thresholds based on a formula similar to WHO's, or other formulas you want to use; there's quite a broad spectrum of statistical calculations you can use to calculate these thresholds. You might want to detect these on a routine basis, because you can insert the threshold into validation rules as well and ask: is the value I reported greater than or less than the threshold for that value, based on the formula I've entered inside of DHIS2? This slide is just showing a comparison against what would be considered an extreme-level error. The red bar is the reported value, the green bar is the threshold, and none of the thresholds are being exceeded in this case. But you're able to generate those thresholds and use them for a variety of purposes in the different visualizations we showed. You can show charts where you compare them, like I'm showing now. You can run validation analysis, like I showed on the previous slide. So there are a lot of ways you can use these thresholds through this predictive analytics feature that really enrich your ability to pluck out problematic data values and review information on a more timely basis. And especially, you can put all of this on the dashboard, because then you can train your staff to review that dashboard in a routine manner and investigate these values further. Now, a number of the features I've talked about are also covered by the WHO Data Quality Tool. Maybe you're aware of this tool or maybe you're not, but the WHO Data Quality Tool is a custom application built on top of DHIS2, and it only works inside the application itself. You're not able to extract any of the outputs the tool produces and put them on a dashboard; you have to go inside the tool and run the analysis each time. 
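One common shape for such a threshold, similar in spirit to the WHO approach, is the mean of the previous periods plus a multiple of their standard deviation; a reported value above that line gets flagged. A sketch with assumed data (the exact formula used by DHIS2 or the WHO tool may differ):

```python
# Threshold = mean of previous periods + k * standard deviation.
# A newly reported value above the threshold is a suspected extreme outlier.
def threshold(history, k=2.0):
    n = len(history)
    mean = sum(history) / n
    std = (sum((x - mean) ** 2 for x in history) / n) ** 0.5
    return mean + k * std

history = [110, 125, 118, 130, 122, 115]   # previous months (hypothetical)
limit = threshold(history)

print(round(limit, 1))   # 133.1
print(520 > limit)       # True: a reported 520 would exceed it -> flag
print(128 > limit)       # False: 128 is within the expected range
```

Stored as a calculated value, a threshold like this can feed a validation rule ("reported value > threshold") or sit next to the reported value in a chart, as on the slide.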
So the training requirement for this tool is a little high, right? We can't create outputs that people can share. Everything I showed up until now, we can put on a dashboard, we can put in guidance documents, and so on, and we can train people on it; it's probably a little easier to do. So where I would recommend this tool complements the rest is for these larger analyses, these larger reviews of data. Of course, you can train people to use it on a monthly basis or more routinely; that's completely possible. But if I had to prioritize capacity: this tool is quite large in scope and can do a lot of different things, but you have to know how to use it every time you log into it; there's nothing really preconfigured for you as such. So the WHO Data Quality Tool looks at four different measures, based on the WHO data quality framework: completeness and timeliness of data; internal consistency of our data; external consistency, which would be, for example, comparing your data with another source, maybe a DHS survey or a MICS survey or some other survey you perform data collection with; as well as consistency of population estimates. I know denominators are always a problem for basically everybody in the world, right? So it's also comparing your various denominators and seeing whether there's a problem there. Now, a lot of this we can do using the tools I already showed, but there is some additional functionality within this tool. So here's an example on completeness. We get scores; that is one thing unique to the WHO tool. I showed you examples of completeness and timeliness before, but in the WHO tool we get these scores based on the thresholds that are set. And once again, someone configures this application after it's installed on your DHIS2 system, so any of these thresholds can be modified, but we try to keep them quite high. 
Otherwise the score is maybe a little biased. So we get scores; in this case I've done it for a couple of different data sets, and we're getting an overall score, basically, to determine how good the reporting is on that data. This goes a bit beyond just looking at the reporting rates every month; it's actually giving us some type of measure of quality. And the same is true for timeliness. We set a threshold for what we think is timely; of course, we do this in DHIS2 as well, which is how we're able to obtain all the various charts related to timeliness. But here we get a score, once again, for each of our different data sets, which helps us determine what's going on: whether there's a really problematic data set where people are just not submitting the data on time. Now, for internal consistency: you might not be as familiar with this term, but really, something like validation rules is often measuring internal consistency. We're comparing two variables that are reported within the same system, essentially. Maybe you have two variables on your malaria data set or two variables on your ANC data set, whatever it might be, and you're comparing those variables. That's all this measure is really talking about. There are a number of features within the WHO Data Quality Tool that support the review of this specific principle, and once again, you get a score at the end for each of these values. I know it's quite hard to see; the slides are online, at least, so you can download them, but I apologize, the screen here is not as good. You get a score for all of this, and it gives you some more idea of what is happening with your consistency. And there are a number of consistency measures in there. 
So you're able to compare consistency over time, and you're able to compare different values and whether they're consistent with one another; there's a lot you can do. But once again, maybe this is less of a routine measure; it is harder to train people to do on a routine basis. Now, external consistency, this is another one I mentioned. Here I started with comparing two different denominators in the same system; this often happens, right, where you have two denominators for the same value, maybe one coming from one program and one coming from another. But the main example here is ANC1 coverage: this is comparing the routine ANC1 coverage from DHIS2 with a demographic and health survey, just seeing how far those values are from each other. Of course, you need the values from your survey inside of DHIS2 to perform this analysis, but you are able to take data from different sources and compare and contrast them to see if there's a wide gap. And if there is a large variation, then the question is which one is accurate. Often many people will side with the survey results, of course, but the idea is to show them that you're collecting data routinely, and there is some strength behind the data that you are routinely working with. This is a bit more difficult to set up, of course, because you need all that data from your surveys. But you can select subsets, the pieces of information you want to compare, to perform this analysis. And the last one is consistency of population estimates. Sorry, I kind of started talking about this earlier and then switched over. This is comparing denominators from two different programs: one from the HMIS, one from the EPI program, looking at population less than one year of age. We've often had this issue where, inside of DHIS2, there are many denominators. 
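The routine-versus-survey comparison boils down to a ratio between the two estimates, flagged when it leaves an agreed band. A sketch with invented coverage figures; the 10% tolerance here is an assumption for illustration, not the WHO tool's fixed default:

```python
# External consistency: routine coverage vs survey coverage for the same
# indicator and year. Flag when the ratio leaves the agreed quality band.
def consistency_ratio(routine, survey):
    return routine / survey

def is_consistent(routine, survey, tolerance=0.10):
    r = consistency_ratio(routine, survey)
    return (1 - tolerance) <= r <= (1 + tolerance)

# Hypothetical ANC1 coverage: 92% from routine DHIS2 data, 85% from a survey.
print(round(consistency_ratio(92, 85), 2))  # 1.08
print(is_consistent(92, 85))                # True: within +/-10%
print(is_consistent(92, 70))                # False: routine far above survey
```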
Malaria gives one denominator, EPI will give another, and they're all different; none of them are the same value. So which one do you use, and in which calculation? How can you be sure that your coverages or your incidence rates or any of your other indicators are correct? That is what this is trying to evaluate: is there a significant difference between all these different denominators, and if so, to what degree does this affect the quality of the different indicators I'm reviewing? Of course, it's better if you can collapse them and just have one denominator, but in many cases that's not possible. So there's a lot you can do with this tool, and we have a very long introduction and tutorials on how to use it if you are interested. But once again, like I said, because the capacity requirement for this is quite high, there are a number of other tools I've been showing you that are directly within DHIS2. And the advantage is, of course, that once you set those up and place them on a dashboard, it can be a little easier to train people to utilize them on a much more routine basis. If you're reporting your data monthly, you're able to review all of these different pieces of information on a monthly basis. You can still use the WHO Data Quality Tool on a monthly basis as well; it's just that the requirement to use it is a little higher. It does offer some features that aren't always available elsewhere, but a number of its features are already there in DHIS2 and can be placed on a dashboard, and, where possible, I suggest keeping things as simple as you can. So if there are any questions about these features, of course, I'm happy to have a small discussion now. I know I just showed screenshots and things like that; if you're interested in more detail on how they're actually used or how you implement them, feel free to talk to any of us and we can spend more time on that. 
But for now, I'll just ask if there are any questions about anything I've presented at the moment. Yes, go ahead. Thank you. This application, this WHO application, can it be downloaded and installed? Yes. So that's a good question. The WHO Data Quality Tool is just an additional application installed on top of DHIS2, so you can download it from the DHIS2 App Store; I can show you directly. But for anybody else who has that question: yes, you do have to configure the tool once you install it, but it works with essentially any DHIS2 system. In the configuration for the tool, you'd be defining what your outliers mean. What is an extreme outlier? Is it two standard deviations, three standard deviations away? What are your quality thresholds for things like reporting rates and timeliness? For all of those statistical measures that you're measuring against, the tool has some preconfigured notion of what they are. But you also have to determine which data elements and indicators you want to actually evaluate this against, and which data sets. None of that is preconfigured, because every DHIS2 configuration will have different metadata, essentially different configurations. So you have to tell it: what do I define an extreme outlier as? What are the threshold values? What are the indicators I want to evaluate? Which two items do I want to compare? It gives you some suggestions, but those are modifiable. Is this for iCapture as well? Yeah. You're from Lalli, I guess? Yeah. Okay, we use iCapture2 and Venumon2. So validation rules are a core feature inside of DHIS2. For iCapture it depends: if it's an event program, then there's a different functionality called program rules, and you can use those in a similar way to check your event data. For the aggregate data in iCapture, validation rules and notifications will work. 
If you're aggregating your event values, then you can also use validation rules and notifications. So if you want to compare numbers in your event programs, you can still use this functionality; it just depends whether it's individual data or whether you're adding up different events in your program. But you can still use it with iCapture data. Any other questions? You're discussing amongst yourselves, not a problem. Okay, so if there are no more questions, I'll hand it over to my colleague Venumon, and he'll go over some more practical considerations. Actually, there are some questions online, so maybe we can take those as well. Okay, so the question was about validation rules in Tracker. Once again, for the online participant who asked this question, there are two approaches. One: if you are comparing individual events, or just individuals, patients or something like this, then you would use a concept called program rules. But if you're aggregating that Tracker data, so you're taking the number of patients that meet a certain criterion and comparing it to the number of patients that meet another criterion, then you can use validation rules and validation notifications. It just depends on the data that you are analyzing at the time. So once again: for individual records, if you wanted to analyze something within each record, you would use program rules. If you're analyzing many records together, a number of some kind, then you would use validation rules and validation notifications. Okay, so thank you all for the questions and discussions. Right. Good afternoon, everyone. I know it's not the best of times to do a presentation, so while we set this up, feel free to drink some water. 
So what I'm planning to discuss in the next 15 minutes or so, maybe 20 max, is the data quality and data use experiences from Sri Lanka. Again, this is not just about data quality. And one thing I must emphasize is that I'm not going to explain any of the data quality features and functionalities that are in DHIS2, which were already covered. Rather, I will be talking about some practical information, some implementation experiences and challenges, things like that. If I start from where I left off yesterday in the presentation on immunization: in that presentation, we specifically talked about routine immunization, right? But then some of you approached me and asked about the implementation around COVID in Sri Lanka, because there were some articles written about how we started during the pandemic and how it was made possible. Some of that I'm planning to discuss today, though again, nothing specific to immunization generally. But I will mention why and how it was really made possible in Sri Lanka for us to roll out all these programs during the pandemic. It was mainly due to the various data quality and data use measures that we had been practicing over several years. So first of all, what do you think about data collection forms? How are data collection forms designed, and do you see any relationship between data collection forms and data quality? What do you think? Yes? No? Is it something totally irrelevant? I mean, in DHIS2 we configure data quality, validation rules and so on separately, and data collection forms we design separately; we design data elements and put them together. So they're two totally different things. Is that so? Yes, obviously, yes? But the thing is, that's not really the case, right? 
So in most of the countries that we support, we don't really collect data in digital format at the field level, isn't that so? And it's really good that we have most of our colleagues here from the Asia region, so most of the issues and challenges that we see are common to all of us, right? The way we implement, the challenges that we have, the complaints coming from the end users: these are very cultural and contextual, which is a good thing, because it means we can discuss them together.

So the thing is this: design forms in a less confusing way. When someone has to look at a paper form which is ten pages long, and we expect them to capture the data on that paper form and then write it up or submit it in a digital format, the way we design it always matters, right? We should minimize, and try to prevent, the errors that can happen. So I will be discussing some of the steps that we took a long time back; all the content I'm showing here is a couple of years old. What I want to highlight is that taking these kinds of measures a couple of years back really helped us build capacity at all levels, which made it possible for us to implement digital technologies during the pandemic, when we really could not reach out to the district and field levels much. We had some focal points, that's all, because we had to do things mostly online and rapidly.

I must also emphasize that the use cases I'm taking in this presentation are mostly related to maternal and child health and nutrition, and I must acknowledge those two programs in the Ministry of Health for their support in providing all the screenshots and content. So the first thing: sometimes we do an almost identical transformation when we are designing data entry forms.
This is mainly for the existing paper formats, which have been there for a couple of decades and are well established. When we try to introduce something digitally in DHIS2, we don't try to change things too much, because end users might get confused and there might be too much resistance coming from the field and district levels. You have to be a bit more strategic when you are dealing with well-established paper formats, right?

So for example, and sorry about this: I was going to apologize because the content is in our local language, but now I have to apologize twice, for the content plus the visibility. Basically, this is one paper format we had, about 15 pages long, but it was a well-established format. The content, the data items, had been finalized; public health experts have done this process over and over for decades, so we didn't have to change the data items much. But the way we customized DHIS2 mattered, because when you put this 15-page form into digital format, we should make sure people don't make mistakes. And we should also make sure we don't bore the people who have to collect and write these 15 pages and then enter them again into DHIS2; we shouldn't make it too difficult. So when we designed this form (I'm sorry, you may not really be able to see it), about five or six years back, we designed a custom aggregate form with vertical tabs. We introduced that so that, presentation-wise, the form was more appealing, and there was less chance of making errors when entering data.
Then sometimes we have systems whose workflows are not so well established, for example the mental health system and the school health data that we collect in Sri Lanka, which we digitized around 2018. In those cases we could actually revise the paper-based formats and make suggestions, because the ministry was accepting feedback, right? So first: this is the kind of form they had, with a lot of completeness issues. The main issue was: we have a form, but people don't submit it. But we have an MCH form, and people do submit that, and we had good completeness. So, they said, please put it into DHIS2, because the MCH people were submitting; let's do the same for mental health, and people will submit, right? But it's not really like that; there were some issues with the paper format itself. So what we first did was discuss with the ministry and change the layout and the structure in which they collect the data. And we also strategically made it in such a way that it was not too different from the data collection forms we were going to have in DHIS2. Of course, in DHIS2 we again have tabs; here we are using horizontal tabs, so that layout-wise it looks much better than a very lengthy form. But the data elements and data items are ordered the same way as in the modified paper format. And this again is something similar in the same form.

Sometimes we also get requests where we don't have paper formats at all, especially when we introduce new data forms. Then we have more opportunity to get actively engaged with the ministry in designing the paper format at the same time as we design the system. So we had this MCH indicator form which was newly introduced.
So for that, we could actually design the paper format ourselves; again, this was an event program we had to create, so we had the opportunity to design both the paper format and the DHIS2 customization at the same time. We had more flexibility, right? What I'm trying to emphasize is: when you are assigned a DHIS2 customization, or your job is mostly supporting DHIS2, try to have better engagement with the ministry and explain to them that it's not just digitizing content; there's more you can do to increase data quality overall. You have to optimize your paper formats, you have to optimize your workflow, and finally you have to think about DHIS2, right? Just applying DHIS2 validation rules will not do it on its own.

And also (this of course is from 2015), when we had to design some custom Android applications for collecting nutrition data, one request was that when capturing height and weight values, they wanted the values validated by comparing them with the WHO standard deviation values, with a visual prompt showing where the height or weight value stands against the normal criteria, right? So what we did was change the background color of the data entry field. Here you can see it is green, meaning it's probably between minus one and plus two standard deviations. We all know the growth records with these color bands. This was a request that came from the field level, and we had a lot more flexibility in designing the application, so we tried, at the point of data capture, to give a visual verification of where the data lies, to prevent any accuracy-related issues that can happen while capturing the data.
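As a rough sketch of that visual prompt, the form logic only has to map a z-score to a background colour. The cut-offs below are illustrative simplifications, not the real WHO growth-standard tables (which vary by age and sex), and the presenters' own form used its own bands, such as green for roughly minus one to plus two SD.

```python
def band_colour(z_score):
    """Map a growth z-score (standard deviations from the reference
    median) to a background colour for the data entry field.
    Cut-offs here are illustrative, not the WHO reference values."""
    if abs(z_score) > 3:
        return "red"     # far outside the expected range: likely an entry error
    if abs(z_score) > 2:
        return "yellow"  # borderline: ask the user to double-check
    return "green"       # within the expected band

# A weight-for-age z-score of 0.8 would show a green field;
# a typo producing z = 6.2 would immediately show red.
```

The point is that the check runs at the moment of capture, so the person entering the value gets the same feedback a colour-banded paper growth chart would give.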
The same thing we are highlighting there as a visual verification. So basically, what I've been summarizing over the last five minutes or so is: we should really target the paper formats, and we should pose minimal disturbance to service providers, because otherwise there will be a lot of resistance coming from the end-user level, right? We have to be strategic when designing forms: think about data quality, but also do it in a way that gets maximum compliance from the ministries. We also have to think about infrastructure-related issues: when we collect very lengthy forms, what happens when the internet is not there, or there are disturbances for five or ten minutes? And again, in ministry workflows we have approvals, right? In DHIS2 we also have data approval for aggregate data. We should use features like that too, not just design forms and capture data, because data approval is a really nice feature, which we have implemented in some programs, and it is really helping improve the data quality.

Right, the next thing is user trainings, okay? When you are conducting user trainings, and I know all of you do, what do you focus on? We all know that we focus on teaching people how to enter data and how to analyze data. These are core components, but data quality should not be left out. Sometimes I have seen proposals where, when you plan trainings, the first training is for aggregate data entry, then you have a series of trainings on aggregate data entry, then separate ones for analytics, and finally, after one and a half years, you plan for data quality, okay? Ideally it should not be like that. You should try to include some of the data quality concepts in the trainings themselves, even in the data capture trainings, right?
It doesn't have to be training on the WHO DQ app; simple things like running validation rules. Sometimes, because we are lazy in customizing systems, we try to launch something without putting in place things like validation rules, because that's easy for us. But if we try to do a proper job, I think we should design training programs that also cover some of the data quality features.

Next thing: user manuals, right? At least in the Asia region, I think user manuals are not a major issue, because in most of our countries, when we deliver something, a very big user manual is expected by default, at least 100 pages or something like that, right? So we all produce user manuals. I don't know how many of the end users actually refer to them, but one thing that we do, especially in this particular program, is maintain the user manual. It's a living document, mind you, and I don't know whether you can see it, but there is a version number, right? Ideally you should have a version number or some other way to show that it's not the same user manual you produced and shared with the ministry in 2019. So try to do it like that: maintain a proper user manual with updated content.

And then the SOPs. If you observe what's there: here is a checklist which we distributed to health facilities, which they are advised to keep at the desk where they enter data, or paste on the wall near the laptop, so that it highlights some essential items they have to check before submitting data, right? Sometimes, after conducting a training program, we take it for granted that people will just be entering data and doing all the validation checks, right? We can of course make it compulsory to run validation rules.
But, speaking at a very generic level, we need to have SOPs in addition to user manuals and user guides, and you have to make sure that people are practicing them, right? You need these kinds of checklist documents so that, when you enforce them, people learn to practice them. This is a single-page user guide. I think most of the documents and resources shared by UiO during the pandemic included these, but in Sri Lanka we had been practicing these single-pagers since 2017. It's a kind of quick guide, right? Not too much text, but a lot of graphics, kept next to the data entry desk so they can have a quick glance. If you have a new person, they don't have to go through the entire big user document; there are these single-pagers for each task. We have maintained them, especially in the MCH-related programs, for a very long time, and they really helped us during the pandemic, when we had to train people within a couple of hours of online training.

Same here. And of course we have used some remote desktop technologies, again for a few years now, because some facilities we can reach online, but reaching them on-site is terribly difficult, right? For them we use several remote desktop technologies, and that really helps, because sometimes, if they don't have support, they just submit; whereas if they have good support, they will feel like asking and getting things verified before submitting. Even for that we had SOPs, and this is a kind of user support mechanism. I know most of your countries have a user support mechanism, but we tried to make it more structured, with a couple of layers of support.
There is one layer which sits very close to the end users, then gradually a higher level where you escalate matters, and finally tier three, which is for very technical matters, public health and health informatics. CCP stands for Consultant Community Physician, which in Sri Lanka is a public health expert with a doctorate. These people may get a maximum of one or two queries per month, and then we have the health informatics experts; other than that, tiers one and two solve most of the user-related issues.

Right, and then we had, and still have, data review meetings. These meetings are super important, because this is where people discuss the data that they collect, and this is where the data actually becomes information. These should not happen only at the national level; they should happen at all levels. When it comes to MCH, maternal and child health, we have them at the PHM level (PHM stands for public health midwife), which is the field level, and then at all the sub-national levels, all the way up to the country level. We have these review meetings at different frequencies: at field level more frequently, monthly, but at country level biannually. What you are seeing here is a field-level review; this is what happens in these meetings. They discuss many administrative issues, and they discuss the data they have collected. But the problem is: how do we guide them? How do we make them aware of what they should be looking at in this meeting? So one strategy that we use is to provide support for the review meetings.
After deploying DHIS2 in the MCH domain, you can tell them, okay, these are the areas you should discuss. But going one step further, what we did was create some dashboards in DHIS2. If you can see these dashboard items, here: "MCH review 2021 slide 6", "2021 slide 7". What these people actually do in the data review meetings is prepare a PowerPoint and present it, so that everyone can see and comment, right? To facilitate that, we just created dashboards in DHIS2 with template favourite items, and they simply download them and put them into the PowerPoint. We did try to change them (you know: don't use PowerPoint, just use DHIS2), but it's not that easy, right? It's common in the region: when people have been practicing things for too long, change is not easy. But it was much easier for us to create these dashboards and ask people to download them and use them for the data review. So this is what we have been using, and they have been doing it for several years now, and it was super useful during the pandemic, which I will come back to, because they really knew what they should be doing.
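That "download the template favourites" step can even be scripted. The sketch below is hypothetical: the base URL and item UIDs are invented, and the exact image-export endpoint depends on your DHIS2 version, so check your instance's Web API documentation before relying on it.

```python
# Hypothetical helper for exporting dashboard favourites as PNGs for a
# review-meeting PowerPoint. BASE_URL and the UIDs below are placeholders;
# recent DHIS2 versions expose visualization images at an endpoint of
# roughly this shape, but verify against your instance's Web API docs.

BASE_URL = "https://dhis2.example.org"

def chart_png_url(visualization_uid, width=800, height=450):
    """Build the image-export URL for one dashboard visualization."""
    return (f"{BASE_URL}/api/visualizations/{visualization_uid}/data.png"
            f"?width={width}&height={height}")

# import requests
# for uid in ["mchRev2021a", "mchRev2021b"]:   # placeholder UIDs
#     resp = requests.get(chart_png_url(uid), auth=("user", "password"))
#     with open(f"{uid}.png", "wb") as f:
#         f.write(resp.content)
```

The images come out identical to what the dashboard shows, so the review-meeting slides stay consistent with the live data without anyone redrawing charts by hand.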
This again is another slide. These are the regional or sub-national level review meetings, which we hold on a quarterly basis, and then this is the national level. When it comes to the national level, most of the data is clean, right? So those meetings are mostly conducted to identify cross-cutting issues, and some of them are even targeted at seeing how we are aligning with the strategies, because by the time you reach this level, most of the data quality issues have been addressed.

And what happened during the pandemic? We talked about these field-level, sub-national, and country-level gatherings, big physical meetings, right? During the pandemic these were not possible, but we still had to provide the health services. So how do we ensure data quality, and how do we make sure people were actually using this data? Because we had all these measures in place, it was just a matter of changing where we were present. Rather than on-site, we shifted online, okay? And it was not so difficult, because we had all the slides required for these review meetings in DHIS2. People had laptops, they had the internet, and at least one person from the field level knew Zoom, so we didn't have too much trouble training someone on how to use it. Using these online communication services, they were able to conduct the same review meetings, in fact with much more efficiency and cost-effectiveness, because otherwise national or provincial people have to travel to the districts physically, which takes two to three days with the accommodation: the first day they travel, the second day are the meetings, the third day they travel back. So you lose three days for one day of work. Of course, some people don't like it, because now they can't travel, but it was really useful, and we were able to conduct these even during the pandemic.

This slide is from a research publication (the article source is mentioned), looking at how much nutrition data was entered during the pandemic. Here, from I think March to July, we are looking at one district called Jaffna, in another part of Sri Lanka, a post-conflict area, and at how many nutrition-related events were recorded in DHIS2. You see this sudden dip here, because that was a curfew, a complete lockdown, with no services provided. But the good thing is, just after the curfew, we went back to the new normal, and it continued through 2020. It was at a lower level, because health service delivery was somewhat reduced, with health staff engaged in so many other activities, but the data kept coming in, which was a good thing. That was made possible by all these efforts around data quality and data use; that's what we wanted to highlight, and more about it is available in this research article.

We also asked everyone to use our DHIS2 dashboards, but in this region we still have to have some physical publications, right? These are things ministries have been doing for decades, so they still have these paper publications, and they continue. The only change is that most of the content now comes from DHIS2. For us to do that, the quality of the data has to be at a very high level, but thankfully, with all these resources and the commitment of the health staff, we have been able to do so.

Right, the final two or three slides. Now, shifting from all this work that was done in the past to the current context: we are talking about COVID-19, right? As some of you are aware, Sri Lanka has one of the largest COVID-19 Tracker programs, because from day one we pre-registered virtually the entire adult population into DHIS2. We have a population of 21 million; out of
that, initially, 16-plus million people were pre-registered in DHIS2 from day one of the vaccination program, right? So it's one of the large-scale Tracker programs; the largest in Sri Lanka at that time, anyway. After implementing this, with all these millions of vaccination events and attributes, we did a review to identify some aspects of data quality. This was done a couple of months back, in fact in May 2022. What we found was that we had overall completeness of 87 percent in our Tracker program, and of course the completeness varied between the doses: for the first dose it was 89.8 percent at that time, second dose 83, and third dose 78. This was actually comparing the number of doses actually given, which comes from a different source, not from the Tracker, with the data that was reported in the Tracker. So completeness was around that.

Then, interestingly, we also looked at timeliness, right? In this large-scale Tracker, we looked at how much of the data was entered in real time. What do we mean by real time? In the vaccination center, people are actually entering data into the Tracker program. Mind you, these are mobile vaccination centers, not always health centers; they may be in temples, schools, places like that. We noted that close to 30 percent of the data was entered on the same day. This analysis is not actually coming from DHIS2; if you want to know how we did it, you can contact Chathura, our lead developer, because we extracted the data from the database itself. So 30 percent of the data was entered in real time, the same day, and, 30 plus 10, at least about 40 percent was entered within two days. What does that mean? Probably that this 10 percent captured data on a paper format, went back to the health center, and entered it there. But that was still good data, coming within 48 hours. So we had about 40 percent of the data coming within 48 hours, which is okay for a Tracker instance built for the context of the country. Altogether, if you look at the data that was in the system within two weeks, it went all the way up to 66 percent; we have 40 here and another 25 or so there. So two-thirds of the data in this Tracker instance was in DHIS2 within two weeks of the actual vaccination event, which I believe was very good given the context of the country, with the lockdowns and all the resource limitations. But we definitely had challenges: some facilities had very low resources and infrastructure, and so many other administrative challenges, and there the data took a very long time to reach the DHIS2-based system.

So I also encourage all of you to do some data quality assessments around Tracker implementations, because the thing is this: there is a strong drive from partners and ministries to go for Tracker, but ensuring data quality and actually using the data in Tracker is something you will have to think about, and we don't have much evidence and research available on the Tracker side, right? I know all of these countries are implementing Tracker, so it's something you can think about too. I'm going to stop there. Any questions?

Thank you. We'd like to ask about duplicated data: how do you find that problem? And if you do find duplicated data, how do you solve it? Because with Tracker, we have experienced that sometimes one tracer inputs the same data two or three times.

Yeah, good question. In fact, one of the reasons we wanted to do these kinds of analyses was to identify exactly these issues. So again, in these 20 million records we have identified some data which are duplicated. Sometimes it's not really a DHIS2 problem, because we also found out, and one good thing, again very contextual, is that in Sri Lanka we have a national ID, which everyone has, right? When we use the national ID as a unique identifier, there is very limited chance that the same person can be registered twice. But then again, we had challenges beyond setting a unique identifier in DHIS2, because there were some identity-related issues with the existing systems as well, where we found some duplicates. In addition, there were some areas where we identified that there had been some manipulation of data. So the more we understood that there are duplicates, the more effort we had to put into measures like audits, to see how the data had been manipulated, when it was changed, who updated it, things like that. Out of all this data we have identified some duplicates, and Chathura and the team are currently working on identifying these duplicates and getting rid of them. So one good thing: when you are setting a system up, you should always try to foresee that there can be duplicates, and try to implement measures like unique identifiers. If you are working in a country where one unique identifier is not possible, then one option is the technical aspect, where you can use a combination of identifiers; the other is to train people and put measures in place, like regular supervision, to identify any duplicates. But this is a challenge, and I think this is one major discussion happening even at the University of Oslo: how are we going to deduplicate data in Tracker? There's a lot of work ahead. Thanks for your question. Any more questions? If not, thank you so much.

Thank you very much. Good, and I hope nobody will be sleepy, because after lunch we are always sleepy, okay? The first time I introduced myself, I know, we had all met in the meeting room or at the restaurant, but
today I will be introduction myself my name is Lien I'm from Indonesia my role in Indonesia is support my teams for implementation there is two or a coordination with the Ministry of Health and then make the planning for implementation program so this is the one program have a good story and bad story so I just want to share because we know today three day we discuss more technical and now I just want to share not very technical just the story how we find like before I say duplicate data how we can solve that we use the unique IDT and then the many feature in the DS2 a very helpful okay this is the the outline I hope everyone will receive but the first time I'll be tell about the background why coffee in Indonesia use the DS2 and then the second the indicator how much the indicator how much the metadata we use because you will be fine good story because we have many metadata many indicator but we can single play more simple to the indicator and then problem of the data for many we implementation this DS2 for the coffee the three years and we find many things about the what Nick's explained before and the story telling about the descent pump is a good once experience because we use that okay and then system contact tracing a little different with the yeah many our friend in here like Sri Lanka or or the order country because we customize the DS2 combine with the other application and integration with the many system in Indonesia ministry the order ministry okay the first story we know copied change everything in Indonesia almost all will be pushed to digital we meet in meeting and we use zoom on something like that so in November 2000 I think is 2002 is 2002 years ago I say two years ago yeah this November two years ago we starting with Excel but not very helpful because we wanted tracing we analyze many things with Excel yeah Excel powerful but if you combine with the many data like the home vaccination and then how many people close contact because there's two in 
Indonesia use for the close contact how many for the child something many so that's why we use the DS2 because it's the more easy to install and customize that's the starting first and then we create many model application the first we use trace tracker capture sorry tracker in the DS2 in we have 10,000 pushkasmus we call pushkasmus a public health office in there in Indonesia 10,000 and then you can calculate uh calculate I think it's 50 50,000 user the first time so many user can use DS2 the first time because they they say user there's something like that so like the before it's like I said we design the form we design form we create on the system more the simple what they can use when you can use so but the DS2 DS2 will be the back end so everything data from the front end we create with the if technical is Reaction S and then for the mobile we use the water push to DS2 for analysis that's the one story with DS2 you can customize many things you can create as the back end analysis data okay so our user we have a for for the national this is not only Ministry of Health but we have the top level like the our president to use that the data because they want to make the this isn't making the policy brief and then the order ministry like the economy mystery to because they need calculation how many money they will be spent for the vaccination or the program so the national use the DS2 not only in Ministry of Health and then we have a user district we have how many must be 80,000 district and every district have a two person one for the data person one for program person so we have the user and then public health office what I remember what I say before we have 10,000 public health office in Indonesia for the 37 before 35 province and then we have the volunteer is the because covid is Indonesia is very worse we have many case so we have volunteer from the police for the military for the academician main for university so the calculation our user is if you look uh so 
many until now we have 200 I think 200 and for the active user you can look in 18,000 user active now this the base today I get the data today and yeah before we uh capture the user use the use the order system because we need the understanding how many people and every province if this is the related with the planning for the budget so it is just story before we go to data quality because uh this the point the point the story is how we handle for many user 80,000 per second so access the our application okay and go so this is the process the in here we have the treasure the treasure is someone collection data in the field is it the who person in close with uh how with the people uh positive covid close contact the close contact we have the half worker every public sector we have the gather is the name of the volunteer for every village in Indonesia so we have one gather for every village so they have to collection the data the person and then we have a military we have a police and military have to collection the data almost two years and then we have the other polinter like the from the school university and more so they put the college sorry the input data with the other application but to push to the s2 the application uh this is we create with the for the mobile so they can use the phone the on phone or we try with you sorry we try use the whatsapp before but not very effective for data quality so we done for the whatsapp because many people just put and wrong always wrong for the for the data okay and we have the second layer this is more more like the people have decision maker so we have then public health officer in the district level and sorry public health office uh the first people will be review the data that will be the next slide how we maintenance or how we review data with them and then we have the district health office district health office only can access the data analysis so I want to show this because we have a problem with the user role 
Before, we gave every user access to view and edit, and that was terrible: anyone could delete the analyses, the dashboards, everything. So we split the roles: tracers can only input data, and the second layer can only view the visualizations. That is what we did in Indonesia, because we found that problem the hard way. The province health office can see only its districts' data, and then the Ministry of Health sees everything — and so on for every level, from the province down to level four, the public health offices. OK — we have 54 indicators, that's correct, 54 — and then we have the metadata.

But now let me share the problems we found. This relates to what Nick mentioned before about using validation rules, or maximum and minimum checks. First, we found there were too many variables to collect, because at the start they wanted the person's full identity — name, age, all of it. We know we need that for deeper analysis, but what was the solution? Next slide. Second, identity-number typing errors — the same issue our friend from Sri Lanka mentioned: errors because people type the number wrong. Third, manually collected phone numbers, which we need because if someone is positive we call them and go to their home. Fourth, manual completion of cases: in Tracker Capture, as you saw in this morning's session, a case is marked complete or not complete, so we wrote a script to complete that data automatically. Fifth, data completeness: this was a real problem, because one tracer had to find 14 people in one day — that was two years ago. And finally, many systems recording the same data: we found five systems collecting the same data, even inside the Ministry of Health. So what were our solutions?
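The duplicate-records problem described here — the same person entered several times, sometimes under different ID numbers — can be screened for with a simple normalization pass even before any registry integration exists. This is only an illustrative sketch in Python, not the team's actual pipeline, and the record fields (`name`, `birth_date`, `national_id`) are hypothetical:

```python
from collections import defaultdict

def find_possible_duplicates(records):
    """Group case records that share a normalized name and birth date.

    Records that collide on (name, birth_date) but carry different
    national IDs are likely the same person entered twice and should
    be flagged for manual review.
    """
    groups = defaultdict(list)
    for rec in records:
        # Normalize: lowercase, collapse repeated whitespace.
        key = (" ".join(rec["name"].lower().split()), rec["birth_date"])
        groups[key].append(rec)
    # Keep only the groups that actually contain more than one record.
    return {k: v for k, v in groups.items() if len(v) > 1}

records = [
    {"name": "Budi Santoso",  "birth_date": "1990-02-01", "national_id": "3171"},
    {"name": "budi  santoso", "birth_date": "1990-02-01", "national_id": "3172"},
    {"name": "Siti Aminah",   "birth_date": "1985-07-14", "national_id": "3271"},
]
dupes = find_possible_duplicates(records)
```

In this toy data the first two records collapse to the same key despite the different casing, spacing, and ID, which is exactly the "same name, different unique number" pattern the speaker mentions.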
The first is data validation. For unique identity we integrated with another ministry — the one that runs the civil registry — so we do not need to input all the personal data like the name: people just enter the unique identity number and we pull everything else — the age, where they live, and many other things — from the registry. At first this was complicated, because it is a different ministry, so we needed something like an NDA for the integration, but it was very helpful in solving the typing-error problem with the unique ID. It also addressed duplication: we found not just duplicates but people entered four times under the same name with different unique numbers. So we integrate with the other ministry to validate the data — if the record already exists on the server, it cannot be entered again.

Then validation rules. This is particular to Indonesia: many users struggle with digital literacy, so they were always getting date formats wrong — the birthday, and the date they were last in contact with someone positive. Integration was one solution, and the second is that DHIS2 has validation for date fields too, so we use that. We also have a validation rule for text format — if you visit Indonesia you will find many people typing phone numbers as text, like "abc" and so on. So we use a validation rule so the phone number field accepts only digits. With 80,000 users there was a lot of unclean data — I think it took six months, and we have many stories. And as I said, fields that we want as numbers would come in with symbols and other characters, so we use validation rules for those in Indonesia too.
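Rules like these — a birth date that must match a fixed format, a phone field that must contain only digits — are configured inside DHIS2 itself, but the logic can be prototyped as a quick script first. A minimal Python sketch, with illustrative field names that are not from the real system:

```python
import re
from datetime import datetime

def validate_case(case):
    """Return a list of validation errors for one case record.

    Mirrors two of the rules discussed: birth dates must use a fixed
    YYYY-MM-DD format, and phone numbers must contain digits only.
    """
    errors = []
    try:
        datetime.strptime(case["birth_date"], "%Y-%m-%d")
    except ValueError:
        errors.append("birth_date: expected YYYY-MM-DD")
    # Digits only, with a plausible length window for phone numbers.
    if not re.fullmatch(r"\d{8,15}", case["phone"]):
        errors.append("phone: digits only, 8-15 characters")
    return errors

ok  = validate_case({"birth_date": "1990-02-01", "phone": "081234567890"})
bad = validate_case({"birth_date": "01/02/1990", "phone": "abc123"})
```

The second record fails both rules: the date uses the wrong format and the phone field contains letters — the two error patterns the speaker describes.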
OK — this is a video, I think; it's OK, I will just share what we do in the application. This is the validation rule, the same as I said before. Can we go on? OK, the cleaning method. I want to relate this to what Nick said before and to what our friends have done. We have visualizations to analyse the data — one example: the first indicator is how many confirmed COVID cases there are and how many close contacts. What I want to highlight is how we combined the process: one part is training, one part is the SOP, and one part is the method in DHIS2 for the data, like outliers and so on.

We have a help desk: a person in charge in every Puskesmas who validates the data. They are responsible for checking all the data before we publish it to the public; one week before publication they do the analysis, and the help desk checks for anomalous data. The first time this was very hard, so every two weeks we met with them online to discuss the data — we have 35 provinces, so it was hard, but we did it — and now they can do it alone. We take a project-based approach, because if we just give them theory or a manual book, it is not successful in Indonesia — they will never read it. So we give them a project, mentor them in online sessions, and check the data together: for example, how many outlier values there are, and how much of the data is incomplete. For completeness we built a table showing how many cases have no close contacts recorded. They check, against the data they entered, how many contact-tracing entries each case has; if it is zero or empty, it means the data has not been entered yet. We did that together in the online meetings.
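The completeness table described here — flagging confirmed cases with zero close contacts recorded — boils down to a very small check. A sketch in Python, assuming a simple mapping from case ID to a count of contact-tracing entries (the IDs and data shapes are made up for illustration):

```python
def cases_missing_contacts(contact_counts):
    """Return case IDs whose close-contact count is zero or missing.

    contact_counts maps a case ID to the number of contact-tracing
    entries recorded for it; None means nothing was entered at all.
    """
    return sorted(
        case_id
        for case_id, n in contact_counts.items()
        if n is None or n == 0
    )

counts = {"CASE-001": 3, "CASE-002": 0, "CASE-003": None, "CASE-004": 1}
to_follow_up = cases_missing_contacts(counts)
```

The flagged IDs are exactly the rows a help-desk reviewer would open in Tracker Capture to decide whether the case truly has no contacts or the data simply was not entered yet.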
So this is what we do in Tracker Capture: if a case should have close contacts but none are recorded, we check in Tracker Capture whether they really have none, have not yet entered them, or something went wrong in the system. We repeat the process, and now they can do it alone — and they can teach the next person if someone moves to another role.

I think I am almost done. After the data is cleaned, it is published every week to an open dashboard. You can access it — we did not close it off — so everyone can look at the COVID situation in Indonesia. This is from the Ministry of Health, joined with many other datasets in the Ministry of Health, like vaccination. And this is the last part, data use: like our colleague from Sri Lanka mentioned, one week after publication we classify which districts are level one, level two, level three, and level four, based on the data from the system. I think that is the last slide — thank you very much. I hope we can come back to the integration questions later. OK, thank you.

OK everyone, we'll take our break. I believe the tea is in the other, larger room, so head over there, grab your tea, and we'll come back in 20 minutes.