WHO defines four dimensions, or domains, of data quality, which you see presented here: completeness; internal consistency, which means, do the data agree with each other; external consistency, do the data reported by health facilities agree with data from other sources, such as a health facility survey; and finally, a review of the consistency of the denominators, the population estimates that are used to calculate coverage. I'm going to review each of these four domains in turn.

You see here a chart showing the trend over the last 12 months in reporting completeness for two different data sets. With a single chart like this, it's possible to show the trend for multiple data sets. Completeness, the reporting rate, is the most fundamental aspect of data quality. Before you attempt to interpret any trend in the data themselves, you have to look at the trend in the percentage completeness of the data.

Internal consistency, what do we mean by that? You can think of internal consistency as the apparent accuracy of the data, comparing one aspect of the data with another aspect of the same data set. There are several ways of measuring internal consistency; a way of measuring is referred to here as a metric. The first metric is consistency over time: are our data consistent from month to month? The second metric is consistency between indicators that are related, and the commonest example of this is to assess what we refer to as the DPT dropout rate: are the reported numbers of first doses of penta vaccine consistent with the numbers of third doses of penta vaccine? We'll provide some other examples of related indicators. And finally, there's a form of internal consistency which can only be measured by visiting the health facility, as I've mentioned: the consistency between clinic registers and reported data.
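The completeness calculation behind a chart like this is simple arithmetic. Here is a minimal sketch; the function name and the figures are illustrative, not taken from the WHO tool:

```python
def reporting_completeness(reports_received: int, reports_expected: int) -> float:
    """Percentage of expected monthly facility reports actually received."""
    if reports_expected <= 0:
        raise ValueError("expected reports must be positive")
    return 100.0 * reports_received / reports_expected

# e.g. 92 of 100 expected facility reports received in a given month
print(round(reporting_completeness(92, 100), 1))  # 92.0
```

Plotting this value per month, per data set, gives the trend lines shown on the chart.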
And in that way, we can measure something called the verification factor.

This is an example of the type of chart we can use to look at month-to-month consistency of data, in this case the number of reported third doses of penta vaccine. Each of these lines represents the trend over the 12-month period in penta three doses for one region. The blue line, region one, has this hump here which, unlike the data from the other regions, suggests a certain inconsistency from month to month: the value has jumped up in a way we don't quite see with the data from the other regions. However, when we're looking at data at the level of a region, we can't be sure whether this is due to a data quality issue or whether it represents an actual increase in services.

But look what happens when we look at these trend lines for individual districts. Here we see data from the same country and from region one, but in this case there's one line for each of the districts of region one. There's a certain amount of instability from month to month in the indicators, but look at this line for district number 12, where in the month of June the reported third doses of penta more than doubled and then dropped back down again. This is highly suspicious. The lesson here is that when you're looking for month-to-month inconsistency, it's best to look at data disaggregated to the level of the district; then you're more likely to pick up these quite suspicious numbers and have confidence that it is a data quality issue. In fact, with DHIS2 it's possible to drill down to an even lower level and identify the specific health facility which has reported the suspicious number.
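A simple way to screen a 12-month series for the kind of jump seen in district 12 is to flag any month whose value far exceeds the median of the series. This is only a sketch under an assumed threshold, not the WHO tool's actual method:

```python
from statistics import median

def flag_outlier_months(monthly_values, threshold=2.0):
    """Return the indices of months whose reported value exceeds
    `threshold` times the median of the series -- a rough screen for
    month-to-month inconsistency in a 12-month trend."""
    m = median(monthly_values)
    return [i for i, v in enumerate(monthly_values) if m > 0 and v > threshold * m]

# Illustrative district series: the June value (index 5) has more than doubled
penta3 = [80, 85, 78, 90, 88, 190, 84, 86, 82, 87, 89, 85]
print(flag_outlier_months(penta3))  # [5]
```

The median is used rather than the mean so that the outlier itself does not distort the baseline it is compared against.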
And I think you would agree with me that when we look at these lines for each of the doses of vaccine for health center number two of the same district, in the same region, we see that this jump in the value for the region is actually due to a single number reported by this health center in June. We can also see that this is almost certainly an error, a data quality issue: not only has the typical penta three value of less than 100 jumped to 3,749, but this health center has reported normal, consistent values for the other data elements in the same month. So here we see an example of an erroneous value reported by one health facility.

This slide shows the table generated by the WHO data quality tool to automatically identify facilities and values like this, values that are extremely suspicious and almost certainly due to errors entered into DHIS2. In this case, it has sorted these values and put at the top of the table the 12 months of data for Coway dispensary on penta vaccine doses given in females under one, dose three. It has found this number to be the most extreme and most suspicious value in all of the immunization data sets. We'll practice with this WHO data quality tool and see how it can be used to rapidly and automatically identify such outliers.

The next type of internal consistency we will review is, as I said, consistency of related indicators. Here are some other examples of related indicators, in addition to the dropout rate between penta one and penta three. We could also compare the values of first doses of penta to the values of first doses of OPV vaccine. Both of these are typically administered at the same visit, so we expect the values of these two indicators to be roughly the same.
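The two related-indicator checks described so far can be sketched in a few lines. The function names and the 10% tolerance are illustrative assumptions, not part of the WHO tool:

```python
def dropout_rate(dose1: int, dose3: int) -> float:
    """DPT/penta dropout rate: the percentage of children who received
    the first dose but not the third. A persistently negative rate over
    a full year is usually a sign of a data quality problem."""
    return 100.0 * (dose1 - dose3) / dose1

def roughly_equal(a: int, b: int, tolerance: float = 0.1) -> bool:
    """Check that two related indicators (e.g. penta 1 and OPV 1,
    typically given at the same visit) agree within a tolerance
    of their average."""
    return abs(a - b) <= tolerance * (a + b) / 2

print(round(dropout_rate(1200, 1100), 1))  # 8.3 -- positive, plausible
print(dropout_rate(1000, 1150) < 0)        # True -- negative, suspicious
print(roughly_equal(1040, 1000))           # True -- same-visit doses agree
```

A positive dropout rate simply reflects children lost to follow-up; it is the negative values, where more third doses than first doses are reported, that point to a reporting error.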
And then a final example: data on confirmed cases are frequently reported both in the monthly outpatient department report and in the malaria lab test report submitted each month. So we could compare confirmed outpatient cases to the total number of positive RDT and microscopy tests reported in a given month.

The slide here shows how the WHO data quality tool analyzes the dropout rate between DPT one and DPT three. It generates this type of chart, showing that there is one particular district which had a negative dropout rate, which, if reported for a full year, is usually a sign of a data quality problem.

External consistency. I'll just quickly review these last two dimensions, which aren't a major focus of this workshop. Here's an example of how the WHO data quality tool has compared values reported in a household survey, in this case a demographic and health survey. The antenatal care first visit coverage reported in the survey is compared with what was estimated from the data reported by health facilities. In this case, there's a striking difference between what is called the routine estimate of coverage, calculated with health facility data, and the estimate based on the household survey.

And finally, consistency of population estimates, or denominator estimates. There are a couple of things that should be assessed about the denominators. One is whether they are consistent from year to year. The second is whether related denominators, such as the number of pregnancies, the number of live births, and the number of infants, are consistent with each other. Here's a chart giving an example of a country which has not had year-to-year consistency in the growth of the number of surviving infants, the population under one. In fact, that estimated population under one is seen to have dropped from 2015 to 2016, and then increased dramatically from 2017 to 2018.
So this estimated value is not showing year-to-year consistency. And similarly, if you think about it, it is a bit unusual to find that the estimated number of surviving infants in 2018 and 2019 was actually greater than the estimated number of live births. That really should not happen at national level, and it suggests that we have some errors in the denominators used to calculate coverage.
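The related-denominator check just described amounts to a simple plausibility rule. A minimal sketch, with an illustrative function name and made-up national figures:

```python
def denominators_plausible(live_births: int, surviving_infants: int) -> bool:
    """Related-denominator check: at national level, the estimated
    number of surviving infants (population under one) should not
    exceed the estimated number of live births."""
    return surviving_infants <= live_births

# Illustrative national estimates
print(denominators_plausible(1_500_000, 1_450_000))  # True  -- plausible
print(denominators_plausible(1_500_000, 1_550_000))  # False -- suspicious
```

The same kind of ratio check can be applied to other related denominators, such as estimated pregnancies versus estimated live births.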