So this paper is entitled The Political Economy of Bad Data. We're shifting focus from GDP statistics to the social sectors. I'm going to talk primarily about educational enrollment statistics and health, specifically immunization rates. Before I get to the grand motivation and pitch, what I'm actually going to present today is something very simple: lining up household survey data sources with administrative statistics, looking at where they disagree, and asking whether we can learn anything from the patterns of those disagreements. What do we learn about the determinants of bad data from the discrepancies between household survey data sets and administrative data sets? The big motivating policy pitch, if you want one: you will all have heard that the UN High-Level Panel on the new MDGs and the post-2015 agenda has come out with a call for a data revolution, and one way that data revolution is being made concrete is a push for a new round of global, harmonized household surveys that will monitor the new MDGs around the world. I think there are a lot of good ideas out there, but survey and administrative data sources serve very different purposes and different needs. We want to understand why they disagree before we assume that more survey data is going to be the answer. All right, so our framework here is a little cheeky, but we start off by contrasting the data demands of international aid donors with the data needs of an African state, and I'm going to be using samples of African countries here. Then I want to talk about where the discrepancies lie, and I want to divide this into two, I think, very distinct phenomena that often get lumped together.
The first half is going to be about getting fooled by the state, and I think this is what we tend to think of when we think about bad statistics: that somehow the government is misleading us, that national statistics offices are lying about the results, or through some lack of technical capacity have screwed up the results and are foisting bad numbers on us. I think there are cases where we can make that argument, and I'll point to immunization statistics and cases of consumer price inflation. But I think there's a second phenomenon here that we need to pay attention to, which is the state itself being fooled. We need to question the assumption that African states have the ability to get reliable statistics on the questions they themselves want to answer. So I want to point to two cases: first, a case study from Tanzania in agriculture, asking to what extent the Ministry of Agriculture and the National Bureau of Statistics can actually get a grip on what maize output is, and why there might be incentives at the lowest level, among agricultural extension workers in the field, to misreport up the chain; and then, more systematically, a cross-country analysis of school enrollment, asking what the move, with the thankfully widespread abolition of user fees, to a top-down funding system does to the incentives for truthful reporting of school enrollment statistics. All right, okay. So, very stylized: seeing like a donor. A series of papers from the twilight of the popularity of conditionality in aid programs, Svensson in 2003 and Azam and Laffont in the Journal of Development Economics in 2003, presents aid conditionality in a principal-agent framework. This is a simple moral-hazard principal-agent model. The donor, the principal P, offers the recipient country's government a contract to help the poor, but the donor can't observe the policy effort, so we end up with a moral hazard problem.
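For concreteness, the kind of model these papers work with can be sketched roughly as follows; the notation here is illustrative, not taken from Svensson or Azam and Laffont:

```latex
% Illustrative moral-hazard sketch (notation is mine, not the cited papers').
% Donor P chooses a transfer schedule t(y) conditioned on a measured
% outcome y (e.g., reported poverty reduction); the recipient government
% chooses unobserved policy effort e at cost c(e).
\begin{align*}
\max_{t(\cdot)} \quad & \mathbb{E}\left[ B(y) - t(y) \right]
    && \text{donor's payoff from outcomes, net of aid} \\
\text{s.t.} \quad & e^{*} \in \arg\max_{e}\; \mathbb{E}\left[ t(y) \mid e \right] - c(e)
    && \text{incentive compatibility} \\
& \mathbb{E}\left[ t(y) \mid e^{*} \right] - c(e^{*}) \;\ge\; \underline{u}
    && \text{participation}
\end{align*}
```

The point for this talk is simply that the contract conditions on the measured outcome y, so everything hinges on y being reported honestly. That is where the data demand comes from.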
What does this seeing-like-a-donor, principal-agent view of the aid relationship demand in terms of data? Well, we need independent verification of results. Clearly, you can't just rely on the agent's self-report of how things are going. And so people who are still pitching this contractual model of doing aid, like my boss in Washington, Nancy Birdsall, are advocating paying for results in aid. And if you're paying for results, you're going to need really reliable, independent verification of what's happening on the ground. So maybe more household survey data, an expanded DHS maybe. And that means more high-quality, harmonized household survey data on poverty, child mortality, learning, all the things we want to measure under the Millennium Development Goals. If we look at things from the state's side, though, the state's task is very different. This is drawing on another, parallel, long tradition in the analysis of African political economy, from Herbst to van de Walle to Besley and Persson. I'm not going to go deeply into these papers, but I think all of these analyses would motivate our question: whether it's reasonable to assume that African states, even under some incentive contract for results, can necessarily implement the desired reforms, and can necessarily even get the data they need to implement those reforms. The data policy implications: what does an African state need? What does a ministry of education need to run a school system? What does a ministry of health need to run an immunization campaign? They need administrative data linked to lower units of political accountability. They need district-level, clinic-level data on what's happening in terms of immunization in order to make the program go.
And there they run into huge problems of incentive compatibility: getting headteachers to tell them truthfully what's going on, getting clinics to report accurate information. So, two very different data problems here. A caricature version of this: seeing like a donor, you can look up cross-country databases. This is primary net enrollment; I've lost the legend here somehow, but darker blue is a higher net enrollment rate. You can compare net enrollment rates around the world. And this is very handy for me sitting in Washington if I want to run a cross-country regression, or, my wife works at the World Bank, if she wants to make allocation decisions between two countries, you've got comparable data for neighboring countries. How is Tanzania faring compared to Uganda? This is very useful. But if the Kenyan government wants to manage its primary education system, it's going to need enrollment data at a much higher level of disaggregation. And this is what the Ministry of Education in Kenya actually uses: school-level data, collected through an administrative data system on a quarterly basis for all 20,000 primary schools in the country. Enrollment for each and every school. On that basis textbooks are allocated, teachers are allocated, and so on. As it happens, if you want to worry about citizen accountability, there's been a lot of fanfare around Kenya's open data portal, and you can download all this administrative data there. This on the right is what you get, the administrative data coming out of the EMIS system. The trade-off here, unfortunately, is that while this is much more useful for subnational policymaking, I'm going to try to convince you in a minute that it's completely wrong.
At the national level, we have data that's less useful for Kenyan policymakers but, I think, much more reliable. There's a trade-off here between accuracy and that degree of disaggregation. Another example of this trade-off: Tanzanian agricultural statistics, where you've got three choices. You've got the Ministry of Agriculture's routine data system: every village in the country, every year, collected, until last year, with no questionnaire. Agricultural extension agents just write a note and pass it up to the district official. No systematization whatsoever. You've got a much larger survey run by the Ministry of Agriculture: a quarter of the villages in the country, every five years, a 25-page questionnaire. And then you've got something I was once employed in Tanzania to run, a panel survey with a very small sample and an extremely long questionnaire. Highly technical, a hugely capital-intensive, hugely skill-intensive enterprise with a PhD economist running the show. But when the minister asked us for regional numbers on crop output and I said, well, how about a national number, he kind of lost interest. All right. So we face these trade-offs. Contrary to what you'll find in US Census Bureau material or OECD manuals, where the worry is that survey data has sampling error, I think for most low-income countries there's a strong case that the accuracy and reliability of the survey data is just much greater than that of the administrative data. With something like the DHS, which we're going to talk about, you get much more detail in the questionnaire and much more cross-country comparability, but you trade off frequency and subnational disaggregation. All right, now let's turn to the comparison of these two sources to see what we learn from the disagreements.
Okay, immunization and pay-for-performance aid. Starting in 2000, the Global Alliance for Vaccines and Immunization, GAVI, offered a very simple contract scheme; it was as if they had read the Journal of Development Economics. GAVI paid eligible countries $20 for each incremental child immunized with the third dose of the diphtheria-tetanus-pertussis vaccine, DTP3. So this scheme comes in in 2000. We're building on prior work here; this is not a unique discovery of ours. Lim et al. in The Lancet in 2008 compared the DHS numbers with the numbers countries were reporting under this scheme and said, look, there's something awry: countries are reporting numbers that are much higher than the DHS is showing. We wanted to push this a little further than just the African sample and do something a little more econometric. We've got 91 DHS surveys across 41 African countries, before and after this 2000 policy comes into place. So we can compare changes over time in the vaccination rate in the WHO administrative data versus the DHS survey data, before and after the 2000 policy comes in, and compare DTP3, where the incentives were in place, with measles as a control disease; there was no incentive scheme for measles. And if you care, this is the regression: the change in the vaccination rate from the WHO for each country, disease, and time period on the left-hand side; that same statistic from the DHS on the right-hand side; controls for before and after 2000 and for which disease it is; and the interaction of being after 2000 and being the incentivized disease. In pictorial form, this is the ratio of WHO to DHS vaccination coverage. If you're above one, your administrative data shows something higher than your survey data. And the ratios are centered around one for measles, which was not incentivized.
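That interaction regression can be sketched on synthetic data. Everything below, variable names, effect sizes, noise levels, is illustrative rather than the talk's actual data or estimates; the point is just the mechanics of the interaction term:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic country-disease-period panel of coverage changes (illustrative).
n = 400
dhs_change = rng.normal(5.0, 2.0, n)    # change in DHS-measured coverage
post2000   = rng.integers(0, 2, n)      # 1 = period after the GAVI scheme
dtp3       = rng.integers(0, 2, n)      # 1 = incentivized disease (DTP3)

# Admin (WHO-reported) change tracks the survey change, plus a simulated
# extra exaggeration of 4.5 points that applies only to DTP3 after 2000.
who_change = dhs_change + 4.5 * post2000 * dtp3 + rng.normal(0, 1.0, n)

# OLS: who_change ~ const + dhs_change + post2000 + dtp3 + post2000:dtp3
X = np.column_stack([np.ones(n), dhs_change, post2000, dtp3, post2000 * dtp3])
beta, *_ = np.linalg.lstsq(X, who_change, rcond=None)

# beta[-1] is the interaction coefficient: the post-2000 excess growth of
# the admin-versus-survey discrepancy specific to the incentivized vaccine.
print(round(beta[-1], 1))
```

With honest reporting, the interaction coefficient would be near zero; recovering the built-in 4.5 points is what "structural break in the discrepancy" means in regression form.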
After 2000, the measles line carries on more or less flat. There's a lot of variance, there are still a lot of discrepancies, but we don't see any really systematic pattern in them. When we get to the DTP3 vaccination rates, we see something a bit different. The picture is not the most dramatic thing, partially because Nigeria is pulling my axis way up; Nigeria has problems completely unrelated to this scheme. But we see statistically significant evidence of a structural break here, where the tilt towards exaggeration in the administrative data increases after the 2000 program comes in. If we do this in a regression framework: single differences, DTP3 immunization is up 13% after 2000. Double differences, that increase was 4.6% faster in the admin data than in the survey data. Triple differences, that increase in the discrepancy was 2.3% larger for DTP3 than for measles. And quadruple differences, moving from levels to changes over time, the jump in the DTP3 discrepancy was 4.5% per annum faster than in measles. Something went awry, specific to DTP3, after 2000. All right, consumer price inflation. Not a story about incentives from donors, but a story about politicization, maybe. CPI data is rare among low-income-country statistics in that it's a high-frequency economic indicator of a kind you don't get from many other sources. As a result, it's in the newspaper; you hear about the CPI. The poverty number comes out every five years; the CPI is always there. It's typically based on market surveys, which in many African countries, including Tanzania where I used to work, were entirely urban. No rural data in the CPI. The national poverty lines, on the other hand, are based on independent survey data, with a lot more technical assistance and a lot less frequency.
But if you're using a cost-of-basic-needs poverty line, and I could spend a lot more time on this, I'm going to pitch that line as roughly a measure of the cost of living: a separate consumer price index for people purchasing a basket of goods similar to what the poor consume. So the cost-of-basic-needs line serves as a proxy for the CPI. Okay, here's the Tanzanian mystery. Tanzania grew at, I think, 4.6% per annum from 2000 to 2007 in real per capita GDP terms. The Gini coefficient did not go up by anything significant. And yet the mystery among Tanzanian policymakers was that the national poverty rate barely changed; that flat line at the bottom is the national poverty rate using the national line. How could you have growth, no increase in inequality, and no poverty reduction? Well, if you use the World Bank's dollar-a-day line, these are the $1.25-a-day numbers from PovcalNet, poverty goes from 85% down to 68%. These are not necessarily inconsistent. For one thing, you're at a different point in the distribution; poverty can be going down relative to one line and stay constant relative to a different line. But another thing that differs between these two poverty series is that they use different deflators. The dollar-a-day poverty line, because it is anchored in international PPPs for a benchmark year, is carried forward over time using the official CPI. The national poverty numbers don't use the official CPI; they use the survey data itself to measure prices. What happens if you compare these two price indices? The official CPI is going up by 5.7% per annum, whereas the survey deflator is going up by nearly 10% per annum. You've got inflation in the household survey which is much higher than what you're finding in the official CPI. If you readjust the dollar-a-day poverty numbers with the survey deflator, you see similar trends: dollar-a-day poverty is more or less flat, just like the national poverty rate. And if you reevaluate the growth rate using this deflator, well, that's a little dodgier.
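The re-deflation arithmetic here is simple enough to show directly. The 5.7% CPI figure echoes the talk, but the nominal growth rate and survey deflator below are hypothetical stand-ins, not the actual Tanzanian series:

```python
# Real growth implied by one nominal growth rate under two price deflators.
# Numbers are illustrative: only the 5.7% CPI inflation echoes the talk.

def real_growth(nominal: float, inflation: float) -> float:
    """Annual real growth given nominal growth and deflator inflation."""
    return (1 + nominal) / (1 + inflation) - 1

nominal = 0.103           # hypothetical nominal per capita growth, 10.3%/yr
official_cpi = 0.057      # official CPI inflation, 5.7%/yr
survey_deflator = 0.097   # survey-based deflator, "nearly 10%"/yr

print(f"real growth, official CPI:    {real_growth(nominal, official_cpi):.1%}")
print(f"real growth, survey deflator: {real_growth(nominal, survey_deflator):.1%}")
```

The same mechanics drive the 4.2% versus 1.8% contrast in the talk: hold nominal growth fixed, and a faster deflator eats most of the measured real growth.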
You shouldn't necessarily be using your consumer price index to deflate your whole national accounts system, but just as an illustration, your per capita GDP growth would have fallen from 4.2% down to 1.8%. Not an issue of donor incentives, but of a highly politicized economic indicator. Lastly, how am I doing on time? Okay, let me go through quickly then. One more. Let's switch sides now. We've been talking about ways in which national governments are presenting statistics that appear to be misleading; let's talk about the national government's own trouble in getting reliable information. A simple example is agricultural output. The Ministry of Agriculture relies on a system of 10,000 village agricultural extension agents, based in each village, to write those notes up to the district level and report what's happening to maize output, sorghum output, and so on. The same agricultural extension agents' main job is to increase the output of these same crops. Now, they're not on any kind of explicit pay-for-performance system, but they have strong incentives for things to go well. If you go to the FAO website and look at Tanzania, and I first looked at this data when I'd been working at the National Bureau of Statistics in Tanzania for nearly a year, I thought, oh my goodness, there's all this data on agriculture I didn't know about. Where does this come from? I didn't know we collected this. But every year, for all these crops, there's data there. All I want you to take away is not the trends, just that for maize, paddy, cassava, sorghum, every year you've got a number: how many metric tons did Tanzania produce? Well, for a specific period, from the 2002/03 crop year to the 2007/08 crop year, and only for maize, we can compare the different data sources I showed you at the beginning. The FAO numbers show 9% per annum growth. The National Panel Survey, with lots of technical assistance and a detailed household questionnaire, shows 6% per annum growth.
This is not monotonic, though. The survey in between, which is run with the National Bureau of Statistics and has some technical assistance, but not as much, gets a number of 17% per annum growth in maize output, which, given that per capita consumption among farm households in Tanzania went up by about 1% per annum, seems unlikely. This is not, I can assure you, a case of the National Bureau of Statistics not wanting to know the answer; they want to know the answer, and so does the Ministry of Agriculture. The problem is that the agricultural extension agents who actually collect both of those data sets don't necessarily have the right incentives. More systematically, school enrollment, going very quickly. This is our sample of, I think, about 21 countries: changes in school enrollment from administrative data sets versus, again, the DHS. And different countries abolished user fees in different years. Why do I think the abolition of user fees matters? Well, under user fees, people come to school and pay you. After you get rid of user fees, I fill out a form reporting to the Ministry of Education how many students I have, and in many countries I get a capitation grant based on my enrollment. The incentives to exaggerate enrollment just got a lot stronger. So look at Kenya, where user fees were abolished in 2003. This is the administrative data on net primary enrollment in Kenya, and these are the survey data sets. Comparing the Kenya Integrated Household Budget Survey and the Welfare Monitoring Survey, spanning the abolition of fees: zero change in net primary enrollment. Looking at the two DHSs: zero change in net primary enrollment. So the administrative data continues to climb, climb, climb, and the survey data sets are flat. Rwanda is not quite as dramatic; fees were also abolished there in 2003.
You get kind of a turn there in one series that you don't quite see in the other, but it's not as dramatic as Kenya. Bottom line, we can run the same kinds of regressions I showed you before. My sample is small, 46 surveys across 21 countries, but the point is that the acceleration in enrollment was about 10% faster in the administrative data than in the DHS survey data after the abolition of user fees. This is not a case for going back to user fees. It simply says that our data systems need to adjust to the new incentives and realities of a new financing system for education. All right. I don't have a big summary conclusion, just one possible takeaway. I mentioned the post-2015 agenda at the beginning. There's an emphasis on a new survey-based global monitoring system, but I think that system is also going to need a complementary set of incentive-compatible administrative data systems. Otherwise we can monitor internationally what's going on, but the Tanzanian Ministry of Agriculture or the Kenyan Ministry of Education can't do much about it if they don't know which schools are doing what. How could this work? One small idea: our surveys could be designed more deliberately to cross-validate the administrative data systems. I'm doing this at the national level, but you could be cross-validating at a much lower level, at the school level or the district level. Bigger, disaggregated samples linked to facility surveys would be one idea of many. All right. Thanks.