Thank you very much. Good afternoon, everybody. I'm pleased to present a paper I've put together on a similar topic: how we think about composite indicators, of which multidimensional poverty measures are one example, and particularly this issue of robustness. The previous speaker kindly provided some of the motivation. He noted correctly that these indicators are becoming extremely popular. We all know the Human Development Index produced by the UNDP, and whilst there are some noted conceptual difficulties with these indicators, one of the main critiques that comes up time and time again is that there are significant uncertainties in how they're constructed. The point made by the previous speaker concerns, for example, arbitrary weights: we use different weighting vectors and so on to create these indices, but how robust are they? Do we get different results if we use a different set of weights? So the fundamental question I look at in this paper is whether comparisons, for example between two units, say two countries or two different groups in the same country, are robust to the specific construction choices used to develop the indicator. Or, more simply, is there some choice of parameters that switches the order of two units? My answer is yes, we can find one almost certainly, and I'll explain what I mean by "almost certainly". Let me just clarify that, to fix ideas. These are the Human Development Index values for two countries, China and Botswana, across three dimensions: life expectancy, education and income. China clearly outperforms Botswana on life expectancy, but on education and income Botswana marginally outperforms China.
So clearly, depending on how we weight these different dimensions, we might come to a different conclusion about which country has a higher level of development. That's just to give you an idea. But let's very briefly look at some more specific definitions. What is a composite indicator? In a very general form, it's just a mapping from a set of parameters, theta, which I assume is taken from some compact space, and a set of raw data; in the HDI, the Human Development Index, these are the dimension-specific values. I assume this mapping is well behaved, and it returns, typically, a single number. The next definition is important: point-wise dominance. We say that a variable yi, which is our composite indicator, point-wise dominates another random variable yj if yi is higher than or equal to yj for all feasible values of theta in the parameter space. So for any choice of parameters, yi is higher than or equal to yj; the "for any choice of parameters" is the important part. That's really the definition of robustness. My definition of robustness for composite indicators is that either yi point-wise dominates yj, or yj point-wise dominates yi. In other words, a comparison is robust if, for any choice of parameters, we come to the same conclusion, at least in terms of which one is higher. We can also relax that slightly and get a degree of robustness, which is simply the share of points in the parameter space for which yi is superior to yj. So these are relatively simple definitions. The question is how we go about examining this. The proposal here is to use stochastic search: taking random draws from the parameter space and evaluating the composite indicator for each draw. What are the advantages of this?
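Before moving on, the degree-of-robustness idea just defined can be sketched in a few lines of Python. This is a hypothetical illustration, not the paper's actual code: the weighted arithmetic composite, the Dirichlet weight draws and the dimension scores are all my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def degree_of_robustness(x_i, x_j, n_draws=10_000):
    """Share of random parameter draws (here, weight vectors) for which
    unit i's composite score is at least unit j's."""
    # Weights drawn uniformly from the simplex via Dirichlet(1, ..., 1).
    w = rng.dirichlet(np.ones(len(x_i)), size=n_draws)
    # Composite indicator: a simple weighted arithmetic mean.
    return float(np.mean(w @ x_i >= w @ x_j))

# Made-up dimension scores (life expectancy, education, income),
# mimicking the China/Botswana pattern: one dimension strongly favours
# the first unit, the other two marginally favour the second.
china = np.array([0.85, 0.65, 0.70])
botswana = np.array([0.70, 0.67, 0.72])

share = degree_of_robustness(china, botswana)
```

A share of exactly 1.0 (or 0.0) over a large number of draws is the stochastic analogue of point-wise dominance; anything strictly in between is the degree of robustness of the comparison.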
Well, the advantage is that establishing robustness from the properties of multivariate distributions is highly problematic, particularly with non-linear functions. Ideally we would like an analytical solution to the problem; we haven't got one. And indeed, stochastic search is used extremely widely where we have these kinds of unknown, complex spaces. Moreover, as long as we can take an infinite number of draws, we're guaranteed to find the result: if we can keep drawing, we're bound to sample every point in the sample space, basically. This is what underpins a lot of stochastic optimisation methods. It also generates an estimate of the empirical outcome distribution, so we can actually undertake these point-wise dominance calculations. There are disadvantages, of course. The curse of dimensionality: as the number of parameters we can choose from grows, the volume of the search space increases geometrically. So the question becomes, how do we know when to stop? For any finite number of draws, there's always going to be some area of the parameter space that hasn't been explored. This occurs in lots of different areas; for those of you familiar with the econometrics or cross-country growth literature, Sala-i-Martin's two million regressions is a similar idea: he's exploring the parameter space, in that case of specifications. This is particularly important when we're talking about robustness, because robustness by definition refers to all points in the parameter space, not just a representative sample of them. So the question really is how many draws are enough, and that's one of the fundamental things I look at here: how many stochastic draws are enough. Which really means, how much of the outcome space have I visited after n random draws?
Let's say I've taken 10 random draws: is that enough? 100 random draws? A million? How much of the outcome space have I seen? Now, it turns out that we can actually estimate this. One way of thinking about it is to assume the search space is discrete: it could be extremely large, but there's a countable number of potential outcomes. This is really the classic balls-in-boxes occupancy problem. Let's imagine I have an unknown number of boxes, say an unknown number of seats in this room, and I keep throwing balls randomly, each landing in some box. It turns out that if I count how many distinct boxes I've seen after n throws, this actually provides information about how much of the outcome space I've seen. For example, if I keep throwing balls and keep seeing the same boxes again and again, that indicates I've already seen most of the potential outcomes. This problem was studied in a very different context during World War II, to crack German ciphers. I.J. Good and Alan Turing came up with what we might call a naive estimator of the missing mass, and it is simply the number of boxes seen exactly once, C1, divided by the number of throws. That's all it is. We can also get slightly more or less conservative versions under the assumption of non-uniform distributions. So this gives us a starting point. If this estimate of the missing mass falls below some desirable threshold, say 5%, then this can be the basis of a stopping rule, and in principle I could then assert that point-wise dominance holds with at least one minus this threshold certainty. So I'm almost certain that I've seen all the outcomes in the parameter space. Of course, you're now going to say: what about continuous outcomes?
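The Good-Turing missing-mass estimator described here is easy to state in code. A minimal sketch, with invented outcomes; in practice each outcome would be a rounded composite-indicator value from one random parameter draw:

```python
from collections import Counter

def good_turing_missing_mass(draws):
    """Good-Turing estimate of the unseen probability mass:
    (number of distinct outcomes seen exactly once) / (number of draws)."""
    counts = Counter(draws)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(draws)

# Ten made-up rounded outcomes: 0.73 and 0.75 each appear exactly once,
# so the estimated missing mass is 2 / 10 = 0.2.
outcomes = [0.71, 0.71, 0.72, 0.72, 0.72, 0.73, 0.74, 0.74, 0.75, 0.71]
m = good_turing_missing_mass(outcomes)
```

When this estimate falls below the chosen threshold (say 0.05), the stopping rule fires.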
My answer is that for the kinds of empirical problems we're looking at, for example in multidimensional poverty, we can always apply some form of rounding rule that is empirically valid. For example, the difference between a headcount poverty measure of 50.001 and one of 50.0011 is, in virtually all circumstances, empirically indistinguishable. So we can use such rounding rules to discretise a continuous distribution. Fine. If you don't like that, we can use another measure. Under the assumption that W, our mapping function, is stable, it turns out that as the number of iterations increases, the distance between any given quantile of the distribution, say the median, at iteration n and at iteration n plus k converges to zero. That's another standard result from Monte Carlo sampling. From this, we can develop a distance measure: we can compare the distance between quantiles, potentially over a grid of all quantiles, and look at the percentage change between two sets of iterations. Between iteration 1,000 and iteration 1,500, has my distribution shifted? What's the average shift in the points of my distribution? The alpha parameter there gives you different distance metrics, essentially. This is nice because it involves no discretisation of the underlying outcome distribution, so it provides a useful cross-check on the Good-Turing type measures. And we can use this as a stopping rule: stop taking stochastic draws if both of these measures fall below certain thresholds. So that's the idea: we're identifying the completeness of the search. Two applications. How long have I got? Okay, well, I'll be quick. Let me apply these methods to two cases. The first is simple: the Human Development Index, which I think we're reasonably familiar with.
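Returning for a moment to the distance-based stopping rule just described, it can be sketched as follows. This is a hypothetical illustration: the normally distributed draws are a stand-in for composite-indicator outcomes, and the quantile grid is my own choice.

```python
import numpy as np

def max_quantile_shift(sample_n, sample_n_plus_k,
                       qs=np.linspace(0.05, 0.95, 19)):
    """Maximum absolute percentage change across a grid of quantiles
    between the outcome distribution at iteration n and at n + k."""
    a = np.quantile(sample_n, qs)
    b = np.quantile(sample_n_plus_k, qs)
    return float(np.max(np.abs(b - a) / np.abs(a)))

rng = np.random.default_rng(1)
# Stand-in outcomes: in practice each value comes from evaluating the
# composite indicator at one random parameter draw.
draws = rng.normal(loc=50.0, scale=2.0, size=1_500)

# Has the distribution shifted between iteration 1,000 and 1,500?
shift = max_quantile_shift(draws[:1_000], draws)
```

The search stops when both this shift and the Good-Turing missing mass fall below their respective thresholds.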
In the most recent form of the index they moved from an arithmetic to a geometric mean, which can be seen in generalised form at the bottom here, where the omegas, the w's, are the weights. What I'm doing here is drawing random weights stochastically and seeing what happens. Looking at three specific countries, here is what happens as I increase the number of draws. The UGS shown here is the conservative version of the Good-Turing type measures; the D0, which should read D-infinity (it's actually the maximum percentage change, apologies for that), is also shown. As you can see, both measures decline relatively sharply. For example, for Nepal, which has the highest values of all of these (that's why it's chosen), after 5,000 draws the estimate is that I have not seen 8.6% of all possible outcomes; in other words, I've seen more than 90% of all possible outcomes over any parameter choice. The distance measure is also very small: it says that the expected change over the next 500 draws is less than 0.06% at any quantile of the distribution. So these are quite comfortable numbers, meaning I can assert robustness with at least 90% certainty across all countries. This is probably the main result, for some selected countries: the point-wise robustness measures for sets of different bilateral comparisons. If you look at the lower diagonal, that gives the percentage of points in the sampled parameter space for which the column country dominates the row country on the HDI. As you can see, there are some clearly robust comparisons: China dominates Nigeria 100% of the time. What about China and Botswana? There, China dominates approximately 80% of the time. So we can assert that the comparison between China and Botswana is not strictly robust.
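The kind of bilateral table just described can be sketched as follows. The dimension indices are invented (not official UNDP values) and the Dirichlet weight draws are my own assumption; the composite is the generalised weighted geometric mean mentioned above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented, normalised dimension indices (life expectancy, education, income).
countries = {
    "China": np.array([0.85, 0.65, 0.70]),
    "Botswana": np.array([0.70, 0.67, 0.72]),
    "Nigeria": np.array([0.55, 0.45, 0.50]),
}

n_draws = 5_000
w = rng.dirichlet(np.ones(3), size=n_draws)  # random weight vectors
# Generalised HDI: weighted geometric mean prod_d x_d ** w_d, one per draw.
scores = {c: np.prod(x ** w, axis=1) for c, x in countries.items()}

# Share of draws on which the first country's index exceeds the second's:
# the stochastic analogue of the lower-diagonal entries in the table.
dominance = {(a, b): float(np.mean(scores[a] > scores[b]))
             for a in countries for b in countries if a != b}
```

With these made-up values, "China" beats "Nigeria" on every dimension, so it dominates on every draw, while the "China" vs "Botswana" comparison is split between the two orderings.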
There are parameter choices under which Botswana dominates China, but 80% of the time China dominates Botswana, and there are other comparisons there to see as well. So that's a simple version of the idea. Let's take it to a slightly more complex function: the Alkire-Foster poverty measures. Here is a summary of the poverty measure, which probably isn't very clear, but I'm going to focus on the headcount measure. The headcount measure is simply a weighted count of deprivations; if that weighted count exceeds a specified cut-off, kappa here, the unit, typically a household, is considered poor. So the parameter choices are which weights to give to which deprivations and what cut-off threshold to use. All I do is run the stochastic simulation again over different choices of weights and thresholds. In this case I apply it to Mozambique with seven dimensions, looking at three household surveys over time, and so on, but let's go to the results. What happens to my measures of missing mass? Again, as expected, they decline. Look, for example, at just the final two columns there, for 2008 to 2009: after 2,000 draws from the parameter space I've seen around 80% of the outcomes, but by 10,000 draws I've seen almost 99% of all possible outcomes. Similarly for the distance measures: by 10,000 draws they have fallen below 1%, so my expected change at any quantile has fallen below 1%. Again, this is comforting: I've done a sufficient number of iterations to have seen this parameter space essentially completely. And what do we find? These are two kernel density estimates of the difference in the raw multidimensional headcount measure between two periods, and as you can see, for the blue density, which is the difference between 1996 and 2002, there's no point at which it falls below zero.
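A sketch of this stochastic search over the Alkire-Foster headcount, using a toy deprivation matrix rather than the Mozambique survey data; the number of dimensions, the weight draws and the cut-off range are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def af_headcount(deprivations, w, k):
    """Alkire-Foster multidimensional headcount: a household is poor if
    its weighted deprivation score c_i = sum_d w_d * g_id meets the
    cut-off k (the kappa of the talk)."""
    c = deprivations @ w  # weighted deprivation score per household
    return float(np.mean(c >= k))

# Toy deprivation matrix: 6 households x 4 binary indicators (1 = deprived).
g = np.array([[1, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 1]])

# Stochastic search: random weights on the simplex and random cut-offs.
n_draws = 2_000
ws = rng.dirichlet(np.ones(4), size=n_draws)
ks = rng.uniform(0.2, 0.6, size=n_draws)
h = np.array([af_headcount(g, w, k) for w, k in zip(ws, ks)])
```

Repeating this for two survey years and taking the difference in `h` draw by draw gives exactly the kind of density of headcount differences shown in the talk.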
So for any choice of weights or cut-offs, poverty declined between 1996 and 2002, and I can assert that with the degree of confidence noted previously. However, for 2002 to 2008, as you can see, there's a share of the parameter space that gives the alternative result: poverty did not decline, although for the majority of parameter choices it did. So again, this gives us an indication of the degree of robustness of that poverty comparison. We can extend that to regions over time, as I've done here; I don't think there's time to look at that in detail. Again, the values below the diagonal give the share of points in the parameter space for which the column group, a region seen at a particular time, has superior welfare to the row group. What's interesting is that the south (SO2 is the south seen in 2002) welfare-dominates both the north and the centre 90% of the time, even in 2008. That reflects what we know about Mozambique: very large regional differences in welfare. Let me just summarise. Stochastic search is a very powerful tool used in a large number of applications. The main limitation, particularly for identifying robustness, is that the coverage of the search is unknown: how many draws are enough? I propose two estimators to measure the completeness of the simulation. They aren't completely new, but they are neither widely known nor previously used in this context. With them, we can assert point-wise comparisons with an almost-certain degree of confidence. And I believe the applications are quite fruitful. Thank you very much.