I will be presenting a unified structural equation modeling approach for the decomposition of rank-dependent indicators of socioeconomic inequality of health. This is joint work with Guido Erreygers, who is a colleague of mine at the University of Antwerp.

First of all, what is socioeconomic inequality of health? Well, it deals with two dimensions, or two variables. On the one hand, we have a socioeconomic status variable, for instance income, wealth, or consumption. On the other hand, we have a health variable, which can also be an ill-health variable. Socioeconomic inequality of health then tries to measure the degree of correlation between these two variables. This is usually done by means of a rank-dependent indicator, which is a weighted average of the individual health levels in a targeted population, where the weights are determined by the ranks of the individuals in the socioeconomic distribution.

The most well-known rank-dependent indicator is the health concentration index, of which there are two versions: the relative or standard health concentration index, and the absolute or generalized concentration index. These indices express the relative or absolute differences in health between socioeconomic status groups.

These concentration indices can be represented graphically by means of concentration curves. On the left-hand side here, you see an example of a relative concentration curve, whereas on the right-hand side you see an example of a generalized concentration curve. The generalized concentration curve is obtained from the relative concentration curve by multiplying it by the mean level of population health, represented by mu.

Now, how do we need to interpret these graphs for this particular example? On the vertical axis, you see the cumulative proportion of our health variable, which in this example is an ill-health variable.
It represents infant mortality. On the horizontal axis, the x-axis, we have the cumulative proportion of our targeted population, which in this example are babies that have been born alive, the live births. Okay, so what do we see here? On the x-axis, we have ranked the babies in ascending order of the wealth of the households in which they were born, so they are ranked from poor to rich. And what we see is that most of the mass is concentrated on the left-hand side, so infant mortality is concentrated among the poor. The associated concentration index will then be negative, meaning that ill health is disproportionately concentrated among the poor. The concentration index is computed as twice the area between the concentration curve and the line of perfect equality, the 45-degree diagonal line.

Okay, so this is just to set the scene a little bit. Now, about the aim of this work. In this setting of concentration indices and concentration curves, what we recommend is the use of a structural equation modeling framework for a regression-based decomposition analysis in order to explain the generalized concentration index, denoted by GC. So we look at the generalized or absolute concentration index, the GC, which we want to explain: we want to find out what the reasons are for a particular value of the GC by means of a regression-based decomposition analysis. The GC is our measure of the correlation between health and socioeconomic status. And we propose the use of a structural equation modeling framework because we will argue that it is a more proper way of doing regressions that correspond well with decomposition analysis.
There are several existing decompositions, which we will run through. First of all, I will be discussing two one-dimensional decompositions, the most well-known of which is the health-oriented decomposition proposed by Wagstaff, van Doorslaer and Watanabe in their 2003 Journal of Econometrics paper, in which health is subjected to a regression. In previous work of ours, we proposed another one-dimensional decomposition in which we regressed the socioeconomic status variable instead of the health variable. We also proposed a two-dimensional simultaneous decomposition in which we regressed both health and the socioeconomic status variable on a set of explanatory variables. Okay, I will run through these decompositions and briefly introduce them to you.

For that we need some notation, some machinery. First of all, we consider a population of N individuals, numbered from 1 up to N. Then we have a health variable H with individual health levels. This health variable is a non-negative ratio-scale variable, or a cardinal variable. Our data set also has an SES variable with individual levels Y1 up to YN. Based on the SES variable, we rank the individuals, or the households, from least well-off to most well-off in order to get our rank variable.

Okay, based on this rank variable, we compute the fractional ranks, for which the average equals one half. Further on, we compute the fractional rank deviation, which is the deviation of the fractional rank from the average fractional rank, so its average equals zero. It is this variable, the fractional rank deviation variable D, that is used in the definition of the generalized health concentration index, of which you find two definitions here: the product definition and the covariance definition.
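To make the notation concrete, here is a minimal Python sketch of the quantities just introduced. It assumes the common convention f_i = (r_i - 1/2)/N for the fractional rank, which gives an average of one half as stated in the talk, and it writes the product definition as GC = (2/N) Σ h_i d_i and the covariance definition as GC = 2 Cov(H, D). The function names and the toy data are hypothetical and are not taken from the slides.

```python
def fractional_rank_deviation(ses):
    """Return d_i = f_i - 1/2 for each individual.

    Assumed convention: f_i = (r_i - 1/2) / N, with r_i the rank of
    individual i from least (1) to most (N) well-off. Ties are not handled.
    """
    n = len(ses)
    order = sorted(range(n), key=lambda i: ses[i])
    d = [0.0] * n
    for rank, i in enumerate(order, start=1):
        d[i] = (rank - 0.5) / n - 0.5
    return d


def gc_product(h, d):
    """Product definition: GC = (2/N) * sum_i h_i * d_i."""
    return 2.0 / len(h) * sum(hi * di for hi, di in zip(h, d))


def gc_covariance(h, d):
    """Covariance definition: GC = 2 * Cov(H, D), population covariance."""
    n = len(h)
    mean_h, mean_d = sum(h) / n, sum(d) / n
    cov = sum((hi - mean_h) * (di - mean_d) for hi, di in zip(h, d)) / n
    return 2.0 * cov


# Toy data (made up): four live births, one infant death in the poorest household.
wealth = [10.0, 20.0, 30.0, 40.0]
death = [1.0, 0.0, 0.0, 0.0]

d = fractional_rank_deviation(wealth)
gc_p = gc_product(death, d)
gc_c = gc_covariance(death, d)
print(d)           # deviations average to zero
print(gc_p, gc_c)  # both definitions agree; negative for ill health among the poor
```

The two definitions coincide because the fractional rank deviation has mean zero, so the mean of the products h_i d_i is already the covariance.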
So the GC is basically the covariance between the health variable H and the fractional rank deviation variable D, multiplied by two. It measures the correlation between our two variables H and D. And this measure is bivariate: it consists of two variables, two dependent variables.

Okay, first of all, a quick overview of the decompositions that I'm talking about. The most well-known is the health-oriented decomposition, which starts from a regression of our health variable H, which we try to explain by means of a set of K explanatory variables, X1 up to XK. We then plug our regression for H into the product definition of the GC and do some math, to finally end up with the health-oriented decomposition, which we refer to as decomposition one. It is a sum of K contributions, or deterministic components, determined by the explanatory variables X in relationship with the fractional rank deviation variable D, plus a residual term.

Okay, we can do the same thing for the fractional rank deviation variable D. We can propose a regression for D on a set of explanatory variables Z, ranging from Z1 up to ZQ. Using the covariance definition of the GC and some mathematics, we obtain the rank-oriented decomposition, which we refer to as decomposition two. It contains a set of Q contributions, one for each of the explanatory variables Z, and again a residual component.

Okay, these are the two one-dimensional decompositions. Now, since our measure, our concept of socioeconomic inequality of health, is bivariate, we can just as well regress both H and D simultaneously by means of a bivariate multiple regression model. We have done so in previous work of ours. Typical for this bivariate multiple regression model is that we use the same set of P explanatory variables, S1 up to SP, in both equations.
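The two one-dimensional decompositions described above can be sketched in Python as follows. This is only an illustration on simulated data: the variables x1, x2, z1 and the data-generating process are made up, the small OLS routine is a stand-in for any regression software, and the fractional rank is again assumed to be (r_i - 1/2)/N. The point is that both identities hold exactly: the GC equals the sum of the regressor contributions plus the residual term.

```python
import random


def ols(y, columns):
    """OLS of y on an intercept plus `columns`; returns (coefs, residuals)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[pivot] = a[pivot], a[p]
        for r in range(k):
            if r != p:
                f = a[r][p] / a[p][p]
                a[r] = [u - f * v for u, v in zip(a[r], a[p])]
    coefs = [a[p][k] / a[p][p] for p in range(k)]
    resid = [yi - sum(c * x for c, x in zip(coefs, r)) for r, yi in zip(rows, y)]
    return coefs, resid


def two_cov(u, v):
    """Twice the population covariance, so that GC = two_cov(H, D)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return 2.0 / n * sum((a - mu) * (b - mv) for a, b in zip(u, v))


def rank_deviation(ses):
    """Fractional rank deviation d_i = (r_i - 1/2)/N - 1/2 (assumed convention)."""
    n = len(ses)
    d = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda j: ses[j]), start=1):
        d[i] = (rank - 0.5) / n - 0.5
    return d


# Simulated toy data (hypothetical variables, not the data from the talk).
random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
z1 = [random.gauss(0, 1) for _ in range(n)]
ses = [a + 0.5 * c + random.gauss(0, 1) for a, c in zip(x1, z1)]
h = [5.0 + 0.8 * a - 0.3 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]
d = rank_deviation(ses)
gc_total = two_cov(h, d)

# Decomposition 1 (health-oriented): regress H on X1, X2.
beta, e_h = ols(h, [x1, x2])
contrib1 = [b * two_cov(x, d) for b, x in zip(beta[1:], [x1, x2])]
resid1 = two_cov(e_h, d)

# Decomposition 2 (rank-oriented): regress D on Z1, X1.
gamma, e_d = ols(d, [z1, x1])
contrib2 = [g * two_cov(h, z) for g, z in zip(gamma[1:], [z1, x1])]
resid2 = two_cov(h, e_d)

print(gc_total, sum(contrib1) + resid1, sum(contrib2) + resid2)
```

Both sums reproduce the GC because the covariance is linear in each argument and the intercept drops out.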
So the explanatory variables for H and D are the same in that bivariate model. Doing some mathematics using the covariance definition of the GC, we obtain a simultaneous decomposition, which we refer to as decomposition three. It consists of a sum of P direct effects, given by the variances of the individual explanatory variables weighted by some coefficients, plus a set of terms that reflect the correlation structure between the independent variables S, and then again a residual term. So we have three components that make up decomposition three.

Okay, now, having done this work, we got some criticisms. The first criticism related to the bivariate multiple regression model and the associated decomposition: we used the same set of variables to explain both H and D, which may not be appropriate given that the determinants of H and D may not be the same. The second, more important criticism was that in all of our decompositions we did not include the variable D as a predictor in the regression for H, and we did not include H as a predictor in the regression for D. We left those variables out of our regressions, whereas in empirical work it might be the case that health is both a cause and a consequence of socioeconomic status, of our socioeconomic status variable. So there might be some feedback mechanism, a reciprocal relationship between the two variables, but still we did not include any of the variables that are part of the GC as regressors in any of our regressions.

And why did we not do that? Well, it's actually meaningless to do that, because we are trying to explain the GC, which is bivariate, which consists of two dependent variables, while using one of those as an explanatory variable.
Trying to explain the GC that way is just meaningless: it treats those variables both as a dependent and as an independent variable, so we are actually not explaining anything at all. If we use either of these variables both as a dependent and as an independent variable, what we get is an artificial result, and that shows up as a zero residual component. The residual will be zero because all of the variation is absorbed by one of our dependent variables, which we use as an independent variable at the same time.

This artificial result also happens when we use a proxy variable for one of the dependent variables. For D, for instance, people use consumption, income, or wealth, and you see that appearing often in papers. But these variables are highly correlated with D, because D has been constructed from them. So it doesn't make any sense to include those variables in the regression, which is used as an auxiliary step in order to explain the GC. First we have the regression in which we want to explain health or D, and then the results of that regression are used in order to explain the GC.

Okay, here you see an example of a simple regression of H on just one explanatory variable, which is D. The estimate of the slope coefficient is then the covariance between H and D divided by the variance of D, and plugging that in gives us back just the definition of the GC, with a residual component that is exactly equal to zero. It looks like we have explained a lot, but we have not explained anything at all. It's just an artificial result.
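The zero-residual artifact in this simple regression is easy to reproduce numerically. The sketch below uses simulated data (made up for illustration) and the assumed rank convention (r_i - 1/2)/N. It regresses H on D alone and shows that the "contribution" of D equals the whole GC while the residual term vanishes, purely by construction.

```python
import random

# Simulated toy data (hypothetical, not the data from the talk).
random.seed(1)
n = 500
ses = [random.gauss(0, 1) for _ in range(n)]
h = [5.0 + 0.6 * s + random.gauss(0, 1) for s in ses]

# Fractional rank deviation d_i = (r_i - 1/2)/N - 1/2 (assumed convention).
order = sorted(range(n), key=lambda i: ses[i])
d = [0.0] * n
for rank, i in enumerate(order, start=1):
    d[i] = (rank - 0.5) / n - 0.5

mean_h = sum(h) / n
cov_hd = sum((hi - mean_h) * di for hi, di in zip(h, d)) / n
var_d = sum(di * di for di in d) / n  # mean of d is zero
gc = 2.0 * cov_hd

# Simple OLS of H on D: slope = Cov(H, D) / Var(D); intercept = mean(H).
slope = cov_hd / var_d
intercept = mean_h
resid = [hi - intercept - slope * di for hi, di in zip(h, d)]

# Decomposition-one terms: slope times (2/N) sum d_i * d_i, plus residual term.
contribution_d = slope * 2.0 * var_d
residual_term = 2.0 / n * sum(e * di for e, di in zip(resid, d))

print(gc, contribution_d, residual_term)
```

Whatever the data, the residual term is zero up to rounding, because OLS residuals are uncorrelated with the regressor D by construction.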
Okay, still, with decomposition one, our most well-known health-oriented decomposition, this procedure has often been applied. And it's indeed true that when using SES as an explanatory variable in the regression for H, SES comes out as an important determinant of health. But going one step further and using that result for the decomposition analysis makes the contribution of SES to the GC artificially large, because the GC is defined by SES itself. So using SES in order to explain the GC in this way is not a good thing to do. Using SES in order to explain health is okay, but then plugging this result into the decomposition leads to an artificial result.

So how do we combine this empirical result with the regression-based decomposition methodology? We have this empirical result, but we don't have the right machinery to make use of it. Well, for that we have proposed the structural equation modeling approach, in which we have, in a first equation, adopted D as an explanatory variable in the regression for H and, in a second equation, H as an explanatory variable in the regression for D. Both D and H are assumed to be endogenous in that case, and we then estimate the parameters using a generalized method of moments estimation procedure in order to obtain consistent estimates. This estimation procedure makes use of instrumental variables. It's an IV estimation procedure.

Doing some rewriting, by replacing the right-hand-side endogenous variables by their respective equations, we obtain only exogenous variables on the right-hand side: the X's, originally explaining H, and the Z's, originally explaining D. This way of rewriting is referred to as the reduced form of the SEM, in which we only have exogenous variables on the right-hand side. And what do we obtain by rewriting this? Well, we again obtain the bivariate multiple regression model.
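The rewriting step can be sketched as follows. The coefficient symbols are assumptions chosen for illustration (the talk does not name them): $\alpha_D$ and $\gamma_H$ are the feedback coefficients, and the derivation requires $\alpha_D \gamma_H \neq 1$.

```latex
% Structural form (assumed notation): each outcome depends on the other.
\begin{align*}
H &= \alpha_0 + \alpha_D D + \sum_{k=1}^{K} \beta_k X_k + \varepsilon_H, \\
D &= \gamma_0 + \gamma_H H + \sum_{q=1}^{Q} \delta_q Z_q + \varepsilon_D.
\end{align*}
% Substituting each equation into the other and solving gives the reduced form:
\begin{align*}
H &= \frac{1}{1 - \alpha_D \gamma_H}
     \Bigl( \alpha_0 + \alpha_D \gamma_0
            + \sum_{k=1}^{K} \beta_k X_k
            + \alpha_D \sum_{q=1}^{Q} \delta_q Z_q
            + \varepsilon_H + \alpha_D \varepsilon_D \Bigr), \\
D &= \frac{1}{1 - \alpha_D \gamma_H}
     \Bigl( \gamma_0 + \gamma_H \alpha_0
            + \gamma_H \sum_{k=1}^{K} \beta_k X_k
            + \sum_{q=1}^{Q} \delta_q Z_q
            + \varepsilon_D + \gamma_H \varepsilon_H \Bigr).
\end{align*}
```

In this form both H and D are regressed on the same combined set of exogenous variables, the X's together with the Z's, which plays the role of the common regressor set S1 up to SP.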
This model includes the same set of explanatory variables in both equations and can be estimated directly using OLS, so we don't need GMM in that case. Okay, so the bivariate multiple regression model is just another way of writing the structural equation model, in which H and D are allowed to depend on different sets of regressors and which includes D and H in the regressions for H and D, respectively. So the SEM approach results in our simultaneous decomposition, decomposition three, based on the bivariate multiple regression model. It integrates the feedback mechanism between the variables H and D, which are allowed to depend on different sets of predictors. So this refutes the two criticisms of the bivariate multiple regression model and the resulting decomposition three.

Okay, let's quickly illustrate the SEM approach using some data. The data involve stunting, or malnutrition, of children below the age of five in Ethiopia, and come from the latest round of the Demographic and Health Survey of Ethiopia. The data set contains 9262 observations. We define stunting or malnutrition as having a low height-for-age z-score, where the z-score is smaller than minus two standard deviations from the median of the height-for-age variable of a well-nourished reference population. We transformed this stunting variable onto the unit interval: individuals are assigned a value of zero if they are not stunted and a value between zero and one if they are stunted, and the value then indicates the degree of stunting. So our response variable is actually the degree of stunting, which is an ill-health variable, with one being assigned to the child that is most stunted. As explanatory variables, we selected a set of eight variables, which we partitioned into exogenous and instrumental variables for the GMM analysis, and we performed weighted regressions. Okay, here are some descriptive statistics still.
We used the weighted fractional rank deviation variable based on the wealth indices provided in our data set. And then the question was: what are the explanatory variables explaining the GC? Okay, yeah, this is our set of explanatory variables. Okay, just let me quickly talk through the results for two minutes, if I'm allowed to do so. One minute? Oh, okay.

Well, what came out of the regressions? Using the GMM analysis, the fractional rank deviation variable has a significant impact on our health variable H. But, still using GMM, the opposite relationship does not hold: our health variable does not have a significant impact on our socioeconomic status variable, represented by the fractional rank deviation variable. Using OLS, it does seem to have a significant impact, but the right analysis in this case is the GMM analysis.

Based on this GMM analysis, one can do these decompositions, the most famous of which is the health-oriented decomposition. Comparing GMM to OLS, we see that the contribution of our SES variable using OLS is close to 70%, whereas using GMM it drops to about 40%. Using OLS, we don't have a residual term, it's zero, if we include D in our regressions, whereas using GMM the residual component equals 28%. So we get a different picture depending on whether and how we include D in our regressions here. With respect to decomposition two, we left out the health variable because it does not have any significant effect on D, so that we can just use OLS and we don't have an equation with an endogenous right-hand-side variable. This is decomposition three, but I'll leave it as such because I don't have much time. Okay, so yeah, this is some additional material, but I won't run through it anymore.
So I hope that I have convinced you that it might be better to take a broader perspective on the regressions that are used for these decomposition analyses, and that there might be a better estimation method in the context of a structural equation modeling approach. Okay, thank you very much for your attention.