Model testing is an important part of the workflow when you work with structural equation models, and it is often stated as one of the key advantages of structural equation modeling. Let's take a look at what model testing does in this context. It is simply the application of the same principles that we use in simultaneous equations models, which I explained in another set of videos, but the models tend to be a lot larger. Consider the simple mediation model from my mediation video. In the mediation model, what we are testing is basically whether this mediation model fully explains the covariances, or correlations, in the data. More specifically, we are asking whether this small residual covariance here is an indication of model misfit, or whether it is there just because of chance. In small samples, models rarely reproduce the observed covariance matrix perfectly, and we want to understand whether the degree of misfit can be attributed to sampling error, or whether it should be interpreted as evidence that the model is incorrect for the data and should be rejected. The test that we apply is the chi-square test. Here it has one degree of freedom, because we are only testing one element of the covariance matrix: we have one omitted path, so one constraint, and the chi-square test does not reject the model. That is good. This is called an over-identification test, or chi-square test, and the reason we refer to it as an over-identification test is that it requires the model to be over-identified: the degrees of freedom must be positive, meaning we are leaving something out of the model.
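As a concrete sketch of the degrees-of-freedom bookkeeping for a three-variable mediation model (the parameter count here is my own illustration, not taken from the video):

```python
# Degrees of freedom for a simple X -> M -> Y mediation model.
# With p observed variables there are p(p+1)/2 unique variances and
# covariances to reproduce; the model spends one parameter per free
# path or (residual) variance.
p = 3                       # observed variables: X, M, Y
moments = p * (p + 1) // 2  # 6 unique elements in the covariance matrix
free_params = 5             # paths X->M and M->Y, var(X), 2 residual variances
df = moments - free_params
print(df)  # 1: corresponds to the single omitted direct X -> Y path
```

One omitted path means one testable constraint, hence the single degree of freedom mentioned above.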
For example, we could add the direct path here, but we choose to leave it out. In this case we want to not reject the null hypothesis, because the null hypothesis is that the model fits the population data perfectly; if we rejected it, we would have to conclude that the model does not actually fit perfectly, and then we would have to do something about it. Let's take a look at a more complicated example, from Mesquita and Lazzarini, who were kind enough to provide the correlation matrix of their indicators in their paper, so we can replicate all the analyses in the full paper and get the same results as they did. I'm using a subset of their indicators to estimate a fairly big confirmatory factor analysis with 10 factors and 21 indicators, and this shows part of the model. As we can see, the implied correlations are pretty close to the data correlations; indeed, the residual correlations are very small, in the range of 0.00 to 0.01 or thereabouts. What we are asking is whether these small non-zero residual correlations could be due to chance alone, or whether they are large enough that we cannot attribute them to chance and would have to conclude that the model is mis-specified. Again we use the chi-square test, and it does not reject the model: we have a non-significant p-value, so we conclude that we don't have enough evidence to say that this model is definitely not right for the data, and therefore we tentatively accept the model, or use the model as it is. Now, there is quite a lot of debate on the use of chi-square, and there are reasonable opinions, unreasonable opinions, and unsubstantiated opinions. I will get into the potential issues with chi-square in another video, but let's take a look at some things about chi-square, or any way of assessing model fit, that are genuinely problematic.
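The chi-square test sketched above can be computed directly from the observed and model-implied matrices. The numbers below are a toy illustration, not the Mesquita and Lazzarini data:

```python
import numpy as np
from scipy.stats import chi2

# Toy example: compare a sample correlation matrix S with a
# model-implied matrix Sigma and compute the maximum likelihood
# chi-square over-identification test (matrices are made up).
S = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])      # observed correlations
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])  # implied by a mediation model
N, df = 200, 1                       # sample size, one omitted path

residuals = S - Sigma                # small non-zero leftovers

# Normal-theory ML discrepancy function
F_ml = (np.log(np.linalg.det(Sigma)) - np.log(np.linalg.det(S))
        + np.trace(S @ np.linalg.inv(Sigma)) - S.shape[0])
T = (N - 1) * F_ml                   # chi-square test statistic
p_value = chi2.sf(T, df)
print(round(T, 2), round(p_value, 3))
```

With these made-up numbers the single residual of 0.1 yields a non-significant p-value at N = 200, so the model would not be rejected.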
These concerns are listed in Kline's book, which in my opinion is one of the best introductory-level sources on model fit assessment in SEM. The first concern is that when you have a large model, like our model with 21 indicators and thus a 21 by 21 covariance matrix, there are lots of elements that can be non-zero. If we summarize all those potentially non-zero residuals into one average measure, basically quantifying the average degree of mis-specification, that average can hide a lot of things. It is possible that the model is mis-specified in one part while other parts are correct, and we don't have enough statistical power to detect the mis-specification. It is also possible that, with a large sample size, very minor mis-specifications, like a correlation of 0.01 instead of 0, are detected even though their consequences would be trivial, simply because the test is powerful. Second, these model fit statistics, chi-square or any other, only tell us whether the model is plausible for the data. They do not tell us whether the model is theoretically meaningful; that is up to the researcher to interpret. They also do not tell us whether the model is better than any other model that would produce the same covariance matrix. Even with a simple model of three variables, we can specify tens of different models with different configurations of path directions and correlations between the variables, and they all fit the data equally well. So all models must be driven by theory, and if two models fit equally well, chi-square cannot say which one is the best. In particular, the fact that chi-square does not reject the model does not imply that the model must be correct, because other, equivalent models could produce the same result.
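To make the equivalent-models point concrete, here is a small sketch (path values a and b are arbitrary) showing that reversing every arrow in a three-variable chain implies exactly the same correlation matrix, so no fit statistic can distinguish the two models:

```python
import numpy as np

def implied_cov(B, Psi):
    # Sigma = (I - B)^-1 Psi (I - B)^-T for a recursive path model
    A = np.linalg.inv(np.eye(B.shape[0]) - B)
    return A @ Psi @ A.T

a, b = 0.5, 0.4  # arbitrary standardized path coefficients
# Model 1: X -> M -> Y (variable order X, M, Y)
B1 = np.zeros((3, 3)); B1[1, 0], B1[2, 1] = a, b
Psi1 = np.diag([1.0, 1 - a**2, 1 - b**2])   # exogenous + residual variances
# Model 2: Y -> M -> X (same variables, every arrow reversed)
B2 = np.zeros((3, 3)); B2[0, 1], B2[1, 2] = a, b
Psi2 = np.diag([1 - a**2, 1 - b**2, 1.0])

S1, S2 = implied_cov(B1, Psi1), implied_cov(B2, Psi2)
print(np.allclose(S1, S2))  # both chains imply the same matrix
```

Only theory, not the data, can justify preferring one causal direction over the other.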
Then there are issues related to the chi-square statistic itself. Chi-square depends on a couple of things. Importantly, it depends on multivariate normality: the observed indicators must be multivariate normal for the normal-theory chi-square to work well. If the data are substantially non-normal, there are corrections that we can apply; Bentler, for example, has worked on these, and the Satorra-Bentler correction is probably the best known. Chi-square also depends on how strongly the variables are correlated and on how different the variances of the indicators are: if one indicator varies in the thousands and another in the tens, chi-square will be affected by that. And, as I said, chi-square, like every other statistical test, is affected by sample size: when you increase the sample size, power increases, and you will detect trivially small deviations from the null hypothesis as statistically significant. So what should you do with the chi-square test? The first thing is that when chi-square does not reject the model, you should be really happy about it, because models that fit well are not that common; most of the models that you see published have significant chi-squares. Whether those should be published or not is another matter. Even then, you should look at the residual covariances, because if your sample size is in the hundreds, you may not have enough power to detect all possible mis-specifications. It is also possible that you actually see a mis-specification, one or two large residual covariances in the covariance matrix, that disappears when all the residual covariances are averaged into the chi-square. So even a non-significant chi-square does not mean that you should skip the diagnostics for the model. This is the same as in regression analysis.
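The sample-size sensitivity is easy to see from the form of the statistic. Since T = (N - 1) * F_ML, the same fixed amount of misfit becomes statistically significant once N is large enough (the F value below is hypothetical):

```python
from scipy.stats import chi2

# A trivially small, fixed population misfit F_ml: whether the test
# rejects depends almost entirely on the sample size N.
F_ml, df = 0.002, 1  # hypothetical discrepancy value
p_values = {}
for N in (100, 1000, 10000):
    T = (N - 1) * F_ml          # chi-square test statistic
    p_values[N] = chi2.sf(T, df)
    print(N, round(T, 2), round(p_values[N], 4))
```

At N = 100 the misfit goes completely unnoticed, while at N = 10000 the same misfit is rejected decisively, even though its practical consequences are identical.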
Whenever you fit a model you should do diagnostics, and in structural equation models the diagnostics involve looking at the residuals. I have another video about diagnostics in the context of confirmatory factor analysis, but it applies to structural regression models as well. So what if chi-square rejects the model? It means that the model is somehow incorrect for the data, and it becomes important to understand in which way the model does not fit. Chi-square simply measures the average covariance fit, so ask: are there some covariances that the model reproduces well and others that it does not, is there some reason you can think of why a part of the model does not fit, and if so, is there something you can do about it? If you cannot do anything about the misfit, and you cannot identify a clear theoretical reason for some covariances being poorly reproduced by the model, then what should you do? An extreme position is that none of those models should be published. A more reasonable one is that you do diagnostics to ensure that the mis-specifications are not serious; this requires some expertise. You can also try freeing some of the parameters in a non-theory-driven fashion as a sensitivity analysis, but you should not do that for your final model. Finally, you should make sure you are confident that the mis-specification only influences your measurement model, that is, how your measures work. Measures rarely work perfectly, and we need some degree of tolerance for imperfect measures. But if the relationships between the variables that represent your theory are incorrect, that kind of mistake should not be tolerated in published research.
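A minimal residual-diagnostic sketch along these lines: rather than looking only at the average misfit, locate the specific variable pair that the model reproduces worst (toy matrices, purely illustrative):

```python
import numpy as np

# Find the largest off-diagonal residual correlation as a diagnostic.
S = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])      # observed correlations (toy data)
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])  # model-implied correlations
resid = S - Sigma
lower = np.abs(np.tril(resid, k=-1))            # unique off-diagonal pairs
i, j = np.unravel_index(lower.argmax(), lower.shape)
print(i, j, round(resid[i, j], 2))  # variable pair with the worst fit
```

Whether such a residual sits inside the measurement model or between the structural variables that carry the theory is exactly the distinction drawn above.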