If you want to test whether your model fits the data, then the chi-square test is the tool for that job. But in practice the chi-square test is not necessarily the only thing that gets reported. Let's take a look at these two examples from published research, and these are fairly common in papers that apply SEM. So you see these tables that list the chi-square first, but then there are all these other indices: there is CFI, there is RMSEA, and then there are others. So there are many, many indices, and supposedly they tell us something about model fit.

Let's take a look at what the argument for these indices is and what the idea behind them is. The argument for these indices is not actually an argument for the indices as much as it is an argument against the chi-square, and the argument goes like this. Every model is wrong even before it is fitted to the data. This means, for example, that if we model a relationship between x and y as linear, very few relationships are actually exactly linear in any real dataset. Another example: if we have a measurement scale consisting of five agreement items, we assume that the measurement errors of those items are exactly uncorrelated. There are many reasons to believe that they are not exactly uncorrelated; the correlations may be low, but they are not exactly zero. If we have a large sample size, then these kinds of small discrepancies will be detected by the chi-square. The argument further goes that instead of testing whether the model fits the data exactly, we are more interested in the degree of misfit of the model. So the chi-square tells us that the model is not exactly correct, but we want to understand whether it is incorrect in a way that causes problems for our inference. This is the argument against the chi-square and for the alternative fit indices.

And this argument is actually made in one of these papers. In the Yli-Renko et al. paper, in the footnote for the table, they say that the chi-square is sensitive in large sample sizes and can therefore be completely ignored, which they do in the study, and then they look at these other indices. But this is a highly problematic statement, because the same statement can be made in the context of any statistical test. If you run a regression analysis and your sample size is large enough, then the normal test of significance will detect trivially small differences between the coefficients and zero, and therefore the p-values should not be trusted. Yet I have not seen a single article that makes this argument against the chi-square, that it should not be trusted because it has too much power, and then notices that the same problem applies to regression analysis. So there is this tendency that when we see evidence against our model, like we have here, we disregard it, but when we see evidence that supports our theoretical argument, we keep it, even if the same logic applied consistently would lead to discarding those tests as well; that is nevertheless not done. So there is some cherry-picking of evidence that is fairly common.

Okay, so what are these indices then? What do the indices tell us? They are referred to as alternative fit indices, and most of them fall into two families. The first family is descriptive indices. Descriptive indices quantify the degree of misfit, and RMSEA and SRMR are the two most commonly used ones. The interpretation of RMSEA is a bit more complicated, so I'll just focus on SRMR.
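For reference, the usual textbook definitions of these two descriptive indices are roughly the following (software packages differ in small details):

$$
\mathrm{RMSEA} = \sqrt{\frac{\max(\chi^2 - df,\; 0)}{df\,(N-1)}},
\qquad
\mathrm{SRMR} = \sqrt{\frac{2}{p(p+1)} \sum_{i \le j} \left( \frac{s_{ij} - \hat{\sigma}_{ij}}{\sqrt{s_{ii}\, s_{jj}}} \right)^{2}},
$$

where the s terms are the sample covariances, the sigma terms are the model-implied covariances, p is the number of observed variables, and N is the sample size. So RMSEA scales the excess chi-square by the degrees of freedom and the sample size, and SRMR is a root mean square of the standardized residual covariances.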
And SRMR is basically the average residual in the residual covariance matrix. Strictly speaking it is a root mean square rather than a plain average, but it is basically an average. And if the average residual is small, then we conclude that the model fits okay. Of course, as I explained in the video about the chi-square test, a small average does not mean that all the residual covariances are small. It is possible that there is one area of the model with large residual covariances that you should do something about, but if you just look at the mean, those misspecifications go undetected. The rules of thumb can be found in many different sources, and you can find multiple different rules of thumb. I think the most lenient one says that it should be below 0.08; another, stricter guideline says that it should be below 0.05 and the confidence interval should be below 0.08. But then you can also find guidelines saying that 0.10 is sometimes acceptable as mediocre fit.

The other family of fit indices is comparative fit indices. Here we compare our model against alternative models. We estimate our hypothesized model; that is the one that gives us the chi-square. Then we estimate two alternative models. We have the null model, which is a model that should not fit the data at all. Typically the null model is a model where the variables have variances that are estimated but are constrained to be uncorrelated. Quite often what we study are measures that are supposed to measure the same thing and phenomena that are related, so the indicators are actually highly correlated, and a model that forces the indicators not to correlate at all will fit really badly. At the other extreme we have a model that fits perfectly, the saturated model, where we basically allow all covariances between the indicators to be freely estimated. These indices then quantify how far we are from the worst possible model towards the best possible model, on some metric of model fit. So, for example, a CFI of 0.95 could be interpreted as: if we start from the worst possible model and go towards the best-fitting model, we are 95% of the way there. The interpretation of TLI is similar; it just quantifies the degree of misfit a bit differently. (There is a small computational sketch of these indices at the end of this section.) The original rule of thumb for CFI was 0.95; if you are below that, then you have problems. But you can also see rules of thumb saying that sometimes 0.85 is okay and that 0.9 is already acceptable fit.

These indices have conceptual problems, and the big conceptual problem can be understood with an example. Let's say that I like running, and I claim that if I compare myself to the best marathon runner in the world, which is the saturated model, and to the worst possible marathon runner in the world, which is the null model, then I am 95% of the way toward the best possible runner. How do we come up with that kind of comparison? Well, you can just choose a person who is really bad at running as your reference point, for example a person without legs, and that person definitely cannot run a marathon. Then you can claim that you are a good runner, because compared to someone who cannot finish a marathon at all you look almost as good as the best runner, even though you are still hours slower than the world's best marathon time, which is a large difference.
So the fact that someone would take 100 hours for a marathon does not make you a good marathon runner if you are in the four-hour range compared to the roughly two hours of the world record. This is a bit of an unfair or illogical comparison, because we are using a reference model that should not fit the data at all. Why would it matter that our model fits better than a model that should not work at all? That is the argument against the CFI, and also against the TLI or any other index that uses this principle of comparing against the worst possible model. And when you read about these indices, you can find all kinds of tables in books that list cutoffs you can apply, and there are many other indices as well. So model evaluation basically reduces to finding the right index, the right cutoff, and the right book to recommend the cutoff that you want to apply. This is of course not a good research practice.

So let's take a look at how the chi-square test should be used, and let's consider two extremes. The argument for these indices was basically just an argument against the chi-square, and there are two extreme positions on the chi-square. One is that models with a significant chi-square are wrong and should not be published or trusted. The other extreme is that the chi-square can be completely disregarded and does not even need to be reported; this argument would further continue that you evaluate the model based on the alternative comparative and descriptive fit indices instead. Both of these positions have problems.

The first extreme has the problem that all multiple-indicator measures, so basically any measurement scale, would be rejected if the sample size is large enough, because it is unrealistic that when a person responds to survey questions, the measurement errors of every question are exactly uncorrelated with those of every other question. No one can design that kind of survey; it would also have to eliminate all item context effects and so on. The second problem is that we would lose information from models that are only trivially misspecified. It is possible that some of the residual covariances are 0.01 rather than exactly 0; the chi-square will eventually detect them, but they may not be large enough to make the model results untrustworthy. So that is one extreme perspective.

The other extreme perspective, which is actually much more common than the first one, is that the chi-square can be completely disregarded, like in the Yli-Renko et al. paper. The problem with this perspective is that although it is true that the chi-square detects trivial misspecifications in large samples, that does not imply that every significant chi-square reflects a trivial misspecification. Also, severely misspecified models are likely to be accepted: if we do not respect the model test, we are going to accept models that have serious local misspecifications in one part of the model, misspecifications that are not reflected in indices that just quantify the average misfit. And finally, it is not clear whether the alternative fit indices will actually help in detecting model misspecifications. There is mixed evidence on whether these indices work or not. We know for sure that they will not always work.
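A quick toy illustration of that point, with invented numbers, shows how an average-based index can hide one large residual:

```python
import numpy as np

# Invented standardized residuals for the 15 unique off-diagonal cells of a
# 6-variable model: fourteen are trivially small, one is seriously wrong.
residuals = np.array([0.01] * 14 + [0.30])

srmr_like = np.sqrt(np.mean(residuals ** 2))  # root mean square of residuals
print(srmr_like)         # about 0.078 -> "acceptable" by the < 0.08 rule of thumb
print(residuals.max())   # 0.30 -> a residual this large signals a real local problem
```

The average-based number looks acceptable even though one covariance is badly misfit, which is exactly the kind of local problem that an index quantifying only average misfit can miss.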
There may be scenarios where they work, but then the problem is: how would you know whether your study is done in a scenario where, for example, the CFI 0.95 rule is guaranteed to accept only trivially misspecified models?

So a more reasonable perspective is somewhere in the middle, and that more reasonable position takes the chi-square and respects it. If our model is not exactly correct for the data, we report that, and we say that the model may be misspecified, or is misspecified, in some way, and that the results may therefore be a bit less trustworthy than we would hope them to be. Then we need to do diagnostics to understand the misspecifications and the source of misfit, and when we understand the source of misfit, we can probably fix the model. It is fairly common, when you look at models in recent papers, that there are constraints that do not make sense. For example, explanatory variables, exogenous variables, are constrained to be uncorrelated. Those are almost certainly just model specification errors and not intentional, and they go undetected because people do not do diagnostics. Once you understand the source of misspecification and whether you can do something about it, the next step is a sensitivity analysis. If you do not have any theoretical reason to modify your model based on the diagnostics, you can free some of the parameters empirically. Your statistical software will provide you with something called modification indices, and you can check how much changing the model a little here and there would influence the results. If changing the model a little here and there would not change the results by much, then we can conclude that the possible misspecifications in the model probably do not have a major impact on our results. And importantly, the alternative fit indices are not needed anywhere in this more reasonable process.

So there are different arguments for and against these alternative fit indices. Kline summarizes these arguments really well in his book, in chapter 8, and he concludes that there is enough evidence to show that these alternative fit indices do not detect all serious model misspecifications. So it is possible to have a severely misspecified model whose CFI is still above 0.95, and therefore a CFI above 0.95 does not mean that there is no serious misspecification. His position is close to mine: the chi-square provides useful information. If it does not reject the model, then we know that the covariance fit is very good. If it does, then we need to do diagnostics and understand the source of misspecification.

One final thing worth mentioning about these indices is understanding where they come from. Who presented the first alternative fit index, and why? It was Karl Jöreskog, who developed LISREL, the first structural equation modeling software, and the first alternative fit index was the GFI index. He said that it was developed because users were unhappy that LISREL rejected their models with the chi-square test, and he wanted to provide the users something else; otherwise the users would stop using the software and switch to software that does not do model testing. So this is the history: these indices were invented to make users happy, not really to be good yardsticks of whether models are good or not.

So, conclusions about model fit. The chi-square is the only test of model fit.
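For completeness, the test statistic in its standard maximum likelihood form, which I will not derive here, compares the sample covariance matrix with the model-implied one:

$$
T = (N-1)\, F_{\mathrm{ML}}\!\left(S, \hat{\Sigma}\right),
\qquad
F_{\mathrm{ML}} = \ln|\hat{\Sigma}| - \ln|S| + \operatorname{tr}\!\left(S \hat{\Sigma}^{-1}\right) - p,
$$

and under the hypothesis that the model is correctly specified, T follows a chi-square distribution with degrees of freedom equal to the number of non-redundant elements of S minus the number of free parameters.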
If you want to test whether the model is correct for the data, then the chi-square does that job for you. The other indices quantify the average degree of misspecification in the model. The problem is that the average covariance misfit hides the individual residual covariances, and those individual covariances can indicate problems that are serious, or problems that could be addressed; just looking at the average hides them. So these indices are not helpful for doing model diagnostics. Nevertheless, they can be reported because it is a convention, and often reviewers will ask you to report the indices if you report only the chi-square. So maybe it is good practice to always have CFI, RMSEA, and maybe SRMR in your paper. Finally, if the model does not fit the data, it is important to do diagnostics using residuals, or potentially modification indices, to understand where the source of misfit is.
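Finally, to make the comparative indices concrete, here is a minimal sketch in Python of how CFI, TLI, and RMSEA are computed from the chi-square statistics of the hypothesized model and the null model; these are the standard formulas, and the numbers in the example are made up purely for illustration.

```python
import numpy as np

def cfi(chi2_m, df_m, chi2_0, df_0):
    # Comparative Fit Index: how far the hypothesized model is from the
    # null ("worst") model toward the saturated ("perfect") model.
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_m - df_m, chi2_0 - df_0, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den

def tli(chi2_m, df_m, chi2_0, df_0):
    # Tucker-Lewis Index: the same comparison, based on chi2/df ratios.
    return ((chi2_0 / df_0) - (chi2_m / df_m)) / ((chi2_0 / df_0) - 1.0)

def rmsea(chi2_m, df_m, n):
    # Root Mean Square Error of Approximation: excess misfit per degree
    # of freedom, scaled by sample size.
    return np.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))

# Hypothetical fit results: chi-square and degrees of freedom of the
# hypothesized model and of the null (uncorrelated-variables) model.
chi2_m, df_m = 85.3, 24
chi2_0, df_0 = 1200.0, 36

print(round(cfi(chi2_m, df_m, chi2_0, df_0), 3))   # ~0.947
print(round(tli(chi2_m, df_m, chi2_0, df_0), 3))   # ~0.921
print(round(rmsea(chi2_m, df_m, n=500), 3))        # ~0.072
```

Note how the null model's chi-square sits in the denominator of CFI and TLI: the worse the null model fits, and it usually fits very badly, the better these indices look. That is exactly the "comparison against someone who cannot run at all" problem discussed above.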