One thing to understand is that models have testable implications. A model test either rejects the model or fails to reject it. The idea behind model testing is that when you have a model with more than zero degrees of freedom, that model implies constraints on your covariances. For example, in this example from the previous video, I calculated an overidentification test. We have six variances and covariances and five free parameters, so they imply one constraint: the covariance between x and y divided by the covariance between x and m must equal the covariance between m and y divided by the variance of m. The way we construct a test is that we take this difference and compare it to zero. If it is sufficiently greater than zero, then we conclude that the difference cannot be attributed to chance alone and the model should be rejected; it is not an adequate representation of our data. In practice, we use the chi-square test. Let's go through five examples of model testing to understand what model tests are, how they work, and what they tell us. Our example data are here: we have a thousand observations. The number of observations doesn't really matter, but I generated 1,000 to have sufficient statistical power to clearly reject incorrect models. We have three variables, x, y, and z, that have roughly the same variances and are correlated to the same degree; this is the covariance matrix. Let's first estimate the regression model. The regression model is saturated, meaning that the degrees of freedom are zero and nothing is being tested. We have six unique pieces of information from the data, three variances and three covariances, and we are estimating six things.
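The overidentifying constraint described above can be checked numerically. This is a minimal sketch with made-up parameter values (not the data from the video), using the tracing rules for a mediation model x → m → y:

```python
# Hypothetical full mediation model x -> m -> y.
# All numbers here are illustrative, not estimates from the video.
var_x, a, b = 1.0, 0.5, 0.4        # var(x), path x->m, path m->y
var_em = 1.0                        # error variance of m

# Model-implied moments via the tracing rules
var_m  = a**2 * var_x + var_em      # var(m)
cov_xm = a * var_x                  # cov(x, m)
cov_my = b * var_m                  # cov(m, y)
cov_xy = a * b * var_x              # cov(x, y)

# The one overidentifying constraint: both ratios recover the path b,
# so cov(x,y)/cov(x,m) must equal cov(m,y)/var(m).
lhs = cov_xy / cov_xm
rhs = cov_my / var_m
print(lhs, rhs)                     # both equal b
```

In a sample, the two ratios will differ slightly by chance; the chi-square test asks whether that difference is larger than chance alone can explain.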
We estimate two exogenous variances, one error variance, one covariance, and two regression paths. So nothing is being tested; this is a saturated model. The fact that the chi-square test does not reject this model does not mean that the model is correct for the data, because we could equally swap y and x, having x as the dependent variable and y as a predictor, and the model would still fit perfectly, because a saturated model with zero degrees of freedom cannot be tested. Now let's take a look at a model that actually can be tested. If we remove this exogenous covariance here, we gain one degree of freedom. Every time you take a parameter out of the model, you gain a degree of freedom, and that has a testable implication. This model is rejected. It has one degree of freedom and it is overidentified, but we need to understand what is being tested. Can we claim, based on this model, that x and z cannot be causes of y because the chi-square test rejects the model? We cannot. When we test a model, we need to understand which constraints are being tested. This model allows free estimation of the x and y relationship and the z and y relationship, but it constrains the z and x relationship to be zero. So the only constraint this model implies is that x and z are uncorrelated. If we take a look at the sample covariances, the model-implied covariances calculated using the tracing rules, and the residual covariances, which are the differences between the implied and the sample covariances, we can see that the covariance between x and z is not well explained by the model. When a covariance is not well explained by the model, we need to ask what covariance the model implies. It implies a zero covariance. Why is the covariance zero? Because we constrained it to be so.
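The degrees-of-freedom bookkeeping behind these statements can be sketched as follows (a hypothetical helper for counting, not part of any SEM package):

```python
def model_df(p: int, free_params: int) -> int:
    """Unique variances/covariances of p observed variables,
    minus the number of freely estimated parameters."""
    unique_moments = p * (p + 1) // 2
    return unique_moments - free_params

# Saturated regression model: 2 exogenous variances, 1 exogenous
# covariance, 2 regression paths, 1 error variance = 6 parameters.
print(model_df(3, 6))   # 0 -> saturated, nothing is tested

# Same model with cov(x, z) constrained to zero: 5 parameters.
print(model_df(3, 5))   # 1 -> one testable constraint
```

Each degree of freedom corresponds to one constraint that the model places on the observed covariance matrix, which is exactly what the chi-square test checks.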
So what is being tested is whether x and z are uncorrelated, and that has no link to the claim that x and z would or would not be causes of y. It is completely uninformative about the claim that these are causes of y. Unfortunately, these kinds of models sometimes appear in published research and in papers that I get to review. The reason is that some SEM software requires that you manually add these correlations, so it is possible that the researcher draws a regression model and forgets to add the covariance. Sometimes when we see path diagrams in published papers, these covariances between exogenous variables are not included in the diagram, even though from the degrees of freedom you can see that they were actually estimated from the data, or they were treated as fixed, in which case they are not tested at all. Another scenario where exogenous variables are accidentally constrained to be uncorrelated is when you have exogenous variables that are observed, as in this case, where we have actual scores for x and z, and also latent variables that are unobserved and measured with indicators. I'll talk about those in a separate set of videos. In some statistical software those two sets of variables are uncorrelated by default, and then all the observed variables are uncorrelated with all the latent variables, and what is being tested is not whether there is a causal relationship from the x variables to y, but whether those two sets of variables are mutually uncorrelated. I have never seen a paper that actually justifies that kind of constraint, so whenever you see a zero-covariance constraint between exogenous variables, it is quite likely to be just a data analysis mistake. Let's take a look at the next model. What if we reverse the causality and say that y is a cause of x and z, and not the other way around? Well, this is also a saturated model. Nothing is being tested, and it fits the data perfectly.
If we remove this covariance here, then we gain one degree of freedom. So we have one degree of freedom, and it is again important to understand what is actually being tested. One could claim that because this model does not fit the data, y cannot be a cause of x or z. That claim is not valid. The reason is that the only thing being tested here is whether the model explains the covariance between x and z well. We can see from the residuals that that covariance is not well explained by the model. So what we are testing is whether the model explains that covariance, and nothing else. Then we need to consider what implies the correlation between x and z in this model, and it is the common cause y. So with this kind of model configuration, what you would be testing is whether the correlation between x and z can be fully attributed to the common cause y. These models, again, are something that a researcher might estimate accidentally. There is hardly ever a reason to constrain these errors to be uncorrelated, but in some SEM software you will need to add that correlation manually because it is not added by default. So it is very important to understand what exactly is being estimated and what exactly is being tested before you make any grand claims about what the test result implies. Let's take a look at the final example, model 5. Model 5 is a bit more complicated, and it is actually a model that is very common in the literature and is used for two different purposes. It is a full mediation model: we say that z influences x and x influences y. It is rejected by the chi-square test, so we know that the full mediation model is not an adequate representation of the data. The next thing, again, is to understand which part of the data the model doesn't explain, and we can see that it is the covariance between z and y. So what is being tested?
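The constraint tested in this common cause model can be written down with the tracing rules: if y is the only link between x and z, the implied covariance of x and z is the product of the two paths times the variance of y. A small sketch with hypothetical numbers (not the actual estimates from the video):

```python
# Hypothetical common cause model: y -> x and y -> z, with the
# error terms of x and z constrained to be uncorrelated.
var_y = 1.0
b_yx, b_yz = 0.6, 0.5                   # made-up path coefficients

# Tracing rules: the only path connecting x and z runs through y.
implied_cov_xz = b_yx * b_yz * var_y    # = 0.3

# If the sample covariance is clearly larger, the residual is what
# the chi-square test picks up: cov(x, z) is not fully explained by y.
sample_cov_xz = 0.55                    # made-up sample value
residual = sample_cov_xz - implied_cov_xz
print(round(residual, 2))               # 0.25
```

A nonzero residual in this one cell, not anything about the directions of the paths, is what drives the rejection.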
What is being tested is not whether the data follow a partial mediation model. Yet many researchers would argue that because the full mediation model is rejected, the data must follow a partial mediation model. Another group of researchers might argue that this is actually an endogeneity test: z is an instrumental variable, x is a potentially endogenous variable, and this is a test of whether the error term of x correlates with the error term of y. That is an endogeneity test, and this is in fact a fairly common one. Or we could argue that this is a test of whether z and the error term of y are correlated, or that there is a feedback loop between y and z. There are tens of different models that you can construct using just three variables. So we cannot use the rejection of the full mediation model as a criterion for accepting another model, because there are multiple different models that would explain the data well. What exactly is being tested here is whether the full mediation model fully explains the covariance between z and y, and nothing else. If the full mediation model is rejected and we consider alternative models, then we need to decide, based on theory, what the most plausible alternative is. Is it more plausible that there is partial mediation, or that x and y actually have a common cause that is not included in our model, or some other configuration? Then we go with that. We should not automatically say that because the full mediation model is rejected, a partial mediation model must hold for the data. Let's summarize the five models that we tested. We have the regression model with zero degrees of freedom, which cannot be tested. We have the regression model where the x variables are uncorrelated, which can be tested. We have the common cause model where the errors of x and z are correlated.
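As a sketch of why the full mediation model gets rejected when a direct path exists, one can simulate data with a direct effect of z on y and compare the sample covariance of z and y with what the full mediation model implies. All parameter values here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Simulate data where z has BOTH an indirect path (through x) and a
# direct path to y, so a full mediation model z -> x -> y is wrong.
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)
y = 0.4 * x + 0.3 * z + rng.normal(size=n)   # 0.3 is the omitted direct path

# Fit the full mediation model equation by equation (OLS), which is the
# ML solution for a recursive path model.
b_zx = np.cov(z, x)[0, 1] / np.var(z, ddof=1)
b_xy = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # biased: absorbs the direct path

# Tracing rules: implied cov(z, y) = b_zx * b_xy * var(z).
implied_cov_zy = b_zx * b_xy * np.var(z, ddof=1)
residual = np.cov(z, y)[0, 1] - implied_cov_zy
print(residual)   # clearly nonzero: the constraint fails, model rejected
```

The residual says only that the indirect path alone cannot reproduce cov(z, y); it does not say which of the many alternative three-variable models generated the data.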
We have the common cause model with uncorrelated errors, and then we have the full mediation model. Based on this summary, we can clearly see that testing a model does not tell us anything about directionality: comparing the first model against the third model, where we just reverse the paths, tells us nothing, because both fit the data perfectly. Some researchers are tempted to compare model 2, model 4, and model 5, because they all have one degree of freedom, and to conclude that the full mediation model is much more plausible for the data than the regression model. That is an invalid conclusion. The regression model only tests whether x and z are uncorrelated; it does not test whether x affects y, because the path from x to y is freely estimated. So you are actually testing whether you are omitting something from the model; you are not testing whether there is something in the model that shouldn't be there. We are testing constraints; we are not testing for unnecessary parameters, and this is important to understand. To summarize: these tests tell us whether the current model explains the covariances, and really not much more, and you need a really strong theoretical reason to argue that another model would be better. They are important for understanding which covariances are not explained, so you need to look at the residuals from the model. If the model fails, it means some of the covariances are not well explained, and you need to understand which ones before you make any claims about the model itself. These tests also don't tell us anything about any specific alternative model. Model tests are only about the model at hand, and they cannot tell us anything about a potential alternative. The fact that the full mediation model is rejected does not imply that a partial mediation model should be accepted. And finally, they tell us nothing about the direction of effects.
The direction of effects can only be assessed by having a research design that allows you to measure effects over time; just having a cross-sectional data set where you estimate the effect in different ways will not do that for you.