This video will show an example of diagnostics for a confirmatory factor analysis model. Doing these diagnostics takes a lot of time and a lot of expertise, so this video will be a bit longer than most other videos that I've made. Let's take a look at the problem that we are facing and what kind of data and model we have, to get an idea of the task ahead. I'm going to be using a very large sample size. The reason is that we want to understand what it means that chi-square will detect trivially small problems; if your sample size is 200 or 300, then chi-square still gives you some wiggle room. So we are going to be using a sample size of 6038, and the data are from the PISA survey. This is a survey of students, and we are going to be analyzing a model of three factors: aspirations, enjoyment, and self-concept. I'll explain the variables in a moment. This data set has been used to demonstrate non-linear structural equation modeling. The data are non-normal, so we will be using a robust chi-square statistic. The workflow here is the measurement validation part of the analysis. This is typically the part where you spend the most effort: once you get the CFA model to fit well, the structural regression model is fairly simple, because it tends to have fewer constraints than the CFA model. The degrees of freedom that you gain when you go from the confirmatory factor analysis to the structural regression model are fairly small compared to the degrees of freedom of the confirmatory factor analysis model itself, so a lot more things can go wrong in a confirmatory factor analysis model than in the structural regression model. The chi-square rejects the model, and that's the purpose of the diagnostics. If chi-square did not reject the model, then unless our sample size is very small, we pretty much know that the model explains the data nearly perfectly. 
But with this sample size, I've never seen a model that would fit perfectly on the first try, so you always need to do some diagnostics. The diagnostic tools that we use are residuals, modification indices, and exploratory factor analysis. I'll use a couple of different kinds of residuals and explain them when I use them. We'll do modifications: we'll add minor factors, because we discover that there is some dimensionality in some of the scales, and we address that by adding minor factors or doing bifactor models. We will also be dropping indicators, because some of these scales have more items than we actually need, and sometimes just dropping an indicator that you can't fix is the easiest alternative. In the last step I will also add a couple of error correlations that are going to be trivially small, but that's not the main thing in the diagnostics. The main things that I do are adding bifactor or minor factors to the model and dropping some of the indicators that are simply dimensional in a way that would be difficult to model. So let's take a look at why the chi-square is important and what the arguments against the chi-square are for this kind of data. I have another video about the chi-square that covers these arguments in more detail, but the main argument against chi-square is that every model that you can possibly fit is going to be an approximation of reality. Reality is more complex than our models; therefore all models are approximations and they are incorrect. They can be useful as close approximations, but they are not exactly the same as reality. Therefore, if we have a sufficiently large sample size and a sufficiently powerful test, which chi-square is, then ultimately every model will be rejected as the sample size grows. There may be some special cases: if you have, say, a computer-generated random number, it may be exactly uncorrelated with something else in the population. But beyond that kind of experimental research, 
if we do observational research, then getting the model to fully explain every correlation in the data is pretty much impossible with an infinite sample size. Instead of getting a model that fits perfectly, we are interested in understanding to what extent and how the model does not fit, and what the consequences of the misfit are. Then we make an informed decision on what to do. This is basically the approach that I follow. There are two extreme ways in which people address the chi-square; this is an argument that I explain in more detail in another video. One of these extremes is that models with a significant chi-square are wrong and should not be published, because we don't want to be publishing wrong models. This is not a very productive approach, because with a survey data set we are pretty much guaranteed to be unable to explain all the different processes that people go through when they answer those questions. We are simply trying to approximate the survey response process as well as possible and then say something about theory using those data. If we go down this road, we will also lose information that we could learn from trivially mis-specified models. If the model is mis-specified only a little, so that our estimates are biased by 1% or 0.5% or even less, then it's better to use those slightly biased estimates than not to use the results at all. The other extreme is to completely ignore the chi-square and look only at approximate fit indices: if CFI is greater than 0.9, then we have a well-fitting model, and if it's less than 0.9, then we do some stuff. But that's not a very good approach either, because these alternative fit indices are quite loose. CFI 0.9 is not that strict a benchmark, and trying to quantify the misfit with a single number hides a lot of nuances and a lot of things that we could actually address. 
It's possible that one of these summary indices of model fit shows that there is, on average, not much mis-specification, but the average hides one particularly problematic part of the model while the other parts are pretty much okay. So we need to respect the chi-square. If the chi-square tells us that our model is not correct, we need to understand in which way it is not correct. That requires diagnostics: we need to understand the sources of misfit, and we possibly need to do sensitivity analysis. That is, we need to understand how much the misfit, or the possible changes that we could make to the model, affect the results. If adding some things makes the chi-square non-significant but there is no meaningful difference in the correlation that we are interested in, then adding those things wouldn't really make a difference. Importantly, I'm not going to be using any of the approximate fit indices. I'll print them out just for reference and say a few words about them, but they are not part of the workflow. We'll be using the chi-square, modification indices, residuals, and exploratory factor analysis. This follows recommendations by Kline: basically, we are looking for theoretical reasons for adding things to the model. The data simply say that we could add something, but whether we actually want to do so needs to be justified based on theory. So we look at the survey items and then realize that, actually, when we look at the scale, there is some dimensionality that we should have noticed even before running the analysis. In some cases it's not obvious, but when the data show that there are two dimensions, we realize that, yeah, maybe that is the case. So we're not going to be using these approximate indices, but we'll be focusing on theoretical reasons. 
Why would you like to include something in the model? Let's take a look at the data. We have three scales. We have the enjoyment scale; it has five items, and the items are referred to as X1 through X5 in the model. Then we have self-concept, which is basically how much you think that you know science things, with six items. You can read more about this survey using these links: that's the link to the PISA report, and if you want to take a look at the actual forms, this is the link to the forms. And then we have career aspirations in science, four items measuring how much a person would like to work in the science field. So that's a quick overview. We are going to be fitting a three-factor model along these scale dimensions and then see what happens, and I'll be looking at the individual items when I make decisions on what to do about the model. So let's get started. The first thing that we do is simply run a confirmatory factor analysis of the data. We get 87 degrees of freedom, and the robust chi-square is very significant, so the model is not perfect for the data. At this point it is useful to check, for reference, what the levels of the alternative fit indices are for our data. The CFI is 0.974, which exceeds the commonly used benchmark of 0.95. AIC and BIC are not useful for model testing, so I'll not talk about them here; they are for model comparison, not model testing. Then we have RMSEA; the rule of thumb is that it should be less than 0.05, and ideally the upper limit of its confidence interval is below 0.05 as well, which is the case here. And then SRMR, the standardized root mean square residual, should be less than 0.05, which it is in this case by a pretty good margin. So if we only looked at these alternative fit indices, we would say that this model is okay; we would ignore the chi-square, we wouldn't do any diagnostics, and we would proceed. 
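As a rough sketch, the initial model fit described above could look like the following lavaan call. The data frame name `pisa` and the item names `x1` through `x15` are placeholders for illustration, not the actual variable names used in the video.

```r
# Sketch of the initial three-factor CFA (placeholder names).
library(lavaan)

model <- "
  enjoyment   =~ x1 + x2 + x3 + x4 + x5
  selfconcept =~ x6 + x7 + x8 + x9 + x10 + x11
  aspiration  =~ x12 + x13 + x14 + x15
"

# MLM gives the Satorra-Bentler robust chi-square for non-normal data
fit <- cfa(model, data = pisa, estimator = "MLM")

# Robust chi-square plus the approximate fit indices, for reference only
fitMeasures(fit, c("chisq.scaled", "df", "pvalue.scaled",
                   "cfi.scaled", "rmsea.scaled", "srmr"))
```

The `estimator = "MLM"` argument is what makes the reported chi-square robust to non-normality.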
At the end of the video I'll take a look at whether doing that would actually make a difference, but for now we'll do the diagnostics to understand the process. At the end of the video we will have a non-significant chi-square for the full model, with reasonable modifications and a sample size of 6000. So a sample size in the thousands is really not a justification for not looking at the chi-square. You can make the model fit; whether you actually want to do so is another question that I'll answer at the end of the video. So the approximate or alternative fit indices indicate good fit, and we could basically go with what we have, but the chi-square rejects the model, so we are in the position of: what now? And what we do now is the diagnostics. The problem is that these indices show an average. We have quite a few indicators, quite a few correlations, and 80-some degrees of freedom, so there could be a few small things that are grossly incorrect but not really shown, because we only look at the average degree of mis-specification. We have the average degree of misfit; we need diagnostics to understand which of the correlations or covariances the model does not explain. The first thing that I do is simply print out the residual covariance matrix, and here we have it. This is in the correlation metric, so the entries can be interpreted like correlations. When I do this kind of analysis at the office, what I quite often do is print out a large A4 or A3 sheet with the correlation matrix and the residual covariance matrix, and then use a marker to highlight large correlations and try to understand what is going on. Here I have highlighted in yellow all residual covariances that are more than 0.03, and in red everything that is more than 0.05, so we can see if there's a pattern in the data. 
If there's a pattern, misfit over here but not over there, that indicates a local mis-specification and we should do something about it. But here the misfit is more scattered around; there are no big clusters of mis-specification. Maybe X3 has a few more large residual covariances than X2, for example, but there are no blocks of large covariances, just individual items, X6 and so on. Another thing that we look at is whether the distribution of these residual correlations is about normal. In a well-fitting model the distribution should be normal, and, well, it's close to normal. If there are any peaks on the right-hand side or the left-hand side, that indicates larger values than what is expected by chance alone. In large samples these are normally distributed, and the chi-square just tells us whether that distribution is narrow enough that we can attribute it to chance. In this case we cannot. Another thing that I'm looking at here is the correlations within scales. We can see that the second scale looks better than the first and the third: the first scale simply has higher residual covariances than the second scale, and the third scale has higher covariances as well. This indicates potential dimensionality: one or more minor dimensions in the scale that the current model doesn't explain, and therefore some of the correlations remain unexplained. Another thing that we can look at is the standardized residual covariance matrix. It basically takes the residual correlation matrix, that is, the residual covariances in the correlation metric, and divides each covariance by its estimated variability. 
So we basically calculate something like a standard error for each residual covariance and then divide each residual covariance by that standard error. The standard error quantifies how much each of these residuals would vary over repeated samples if we were to repeat the study over and over with random samples. Residuals of course depend on sampling error, and we need to quantify the effects of sampling error. So these standardized residuals tell us how plausible it is that the residuals are due to chance alone. It is kind of like a chi-square test for each individual covariance, and these are actually interpreted like z-statistics: if you have a statistic with an absolute value greater than 2, then that residual is individually statistically significant. We can see that there are some large values, a 4.7 here, and blocks of 7s, 4s, and 3s, so we have individual correlations that are individually statistically significant. This standardized residual covariance matrix is sometimes more useful than the raw residual covariance matrix, because the numbers are larger and it's easier to see patterns in larger numbers than in smaller ones; that is one of the reasons why I'm looking at it. Another is that it tells us more directly which of these correlations affect the chi-square: if we take the highest standardized residual correlation, then freeing that will decrease the chi-square the most, at least roughly. All right, so we didn't see a particular pattern here in the residual correlations. What we do next is run an exploratory factor analysis to see what the result looks like. This is quite often what I do, simply because running an exploratory factor analysis is very easy and it always gives us some results that we did not know in advance. 
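The residual checks described above can be sketched with lavaan's `lavResiduals()`; `fit` here is a placeholder name for the fitted model object from earlier.

```r
# Residual diagnostics for the full model (placeholder object name).
library(lavaan)

res <- lavResiduals(fit)

res$cov     # residual correlations (observed minus model-implied)
res$cov.z   # standardized residuals, read like z-statistics

# Flag the large entries, mirroring the marker-pen highlighting step
which(abs(res$cov) > 0.05, arr.ind = TRUE)
which(abs(res$cov.z) > 2, arr.ind = TRUE)
```

By default `lavResiduals()` reports residuals in the correlation metric, which matches how they are read in the video.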
In the exploratory factor analysis, what I look at is whether all loadings except the main loading are close to zero. We can see here that the first five indicators load on one factor, the next six load on another factor, and the last four load on the third factor, and all the cross-loadings are very small: there is a 0.12, which is the largest, but otherwise they are less than 0.1. So if we were to use an exploratory factor analysis, we would consider this a very clean solution. We know, then, that the data have approximately three dimensions, but not exactly three. The chi-square tests whether exactly three dimensions explain these data, and the answer to that question is no; there is some dimensionality beyond those three main dimensions. But as an approximation, would this be good enough? Well, that depends a bit on the purpose of the study. Let's do the diagnostics and see what it takes to get the chi-square to be non-significant. The overview of the workflow so far is that we did the diagnostics for the large model, but trying to fix a large model by looking at it as a whole is not easy. I prefer to break the model into smaller components. If I have over-identified factors, which means factors with more than three indicators, then I'll look at one factor at a time; if I have just-identified factors, factors with exactly three indicators, then I may look at two factors at a time. So we're looking at smaller parts of the bigger problem, trying to fix those smaller parts first, and then returning to the bigger problem after we fix them. So let's take a look at the diagnostics that we do for one factor at a time. We have correlation residuals, standardized residuals, and modification indices, and we apply bifactor models and exploratory factor analysis. The objectives of this analysis in this video are to understand the source of mis-specification and to get a non-significant chi-square. 
I'll talk about whether these objectives make sense at the end of the video. So let's get started with the first factor. We have the five items X1 through X5, and we fit the confirmatory factor analysis. The model is rejected, so one factor is not sufficient for explaining these data. What we do now is look at the loadings, and we can see that all indicators except X3 are estimated to load about the same, and then we need to understand why X3 does not load as strongly. We have five degrees of freedom and a chi-square of 152, so this is not even close. For the next factor we actually have nine degrees of freedom and a chi-square of 80 initially, which is a lot closer than this one. Ideally the chi-square will be close to the degrees of freedom, and once it's about twice the degrees of freedom, depending a bit on the degrees of freedom, we'll start getting a non-significant chi-square in these small models. What we do next is take a look at what correlations we have in the data and what parts of those correlations the model does not explain. So let's take a look at these R results now; we have the items here. Whenever you think about why these items correlate and why the model does not explain those correlations, you need to focus on what the items actually are. These are the enjoyment items, and we can see some interesting correlations: we have a large residual here, a set of positive residuals here, and negative residuals here. So we have a pattern: two indicators correlate more than the model predicts, a set of three indicators correlates more than the model predicts, and these two sets correlate less with each other than the model predicts. This pattern indicates dimensionality: there are dimensions in the data that the model does not explain. X1 and X2 are about reading. 
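The one-factor-at-a-time step above could be sketched like this; again, `pisa` and the item names are placeholders rather than the actual names from the video.

```r
# One-factor CFA for the five enjoyment items alone (placeholder names).
library(lavaan)

fit_enjoy <- cfa("enjoyment =~ x1 + x2 + x3 + x4 + x5",
                 data = pisa, estimator = "MLM")

# Robust chi-square against 5 degrees of freedom
fitMeasures(fit_enjoy, c("chisq.scaled", "df", "pvalue.scaled"))

# Residual correlations for just this factor
lavResiduals(fit_enjoy)$cov
```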
If we take a look at these items, the first two are more passive: one is about learning without doing anything specific, and the other is about reading. The third, fourth, and fifth require more effort: you are acquiring new knowledge, you are doing science problems, and the fifth one is pretty generic. It's not that we would necessarily have noticed this in advance, but now that we have the results we can see that maybe there is some dimensionality along these lines. We can also check an exploratory factor analysis of these data. This is the unrotated exploratory factor analysis, and the reason it is unrotated is that I want to see how much dimensionality there is beyond the main dimension. The main dimension loadings are here; this is maximum likelihood factor analysis, so the loadings are pretty much the same as I had in the confirmatory analysis, and then I essentially run a factor analysis on the residuals from that first dimension. We can see that the first two items load negatively on the secondary factor, the third item not that much, and items four and five load positively on the secondary factor. The secondary factor indicates the existence of a secondary dimension, so we need to model this dimensionality in the scale. If we run an exploratory factor analysis using factor rotation, we discover that the first two items load on one factor and items three, four, and five load on the second factor, and the factors are correlated at 0.8. If we fit a single-factor model to the data, this 0.8 is treated as exactly one, which it is not, and the chi-square detects that the model does not fully account for the dimensionality of the indicators. So what do we do? 
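The unrotated and rotated analyses described above can be sketched with base R's `factanal()`; the data frame and item names are placeholders.

```r
# ML factor analyses of the enjoyment items (placeholder names).
items <- pisa[, c("x1", "x2", "x3", "x4", "x5")]

# Unrotated: the second factor shows what remains beyond the
# main dimension
print(factanal(items, factors = 2, rotation = "none")$loadings,
      cutoff = 0)

# Oblique rotation: separates reading (x1, x2) from doing
# (x3, x4, x5) and reports the factor correlation
factanal(items, factors = 2, rotation = "promax")
```

With `cutoff = 0` the print method shows all loadings instead of blanking out the small ones, which is what you want when hunting for secondary dimensions.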
Our options are a bifactor model with reading and doing as minor factors, or one secondary factor, reading versus doing, which loads positively on X1 and X2 and negatively on X3, X4, and X5. The second option basically specifies the kind of factor structure that we had in the initial exploratory factor analysis, with the secondary factor loading both positively and negatively. I'm going to try the bifactor approach first, because it is a bit easier to do. We add a secondary factor for reading loading on X1 and X2; the two factor loadings are constrained to be equal for identification, and the secondary factor must be uncorrelated with the primary factor, so I add orthogonal = TRUE as an argument to my function call. Let's see the results. We have a robust chi-square which is significant, so the model is rejected, but there's quite a large difference from the previous model: the previous robust chi-square was 150, and now we're down at 11. So there is clear dimensionality that we need to account for in the analysis. Another thing that we notice is that the loadings of indicators X1 and X2 decreased, from about 0.7-something to 0.6-something, and the minor factor loadings are substantial. Normally, when we add a minor factor to the model, ideally the minor factor would load clearly less strongly than the main factor, just so we can call it a minor factor, but these are pretty substantial loadings. Why do these indicator loadings decrease? To understand that, let's take a look at the Venn diagram. In this Venn diagram we added a minor factor of reading, which is basically the blue area here, and then we have the general factor, the factor enjoyment. But because we did not also add the doing factor, this general factor of enjoyment actually includes the enjoyment 
factor as well as the doing factor, and for that reason X1 and X2 load with smaller coefficients: they don't correlate with the doing factor. That's why these loadings are lower. All right, let's take a look at the residuals. We have the observed covariance residuals and the standardized residuals. The X1, X2 correlation is explained perfectly, because we added a minor factor that explains that correlation exactly, and all these others here indicate the remaining picture of misfit. But we knew that the model was not going to fit well, because we identified both a reading factor and a doing factor and we only added one. We can also take a look at the modification indices. These tell us that maybe we should add X4 as an indicator for reading, which wouldn't make much sense, or free the X5, X3, and X4 correlations, and so on. If we take a look at the items, we can also think: okay, X3, X4, X5 are problematic; we try to address that by adding a secondary factor for doing that loads on those indicators. So we add a secondary factor for doing, and now we have two minor factors. We run the model, and we got a non-significant chi-square with one degree of freedom; the model is not rejected. So do we use this model or do we do something else? Let's take a look at the estimates. The estimated factor loadings are here. Once we added the doing factor, X1 and X2 load higher on the main factor, which indicates that the doing factor is no longer contaminating the main factor. But one thing we notice is that this is a pretty large loading for a minor factor, so this minor factor actually explains a fair share of the data, and we would like that level of explanation to be a bit lower. The residuals look good; there is some misfit left, but it could well be due to chance alone. So I'm going to try the other option as well: a secondary factor 
that loads on all the items: positively on the first two items and negatively on the last three. I will add this constraint here, and this constraint is required for convergence: if we don't add it, the model will not converge, because the model is not identified, for the same reason that an exploratory factor analysis is not identified but can be rotated to equivalent solutions. We will not care about that for now, because the factor will be identified once we embed it into a larger model that contains the other factors as well. So how does the model fit the data? Well, the fit cannot be tested, because the model is not identified; this is not a testable model. But what we can see is that the factor loadings here look good, and, importantly, the minor factor loadings are smaller than in the previous model: we don't have the large loading for X4 anymore. I'm just going to go with this model, but we have to remember that it is not identified, so these loadings may change, and we need to revisit them once we have this model in a larger context. The fact that the model is not identified at this point is not problematic. Of course, if this were our final model and it were not identified, that would be a big problem, but we know that it will be identified later, and we are just trying to see whether this kind of model would explain the data; all non-identified models with the same constraints explain the data equally well. So we're not really that interested in the specific values at this point; we'll identify them later. Summary so far: what have we done? 
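The two modeling options discussed above could be sketched roughly as follows in lavaan. All names are placeholders, and the equality constraints shown are one plausible way to set this up; the exact constraints used in the video may differ.

```r
# Option 1: bifactor-style minor factors. The two reading loadings
# are constrained equal (label "a") for identification, and
# orthogonal = TRUE keeps the minor factors uncorrelated with the
# main factor.
library(lavaan)

m_bifactor <- "
  enjoyment =~ x1 + x2 + x3 + x4 + x5
  reading   =~ a*x1 + a*x2
  doing     =~ x3 + x4 + x5
"
fit_bi <- cfa(m_bifactor, data = pisa, estimator = "MLM",
              orthogonal = TRUE)

# Option 2: one secondary reading-versus-doing factor on all items.
# Equality constraints within each block of items help the model
# converge; on its own this model is not identified, so the
# loadings are only interpretable once it is embedded in the
# larger model.
m_secondary <- "
  enjoyment =~ x1 + x2 + x3 + x4 + x5
  readvdo   =~ a*x1 + a*x2 + b*x3 + b*x4 + b*x5
"
fit_sec <- cfa(m_secondary, data = pisa, estimator = "MLM",
               orthogonal = TRUE)
```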
The full model did not fit the data; the exploratory factor analysis indicated the existence of a quite clean three-dimensional solution. We then did diagnostics one factor at a time. The first factor showed dimensionality, reading versus doing, and we needed to address that dimensionality somehow. Alternative one was to add two minor factors, making it a bifactor model with a reading factor and a doing factor. The other alternative was to add a single reading-versus-doing factor that loads on all items; this is similar to running an unrotated exploratory factor analysis, where you first take the main dimension and then factor-analyze whatever covariation remains after that first dimension. That is the choice that we made, because the loadings on that secondary factor were smaller than in the bifactor case. Then we proceed to the next two factors, for which we need slightly different diagnostics, then we do the full model, and then we decide what to report and which model to retain in our final interpretation. So let's take a look at the second factor. We have six items here, and this is the R code for the analysis. Before we actually run the analysis, let's take a look at these items and how they differ, because we can clearly see that the first item is more future-looking than the others: item X6 is about what you feel about the future, and items X7 through X11 are about the present and the past. Then item X7, I can usually give good answers to tests, is more objective, more fact-based than the others: the others are more like assessments of how I see myself, and this one is more about facts, about whether I have been able to perform well in the past. Items X8 through X11 are more about your feeling of yourself at the current time or in the past. So we could expect X6 and X7 to be slightly different from the rest. Let's take a look at the analysis. These are our results, and the first thing we note is that the model is 
rejected: the chi-square is significant. The loadings of the first two items are a bit lower than those of X8, X9, X10, and X11, and that was expected, because they are slightly different from the other items. So what do we do? Let's take a look at an exploratory factor analysis first. The main factor here loads on all items except X11, and the second factor only loads on X11, so there is no clear dimensionality. Once we start getting factors in an exploratory analysis that explain only one indicator, we know that there is no meaningful factor structure beyond a specific number of factors. So there is no clear dimensionality this time, and we can actually see that from the chi-square as well: a chi-square of 81 with 9 degrees of freedom is a lot closer to fitting than what we had before, a chi-square of 150 with 5 degrees of freedom initially. The starting point is already much closer to unidimensional than the previous scale. So let's move on: no clear dimensionality. Then we inspect the residuals. We have the sample statistics residuals in the correlation metric and then the standardized residuals; I have highlighted values greater than 2 in yellow and values greater than 4 in red, just to see the pattern in the residuals. What do we observe? The correlations of X6 and X7 with the other indicators are lower than the rest, and that's natural, because they measure slightly different things: there is the future-looking dimension and the fact dimension, compared to the feeling of the present. There is no clear pattern in the residuals, although we can see that X11 has the most very large residuals, and X6 has larger residuals as well. So what do we need to do? Let's take a look at what the modification indices suggest. They simply tell us that we should allow X10 and X11, or X8 and X11, to correlate. If we take a look at 
these items, we can see why X10 and X11 correlate: both contain the word understanding. So X10 and X11 are about understanding, while X8 and X9 are more about whether you learn quickly. A person can learn quickly without really understanding, or really understand things without being that quick a learner, so there is a dimensionality to learning, and that dimensionality shows up here in the scale and in the modification index. We can also see the X10, X11 correlation here: a large positive residual. So what's the actual observed correlation? And now there's an interesting thing: why is this correlation, 0.495, underestimated by the model? It is not a very large correlation; if it were larger than the others, we could say that maybe there is dimensionality in the data, but 0.495 is actually smaller than 0.522. So why is this particular value underestimated by so much? We need to understand that the implied correlation is the product of the two factor loadings of X10 and X11. X11 is estimated to be very reliable, but X10 less so. So why is X10 estimated to load less on the factor than X11? This smaller loading of X10 is what causes the positive residual covariance, because the implied covariance is the product of these two loadings. And why is there such a small loading on X10? The reason is that X10 correlates very little with X6 and X7; those correlations explain why the factor loading is small, and the small factor loading explains the large residual covariance. So what do we do? We have some options. I will try dropping X6 from the model, because X6 is more future-looking than the others, so it really is different from them. For X10 there is no clear reason to say that it is different from the others, so it is easier to justify dropping X6. We drop X6 and we will see 
what happens. That will likely increase the reliability of X10 and also decrease the residual covariance. So let's take a look: we run the model without X6, we get a significant chi-square, and the model is rejected. Now what? The loading of X10 increased a little, but not much, and when we look at the residuals we see the same pattern: the X10-X11 correlation is not fully explained by the model, so that residual is large, and X8 and X9 also have a large negative residual. If we look at the modification indices, they simply say to free the correlations that we have highlighted, so that is not very useful. Let's do a similar analysis of the loadings and the actual items. Why is the loading of X10 so low? It is low because of the 0.396 correlation with X7, which pulls the reliability, or factor loading, of both X7 and X10 down, and that makes the residual covariance large.

Now we need to start thinking about what to do about this. There is no clear reason to add dimensions to the model: in the previous scale there was clear dimensionality, here not really. So do we drop items? If we start dropping items to get the model to fit, we need to understand the meaning of the concept. What is the meaning of self-concept? Here is one definition, from Wikipedia. Does the definition tell us whether some of these items are incompatible with it? Looking at the definition, there is no clear mismatch between any of the indicators and the definition, so that does not really help at this point. What I will try instead follows from the facets these items capture: X6 and X7 were slightly different from the others, so if I also dropped X7, I would have a very homogeneous set of indicators. But if we look at which items are actually problematic, X7 is explained really well by the model.
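The mechanics used above, that the implied correlation of two indicators of the same factor is the product of their standardized loadings, can be sketched with a small calculation. The loading values below are made up for illustration; only the observed 0.495 correlation comes from the video:

```python
# Under a one-factor model, the model-implied correlation between two
# indicators is the product of their standardized factor loadings.
# The loadings here are hypothetical, not the actual estimates.
lam_x10 = 0.60   # X10: pulled down by its low correlations with X6 and X7
lam_x11 = 0.75   # X11: estimated to be more reliable

r_observed = 0.495             # observed X10-X11 correlation (from the video)
r_implied = lam_x10 * lam_x11  # 0.450
residual = r_observed - r_implied

# A positive residual: the model underestimates the X10-X11 correlation.
print(f"implied = {r_implied:.3f}, residual = {residual:+.3f}")
```

The point is that a residual can be large even when the observed correlation itself is unremarkable, because one attenuated loading drags the implied value down.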
The residuals for X7 are small; the problems are with X8, X9 and X10. So we will try dropping X10, because if we drop X10 we don't really lose much: it is about the perception of one's capabilities at the current time, instead of being future-looking or fact-based like the first two indicators. So we drop X10 from the model and see what happens. Now we get a non-significant chi-square, so the model is not rejected. We have two degrees of freedom, we declare victory for this factor, and we move on.

Now, is this the right way of doing it? We dropped two indicators, and when we drop indicators we lose information. In principle it is possible to keep them, and I was actually able to get a model with all six indicators that fits the data, but that model gets really messy. If we want to communicate our research results to other people, it is easier to go with simpler models that are adequate, and if we really want a non-significant chi-square in this case, dropping two indicators gets us there. And if we consider the indicators that we dropped, X10 and X6, we don't really lose anything important given the definition of self-concept. So we can drop indicators as long as the remaining indicators as a set still cover the concept, so that we don't lack content validity, and as long as we have enough indicators to identify the model. A four-indicator factor is pretty good because it is testable, and with 6,000 observations the chi-square does not reject this model.

So, a summary this far. We tried the full model; the chi-square rejected it, so it did not fit perfectly, but the exploratory factor analysis looked good, which gave us some hope that we can actually find a well-fitting model. Then we did diagnostics one factor at a time. The first factor had dimensionality, so we added a secondary factor. The second factor did not have a clear reason to misfit, and the degree of misfit was originally not as bad as for the first factor: if we look at the actual residual covariance values, they were not that large to start with, so perhaps we could have just gone with the full six-indicator set and maybe that would not have made a big difference; we will take a look at that at the end of the video. The second factor was fixed by dropping two problematic indicators, so basically we omitted information, and then the model was able to explain what remained in the data. It is also possible to get a well-fitting model with six indicators, but that gets a bit messy, so I did not want to include it in the video.

Now we proceed to the final factor, and then we will test the full model with these modifications. The final factor has four indicators; if we drop any of them, the model becomes just identified and we know it will fit the data perfectly. Let's take a look at the results. The chi-square says no: the model does not fit, it is rejected, and we now need to understand the reason. If we look at the loadings, y2 loads highly, y4 not so much, but the loadings don't really help us that much. Let's look at the residuals. The pattern of residuals shows a large positive residual correlation between y1 and y2; we can look at the standardized residuals, where it is easy to see the pattern. Then we have a positive residual between y3 and y4, and, as a set, y1 and y2 are negatively correlated with y3 and y4, so all of those are negative. We also have y1 and y4, which are not as highly correlated as the other indicators. This kind of pattern indicates dimensionality, and the small correlation between y1 and y4 indicates that it is specifically y1 and y4 that measure two different things. Now we need to look at the actual items to understand what y1 and y4 represent: y4 is more about working on projects, and y1 is more about having a career in science.

So what do we do? We proceed and look at the modification indices. They basically say: add a correlation between y3 and y4, or add a correlation between y1 and y2. That is equivalent to adding a minor factor for y1 and y2, or a minor factor for y3 and y4, and that is what the modification indices show. Then we do an exploratory factor analysis and look at the rotated solution. In the rotated solution we see the first two indicators loading on the first factor and y3 and y4 loading on the second factor, with y2 and y3 cross-loading a bit, so there is a non-trivial cross-loading for both. So what is the dimensionality? We have y1 and y2, which are more directly about the career, and y3 and y4, which are more about doing science, and our scale was about career aspirations; the modification indices simply point to these minor factors. So our options are to use a bifactor model to explain the two dimensions, or to just drop y4. The reason for dropping y4 is that career aspirations in science are more about career, and y1 is more about career than y4. If we drop y4, we get a unidimensional solution; because the model will be just identified, it will fit perfectly, and if the factor loadings are acceptable we can go with that model. We can see that y3 cross-loads on the first factor quite heavily, so we can argue that whatever is in y4 will go into the uniqueness of y3 and will not cause misfit in the model. So let's take a look at the model that we chose. It is not testable, the degrees of freedom are zero because it is just identified, so we just look at the factor loadings. The estimated factor loadings are pretty good; y3, because it loaded on the secondary dimension, has higher uniqueness than the others, but that is not really a big problem: no one is going to complain about a 0.7 loading in a confirmatory factor analysis model.

So, a summary this far: we have been able to get every factor to fit
individually, or to be untestable with zero degrees of freedom. Now we are going to put the factors together into the larger confirmatory factor analysis model: we take the modified factors and put them back together. The idea was that we first test the full model; it does not fit, so we take it apart and fix the individual parts, and now we reassemble them. So this is the putting back together: we have the reading-versus-doing minor factor, otherwise the factors are the same, and we dropped one indicator from aspirations and two indicators from self-concept. Let's see what the chi-square tells us. Are we winners or not? No, we are not winners, and the model is rejected, so we need to do something about it.

We can now do an exploratory factor analysis to get some hope. The EFA of these items shows a pretty clean solution, but that is expected, because the original solution was pretty clean and we only dropped a few indicators, which should not change it much. So the data have approximately three dimensions. The largest cross-loading is now 0.09; it was 0.112 in the original EFA, and we dropped the cross-loading items, so that is good. Now let's look at the individual items. What can we say about them based on the exploratory factor analysis? This is the full set of items.

The first item that does not behave well is X3. X3 does not have a high loading on the first factor, and it has cross-loadings on factor 2 and factor 3. Why is X3 different? Well, it is about doing problems. When we analyzed the first scale, we noticed the doing-versus-reading distinction, and this item is more about doing than the others; this exploratory solution has no secondary factor for reading versus doing, so that is why X3 does not fit. Then we had X7, which is more objective, so it is different from the other self-concept items: the others are more about how I see myself, while this one is more about facts, have I been able to perform well in the past, something a person could answer in an objective way. Then we have X10, which again loads at less than 0.7, but here the cross-loadings are not that high, so maybe we keep it in the model. Dropping both X7 and X10 would be problematic as well: if we drop both, we only have two items left. The Y items, Y1, Y2 and Y3, load highly on the factor where they are supposed to load, and their cross-loadings are practically zero.

Then we look at the residuals, marking the largest residuals with red and large ones with yellow. We can see that X3 is the item with the highest residuals even after the full CFA. When we look at the item, it is about problems: the other items are about learning, this one is about problems, and if you like doing science problems, that probably correlates with liking to do them as an adult as well. So X3 could actually measure, in a way, the same thing that the aspiration scale measures when it asks whether you want to work on science problems when you grow up. We could of course model that cross-loading, but we have enough indicators, and getting a model that explains all these correlations would be quite difficult, so let's take the easy road and drop X3 from the analysis. We then still have four indicators for the first factor, X1, X2, X4 and X5, so even after dropping one we have four indicators, and that is good. So we drop X3.

Let's look at the factor analysis now. This is our second modified full factor analysis model. The chi-square tells us that we are not winners yet, so the model is rejected, but we are getting closer. The test statistic gets smaller, and of course the degrees of freedom get smaller as well, but that is what happens when we simplify the model: we drop constraints, we get a better fit, but there is also less to be tested. When we look at the residuals, there is a lot less red now. I will pull up the items again: what do we do next? We can see that X7 and X4 have the largest residuals, so we could consider dropping X7 or X4 just to improve the fit, and then we need to consider what we lose. If we drop X4, we only have three indicators of enjoyment, but we had two factors to explain those three indicators, so we would be seriously overfitting the model. X7, on the other hand, could be dropped because it is more objective and therefore different from the others: the other self-concept items are more about feeling, and this one is more objective, so that would be a reason for dropping it. Also, the self-concept scale had four items but no minor dimensions, so if we drop X7, that factor is still identified by itself, without needing information from the rest of the model, and that produces more stable results. So we are going to drop X7 on these grounds.

Then let's take a look at the model. This is our model after dropping indicators: four indicators for enjoyment, with a bifactor structure for that four-indicator set, and three indicators each for self-concept and aspirations. Those factors are just identified; three indicators is still pretty good. Then let's see what the chi-square says.
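As an aside, the degrees-of-freedom bookkeeping behind "just identified" and "testable" is easy to verify: a one-factor model with p indicators has p(p+1)/2 observed variances and covariances and 2p free parameters (p loadings plus p unique variances, with the factor variance fixed to 1). A minimal sketch:

```python
def one_factor_df(p: int) -> int:
    """Degrees of freedom of a one-factor CFA with p indicators
    (factor variance fixed to 1; no minor factors or error correlations)."""
    moments = p * (p + 1) // 2   # observed variances and covariances
    params = 2 * p               # p loadings + p unique variances
    return moments - params

print(one_factor_df(3))  # 0: just identified, fits perfectly
print(one_factor_df(4))  # 2: testable with a chi-square
print(one_factor_df(5))  # 5: matches the first scale's initial 5 df
print(one_factor_df(6))  # 9: matches the second scale's 9 df
```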
No victory yet, but we are getting closer: the chi-square is significant, so the model is still rejected. Let's print the residuals. Looking at the individual standardized residuals, there are no large standardized residuals anymore: everything is below four in absolute value, so they are still significant, but not as large as before. Then we can also ask: when we look at the absolute magnitude of the correlation residuals, the largest one is 0.04, so is it really a big concern that there is a small 0.04 correlation that the model does not explain? Is that correlation already trivial? These residuals are very close to zero, but not exactly zero, so would this qualify as a trivial misspecification that is detected by the chi-square only because the sample size is 6,000? With a sample size of 6,000, a correlation of 0.04 is statistically significantly different from zero, but whether it is meaningfully different from zero is an entirely different question.

Let's look at the residuals and modification indices. The modification indices simply say to add a correlation between X4 and X11, because that is the largest residual, and if we add that correlation, its expected estimate is going to be 0.03. So should we add a correlation to the model whose parameter value will be that small? Can we just assume that zero is a good enough approximation, or do we really need to add it? What I am going to do is a bit of data mining, and then we will see the end result. I will do a couple of rounds: I add an error correlation to the model, always picking the largest of the modification indices, and see what happens. I do this for four rounds, we get four error covariances, and the model is not rejected. So we did a bit of data mining at the end. I tried to come up with explanations for those
correlations, but I really couldn't come up with minor factors. Then again, if we look at these correlations, they are trivially small. So we added four correlations that are not driven by theory, but the correlations are practically meaningless: even with a sample size of 6,000, you wouldn't notice correlations like these with the naked eye, so they make absolutely no difference in most research scenarios. Because these correlations are so small, I think, at least myself, that adding them purely based on this data-mining approach is okay, because they don't really matter.

Alright, let's look at the model. The estimates are reasonable, the factor loadings are what we expect, one loading on the minor factor is a bit higher than I would like, but I can live with that, the other minor-factor loadings are small, and the factor correlations are pretty high, which is expected based on theory. Then let's do model comparisons; here is the code if someone wants to replicate the analysis. We are comparing five confirmatory factor analysis models: the full CFA, which is the original model; the CFA after the factor-level diagnostics, where we identified the secondary bifactor in the first scale and dropped indicators from the second and third scales; then we dropped X3 in the third model and X7 in the fourth model; and in the final model we did data mining to get a non-significant chi-square.

Now the question is: which one should we use? We could go for the final model on the basis that it is non-significant according to the chi-square test, but then again, we had to do a bit of data mining to get there. After dropping X3 the residuals were already very, very small, so we had a very well fitting model: there were only trivial residual correlations, and they were significant only because the chi-square has so much power in large samples. So maybe the model where we dropped X3 would be acceptable and we don't need to go to the data mining, or maybe the model where we dropped X7, because we could still justify that action; it was not purely data mining.

The second question is: did the model modifications make a difference? We are interested in estimating the factor correlations, so are there meaningful differences between them? If we compare the original confirmatory factor analysis model and the final one, there is a minus 10% difference, and minus 10% probably makes a difference for many people. That just shows that when we fit the full CFA, even if all the alternative fit indices look good, that does not guarantee that the results are unbiased or the model is correct. If we do diagnostics, we can understand the source of misspecification, and when we address it, the results might change. Let's compare the model where we dropped X3 with the final model: the largest difference is again between the same two correlations, and it is only a minus 4% difference. After dropping X3, which was reasonably justified, we started to do more questionable things, more data mining than theoretical justification: dropping X7 may be justified, but the error correlations were pure data mining. So would it make a difference whether we choose the third model or the fifth model? We need to understand whether minus 4% is large, so we need to contextualize what counts as a large difference.

There are a couple of things we can consider as context. Quite often in methodological research, when we analyze how different estimation strategies work, we can prove that estimators are consistent, but we cannot prove their small-sample behavior, and in practice many of our estimation techniques are biased in small samples; maximum likelihood estimates are biased in small samples. In methodological research we typically consider 5% bias to be trivial, and we are okay with having 5% bias. If you think that 5% bias is too much, then you pretty much
shouldn't do anything with sample sizes in the hundreds using maximum likelihood, because there is bias, and that is simply something you cannot avoid. So based on the consideration that 5% bias is trivial, I am going to choose model 3. The later modifications were something we did as diagnostics, and because the effects of dropping X7 and doing the data mining did not change the results in a meaningful way, we are not going to use those models, but it is important to show them as diagnostics.

So how do we justify choosing model 3? We first look at the residuals in an absolute sense: does this model fit the data? We can see that the largest residual correlation is 0.049, which is very small, and the average absolute residual correlation is 0.011. All the approximate fit indices also indicate extremely good fit. And this is what I wrote up: if I were to report this analysis in a paper, I would write something like this. I would probably expand it a bit to give more details, and I would put an appendix in the paper to explain exactly what the decisions were and how I justified them. This justification has some important elements. We have the initial model test, the starting point: the first model did not fit the data. Then we explain the workflow of the diagnostics. We performed the diagnostics using residuals, modification indices, and other empirical results. Those empirical results suggest modifications, but they are not justifications for modifications: a high modification index does not mean that you should actually do something to the model, it is just a suggestion by the computer. What justifies the modifications is the item wordings. So whenever you consider doing something, for example adding a correlation between X1 and X2, you need to look at the item wordings to understand why X1 and X2 correlate more with one another than the model predicts.

Then, I applied bifactor models instead of correlated errors. Correlated errors are more common, but bifactor models are better because they force you to be more explicit about your assumptions: at least you need to name the factor, and when you come up with a name, you have to think about what the factor actually represents, what the omitted common cause is that the original model did not include. Then we dropped items, and it is important to understand that I did not drop items because some of them were estimated to be less reliable than others; I dropped items because there was unmodeled dimensionality in the scale and I could not come up with a clean and reasonable way to model that dimensionality, so it was easier to just simplify the scale. Another important thing about item dropping is that the meaning of the scale should not change: if we have items about objective performance in tests and items about how you believe you do in a school subject, then dropping all the objective items changes the meaning of the scale. We need to make sure that we actually get the general factor, and not a combination of the general factor and a minor factor, when we drop items, unless the minor factor is something we are actually interested in. Then we re-estimated the full model with these modifications, got a significant chi-square, and did some more modifications: we again dropped the worst-behaving items. At this point it is important to have at least three indicators per scale, because otherwise the model becomes unstable: factors with only two indicators need information from other parts of the model for identification. And again, when we drop indicators, we need to make sure that the scale
meaning does not change. Then we added error correlations, but we can only do that kind of data mining when the effects are trivial. This was simply to show that we can actually get a non-significant chi-square and that what we need to do to get there is trivial: we add small correlations, around 0.03, to the model, and that makes the chi-square non-significant. Then we do a sensitivity analysis: what would be the effect had we gone full data mining and pushed the chi-square to non-significance? If the difference is trivial, we can conclude that the effect of the misspecification in the model we actually chose is trivial as well. On data mining, there is research showing that by following modification indices you are not going to end up at the correct model: you are basically adding correlations that capitalize on chance and do not reflect the underlying population. So if you can have a more parsimonious model whose results are not very different from the final data-mined model, retaining the more parsimonious model is the better choice. Then we make a decision and explain the justification.

Now, the final thing: what is the point of doing all this? It is a lot of work; it took me almost two full days at the office playing with these data. First getting the survey forms, trying to understand the different translations, how the scale was translated to Arabic, whether something in the translation process could have caused dimensionality in the scales, how the scales were validated, that kind of thing; studying the scale, getting the items, and then running all kinds of factor analyses. I did some steps that I omitted here because I eventually chose not to use them, so some of the bifactor models that I tried are not included in the video. So it is a lot of work, and what is the point? The point of the diagnostics is understanding the misspecification. In this case the original model was pretty good, and even without any of these diagnostics, the original model and the final model were very close to one another in terms of results, so going with the first model would not have been a big mistake. But the problem is that this is not guaranteed. Even if your CFI is more than 0.95, it is possible that there are local misspecifications that affect one of the correlations a lot but not the others; that would not be captured by the CFI, but it would affect your results. This is the nature of the problem, and diagnostics and analysis of the item wordings can reveal issues that can be addressed, for example by modeling dimensionality in the scales. Doing diagnostics is important for the same reason we do diagnostics in regression analysis: we need to be sure that the model is good for the data and the results are trustworthy.

There is also another reason for doing this. Getting a non-significant chi-square is less important, but it can be an objective, and sometimes you need to go for it. Why would you want a non-significant chi-square? One reason is that you may have a co-author or a reviewer who insists that only models with a non-significant chi-square should be published, and that models with a significant chi-square are wrong and should not be published. Well, with 6,000 observations a correlation of 0.04 is going to be statistically significant, so that person is basically arguing that a correlation of 0.04 is meaningful. Of course, in smaller samples there is a lot more leeway, more room for sampling variation in the chi-square, but with 6,000 observations we need to get the model almost exactly correct to get a non-significant chi-square. Another thing is that the final two steps in the modeling, dropping X7 and adding those very trivial correlations in full data-mining mode, did not change the results at all.
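The claim that a correlation of 0.04 is statistically significant with about 6,000 observations is easy to verify. This is a generic large-sample normal-approximation test for a single correlation, not the robust chi-square machinery used in the video:

```python
import math

# Is r = 0.04 significantly different from zero when n = 6038?
# Large-sample approximation: z = r * sqrt(n) under H0: rho = 0.
r, n = 0.04, 6038
z = r * math.sqrt(n)
p = math.erfc(z / math.sqrt(2))   # two-sided p-value

print(f"z = {z:.2f}, p = {p:.4f}")   # z ≈ 3.1, p ≈ 0.002: significant
```

So even a residual correlation most people would call trivial is flagged at this sample size, which is exactly the argument made above.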
There was a minus 4% difference in one factor correlation; in the other factor correlations the differences were in the third decimal. Doing those extra steps would not make a difference, so if a reviewer tells me to do something that does not change my end result, I might as well do it; it does not compromise the quality of the research. And the final thing is that the chi-square is not a valid test anyway when used this way. When you do a chi-square test, then modify the model based on the result, for example dropping an indicator, and then test the modified model with the same data, you are building the model partially on that data and then testing it with the same data, so the test basically becomes a self-fulfilling prophecy. This is a similar problem to the one in stepwise regression analysis: when you build the model based on the data and then use the same data to test it, the model test is not valid, because the test and the model are not independent. And finally, a non-significant chi-square does not guarantee that the model is correct, for two reasons. One is that you may simply have capitalized on chance: you added a correlation that happened to be there by chance only, while some correlations that exist in the population happened to be small in the sample, so you did not model those. It is possible that we simply modeled sampling error instead of something that exists in the population. The other is the possibility of covariance-equivalent models: another model might have explained the data equally well. So the point of diagnostics is revealing problems and addressing them if you can, not achieving a particular result such as a non-significant chi-square. If you can get a non-significant chi-square using theory-based modifications, that is nice, but it is not always achievable.
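Finally, the relative differences used in the model comparisons above (the minus 10% and minus 4% figures, judged against the 5% triviality benchmark) are plain percentage changes. The correlation values below are illustrative, chosen only to reproduce those rough magnitudes:

```python
def pct_diff(new: float, old: float) -> float:
    """Relative difference of an estimate, in percent of the reference value."""
    return 100.0 * (new - old) / old

# Hypothetical factor correlations from competing models.
r_full_cfa, r_final = 0.50, 0.45
print(round(pct_diff(r_final, r_full_cfa), 1))   # -10.0: likely meaningful

r_model3, r_model5 = 0.50, 0.48
print(round(pct_diff(r_model5, r_model3), 1))    # -4.0: below the 5% benchmark
```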