Today's topic is Confirmatory Factor Analysis. I was asked to pick a topic myself, and I was thinking, okay, what do people talk about CFA that is not common knowledge? There are a lot of point-and-click tutorials on how to do this, so what would be something new for people who have actually used this before? I decided to do a talk about diagnostics. But then again, sorry, a little louder? Yes, for sure, I can, please remind me to do so. And then I also need to go over the basics, so this is kind of like: we start with the very basics, like you go to a swimming hall and wade into a pool that's knee-deep, and then halfway through the presentation we jump into the deepest pool ever, and I'll try to keep it understandable. If you want to learn more about factor analysis, my slides for this presentation come from my Advanced Course on Quantitative Research Methods. There is a unit on measurement and measurement validation, and then there is a unit on structural equation models, where I talk about how to identify the variables of models and things like that, and then this is about how we actually apply these techniques. Most of the material on these slides comes from there. If you take a photo of this QR code, it will take you to this screen on YouTube, and you can learn more. There are some 40 videos on and around this specific topic. So, let's take a look at the basic idea of factor analysis. Why would we ever want to do factor analysis? My application area here is factor analysis in measurement validation. The basic idea that factor analysis builds on is pretty simple: if we have two measures, measure A and measure B, and they measure the same thing, we expect them to be correlated.
If we have a bathroom scale and we measure a person's weight with that, and we also ask the person to self-report their weight, those are two measures of the same thing, and they should be correlated pretty highly if we have enough variance in the sample. And then, if we have a measure C that measures something different, for example the person's self-reported height, we should expect that measure to correlate less with A and B than A and B correlate with each other. The idea of factor analysis is that it tests this kind of hypothesis. To give an example, this is from the PISA study, which is about attitudes towards school and all kinds of school-related things. This is a five-item scale about enjoyment of science, and all these questions, which are positioned in different parts of the survey form, are supposed to measure how much the students enjoy science. They should be highly correlated, because if a person indicates positively that they enjoy acquiring new knowledge in science, they should also indicate positively that they are interested in learning more about science. So, do these items correlate highly enough for us to say that they measure the same thing? And here is another scale, self-concept, roughly how much a person thinks that they know science, with items like "school science topics are easy for me". We should expect "school science topics are easy for me" to be correlated with "I enjoy acquiring new knowledge in science", but to correlate less with it than another enjoyment item does, because the enjoyment items are supposed to measure the same thing, and the self-concept item is supposed to measure something else. Factor analysis allows us to test this kind of hypothesis: does the pattern of correlations in my data support the idea that the variables in one set measure one thing, and the variables in another set measure another thing that is distinct from what the first set measures?
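To make the weight-and-height example concrete in code (not from the talk; a quick numpy sketch where all the true-score means, slopes and error sizes are made up for illustration), we can simulate two measures of the same construct and one measure of a related but different construct, and check the correlation pattern:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical latent true scores; the 0.6 height-to-weight slope is invented
height = rng.normal(170, 10, n)                             # true height in cm
weight = 70 + 0.6 * (height - 170) + rng.normal(0, 15, n)   # true weight in kg

# Observed measures = true score + measurement error
A = weight + rng.normal(0, 3, n)   # bathroom scale weight
B = weight + rng.normal(0, 5, n)   # self-reported weight
C = height + rng.normal(0, 3, n)   # self-reported height

r_AB = np.corrcoef(A, B)[0, 1]
r_AC = np.corrcoef(A, C)[0, 1]
# Two measures of the same thing (A, B) correlate much more strongly
# than measures of different things (A, C), even though weight and
# height are themselves somewhat related.
```

This is exactly the pattern of correlations that a factor analysis formalizes and tests.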
Factor analysis, confirmatory or exploratory, answers this kind of question. There are variations. More technically, factor analysis tries to discover or confirm the existence of a certain dimensionality in the data. If we have two measurement scales that are intended to measure two different things, we should have two dimensions in the data, the two things that we want to measure. Factor analysis answers the question of what the measures have in common. It comes in two variants. In exploratory factor analysis, you give the computer data and tell it: I think there are three factors in these data, find me the factors. In confirmatory factor analysis, you tell it that these three variables belong to this factor and these other three belong to this other factor, and ask: is that true? So it's more like you ask a question and expect a true-or-false answer, whereas in exploratory factor analysis you expect an answer describing how the items vary together. Exploratory factor analysis is by far the easier to apply. I typically tell my students to always start with exploratory factor analysis, because it is much easier to mess up a confirmatory analysis than an exploratory one. The hard part about confirmatory factor analysis is that you might not realize that you have actually messed it up, and you end up reporting statistics that are not trustworthy. It is much harder to misuse exploratory factor analysis than confirmatory factor analysis. Both are a part of my workflow. When I do a factor analysis of a scale, I pretty much always apply both, and I will talk about their different roles. The confirmatory analysis is my main analysis tool, but quite often the confirmatory factor analysis answers no to my question.
My question is: do these items work really, really well in measuring two or three or however many different things? And then the factor analysis gives me no as an answer. The next question, of course, is to understand why they don't work that well, and that's where the exploratory analysis comes into play. That's one tool in my toolbox for doing diagnostics on this kind of model. So, let's take a look at a conceptual example of what a confirmatory factor analysis does. This is a set of slides that I stole from Todd Little; he was the guy who actually got me to understand what factor analysis is. He showed me this slide of data as variance components. Let's assume that we are measuring two things, call them A and B, and we have indicators A1, A2 and A3 that measure A, and indicators B1, B2 and B3 that measure B. Each of these circles represents a source of variance. The item A1 varies because there's some variation in A. For example, if this is a measure of height, height measures vary because people vary in how tall they are. Intelligence measures vary because people vary in how intelligent they are. So there's variation due to the construct that we want to measure. There are also other sources of variation. There is random noise: if we take an IQ test on a bad day, we might not do that well; on a good day, we might do better than average. And then there might be what we call item-specific variance; I use these colored letters for it. For example, if an IQ test has a pattern recognition task and a verbal task, there's something distinct in the pattern recognition task that is reliable but that differs from the verbal task. What confirmatory factor analysis does is present a hypothesis to the data. We have a hypothesis that these variables A1, A2 and A3 have something in common that we might be interested in.
These others have something in common that we might be interested in, and then there is some variation in these items, which we call measurement error, that we are not interested in. Conceptually, when you estimate the confirmatory factor analysis, it separates the variation in the items into the factor variance and into these error variances. It answers the question: why do my items vary? Why do they correlate? And it estimates a model that is supposed to explain the variation and correlations in the data. One key advantage of confirmatory factor analysis is that because it splits the error variances out from the indicators, conceptually, not technically, it allows us to estimate how much these constructs A and B correlate without that correlation being biased by the measurement error in the data. So, this is the basic case. It asks how much of the variance in these items is explained by A, how much of the variance in those items is explained by B, how much is error, and how much A and B correlate. We of course want these error variances to be small. Typically, we would like the error variance to be less than half of the total variance of an indicator; in that case, these lambdas, which we call standardized loadings, would be 0.7 or greater. But that is not everything you can do. You can also model imperfections in the data. Rarely do your data work so well that everything is exactly how you expect it to be. So, you can use correlated errors. For example, suppose we have a variance component here, R, that affects both A3 and B1; let's talk in a minute about what R might be. We have this one source of variance affecting two items, and that will throw off the factor analysis, because now we have variance in A3 that doesn't belong there and variance in B1 that doesn't belong there. The factor analysis would not be correct. We can, however, relax assumptions.
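The arithmetic behind that 0.7 rule of thumb is easy to check. For a standardized indicator the total variance of 1 splits into loading squared plus error variance (a small illustrative computation, not from the slides):

```python
# For a standardized indicator: 1 = loading**2 + error_variance
for loading in (0.5, 0.7, 0.9):
    explained = loading ** 2
    error_variance = 1.0 - explained
    print(f"loading {loading}: explained {explained:.2f}, error {error_variance:.2f}")
# A loading of 0.7 explains 0.49 of the indicator's variance, leaving
# 0.51 as error: just about the "less than half" threshold. Hence the
# 0.7-or-greater rule of thumb for standardized loadings.
```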
So, we can specify in the factor analysis that, by the way, these two errors of A3 and B1 can actually be correlated. That allows this correlated measurement error to escape into the error terms, and we get a nice factor analysis. Of course, adding correlated errors must be theoretically justified. So, when would we want to do this kind of trick to make the model fit our data better? Here is an example that I use on a management course. Suppose we have innovativeness measures: our firm is innovative, we have many patents, our personnel are innovative. And then we have productivity measures: our personnel are productive, our firm does more with less, our firm is productive. Why might two of these items be more correlated than the others? Well, "our personnel are innovative" and "our personnel are productive" both have this personnel dimension. They don't only measure innovativeness and productivity; they also measure something about the quality of the personnel in this hypothetical company. Confirmatory factor analysis allows us to model this kind of imperfection in our data, and this is one of its key advantages. Of course, if we want to be a bit more rigorous, instead of specifying that these errors can be correlated, we could add an additional factor, call it C, and call it the quality of personnel. This would be kind of like what we call a bifactor model, because some of these items load on more than one factor. Depending on how you define bifactor, that might or might not be the right term, but if you hear the term bifactor model, it refers to something related to this kind of configuration. Adding these correlated errors or additional factors must be theoretically justified. I'll show you an example in a moment. Yeah. So, this C would be the personnel dimension. So, what is hard about confirmatory factor analysis? This is a difficult technique to apply correctly.
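To see why a shared variance component inflates one specific pair of correlations, here is a hypothetical simulation of the innovativeness and productivity example (the loadings and factor correlation are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

innov = rng.normal(size=n)                                      # innovativeness factor
prod = 0.4 * innov + np.sqrt(1 - 0.4 ** 2) * rng.normal(size=n)  # productivity, r = 0.4
personnel = rng.normal(size=n)                                  # shared "quality of personnel"

# The two personnel-worded items also pick up the shared component
# (0.4 loading); a third item loads only on its own factor.
a3 = 0.7 * innov + 0.4 * personnel + np.sqrt(1 - 0.49 - 0.16) * rng.normal(size=n)
b1 = 0.7 * prod + 0.4 * personnel + np.sqrt(1 - 0.49 - 0.16) * rng.normal(size=n)
b2 = 0.7 * prod + np.sqrt(1 - 0.49) * rng.normal(size=n)

r_shared = np.corrcoef(a3, b1)[0, 1]  # factors plus shared component
r_plain = np.corrcoef(a3, b2)[0, 1]   # factor correlation only
# r_shared exceeds r_plain by roughly 0.16 (0.4 * 0.4), the extra
# correlation that a plain two-factor model cannot explain. A correlated
# error between a3 and b1, or a third factor C, absorbs exactly this.
```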
There are things that are easy and things that are hard. It's fairly easy to specify this kind of model. If you use Stata or Mplus, you can point and click and draw a path diagram like I just had, and click estimate. AMOS only allows you to specify the model that way. If you are using, let's say, R, then you have to specify the model by writing it as code, but that's not terribly complicated. The things that are hard: convergence. Sometimes your computer says, I can't give you a solution. So, what now? I have a set of 10 videos or so talking about convergence, if you want to know about it. The other thing that is hard is diagnostics. When our software tells us that this model is not right for these data, there's something wrong with the model or something wrong with the data. Understanding what that wrong thing is, is difficult. It requires practice and it requires expertise. I have seen a lot of bad practice in research. Let's say that you decide that a confirmatory factor analysis must have a CFI statistic of more than 0.95 for the model to be any good, and then your CFI statistic is 0.93. What do you do about it? Well, you find an article that recommends that 0.90 should be the right benchmark, and you ignore the problem. That is bad research practice. When the model doesn't fit, you need to understand what to do, and that is the role of diagnostics. Also hard is interpreting the results beyond rules of thumb. It's fairly easy to check whether a statistic like CFI is more than 0.95 or whether a factor loading is more than 0.7, but going beyond that, it gets really difficult to understand what the numbers actually mean. So, that's the first part, the very basics. In the second part, I'll explain a bit of my workflow and how I do diagnostics using real data. And this is my running example.
I'll be using the PISA dataset. Three things are measured in these data: enjoyment of science, self-concept about studying science, and career aspirations. These data have been used in a number of methodological articles to demonstrate certain features of structural regression models. The idea is that we will test two competing hypotheses: it's either a person's self-concept or their enjoyment that determines their science career aspirations. These are students in high school and middle school, if I understand correctly. I use this as an example on my course; if you want to work through the example yourself, you can scan that QR code and get access to the data. I also use this as an example in a paper that we are now working on for Psychological Methods. It was used by Kelava in the same journal before. The data are from the OECD. They are freely available but a bit hard to find, so the QR code that I gave is perhaps the easiest way to get the data. And if you want to see worked examples, here's the QR code for some analyses of these data using Mplus and lavaan in R that we use in the Psychological Methods paper that is under review. So, the PISA data measure things related to school accomplishments and attitudes towards school. We are interested in the attitude measures here. Our measures are enjoyment, five questions about whether the student enjoys science; self-concept, six questions about how much the person thinks they are good at science; and four questions about career aspirations, whether the kid would want to work with science when they grow up. And we fit this kind of confirmatory factor analysis model. We test the hypothesis that the items that measure enjoyment go to one factor, the career aspiration items go to another factor, and the self-concept items go to a third factor. All right.
And now we just specify the model; that would be the R syntax for the model. You can also point and click, but the hard part comes now. You get the results, and what do you do about them? The results tell us that the p-value for the model is essentially zero, so the model is not correct for the data. A lot of people, when they face this, would take a look at these alternative fit indices. There are rules of thumb, based on Hu and Bentler's work, that for example CFI and TLI must be more than 0.95. The model is not correct for the data; we know that for sure because the p-value for the model test is very small. But is it adequate? That is the question. One way to answer it is to look at these fit indices that the software produces. CFI and TLI look okay, and then RMSEA must be less than 0.05, and SRMR must be less than 0.05 again. So it looks good on these statistics. These indicate good fit, but the model test rejects. Now what do we do? One statistic rejects the model; the others say that it is good. Do we ignore the statistic that says that this is not the right model and just focus on the statistics that say that it is? A lot of researchers go that road, because the other road, called diagnostics, is very hard. We need to understand in which way the data do not fit. When I have three factors and I'm estimating a full model, and the computer tells me that the model is not right for the data, what would be an obvious next step? If you have a car that doesn't run, what do you do? You scream, you call an expert, yeah. But really, you start checking individual parts: do I have gas, is there power in the battery, is the ignition working? You take a look at different parts of the bigger problem, and this is what I do when I do diagnostics. So, suppose I have a model with three factors and it doesn't work.
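The indices-based screening many people stop at amounts to a mechanical check like this (a sketch using the cutoffs mentioned in the talk, with made-up index values resembling the example; a real workflow would go on to diagnostics):

```python
def passes_conventional_cutoffs(cfi, tli, rmsea, srmr):
    """Rule-of-thumb screening in the style of the cutoffs from the talk."""
    return cfi > 0.95 and tli > 0.95 and rmsea < 0.05 and srmr < 0.05

# Hypothetical results like the example in the talk: indices look fine...
indices_ok = passes_conventional_cutoffs(cfi=0.97, tli=0.96, rmsea=0.03, srmr=0.02)

chi_square_p = 0.0001          # ...but the model test still rejects
model_rejected = chi_square_p < 0.05
# indices_ok and model_rejected are both True: the two roads disagree,
# and that disagreement is exactly where diagnostics are needed.
```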
If I estimate the same model one factor at a time, would it work that way? And if not, is there something I can do about it? So, I would then do diagnostics. I would start by looking at this model either as a whole or as smaller sub-models. If you look at it as a whole, you might take a look at something called residuals, which all statistical software that estimates these kinds of models can report. This is just a pattern of correlations: the higher the correlation here, the less the model explains that correlation. But looking at this kind of, sorry, 13 by 13 correlation matrix and trying to find patterns is pretty hard with the naked eye. So, what I would do instead is focus on one factor, or do an exploratory factor analysis. If a confirmatory factor analysis indicates that the model is not exactly right for these data, then I would run an exploratory analysis next. That exploratory analysis tells me: okay, if my model is not exactly right, what kind of model would the computer recommend? This exploratory factor analysis indicates that there are high loadings here, and importantly, most of these items don't load on any other factor. When we do diagnostics, we are not looking at which factor an item belongs to; we are looking at whether there is some evidence that, for example, Y4 actually measures something other than what it's supposed to measure. We know that Y4 is a measure of career aspirations, which is the third factor here, but does it measure these other factors? These are pretty small numbers, but we could still do better to understand what is going on. The next step would be a single-factor model. Let's do one factor with just the enjoyment items. These factor loadings should be more than 0.7; that's a rule of thumb.
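Residuals are just observed correlations minus model-implied correlations. Here is a minimal numpy sketch with invented numbers showing how the largest residual points straight at the problem pair of items:

```python
import numpy as np

# Hypothetical standardized loadings for a two-factor, four-item model
lam = np.array([[0.8, 0.0],
                [0.7, 0.0],
                [0.0, 0.8],
                [0.0, 0.7]])
phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])       # factor correlation

implied = lam @ phi @ lam.T        # model-implied item correlations
np.fill_diagonal(implied, 1.0)     # standardized items have variance 1

observed = implied.copy()
observed[0, 1] += 0.15             # an extra correlation the model ignores
observed[1, 0] += 0.15

residual = observed - implied
i, j = np.unravel_index(np.abs(residual).argmax(), residual.shape)
# The largest residual sits exactly at the offending item pair (0, 1);
# scanning a real 13 by 13 residual matrix by eye is much harder.
```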
And this one is a bit low, so we would like to understand why this factor loading is a bit low. Also, the model is not correct for these data: these data are not fully explained by a single factor. Then we can take a look at the questions themselves. Given these questions, is there some dimensionality? The correlations are here. The first and second items correlate a bit more strongly with one another than with the other items, and the fourth and fifth items correlate more strongly with one another than with the other items. So we start reading: why would a person who rates number four highly be more likely to rate number five highly than to rate number one highly? Is there something else going on in the data? What might numbers four and five have in common, and what might numbers one and two have in common? Well, number three is specifically about doing; there's this action part. If we take a look at item number one, that is just about learning generally. Number two is about reading, and then numbers three, four and five are more like active things: you like doing science problems. One person might like reading about math, another person might like doing math problems. These are highly correlated, but they are two different things. So we might identify, based on looking at these correlations, that items one and two have something in common and items three, four and five might have something in common, and we might decide: items one and two are about reading about science, and items three, four and five are more active, more about doing. Reading and doing. That's not an exact split, but maybe it helps us understand why there's some dimensionality in the data. Then, what we can do is add a second factor.
We do an exploratory factor analysis. The second factor indicates that there is indeed some dimensionality: this is not one factor, it's one main factor and two minor factors. And if we add a minor factor to the model, a second factor, reading, for the first two items, we still have the main factor, and now we are getting closer to a model that is not rejected by the chi-square test. So, we see that the big model doesn't fit; we take a look at a smaller part; we try to understand why the smaller part doesn't fit; we add modifications; and then we re-estimate. Here the model was rejected, and we can see also that the items don't perform as well. But when we add another factor, doing, then the model fits well. So, we have one major factor that measures how much these students enjoy science, and then students vary in whether their enjoyment is more focused on doing versus reading; that is the minor factor. This would be called a bifactor model, because there's one major factor and a minor factor. So, you start thinking about your data this way: when the model doesn't fit, there's typically some dimensionality in the data that the model doesn't fully explain, and that dimensionality is typically discovered by looking at what your data actually are. If we take two bathroom scale measurements of a person's weight and then ask the person to self-report their weight, the two bathroom scale measures will be more highly correlated with one another than with the self-report, because they have the bathroom scale in common in addition to measuring the same person's weight. Yeah, so this model is not rejected; we would be happy with this model, with this modification. So, how do we do diagnostics? We fit the full model first; it almost always doesn't fit exactly. Then we try to understand why it does not fit.
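The reading/doing structure can be mimicked in a small simulation (all loadings are invented for illustration): one general enjoyment factor behind all five items, plus a minor factor on the first two items, makes those two correlate extra, which is exactly the pattern spotted in the correlation matrix above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
general = rng.normal(size=n)   # major factor: enjoyment of science
reading = rng.normal(size=n)   # minor factor, independent of the major one

items = []
for i in range(5):
    g, r = 0.7, (0.4 if i < 2 else 0.0)   # items 1-2 also tap "reading"
    err = np.sqrt(1 - g ** 2 - r ** 2)
    items.append(g * general + r * reading + err * rng.normal(size=n))

R = np.corrcoef(np.column_stack(items), rowvar=False)
# Items 1 and 2 share both factors, so their correlation is around
# 0.49 + 0.16 = 0.65; pairs sharing only the major factor sit near 0.49.
# A one-factor model cannot reproduce that gap, so it gets rejected.
```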
My workflow includes running an exploratory analysis to see if it gives me the same kind of pattern that the confirmatory factor analysis does, and then looking at one factor at a time to see if I can identify some sources of misfit. Then diagnostics one factor at a time; I do this on a video for all the other factors. I discussed the reading and doing factors here, not the other alternatives, and then we proceed with the next factors. Then finally you test the full model. So a final outcome of this kind of process might be that we have our original confirmatory factor analysis model with the full data, and then we have this kind of series of different models in which we add certain modifications based on what we think makes sense. And then we compare: does adding these modifications make a difference? If we are trying to estimate the enjoyment and self-concept correlation, our original estimate was 0.529, and my estimate after doing everything that I possibly can is 0.476; that is about a 10% difference. So, a model that has been modified to fit better gives us a different result than the original model. And now the question is, if we have a simple model with no modifications and we have models where we add modifications (this is from a bigger example), which one do we pick? There is a 10% difference between the most modified model and the original model, and 10% is a really big difference. But if we take a look at the one where I just did diagnostics, adding a couple of minor factors, then we only have a 4% difference. We would go for that model, deciding that beyond this point the modifications no longer make any difference to the estimate of the correlation that is our primary interest. So, if we are interested in understanding how much enjoyment and self-concept are correlated, and we decide that some of these modifications based on diagnostics no longer make a difference, then we would not make them, based on the parsimony principle.
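The size of that difference is just a relative change computation (the two estimates are the ones quoted in the talk):

```python
original = 0.529    # enjoyment-self-concept correlation, unmodified model
modified = 0.476    # same correlation from the fully modified model

relative_change = abs(original - modified) / original
print(f"{relative_change:.1%}")  # about 10%, a big shift for a correlation
```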
And this is how it might look when reported in a paper. This is from a real paper I'm working on. We would say that the original model doesn't fit the data. Then we did diagnostics, looking at what the items were, what the questions actually said. We look at the exploratory factor analysis, we look at the correlations between the items. Then we add additional factors. We dropped some items later on: I realized that there are some questions that don't work that well in this context, and I decided not to use them. I had five items; three is typically the minimum for a factor analysis, so I could easily drop one of them. Then I estimate the full model, take a look at the diagnostics for the full model, and then finally ask: does it make a difference? And then I report it. So this is kind of the state-of-the-art way of doing a confirmatory factor analysis. First, you run the full model; it typically doesn't fit. Then you do an exploratory factor analysis; that typically confirms that the data have roughly the correct dimensions, and then you are pretty much a winner already. Then you go one factor at a time to see and understand if there are any big problems, then you return again to the full model, and then you compare across the different modified models to pick the one that you want to report as the final model. That last step is the sensitivity analysis, I would say. So that is it. In a nutshell, what is the point of this? The point is that if you rely on the rule of thumb that CFI must be greater than 0.95, there's a lot of research showing, including the study that I'm now working on, that important misspecifications go undetected with that rule. For example, if you have 11 items in fairly good data, you might swap two items into completely incorrect scales, and the CFI greater than 0.95 rule would not detect that as misspecification. This is the reason why we do diagnostics.
And getting a non-significant chi-square statistic for the model fit is not that important. Some reviewers insist on it, and then you need to go and see what it would take to get the chi-square non-significant, but otherwise I wouldn't bother chasing it. And then you finally take a look at whether the result makes sense. So diagnostics are super important, because the conventional rules like CFI greater than 0.95 don't detect all possible misspecifications. The model test, the chi-square statistic and its p-value, will tell you that the model is somehow incorrect for the data, but it doesn't tell you in which way. To understand in which way it is incorrect, that's where the diagnostics come in. Okay, so we had a bit of shallow basics, and then we jumped into the deep end. Thank you. I guess we now have time for a couple of quick questions. There is one. Yeah, Pili? [Audience:] I have a short question about the role of theory, or the ideas behind these questions. How important do you think it is to do the statistical things, given that the differences can be so small? Do you think it's more important to think about it theoretically, for example whether interest and enjoyment should actually be the same concept or different concepts? [Answer:] In papers that I've worked on and in projects where I've been consulting or helping (I don't take money from friends), I have made decisions against the empirics. If there's a good conceptual reason to believe that "I am interested in learning about science" is more about enjoyment than about career aspirations, then it should be included in the measure of enjoyment instead of career aspirations. And if an item doesn't belong there conceptually, or if you have plenty of items, like you have here, and you realize that one doesn't really behave well and you have a reason for it, then you can take it out.
But yes, the conceptual reasons are the most important. When we take a look at these items, for reading, for self-concept, we don't group items one and two just because the computer tells us to; the computer tells us that we should consider it, and then we as researchers make the call: yeah, these are about reading, more like passive things, and these are more about doing. That would be our rationale. So theory always: the empirics suggest what you could do, and theory justifies doing so. That's my take on it. There's a much longer version of this talk on my channel. [Audience:] Could I actually still ask something? You are recommending to use EFA because it's kind of safer, but what is your take on this: the good side of CFA is that at least you have theory and you are testing a theory, but when you use EFA instead, then you are constructing a whole new measure which is not based on theory anymore. What do you think about this? [Answer:] It depends on how you consider the EFA result. If we have a hypothesis that these data measure self-concept, enjoyment and career aspirations, three different things, and our hypothesis is that these Y items measure the aspirations, why would we not consider the EFA as a test? It indicates that these items belong to one factor and one factor only. So you can do EFA in a confirmatory manner: you have a prior hypothesis that a certain pattern should emerge from your data, then you run the EFA and you check, does the pattern that emerges from the data match my theoretical expectation? That's a confirmatory way of applying EFA. Similarly, you can apply CFA in a purely exploratory manner: you can just add things based on modification indices, for example, and do diagnostics and add factors without much thinking. That is also possible; it's called specification search. So it's not like exploratory factor analysis must be used for exploratory purposes.
Or that a confirmatory analysis must be used for confirmatory purposes; they can both be used for both purposes. [Audience:] Yeah, it's kind of a misunderstanding that exploratory factor analysis is only for exploration, for new measures. So why do you think it's a misunderstanding? [Answer:] Because you can use it for confirming, confirming in the sense that we have a pattern that matches our theoretical expectation. We don't have a p-value for that, but why would we need one? [Audience:] For instance, when you use EFA, you just find what is common to the items. So you can't, for instance, separate out method variance. [Answer:] No, that's right. You can't model these more complex structures like method variance or secondary factors. You're absolutely right about that. [Audience:] So that's why I'm asking about the importance of theory there. At least in CFA you have the strength of the theory behind you, but when you are modeling with exploratory factor analysis, then you are kind of just finding some common factor, and you cannot say what the core of the factor is. [Answer:] Yeah, but that is the factor interpretation problem: how do you give meaning to the factors? And using EFA to confirm a pattern applies when each indicator loads on one factor only. If each indicator loads on one factor, you can run an EFA using CFA software and you can run a CFA using EFA software. This is actually something that I teach my students: I tell them to use Stata's EFA command to run a confirmatory factor analysis, compare that against the confirmatory factor analysis result, and then do the same the other way around. So you can run a confirmatory factor analysis with EFA software and an exploratory analysis with CFA software. Just allow all items to load on every factor except one, and then you get an exploratory analysis using CFA software. [Audience:] Yes, thank you. [Answer:] Yeah, there are a lot of examples on the course that I mentioned. [Audience:] Just a quick question. Go ahead.
So here, if you go to this QR code, there's an assignment for students. I don't know if it's in this unit or the previous one, where I tell students to run EFA with CFA software and CFA with EFA software. [Audience:] What about the paper in Psychological Methods? [Answer:] Our paper? It's about latent interactions. It's still under review, on the second round now. But I can send you the paper if you want. It's about latent interactions and how we test them. It's more advanced than this stuff, but we use this example there.