Model convergence diagnostics are something that is fairly simple to explain, and anyone who has used SEM, or other techniques based on iterative estimation, has run into convergence problems. It is easy to follow someone explaining the diagnostics. But there is quite a high bar in going from understanding the diagnostics when someone explains them to being able to apply them to your own data. So when you see your software just printing likelihood values over and over and over, and it doesn't seem to go anywhere, and you don't even get a warning, how do you deal with it? Well, you need practice, and I'll first explain how I teach this to students and then how you can create your own practice cases for yourself. You can take a look at how I teach identification and what kind of assignments I give to students by going to Aalto University's mycourses.aalto.fi, where I give a course; just search for the course code QU-L0040. That's my advanced course with open materials. We start talking about convergence in Unit 5. There is a data analysis assignment 4, and at the end there are data sets and an exercise about a convergence problem. I have generated these simple data sets that each have a problem: I know what the problem is, but the students don't, and the students are tasked to analyze the data, do the diagnostics that we discussed in class, and then tell me what the problem is. These kinds of example data sets that don't work are not that common, because most textbooks and most statistical software that I'm aware of only give examples of model-data combinations where the techniques work really well. But the techniques don't always work. So how do you create your own practice cases?
The idea of creating a practice case is that you create a problematic model-data combination. One option is to create the problem and then diagnose it yourself. This is a bit boring, because you know what the problem is and you are just trying out how the diagnostics work when you already know the answer. A more challenging and more fun way of learning to do diagnostics is to work with a colleague: create a problematic model-data combination and have a colleague troubleshoot it. Challenge your colleague and see if they can figure out what the problem in your model-data combination is, and work back and forth, so that you challenge your colleague, your colleague challenges you, and both of you learn how these diagnostics work by applying them. How do you come up with these problematic model-data combinations? There are two strategies, and I have used both. One is to simulate a data set with a known problem. You just simulate random numbers and build a data set from those random numbers. This requires that you understand a bit of simulation; it is not rocket science, but coming up with nice examples might be a bit challenging if you have never done it. Another approach is to take an existing data set and example from a statistical software user manual or a book, and break it. I will soon show you seven different ways to break one example analysis, and you can use those techniques to break your own working models, give them to a colleague, and tell them to troubleshoot using the techniques: the starting values, the Hessian matrix, the variance-covariance matrix of the estimates, estimating smaller models, that kind of thing, and see if they can figure out the problem. One final thing that I sometimes use is that I see researchers publishing data sets and models that cannot possibly work.
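As a minimal sketch of the first strategy, simulating a data set with a planted problem, here is a Python illustration (my own toy example, not the course material): the third indicator is built as an exact linear combination of the first two, so the sample covariance matrix is singular and a maximum likelihood SEM fit to it cannot work.

```python
import numpy as np

# Simulate a one-factor data set and plant an exact linear dependency.
rng = np.random.default_rng(0)
n = 500
factor = rng.normal(size=n)
x1 = 0.8 * factor + rng.normal(scale=0.6, size=n)
x2 = 0.7 * factor + rng.normal(scale=0.7, size=n)
x3 = x1 + x2  # the planted problem: x3 carries no unique information

# The sample covariance matrix of the three indicators is rank deficient.
S = np.cov(np.column_stack([x1, x2, x3]), rowvar=False)
rank = int(np.linalg.matrix_rank(S))
print(rank)  # 2 instead of 3: the matrix is singular
```

Handing data like this to a colleague, without telling them how x3 was constructed, gives them a realistic troubleshooting exercise.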
I've seen researchers publish models that are not identified. It is very hard to spot these in the wild, but sometimes you find a paper that publishes the data set and shows an analysis result that cannot come from an identified model, and then that becomes a teaching case. These are very useful for students, because they also show how important it is to understand diagnostics: just looking at the results might not reveal that the results themselves are not trustworthy. Of course, these are difficult to find, because most people don't publish their data sets. But some do, and you find one of these maybe a few times per year. Now let's take a look at how you can break a model-data combination. I'm going to be using Example 9 from Stata's SEM manual. This is a simple SEM with three latent variables: SES, socioeconomic status, which explains Alien67 and Alien71, and all latent variables are measured with two indicators. This is the starting point, with the data and code straight from Stata's user manual. We just run it, and as expected, because this is a user manual example, it is a textbook case: everything works beautifully. One minor thing is that the chi-square test rejects this model, so we should really try to understand why it is misspecified, but misspecification analysis is different from identification and convergence troubleshooting. So we'll ignore for now the fact that this model does not fit the data exactly; in real life, we would need to do diagnostics because of that. So how do we make the model not converge? I'll show you seven different ways. I could probably come up with more, but these are the seven that I came up with in an hour or two. First, rescale a variable. SEM and the numerical optimisation techniques behind it work well when the variables are approximately on the same scale.
If we rescale the education 66 variable by multiplying it by 100,000, that is not technically a violation of any SEM assumption, so the model should work, but computational issues start to emerge when the variables are measured on widely different scales. We run the model, and we can see that convergence is not achieved, and Stata does not really give us any other indication in the warnings of what exactly the problem might be. If we take a look at the missing standard errors, we might be able to start pinpointing where the problem is, and this problem could perhaps best be discovered by analysing the Hessian matrix, where you can see that the magnitudes of the second derivatives are very different, and then you would probably know that there is something weird going on in the data. The second way is a linear dependency between variables in the data, and this violates an assumption of the maximum likelihood estimator. There are other estimators; you could perhaps use unweighted least squares or something like that, which would be able to estimate this, but ML can't, because there is redundancy between the variables: one of them carries no unique information. We would say that the sample covariance matrix is not full rank because there are linear dependencies. This also breaks the SEM, and you can see that convergence is not achieved. How would we go about troubleshooting this? This is more of an identification kind of problem, so you might be looking at the Hessian matrix and the gradient vector to understand where the problem actually is. The third way is weird starting values. Starting values that are really bad, or a lack of starting values, cause computational problems. The simplest way of getting weird starting values is simply to disable Stata's starting value algorithm with the noivstart option, and then it is the user who is responsible for setting the starting values.
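The rescaling problem can be seen numerically without any SEM software. This hedged Python sketch (my own simulated data, not the manual's example) shows how multiplying one variable by 100,000 inflates the condition number of the covariance matrix, which is why the curvature of the likelihood, the second derivatives in the Hessian, ends up on wildly different scales.

```python
import numpy as np

# Three well-scaled variables: the covariance matrix is well conditioned.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
cond_before = np.linalg.cond(np.cov(X, rowvar=False))

# Rescale one variable by 100,000, mimicking the broken example.
X[:, 0] *= 100_000
cond_after = np.linalg.cond(np.cov(X, rowvar=False))

print(cond_before)  # close to 1: well conditioned
print(cond_after)   # enormous: nearly singular in floating point
```

Nothing statistical has changed, only the units, but the optimizer now has to work with a nearly degenerate surface, which is exactly the kind of purely computational failure the lecturer describes.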
If you use noivstart, I think all parameters start at zero or something like that, and there is no convergence: you get this warning that convergence is not achieved. These variance estimates look pretty weird, exact ones and point ones. Because they are round numbers, that indicates the optimizer did not really know what to do with them. Normally estimates have decimals, but in this case the starting value is one, and the optimizer doesn't know which way to go, because the starting point apparently looks like a maximum. This would be diagnosed either by printing the starting values, where you might see that they are weird, or by printing the Hessian matrix to see which estimates are not identified. The fourth way is models that are not identified. In the first model we had SES, Alien67, and Alien71 all connected with regression paths. I eliminate these regression paths between SES, Alien67, and Alien71, so SES is now a single two-indicator factor that is not embedded in a larger system, and to be sure, I set the covariances between SES and all other variables to zero. The identification condition for a two-indicator factor is that it must be connected sufficiently strongly to other factors to identify the loadings. This model is not identified for sure, and we can see here that convergence is not achieved. We have missing standard errors; that is an indication of an identification problem, and you might apply the techniques for model identification to see what exactly the problem is. Then you can see that it is the loadings of the SES factor that are the problem here. The fifth way is a model that fails because of degrees of freedom. The degrees of freedom for this model are positive, but we have only three covariances between the latent variables, and we are trying to estimate four paths. You can't estimate four things from three things, so this part is not identified for sure, and we run the model.
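The counting argument here can be sketched in a few lines of Python (my own illustration; the parameter counts are generic, not read from the Stata output). The t-rule says a model can estimate at most as many free parameters as there are unique covariance elements, and the point of this example is that the rule can pass globally while failing for a subsystem.

```python
# t-rule: unique covariance elements minus free parameters.
def t_rule_df(n_observed, n_free_params):
    """Degrees of freedom: p(p+1)/2 unique moments minus free parameters."""
    moments = n_observed * (n_observed + 1) // 2
    return moments - n_free_params

# With 6 observed indicators and, say, 15 free parameters the whole model
# passes the t-rule with room to spare...
print(t_rule_df(6, 15))  # 21 - 15 = 6, positive

# ...yet the structural part still fails: three latent variables give only
# 3 covariances between them, while the modified model estimates 4 paths.
latent_moments = 3
paths_to_estimate = 4
print(paths_to_estimate - latent_moments)  # 1 parameter more than information
```

This is why a positive overall degrees of freedom count is necessary but not sufficient for identification.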
Surprisingly, there are no warnings, so a researcher might actually think that these results are trustworthy because the software does not tell us anything, but a closer inspection reveals that we have some very large standard errors here. That would indicate that there is some kind of identification problem with the model; a large standard error is an indication of an identification problem. To fully understand the problem, because we now have convergence, we would print out the variance-covariance matrix of the estimates, and then we would see which estimates are competing for the same covariance, and we might see that the two directional paths can't both be estimated from this data. The sixth way is empirical underidentification. This is a data issue, and it is very similar to the case where I constrained SES to be uncorrelated with the other two factors. Here I allow SES to be related to the other factors, but I take the Alien67 indicators and make those indicators uncorrelated with everything else in the model. So I basically take the Alien67 latent variable and make it unrelated to the other latent variables in this data set by this orthogonalization technique, and this produces empirical underidentification: no convergence, missing standard errors. You can troubleshoot this with the identification techniques that are discussed in these videos. The final way to break a model that I'll demonstrate here is to make a model that is severely misspecified. My misspecification is that I constrain the indicators of Alien67 and Alien71 to be invariant over time, except that the 71 indicators are the negatives of the 67 indicators. A sneakier way of doing the same would be to constrain the indicators to load the same over time but simply reverse-code one occasion. Why does this break the model?
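The orthogonalization trick can be sketched in Python (my own toy data, standing in for the real indicators): regress one block of indicators on the rest and keep only the residuals, so the block becomes exactly uncorrelated with everything else in the data, even though the model still claims its factor is related to the others.

```python
import numpy as np

# Two correlated indicator blocks driven by a shared factor.
rng = np.random.default_rng(2)
n = 400
factor = rng.normal(size=(n, 1))
block_a = factor @ [[0.8, 0.7]] + rng.normal(scale=0.5, size=(n, 2))
block_b = factor @ [[0.6, 0.9]] + rng.normal(scale=0.5, size=(n, 2))

# Center so cross-products equal covariances.
block_a = block_a - block_a.mean(axis=0)
block_b = block_b - block_b.mean(axis=0)

# OLS residuals of block_a given block_b are exactly uncorrelated with block_b.
beta, *_ = np.linalg.lstsq(block_b, block_a, rcond=None)
block_a_orth = block_a - block_b @ beta

cross_cov = block_a_orth.T @ block_b / n
print(np.abs(cross_cov).max())  # numerically zero
```

Feeding `block_a_orth` into a model that tries to link its factor to the other factors produces empirical underidentification: the structure is fine on paper, but the data contain no covariance to estimate the links from.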
It breaks the model because all these indicators are positively correlated in the sample, and this kind of constraint forces some of the estimates to be negative, and the computer doesn't like it. When we run the model, we get a weird result: a warning that the LR test is not reported because of identification issues. We don't get an error, but we can see that some of the standard errors are pretty huge, so that is an indication of model non-identification. We also see very small estimates for the factor loadings; they should be a lot larger in this data. A negative factor loading is also something that you should generally look at more carefully, to understand why it is negative. So those are seven different ways that you can break a model, and these seven ways can help you check how the various estimation and troubleshooting techniques work. My workflow for dealing with this is: first read the warnings, if there are any; then check the standard errors; eyeball identification issues after drawing a path diagram; print the starting values and adjust them if needed; print the gradient and Hessian, or print the variance-covariance matrix of the estimates, which quite often tells you something about identification; try a different optimizer, which wouldn't solve any of these problems, but you might still try; then estimate simpler models and use those as starting values, which might work for some of these cases; and then do empirical identification checks, like simulating data sets or running the same model using different sets of starting values. This identification checking and these non-convergence diagnostics are something that, if you just read about them or watch a video on YouTube, you are unlikely to learn to a level that is going to be useful for you.
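The last item in the workflow, running the same model from different sets of starting values, can be illustrated with a small Python sketch (my own toy objective with two local minima, standing in for a problematic SEM likelihood; none of this is the lecturer's code). If runs from different starts do not agree, the reported estimates depend on where the optimizer happened to start.

```python
import numpy as np
from scipy.optimize import minimize

# Toy objective with two local minima, near x = -1 and x = +1.
def objective(theta):
    x = theta[0]
    return (x**2 - 1.0) ** 2 + 0.1 * x

# Re-run the optimizer from several random starting values and collect
# the solutions it reports.
rng = np.random.default_rng(3)
solutions = []
for _ in range(10):
    start = rng.uniform(-3, 3, size=1)
    res = minimize(objective, start)
    solutions.append(round(float(res.x[0]), 2))

# More than one distinct solution means the result is start-dependent and
# should not be trusted as-is.
print(sorted(set(solutions)))
```

With a well-behaved likelihood, every start should land on the same optimum; disagreement across starts is exactly the kind of red flag this check is designed to raise.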
So this requires practice, and you can get the practice on a good course, or by using the techniques that I explained here: take a model, break it, and then try to identify what the problem is. This is actually how I generate some of my teaching cases: I break a model, I give it to students, and that forms their assignment. They have a week to discuss it together, and after a week we see whether they have come to a conclusion about what the problem might be. Interestingly, quite often the students come up with different potential problems, and only one of them is correct.