Maximum likelihood estimation is the most common estimation technique for structural equation models. However, there are scenarios where you might want to consider alternative estimation techniques. For example, if you are working with categorical data, weighted least squares or diagonally weighted least squares can be a more appropriate alternative to maximum likelihood, because the maximum likelihood estimator assumes that the data are multivariate normally distributed, and a categorical variable cannot be multivariate normal. Let's take a look at the idea behind these alternative estimation techniques.

All of these estimators work by minimizing a discrepancy between the model-implied covariance matrix Sigma and the observed covariance matrix S; what distinguishes them is how that discrepancy is defined and weighted. The discrepancy between the implied matrix Sigma and the observed matrix S can be understood as a weighted sum of the differences between individual elements. So instead of working with the covariances as a matrix, which is maybe a 4x4 matrix, we take the unique elements of the sample covariance matrix and of the model-implied covariance matrix as two vectors; these are just the same numbers arranged in one dimension. We then take the difference between the two vectors and multiply it from both sides by a weight matrix W, which turns the element-by-element differences into a single weighted sum of squares. If the weight matrix W is constructed based on the model-implied matrix, this produces the maximum likelihood estimates. EQS, a structural equation modeling software, uses exactly this estimation approach, and it produces estimates identical to maximum likelihood.

But there are also other interesting ways to weight the difference between the sample covariances and the implied covariances. One alternative is generalized least squares (GLS). That is interesting for a statistician, but it performs fairly similarly to ML, so in applied research there are very few reasons to use GLS; for completeness, if you want to understand these techniques, you can read about it. The more interesting alternative is the asymptotically distribution free estimator, which is sometimes referred to as the ADF estimator, the WLS estimator (referring to weighted least squares), or AGLS (asymptotically distribution free GLS). And then there is diagonally weighted least squares, which is a simplification of the asymptotically distribution free estimator. These two estimators are useful when you are dealing with non-normal data sets. In particular, large models with categorical variables are almost always estimated with one of these two techniques, because maximum likelihood estimation for categorical variables is computationally very demanding. Let's take a look at what these estimation techniques actually do, and what the different equations mean.
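To make the weighting idea concrete, the whole family of estimators can be written as minimizing one quadratic form. The notation below is the standard textbook formulation of this discrepancy function, not necessarily the exact equation shown on the slide:

```latex
% s             : vector of the unique elements of the sample covariance matrix S
% \sigma(\theta): the corresponding elements implied by the model parameters \theta
% W             : a weight matrix whose inverse defines the weighting
F(\theta) = \bigl(s - \sigma(\theta)\bigr)^{\top} \, W^{-1} \, \bigl(s - \sigma(\theta)\bigr)
```

Taking W as the identity gives unweighted least squares; building W from the model-implied covariances reproduces maximum likelihood (the EQS approach mentioned above); building it from the observed covariances gives GLS; estimating it from the fourth moments of the raw data gives ADF/WLS; and keeping only the diagonal of that estimate gives DWLS.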
And we'll need some data. I'll be using this R code. The code sets up a single-factor model with three indicators and fits the model using different techniques. S contains the sample covariance matrix of the data, and WLS.obs contains the vector of the observed variances and covariances. Let's take a look at the contents of these objects. The S matrix is our sample covariance matrix, and the WLS.obs vector is just the same information arranged a bit differently: instead of arranging everything as a matrix, we have a vector of its unique elements.

We then form the vector of differences between these observed elements and the model-implied elements, and we multiply the weight matrix from both sides by this difference vector. That produces the weighted least squares criterion: each difference enters twice, once when you multiply the weight matrix from the left and once from the right, so the result is a single weighted sum of squares. If we use an identity matrix as W, that produces the unweighted least squares (ULS) estimator. The ULS estimator is not particularly useful in practice, but it is an easy starting point for teaching purposes. The idea of ULS is that we weight each difference equally and multiply each difference by itself: we take the differences, square them, and take the sum, and that is the unweighted least squares criterion.

In this particular case, the variances and covariances are roughly in the same ballpark, so there is no strong reason to believe that one of these indicators is more important than the others. So let's multiply the indicator X1 by 2 for demonstration purposes. Now the variance of X1 is a lot greater, about four times as large as the variance of X2 or X3. And if we look at the absolute discrepancy between the observed variance of X1 and the implied variance, we should now tolerate larger differences. If the residual for the variance of X1 is one, that is a much smaller difference relative to the observed variance than a residual of one for X2, which would be a very big difference compared to its observed value. So how much the variance of X1 differs from the implied variance in absolute terms should be a less important factor in model estimation than how much X2 differs from its implied value in an absolute sense.

Another way of thinking about this is in terms of repeated samples. If X1 has a larger variance, then the variance of its estimated variance over repeated samples is going to be larger than the corresponding variance of the variance estimate of X2. Whenever we take a sample, we estimate the variance of X1 in that sample, and those sample estimates vary, so we have a variance of a variance. We can also see that because X1 and X2 are correlated, if X1 has a large variance in a particular sample, then X2 should have a large variance in that sample as well. Say there is an outlier in X1; if X1 and X2 are correlated, that observation will tend to be an outlier in X2 as well, so both variances go up in that particular sample. In other words, the sampling errors of the different variances and covariances are themselves correlated, which is what the off-diagonal elements of a weight matrix can capture.
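As a minimal sketch of the kind of R code described here, the following reconstructs the setup with the lavaan package. The simulated data, loadings, and seed are illustrative assumptions, not the exact script from the video; the lavInspect slot names (sampstat, wls.obs, wls.est) are from lavaan's documentation.

```r
library(lavaan)

# Illustrative data: a single factor with three indicators.
set.seed(1)
n <- 200
f <- rnorm(n)
d <- data.frame(X1 = 0.8 * f + rnorm(n, sd = 0.6),
                X2 = 0.7 * f + rnorm(n, sd = 0.7),
                X3 = 0.6 * f + rnorm(n, sd = 0.8))

model <- "F =~ X1 + X2 + X3"

# Fit the same model with different estimation techniques.
fit_ml   <- cfa(model, data = d, estimator = "ML")
fit_uls  <- cfa(model, data = d, estimator = "ULS")
fit_wls  <- cfa(model, data = d, estimator = "WLS")   # the ADF estimator
fit_dwls <- cfa(model, data = d, estimator = "DWLS")

# S: the sample covariance matrix.
S <- lavInspect(fit_ml, "sampstat")$cov

# WLS.obs: the same information arranged as a vector of unique elements.
WLS.obs <- lavInspect(fit_ml, "wls.obs")

# The ULS criterion by hand: an equally weighted (W = I) sum of squared
# differences between observed and model-implied elements.
uls_diff <- lavInspect(fit_uls, "wls.obs") - lavInspect(fit_uls, "wls.est")
sum(uls_diff^2)

# The rescaling demonstration: multiplying X1 by 2 makes its variance
# roughly four times as large, so its absolute residual should matter less.
d2 <- transform(d, X1 = 2 * X1)
fit_ml2 <- cfa(model, data = d2, estimator = "ML")
```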
And we need to take into account the fact that not all of these covariances and variances provide the same amount of information about the model when we do the weighting. So this is the maximum likelihood weight matrix, and it is its inverse that we actually use here. We can see that the variance of X1 gets a weight of 0.03 compared to 0.20 for X2, roughly a seven-fold difference, and that is because the variation of X1 is larger. So we should pay less attention to the absolute difference between the implied and observed variance of X1 than to that of X2.

This is nice to know, and if we compare this ML weight matrix to the GLS weight matrix, they are very similar. The reason is that the GLS weight matrix is calculated based on the observed covariances, while the maximum likelihood weight matrix is calculated based on Sigma, the implied covariances. And because this is a just-identified model, the implied and observed covariances are identical once the model converges; the small discrepancies here are just due to the numerical estimation process.

The problem with this is that we are now computing a large weight matrix based on a small covariance matrix. We cannot estimate the elements of the weight matrix uniquely from that information alone, because that matrix is larger than the covariance matrix, so there is some redundancy built in. In particular, we assume that the data are multivariate normal and that there is no excess kurtosis. For example, how strongly the correlation of X1 with X3 and the correlation of X2 with X3 covary over repeated samples depends on the kurtosis of the data. Here we assume multivariate normality with no excess kurtosis, but if that assumption is false, these weight matrices will produce biased results.

So another alternative is to not derive the weight matrix from a covariance matrix at all, but to estimate it from the raw data so that kurtosis is taken into account. This produces the asymptotically distribution free weight matrix. The estimator used to be called ADF, but it is now more commonly called the WLS estimator. However, while it seems appealing to drop the normality assumption of maximum likelihood estimation and go for a more general approach, where all the elements of the weight matrix are estimated from the data instead of being derived from the normal distribution, this comes with two big issues. Issue number one is computational difficulty: this weight matrix tends to be very large, it needs to be inverted in the estimation process, and that is difficult for computers. Calculating WLS estimates for a large model takes a long time, and it may not even work. The second issue is that estimating this weight matrix in a stable way requires a very large sample size, because kurtosis is difficult to estimate and varies a lot from one sample to another. The sample size requirements for WLS are in the hundreds, 200 is probably not enough, 400 or 600 could be enough, or in the thousands, depending on how complex your model is. Because of these two issues, the need to invert a large matrix, which is computationally difficult, and the instability of the off-diagonal elements, there is another alternative that applies the same idea of estimating the weights from the data without assuming any distribution: the diagonally weighted least squares estimator.
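A sketch of how the weight matrices discussed here could be inspected, reusing the fits from the previous block. The wls.v and gamma inspector names are from lavaan's documentation; treat the identity check as approximate, since scaling conventions can differ across versions.

```r
# GLS uses a normal-theory weight matrix computed from the observed
# covariances; ML's implicit weight matrix is based on the implied
# covariances, which is why the two are nearly identical in a
# just-identified model.
fit_gls <- cfa(model, data = d, estimator = "GLS")
W_gls  <- lavInspect(fit_gls,  "wls.v")   # normal-theory weights
W_wls  <- lavInspect(fit_wls,  "wls.v")   # ADF weights from raw-data fourth moments
W_dwls <- lavInspect(fit_dwls, "wls.v")   # only the diagonal of the ADF weights

# Gamma estimates the (scaled) asymptotic covariance matrix of the sample
# statistics; the full ADF weight matrix is essentially its inverse.
Gamma <- lavInspect(fit_wls, "gamma")
round(W_wls %*% Gamma, 2)   # approximately an identity matrix

# Statistics with larger sampling variance get smaller weights, which is
# the intuition behind down-weighting the rescaled X1 variance.
diag(W_gls)
```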
So the DWLS estimator simply takes the weighted least squares or ADF weight matrix and uses just its diagonal. This has been proven to be consistent, but it is less efficient in very large samples. In practice, however, the DWLS weight matrix works a lot better in small samples than the full WLS weight matrix, and it is very commonly used with categorical data when the sample size is not very large.

So, a summary of these estimation techniques. Unweighted least squares is easy to understand: you can easily program it in Excel and use the Excel Solver to minimize the unweighted sum of squared differences between the implied matrix and the observed matrix. GLS and ML both assume multivariate normality and perform similarly, with ML being the more statistically appealing of the two, so GLS is not very commonly used. ML is the default alternative that you should go for unless the data are severely non-normal, for example if you have categorical data. Then we have the WLS / ADF estimator, which is an ideal estimator if the sample size is very large, but it is difficult to calculate in small samples, where it can be biased and inefficient. In practice we use diagonally weighted least squares, which just takes the diagonal of the ADF weight matrix and gives us better estimates in small samples.

Then there are also robust variants of all these techniques. The word "robust" in this context does not mean anything about the estimation itself; it means that the standard errors are estimated in a robust way. It is the same as when you run a regression with robust standard errors: the coefficients stay the same. If you use MLR, robust maximum likelihood, the estimates will be the maximum likelihood estimates, but the standard errors will be calculated using a robust formula, and the test statistic, the chi-square, has similar robust variants.

These techniques cover pretty much everything that typical SEM software, such as Mplus or lavaan, can offer, except for some missing data procedures that are estimated a bit differently from these approaches. If you understand everything I explained in this video, you basically understand all there is to understand about estimating models with continuous variables and no missing data in the SEM framework.
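Finally, a short sketch of the robust variants in lavaan, reusing the earlier model and data. MLR and WLSMV are lavaan's estimator names; the comparison is meant to show that the point estimates match plain ML while the standard errors differ.

```r
# Robust maximum likelihood: ML point estimates with sandwich-type
# standard errors and a scaled (robust) chi-square statistic.
fit_mlr <- cfa(model, data = d, estimator = "MLR")

# Same coefficients as plain ML ...
cbind(ML = coef(fit_ml), MLR = coef(fit_mlr))

# ... but different standard errors.
cbind(ML  = parameterEstimates(fit_ml)$se,
      MLR = parameterEstimates(fit_mlr)$se)

# For categorical indicators one would typically use WLSMV, which is
# DWLS estimation plus robust standard errors and a mean- and
# variance-adjusted test statistic (requires declaring the indicators
# as ordered, not shown here).
```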