Full information maximum likelihood estimation, or FIML, is a common modern technique for dealing with missing data. It is available in pretty much every modern SEM software, and it produces consistent and asymptotically unbiased estimates under missing at random. So as long as the causes of missingness are observed, this technique deals with the problem. If your data are missing not at random, then the estimates may be inconsistent and biased. Let's take a look at what FIML does. Normally, when you calculate the maximum likelihood estimates of a latent variable model, or any other structural equation model, you have the model-implied covariance matrix that you construct based on the model, and you have the observed covariances, and you find the values of the model parameters, in this case the lambdas and the factor correlation of this factor analysis model, that make the observed covariance matrix and the implied covariance matrix as similar to one another as possible. Those are the maximum likelihood estimates. The traditional way of dealing with missing data in this kind of estimation is to calculate the observed correlation or covariance matrix using listwise deletion, so we only use complete cases. But this is not the only way of calculating these estimates; there is actually another interesting way of calculating the likelihood that makes it possible to obtain maximum likelihood estimates under missing data. We don't actually need to calculate the observed covariance matrix and compare it against the implied covariance matrix. Instead, we can go observation by observation and calculate observation-level likelihoods. In practice we would calculate these for groups of observations with the same missing data pattern, but the idea is the same.
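The comparison between the observed and model-implied matrices can be sketched numerically. This is a minimal illustration, not the lecture's actual model: the loadings, error variances, and sample covariances below are made-up numbers, and the fit function is the standard normal-theory ML discrepancy.

```python
import numpy as np

# Hypothetical one-factor model with 3 indicators. Factor variance is
# fixed to 1; loadings and error variances are illustrative values.
lam = np.array([0.8, 0.7, 0.6])         # factor loadings (assumed)
theta = np.diag([0.36, 0.51, 0.64])     # error variances (assumed)

# Model-implied covariance matrix: Sigma = lambda lambda' + Theta
Sigma = np.outer(lam, lam) + theta

# Observed covariance matrix from complete cases (illustrative numbers)
S = np.array([[1.00, 0.55, 0.49],
              [0.55, 1.00, 0.41],
              [0.49, 0.41, 1.00]])

# Normal-theory ML fit function:
#   F = log|Sigma| + tr(S Sigma^-1) - log|S| - p
# F is zero when the implied and observed matrices match exactly;
# ML estimation searches for the parameter values minimizing F.
p = S.shape[0]
F = (np.log(np.linalg.det(Sigma)) + np.trace(S @ np.linalg.inv(Sigma))
     - np.log(np.linalg.det(S)) - p)
print(round(F, 4))
```

Because the made-up loadings imply covariances close to the made-up sample values, the discrepancy here is near zero; an optimizer would adjust the loadings and error variances until F is as small as possible.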
So instead of comparing two matrices and checking if one matrix is close to another one, we ask the question: how likely would the observation for one case be, given this covariance matrix? Let's look at just the variables A1 and A2. If these two variables are correlated, we can calculate the likelihood of one observation, the probability of one observation, from the bivariate normal distribution. We can see here that these variables are positively correlated, so getting a large value on both is more likely than getting a large value on one and a small value on the other. Also, getting extreme values, say four and minus four, is less likely than getting values that are close to zero. So we can calculate the likelihood of an individual observation from a multivariate normal distribution. Here we have two variables, but this of course generalizes to as many variables as you like; it's just difficult to visualize beyond two dimensions. How full information maximum likelihood estimation proceeds, in principle, is that you go observation by observation, or one missing data pattern at a time, and you calculate individual likelihoods. So let's assume we have this data. This is our example data set: there is no missingness for the first case, there is a different missing data pattern for each of the first five cases, and then we have more data that we are not interested in. To calculate the likelihood for the first observation, we simply look at its six values: A1, A2, A3, B1, B2, and B3. How likely would we be to observe those six values from a population with this covariance matrix? That gives us the likelihood. Now, the second observation has missing data. So what we do is use the same implied covariance matrix, or implied correlation matrix, but we just leave the missing variable out.
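The observation-level likelihood idea can be illustrated with a small sketch. The correlation of 0.6 between A1 and A2 is an assumed value for illustration; scipy's multivariate normal density plays the role of the bivariate normal surface described above.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical implied covariance for A1 and A2 (correlation 0.6, assumed)
cov = np.array([[1.0, 0.6],
                [0.6, 1.0]])
mvn = multivariate_normal(mean=[0.0, 0.0], cov=cov)

# With a positive correlation, two large values together are more
# likely than a large value paired with a small one, and extreme
# values are less likely than values close to zero.
print(mvn.pdf([1.5, 1.5]))   # both high: relatively likely
print(mvn.pdf([1.5, -1.5]))  # opposite signs: less likely
print(mvn.pdf([4.0, -4.0]))  # extreme values: very unlikely
```

The same `pdf` call works for any number of dimensions; only the mean vector and covariance matrix grow.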
So we look at how likely we are to get these five values from a five-dimensional multivariate normal distribution characterized by this covariance matrix. Then we move on to the next case: four observed values, two missing values. We calculate the likelihood for that observation from this subset of the model-implied covariance matrix. We proceed this way, using only the part of the covariance matrix for which we observe data to calculate the likelihood for each observation, and in this way we calculate the likelihood for every case, or every missing data pattern. In the final case we have data for just A1 and A2, and those give us information about these factor loadings and these error variances. We can use the case even if we don't have data for the other variables; we simply compare that case against this bivariate normal distribution. So it takes a bit more work to calculate on the computer, but this is generally very fast, almost instantaneous on a modern computer. So there are really not many downsides to using full information maximum likelihood, or missing data maximum likelihood. It is implemented in all modern SEM software, and it's simply a matter of switching the estimation approach on. So you might have a list of estimators, something like ADF, weighted least squares, maximum likelihood, robust maximum likelihood, and FIML, and you simply choose the FIML estimator. There is also a robust variant that includes robust standard errors. Then you run the software. There's no extra model specification involved; it's simply a matter of switching to a different estimator, and you're done. So this is very convenient and very useful to do. There are, however, a couple of things that you should consider. FIML is based on multivariate normality, and as with normal maximum likelihood estimation, the multivariate normality assumption is mostly relevant for standard errors and test statistics.
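The pattern-by-pattern computation above can be sketched as follows. The mean vector, covariance matrix, and data rows are all made-up illustrations; the key step is subsetting the implied mean and covariance to the observed entries of each row.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fiml_loglik(data, mu, sigma):
    """Casewise (FIML) log-likelihood under a multivariate normal model.

    For each row, keep only the observed entries of the implied mean
    vector and covariance matrix, then evaluate the corresponding
    lower-dimensional multivariate normal log-density.
    """
    total = 0.0
    for row in data:
        obs = ~np.isnan(row)              # which variables were observed
        if not obs.any():
            continue                      # a fully missing row adds nothing
        sub_mu = mu[obs]
        sub_sigma = sigma[np.ix_(obs, obs)]
        total += multivariate_normal.logpdf(row[obs], sub_mu, sub_sigma)
    return total

# Illustrative implied moments and data (not the lecture's real example)
mu = np.zeros(3)
sigma = np.array([[1.0, 0.5, 0.4],
                  [0.5, 1.0, 0.3],
                  [0.4, 0.3, 1.0]])
data = np.array([[0.2,    -0.1,    1.0],     # complete case
                 [0.5,    np.nan,  0.3],     # one value missing
                 [np.nan, np.nan, -0.8]])    # only one variable observed
print(fiml_loglik(data, mu, sigma))
```

An optimizer would maximize this sum over the model parameters that generate `mu` and `sigma`; in practice software groups rows by missing data pattern so each submatrix is only inverted once.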
If your data are not normal, then what you can do is use robust standard errors. Robust standard errors have been developed for normal maximum likelihood estimates, and they have also been developed for the FIML estimation technique; they are generally available in statistical software. When you calculate standard errors and test statistics, there are two ways of doing so, and I will not go into the technical details, but there is expected information and observed information. You should make sure that the standard errors are based on the observed information instead of the expected information if you have missing data. This is generally the default, but it's a good idea to check. Another thing that you should consider is the role of X and Y variables. X variables are observed variables that don't have any incoming regression paths, and Y variables have incoming paths. So if you have indicators of latent variables, then all those indicators are Y variables; X variables are observed predictors. Full information maximum likelihood estimation is based on the Y variables, so the software might still do listwise deletion on the X variables. There is a workaround to this issue, which is treating the X variables as Y variables. This again is a bit of a technicality that I will not go into in this video, but in practice, switching an X variable to a Y variable just means that you tell the statistical software to treat all X variables as Y variables, and then it will apply the missing data algorithm to them as well. If there is complete data for a variable, treating it as an X or a Y variable typically does not make a difference, but you should be aware that the statistical software can otherwise still drop some observations. For example, in the R package lavaan, there is a fixed.x option, and that should be set to FALSE if you do missing data analysis.
Another thing that you should consider is that if you have observed predictors but treat them as Y variables, then the statistical software might constrain them by default to be uncorrelated with the other Y variables, and that's typically not something that you would want in this case. So after you do full information maximum likelihood, you should check that the number of observations used in the estimation matches the number of observations in your data, and you should also inspect the parameter matrices to make sure that you actually estimated all the correlations that you wanted, instead of accidentally constraining some correlations to zero. These are small practical things that you should check after estimation with full information maximum likelihood. Beyond that, it is simply a matter of changing the estimator option and then estimating as usual. So this is a very simple thing to do, and there is really no reason not to do it if you have any missing data.
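The number-of-observations check can be illustrated with a small sketch. The data here are made up; real SEM software reports these counts in its summary output, and the FIML count should match the number of rows with at least one observed value, not just the complete rows.

```python
import numpy as np

# Tiny made-up data set: one complete row, one partially observed row,
# and one fully missing row (which no method can use).
data = np.array([[0.2,    -0.1,    1.0],
                 [0.5,    np.nan,  0.3],
                 [np.nan, np.nan,  np.nan]])

# Complete cases: rows with no missing values (what listwise deletion keeps)
complete_cases = int(np.sum(~np.isnan(data).any(axis=1)))

# Cases usable by FIML: rows with at least one observed value
usable_by_fiml = int(np.sum((~np.isnan(data)).any(axis=1)))

print(complete_cases, usable_by_fiml)  # prints: 1 2
```

If the software reports using only the complete-case count, something like a fixed.x setting is probably still dropping observations.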