There are a few techniques available for researchers to deal with missing data. Which of these techniques should you apply, when, and why? This article by Newman in Organizational Research Methods provides a nice overview of the available techniques. He divides them into five categories. We have listwise deletion, dropping all cases that have any missing data; pairwise deletion, basically using whatever data you have for each specific analysis, allowing the sample to be different for each analysis (for example, in a correlation matrix one correlation is calculated with different data than another); single imputation techniques, mean substitution being the most common, with regression-based imputation techniques also belonging here; and then the modern techniques, maximum likelihood estimation for missing data (FIML) and multiple imputation. These last two are the techniques that he recommends, and he lists in brief form all the major issues with the other techniques. So, all other things being equal, if we don't know anything about the problem, we should always go for FIML or multiple imputation. But there is another strategy that is applicable. Strategy number one for any problem is always to check whether you have the problem at all. So before you do anything complicated, first check if there is a problem in the first place. If you have 1,000 observations and 10 cases of missing data, that's 1%; maybe that's not a big concern. So unless applying a modern technique is very easy for you to do, you might just ignore the issue and apply listwise deletion. Under no circumstances should you apply pairwise deletion or simple imputation techniques. So listwise deletion is okay if you have very little missing data; otherwise, FIML and multiple imputation are the techniques that you should apply.
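To make the deletion techniques concrete, here is a small sketch with hypothetical data (pandas assumed) showing why pairwise deletion lets every correlation rest on a different subsample, while listwise deletion keeps only fully complete cases:

```python
import numpy as np
import pandas as pd

# Toy data set with scattered missing values (hypothetical numbers)
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, np.nan, 6.0],
    "y": [2.0, 1.0, 4.0, np.nan, 5.0, 7.0],
    "z": [1.0, 3.0, np.nan, 4.0, 6.0, 5.0],
})

# Listwise deletion: drop every row that has any missing value
listwise = df.dropna()
listwise_corr = listwise.corr()

# Pairwise deletion: DataFrame.corr() uses all cases available for
# each variable pair, so each correlation can use a different subsample
pairwise_corr = df.corr()

n_listwise = len(listwise)              # rows complete on all three variables
n_xy = len(df[["x", "y"]].dropna())     # rows usable for the x-y correlation
```

Here only 3 rows survive listwise deletion, while the x-y correlation under pairwise deletion is based on 4 rows; the varying sample per correlation is exactly what can make pairwise deletion problematic.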
So if we have an amount of missing data that we simply cannot ignore, let's say 50 cases of missing data out of 1,000, which is probably not ignorable, which one of these should be applied? There are a couple of considerations on whether we use full information maximum likelihood or whether we use multiple imputation. Both of these techniques are asymptotically equivalent, which means that in large samples, if we estimate the same model, they should produce the same result, and this has been proven to be the case. So in large samples it's a matter of preference. But there are some details of their capabilities that might influence our decision. Both can use auxiliary variables, so you can enter variables that help in the imputation but are not part of the actual model. Doing so might be a bit easier using multiple imputation than using full information maximum likelihood, so this would speak for multiple imputation. On the other hand, multiple imputation is more general, and this also speaks for multiple imputation: full information maximum likelihood is an estimator that needs to be implemented in your statistical software for the particular model that you want to estimate. In contrast, multiple imputation can be applied with any analysis that can use raw data. So in this sense, multiple imputation is more general than full information maximum likelihood. In practice, full information maximum likelihood tends to be implemented only in structural equation modeling software or the structural equation modeling commands of general-purpose statistical software. That will take you a long way, but you cannot apply FIML to every possible model. On the other hand, the flexibility of multiple imputation, being applicable to any model, means that specifying a proper imputation model can be complicated.
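To show what the multiple-imputation workflow looks like end to end, here is a minimal hand-rolled sketch on simulated data, with a very simple stochastic imputation model standing in for a proper one: impute m complete data sets, analyze each, and pool the results with Rubin's rules.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 500, 20                        # sample size, number of imputations
y = rng.normal(loc=5.0, scale=2.0, size=n)
miss = rng.random(n) < 0.2            # 20% missing completely at random
obs = y[~miss]

estimates, variances = [], []
for _ in range(m):
    # Simple stochastic imputation: draw from a normal fitted to the
    # observed values (a stand-in for a real imputation model)
    draws = rng.normal(obs.mean(), obs.std(ddof=1), size=miss.sum())
    completed = y.copy()
    completed[miss] = draws
    # "Analysis model": estimate the mean and its sampling variance
    estimates.append(completed.mean())
    variances.append(completed.var(ddof=1) / n)

# Rubin's rules: pool the m estimates and their variances
qbar = np.mean(estimates)             # pooled point estimate
w = np.mean(variances)                # within-imputation variance
b = np.var(estimates, ddof=1)         # between-imputation variance
total_var = w + (1 + 1 / m) * b       # total variance of the pooled estimate
```

The key point is that the analysis step can be any command that accepts raw data; only the imputation and pooling steps are specific to multiple imputation, and the between-imputation variance b is what carries the missing-data uncertainty into the standard errors.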
And what makes it complicated is that the complexity of the imputation model needs to be equal to or higher than that of the estimated model. So if you are estimating a model with interaction terms, then your imputation model needs to take that interaction into account. Imputing with a linear regression model and then calculating an interaction term would not be consistent. The same applies if you impute with a linear model and then estimate a logistic regression or any other GLM. So you need to take any nonlinearities, any interactions, any levels of analysis, any multiple groups, in short any feature of the data that you could possibly be interested in in the analysis, into account when calculating the imputations. Specifying the imputation model therefore depends on the data analysis model, so in a way you need to consider your model specification twice. And if you first run, let's say, a normal regression model, and then realize that you actually should be using a multilevel model, you need to regenerate your imputed data sets, unless you took that non-independence of observations into account in the first imputation. So while multiple imputation is more general, in that you can use any data analysis command to analyze the imputed data sets, the imputation itself needs to match the analysis that you apply. This can make using imputations a bit challenging and also error prone, and it takes time to recalculate the imputations. With full information maximum likelihood, on the other hand, if it is implemented in your statistical software, you simply switch to that estimator: it's basically adding an option in your syntax file or, if you use a graphical interface, toggling a button to enable missing data estimation. One final consideration is that full information maximum likelihood has fewer convergence issues.
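A small simulation illustrates why an imputation model that ignores a product term biases interaction-type quantities. This is a deliberately simplified sketch with hypothetical data, using mean imputation as a stand-in for any imputation model that omits the dependence:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = x + rng.normal(size=n)        # x and y are correlated, so E[x*y] = 1
true_mean_xy = np.mean(x * y)

# Delete 30% of x completely at random, then impute with the observed mean
miss = rng.random(n) < 0.3
x_incomplete = np.where(miss, np.nan, x)
x_imputed = np.where(miss, np.nanmean(x_incomplete), x_incomplete)

# Computing the product AFTER imputing ignores the x-y dependence:
# imputed cases contribute roughly mean(x)*y (about 0) instead of x*y
naive_mean_xy = np.mean(x_imputed * y)
```

The naive estimate of the mean product is clearly attenuated relative to the complete-data value. The same logic applies to interaction terms in a regression: the imputation model has to include the interaction (or an equivalent nonlinear structure), not just the main effects.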
Multiple imputation models do not always converge, and that is less of a problem in full information maximum likelihood estimation, because there is no imputation step that could produce undesirable autocorrelations. In practice, what I tend to do almost always is go for full information maximum likelihood. But this of course depends on what kind of models you are estimating. If you are working with nonlinear models, interactions, things like that, then multiple imputation might be better for you. But as a rule of thumb, whenever possible, I think it's better to go with full information maximum likelihood. When the capabilities of FIML estimation in your software run out, then you can consider doing multiple imputation, which is more complicated: it takes time and it's more error-prone.
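In practice FIML is something your SEM software does for you, but the core idea can be sketched by hand for a bivariate normal: each case contributes the likelihood of exactly the variables it has observed, so incomplete cases still inform the estimates. This is a toy sketch on simulated data (scipy assumed), not how production software implements it:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(2)
n = 400
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
data = rng.multivariate_normal([0.0, 1.0], cov, size=n)

# y is missing for a random 40% of rows; x is always observed
miss = rng.random(n) < 0.4
x, y = data[:, 0], data[:, 1].copy()
y[miss] = np.nan

def negloglik(theta):
    """Observed-data negative log-likelihood of a bivariate normal."""
    mx, my, lsx, lsy, r = theta
    sx, sy = np.exp(lsx), np.exp(lsy)   # keep standard deviations positive
    rho = np.tanh(r)                    # keep correlation in (-1, 1)
    S = np.array([[sx**2, rho * sx * sy],
                  [rho * sx * sy, sy**2]])
    comp = ~miss
    # Complete cases contribute the bivariate density...
    ll = multivariate_normal(mean=[mx, my], cov=S).logpdf(
        np.column_stack([x[comp], y[comp]])).sum()
    # ...cases with y missing contribute only the marginal density of x
    ll += norm(mx, sx).logpdf(x[miss]).sum()
    return -ll

res = minimize(negloglik, np.zeros(5), method="Nelder-Mead",
               options={"maxiter": 10_000, "xatol": 1e-6, "fatol": 1e-6})
mx_hat, my_hat = res.x[0], res.x[1]
rho_hat = float(np.tanh(res.x[4]))
```

The estimates recover the true means (0 and 1) and correlation (0.6) without imputing anything, which is why in real software enabling FIML is just an estimator option, for example lavaan's `missing = "ml"`.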