How should missing data analysis be reported? Unfortunately, the use of missing data analysis is not that common, so there are not that many empirical examples that I could show. But we can use Enders's book and his recommendations as a baseline for what should be reported and how it should be done when we apply missing data analysis. Again, the reason why these techniques are not that commonly used is probably not that they are not useful. It is simply that people are not aware of them, and in particular, using the full information maximum likelihood technique is very easy to do. So Enders gives a couple of recommendations, and there are some points that he raises that I really like. One is that some people tend to avoid using multiple imputation techniques, believing that reviewers or editors will think that imputing data is like cheating. Well, that's not a good reason for making suboptimal decisions. So instead of not imputing on the grounds that you may be criticized for doing it, it's a good idea to teach your readers and your reviewers about the advantages of modern missing data techniques. So it's a good idea to be a bit pedagogical, as Enders says. So how do you actually go about reporting missing data? The first thing that you need to consider is descriptive statistics. You need to be transparent. It's quite common that researchers simply say that they had 200 observations and, after dropping cases with missing data, 150 observations. Well, that's not ideal for two reasons. First, dropping 25% of your data is generally not a good idea. Second, understanding the pattern of missingness is probably useful. So Enders recommends that if you apply missing data techniques, you should report the amount of missingness in your descriptive statistics table.
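As a concrete illustration, here is a minimal sketch in Python with pandas of how a per-variable "% missing" column can be added next to the usual means, standard deviations, and correlations. The data and the variable names (bullying, anxiety, support) are fabricated purely for illustration:

```python
import numpy as np
import pandas as pd

# Fabricated example data: 200 observations, three hypothetical variables
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "bullying": rng.normal(size=200),
    "anxiety": rng.normal(size=200),
    "support": rng.normal(size=200),
})
# Introduce missing values for illustration: 15% on anxiety, 5% on support
df.loc[rng.choice(200, 30, replace=False), "anxiety"] = np.nan
df.loc[rng.choice(200, 10, replace=False), "support"] = np.nan

# Descriptive statistics table: mean, SD, % missing, then correlations
desc = pd.DataFrame({
    "mean": df.mean(),            # mean/std skip missing values by default
    "sd": df.std(),
    "% missing": df.isna().mean() * 100,
}).join(df.corr())                # corr() uses pairwise deletion by default
print(desc.round(2))
```

One extra column is all it takes, which is why characterizing the missingness costs essentially no space in the paper.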
So you have the table with correlations, standard deviations, and means, and there is lots of other information that could be put into that table; the percentage of cases missing for each variable would be useful information if you apply these missing data techniques. So this is the first thing: in the descriptive statistics, characterize the problem. It does not need to take multiple sentences or much space; simply state the overall percentage in the paper and then put the individual variables' percentages of missing data into the descriptive statistics table. Then, when you explain the actual estimation, it's a good idea to tell a few things. Enders gives hypothetical examples of what you could write, and these are a good template. He starts by simply explaining what was done: we used FIML estimation in Mplus version 5.2 with the robust standard errors option. So simply a description of what was done, and then technical details — this analysis uses auxiliary variables, so there needs to be a description of that — and then some other details, like how to interpret the results and so on. And then, what I really like about this is that in the final sentence or two, Enders provides reasoning: why is this a good idea, and what evidence supports the claim that it is a good idea? Quite often, when people justify their decisions, they cite previous empirical research, saying that it is common to do thing X in field Y. But that something is common is not really a reason for doing it. Instead, you should explain what the advantage is and what the evidence is that the advantage is actually real. So this is an example for full information maximum likelihood estimates. Reporting multiple imputation is a bit more complicated, but the same principles are followed. So first start by explaining what was done.
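To make the decisions involved concrete before going through the reporting details, here is a hedged sketch of what a multiple-imputation analysis looks like mechanically: generate several completed data sets, analyze each, and pool the results with Rubin's rules. It uses scikit-learn's IterativeImputer as a stand-in for dedicated imputation software (an assumption for illustration only — it is not the software Enders's examples use), and it pools a simple mean:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Simulated data: y depends on x, and y is missing at random
rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(scale=0.8, size=n)
data = np.column_stack([x, y])
data[x > 0.7, 1] = np.nan  # MAR: missingness depends only on observed x

m = 20  # number of imputations (one of the decisions to report)
means, within_vars = [], []
for i in range(m):
    # sample_posterior=True adds imputation noise, as proper MI requires
    imp = IterativeImputer(sample_posterior=True, random_state=i)
    completed = imp.fit_transform(data)
    y_completed = completed[:, 1]
    means.append(y_completed.mean())
    within_vars.append(y_completed.var(ddof=1) / n)  # sampling var of the mean

# Rubin's rules: pool the m estimates and their variances
qbar = np.mean(means)                     # pooled point estimate
ubar = np.mean(within_vars)               # within-imputation variance
b = np.var(means, ddof=1)                 # between-imputation variance
total_se = np.sqrt(ubar + (1 + 1 / m) * b)
print(f"pooled mean of y: {qbar:.3f} (SE {total_se:.3f})")
```

Every choice in this sketch — the number of imputations, the imputation model, which variables enter it, how convergence is checked — is exactly the kind of detail that belongs in the write-up or an online appendix.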
So you did multiple imputation, and multiple imputation requires many more decisions from the researcher than applying full information maximum likelihood. So you would then have an explanation of technical details: the auxiliary variables, how you checked convergence, how long the imputation chains were, what statistical software was used, how you did the diagnostics, and so on. So that's a bit longer description. And then, finally, again: what is the advantage? Why would one want to go through all this trouble, and what is the evidence that the advantage actually exists? Importantly, you can say things like "it has been shown to be" or "it has been proven to be" instead of saying that someone merely claims that multiple imputation is useful. Because multiple imputation requires lots of decisions, it might be a good idea to also prepare an online appendix that explains the imputation process in great detail. Ideally, you could include the statistical software's syntax file or code file that implements the imputation process, so people can check exactly what you imputed with which variables, what kinds of assumptions you made during the imputation process, and so on. So it's important to be transparent, and because most journals allow online appendices nowadays, there's really no reason not to just put your analysis files up for others to inspect. I have an empirical example of full information maximum likelihood that comes from a paper that I helped to write. It is about school bullying; the main analysis was a latent class analysis, and we have a separate section on missing data in the paper. It's a good idea to have a separate section, because missing data analysis is not something that all researchers are accustomed to. So we explain that we used full information maximum likelihood estimation, and we cite Enders's book, which is my favorite book on missing data, as you can probably see from these missing data videos.
And then we explain the details, like how much data are missing, and we explain the assumptions. So we make the missing at random assumption, and we discuss missing not at random and what the risk is of missing not at random being the case in our data. Then we tell that we used auxiliary variables, and we do some robustness checks by doing different things. Now, this is a good example of the amount of detail that you report for full information maximum likelihood estimation. One might ask why we are so detailed about it. Well, the reason is that we were actually fortunate to have a reviewer who pushed us to do more. In the initial version we didn't discuss the assumptions, but the reviewer made us do so, and that made the paper better. So this is what the reviewer said. Our response was rather lengthy and cited different parts of the paper, so copying it here wouldn't make much sense. But generally, what you need to do when you report missing data is to be transparent about the amount of missingness, provide enough detail so that people could ideally replicate your analysis, and then provide justification: what is the methodological justification for doing what you do? Simply saying that these techniques are modern, that they have been recommended, or that they are common is not sufficient. You need to explain what the advantage is, and in this case the advantage is that you get consistent estimates if the data are missing at random, whereas with older techniques you would get inconsistent estimates, and consistency is something that we want our estimates to have.
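That last claim can be illustrated with a small simulation — a toy bivariate-normal FIML fit written out by hand in Python with NumPy/SciPy, which is an illustration only, not the software used in the paper. Under an MAR mechanism, listwise deletion gives a biased mean for y, while maximizing the full-information likelihood, where each case contributes the density of whatever it actually observed, recovers it:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal, norm

# Simulate bivariate normal data, then make y missing at random:
# y is deleted whenever x is large, so missingness depends only on observed x
rng = np.random.default_rng(0)
n = 2000
data = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
data[data[:, 0] > 0.5, 1] = np.nan
observed_y = ~np.isnan(data[:, 1])

def neg_loglik(theta):
    # theta = (mu_x, mu_y, log sd_x, log sd_y, atanh rho), unconstrained
    mu = theta[:2]
    sx, sy, rho = np.exp(theta[2]), np.exp(theta[3]), np.tanh(theta[4])
    sigma = np.array([[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]])
    # Complete cases contribute the bivariate density...
    ll = multivariate_normal(mu, sigma).logpdf(data[observed_y]).sum()
    # ...incomplete cases contribute only the marginal density of x
    ll += norm.logpdf(data[~observed_y, 0], mu[0], sx).sum()
    return -ll

fit = minimize(neg_loglik, np.zeros(5), method="BFGS")
mu_y_fiml = fit.x[1]
mu_y_listwise = data[observed_y, 1].mean()
print(f"listwise mean of y: {mu_y_listwise:.3f}, FIML estimate: {mu_y_fiml:.3f}")
```

The listwise estimate is pulled downward because y is only observed when x is small and x and y are correlated; FIML uses the observed x values of the incomplete cases to correct for exactly that.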