The missing data pattern tells us which data are missing. When you have missing data, the consequences depend on the pattern, that is, which data are missing, and on the mechanism, that is, why the data are missing. In this video I will take a look at the patterns.
Missing data patterns can be thought of as different levels. This is from Newman's article, and he focuses on three levels. First, there is item-level missingness. This refers to a scenario where we have a multiple-item scale, so we have construct X measured with three items, X1, X2 and X3, and the answer to one of these items is missing. This would be very typical in a survey-based study when an informant thinks that one of the questions in the scale is either difficult to answer or does not apply to them, and they leave it blank. This is fairly easy to deal with: in some scenarios we can simply use the mean of X1 and X3 to impute the missing value of X2. So that is an easy case.
Another case is what Newman refers to as construct-level missingness, and what the missing data literature more generally refers to as scale-level missingness: you don't have any data on the measurement scale for a particular concept for a particular person. This is more problematic because we don't have any information on where this person stands on construct X. Then we have person-level missingness, also called case-level or observation-level missingness, where an individual observation is missing from the data altogether. This could be, for example, a sample selection problem.
Another way of looking at missing data is given by Enders. Enders explains six different patterns. The univariate pattern basically means that there is missingness in one variable, and that's the same as the item-level missingness in Newman's paper. Then there is the unit non-response pattern, and this would be, for example, the case if you have a survey data set and you have some secondary data from your sampling frame.
Only some people decide to respond to your survey. For those who do not respond, you have the data from the frame for those people or companies, but not the data from the actual survey, and this is the unit non-response pattern. The monotone pattern, where the amount of missing data increases over time, would be a problem in longitudinal studies where you have attrition, so people drop out of your study and the number of dropouts increases as time goes on. Then there is the general pattern. There is no particular pattern to the missingness. Missingness is rather a random phenomenon.
Then we have two other interesting patterns. One is the planned missing pattern. This is not very commonly used in management, but it is fairly common in, for example, psychology, where we can get large samples from, for example, schools. The idea of the planned missing data pattern is that data collection can be costly. So if we have a survey form that is four pages long, but the organization that we survey only allows us a three-page or two-page survey, then we need to leave some questions out. Planned missing data occur if we decide to use certain subsets of the questions for different people. Here we have one group of people answering Y1, Y3 and Y4, another group answering Y1, Y2 and Y4, and a third group answering Y1, Y2 and Y3. To analyze the data, we then apply missing data techniques to deal with this missingness, and that takes care of the problem. So this is for scenarios where data collection is costly and we want to collect more data than we could otherwise collect from each person. The alternative would be simply to collect Y1, Y2 and Y3 for all of those cases and leave Y4 uncollected.
The final pattern identified by Enders is the latent variable pattern.
The idea of a latent variable is that it is simply a variable for which we don't have any data. This could be because the latent variable is something that we simply cannot measure with a survey, or that it would be unethical to collect, or something like that. When we don't have data for a variable, we need to use latent variable techniques.
Here's an example of missing data analysis. Typically, when you start working with a data set that contains missing data, you inspect the patterns. This is done with the data, and we just print out the missing data pattern. What comes out is simply a list of the patterns. So there are three variables that have missingness. The first variable is age, the second is female and the third is department. 93% of the cases have all of these variables observed. Then 5% are missing age and female, and 2% are missing age, female and department. There are other variables that don't contain any missingness.
So when you start working with missing data, the first thing that you normally do is print out the patterns to understand where things are missing, how much is missing, and whether some things are missing together. Based on that, you then start thinking about your analysis strategy for dealing with the missing data.
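
The pattern printout described above can be sketched in a few lines of Python with pandas. This is only an illustration, not the software used in the video: the function name `missing_patterns`, the toy data, and the extra `tenure` variable are assumptions made for the example. The toy data are constructed so that the shares of each pattern match the 93%, 5% and 2% mentioned above.

```python
import numpy as np
import pandas as pd


def missing_patterns(df: pd.DataFrame) -> pd.DataFrame:
    """Tabulate missing-data patterns (1 = observed, 0 = missing)."""
    # Keep only the variables that have at least one missing value.
    cols = [c for c in df.columns if df[c].isna().any()]
    if not cols:
        # Nothing is missing: a single pattern covers every case.
        return pd.DataFrame({"n": [len(df)], "percent": [100.0]})
    observed = df[cols].notna().astype(int)
    # Count how many cases share each observed/missing combination;
    # value_counts sorts the patterns from most to least frequent.
    table = observed.value_counts().rename("n").reset_index()
    table["percent"] = 100 * table["n"] / len(df)
    return table


# Hypothetical toy data: 100 cases, one fully observed variable (tenure).
n = 100
df = pd.DataFrame({
    "age": np.full(n, 40.0),
    "female": np.ones(n),
    "department": ["sales"] * n,
    "tenure": np.arange(n),
})
# Seven cases miss age and female; two of those also miss department.
df.loc[93:99, ["age", "female"]] = np.nan
df.loc[98:99, "department"] = np.nan

patterns = missing_patterns(df)
print(patterns)
```

Each row of the resulting table is one pattern, with the number and percentage of cases that share it, so it shows at a glance which variables tend to be missing together.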