Longitudinal datasets are very common in research, so what are longitudinal datasets and what is longitudinal data analysis? In the simplest possible case, a longitudinal dataset is simply one time series: we follow, for example, how one variable within one case develops over time. More typically, however, we deal with panel data, where we have multiple observations of multiple variables for multiple cases. For example, in Wooldridge's book the introductory dataset for panel data analysis consists of repeated observations of different cities: 150 cities, each observed twice, so 300 observations in total, together with various variables related to the crime statistics of those cities. These kinds of datasets are longitudinal because the repeated observations can be indexed on two different values: first the city ID and then a time index. Importantly, whereas the city ID has no natural order, the time variable is ordered, and that is what makes these datasets special. We have two levels, or two ID variables, so this is multilevel data, and on the final level the ID variable has a time order. That is what makes it a longitudinal dataset instead of simply a generic multilevel dataset.

So how do we then analyze this kind of data? There are a couple of different ways, depending on what we are interested in. The simplest possible way is to ignore the time dimension and treat the data as just a multilevel dataset. The time index makes no difference if we use GLS random effects, GLS fixed effects, or a normal multilevel model: these models do not consider time at all, and even if we reordered the observations or shuffled the time variable, the results would be exactly the same, because they only take the clustering of the data into account and ignore the time ordering of the observations. So we can simply use multilevel techniques and ignore the longitudinal nature of the dataset if we are not interested in it; there is a small sketch of this idea right after this paragraph.
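To make the data structure and the ignore-the-time approach concrete, here is a minimal sketch in Python. The data, the variable names (city, wave, unem, crimes), and all numbers are simulated stand-ins, not the actual textbook dataset; the point is only that neither model below ever looks at the time index, only at the clustering by city.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Toy long-format panel in the spirit of the textbook example: 150 cities,
# each observed in two waves, so 300 rows. All variables are simulated.
n_cities, waves = 150, [1, 2]
df = pd.DataFrame({
    "city": np.repeat(np.arange(1, n_cities + 1), len(waves)),  # cluster ID, no natural order
    "wave": np.tile(waves, n_cities),                           # time index, ordered
})
city_effect = rng.normal(0, 5, n_cities)                        # unobserved city-level component
df["unem"] = rng.normal(8, 2, len(df))                          # hypothetical predictor
df["crimes"] = 60 + 3 * df["unem"] + city_effect[df["city"] - 1] + rng.normal(0, 10, len(df))

# Neither model below refers to "wave" at all, which is exactly why shuffling
# the time ordering within cities would leave the estimates unchanged.

# Pooled OLS with standard errors clustered on city.
pooled = smf.ols("crimes ~ unem", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["city"]}
)

# Random-intercept (multilevel / random-effects style) model with city as the group.
mixed = smf.mixedlm("crimes ~ unem", data=df, groups=df["city"]).fit()

print(pooled.params)
print(mixed.params)
```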
Then there are things that we can study from a longitudinal dataset that are not possible using cross-sections. The first approach, ignoring time, is something we could basically do with cross-sections as well and get the same kind of answer, but for the next two things we cannot use cross-sections. Why I say two becomes clear in a moment.

The first class is effects of time, or trends: we want to analyze how something decreases or increases, or first goes up and then goes down over time, and what explains why different cases follow different time trends. Here we would use multilevel models with time as a predictor, or perhaps latent change models, or some other kinds of models. These models are actually pretty simple, because there is nothing special about time as an explanatory variable. Time is just a variable like any other, and the fact that it happens to be one of our ID variables does not really make a difference for this kind of model.

Then we have the class where we model the effect of a variable on its own future values, or where we model the change in a variable: we compare y at time zero against y at time one, take the difference, and then try to explain that difference. In practice these are one and the same thing. Whether we calculate the difference between two consecutive observations of y, or model y as a predictor of its own future values, the problems we encounter are pretty much the same, so as far as the analysis is concerned these are basically the same thing. We use the term dynamic effect to refer to the case where a variable predicts or explains its own future values. If we are not modeling time effects and we do not have dynamic effects, then we can simply use a multilevel model with clustered standard errors, and that is not really a longitudinal model. So we have two cases of longitudinal analysis: models that use time as an explanatory variable and model time trends, and dynamic models where y predicts its own future values.

These models of time trends are very simple in a way, because time is just a variable; there is nothing special about it. What makes this a bit more complicated to understand, or requires some amount of studying, is that special labels are given to special cases. For example, a latent change model basically refers to a model with a linear time trend where the slope and the intercept vary between cases. That is basically just a random intercept and random slope model of time, but for some reason we refer to it as a latent change model, and you just have to know this terminology; there is a small sketch of this kind of model a little further below. If it were not for the terminology, we could just apply multilevel modeling or structural equation models like we do to any other dataset, and there would be nothing special to know about how to model time. Latent growth curve model is another term used for a model that is simply an application of a more general model. Then we have some statistical techniques that are general but tend to be used in management research in the context of modeling trends. One such example is the latent class model. The idea of a latent class model is that it is a latent variable model, but at least one of the latent variables in the model is not continuous and normal but categorical. We could, for example, have different time trends and say that a case follows the first time trend with 70% probability and the second time trend with 30% probability. The goal of the analysis is then to discover these time trends from the data and to understand which variables explain which of the discovered trends each case follows. I have another video about latent class models.
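As a rough illustration of that point, here is a minimal sketch of a linear latent change model estimated as a mixed model with a random intercept and a random slope for time. The data, the variable names (firm, time, perf), and the parameter values are all made up for the example; the same model could also be set up as a structural equation model with a latent intercept and a latent slope.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Simulated panel: 200 firms observed over 5 waves of a hypothetical outcome "perf".
n_firms, n_waves = 200, 5
firm = np.repeat(np.arange(n_firms), n_waves)
time = np.tile(np.arange(n_waves), n_firms)

# Each firm gets its own starting level (intercept) and its own linear trend (slope).
intercepts = rng.normal(5.0, 1.0, n_firms)[firm]
slopes = rng.normal(0.3, 0.2, n_firms)[firm]
perf = intercepts + slopes * time + rng.normal(0, 0.5, n_firms * n_waves)

growth = pd.DataFrame({"firm": firm, "time": time, "perf": perf})

# Linear growth model: fixed effect of time plus a random intercept and a random
# slope for time across firms -- the mixed-model counterpart of a linear latent
# change / latent growth curve model.
lgc = smf.mixedlm(
    "perf ~ time", data=growth, groups=growth["firm"], re_formula="~ time"
).fit()
print(lgc.summary())
```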
Then dynamic models, and this is the challenging case in longitudinal analysis. A dynamic model is, again, a model where y explains its own future values. What makes these complicated is that the unobserved causes, the error term, tend to correlate over time. Why is that the case? Think about the x variables and the y variables in your model: quite often, if we calculate autocorrelations for those variables, they tend to be pretty strong. A company that is profitable now tends to be profitable next year as well. If a company is large now, it does not suddenly become a small company next year and then a large company again; company size tends to persist. The factors that we observe tend to persist over time, and if the variables that we actually observe correlate over time, it is unreasonable to assume that the variables we do not observe, the ones that go into the error term, would not correlate over time.

In a multilevel model where y does not have an effect on itself over time, this is fairly simple to deal with by using clustered standard errors. But when y is actually a predictor of itself, this leads to a tricky endogeneity issue. These equations show the problem: y_1 depends on y_0 plus an error term u_1, and y_2 depends on y_1 plus an error term u_2. If the error term correlates over time, so that u_1 and u_2 are correlated, we have an endogeneity problem: u_1 is a part of y_1, and because u_1 correlates with u_2, the predictor y_1 correlates with the error term u_2 in the second equation. Solving this endogeneity problem leads to some tricky situations, but basically we need instrumental variables, and we can be creative about the instruments; I will talk about that more in another set of videos. There is also a small simulated sketch of the problem and one common fix at the end of this section.

Conceptually, one of the challenging things in dynamic models is actually whether you should use a dynamic model at all. The endogeneity problem is more of an engineering problem: you can solve it from the data, assuming certain things, which you of course need to justify based on theory, and some of which can be justified partially empirically. But whether you actually need the lagged dependent variable as an explanatory variable is a challenging question. I have actually asked that of some authors whose work I have reviewed or handled in my role as an editor, and the answer to why people include the lagged dependent variable as a control is quite often that it is a convention in a particular field. So instead of having thought through whether the lagged dependent variable is actually required as a predictor, which complicates the analysis considerably, people tend to assume that it is required simply because these kinds of models are commonly used.

To summarize, longitudinal models can be categorized into two different classes: models of time trends, and dynamic models where the y variable explains or predicts its own future values. Models of time are fairly simple, because time is nothing special; it is just a variable like any other. Dynamic models lead to endogeneity issues and are actually pretty tricky to understand.
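To make the endogeneity problem and the instrumental-variable idea a bit more concrete, here is a minimal simulated sketch. It uses one common textbook remedy, an Anderson-Hsiao style estimator that first-differences the model and instruments the lagged difference with an earlier level of y; this is only one possible choice of instruments, and all numbers and names here are made up for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Simulate a dynamic panel: y_it = 0.5 * y_i,t-1 + a_i + e_it, where the
# firm-specific component a_i ends up in the error term and therefore makes
# the errors u_it = a_i + e_it correlate over time.
n, T, rho = 500, 6, 0.5
a = rng.normal(0, 1, size=n)
e = rng.normal(0, 1, size=(n, T))
y = np.zeros((n, T))
y[:, 0] = a + e[:, 0]
for t in range(1, T):
    y[:, t] = rho * y[:, t - 1] + a + e[:, t]

# Naive pooled OLS of y_t on y_{t-1}: typically noticeably above the true rho,
# because y_{t-1} contains a_i and a_i is also part of the current error term.
y_t   = y[:, 1:].ravel()
y_lag = y[:, :-1].ravel()
naive = sm.OLS(y_t, sm.add_constant(y_lag)).fit()

# Anderson-Hsiao-style fix: first-difference the model to remove a_i, then
# instrument the endogenous regressor dy_{t-1} with the second lag of the level.
dy_t   = (y[:, 2:] - y[:, 1:-1]).ravel()   # dependent: delta y_t
dy_lag = (y[:, 1:-1] - y[:, :-2]).ravel()  # endogenous regressor: delta y_{t-1}
z      = y[:, :-2].ravel()                 # instrument: y_{t-2}

# Manual two-stage least squares (point estimates only).
stage1 = sm.OLS(dy_lag, sm.add_constant(z)).fit()
stage2 = sm.OLS(dy_t, sm.add_constant(stage1.fittedvalues)).fit()

print("true rho:            ", rho)
print("naive pooled OLS:    ", naive.params[1])
print("Anderson-Hsiao 2SLS: ", stage2.params[1])
```

In a real analysis one would use a dedicated dynamic-panel routine (for example an Arellano-Bond style GMM estimator) rather than hand-rolled two-stage least squares, since the manual second stage does not give correct standard errors; the sketch is only meant to show why the naive estimate is off and how an instrument helps.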