Before we talk about specific techniques for panel data analysis, it's useful to take a look at the big picture. Why do we want to model effects over time? Why do we want to model X at time 1 and Y at time 2, and why do we sometimes want to model the effect of Y at time 1 on Y at time 2? Let's take a look at an example that I used when I started discussing how to make causal claims using quantitative data. The example came from a list of the 500 largest companies: in 2005, the companies on that list that were led by a woman were roughly 4.7 percentage points more profitable than the other companies. People wanted to interpret this empirical finding as evidence that it is actually the female CEO, the woman leader, who causes the profitability difference. So there were calls for having more women in CEO and top management team positions after this finding. But this is just an observation, and it's not directly evidence of causality. So what do we actually need to make a causal claim? First, we need an association: we need to see that X and Y are related, and that is what this observation gives us. We also need the cause to precede the effect in time, and we need to rule out alternative explanations, such as a third variable that influences both X and Y. I'm not going to go through all the criteria for causal claims here; instead, let's look at how we would model this. One way to address the temporal-order requirement is to measure the variables at different times: we measure X, CEO gender, first, and Y, ROA, later. We could set the timing up in other ways as well, but that is a topic for another video. So this is a regression problem, and how do we estimate it? Let me emphasize the setup: we have CEO gender at time 1 and ROA at time 2, and if this is all the data that we have, the only thing we can do is run a regression analysis. We can't do anything more unless we have more data.
But of course we have more data, so we can have more observations. We can have ROA at time 3 and CEO gender at time 2. So what do we do with these data? We could pool all the data and run a regression analysis. That would work, but there are a couple of problems. First, think about what the unobserved causes of profitability are: whatever makes a firm more profitable than others, those causes tend to persist over time. So if a firm's profitability differs from that of other firms, the same difference will likely be there at time 2 and at time 3 as well. And not only that: the observations also cluster in the time dimension, because in some years profitability is high across firms and in other years it is low. So we have a non-independence problem: observations from the same firm, and from the same year, are related to each other. If we just pooled the data and ran an ordinary regression, we would be ignoring this non-independence. Is that a problem? It depends. If the non-independence is just a nuisance, we can control for the time dimension, use cluster-robust standard errors, and we will be fine as long as the sample size is large enough. But sometimes we want to do something more complicated. What if we have a scenario where ROA actually affects its own future values, so that the lagged dependent variable is a predictor of the current value? This is where things get complicated. This is called a dynamic panel model, because the dependent variable depends on its past values. I have a general video about lagged dependent variables that discusses whether and when you should model these kinds of relationships.
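The pooled approach with non-independence treated as a nuisance can be sketched like this (simulated data; the firm-effect and error magnitudes are invented): year dummies control for the time dimension, and standard errors are clustered by firm.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_firms, years = 200, [1, 2, 3]

df = pd.DataFrame({
    "firm": np.repeat(np.arange(n_firms), len(years)),
    "year": np.tile(years, n_firms),
})
firm_effect = rng.normal(0, 0.05, n_firms)  # persistent unobserved profitability
df["female_ceo"] = np.repeat(rng.binomial(1, 0.3, n_firms), len(years))
df["roa"] = (0.05 + 0.047 * df["female_ceo"]
             + firm_effect[df["firm"].to_numpy()]
             + rng.normal(0, 0.05, len(df)))

# Pooled OLS: year dummies control for the time dimension, and standard
# errors are clustered by firm because observations repeat within firms.
clustered = smf.ols("roa ~ female_ceo + C(year)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["firm"]})
naive = smf.ols("roa ~ female_ceo + C(year)", data=df).fit()
print(clustered.bse["female_ceo"], naive.bse["female_ceo"])
```

Because the firm effect persists across years, the naive standard error understates the uncertainty; the clustered one is noticeably larger even though the coefficient is the same.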
I am just going to give an overview here; at the end of the video I will explain why I think these kinds of relationships matter and when they would make sense. But quite often we have more ROA observations. So we don't have ROA from just time 2 and time 3; we have it from time 1 as well. So we can add those predictors, and we probably have CEO gender at each time point too. And if we say that ROA persists over time, then we would typically also model that CEO gender persists over time as well. So we can check whether it is CEO gender that causes ROA or whether it is actually ROA that causes CEO gender. This is a very common model for panel data, typically estimated using structural equation modeling techniques, and it is called the cross-lagged panel model. The idea is that it is cross-lagged: it's lagged because the lagged dependent variable explains its future values, and it's crossed because ROA affects gender and gender affects ROA, so these paths cross in the diagram. But then again, if we have these unobserved causes of ROA, how realistic is it to assume that they are not correlated over time? Not very realistic. If we leave the correlation out, the model would be identified, because, using econometric techniques, ROA and CEO gender from the first time point could be used as instrumental variables for the later values. But if we think that the unobserved causes correlate over time, then we would also logically have to add those arrows to the model. So whatever causes ROA at time 1, if it persists over time, also influences ROA at time 2, so the error terms at time 1 and time 2 would be correlated. And that is an endogeneity problem; we really can't deal with it using just these data. Then we can also consider another case: if we have firm-level effects, that is something that we would model with, for example, a GLS random effects model.
But how realistic is it that the firm-level effects that affect ROA at time 3 and time 2 don't affect time 1? So we can let the firm-level effects also influence ROA at the first time point, and that gives us the correlated random effects model. This kind of brings together many of the different modeling approaches that I've discussed before in the context of multilevel models. So we can have lagged dependent variables and permanent firm effects. This causes some complexity in the analysis if you want to do it using econometric techniques, but with structural equation modeling on wide-form data, the model is identified if you have three time points or more, and that wouldn't be a problem. What if we have autocorrelation in the error terms? Then again we would have problems, because how would we deal with that endogeneity? So this is the most general case: we have error terms that autocorrelate, we have firm-level effects, and we have ROA, our dependent variable, affecting itself over time. The firm-level effects can be dealt with; whether that's easy or not depends a bit on what else you include in the model, but it's pretty much always doable. The real question is how we deal with the endogeneity problem. We are saying both that the unobserved causes, the error terms u, correlate over time, and that ROA, our dependent variable, affects itself over time. So how do we know whether the correlation between consecutive ROA values is due to this causal effect or due to this correlation between unmodeled causes? We really don't, because the first-period ROA and CEO gender can no longer be used as instruments for this model: they too correlate with the error terms. In practice, we need to make a decision. And the decision that we make is to assume that one of these paths, for example the autocorrelation path, is zero, and that is sufficient to identify the model. So when would we make such an assumption?
It depends on the nature of your dependent variable. If your dependent variable is something that accumulates over time, like for example assets, then assets this year are simply whatever we had last year, plus or minus the change. In that case modeling this ROA-to-ROA path, the lagged dependent variable affecting the current value, would make sense and would be more important than the autocorrelation between error terms. But if the variable is something that is realized over and over again, the picture changes. Sales, for example, tend to correlate over time, but not because past sales cause current sales; rather, the determinants of sales in a particular year correlate with the determinants of sales in the previous year. For that kind of variable we would use the correlated error terms and leave the effect of the lagged dependent variable out of the model. So this is the trade-off that we have to make. Of course, it's possible to use external instruments, so we have instrumental variables and instrument the first-period values here, but finding those instruments, particularly if you want to find them outside the model, would be tricky. There are some techniques that rely heavily on instrumental variables, but the simplest techniques don't use instruments and simply leave out either the lagged effect or the error autocorrelation. This kind of modeling framework that builds up from a simple model to more complex models is, of course, not my invention. Here is a recent example from Organizational Research Methods, a series of two articles that go through a similar model-building approach. They start with a simple model, consider what complexities we need to add in certain scenarios, make the model more complex, and then discuss the estimation techniques.
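The reason this choice has to be substantive rather than statistical can be illustrated with a small simulation (numbers invented): a true lagged effect and pure error autocorrelation generate essentially the same autocorrelation in the observed series, so the observed persistence alone cannot tell the two stories apart.

```python
import numpy as np

rng = np.random.default_rng(4)
n, rho = 100_000, 0.6

def ar_series(rng, n, rho):
    """AR(1) recursion y_t = rho * y_{t-1} + e_t."""
    y = np.zeros(n)
    e = rng.normal(0, 1, n)
    for t in range(1, n):
        y[t] = rho * y[t - 1] + e[t]
    return y

# Story (a): true state dependence -- past y causes current y,
# like assets that accumulate over time.
y_state = ar_series(rng, n, rho)
# Story (b): no causal lag -- y = u, where only the unobserved cause u
# autocorrelates, like sales driven by persistent market conditions.
y_error = ar_series(rng, n, rho)

def lag1(y):
    return np.corrcoef(y[:-1], y[1:])[0, 1]

print(lag1(y_state), lag1(y_error))  # both close to rho = 0.6
```

The two data-generating stories are observationally equivalent here, which is exactly why the decision to zero out one path must come from what the dependent variable is, not from the data.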
This set of articles is fairly advanced, but it is a good starting point, or a good reference point, for the literature on estimating these dynamic panel models that can have effects over time. So it is also a good starting point for trying to understand how these models are specified and estimated, and why.