Hi, this is Justin Essary, and this is Week 9 of PolySci 509, the linear model. Today we're going to talk about panel and time series data. Panel and time series data is a really interesting field of study for a lot of reasons. One of those reasons is that a lot of the data you encounter in political science is of panel or time series form. All we mean by that is that it's either one unit, like a country or state or person, observed at multiple points in time, or it's multiple units, so multiple countries, multiple people, observed at multiple points in time. This is a form we often find political science data in, so it's a highly relevant area of study. It's also a highly problematic area of study in the sense that panel data presents a lot of special methodological challenges to be overcome. That's why the lecture is, as you can see here, entitled "Whatever You're Doing With Panel Data, You're Doing It Wrong." There are lots of pitfalls, lots of potential issues one can encounter when studying panel or time series data sets, and there's no way I could hope to tackle all of them in one short lecture. So what I'm going to do in this lecture is familiarize you with some of the most basic issues of panel data, which will hopefully get you ready for the basic analysis of panel data that you're inevitably going to encounter in political science, and encourage you to be curious enough about panel data to take a future class in it, some kind of panel or longitudinal data analysis class. A couple of quick notes. Sometimes panel data is called TSCS data. That's shorthand for time series cross-section, which just indicates that the data set is cross-sectional in the sense that we have multiple units, but that those multiple units are observed over multiple times, hence time series cross-section. As I hope you'll learn today, there are lots of problems that can exist in TSCS data sets, the consequences of these problems can be severe (sometimes they're not, but often they are), and diagnosing whether the problems even exist can be a challenge. So I'm going to get you started on the analysis of panel data by talking about four problems in particular. The first is spatial correlation of the errors. That's a case where the errors of a particular unit are correlated with each other across time. So, for example, if we have country-year data, all of Germany's error components might be bigger than everyone else's. The second is contemporaneous correlation of the errors. That's a case where, across all the units at a single point in time, the errors share some kind of common component. So, for example, in 1975 the errors of all the countries are positive, or all negative, for some reason. We'll talk more about why both of these cases can happen in just a second. I'm also going to talk about autocorrelation of the errors. Autocorrelation of the errors is a case where what happens in a particular unit at time t is influenced by what happened at time t minus 1.
So for example, if the stock market has a really big boom in one year, that might cause, due to random chance I should say, a boom in the next year, for various reasons that propagate through the error components. We'll talk about why that happens and what you can do about it. And finally, the last problem I want to talk about is unit heterogeneity. That's just the simple idea that different units, different cross-sections, are different from each other. Germany and Italy and the United States, if we're studying their conflict behavior, all have different characteristics. Some of them we can model explicitly, but others we can't, and we're going to figure out what the consequences of that are and what to do about it. In each case, I'm going to present a problem, talk about why it's a problem for inference, and then advance some kind of model-based solution. So this should be interesting. We're going to start off with spatial correlation, and it's probably going to be helpful to start by talking about what time series cross-sectional data looks like, because that sets up the problems we're going to find in it. TSCS data looks like N units, which can be countries, people, or any other unit of observation, observed over T time periods. Those time periods can be years, months, or any other interval; there are some IR data sets that even go down to the day. So the data set looks like N units observed over T times. And it might be the case that sigma squared, the degree of noise in the data generating process, is different for different units. This is probably best illustrated with an applied example. Suppose we're modeling a state's GDP, gross domestic product, per year, so a measure of its economic output, and we've got its GDP at time t being a function of a bunch of covariates x at time t, plus some error. Now, it might be the case that some countries' economies are more stable than others. We might expect, for example, communist countries to have lower variation in economic output than capitalist countries, because communist countries have more regulated economies that result in lower variation, slower growth over longer periods of time. More capitalist countries might have more volatility: periods of really fast growth, then recessions of slower, even negative, growth. This is another way of saying that some countries have a larger variance in their error term than others. What that does is break the homoscedasticity assumption that we used as part of the CLRM. You might recall that we assumed the variance-covariance matrix of the errors, the expectation of u u transpose, equals sigma squared times the identity matrix. That's the usual simplification we used in deriving the variance-covariance matrix of beta hat in the classical linear normal regression model, or even the CLRM. With unit-specific variances, that's no longer true, so all of our proofs pertaining to hypothesis testing in the CLNRM no longer work.
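Just to put that side by side in symbols, using the notation from the earlier CLNRM lectures, here is the assumption that breaks and what spatial correlation of the errors (in the lecturer's sense of unit-specific error variances) puts in its place:

```latex
% Homoscedasticity as assumed in the CLRM/CLNRM:
E[\mathbf{u}\mathbf{u}'] = \sigma^2 I_{NT}

% What unit-specific error variances imply instead
% (no single \sigma^2 factors out of the VCV of the errors):
\operatorname{Var}(u_{it}) = \sigma_i^2, \qquad \sigma_i^2 \neq \sigma_j^2 \ \text{for some units } i \neq j
```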
So the VCV of beta hat, the expectation of (beta hat minus beta)(beta hat minus beta) transpose, is no longer equal to sigma squared times (X transpose X) inverse, which is the usual formula for the VCV. Therefore any statistics that rely on that result are going to be inaccurate: the t tests, the F tests, all of these things that rely on that result are going to be incorrect. However, beta hat is still consistent, because the probability limit, as T goes to infinity, of 1 over the square root of T times the sum from t equals 1 to capital T of u hat t is still 0. In other words, the variance in our estimates still declines with increasing numbers of observations, particularly the number of time observations. So this problem gets vanishingly small as T gets larger and larger, precisely because the variance of beta hat shrinks anyway, and the fact that the variance of beta hat is inaccurately estimated becomes less and less important. Let's recap. The CLNRM (and this applies to the CLRM as well) has these nice test statistics that fall out of the asymptotic, or even small sample, distribution of beta hat. But that distribution depends on a lot of assumptions, one of them being that u is homoscedastic: u has a constant variance. Spatial correlation of the errors breaks that by making u correlated with other values of u for the same unit. So a really large u for Germany, in the GDP example we had earlier, in some year would also go along with a large u for Germany in the next year, because Germany's error variance is always higher. If that's the case, if some countries have larger variance in their error terms than others, then we can no longer use the test statistics that are supported by homoscedasticity in u. One thing worth noting before I move on (this all still falls under the consequences bit): there is a difference between differences in the mean level of the DV, the mean for country K not being equal to the mean for country J for two units K and J, and the variance of uK at time t not being equal to the variance of uJ at time t for those two units. Lots of people get these two things confused. The first one says, going back to our GDP example, that maybe Germany always grows faster than the Netherlands or France. That's a difference in the mean growth rate, which is relevant, but it comes under unit heterogeneity, which we're going to talk about later. What we're talking about now is not that. We're talking about systematic differences in the variance, the u component being too large for certain units. There are some subtle ways in which those two ideas can be connected, but the underlying structure, as I understand it, is that spatial correlation of the errors has to do with error values being correlated with each other and, in particular, with certain units having larger or smaller error variance than other units. There are some special cases in which one could patch up the problems of unit heterogeneity by treating them as though they're problems of spatial correlation, but those are special cases. They won't always work. In fact, they won't often work.
And we'll talk a little bit about that in a second when we do some work in R, and also when we talk about unit heterogeneity. The next problem we're going to discuss is contemporaneous correlation. Contemporaneous correlation is closely related in concept to spatial correlation of the errors, except that instead of errors being correlated inside of units (for example, all of the u's for Germany, or for the United States, being similar to one another across a time series cross-section data set), all of the errors at a specific time are correlated with each other across the units. So think about a time series cross-sectional data set looking a little like this: here's a big old data set, and let's say we've got units in the rows and time periods in the columns, and each one of these squares contains an observation. This here is the observation for unit 1 at time 1, this is the observation for unit 1 at time 2, and so on. Spatial correlation says that the errors are correlated inside of units, so all of the errors inside this row are correlated with each other. That's spatial correlation of the errors. Contemporaneous correlation says that errors are correlated across units at a specific time, hence contemporaneous. So thinking about our GDP example from the previous discussion, instead of having Germany's errors correlated across time at different observations, we'd instead have all of a particular year's errors correlated with each other. That makes perfect sense in the GDP example, because sometimes the world economy does really well and sometimes it does poorly. When the world economy does well, we should expect multiple states to outperform their projected GDP according to the fundamentals, so to speak, and when the world economy is bad, we expect multiple states to underperform their GDP as projected on the fundamentals. And we expect that to be true across the world at that time. So contemporaneous correlation is very similar to spatial correlation, except that it happens across units at a single time instead of within units across time. And the consequences, conveniently enough, are essentially the same as those of spatial correlation, which is to say that the VCV of beta hat is no longer equal to sigma squared times (X transpose X) inverse. That's no longer true, because that result depends on the expectation of u u transpose being equal to sigma squared times the identity matrix, which it isn't anymore. When the errors are correlated with each other, whether within units or across units at a single time, that simplification fails. Ergo, I can't jump to that step in the various CLNRM proofs that we saw a couple weeks ago, and our standard VCVs and all of the hypothesis tests that draw on that result are out the window. Now, the good thing about spatial and contemporaneous correlation is that because they're so similar, and because they also bear similarity to the generic heteroscedasticity problem that we saw earlier, we are going to be able to solve them both at the same time, using a technique similar to the White, or Huber-White, heteroscedasticity consistent standard error procedure that we talked about, I think, one or two lectures ago. So let's just get to it. How do we solve it?
We employ a variant of White's robust SEs, so let's do it: panel corrected standard errors. Panel corrected standard errors are an application of the Huber-White heteroscedasticity consistent VCV matrix, but making a different assumption about what that VCV matrix is going to look like. You might remember that the VCV of beta hat, which is the expectation of (beta hat minus beta)(beta hat minus beta) transpose, is equal to the following quantity; this comes out of the CLNRM proofs we did earlier, and I'm just skipping to a later point in those proofs so that we don't have to recapitulate it all: (X transpose X) inverse X transpose U U transpose X (X transpose X) inverse. That is the full sandwich formula for the VCV of beta hat. Now, if we were using the CLNRM assumptions, U U transpose would reduce to sigma squared times the identity, and so we would get (X transpose X) inverse X transpose sigma squared I X (X transpose X) inverse. The sigma squared comes out front because it's a scalar, and the identity times any matrix is just that matrix, so what we get is (X transpose X) inverse X transpose X (X transpose X) inverse times sigma squared. The middle cancels, and we're left with sigma squared (X transpose X) inverse. Done and done. That's where that all comes from; I guess I actually did end up recapitulating the proof. Oh well, free extra content. It's video, who cares? Anyway, now we can't make that assumption anymore. Instead, we're going to assume (I'm going to grab another color here) that U U transpose takes a form omega, and the form we think omega takes determines the form of heteroscedasticity we're facing. In the sort of garden variety, vanilla heteroscedasticity framework, we assume omega has the following format: sigma squared 1, sigma squared 2, and so on down to sigma squared n on the diagonal, a different variance for every observation, but with no correlation across observations. In other words, omega is a diagonal matrix of variances, one per observation, and the errors across observations are not correlated at all. That's the standard framework for heteroscedasticity, and that's what implies the Huber-White robust standard errors and all the other variants we talked about a while back. We estimate that using something like omega hat equals the diagonal of u hat u hat transpose. That's effectively where Huber-White standard errors come from. Instead, now we're going to construct the standard errors in a different way. I'm going to use the gray here to indicate the new framework; oh, that's red, not gray, here we go. Spatial and contemporaneous (or temporal) correlation of the errors implies a different structure for omega, something that looks a little more like this. Imagine we've got NT observations in both the rows and the columns, so this is an NT by NT matrix, where NT is the total number of observations, the number of units times the number of time periods. And each one of these sigmas, 1, 2, 3, all the way up to capital T of them, is an N by N block. So each one of these sigma t's is a capital N by capital N matrix corresponding to the observations at time t.
So each one of these sigma matrices is an N by N matrix that looks like this: omega 1 squared, omega 1 2, dot dot dot, omega 1 N across the first row; omega 2 1, omega 2 squared, dot dot dot, omega 2 N across the second; and so on all the way down to omega N 1, omega N 2, dot dot dot, omega N squared in the last row. Each one of these little omega elements is a variance or a covariance for observations at that particular time. So if this is sigma sub t, the matrix for time t, each omega i j is the covariance between u i and u j at time t, and each omega i squared is the variance of u i at time t. In other words, each sigma matrix is a variance-covariance matrix of the errors for that particular time period. The observations in this grand big omega matrix are ordered by time: all of the first period's observations are listed first, then the second period's, then the third's, and so on down to the last period. All of the off-diagonal blocks are just matrices of zeros. So what this structure is telling you is that the errors are allowed to be correlated across units at a particular time, and each unit is allowed its own variance. Why do we know that? Well, going through the properties of this variance-covariance matrix (I'm going to switch back to blue so it's a little more visible): one, all N units have unique variances, omega i squared; these are the diagonal elements of sigma, so we allow units to have their own error variances, as in spatial correlation. Two, the error for unit i at time t will be correlated with the error for unit j at time t; this is the term I've written as omega i j, which you can think of as omega i times omega j. Three, the error for unit i at time t will not be correlated with the errors for other units at other times; those are the blocks of zeros. You might get a little more insight into what's going on here by looking at how we estimate these quantities. What I've abbreviated as omega i squared, we're going to estimate by averaging, over all capital T observations from t equals 1 to T, the squared estimated error for unit i at time t. That's going to be our estimate omega i hat squared, the variance of u i hat. In other words, to figure out what a unit's error variance, the spatially correlated piece, is, just take all of the residuals across all the times for that one unit and take their variance. And what I've called omega i j, which could also be written omega i times omega j, is equal to 1 over T times the sum from t equals 1 to capital T of u i t hat times u j t hat, the covariance between u i and u j. So what this says is that at a particular time period, the errors for two different units can have a covariance, and we're going to estimate what that covariance is by, for every pair of units (every pair of countries, say, or every pair of people), summing up the products of their residuals across all the time periods.
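To collect that notation in one place, here is the structure just described from the board, written out, with u hat denoting OLS residuals. Note that because the same omegas appear in every block, the time-period blocks are all identical:

```latex
% Block-diagonal error VCV, observations ordered by time period:
\Omega \;=\;
\begin{pmatrix}
\Sigma_1 & 0 & \cdots & 0\\
0 & \Sigma_2 & \cdots & 0\\
\vdots & & \ddots & \vdots\\
0 & 0 & \cdots & \Sigma_T
\end{pmatrix}_{NT \times NT},
\qquad
\Sigma_t \;=\;
\begin{pmatrix}
\omega_1^2 & \omega_{12} & \cdots & \omega_{1N}\\
\omega_{21} & \omega_2^2 & \cdots & \omega_{2N}\\
\vdots & & \ddots & \vdots\\
\omega_{N1} & \omega_{N2} & \cdots & \omega_N^2
\end{pmatrix}_{N \times N}

% Estimates built from OLS residuals, averaging over time:
\hat{\omega}_i^2 = \frac{1}{T}\sum_{t=1}^{T}\hat{u}_{it}^2,
\qquad
\hat{\omega}_{ij} = \frac{1}{T}\sum_{t=1}^{T}\hat{u}_{it}\,\hat{u}_{jt}
```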
So in other words (and this might be enlightening), what we're not going to estimate at all is the covariance between unit i's error and unit j's error when unit i is measured at one time, say t, and unit j is measured at some other time, say t minus 1 or t plus 1. We're not going to allow a unit's error at a particular time to be correlated with some other unit's error at a different time; we are still imposing that level of structure. But what we are going to do is say that at any particular time, the errors can be correlated across units, and that each unit carries its own error variance across all time periods. So Germany can have a higher error variance than Austria at all times, and Germany's error can be correlated with Austria's error, but only inside a particular time. That's what we're doing in the panel corrected standard errors framework. I should also note (I'm going to move this down a bit) that I can write this estimation all at once in matrix form. Let E be a capital T by N matrix of residuals, with times in the rows and units in the columns, so the element in row t and column i is u hat i t, unit i's residual at time t: the first row is the time-1 residuals for units 1 through N, the second row is the time-2 residuals for units 1 through N, and so on down to time capital T. (I initially wrote it with the unit index first and the time index second, but since the rows are times, it's cleaner to let the row index come first.) Now omega hat, which has to be N by N, is just E transpose E times 1 over T, and if you work that out in matrix form, it gives you the N by N block of variances and covariances, the sigma matrix we wrote down before. This estimate is consistent, not unbiased, which is to say that we technically need an infinitely sized sample for these estimated variances and covariances to be accurate. And in particular, it's worth noting that both the variances and the covariances we've calculated are computed as averages over time, averages for a unit or a pair of units over time. Going back to those estimates up here, you can see that for both the unit-level variances and the covariances between units at a particular time, we add up the observations over time. What that implies is that more N won't help with consistency; we need T to go to infinity, technically, for the consistency results to apply. This is just a way of saying that the variance-covariance estimate we just calculated is only consistent in T, very large T, not in very large N. So if you're an IR scholar and you have two years' worth of data on 200 countries, you might say to yourself, oh, I've got a really big sample, heteroscedasticity robust estimates of the variance-covariance matrix should be good to go. But if you're running panel corrected standard errors, that's not really true.
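If it helps to see this as computation rather than notation, here is a minimal by-hand sketch of that calculation in R. It assumes a balanced panel with no missing observations, a data frame sorted by unit and then by year, and entirely hypothetical names (panel_df, y, x1, x2, unit, year); it is meant to illustrate the E'E/T step and the sandwich, not to replace the packaged routine we'll use in a moment.

```r
# By-hand sketch of a panel-corrected VCV (illustrative only).
# Assumes a balanced panel in `panel_df` with columns y, x1, x2, unit, year,
# sorted by unit and then by year; all names here are hypothetical.
fit   <- lm(y ~ x1 + x2, data = panel_df)
N     <- length(unique(panel_df$unit))   # number of cross-sectional units
T_len <- length(unique(panel_df$year))   # number of time periods

E     <- matrix(resid(fit), nrow = T_len, ncol = N)  # T x N: column i = unit i's residuals over time
Sigma <- crossprod(E) / T_len                        # N x N contemporaneous VCV, E'E / T

X     <- model.matrix(fit)
Omega <- kronecker(Sigma, diag(T_len))               # NT x NT block structure (unit-major ordering)

bread <- solve(crossprod(X))                         # (X'X)^{-1}
meat  <- t(X) %*% Omega %*% X                        # X' Omega X
pcse_vcv <- bread %*% meat %*% bread                 # sandwich estimate of Var(beta hat)
sqrt(diag(pcse_vcv))                                 # panel-corrected standard errors
```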
The reason big N doesn't rescue you is that our estimates of these omega i squareds and omega i j's are really averages over time, and those averages only get more accurate as T gets bigger and bigger. So you really only want to use this in situations where you have a reasonably large T. There's no strict cutoff for how large T needs to be; sometimes 20 or 30 is given as a rule of thumb, and there are certainly some Monte Carlo results along these lines. I want to say Judson in Economics Letters has some analysis, if I recall correctly. But the point is you only want to try this when you have big T, which is to say many time periods, not just when you have big N. Big N is fine, but big N won't help you attain the consistency you need for these results to make sense. So now I want to show you how to estimate panel corrected standard errors in R. I've given you a data set, lecture9.dta, which is in Stata's .dta format. First of all, this data is state level information on a variety of characteristics, such as police spending per capita, welfare spending per capita, population, murder rates, all kinds of interesting things. What I'm going to do is create a few variables in this data set, restrict myself to the years between 1977 and 2001, and exclude the District of Columbia to focus on the states. So I do that right there. And now if I do a View of this data (that's View with a capital V), here's what the data set looks like: a bunch of different state level observations on a bunch of different characteristics. You can see there's white population, black population, other population (I always liked that variable, other population), total population, murder rates, rape rates, robbery rates, these are all per 100,000 people, larceny rates, motor vehicle theft, all sorts of different things. What I'm going to do is model the murder rate as a function of police spending per capita, welfare spending per capita, education spending per capita, log population, and log per capita income; I'm creating all these spending variables to be per capita. Then I run a regression and summarize it, like so. One variable worth mentioning is the Christian adherence variable, the percentage of the population that are adherents of the Christian religion. So what I'm testing here is the hypothesis that the murder rate per 100,000 is a function of Christian adherence, police spending per capita, education spending per capita, log per capita income, and log population, which are all variables that, for various theoretical reasons, we might think are related to the murder rate. And lo and behold, they are all strongly statistically significant. Christian adherence does tend to be negatively correlated with murder rates: more Christian adherents as a percentage of the population tends to be associated, I should say, with a lower murder rate. Now, there are many potential problems with this model, but one of them is that we have many state observations observed over many different times. So we might want to correct for the fact that, for example, murder rates go up and down with fluctuations over time.
It might be that some years have particularly low or high murder rates, net of the fundamentals, the variables in this data set. Some states may also have higher or lower murder rates than others, and they may have higher variance in their murder rates than others. Big states like New York probably have higher murder rates, but they probably also have higher variance in murder rates, because there's just so much dynamism in that population compared to a more rural state. So what we're going to do is estimate a panel corrected standard error adjusted VCV matrix for this model and see how it changes our inference. First, I bind all of my variables into a new data set and strip all the NAs out of it. I do this because the pcse package is not especially good at omitting NAs automatically. Then I run the model on these NA-omitted variables using the data.na matrix, and then I calculate panel corrected standard errors using this specially created, NA-omitted data matrix. The panel corrected standard error package is just called pcse; you can see I'm calling it after loading the pcse library, which comes from the CRAN website (I actually installed it automatically through RStudio). I calculate the PCSEs by feeding the pcse function the linear model object I created up here, and telling it what the group structure looks like. In this case the groups, the panels, are states, so I'm feeding it data.na$stfips, which is a numerical code that corresponds uniquely to each state in the United States; there are 50 of these codes. And the time group is year, data.na$year; there's a year variable in this data set. Now I can do a summary of the model using the panel corrected standard errors, and I can compare it to a summary of the model with the regular, vanilla standard errors. Let me expand this up so we can compare them. You can see the betas are all the same. That's encouraging, and that's as expected: panel corrected standard errors, as we discussed, just like Huber-White standard errors, make no adjustment to the betas. They don't fix bias, because bias is not the problem; they're fixes to the standard errors. And what you can see is that, in general, the panel corrected standard errors are bigger than the regular, vanilla standard errors, which again makes sense, because typically heteroscedasticity is especially problematic when standard errors are too small and need to be bigger, and in general Huber-White heteroscedasticity robust standard errors tend to inflate those standard errors back to where they need to be. We expect the same sort of behavior from panel corrected standard errors. So as you can see, all these standard errors are in fact bigger, but the p values are still all very small. We don't actually change any of our statistical significance decisions on the basis of this analysis, but we do have greater uncertainty in these coefficient estimates as a result of controlling for spatial and temporal correlation. So that probably wasn't a terrible thing to do.
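To make that sequence concrete, here is a compact sketch of the workflow just walked through, using the pcse package. The object and variable names (dat, data.na, murder, stfips, year, and so on) are approximations of what's on screen, so treat them as illustrative rather than exact:

```r
# Sketch of the pcse workflow described above (variable names approximate).
library(pcse)

# Keep just the model variables plus the unit and time identifiers, and drop NAs,
# since pcse() is not good at handling missing data on its own.
vars    <- c("murder", "pctchristian", "police_pc", "edspend_pc",
             "lnincome", "lnpop", "stfips", "year")
data.na <- na.omit(dat[, vars])   # `dat` is the state-year data loaded from lecture9.dta

# OLS on the NA-omitted data.
fit <- lm(murder ~ pctchristian + police_pc + edspend_pc + lnincome + lnpop,
          data = data.na)

# Panel corrected standard errors: groups (panels) are states, time periods are years.
fit.pcse <- pcse(fit, groupN = data.na$stfips, groupT = data.na$year)

summary(fit.pcse)  # coefficients with panel-corrected SEs
summary(fit)       # compare with the vanilla OLS SEs
```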
Now, it's fair to say many analysts would stop right here. In fact, Neil Beck often likes to joke that PCSEs are often mistaken for a panacea for all the potential problems one might find in a panel or time series cross-section analysis, and that is not so. They are a useful tool, but they're designed to correct for the specific problems that we just laid out. They are not really designed, at least not in their original implementation, to correct for two other problems we might find in panel data, specifically autocorrelation and unit heterogeneity. So what I want to do now is talk a little about those two problems and ways we might try to attack them. So what is autocorrelation? Autocorrelation is a case where, again, you have some kind of time series cross-section data, and at a certain time t the error term u t is correlated with u at time t minus 1, u at time t minus 2, and so on. The number of periods over which these errors are correlated with each other is the order of the correlation. There are actually multiple forms of autocorrelation that are possible; I'm going to talk about one common form. It's by no means the only form of autocorrelation, but it's one you often encounter: autoregressive error with one lag, the so-called AR(1) error. AR(1) correlation looks like this: y t is a function of gamma times y t minus 1, plus x beta, plus u t. And if y t minus 1 is omitted, the model ends up looking like y t equals x beta plus u t, where u t is a combination of rho times the previous error, u t minus 1, and some kind of truly random noise term. These two models are essentially equivalent in the sense that both of them involve correlation of the current observation with the observation in the immediately preceding time period. The difference is whether they attempt to explicitly model that correlation as a relationship with the lagged dependent variable, or whether they relegate any correlation between the past and the present to the error term and then try to capture that correlation specifically in the error term. In the case where it's relegated to the error term, rho is a correlation coefficient, which means it varies between negative 1 and 1. There are many different kinds of autocorrelated error structures; AR is just one of them. There are also so-called moving average error structures, which we're not going to discuss but are another common form, there are autoregressive moving average or ARMA structures, and there are many others. We can save those for your longitudinal or panel data class. It's also very possible that the degree of autocorrelation goes deeper than one lag. There are so-called AR(2) or AR(3) models, where today's error term is correlated with the error term two or three periods ago, depending on exactly how our data are structured, maybe two years ago, maybe seven time periods ago. Again, I'm just noting that that's possible, and we're going to leave the specific details of how to deal with it for a panel or longitudinal class. The consequences of neglecting autocorrelation are an inefficient estimate of the VCV, for the exact reason that spatial and contemporaneous correlation were a problem: the errors are no longer homoscedastic, without homoscedastic errors we can't invoke the CLNRM proofs, and all of our hypothesis tests are messed up. On the good side, beta hat is still consistent, so we don't necessarily have a bias problem. Nevertheless, because we like to have reasonably accurate standard errors and maybe conduct hypothesis tests, we want to fix this.
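To write out the two equivalent AR(1) setups just described, this is simply the board notation in symbols:

```latex
% Lagged dependent variable formulation:
y_t = \gamma\, y_{t-1} + \mathbf{x}_t\boldsymbol{\beta} + u_t

% Equivalent formulation with the dynamics pushed into the error term:
y_t = \mathbf{x}_t\boldsymbol{\beta} + u_t, \qquad
u_t = \rho\, u_{t-1} + e_t, \qquad -1 < \rho < 1
```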
And there are many ways to think about fixing this; I'm going to discuss a couple of them in the next section. One way of thinking about correcting for autocorrelation is our dear friend the lagged dependent variable model, which we talked about earlier in the class as a biased, but consistent, and very often used model. And you probably guessed I was thinking about this from the way I wrote the AR(1) model as potentially this kind of lagged dependent variable model. So why don't we just explicitly put the lagged dependent variable into the model? Then this gamma is explicitly the degree of autocorrelation, which is just to say it's the extent to which y t minus 1 is associated with y t. That's a perfectly reasonable thing to do, and it's an attempt to explicitly capture the structure of the autocorrelation in the model itself. Another way is to try to build an omega matrix in the VCV estimation that accounts for the degree of autocorrelation. Sometimes this is described as the two general ways of dealing with a problem statistically: one is to take it on directly as a modeling challenge, and the other is to patch up whatever complication the problem creates in the standard errors. Because this is not really a longitudinal class, I don't want to belabor exactly how this AR(1) correction would be made in the panel corrected standard error context. What I want to do is just show you, in R and also in Stata, how one might think about making these corrections. You can see I've opened up Stata 11.2 here, and I've got a do file using the same data set I was just using, lecture9.dta. I'm going to run through a bunch of really basic models, and really I'm going to skip straight down to the panel corrected standard error model with an AR(1) correction. Most of the earlier stuff is the same kind of preparation we did in R, for example removing missing data and restricting ourselves to a specific time period. This last bit is a panel corrected standard error model, xtpcse, with AR(1) correlation that's common across all the panels. As you can see, the betas I've just estimated are actually identical to the betas I estimated from my previous R model; 36.34 is the intercept. OK, here we go. After fixing a couple of minor data hiccups in the data set, you can now see that my standard panel corrected standard error results, which are just OLS with panel corrected standard errors as estimated in Stata, are exactly the same as the panel corrected standard error estimates I got in R. For example, the constant is 36.34 with a PCSE of 8.68: 8.68, 36.34, exactly the same answer. But Stata can go a bit further than R, in the sense that it can also include a correction for AR(1) correlation inside these models. So what I've done is run the same xtpcse command, which is the Stata command to run a regression with cross-sectional time series data and panel corrected standard errors, but I've added correlation(ar1) to indicate that I want to correct for AR(1) correlation in the errors as well as spatial and contemporaneous correlation. And you can see that the betas do change here. That makes sense, because in practice what we're doing is a version of putting in a lagged dependent variable; it's not literally what we're doing here.
We're sort of trying to fix it on the back end by changing the error structure of the regression. But the betas do change a bit. And you can see that now, whereas before all of our variables were statistically significant, many are not: police spending per capita and education spending are not statistically significant, and log per capita income is only statistically significant at the one-tailed 0.05 level. However, Christian adherence is still a strong negative influence on murder rates, and log population is still a strong positive influence on murder rates. The police spending per capita coefficient in particular goes way down, possibly because police spending per capita is very closely tied to the previous year's spending, so there might be a bit of a multicollinearity issue here. You can also see that rho, the degree of autocorrelation, is very large, which means that, generally speaking, murder rates track each other very closely across years. There's another version of this correction we can estimate in Stata, called psar1, which is panel specific AR(1) correlation. Here, each unit's errors are still correlated across time, but the degree of correlation, the rho, is different for different states. So for example, for the first state rho is 0.95, but for the next state it's only 0.80; Stata estimates these rhos independently using the T observations for each state. And here we get somewhat different answers yet again. Log per capita income becomes more statistically significant again, but generally speaking it's kind of heartening that we keep getting the same kind of results over and over. Christian adherence remains negative: a 1 percentage point increase in Christian adherents is associated with about a 0.05 decline in murders per 100,000 population, a very small but statistically significant decline. Log population and log per capita income are stronger, though in one case less certain, influences on the murder rate; a one unit increase in log population is associated, I should say, with about a 1.5 to 1.6 per 100,000 increase in the murder rate. Instead of modeling this with some kind of indirect correction on the errors, we could also try to estimate a model that has a lagged DV directly, and that's what I've done down here. This model.lag is a lagged model using the plm package. Actually, I think I'm going to change the model type to pooling. Pooling means all we're doing is running a lagged dependent variable model with no panel correction to the errors at all. So I'm going to run this model.lag using the plm package. Oh, it's "pooling", not "pooled"; you can see some of the other options here, within, random, between, and we'll use some of these in a bit. Oh, data.plm not found; I need to create it first, yes, it's up here. I need to create the same data as before, but recast as a panel linear model data set, with the index being stfips as the panel variable and year as the time variable. So I'm basically recreating this data set as a panel data frame. Then I run this model.lag, and if I do a summary of the model, you can see what I've done is the exact same model I ran before except adding a lagged dependent variable, which, as you can see, washes out a lot of the statistical significance of the other explanatory variables in this model. Almost all of them become statistically insignificant other than log population.
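Here is a compact sketch of that plm setup, again with the object and variable names approximated from what's on screen (data.na, stfips, year, and so on), and with the lagged dependent variable supplied by plm's panel-aware lag:

```r
# Lagged dependent variable model estimated as a pooled panel regression (sketch).
library(plm)

# data.na is the NA-omitted state-year data built earlier.
# Recast it as a panel data frame: stfips is the unit index, year is the time index.
data.plm <- pdata.frame(data.na, index = c("stfips", "year"))

# "pooling" = plain OLS on the pooled data, with a panel-aware lag of the DV
# on the right-hand side; no correction to the error structure.
model.lag <- plm(murder ~ lag(murder) + pctchristian + police_pc + edspend_pc +
                   lnincome + lnpop,
                 data = data.plm, model = "pooling")

summary(model.lag)
```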
And you can see the lag has a coefficient of 0.92, which means that whatever the murder rate was in the previous year tends to carry forward: 92% of that murder rate carries forward into the next year, with the rest of the murder rate's increases or decreases being determined by these structural predictors. So that's another way of handling it. The disadvantage of this lagged dependent variable model in R is that there's no attempt to correct for spatial or temporal correlation using panel corrected standard errors the way we could do in Stata. So the last problem with panel data that we're going to discuss, at least in this lecture, although far from the last problem with panel data, is unit heterogeneity. I should note right at the start that it is, again, not addressed by PCSEs; it is not fixed by panel corrected standard errors. Unit heterogeneity exists when different panels or units have different intercepts in the data generating process. So, for example, consider the following model: y at time t is a function of the covariates at time t times beta, plus a set of unit dummy variables times alpha, plus u sub t. Here alpha sub i is the effect of being in unit i, and D is a matrix of N dummy variables corresponding to the units. If we're talking about the observations at a single time t, D is going to have N columns of dummies, one per unit (I first said a vector, but it's a matrix), and it looks like this: there's a dummy for panel 1, a dummy for panel 2, up to a dummy for panel N. Actually, if you include a constant, it'll be N minus 1 dummy variables, to avoid the so-called dummy variable trap of perfect collinearity between the dummy variables and the constant; but let's assume there's no constant in this model for the time being and put in all N dummies. Each observation at time t gets a 0 on every dummy except the one corresponding to its own unit. So if the first observation is in the first unit, it has a 1 on the first dummy and a 0 on all the other dummies, and all the rest of the observations at time t have a 0 for dummy 1, because there's only one observation for the first panel at a given time. Maybe the next observation is for panel 2, so it gets a 1 on dummy 2 and 0s elsewhere, and so on. What this effectively amounts to is the following (I'm going to steal some room from the consequences section here). Suppose we have data on two different countries, like so, and, continuing the previous example, it's something like GDP plotted against some X variable, let's say past growth. What you can see here is that this data right here corresponds to, say, the United States, and this green data here corresponds to the United Kingdom. What the picture is telling you is: look, the slopes are the same. In both countries, GDP at time t is related to GDP at time t minus 1, and a one unit increase in GDP at time t minus 1 is associated with the same slope increase in GDP at time t. But the intercepts are different.
GDP is always higher in the US compared to the UK, which makes sense for a variety of different reasons. So what we're doing is allowing these intercepts to be modeled by dummy variables corresponding to the different panels: the relationship between X and Y, the slope, is the same, but the Y intercept of each panel is different. This is about the most basic form of unit heterogeneity that anyone can imagine. There are more complicated forms, but this is a very basic one. What happens if you don't account for this form of unit heterogeneity? Well, beta hat can be biased due to omitted variable bias. We talked about omitted variable bias in a previous lecture: it occurs when an omitted variable is correlated with both a regressor you care about and the dependent variable of interest. So, continuing the example we were doing in R and Stata earlier, consider the relationship between murder rates and police spending per capita. It's probably the case that police spending per capita differs across states, and in fact it may be systematically higher over time in some states than in others for a variety of reasons: they may have larger urban populations, their citizens may have a greater preference for crime prevention, who knows. In turn, the states may also have systematically different crime rates due to the same kinds of structural and institutional factors. So if you omit the state dummy, the effect of being a specific state, then you omit a variable that's correlated with both the regressor we care about, something like police spending per capita, and the murder rate. Any time we omit such a variable, we allow a spurious correlation to exist and create the basis for omitted variable bias. (The standard errors can also be screwed up, I should say, but bias is typically thought of as the bigger threat.) So now that we know what the problem is, let's think about how we might fix it. There are two different ways of fixing it: one that is easy to implement and has historically been used more, probably because it's easier to implement, and a more complicated fix that is a little more fragile in terms of its assumptions but is coming into wider use in both economics and political science. The first model we might think about running is the so-called fixed effects model. A fixed effects model is just a fancy name for "put in dummy variables for each panel and run the regression." It literally is: y at time t equals x at time t times beta, plus a bunch of dummy variables times alpha, plus some u that we assume is part of the data generating process and that we're going to estimate. That's it. That's the model. This is the so-called fixed effects model, where the fixed effects are the coefficients on the dummy variables. Now, one thing that's important to note is that you have two choices on how to do this. Option one is to omit the constant term and put in N dummies. Option two is to include the constant and put in N minus 1 dummies, so you exclude one particular panel from the dummy variables. The reason you do this is that if you put in N dummies and the constant, the N dummy variables all added together add up to the constant.
As we learned earlier in class, that can cause a problem, in particular that the estimates are unidentified because the matrix X is not of full rank. Now, you might be thinking to yourself, God, I'd really hate to have to go through and create all these dummy variables. It turns out that in most packages, including R, the statistical package will create them for you and, in fact, will often not even report them in the regression results, because typically they're not of independent interest. They're only there to correct for the threat of omitted variable bias that we expect may exist due to unit heterogeneity. So let me show you how that's done. Here's one way of doing it. We're back in R now, with that same state level data set of 50 states and, I forget exactly how many years. The easy, or naive, I guess, way to do it is to just drop in a literal dummy variable: factor(stfips). factor(stfips) will create a factor variable corresponding to the state FIPS code and enter it as a bunch of dummies. And as you can see, I'm going to run the model with a minus 1, which is to say with no constant. If I do a summary of this model, check it out: I've got a massive model with tons and tons of dummies in it and no constant. And interestingly, here are the variables I really think I care about, and they're all still statistically significant, one-tailed at the 0.05 level, which is cool. That's nice. All but one of them, the exception being log per capita income, have a negative impact on the murder rate. So an increase in log population in this case is associated with a decline in the murder rate, which is a bit weird, telling us that perhaps this fixed effects model is actually creating some issues; we're going to talk about some of those issues in just a second, because the fixed effects model is not without its disadvantages. Police spending per capita has a negative impact on crime; that makes sense. Christian adherence still has its reliable but very small negative impact on crime. Education spending has a negative impact on crime. Log per capita income has a positive impact on crime; that's weird. So there are maybe some problems with this model that we want to think about, along with alternative approaches, not alternative ways of estimating a fixed effects model, but alternative ways of controlling for unit heterogeneity, which we'll get to in just a second. We can also do this dummy variable thing with the panel linear model, or plm, package. plm, which you can see I loaded up here, is the panel linear models library. Instead of entering that giant factor variable, what I'm going to do is just run a plm model: first I put the data in plm format, just as I did before, and then I run a plm model with what I'll call the within model. This just takes an extra argument to the plm call, which is lm with a p in front, an argument called model, and I specify the within model. The within model is a fixed effects model. If I report the estimates, you can see that my estimates here correspond exactly: 11.72 there, 11.72 here; education spending is negative 1.1 in both. It's exactly the same model, but it omits all those crazy dummies, which again are probably not of particular substantive interest, and that makes it a little easier to interpret.
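Here is a sketch of both versions of the fixed effects fit just described, one with explicit dummies via factor() and one with plm's within estimator. As before, the object and variable names are approximations of what's on screen:

```r
# Fixed effects two ways (sketch; names approximate).
library(plm)

# 1. Least squares dummy variables: drop the constant (- 1) and let factor(stfips)
#    supply one dummy per state. data.na is the NA-omitted state-year data from before.
model.fe.lsdv <- lm(murder ~ pctchristian + police_pc + edspend_pc + lnincome +
                      lnpop + factor(stfips) - 1,
                    data = data.na)

# 2. The same model via plm's "within" estimator, which sweeps out the unit means
#    instead of reporting 50 dummy coefficients.
data.plm <- pdata.frame(data.na, index = c("stfips", "year"))
model.fe.within <- plm(murder ~ pctchristian + police_pc + edspend_pc + lnincome + lnpop,
                       data = data.plm, model = "within")

summary(model.fe.within)  # slope estimates match the dummy-variable fit above
```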
And if you've got the plm package, it's pretty easy to do. The only thing you have to do first, as I mentioned before, is change your data into plm's data format. You call pdata.frame, give it the data set, and tell it the panel variable, which in this case is the state FIPS code, and the time variable, which is year. These are just arguments you can look up; row.names, I think, tells it to preserve the row names. That creates a data set in the plm format that is amenable to panel analysis; effectively, it's just a way of telling R how these data are structured as panels, different units across different times. So, as you can see in this particular example, fixed effects models are easy to estimate but not necessarily perfect; we had some weird results out of this one. Let me talk you through the advantages and disadvantages and how they might be materializing in this particular example. I'm going to put pluses next to the advantages. Plus: this model is the best linear unbiased estimator for a DGP with unit heterogeneity. That's good. Minus: this model is theoretically bereft. What I mean by that is we have resorted to a really blunt instrument to correct for omitted variable bias. We are literally dropping in the names of states as covariates. It would probably be better if we could collect more information about these states and put that in as actual meaningful variables, as opposed to the sort of "I'm controlling for Nevada by putting in Nevada" approach. So it's not the most theoretically informed model. Plus: it's easy to implement, very easy to implement; I just showed you two ways, and effectively all you're doing is dropping in a bunch of dummies. Minus: if the elements of x are slow moving over time within units, in other words if the covariates for states are basically fixed (Nevada always has about the same value for most of its variables, with a little bit of noise, but Nevada is really different from New Hampshire), then the dummy variables are going to be very highly collinear with x, the regressors we actually care about, and efficiency is harmed. In other words, when you drop in dummies and most of the variables you care about, meaning the variables that are not simply unit names, don't move very quickly inside a panel, you're going to have a big multicollinearity problem between your dummies and the things you care about, and that's going to cause the usual multicollinearity problems we've already discussed. Worse, if any elements of x are completely fixed over time within units, you must drop those elements of x. Suppose we have a bunch of states in the international system and we measure their democracy level, and their democracy level does not change over time because we have a relatively short, say five year, time period. So democracy doesn't move. If we put in fixed effects for state names, United States, Canada, and the like, the democracy level will be perfectly collinear with the state dummies, and Stata will choke, or R will choke, because your matrix is no longer of full rank. So you can't have elements of x that are constant over time within units and unit dummy variables at the same time, which is often a big problem. Finally, as the number of units N goes to infinity, beta hat is not consistent.
That's because the number of dummy variables to estimate goes to infinity as N goes to infinity. This is the so-called incidental parameters problem. The idea is that these parameters, these dummy variables, are incidental: we don't really care about them, we're putting them in as a correction. But as the number of panels you have, the number of units, goes to infinity, you get more and more of these incidental parameters to estimate, and you don't really get more data per panel with which to estimate them. So the complexity of your regression, the amount of information you're trying to extract, goes to infinity, whereas the amount of information present in the data per panel remains effectively constant. All of that can cause the variance of your estimates to get very large. And if you notice, many of these problems have to do with the variance of the estimates of beta hat on x: multicollinearity between the dummies and slow moving x's causes greater variance in estimates, not bias, but variance, which can give you crazy estimates in individual samples. That's possibly, or even probably, why we see those weird coefficients in the fixed effects model we just ran on the American states. So fixed effects are hardly a perfect solution. There is another solution that we're going to talk about today, which is also imperfect but becoming more popular in political science, and that's the random effects model. So let's wrap up our discussion by talking about random effects. The random effects estimator works in the following way. We're going to assume that the variance of u i t, the error on unit i at time t, is a combination of two sources of error, one pertaining to unit heterogeneity and one pertaining to the usual idiosyncratic error variance. So this piece is the portion of the error variance ascribable to unit heterogeneity, and this is the normal error variance. And we estimate these things; well, actually, let me define them before I talk about estimating them. The covariance between u i t and u i s, the covariance between two error terms for the same unit i at different times t and s, is equal to sigma squared nu, and this within-unit error covariance is homogeneous, so it's the same for every unit. The covariance between u i t and u j s, two different units at two different times, is 0: there's no correlation between two different units at two different times. The unit effects nu are distributed independently and identically with zero mean and variance equal to sigma squared nu. So each observation y i t is a combination of x i t beta, plus nu i, plus epsilon i t. In other words, each unit has a random effect nu sub i, and those random effects are, in truth, distributed randomly with a mean of 0 and a variance of sigma squared nu. In essence, unit heterogeneity is drawn out of a normal distribution centered on 0; here, this axis is nu, and this is f of nu. For each unit, each panel, we pick a value at random out of this distribution, and that becomes the unit heterogeneity effect. They're called random effects precisely because the unit effects share this common random distribution. And they have mean 0, so on average there is no unit heterogeneity.
Actually, let me say that a little differently: there is unit heterogeneity, but the average effect of the unit heterogeneity on y is 0. That's the best way to put it. Another way of writing this is that there is an omega matrix that's NT by NT. This omega matrix, the variance-covariance matrix of u, is, just as before, a block diagonal of sigmas, but now each sigma block is T by T, and each block corresponds to a specific unit i. The rows and columns of a block are indexed by time, t equals 1, 2, up through capital T. When you look at the error variance at a fixed time for one unit, the on-diagonal element, the variance is a combination of the usual epsilon error variance and the unit heterogeneity variance. When you look at the correlation of errors across time but within the same unit, say, how is the error for Germany in 1995 correlated with the error for Germany in 1997, the covariance between those two is just the unit heterogeneity piece, sigma nu squared. So what you've got is a variance-covariance matrix of the u's, the errors, where all the on-diagonal elements are sigma squared epsilon plus sigma squared nu and all the off-diagonal elements within a block are just sigma squared nu. This sigma block is repeated for each unit, and across units there is no correlation in the errors. In other words, we have NT observations arranged in blocks of T by T per unit; within a unit the errors are correlated with each other, but they are not correlated at all across units. Now, we can't just correct the standard errors. Recall, we have omitted variable bias just as before. What we need to do is restore the assumption that the expectation of u given x is 0; in other words, we need the errors and the regressors x to be uncorrelated. And we can do that by weighting the errors. Recall that OLS minimizes u transpose u, which is just the sum of squared errors, y minus x beta hat, transpose, times y minus x beta hat. If we were to minimize that sum of squares in the presence of unit heterogeneity, we would find that the expectation of the error given x is not equal to 0, because if we don't include some kind of control for unit heterogeneity, all of the errors for a given unit are correlated with each other, and therefore the expectation of u given x is some nonzero number. What we can do instead is minimize u transpose omega inverse u, which is y minus x beta hat, transpose, times omega inverse, times y minus x beta hat. What we're doing is reweighting, standardizing, the errors, the u's, in order to restore them to mean-zero status. That's the basis of the random effects model: we weight the errors in such a way that they have an expectation of 0 given the covariates x. And to do that, we need to derive estimates of sigma epsilon squared and sigma nu squared, because we don't know them.
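In matrix form, the block structure and the reweighted minimization look like the following, where the Kronecker product with the N-by-N identity is just compact notation for stacking one Sigma block per unit down the diagonal:

```latex
\Sigma =
\begin{pmatrix}
\sigma^2_{\varepsilon} + \sigma^2_{\nu} & \sigma^2_{\nu} & \cdots & \sigma^2_{\nu} \\
\sigma^2_{\nu} & \sigma^2_{\varepsilon} + \sigma^2_{\nu} & \cdots & \sigma^2_{\nu} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma^2_{\nu} & \sigma^2_{\nu} & \cdots & \sigma^2_{\varepsilon} + \sigma^2_{\nu}
\end{pmatrix}_{T \times T},
\qquad
\Omega = I_N \otimes \Sigma

\hat\beta_{GLS}
= \arg\min_{\beta}\,(y - X\beta)^{\top}\Omega^{-1}(y - X\beta)
= (X^{\top}\Omega^{-1}X)^{-1}X^{\top}\Omega^{-1}y
```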
So for sigma squared epsilon, we just calculate the variance of the epsilon hats: sigma hat squared epsilon equals 1 over nT minus n minus k, times the sum over all nT observations of epsilon hat squared, where we get those epsilon hats out of a fixed effects model, y equals x beta plus D alpha plus epsilon, with hats on everything once it's estimated. So what you do is run a fixed effects model, get the residuals out of it, and use those fitted residuals to estimate sigma hat squared epsilon. Similarly, sigma nu squared hat is just the variance of the alpha coefficients: roughly 1 over n, where n is the number of units, times the sum from i equals 1 to n of the squared deviations of the alpha i's around their mean. In other words, if we want to know the variance of the unit heterogeneity effects, we just run a fixed effects model, put a bunch of dummies in, get the coefficients on all those dummies, and calculate the variance of those coefficients. One thing to note is that we need a lot of n for this to be a good estimate; if we want a good sense of how much variance there is in the unit heterogeneity, we need lots of units. The process I just described is called feasible generalized least squares, or FGLS. It is a genuinely tricky thing. Just to recap: run a fixed effects model; use the error components of that fixed effects model to calculate the variance of epsilon, sigma squared hat epsilon, and the variance of the unit heterogeneity portion, sigma nu squared hat; then go back up and reconstruct omega using those estimates; then minimize the sum of squares using omega hat inverse. So we run a fixed effects model in order, eventually, to avoid running a fixed effects model. I think that's kind of cool and interesting. Incidentally, this is not the only way to run a random effects model; there are other ways, but this is one of the easiest ways I find to explain it. There are lots of advantages and disadvantages to a random effects model like this one. The biggest advantage I can see is that you can use slow-moving or fixed x. The ultimate model you end up running does not include dummy variables: you initially run a model with dummy variables to get the error components, but the final thing you do is minimize the weighted sum of squares, u hat transpose omega hat inverse u hat. I put a hat on omega because we had to estimate it. We minimize the u hats with this weighting matrix that we constructed specifically for this purpose. So no dummy variables actually end up in the final regression, which means we don't have to worry about collinearity between dummies and slow-moving or fixed x variables, like democracy from our earlier example, because there are just no dummies to be collinear with. Furthermore, if all the assumptions we just made hold, and notice I threw out a lot of assumptions about what u looks like, for example that u is the sum of a pure idiosyncratic error and a mean-zero unit heterogeneity component with variance sigma nu squared, I don't know that that's true, but if it is, if the assumptions about u are true, then this model is more efficient.
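Here is a minimal sketch of those two variance estimates, done by hand from a dummy-variable (LSDV) regression; in practice the random effects routine in plm handles all of this internally. The data frame and variable names (df, murder_rate, log_population, log_income, state) are hypothetical stand-ins.

```r
# Hypothetical state-year panel 'df'; fitting without an intercept (~ 0 + ...)
# gives one dummy coefficient per state, matching the lecture's description.
lsdv <- lm(murder_rate ~ 0 + log_population + log_income + factor(state), data = df)

e_hat <- residuals(lsdv)
N  <- length(unique(df$state))   # number of units
NT <- nrow(df)                   # total number of observations
K  <- 2                          # substantive regressors (not counting dummies)

# sigma^2_epsilon: variance of the fitted residuals, with the lecture's nT - n - k correction
sigma2_eps <- sum(e_hat^2) / (NT - N - K)

# sigma^2_nu: variance of the estimated unit effects (the dummy coefficients);
# var() uses an n - 1 denominator, which differs from the lecture's 1/n only slightly.
alpha_hat <- coef(lsdv)[grepl("^factor\\(state\\)", names(coef(lsdv)))]
sigma2_nu <- var(alpha_hat)

# These two pieces are what get plugged into Omega-hat before minimizing the
# weighted sum of squares; plm(..., model = "random") automates the whole loop.
```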
So, the advantage: if those assumptions hold, the random effects model is more efficient than a fixed effects model, which is to say its standard errors are tighter, without beta hat being biased. That leads to the major disadvantage: if the assumptions of this model are not true, then beta hat is biased. The random effects model is biased when the assumptions we made about the error are false. The critical assumption here is that x is uncorrelated with the unit heterogeneity effects nu; in other words, nu has to be independently and identically distributed, typically according to some sort of normal probability draw, independent of x. If that assumption is broken, if x is correlated with the unit heterogeneity effects, then we have a problem: the random effects betas will be biased. And we should note that, substantively, it's not terribly uncommon for unit heterogeneity effects to be related to x variables. For example, the United States has a lot of weapons; it might also be more likely to go to war simply as a result of its unique culture and history. In other words, there's a unit heterogeneity effect on the propensity to go to war, and the fact that this unit heterogeneity effect is correlated with an observable variable we care about, such as arms build-ups, is going to interfere with our ability to relate arms build-ups to war probability. A similar story can be told for many, many unit heterogeneity effects. All right, so how does one actually run one of these models? Well, I've got some examples here in R to show you. I can run a random effects model by changing the model argument in my plm call from within to random. And as you'll see when I run this model, some of the things I found earlier go back to being more sensible. For example, log population now has a positive association with murder rates, which makes sense to me. Log per capita income still has a positive association with murder rates, which doesn't make sense to me, but it's on the border of being statistically significant, so maybe that's just random noise, in other words, a high-variance estimate. You might be asking yourself: if I'm worried about bias in the random effects estimates, how do I decide whether to use a fixed effects or a random effects model? There is a test for this called the Hausman test, and it's run in R using the phtest function. The alternative hypothesis of the Hausman test is that the beta coefficients from the random effects and the fixed effects models are different from each other. Remember, the fixed effects model is unbiased either way, it's the best linear unbiased estimator; and if the random effects assumptions are true, the random effects model is also unbiased but more efficient, so the two sets of coefficients should look similar. If the random effects estimates are biased because the assumptions are not met, the beta hats from the random effects model will be substantially different from the beta hats from the fixed effects model. So what the Hausman test effectively does is compare these two sets of estimates and ask whether they're the same, using a chi-squared test. The alternative hypothesis is that the random effects model is bad. A small p-value, which you can see we get here, tells you that the coefficients from the random effects and the fixed effects models are different, and therefore that the random effects model is not preferred, QED.
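A sketch of that random effects fit and the Hausman test, continuing with the hypothetical pdat, fe, and variable names from the earlier sketch:

```r
# Random effects fit: same formula, model switched from "within" to "random".
re <- plm(murder_rate ~ log_population + log_income,
          data = pdat, model = "random")
summary(re)

# Hausman test: compares the FE and RE coefficient vectors.
# A small p-value says they differ, i.e., evidence against the random effects model.
phtest(fe, re)
```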
The problem is that the Hausman test is extremely prone to false rejections, which is to say, it tends to reject the random effects model a lot. The Hausman test is very bad at retaining the random effects model when it ought to be retained, so we get a lot of false positives. In fact, a couple of my colleagues have a fairly new paper trying to sort out when random effects models and fixed effects models are appropriate and when one should use each. Their tentative conclusion is that, among other things, the Hausman test is not good at making this decision, and that very frequently the random effects model is the best one to choose, because it's more efficient even if it's slightly biased. Because this is not a full-on panel or longitudinal class, I don't want to belabor this point too much, other than to say that both are options, and there is a procedure for deciding between them; it's far from an ideal procedure, but it is a procedure. I'll also say that we're going to learn much more about how random effects models work and when they're appropriate next week, when we talk about hierarchical linear modeling. And you can learn a great deal more about panel modeling, how to decide between fixed and random effects, and many other topics if you take a formal panel or longitudinal data class, which is offered as part of the PhD program here at Emory, and which I hope you will take. So that's it for this week. See you soon.