Baltagi has contributed to numerous fields in econometrics, including panel data models, simultaneous equation models, spatial econometrics, and prediction and specification tests, to mention just a few. He is the author of several books, including Econometric Analysis of Panel Data, which is widely cited, and is also the editor of numerous books and journal special issues. Professor Baltagi is author or co-author of more than 160 publications in leading economics and statistics journals. He is the editor of Economics Letters, editor of Empirical Economics, and associate editor of the Journal of Econometrics and Econometric Reviews. He is the replication editor of the Journal of Applied Econometrics and the series editor for Contributions to Economic Analysis. Baltagi is a fellow of the Journal of Econometrics and a recipient of the Multa and Plura Scripsit awards from Econometric Theory. He is also a fellow of Advances in Econometrics and recipient of the Distinguished Authors Award from the Journal of Applied Econometrics. He is director and founding member of the International Association for Applied Econometrics. Professor Baltagi is one of the leading scientists in our profession and a truly distinguished speaker, and we look forward very much to your talk today, Badi, on panel data forecasting. And let me just remind you, this lecture is being video recorded, so we decided to take questions after Badi has given the lecture. Please go ahead, Badi. Thank you very much, Niels, for this wonderful introduction. I just hope that I live up to it. I've enjoyed so much wonderful hospitality here in Aarhus. This is my first time in Aarhus, and the first time in Denmark. This is, of course, at the kind invitation of Timo Teräsvirta, a good friend. It's nice to see him again. We shared wonderful times in San Diego with our mutual friend, Clive Granger. Beautiful walks, good econometrics, lunches with Hal White and Allan Timmermann and Graham Elliott and others. 
So I'm very, very happy to be here. This is obviously a big deal; he told me I have to give a distinguished lecture. I'm used to being videotaped at the IMF, so I have to be careful not to say any bad words or take the mic with me to the bathroom. But I make those mistakes anyway. What I decided, since it's a general lecture, especially after the course with these wonderful students that I taught for the last two days, is to continue in that spirit of basically educating, giving you a survey of what's happening in forecasting with panels. I don't promise that it is up to date because there's a lot of stuff going on. But it is a chapter just out in the Handbook of Economic Forecasting, Volume 2B, edited by Graham Elliott and Allan Timmermann. If I understand, Allan is also a fellow of this center. It's the second volume; the first volume was with Clive. So I'm a panel data guy, and I'm going to stick my lecture to that. What I'm going to give you is a review of some of the forecasting applications in panel data. I'm going to talk about very, very simple models. I've been hammered by a lot of scientists to keep it simple. That's what I told my students for the last two days: KISS. Arnold Zellner always used to say KISS, keep it simple, and I won't tell you what the last S stands for; and Manny Parzen and many others. So I'm sure you're aware of that simplicity thing. And that's really a guideline here: very simple models that we would probably say are too simple, but the baby has to walk before it runs. So I'm going to talk about forecasting with error components, basically, that take care of heterogeneity in panels. This is the simplest model, the simplest workhorse for panels. I'll talk about it, but the students that took the course already know, or if you work in panels, you already know, that that's the simple workhorse. And then we're going to add some serial correlation structure on it of the ARMA type. 
I'm also going to add to it some spatial dependence for the cross-section type of dependence, because the ARMA type goes on the time series, and there are two dimensions here. I'm going to add systems of equations à la Zellner, equations with no endogeneity on the right-hand side; those are seemingly unrelated regressions. And I'm going to talk about some, like I said, forecasting applications, Monte Carlo studies, and future work. There's a lot of work to be done here. I know this is a center for time series, and a lot of the stuff in time series is being moved over to panels, especially with non-stationarity, unit roots and cointegration and all that stuff. So it is worthwhile seeing whether some of the stuff in forecasting can also be done. The basic advantages of panels have already been elucidated, and I'm not going to talk about those. You can read Cheng Hsiao's masterful book from the Econometric Society, or else the latest edition of my Wiley book. As I tell my students, shameless advertising; I make sure I advertise all the time. What I'm going to talk about here is the advantages of panel data forecasting rather than the advantages of panel data. A lot of applications have been done in economics, as the applied guys always point out, and fortunately I've been part of some of those applications myself. So we've looked at liquor sales across U.S. states with Griffin. There are studies on world carbon dioxide emissions using national-level panel data, in ReStat, vintage 1995, by Holtz-Eakin and Selden, and by Schmalensee, Stoker and Judson, ReStat again, 1998. I'm sure there's more now. Gasoline demand with Griffin, an energy economist, and then residential electricity and natural gas demand by Maddala and his co-authors and by myself and my French co-authors. There's also individual earnings by Chamberlain and Hirano, and there are other examples by Keane and Runkle and Das et al. on eliciting respondents' intentions or predictions of future outcomes using household survey panel data. 
There are also macro-type growth rates of OECD countries, Hoogstrate, Palm and Pfann, and then there's also cigarette sales, in ReStat back in 2002. I can go on. Actually I'll talk about some of these applications, but you get the idea. There's also the impact of uncertainty on U.K. investment authorizations using a panel of U.K. industries, Driver et al., with Urga, 2004. And then lottery ticket sales in Wisconsin using zip code data, Frees and Miller; and exchange rate determination using industrialized countries' quarterly panel data, Rapach and Wohar, JME; and then migration to Germany from 18 source countries over the period 1967 to 2001, Brücker and Siliverstovs; and inflation uncertainty using a panel of density forecasts from the Survey of Professional Forecasters, Lahiri and Liu; and annual growth rates of real gross regional products for a panel of Chinese regions, by Girardin and Kholodilin. What I'm going to do is start with the basic model, the basic model like you do in Econometrics I. Rather than talk about BLUE, best linear unbiased estimators, I'm going to talk about BLUP: best linear unbiased predictors. The naming, I think, goes back to Goldberger. Goldberger, a great teacher of econometrics at Wisconsin, whose students include Bill Greene and many others. The statisticians that worked on this stuff are basically biometricians: Henderson; Harville, 1975, 1976. It's mostly there. They're interested in animal breeding, but really the panel data people have taken this literature into economics. This is where you'll find it. You'll find it, as I tell my students, in linear models books in statistics, and the linear models don't have X beta in them. They only have these mu_i's and lambda_t's and nu_it's: fertilizers and father bull and daughter cow milk production and stuff like that. For them, the betas on the X's are the nuisance parameters. They're interested in the variance components. 
For economists, those are the nuisance parameters; we're interested in the betas. We're in the same ballgame, but we're talking different languages. They've used it to estimate genetic merits in animal breeding, as I said, and they could predict the milk production of daughter cows based on their lineage. And believe me, I've been in Texas, so these numbers matter. They hang them on the bull, and that's how you decide whether you want this bull to sire your cows to get more milk. So they really use it, and they put their money where their mouth is. There's also a nice write-up by Robinson — not Peter, it's a different Robinson — a good review of this. If you're interested in it, it's cited in the chapter, and I assume it'll be available to you. BLUP has been used to derive Kalman filters, it's been used for ore reserve estimation in kriging, it's been used to work out insurance premiums using credibility theory, to remove noise from images, and for small-area estimation. That's where you'll see a lot of the articles in JASA on small-area estimation. Harville, a great statistician, has made a lot of contributions to this literature, and he shows that the Bayesian posterior mean predictors with diffuse priors are equivalent to BLUP. In actuarial science, the problem of predicting future claims of a risk class, given past claims of that risk class, is studied by Frees et al. And Battese, Harter and Fuller — the same Fuller of Dickey-Fuller and measurement error models — have written a paper on predicting county crop areas with survey and satellite data using error component models. So although this BLUP has been widely studied in statistics and biometrics, little discussion of the subject appears in the econometrics literature. So I tried to remedy this by writing a first survey on forecasting with panels. It came from an invitation from the Bundesbank to a forecasting conference, and I believe that mine was probably one of the few panel data papers. 
So I'll show you what's been done, and like I said, it will be deriving simple BLUPs for a simple regression model that you can teach in your Econometrics I course, using AR(1), MA(1) and ARMA models, and SAR models and SMA models. If you don't know what a spatial autoregressive model or a spatial moving average model is, I'll talk about it. But I clicked on some of the videos and I saw that Bill Greene talked about some spatial models. I'll also, like I said, talk about some of the extensions and then try to tie it up to the literature on forecast combinations versus pooling methods for forecasting. And that's what I think really needs a lot of work. If I have some really good students, that's what I will send them to work on. So the panel promises you that it really takes care of heterogeneity. We are all heterogeneous, and in a cross-section you can't take care of heterogeneity. With a panel you can. It's very tough to take care of heterogeneity otherwise. They had to work with twins to make sure that we're keeping things constant: the mother's milk, the father's upbringing, the genetics. And so Ashenfelter and company looked at twins, and following twins is pretty hard. With panel data all you need is repeated observations on the same person. And as long as you have a time-invariant characteristic like race, gender, or ability — in a short time period you're not learning much; we are learning, so that may also be changing — you can say that differencing across two periods will wipe out that characteristic. And so you control for any time-invariant variables. Unfortunately there is no free lunch. You pay the price. The price is you actually cannot get the effect of a time-invariant variable that you are interested in for policy purposes. And that's normally what is the problem with these methods. Political scientists don't like it because it doesn't give answers to these problems. Labor economists don't like it because they don't... 
If you're estimating a Mincer wage equation and you want to know why females make less than males, that's the dummy on gender, and the estimate of the coefficient on that is wiped out by differencing. The same with race discrimination, and the same with... So you're trying to control for ability, but you lose the discrimination case. You can't get estimates of that. The same with gravity equations in trade. You want to know whether a common language enhances trade, but common language is common language. It's not going to change. It's wiped out. Democracy indices: all these indices that they're now putting in growth equations and development, seeing what happens if you're more democratic. Well, if you have countries with 40 years of undemocratic rule, it's not changing very much over time. Even though it's not completely wiped out, its identification is hanging by its fingernails. So we've got to be careful. So that's, I think, one of the main contributions that panels have made. You cannot control for heterogeneity in cross-sections properly, I will claim. You're always lucky if you have repeated cross-sections or a panel, because that's where you really can control for that heterogeneity, and it can be thought of as an omitted-variable bias that would bias your cross-section results. So how does that help? It should help forecasting, and it does help. I'll show you in Monte Carlos and in empirical applications how much the gains are. Normally you show how bad the cross-section results are in estimation with panels. Here I'm going to show you that with forecasting it really matters, even if it's post-sample forecasting. But I don't want to say that this is the only model out there. There are skeptics. So Pesaran and his co-authors have generated a huge literature on heterogeneous panel models. So if the time series gets large, then these are not micro anymore; micro is normally, as I said in my lectures, a tall and thin panel. 
It's a very large N for individuals and a very short T, because it's very expensive to survey the same people over and over again. And it never happens in poor countries; it only happens in rich countries. Whereas in macro panels, especially after the Penn World Tables, we now have a huge literature on countries over time, and so the T can be large. Or if you are in the stock market, you have an abundance of stock prices at literally daily frequency, so your N and T can be large. Same with marketing: when you swipe your card, they know all the purchases and repeated purchases over time. Many consumers, many time periods. So with the asymptotics, you have to be careful. It was clear before, in micro panels, that N is large and T is small, T fixed, so N tending to infinity is obvious. But when we went to macro panels, and that was the 90s, 2000s, then we got both indices growing. And some applications are in between; they're not even kosher, and I'm guilty of that. You've got 20, 30 countries and 40, 50 years, so the T could be even bigger. Most likely they could be of equal size. And with 50 countries, 50 periods, which one is going to infinity? Or is it enough for asymptotics? The asymptotics for that were developed by Peter Phillips and Roger Moon in Econometrica in 1999, telling us the joint asymptotics is not as easy as you think. Is it going along the array from the origin? Or is it sequential? It matters to the results for the estimators. I'm not going to go into that. That's another area that could come into the properties of your forecasts that hasn't been done. Okay. So I will get back to heterogeneous versus homogeneous later on, because it will matter in our... But at the end, if I don't have time — it's too much for the time allocated — let me say what's coming at the end. The combination forecast literature weighs in, I think, here, because if you're going to... 
You either can pool the data and forecast, or you can keep the data separate, especially if you believe every country is different; they shouldn't have the same equation. And I have a very long time series, my time series stuff, on every country. But this is a panel, so you should take that into account. You can take it into account by saying these are Scandinavian countries, they're correlated, they affect each other, there are common factors — all that stuff that will help in the estimation and in the forecasting. Or you can forecast them separately and then combine the forecasts with simple combination weights, like average forecasts, discounted mean forecasts, shrinkage forecasts, principal component forecast combinations, time-varying parameter forecasts, or Bayesian model averaging, à la Stock and Watson or Timmermann. All right, so I said I will talk about homogeneous versus heterogeneous later on. I am not going to do justice to the Bayesian literature, which always claims to do better, okay, and there's a huge literature on that: Zellner and his co-authors, and Koop and Potter, and Fabio Canova and Ciccarelli. So that's where the survey is deficient, and you can find more literature on that. So let's get started. The simplest model that I can put up for you is our simple linear regression model, with i the cross-section index and t the time index; you can think of them as firms, countries, households. Here's the simplest error component model to control for heterogeneity. The heterogeneity goes into the mu_i. The mu_i is unobservable. So instead of the error being basically an idiosyncratic shock, nu_it, we add the mu_i to it, okay, in economics. And that takes care of any time-invariant factor that is in the model that we can't control for, okay? So those are the error components. The statisticians call that one-way ANOVA, okay, one-way analysis of variance. I told you, they don't have X betas, okay? 
You saw the examples where, you know, the bull or, you know, a crop yield — mu_i could be a fertilizer. If you came to Texas A&M when I was in Texas, you'd see plots of land. They divide the land. They'll have flags on each plot. They're controlling for the same temperature, the same soil quality. All those variables are being controlled in an experimental sense. And now they put different fertilizers and they see how much is the yield, okay? So that's why the X's aren't there. The X's are being controlled for in a semi-experiment. That's what some labor economists are trying to do today with natural experiments and human beings. Good luck. That's a tougher, tougher example. So that's how we're going to control for heterogeneity. Obviously, in the basic panels, if mu_i is fixed, a parameter to be estimated, then it's a fixed effect. And if it is a random component with mean zero and variance sigma squared mu — so all you require, with that magical quick stroke, is one parameter rather than N parameters — you've got the random effects model. Of course, I could talk all about fixed versus random and the Hausman test and all that stuff; I can't do that here. But the idea is that for the mu_i fixed effects, the modern interpretation of this mu_i is that it's random anyway. I don't care. Jeff Wooldridge will tell you — and it goes back to Mundlak — that the mu_i's are correlated with the X's. In economics, we're obsessed with endogeneity. Mu_i, ability, is correlated with schooling. If you want the returns to schooling, sorry, you're going to have to control for that endogeneity. I'm not going to go out to experiments or twins or stuff like that; it's very hard. With a panel, what am I going to do? Essentially, I'm going to condition on the mu_i's. I'm going to wipe them out. That's what the first difference will do, but that loses the first observation. In time series, that's not a big deal. But in micro panels, that's a big deal. Because every T is precious. Every T is precious. 
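To make that wiping-out concrete, here is a minimal numerical sketch (not from the lecture; all numbers are illustrative) of a one-way error component panel in which mu_i is correlated with x. Pooled OLS, which ignores mu_i, is biased; the within (fixed effects) transformation, which demeans each individual's data and thereby wipes out mu_i, recovers the slope.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 10                      # individuals and time periods
beta = 2.0                         # true slope

mu = rng.normal(0, 1.0, size=N)                         # individual effects mu_i
x = rng.normal(0, 1, size=(N, T)) + 0.5 * mu[:, None]   # x correlated with mu_i
y = 1.0 + beta * x + mu[:, None] + rng.normal(0, 1, size=(N, T))

# Pooled OLS ignores mu_i and is biased when x is correlated with it
xc = x.ravel() - x.mean()
b_pooled = (xc @ (y.ravel() - y.mean())) / (xc @ xc)

# Within (fixed effects): demean each individual's series, wiping out mu_i
xw = (x - x.mean(axis=1, keepdims=True)).ravel()
yw = (y - y.mean(axis=1, keepdims=True)).ravel()
b_within = (xw @ yw) / (xw @ xw)

print(round(b_pooled, 2), round(b_within, 2))
```

The within estimate lands near the true slope of 2, while the pooled estimate is pushed up by the omitted mu_i.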
You paid a lot of money for it, and you're losing N observations, not one. I followed David Hendry, as I said, at the IMF, and he walks in and puts a lot of lags on the Y's until there is no serial correlation. I tell him, you can't do that in panels with a very short T. Because with every lag, you're getting rid of an N. Five lags is five N's. Then you've got a difference, that's six N's. Well, you're back to a cross-section. You've got to be careful with your lagging in a very short T situation. I'm not going to do the matrix algebra, but I'm going to show you what to expect, especially if you read this literature. This literature comes from biometrics. Like I said, I've used their notation. Economists never like it. Here alpha is the intercept, beta is the slopes. X means there's no intercept, Z means there's an intercept in the regressors. That's all it means. Then the error term: remember, the mu_i is of dimension N, which is large. N individual effects. When they are N parameters, and the asymptotics is on N, like in micro, remember, every new observation — if you're talking about asymptotics — will bring me a new mu_i. Every new individual gets his own mu_i. He's special. He's different. He's heterogeneous. Timo's not me. Timo and I, even though we like each other, are different. The mu_i is different. If it's different, it's an extra parameter. That's the incidental parameter problem in statistics: Neyman and Scott, Econometrica, in the 1940s. It's an example where the MLE under normality is not consistent for the mu_i. Fortunately for us, the inconsistency of the mu_i does not transmit into the betas, and what we are interested in is the betas. That's why we're able to estimate them by conditioning on the mu_i's. So the fixed effects approach assumes they are parameters, and if they are parameters, this is a matrix of dummies. What does the matrix of dummies look like? Well, it depends on how you sort the data. 
If you sort the data such that the slow index is i and the fast index is t — the turtle and the rabbit, as I called it in my course; the rabbit is t, the turtle is i — then when you put in the dummies for mu_1 up to mu_n, you have to add mu_1 here, mu_2 here, mu_n here. The dummies have to look like this. This is a vector of ones always, and the dimension is right there. So this is a vector of ones, and it's put next to an identity matrix of dimension N. The computer will print out your dummies if you sorted the data like that. Of course, nobody enters those dummies by hand; you create them now beautifully with any matrix language. The random effects approach says — the statistician says — there's too much loss of degrees of freedom, too many parameters to estimate. So we're going to estimate only one parameter, sigma squared mu; they're drawn randomly; by the luck of the draw God gave you this mu_1. It came from this distribution with mean 0 and variance sigma squared mu. Easy trick: with one variance for the heterogeneity and the idiosyncratic variance, your omega is going to be fancy, it's going to be block matrices — I don't have time to go through that — but up to a scalar, you only need to estimate one parameter, just like an AR(1) model or an MA(1) model. That's why I say this is a simple model, a quick model. So the omega under an error component model will be this. To show you that, actually, this can be rewritten as — this is a spectral decomposition of omega — this is the averaging matrix, this is the within matrix. If you pre-multiply the model by this matrix, you'll get the average over the whole time period. It's back to a cross-section, and that's called the between regression. This is I minus P, so this is y minus y bar, y bar averaged over t for each country, and so that's the within regression. So this is the within regression, this is the between regression. 
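The averaging and within matrices just described can be checked numerically. A small sketch (illustrative, not from the lecture), assuming the data are stacked with i slow and t fast: P = I_N ⊗ J̄_T replaces each individual's series by its time mean (the between transformation), Q = I − P demeans it (the within transformation), and the two are idempotent, orthogonal, and sum to the identity.

```python
import numpy as np

N, T = 3, 4
I_N = np.eye(N)
J_T_bar = np.ones((T, T)) / T          # averaging matrix over time

P = np.kron(I_N, J_T_bar)              # replaces each series by its time mean
Q = np.eye(N * T) - P                  # within (demeaning) transformation

assert np.allclose(P @ P, P) and np.allclose(Q @ Q, Q)   # both idempotent
assert np.allclose(P @ Q, 0)                             # orthogonal projections
assert np.allclose(P + Q, np.eye(N * T))                 # they sum to the identity

y = np.arange(N * T, dtype=float)      # stacked with i slow, t fast
print((P @ y)[:T])                     # first individual's y replaced by its mean 1.5
```

So P y stacks the y_i bar's and Q y stacks the deviations y_it − y_i bar, exactly the between and within pieces of the decomposition.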
If you forget about these sigmas and you add them up, you'll get OLS: equally weighted within and between variation. But if you weight them by their variances, you'll get the random effects. The random effects estimator is a fancy generalized least squares that takes care of the heterogeneity through the variance. The reason why it's not so popular is because it assumes the heterogeneity is uncorrelated with the X's. And I told you, in economics we're obsessed with endogeneity. So you've got to pass the Hausman test on that exogeneity. Okay, so obviously, even with fixed effects, you're not going to be able to run a regression with the dummies unless it's only 50 states or 30 countries or whatnot. With 5,000 individuals, you can't put in 4,999 dummies. You say, I can, I have a powerful computer — but you shouldn't. That matrix may give you a generalized inverse that's not good. And remember, this was developed in the 1960s in econometrics. And so these people created the within matrix, the projection P, the I minus P stuff. So what you do is you use the Frisch-Waugh-Lovell theorem to remove the dummies. Oops, to remove these dummies. This is a kind of review for some of you, I know, but I need to set it up for the predictor. These are the dummies. So you put the dummies back in here, and what fixed effects is saying is: you omitted dummies. Those dummies actually span everything that you omitted that's time-invariant. And you're not going to get those, but that's going to be more robust, even if less efficient, than pooling this model. And so when you put the dummies in there, you need to get the betas. You're interested in the betas. You're not interested in the mu's, but you need to put them in there, and you can't put them in there. So what do you do? You project on them. That's what the Frisch-Waugh-Lovell theorem says. You project on these dummies, and it turns out these dummies are ones and zeros, and it's easy to show that the projection matrix, your X(X'X)^{-1}X', is an averaging matrix. 
This J is a matrix of ones of dimension T by T. The bar means you divide by T. So it's an averaging matrix. You multiply that by a time series: you'll get the sum, and dividing by T gives the average. And you repeat that for every country. So that's the y_i bar in econometrics. And the Q matrix gives y minus y_i bar. This is where you remove the mu_i, because you subtracted the average. The mu_i lies in the projection on P, the between regression. So at the end of the day — this is what we've developed so far — you could run pooled OLS, ignoring all the heterogeneity, whether fixed or random. Or you could run the fixed effects estimator, which puts the dummies in if you can; but if you can't, you do a within regression. Or you do random effects. And that's the Aitken estimator, the best linear unbiased estimator, that you need. But this assumes a known omega, so you need to estimate that. And obviously you cannot invert an NT by NT matrix when it's 5,000 individuals and 10 years — 50,000 by 50,000 — by just giving it to the computer. The inverse of that comes beautifully from the spectral decomposition. These turn out to be the characteristic roots, and P and Q are the matrices of characteristic vectors, and they're idempotent and sum to the identity, so the answer is simple. It was given by Fuller and Battese in 1974, and their answer was: you pre-multiply by omega to the minus half and do OLS. And this omega to the minus half turns out to be so simple. This is the within, this is the between; at the end of the day, it's a simple transformation that you can do with Excel: y_it minus theta y_i bar. That's why random effects were so popular: you just run an OLS regression, just like you would run the within with y_it minus y_i bar. The within is just Q, the between is y_i bar. Really, this is where you see it's a weighted combination of both. Now, where do I get the theta? 
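Before turning to how theta is estimated, here is an illustrative sketch (not from the lecture) of the Fuller-Battese transformation itself, with the true variance components assumed known: running OLS on y_it − theta·y_i bar against x_it − theta·x_i bar reproduces the random effects GLS slope.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta = 200, 5, 1.5
s_mu, s_nu = 1.0, 0.5               # true standard deviations of mu_i and nu_it

mu = rng.normal(0, s_mu, (N, 1))
x = rng.normal(0, 1, (N, T))        # uncorrelated with mu_i, so RE is appropriate
y = beta * x + mu + rng.normal(0, s_nu, (N, T))

# GLS reduces to OLS on the theta-transformed data: y_it - theta * y_i_bar
theta = 1 - s_nu / np.sqrt(T * s_mu**2 + s_nu**2)
ys = (y - theta * y.mean(axis=1, keepdims=True)).ravel()
xs = (x - theta * x.mean(axis=1, keepdims=True)).ravel()
b_re = (xs @ ys) / (xs @ xs)
print(round(b_re, 2))
```

Note the two limits: theta = 1 gives back the within transformation, theta = 0 gives pooled OLS, which is the weighted-combination point in the lecture.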
The statisticians tell us that the best quadratic unbiased estimation — this is the best you can do on these variances — is by taking quadratic forms, because variances are quadratic in the u's, the true disturbances. Okay, with P the averaging matrix, so it's a simple calculator formula: an average, or a deviation from the average. Okay, the only problem is, even though this is a quadratic unbiased estimator, it's based on the true disturbances. We don't have the true disturbances. Okay, but remember, the statisticians did not have X beta, so they actually had the yield, they had the milk, they had the output. They didn't need any of that stuff. So if you read Graybill, or you read Shayle Searle at Cornell, in biometrics, that's what you'll see. The economists came in and said, okay, we will plug in OLS residuals, we'll plug in fixed effects residuals: Wallace and Hussain, 1969, Econometrica; Amemiya, International Economic Review. Now I want to go to the prediction. Okay, sorry I gave this quick background, but it's important, because Goldberger, Art Goldberger, in JASA in 1962 said: in this model with a fancy omega, whatever it is, the best linear unbiased predictor, the BLUP, okay, the best linear unbiased predictor, is not just extending the line with your GLS estimator; you know something about the error, and that error can feed into your forecast. So what am I doing here? I'm forecasting for the i-th country s periods ahead, or the i-th individual or firm s periods ahead. So obviously I need the regressors. That's another issue. I can plug them in. This is what all your packages are doing. All your forecasts, and all these applied guys who come in and forecast, forecast — this is exactly what they're doing. They're just extending this line here. Most of the time they're not doing the extra bit: if I know it's an AR(1), or I know it's an MA(1), and I believe that and I'm using it, then I might as well use it in my forecast. 
So the omega could be an AR(1) in a time series. The GLS residuals are the vector of residuals from GLS on the data. The w basically is the covariance between where I'm predicting and my data sample. So Goldberger derived, à la the Gauss-Markov theorem, that this is the formula in general, depending on the true omega, and that it is best: among all linear unbiased predictors, this is the best one. Now I'll show you the AR(1) in that JASA piece, if you've read it. What he showed is that for the AR(1), u_t equals rho u_{t-1} plus epsilon_t, which you're very familiar with, the baby AR(1) model, all you have to do if you're predicting one period ahead, T plus 1, is add a correction term of rho times the last u_T. So you're predicting u_{T+1} by rho u_T, basically. If you take the last residual, multiply it by rho and add it, that's better than not adding it. The problem with that is you have to estimate rho and you estimate u_T, and Baillie and Spitzer showed that once you estimate them, it's not clear that you do better than OLS in some of these models. That's another JASA article, much later. But hold on for now; let's see what it is for the error component model. For the error component model you cannot omit the individual effect. If it's Denmark, you have to have the Denmark effect, the Germany effect. But the covariance turns out to be very simple, because the only correlation across time — so far I'm not allowing correlation in the idiosyncratic terms, the AR(1) or the MA(1) or the ARMA — the only correlation across time in a simple panel is because it's a Denmark observation or a Germany observation. So the correlation, the statisticians call it equi-correlated: equi-correlated because even if it's five periods later or two periods later, it is the Germany effect, and the covariance is the same. I tell my students, the patient is not dead: if you plot the correlogram, it's beeping like this, but it's equal, it's always equal. It doesn't die out like an AR(1) or an MA(1). It's always there, an equi-correlated matrix. 
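Goldberger's AR(1) correction can be seen in a quick simulation (an illustrative sketch with the true rho known, not from the lecture): forecasting u_{T+1} by rho·u_T instead of by zero cuts the mean squared forecast error from the unconditional variance 1/(1 − rho²) down toward the innovation variance.

```python
import numpy as np

rng = np.random.default_rng(2)
rho, T, reps = 0.8, 50, 5000
mse_naive = mse_blup = 0.0
for _ in range(reps):
    u = 0.0
    for _ in range(T):                    # burn in to the stationary distribution
        u = rho * u + rng.normal()
    u_next = rho * u + rng.normal()       # the disturbance we want to forecast
    mse_naive += u_next ** 2              # forecast 0: ignore the last residual
    mse_blup += (u_next - rho * u) ** 2   # Goldberger: add rho times u_T
print(round(mse_naive / reps, 2), round(mse_blup / reps, 2))
```

With rho = 0.8 the naive MSE is near 1/(1 − 0.64) ≈ 2.78, while the corrected forecast's MSE is near the innovation variance of 1; with estimated rho the gain shrinks, which is the Baillie-Spitzer point.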
So that's what we'll do. We will actually correct with this omega and this covariance matrix that I spent time to show you, even though it should be obvious to panel data people, okay? And so with this omega — remember I need omega and I need this little omega, or w — this little w turns out to be the effect of that country. The effect of that country. So this whole term that I need to add to my predictor from a random effects regression turns out to be this term. What is this term? This term is based on the residuals for that country. That's where its effect lies. That's why it's going to give me more information for that country effect, weighted by the variance components. This was derived by Taub in the Journal of Econometrics in 1979, but it was also derived by Wansbeek and Kapteyn in a related manner, and by Lee and Griffiths — Lung-Fei Lee — in an unpublished paper, actually. So anyway, Taub's is the published one, so this is the predictor. And there are the citations. But this assumes the true values of the variance components. So if I plug in estimates of them, what's going to happen? Well, the statisticians had already taken care of that: there are inflation factors that account for the additional uncertainty introduced by estimating these variance components. Well, when I was giving an early paper on prediction with AR(1) and simple one-way error component models at Michigan State, Richard Baillie told me: no, no, I've got this paper with Spitzer that shows that even for the AR(1) model, when you estimate rho and you estimate the residual, the gains may be gone, and OLS may even do better. Well, we wrote a paper together. This paper is in a volume in honor of G.S. Maddala, Analysis of Panels and Limited Dependent Variable Models, Cambridge University Press, 1999, edited by Hsiao, Pesaran and two of Maddala's students, Lung-Fei Lee and Kajal Lahiri. And what it says is that this will not happen in panels. 
In fact, we proved it by deriving asymptotic mean squared prediction errors and doing Monte Carlos. So even though you plug in these estimates, you still get a better predictor. What did we compare in that paper? We compared basically the MLE predicting this T+S assuming normality; a truncated MLE where you don't add what Goldberger said to add; the misspecified predictor, which is the packages' OLS predictor; and the fixed effects predictor, which basically adds the dummy for that country if you've estimated it, the dummy variable estimator, if you can do that. Of course you can do this with a within regression. The within regression will give you the beta; it doesn't give you the alpha or the mu. But the alpha and the mu can be retrieved from the averaged equation: the intercept is always retrieved because the data passes through the means, and this mu_i can be retrieved from the averaging equation. This is how the packages actually retrieve those alphas. In Stata, where there shouldn't be an alpha, there is an alpha; that's that alpha. There shouldn't be one; it's the dummy variable trap. So in any case, we derived asymptotic mean squared error formulas for all four predictors, and the numerical and simulation results show that they perform adequately for realistic samples of the size of 50 countries or 500 individuals over 10 and 20 periods. Of course, there are gains in this, and the ranking was clear: the predictor using the MLE is the best, the misspecified one is the worst, and the fixed effects is actually second best, ahead of the truncated one. So taking care of the heterogeneity and doing the BLUP thing right is important, and the gains depend on the magnitude of that heterogeneity: if it's 0.9 versus 0.6, the gains are either 10-fold or 2-fold. Okay?
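The retrieval of alpha and the mu_i from the averaged equation, as described, can be sketched on tiny noiseless synthetic data, where the within estimator recovers beta exactly; the country labels and all numbers are invented:

```python
# Sketch: recovering the intercept and individual effects after a within
# (fixed effects) regression, the way the packages do it. Noiseless toy
# data so the within estimator hits beta exactly.
import numpy as np

alpha_true, beta_true = 2.0, 3.0
mu = {"DK": 1.0, "DE": -1.0}                    # mu_i sum to zero
x = {"DK": np.array([1.0, 2.0, 3.0]), "DE": np.array([2.0, 4.0, 6.0])}
y = {i: alpha_true + mu[i] + beta_true * x[i] for i in x}

# Within estimator: regress demeaned y on demeaned x
xd = np.concatenate([x[i] - x[i].mean() for i in x])
yd = np.concatenate([y[i] - y[i].mean() for i in x])
beta_w = (xd @ yd) / (xd @ xd)

# Averaged equation: ybar_i = alpha + mu_i + beta*xbar_i, sum of mu_i = 0.
xbar = np.mean([x[i].mean() for i in x])
ybar = np.mean([y[i].mean() for i in x])
alpha_hat = ybar - beta_w * xbar                # data passes through the means
mu_hat = {i: y[i].mean() - alpha_hat - beta_w * x[i].mean() for i in x}

print(round(beta_w, 6), round(alpha_hat, 6), round(mu_hat["DK"], 6))  # 3.0 2.0 1.0
```

The fixed effects forecast for country i is then alpha_hat + mu_hat[i] + beta_w times the future x.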
What if I start going to a two-way analysis of variance, as the statisticians would call it, or really, when we go into macro panels, more time-series panels, okay? Well, then you really have to have a time period effect. These are the common factors: everything that happened in those years that affects all the firms, some regulation, some recession in the economy, high unemployment and whatnot. This is going to span every firm-invariant variable that varies only with time. Is that clear? That's the positive and the negative. Positive, because if I didn't include one of those variables, the time effect takes care of it, the omitted variable bias problem. Negative, because if I want one of those variables and want to report its effect, then the fixed effects estimator doesn't give it to me, because it wipes it out. That's the two-way model. Of course, if they're both parameters, you put dummies for both; that's easy for labor economists. That's why in Stata there is no two-way model, only a one-way model: for Stata the N is large and the T is small, so you just put dummies for each year by hand. That's it; Stata didn't program it. In EViews there are both, two clicks: it has cross-section and period, and you click random-random, fixed-fixed, random-fixed, fixed-random. It's easy; I don't leave home without it in an undergraduate course. They love the click-click, as I tell my students. In a random model, all of these are random. Where would that be particularly good? Stock markets: I've got the stock effect, I've got the daily effect, and their shocks. If I knew what they were, if I had more information, I'd make a lot of money. If they're random, this should be better than doing nothing. It's a fancy covariance matrix: now I have three variances. I've got correlation across time for the same country, and correlation across countries for the same time period. How am I going to do the BLUP? I've got a fancier omega.
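That fancier two-way covariance with three variance components can be sketched with Kronecker products; tiny N and T, made-up variances, and the data stacked country by country with time running fastest:

```python
# Sketch of the two-way error component covariance: for
# u_it = mu_i + lambda_t + nu_it (stacked country by country, time fastest),
#   Omega = s2_mu * kron(I_N, J_T) + s2_lam * kron(J_N, I_T) + s2_nu * I_NT.
# N, T and the variances are invented for illustration.
import numpy as np

N, T = 2, 3
s2_mu, s2_lam, s2_nu = 4.0, 2.0, 1.0

I_N, I_T = np.eye(N), np.eye(T)
J_N, J_T = np.ones((N, N)), np.ones((T, T))

Omega = (s2_mu * np.kron(I_N, J_T)
         + s2_lam * np.kron(J_N, I_T)
         + s2_nu * np.eye(N * T))

# Same country, different times -> s2_mu; same time, different countries
# -> s2_lam; different in both -> 0 off the variance components.
print(Omega[0, 1], Omega[0, 3], Omega[0, 4])   # 4.0 2.0 0.0
```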
The variance-covariance term, surprisingly, if I'm predicting for the i-th country even five, three, two periods ahead, is still the same, because I'm not allowing the lambdas to be correlated; that could be extended to allow for it. The predictor derived would look like this: it depends on the averages of the residuals, the average over time of the residuals, the total average, and these variance components. Actually, if there's a constant in the model, this term drops out, so this looks like our old friend, except with a different omega, a fancier two-way omega. There's an extension of this to models with heteroskedasticity, and there are extensions of the Baillie and Baltagi work to the two-way model; I'm not going to talk about that because I don't have time. But how do you predict with a two-way model? Well, I showed you how to do that with the random effects model. With a fixed effects model there's a problem with time: you don't know the coefficients of future time periods. But economists are resilient, so in predicting carbon monoxide, or dioxide, emissions, my chemistry is bad, in '98, what they did is they basically put a time regression on the lambda_t. You see, it's a linear trend plus a structural break at 1970; that's what they did in their REStat piece. And they actually made it non-linear in a second version, where they put a log of time minus 1940. Although these two time effects had essentially the same goodness of fit, they give very different out-of-sample projections: the linear spline projected the time effects by continuing the estimated trend to 2050, while the non-linear version projected a flattening trend, consistent with the trend decelerations from the '50s to the '90s. This was done earlier by Holtz-Eakin and Selden in '95, same idea, but they didn't do the linear trend; they basically used, for the time effect, the value of the last year.
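The trend-extrapolation trick for the time effects can be sketched as follows; the estimated lambda_t values are invented, and a real application like the one described would also add the structural break to the trend:

```python
# Sketch: fit a linear trend to the estimated time effects lambda_t and
# extrapolate it to forecast the next period's effect. The lambda values
# are made up for illustration.
lam = [0.5, 1.0, 1.6, 2.1, 2.5]          # estimated time effects, t = 1..5
t = list(range(1, len(lam) + 1))

# OLS slope and intercept for lam_t = a + b*t
n = len(t)
tbar, lbar = sum(t) / n, sum(lam) / n
b = sum((ti - tbar) * (li - lbar) for ti, li in zip(t, lam)) / \
    sum((ti - tbar) ** 2 for ti in t)
a = lbar - b * tbar

lam_forecast = a + b * 6                  # projected time effect for t = 6
print(round(b, 4), round(lam_forecast, 4))   # 0.51 3.07
```

The Holtz-Eakin and Selden alternative mentioned next amounts to `lam_forecast = lam[-1]`, holding the last year's effect constant.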
So they assumed it constant at the last year's value; that was an assumption made to get around the fixed time effect. Now, can I add serial correlation? Because if we're going to predict in time, there are always strike effects that last for periods of time, policy interventions that may have lasting effects, oil embargoes or wars or whatever. And the answer is yes, you can do that. I had written a nice paper with Li extending the estimation of an error component model with serial correlation of the simple type, where we know the omegas. Of course you can do it for more general processes, but let's start with the simplest one, the AR(1) on the remainder disturbance. So you've got the heterogeneity mu_i, and then the nu_it is an AR(1), so you have an extra parameter; stationarity is assumed here, and here's the initial value. So the paper shows how to estimate this model and how to forecast. The forecast turns out to be the Goldberger correction term for the fancier omega, with AR(1) and with heterogeneity of the random type, and it again depends on the GLS residuals, the omega and the covariance. At the end of the day, this is what it looks like. If the heterogeneity were zero, this term would drop out, and this is Goldberger's term for one period ahead; I'm doing one period ahead, I didn't tell you that. And if rho were zero, this would turn out to be the average of the residuals across time. It doesn't look like the average of the residuals across time, because in a Prais-Winsten transformation there's a Cochrane-Orcutt part from 2 to T and a special weight for the first observation. This anchors the data; the rest are almost first differences, and they all have the same variance. So this is the Prais-Winsten, or Kadiyala, transformation; we're applying it here, and that's why the first observation gets special care. The special care turns out to be of this form in the correlation coefficient. Of course, if rho is 0, this is 1, and this is 1 plus T minus 1, which is T, so this becomes an average.
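The Prais-Winsten point, that the special first-observation weight makes all the transformed AR(1) disturbances share one variance, can be checked directly on the AR(1) covariance matrix; rho and sigma squared are illustrative:

```python
# Sketch: apply the Prais-Winsten (Kadiyala) transformation C to the AR(1)
# covariance and verify C * Omega * C' = sigma2 * I. Values are invented.
import numpy as np

rho, sigma2, T = 0.6, 1.0, 5

# Stationary AR(1) covariance: Cov(u_t, u_s) = sigma2 * rho^|t-s| / (1-rho^2)
idx = np.arange(T)
Omega = sigma2 * rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho**2)

# C: sqrt(1-rho^2) weight on the first observation, then the
# Cochrane-Orcutt quasi-differencing (-rho, 1) for t = 2..T
C = np.zeros((T, T))
C[0, 0] = np.sqrt(1 - rho**2)
for t in range(1, T):
    C[t, t - 1], C[t, t] = -rho, 1.0

V = C @ Omega @ C.T                        # should equal sigma2 * I_T
print(np.allclose(V, sigma2 * np.eye(T)))  # True
```

This is exactly why the first observation "anchors the data": without the sqrt(1 - rho^2) weight, its transformed variance would differ from the rest.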
The d-squared becomes an average too. Anyway, I'm not going to go into the details; you can read them. So these are the natural predictors if there were no heterogeneity or no correlation. For an AR(2), as long as you know the omega, or whatever time series model you've cracked, you can plug it into our paper and get your estimation and your BLUP. It's done in Stata for the AR(1), but I don't think they do the predictor correctly. So here's the AR(2): you correct, like a Cochrane-Orcutt, for the first two periods, and as usual there's the average of the residuals transformed with rho, where the first two observations get special weights and the rest are equally weighted. It sort of makes sense, or it makes sense to me. Here's the special quarterly AR(4) model of Wallis for seasonality; you can do that. That's a special omega: you go back to u_{T-3}, and then the average would look like that. So all of these are formulas to correct your predictors to get a BLUP; of course, they all depend on these estimates. For an MA(1) we've also done that, you can do it sequentially; I'm not going to go through the details, this is an Econometric Theory piece. And you can extend it to ARMA(p,q): Zinde-Walsh and, what's the other fellow at McGill, Galbraith, extended the estimation, but the forecast hasn't been extended to an ARMA-type model. If you want an application, the one by Frees and Miller on sales of lottery tickets in Wisconsin was actually with serial correlation. They had 50 zip codes in Wisconsin selling lottery tickets for 40 weeks; the first 35 weeks were used to estimate the model, and the remaining five were used to validate the model with forecasts. Using mean absolute error and mean absolute percentage error criteria, the best forecasts were given by the error component model with AR(1) disturbances, followed by the fixed effects with AR(1) disturbances. The fixed effects does well as long as you're not predicting the time effects; then you have to make strong assumptions. We can extend this to spatial correlation.
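The Frees-Miller style validation just described, estimate on the early sample and score the hold-out with MAE and MAPE, can be sketched like this; toy numbers, not the Wisconsin lottery data:

```python
# Sketch of hold-out validation with the two criteria mentioned above:
# mean absolute error and mean absolute percentage error. Actual and
# forecast values are invented.
actual   = [100.0, 110.0, 90.0, 105.0, 95.0]    # hold-out periods
forecast = [ 98.0, 112.0, 94.0, 101.0, 95.0]    # some model's forecasts

n = len(actual)
mae  = sum(abs(a - f) for a, f in zip(actual, forecast)) / n
mape = 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / n

print(round(mae, 3), round(mape, 3))   # 2.4 2.414
```

Ranking competing predictors is then just a matter of comparing these numbers across models on the same hold-out periods.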
Spatial correlation really was invented for cross-section dependence. We normally assume randomly selected people, so they should be IID, independent. But once we start looking at networks, at neighbors, then there is correlation, and that's now very popular, whether in economic theory with networks or in econometrics. I don't have to convince urban economists studying housing that neighbors, neighbors, neighbors are important; you know that the price of your house is all about neighbors and where you are. If there's a lot of crime in your neighborhood, then your price is low; if all your neighbors are rich and, being the economist, you bought the cheapest house, you're doing well. And there are spillovers: if there's crime in a one-mile area, it's going to affect you. So anyway, that's the spatial effect. The geographers draw circles, one mile, two miles, three miles, and those define my neighbors; or make it proportional to distance, or distance squared, weighted to sum to one, and make your price depend on your neighbors' prices. So there's a lot of structure in the spatial model; that's why economists are resistant to that type of model. But it's a simple model, like the time series AR(1). Come on, we don't believe it's u_t = rho u_{t-1} + epsilon_t, but it's a nice way to start, and then you can study something more sophisticated. So that's what the spatial guys do: they do u = lambda W u + epsilon. Here I'm going to sort the data differently, because I need the houses in each year. So now the mu is not time varying, and the phi is the idiosyncratic time-varying variable for the N houses. Here's the SAR, the spatial AR model. They call it AR even though it's not a lag; it's your neighbor, the disturbance of your neighbor. Your neighbor was robbed, you're going to get that shock; there was a fire, it's going to affect the price of your house. Is that clear?
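The SAR disturbance process u = lambda W u + epsilon can be sketched with a toy row-normalized contiguity matrix; the neighbor pattern and lambda are invented:

```python
# Sketch of the spatial AR (SAR) disturbance:
#   u = lambda * W * u + eps  =>  u = (I - lambda*W)^{-1} eps,
# with W a contiguity matrix: 1 if neighbors, 0 otherwise, zeros on the
# diagonal, rows normalized to sum to one. Pattern is invented.
import numpy as np

neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # four regions on a line

n = len(neighbors)
W = np.zeros((n, n))
for i, js in neighbors.items():
    for j in js:
        W[i, j] = 1.0                                # neighbor indicator
W = W / W.sum(axis=1, keepdims=True)                 # row-normalize

lam = 0.5
u = np.linalg.solve(np.eye(n) - lam * W, np.ones(n)) # common unit shock

# With row-normalized W, a common unit shock is amplified to 1/(1-lambda)
print(np.round(u, 6))   # [2. 2. 2. 2.]
```

The amplification is the spillover at work: each region's disturbance loads on its neighbors' disturbances, which load back in turn.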
The parameter is lambda, just to keep it different from the rho of an AR(1), and W is a spatial weight matrix. Like I said, I don't have problems with the geographers doing GPS mapping, your distances, your coordinates, seeing how far you are from one another. Trade economists use it all the time, distance capital to capital, to see the effect on trade, but they use it as a regressor; once you start putting in a W matrix, they scream bloody murder. That's the one-mile radius, the two-mile radius, the three-mile radius: you're either my neighbor, 1 or 0, whether you lie there or not, or by the distance measure, or some get more sophisticated, commuting distances and whatnot. I don't have a clock, so somebody has to tell me how I'm doing. [From the audience: one hour has passed; you have 5 to 10 minutes left.] Okay, well, you should have said something, sorry about that. I went through the easy stuff and I still have a hundred slides. See, this is what I told them: I'll teach you all day if I can still keep standing. But I'm going to go a little quickly here, just to give you a view of the forest. You can read the chapter; it's there, you bought the book, the Elliott and Timmermann volume, it's in your library, so you can get a copy of the chapter. It gets fancier: the covariance matrix now looks like the old panel thing here, but it also depends on this spatial weight matrix. The weight matrix, like I said, relates your error to your neighbor's error; the W is well specified, it's distances or ones and zeros. And you put zeros on the diagonal because you don't want to relate your own disturbance to your own disturbance; you want to relate it to your neighbors' disturbances. So in any case, it's not as pretty as the simple error component model, but the degree of computation becomes less, because it's not NT by NT, it's really N by N. So the problem that the cross-section people have in the computation of these MLEs, like Luc Anselin, a very nice place to start, an old book now, '88: the same N by N
dimension is still the same in panels. So this I did with Dong Li, one of my students, who is at Kansas State, and what we did was allow for the spatial AR model in estimating cigarette demand and liquor demand in the United States, comparing doing nothing, doing fixed effects with spatial, and doing random effects with spatial. We have also done this for the spatial moving average model, which makes sense to the time series people, although there are no lags here. So what we derive is the predictor for SAR or SMA with heterogeneity, and obviously it will depend on the heterogeneity components, the theta that we saw before, and the elements of these W matrices that define your neighbors and give them weights. Does that help? Yes, it helped, in the cigarettes and in the liquor. Of course this is an application, so you don't know what the true model is, but given that you specified the model correctly: we had the 46 contiguous states, '63 to '92, we removed Alaska and Hawaii, and we define spatial neighbors just geographically, so you could do more sophisticated stuff. The predictor comes out: random effects is the best, fixed effects is second best, and doing nothing is the worst; not doing spatial is worse, and doing only heterogeneity is worse. Of course, we applied the Diebold-Mariano test here for the pairwise forecast comparisons, but more sophisticated measures could be used. This was also applied by Longhi and Nijkamp to some West German regional labor markets, and ignoring spatial interactions they found suboptimal forecasts. I really have to speed up; I'm going to skip that one, skip that one, I'm going to skip a lot of stuff if it's only five minutes, and I apologize for that. This was all empirical applications. Does it work in Monte Carlo? It does, but of course it depends on your design; it's limited by that. The same results happen when we set up a model with heterogeneity and spatial effects.
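The pairwise Diebold-Mariano comparison mentioned here can be sketched as follows, with no autocovariance correction (adequate for one-step-ahead forecasts) and made-up forecast errors:

```python
# Sketch of a pairwise Diebold-Mariano test on squared-error loss:
# d_t is the loss differential and DM = mean(d) / sqrt(var(d)/n).
# No autocovariance correction; the forecast errors are invented.
e1 = [0.5, -0.2, 0.3, -0.4, 0.1, 0.6, -0.3, 0.2]   # model 1 errors
e2 = [0.9, -0.6, 0.7, -0.8, 0.5, 1.0, -0.7, 0.6]   # model 2 errors

d = [a ** 2 - b ** 2 for a, b in zip(e1, e2)]      # loss differentials
n = len(d)
dbar = sum(d) / n
var_d = sum((x - dbar) ** 2 for x in d) / (n - 1)  # sample variance
dm = dbar / (var_d / n) ** 0.5
print(round(dm, 2))   # large negative: model 1's losses are smaller
```

Under the null of equal forecast accuracy, DM is roughly standard normal, so a value far from zero favors one model decisively.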
Even specifying just the spatial or just the heterogeneity and taking care of it is better than not doing anything. All right. Seemingly unrelated regressions just extend this to multiple equations; I did that with Alain Pirotte, my French co-author. We used GMM methods and MLE methods, and if you want to see an application on hedonics, it's with Bresson in the Journal of Urban Economics, on housing in Paris. We didn't forecast there, but we estimated and looked at the hedonics of housing. Okay, I'm clicking away. Homogeneous versus heterogeneous: I wish I had more time to talk about that, because that's really a big issue, to pool or not to pool; as I told my students, Shakespeare was an econometrician. It's very important to study whether you want to pool the data or not, and it's never going to be settled. Any time you group observations, which countries you put together is very important. Clive used to say you don't put Cyprus with India, but I'm sure some guy would say, no, for this purpose I want to do it sometimes. In any case, the idea is that you can test whether these countries all have the same slope coefficients. They used to do that in the '90s with a Chow test, which is simple, assuming a common sigma squared. This was made fun of by Robertson and Symons, first, I think, in '92 in the Journal of Applied Econometrics, and later by Pesaran and Smith in a dynamic model, and that's why they stick to heterogeneous models. So you can stick to the individual estimates or the pooled estimates, and Maddala is somewhere in the middle. He said, look, you do the test, and you do the individual estimates, which are all over the place. We showed that in our gasoline study: you get price elasticities that are positive. You can try to get better data, a better model; you're still going to get high variance across what you think are even similar countries, OECD countries, which should respond to gasoline prices in the same way. Whereas the pooled model always gives you nice, reasonable estimates you can take to policy makers.
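The Chow-type poolability F test the speaker refers to can be sketched like this; the residual sums of squares and the panel dimensions are invented:

```python
# Sketch of a poolability F test: compare the restricted RSS from the
# pooled regression with the sum of the N separate regressions' RSS.
# All numbers below are made up for illustration.
rrss = 120.0            # pooled (restricted) residual sum of squares
urss = 80.0             # sum of the N separate regressions' RSS
N, T, K = 10, 20, 3     # countries, periods, regressors incl. intercept

num_df = (N - 1) * K            # restrictions: equal coefficients
den_df = N * (T - K)            # unrestricted degrees of freedom
F = ((rrss - urss) / num_df) / (urss / den_df)
print(round(F, 3))              # 3.148
```

Compared against the F(num_df, den_df) critical value, a statistic this large rejects poolability, which is exactly the statistician-versus-economist tension described next.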
But when you do the test, even the proper test, you reject. So as a statistician you should not pool, okay? But as an economist you can't give these estimates. So you can actually combine them in some Bayesian way, right? Maddala said shrink the individual estimates: if you want an estimate for California, take the national estimate and shrink toward it, and the shrinkage factors could come from Bayesian ideas or from the F statistic for testing the restriction. I promised to go faster, okay, I know you're looking at the clock here. So we did that in a lot of horse races. I took our gasoline data and ran a horse race. I said, I don't know what the true model is; this is a model for gasoline, and of course you can criticize the model. Let's look at the heterogeneous models, the homogeneous models, the Bayesian models, and let's forecast: take away some years and see how each does one year ahead, five years ahead, ten years ahead. And if you look at that REStat piece and the Journal of Econometrics piece, one on gasoline and one on cigarettes, you'll see that the simple pooled estimators do well in forecasting. In a horse race like that, this happened; it happened in other applications I don't have time to go through, Driver and Urga, and it happened in macro applications. The article that I wanted to talk about was Rapach and Wohar. But to look forward, and maybe I can finish with that, we really should look at the pooling-of-forecasts literature versus the panel data forecasting literature and try to do better things in panels. Why do you combine forecasts? Well, I'm not going to go through what Timmermann has already written, but it's important to do that, and maybe there's something we can do with the heterogeneous forecasts, much better; it's not been done, especially with structural breaks and other stuff. Read the Rapach and Wohar article; it has some of this stuff for a monetary model of exchange rates, and it talks about how some of the estimates were not plausible.
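The shrinkage idea, pulling the noisy individual estimate toward the precise pooled one, can be sketched with a simple precision-based weight; this particular weighting is an illustrative stand-in for the Bayesian or F-statistic-based factors mentioned, and all numbers are made up:

```python
# Sketch of Maddala-style shrinkage:
#   shrunk = w * individual + (1 - w) * pooled,
# here with a precision (inverse-variance) weight as a stand-in for the
# Bayesian or F-based shrinkage factors. All values are invented.
beta_individual = -1.20      # e.g. California price elasticity, noisy
beta_pooled     = -0.50      # national pooled estimate, precise
var_individual  = 0.09       # sampling variances of the two estimates
var_pooled      = 0.01

w = (1 / var_individual) / (1 / var_individual + 1 / var_pooled)
beta_shrunk = w * beta_individual + (1 - w) * beta_pooled
print(round(w, 3), round(beta_shrunk, 3))   # 0.1 -0.57
```

Because the individual estimate is nine times noisier here, it gets only a tenth of the weight, so the combined elasticity sits close to the pooled value.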
They were not plausible when heterogeneous, and plausible when homogeneous, and they performed out-of-sample forecasting for a panel versus country by country at one, four, eight, twelve and sixteen steps ahead; they basically follow our lead and discuss this in a similar way. Okay. There's more recent literature on diagnostics for these: the Theil's U, the Diebold-Mariano, the S test by Westerlund and Basher. And, okay, I should have timed myself better. There's also a more sophisticated model, one in the JBES by Pesaran, Schuermann and Smith, forecasting 134 economic and financial variables for 26 regions made up of 33 countries, covering about 90% of world output. This is a global VAR model over a quarterly period. Building on forecast combination, the effects of model and estimation uncertainty on forecast outcomes are examined by pooling forecasts obtained from different GVAR models estimated over alternative sample periods. Given the heterogeneity of the economies considered, as well as the likely multiple structural breaks, averaging across both models and windows makes a significant difference; that's the idea. There's pooling, grouping and averaging of forecasts here, and using a panel version of Diebold-Mariano they conclude that a double-averaged GVAR forecast performed better than the benchmark competitors, especially for output, inflation and real equity prices. I'll skip disaggregate forecasting versus aggregate forecasting; I'm not going to go through that, I'll let you read it. And I think I should stop here, because I don't want to make my host unhappy. Okay, thank you very much.