So let's start this session with the first keynote speaker of the day, Siem Jan Koopman. Siem Jan is a professor at the Free University of Amsterdam and one of the world's leading econometricians, specialized in the field of time series and forecasting.

Okay, thank you very much, and also thank you for having me here at the ECB. I have been coming here for many years and it is always a pleasure to come back. My presentation is about panel models, dynamic panel models in the sense that I include dynamic stochastic components in the panel model. It is joint work with a team from the VU in Amsterdam, so my colleagues over there, but also people from Maastricht and from Lies. We are building a line of work around the idea of how to apply panel models in financial econometrics and in climate econometrics, so we have all kinds of nice applications, and I will show you two of them.

So, the dynamic panel data structure: you are probably familiar with the general setup, and I will not move away very much from it, so I think we can keep it relatively basic. Also, the methods that I am going to use are just regression and maximum likelihood, so in that sense it is all very applied material. The only thing I am going to change is the inclusion of the lagged dependent variable in the dynamic panel data model. In the middle of the slide you can see the standard dynamic panel data model with the lagged dependent variable included, with lambda as the autoregressive coefficient. That part I am going to take out, and I am going to replace it with a stochastically evolving dynamic component, so that the dynamics in the panel are still addressed, but not directly through the lagged dependent variable; instead, through a dynamic component in the model.

In a two-way fixed effects model, at the bottom of the slide, you have both a constant for the cross-sectional dimension, c_i, and a constant for the time dimension, d_t. In my case the d_t is not fixed: I allow it to be a stochastic, time-varying component. So although you could do both, keep the lagged dependent variable in and keep the d_t in, I am basically going to remove both of them and replace them by this stochastic process. That allows me to leave the standard GMM approach for estimating dynamic panel data models and use maximum likelihood estimation instead. I am following the approach in the book of Pesaran, chapters 26 and 27; there are many, many more chapters, but these two discuss the alternative of using maximum likelihood for estimating panel data models and dynamic panel data models. That triggered me to look at this more carefully and to see that there are, also in the panel data world, other ways of estimating these types of models. So I am going to use what he calls the transformed maximum likelihood: you transform the data and then you can rely on standard regression-type methods. That is the approach I am going to take this morning.
I will start with the basic panel regression model, and there you already see that I have replaced the d_t by a mu_t at the bottom. That mu_t is the dynamic component, and c_i is still the fixed coefficient for the cross-sectional fixed effects, so these two are both there. The focus is then really on the estimation of the beta in the regression part. In most of the examples I am going to show, x_it is just a scalar, but as you can imagine, you can easily generalize that to a vector of x_it, with beta a row vector and x_it a column vector. You can do all these things, but to keep it a bit simple I take only one x_it and then show you how we estimate the beta.

Yes, the lagged dependent variables are now removed and all the dynamics are put into this mu_t, and that mu_t then needs to be modeled explicitly, and that can be an AR process. That is basically alluding to the lagged dependent variable that you would put into the dynamic panel data model, while instead I have this mu_t and I let mu_t follow an AR(1) process. So the autoregressive dynamics are still part of the model, not directly but implicitly, through this unobserved component. That is basically a stochastic trend. And once you start doing that, you can generalize from an AR(1) to a random walk or to any other type of dynamic process; you can include seasonality in this way, you can include other types of business-cycle dynamics if needed. It can be fairly flexible. But again, I will restrict myself a bit and only focus on the mu_t, and only have it as an AR(1) or a random walk process. For many of us, an AR(1) versus a random walk is a huge world of difference: one is stationary, one is non-stationary. But in the methodology I am going to use it does not really matter. I can easily deal with a non-stationary component, or with multiple non-stationary components, or with a mix of non-stationary and stationary, because in the end I am going to use the Kalman filter to do the necessary transformation such that the prediction errors are all stationary, and from the prediction errors I get the likelihood function. In that sense it is all covered. That is the idea, and that is the definition of mu_t: mu_t is this stochastic process.

Well, it is often nicer to present a panel model in a regression-type formulation, and that is what I am doing here. I write the same model as on the previous slide, with this mu_t in it, but I hide the mu_t in the error term. So you can see I now have an error term, and instead of epsilon_it I call it u_it, and u_it is basically the collection of all the stochastic terms in the model. On the right-hand side of the first equation you see all the fixed coefficients, the c_i and the beta, the pooled beta in this case, and then u_it contains all the stochastic terms, both the dynamic and the i.i.d. parts. This is what I call the static formulation of the dynamic panel data model with stochastic trends. And then many of the standard panel data methods can be applied: between estimation, within estimation, differencing, that can all be done. Of course, you need to allow for the special structure that is imposed on the error term u_it.
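In equations, the setup just described is roughly the following (an illustrative rendering; the slide notation may differ in details):

```latex
% Basic dynamic panel data model with a common stochastic component mu_t,
% replacing the lagged dependent variable and the fixed time effect d_t
\begin{aligned}
y_{it} &= c_i + \beta\, x_{it} + \mu_t + \varepsilon_{it},
  & \varepsilon_{it} &\sim \text{i.i.d. } N(0,\sigma^2_\varepsilon),
  \quad i=1,\dots,N,\; t=1,\dots,T, \\
\mu_{t+1} &= \phi\, \mu_t + \eta_t,
  & \eta_t &\sim \text{i.i.d. } N(0,\sigma^2_\eta)
  \quad \text{(AR(1); } \phi = 1 \text{ gives the random walk case)}, \\
% the "static" formulation: hide mu_t in the error term
y_{it} &= c_i + \beta\, x_{it} + u_{it},
  & u_{it} &= \mu_t + \varepsilon_{it}.
\end{aligned}
```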
That special error structure can be dealt with in a way that I will explain later in the presentation. Also in first differences, for example: if you take first differences to remove the cross-sectional fixed effects, you cannot then straightforwardly run a regression, because you still need to allow for the autoregressive moving average structure in delta u_it. Even if u_it is an AR or an ARMA process, delta u_it is still an ARMA process, so you still need to deal with that. It is not the case that after differencing you can just do straightforward least squares; you still have to allow for the dynamics in delta u_it. But I will come back to that, and to when I can use between and within estimation, depending on the assumptions I make about x_it. If x_it is uncorrelated with the error, then I can do a random effects model, and if they are correlated, then I do a fixed effects model. That does not change these techniques; they can still be applied, of course allowing for the ARMA errors in the regression.

Yeah, so as I said, this is very much what Pesaran calls the transformed maximum likelihood method. It was developed earlier than his book, and he refers to the papers in which this type of method was developed. In the cross-sectional literature this transformation approach was developed, and also in the time series literature, in the work of Harvey and in the PhD thesis of Pablo Marshall, there are dynamic panel data models that use techniques like the Kalman filter to handle the dynamics. So in that sense it is all basically refreshing material that is around in the literature, and I will come back to more links with the literature later on.

Our approach provides estimates of the time-varying effects mu_t. In most of these techniques, in the techniques of Pesaran, the mu_t is either differenced out or integrated out, and that is where our paths depart from each other, because I still think the mu_t can be very interesting. I will have illustrations where I show that the mu_t really has intrinsic interest. So I am not going to integrate out the mu_t and get rid of it; I am really interested in it and I am going to estimate it, from the DPD model. You can also see that there is a link with dynamic factor models. For example, in this setting, if I take out the regression part, the beta x_it, then what is left over is very much like a dynamic factor model; that is the second equation on the slide, just to show you that I am not oblivious to the link with dynamic factor models. But in these dynamic panel data models the focus is much more on the estimation of the regression part, and that is also where I am focusing today. Of course, I can then also put a loading in front of the mu_t; in some applications of the dynamic panel data model it is advantageous to put a gamma_i, a loading coefficient, in front of the dynamic process. In a heterogeneous panel, where the dynamics have a different impact in different parts of the cross-section, it is definitely advisable to put a loading gamma_i in front of the mu_t.
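Leaving out the regression part, the link with a one-factor dynamic factor model, with loadings gamma_i added, reads roughly as follows (again my own rendering, not the exact slide):

```latex
\begin{aligned}
y_{it} &= c_i + \gamma_i\, \mu_t + \varepsilon_{it},
  \qquad \mu_{t+1} = \phi\, \mu_t + \eta_t, \\
&\text{homogeneous panel: } \gamma_i = 1 \text{ for all } i
  \text{ (one common trend, no loadings).}
\end{aligned}
```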
But for some applications with a homogeneous panel you can just ignore the loadings and impose the same trend on all of them. So I see the link, but here we are going to focus on that first equation, where we are also interested in estimating the regression part, and how to deal with that regression part is the main development in this presentation. These bottom remarks are basically saying that I am aware of all these links with dynamic factor models, and that we can also have multiple stochastic trends, but I am not going to focus on that; they are either implicit or explicit.

Another extension is that instead of only having a dynamic process for the intercept, saying that the intercept varies over time, the pooled beta can also vary over time. The regressor can have a different impact in different parts of the cross-section, for some defined groups, or even each individual cross-sectional series can have its own beta, its own b_i, but the regression effect can also change over time. So it is the same division between the two ways of allowing for heterogeneity, in the cross-section and over time, and that is what I include in the regression part. We have the beta_t and the b_i, in the same way as for the intercepts, so we generalize that to both, and the idea is to extract both the b_i and the beta_t, while still allowing for the fixed effects like the c_i, the b_i, et cetera, and to disentangle these two estimation processes. This is then basically a dynamic panel model with time-varying effects.

It always depends a bit on where the dimensions lie: is the dimension large in N or large in T? In macro nowadays I think N and T are more or less of equal size in many applications that I have seen so far, but you also have the truly panel settings with large N and small T, and with these methods we try to tackle them all. Of course, the inference and the theoretical implications can be different, but the methodology does not need to change very much, so the estimation and the inference follow in the same way. If N is really huge and it is not feasible to estimate all the coefficients over the cross-sectional dimension, all the c_i's and the b_i's, it is simply too much, then you can move them into the stochastic part: you basically say that c_i is a stochastic term with a certain mean c and a certain variance sigma-squared c, and the same for b_i. If there are too many, you pool them, in a sense, by allowing a distribution, so it goes into the stochastic part of the model, and again the methodology can handle such cases, as I will show you in a moment. So all these different approaches of random coefficients and fixed coefficients are really the same thing. There are always some issues about identification if you go two ways, both in the cross-section and in the time dimension, but that has more to do with initial conditions: the initial value for the trend in the intercept and the trend in the slope coefficient need to be set equal to each other, or you set one to zero and estimate the other. So there are always some nitty-gritty details of how to do the identification, but along the way you will see how to tackle that.
So this is basically the overall model that I am trying to tackle today, with a single explanatory variable just for convenience, to not make the notation too complicated. Everything you see on the right-hand side is a scalar for the moment, but as I said, you can generalize these things. You can represent this whole thing as a multiple regression model, and that is what I do in the middle: you can see y_t = X_t delta + u_t, where X_t now contains all the intercept dummies, the ones for each cross-section, and the x_it arranged diagonally to match each cross-sectional unit, and delta has all the fixed coefficients, the c's and the b's, in it. So delta can potentially be a large vector, and again we can tackle that via regression, or via within estimation or between estimation, that sort of thing. And then u_t, at the bottom, has the structure where you collect all the stochastic terms. u_t is now a vector, by the way, an N-by-1 vector, and mu_t is a scalar, so that is why there is a vector of ones of dimension N in front of it. Then x_t beta_t: x_t is also a column vector, as you can see on the right-hand side, an N-by-1 column vector, and beta_t is the stochastic part of the regression coefficient. The funny thing here, and I will come back to it in a moment, is that x_it has an impact on the location, the regression part, but also on the second moment, the variance part, because x_it enters both through the fixed cross-sectional coefficient and through the stochastic time-varying coefficient. That is why x_t appears both in the mean and in the variance.
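Stacked over the cross-section, the overall model just described can be written roughly as follows (my notation for the stacking; the slide may arrange it slightly differently):

```latex
\begin{aligned}
y_{it} &= c_i + \mu_t + (b_i + \beta_t)\, x_{it} + \varepsilon_{it}, \\
y_t &= X_t\, \delta + u_t,
  \qquad u_t = \iota_N\, \mu_t + x_t\, \beta_t + \varepsilon_t, \\
&\text{with } y_t,\, x_t,\, \varepsilon_t \text{ the } N\times 1
  \text{ vectors for period } t,\;
  \iota_N \text{ an } N\times 1 \text{ vector of ones}, \\
&X_t = \bigl(\, I_N \;\; \mathrm{diag}(x_{1t},\dots,x_{Nt}) \,\bigr),
  \qquad \delta = (c_1,\dots,c_N,\, b_1,\dots,b_N)'.
\end{aligned}
```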
So the next step is to sort out what exactly this multiple regression model is. On the surface it is all very simple, but it is about the u_t. I have already shown you the composition of u_t, and then there is the covariance structure of u_t. If you stack everything, as you can see in the bottom line of this slide, you want to stack it all so that you can do GLS, generalized least squares, but then I need to know what this Omega matrix is and what its structure is. That Omega matrix is implied by the definition of u_t, and u_t also has dynamics in it, so it is not only the contemporaneous variance that I need to allow for, but also all the covariances across different time points. So it is what we call a Toeplitz-type matrix, with all these bands coming from the lag properties of u_t. u_t is an error, an unobserved thing, so to say, I am fully aware of that, but I can work out what the covariance structure is given the definition of u_t on the previous slide: the x_t are exogenous or semi-exogenous, epsilon_t is stochastic with its properties, beta_t has its dynamic properties, so I can work out all the variances and covariances from this specification, and that makes up the Omega matrix. It is, well, an animal: if you really want to work it out, it is a big beast, so you do not really want to write out exactly what its composition is, and you do not have to, as I will show on the next slide. But if I have that bottom equation, then I can relax: it is just GLS, and if I know how to tackle this Omega, then I can use any package that does GLS. It is basically a translation exercise of separating out the fixed part and the stochastic part, and then working out the stochastic part.

Just to show the benefit: as soon as I have my dynamic model in this formulation, what you typically do for GLS is a transformation, a Cholesky decomposition of that Omega. I know, an animal, not nice, I do not want to see it, but in principle I can do this Cholesky decomposition, and once I have it, I transform the left-hand side and the right-hand side variables. I call that small v. In the second part of the slide, the final equations, the v and the capital V are defined as the Cholesky-transformed data: the small v_y is the transformed data on the left-hand side, the observations that you have after the Cholesky transformation, and the same for the X's, as you see in the first equation; you also transform those. So you pre-multiply the left-hand side and the right-hand side by this L-inverse and get the small v, the capital V, and L-inverse times u, which I call v_u. The nice thing about this v_u is that it is i.i.d.; it is a big vector, N times T, but in the end, after the Cholesky transformation, it is just an i.i.d. variable, and that means that after the transformation I can just apply OLS, just do OLS calculations.
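Written out, the GLS step is roughly this (the Omega below is the "animal" implied by the model, which we never construct explicitly):

```latex
\begin{aligned}
u &= (u_1',\dots,u_T')',
  \qquad \Omega = \mathrm{Var}(u) = L\,L' \;\;\text{(Cholesky factorization)}, \\
v_y &= L^{-1} y, \qquad V_X = L^{-1} X,
  \qquad v_u = L^{-1} u \;\sim\; \text{i.i.d.}, \\
\hat{\delta}_{\mathrm{GLS}}
  &= (X'\,\Omega^{-1} X)^{-1} X'\,\Omega^{-1} y
   = (V_X'\, V_X)^{-1}\, V_X'\, v_y
  \qquad \text{(OLS on the transformed data)}.
\end{aligned}
```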
So now the trick is how to get that transformation going, and that is basically what the Kalman filter does for you. You do not need to know explicitly what that Omega is; you just need this u_t, the error term in the second equation. I am going to put that into state space form, and then I can ask the Kalman filter to do the Cholesky decomposition for you. So basically I run the Kalman filter for the left-hand side variable and for every column in X, to do the transformation and get my transformed data and my transformed X, and then I do OLS. That is all there is to this transformation. So I put this error term, that second equation, into state space form; that is the idea. The error expression is represented in state space form, the second equation in the middle. It is slightly unusual notation for me: I call the transition matrix capital A here, I am not sure why I did that, but I am sure there was a reason. Z_t contains the time-varying bits, where you have the explanatory variables and the intercept ones and zeros, so Z_t is really time-varying, and the state vector alpha_t is only two-dimensional in this case: just the time-varying intercept mu_t and the time-varying regression coefficient beta_t. So it is really a 2-by-1 vector. Even though the dimension N can be huge, for the time-varying pooled part I only have a state vector of 2 by 1. That is very parsimonious, so the Kalman filter is a very light exercise; I only need to integrate two elements out of the likelihood, which is a nice thing. Notice that u_t can be a big vector, N can be potentially large in panel data, but there are ways to treat a high-dimensional observation vector, which I will come back to later. The nice thing, as I underlined a minute ago, is that the state vector has a small dimension, and that allows us to do fancy stuff. The transition matrix, the capital A matrix, is time-invariant, so it is a fixed matrix; it usually contains the AR coefficient, or some seasonal dynamics if you have those, but it is all fixed, not changing over time. It can, however, depend on some parameters: it can contain, say, a phi coefficient that you still need to estimate, so you see that the A matrix depends on a parameter vector psi, and also some of the variances may still be unknown. Those I need to estimate by maximum likelihood. So all the other stuff is done by regression, but some things I still need to do by maximum likelihood.
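In state space form, the error model just described is roughly the following (keeping the speaker's capital-A notation; the exact composition of Z_t and A is my own rendering):

```latex
\begin{aligned}
u_t &= Z_t\, \alpha_t + \varepsilon_t,
  \qquad \alpha_{t+1} = A\, \alpha_t + \eta_t,
  \qquad \alpha_t = \begin{pmatrix} \mu_t \\ \beta_t \end{pmatrix}, \\
Z_t &= \bigl(\, \iota_N \;\; x_t \,\bigr) \;\; (N \times 2),
  \qquad A = A(\psi) = \begin{pmatrix} \phi_\mu & 0 \\ 0 & \phi_\beta \end{pmatrix}
  \;\;\text{(the identity matrix for random walks)}.
\end{aligned}
```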
The expression here, I have already shown you this transformation, only now in stacked form, so it is all like econometrics 101, or maybe 102, where we put everything into big vectors and then just run OLS in multiple regression form. So we apply OLS to that bottom equation, over all the dimensions, and the L-inverse is basically the Kalman filter operation, done column by column: for every column of X you run this transformation. And you only need to do it for the mean. For those of you who are familiar with the Kalman filter, there is a part for the mean and a part for the variance: the prediction and updating of the state are for the mean, and all the other recursions are for the variance. Here you only need the mean part, so you do not have to run the variance part of the Kalman filter, which makes this L-inverse really very, very fast.

Okay, so once I have this GLS done, I have basically integrated all these fixed effects out of the likelihood, and the only part that remains is the dynamics, which sit inside the state vector, this two-dimensional state vector. That I handle in the regular way via the Kalman filter, and I get the likelihood from the Kalman filter. So the Kalman filter is used fairly efficiently, both for estimating all the regression coefficients and for calculating the likelihood function. That likelihood function I then need to optimize over one, two, three, four parameters maybe, depending on how involved the dynamic formulation of your stochastic trend is, and depending on how many variances you need to estimate, what type of heteroskedasticity you have in the error terms. If there are many variances to estimate, then you also need to do that via this maximum likelihood procedure, but every evaluation of the likelihood does this transformation and then computes the likelihood. So the whole approach relies on very classical methods: OLS and the Kalman filter for the transformation, and then maximum likelihood estimation for the remaining coefficients. That is all of the technical bits. It is very much alluding to the transformed likelihood approach that I mentioned earlier, only here the transformation is not done explicitly in the way you can do it when you put the lagged dependent variable inside the model and then transform, as in Pesaran. Here the change is that we add these stochastic components and then do the transformation via the Kalman filter. That is basically the only technical deviation from the method of Pesaran: the implementation is different, it relies on this Kalman filter, and using the Kalman filter to transform the data goes back ages and ages, so that is also pretty well established.
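To make the recipe concrete, here is a minimal, self-contained sketch of the idea in Python. It is not the speaker's code; the simulation design, the variable names, and the simplification to a single AR(1) trend without a time-varying slope are all my own assumptions. The Kalman filter whitens y and every column of the regressor matrix (the implicit L-inverse), OLS on the whitened data gives the GLS estimates of the fixed effects and beta, and the remaining hyperparameters are estimated by maximizing the profile likelihood.

```python
import numpy as np
from scipy.optimize import minimize

# --- simulate a small panel: y_it = c_i + beta*x_it + mu_t + eps_it -----------
rng = np.random.default_rng(0)
N, T = 8, 120
phi0, sig_eps0, sig_eta0, beta0 = 0.9, 1.0, 0.5, 0.7
c0 = rng.normal(size=N)                        # "true" fixed effects
x = rng.normal(size=(T, N))                    # exogenous regressor
mu = np.zeros(T)
for t in range(1, T):
    mu[t] = phi0 * mu[t - 1] + sig_eta0 * rng.normal()
y = c0 + beta0 * x + mu[:, None] + sig_eps0 * rng.normal(size=(T, N))

def whiten(D, phi, s2_eps, s2_eta):
    """Kalman-filter whitening (the implicit L^{-1}) of every series in D.

    D has shape (T, N, k): k series, each a (T, N) panel.  All series share
    the same error model u_t = iota*mu_t + eps_t, so the gain and variance
    recursions are common; only the filtered states differ across series."""
    T_, N_, k = D.shape
    iota = np.ones(N_)
    a = np.zeros(k)                            # filtered state, one per series
    P = s2_eta / (1.0 - phi ** 2)              # stationary prior for mu_0
    out = np.empty_like(D)
    logdetF = 0.0
    for t in range(T_):
        F = P * np.outer(iota, iota) + s2_eps * np.eye(N_)
        L = np.linalg.cholesky(F)
        v = D[t] - np.outer(iota, a)           # innovations, shape (N, k)
        out[t] = np.linalg.solve(L, v)         # standardized innovations
        logdetF += 2.0 * np.log(np.diag(L)).sum()
        Finv_v = np.linalg.solve(F, v)
        a = phi * (a + P * (iota @ Finv_v))    # state update, per series
        P = phi ** 2 * P * (1.0 - P * (iota @ np.linalg.solve(F, iota))) + s2_eta
    return out, logdetF

def profile_loglik(theta):
    phi = np.tanh(theta[0])                    # keep |phi| < 1 (AR(1) case)
    s2_eps, s2_eta = np.exp(theta[1:])
    # series to whiten: y, the N fixed-effect dummies, and x  ->  k = N + 2
    D = np.concatenate(
        [y[:, :, None],
         np.broadcast_to(np.eye(N), (T, N, N)),
         x[:, :, None]], axis=2)
    Dw, logdetF = whiten(D, phi, s2_eps, s2_eta)
    yw = Dw[:, :, 0].ravel()
    Xw = Dw[:, :, 1:].reshape(T * N, N + 1)
    delta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)   # GLS of (c_1..c_N, beta)
    e = yw - Xw @ delta
    ll = -0.5 * (N * T * np.log(2.0 * np.pi) + logdetF + e @ e)
    return -ll, delta

res = minimize(lambda th: profile_loglik(th)[0],
               x0=np.array([np.arctanh(0.5), 0.0, 0.0]),
               method="Nelder-Mead")
_, delta_hat = profile_loglik(res.x)
print("phi_hat:", np.tanh(res.x[0]), "beta_hat:", delta_hat[-1])
```

The only parts that would change for the fuller model discussed in the talk are the state vector, which becomes (mu_t, beta_t)', and the contents of Z_t, exactly as in the state space form sketched above; the gain and variance recursions still do not depend on the data, so one pass of the same mean-only filter per column is enough.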
Then there are some nice things that I particularly like: I also want to estimate the time-varying coefficients. So if I put this time-varying component in for the intercept and for the regression coefficient, then after all that work I want to see what it looks like. That is what we call signal extraction, and that is done via the Kalman filter, and then perhaps the smoother if you want the full-sample estimate. It also allows for forecasting, so it opens up a way for panel data models to do some forecasting. And again, this implementation of the transformation approach can be combined with within estimation, between estimation, different types of approaches; all these different approaches in panel data models can be handled. Also, and that is the third item I would like to discuss here, even if you make all the c_i's and the b_i's stochastic, they go into the u_it term, but we can still deal with that within the Kalman filter, because it is just a part of the variance and it comes back as a layer for all the t's; you just put that into the expression for u_t and then you are done. So that can all be added, although there is a big "however" here.

So that was "however" number one. "However" number two: if N is really, really big, then you can do the filtering and updating equation by equation, and there are all the techniques in the book where we describe how, if the cross-sectional dimension N is very big, you can lower the dimension by collapsing the observation vector into the dimension of the state. That is very beneficial here, because the state has a very small dimension while N is potentially large, so if I can collapse the whole thing to a dimension of only two, the Kalman filter is also very fast in that dimension. So there are all kinds of ways to make this really computationally efficient, but again, that builds on work that was done earlier.

Yeah, so this is basically the whole model that I have been trying to tackle in this presentation; I have given you my journey of how I arrived here. The identification issue is here at the bottom, and that is also what I do for the two applications. Given the amount of time that I have, I am only going to do the most relevant thing within the buildings of the ECB, and that is the Phillips curve. So I will present the Phillips curve application, work that I do jointly with Marint Flecker from the Netherlands institute for economic policy in The Hague. The other nice illustration I will probably not have time for; I could fly through all the pictures, but that is maybe not so useful. It will also come out soon, we are working on the papers; that is the climate econometrics application.

The Phillips curve has a long history and a long tradition in the literature, but we are focusing on a very recent article by Hazell et al. in the QJE. They have this basic Phillips curve regression that I put up, and they estimate it, and they also discuss all the issues related to estimating the Phillips curve: the identification problem due to the covariance between the error term and the expected-inflation term, so we have learned something from the Mavroeidis paper, and the classical simultaneity issue of supply and demand shocks, and how to tackle these. We have followed their solution: instead of looking at the aggregate inflation numbers for the whole of the US, you look state by state. So we use this illustration because they have a panel of, well, not all the states, but 36 states of the US, and they collected all the relevant variables for the Phillips curve equation: inflation, unemployment, and so on. We basically took their data and applied our panel model to it.

So here is the data: this is quarterly inflation by US states, and as you know there are many states. The only thing you can see here is that the data set is not balanced, so there are some missing values, and that is music to my ears, because the Kalman filter can deal with all these missing values. We can just run everything with this big gap in there, whereas if you go back to Hazell et al., they do all kinds of imputation methods, and then you say, oh, how ugly; we can do this all much, much nicer, of course, you would not expect anything different from me, I guess. This is the unemployment data, which is balanced, again shown for each of the states, the same kind of picture but now all in one color,
and it is the same collection of states; you can see some variation in there. Oh, by the way, why does this solve the supply-and-demand simultaneity problem? The idea is that the central bank cannot offset regional demand shocks using a single interest rate. That may also be relevant for the European Union: you can set one policy interest rate, but that does not fully offset regional demand shocks, and that is basically why they are motivated to use these regional data.

Okay, so we took our panel data model to their setting. They also have this two-way effects structure, with both the cross-sectional and the time-varying effects, and so we took that second dynamic panel data model with two time-varying effects, both for the mu_t and for the regression coefficient. The latter is not in Hazell et al.; the addition we make relative to Hazell et al. is that we also allow a time-varying Phillips curve coefficient, so the slope of the Phillips curve can be time varying, in the same way as the intercept. We use the lagged regressor here, for unemployment, which is also what they use. So this is the empirical model of Hazell et al., with lags; this is quarterly data, so they use yearly lags, t minus 4, both for the price variable and for the unemployment rate, again for all the states, and the relative price variable is also state by state; all these regressors are state by state. So here you can see that we can deal with multiple explanatory variables. And our model, well, we allow a time-varying coefficient for the Phillips curve slope, but not for the relative price term, so it fits rather well within their setup.
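Written out, the specification behind models 3 and 4 is roughly of the following form (a hedged rendering of the Hazell et al. regression with our stochastic components added; the symbols U, p^rel and gamma are my own shorthand, and the exact set of controls is as on the slide):

```latex
\begin{aligned}
\pi_{it} &= c_i + \mu_t + (\beta + \beta_t)\, U_{i,t-4}
          + \gamma\, p^{\mathrm{rel}}_{i,t-4} + \varepsilon_{it}, \\
&\mu_t \text{ and } \beta_t \text{ stochastic (AR(1) or random-walk) components;} \\
&\text{model 3: } \beta_t \equiv 0 \text{ (time-varying intercept only)}, \qquad
 \text{model 4: both } \mu_t \text{ and } \beta_t \text{ time varying,}
\end{aligned}
```

with pi_it state-level inflation and U_{i,t-4} the lagged state unemployment rate.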
Hazell et al. consider the model 1 and 2 specifications, and we basically add models 3 and 4 to their tables. So they have this table with all the GMM estimation results, and we now add the maximum likelihood estimation approach and allow for the time-varying coefficients. Model 3, yes, three minutes is fine, I can manage, model 3 has only a time-varying intercept, and model 4 has a time-varying intercept and a time-varying slope for the Phillips curve. If you look at the paper of Hazell et al., they have this table, with even some more options, but their last two columns are my first two columns here. These are maximum likelihood estimates, but they coincide pretty well with the GMM estimation results reported in Hazell et al. Then we add the stochastic trend, and in model 4 also the stochastic slope, and it is nice to see, of course not very surprising, that the likelihood gets a big boost if you make the intercept time varying, making it dynamic, recognizing the dynamic features not only through a lagged dependent variable but through a proper stochastic formulation for the dynamics over time. And if you also do that for the slope, that again adds likelihood points tremendously. So I am going to claim that the Phillips curve is not constant over this period of time: it is really time varying, with periods where the Phillips curve is very strongly present and times where it is fairly weak. That is summarized in this plot, again for models 1, 2, 3 and 4. The bottom panels have the time-varying coefficient for the mu_t, but you only see the beta_t, and that is why you see all these flat lines: the slope in the first three models is just constant, but in the last model it is time varying, and there you can see it moving over time. You see periods where the Phillips curve is really strong; between the 1990s and 2010 it is significantly negative, then at other times it is somewhat weaker, then it becomes negative again after the financial crisis, and now, well, it is maybe hard to say what it is at the moment, time will tell, but you can see that it is not constant over time. And if you do not think the confidence bounds support that, then at least the table and the likelihood values show that this parsimonious model simply fits the data better.

Well, I do not have much time for the instrumental variables. As I said, you have the simultaneity issue between unemployment and prices, and in Hazell et al. they do instrumental variable estimation using a Bartik-type instrument. We basically followed that up and did it their way, taking the fixed model and, in the second stage, using the y-hats and the x-hats from their model, as explained at the bottom of this slide. We also did it for the time-varying version, and that even works a bit better, but that is another step; I think people find it hard to see how you can use an instrument in a time-varying coefficient context. So I think it is better to leave it for the moment and just use the standard procedures as they do, and then maybe later think a bit more about how to use instrumental variables in a dynamic, time-varying coefficient world. There are some papers about it; once you start reading and getting interested, there is some literature on this, so I am going to follow up on that, but that is for next time, not for today.

Then you can add the instrumental variables, this is my last half minute, and you can see again that all the time variation you put in is really supported by the data. Of course the model is a tight straitjacket for the data: the panel model is very parsimonious and you put all 36 states into that straitjacket, so that is a big constraint. If you just allow it to be a bit more flexible over time, so that at least the cross-sectional fit can change a bit over time, that is something the data likes a lot, and that is what you see in this overview. Also with instrumental variables, the impact remains more or less the same, with subtle differences of course; the upper pictures are without instrumental variables and the lower pictures are with instrumental variables, they change a bit, but overall you can still see the significance. Well, those are the concluding remarks on inflation, basically commenting on things that I already mentioned; it is a nice application of the methodological material that I presented earlier, a straight application of that material.

Thanks a lot, Siem Jan. We have time for a few questions; if you could introduce yourself and give your institution for the people online. Gabriel?

Hi, hello, this is Gabriel Pérez-Quirós from the Bank of Spain. Siem Jan, thank you very much for your excellent presentation. A couple of questions. One is kind of technical: when you have this stochastic trend that contains a unit root, we know that we have all these problems, the pile-up problem and things like that, where we have to go to median-unbiased estimators and so on.
I do not know if you have faced this problem here, or whether it does not arise at all; that is something I would like you to comment on. And the second thing: you present these nice results from the regional data, I mean, very different slopes in the Phillips curve. If we had done this for the aggregate, would it be very different, or do we learn a lot from using the disaggregated data? I would like to see what the gain is of using this approach versus using the aggregate, whether we gain precision in the standard errors, something like that.

Any other questions? Yes.

This is Davide Delle Monache from the Banca d'Italia. I have a couple of technical questions about the estimation. I did not get whether you use iterative procedures, because you go from OLS to the Kalman filter: do you do it in one step, or do you need to initialize the Kalman filter, then do OLS, and then iterate back and forth? And then the second question: have you thought about putting in individual trends, mu_it, because in this specification you assume that all the cross-sections have a common trend, but in many cases individual series may have a different trend, or they may have a common trend but with loadings. Also in the inflation example you may lose something, because different states, or different countries in Europe, may have different trends, and this may also reduce the confidence bands around your beta_t, which are pretty big and floating around zero, especially once you also integrate out the parameter uncertainty. So probably giving a different dynamic specification to the mu_t may help. Thank you.

Maybe I can add one quick question as well. You mentioned at the beginning that we could also imagine dealing with seasonal time series, which could be super useful on some occasions, but I was wondering whether you put in one common seasonality for all variables in the mu_t, or whether you could also imagine having a seasonality for each variable, in the c_i. And also, you mentioned forecasting but did not exploit it: could you imagine improving the forecast by looking at the variable disaggregated by states?

Okay, I think I still remember all the questions. So, no, on the non-stationarity: of course, when you do it directly by having the lagged dependent variable in, then all these issues arise, but here, if you put a unit root on the stochastic component and all the variables obey this common non-stationary, random-walk type of behavior, then there is no issue. But that is all about testing, so I am going to look at the prediction errors, and if there is still unit-root evidence in the prediction errors, then of course I need to work on the model, I cannot do the likelihood, it is not valid anymore. But as soon as the prediction errors are all stationary, then I have basically transformed the data into stationarity and all my inference is valid. That is a bit more work; you can also do it formally with testing, but usually what I do is just look at the prediction errors.

Then, on modelling all the individual series versus pooling everything into one component, or just taking averages: well, if it is only a trend, then I agree, sometimes you see these applications where they basically just take the average. But in these
models where you have x's, you still want to allow for the heterogeneity between all these x's, and you may have different components, one for the stationary and one for the non-stationary bits. Then you do not want to assume that all the series have the same type of decomposition; you want to have different building blocks for them, some of which can be common while others are more idiosyncratic, and then this approach is beneficial. But if it is only a straightforward application with one common trend, then yes, I think you can average the data.

Then, on the technical questions about the Kalman filtering: well, once you know the hyperparameters, so once you know the autoregressive coefficients and the variances, the transformation is done. If the component is non-stationary, you need to do a diffuse initialization; if it is stationary, you just use the proper initialization, and then there is no issue. The filter is the same for each of the series that you have to transform, so for the y and for each column of X, the whole initialization business is the same. That is the primary use of the Kalman filter, just to do that. Then, if I have unknown coefficients that I need to estimate by maximum likelihood, I also construct a likelihood from that model; it is basically just the OLS model, I can construct a likelihood from it, and that OLS formulation is then subject to the parameter vector, for example a highly persistent AR(1) or a not-so-persistent one. Then I just do the maximum likelihood based on the transformed data, just like OLS, so it is a non-linear OLS in some sense. So it is not an EM, not an expectation-maximization type of procedure; it is just evaluating the likelihood function for models like these in one go, and you get a likelihood. So there is no EM type of iteration at all.

And then your other question, oh yes, it was about the idiosyncratic part. You can also include individual mu_it's; for example, Andrew Harvey has these balanced-growth types of models, and as you know we can also put those in state space form. Then of course the number of coefficients becomes larger, so you either have to group things together just to keep it feasible, or you can condition on the common part, so you first filter out the common part and then treat the remainder, but that is a bit more involved. Basically, what I am trying to do is look at applications where people are interested, like here in the Phillips curve, in something specific: there is not so much interest in forecasting the next inflation figure for each US state, that is not what they want, they want to use it for policy, they are interested in the aggregate, they want to know nationwide what inflation is. So then, of course, it is not a model for each individual series but a model for all of them together, and then I think it is justifiable, and I am not so unhappy if some of the prediction errors do not look so nice; for the purpose for which they use it, I think it is fine. But if there are other purposes in mind, then they need to do a lot more, and the same holds for the forecasting. I think these are typically not good forecasting models, because you tie everything so much together that in the end you are paying something; there are still dynamics in there that I have not accounted for. So then I would look more at the idiosyncratic components and also use
those. But if you want to do it, you can do it. If you are interested in inflation forecasting, well, then there is always the big debate, and I think in central banks this is also discussed widely, whether you pool first and then do the analysis, or first do the analysis and then pool the results. I think there are mixed messages in the literature: some people say, no, you first pool all the data and then do the forecasting, others say you first do the individual forecasting and then you average, so it is a bit of a mixed bag.

And then the seasonality: well, in the other illustration, the climate application, I really do have the seasonality in there. It is summer versus winter, just like the ethane measurements, which are high in the summer and low in the winter, so they go up and down, and that seasonal component can be common. Yes, it is common again, because otherwise it is just too much, with 36 of them. You can do it, but then you need to do something ad hoc; I would not advocate having a state vector of 36 times 12 and putting them all in. Nowadays computers are fast, but it is just not so elegant. So it is nice to think about all the ways to do it. Okay, thank you very much.