So let's move on to the last paper of this morning session, by Daniele Bianchi from Queen Mary.

So thanks, thanks all of you for being here, and thanks to all the organizers for having our paper on the program. I really enjoyed the day yesterday and today, a terrific program; I go back home with lots of readings to do and lots of new things to study. This is joint work with Mauro Bernardi at Padova and Nicolas Bianco, who recently moved to Pompeu Fabra. I'm going to talk about linear regressions in a high-dimensional context, and the goal of the game here is to try to select which variables are in and out, dynamically, in your regression. Before getting deeper into the paper, let me give you a little bit of an overview of what we want to do. It's a boring univariate regression: y_t on x_{t-1}, a predictive regression, plus stochastic volatility. In a static context, the idea is that a predictor can be in or out, and being static, it is either always there or never there. Perhaps more interesting for forecasting, the beta could be time-varying, right? So you have a trajectory; you could model your beta as a random walk or an AR(1), depending on your preferences, you have a variety of choices. Now, perhaps even more interesting, you might have a situation where your beta is time-varying at some periods, is zero at others, and then gets time-varying again after a while. And if you add one step of complexity, some predictors could be time-varying while others are constant or zero. Or perhaps you have a predictor that is always in and time-varying, and another that gets in and out at different periods of time. So this is essentially what we are after. And there are a few key issues that obviously make things a bit more complex. The set of predictors could be very large, potentially as large as the number of observations. And I'm alluding here a little bit to why we care about variable selection in the first place: it is not really clear exactly which predictor matters, and when. So the trajectories are unknown a priori, and the significance is unknown likewise. And if you work in a dynamic context this is even more relevant, because if by chance you keep a predictor that shouldn't be there, your model essentially accumulates noise, so the out-of-sample performance suffers even more. So what we do in this paper: we work within this context, so a large-dimensional regression, time-varying parameters, and things that can get in and out. And this is essentially a Bayesian method, so a Bayesian method for dynamic variable selection in high-dimensional time-varying parameter predictive regressions. The idea is that you might have what we call predictability at the intensive margin, so single predictors that can be active for multiple periods, and predictability at the extensive margin, which means that multiple predictors can be relevant for, for instance, only one observation or a few observations. If you look at the literature, people like Ročková or Hedibert Lopes make the distinction between horizontal sparsity, which is essentially what we call intensive-margin predictability, and vertical sparsity, which is what we call extensive-margin predictability. The method is a variational Bayes inference approach, and it has advantages above and beyond the computational efficiency.
There are a couple of advantages that I'm going to lay out next, but essentially it requires minimal hyperparameter tuning. As a Bayesian, it is always a good thing to try to rely as little as possible on prior views. I'm going to show you that the posterior concentration properties are very much the same as MCMC, so if you're worried that we're not as accurate as MCMC, hopefully I'll convince you that that's not necessarily the case: it is comparable, using exactly the same formulation as the exact MCMC counterpart. And then it's much more efficient. By efficiency I mean that we design the algorithm so that you can learn the sparse trajectories of the predictors that are in and out automatically, online, and you can eventually exclude those predictors that are never in, so the noise, really. And this is the thing that differentiates it from MCMC, because doing the same thing within an MCMC framework would be particularly complicated. Right, so we have two empirical exercises. The one that I'm going to talk about today is about inflation: we use FRED-QD, so we have roughly 230 predictors, which is relatively close to T, so relatively close to the number of quarters. And then in the paper we also have an application on equity risk premium predictability, where we use 150 anomaly-based portfolios to forecast the aggregate stock market. As I said, today I'm going to focus on inflation predictability, given the theme of the conference. So, a non-exhaustive list of references. I'm going to refer to the two highlighted papers, Koop and Korobilis and Ročková and McAlinn, because as far as I know those are the most recent advancements, so to speak, on dynamic variable selection. Dimitris actually uses a variational Bayes approach, but within the context of a dynamic spike and slab, and Ročková and McAlinn use an EM version, so an expectation-maximization algorithm, of a spike and slab framework. I should also mention Nakajima and West, which, as far as I know, was one of the first to introduce this idea of thresholding and dynamic selection of parameters. And I should also mention both Aubrey and Joshua, who are successfully contributing to this literature on variational Bayes inference, so I strongly encourage you to read their papers. Right, so the model, as I said, is a boring linear regression; I'm sure many of you will be disappointed by the end. It's a dynamic Bernoulli-Gaussian process. Think about it as a state space: you have your observation equation, a standard predictive regression. The state equation might not necessarily be that trivial, as it is a combination of two pieces. Your beta is the combination of a time-varying coefficient, call it b, which could take any form, really. In our framework it is a random walk, for reasons that will be clear in a couple of slides, but if you don't like the random walk and you want to go for an AR(1), you only have a few additional derivations to do; we can accommodate that. And then you have the key novelty of our setting, which is this dynamic variable-inclusion indicator that I call gamma_jt: each gamma_jt is a Bernoulli, to which we give dynamics that will be clear next. So what I'm after, really, is this object here. I want two things: I want an indicator gamma that tells me when the predictor is in over time, dynamically, and I also want the corresponding beta.
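In symbols, a minimal sketch of the state-space structure just described, written in ad-hoc notation for illustration (the exact specification and symbols are the ones in the paper, not these):

```latex
% Observation equation: predictive regression with stochastic volatility
y_t = x_{t-1}^{\top} \tilde{\beta}_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma_t^2)
% State: each coefficient is the product of a time-varying level and an inclusion indicator
\tilde{\beta}_{jt} = \gamma_{jt}\, b_{jt}, \qquad \gamma_{jt} \in \{0,1\}, \quad j = 1,\dots,p
```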
So on the left you see how the gamma would eventually look, and on the right you see how the corresponding beta would eventually look. And I want to be very clear here: the dynamics you can capture are very general. It does not necessarily mean that a process gets in and out; it could also be the equivalent of a structural break, so the beta gets in and then never gets out, which essentially changes the whole dynamics of the conditional expectation, if you wish. And this is going to be perhaps interesting when I talk about the inflation forecasting part. So, the model specification. As I said, this small b, which together with the gamma drives the slope parameters, the regression coefficients, could be anything really. We go for a random walk specification, the reason being that there is a nice, so-called Gaussian Markov random field representation, which represents the joint distribution of the b's with a precisely, tightly defined variance-covariance structure, and that makes the computation quicker. In fact, you also have a Gaussian Markov random field representation for an AR(1), an AR(2), and so on, but for this implementation we go for a random walk. Similarly, for stochastic volatility we have a random walk representation, which is relatively standard when it comes to stochastic volatility, leaving aside all of the implications for non-stationarity, of which I am fully aware. And again, you have a similar Gaussian Markov random field representation. So the only parameters of interest you have to estimate are this nu squared for the volatility and eta_j squared for b. Now, when it comes to the dynamic sparsity, things become a bit trickier, and we use sort of a data augmentation approach. The idea is that gamma, conditional on an auxiliary parameter omega, which is going to be key, is essentially a Bernoulli. And again, the omega has a Gaussian Markov random field representation, so it is effectively a random walk itself, and the parameter of interest, which I'm going to discuss a bit more carefully later on, is this psi_j squared. Now, omega is an auxiliary parameter, so if you integrate it out, the persistence that comes from the omega translates almost one-to-one into the dynamics of the gammas. So the time-series persistence of gamma is directly driven by how persistent the omega is. Now, I said at the very beginning that one of the advantages of our framework is a minimal set of assumptions, but I want to be fully transparent here: there are priors you obviously have to decide on, and there is one particular prior that needs to be discussed carefully, and this is what we do in the paper, which is the prior for the auxiliary parameter omega_j. Because, as I said, omega_j drives the dynamics of gamma_j, and therefore, if you allow me the term, if you screw up that prior, you can screw up the posterior estimates of your omega and therefore of your gamma. When it comes to nu squared and eta_j squared we are uninformative, so it's a flat prior, relatively standard. But as I said, when it comes to psi_j squared we have to be careful.
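A rough sketch of the dynamics just described, again in ad-hoc, illustrative notation rather than the paper's exact formulation:

```latex
% Time-varying level: random walk (Gaussian Markov random field representation)
b_{jt} = b_{j,t-1} + \eta_{jt}, \qquad \eta_{jt} \sim \mathcal{N}(0, \eta_j^2)
% Stochastic volatility: random walk in the log variance
\log \sigma_t^2 = \log \sigma_{t-1}^2 + \zeta_t, \qquad \zeta_t \sim \mathcal{N}(0, \nu^2)
% Dynamic sparsity: Bernoulli indicator driven by a latent random-walk process
\Pr(\gamma_{jt} = 1 \mid \omega_{jt}) = \frac{1}{1 + e^{-\omega_{jt}}}, \qquad
\omega_{jt} = \omega_{j,t-1} + u_{jt}, \quad u_{jt} \sim \mathcal{N}(0, \psi_j^2)
```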
So let me digress for a second. I want to try to convince you that we tried to be as careful as possible — it is still a Bayesian framework at the end — in picking the shape and scale parameters of the inverse-gamma prior for psi. We look at three scenarios. In the first scenario the shape parameter is kept constant and the scale parameter goes to plus infinity. And what I'm going to show you here, and on the next slides, are three pictures that look the same: the left panel is essentially the covariance structure of omega, the mid panel is the trajectory of omega, and the right panel is the trajectory of gamma, your indicator, where you see the indicator in black and the true values in red. So if you let b, the scale, go to plus infinity and you fix a, you essentially put zero weight on the Q inverse matrix that I showed you before, so it is actually almost an IID framework. Almost by construction, the mid panel tells you that your omega is going to be very erratic and, as a consequence, your gamma is going to go all over the place, so the labeling becomes particularly tricky. So that's perhaps not a good choice. In scenario B we do the opposite: we fix b, the scale, and we let the shape parameter go to plus infinity, and we get the opposite. We give very, very high weight to the inverse of the covariance structure, Q inverse, that I showed you before, so essentially omega now is almost an infinitely persistent process. You probably can't see the scale in the mid panel, but that's essentially a flat line. And as a consequence your gamma becomes almost irrelevant: you have these very, very flat dynamics and you really can't capture much when it comes to identifying your ones and zeros, so to identifying which predictor is in. So again, perhaps this is not necessarily a good choice. Now, if you keep the ratio of the two parameters constant, whatever that number is, then things start to behave a bit more sensibly. What I'm showing you here is the covariance structure of the omega on the left again: it is not IID and you don't put too much weight on Q, and as a consequence the omega has a shape that is persistent — remember, it is a random walk at the end — but that allows a somewhat closer identification of the gamma parameters, so to identify the regressors in the first place. So our recipe, so to speak, is to pick those two hyperparameters with scale equal to shape equal to five. In fact, in the paper we also experiment with changing b, so changing the scale parameter, with a fixed to two, because that is the equivalent of a very uninformative view, since the variance is plus infinity, really. So, given that prior formulation, that's the only prior on which we really have to take a view; all of the others are uninformative and standard in the literature.
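To make the three scenarios concrete, here is a small, self-contained toy simulation — purely illustrative and not the authors' code, with made-up variable names — of how the inverse-gamma hyperparameters (a = shape, b = scale) for psi_j squared translate into erratic, flat, or sensibly persistent omega paths, and hence into the behavior of the indicators:

```python
# Toy illustration of the three hyperparameter scenarios for the inverse-gamma prior on psi^2.
import numpy as np

rng = np.random.default_rng(0)
T = 200

def simulate_path(a, b):
    # psi^2 ~ Inverse-Gamma(a, b): draw as 1 / Gamma(shape=a, rate=b)
    psi2 = 1.0 / rng.gamma(shape=a, scale=1.0 / b)
    omega = np.cumsum(rng.normal(scale=np.sqrt(psi2), size=T))  # random-walk omega_jt
    prob = 1.0 / (1.0 + np.exp(-omega))                         # logistic link
    gamma = rng.binomial(1, prob)                                # inclusion indicators
    return psi2, omega, gamma

# Scenario A: shape fixed, scale huge  -> psi^2 tends to be large -> erratic omega, noisy gamma.
# Scenario B: scale fixed, shape huge  -> psi^2 close to zero     -> essentially flat omega.
# Scenario C: shape = scale (e.g. 5)   -> persistent but moving omega, interpretable gamma.
for label, (a, b) in {"A": (2.0, 1e3), "B": (1e3, 2.0), "C": (5.0, 5.0)}.items():
    psi2, omega, gamma = simulate_path(a, b)
    print(label, "psi^2 =", round(float(psi2), 4), "share of gamma = 1:", gamma.mean())
```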
Now, let me talk a little bit about variational Bayes inference. I'm going to give you the highlights and the basic intuition, and I also want to be transparent about why we pick variational Bayes in the first place. The idea of variational Bayes, really, is that you can think about it almost like an indirect inference approach. You have a Kullback-Leibler divergence that you have to minimize, between a variational density that I call q(theta) and the true posterior density, with the usual notation, of theta given the data. Now, in a general sense you might think the true posterior is not available; in fact, in our case it is, in the sense that in the paper we also derive the exact MCMC counterpart of the variational Bayes algorithm. It's just that, for computational reasons, we use the variational Bayes. So what does variational Bayes correspond to? It corresponds to finding an optimal density that maximizes a so-called evidence lower bound, which involves the logarithm of p(y, theta) and q(theta). What is p(y, theta)? It is essentially a combination of densities, really, of things that you know: q(theta) is picked, so you have to decide what the approximating density is, and p(y, theta) is the joint density of data and parameters, which follows from the likelihood form of your model. That's okay; you have all of the ingredients you need. So the goal of the game is to maximize this object, and you can derive it starting from a standard Bayes rule, nothing more, nothing less. Things become a bit tricky when you have to decide the space of densities over which you maximize. You can fall into different categories depending on the choice you make for this calligraphic Q. You may have a mean-field variational Bayes approach, which is non-parametric, and the idea is very simple: you factorize your density into lots of independent pieces. To give you an example, think about the prior for a linear regression, beta and sigma. The simplest example, at least in my mind, of a mean field is an independent prior: you have a prior for beta and an independent prior for sigma, so you separate the two things. The advantage is that you get closed-form updates based on a coordinate ascent algorithm. And for those of you who are curious, the main difference is that variational Bayes is an optimization-based algorithm rather than a simulation-based algorithm like MCMC; that's why we talk about coordinate ascent rather than posterior draws. Then, of course, you might have a second choice, which is parametric variational Bayes: the idea is that if you have views on your q being Gaussian, say, then it is a fully parametric approach. What we do here is combine both aspects, so we have a hybrid approach, which we call semi-parametric. The idea is that we exploit the mean field, so we factorize the densities, with some caveats that I'm happy to discuss later on, but we also take a parametric approach for some of these densities. For instance, when it comes to h, we expand on the paper by Josh that recently appeared in JDC, where we use a fully coordinate ascent algorithm, and when it comes to gamma and z we use a Pólya-Gamma representation, which is essentially an auxiliary parameter on top of an auxiliary parameter, but it allows you to speed up computation substantially. Those are the only two fully parametric choices we make on top of the mean-field factorization. Now, I'm not going to bother you with the propositions; I'm fully happy to discuss them later, and we have full details in the paper.
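For reference, the standard identity behind the construction described above, written in generic notation (this is textbook variational Bayes, not a result specific to this paper):

```latex
% Log evidence decomposes into the ELBO plus the KL divergence to the true posterior
\log p(y) \;=\; \underbrace{\mathbb{E}_{q}\!\left[\log p(y,\theta) - \log q(\theta)\right]}_{\text{ELBO}(q)}
\;+\; \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta \mid y)\right)
% Mean-field family: the variational density factorizes over blocks of parameters
q(\theta) \;=\; \prod_{k} q_k(\theta_k)
```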
But I want to highlight the key object of interest, which is the beta. The beta is a compound of the b, so the random walk, and the gamma, which is this weird object that comes from lots of auxiliary parameters. At the end of the day, what you get is an object that is very familiar: it is essentially a mixture. You have mixing weights over the full trajectory of ones and zeros, and then a bunch of multivariate Gaussians. So conceptually it is not necessarily that complicated. That's the only thing I want to highlight when it comes to the optimal variational densities. The other thing I want to discuss: I made a big deal at the beginning of the computational efficiency and of the fact that we can exclude parameters online during the estimation. Now I want to give you the intuition and show you a little bit of the, let's say, asymptotic properties of the algorithm, and then I'll show you the algorithm in a couple of slides. So here's a proposition we have in the paper. I'm highlighting the only thing you should really care about, which is this nu, the mean of the optimal variational density for the parameter omega. The idea of this proposition is to show you that, for small values of this object, it keeps decreasing, and at the end of the day you converge to never selecting the given predictor. So you have sparsity-inducing properties for the full trajectory of your regression parameters. Now, if that's the case, and it is, I'm showing you here a simulation. On the left you have the gammas: on the x-axis you have iterations of the algorithm, on the y-axis you have time. On the right panel you have the omega parameter, same thing, x-axis iterations, y-axis time. And I'm not sure you can see it, perhaps it's too transparent, but there is a dashed vertical line here, and that's essentially the threshold. If we put a value of, let's say, 0.01, meaning that if the increment is smaller than that, we can stop updating and claim that that predictor never enters the regression. Now if that's the case, you can add a step to your algorithm whereby this part is standard, the equivalent of, think of an MCMC sweep: you update your parameters. But then you add the step here that essentially says, within the iterations, if for some iterations that parameter never enters the regression, you just exclude it. So essentially you shrink, online, the dimension of the regression. And if you think about it, it's really like a variance inflation problem when you have noise in a large OLS; it's essentially the same idea, but we do it online. And that's the main advantage of the variational Bayes approach: doing this thing with MCMC, at least for me, would be particularly complicated.
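As a purely schematic illustration of that extra step — my own toy sketch with made-up function and variable names, not the authors' algorithm — the idea is that inside the coordinate-ascent loop you monitor the update increments of the inclusion signal and permanently drop predictors once the increments fall below the threshold:

```python
# Toy sketch of the online-exclusion step inside a coordinate-ascent loop.
# `nu` stands for the variational mean of omega for each predictor; all names are illustrative.
import numpy as np

def prune_predictors(nu_prev, nu_curr, active, tol=1e-2):
    """Deactivate predictors whose update increment has fallen below `tol`
    while their inclusion signal keeps pointing towards exclusion (nu < 0).
    The sign condition is an extra guard added here only for the toy example."""
    increment = np.abs(nu_curr - nu_prev)
    newly_dropped = active & (increment < tol) & (nu_curr < 0.0)
    return active & ~newly_dropped

# Usage with two pretend consecutive iterations for five predictors:
active = np.ones(5, dtype=bool)
nu_prev = np.array([2.1, -3.0, 0.4, -5.2, 1.0])
nu_curr = np.array([2.2, -3.001, 0.6, -5.201, 1.1])
active = prune_predictors(nu_prev, nu_curr, active)
print(active)  # the second and fourth predictors are excluded and never updated again
```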
Now I'm going to show you the simulation study. Two things: the comparison with MCMC, because that's a question everyone asks — why don't you do MCMC? — and then the comparison with what we call competing approaches. Simulation setting: we have three parameters, and for the comparison with MCMC that's enough. We generate 100 replicates of the regressions, with different dynamics for the three parameters: one is always in, one gets in and out, and one is never there. I'm highlighting in blue the variational Bayes estimates and in red the MCMC ones. We measure accuracy essentially the way other people have done before: it is based on the distance between your optimal variational density and the MCMC equivalent, the posterior density. I also want to highlight that making a comparison between variational Bayes and MCMC is always tricky: you have to believe that your MCMC converges in the first place, and you can always make your MCMC faster or slower depending on how accurate it should be. There is less variation in the variational Bayes but, you know, what I'm showing you here is as good as it gets — I'm sure you could do better. So I'm showing you in this panel an example of a trajectory, and then essentially that trajectory in a box-plot sense across all of the 100 simulations, for beta one on top and gamma one at the bottom. This is a parameter that is always in, so it never becomes zero; it is a time-varying parameter, and I want to highlight essentially one thing: the accuracy of gamma one compared to MCMC — when it gets close to 100, it means you are as accurate as the MCMC — and the accuracy of beta one. You see that the accuracy is not necessarily the same, for reasons that are perhaps clearer if you look here. Being a simulation-based sampler, MCMC always draws a small amount of probability from the spike — if you think about a spike and slab, from the zero — while in a stochastic optimization setting it's like having a very hard threshold constraint: is it zero or not? Then we have a second example where the parameter is constant at zero, so it's always zero, and here the efficiency of variational Bayes is perhaps clearer: variational Bayes gives a very tight density around zero, while MCMC — being uninformative; we chose uninformative priors to compare apples with apples — is less efficient, so you still have probability mass well outside zero. Yep — it usually takes much more time than expected. So what I want to do now is briefly mention the comparison with standard methodologies and then walk you through the main results. For the comparison we have 50, 100, or 200 variables and 200 observations, consistent with the empirical analysis. We have some parameter that is always in, for instance beta one — think about an intercept — some parameters that vary over time with different types of dynamics, and parameters from the eighth to the 50th (or 100th, or 200th) that are never there. And we compare different variations of our algorithm. We also compare against rolling-window static regressions, normal-gamma, horseshoe and spike-and-slab methods, the dynamic spike and slab as in Ročková and McAlinn with different parameter choices, and the dynamic variational Bayes of Koop and Korobilis. We compare the F1 score and the computing time, and I show you the F1 score now. This is when the parameter is always in, and all of the algorithms do a relatively good job. But what happens when you have a single switch, so the parameter goes from zero to significant only once? The left panel is ours, the mid panel is the competing dynamic approaches, and the right panel, in gray, is the rolling window; as you would expect, the rolling window deteriorates.
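Since these scenarios (and the ones that follow) are scored with the F1 metric, here is a minimal sketch — my own illustration, not the paper's code — of how an F1 score for dynamic selection can be computed, comparing an estimated inclusion path against the true one, period by period:

```python
# F1 score for dynamic variable selection: compare estimated vs. true inclusion paths.
import numpy as np

def f1_selection(gamma_true, gamma_hat):
    tp = np.sum((gamma_hat == 1) & (gamma_true == 1))   # correctly selected
    fp = np.sum((gamma_hat == 1) & (gamma_true == 0))   # falsely selected
    fn = np.sum((gamma_hat == 0) & (gamma_true == 1))   # missed
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0

# Toy example: one predictor active only in the middle third of the sample.
T = 90
gamma_true = np.zeros(T); gamma_true[30:60] = 1
gamma_hat = np.zeros(T); gamma_hat[28:62] = 1   # slightly over-selecting around the switch
print(round(f1_selection(gamma_true, gamma_hat), 3))
```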
If you go to 200 variables, so there is a lot of noise, the rolling window essentially goes all over the place. Another scenario: we have two switches, so a parameter can be in and out twice over the sample. Again, the rolling window does relatively poorly, and again, if you increase the noise in your set of x's, things go all over the place. We also have a case in which we have a very tight signal, so a signal that is very short-lived, and this is going to be important in the empirical application. And again, we kind of outperform competing approaches, and the computational time is faster.

Now we're going to talk about inflation forecasting. What we have here are four different targets — one, two, four, and eight quarters ahead — for four different measures of inflation, with 230 predictors. The forecasting benchmarks are the local level model of Stock and Watson with stochastic volatility and the time-varying parameter AR(2) — in fact, sorry, there is a typo — as in Koop and Korobilis. We take a fully real-time, recursive forecasting approach. Before discussing the forecasts, I want to show you the in-sample narrative of the dynamic parameters, because I personally care to understand where things are coming from. What I'm showing you here are the surviving predictors for total CPI, and at least to me some of these parameters make sense. For instance, if I look at industrial production, which is more on the supply side, it is positive for a given part of the sample of interest and becomes negative towards the end of the sample. The AR(1) term is significant only up to the 90s. And then there are things that are perhaps more interesting for me: you have the demand side, so personal consumption expenditures, and then you have monetary policy, the five-year Treasury rate. And there is a sort of pass-through you can think about: this is one of the PPI indexes, which positively correlates with future total CPI. On the top you see the betas, at the bottom you see the indicators. This is for total CPI. If you look at the PCE deflator, which is essentially a broader measure of inflation, some of the variables are the same and some others get excluded: for instance, lag one is always there, monetary policy is there, the positive correlation with the PPI and industrial production is there. Now, I mentioned the short-lived dynamics before, and this is what we found, for instance, for the GDP deflator, which takes into account not only consumption inflation but is much broader. This is essentially industrial production for final goods at the end of the first lockdown: really, you have this spike, which I honestly don't have much of a clue where it is coming from — I hope the macroeconomists in the room can help me understand it better. The other thing I want to highlight is short-term unemployment, and you can think about that as a sort of revival of the Phillips curve for a given part of the sample. Now I want to talk briefly, in the last minute that I have, about forecasting. In terms of forecasting performance we compare favorably with all of the methods except the usual suspects, so the Stock and Watson local level model — it's hard to beat that framework. One thing I want to highlight: we also do the Diebold-Mariano tests. Our framework is in the first two columns of each of the sub-panels.
If it's blue, we outperform the model on the y-axis; if it's white, we don't. And the only white boxes at the top are for the unobserved components model of Stock and Watson, which in a predictive density sense is similar. Then the benchmark changes to the time-varying AR(2) with stochastic volatility, because it's the one that performs best. The punchline here is that we outperform all of the other methods that use macro variables. So, in conclusion — apologies for taking some extra minutes — what we have here is a dynamic variable selection framework for time-varying parameter predictive regressions. I haven't had much time to talk about computation, but it's fast and scalable, and it's competitive in terms of accuracy with respect to MCMC. We're thinking about some generalizations: generalized linear models, group variable selection, and changing the structure to handle maybe irregular time points, spatial data, networks; it's pretty flexible. Thanks, thanks a lot for your attention, and I'm looking forward to the discussion. Thanks. Thank you.

Let's move to the discussion, given by Anna Simoni.

First of all, I would like to thank the organizers for inviting me to discuss this very nice paper. I really enjoyed reading it; it's a very interesting paper. I'll start with a brief summary and then I will present my points. The aim of this paper is to predict the dynamics of economic variables. This can be useful, for instance, if you want to forecast inflation or forecast asset returns, and the setting is one where you have a large number of predictors and where the relevance of each predictor may change over time. So we have sparsity which may potentially vary over time. The model they propose is the following: they consider a Gaussian time-varying parameter regression model, with a Gaussian error term and a p-dimensional parameter beta_t tilde, where p may be large — a large number of predictors, large compared to n — so we are in a high-dimensional setting. The first point is to deal with the high dimension, which they do by assuming sparsity, and then they have to deal with time-varying sparsity. This is very interesting, and they propose the following prior. Basically, my discussion will focus on the prior: this is a very dense paper, there are many interesting things, and I have chosen to focus on the prior. What they do, first of all, is to reparameterize: they write the beta_t tilde as the product of a diagonal matrix Gamma_t, which contains indicators gamma_jt that can be either zero or one with given prior probabilities, times beta_t. They then endow it with a prior that is a Bernoulli-Gaussian prior, in the sense that they have a random walk for beta_jt, and they have stochastic volatility. The main contribution is this prior for gamma_jt, the indicator that tells us whether the j-th predictor is active or not. This is a Bernoulli with a probability p_jt that is related, through this logistic function, to this parameter omega_jt, which is an element of this (n+1)-dimensional vector omega_j, which is Gaussian.
And the components of omega_j are correlated through this matrix Q inverse, as we have just seen in the presentation by Daniele, and this correlation is carried over to the joint distribution of the gamma_j's once we integrate out the p_jt with respect to the prior. And then there are priors on the hyperparameters. What they then do is to propose a semi-parametric variational Bayes algorithm, and they use two assumptions, assumptions on the set of approximating densities — the set over which the Kullback-Leibler divergence is minimized: they assume a mean-field factorization, and they assume a parametric approximation for the density of h and for the probability of these indicators. So the main novelty of this prior, with respect to what has been proposed in the existing literature, is the prior on this gamma_jt, in the sense that this prior allows persistence through the correlation: in the marginal prior of the gamma_j's, the components are correlated. My first question — just a clarification question — is that it was not clear to me, in the paper, whether the first component of the gamma_j vector can be zero or whether it is always assumed to be active. Then, what I have done is to think about this prior, because I really wanted to understand its probabilistic structure. And actually, we can rewrite this prior using a spike-and-slab formulation, so I was wondering whether this could be interesting, whether you could present your prior in this alternative way. Basically, using the spike-and-slab formulation, you have your original parameter beta tilde — this is the parameter that is in the model — and the conditioning is here: conditional on beta_{j,t-1}. Remember that this beta tilde is decomposed into the product of gamma_jt times beta_jt. So the prior can be written as a conditional prior of beta tilde_jt given beta_{j,t-1}, the past value, and gamma_jt, the current value of the indicator, and this is a mixture of a Gaussian and a Dirac measure at zero — the Dirac is the spike part. This Dirac does not depend on the previous value beta_{j,t-1}, while the slab part does depend on this beta_{j,t-1}. But actually, it does not depend on beta tilde_{j,t-1}; it depends on beta_{j,t-1}, which means it does not depend on whether the previous component is active or not. So implicit in this formulation is the assumption that, conditional on gamma_jt and on beta_{j,t-1}, the beta tilde_jt is independent of gamma_j at the previous time t-1. This means — I have only three minutes — that the past sparsity pattern affects the value of beta tilde_jt only through this gamma_jt. Then, once we integrate out gamma_jt, and once we also integrate out p_jt, we have that the past values, meaning the past sparsity patterns, do affect beta tilde_jt. Then I have compared this prior with the two priors that are mentioned in the paper, the Koop and Korobilis prior and the Ročková and McAlinn prior. Both these priors are soft spike-and-slab priors, soft in the sense that they are a mixture of two non-degenerate priors. So, for instance, if you look at the Ročková and McAlinn prior, the first component is the slab part.
This can be taken, for instance, as a Gaussian; the second component is the spike part. One difference with respect to the prior in the paper is that the mean of this slab part is not a random walk, but an autoregressive process of order one with a parameter phi_1 smaller than one. So my first question is about the motivation for taking phi_1 equal to one. Another difference is that gamma_jt in the Ročková and McAlinn prior depends explicitly on the previous value beta tilde_{j,t-1}, so it depends explicitly on the sparsity pattern of the beta_j parameter. What could be interesting is to compare the persistence of the sparsity patterns induced by these two priors, your prior and the prior of Ročková and McAlinn. Then I have another question; maybe I skip the first one given the time. So my question is about the fact that you mention having a small number of hyperparameters, meaning that you use a random walk instead of an AR(1), and then you have two fewer hyperparameters to select. But the question is: what if the true beta_t tilde is not persistent, so it does not satisfy the random walk assumption? In particular, in your simulations you take phi_1 equal to 0.98, which is very close to a random walk; it could be interesting to see situations where phi_1 is much smaller than one. And then, how large can n be? In your simulations you set n equal to 100, I think, but I would be interested in seeing how the results are affected by increasing n. And also, you could look at the correlation between predictors: if the predictors are correlated, how does this impact your results? The second thing I have done is to rethink the model a little bit, by relating it to something I'm working on currently. We could interpret this time-varying parameter model in terms of groups. What does that mean? It means that each covariate j defines a group: the beta tilde_j is a group with n+1 components, and it can be decomposed into the product of a diagonal matrix Gamma_j times beta_j, but now the elements of this diagonal matrix are no longer indicators, they are standard deviations. And what is the main difference here? The main difference is that you can allow a group to be inactive. This accounts for situations where you have many predictors but some predictor is never active, so you can exclude it directly, and this is what we call bi-level sparsity. In your paper you have one level of sparsity, but you could have sparsity at the group level and within the group. In particular, in a recent working paper with Matteo Mogliani we have a prior that induces a double sparsity, which we call a double spike and slab. So, using this interpretation in terms of groups, I was wondering how the two priors would perform, and I think it would be nice to make this comparison — a prior that induces one-level sparsity versus a prior that induces bi-level sparsity — in terms of statistical efficiency and in terms of computational efficiency.
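To fix ideas, the spike-and-slab reformulation sketched in the discussion can be written, in ad-hoc notation and under the assumptions stated above (the slab variance is denoted here by the random-walk innovation variance, which is my reading, not a formula from the paper), as a conditional mixture:

```latex
% Conditional prior of the model coefficient given the past level and the current indicator:
% a Dirac spike at zero when gamma_{jt} = 0, a Gaussian slab centred on the past level otherwise
\pi\!\left(\tilde{\beta}_{jt} \mid \beta_{j,t-1}, \gamma_{jt}\right)
= (1 - \gamma_{jt})\, \delta_{0}\!\left(\tilde{\beta}_{jt}\right)
+ \gamma_{jt}\, \mathcal{N}\!\left(\tilde{\beta}_{jt};\, \beta_{j,t-1},\, \eta_j^2\right)
```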
Okay, I'll stop here. Thank you.

Many thanks, Anna. We have time for one or two questions. One over there; go ahead.

Okay, Daniele: why don't you compare with the spike and slab of Giannone and Primiceri, the Econometrica one, and what are the differences? And also, based on what Massimiliano was presenting yesterday with the BART, they made this picture in a simulation where there was no time variation, it was jumping, and I was wondering if you compared with them or what is happening there. But in particular the spike and slab — I think you know the paper, we know the paper — what are the differences? Maybe because you compare just with Ročková. That's all.

Very interesting paper. I'm curious to get your thoughts on why, in this very rich environment, you don't beat Stock and Watson. So I guess my question is: did you try a sub-sample analysis? Probably the 90s is when Stock and Watson was very difficult to beat, but in the 2010s, with the revival of the Phillips curve, maybe that's when your method has a better chance.

How easy would it be to incorporate economic restrictions on the coefficients, say motivated by economic theory?

Okay, so that's one. I want to ask whether you run the risk of cancelling out one regressor at the beginning because it is time-varying, and then maybe that regressor could be interesting after the financial crisis, but you have already discarded it. Since you have this dimension reduction, this may be a risk, right?

Okay, so Daniele, maybe a few...

Thanks, thanks a lot, Anna, for the very nice discussion. I like a lot the spike-and-slab counterpart, and this is something that I honestly have to think about, because Dimitris does exactly that — a dynamic spike and slab — which is kind of close to my idea, almost. So I have to think about it. All of the other points: on correlated predictors, in the new set of simulations we do look at correlated predictors, and we look at 200 predictors, and the results are kind of similar. I also like the idea of looking at the computational cost depending on how many zeros you actually have in the regression. So these are all points well taken, and thank you, thanks a lot. Regarding the questions: we do compare with the Giannone and Primiceri spike and slab, which is static, and we also compare with the horseshoe, the normal-gamma, all the others. But, I mean, it's static, so it's not directly comparable. The benchmarks are Ročková and McAlinn and Dimitris, because those are the only really dynamic variable selection methods, and we compare in the simulations — sorry, in the empirical analysis — and we outperform them. On the unobserved components model, very good question; I don't know. My guess is that it is because there is almost zero estimation error: it's a local level model with stochastic volatility, while we use macro variables, so if you throw in some noise also from the macro variables, that ultimately affects the forecasts. In fact, when we saw that we compared well, I was kind of happy enough, because at least we have a story, not just a time-varying mean. But I'm fully with you; perhaps we could look at different sub-samples. Economic restrictions: I don't know how easy or how difficult it could be; I need to look into it. The dimension reduction: we have a small theoretical part where we show that we exclude a predictor only if it doesn't matter for the full trajectory, not just at the beginning. So if a predictor never matters, and after a few iterations the algorithm tells you that it's never there, you just discard it, because you take out noise, which supposedly makes the estimates more efficient and reduces the dimensionality online.
But yeah, that's not a risk for us.

So thanks a lot to all the speakers and the discussants of this morning session. Yeah, they deserve it.