So the next paper is by Marc-Oliver Pohle from the Heidelberg Institute for Theoretical Studies, and it is about testing quantile forecast optimality. As usual, you have 25 minutes, then 10 minutes for the discussion and 10 minutes for Q&A.

Yeah, thanks a lot to the organisers for organising this nice conference, in particular for having our project on the menu and, in general, for including papers on forecast evaluation. In the poster session we already had two nice posters on forecast evaluation, and this is also a talk on forecast evaluation. This is joint work with Jack Fosten from King's College London and Daniel Gutknecht from Goethe University Frankfurt. The title is "Testing Quantile Forecast Optimality". The motivation, of course, is that quantile forecasts are becoming more and more popular; I don't need to tell you that anymore. In economics and finance they come in the form of growth-at-risk, inflation-at-risk and value-at-risk. Our goal is to evaluate quantile forecasts, and we focus in particular on evaluation over multiple horizons, because in practice we usually do not forecast only one quarter into the future but many quarters, and we ideally want to evaluate our full forecasting approach, not only the forecast for a single horizon. So this is our focus: multiple horizons. And, as we have seen for example in the previous talk, often not only a single quantile is of interest but multiple quantiles: if we have prediction intervals, or if we want to approximate the full distribution by a set of quantiles, we are in a multiple-quantile situation. So this is a paper about multi-horizon, multi-quantile forecast evaluation.

Forecast evaluation can usually be divided into two classes of approaches. The first is relative evaluation, where you compare different forecasting approaches; we have seen a lot of that today.
Yeah, you compare these approaches by an expected loss, say the expected quantile score. This is not what we are doing: we are interested in absolute forecast evaluation, where you have a single forecasting approach and you want to check whether it is in line with the data, or whether the forecaster uses information efficiently. And why should you be interested in absolute evaluation? Clearly, in a horse race your best, fanciest model can beat all the other models, but even the best model can still be very bad. So you should always, at least I would say, complement your relative evaluation with an absolute evaluation, where you test the model or the forecasts for some basic properties, which are called optimality, efficiency or calibration, depending on the literature you come from. And this is what we do: we propose several optimality tests for multi-horizon and multi-quantile forecasts.

A little bit on the literature. There is a huge literature on optimality testing for quantile forecasts, at least in the single-horizon, single-quantile case; I list some papers here, but there are many, many more. And when it comes to relative evaluation, one of the organisers of this conference has forcefully argued, for mean forecasts but really for relative evaluation in general, that you should evaluate over all forecast horizons jointly: not just consider a single horizon, or test all the single horizons separately, but do a joint test. We try to do the same thing in the quantile case for absolute evaluation.

Okay, what do we do? We pick a specific notion of optimality, which has been called auto-calibration in the literature.
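As an aside, the expected quantile score mentioned above is just the pinball loss averaged over the evaluation sample. A minimal sketch of that scoring rule (my own illustration, not code from the paper):

```python
import numpy as np

def quantile_score(y, q_forecast, tau):
    """Pinball (quantile) loss, averaged over the sample: the standard
    scoring rule for evaluating a tau-quantile forecast."""
    u = np.asarray(y) - q_forecast
    return float(np.mean(np.where(u >= 0, tau * u, (tau - 1) * u)))

# Toy check: for N(0,1) data, the true 5% quantile should score better
# on average than a forecast shifted one unit further into the tail.
rng = np.random.default_rng(0)
y = rng.standard_normal(100_000)
true_q = -1.6449  # 5% quantile of the standard normal
print(quantile_score(y, true_q, 0.05) < quantile_score(y, true_q - 1.0, 0.05))
```

The expected pinball loss is minimised by the true conditional quantile, which is what makes it suitable for the relative evaluation just described.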
This may not be such a familiar term, but it is essentially the notion that a Mincer-Zarnowitz regression, or in our case a quantile Mincer-Zarnowitz regression, tests, and we build on those regressions to test auto-calibration. We allow, as I said, for multi-horizon and multi-quantile forecasts. Our tests are based on a finite set of moment equalities, and we use bootstrap critical values. We also have some extensions: you can test stronger forms of optimality or calibration via augmented quantile Mincer-Zarnowitz regressions, and there is a multivariate version, for example to test the forecasts from the previous talk, so you can test quantile forecasts for multiple variables at the same time. We provide simulations, of course, to analyse the finite-sample performance, and we have a macro and a finance application, a growth-at-risk and a value-at-risk application, to showcase how our tests may help you in practice to evaluate your models.

Okay, let me start by introducing our basic tests, and before that let's look at the setup. The goal is of course to forecast a certain quantile, or multiple quantiles, but let's start with a single quantile. Our variable of interest is called y_t, say inflation. The target period is t, and we are standing h periods in the past, so h is the forecast horizon, and we try to forecast a certain quantile, for example the 5% quantile, so the tau-quantile of y_t given the information we have at t-h. This F_{t-h} is the forecaster's information set. We denote the quantile forecast by y-hat, where tau is the quantile level, t is the target period and h is the forecast horizon. And now we are in this multi-horizon, multi-quantile setup: we have forecasts for possibly many horizons, from 1 to capital H, and for possibly capital K different quantile levels.
Okay, what is our null hypothesis, what do we want to test? A quantile forecast is called optimal with respect to the forecaster's full information set if it equals the true conditional quantile of y_t given that information set. This notion of full optimality is usually hard to test, because we do not know the forecaster's information set, and it is usually large. So what we test in our main test is the notion of auto-calibration: the forecaster's full information set is replaced by the forecast itself, so we condition on the information contained in the forecast. This is the notion that a Mincer-Zarnowitz, or here a quantile Mincer-Zarnowitz, regression tests: optimality with respect to the information contained in the forecast. You can interpret it as saying that you should take the forecasts at face value; you should not apply any transformation to them. If they are auto-calibrated, you should take them as they are. It is a weaker notion of optimality, but it is an important property of the forecasts in itself.

So what do we do to test this? We use quantile Mincer-Zarnowitz regressions, first introduced in the quantile case in the paper by Gaglianone et al. What you do is very simple: you take your evaluation sample, you take the realisations of y_t, and you regress them on the forecasts. If we go back to the previous slide, the Mincer-Zarnowitz regression line models, or estimates, this conditional quantile, and under auto-calibration the regression line should be equal to the forecast. So auto-calibration implies the intercept should be zero and the slope should be equal to one, and this is essentially what we formulate as our null hypothesis. For all the horizons and all the quantile levels that we have, you run these regressions.
We have these population regressions, and the null hypothesis is that all intercepts are zero and all slopes are equal to one. Rejecting the null then implies systematic errors in the forecasts, a violation of auto-calibration. In the end our test gives you a single test decision for your whole forecasting approach, but we can also zoom in, as we will see in the applications, to see how we might improve our forecasts. The first way to zoom in is to look at the contributions of single quantiles, horizons, or quantile-horizon combinations to the test statistic, to see where the problems lie: is it further into the future, or is it at the outer quantiles? We can also look at the estimated Mincer-Zarnowitz regression lines to get an idea in which direction we should try to improve the forecasts. We will see that in the applications.

Okay, so we have an evaluation sample of forecasts, of size P; I guess this notation is standard in the literature. The forecasts are matrix-valued, because we have multiple horizons and multiple quantiles. So in the end we have the forecasts and the realisations, we estimate all those quantile Mincer-Zarnowitz regressions, save the estimated coefficients, and the test statistic just takes the deviations of the estimated coefficients from their values under the null.
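To fix ideas, here is one plausible formalization of the regressions, the null, and the statistic just described. The notation (the forecast written as y-hat, P for the evaluation sample size) is my own reading of the talk, not taken verbatim from the paper:

```latex
% Quantile Mincer-Zarnowitz regression for horizon h and quantile level \tau_k:
\[
Q_{\tau_k}\!\left(y_t \,\middle|\, \hat y^{(k)}_{t|t-h}\right)
  = \alpha_{h,k} + \beta_{h,k}\, \hat y^{(k)}_{t|t-h},
\qquad h = 1,\dots,H, \quad k = 1,\dots,K.
\]
% Joint null of auto-calibration over all horizons and quantile levels:
\[
H_0:\; \alpha_{h,k} = 0 \ \text{ and } \ \beta_{h,k} = 1
\quad \text{for all } h, k.
\]
% A sum-of-squared-deviations statistic of the kind described
% ("sum them up, scale by the evaluation sample size, square them"):
\[
S_P = P \sum_{h=1}^{H} \sum_{k=1}^{K}
      \left[ \hat\alpha_{h,k}^{\,2} + \bigl(\hat\beta_{h,k} - 1\bigr)^{2} \right].
\]
```

The exact scaling and weighting of the moments may differ in the paper; the point is that the statistic aggregates the deviations of all estimated coefficients from their values under the null.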
Okay, so: the estimated intercept minus zero, and the estimated slope minus one. These are the empirical moments, these m-hats, and we basically sum them up, scale them by the evaluation sample size, and square them; this is the test statistic. You can interpret it as summing up the distances from the regression lines under the null to the estimated regression lines. The asymptotic distribution of this statistic of course depends on the variance-covariance matrix, on the dependence between all those estimated coefficients. We do not estimate the variance-covariance matrix; we use a bootstrap implementation, a moving block bootstrap, and take the critical values from it. We establish the validity of this bootstrap under certain assumptions; in particular, if there is an estimated model, we assume that both the evaluation sample and the estimation sample go to infinity, with the estimation sample dominating the evaluation sample.

Okay, so this is our basic testing procedure. Let's look briefly at the two extensions we also provide. You can try to test stronger forms of optimality, where you do not only evaluate the forecasts conditional on the information contained in themselves. In principle you could test full optimality, with respect to the full information set, but then you would have to include all the variables from that information set in the regression. So instead you can run augmented Mincer-Zarnowitz regressions: if you think some variable, or vector of variables, z is important for forecasting, you can check whether the information in z was incorporated efficiently.
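Circling back to the bootstrap mentioned a moment ago: the mechanics of a moving block bootstrap over paired (realization, forecast) data can be sketched as below. This is my own simplified illustration; a real implementation of the test must also recenter the moments under the null, which I omit here:

```python
import numpy as np

def moving_block_bootstrap(data, block_len, rng):
    """One moving-block resample of a length-T array (rows are time periods):
    overlapping blocks of length block_len drawn with replacement,
    concatenated and truncated back to T rows."""
    T = data.shape[0]
    n_blocks = -(-T // block_len)  # ceiling division
    starts = rng.integers(0, T - block_len + 1, size=n_blocks)
    idx = np.concatenate([np.arange(s, s + block_len) for s in starts])[:T]
    return data[idx]

def bootstrap_critical_value(stat_fn, data, block_len, level=0.95,
                             n_boot=499, seed=0):
    """Bootstrap distribution of stat_fn under joint resampling of the rows
    (realizations together with the whole forecast matrix), and its
    level-quantile as a critical value."""
    rng = np.random.default_rng(seed)
    stats = np.array([stat_fn(moving_block_bootstrap(data, block_len, rng))
                      for _ in range(n_boot)])
    return float(np.quantile(stats, level))

# Toy usage: column 0 plays the realizations, column 1 a forecast;
# the statistic here is a stand-in, not the paper's test statistic.
rng = np.random.default_rng(42)
data = rng.standard_normal((300, 2))
cv = bootstrap_critical_value(lambda d: abs(d[:, 0].mean()), data, block_len=10)
```

Resampling whole blocks of rows jointly preserves both the serial dependence within each series and the cross-dependence between realizations and forecasts, which is why blocks, not single observations, are drawn.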
Okay, so this would be a stronger form of calibration or optimality. The null would be the same as before, but additionally the coefficients on the extra variables should be zero, because under the null the forecasts on the right-hand side already incorporate all the information efficiently. Rejecting this null would mean the additional variables contain information useful for forecasting, which should be incorporated into your models. The other extension: if you have quantile forecasts for multiple variables, our approach extends straightforwardly in that direction.

Okay, let's look at the two empirical applications. The first is a value-at-risk application from finance, which illustrates the basic test; the second is a macro application, which is multivariate and where we also apply the augmented Mincer-Zarnowitz extension. In financial risk management, value-at-risk is of course important, and usually you work with a single level, but there is no real consensus on the right level: 1%, 2.5% or 5%, say. More modern risk measures like expected shortfall look at the information in the full tail, so you may well be interested in multiple quantiles, and you are definitely interested in multiple horizons: risk management is not only about the next day. So our approach is naturally suited to this setup. In our application we use daily S&P 500 returns with a standard GARCH(1,1) model, as we have seen several times today. We generate the multi-step-ahead quantile predictions by the GARCH bootstrap of Pascual et al., which basically resamples from the estimated residuals to simulate forward from the model and then takes the empirical quantiles. The data span 2000 to 2022, we use a recursive window with initial estimation size 3000, we forecast over horizons 1 to 10, and we take the three quantile levels 1%, 2.5% and 5%. As the block length for the bootstrap we use 10, and we do some robustness checks, which do not really change the results qualitatively.

Here is a picture of the realisations and the three quantile forecasts for the first horizon; remember, we go up to 10 horizons. And this is the output of our test: the value of the test statistic, the bootstrap critical values, and the bootstrap p-value, which is 1%. So the test gives us a decision: there is strong evidence that this risk-management forecasting model is not auto-calibrated. That is the overall decision, which is what we wanted, but we can now zoom in and look at the contributions to the test statistic from single horizon-quantile combinations, inside the table, or from a single quantile, down here, or from a single horizon. If you look at this for a while, you will see that the deviations from the null are larger for the outer quantiles: the 5% predictions from the GARCH model seem to be okay, while for the outer quantiles the contributions to the test statistic get larger. And for the smaller horizons, h equal to one or two, there is not as much evidence against auto-calibration as for the larger horizons. What you can also do is look at individual p-values obtained from our tests when testing only single horizon-quantile combinations, or single quantiles or horizons; we have the individual p-values here. Why do I show this? To make clear what you would have without a joint test: you would have all those individual p-values
and you would have to combine them. So this is a major benefit of our approach: you have the overall testing decision, the p-value of one percent, and you do not need to find a way to combine all those p-values; or rather, this is the way we give you. One more way to zoom in, as I promised: you can look at individual Mincer-Zarnowitz regression lines, because you have estimated them, for a specific horizon and quantile, say for the cases that contribute most strongly to the rejection. Here is a scatterplot of forecasts and realisations; the orange line is the diagonal. This is the forecast of the 1% quantile for h equal to one, so the forecasts lie on the diagonal, and the Mincer-Zarnowitz regression line gives you the auto-calibrated forecasts, where the forecasts should have been if they were auto-calibrated. And you can see, for example, that in the more extreme cases we are too conservative: the quantile forecasts are too low, they would not need to be so low. Of course this is a linear regression; you could do it non-parametrically, but that is not the focus here. Still, you can get some idea of how to improve your quantile forecasts by looking at those regression lines.

Okay, to finish off, let's look at the macro application. I don't need to motivate quantile forecasting in macro. We explore the optimality of model-based forecasts of four US macro variables, the same variables as in Manzan (2015), who was probably one of the first to consider quantile forecasting in macro, though we use monthly, not quarterly, variables. We look at two inflation variables, the first and the last here, plus industrial production and employment, so two real variables, and we use the classical
quantile autoregressive distributed lag model (not due to ABG, Adrian, Boyarchenko and Giannone, but the one they also use), where we regress those variables on an autoregressive term and on the National Financial Conditions Index of the Chicago Fed. We use monthly data from 1984 on, so we have 432 observations; we split the sample in half for the initial estimation sample, we go up to 12 horizons into the future, so one year, and we use the 10% and 25% quantiles and the median. The block length for the block bootstrap is four.

Now let's look at the multivariate version of the test, so jointly across the quantile forecasts for all quantiles, all horizons and all variables. We get a p-value of about seven percent, so there is some evidence that not all those forecasts are auto-calibrated. We can also look at the individual series, and here we see, interestingly, that for the two real variables, industrial production and employment, there is not really evidence against auto-calibration, with p-values larger than 10%, while for the two inflation variables there is more evidence of miscalibration, with p-values of essentially 10% and zero. So one takeaway could be that the ABG approach is more or less okay for GDP growth, a real variable, but for the inflation variables, for inflation-at-risk, maybe you should look for a different model. You could also zoom in as we did in the other application, but we won't do that now.

Let's look at the other extension, the augmented Mincer-Zarnowitz test, where we check whether any of the other variables contains useful information, by simply including the three other variables in the augmented Mincer-Zarnowitz regression. Of course the inflation variables, which got a rejection of the null hypothesis of auto-calibration, still get a rejection, but for the real variables we
don't get a rejection: the other variables do not seem to carry important information about the two real variables, so no improvement of those forecasts seems possible by including the other variables in the model.

Okay, great. To sum up: we proposed a Mincer-Zarnowitz-type test for quantile forecast optimality at multiple horizons and multiple quantile levels, which we think is practically very relevant. We provided two extensions, for a stronger form of calibration via the augmented Mincer-Zarnowitz test and for multiple time series. Simulation evidence, which I didn't show you today, shows that the tests work well in finite samples, and we have these two large empirical applications to illustrate their usefulness. Thanks a lot for your attention; I'm looking forward to the discussion.

Thank you, Marc. The discussant is Laura Coroneo from the University of York.

It's been a real pleasure to read the paper. Let me start with a small introduction saying what the paper does. This is essentially a misspecification test for multiple quantile forecasts at multiple forecasting horizons, and it builds on multiple quantile Mincer-Zarnowitz regressions cast in a moment-equality framework. In general, when we do forecast evaluation for point forecasts, we do it horizon by horizon, and for quantile forecasts, quantile by quantile. What this paper proposes is a procedure for testing specification jointly over all quantiles and all forecasting horizons. The main test is for the null hypothesis of auto-calibration, which is optimality with respect to the information contained in the forecasts themselves, but it also proposes two extensions. One is the extension for
optimality with respect to larger information sets, via augmented quantile Mincer-Zarnowitz regressions. The other one, which I think is more interesting and more relevant for the work we have seen in this workshop, is auto-calibration for multiple series: if we have a model that provides forecasts for multiple series, each at different forecasting horizons and for different quantiles, they propose a test for the specification of the model jointly over all series, all quantiles and all horizons. So I am very enthusiastic about this paper.

Let me go a bit into the definition of auto-calibration as it is used here. A forecast, denoted y-hat, for the variable at time t, made h periods ahead and for quantile tau, is auto-calibrated if it is equal to the conditional quantile of y_t given the information set contained in the forecast itself. The paper proposes to use quantile Mincer-Zarnowitz regressions, which means essentially a quantile regression of the variable of interest y_t on the forecast. If the forecast is auto-calibrated, then alpha, the intercept, should be zero and beta should be one, for all quantiles and all forecasting horizons. So the null hypothesis of the test is the joint hypothesis that all the alphas are equal to zero and all the betas are equal to one, for all quantiles and all horizons; the alternative is that at least one alpha is different from zero, or one beta is different from one, for some forecasting horizon or quantile. Since the paper uses these quantile Mincer-Zarnowitz regressions, it is natural to
estimate alpha and beta using quantile regression, and this is what the paper does. Once you have the estimates, the paper says: this is our estimate, and this is what we should have under the null hypothesis, all alphas zero and all betas one. These are the moment conditions, and they have this distribution. If you wanted to do a Wald test, you would have to construct a test statistic that normalises by this variance. What they point out is that the dimensionality of the problem can become big if you have many quantiles and many forecasting horizons, and even more so if you go multivariate. So they go the other way around: they propose to take all the elements of these moment conditions, square them first, and then sum them all up. This test statistic is non-pivotal, because we did not normalise by the variance, so we need to compute critical values by bootstrap. They use a moving block bootstrap directly on the forecasts (Marc did not go into the details in the presentation), and of course it is assumed that the estimation sample is large enough that the estimation error is irrelevant. And then, as Marc said, they propose two extensions: the augmented quantile Mincer-Zarnowitz test, which tests efficiency with respect to a larger information set by adding additional regressors to the quantile regression, together with the hypothesis that the coefficients on these additional regressors are zero; and the multivariate one, for multiple variables.

So, my general
comment is that this is a very interesting paper that addresses an important issue, relevant for whoever has to make decisions about which forecast to use or which model to select, so I think this is the best place to present it. As I already said, with multiple forecasting horizons and multiple quantiles the dimensionality of the problem grows very quickly, especially if you go multivariate, so a Wald test is not feasible; I think they got around this problem in a very smart way, proposing a feasible test statistic with good properties.

I have essentially three comments: the first on the bootstrap, then on the Monte Carlo, which was not in the presentation, and then on the empirical applications. First, and I don't know if this is a comment or a clarification, reading the paper it is not clear to me how the bootstrap is implemented. For the auto-calibration test you say that you resample jointly the observations and the quantile forecasts for each tau, which I read as doing it separately multiple times, while in the multivariate extension it looks like you take them jointly. My view is that they should always be taken jointly: you should resample jointly across all quantiles and all forecasting horizons. So I don't know if this is just how it is written or how you are actually doing it; it is a question. Second, the bootstrap is described in the paper for the auto-calibration case but is skipped a bit in the two extensions. For the augmented Mincer-Zarnowitz regression, maybe you could specify how you treat the conditioning variables: do you resample them, or keep them fixed? So this is about the bootstrap.

Then, I think my main comments are about the Monte Carlo simulations, where I see a lot of scope to expand the Monte Carlo
in the paper, because it is quite thin. It is good to have a new tool, but a new tool is not like a new game: you want to see how well it works, when it works, and what the rules of the game are. Right now the Monte Carlo in the paper looks at, let's say, two sample sizes and three choices of block length for the block bootstrap, but, for example, the autocorrelation coefficient of the data-generating process is fixed at 0.6. How was this number chosen? The finite-sample performance presumably depends on this autocorrelation, so it would be interesting to see how the properties of the test change under different autocorrelation in the data-generating process. Also, for size you use the correct autoregressive coefficient, while for power you use a larger one, 0.8; it would be interesting to explore how the power of the test depends on the size of the deviation from the null hypothesis, on how wrong the data-generating process is, and perhaps also on the autocorrelation of the process. And there are three choices of block length in the Monte Carlo, but for the empirical application with financial data, the value-at-risk one, the chosen block length is not included in the Monte Carlo. My guess is that the appropriate block length depends on the autocorrelation coefficient, so maybe this could also be explored.

Then point two, which I think is relevant: this is an interesting approach that allows you to test lots of moment conditions, and it would be interesting to know how far we can go. In the Monte Carlo you have three quantile levels and four horizons, so 12 moment conditions. When does the test break? Can we go up to 20, 50, 100? This is also
interesting to know, especially for whoever wants to apply the test. And since we are in a central bank, we deal with macro data: the smallest sample size in the Monte Carlo is 120, and with quarterly data 120 is a lot; indeed, in your macro application you use monthly data. So what happens with smaller sample sizes? My final comment is on the empirical applications. I like a lot this idea of recalibrating the forecasts, and I think it could be explored a bit more in the paper: what do the recalibrated forecasts look like, and how could you combine the recalibrations that you do for different quantiles at different forecasting horizons? Since I am almost out of time, the remaining minor points I can tell you later. I just want to say that this is a very interesting paper that addresses a very important issue and proposes three tests: one for auto-calibration, one for optimality with respect to a larger information set, and one for auto-calibration of multiple series. And to conclude, the paper has a hidden gem in the appendix, the horizon monotonicity test, which I find very interesting, and it is a bit frustrating for it to be in the appendix. I was wondering if you could maybe set it aside in another paper, with a Monte Carlo and a bigger application, because I found it very interesting. I know that appendices evolve with the paper, but I wonder why it is in the appendix.

Should I say it again, jumping in from the ECB? Very nice presentation, many thanks. Actually, I was inspired by one of the posters outside, which was about combination of quantiles. We are always struggling, as you said, with how to combine models, and over what horizon, within one or two years. So is this a direct application we can do? Have you thought about combining quantile models using this information
because now you can combine with an optimality criterion, with a function that optimises over horizons, for example? Thank you.

Julian Manto, Bank of England. I have a couple of questions. The first: I don't know if I missed the point, but how do your tests solve the problem of multiple hypothesis testing, especially in the multi-horizon dimension? And I was wondering whether it makes a difference how you compute the forecast at each horizon, whether it is a direct or an indirect forecast, and whether you see any difference; I guess not, but I'll let you answer. Thank you.

Thank you for the clear presentation. To me the test is a bit non-constructive: if the test rejects, then that is too bad for the modeller, but there is no guidance on which part of the specification I should change for the test to indicate a good model specification. One way to perhaps be more constructive would be to know whether the test rejected because the alphas are not zero or because the slope coefficients are not equal to one; then one can disentangle the test statistic and know why the test rejected. Thank you.

Pushing on this, and the point was made by Juan: indeed it would be interesting to have, instead of a way to test a model, a way to combine different quantile estimates. Then you also address the perspective that any model is misspecified, and this could be one way to aggregate different forecasts; I guess it is linked to the poster session we saw before. And another question on the time horizon: there are many ways to look at it. One is to look from t-h to t, but you could also do each step at a time; the steps may not sum up, but they can be valuable, seeing whether the model is doing a good job also estimating from t
minus 2 to t minus 3, and so forth: all these small forecasting steps, instead of just one direct forecast.

Okay. Thanks a lot, Laura, for this very nice discussion, and thanks for all your questions and comments; I'll try to answer most of them. The first question was on the joint resampling. I think this is not written clearly in the paper: we jointly resample the observations and the whole matrix of forecasts. The resampling of the additional regressors in the augmented Mincer-Zarnowitz regressions was a bit complicated; I would have to check how we did that, but the description should probably be extended in the paper. All the comments on the simulations I take with me, and they will be discussed with my co-authors. The simulations are indeed a bit brief; actually, for submission we put all of them in the appendix, but they could definitely be extended, and this is also what the referees remarked. On something you said: we have a simulation that is tailored to the macro application but not to the finance application, and we will add more simulations based on a GARCH model. We chose a longer block length for the bootstrap there because the financial time series is simply longer and the dependence is probably stronger than in the macro case, but we will have simulations backing that up. Okay, then two very interesting comments, the last two. On recalibration, maybe I'll explain that briefly. The Mincer-Zarnowitz regression line, of which we have seen one picture, in a way tells you what the forecasts should have looked like if they had been auto-calibrated, because it estimates the auto-calibrated forecast from the actual
forecasts. In the literature, for instance the meteorological one, this is called recalibration: taking forecasts and manipulating them so that they become better. We are thinking of a follow-up project where we take forecasts from structural economic models, like DSGE models, and try to recalibrate them, because they are interpretable but their forecast accuracy may not be as good as that of statistical models. I am not sure this leads to much improvement, and you also have the problem of gradual structural change: do the coefficients you estimate on the past really help you to recalibrate your current forecasts? But that is certainly something interesting to look into. Then we also have the horizon monotonicity test in the appendix, which I didn't mention, so thanks for mentioning it. It is essentially also a test for optimality. We have the quantile score, which we have seen a lot today, and the expected quantile score should increase over the horizons, because forecast accuracy should get worse the further into the future you forecast: you simply have less information about ten periods into the future than about tomorrow, say. We propose a test for that, which from a technical point of view is maybe more involved than this one. But usually, when you look at forecasts in practice, the loss is increasing over the horizon; sometimes you see a non-monotonicity, but that is probably noise. So we didn't really find suitable applications for that test, and we put it into the appendix. Okay, then the question on combining quantile forecasts: you should probably ask Julia, I am not really an expert on that, but you could use this,
let's say, augmented Mincer-Zarnowitz approach and run a regression of your variable on different quantile forecasts; this would give you a linear combination of them. I am sure someone has tried that already. I don't have a reference, but that is probably the simplest thing; I am not too deep into that literature. Then, your question on how we solve the multiple testing problem. In the end we give you one p-value, whereas if you had a test for every single horizon and every single quantile you would have many p-values and would have to combine them with a Bonferroni correction or something like that, and we know that you lose a lot of power doing so. Here we don't lose power, and that is maybe the main contribution, that we solve this problem. Then, on direct versus indirect forecasts: there is no difference, it doesn't matter for our test. We take the forecasts as primitives, as given; we don't ask where they come from. Okay, then the comment that the test is non-constructive. Essentially yes, in the end we have a p-value, which gives you a decision on a statistical problem, or a decision problem if you want, but tells you nothing more. However, I tried to show in the applications that we can zoom in: the test statistic is nicely interpretable as a kind of distance, and you can look at the contributions of different horizons or quantiles or combinations of them. This can give you a flavour of where something goes wrong; maybe you have a problem at every horizon and quantile, but it can give you a flavour. And then, as you said, you can look at the single Mincer-Zarnowitz regressions and you can
look at the coefficients, or you can even plot the regression lines and see how they deviate from the diagonal. If I wanted to do that, I would use a non-parametric approach, as is done in the statistical or meteorological literature: you have the scatter plot of observations and forecasts, and you estimate the Mincer-Zarnowitz regression line non-parametrically. That is called a reliability diagram or calibration plot in that literature; it is a fairly new tool for quantile forecasts, I would say, but you could always do it and then see where your forecasts could be improved. I wouldn't take the linear regression line literally, as I said, because it is only a linear regression and we have structural change and so on, but you could get some ideas on how to improve your forecasts. That was your first question, I guess, and I would have to think about the second one: the step-by-step forecasts might also be interesting to think about, and how to evaluate them.

Well, thank you. Thanks a lot, Mark, and thanks to all the speakers.
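To make the quantile score behind the horizon monotonicity discussion concrete, here is a minimal sketch. It is an invented toy example, not the paper's test: the data-generating process, the noise scale growing with the horizon, and the quantile level are all assumptions for illustration.

```python
import numpy as np

def pinball_loss(y, q_pred, alpha):
    """Average quantile (pinball) score at level alpha; lower is better."""
    u = y - q_pred
    return float(np.mean(np.maximum(alpha * u, (alpha - 1.0) * u)))

# Toy setup: forecasts get noisier as the horizon h grows, so the average
# score should tend to increase with h (the monotonicity being tested).
rng = np.random.default_rng(0)
T, H, alpha = 2000, 4, 0.9
y = rng.normal(size=T)
scores = []
for h in range(1, H + 1):
    # Correct unconditional quantile plus horizon-dependent noise.
    q_h = np.quantile(y, alpha) + rng.normal(scale=0.2 * h, size=T)
    scores.append(pinball_loss(y, q_h, alpha))
```

In this toy setting `scores` will typically be close to non-decreasing in the horizon; the actual test in the paper works with the expected scores and accounts for sampling noise.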
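The recalibration idea mentioned in the answers (fit a Mincer-Zarnowitz line to realizations and forecasts, then map forecasts through it) can be sketched as follows. The subgradient fit below is a crude stand-in for a proper quantile-regression solver, and all numbers are invented; a real application would use a dedicated routine such as `QuantReg` in statsmodels.

```python
import numpy as np

def fit_mz_quantile(f, y, alpha, iters=15000, lr=0.1):
    """Fit y ~ a + b*f by subgradient descent on the pinball loss at level
    alpha. Auto-calibration corresponds to a close to 0 and b close to 1."""
    X = np.column_stack([np.ones_like(f), f])
    beta = np.zeros(2)
    for t in range(1, iters + 1):
        u = y - X @ beta
        g = np.where(u > 0, alpha, alpha - 1.0)        # subgradient of pinball in u
        beta += (lr / np.sqrt(t)) * (X * g[:, None]).mean(axis=0)
    return beta

# Toy data: median (alpha = 0.5) forecasts that are biased and under-scaled.
rng = np.random.default_rng(1)
f = rng.normal(size=2000)
y = 0.5 + 1.5 * f + rng.normal(size=2000)
a, b = fit_mz_quantile(f, y, alpha=0.5)
recalibrated = a + b * f   # forecasts mapped through the fitted line
```

Here the fitted intercept and slope should land near 0.5 and 1.5, i.e. far from the auto-calibrated (0, 1), and the last line produces the recalibrated forecasts discussed above.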
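Finally, the "joint resampling" described in the reply (resampling the observations together with the whole matrix of forecasts) is, in spirit, a block bootstrap applied to rows of the combined data. The sketch below is a generic moving-block bootstrap with invented dimensions and block length, not the paper's actual procedure.

```python
import numpy as np

def moving_block_bootstrap(data, block_len, rng):
    """Resample whole rows of `data` in contiguous blocks, so observations
    and forecasts stay aligned and short-range serial dependence survives."""
    T = data.shape[0]
    n_blocks = -(-T // block_len)                      # ceiling division
    starts = rng.integers(0, T - block_len + 1, size=n_blocks)
    idx = np.concatenate([np.arange(s, s + block_len) for s in starts])[:T]
    return data[idx]

rng = np.random.default_rng(2)
y = rng.normal(size=200)              # observations
fc = rng.normal(size=(200, 3))        # hypothetical forecast matrix (3 horizons)
joint = np.column_stack([y, fc])      # resampled jointly, as in the reply above
boot = moving_block_bootstrap(joint, block_len=8, rng=rng)
```

Stacking the observations and forecasts column-wise before resampling is what keeps each bootstrapped observation paired with its own forecasts at every horizon.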