So let's move on to the second paper of the session: Rainer Schüssler from the University of Rostock on the use of textual data. You have about 25 minutes.

Thank you for having our paper on the program of this great conference. This is joint work with Philipp Adämmer and Jan Prüser, and it's about the impact of adding textual predictors for macroeconomic tail risk. Given the audience, I can be brief on the motivation for quantile forecasts. Quantile forecasts have the advantage of allowing for a quantile-specific predictive relationship between the target variable and the covariates. And usually, as economists, we are more interested in the extreme economic periods, in the downside risk, and the literature has moved in this direction, from point forecasting to quantile forecasting. Another recent development in macro forecasting is the use of textual data. Textual data have the advantage of being available in a timely manner, and some papers have already shown that they might embed incremental information in addition to hard economic predictors. But most of those studies use them in settings that only look at point forecasts. We look here at the potential added value of textual data for quantile forecasts. What we do in this paper, on the applied side, is explore the role of textual predictors for quantile nowcasts and one-step-ahead forecasts for monthly data. We look at linear and nonlinear models. For the linear models, we look at Bayesian quantile regressions with different shrinkage priors, and our nonlinear methods are Gaussian process regressions and quantile regression forests. As target variables, we look at employment, inflation (total CPI), industrial production, and consumer sentiment. Let me start with the models that feature a linear predictive relationship. The Bayesian quantile regression can be stated in this form, where the error terms have a mixture representation, and we can state our shrinkage priors in a general form.
So in this setup, we have the possibility to include global-local shrinkage priors. This term psi_j is for the predictor-specific shrinkage intensities, and here we have the global shrinkage intensity. We consider three specifications. The first one is the ridge prior. As you can see, here we only have a global shrinkage part and no local shrinkage part, because the psi_j are all set to 1. So this prior doesn't allow for very rich shrinkage patterns; it's only the global shrinkage term. It would be consistent with a dense representation of the prediction problem, where we have many weak predictors. In contrast, we have the horseshoe prior, which doesn't require the user to elicit any hyperparameters. We have half-Cauchy distributions for both terms. This would be consistent with a sparse representation of the prediction problem, because the horseshoe prior has fat tails and a spike at zero: it gets rid of most of the predictors, leaving only a few strong ones. And the lasso prior is the most flexible one, which can be between the sparse and the dense representation, so it allows for the richest shrinkage patterns. Gaussian process regression appeared yesterday in Massimiliano Marcellino's talk, and he already gave a motivation for this model class. We do it in a quite standard fashion, with squared exponential kernels, so the kernel parameters w1 and w2 control the smoothness of the function. Our second nonlinear method is quantile regression forests, a nonparametric frequentist method. What is different from standard random forests is that standard random forests try to approximate the conditional mean, while here we try to approximate the conditional distribution. So when you drop down a predictor vector x, which might be high-dimensional, a standard random forest would only store the conditional mean in each tree.
Here, you have to store all the observations, because you want to approximate the conditional distribution. It's an extension by Meinshausen from the standard random forest to the quantile regression forest, but the intuition is the same: you grow a large collection of trees, using only a subset of all the predictors at each node to decorrelate the trees, to ensure that you have heterogeneous trees and thereby reduce the variance and improve on the bias-variance tradeoff. The macro predictors we get from the FRED-MD data set. But let me introduce how we obtain our textual predictors. We obtain them as news attention measures from a large collection of newspaper articles. Let me briefly give you an intuition of what topic models do. Topic models try to capture the stochastic process which most likely generated the text. As documents, we use roughly 800,000 newspaper articles from the New York Times and the Washington Post. This is our so-called corpus, the collection of documents. The idea is that each document is a mixture of latent topics, which exist outside the specific documents. And each topic is a probability distribution over words, over a set vocabulary. So each word is part of each topic, but with different probabilities. Say the first topic is about genetics articles: we have prominent words like gene, DNA, genetic. And of course, as a user, you have to label it; the topics are not labeled by the machine. All documents share the same topics, but with different probabilities. So it is the job of the topic model to estimate these quantities: we have to estimate the probability distribution of the words within each topic, and we have to estimate the topic proportions for each document. Of course, we don't observe these quantities.
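Before moving on to the text data, here is a minimal from-scratch sketch of the quantile-forest mechanics just described, following Meinshausen's weighting idea (an illustration on simulated data, not the implementation used in the paper): fit an ordinary random forest, then reuse the leaf memberships to form a weighted empirical distribution for each test point.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def qrf_predict(forest, X_train, y_train, X_test, q):
    """q-quantile prediction via Meinshausen-style observation weights."""
    leaves_tr = forest.apply(X_train)          # (n_train, n_trees) leaf ids
    leaves_te = forest.apply(X_test)
    order = np.argsort(y_train)
    y_sorted = y_train[order]
    preds = np.empty(len(X_test))
    for i, row in enumerate(leaves_te):
        w = np.zeros(len(y_train))
        for t in range(leaves_tr.shape[1]):
            mask = leaves_tr[:, t] == row[t]   # training co-occupants of leaf
            w[mask] += 1.0 / mask.sum()        # each tree spreads weight one
        w /= leaves_tr.shape[1]
        cum = np.cumsum(w[order])              # weighted empirical CDF
        preds[i] = y_sorted[np.searchsorted(cum, q)]
    return preds

rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(500, 3))
y = X[:, 0] ** 2 + rng.standard_normal(500)    # nonlinear signal plus noise
rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=10,
                           max_features=1, random_state=0).fit(X, y)
q10 = qrf_predict(rf, X, y, X[:5], 0.10)
q90 = qrf_predict(rf, X, y, X[:5], 0.90)
```

The `max_features=1` setting mirrors the decorrelation point above: each split only sees a subset of the predictors.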
We only observe the words. Then it is the job of the model, here the correlated topic model, to estimate the probability distribution of the words within each topic and the topic proportions for each document. And what we work with, what our predictors are, are those guys here, the topic proportions. It's about the attention a given topic receives at a given point in time. So if, for example, we have a topic where words about inflation and prices have a high probability, and the documents have a high topic proportion for this, say, inflation topic, then media coverage at that point in time is high for inflation. We use this as a news attention measure. The advantage of the correlated topic model, in contrast to the more standard, and more famous, latent Dirichlet allocation model, is that we can capture that some topics tend to occur together within one document. So if a document is about inflation, it might be highly probable that a related topic like commodity prices appears as well, while an unrelated topic is less likely. The topic proportions are only about the content of the articles, not about sentiment. There are different directions you could go in text analysis; you could, of course, do both, extracting measures of sentiment and of content. Here we focus on the content. But of course, it's possible to do some tone adjustment, to add sentiment predictors as well. When extracting the vocabulary for the topics, we only use the documents until our evaluation period, which starts in October 1999. The first documents are available in 1980, but to set up the topics, we only use the vocabulary until 1999 to avoid any look-ahead bias. Then, in a given month, we estimate the out-of-sample topic proportions and simply take the average over all documents in that month. So here are examples from our 80 topics. As a user, you have to specify the number of topics. We go with 80, but of course we have robustness checks with 100 and other numbers; it's not very sensitive.
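As a toy illustration of this pipeline (scikit-learn's LatentDirichletAllocation standing in for the correlated topic model, with made-up documents and dates), the monthly news attention measure is just the average of the per-document topic proportions within each month:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical mini-corpus with a document-to-month mapping
docs = ["prices inflation commodities rise", "gene dna genetic research",
        "inflation prices energy costs", "dna gene sequencing study",
        "commodity prices inflation surge", "genetic gene dna lab"]
months = ["1999-10", "1999-10", "1999-11", "1999-11", "1999-12", "1999-12"]

counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)          # per-document topic proportions

# Monthly news attention: average topic proportions over each month's documents
attention = {m: theta[[i for i, mm in enumerate(months) if mm == m]].mean(axis=0)
             for m in dict.fromkeys(months)}
for m, a in attention.items():
    print(m, np.round(a, 2))
```

Each row of `theta` sums to one, so the monthly averages are again proper topic proportions, one attention series per topic.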
From our 80 extracted topics, take for example this one, topic 71: we could label it inflation, because the most prominent words are prices, commodities, and so on. In the appendix of the paper, we list the most probable words in each topic. And it looks like what you would expect. For inflation, the media coverage goes up here at the end of the sample; the housing topic rises during the great financial crisis. Or here you have the Gulf War, and different debt crises, Mexico, the euro, and so on. Those are examples of our news attention measures, which we use as predictors. Let me briefly give you the forecasting setup. We have three different sets of predictors: the FRED-MD data only (we use vintage data to avoid look-ahead bias from revisions), one setting where we use textual predictors only, and then a combined set where we use both of them. In each setting we include 12 lags of the respective target variable. For nowcasts in a given month t, say we are at the end of December, we can use the macro predictors from November released in December, the textual data from December, and the financial predictors in the FRED-MD data set, like exchange rates, which are also from December. Similarly, for a one-step-ahead prediction, say we are at the end of December and want to forecast January, we use the macro data from November released in December, textual data from December, as well as financial predictors from December. So textual data have the advantage of being available in a more timely manner. We start in 1980, where the first news articles are available, and then we go with recursive estimation on an expanding window, starting the evaluation period in October 1999. And we use the standard measure for evaluating our predictions, the quantile score, at different levels: in the tails, which might be more interesting, but in the center as well.
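The evaluation metric just mentioned can be written down in a few lines: the quantile (pinball) score at level tau, reported relative to a benchmark. The series and forecasts below are simulated purely for illustration.

```python
import numpy as np

def quantile_score(y, q_pred, tau):
    """Average pinball loss of quantile forecasts q_pred at level tau."""
    u = y - q_pred
    return float(np.mean(np.where(u >= 0, tau * u, (tau - 1) * u)))

rng = np.random.default_rng(2)
y = rng.standard_normal(200)
# Toy "model" forecast near the true 10% quantile, vs. a poor benchmark
model_q = np.quantile(y, 0.1) + 0.05 * rng.standard_normal(200)
bench_q = np.full(200, np.quantile(y, 0.5))

rel = quantile_score(y, model_q, 0.1) / quantile_score(y, bench_q, 0.1)
print(f"relative quantile score at tau=0.1: {rel:.2f}")
```

A ratio below 1 means the model beats the benchmark at that quantile, which is exactly how the results on the next slides are read.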
So here are the results for the nowcasts, and on the next slide, for the one-step-ahead forecasts. Everything is measured relative to an AR(1) benchmark; it doesn't matter too much if it's AR(1), AR(2), or AR(4). In the columns, we have the different models: the different shrinkage priors (horseshoe, lasso, ridge) and the two nonlinear models, Gaussian processes and quantile random forests. In the rows, we have the different target variables: employment, inflation, production, and sentiment. The benchmark corresponds to a quantile score of 1, so if we observe a quantile score below 1, we do better than the AR(1) benchmark. And we have three different predictor sets: the yellow one is the FRED-MD-only data set, in green we have text-only, and blue is the combined data set. If you see a dot which is colored inside, that means it's significantly better in terms of the Diebold-Mariano test at the 10% level for a one-sided test. So what are the main takeaways? First of all, it seems that textual data improve the predictions in the tails. For example, look at sentiment with the combined predictor set of textual data and FRED-MD data: the blue line does better in the tails than the FRED-MD data alone. It's not always the case, but on average, across the different target variables and methods, we observe that, especially in the left tail, text data add some value, consistent with the notion that during COVID or the financial crisis, they are available in a timely manner and add value on top of hard predictors. What we also observe is that they might be more important in the linear models; those tend to become competitive once we add textual data. For example, here, inflation with the ridge prior: a large improvement when we add textual data in the left tail.
We observe, in general, better performance for the nonlinear models, the Gaussian process and the quantile random forest, and especially for Gaussian processes, which seem to be the most powerful method here. So nonlinearities turn out to be important. And our interpretation of why textual data add more in the linear models, especially ridge, is that perhaps they compensate for the lack of complexity in linear models: given that complexity is missing in the linear models, the textual data add more there than in the nonlinear models. So we have these different shapes: hump-shaped quantile-score curves for the nonlinear models and U-shaped curves for the linear models. Again, especially the nonlinear models do very well in the tails on average. We observe large gains from textual data, especially for predicting consumer sentiment, which is consistent with previous findings in the literature that news data are important for households forming expectations. And on average we can say that, while not in every case, textual data add something; and even when they don't lead to more accurate forecasts, they don't hurt. It's similar for the one-step-ahead forecasts, though it's more difficult there to achieve low quantile scores; but on average, we see the same patterns. So, to summarize the results: we especially have gains in the tails and for the linear forecasting models. Interestingly, the ridge shrinkage prior does better on average in the nowcasts and the one-step-ahead forecasts, even though it's the simplest of the three shrinkage priors, only allowing for global shrinkage. And this is consistent with a dense representation of macro prediction tasks: many variables are weakly important. We don't have very strong predictors, but many of them are relevant in some sense, just not that much.
We tried to shed some light on which variables are important, and to get a sense of variable importance across heterogeneous and nonlinear models, we did the following. We approximated our quantile predictions with a lasso-type regression, which has been suggested in the literature. So you see a standard lasso regression: here are our quantile forecasts and here are the predictors. Which variables explain our predictions? We did it for the 10% quantile, because the left tail might be more interesting than the center of the distribution. And what you can see here: the bars tell you the share of non-zero coefficients, the survivors of the lasso. The red bars correspond to the FRED-MD data and the green bars to the textual data. It's quite balanced; on average, we don't have a clear overweight of one or the other. What we can see, again, is that for sentiment, the textual data turn out to be quite important, which is consistent with the previous slide, where adding textual data was especially important for sentiment. And once again, we have evidence of a dense representation of the prediction problem, because we see many non-zero coefficients. If we play around with the tuning parameter alpha, then it can happen, for example here for the quantile random forest, for the nowcast and the one-step-ahead forecast, that none of the predictors is relevant at all. So this is evidence for a dense representation with many weak predictors. For the key takeaways, I won't reiterate them, but maybe I'll say what could be future directions for this work. Obviously, on the data side, we could extend the analysis with sentiment scores. I think it would also be interesting to add Survey of Professional Forecasters data.
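The variable-importance device just described can be sketched as follows: regress the model's quantile forecasts on the full predictor set with a lasso, then count the surviving non-zero coefficients per predictor group. The data here are simulated stand-ins, not the paper's forecasts.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p_macro, p_text = 250, 120, 80
X = rng.standard_normal((n, p_macro + p_text))
# Pretend the 10%-quantile forecasts load on one macro and one text predictor
q_forecast = X[:, 0] - 0.8 * X[:, p_macro] + 0.1 * rng.standard_normal(n)

lasso = Lasso(alpha=0.05).fit(X, q_forecast)
nz = np.abs(lasso.coef_) > 1e-8            # the "survivors" of the lasso
share_macro = float(nz[:p_macro].mean())
share_text = float(nz[p_macro:].mean())
print(f"non-zero share, macro: {share_macro:.2f}, text: {share_text:.2f}")
```

The two shares correspond to the red (macro) and green (text) bars on the slide; raising `alpha` shrinks both groups toward zero, which is the tuning-parameter exercise mentioned above.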
And on the methodological side, I was inspired yesterday by Julia Montauan's presentation about using quantile scores for density forecast combination or model combination. In my view, those are the most interesting avenues for future research. But I'm curious to hear yours, and Jasper's. Thank you.

Now we have a discussion by Jasper de Winter from De Nederlandsche Bank.

First of all, thanks a lot for the invitation to discuss this really interesting paper by Rainer and co-authors. Basically, the main idea of the paper is to explore the benefits of textual predictors for monthly tail risk forecasts. And they do this for four variables: employment, industrial production, inflation, and consumer sentiment. They use a correlated topic model, as Rainer explained, on a large database of English news articles. And they analyze linear and nonlinear models, so you can nicely see the difference between the linear models and the nonlinear models. I think one of the contributions is that they use textual indicators not only to forecast the mean; most models with textual data don't model the quantiles, so that's something new. And I think there are two main insights. There are a lot of insights in the paper, but the two main takeaways for me are that nonlinear models have higher nowcasting and forecasting accuracy in the tails of the distribution than linear models, and that these news topics are especially beneficial for forecasting the tails of the forecast distribution. So they're not so much informative for improving the forecast of the median, but more for the left tail. I have some comments on the paper, and I picked out four that I think are interesting. I think it's fair to say it's a rather empirical paper, an applied paper, so most of my comments will be on the setup of the empirical part, but I also have a technical comment.
All right, my first comment is about the robustness of your exercise to shifts in the timing of the real-time exercise. What do I mean? I saw you use the FRED-MD database, and I wasn't sure what the exact timing of the database was. So I emailed the maintainer of this database, Michael McCracken from the St. Louis Fed, and he said: well, we download all the data on the last business day of the month. That's the timing of the database. Then you use around 100 monthly indicators, among them 21 financial indicators, plus the 80 news topics. And you impose that the macroeconomic indicators have a one-month publication delay, while the financial indicators and the news indicators have no publication delay. But you have to remember that the outcomes of your analysis are then only valid on the last day of the month, right? Because if you shift the exercise by a week or two weeks, the outcomes might change. Why? Because, as Marta Bańbura, who is not here today, and also Gerhard Rünstler showed in a paper from 2011, I think, it depends a lot on the exact day in the month on which you make the forecast. And a recent paper by Knotek and Zaman, from 2022, shows this for inflation forecasting; they shift the forecast origin. For instance, in the US, I think the unemployment figures are released about seven days after the end of the month, and industrial production two weeks after the end of the month. So you can imagine that the incremental power of these news topics might decrease, right? If you shift the analysis by two weeks, it will be a different picture. So what I did is basically just that. I took all the indicators you use from the FRED-MD database, looked up the exact publication calendar for these indicators between April and mid-May, and saw what happened to the publication delays. So in your database, in April, you have 21 financial indicators with the same publication delay.
Well, if you shift by two weeks, the number of macroeconomic indicators known in your database increases by 54. That means, roughly, that in your paper you have 21% of the hard indicators known; if you shift by two weeks, that increases to 75%. If these indicators are highly correlated with the things you forecast, it's very likely that this will decrease the forecasting power of the textual predictors, OK? This brings me to my second comment, if the presentation works with me. And that's about the indicators you include. In FRED-MD, there's only one survey indicator, while we know from previous research that survey indicators are very good indicators. Why? Because they're available in a timely manner, just like your news indicators. So this is a fierce competitor to the news topics, but it's not included in the analysis. For instance, if you forecast inflation, you only have the Michigan Consumer Sentiment Index. So my advice would be to include more of them, and it's very easy, because these indicators are all available. The only thing is they're not in FRED-MD; but these survey indicators are not revised, so basically you can include them. And some of these indicators are even available before the end of the month. So again, this will probably decrease the value added of the textual indicators, which is something also found in other research. These were two comments on the data part and how you should interpret the results, plus my advice to also include survey indicators. The last empirical comment I would like to highlight, which was also discussed at length yesterday, is the impact of outliers in your data set. What I show here are the four target variables that you have: industrial production, inflation, employment, and consumer sentiment. And I plotted bands of roughly three standard deviations of each indicator over the period 1980 to August 2008, so just before the financial crisis.
And you see that in the estimation of your model, this will probably play some role. You do an expanding window, so up until 2008, there's not much of a problem. Then you get the financial crisis, which might disturb the coefficients in your model somewhat. Actually, only COVID might be really problematic, but it's at the end of your sample. And then again, your quantile regression somehow insures you against these outliers, because you don't do a regression to the mean; you use quantile regression, so you're somewhat insured. But what might be a problem is the way you show your results: you calculate the quantile score over the entire period. So my suggestion would be to show the quantile score in a cumulative sense, as was also shown yesterday, so a rolling quantile score. Then you can see where your news indicators add the most value. And usually, from previous research, we see that the relative forecasting performance improves around crises. That's when you really need these timely indicators; in tranquil times, it's very hard to beat an AR model, but during crises, it really helps. So that would be my suggestion. Then I have, I think, two more minutes for the comment on the model. OK, so in your model, you basically fix the topics, or the words that are in the topics, over the period 1980 to 1999. I think that's a problem when you do recursive estimation and want to see what happens over time, because a word like Brexit is basically not inside your topic model. Is that important? Well, we did some research for the Netherlands, and there we changed the topic model: we introduced dynamics into the word-topic distribution. And you can see what happens. So from 2000 until 2013, this is a relevant measure of the importance of this word; I don't want to go into the details.
But then there was a speech by Cameron in June 2013, where he announced that there would be a referendum on Brexit. Right after that, if you have a dynamic word-topic distribution, you see the importance of this word increase; and later on, when the referendum was announced, it increased again. In your context, this can be problematic, because the topic proportions that you have in your model will probably change. If you don't include the word Brexit, it doesn't allow for an increase in that topic. To make this more clear: here I show you how the word-topic distribution changed for the topic "financial markets" in our model. On the left side is the first time slice, and on the right side is the last time slice. You can see, for instance, the word ECB; of course, it didn't exist before 1999, so the first slice doesn't capture it, but the last slice does, and you see an enormous increase in this word. In your case, this would also increase the topic proportion of the financial markets topic. So I think it would be interesting to see if you can do something like that. Overall, I think it's a very nice paper. It combines state-of-the-art Bayesian techniques and topic modeling, and I think it really stimulates further discussion on tail risk nowcasting and forecasting using textual data. So thanks a lot.

Thank you. We have time for maybe one or two questions from the audience.

My comment is that I'm curious why you didn't include the Wall Street Journal. You're including two newspapers that are typically considered on the left in the US, so I'm wondering if that's introducing some bias. I don't know if you have thoughts about that.

I'm Wajaj Mazur from the National Bank of Poland. I have a question as a follow-up to the excellent discussion. Perhaps those topics shouldn't be treated as independent variables, but as things that move the parameters on the FRED-MD data.
So I am saying you could perhaps use some kind of nonlinear combinations, because looking at this Brexit example, it looks like a structural change in a relationship, not a single bump. So perhaps you could look at it that way. And I would be interested in longer horizons, right, how it works there. Thank you.

Just a follow-up comment. I think sentiment measures extracted from these newspapers might help quite a bit in terms of prediction. We've seen in a variety of settings that sentiment measures geared toward uncertainty and risk, and perhaps more focused on economic outcomes, add value. And related to Pablo's point: I don't know if you use all articles in these journals, but you might add a lot of noise. So I don't know if you can subset on articles that discuss economic conditions, or that are more related to the variables you're trying to predict.

Hi, thank you. This is Elena Bobeica from the ECB. I was wondering, when you try to forecast, for instance, inflation, did you use all the variables that you got from the textual analysis, based on all the topics, or just several? And if you used all of them, which ones did you find more important for inflation? It would also be interesting to know whether it is the real-side-related variables or the nominal ones that matter for inflation particularly. Thank you.

Okay, thank you. So maybe, Rainer, if you want to quickly answer the questions.

Thank you for the comments and the suggestions. All of them are valid points, and some of them are particularly interesting, like including the survey indicators and doing a dynamic word-topic model, so having a more dynamic treatment. Concerning why we didn't include articles from other newspapers: those two were part of the LexisNexis database we used. We could have a look at whether we could include more newspapers.
For example, for inflation, the topics that turned out to be important were more on the nominal side. Sentiment is something we can include; there would be several ways to add sentiment scores. I think the interaction of hard economic data and the textual predictors is also something we might have a deeper look into. And I have to think about your first comment, on the last business day and the publication delays; we could be more precise about that. If we address those points, some of them will probably decrease the value of the textual indicators. If we include survey data, I'm quite sure the value will decrease; but let's see whether we still have some gains, at least in the left tail. On the other hand, with more dynamic models, perhaps we can gain on the other side; perhaps we currently miss something because words like Brexit are not in there. So let's see what happens if we include those changes. Thank you.

Okay, thank you. Thanks a lot for this presentation.