Hello, everyone, and thanks for coming. I'm Tomasa Rodrigo from BBVA Research, and together with Álvaro I'm going to present our work on measuring and monitoring central banks' communication strategy using natural language processing and topic modeling. We have structured the presentation in two parts. First, I will briefly motivate the analysis and describe the data we use and the methodology we follow. After that, Álvaro will focus on the main results, showing you some of the applications we built for several central banks. Before starting, why is this important? Did you know that around 80% of the content of web pages on the internet is just text? Think of the information in the media, in social media, in blogs, or in economic and financial reports. This huge amount of data significantly increases the information available for analysis, and we need to take advantage of it, because it will help us understand society, the economy, and the world. So why have we not been taking advantage of it? Nowadays we can exploit all this data using natural language processing, also known as text mining or computational linguistics. In simple words, what does this methodology do? It quantifies text: it extracts meaning from the letters and converts them into numbers. Combined with traditional economic tools, this novel approach will help us improve and expand the range of applications in economics and in any other field. There is huge potential in exploiting this new data to enrich our knowledge of the economy, the world, and how society behaves. To illustrate that, today we are going to show you an application that measures central banks' communication strategy. What is this strategy? By communication strategy we mean the information that central banks release about the economic situation and about current and future policy decisions.
These decisions are really important because they move financial markets and play a key role in the economy. Until now, we had only qualitative information about how central banks think and behave, but today we can quantify this communication strategy and track its impact and evolution over time. So let's take advantage of that 80% of data that we have not yet exploited. This slide summarizes the whole workflow we follow in this project. First, we start from the data we collect. Then we explain how we preprocess the data: how we clean and transform it. Finally, we apply topic modeling and sentiment analysis to obtain clear insights from all this unstructured text. We will go through each of these steps during the presentation. Starting with the information: as I said before, we analyze central bank wording on monetary policy through the communication reports published on their websites. To collect them, as you may imagine, we use web-scraping techniques. Basically, we focus on three different types of documents: press releases or statements, minutes, and speeches. What do we mean by a statement? It is a short report about the interest rate decision, released immediately after the monetary policy meeting. Some central banks also hold a press conference where the president of the central bank explains the monetary policy decision and answers questions from journalists. Some days, or even some weeks, later, a more detailed document is released explaining the reasons behind the decision, together with an overview of financial markets and of economic and monetary developments during the month. This longer document is called the minutes.
And finally, we also analyze the speeches and articles that senior central bank officials produce during the month, which are published on the central banks' websites. All of this information about a central bank, taken together, is what we will call the corpus from now on, okay? Once this corpus is identified, we clean, transform, and preprocess all this data, converting it from unstructured text into a final database that is ready for analysis. As you can imagine, and as those of you who have worked on this type of process know, this is the most time-consuming part; it is really painful to work with. We took several steps to prepare the data for the final analysis. To summarize: first, we break the documents into tokens. Tokens are just a list of words, numbers, punctuation, and symbols. Then we use these tokens to get the root of each word, that is, we apply stemming. We remove all punctuation and we also remove stop words. Do you know what I mean by stop words? These are words that appear very frequently, or very infrequently, across the documents, and the literature has shown that such words cannot distinguish the content of one document from another, so we eliminate them to reduce dimensionality. Once the data is more or less prepared, the final step is to construct the document-term matrix. Do you know what that is? It is a large matrix containing the frequency of each term in each document. In that step, we convert the unstructured data into numbers, and now we are ready for the analysis. We can move on to the next part.
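The preprocessing pipeline just described (tokenize, stem, drop punctuation and stop words, then build a document-term matrix) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual pipeline: the two toy sentences, the tiny stop-word list, and the crude suffix-stripping stemmer (standing in for a real Porter stemmer) are all illustrative assumptions.

```python
import re
from collections import Counter

docs = [
    "The committee decided to raise the interest rate.",
    "Inflation expectations remain anchored; inflation risks persist.",
]

# Tiny illustrative stop-word list; real pipelines use much larger ones.
STOP_WORDS = {"the", "to", "a", "of", "and", "in"}

def stem(token):
    # Crude suffix stripping, standing in for a real Porter stemmer.
    for suffix in ("ations", "ation", "ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    # Tokenize on letter runs: lowercases and drops punctuation/numbers.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

# Document-term matrix: one row per document, one column per vocabulary
# term, each entry the term's frequency in that document.
processed = [preprocess(d) for d in docs]
vocab = sorted({t for doc in processed for t in doc})
dtm = [[Counter(doc)[term] for term in vocab] for doc in processed]
```

At this point the unstructured text has become a matrix of numbers, which is exactly the input the topic model expects.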
In the next part, we explain the topic modeling approach that we use for all these texts. Here you can see a diagram of the main model we use. I'm not going to go through each technical step of the model, but I will be really happy to discuss it if you have any questions at the end of the presentation. Since we want to show you the applications, I will just give you the main idea behind the model. We base our analysis on Latent Dirichlet Allocation. This is an unsupervised machine learning algorithm, based on a Bayesian model, that assigns a probability to each word that appears in a document. That is, it gives us the optimal allocation of words into a certain number of topics across all the documents. So at the end of the day, depending on how words co-occur with each other, I obtain a vector of topics discussed in each article. Without reading the article, I can get a picture of which topics appear in it. Now I know what the article is talking about, but the other part that is really important to know is how they talk about those topics, right? For that, we apply sentiment analysis. We rely here on the lexicon approach: we use different dictionaries to capture words with a positive connotation and words with a negative connotation in each article. Okay, so here you see two of the dictionaries we use: the Loughran-McDonald dictionary, which is focused on financial markets, exactly the domain we care about, and the Federal Reserve's financial stability dictionary. You can see that we have a large number of words, shown here just as an example, classified into those two groups.
So at the end of the day, I construct an indicator that gives me the average sentiment of an article: the number of words with a positive connotation, minus the number of words with a negative connotation, divided by the total number of words. This measure can range from -100, meaning no positive words appear in the article, to 100, the opposite, meaning no negative words appear. Typical values range between -10 and 10, with zero indicating neutrality. But as you can imagine, a zero could mean that the article uses neutral language, or it could be that positive words are offset by negative words, so keep that in mind when interpreting the results. Perfect. Now we have a comprehensive picture of the whole process we follow to get the results we care about. We analyze three central banks: the Federal Reserve, the European Central Bank, and the Central Bank of Turkey. Here you can see an example of the words that appear most frequently when these banks speak about monetary policy: terms like percent, interest, borrow, and overnight show up across all three central banks. It is also important to note that we do not treat words as independent; we take the relationships between them into account. Why? Because there are compound terms like "monetary policy" whose words appear together, and we want to capture that. In the network you can see terms whose words usually co-occur, such as monetary policy, industrial production, and current account balance, and we consider them jointly. Then we study the evolution of all those terms over time. But here I would like to point something out. Suppose we take the easy approach, a simple frequency analysis that just returns the words appearing most frequently in the articles.
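The sentiment indicator just defined, (positive words - negative words) / total words, scaled to the range [-100, 100], is small enough to write out directly. A minimal sketch, with one assumption flagged: the tiny word lists below are placeholders standing in for the Loughran-McDonald and Federal Reserve dictionaries used in the talk.

```python
# Placeholder lexicons; the real analysis uses the Loughran-McDonald
# and Federal Reserve financial-stability dictionaries.
POSITIVE = {"improve", "gain", "strong", "recovery"}
NEGATIVE = {"decline", "risk", "weak", "crisis"}

def sentiment_score(tokens):
    """Average sentiment: (pos - neg) / total, scaled to [-100, 100]."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 100 * (pos - neg) / len(tokens)

article = "strong recovery despite persistent risk of weak demand".split()
print(sentiment_score(article))  # prints 0.0: positives and negatives offset
```

Note how the example lands exactly on the caveat from the talk: the score is zero not because the language is neutral, but because two positive words cancel two negative ones.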
Across the years, we would capture almost the same words: inflation, growth, prices. You see there is little variation from one year to the next. However, as I explained before, we take another approach: we introduce a model that builds on this frequency analysis and also takes the relationships between words into account, giving a better picture of the topics that appear in each article. Here, each word has a probability of being included in each topic, and a document is a mixture of topics, right? So this is a good way to identify words that appear together and may be related to one topic or another. With all this in mind, Álvaro will now show you the main applications we obtained for these three central banks. Thank you very much, Tomasa. Well, at BBVA Research we are an economic analysis department. We work intensively with these techniques and use artificial intelligence as much as we can, but our final purpose is to understand things and to answer questions. So at the end of the day, our simple, final goal is to understand: what are the central banks talking about? For that we use the dynamic topic model, to identify the topics and, which is also important, how they relate to each other. How are they talking about these topics? That is what the sentiment analysis Tomasa explained addresses. And, also important, who is talking, because we can identify the different monetary policy committee members of the central banks and distinguish how each of them talks. From the dynamic topic analysis, the first thing we identify is these word clouds. Normally you fix a number, say 40 word clouds, 40 elements or vectors of words, that the analysts then have to label. So there is room here for the human analyst: it is not all left to the machines; artificial intelligence does part of the work, and we complement it, because we have to interpret and label these word clouds.
For example, you can see here the European Central Bank and the Central Bank of Turkey, and you will see that every central bank has its own specific language. The European Central Bank, for example, talks of course about the economy and about monetary policy, but it also talks about things that are important for it, such as the banking union, monetary union integration, and quantitative easing. If you remember, years ago, when interest rates reached zero, they had to use other tools to fight the crisis. In the case of the Central Bank of Turkey, which is an emerging-market central bank, we identify other word clouds related to global economic flows, global markets, economic activity of different kinds, monetary policy, and inflation. We are analyzing central banks here, but think of any document or report you use in your own work, company reports for example: you can do the same. Once we identify the topics, we group them into categories, and you can see that these topics are not static; they evolve over time. Here you can see the period from 2006, then the financial crisis in 2009 and 2010, and then the recovery from 2016 onwards. In the case of the European Central Bank, what you see, for example, is that the gray area increases significantly over time. What is this gray area? It is what we call non-standard monetary policy. Once interest rates reached zero, the central bank had to use other tools, especially quantitative easing. In the case of Turkey, the topics are also changing over time; you can see, for example, in the lower part of the chart, that global flows are gaining importance. We can also use networks, because networks are a useful tool to see how the topics, once identified, relate to each other. And again, everything is dynamic.
So this relationship is not the same in the pre-crisis period up to 2007 as during the financial crisis and during the recovery. Let's look at the upper part of the slide, the European Central Bank. Before the financial crisis, there was a standard relationship among the topics: standard monetary policy connected to economic activity, the single currency, and liquidity in banks. In the case of the emerging-market bank, the Central Bank of Turkey, they talk about inflation, economic activity, and monetary policy. Then something happened in the world in 2009, and also in 2012 in Europe, and everything changed completely. You see this node, non-standard monetary policy: everything changed in the upper part of the panel, its relationships with the other topics became more complicated, and this also influenced what was happening at the Central Bank of Turkey, a smaller central bank that is very dependent on the European Central Bank. After the crisis, things start to normalize again. What is important for us to understand is that after this crisis, monetary policy became much more complex for central banks, because the standard tools were limited and they had to adopt new tools, introduced first in some countries and spreading later to the rest. So this is useful for understanding that complexity. But we can also use sentiment analysis, not only to study what they are talking about, but how. Here, for example, in the case of the Turkish central bank, you can see, without any numbers, how the sentiment around economic activity and inflation evolves over time, or around economic activity and employment. Or, more specifically, how much of the report is devoted to monetary policy tightening relative to the report as a whole. At the end of the day, it would be very interesting to know: is the central bank prepared to hike rates or not? And the answer looks very different in the statements than in the minutes, as Tomasa commented.
Finally, we cross-check all of this information against manual and other analyses. Last but not least, we are also applying this analysis to the Federal Reserve, and as you can see on the last slide, we can use all of this to identify how different people talk: it is not the same for Mr. Greenspan, Mr. Bernanke, Mrs. Yellen, or Mr. Powell. Remember that you can find all of our analysis on our webpage. Thank you very much.

Okay, thank you so much. And now it's time for your questions, which we think is the most interesting part. Here we just wanted to show you one of the applications we built, but the important message we want you to take from this presentation is the importance of analyzing all this data that we have and have not yet exploited. There is huge potential to create value for your business, your university, or even your personal life. So, any questions?

Thanks for the talk. Are you using this tool in production now?

We are not using it in production, because we are really focused on economic research. We have it, and we update the index every time the central bank publishes a report, that is, every month. Until now we have used it as an input to our models, to see how they behave, to monitor all these topics, and to check the results against the literature. But the idea is to productionize it and have automatic updates of all the indices we have created. No more questions? Okay, so thank you very much, and enjoy the lunch. Thank you.