So, should we start, or do we have to wait for people? OK. So welcome, everyone; let me just take my notes. So again, I will start with a very fast recap of what we discussed last time, then try to finish what we didn't cover, and then get to new things. So let's recap. We discussed a bit the market mechanisms, so how markets function, and the dynamics of limit order books; the main point there was just to understand how things work. We discussed briefly the Bachelier model, or Bachelier's ideas, and the random walk model. And then we got to look at empirical things. I think there were two things to take home. One was the question of the diffusivity of prices; in general we discussed diffusivity, but the main take-home message is unpredictability: in a simple linear sense, price changes are unpredictable. And there was the question of the distribution of price changes. OK, I'll write "return" — this came up yesterday, so, to answer for all: return means price change. So, the distribution of returns. What we saw is that it is some fat-tailed distribution. It is not Gaussian; we discussed that it has a power-law-ish tail, and we discussed the meaning of this: the probability of a tail event is extremely different in a Gaussian setup than in this case. So essentially these were the points to understand. Diffusivity we looked at in two different ways, and the point was that on long times you see diffusivity, and on short times you see some deviation from it, essentially a mean-reverting behavior. And there were these figures about the fat-tailedness of the distribution.
And one thing I wanted to mention is about these plots that we look at. I think one important thing from this course is to understand the empirical side: to learn to read a figure, to extract the main information from it, so that you have ideas for how to analyze data if you get data, and also the capability to read figures. So it's important to understand the figures; it can also be relevant for an eventual exam, et cetera. So this was yesterday, and I wanted to discuss a few more things. About the distribution of returns: what we discussed is that it's fat-tailed; it was close to a Student distribution. But we also mentioned that it typically has a finite second moment. So one question, which actually came up during the course, is: do the distributions eventually become Gaussian? That is, what happens on longer timescales? The claim here is that things are non-Gaussian but have a finite second moment. So, if you have a finite second moment, do the distributions become Gaussian on some longer scale? Because what we looked at here was up to one day. And the answer will be no. We can look at this figure — sorry, this was from last time. Here we have daily and monthly returns, the same idea as before. On the x-axis is the return (for some reason denoted with an eta here) in units of its own standard deviation, and we plot the distribution of this variable. And what we can see is that even at monthly scales we are very, very far off from the Gaussian approximation shown here. So the answer is no — no surprise, given that I already wrote it down. They are not Gaussian. The reason for this, as we will see in a moment, is that there are strong autocorrelations in the volatility itself.
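The point that a finite second moment alone does not force Gaussian aggregates can be checked in a toy simulation; all parameters below are made up for illustration, not the lecture's data. If the volatility has a slowly decaying autocorrelation, returns summed over many steps stay visibly fat-tailed:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Log-volatility as a very persistent AR(1): a crude stand-in for the
# long-memory volatility autocorrelation seen in real data.
phi, s = 0.99, 0.1
logvol = np.zeros(n)
for t in range(1, n):
    logvol[t] = phi * logvol[t - 1] + s * rng.standard_normal()
r = np.exp(logvol) * rng.standard_normal(n)   # r_t = sigma_t * xi_t

def excess_kurtosis(x):
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0   # 0 for a Gaussian

# Sum over blocks of 20 steps ("monthly" returns from "daily" ones).
# For IID returns the excess kurtosis would shrink roughly 20-fold;
# with persistent volatility it stays clearly positive.
r20 = r[: n - n % 20].reshape(-1, 20).sum(axis=1)
print(excess_kurtosis(r), excess_kurtosis(r20))
```

Both printed values come out well above zero, while an IID Gaussian sample of the same length gives essentially zero.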
So this we will see. OK, so we discussed the central limit theorem, and the reason the distribution doesn't go to the Gaussian is these strong autocorrelations; I will show this in a moment. But one more thing I wanted to discuss here: on the shorter timescales we discussed, we always had these two curves, one for the positive and one for the negative tail, the negative one just flipped over. I always stand on this side — is it visible from there when I show something, or not at all? OK, maybe I will move. Anyway: while we showed for shorter timescales that the two tails were very similar, here we start to see some deviation between them. And if we look at the tail exponent itself for the positive and the negative tail, as a function of how far out we go in the tail, we can see that for small returns the two tail exponents seem to be the same. But after roughly two- or three-sigma events — events two or three times the typical deviation, or larger — we see that there is a difference: a slower decay for the negative case, so the probability of large negative events is larger than that of large positive events. This is just an empirical fact; what one finds is that a huge downward jump is more typical than a huge upward jump. Now I wanted to discuss the question of this autocorrelation in the volatility. What does this mean? We were discussing returns, the price changes, which we call R. Of course, one can decompose the return in a sort of trivial way. Usually you write it as follows: describe the volatility of the moves by some positive random variable.
So let's say this sigma is the volatility of price changes; but the return is a signed variable, so you put the direction of events into a separate random variable — something that carries the sign, has unit variance, and is IID. So you write the return like this. A simpler version, of course, is to write it as the absolute value times the sign; there are reasons to define these in different ways, but OK, you can just take the absolute value and the sign. And what one finds, if you look at this, is that the sign term — the direction — is indeed IID: tomorrow's direction is independent of today's, no question. Actually, we can also look at it this way: you can either write the return as the absolute value times the sign, or, with slightly different definitions, as a product of two random variables, one positive, giving the typical size, and one defining the direction, which can be plus or minus one, or can have some distribution with unit variance. What you find is that the sign is essentially uncorrelated — there are no correlations — but the volatility term has what we usually call long memory. In practice its autocorrelation decays as a power law, with some exponent called gamma, and what is important is that gamma is smaller than one, which is the definition of a long-memory process. This is exactly what we discussed with the central limit theorem: if the correlations decay faster than one over time, things still go to a Gaussian, but here the decay is slower than that, so you do not expect convergence to a Gaussian even on long times. This, by the way, means that the integral of the correlation function diverges.
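The decomposition above suggests an easy empirical check: compute the sample autocorrelation of sign(r_t) and of |r_t| separately. Below is a minimal sketch on a synthetic stochastic-volatility series; the model and its parameters are my own stand-in, not the lecture's data.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rng = np.random.default_rng(1)
n = 100_000
logvol = np.zeros(n)
for t in range(1, n):
    logvol[t] = 0.98 * logvol[t - 1] + 0.1 * rng.standard_normal()
r = np.exp(logvol) * rng.standard_normal(n)   # r_t = sigma_t * xi_t

for lag in (1, 10, 100):
    print(lag, autocorr(np.sign(r), lag), autocorr(np.abs(r), lag))
# sign(r): compatible with zero at every lag; |r|: positive, decaying slowly
```

This is exactly the qualitative picture claimed above: the direction carries no memory, the size does.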
So this is what you find empirically, and one way to look at it — OK, we could look at the correlation function explicitly, this is just the correlation — but the figure I have is actually of something else, just for a change, because another way to look at this is what is called volatility clustering, or activity clustering. We can look at two things here; I will stand on this side now. What we show in the upper plot, for the Dow Jones index, is daily returns, a signed variable: we can see it is positive and negative in time. And what one can see is that there are periods of very high volatility — I mean long periods; this is over a hundred years, so what you see here is already essentially decades when you see something. So there are periods of high volatility and periods of low volatility, and they seem to be clustered together, while for the trivial example of Brownian motion below you don't see any type of clustering: there is no correlation in the size of the moves. So one way to look at this correlation is to look at the way the large events cluster together, and you find that this is indeed the case. And what is actually interesting: this was on a hundred years, but if one looks at the same thing on a five-minute window — so not daily returns but, I don't know, every second or millisecond — it's less visible, but you still have the feeling that there are periods when, for an entire minute, there are only small price changes, and periods when large price changes cluster together. So on any timescale, large events tend to be followed by large events.
The sign is of course not defined here — here it's the absolute value — but you can see that it is the absolute value which clusters, not the signed returns. I don't know if it's clear what I mean; I hope so. Well, for the first and the third: you can always write a variable as its absolute value times its sign. You can do something more complicated if you want, but you can also forget the middle definition. Each term: the sigma is a volatility, a positive random variable which somehow describes the typical size of changes, and the other is a signed random variable which gives the direction of the moves. There are several ways you can define this, but forgetting what's in the middle, let's just say that the return can be written as absolute value times sign; and if you look at the signs of returns, they are uncorrelated, but the sizes of returns are correlated in time. This is the way you can get diffusivity in prices but no convergence to the Gaussian on long timescales. Yeah, here it's R of t, exactly — sorry, on the left it's R of t, on the right it's the absolute value. Well, the distribution of R of t we have seen; the figure before only said that on different scales you keep the fat tails even if you aggregate, and what I say is that the reason for this is the autocorrelation. One way of looking at this autocorrelation — I think it's more interesting than plotting the autocorrelation function itself, though the information is somewhat the same — is that it speaks more directly: if yesterday you had a big change in the market, probably you will have a big change today as well, and this decays slowly. For the sign of t? It can be or can't be — I don't hear you well. Well, so here, OK, let's forget the middle.
Let's just look at this term and the rest, OK, then it's easy. What I'm saying is that you can define it in other ways: you could, for instance, put r squared instead of the absolute value. That's a trivial example. So you can define it in different ways depending on what you want, but the idea is that you can split the return into these two pieces. And why I plot it like this is because, to me, looking at the clustering speaks more. Actually, I think it's interesting — we won't discuss it in detail — that the way volatility decays after a big event is very similar to avalanche dynamics or earthquake dynamics: if you have a big jump, a big event, then the number of events exceeding a given threshold decays as a power law. So it's very similar to the dynamics that I think you discussed with Deepak — I'm not sure, I wasn't here — dynamics that come up in many models in physics. So this is about the clustering of volatility; another word for it, used in physics, is intermittency. And another thing to look at: OK, this was about the volatility, but what is volatility? Volatility is big because people are doing something in the market. So another measure to look at can be the activity of people, some definition of activity in the market. What I plot here — activity can be defined in many ways — is the number of mid-price changes in the market. You say that if the price changes many times, then there is a lot of activity going on. That is one type of measure; you could also count the number of clicks people make when trading online. But if you look at the autocorrelation of the number of mid-price changes in time, then what you see here is the same type of dynamics, just on different timescales.
What you seem to see is two things. First, you seem to have some type of periodicity; let's forget it for a second. But you also see that this autocorrelation decays slowly: after 12 days you are still at an autocorrelation of 0.2 or 0.3. And these are 30-minute intervals, so given that there was high activity in this 30-minute window, ten days later you can still make a prediction from it. Now, there is also a shape to this, which is due to the fact that there is an intraday dynamic: there are more people active in the morning than at noon, so there is a human effect. If you want to forget this periodicity, you can imagine just looking at these shoulders, at the way the correlation decays. What we see on the right-hand side is the same thing, just going up to hundreds of days, and instead of 30-minute windows we look at the daily number of mid-price changes. It's somehow the same information: we again seem to have a periodicity, now the weekly one, because people behave differently on different days — I don't want to discuss this much, but it's important to know. And you see that, OK, one has to define a noise level for this correlation, but at least up to 100-200 days you seem to be above the noise if you look at a lot of data. Which means that when there is a really big event that generates a lot of activity, it is very slow to die out. This is the correlation of the number of times the mid-price changes. So, to recall: there was this sketch of a market, of a limit order book, where you can define several prices as we discussed. One way is to define the mid-price as halfway between the lowest ask order and the highest bid order. And this changes if somebody puts a new order somewhere here in between,
So the blue or the red one here is replaced by someone else, or one of them is cancelled. So what we say here is that one way to measure activity is: how often does this mid-price change? The more often people put in orders, the more often it changes, in a very simplified world. And this is the autocorrelation of that activity. So essentially, the clustering we saw in volatility can equally be discussed as a clustering, an autocorrelation, of activity in the market. One more thing I wanted to discuss related to this type of model, quickly, before we get to new things. It's nice to write up the return this way, but in practice, if you look at it — recall this equation here — what you find is that sigma_t and xi_t are not independent. And what this means is that there is a strange correlation, because it depends on the direction. OK, I will just show the figure — actually, I think it's easier if I draw it first. If you look at the lagged correlation of these two quantities, the sign of the return and the size of the return, you see something like this: something very close to zero for negative lags, and something like this for positive lags. What this means is that past volatility has no effect on returns; there is no correlation in that direction. But past negative returns actually increase volatility. In simple language: if there was a big downward price jump, there will be large volatility after it. One can understand this in a simple way: people become stressed about it. And a large price movement upwards actually decreases future volatility.
So there is a negative correlation, dying out slowly, for positive lags, and zero correlation for negative lags. I'll just show this; we won't actually discuss it further, it's more a question of general culture. It is called the leverage effect, and it's an important thing that happens in the market — just something to know that I wanted to mention. What we see here is exactly what I drew: for positive lags you have a negative correlation that dies out slowly, for stocks on the left-hand side and for indices on the right-hand side, while for negative lags, in the inset, you just see noise. So these were the simple empirical observations I wanted to discuss — what are called stylized facts. We had a couple of them: essentially, this was one stylized fact, this was another. And then there is volatility clustering, which, as we said, is the autocorrelation of volatility, or equivalently of activity. And the last one, to have a full list of these facts, is the leverage effect. And obviously, from the amount of time I spent on the different things: the first three are very important; the fourth is good to know, but I don't think we will discuss it further in what follows. So, OK — yes? No, xi of t you can define simply as a distribution with unit variance. For most practical purposes, the reason I wrote it this way is that for many things it speaks more clearly. I just want to mention one more thing about all this, though we don't have the time to go into it: there are several models of the market that try to reproduce — and can reproduce — part of these stylized facts. Traditionally in economics, the type of model people use — I'll just list them here, but we won't discuss them —
is the GARCH model family, a family of autoregressive models (that's the AR): autoregressive models on the volatility itself. We won't discuss it here. Physics people have worked more on various multifractal models, which we also won't discuss. There are several herding models — of course, if you want this autocorrelation, this clustering, herding models can lead to such dynamics — and many things are also reproduced in the so-called minority game, which we surely won't discuss here because it would take more time, but Matteo is the right person to talk to about that. So that's it, I think, for the stylized facts, if there is no question. Yes? Yeah, OK. So it's the correlation between this xi_t — let's say this contains the direction of the price move — and the sigma at a later time, which is the typical, absolute size of moves. What this correlation says is that for negative times it is very close to zero, meaning that the fact that past price moves were large has no predictive power on the direction of the price in the future. But for positive times you see something with structure, which is actually negative. So first of all, it means that the direction of past price changes has predictive power on the future size of price changes. Right, is it clear? And the correlation is negative: if there was a past downward move of the price, then the typical price moves later — the absolute value of price moves — will be larger, while after an upward move they will be smaller. One could try to explain this type of behavior in some very simple model, but it wouldn't quite describe it, and I don't want to go into more detail.
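One common way to quantify the leverage effect just described is the lagged correlation between the signed return and the future squared return, L(tau) ∝ ⟨r_t · r_{t+tau}²⟩. The sketch below uses one of several normalization conventions found in the literature, and is run on IID Gaussian noise, where L should vanish at all lags; on real stock data one finds L(tau) < 0 for tau > 0 and L(tau) ≈ 0 for tau < 0.

```python
import numpy as np

def leverage(r, tau):
    """L(tau) = <r_t * r_{t+tau}^2> / <r^2>^2 (one common convention)."""
    r = np.asarray(r, dtype=float)
    if tau > 0:
        num = np.mean(r[:-tau] * r[tau:] ** 2)   # sign now, size later
    else:
        num = np.mean(r[-tau:] * r[:tau] ** 2)   # size earlier, sign now
    return num / np.mean(r**2) ** 2

rng = np.random.default_rng(3)
r = rng.standard_normal(200_000)   # IID: no leverage by construction
print(leverage(r, 5), leverage(r, -5))   # both compatible with zero
```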
I think the leverage effect, in this case, is more something to have heard about. OK, so the next thing — can I erase this? So far we discussed all these dynamics for individual assets; it was for single products, and we had these stylized facts. So the question now is: what are the co-movements, and what do they look like? The motivation for this — we will discuss other motivations in a second — one is: if stocks, or different products, move together market-wide, that is a question for the stability of the market, important for regulation reasons, for example. And then of course there is the other point: if you yourself are trading, and you have in your pocket some portfolio of products, some basket of products, then the probability of having a large loss depends on the probability of things moving together. What you want to do, of course, is diversify: in a simple way, if you have two products in your pocket and they are positively correlated, you don't want to buy both; you want to buy one and sell the other, or some trivial thing like this. So let's define the measure we are going to discuss. Define the return of the price as we did before, for a product i — we will actually talk about log returns. So you can define the return, with a delta t index here, as the log return. What is important, of course, is that this is for product i; this is something we have seen before.
What is important here is that before I never discussed this delta t dependence: you can decide on what timescale you look at these returns, daily or millisecond, and the correlation structure will actually depend on this, but we won't discuss that immediately — if we have time, we'll come back to it later. So what we are interested in is the following quantity, again with a delta t dependence in general: the correlations between these returns. Most of the time we will assume that R is zero mean and unit variance; this we can do because we can always center and rescale things and handle the scale terms elsewhere. Here i is the product index — so this is the return of product i — and we look at the correlation of products i and j: the co-movement of products. And what you measure in practice, if you have empirical data, is the following — OK, this is clear. The questions we are going to try to answer are, I think, three. First: how does this matrix look? This C, of course, is a matrix, and it's already a good thing to understand it visually. Second: how can we get information out of it?
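In practice, the measurement just described amounts to a few lines: build log returns per product, standardize each series to zero mean and unit variance, and form C = R Rᵀ / T. A minimal sketch on synthetic prices — the toy price array below is my stand-in, not real data:

```python
import numpy as np

rng = np.random.default_rng(2)
# toy prices: 4 "products" x 501 time steps, geometric random walks
prices = np.exp(np.cumsum(0.01 * rng.standard_normal((4, 501)), axis=1))

# r_i(t) = log p_i(t + dt) - log p_i(t), with dt = one step here
r = np.diff(np.log(prices), axis=1)

# standardize each product, then form the correlation matrix C = R R^T / T
r = (r - r.mean(axis=1, keepdims=True)) / r.std(axis=1, keepdims=True)
C = r @ r.T / r.shape[1]

print(np.diag(C))   # ones by construction, as on the lecture's figure
```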
Why do I write this second question? Of course, you could say that if I have the correlation matrix I have everything, all the information; but we'll see that life is more complicated: you cannot simply trust the correlation matrix. And the third thing, which I'm not sure we'll have time for, is the delta t dependence: the dependence on the scale on which we decided to look at the price changes. As motivation for all this, I have an exercise to do at home; my idea is that I give the question here so I can motivate it, and then it will be discussed in the tutorial, which I think is on Wednesday. So, just to write it here: let's assume you have M assets which, for simplicity, have Gaussian-distributed returns — we have seen that this is not true, but let's leave it for now; it's not really important for us at this moment.
So let's assume, in this case — unlike what we wrote here — that product i has a non-zero expected price change, so this is the expected return of product i, and we have a correlation across products as before; I drop the delta t now, it's a simplifying thing. And we own a portfolio of these. A portfolio means a basket of these products that we have in our pocket, which we'll call some vector pi: with M products, this is what we hold in each, and it is normalized, let's say, for simplicity. So this is what we have, and I have three questions, which are actually very simple from a mathematical point of view; it's more about understanding the concepts. One: given that I know the expected returns and the covariance matrix, what is the expected return of my portfolio, and what is its variance? Is the question clear? Two: given that you assume to know these quantities — the means and the covariance matrix — how do you choose pi to either maximize the expected return, or to minimize the variance of the return? I write it here, but I don't know if it's visible; if not, I can write it in an email. The variance matters because it is a measure — if you have never used this word — a measure of your risk: if the return can vary a lot, then... yeah?
Yes, pi is your portfolio, that's why we call it that; in this case it sums to one, it's just the fraction of your total money that you invest in each of these — let's assume the fractions are positive. And, to make it more interesting, three: if you don't want to just maximize your return or minimize your variance, but you want some kind of constrained version — maximize the return with a constraint on the variance — then how do you write up the optimization? The idea is to use Lagrange multipliers. I think the calculations are simple; it's more a way to understand things. And I have a fourth question, which is a hint: what is the interpretation of the Lagrange multiplier? That was a hint that there will be one, yes. [Question:] What is the expected return and the variance of the return of your portfolio? You know these three things — I mean, I don't give you numbers. [Student:] I know only the percentage of each product that I have, but the price variation is not related to the percentage; I would also need to know the value of one product. [Answer:] But you know the typical changes in the price, these quantities. OK, if it's not clear, we can discuss more; it's not a traditional type of exercise. If there is a question, write me an email or talk to the tutorial people. And to give another hint: for this problem a Nobel Prize was given. The solution to this, in a simplified form, is the Markowitz model. Markowitz — I don't know the year he wrote it, or when he got the prize, but he got a Nobel Prize for it, so it's worth doing. You want to minimize your variance while maximizing your return, and of course you will have freedom there.
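For concreteness, question 1 — and a preview of the minimum-variance part of questions 2 and 3 — can be sketched numerically. All the numbers below are invented for illustration, and the closed-form weights come from the Lagrange-multiplier condition Σπ = λ·1 under the normalization constraint:

```python
import numpy as np

mu = np.array([0.05, 0.03, 0.07])          # expected returns (hypothetical)
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])     # covariance matrix (hypothetical)
pi = np.array([0.5, 0.3, 0.2])             # portfolio weights, sum to 1

mean = pi @ mu         # expected portfolio return:  pi . mu
var = pi @ Sigma @ pi  # portfolio variance:         pi^T Sigma pi

# Minimum-variance weights from the Lagrange condition Sigma pi = lambda * 1:
# pi is proportional to Sigma^{-1} 1, normalized so the weights sum to 1.
ones = np.ones(len(mu))
w = np.linalg.solve(Sigma, ones)
w /= w.sum()
print(mean, var, w)
```

The minimum-variance portfolio `w` has, by construction, a variance no larger than that of any other normalized weight vector, including `pi`.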
OK, so this was just to give an idea why we will discuss correlation matrices, apart from general interest. Can we continue? These things I can clean. What we will do now is go through correlations from the beginning to the end. First, as we said: how does the matrix look, and how can we get information out of it? So this is a correlation matrix — it was me who made it, I'm very proud of it — of 275 US stocks. What you see here is the matrix we wrote up. Because of the size, the different products are along the axes, but not all names are written since we don't have space; it's a 275 by 275 matrix. And this is the correlation matrix looked at in a completely plain way. What you can see is that the diagonal is red — it's one, since the matrix is normalized as we wrote up. Otherwise you see not that much; it's noisy. One thing you do see is that it's more red than blue: the correlation on average is more positive than negative. And that's about it, apart from the diagonal. And — sorry, I didn't get it? [Question about the axis ordering.] They should be the same; they should be ordered in the same way, it's a symmetric matrix. For the case of 997: what I think happens is that Python, when plotting, uses a different rule for the x and the y axis, but they are ordered in the same way. Actually, I can tell you by heart that 997 is Apple; the others I don't know. So believe me, they are ordered the same way.
So this is the correlation matrix, and what one can do — it was much slower to get this result when it first came up, but in Python you can just do some super basic clustering of it; you don't even need to know what happens, you say "cluster" and you get this — which is, OK, not that much better for the moment, maybe. What it does is reorganize the products so as to create clusters, to put similar products together. What you start to see is that there is some structure: unlike before — although maybe with a good eye you could already see places where there seemed to be structure — here it is clear that something is going on. Here there is something more red than the rest, here there is a big square which is pinker, and so on. But still, the red dominates, which is not great. So one trivial thing you can do is zoom in on the color scale: instead of going from minus one to one, I zoom in between, say, 0 and 0.5. It's trivial, but at least you see something beyond the average value. And if you do the same clustering — there is no difference between this and before, you just zoomed in — you see this. There is very clearly some structure: essentially you see squares; because of the clustering you see these big squares around the diagonal. There are smaller squares and larger ones, and there are parts which are quite blue, bluer than the average, which here should be whitish. So you see clusters that seem to be correlated above average, and parts that are below average.
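The "super basic clustering" can be done, for example, with scipy's hierarchical clustering: map correlations to distances, cluster, and reorder the matrix rows and columns by the dendrogram's leaf order. The block-structured toy matrix and the particular distance d = √(2(1−C)) are my choices for illustration, not necessarily what was used for the lecture's figure:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

# toy C: products 0,2,4 strongly correlated; 1,3,5 strongly correlated
C = np.full((6, 6), 0.1)
for i in range(6):
    for j in range(6):
        if i % 2 == j % 2:
            C[i, j] = 0.7
np.fill_diagonal(C, 1.0)

# a common correlation-to-distance map: d_ij = sqrt(2 * (1 - C_ij))
D = np.sqrt(2.0 * (1.0 - C))
np.fill_diagonal(D, 0.0)

# cluster, then reorder the matrix by the dendrogram's leaf order
order = leaves_list(linkage(squareform(D, checks=False), method="average"))
C_sorted = C[np.ix_(order, order)]
print(order)   # even and odd products end up grouped together
```

After reordering, the two correlated groups appear as red blocks along the diagonal, which is exactly the effect visible in the clustered figure.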
So this is what you can do very simply on the correlation matrix if you have the data, and of course the question is: what are these clusters? I think you have guesses — do you have guesses what the clusters are? Sorry? Right, everything here is a different type of product, but the clusters — what you expect is somehow similar products, because the correlations are high. And your first guess, or my first guess — it's easy, because it was written in papers as well — is that they will be economic sectors. You'd think: okay, great, we discussed this question of the random walk model, but still these are financial products; you expect that similar products are on the same market, that they depend on the same inputs, for example. So what one can do, instead of this automatic clustering, is say: let's look at each stock — I have the list, it's long — and take the sector classification of these stocks.
By sector, I mean: there is technology, so Apple will presumably be technology; there is industry; there is the energy sector, so oil and that type of thing; and so there are roughly ten sectors one can define. And if you simply order the stocks by sector — no clustering whatsoever, you just order them; it's a bit ugly the way Python plots it — basic materials, communications, consumer cyclical, and so on down to utilities — well, you see a bit less structure than before, especially since before it was really concentrated in the upper left corner, right? But you see that this seems to be what we were looking for: the sectors are strongly correlated internally, you get these red squares around the diagonal. There are also off-diagonal red parts; here, for instance, one could say that energy and basic materials are correlated. One can look at what these are and try to come up with interpretations if you have intuitions. But okay, we see this captures some of the information, and we'll go further. Yes? The correlation by definition is not always positive, but what you see — yes, so what I did, simply: when you look at the raw plot, you see that red dominates, it's skewed toward positive correlation. So just for visualization, what I did here is clip: everything below −0.1 is set to −0.1, and values outside the range are capped from below or from above at those numbers. It's just easier to see the real structure if you clip some outlier points. But in general, most of the correlations are positive. There are some companies that might have negative correlations — I don't have a good example I can think of at this moment — but positive dominates.
So okay, this is one way to look at the correlation matrix, and I just wanted to mention that of course there are others. This is the clustering way of looking at it, but there are other ways; here is one, which I actually did some time ago: this type of minimum (or maximum) spanning tree, I don't know if you know it. What you do is take all the correlations and order them in decreasing order. You take the highest correlation, so you connect two points, and then each time you take the highest remaining correlation from your list in a way that does not create loops in your graph — you want to build a tree. It's one way of visualizing; there are others. And what you can see here is another type of information: if one colors the sectors, these branches will be related to sectors, but there are some companies which are real hubs, very much correlated with many others — General Electric, in the period we studied, was a super hub. Okay, so this is the visual way to look at correlations, but of course the question is — well, I'll show on the next slide. A question: given that most of the correlations are positive, doesn't this break? There is no perfect fix for this — you can come up with some fixes, you're right — but what happens in practice is that there are no really disjoint parts of the market which are negatively correlated with everyone else, so you still get a tree. And no, the edge length here is just for visualization; it's not the value of the edge.
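The tree construction just described — repeatedly taking the highest remaining correlation while avoiding loops — is a few lines with a union-find; this is a sketch of that procedure, not the code behind the figure, and the 4-asset matrix is made up:

```python
import numpy as np

def correlation_mst(C):
    """Kruskal-style tree: sort all pairs by correlation, highest first,
    and keep an edge only if it does not close a loop (union-find),
    ending with m - 1 edges."""
    m = len(C)
    parent = list(range(m))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    pairs = sorted(
        ((C[i, j], i, j) for i in range(m) for j in range(i + 1, m)),
        reverse=True,
    )
    edges = []
    for c, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:                       # no loop created
            parent[ri] = rj
            edges.append((i, j, c))
    return edges

# Toy 4-asset correlation matrix (made up for illustration).
C = np.array([[1.0, 0.9, 0.2, 0.1],
              [0.9, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.7],
              [0.1, 0.2, 0.7, 1.0]])
tree = correlation_mst(C)
```

A "hub" like the General Electric example would show up as one index appearing in many of the returned edges.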
Well, here the drawn edges don't carry weight; when you construct the tree you do use the weights — first you take the two companies which are most correlated, then you choose the next pair, and so on. Anyway, this was just to show that there are other methods to look at correlations. So, okay, we have these correlation matrices, great, we saw that there are sectors in them, but what we want is to get more information out of them — how can we use this? And a hint for why we really want to understand how this correlation looks: if you do the exercise, you'll see that very often, as in many problems in physics, what you have to do is invert a correlation matrix. You're interested more in its inverse when you do practical calculations, and of course inverting a matrix is dangerous: you can get very weird results if things are noisy. So let's analyze this correlation matrix. I want to discuss very quickly the idea of principal component analysis — is it familiar? Yes? Sure, you can — what you mean is that one thing you could do, getting a bit ahead, is look at this matrix and subtract the first mode; you would still have a mostly positive matrix, but yes, that's exactly what we'll look at with the eigenvalues and eigenvectors in a second.
So okay, there is the idea of principal component analysis, which I won't go into very deeply — it's something that is often used, and I will be deliberately imprecise about it. In some domains it's also called singular value decomposition. The idea of PCA is that you have some data, and you look for the directions in it with the most variance — maybe the dimensions your data come in initially are not the good way to look at the data. I'm really being vague here. So you want to find directions in the data with the highest variance: first the single direction with the highest variance, then a second direction, which you hope is orthogonal to the first, with the highest remaining variance after the first mode is taken out, et cetera. That's the idea of principal component analysis. Actually, I took an example from the internet, which is easier than describing it. You can have data in the x, y plane that looks like this — these are your data points — and you can see that maybe x and y are not the best axes: this direction describes most of the variance of the points, and if that's my first direction then, it's easy in this case, the second direction is this one. What simply happens is that you define new directions, new dimensions for your data, without changing the data itself. You get new dimensions that could speak to you more, and what you also hope — it's not the case here — is that you can reduce the dimensionality of the problem.
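A minimal numerical version of that picture (the point cloud is synthetic, stretched along y = x, so the first principal direction should come out near (1, 1)/√2):

```python
import numpy as np

# Synthetic 2-D cloud stretched along the diagonal y = x.
rng = np.random.default_rng(1)
t = rng.normal(size=500)
X = np.column_stack([t, t]) + 0.1 * rng.normal(size=(500, 2))
X = X - X.mean(axis=0)                  # center the data first

# PCA: eigenvectors of the empirical covariance matrix.
cov = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalue order
first_pc = eigvecs[:, -1]               # direction of highest variance
second_pc = eigvecs[:, 0]               # orthogonal, lower variance
```

The two principal directions are orthogonal by construction, and the first one lines up with the diagonal along which the cloud was stretched.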
You might find out that there are only a few dimensions in which things really vary, and the others you can forget — so hopefully you can reduce, which would be good for us, because we have a 275-dimensional problem here. So this is how statisticians, if they want to be vague, might write it up. In practice, PCA is identical to determining the eigenvalues and eigenvectors of the correlation matrix of the data. So if you have a matrix of data A — in this example it would be two-dimensional, the x and y, times the number of points, so it would look like a 2 × N matrix — then what PCA does, essentially, is look at the correlations of the data and find the eigenvalues and eigenvectors: it is looking at the correlation matrix. So let's do this now, and then try to understand why this is the case and what the different things mean. In our example, we have, say, M stocks — in the example M = 275 — which means the correlation matrix C is an M × M matrix. To write up the eigenvectors, you simply use the equation C v_a = λ_a v_a. It's important to know a bit of algebra here: since C is a symmetric matrix — we defined it like this, and we showed that it is symmetric — all the eigenvalues will be real and all the eigenvectors can be taken orthogonal and normalized. So what can we do with these two facts, or what do they mean? Let's consider, in the spirit of this discussion, a portfolio with weights v_a: we take one of the eigenvectors and put a dollar weight on each of our products according to this eigenvector.
So the variance of this portfolio will then be the following: what I call σ_a² is the expectation of the square of the portfolio return, σ_a² = E[(Σ_i v_a,i r_i)²]. What I do here: v_a is a vector, v_a,i are its elements — the weight of each product times the return of that product. Summing these up gives the return of your portfolio, and we look at the expectation of its square. What does this mean? By definition it can be rewritten as v_aᵀ C v_a. And of course C v_a = λ_a v_a, and since v_a is normalized — and orthogonal to the other eigenvectors, so if we had v_a and v_b with a ≠ b, the term would fall out to zero — this is simply λ_a. So what does this mean? It's easy to write up, but it means that the eigenvalue is the variance of the portfolio whose weights are the eigenvector. And this connects back to PCA: you expect the largest eigenvalue to take the most variance — that's how it comes back to the slide — looking for the direction of highest variance is identical to doing the eigendecomposition. One thing you learn from this: since λ_a is a variance, λ_a ≥ 0 for all a — some of you saw that already just by looking at the matrix. And it means that we have a set of uncorrelated portfolios, the v_a: if we do the eigendecomposition, we get as many orthogonal vectors as dimensions, which in our language are essentially uncorrelated portfolios. Is this clear? We won't go into much more detail on this.
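The identity σ_a² = v_aᵀ C v_a = λ_a is easy to check numerically on any symmetric correlation matrix (the 3×3 matrix below is made up for illustration):

```python
import numpy as np

# A made-up 3x3 correlation matrix.
C = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.4],
              [0.2, 0.4, 1.0]])

# eigh is the right call for symmetric matrices:
# real eigenvalues, orthonormal eigenvectors.
lams, V = np.linalg.eigh(C)

# Variance of each eigenportfolio: v_a^T C v_a, column by column.
portfolio_vars = np.array([V[:, a] @ C @ V[:, a] for a in range(3)])
```

The eigenportfolio variances reproduce the eigenvalues exactly, the eigenvectors are orthonormal, and the eigenvalues are all positive and sum to the trace, M = 3.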
Okay, actually we can do the intermediate step explicitly. What we do here is simply write out the sum: the square can be written as a sum over i and j of v_a,i v_a,j times E[r_i r_j], and that expectation is our correlation — or covariance — C_ij. Yes, okay. And a question: yes, a portfolio is simply a basket of stocks. If I have M stocks, a portfolio is an M-dimensional vector of weights. Did I ever write that up? No — it is M-dimensional, and of course it can have zeros. You can say that your portfolio is a single stock: v_a here is an eigenvector, but a portfolio in general could be (1, 0, 0, …, 0). Yeah, exactly. In this case the portfolios are defined by the correlation matrix, but you can choose whatever you like; the point is that whatever portfolio you choose can be written as a combination of the eigenvectors. They are orthogonal, so any return can be written as a weighted sum of these v_a portfolios. So, okay, I actually have an example — just let me find my notes. We can look at a very simple example, because things are a bit general here; I'll write it up and then calculate. Assume you have only two stocks, and a fully general correlation matrix: the diagonal is one, the off-diagonal is ρ — it's normalized and it's symmetric, and with two dimensions you cannot do much more. So what does the analysis boil down to here? What you want to do is diagonalize the matrix, which I'll just write up — as homework, anyone can redo the calculation. I'll tell you in a moment what I'm doing: this is the eigendecomposition of the matrix, I think it's called.
So what we do here is simply write the matrix in diagonalized form: let's call this matrix O, this Λ, and this O⁻¹, so that C can be written as C = O Λ O⁻¹, where Λ is a diagonal matrix containing the eigenvalues of C. The eigenvalues come out — one can write it up — as 1 + ρ and 1 − ρ, and O contains the eigenvectors of the matrix going with each eigenvalue, normalized, hence the 1/√2 factors. Unless I made an error somewhere in the calculation — I actually tested it in a homework. So this is the general solution. What does it really tell us? That in this two-asset system you can essentially describe everything by two portfolios: one, call it v₊, the (1, 1) eigenvector, which has variance 1 + ρ, and the other, v₋ = (1, −1), with variance 1 − ρ — maybe my calculations are off, but I hope they are good; test them, check them. So what does it mean in financial language, in this system of two stocks with a correlation matrix like this? Anything can be described by these eigenvectors: you buy both products, or you sell the first and buy the second, and with combinations of these two you can of course make up any portfolio.
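The two-asset calculation can be checked in a couple of lines (ρ = 0.3 here is an arbitrary value):

```python
import numpy as np

rho = 0.3
C = np.array([[1.0, rho],
              [rho, 1.0]])

# eigh returns eigenvalues in ascending order: 1 - rho, then 1 + rho.
# The columns of O are the eigenvectors, (1, -1)/sqrt(2) and
# (1, 1)/sqrt(2) up to sign.
lams, O = np.linalg.eigh(C)
```

Since O is orthogonal, O⁻¹ = Oᵀ and the decomposition C = O Λ Oᵀ reconstructs the matrix exactly, as in the board calculation.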
In this case, if they are positively correlated, then buying both of them — okay, sorry, of course everything here is normalized; you could have a different variance for each product, and then it's the same type of calculation, just more complicated — so if you buy both of them, you will have a variance, which — let's call it the variance of the PnL, which is one way to define risk: you have something in your pocket, and the question is how much its value varies. So if the correlation is positive, then holding both of them in your pocket is a higher-risk strategy than selling one and buying the other. If the correlation is −1, then of course selling one and buying the other is riskless: when one moves up, the other moves down, and you will never lose or gain money. Okay? I don't know if this clarifies the notion a bit. Okay, so this is the main idea. We had this correlation matrix, great; now let's use it and simply start looking at the eigenvalues of the correlation matrix. And you get this. What we do here is a histogram: it's the same correlation matrix that we had before, you just calculate the eigenvalues and eigenvectors, and here is the histogram of the eigenvalues. What you see immediately — okay, it's not that much: you see that there is a bulk, something here. Is it clear what we plot? λ is the eigenvalue, and this is the distribution of the eigenvalues: I have one correlation matrix, it has 275 eigenvalues, and in practice this is just a histogram of them, the same thing as a probability distribution — actually it's not even normalized, so it's the number of eigenvalues in a given window.
You have one eigenvalue here, one here, and then, okay, in this window you have four or however many — I don't know exactly. And you get more and more counts here: in a given Δλ window, how many eigenvalues fall. So the first thing you see is that there is a big bulk, which we don't know the meaning of yet, but there are some things sticking out. First of all, there is something enormous here: the largest eigenvalue. This is the first mode that we'll look at, and we will see in a second what it means. But okay, let's set aside the one on the far right and zoom in on the rest — now I'm only plotting up to seven, up to this point. And you see that even when you zoom in a bit, you still have the bulk, but there are some other eigenvalues here that seem to be different. So you get the feeling that this bulk might not be very informative — okay, we'll see — but there are some eigenvalues sticking out. And recall what the numbers mean: things are normalized, so the scale is a bit hard to interpret, but λ is the variance of a given portfolio, right? So what you want to do is look at the eigenvector whose portfolio variance is this λ. That's what we will do next, if this is clear. So let's look at the first eigenvector — the one that goes with this big eigenvalue. We can calculate it numerically; it's the weights, a 275-dimensional vector, and in practice I ordered the entries — it doesn't really matter here — by the sector classification that we just discussed, to be a bit more visible. So what do we see? The blue bars are the actual entries of the eigenvector, and the red line I plot per sector.
I do a flat average per sector, just for visualization — it doesn't really matter here, we'll see later. Anyway, what do we see? We see that essentially all the entries of the eigenvector are very similar, and they are all positive — it's not just because I plot from zero, they really are all positive. And they all take a value close to 0.06, and I can tell you what that is: it's 1/√M, one over the square root of the number of products. So what this says is that, apart from some variation, the first eigenvector has all entries of the same sign — let's say, buy all products in a similar amount. The vector is normalized, so the sum of the squares is one, so each element is somehow close to 1/√M. This is what is actually called the market mode, which is what was mentioned earlier. It means that the riskiest thing in the market is everything moving together: if you are buying everything in the market, sure, you can gain a lot of money, but maybe the market goes down — it's the mode of the correlation matrix which has big moves and realizes a big risk. Is this clear? So this is the first eigenvector, but of course we can look at the other ones, and then the question is: what do the different modes mean? Okay, let's see the second eigenvector. Okay, it's something. If you just look at the blue bars and don't look at the axes, you would say it's random, there is not much information in it. It's not obvious. But still you get the feeling that, for example, here there is one sector where all entries are positive, and one sector here where all entries are negative. So if one wants to play with this, one can come up with interpretations of these eigenvectors. The first eigenvector was: buy every product in the market.
The second eigenvector is, I don't know, buy utilities stocks and sell energy stocks. This will presumably have a high variance because there are opposite co-movements between these products. Okay? The first eigenvector — yes, exactly: I look for the direction of highest variance, then the second highest, and so on. Because, since variance is a physical quantity — it's the variance of your return on this portfolio — it's something that is meaningful. Okay, one can look further; I have the first 10 eigenvectors. We don't have to go through them all, but for a while you can still make up a story. You'll have the figures, you can look at them. These are different orthogonal modes of the market: to first order they move separately, and at the next order they both move against, I don't know, financial products. Ah — don't take the exact examples seriously, because I never really studied these. It can be interesting for someone. But of course you can also say that it's noise — what we'll see is that we have to find a way to decide whether it's noise or not. Because after a while, it's clear that it's becoming noise. I don't know — here you might still see different sectors that seem to try to move together; there, you would say it's noise. Okay, strictly speaking I'm not at all correct in saying this, but if you took the sector structure as the important one, you would call it noise when the entries go up and down within the same sector everywhere. And of course you can say: who cares, why should the modes follow the sector structure?
So we have to be a bit more formal to understand this, and that's what we'll do: try to understand what the information here is. But first — did any of you study random matrix theory? No one? I ask because some people in Paris at the PCS were studying it, and some people came here from that course. Okay, it was with whom — with Marc Potters? Someone else — Laetitia? Okay, okay, anyway. So what we can see is that it's not clear down to which mode there is meaning: the first one has a meaning; the eighth one, we don't know. So we want some proper measure of this, and that is what we will try to build. We will have a very short discussion of random matrix theory, just stating some results; I won't do any calculations — I can give them as an exercise. So, we need a method to infer information from this. And why do we need a method, apart from it being interesting? Of course because, as we said — I just cleaned it off the board — λ_a is the risk of the portfolio v_a. If it's genuine information that buying this and selling that has a large risk, then you can decide either to go and trade according to this eigenvector, or to stay orthogonal to this eigenvector so as not to carry that risk. So it's important to know. The problem with these is always the following: you have your data, say T days and M stocks, and unless T is much larger than M, your correlation matrix will contain a lot of noise. Okay, that's clear, right? If you have 10 products and 10 days, it's very hard to say anything meaningful. And actually, in our case — I didn't say it, but for us T/M is roughly 2.5. So we have more days than products, more data points than parameters, but not by much. And, as we said, often what you want to do is invert the correlation matrix.
If it has a lot of noise in it — a lot of eigenvalues that carry no information and are small — they can explode in the inverse and mislead you completely. Okay. So we will do a very quick discussion of random matrix theory, which will be the following. As I said, assume you have the matrix C, which in matrix notation is C = (1/T) A Aᵀ, where A is your data. And you need to come up with a null hypothesis to do statistics against. What we say is that our null hypothesis will be that things are uncorrelated — that assets are uncorrelated — which means that the entries A_it, which in our case are the returns r_it, are i.i.d. So what we are saying is: let's assume that everything we looked at is noise, there is no information in the correlation matrix, and we were just lucky to see red patches. Okay, that's our null hypothesis: everything is uncorrelated. And then one can write an exact expression for the distribution of the eigenvalues of C — this quantity ρ(λ) here, which, if you want, we can define properly as ρ(λ) = (1/M) dN(λ)/dλ, where N(λ) is the number of eigenvalues below λ. This holds in the limit of M going to infinity and T going to infinity, with q = T/M finite but larger than one. Okay, is it readable what I wrote? M → ∞, so the number of products is big; T → ∞, the number of days; and the ratio is finite, at least larger than one — if it's lower than one, you have fewer days than products, and you cannot really make sense of the correlation matrix. Then you can write the following expression for ρ.
Okay, actually, I'll write it here, because down there it wouldn't really be visible. You get the following — sorry, it's a bit ugly, but one has to write it up once: ρ(λ) = (q / 2πσ²) · √((λ_max − λ)(λ − λ_min)) / λ, with λ_min and λ_max given by σ²(1 ∓ √(1/q))². I won't ask you to derive it at the exam; it's probably not that hard to write up, but I won't do it now. This actually has a name: the Marchenko-Pastur distribution. Marchenko-Pastur — for this null hypothesis; you can have different null hypotheses, and random matrix theory is a super large field with an enormous number of results. I'm not an expert. Anyway, this is what you get, so what does it say to us? That the distribution will, first of all, have a λ_min and a λ_max: a value below which you shouldn't have eigenvalues and a value above which you shouldn't have eigenvalues, so this defines a distribution with finite support. And there are two important things to say about it. One, just in words: here we set q larger than one — it's not a big problem what I wrote. This formula is for q > 1; for q = 1, so the same number of data points as days in our language, you would get the Wigner semicircle law — which I think is not literally a semicircle here, but anyway, it goes back to another classical result; it's not that important. The other thing I wanted to say: of course, we said this holds for M going to infinity and T going to infinity with the ratio finite; then you have this expression, with its λ_max and λ_min.
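Written as code, with the edge positions and the normalization spelled out (this is the standard Marchenko-Pastur density for q = T/M ≥ 1; σ² = 1 corresponds to a normalized correlation matrix):

```python
import numpy as np

def marchenko_pastur(lam, q, sigma2=1.0):
    """Marchenko-Pastur eigenvalue density for M i.i.d. series over
    T days, with q = T/M >= 1: zero outside [lam_min, lam_max], with a
    square-root vanishing at both edges."""
    lam_min = sigma2 * (1 - np.sqrt(1 / q)) ** 2
    lam_max = sigma2 * (1 + np.sqrt(1 / q)) ** 2
    lam = np.asarray(lam, dtype=float)
    rho = np.zeros_like(lam)
    inside = (lam > lam_min) & (lam < lam_max)
    rho[inside] = (q / (2 * np.pi * sigma2 * lam[inside])) * np.sqrt(
        (lam_max - lam[inside]) * (lam[inside] - lam_min))
    return rho

# With q = 2.5, the lecture's T/M, the support is roughly [0.135, 2.665].
q = 2.5
lam_min = (1 - np.sqrt(1 / q)) ** 2
lam_max = (1 + np.sqrt(1 / q)) ** 2
grid = np.linspace(lam_min, lam_max, 200001)
total = marchenko_pastur(grid, q).sum() * (grid[1] - grid[0])
```

For q ≥ 1 the density integrates to one over its support, which the Riemann sum `total` confirms numerically.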
In practice, if M and T are finite numbers, it doesn't hold exactly: with small probability you can have eigenvalues below λ_min or above λ_max. So this is the Marchenko-Pastur result. First of all, what does it mean? It means that under the null hypothesis — if things are uncorrelated — the eigenvalue distribution should look like this. So what we can do is compare this distribution to ours, and that's what I did. This is what you get. Okay, things changed a bit, but the red line here is exactly the same as the blue bars before — it's now a line rather than bars, and I'm zooming in up to three, just for practical reasons. So that's the real data, the same as we had before, just normalized: before it was the count, now it's normalized so the integral is one, because that's how the theory is defined. And the theoretical curve is the blue one. Okay, so how is it done? One thing you can do is simply take σ and q from the data, and you get more or less this. In practice, you can also play around a bit: defining q exactly is subtle — if there are autocorrelations in the data, then the effective number of independent data points is lower than the actual number, so the effective q is lower. So one can optimize the fit between the theoretical and the empirical curves; I don't want to go into details. But you get this, and the message is that only the eigenvalues — the red ones — above, say, 1.7 here or so can be considered not noise, right? So it means you keep less than 10% of the eigenvalues.
Actually that's still a lot, because 10% of 275 eigenvalues is still about 27 in this case, so there is information in those; but the rest, everything below here, is only the noise bulk, and you don't want to use it in any calculation. Why don't you want to use it? Because if it's noise, of course you don't like it, and especially if you have noise at small eigenvalues: inverting them gives big numbers, which can produce strange results when you're doing an optimization. But it's also important that, while it's only 10% or fewer of the eigenvalues that are above this edge and meaningful, it's almost 25% of the total variance in the correlation matrix that is described by these eigenvalues. So you still have a relatively good description, and the rest is noise. We don't have a perfect match, and remember the problem that the two figures are far apart from each other: this was what we had at the beginning, the entire spectrum, the distribution of the eigenvalues up to 60; we zoomed in and then we were up to seven. We'll go to the next figure, but we see that essentially up to this point the theory says it's noise — which means everything above is not noise. Indeed, given that the match is so good for the left-hand side, for the bulk, it means that the bulk is just noise: if you take uncorrelated stochastic processes and look at their correlations over some finite window — M products, T days — then even though they are uncorrelated, what you measure is not zero; you get something that follows this blue distribution. So what we say is that the actual correlation matrix has a bulk like this, plus, I don't know, twenty-odd eigenvalues which are meaningful. Exactly.
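One can reproduce this "bulk plus outliers" picture in a synthetic one-factor market (all parameters below are made up): the top eigenvalue of the sample correlation matrix jumps far above the Marchenko-Pastur edge — the market mode — while the remaining eigenvalues stay close to the noise band.

```python
import numpy as np

rng = np.random.default_rng(3)
T, M = 1000, 250                          # q = T/M = 4

# Every "stock" loads on one common factor plus i.i.d. noise.
factor = rng.normal(size=(T, 1))
R = 0.4 * factor + rng.normal(size=(T, M))

X = (R - R.mean(axis=0)) / R.std(axis=0)  # standardize each column
eigs = np.sort(np.linalg.eigvalsh(X.T @ X / T))[::-1]

lam_max_mp = (1 + np.sqrt(M / T)) ** 2    # MP upper edge for sigma^2 = 1
```

With these numbers the largest eigenvalue lands far beyond the MP edge (the factor is shared by all 250 series), while the second one is already near the noise bulk; the eigenvalues still sum to the trace, M.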
So the question is what to do about these. Yeah, what you want to do, exactly, is somehow clean your matrix. There are several sophisticated methods, but the idea is that you want to subtract this noise, and it's not obvious how, because you still want to keep certain other quantities constant. You can't just subtract correlations naively. You could write zeros into your matrix, but that doesn't work, because it's the eigenvalues which are noise: you don't know which correlations to set to zero, and if you change the correlation matrix, the eigenvalues would also change. And of course, it doesn't mean that everything outside the bulk is not noise; it means that everything inside the bulk is noise. So if your question is whether this particular eigenvalue here is noise, I don't claim that it's not; I just say that with this method I cannot say much. Actually, one could use other null models. For example, you could say that everything is correlated to something: there is a market index which moves, everyone is correlated to it, but otherwise they are uncorrelated, orthogonal to this. That would essentially amount to changing sigma, changing the variance of the processes, and it would give a slightly different result. It's one approach; it's quite sophisticated. So what we plot here is the best match between an uncorrelated null and what you have, but one could do better, of course. Okay, so I just want to say a few words about this. What you want to do is to clean the matrix somehow, which is exactly the question: you want to throw away the bulk. There are methods for this; I don't think we want to discuss them all.
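The one-factor null model mentioned here can be simulated in a few lines: every series gets a common exposure to a "market" factor but is otherwise independent. The exposure beta and the sizes are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# One-factor null model: r_i(t) = beta * f(t) + eps_i(t),
# with a common market factor f and independent idiosyncratic noise eps_i.
rng = np.random.default_rng(1)
M, T = 100, 500
beta = 0.5
factor = rng.standard_normal(T)          # the market mode
eps = rng.standard_normal((T, M))        # idiosyncratic noise
X = beta * factor[:, None] + eps         # T x M matrix of returns

C = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(C))
# The largest eigenvalue (the market mode) separates cleanly from the
# Marchenko-Pastur-like bulk generated by the idiosyncratic part.
```

Under this null, one large eigenvalue is expected and carries the market, so only eigenvalues sticking out beyond both the bulk and the market mode would count as extra structure.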
So one method is to throw away all those bulk eigenvalues and replace them by one representative eigenvalue, chosen so that the trace of the correlation matrix remains the same. If someone is interested, you can look up the methods; there is a huge literature on this. And actually, there is a very simple alternative, which is called shrinkage. I'll write it up because it's simple. So there are sophisticated schemes where you replace the bulk eigenvalues by one value while keeping the trace, but you can also reason as follows: what I'm sure about is the diagonal. The diagonal is real information in the correlation matrix: things are perfectly correlated with themselves. So instead of the correlation matrix, you take a combination; this is what is called shrinkage. I set a number alpha, and form (1 - alpha) times the correlation matrix plus alpha times the identity, the matrix with ones on the diagonal. If alpha is zero, that's just the correlation matrix, but as you increase alpha, you take the measured correlations with a smaller weight and boost up the diagonal. So essentially what you're doing is increasing the importance of the autocorrelations and slightly decreasing the cross-correlations. This type of tweaking of the correlation matrix can often be helpful. If alpha is one, you're saying: I don't know anything except that it's one on the diagonal; the weight you give to your measurements on real data goes to zero. Okay, so I think we'll stop almost here; I just want to say one more thing. We might mention it next time, and it will be a bit related to the exercise that I set. As I've mentioned several times, in practice you often have to invert the correlation matrix, and one can show the following. I won't show a figure for this because it would take too much time.
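The shrinkage written on the board is literally a one-line operation; a minimal sketch:

```python
import numpy as np

def shrink_correlation(C, alpha):
    """Shrinkage toward the identity, as on the board:
    C(alpha) = (1 - alpha) * C + alpha * I.
    alpha = 0 returns the measured correlation matrix unchanged;
    alpha = 1 discards all measured cross-correlations and keeps
    only the ones on the diagonal."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return (1.0 - alpha) * C + alpha * np.eye(C.shape[0])

# Example: a 2x2 correlation matrix with correlation 0.8,
# shrunk halfway toward the identity.
C = np.array([[1.0, 0.8],
              [0.8, 1.0]])
C_half = shrink_correlation(C, alpha=0.5)
```

Note that this moves every eigenvalue toward 1 (small ones up, large ones down), which is exactly why the inverse of the shrunk matrix is better behaved in an optimization.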
But in practice, if you do not clean the correlation matrix and take it as is, it can lead to underestimating your risk a lot. One experiment you can do is this: you take a time period, and on it you measure a correlation matrix. But that means you are essentially using future information: the time has already passed, and you are studying that data. If you want to use the matrix for some optimization going forward, you have to test it on data that was not used to build it; this is what's called out of sample in practice. In sample, of course, the measured correlation matrix looks perfect, because it contains all the noise that was in the real data, so it seems you don't need to clean it. The point of cleaning is when you want to use it for out-of-sample purposes. Actually, there are results showing that if you do not clean, and just use the raw correlation matrix in an out-of-sample fashion to do an optimization, you can easily find the realized risk to be a factor of three higher than you thought. So the variance of a portfolio optimized in sample, which you think will stay under some limit, will be way higher out of sample. Okay, on this last point I'm not sure I was very clear. If you have questions, ask. Yeah, I think it's time to stop. Okay, that's it.
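The in-sample versus out-of-sample experiment can be sketched in a few lines. This is a hedged toy version under my own assumptions: the true covariance is the identity (uncorrelated, unit-variance assets), and the minimum-variance portfolio is optimized on a noisy sample covariance matrix estimated from a finite window.

```python
import numpy as np

rng = np.random.default_rng(2)
M, T = 100, 200                    # illustrative sizes; M/T = 0.5 is a noisy regime
X = rng.standard_normal((T, M))    # in-sample returns, true covariance = I
C = X.T @ X / T                    # raw (uncleaned) sample covariance matrix

# Minimum-variance portfolio built from the raw matrix: w ~ C^{-1} 1,
# normalized so the weights sum to one.
ones = np.ones(M)
w = np.linalg.solve(C, ones)
w /= w.sum()

var_in = w @ C @ w                 # the risk the optimizer *thinks* it takes
var_out = w @ w                    # the true variance, since the true cov is I
# var_out / var_in comes out well above 1: optimizing on the uncleaned
# matrix makes the portfolio look much safer in sample than it really is.
```

The optimizer exploits the small noisy eigenvalues of C, which is precisely why the in-sample variance is flattering and the out-of-sample variance is much larger; cleaning the matrix tames this gap.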