Let's record on this computer. OK, so it's a great pleasure to have Iacopo Mastromatteo from CFM in Paris to present this joint work with Mehdi Tomas and Michael Benzaquen on "What really is a financial instrument? Insights from cross-impact".
Thank you. Yes, thank you very much. It's great to be here, at least virtually. I hope that as soon as we can I'll be able to be there physically; for the moment that's more than enough. So yeah, I tried to come up with this title, which is meant to be a bit catchy. What I'm trying to do is to justify and to build some narrative around the research we have been doing. The target was really to build a model that explains how prices of correlated assets move as a reaction to trades. But you can give some sort of interpretation and build up a story out of this, so the idea is to guide you through the story that comes out. The other angle from which one could see this work, which is maybe more related to what you do, is that you try to fit these models of price responses, the world is noisy, and you have a large-dimensional system. So you need some general principles to guide you, and if you're able to pick the right principles, then you can vastly reduce overfitting. I think that's the message of this talk: it doesn't only relate to finance, it's a bit general and it relates to learning more broadly.
So first, the plan is the following. I'm going to formulate a question. I propose a tentative answer in an infinitely liquid market. Then I formulate the same question for a market in which liquidity is finite and you have transaction costs, and I'll try to give some answers. Some are based on common sense, and common sense takes the form of axioms and principles. The others are based on data. And I'll try to emphasize how those axioms play with the empirical results. So I start with the first part, and I come back to the title and tell you what really is a financial instrument.
And the emphasis here is on what really is a financial instrument, as opposed to two or three. How do you count the number of financial instruments? This seems a trivial question, but it's actually not. So I have listed some corner cases. First, if you're trading a stock like Apple, you can trade it in multiple markets. You can trade it on NASDAQ and BATS; in the US you have something like 13 venues on which you can buy it. You would like to say that you just have one single entity that you're buying or selling, and that the fact that you have multiple markets is artificial. So you would like a notion that is able to capture that. For futures markets, say that you're going to buy oil: you could be buying oil at different maturities. You can buy a contract that expires in 12 months, so you're going to have the crude 12 months from now, or a contract which gets you crude 15 months from now, and so on. So you have many of them, but you think that the intrinsic dimensionality of the space is much smaller than the number of possible maturities. And options. I don't know how many of you are familiar with options, but you can build an example which is similar to the futures one, except that in that case you have a two-dimensional space. An option is a contract that entitles you to buy or sell something at a given time, which is dimension one, and at a given price, which is dimension two. So it's like in the futures example, but you really have a surface of products, and you would like to count how many distinct products you actually have. Another kind of example is a calendar spread. Sometimes in finance you have one financial instrument, whose price is denoted here with P one, and another instrument, whose price is P two, and sometimes you want to buy one and sell the other one. So you build a product which is a linear combination of these products. And so you have three products, which are product one, product two and product one minus two.
And what you would like to formalize is the notion that actually you have just two products. What I'm going to argue is that you cannot do this: as soon as you have finite liquidity, this sort of naive intuition breaks down. So here is the simplest possible operational answer to the question of how to count financial instruments. Let's say you're an asset manager and you want to enter a certain risky position. The way in which you count the number of different things in which you can invest is the following. You compute the correlation matrix of price variations, you diagonalize that matrix and you plot the eigenvalues. And you could say: how many instruments do I have? Whatever mode has zero eigenvalue, you cannot put money on it. You cannot trade it because it doesn't fluctuate. But any other mode can be scaled up to a target risk by using leverage. And when it comes to how I should name these financial instruments, you can label them with the index of the eigenvalue. So these lambda a are the eigenvalues, and a labels the eigenvalue. That's really a naive answer. How it's usually written down in formulas is the following: you have some signal on which you want to invest, but you want to manage your risk, and since the fifties people like mean-variance schemes. One way to do it in a principled manner is to build some sort of Lagrangian, which is this L here. I don't know, do you see my mouse? Yes. Okay. You have a predictivity term: Q is your position, how much you own of each asset, and mu is your expected price change in the future. But you could say, well, I want my risk to be capped. In order to cap it, you introduce a Lagrange multiplier, which is this gamma here, and you put a constraint on how much risk you realize, which is a quadratic form built out of your position and this Sigma, which is the covariance matrix.
And when it comes to the solution of this problem, which I'm sure many people are familiar with, it's just a linear-quadratic optimization problem: you should take this predictivity, put it in the numerator, and divide by the risk. And as everything is a matrix, you need to do it with matrix inversions. But the natural interpretation is in terms of modes of risk. So you say: I have a predictivity, which is this mu. I project my prediction on the modes, so I get this mu a, the projected predictivity, and I divide it by this lambda a squared. And what are those? These are the modes of the covariance matrix of prices. So I introduce the decomposition of Sigma as a sum over modes of the projector on each mode, built from this O matrix, times the corresponding eigenvalue. So when it comes to how you count products, I can say that modes of zero volatility are actually infinitely expensive, because they don't fluctuate: in order to trade them you need to put a lot of dollars on them, you need a lot of leverage. The modes for which lambda is small but non-zero, okay, you can leverage them up and get to target risk. Another point is that you should be careful about overfitting this lambda a parameter. And this is just a simple example to fix notation. First, you compute the correlation matrix of prices. In a two-dimensional case you have one parameter, which is rho, the correlation between the two assets. You diagonalize that, so you have the projections. In this case you have one absolute mode, which is the (1, 1) column of the O matrix here, and you have a (1, -1) mode, which is the relative mode. So you have something like a center of mass and a relative mode. And you have the eigenvalues, which are one plus rho and one minus rho, and are trivially related to the correlation. So one portfolio on which you can put money is the center of mass. Its risk is the square root of one plus rho.
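The mode-by-mode reading of the mean-variance solution, together with the two-asset example, can be sketched numerically as follows. This is a minimal illustration and not the speaker's code: it assumes the Lagrangian is written as mu·Q minus (gamma/2) Q·Sigma·Q, it uses the eigenvalues of Sigma directly as the per-mode variances, and the values of `rho`, `mu` and `gamma` are hypothetical.

```python
import numpy as np

def mean_variance_by_modes(mu, sigma, gamma, tol=1e-12):
    """Maximise mu.Q - (gamma/2) Q' Sigma Q one risk mode at a time.

    Assumed convention (not stated in the talk): the factor 1/2 in front
    of the risk term. Modes with (numerically) zero variance are left
    untraded, since they cannot be leveraged to any target risk.
    """
    eigvals, modes = np.linalg.eigh(sigma)   # columns of `modes` are the O_a
    mu_proj = modes.T @ mu                   # prediction projected on each mode
    tradable = eigvals > tol
    q_modes = np.zeros_like(mu_proj)
    q_modes[tradable] = mu_proj[tradable] / (gamma * eigvals[tradable])
    return modes @ q_modes, eigvals, modes

# Two-asset example: the correlation matrix has a single parameter rho.
rho = 0.9                                    # hypothetical value
corr = np.array([[1.0, rho], [rho, 1.0]])
mu = np.array([0.02, 0.01])                  # hypothetical predictions
position, eigvals, modes = mean_variance_by_modes(mu, corr, gamma=1.0)

print("eigenvalues:", eigvals)               # 1 - rho and 1 + rho, ascending
print("modes (columns):\n", modes)           # relative and absolute mode
print("position:", position)
```

When no eigenvalue vanishes, the result coincides with the usual matrix-inverse formula; the mode-wise form only makes explicit which directions become expensive as their eigenvalue shrinks.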
And the other portfolio is the relative mode, and its risk is the square root of one minus rho. For rho equal to one, that relative mode will not fluctuate: you won't be able to bet on the difference between those two prices. There are a number of caveats, which are quite substantial, and which are abstracted away here. First, there is a statistical problem. How can you say whether lambda is actually zero? This is very relevant in the case in which you have a large number of instruments. Actually, you can see easily that if you have more products than observations, then just out of linear algebra you have some combinations of prices that never fluctuated, but it's not because those modes really should not fluctuate. It's just that you never saw them fluctuating in the data. This is a problem you can deal with, and it won't be discussed here. You also have another subtle problem, which is dynamical, and which depends on the fact that I have not been talking about time scales. I didn't say whether these are price variations over one millisecond, five minutes, one hour, or one day, and the picture of correlations that you get might vary. For example, in the Apple case, where you might trade on different venues, prices actually tend to go out of line at the time scale of microseconds to milliseconds. So at that time scale the difference does fluctuate, but if you go to, say, the daily time scale, you can completely neglect that. So there is a hidden dependency on the time scale in everything I'm going to say. I'm going to consider one minute. But the actual thing that I'm going to discuss is liquidity: what happens to this naive labeling of financial instruments that I have proposed if liquidity is finite? That is today's topic. So let's take the example I had of the asset manager who wanted to invest.
He had this predictor, mu, and he wants to invest a fixed amount of risk, say $10 million, and the asset manager would like to invest this $10 million in the June expiry of the crude oil future. Then he sees that the June contract is not very liquid, and says: okay, the contract expiring in May and the one expiring in July are very similar. They fluctuate in a similar fashion. Can I dilute my trading across these products, so that I add up liquidity, given that they are almost identical? Here I'm using two ill-defined concepts, which are "adding up liquidity" and "almost identical", and I'd like to make these blurry notions a bit more precise. The question is: let's say that May, June, and July are actually completely identical. Does diversification work between completely degenerate products? Do I reduce my transaction costs, when products are completely identical, by splitting my trades among them? And can I build a notion of effective liquidity pool which is able to sum up the volumes traded in a number of products? If I want to incorporate that in the previous problem, I have to add several ingredients. So I am taking back the Lagrangian I had before, with a predictivity term and a risk control, but now I'm saying that, while you trade, you're going to move prices, and this is going to generate extra costs, which I call impact costs. It's the orange term that I have. I assume it's linear for simplicity, and you have a matrix that tells you how much you move the price of product j when trading product i. Then you have another term, which is a cost that scales linearly with the amount of trading you do: it is the absolute value of the quantity traded, a sort of L1 control on the trades, times a parameter which you can interpret as a fee. For each dollar I trade on that instrument, I pay a given fee, which is psi. And when it comes to notation, this capital Q is the position that you have.
You gain dollars that scale with the position you have built up, but you pay costs according to the trade, which is the small q. So small q is, if you want, the derivative of capital Q. And the question I was asking before was what to do if in some subspace you have complete degeneracy. Let's say, in the two-dimensional example I had, what if rho is one: you have 100% correlation, and Sigma has a non-trivial kernel. Then this impact cost term is telling you: look, you should dilute your trading across products, because impact relates to how well you're hidden. So in that two-dimensional example you should do half of your trades on one product and half of your trades on the other one, because you're somehow hiding yourself in the market flow that you would otherwise measure on those products. And on the other hand, yeah.
Yeah, no, sorry. So if I understand correctly, what you are concerned with when you have finite liquidity is that you want to split your orders across similar assets, essentially.
Exactly, yeah.
Now, these similar assets have a correlation matrix and also a cross-impact, essentially, in the sense that if you trade one you also move the other. Now, these two objects must be related, right? I mean, one is essentially the correlation and the other the response function of the same system.
That's precisely the content of the presentation: how do you build this Lambda in a principled fashion, respecting some consistency relations that you would expect from such a system?
Okay, thanks.
Yes, and on the other side of this trade-off you have linear costs. This is quite easy to interpret, because you say: I have five products, and if I forget impact costs, which one should I trade? You should trade the cheapest one. If one has a fee of five basis points and the other one has a fee of one basis point, then you should go for the cheapest contract.
So if you want to solve this trade-off problem, you need to find a sensible way of modeling this cross-impact matrix. And actually, whether dilution works or not comes down to how you put indices in the equation for this Lambda, which is the cross-impact matrix. You could argue two different things. You could say that indices are in the space of physical products, so this matrix should be somehow diagonal. In that case dilution works very well. And you can prove that if your impact is actually non-linear, with an exponent delta larger than zero, then your costs are diluted by a factor n to the minus delta, where n is the number of products. Basically it's a convexity argument: if the sum of the n volumes gives my target and I split it as one over n across n things, then I gain something by splitting. But another way of putting the indices is to say that indices are actually in the space of modes. So for my cross-impact matrix I first project on the basis of these correlation modes, which basically means that this Lambda and the correlation matrix commute. And then you have no dilution, because the market understands that you're splitting your trades and that they should actually be put together. So you project with this O, respond with some common liquidity pool, and then project back to the physical space. So the simplest way to frame this problem is: how should I build a consistent coupled theory of price variations and traded volumes? If you want a simple and sensible model for cross-impact, it can be of this form: price change equal to Lambda, which is this cross-impact matrix, times q, plus a residual. I would argue that it's simple because I have no time dependence, I'm doing a static model, and because the impact exponent is one. So I have a linear impact model, which leads to a quadratic cost model.
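The two ways of placing the indices can be compared on a toy case with n identical products. The sketch below is only an illustration under stated assumptions: the cost of a trade is taken as q·Lambda·q with the overall factor 1/2 dropped, the impact coefficient `k` is hypothetical, and the mode-space matrix is the simplest one that commutes with a fully degenerate correlation matrix (a single common mode).

```python
import numpy as np

def impact_cost(trades, impact):
    """Quadratic cost q' Lambda q of a linear impact model (factor 1/2 dropped)."""
    trades = np.asarray(trades, dtype=float)
    return float(trades @ impact @ trades)

def diluted_cost(total, n, delta):
    """Cost of splitting `total` evenly over n products when impact ~ volume**delta.

    Each product pays (total / n) ** (1 + delta), so the sum scales as n ** -delta.
    """
    return n * (total / n) ** (1.0 + delta)

n = 4
k = 1.0        # hypothetical impact coefficient
total = 1.0    # volume the trader wants to execute

single = np.zeros(n)
single[0] = total                   # everything on one product
split = np.full(n, total / n)       # evenly diluted across products

diag_impact = k * np.eye(n)         # indices in the space of physical products
mode_impact = k * np.ones((n, n))   # indices in mode space: one common liquidity pool

print("product-space impact:", impact_cost(single, diag_impact), impact_cost(split, diag_impact))
print("mode-space impact:   ", impact_cost(single, mode_impact), impact_cost(split, mode_impact))
print("non-linear dilution factor:", diluted_cost(total, n, 0.5) / diluted_cost(total, 1, 0.5))
```

With the diagonal matrix the split trade costs n times less, while with the rank-one matrix the cost depends only on the total volume, so splitting buys nothing.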
And we have some propositions on what to do with non-linear stuff, but I won't be discussing that. And it should be sensible. Sensible means explaining the data well, which in some sense means that your residual should be small, but you also want it to satisfy common sense. So this Lambda should be compliant with suitable axioms. Here I am formulating the question not as fitting this Lambda from data, but as: let's consider the space of all possible models; how should a sensible model be made? The first thing is: what should I take as building blocks for this model? Let's assume that the world is Gaussian: price variations are Gaussian and the volumes traded are Gaussian. Then you don't have many things to measure, actually. In a zero-mean world you can just measure quadratic variations, and they completely specify your model. So you have the quadratic variation of prices, which is the price covariance Sigma. You have the volume covariance, which is this Omega. And you have a response function, which is the average of price change times q. And the question is: which reasonable formulas can you build by combining these symbols together? This slide is just to show you what these building blocks look like. I've taken three products on crude oil. I have two expiries of this contract, the next maturity and the following one, which are the first two products. And the third one is the calendar spread I was mentioning, which is just long the first contract and short the other one. It's a linear combination, plus one, minus one. As you can see in the correlation matrix, which is the first plot, the successive expiries are very close: they're 99% correlated. Whereas the third product has positive correlation with the first maturity and negative correlation with the other maturity, which is what you expect by construction. On this Omega object, which is the covariance of traded volumes, you see a strong diagonal pattern.
So in the upper left corner you see that there is a product which is much more liquid than the others, which is the so-called front-month maturity, the product which is closer to expiry. The other one, which you can see in the center, is much, much less liquid. And then you have the third one, the calendar spread, which has very thin liquidity. This is reflected in the plot of the response function, because you see these vertical stripes, and these vertical stripes reflect the fact that the price variations might have similar scales, so you don't have horizontal stripes, but the volumes are very different. Some products are much, much more liquid than others. Actually, when I first presented this, the audience had a device to answer multiple-choice tests, so I was asked to do a quiz, and this was the quiz, but I will save you from answering it. What I was doing was to build some candidate formulas and ask people whether they like these formulas for cross-impact or not. The first one takes the price covariance to the third power, because why not? The other thing that you can do is to say: okay, I don't believe in cross-impact. I think that cross-impact does not exist, it's diagonal. So it's the diagonal of Sigma, which has the dimension of a price, divided by the diagonal of Omega. So you build something which is a price over a volume, a delta P over delta V somehow, which is something that satisfies dimensional analysis at least. You could try to build a cross-impact model in which you do the same thing, but interpreting the one-half power as a matrix factorization. So this Sigma to the one half could be a PCA decomposition, and I have a cross-impact model that satisfies dimensional analysis. I can do the same thing, but based on the response. So I can build the same combination, something that transforms like a price, but built out of responses rather than price covariances.
Or you could do something along the lines of what I was doing before, respecting dimensional analysis: in the numerator I have this lambda a, which is the price variation of the mode, and in the denominator I take the volume covariance, but projected on the modes of the correlation matrix. And I build something that scales well dimensionally, so I have to take a square root of this Omega projection. And then I have a mysterious formula, which I'm going to tell you more about later, which we like a lot and which we've been working on. So in that seminar people were answering and I was commenting. The first formula does not make sense: it's a price to the sixth power, and impact should be a price over a volume. For the second one, you're assuming that cross-impact does not exist. The third one we call the whitening model, because you're going to a basis in which things are uncorrelated, building a model in that basis, and then rotating back. But you can notice that this model is not symmetric. The next model is R Omega to the minus one. I don't know if some of you recognize that, but that's just what you get out of a linear regression. In a linear regression the regression parameter is usually the average of y x divided by the average of x x, and that's precisely what maximum likelihood gives you. It is not symmetric either. Then this is the model that we proposed first, in which you basically assume that the cross-impact matrix commutes with the correlation matrix, so that modes of the correlation matrix are also somehow modes of liquidity. And this is another model, which we like more, that we proposed later. But the take-home is that you have many ways of combining those symbols, and you would like to have some axiomatization procedure to classify models. Another, more speculative message is that these models are not very important per se. Probably what holds more value is to validate on empirical data which symmetries the data satisfies.
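Three of the candidates described in words can be written down directly from the building blocks. The sketch below is a reading of the spoken description and not the authors' implementation: it assumes the direct model is the square root of diag(Sigma) over diag(Omega), that the maximum-likelihood model is R times the inverse of Omega, and that the mode-based model uses the eigenvectors of Sigma with the mode volatility divided by the square root of the projected Omega. The function names and all numerical inputs are hypothetical.

```python
import numpy as np

def direct_model(sigma, omega):
    """No cross-impact: per-product price scale over volume scale, on the diagonal."""
    return np.diag(np.sqrt(np.diag(sigma) / np.diag(omega)))

def ml_model(response, omega):
    """Linear-regression coefficient, average of (dp q') times the inverse of Omega."""
    return response @ np.linalg.inv(omega)

def el_model(sigma, omega):
    """Mode-based model: impact is diagonal in the eigenbasis of Sigma."""
    eigvals, modes = np.linalg.eigh(sigma)
    mode_vol = np.sqrt(np.clip(eigvals, 0.0, None))                   # price scale per mode
    mode_liq = np.sqrt(np.einsum("ia,ij,ja->a", modes, omega, modes)) # volume scale per mode
    return (modes * (mode_vol / mode_liq)) @ modes.T

# Hypothetical two-product inputs (not data from the talk).
sigma = np.array([[1.0, 0.9], [0.9, 1.0]])      # price covariance
omega = np.diag([4.0, 1.0])                     # volume covariance
response = np.array([[1.0, 0.3], [0.5, 0.4]])   # average of price change times volume

lam_direct = direct_model(sigma, omega)
lam_ml = ml_model(response, omega)
lam_el = el_model(sigma, omega)

print("direct:\n", lam_direct)
print("maximum likelihood (symmetric?):", np.allclose(lam_ml, lam_ml.T))
print("mode-based commutes with Sigma:", np.allclose(lam_el @ sigma, sigma @ lam_el))
```

On these inputs the regression estimate comes out non-symmetric, while the mode-based matrix is symmetric and commutes with Sigma by construction, which is the distinction the talk draws between the two.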
Establishing that a symmetry is valid or not has much deeper value than validating that a single instance of a model is good or not. Another point is that if you axiomatize, then you have a sort of plug-and-play approach. You can enforce properties of the cross-impact matrix by controlling what your inputs look like. Somehow you control your output cross-impact matrix by knowing what you're putting in, in a disciplined way. So how do you enforce this common sense, this axiomatization? Well, first you can try to get dimensional analysis right, or gauge theory if you want to somehow glorify dimensional analysis. First you could say: I want to predict price variations. They are expressed in units of cash, say dollars, which means that, given that my cross-impact matrix is a function of the sufficient statistics, price covariance, response and Omega, if I multiply all prices by alpha, then the cross-impact matrix gets multiplied by alpha. Then you could go further and say that volumes can actually be multiplied by an arbitrary factor. If tomorrow I say that all volumes are multiplied by five, I can divide all prices by five and the market will probably behave just the same. This is true to a certain extent: you have some symmetry-breaking things, like the minimum tick size or microstructure-related limits, but you expect the symmetry to be respected. I call this split symmetry, because in stock markets you sometimes have stock splits, in which volumes are augmented by a factor and prices reabsorb that factor. And this is reflected in the property here: prices get a D to the minus one and volumes get a D. This gives you a functional equation for your cross-impact. And then, going even further, costs could be invariant under rotations of products. If you wake up in a world in which you can trade linear combinations of products, you expect things to behave more or less the same. So the first axiom, we believe a lot in this.
The second one a bit less, and the third one even less, for reasons which are easy to justify and which relate to the fact that the P&L that you make has symmetry-breaking terms, and sometimes they're large. But to a certain extent it's natural to build models in which those symmetries are respected and then progressively break them. Another property that you expect is that for uncorrelated products impact should be additive. So if I take this cross-impact matrix and everything is diagonal, then it's the sum of the single-product impacts on the products. And then you have very important properties which relate to no free lunch. No free lunch in financial markets is usually the principle of no arbitrage. In this specific instance, no arbitrage means that you cannot use cheap products to push prices of expensive products and gain money. So you cannot use pump-and-dump schemes in a real market, which boils down to the fact that Lambda is positive semi-definite. This prevents the existence of static arbitrages. If you also ask for the absence of dynamic arbitrages, then the matrix should also be symmetric. And this is a reasonable thing to impose. When should you impose it? Well, if you're using the output of this research for an optimizer, some algo trader that is taking decisions, you want the thing to be controlled, so that it doesn't start to pump the market or to trade infinite amounts. So you need something that doesn't believe that it can arbitrage the market away. That's one good reason to enforce this.
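The two no-arbitrage conditions lend themselves to a quick numerical check. The sketch below assumes, as a simplification, that the cost of a one-shot trade q is q·Lambda·q, so that only the symmetric part of Lambda matters for the static condition; the helper names and the two example matrices are hypothetical.

```python
import numpy as np

def static_arbitrage_direction(impact, tol=1e-12):
    """Return a unit trade direction with negative cost q' Lambda q, or None.

    Only the symmetric part of Lambda enters q' Lambda q, so the static
    condition is that this symmetric part is positive semi-definite.
    """
    sym = 0.5 * (impact + impact.T)
    eigvals, eigvecs = np.linalg.eigh(sym)   # eigenvalues in ascending order
    if eigvals[0] >= -tol:
        return None
    return eigvecs[:, 0]

def is_symmetric(impact, tol=1e-12):
    """Symmetry check, the extra condition mentioned for dynamic arbitrages."""
    return bool(np.allclose(impact, impact.T, atol=tol))

# Hypothetical impact matrices (not estimates from the talk).
bad_impact = np.array([[1.0, 3.0], [0.0, 1.0]])    # large one-way cross term
good_impact = np.array([[1.0, 0.5], [0.5, 1.0]])   # symmetric, positive definite

direction = static_arbitrage_direction(bad_impact)
cost = float(direction @ bad_impact @ direction)

print("profitable direction:", direction, "cost:", cost)
print("good matrix admits one?", static_arbitrage_direction(good_impact) is not None)
print("symmetric:", is_symmetric(bad_impact), is_symmetric(good_impact))
```

An optimizer fed with the first matrix would see a trade whose cost is negative and could scale it without bound, which is the behavior the positive semi-definite requirement is meant to rule out.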
The other reason is if you're a strong believer in market efficiency, which brings me to the next point, which is this stupidity chart made famous by an Italian economist, Cipolla, who divides people into four categories: people who benefit themselves and others, who are smart; people who benefit themselves and not others, who are bandits; and people who bring losses to themselves and to others, whom you can call stupid. An open question, empirically, is the one that concerns the presence of arbitrages. It's not clear to me whether people refrain from pumping and dumping because it's forbidden by law and also unprofitable, or because it's forbidden even though it could be profitable. So the open question is: are real markets vulnerable to this? Which means: is trying to pump and dump using other products just stupid, or is it bad behavior? And this is, I would say, still an open point. So now I come to one of the properties which I think is among the most relevant. It's a bit technical, and I hope not to lose anyone here, but it's really central. The idea is that if you have combinations of products that never move, which means that you have a kernel of this Sigma and you never measure fluctuations there, you could have a condition called weak fragmentation invariance: your impact should conserve that. If something never fluctuated, then even if you trade in that direction you should not be able to move it. In formulas, you write it by saying that if you project your impact on the kernel of this Sigma, then that should be zero.
And then you could ask for more, which we call the semi-strong version, in which you have a projection also on the other side. This means that regardless of how you trade those zero modes, you have no influence on what happens in the rest of the world, on the things that actually fluctuate. And the strongest version that you can ask for is the requirement that everything depends only on how the modes that actually fluctuate behave, and that the modes that never fluctuate have no influence whatsoever on anything. So the property I had before, this fragmentation invariance, tells you that where you have no fluctuations, impact should not move prices, and you would like to control it from first principles in this way. There are other properties, which we'll see are a bit related, and which concern liquidity. This again is a bit of a long shot, so I'll go slowly. Let's say that you have some products which are very illiquid: their liquidity is proportional to epsilon. This means that you have these vertical stripes that I was showing you before in the response, and you'll have a strongly diagonal dependency in your Omega, because you have this subspace of stuff which is hard to trade. What you would like to ask is that illiquid products should have finite influence on liquid products. If I have $1 invested in a product with very thin liquidity, I should not be able to use that dollar to move disproportionately, like epsilon to the minus one, something which is liquid. This we call weak cross-stability. And we also have a strong version of that, which is the fact that illiquid products should not change the interaction among the liquid products. If I introduce a new instrument and no one is trading it, the behavior that you measure on it should have no influence on the stuff which is liquid.
And finally, while these two are nice properties, there is a property that you should not want to have, which is self-stability, and which is somehow dangerous. It lives in the part of the plot in which you have the projection on the illiquid space, and it's telling you that if you trade something which is illiquid, you pay a finite price. What you would actually like to have is a model in which things are self-unstable, in that you get punished if you trade illiquid stuff. You would like a model that tells you that you don't want to be the only one in the market trading some name. As this concept is a bit complicated, I'd like to illustrate how it translates in the models I was presenting. Let's say you have a two-dimensional world with an Omega, the liquidity matrix, which is diagonal. Then I say that the second product is illiquid. So I go from Omega to this Omega prime, the illiquidity is this epsilon, and this P projects on that space. If you have a model with direct impact only, it is cross-stable, because your cross terms do not explode. It's not self-stable, because it's telling you that if you trade the illiquid product you pay a lot, something which is epsilon to the minus one, which is nice. If you do a maximum likelihood estimation, it violates cross-stability. Maximum likelihood is telling you that if you trade the illiquid product, say some small-cap stock with little liquidity, you're able to move the price of Apple. You can use that product to move the price of Apple, which you think is unreasonable because it would lead to arbitrages. So doing naive maximum likelihood is not good from this respect, because it doesn't satisfy this common-sense requirement.
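The behavior of the direct and maximum-likelihood models in the illiquid limit can be reproduced on a two-product toy case. The sketch below makes one modeling assumption that the talk does not spell out: making product 2 illiquid is represented by scaling its traded volumes by epsilon, so that the second column of the response and the second row and column of Omega scale accordingly. All numerical inputs are hypothetical.

```python
import numpy as np

def direct_model(sigma, omega):
    """Diagonal model: price scale over volume scale, product by product."""
    return np.diag(np.sqrt(np.diag(sigma) / np.diag(omega)))

def ml_model(response, omega):
    """Regression estimate R Omega^{-1}."""
    return response @ np.linalg.inv(omega)

def make_illiquid(response, omega, eps):
    """Scale the volumes of product 2 by eps (assumed definition of illiquidity)."""
    scale = np.diag([1.0, eps])
    return response @ scale, scale @ omega @ scale

sigma = np.array([[1.0, 0.9], [0.9, 1.0]])      # hypothetical price covariance
omega = np.diag([4.0, 1.0])                     # hypothetical volume covariance
response = np.array([[1.0, 0.3], [0.5, 0.4]])   # hypothetical response

ml_cross, direct_cross, direct_self = {}, {}, {}
for eps in (1.0, 0.1, 0.01):
    resp_eps, omega_eps = make_illiquid(response, omega, eps)
    lam_ml = ml_model(resp_eps, omega_eps)
    lam_direct = direct_model(sigma, omega_eps)
    ml_cross[eps] = lam_ml[0, 1]          # effect of trading product 2 on price 1
    direct_cross[eps] = lam_direct[0, 1]  # stays at zero: cross-stable
    direct_self[eps] = lam_direct[1, 1]   # grows like 1/eps: not self-stable
    print(eps, ml_cross[eps], direct_cross[eps], direct_self[eps])
```

Under this scaling the regression estimate becomes the original R Omega^{-1} with its second column divided by epsilon, so the cross term from the illiquid product onto the liquid one grows like epsilon to the minus one, while the direct model only penalizes trading the illiquid product itself.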
The model that we proposed, which we have called the EL model, the eigenliquidity model, in which you project on the space of modes, actually doesn't have any epsilon whatsoever, because it's telling you that the only things which exist are the modes of the correlation matrix. Even if you trade the illiquid product, you're not going to move it a lot, because it's correlated to something that has finite liquidity. So it's self-stable. It's telling you that you should not be afraid of trading illiquid things, because they're correlated to liquid stuff. And finally, the reason why we like this Kyle model is that it gives a non-trivial answer to this point. It builds an interesting scaling variable, which is this combination of epsilon to the minus one times the square root of one minus rho squared. How one reads this is: should you be afraid of trading the illiquid product? The answer is no, but it should be very correlated. So you have a limit which is not well defined. You have a non-commutation of limits, and you should ask yourself: is it more illiquid or is it more correlated? You should compare the level of illiquidity with the level of correlation. It's okay to trade illiquid stuff if it's correlated with liquid stuff, but if it's very illiquid then you should be careful. And I think it's quite a smart answer that this model gives. Another property that you might want to ask for is: is the cross-sectional structure of returns consistent with the observed one? This property relates to the fact that Lambda Omega Lambda transpose should be equal to the price covariance matrix. To me it's not that clear that this should be the case, because you could have a part of the covariation of prices of financial instruments that comes from liquidity reasons and goes in one direction, and a part of the price variation that comes from news and goes in another direction.
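The consistency requirement on returns can be turned into a direct numerical test of any candidate Lambda. In the sketch below, the symmetric construction used as a positive example is not taken from the talk: it is simply one matrix that satisfies Lambda·Omega·Lambda' = Sigma by design (built from a Cholesky factor of Omega), and it is an assumption that it resembles any of the models discussed. The inputs are hypothetical.

```python
import numpy as np

def psd_sqrt(matrix):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    return (eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))) @ eigvecs.T

def return_covariance_gap(impact, omega, sigma):
    """Largest entry of |Lambda Omega Lambda' - Sigma|."""
    return float(np.max(np.abs(impact @ omega @ impact.T - sigma)))

def symmetric_consistent_model(sigma, omega):
    """Illustrative construction (assumption): with Omega = L L',
    Lambda = L^{-T} sqrt(L' Sigma L) L^{-1} reproduces Sigma exactly."""
    chol = np.linalg.cholesky(omega)          # lower triangular L
    chol_inv = np.linalg.inv(chol)
    return chol_inv.T @ psd_sqrt(chol.T @ sigma @ chol) @ chol_inv

sigma = np.array([[1.0, 0.9], [0.9, 1.0]])      # hypothetical price covariance
omega = np.array([[4.0, 0.5], [0.5, 1.0]])      # hypothetical volume covariance
response = np.array([[1.0, 0.3], [0.5, 0.4]])   # hypothetical response

lam_sym = symmetric_consistent_model(sigma, omega)
lam_ml = response @ np.linalg.inv(omega)        # regression estimate, for contrast

gap_sym = return_covariance_gap(lam_sym, omega, sigma)
gap_ml = return_covariance_gap(lam_ml, omega, sigma)
print("gap, symmetric construction:", gap_sym)
print("gap, regression estimate:   ", gap_ml)
```

A non-zero gap is not necessarily a defect: as noted above, part of the price covariance may come from news rather than from trades, in which case Lambda·Omega·Lambda' would be expected to fall short of Sigma.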
But it's good to know which models have these properties and which ones do not. And finally, something that we discovered just very recently, but which I find extremely interesting, is that these two key properties for building cross-impact models, these fragmentation invariance properties and these stability properties on liquidity, are related by gauge symmetry. And the connection goes as follows: you take a model which is split invariant, so in which you can rescale away the volumes, then you can map a theory with weakly fluctuating directions to a theory with illiquid directions. And this just goes in one direction, but if you also have rotational invariance it goes the other way around: if you have a theory with illiquid directions, then you can go to weakly fluctuating directions. And there is a very deep connection, I think, which is the fact that if you have semi-strong fragmentation invariance then you have weak cross-stability, and if you have strong fragmentation invariance you have strong cross-stability. Somehow it's enough to ask for fragmentation invariance and you have well-behaved models in terms of liquidity. And yeah, so there was the second part of the quiz, where the audience could answer which models they like the most now, which I shall skip. And the take-home from this part is that the best choice that you have might depend on the application. And the other point is that what you consider reasonable from first principles might not be met in real data, which brings me to the last part, which is the empirical one. Before that I just wanted to show you how these different models relate with respect to these different properties. You don't have to check all the ticks. I just want to point out that the Kyle model is nice because everything is green. Somehow all the properties that you could have are verified by this model.
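One way to probe the invariance side of this connection numerically is sketched below. It assumes, as an interpretation rather than something spelled out in this passage, that split invariance means: if each product's units are rescaled by a positive factor `d_i` (volumes multiplied by `d_i`, prices divided by `d_i`), the impact matrix should transform as `D^{-1} Lambda D^{-1}`. The impact rule being tested is the symmetric solution of lambda omega lambda transpose equal to sigma, and all matrices are made up.

```python
import numpy as np

def psd_sqrt(a):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def symmetric_impact(sigma, omega):
    """Symmetric PSD Lambda solving Lambda @ omega @ Lambda.T == sigma."""
    g = np.linalg.cholesky(omega)      # omega = g @ g.T
    g_inv = np.linalg.inv(g)
    return g_inv.T @ psd_sqrt(g.T @ sigma @ g) @ g_inv

def rescale_units(sigma, omega, d):
    """Return (sigma, omega) after prices are divided and volumes multiplied by d."""
    d_mat = np.diag(d)
    d_inv = np.diag(1.0 / d)
    return d_inv @ sigma @ d_inv, d_mat @ omega @ d_mat

# Made-up return covariance and flow covariance for three products.
sigma = np.array([[1.0, 0.6, 0.2],
                  [0.6, 2.0, 0.5],
                  [0.2, 0.5, 1.5]])
omega = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 0.5],
                  [0.0, 0.5, 2.0]])
d = np.array([2.0, 0.5, 10.0])         # hypothetical split factors

lam = symmetric_impact(sigma, omega)
sigma_s, omega_s = rescale_units(sigma, omega, d)
lam_s = symmetric_impact(sigma_s, omega_s)

expected = np.diag(1.0 / d) @ lam @ np.diag(1.0 / d)
split_invariant = np.allclose(lam_s, expected)
print("split invariant:", split_invariant)
```

The same harness could be pointed at any other impact rule by swapping out `symmetric_impact`, which is the kind of property-by-property comparison described here.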
And actually the total number of models we came up with is quite large, it was 11. And that's why this axiomatization is nice, because you can check how they differ on the basis of which properties they satisfy or not. So now data, which I had promised. I start with a set of illustrative examples, and back to this crude case. If you remember, I had three products: the front-month futures contract on oil, the next-to-maturity oil futures contract, and the relative calendar spread, plus one minus the other. And you see the results of maximum likelihood on the left. What you see is that it's clearly non-symmetric, and you see these strange vertical stripes, as if trading this combination, plus one minus one, were able to have a disproportionate effect on crude, which seems a bit unreasonable. And on the other hand you have this Eigen liquidity model that tells you that basically you have just one single product. If you remember, these two crude expiries were 99% correlated. So let's assume they're 100% correlated. It's close to reality. And then you get one single product, and you see that this matrix here is actually rank one, basically. You just have one mode of liquidity. It's symmetric, and you're not afraid of trading the calendar spread even though it's very illiquid, because you say, well, anyway it's one single mode, then one single pool of liquidity. We have also run the same example by taking anti-correlated products, which is a nice example. We take US bonds, these 10-year US notes, the first expiry and the second expiry, which are very correlated as in the crude example. And then we take the S&P mini, so it's an index future which basically reflects the state of the US equity market, the first expiry and the three-month expiry, also very correlated. And the two groups are typically anti-correlated, always excepting these COVID-related days, as we have discussed before. And so that's it for the correlation matrix.
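The counting argument in the crude example can be illustrated with a small calculation. The numbers are illustrative assumptions (unit return variances and the 99% correlation quoted above); the sketch only builds the return covariance of the two expiries plus their calendar spread and looks at its modes, it does not fit any impact model.

```python
import numpy as np

rho = 0.99                                   # correlation of the two expiries
sigma_two = np.array([[1.0, rho],
                      [rho, 1.0]])           # unit variances are an assumption

# Rows: front month, next expiry, calendar spread (+1 front, -1 next).
basis = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, -1.0]])
sigma_three = basis @ sigma_two @ basis.T    # covariance of the three products

eigvals = np.linalg.eigvalsh(sigma_three)[::-1]   # sorted in descending order
share = eigvals / eigvals.sum()

print("eigenvalues:", np.round(eigvals, 4))
print("variance share of the top mode:", round(float(share[0]), 4))
```

The three listed instruments span at most two independent directions, because the spread is a linear combination of the other two, and with 99% correlation a single mode carries almost all of the variance.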
And when you look at the volumes, what you see is that the front-month expiry is very liquid and the other ones are less liquid, which is reflected in the vertical stripes in the response function. And if you look at the fitted cross-impact models, what you see is that in maximum likelihood, as in the previous example, impact is driven by the illiquid product. Somehow the answer is that impact is a relative phenomenon. It relates to how much you trade with respect to the typical level of liquidity in that product; that's what maximum likelihood gives. Whereas this Eigen liquidity model tells you that impact is absolute. It's not about the level of your trading relative to the usual one. It's about the overall number of dollars that you trade, which means that basically in this ELM cross-impact matrix you have two blocks, each of which is basically one single product, and then you have this cross piece. And finally this Kyle model is quite interesting, because it's able to give an impact model which has a negative sign in the off-diagonal block. It's able to grasp the fact that effectively you have a single product for the 10-year US note and a single one for the S&P mini. You have these two-by-two blocks, but somehow it also understands that if you trade the thin-liquidity contract you might move it by a lot. So you have 4.4, 4.3, 4.3, it's the same product, but you have a 59 if you trade the illiquid thing. So Kyle is saying that impact is neither relative nor absolute, it's something in between, and it satisfies all the axioms that we had put. And the final illustrative example is what you see on stocks, which we used in order to emphasize the fact that you can run this machinery in a high-dimensional space, so we take I think 400 products. This is the correlation matrix for stocks. You see that you have a strong diagonal component.
You have a strong market mode, which has been removed from the plot because otherwise you would see a constant-color thing, but if you remove the center of mass you see things which can be associated with the financial sectors, which are like blocks in this correlation matrix. And actually the correlation of volumes looks pretty much the same. So I just made one slide: you have the strong diagonal component of the correlation in both cases, you have a market mode, removed, you have a sector structure, and somehow the flow correlation is less correlated cross-sectionally, which means it's a bit more diagonal, which is not visible in the figure. But what's interesting is that you have a huge heterogeneity in volume. Some stocks are very liquid. You have Apple, which trades three billion a day, and you have something else in the sample that might trade maybe $10 million a day. You have orders of magnitude of difference, which leads to estimates of the cross-impact matrix that have big stripes if you use this maximum likelihood estimation. On the other hand, if you follow this Eigen liquidity model prescription of projecting on the modes, things are very much homogeneous, but somehow you don't conserve this heterogeneous volume aspect, and this Kyle model is able to do a bit of both. And my next point is that one could ask oneself, how much do you overfit by doing this? You would expect that maximum likelihood overfits a lot and more principled things overfit less. What you see in the upper panel of this plot is the R-squared of these regressions at the timescales that we have been using, like one minute, and it is actually maximum likelihood that does well.
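Removing the market mode to reveal the sector blocks can be sketched as follows. The correlation matrix is synthetic (six hypothetical stocks in two sectors, with made-up market, sector and idiosyncratic variances), and "removing the center of mass" is interpreted here as subtracting the top eigenmode, which is an assumption about how the plot was made.

```python
import numpy as np

n_per_sector = 3
sectors = np.repeat([0, 1], n_per_sector)          # sector label of each stock
same_sector = (sectors[:, None] == sectors[None, :]).astype(float)
n = len(sectors)

# Made-up factor structure: common market factor, sector factor, idiosyncratic noise.
market_var, sector_var, idio_var = 1.0, 0.25, 0.25
cov = market_var * np.ones((n, n)) + sector_var * same_sector + idio_var * np.eye(n)
std = np.sqrt(np.diag(cov))
corr = cov / np.outer(std, std)

# Subtract the top eigenmode, interpreted here as the market mode.
eigvals, eigvecs = np.linalg.eigh(corr)            # eigenvalues in ascending order
top_val, top_vec = eigvals[-1], eigvecs[:, -1]
residual = corr - top_val * np.outer(top_vec, top_vec)

print("correlation:\n", np.round(corr, 2))
print("after removing the top mode:\n", np.round(residual, 2))
```

Before the subtraction every off-diagonal entry is positive and the blocks are hard to see; afterwards the within-sector entries stay positive while the cross-sector ones turn negative, so the block structure stands out.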
The bottom panel is the overfitting, and actually what we were quite surprised by is that you don't overfit that much: our out-of-sample score with respect to the in-sample one is between 95% and 100%. But that's because we have a lot of instances, each minute is a new snapshot, and so you're able to afford a model that is greedy in terms of data, such as maximum likelihood. If you were sampling at a slower frequency, like 10 minutes or so, then you would see that the performance of maximum likelihood would degrade, and models that are built in a more principled way stand up better to the challenge of overfitting. As you can see, the one that behaves best in terms of overfitting is the direct model, because if you don't have cross-impact you don't have N squared things to estimate, you just have N, and this is good in terms of statistical performance, but on the other hand you're losing a lot. And the point is that these techniques tend to be greedy in terms of data, you need to have enough data in order to be able to say something, and these general principles help you in selecting things that make sense.
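The N versus N squared point can be illustrated with a toy simulation. Everything here is an assumption made for the sketch: the true impact is taken to be diagonal (so that the extra cross terms are pure overfitting), flows and noise are independent Gaussians, the training sample is deliberately tiny, and the score is an uncentered R-squared. It is not meant to reproduce the numbers on the slide.

```python
import numpy as np

rng = np.random.default_rng(1)
n_assets, n_train, n_test = 20, 40, 20_000
lam_true = np.eye(n_assets)                  # toy truth: diagonal impact only

def simulate(n_obs):
    """Draw flows q and price changes dp = lam_true q + unit-variance noise."""
    q = rng.standard_normal((n_obs, n_assets))
    dp = q @ lam_true.T + rng.standard_normal((n_obs, n_assets))
    return q, dp

def fit_ml(q, dp):
    """Least squares for dp ~ q @ lam.T, i.e. n_assets**2 coefficients."""
    coef, *_ = np.linalg.lstsq(q, dp, rcond=None)
    return coef.T

def fit_direct(q, dp):
    """One coefficient per asset: regress each price change on its own flow."""
    return np.diag((q * dp).sum(axis=0) / (q * q).sum(axis=0))

def r_squared(lam, q, dp):
    """Uncentered R-squared pooled over all assets."""
    resid = dp - q @ lam.T
    return 1.0 - (resid**2).sum() / (dp**2).sum()

q_train, dp_train = simulate(n_train)
q_test, dp_test = simulate(n_test)

scores = {}
for name, fit in [("ml", fit_ml), ("direct", fit_direct)]:
    lam = fit(q_train, dp_train)
    scores[name] = (r_squared(lam, q_train, dp_train),
                    r_squared(lam, q_test, dp_test))
    print(f"{name}: in-sample R2={scores[name][0]:.3f}, "
          f"out-of-sample R2={scores[name][1]:.3f}")
```

With so few training snapshots the full matrix fits the training set better but loses much more out of sample than the N-parameter model; raising `n_train` narrows the gap, in line with the remark that one-minute sampling provides enough instances to afford the greedier model.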
Another point here is the comparison of the performance across all the examples I have. I showed you three examples: one was about the crude futures, one was the example with the anti-correlation, with the S&P mini and the 10-year notes, and then the stocks. The upper and bottom left panels compare the performance of the models in the different examples. What I think one should say is that typically, at this one-minute timescale, maximum likelihood tends to perform better and the direct model tends to perform quite poorly: you need to have cross-impact. But somehow it's not that evident from the stocks, which are the solid continuous lines, in which we've varied the number of assets from, say, one up to 400 by bootstrapping, in order to check the robustness of the results with respect to the number of products. But if you check the points there, they are the other two examples: the first one, which should be the plus signs, is bonds and indices, and the other one is the crude oil contract example. In that case the direct model does very badly, and you expect that because they're very much correlated, so if you don't take this cross-impact into account you lose a lot. You don't lose a lot on the stocks because they're very diagonal, but you lose a lot on these futures contracts because they're very much correlated, and that's somehow expected. And then I like to show this picture because I find it quite illustrative. Here I'm showing the R-squared, so the statistical performance, of different models in different liquidity tranches. What you see with the blue curve, which is the direct impact, is that it does relatively well on liquid and illiquid products but it does badly in the middle. The orange model, which is this Eigen liquidity model, does well in the intermediate region, and somehow the Kyle model is able to get the best of both worlds. And maximum likelihood again does better for this choice of timescale. And this has an intuitive interpretation, you can
understand, you can rationalize it by saying that for very liquid and very illiquid products the volume heterogeneity is very important, so things tend to be very diagonal and a model without cross-impact does the job quite well. In the intermediate regime you would expect a model that has cross-impact and puts all liquidity together to do well. But if you want to get right both the illiquid regime and the strongly correlated regime, you need some good way to interpolate, which brings you to this green curve. And the last bullet point I have is that yes, maximum likelihood does a bit better than them, but it overfits more at larger timescales. And I had this very last instance of the quiz. The take-homes are that you can rule out bad models based on data, but noise often leaves you with a set of reasonable models, and using principled criteria for model selection might be a good idea in those cases. And the conclusions. First, on the theoretical side, the initial recipe I gave was naive: labeling the modes of sigma as the true products misses the liquidity effect. This Eigen liquidity model predicts that they are actually the same thing, assuming that correlation and liquidity are matrices that commute. But in general, if you have heterogeneous liquidity, these things do not commute, and what actually is a financial instrument has a hard answer because you have non-commuting objects. In practice, you actually have no clear winner on data; the application should be a tiebreaker among models performing similarly. On the other hand, arbitrage conditions, fragmentation invariance and liquidity behavior play a crucial role, and you would like to understand what's going on from that angle when choosing a model. For example, a direct model works if your trading is quite diagonal, but it fails miserably if you're trading, say, modes or relative modes of products. And the Kyle prescription is nice because it gives you decent performance and many convenient
features which are guaranteed exactly. And I think that's all, yeah.
Okay, thank you very much. Now, are there questions? Maybe I can start asking a question while we read your disclaimers.
Yes, they're always very funny.
So, why do you call this model the Kyle model? Does this have to do with Kyle?
I think it's a backup slide, yeah, how we came up with this model. So why it's called Kyle: because you might see this model as arising from a variational principle, as an equilibrium of some game. It's a Nash equilibrium of a game in which you have some market makers whose job is to price correctly the information contained in the order flow, you have informed traders who receive, say, correlated trading signals, and you have noise traders that are used by the informed trader to hide himself. And somehow you have an adversarial feature, in that if the market maker is wrong about his pricing model, the informed trader is able to exploit that weakness, right? So if you find an equilibrium of this game, you find an impact model that cannot be arbitraged, by construction. The idea is to put in an adversarial aspect that ensures that the formula is robust, in that whatever weakness you put in it would be arbitraged by someone else who's playing against you, and then you guarantee exactly that there cannot be, say, evident arbitrages in it.
Okay, thank you. Other questions? If not, I have another one. All of what you discussed is based on, say, the correlation matrix, say, linear response, and it seems like you are essentially assuming a Gaussian world. Is this correct, or are there higher-order corrections?
So the first point is that I think it's very bad to always assume a Gaussian world; I think one should do more than this. On the other hand, if you're not even able to build a nice model in a Gaussian world, you're bound to fail in a more complicated setting. So okay, first let's do this and then move forward. And this,
Mehdi Tomas is working on this, building non-Gaussian and non-linear versions of that. For example, something that we are seeing is that the correlation structure of volumes is not linear: conditional on the trading on one asset, the average volume traded, the average imbalance, on the other ones is non-linear, you have saturation, non-linearities, and they bring in effects, for example...
Okay, hello. Yes, I have to run to another meeting.
So actually you also have to run to another meeting, Jacopo, right?
Yes.
So if no one else has a burning question, I had one. Can you use this to understand how much these axioms are true in real finance?
Yeah, again, this is a project that we would like to start, not focusing on the models but seeing how well you're able to rescale things. Somehow what these gauge invariances tell you is that you can rotate stuff or dilate things and have a collapse of curves. So you could use the symmetries and check how much these curves collapse and at which point you break the symmetries. For example, I know fees and spreads break rotational invariance, because you have these absolute-value things. So you would expect something that maybe has rotational invariance when you trade a lot, but at some microstructural scale you expect to measure the breaking of that rotation, and one could do that, yeah.
Okay, so I think we close here, it's one minute to three. Thank you very much again, Jacopo, and thank you all for joining the seminar, and see you next time.
Yes, see you next time. Bye bye.