Yes, in the old-fashioned way. Thank you very much, Mervan, for the introduction. It is a pleasure to be here, as usual, and a special pleasure for this special event in partnership with Huawei, with whom I have been happy to take part in several events over the past few years. So this will be a tour through part of my research activity over some 15 years; maybe I started this about 15 years ago. And there will be a few keywords in common with the previous talk, and I will explain which ones. Let me say that, even though I am not an expert in machine learning, I am a speaker at the next edition of COLT, which the machine learners around you know, because there is a subject with points of contact. So there will be a number of topics. There will be topics concerning gas theory, which is where I was originally trained, during my PhD. There will be things concerning geometry, the shape of triangles, and so on. There will be things concerning economics, and all of this will come together. And what we will talk about is the synthetic theory of curvature. So let us start by asking about the distinction between a synthetic and an analytic point of view. And please, in everything I say, you are perfectly free to interrupt me. So, to explain the distinction, the best example is convex functions. How do you define a function to be convex? One simple way is to say that the second derivative of the function is nonnegative. There may be several variables, whatever. And the other way to say that something is convex is to say that f((1−t)x + ty) is bounded above by (1−t)f(x) + t f(y), and this should hold for any x, any y, and any t between 0 and 1. OK? Good. So this is one possible definition, and that is another possible definition.
And we know that if everything is smooth, the two are equivalent; this can be proved very quickly. The first definition will be called "analytic", because when you use it you compute something: given φ, you compute the second derivative, and that gives you another object which you can quantify and use in your computations. The second one we will call "synthetic". It is a geometric statement about the graph of φ: it says that φ always lies below the chord joining two of its points. Which one is easy to check? If I give you a function and you want to verify convexity through the analytic definition, generally it is not possible; if you try to do that, it will be a nightmare. This one is simple. In fact, it is very useful: a significant part of my life with convex functions has been spent using the synthetic one to check convexity and using the analytic one to derive consequences of convexity. Second, the analytic one is obviously a local property: for a given x, you compute something that depends only on a neighborhood of x. The synthetic one is completely non-local: you need to know it for x and y which may be quite far apart. And also, it is more general. Imagine that φ is not differentiable. You could invoke distribution theory, which is more elaborate, but this definition still makes sense even if your function is something like this, with a kink; in that case the second derivative does not work. So it is more general. And also, it is more stable. If you have a sequence of functions φ_k converging to a function φ, and the convergence is in a very weak sense, it may be impossible to pass to the limit in the analytic formulation; but here it is very easy to pass to the limit. There is no need for uniform convergence: pointwise convergence suffices, that is, for each point x you have convergence of φ_k(x) to φ(x).
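The chord ("synthetic") definition lends itself to a direct numerical check. Here is a minimal sketch, not from the talk (the function name and sample grid are my own), testing the inequality f((1−t)x + ty) ≤ (1−t)f(x) + t f(y) on a grid of points:

```python
import numpy as np

def is_convex_on_grid(f, xs, ts=np.linspace(0.0, 1.0, 21), tol=1e-9):
    """Check the chord inequality
        f((1-t)x + t y) <= (1-t) f(x) + t f(y)
    for all pairs of sample points in xs and all parameters in ts.
    A single failure disproves convexity; passing only supports it."""
    for x in xs:
        for y in xs:
            for t in ts:
                z = (1 - t) * x + t * y
                if f(z) > (1 - t) * f(x) + t * f(y) + tol:
                    return False
    return True

xs = np.linspace(-2.0, 2.0, 41)
print(is_convex_on_grid(abs, xs))     # |x|: convex, though not differentiable at 0
print(is_convex_on_grid(np.sin, xs))  # sin: not convex on [-2, 2]
```

Note the asymmetry the speaker points out: |x| has no second derivative at 0, yet the chord test handles it with no trouble at all.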
So this is a very robust notion, stable under convergence. And that shows another aspect of why it is useful. So the two have their advantages, and they are like two sides of the same concept. Convexity is so useful partly because we have these two formulations. And for a long time, people have been trying to develop, simultaneously, for many notions, the synthetic and the analytic points of view. Both have their advantages and their drawbacks. And here, the story will be about geometric notions. So, question: how to make a synthetic notion of curvature? Well, what is curvature? Curvature is the main object in non-Euclidean geometry, and curvature tells you how the lines of a geometry — the lines being the geodesic paths, the shortest paths — compare to the lines of Euclidean geometry. So, for example, if I draw something like this, this is the typical picture of positive curvature. This would be flat, Euclidean. This would be the typical picture of negative curvature. So let us keep "curvature" informal for the moment — a precise definition will come later — but these are the typical situations. In particular, negatively curved geometry is a geometry in which triangles look like this, and in particular the sum of the angles is always less than π, always less than 180°. You can compute curvature in an analytic way, and there are formulas relating the behavior of the geometry to the curvature, which involve two derivatives of the Riemannian metric, things like that. But you can also ask for a synthetic notion of this curvature. And this was developed long ago by people like Cartan, Alexandrov, Toponogov. People in geometry speak of CAT spaces: C for Cartan, A for Alexandrov, T for Toponogov.
And there are comparison spaces for negatively curved geometry and comparison spaces for positive curvature, and the idea is that, instead of computing something called the curvature, you look at properties, geometric properties, of triangles. So, by definition, your space has curvature bounded below by 0 — nonnegative curvature — if the triangles are fatter than flat triangles; fatter than flat triangles, that is what it means. OK, here is a triangle. To determine whether the triangle is fat, I am going to compare it to the flat geometry. So this is a triangle in my geometry, which is a priori non-Euclidean, and the flat one will have the same lengths: this side the same length, this side the same length, this side the same length. OK? How do I compare them and decide that one is fatter than the other? You take the midpoint of a side and you look at the median. You draw all the medians, and if the length of the median here, in this triangle, is longer — is larger — than the length of the median in the flat case, in the flat situation, you decide that your triangle is fatter. The idea is easy: think of the median as something like a leg crossing the body of the triangle, and, so to speak, the fatter you are, the longer that crossing has to be. So the median is like the leg, and a positively curved space is a space in which the medians of triangles are longer — longer than in their counterparts, the comparison triangles with the same side lengths in the flat space. So this is a synthetic definition: no computation of anything, just comparison of distances. And, again, everything I said before about convex functions applies here.
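The median comparison can be tried out on the unit sphere, the model of positive curvature. The following sketch is my own illustration, not from the talk: it takes the triangle with vertices on the three coordinate axes, computes the spherical median from one vertex to the midpoint of the opposite side, and compares it with the median of the flat comparison triangle with the same side lengths (via Apollonius' theorem).

```python
import numpy as np

def sph_dist(p, q):
    # geodesic (great-circle) distance on the unit sphere
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def sph_midpoint(p, q):
    # midpoint of the minimizing geodesic between p and q
    m = p + q
    return m / np.linalg.norm(m)

def euclidean_median(a, b, c):
    # length of the median from the vertex opposite side a, in the flat
    # comparison triangle with side lengths a, b, c (Apollonius' theorem)
    return np.sqrt((2 * b**2 + 2 * c**2 - a**2) / 4.0)

# a geodesic triangle on the unit sphere (sectional curvature +1)
p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0])
r = np.array([0.0, 0.0, 1.0])

a, b, c = sph_dist(q, r), sph_dist(p, r), sph_dist(p, q)
median_sphere = sph_dist(p, sph_midpoint(q, r))
median_flat = euclidean_median(a, b, c)
print(median_sphere > median_flat)  # the spherical triangle is fatter: True
```

Here all three sides have length π/2, the spherical median is π/2 ≈ 1.571, while the flat comparison median is (π/2)·√3/2 ≈ 1.360 — the triangle on the sphere is indeed fatter.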
In practice, this is a nightmare to check: if I give you a non-Euclidean geometry, you will always go for the formulas that we find in the textbooks to check the sign of the curvature. But, on the other hand, it is a geometric property which can be useful, which is a direct, non-trivial consequence of the curvature, and which gets used in comparison theorems, in many respects. At the same time, it is certainly more general than the definition of curvature in which we compute the analytic notion of curvature. In particular, take a cone — a cone with a point here, you know, an apex, a vertex. The cone is flat almost everywhere. We know this because we can play the game with children: we can cut out a piece of paper, make a cone like a party hat and put it on the head. So it is flat, almost everywhere. But there is a singularity: here, at this point, it is not a smooth geometry, and you cannot compute the curvature at this point, or it would be infinite. But you can still check the condition here, and check that the triangles are fatter than the triangles in flat geometry. In fact, for any triangle — any three points you take here — if the triangle encloses the pointy part, the triangle will be fatter than the comparison triangle drawn in the plane. And so the cone is, in this definition, a space of nonnegative curvature. So the cone is like the analogue of the example I had before for convex functions, the non-smooth one. And the next thing is, of course, that this notion is more stable than the definition of curvature in which you take two derivatives of the metric. Why?
Look: if I have a sequence of geometries converging to another geometry and I want to prove that the limit geometry satisfies the same curvature bound as the sequence, what do I need to know? Just that the distances converge — the weakest possible notion of convergence of geometries that you can imagine. There is a special formalism, it is called the Gromov–Hausdorff topology, but morally it is just a way of formalizing a convergence so weak that the only thing that matters is the convergence of distances. And so this notion is stable too. So this is also something well known, which has been studied for decades by geometers. In particular, there is a Russian school, and there is a very strong Japanese school too, working on these Alexandrov spaces. So this is the main example of the application of synthetic concepts in geometry. So, this was the prologue, if you like, of the story I am going to tell. So far so good — does anyone have questions here? OK. So what I am going to tell now is the story of the encounter of four fields. The first field is curvature. And within curvature, there is the notion of sectional curvature, which I already mentioned, even though I have not given a proper definition yet; and also we will talk about Ricci curvature and its geometric consequences. The second topic will be optimal transport, which is a branch of optimization with links to economics, and which is also called Monge–Kantorovich theory. The third field will be gas theory and statistical physics, with the notion of entropy in the sense of Boltzmann or of Shannon — it is the same formula — or of entropy in the context of statistics. And the fourth will be the notion of gradient flow.
Gradient flows, which we know as something like ẋ = −∇U(x), if U is a potential, an energy, or something — as there was in the previous talk: gradient flows. And there is a whole synthetic theory of gradient flows, non-smooth gradient flows, which was developed over decades and decades, in particular by people like De Giorgi or Bénilan. And I will talk about this as well. And we will see that, in the end, the problem of finding the proper synthetic definition of Ricci curvature bounds was solved by a combination of these four fields. And this combination appeared around the end of 1999–2000 and has been developed since; it has made progress at a very, very fast pace. At the beginning there were only a couple of papers, including contributions by myself, and now it has reached the point where I cannot follow the developments. Let me say that, at the very beginning, it was a nice coincidence, as I was a PhD student. I had heard a talk by Otto, a researcher in Germany, about his interpretation of gradient flows; he had brought together three of the topics here. And a couple of months later, I read a course by Michel Ledoux on concentration of measure theory, entropy and optimal transport at the same time. I thought there were so many keywords in common that it had to be related to the talk of Felix that I had heard a few months earlier. It took me 15 minutes to find the link, two weeks to write the paper with Felix, and it is my most quoted paper to this day — which shows that the success of a paper is not a function of the time you put into it; it is a question of having the right idea, of being in the right place at the right moment. Let us continue and explain the concepts a bit.
First, let us give a precise definition of curvature — a definition which is good and which we can understand. Let us define the curvature as follows: sectional curvature. Here is the geometry; it may be a curved geometry in several dimensions, and I take a tangent plane in this geometry — two directions out of n, if it is n-dimensional. I take a slice in the tangent space, and in this slice, here is a point in my geometry, and I can take velocities going along this slice. And I take two of these tangent vectors, which I suppose are of unit length and orthogonal to each other: this is u, this is v. And I look — I send one geodesic here in the direction u and one geodesic there in the direction v. This is what we call the exponential map. It is like I send this tangent direction out into the space, and because the space is curved, the path wants to minimize the amount of energy which is spent, so it bends along a geodesic. And now you look at the distance between this curve, γ(t), and this one, γ̃(t). If it were exactly a flat geometry, it would be √2 · t — just good old Pythagoras' theorem, OK? But now, because it is curved, there will be a correction. At first order it is √2 · t; the second-order term vanishes; the third order gives you the correction, and it is −K/12 times t², multiplied by t, so it is a cubic term: d(γ(t), γ̃(t)) = √2 · t · (1 − (K/12) t² + O(t⁴)). And the rest is a higher-order term as t is close to zero. So this is it: the infinitesimal leading correction which you have to put into the distance between two geodesics which spread apart. That is the definition of curvature. It is a definition which is equivalent to the one that Gauss used, and this one has the advantage that it is expressed in terms of distances, so that you immediately see that it is invariant under isometry — the property which Gauss was so proud to discover. So this is the definition of sectional curvature. Why sectional? Well, because I took a section, you know, a plane in the tangent space.
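The expansion d(γ(t), γ̃(t)) = √2·t·(1 − (K/12)t² + O(t⁴)) can be verified numerically on the unit sphere, where K = 1 and two orthogonal geodesics from the north pole are explicit. A sketch of my own, not from the talk:

```python
import numpy as np

# On the unit sphere (sectional curvature K = 1), take two unit-speed
# geodesics from the north pole e3 with orthogonal initial velocities
# e1 and e2:
#   gamma(t)  = cos(t) e3 + sin(t) e1
#   gamma~(t) = cos(t) e3 + sin(t) e2
# Their geodesic distance is arccos(<gamma, gamma~>) = arccos(cos^2 t),
# which the expansion predicts to be sqrt(2) t (1 - K t^2 / 12 + O(t^4)).
K = 1.0
for t in (0.3, 0.1):
    exact = np.arccos(np.cos(t) ** 2)
    approx = np.sqrt(2) * t * (1 - K * t**2 / 12)
    print(f"t={t}: exact={exact:.8f}  approx={approx:.8f}  "
          f"err={abs(exact - approx):.2e}")
```

The error shrinks like t⁵ as t → 0, confirming that the cubic term −(K/12)t² · √2·t is the exact leading correction.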
Now this means that we have a number of curvatures. If the dimension of the space is 3, say, and I consider three directions of my tangent space, there are three sections that I can form in this way: either vectors 1-2, or 1-3, or 2-3. Here now is another notion of curvature, which is the favorite notion used by people in statistics, in probability theory, in potential theory, in stochastic processes: the Ricci curvature. So Ricci curvature is an average of these sectional curvatures. So take — this is my point x — one direction e; let us say it is a unit vector. So it is a direction in which I am looking. And because this is just one dimension among n — let us say the dimension is equal to n — I can complement this, complete this with other unit vectors so that this is an orthonormal basis, OK? And then, if I pair e with either e₂ or e₃ or whatever, this gives me n−1 possible sections, and I add up all the sectional curvatures corresponding to these. So Ricci in the direction e is, by definition, the sum of the sectional curvatures associated with the sections (e, e_j), for j going from 2 to n. That is the definition, equivalent to the definition used by Ricci. It turns out — and it is a non-trivial fact — that this Ricci, which to every unit vector associates a number, can in fact be extended into a quadratic form on the tangent space. So it is a quadratic form. It is famous that Ricci curvature is the right notion of curvature to use in general relativity, and so it played a cornerstone role in Einstein's work on general relativity. It is also famous that, to this date, general relativity has had only one, let us say, application in practical life, which we all know: GPS. Because when you want to compare the times of the signals which are sent by satellites to you, you have to take into account the fact that time does not flow in exactly the same way for these signals, which travel at very high speed, and the relativistic correction is one of the corrections that you have to take into account.
Otherwise, you just miss the point: I think it was computed that, without the general-relativity correction, your GPS would make an error of 10 kilometers after one day, something like this. So it is a correction you have to take into account. Ricci curvature is the one that gives you the information about the distortion of spacetime. What to say next about Ricci curvature? Ricci curvature, as I said, is the notion of curvature that everybody interested in probability and measure likes. It goes well with measure. Why does it go well with measure? Because, while the sectional curvature tells us something about distances, the Ricci curvature tells us about volumes: we are taking an average of distances in various directions, so it is a bit like a surface. So I will give you another interpretation of the Ricci curvature. Before that, let us say, for instance, that if you are a specialist of diffusion processes, and in your geometry you start a Brownian motion — and we know that in general there is this correspondence between solutions of diffusion equations, like the heat equation, and Brownian motion, which was discovered by Einstein, Bachelier and others — you may ask: does this remain true in non-Euclidean geometry? Answer: yes, provided the Ricci curvature does not blow down too fast, does not go too fast to minus infinity. If you have a lower bound on the Ricci curvature, so that things do not get too negative curvature-wise, then there is a correspondence between the stochastic processes and the partial differential equations. Please interrupt me if I am going too fast or too slow. So here is an interpretation of Ricci curvature, the one that you can explain to anybody. It is an informal interpretation. Yes — here, this is me, standing and observing some light source which is over there. The light travels in space in all kinds of directions, along geodesics, but if I am interested in those light rays which arrive at me, it looks something like this, OK?
And it may be distorted, so that, you see, this is what you get: the geodesics are not straight, and so, if you are here, I am asking: what do I see? What you see depends on the slopes of these rays, of these directions, when they arrive at you, and so, if you try to reconstruct the image from the observation, this will be your impression. And here, this is the typical picture of positive curvature, meaning that you will overestimate the size of the source. Overestimate in which sense? If the context is positive sectional curvature, you will overestimate, say, the diameter of the source. If it is Ricci, what you will overestimate is the surface of the source, or the volume. You will have the impression, for instance — the source will be something like this in reality, and your impression will be something like this: it may be smaller in some directions, larger in other directions, but the surface is overestimated. And if that is always the case, it means your space has nonnegative Ricci curvature; and if it is like this, this is Ricci positive. And there are similar ways to formulate "Ricci bounded below by minus the identity", "Ricci bounded below by minus something" — I said it is a quadratic form, so compare it to other quadratic forms. OK, so far so good. Let me mention also that this is the thing that people love. So probabilists will say: OK, take a manifold, let us assume that Ricci is bounded below on all the space, so that I know that there is no blow-up of my Brownian motion and things like this. Let me also notice that nobody works under assumptions of Ricci curvature bounded above by something negative. It makes sense, but it is not useful: there are no theorems about it, and there are deep reasons why it is not interesting. For negative sectional curvature, on the other hand, there are a bunch of people who work with this, and it is very important in certain branches of mathematics; but for Ricci, when you talk about Ricci comparisons, it is always comparing Ricci from below. OK, so much for the story of curvature, and let us go on with the other story:
optimal transport. So, Gaspard Monge was a remarkable mathematician of the 18th century. He was a personal friend of Napoleon, and he was a devout revolutionary, you know; he was one of the founders of the École Polytechnique, the École Normale Supérieure and so on, and very well known for his geometric intuition. So this is the problem that Monge posed in a memoir which appeared around 1781. Imagine that you have some pile of whatever — of soil, say. Let us rather draw it in this way: suppose that you have some matter, something that you are extracting from the ground — you see, this is a mine of whatever — and you want to transport this and rearrange it in some way, so this will be some kind of construction that you have to make. OK, good. And each piece of the matter you transport: from some point x, you will transport it to some other point, say T(x). Here it is a one-dimensional drawing, but in reality x may have three dimensions or even more, and so T(x) also has three components or even more. And of course, in general, there are many, many ways to choose a suitable map T which rearranges your matter from the starting configuration to the final configuration. People call this an operation of push-forward: if I model the mass here by a probability measure, and I call it μ, and here I call this ν, the target distribution of matter, then the condition is that the push-forward of μ by T is equal to ν. But this really means that I transport every atom of mass which was at position x to the destination T(x), and I look at the shape of the resulting distribution after this transport operation. And so Monge asked: OK, there are infinitely many ways to do it, good, but certainly they are not all equal, and there have to be some ways which are better than others. And he asked how to find the optimal way to do this. This can be a very practical issue: for instance, assume you own a big taxi company; you have 30 cars which, at the end of the day, should go back to 30 parking-lot places.
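The taxi situation is a discrete Monge problem: with n cars and n parking places of equal "mass", the admissible transport maps are exactly the permutations, so for tiny n one can find the optimum by brute force. A sketch of my own, with made-up positions and squared-distance cost:

```python
import itertools

# Discrete Monge problem: n cars, n parking spots, equal unit masses.
# The transport maps are the permutations; cost = squared Euclidean
# distance driven. (Positions below are invented for illustration.)
cars = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
spots = [(2.0, 0.0), (0.0, 1.0), (1.0, 2.0)]

def cost(x, y):
    return (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2

def total_cost(perm):
    return sum(cost(cars[i], spots[perm[i]]) for i in range(len(cars)))

best = min(itertools.permutations(range(len(spots))), key=total_cost)
total = total_cost(best)
print(best, total)  # car i drives to spot best[i]
```

Of course, brute force over n! permutations only works for toy sizes; the point of the theory (and of the linear-programming reformulation coming next in the talk) is precisely to do better than that.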
How do you decide which car goes to which parking place? It is again a problem of transporting matter, in which every atom corresponds to one taxi, and you want to make sure that in the end every parking place has a taxi in it. But how to decide that this one goes there and this one goes there? You want to spend as little fuel and time as possible. Here is what Monge said — it is just putting this into a mathematical formula: minimize the integral of the cost to go from x to T(x), that is, ∫ c(x, T(x)) dμ(x), where c is a cost function and I integrate according to the starting distribution; and the minimum is over all T such that the image measure of μ by T is equal to ν, which means that at the end of the day I have rearranged the matter as I wanted to. So this is the Monge problem, OK. And this is a dreadful problem because, of course, if you look at it in terms of T, it may be a non-convex problem — this cost c here may be anything; Monge was taking the distance as cost function, which is degenerately convex — but, even more serious, this constraint is highly non-convex. So you are facing a highly non-convex optimization problem: if you try minimizing sequences, it does not work, it does not pass to the limit. Anyway, still, Monge managed to make interesting observations about the geometry of the solution, if it exists. It took — yes, it took nearly 300 — I am exaggerating — it took 220 years before people could prove that the problem of Monge had a solution in general. Monge was assuming the solution exists and drawing conclusions; proving that the solution always exists was a very tricky variational problem. Kantorovich is a completely different profile, even though there are some common points between the two — one fun coincidence being that both of them were extremely precocious students, and both were professors at the age of 22. So, Kantorovich — a bit of praise for Kantorovich: one of the great mathematical economists, actually a Nobel Prize winner in economics, and one of the first masters of, how to say,
computation and numerical analysis, let us say. He played a role in the war as an applied mathematician, in some very sensitive matters; he was involved in the development of the atomic bomb on the Russian side; he was also involved in the improvement of the taxi fares in Moscow — working on all kinds of problems — and at the same time he was a master of abstract functional analysis. So Kantorovich at some point gets interested in this problem, without knowing about the work of Monge, and reformulates it in the following way: minimize the double integral ∬ c(x,y) π(dx dy), where c is still the cost function, and the minimum is taken over all admissible π which have marginals μ and ν. So this π here is a joint probability measure such that, when you integrate π over x, it gives you a measure in y only, and you obtain ν(dy); and when you integrate over y, you obtain μ(dx). So if you have an engineer's heart, it is very easy: this is x, this is y; you have some joint probability measure which is like a blob here, and when you take the marginal here you obtain the measure μ, and the marginal there you obtain the measure ν — sorry, this is μ and this is ν — and the amount of intensity here tells you how much mass you should transfer from position x to position y in the plan. And people call this a transportation plan. So this is the problem of Kantorovich, and it is the same problem as Monge's, except that it is formulated in a probabilistic way: some matter which is at x may be spread — half will go here, half will go there. It is a probabilistic version of the Monge problem. And Kantorovich discovers several things. First, this problem now is perfectly convex: the constraint is convex, and, whatever c, the objective is a linear function of π. Actually, it is a linear programming problem, meaning that the constraint is defined by linear equalities and inequalities, and what you want to optimize is also linear in the unknown — and Kantorovich is actually one of the fathers of linear programming. It is funny that, in
the days of Soviet Russia, some of the theorems of Kantorovich were considered so seditious that it was forbidden to mention them in public — and nowadays the work of Kantorovich is everywhere; people learn about it in microeconomics courses and so on. So what did Kantorovich prove about this? For instance, what is called the Kantorovich duality, and by stating it we shall understand why it was related to economics. So, the Kantorovich duality: it will not tell you what the optimizer is, but it will tell you that, if an optimizer exists, any optimizer satisfies a certain property and can be re-expressed in terms of optimal prices. So: the minimum of ∬ c(x,y) π(dx dy) is equal to the supremum of ∫ φ dν − ∫ ψ dμ, where on the left π has marginals μ and ν, and on the right φ, ψ are real-valued functions such that φ(y) − ψ(x) is always bounded above by c(x,y). But what is so seditious in this? So here is the interpretation, the shipper's interpretation. Assume you are here, the big boss, trying to organize how to send your production — I talked about taxis, but maybe it is something that you produce, that you extract from some mine and send to some facility where it will be bought or whatever: transport it from producer to consumer. And this is your problem: how to organize the transport. And then this other guy comes to you and says: look, I specialize in transport. Don't worry: let me buy the goods from you at your mine — I will buy them — and then I will sell them back to you at the consumption site, and in between I do the transport. I am not a crook, so you can check that, whatever the final destination y and the starting point x, I will not charge you more than the transport cost c(x,y). And then it is up to me to organize the transport. Of course, ∫ φ dν − ∫ ψ dμ is what, at the end of the day, will be in the pocket of this guy — what he will have earned by selling back, minus what he paid when buying at the initial time — and of course, if he wants to maximize his profit, this supremum is what he should achieve. And the Kantorovich theorem
tells you that it amounts to the same: for you to minimize the cost by organizing the transport yourself, or for that guy to maximize his profit. And this was considered a very capitalistic theorem. Actually, Kantorovich wanted to build a rational theory of prices, which would almost certainly have led to a death sentence had it not been for him being so useful in other projects — in those days, you know, building a rational theory of prices was considered, well... So Kantorovich started this and gave ways to find and study the solution. Now, the subject was revolutionized at the end of the 80s, with independent works — end of the 80s, independent works — by Brenier, showing a relation of these problems to fluid mechanics; by Mike Cullen, showing a relation to problems of meteorology; and by John Mather, showing a relation to problems of dynamical systems. And then many people jumped in and proved theorems. Let us just give one theorem. So, if you are in geometry, you consider the natural cost function to be the square of the distance between two points. Why the square?
If you are in physics — kinetic energy, the square of the velocity; if you look at the energy which is spent, squares naturally appear. Also in the previous talks there were natural squares, as error functions: it is a natural way to define the error, which has all the good properties. So let us take this cost. What does the optimal transport look like? Using the theory of Kantorovich, using a bunch of convex analysis — quasi-convex, semi-convex, whatever — you find that the optimum has the following shape: T(x), your optimal transport, is equal to exp_x(∇ψ(x)). This means: I start from x, I send from x a geodesic with initial velocity ∇ψ(x), where ψ is some price function as in that duality theorem; I run it for time t = 1 and then I stop, and that is my T(x). And ψ is d²/2-convex, meaning it has some kind of convexity property: there exists some real-valued function ζ(y) such that ψ(x) is equal to the supremum over y of ζ(y) − d(x,y)²/2. So here it is: whenever you have a map which takes this form, exp_x(∇ψ(x)), with ψ being d²/2-convex, and the geodesic going from x to exp_x(∇ψ(x)) is a minimizing geodesic, then you know that it is a solution of the Monge problem, that it is optimal, when you are performing transport in a non-Euclidean geometry with cost the squared distance. This is a theorem due to McCann, Robert McCann. So many things to digest in so little time — so far, so good. So, basics of optimal transport: we talked about this, we talked about that; let us be very brief about gradient flows, by just giving definitions. One possible definition: how do you define a gradient flow in a non-smooth setting? Well, you always — not always, but the simplest way is to do some kind of time discretization. Time discretization: so fix a time step τ, very, very small, and you construct x₀^τ, x₁^τ, etc.,
x_k^tau, as your discrete approximation, meaning that x_k^tau should be something like x(k tau); in the end you will send tau to 0 and your discrete approximation will become a continuous limit. At each step, if you have constructed x_k^tau, then x_{k+1}^tau is obtained as a solution of the minimization problem: minimize U(x) + d(x, x_k^tau)^2 / (2 tau), where U is my energy. Meaning: I want U to decrease as much as possible (this is gradient flow), but without going too far from the point where I was just a moment ago, and "too far" is measured in terms of the square distance divided by tau. If you pass to the limit, you obtain the gradient flow. Of course this makes sense even if there is no gradient, no smoothness: just a metric space is enough to make sense of this. In general, however, it will not converge to a single well-defined object; maybe there are many possible limits, and that's a problem: how can you define a gradient flow which is unique, well defined in the limit? This was developed in particular by De Giorgi, for instance in order to build a theory usable in image processing: how do we do gradient flows on objects such as shapes? Let's say no more about it, and let's concentrate, to finish, on the entropy of Boltzmann. Entropy, it may be argued, is the cornerstone of both statistical physics and information theory. For entropy we have the famous Boltzmann formula S = k log W, in which W is everything that you cannot know about your statistical system. For instance, if you are looking at the air, you can measure pressure, wind, temperature, but you will never know the actual positions and velocities of the particles; all this is the unknown W. It may be a set of high cardinality, or it may be a continuous set, in which case its volume is what matters: you take the logarithm and multiply by some constant. And Boltzmann understood that you can always find a practical formula for this in terms of
the density f. This has been formalized in statistics under the name of Sanov's theorem; in technical words, it is the rate of deviation for the empirical measure. But it also corresponds to a simple exercise, the following. Ask: what is the number of ways to put n particles in k boxes (think of each box as a discretization of the state of one particle, say position and velocity) such that, for box number j, the number of particles in box j divided by the total number of particles is approximately equal, that is, in the limit of many particles, to a fixed given frequency f_j? In other words: knowing the statistical distribution of the particles, how many ways are there to realize it with many particles spread along the boxes? The answer: if you look at (1/n) log (number of ways to achieve these f_j) and you go to the limit n to infinity, you obtain minus sum_j f_j log f_j, the famous nonlinearity which appears in the Boltzmann entropy, and which is the same that you find in the Shannon entropy: S = minus the integral of f log f over the whole space. Very often one speaks of information rather than entropy, so the formula for the information would be the integral of f log f, and very often it is taken with respect to some reference measure. This measure may be the uniform measure, if you are in the context of classical statistical mechanics, or it can be some arbitrary reference measure: maybe it is given by your geometry, and it will be the volume measure; maybe, if you are working on a problem with a potential coming from physics, you will consider the Gibbs measure, something like exp(-V(x)) dx. Many possibilities; the point is that you may be interested in computing the entropy with respect to a measure which is non-uniform. And what do we do with this? First, there is a clear statistical interpretation: entropy
tells you whether a given statistical distribution is rare or very common. The information H is what tells you this: something very common is when this H is as low as possible, the entropy as high as possible. It is also classical that, in many respects, distributions of maximum entropy are Gibbs distributions, Gaussian distributions, and this plays a major role in equilibrium statistical physics. Just to quote a few places in which entropy was used in an important way. For instance, since there was a question about the empirical measure in the previous talk: throw in particles x_1, ..., x_n which are i.i.d. with law nu, and write the empirical measure mu-hat = (1/n) sum of Dirac masses at the x_i. We know by the law of large numbers that in the long run this will be an approximation of nu. But what is the probability that I make a mistake, the probability that the empirical measure looks like a certain mu rather than the target nu? It is approximately exp(-n H_nu(mu)), where H_nu(mu) is the relative information of mu with respect to nu. So this shows that entropy gives a measure of the discrepancy of one measure with respect to another, and as such it also plays a major role in statistics, where it is called the Kullback information. Another example of a place where entropy is useful: it was used in a masterpiece of John Nash on the regularity of diffusion processes. It was also used to give quantitative error bounds in central limit theorems. You may think, as most of us do when we see this in computations or in teaching, that the central limit theorem is all about Fourier analysis: you take the Fourier transform, you estimate, etc. But there is a much more beautiful proof going through entropy, in which the Gaussian distribution arises because it is the distribution of maximum entropy for fixed variance. So these are some of the examples.
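The rate exp(-n H_nu(mu)) in Sanov's theorem is easy to play with numerically; here is a minimal sketch (the function name and the coin example are mine, not from the talk):

```python
import math

def relative_entropy(mu, nu):
    """Kullback information H_nu(mu) = sum_k mu_k log(mu_k / nu_k).
    Sanov: P(empirical measure ~ mu | i.i.d. samples from nu)
    decays like exp(-n * H_nu(mu))."""
    return sum(m * math.log(m / v) for m, v in zip(mu, nu) if m > 0)

nu = [0.5, 0.5]               # true law: a fair coin
mu = [0.8, 0.2]               # observed frequencies
H = relative_entropy(mu, nu)  # seeing 80% heads from a fair coin
                              # has probability ~ exp(-H * n)
```

Note that H vanishes exactly when mu = nu (no deviation, no exponential penalty) and is strictly positive otherwise.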
Entropy was also used by Grigori Perelman in his proof of the Poincaré conjecture; it is kind of ubiquitous. So here we are. Now I have zero minutes left to tell about the core of the talk, so, in a very rude way, I will take a few more minutes with the authorization of our big boss Emmanuel. What is the link between all of this? It was discovered around the end of the 90s that there is a strong link, in a seminal paper by Jordan, Kinderlehrer and Otto. I will tell it in words. Consider probability measures on some geometric space: transporting one measure onto another has a certain cost, so there is a transport problem, and it gives you a distance on the probability measures. Take the cost to be the square of the distance, and take the square root of the optimal cost as the distance on probability measures: the distance between mu and nu is the square root of the optimal transport cost with cost c = d^2. This gives me a metric space, so I can apply the gradient flow procedure. I need two things, a distance and an energy functional on probability measures; as the energy let's take the entropy, or the information if you prefer, and let's construct the gradient flow, the steepest descent for the information. Question: what do you get?
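A quick computational aside before the answer (my own toy sketch, with assumed names): on the real line, this distance between two n-point empirical measures can be computed by sorting, because the optimal coupling for cost d^2 in one dimension is the monotone one.

```python
import numpy as np

def w2_empirical(xs, ys):
    """Wasserstein distance W2 between two equal-size empirical measures
    on the line: sort both samples and pair them in order (the monotone
    coupling), then take the root mean square displacement."""
    xs, ys = np.sort(xs), np.sort(ys)
    return float(np.sqrt(np.mean((xs - ys) ** 2)))

a = np.array([0.0, 1.0, 2.0])
b = np.array([3.0, 1.0, 4.0])   # sorted: [1, 3, 4]
d = w2_empirical(a, b)          # displacements 1, 2, 2 -> sqrt(3)
```

In higher dimensions no such sorting trick exists and one falls back on general solvers, but the metric-space structure is the same.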
The not-so-obvious answer: the heat equation. And this is a very general procedure: the heat equation is the gradient flow of the information H for the optimal transport metric. This was the discovery of Jordan, Kinderlehrer and Otto. First they did it in Euclidean space, but then people did it in more general contexts and even in discrete settings; it is an extremely general rule. The second link was between these three objects, in a paper of myself with Felix Otto, where we showed that there is also a very simple relation between the three. The link is this: if I want to know what a Ricci curvature bound is, in a way that is mindful of the classical notion but without any issues about directions or whatever, in a robust, synthetic way, I need to look at how the entropy changes in the process of optimally transporting one distribution onto the other. This is the criterion which, from 2004 on, was built into a synthetic definition of Ricci curvature bounds and used to prove a number of theorems. So let me show you a definition, and here you will have the real meeting point of these three fields: how do we define, synthetically, that a certain geometry, maybe with no smoothness, has Ricci curvature bounded from below by zero? This was turned into a proper definition in works by Lott and myself, plus Karl-Theodor Sturm, and such spaces are now often called LSV spaces: you had CAT spaces for sectional curvature, LSV is for Ricci. How do you do it? Take any starting probability measure mu_0 and final probability measure mu_1; this is the initial state of your gas, this is the final state. I call it the lazy gas experiment. You give an order to the gas: you have just one minute, from t = 0 to t = 1; after time one you have to be in this shape. The gas is free to rearrange itself as it pleases, maybe particles will go this way, that way, etc., and it will choose the way which spends the least amount of kinetic energy, an optimal transport way, because it is lazy. This is
life, but there is a fixed target which you impose. During this process, what is interesting for you is the apparent spreading of the gas: maybe at some times the gas will be very compressed, at other times very spread out, so we need a measure of how much the gas is spread, and the right notion for this is the entropy, as a measure of the spreading of the gas. So you measure the entropy of the gas at each time t from 0 to 1, and the condition is that the entropy curve is a concave function of time. Let me write it (it is equivalent to S = minus the integral of f log f): the condition is that minus the integral of rho_t log rho_t, where mu_t = rho_t vol is the state of the gas at time t, is a concave function of t. At first this looks like a complicated definition, but look: there is no need for any smoothness. It uses only robust objects, integrals, minimization of a cost function, the optimization problem behind; so we expect it to be very robust. And indeed, what can we say about this notion of synthetic curvature? It took a bunch of people to show that this notion of Ricci bound, first, is compatible with the Cartan-Alexandrov-Toponogov theory; that it is more general than the classical one; that it allows one to prove many geometric theorems; and that sometimes, under slight reinforcements of the structural conditions, it is enough to define a unique gradient flow, a unique heat equation. So if you have a space which is as horrible as you may imagine, you know nothing about how wild it is, but you know that Ricci is bounded below in this sense, then you know the space carries a unique heat flow and you can do some calculus on it. In general this heat flow is nonlinear, but if you impose an extra Hilbert-space condition on the Sobolev space, some extra condition anyway, it becomes a linear flow. One example of something which truly captures the meaning of
curvature, an emblematic example, is the Brunn-Minkowski inequality, deep and simple. Suppose I have Ricci bounded below by zero; I take a compact set X and a compact set Y, and I look at all midpoints: I draw all possible geodesics between one point here and one point there, take the midpoint each time, and this gives me the set of midpoints. Then this set cannot be small, it has to be large: the volume of the set of midpoints of X and Y, to the power 1/N, N being the effective dimension of the space, is bounded below by half of vol(X)^{1/N} plus half of vol(Y)^{1/N}. That's one example; there are many other applications. And here we come closer to the spirit of the previous talk: Ricci bounds also imply bounds on the convergence of gradient flows. As we heard in the previous talk, if you have a uniformly convex function and you look at its gradient flow, the convergence is exponentially fast; you can see this as a geometric property, and the right condition is in fact Ricci bounded below by a positive number, which corresponds to exponential convergence in a natural way. Ricci bounds also imply concentration of measure: something like, if you look at the error between the observed quantity (1/n) sum of f(X_i), where the X_i are i.i.d., and the integral of f d mu, the convergence will be exponentially good when you have strictly positive Ricci curvature; so this matters again, say, in approximation theory. And so on: many things that you know in the smooth Ricci case also extend to the non-smooth case, and you can prove new inequalities. Quite recently, sharp isoperimetric inequalities were proven using these definitions, outside the range of Riemannian geometry, which came as quite a surprise. There is also the possibility of discretizing; a bunch of people have looked at this, including Yann Ollivier, Jan Maas, Matthias Erbar and others. What does it mean? We all
know life is discrete, computers and so on, and in many cases you will say it's good to have a continuous geometry, but in real life it may be a bunch of points, and then maybe all these ideas of curvature will collapse. Indeed, when you have points, geodesics don't exist in a discrete space, but you can always find approximate geodesics, and you can massage the definition a bit (I erased the definition, which was a mistake, but the measure always makes sense in a discrete space): instead of minimizing over paths which are continuous in time, you allow approximately continuous paths, with a small error that you can control. This was studied, and in particular we now know precisely what it means for the simplest case: take {-1, 1}^n, or {0, 1}^n, the discrete hypercube. How is it curved, depending on n? It looks like an absurd question, but it is not, if you have in mind that there will be geometric inequalities with a taste of curvature that allow you to prove isoperimetric inequalities inside it, or spectral gap inequalities, or things like this; and this is indeed what was achieved. This exact question was asked by Dan Stroock at the end of the 90s: what is the curvature of the discrete hypercube? Now, thanks to the works of these people and this kind of definition, we know: for this particular case the answer is 1/n, in some sense, at least in order of magnitude. So it acts like a space of positive curvature, meaning you get convergence of gradient flows, concentration of measure, etc., and the bound gets weaker and weaker as n goes to infinity, with decay exactly 1/n. This was one of the motivations. A final remark, just for those who are interested in the star theorems: part of the proof by Grigori Perelman of the Poincaré conjecture, I insist, just part of it, was rewritten using this mixture of entropy, gradient flow, curvature and optimal transport, by John Lott and Peter Topping. OK, that's it, thank you. What is the worst
time for questions? Just before lunch. Just a question: about Monge-Kantorovich, I saw some papers applying this to telecommunications, where they wanted to place base stations optimally with respect to the calls, pushing them around and deciding where to put them; I remember discussing this with people from telecom. Yes, yes, I was in discussion with some of them. And one of the problems, when I saw those papers, was the implementation: the formulation is nice, but is there some software, some techniques? Oh yes. In particular there is a recent PhD thesis, gosh, what was his name, he was a student long ago... the recent PhD thesis of Quentin Mérigot, which gives very convincing algorithms allowing one to solve the problem very efficiently, in particular when you have an image processing problem. It is also a typical image processing problem: you have images and you want to match one with another. For example, you have two different pictures, one with one colour palette and the other with another, and you want to adapt the colours; or, say, for stereoscopic vision you have to match the two images. You have to solve an optimal transport problem, and optimal transport is one of the tools, and with the recent progress in computation they can do it in a time which is perfectly adapted for these purposes; so I would say that with this thesis things are in really good shape. Other questions? Can you talk about a space with negative Ricci curvature, and why not? OK. First, if you go deep into the heart of the matter, let me say how you do the computations.
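To give a flavour of what a numerical transport solver can look like (this is not Mérigot's method; it is the classical entropic-regularization iteration of Sinkhorn type, with made-up toy data):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, iters=2000):
    """Entropic regularization of Monge-Kantorovich: minimize
    <pi, C> + eps * sum(pi log pi) subject to the marginals a and b,
    via Sinkhorn's alternating scaling iterations."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)                 # enforce the second marginal
        u = a / (K @ v)                   # enforce the first marginal
    return u[:, None] * K * v[None, :]    # transport plan diag(u) K diag(v)

# Toy colour-palette matching on the line, cost = squared distance.
x = np.array([0.1, 0.5, 0.9])
y = np.array([0.2, 0.6, 0.8])
C = (x[:, None] - y[None, :]) ** 2
a = b = np.ones(3) / 3
plan = sinkhorn(a, b, C)
```

The returned plan is a nonnegative matrix with (approximately) the prescribed row and column marginals; as eps shrinks it concentrates on the exact optimal coupling.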
It's like this: you have to be interested in the Jacobian determinant of the exponential map. Take a gradient as the initial velocity of your vector field, like in the problem of Kantorovich (it is also the way geometers usually formulate it): from each point I send a geodesic, and I ask how the Jacobian determinant of this map behaves, how volumes are distorted. Call it J(t). You go into this and try to find an equation, and you will find something like: the second derivative of J(t)^{1/n}, plus Ricci (at the point and in the direction you are considering) divided by n, multiplied by J(t)^{1/n}, is bounded above by 0. This is a way to translate the Ricci curvature infinitesimally. I want to know how the volume occupied by the particles evolves: if I am a specialist of fluid mechanics, I will say this is a flow whose velocity is a potential field, and I am interested in how the density increases or decreases along the flow. Variations of the density are given by the Jacobian determinant, there is this linear differential inequality which appears, and there is a sign here. If you wanted to do it without the sign, you would be dead, because somewhere in there the equation is not closed: there is a trace operation, as always when you differentiate a Jacobian determinant, and trace and square don't commute; they commute up to a factor 1/n coming from Cauchy-Schwarz, so there is an inequality, and it goes in the right direction. There is an equivalent formulation: if you discuss with geometers, people in geometric analysis, they will tell you how they detect Ricci. It is always this way: you compare minus the scalar product of grad psi with grad(Laplacian psi), plus the Laplacian of |grad psi|^2 over 2, and this is bounded below by Ricci(grad psi, grad psi) plus (Laplacian psi)^2 divided by n, the dimension. And in this term
here again there is a Cauchy-Schwarz inequality, because you are comparing (Laplacian psi)^2 to the full squared norm of the Hessian of psi; between the two there is a factor 1/n, and an inequality. If you want a closed equation involving the Laplacian and so on, you need this inequality. So, at the level of the equations, the reason why you always end up with an inequality is a Cauchy-Schwarz, and without the Cauchy-Schwarz things don't close up. This is called the Bochner formula. Now, people have shown some strong theorems about negative Ricci curvature, saying for instance that there is no geometric consequence of Ricci curvature bounded above: you can have a family of spaces with negative Ricci curvature converging, in the sense that the distances converge, to a space with positive Ricci curvature, so that all the geometric properties you could imagine to be characteristic of negative curvature are not stable, they don't make sense. There are some analytic consequences, which you can see on solutions of the heat equation, but they don't translate into any good geometric properties. Emmanuel asks: somehow, are all the interesting questions in the negative curvature case, or not? You are somehow saying that you are dead when you try to solve the problem in negative curvature; first of all, is that interesting? And second, I am always working in negative curvature. Well, in general negative curvature is very rigid: you have single objects, you cannot deform them, and those are the types of questions one asks there. But indeed, as I said, people in dynamical systems, for instance, are very fond of negative curvature, but it is always negative sectional curvature, never negative Ricci curvature. Bisectional? Bisectional also, yes, absolutely: it is sectional or bisectional, but it will never be the Ricci or its analog in the
complex case, in the Kähler case. The Ricci curvature really comes with a sign, with a lower bound which is in some sense given. Even at the level of propagating a local criterion to a global criterion: this works fine with negative sectional, with positive sectional, and with positive Ricci, and these are the only three cases in which it works fine. On top of that, there is no strong motivation for negative Ricci: people in concentration of measure want to know that the measure concentrates, so that the space looks bounded, and that is the typical philosophy of positive curvature. So I see no real motivation for negative Ricci; negative sectional, for sure, is very important. By the way, the most general setting for good concentration of measure is when the reference measure satisfies a positive Ricci bound while the underlying space may have negative sectional curvature: it is a space on which you put the observation measure, and it is for this measure that the Ricci-bounded-below assumption makes sense. Observation measure: if you remember the interpretation of the Boltzmann entropy, it is always about observing data; in the end you put a measure on what you want to observe, you want to know how the data look, and there you want positive Ricci. You can also say it is the way to make sure that errors don't matter: states differing by a small error look kind of the same, and the solution will automatically be where it should be. Negative curvature, we know, is chaotic, all kinds of crazy behavior can occur, and so on. OK, maybe two small questions. One is about your title. Oh gosh, yes; in this case, actually, I often give a popular, broad-audience version of this talk, where we forget about the equations. The main point is that this is an encounter between fields, but it was only made possible by an encounter between people: at some point I met Felix Otto, it was some accident, and this resulted in
these domains coming together; at some other point I met John Lott, and this was the start of the theory. So it is a story of meetings between fields, but behind it there are always stories of meetings between human beings, and that is the raison d'être of institutions such as the Institut Henri Poincaré, or the Institut des Hautes Études Scientifiques, or the whole conglomerate that we are part of: make people meet, so that new ideas can come together. Mathematical theories, in the end, are abstract objects, but at first it is people talking to people. And the second question, very fast: to which part is the Fields medal related? The Fields medal was not for this work; it was related to issues about the rate of increase of entropy in a gas, and to the solution of the theoretical problem of Landau damping. That is: take a plasma, start from plasma equilibrium, think of the plasma as a bunch of electrons; the evolution equation is the so-called Vlasov equation, which is like Boltzmann but with no collisions, only electrostatic interaction. You make a small electric disturbance, and you want to know whether the electric field will decay spontaneously, decay without collisions, which in some respects looks very paradoxical. Together with Clément Mouhot we showed that it is true, under the technical assumption of periodic boundary conditions and for very smooth data, smoother than C-infinity, of Gevrey regularity, and we explained why. In particular we showed that this problem was related to other problems of theoretical physics, which was a big discovery for me. This problem later became the basis for solving a problem in fluid mechanics which had been standing for more than 100 years; I will explain it in words. It was done by Bedrossian and Masmoudi, in a beautiful paper taking inspiration from our work. Take a two-dimensional fluid, like on this wall, and take a shear flow, a linear shear flow: this is x, this is y, and I
will draw the vector field of the fluid, like this: a horizontal vector field proportional to the height. Then you make a small disturbance of the fluid and you wait, assuming the fluid is non-viscous and incompressible, the simplest case in some sense, and you ask: is it true that, just by the fluid equations, it converges in large time to another rest state in which the velocity is horizontal? This problem looks simple, but it is horribly difficult to solve; it was solved a couple of years ago by Bedrossian and Masmoudi, using some of the ideas we had developed with Clément Mouhot. So this other set of works, related to evolution equations in mathematical physics, is the one for which I got the Fields medal. And it's not your most cited work? Definitely not, partly because it is more recent, and also because, in all these issues around entropy, information and so on, the community is very large, much larger than the community interested in rigorous mathematical physics. OK, thank you.