Okay, so it's my pleasure to introduce Nicola Gigli. He's a professor here at SISSA, and he will give us an introduction to optimal transport, and he's a great person to give it because he's written the book that is one of the main references for this topic, so I hope we'll learn something. Thank you, thank you very much. Yeah, it's a pleasure to present such a basic notion in this seminar. Optimal transport has been a very active research field in the last 20, 30 years or so, with deep connections with other fields of mathematics, in particular the study of parabolic PDEs and geometric analysis. A lot of important people worked on the topic, just to mention a few of them: Felix Otto, Cédric Villani, Karl-Theodor Sturm, Luigi Ambrosio, Robert McCann and many others. Now, to give an introduction to such a topic, I essentially had to make a choice: either to give you a glimpse of what optimal transport is and of all the connections that it has with these other important research fields, or to stick to the basics and try to be sure, or at least as sure as possible, that in one hour you get an idea of what optimal transport really is, and maybe some of the crucial definitions and properties that this problem has. I chose the latter. So this means that I will really, really stick to the basics, and hopefully many of you will come back having truly learned something new. Maybe some of you will be bored, so I apologize to those of you who already have an idea of what optimal transport is, because maybe you won't get that much more. Anyway, let me also add: feel free to interrupt me at any moment and ask questions whenever you want, okay? Okay, so here is what I'm going to speak about. First of all, I will tell you what optimal transport is in a quite general framework.
And in fact, for most of the talk, I will stick to a very general framework which, if you want, you can specialize in whatever way you like; one of the features of optimal transport is that it provides a very flexible and general setting which in some sense works in very high generality. We'll see the notions of c-concavity and c-cyclical monotonicity, which in some sense generalize concepts like standard convexity for functions on Euclidean space, and monotonicity, to the abstract setting of Polish spaces. I will speak about the dual problem, and I will conclude with a very surprising result, one of the crucial theorems of optimal transport, one of the first in fact that came out, which is Brenier's theorem, about the existence of optimal maps in Euclidean space for a very particular and important cost function. But anyway, let me start. So here is the setting. We shall work on so-called Polish spaces. What is a Polish space? A Polish space is a topological space whose topology is induced by some metric which is complete and separable. You typically say Polish space instead of complete separable metric space because you typically don't really care about which metric it is, just about the topology it induces. Why should we work in this setting? Well, essentially this is a natural setting in which to do measure theory. For reasons that I will not discuss that much, measure theory works particularly well on these sorts of spaces, in some sense as well as it does on R^d. And in fact, if you want, you can just take the space to be R^d if you prefer. Some of us prefer to work in high generality just to focus on the core structural properties of the problem in question; others maybe prefer to work in concrete situations. Whatever you want is fine, okay? So P(X) is the space of Borel probability measures on our Polish space X. Now, let me give you a definition. This is a crucial definition.
If you have two Polish spaces, X and Y, or R^d and R^d if you prefer, and you have a Borel map from the first space to the second and a probability measure on the first space, then you can build the so-called push-forward measure. The push-forward measure, T push-forward mu, assigns to every set A the mass that the measure mu assigns to the pre-image of A. I guess there is a typo over there in how the parentheses are placed. Anyway, imagine that this is the first space X, this is the second space Y. On X you have a probability measure, which you can think of as a sort of distribution of mass, and you have a map T which goes from X to Y. Now, the push-forward measure is in some sense the measure that you obtain if you pick every atom of mass at a point x and you put it at T(x). So the measure that T push-forward mu assigns to some set A will be, by definition, mu of the pre-image of A. It is trivial to check that this defines a Borel probability measure on Y. One characterization of the push-forward is that for every Borel real-valued function f on Y, the integral of f with respect to the push-forward is equal to the integral of f composed with T with respect to the original measure. So by right composition with T, you create the pull-back of functions. Now, measures and functions are in some sense dual to each other, right? You integrate a function with respect to a measure and you get a real number. So by duality, you obtain the notion of push-forward of a measure. Okay, now here is the first basic formulation of the optimal transport problem. Suppose that we are given not only a measure on the source space, but also a measure on the target space, mu and nu respectively. And suppose we are given a function from the product X cross Y into R, which let's say for simplicity is non-negative and continuous. This function we shall call the cost function, okay?
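To make the definition concrete, here is a minimal sketch in Python; the points, weights, and map T are invented for illustration. It builds the push-forward of a discrete measure and checks the change-of-variables identity, that the integral of f against T push-forward mu equals the integral of f composed with T against mu.

```python
import numpy as np

# A discrete measure mu on X = {0, 1, 2} with weights summing to 1,
# and a map T: X -> Y.  The push-forward T#mu puts at each y the total
# mass of its pre-image T^{-1}(y).
points = np.array([0, 1, 2])
weights = np.array([0.5, 0.3, 0.2])
T = lambda x: x % 2               # example map collapsing 0 and 2 onto 0

pushforward = {}
for x, w in zip(points, weights):
    y = T(x)
    pushforward[y] = pushforward.get(y, 0.0) + w

# Change-of-variables check:  ∫ f d(T#mu) = ∫ (f ∘ T) dmu,  here f(y) = y^2
f = lambda y: y ** 2
lhs = sum(f(y) * w for y, w in pushforward.items())
rhs = sum(f(T(x)) * w for x, w in zip(points, weights))
assert abs(lhs - rhs) < 1e-12
```

The dictionary ends up as {0: 0.7, 1: 0.3}: the atoms at 0 and 2 are collapsed onto 0 and their masses add.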
So the optimal transport problem is this: I want to minimize the integral of c(x, T(x)) d mu(x) among all the transport maps from mu to nu. What is a transport map? It's a Borel map such that T push-forward mu is equal to nu, okay? So why is this optimal transport? Because the cost function c in my mind is the amount that I have to pay to move a unit of mass from the point x to the point y. Now, if this is the cost for a single unit of mass and I want to transport mu into nu, the total cost will be the integral of c(x, T(x)) d mu(x), and I have the constraint of having to map the measure mu onto the measure nu, okay? Now, this is Monge's formulation; actually the original formulation was much more specific. In the original formulation X and Y were the Euclidean space R^d, and the cost function was the distance. This is perhaps the most natural cost function one can think of: in order to move from x to y I have to pay an amount which is how far apart they are, okay? But nevertheless, we lose nothing, at least a priori, in formulating the problem in this generality. Now, from the mathematical point of view, this problem to some extent is, I wouldn't say ill-posed, but it is not really well formulated, and for quite a few reasons. The first is that a priori, nothing ensures, given measures mu and nu, that there exists a transport map from the first to the second. So maybe I'm trying to minimize over the empty set. For instance, if the starting measure mu is a delta, then, if you look at the definition, you realize that T push-forward mu, whatever T is, is going to be a delta: if mu is the delta at x, T push-forward mu is going to be the delta at T(x). So if mu is a delta and nu is not, there is no transport map at all. But moreover, even if you have transport maps, still the set of transport maps is not closed with respect to any reasonable weak topology.
Typically, when you have a minimization problem, what you do is: okay, I know that there exists an inf, I'm not sure there is a minimum. So I take a minimizing sequence, and maybe under some topology this minimizing sequence will have some limit, and then maybe this limit will really be a minimizer of my problem. But the problem is that there is no reasonable topology that you can put on maps. Regardless of whether you are working on Polish spaces or on R^d, there are no reasonable topologies for which the constraint is closed and for which, in some sense, you have compactness, okay? So what do we do? Well, what Kantorovich did was to formulate the transport problem in another way, which is much better in some sense. And the idea is not to consider transport maps any more, but to consider instead transport plans. So an admissible transport plan between the measure mu and the measure nu, on X and Y, is a measure gamma on the product space X cross Y having the property that its marginals, pi_1 push-forward gamma and pi_2 push-forward gamma, are given by the measures mu and nu. Here pi_1 and pi_2 are the projections onto the first, the X component, and onto the second, the Y component. So if gamma is a measure on X cross Y, pi_1 push-forward gamma is by definition a measure on X, and I can ask whether this measure is equal to mu, and the same for the projection onto Y. Now the new formulation of the optimal transport problem is: I minimize the integral of c(x, y) d gamma(x, y), okay? And among other things, we gain symmetry in mu and nu in this formulation. Now why is this a much better formulation? Well, first of all, we always have some transport plan: you can always form the product of the two probability measures mu and nu, and this product will be a transport plan. The second point is that, in some sense, transport plans are a generalization of transport maps.
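For discrete measures, the Kantorovich formulation is literally a finite linear program over the entries of gamma, with the marginal conditions as equality constraints. Here is a small numerical sketch, assuming SciPy is available; the particular measures and supports are invented for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Discrete Kantorovich problem: minimize <C, gamma> over couplings gamma
# with row sums mu (first marginal) and column sums nu (second marginal).
mu = np.array([0.5, 0.5])                 # mass at x = 0 and x = 1
nu = np.array([0.25, 0.75])               # mass at y = 0 and y = 1
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0])
C = (x[:, None] - y[None, :]) ** 2 / 2    # cost c(x, y) = |x - y|^2 / 2

n, m = C.shape
A_eq, b_eq = [], []
for i in range(n):                        # row-sum constraints
    row = np.zeros((n, m)); row[i, :] = 1
    A_eq.append(row.ravel()); b_eq.append(mu[i])
for j in range(m):                        # column-sum constraints
    col = np.zeros((n, m)); col[:, j] = 1
    A_eq.append(col.ravel()); b_eq.append(nu[j])

res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=(0, None))
gamma = res.x.reshape(n, m)               # an optimal coupling
```

Note that the optimal plan here splits the mass at x = 0 between both targets, which no map can do: plans really are more general than maps.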
If you think about it for one second, you realize that if T is a transport map, so T push-forward mu equals nu, and you look at the graph of T, that is, the map which takes x and gives back (x, T(x)), going from X into X cross Y, and you push forward mu through this map, then what you obtain is a transport plan. So to every transport map you can associate, in a very natural way, a transport plan, okay? Now moreover, you are minimizing not over maps any more, but over probability measures satisfying some linear constraint. And this set, the class of admissible transport plans, is easily seen, by means of basic measure theory, to be closed with respect to the weak topology; also compact, in fact, okay? Now I'm being a bit lazy about what the weak topology is, but let's say, if your spaces X and Y are compact, to simplify a little bit, then weak topology means convergence in duality with continuous functions. And moreover, given that the function c was continuous and bounded from below, one can check that the map which takes a probability measure gamma and gives back the integral of the cost function with respect to gamma, well, this is clearly linear, and it's also continuous with respect to this weak topology. So in particular, the minimum exists: I can always find a minimum for this formulation of the optimal transport problem. Very well, so that's already something. So now that we have optimal plans, a few natural questions arise. For instance, can we say something about optimal plans? Do we have any sort of Euler-Lagrange equation, or any structural properties of optimal plans? Are they unique, maybe under some condition on the cost, or on the measures, or on the spaces, or whatever? Are they induced by maps? That is, in solving the Kantorovich formulation, can we recover optimizers also for the Monge formulation of the problem?
These are some of the first natural questions that one poses when starting this business, and in fact, let me try to show you that in some reasonable circumstances we can produce reasonable answers to those questions. So I need to give a few definitions. The first definitions are about c-cyclical monotonicity and c-concavity. Let me start with a key basic example. Let's say that the two measures mu and nu have a very particular structure: they are just combinations of deltas. So I have n points on the source space and N points on the target space, the x_i's and the y_j's; mu is just 1/n times the sum of the deltas at the x_i's, and nu is the analogous combination of deltas at the y_j's. Now, it is pretty trivial to check, in fact I guess the statement that I'm going to give is one of those for which there is essentially nothing to prove, that a plan gamma is optimal if and only if the following holds: for any finite collection of couples (x_k, y_k) in the support of the plan and for every permutation sigma on the set of indices, the sum over k of c(x_k, y_k) is less or equal than the sum over k of c(x_k, y_sigma(k)). Sorry, I made some confusion with the small n and the capital N: they can be different, one can be smaller or bigger than the other. So you have a few points given, and what I'm saying is that the inequality holds whenever I pick a finite number of points in the support of my plan; of course the support of my plan here is finite, so I cannot take too many of them. But anyway, that inequality is true. So what is that inequality telling us? Let's meditate a little bit on this.
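In this discrete setting with equally many points on both sides and uniform weights, an optimal plan can be found among permutation matchings, so the statement can be checked by brute force. A small sketch, with points and cost invented for illustration:

```python
import itertools
import numpy as np

# mu = (1/n) sum_k delta_{x_k},  nu = (1/n) sum_k delta_{y_k}:
# brute-force all assignments of sources to targets.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 2.5, 4.0])
cost = lambda a, b: (a - b) ** 2

n = len(x)
best = min(itertools.permutations(range(n)),
           key=lambda s: sum(cost(x[k], y[s[k]]) for k in range(n)))

# The matched couples of the optimal assignment satisfy the inequality:
# no reshuffling sigma of the targets lowers the total cost.
pairs = [(x[k], y[best[k]]) for k in range(n)]
opt_cost = sum(cost(a, b) for a, b in pairs)
for sigma in itertools.permutations(range(n)):
    shuffled = sum(cost(pairs[k][0], pairs[sigma[k]][1]) for k in range(n))
    assert opt_cost <= shuffled + 1e-12
```

For this convex cost on the line the winning assignment is the monotone one, matching the sorted x's to the sorted y's.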
Any transport plan in this very finite-dimensional situation is, in a sense, only telling us whether, and in that case how much, mass I'm moving from the point x_i to the point y_j, okay? I have a finite number of points, the x_i's here, the y_j's over there; I have the couples (x_i, y_j) in the product, and for each of these couples I'm watching whether I'm moving some mass from x_i to y_j, okay? Now, if my plan is giving some mass to one of these couples, so if the couple (x_i, y_j) belongs to the support, it means that I'm truly moving something over there. And what this condition, this inequality, tells us is that by reshuffling the way I move the mass, I cannot improve the cost, okay? If I was moving, for instance, x_1 to y_1 and x_2 to y_2, I cannot produce a better, that is a lower, cost by moving instead x_1 to y_2 and x_2 to y_1. Questions? Okay, so in this case the statement is rather easy to prove, in the sense that if not, well, then the new transport plan that I obtain by doing this reshuffling is going to have a lower cost, so my original plan was not optimal. End of the proof, in some sense; there is nothing to say. Say that again, sorry? Oh, so this really depends on the structure of the cost and of the measures. You can have non-trivial situations in which for every permutation you have equality. Let me give you an explicit example of this. We are on, say... actually, we are on R^3: imagine a regular tetrahedron, and I want to use the cost function equal to the distance. So this is the regular tetrahedron, all the sides are equal. Now mu is the measure which gives one half of its mass to this vertex and one half to this one, and nu is the measure which gives one half of its mass to this vertex and one half to this one. Now you see that every transport plan is optimal, because in any case any unit of mass is going to be moved by a distance one, right?
So if the cost equals the distance, or any function of the distance, nothing really matters here, despite the fact that the problem has a non-trivial geometric structure in general, because cost equal to distance for more complicated measures can produce very complicated geometries. In this case, no: everything is optimal. So the answer to your question depends a lot on the structure of the particular problem considered, okay? Other questions? Okay, yeah, please. [Question from the audience.] In particular, yes. Wait a second, let me rephrase. There are always optimal plans in this case which are induced by a permutation, but not necessarily every optimal plan must be of that form, because the functional that you are minimizing is linear. So it's convex, but not strictly convex. Think of this case, no? I move half of the mass here, half of the mass there, and here whatever you do is optimal, okay? So what you're saying is very true, and these are the extreme points of the set, et cetera, et cetera, but the functional is not strictly convex. In particular, this gives me the occasion to add this: in general, you cannot hope to have uniqueness. If you have uniqueness, it must be for some other reason, because the functional is linear: the integral of the cost with respect to the measure is linear in the measure, okay? Which is great when you look for continuity or semicontinuity, but makes uniqueness harder for you. Okay, now this property that optimal plans have in this finite-combination case is so important that it is worth its own name. So let's say that a set, a subset of the product X cross Y, is c-cyclically monotone provided that for any natural number N and any N couples of points in this set, this property of not increasing the cost under permutation holds. It's a definition. So why this name? You might have heard of monotone operators on R^d.
So this is just a small parenthesis for those of you who have heard about monotone operators. A monotone operator on R^d is a map T from R^d to R^d for which the sum of scalar products, the scalar product of x_1 with T(x_1) plus that of x_2 with T(x_2), is always greater or equal than the same sum with the pairings switched, x_1 with T(x_2) plus x_2 with T(x_1). This is what is called a monotone operator. So if you plug in cost equal to minus the scalar product, you recover exactly the two-point version of c-cyclical monotonicity. "Cyclical" is just because you don't only take two points, but any finite number of points and then a permutation. That's where the terminology comes from. Of course, in dimension one, monotone means increasing. Okay, so now here is a first structural theorem, which I will not prove, but at least I will state. The point is this: take a probability measure on the product space X cross Y. This is by its nature a transport plan, admissible between its own marginals, and I can wonder whether it is optimal with respect to its own marginals. And it is so if and only if its support is c-cyclically monotone. For measures which are supported on a finite number of points, we sort of proved this; at least there is very little to prove, according to what I was saying a moment ago. But what this theorem tells us is that this is true for any transport plan. Now here is where the proof, while not really complicated, is a bit tricky from a technical point of view. Essentially, here the continuity of the cost plays a role, okay? A finite number of points in the support will typically have no mass, so you have to use the continuity of the cost to say that, okay, if the support is not c-cyclically monotone, then I can rearrange a little bit of the mass without changing the marginals while decreasing the cost.
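To see this parenthesis concretely: with c(x, y) = minus the scalar product, the two-point c-cyclical-monotonicity inequality rearranges exactly into the monotonicity inequality, the scalar product of x_1 minus x_2 with y_1 minus y_2 being non-negative. A quick numerical check on random points (a sketch, nothing more):

```python
import numpy as np

# With c(x, y) = -<x, y>, the two-point c-cyclical-monotonicity inequality
#   c(x1, y1) + c(x2, y2) <= c(x1, y2) + c(x2, y1)
# rearranges to <x1 - x2, y1 - y2> >= 0, the usual monotonicity inequality.
def c(x, y):
    return -float(np.dot(x, y))

rng = np.random.default_rng(1)
for _ in range(1000):
    x1, x2, y1, y2 = rng.normal(size=(4, 3))
    two_point_ok = c(x1, y1) + c(x2, y2) <= c(x1, y2) + c(x2, y1)
    mono_ok = np.dot(x1 - x2, y1 - y2) >= 0
    assert two_point_ok == mono_ok   # the two conditions are the same inequality
```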
So the actual construction is a bit tricky, but the intuition behind the proof is really what comes from the finite-dimensional case. Now, I will not prove this, but I want to emphasize one consequence of this fact which for me was very surprising when I started studying the optimal transport business: the optimality of a plan does not really depend on the whole measure, but only on its support. You take an optimal plan, you reshuffle the measure in any way you want without changing the support, and the new plan will typically have different marginals, but it will still be optimal between its own new marginals. This was a bit shocking for me at the beginning, but if you meditate a little bit on this cyclical monotonicity business, if you realize that, by the theorem, cyclical monotonicity of the support characterizes optimality, then this is clear. Okay, so let's move on. Now, in some sense we translated the problem of understanding the optimality of a plan into the problem of understanding the structure of c-cyclically monotone sets. So how can we study them and get a better intuition? Here the dual formulation of optimal transport helps, which again was introduced by Kantorovich. Optimal transport is, okay, infinite-dimensional, but it is, I have to say, a linear programming problem: you are minimizing a linear functional over a convex set. Now, for those of you who are familiar with linear programming, you know that this kind of problem has a natural dual problem, where you maximize another linear functional over another convex set. This sort of business is also in place in this slightly more functional-analytic framework, and here is the dual formulation. The measures mu and nu are given, and the cost function is also given. Now, rather than minimizing an integral of the cost, we maximize the sum of two things.
We maximize the integral of phi d mu plus the integral of psi d nu, over couples of functions phi, psi on X and Y, respectively, continuous and bounded, with the following property: phi(x) + psi(y) is always, for every x and y, less or equal than c(x, y). Now, at a very heuristic and rough level, let me underline one feature of this dual problem, which is very common in linear programming duality. Our original problem was minimizing a certain functional under a certain constraint. In the dual formulation, in some sense, the constraint becomes the functional and the functional becomes the constraint. For instance, the cost c in the original problem appears in the functional, the integral of the cost with respect to gamma, not in the constraint; the constraint is just a constraint about push-forwards of the measures. In the dual formulation, the cost c does not appear in the functional, but it appears instead in the constraint. Vice versa, the measures mu and nu, which were the constraint in the original formulation, appear in the functional of the dual. So this is a kind of reshuffling which often happens, actually always happens, when passing to the dual formulation. Okay, let's discuss this dual problem a little bit. Functions phi and psi like that are called admissible potentials. The first observation is that the sup of this dual problem is always less or equal than the inf of the original problem, and this for trivial reasons: take any transport plan gamma and any couple of admissible potentials phi and psi, and look at the integral of the cost with respect to gamma. For any x and y, c(x, y) is greater or equal than phi(x) + psi(y), so we have the first inequality over there. But now I can split the integral into two: one integral depending only on x and the other depending only on y, so only the marginals of gamma matter.
So this proves that the cost of gamma is always greater or equal than what I can get from any couple of admissible potentials, hence the inf of the original problem is greater or equal than the sup of the dual. This is very common in duality, and the question then is whether you have strict inequality or not. If you have strict inequality, you say that there is a duality gap; if you don't, you say that there is no duality gap. Okay, so let's study this dual problem a little bit more, and here is a construction which is very simple and very important: the so-called c-transform of a function. Say that phi is given, phi a function on X; I can build its c-transform, phi with a superscript c, which is going to be a function on Y, defined at the point y as the inf over x of c(x, y) minus phi(x), okay? Notice that if (phi, psi) is admissible, then two things are true. The first is that this c-transform is always going to be greater or equal than psi. In fact, the c-transform is, given phi, the biggest function that I can take while remaining admissible, okay? Recall that being admissible means that phi(x) + psi(y) should be less or equal than c(x, y) for every x and y. Now, say that phi is given: what is the biggest psi that I can put there in order for this to remain true? Well, exactly phi^c; so in particular, if (phi, psi) was admissible, then phi^c is going to be greater or equal than psi, for sure, everywhere. And of course, I can do the same in the other direction, by symmetry, and I can iterate the construction. Now, the fact that phi^c is always greater or equal than psi means that if I start from a couple of admissible potentials and I replace it with (phi, phi^c), in the dual problem I gain something: I increase the value a little bit, or certainly I do not decrease it, okay? And of course, I can iterate, okay?
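On finite grids both the weak-duality inequality and the c-transform can be checked directly. A sketch with random data (NumPy assumed, everything else invented): we build an admissible pair by taking psi to be the c-transform of an arbitrary phi, and compare against the product coupling.

```python
import numpy as np

# Weak duality on finite grids: for any admissible pair (phi, psi), i.e.
# phi_i + psi_j <= C_ij, and any coupling gamma with marginals mu, nu:
#   sum(C * gamma)  >=  sum(phi * mu) + sum(psi * nu).
rng = np.random.default_rng(0)
n, m = 4, 5
C = rng.random((n, m))                   # arbitrary cost matrix
mu = np.full(n, 1 / n)
nu = np.full(m, 1 / m)
gamma = np.outer(mu, nu)                 # the product coupling, always admissible

phi = rng.random(n)                      # arbitrary potential on X
psi = (C - phi[:, None]).min(axis=0)     # psi = phi^c makes the pair admissible

assert np.all(phi[:, None] + psi[None, :] <= C + 1e-12)
assert (C * gamma).sum() >= phi @ mu + psi @ nu - 1e-12
```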
So I start from a couple of admissible potentials (phi, psi), then I pass to (phi, phi^c), then to (phi^{cc}, phi^c), et cetera. Maybe I can hope to continue, and maybe in the limit something will happen and I will be done; but in fact, the process stops, because it's not hard to check that doing this construction three times is the same as doing it just once: phi^{ccc} = phi^c. I won't give you the full proof, but it's really one line. If you write down what phi^{ccc} is, you have an inf of the cost minus phi^{cc}, but then there was a minus, so this becomes a sup, et cetera. In the end you have an expression which looks a bit complicated: inf over x, sup over y-tilde, inf over x-tilde of the expression over there. And now it's really just algebra; in this whole discussion about c-transforms there is no measure theory involved, it's really basic algebraic manipulation. In that expression, you pick x-tilde equal to x and you get that the triple transform is less or equal than the single one, and then you pick y-tilde equal to y and you get the other inequality. So phi^{ccc} = phi^c. In particular, this tells us where to look for maximizers of the dual problem: rather than looking at general admissible phi and psi, I can just look at couples which are stable under this operation. So let me say that a function is c-concave if it is the c-transform of something. And perhaps let me comment a bit on why "c-concave", what this has to do with concavity. If you think back to the case where the cost is equal to minus the scalar product on R^d, a function is c-concave for this cost if and only if it is the inf of a family of affine functions; concave and upper semicontinuous, to be precise, but let's say concave. So c-concavity is a generalization, in some sense, of the notion of concavity.
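The stabilization after one step can be verified numerically on finite grids; a minimal sketch, with the grids and the starting potential invented:

```python
import numpy as np

# c-transform on finite grids: phi^c(y) = min_x [ c(x, y) - phi(x) ].
xs = np.linspace(-1.0, 1.0, 50)
ys = np.linspace(-1.0, 1.0, 60)
C = (xs[:, None] - ys[None, :]) ** 2 / 2      # C[i, j] = c(x_i, y_j)

def ct_x_to_y(phi):
    # transform a function on the x-grid into one on the y-grid
    return np.min(C - phi[:, None], axis=0)

def ct_y_to_x(psi):
    # transform a function on the y-grid into one on the x-grid
    return np.min(C - psi[None, :], axis=1)

phi = np.sin(3 * xs)                          # an arbitrary starting potential
phi_c = ct_x_to_y(phi)
phi_cc = ct_y_to_x(phi_c)
phi_ccc = ct_x_to_y(phi_cc)

assert np.all(phi_cc >= phi - 1e-12)          # phi^{cc} >= phi
assert np.allclose(phi_ccc, phi_c)            # iterating stabilizes: phi^{ccc} = phi^c
```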
And the operation of c-transform, for those of you who know it, at this point you may realize, is a generalization of the standard notion of Legendre transform for convex functions. Okay, now another concept that is useful in this connection is the notion of c-superdifferential. The c-superdifferential, despite what the name may suggest, has nothing to do with derivatives, at least in general. The c-superdifferential of phi is a set of points, actually of couples, a subset of the product space X cross Y: it is the set of those couples (x, y) for which phi(x) + phi^c(y) is equal to c(x, y). In general you have an inequality; if equality holds, you say that the couple is in the c-superdifferential. Let me now state two facts, one trivial, the other one not. The trivial fact is that whenever you take a c-concave function and you look at its c-superdifferential, this set is c-cyclically monotone, for trivial reasons. Okay, I know that I gave quite a few definitions, but I stop here; I will give no more. So let me try to explain why; it's really trivial. Pick a family of points (x_k, y_k) in the c-superdifferential of phi, and whatever permutation sigma of the indices. Now the sum over k of c(x_k, y_k) will be equal, by the definition of c-superdifferential, to the sum over k of phi(x_k) plus phi^c(y_k). But the sum is certainly a commutative operation, so I can reshuffle the order in which I add the y's in whatever way I want. But then x_k and y_sigma(k) are two generic points of my spaces, and phi at one point plus phi^c at the other is going to be less or equal than the cost. So any c-concave function produces a c-cyclically monotone set, just by taking its c-superdifferential. Now, what is less trivial is that in fact any c-cyclically monotone set arises in this way.
Take any c-cyclically monotone set, and for sure you will find some c-concave function whose c-superdifferential contains it. Now, okay, this statement is in some sense so general that the proof cannot be too hard; there has to be some trick, and in fact there is, which produces the function starting from the set. In fact, the construction is very explicit, and you can write down what the c-concave function is. Now, the original formulation of this theorem was for convex functions and cyclically monotone sets; it was a theorem by Rockafellar. It has later been realized that this theorem really has little to do with the very special properties of convex functions and subdifferentials, and it's much more general, based just on this notion of c-superdifferential. I will not really prove this theorem, but just mention that there is a very explicit construction that produces the function. Okay, so let me wrap up what we said up to now. You take two probability measures and your cost function. Then the following are equivalent: a plan gamma is optimal; its support is c-cyclically monotone; its support is contained in the c-superdifferential of some c-concave function. Now, to be completely honest, I'm neglecting some small hypotheses, not that much. If X and Y are compact, or say mu and nu have compact support, I have to add nothing more; but in some sense I have to do something to exclude the presence of too many plus infinities. For instance, under the assumptions I made up to now, nothing excludes that for any transport plan the cost is plus infinity: the cost function is continuous, but maybe the supports of mu and nu are so spread out that in order to move the mass the total cost is infinite. So if I pay just a little bit of attention and avoid this, say the cost is bounded or the supports of my measures are bounded or something like this, then I'm fine. Okay.
So I gave you many definitions and stated a few theorems, all of which look quite trivial, or maybe one implication is trivial and the other one requires some construction. So you might think that up to now we really did almost nothing. So now let me tell you an application of the theorem that is written here, which is Brenier's theorem, which is amazing, I think; it's very nice, and it's just a consequence of what we've done up to now. Oh, actually, before I forget: in particular, there is no duality gap. If you believe the theorem that I just stated, the inf of the transport problem is really equal to the sup of the dual. Because when you take a plan gamma which is optimal, you can find a function phi which is c-concave whose c-superdifferential contains the support of gamma. And now, for the cost of gamma you really have equality, because on the support of gamma, c(x, y) is equal to phi(x) + phi^c(y), not just less or equal. So you have equality there, and there is no duality gap in optimal transport, okay? Under these mild technical assumptions. Okay, with that said, let's now discuss these definitions in a very particular situation: X and Y are the Euclidean space, and the cost function is the distance squared, or perhaps the distance squared divided by two. Now, when I first started studying optimal transport, I frankly felt that taking cost equal to distance squared was somehow cheating: the true cost had to be the distance, so why distance squared? The reason why distance squared appears as the crucial quantity, not just in Brenier's theorem but in fact in several, though not all, of the applications of optimal transport, is, morally speaking, much like the relation between the spaces L^1 and L^2, right? L^2 is way better behaved than L^1.
And if you ask why L2 is better behaved than L1, well, because on R^2 the distance that you put between a couple of points is the L2 distance, and you want Pythagoras' theorem to hold. So, if you buy this very bizarre, or very rough, heuristic, you might get a flavor of why distance squared is a good thing to do. But anyway, regardless of this, one of the conclusions of Brenier's theorem, and I will come to that in a second, you can read regardless of what you know about transport, and it's surprising anyway. So, if you don't want to buy into this distance-squared business, hopefully you will still appreciate the conclusion of Brenier's theorem. So, in this case, what are the c-concave functions? Okay, they have a very explicit and very simple characterization. A function phi is c-concave if and only if the function phi-bar given by x squared over two minus phi is convex. So, c-concavity in this case is tightly linked to the standard notions of concavity and convexity. And why is this the case? Well, I mean, look, just follow me in this computation. So, phi being c-concave means that phi(x) is the inf over y of the quantity |x minus y| squared over two minus some psi(y). I expand the square, I get x squared over two, minus the scalar product of x and y, plus some function of y. Now, the x squared over two does not depend on y, so I can bring it to the left. So, this tells me that phi(x) minus x squared over two is the inf over y of minus the scalar product of x and y plus some function of y. Equivalently, changing sign, this function phi-bar is the sup over y of the scalar product of x and y plus some function of y: a supremum of affine functions, which is exactly convexity. If and only if, very well. So, if c-concavity fits so well with convexity, maybe the c-subdifferential will also have some relation with the standard notion of subdifferential of a convex function. Indeed it does.
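The computation just carried out verbally can be written down, in the talk's notation:

```latex
\varphi(x)
  \;=\; \inf_y \Big( \tfrac{|x-y|^2}{2} - \psi(y) \Big)
  \;=\; \tfrac{|x|^2}{2} \;+\; \inf_y \Big( -\langle x, y\rangle + \tfrac{|y|^2}{2} - \psi(y) \Big),
```

so that, moving the term $|x|^2/2$ to the left and changing sign,

```latex
\bar\varphi(x)
  \;:=\; \tfrac{|x|^2}{2} - \varphi(x)
  \;=\; \sup_y \Big( \langle x, y\rangle - \tfrac{|y|^2}{2} + \psi(y) \Big),
```

which is a supremum of functions affine in $x$, hence convex (and lower semicontinuous).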
So, the point is that (x, y) is a couple in the c-subdifferential of phi if and only if y belongs to the subdifferential of the corresponding convex function phi-bar at x. Maybe let me just remind you what the subdifferential of a convex function is. So, you say that v belongs to the subdifferential of phi-bar at x if, by definition, for every z in Rd, phi-bar(x) plus the scalar product of v and (z minus x) is less than or equal to phi-bar(z). Now, looking at the picture, this means: you take the graph of your function phi-bar, and the hyperplane driven by the vector v which passes through the point (x, phi-bar(x)) is tangent from below to the graph of phi-bar. If the convex function is differentiable at some point, then its subdifferential contains only one element, and this element is the gradient. Okay, but the subdifferential may exist even at points of non-differentiability. So, for instance, the absolute value: it's not differentiable at zero, but there it has a whole interval of subdifferentials. Okay, now the point is that our original c-concave function was phi, we built the convex function phi-bar as x squared over two minus phi, and now we have this relation between the c-subdifferential and the standard subdifferential. Now, again, the proof is really just writing down what it means to be in the c-subdifferential versus what it means to be in the subdifferential. Now, following the computation on this slide may be hard, but believe me, there's nothing complicated. I really write down the definitions and, as before, carry x squared over two from the left to the right; if you want to follow the computation, it's really a few lines, nothing complicated. Now, let me put one last ingredient into the discussion, and the ingredient is this. This is a weak version of Alexandrov's theorem, very weak, in fact. This is rather Rademacher's theorem applied to convex functions. So, take a convex function.
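Here is a tiny numerical sketch of my own (not from the talk) of the absolute-value example: the subgradient inequality f(x) + v·(z − x) ≤ f(z) for f = |·| at x = 0 holds exactly for slopes v in [−1, 1], so the subdifferential at the kink is the whole interval:

```python
# Check the subdifferential inequality on a grid of test points z.
def in_subdifferential(f, x, v, zs):
    return all(f(x) + v * (z - x) <= f(z) + 1e-12 for z in zs)

f = abs
zs = [k / 10 for k in range(-30, 31)]          # grid on [-3, 3]

print(in_subdifferential(f, 0.0, 0.5, zs))     # slope inside [-1, 1]
print(in_subdifferential(f, 0.0, 1.5, zs))     # slope outside [-1, 1]
```

The first check succeeds and the second fails: the line of slope 1.5 through the origin climbs above |z| for positive z.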
Now, convex functions on Rd are locally Lipschitz, at least in the interior of their domain of definition. Now, Lipschitz functions are almost everywhere differentiable, Lebesgue-almost everywhere differentiable. So, in particular, if I have a convex, real-valued function, then this function is Lebesgue-almost everywhere differentiable, okay? This is sort of basic measure theory, if you want: Rademacher's theorem for convex functions. So, in particular, for Lebesgue-almost every x, the subdifferential of phi-bar contains only one element. Now, here is the theorem of Brenier. Take two probability measures mu and nu on Rd, assume that the first is absolutely continuous with respect to Lebesgue, and then consider the cost function c equal to the distance squared divided by two. Well, then, three things are true. First, there is only one optimal transport plan. Second, this optimal transport plan is induced by a map. And third, this map is the gradient of a convex function. So, here I cheated a bit: I should put some assumption on the supports of the measures, or on the second moments, but let me skip this technicality. So, in particular, one crazy thing that Brenier's theorem tells us is the following. You take two probability measures on Rd, with mu absolutely continuous with respect to Lebesgue. Well, then, and this is a statement which has nothing to do with optimal transport, there exists a convex function phi-bar such that the push-forward of mu under grad phi-bar is equal to nu. Who is this phi-bar? Well, it is the phi-bar given by this theorem. If you look at this statement, it looks a bit unbelievable, right? So, one would say that there are many more couples of measures than convex functions on Rd. And part of the reason why this can be true is that the same convex function may work for several couples.
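In one dimension there is a well-known concrete picture of Brenier's map: the monotone rearrangement, pairing the quantiles of mu with the quantiles of nu. A nondecreasing map is exactly the derivative of a convex function of one variable, matching the "gradient of a convex function" conclusion. A small empirical sketch of my own (sample points are made up), checking on finite uniform samples that the sorted pairing is monotone and beats every other pairing for the squared cost:

```python
from itertools import permutations

xs = [2.0, -1.0, 0.5, 3.0]     # sample of mu (uniform weights)
ys = [1.0, 4.0, -2.0, 0.0]     # sample of nu (uniform weights)

xs_sorted, ys_sorted = sorted(xs), sorted(ys)

# Monotone rearrangement: i-th smallest x goes to i-th smallest y.
T = dict(zip(xs_sorted, ys_sorted))

# T is nondecreasing, hence (in 1D) the derivative of a convex function.
vals = [T[x] for x in xs_sorted]
print(all(a <= b for a, b in zip(vals, vals[1:])))

# Brute-force check: the sorted pairing minimizes sum |x - y|^2 / 2.
def cost(perm):
    return sum((x - ys[j]) ** 2 / 2 for x, j in zip(xs_sorted, perm))

best = min(permutations(range(len(ys))), key=cost)
print([ys[j] for j in best] == ys_sorted)
```

Both checks succeed: for strictly convex costs in one dimension, the monotone pairing is the unique optimal assignment.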
But anyway, this very surprising statement can be proved in a few lines from what we said before about the structure of optimal plans. So, let me show you why. So, pick an optimal plan; it exists, okay? We said it exists, okay. So, its support is included in the c-subdifferential of some c-concave function phi, from the general theory of optimal transport. Now, due to the special structure of our cost function and the relation between the c-subdifferential and the standard subdifferential, we know that there exists some convex function phi-bar whose subdifferential contains the support of gamma. So, in particular, this means that for gamma-almost every couple (x, y), y is in the subdifferential of phi-bar at x. I'm just restating what I said before. Okay, but mu was absolutely continuous with respect to Lebesgue, right? So, this assumption implies in particular that our function phi-bar is not only Lebesgue-almost everywhere differentiable, it is also mu-almost everywhere differentiable. But I said before that if the function phi-bar is differentiable at a point, then its subdifferential there contains a single element, right? So, for mu-almost every x, there is only one y for which (x, y) is in the support of gamma, and this y is really given by grad phi-bar at x, no other choice, okay? So, this proves that our optimal plan is concentrated on the graph of the gradient of phi-bar, okay? So, essentially I'm almost done, right? I should just prove the uniqueness of the optimal plan; maybe there are two of them, okay? But, okay, being concentrated on the graph is the same as saying that the plan is induced by a map, and the map is a gradient. And why is the optimal plan unique? Well, because if not, maybe you have another optimal plan and another convex function and whatever, but now I take the average of the two: half the first plan plus half the second plan. This is also optimal, right? Because it has the same cost as gamma and gamma-tilde.
But now, if one is concentrated on the graph of some function, and the other is concentrated on the graph of another function, then the average is concentrated on the union, so it would not be concentrated on a graph anymore. But the previous proof showed that any optimal plan is concentrated on a graph, a contradiction. So, in particular, this emphasizes that this very abstract machinery of c-concave functions, c-subdifferentials, and so on and so forth, which one might fear is so abstract that it has no serious implication, in fact implies this very beautiful theorem of Brenier in just a few lines. So I guess that's it. Thank you for your attention. Okay, any questions? Yes, this has the square root of two. Yeah, absolutely, absolutely. Any other questions? Okay. Or you mean the heuristic, or what is, yeah, sure. So you have the problem of optimal transport. So you have your cost function, you have to move mass from mu to nu, and you're figuring out what is the best way of doing that. And I'm a transport company, and I come to you and tell you: look, I will take care of the transport. I want just to be paid as follows: whenever I take a unit of mass from some source x, you pay me phi(x), and whenever I put a unit of mass on some target y, you pay me psi(y). Now, the way I transport things doesn't matter; for you, every point is indistinguishable in some sense. And you don't have to care about the transport: you just pay me an amount phi(x) whenever I take mass at x, and you pay me psi(y) whenever I put mass at y. So what I gain is the integral of phi with respect to mu plus the integral of psi with respect to nu, okay? Now, of course, I cannot charge prices too high, otherwise you won't come to me and you will do the transport by yourself. So a reasonable assumption on the prices is that, whatever x and whatever y we take, the price that I charge you for the couple (x, y) should be less than or equal to what you would pay in moving mass from x to y.
So phi(x) plus psi(y) should be less than or equal to c(x, y) for every x and y, okay? And so the Kantorovich duality theorem that we gave shows that, under very reasonable assumptions on the cost, what I am able to gain is in fact the same as what you would have spent. So in some sense there is no loss, okay? I had a question on that note. So Kantorovich was an economist, and so... So he won the Nobel Prize; I guess he was more a mathematician, I would say. So he won the Nobel Prize in economics precisely for this duality formulation and this business about the optimal allocation of resources. So is that duality formulation saying somehow that capitalism is as good as communism? In a sense. Oh, well, I'm not good at politics; I'm better at math. So maybe I can skip the answer. No, I wouldn't say, you mean because of central management with respect to... Yeah, you're saying that you cannot charge too much and then you can do whatever you want. Oh, I wouldn't say, I wouldn't. Perhaps, and this is not related to your question, but maybe I didn't mention: if you are interested in optimal transport and want to read something on your own, there are a couple of books by Cedric Villani on the topic, and there is also a book more recently written by Filippo Santambrogio with an eye more toward applications and applied math. And there is also a lecture note that I wrote together with Ambrosio, called A User's Guide to Optimal Transport, and all the things that I did today are in the first 20 pages or so of this last reference.
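The transport-company picture can be played out numerically. A tiny sketch of my own (the points and prices are made up): with prices satisfying phi(x) + psi(y) ≤ c(x, y) for all x, y, the company's revenue can never exceed the cost of any transport plan; for well-chosen prices it matches the optimal cost exactly, which is the duality just discussed:

```python
from itertools import permutations

xs = [0.0, 2.0]                # sources of mu (uniform weights)
ys = [1.0, 3.0]                # targets of nu (uniform weights)

def c(x, y):                   # cost of moving a unit of mass from x to y
    return (x - y) ** 2 / 2

# Hypothetical prices; feasibility phi + psi <= c is checked below.
phi = {0.0: 0.25, 2.0: 0.25}
psi = {1.0: 0.25, 3.0: 0.25}

feasible = all(phi[x] + psi[y] <= c(x, y) for x in xs for y in ys)

# Company revenue: integral of phi d(mu) + integral of psi d(nu).
revenue = sum(phi[x] for x in xs) / 2 + sum(psi[y] for y in ys) / 2

# Cost of every possible transport plan (permutations, uniform weights).
plan_costs = [sum(c(x, ys[j]) for x, j in zip(xs, p)) / 2
              for p in permutations(range(2))]

print(feasible, revenue, min(plan_costs))
```

Here the prices are feasible, the revenue is 0.5, and the cheapest plan also costs 0.5: the company extracts exactly the optimal transport cost, with no duality gap.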