This is Jean Barbier from ICTP. You've been with us since 2019 in the Quantitative Life Sciences section, and you do research on many things, but among them statistical physics with connections to statistical inference, computer science and machine learning. Today you will talk a little bit about phase transitions, from physics to computer science. The floor is yours.

Thank you, Zach. Hello, everyone. It's a great pleasure to be here today for this Basic Notions seminar. As Zach mentioned, I'm a statistical physicist, but I work on various fields which may appear unrelated to physics. What I want to do today is to show you that one of the crucial notions of statistical physics, phase transitions, is of crucial importance also away from physics. The aim of the talk is to give you an overview of the very large variety of systems in which this type of threshold phenomenon appears: an abrupt change in the behavior of some complex system when a control parameter, like the temperature, or other more abstract parameters as we will see, is varied. I want to show that this type of phenomenon appears essentially everywhere in nature, but also in computer algorithms and related systems.

So let us start with the basics: a single molecule of water. This is a very tiny object, about a tenth of a millionth of a millimeter across. But consider very many of them, and when I say very many I really emphasize the "very": of the order of 10^23 of them. This is the so-called Avogadro number, which is the typical number of molecules you can find in a drop of water, for example. Notice that numbers of this kind are much bigger than the numbers appearing in astrophysics or high-energy physics. If you take that many of them, then depending on some parameter, here the temperature, you may get very, very different results. Here you have ice, a kind of iceberg; then you have the liquid state; and you have the gas state. What is important to keep in mind is that these three very different systems are made of the very same components, molecules of water. At the microscopic level they look quite similar, but at the macroscopic level these systems are of course very different: they have different mechanical properties, optical properties, and so on and so forth. So really, the whole is more than the sum of its parts, or, as summarized by Philip Anderson, who received the Nobel Prize in 1977: more is different. And what he means by "different" here is really different at the fundamental level.

Let me continue to quote Anderson: "The behavior of large and complex aggregates of elementary particles, it turns out, is not to be understood in terms of a simple extrapolation of the properties of a few particles. Instead, at each level of complexity entirely new properties appear, and the understanding of the new behaviors requires research which I think is as fundamental in its nature as any other."

So here is the kind of picture that we like in physics. This is called a phase diagram. It essentially tells you in which macroscopic, global state the system of interest, here water, molecules of water, can be found as a function of control parameters. Here I put the pressure, in bar, and the temperature, in Kelvin.
You see different regions, separated by lines, and these lines are phase-transition lines, where you change from one macroscopic state to another. For example, the solid state, ice, is here: the molecules are very compact and very ordered, in a kind of crystalline structure. Mars sits here in the phase diagram, so most, if not all, of the water on Mars is in solid form. Planet Earth is here, in the middle of the liquid phase; that's why the planet is blue, with so much liquid water. But we are actually close to the phase-transition line, and therefore we also find ice on Earth. And then you have the gas state, where the particles essentially move around freely, and Venus is here. Notice that the density of the gas, for example, is much lower than that of the liquid, which is lower than that of the ice. This is what we call a phase diagram.

Let me give you a bit of the physics, the kind of tool that we like as physicists for understanding this kind of picture. It is called the thermodynamic potential. Essentially, there exists a function, the free energy, which I will describe a bit later, that you plot as a function of some order parameter. What is the order parameter? It's a quantity that describes the macroscopic state in which the system is found. For example, at high density the particles are packed together, so you should be in a kind of solid state; at lower densities you should be in a kind of liquid state; and at much lower densities, in the gas state. And there exists this function such that, if you plot it at a given temperature, say 0.2 degrees Celsius, it looks like this: it has a minimum at a certain density, which in this case is by definition that of the liquid. If you decrease the temperature a bit, you see that at zero degrees Celsius a second minimum appears, which becomes the global minimum if you continue to reduce the temperature, and this corresponds to the solid state. So from this picture, given that this function exists and can be computed in some way, you can read off directly in which macroscopic, global state the complex system you are studying will be found. The minimum of this function corresponds to what we call the equilibrium state in physics.

So what is this function, this free energy? The free energy is a way to understand the perpetual tension between order and disorder. Say I'm an experimental physicist in some lab: I can fix, for my experiment, some control parameters. This is why I call them control parameters: I can control them. For example, the temperature or the pressure; in a natural environment, these are fixed by nature. I then define a number of order parameters, a set of quantities that I can measure for my system and that describe it. For example, this can be the density, the average density of molecules in the water, which tells you whether you are in the solid, liquid or gas state; or the magnetization, which is a measure of how the atoms are aligned, whether they have a preferred direction. The magnetization is a global quantity: you look at all the atoms and ask whether there is a common alignment. Or it can be any other macroscopic, global property of the system that you are interested in.
So these are the order parameters, what you can measure; this is the way you describe the system. Then there are two very important quantities that enter the game and really control the physics of the system. The first one is of course the energy: E(O), where O is the set of order parameters, is essentially the cost for the system to be in the macroscopic state described by these order parameters. Physical systems generally like to lower their energy, and this usually drives them towards more order. For example, ice has a lower energy than the liquid state, and ice is much more structured: it has a crystalline structure, which is much more ordered.

The second quantity, which really enters the game when you have complex systems like the ones I'm interested in today, is the entropy S(O), which is essentially the logarithm of the number of microscopic configurations of the system that correspond to a given macroscopic state O. You may have many possible configurations of the atoms that correspond to the macroscopic state "gas" or "liquid" or "ice", and the entropy counts the number of such configurations. You need to take the logarithm because there are so many of them, exponentially many in the number of atoms or simple entities that form your complex system, that without the log this number would be too huge to be compared with the energy. Higher entropy generally means a more disordered system: in the gas, the particles have more freedom to move around, so the entropy is higher; there are more configurations corresponding to the gas than to the ice, which is highly structured.

And the free energy is essentially the difference between the energy and the entropy, with a minus sign, the entropy being weighted by the temperature: F(O) = E(O) − T·S(O). The temperature is therefore a way to tune the tension between order and disorder. So we come back to this picture: I have my free energy here, and now we understand a bit better what it means. Indeed, at high temperature, above zero degrees Celsius, the minimum, the equilibrium state that minimizes the free energy and that I should observe, is the liquid state. At high temperature, the term that matters the most in the free energy must be the entropy, because it carries the bigger weight; I therefore need to go to a state with larger entropy, and the liquid state indeed has a higher entropy than the solid state. This is the term that wins, the disorder term. If I reduce the temperature, what matters the most is the energy term, because the entropic term becomes small. So if I want to minimize my free energy to find the equilibrium state, the one I would observe experimentally, I need to minimize the energy; the solid state, the ice, has a lower energy, and therefore my system is found in the solid state.

Now I want to move away from these classical physics models and show you that these notions of phase transitions, and the language of statistical physics in general, can be applied to a variety of other nice systems that appear in essentially every context you can think about, as long as you have enough complexity. Again, what is a complex system? It's something made of a very large number of interacting entities.
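To make this picture concrete before the first example, here is a minimal numerical sketch, not from the talk, of the double-well free-energy scenario just described, using a Landau-type toy polynomial; the specific function and parameter values are illustrative assumptions, not the actual free energy of water.

```python
# Hedged sketch: a Landau-type toy free energy
# f(m) = (T - Tc)/2 * m^2 + m^4/4.
# Lowering T below Tc turns the single minimum at m = 0 into two
# competing minima, mimicking the double-well picture in the talk.
import numpy as np
import matplotlib.pyplot as plt

m = np.linspace(-1.5, 1.5, 400)   # order parameter axis
Tc = 1.0                          # critical temperature of the toy model

for T in [1.4, 1.0, 0.6]:
    f = 0.5 * (T - Tc) * m**2 + 0.25 * m**4
    plt.plot(m, f, label=f"T = {T}")

plt.xlabel("order parameter m")
plt.ylabel("toy free energy f(m)")
plt.legend()
plt.show()
```

The equilibrium state is read off as the global minimum of each curve, exactly as described above.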
And what I claim, and what I will show you, is that in such complex systems you essentially always find phase transitions. So let us start with an example of imitation effects, as in financial markets or votes. I will consider a simple model of voting where you have two candidates, A and B, and each voter, each individual, indexed by i running from 1 to N, is modeled by a variable S_i(t). This S_i(t) is the choice at time t of individual i: it is +1 if their choice is A and −1 if their choice is B. We'll consider a very simple model where voters are only influenced by a global trend: at each time there is a kind of survey of the trend in the population of voters, and we model it in the following way. We say that the probability that, at a given time, individual i votes for A, which is the probability that S_i(t) equals +1, is proportional to the exponential of the trend divided by what I call a temperature: P[S_i(t) = +1] ∝ exp(m(t)/T). I will come back to what the temperature means, but first, what is the trend? The trend m(t) is defined as the fraction of individuals in the population voting at a given time for candidate A minus the fraction voting for B, which I can rewrite as the average of these variables: m(t) = (1/N) Σ_j S_j(t). So when there are as many people voting for A as for B, this sum cancels, it is zero, and in this case the probability for a given individual to vote for A is totally random: it is 0.5. When the trend is bigger than zero, each individual is biased towards voting for A; when it is negative, the trend goes towards B and people will tend to vote more often for B.

And what is this temperature? You see that when the temperature goes to infinity, the trend does not matter anymore: individuals are no longer influenced by the global trend. When it goes to zero, this exponential becomes extremely peaked, and people are very strongly biased by the trend. So I can think of this temperature as the average independence of individuals with respect to the dominating trend, a kind of level of rebellion if you want.

So let's simulate this system and see what happens. Here is a small code where I simulate this dynamics of people voting at successive times, influenced by the global trend. Let me first consider a very small system with five individuals. What happens over time is this: when this curve touches −1, it means that at this time all individuals would agree on voting for candidate B; when it is +1, all agree on candidate A; and in between, there are different fractions for A and B. You see fluctuations like this, and essentially no consensus appears; it's quite erratic. If I increase the number of individuals a bit and simulate again, you see the same kind of pattern, with changes of behavior of the population as time goes on, but it still looks quite random.
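A minimal sketch of a simulation like the one shown on screen might look as follows; this is not the speaker's actual code, and the update schedule, temperature and system sizes are illustrative assumptions.

```python
# Hedged sketch of the voter/imitation model: each voter picks A (+1)
# with probability proportional to exp(trend / T), where the trend is
# the current average vote. All voters are resampled at each step.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def simulate(N, T, steps=2000):
    s = rng.choice([-1, 1], size=N)           # initial random opinions
    history = []
    for _ in range(steps):
        m = s.mean()                          # the global trend
        # P(+1) = e^{m/T} / (e^{m/T} + e^{-m/T})
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * m / T))
        s = np.where(rng.random(N) < p_plus, 1, -1)
        history.append(s.mean())
    return history

for N in [5, 20, 100]:
    plt.plot(simulate(N, T=0.8), label=f"N = {N}")
plt.xlabel("time step")
plt.ylabel("average vote (trend)")
plt.legend()
plt.show()
```

With these assumed parameters the small populations fluctuate erratically, while the large one polarizes near +1 or −1 for long stretches, which is the symmetry-breaking effect discussed next.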
Now I increase a bit more, to 20, and we start to see something interesting: the time windows over which individuals agree start to grow, so a kind of pattern appears. Now I go to 100, a fairly large system, and what happens is the following. You get what we call a polarization effect. This is a breaking of the symmetry between candidates A and B, or between the states −1 and +1 if you want. Essentially, the trend created by the population becomes strong enough that if some individuals try to vote differently, the population has a stronger effect and wins against them, and the whole population gets stuck in one state, the −1 state in this case. This is what we call spontaneous symmetry breaking in physics: a breaking of the symmetry between +1 and −1. And if I simulate the same model over much longer times, you see that the system first polarizes, so the population agrees on which candidate to vote for over a very long time; but suddenly, because of subtle correlations and probabilistic effects, you have a drop, and the whole population changes its choice and votes for the other candidate.

This kind of phenomenon is very generic in complex systems, and it illustrates, again, that nice sentence: more is different. For 10 individuals you have erratic behavior, while for a hundred you have this polarization effect, this spontaneous symmetry breaking. But keep in mind that these two systems follow the same rules; the only thing that changes is the number of individuals, the number of variables in your system. And this is the type of phenomenon that leads, for example, to crashes in financial markets.

Actually, the model we just simulated, a baby model of imitation, of course oversimplified, but still containing a lot of interesting phenomena, is very closely related to an important model from physics that most of you will have heard about: the so-called Ising model of ferromagnetism. Ferromagnetism is the fact that atoms in magnets tend to align in the same direction. Each atom carries a kind of little magnet itself, and when all these little magnets align together you get a global magnetization, and therefore a magnet. These are what we call spins in physics, simple up-or-down variables, which interact through these green couplings here. What these interactions do is try to align the spins: you can think of them as energy contributions that are lowered when the spins align. And the temperature here is a measure of the fluctuations of these spins, of how strongly they resist the energy term that tries to align them. At high temperature, most of the spins point in random directions, so there is no global magnetization: half of them are up, half are down, and they keep fluctuating. This is equivalent to our simulation at high temperature, where the voters are only weakly affected by the global trend, and you see that there is no global consensus.
The average vote, which is the equivalent of the magnetization in this system (again, the magnetization is the global alignment of these arrows, the sum of all of them), stays around zero and fluctuates around zero. But if I decrease the temperature, which in our voter model means that people are more inclined to follow the global trend, at some point you have a phase transition towards the ferromagnetic state, which means that almost all spins align together and you create a global magnetization. We can again read this off from the free energy, which I plot here for different temperatures as a function of the magnetization, our order parameter. When the temperature is large, it has a minimum at zero: there is no global magnetization; we are in this region of the phase diagram, or equivalently in this type of state here. If I decrease the temperature, at some point two minima appear: here I am at the critical temperature, and a global magnetization, a global alignment of the spins, starts to emerge, which is the equivalent of the polarization phenomenon in the imitation model we considered.

The same type of transition also happens in collective animal behavior. Let me show you some nice pictures. Here is a picture of huge colonies of insects, what we call active matter in physics: each individual can make decisions by itself, but you study huge colonies of them. This is the swarming state, a kind of disordered state where individuals essentially do not follow a specific direction. This, instead, is a polarized state, an ordered state. Disordered would look more like flies packed together, with a center of mass that doesn't move much. And here are two other states of these active-matter systems, these animals: fish in the so-called milling state, where they rotate around a center, and a polarized state of birds, huge flocks of starlings.

I would like to show you a nice video. Here is a real video of these birds forming these very complex colonies. What is important to notice is that these systems are, in a sense, critical. What does it mean? They sit really close to a phase transition, in the sense that they are neither liquid nor solid. They are kind of liquid, because you have independent parts, but at the same time there is a global coherence, like in a solid. So these systems really are at the edge of a phase transition, and this is not for nothing: it comes from biological reasons, of course.

This type of system can be modeled very simply, for example by the so-called Vicsek model, where each individual, at each time, looks at its closest neighbors, approximately computes the average angle of motion of all its neighbors, and tries to follow this direction. Of course the computation is not perfect, and we model that by adding some noise. In equations this is very simple: the angle along which individual i moves at time t + Δt, at the next step, is essentially the average direction of its neighbors plus some noise, which models the imperfection of the computation by the bird. So the first term is a kind of energy term that tends to create order.
This term contributes to aligning all the birds together, creating a kind of magnetization for the birds, a global alignment; while the noise term can be thought of as an entropic term that pushes the system towards more disorder. So let me show you a small simulation of this model (this simulation was not done by me). Here they define an order parameter, which is essentially the global alignment of all the velocities: you sum the directions of all the individuals in the population. This is called the velocity correlation, which I like to call the magnetization, but it is the same thing. It is essentially zero if the motion of the birds is totally random, and one when they are all perfectly aligned. What they do is tune, as time goes on, the strength of the noise, which decreases here. And you see that as the strength of the noise decreases, this global alignment, this correlation, increases, and indeed there is more and more structure in the movement of these birds: at the beginning it was essentially random, and at the end we see a global behavior where they follow each other, which I think is really nice.
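A minimal sketch of a Vicsek-type simulation is given below; it is not the simulation shown in the talk, and the system size, interaction radius, speed and noise level are illustrative assumptions.

```python
# Hedged sketch of a 2D Vicsek-type model: each agent adopts the mean
# heading of its neighbors within radius r, plus uniform angular noise,
# then moves at constant speed v0 in a periodic box of side L.
import numpy as np

rng = np.random.default_rng(1)
N, L, r, v0, eta, steps = 200, 10.0, 1.0, 0.3, 0.3, 200

pos = rng.random((N, 2)) * L
theta = rng.uniform(-np.pi, np.pi, N)

for _ in range(steps):
    new_theta = np.empty(N)
    for i in range(N):
        d = pos - pos[i]
        d -= L * np.round(d / L)          # nearest image (periodic box)
        near = (d ** 2).sum(axis=1) < r ** 2
        # average heading of neighbors (self included) via complex phases
        mean_dir = np.angle(np.exp(1j * theta[near]).mean())
        new_theta[i] = mean_dir + eta * rng.uniform(-np.pi, np.pi)
    theta = new_theta
    pos = (pos + v0 * np.column_stack((np.cos(theta), np.sin(theta)))) % L

# the "magnetization" / velocity correlation discussed above
phi = abs(np.exp(1j * theta).mean())
print(f"global alignment after {steps} steps: {phi:.2f}")
```

Rerunning with a larger noise eta should leave phi near zero (the disordered swarm), while a small eta drives it towards one (the polarized flock), in line with the ordering transition just described.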
All right. Let us now move towards systems where I think most of you wouldn't imagine this type of notion appearing, but it actually does, and in particular problems in computer science and discrete mathematics: graph theory and combinatorial optimization. I will tell you what that means. The fundamental question behind this is basic but at the same time extremely deep: why are some problems easy to solve while others are not at all? This is something we observe very clearly. But is there a fundamental reason for it? You may have guessed: the fundamental reason behind the different hardness of problems is related, in some way, to phase transitions, and this is what I want to illustrate now.

Let me introduce the father of what we call combinatorial optimization and graph theory: a problem that people were wondering about, in the 18th century, in Königsberg, a city now called Kaliningrad, in Russia. The name has changed; it is here on the map. The game was the following. This is the map of the city at that time: there was an island, two banks here, and another island here, and there are one, two, three, four, five, six, seven bridges connecting these different parts of the city. The question people were trying to answer is: can we find a path that crosses every bridge a single time? And what is really important here is "a single time". So people were simply trying different solutions, like this. Let us play a bit. We missed one; unfortunately, we cannot reach that bridge from here, so we didn't find a path. Let's try another one. Still unfortunate; we didn't find a solution. People tried thousands and thousands of combinations and never found one. And the reason is that there is none, and this was understood by Leonhard Euler, the mathematician. What he did to answer this question was essentially to create what is now a very active field of discrete mathematics: graph theory. He took the map and simplified it into a scheme: on each bank, on each side of the river, and on each island, he put a dot.

Then he connected two dots whenever there is a bridge on the original map that connects those parts of the city, and what you obtain is the first graph ever, which is the following. For example, this node corresponds to this bank, and you see an edge connecting it to this one, and an edge to this one; actually there are two edges to this one, these two, and so on and so forth. And Euler told us, after some work: if more than two nodes have an odd number of edges, there is no solution; you cannot find a path that crosses every bridge a single time (for a connected graph, this criterion is in fact also sufficient). This problem now has a name, of course in reference to Euler: can we find an Eulerian path? And answering it is very easy: just look at your graph and ask, are there more than two nodes with an odd number of edges? Here the answer is yes: one here, for example, one here and one here. Therefore there is no Eulerian path. If the answer were no, an Eulerian path would exist. So we have here a problem that is easy to solve: there is a simple algorithm, just this check, that answers the question. By the way, the map of the city has since changed a bit: these two bridges were destroyed, so now an Eulerian path exists, and in 2020 the solution is actually trivial.

All right, let me now discuss another problem that looks very closely related but is fundamentally different: the so-called Hamiltonian path problem. Can we find a path that encounters each node a single time? Before, for the Eulerian path problem, I wanted to pass through each edge; now I want to pass through each node a single time. And, to the best of our knowledge, the best solution is just to try all paths until finding one that works; there is no smart way known to solve this problem. The trouble is that there are exponentially many paths to test before, hopefully, finding a solution, if one exists. Let's just recall together how fast exponentials grow: e^10 is about 22,000; e^30 is about 10^13, ten thousand billion; so it's really a huge number, and forget it for bigger ones. Actually, it is easy to show that even if n were of the order of 50, where n is the number of nodes of the graph in which you want to find a Hamiltonian path, it would take more than the age of the universe on a modern supercomputer to test all paths. You can just forget about it: this problem is not solvable efficiently, at least we don't know how. Therefore, answering the question "is there a Hamiltonian path?" is a very hard problem.

So there are easy problems and hard problems, hard in the sense that we don't know efficient algorithms to solve them. And there are names for this in computer science. The easy ones belong to the so-called P class, where P stands for polynomial: there is a polynomial-time algorithm, an algorithm that performs a number of operations polynomial in n, the number of nodes of your graph, to solve the problem. The hard ones sit in the class called NP, for nondeterministic polynomial; loosely speaking, for these no polynomial-time algorithm is known, and the best we have is an exponential-time algorithm, a very costly algorithm that usually you cannot run.
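To make the contrast concrete, here is a small sketch, on an assumed toy graph, of the polynomial-time Euler check next to the brute-force Hamiltonian search; neither the graph nor the code comes from the talk.

```python
# Hedged sketch contrasting the two problems: Euler's degree criterion
# is a polynomial-time check, while the Hamiltonian path search below
# is brute force over all node orderings (exponential/factorial).
from itertools import permutations

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]     # a small toy graph
nodes = {u for e in edges for u in e}

# Eulerian path (each EDGE once): easy -- at most two odd-degree nodes,
# assuming the graph is connected (connectivity check omitted here).
degree = {u: 0 for u in nodes}
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
odd = sum(1 for d in degree.values() if d % 2 == 1)
print("Eulerian path possible:", odd <= 2)

# Hamiltonian path (each NODE once): no known shortcut -- try all orders.
adj = {u: set() for u in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
exists = any(
    all(b in adj[a] for a, b in zip(path, path[1:]))
    for path in permutations(nodes)
)
print("Hamiltonian path exists:", exists)
```

On four nodes the brute force is instant, but the number of orderings grows factorially with n, which is exactly the explosion discussed above.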
And this problem actually belongs to a special class of such complicated problems that we call the NP-complete class, where "complete" means that if you could solve this type of problem efficiently (again, we don't know how to, but if you could), you could solve an extremely large class of other complicated problems, essentially almost all problems that are hard for a computer. If you solve one, you solve them all. And proving that these two classes are not the same, in the sense that there really is a class of problems for which no polynomial-time algorithm exists (because we don't know that there is none; we only know that we haven't found any), proving that these two types of problems really are fundamentally different, is one of the deepest questions in computer science. If you can solve it, you get one million dollars, and that's not a joke: it is one of the Millennium Prize Problems.

All right, so now I want to ask the following question: are all NP-complete problems, so essentially hard problems, really hard all the time? What do I mean? These problems are hard in general, this is what we've seen, but sometimes they are actually very easy. Here is a graph, and if I ask you to find a Hamiltonian path on this specific graph, the answer is absolutely trivial because of the structure of the graph: just start from a node and follow the graph, and you have found a Hamiltonian path. So something must happen in between these two extreme regimes: there is a regime where the problem is really hard and we don't know any algorithm able to do something smart, and another regime where the problem is absolutely trivial; you give it to a two-year-old kid and he solves it. Something must bridge these two complexity regimes, and as you can by now guess, what separates the two regimes are phase transitions.

To illustrate that, let me introduce another such NP-complete problem, which is very nice: the so-called coloring problem. It is very easily stated but at the same time very hard; that's why it's such a nice problem. Imagine you have a map, and I ask you: can you find a coloring, a way to color the different countries, so that two countries sharing a border never have the same color? There is actually a theorem, the so-called four-color theorem, which tells you that four colors are enough to properly color any map, where "properly" again means that no neighbors share a color. This was conjectured a long time ago, and it took more than a hundred years to be proved; the algorithm to do it efficiently took an additional twenty years to be obtained. How do we turn this into a problem of graph theory? Just associate a node to each country, then put an edge between any two nodes whose countries share a border on the original map, and you have a graph. In this language the four-color theorem reads: four colors are enough to properly color any planar graph. A planar graph is one that you can draw on a surface so that no two edges ever cross.

Right. So now we want to study how we go from simple instances of this coloring problem to hard instances, which are really difficult to color. To do that, we need to define a kind of ensemble of graphs, which will be random graphs. So how do we generate a random graph?
You just take a certain number of nodes, say 100, then you fix a number of edges, in this case 218, and you connect nodes at random with this fixed number of edges. Here is a representation of such a random graph. And the problem we set ourselves is: can we color this random graph with just three colors? Of course, if we had access to as many colors as we wanted, the problem would be trivial: just give every node a different color. Here we constrain ourselves to only three colors. Notice that even with four colors it would be non-trivial, because this is not a planar graph: you see that the edges intersect a lot; you cannot draw this graph on a plane without crossings.

So now we have this ensemble of random graphs. Let me define a control parameter that allows us to tune the complexity of the problem. Our control parameter, like the temperature in physics, is here the density of edges c: the number of edges divided by the number of nodes. You can think of it as the average number of borders per country in the map example. When it is high, you have very many small countries with a lot of neighbors; when it is small, you have essentially a few large countries, and things are much easier. At small c, the graph typically looks like this: many nodes, some completely separate from all the others, few connections; it is very easy to find a coloring. At large c, it is very hard to find colorings, because there are many connections and the nodes are highly constrained: if I put a color here, I cannot choose the same color for any of these neighbors, which is a hard constraint.

All right, so now let me take the best algorithm we know for this problem, run it on many instances, many different random graphs, at a fixed average connectivity, our control parameter, and collect statistics: I compute the fraction of random graphs that my algorithm manages to color, as a function of c. What you see, if you try the experiment, is that up to some threshold your algorithm performs extremely well: it always finds a solution. Then you have a sharp drop in performance, and the threshold gets sharper as the number of nodes increases; here we go from 50 nodes to 100, the red curve. Above this threshold your algorithm is totally stuck: it won't find any coloring. The reason is that deep in this region, with high probability, there is no way to color the graph at all. But here there is a region where colorings still exist, yet the computational time required to find them seems to grow extremely fast, like this. And this is nothing else than a phase transition in a combinatorial optimization problem. The algorithm trying to solve the problem experiences a phase transition: in this region the problem becomes extremely hard, while in this region it is extremely easy, and in this region it is essentially impossible: there are no colorings for your graph. And this defines a phase diagram, exactly like the one I showed for water at the beginning, except that we are talking about the behavior of an algorithm; you really do have different phases for this coloring problem.
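A toy version of this experiment can be sketched as follows; the backtracking search stands in for the much better algorithms the speaker refers to, and the sizes, trial counts and densities are assumptions chosen so that it runs quickly.

```python
# Hedged sketch: generate random graphs at edge density c, attempt a
# 3-coloring by backtracking, and record the success fraction. Sizes
# are kept tiny so the exponential search remains feasible.
import random

def random_graph(n, c, rng):
    m = int(c * n)
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def colorable(adj, colors, node=0):
    if node == len(adj):
        return True
    for col in range(3):
        if all(colors[w] != col for w in adj[node] if colors[w] is not None):
            colors[node] = col
            if colorable(adj, colors, node + 1):
                return True
            colors[node] = None
    return False

rng = random.Random(0)
n, trials = 30, 50
for c in [1.0, 2.0, 2.3, 2.5, 3.0]:
    ok = sum(colorable(random_graph(n, c, rng), [None] * n)
             for _ in range(trials))
    print(f"c = {c:.1f}: fraction 3-colorable ~ {ok / trials:.2f}")
```

Even at these tiny sizes the success fraction should drop steeply somewhere between c = 2 and c = 3, a finite-size shadow of the sharp threshold described above.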
And this can be put in analogy with physics. Again, the easy regime can be thought of as the liquid phase, where particles can move around freely: there are a lot of degrees of freedom, and it is easy to move things around and find colorings. In the hard and impossible regimes, the graph is highly constrained, very rigid, which can be linked to the glass phase in physics. And actually it is more than a link: the tools coming from the study of glassy systems in statistical physics are really the appropriate ones to describe this whole region of the phase diagram.

Let me now discuss, last but not least, another set of problems which I find fascinating, and where phase transitions again appear: problems of a signal hidden in noise. This is called inference. In inference the problem is always the same: you have some hidden signal, some piece of information that has been corrupted by noise, and your task is to recover the signal. Let me discuss the father of inference problems, the communication problem, which gave birth to what we call information theory in 1948. The problem is the following: you have an emitter and a receiver that want to exchange information, which you can think of as bits, zeros and ones; you can always represent information as zeros and ones in the digital world. Our dear Bob is trying to tell Alice that he feels kind of strange, because he's blue, which is indeed strange. But unfortunately Alice does not hear him properly, so she asks, very naturally, to repeat: can you repeat the message, please? And indeed Shannon, the father of information theory, an engineer from MIT, understood why Bob has to repeat: he is trying to communicate at a rate, a speed if you want, that is too high.

Let me explain what I mean. In the communication problem, you again have an emitter and a receiver that want to exchange a piece of information, but there exists no perfect communication channel: there is always noise, interference, some source of randomness that corrupts the information you want to exchange. This can be modeled, in the simplest case, by what we call in coding theory and communications the binary erasure channel, or BEC. This channel, a probabilistic model of information corruption, takes as input, say, a 0, and with probability 1 − p outputs the bit without any error; but with probability p, which is typically not too large, it outputs nothing: it erases, it fully destroys the bit. And it acts symmetrically on the bit 1. So if you try to communicate through this channel, a random fraction p of the bits will simply be erased, fully lost.

The question is: is there a way to robustify communication with respect to noise, a way to communicate reliably despite the noise? And actually Alice, without realizing it, pointed to the solution directly: just repeat. You need to add redundancy, which is the key idea behind communications. So here is what you can do. Say you want to communicate "yellow", which in binary form could be 001. Instead of just sending these bits as they are, and taking the risk of losing a fraction of them so that the message cannot be decoded at the receiver side, just repeat them, three times for example.
So here I just repeat my message three times: this is the repetition code of rate one third. What I call the rate is the number of information bits divided by the number of bits actually transmitted through the communication channel: in this case I want to transmit three bits and I send nine, so the rate is 1/3; equivalently, the rate is one divided by the redundancy, and here the redundancy is three. Very simple. And what Shannon understood is something absolutely fundamental: there is an absolute limit to the rate at which you can communicate. What do I mean by that? You allow yourself a target error, because there is no way to communicate perfectly: there will always be errors, but you can try to keep them small. Say you allow yourself a very small target error, almost zero. Shannon tells you that there is a maximal rate above which you cannot communicate over this channel; for this erasure channel with erasure probability 0.1, the capacity is 1 − p = 0.9 information bits per transmitted bit, and you will never cross this communication rate. If you allow yourself a higher error, the curve moves a bit to the right (we barely see it on the graph) and you can communicate at a slightly higher rate, but not much. This is really a fundamental limit to communication: above this line, communication is impossible; below it, it is possible. By impossible I mean independently of any coding scheme.

For example, here we used the repetition code. The three-fold repetition code has rate 1/3, which is here, but it leads to a very high error: at the decoder side you will get many errors. What you can do to lower the probability of error is to increase the number of repetitions: this is five, et cetera, et cetera, and this is 61 repetitions. You see that we can make the error arbitrarily small, but the price we pay is that we tremendously reduce the rate of communication. So there is an absolutely fundamental trade-off between the communication rate and the probability of error, the fraction of bits lost at the receiver side: if you want to increase the rate, you pay the price of more errors; if you want to reduce the errors, you pay the price of a lower communication rate. And this is the best curve you can reach.

So our dear Claude Shannon answers our friends: Alice and Bob, whatever you do, even if you had access to the most advanced alien technology that will ever exist in the universe, you will never be able to communicate at a rate exceeding the capacity of the noisy channel you are using. There is a fundamental phase transition that prevents you from doing so. I'm really sorry. And so our friends are a bit disappointed, and poor Alice is apparently still not hearing very well.

All right. So let me discuss something that I find very nice, which is the strategy that nature has found to solve our problems. What do I mean by that? Look at the plot I just showed you: with our repetition code, even if we repeat a lot, there remains a very large gap between the optimal curve, the optimal trade-off between error and communication rate, and what we actually reach with this repetition scheme. Is there a way, in some sense, to close this gap?
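As a quick aside, the repetition-code trade-off is easy to reproduce numerically; the sketch below is an illustrative, assumption-level simulation, not part of the talk.

```python
# Hedged sketch: bits through a binary erasure channel with erasure
# probability p, protected by an r-fold repetition code. A bit is lost
# only if ALL r copies are erased, so the residual erasure rate is p**r
# while the communication rate drops to 1/r.
import random

rng = random.Random(0)
p, n_bits = 0.1, 100_000

for r in [1, 3, 5]:
    lost = sum(all(rng.random() < p for _ in range(r))
               for _ in range(n_bits))
    print(f"r = {r}: rate = {1/r:.2f}, "
          f"erased fraction ~ {lost / n_bits:.5f} (theory {p**r:.5f})")
```

Driving the error down this way costs rate geometrically, while Shannon's capacity for this channel sits at 1 − p = 0.9: that is the gap the next idea will close.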
Nature has found a way to close this kind of gap. Let me illustrate it by coming back to water. Imagine we have water above zero degrees Celsius. You can do the experiment: if you lower the temperature of this water slowly enough, and if the water is very pure, it will stay liquid even down to about minus 40 degrees Celsius. The water is trapped in a so-called supercooled state. Essentially, you start in the liquid state, which is this minimum of the free energy; when you decrease the temperature, the true minimum is now here, but there is a barrier between the two states, here, and if you don't help your system jump over this barrier in some way, it will just stay here, stuck for very long times, actually for an infinite time if the size of the system is big enough. So your water remains liquid. But if you help the system a bit by punching it, by adding a bit of energy locally, you allow this ball to jump into the equilibrium state, which should be solid at this temperature below zero degrees Celsius, and you create a small nucleus of crystal. This crystal then propagates through the system as a wave, like this. Thanks to this nucleation effect, which starts from the seed of the nucleus, the whole system finds its equilibrium.

And, quite amazingly, we can use this idea to obtain error-correcting codes, ways of adding redundancy for communication, that reach the Shannon capacity. This is called spatial coupling. The idea is to create a code, a way to correlate our bits a bit more cleverly than by mere repetition, arranged as a kind of one-dimensional chain like this, with a kind of nucleus. Let me first say what these boxes are: they are called parity checks. Each of them says that its neighbors, the bits it is connected to, must sum to an even number. The emitter and the receiver agree before communication that whatever signal is transmitted will satisfy all these constraints: for example, the sum of bits number one, two and three has to be an even number. Now we send our bits, which satisfy these constraints, and at the receiver side Alice receives this version: some of the bits have unfortunately been erased. But we designed the code so that the first bit is connected to a single check. So even if it is erased, we know for sure that it has to be a 0, because the sum has to be an even number; it cannot be a 1. And now that we have inferred, reconstructed, this bit, we can reconstruct this one: I have a 0 and a 1, and if the sum of these three bits is to be even, this one has to be a 1; it cannot be a 0. This seed of information helped me reconstruct the next bit, which helps to reconstruct the next, and the next, and so on and so forth. You get a reconstruction wave, and behind it, this is exactly the same physics as the supercooled water. This is not just an analogy: it really is the same phenomenon. You have a phase transition from the supercooled state to the equilibrium state, the crystal, and in the context of this problem the crystal means reconstructing the information.
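The "reconstruction wave" can be imitated with a toy peeling decoder; the chain code below is a deliberately simplified stand-in for real spatially coupled codes, and all parameters are assumptions.

```python
# Hedged sketch of the peeling ("reconstruction wave") idea on a toy
# chain code: check i forces bits (i, i+1, i+2) to have even parity.
# Any check with a single erased bit pins that bit down, and the
# repairs propagate along the chain like the crystallization wave.
import random

rng = random.Random(2)
n = 40

# Build a valid codeword: pick the first two bits, then each new bit
# is forced by the parity of the previous two.
bits = [rng.randint(0, 1), rng.randint(0, 1)]
for i in range(2, n):
    bits.append((bits[i - 2] + bits[i - 1]) % 2)

# Erase a fraction of the bits (None = erased by the channel),
# keeping a known "seed" at the start of the chain.
received = [b if (i < 2 or rng.random() > 0.4) else None
            for i, b in enumerate(bits)]

# Peeling decoder: repeatedly fix any check with exactly one erasure.
changed = True
while changed:
    changed = False
    for i in range(n - 2):
        window = [i, i + 1, i + 2]
        missing = [j for j in window if received[j] is None]
        if len(missing) == 1:
            known = sum(received[j] for j in window
                        if received[j] is not None)
            received[missing[0]] = known % 2
            changed = True

print("fully decoded:", received == bits)
```

Each pass of the loop extends the decoded region a little further to the right: the code's version of the crystallization front.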
And the information that was sent through this binary erasure channel was this image of a woman. You see a wave of propagation, a wave of reconstruction, that starts from the first bits and propagates inward through the signal, exactly like in the water, which I find really beautiful. Thanks to this technique, we managed to fully close the gap between what we can achieve algorithmically and what is optimal. And so Shannon is happy: Shannon told us there is an absolute limit, but he never actually gave us the recipe to reach it. It took more than forty years to obtain error-correcting codes able to reach this limit, and the spatially coupled codes I just told you about are one class of codes able to do so; actually, only two such classes of error-correcting codes are known.

Let me quote Richard Feynman, Nobel laureate in 1965: "The science of thermodynamics began with an analysis, by the great engineer Sadi Carnot, of the problem of how to build the best and most efficient engine, and this constitutes one of the few famous cases in which engineering has contributed to fundamental physical theory. Another example that comes to mind is the more recent analysis of information theory by Claude Shannon. These two analyses, incidentally, turn out to be closely related." So Feynman, of course, had understood already at that time that information theory, statistical physics and thermodynamics are extremely close fields of science.

Let me discuss another example, coming from machine learning and artificial intelligence. I will discuss one specific branch of machine learning called supervised learning. The idea of supervised learning is the following. I give you a bunch of examples: each image is what I call a data point, and these data points are labeled, which means that for each of them I also give you the answer, the category to which the data point belongs. This bunch of labeled data is called the training data. What we want is an algorithm, a procedure, that outputs from this training data a predictive model for new, unseen data that was not part of the training set. What do I mean by that? Once we have our predictive model, we want the following: if I take a new, unlabeled example that was not part of the training data, so our predictive model has never seen this image before, the model should be able to generalize, in the sense that for this new data point it outputs the proper label, in this case "dog".

So let's look at a cartoon version of this problem. Here is some labeled data: one cat and two dogs. I train an algorithm which, in this case, simply tries to find a linear separator, a plane separating the data. Say this is what the algorithm outputs; this is our classifier, in the sense that on the left side the predictive model says "dog" and on the right side it says "cat". Let's test it on new, unseen data. Unfortunately, we make errors: you see that this dog near the interface is not properly classified, it would be classified as a cat, and this cat would be classified as a dog. Our algorithm is not yet very able to generalize. So we give a few more labels, a bit more data, to the algorithm to train it further; it is able to fine-tune, and this is our new predictor, our new model. We test it again on new data: there are still errors.
You see that this cat would be classified as a dog, and this one is still at the interface, so our model is not yet perfect. Maybe with a few more labels it will improve, and indeed, now we have something that looks quite nice: if I test it on a new example, it properly classifies it as a dog. This can be done through so-called perceptron learning. The perceptron is the simplest neural network that exists, a very simple model of an artificial neuron like the ones we have in the brain. The idea is that you have a number of input neurons corresponding to the input image: these are the values of the different pixels of the image. Let me emphasize that this machine, this algorithm, does not "see" the image; what it sees is a huge vector of numbers, and that's all. It is a very hard task to make sense of this huge vector of numbers. For us it is very easy to recognize a cat, because we are trained for that; but for the machine, this is just a lot of numbers, so the task is not so easy.

What this machine does is take these inputs and project them onto a so-called classifier, which can be thought of as the synaptic weights of the machine. Then there is an activation function, which takes the sign of the weighted sum of the inputs: the x_i are the values of the input neurons, the pixel values of the image, and they multiply the weights, which represent the synapses. You can think of the output as the sign of the linear projection of the input onto the vector w, which is essentially the normal to this separating line. This vector w parameterizes our classifier; the output neuron gives you this sign, and this is the class predicted for the data point. What perceptron learning does is to learn these synaptic weights from many examples, in such a way that the machine properly classifies all the training data; then you hope that, used on a new example, it outputs the proper label.

If you study this simple neural network on a simplified data set, somewhat simpler than an extremely complicated set of images of cats and dogs, a type of data for which you can really develop a mathematical theory and understand what happens, then you can derive formulas that tell you the generalization error, the performance in classifying new, unseen data, of the perceptron as a function of the amount of training data. The amount of training data is our control parameter, what we can play with, and the generalization error is our order parameter, which defines which algorithmic-complexity phase we are in. And what you observe is, again, a phase transition. Look at the optimal algorithm; by optimal I mean that this curve has been computed from a formula. We cannot actually run the optimal algorithm, because it would take an exponential amount of time, but fortunately, in this simple model, we can compute an exact formula; we have a theorem for that. This is the red curve, and you see that there is a critical amount of data at which you again have a phase transition: the error drops suddenly, and beyond this point you have perfect learning, meaning that you find exactly the proper classifier separating the two classes. Before the transition, you don't have perfect learning. So this can be defined as a phase, like in physics.
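A minimal teacher-student perceptron experiment, in the spirit of the simplified setting just described, can be sketched as follows; the data model, dimensions and learning-rule details are assumptions for illustration, not the exact model analyzed in the talk.

```python
# Hedged sketch of perceptron learning on synthetic data: a hidden
# "teacher" vector labels random points, and the classic perceptron
# rule tries to recover it from the labeled training set.
import numpy as np

rng = np.random.default_rng(3)
d, n_train, n_test = 50, 300, 1000

teacher = rng.standard_normal(d)            # hidden ground-truth weights
X = rng.standard_normal((n_train, d))
y = np.sign(X @ teacher)

w = np.zeros(d)                             # student weights
for epoch in range(100):
    mistakes = 0
    for x_i, y_i in zip(X, y):
        if np.sign(w @ x_i) != y_i:         # misclassified example:
            w += y_i * x_i                  # nudge weights toward it
            mistakes += 1
    if mistakes == 0:                       # training data fully fit
        break

X_new = rng.standard_normal((n_test, d))
gen_err = np.mean(np.sign(X_new @ teacher) != np.sign(X_new @ w))
print(f"generalization error ~ {gen_err:.3f}")
```

Increasing n_train relative to d makes the measured generalization error fall, which is the control-parameter dependence plotted in the talk; note that the sharp transition to perfect learning discussed there concerns the optimal predictor in a specific model, not this simple rule.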
And now let me compare this with efficient algorithms that I can actually run on my computer: these are the black and blue dots. What you see is that the fast algorithm, the black one, matches the optimal performance except in this region here, where there is a mismatch between the optimal algorithm, which would achieve perfect learning, and your actual fast algorithm, which is not perfect. But if you give it a bit more data, there is another phase transition, and suddenly you enter a so-called easy learning phase, where a simple, fast algorithm is able to reach optimal performance. In between, you again have this hard regime, like what we saw in coloring, which is the kind of glassy phase of the problem.

All right, I'm essentially done. I hope I conveyed the main message: the variety of systems to which this kind of phase transition, and statistical physics notions in general, can be applied is absolutely huge. They may a priori look like very different, separate systems, but they can actually be described within a single framework. I've discussed physical systems, animal behavior, computer science problems, communications, inference of signals, financial markets and votes. But I could also have talked about quantum systems, neuroscience, DNA and protein folding, which is how proteins arrange themselves in space, the spreading of diseases in populations, traffic jams, or symmetry breaking in cosmology. And let me also emphasize that behind all this there is not only the physics and the algorithmic aspects: there are tons of beautiful mathematical questions, in which I am actually very interested. I will stop there.

Okay, thank you very much for this beautiful talk. Maybe we can all unmute ourselves and first give you a round of applause. Okay, so here comes the difficult part. If anyone has any questions: we just don't want everyone to speak at the same time, since we are almost 100 participants here. So maybe you can either raise your hand or just write in the chat that you have a question, and I can try to choose an order. Any questions? Maybe the diploma students? Okay, so yeah, raise your question.

Well, sorry for the pronunciation. Not a problem at all. Thank you, Jean, a very nice talk, very clear, and with lots of nice examples. I'll start with a question from the end, about the algorithms. You wouldn't know a priori how an algorithm will follow that curve; if we look at your final slide, with the black and blue algorithms, you wouldn't know a priori how they will behave.

So the answer is: in general, no. After working a lot in the field, you gain a kind of intuition about, if I take the machine learning example, the scalings of the amount of data for which the problem will be easy or hard; but the precise numbers, of course, no. In general, there is no recipe to know in advance in which phase, which algorithmic-complexity phase, you will be, without doing actually quite involved computations. And this is the whole job of the statistical physicist and of the mathematician: to compute this free energy, or its generalizations for these more abstract systems, in order to actually get the phase diagram properly.

To follow up: so knowing that a phase transition exists somewhere in whichever system I'm working on does not necessarily help me solve it?
No, but it at least helps you to be careful. Let me take the example of the Shannon capacity, which I think is very enlightening on this point. It is very important to know that there are fundamental limits, and to be able to quantify them. Imagine that Shannon had never obtained his theorem stating that there is an absolute, fundamental maximum rate up to which you can communicate: maybe a full generation of coding theorists working on communications would have worked, without any hope in the end, on improving algorithms in an attempt to beat a limit that they didn't know about; but there is no way to beat this limit. So it's very important to know in advance whether your problem can experience phase transitions, and if so, to try to quantify them, because otherwise you can simply lose time developing algorithms chasing performances that cannot be attained. Thank you.

Okay, so we have another question as well, from Ralph Gebauer. Yes, hello everyone, and thanks, Jean, for this nice seminar. There's one thing which is not very clear to me, in the part where you talked about Shannon and errors in communication. You said one can reconstruct, or correct for, errors thanks to these constraints, which in your example were that the sum of three consecutive bits must be even, or something like that. But what happens if you want to send a message which does not satisfy these constraints?

Yes, so maybe I did not emphasize that enough. Before communicating, the receiver and the emitter agree that the only possible messages that will be communicated are those satisfying the constraints. If you allow yourself constraints that generate a dictionary big enough, then you can map, one to one, any message you want to transmit to one of the messages satisfying the constraints. You can always construct what we call a codebook, this huge dictionary of constraint-satisfying messages used to communicate. Okay, so in the example of this image which was reconstructed, imagine the rules were different from just the simple sum. Yes, yes, of course, of course. Thank you.

Okay, so it seems there are no more questions at the moment. If you do have one, of course, you can still write. So I can seize this opportunity to tell you that we will run a little questionnaire here, just to see whether everything works well or not. I will start this little poll, which takes 20 seconds, soon; please answer it before you leave the meeting, it will be helpful. And then I'm also thinking that maybe there are some diploma students, for example, who didn't dare to ask a question in front of all the hundred participants from all over the world. So maybe the diploma students can stay a little longer, and they can get another chance to ask questions to the speaker. And will we get wine and food and snacks outside now, or not? Yeah, this is what happens normally, no? This is something maybe you can take up in the poll, and we will try to fix it for the next meeting. Okay, so I will launch this poll now. Yes. Okay, Fernando is suggesting a group photo as well; let's see. Okay, so Jean, there is actually a question here from Syam, who says his mic is not working: how do you relate second-order phase transitions to the various systems? is his question.
Okay, so I think what Syam has in mind is that there are different types of phase transitions; the most common ones are what we call first-order and second-order phase transitions. What this means is the following. A first-order phase transition means that the first derivative of the free energy, the fundamental object that describes the thermodynamics and the global properties of the system, is discontinuous. A second-order phase transition is when a second derivative is discontinuous. In the latter case, when you plot the order parameter, anything like what I plotted here, say the performance of an algorithm as a function of the amount of data, second order means that it varies continuously, smoothly, while first order means that you have a discontinuity, a jump, as in most of the cases I've shown here. There is no general rule, in the sense that in these systems you may have both types of phase transitions. I emphasized the first-order type because they are more visual, and in these high-dimensional problems they are, in a sense, more typical. But, for example, in the biological systems, when we saw the Vicsek model, where the correlation, the alignment between the birds, increased smoothly as the noise decreased, that was more like a second-order phase transition. Both can happen; it really depends on the system.

Okay, very good. So now I guess we can take this group photo that was suggested. I ask everyone to please switch on their cameras, switch on the video, and we will see if we manage to take this picture, with all the 60-plus participants that are still here. Suddenly it becomes a bit more... Yes. And now you can see everyone who was here. I suggested it, but I'm not quite sure how to do it. Well, suddenly we don't all fit on one screen. What I could do is ask everyone to stay still and take a picture one page at a time with my phone. A mixture of technologies, maybe... You could do a screenshot, maybe, also. Yeah, but I tried this and before it didn't seem to work. I can try that, but... Okay. Fernando, I can do it, no worries. Okay, Sabrina can do it. I see that there are three groups coming up; yeah, we can do three. Well, the screenshot actually works, so I can do that too. One... So, do people want to turn their cameras on? No, not all. Bogdan, Banju... No, they're saying that they haven't come around. If they are so kind... Sergio, Núñez, Sochog, Yikram, Dennis... Well, maybe they don't want to; we don't have to watch anyway. I mean, for those that are willing, it would be nice to turn the camera on. I'm on page two; I'm going to try another screenshot. Somebody has a penguin, so that's not real. All right, so... Oh, now we have just... Yeah, people left. Yeah, people just care about the science, not this kind of stuff. Wait, wait, one more. Okay, I think I have them. Okay. Thanks. So thank you, everyone. Okay, so I think everybody still here is a diploma student.