I'll start at the beginning again. OK. Jen, we are very happy to welcome you. Dimitri is famous in mathematical physics. He comes from renormalization group theory, but then he became interested in the story of information and complexity, and so he will give us this lecture. So welcome, Dimitri. You can start. So thank you. And as I was saying before, I apologize for not being there in person, because I had some personal problems. And I thank the organizers for giving me the opportunity to speak anyway through the web. So I'll speak about information and complexity. The statement of the problem of information is the following. Suppose that we have a random variable that takes outcomes in some finite alphabet, capital X. So capital X can be anything you can imagine. It can be the set {0, 1} of bits. It can be the normal alphabet. It can be the alphabet of the genetic code, or any finite set that you can imagine. And the problem of information is an engineering problem: how to efficiently convey a message from a sender to a receiver without paying any attention to the semantics of the message. For the engineer, whether it is a bit sequence or a written letter of the normal alphabet is totally irrelevant. And in 1948, Shannon proposed a definition of the information content of messages as being the decrease of uncertainty when the outcome of the random variable is revealed. What does it mean? It means the following. Before the outcome is revealed, the only thing you know about the random variable is maybe its law. After the outcome of the random variable is revealed, you have a precise value for the random variable. So the uncertainty you have before the outcome is revealed is maximal. And when the outcome of the random variable is communicated to you, this uncertainty becomes 0. So information is a probabilistic notion and concerns ensembles of messages. It cannot be thought of for individual messages.
That's the first point. The second point is that this quantity, which Shannon established in 1948 as being the information content, was already known practically three quarters of a century before, because Boltzmann in 1877 had already introduced this notion under the name of entropy. It was published only much later, in 1896. But it was known already earlier. And he introduced this quantity, the entropy, to explain macroscopic irreversibility, despite the fact that microscopically the gas is governed by equations that are reversible in time. So that's about information. So what is complexity? Complexity is an algorithmic notion that concerns individual messages. This notion has been independently introduced by Solomonoff, Kolmogorov, and Chaitin, almost simultaneously by Solomonoff and Kolmogorov. Yeah? Solomonoff, it's Raphel. Can you use full screen, because it's not so big for the students? Can I use? Can you use full screen, the big screen? Yeah, full screen, yeah. Because it's not so big for the students. We do not have a big screen. You're right, you're right. I don't want to use full screen for some very peculiar reasons, because I'm using this open world thing. And with the open world, I can write on the transparencies. Ah, yes, yes, yes. So can you see up to here? Can you see up to here? Yeah, yeah. OK, so essentially you have the whole screen. Is it OK if I continue like that? Yes, we will try this, in fact, we will manage. Pause. We can put our screen on full screen. Yeah, I think it's good. One second, we'll try to put our screen as full screen. So I can continue like that? Yeah, yeah, one moment, one moment. No, I think it's good. Put it on the screen that you want. The what screen? The one you want. Please come if you want. I wanted to tell you that there are some students who are online, who are not here. So we will tell you, because I'm not sure that you see when they ask questions.
They are online students. So when they ask a question, I will tell you. OK, OK. So here we are 20 students, plus the online students. OK, thank you. So the complexity is thought of as being the length of the shortest program from which a sequence can be reproduced. This concerns individual sequences and is also known as algorithmic information. Now, what is important is that the two notions of entropy and complexity are closely related, because for a special class of processes that are called ergodic processes (I'll explain it later in this course), the complexity per symbol is equal to the entropy per symbol. So in fact, these two totally independent notions of complexity and information coincide, if you look at a long sequence and you consider the content of information, or of complexity, per symbol. So the plan of the course is the following. The first lecture, and maybe part of the second lecture, will be devoted to the basic postulates of information, its definition, its significance, and some properties of the information content. And then we shall speak about related functions. The second lecture will be devoted to source coding and to compression algorithms. The third will be concerned with channel coding and states, especially, the fundamental theorem of information transmission. And the final talk will be about complexity, Turing machines, and the Turing machine description of information content. So that's the plan. So let's start with information. How to define it? You consider a random variable taking values in some finite alphabet. And the law of the random variable is the probability that it takes the outcome small x. So just a notation. So X, like that, is the alphabet. X, like that, is the random variable. And small x is the value of the random variable, the value it takes, the outcome of the random variable. So the small x's are the elements of this capital X, the elements of this alphabet.
And so the law is determined by what is called the probability vector. That means for every small x, you know some number p(x). This number will be positive, non-negative in fact. And the sum over all x's must be equal to 1. So this is what is called the probability vector. Now, assign to every event, that means to this set here, a quantifier of the decrease of uncertainty of this event. I call this quantifier small h. So before the random variable has been observed, you have a maximal a priori uncertainty. As I was saying before, the only thing you may know is the probability distribution, the vector p, governing the law of the random variable. And so suppose that the law is uniform over the x's, so all x's have the same probability. You have a maximal uncertainty before the random variable has been observed. But after the observation, the uncertainty is 0. Before the coin is tossed, it can be 0 or 1. But once the coin is tossed, it is either 0 or 1 exclusively. So there is no uncertainty once the coin is tossed. So since the uncertainty afterwards is 0, that means that the reduction of uncertainty is just the a priori uncertainty. Now intuitively, the uncertainty of an event is a function of the probability vector. So it depends on p(x). And now, it is defined for a fixed event, a subset of X, having some given probability; maybe call it q, not to confuse it with p. So it's better to call it q. And h will be a function of this q. For the moment, I don't know this function. I just write it as small h. But this function depends heavily on x, heavily on q, so it's not very convenient to use it. It's better to use the function capital H, that is the expectation of small h. That means capital H will be the average, with respect to the probability vector, of h(p(x)): H(p) is the sum over x of p(x) h(p(x)). Is it clear? We have some issues on that. Maybe we'll try to appreciate. Maybe basic questions are important.
Yeah, but I think it's not very important. There's another one you must follow with some. So I need this. Tell me one question. Quick question. Hello. I'm fine. Sorry, could you please repeat the explanation of why the reduction of uncertainty is equal to the a priori one? OK, I say it again. Consider a concrete example. Suppose that the random variable you have is just heads or tails. So the random system you have is X equal to {0, 1}. Right? And with the probability of 0 equal to the probability of 1, equal to one half. So you have an honest coin. Before you toss the coin, the only thing you know is this probability vector. You have a 50% chance to have 0 and a 50% chance to have 1. So you have a maximal uncertainty about the outcome. But once you have tossed the coin, you know that this is either 0 or 1 exclusively. When you toss the coin, the coin shows its upper face, and you know whether it's 0 or 1. So the uncertainty you have afterwards is 0. So if you take the decrease of the uncertainty, it is the uncertainty before minus the uncertainty after. But the latter is already 0. So the reduction of uncertainty is equal to the a priori one. Is it clear now? Yes, thank you. You are welcome. Now fix the cardinality of X to be some integer, capital M. And provisionally, because I'll drop this notation just afterwards, denote the expectation of small h, of the uncertainty, as capital H indexed by the number of elements of the set X. Now, a totally intuitive idea is that it's much easier to correctly guess the outcome of a coin tossing than the outcome of a number lottery. When you toss a coin, you have just two possible outcomes, 0 or 1. But if you choose six numbers among 49, as happens in a number lottery, you have one chance in more than 13 million to get the correct answer. So it's much easier to guess the outcome of a coin. If I tell you that the outcome will be 0, even if I don't know the outcome, I have only a 50% chance to be wrong.
But if I say that the outcome of the number lottery will be the numbers 5, 7, 8, 18, and so on, I have only one chance in more than 13 million to be correct. So intuitively, it's easier to guess the outcome of a coin tossing than the outcome of a lottery. And so a very normal postulate will be the postulate of monotonicity. I denote by f(m) the uncertainty over a set of cardinality m when I use the vector with 1/m for every x. So X here will be {1, ..., m}, so it has cardinality m. And the probability vector I take is just the uniform vector, so every outcome has probability 1/m to occur. That's the notation here: f(m) = H_m(1/m, ..., 1/m). So I am considering the average of the uncertainty in this situation. This will be a function only of m. And since I have this observation, that means the function f is strictly increasing. So it's a very reasonable postulate. This is the postulate of monotonicity. That's the first postulate for the information. Sorry, Dimitri? Yes? Yeah. Can you go back to the... what? Sorry, I got lost. What is H_m? Is it just that we renamed the...? Yes, I renamed H(p). So H_m(p) is just the sum over x, now from 1 to m, OK, of p(x) h(p(x)). It is just the thing I had denoted like that previously. But now I specify, provisionally, the cardinality of the set on which I'm working. It's just a difference of notation, nothing else. So this is to remember that this refers to a set of cardinality m. Is it clear? Yeah, it's clear. Thank you. Sorry, I have a question. Yeah? So in the previous page, we did not fix the cardinality of the...? Yeah, I fixed nothing. The cardinality is totally unfixed. So I have a sum over all x's. OK, OK, thank you. OK, so this is only a precision of notation. And it will be only provisional, because I'll drop this later on. But for the moment, I keep it. So is it clear now? Yes. So do you agree that this function, because of this observation, must be an increasing function of m?
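The lottery figure quoted above can be checked with a minimal sketch, assuming the usual rules in which six distinct numbers are drawn out of 49 and the order does not matter. The variable names are illustrative only.

```python
import math

# Number of equally likely outcomes in each experiment.
coin_outcomes = 2
# Ways to choose 6 numbers out of 49 when the order does not matter.
lottery_outcomes = math.comb(49, 6)

# Probability that a blind guess is correct, under the uniform law.
p_guess_coin = 1 / coin_outcomes
p_guess_lottery = 1 / lottery_outcomes

print(lottery_outcomes)
print(p_guess_coin, p_guess_lottery)
```

The larger the set of equally likely outcomes, the smaller the probability of a correct blind guess, which is the intuition behind requiring f(m) to be strictly increasing in m.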
The bigger the set of outcomes, the bigger is the reduction of uncertainty if I know the answer. Nobody knows. Nobody reacts. So is it? No, it's OK. It's OK? It's OK for somebody. There is still a question, yeah? Apparently, everybody is OK. I missed your... Apparently, everybody is OK. Everybody is OK. OK, thank you. So now the second postulate. Suppose that I have now two independent variables, X and Y, that are uniformly distributed over the sets X and Y. And I suppose that the set X has cardinality l and the set Y has cardinality m. So the composite experiment will be described by a random variable that is the pair (X, Y), taking values in the Cartesian product of X and Y. And so the cardinality will be l times m. Now, if the outcome of X is revealed, the uncertainty of Y remains unaffected, because the variables are supposed to be independent. However, the uncertainty of the pair is decreased from f(l m) by f(l). So the next postulate is a very natural observation. It is that, for all integers l and m bigger than or equal to 1, f(l m) = f(l) + f(m): the average uncertainty of the pair is the sum of the uncertainties of the individual variables. Is this postulate clear? Yes. OK. And can you repeat the last sentence you said, please? So you want me to repeat the meaning of the postulate? The last one you made. Yeah. So when I look at the pair of variables X and Y, the uncertainty for the outcome of the pair is f(l m). Now, since when I reveal to you the outcome of X the uncertainty is decreased by f(l), that means that this function of the product can be written as the sum of the functions of the individual terms. So the uncertainty for the couple is the sum of the uncertainties for the components of the couple. OK. Now a third postulate. Now the idea is to relax the uniformity of the distribution, because up to now we have considered only uniform probabilities. But now suppose that we have an arbitrary probability vector on X.
That means the only thing you know is that p(x) is non-negative and the sum is equal to 1. Nothing else. It need not be the same for every x. Now partition the set capital X into two disjoint sets X1 and X2. Suppose that these sets are non-empty; both of them are non-empty. And consider q_i, the sum of the probabilities in the set X_i. So q1 will be the probability of the set X1, and q2 will be the probability of the set X2. And now split the experiment into two steps. What is p(x), the probability that capital X is equal to small x? This probability can be written as the conditional probability to get small x, knowing that X is inside X1, times the probability of X1, plus the conditional probability of getting small x, knowing that X is in the set X2, times the probability of X2. And the conditional probability written here is the probability of getting x divided by the probability of the set, so divided by q1. Is it clear? So what I'm writing here is that the probability of small x, given that X is in X1, is the probability of X equal to small x divided by q1. OK? There's a question. Yeah? Do we partition X in any way we like? Any way, absolutely. The only thing you have to check is that the probability of the sets X1 and X2 is not 0, just to be able to divide by q1. Nothing else. And so you get this expression for the probability. So computing the probability directly, or computing the probability through the formula of total probability, is just the same. And now I come back to the uncertainties that are associated with the probability vectors. So here X has cardinality m, so I'm looking at the uncertainty H_m of this vector. Now I use this decomposition. This decomposition means that I have q1 times the uncertainty H_{m1} corresponding to the vector p restricted to X1.
So what does it mean? Suppose that I have X equal to {1, 2, 3, 4, 5}, with (p1, p2, p3, p4, p5) as probability vector. Now suppose that I split it into X1, that is {1, 2}, and X2, that is the rest, {3, 4, 5}. So what will be the restriction of p to X1? It will be just the vector (p1, p2). And p restricted to X2 will be (p3, p4, p5). So you see that this will be a two-component vector, because X1 is just a two-element set. And this will be a three-component vector, because this is a set with three elements. And so you have the decomposition. So up to here it is clear. It's just the translation of the total probability formula. But there is still an uncertainty. The remaining uncertainty is that we have now introduced, somehow, an additional uncertainty: you don't know whether you are in X1 or in X2. And the probability of getting X1 is q1, and the probability of getting X2 is q2. So you have to add this additional uncertainty, H_2(q1, q2). So the uncertainty you had originally can be decomposed in that way. This is the postulate of grouping. Is it clear or not? So what's the meaning exactly of H_2? I mean, is that an overlapping uncertainty maybe? No, no. This is the uncertainty of this probability vector. This is another probability vector. I know, I know. I mean, that's what you're asking about. So this is the uncertainty of choosing between X1 and X2. It is still an experiment with two possible outcomes, and these two outcomes have probability q1 or q2 to occur. This is exactly these things here. OK, that's perfectly clear. And the last postulate is just a technical one. I have no intuitive explanation for it. It is that H_2 has a smooth dependence on p. That means the uncertainty is not jumping about; it behaves continuously with respect to the value of p. So with these four postulates, Shannon proved the following thing, which is very astonishing: the unique function that verifies the four previous postulates is the function that can be written like that.
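The formula on the slide is stated verbally a little later in the lecture as H(p) = -C times the sum of p(x) log p(x). As a minimal sketch of how the grouping postulate works for this function, the five-element example above can be checked numerically. The assumptions here are C = 1, base-2 logarithms, an arbitrary illustrative probability vector, and restricted vectors divided by q1 and q2 so that they are again probability vectors (the conditional probabilities p(x)/q1 discussed above).

```python
import math

def entropy(p):
    """H(p) = -sum p_i log2 p_i, taking C = 1; zero entries are skipped."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Illustrative probability vector on X = {1, 2, 3, 4, 5} (values are made up).
p = [0.10, 0.20, 0.30, 0.25, 0.15]

# Partition into X1 = {1, 2} and X2 = {3, 4, 5}.
part1, part2 = p[:2], p[2:]
q1, q2 = sum(part1), sum(part2)

# Conditional (normalized) vectors on each block.
cond1 = [pi / q1 for pi in part1]
cond2 = [pi / q2 for pi in part2]

# Grouping: H_5(p) = H_2(q1, q2) + q1 * H_2(cond1) + q2 * H_3(cond2).
direct = entropy(p)
grouped = entropy([q1, q2]) + q1 * entropy(cond1) + q2 * entropy(cond2)

print(direct, grouped)
```

The two printed numbers agree up to rounding, whichever non-trivial partition is chosen.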
So C is an arbitrary constant. And the logarithm here is a logarithm in an arbitrary base. It can be decimal, or it can be binary, or it can be the base of the natural logarithm, or anything you like. Now this set here is the set of probability vectors of dimension m. So this PV_m is just the set of vectors with m components that are non-negative and such that the sum of the components is equal to 1. So this set is called PV_m. It's the set of probability vectors of dimension m. I'll take some time to prove the theorem in a moment. But I have to make a remark: H depends on the probability vector, that means on the law of X, not on X itself. Nevertheless, very often in information theory, we write H(X) to denote H(p), where p is the law of X. It's just an abuse of notation that we have to keep in mind. This is not very important. So I'll spend some time to prove this theorem, because this is really instructive and it's not difficult. We have to prove that H(p) is minus C times the sum of p(x) log p(x). The first remark is that if I have numbers a and b bigger than 1, and x strictly positive, the logarithm in base a of x is the logarithm in base a of b times the logarithm in base b of x. And this factor, the logarithm in base a of b, is a constant that does not depend on x. So I can use an arbitrary base for the logarithm, because the difference between the logarithms can be absorbed in this constant C here. In fact, in information theory we use the binary base, so the logarithm in base 2. And in physics we use the natural logarithm. There is nothing fundamentally different. Now the proof of the theorem is split into several steps. Step one is the following. For integers m and k bigger than or equal to 1, consider what I called f before, evaluated at m^k. This is trivially written as f(m times m^(k-1)). And because of the extensivity postulate, this will be equal to f(m) plus f(m^(k-1)). And so you can iterate.
And then you get that f(m^k) will be k times f(m). So the uncertainty of k realizations of the randomness will be k times the uncertainty for one. That's the first step to prove this theorem. Is it clear? Yes. So the second step is the following. We shall show that for any integer m bigger than or equal to 1, there is a constant C such that f(m) is equal to C times log m. So start with m equal to 1. So for m, what is that? Is it C? For m bigger than 1, that is integer. m is integer. Yes. And the following inequality, I cannot read the following. OK, so for integer m bigger than 1, so you speak about this symbol? OK, now I can read. OK, thank you. There exists a constant, a positive constant, such that f(m) is C times the logarithm of m. So for m equal to 1, I have f(1) is equal to f(1 times 1). That means f(1) plus f(1), by extensivity. And so the only solution is f(1) equal to 0. So this is by extensivity. So this equation is OK for m equal to 1. Now suppose that I have proven this equation to be true up to some m. That means f(m) will be equal to C log m. So take m bigger than 1. Now for every integer r bigger than or equal to 1, there exists an integer k bigger than or equal to 0 such that m^k is less than or equal to 2^r, which is less than or equal to m^(k+1). And one of these inequalities will be necessarily strict. Is it clear? Yeah. OK, now I use the monotonicity postulate to say that f(m^k) will be less than or equal to f(2^r), which is less than or equal to f(m^(k+1)). And by what I have already shown in step one, this says that k f(m) is less than or equal to r f(2), which is less than or equal to (k+1) f(m). In other words, I have shown that k/r is less than or equal to f(2)/f(m), which is less than or equal to (k+1)/r. I can divide here because m I have supposed to be strictly bigger than 1, and since f is strictly increasing and f(1) is 0, f(m) will be non-zero, so I can divide. Is it clear? So on the other hand, the logarithm is also strictly increasing, in an arbitrary base.
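The integer k in the sandwich m^k ≤ 2^r ≤ m^(k+1) above can be found by direct computation. The following is a minimal sketch, not part of the lecture, with a hypothetical helper `sandwich_exponent` and the arbitrary choice m = 3. It shows that k/r and (k+1)/r bracket the number log 2 / log m (which is what one gets by taking logarithms of the same inequality) and that the width of the bracket, 1/r, shrinks as r grows.

```python
import math

def sandwich_exponent(m, r):
    """Return the integer k >= 0 with m**k <= 2**r < m**(k + 1), for m >= 2."""
    target = 2 ** r          # exact integer arithmetic, no rounding issues
    k = 0
    power = m                # invariant: power == m ** (k + 1)
    while power <= target:
        k += 1
        power *= m
    return k

m = 3
limit = math.log(2) / math.log(m)

rows = []
for r in (1, 10, 100, 1000):
    k = sandwich_exponent(m, r)
    rows.append((r, k / r, (k + 1) / r))

for r, lower, upper in rows:
    print(r, lower, limit, upper)
```

Any other quantity trapped in the same intervals for every r, such as f(2)/f(m), must therefore coincide with log 2 / log m.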
So that means that I can write that k/r is less than or equal to log 2 / log m, which is less than or equal to (k+1)/r. And again, one of these inequalities will be strict. Now, if I'm looking at this inequality and the inequality here... I'm not sure about this step, because I don't see why the monotonicity of the logarithm can be applied to maintain the sense of the inequalities. Because is there an intermediate value between the two limits, right? No. So this inequality comes from this thing here. I have m^k, from the first one, and so on. And since the logarithm is an increasing function, I take the logarithm of this inequality and it remains an inequality. So k log m will be less than or equal to r log 2, which is less than or equal to (k+1) log m. Right? So I divide by r log m, and you get this inequality. I know, I know, thanks. I thought that you were applying the monotonicity just in the... No, I'm applying the monotonicity of the logarithm function to this inequality. Not to the same thing. OK, so consider now the inequality I have shown on the last page for f. And consider this inequality. So I have an interval that is [k/r, (k+1)/r]. And I have one number, log 2 / log m, somewhere there. And I have also, from the previous inequality, another number, f(2)/f(m), somewhere in the same interval. So the net effect of this is that I'm looking now at the difference of these two numbers here. These two numbers cannot differ by more than the length of this interval. So the difference will be at most 1/r, because the length of this interval is 1/r. Is it clear? Yeah. OK. Yeah, you have a question. Sorry, can you speak a little more loudly, because I cannot hear you? I don't think I understand how the log base a of x relates to the log base b of x. So you speak about that? Yes. Yeah. So for instance, take a equal to 10. If I'm looking at the logarithm base 10 of x, suppose that this number is y. What does it mean? That means that x is 10 to the power of y.
Right? Is it OK? Yes. OK. So now I have x is 10 to the power of y. Suppose now that I compute the logarithm base e of x. This will be the logarithm base e of 10 to the power of y. But this is nothing else than y times the logarithm base e of 10. But yeah. Sorry. Yeah. Is it? Yes. I'll go with you. Sorry? We don't listen to you in two minutes. I have to stop in two minutes. No, no, no, no. We don't listen to you. We don't urge you during the line. You don't listen to me. OK. Yeah, but you have 20 minutes. Don't worry. 30 minutes. 20 minutes, yeah. OK. 30, 30, 30. OK. OK. So if you take this logarithm out, then you get that the logarithm base 10 of x will be the logarithm base 10 of e times the logarithm base e of x. So you can always change the base of the logarithm without problem. I don't know whether I have answered your question or not. It's good. OK. So we have proven that the difference between the ratio f(2)/f(m) and log 2 / log m cannot be bigger than 1/r. OK. So now, since m is fixed and r is arbitrary, I can take r as large as I like. This will be true for every r. And so I have shown that f(2)/f(m) is equal, in the limit, to log 2 / log m. I can take the limit. So that means I have shown that f(m) is equal to C times log m. And C here will be equal to f(2) / log 2. Is it clear? Yes. OK. An additional remark: since f(1) is equal to 0 and f is strictly increasing, that means that f(2) is bigger than 0. And so that means that C is strictly positive. I have a question. You need 3, 3? Yeah. Were you using the induction method or not? When? Were you using the induction method to prove this fact, or were you assuming that m was a strictly positive integer, and that's all? No, here the induction method is used here. So I have supposed that I have shown, up to some m, that it is like that. But you have just proved that, no? You were not supposing it, I think, no? You have proved these facts. No.
I have proved, yes, I have proved this thing here. Yeah, I know, I'm talking about the fact that f(m) is equal to C log m. OK, remove this phrase here. That's what you're asking. No induction, only a direct proof. You're right, you're right, you're right. OK, OK, thanks. But it is an induction method. OK, so the third step. Sorry, I have another doubt. Can you go back? Yeah, yeah. I mean, this inequality, that the absolute value of log 2 / log m minus f(2)/f(m) is less than 1/r, is it strict? Because then we are taking... I missed your question. Is this strictly less than 1/r here? Yes, because you see, one of these inequalities is strict here. OK. And the same thing here. So never mind, by the way, never mind if I take a non-strict one here. The argument goes through. If I take non-strict here, nothing changes in the rest of the argument. And I'm a bit lost, because then you are taking the equality. In the limit of r going to infinity. Yes. Yeah, I don't understand why you can take it equal to 0 in the limit of r. Because if I have that, for every positive epsilon, some non-negative number a is less than epsilon, that means that a is equal to 0. OK, yes. Thank you. Right? Yeah, thank you. Now let's prove the third step. For p some rational in the interval (0, 1), we shall show that H_2(p, 1 - p) is equal to minus C times [p log p + (1 - p) log(1 - p)]. For the moment I can take the open interval here, but this will hold even for the closed interval. OK, so since p is a rational number between 0 and 1, p can be written as a ratio r/s of integers r and s bigger than or equal to 1. And now I have that f(s) is equal to H_s of what? Of a vector that will be 1/s, ..., 1/s, r times, and then 1/s, ..., 1/s, s - r times. Is it OK? You have to be telling me, r times, no, r, no, no, no. So I don't see that. No, here I'm looking at the information content, this average uncertainty reduction, of a set of cardinality s, with a vector that is uniform on this set.
So everything is 1/s. Sorry, you're right, here it is r. That's what I was saying. And is it good here now? Yes, thanks. OK, so now I use the grouping postulate to say that this will be H_2(r/s, (s - r)/s) plus (r/s) f(r) plus ((s - r)/s) f(s - r). So this is by the grouping postulate. Now, I use what I have shown in step 2 to say that this will be H_2 of, so r/s is just p and this is 1 - p, so H_2(p, 1 - p), and then here will be C p log r plus C (1 - p) log(s - r). This is just a small computation. And on the other hand, f(s) will be equal to C times log s, because all these r and s are integers. So I can solve this equation for H_2. And I get that, for rational p, H_2(p, 1 - p) will be minus C times [p log r + (1 - p) log(s - r) - (p + (1 - p)) log s]. And this is minus C times [p log p + (1 - p) log(1 - p)]. So this is just algebra. I don't wish to spend the time to do these details. Just write that p is r/s. So once you arrive here, you have an expression for H_2 that is valid for p rational in (0, 1). Now we pass to the last step: p in the open interval (0, 1), arbitrary, not necessarily rational. Now if p is irrational, then it can be approximated by a sequence of rationals p_n. For every such p_n, I have this equation. And now I use the continuity argument, because this is the last postulate, the continuity of H_2. When p_n converges to p, H_2(p_n, 1 - p_n) will converge to H_2(p, 1 - p), because of the continuity of this function H_2. And the logarithm is also a continuous function. So by continuity, H_2(p, 1 - p) will be minus C times [p log p + (1 - p) log(1 - p)]. So let's conclude this here. It remains now to show a last step, but this step is trivial: if X is not just a two-element set but an m-element set, with p equal to (p1, ..., pm), we have to show that H_m(p) will be minus C times the sum of p(x) log p(x) for x in X. But this again is trivial.
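The algebra skipped in step 3 above can be written out as follows. This is a sketch using only quantities already introduced in the lecture: p = r/s, f(n) = C log n from step 2, and the grouping postulate applied to the uniform vector on s elements, split into blocks of r and s - r elements.

```latex
\begin{align*}
f(s) &= H_2\!\left(\tfrac{r}{s},\, \tfrac{s-r}{s}\right)
        + \tfrac{r}{s}\, f(r) + \tfrac{s-r}{s}\, f(s-r)
        && \text{(grouping)} \\
C \log s &= H_2(p,\, 1-p) + C\, p \log r + C\,(1-p) \log (s-r)
        && \text{(step 2, } p = r/s\text{)} \\
H_2(p,\, 1-p) &= -C \left[\, p \log r + (1-p) \log (s-r) - \bigl(p + (1-p)\bigr) \log s \,\right] \\
  &= -C \left[\, p \log \tfrac{r}{s} + (1-p) \log \tfrac{s-r}{s} \,\right] \\
  &= -C \left[\, p \log p + (1-p) \log (1-p) \,\right].
\end{align*}
```

The third line uses log s = (p + (1 - p)) log s to distribute log s over the two terms.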
So now it is a real induction. The formula is correct for m equal to 2. Suppose that it is correct up to m - 1. And then consider H_m(p). This will be H_2(q, p_m), I'll explain the symbols in a moment, plus q H_{m-1}(p_1/q, ..., p_{m-1}/q) plus p_m H_1(1). So what I have done here is that I have regrouped the probabilities up to m - 1. I call their sum q. And so I have regrouped the terms. And so I have here that this will be minus C times [q log q + p_m log p_m], minus C q times the sum for k from 1 to m - 1 of (p_k/q) log(p_k/q), plus 0 for the last term. And so if I work out the algebra, you see that I get that H_m(p) will be minus C times the sum of p_i log p_i for i from 1 to m. So I have proven this famous formula of Shannon. I hope that you understood the proof of this theorem. Have you any questions? In some places I went quite rapidly, because it's just algebra. I don't wish to spend time to do some elementary computations on the blackboard. But have you understood the idea of this proof? Possibly. Sorry? About this other thing. The whole proof of this formula. So now I introduce a basic convention, that is 0 log 0 to be equal to 0. Because log 0 is minus infinity, you have to define what 0 log 0 is. So the convention is that 0 log 0 will always be 0. In that case, I do not have to deal with strict positivity of the vector. It can even be 0 in some places. And so the function H(p) that is defined in that way, where the sum extends from 1 to the dimension of p, that means to the cardinality of the set X, and whose existence and uniqueness is established in the previous theorem, is the entropy, or the quantity of information, associated with the probability vector p. So this is a function that we have certainly already seen in physics. Here we have done a rigorous mathematical proof that the function is like that, must be like that.
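The definition just given, including the convention 0 log 0 = 0, can be sketched in a few lines. This is only an illustration, assuming C = 1 and base-2 logarithms (so the unit is the bit); the function name `shannon_entropy` and the sample vectors are not from the lecture.

```python
import math

def shannon_entropy(p):
    """Entropy in bits of a probability vector p, with 0 log 0 taken as 0."""
    if any(pi < 0 for pi in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability vector")
    # Skipping the zero entries implements the convention 0 log 0 = 0.
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

fair_coin = shannon_entropy([0.5, 0.5])      # honest coin
certain = shannon_entropy([1.0, 0.0])        # outcome known in advance
biased = shannon_entropy([0.9, 0.1])         # somewhere in between
uniform8 = shannon_entropy([1 / 8] * 8)      # uniform law on 8 outcomes

print(fair_coin, certain, biased, uniform8)
```

For a two-outcome vector (p, 1 - p) the value is largest at p = 1/2 and vanishes at p = 0 and p = 1, and for the uniform vector on m outcomes it equals log2(m), in agreement with f(m) = C log m.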
And since this function plays an important role, I just draw here the graph of the function H for dimension equal to 2, that means for the cardinality equal to 2. And you see that when the cardinality is equal to 2, the probability vector will be (p, 1 - p). So that will be the probability vector. And in that case, at the point one half, this function takes its maximum value, which is equal to 1. And this is the maximal uncertainty. So when your coin is honest, you have a maximal uncertainty, and this is the information content in that situation. When p is equal to 0 or to 1, this function is 0, because even if you don't toss the coin, you know what the outcome will be. So the uncertainty will be 0 at the edges, and there will be intermediate values elsewhere. So that's an important function in information theory, and it is an important function in physics, of course, because this function has a long history. I mean, in almost all computer science books, the definition of H is usually attributed to Shannon. And the reality is that Shannon, in his article of 1948, establishes for the first time, rigorously, the existence, the uniqueness, and the mathematical properties of information. But the formula like that was established in 1877 by Boltzmann and published only 20 years later, in 1896. And on the next page there is the facsimile of page 41 of Boltzmann's book, Lectures on Gas Theory. And you see that this formula appears here in red. And you see that he uses the symbol l to denote the natural logarithm, what we would write as ln. And he obtains this formula with a totally different approach. He uses the decomposition of the phase space into small boxes. And he counts how many microstates are inside every box. And he uses Stirling's approximation for the factorial to get this formula. And by the way, this formula, or the shorthand of this formula, is on his gravestone in Vienna, in the cemetery of Vienna.
So this was known by physicists three quarters of a century before Shannon. So I think that this is a good point to stop, because tomorrow we can see what the entropy is. And you will see three different interpretations of the entropy. So I was much slower than I thought. But I think that it was important to do at least this proof on the board. And so if you have any questions, please go ahead and ask. No, no, no. Yes? Yeah. I saw that there are other axiomatizations of the entropy, of information. And in one of those, instead of using the axiom of monotonicity, some people use the axiom that the entropy is maximum at the uniform distribution. Are they equivalent? No, not really. The maximum of the entropy at the uniform distribution is essentially a consequence of this. And it comes out here. But of course, you can reformulate things. So instead of using monotonicity, you use the maximality of entropy, and then turn the arguments around in another way to prove the same result, of course. OK, OK. So there are two main different options. No, we can use, no? Sorry? There are two different axiomatics. Yes, but if you take as an axiom the maximality of the entropy at the uniform distribution, then you may have to change the other axioms also. You cannot use the other axioms as I did. You would have to modify the axioms. But they must be equivalent. The two theories must be equivalent. OK, thanks. You're welcome. Other questions? Before the questions, Dimitri, can you send the presentation of today? Of course, and I can send both the presentation without any annotations, so it will be very clean, and I can send you the version with the annotations. To Erika? Yeah. So you can send it to Erika, who will put it up there. I don't know. I can send it to the secretary to put on the. Yeah, the secretary. We had an email exchange with her today. It's this SMR367. Is it the person who sent you the Zoom link? Yeah, the person who sent you the Zoom link. She's a nice secretary. And I can send the transparencies to her.
Whatever material you want to share with the students, you can send to her. For me, it's the same. I mean, if you give me an address, I can send it to one of the organizers. Yeah, also, also. So I can send it to the organizers and then they dispatch it to the students. OK. We are updating the website continuously. There are new materials, videos, photos, there are a lot of things that you can. By the way, these transparencies I'll also use tomorrow. So there will be additions to them. Because I have the same set of transparencies for tomorrow, but I have different sets for the other days. But so this is not the annotations stopping.