Thank you very much for your kind words. I will briefly turn off my camera, because it takes bandwidth, and it is not very pleasant to spend bandwidth just so that you have my picture on your screen. So, camera off.

Yesterday we saw that there is a function H, depending on the probability vector p, written H(p) = −c Σ_a p_a log p_a, with some constant c and the logarithm taken in any base. This function is what Shannon called information, but it is exactly the same function that is called entropy in physics, and we showed yesterday that it is uniquely determined by four axioms: it is the unique function satisfying the four axioms we stated yesterday.

Now I will give you some interpretations of this function. The first interpretation is that the entropy, or information, is a kind of expectation; I like to joke that it is an expectation that makes us older, and I will explain that in a moment. Take an X-valued random variable whose law is determined by some probability vector p, and define the random variable ξ = −log p(X). Do you understand what I mean by that? Yes, it is a function of the random variable, but the particular function appearing here is built from the law of that same random variable. You have a random variable distributed according to the probability vector p = (p_1, ..., p_m), and depending on the value x taken by the random variable, ξ takes the value −log p(x). (I had forgotten a minus sign here on the slide; there is a minus sign missing.) The expectation of this variable is E[ξ] = −E[log p(X)], and if you write it out, it gives exactly H(p).

Why am I making such a fuss about this function? Because in probability theory the expectation is a linear operator acting on random variables, and the measure with respect to which you integrate defines a linear form on the set of random variables. Here the linearity is lost, because both the integrating measure and the random variable to be integrated depend on p.
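To make the "entropy as an expectation" reading concrete, here is a minimal Python sketch; the probability vector is an illustrative choice of mine, not one from the slides.

```python
import numpy as np

p = np.array([0.3, 0.2, 0.2, 0.15, 0.15])   # illustrative probability vector
H = -(p * np.log2(p)).sum()                  # H(p), logarithm in base 2

# H(p) as the expectation of xi = -log2 p(X):
rng = np.random.default_rng(0)
x = rng.choice(len(p), size=200_000, p=p)    # i.i.d. samples of X with law p
xi = -np.log2(p[x])                          # xi depends on p through the law and through the weights
print(round(H, 4), round(xi.mean(), 4))      # the empirical mean matches H(p) up to sampling error
```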
So this is not a linear functional; it is not a usual expectation, and you have to keep that in mind. And why do I say it makes us older? Because Boltzmann related this function to irreversibility: when this function is positive, the process is essentially irreversible. Life, for instance, is an irreversible process, otherwise there would be no evolution; and because life is irreversible, this expectation makes us older. That is just a joke.

Now to the second interpretation of the function H: the entropy is the mean number of binary questions you must ask in order to determine the outcome of the random variable X. Suppose X takes values in a five-element set {x_1, ..., x_5} with a given probability vector. How many questions do you have to ask to determine the outcome? The idea is the following: try to split the set {x_1, ..., x_5} so that the probabilities of the two subsets are as balanced as possible. So I first ask: is the random variable X in the set {x_1, x_2}? If you look at the vector, {x_1, x_2} has probability 0.5, so the complementary set also has probability 0.5: I have asked a question that is well balanced in probability, and the two branches, yes and no, each happen with probability one half.

When the answer is yes, I ask: is X equal to x_1? Again the answer is yes or no. Do you understand why I have 0.6 and 0.4 here? They are conditional probabilities: I have already restricted to the set {x_1, x_2}, which has probability 0.5, so these are probabilities conditioned on the first answer, and again I have balanced them as well as I can. If the answer is yes I have determined the value of X, and if it is no I have also determined it. Going through the other branch, I ask whether X equals x_3: if yes I am finished; if no I have to ask one additional question to separate x_4 from x_5. When I arrive at one of the magenta boxes, the leaves, I have completely determined the outcome of the random variable, and I also know its probability.

The important point is this: if I compute the entropy of this probability vector, I get 2.27, and on the other hand I can compute the average number of questions asked. Here I used two questions, two questions, two questions, and three questions for those leaves there. If I multiply, for each leaf, the number of questions by the probability of the leaf (in each box these quantities are written in red), the expected number of questions comes out to 2.3. You observe that the two values are very close to one another. If I had chosen a probability vector whose entries are inverse powers of 2, I would have exact agreement between H(p) and E[N]; here I do not get exactly the same thing, but something slightly bigger than the entropy.
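Here is a small sketch of this computation. The probability vector is my reconstruction from the numbers quoted (it gives H(p) ≈ 2.27, a 0.5/0.5 first split, a 0.6/0.4 conditional split, and E[N] = 2.3), so treat it as an assumption rather than the exact vector on the slide.

```python
import numpy as np

# Reconstructed probability vector (assumption), consistent with the quoted values.
p = np.array([0.3, 0.2, 0.2, 0.15, 0.15])

H = -(p * np.log2(p)).sum()            # entropy, ~ 2.27 bits

# Leaf depths in the question tree described above:
# x1, x2, x3 are decided after 2 questions, x4 and x5 after 3.
depths = np.array([2, 2, 2, 3, 3])
EN = (depths * p).sum()                # expected number of questions, = 2.3

print(round(H, 2), round(EN, 2))       # 2.27  2.3  -- always E[N] >= H(p)
```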
But one can show the following. If, instead of asking questions about one random variable, I ask about a pair of independent copies of X, then per variable I get some other expected number of questions, and it will be closer to H(p); if instead of pairs I ask about triples of copies, I get even closer to H(p). In fact it can be shown that asymptotically, when you ask about a large number of independent copies of X and compute the number of questions needed per variable, this converges to H(p). That is a very important fact in information theory: the expected number of questions you must ask to determine a random variable is always bounded below by the entropy, and you can approach the value of the entropy in the limit. Is that OK for everybody?

[Question: can you repeat in which limit this approach happens?] Yes. Here I had reached the value 2.3, which is close to 2.27 but not exactly 2.27. Now, instead of looking at one variable, I look at pairs of independent copies X_1, X_2. The set in which this pair takes values is the Cartesian product of the alphabet with itself, and the joint probability vector, which I call κ, is κ(x, y) = P(X_1 = x, X_2 = y) = p(x) p(y). For this joint probability vector I can compute the entropy, and it is two times 2.27. Now I arrange questions in the same kind of scheme, of course a more complicated one, because there are more questions to ask to determine the value of the pair (X_1, X_2); and if I compute the expected number of questions, it lies between 2 × 2.27 and 2 × 2.3, closer to the lower bound. That means the number of questions per variable is less than before, but still at least 2.27. If instead of pairs you ask about triples, you approach the entropy still more, and in the limit, when you ask about arbitrarily long blocks and take the number of questions per variable, you reach precisely the limit H(p). Is that clear? (A small computational sketch of this block argument follows after this passage.) Thank you; you are welcome. Any other question?

Now we pass to the third interpretation of the entropy: the entropy is the logarithmic ratio of typical over total configurations. This is very important, so I need some preparation and notation. Suppose we have a random variable on a finite alphabet A. If the cardinality of A is m, then without loss of generality we can take A isomorphic to the set of integers from 1 to m, and p is just a vector with m components; the law of X is given by this probability vector. For an arbitrary integer n, I consider the random variable denoted by bold X (it also depends on n): it consists of n independent copies of X, that is, X_1, ..., X_n. Obviously this bold X is an element of the set A^n; in other words, it is a random word of n letters over the alphabet A. Is that clear?
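To connect the block argument above with something computable: an optimal yes/no question strategy is exactly an optimal binary prefix code, so Huffman coding over blocks of 1, 2, 3 letters shows the expected number of questions per letter decreasing toward H(p). A sketch in Python, using the same illustrative single-letter law as above (my assumption, not the slide's):

```python
import heapq
from itertools import product
import numpy as np

p = {0: 0.3, 1: 0.2, 2: 0.2, 3: 0.15, 4: 0.15}     # illustrative single-letter law
H = -sum(q * np.log2(q) for q in p.values())

def huffman_lengths(probs):
    """Code lengths (= numbers of yes/no questions) of an optimal question tree."""
    heap = [(q, [sym]) for sym, q in probs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    while len(heap) > 1:
        q1, s1 = heapq.heappop(heap)
        q2, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1              # every merge adds one question above these leaves
        heapq.heappush(heap, (q1 + q2, s1 + s2))
    return lengths

for n in (1, 2, 3):                        # single letters, pairs, triples
    joint = {w: float(np.prod([p[a] for a in w])) for w in product(p, repeat=n)}
    L = huffman_lengths(joint)
    per_letter = sum(joint[w] * L[w] for w in joint) / n
    print(n, round(per_letter, 3), ">=", round(H, 3))   # 2.3 for n=1, then slightly closer to H
```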
So X is a random variable on the set A, which means X takes values in A. If A = {0, 1}, the bits, then X will be 0 with probability p_0 and 1 with probability p_1; and if A is the ordinary alphabet, p gives the probability of obtaining each letter. Now I consider n independent copies of X, the same thing I did before for the questions: X_1 is distributed according to p, X_2 according to p, X_3 according to p, and so on, independently of one another. So I have a collection of random variables, denoted by bold X indexed by n, and this n-tuple belongs to A^n: if I interpret the elements of A as letters, bold X is an n-letter word, random because it is randomly chosen in A^n. Is that clear?

Now, for a fixed n, forget about random variables for a moment; I look simply at words of n letters over the alphabet A. So bold α will be an n-tuple of n letters chosen from this alphabet, some fixed word. For every letter a of the alphabet I count how many times α contains this letter, that is, how many coordinates of α are equal to a; call this count ν_a(α). It is just a number, and it lies between 0 and n. Is that clear?

Because of the independence of the random variables, the probability that bold X equals some given word α is the product written here; I will do an example in a moment. I can also compute the probability that this count equals ℓ, for ℓ between 0 and n, and I will compute that number.

Let us do the example here. Take n = 5, A the alphabet of bits, and bold α the sequence 10011: a five-letter word over the alphabet A. Take as letter a the letter 0, for instance. Then ν_0(α) is the sum over i from 1 to 5 (because the length is 5) of the indicator that the i-th letter equals 0; this sum is 2. Taking ν_1(α) is exactly the same computation with the letter 1, and it gives 3. You see that the sum over a of ν_a(α) is always equal to n.

Now take a random word X of five letters, X_1, ..., X_5, and let these random variables be independent and identically distributed. This will come up very often in my talk, so from here on I use the abbreviation i.i.d. for independent and identically distributed, and r.v. for random variable. The law is such that P(X_k = a) = p_a. What is the probability that X equals α, for X and α words of size five? It means asking whether (X_1, ..., X_5) = (α_1, ..., α_5), that is, the probability of the event {X_1 = α_1} ∩ ... ∩ {X_5 = α_5}; and since the random variables are independent, the probability of this intersection equals the product of the probabilities, P(X_1 = α_1) ⋯ P(X_5 = α_5). Clear? Any questions? OK, so now come back to the situation where α is that particular sequence, 10011.
What does the product give? α_1 = 1, so the first factor is p_1; α_2 = 0, so it is p_0; α_3 = 0 gives p_0; α_4 = 1 gives p_1; α_5 = 1 gives p_1. You see that you can rewrite this product as p_0^{ν_0(α)} p_1^{ν_1(α)}: the quantities ν_0 and ν_1 determined before serve to express the probability in a more compact way. Good? Any question?

Now I come to the question: what is the probability that ν_a(X_1, ..., X_n) = ℓ? Here I use the random word, so this count is a random number: it is the number of times the letter a appears in the random sequence X_1, ..., X_n, and I ask for the probability that it equals ℓ. In a random sequence of length n I can find the letter a zero times, once, and so on, up to n times, so ℓ can be any element of {0, ..., n}.

[Question: so the meaning of this probability would be the probability that the frequency of the letter a in your message is ℓ?] Let me rephrase: I toss the coin n times; this gives me a sequence X_1, ..., X_n, a random sequence, and now I count how many times the letter a appears in it. This count is a random variable, because the X's are random: every realization of the tosses of the coin gives me another number. Is that clear? And since this random variable is just the number of times the letter a is encountered in a sequence of size n, it can take any value between 0 and n. For instance, suppose that instead of bits I look at ordinary letters, and the word X takes the value "abracadabra": one, two, ..., eleven letters, a sequence of eleven letters. The number of times the letter z appears in X is 0; the number of times y appears is 0; the number of times a appears is one, two, three, four, five. If by chance I had drawn another word, these numbers would be different, but in any case they cannot be less than 0 and cannot be more than n, the size of the sequence. So the possible numbers of appearances of a letter lie between 0 and n. Is that clear now? Good.

[Question: so in order to calculate the probability, you have to consider all the ways to rearrange the characters? Suppose the letter a appears ℓ times in your message; the probability of one particular such message would be p_a^ℓ (1 − p_a)^{n − ℓ}, and then you also consider the combinatorics?] Yes, of course, you must also take the combinatorics into account.
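A quick check of the two identities above, the ν counts and the compact product form of P(X = α), for the example word 10011. The values of p_0 and p_1 are left symbolic in the lecture; the numbers below are an assumption for illustration.

```python
import numpy as np

p = {0: 0.2, 1: 0.8}                      # assumed Bernoulli law on the bits

alpha = [1, 0, 0, 1, 1]                   # the five-letter word 10011 from the example
nu = {a: sum(1 for x in alpha if x == a) for a in p}       # nu_0 = 2, nu_1 = 3

prob_product = float(np.prod([p[x] for x in alpha]))       # p1 * p0 * p0 * p1 * p1
prob_compact = float(np.prod([p[a] ** nu[a] for a in p]))  # p0**nu_0 * p1**nu_1

print(nu, prob_product, prob_compact)     # the two expressions of P(X = alpha) coincide
```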
It is easier to come back to the case of bits. I define a new set: the set of sequences of size five, or more generally of size n, such that the letter 1 appears exactly ℓ times; and if I am looking at the letter 1, it is enough to sum the bits of the word to get this count. So what is the cardinality of this set? For every coordinate I have two possible choices, 0 or 1, and I must choose the ℓ positions of the 1's among the n coordinates: the cardinality is the number of ways to choose ℓ among n. Maybe you do not know this symbol, C_n^ℓ; it is used in the French and Russian literature. I do not know your habits, but in the Anglo-Saxon literature the same quantity is written as the binomial coefficient (n choose ℓ): the number of ways to choose ℓ among n.

So now I have finished, because the probability that this count equals ℓ is the cardinality of this set times p_1^ℓ times p_0^{n − ℓ}, and p_0 is 1 − p_1; that is, P(ν_1 = ℓ) = (n choose ℓ) p_1^ℓ (1 − p_1)^{n − ℓ}. I have completely determined the law of this random variable. Is that OK? This is the formula I have written here. Good.

Now another convention: since I will have to take limits, it is more convenient to work with infinite words, α in A^ℕ, that is, infinite sequences of letters, and with infinite random sequences X_1, X_2, and so on; the way to recover finite words is simply to restrict to the first n letters. So here I have an infinite sequence, and I count how many times the letter a appears among the n first letters of the sequence, nothing else; I use the notations α|_n and X|_n for the restriction of a sequence to its n first letters.

The important point is this: if I compute this count for every letter of the alphabet, I obtain a collection of numbers, with as many elements as the alphabet has letters. I denote by ν_n the vector whose components are these counts. I had already shown that the sum of these counts over all letters of A equals n, and all of them are nonnegative; so if I divide this vector by n, its components sum to 1, and (ν_a(α|_n)/n)_{a ∈ A} is a probability vector. This probability vector, which depends on the sequence α you have chosen, is called the type of α. Is that clear?
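Here is a small numerical check of the binomial law just derived, comparing the exact formula with a Monte Carlo histogram; the values n = 5 and p_1 = 0.8 are illustrative choices of mine.

```python
import numpy as np
from math import comb

n, p1 = 5, 0.8                                    # assumed length and letter probability
p0 = 1 - p1

# Exact law of nu_1 from the counting argument above: C(n, l) p1^l (1 - p1)^(n - l).
exact = [comb(n, l) * p1**l * p0**(n - l) for l in range(n + 1)]

# Monte Carlo check: draw many random words and histogram the number of 1's.
rng = np.random.default_rng(0)
words = rng.random((200_000, n)) < p1             # each row is a random word of n bits
counts = words.sum(axis=1)
empirical = np.bincount(counts, minlength=n + 1) / len(words)

for l in range(n + 1):
    print(l, round(exact[l], 4), round(empirical[l], 4))
```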
Now I define the notion of typical configurations. Suppose n is a fixed integer, A a finite alphabet, p a probability vector on this alphabet, and k a fixed, strictly positive integer. A word α of n letters over the alphabet A is called typical, more precisely (n, p, k)-typical, if the following holds: for every letter a, if I count the number of times a appears among the n first letters of the word, subtract n p_a, and divide by a certain term that for the moment I do not explain, the result is bounded in absolute value by the constant k. If for some letter this inequality does not hold, the word is called atypical. The set of typical words is denoted T_{n,p,k}. So the typical words are those in which the number of appearances of each letter satisfies this bound.

Now, if I divide this inequality by n, I can also write it as follows: the number of times I found the letter a, divided by n, which is a kind of empirical probability, minus the theoretical probability p_a of getting that letter, is bounded by a constant times one over the square root of n; when n is large, this bound tends to zero. So typical words are those words that contain every letter with some preset density, the density being determined by p. Typical words therefore depend on the probability vector, but they are not themselves random: here α is just a word, and I am looking at all the words verifying this condition; they form a set of words satisfying this bound, nothing random about them. Is that OK?

[Remark from the audience: so this is like the central limit theorem, the law of large numbers?] Yes, yes, the property of being typical is of that nature, of course. Any other remark or question?

OK, now a very important theorem, called the asymptotic equipartition property. It says the following. Choose some ε strictly larger than zero and strictly less than one, and an integer k bounded below by the square root of the cardinality of A divided by ε; that is just technical. Then for all n sufficiently large (larger than k): first, the probability that the random sequence is not typical can be made arbitrarily small; second, for every α among the typical sequences, the probability that X|_n equals that typical sequence is, morally, 2^{−n H(p)}; and third, the cardinality of the set of typical sequences is about 2^{n H(p)}.

You see that in the exponents there is a term that depends on n and a correction that depends on the square root of n. When n is large, say n = 10^9 (the number of bits on a one-gigabyte USB stick is 8 × 10^9), the square root of n is 10^{4.5}, which is negligible in front of 10^9; so the main term in the exponent is essentially the one proportional to n, provided of course that H is not zero. That means the probability attached to a typical sequence is essentially this number, and the cardinality of the typical set is essentially 2^{n H(p)}; you can forget about the correction term, which is small.

Let me say two words about the significance of this theorem. To fully grasp the idea, you have to consider astronomical sizes: a USB key of one gigabyte means 8 × 10^9 bits, and 2^{10^9} is 10^{3 × 10^8}; I cannot work with numbers of that kind, because I could explain nothing with them. So let us start with a very small set.
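Before the numbers, here is a small sketch of the typicality condition as just defined. The normalizing term k √(n p_a (1 − p_a)) is the one used in the worked example that follows; the sample words and parameters are illustrative assumptions.

```python
import numpy as np

def nu(word, a):
    """Number of times the letter a appears in the word."""
    return int(np.sum(np.asarray(word) == a))

def is_typical(word, p, k):
    """(n, p, k)-typicality: |nu_a - n*p_a| <= k * sqrt(n * p_a * (1 - p_a)) for every letter a.

    p is a dict {letter: probability}; the square-root normalization is the one
    used in the numerical example below.
    """
    n = len(word)
    return all(
        abs(nu(word, a) - n * pa) <= k * np.sqrt(n * pa * (1 - pa))
        for a, pa in p.items()
    )

p = {0: 0.2, 1: 0.8}                                         # assumed letter law
print(is_typical([1, 1, 0, 1, 1, 1, 1, 0, 1, 1], p, k=2))    # frequencies close to p: typical
print(is_typical([0] * 10, p, k=2))                          # all zeros: atypical for this p
```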
A will be the bits, and I take sequences of length only a thousand, not 10^9: a thousand bits. Already A^n = 2^{1000}, which is something like 10^{301}, so there are enormously many binary words of size one thousand. As p I choose the vector (0.2, 0.8): 0.2 is the probability of a 0 and 0.8 the probability of a 1. I can now compute the entropy of this vector; it is about 0.72. I fix the ε of the theorem before at five percent, that is 0.05, and I take k as in the theorem: the square root of the cardinality of A over ε, that is √(2/0.05), which is about 6.32. I compute √(n p_a (1 − p_a)) = √(1000 × 0.2 × 0.8), which is about 12.6, and multiplying the two I get 79.6; let us say 80, to have a whole number. So this is the band k √(n p_a (1 − p_a)). What does it mean for a word α to be typical? By the law of large numbers, roughly, ν_0(α) must be 200 plus or minus 80, and ν_1(α) must be 800 plus or minus 80. As I told you before, the theorem acquires its full flavor when n is of the astronomical size mentioned earlier, and you understand that at that size the correction I have here is negligible, so I can forget about these corrections.

Now, what does the theorem say? The first claim is that the ratio of the cardinality of the typical sequences to the cardinality of all possible words is approximately 2^{1000 × 0.72} over 2^{1000}, and if I put in the numbers this is 2^{−280}, which is about 5 × 10^{−85}. So if I depict the set A^n, the set of typical sequences is a really tiny subset: the proportion of configurations lying in the typical set is ridiculously small; the ratio of the cardinalities is some ridiculously small number. Typical sets are very small in cardinality.

However, claim (b) of the theorem says the following: if X_1, ..., X_n are independent and identically distributed according to p, then the probability that X|_n lies in the typical set is bigger than 95 percent. So although this set has ridiculously small cardinality, it carries practically the whole probability mass; I write this down because it is important.

And the third claim of the theorem is this: for every α in the typical set, the probability that X|_n equals α is approximately 2^{−720}, and one can show that this is approximately one over the cardinality of T_{n,p,k}. That means the probability of any given word in the typical set is essentially the same for all of them: you have, approximately, the uniform probability on this set. That is why the theorem is called the asymptotic equipartition property.

This theorem is very important because it gives you an efficient way to compress information. It says you have an interest in encoding the typical words with short codewords and the atypical words with long codewords: the codes used for the words of the typical set will be short, and for the atypical words you may even lose and end up with enormous codewords, but in practice that essentially never happens, because the probability of falling outside the typical set is less than five percent.
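The same back-of-the-envelope numbers, checked in a few lines of Python, together with a Monte Carlo verification of claim (b); the simulation size is an arbitrary choice.

```python
import numpy as np

n, eps = 1_000, 0.05
p = {0: 0.2, 1: 0.8}

H = -sum(q * np.log2(q) for q in p.values())           # ~ 0.72 bits per letter
k = np.sqrt(len(p) / eps)                               # ~ 6.32
band = k * np.sqrt(n * p[0] * p[1])                     # ~ 80: nu_1 must be 800 +/- 80

log2_ratio = n * H - n                                  # log2(|T| / |A^n|) ~ -280
print(round(H, 2), round(k, 2), round(band, 1), round(log2_ratio))

# Claim (b): a random word of length n is typical with probability > 95%.
rng = np.random.default_rng(0)
nu1 = (rng.random((20_000, n)) < p[1]).sum(axis=1)      # nu_1 of each simulated word
typical = np.abs(nu1 - n * p[1]) <= band                # for bits, nu_0 is determined by nu_1
print(typical.mean())                                    # comfortably above 0.95
```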
Did you understand the meaning and the importance of this theorem? Is it OK? I think I will skip the proof, because I am already late in my program; if we have time at the end of the course I will give you the proof of this theorem. It is not difficult, but it takes some time.

Now I move to some easier things: properties of the entropy. The entropy is of course nonnegative; the minus sign is precisely there to ensure this, because each p_a is a probability, so it lies between 0 and 1, its logarithm is negative or zero, and the minus sign removes that negativity. So the entropy is always nonnegative.

Next, a technical lemma that is very useful: if you have two arbitrary probability vectors p and q on the same alphabet, then the sum Σ_a p_a log q_a, in which q has replaced p inside the logarithm, is always majorized by Σ_a p_a log p_a. The proof is extremely simple, and I give it below. The function log t is a concave function on the positive reals; a concave function curves downward, so its graph lies below the graph of its tangent at any point. I take the tangent at 1: it has slope 1 and value 0 there. Concavity means the graph of the function is always below this tangent, that is, log t ≤ t − 1 for all t > 0. So log(q_a / p_a) ≤ q_a / p_a − 1, and plugging this in, Σ_a p_a log(q_a / p_a) ≤ Σ_a (q_a − p_a) = 1 − 1 = 0, which is exactly the stated inequality. Is that clear? It is just convex analysis, nothing else. No answer?

This lemma serves to prove an upper bound on the entropy: we already have the lower bound, zero, and now we get the upper bound log |A|. To prove it, you use the previous lemma with q equal to the vector whose components are all 1/|A|, that is, the uniform probability; plugging this vector into the previous inequality you immediately get this upper bound, and the inequality is an equality if and only if p is the uniform vector on A. So the maximum entropy is the logarithm of the cardinality of A.

Now another quantity we can define, related to the entropy, is what is called the Kullback-Leibler contrast, or relative entropy. If p and q are probability vectors on the same alphabet, p is said to be absolutely continuous with respect to q, written p ≪ q, if whenever q_a is zero, p_a is also zero. This means the ratio p_a / q_a is well defined: you never divide a positive number by zero, which would give an infinite term. You then define the Kullback-Leibler contrast between p and q to be D(p‖q) = Σ_a p_a log(p_a / q_a) when p is absolutely continuous with respect to q; otherwise you define it to be +∞.
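A small sketch of these last two points, the Kullback-Leibler contrast and the bound H(p) ≤ log |A|; the two vectors are illustrative choices of mine.

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_a p_a log2 p_a (letters with p_a = 0 contribute nothing)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

def kl(p, q):
    """Kullback-Leibler contrast D(p || q); +inf if p is not absolutely continuous w.r.t. q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if np.any((q == 0) & (p > 0)):
        return np.inf
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / q[nz])))

p = np.array([0.3, 0.2, 0.2, 0.15, 0.15])          # illustrative vectors
q = np.array([0.5, 0.2, 0.1, 0.1, 0.1])
u = np.full(5, 1 / 5)                               # the uniform vector on 5 letters

print(kl(p, q), kl(q, p))                           # both >= 0, and unequal: D is not symmetric
print(entropy(p), np.log2(5))                       # H(p) <= log2 |A|, equality only if p is uniform
print(kl(p, u), np.log2(5) - entropy(p))            # D(p || uniform) = log2 |A| - H(p)
```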
Again, one can prove that this Kullback-Leibler contrast is nonnegative for arbitrary p and q. It is not symmetric, so it cannot be a distance; very often it is called the Kullback-Leibler distance, but it cannot be a distance precisely because D is not symmetric. Nevertheless it serves to discriminate between p and q: the larger this contrast is, the more different p and q are.

I will probably skip this part, what time is it, yes, I will skip this, in which one studies the Kullback-Leibler contrast for Markovian evolutions. So suppose that (X_n), with n varying over the integers, is an irreducible, aperiodic Markov chain on a denumerable space X with stochastic matrix P. By the way, do you understand what I am saying; do you know elementary Markov chain theory? [It would be good for us to review a little.] OK, in that case, before stating the theorem, let me review a few elements; and let me not speak about infinite denumerable state spaces, only finite ones. So, Markov chains, finite case.

I consider a sequence of random variables, taking values in a finite set X, that are not independent. I interpret the index n as a kind of time, time measured in seconds, say. These random variables are not independent, but their dependence is very constrained: the conditional probability of the position at time n + 1, given the positions the chain occupied at all previous times, is equal to P(X_{n+1} = y | X_n = x_n). That means the position at time n + 1 depends solely on the position the chain had at time n; the positions it had at all the previous times are forgotten. This is the Markov property, the defining property of a Markov chain.

Since everything depends on these conditional probabilities, I introduce some notation: I denote by P(x, y) the conditional probability that X_{n+1} = y given that X_n = x; be careful about the order of x and y here. The collection of these numbers is an X × X matrix, and this matrix has a very important property: if I sum the elements of a row, that is, if I sum over the second argument, I get 1 for every x. Do you see where that comes from? [Normalization.] Exactly: this sum is the sum of the conditional probabilities of X_{n+1} = y given X_n = x over all possible positions y, and the chain must go somewhere, so the sum equals 1.

The main point now is that you can represent this matrix very conveniently by a directed graph: P is in bijection with a directed graph. Let me give you an example. Suppose two friends are playing heads or tails, and the total fortune, the fortune of A plus the fortune of B, is fixed, say L euros. The possible states are 0, 1, ..., L − 1, L. They start playing, and at every time they toss a coin: if it comes up heads, A loses a euro, which means B wins a euro, and vice versa.
So you can represent this game in the following way: from each intermediate state you can go one step up or one step down, and at the two ends, 0 and L, the corresponding player is ruined and all the fortune belongs to the other; the same on both sides. This is the directed graph of the Markov chain known as the gambler's ruin.

Now I define the notion of irreducibility. If for every x and y there exists an integer n, depending on x and y, such that the n-th power of the matrix has entry P^n(x, y) strictly positive, then the chain is called irreducible. The meaning is this: looking at the directed graph of the chain, if I choose two arbitrary nodes, there is a strictly positive probability of going from one to the other in a finite number of steps. The gambler's ruin, for instance, is not irreducible: you see that if I reach 0 or L, I stay there forever; once I am at 0 or at L, I can never go back to the other nodes. So this chain is not irreducible, it is a reducible chain. And the chain is called strongly irreducible if there exist a capital N and a positive α such that the minimum over x and y of P^N(x, y) equals α, strictly positive.

I am running out of time and I have not finished. I see a question from Monday in the chat that I do not understand: if there are abnormalities, does the explanation I gave still hold? What kind of abnormalities are you thinking about, Monday? OK, I will ask later. Is there a question in the chat? Of course you can use the chat; I see what you wrote. Variation in data, what kind of variation in data? Can you make your question precise, please? You asked me two questions: the first is whether, if there are abnormalities, the explanation I gave still holds; the second is whether a function can go from a convex set to R plus, especially for the uniform inequality; and I do not understand either of the two questions. Can you be more precise? You can write it in the chat, and we can continue afterwards.

So, just one more definition and then I stop, because I am running out of time; afterwards I will discuss with Monday. Definition: again (X_n) is a Markov chain. A generalized random variable T, meaning one taking values not only in the integers but possibly also the value infinity, is called a stopping time if for every n the event {T = n} is completely determined by the values taken by X_0, ..., X_n; that means it does not depend on X_{n+1} and on the future. Let me give you an example. Suppose X_n is the temperature on day n of the year at a given place, and I ask for T, the first day of the year on which this temperature exceeds 42.2 degrees, with the convention that the infimum of the empty set equals infinity. This is a stopping time, because in order to decide whether T = m it is enough to see the temperatures between days 1 and m. But if I ask for T′, the infimum of the n such that X_n is the maximum temperature of the year, this is of course not a stopping time, because to answer whether T′ = m I would have to know not only the past temperatures but also the future ones.
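A sketch tying the two notions together: the gambler's ruin transition matrix (rows sum to 1, and the two absorbing states make the chain reducible) and the first absorption time, which is a stopping time. The total fortune L, the fair coin, and the starting state are illustrative assumptions.

```python
import numpy as np

L = 5                                       # assumed total fortune in euros
P = np.zeros((L + 1, L + 1))
P[0, 0] = P[L, L] = 1.0                     # absorbing states: one of the players is ruined
for x in range(1, L):                       # fair coin: one euro up or one euro down
    P[x, x - 1] = P[x, x + 1] = 0.5

print(P.sum(axis=1))                        # every row sums to 1: P is a stochastic matrix

# Not irreducible: starting from 0, no power of P ever gives positive probability
# of reaching another state.
print(np.linalg.matrix_power(P, 50)[0, 1:])

# The first hitting time of {0, L} is a stopping time: whether T = n is decided
# by X_0, ..., X_n alone.
rng = np.random.default_rng(0)
x, T = 2, 0                                 # player A starts with 2 euros (assumption)
while x not in (0, L):
    x += rng.choice([-1, 1])
    T += 1
print("absorbed after", T, "tosses, in state", x)
```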
This notion of stopping time will be important when we speak about Turing machines in the last lecture, and when we come to Kolmogorov complexity. I think I am already past my time, so once again I have not finished what I was planning to say, and I stop here.

[Moderator:] Is there a question for Dimitri? Vladimir? No, there is no question in the room... ah yes, there is one.

[Question:] Is there any relation between a Markov chain and the dynamics of a system, since both depend only on the last step? [Please come closer to the microphone, I cannot hear; what was the question?] A dynamical system is a special case of a Markov chain, in the sense that its stochastic matrix, the matrix P I was writing, is what is called a deterministic stochastic matrix: in every row there is exactly one entry equal to one, and all the other entries of the row are zero. This is again a stochastic matrix, it has the property that every row sums to one, but it is a very special one, deterministic, and it defines precisely a dynamical system. So dynamical systems are special cases of Markov chains, an important class of special cases.

Is there any other question? OK, I look in the chat; that was Monday's question, which I already asked him about. Maybe you can speak directly with him. [Moderator:] Thanks very much, Dimitri. Can you once again send me your slides? [Lecturer:] I will send the transparencies. [Moderator:] Perfect. To all the students: I strongly advise you to read this lecture this afternoon or tonight, because there is a lot of material in it and the next lecture is tomorrow; it is a very short time, I know, but tonight there is no soccer, so you have time to read the lecture and profit from it, since all the lectures, including this general one, are very good. So, Dimitri, thanks very much; we will get to it, and we will see you tomorrow morning. [Lecturer:] OK, see you tomorrow. [Closing crosstalk about scheduling, largely inaudible.]