Today we started talking about source coding. So, yesterday I gave some definitions: for example, what the code alphabet is, what a code is. A code is a map that sends the symbols emitted by the source into words over the code alphabet, words that the sender and the receiver can both understand. So, for that purpose, I defined code words, and I can show you what the image of the source under the code map is. [inaudible] What I had not explained yesterday is that these codes are not meant to hide anything. For example, these codes are not meant to do cryptography. The idea is that when you see a word here, in this set, you must be able to recover the symbol that has been used to produce this word. This is a very important property of the code. And... So here is the example I gave yesterday. I guess you understood this example; I don't have to go through it again. And now, let's say something about the codes. So, a code is non-singular if the function encoding symbols of the source into words over the code alphabet is injective. That means that if two code words are the same, they must come from the same symbol emitted by the source. So I take the contraposition of this implication: if C(x) is equal to C(x'), then x must be equal to x'. The code is called uniquely decodable if not only the code itself, but also its extension to arbitrary words of symbols of the source, remains injective. And it is instantaneous if no word of the glossary is a prefix of another word of the glossary. So these are more and more stringent conditions on codes.
And so, usually, if I denote the sets of instantaneous, uniquely decodable and non-singular codes, they have these inclusions, and of course all of them are included in the set of all codes. And now I give an example. So suppose that we have four symbols, one, two, three, four, emitted by the source, and we wish to encode them into the zero-one alphabet. So, for instance, one code you can use is: one and two map to zero, and three and four map to one. Obviously this code is not good, because this code is singular: the map C1 is not injective. You have here zero, zero, but this zero comes from one and this zero comes from two. So you cannot invert this code. C2 is non-singular, but it is not uniquely decodable. You see that if I look at the inverse image of zero one zero, then I can produce this word in different ways. One way is to concatenate the code for three, which is zero one, with the code for one, which is zero, so I get zero one zero. I can of course obtain this word just by two, and I can also obtain it by one, four, because one will be zero and four will be one zero, so the net effect will be zero one zero. So this is not uniquely decodable, and this code is also not very good, because I cannot invert it. C3 is uniquely decodable, but it is not instantaneous, because the code word one one is a prefix of the code word one one zero. Nevertheless, there is no ambiguity: I can correctly decode any word to get the original message. And of course C4 is the best one; this is an instantaneous code. So the two last columns are good codes. Now, you already know a code that does not fall into the good situation here, and there is a very precise reason why it is like that. This code is the genetic code. One of the functions of the cell is to produce proteins, and the proteins are constructed out of amino acids. So there are 20 amino acids, one to 20.
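The four example codes can be checked mechanically. This is a sketch, not from the lecture slides: the table for C2 follows the spoken description; the tables for C3 and C4 are not fully spelled out in the transcript, so I use the standard textbook tables consistent with what is said (the prefix 11/110 for C3, and 0, 10, 110, 111 for C4). The helper names are my own; unique decodability is only probed by brute force up to a bounded message length.

```python
# Checking the lecture's four codes on X = {1, 2, 3, 4} over the alphabet {0, 1}.
from itertools import product

C1 = {1: "0", 2: "0", 3: "1", 4: "1"}        # singular
C2 = {1: "0", 2: "010", 3: "01", 4: "10"}    # non-singular, not uniquely decodable
C3 = {1: "10", 2: "00", 3: "11", 4: "110"}   # uniquely decodable, not instantaneous
C4 = {1: "0", 2: "10", 3: "110", 4: "111"}   # instantaneous

def is_non_singular(code):
    """The encoding map is injective: distinct symbols get distinct words."""
    return len(set(code.values())) == len(code)

def is_instantaneous(code):
    """No code word is a prefix of another code word."""
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

def has_ambiguous_extension(code, max_len=4):
    """Brute force: two different source strings encoding to the same word."""
    seen = {}
    for n in range(1, max_len + 1):
        for msg in product(code, repeat=n):
            w = "".join(code[s] for s in msg)
            if w in seen and seen[w] != msg:
                return True     # e.g. "010" = C2(2) = C2(3)C2(1) = C2(1)C2(4)
            seen[w] = msg
    return False
```

Running the checks reproduces the classification given in the lecture: C2 is exposed as ambiguous by the word 010, while C3 passes the brute-force ambiguity test but fails the prefix test.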
And the two last ones are just stop symbols, but they are counted as the 21st and 22nd amino acids. So the set A I have to use is this set of letters; the set A is a 22-element set. Now, you know that DNA or RNA sequences are sequences of repetitions of four letters, A, C, G, T. So that means that from this side I have 22, and from the other I have only four, and four is not enough. So in fact the DNA is read not base by base, but in groups of three. So what are called codons are strings of three bases, where each base is one of the four letters A, C, G and T. So there are four different letters and strings of length three, which makes 64 elements in this codon set. So you have to consider the 64 codons as the set X. The original set is not {A, C, G, T}: the elements of X are strings of three letters, so X is the set of the 64 codons. And in the genetic code, here you can consider that U is T: depending on whether you are looking at DNA or RNA, the fourth base is either thymine or uracil, that is the difference, but it is just the same. And so you see that now if I look, for instance, at how to encode the amino acid leucine, I have to use C, U, and then all of these possibilities: these four codons all give leucine. So you see that this code is strongly singular; I mean, any amino acid can be produced by different codons. The reason is very simple: nobody has to translate back a sequence of amino acids into DNA, because this would have very bad biological consequences; in that case proteins could influence the genetic content of the cell, and this is of course not even thinkable in biology. So the genetic code must be protected against outside attacks. That is the reason why there is no reason for the code to be non-singular.
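The counting in this passage can be sketched in a few lines: bases in triplets give 4³ = 64 codons, mapped many-to-one onto the amino-acid alphabet. Only the leucine fragment mentioned in the lecture (the four CU* codons) is shown; that fragment is standard biology, not taken from the slides.

```python
# 4 bases taken in triplets give the 64-element codon set X.
from itertools import product

BASES = "ACGU"   # RNA alphabet: U plays the role of T on the DNA side
codons = ["".join(t) for t in product(BASES, repeat=3)]

# Fragment of the genetic code: every codon starting with CU encodes leucine,
# illustrating that the map from codons to amino acids is strongly singular.
leucine_codons = [c for c in codons if c.startswith("CU")]
```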
Moreover, the letters here — I just use letters, but in fact they are not letters, they are macromolecules, chemicals, with a very precise stereochemical conformation. And when they match on the DNA strand, they must fit together. Since they must fit together, you have to pay some energy to let them fit together. So that means, for instance, that if I have a U here, it could be thermodynamically more favourable to have an A than a G next; that means that the next codon on the DNA should be different. This implies that the codons are not just IID variables — they are not independent identically distributed variables. Codons follow some precise probability distribution that is not even Markovian; it depends on much more of the previous letters you have encountered. So that is one reason for this redundancy of the code. Another situation, which I shall not speak about in this course, where redundancy and non-singularity of the code are interesting, is when you want to do cryptography. In that case, the singularity of the code is a property you are looking for: you are looking for codes that cannot be inverted uniquely. So I close the parenthesis and I continue with the codes. Are there any questions, in the room or among the remote participants? No. So here I concentrate on codes that must be inverted; these are the codes used in communication. Of course, as I told you, the unique decodability of the code is an important property, so I give conditions under which a code is uniquely decodable. First, if A is a finite alphabet, I denote by capital A the cardinality of the alphabet, and I have a family of integers. If this family verifies this inequality — the sum over x of A to the power minus l(x) is at most 1 — I say that the family fulfills Kraft's inequality. Now, the theorem is the following: if I have a code C from X to the words over the alphabet A, and if the code is instantaneous, then the family of the lengths of the code words fulfills Kraft's inequality.
That means that the sum over x of A to the power minus the length of C(x) is at most 1. Conversely, if I have a family of integers that fulfills Kraft's inequality, there exists an instantaneous code such that the lengths of the code words are this preassigned family of l's. The proof is very easy and I give it here. So, suppose first that I wish to show that if the code is instantaneous, then it must fulfill Kraft's condition. The code words are words of A-star and have different lengths. If I look at the maximum length, capital L, that means the code words are words of length at most capital L, so I can represent the set of possible words as a finite tree whose last generation is generation L. Now suppose that I have ordered the set X, meaning I have x1 less than x2 less than x3, and so on — and if it is not ordered, I can always order a finite set artificially. So suppose that I have placed the code word C(x1). It will be some place on the tree. But if C is an instantaneous code, this word cannot appear as a prefix of any other code word. That means that once I have placed this word here, I must exclude all of this subtree from the possible positions of the other words. In that way I have excluded A to the power L minus l(x1) positions at the last generation — here in the picture A is equal to 2, but in general it is A. Now suppose that I place the second word; for instance, suppose that I have placed it there. That means I have again to exclude A to the power L minus l(x2) positions. And in fact, if I am to be able to place the whole code on this tree, the total number of excluded positions cannot exceed the total number of nodes at the last generation.
Otherwise there is not enough place to put the code. So the condition is that the sum of A to the power L minus l(x_i), over all the code words, cannot exceed the total number of nodes at this last generation, that is, A to the power L, because the last generation has A to the power L nodes. Dividing by A to the power L, you see that this means that the sum of A to the power minus l(x_i) is at most 1. So this is the condition: an instantaneous code must verify Kraft's condition. And for the other direction it is just the same thing. I have the family of lengths, and I order it again, so that l1 is less than or equal to l2, and so on; I suppose the integers l are already ordered. Then I place, at generation l1 of the tree, anywhere I like, a vertex, and I assign the code word for the symbol x1 to this place. So suppose that l1 is one: I place it somewhere in this generation, wherever I like. Now this will exclude a certain number of vertices at the last generation. But since Kraft's inequality is fulfilled, there are remaining positions available. So to place l2, I choose a free vertex and I place C(x2) at generation l2. Of course, this excludes this subtree. Kraft's inequality is still fulfilled, so I can always find some position where I can place C(x3), and so on. Is it clear? Have you understood the construction? No answer. Ah, sorry Dimitri, we were muted. So, did they understand? You understood, okay. Because in the beginning we were muted, so when we spoke you didn't hear. Good, they understood. Okay. Now, some words about the optimality of the code.
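Both directions of the theorem just proved can be sketched in code. This is my own realization, not from the slides: the Kraft sum check, and the tree construction of the converse, implemented as the usual "canonical" assignment — sorting the lengths and handing out codewords left to right on the binary tree is exactly the "place a word, then exclude its subtree" argument.

```python
# Kraft's inequality and the constructive converse, for the binary alphabet.

def kraft_sum(lengths, A=2):
    """Sum of A**(-l) over the codeword lengths; at most 1 for prefix codes."""
    return sum(A ** -l for l in lengths)

def code_from_lengths(lengths):
    """Build a binary prefix code with the given lengths (Kraft must hold)."""
    assert kraft_sum(lengths) <= 1
    words = []
    node, prev = 0, 0            # integer position of the next free node
    for l in sorted(lengths):
        node <<= (l - prev)      # descend to depth l, keeping our position
        words.append(format(node, "b").zfill(l))
        node += 1                # move past the placed word: its whole
        prev = l                 # subtree is now excluded
    return words
```

For the lengths (1, 2, 3, 3) this produces the familiar prefix code 0, 10, 110, 111, and the Kraft sum is exactly 1 (the tree is completely filled).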
So Kraft's inequality guarantees the existence of an instantaneous code whose code-word lengths are preset to the lengths l(x). Nevertheless — you see, in the construction of the code I did, any permutation would be acceptable. But if the l(x) are not constant and the underlying probability on X is not uniform, there exist codes that are better than others, in the sense that the average length of the code is shorter for some choices. That means that Kraft's inequality alone is only an existential result: it says that good codes exist. To get an optimal code, I must minimize the average length of the code, which is the weighted sum of the lengths with the underlying probability, under the constraint that these numbers l(x) must verify Kraft's inequality. So this is the constraint. The traditional method to solve this problem is to use Lagrange multipliers: if I denote by bold L the family of all the lengths, I form a function of this family that is the average length plus lambda times the Kraft inequality constraint. Differentiating with respect to l(x) — this is elementary — we get this equation. If you saturate the Kraft inequality, you get lambda, and then you can replace: lambda here will be one over log A. And finally, the optimal length — "optimal" must be in quotation marks, I will explain why — is this number here: l*(x) is just minus the logarithm of p(x), in base A. The problem is that l* has no particular reason to be an integer.
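The Lagrange-multiplier computation just described can be written out explicitly. This fills in the algebra with the lecture's own quantities (average length, Kraft constraint); the intermediate steps are my reconstruction:

```latex
\text{Minimize } \bar L = \sum_x p(x)\,\ell_x
\quad\text{subject to}\quad \sum_x |A|^{-\ell_x} \le 1 .

J(\boldsymbol{\ell},\lambda) \;=\; \sum_x p(x)\,\ell_x
   \;+\; \lambda\Bigl(\sum_x |A|^{-\ell_x} - 1\Bigr),
\qquad
\frac{\partial J}{\partial \ell_x}
   \;=\; p(x) \;-\; \lambda\,(\log |A|)\, |A|^{-\ell_x} \;=\; 0 .

\text{Hence } |A|^{-\ell_x} = \frac{p(x)}{\lambda \log |A|}.
\text{ Saturating the constraint } \sum_x |A|^{-\ell_x} = 1
\text{ gives } \lambda = \frac{1}{\log |A|},

\text{so } |A|^{-\ell_x^{*}} = p(x),
\qquad
\ell_x^{*} \;=\; -\log_{|A|} p(x).
```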
So, for instance, suppose that the alphabet has cardinality 2: if the probabilities of appearance are not powers of 2, there is no reason for the logarithm of p(x) to be an integer. So these numbers l* are not necessarily integers. Nevertheless, Shannon proved the following theorem: for every instantaneous code on an alphabet with cardinality A, the mean length of the code is at least the entropy of X. The subscript A here means that H_A is minus the sum of p(x) times the logarithm in base A of p(x) — not logarithm in base 2, logarithm in base A; that is the meaning of the subscript A. And this inequality holds as an equality if it happens that, for all x, the probabilities are inverse powers of the cardinality A raised to some integer. This does not happen in general; it is a condition on the probabilities. The proof is elementary: you introduce a new probability vector r in this way, and you compute the average length minus the entropy. Just replacing, you arrive at this expression; with the vector r it is elementary operations, I don't have to spend time on them. And you see, here you recognize the Kullback-Leibler divergence of the vectors p and r, and here the logarithm of one over the sum appearing in Kraft's inequality. But this is always non-negative, because D, the Kullback-Leibler divergence, is always non-negative, and since Kraft's inequality is verified, the logarithm of the inverse is also non-negative. So that gives you the inequality appearing in Shannon's theorem. And now, if X is a random variable with law p, then there exists an instantaneous code such that the average length of this code is between the entropy and the entropy plus 1. Again, the minimizers coming from the Lagrange-multiplier method are not necessarily integers.
Nevertheless, if I look at the interval from l* to l* plus 1, this is an interval of length 1, so it contains a unique integer. I call l(x) this unique integer. Is it clear, or shall I say it again? (Students reply that it is okay.) So now, if I look at the family of the l(x), this family necessarily fulfills Kraft's inequality, because of this. Hence, by Kraft's theorem, there exists an instantaneous code admitting the family of l's as lengths of its code words. And from this inequality — I recall that l* is just minus the logarithm of p(x) — multiplying by p(x) and summing over x, I get that the entropy is at most the average length, which is less than the entropy plus 1. So the entropy and the average length of the optimal code are within one unit; I mean, they are very close to one another. And you remember that the instantaneous codes are a subset of the uniquely decodable codes. So, since the set of uniquely decodable codes is larger than the set of instantaneous codes, you can ask whether there are uniquely decodable codes, verifying Kraft's inequality, that do better than the instantaneous codes. The answer is no, because McMillan's theorem says that every uniquely decodable code verifies Kraft's inequality, and conversely, if you have a family of integers fulfilling Kraft's inequality, then there exists a uniquely decodable code admitting lengths of this size. I don't prove this; the theorem is easy, but I don't want to spend time on it. Instead, I would like to give you an algorithm — there are many other algorithms, but I give you just one — of how to produce optimal codes. Let me see. So the algorithm goes like this. At the input of the algorithm you need the probability vector p, and at the end of the algorithm you will have a forest having two trees. A forest is just a family of trees.
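Shannon's bound can be checked numerically. A sketch, with an arbitrary probability vector of my own choosing (not from the lecture): take l(x) to be the unique integer in the interval [l*(x), l*(x) + 1), that is, the ceiling of minus log_A p(x), then verify Kraft's inequality and the sandwich H_A(X) ≤ average length < H_A(X) + 1.

```python
# Shannon code lengths l(x) = ceil(-log_A p(x)) and the entropy sandwich.
import math

def shannon_lengths(p, A=2):
    """The unique integer in [l*, l* + 1) for each symbol."""
    return [math.ceil(-math.log(px, A)) for px in p]

def entropy(p, A=2):
    """H_A(X) = -sum p(x) log_A p(x)."""
    return -sum(px * math.log(px, A) for px in p if px > 0)

p = [0.4, 0.3, 0.2, 0.1]          # arbitrary test distribution
lengths = shannon_lengths(p)       # [2, 2, 3, 4]
avg = sum(px * l for px, l in zip(p, lengths))
```

For this vector the Kraft sum is 1/4 + 1/4 + 1/8 + 1/16 < 1, so an instantaneous code with these lengths exists, and the average length 2.4 lies between the entropy (about 1.85 bits) and the entropy plus one.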
So the forest at the end will have two trees, and the algorithm goes like this. I fix first m to be the dimension of p, and I start with the empty forest. Now — sorry, just to tell you what a binary tree is: a binary tree has either zero or two children at every node; you cannot have one child. I can identify the set I have to encode with I. A binary tree is always represented in the following way: if I have a binary tree, I can represent it as a triple (r, T_L, T_R), where r is the root of the tree, T_L is the left subtree and T_R is the right subtree; these subtrees can be empty. So at the first step, I identify the elements with binary trees that are trivial — they contain only the roots — I assign to them weights equal to the components of the probability vector, and I add all these trees to the forest. Now, among all these trees, there are some that have minimal weight. I choose T1 and T2, the two lowest-weight trees, and at the next step I amalgamate them: I add a new vertex as the root of the amalgamation, I add this new tree to the forest and I remove the original trees T1 and T2, and I repeat until I arrive at a forest of cardinality two. To understand the algorithm you have to go through an example; otherwise it is totally obscure. So I give you an exercise, and I will explain the first two questions — well, the second question, by the way, is just to run through the previous algorithm. Is there a question? No. So, I come back: here I have a set of size 5, and the probability vector is one half, one quarter, one eighth, one sixteenth, one sixteenth. At the first step I have a forest composed of just the roots: trivial binary trees with zero children. That is the forest at the first step. The two trees with the smallest weight are D and E.
So in the next step I leave the other trees as they were, and I amalgamate the two: I add a new vertex as the root, so I now have a forest composed of four binary trees instead of five. The weights of the others have not changed, but the weight of this one has changed: it is now one eighth. Now, at the second step, I must again choose the two lowest-weight binary trees of F1, and these are C and DE — because here is one eighth and one eighth, here is one fourth, and here is one half. So in the next generation I leave A and B as they were, and I amalgamate the two trees here: here I have C and here I have DE, and the new weight will be one fourth. And I continue: at the end I amalgamate B and CDE, with weights one fourth and one fourth, giving one half. And this gives me the final forest F3: I write it here, it will be A, and then BCDE, with B on one side and CDE on the other, and below CDE you have all of this tree. Is it clear? Is it clear how to do it? Yes. Okay. Now, you see that I have a code, because if I decide, for instance, to assign zero to A and one to the other tree, then, looking at this code, I will have A, B, C, D, E, and the code will be: for A, zero; for B, one zero; for C, one one zero; for D, one one one zero; and for E, one one one one. So you see that I have constructed an instantaneous code, in the sense that no code word is a prefix of another. And if I take the extension of this code to X^+, then I can immediately decode and get the original word. Is it clear? Yes. Okay, so I leave you the other questions as homework. The first question is elementary — you just have to compute the entropy of this vector, it's nothing. And I have answered the second question.
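The amalgamation algorithm run above can be sketched compactly with a heap standing in for the forest. The example vector is the lecture's, reconstructed from the spoken weights as p = (1/2, 1/4, 1/8, 1/16, 1/16); the tuple representation and tie-breaking counter are implementation choices of mine. The lecture stops at a two-tree forest and then assigns 0 and 1 to the two roots; the loop below simply performs that last assignment as one more merge.

```python
# Huffman coding by repeated amalgamation of the two lowest-weight trees.
import heapq

def huffman(probs):
    """Return {symbol: codeword} for a dict of probabilities."""
    heap = [(p, i, (sym,)) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)                    # tie-breaker for equal weights
    codes = {sym: "" for sym in probs}
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)    # the two lowest-weight trees
        p2, _, t2 = heapq.heappop(heap)
        for sym in t1:
            codes[sym] = "0" + codes[sym]  # amalgamate: prepend branch labels
        for sym in t2:
            codes[sym] = "1" + codes[sym]
        heapq.heappush(heap, (p1 + p2, counter, t1 + t2))
        counter += 1
    return codes

p = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/16, "e": 1/16}
codes = huffman(p)
```

Since this vector is dyadic, the average length comes out exactly equal to the entropy, 1.875 bits, illustrating the equality case of Shannon's theorem.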
You have to answer the third, fourth and fifth questions — in particular this one. So think about that, and we will come back to it tomorrow. Okay. Are there any questions? Yes. It's good? Yes. Okay. So we come now to the next topic I should like to cover, which is channel coding. A channel is a very general notion. Initially it was meant to study the transmission of a coded message through a noisy medium. Presently it is used to mean an arbitrary transformation of a word over a finite alphabet into another word, over a possibly different finite alphabet. For instance, the genetic code I showed you before can be thought of as a channel: we have a transformation of a word in the alphabet of codons into a word in the alphabet of amino acids. So the input must be thought of as a random word X of arbitrary length over the input alphabet; the output will be another word of arbitrary length over the output alphabet; and the working of the channel will be encoded in its transmission probability, that is, the conditional probability of getting output Y given that the input was X. In this lecture I assume, for simplicity — otherwise there are simply too many letters — that the input and output words have the same length and that the input symbols are emitted by an independent source. With these hypotheses we could not study, for instance, the genetic code, because the codons on the DNA are not independent variables. But both hypotheses can be relaxed, at the price of more complicated formulas; this is only to simplify the presentation. The idea is that to transmit one bit of information, I must transport the precise state of a physical system that encodes the bit, and I must do this through a physical medium or process — that is the channel. So if the transmission vector is the electric current, an ideal channel would be an ideal copper wire, but a realistic channel is a copper wire with strictly positive resistivity.
The electric current does not go through unperturbed: it suffers some perturbation and its value changes, and if the copper wire is very long the value changes a lot, so something that encoded a zero at the beginning may encode a one at the end, or vice versa. If the transmission vector is a Hertzian beam, the ideal channel is empty space, but the realistic channel is the atmosphere, with its turbulence and the perturbations it inflicts on the beam. If the transmission vector is a laser beam, then again the ideal channel is either empty space or an ideal optical fibre, but the realistic channel is either the atmosphere or a fibre that is not 100% transparent. And if you come down to more elementary vectors — for instance one single photon, if you are doing quantum communication, in that case it is just one photon and not a laser beam — again it will be empty space or a fibre, and again the same realistic channels. For DNA, the ideal channel is the cell's replication machinery, but in a realistic channel there are mutations; some faults will appear. So any time you have transmission through a realistic system, you necessarily have transmission errors. It is impossible to transmit physical signals without errors. You can do it on paper, but in real life you will never transmit without errors. Here I give the definition of a channel — the Markovian modelling of a channel, so a simplified version. So a channel will be a triple. (There is a question? Is there a question? I hear some noise... no, it is added by the channel, and I don't understand what happens. Okay.) So the channel will be the triple where X is the input alphabet, Y is the output alphabet, and P is a stochastic matrix, an X-by-Y stochastic matrix. Is there a problem?
(Some back-and-forth with the remote participants: someone's microphone is open and adding noise; after a moment the problem is fixed.) [inaudible] ... So the rows of the matrix are indexed by X and the columns are indexed by Y. So maybe I give an example, to understand it. A first example is the memoryless binary channel: both alphabets are the binary alphabet {0, 1}, and I have a 2-by-2 matrix — the cardinalities are 2, so the matrix will be just a 2-by-2 matrix. I will use 1 minus e0, e0, e1, 1 minus e1. The condition we imposed is that, since it is a stochastic matrix, every line must sum to 1: here it is 1, and here it is 1. So, with a small probability e0, a 0 entering the channel becomes a 1, and with the complementary probability 1 minus e0 it is transmitted correctly as a 0. And the same for the symbol 1: with a small probability e1, the 1 becomes a 0, and with probability 1 minus e1 it is transmitted correctly. Now, if we consider words of length 3, suppose that the channel is fed with the word x, that is 010, and we get as output 110. What is the probability here? How can we compute this probability?
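This computation can be sketched numerically: because the channel is memoryless, the probability of an output word factorizes over positions, one factor per letter from the 2-by-2 matrix. The values of e0 and e1 below are arbitrary test numbers of mine.

```python
# Word transmission probability for the memoryless binary channel
# with matrix [[1-e0, e0], [e1, 1-e1]].

def word_probability(x, y, e0, e1):
    """P(output word y | input word x), letters multiplied independently."""
    P = {(0, 0): 1 - e0, (0, 1): e0,
         (1, 0): e1,     (1, 1): 1 - e1}
    prob = 1.0
    for xi, yi in zip(x, y):
        prob *= P[(int(xi), int(yi))]
    return prob

e0, e1 = 0.1, 0.2
# The lecture's example: x = 010, y = 110  ->  e0 * (1 - e1) * (1 - e0)
prob = word_probability("010", "110", e0, e1)
```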
Well, you see that here the first bit of the input, 0, has been changed into a 1, so that gives a factor e0. The second bit is transmitted without any change, so that gives 1 minus e1, and the third is also transmitted without any change, so that gives 1 minus e0. So the probability is e0 times (1 minus e1) times (1 minus e0). Is it clear? Now, the second example is to show you that the notion of channel is very general: a channel can also be thought of as a code. For instance, if I have X = {a, b, c, d} and A = {0, 1}, I can have a code sending each x to a word over this alphabet; for instance, I can consider the following instantaneous code. It can be viewed as a channel (X, Y, P) if I choose for Y not all the words of A^+, but only the images of the possible inputs; that means Y in this situation is just the set {0, 10, 110, 111}. And the matrix P in that case will be the stochastic matrix indexed by a, b, c, d and 0, 10, 110, 111 that is the identity matrix. This is a stochastic matrix — a deterministic stochastic matrix. Is it clear? So you see, a random variable can also be thought of as a channel. I have not spoken about the space (Omega, F, P) where the random variables are defined, but Raphael already spoke about the definition of a random variable on some abstract space Omega with values in X. In that case, a random variable is nothing else than a deterministic channel from Omega to X. Now, if the source law — the law of the random variable at the input of the channel — is the probability vector mu, and the channel transmission matrix is the stochastic matrix P, we can of course compute the source entropy: this will be minus the sum over x of mu(x) log mu(x). The joint input-output law — the joint probability of having input x and output y — I call it kappa; the corresponding probability vector can be obtained through the
expression kappa(x, y) = mu(x) P(x, y). So you see that what I did yesterday about Markov chains, I find it again, because I have a stochastic matrix allowing me to pass from input to output. Since I have the joint law, I can also compute the joint entropy: this will be minus the sum over x and y of kappa(x, y) log kappa(x, y). I can compute the output law: this is the marginal obtained by summing kappa over the input, and, as I said yesterday about Markov chains, this will be (mu P)(y). Of course I can also compute the entropy of the output variable, and I can compute conditional entropies and the mutual information. These quantities are compared in the following figure: if the length of this strip is supposed to represent the value of H(X, Y), the joint entropy, then what remains if I subtract the entropy of X is the conditional entropy of Y given X; if I subtract H(Y), what remains is the conditional entropy of X given Y; and what is in between is just the mutual information. Now, channels come in many different variants. The lossless channels are those characterized by the fact that the conditional entropy of X given Y is zero. What does that mean? It means that if the output is known, there is no residual uncertainty on the input; equivalently, you can show that the mutual information is just equal to the uncertainty of the input. The deterministic channels are those whose transmission matrix is deterministic; that means for every x there is precisely one y for which the matrix element is equal to one, and for all other elements it is zero. And if mu is the source law, the joint law in that situation will be like that, and it follows that the
deterministic channel is one where the entropy of Y given X is equal to zero, and that means that the mutual information is just the uncertainty of Y: if the input is known, the residual uncertainty on the output is zero. The noiseless channel is a channel that is both lossless and deterministic; that means that the mutual information is equal to H(X) and equal to H(Y). A useless channel is a channel where the mutual information is zero. If I go through the computation, that means that the uncertainty of X is equal to the conditional uncertainty of X given Y, and the same with the roles of X and Y exchanged; that means the input X and the output Y are independent variables. In that situation the channel carries no information; the channel acts as a random number generator. Knowing X you learn nothing about Y, and knowing Y you learn nothing about X. And the final class is the symmetric channels, which come in the next slide. If I denote by S_n the permutation group on n objects, and (X, Y, P) is a memoryless channel — yes, there's something here — I assume that p is a probability vector on Y, and z is a vector of size the cardinality of X, with values between zero and one, but it is not a probability vector; it is an arbitrary vector, I impose no normalization on it. Suppose that for every x there is a permutation of the set of cardinality |Y| such that line x of the matrix is that permutation of the given vector p, and every column y is a permutation of the vector z. Then the channel will be called symmetric. In this situation things are much easier, so the symmetric channel is a good laboratory to test the ideas we have already discussed. So let me give some examples of these types of channels. The first one is the lossless channel, the one verifying this condition on the conditional entropy, and this
implies that the mutual information is just H(X). So suppose that X = {x_1, ..., x_m}, and suppose that Y is partitioned into m non-empty subsets B_1, ..., B_m. I have the stochastic matrix P(x, y), and if I now look at the probability that Y belongs to the set B_i given that X = x_i, this is the sum over y in B_i of P(x_i, y), and it is equal to one. Schematically, we can picture it like this: each of the elements x_1, ..., x_m is mapped into a different one of the disjoint subsets B_i. Now look at the probability that X = x_i given that Y belongs to B_i: in this situation you get that it is equal to one. That means that if you know the output, the input is determined. Okay, the second kind of channel is the deterministic channel. Suppose that we have a deck of 52 cards, considered as the Cartesian product X_1 x X_2, where X_1 = {1, ..., 10, J, Q, K} is the set of ranks and X_2 is the set of suits. You understand: in a deck of cards, any card comes with a rank on it and a suit on it. I denote by X a random card in X_1 x X_2, and Y will be just the suit on it. You see that if you know X, you know everything: you know both the X_1 and the X_2 component of it. So if you know X, you know Y precisely, and the mutual information between X and Y will be I(X;Y) = H(Y) - H(Y|X). But H(Y|X) is of course zero, because once you know X there is no uncertainty on Y. And H(Y), if the distribution is uniform, is the logarithm of the cardinality of X_2, so it will be log_2 4 = 2 bits. So this
is a deterministic channel. An example of a noiseless channel is something that is both lossless and deterministic; you can represent it like this: a is mapped to a', b is mapped to b', and so on, and there is absolutely no perturbation if you go along that way. The mutual information is then the uncertainty you have on X, and it equals the uncertainty you have on Y. Now I come to the symmetric channel; that is the most interesting one, and I give you two examples. P_1 is a symmetric channel, with X = {0, 1} and Y = {a, b, c, d} for instance, so the two alphabets do not have the same cardinality. It is symmetric because each row is a permutation of the same probability vector, and each column, with components (1/3, 1/6), is a permutation of the same vector; that column vector is not a probability vector, but nevertheless every other column is just a permutation of it. So P_1 really corresponds to a symmetric channel. I give you a second example; this time it will be

P_2 = [ 1/2  1/3  1/6
        1/6  1/2  1/3
        1/3  1/6  1/2 ]

Here the sets X and Y have the same cardinality, and moreover this stochastic matrix is doubly stochastic, in the sense that not only the rows but also the columns are probability vectors. Anyway, this again corresponds to a symmetric channel. So what will be the mutual information here? It will be I(X;Y) = H(Y) - H(Y|X) = H(Y) - H(p), where p = (1/2, 1/3, 1/6), because the conditional entropy of Y given X is essentially the entropy of this vector: the second row is one permutation of p and the third row is another permutation of p, and the
permutation does not change the entropy of a vector; a permutation leaves the entropy unchanged. So in that situation I can compute the mutual information explicitly; it is given like that. Is it clear? Maybe I am much slower than I had planned to be, so tomorrow I will finish with channels, and the last topic I would like to cover is Kolmogorov complexity. This is a difficult topic; I had supposed I would spend maybe one lecture and a half on it, but I have only half a lecture to do that, so it will be a very concise résumé of what I had planned to explain. I prefer to finish the channels, and maybe solve the exercises, rather than give you in a hurry the difficult notion of Kolmogorov complexity. Have you any questions?

"Could you please repeat the last few sentences? We got disconnected. Regarding what we shall do tomorrow."

So what I was saying is that I am much slower than I had planned, and I had predicted that I could spend one lecture and a half speaking about Kolmogorov complexity, because Kolmogorov complexity is a difficult topic. But I do not have one lecture and a half; I have half a lecture remaining for Kolmogorov complexity. So I will give you a very short résumé of the idea of Kolmogorov complexity, and I prefer to finish the material about channels somewhat properly instead of giving the notion of Kolmogorov complexity in a hurry. In any case, the notes about Kolmogorov complexity will be more explicit, because I have written the notes for one lecture and a half, but I cannot go through all of that; it is too long. Okay, so is there any other question? Try to solve the exercises I have proposed to you (I am speaking to the students now); they are not difficult, but you must think about the questions. Okay, if not, we stop here. Have a good appetite; I will be here tomorrow anyway.
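To make the channel classification from this lecture concrete, here is a minimal Python sketch (my own illustration, not part of the lecture; the helper names `entropy` and `mutual_information` are hypothetical). It checks numerically that a deterministic channel gives I(X;Y) = H(Y) and that a useless channel, whose rows are all identical, gives I(X;Y) = 0.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def mutual_information(p_x, P):
    """I(X;Y) = H(Y) - H(Y|X) for input law p_x and channel matrix P,
    where row P[x] is the conditional law p(. | x) on Y."""
    n_y = len(P[0])
    p_y = [sum(p_x[x] * P[x][y] for x in range(len(p_x)))
           for y in range(n_y)]
    h_y_given_x = sum(p_x[x] * entropy(P[x]) for x in range(len(p_x)))
    return entropy(p_y) - h_y_given_x

p_x = [1/3, 1/3, 1/3]  # uniform input distribution

# Deterministic channel: every row has a single 1, so H(Y|X) = 0
# and I(X;Y) = H(Y).
P_det = [[1, 0], [0, 1], [1, 0]]

# Useless channel: all rows identical, so X and Y are independent
# and I(X;Y) = 0.
P_useless = [[1/2, 1/2], [1/2, 1/2], [1/2, 1/2]]

print(mutual_information(p_x, P_det))      # = H(2/3, 1/3) ≈ 0.918 bits
print(mutual_information(p_x, P_useless))  # ≈ 0
```

The same `mutual_information` function works for any finite memoryless channel given as a stochastic matrix, so it covers all the examples of the lecture.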
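The card-deck example of a deterministic channel can be checked the same way. This sketch (again my own, assuming a standard 52-card deck with 13 ranks and 4 suits, as in the lecture) computes H(Y) for the suit Y of a uniformly drawn card.

```python
import math
from itertools import product

def entropy(p):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# X1: the 13 ranks, X2: the 4 suits; X is a uniform card in X1 x X2
# and Y is just the suit of the drawn card.
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['spades', 'hearts', 'diamonds', 'clubs']
deck = list(product(ranks, suits))  # 52 equally likely cards

# Marginal law of Y: each suit appears 13 times out of 52.
p_y = [sum(1 for card in deck if card[1] == s) / len(deck) for s in suits]

# Knowing the card X fixes the suit Y, so H(Y|X) = 0 and
# I(X;Y) = H(Y) = log2(4) = 2 bits.
print(entropy(p_y))  # 2.0
```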
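Finally, for the doubly stochastic symmetric channel P_2 given in the lecture, the identity I(X;Y) = H(Y) - H(p) with p = (1/2, 1/3, 1/6) can be verified numerically. This is a sketch under the assumption of a uniform input distribution (the lecture leaves the input law unspecified).

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Doubly stochastic symmetric channel from the lecture: every row and
# every column is a permutation of p = (1/2, 1/3, 1/6).
P = [[1/2, 1/3, 1/6],
     [1/6, 1/2, 1/3],
     [1/3, 1/6, 1/2]]
p = [1/2, 1/3, 1/6]

# H(Y|X) = H(p) for any input law, since a permutation leaves the
# entropy unchanged; hence I(X;Y) = H(Y) - H(p).
p_x = [1/3, 1/3, 1/3]  # uniform input (an assumption for this example)
p_y = [sum(p_x[x] * P[x][y] for x in range(3)) for y in range(3)]
i_xy = entropy(p_y) - entropy(p)
print(i_xy)  # = log2(3) - H(1/2, 1/3, 1/6) ≈ 0.126 bits
```

With a uniform input and a doubly stochastic matrix the output law is also uniform, so here H(Y) = log2(3) and the mutual information is about 0.126 bits.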