lecture is information and mutual information. Suppose a die has faces 1 and 4 colored red and faces 2, 3, 5 and 6 colored black. Suppose this die was rolled and we were told that the outcome was 4. Clearly we were given all the information concerning the outcome of the experiment. If instead we were told that the outcome was red, we would agree that we were given some information, but not all: the outcome is narrowed to one of two possibilities. On the other hand, if we were told that the outcome was black, we would feel that we were given even less information: the outcome is narrowed only to one of four possibilities.

As another example, suppose that after six weeks of the semester students were told that there will be a one-hour examination. Such an announcement contains a certain amount of information. However, if the students were told after only one week of classes that there will be a one-hour examination, we would say the announcement contains much more information, because it is quite unexpected that an examination would be scheduled after only one week of classes. A surprising statement carries more information.

From these examples it is clear that it is important to measure quantitatively how much information a given message carries. If a statement tells us of the occurrence of an event that is likely to happen, we would say the statement contains a small amount of information. On the other hand, if a statement tells us of the occurrence of an event that is not likely to happen, we would say the statement contains a large amount of information. This observation suggests that the information contained in a statement asserting the occurrence of an event depends on the probability of occurrence of the event. The information contained in a statement asserting the occurrence of an event is −log P, where P is the probability of occurrence of that event and the logarithm is taken to base 2, so that information is measured in bits.
Since P ≤ 1, −log P is always nonnegative. It is also clear that the smaller the value of P, the larger the quantity −log P: the less probable an event, the more information is carried by the statement asserting its occurrence. Returning to the opening example, when we were told that the outcome of rolling the die was 4, the amount of information we received is −log(1/6) = log 6 ≈ 2.585 bits. On the other hand, when we were told that the outcome was red, the amount of information we received is −log(2/6): if red has occurred, the number of favorable faces is two, namely 1 and 4, so the probability is 2/6 and the information is −log(2/6) = log 3 ≈ 1.585 bits.

As another example, suppose we receive from a computer as output a binary digit, either 0 or 1, with equal probability of occurrence. When we are told that the output is indeed 1, the amount of information we receive is −log(1/2) = 1 bit, because there are only two equally likely possibilities, 0 and 1. Similarly, when we are told that the output is 0, the amount of information we receive is also −log(1/2) = 1 bit. Now suppose we receive 32 binary digits from the computer as output. Assuming all 2^32 possibilities are equally likely, each occurs with probability 1/2^32.
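As a quick numerical check, the self-information values worked out in these examples can be computed in a few lines of Python. This is a minimal sketch; the helper name `self_information` is an illustrative choice, not standard library vocabulary.

```python
import math

def self_information(p: float) -> float:
    """Information, in bits, of a statement asserting an event of probability p."""
    return -math.log2(p)

# Die with faces 1 and 4 red and faces 2, 3, 5, 6 black:
print(self_information(1 / 6))      # outcome "4":   log2(6) ≈ 2.585 bits
print(self_information(2 / 6))      # outcome "red": log2(3) ≈ 1.585 bits

# A single equally likely binary digit, and 32 of them:
print(self_information(1 / 2))      # 1.0 bit
print(self_information(1 / 2**32))  # 32.0 bits
```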
So the amount of information is −log(1/2^32) = 32 bits.

At this stage I would like to introduce the notion of mutual information. Suppose we were told that the outcome of rolling the die is red; how much does that help us to determine that the outcome is a 4? Suppose we were told that the professor will be out of town tomorrow; how much does that help us to determine whether there will be a one-hour examination tomorrow? Questions of this kind are tackled by mutual information. We want to know the amount of information concerning the occurrence of event a that is contained in the statement asserting the occurrence of event b, which we shall denote by I(a, b). Now, −log P(a) is the amount of information contained in a statement asserting the occurrence of event a, and −log P(a|b) is the amount of information contained in a statement asserting the occurrence of a given that b has occurred, so the difference between these two quantities is the amount of information on the occurrence of a provided by the assertion that b has occurred. In other words, we need −log P(a) bits of information to assert the occurrence of event a, and we still need −log P(a|b) bits of information to assert the occurrence of a after we are told that event b has occurred. Thus the information provided by the occurrence of event b on the occurrence of event a is I(a, b) = −log P(a) − (−log P(a|b)), where P(a|b) is the conditional probability.
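This definition translates directly into code. Below is a minimal sketch; the helper name `mutual_information` is an illustrative choice, and the second argument is the conditional probability P(a|b).

```python
import math

def mutual_information(p_a: float, p_a_given_b: float) -> float:
    """I(a, b) = -log2 P(a) + log2 P(a|b), in bits."""
    return -math.log2(p_a) + math.log2(p_a_given_b)

# Die example from earlier: a = "4 appeared", b = "red appeared".
# P(a) = 1/6; given red, only faces 1 and 4 remain, so P(a|b) = 1/2.
print(mutual_information(1 / 6, 1 / 2))  # ≈ 1.585 bits
```

Note that the function returns a negative value whenever P(a|b) < P(a), which matches the case discussed below where the occurrence of b makes a less likely.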
So this is equal to −log P(a) + log P(a|b). For example, let a be the event that 4 appeared and b the event that red appeared when the die was rolled. Then I(a, b) = −log P(a) + log P(a|b) = −log(1/6) + log(1/2): the event a is that 4 appeared, one of six equally likely faces, so P(a) = 1/6; and given that red appeared there are only two possibilities, 1 and 4, so P(a|b) = 1/2. We get I(a, b) = 2.585 − 1 = 1.585 bits.

Now replace the event b by the event c that an even number appeared. We then have to find I(a, c) = −log P(a) + log P(a|c). The first term is the same, −log(1/6); for the second, given that an even number occurred there are only three possibilities, 2, 4 and 6, so 4 occurs with probability 1/3 and the term is log(1/3). We get I(a, c) = 2.585 − 1.585 = 1 bit.

From this we can observe that if P(a|b) is large, the occurrence of b indicates a strong possibility of the occurrence of a, and consequently I(a, b) is large. However, if P(a|b) is small, the occurrence of b does not tell us much about the occurrence of a, and consequently I(a, b) is small. As a matter of fact, the occurrence of event b may even mean that event a is less likely to occur; in that case P(a|b) is smaller than P(a) and I(a, b) becomes negative. Let us also examine some extreme cases. Suppose that b is a subset of a in the sample space S. In that case,
intuitively, the occurrence of b assures the occurrence of a. Since P(a ∩ b) = P(b), it follows that P(a|b) = 1 and −log P(a|b) = 0; that is, the mutual information provided by the assertion that b has occurred on the occurrence of a is equal to the information provided by the assertion that a has occurred. Now suppose instead that b is the whole sample space. In that case P(a ∩ b) = P(a), so −log P(a|b) = −log P(a) and I(a, b) = 0, which means that the occurrence of b tells us nothing about the occurrence of a.

Here we take an example. Consider the problem of estimating the likelihood that there will be a one-hour examination when the professor is scheduled to go out of town. Let S = {x1, x2, x3, x4} be the sample space, where the sample points represent the four possible outcomes: x1, the professor is out of town and the examination is given; x2, the professor is out of town and the examination is not given; x3, the professor is in town and the examination is given; x4, the professor is in town and the examination is not given. We also assign probabilities to these outcomes: P(x1) = 1/2, P(x2) = 1/16, P(x3) = 3/16 and P(x4) = 1/4.
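The quantities derived in the rest of this example follow directly from these four outcome probabilities. Here is a sketch of the computation in Python; the variable names are illustrative choices.

```python
import math

# Outcome probabilities from the lecture:
# x1: out of town, exam given;  x2: out of town, no exam;
# x3: in town, exam given;      x4: in town, no exam.
p = {"x1": 1 / 2, "x2": 1 / 16, "x3": 3 / 16, "x4": 1 / 4}

p_a = p["x1"] + p["x3"]      # P(exam is given)         = 11/16
p_b = p["x1"] + p["x2"]      # P(professor out of town) = 9/16
p_c = p["x3"] + p["x4"]      # P(professor in town)     = 7/16

p_a_given_b = p["x1"] / p_b  # P(a|b) = 8/9
p_a_given_c = p["x3"] / p_c  # P(a|c) = 3/7

i_ab = -math.log2(p_a) + math.log2(p_a_given_b)  # ≈ 0.37 bits
i_ac = -math.log2(p_a) + math.log2(p_a_given_c)  # ≈ -0.68 bits (negative)
print(i_ab, i_ac)
```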
Let a denote the event that an exam is given and b the event that the professor is out of town. Note that P(a) = 1/2 + 3/16 = 11/16: P(a) is the probability that an exam is given, and an exam is given in the two outcomes x1 and x3, so we add P(x1) and P(x3). Next we find the conditional probability P(a|b) = P(a ∩ b)/P(b). The event b, that the professor is out of town, occurs in the two outcomes x1 and x2, so the denominator is P(b) = P(x1) + P(x2) = 9/16. In the numerator we need P(a ∩ b), the probability that both a and b occur, that is, the exam is given and the professor is out of town; this is exactly the outcome x1, so P(a ∩ b) = P(x1) = 1/2. Simplifying, P(a|b) = (1/2)/(9/16) = 8/9. The information needed to assert that an exam will be given is −log P(a) = −log(11/16) = −log 11 + log 16 = −3.46 + 4 = 0.54 bits. The information provided by the professor being out of town on the fact that an exam will be given is I(a, b) = −log(11/16) + log(8/9). How does this come about? Here a is the event that an exam is given and b is the event that the professor is out of town, and the mutual information is I(a, b) = −log P(a) + log P(a|b), where −log P(a) = −log(11/16) and P(a|b) we
have already found to be 8/9, so the second term is log(8/9), and the resulting value is I(a, b) = 0.37 bits.

Now let c denote the event that the professor is in town. Then P(a|c) = P(a ∩ c)/P(c). P(c) is the probability that the professor is in town, which covers the two outcomes x3 and x4, so P(c) = P(x3) + P(x4) = 3/16 + 1/4 = 7/16. In the numerator we need the probability that both a and c occur, that is, the professor is in town and the examination is given; this is the outcome x3, so P(a ∩ c) = 3/16. Simplifying, P(a|c) = (3/16)/(7/16) = 3/7. So the mutual information concerning these two events is I(a, c) = −log(11/16) + log(3/7) ≈ −0.68 bits. The fact that the professor is in town makes it less likely that an examination will be given: compare P(a|c) = 3/7 with P(a|b) = 8/9. Consequently the mutual information provided by the presence of the professor on the occurrence of an examination is a negative quantity, approximately −0.68 bits, as we have just computed.

Now let us consider another example. Let us draw the figure first: on one side is the transmission end and on the other the receiving end. This is a simple model of a communication channel known as the binary symmetric channel. At the transmission end either 0 or 1 is transmitted, and at the
receiving end either 0 or 1 is received. Specifically, when 0 is transmitted, 0 is received with probability 1 − ε and 1 is received with probability ε; when 1 is transmitted, 0 is received with probability ε and 1 is received with probability 1 − ε. This is the kind of communication channel we are considering. Suppose we have two equally likely messages m1 and m2 to be transmitted over the channel using the representations 000 and 111, respectively. If 010 was received, we can compute the mutual information between the event that message m1 was transmitted and the event that either part or the whole of the sequence 010 was received. Starting with the first received symbol, this is I(m1, 0) = −log P(m1) + log P(m1|0), where P(m1|0) is the probability that m1 was transmitted given that 0 was received. We can calculate this value as follows. First, −log P(m1) = −log(1/2), because m1 and m2 are equally likely, so each is transmitted with probability 1/2. Next we have to find the conditional probability P(m1|0) = P(m1 ∩ 0)/P(0), the probability that both events occur, that m1 is transmitted and 0 is received, divided by the probability that 0 is received. The denominator can be written as P(m1)P(0|m1) + P(m2)P(0|m2); this is nothing but P(0) by the law of total probability, because there are only two possibilities, m1 and m2, so that
is why either m1 or m2 must have been transmitted. Here P(m1)P(0|m1) is the probability that m1 is transmitted times the probability that 0 is then received, and P(m2)P(0|m2) is the corresponding term for m2, where P(0|m2) is the probability that 0 is received given that m2 was transmitted. So the whole expression is P(m1|0) = P(m1)P(0|m1) / [P(m1)P(0|m1) + P(m2)P(0|m2)], and these probabilities we now find. In the numerator, P(m1) = 1/2 and P(0|m1) = 1 − ε, since given that m1, that is 000, was transmitted, 0 is received with probability 1 − ε; so the numerator is (1/2)(1 − ε). In the denominator the first term is the same as the numerator, and the second term is (1/2)ε. So we get I(m1, 0) = −log(1/2) + log[(1/2)(1 − ε) / ((1/2)(1 − ε) + (1/2)ε)], and finally the value 1 + log(1 − ε).

Next we consider I(m1, 01), because we are finding the mutual information between the event that m1 was transmitted and the event that the whole sequence 010 or a part of it was received; one part, the single symbol 0, we have already considered, and now we consider the prefix 01. This is I(m1, 01) = −log P(m1) + log P(m1|01), so let us find the conditional probability P(m1|01). This is P(m1 ∩ 01) divided by
the probability of 01. Let us write it out fresh: P(m1|01) = P(m1 ∩ 01)/P(01) = P(m1)P(01|m1) / [P(m1)P(01|m1) + P(m2)P(01|m2)]. In the numerator, P(m1) is again 1/2, and the conditional probability is P(01|m1) = (1 − ε)ε: given that m1, that is 000, was transmitted, the first received symbol 0 occurs with probability 1 − ε and the second received symbol 1 occurs with probability ε. In the denominator we have (1/2)(1 − ε)ε plus the term for m2, which gives the same expression, (1/2)ε(1 − ε). So this value is P(m1|01) = 1/2, and as a result I(m1, 01) = −log(1/2) + log(1/2) = 0: this mutual information is equal to zero. In the same way, if we find I(m1, 010), it is −log(1/2) + log[(1/2)(1 − ε)^2 ε / ((1/2)(1 − ε)^2 ε + (1/2)ε^2 (1 − ε))], which gives 1 + log(1 − ε).

So knowing that either 0 or 010 was received tells us exactly the same amount of information about the transmission of message m1; however, knowing that the sequence 01 was received tells us nothing about the transmission of m1. This is actually expected, because the transmission of either m1 or m2 would yield the sequence 01 at the receiving end with the same probability.

Now let us discuss one result about mutual information. I(a, b) = −log P(a) + log P(a|b), and that is equal to −log P(a) − log P(b) + log P(a ∩ b), because P(a|b) = P(a ∩ b)/P(b). So
we can write this as −log P(b) − log P(a) + log P(a ∩ b) = −log P(b) + log[P(a ∩ b)/P(a)] = −log P(b) + log P(b|a), and this is nothing but I(b, a). This shows that mutual information is a symmetric measure of the information concerning two events: I(a, b) is a measure of the mutual information from b to a as well as from a to b. That is all, thank you.
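Both the binary symmetric channel calculations and the symmetry result can be checked numerically. The sketch below assumes a concrete crossover probability ε = 0.1 (any value strictly between 0 and 1 shows the same pattern), and the function names are illustrative choices.

```python
import math

EPS = 0.1  # assumed crossover probability of the channel

def posterior_m1(like_m1: float, like_m2: float) -> float:
    """P(m1 | received) for two equally likely messages, by Bayes' rule."""
    return 0.5 * like_m1 / (0.5 * like_m1 + 0.5 * like_m2)

def info(p_a: float, p_a_given_b: float) -> float:
    """I(a, b) = -log2 P(a) + log2 P(a|b), in bits."""
    return -math.log2(p_a) + math.log2(p_a_given_b)

# Likelihoods of the received prefixes of 010 under m1 = 000 and m2 = 111:
i_0   = info(0.5, posterior_m1(1 - EPS, EPS))                     # 1 + log2(1-eps)
i_01  = info(0.5, posterior_m1((1 - EPS) * EPS, EPS * (1 - EPS))) # exactly 0
i_010 = info(0.5, posterior_m1((1 - EPS) ** 2 * EPS, EPS ** 2 * (1 - EPS)))
print(i_0, i_01, i_010)  # i_010 equals i_0 up to rounding

# Symmetry check with the first symbol: I(m1, 0) should equal I(0, m1).
p_0 = 0.5 * (1 - EPS) + 0.5 * EPS  # total probability of receiving 0 (= 1/2)
i_sym = info(p_0, 1 - EPS)         # -log2 P(0) + log2 P(0|m1)
print(abs(i_0 - i_sym) < 1e-9)
```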