Hello everyone, welcome to this lecture on entropy, joint entropy and conditional entropy. At the end of this session, students will be able to relate conditional and joint entropies and to determine information in terms of probability and entropy.

Before starting the actual session, let us pause the video and think about what is meant by the rate of information. It is nothing but the average number of bits of information per second in a communication system. We know that the average number of bits required per message is H, the entropy, and that the system generates messages at a rate of r messages per second. So the rate of information is R = r × H bits per second.

Now let us start with entropy. What is meant by entropy? It is the average information of the messages, and it characterizes the unpredictability of a random variable. It is not only the average of the information; it also reflects how frequently each message occurs in the communication system or channel. For example, suppose the source in a communication system generates different messages. One message has probability 0.1, so out of 1000 transmissions it is transmitted about 100 times. A second message has occurrence probability 0.15, so it is transmitted about 150 times. Comparing the two at the receiver side, the second message occurs 50 percent more often than the first one, which occurred 100 times.

Now let us derive the equation for entropy. Suppose you have M different messages m1, m2, ..., mM, and for those M messages the respective probabilities are p1, p2, ..., pM. During a particular time interval, L messages are generated, where L is much greater than M; M is the number of different messages and L is the number of messages generated in that time slot. Consider message m1: out of the L messages generated in that interval, the number of occurrences of m1 is p1 × L, its probability times the total number of messages. The amount of information associated with a single occurrence of a message is log(1/p), so for m1 it is log(1/p1). Therefore the amount of information in all the m1 messages is the number of occurrences, p1 × L, multiplied by the information of a single occurrence, log(1/p1). You have M messages in total, so you have to find the amount of information contributed by each message and then add them all together.
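As a quick numerical check of this counting argument, here is a minimal Python sketch; the probabilities, the interval length L and the message rate r are assumed values for illustration, not numbers from the lecture.

```python
import math

# Hypothetical source: M = 3 messages with assumed probabilities (they sum to 1).
probs = [0.1, 0.15, 0.75]
L = 1000                    # messages generated in the observation interval (L >> M)

total_info = 0.0
for p in probs:
    occurrences = p * L                      # expected number of times this message appears
    info_per_occurrence = math.log2(1 / p)   # information of one occurrence, in bits
    total_info += occurrences * info_per_occurrence

H = total_info / L          # average information per message = entropy
print(f"Entropy H = {H:.4f} bits/message")

r = 50                      # assumed message rate, messages per second
print(f"Information rate R = r * H = {r * H:.2f} bits/second")
```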
So the total amount of information in the L messages generated in that time interval is p1 L log(1/p1) for message m1, plus p2 L log(1/p2) for message m2, and so on up to message mM. If you divide this total by L, the number of messages, you get the average information, which is nothing but the entropy. When you divide by L, the L cancels from every term, and the remaining part is

H = Σ_{k=1}^{M} p_k log(1/p_k),

which is the general form of the entropy. This can be rewritten as H = −Σ_{k=1}^{M} p_k log p_k.

Now suppose there is only one message, that is M = 1, with probability 1. In that case the sum runs from 1 to 1, so there is only one term: 1 × log(1/1), and log 1 = 0, so the entropy is 0. That means the average information in that message at the receiver side is zero: the higher the probability, the less information you get. We already saw that in the previous session.

Next, consider a binary source generating two outputs. If one output has probability 1 and the other has probability 0, then substituting into the previous equation gives H = 1 × log(1/1) = 0, since log 1 = 0, so the overall entropy is 0. For a binary source we can plot the entropy as a function of the probability: the probability ranges from 0 to 1, and the entropy also ranges from 0 to 1. From the figure we can see that the maximum entropy is 1, and it occurs at a probability of 0.5. If you take 0.5 as the probability of each of the two outputs of the binary source, then H = 0.5 log2(2) + 0.5 log2(2) = 1 bit, which means you require only a single bit to represent the output at the receiver side; we saw the same example in the previous session. This maximum value of the entropy is also called H_max, and the condition for it is that the probability should be 0.5. So overall we can say that for M = 2, that is, only two messages each with probability 0.5, you require only 1 bit per message.

That is the binary case. What happens for M-ary messages, that is, an arbitrary number M of messages? What will H_max be, or how many bits are required per message? That can be derived using the same equation. Suppose you have M messages and all of them are equally likely, meaning every message has the same probability 1/M. In that case you can rewrite the entropy equation H = Σ_{k=1}^{M} p_k log(1/p_k) with all probabilities equal to 1/M, and since there are M such terms,
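The binary-entropy curve and the equally likely M-ary case can be reproduced with a short sketch; the probability grid and the value M = 8 below are illustrative choices, not numbers from the lecture.

```python
import math

def entropy(probs):
    """H = sum_k p_k * log2(1/p_k); zero-probability terms contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Binary source: H(p) = p log2(1/p) + (1-p) log2(1/(1-p)), maximal (1 bit) at p = 0.5.
for p in [0.0, 0.1, 0.25, 0.5, 0.75, 1.0]:
    print(f"p = {p:.2f}  ->  H = {entropy([p, 1 - p]):.4f} bits")

# M-ary source with equally likely messages: H_max = log2(M) bits per message.
M = 8
print(f"M = {M}, equiprobable: H = {entropy([1 / M] * M):.4f}, log2(M) = {math.log2(M):.4f}")
```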
So the sum is M multiplied by (1/M) log M; the M's cancel, and the final result is H_max = log M bits per message.

Some important properties of entropy: log M ≥ H ≥ 0. Here H is the entropy and log M is the maximum entropy we just derived, and both are always greater than or equal to 0; the entropy is a non-negative quantity, always 0 or greater. H = 0 means that one message has probability unity and all the other messages have probability 0. And H = log M if all the probabilities are equal: as we saw above, when all the probabilities are equal the entropy is the maximum entropy.

Now let us look at joint entropy. What is meant by joint entropy? It is the average information of the joint probability distribution. Suppose you have two events X and Y and you consider the joint outcome of the two events; that is described by the joint probability distribution. For example, suppose you describe people by hair colour and by eye colour: hair colour gives the set X and eye colour gives the set Y, with four different values of x and three different values of y, for example brown, black or white hair, and brown eyes, and so on. For that you can write the joint probability distribution p(x, y), and using it the joint entropy is

H(X, Y) = Σ_{j=1}^{M} Σ_{k=1}^{N} p(x_j, y_k) log(1 / p(x_j, y_k)),

which is the joint probability distribution times the log of one over the joint probability distribution. This is the same equation we saw in the previous slide; the only difference is that instead of the probability of a single event we take the joint probability of two events, and there are two summations because there are two events X and Y.

Next is conditional entropy. The average conditional self-information is called the conditional entropy, and it is given by

H(X | Y) = Σ_{j=1}^{M} Σ_{k=1}^{N} p(x_j, y_k) log(1 / p(x_j | y_k)).

How do we interpret this? H(X | Y) is the information remaining in X, the source, after you have observed Y, the receiver; in the other direction, H(Y | X) is the information in the receiver given that the source has been observed. Here both X and Y are random variables, and some relationships between the entropies are: the joint entropy equals the conditional entropy plus the entropy of the receiver, H(X, Y) = H(X | Y) + H(Y), and the joint entropy equals the conditional entropy plus the entropy of the transmitter or source, H(X, Y) = H(Y | X) + H(X). How these are derived, we will see in the next video. These are the references. Thank you.
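To make the joint and conditional entropies and the two relations above concrete, here is a small numerical sketch; the 2×3 joint distribution is invented for illustration, since the lecture's hair-colour/eye-colour example does not give any numbers.

```python
import math

def H(probs):
    """Entropy of a probability list, in bits; zero-probability terms contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Assumed joint distribution p(x_j, y_k): rows index X (2 values), columns index Y (3 values).
# The numbers are illustrative only; they sum to 1.
p_xy = [
    [0.10, 0.20, 0.10],
    [0.25, 0.05, 0.30],
]
rows, cols = len(p_xy), len(p_xy[0])

p_x = [sum(row) for row in p_xy]                                    # marginal p(x_j)
p_y = [sum(p_xy[j][k] for j in range(rows)) for k in range(cols)]   # marginal p(y_k)

# Joint entropy H(X,Y) = sum_{j,k} p(x_j, y_k) log2(1 / p(x_j, y_k)).
H_xy = H([p for row in p_xy for p in row])

# Conditional entropies from the definition, using p(x_j | y_k) = p(x_j, y_k) / p(y_k), etc.
H_x_given_y = sum(p_xy[j][k] * math.log2(p_y[k] / p_xy[j][k])
                  for j in range(rows) for k in range(cols) if p_xy[j][k] > 0)
H_y_given_x = sum(p_xy[j][k] * math.log2(p_x[j] / p_xy[j][k])
                  for j in range(rows) for k in range(cols) if p_xy[j][k] > 0)

print(f"H(X,Y)        = {H_xy:.4f} bits")
print(f"H(X|Y) + H(Y) = {H_x_given_y + H(p_y):.4f} bits")  # equals H(X,Y)
print(f"H(Y|X) + H(X) = {H_y_given_x + H(p_x):.4f} bits")  # equals H(X,Y)
```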