Hello everyone, and welcome to this lecture on Introduction to Information Theory. At the end of this session, students will be able to articulate the amount of information per symbol and the rate of information.

Before starting the actual session, let us pause and recall what is meant by a communication system. It is nothing but a transmitter, a channel, and a receiver. The transmitter is the source of the information: it generates the information, which is communicated to the receiver through the channel. The receiver is the destination of the information: it receives the information coming from the channel.

Now let us start with what is meant by information. Information is generated in several formats by the source; as we saw in the communication system, the first block is the transmitter, which generates the information. It may be generated as text in English, Marathi, or any other language's alphabet, as a binary sequence generated by a computer, or as analog waveforms such as audio and video signals.

Can we measure it? The answer is yes. To see how, let us take an example with two sentences. The first is "There is a traffic jam on National Highway 5", and the second is "There is a traffic jam on National Highway 5 near tunnel number 3." If you compare these two sentences, which one gives more information? Obviously it is sentence number 2: compared to sentence number 1, it tells us more, namely that the jam is near tunnel number 3. From this we can conclude that information can be measured.

The basic principle that helps in measuring information is uncertainty. What is meant by uncertainty? It is the unpredictability of an event's occurrence. For example, when you throw a die there are six possible outcomes, but you cannot say in advance whether the outcome will be 1, 2, 3, 4, 5, or 6. The outcome is unpredictable, and this unpredictability is what lets us quantify the amount of information in a communication.

Let us take two more examples. The first is "a dog bites a man", which is a common event. The probability of this event occurring is high, so it carries very little information. The second is "a man bites a dog"; this event rarely happens, so its probability of occurrence is low and the information it carries is large. Comparing the two, a considerable amount of information is conveyed by the message "a man bites a dog" precisely because the event is rare. So there is an inverse relationship between the probability of occurrence of an event and the amount of information associated with it: a higher probability means less information, and a lower probability means more information.

Let us write this as an equation: I(xj) = f(1/P(xj)), where xj is the event that occurs and P(xj) is the probability associated with that event.
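To make this inverse relationship concrete, here is a minimal Python sketch. The probabilities assigned to the two sentences, and the helper self_information, are assumptions chosen purely for illustration; the logarithmic measure it uses is the one derived formally in the next part of the lecture.

```python
import math

# Hypothetical probabilities, chosen only to illustrate the inverse
# relationship between probability and information content.
p_common = 0.9   # "a dog bites a man"  -- a frequent, expected event
p_rare   = 0.01  # "a man bites a dog" -- a rare, surprising event

# Self-information I(x) = log2(1 / P(x)), derived formally below.
def self_information(p):
    return math.log2(1.0 / p)

print(self_information(p_common))  # ~0.15 bits -> little information
print(self_information(p_rare))    # ~6.64 bits -> much more information

# A fair die: every face has P = 1/6, so each outcome carries the same
# amount of information, about log2(6) = 2.585 bits.
print(self_information(1 / 6))
```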
As we said on the previous slide, the relationship is an inverse one: I(xj), the information associated with the event, is a function of the inverse of the probability of that event's occurrence. Now consider a second event yk, with the condition that the two events xj and yk are independent. What does independent mean? It means there is no relationship between their occurrences: xj occurs on its own and yk occurs on its own. In the same manner, the information associated with the event yk is I(yk) = f(1/P(yk)).

If the two events are independent, the probability of the joint event is P(xj, yk) = P(xj) P(yk), so the total information associated with the combined event is I(xj, yk) = f(1/(P(xj) P(yk))). Since xj and yk occur independently, the information of the joint event should be the sum of the two individual informations. But the right-hand side of this equation contains a multiplication, and we want a sum. How do we convert that multiplication into a sum? The function we choose must turn multiplication into addition, and the logarithm is exactly such a function. Using the logarithm we can write the information associated with the joint event as I(xj, yk) = log(1/(P(xj) P(yk))) = log(1/P(xj)) + log(1/P(yk)). The first term is the information associated with the event xj, that is, I(xj), and the second term is the information associated with the event yk, that is, I(yk); their sum gives the information associated with the joint event (xj, yk).

From this we can state the basic equation for the amount of information associated with an event: it is the logarithm of the inverse of the probability of that event's occurrence. For xj we can write I(xj) = log(1/P(xj)), which can also be written as -log P(xj); the minus sign appears because the 1/P(xj) inside the logarithm becomes P(xj) when it is pulled out.

There are different units of information defined for different bases of the logarithm. As you have seen in the equation for the information associated with an event, the unit depends on the base of the logarithm being used. If the base is 2, the unit is the bit; if the base is e, the unit is the nat; and if the base is 10, the unit is the hartley (also called the decit). The amount of information I(xj) satisfies the following properties.
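Here is a minimal Python sketch of the additivity and the units just described. The two probabilities are assumed values chosen only for illustration; everything else follows directly from I(x) = -log P(x) with different log bases.

```python
import math

# Two independent events with assumed probabilities (illustrative values).
p_x = 0.25
p_y = 0.5

# Self-information in bits (log base 2).
i_x = -math.log2(p_x)            # 2 bits
i_y = -math.log2(p_y)            # 1 bit

# For independent events P(x, y) = P(x) * P(y), and the logarithm turns
# that product into a sum of informations.
i_joint = -math.log2(p_x * p_y)  # 3 bits
assert abs(i_joint - (i_x + i_y)) < 1e-12

# The same quantity in other units: just change the base of the logarithm.
i_nats     = -math.log(p_x)      # base e  -> nats     (~1.386)
i_hartleys = -math.log10(p_x)    # base 10 -> hartleys (~0.602)
print(i_x, i_nats, i_hartleys)
```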
The first property is that I(xj) = 0 when the probability of the event's occurrence equals 1, that is, when you know for certain which outcome is going to occur. In that case the information equals 0, because log 1 = 0. The second property is that the information associated with an event is always greater than or equal to 0. We have just seen that I(xj) = 0 when the probability of the event is 1; otherwise it is greater than 0. So it is a non-negative quantity: zero or positive, never negative. The third property is the inverse relationship from the previous slide: if the probability is lower you get more information, and if the probability is higher you get less information; that is, I(xj) > I(xk) whenever P(xj) < P(xk).

Now let us see what is meant by average information and mutual information. Average information is also called entropy, and it is the uncertainty in a random variable. We have already seen what uncertainty means: the unpredictability of an event's occurrence; you cannot predict what the outcome of the event will be. Entropy is the average information per individual message: over all the events that occur and the outputs you obtain, you take the average. There are two kinds of average. One is the arithmetic average: the outputs are fixed and deterministic, so you can simply sum them and divide. The other is the statistical average: you cannot predict the outputs, the values keep changing and are random, so the average you take is a statistical (expected-value) average. Entropy is denoted by H(X), where the summation performs this averaging: H(X) = Σ P(x) log(1/P(x)) = -Σ P(x) log P(x), and its unit is bits per symbol when the logarithm is taken to base 2.

The last point is mutual information. Mutual information is the amount of information obtained about one event by observing another event. On the previous slide we considered two events xj and yk, but there they were independent; here we consider the output of one event given the other. The mutual information is I(X; Y) = H(X) - H(X|Y), where H(X) is the initial entropy and H(X|Y) is the final entropy after observing Y. The difference between these two is the mutual information, that is, the amount of information obtained about one event by observing the other, as the sketch below illustrates. These are the references. Thank you.
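To make the entropy and mutual-information formulas concrete, here is a minimal Python sketch. The source distribution p_x and the joint distribution p_xy are assumed values chosen purely for illustration, and the entropy helper is not from any library.

```python
import math

def entropy(dist):
    """H(X) = -sum_x P(x) log2 P(x), in bits per symbol."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Assumed source: four symbols with these probabilities (illustrative only).
p_x = [0.5, 0.25, 0.125, 0.125]
print(entropy(p_x))  # 1.75 bits/symbol

# Mutual information I(X; Y) = H(X) - H(X|Y) for an assumed joint
# distribution P(x, y) over two binary variables (rows: x, columns: y).
p_xy = [[0.4, 0.1],
        [0.1, 0.4]]

p_x_marg = [sum(row) for row in p_xy]        # P(x)
p_y_marg = [sum(col) for col in zip(*p_xy)]  # P(y)

h_x = entropy(p_x_marg)                      # initial entropy H(X)

# Conditional entropy H(X|Y) = sum_y P(y) * H(X | Y = y)
h_x_given_y = 0.0
for j, p_y in enumerate(p_y_marg):
    cond = [p_xy[i][j] / p_y for i in range(len(p_xy))]  # P(x | y = j)
    h_x_given_y += p_y * entropy(cond)

print(h_x - h_x_given_y)  # I(X; Y): the reduction in uncertainty about X
```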