Welcome back. We were talking about the data compression setting, and we asked ourselves the question that Shannon also asked himself: is it possible to construct encoder and decoder functions, one pair for each n, such that the limit as n tends to infinity of the probability of error is 0?

Remember, since we are in the data compression setting, the medium we had could only accommodate 2^k possible inputs, and that necessarily meant that only 2^k possible values of the source could be recovered at the decoder. After all, the decoder can only see 2^k possible outputs of the medium, and it has to map each of those outputs to one of the possible source values; so only 2^k could potentially be recovered. Despite this, we saw that we are asking a bold question here: can the probability of error still be driven to 0, even though we have fewer inputs than the possible number of source messages?

What we saw is that this quest of driving the probability of error down to 0 is meaningful only when k itself grows with n. But remember, we were always in the regime where k is less than n, so k has to grow with n but remain less than n. We therefore posited that k = θn for some fraction θ < 1. We then looked at the fraction of source messages you can recover, which is 2^k divided by 2^n, the total number of possible values of the source. This fraction goes to 0 as n tends to infinity, precisely because θ < 1. So the fraction of strings you can recover vanishes: you can recover only a very tiny fraction of the total number of source strings.

The question I ended with was: how is this compatible with the claim we want, namely that the probability of error goes to 0, while at the same time we are bound by the requirement that the number of sequences we recover is very small? This, then, is the key question. Let ε > 0. Does there exist a subset, call it A(n, ε), of all the n-length strings the source can take, with high probability, meaning the probability of this subset is at least 1 − ε? This subset is what gets recovered, so whatever is not recovered has probability at most ε: the complement of A(n, ε) is what is not recovered, and A(n, ε) is the subset that is recovered.
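To get a feel for how quickly the fraction 2^(θn)/2^n = 2^(−(1−θ)n) from above vanishes, here is a minimal numeric sketch; the value θ = 0.7 is an illustrative choice, not from the lecture.

```python
# Fraction of recoverable strings, 2^(theta*n) / 2^n = 2^(-(1-theta)*n),
# for an illustrative theta < 1 (theta = 0.7 is an assumed value).
theta = 0.7

for n in [10, 50, 100, 500]:
    fraction = 2.0 ** (-(1 - theta) * n)
    print(f"n = {n:3d}: fraction of recoverable strings = {fraction:.3e}")
```

Even for moderate n, the fraction is astronomically small, which is what makes the question above look so paradoxical.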
So we want this subset to have high probability, but at the same time it should have the property we said was essential: the cardinality of this subset, viewed as a fraction of the total 2^n strings, goes to 0. Does such a subset exist, one whose probability is large but whose cardinality, as a fraction of the total number of strings, is vanishingly small? In fact, as n goes to infinity we want this probability to become larger and larger and eventually approach 1, and at the same time we want this fraction to go down to 0.

This sort of thing, which on the face of it appears mutually contradictory, is actually not contradictory, and that is the remarkable finding we have here. Of course, this may not be true in vast generality, but there is a fairly widely applicable setting in which these two statements — the probability of the set goes to 1, while its cardinality divided by the total number of strings goes to 0 — are not mutually contradictory. What is that setting?

The setting we will consider is the following. Consider a string x_1, ..., x_n in {0,1}^n. We will assume this string is generated according to a probability distribution in which the probability of seeing x_1, ..., x_n is the product of the probabilities of the individual symbols. What this is effectively saying is that each x_i is generated independently and with an identical distribution: the x_i are what are called i.i.d., independent and identically distributed.

Let us take an example. Suppose p is the probability that x_1 = 1. What, then, is the probability of seeing the string 1 0 0 1 1? You see three 1s, so the factor p comes up three times, and the factor 1 − p comes up twice: the probability of this particular string is p^3 (1 − p)^2.

What do we observe about this expression? Unless p = 1/2, this probability is not the same for all strings: certain strings will have one probability, and other strings will have another. In fact, if you look more closely, the probability depends only on the number of 1s and the number of 0s in the string; it is p raised to the number of 1s times (1 − p) raised to the number of 0s. So as the number of 0s and 1s varies, the probability of seeing a particular string varies.
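As a quick check on this count-based formula, here is a small sketch that computes the probability of a given binary string from its number of 1s and 0s; the value p = 0.3 and the function name are arbitrary illustrative choices.

```python
def string_prob(bits, p):
    """Probability that an i.i.d. Bernoulli(p) source emits exactly this string:
    p^(#ones) * (1-p)^(#zeros)."""
    ones = sum(bits)
    zeros = len(bits) - ones
    return p ** ones * (1 - p) ** zeros

p = 0.3  # illustrative value
print(string_prob([1, 0, 0, 1, 1], p))  # p^3 * (1-p)^2, the example above
print(string_prob([1, 1, 1, 0, 0], p))  # same counts -> same probability
print(string_prob([0, 0, 0, 0, 0], p))  # different counts -> different probability
```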
If p = 1/2, though, there is no difference, because p = 1 − p, and all strings have the same probability, namely 1/2^n. So unless p = 1/2, this expression has a different value for different strings.

There is therefore some scope to say which strings have the highest probability, and some scope to prioritize those strings: put them together in this set A(n, ε) and check how the probability of that set and the cardinality of that set behave. Remember, we are looking for a subset A(n, ε) whose probability is large but whose cardinality, as a fraction of 2^n, is small. Since different strings have different probabilities, what one logically needs to do is some kind of prioritization, so that the right strings get into A(n, ε): sufficiently many of them, so that their total probability is greater than 1 − ε, but not too many, so that the cardinality of the set remains small as a fraction of the whole.

This property is ensured by what is called the asymptotic equipartition property. To introduce it, we will give a construction of the set A(n, ε) and show that there actually is such a set. The construction goes through what is called a typical set. A set A(n, ε) ⊆ {0,1}^n is called a typical set if, for all x_1, ..., x_n in A(n, ε), the probability of seeing that particular sequence satisfies

2^(−n(H(p) + ε)) ≤ P(x_1, ..., x_n) ≤ 2^(−n(H(p) − ε)).

What is this function H(p)? It is simply

H(p) = −p log₂ p − (1 − p) log₂(1 − p).

This quantity has a name: it is called the entropy of a binary random variable X with P(X = 1) = p. Before I discuss entropy, let us first concentrate on our definition of the typical set, because that is what we were after. The typical set is a set of sequences whose probabilities satisfy these two inequalities.
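The definition translates directly into a membership test. Below is a minimal sketch, assuming an i.i.d. Bernoulli(p) source with 0 < p < 1; the function names are mine, not from the lecture.

```python
import math

def entropy(p):
    """Binary entropy H(p) = -p log2 p - (1-p) log2(1-p), for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def is_typical(bits, p, eps):
    """True iff 2^(-n(H(p)+eps)) <= P(bits) <= 2^(-n(H(p)-eps))."""
    n, ones = len(bits), sum(bits)
    log2_prob = ones * math.log2(p) + (n - ones) * math.log2(1 - p)
    h = entropy(p)
    return -n * (h + eps) <= log2_prob <= -n * (h - eps)

print(entropy(0.5))                           # 1.0, the maximum
print(is_typical([1, 0, 0, 1, 1], 0.3, 0.1))  # membership test on a short string
```

The test works in the log domain, which avoids underflow for long strings and matches the inequalities as stated.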
The probability is between these two quantities: the lower bound is 2^(−n(H(p) + ε)) and the upper bound is 2^(−n(H(p) − ε)). Another way of looking at this is that if you take the log (base 2) of this probability, then

−n(H(p) + ε) ≤ log₂ P(x_1, ..., x_n) ≤ −n(H(p) − ε).

Now let us see further what this is saying. The expression for the probability assumed that the x_i are i.i.d., so this probability is a product, and that is why the log makes an appearance here: the log of the product is the sum of the logs of the individual probabilities, from 1 to n, and that sum is between these two quantities. Dividing throughout by n, which does not change the inequalities, gives a much more palatable expression: take the log of the probability of every symbol in the string, average over the block length, and that average is between −(H(p) + ε) and −(H(p) − ε). In fact, one can multiply both sides by −1 and get

H(p) − ε ≤ −(1/n) log₂ P(x_1, ..., x_n) ≤ H(p) + ε,

that is, the negative log of the probability, averaged over the block length, is in the range H(p) ± ε.

Now, what is this H(p) itself? H(p) is what is called the entropy. The entropy is a fundamental quantity in information theory, statistics, statistical mechanics and so on; it has a meaning and an interpretation that would take a lecture of its own, and unfortunately I do not have the time to do all of that. The important thing to note here is the formula for the entropy, which we can also write in another way: taking the negative sign outside,

H(p) = −log₂( p^p (1 − p)^(1−p) ).

What does this expression remind you of? Notice that the probability of any string here is given by p raised to something times (1 − p) raised to something. So suppose you had a string x_1, ..., x_n, and suppose the probability of seeing a 1 is p. Then in a random string like this you expect the number of 1s to be n·p, because the probability of seeing a 1 is p, and the number of 0s to be n·(1 − p).
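As a sanity check on this rewriting, one can verify numerically that the two forms of H(p) agree; a sketch, with arbitrary values of p.

```python
import math

for p in [0.1, 0.3, 0.5, 0.9]:
    h_sum = -p * math.log2(p) - (1 - p) * math.log2(1 - p)       # definition
    h_product = -math.log2(p ** p * (1 - p) ** (1 - p))          # rewritten form
    print(f"p = {p}: {h_sum:.6f} vs {h_product:.6f}")            # the two agree
```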
As a result, the probability of seeing such a string — what you can call a typical string, with np 1s and n(1 − p) 0s — would be p^(np) (1 − p)^(n(1−p)), for any typical string. Taking the log,

log₂ P(typical string) = n log₂( p^p (1 − p)^(1−p) ) = −n H(p),

which comes from writing out the preceding expression; the negative sign is there because the log of a probability is a negative quantity, so −log₂ P is positive. So where is this entropy really coming from? The entropy is −(1/n) times the log of the probability of a typical string. That is roughly the intuition behind the entropy, and if you look at the definition above, that is essentially what we have there. The set A(n, ε) comprises those strings whose behavior is typical, in the sense that they have roughly the number of 0s and 1s you would expect them to have, given that the probability of seeing a 0 is 1 − p and the probability of seeing a 1 is p. With that probability distribution, the strings you would mostly see are the ones whose probabilities lie in this range, and the set of all such strings is A(n, ε). That is what is being captured in a typical set.

So far this is just the definition of a typical set; what does it buy us? Here is our theorem. First, for n sufficiently large,

P(A(n, ε)) > 1 − ε.

So as n becomes large, the probability of A(n, ε) becomes large, and in fact exceeds 1 − ε. If you fix an ε and let n go to infinity, you effectively have the property that the typical set has probability at least 1 − ε. Remember, this was one of our requirements: we were asking whether there exists a subset A(n, ε) whose probability is large but whose size is small. We have been able to meet the first requirement: the probability of this set is large, and the sketch below illustrates this empirically.
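This first part of the theorem can be checked by sampling strings from the source and counting how often they land in the typical set. A minimal Monte Carlo sketch, with illustrative parameters p = 0.3 and ε = 0.05; the function duplicates the typicality test from the earlier sketch so that it runs on its own.

```python
import math, random

def is_typical(bits, p, eps):
    """True iff 2^(-n(H(p)+eps)) <= P(bits) <= 2^(-n(H(p)-eps))."""
    n, ones = len(bits), sum(bits)
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    log2_prob = ones * math.log2(p) + (n - ones) * math.log2(1 - p)
    return -n * (h + eps) <= log2_prob <= -n * (h - eps)

p, eps, trials = 0.3, 0.05, 10000  # illustrative parameters
for n in [50, 200, 1000]:
    hits = sum(is_typical([random.random() < p for _ in range(n)], p, eps)
               for _ in range(trials))
    print(f"n = {n:4d}: estimated P(A(n, eps)) = {hits / trials:.3f}")
```

The estimated probability climbs toward 1 as n grows, exactly as the theorem predicts.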
Now let us ask what we can say about its size. Here is what we can say: the cardinality of A(n, ε) is at most 2^(n(H(p) + ε)). This is amazing: it is telling you that the cardinality is at most 2 raised to n times a constant that depends on p and your ε. Remember, I took k = θn; something like that θ has made its appearance here. One can also lower bound the size of A(n, ε): for n sufficiently large,

(1 − ε) · 2^(n(H(p) − ε)) ≤ |A(n, ε)| ≤ 2^(n(H(p) + ε)).

This combined theorem — the probability statement together with the upper and lower bounds on the size — is what is called the asymptotic equipartition property, or AEP for short. Effectively, it has told us the probability of the typical set and the size of the typical set. All that remains to be seen is whether this typical set is actually good enough for our purposes. We will do that in the next lecture; in the meantime, the sketch below puts numbers on the cardinality bounds.
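Since the probability of a string depends only on its number of 1s, all strings with the same count are typical or not as a group, so |A(n, ε)| can be computed exactly by summing binomial coefficients over the typical counts. A sketch with illustrative parameters, checking both that the fraction |A(n, ε)|/2^n vanishes and that the upper bound holds.

```python
import math

def typical_set_size(n, p, eps):
    """Exact |A(n, eps)|: strings with k ones are all typical or all not,
    so sum C(n, k) over the typical values of k."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    size = 0
    for k in range(n + 1):
        log2_prob = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if -n * (h + eps) <= log2_prob <= -n * (h - eps):
            size += math.comb(n, k)
    return size

n, p, eps = 200, 0.3, 0.05  # illustrative parameters
h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
size = typical_set_size(n, p, eps)
print(f"|A| / 2^n          = {size / 2 ** n:.3e}")                # vanishing fraction
print(f"|A| / 2^(n(H+eps)) = {size / 2 ** (n * (h + eps)):.3e}")  # at most 1
```

The first ratio is tiny while the set still carries most of the probability: this is precisely the coexistence of "large probability" and "small cardinality" that looked contradictory at the start.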