Now, let us talk about the sufficiency principle. Recall that the sufficiency principle asks: when I reduce my data, does the reduced value capture the information about my parameter \(\theta\) in the best possible way? You may come up with many different statistics: we have the sample mean, the sample variance, the minimum value, the maximum value, and so on. But for a given population, which of these statistics captures the information about the underlying parameter \(\theta\) in the best possible way? To formalize what "best" means here, we are going to introduce something called a sufficient statistic.

A sufficient statistic for a parameter \(\theta\) captures all the information about \(\theta\) contained in the samples. In that sense it gives the best reduction, the best possible way of capturing information about \(\theta\) from your samples. And it really is about reduction: you have \(n\) samples, you reduce them through your statistic to a single value, and you then ask how well the information about \(\theta\) is captured in that value. Now, if the information is well contained in that reduced value, then giving you any further information in addition to it, say the individual samples, should not add anything more about the parameter \(\theta\). These points are leading us toward what the formal properties of a sufficient statistic should be. As an example of the second point: suppose I have \(x_1, \ldots, x_n\) and a statistic \(T(x)\) that contains all the information about \(\theta\). What does that mean? If I give you any individual samples, say \(x_3\) or \(x_4\), they are not going to help you get more information about \(\theta\); the individual samples do not complement whatever \(T(x)\) is already telling you about \(\theta\).

In summary, we are going to take the sufficiency principle as follows: if \(T(X)\) is a sufficient statistic for some parameter \(\theta\), then any inference we make about \(\theta\) should depend on the sample \(x\) only through the value \(T(x)\). That is, the data summary I obtain through my statistic should capture all the possible information about my parameter \(\theta\), and any additional detail should not help me. If that happens, then we say we are following the sufficiency principle. And we already said one consequence: if two samples \(x\) and \(y\) have the same statistic value, then the inference about \(\theta\) should be the same whether we observed \(x\) or \(y\). The individual samples do not matter to me; what matters is \(T(x)\) or \(T(y)\), and in this case \(T(x)\) and \(T(y)\) are the same, so I have the same amount of information about \(\theta\) irrespective of whether it is \(x\) or \(y\).

Now, let us take an example. Let \(X_1, X_2, \ldots, X_n\) be a random sample from a Bernoulli(\(\theta\)) population.
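Before working through the example, it may help to record the principle compactly. This is just a restatement of the discussion above in symbols, in the form it usually appears in textbooks:
\[
T(X) \text{ sufficient for } \theta \;\Longrightarrow\; \text{any inference about } \theta \text{ depends on the sample } x \text{ only through } T(x),
\]
\[
\text{and in particular } T(x) = T(y) \;\Longrightarrow\; \text{the same inference about } \theta \text{ whether we observed } x \text{ or } y.
\]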
Now, for this sample, I am interested in the statistic \(T(\bar{X}) = \sum_{i=1}^{n} X_i\). Here I will write \(\bar{x}\) for the sample as a vector, \(\bar{x} = (x_1, x_2, \ldots, x_n)\), so the first component of \(\bar{x}\) is \(x_1\), the second is \(x_2\), and so on up to \(x_n\); the bar is used only to distinguish a vector from a single random variable, and the corresponding random vector is \(\bar{X} = (X_1, \ldots, X_n)\), of which \(\bar{x}\) is one realization. We know the possible values of \(T(\bar{x})\): it lies between \(0\) and \(n\), taking the values \(0, 1, 2, \ldots, n\). (We could just as well divide by \(n\) and work with values \(0, \frac{1}{n}, \frac{2}{n}, \ldots, 1\); nothing essential changes.) The underlying unknown parameter here is \(\theta\): since the population is Bernoulli(\(\theta\)), the probability that a sample equals \(1\) under the parameter \(\theta\) is \(P_\theta(X_i = 1) = \theta\).

Now, what is the probability of observing the sample \(\bar{x}\) under my parameter \(\theta\)? Because we are talking about a random sample, the components are independent, so, as we discussed last time,
\[
P_\theta(\bar{X} = \bar{x}) \;=\; \prod_{i=1}^{n} P_\theta(X_i = x_i) \;=\; \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i}.
\]
Let us do a sanity check: when \(x_i = 1\) the \(i\)-th factor reduces to \(\theta\), and when \(x_i = 0\) it reduces to \(1-\theta\). So this is the unconditional probability of observing the sample \(\bar{x}\) under my parameter \(\theta\).

But now suppose I tell you that \(T(\bar{x})\) has taken some value, let me call it small \(t\), and suppose that \(T\) is a sufficient statistic. By that, what do we mean, what is our intuition? This statistic is a data summary constructed in such a way that \(t\) contains all the necessary information about \(\theta\). So this \(t\) is now a proxy for \(\theta\), because \(t\) contains all the necessary information about \(\theta\), and I could just as well interpret it that way. Now, if I tell you that \(T(\bar{x})\) has taken the value \(t\), then in a way I am actually passing you the proxy for \(\theta\). And if \(t\) indeed captures all the essential information about \(\theta\), then when I move from the unconditional probability to the conditional one, given that \(T(\bar{X}) = t\), the result should be independent of \(\theta\) and should depend only on the value \(t\), because the role of \(\theta\) is now played by \(t\). That is why we say that the conditional distribution of the sample given the sufficient statistic,
\[
P_\theta(\bar{X} = \bar{x} \mid T(\bar{X}) = t),
\]
should be independent of \(\theta\): the sufficient statistic captures all the information about \(\theta\), and I am telling you that information through the value \(t\) on which I am conditioning.
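To make this concrete, let us carry out the computation for the Bernoulli example. (This worked step uses the standard fact that a sum of \(n\) i.i.d. Bernoulli(\(\theta\)) variables is Binomial(\(n, \theta\)), and that the event \(\{\bar{X} = \bar{x}\}\) is contained in \(\{T(\bar{X}) = t\}\) whenever \(\sum_i x_i = t\), a containment we justify in general below.) For any \(\bar{x}\) with \(\sum_i x_i = t\),
\[
P_\theta(\bar{X} = \bar{x} \mid T(\bar{X}) = t)
= \frac{P_\theta(\bar{X} = \bar{x})}{P_\theta(T(\bar{X}) = t)}
= \frac{\theta^{t}(1-\theta)^{n-t}}{\binom{n}{t}\,\theta^{t}(1-\theta)^{n-t}}
= \frac{1}{\binom{n}{t}},
\]
since \(\prod_i \theta^{x_i}(1-\theta)^{1-x_i} = \theta^{\sum_i x_i}(1-\theta)^{n - \sum_i x_i}\). The \(\theta\)-dependence cancels completely: given \(T = t\), every arrangement of \(t\) ones among the \(n\) positions is equally likely, no matter what \(\theta\) is. This is exactly the independence from \(\theta\) that we just described.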
Now, let us take two statisticians. Statistician one has generated a sample \(\bar{x}\) and has also computed \(T(\bar{x})\), so statistician one has \(\bar{x}\) as well as \(T(\bar{x})\); the second statistician has only \(T(\bar{x})\). Now, in terms of learning the parameter \(\theta\), who is better off between these two? We said that both are the same: even though the first statistician has \(\bar{x}\) as well as \(T(\bar{x})\), he is no better off than the one who knows only \(T(\bar{x})\), because if \(T\) is a sufficient statistic, having \(\bar{x}\) adds no additional value about the parameter \(\theta\). So, coming back to the sufficiency principle: it says that when you do a data reduction, the reduced value should contain the information about \(\theta\), and the reduction is sufficient when the information it retains is rich enough that even if you give me any additional information from the sample, I am no better off; whatever is required is already contained in the reduction itself.

One quick thing we will do is quantify this. When we say that the conditional probability is going to be independent of \(\theta\), can we write it in a better way? Assume \(T\) is a sufficient statistic; then, as we said, the conditional probability does not depend on \(\theta\). Now fix an arbitrary sample \(\bar{x}\), and let \(t = T(\bar{x})\) be the value my statistic gives on it. Let us try to compute the conditional probability of this particular \(\bar{x}\) conditioned on its statistic:
\[
P_\theta(\bar{X} = \bar{x} \mid T(\bar{X}) = t)
= \frac{P_\theta(\bar{X} = \bar{x},\; T(\bar{X}) = t)}{P_\theta(T(\bar{X}) = t)}.
\]
What have I done here? I have simply applied the definition of conditional probability: the joint probability divided by the marginal. Now, let us call \(A\) the event \(\{\bar{X} = \bar{x}\}\) and \(B\) the event \(\{T(\bar{X}) = t\}\); the event \(B\) consists of all samples \(\bar{y}\) that map to the same value, that is, all \(\bar{y}\) with \(T(\bar{y}) = t\). We know that \(\bar{x}\) itself attains the value \(t\), so \(\bar{x}\) definitely belongs to this set, and hence \(A \subseteq B\), which gives \(A \cap B = A\). Because of this, the numerator is just \(P_\theta(\bar{X} = \bar{x})\), the distribution of the sample under the parameter \(\theta\), and the denominator is the distribution of my statistic, again under the parameter \(\theta\):
\[
P_\theta(\bar{X} = \bar{x} \mid T(\bar{X}) = T(\bar{x}))
= \frac{p_\theta(\bar{x})}{q_\theta(T(\bar{x}))},
\]
where \(p_\theta\) is the joint pmf of your random sample and \(q_\theta\) is the pmf of your statistic. So what we have done is segregate this conditional probability: the numerator depends only on the distribution of \(\bar{x}\), and the denominator only on the distribution of your statistic, both under the parameter \(\theta\). Now, if the left-hand side is independent of \(\theta\), then so must this ratio be: the ratio must also be independent of \(\theta\).
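As a quick numerical illustration of the Bernoulli computation, here is a small simulation sketch (my own check, not part of the lecture; it assumes NumPy, takes \(n = 5\), and conditions on \(T = 2\)). It estimates the conditional probability of one fixed arrangement given its sum and compares it with the theoretical value \(1/\binom{5}{2} = 0.1\) for several values of \(\theta\):

```python
import numpy as np
from math import comb

# For X1, ..., Xn i.i.d. Bernoulli(theta) and T = sum of the Xi, the
# conditional probability of any fixed sample x with sum(x) = t should
# equal 1 / C(n, t), whatever theta is. We check this by simulation.
rng = np.random.default_rng(0)
n, trials = 5, 200_000
target = np.array([1, 1, 0, 0, 0], dtype=bool)   # one fixed sample with t = 2

for theta in (0.2, 0.5, 0.8):
    samples = rng.random((trials, n)) < theta    # Bernoulli(theta) draws
    t_vals = samples.sum(axis=1)                 # T(x) for each simulated sample
    cond = samples[t_vals == 2]                  # keep only samples with T = 2
    est = (cond == target).all(axis=1).mean()    # estimate P(X = target | T = 2)
    print(f"theta = {theta}: estimate {est:.4f}, theory {1 / comb(n, 2):.4f}")
```

All three estimates should come out near \(0.1\): once the value of \(T\) is known, \(\theta\) tells us nothing more about which particular arrangement was observed, which is precisely the sufficiency of \(T\).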