So, now let us do this example: how to verify whether a given statistic is sufficient or not. Naturally, we have one characterization based on the ratio we just defined. Based on what we derived, a statistic T is a sufficient statistic for the parameter theta if the conditional distribution of X given T(X) does not depend on theta. We can say the same thing in an alternate manner: T(X) is a sufficient statistic for theta if, for every x, the ratio P(x | theta) / Q(T(x) | theta) does not depend on theta. Notice that this ratio has to be independent of theta for every possible x; it is not enough to show it for one x. Whatever we said should hold no matter which x you are considering. So, let us look at the example we just discussed. Here x (sometimes I put a bar on it and sometimes I do not; do not get confused) is a random vector coming from an underlying population which is Bernoulli with parameter theta, and n is fixed. Our claim is that the sum of the sample is a sufficient statistic for theta. How to verify this? Just apply the formulation we have: compute the unconditional probability of the sample under the parameter theta, then compute the unconditional probability of your statistic under the same parameter theta, and check that the ratio does not depend on theta. We just discussed that the distribution of x under parameter theta has this probability. Anybody has any question on this? We just did that, so we can write it down. Only the denominator needs thinking through. T(x) is the summation of the x_i; we know this value can range over 0 to n, and we know it is a sum of n Bernoulli random variables.
So, a sum of n identically distributed Bernoulli random variables is binomial. I should have written it like this: Q(T(x) = t | theta), the probability that T(X) takes the value t, which is (n choose t) theta^t (1 - theta)^(n - t). Now if you simplify the ratio, it is simple algebra: because of the product form, you turn the product into a summation by moving it into the exponent, and you see that eventually all the thetas get knocked off; what remains is 1 / (n choose t), which does not depend on your parameter theta, only on your data, with t = sum over i = 1 to n of x_i. So this T is a sufficient statistic for your parameter theta. Now suppose instead I had taken T(x) = (1/n) times the sum of the x_i; this is the sample average. Earlier I was taking the sum, now I am taking the average. Let us call them T_1 and T_2. These two are different statistics, one the sum and one the average, but in terms of information about theta, are they any different? No. Why? Because n is a constant: if I know one quantity, then knowing n, I can go back and recover the other. So knowing one does not leave me with less information about theta than knowing the other; because of this, both are fine. Another quick example: let us say now the data is coming from a Gaussian population with parameters mu and sigma^2. I am fixing the underlying population to be Gaussian with parameters mu and sigma^2, but in this example let us say I know sigma^2 and I do not know mu; mu is the parameter of my interest.
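To make the cancellation concrete, here is a small numerical check; this is my own sketch, not from the lecture, with the function name `bernoulli_ratio` chosen for illustration. It verifies that for Bernoulli samples the ratio P(x | theta) / Q(T(x) | theta) equals 1 / (n choose t) for every sample x and does not change with theta:

```python
from math import comb
from itertools import product

def bernoulli_ratio(x, theta):
    """Ratio P(x | theta) / Q(T(x) | theta) for a Bernoulli sample x,
    with T(x) = sum of the sample and Q the Binomial(n, theta) pmf."""
    n, t = len(x), sum(x)
    p_x = theta ** t * (1 - theta) ** (n - t)               # P(x | theta)
    q_t = comb(n, t) * theta ** t * (1 - theta) ** (n - t)  # Q(t | theta)
    return p_x / q_t

# Check every sample of length 4 against two different thetas:
# the ratio agrees and equals 1 / C(n, t), independent of theta.
for x in product([0, 1], repeat=4):
    r1 = bernoulli_ratio(x, 0.3)
    r2 = bernoulli_ratio(x, 0.8)
    assert abs(r1 - r2) < 1e-12
    assert abs(r1 - 1 / comb(4, sum(x))) < 1e-12
```

The same loop fails immediately if you replace T by a non-sufficient statistic such as x[0], which is one way to see that sufficiency is a property of the whole mapping, not of any single sample.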
Now, I want to get information about this mu: which statistic will give me good information about it? Any guess? I have this underlying parameter mu, and from the data I want to get information about it. What function should I apply to my data, that is, what should my statistic be, so that I get good information about mu from the data? Others? Why can I not take the expectation? Because I only have the samples; I do not have that luxury. If you want to talk about an expectation, you need to know which distribution it is taken with respect to, but I do not have full information about the distribution, since I do not know the parameter mu. Earlier we discussed that the sample mean is a good estimator of the mean; we said it is a consistent estimator and also an unbiased estimator. But now, instead of thinking in terms of estimators and consistency, we are thinking in terms of the sufficiency principle. Which data reduction is good here for getting information about mu? The sample mean is one data reduction method, so let us try it: when I reduce the sample to its mean, is that a good statistic, will it turn out to be a sufficient statistic? Notice that I am always using P here, not f, even when the distribution is continuous, just so the notation is the same for both the discrete and the continuous case; it should be clear to all of you from context which one we are in. So, this is the probability density function of the Gaussian. Now I am using the statistic x-bar; what do I know about x-bar?
So, to verify whether it is a sufficient statistic, I need to compute that ratio. The ratio has two pieces: the unconditional probability of the sample under the parameter mu, and in the denominator the probability that the statistic takes some value under the same parameter mu. I know the unconditional probability of the sample under the parameter mu. Now let us compute the distribution of the statistic x-bar. What do we know about x-bar? If I average n i.i.d. Gaussian samples, the result is still Gaussian, but with mean mu and variance sigma^2 / n. So I know its distribution and can write it down; the only thing I have done is replace sigma^2 by sigma^2 / n. Now compute the ratio: you will see that mu does not appear in it at all. Even though sigma^2 appears, sigma^2 is not my unknown parameter; sigma^2 is known, and the only unknown parameter is mu, which does not appear in this ratio. What does this indicate? The sample mean is a sufficient statistic for the mean mu of a Gaussian distribution with known variance. We need to state all of this: it is a sufficient statistic for which parameter, under which population distribution. Now, you feel that having a sufficient statistic is good, because it essentially captures all the information about the parameter of interest. But is it always easy to get a sufficient statistic? Is it always the sample mean that happens to be sufficient? We do not know a priori. So the question becomes: how to find one? What we just saw is how, given a statistic, to check whether it is sufficient: somebody has given you T and is claiming that it is a sufficient statistic.
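The same numerical check works in the Gaussian case; again this is my own sketch, with `gaussian_ratio` and `normal_pdf` as illustrative names. The joint density of the sample divided by the density of x-bar (Gaussian with mean mu and variance sigma^2 / n) should come out the same for very different values of mu:

```python
import math

def normal_pdf(v, mean, var):
    """Gaussian density at v with the given mean and variance."""
    return math.exp(-(v - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gaussian_ratio(x, mu, sigma2):
    """Ratio P(x | mu) / Q(xbar | mu) with sigma^2 known,
    where Q is the N(mu, sigma^2 / n) density of the sample mean."""
    n = len(x)
    xbar = sum(x) / n
    p_x = math.prod(normal_pdf(v, mu, sigma2) for v in x)  # joint density of the sample
    q_t = normal_pdf(xbar, mu, sigma2 / n)                 # density of X-bar
    return p_x / q_t

x = [1.2, -0.7, 0.4, 2.1]
r1 = gaussian_ratio(x, mu=0.0, sigma2=1.0)
r2 = gaussian_ratio(x, mu=5.0, sigma2=1.0)
assert abs(r1 - r2) < 1e-9 * abs(r1)  # mu has cancelled from the ratio
```

Algebraically, the exponent of the ratio reduces to minus the sum of (x_i - xbar)^2 over 2 sigma^2, which contains no mu, and that is exactly what the assertion confirms numerically.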
Now what you are doing is verifying whether his claim is correct by computing the ratio. But how is the person who came up with the sufficient statistic going to find it in the first place? Your job was easy, you just verified by computing the ratio, but his job, coming up with a sufficient statistic, was hard. So, is there a method to find a sufficient statistic? The factorization theorem comes to the rescue there, to some extent, and tells us when it is possible to have a sufficient statistic for an underlying population. Now let us see what the factorization theorem states. Let us say you have a random sample X with underlying pdf or pmf denoted P(x | theta), and let T(X) be a statistic. The theorem says that T(X) is a sufficient statistic for theta if and only if there exist functions g(t, theta) and h(x) such that, for every x and theta, the probability mass function factorizes as P(x | theta) = g(T(x), theta) h(x). It is an involved statement, but try to parse it. It says that a statistic is sufficient if and only if the probability mass function or probability density function can be factorized into two things, a g function and an h function, where h depends only on x, and g depends on theta and on x only through T(x), not explicitly on x. If you are able to factorize your probability density or mass function in this fashion for some statistic T, then that statistic is a sufficient statistic. Now the person who wants to propose a statistic has to use this: he picks some statistic and checks whether the factorization holds. If it holds, then he is confident that it is a sufficient statistic, and he can confidently hand it over: take this and verify that it is a sufficient statistic. Okay, now let us quickly go through its proof. I am going to do the forward part first.
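As a concrete instance of the theorem's shape, the Bernoulli pmf from the earlier example already sits in factorized form; this worked display is my own addition, with h(x) chosen to be the constant 1:

```latex
% Factorization-theorem form of the Bernoulli example, T(x) = \sum_{i=1}^n x_i:
P(x \mid \theta)
  = \theta^{\sum_{i=1}^n x_i}\,(1-\theta)^{\,n-\sum_{i=1}^n x_i}
  = \underbrace{\theta^{T(x)}(1-\theta)^{\,n-T(x)}}_{g(T(x),\,\theta)}
    \cdot \underbrace{1}_{h(x)}
```

Here g depends on the data only through T(x), and h is free of theta, so the theorem immediately gives sufficiency of the sum without computing any ratio.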
So, what is the forward part here? I need to show that if T is a sufficient statistic, then there exist functions g and h such that the factorization holds. I start by assuming that T is a sufficient statistic, and I argue that in that case the pmf factorizes in the way I want. If T is a sufficient statistic, I am going to use its properties. First, if T is a sufficient statistic, the conditional probability of X given T does not depend on theta; I start with that. I take h(x) to be this conditional probability. Can I claim that this h(x) does not depend on theta? Yes, by definition: if T is a sufficient statistic, this conditional probability does not depend on theta, and I am calling it h(x). Now for the g function: remember, in this direction I need to exhibit a g and an h such that the factorization holds whenever T is sufficient. For the g function I take the probability that T takes the value t, which is just the distribution of the statistic itself, Q(t | theta). Now, P(x | theta) equals the joint probability that X = x and T(X) = t; I can write it like this because, as I told you, if I treat the event {X = x} as set A and {T(X) = t} as set B, then A is a subset of B. And now I write this joint distribution as the conditional probability times the marginal probability.
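In symbols, the chain just described is the following (my notation, using the lecture's P and Q):

```latex
P(x \mid \theta)
  = P\big(X = x,\; T(X) = T(x) \,\big|\, \theta\big)
  = \underbrace{P\big(X = x \,\big|\, T(X) = T(x),\, \theta\big)}_{h(x)\ \text{(free of }\theta\text{ by sufficiency)}}
    \cdot
    \underbrace{Q\big(T(x) \,\big|\, \theta\big)}_{g(T(x),\,\theta)}
```

The first equality uses that the event {X = x} is contained in {T(X) = T(x)}, and the second is just the definition of conditional probability.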
And now that is it: I have defined one portion as h and the other portion as g, and I know that h does not depend on theta, it depends only on x, while g depends on x only through T(x), not directly on x. So the forward part is complete: I have shown that if T is a sufficient statistic, the pmf splits into an h function and a g function, where h depends only on x and g depends on theta and on T(x). Okay, now let us do the opposite direction. Now I assume that the factorization holds for some statistic T, and I need to show that T is a sufficient statistic. To show that, I go back to the characterization of a sufficient statistic where the ratio does not depend on theta. So T is some statistic, and I assume that P(x | theta) factorizes as g(T(x), theta) h(x) for some h function and g function; I start with that. I look at the ratio P(x | theta) / Q(T(x) | theta). In the numerator I use the fact that P factorizes into g and h, which I have written; the denominator I have kept as it is. Everybody agree with the first step? I simply used the fact that P is factorizable. Next, what is this denominator probability? Q(T(x) = t | theta) is nothing but a summation over all y which map to t: I take this t, look at all those values y which get mapped to the value t, and sum P(y | theta) over them. I do not know whether all of you are able to follow this step, but it is the step you need to understand. T is a mapping, and suppose it takes some value t; there could be multiple y's mapping to that same value t.
So, I need to sum over all of them, and when I take the probability of all those y's, I get that probability. See what I have done: I have converted the distribution of the statistic into a distribution over the samples. Basically, I take all the points that map to the value t under T and sum their probabilities. This is like what we did in the first half of the course. We did function mappings: we took a function f, and when I took one point in the range, there could be multiple points mapping to the same value, and that is exactly the property you used when you found the distribution of a function of random variables. T is a function of a random variable here, and that is the property I have used. Once I do this, I go back and use the factorization again: each P(y | theta) can be written as a product of the g and h functions, which I have written here. Now notice that inside the sum, T(y) is a constant: all these y's map to the same t, so every g(T(y), theta) equals g(t, theta), and I can pull it out of the summation. So in the numerator I have g(t, theta) h(x), and in the denominator, since T(y) = t for every y in the preimage set, I have g(t, theta) times the summation of h(y). The g(t, theta) in the numerator gets knocked off with the one in the denominator, and after knocking them off, what is left is h(x) divided by the summation of the h(y).
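The preimage-sum step can be checked concretely in the Bernoulli setting; this is my own sketch, with `p_sample` as an illustrative name. Summing P(y | theta) over all samples y whose components sum to t reproduces the Binomial(n, theta) probability of t:

```python
from math import comb
from itertools import product

def p_sample(y, theta):
    """P(y | theta) for a Bernoulli sample y of 0s and 1s."""
    t = sum(y)
    return theta ** t * (1 - theta) ** (len(y) - t)

n, theta = 4, 0.3
for t in range(n + 1):
    # Sum over the preimage {y : T(y) = t} of the statistic T(y) = sum(y).
    preimage_sum = sum(p_sample(y, theta)
                       for y in product([0, 1], repeat=n)
                       if sum(y) == t)
    # Distribution of the statistic itself: Q(t | theta) = C(n, t) theta^t (1-theta)^(n-t).
    binom_prob = comb(n, t) * theta ** t * (1 - theta) ** (n - t)
    assert abs(preimage_sum - binom_prob) < 1e-12
```

This is exactly the "distribution of a function of a random variable" property from the first half of the course: the pmf of T at t is the total probability of T's preimage of t.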
Now, what did we say? This h(x) does not depend on theta, and the h(y)'s also do not depend on theta, so this ratio does not depend on theta. Then what we have just shown is that the ratio on the left-hand side does not depend on theta, and if it does not depend on theta, we conclude that T is a sufficient statistic. So the necessity direction is clear as well, and we have shown both directions. The nice thing about this factorization theorem is that it is a complete characterization: it is not just a sufficient condition, it is a necessary condition as well. So, I said earlier there were two steps: somebody comes up with a statistic and gives it to somebody else to verify that it is a sufficient statistic. The first person does not know a priori how to find a statistic, but he can take candidate statistics, try the factorization theorem, and choose the one that satisfies it; in that first step itself he already knows that what he has is a sufficient statistic, because the verification is effectively built into the factorization step. So, any questions about the factorization theorem? You should be comfortable applying it and be clear about the g and h functions: what h may depend on, what g may depend on, and what they must not depend on. You cannot come up with some arbitrary g and h and just declare the product to be a factorization; there is a restriction on the g and h functions, and when you give me a g and an h, you need to show that h depends only on x and that g depends on theta and on x only through T(x). So, let us stop here.