Let us revisit the factorization theorem. We said that the factorization theorem helps us identify a sufficient statistic, because it gives a complete characterization of whether a statistic is sufficient for a particular parameter of a particular population distribution. What is the factorization theorem? If I have a random sample from a population with density f and T is a statistic, then T(X) is a sufficient statistic for theta if and only if I can find functions g and h such that the density splits into a product where h is a function of the sample alone and g depends on the sample only through the statistic T.

We already came across this example: a sample from a Gaussian population with parameters mu and sigma square, where mu is unknown but sigma square is known. Let us start with the problem of identifying a sufficient statistic, and for now assume that all you know is the factorization theorem. The theorem says that if you can write f as a product of a g function and an h function in that format, then by looking at them you may be able to read off the sufficient statistic. So let us try to write f as a product of two functions h and g, where h depends only on x and g depends on x only through the statistic. I am only interested in the parameter mu, because sigma square is given to me; only mu is unknown.

What I have done here is take the joint Gaussian density and manipulate the exponent a little by adding and subtracting x bar, where x bar is the sample mean of the n observations. After adding and subtracting, a bit of algebra gives the equation on the slide. Now look at it: sigma square is known and mu is the only unknown, so this portion can be treated as a function of x bar given mu, that is, as g(x bar given mu). Sigma square I do not need to worry about; once I know mu, this factor depends on the data only through x bar. The rest of the expression, the constant in front together with the term involving the x i's and x bar, can be treated as h(x); it does involve x bar, but x bar is itself a function of all the x i's, so h depends only on the sample. Since we are able to factorize the population density in this form, x bar is a sufficient statistic. Why? Because the factorization theorem is an if and only if: you can find such a factorization, with g depending on the data only through x bar, exactly when x bar satisfies the definition of a sufficient statistic; otherwise you would not have been able to find such a factorization.
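Written out, the manipulation goes roughly like this — a sketch of the standard calculation, since the exact expression on the slide is not reproduced in the transcript; the cross term vanishes because the deviations from x bar sum to zero.

```latex
% Sketch: N(\mu, \sigma^2) population with \sigma^2 known; add and subtract \bar{x}.
\begin{align*}
f(x_1,\dots,x_n \mid \mu)
  &= (2\pi\sigma^2)^{-n/2}
     \exp\!\Big(-\tfrac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\Big) \\
  &= (2\pi\sigma^2)^{-n/2}
     \exp\!\Big(-\tfrac{1}{2\sigma^2}\sum_{i=1}^{n}\big((x_i-\bar{x})+(\bar{x}-\mu)\big)^2\Big) \\
  &= \underbrace{(2\pi\sigma^2)^{-n/2}
     \exp\!\Big(-\tfrac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\bar{x})^2\Big)}_{h(x)}
     \cdot
     \underbrace{\exp\!\Big(-\tfrac{n(\bar{x}-\mu)^2}{2\sigma^2}\Big)}_{g(\bar{x}\,\mid\,\mu)}
\end{align*}
% Since \sigma^2 is known, it is allowed to appear in h; g depends on the data only through \bar{x}.
```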
Now consider the other case, where the population distribution is again Gaussian with parameters mu and sigma square, but both are unknown. My parameter theta is now the pair (mu, sigma square); it is two-dimensional. I want to check whether there is a sufficient statistic for this theta, so again I start by seeing whether the density f can be factorized into the form g times h.

Do the standard manipulation as before: add and subtract x bar in the exponent, and this time also try to bring in s square, the sample variance; recall that (n minus 1) s square is the sum of squared deviations of the x i's from x bar. So I cross-multiply: the sum of squares becomes (n minus 1) s square, and the other term I just keep as it is. Now let us see whether we have the required factorization. We will come to the other question in a moment, but first let us focus on the g function: it has to be such that, given the parameter theta, it depends on the data only through the statistic. So let us try the combinations. I have these three factors here, and I have to group them into an h part and a g part. Can the leading factor by itself be h(x)? It looks like a constant, but careful: since sigma square is now unknown, that factor involves the parameter, so it cannot go into h; h is not supposed to depend on the parameter at all. For the same reason I cannot group the leading factor with the s square exponential and call that h, and if I group the leading factor with the last exponential I run into the same problem, because that depends on sigma square as well as mu. So the way out is to consider this possibility: if I tell you mu and sigma square, the whole product depends on the data only through which quantities? Only through x bar and s square. So one possibility is to take that pair as my statistic; the statistic is now not one-dimensional but two-dimensional, with first component x bar and second component s square. Then I can treat the entire expression as g of (x bar, s square) given (mu, sigma square), and h(x) can simply be the constant 1. So what is the sufficient statistic? Everybody agrees now: x bar and s square together form a sufficient statistic according to this definition. Even though we informally think of x bar as a proxy for mu and s square as a proxy for sigma square, here I am interested in the entire parameter, which consists of both components, and that is what the statistic needs to specify completely.
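Written out, the grouping just described looks roughly like this — again a sketch, since the slide is not reproduced; h(x) is taken to be 1 because every remaining factor involves the unknown (mu, sigma square).

```latex
% Sketch: N(\mu, \sigma^2) with both parameters unknown;
% \bar{x} is the sample mean and (n-1)s^2 = \sum_i (x_i - \bar{x})^2.
\begin{align*}
f(x_1,\dots,x_n \mid \mu,\sigma^2)
  &= (2\pi\sigma^2)^{-n/2}
     \exp\!\Big(-\tfrac{1}{2\sigma^2}\big[(n-1)s^2 + n(\bar{x}-\mu)^2\big]\Big) \\
  &= \underbrace{1}_{h(x)} \cdot
     \underbrace{(2\pi\sigma^2)^{-n/2}
     \exp\!\Big(-\tfrac{(n-1)s^2 + n(\bar{x}-\mu)^2}{2\sigma^2}\Big)}_{g\big((\bar{x},\,s^2)\,\mid\,\mu,\,\sigma^2\big)}
\end{align*}
% g depends on the data only through the pair (\bar{x}, s^2), so that pair is sufficient for (\mu, \sigma^2).
```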
So that is why I need to consider them together: x bar and s square. With this interpretation, the pair (x bar, s square) is a sufficient statistic for theta. In this case I took h(x) to be the constant 1 and g of (x bar, s square) given (mu, sigma square) to be the entire expression, including the (2 pi sigma square) to the power minus n by 2 in front, because that factor involves the unknown sigma square and therefore has to sit inside g. But I could have defined them differently: for instance, move a genuine constant such as (2 pi) to the power minus n by 2 into h(x) and keep the rest in g; given the parameter, that g is still a function of x bar and s square only, and h(x) is then just that constant. So there are different ways in which you can think of the h and g functions here. It is not that h and g need to be unique; the theorem is not saying that there exists a unique pair such that the factorization holds. All the theorem says is that there exist functions g and h; you just need to show that some such pair exists.

Now let us look into the exponential family and see whether, just by looking at its structure, we can find a sufficient statistic. We know that many, many distributions belong to the exponential family. By the way, did you come across anything which did not fall in the exponential family? The t distribution did not, for any degrees of freedom, and the F distribution also did not; and the uniform distribution with an unknown endpoint does not fall in the exponential family either, since its support depends on the parameter. Fine. So many distributions fall in the exponential family, and the exponential family has this nice structure: we said that a distribution belongs to the exponential family if its pdf can be written in terms of an h function, a c function of theta, and the w i's and t i's inside the exponent.

Now you see that this pdf is already, in a way, factorized: an h function, a c function, and within the exponent the w i's and t i's. Since it is already in a factorized form, can I apply the factorization theorem and see whether the exponential family has a sufficient statistic, and what it is? The first part, h, is already given to you for free. The c(theta) factor is also harmless because it does not involve any x: given theta, it is just a number. What matters is only what is inside the exponent, and there things are also nicely behaved: theta and x are separated. Once you give me theta, the w i's are fixed, so the data enter only through the t i's. For a single observation, the vector (t 1 of x, t 2 of x, all the way up to t k of x) is right away a sufficient statistic for you; for a random sample of size n, the same argument gives the vector of sums of the t i's over the sample. You do not need to go through all the hoops of checking that some ratio becomes independent of theta; recall that one of our characterizations of a sufficient statistic was to take the ratio of the joint density of the sample to the density of the statistic given the parameter and check that it is free of theta. That was one test for us.
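Written out for an i.i.d. sample of size n from the usual k-parameter exponential family form, the factorization reads roughly as follows — a sketch; the slide's exact notation may differ.

```latex
% Exponential family: a single observation has density
%   f(x \mid \theta) = h(x)\, c(\theta)\, \exp\!\big(\sum_{i=1}^{k} w_i(\theta)\, t_i(x)\big).
% For an i.i.d. sample x_1, \dots, x_n the joint density already factorizes:
\begin{align*}
f(x_1,\dots,x_n \mid \theta)
  &= \prod_{j=1}^{n}\Big[h(x_j)\, c(\theta)\,
     \exp\!\Big(\sum_{i=1}^{k} w_i(\theta)\, t_i(x_j)\Big)\Big] \\
  &= \underbrace{\Big(\prod_{j=1}^{n} h(x_j)\Big)}_{h^{*}(x)}
     \cdot
     \underbrace{c(\theta)^{n}
     \exp\!\Big(\sum_{i=1}^{k} w_i(\theta) \sum_{j=1}^{n} t_i(x_j)\Big)}_{g\big(T(x)\,\mid\,\theta\big)}, \\
T(x) &= \Big(\sum_{j=1}^{n} t_1(x_j),\,\dots,\,\sum_{j=1}^{n} t_k(x_j)\Big).
\end{align*}
% The data enter g only through T(x), so T(X) is sufficient for \theta.
```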
But if you are going to use the factorization theorem, you do not need to go through all those hoops: you right away have not only a statistic, but you also know that it is a sufficient statistic. Fine.

I just said that in the factorization theorem the factors g and h need not be unique; but are sufficient statistics themselves unique, or could there be multiple sufficient statistics for a parameter? There could be multiple. Then if I tell you one sufficient statistic, can you generate another? Multiplying by a constant is a trivial thing you can do; what else, something smarter? Multiplying by a constant adds nothing: if you multiply by a constant and hand it to me, I will divide and give it back to you, so there is no more information in that. What kind of function works? The mapping has to be one-to-one, and that is what we will see now: if T is doing the data reduction and holding all the information, and you apply another transformation to it in an invertible way, the result should still retain that property.

But first notice this: if I have a random sample of n points, I can simply take my statistic to be the sample itself. You are trying to extract information from the sample, and I hand you the entire sample as the value of my statistic. That is also a statistic, and it also happens to be a sufficient statistic, because you can always write f of x given theta as g of T(x) given theta times h(x), where h(x) is just 1 and T is the identity function. Nothing has changed, and the factorization theorem says this T is indeed a sufficient statistic: h is already there, and the density itself I can treat as my g function. Of course, it does no data reduction at all; you have just been given everything, to do with it whatever you want.

Now, as some of you noticed, if you take a sufficient statistic and apply any one-to-one function to it, the result continues to be a sufficient statistic. How do we see that? Again using the factorization theorem. Suppose T(x) is a sufficient statistic and you transform it using some function r, and we want the transformation to be one-to-one; if it is one-to-one, the function r is invertible. Call the new statistic T star, so T star of x equals r of T of x. Let us see how to use the factorization theorem to conclude that this new statistic T star is also sufficient. Since I started with T, which is sufficient, the factorization theorem tells me there are functions g and h such that f of x given theta equals g of T(x) given theta times h(x). Now I can invert the relation: T(x) is simply r inverse of T star of x. So I have three functions in play: g, r inverse, and T star. I can treat the composition of g with r inverse as another function, call it g star; then f of x given theta equals g star of T star of x given theta times h(x), and this g star depends on x only through T star of x. If that is the case, can I now claim through the factorization theorem that T star is a sufficient statistic? Yes.
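Compactly, the chain of identities is roughly this — a sketch, with r assumed invertible.

```latex
% T(x) is sufficient, r is one-to-one, and T^*(x) = r(T(x)).
% Since r is invertible, T(x) = r^{-1}(T^*(x)).
% Define g^*(t \mid \theta) := g(r^{-1}(t) \mid \theta).
\begin{align*}
f(x \mid \theta)
  &= g\big(T(x) \mid \theta\big)\, h(x)              && \text{(factorization for } T\text{)} \\
  &= g\big(r^{-1}(T^{*}(x)) \mid \theta\big)\, h(x)  && \text{(substitute } T = r^{-1}(T^{*})\text{)} \\
  &= g^{*}\big(T^{*}(x) \mid \theta\big)\, h(x).     && \text{(definition of } g^{*}\text{)}
\end{align*}
% g^* depends on the data only through T^*(x), so by the factorization
% theorem T^* is also a sufficient statistic.
```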
So if you give me a sufficient statistic, you do not need to just multiply it by a constant: take any one-to-one function, apply it, and you will get a new sufficient statistic, and by taking different one-to-one functions you may generate as many sufficient statistics as you want.

Now, we just discussed that there could be many sufficient statistics. We started with a statistic, which is something that gives you data reduction, but what we wanted is the one that best reduces the data in some sense; that is where we looked for sufficient statistics. We said that once you reduce to a sufficient statistic, that is it, you do not need any additional information. But now that there are so many sufficient statistics, we are not happy; we want to see which among the sufficient statistics are good, and that is where we use the notion of a minimal sufficient statistic. What is a minimal sufficient statistic? We call a sufficient statistic T minimal if, for any other sufficient statistic T prime, T prime of x equals T prime of y implies T of x equals T of y. What does this condition mean? It says: according to the statistic T prime, the points x and y are the same; if x and y give me the same value of the statistic, they are essentially the same for me. If they are the same under T prime, they must also be the same under T. So if some other sufficient statistic says these two points are the same, the one which is better should also say these two points are the same. We will not go further into that here; instead let us look directly at an example of where a minimal sufficient statistic arises.

Consider a sample coming from a Gaussian with parameters mu and sigma square, where for now only mu is unknown and sigma square is known. If I have samples x 1, x 2, and so on from this population, I can compute x bar from the sample, and I can also compute s square. I know that if I take T 1 equal to x bar as a statistic, it is a sufficient statistic for my parameter mu. Since I can also compute s square, I can take another statistic, T 2 equal to the pair (x bar, s square), and I know this is also a sufficient statistic for mu. Why? You can simply ignore the s square component and x bar alone is already sufficient; it carries more than needed, but nobody is stopping you from computing s square from the same sample. Notice that I can obtain T 1 from T 2: T 2 gives me x bar and s square, I just ignore s square and I get x bar, which is T 1. Whether you give me T 1, which contains x bar, or T 2, which contains x bar as well as s square, in terms of information about the parameter mu both carry the same amount of information. The additional component s square is not adding any more information about mu beyond what x bar has already provided. So if you think in terms of information about my parameter mu, both T 1 and T 2 are good; both are sufficient statistics.
But from the data reduction point of view, which one is better? T 1 is better, because T 2 is unnecessarily storing s square, which provides no more information about mu once I know x bar. So not only am I interested in how well the information about my parameter is captured; I also do not want the statistic to carry redundant information. T 1 gives me what I need about mu; in T 2 the s square component is redundant. That is why, comparing T 1 and T 2, we can say T 1 is better for the parameter mu: it is storing less, so in terms of data reduction it is better. So which is minimal here? If you compare T 1 and T 2, T 1 is relatively minimal compared to T 2 from the data reduction point of view.

I will just state this result and leave its proof, so that at least a few of you can start looking into problems which use this characterization of a minimal sufficient statistic; the rest we will do later, and the proof is somewhat involved. Let f(x given theta) be the pdf of the sample, and suppose there exists a statistic T such that, for every pair of sample points x and y, the ratio of the pdf at x to the pdf at y is constant as a function of theta if and only if T(x) equals T(y), that is, if and only if x and y provide the same information about the parameter through the statistic, which means they are indistinguishable to me. If that is the case, then T is a minimal sufficient statistic. So to understand how to check whether a sufficient statistic is minimal, one way is this: take your sufficient statistic, take two points x and y on which it gives the same value, and check that the ratio of the pdf at x to the pdf at y is constant in theta; and conversely, whenever that ratio happens to be constant in theta, the statistic must give the same value at x and y. If both directions hold, then T is a minimal sufficient statistic. We will not get into the proof of that; those who are interested, take a look, because its proof is not simple, but we will skip it. I will stop here, and next class we will revisit this and discuss examples related to the minimal sufficient statistic.
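Stated compactly, the criterion is roughly the following, together with a preview sketch of how the ratio check plays out for the Normal case with sigma square known; the preview is only an illustration here, the worked examples come next class.

```latex
% Minimal sufficiency criterion: if for every pair of sample points x, y,
%   f(x \mid \theta) / f(y \mid \theta) is constant as a function of \theta
%   if and only if T(x) = T(y),
% then T(X) is a minimal sufficient statistic for \theta.
%
% Preview of the ratio check for N(\mu, \sigma^2) with \sigma^2 known:
\begin{align*}
\frac{f(x \mid \mu)}{f(y \mid \mu)}
  &= \exp\!\Big(-\tfrac{1}{2\sigma^2}\Big[\sum_{i}x_i^2-\sum_{i}y_i^2
       - 2n\mu(\bar{x}-\bar{y})\Big]\Big),
\end{align*}
% which is free of \mu exactly when \bar{x} = \bar{y}; so T(X) = \bar{X}
% is minimal sufficient in this case.
```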