So, last time we talked about generating random samples, where we covered both direct and indirect methods. By the way, how many of you were able to solve the question on the indirect method in the mid-sem? I think many of you got confused and ignored whether the word "indirect" was there or not. When I say direct method, things become very simple: if I give you a CDF, you just invert it, and with the direct method one can easily generate samples. But the challenge is that we may not always be able to invert. The question I gave was related to which distribution? The F distribution. Was that easily invertible? No, it is a little complicated, and that is why one has to go with an indirect method, which is why I deliberately wrote the word "indirect" there. But many of you missed this, and there was also some confusion about the way the TAs evaluated it, but that is fine.

So basically, there we talked about how to generate samples if we are given a pdf, or rather a CDF. Now we are going to switch gears and start talking about the case where the data is already available to us, and from that data we want to infer the underlying CDF that generated it. Most often, we will assume the class of CDFs itself: we will assume that the samples are coming from a Gaussian distribution, or a Poisson distribution, or something like that, but we will not be told the parameters. And unless you know the parameters, you do not know exactly what the distribution is. So that is what we will do now: the data is given to us, we will be told appropriately that it comes from a particular class, but not the parameters, and your job is to identify those parameters from the data. For that we are going to look at something called data reduction. There are different principles there; we will focus most of our time on the sufficiency principle and something called sufficient statistics, and today we will also cover the factorization theorem.

Now, we are going to assume that the data is generated, say x_1, x_2, ..., x_n; these are the observed data, and we are going to say that this data comes from some underlying population f_θ. I do not know what θ is, but the structure of f_θ may be known. For example, if I say that f_θ is a Gaussian distribution, then you know its structure.

So, now let us say you have these data points; for now, this is just a bunch of data for us. Let us focus on the data reduction part: the data itself is of no value to me, but what is important to me is the information provided by the data. So you may be interested in obtaining some key information from this data, for example by computing the sample mean, the sample variance, or the smallest or largest value. To compute the sample mean I am going to use all the values; so it is for the sample variance, and to find the smallest value I am also going to use all the values. Now, in general we can talk about any function of the samples, which we earlier called a statistic: T is any function that can operate on my data samples, and we call it a statistic. We said that the sample mean, sample variance, smallest value, and largest value are all examples of statistics, and what they actually give you is basically a kind of data reduction or data summary: the sample mean is telling you one summary, the sample variance is giving you another summary, the smallest value is one summary, the largest value is another summary.
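As a quick illustration of the direct method recapped above, here is a minimal sketch in Python (my own example, not from the lecture; the rate λ = 2 is an arbitrary choice for the demo) that inverts the exponential CDF F(x) = 1 − e^(−λx), which unlike the F distribution has a simple closed-form inverse:

```python
import numpy as np

def sample_exponential(lam, n, rng=None):
    """Direct (inverse-CDF) method: the exponential CDF
    F(x) = 1 - exp(-lam * x) inverts to F^{-1}(u) = -ln(1 - u) / lam,
    so pushing Uniform(0, 1) samples through F^{-1} gives Exp(lam) samples."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=n)           # U ~ Uniform(0, 1)
    return -np.log(1.0 - u) / lam     # X = F^{-1}(U) has CDF F

samples = sample_exponential(lam=2.0, n=100_000)
print(samples.mean())  # should be close to the true mean 1/lam = 0.5
```

When no such closed-form inverse exists, as with the F distribution in the mid-sem question, this recipe breaks down and one falls back on an indirect method.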
Now, suppose you are only interested in the statistic, and the statistic is giving me the summary. If it so happens that two samples x and y give the same summary, that is, T(x) and T(y) happen to be the same, then from the information point of view both x and y are the same for me, because they give me the same information when I use my statistic.

Now, thinking more about this, I am going to take one particular example of a statistic, which is the sum of all the values in my random sample. Suppose I have one bunch of n samples and another bunch of n samples; let us call the first one x^(1) and the second one x^(2). It may happen that if you add up all the values in each, they add up to the same value, while both of them could still be coming from the same underlying distribution; my assumption here is that both come from the same underlying distribution. So what is happening is that I summarize the first sample by applying some statistic, in this case the sum, and I summarize the second sample by applying the same statistic T. Now, if they happen to give me the same value, that is, T(x^(1)) equals T(x^(2)), we said that x^(1) and x^(2) are the same for me because they provide the same information. So at this point these two random samples are the same for me. I may therefore group together all the random samples that give me the same information, and that grouping can lead to a partitioning of my space.

So, last time we said, let us take a two-dimensional case. My space is only the unit square: this axis is x_1, that axis is x_2, each going from 0 to 1. Let us take one hypothetical case: I draw a line and take two points on it, say the point (0.2, 0.8) and the point (0.8, 0.2). If I add the components of these two points, will they have the same value? Yes, both sum to 1, and in fact you can take any point on this line and it will add up to the same value. So from the data reduction point of view, if my statistic is the sum, all the points on this line are the same for me; they are indistinguishable. Like that, I can now start thinking of this data reduction as a partitioning of my space itself: I collect all the points x that map to the same number, let us call it t, and I call that set A_t.

Now, let us take another value. On the first line the sum is 1. If I am interested in all the points whose sum is going to be, say, 0.5, on which line will they lie? One such point is (0.5, 0), another is (0, 0.5), and if I connect them, all the points on that line have sum 0.5. So, like that, if I take t = 0.5, the set A_t is going to be all the points on this second line, and similarly if I take t = 1, the set A_t is going to include all the points on the first line. Now, I can consider all these sets A_t. What are the possible values of t? What is the value at the corner (1, 1)? It is going to be 2 if I sum. So I will be interested in taking t over all the numbers from 0 to 2.
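To make the partitioning concrete, here is a minimal sketch (a hypothetical discrete analogue of the unit-square picture above, using the finite space {0,1}^3 so each set A_t can be listed explicitly) that groups every outcome by the value of the statistic T(x) = sum(x):

```python
from itertools import product
from collections import defaultdict

# Partition the finite sample space {0,1}^3 by the statistic T(x) = sum(x).
# Every outcome lands in exactly one set A_t = {x : T(x) = t}, so the
# sets are disjoint and together cover the whole space.
partition = defaultdict(list)
for x in product([0, 1], repeat=3):
    partition[sum(x)].append(x)

for t in sorted(partition):
    print(f"A_{t} = {partition[t]}")
# A_0 = [(0, 0, 0)]
# A_1 = [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
# A_2 = [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
# A_3 = [(1, 1, 1)]
```

In the continuous unit-square case, the same grouping produces the lines x_1 + x_2 = t, one line for each value of t in [0, 2].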
So, if I construct all the sets A_t for t between 0 and 2, will they overlap, or are they disjoint? They are going to be disjoint, because each set lies on its own line, a different line depending on which t I choose. That is why this collection of sets A_t now forms a partition. So, in a way, when I reduce data, the possible reductions I get partition my space in this fashion.

Now, what are the advantages of this data reduction? We discussed this last time. One obvious thing is storage: if I tell you just the sum, then I do not need to maintain the entire vector x_1, x_2, ..., x_n; I have reduced it to one value, and I just need to store that one value, which is useful in terms of reducing storage. But our question will always be with respect to getting information about the underlying parameter. That is our basic principle: when I started talking about this, we said that I have data and what I want is to extract information about the parameter θ, and data reduction came in as a means to get information about that θ. Now, the question here is: fine, data reduction is fine, you have these samples and you can reduce them to one number, but the question that always remains is, when you reduce, how well does that reduced value capture information about your parameter θ?

For that we are going to look into different principles. One is called the sufficiency principle; it talks about when your statistic captures sufficient information about your parameter θ, so the sufficiency principle will govern this. We are going to look into another thing called the likelihood principle, which writes the parameter you are interested in as a function of your observed data, and then tries to capture how your parameters and the data points are related in the most likely fashion; the likelihood principle will try to capture that essence. And there is something called the equivariance principle, which we will not go into; it is just a relaxation of the condition here. In the sufficiency case we said that if x and y map to the same statistic, then x and y are kind of the same for us, but why should that be the case? Maybe x and y are somewhat related but not exactly the same; that will be captured by the equivariance principle, but we will not be dealing much with that in this course.
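As a preview of the sufficiency principle named above, here is a minimal sketch (my own illustrative example, not from the lecture) for i.i.d. Bernoulli(p) samples with T(x) = sum(x): the conditional probability of the full sample given T works out to 1/C(n, t), which is free of p, so once you know the sum, the individual observations carry no further information about p:

```python
from math import comb

def conditional_prob(x, p):
    """P(X = x | T(X) = t) for i.i.d. Bernoulli(p) bits x, with t = sum(x).
    Joint:    P(X = x) = p^t * (1-p)^(n-t)
    Marginal: P(T = t) = C(n, t) * p^t * (1-p)^(n-t)
    The p-dependent factors cancel, leaving 1 / C(n, t)."""
    n, t = len(x), sum(x)
    joint = p**t * (1 - p)**(n - t)
    marginal = comb(n, t) * joint
    return joint / marginal

x = [1, 0, 1, 1, 0]
for p in (0.2, 0.5, 0.9):
    print(p, conditional_prob(x, p))  # same value, 1/C(5,3) = 0.1, for every p
```

This is exactly the flavor of statement the factorization theorem, coming up next, will let us verify without computing conditional distributions directly.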