So, in the Cramér-Rao bound we ended up seeing this term, the partial derivative of the log of your pdf, and you see that this quantity, sitting in the denominator, actually governs how small your lower bound can be. Because of that it has been given a special name and it also has a special interpretation: it is called the information number, or the Fisher information of your sample, I(θ) = E[(∂/∂θ log f(X; θ))²]. Naturally, if this information number is larger, the lower bound in the Cramér-Rao bound is going to be smaller, which in a way says that the variance of an estimator can be smaller. And if an estimator is such that its variance is small, what does that mean? It means it is able to capture the information in the data about your parameter well. By the way, notice that this quantity does not depend on the estimator at all; it is only a property of your pdf. So irrespective of what your estimator is, what matters is how far your data is spread out, and that is a property of the pdf alone. Now, if the pdf is such that the data is spread out too much, will this quantity be larger or smaller? If your data is spread out, do you expect any estimator to do a good job or a bad job? A bad job, and in that case you expect this quantity to be lower, because the Cramér-Rao bound puts it in the denominator: a small Fisher information means any estimator has to incur a large variance.
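As a quick numerical sanity check (my own sketch, not worked in the lecture): for Bernoulli(p) samples the per-observation Fisher information works out to I(p) = 1/(p(1−p)), so the Cramér-Rao lower bound for an unbiased estimator based on n samples is p(1−p)/n. The sample mean is unbiased and, for the Bernoulli, actually meets this bound with equality, which the Monte Carlo below checks. The function name and parameter values are illustrative choices, not from the lecture.

```python
import random

def crlb_check(p=0.3, n=50, trials=20000, seed=0):
    """Compare the variance of the sample mean (an unbiased estimator of
    the Bernoulli parameter p) against the Cramér-Rao lower bound
    p(1-p)/n.  For this estimator the bound holds with equality."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [1 if rng.random() < p else 0 for _ in range(n)]
        estimates.append(sum(sample) / n)
    mean = sum(estimates) / trials
    var = sum((e - mean) ** 2 for e in estimates) / trials
    crlb = p * (1 - p) / n   # per-sample Fisher information is 1/(p(1-p))
    return var, crlb

var, crlb = crlb_check()
print(var, crlb)  # the empirical variance should sit close to the bound
```

Running this, the empirical variance of the sample mean lands within Monte Carlo noise of the bound, which is the "best possible estimator" story the bound is telling.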
But on the other hand, if your data is not so spread out, it is easier to infer the parameter, and then this quantity is going to be larger. That is why, because of the appearance of this quantity in the lower bound, the Cramér-Rao bound is also called the information inequality. In the tutorial you will see how to compute this lower bound for various pdfs; the specific examples we will see there, but are there any questions about the general steps one has to follow in computing the mean squared error, the Fisher information, or the Cramér-Rao lower bound? Now, here is something I did not plot, but if you read the book you can: plot the mean squared error as a function of n, with n on the x axis and the mean squared error on the y axis, for both estimators W and W'. It may happen that both mean squared errors fall as n increases. Say the curve for W starts lower, while the curve for W' starts higher but falls faster, so that the two cross at some point; call that point n'. Then for n smaller than n', W is actually better, and as n increases beyond n', W' may become better. So depending on how many samples you have, you can decide whether the biased estimator or the unbiased one is going to be better.
Of course, the mean squared error is not going to fall all the way to 0; the plot is just for representative purposes, but I hope you got the picture. This is where the analysis is important: when you have data, depending on your number of samples, you need to decide which estimator is going to work out better for you. You may need to compute the mean squared errors of the various estimators you can think of, and you may end up using a biased estimator, because for that many samples the biased estimator may work out better. So it is not necessary that unbiased estimators always do a good job; a biased estimator can also do a good job. That is why it is important to compute all these expressions, maybe for some toy examples first: if your data is discrete and looks close to binomial, compute all these things for the binomial; if your data looks more Gaussian, compute them for the Gaussian. And by the way, notice that some of these expressions are true irrespective of what the underlying distribution is, but to calculate the variance of your sample-variance estimator you do need some property of the distribution. That is where you need to see which distribution you should use to make these computations. So the bias expressions I have written here hold for every distribution, but computing the variance of the variance estimator is not easy; you will be able to compute it only for specific distributions like the Gaussian, or maybe simpler ones.
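The point that a biased estimator can have smaller mean squared error is easy to check numerically. The sketch below (my own illustration, with arbitrary parameter choices) estimates by Monte Carlo the MSE of the unbiased sample variance (dividing by n − 1) and of the biased MLE (dividing by n) for Gaussian data; for Gaussian samples the biased version has the smaller MSE at every n.

```python
import random

def mse_of_variance_estimators(n=10, sigma2=4.0, trials=50000, seed=1):
    """Monte Carlo estimate of the mean squared error of two variance
    estimators for i.i.d. N(0, sigma2) data:
      - unbiased:   divide the sum of squared deviations by n - 1
      - biased MLE: divide by n
    For Gaussian data the biased estimator has the smaller MSE."""
    rng = random.Random(seed)
    se_unbiased = se_biased = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
        m = sum(xs) / n
        ss = sum((x - m) ** 2 for x in xs)        # sum of squared deviations
        se_unbiased += (ss / (n - 1) - sigma2) ** 2
        se_biased += (ss / n - sigma2) ** 2
    return se_unbiased / trials, se_biased / trials

mse_u, mse_b = mse_of_variance_estimators()
print(mse_u, mse_b)  # the biased MLE comes out smaller
```

For Gaussian data the unbiased estimator's MSE is 2σ⁴/(n − 1), which the simulation reproduces; the biased one trades a little bias for a larger drop in variance, exactly the trade-off discussed above.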
So, depending on whether your data is better represented by the Gaussian or by the exponential, you should plug in that particular distribution and see what works out better for you. Any other question on this? If not, you will see examples of this in the tutorial. Let us now move on to our next topic, called hypothesis testing. How many of you have heard of this topic before, yes or no? Now, the same question we were asking about parameters can be asked in a different way. Instead of estimating the parameter exactly, we can ask whether the data I am seeing is consistent with a particular value of the population parameter or not; I can ask yes/no questions. Earlier I was trying to find out exactly what the parameter is, but here I could just ask whether it belongs here or there. So I will make hypotheses like that: the "yes" hypothesis is this, the "no" hypothesis is that, and it boils down to testing those hypotheses. The definition: a hypothesis is a statement about a population parameter. The hypothesis can be that the parameter lies in this range or in that range, and you then come up with a criterion to decide between them. Most of the time in hypothesis testing we work with two hypotheses that are complementary to each other. The two complementary hypotheses are called the null hypothesis and the alternative hypothesis, often denoted H0 and H1 respectively. The general form is this: take your parameter space Θ and partition it into two parts; call the upper part Θ0 and the lower part Θ0 complement.
You do not care which particular point the parameter is; what you care about is whether your θ belongs to this region or that region. You have only two hypotheses here, θ in this region or θ in that region, and you need to come up with a method to evaluate that. If you take the real line, the hypothesis could be as simple as putting some known threshold and asking whether the parameter lies on this side or on that side; the threshold acts like a boundary. Now, the two hypotheses, null and alternative, need to be tested, and a hypothesis testing procedure, or a hypothesis test, is a rule that prescribes for which sample values the decision is made to accept the null hypothesis as true, and for which sample values H0 is rejected and H1 is accepted as true. Naturally, since there are only two hypotheses to be tested, when I reject the null hypothesis I am accepting the alternative hypothesis. The previous diagram I showed was the parameter space; now say this is my sample space, and there is some partition of it. I need to come up with a decision rule, something which says that all the points in this region correspond to the null hypothesis: if my sample falls in this region I am going to accept the null hypothesis, and if it comes from the rest of the space I am going to accept the alternative hypothesis, or basically reject my null hypothesis. Now the question is, how to come up with this decision boundary? Who is going to give me the decision boundary? For that we need a method, and the method we are going to use is called the likelihood ratio test.
All of you already know what the likelihood function is; now we are going to use it to define the likelihood ratio test, and using the likelihood ratio test we will define our decision rules. Suppose you are given two hypotheses: H0, which says my parameter belongs to the set Θ0, and H1, which says my parameter belongs to the complement of that set. For any random sample x, you take the ratio of these two quantities,

λ(x) = sup_{θ ∈ Θ0} L(θ | x) / sup_{θ ∈ Θ} L(θ | x),

where the numerator maximizes the likelihood function over the null-hypothesis set, and the denominator maximizes it over all possible parameters, that is, over both the null and the alternative sets together. Now let us go back and recall what we said about likelihood functions: the likelihood function captures how likely a value θ is for the observed samples. So the numerator computes the best θ within the null hypothesis that explains the observed sample x, and the denominator finds, among all possible θ, the one that explains x best. Think intuitively: if λ(x) happens to be large, the numerator is comparable to the denominator, which means some parameter θ in my null hypothesis is explaining x well; and if λ(x) is small, the denominator dominates, which means a parameter that is not in my null hypothesis explains my data better. So hypothetically, if λ(x) is large you want to accept the null hypothesis. But then the question comes: what counts as large? For that we are going to introduce a parameter: a likelihood ratio test is any test that has a rejection region of the form {x : λ(x) ≤ c}, where c is some constant you are going to decide.
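To make the definition concrete, here is a standard textbook case that is not worked out in the lecture: testing H0: μ = μ0 against H1: μ ≠ μ0 for i.i.d. N(μ, σ²) data with σ known. The numerator of λ(x) is the likelihood at μ0; the denominator is maximized by the MLE μ̂ = x̄, and the ratio simplifies to λ(x) = exp(−n(x̄ − μ0)²/(2σ²)), so rejecting when λ(x) ≤ c is the same as rejecting when |x̄ − μ0| exceeds a threshold. All names and numbers below are my own illustrative choices.

```python
import math

def gaussian_mean_lrt(xs, mu0, sigma, c):
    """Likelihood ratio test for H0: mu = mu0 vs H1: mu != mu0,
    for i.i.d. N(mu, sigma^2) data with sigma known.
    lambda(x) = exp(-n (xbar - mu0)^2 / (2 sigma^2)), so the rejection
    region lambda(x) <= c is |xbar - mu0| >= sqrt(-2 sigma^2 ln(c) / n)."""
    n = len(xs)
    xbar = sum(xs) / n
    lam = math.exp(-n * (xbar - mu0) ** 2 / (2 * sigma ** 2))
    return lam, lam <= c

# A sample mean far from mu0 = 0 gives a tiny lambda, so H0 is rejected
lam, reject = gaussian_mean_lrt([2.1, 1.8, 2.4, 2.0], mu0=0.0, sigma=1.0, c=0.1)
print(lam, reject)
```

Note how the abstract "λ(x) ≤ c" rule turns into a simple threshold on the sample mean here; choosing c then amounts to choosing how far x̄ may wander from μ0 before we stop believing H0.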
What this is going to do is: for all points x for which λ(x) is less than or equal to c, it says reject; those samples are not coming from my null hypothesis, they are better explained by my alternative hypothesis. That is my rejection region. I forgot an example; can you think of one we can work out now? Take a simple case: let x = (x1, x2, ..., xn) where the xi are Bernoulli with parameter p. What is the likelihood function for the Bernoulli? Let us take the log-likelihood instead, that is simpler, and we know nothing changes by taking the log: we are not only looking for the argument of the maximum here, we want the actual optimal value, but the optimal value is recovered just as well after taking the log. The log-likelihood is

log L(x | p) = (Σ xi) log p + (n − Σ xi) log(1 − p).

Now propose me a hypothesis. I am going to check whether these tosses are coming from a fair coin or not. So my null hypothesis is p = 1/2, and H1 is p ≠ 1/2. Here my set Θ0 has just the one point, {1/2}, and Θ0 complement is every p in (0, 1) with p ≠ 1/2; it has every other point in it. Now let us do the optimization. The likelihood ratio is

λ(x) = sup_{p ∈ Θ0} L(x | p) / sup_{p ∈ (0, 1)} L(x | p),

where the denominator ranges over the entire interval, which of course includes 1/2.
So, what is the maximum value in the numerator? Θ0 is just the point 1/2, so there is nothing to optimize; I compute the likelihood at p = 1/2, which gives (1/2)^{Σ xi} (1/2)^{n − Σ xi} = (1/2)^n. And what is the optimizer in the denominator? We already noticed that the maximizer is p̂ = (1/n) Σ xi; everybody agrees this is the optimal value. So, replacing p by this quantity, the denominator is (Σ xi / n)^{Σ xi} (1 − Σ xi / n)^{n − Σ xi}. The numerator is simply (1/2)^n, and there is nothing to simplify in the denominator; just keep it like this. Now, if a point x is given to you, you compute λ(x) as this ratio and come up with your rejection region: you check whether λ(x) is less than or equal to c for a given c. If it is, you reject; you say this is not coming from a fair coin. On the other hand, if λ(x) happens to be larger than c, you accept it as coming from a fair coin. So it all depends on how you are going to set this c. We will continue discussing this in the next class.
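The fair-coin test above can be sketched in a few lines. This is a minimal illustration of the lecture's derivation: the numerator is the likelihood at p = 1/2, i.e. (1/2)^n, the denominator is the likelihood at the MLE p̂ = k/n with k = Σ xi, and we reject H0 when λ(x) ≤ c. The function name and the choice c = 0.1 are my own.

```python
def bernoulli_lrt(xs, c):
    """Likelihood ratio test for H0: p = 1/2 (fair coin) vs H1: p != 1/2.
    Numerator:   likelihood at p = 1/2, which is (1/2)^n.
    Denominator: likelihood at the MLE p_hat = k/n, where k = sum(xs).
    Reject H0 (declare the coin unfair) when lambda(x) <= c."""
    n = len(xs)
    k = sum(xs)
    p_hat = k / n
    numerator = 0.5 ** n
    denominator = p_hat ** k * (1 - p_hat) ** (n - k)  # 0**0 == 1 covers k = 0 or k = n
    lam = numerator / denominator
    return lam, lam <= c

# 18 heads out of 20 looks far from fair: lambda is tiny and H0 is rejected,
# while a perfectly balanced sample gives lambda = 1 and H0 is kept.
lam, reject = bernoulli_lrt([1] * 18 + [0] * 2, c=0.1)
print(lam, reject)
```

Notice that λ(x) = 1 exactly when p̂ = 1/2, since then the numerator and denominator coincide; the further k/n drifts from 1/2, the smaller λ(x) gets, so the test rejects exactly the lopsided samples, with c controlling how lopsided is too lopsided.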