So, in the last lecture we started discussing confidence sets a little, and we will continue with that today. Before that, we mostly talked about hypothesis testing, where we covered power functions and uniformly most powerful tests, and then the Neyman-Pearson lemma.

Before hypothesis testing, all the methods we discussed were point estimators. With the maximum likelihood estimator we got one value for the estimator based on the sample, and with the method of moments we again got one value. What other methods did we discuss in point estimation? We talked about Bayes estimators, right? There also we get one value for the estimator. We did not talk about another method, called the EM method, expectation-maximization; there also you try to find one value, iteratively.

In hypothesis testing, instead of asking what the parameter is, we asked whether the sample corresponds to a given set of parameters. For example, we asked whether theta is equal to some theta naught or not, and we came up with a rejection region to answer such questions.

Now, with confidence intervals, instead of looking for one value for the estimator, we do set estimation: we say that the true parameter, which we are trying to find, lies in some region. So, let us get started with this. Suppose we have a random sample coming from some population with an underlying parameter theta that is unknown to you.
Now, we can compute a point estimate of theta, theta hat, and we have discussed many methods for that. Whenever theta hat has a continuous distribution, as is typical when theta is continuous valued, we know that the probability that the estimated value equals the true value is 0. So there is no way I can claim that my estimated value is exactly the true value; my confidence in that is 0. Instead, we can find a set C and ask what the probability is that the true value theta (there is no hat here) lies in that set C, and maybe we will be able to say that this happens with positive probability. Then I can say something more confidently.

For example, instead of saying that I will come to the class exactly at 2 p.m., I can say that I will come to the class between 2:00 and 2:05 p.m. Most of the time I will arrive between 2:00 and 2:05, so I am much more confident that I will reach within that 5-minute window. So I am not giving you a point; I am giving you a window.

Now we will discuss how to come up with such interval estimators, which we also call set estimation. The inference in a set estimation problem is a statement that theta belongs to C. The set C is a subset of the parameter space, capital Theta, since I do not want to say anything outside my possible set of parameters, and C is itself a function of the observed sample: it is a set determined by the observed sample x. So, earlier, given a random sample X, you found theta hat of X, which of course depends on X and is one value. Now, instead of that, we come up with a set C of X. The first is a point, and this is a set.
Now, how are you going to characterize this set? An interval estimate of a real-valued parameter theta is any pair of functions L of x and U of x such that L of x is no larger than U of x, and this should hold for all possible samples x. When we plug in the random sample, capital X (bold X), the random interval from L of X to U of X is called an interval estimator. Notice that for a given x, L of x is a real number and U of x is a real number, and what I am giving you is the interval between them. It need not always be an interval; in some situations we may end up with more general sets.

A couple of things to note. First, the interval from L of X to U of X is a random quantity. Why? Because it depends on the random sample itself. Say for one sample I get one interval, and for another sample, maybe from the same population, I may get a different interval. So these intervals are themselves random quantities, and when I have a particular realization of the random sample, which we denote by small x, then for that x I have the realized interval from L of x to U of x.

However, in some cases we may not be interested in both L of X and U of X. We can take L of X to be minus infinity, or U of X to be plus infinity, and then we get one-sided confidence intervals. If L of X is minus infinity, I have only U of X, and the whole range from minus infinity to U of X is my one-sided confidence interval. Similarly, the other way around, if you set U of X to plus infinity and keep only L of X, the range from L of X to plus infinity is another one-sided interval.
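To make the idea of a pair of functions L and U concrete, here is a minimal Python sketch. The specific endpoints (sample mean minus 1 and plus 1), the function names, the value of true_mu, and the seed are all illustrative assumptions, not part of the definition; the only point is that two samples from the same population give two different realized intervals.

```python
import math
import random

def lower(x):
    """L(x): illustrative lower endpoint, sample mean minus 1 (an assumed choice)."""
    return sum(x) / len(x) - 1.0

def upper(x):
    """U(x): illustrative upper endpoint, sample mean plus 1 (an assumed choice)."""
    return sum(x) / len(x) + 1.0

def interval(x):
    """Realized two-sided interval [L(x), U(x)] for one observed sample x."""
    return (lower(x), upper(x))

def one_sided_upper(x):
    """One-sided interval (-inf, U(x)]: L(x) is taken to be minus infinity."""
    return (-math.inf, upper(x))

rng = random.Random(0)  # fixed seed, only so the run is repeatable
true_mu = 3.0           # hypothetical value; unknown in a real problem

# Two samples of size 4 from the same Gaussian population.
sample_a = [rng.gauss(true_mu, 1.0) for _ in range(4)]
sample_b = [rng.gauss(true_mu, 1.0) for _ in range(4)]

interval_a = interval(sample_a)
interval_b = interval(sample_b)
print(interval_a)                  # realized interval for the first sample
print(interval_b)                  # a different realized interval
print(one_sided_upper(sample_a))   # one-sided version for the first sample
```

Each call to interval returns plain numbers, which is the realized interval; the randomness sits entirely in which sample was drawn.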
But mostly we will be interested in the case where I have both a lower bound and an upper bound governing my confidence interval. Other variants are also possible. For example, why take a closed interval at both ends? You may take it open at one end, or at both. It depends on how you want to define it and on your application.

We discussed this example in the last class; I am just repeating it here. Suppose I have random samples drawn from a Gaussian distribution with unknown mean mu, with the variance fixed at 1, and for simplicity I am working with just four samples. As usual, X bar is my sample mean, here the average of the 4 samples. But now, instead of directly taking this as my point estimator, I define L of X as X bar minus 1 and U of X as X bar plus 1. So instead of taking X bar as my estimate of mu, I take the interval from X bar minus 1 to X bar plus 1.

Now let us compute the probability that the true value mu lies in this interval. Can we calculate that? Yes. The probability that mu lies in the interval from X bar minus 1 to X bar plus 1 is the same as the probability that X bar minus 1 is less than or equal to mu and mu is less than or equal to X bar plus 1. Now I simply rearrange this: I bring X bar minus mu to the middle, and I am left with minus 1 on the left and plus 1 on the right. In the next step I divide throughout by the square root of 1 by 4. Why specifically the square root of 1 by 4? Because X bar has variance 1 by 4, so dividing by its standard deviation makes the quantity in the middle a standard normal. I denote that standard normal quantity by Z, and the bounds become minus 2 and plus 2.
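Written out, the chain of steps for this example looks as follows. This is a sketch in LaTeX of the same manipulation, using the fact that the mean of four independent N(mu, 1) samples has distribution N(mu, 1/4); the symbols are the ones used in the lecture.

```latex
\begin{align*}
P\bigl(\mu \in [\bar{X} - 1,\ \bar{X} + 1]\bigr)
  &= P\bigl(\bar{X} - 1 \le \mu \le \bar{X} + 1\bigr) \\
  &= P\bigl(-1 \le \bar{X} - \mu \le 1\bigr) \\
  &= P\!\left(-2 \le \frac{\bar{X} - \mu}{\sqrt{1/4}} \le 2\right) \\
  &= P(-2 \le Z \le 2), \qquad Z \sim N(0, 1).
\end{align*}
```

Note that the randomness is in X bar, not in mu: mu is a fixed unknown number, and the probability is about the random interval covering it.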
So now I have a standard Gaussian. We know most things about its tail behavior; the values are usually available in tables, or a simple program will compute them immediately. The probability that Z is in the range minus 2 to plus 2 is 0.9544. So you can claim, with about 95 percent confidence, that the true value is between X bar minus 1 and X bar plus 1.

What does this 95 percent mean here? Can somebody quantify it? Yes: if you give me a lot of samples and I construct this interval for each one, then on average about 95 percent of the intervals I provide will capture the true parameter. Remember, we are talking about the interval here, not a single value.

So, instead of giving one point, we have now given an interval. By doing so I have compromised on precision: I am not saying exactly what the value of the unknown mu is, I am only saying it is in this interval, but I am saying it with more confidence. If I said mu is exactly X bar, my confidence would be essentially 0. But if I say mu is between X bar minus 1 and X bar plus 1, I am about 95 percent confident.

A quick question: instead of X bar minus 1 and X bar plus 1, had I made it X bar minus 0.5 and X bar plus 0.5, what would have happened to my confidence? Would it have increased or come down? It would have decreased. And if I made it X bar minus 2 and X bar plus 2? It would have increased.
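The numbers in this example can be checked with a short program. The sketch below is only an illustration under the lecture's assumptions (four Gaussian samples, variance 1); the function names, the number of trials, the seed, and the true_mu used in the simulation are hypothetical choices. It computes the exact coverage for half-widths 0.5, 1 and 2 from the standard normal distribution, and also estimates the coverage for half-width 1 by repeating the experiment many times.

```python
import math
import random

def coverage_probability(half_width, n=4, sigma=1.0):
    """Exact P(mu in [xbar - h, xbar + h]) when xbar ~ N(mu, sigma^2 / n)."""
    z = half_width / (sigma / math.sqrt(n))   # standardized half-width
    return math.erf(z / math.sqrt(2.0))       # P(-z <= Z <= z), Z standard normal

def simulated_coverage(half_width, true_mu=0.0, n=4, sigma=1.0,
                       trials=20000, seed=1):
    """Fraction of repeated samples whose interval captures true_mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= true_mu <= xbar + half_width:
            hits += 1
    return hits / trials

exact = {h: coverage_probability(h) for h in (0.5, 1.0, 2.0)}
simulated = simulated_coverage(1.0)

for h, p in exact.items():
    print(f"half-width {h}: coverage {p:.4f}")
print(f"simulated coverage for half-width 1.0: {simulated:.4f}")
```

For half-width 1 the exact value comes out as about 0.9545, which agrees with the table value 0.9544 up to rounding. The half-width 0.5 gives a smaller coverage and the half-width 2 a larger one, matching the answers to the quick question, and the simulated fraction is the "out of many samples" reading of the 95 percent.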
So, I am trading off here: by being looser I become more confident, and if I want to be tighter I become less confident. Loose means more confident, tight means less confident, and you have to decide where to strike a good balance.

An analogy: suppose tomorrow the meteorological department says there is a 30 percent chance of rain, and some other forecaster says there is a 10 percent chance of rain. Which one, according to you, is making the better prediction, the one who says 10 percent or the one who says 30 percent? Is that number not their confidence? When they say it is likely to rain with 30 percent, they are only 30 percent confident about rain; whether we act on it is our decision. Is that enough for you to decide who is better in this case? What else is missing? Maybe we have to fill in more details of this problem to discuss that; let us not get into that now.