In the last lecture I introduced the concept of interval estimation and discussed one method of constructing confidence intervals with a given confidence coefficient: the method of pivoting. We constructed confidence intervals for the parameters of normal populations in one-sample and two-sample problems. Today I will also briefly discuss confidence intervals for proportions; that means we are dealing with a binomial problem. For example, we may have people who favor a certain proposition by the government, or people who can be categorized as one type in a population. If we are sampling from such a population, then to construct the confidence intervals we can use the normal approximation to the binomial distribution.

So let us consider confidence intervals for proportions. Typically we will have data of the following kind: we have a sample of n observations, and out of those we have x successes. Let us define the sample proportion p̂ = x/n. We want to construct a confidence interval for the parameter p, the proportion of successes in the binomial population. Consider (p̂ − p)/√(pq/n); it is a known result that this converges to normal(0, 1) as n tends to infinity. For large n we can approximate p by p̂ and q by q̂ = 1 − p̂, so we can write (p̂ − p)/√(p̂q̂/n) as approximately a normal(0, 1) random variable.

Now we can use the pivoting method by considering the interval from −z(α/2) to +z(α/2), which carries probability 1 − α: on the standard normal curve, the probability to the right of z(α/2) is α/2, and the probability to the left of −z(α/2) is α/2, so the probability in between is 1 − α. Therefore the probability that (p̂ − p)/√(p̂q̂/n) lies between −z(α/2) and z(α/2) is approximately 1 − α. We can construct the confidence interval from here by adjusting the terms. This is equivalent to

P( −z(α/2)√(p̂q̂/n) ≤ p̂ − p ≤ z(α/2)√(p̂q̂/n) ) = 1 − α,

or

P( p̂ − z(α/2)√(p̂q̂/n) ≤ p ≤ p̂ + z(α/2)√(p̂q̂/n) ) = 1 − α.

So we have the confidence interval for p: from p̂ − z(α/2)√(p̂q̂/n) to p̂ + z(α/2)√(p̂q̂/n), where p̂ is the sample proportion x/n. So an approximate confidence interval for p is constructed.

We may also consider comparing two binomial proportions. For example, it could be the proportion of people who drive a certain vehicle in city A and the proportion of people who drive that vehicle in city B. The proportions p1 and p2 may differ, and we may want a confidence interval for the difference, to judge whether one of them is smaller than, greater than, or equal to the other. So let us consider the confidence interval for a difference in proportions. Let X and Y be independent binomial random variables, with X following binomial(m, p1) and Y following binomial(n, p2).
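This single-proportion interval has a closed form, so it is straightforward to compute. Below is a minimal sketch in Python; the function name and the sample counts are my own illustration, and it assumes scipy is available for the normal quantile.

```python
# A minimal sketch of the approximate (Wald) confidence interval for a
# binomial proportion; sample values below are made up for illustration.
from math import sqrt
from scipy.stats import norm

def proportion_ci(x, n, alpha=0.05):
    """Approximate 100(1 - alpha)% CI for p based on x successes in n trials."""
    p_hat = x / n
    q_hat = 1 - p_hat
    z = norm.ppf(1 - alpha / 2)          # upper alpha/2 point of N(0, 1)
    half_width = z * sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width

# e.g. 57 successes out of 100 trials (hypothetical data)
print(proportion_ci(57, 100))            # approximately (0.473, 0.667)
```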
So, here obviously m and n are known, and we want a confidence interval for p1 − p2. Once again we will make use of the normal approximation to the binomial distribution. Consider p̂1 = x/m and p̂2 = y/n. Then

( p̂1 − p̂2 − (p1 − p2) ) / √(p1q1/m + p2q2/n),

where q1 = 1 − p1, q2 = 1 − p2, q̂1 = 1 − p̂1 and q̂2 = 1 − p̂2, is approximately normal(0, 1) as m and n tend to infinity. We can then replace p1q1 by p̂1q̂1 and p2q2 by p̂2q̂2 to get an approximate statement of the following nature:

P( −z(α/2) ≤ ( p̂1 − p̂2 − (p1 − p2) ) / √(p̂1q̂1/m + p̂2q̂2/n) ≤ z(α/2) ) = 1 − α.

Once again, as before, we can simplify:

P( p̂1 − p̂2 − z(α/2)√(p̂1q̂1/m + p̂2q̂2/n) ≤ p1 − p2 ≤ p̂1 − p̂2 + z(α/2)√(p̂1q̂1/m + p̂2q̂2/n) ) = 1 − α.

So we have an approximate confidence interval for p1 − p2 of the form p̂1 − p̂2 ± z(α/2)√(p̂1q̂1/m + p̂2q̂2/n), where again z(α/2) is the upper α/2 point of the standard normal curve. This method of pivoting, as I have explained, can be used for various distributions whenever we are able to find a pivotal quantity. Usually, as we have seen, the pivot depends on the sufficient statistics, and it also comes from the theory of Neyman-Pearson best tests.

So now I will move over to the concept of testing of hypotheses. Let us look at the basic notation and terminology for the problem of testing of hypotheses. Let me introduce the problem first. I have mentioned to you the problem of statistical inference: we consider a certain population and we look at its characteristics. For example, we may be looking at the average height of adult males in an ethnic group; we may be considering the precision of a measuring instrument or device used for measuring something; we may be considering the amount of symmetry or asymmetry present in a curve; we may be interested in estimating the average life of an electronic component; and so on. In these problems we assume that we have no prior knowledge about the parameter, so we consider estimation.

But there can be another type of problem. For example, we have a certain brand of a particular item, and a new brand of that item has been introduced in the market. The manufacturer, the shopkeeper, or the customer will be interested to know whether the average longevity, the average life, is more than that of the previous brand. Or suppose there is a drug which is being used for curing a certain disease, and the R and D division of the drug company invents a new drug. Now certainly everybody will be interested to know whether the new drug is more effective in curing the same disease than the previous one.
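The two-sample interval is equally mechanical to compute. Here is a minimal sketch under the same assumptions; the counts for the two cities are made up for illustration.

```python
# A minimal sketch of the approximate CI for p1 - p2; names and numbers
# are illustrative, not from the lecture.
from math import sqrt
from scipy.stats import norm

def diff_proportion_ci(x, m, y, n, alpha=0.05):
    """Approximate 100(1 - alpha)% CI for p1 - p2 from x successes in m
    trials and y successes in n trials."""
    p1, p2 = x / m, y / n
    z = norm.ppf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / m + p2 * (1 - p2) / n)
    d = p1 - p2
    return d - z * se, d + z * se

# e.g. 120 drivers out of 300 in city A, 90 out of 250 in city B (hypothetical)
print(diff_proportion_ci(120, 300, 90, 250))   # approximately (-0.041, 0.121)
```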
They may be looking at its efficiency in terms of less time taken, a higher proportion of people getting cured, the average cost of the medicine, and so on. There can be several factors on which a test can be based. That means here we may have some information about the parameter, but we want to test a statement about it. This is called the problem of testing of hypotheses. Since we are dealing with parametric methods, we can roughly say that a hypothesis is a statement about the parameters of a population. In general, a hypothesis is any statement about the probability distribution; for example, we may want to test whether the data come from a normal population or from a gamma population. That would be a more general statement of the testing problem, but in the beginning we will restrict attention to parametric methods. That means the population is identified, but we want to test something about the parameter values: whether the value is equal to something, or less than something, and so on.

So we pose the problem in the following fashion. At our disposal we have a random sample: let X1, X2, ..., Xn be a random sample from a population P_θ, where θ belongs to the parameter space Θ. This θ could be a scalar or a vector. A statistical hypothesis is an assertion about a parameter of the population. For example, a drug for curing a certain disease is found to be effective in, say, 50 percent of the cases. If we use the notation p for the proportion of patients who are successfully cured using this drug, then with this drug p = 0.5. Now a new drug is introduced; let p* be the proportion of patients who get cured using it. We will be interested to find out whether p* is greater than 0.5 or not.

So I have stated the problem in very simple terms: we want to test some statement about the parameter of a population. Here the procedure could be that you take observations, that is, you consider a sample of patients, you find out how many of them get successfully cured or not, and based on that you carry out a statistical procedure. So let us discuss this. Firstly, we will try to write down a hypothesis in this fashion: we describe a hypothesis as, say, H0: p* > 0.5, or H1: p* = 0.5, or H2: p* < 0.5, or H3: p* = 0.75, and so on. These are various statements; in each of them we are identifying the value of the parameter, in some cases giving a range and in some cases specifying it exactly.

Now, in general hypothesis testing problems the common formulation is the following. We first have a statement; for example, we may say p* = 0.5 or p* > 0.5. The statement we make first is called the null hypothesis, and we test it against another one, which is called the alternative hypothesis. This type of formulation for testing of hypothesis problems was developed by Jerzy Neyman and E. S. Pearson from 1926 onwards, in a series of papers where they developed this theory. In this formulation we have a null hypothesis, say H0.
So let us say we are considering a normal distribution with mean θ and variance σ². We may like to test whether θ = 0 against an alternative hypothesis, say H1: θ = 1. We may write it in different ways also, like H0: θ ≤ 0 against H1: θ > 0. We may like to test H0: σ² = 1 against H1: σ² > 1. We may like to test H0: (μ, σ²) = (0, 1) versus H1: (μ, σ²) ≠ (0, 1), and so on. So there can be various hypotheses which may be required to be tested.

Now we make a simple classification here, according to whether the value of the parameter specifies the distribution itself. For example, in the binomial testing problem, if we say p* = 0.5 then the distribution is completely specified; this is called a simple hypothesis. When we say p* < 0.5, etc., the distribution is not completely specified; this is known as a composite hypothesis. For example, if I write (μ, σ²) = (0, 1) then this is a simple hypothesis. But if I say only θ = 0, then this does not specify σ², so it is a composite hypothesis. So we have the concept: if a hypothesis completely specifies the distribution of the population, it is called a simple hypothesis; otherwise it is called a composite hypothesis. In the examples above, p* > 0.5 is composite and p* = 0.5 is simple; θ = 0 is composite because it does not specify σ²; θ ≤ 0 and θ > 0 are both composite; (μ, σ²) = (0, 1) is simple, and (μ, σ²) ≠ (0, 1) is composite.

Now a statistician, based on a sample, will like to test the hypothesis. That means he will give a procedure, and that procedure will try to make a decision in favor of a certain hypothesis. For example, suppose we consider a sample of 100 patients and find that nearly 75 percent of the patients, that is, 75 patients, get cured by the new drug; then certainly we may tend to believe that p* > 0.5. On the other hand, if we find that only 25 out of 100 get cured, then we may say p* < 0.5. Now, this is something like a layman's kind of thinking: we can certainly say that if 75 out of 100 get cured, that is much larger than 50 percent, and therefore we may tend to believe that p* > 0.5. But suppose that in the sampling we have done it turns out that, out of 100, say 57 patients get cured successfully. Would we still be in favor of the statement p* > 0.5 with the same conviction as before? Can we say that the effectiveness is significantly more than p* = 0.5? That is the question that a statistician would like to answer in a more effective fashion.

Similarly, suppose we are considering the hypotheses θ = 0 and θ = 1. If we consider a random sample X1, X2, ..., Xn from the normal distribution, we may consider X̄ as an estimate of θ, and you might say: if X̄ = 0 then accept H0, and if X̄ = 1 then accept H1. The thing is that if we are sampling from the normal population, then X̄ also has a normal distribution, with mean θ and variance σ²/n; it is a continuous distribution.
So the probability that X̄ equals 0 and the probability that X̄ equals 1 are both 0. Therefore it does not make sense to give a test of this type. And not only that: what happens if X̄ = −1, if X̄ = 1/2, or if X̄ = 2? Therefore, in place of a point rule we have to give a range, so that we can significantly differentiate between the two hypotheses H0 and H1. So we can say that a test of a statistical hypothesis is a procedure to decide whether to accept or reject a given hypothesis, and the decision will be based on a sampling scheme, that is, on a sample.

Let us take an example. Say X follows binomial(3, p), and our hypotheses are H0: p = 1/4 and H1: p = 3/4. That means we have considered a sample of 3 trials, out of which X is the number of successes. A layman's procedure could be the following test; let us call it T1: if X = 0 or 1, decide in favor of H0; if X = 2 or 3, decide in favor of H1. You can see that this is a heuristic procedure. What we are saying is that if X = 0 or 1, the proportion of successes is small, so the probability of success should be small, and therefore we go in favor of the hypothesis p = 1/4. On the other hand, if out of 3 trials you get 2 or 3 successes, you may say the probability of success should be high, you feel p = 3/4 must be the correct statement, and therefore you decide in favor of H1. So in the first case we say accept H0, and in the second case accept H1, or reject H0. Since in the original problem we write one hypothesis as the null hypothesis, that is, the initial one, and the other as the alternative hypothesis, we make statements like rejecting H0, accepting H0, or accepting H1 when we reject H0, and so on.

So basically what we are doing is this: the sample space here consists of 4 points, {0, 1, 2, 3}, and we divide it into 2 parts, the acceptance region A = {0, 1} and its complement, the rejection region R = {2, 3}. A test of hypothesis partitions the sample space into 2 disjoint sets A and R, where A corresponds to acceptance of H0 and R corresponds to rejection of H0, or you can say acceptance of H1. That is why A is called the acceptance region and R the rejection region, or critical region.

Since we are basing our decision on the outcome of a random experiment, that is, we are doing sampling, there is certainly a chance of error. When we conduct a test of hypothesis based on a random sample, we are liable to make 2 types of errors. The first one is called type 1 error: rejecting H0 when it is true. The second one is called type 2 error: accepting H0 when it is false. The consequences of the 2 types of errors can be of various kinds, depending on the problem. Let us take an example related to medicine.
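As a tiny illustration of a test as a partition of the sample space, the rule T1 can be written as a function; this is purely a sketch, and the name test_T1 is my own.

```python
# Test T1 as a partition of the sample space {0, 1, 2, 3}.
def test_T1(x):
    """Decision rule T1: accept H0 (p = 1/4) if x is 0 or 1, else reject."""
    return "accept H0" if x in (0, 1) else "reject H0 (accept H1)"

for x in range(4):
    print(x, "->", test_T1(x))
```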
So, in a medical experiment, tests are conducted on a patient to detect the presence of a certain disease, say D. Based on the tests we may conclude one way or the other. The hypotheses are H0: D is present, that is, the person has the disease, versus H1: D is not present, and we may decide to accept or reject H0. Now, what are the consequences? A type 1 error means rejecting H0: we conclude that D is not present whereas in fact it is present, and this may lead to fatal consequences for the patient. A type 2 error means we conclude that D is present whereas in fact it is not, and this may lead to harassment of the patient in terms of unnecessary treatment, monetary loss, and side effects on health. Therefore, in any given problem it is important to control the 2 types of errors.

So we give measures for these 2 types of errors. We define α = probability of type 1 error, that is, the probability of rejecting H0 when it is true; and similarly β = probability of type 2 error, that is, the probability of accepting H0 when it is false. In any given problem it will be desirable to control both errors; basically we would like both α and β to be minimum. However, minimizing both is not practically possible: if I reduce α then β will increase, and if I reduce β then α will increase. You can see this from the example I gave; for that example let us calculate.

Consider the test T1. What is α here? It is the probability of rejecting H0, that is, X = 2 or X = 3, when H0 is true, that is, when p = 1/4:

α1 = P(X = 2 | p = 1/4) + P(X = 3 | p = 1/4) = C(3,2)(1/4)²(3/4) + C(3,3)(1/4)³ = 9/64 + 1/64 = 10/64 = 5/32.

Let us look at β: the probability of accepting H0 when it is false. We accept H0 when X = 0 or X = 1, and H0 is false means p = 3/4:

β1 = P(X = 0 | p = 3/4) + P(X = 1 | p = 3/4) = C(3,0)(1/4)³ + C(3,1)(3/4)(1/4)² = 1/64 + 9/64 = 10/64 = 5/32.

Now let me design another test, say T2: accept H0 when X = 0, and reject H0 when X = 1, 2 or 3. For this test let us calculate the error probabilities; for the first test I call them α1 and β1, and here I call them α2 and β2. Then α2 is the probability of X = 1, 2 or 3 when H0 is true, that is, p = 1/4:

α2 = C(3,1)(1/4)(3/4)² + C(3,2)(1/4)²(3/4) + C(3,3)(1/4)³ = (27 + 9 + 1)/64 = 37/64.
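These error probabilities can be verified directly from the exact binomial pmf. The following sketch (assuming scipy) reproduces α1 = β1 = 10/64 and α2 = 37/64.

```python
# Exact error probabilities for tests T1 and T2 with n = 3 trials.
from scipy.stats import binom

n = 3
alpha1 = binom.pmf(2, n, 1/4) + binom.pmf(3, n, 1/4)   # T1 rejects on {2, 3}; H0: p = 1/4
beta1  = binom.pmf(0, n, 3/4) + binom.pmf(1, n, 3/4)   # T1 accepts on {0, 1}; H1: p = 3/4
alpha2 = sum(binom.pmf(x, n, 1/4) for x in (1, 2, 3))  # T2 rejects on {1, 2, 3}

print(alpha1, 10/64)   # both 0.15625
print(beta1, 10/64)    # both 0.15625
print(alpha2, 37/64)   # both 0.578125
```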
And let us look at the probability of type 2 error for T2: that is the probability of accepting H0 when it is false, which is now P(X = 0 | p = 3/4), so β2 = 1/64. You can see that by using this particular test we have been able to reduce β from 10/64 to 1/64, but at the same time the probability of type 1 error has increased from 10/64 to 37/64. In the same way we can consider reducing α, but then β will increase.

Therefore, the practical way which Neyman and Pearson suggested was to fix an upper level for the probability of one type of error and then find a test procedure for which the other type of error is minimum, or, we can say, for which 1 minus the probability of the other type of error is maximum. As a convention, we define the power of a test, say γ = 1 − β, that is, the probability of rejecting H0 when it is false. It was proposed to find the test which, for a given maximum value of α, has the smallest β, or the maximum power γ. This is called the most powerful test of size α, because we put a maximum value on α; that value is called the size of the test, or the level of significance (there are various names for it), and among the tests respecting it we take the one with the minimum probability of type 2 error, that is, the maximum power: the most powerful test.

For the simple versus simple case a complete solution was obtained by Neyman and Pearson, and thereafter it was generalized by the same authors to the concept of the uniformly most powerful test for composite hypotheses. For some other situations, where even a uniformly most powerful test does not exist, they considered a certain restricted class of tests, called unbiased tests, and among those tests they found the most powerful one. The theory of most powerful tests was developed by Neyman and Pearson in the period 1926 to 1937.

Firstly they considered the solution for the simple versus simple case. Suppose we have the problem, written in terms of the observations: X follows the density f(x), and we test H0: f(x) = f0(x) against H1: f(x) = f1(x). Consider T(x) = f1(x)/f0(x). The most powerful test is to reject H0 if f1(x)/f0(x) > k. This is not the complete description: we also accept H0 if f1(x)/f0(x) < k, and on the boundary, for example in a discrete distribution, we reject with some probability p if f1(x)/f0(x) = k. The constant k is chosen to satisfy the size condition. Importantly, this form is not only sufficient but also necessary for a most powerful test: they simultaneously showed the existence of such a test, and also that if there is a most powerful test, it has to be of this form. This turned out to be an extremely useful result; let me explain through one example.

Consider a simple testing problem: X1, X2, ..., Xn follow normal(0, σ²), and we test H0: σ² = 1 against H1: σ² = 5.
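To see the Neyman-Pearson form concretely in the earlier binomial example, note that the likelihood ratio f1(x)/f0(x) for H0: p = 1/4 versus H1: p = 3/4 works out to 3^(2x − 3), which is increasing in x; so rejecting when the ratio exceeds k is the same as rejecting for large x, exactly the shape of tests T1 and T2. A small sketch tabulating it (my own illustration):

```python
# Likelihood ratio f1(x)/f0(x) for x successes in 3 trials,
# H0: p = 1/4 vs H1: p = 3/4; it equals 3**(2*x - 3).
from scipy.stats import binom

for x in range(4):
    ratio = binom.pmf(x, 3, 3/4) / binom.pmf(x, 3, 1/4)
    print(x, ratio)   # 1/27, 1/3, 3, 27 -- increasing in x, so the
                      # rejection region {ratio > k} is of the form {x > c}
```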
Now let us consider the joint density of all the observations. Write x = (x1, x2, ..., xn). The density is

f(x) = (1/√(2π))ⁿ (1/σⁿ) exp( −(1/(2σ²)) Σxᵢ² ).

We consider f1(x)/f0(x), the ratio of the densities when σ² = 5 and when σ² = 1. The factor (1/√(2π))ⁿ cancels out, and we get

f1(x)/f0(x) = (1/√5)ⁿ exp( −(1/10) Σxᵢ² ) / exp( −(1/2) Σxᵢ² ).

We consider the rejection region where this is greater than some k. Now we can write this in a modified fashion, because the constant (1/√5)ⁿ can be adjusted into the right-hand side, and it becomes exp( (1/2 − 1/10) Σxᵢ² ) > k1. Taking logarithms, this reduces to Σxᵢ² > k2. Now we have to choose k2 such that P(Σxᵢ² > k2) under σ² = 1 equals α.

You can easily see that when I am sampling from the normal distribution I can actually calculate the distribution here: under σ² = 1, each Xᵢ follows normal(0, 1), which implies that ΣXᵢ² follows a chi-square distribution with n degrees of freedom. If that is so, then, letting W denote a chi-square(n) random variable, we need α = P(W > k2). That means, on the chi-square curve with n degrees of freedom, the point k2 is actually χ²(n, α), the upper α point. So the most powerful test is: reject H0 if Σxᵢ² > χ²(n, α). This is the most powerful critical region for H0: σ² = 1 against H1: σ² = 5.

Let us consider a little generalization of this problem. Notice that here I took the null value 1, and in the alternative σ² was 5, which is slightly bigger; therefore in the exponent the coefficient 1/2 − 1/10 came out positive, and the region is of the form Σxᵢ² > k2. On the other hand, suppose I consider the alternative H1*: σ² = 1/2. Then the coefficient of Σxᵢ² becomes negative, and when I take logarithms the inequality reverses: the critical region will be of the form W < k3. If I want this region to have probability α under H0, then k3 should be χ²(n, 1 − α); that is, we reject when W < χ²(n, 1 − α).

So we can generalize to the problem H0: σ² = σ0² against H1: σ² = σ1², taking W = ΣXᵢ²/σ0², which is chi-square(n) under H0. If σ1² > σ0², reject H0 if W > χ²(n, α); and if σ1² < σ0², reject H0 if W < χ²(n, 1 − α).
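As a quick numerical sketch of this variance test (the data and the helper name are hypothetical; scipy supplies the chi-square quantile):

```python
# A minimal sketch of the most powerful test of H0: sigma^2 = sigma0^2
# against H1: sigma^2 = sigma1^2 with sigma1^2 > sigma0^2 (upper-tail case).
import numpy as np
from scipy.stats import chi2

def variance_mp_test(x, sigma0_sq=1.0, alpha=0.05):
    """Reject H0 if W = sum(x_i^2)/sigma0^2 exceeds the upper-alpha
    chi-square point on n degrees of freedom."""
    x = np.asarray(x)
    n = len(x)
    w = np.sum(x**2) / sigma0_sq
    critical = chi2.ppf(1 - alpha, df=n)   # the point chi^2(n, alpha)
    return w, critical, w > critical

# hypothetical data generated under the alternative sigma^2 = 5
rng = np.random.default_rng(0)
x = rng.normal(0.0, np.sqrt(5.0), size=20)
print(variance_mp_test(x))                 # (W, critical value, reject?)
```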
This also points to an important characteristic of this distribution: when we consider normal(0, σ²), in the exponent of the density we have Σxᵢ² as a sufficient statistic, and there is a property here, actually called the monotone likelihood ratio property, which is satisfied. Therefore the region of rejection is decided by the direction in which Σxᵢ² takes its values. You can also think of it through the maximum likelihood estimator, and from there also you can see that for larger values of Σxᵢ² I will favor the hypothesis H1 and for smaller values I will favor H0; and in the reverse fashion, in the other case when σ1² < σ0², for smaller values I will favor H1 and for larger values I will favor H0.

In the following lecture I will give you the tests for various hypotheses about the parameters of normal distributions, which are based on this theory. Basically, these results have been extended to composite hypotheses; for example, I may consider H0: σ² ≤ σ0² against H1: σ² > σ0², and vice versa. When we consider those situations we have the uniformly most powerful tests; then we have two-parameter situations, where we have uniformly most powerful unbiased tests of these hypotheses. Without dwelling on these matters explicitly, I will give the tests for the various normal population problems in the next lecture.