My name is Narayan Rangaraj, from the industrial engineering and operations research group at IIT Bombay, and part of these introductory notes is also due to Professor Sabnis from the mathematics department. You have already seen the idea of confidence intervals, which tell you, in precise probabilistic language, how likely a parameter is to lie in a certain interval. That is what you use when you are asked to give a range for a parameter you are trying to estimate. Suppose you want to estimate the number of defectives in a population and you have to conclude something based on a sample. You cannot say anything for sure, because the defective phenomenon is probabilistic. You can give a range, and the larger the range, the more likely the parameter is to lie in it. But a very large range is not very useful. You want a tight range that still contains the parameter of interest with high probability; that is the best you can do. So confidence intervals you have already seen, but sometimes you want to ask the reverse question: you do not want the statistician, or the experimental investigator, or the scientist or engineer, to give a range of values. You have a hypothesis, or a proposition, and the analyst is supposed to either agree or disagree with it. The simplest example, of course, is a court, where a person is either guilty or innocent and a judge or jury has to agree or disagree with that. That is the simplest example of hypothesis testing, where the role of the analyst is to give an opinion, yes or no, not a range of values. On what statistical basis do we do this? That is the topic of hypothesis testing.
Actually, the material is straightforward; it is the first few sections of Ross, and we will see how far we can go in this lecture and the next one. Here is the plan for hypothesis testing: introduction and motivation; the notion of significance levels, which is a way of saying how confident we are in a hypothesis test, that is, with what probability we can be sure when we agree or disagree; the well known terminology of type 1 and type 2 errors, which we will see quickly; and then we will apply the concept of hypothesis testing to normal populations, because, thanks to the central limit theorem, many interesting phenomena can be assumed to follow an appropriately chosen normal distribution. Testing for the means or variances of normal populations is one important area in hypothesis testing, and we will also see tests for binomial populations. So let us start with the introduction. Given some data, how likely is it that the data has arisen from some hypothesized distribution? We will answer this question more precisely later on, but that is the type of question. For example, if a manufacturer claims that some parameter has an average value m, do we accept that? If somebody says that the lifetime of a bulb is so many hours, but the performance of the bulb is inherently probabilistic, how do we accept or reject the manufacturer's claim, just by looking at some data, that the average is indeed so and so? As you can imagine, if we assume in this case that the parameter is normally distributed, we can draw some random samples, and it is very unlikely that all the values will equal whatever the manufacturer is claiming; some will be more, some will be less.
For example, if all of them are less than what he claims, we are unlikely to believe his claim. So some will be more, some will be less; with what confidence do we accept the manufacturer's claim that the mean is indeed so and so? In most practical situations there is some physical meaning to the claim. In the lifetime example, if the manufacturer claims that the life of this bulb is 1000 hours, and it turns out that the life is actually 1200 hours on average, we are quite happy about it. So what we are really testing is whether the mean is 1000 hours or more; we are fine with a deviation on one side. That is the type of thing we would like to test through data. Here are some quick examples. A medical researcher may want to decide, on the basis of experimental evidence, whether coffee drinking increases the risk of cancer in humans. This is the kind of question where the expert has to give an opinion. If there are two kinds of measurement gauges on the market, an engineer may have to decide, on the basis of sample data, whether there is any difference between the accuracy of the two. Or consider a test of independence, using the concept you have seen of independent random variables for pairs of random variables: is there a dependence between a person's blood type and eye colour? Suppose data is collected on these two attributes, and someone asserts that the two are independent random variables. Do we say yes or no on the basis of the observed data?
Another example of the type I mentioned: a construction firm has bought a large number of cables, and the manufacturer's claim is that the cables have an average breaking strength of 7000 psi, meaning the breaking strength on average is 7000 psi or more. Do we accept this? In each of these cases there is a conjecture about the system or the population, and we have to decide whether to accept this conjecture based on some sample data. It is very similar to what you did in constructing confidence intervals, but the way the question is posed is slightly different. A statistical hypothesis is an assertion or conjecture concerning one or more populations, and in the language of hypothesis testing two complementary hypotheses come to mind: either the assertion is true or it is false. Some people write H and H': H for the assertion being true, H' for it being false. We then have two choices: either we reject H' and conclude that H is strongly supported by the data, or we fail to reject H' and conclude that H is not strongly supported by the data. For example, suppose someone claims to have developed a better vaccine than what is on the market for some disease. You might ask: where is the evidence of that? So you look at treatment data for the two vaccines, and you have to conclude whether there is enough evidence to overthrow the existing treatment and accept the new one. Now, the data you have are sample data arising from an inherently probabilistic phenomenon, so you cannot be sure; but based on the evidence you have to say something, you cannot just say 'I do not know.' To contrast with this, consider a mathematical proposition as opposed to a statistical proposition.
A mathematical proposition says, for example, that some function is of a certain type, or that some function has a minimum. This is either true or false; there is no ambiguity, and you can verify it by whatever means. Statistical hypotheses are different in nature. Suppose two products are competing, and a marketing analyst has made the claim that the proportion of consumers preferring brand A to brand B is 0.4, that is, 40 percent of consumers prefer brand A. That is the hypothesis. You can interpret it in a number of ways. One way: if you draw a random consumer, that person prefers brand A to brand B with probability 0.4. Another: if you pick a large number of people and ask them all which they prefer, about 40 percent of them are likely to say they prefer brand A to brand B. Suppose we now actually do this. I pick 15 customers, and 12 of them are found to prefer brand A to brand B. It looks like a large fraction prefer brand A, so the hypothesis that a fraction 0.4 prefer brand A looks a bit odd, because 12 out of 15 prefer brand A. But from your knowledge of the binomial distribution: supposing I randomly pick 15 people from the population, is it possible that 12 of them prefer brand A to brand B, given that each individual prefers brand A with probability 0.4? Did you get the question? Each customer prefers brand A to brand B with probability 0.4; I got 15 persons in a room and asked them, and 12 of them happened to prefer brand A. Is this possible?
It is certainly possible, but you can work out that it is quite unlikely: you can compute the probability that 12 people prefer brand A given that each one prefers brand A with probability 0.4. So it is possible, but as of now it seems unlikely, and we would like to say that it is highly unlikely that the statistical hypothesis is true. Suppose instead I said that 10 people preferred brand A; you would say that is also unlikely, but certainly more likely than 12. What about 8 out of 15? Definitely possible. 7 out of 15? Very possible. So somewhere we have to draw the line and say: this looks fine, 0.4 seems plausible. Of course, if exactly 40 percent of the people you sampled turned out to prefer brand A, you would say it looks possible that the real fraction is 0.4. But if I got something too low, say 0 people out of 15 preferring brand A, you would be suspicious of the statement that the fraction is 0.4. If I got 15 out of 15, you would also be suspicious. Somewhere in the middle you would be fine with it. What is that middle region where you would be fine with it? That is the type of question we will ask. For example, if the true probability, which we do not know but which is the claim made here, is 0.4, then the probability of observing 12 or more successes in 15 trials is about 0.002, from the binomial calculation. It is like tossing a coin 15 times where the probability of heads is 0.4 and getting 12 or more heads; what is the probability of that? You can work it out. So it is physically possible.
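As a sketch of that binomial calculation (assuming 15 independent consumers, each preferring brand A with probability 0.4), the upper tail probability can be worked out directly from the binomial formula:

```python
from math import comb

def binom_tail(n, p, c):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

# Probability of 12 or more of 15 consumers preferring brand A,
# when each prefers it with probability 0.4.
tail = binom_tail(15, 0.4, 12)
print(round(tail, 3))  # 0.002
```

So observing 12 of 15 is possible under the claimed 0.4, but with probability only about 2 in 1000, which is why it casts doubt on the claim.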
So it is not impossible; I cannot reject the hypothesis outright, but it is highly unlikely. The way a statistician would like to pose the conclusion is to give a level of significance. Before that, let me set up similar language for an example which I will follow through with numerical values. Suppose the cure rate for a given disease using standard medication is 60 percent, and a new drug is proposed which somebody claims is better than the standard one. The new drug is to be tried on a sample of 20 patients, and the number cured, x, out of the 20 is to be recorded. That is how we will investigate whether the new drug is indeed more effective than the old one. If 12 out of those 20 are cured, that is about the same as the old drug; we want something significantly better than that. So: is there substantial evidence that the new drug has a higher cure rate than the standard medication? Society is generally conservative, and we require strong evidence to overthrow the status quo. Generally people are comfortable where they are, or have got used to it, and overthrowing something established requires strong evidence, partly because changing over to a new thing has a cost. You do not want to do it casually; you want enough evidence that the new thing is really better, otherwise it is not worth it. That is a conservative way of putting it; I am not saying we should be that way, but sometimes it appears like that. Here the relevant hypothesis is that the new drug is better than the standard medication, that is, p, the success rate of the new drug, is greater than 0.6; the reverse is that it is not better than the standard medication, p less than or equal to 0.6.
This is the language used in hypothesis testing: there is something called the null hypothesis and something called the alternative hypothesis. I will try to find out why the terms null and alternative are used, but the null hypothesis usually stands for the status quo, that you stay where you are, and the opposite is the alternative hypothesis. Of the two complementary statements concerning the unknown state of nature, in this case the effectiveness of the new drug, one is called the null hypothesis and the other the alternative hypothesis. The choice of which to call the null and which the alternative matters. We entertain both possibilities initially; then one of them is taken as the null hypothesis, and we ask whether there is enough evidence to reject the null hypothesis. You could as well do it the other way around, but the implications are slightly different in terms of perception. For example, in a court of law you can assume the person is innocent, and it is up to the prosecution to prove that the person is guilty. If there is not enough evidence, you say there is no evidence that the person is guilty, so you continue to believe he is innocent. You could start the proceedings the other way around, with the person assumed guilty and the defense having the duty to overturn that assumption. In this statistical setting you cannot prove anything; you can only gather enough evidence to believe one way or the other, because, as the examples show, we cannot be absolutely sure. Even if, for example, the new drug indeed has a success rate of 80 percent:
If I draw a sample of 20 people, by bad luck I may pick a lot of people who do not respond to the treatment, even though the probability of responding is 80 percent. By bad luck, out of 20 people, 15 may not respond; that can happen, with a very low probability. The data then shows a sample proportion of only 0.25 while the real proportion is 0.8. If I look at that data, I will not accept the hypothesis that the new drug is better, and I will have made a mistake. In hypothesis testing we will make mistakes; the only thing is that we want to put a probabilistic bound on those mistakes. As I said, the choice of what to call the null hypothesis and what to call the alternative hypothesis is a matter of convention. In courts of law, the null hypothesis is often that the person is innocent, and the alternative hypothesis can be accepted only with enough evidence to overturn the null; otherwise you say the null hypothesis cannot be rejected. That is 'innocent until proven guilty', the way many courts operate. The consequences of wrongly rejecting the null hypothesis, that is, wrongly convicting somebody, are considered very bad; this is a matter of opinion and convention between us. Suppose I wrongly convict and punish somebody, and later we find out we made a mistake; that is one type of error we can make. The other type is wrongly concluding the person is innocent and not punishing, when the person is actually guilty. That is also a mistake, but we may say the first mistake is the more serious one, because an innocent person was punished.
In extreme cases, suppose you hang a person who is innocent; there is no way of even compensating for that mistake, and if you impose a severe penalty, there is no way of retrieving that loss. Whereas the other way round, somebody who should have gone to jail, or been hanged as per the laws of the country, was not; that is also a blot on the justice system, but a less serious error. So there is a difference between the two types of mistakes, and depending on which we view as more serious, we define the null hypothesis and the alternative hypothesis accordingly. The convention is that wrongly overturning the null hypothesis is a more serious error than wrongly accepting it. That is where type 1 and type 2 errors come in, which we will talk about. The null hypothesis is called H0 and the alternative hypothesis H1; as I told you, the notation is either H and H', or H0 and H1. In this terminology, when an investigation is aimed at establishing an assertion with substantive support from the sample, the negation of that assertion is taken as the null hypothesis and the assertion itself as the alternative. When you want to make a new claim, it is posed as the alternative; the status quo is often the null hypothesis. Before claiming that a statement is valid, adequate evidence must be produced to support it. If I want to establish that a person is likely to be guilty, I have to give adequate evidence for it; otherwise we stick with the null hypothesis. The null hypothesis should be regarded as true, and rejected only when the data strongly testify against it. In a sense, the purpose of the hearing is to see whether there is enough evidence to establish guilt.
If there is not enough evidence, you do not conclude that he is guilty; you also do not actually prove that the person is innocent, because that is not the purpose of the trial. The purpose of the trial is to see if there is enough evidence to convict. Technically, I suppose, you should conclude at the end of a trial that the person was found not guilty; you do not say he was found innocent, you say there is not enough evidence to conclude that he is guilty, and therefore, for the moment, he is not guilty. The other way round, if you start by assuming he is guilty and the evidence overturns that, you conclude that the person is innocent. This is just a question of language, but you can see the consequences I have described. I believe in Scotland they have a third verdict of 'not proven': there is not enough evidence to conclude that the person is guilty, and in some cases not enough evidence the other way either. You leave the person in limbo, somewhere in between, and say 'not proven', which can also be very damaging; it depends on how society views these things. At least in the old days, Scottish courts would give this verdict of not proven: we tend to believe the person may be guilty, but we do not have enough evidence of it, though we also lack strong evidence that the person is innocent. In India and many other places, we either declare the person not guilty or we convict. Now, coming back to the more tangible problem: suppose the existing cure rate is 60 percent, that is, the probability of cure is 0.6.
A randomly selected person from the population reacts successfully to the standard medication with 60 percent probability. Suppose 20 patients are selected at random and the new drug is administered with proper procedures, and x is the number cured. Based on the value of x, can we conclude that the new drug is better than the old one? Let p be the cure rate of the new drug. What we want to see is: if p is the actual success rate of the new drug, could we have got this x from that p? On that basis we will take a call. In the framework of hypothesis testing, p less than or equal to 0.6 is the hypothesis that the new drug is not better. We actually want to find evidence that the new drug is better; otherwise the old drug is already in the marketplace and there is nothing substantially better to change over to. Let x denote the number of cures out of 20 in the trial I have conducted; x can take values 0, 1, 2, up to 20. I can say something like this: if 15 people out of 20 respond to the new drug, I will assert that the new drug is better than the old drug, which had a success rate of 60 percent; and if it is 14 or less, I will not change over. So I will conduct a test: the sample size is 20, I will record the number of successes x, and for a given p the number x is a random variable. Depending on the outcome of that random variable, I will decide whether to accept or reject the claim. Such a random variable x is called a test statistic.
Remember that I have to specify this procedure up front: I have to say in advance on what basis I will accept or reject. The test is: the new cure is administered to 20 people, and I measure the number of successes, a random variable. Depending on the value of that one random variable, there is a region where I reject the null hypothesis and a region where I accept it, that is, where I accept the null rather than the alternative. The random variable we define for this purpose is called the test statistic. Now, coming to the two types of errors I spoke about: the decision reached by a test could be wrong. This is the first thing that every statistician, data analyst, probabilist, or scientist dealing with random phenomena has to accept and internalize: we could be wrong. Unlike a deterministic setting, where a statement is simply right or wrong, in making a statistical statement we could be right or wrong, and we are only trying to put a concrete quantitative bound on our errors. So you could make a mistake; the question is, what type of mistake, what is its likelihood, and how much confidence do we have in our assertion? We could run away from the whole thing and say, 'because I am not sure, I will not say anything', but that is often not possible; we have to make a statement. For example, in the light bulb case, suppose we are buying equipment: if I insist on testing each and every piece of equipment until its failure before buying, I will be left with nothing to use.
Instead, I take a lot of 1000 light bulbs, select 10 of them, find their lifetimes till destruction, and on that basis accept or reject the hypothesis regarding the entire lot. That is more practical, and it is often the basis for quality control through inspection. If I have to certify that some material I have purchased is of the desired quality, I cannot sample the entire lot; I sample a part of it and on that basis either accept or reject a lot that claims to come with a certain quality. This is in fact a very practical thing, done day in and day out: survey sampling, TRP ratings in advertising, customs inspection procedures, audit principles, quality control on shop floors in manufacturing environments, quality certification for any attribute; all of it rests on these principles of sampling and drawing conclusions from samples. So what is being talked about here is very tangible and concrete in an uncertain world. The two types of errors are as follows. One: the null hypothesis is true, but the test says it should be rejected; we wrongly reject the null hypothesis. This is called type 1 error; it is regarded as the more serious one, and tests are often designed to control it. The other: the null hypothesis should be rejected, but the random outcome of the test says we should continue to accept it. That is also a mistake, called type 2 error. Here is a summary. The columns are the real state of nature, H0 true or H0 false, which we do not know, but those are the two possibilities; our actions are on the rows, either do not reject H0 or reject H0.
If H0 is true and we do not reject it, that is the correct thing to do; but if H0 is true and we reject it, we commit an error of type 1. If H0 is false but we do not reject it, that is also wrong, and is called a type 2 error; if H0 is false and we reject it, that is correct. This is the guideline for defining H0 and H1 in the first place, since, as I said, it could be done either way. Generally speaking, falsely rejecting H0, a type 1 error, is viewed as a more serious consequence than failing to reject H0 when H1 is true. In the legal analogy, falsely convicting somebody who is actually innocent is considered a more serious error than declaring a guilty person innocent by mistake. To put some probabilities on it: the probability of type 1 error is the probability of rejecting H0 when H0 is actually true; that is called alpha. The probability of type 2 error is the probability of not rejecting H0 when H1 is actually true, that is, the alternative is true but we do not reject the null; that is called beta. Generally we put a bound on alpha and then try to get the best beta possible, because there are consequences of a type 1 error that we are not willing to bear beyond a point. For example, we may say the probability of type 1 error should be no more than 5 percent. A test of the null hypothesis is a course of action specifying the set of values of the test statistic x for which H0 is to be rejected. For example, in our test of the new treatment, H0 was that the probability of cure is less than or equal to 0.6.
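To make alpha and beta concrete for the drug example, here is a rough sketch. The test rejects H0 (p less than or equal to 0.6) when x is 15 or more out of 20; alpha is evaluated at the boundary p = 0.6, while for beta we must pick some specific alternative, and the 0.8 below is a hypothetical choice matching the 80 percent scenario mentioned earlier, not something fixed by the test itself:

```python
from math import comb

def binom_pmf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, cutoff = 20, 15  # reject H0: p <= 0.6 when x >= 15

# alpha: P(reject H0 | p = 0.6), the boundary case under H0
alpha = sum(binom_pmf(n, 0.6, k) for k in range(cutoff, n + 1))

# beta: P(do not reject H0 | true p = 0.8); 0.8 is a hypothetical
# alternative used only to illustrate the type 2 error
beta = sum(binom_pmf(n, 0.8, k) for k in range(cutoff))

print(round(alpha, 3), round(beta, 3))  # 0.126 0.196
```

So with this cutoff, we wrongly overturn the status quo about 12.6 percent of the time when the new drug is no better, and we miss a truly 80-percent-effective drug about 19.6 percent of the time.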
We reject it if x takes a high value out of 20: if, say, 15 are successful, it seems likely that the value of p is in fact more than 0.6, so we reject the null hypothesis and accept the alternative. For example, we can say: we will reject the null hypothesis if x is 15 or higher. That is a possible test. In a given implementation of this test, we take a sample of size 20, observe the outcome of the treatment on those 20 randomly chosen persons, record the number of successes x, and if that number is 15 or more, we reject the null hypothesis. This is our proposed test. The random variable whose value determines the action is called the test statistic, and the range of values for which the null hypothesis is rejected is called the rejection region of the test; that is just the language used. For our example, the null hypothesis is that the probability of success is 0.6 or less, and the alternative is that the treatment is in fact better than the existing one: H1 is that p is greater than 0.6. We propose that H0 is rejected if x is greater than or equal to 15 and not rejected if x is less than or equal to 14. Now, x is a random variable with a known distribution: it follows a binomial distribution with n equal to 20 and p the probability of success of an individual trial. So what is the type 1 error? It is the probability of rejecting H0 when H0 is true. That is: supposing p is 0.6, what is the chance that I get 15 successes, or 16, or more? In those cases I would actually be rejecting the null hypothesis.
The type 1 error is the sum from k = 15 to 20, with p equal to 0.6, of (20 choose k) times p raised to k times (1 minus p) raised to (20 minus k): the k successes out of 20, and the rest failures. That quantity is the type 1 error. Suppose p is 0.6, so the treatment is not really better; that is the boundary case in which the null hypothesis is true. If by chance I get 15 or more successes, I will say the success rate of the new treatment is better than 0.6 and will overturn the null hypothesis in error. Let me restate: the actual success rate of the new treatment is only 0.6, so roughly 12 out of 20 should respond favourably to the treatment, but by chance I got 15, or 16, or up to 20 successes. In that case I would wrongly conclude, based on this test, that the treatment has an effectiveness of more than 0.6. The chance of that happening: my test says that if x is greater than or equal to 15 out of a sample of 20, reject the null hypothesis. I could reject the null hypothesis wrongly, and putting p equal to 0.6 and computing this binomial expression gives the probability of making an error of type 1. You can see that this will be some quantity between 0 and 1, and hopefully not too large; if it were 0.25, there would be a 25 percent chance of a type 1 error, which is uncomfortable. The usual language for this is that you set a level of significance: the significance level for a hypothesis test is a bound on the type 1 error you are willing to commit.
So, the concept is quite simple, but the way it is put is that the significance level alpha is the bound you would like to put on the type 1 error of a test, namely the probability of wrongly rejecting the null hypothesis when in fact it is true. So, let us just look at the probabilities for p equal to 0.3, 0.4, 0.5 and 0.6. If p is in fact 0.3, then we are very unlikely to get 15 or more successes; up to 3 decimal places the probability is 0. But with p equal to 0.6 we can actually get 15 or more successes just by chance, with about a 12.6 percent probability. So, if we design a test saying that I will conclude that p less than or equal to 0.6 is not true whenever I get 15 or more successes, then 12.6 percent of the time, even with a 0.6 probability of success, I will actually get 15 or more successes. So, I will be making a mistake of type 1 12.6 percent of the time. If we are ok with that, then we can go ahead with this test. So, do you see the issue involved in designing the test? In hypothesis testing, the task of the analyst is to design the test. Here the design constraint is, let us say, that I have funds to conduct tests on 20 people: the sample size is given to you. So, now I have to decide my acceptance region and rejection region, that is, the number of successes at and beyond which I will reject the null hypothesis; the complement of that is the acceptance region. Given the rejection region, what is the probability that I will be rejecting in error, by mistake? That is alpha, the type 1 error. If the true probability is 0.6, then the chance that we get 15 or more successes under the binomial distribution with p equal to 0.6 is actually not insignificant; it is 12.6 percent.
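The small table of probabilities just quoted can be reproduced with the same kind of tail sum; a sketch (function name mine):

```python
from math import comb

def binom_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

# Chance of seeing 15 or more successes out of 20 for different true p:
# this is how often the "reject when x >= 15" test fires.
for p in (0.3, 0.4, 0.5, 0.6):
    print(p, round(binom_tail(20, p, 15), 3))
# roughly: 0.3 -> 0.000, 0.4 -> 0.002, 0.5 -> 0.021, 0.6 -> 0.126
```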
So, it is like saying that I am asked to give a recommendation on whether the new drug is better or not, and actually the new drug is only as effective as the existing one, whose success rate is 0.6. Based on this test I will overturn the existing treatment and go for the newer treatment hoping that it is better, and I will be wrong 12.6 percent of the time, if this activity is done repeatedly. So, in error I will be changing my system, revamping everything, my norms and my standards, and I will do that 12.6 percent of the time. If that is too much, if you are uncomfortable with that, then we will say let us not go with 15 or above; let us tighten it up and say 16 or above, or 17 or above. That is one type of decision we may make. Or we may say we need a larger sample size, because the larger the sample, the more accurately I can pin the parameter down. So, I hope you got the flavour of it; you have to sit and work it out, but the underlying mathematics, the underlying statistical computations, are actually quite straightforward: in this case it is just a binomial probability computation. Of course, for a large sample size, when np and n(1 minus p) are both large, you can use a normal approximation to the binomial and apply all the things that you have learned so far. The only new thing is the question being asked, the application of the principle. So, the language of hypothesis testing has to be understood: there is a thing called the null hypothesis and a thing called the alternate hypothesis.
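The normal approximation mentioned above can be checked against the exact binomial tail; a sketch with a continuity correction, using only the standard library (function names are mine):

```python
from math import comb, erf, sqrt

def binom_tail(n, p, k_min):
    """Exact P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

def normal_tail_approx(n, p, k_min):
    """Approximate P(X >= k_min) via N(np, np(1-p)) with a continuity correction."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    z = (k_min - 0.5 - mu) / sigma
    return 0.5 * (1 - erf(z / sqrt(2)))  # 1 - Phi(z), the standard normal tail

print(round(binom_tail(20, 0.6, 15), 4))          # exact: about 0.1256
print(round(normal_tail_approx(20, 0.6, 15), 4))  # approximation: about 0.127
```

Even at n = 20 the two values are close; the approximation improves as n grows.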
There is a test statistic, which is a random variable defined by us; depending on the value of that random variable there is a rejection region and an acceptance region; and there is a chance of committing the error of rejecting the null hypothesis by mistake when it is actually true. That error is called the type 1 error, and the bound placed on the type 1 error is called the level of significance of the test, so the test has to be designed with that in mind. This is the summary of what I have said so far; any questions from your side? In this case the level of significance is 0.126, which means there is a 12.6 percent chance of making a type 1 error. Normally we would like to say beforehand what our type 1 error should be, so we can set it in advance and then design the test accordingly. We may say we want to make only a 1 percent, 5 percent or 10 percent error of type 1; so, for example, a 0.05 level of significance means that the chance of a type 1 error is no more than 5 percent. I am willing to live with, say, 5 percent of the people being hanged by mistake, or whatever the consequence is in the situation at hand. The type 2 error is the probability of not rejecting H naught when H1 is true, which is a similar binomial computation that we can carry out for different values of p. In this case, say p is actually 0.8, much better than 0.6, but by bad luck we get only 12 successes out of 20, or even fewer. In the way we have defined our test, for any value of p greater than 0.6 we could still get 14 or fewer successes with some probability; that probability is the type 2 error. That means the value of p is actually, say, 0.7 or any value more than 0.6, but our test says: you got 14 or fewer successes, so do not overturn the null hypothesis, even though it should have been overturned. So, that error is also there.
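The type 2 error described above is the lower binomial tail, P(x <= 14), evaluated at values of p above 0.6; a sketch (function name mine):

```python
from math import comb

def binom_cdf(n, p, k_max):
    """P(X <= k_max) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(0, k_max + 1))

# Type 2 error beta: probability of staying in the acceptance region
# (x <= 14) even though the true p is above 0.6.
for p in (0.7, 0.8, 0.9):
    print(p, round(binom_cdf(20, p, 14), 3))
# roughly: 0.7 -> 0.584, 0.8 -> 0.196, 0.9 -> 0.011
```

Notice that the closer the true p is to 0.6, the larger the type 2 error: a small improvement is hard to detect with only 20 trials.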
So, the type 2 error is also there: we could continue to accept the null hypothesis when it is in fact false. That type 2 error can also be computed. So, any given test will have a type 1 error and a type 2 error, and you have to balance out these things. Normally, the first criterion for designing a test is a bound on the type 1 error, and that is called the level of significance. Then, if we have any leeway in defining the test, either by increasing the sample size or by redefining the number of successes or whatever we are trying to measure, we can also try to put a bound on the type 2 error. Normally, if we have only one design parameter, for example the sample size is given and we just have to decide the threshold value for accepting or rejecting, then that is the only thing in our control. In that case, all we can do is put a bound on the type 1 error and say that this is a test at that level of significance. There is then a resulting type 2 error; we cannot do anything about it, but we have controlled the type 1 error. So, in this example, for a given threshold like 15 and a given p, you can compute the type 2 error, and you have to live with that. There is another term used, called the power of a test, which is 1 minus the probability of a type 2 error. This is the probability that the test will reject H naught when H1 is actually true, and in this sense it measures how often we do the right thing: rejecting H naught when H1 is actually true. So, that is called the power of the test. In the table of four possible outcomes, it is the bottom right entry: H naught is false, which means H1 is true, and we reject H naught.
The probability of that happening is called the power of the test. Given that H naught is false, two things can happen: either we reject H naught or we do not. The probability that we fail to reject H naught is the type 2 error, beta, and 1 minus that is called the power of the test. So, we would look for a test with high power. This is the same thing I said earlier. In our example, for different values of p we can talk of the power of the test. For example, at p equal to 0.8 the power of the test is 0.804, and at p equal to 0.9 the power is 0.989. That means for p equal to 0.8 the test will rightly throw out the null hypothesis, as it should, because the probability is actually greater than 0.6; in fact it is 0.8, and the test will throw out the null hypothesis about 80 percent of the time. At p equal to 0.9, which is significantly more than 0.6, this particular test, which rejects the null hypothesis when x is greater than or equal to 15, will throw out the null hypothesis 98.9 percent of the time. So, every test has some power. We can define different rejection regions: with the null hypothesis that p is less than or equal to 0.6, I could throw it out when my test statistic is greater than or equal to 15, or 18, or 14. Each rejection region will have different type 1 and type 2 errors, and a different power as well, and we just have to select the right one. To follow up, you can read chapter 8 of the textbook. The material is very standard, so you should put pen to paper and work through the examples; there are enough solved examples in the book, and one or two of these problems will be taken up in the tutorial sessions.
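The power figures quoted above, and the effect of moving the cutoff, can be checked directly; a sketch (function name mine):

```python
from math import comb

def power(n, p, cutoff):
    """P(X >= cutoff) for X ~ Binomial(n, p): probability of rejecting H0."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(cutoff, n + 1))

# Power of the cutoff-15 test at the true values quoted in the lecture.
print(round(power(20, 0.8, 15), 3))  # about 0.804
print(round(power(20, 0.9, 15), 3))  # about 0.989

# Moving the cutoff trades the type 1 error (this is power evaluated at
# p = 0.6, where H0 still holds) against power at, say, p = 0.8.
for cutoff in (14, 15, 18):
    print(cutoff, round(power(20, 0.6, cutoff), 3), round(power(20, 0.8, cutoff), 3))
```

Lowering the cutoff raises both the type 1 error and the power; raising it does the opposite, which is exactly the balancing act described above.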
So, you just have to get familiar with the terminology of this topic, which is actually a fairly useful one. It has an interesting history of how people have tried to formalize uncertainty, and it is practically useful in several domains. If you look up any experimental research work, you will see levels of significance reported for the experimental data; you will see such terms being used. We can certify something up to some level of significance, which means we are putting a bound on the type 1 error. So, in statistical quality control, certification, sampling, survey marketing, anything where you are relying on, for example, sampling from a small set of data, or where you are doing experiments subject to uncertainty, you have to make some statements, and this is one of the standard ways of making them. So, it is worth knowing about this from a purely practical point of view; it will help you understand the formal way in which people talk about uncertainty. Just two small things to wind up. One is that, at least in the quality control type of literature, where you are supposed to inspect items, the term OC, operating characteristic, is used. That is a curve: in this case, for example, p, the probability of success of the treatment, is plotted on the x axis, and the probability of accepting the null hypothesis is plotted on the y axis. That is called the operating characteristic, and 1 minus that is the power of the test. So, in this case, as p increases, my probability of accepting the null hypothesis decreases; you can easily see that this function is a decreasing function of p.
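The operating characteristic just described, the probability of accepting H naught as a function of the true p, can be tabulated for our test; a sketch (function name mine):

```python
from math import comb

def oc(n, p, cutoff):
    """Operating characteristic: P(accept H0) = P(X <= cutoff - 1)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(0, cutoff))

# P(accept H0) falls as the true success probability p rises;
# the power of the test at each p is 1 minus the OC value.
for p in (0.3, 0.5, 0.6, 0.7, 0.8, 0.9):
    print(p, round(oc(20, p, 15), 3))
```

Plotting these points gives the decreasing OC curve mentioned in the lecture.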