So, we clearly want to reduce the value of beta. In other words, we want a large value for 1 minus beta, and this 1 minus beta is referred to in the statistical literature as the power of the test: how powerful has your hypothesis testing analysis been? Now, if you just think and reason it out, and again I am trying to do this entirely without algebra, and some of this algebra can be discussed offline using Moodle, you can ask what this error depends on. It depends on where we located our alternate hypothesis. Remember what I did before: I plotted a distribution for H naught centered around mu naught, then I plotted a distribution for H 1 centered around mu 1, and the set of goalposts I had for H naught ended up defining beta for me. So, beta depends on mu 1, and if I increase the separation mu naught minus mu 1, the power of my test will increase. In other words, beta will become small if I increase that separation, and you can see this from a graphical perspective.

So, what we want is a good diagnostic kit; what we want is a powerful test. A powerful test means I make very few errors of both types: I want a small alpha and I want a small beta, and that gives me a very powerful hypothesis testing procedure. So, having already fixed alpha, what you end up doing is making sure you have a small beta, and that boils down to saying: try to minimize beta by making sure your mu naught and mu 1 are far apart. If I have already sketched H naught, and here is a depiction of beta, you can quickly see that if I move my curve for H 1 to the right, the overlap between the two curves within my goalposts will start decreasing. As I move the H 1 curve to the right, the overlap decreases; that overlap decreasing is beta decreasing, and if beta is decreasing, 1 minus beta is increasing, and so I have a more and more powerful test. Now, that is basically like saying: if H naught is mu equals 9.8 meters per second squared and H 1 is mu equals 9.4 meters per second squared, then I have a very powerful way of distinguishing between 9.8 and 9.4. On the other hand, I will probably not have a powerful enough test if I want to differentiate 9.8 meters per second squared from 9.7 meters per second squared, because the two distributions according to the two claims probably overlap far too much, in which case I will have a large error of one type or the other.

So, very quickly you realize that the only thing you have control over in this entire hypothesis test is probably n, because in most cases mu naught and mu 1 are known and fixed, and the only parameter you can control is the number of measurements you will go out and get to try to improve the analysis. So, what I will very quickly list out here are the variables that the sample size in an experiment depends upon. What should n inherently depend upon? In your specific domains you should go back and think about this: if you wish to make distinctions between different theories or different claims about model parameters, this should help you define a procedure for identifying the correct number of samples in your analysis. So, the number of samples that you are going to have to get will clearly depend on how far apart mu naught and mu 1 are.
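Before moving to the other dependencies, here is a minimal numerical sketch of the power-versus-separation picture just described, assuming a two-sided z-test with known sigma; the particular values of mu naught, sigma, n and the candidate mu 1 values are illustrative assumptions, not numbers from the lecture slides.

```python
from scipy.stats import norm

# Illustrative values (assumed): a gravity-style claim with known sigma.
mu0, sigma, n, alpha = 9.8, 0.5, 25, 0.05
se = sigma / n**0.5                       # standard error of x-bar

# Two-sided goalposts around mu0 at significance level alpha.
lo = norm.ppf(alpha / 2, loc=mu0, scale=se)
hi = norm.ppf(1 - alpha / 2, loc=mu0, scale=se)

for mu1 in (9.7, 9.6, 9.4):
    # beta = chance that x-bar lands inside the goalposts when H1 is true.
    beta = norm.cdf(hi, loc=mu1, scale=se) - norm.cdf(lo, loc=mu1, scale=se)
    print(f"mu1 = {mu1}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

As mu 1 moves away from mu naught, the overlap inside the goalposts shrinks, beta falls, and the power 1 minus beta climbs toward 1, which is exactly the graphical argument above.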
The number of samples will also depend on how much error you have in your data collection in the first place, that is, on the variation in your data. For example, if you are collecting the heights of people, having a ruler with a least count of 1 centimeter is clearly worse than having a ruler with a least count of 1 millimeter. That 1 centimeter versus 1 millimeter is reflected in the variation you see in the measurements, and on top of that, individuals have different heights, and you are trying to capture the variation of heights in the entire population. So, there is a certain value of sigma squared, and if there is a large variation of heights in the entire population, then you have a problem trying to make comments about the precise value of an average. So, there has got to be a dependency on sigma squared, and I will show you that graphically in just a minute. It also depends on how precise, or how significant, you want your analysis to be; in other words, it boils down to where you put your goalposts, which is alpha. And finally, it comes down to how much error of the other type you are going to commit the moment you fix alpha; in other words, what is beta?

So, you want alpha and beta to be small; sigma squared is probably something you can do nothing about. If I am trying to find out the heights of people, then inherently the entire distribution of heights in the human population has mean mu and variance sigma squared, and that is not something I control; it is a property of the entire population. If I am trying to distinguish somebody's claim that the average height is 3 feet, that is mu naught, from the claim that the average height is 2 feet, that is mu 1, I can do that: I set mu naught and mu 1. So, again, in this entire exercise the one thing I control is n. I cannot control sigma squared, I cannot control mu, and I do know that alpha and beta are to be kept small. If you sit and do the algebra, there is a way to exactly calculate the number of samples that you want, and that is the equation on the top right, but that equation is derived for one specific scenario, and the terms, both the numerator and the denominator, will change marginally from situation to situation. So, I am not going to get trapped into describing various algebraic equations to you; instead, again, we will do this from a totally graphical perspective. It boils down to asking where the intervals are relative to the goalposts, and how we think the sample size is going to matter as we try to make distinctions based on the 4 variables that I am listing at the top left.

So, let us look at each of these 4 one by one. We said that the sample size depends on how far apart the 2 conflicting claims about your model parameter value are. Somebody says the average human height is 3 feet; somebody else says the average human height is 2 feet; that is mu naught and mu 1. Immediately you can get a sense that it is easier to distinguish 3 feet from 2 feet than it is to distinguish 3 feet from 3 feet 1 inch as your alternate hypothesis. In other words, as the difference between the 2 statements increases, that is, as the magnitude of mu naught minus mu 1 increases, you should be able to clearly distinguish between the 2 scenarios with as few samples as possible. So, n should intuitively decrease as mu naught minus mu 1 increases, because the claims are getting further and further apart, and a few measurements should help us resolve this.
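For reference, the closed-form result alluded to on the slide, in the specific scenario of a one-sided z-test with known sigma and chosen alpha and beta, is n of the order of ((z alpha + z beta) times sigma / (mu naught minus mu 1)) squared; as noted, the numerator and denominator change marginally in other scenarios. A sketch under those assumptions:

```python
import math
from scipy.stats import norm

def sample_size(mu0, mu1, sigma, alpha=0.05, beta=0.20):
    """Approximate n for a one-sided z-test with known sigma.
    Other scenarios change the terms slightly, as the lecture notes."""
    z_a, z_b = norm.ppf(1 - alpha), norm.ppf(1 - beta)
    return math.ceil(((z_a + z_b) * sigma / (mu0 - mu1)) ** 2)

# Far-apart claims need few samples; nearby claims need many more.
print(sample_size(9.8, 9.4, sigma=0.5))   # about 10
print(sample_size(9.8, 9.7, sigma=0.5))   # about 155
```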
So, start with a scenario where mu naught and mu 1 are close together. It is hard in this case because the 2 alternate claims are close together. Remember that we are setting up a goalpost based on H naught; that goalpost is defined by the choice of alpha, and alpha we are trying to keep very small, of the order of 0.05. So, if I have fixed my alpha and mu 1 is close enough to mu naught, then when I sketch H 1 I will suddenly find myself with a large overlap between the 2 curves within the acceptance region of the test, and consequently beta is large. As I move the H 1 curve to the right, and that is what I am doing now in the bottom plot, the overlap with H naught clearly decreases, and that boils down to saying beta is now small, and surely I therefore have a better test. So, if you keep your claims about the model parameters further and further apart, you need fewer samples to work out the truth of what is going on.

You can very quickly reason out the dependency on the variance in the same way. Again, I have a situation on top where the variance has a small value. If my variance increases, that basically means I have a lot of imprecision in how I am collecting my data. A lot of imprecision means a lot more uncertainty about what the true value is, and that in turn should imply that it takes far more sampling to actually end up at the ground truth; sampling is what reduces our uncertainty about what is going on. So, if I increase my sigma squared relative to what I have on top, the first thing that happens is the curve gets flattened, because the variance is now larger. Remember, of course, that the area under the curve is 1, since it is a probability distribution, so to keep the area constant with a larger variance I end up with a flattened curve. Now, on this curve, if I insist on still using the same goalposts as a consequence of alpha, let us say 0.05, what is going to happen is I will have 2 flattened curves which overlap each other quite a bit, and that again forces me to come up with a very large number of samples to overcome the increase in the variance. We saw this indirectly when we looked at random variables yesterday: we said that if you sample more and more, you will be able to minimize the thickness of a distribution. A normal distribution with variance sigma squared by n gets thinner as you increase n, since sigma squared by n decreases, and the curves end up practically as a spike centered around the mean value mu.
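The flattened-curve effect can also be seen numerically; in this sketch, with all values assumed for illustration, n and alpha are held fixed while sigma is inflated, and beta grows accordingly:

```python
from scipy.stats import norm

mu0, mu1, n, alpha = 9.8, 9.5, 25, 0.05   # assumed illustrative numbers

for sigma in (0.3, 0.6, 1.2):
    se = sigma / n**0.5                    # larger sigma -> flatter curves
    lo = norm.ppf(alpha / 2, loc=mu0, scale=se)
    hi = norm.ppf(1 - alpha / 2, loc=mu0, scale=se)
    beta = norm.cdf(hi, loc=mu1, scale=se) - norm.cdf(lo, loc=mu1, scale=se)
    print(f"sigma = {sigma}: beta = {beta:.3f}")
```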
Similarly, there is a dependency on the choice of alpha, and again let me show you that graphically. I have a certain choice of alpha on top, let us say 0.05; that is the pink area under the H naught curve. If I decide to decrease alpha on this curve, what happens is that I move my goalposts further out; that is the only way I will end up decreasing alpha. So, I am keeping my acceptance region for the null hypothesis as wide as I can, and as the acceptance region widens, more of the H 1 curve falls inside it, so notice that my beta increases. To counter that, beta being an error which we do not want, it will ultimately turn out that I need to increase the number of samples. In other words, it becomes harder for you to distinguish between two hypotheses H naught and H 1 if you want a conclusion with a very high statistical significance level. On the other hand, if you are very casual about the significance of your statistical analysis, in other words you are going to keep alpha reasonably high, then you will get away with a smaller study comparing H naught and H 1, but that is probably not what you want in a rigorous scientific process.

Finally, an increase in the sample size increases the power of the test by decreasing beta. As I just said, if I increase the sample size, my distributions should grow thinner and thinner; right now I am showing them to you as fat distributions. As I increase the sample size, the variance of these distributions, which is sigma squared by n, will decrease; my curves get thinner, and as the curves get thinner, the overlap between them goes down, which in turn means beta goes down. So, if you now look at this: because I increased n in the second analysis compared to my first analysis, with more samples, my overlap decreases, because I have thinner curves. The net result is that the effect of increasing the number of samples is a better power of the test, or vice versa: if you want a better power for the test, you have got to work on increasing the number of samples to an appropriate level.
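The same computation as in the earlier sketches, wrapped in a small helper, shows these last two dependencies numerically: tightening alpha pushes beta up, while growing n pushes beta down (all numbers are again illustrative assumptions):

```python
from scipy.stats import norm

def beta_two_sided(mu0, mu1, sigma, n, alpha):
    """P(x-bar lands inside the H0 goalposts when the truth is mu1)."""
    se = sigma / n**0.5
    lo = norm.ppf(alpha / 2, loc=mu0, scale=se)
    hi = norm.ppf(1 - alpha / 2, loc=mu0, scale=se)
    return norm.cdf(hi, loc=mu1, scale=se) - norm.cdf(lo, loc=mu1, scale=se)

for alpha in (0.10, 0.05, 0.01):   # stricter alpha -> wider goalposts -> beta up
    print(f"alpha = {alpha}: beta = {beta_two_sided(9.8, 9.6, 0.5, 25, alpha):.3f}")
for n in (10, 40, 160):            # thinner curves -> less overlap -> beta down
    print(f"n = {n}: beta = {beta_two_sided(9.8, 9.6, 0.5, n, 0.05):.3f}")
```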
Now, I need to go back to a point I made early on about H 1: was H 1 important? If you remember the philosophy of what we are doing, H 1 is actually the statement we want to prove true. H 1 was that a drug is better than a sugar pill, and that is actually what we want to prove as a company; but we ended up saying that it is hard to prove, every time, that a molecule is better than anything else. So instead, we set up the analysis as the molecule being as good as a sugar pill, and we try to disprove that. But then we also said that H 1 could be written in different ways: H 1 could be theta not equal to theta naught, theta greater than theta naught, or theta less than theta naught, and so on. Since there are so many ways of writing it, is the choice of the alternate statement important? Why is that even a question? Because if you think about the procedure, the sketch I am showing you on this slide has nothing to do with H 1 yet; it is all only about H naught. When I say H naught is mu equals mu naught, I get myself this curve, I get myself this interval centered around mu naught, and that becomes my acceptance region for H naught. At this point there is no mention of H 1, and if the entire decision making in a test is about this acceptance region, then at first sight H 1 is not important. But it turns out H 1 is important for a subtle reason: it controls the rigor with which we carry out the test. So, let us look at that particular point: H 1 influences the interval estimate. How does it influence the interval estimate?

Go back to this alpha. Alpha was an error: alpha is the probability that H naught is true, but because we see extreme measurements too far away from the mean, we prefer to go with H 1, whatever H 1 is. In this particular cartoon we are acknowledging the fact that the extreme measurements, which cause us to move away from H naught and towards H 1, could occur either to the left of mu or to the right of mu. In other words, we could see very low values for our measurements or very high values, and so this error that we allow or tolerate in our hypothesis testing procedure is broken up into two parts, on the left and on the right: we say there is an equal chance of making an error on the lower side as on the higher side, which is why this error alpha is equally split as alpha by 2 areas on both sides of the mean.

But what if we get the impression, from the way the problem itself is set up, that if an error is going to be committed, it will be committed only on one side of the mean? Let me give you an idea of where that tends to happen. Think of somebody manufacturing, for example, light bulbs. The manufacturer of a light bulb is interested in claiming that its light bulbs last, on average, 1000 hours; so mu equals 1000 hours. If you sit and think about it, given the way advertising works, there is a good chance the manufacturer is inflating this claim of the lifespan of the light bulb. So, the manufacturer says that the light bulb lasts 1000 hours; the reality, you would think as somebody suspicious of inflated advertising, is that the lifespan is, let us say, 900 hours, or for that matter any value less than 1000 hours. In other words, your expectation as you start this experiment, and in fact we are not even talking about measurements yet, is that despite the H naught value of 1000 as claimed by the manufacturer, any measurement that you proceed to get will end up falling below 1000. Therefore, you expect to see the measurements, the bulk of them at any rate, on the left hand side of your plot, not on the right hand side. In which case, if all, or at least most, of the measurements are expected to fall on the left hand side, then there is no need to expect an error in our hypothesis testing on the right hand side. Why should we expect to see very large measurements, and then conclude that the manufacturer is wrong because of them? In fact, if you saw very large measurements, I am sure the manufacturer would be glad to further inflate this claim of the lifespan of a bulb; but the manufacturer, having probably already cheated, has set up an inflated claim. So, if our measurements fall on the left hand side, you have got to acknowledge that by clubbing all the error area we have onto that side: the entire alpha, rather than being split between the left and the right, now needs to be packed onto one side; pack it onto the left. This sets us up with the notion of a one sided and a two sided test. What we have been doing all along, it turns out, is a two sided test, and the two sided test, in turn, arises because while we had said the null hypothesis is mu equals mu naught,
we had implicitly assumed that the alternate hypothesis is mu not equal to mu naught, and at that point we had not said whether our measurements would fall less than mu naught or greater than mu naught. To be on the safe side, we said that measurements could fall on either side, and if you are going to commit an error in hypothesis testing, there is an equal chance of committing it on both sides; so we split alpha equally onto both sides. The alpha being on both sides is actually linked to H 1 being written down as mu not equal to mu naught. But if instead I expect the manufacturer to inflate his claim about the lifespan of a bulb, then the preferred statement of the alternate claim is that the lifespan mu of the light bulbs is less than the manufacturer's claim mu naught of 1000. The moment I make that kind of statement, I imply that all, or the bulk of, my measurements will fall on the left hand side, and if they fall on the left hand side, I now pack my entire probability of error on the left hand side. So, the entire alpha is now on the left hand side, and that changes the pair of goalposts that I have been working with in terms of accepting or rejecting the null hypothesis. This now is a one sided test, one in which I have packed all the alpha on the lower side; conversely, if I write H 1 as mu greater than mu naught, then all the alpha will end up being packed on the right hand side.

That sets up an immediate question. There are two ways to do this, based on where you are putting the goalposts, which in turn is based on what you are calling your null and, in particular, your alternate hypothesis. So which is the preferred way to test a hypothesis, the one sided test or the two sided test? If you think about the manufacturer of light bulbs, the whole reason to get into a testing experiment is that we feel the manufacturer is telling a lie with 1000 hours as the lifespan of the light bulb. If you have this expectation that the manufacturer is inflating a claim, then what is the procedure you will follow? You buy some light bulbs, you test how long these light bulbs last, you compute the arithmetic mean of the lifespans of the batch of light bulbs you bought, that is your x bar, and you end up asking: is x bar close to mu naught? In fact, what you are expecting is that x bar is less than mu naught, and if it is less than mu naught, you are essentially implying that the manufacturer has come up with a false claim. Now, notice that because all the alpha is packed on one side, if you want to squeeze the manufacturer, you go with the second statement, mu less than mu naught. That allows you to more easily squeeze the manufacturer, because essentially the manufacturer now has a window of width delta 2 on the lower plot, and that is a smaller value compared to the interval delta 1. That is easy to see: the fact that all the area alpha has been packed onto one side means the threshold delta 2 is surely less than delta 1. So, the tighter way to squeeze somebody in a hypothesis test is to do a one sided test, and you do the one sided test when you have a gut feeling that somebody has been falsifying a statement about a null hypothesis, in other words, that somebody has inflated a claim; you then accordingly pack your probability of error on the appropriate side to squeeze that claim, and the bulk of the measurements hopefully end up confirming what you set out to show.
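A small sketch of that comparison for the light bulb example, with assumed numbers for sigma and n, computes the goalpost widths delta 1 (two-sided) and delta 2 (one-sided) for the same alpha:

```python
from scipy.stats import norm

mu0, sigma, n, alpha = 1000, 120, 36, 0.05   # assumed light bulb numbers
se = sigma / n**0.5

# Two-sided: alpha split across both tails -> wider lower goalpost delta1.
delta1 = norm.ppf(1 - alpha / 2) * se
# One-sided (H1: mu < mu0): all of alpha on the left -> tighter delta2.
delta2 = norm.ppf(1 - alpha) * se

print(f"two-sided: reject if x-bar < {mu0 - delta1:.1f} (or > {mu0 + delta1:.1f})")
print(f"one-sided: reject if x-bar < {mu0 - delta2:.1f}")
# delta2 < delta1, so the one-sided test squeezes the claim harder.
```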
There is one final definition I need to put your way, and that is the p value. This is a subtlety, because most published research ends up being a statement of whether you are in agreement with a hypothesis or not, and that agreement boils down to whether you are within an interval or not. Unfortunately, when people make a binary statement that you accept or reject a hypothesis, you do not convey how close you came to changing your mind in carrying out the test. In other words, you never report the x bar value that you saw, and if you do not report the x bar value and only report the conclusion of the test, yes or no, accept or reject, the result is that people are unaware whether the x bar was very close to mu naught, or actually very close to the goalpost but just inside it. The difficulty is this: if your x bar is just inside the goalpost, then technically you are accepting H naught, but you are leaving yourself open to the possibility that if you get a few more measurements, your arithmetic mean will change, and the new x bar could fall outside the goalpost, in which case you would have to go and change the conclusion of your test. Therefore, it becomes important to convey not only what you consider the conclusion of the test from your testing procedure, but also how close you are to changing your own mind about that conclusion. The way to do that is either to report x bar relative to the thresholds of the test, or alternately to report how much area lies outside x bar, just like we have had this pink area outside our goalposts all along. The pink area outside our goalposts was alpha; the area outside x bar is p, and that is the p value. Of course, if the p value is very small, that is because our observations lie far out in the tail, in fact outside the original goalposts; the p value ends up being less than alpha, in which case surely our measurements are so extreme that we can reject the claim about the null hypothesis with very high statistical significance. So, the p value relative to the alpha value conveys to any reader of a scientific analysis how close one came to changing one's mind about the outcome of a hypothesis test. It is very good practice to convey the outcome of a test as you see it, but also to convey what you think is the possibility that you will change your own mind with regard to the conclusion you have conveyed.
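As a sketch of that reporting practice, with the same assumed light bulb numbers as above and a hypothetical observed x bar, the one sided p value is just the area under the H naught curve beyond x bar:

```python
from scipy.stats import norm

mu0, sigma, n = 1000, 120, 36   # same assumed numbers as above
x_bar = 958.0                   # hypothetical observed sample mean
se = sigma / n**0.5

# One-sided p value: area under the H0 curve to the left of x-bar.
p = norm.cdf(x_bar, loc=mu0, scale=se)
print(f"p = {p:.4f}")           # here p < alpha = 0.05, so H0 is rejected,
                                # and p itself tells the reader how close
                                # we came to deciding otherwise
```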
So, to kind of summarize what we have done in terms of defining a procedure: I am limiting myself to what are called parametric tests; there were one or two questions on Moodle about non parametric tests, and I have not had time to define any of those situations. But let us just summarize for today the proposed approach to carrying out a hypothesis test.

The first thing is not to do an experiment; literally the first thing is to get your model straight. State the hypotheses that you expect about a model parameter, and identify the random variable that you will have to work with to take on and analyze that model parameter. For example, if the model parameter you are going to work with is mu, then the random variable you will have to set up is x bar, and now it is a fight between x bar and mu: the question is whether x bar is close to mu or not. Since you are working with a random variable, for example x bar, you will quickly have to state what hypothesis statements you expect for the model parameters. You will have to define what you think is the alternate hypothesis; typically, the alternate hypothesis is the statement you want shown to be correct, but remember our procedure: we falsify the null hypothesis instead of proving the alternate hypothesis. Then identify the distribution that applies to your random variable under H naught; for example, if you are looking at mu, the random variable was x bar, and you have got to find out the distribution that x bar follows, which here is a normal distribution. Then identify the goalposts, that is, the significance level alpha. This is a very critical thing in any parametric test: you identify the goalposts at the beginning of the test, because one of the easiest ways you can cheat is to claim that you have a hypothesis, collect your data, and then, after seeing where your data lies, make a call as to where your goalposts are. Clearly, once you have seen your data, there is a very large possibility of bias in how you conclude the test. So, the rigorous and honest procedure is to first identify the goalposts before you even perform an experiment and look at the data. Identify the significance level, and then, given the significance level, immediately identify the critical thresholds, in other words the goalposts.
For that, you may need to know the appropriate sampling distribution, and which one it is as a function of the sample size. We have not covered alternatives to the normal distribution; there are several other distributions depending on which parameter you are going to look at, things called the t distribution, the chi-square distribution, the F distribution and so on, and a lot of these I will clarify offline, but for now you will have to identify the distribution relevant to your parameter. Then, on that distribution, you will have to decide whether you are going to take this alpha, which you chose in step 2, and put it on one side or on both sides, and that starts deciding whether you have a one sided or a two sided test. And then, from there, look at this: this is step 4, and step 4 is where you finally start looking at the measurements that you have. After collating all your measurements, you will compute your test statistic, in this case x bar, and then finally you will compare x bar against the goalposts that you set up, and you will identify your conclusions about the test result based on whether you are inside or outside the goalposts as defined by the critical level alpha. But as I just said, it is important not only to state what you think is the outcome of the test, but also to report how close you came to changing your mind about that outcome, and therefore it is good practice, and in fact almost every significant journal out there will demand, that you report not just the outcome of a test but also the p value: report the test result and the p value. And finally, you want to report how powerful you think the test has been; in other words, figure out what the value of beta is and report 1 minus beta, that is, the power of the test.

This, then, is a procedure of 7 steps that can be carried out for testing any hypothesis. What I will do offline is, at some point, upload a collection of problems which you can then take on, which demonstrate how, for different scenarios, we go through this sequence of 7 steps: you identify the relevant parameters, the relevant variables, the relevant distributions, then the goalposts, and from there compute the test result, the p value and the power of the test. But one of the points to appreciate from this compilation of 7 steps is that if these steps are done in any other sequence, there is a chance that you will compromise the rigor of the hypothesis test. This is the reason you do these things in a particular order, specifically looking at samples only after you have decided on the goalposts; any other sequence will compromise the integrity of the hypothesis test. So, you will appreciate that really knowing about a hypothesis test requires that you get a good grip on what variables are involved in your problem, in your particular domain, and on what distributions those variables will follow.
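Putting the 7 steps together, here is one possible end-to-end sketch for the one sided light bulb test. The data are simulated and every number is an assumption; the point is the order of the steps, with the goalposts fixed before any data are seen:

```python
import numpy as np
from scipy.stats import norm

# Steps 1-2: model, hypotheses, test statistic, significance level.
mu0, mu1 = 1000.0, 940.0        # H0: mu = mu0    H1: mu < mu0 (one sided)
sigma, n = 120.0, 36            # sigma assumed known, so x-bar is normal
alpha = 0.05                    # fixed BEFORE looking at any data

# Step 3: goalpost (critical threshold) from the H0 distribution of x-bar.
se = sigma / np.sqrt(n)
threshold = mu0 + norm.ppf(alpha) * se    # reject H0 if x_bar < threshold

# Step 4: only now collect the measurements (simulated here).
rng = np.random.default_rng(0)
data = rng.normal(950.0, sigma, size=n)   # pretend the true lifespan is 950 h

# Step 5: compute the test statistic and compare against the goalpost.
x_bar = data.mean()
decision = "reject H0" if x_bar < threshold else "fail to reject H0"

# Step 6: report the p value alongside the decision.
p = norm.cdf(x_bar, loc=mu0, scale=se)

# Step 7: report the power of the test against the stated alternative mu1.
power = norm.cdf(threshold, loc=mu1, scale=se)

print(f"x_bar = {x_bar:.1f}, threshold = {threshold:.1f} -> {decision}")
print(f"p value = {p:.4f}, power (1 - beta) = {power:.3f}")
```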
So, that then is my take on how to approach any hypothesis test, and what we will try to do now is respond to some of your feedback and visit some of the centers. I will simply take the question from the chat first. The question is: at the time of hypothesis testing, how do you identify and control external factors which could affect the system? For example, with my light bulb example, you seem to have identified external factors like electricity fluctuations or how the operator handles the light bulb. The fact of the matter is, when you perform a parametric test of one variable, we are not discussing the impact of other variables in this analysis, and in fact you need to sit through tomorrow's talk, where I discuss relationships between variables and how to deal with the impact of what are called hidden or latent variables. The hidden variables you are referring to here are the handling of a light bulb and the fluctuation of electricity, which might in turn cause variability in the measurements, and which might of course have nothing to do with the manufacturer's claim about the lifespan. So, this basically boils down to understanding how to control the impact of other variables, the latent variables, and that practically is the focus of my talk tomorrow. I will possibly get back to that and come back to the light bulb example tomorrow as well.

There is one other question, from PSG Coimbatore, and that question is: under what circumstances can you get type 1 or type 2 errors, and what would happen if these errors are not identified during the process of research? The type 1 error was, if you recall, alpha in all the cartoons that I sketched, and that alpha in turn was something that we fixed. The typical way to go about this is to try to figure out which type of error you really want to minimize. For example, when we were talking about that diagnostic kit for tuberculosis, you have got to identify which of the two types of errors that can happen you really want to minimize. The errors are: either somebody who has the disease is misdiagnosed as not having the disease, or alternately somebody who is actually healthy is unfortunately diagnosed by the kit as having the disease. So, which is the worse of the two scenarios? Some would say that it is worse to have somebody with the disease go undiagnosed, because that in turn implies that the disease has a possibility of spreading further. If that is your criterion for the hypothesis test goalposts, then you have got to minimize that error, and that in turn becomes alpha, and that in turn allows you to define H naught, because you want to keep alpha small, and remember what alpha is: H naught is true but you instead prefer to go with H 1. So, it invariably boils down to trying to keep alpha small first, before you start worrying about the other error, and in your given domain and your given problem, of the two errors you will probably find that you fear one type more than the other. That is how specific I can get in terms of an answer.

Now we are trying to go to BIT Durg, and the question, basically, is about why, or what to do with, bad values; was that it? No, the question was about where the population parameters, the mean and the standard deviation, come from. So, in all of the testing procedure we are starting out by saying the null hypothesis is mu equals mu naught; in other words, I have not bothered to tell you where the mu naught itself came from. It turns out that there is no rigorous way to go about this.
This, in fact, is the outcome of previous experimentation that you have likely done, or some previous analysis that you carried out. So, it is important for you to appreciate that we do not do a test once, come to a conclusion, and that is the end of the whole matter. You end up refining your analysis by asking related questions and then trying to improve with additional data collection. Therefore, for example, if I go back to that gravity example: where does the 9.8 come from? There is some previous insight, from some previous experiments, that the true value may be 9.8, in which case you start by asking, because that is the most convenient question to ask: is the value 9.8? Now, the outcome of your test may be that you are not in agreement with 9.8, and if you are not in agreement with 9.8, that leaves you in a situation where you know that something proposed is wrong, but then what do you do about it? What you do is you start asking: could the value instead be, say, 9.7? Because what happened between the original hypothesis, 9.8, and your next round of experimentation is that you collected some samples and found for yourself a value of x bar, and in the absence of anything else, x bar is the best claim you can make about the true underlying value of what you are trying to find out. So, for example, if 9.8 is the original proposal and you disagree with 9.8 because your value is, say, 9.6, the next thing is to set up the hypothesis that the value is 9.6, collect new data, and then try to zoom in further on whether you are in agreement with 9.6. This is how we work; in other words, there is no question of a hypothesis test testing precisely one value of a claim and that being that. Now, there may be an independent way, from theory, of deriving, let us say from physics, that the value is indeed a universal constant 9.8, and that is where it starts; but alternately you do this in a kind of Bayesian sense, where, based on the experience that you have had with experimentation, you keep refining your values. So, over to BIT Durg; any further questions that you might have? Yes sir, I have one question: if the number of independent variables is large, say 50, and somehow we can reduce it, but if in one set we are taking 50 input parameters, then what should be the number of sets we should take to model the phenomenon? Well, that actually is a little unclear, in the sense that if you are talking about 50 variables in your analysis, it is not clear why you want to test for all 50 in one shot. If you see the way I am talking about the testing of hypotheses, I am limiting myself to one variable at a time, one claim at a time, and I am looking at one value that that particular variable can take at a time. What you really do not want to be doing is taking on a test where you are looking at 50 separate measurements all at one shot, where you want to comment about the true population values of 50 variables at a time; there is a very high probability that the conclusion of your test will turn out to be wrong if you are looking at so large a number of variables in one shot. So, we will take a break.